
Advances in Intelligent Systems and Computing 1261

Aboul Ella Hassanien · Adam Slowik · Václav Snášel · Hisham El-Deeb · Fahmy M. Tolba, Editors

Proceedings
of the International
Conference
on Advanced
Intelligent Systems
and Informatics
2020
Advances in Intelligent Systems and Computing

Volume 1261

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen , Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft comput-
ing including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational neuro-
science, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning para-
digms, machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
** Indexing: The books of this series are submitted to ISI Proceedings,
EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **

More information about this series at http://www.springer.com/series/11156


Aboul Ella Hassanien · Adam Slowik · Václav Snášel · Hisham El-Deeb · Fahmy M. Tolba
Editors

Proceedings
of the International
Conference on Advanced
Intelligent Systems
and Informatics 2020
Editors

Aboul Ella Hassanien
Faculty of Computers and Artificial Intelligence, Information Technology Department,
and Chair of the Scientific Research Group in Egypt
Cairo University
Cairo, Egypt

Adam Slowik
Department of Electronics and Computer Science
Koszalin University of Technology
Koszalin, Poland

Václav Snášel
Faculty of Electrical Engineering and Computer Science
VŠB-Technical University of Ostrava
Ostrava-Poruba, Moravskoslezsky
Czech Republic

Hisham El-Deeb
Rector of the Electronic Research Institute
Cairo, Egypt

Fahmy M. Tolba
Faculty of Computers and Information
Ain Shams University
Cairo, Egypt

ISSN 2194-5357 ISSN 2194-5365 (electronic)


Advances in Intelligent Systems and Computing
ISBN 978-3-030-58668-3 ISBN 978-3-030-58669-0 (eBook)
https://doi.org/10.1007/978-3-030-58669-0
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

This volume constitutes the refereed proceedings of the 5th International


Conference on Advanced Intelligent Systems and Informatics (AISI 2020), which
took place in Cairo, Egypt, during October 19–21, 2020, and is an international
interdisciplinary conference covering research and development in the field of
informatics and intelligent systems. In response to the call for papers for AISI 2020,
113 papers were submitted for the main conference and 57 for three special sessions,
for a total of 170 papers submitted for presentation and inclusion in the proceedings
of the conference. The papers were evaluated and ranked on the basis of their
significance, novelty, and technical quality by at least two reviewers per paper. After
a careful blind refereeing process, 79 papers were selected for inclusion in the
conference proceedings. The papers cover current research in intelligent systems,
deep learning technology, document and sentiment analysis, blockchain and
cyber-physical systems, health informatics and AI against COVID-19, data mining,
power and control systems, business intelligence, social media and digital
transformation, robotics, control design, and smart systems. We
express our sincere thanks to the plenary speakers, workshop chairs, and
International Program Committee members for helping us to formulate a rich
technical program. We would like to extend our sincere appreciation for the out-
standing work contributed over many months by the Organizing Committee: local
organization chair and publicity chair. We also wish to express our appreciation to
the SRGE members for their assistance. We would like to emphasize that the
success of AISI 2020 would not have been possible without the support of many
committed volunteers who generously contributed their time, expertise, and
resources toward making the conference an unqualified success. Finally, we thank the
Springer team for their support in all stages of the production of the proceedings.
We hope that you will enjoy the conference program.

Organization

Honorary Chair

Fahmy Tolba, Egypt


General Chairs
Vaclav Snasel Rector of the Technical University of Ostrava,
Czech Republic
Hesham El-deeb Rector of the Electronic Research Institute, Egypt

Co-chairs
Aboul Ella Hassanien Scientific Research Group in Egypt (SRGE)
Allam Hamdan Ahlia University, Manama, Bahrain

International Advisory Board


Norimichi Tsumura, Japan
Kuo-Chi Chang, China
Tarek Sobh, USA
Mahmoud Abdel-Aty, Egypt
Reda Salah, Egypt
Nagwa Badr, Egypt
Vaclav Snasel, Czech Republic
Janusz Kacprzyk, Poland
Siddhartha Bhattacharyya, India
Ahmed Hassan, Egypt
Hesham El-deeb, Egypt
Khaled Shaalan, Egypt
Ayman Bahaa, Egypt
Ayman El Desoky, Egypt


Nouby Mahdy Ghazaly, Egypt


Hany Harb, Egypt
Alaa El-Sadek, Egypt
Arabi Keshk, Egypt
Magdy Zakariya, Egypt
Saleh Mesbah, Egypt
Fathi El-Sayed Abd El-Samie, Egypt
Tarek Ghareb, Egypt
Mohamed Belal, Egypt
Program Chair
Adam Slowik Koszalin University of Technology, Poland

Track Chairs

Intelligent Natural Language Processing Track


Khaled Shaalan, Egypt
Informatics Track
Diego Alberto Oliva, Mexico
Intelligent Systems Track
Ashraf Darwish, Egypt
Robotics, Automation and Control
Ahmad Taher Azar
Internet of Things and Big Data Analytics Track
Sherine Abd El-Kader
Publicity Chairs
Khaled Ahmed, USA
Mohamed Abd Elfattah, Egypt
Assem Ahmed Alsawy, Egypt
Technical Program Committee
Milan Stehlik Johannes Kepler University Linz, Austria
Fatmah Omara Egypt
Wael Badawy Egypt
Passent ElKafrawy Egypt
Walaaa Medhat Egypt
Aarti Singh India
Tahani Alsubait UK
Ahmed Fouad Egypt

Ali R. Kashani USA


Arun Kumar Sangaiah India
Rizwan Patan India
Gaurav Dhiman India
Nand Kishor Meena UK
Evgenia Theodotou Greece
Pavel Kromer Czech Republic
Irma Aslanishvili Czech Republic
Jan Platos Czech Republic
Ivan Zelinka Czech Republic
Sebastian Tiscordio Czech Republic
Natalia Spyropoulou Hellenic Open University, Greece
Dimitris Sedaris Hellenic Open University, Greece
Vassiliki Pliogou Metropolitan College, Greece
Pilios Stavrou Metropolitan College, Greece
Eleni Seralidou University of Piraeus, Greece
Stelios Kavalaris Metropolitan College, Greece
Litsa Charitaki University of Athens, Greece
Elena Amaricai University of Timișoara, Romania
Qing Tan Athabasca University, Canada
Pascal Roubides Broward College, USA
Manal Abdullah King Abdulaziz University, KSA
Mandia Athanasopoulou Metropolitan College, Greece
Vicky Goltsi Metropolitan College, Greece
Mohammad Reza Noruzi Tarbiat Modarres University, Iran
Abdelhameed Ibrahim Egypt
Ahmed Elhayek Germany
Amira S. Ashour KSA
Boyang Albert Li
Edgard Marx Germany
Fatma Helmy Egypt
Ivan Ermilov Germany
Mahmoud Awadallah USA
Minu Kesheri India
Mona Solyman Egypt
Muhammad Saleem Germany
Nabiha Azizi Algeria
Namshik Han UK
Noreen Kausar KSA
Noura Semary Egypt
Rania Hodhod Georgia
Reham Ahmed Egypt
Sara Abdelkader Canada
Sayan Chakraborty India
Shoji Tominaga Japan

Siva Ganesh Malla India


Soumya Banerjee India
Sourav Samanta India
Suvojit Acharjee India
Swarna Kanchan India
Takahiko Horiuchi Japan
Tommaso Soru Germany
Wahiba Ben Abdessalem KSA
Zeineb Chelly Tunisia

Local Arrangement Chairs


Ashraf Darwish, Egypt
Mohamed Abd Elfattah, Egypt
Heba Aboul Ella, Egypt
Contents

Intelligence and Decision Making System


A Context-Based Video Compression: A Quantum-Inspired Vector
Quantization Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Osama F. Hassan, Saad M. Darwish, and Hassan A. Khalil
An Enhanced Database Recovery Model Based on Game Theory
for Mobile Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Yasser F. Mokhtar, Saad M. Darwish, and Magda M. Madbouly
Location Estimation of RF Emitting Source Using Supervised
Machine Learning Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Kamel H. Rahouma and Aya S. A. Mostafa
An Effective Offloading Model Based on Genetic Markov Process
for Cloud Mobile Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Mohamed S. Zalat, Saad M. Darwish, and Magda M. Madbouly
Toward an Efficient CRWSN Node Based on Stochastic Threshold
Spectrum Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Reham Kamel Abd El-Aziz, Ahmad A. Aziz El-Banna, HebatAllah Adly,
and Adly S. Tag Eldien
Video Captioning Using Attention Based Visual Fusion
with Bi-temporal Context and Bi-modal Semantic Feature Learning . . . 65
Noorhan K. Fawzy, Mohammed A. Marey, and Mostafa M. Aref
Matchmoving Previsualization Based on Artificial Marker Detection . . . 79
Houssam Halmaoui and Abdelkrim Haqiq
Research Method of Blind Path Recognition Based on DCGAN . . . . . . 90
Ling Luo, Ping-Jun Zhang, Peng-Jun Hu, Liu Yang, and Kuo-Chi Chang


The Impact of the Behavioral Factors on Investment


Decision-Making: A Systemic Review on Financial Institutions . . . . . . . 100
Syed Faisal Shah, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum

Deep Learning Technology and Applications


A Deep Learning Architecture with Word Embeddings to Classify
Sentiment in Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Eman Hamdi, Sherine Rady, and Mostafa Aref
Deep Neural Networks for Landmines Images Classification . . . . . . . . . 126
Refaat M. Fikry and H. Kasban
Deep Convolutional Neural Networks for ECG Heartbeat
Classification Using Two-Stage Hierarchical Method . . . . . . . . . . . . . . . 137
Abdelrahman M. Shaker, Manal Tantawi, Howida A. Shedeed,
and Mohamed F. Tolba
Study of Region Convolutional Neural Network Deep Learning
for Fire Accident Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Ntawiheba Jean d’Amour, Kuo-Chi Chang, Pei-Qiang Li, Yu-Wen Zhou,
Hsiao-Chuan Wang, Yuh-Chung Lin, Kai-Chun Chu, and Tsui-Lien Hsu

Document and Sentiment Analysis


Norm-Referenced Achievement Grading: Methods and Comparison . . . 159
Thepparit Banditwattanawong and Masawee Masdisornchote
Review of Several Address Assignment Mechanisms for Distributed
Smart Meter Deployment in Smart Grid . . . . . . . . . . . . . . . . . . . . . . . . 171
Tien-Wen Sung, Xiaohui Hu, and Haiyan Ou
An Approach for Sentiment Analysis and Personality Prediction
Using Myers Briggs Type Indicator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Alàa Genina, Mariam Gawich, and Abdelfatah Hegazy
Article Reading Sequencing for English Terminology Learning
in Professional Courses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
Tien-Wen Sung, Qingjun Fang, You-Te Lu, and Xiaohui Hu
Egyptian Student Sentiment Analysis Using Word2vec During
the Coronavirus (Covid-19) Pandemic . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Lamiaa Mostafa
Various Pre-processing Strategies for Domain-Based Sentiment
Analysis of Unbalanced Large-Scale Reviews . . . . . . . . . . . . . . . . . . . . . 204
Sumaia Mohammed AL-Ghuribi, Shahrul Azman Noah, and Sabrina Tiun

Arabic Offline Character Recognition Model Using Non-dominated


Rank Sorting Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Saad M. Darwish, Osama F. Hassan, and Khaled O. Elzoghaly
Sentiment Analysis of Hotel Reviews Using Machine
Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Sarah Anis, Sally Saad, and Mostafa Aref

Blockchain and Cyber Physical System


Transparent Blockchain-Based Voting System: Guide
to Massive Deployments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Aicha Fatrah, Said El Kafhali, Khaled Salah, and Abdelkrim Haqiq
Enhanced Technique for Detecting Active and Passive Black-Hole
Attacks in MANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Marwa M. Eid and Noha A. Hikal
A Secure Signature Scheme for IoT Blockchain Framework
Based on Multimodal Biometrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Yasmin A. Lotfy and Saad M. Darwish
An Evolutionary Biometric Authentication Model for Finger
Vein Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Saad M. Darwish and Ahmed A. Ismail
A Deep Blockchain-Based Trusted Routing Scheme for Wireless
Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
Ibrahim A. Abd El-Moghith and Saad M. Darwish
A Survey of Using Blockchain Aspects in Information
Centric Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
Abdelrahman Abdellah, Sherif M. Saif, Hesham E. ElDeeb,
Emad Abd-Elrahman, and Mohamed Taher

Health Informatics and AI Against COVID-19


Real-Time Trajectory Control of Potential Drug Carrier Using
Pantograph “Experimental Study” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Ramy Farag, Ibrahim Badawy, Fady Magdy, Zakaria Mahmoud,
and Mohamed Sallam
Early Detection of COVID-19 Using a Non-contact
Forehead Thermometer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Ahmed G. Ebeid, Enas Selem, and Sherine M. Abd El-kader
The Mass Size Effect on the Breast Cancer Detection Using 2-Levels
of Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Ghada Hamed, Mohammed Abd El-Rahman Marey, Safaa El-Sayed Amin,
and Mohamed Fahmy Tolba

An Integrated IoT System to Control the Spread of COVID-19


in Egypt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
Aya Hossam, Ahmed Magdy, Ahmed Fawzy,
and Shriene M. Abd El-Kader
Healthcare Informatics Challenges: A Medical Diagnosis Using Multi
Agent Coordination-Based Model for Managing the Conflicts
in Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Sally Elghamrawy
Protection of Patients’ Data Privacy by Tamper Detection
and Localization in Watermarked Medical Images . . . . . . . . . . . . . . . . 358
Alaa H. ElSaadawy, Ahmed S. ELSayed, M. N. Al-Berry,
and Mohamed Roushdy
Breast Cancer Classification from Histopathological Images
with Separable Convolutional Neural Network and Parametric
Rectified Linear Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
Heba Gaber, Hatem Mohamed, and Mina Ibrahim

Big Data Analytics and Service Quality


Big Data Technology in Intelligent Distribution Network: Demand
and Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Zhi-Peng Ye and Kuo-Chi Chang
Memory Management Approaches in Apache Spark: A Review . . . . . . 394
Maha Dessokey, Sherif M. Saif, Sameh Salem, Elsayed Saad,
and Hesham Eldeeb
The Influence of Service Quality on Customer Retention:
A Systematic Review in the Higher Education . . . . . . . . . . . . . . . . . . . . 404
Aisha Alshamsi, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
The Impact of Ethical Leadership on Employees Performance:
A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
Hind AlShehhi, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum

Data Mining, Decision Making, and Intelligent Systems


Evaluating Non-redundant Rules of Various Sequential Rule
Mining Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Nesma Youssef, Hatem Abdulkader, and Amira Abdelwahab
Impact of Fuzzy Stability Model on Ad Hoc Reactive Routing
Protocols to Improve Routing Decisions . . . . . . . . . . . . . . . . . . . . . . . . . 441
Hamdy A. M. Sayedahmed, Imane M. A. Fahmy, and Hesham A. Hefny

A Multi-channel Speech Enhancement Method Based on Subband


Affine Projection Algorithm in Combination with Proposed Circular
Nested Microphone Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
Ali Dehghan Firoozabadi, Pablo Irarrazaval, Pablo Adasme, Hugo Durney,
Miguel Sanhueza Olave, David Zabala-Blanco, and Cesar Azurdia-Meza
Game Theoretic Approach to Optimize Exploration Parameter
in ACO MANET Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
Marwan A. Hefnawy and Saad M. Darwish
Performance Analysis of Spectrum Sensing Thresholding Methods
for Cognitive Radio Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
Rhana M. Elshishtawy, Adly S. Tag Eldien, Mostafa M. Fouda,
and Ahmed H. Eldeib
The Impacts of Communication Ethics on Workplace Decision
Making and Productivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
Alyaa Alyammahi, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
A Comparative Study of Various Deep Learning Architectures
for 8-state Protein Secondary Structures Prediction . . . . . . . . . . . . . . . . 501
Moheb R. Girgis, Enas Elgeldawi, and Rofida Mohammed Gamal

Power and Control Systems


Energy Efficient Spectrum Aware Distributed Clustering in Cognitive
Radio Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Randa Bakr, Ahmad A. Aziz El-Banna, Sami A. A. El-Shaikh,
and Adly S. Tag ELdien
The Autonomy Evolution in Unmanned Aerial Vehicle: Theory,
Challenges and Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
Mohamed M. Eltabey, Ahmed A. Mawgoud, and Amr Abu-Talleb
A Non-destructive Testing Detection Model for the Railway
Track Cracks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
Kamel H. Rahouma, Samaa A. Mohammad, and Nagwa S. Abdel Hameed
Study of Advanced Power Load Management Based on the Low-Cost
Internet of Things and Synchronous Photovoltaic Systems . . . . . . . . . . 548
Elias Turatsinze, Kuo-Chi Chang, Pei-Qiang Li, Cheng-Kuo Chang,
Kai-Chun Chu, Yu-Wen Zhou, and Abdalaziz Altayeb Ibrahim Omer
Power Grid Critical State Search Based on Improved Particle Swarm
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
Jie Luo, Hui-Qiong Deng, Qin-Bin Li, Rong-Jin Zheng, Pei-Qiang Li,
and Kuo-Chi Chang

Study of PSO Optimized BP Neural Network and Smith Predictor


for MOCVD Temperature Control in 7 nm 5G Chip Process . . . . . . . . 568
Kuo-Chi Chang, Yu-Wen Zhou, Hsiao-Chuan Wang, Yuh-Chung Lin,
Kai-Chun Chu, Tsui-Lien Hsu, and Jeng-Shyang Pan
Study of the Intelligent Algorithm of Hilbert-Huang Transform
in Advanced Power System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
Cheng Zhang, Jia-Jing Liu, Kuo-Chi Chang, Hsiao-Chuan Wang,
Yuh-Chung Lin, Kai-Chun Chu, and Tsui-Lien Hsu
Study of Reduction of Inrush Current on a DC Series Motor
with a Low-Cost Soft Start System for Advanced Process Tools . . . . . . 586
Governor David Kwabena Amesimenu, Kuo-Chi Chang, Tien-Wen Sung,
Hsiao-Chuan Wang, Gilbert Shyirambere, Kai-Chun Chu,
and Tsui-Lien Hsu
Co-design in Bird Scaring Drone Systems: Potentials and Challenges
in Agriculture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
Moammar Dayoub, Rhoda J. Birech, Mohammad-Hashem Haghbayan,
Simon Angombe, and Erkki Sutinen
Proposed Localization Scenario for Autonomous Vehicles in GPS
Denied Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
Hanan H. Hussein, Mohamed Hanafy Radwan,
and Sherine M. Abd El-Kader

Business Intelligence
E-cash Payment Scheme in Near Field Communication
Based on Boosted Trapdoor Hash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
Ahmed M. Hassan and Saad M. Darwish
Internal Factors Affect Knowledge Management and Firm
Performance: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
Aaesha Ahmed Al Mehrez, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
Enhancing Our Understanding of the Relationship Between
Leadership, Team Characteristics, Emotional Intelligence
and Their Effect on Team Performance: A Critical Review . . . . . . . . . . 644
Fatima Saeed Al-Dhuhouri, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
Factors Affect Customer Retention: A Systematic Review . . . . . . . . . . . 656
Salama S. Alkitbi, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum

The Effect of Work Environment Happiness


on Employee Leadership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
Khadija Alameeri, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
Performance Appraisal on Employees’ Motivation:
A Comprehensive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 681
Maryam Alsuwaidi, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum

Social Media and Digital Transformation


Social Media Impact on Business: A Systematic Review . . . . . . . . . . . . 697
Fatima Ahmed Almazrouei, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
Digital Transformation and Organizational Operational Decision
Making: A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
Ala’a Ahmed, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
The Impact of Innovation Management in SMEs Performance:
A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
Fatema Al Suwaidi, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
The Effect of Digital Transformation on Product Innovation:
A Critical Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
Jasim Almaazmi, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum
Women Empowerment in UAE: A Systematic Review . . . . . . . . . . . . . . 742
Asma Omran Al Khayyal, Muhammad Alshurideh, Barween Al Kurdi,
and Said A. Salloum

Robotic, Control Design and Smart Systems


Lyapunov-Based Control of a Teleoperation System in Presence
of Time Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759
Mohamed Sallam, Ihab Saif, Zakaria Saeed, and Mohamed Fanni
Development and Control of a Micro-robotic System
for Medical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769
Fady Magdy, Ahmed Waheed, Ahmed Moustafa, Ramy Farag,
Ibrahim M. Badawy, and Mohamed Sallem
Wake-up Receiver for LoRa-Based Wireless Sensor Networks . . . . . . . 779
Amal M. Abdel-Aal, Ahmad A. Aziz El-Banna, and Hala M. Abdel-Kader

Smart Approach for Discovering Gateways in Mobile


Ad Hoc Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
Kassem M. Mostafa and Saad M. Darwish
Computational Intelligence Techniques in Vehicle to Everything
Networks: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803
Hamdy A. M. Sayedahmed, Emadeldin Mohamed, and Hesham A. Hefny
Simultaneous Sound Source Localization by Proposed Cuboids Nested
Microphone Array Based on Subband Generalized
Eigenvalue Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816
Ali Dehghan Firoozabadi, Pablo Irarrazaval, Pablo Adasme, Hugo Durney,
Miguel Sanhueza Olave, David Zabala-Blanco, and Cesar Azurdia-Meza
A Framework for Analyzing 4G/LTE-A Real Data Using Machine
Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826
Nihal H. Mohammed, Heba Nashaat, Salah M. Abdel-Mageid,
and Rawia Y. Rizk
Robust Kinematic Control of Unmanned Aerial Vehicles
with Non-holonomic Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
Ahmad Taher Azar, Fernando E. Serrano, Nashwa Ahmad Kamal,
and Anis Koubaa
Nonlinear Fractional Order System Synchronization via
Combination-Combination Multi-switching . . . . . . . . . . . . . . . . . . . . . . 851
Shikha Mittal, Ahmad Taher Azar, and Nashwa Ahmad Kamal
Leader-Follower Control of Unmanned Aerial Vehicles with State
Dependent Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 862
Ahmad Taher Azar, Fernando E. Serrano, Nashwa Ahmad Kamal,
and Anis Koubaa
Maximum Power Extraction from a Photovoltaic Panel Connected
to a Multi-cell Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873
Arezki Fekik, Ahmad Taher Azar, Nashwa Ahmad Kamal,
Fernando E. Serrano, Mohamed Lamine Hamida, Hakim Denoun,
and Nacira Yassa
Hidden and Coexisting Attractors in a New Two-Dimensional
Fractional Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 883
Amina-Aicha Khennaoui, Adel Ouannas, and Giuseppe Grassi

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891


Intelligence and Decision Making
System
A Context-Based Video Compression:
A Quantum-Inspired Vector Quantization
Approach

Osama F. Hassan (1), Saad M. Darwish (2), and Hassan A. Khalil (3, corresponding author)

(1) Department of Mathematics, Faculty of Science, Damanhour University, Damanhour, Egypt
    osamafarouk@sci.dmu.edu.eg
(2) Department of Information Technology, Institute of Graduate Studies and Research,
    Alexandria University, Alexandria, Egypt
    saad.darwish@alexu.edu.eg
(3) Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
    h.a.khalil@zu.edu.eg

Abstract. This paper proposes a modified video compression model that


optimizes the vector quantization codebook by using an adapted Quantum Genetic
Algorithm (QGA) that exploits the quantum features of superposition and entangle-
ment to build the optimal codebook. A context-based initial
codebook is created by using a background subtraction algorithm; then, the
QGA is adapted to get the optimal codebook. This optimal feature vector is then
utilized as an activation function inside the neural network’s hidden layer to
remove redundancy. Furthermore, the approximation wavelet coefficients are
losslessly compressed with Differential Pulse Code Modulation (DPCM), whereas
the detail coefficients are lossy compressed using Learning Vector Quantization
(LVQ) neural networks. Finally, Run-Length Encoding is applied to encode the
quantized coefficients to achieve a high compression ratio. As individuals in the
QGA are actually the superposition of multiple individuals, it is less likely that
good individuals will be lost. Experiments have proven the system’s ability to
achieve a higher compression ratio with acceptable efficiency measured by
PSNR.

Keywords: Video compression · Neural network · Quantum Genetic Algorithm · Context-based compression

1 Introduction

The immense use of multimedia technology over the past decades has increased the
demand for digital information. This enormous demand, together with the massive amount
of data involved, has made current technology unable to deal with it efficiently. Removing
redundancies through video compression addresses this problem [1]. Reducing the bandwidth
and storage capacity while preserving the quality of a video is the main goal for video

compression. Compression techniques are divided into two types, lossless and lossy
compression.
Nevertheless, there are still many problems or challenges that hinder video com-
pression from being popular. The main issue is how to make a tradeoff between the
video quality in terms of Peak Signal to Noise Ratio (PSNR) and the compression ratio.
Moreover, researchers sometimes are not able to reach applicable perceptual com-
pression techniques because of application- and context-based quality expectations of
users. Nevertheless, perceptual video compression has great potential as a solution to
facilitate multimedia content management due to its efficiency for data rate reduction
[2]. Recently, several approaches have been presented that attempt to tackle the above
problems. The taxonomy of these approaches can be categorized as spatial, temporal,
statistical, and psycho-visual redundancies. Readers looking for more information
regarding these types can refer to [3]. In general, spatial redundancies can be exploited
to remove or reduce higher frequencies in an effective way without affecting the
perceived quality.
Vector Quantization (VQ) is an efficient and easy technique for video compression.
VQ includes three main steps: encoding, codebook generation, and decoding; see [3]
for more information. Neural Networks (NNs) are commonly used in video coding
algorithms [4]; such a network is made of two main components, a spatial component
and a reconstruction component. The spatial component encodes intra-frame visual
patterns, while the reconstruction component aggregates information to predict
details. Some of these algorithms can better exploit
spatial redundancies for rebuilding high-frequency structures by making the spatial
component deep. The neural video compression method based on the predictive VQ
algorithm requires the correct detection of key frames in order to improve its perfor-
mance. Recently, evolutionary optimization techniques (e.g., genetic algorithms and
swarm intelligence) have been exploited to enhance the NN learning process and build an
intelligent vector quantization [5].
Quantum computation is an interdisciplinary science that emerged from informa-
tion science and quantum science. The Quantum Genetic Algorithm (QGA) is an opti-
mization technique that adapts the Genetic Algorithm (GA) to quantum computing.
They are mainly based on qubits and state superposition of quantum mechanics. Unlike
the classical representation of chromosomes (binary string, for instance), here they are
represented by vectors of qubits (quantum register). Thus, a chromosome can represent
the superposition of all possible states [6]. Some efforts were spent to use QGA for
exploring search spaces for finding an optimal solution [7, 8]. The codebook design is a
crucial process in video compression and can be regarded as a searching problem that
seeks to find the most representative codebook which could correctly be applied in the
video compression.

1.1 Novelty and Contribution


The novelty of the proposed video compression model is that it removes different
types of redundancies in one package. The model handles the
frame’s spatial redundancy by dropping the duplicate in the high-frequency coefficients
of the Discrete Wavelet Transform (DWT) through adapting vector quantization based
NN, whereas the redundancy inside the low-frequency (high energy) coefficients will
be eliminated by using DPCM. The model controls the inter-frame temporal redun-
dancy by utilizing a background subtraction algorithm to extract moving objects within
frames to generate the condensed initial codebook. Regarding statistical redundancy,
the model employs run-length encoding to increase the compression ratio. Overall, the
model performance depends mainly on the construction of the optimal codebook for
vector quantization. It exploits QGA with a fitness function based on the Euclidean
distance between the initial codebook and each frame in the video. Utilizing QGA
helps in that the effective statistical size of the population appears to be increased. This
means that the advantage of good building blocks has been magnified with the aim of
enhancing the optimal features selection process [8].

2 Literature Survey

Research in the video compression domain has attracted tremendous interest in recent
years. This is mainly due to its challenging nature in effectively satisfying high com-
pression ratio and quality after decoding without degradation of the reconstructed
video. An insight into the potential of using vector quantization for real-time neural
video codec is provided in [4]. This technique utilizes Predictive Vector Quantization
(PVQ) that combines vector quantization and differential pulse code modulation.
Another work involving hybrid transformation-based video compression may be seen
in [1]. The hybrid compressed frame is quantized and entropy coded with Huffman
coding. This method utilized motion vectors, estimated using adaptive rood pattern
search and compensated globally. Their system was more complex because the
hybrid transforms with quantization need a lot of time to compress the video.
With this same objective, in 2015, Elmolla et al. [1] introduced run-length and
Huffman coding as a means of packaging hybrid coding. This type of compression is
able to overcome the drawbacks of wavelet analysis, but there are some limitations:
wavelets are not optimal for the sparse approximation of curve features beyond point
singularities. A more formal description, as well as a review of video compression based
on Huffman coding, can be found in [9]. Yet, Huffman coding requires two passes. The
first pass is used to build a statistical model of the data, whereas the second pass is used
to encode it, so it is a relatively slow process. Due to that, some other techniques are
faster than Huffman coding when reading or writing files.
A lot of research interest is being shown in optimization techniques that can exploit
the temporal redundancy through motion estimation and compensation based on
edge matching, which can alleviate the problem of local minima and, at the same time,
reduce computational complexity [10]. The ant colony edge detector is used to create
edges for motion compensation. The main disadvantages of block matching are the
heavy computation involved and the motion averaging effect of the blocks. Another
approach was proposed by Rubina in 2015 [11], defining a technique to provide
temporal–based video compression based on fast three-dimensional cosine transform.
To minimize the influence of the hybrid transformation on compression quality and to
increase the compression ratio, Esakkirajan et al. [12] combined the advantages of
multiwavelet coefficients, which possess more than one scaling function, with an
adaptive vector quantization scheme in which the design of the codebook is based on
the dynamic range of the input data. Another approach was suggested by Nithin et al. in
2016 [13]. It defined a technique to provide component-level information to support
spatial redundancy reduction based on properties of fast curvelet transform, Burrows-
Wheeler Transform, and Huffman coding. Although video compression has been
studied for many decades, there is still room to make it more efficient and practical
in real applications. According to the aforementioned review,
it can be seen that past studies largely did not address the issues associated with
building the codebook for vector quantization compression algorithms (the codebook
is most often built randomly). To the best of our knowledge, little attention has been
paid to devising new optimal codebooks and improving their efficiency for vector
quantization.

3 Methodology

This paper proposes a new model that combines the two types of video coding: intra-
frame and inter-frame coding in a unified framework to remove different types of
redundancies (spatial, temporal, and statistical). The intra-frame coding is achieved by
fusing information from both the wavelet transform and quantization. The wavelet
transform decorrelates the pixels of the input frame, converting them into a set of
coefficients that can be coded more efficiently than the original pixel values
themselves, whereas the quantization information originates from DPCM, which
forms the core of essentially all lossless compression algorithms. For inter-frame
coding, the vector quantization technique is adapted based on the background sub-
traction algorithm to condense the codebook length. Finally, Run Length Encoding
(RLE) algorithm is used to merge information for the two coding techniques to achieve
high compression by removing the statistical redundancy. Figure 1 shows the main
model components for both compression and decompression phase, respectively, and
how they are linked to each other.

Fig. 1. Flow diagram of the proposed system: (Left) compression phase. (Right) Decompression
phase
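
To make the per-frame data flow of Fig. 1 concrete, the following sketch (ours, not the authors' code) shows how a single frame could be split into the branches described in Sects. 3.1 and 3.2. It assumes the PyWavelets package and the Haar wavelet; the assignment of the detail bands to LL/LH/HL/HH names is an assumption, since naming conventions differ between libraries:

import numpy as np
import pywt

def split_frame(gray_frame):
    # one-level 2D DWT: cA is the low-frequency (LL) band, the detail bands
    # correspond to the HL/LH/HH bands (exact naming convention varies)
    cA, (cH, cV, cD) = pywt.dwt2(gray_frame.astype(float), 'haar')
    # lossless DPCM branch, lossy VQ branch, and side information kept for decoding
    return cA, cV, (cH, cD)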

Step 1: Generate Initial Codebook: In this step, a codebook for each video is built
offline that relies on extracting the moving parts of the frames (foreground) beside the
background; each of them is represented as a codeword. The separating of moving
objects is performed based on the background subtraction technique. Background
subtraction is a widely used approach for detecting moving objects in videos from static
cameras [14]. The accuracy of this approach is dependent on the speed of movement in
the scene. Faster movements may require a higher threshold [3].
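
As an illustration of this step, the sketch below builds a condensed initial codebook from foreground blocks detected by background subtraction. It is a minimal example, not the authors' implementation: the use of OpenCV's MOG2 subtractor, the 4x4 block size, the 50% foreground threshold, and the random condensation to a fixed codebook size are all assumptions.

import cv2
import numpy as np

def initial_codebook(video_path, block=4, max_codewords=256):
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=100, varThreshold=16)
    codewords = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        mask = subtractor.apply(gray)            # 255 where motion is detected
        h, w = gray.shape
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                # keep blocks that belong to moving objects (foreground)
                if mask[y:y + block, x:x + block].mean() > 127:
                    codewords.append(gray[y:y + block, x:x + block].flatten())
    cap.release()
    codewords = np.array(codewords, dtype=np.float32)
    if len(codewords) > max_codewords:           # condense to a fixed-size codebook
        keep = np.random.choice(len(codewords), max_codewords, replace=False)
        codewords = codewords[keep]
    return codewords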
Step 2: Codebook Optimization: Given the initial codebook, the next step is to tune
the codewords inside the codebook according to a specific objective function. The quantum
genetic algorithm is adopted here to realize this step; the domain of QGA is opti-
mization problems where the set of feasible solutions is discrete or can be reduced to a
discrete one, and the goal is to find the best possible solution [6–8, 15]. The structure of
a QGA is illustrated in Fig. 2. The suggested model utilizes the quantum parallelism
that refers to the process of evaluating a function once on a “superposition” of all
possible inputs to produce a superposition of all possible outputs. It means that the time
required to calculate all possible outputs is the same time required to calculate only one
output with a classical computer. Quantum register with superposition can store
exponentially more data than a classical register of the same size. In the quantum
algorithm, superimposed states are connected by a quantum connection called Entan-
glement. In general, quantum superposition gives quantum algorithms the advantage of
lower complexity than their classical equivalents.

Fig. 2. QGA structure (left) flowchart, (right) pseudocode



A chromosome is simply a vector of m qubits that forms a quantum register. Herein,
the easiest way to create the initial population (a combination of different codewords)
is to initialize all the amplitudes of the qubits to the value $1/\sqrt{2}$. All quantum superposition
states will be expressed by a chromosome with equal probability. In order to make a
reproduction, the evaluation phase quantifies the quality of each quantum chromosome
in the population. The evaluation is based on an objective function (Euclidean distance
in our case) that assigns an adaptation value to each individual after measurement;
this permits ranking the individuals in the population. In order to effectively exploit
the superposed states of the qubits, each qubit must be observed (known as measuring
the chromosome), which yields a classical chromosome.
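
A minimal sketch of this representation, measurement, and evaluation step is given below, assuming a NumPy implementation; the way a measured bit string selects codewords and the Euclidean distortion used as the objective function are illustrative assumptions consistent with the description above, not the authors' exact formulation.

import numpy as np

def init_population(pop_size, n_qubits):
    # each qubit is an (alpha, beta) pair with alpha^2 + beta^2 = 1;
    # starting at 1/sqrt(2) gives every basis state equal probability
    amp = 1.0 / np.sqrt(2.0)
    return np.full((pop_size, n_qubits, 2), amp)

def measure(chromosome, rng=None):
    # observe each qubit: P(bit = 1) = beta^2, collapsing to a classical bit string
    rng = rng or np.random.default_rng()
    return (rng.random(chromosome.shape[0]) < chromosome[:, 1] ** 2).astype(int)

def fitness(bits, candidate_codewords, frame_vectors):
    # assumed decoding: the bit string selects a subset of candidate codewords;
    # fitness is the mean Euclidean distance of frame vectors to their nearest codeword
    selected = candidate_codewords[bits.astype(bool)]
    if len(selected) == 0:
        return np.inf
    d = np.linalg.norm(frame_vectors[:, None, :] - selected[None, :, :], axis=2)
    return d.min(axis=1).mean()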
In order to intensify the search and improve performance, the interference operation
allows modifying the amplitudes of individuals by moving the state of each qubit toward
the value of the best solution. This can be done by using a unitary transformation
that applies a rotation whose angle is a function of the amplitudes and the value of the
corresponding bit in the reference solution. The value of the rotation angle must be
chosen so that to avoid premature convergence. It is often empirically determined, and
its direction is determined as a function of the values of probabilities where a qubit is in
state 0 and state 1.
Quantum genetic uses quantum gates to perform the rotation of an individual’s
amplitudes. Quantum gates can also be designed according to practical problems. The
qubits constituting individuals are rotated by quantum gates to update the population Q
(t). The quantum rotating gates are given by the following equation [8]:

$$\begin{bmatrix} \alpha_i^{t+1} \\ \beta_i^{t+1} \end{bmatrix} = \begin{bmatrix} \cos(\Delta\theta_i) & -\sin(\Delta\theta_i) \\ \sin(\Delta\theta_i) & \cos(\Delta\theta_i) \end{bmatrix} \begin{bmatrix} \alpha_i^{t} \\ \beta_i^{t} \end{bmatrix} \qquad (1)$$

where $\alpha_i^{t}$ and $\beta_i^{t}$ are the probability amplitudes associated with the 0 state and the 1 state
of the $i$th qubit at time $t$. The values $\alpha^2$ and $\beta^2$ therefore represent the probabilities of
observing the qubit in state 0 and state 1, respectively, when the qubit is measured.
As such, the equation $\alpha^2 + \beta^2 = 1$ is a physical requirement. $\Delta\theta_i$ is the rotation
angle of the quantum gate applied to qubit $i$ of each quantum chromosome; it is often
obtained from a lookup table, as illustrated in Table 1, to ensure convergence.

Table 1. Rotation angle selection strategy


x_i  b_i  f(x_i) > f(b_i)   Δθ_i   s(α_i, β_i) · Δθ_i
                             δ      α_i·β_i > 0   α_i·β_i < 0   α_i = 0   β_i = 0
0    0    False              0      0             0             0         0
0    0    True               0      0             0             0         0
0    1    False              0      0             0             0         0
0    1    True               δ      1            −1             0         1
1    0    False              0      0             0             0         0
1    0    True               δ     −1             1             1         0
1    1    False              0      0             0             0         0
1    1    True               0      0             0             0         0
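
The update of Eq. (1) together with the lookup logic of Table 1 can be sketched as follows. This is an illustrative simplification rather than the authors' exact procedure: the angle magnitude delta = 0.05*pi and the interpretation of the "f(x) > f(b)" test for a distortion-type fitness are assumptions.

import numpy as np

def rotate(chromosome, bits, best_bits, f_x, f_best, delta=0.05 * np.pi):
    # chromosome: array of shape (n_qubits, 2) holding (alpha, beta) per qubit
    for i in range(chromosome.shape[0]):
        x_i, b_i = bits[i], best_bits[i]
        # Table 1: a non-zero rotation applies only when x_i != b_i and the
        # current solution is better than the reference (the "True" rows)
        if x_i == b_i or f_x <= f_best:
            continue
        alpha, beta = chromosome[i]
        sign = 1.0 if (x_i, b_i) == (0, 1) else -1.0   # direction from Table 1
        if alpha * beta < 0:
            sign = -sign
        theta = sign * delta
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        chromosome[i] = rot @ np.array([alpha, beta])  # Eq. (1)
    return chromosome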

The increase in the production of a good building block seems to be the most
significant advantage of QGA. The promotion of good building blocks in the classical
GA is statistically due to their ability to produce fit offspring, which will survive and
further deploy that building block. However, when a new building block appears in the
population, it only has one chance to ‘prove itself’. By using superimposed individuals,
the QGA removes much of the randomness of the GA. Thus, the statistical advantage
of good building blocks should be much greater in the QGA. This, in turn, should cause
the number of good building blocks to grow much more rapidly. This is clearly a very
significant benefit [6–8].

3.1 Compression
Data compression schemes design involves trade-offs among the compression rate, the
distortion rate (when using lossy data compression), and the computational resources
required to compress and decompress the data [1, 2]. The compression phase consists
of two main stages, lossless compression based on DPCM and lossy compression based
on an enhanced Learning Vector Quantization (LVQ) neural network. Both stages
operate on wavelet coefficients of each frame [16].
(a) Lossless Compression: for each frame, the low-frequency wavelet coefficients with
a large amount of energy are losslessly compressed to preserve the most important
features from loss. Here, Differential Pulse Code Modulation (DPCM) is employed as a
signal encoder that uses the baseline of pulse-code modulation (PCM) but adds some
functionality based on the prediction of the samples of the signal [16]. DPCM takes the
values of two consecutive samples; if they are analog samples, quantize them; calculate
the difference between successive values; then, get the entropy code for the difference.
Applying this process eliminates the short-term redundancy (positive correlation of
nearby values) of the signal.
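
A small sketch of this DPCM branch, under the assumption of a simple uniform quantization step, is given below: the first sample is stored as-is, followed by the differences between consecutive samples, and decoding is the cumulative sum.

import numpy as np

def dpcm_encode(ll_band, step=1.0):
    samples = np.round(ll_band.flatten() / step).astype(np.int32)  # quantize (assumed uniform step)
    residual = np.empty_like(samples)
    residual[0] = samples[0]
    residual[1:] = np.diff(samples)          # differences of successive samples
    return residual

def dpcm_decode(residual, shape, step=1.0):
    samples = np.cumsum(residual)            # undo the differencing
    return samples.reshape(shape).astype(float) * step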
(b) Lossy Compression: for each frame, the high-frequency wavelet coefficients with a
small amount of energy (not salient features) are lossy compressed to achieve a high
compression ratio. Here, an LVQ neural network is adapted to compress these coeffi-
cients; the LVQ network utilizes an optimized codebook for each video as a
dynamic vector quantization embedded into the hidden layer as an activation
function. Unlike current methods that employ the neural network as a black box
for lossy compression, the suggested model adapts optimized VQ derived from step 2
as an activation function embedded in each hidden layer’s neurons.
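
The sketch below illustrates the quantization performed by this activation function: each high-frequency block is mapped to the index of its nearest codeword in the optimized codebook under Euclidean distance. The block size and the omission of the LVQ training loop are simplifications of ours, not the authors' implementation.

import numpy as np

def vq_indices(hl_band, codebook, block=4):
    # codebook: array of shape (K, block*block), e.g. the QGA-optimized codewords
    h, w = hl_band.shape
    idx = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            v = hl_band[y:y + block, x:x + block].flatten()
            d = np.linalg.norm(codebook - v, axis=1)   # distance to every codeword
            idx.append(int(np.argmin(d)))              # store only the index
    return np.array(idx, dtype=np.int32)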
(c) Run Length Coding: given both the quantized coefficient vector obtained from
the DPCM lossless compression stage and the VQ index vector obtained from the LVQ
neural network lossy compression stage, the two vectors are merged into a unified
vector with a specific delimiter between them for decoding. In this case, there exists
one unified vector for each frame. To increase the compression ratio, RLE is utilized to
handle statistical redundancy among unified vector elements [1].
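
A compact sketch of this merging and run-length encoding step is shown below; the delimiter value and the (value, count) pair representation are assumptions.

import numpy as np

DELIMITER = -999999   # assumed marker separating the two sub-vectors

def merge_and_rle(dpcm_residual, vq_idx):
    unified = np.concatenate([dpcm_residual, [DELIMITER], vq_idx]).astype(np.int64)
    encoded = []
    run_value, run_len = unified[0], 1
    for v in unified[1:]:
        if v == run_value:
            run_len += 1                      # extend the current run
        else:
            encoded.append((int(run_value), run_len))
            run_value, run_len = v, 1
    encoded.append((int(run_value), run_len))
    return encoded                            # list of (value, count) pairs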

3.2 Decompression
The decompression process is done in the reverse way to the compression process, as
illustrated in Fig. 1, and includes the following steps. First, apply run-length decoding
to each row of the matrix Vr that contains the compressed video to retrieve the merged
coefficients vector fr. This vector comprises the quantized coefficients LLc and VQ
index vector Idx for each frame. Then for the quantized coefficients LLc, apply inverse
DPCM to obtain the uncompressed coefficients (low frequencies) LL. For the given VQ
index vector Idx, by utilizing the stored codebook table, this index value is converted to
the equivalent vector to retrieve the high-frequency coefficients (each frame has one
vector that contains HL coefficients). Given LL and HL from the previous steps, these
bands are combined with the other two unaltered bands (LH, HH), which are taken from
the stored database, and the inverse DWT is applied to obtain the decompressed frame.
The previous steps are repeated for all rows of the compressed matrix Vr, and the frames
are collected to retrieve the original video.
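
Mirroring the encoder sketches above, a minimal decoder-side sketch could look as follows; the function names, the delimiter value, and the codebook layout are the illustrative assumptions introduced earlier, not the authors' implementation.

import numpy as np

def rle_decode(encoded):
    # expand the (value, count) pairs back into the unified vector
    return np.concatenate([np.full(count, value, dtype=np.int64)
                           for value, count in encoded])

def decode_frame(encoded, codebook, ll_shape, delimiter=-999999):
    unified = rle_decode(encoded)
    cut = int(np.where(unified == delimiter)[0][0])
    ll_residual, vq_idx = unified[:cut], unified[cut + 1:]
    LL = np.cumsum(ll_residual).reshape(ll_shape).astype(float)  # inverse DPCM
    HL_blocks = codebook[vq_idx]          # one flattened block per stored index
    return LL, HL_blocks                  # combined with LH, HH before inverse DWT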

4 Experimental Results

Experiments were conducted on a benchmark video dataset (available at
http://www.nada.kth.se/cvap/actions/ and https://media.xiph.org/video/derf/). The testbed is a set of
videos with different resolutions, different numbers of frames, and various extensions
like avi and mpeg. The testbed includes eight videos, as shown in Fig. 3. Herein, the
background for all these videos is static, while their foreground varies from nearly
static, as in Miss America, to moving, as in Aquarium. In this paper, the sug-
gested intelligent vector quantization model that relies on quantum genetic algorithm
has been tested with several benchmark videos. The parameter values were chosen
according to the most values found in the literature [6, 8, 15].
The first set of experiments compares the compression ratio and quality performance
of the proposed model, which utilizes the quantum genetic algorithm to build an optimal
codebook for vector quantization used as an activation function inside the neural
network's hidden layer, against an LBG-based video compression technique (without
QGA) that relies on randomness to build the codebook. As
shown in Table 2, using QGA achieves an improvement of about 6% in the compression
ratio and 8% in PSNR compared with the LBG video coding technique. Furthermore,
QGA achieves better results with about 0.2% compared to traditional GA. Figure 4
shows the visual difference between the original and the reconstructed video’s frame.
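
For reference, the PSNR and compression ratio figures reported in Tables 2-4 follow the standard definitions; the snippet below is a generic sketch of these measures, not the authors' evaluation code.

import numpy as np

def psnr(original, reconstructed, peak=255.0):
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio(raw_bytes, compressed_bytes):
    return raw_bytes / compressed_bytes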

Fig. 3. Benchmark dataset

Table 2. Performance evaluation with random, GA-based, and QGA-based codebook generation.

Video          Random codebook (LBG)   Codebook generation with GA   Codebook generation with QGA
               CR       PSNR           CR       PSNR                 CR       PSNR
Man running    30.365   37.547         32.512   40.271               33.312   41.234
Traffic road   29.354   27.241         30.665   30.048               32.335   31.324
Aquarium       28.968   28.248         30.552   31.810               31.310   32.239
Akiyo          28.785   27.954         30.512   30.141               32.234   31.325
Miss America   28.417   40.696         30.512   44.254               31.865   45.087
Boxing         29.657   37.857         30.512   40.936               31.469   45.632

Fig. 4. (a) Original frame. (b) Reconstructed frame (PSNR = 31.810)

An advantageous point of a QGA is its ability to find a globally optimal solution in


multidimensional space. This ability is also useful for constructing an optimal code-
book of VQ for video compression. This means that we can obtain a better quality of a
representative codebook. The reason for the low compression ratio is that the proposed
model utilizes the lossless compression to compress a large number of important
coefficients. In general, in the case of a small number of elements within the quanti-
zation vector, both algorithms were equivalent for all problem instances. However,
augmenting the number of items leads the QGA to behave better than the GA, and this
holds across all problem variants.
The next experiment shows the comparison of the proposed model with other
related video compression systems. The first comparative algorithm [10] utilized a
motion estimation technique based on the ant colony algorithm and a modified fast Haar wavelet
transform to remove the temporal redundancy. The second algorithm [1] employed fast
curvelet transform with run-length encoding and Huffman coding to remove spatial
redundancy. On the contrary, the proposed model removes both of temporal redun-
dancy by utilizing optimal vector quantization, spatial redundancy by employing
DPCM, and finally statistical redundancy by implementing run-length encoding.
Both Table 3 and Table 4 show that the proposed model gives better results in
terms of PSNR of the reconstructed video, with about a 23% improvement compared to
the first algorithm and a 3% improvement compared to the second system. In addi-
tion, the proposed model improves the compression ratio by 22% compared to the
second system. The rationale behind these results is that using QGA helps to build an
accurate codebook with minimum distortion for the vector quantization technique.
Furthermore, using RLE for statistical redundancy removing beside DPCM and vector
quantization yields more CR as compared with the second algorithm.

Table 3. Comparative result with optimized technique.


Video     A. Suri et al. method [10]   Proposed model
          PSNR                         PSNR
Tennis    30.438                       38.347
Suzie     34.5746                      42.908

Table 4. Comparative result with traditional technique for video coding.


Video          A. Elmolla method [1]   Proposed model
               CR      PSNR            CR       PSNR
Traffic road   25.11   31.08           32.335   31.324
Aquarium       24.93   30.64           31.310   32.239

In video compression, designing a codebook can be regarded as an optimization


problem; its goal is to find the optimal solution, which is the most representative
codebook [5]. It is assumed that the QGA-based vectors are mapped to their nearest
representative in the codebook with respect to a distortion function, i.e., toward higher
PSNR. The QGA applies natural selection to find the most representative codebook,
i.e., the one with the best fitness value for video compression. To expedite evolution
and prevent the solution from leaving the search space, the crossover and mutation
ratios are first explicitly determined.
Moreover, quantum algorithms generally have the ability to minimize the com-
plexity of equivalent algorithms that run on classical computers. The global complexity
of the QGA (evaluation + interference) is of the order of O(N), while for a standard GA
(evaluation + selection + crossover + mutation) the global complexity is often of the
order of O(N²), where N is the size of the population. This result is very encouraging,
since the com-
plexity here has been reduced to become linear. Indeed, one can imagine what happens
if we consider a very large population of chromosomes; it will be very useful to use
QGA instead of GA.
There are some potential difficulties with the QGA presented here, even as a
theoretical model that includes: (1) some fitness functions may require “observing” the
superimposed individuals in a quantum mechanical sense. This would destroy the
superposition of the individuals and ruin the quantum nature of the algorithm. (2) The
difficulty of reproduction is more fundamental. However, while it is not possible to
make an exact copy of a superposition, it is possible to make an inexact copy. If the
copying errors are small enough, they can be considered as a “natural” form of
mutation [6–8].

5 Conclusion

Like most other problems, the design of a suitable video compression scheme involves
multiple design criteria and specifications. Finding an optimal codebook in vector quan-
tization is not a simple task. Consequently, there is a need for optimization-based
methods that can be used to obtain an optimal solution that would satisfy the
requirements. Ideally, the optimization method should lead to the global optimum of
the objective function. In the work presented in this paper, QGA has been used to
achieve an optimal solution in the multidimensional nonlinear problem of conflicting
nature (high compression ratio with an acceptable quality of the reconstructed video).

In general, the application of the proposed model faces some constraints:
(1) it requires static-background videos (not a movable camera) so that the
background subtraction algorithm can be applied efficiently, and (2) extra memory space is needed to store some
information (the wavelet HH and LH bands) used in the decoding process. Our
experimental results have shown that QGA can be a very promising tool for exploring
large search spaces while preserving the efficiency/performance trade-off. Our future
work will focus on comparing different QGA strategies to study the effect of the choice of
rotation gate angles. Furthermore, the model can be upgraded to work with videos from
a movable camera instead of a static one.

References
1. Elmolla, A., Salama, G., Elbayoumy, D.: A novel video compression scheme based on fast
curvelet transform. Int. J. Comput. Sci. Telecommun. 6(3), 7–10 (2015)
2. Shraddha, P., Piyush, S., Akhilesh, T., Prashant, K., Manish, M., Rachana, D.: Review of
video compression techniques based on fractal transform function and swarm intelligence.
Int. J. Mod. Phys. B 34(8), 1–10 (2020)
3. Ponlatha, S., Sabeenian, R.: Comparison of video compression standards. Int. J. Comput.
Electr. Eng. 5(6), 549–554 (2013)
4. Knop, M., Cierniak, R., Shah, N.: Video compression algorithm based on neural network
structures. In: Proceedings of the International Conference on Artificial Intelligence and Soft
Computing, Poland, pp. 715–724 (2014)
5. Haiju, F., Kanglei, Z., En, Z., Wenying, W., Ming, L.: Subdata image encryption scheme
based on compressive sensing and vector quantization. Neural Comput. Appl. 33(8), 1–17
(2020)
6. Zhang, J., Li, H., Tang, Z., Liu, C.: Quantum genetic algorithm for adaptive image multi-
thresholding segmentation. Int. J. Comput. Appl. Technol. 51(3), 203–211 (2015)
7. Mousavi, S., Afghah, F., Ashdown, J.D., Turck, K.: Use of a quantum genetic algorithm for
coalition formation in large-scale UAV networks. Ad Hoc Netw. 87(1), 26–36 (2019)
8. Tian, Y., Hu, W., Du, B., Hu, S., Nie, C., Zhang, C.: IQGA: a route selection method based
on quantum genetic algorithm-toward urban traffic management under big data environment.
World Wide Web 22(5), 2129–2151 (2019)
9. Atheeshwar, M., Mahesh, K.: Efficient and robust video compression using Huffman coding.
Int. J. Adv. Res. Eng. Technol. 2(8), 5–8 (2014)
10. Suri, A., Goraya, A.: Hybrid approach for video compression using ant colony optimization
and modified fast Haar wavelet transform. Int. J. Comput. Appl. 97(17), 26–30 (2014)
11. Rubina, I.: Novel method for fast 3d DCT for video compression. In: International
Conference on Creativity in Intelligent Technologies and Data Science, Russia, pp. 674–685
(2015)
12. Esakkirajan, S., Veerakumar, T., Navaneethan, P.: Adaptive vector quantization based video
compression scheme. In: IEEE International Conference on Signal Processing and
Communication Technologies, India, pp. 40–43 (2009)
13. Nithin, S., Suresh, L.P.: Video coding on fast curvelet transform and burrows wheeler
transform. In: IEEE International Conference on Circuit, Power and Computing Technolo-
gies, India, pp. 1–5 (2016)

14. Boufares, O., Aloui, N., Cherif, A.: Adaptive threshold for background subtraction in
moving object detection using stationary wavelet transforms 2D. Int. J. Adv. Comput. Sci.
Appl. 7(8), 29–36 (2016)
15. Wang, W., Yang, S., Tung, C.: Codebook design for vector quantization using genetic
algorithm. Int. J. Electron. Bus. 3(2), 83–89 (2005)
16. Singh, A.V., Murthy, K.S.: Neuro-curvelet model for efficient image compression using
vector quantization. In: International Conference on VLSI Communication Advanced
Devices Signals and Systems and Networking, India, pp. 179–185 (2013)
An Enhanced Database Recovery Model Based
on Game Theory for Mobile Applications

Yasser F. Mokhtar(&), Saad M. Darwish, and Magda M. Madbouly

Institute of Graduate Studies and Research, Department of Information


Technology, Alexandria University, Alexandria, Egypt
{yasser_fakhry,Saad.darwish,
magda_madbouly}@alexu.edu.eg

Abstract. In the mobile environment, communication between Mobile
Hosts (MHs) and database servers faces many challenges in the Mobile
Database System (MDS). It suffers from factors such as handoff, insufficient
bandwidth, frequent transaction updates, and recurrent failures, which are
significant problems for information system stability. Fault-tolerance techniques,
however, enable systems to perform their tasks in the presence of faults.
The aim of this paper is to identify the optimal recovery solution among the
available state-of-the-art techniques in MDS using game theory. Some of the
published recovery protocols are selected and analyzed to choose the most important
variables that affect the recovery process, such as the number of processes, the time
for sending messages, and the number of messages logged in time. Then,
the game theory technique is adapted, based on the payoff matrix, to choose the
optimal recovery technique according to the given environmental variables.
Game Theory (GT) is distinguished from other evaluation methods in that it uses
a utility function to reach the optimal solution, whereas others use particular
parameters. The experiments were carried out with the NS2 simulator to implement
the selected recovery protocols. They confirm the effectiveness of the proposed
model compared to other techniques.

Keywords: Mobile database system · Game theory · Recovery control · Decision making

1 Introduction

With improved networking capabilities, mobile communication is considered one of
the most necessary parts of our life. Mobile Computing (MC) refers to various nodes
or devices that allow people to access information or data from wherever they are
[1]. However, there are many limitations in the mobile environment that introduce
several challenges to managing mobile data transactions, such as frequent disconnections [2–
5]. The main challenges of transaction control come from the mobility of the MH and the
limited wireless bandwidth [6–8].
To recover from the failure, many recovery modules have been developed by
researchers to ensure that sensitive information can be recovered. They take into
consideration the obstacles of wireless communications when dealing with the mobile


database, such as energy shortage, repeated disconnection from the server, and handoff
problems. In this regard, there are two types of recovery approaches in the literature:
forward recovery and backward recovery; see [9, 10] for more details. Recovery is a
time-consuming and resource-intensive operation, and these protocols require plenty of
both. Managing the log is considered the most expensive operation in recovery
protocols, so an economical and efficient log-management scheme is
necessary for the MDS [11].
MD recovery becomes very hard because most of the nodes are
mobile. Nevertheless, many parameter requirements for the recovery module have been
specified to obtain a recovery protocol that adapts dynamically to the environmental conditions [7–10].
The difficulty is how to select a method suited to the environmental conditions,
and then which of the given factors (parameters) are most influential in the recovery
process. A better solution must therefore strike an optimal balance between the applicable
algorithm and the performance at all levels of operation. The aim of this
proposal is to introduce an optimal selective protocol (decision making) using the GT tool
to pick the best recovery algorithm from a pool of available algorithms, taking into
consideration the circumstances in which the interruption occurred. This can greatly
improve system performance and provides highly available data management for
handoff and disconnected operations. The essential difference between the current
evaluation-based recovery techniques and GT-based approaches is that the first group
tries to find a suitable solution with particular parameters, whereas the second group
makes a decision based on a utility function over many conflicting
parameters.
Most of the previous recovery techniques did not take into consideration the nature of the
circumstances in which the disconnection occurred in the mobile environment.
Therefore, the goal is to develop a new intelligent approach for mobile database
recovery that considers some of the environmental factors using GT. GT is applied
here since no single solution is optimal for all circumstances of the recovery
operation among the available recovery protocols. Thus, GT is a suitable tool for selecting
the best strategy (best solution) according to the utility function of each player (alternative
recovery approaches are represented as players) [12–15].

1.1 Research Motivation and Challenges


Mobile systems are often subject to different environmental circumstances, which
may cause data loss or communication disconnection. Therefore, traditional recovery
schemes cannot be directly utilized in these systems. The most important challenges
facing traditional mechanisms are as follows: (1) some recovery protocols depend on the
MH storage, which is not stable and very limited; (2) some protocols affect the logging
scheme, which may lead to machine load overhead; (3) some schemes require a great
amount of data transfer; (4) some protocols may be slow during the recovery process,
depending on the distance between the MH and the Base Station (BS);
(5) several algorithms suffer in performance from the number of repeated processes
and exchanged messages. In general, despite the efforts that have been introduced to
address mobile database recovery, there is still room to improve it
significantly.

1.2 Novelty
The novelty of the suggested model is to provide high capabilities for recovery
treatment in MDS by applying a new smart technique based on a two-player GT
mechanism as a decision-maker, picking out the suitable protocol to increase the
recovery processing efficiency. The real problem is not choosing one of the
known recovery techniques; it is choosing the most appropriate method
according to the changes imposed by the operating circumstances, which are often vague and
variable. In this regard, the present work ensures the selection of the best
available recovery method through GT based on its important parameters. In the
proposed work, a comparison between a set of different strategies for each recovery
protocol is made. These strategies highlight the effect of certain parameters (features)
on the dynamic performance of the chosen protocol. The work demonstrates that
well-chosen parameters further improve the performance significantly, which shows the
great potential of the proposed method. Furthermore, the proposed model offers a
high degree of flexibility, as newer recovery methods can be added to improve the
results effectively.
Besides this introduction, the remaining sections are organized as follows: Sect. 2
reviews state-of-the-art MD recovery techniques, Sect. 3 introduces the proposed MD
recovery model, Sect. 4 defines the criteria used to evaluate the proposed GT technique for
MD recovery and presents the results, and finally, Sect. 5 draws conclusions
and future work directions.

2 State-of-the-Art Mobile Database Recovery Algorithms

Recovery in MDS is not a new area of research, but there are still many
possibilities for improving existing protocols and for creating new ones.
The authors in [16, 17] investigated recovery by combining movement-based
checkpointing with message logging. Their algorithms take checkpoints based on a threshold
of mobility handoffs: a checkpoint is taken after a specified number of host migrations
across cells, rather than periodically, in [16], or once while the host
travels within the same region in [17]. Their algorithms rely on factors such as the failure
rate and the mobility rate, which decreases the overhead of maintaining recovery data and
reduces the recovery cost. However, a wrong choice of the mobility-handoff
threshold may adversely impact performance.
In a different contribution, the authors in [18] proposed an algorithm for rollback
recovery based on message logging and independent checkpointing. This method
exploits mobile agents for managing the message logs and checkpoints with a certain
threshold and the latest checkpoint. Therefore, the recovery time of the MH never
exceeds a particular threshold. The benefit of this framework is that neither the send nor the
receive message log can become large, because only a few messages are exchanged in
the network.
Maryam et al. [19] presented another method based on a log management technique
combined with a mobile agent-based framework to decrease recovery time. The MH
performs checkpointing using a frequency-based model. This protocol

reduces recovery time by reducing the number of exchanged messages. However,
complexity increases, especially when many agents are required.
Similarly, the authors in [20] recommended another protocol that uses log management
to support recovery in the MC environment. In this approach, the log information
is stored in the controller of the BS, which covers a geographical area as a
cell. Their system relies on a tracking agent in the BS controller to obtain
the MH location update when a transaction is launched. The main advantage of the
log management method is that it is easy to apply, while its main disadvantage
is that recovery is likely to be slow if the home agent is far from the
mobile unit.
According to the provided analysis, most of the presented works can be
characterized as follows: (1) most recovery studies were based on different
techniques in the recovery process, such as log management, checkpointing,
movement-based checkpointing, and agent-based logging schemes; (2) these methods
are completely different, so one of them may not work as an alternative to
another; each algorithm has different parameters and different assumptions for
solving the recovery problem, and each method works in a different environment;
(3) although some proposals tried to merge more than one technique into one
contribution (hybrid methods), they still struggle to select the best fusion from this
pool of methods, which may cause a high recovery cost and overly complex recovery;
(4) finally, most schemes did not take the environmental conditions into consideration as
influential factors in the recovery process. Based on the above, the practical implementation
of the recovery algorithms is limited. However, to the best of our knowledge, it is possible to
develop a scheme that optimizes the performance by choosing the best available recovery
method according to the current situation (the variables of each case). Hence, GT has
been employed because of its importance for decision making; it works through
conflict analysis (interactive decision theory) to choose the best solution through
competition between the strategies provided for each method.

3 The Proposed Model

A typical architecture for an MDS includes a small database fragment residing on the
MH, derived from the main database. This design manages the mobility constraints
so as to facilitate the MHs and MSSs. If the MH exists in the cell serviced by an MSS, a
local MH can communicate directly with this MSS. The MH can move freely from one
cell to another, and each cell has a BS and a number of MHs. BSs are configured
stations with a wireless interface that can communicate with the MHs to transfer data
over the wireless network. Each MH connects to the BS through a wireless channel;
the BS connects to the database server over wired links, and the MH cannot
connect directly to the database server without the BS [10, 21].
This research concentrates on how to employ GT to find the optimal
recovery protocol among alternative recovery algorithms in the MCS. Herein, two
important selected recovery algorithms, each working according to its own algorithmic
architecture, are implemented. The proposed model has been prototyped using a two-player GT

tool to obtain an optimal decision for the best recovery mechanism. The proposed work
differs from other recovery systems because it takes into account some important
variables in the mobile environment during handoffs or service outages, which
differ from one situation to another, whereas conventional recovery algorithms rely on
fixed assumptions about the environment and build their work on these assumptions. Figure 1
depicts the suggested model's architecture.

Fig. 1. Overview of the proposed MDS recovery model.

To explain the technicality of the suggested recovery model in the MDS, we
evaluated some of the most significant algorithms developed for MD recovery
in order to specify which of them are picked for investigation. In this case, we
classified the available recovery algorithms into categories depending on how
they work and on their characteristics. These categories differ in how the recovery
methodology is applied, as discussed in [9–11]. In our proposal, we selected two recovery
protocols that represent two algorithms that differ in terms of application: the
log management method [20] and the mobile agent method [19].
Herein, each player must have a group of strategies to enter the competition with the
other player. To obtain these strategies, feature analysis and extraction are
performed on each method to find the most influential aspects of each protocol. Thus, for the
GT, each selected method is represented as a player, and each player's strategies come
from the use of each effective variable. For instance, the first protocol (player 1) uses
factors such as the log arrival rate, handoff rate, average log size, and mobility rate.
Likewise, the second protocol (player 2) uses factors such as the number of processes
in the checkpoint, the handoff threshold, and the log size.
To prepare the recovery algorithms' parameters needed by GT as a decision-
making technique, we first implement the selected protocols using the
important factors on which each protocol depends. To obtain each player's strategies,

each algorithm is implemented on real database transactions. An
objective function for the total recovery cost is computed, and its calculation varies from
one algorithm to another. Based on the previous steps, we build the GT matrix in
accordance with each method's (player's) output values, which reflect the operating results
under its factors, so that the outputs of each method can be evaluated objectively. These
results are considered the gains of each player, called utility or payoff in the game.
GT is a branch of mathematics for analyzing conflict-of-interest situations; it investigates
interactions involving strategic decision making. Any interaction between
competitors is considered a game, and the players are assumed to act rationally. At the
end of the game, each player acquires a value related to the actions of the others, called the
payoff or utility. These payoffs estimate the degree of satisfaction a player
extracts from the conflicting situation. Each player in the game has some available
choices of action, called strategies [14, 15]. A GT problem can be described as
follows: (1) a set of players (the selected algorithms for negotiation); (2) a pool of
strategies for each player (the strategies reflect the assumed values of significant
coefficients in each protocol, which also reflect the possible environmental changes);
(3) the benefits or payoffs (utility) to each player for every possible list of strategies
chosen by the players.
Accordingly, the payoff matrix is produced based on the recovery algorithms' implementation
under different parameters. In this game, players
pick their action plans simultaneously. When all players choose their
strategies independently to earn maximum profit, the game is called a non-
cooperative game. On the contrary, if the players form a coalition, the game is
known as a co-operative game [21, 22]. Herein, non-cooperative GT is applied in
the analysis of strategic choices [15]. Table 1 shows the bi-matrix for the two players
with their payoffs. Herein, a11 is the payoff value for player 1 given the payoff
function u1(s1, t1) if (s1, t1) is chosen, and b11 is the payoff value for player 2 given the
payoff function u2(s1, t1) if (s1, t1) is chosen.

Table 1. The bi-matrix for the two players

                                                 Player 2: mobile agent method
  Player 1: log management method   Strategy     t1            t2            ...   th
                                    s1           (a11, b11)    (a12, b12)    ...   (a1h, b1h)
                                    s2           (a21, b21)    (a22, b22)    ...   (a2h, b2h)
                                    ...          ...           ...           ...   ...
                                    sm           (am1, bm1)    (am2, bm2)    ...   (amh, bmh)

Herein, the payoff calculation for each player relies on the selection of the other
player (i.e., choosing a policy for one player impacts the gain value of the other player). In
our proposal, there are three utility criteria: the time consumed, the memory used as a
performance cost, and the probability of completing the recovery (recovery done).

The supposition is that the execution time for each algorithm, C_{1,i}, can take values between 0
and 5 s; the payoff values for the TIME function are therefore distributed as follows:

    u_i = \begin{cases}
        6  & \text{if } C_{1,i} \in [0, 0.1] \\
        4  & \text{if } C_{1,i} \in (0.1, 0.5] \\
        2  & \text{if } C_{1,i} \in (0.5, 0.9] \\
        0  & \text{if } C_{1,i} \in (0.9, 1] \\
       -2  & \text{if } C_{1,i} \in (1, 2] \\
       -4  & \text{if } C_{1,i} \in (2, 5]
    \end{cases}                                        (1)

In the same way, C_{2,i} = MEMO_i, where MEMO_i is the memory cost used by each
algorithm. The assumption is that the memory cost for each algorithm can take values
between 0 and 4000 KByte; the payoff values for the memory cost are therefore:

    u_i = \begin{cases}
        6  & \text{if } C_{2,i} \in [0, 500] \\
        4  & \text{if } C_{2,i} \in (500, 1000] \\
        2  & \text{if } C_{2,i} \in (1000, 1500] \\
        0  & \text{if } C_{2,i} \in (1500, 2000] \\
       -2  & \text{if } C_{2,i} \in (2000, 2500] \\
       -4  & \text{if } C_{2,i} \in (2500, 4000]
    \end{cases}                                        (2)

For the recovery probability rate, C_{3,i} = DONE\_PROB_i, where DONE\_PROB_i checks
whether the recovery has completed according to the handoff-rate threshold. The payoff
values for recovery probability rates between 0% and 100% are distributed as
follows:

    u_i = \begin{cases}
        1  & \text{if } C_{3,i} \in [0, 20\%] \\
        2  & \text{if } C_{3,i} \in (20\%, 40\%] \\
        3  & \text{if } C_{3,i} \in (40\%, 60\%] \\
        4  & \text{if } C_{3,i} \in (60\%, 80\%] \\
        5  & \text{if } C_{3,i} \in (80\%, 100\%]
    \end{cases}                                        (3)
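
As an illustration only (not the authors' code), the following Python sketch maps a protocol's measured execution time, memory cost, and completion probability to the partial payoffs of Eqs. (1)–(3) as reconstructed above; the interval boundaries and the example inputs are assumptions.

```python
import bisect

def piecewise_payoff(value, upper_bounds, payoffs):
    """Return the payoff of the first interval whose upper bound covers value."""
    idx = bisect.bisect_left(upper_bounds, value)
    idx = min(idx, len(payoffs) - 1)          # clamp values beyond the last bound
    return payoffs[idx]

def total_payoff(time_s, memory_kb, done_prob):
    """Sum of the partial payoffs u_i defined by Eqs. (1)-(3)."""
    u_time = piecewise_payoff(time_s, [0.1, 0.5, 0.9, 1.0, 2.0, 5.0], [6, 4, 2, 0, -2, -4])
    u_mem = piecewise_payoff(memory_kb, [500, 1000, 1500, 2000, 2500, 4000], [6, 4, 2, 0, -2, -4])
    u_done = piecewise_payoff(done_prob, [0.2, 0.4, 0.6, 0.8, 1.0], [1, 2, 3, 4, 5])
    return u_time + u_mem + u_done

# Example: 0.4 s execution, 900 KByte of memory, 85% completion probability.
print(total_payoff(0.4, 900, 0.85))   # 4 + 4 + 5 = 13
```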

The player's total winnings are the sum of the payoff values for the three variables
described above (C1, C2, C3). To obtain the final solution, there are two ways to determine
the best method: (1) finding a single dominant equilibrium strategy in the
game, or (2) using the Nash Equilibrium (NE); see [23] for more details. Unfortunately,
most games do not contain dominated strategies. Thus, if there is more than one
solution (more than one Nash Equilibrium) to the given problem, another handling
mechanism is needed. To overcome this problem, all values in the payoff matrix are
subject to the addition or deduction of some points according to one of the important
variables, say the execution time, so that additional points are given as a bonus to the
fastest element and vice versa (a normalization and reduction phase). Finally, the revised
payoff matrix is sent to the GT model again to find a better solution (a pure Nash
equilibrium) that fits the different environmental variables; this is the optimum
solution for this problem.
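
The equilibrium search itself can be illustrated with a small Python sketch (hypothetical 3 × 3 payoff values, not the paper's implementation): a cell of the bi-matrix is a pure Nash equilibrium when it is simultaneously player 1's best response within its column and player 2's best response within its row.

```python
import numpy as np

# Hypothetical bi-matrix: A[i, j] is player 1's payoff and B[i, j] is player 2's
# payoff when player 1 plays strategy s_i and player 2 plays strategy t_j.
A = np.array([[13,  9, 11],
              [10, 12,  8],
              [ 7, 11, 14]])
B = np.array([[ 9, 12, 10],
              [ 8, 13, 11],
              [10, 14,  9]])

def pure_nash_equilibria(A, B):
    """Return all (i, j) cells where each strategy is a best response to the other."""
    best_for_p1 = A == A.max(axis=0, keepdims=True)   # column-wise best responses
    best_for_p2 = B == B.max(axis=1, keepdims=True)   # row-wise best responses
    return [(int(i), int(j)) for i, j in zip(*np.where(best_for_p1 & best_for_p2))]

print(pure_nash_equilibria(A, B))   # [(1, 1)] for these hypothetical payoffs
```

If such a search returns several equilibria, the normalization-and-reduction phase described above re-weights the matrix before the search is repeated.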

4 Experimental Results

The NS2 simulator is used to evaluate the proposed recovery-based GT model in
mobile database applications. For our implementation, we employed several mobile log
files of different sizes that contain the data items of many processes to be recovered
according to each algorithm's mechanism. NS2 is a discrete-event simulator
aimed at networking research; it gives substantial support for
routing, simulation of TCP, and multicast protocols over wired and wireless networks
(local and satellite) [24]. Furthermore, Matlab software is utilized to build the GT.
Herein, both the mobile agent-based recovery and the log management-based recovery
algorithms [19, 20] are fed to GT as a decision-maker that selects the better of them
in relation to the available parameters. The parameter values for each recovery protocol
were chosen according to the values most commonly found in the literature and are used as input
to the suggested model to examine the performance of each of them.

Fig. 2. Comparison results in terms of total recovery execution time (bar chart; y-axis: time in seconds; x-axis: first, second, and third strategies for the log management, mobile agent, and proposed models).

Fig. 3. Comparison results in terms of total recovery performance payoffs (bar chart; y-axis: payoff values; x-axis: first, second, and third strategies for the log management, mobile agent, and proposed models).



Figures 2 and 3 reveal that the suggested recovery model yields the best results
in terms of the required execution time and the total payoff function in comparison with
each recovery algorithm individually, using their default factor values as stated in [19,
20]. Game theory is used to anticipate and explain the actions of all agents (recovery
protocols) involved in competitive situations and to test and determine the relative
optimality of different strategies. In this case, the scenarios have a well-defined outcome,
and decision-makers receive a “payoff” (the value of the outcome to the participants);
that is, participants will gain or lose something depending on the outcome.
Furthermore, the decision-makers are rational: when faced with two alternatives,
players will choose the option that provides the greater benefit. The results depict that
the log management and agent-based protocols suffer in execution time, especially when
the MH goes far away from the first BS, so this choice becomes more difficult with a
large log size.

5 Conclusion and Future Work

In this paper, a novel game theory model is proposed to find the optimal recovery
solution in MDS. The new algorithm has been demonstrated as a competition
between two of the most important recovery protocols in MDS. The idea of the game
theory is that each algorithm chooses its most appropriate strategy in terms of the time for
sending messages and the number of messages logged in time, so as to detect
the proper recovery solution according to the environmental variables. A key step in a
game-theoretic analysis is to discover which strategy is a recovery protocol's best
response to the strategies chosen by the others. The experimental results show the
superiority of the suggested recovery model. In the future, it is possible to extend the
model to new settings (three players, for example) to achieve the best efficiency, to apply
a game theory-based mixed strategy, and to apply the proposed model to cloud
algorithms to reach better results.

References
1. Bhagat, B.V.: Mobile database review and security aspects. Int. J. Comput. Sci. Mob.
Comput. 3(3), 1174–1182 (2014)
2. Vishnu, U.S.: Concept and management issues in mobile distributed real time database. Int.
J. Recent Trends Electr. Electron. Eng. 1(2), 31–42 (2011)
3. Ibikunle, A.: Management issues and challenges in mobile database system. Int. J. Eng. Sci.
Emerg. Technol. 5(1), 1–6 (2013)
4. Anandhakumar, M., Ramaraj, E., Venkatesan, N.: Query issues in mobile database systems.
Asia Pac. J. Res. 1(XX), 24–36 (2014)
5. Veenu, S.: A comparative study of transaction models in mobile computing environment.
Int. J. Adv. Res. Comput. Sci. 3(2), 111–117 (2012)

6. Adil, S., Areege, S., Nawal, E., Tarig, E.: Transaction processing, techniques in mobile
database: an overview. Int. J. Comput. Sci. Appl. 5(1), 1–10 (2015)
7. Roselin, T.: A survey on data and transaction management in mobile databases. Int.
J. Database Manag. Syst. 4(5), 1–20 (2012)
8. Vehbi, Y., Ymer, T.: Transactions management in mobile database. Int. J. Comput. Sci.
Issues 13(3), 39–44 (2016)
9. Geetika, A.: Mobile database design: a key factor in mobile computing. In: Proceedings of
the 5th National Conference on Computing for Nation Development, India, pp. 1–4 (2011)
10. Renu, P.: Fault Tolerance approach in mobile distributed systems. Int. J. Comput. Appl.
2015(2), 15–19 (2015)
11. Vijay, R.: Fundamentals of Pervasive Information Management Systems, 2nd edn. Wiley,
Hoboken (2013)
12. John, J.: Decisions in disaster recovery operations: a game theoretic perspective on
organization cooperation. J. Homel. Secur. Emerg. Manage. 8(1), 1–14 (2011)
13. Murthy, N.: Game theoretic modelling of service agent warranty fraud. J. Oper. Res. Soc. 68
(11), 1399–1408 (2017)
14. Sarah, A.: A case for behavioral game theory. J. Game Theory 6(1), 7–14 (2017)
15. Benan, L.: Game Theory and engineering applications. IEEE Antennas Propag. Mag. 56(3),
256–267 (2014)
16. Sapna, I.: Movement-based check pointing and logging for failure recovery of database
applications in mobile environments. Distrib. Parallel Database. 23(3), 189–205 (2008)
17. Parmeet, A.: Log based recovery with low overhead for large mobile computing systems.
J. Inf. Sci. Eng. 29(5), 969–984 (2013)
18. Chandreyee, S.: Check pointing using mobile agents for mobile computing system. Int.
J. Recent Trends Eng. 1(2), 26–29 (2009)
19. Maryam, A., Mohammad, A.: Recovery time improvement in the mobile database systems.
In: International Conference on Signal Processing Systems, Singapore, pp. 688–692 (2009)
20. Miraclin, K.: Log management support for recovery in mobile computing environment. Int.
J. Comput. Sci. Inf. Secur. 3(1), 1–6 (2009)
21. Xueqin, Z.: A survey on game theoretical methods in human–machine networks. Future
Gener. Comput. Syst. 92(1), 674–693 (2019)
22. Hitarth, V., Reema, N.: A survey on game theoretic approaches for privacy preservation in
data mining and network security. In: 2nd International Workshop on Recent Advances on
Internet of Things: Technology and Application Approaches, vol. 155, pp. 686–691 (2019)
23. Georgios, G.: Dominance-solvable multi criteria games with incomplete preferences. Econ.
Theory Bull. 7(2), 165–171 (2019)
24. Himanshu, M.: A review on network simulator and its installation. Int. J. Res. Sci. Innov. 1
(IX), 115–116 (2014)
Location Estimation of RF Emitting Source
Using Supervised Machine Learning
Technique

Kamel H. Rahouma(&) and Aya S. A. Mostafa(&)

Department of Electrical Engineering, Faculty of Engineering,


Minia University, Minia, Egypt
kamel_rahouma@yahoo.com, ayasami89@yahoo.com

Abstract. Supervised machine learning algorithms deal with a known
set of input data and a pre-calculated response of that set (output or target). In
the present work, supervised machine learning is applied to estimate the x-y
location of an RF emitter. The Matlab Statistics and Machine Learning Toolbox
2019b is used to build the training algorithms and create the predictive
models. The true emitter position is calculated from the data gathered by
two sensing receivers. Those data are the training data fed to the learner to
generate the predictive model. A linearly x-y moving emitter-sensor platform is
considered for generality and simplicity. Regression algorithms in the toolbox
regression learner are tried to obtain the closest prediction and the best accuracy. It is
found that three regression algorithms, Fine Tree regression, Linear SVM
regression, and Gaussian process regression (Matern 5/2), achieve better results
than the other algorithms in the learner library. The resulting location errors of the
three algorithms in the training phase are about 1%, 2.5%, and 0.07%, respectively,
and the coefficient of determination is about 1.0 for the three algorithms. When testing
new data, the errors reach about 4%, 5.5%, and 1%, and the coefficient of determination
is about 0.9. The technique is tested for near and far platforms. It is
shown that the emitter location problem can be solved with good accuracy using the
supervised machine learning technique.

Keywords: Supervised machine learning · Machine learning applications · Geolocation with machine learning · Emitter-sensor data collection · Regression with Matlab toolbox

1 Introduction

Machine learning is considered an automated analytical process. Two related procedures,
train and predict, carry out all the essential mathematics to build a
predictive model, apply the predictive model to test unlabeled data, and then check
the model accuracy. The training process of the learner uses a data set with a known
response. A new set of data is then fed to the predictive model to predict the data's output
response. Machine learning utilizes computational algorithms to learn information
directly from data without depending on a predetermined equation as a model [1]. The
algorithms enhance their performance as the number of available data samples

increases. Machine learning uses two types of techniques. Unsupervised machine
learning detects concealed patterns or natural structures in data [2]; it is used to
draw inferences from input datasets with unlabeled responses, with applications
including gene sequence analysis, market research, and object recognition. The other type is
supervised learning, which trains a model on known input data and known output
responses so that it can predict future outputs [3]. A supervised learning algorithm
takes a known set of input data and known responses to the data (output, target) and
trains a model to generate reasonable predictions for the response to new data.
Supervised learning uses classification and regression techniques to develop predictive models.
In the present work, the prediction of continuous numeric responses is carried out
for continuous numeric input data. Regression prediction techniques can achieve this
demand [4]. Figure 1 illustrates the steps of supervised machine learning using
regression algorithms.

Fig. 1. Supervised machine learning technique, (a) Training process, (b) Learning process.

As shown in Fig. 1, the learning procedure is composed of two processes and two
related data sets. The training process is performed first, using a training data set that
includes a known target field or response. The response field contains the known
output numeric value associated with each record or observation in the training data set.
Using different regression algorithms, the training process generates a predictive model
for every algorithm in order to select the model that best predicts the output response.
The prediction process is performed after the training process. It applies the
predictive model selected by the training process to new data of the same type as the
trained data set. In the present application, the dataset is collected from the emitter-sensor
platform. The dataset is [location of sensors (x-y), angles of arrival (φ1 and φ2) w.r.t.

the two sensors, and the true calculated emitter location (x-y)]. This dataset is used to
train the selected model. A new data set is then collected, and the emitter location will
be the output response of the predicting model.

2 Literature Preview

There are many machine learning packages available for use in different
applications; the list is vast: Caret (R), Python, Matlab,
and more are ready to use [5]. In the present work we choose the Matlab ML toolbox
2019b for our study. The Matlab ML toolbox has many advantages: it is easy to
learn, has an attractive GUI, does not require learning commands or working
at the command line, and supports the data type used in the application.
Jain et al. [6] describe a semi-supervised locally linear embedding algorithm to
estimate the location of a mobile device in indoor wireless networks. A mean
error distance of 3.7 m is obtained, with a standard deviation of the error distance of 3.5 m.
Feng et al. [7] used machine learning to identify unknown information of a radar
emitter. All radars used in training and testing are stationary, and Matlab 2017b is used to
implement the model. The identification accuracy of the unknown radar
parameters reaches 90% for a measurement error range of 15%. Canadell Solana [8]
predicted the user equipment (UE) geolocation using Matlab to prepare the data and
create the models. Multiple supervised regression algorithms were tested and evaluated,
with a median accuracy of 5.42 m and a mean error of 61.62 m. We can say
that machine learning algorithms are a promising solution for the geolocation problem.

3 Supervised Machine Learning Procedure

Supervised machine learning is applied in the present work using Matlab 2019b
Statistics and Machine Learning Toolbox algorithms [9]. The procedure steps are:

3.1 Data Collection and Preparation


The proposed geolocation system assumes an emitting source tracked by two sensors
(receivers) [10]. The sensors continuously collect the data used to compute the emitter x-y
location, namely the angle of arrival aoa1 w.r.t. sensor 1 and aoa2 w.r.t. sensor 2, assuming known
sensor locations xs1,2, ys1,2, and zs1,2, where s1 or s2 stands for sensor 1 or sensor 2 [11]. Thus, the
collected data are used mathematically to calculate the x and y emitter location, xe
and ye. A set of training data with a known output response is arranged and fed to the
data import tool. Figure 2 shows the data table received by the regression learner
importing-data tool. The input training data variables and the calculated output
response are the sensors' xs1, ys1, xs2, and ys2 coordinates and the resulting emitter xe and ye
coordinates. The data length is selected to be about 300 observations, a medium data
length. The input data response (xe) varies from 2000 up to 5000 m according to
the locations and angles of arrival of the sensors [12].
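
For illustration only, the following Python sketch builds a training table of the kind described above. The actual location equations are those of [12] and are not reproduced here; the sketch simply intersects the two bearing lines under the assumption that the angles of arrival are measured from the positive x-axis, and all sensor and emitter coordinates are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def emitter_xy(xs1, ys1, xs2, ys2, aoa1, aoa2):
    """Intersect the two bearing lines from the sensors (angles in radians,
    measured from the positive x-axis) to recover the emitter position."""
    t1, t2 = np.tan(aoa1), np.tan(aoa2)
    xe = (ys2 - ys1 + t1 * xs1 - t2 * xs2) / (t1 - t2)
    ye = ys1 + t1 * (xe - xs1)
    return xe, ye

# Build a small training table: sensor coordinates + AOAs -> true (xe, ye).
rows = []
for _ in range(300):
    xe, ye = rng.uniform(2000, 5000), rng.uniform(2000, 5000)   # true emitter
    xs1, ys1 = rng.uniform(0, 500), rng.uniform(0, 500)         # sensor 1
    xs2, ys2 = rng.uniform(500, 1000), rng.uniform(0, 500)      # sensor 2
    aoa1 = np.arctan2(ye - ys1, xe - xs1)
    aoa2 = np.arctan2(ye - ys2, xe - xs2)
    rows.append([xs1, ys1, xs2, ys2, aoa1, aoa2,
                 *emitter_xy(xs1, ys1, xs2, ys2, aoa1, aoa2)])

data = np.array(rows)              # columns: xs1, ys1, xs2, ys2, aoa1, aoa2, xe, ye
X, y_xe = data[:, :6], data[:, 6]  # predictors and the xe response
```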

Fig. 2. Importing data to the regression training learner.

The training learner can deal with only one response, so the training process starts by
predicting xe first; the same procedure is then repeated for ye. Figure 3
presents the classification of the data into predictors and the data response before starting the
training session.

Fig. 3. Predictors and response for a training session.

3.2 Cross-Validated Regression Model


A regression model is trained on cross-validated folds [13]. The quality of the
regression is estimated by cross-validation using k disjoint data folds, where k = 1 or more.
Every fold trains the model using the out-of-fold observations and tests the model
performance using the in-fold observations. For example, suppose the number of folds for
cross-validation is five, as shown in Fig. 3. In this case, every training fold
contains about 4/5 of the data set and every test fold contains about 1/5 of the data. This
procedure means that the response for every observation is computed by using the

model trained without this observation. The average test error is then calculated over all
folds. For big data sets, cross-validation may take a long time, so the no-validation option
can be chosen.
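
The same validation scheme can be reproduced outside the toolbox; the sketch below uses scikit-learn in Python (an analogue of, not the MATLAB Regression Learner itself), with synthetic predictors and responses standing in for the real sensor table.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical predictors (sensor coordinates and AOAs) and the xe response.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(300, 6))
y = rng.uniform(2000, 5000, size=300)

# 5-fold cross-validation: each model is trained on ~4/5 of the data and
# scored on the held-out ~1/5, mirroring the toolbox's validation scheme.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeRegressor(), X, y, cv=cv,
                         scoring="neg_root_mean_squared_error")
print("RMSE per fold:", -scores)
print("average RMSE :", -scores.mean())
```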

3.3 Choose a Regression Training Algorithm


Now the regression trainer is ready to start a training session. Pushing the start-session
button, as shown in Fig. 3, opens a window to choose a regression algorithm from
the learner library; the training process is then carried out according to the selected algorithm,
the training model is built, and statistical values are given to help evaluate the
created training model. All data handling and the analytical process are carried out
automatically by the learner. Figure 4 shows the toolbox regression algorithms contained in
the learner library.

3.4 Creating Predictive Model


The regression training algorithm is selected, and pressing the start button initiates the
training process. Figure 5 illustrates the result of applying the Fine Tree algorithm as an
example and exporting the predictive model that will be used for predicting responses
of new data sets. As shown in the figure, the RMSE is about 34 m and the mean
absolute error (MAE) is about 29 m, which indicates an acceptable result considering the
long coordinate distances. The model name is written in the export-model text box;
the shown name, trained Model, is the default name. The model structure is saved to the
workspace to be used in testing and predicting new data and responses.

Fig. 4. Selecting regression algorithm.



Fig. 5. Exporting the predictive model to the workspace.

As shown in Figs. 4 and 5, the History box shows the algorithms used to train
the input data set and the resulting model root mean square error (RMSE); a frame is
drawn around the smallest RMSE value. The Current Model box specifies the selected
training model's output evaluation statistics, i.e., the quantitative estimates that
evaluate the output of the current model training process.

3.5 Evaluating the Predictive Model Accuracy


The accuracy of the regression model describes the performance of the model. The
regression learner uses the following metrics as measures of accuracy [14, 15]:
– Mean Squared Error (MSE) represents the difference between the true and predicted
  response values and is given by:

      MSE = \frac{1}{k} \sum_{i=1}^{k} (x_i - \tilde{x}_i)^2                          (1)

  where k is the total number of observations or fields, i is the observation or field
  number such that i = 1 : k, x_i is the true response value, and \tilde{x}_i is the
  predicted value of x_i.
of x.
– Root Mean Squared Error (RMSE) is the error rate defined by the square root of
  MSE:

      RMSE = \sqrt{MSE}                                                               (2)

– Mean Absolute Error (MAE) represents the difference between the true and predicted
  values and is given by:

      MAE = \frac{1}{k} \sum_{i=1}^{k} \left| x_i - \tilde{x}_i \right|               (3)

– The coefficient of determination (R-squared) indicates how well the predicted values
  fit the original values:

      R^2 = 1 - \frac{\sum_{i=1}^{k} (x_i - \tilde{x}_i)^2}{\sum_{i=1}^{k} (x_i - \bar{x})^2}    (4)

  where \bar{x} is the mean value of x. The value of R^2 lies between 0 and 1; a model
  with a higher value of R^2 is a better model.
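
The four metrics can be computed directly from Eqs. (1)–(4); the short Python/NumPy sketch below does so for a handful of made-up emitter x-coordinates (the numbers are illustrative only).

```python
import numpy as np

def regression_metrics(x_true, x_pred):
    """MSE, RMSE, MAE and R-squared as defined in Eqs. (1)-(4)."""
    x_true, x_pred = np.asarray(x_true, float), np.asarray(x_pred, float)
    mse = np.mean((x_true - x_pred) ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(x_true - x_pred))
    r2 = 1.0 - np.sum((x_true - x_pred) ** 2) / np.sum((x_true - x_true.mean()) ** 2)
    return mse, rmse, mae, r2

# Example with illustrative emitter x-coordinates in metres.
x_true = [2500.0, 3100.0, 4200.0, 4950.0]
x_pred = [2480.0, 3135.0, 4185.0, 4990.0]
print(regression_metrics(x_true, x_pred))
```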

4 Geolocation of Emitter Source in Moving Platform

The geolocation of an emitter source can be estimated using two receivers (sensors).
The data collected by the sensors in the x-y plane are the azimuth angles of arrival θs1 and θs2 for
sensors 1 and 2, respectively. The regression learner is now used to predict the
emitter location by observing the available data. The data set fed to the learner is
aoa1 (θs1), aoa2 (θs2), and the sensors' x-y coordinates (xs1, ys1, xs2, and ys2), as shown in
Fig. 2. The best performances found when training and testing the different regression
algorithms in the learner library are obtained with the Fine Tree, Linear SVM, and Gaussian
Process Regression algorithms.
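
For readers outside MATLAB, the three selected models have close scikit-learn analogues; the sketch below trains them on a synthetic version of the sensor table. The model classes, kernel choice (Matern with ν = 5/2), hyperparameters, and data are assumptions standing in for the toolbox defaults, not the exact models used in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical training table: [xs1, ys1, xs2, ys2, aoa1, aoa2] -> xe.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(200, 6))
y = 2000 + 3000 * rng.random(200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Fine Tree (analogue)": DecisionTreeRegressor(),
    "Linear SVM (analogue)": SVR(kernel="linear", C=10.0),
    "GPR Matern 5/2 (analogue)": GaussianProcessRegressor(kernel=Matern(nu=2.5)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```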

4.1 Output Results of Training Algorithms


The three mentioned regression algorithms are trained using the observations of the sensor
data set of about 200 observations (records). The corresponding x-coordinate of the
emitter (the response) is mathematically calculated using the equations in [12]. The learner
output results are shown in Fig. 6 as follows: (a) and (b) for the Fine Tree prediction
model, (c) and (d) for Linear SVM, and (e) and (f) for Gaussian Process Regression (Matern 5/2).

Fig. 6. Training results for Fine Tree (a, b), LSVM (c, d), and GPR (e, f).

Evaluation and accuracy of the three learning predictive models are shown in
Table 1.

Table 1. Evaluation statistics of Fine Tree, LSVM, and GPR (Matern 5/2) regression models.

  Regression algorithm   Accuracy metrics
                         RMSE (m)   MAE (m)   R-squared
  Fine Tree              33.129     28.67     1.00
  LSVM                   82.213     74.66     0.99
  GPR (Matern 5/2)       0.0343     0.0297    1.00

4.2 Testing Predictive Models


Ten data records are selected randomly from the trained data set to test the
predictive models. Figure 7 illustrates the results: (a) for part of the trained data set, and
(b) for new data representing a near platform.


Fig. 7. Results of testing regression models, (a) data from the trained set, and (b) new data.

Tables 2 and 3 present the accuracy of the regression prediction models for both
datasets described in Fig. 7a and 7b.

Table 2. Evaluation statistics of the tested dataset.

  Regression algorithm   Accuracy metrics
                         RMSE (m)   MAE (m)   R-squared
  Fine Tree              19.3       13.636    0.999
  LSVM                   73.149     63.622    0.986
  GPR (Matern 5/2)       0.0359     0.0297    1.00

Table 3. Evaluation statistics of the tested new dataset.

  Regression algorithm   Accuracy metrics
                         RMSE (m)   MAE (m)   R-squared
  Fine Tree              4.6811     4.1636    0.8808
  LSVM                   5.9279     5.4923    0.8727
  GPR (Matern 5/2)       0.9994     0.9944    0.9091

5 Results Analysis

The Matlab Statistics and Machine Learning Toolbox is used here to apply machine
learning to geolocating emitting sources in the x-y plane. The toolbox machine learning
tool for regression prediction, the Regression Learner, is introduced, and the steps of the
learning and training procedure are discussed as shown in Fig. 1. Predictive models are
created and evaluated. The x-coordinate of the emitter location is predicted using information
collected from the receiving sensors.
The learner application has different regression algorithms for making a prediction. In
the present work, most of the regression algorithms are tested for better performance. It
is found that three algorithms achieve the best performance for the current case
study: Fine Tree Regression, Linear SVM Regression, and
Gaussian Process Regression (Matern 5/2). Figures 2, 3, 4 and 5 tell the story of
supervised machine learning, starting from data preparation up to training, evaluating,
and exporting a predictive model. The output of each of the three regression predictive
models is illustrated in Fig. 6, and the accuracy metrics are listed in Table 1. From the
figure and the table, it is clear that the created predictive models perform well, and
the prediction in the training phase is near ideal. The testing phase is carried out using
two sets of data: a set selected randomly from the training data set, and a new data set.
The test procedure on the selected data gives good prediction results with high accuracy,
as shown in Fig. 7a and Table 2. The prediction process using the new data sample is
considered satisfactory, as shown in Fig. 7b and Table 3. For new data, less accuracy is
expected, but the achieved result is good.
Table 4 summarizes the average error of the predicted emitter x-position for the
three regression models and the three used datasets. The GPR (Matern 5/2) model
achieves the best result compared with the other two models.

Table 4. Distance percentage error of the three regression models.

  Data set           Regression models
                     Fine tree (%)   Linear SVM (%)   GPR (Matern 5/2) (%)
  Training           1               2.5              0.009
  Testing-training   1.3             3.2              0.01
  New data           3.6             5                1.4

6 Conclusion

Supervised machine learning is applied to solve the problem of geolocating an emitting
source in the x-y plane. The Matlab Statistics and Machine Learning Toolbox is discussed
and applied. The regression prediction algorithms built into the toolbox library are
applied, saving the time and complexity of mathematical calculations and analytical
processes. Near-optimal prediction is achieved for the training algorithms discussed in the
present work. Considering the testing of the three algorithms, the minimum percentage error
reaches less than 1% for the GPR algorithm, while the maximum error reaches about 5%
for the SVM algorithm. The same procedure is used to predict the ye coordinate of the
emitter. For new data, acceptable results are obtained. It is shown that machine learning
can help in geolocation applications and achieve good results.

References
1. Smola, A.J.: An introduction to machine learning basics and probability theory, statistical
machine learning program. ACT 0200, Australia, Canberra (2007)
2. Osisanwo, F.Y., Akinsola, J.E.T., Awodele, O., Hinmikaiye, J.O., Olakanmi, O., Akinjobi,
J.: Supervised machine learning algorithms: classification and comparison. Int. J. Comput.
Trends Technol. 48(3), 128–138 (2017)
3. Dangeti, P.: Statistics for Machine Learning, Techniques for Exploring Supervised,
Unsupervised, and Reinforcement Learning Models with Python and R. Packt Publishing
Ltd., Birmingham (2017)
4. Malmström, M.: 5G positioning using machine learning. Master of Science thesis in Applied
Mathematics. Department of Electrical Engineering, Linköping University (2018)
5. Zhang, X., Wang, Y., Shi, W.: CAMP: performance comparison of machine learning
packages on the edges. In: Computer Science HotEdge (2018)
6. Jain, V.K., Tapaswi, S., Shukla, A.: Location estimation based on semi-supervised locally
linear embedding (SSLLE) approach for indoor wireless networks. Wirel. Pers. Commun. 67
(4), 879–893 (2012). https://doi.org/10.1007/s11277-011-0416-2
7. Feng, Y., Wang, G., Liu, Z., Feng, R., Chen, X., Tai, N.: An unknown radar emitter
identification method based on semi-supervised and transfer learning. Algorithms 12(12), 1–
11 (2019)
8. Canadell Solana, A.: MDT geolocation through machine learning: evaluation of supervised
regression ML algorithms. MSc. thesis submitted to the College of Engineering and Science
of Florida Institute of Technology, Melbourne, Florida (2019)
9. MathWorks, Statistics and Machine Learning Toolbox, R2019b
10. Progri, I.: Geolocation of RF Signals Principles and Simulations, 1st edn. Springer,
Heidelberg (2011)
11. Diethert, A.: Machine and Deep Learning with MATLAB. Application Engineering
MathWorks Inc., London (2018)
12. Rahouma, K.H., Mostafa, A.S.A.: 3D geolocation approach for moving RF emitting source
using two moving RF sensors. In: Advances in Intelligent Systems and Computing, vol. 921,
pp. 746–757. Springer (2019)

13. Varoquaux, G.: Cross-validation failure: small sample sizes lead to large error bars.
NeuroImage 180, 68–77 (2018)
14. Prairie, Y.T.: Evaluating the predictive power of regression models. Can. J. Fish. Aquat. Sci.
53(3), 490–492 (1996)
15. Raschka, S.: Model evaluation, model selection, and algorithm selection in machine
learning, pp. 1–49 arXiv:1811.12808v2 [cs.LG] (2018)
An Effective Offloading Model Based
on Genetic Markov Process for Cloud Mobile
Applications

Mohamed S. Zalat(&), Saad M. Darwish, and Magda M. Madbouly

Department of Information Technology, Institute of Graduate Studies


and Research, Alexandria University, Alexandria, Egypt
{mohamed.zalat,saad.darwish}@alexu.edu.eg,
mmadbouly@hotmail.com

Abstract. Mobile Cloud Computing (MCC) has drawn significant research
attention as the capability of mobile devices has improved in recent years. MCC
forms the platform for a broad range of mobile cloud solutions. MCC's key
idea is to use powerful back-end computing nodes to enhance the capabilities of
small mobile devices and provide better user experiences. In this paper, we
propose a novel idea for solving multisite computation offloading in dynamic
mobile cloud environments that considers the environmental changes during
applications' life cycles and the relationships among the components of an application.
Our proposal, called Genetic Markov Mobile Cloud Computing (GM-MCC),
adopts a Markov Decision Process (MDP) framework to determine the best
offloading decision that assigns the components of the application to the target sites
while consuming the minimum amount of the mobile device's energy, by determining the
cost metrics that identify the overhead of each component. Furthermore, the
suggested model utilizes a genetic algorithm to tune the MDP parameters to
achieve the highest benefit. Simulation results demonstrate that the proposed
model considers the different capabilities of the sites to allocate the appropriate
components, yielding a lower energy cost for data transfer from the mobile device to the
cloud.

Keywords: Mobile Cloud Computing · Offloading · Application partitioning algorithm · Genetic algorithm · Markov Decision Process

1 Introduction

Mobile Cloud Computing (MCC) is an emerging technology linked to a broad range of


mobile learning applications, healthcare, context-aware navigation, and social cloud.
MCC is an infrastructure where the data storage and data processing are performed
outside the mobile device but inside the cloud [1]. A mobile device itself has limita-
tions such as limited network bandwidth, energy consumed by transmission and
computation, network availability, and little storage [2]. However, the limited battery
life is still a big obstacle to the further growth of mobile devices. Known
power-conservation techniques include turning off the mobile device's
screen when not in use, optimizing I/O, and slowing down the CPU [3]. One accessible


technology to reduce the energy consumption of mobile devices is MCC. Its fundamental
idea is computation offloading, or cyber foraging, which means that parts of an
application execute on a remote server, with results communicated back to the local
device [4].
The offloading mechanism divides the application between local and remote execution.
The decision may have to change with fluctuations in operating conditions such
as computation cost, communication cost, expected total cost of execution, user input,
response time, and security agents [5]. Some critical issues concerning the partitioning
problem include application component classification, application component weighting,
reduced communication overhead, and reduced algorithm complexity [6, 7]. An
elastic application is one that can separate its components at runtime while
preserving its semantics [8]. Messer et al. [9] list the fundamental attributes of an elastic
application as a distributed platform: ad-hoc platform construction, application
partitioning, transparent and distributed execution, adaptive offloading, and beneficial
offloading. Many researchers describe the MCC application partitioning taxonomy [1,
10–17]. They classify the granularity level of application partitioning algorithms as module,
object, thread, class, task, component, bundle, allocation-site, and hybrid-level
partitioning. In general, synchronization in the distributed deployment of elastic applications
represents the main challenge for MCC.
Traditional computational offloading algorithms focus on single-site offloading,
where an application is divided between the mobile device and a single remote server.
In recent years, several researchers have studied and implemented algorithms that focus on
multisite offloading. A multisite model has more valuable resources than a single site and
saves more power with less time consumed. We focus on the partitioning/scheduling
process, which is the most important phase of cyber foraging: placing each task at the
surrogate(s) or the mobile device most capable of performing it, based on the context
information and the predictable cost of doing so.

1.1 Mobile Application Graph Problem


Mobile application execution can be represented as a sequence of components in
several graph topologies, such as linear, tree, or mesh, based on the fact that offloaded
components are likely to be executed sequentially [6]. This assumption is generally
used and tested in other research for simplification and convenience. Thus, we use a
weighted directed acyclic graph G = (V, E) to represent the relationship among the
components of a mobile application. Each vertex v ∈ V denotes a component, and each edge
e(u, v) denotes the communication channel between components u and v. Figure 1
illustrates a weighted directed execution graph of a mobile application. The unshaded
vertices are offloadable components, and the shaded vertices are unoffloadable components
because of I/O, hardware, or external constraints. The weight on the vertices,
denoted by C_i for vertex i, is a vector weight representing the cost of executing the
component on each site. The weight on the edges, denoted by C_{i,j} for edge e(i, j), is also a
vector weight that represents the cost of transmitting data between components on
different sites. The cost metric can be either time or energy consumption.
Our algorithm adopts both cost metrics to determine offloading decisions using an adaptive genetic-based Markov decision process. These weights are constructed using both static analysis and dynamic profiling methods for an application. The application structure is analyzed for possible migration points and run several times with different inputs and environments to identify the overhead on each of its components.

Fig. 1. Mobile application represented as a weighted directed graph.
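To make this graph formulation concrete, the sketch below builds a small weighted DAG of the kind just described in Python. The component names, per-site execution costs and edge transfer costs are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of the application graph G = (V, E) described above.
# Vertex weights are vectors of execution costs (index 0 = mobile, 1..k = sites);
# edge weights model the cost of transferring data between components that are
# placed on different sites. All numbers are illustrative placeholders.

components = {
    # name: {"offloadable": bool, "cost": [cost_on_mobile, cost_on_site1, cost_on_site2]}
    "ui_input": {"offloadable": False, "cost": [1.0, None, None]},   # I/O bound, must stay local
    "parse":    {"offloadable": True,  "cost": [4.0, 0.9, 0.6]},
    "compute":  {"offloadable": True,  "cost": [9.0, 2.1, 1.4]},
    "render":   {"offloadable": False, "cost": [1.5, None, None]},
}

edges = {
    # (u, v): transfer cost C_{u,v}, paid only if u and v run on different sites
    ("ui_input", "parse"): 0.3,
    ("parse", "compute"):  0.8,
    ("compute", "render"): 0.5,
}

def total_cost(placement):
    """Cost of a placement: dict mapping component -> site index."""
    cost = sum(components[v]["cost"][placement[v]] for v in components)
    cost += sum(w for (u, v), w in edges.items() if placement[u] != placement[v])
    return cost

# Example: run 'parse' and 'compute' on site 1, keep I/O components on the mobile.
print(total_cost({"ui_input": 0, "parse": 1, "compute": 1, "render": 0}))
```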

1.2 Genetic Algorithm


Genetic algorithms were developed to study the adaptive process of natural systems and to create artificial systems. Genetic algorithms differ from traditional optimization methods in the following ways: they use a coding of the parameter set rather than the parameters themselves; they search from a population of candidate solutions instead of a single one; and they use probabilistic transition rules [19]. A genetic algorithm consists of a string representation ("genes") of the solutions (called individuals) in the search space, a set of genetic operators for generating new search individuals, and a stochastic assignment to control the genetic operators. Typically, a genetic algorithm consists of the following steps. (1) Initialization: an initial population of candidate solutions is randomly generated. (2) Evaluation of the fitness function: the fitness value of each individual is calculated according to the fitness function (objective function). (3) Genetic operators: new individuals are generated by examining the fitness values of the search individuals and applying genetic operators to them. (4) Repeat steps 2 and 3 until the algorithm converges. From the above description, we can see that genetic algorithms use the notion of survival of the fittest by passing fitter individuals on to successive generations.
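As an illustration of the initialize/evaluate/operate loop just described, the following minimal sketch evolves bit-string individuals; the fitness function and parameter values are placeholders, not the GM-GA engine itself.

```python
import random

def genetic_algorithm(fitness, n_genes=10, pop_size=20, pc=0.8, pm=0.05, generations=50):
    # (1) Initialization: random population of bit-string individuals.
    pop = [[random.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # (2) Evaluation: score each individual with the fitness (objective) function.
        scored = sorted(pop, key=fitness, reverse=True)
        # (3) Genetic operators: fitness-biased selection, crossover, mutation.
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            p1, p2 = random.sample(parents, 2)
            if random.random() < pc:                       # one-point crossover
                cut = random.randint(1, n_genes - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):                             # bit-flip mutation
                children.append([g ^ 1 if random.random() < pm else g for g in c])
        pop = children[:pop_size]
        # (4) Repeat steps 2 and 3 (here until a fixed generation budget is used up).
    return max(pop, key=fitness)

# Toy example: maximize the number of ones in the bit string.
print(genetic_algorithm(fitness=sum))
```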

1.3 Markov Decision Process


Markov Decision Processes [20] are a tool of artificial intelligence that can be used to obtain optimal action policies in a stochastic domain. Given an action executed in a known state of the world, it is possible to calculate the probability of the next state. The probability of reaching a state s′ when action a has occurred is calculated by summing the conditional probabilities of reaching s′ from every possible state si of the world given the action a. A formal description of an MDP is the tuple (S, A, P, R, t) where:
• S is a finite set of states of the world, where s ∈ S. Here s is the system state information, characterized by the combination of the channel state and the execution location of a component, so the system state at decision epoch t is Xt,i = (t, i, c), where c is the channel state for the next epoch between the mobile and the offloading sites (i.e. either g or b) and i ∈ [0, k] is the location of the executed component t [21].

• A is a finite set of actions; the decision can be chosen from two major actions: migrate execution, or continue executing the next component locally. In our case, the actions for a state s are represented by As, and the action taken at each stage t is represented by a ∈ As.
• P[s′ | s, a] is the transition probability of states, and P represents the transition matrix for the next stage s′. For each action and state of the world, there is a probability distribution over the states of the world that can be reached by executing this action.
• R : S × A → ℝ, or R(s, a, s′), is a reward (or cost) function. To each action in each state of the world a real number is assigned. The function R(s, a, s′) is defined as the reward of executing action a in state s when the resulting state is s′.
• Decision epochs represent the points of time (t) at which an action a is decided in state s; we consider a finite-horizon discrete-time problem where decisions are made at the beginning of each period/stage.
A decision rule dt : S → A is a mapping from states to actions at decision epoch t that indicates which action to choose when the system is in a specific state at a given time. A policy π = (d0, d1, …, dn) represents the sequence of decision rules to be used at all decision epochs t. According to [22], there exists a stationary policy π* that is optimal over all policies. Therefore, in this paper, our goal is to determine an optimal stationary policy that suggests the best action minimizing the sum of the cost incurred at the current stage and the least total expected cost that can be incurred from all subsequent stages. We denote by Vπ(s) the expected total cost of executing the application given initial state s and policy π, calculated as:

Vπ(s) = Eπ[ Σ_{t=0}^{n} C(Xt, at) | X0 = s ]      (1)

where Eπ represents the conditional expectation with respect to policy π and C(Xt, at) is the cost incurred at stage t by taking action at. The cost function is considered to be either the amount of energy consumed or the time spent by the mobile device as a result of taking the specific action, which will be explained in the following sections.
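For intuition, the expected cumulative cost of Eq. (1) for a fixed policy can be estimated by Monte Carlo rollouts, as in the sketch below; the toy two-state domain and its costs are illustrative placeholders, not the offloading MDP itself.

```python
import random

# Toy stochastic domain with states 0/1 and a fixed policy pi; numbers are placeholders.
def step(state, action):
    """Return (next_state, cost) for taking `action` in `state`."""
    if action == "migrate":
        return 1 - state, 1.0
    next_state = state if random.random() < 0.7 else 1 - state
    return next_state, (0.5 if state else 2.0)

def expected_cost(policy, s0, horizon=10, rollouts=5000):
    # Monte Carlo estimate of E^pi[ sum_t C(X_t, a_t) | X_0 = s0 ].
    total = 0.0
    for _ in range(rollouts):
        s, acc = s0, 0.0
        for _ in range(horizon):
            s, c = step(s, policy(s))
            acc += c
        total += acc
    return total / rollouts

print(expected_cost(policy=lambda s: "local" if s == 0 else "migrate", s0=0))
```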
The remainder of this paper is structured as follows: Sect. 2 presents some related
work. Section 3 introduces the proposed offloading technique in detail. Section 4
reports the performance evaluation of the proposed model and gives experimental
results. Finally, conclusions and lines for future work are drawn in Sect. 5.

2 Related Work

In the literature, much offloading algorithm research focuses on single-site offloading, where an application is divided between the mobile device and a single remote server. Cuervo et al. [11] suggested an algorithm for fine-grained code offload to maximize energy savings with minimal burden on the programmer under the mobile device's current connectivity constraints. Chun et al. [12] designed and implemented the CloneCloud system, a flexible application partitioner and execution runtime that enables unmodified mobile applications running in an application-level virtual machine

to seamlessly off-load part of their execution from mobile devices onto device clones
operating in a computational cloud. Kovachev et al. [13] modeled the partitioning of
data stream application by using a data flow graph. The genetic algorithm is used to
maximize the throughput of the application. Over the last few years, many researchers have focused on multisite offloading. Also, most current approaches make offloading decisions based on profiling information that assumes a stable network environment [15]. However, this assumption is not always correct because the mobility of a user can create a dynamic bandwidth between the mobile and the server. As a result, if the network profile information does not match the actual post-decision bandwidth, the offloading decision could lead to a critical Quality-of-Service (QoS) failure [16].
Sinha and Kulkarni [2] developed a multisite offloading algorithm that uses dif-
ferently weighted nodes and different network bandwidths. However, their work
assumes a stable channel state when making the offloading decision. Terefe et al. [17]
presented a model to describe the energy consumption of multisite application exe-
cution. They adopt a Markov Decision Process (MDP) framework to formulate the
multisite partitioning problem as a delay-constrained, least-cost shortest path problem
on a state transition graph. However, they depend on constant Markov transition values for making decisions. Ou et al. [18] proposed a (K + 1)-way partitioning algorithm to keep
the component interaction as small as possible. They utilized the Heavy Edge and Light
Vertex Matching (HELVM) algorithm that splits the application graph for multiple
servers while satisfying some pre-defined constraints. Unfortunately, HELVM assumes
all servers are alike (i.e., they have homogeneous capacities).
Niu et al. [3] presented a multi-way partition algorithm called EMSO (Energy-
Efficient Multisite Offloading) that formulates the partitioning problem as a weighted
directed acyclic graph and traverses the search tree using depth-first search. It then
computes the energy consumption of nodes under the current and critical bandwidth.
The algorithm determines the most energy-efficient multisite partitioning decision, but
it does not provide a guarantee for completion time. Hence, it is not suitable for real-
time multimedia applications.
This paper focuses on multisite offloading in MCC and is a substantial extension of the work presented in [17], which adopted the MDP framework to formulate the multisite partitioning problem as a delay-constrained, shortest path problem on a state transition graph. As a new idea, the proposed model is built on a quantified data structure for each site's node and determines the optimal offloading policy according to the optimal time and energy for each state. This is accomplished by using a genetic algorithm in association with the MDP.

3 Proposed Model

The main goal of the suggested algorithm is to obtain a fine-grained offloading mechanism that maximizes energy savings with minimal time cost and without affecting the mobile performance, by finding the optimal policy (π*) to distribute application components among the different sites. The main diagram of the suggested model is depicted in Fig. 2 and is divided into three steps: Step 1. The profiler subsystem gathers

the environment data using static analysis and dynamic profiling mechanisms [23–26]. Step 2. The GM-GA engine employs the GA to get the probabilities of the best offloading sites. Step 3. The GM-MCC engine utilizes the best population to achieve the optimal policy (π*). The algorithms are specified in the following subsections.

Fig. 2. The proposed GM-MCC model.



Algorithm 1: GM-MCC Engine
Input: Environment Data (ED).
Output: Optimal Policy (π*).
  Formulate the problem costs according to the current ED
  Get the best transition probability (Pbest)        // see Algorithm 2
  Compute:
    Optimal Iteration Energy Cost (OIEC)
    Optimal Iteration Time Cost (OITC)               // see Algorithm 3
  Construct π*                                       // see Algorithm 4
  Return π*

Algorithm 1 constructs the optimal policy (π*) after handling the environment data (ED). First, it formulates the problem costs according to the current ED. Secondly, it calls Algorithm 2 to obtain the best probability (Pbest) for the Energy Cost (EC) and ED. Thirdly, it computes the Optimal Iteration Energy Cost (OIEC) and the Optimal Iteration Time Cost (OITC) using Algorithm 3. Then, it constructs the optimal policy (π*) using Algorithm 4 and returns it.

Algorithm 2: GM-Probability Generating
Input: Environment Data (ED), Energy Cost (EC).
Output: the best transition probability (Pbest)
  Initialize Pm, Pc, maxGenerationNo.
  Generate population P0
  Evaluate the initial population (P0) and find the Iteration Energy Cost (IEC)   // call Algorithm 3
  Repeat
    For j = 1 to PopulationSize/2 do:
      Select two parents P1 and P2 from the Pi−1 offspring (P1, P2)
      Crossover(P1, P2) and generate two new children C1, C2 with crossover probability Pc
      Mutate C1, C2 randomly within a probability Pm limitation.
      Add C1, C2 to the new population Pnew.
    Evaluate Pnew          // see Algorithm 3
    Sort Pnew.
  Stop criteria:
    fitness ( ) = 0,
    fitness ( ) < fitness, or
    generation = maxGenerationNo.
  Pbest = Elitism(Pbest, Pnew)
  Return Pbest.

Algorithm 3: GM-Value Iteration (GM-VI)
  Initialize Vπ(s) = 0 for all s ∈ S
  For k = 1 to n + 1
    For all s ∈ S
      For all a ∈ As
        compute Qk(s, a)
      π(s) = arg min[Qk(s, a)]
    End for
  End for
  Return Vπ, π

Herein, the algorithm for generating suitable probabilities is based on genetic algorithms [19]. Algorithm 2 begins by initializing the population, where each chromosome is an individual that represents a possible solution to the problem and is composed of a string of genes. The fitness evaluation of each individual is the minimal mobile energy consumption value. Algorithm 3 calculates the value of every state with its probability based on Bellman's equation [22], which expresses the optimality condition. The solution of the optimality equation represents the minimum expected total cost and the MDP policy π. Note that the MDP policy indicates to which site to migrate the execution given the current state, or whether to stay at the same site. Algorithms such as the Value Iteration Algorithm (VIA) and linear programming can be applied to solve Bellman's optimality equation [20]. We implement the VIA as GM-Value Iteration in our work because of its theoretical simplicity and ease of coding and implementation. Furthermore, Algorithm 4 compares the policies according to the delay time and checks whether the energy policy can be changed in order to obtain the optimal policy (π*).
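Since the core of a value iteration algorithm such as GM-VI is the standard Bellman backup, the sketch below shows a generic finite-horizon value iteration over a small tabular MDP; the transition probabilities, costs and horizon are illustrative placeholders rather than the GM-VI costs.

```python
# Generic finite-horizon value iteration (the core of a VIA such as GM-VI).
# P[s][a] = list of (s_next, prob); C[s][a] = stage cost. Numbers are placeholders.

states = ["s0", "s1"]
actions = ["local", "migrate"]
P = {
    "s0": {"local": [("s0", 0.7), ("s1", 0.3)], "migrate": [("s1", 1.0)]},
    "s1": {"local": [("s1", 1.0)],              "migrate": [("s0", 0.5), ("s1", 0.5)]},
}
C = {
    "s0": {"local": 2.0, "migrate": 1.0},
    "s1": {"local": 0.5, "migrate": 1.5},
}

def value_iteration(horizon):
    V = {s: 0.0 for s in states}            # V(s) initialized to 0 for all s
    policy = {}
    for _ in range(horizon):
        V_new, policy = {}, {}
        for s in states:
            # Bellman backup: Q(s, a) = C(s, a) + sum_s' P(s'|s, a) * V(s')
            Q = {a: C[s][a] + sum(p * V[sn] for sn, p in P[s][a]) for a in actions}
            policy[s] = min(Q, key=Q.get)   # pi(s) = arg min_a Q(s, a)
            V_new[s] = Q[policy[s]]
        V = V_new
    return V, policy

print(value_iteration(horizon=5))
```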

4 Experimental Results

The simulation platform is a PC with Windows 10, 64-bit, 8 GB RAM. The simulation is performed to determine the role of the GM-MCC engine and its supporting GM-Probability Generating and GM-VI algorithms, and how GM-PI can enhance the performance of the offloading methodology. The experiments test three offloading sites (a multisite model) with the environment attributes listed in Table 1. The mobile application graph parameters are listed in Table 2, where there are three application performance behavior categories: computation-intensive (CI), data-intensive (DI), and a random one that changes according to user requests.

Table 1. Sites characteristics.

Mobile (mobile device): f0 = 500 MHz; ps = 1.3 W; pr = 1.0 W; pc = 0.9 W; pidle = 0.3 W
Site 1 (Local/Private cloud): f1 = 2 GHz; rg = 100 kb/s; rb = 50 kb/s
Site 2 (Local/Private cloud): f2 = 3 GHz; rg = 50 kb/s; rb = 10 kb/s
Site 3 (Remote/Public cloud): f3 = 5 GHz; rg = 50 kb/s; rb = 10 kb/s

Here fi is the CPU clock speed (cycles/second) of offloading site qi; ps is the mobile power consumption when sending data; pc is the mobile power consumption when computing; pidle is the mobile power consumption at idle; rg is the data transmission rate in a good channel state; and rb is the data transmission rate in a bad channel state.

Table 2. Mobile application parameters

CI node weight: wv (Mcycles) 500–650; dvs (KB) 4–6; dvr (KB) 5–8
DI node weight: wv (Mcycles) 100–150; dvs (KB) 25–30; dvr (KB) 15–17
Random node weight: wv (Mcycles) 100–650; dvs (KB) 14–30; dvr (KB) 5–35
Edge weight: du,v (KB) 100–120

Here wv is the total CPU cycles needed by the instructions of component v; dvs is the data (bytes) sent by component v to the database; dvr is the data (bytes) received by component v from the database; and du,v is the data transferred from component u to v.
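To show how parameters of this kind translate into the time and energy weights used by the model, the sketch below computes a simple local-versus-offload cost for one component. The cost formulas (cycles/clock for time, power × time for energy) are a common simplification assumed here for illustration, not the exact expressions of the paper.

```python
# Rough per-component cost estimate from Table 1/2-style parameters.
# Assumed simplification: time = cycles / clock, energy = power * time.
f_mobile, f_site = 500e6, 2e9                          # CPU clock speeds (cycles/s)
p_send, p_recv, p_comp, p_idle = 1.3, 1.0, 0.9, 0.3    # mobile power (W)
r_good = 100e3 / 8                                     # 100 kb/s channel, in bytes/s

def local_cost(w_cycles):
    t = w_cycles / f_mobile
    return t, p_comp * t                # (time in s, mobile energy in J)

def offload_cost(w_cycles, d_send_bytes, d_recv_bytes, rate=r_good):
    t_tx, t_rx = d_send_bytes / rate, d_recv_bytes / rate
    t_exec = w_cycles / f_site          # component executed remotely
    time = t_tx + t_exec + t_rx
    # Mobile spends energy sending, idling while the site computes, and receiving.
    energy = p_send * t_tx + p_idle * t_exec + p_recv * t_rx
    return time, energy

# A CI-category component: 600 Mcycles, 5 KB sent, 6 KB received (illustrative).
print("local  :", local_cost(600e6))
print("offload:", offload_cost(600e6, 5e3, 6e3))
```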

The experiments were conducted to validate the efficiency of the suggested offloading model under single-site and multisite settings in terms of energy consumption. Fig. 3 shows the amount of energy consumed for each category: CI, DI, and Random. There are savings of between 19% and 40% depending on the graph type. We also compare the energy saving between the first-generation population and the best one. Fig. 4 shows that our algorithm saves between 19% and 40%, while the first generation saves between 3% and 16%. Furthermore, the GM-MCC algorithm, as shown in Fig. 5, saves more time when it performs application offloading. The saving percentage is between 2% and 48% for the different categories.

Fig. 3. Energy consumption per sites and multisite



Fig. 4. Power saving comparison between the first generation and the best energy value generation

Fig. 5. GM-MCC time saving

5 Conclusion and Future Work

We presented a modified partitioning model to find the optimal policy for mobile
application offloading. The suggested model employs the genetic algorithm to find
populations for a Markov decision model to choose the best solution to handle the

mobile application among different sites instead of mobile-only or single-cloud-site execution. The simulation results showed improvements in terms of time and energy. In the future, other evolutionary algorithms could be used to enhance the algorithm's performance.

References
1. De, D.: Mobile Cloud Computing: Architectures, Algorithms and Applications, 1st edn.
CRC Press LLC, Florida (2015)
2. Sinha, K., Kulkarni, M.: Techniques for fine-grained, multi-site computation offloading, In:
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing, USA, pp. 184–194 (2011)
3. Niu, R., Song, W., Liu, Y.: An energy-efficient multisite offloading algorithm for mobile
devices. Int. J. Distrib. Sens. Netw. 9(3), 1–6 (2013)
4. Hyytiä, E., Spyropoulos, T., Ott, J.: Offload (only) the right jobs: robust offloading using the
markov decision processes. In: Proceedings of IEEE 16th International Symposium on A
World of Wireless, Mobile and Multimedia Networks, USA, pp. 1–9 (2015)
5. Balan, K., Gergle, D., Satyanarayanan, M., Herbsleb, J.: Simplifying cyber foraging for
mobile devices. In: Proceedings of the 5th International Conference on Mobile Systems,
Applications and Services, Puerto Rico, pp. 272–285 (2007)
6. Yuan, Z., Hao, L., Lei, J., Xiaoming, F.: To offload or not to offload: an efficient code
partition algorithm for mobile cloud computing. In: Proceedings of the IEEE 1st
International Conference on Cloud Networking, France, pp. 80–86 (2012)
7. Ou, S., Yang, K., Liotta, A.: An adaptive multi-constraint partitioning algorithm for
offloading in pervasive systems. In: Proceedings of the Fourth Annual IEEE International
Conference on Pervasive Computing and Communications, Italy, pp. 10–125 (2006)
8. Veda, A.: Application partitioning-a dynamic, runtime, object-level approach. Master’s
thesis Indian Institute of Technology Bombay (2006)
9. Messer, A., Greenberg, I., Bernadat, P., Milojicic, D., Deqing, C., Giuli, T., et al.: Towards a
distributed platform for resource-constrained devices. In: Proceedings of the 22nd
International Conference on Distributed Computing Systems. Austria, pp. 43–51 (2002)
10. Ahmed, E., Gani, A., Sookhak, M., Hamid, S., Xiam, F.: Application optimization in mobile
cloud computing: motivation, taxonomies, and open challenges. J. Netw. Comput. Appl. 52
(1), 52–68 (2015)
11. Cuervo, E., Balasubramanian, A., Cho, D.k., Wolman, A., Saroiu, S., Chandram, R., et al.:
MAUI: making smartphones last longer with code offload. In: Proceedings of the 8th
International Conference on Mobile Systems, Applications, and Services, USA, pp. 49–62
(2010)
12. Chun, B.-G., Ihm, S., Maniatis, P., Naik, M., Patti, A.: CloneCloud: elastic execution
between mobile device and cloud. In: Proceedings of the Sixth Conference on Computer
Systems, Austria, pp. 301–314 (2011)
13. Kovachev, D., Klamma, R.: Framework for computation offloading in mobile cloud
computing. Int. J. Interact. Multimedia Artif. Intell. 1(7), 6–15 (2012)
14. Kumar, K., Lu, Y.H.: Cloud computing for mobile users: can offloading computation save
energy? Computer 43(4), 51–56 (2010)
15. Zhou, B., Dastjerdi, A., Calheiros, R., Srirama, S., Buyya, R.: A context sensitive offloading
scheme for mobile cloud computing service. In: Proceedings of the IEEE 8th International
Conference on Cloud Computing, USA, pp. 869–876 (2015)

16. Bakshi, A., Dujodwala, Y.: Securing cloud from ddos attacks using intrusion detection
system in virtual machine. In: Proceedings of Second International Conference on
Communication Software and Networks, Singapore, pp. 260–264 (2010)
17. Terefe, M., Lee, H., Heo, N., Fox, G., Oh, S.: Energy-efficient multisite offloading policy
using markov decision process for mobile cloud computing. Pervasive Mob. Comput. 27(1),
75–89 (2016)
18. Ou, S., Yang, K., Zhang, J.: An effective offloading middleware for pervasive services on
mobile devices. Pervasive Mob. Comput. 3(4), 362–385 (2007)
19. Simon, H.A.: The Sciences of the Artificial. MIT press, Cambridge (2019)
20. Thrun, M.: Projection-Based Clustering Through Self-Organization and Swarm Intelligence:
Combining Cluster Analysis with the Visualization of High-Dimensional Data. Springer,
Berlin (2018)
21. Zhang, W., Wen, Y., Wu, D.: Energy-efficient scheduling policy for collaborative execution
in mobile cloud computing. In: Proceedings of the IEEE INFOCOM, Italy, pp. 190–194
(2013)
22. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge
(2018)
Toward an Efficient CRWSN Node Based
on Stochastic Threshold Spectrum Sensing

Reham Kamel Abd El-Aziz1,2(&), Ahmad A. Aziz El-Banna1,


HebatAllah Adly1, and Adly S. Tag Eldien1
1
Electrical Engineering Department, Faculty of Engineering at Shoubra,
Benha University, Cairo, Egypt
reham.kamel19@feng.bu.edu.eg
2
Electronics and Communication Department,
Modern Academy for Engineering and Technology, Cairo, Egypt

Abstract. The demand for wireless sensor networks (WSNs) is growing in different applications. Most WSNs use the unlicensed band (ISM band), which leads to congestion in that band. On the other hand, minimizing the consumed energy without damaging the quality of service (QoS) of the network is vital in sensor network design. Cognitive radio-based wireless sensor networks (CRWSNs) afford some solutions to the problem of the scarce unlicensed band spectrum. Spectrum sensing is the main function of cognitive radio networks. In this paper, a novel method employing adaptive spectrum sensing is proposed for maximizing the sensing accuracy as well as the energy efficiency of the network. Spectrum sensing is performed by the Secondary User (SU) to identify whether the Primary User (PU) is idle; then, to verify that the primary user is actually idle, the secondary user senses the spectrum again in order to provide better protection for the primary user. Because a CRWSN is energy-constrained, the adaptive sensing interval can also be modified to optimize the energy efficiency of the network according to the varying activity of the PU. Simulation results are provided to validate the efficacy of the proposed algorithms in enhancing both spectrum sensing performance and energy efficiency.

Keywords: Wireless sensor network · Cognitive radio-based wireless sensor network · Spectrum sensing · Sensing time · Energy efficiency · Sensing performance

1 Introduction

The challenge of spectrum shortage has become more significant due to the massive rise of wireless communication techniques. Owing to the restricted frequency deployment systems, the restricted available spectrum cannot satisfy the increasing demand for wireless communications [1]. Cognitive radio (CR), with its flexible access to spectrum, has emerged to solve this problem. Based on a software-defined radio, cognitive radio is identified as an intellectual wireless communication platform that is conscious of its surroundings and communicates efficiently with optimal use of the radio spectrum [2].

In this paper, we consider a new WSN technology with cognitive radio, called Cognitive Radio Wireless Sensor Network (CRWSN). The CR technology enables sensor nodes to identify appropriate licensed bands by implementing spectrum sensing in CRWSNs, where the Secondary Users (SUs) can use spectrum gaps or white spaces opportunistically to increase bandwidth utilization while the Primary Users (PUs) are detected as idle. Because SUs must not clash with PUs, it is essential for the SU to track the activity of PUs accurately. The sensing time is a key factor that can improve the sensing performance. In general, a longer sensing time will minimize sensing errors and provide the PU with better protection. The optimum sensing time therefore strikes a balance between sensing performance and secondary throughput [3].
In this paper, we employ the sensing optimization results to enhance the efficiency of the CRWSN, making it more robust to noise fluctuations, since a stochastic method of threshold level determination helps in overcoming the noise fluctuations, while the time optimization offers more sensing accuracy and leads to high energy efficiency.
The remainder of this paper is organized as follows. Section 2 provides a literature review on spectrum sensing using the energy detection technique, in addition to describing the threshold expression under noise uncertainty using a stochastic method. Section 3 illustrates how the proposed stochastic threshold for the energy detection technique is implemented in the design phase of the proposed method, and formulates the energy-efficient optimization problem for the optimum sensing time. In Sect. 4, simulations are used to test the efficiency of the proposed scheme. Finally, Sect. 5 concludes the paper and discusses future work.

2 Basic Concepts

2.1 Energy Detection Based Spectrum Sensing


The most popular spectrum sensing methods currently are matched filter detection [4], energy detection [5, 6], cyclostationary detection [7] and eigenvalue-based detection [8]. Various comparisons between these approaches are widely covered in the literature, e.g. in [9]; however, we can summarize the main differences between these techniques as follows. The eigenvalue-based detection method does not require information about the PU's signal properties, but its computation is complex. The cyclostationary detection method is robust to noise uncertainty and able to distinguish between noise and the PU, which leads to high sensing accuracy, but it is complex. The matched filter detection method has the lowest execution time and is robust under low SNR conditions, but PU signal information is needed and the computational complexity is high.
The energy detector method, also known as radiometry or periodogram, is the most
popular way of detecting spectrum due to its low complexity of implementation and
rapid efficiency [10]. To formulate the spectrum sensing, a binary hypothesis is used.
H0 and H1 denote the idle hypothesis and the busy PU states respectively, while pi and
pb specify H0 and H1 probabilities, respectively. Therefore, pi + pb = 1.

In this paper, we consider the energy detection method to detect the PU operation.
The SU compares the energy obtained to a predefined threshold, and if the energy
obtained is greater than the limit, the PU will be considered busy; otherwise, the PU
will be considered idle. The energy detector test statistics G(z) can be expressed as
follows [3]:

G(z) = (1/σv²) Σ_{n=1}^{N} |z(n)|²,      (1)

where z(n) is the sampled signal and N is the number of samples taken during the sensing process. Under H1, z(n) = f(n) + v(n), where f(n) is the PU's signal, assumed to be an independent and identically distributed (i.i.d.) random process with zero mean and variance σf², and v(n) is white Gaussian noise with zero mean and variance σv². On the other side, if the PU is in H0, z(n) = v(n). The test statistic follows the central and non-central chi-square distribution with 2N degrees of freedom under hypotheses H0 and H1, respectively. The test statistic can be approximated as Gaussian because the central limit theorem applies when the value of N is high enough [2]. We can then describe the test statistic as follows:

N ðN; 2N Þ   H0
GðzÞ  2 ð2Þ
NðN ð1 þ cÞ; 2N 1 þ cÞ H1

where γ = σf²/σv² is the Signal to Noise Ratio (SNR) received from the PU. On this basis, it is possible to define the probability of detection pD and the probability of false alarm pFA as follows:
pD = p(H1 | H1) = Q( λ / (√(2N)(1 + γ)) − √(N/2) )      (3)

pFA = p(H1 | H0) = Q( λ / √(2N) − √(N/2) )      (4)

where the sensing threshold λ is compared with the received power. Specifically, the PU is considered active when the SU senses the PU and the energy obtained is greater than λ; otherwise the PU is considered idle. Q(·) is the Q-function. The number of samples N can be calculated as N = 2tW, where W is the PU signal bandwidth and t denotes the sensing time [2]. Using (3), the sensing threshold λ can be obtained as:
λ = √(2N)(1 + γ)( Q⁻¹(pD) + √(N/2) ),      (5)

where Q⁻¹(·) is the inverse of the Q-function defined above. Substituting λ into (4), pFA is obtained as:
pFA = Q( (1 + γ)Q⁻¹(pD) + γ√(N/2) ),      (6)

pD should be greater than or equal to a predefined threshold pD^th to ensure essential protection for the PU in CRWSNs. Based on (6), since pD is a fixed value, pFA decreases as the sensing time increases. Moreover, as pD decreases, the value of Q⁻¹(pD) increases, and pFA decreases as Q⁻¹(pD) increases. Therefore, pD is set to pD^th (pD = pD^th) to make sure that the available secondary throughput is maximal.
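A small numerical sketch of Eqs. (3)–(6), under the Gaussian approximation above: it uses SciPy's survival function for Q(·) and its inverse, and the operating point (sensing time, bandwidth, SNR) is illustrative.

```python
import numpy as np
from scipy.stats import norm

Q, Qinv = norm.sf, norm.isf           # Q(x) and Q^{-1}(p)

def samples(t_s, W):
    return 2 * t_s * W                # N = 2 t W

def threshold(N, gamma, p_d):
    # Eq. (5): lambda = sqrt(2N)(1+gamma) * ( Q^{-1}(p_D) + sqrt(N/2) )
    return np.sqrt(2 * N) * (1 + gamma) * (Qinv(p_d) + np.sqrt(N / 2))

def p_false_alarm(N, gamma, p_d):
    # Eq. (6): P_FA = Q( (1+gamma) Q^{-1}(p_D) + gamma * sqrt(N/2) )
    return Q((1 + gamma) * Qinv(p_d) + gamma * np.sqrt(N / 2))

# Illustrative operating point: 8.2 ms sensing over a 6 MHz band, SNR of -20 dB.
N = samples(8.2e-3, 6e6)
gamma = 10 ** (-20 / 10)
print("N =", N, " lambda =", threshold(N, gamma, 0.9),
      " P_FA =", p_false_alarm(N, gamma, p_d=0.9))
```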

2.2 Noise Uncertainty Stochastic Approach


The fluctuations in noise are treated as random signals because we cannot accurately determine their values; that is, they are uncertain. Let us denote the estimated noise variance as [11]:

10 log₁₀(σ̂v²) = α + 10 log₁₀(σv²)      (7)

where α follows a uniform distribution in the interval [−U dB, U dB], and at U = 0 there is no ambiguity about the noise. The resulting estimated noise variance σ̂v² falls in [(1/ρ)σv², ρσv²], where ρ = 10^(U/10). Under the condition of noise uncertainty, the signal power Ps should be larger than the size of the entire noise power interval in order to distinguish the signal-present situation from mere noise fluctuation σv² [2], i.e.,

Ps > ρσv² − (1/ρ)σv² = (ρ − 1/ρ)σv²      (8)

SNR = Ps/σv² > (ρ − 1/ρ)      (9)

Under both hypotheses, the mean of the test statistic is related to the noise variance. In practice, the estimated noise variance σ̂v² is used instead of the true noise variance σv². In the simulations, noise uncertainty is considered to better satisfy realistic implementation settings.
Noise uncertainty and the resulting quality degradation are two major problems in spectrum sensing: e.g., the false alarm probability pFA increases and the probability of detection pD decreases. In addition, a fixed-threshold energy detection algorithm provides degraded quality under noise uncertainty. This indicates that, in the presence of noise uncertainty, a dynamic threshold would yield better performance [10].
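The sketch below evaluates Eqs. (7)–(9) numerically: it converts a dB uncertainty U into the factor ρ and the minimum SNR (ρ − 1/ρ) required by Eq. (9); the U values chosen are illustrative.

```python
import math

def rho(U_dB):
    # Uncertainty factor from Eq. (7): the estimated noise variance lies
    # within [(1/rho) * sigma_v^2, rho * sigma_v^2] for U dB of uncertainty.
    return 10 ** (U_dB / 10)

def min_detectable_snr(U_dB):
    r = rho(U_dB)
    return r - 1 / r              # Eq. (9): SNR must exceed (rho - 1/rho)

for U in (0.5, 1.0, 2.0):         # illustrative uncertainty levels in dB
    s = min_detectable_snr(U)
    print(f"U = {U} dB -> rho = {rho(U):.3f}, min SNR = {s:.3f} ({10*math.log10(s):.1f} dB)")
```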

3 System Model

In this paper, we consider a typical CRWSN consisting of a single PU and a secondary-connection transmitter-receiver pair, as shown in Fig. 1. In addition, other source and sink nodes also exist for data transmission, and the main spectrum access links are the PU link (the licensed band) and the SU link (the opportunistic spectrum).

Fig. 1. Proposed system model.

3.1 The Proposed Threshold Expression

The Old Threshold Under Noise Uncertainty


The value of the threshold λ0 can be determined as follows. In the case of hypothesis H0, which corresponds to the presence of noise only, we know that g[n] consists of i.i.d. Gaussian random variables with zero mean and variance σv². When the number of samples is large enough, by the Central Limit Theorem (CLT) the noise approaches a Gaussian distribution (μv, σv²), which can be determined from simulation. Then the threshold value λ0 is [2, 6]:
 
λ0 = μv + σv · Q⁻¹(1 − (1 − PFA)^(1/N))      (10)

Stochastic Threshold
The probability of false alarm PFA will increase in conventional single-threshold detection with U dB of uncertainty if the actual noise variance σv² is greater than the expected noise variance σ̂v². The decision threshold λ could be selected for an optimal trade-off between PD and PFA. To obtain the optimum threshold value λS, knowledge of the noise intensity and signal strength is needed. Noise power can be estimated, but obtaining the signal power requires knowledge of the transmission and propagation characteristics. In practice, the threshold is usually chosen to fulfill a certain PFA, which only requires knowledge of the noise power. If the signal SNR is small, the situation is similar to hypothesis H0 and the detection probability PD will be affected, e.g., the probability of detection in conventional single-threshold detection with uncertainty U dB. When the signal SNR decreases, the test statistic will fall below the threshold more often if σv² is lower than σ̂v², which is equivalent to increasing the threshold, and the detection probability will change accordingly. The high and low threshold values can be set using the maximum and minimum noise uncertainty values, respectively [12], as follows:

λH = λ0 + U,  and  λL = λ0 − U      (11)

According to Eq. (11), there are three cases for the signal decision: 1) if G(z) > λH, the signal is present; 2) if G(z) < λL, the signal is absent; 3) if λL < G(z) < λH, there is no decision. In this scenario, the sensing fails and the receiver requests a new spectrum sensing from the cognitive user [12].
To overcome the problem of G(z) lying between λL and λH, the proposed stochastic threshold λS investigates various threshold values between the lower and higher thresholds and draws their histogram to gain insights from it, as follows. After building the histogram, the most repeated threshold value, i.e., the one with the greatest histogram count, is selected for use when the signal lies between λL and λH; λS is then defined as:

λS = Max_n( Σ_{i=1}^{k} λi )      (12)

where n is the total number of observations, i is the iteration number, and λi is a histogram function which counts the number of observations falling between the threshold values (known as bins). Additionally, k is the total number of bins (k = (λH − λL)/h) and h is the bin size. The threshold value corresponding to the largest value of the summation is selected and used as the stochastic threshold.
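A sketch of the histogram-based selection in Eq. (12), assuming candidate threshold observations between λL and λH have already been collected; the candidate values, limits and bin size below are randomly generated placeholders.

```python
import numpy as np

def stochastic_threshold(candidates, lam_low, lam_high, bin_size):
    """Pick the most frequently observed threshold value between lam_low and lam_high."""
    k = int(np.ceil((lam_high - lam_low) / bin_size))        # k = (lambda_H - lambda_L) / h bins
    counts, edges = np.histogram(candidates, bins=k, range=(lam_low, lam_high))
    best_bin = np.argmax(counts)                             # bin with the greatest count
    return 0.5 * (edges[best_bin] + edges[best_bin + 1])     # its centre is used as lambda_S

# Illustrative run: 1000 noisy threshold observations around -28 dB.
rng = np.random.default_rng(0)
observed = rng.normal(loc=-28.0, scale=0.5, size=1000)
print(stochastic_threshold(observed, lam_low=-30.0, lam_high=-26.0, bin_size=0.2))
```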

3.2 The Proposed Sensing Time Scheme


Following the frame structure in [3], time is divided equally into frames, each of which contains first a sensing phase and then a data transmission phase. The SU's spectrum sensing is assumed to be imperfect, causing sensing errors (i.e., false alarm and miss detection). The SU conducts spectrum sensing during the sensing phase to detect the behavior of the PU. If the sensing result indicates that the PU is idle, the SU, which always has data to transmit, uses the data transmission phase; otherwise, the SU remains silent. To simplify the problem, a time-framed structure is assumed to follow the operation of the PU. In other words, the spectrum is either occupied by the PU or vacant during one frame time.
Figure 2 displays the spectrum sensing frame structure, where T represents the frame time, ts denotes the spectrum sensing time, and D0 and D1 denote the sensing outcomes when the PU is idle and active, respectively. In the proposed scheme, the SU conducts the second spectrum sensing dynamically based on the first spectrum sensing result. In particular, the SU performs spectrum sensing for time ts and then keeps quiet if the sensing output is D1. If the sensing result is D0, spectrum sensing is performed by the SU for time ts again to verify the absence of the PU for better protection. The SU transmits data if the second sensing result is still D0; otherwise it keeps quiet. Moreover, when the result of the second sensing differs from the first sensing result, the final sensing decision is taken from the second sensing result.

Fig. 2. Structure of spectrum sensing frame.

It is possible to save the energy needed for spectrum sensing at the start of each frame by taking the primary activity level into account, as the sensing interval is expanded to several frames if the primary user is found active according to the final sensing results. As a result of improving the accuracy of spectrum sensing, the number of inaccurate data transmissions is reduced; this in turn increases network energy efficiency by avoiding excessive energy consumption due to incorrect data transmission. Six possible cases based on the results of the first and second spectrum sensing are shown in Fig. 3.

Fig. 3. Sensing time and sensing interval frame structure.

Based on the six cases, the PU activity is successfully detected in Cases 1 and 2. Case 3 triggers the miss detection problem, while Cases 4 and 6 contribute to the false alarm problem. Only in Case 5 is a good result achieved. Since, among the discussed cases, only Case 3 leads to the missed detection problem, p1m can be specified as:

p1m = pb(1 − pD)²      (13)

while p2m is identified as:

p2m = pb(1 − pD)      (14)
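A quick numeric check of Eqs. (13) and (14), assuming the detection probability of 0.9 used later in the experiments and an illustrative busy probability pb:

```python
# Miss detection with one sensing (p2m) versus two sensings (p1m), Eqs. (13)-(14).
p_d = 0.9          # detection probability target (as in the simulation settings)
p_b = 0.4          # illustrative probability that the PU is busy

p2m = p_b * (1 - p_d)          # single sensing
p1m = p_b * (1 - p_d) ** 2     # double sensing
print(p2m, p1m, p1m / p2m)     # ratio is (1 - p_d) = 0.1, i.e. a 10x reduction
```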

Based on Fig. 3, when data is transmitted (i.e., Case 3 and Case 5), spectrum sensing must be executed twice. As discussed above, data transmission is valid only when the primary user's actual state is U0. The invalid throughput probability resulting from miss detection, px1, and the valid throughput probability, px2, can be estimated by the following equations [3]:

px1 = pb(1 − pD)²      (15)

px2 = pi(1 − pFA)²      (16)

Thus, the probability that the SU transmits data can be determined using the following expression, as modeled in [3]:

PX = PX1 + PX2      (17)

As shown in Fig. 3, the SU may stay silent in two situations: 1) the SU performs spectrum sensing once and the result is D1, so it becomes silent for n frames, as in Cases 1 and 4; 2) the SU performs spectrum sensing once with result D0, then performs the second spectrum sensing with result D1, so it becomes silent for n frames, as in Cases 2 and 6. The probabilities of performing spectrum sensing once and twice are py1 and py2, respectively, and can be expressed as

py1 = pb·pD + pi·pFA      (18)

py2 = pb(1 − pD)pD + pi(1 − pFA)pFA      (19)

when D1 is the final sensing result. Thus, py, the probability that the final sensing result is D1, is given as follows:

py = py1 + py2      (20)

With Px and Py known, and assuming two successive frames, the throughput can be discussed; there are four situations for the secondary throughput: 1) transmit data then remain silent for n frames, 2) remain silent for n frames then remain silent for n frames, 3) transmit data then transmit data, 4) remain silent for n frames then transmit data. Situations 1, 3, and 4 yield correct secondary throughput ST1, ST3, and ST4, respectively, whose equations are defined as follows [3]:

ST1 = ST4 = px2·py·(T − ts)·C / (n + 1)      (21)

ST3 = px2²(T − ts)C + px1·px2·(T − ts)C      (22)



where C denotes the channel capacity of the SU without PU interference, which can be described according to the Shannon theorem as C = log₂(1 + γs), where γs indicates the SNR received at the SU transmitter. Furthermore, n gives the number of frames that the SU stays silent. For ST3, the valid throughput over two frames (the current and next frames) is represented by the term px2²(T − ts)C, while the term px1·px2(T − ts)C is the throughput achieved by one frame when the other frame suffers a miss detection. The total average valid throughput per average frame, ST, is:

ST(ts) = ST1 + ST3 + ST4      (23)

Since n is the number of frames, it is set to positive integers; its value can be modified depending on the PU activity. The values of pi and pb determine the value of n: as the PU becomes busier, the sensing interval increases, so n depends essentially on the probability that the PU busy state will continue. Assume that the PU's current state is U1; then the probability that it will stay occupied for n frames, ps(n), is determined by:

ps(n) = pb^(n−1)(1 − pb),   n ∈ {1, 2, 3, …}      (24)

Then the probability of the PU being occupied for at most n frames, Ps(n), can be described as:

Ps(n) = Σ_{i=1}^{n} ps(i)      (25)

A threshold w is specified for Ps(n), where 0 ≤ w ≤ 1. The value of n depends on w based on the following equation:

n = min{n : Ps(n) ≥ w}      (26)

After the above clarification of the two proposed schemes, the stochastic threshold and the sensing time, the proposed CRWSN node uses the two schemes as follows: when the CRWSN node is turned on, it explores the working environment to determine the appropriate value of the stochastic threshold, which is then used as an offline threshold for all spectrum sensing operations, whether performed as a first or a second sensing; this enhances the sensing performance.
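The sensing-interval rule of Eqs. (24)–(26) can be computed directly, as in the sketch below; the pb values shown are illustrative.

```python
def sensing_interval(p_b, w):
    """Smallest n with P_s(n) = sum_{i=1..n} p_b^(i-1) (1 - p_b) >= w, Eq. (26)."""
    n, cumulative = 0, 0.0
    while cumulative < w:
        n += 1
        cumulative += p_b ** (n - 1) * (1 - p_b)   # p_s(n), Eq. (24)
    return n

# With w = 0.5 (as in the simulations): a busier PU gives a longer sensing interval,
# while p_b < 0.5 (i.e. p_i > 0.5) gives n = 1, matching the discussion of Fig. 6-b.
for p_b in (0.2, 0.5, 0.8):
    print(f"p_b = {p_b}: n = {sensing_interval(p_b, w=0.5)}")
```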

4 Performance Evaluation

This section investigates the proposed scheme's performance evaluation using MATLAB. The proposed scheme is compared with two other schemes for optimizing the sensing time: the adaptive sensing threshold discussed in [13] and the hybrid threshold discussed in [14].

4.1 Simulation Parameters


The simulation scenario is a basic CRWSN, consisting of a single PU and a secondary
link with a randomly assigned transmitter-receiver sensor node pair within the PU’s
communication range. The band that was licensed occupied by the PU is assigned to
the SUs. Other simulation parameters as, P1D ¼ 0:9; W ¼ 6 MHZ, T ¼ 0:2 s;
w ¼ 0:5; cs ¼ 20 dB, c ¼ 20dB, and C ¼ 6:6582 bits/sec/Hz.

4.2 Simulation Results


Figure 4-a illustrates the false alarm probability and the missed detection probability of the old threshold expression for different SNR values at U = 0 dB and U = 1 dB noise uncertainty. The missed detection probability is higher in the U = 1 dB case than in the U = 0 dB case, which indicates that the sensing results obtained using the old threshold are affected by noise uncertainty; the false alarm probability also increases.

Fig. 4. a) For different SNR, missed detection prob. and false alarm prob. of the old threshold at
U = 0 dB and U = 1 dB noise uncertainty respectively. b) Histogram for Stochastic-threshold in
dB at environment of noise uncertainty.

Figure 4-b shows the histogram of the number of trials against the threshold levels in dB under ambient noise uncertainty. The value with the highest number of iterations occurs at a stochastic threshold of −28.2 dB, so it has been selected.
Figure 5-a plots the false alarm and missed detection probabilities of the stochastic and double thresholds versus SNR for a noise uncertainty of 1 dB. The two schemes, double threshold and stochastic threshold, have approximately the same performance, taking into consideration the advantage of the stochastic threshold in the case of ambiguity, when the received signal level lies between the higher and lower thresholds.
Figure 5-b demonstrates the probability of missed detection against SNR using the old, double and stochastic thresholds, respectively, for noise uncertainty U = 0 dB and U = 1 dB. Additionally, a missed detection probability Pm = 0.1 is obtained, according to the 802.22 standard maximum acceptable value of PFA = 0.1, at a sensing duration of 8.2 ms

Fig. 5. a) For different SNR, probabilities of false alarm and missed detection of the stochastic
and double thresholds at U = 1 dB noise uncertainty. b) For different SNR, Missed detection
probability for the old threshold at U = 0 dB and U = 1 dB noise uncertainty, and the double and
stochastic threshold under U = 1 dB noise uncertainty.

for the original threshold at SNR = 25.5 dB under U = 0 dB; under U = 1 dB it is obtained at 25 dB SNR with PFA = 0.4, which is not accepted by the standard. For U = 1 dB, the target PFA = 0.1 is obtained at 22.6 dB SNR for the stochastic threshold and at 22.5 dB SNR for the double threshold. When the noise uncertainty U is 0 dB, i.e., accurate noise information exists, the old threshold exceeds the stochastic and double thresholds by 3 dB, as illustrated in Fig. 5-a. Furthermore, when the expected noise uncertainty is U = 1 dB, the performance of the original threshold is worse than that of the stochastic and double thresholds with respect to the PFA = 0.1 condition. Moreover, the stochastic threshold outperforms the double threshold by 0.1 dB, as seen in Fig. 5-b.
To cover the various levels of noise uncertainty that occur in different operating environments, we conducted a study to compute the corresponding stochastic threshold values. Figure 6-a summarizes the results of this study, where an increase in the noise uncertainty reduces the stochastic threshold. Prior knowledge of these values in the initialization phase helps the node to save time in the subsequent operation phases.
Figure 6-b illustrates the sensing interval n as a function of pi when the threshold w is 0.5 and the sensing output is D1. As can be seen, as pi increases, the sensing interval n decreases. If pi is greater than 0.5, the sensing interval n is always 1; in other words, the SU executes spectrum sensing at the start of each frame when pi > 0.5. The sensing interval n shrinks as pi increases, hence the SU executes spectrum sensing more frequently and there are additional chances for spectrum holes to be utilized, so the secondary throughput is enhanced. Moreover, as pi decreases, the sensing interval increases, thereby reducing the spectrum sensing energy consumption and boosting energy efficiency. In fact, the choice of the threshold w strongly determines the trade-off between energy efficiency and secondary throughput.
Figure 7-a shows the comparison of the three schemes in terms of secondary throughput for a fixed T = 0.2 s as a function of pi. As is apparent, the proposed scheme's secondary throughput is lower than those of the schemes suggested in [13] and [14]. This is

Fig. 6. a) Stochastic threshold values for different noise uncertainty. b) The sensing interval as a function of pi.

because the SU must execute the sensing again to confirm the initial sensing result when it indicates that the PU is idle, which reduces the available data transmission time. Moreover, the sensing interval becomes extended when the primary user activity becomes higher and the sensing outcomes show that the primary user is active. This decreases the energy consumed by spectrum sensing; however, while the SU stays silent for n frames, chances for data transmission are also wasted. These are the two key reasons why the suggested scheme's secondary throughput is lower than that of the other two schemes.

Fig. 7. a) Secondary throughput comparison. b) Miss detection probability comparison.

Figure 7-b compares the miss detection probability p1m of the proposed scheme with those of the existing schemes in [13] and [14]. The proposed scheme's miss detection probability is always smaller than that of the two other schemes. Because the SU executes spectrum sensing only once in [13] and [14], the miss detection probability of those two approaches is equal to p2m; according to Eq. (14), the ratio between p1m and p2m is 1 − pD. Since pD is set to the same value as in the two other techniques (i.e., 0.9),

with our suggested model the miss detection probability is reduced by a factor of 10. In particular, as pi becomes smaller, the difference between p1m and p2m becomes significantly larger. Even though more time is spent detecting the primary user when the sensing result is D0, and the secondary throughput per average frame is lower, the primary user receives more protection because of the lower miss detection probability. Therefore, the decrease in the number of incorrect data transmissions reduces energy consumption, which increases the energy efficiency of the network and extends the network lifespan. Within this extended lifetime, the network can be used more, compensating for the wasted secondary throughput.

5 Conclusion and Future Work

This paper presents a model for a CRWSN node which determines the proposed stochastic threshold in the first run of the node; this helps to overcome the noise uncertainty in the sensing environment and shows an enhancement in the sensing result. The node then executes sensing, on the basis of the first sensing result, for either one or two intervals. This helps the CRWSN, as a secondary user, to conduct spectrum sensing again to verify that the primary user is in fact silent when the first sensing result indicates that it is silent. Furthermore, the proposed solution respects the PU's level of activity: the sensing interval may be varied to further enhance the energy efficiency depending on the different primary user activity levels.
Moreover, the simulation analysis validates that higher energy efficiency and better spectrum sensing performance result from the proposed scheme. The simulation analysis shows that the proposed stochastic threshold, in the presence of 1 dB noise uncertainty, exceeds the double threshold at PFA = 0.1 and a sensing time of 8.2 ms by more than 0.1 dB. As future work, we can try the proposed CRWSN node model in a cooperative sensing situation and study its effect on the cooperative sensing performance.

References
1. Ivanov, A., Dandanov, N., Christoff, N., Poulkov, V.: Modern spectrum sensing techniques
for cognitive radio networks: practical implementation and performance evaluation. Int.
J. Comput. Inf. Eng. 12(7), 572–577 (2018)
2. Rabie, A., Yousry, H., Bayomy, M.: Stochastic threshold for spectrum sensing of
professional wireless microphone systems. Int. J. Comput. Sci. Netw. 4(4) (2015)
3. Kong, F., Cho, J., Lee, B.: Optimizing spectrum sensing time with adaptive sensing interval
for energy-efficient CRSNs. IEEE Sens. J. 17(22), 7578–7588 (2017)
4. Lee, J.W., Kim, J.H., Oh, H.J., Hwang, S.H.: Energy detector using hybrid threshold in
cognitive radio systems. IEICE Trans. Commun. E92-B(10), 3079–3083 (2009)
5. Kay, S.M.: Fundamentals of Statistical Signal Processing: Detection Theory. Prentice-Hall,
Upper Saddle River, (1998)
6. Atapattu, S., Tellambura, C., Jiang, H.: Analysis of area under the ROC curve of energy
detection. IEEE Trans. Wireless Commun. 9(3), 1216–1225 (2010)
7. Sutton, P.D., Nolan, K.E., Doyle, L.E.: Cyclostationary signatures in practical cognitive
radio applications. IEEE J. Sel. Areas Commun. 26(1), 13–24 (2008)

8. de Souza Lima Moreira, G., de Souza, R.A.A.: On the throughput of cognitive radio
networks using eigenvalue-based cooperative spectrum sensing under complex Nakagami-m
fading. In: Proceeding of International Symposium. Network, Computer Communication
(ISNCC), pp. 1–6, May 2016
9. Kyryk, M., Matiishyn, L., Yanyshyn, V., Havronskyy, V.: Performance comparison of
cognitive radio networks spectrum sensing methods. In: Proceeding of International
Conferences on Modern Problems Radio Engineering, Telecommunication and Computer
Science (TCSET), pp. 597–600, February 2016
10. Farag, H.M., Ehab, M.: An efficient dynamic thresholds energy detection technique for
cognitive radio spectrum sensing. In: Proceeding of Computer Engineering Conference
(ICENCO), pp. 139–144, December 2014
11. Prashob, R.N., Vinod, A.P., Krishna, A.K.: An adaptive threshold based energy detector for
spectrum sensing in cognitive radios at low SNR. In: The 7th IEEE VTS Asia Pacific
Wireless Communication (2010)
12. Xie, S., Shen, L.: Double-threshold energy detection of spectrum sensing for cognitive radio
under noise uncertainty environment. In: International Conference on Wireless Communi-
cations & Signal Processing (2012)
13. Luo, L., Roy, S.: Efficient spectrum sensing for cognitive radio networks via joint
optimization of sensing threshold and duration. IEEE Trans. Commun. 60(10), 2851–2860
(2012)
14. Li, X., Cao, J., Ji, Q., Hei, Y.: Energy efficient techniques with sensing time optimization in
cognitive radio networks. In: Proceeding of IEEE Wireless Communication and Networking
Conference (WCNC), pp. 25–28, April 2013
Video Captioning Using Attention Based
Visual Fusion with Bi-temporal Context
and Bi-modal Semantic Feature Learning

Noorhan K. Fawzy(&), Mohammed A. Marey(&),


and Mostafa M. Aref(&)

Faculty of Computer and Information Sciences,


Ain Shames University, Cairo, Egypt
{norhan.khaled,mohammed.marey,
mostafa.aref}@cis.asu.edu.eg

Abstract. Video captioning is a recent emerging task that describes a video


through generating a natural language sentence. Practically, videos are untrimmed, where both localizing and describing the event of interest are crucial for many vision-based real-life applications. This paper proposes a deep neural
network framework for effective video event localization through using a
bidirectional Long Short Term Memory (LSTM) that encodes past, current and
future context information. Our framework adopts an encoder decoder network
that accepts the event proposal with highest temporal intersection with ground
truth for captioning. Our encoder is fed with attentively fused visual features,
extracted by a two stream 3D convolution neural network, along with the
proposal’s context information for generating an effective representation. Our
decoder accepts learnt semantic features that represent bi-modal (two modes)
high-level semantic concepts. We conduct experiments to demonstrate that
utilizing both semantic features and contextual information provides better
captioning performance.

Keywords: Video to natural language · Encoder-decoder · Recurrent neural network · Attention-based LSTM · Bidirectional LSTM · Temporal action localization · Deep learning

1 Introduction

Video captioning is a machine intelligence task whose ultimate goal is generating a natural language description for the content of a video clip, just like humans. Many recent real-life intelligent applications, such as video search and retrieval and automatic video subtitling for supporting blind people, have an emerging need for such descriptions. Recent large-scale activity datasets [9, 10] have highlighted the success of many models that solve the video action recognition task. Such models would output labels like jumping, dancing or sporting archery. The limited level of detail provided by those models is considered a key limitation. To compensate for this limitation, many subsequent works in the research community [1–3] have embraced the task of explaining video semantics using natural language sentences. These works would
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 65–78, 2021.
https://doi.org/10.1007/978-3-030-58669-0_6
66 N. K. Fawzy et al.

likely describe a video with an informative sentence such as "A man is shooting a bow towards a target". In practice, videos are not pre-segmented into short clips that contain the action of interest to be described in natural language.
The research community has recently been striving to develop models that are able to identify all events in a single pass of the video [4]. We design a temporal localization module and a video captioning module so that, for each input video, a descriptive sentence along with its time location is generated automatically. The main tasks that our framework covers can be listed as:
1. The detection of the temporal event segment. For this, we adopt a bidirectional single-pass temporal event proposal generation model that encodes past, current and future video information (Fig. 4).
2. The representation of this event segment. It is important to note that representing the target event within the video sequence in isolation, without considering its contextual information, will not produce a consistent video caption. Considering the temporally neighbouring video contents of the target event is necessary for providing antecedents and consequences for understanding the event (Fig. 3).
3. Generating a description for the detected event. Linking visual features to textual captions directly, as done in many works [1–3], may neglect many rich intermediate and high-level descriptions, such as people, objects, scenes, and actions. To address this issue, this study extracts two types (bi-modal) of high-level semantic concepts: concepts that describe objects and backgrounds/scenes are referred to as static semantic concepts, whereas concepts that describe dynamic actions are referred to as dynamic semantic concepts.
In Sect. 2 we discuss related works. Our proposed framework is described in Sect. 3. Sections 4 and 5 discuss the performance evaluation and implementation details, respectively. Finally, our conclusion and future work are given in Sect. 6.

2 Related Works

Our video captioning framework requires both temporal localization and descriptions
for all events that may happen in a video. We review related works on the above two
tasks.

2.1 Action Recognition and Localization


Deep convolutional networks with 3D kernels, such as the 3D ResNet [5, 6] and 3D Inception [7, 8] architectures, are able to extract motion characteristics from input frame volumes over time. They achieve surprisingly high recognition performance on a variety of action datasets [9, 10]. The combination of these 3D convolutional networks with temporally recurrent layers such as LSTMs, as in [6, 8, 12], has also shown great improvement in performance. [13] argued that untrimmed videos contain target actions that usually occupy only a small portion of the whole video stream. Some current methods for action temporal localization [13, 14] rely on applying action classifiers at every time
location and at multiple temporal scales, in a temporal sliding-window fashion. Major drawbacks of these methods are their high computational complexity, the fact that the scales of the sliding windows are predetermined from the statistics of the dataset, and that the generated temporal boundaries are usually approximate and fixed during classification.
Recent research [4, 15, 27] has worked on avoiding the high computational complexity drawback of sliding windows by using a CNN-RNN architecture. The work in [4] followed an end-to-end proposal generation model. Their model scans an untrimmed video stream of length L frames divided into T = L/d non-overlapping time steps, where d = 16 frames, as in Fig. 1(a). Each time step is encoded with the acti-
vations from the top layer of a 3D convolutional network pre-trained for action clas-
sification (C3D network [16]), as in Fig. 1 (b). A recurrent neural network (RNN) was
used for modelling the sequential information into a discriminative sequence of hidden
states, as in Fig. 1 (c). The hidden representation at each time step is used for producing
confidence scores of multiple proposals with multiple time scales that all end at time t,
as illustrated in Fig. 1(d). However, these methods neglect future event context and only encode past and current event context information when predicting proposals.

2.2 Action Captioning


Orthogonal to the work on action proposals, early deep learning approaches [1, 2] directly connected video with language, aiming to translate video pixels to natural language with a single deep neural network under the unified encoder-decoder framework. A convolutional neural network (CNN) such as ResNet [5], C3D [15] or a two-stream network [17] was used as an encoder for extracting features from the video. A mean pooling of features across all frames is applied to obtain a fixed-length vector representation, which is considered a simple and reasonable semantic representation for short video clips, Fig. 2(a). Translation to natural language is done via a stacked two-layer recurrent neural network (RNN), typically implemented with long short-term memory (LSTM) [18, 19], Fig. 2(b). However, these approaches treat the frame features of the video equally, without any particular focus. Using a single temporally collapsed feature vector to represent such videos, Fig. 2(a), leads to an incoherent fusion of the dependencies and the ordering of activities within an event.
Extracting the temporal structure implied within the input video is important. Many follow-up works, such as [20, 21], explore improving the model's capability of encoding both local motion features and global temporal structure. They proposed a spatio-temporal 3D CNN that accepts a 3D spatio-temporal grid of cuboids. These cuboids encode histograms of oriented gradients, oriented flow and motion boundary descriptors (HoG, HoF, and MBH) [22]. They argued that average pooling these local temporal motion features would collapse them and neglect the model's ability to utilize the video's global temporal structure. For this, a soft attention mechanism was adopted, which permits the RNN decoder to weight each temporal feature vector.
Although the attention-based approaches have achieved excellent results, they still
ignore representing high-level video concepts/attributes. The work in [23] extracted
high-level explicit semantic concepts which further improved visual captioning.

3 Framework

The framework of our approach consists of three components:


1) Visual feature extraction.
2) Event Proposal generation.
3) Captioning (Sentence generation).
In this section, we introduce each component of the framework in detail.

3.1 Visual Features Extraction


The input video with L frames is discretized by dividing it into T non-overlapping time steps, where each time step is of size d = 16 frames. We adopt a two-stream [16] 3D residual neural network for the extraction of spatio-temporal features [28] from each clip: one stream learns to extract motion features from RGB frames using a 3D ResNet-18 [5], and the other learns abstract high-level motion features from motion boundary frames using a 3D ResNeXt-101 [5]. Motion boundary frames carry optimized, smoothed optical flow inputs. The reason for using the two-stream 3D approach is that the study in [7] found that using optical flow as input to 3D CNNs results in a higher level of performance than can be obtained from RGB inputs, but that the best performance is achieved by combining both. The reason we use motion boundaries is that optical flow represents the absolute motion between two frames, which contains motion from foreground objects as well as background camera motion. Motion boundaries are the derivative of the flow. In many cases, camera motion is locally translational and varies smoothly across the image plane, so it is eliminated in the motion boundary frames.
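As a concrete illustration of this step, the following is a minimal sketch of a two-stream clip encoder, assuming PyTorch and torchvision; torchvision's r3d_18 is used here as a stand-in for both backbones, whereas the paper uses a 3D ResNet-18 for the RGB stream and a 3D ResNeXt-101 for the motion-boundary stream [5].

# Minimal two-stream clip encoder; r3d_18 stands in for both 3D backbones.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

class TwoStreamClipEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = r3d_18(pretrained=True)   # paper: 3D ResNet-18 on RGB clips
        self.mb_stream = r3d_18(pretrained=True)    # stand-in for the 3D ResNeXt-101 motion-boundary stream
        self.rgb_stream.fc = nn.Identity()          # keep the 512-d clip feature
        self.mb_stream.fc = nn.Identity()

    def forward(self, rgb_clip, mb_clip):
        # each clip: (batch, 3, 16, H, W) -- one 16-frame time step
        return self.rgb_stream(rgb_clip), self.mb_stream(mb_clip)

# Usage: per-stream features for one time step of a video
encoder = TwoStreamClipEncoder().eval()
rgb = torch.randn(1, 3, 16, 112, 112)
mb = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    v_rgb, v_mb = encoder(rgb, mb)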

3.2 Proposal Generation


This pipeline is an updated version of the work in [4], which used only a single-direction RNN. The fused two-stream features are fed to a bidirectional LSTM recurrent sequence model. At each time step t, we pass the hidden state, which encodes the sequence of visual features observed up to time t, through a fully connected layer with a sigmoid nonlinearity, as in Eq. 1 and Eq. 2. This produces scores for multiple (K) proposals, as in Fig. 1(d). Proposals have different time scales with a fixed ending boundary t and K confidence scores. This is done for each LSTM direction (forward and backward) independently, Fig. 4.

C_t^→ = σ(W_c h_t^→ + b_c)   (1)

C_t^← = σ(W_c h_t^← + b_c)   (2)

Finally, after the passes in the two directions, we obtain a number N of proposals collected from all time steps of both directions. We fuse the two sets of scores for the same proposals, yielding the final scores, as in Eq. 3. Thus, at each time step t, we take both the forward confidence score C_i^→ and the backward confidence score C_i^← of each proposal to compute the final proposal confidence score C_p. Proposals with a score larger than a threshold are finally selected for further captioning.

Fig. 1. Single pass temporal proposal generation.

Fig. 2. (a) CNN encoder for extracting frame visual features. (b) Collapsing the features across
the entire video through mean pooling and passing them to the stacked LSTM decoder.

Fig. 3. Fusing both local context information and target event’s content for generating caption
words.

C_i^t = C_i^→ × C_i^←,   i = 1, …, K   (3)

The context of a proposal, which refers to its future and past context, can be obtained from the hidden states h_f^→ and h_p^← of the final forward and backward LSTM layers respectively, as illustrated in Fig. 4. An action proposal has a start time step S and an end time step E, Fig. 5.
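The following sketch illustrates this step under the assumption of a single shared bidirectional LSTM with per-direction scoring heads (layer sizes are illustrative, not the authors' exact configuration); the two confidence sets are fused by element-wise multiplication as in Eq. 3.

# Bidirectional proposal scorer: FC + sigmoid on each direction's hidden state,
# scores fused by multiplication as in Eq. 3. Sizes are illustrative.
import torch
import torch.nn as nn

class BiProposalScorer(nn.Module):
    def __init__(self, feat_dim=1024, hidden=512, k_scales=16):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score_fw = nn.Linear(hidden, k_scales)   # K proposals per time step, forward pass
        self.score_bw = nn.Linear(hidden, k_scales)   # K proposals per time step, backward pass

    def forward(self, clip_feats):
        # clip_feats: (batch, T, feat_dim) fused two-stream features, one per time step
        h, _ = self.lstm(clip_feats)                  # (batch, T, 2*hidden)
        h_fw, h_bw = h.chunk(2, dim=-1)               # forward / backward hidden states
        c_fw = torch.sigmoid(self.score_fw(h_fw))     # forward confidences
        c_bw = torch.sigmoid(self.score_bw(h_bw))     # backward confidences
        return c_fw * c_bw, h_fw, h_bw                # fused scores (Eq. 3) and context states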

3.3 Caption Generation


To implement caption generation using semantic features, a dynamic semantic concepts
network (DSC-N) is built upon an encoding LSTM. It accepts the visual features from
the temporal stream at each time step within the action proposal. Also the static
semantic concepts network (SSC-N) accepts visual features from the spatial stream.
The last output of each of the dynamic and static semantic concept LSTM networks is
passed to a fully connected layer and sigmoid activation function. Each outputs a
probability distribution, where p_d holds the probabilities of the set of dynamic concepts (verbs) and p_s holds the probabilities of the set of static concepts (nouns) extracted from the dataset.
We follow the encoder-decoder framework using LSTMs for generating the cap-
tions. The potential power of the encoder-decoder lies in the fact that it is a generative
neural network that can map sequences of different lengths to each other. The input to the LSTM sequence decoder is obtained by applying an attention mechanism on the concatenated semantic concept features, referred to as Attended Semantic Concepts (ASC), so as to treat each semantic feature differently at each time step, Fig. 5.
Both dynamic and static semantic concepts are concatenated into E_t and serve as input to the attention layer. A weight W_a, reflecting which semantic concept features to focus on at the current time step t, is learnt within the attention layer from the input semantic features. The attended semantic concept features c_t can be calculated using Eq. 4, where b_a is the bias.

c_t = softmax(W_a · E_t + b_a)   (4)

The attended semantic concept features c_t serve as inputs to the decoding LSTM. Conventionally, the attention should be directed to an object if the word to be generated is a noun, and similarly the focus should be on the behaviour if the word is a verb. The output hidden state of the decoder at each time step is passed to a fully connected layer and a softmax operation that gives the probability distribution of the caption words. Finally, the caption is generated from the output words aligned in order from the first word to "EOS", which indicates the end of the sentence. Since context is vital when captioning a detected proposal, we initialize the hidden state at t = 0 of the decoding LSTM by fusing the proposal states from the forward and backward passes, which capture the past and future contexts h_f^→ and h_p^←, together with the visual features of the detected proposal, using a sequence encoder, Fig. 5. The visual input to the sequence encoder x_t is defined in Eq. 5 and Eq. 6.

x_t = F_t(S_n)   (5)

F_t(S_n) = fusion(h_f^→, h_p^←, V^t, H_{t-1}),   n = 1, …, N   (6)

where V^t = {v_i}, i = 1, …, P, are the two-stream visual features at each time step (of d = 16 frames) within the proposal S_n; as mentioned before, we have P time steps, which start at S and end at E within each proposal, Fig. 5. We design a dynamic attention mechanism to fuse the visual features V = {v_i}, i = S, …, E, and the context vectors h_f^→ and h_p^←. Dynamically attending to the features at each time step can effectively improve the decoder's captioning performance. To this end, we adopt an attention-based LSTM encoder. This means that for each proposal we fuse its hidden states together with its visual features through a weighted linear combination: at the t-th time step of the sequence encoder, the un-normalized relevance score r_i^t of the features of each time step i within a proposal is obtained as in Eq. 7, where S and E denote the start and end time steps of the proposal, H_{t-1} is the hidden state of the sequence encoder at time step t-1, and vector concatenation is applied to h_f^→ and h_p^←. The weights of v_i are obtained by softmax normalization as in Eq. 8. The attended visual feature is generated by a weighted sum through Eq. 9. The final input to the sequence encoder can be expressed as Eq. 10. The last output of the sequence encoder (context vector) is passed to initialize the hidden state h_{t=0} of the LSTM caption decoder, Fig. 5. This vector encapsulates information from all input elements, aiding the decoder to make accurate predictions.

Fig. 4. Representing both future and past context information encoded in the hidden states h_f^→ and h_p^← of the forward and backward LSTMs at a time step t. The proposal prediction step multiplies the backward and forward confidence scores for each proposal at the current time step, producing the final score per proposal.

Fig. 5. Caption generation module. The sequence decoder is initialized with attended visual
features along with past and future context information. Whereas the input to the decoder is the
attended semantic extracted features.

r_i^t = W_a^T · tanh(W_v v_i + W_h [h_f^→; h_p^←] + W_H H_{t-1} + b),   i = S, …, E   (7)

β_i^t = exp(r_i^t) / Σ_{m=S}^{E} exp(r_m^t)   (8)

V^t = Σ_{i=1}^{P} β_i^t · v_i   (9)

F_t(S_n) = [V^t; h_f^→; h_p^←]   (10)
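A minimal sketch of this attended fusion (Eqs. 7–10) is given below; the dimensions are illustrative assumptions, and H_prev denotes the sequence-encoder state H_{t-1}.

# Attended visual fusion of Eqs. (7)-(10) for a single proposal.
import torch
import torch.nn as nn

class TemporalVisualAttention(nn.Module):
    def __init__(self, v_dim=1024, ctx_dim=1024, h_dim=512, att_dim=256):
        super().__init__()
        self.W_v = nn.Linear(v_dim, att_dim, bias=False)
        self.W_h = nn.Linear(ctx_dim, att_dim, bias=False)   # acts on [h_f; h_p]
        self.W_H = nn.Linear(h_dim, att_dim)                 # its bias plays the role of b
        self.w_a = nn.Linear(att_dim, 1, bias=False)         # W_a^T

    def forward(self, V, ctx, H_prev):
        # V: (P, v_dim) clip features inside the proposal; ctx: (ctx_dim,) = [h_f; h_p]; H_prev: (h_dim,)
        scores = self.w_a(torch.tanh(self.W_v(V) + self.W_h(ctx) + self.W_H(H_prev)))   # Eq. (7)
        beta = torch.softmax(scores.squeeze(-1), dim=0)      # Eq. (8)
        v_att = (beta.unsqueeze(-1) * V).sum(dim=0)          # Eq. (9)
        return torch.cat([v_att, ctx], dim=-1)               # Eq. (10): the encoder input F_t(S_n)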

4 Performance Evaluation
4.1 Dataset
To train and assess the performance of caption generation, we use the MSR-VTT (video to text) dataset [1], which is divided into 20 categories of different activities, such as music, cooking, sports, education, politics, etc. From these categories we work on the sports category, which consists of 785 videos, which we divided into 80% training, 10% validation and 10% testing. Each video has around 20 natural language captions. For training the semantic concept feature networks, a set of candidate words of size C is defined from all training captions. Among them, we choose the most frequent 500 verbs and 1,500 nouns as the designated vocabularies, which set the sizes of the fully connected layers of the dynamic and static semantic concept networks respectively. Each video has a ground-truth one-hot (multi-label) vector in which ones mark the words from its captions that appear in the designated vocabulary.
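As an illustration, a multi-label ground-truth vector of this kind could be built as follows; the use of NLTK for POS tagging is an assumption, and noun_vocab/verb_vocab are hypothetical index maps over the 1,500 nouns and 500 verbs.

# Multi-label concept targets for one video from its captions.
# nltk.download('punkt') and nltk.download('averaged_perceptron_tagger') may be required.
import numpy as np
import nltk

def concept_targets(captions, noun_vocab, verb_vocab):
    nouns = np.zeros(len(noun_vocab), dtype=np.float32)   # SSC-N target
    verbs = np.zeros(len(verb_vocab), dtype=np.float32)   # DSC-N target
    for cap in captions:                                  # ~20 captions per video
        for word, tag in nltk.pos_tag(nltk.word_tokenize(cap.lower())):
            if tag.startswith('NN') and word in noun_vocab:
                nouns[noun_vocab[word]] = 1.0             # static concept present
            elif tag.startswith('VB') and word in verb_vocab:
                verbs[verb_vocab[word]] = 1.0             # dynamic concept present
    return nouns, verbs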

4.2 Experiments
We evaluate our adopted bidirectional LSTM temporal proposal module for detecting event segments that are close to the real segments within MSR-VTT (sports). The recall of this module, Table 1, is better than that of a single-direction LSTM, which confirms that bidirectional prediction, which encodes past, current and future context, indeed improves proposal quality compared to single-direction prediction, which encodes past and current context only. To assess the performance of our video captioning module we conduct experiments on the following:
1. Measuring the accuracy for each semantic feature extraction network. We used the
Mean Square Error (MSE) metric to evaluate the difference between the generated
semantic words and the ground truth words per network.

2. Investigating the effects of the semantic features (DSC-N, SSC-N) and of the visual context captured through temporal visual attention (TVA-C) on caption generation performance. BLEU [25] and CIDEr-D [26] are typical evaluation metrics for measuring the performance (precision and recall) of caption generation. During caption word prediction, the sequence encoder model encodes the input sequence once and returns the hidden states, of which the last state is used to initialize the sequence decoder model. In Table 3 we report the evaluation of the following methods:

– (TVA-C) + ASC: this method uses the temporal visual attention with context encoder for initializing the caption generation sequence decoder, and the attended concatenation of the dynamic and static semantic concepts (ASC), composed of DSC-N + SSC-N, as the input to the decoder at each time step.
– Bi-H + ASC: this method uses the hidden states of the proposal event from both directions of the bidirectional LSTM, Fig. 4, concatenated, as the initial hidden state of the decoder, and the attended concatenation of the dynamic and static semantic concepts (ASC), composed of DSC-N + SSC-N, as the input to the decoder at each time step.
– (TVA-C) + DSC-N: this method indicates using only the dynamic semantic con-
cepts DSC-N as input to the decoder at each time step, where the initialization is
done by temporal visual attention with context encoder.
– (TVA-C) + SSC-N: this method indicates using only the static semantic concepts
SSC-N as input to the decoder at each time step, where the initialization is done by
temporal visual attention with context encoder.
– (TVA-C): this method uses the temporal visual attention with context encoder for initializing the caption generation sequence decoder. No input is given to the decoder and no semantic features are used, so the output word probability at each time step is based only on the hidden state of the previous time step.
– (TVA-C) + Bi-H: this method uses the hidden states of the proposal event from both directions of the bidirectional LSTM, Fig. 4, which are concatenated to represent an input to the decoder. The temporal visual attention with context encoder is used for initializing the caption generation sequence decoder. No semantic features are used.

Table 1. Recall of the proposal module on the MSR-VTT (sports) test set.
Method | TIoU = 0.8
Bidirectional temporal proposal generation | 0.93
Single-direction temporal proposal generation (forward) | 0.84

Table 2. Performance of the semantic feature networks on MSR-VTT (sports).
Network | Val accuracy | Test accuracy
SSC-N | 99.66 | 99.68
DSC-N | 99.84 | 99.87

Table 3. Caption generation performance on the MSR-VTT (sports) test set.
Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | CIDEr
(TVA-C) + ASC | 87.8 | 75.2 | 64.7 | 60.0 | 96.4
Bi-H + ASC | 83.8 | 70.8 | 60.0 | 56.0 | 94.3
(TVA-C) + DSC-N | 79.8 | 62.2 | 54.3 | 59.5 | 78.6
(TVA-C) + SSC-N | 77.0 | 58.1 | 50.7 | 47.8 | 57.0
(TVA-C) + Bi-H | 67.1 | 53.3 | 47.5 | 43.6 | 48.3
(TVA-C) | 62.4 | 49.2 | 42.2 | 37.1 | 41.7

Table 4. Performance comparison against another framework on the MSR-VTT (sports) test set.
Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | CIDEr
hLSTMat [29] | 80.3 | 68.2 | 57.5 | 53.7 | 91.1
(TVA-C) + ASC | 87.8 | 75.2 | 64.7 | 60.0 | 96.4

4.3 Experimental Results and Discussion


The recorded values in Table 2 indicate a high accuracy of semantic feature extraction in both networks. The results in Table 3 indicate that utilizing the semantic concept networks (ASC, DSC-N, and SSC-N) during captioning is more effective than caption generation without semantics. It is important to note that the performance of the (TVA-C) + DSC-N model is better than that of the (TVA-C) + SSC-N model. This may be related to the effect of the dynamic semantic concept network, which indicates the activity present within the video. Another reason is that the caption of a video usually contains a single activity (verb) and multiple objects (nouns). Furthermore, incorporating both the dynamic and static semantic concept features within the model (ASC) is the most effective. Also, when we reused the proposal's hidden states (backward, forward) as context vectors and fused them with the event's visual features via the attention mechanism (TVA-C), along with the semantic concept features (ASC), we obtained better captioning results than when using the context vectors alone. The comparison in Table 4 indicates that our framework (TVA-C) + ASC has better caption generation performance than the similar work in [29], which applies temporal attention on mean-pooled visual inputs. Our framework outperforms [29] because we apply attention to fuse the visual inputs with the context vectors; our utilization of semantic features also contributes to the better performance.

5 Implementation Details

In this work, we utilized PyTorch, a deep learning library for Python. Anaconda on an Ubuntu 14.04 LTS environment was used to implement the proposed framework. The GPU used for the experiments was a GeForce RTX 2080 Ti. For training the semantic concept feature networks (SSC-N and DSC-N), we used the Adam optimizer with the binary cross-entropy loss function. For caption generation, RMSprop was used as the optimization algorithm, and the categorical cross-entropy cost function was selected as the loss function. In order to train the bidirectional proposal generation module of our framework, Fig. 4, we wish to train the network with samples that express temporally long, overlapping segments, which are longer than the K proposals we want to detect at each time step, so that we encourage the network to avoid saturation of the hidden states in both directions. Each time step in the input video sequence can thus be considered multiple times, each in a different context, through dense sampling.
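A minimal sketch of this optimization setup is shown below; the model modules, vocabulary sizes and learning rates are placeholders, and only the optimizer/loss pairing follows the text.

# Optimizer / loss pairing described above; modules and sizes are placeholders.
import torch
import torch.nn as nn

concept_lstm = nn.LSTM(512, 512, batch_first=True)        # placeholder SSC-N / DSC-N backbone
concept_head = nn.Linear(512, 1500)                       # e.g. 1,500 static concepts
concept_opt = torch.optim.Adam(
    list(concept_lstm.parameters()) + list(concept_head.parameters()), lr=1e-4)
concept_loss = nn.BCEWithLogitsLoss()                     # sigmoid + binary cross-entropy

decoder = nn.LSTM(2000, 512, batch_first=True)            # placeholder caption decoder
word_head = nn.Linear(512, 10000)                         # placeholder word vocabulary size
caption_opt = torch.optim.RMSprop(
    list(decoder.parameters()) + list(word_head.parameters()), lr=1e-4)
caption_loss = nn.CrossEntropyLoss()                      # softmax + categorical cross-entropy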

6 Conclusion

In this paper, a deep neural network framework is proposed that identifies and handles three challenges related to the task of video captioning. These challenges are (1) semantic concept feature learning, for reducing the gap between low-level video features and sentence descriptions, (2) event representation, and (3) context fusion with visual features. First, we adopt a bidirectional LSTM framework for localizing events, which encodes both past and future contexts; both contexts help localize the current event better. We further reuse the proposal's context information (hidden states) from the localization module as context vectors and dynamically fuse them with the event clip features, which are extracted by two-stream 3D ResNets. Using an attention-based mechanism to fuse visual contents with context information produced superior results compared to using the context alone. The proposed model additionally learns semantic features that describe the video content effectively, using LSTMs. Experiments on the MSR-VTT (sports) dataset demonstrate the performance of the proposed framework. Our future works are as follows:
1. Coupling the proposal and captioning modules into one unified framework, trained in an end-to-end manner.
2. Investigating how to exploit the temporal event proposal module and the bi-modal features for multiple-sentence generation for videos (dense captioning).

References
1. Xu, J., Mei, T., Yao, T., Rui, Y.: MSR-VTT: a large video description dataset for bridging
video and language. In: IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Las Vegas, pp. 5288–5296 (2016)

2. Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko, K.: Translating
videos to natural language using deep recurrent neural networks. In: Proceedings of the 2015
Conference of the North American Chapter of the Association for Computational
Linguistics, pp. 1494–1504. North American Chapter of the Association for Computational
Linguistics (NAACL), Colorado (2015)
3. Mahdisoltani, F., Berger, G., Gharbieh, W., Fleet, D., Memisevic, R.: Fine-grained video
classification and captioning. ArXiv_CV (2018)
4. Buch, S., Escorcia, V., Shen, C., Ghanem, B., Carlos Niebles, J.: SST: single-stream
temporal action proposals. In: IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), Honolulu, pp. 6373–6382 (2017)
5. Hara, K., Kataoka, H., Satoh, Y.: Can spatiotemporal 3D CNNs retrace the history of 2D
CNNs and ImageNet? In: IEEE/CVF Conference on Computer Vision and Pattern
Recognition, Salt Lake City , pp. 6546–6555 (2018)
6. Hara, K., Kataoka, H., Satoh, Y.: Learning spatio-temporal features with 3D residual
networks for action recognition. In: IEEE International Conference on Computer Vision
Workshop (ICCVW), pp. 3154–3160 (2017)
7. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics
dataset. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Honolulu, pp. 4724–4733 (2017)
8. Wang, X., Miao, Z., Zhang, R., Hao, S.: I3D-LSTM: a new model for human action
recognition. In: IOP Conference Series: Materials Science and Engineering (2019)
9. Kay, W., et al.: The kinetics human action video dataset. ArXiv (2017)
10. Soomro, K., Zamir, A.R., Shah, M.: A dataset of 101 human actions classes from videos in
the wild. ArXiv (2012)
11. Zhao, Y., Yang, R., Chevalier, G., Xu, X., Zhang, Z.: Deep residual bidir-LSTM for human
activity recognition using wearable sensors. Math. Prob. Eng. 1–13 (2018)
12. Kuppusamy, P.: Human action recognition using CNN and LSTM-RNN with attention
model. Int. J. Innov. Technol. Exploring Eng. (IJITEE) 8, 1639–1643 (2019)
13. Shou, Z., Wang, D., Chang, S.-F.: Temporal action localization in untrimmed videos via
multi-stage CNNs. In: IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), USA, pp. 1049–1058 (2016)
14. Lin, T., Zhao, X., Fan, Z.: Temporal action localization with two-stream segment-based
RNN. In: IEEE International Conference on Image Processing (ICIP), Beijing, pp. 3400–
3404 (2017)
15. Yao, G., Lei, T., Liu, X., Jiang, P.: Temporal action detection in untrimmed videos from fine
to coarse granularity. Appl. Sci. 8(10), 1924 (2018)
16. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features
with 3D convolutional networks. In: Proceedings of the 15th IEEE International Conference
on Computer Vision, ICCV 2015, pp. 4489–4497 (2015)
17. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. Adv. Neural Inf. Process. Syst. 1, 568–576 (2014)
18. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
19. Pan, P., Xu, Z., Yang, Y., Wu, F., Zhuang, Y.: Hierarchical recurrent neural encoder for
video representation with application to captioning. In: IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pp. 1029–1048 (2016)
20. Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing
videos by exploiting temporal structure. In: IEEE International Conference on Computer
Vision (ICCV), USA, pp. 4507–4515 (2015)

21. Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and
description. IEEE Trans. Patt. Anal. Mach. Intell. 39(4), 677–691 (2017)
22. Wang, H., et al.: Action recognition by dense trajectories. In: IEEE Conference on Computer
Vision & Pattern Recognition (CVPR), USA (2011)
23. Yu, Y., Ko, H., Choi, J., Kim, G.: End-to-end concept word detection for video captioning,
retrieval, and question answering. In: IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), USA, pp. 3261–3269 (2017)
24. Bird, S., et al.: Natural Language Processing with Python. O’Reilly Media Inc, California
(2009)
25. Papineni, K., Roukos, S., Ward, T., Zhu, W.-J.: BLEU: a method for automatic evaluation of
machine translation. In: Proceedings of the 40th Annual Meeting on Association for
Computational Linguistics (ACL 2002), USA, pp. 311–318 (2002)
26. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus based image description
evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2015), Boston, Massachusetts, pp. 4566–4575 (2015)
27. Escorcia, V., Caba Heilbron, F., Niebles, J.C., Ghanem, B.: DAPs: deep action proposals for
action understanding. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016.
LNCS, vol. 9907, pp. 768–784. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-
46487-9_47
28. Wang, L., Qiao, Y., Tang, X.: Action recognition and detection by combining motion and
appearance features (2014)
29. Song, J., et al.: Hierarchical LSTM with adjusted temporal attention for video
captioning. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial
Intelligence, Australia, pp. 2737–2743 (2017)
Matchmoving Previsualization Based
on Artificial Marker Detection

Houssam Halmaoui1,2(B) and Abdelkrim Haqiq2(B)


1
ISMAC - Higher Institute of Audiovisual and Film Professions, Rabat, Morocco
houssam.halmaoui@gmail.com
2
Faculty of Sciences and Techniques, Computer, Networks, Mobility and Modeling
Laboratory: IR2M, Hassan First University of Settat, 26000 Settat, Morocco
abdelkrim.haqiq@uhp.ac.ma

Abstract. In this article, we propose a method for inserting a 3D syn-


thetic object into a video of real scene. The originality of the proposed
method lies in the combination and the application to visual effects of
different algorithms of computer vision and computer graphics. First,
the intrinsic parameters and distortion coefficients of the camera are
estimated using a planar checkerboard pattern with Zhang’s algorithm.
Then, the ArUco marker dictionary and the corresponding feature detection
algorithm are used to detect the four corners of a single artificial marker
added to the scene. A perspective-4-point method is used to estimate
the rotation and the translation of the camera with respect to a 3D ref-
erence system attached to the marker. The camera perspective model is
then used to project the 3D object onto the image plane, while respecting
perspective variations when the camera is moving. The 3D object is illu-
minated with diffuse and specular shading models, in order to match the
object to the lighting of the scene. Finally, we conducted an experiment
to quantitatively and qualitatively evaluate the stability of the method.

Keywords: Camera pose · Fiducial markers · Diffuse and specular


shading · Augmented reality · Visual effects

1 Introduction
Matchmoving or camera tracking is an augmented reality method used in visual
effects [12], and which consists in inserting 3D synthetic objects into a video of
real scene, in such a way that the object coexists coherently with the other ele-
ments, while respecting the geometry and the lighting of the scene (see Fig. 3b).
Usually, in filmmaking, this effect is achieved in post-production. However, in
case of problem during shooting, whether it is technical (lighting or tracking) or
artistic (object position), it is difficult to do the matchmoving in post-production
without spending a lot of time processing the video frame by frame or without
re-shooting the scene. This implies significant time and cost losses. In this arti-
cle, we propose a method to make a previsualization of the result on set. This allows problems to be detected and corrected during shooting.

The problem of matchmoving has both geometrical and lighting aspects,


which are often treated separately in the literature. The estimation of the cam-
era pose (rotation and translation of the camera) is central to the geometrical
aspect. In [8,9,13], external sensors (inertial sensor, Wifi or Hololens headset)
are used for this purpose. In [1,7,10] the estimation is performed by deep learning, which solves the motion blur problem but requires the presence of textures in the image. Traditional methods are based on the detection of features, which are patterns such as corners or similar connected regions (blobs) that have the par-
ticularity of being reliable for tracking [4]. Some detectors are invariant to scale,
affine transformations and orientation [11]. Artificial markers can be added in
the scene in order to handle the problem of non textured area and for more
robust detection, thanks to the unique binary code included in the markers.
For this study, we use the ArUco detection algorithm [2] and the corresponding
marker dictionary because of its detection performance compared to other arti-
ficial markers [14] and for the speed of calculation. By locating the four corners
of a single marker, it can be used as a 3D reference system for the estimation
of the camera pose, thanks to a perspective-4-points method [16]. Concerning
the lighting aspect, the two most used methods for rendering are rasterization
[3] and ray tracing [15]. We use rasterization because of the processing speed on
CPU and the sufficient rendering quality for previsualization. Finally, instead of
using an automatic illumination estimation method [6], we choose manually the
lighting position of the 3D object when adjusting its geometrical aspect (manual
intervention which is necessary since it depends on the user’s choice).
The originality of our method is to propose a matchmoving previsualization
solution, by combining all the geometrical and shading aspects in a single system,
thanks to various computer vision and computer graphics algorithms. On the
other hand, the method is accessible through the use of a single camera, an
artificial marker and fast camera pose estimation and rendering algorithms.

Fig. 1. Steps of the proposed method.

The steps of the proposed method are summarized in Fig. 1. First, the camera
is calibrated in order to estimate its intrinsic parameters. Then, we proceed to
the detection of the artificial marker corners in order to estimate the camera
pose. After adjusting the desired geometrical appearance of the 3D object, we
project it onto the image using the camera perspective model and the estimated
camera parameters. The visible faces of the object are computed by a Hidden
Surface Removal algorithm. Finally, we assign to each face a color using a diffuse
and specular shading models, according to a lighting position chosen by the user.
The article is organized as follows. In Sect. 2, we present the camera perspec-
tive model used for camera parameters estimation and for 3D object projection.
In Sect. 3, we present in detail the steps of the proposed algorithm. In Sect. 4,
we present the results of quantitative and qualitative evaluation.

2 The Camera Perspective Model

Calibration Matrix

The aim is to establish the relationship between a point's coordinates in the image
in pixels and the corresponding 3D point in the world space in meters [12]. The
camera is considered to follow a pinhole model. The 3D coordinates are specified
in a camera reference system (XY Z) as shown in Fig. 2.

Fig. 2. Pinhole model and reference systems for camera and image.

We consider a point of the scene with coordinates [X_c Y_c Z_c]^T. Its projection [x̃ ỹ]^T in the image, as a function of the focal length f, is:

x̃ = f · X_c / Z_c,   ỹ = f · Y_c / Z_c   (1)

The values [x̃ ỹ]^T are physical measurements in meters. The values [x y]^T in pixels are written as a function of the width d_x and the height d_y of a pixel and of the center of the image [x_0 y_0]^T, which corresponds to the projection of the origin of the camera reference system onto the image plane:

x = x̃ / d_x + x_0,   y = ỹ / d_y + y_0   (2)

Thus, we obtain:

x = (f / d_x) · (X_c / Z_c) + x_0,   y = (f / d_y) · (Y_c / Z_c) + y_0   (3)

Eq. 3 can be written in matrix form:

[x y 1]^T ∼ K [X_c Y_c Z_c]^T   (4)

with

K = | α_x  0    x_0 |
    | 0    α_y  y_0 |,   α_x = f / d_x,   α_y = f / d_y
    | 0    0    1   |
K is called the calibration matrix and the symbol ∼ means that equality is
obtained up to a factor (by dividing the right hand term by Zc ).

Distortions
The cameras used in real life are more complicated than a simple pinhole model.
The image usually suffers from radial distortions caused by the spherical shape of the lens. The relationship between the coordinates [x̃ ỹ]^T of the ideal image (without distortion) and the coordinates [x̃_dist ỹ_dist]^T of the observed image (with distortion) is as follows [16]:

x̃_dist = (1 + κ_1(x̃² + ỹ²) + κ_2(x̃² + ỹ²)²) · x̃
ỹ_dist = (1 + κ_1(x̃² + ỹ²) + κ_2(x̃² + ỹ²)²) · ỹ   (5)

The coefficients κ_1 and κ_2 control the amount of distortion. In the case of large distortions (wide-angle lens), a third coefficient κ_3 can be added as a third-order term in the polynomial formula. This distortion model is combined with Eq. 2 in order to have a model as a function of the pixel coordinates [x_dist y_dist]^T:

x_dist = (1 + κ_1(x̃² + ỹ²) + κ_2(x̃² + ỹ²)²)(x − x_0) + x_0
y_dist = (1 + κ_1(x̃² + ỹ²) + κ_2(x̃² + ỹ²)²)(y − y_0) + y_0   (6)

Camera Matrix
Assuming the distortions are compensated (see Sect. 3), the image formation
can be modeled by the Eq. 4. This model allows to express the coordinates of a
point in the camera reference system. For matchmoving problem, the 3D object
coordinates can be defined in any reference system of the scene. We must therefore, before applying the projection model, transform the coordinates [X Y Z]^T expressed in any reference system into coordinates [X_c Y_c Z_c]^T expressed in the camera reference system. The transformation formula is:

[X_c Y_c Z_c]^T = R [X Y Z]^T + t   (7)

R and t are the extrinsic parameters: R is a 3 × 3 rotation matrix defined by 3 angles and t is a translation vector. Thus, Eq. 4 becomes:

[x y 1]^T ∼ P [X Y Z 1]^T   (8)

with P = K [R | t]. P is called the camera matrix. The matchmoving problem is equivalent to calculating the matrix P for each frame of the video.
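As a small worked example of Eq. 8, the following projects one world point with P = K[R | t]; the intrinsic values and pose below are made-up numbers for illustration only.

import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])            # alpha_x, alpha_y, x0, y0
R = np.eye(3)                              # camera aligned with the world frame
t = np.array([[0.0], [0.0], [2.0]])        # world origin 2 m in front of the camera

P = K @ np.hstack([R, t])                  # 3x4 camera matrix
X = np.array([0.1, 0.05, 0.0, 1.0])        # homogeneous world point (metres)
x = P @ X
x = x / x[2]                               # divide by the factor (here Z_c)
print(x[:2])                               # pixel coordinates: [360. 260.]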

3 Proposed Method

Calibration Matrix Estimation


Assuming that the focal length of the camera does not change during video
acquisition, we need to estimate the calibration matrix only once at the begin-
ning. This is done by using several images of a planar checkerboard with known
dimensions, captured from different points of view [16]. Figure 3a shows the
calibration configuration. We used a phone camera mounted on a tripod and
controlled remotely for more acquisition stability to reduce the calibration error.

Fig. 3. (a): Camera calibration configuration. (b): Result of 3D object projection in


a video acquired with a moving camera.

Considering that Z = 0 corresponds to the plane of the checkerboard pattern, Eq. 8 is transformed into a homography:

[x y 1]^T = λ K [r_1 r_2 r_3 t] [X Y 0 1]^T   (9)

where r_1, r_2 and r_3 are the columns of R, and λ is an arbitrary coefficient. Thus, we have:

[x y 1]^T = λ K [r_1 r_2 t] [X Y 1]^T   (10)

Therefore, the homography formula is:

[x y 1]^T = H [X Y 1]^T   (11)

We note H the 3 × 3 homography matrix:

H = λ K [r_1 r_2 t]   (12)

H is estimated using the DLT (Direct Linear Transformation) algorithm [5]. We start by detecting the corners of the squares of the checkerboard using a feature detection algorithm. Each corner [x y]^T gives us two equations from Eq. 11. The corresponding [X Y 1]^T is known, since we have the physical distance between the corners. Note that H has 8 degrees of freedom, since the homography equation is defined up to a factor. Thus, we need a minimum of four corners to find H. Since the position of the corners is subject to noise, we use more than four corners. Then, the least squares method is used to find an approximate solution, by a singular value decomposition, corresponding to the best homography H that minimizes a cost function. Once we have H for each view, we can deduce K from Eq. 12: by adding constraints on K using the fact that r_1 and r_2 are orthonormal [16], Eq. 12 is simplified into a linear equation and the estimation of K is performed again with the DLT algorithm.

Distortion Coefficient Estimation


Once the intrinsic parameters (x0 , y0 , αx , αy ) are known, the images of the
checkerboard are used again. The positions of the ideal points (without dis-
tortion) are known since the dimensions of the checkerboard are also known.
The corresponding distorted points in the image are detected using a feature
detection algorithm. Then, the distortion model (Eq. 6) is solved using a least
squares method in order to estimate the distortion coefficients κ1 and κ2 .
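A hedged sketch of this calibration step is given below using OpenCV's Python bindings (the authors implemented the pipeline in C++ with OpenCV; the board size, square size and image paths here are assumptions).

import glob
import cv2
import numpy as np

pattern = (9, 6)                     # inner corners of the checkerboard (assumed)
square = 0.025                       # square size in metres (assumed)

objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob('calib/*.jpg'):                # several views of the board (placeholder path)
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns K and the distortion coefficients (k1, k2, p1, p2, k3)
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)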

Markers Detection
Since the camera is moving, this step and all those that follow must be done for
each frame. As mentioned previously, we use ArUco artificial markers [2]. A single
marker is added to the scene in order to be used as a 3D reference (see Fig. 3b).
The detection algorithm is as follows: 1) Edge detection and thresholding. 2)
Polygonal approximation of concave rectangles with four corners. 3) Perspective
projection to obtain a frontal rectangle, and identification of the internal code
by comparing it to the markers in the dictionary. For more details see [2].
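For illustration, the detection step can be performed with OpenCV's ArUco module [2] roughly as follows, assuming the classic aruco API of opencv-contrib-python; the dictionary choice and frame path are placeholders.

import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
parameters = cv2.aruco.DetectorParameters_create()

frame = cv2.imread('frame.jpg')                      # one video frame (placeholder path)
corners, ids, rejected = cv2.aruco.detectMarkers(frame, dictionary, parameters=parameters)
# `corners` holds the four image corners of each detected marker; they are the
# input to the pose estimation step described next.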

Pose Estimation
Once the corners of the marker are detected, we use their positions to estimate R
and t [16]. First, the four corners are used to estimate the homography between
the marker and the corresponding image (we consider the marker in the plane
Z = 0). As we mentioned earlier, a minimum of four coplanar points is necessary
to estimate a homography with the DLT algorithm [5]. Finally, knowing K and
H, the calculation of the external parameters is performed using Eq. 12:

r_1 = K⁻¹ h_1,   r_2 = K⁻¹ h_2,   t = K⁻¹ h_3   (13)

where h_1, h_2 and h_3 are the columns of H. We deduce r_3 by using a cross product:

r_3 = r_1 × r_2   (14)
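The following sketch follows Eqs. 12–14 to recover the pose from the four detected corners; the marker side length and corner ordering are assumptions, and the unit-norm scaling of r_1 makes the factor λ (left implicit above) explicit.

import cv2
import numpy as np

def pose_from_marker(img_corners, K, marker_len=0.10):
    # 3D corners of the marker in its own reference system (plane Z = 0), in metres
    s = marker_len / 2.0
    obj = np.array([[-s, s], [s, s], [s, -s], [-s, -s]], dtype=np.float32)
    H, _ = cv2.findHomography(obj, img_corners.reshape(-1, 2))   # DLT from 4 points

    Kinv = np.linalg.inv(K)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    lam = 1.0 / np.linalg.norm(Kinv @ h1)     # makes the scale factor lambda explicit
    r1 = lam * (Kinv @ h1)
    r2 = lam * (Kinv @ h2)
    t = lam * (Kinv @ h3)                     # Eq. (13)
    r3 = np.cross(r1, r2)                     # Eq. (14)
    return np.column_stack([r1, r2, r3]), t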

3D to 2D Projection

After adjusting the desired object size and position through geometric trans-
formations, we project the vertices of the 3D object in the image as follows: 1)
Transformation of the 3D object’s coordinates in the camera reference system
using the extrinsic parameters R and t. 2) Projection of the 3D points in the
image using the calibration matrix K. 3) Application of the distortion model.
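In practice these three steps can be combined, for example with OpenCV's projectPoints, which applies the extrinsic transform, the calibration matrix and the distortion model in one call; the sketch below assumes vertices expressed in the marker reference system.

import cv2
import numpy as np

def project_object(vertices, R, t, K, dist):
    # vertices: (N, 3) object points expressed in the marker reference system
    rvec, _ = cv2.Rodrigues(R)                          # rotation matrix -> rotation vector
    img_pts, _ = cv2.projectPoints(vertices.astype(np.float64),
                                   rvec, t.astype(np.float64), K, dist)
    return img_pts.reshape(-1, 2)                       # pixel position of each vertex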

Hidden Surface Removal

When projecting 3D objects, as in real life, we want to see only the front of the
objects and not the back of them. This process is called HSR (Hidden Surface
Removal). For this, we use the Z-buffering HSR algorithm. Two buffers are
used, one for color and one for depth (Z-buffer). We start by calculating for
each polygon (face formed by three vertices) the distance d_poly from its center [X_poly Y_poly Z_poly]^T to the camera, using the translation vector t = [t_x t_y t_z]^T:

d_poly = √((t_x − X_poly)² + (t_y − Y_poly)² + (t_z − Z_poly)²)   (15)
For each pixel of the projected 3D object, if the distance of the corresponding
polygon is less than the one stored in the depth buffer, the distance in the depth
buffer and the color in the color buffer are replaced by those of the corresponding
polygon. In the following, we see how to calculate the polygon color.
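A simplified per-polygon sketch of this Z-buffering step is given below (flat shading, one depth value per triangle as in Eq. 15); the helper interface is an assumption.

import cv2
import numpy as np

def zbuffer_render(frame, tris_2d, tris_depth, tris_color):
    # tris_2d: list of (3, 2) projected triangles; tris_depth / tris_color: one value per triangle
    h, w = frame.shape[:2]
    depth = np.full((h, w), np.inf, dtype=np.float32)   # depth buffer (Z-buffer)
    out = frame.copy()                                  # colour buffer
    for pts, d, col in zip(tris_2d, tris_depth, tris_color):
        mask = np.zeros((h, w), dtype=np.uint8)
        cv2.fillConvexPoly(mask, pts.astype(np.int32), 1)
        closer = (mask == 1) & (d < depth)              # per-pixel depth test
        depth[closer] = d
        out[closer] = col
    return out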

Ambient, Diffuse and Specular Shading

In order to illuminate the projected 3D object, we use the ambient, diffuse and specular models, which are the most widely used [3]. Ambient reflection simulates lighting that affects all objects equally; its contribution is a constant value. Diffuse reflection scatters the light in all directions; its contribution is:

I_diffuse = C_d · max((N • L), 0)   (16)

with L the light direction, N the normal vector and C_d the diffuse color. The light position is selected by the user to best match that of the real scene. The color C_d depends on the type of lighting and the object material, but we do not cover this aspect here; we simply use a constant color value. Finally, specular reflection simulates the shininess of an object; its contribution is:

I_specular = C_s · max(0, (R • V)^n)   (17)



Fig. 4. Angles of incidence of light and angle of view.

with C_s the specular color, V the direction of view, n a factor controlling the width of the gloss spot, and R the direction of reflection. Figure 5 shows an example of projection using diffuse and specular lighting.
To accelerate the process on CPU, we assign the same color to the whole polygon (flat shading). For better quality, a value obtained by interpolation from the three vertices of the polygon would be assigned to each pixel. Also, the CPU does not allow fast rendering of thousands of vertices, so a GPU implementation would be necessary. Note that all the steps, including the rendering part, were implemented on CPU using C++, OpenCV and the ArUco library.
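The per-face colour of Eqs. 16–17 can be sketched as follows, with ambient, diffuse and specular terms; the constants C_a, C_d, C_s and n are illustrative values chosen by the user, and the reflection vector is computed in the standard way R = 2(N·L)N − L.

import numpy as np

def shade_polygon(N, L, V, Ca=0.1, Cd=0.7, Cs=0.4, n=32):
    # N: unit face normal, L: unit direction towards the light, V: unit direction towards the camera
    diffuse = Cd * max(float(np.dot(N, L)), 0.0)            # Eq. (16)
    R = 2.0 * np.dot(N, L) * N - L                          # reflection of L about N
    specular = Cs * max(float(np.dot(R, V)), 0.0) ** n      # Eq. (17), clamped before the exponent
    return Ca + diffuse + specular                          # scalar intensity for the whole face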

Fig. 5. A 3D object projection and illumination using, from left to right, the diffuse
model, the specular model, and a combination of both.

4 Quantitative and Qualitative Evaluations

We carried out an experiment to measure the stability of the method. We used


a tripod-mounted mobile phone to capture videos of a fixed scene (without any
camera movement), in order to have ground truth. Given that the camera and
the scene are static, a vertex should ideally be in the same position in the image.
We compute a repeatability score between each frame and the first frame of the sequence, considered as reference: S = R / N_f, where R is the number of repeatable detections, which corresponds to the number of projected vertices whose position has not changed compared to the reference image, and N_f is the total number of vertices. We evaluated three kinds of sequences: high lighting, low lighting, and
variable lighting, and for each type of sequence we placed the camera at different
distances. The aim is to measure the impact of the lighting and marker size on
the stability of the matchmoving. Figure 6 shows the evaluation results.
Figure 6a corresponds to the sequences with high lighting. We can see that
when the marker size is big enough, the score is around 0.9, therefore the

Fig. 6. Quantitative evaluation results.

matchmoving is very stable. When the size of the marker decreases, the score
drops to 0.8.
Figure 6b corresponds to the cases where the lighting is low. We observe the
same thing as previously, but with a slightly higher drop in score (around 0.7)
when the size of the marker is small.
Figure 6c shows the case of variable lighting. We can see that when we vary
the lighting (after 500 frames), by introducing shadows on the marker and vary-
ing its contrast, the score can become very low.
In order to measure by how many pixels the projected vertices have moved,
we recalculated the score of the sequences where the marker size is small (worst
results), by introducing a displacement tolerance of ±1 pixel. Figure 6d shows
the result obtained. We can see that the score remains equal to 1 all the time,
which means that the errors we obtained in the previous experiments are due to
vertex displacements of ±1 pixel, which is usually not very visible.
In addition to the size of the markers and lighting, another factor that impacts
the stability of the matchmoving is the distance of the projected object from the
marker. Indeed, the projection error is more important when the object is far
away from the marker, due to calibration error. We repeated the last experiment
with small marker size for variable lighting (worst results) by choosing a tolerance
of ±1, and by placing the object at different distances. Figure 6e shows the result.
We can see that when the lighting varies, the further the object moves away from the marker, the more the score drops. In the majority of the frames, except when the object is very far from the marker, the score remains very high. One solution
to correct this stability problem is to manually process the few low score frames
in post-production.
From these experiments, it can be deduced that in order to have a stable
projection result, it is necessary to use a large marker, a high and homogeneous
lighting, and to place the object close to the marker.

It is also important to reduce the calibration error by making sure that the checkerboard is perfectly planar and by using a professional camera.

Fig. 7. Examples of results for qualitative evaluation.

To qualitatively evaluate the result, we projected a 3D object in video


sequences captured with a moving mobile phone camera. Figure 7 shows result
images acquired from different points of view. We can see that the geometry
of the 3D object corresponds to the perspectives of the scene during the cam-
era movement. The 3D object is immobile to facilitate the evaluation of the
result: for an accurate result, the object must remain immobile during the cam-
era movement. We found that the detection of markers suffers from instability
and non-detection when the camera is moving fast. Among our perspectives, we
propose to solve this problem with a Kalman filter. Finally, in matchmoving,
the marker should not be visible in the output video. In the case of immobile
objects, it can be placed above the marker. In the case of a moving object, a
solution is to place the marker in a homogeneous area and then remove it using
inpainting algorithm.

5 Conclusion and Perspective


We presented a method for matchmoving previsualization by estimating the camera pose using a single artificial marker as a 3D reference. The different modules are independent, which allows each of them to be replaced by the most appropriate one for the desired application. In our case, we chose methods that allow us to process the images accurately and quickly. The quantitative and qualitative evaluations on videos of real scenes attest to the effectiveness of the proposed algorithms, but we found that the method suffers from instability on some frames when the marker has a small image size, when the lighting is low or inhomogeneous, when the projected objects are far from the marker, and when the camera moves fast. We intend to test other methods for feature detection, pose estimation and calibration, in order to improve the accuracy. All the algorithms have been implemented on CPU; on the one hand this allows easier migration to any platform, but to get a more realistic rendering quality it is necessary to
use a parallel GPU implementation. In this context, we envisage to work on the
lighting aspect by considering the type of materials and other lighting models.
We also intend to apply the method in more practical cases such as the creation
of virtual environments, and to consider the case of 3D animated objects.

References
1. En, S., Lechervy, A., Jurie, F.: Rpnet: an end-to-end network for relative camera
pose estimation. In: Proceedings of the European Conference on Computer Vision
(ECCV) (2018)
2. Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F.J., Medina-Carnicer, R.:
Generation of fiducial marker dictionaries using mixed integer linear programming.
Pattern Recognit. 51, 481–491 (2016)
3. Gordon, V.S., Clevenger, J.L.: Computer Graphics Programming in OpenGL with
C++. Stylus Publishing, LLC (2018)
4. Harris, C.G., Stephens, M.: A combined corner and edge detector. In: Proceedings
of the Alvey Vision Conference, pp. 147–151 (1988)
5. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cam-
bridge University Press (2003)
6. Hold-Geoffroy, Y., Sunkavalli, K., Hadap, S., Gambaretto, E., Lalonde, J.F.: Deep
outdoor illumination estimation. In: Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, pp. 7312–7321 (2017)
7. Kendall, A., Grimes, M., Cipolla, R.: Posenet: a convolutional network for real-time
6-DOF camera relocalization. In: Proceedings of the IEEE international conference
on computer vision, pp. 2938–2946 (2015)
8. Kotaru, M., Katti, S.: Position tracking for virtual reality using commodity WiFi.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recog-
nition, pp. 68–78 (2017)
9. Lee, J., Hafeez, J., Kim, K., Lee, S., Kwon, S.: A novel real-time match-moving
method with hololens. Appl. Sci. 9(14), 2889 (2019)
10. Melekhov, I., Ylioinas, J., Kannala, J., Rahtu, E.: Relative camera pose estimation
using convolutional neural networks. In: International Conference on Advanced
Concepts for Intelligent Vision Systems, pp. 675–687. Springer (2017)
11. Mikolajczyk, K., Schmid, C.: Scale & affine invariant interest point detectors. Int.
J. Comput. Vis. 60(1), 63–86 (2004)
12. Radke, R.J.: Computer Vision for Visual Effects. Cambridge University Press
(2013)
13. Rambach, J.R., Tewari, A., Pagani, A., Stricker, D.: Learning to fuse: a deep learn-
ing approach to visual-inertial camera pose estimation. In: International Sympo-
sium on Mixed and Augmented Reality (ISMAR), pp. 71–76. IEEE (2016)
14. Romero-Ramirez, F.J., Muñoz-Salinas, R., Medina-Carnicer, R.: Speeded up detec-
tion of squared fiducial markers. Image Vis. Comput. 76, 38–47 (2018)
15. Wyman, C., Marrs, A.: Introduction to directx raytracing. In: Ray Tracing Gems,
pp. 21–47. Springer (2019)
16. Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern
Anal. Mach. Intell. 22(11), 1330–1334 (2000)
Research Method of Blind Path Recognition
Based on DCGAN

Ling Luo1, Ping-Jun Zhang1, Peng-Jun Hu1, Liu Yang1,


and Kuo-Chi Chang1,2,3,4(&)
1
School of Information Science and Engineering,
Fujian University of Technology, Fuzhou, China
albertchangxuite@gmail.com
2
Fujian Provincial Key Laboratory of Big Data Mining and Applications,
Fujian University of Technology, Fuzhou, China
3
College of Mechanical and Electrical Engineering,
National Taipei University of Technology, Taipei, Taiwan
4
Department of Business Administration,
North Borneo University College, Sabah, Malaysia

Abstract. In order to solve the problem that there are few blind path data sets and that much manual data collection work is required in current blind guidance systems, a computer vision algorithm is used to automatically generate blind path images in different environments. Methods: a blind path image generation method based on the deep convolutional generative adversarial network (DCGAN) is proposed. The method uses the characteristic of a typical blind path of being a combination of depressions and bulges. A long short-term memory (LSTM) network is used to encode the depression part, and a convolutional neural network (CNN) is used to encode the bulge part. The two kinds of information are combined to generate blind path images in different environments. This can effectively improve the blind path recognition rate of the instrument and improve the safe travel of the visually impaired. Conclusion: generative adversarial networks (GANs) can be used to generate realistic blind path images, which has a certain application value in expanding blind path recognition data, but some details still need to be improved.

Keywords: Convolutional neural network (CNN) · Deep convolutional generative adversarial network (DCGAN) · Generative adversarial networks (GANs) · Advanced blind path recognition system · Algorithm

1 Introduction

According to a study published in The Lancet Global Health, by 2050, the number of
cases of blindness in the world will increase from 36 to 115 million. China is one of the
countries with the largest number of people with vision disorders in the world. At the end of 2016, the number of people with vision disorders was about 17.31 million, of which more than 5 million were blind, with an annual increase of 450,000 blind people and 1.35 million people with low vision [1]. Most of the information obtained by human beings is transmitted by
vision, accounting for 80% [2]. Because of the physiological defects and the

increasingly complex living environment, many inconveniences are brought to blind people's lives. In view of these inconveniences, guide dogs and guide canes have gradually become tools to help blind people travel. However, guide dogs are not easy to train and are costly, and the detection range of a guide cane is limited. In view of this, the design of an intelligent guidance system is beneficial to the daily life of blind people.
Blind path recognition is an important part of an intelligent guidance system. Blind path recognition belongs to image recognition, which has long been a hot research topic in the field of computer vision. The CNN is the most widely used method in image recognition. The biggest disadvantage of the CNN is that it requires a large-scale training data set, which undoubtedly slows down the convergence of the network model and requires many training skills to improve the recognition rate. This article draws on the advances and usability of the following two image recognition works, studying GANs and using DCGAN to recognize blind paths: Lu et al. used DCGAN to generate face images [3], and Zeng et al. built a semi-supervised deep generative network that improves the recognition rate by increasing the training samples through label distribution [4].
Good fellow et al. proposed a generation model is the GANs in 2014 [5]. This can
generate high-quality images and extract features. The difference between the gener-
ation model and the traditional generation model lies in that it includes two parts: the
network of generation and the network of discrimination. This will generate a con-
frontational relationship between the network and the discriminatory regional network.
The idea of GANs originates from the zero sum game, that is to say, whether the
interests of two parties are increased or not is affected by each other, the increase of
interests of one party will lead to the decrease of interests of the other party, and the
two parties are in the state of confrontation game. This idea is used in the GANs. For
the generation network, the data generated will be infinitely close to the real data
through a series of data fitting using the input original data. For the discrimination
network, the “false data” and “real data” generated by fitting will be compared to
finally distinguish whether the generated data is real or not. At the same time, the
generation network will be from the discrimination network. The distribution of
learning data in the network optimizes the learning of the generated network, so that the
two networks finally achieve Nash equilibrium [6]. The operation process diagram of
GANs is exhibition in Fig. 1 [7].
The core principle of the GAN algorithm is as follows: for a given generation network, the optimal discrimination network is a two-class classification model, and the training process is described by Formula (1).

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\ln D(x)] + \mathbb{E}_{z \sim p_z(z)}[\ln(1 - D(G(z)))]    (1)

In the formula, x is the real input sample; z is the random noise input to the generation model; D(x) is the probability assigned by the discrimination model that the input sample is real (D(x) = 1 means x is certainly a real sample, while D(x) = 0 means x cannot be a real sample); G(z) is the sample created by the generation model from the random noise; p_data(x) is the real data distribution; and p_z(z) is the noise distribution.

Fig. 1. The operation process diagram of GANs

It is the task of the discriminator to accurately determine whether the input samples are real, i.e., to maximize V(D, G) over D: D works so that D(G(z)) is infinitely close to 0 and D(x) is infinitely close to 1. It is the task of G to make the generated samples infinitely close to the real samples, i.e., to minimize V(D, G) over G: G works so that D(G(z)) becomes infinitely close to 1 [8].
This is a dynamic adversarial game. During training, GANs run in an alternating optimization mode: to make the accuracy of the discriminant framework as low as possible, the discrimination model is fixed and the generation model is optimized; to improve the accuracy of the discriminant framework, the generation model is fixed and the discrimination model is optimized.
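As a concrete illustration of the two expectation terms in Formula (1), the following is a minimal NumPy sketch (the function and variable names are our own assumptions, not code from the paper) that estimates V(D, G) from a batch of discriminator outputs:

import numpy as np

def gan_value_terms(d_real, d_fake, eps=1e-8):
    # Monte-Carlo estimate of the two expectations in Formula (1).
    # d_real: discriminator outputs D(x) on a batch of real samples, values in (0, 1)
    # d_fake: discriminator outputs D(G(z)) on a batch of generated samples
    term_real = np.mean(np.log(d_real + eps))        # E_x[ln D(x)]
    term_fake = np.mean(np.log(1.0 - d_fake + eps))  # E_z[ln(1 - D(G(z)))]
    return term_real + term_fake                     # estimate of V(D, G)

# D tries to maximize this value, G tries to minimize it.
# With a discriminator that separates real from fake well, V is close to 0;
# at the Nash equilibrium (D(x) = 0.5 everywhere) V = -2 ln 2 ≈ -1.386.
print(gan_value_terms(np.array([0.9, 0.8, 0.95]), np.array([0.1, 0.2, 0.05])))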

2 Methodology

DCGAN is a generalization of GANs proposed by Radford et al. in 2015 [9]. Its standard structure is the same as that of GANs; the difference is that, on the basis of GANs, CNNs replace the internal structure of the generator, and this architecture is used to generate image samples.
In DCGAN, the structure of D is a CNN, which is used to determine the probability that the input image is real. What distinguishes this CNN is that the down-sampling operation of the pooling layer is replaced by strided convolution: the pooling layer performs only a simple extraction of pixel points, whereas strided convolution extracts deep features while steadily reducing the size of the feature map, so convolution is carried out at the same time as down-sampling. To make the network easier to converge, LeakyReLU is used as the activation function in D; its formula is given in (2). As the formula shows, LeakyReLU consists of two linear segments: on the positive half axis it is a proportional mapping, while on the negative half axis the value is scaled by the slope leak. The input image is convolved through each layer of D in order to
extract the convolutional features fed into the final logistic function, so the final output is the probability that the image is real [10].

\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ leak \cdot x, & \text{otherwise} \end{cases}    (2)

There is no difference between the input of G and that of the original GAN: the input of G is a 100-dimensional noise vector z, and the resulting feature maps then pass through each subsequent layer. The first layer of G is a fully connected layer, whose purpose is to reshape the 100-dimensional Gaussian noise vector into a 4 × 4 × 1024 feature map. Next, deconvolution is used for the up-sampling operation, gradually reducing the number of channels. The Sigmoid function is used only in the last layer, because it performs well when features differ and can continuously enhance the feature effect during the forward pass. Every other layer uses the ReLU activation function, whose formula is given in (3). ReLU behaves the same as LeakyReLU on the positive half axis, while on the negative half axis it is 0 under any condition, so negative values are filtered out: a neuron is not activated when its input is negative, which improves computational efficiency. The final output is a 64 × 64 × 3 image. The network structure of G is shown in Fig. 2.

\mathrm{ReLU}(x) = \begin{cases} x, & x > 0 \\ 0, & \text{otherwise} \end{cases}    (3)

Fig. 2. Structure of G in DCGAN
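To make the generator pipeline of Fig. 2 concrete, the following is a minimal Keras sketch (100-dimensional noise → fully connected layer reshaped to 4 × 4 × 1024 → transposed convolutions that double the spatial size while reducing the channel count → 64 × 64 × 3 output). The intermediate layer and filter counts are our own assumptions; only the endpoints, the ReLU hidden activations and the Sigmoid output follow the text.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Reshape, Conv2DTranspose

def build_generator(latent_dim=100):
    model = Sequential()
    # Fully connected layer reshaped into a 4 x 4 x 1024 feature map
    model.add(Dense(4 * 4 * 1024, activation='relu', input_dim=latent_dim))
    model.add(Reshape((4, 4, 1024)))
    # Each transposed convolution doubles width/height and reduces the channel count
    model.add(Conv2DTranspose(512, (4, 4), strides=(2, 2), padding='same', activation='relu'))   # 8 x 8
    model.add(Conv2DTranspose(256, (4, 4), strides=(2, 2), padding='same', activation='relu'))   # 16 x 16
    model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same', activation='relu'))   # 32 x 32
    # Last layer uses Sigmoid (as stated in the text) and outputs a 64 x 64 x 3 image
    model.add(Conv2DTranspose(3, (4, 4), strides=(2, 2), padding='same', activation='sigmoid'))  # 64 x 64
    return model

build_generator().summary()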

3 Using DCGAN to Enhance Blind Track Image

With DCGAN, the enhancement of blind path images mainly goes through the following processes: first, the training environment for blind path recognition is built; then a sufficient number of original data samples are collected; finally, on the basis of the first two steps, the DCGAN model is trained and the generated data are obtained.

3.1 Build a Training Environment for Blind Path Recognition


This paper builds the DCGAN model on the TensorFlow deep learning framework. The training environment for blind path recognition is shown in Table 1.

Table 1. The training environment of blind path recognition


Project name Detailed information
Operating system Windows10, 64 bit
RAM 32 GB Memory
CPU Intel(R) Core(TM) i7-7700HQ @2.80 GHz
Graphics card NVIDIA GeForce GTX 1050 Ti
Python version Python3.7.7
Tensorflow version TensorFlow1.8.0
Development environment Jupyter Notebook
Primary library Numpy/Matplotlib/Tensorflow, etc.

3.2 Original Data Sample


Since there is no public data set of blind paths, the blind path data set used in this paper was collected and produced by the authors and contains images from different scenes. The DCGAN model is used to enhance these data.

3.3 Model Training


During training, the data are loaded in batches with a batch size of 20, that is, each training batch loads 20 blind path images; during testing, the batch size is set to 12, so each test batch loads 12 blind path images. An Adam optimizer with a global learning rate of 0.0002 and momentum of 0.5 optimizes the loss function [11]. Each training cycle can be described as follows: first, G generates and outputs blind path images; second, D discriminates between true and false images; third, the generator loss and the discriminator loss are calculated from the outputs of G and D, and, to maintain the balance of the adversarial game, the update ratio between D and G is 1:2.
The training of GANs is usually unstable. Dropout is used in the first hidden layer of G, with its value set to 0.5, to prevent over-fitting during training; it is also used in the first hidden layer of D, with its value set to 0.9. At the same time, L2 regularization is applied to all convolutional layers with stride 1 and to all parameters of the fully connected layers.
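The alternating optimization described above can be sketched as follows; this is a hedged, TensorFlow 2-style sketch rather than the authors' code, with batch size 20, Adam at learning rate 0.0002 and momentum β1 = 0.5, and two generator updates for every discriminator update. The generator and discriminator models and the label conventions are assumed.

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.optimizers import Adam

def train_dcgan(generator, discriminator, real_images,
                epochs=400, batch_size=20, latent_dim=100):
    # Adam with global learning rate 0.0002 and momentum (beta_1) 0.5
    discriminator.compile(loss='binary_crossentropy',
                          optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
    # Combined model used only to update G; D is frozen at compile time inside it
    discriminator.trainable = False
    gan = Sequential([generator, discriminator])
    gan.compile(loss='binary_crossentropy',
                optimizer=Adam(learning_rate=0.0002, beta_1=0.5))

    for epoch in range(epochs):
        for _ in range(len(real_images) // batch_size):
            # One discriminator update on a batch of real and generated images
            idx = np.random.randint(0, len(real_images), batch_size)
            noise = np.random.normal(size=(batch_size, latent_dim))
            fake = generator.predict(noise, verbose=0)
            d_loss_real = discriminator.train_on_batch(real_images[idx], np.ones((batch_size, 1)))
            d_loss_fake = discriminator.train_on_batch(fake, np.zeros((batch_size, 1)))
            # Two generator updates for every discriminator update (D:G ratio of 1:2)
            for _ in range(2):
                noise = np.random.normal(size=(batch_size, latent_dim))
                g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))
        print(epoch, d_loss_real + d_loss_fake, g_loss)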

3.4 Experimental Results Display


After a certain number of cycles, the model gradually converges. Figures 4, 5 and 6 show the images output by G when arbitrarily distributed data are input. Figure 3 is the original blind path photo. Figure 4 is the output after 5 training cycles of DCGAN; at this point the image cannot yet show the shape of a blind path. Figure 5 is the output after 100 cycles; the shape of the blind path is now clearly visible, but the finer details of the image are not well reproduced. Figure 6 is the output after 400 cycles; the output has become stable and the blind path in the image is displayed well.

Fig. 3. Original image.
Fig. 4. 5-cycle image output.
Fig. 5. 100-cycle image output.
Fig. 6. 400-cycle image output.

3.5 Loss Function


Figures 7 and 8 show how the discriminator's loss on real data (d_loss_real) and its loss on generated forged data (d_loss_fake) change as the number of training iterations increases. Both show an overall downward trend, with large oscillations in the later period of training, which is due to the adversarial effect of the discriminator once the generative adversarial network model has stabilized.

Fig. 7. The change process of d_loss_real for blind path images.
Fig. 8. The change process of d_loss_fake for blind path images.

Figures 9 and 10 show, respectively, the total loss of D over the real and the generated forged data and the loss of G. Both G and D train relatively smoothly in the early stage and fluctuate noticeably in the later stage. This is because the generation network and the discrimination network are gradually optimized as the number of training iterations increases; since training is a dynamic zero-sum game, an increase in one party's interests, as shown in the figures, results in a drop in the other party's [12].

Fig. 9. The change process of d_loss for blind path images.
Fig. 10. The change process of g_loss for blind path images.

3.6 Feasibility Verification of Sample Generation


To verify the feasibility of generating samples with DCGAN, this paper builds a model based on the TensorFlow deep learning framework for validation on the CIFAR-10 data set. A small number of images per category is a real problem: some categories cannot be distinguished well. Images generated by DCGAN are used to fill in the categories with few images, so that the number of images in each category becomes approximately equal. The following is a partial code description of the discriminator, the generator, and the training of both.
The D model is defined using LeakyReLU and Dropout; binary cross-entropy is used as the loss function, and Adam with a learning rate of 0.0002 and a momentum of 0.5 performs the stochastic gradient descent.
A standard convolution:
model.add(Conv2D(64, (3,3), padding='same', input_shape=in_shape))
The three convolutional layers use 2 × 2 strides and zero padding to downsample the input image:
model.add(Conv2D(64, (3,3), strides=(2,2), padding='same'))
There is no pooling layer in the classifier, and the output layer has only one node with a sigmoid activation function, which predicts whether the input sample is real or fake:
model.add(Dense(1, activation='sigmoid'))
For G, the first (dense) layer needs enough nodes to hold multiple low-resolution versions of the output image:
model.add(Dense(n_nodes, input_dim=latent_dim))
The activations of these nodes can then be reshaped into image-like tensors that enter the convolutional layers, here 256 different 4 × 4 feature maps:
model.add(Reshape((4, 4, 256)))
The up-sampling (deconvolution) process uses a transposed convolution layer configured with 2 × 2 strides, which doubles the width and height of the input feature map. This step is carried out three times, going from 4 × 4 to 8 × 8, 16 × 16 and finally the 32 × 32 resolution we need:
model.add(Conv2DTranspose(128, (4,4), strides=(2,2), padding='same'))
The output layer is a Conv2D layer with a 3 × 3 kernel and zero padding. Its purpose is to create the output feature map and keep its size at 32 × 32 × 3. Tanh activation is used to ensure that the output values lie in [−1, 1]:
model.add(Conv2D(3, (3,3), activation='tanh', padding='same'))
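Assembling the snippets above, a complete pair of model definitions for the 32 × 32 × 3 CIFAR-10 case might look like the following sketch; the exact filter counts per layer, the dropout rate and the value of n_nodes are our own assumptions rather than the authors' settings.

from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, Conv2DTranspose, Dense, Dropout,
                                     Flatten, LeakyReLU, Reshape)
from tensorflow.keras.optimizers import Adam

def define_discriminator(in_shape=(32, 32, 3)):
    model = Sequential()
    model.add(Conv2D(64, (3, 3), padding='same', input_shape=in_shape))   # standard convolution
    model.add(LeakyReLU(alpha=0.2))
    # Strided 2 x 2 convolutions downsample 32x32 -> 16x16 -> 8x8 -> 4x4
    for filters in (128, 128, 256):
        model.add(Conv2D(filters, (3, 3), strides=(2, 2), padding='same'))
        model.add(LeakyReLU(alpha=0.2))
    model.add(Flatten())
    model.add(Dropout(0.4))
    model.add(Dense(1, activation='sigmoid'))   # probability that the input image is real
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
    return model

def define_generator(latent_dim=100):
    n_nodes = 4 * 4 * 256   # enough nodes for 256 low-resolution 4 x 4 feature maps
    model = Sequential()
    model.add(Dense(n_nodes, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Reshape((4, 4, 256)))
    # Three 2 x 2-strided transposed convolutions: 4x4 -> 8x8 -> 16x16 -> 32x32
    for _ in range(3):
        model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
        model.add(LeakyReLU(alpha=0.2))
    # 3 x 3 output convolution producing a 32 x 32 x 3 image with values in [-1, 1]
    model.add(Conv2D(3, (3, 3), activation='tanh', padding='same'))
    return model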
Figure 11 shows part of the code for training the generator and the discriminator:

Fig. 11. Train the generator and discriminator

3.7 Comparison with Other Blind Path Recognition


To improve the travel safety of blind people, many scholars have put forward their own ideas. For example, [13] proposed identifying the blind path through adaptive threshold segmentation applied to the whole image in the Lab color space; [14] proposed detecting all lines in images containing a blind path using edge detection and the Hough transform, and selecting the blind path boundary from the parallel relationships among blind path lines; [15] proposed distinguishing the sidewalk from the blind path based on the characteristics of the blind path, integrating color continuity, texture, line detection and threshold segmentation. These methods work well when the blind path is straight and the road surface is otherwise regular, but the real environment of blind paths is far from ideal: the blind path may be cut off by manhole covers, pressed by vehicles parked at the roadside, covered by street trees, or occupied by the various signboards of a commercial street. In such cases these methods are not applicable. Reference [16] proposed a blind path recognition method that combines the biogeography-based optimization (BBO) algorithm with the kernel fuzzy C-means (KFCM) algorithm; however, its model is complex and its practicality is limited.
The method adopted in this paper follows the current trend of development: in the age of AI, DCGAN is applied to blind path recognition. It only needs to be embedded in the guide product, so it neither increases the product's volume nor its weight, and it reduces the user's load. DCGAN offers a high recognition rate and is not affected by environmental conditions; applying such new technology encourages further scientific and technological development and pushes research in a direction that is more convenient for human beings.

4 Conclusion and Suggestion

This paper introduces the principles of the generative adversarial network and the deep convolutional generative adversarial network, and then trains the DCGAN on the original sample images to support the recognition of blind path images. The results show that the deep convolutional generative network can reproduce the blind path in the image well; the significance of this result is that deep learning can be used to expand the blind path data of a guide system and thus effectively address the data shortage problem.

References
1. Li, S.: Research on the guidance system based on ultrasonic sensor array. Chongqing
University of Technology (2013)
2. Yin, L.: Research on 3D reconstruction method of computer vision based on OpenCV.
Anhui University (2011)
3. Lu, P., Dong, H.: Face image generation based on deep convolution antagonism generation
network. Mod. Comput. (21), 56–58, 64 (2019)
4. Zeng, Q., Xiang, D., Li, N., Xiao, H.: Image recognition method based on semi supervised
depth generation countermeasure network. Meas. Control Technol. 38(08), 37–42 (2019)
5. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
6. Ye, C., Guan, W.: Application of generative adversary network. J. Tongji Univ. (Nat. Sci.
Ed.) 48(04), 591–601 (2020)
7. Guo, Q.: Generation of countermeasure samples based on generation countermeasure
network. Mod. Comput. 07, 24–28 (2020)
8. Ke, J., Xu, Z.: Research on speech enhancement algorithm based on generation
countermeasure network. Inf. Technol. Netw. Secur. 37(05), 54–57 (2018)
9. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep
convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015)
10. Ke, Y., Wang, X., Zheng, Y.: Deep convolution generation countermeasure network
structure. Electron. Technol. Softw. Eng. 24, 5–6 (2018)
11. Tang, X., Du, Y., Liu, Y., Li, J., Ma, Y.: An image recognition method based on conditional
depth convolution generation countermeasure network. Acta Automatica Sinica 44(05),
855–864 (2018)
12. Jia, J., Li, J.: Pest image recognition algorithm based on semi supervised generation network.
J. Wuhan Light Ind. Univ. 38(04), 45–52 (2019)
13. Ke, J.: Blind path recognition system based on image processing. Shanghai Jiaotong
University (2008)
14. Ke, J., Zhao, Q., Shi, P.: Blind path recognition algorithm based on image processing.
Comput. Eng. 35(01), 189–191, 197 (2009)
15. Yang, X., Yang, J., Yu, X.: Blind path recognition algorithm in image processing. Shang
(15), 228, 206 (2015)
16. Wang, M., Li, Y., Zhang, L.: Blind path region segmentation algorithm based on texture
features. Inf. Commun. 07, 23–26 (2017)
The Impact of the Behavioral Factors
on Investment Decision-Making: A Systemic
Review on Financial Institutions

Syed Faisal Shah1, Muhammad Alshurideh1,2, Barween Al Kurdi3, and Said A. Salloum4
1 University of Sharjah, Sharjah, UAE
alshurideh@sharjah.ac.ae
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE

Abstract. The purpose of this study is to identify the effects of behavioral factors (cognitive biases) on financial decision-making. A systematic review method was implemented: 29 published studies from the years 2010–2020 were selected and critically reviewed. The main findings indicate that the factors appearing most often in the papers were overconfidence (18), anchoring bias (11), the herding effect (10) and loss aversion (9), each of which has a significant impact on the financial decision-making process. Moreover, almost half of the articles were survey-based (questionnaire) quantitative studies, while the rest used qualitative or mixed methods. The study concludes that behavioral/psychological factors strongly influence financial decision-making. The time frame and the restriction of the keyword search to paper titles were the key limitations, which prevented a more in-depth investigation. For future research, the most frequently occurring cognitive biases should be measured under the uncertain conditions of the COVID-19 pandemic.

Keywords: Behavioral finance · Behavioral factor · Cognitive biases · Decision-making · Systemic review

1 Introduction

Financial markets operate in a highly competitive environment whose activity is being revolutionized by technological and geopolitical developments, and globalization has a strong impact on financial centers in developed countries [1]. Behavioral finance consists of two elements, cognitive psychology and limits to arbitrage, where "cognitive" refers to subjective human thinking. Humans make systematic errors in judgment and decision-making; for instance, overconfident individuals rely heavily on recent experience in decision-making, which leads to distortions [2]. The author also suggested that behavioral finance should rely on knowledge rather than an arrogant approach. Moreover, behavioral finance is the most notable field for experiments on the

psychological influence on irrational human decision-making in financial markets [3, 4]. Likewise, [5] claim that decision-making is an essential skill and mental function that shapes how people act while making investment decisions. Similarly, in judgment and decision-making humans rely on System 1 (unconscious), on System 2 (conscious), or on both at the same time [6]. System 1 is a spontaneous, easy, unconscious and quick answering process that depends on heuristic, naive, non-statistical decisions; in contrast, System 2 works consciously, is a slow, deliberate and effortful process that utilizes statistics, and requires time and resources to use in decision-making [7]. Because System 1 is an unconscious, intuitive decision process, it falls into different types of biases and traps. In this systematic review process, 82 independent factors affecting decision-making were identified, which also helped to form a new model. The proposed model takes the current COVID-19 uncertainty as a mediator together with the most frequently occurring factors understood to influence the individual decision process. This paper applies a systematic review method to identify the impact of behavioral factors influencing financial decision-making; the reason for using a systematic review is to summarize the large number of studies on a subject [8]. Overall, the results show that cognitive biases have a significant impact on financial decision-making. The paper is arranged as follows: the second part presents the literature review; the third part explains the method with the help of several tables; the fourth part shows the results (in a table) and analyzes the studies; the final part is the conclusion.

2 Literature Review

Several researchers have studied the impact of psychological biases on investment decision-making from various points of view, such as culture and environment, and obtained significant and useful results. How did behavioral ideas enter the area of finance? The behavioral concept was first established in the early 1980s by a few scholars from economics, psychology and engineering at the Russell Sage Foundation in New York. Authors explain that behavioral finance is informed by three elements of psychology: first, cognitive or behavioral psychology, which explains how the investor's mind performs the calculations needed to increase wealth; second, social psychology, which considers how a person acts and seeks acceptance; and finally the emotional responses to the intensity of trading, where investors focus on the decision itself rather than on precise calculation [9]. The research on prospect theory in 1979 and on judgment under uncertainty using heuristics and biases in 1982 profoundly shaped behavioral economics. Studying the impact of behavioral factors on financial decision-making through a systematic review therefore contributes to the existing literature in the field. A systematic review is research that draws on literature from data sources about an established issue. Such investigations offer bodies of evidence related to an intervention through a clear and systematized research method, while critically reviewing the selected studies. It helps to incorporate useful information from groups of studies conducted separately on certain topics, which may otherwise produce conflicting or overlapping results, to classify the themes requiring further evidence, and to provide
direction for future investigation [10]. Likewise, a systematic review is a form of literature review in which the authors collect and critically analyze related research papers by applying methods that address selected research questions and answers within a structured methodology [11]. In this study, the systematic review uses reliable sources, and the main elements of each study were extracted and summarized in a planned manner. The most frequently occurring factors are as follows:
1. Overconfidence: the tendency to overstate the degree to which one's judgments are accurate [12]. [13] suggested that overconfidence leads to errors in judgment that result in financial losses and the failure of most new ventures. On the other hand, overconfidence can help to promote professional performance [14], and it can enhance others' perception of one's abilities, which may help to achieve faster promotion and longer investment duration [15]. Overconfidence has a significant impact on decision-making [16–21].
2. Anchoring bias: this occurs when an individual forms an estimate from an initial starting value (piece of information) that stays in mind until the final answer. Because people are not aware of the bias, the adjustment away from the anchor is insufficient; any change in the starting point can lead to different estimates, so the bias depends on the initial values [22]. Anchoring bias has a significant impact on decision-making [14, 23, 24].
3. Loss aversion: losses are felt roughly twice as strongly as gains, so even a lottery with an attractive expected value is not accepted if it involves a potential loss [25–27]. Loss aversion has a significant impact on decision-making [9, 28–30].
4. Herding effect: a kind of imitation behavior leading to an alignment of individuals' behavior [31]. According to scholars, herding behavior occurs when a person conceals his or her own beliefs and imitates the actions of others. The herding effect has been found to influence decision-making [18, 32–34].

3 Methods

The method used to conduct a critical review of the studies is the systematic review, following the guidelines used by [35–37]; several studies have conducted systematic reviews on different topics [38–40]. Systematic reviews are a kind of literature review that gives a rigorous and clear picture of a subject and helps to reduce implicit researcher bias through comprehensive search strategies, specific search keywords and standardized inclusion and exclusion criteria. Moreover, systematic reviews extend the search beyond what individual researchers could cover on a subject [41–44]. This methodological section consists of three stages: the first describes the inclusion and exclusion criteria; the second presents the data sources and search strategy; and the third covers data coding and analysis. The details of these stages are given below.

3.1 Inclusion/Exclusion Criteria


The selected papers were analyzed in depth and had to satisfy the inclusion and exclusion criteria described in Table 1 [38, 45, 46]. These criteria help to maintain focus and keep the quality of the research in mind [42, 47–53].

Table 1. Inclusion and exclusion criteria.

No. | Criteria | Inclusion | Exclusion
1 | Source type | Peer-reviewed articles, scholarly journals, case studies, academic journals, dissertations & theses | Others removed
2 | Selected language | English only | Others removed
3 | Type of studies | Quantitative, qualitative, empirical studies, systematic review studies | Verbal/visual tape, film, documentary, reports, other
4 | Study design | Prior and controlled studies, survey, interview, case study | –
5 | Measurement | Behavioral factors (heuristics and biases) and decision-making | –
6 | Outcome | Relationship between cognitive biases and decision-making | –
7 | Context | Should involve behavioral factors (heuristics and biases) and decision-making | All contexts that do not mention (behavioral finance), (behavioral factor), (psychological factor), (behavioral economics) AND (decision-making) separately in the title

3.2 Data Sources and Research Strategies


The research articles included in this systematic literature review come from a broad search of existing research papers in various databases (ProQuest One Academic, Google Scholar, Emerald Insight, Taylor & Francis, Springer and Science Direct). Systematic reviews are noteworthy because they fulfill the need for a comprehensive review of all existing research studies related to a research question [54]. The search covers studies from 2010 to 2020 and began in March 2020. The keywords included in the search terms were (behavioral finance, behavioral factors, psychological factors and behavioral economics) AND (decision-making); see Table 2. The search terms were applied only to the title as an advanced search

Table 2. The data sources and search keywords

Behavioral finance
# | Keywords searched | Google Scholar | Emerald | IEEE | ProQuest One Academic | Science Direct | Springer | Wiley | Taylor & Francis | Total frequency per keyword
1 | "Behavioral Finance" AND "Decision-making" | 17 | 6 | 0 | 3 | 3 | 1 | 1 | 1 | 32
2 | "Behavioral factors" AND "Decision-making" | 12 | 2 | 1 | 2 | 2 | 2 | 0 | 0 | 21
3 | "Psychological factors" AND "Decision-making" | 28 | 2 | 0 | 8 | 2 | 0 | 0 | 2 | 42
4 | "Behavioral economic" AND "Decision-making" | 36 | 1 | 0 | 5 | 4 | 0 | 2 | 1 | 49
Total frequency per database | | 93 | 11 | 1 | 18 | 11 | 3 | 3 | 4 | 144

Table 3. The filtration criteria for articles from each database

Behavioral finance
# | Database | Initial search in databases | After filtering article types | After removing duplicates | After removing irrelevant articles | After skimming the papers | After critically reading the papers
1 | Google Scholar | 93 | 38 | 25 | 21 | 15 | 10
2 | Emerald | 11 | 11 | 11 | 10 | 9 | 8
3 | IEEE | 1 | 1 | 1 | 1 | 1 | 0
4 | ProQuest One Academic | 18 | 18 | 12 | 8 | 4 | 3
5 | Science Direct | 11 | 11 | 11 | 6 | 5 | 4
6 | Springer | 3 | 3 | 3 | 3 | 0 | 0
7 | Wiley | 3 | 3 | 3 | 2 | 2 | 2
8 | Taylor & Francis | 4 | 4 | 4 | 2 | 2 | 2
Total frequency of articles | | 144 | 89 | 70 | 53 | 38 | 29

criteria in the databases; only the two most relevant articles were collected through a basic search. The initial database search returned Google Scholar (N = 93), Emerald (N = 11), IEEE (N = 1), ProQuest One Academic (N = 18), Science Direct (N = 11), Springer (N = 3), Wiley (N = 3) and Taylor & Francis (N = 4), for a total of N = 144 across all databases. Filtration then proceeded step by step: first, books and other document types were excluded (N = 89 remaining); the second step removed duplicates (N = 70); the third step removed irrelevant papers (N = 53); after the fourth step, skimming the papers, N = 38 remained; and finally 9 papers were removed after critically reading the documents, leaving N = 29, see Table 3. In this manner the relevant studies were selected and included in the systematic review process, as listed in Table 4. [55] suggest that this process ensures the replicability of the study. The field formally started in 1979 with [22], the study of prospect theory, which concerns decision-making under risk.

4 Result

Table 4 is coded as follows: (a) author(s), (b) database, (c) year, (d) place, (e) dependent variable(s), (f) context, (g) data collection method, (h) methodology and (i) sample size. The selected studies were critically filtered, and the exclusion process strictly followed the factor(s) affecting the dependent variable(s) [56, 58]. The 29 selected studies were published between 2010 and 2020; 16 papers (55%) were carried out in stock markets. Half of the studies (50%) were conducted in Asian financial markets, 6 articles (20.6%) in the United States, and the remaining papers in other parts of the world. Quantitative methods were used in 13 papers, qualitative methods in 11 papers, and the remaining 5 studies used mixed methods. The largest numbers of articles were collected from Google Scholar (10) and Emerald (8), with four or fewer papers from each of the other databases.

Table 4. Analysis of included research articles


Author(s) | Database | Year | Place | Dependent variables | Context | Data collection methods | Methodology | Sample size
[23] Google 2014 Universidade Gender (Male & Real Estate Questionnaire Quantitative 217 (108 men &
Scholar Católica de Female) financial (Survey) Method 109 women)
Brasília decision-making
of individual
investors
[14] Google 2016 Tabriz city, Investors’ Stock Exchange Questionnaire Quantitative statistical sample,
Scholar Iran Decisions (Survey) method 385 people
Making
[56] Google 2018 Pakistan Financial Stock Exchange Semi-structured Qualitative 30 Interviews
Scholar decisions of interview method research
investors strategy

[18] Google 2011 Vietnam Investment Ho Chi Minh Structured Mixed 1.
Scholar Performance of Stock Exchange interviews, semi methods 172 respondents
Individual structured (Questionnaire) &
Investors interviews, 2 Managers of
(Moderator unstructured HOSE
Decision- interviews, self-
making) completion
questionnaire,
observation,
group discussion
[28] Google 2017 Pakistan investors’ Pakistani stock Questionnaire Quantitative 41 respondents
Scholar decision-making markets (Survey) Method
and investment
performance
[20] Google 2018 Pakistan Investment Pakistan Stock Questionnaire Quantitative sample consists of
Scholar decisions and Exchange (Survey) method 143 investors
perceived market trading on the
efficiency PSX
[57] Google 2016 Pakistan Decision of Islamabad Stock Questionnaire Quantitative 100 investors
Scholar investment Exchange (Survey) Method from Islamabad
Stock Exchange
[58] Google 2011 United States Behavioral the retirement EBRI/ICI 401(k) Qualitative 21 million
Scholar Decision-making savings decision database method participants in the
in Americans’ (Retirement sample &
Retirement saving) & JDM Literature Review
Savings and behavioral-
Decisions economics
(individuals’ literatures
savings behavior)
[59] Google 2013 Sydney, Strategic Organization Empirical studies Qualitative Empirical Studies
Scholar Australia decisions within (Company) method (# not Specified)
firms
[60] Google 2012 Croatia CEO’s process Business firm Literature Qualitative Research studies
Scholar Reviews method (Literature
Reviews) (# not
Specified)
[9] Emerald 2010 UK Discrepancy Wholesale and Round Qualitative Round
between the retail financial Table discussion method Table discussion
academic and the markets “held in on behavioral on behavioral
professional London,11 finance attended finance attended
world when it December 2009 by academics and by academics and
comes to at Armourers’ practitioners. practitioners
utilizing Hall (Viewpoint)
behavioral
finance research
(Finance
industry)
[29] Emerald 2010 United States Decision-making The Behavioral Empirical Studies Qualitative A cross
financial method disciplinary
paradigm. review of relevant
(Conceptual natural and social
paper) sciences is
conducted to
identify common
foundational
concepts
[61] Emerald 2012 UK Financial Behavioral Research papers Qualitative Research papers
decisions Finance used of method (# not specified)
psychological
experimental
methods

[4] Emerald 2015 UK Influence of Financial Empirical Studies Qualitative Empirical Studies
moods and markets (Literature method (# not Specified)
emotions on Reviews)
financial
behavior
[62]) Emerald 2019 Tunisia Unexpected Tunisian Stock (Financial Market Quantitative Sample Publicly
earnings Exchange Council Tunisia) methods traded 39
(UE) and surprise Announcements companies (2010-
unexpected 2014)
earnings (SUE),
Earning per share
EPS and the
revision of
earnings forecast
(REV)
[17] Emerald 2019 Egypt Investment Egyptian Stock Questionnaire Quantitative Structured
decisions market (Survey) method questionnaire
(demographic survey carried out
characteristics: among 384 local
age, gender, Egyptian, foreign,
education level institutional, and
and experience) individual
investors
[63] Emerald 2020 Malaysia Individual Generation Y in Questionnaire Quantitative A total of 502
investment Malaysia (Survey) method respondents (male
decisions and female)
[34] Emerald 2012 Malaysia Day-of-the week This paper Literature Qualitative Psychological
anomaly conceptually Reviews method biases literature
investigates the and links (# not
role of Specified)
psychological
biases on the
day-of-the week
anomaly
(DOWA) in
Stock Market
[30] ProQuest 2014 India Decision-making Indian stock Questionnaire Quantitative 150 respondents
for investment market (Survey) method
[64] ProQuest 2012 United States Financial the financial Qualitative meta- Qualitative Meta- Analysis (#
practices processes of analysis of the method not specified)
individuals, current state of
groups, and financial behavior
organizations
[65] ProQuest 2015 Pakistan Investment Islamabad Stock Questionnaire Quantitative 200 Financial
decision. Exchange, (Survey) Method investors
(Moderator is Islamabad, (risk (Respondents)
Risk Perception) perception in
Pakistani culture
context.)
[32] Science 2017 United States Decision-making four major US All firm level data Mixed Sample Period
Direct industries are from Annual methods (1996 -2015) All
(Manufacturing, Compustat financial services
Construction, database from firms (SIC codes
Wholesale and 1996 to 2015 and 6000–6999),
Services the annual gross regulated utilities
domestic product (SIC codes 4900–
(GDP) data was 4999) and firms
extracted from with less than ten
World Bank (10) years of
database continuous data
were excluded

[19] Science 2016 Malaysia financial Malaysian stock Questionnaire Quantitative 200 respondents
Direct decisions market (Survey) method
[21] Science 2013 China Venture The irrational Double-sided Qualitative Studies on the
Direct Enterprise Value behavior of moral hazard method double-sided
and the venture model moral hazard (#
Investment entrepreneurs not specified)
and venture
capitalists has
significant
impact on the
corporation’s
investment
[66] Science 2019 United States Oneself and Identified in the Literature Mixed The 190 subjects
Direct decision-making behavioral- Reviews and methods were SCU
for others DMfO economics laboratory students, recruited
literature apply experiment by email and
in decision- (Questionnaire) studies (Literature
making for reviews)
others (DMfO)
[67] Wiley 2015 Hong Kong Financial Stock Market Five experiments Mixed 5 studies: 1. End
decision-making methods anchoring 155, 2.
Visual bias 202,
3. Consequential
stakes 48, 4. Eye
tracking 50, 5.
Run-length 162.
Total sample size
617
[68] Wiley 2017 United States Remediation Trihydro Questionnaire Quantitative The survey was
decision Corporation (Survey) method completed by 118
behaviors respondents
representing
academia,
consultants,
clients, and others
[69] Taylor & 2013 Espana Quality of financial Research papers Qualitative Selected studies
Francis (Spain) financial institutions and method of cognitive
decisions markets biases as well as
cognitive models
(# not specified)
[33] Taylor & 2016 Pakistan Individual the Lahore Stock Questionnaire Quantitative The investors of
Francis investor’s Exchange (LSE) (Survey) method stock exchange
sample collected
254

5 Conclusion

Prior papers explain the effects of behavioral factors on decision-making, providing insight into how to identify human biases and improve investment decision-making. The current study applied a systematic review method to the influence of behavioral or psychological factors on financial decision-making, with the aim of comprehensively analyzing the published papers and reviewing their conclusions. The results show that the most frequently appearing factors in financial decision-making are overconfidence, anchoring bias (a heuristic bias), loss aversion (a prospect factor) and the herding effect. Moreover, most of the articles focused on the

financial sector and used a quantitative method (13 studies), a qualitative method (11
studies) and mixed methods (5 studies). Finally, half (50%) of the studies were con-
ducted in Asian financial markets, 20% in the United States and the rest of the articles
were from another part of the world. The constraints of the study were the time and
search of key terms in the title of the articles. For future direction, those most repetitive
cognitive bias should be measured during COVID-19 pandemic uncertain situation.

References
1. Hilton, D.J.: The psychology of financial decision-making: applications to trading, dealing,
and investment analysis. J. Psychol. Financ. Mark. 2(1), 37–53 (2001)
2. Ritter, J.R.: Behavioral finance. Pacific-Basin Financ. J. 11(4), 429–437 (2003)
3. Bazerman, M.H., Moore, D.A.: Judgment in Managerial Decision Making. Wiley, New
York (1994)
4. Duxbury, D.: Behavioral finance: insights from experiments II: biases, moods and emotions.
Rev. Behav. Financ. (2015)
5. Fünfgeld, B., Wang, M.: Attitudes and behaviour in everyday finance: evidence from
Switzerland. Int. J. Bank Mark. (2009)
6. Kahneman, D.: Maps of bounded rationality: psychology for behavioral economics. Am.
Econ. Rev. 93(5), 1449–1475 (2003)
7. Kahneman, D.: Thinking, Fast and Slow. Macmillan, New York (2011)
8. Corrêa, V.S., Vale, G.M.V., de R. Melo, P.L., de A. Cruz, M.: O ‘Problema da Imersão’ nos
Estudos do Empreendedorismo: Uma Proposição Teórica. Rev. Adm. Contemp. 24(3), 232–
244 (2020)
9. DeBondt, W., Forbes, W., Hamalainen, P., Muradoglu, Y.G.: What can behavioural finance
teach us about finance? Qual. Res. Financ. Mark. (2010)
10. Linde, K., Willich, S.N.: How objective are systematic reviews? Differences between
reviews on complementary medicine. J. R. Soc. Med. 96(1), 17–22 (2003)
11. Salloum, S.A.S., Shaalan, K.: Investigating students’ acceptance of E-learning system in
Higher Educational Environments in the UAE: applying the Extended Technology
Acceptance Model (TAM). Br. Univ. Dubai (2018)
12. Fischhoff, B., Slovic, P., Lichtenstein, S.: Knowing with certainty: the appropriateness of
extreme confidence. J. Exp. Psychol. Hum. Percept. Perform. 3(4), 552 (1977)
13. Singh, R.P.: Overconfidence. New Engl. J. Entrep. (2020)
14. Shabgou, M., Mousavi, A.: Behavioral finance: behavioral factors influencing investors’
decisions making. Adv. Soc. Humanit. Manag. 3(1), 1–6 (2016)
15. Oberlechner, T., Osler, C.L.: Overconfidence in currency markets (2004). https://faculty.haas.berkeley.edu/lyons/Osler%20overconfidence%20in%20FX.pdf. Accessed 20 Apr 2011
16. Robinson, A.T., Marino, L.D.: Overconfidence and risk perceptions: do they really matter
for venture creation decisions? Int. Entrep. Manag. J. 11(1), 149–168 (2015)
17. Metawa, N., Hassan, M.K., Metawa, S., Safa, M.F.: Impact of behavioral factors on
investors’ financial decisions: case of the Egyptian stock market. Int. J. Islam. Middle East.
Financ. Manag. (2019)
18. Le Luong, P., Thi Thu Ha, D.: Behavioral factors influencing individual investors’ decision-
making and performance: a survey at the Ho Chi Minh Stock Exchange (2011)
19. Bakar, S., Yi, A.N.C.: The impact of psychological factors on investors’ decision making in
Malaysian stock market: a case of Klang Valley and Pahang. Procedia Econ. Financ. 35,
319–328 (2016)

20. Shah, S.Z.A., Ahmad, M., Mahmood, F.: Heuristic biases in investment decision-making
and perceived market efficiency. Qual. Res. Financ. Mark. (2018)
21. Jing, G.U., Hao, C., Xian, Z.: Influence of psychological and emotional factors on the
venture enterprise value and the investment decision-making. In: ITQM, pp. 919–929 (2013)
22. Tversky, A., Kahneman, D.: Prospect theory: an analysis of decision under risk.
Econometrica 47(2), 263–291 (1979)
23. Matsumoto, A.S., Fernandes, J.L.B., Ferreira, I., Chagas, P.C.: Behavioral finance: a study
of affect heuristic and anchoring in decision making of individual investors, Available SSRN
2359180 (2013)
24. Copur, Z.: Handbook of Research on Behavioral Finance and Investment Strategies:
Decision Making in the Financial Industry: Decision Making in the Financial Industry. IGI
Global (2015)
25. Merkle, C.: Financial loss aversion illusion. Rev. Financ. 24(2), 381–413 (2020)
26. Tversky, A., Kahneman, D.: Advances in prospect theory: cumulative representation of
uncertainty. J. Risk Uncertain. 5(4), 297–323 (1992)
27. Tversky, A., Kahneman, D.: Loss aversion in riskless choice: a reference-dependent model.
Q. J. Econ. 106(4), 1039–1061 (1991)
28. Anum, B.A.: Behavioral factors and their impact on individual investors decision making
and investment performance: empirical investigation from Pakistani stock market. Glob.
J. Manag. Bus. Res. (2017)
29. Olsen, R.A.: Toward a theory of behavioral finance: implications from the natural sciences.
Qual. Res. Financ. Mark. 2(2), 100–128 (2010)
30. Roopadarshini, S.: A study on implication of behavioral finance towards investment decision
making on stock market. Asia Pacific J. Manag. Entrep. Res. 3(1), 202 (2014)
31. Yang, J., Cashel-Cordo, P., Kang, J.G.: Empirical research on herding effects: case of real
estate markets. J. Account. Financ. 20(1), 1–9 (2020)
32. Camara, O.: Industry herd behaviour in financing decision making. J. Econ. Bus. 94, 32–42
(2017)
33. Sarwar, A., Afaf, G.: A comparison between psychological and economic factors affecting
individual investor’s decision-making behavior. Cogent Bus. Manag. 3(1), 1232907 (2016)
34. Brahmana, R.K., Hooy, C., Ahmad, Z.: Psychological factors on irrational financial decision
making. Humanomics 28, 236–257 (2012)
35. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
36. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
37. Mallett, R., Hagen-Zanker, J., Slater, R., Duvendack, M.: The benefits and challenges of
using systematic reviews in international development research. J. Dev. Eff. 4(3), 445–455
(2012)
38. Meline, T.: Selecting studies for systemic review: inclusion and exclusion criteria.
Contemp. issues Commun. Sci. Disord. 33(Spring), 21–27 (2006)
39. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
40. Alshurideh, M.: A qualitative analysis of customer repeat purchase behaviour in the UK
mobile phone market. J. Manag. Res. 6(1), 109 (2014)

41. Ghannajeh, A., et al.: A qualitative analysis of product innovation in Jordan’s pharmaceu-
tical sector. Eur. Sci. J. 11(4), 474–503 (2015)
42. Alshurideh, et al.: Loyalty program effectiveness: theoretical reviews and practical proofs.
Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
43. Assad, N.F., Alshurideh, M.T.: Financial reporting quality, audit quality, and investment
efficiency: evidence from GCC economies. WAFFEN-UND Kostumkd. J. 11(3), 194–208
(2020)
44. Assad, N.F., Alshurideh, M.T.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
45. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Al A’yadeh, I.: An investigation
of factors affecting patients waiting time in primary health care centers: an assessment study
in Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
46. Alshurideh, et al.: Understanding the quality determinants that influence the intention to use
the mobile learning platforms: a practical study. Int. J. Interact. Mob. Technol. 13(11), 157–
183 (2019)
47. Al Kurdi, B.: Investigating the factors influencing parent toy purchase decisions: reasoning
and consequences. Int. Bus. Res. 10(4), 104–116 (2017)
48. Kurdi, B.A., Alshurideh, M., Salloum, S.A., Obeidat, Z.M., Al-dweeri, R.M.: An empirical
investigation into examination of factors influencing university students’ behavior towards e-
learning acceptance using SEM approach. Int. J. Interact. Mob. Technol. 14(2), 19–41
(2020)
49. Alzoubi, H., Alshurideh, M., Al Kurdi, B., Inairata, M.: Do perceived service value, quality,
price fairness and service recovery shape customer satisfaction and delight? A practical study
in the service telecommunication context. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
50. Al Kurdi, B.: Healthy-food choice and purchasing behaviour analysis: an exploratory study
of families in the UK. Durham University (2016)
51. Al-Dmour, H., Al-Shraideh, M.T.: The influence of the promotional mix elements on
Jordanian consumer’s decisions in cell phone service usage: an analytical study.
Jordan J. Bus. Adm. 4(4), 375–392 (2008)
52. Alshurideh, M., Nicholson, M., Xiao, S.: The effect of previous experience on mobile
subscribers’ repeat purchase behaviour. Eur. J. Soc. Sci. 30(3), 366–376 (2012)
53. Ashurideh, M.: Customer service retention – a behavioural perspective of the UK mobile
market. Durham University (2010)
54. García-Feijoo, M., Eizaguirre, A., Rica-Aspiunza, A.: Systematic review of sustainable-
development-goal deployment in business schools. Sustainability 12(1), 440 (2020)
55. González, I.F., Urrútia, G., Alonso-Coello, P.: Revisiones sistemáticas y metaanálisis: bases
conceptuales e interpretación. Rev. española Cardiol. 64(8), 688–696 (2011)
56. Shahid, M.N., Aftab, F., Latif, K., Mahmood, Z.: Behavioral finance, investors’ psychology
and investment decision making in capital markets: an evidence through ethnography and
semi-structured interviews. Asia Pacific J. Emerg. Mark. 2(1), 14 (2018)
57. Hunjra, A.I., Qureshi, S., Riaz, L.: Psychological factors and investment decision making: a
confirmatory factor analysis. J. Contemp. Manag. Sci. 2(1) (2016)
58. Knoll, M.A.Z.: The role of behavioral economics and behavioral decision making in
Americans’ retirement savings decisions. Soc. Sec. Bull. 70, 1 (2010)
59. Garbuio, M., Lovallo, D., Ketenciouglu, E.: Behavioral economics and strategic decision
making (2013)
60. Galetic, L., Labas, D.: Behavioral economics and decision making: importance, application
and development tendencies. In: An Enterprise Odyssey. International Conference
Proceedings, p. 759 (2012)

61. Muradoglu, G., Harvey, N.: Behavioural finance: the role of psychological factors in
financial decisions. Rev. Behav. Financ. 4, 68–80 (2012)
62. Bouteska, A., Regaieg, B.: Psychology and behavioral finance. EuroMed J. Bus. (2019)
63. Rahman, M., Gan, S.S.: Generation Y investment decision: an analysis using behavioural
factors. Manag. Financ. (2020)
64. Howard, J.A.: Behavioral finance: contributions of cognitive psychology and neuroscience
to decision making. J. Organ. Psychol. 12(2), 52–70 (2012)
65. Riaz, L., Hunjra, A.I.: Relationship between psychological factors and investment decision
making: the mediating role of risk perception. Pakistan J. Commer. Soc. Sci. 9(3), 968–981
(2015)
66. Ifcher, J., Zarghamee, H.: Behavioral economic phenomena in decision-making for others.
J. Econ. Psychol. 77, 102180 (2019)
67. Duclos, R.: The psychology of investment behavior:(De) biasing financial decision-making
one graph at a time. J. Consum. Psychol. 25(2), 317–325 (2015)
68. Clayton, W.S.: Remediation decision-making and behavioral economics: results of an
industry survey. Groundw. Monit. Remediat. 37(4), 23–33 (2017)
69. De Bondt, W., Mayoral, R.M., Vallelado, E.: Behavioral decision-making in finance: an
overview and assessment of selected research. Spanish J. Financ. Accounting/Revista
Española Financ. y Contab. 42(157), 99–118 (2013)
Deep Learning Technology
and Applications
A Deep Learning Architecture with Word
Embeddings to Classify Sentiment in Twitter

Eman Hamdi, Sherine Rady, and Mostafa Aref
Faculty of Computer and Information Sciences, Ain Shams University, Abbassia, Cairo, Egypt
{emanhamdi,srady,mostafa.aref}@cis.asu.edu.eg

Abstract. Social media networks are one of the main platforms on which we express our feelings, and the emotions we put into text tell a lot about our attitude towards any topic. Therefore, text analysis is needed to detect emotions in many fields. This paper introduces a deep learning model that classifies sentiments in tweets using different types of word embeddings. The main component of the model is the Convolutional Neural Network (CNN), and the main features used are word embeddings. Trials are made on randomly initialized word embeddings and on pre-trained ones of different variants, namely Word2Vec, GloVe and fastText. The model consists of three CNN streams that are concatenated and followed by a fully connected layer; each stream contains only one convolutional layer and one max-pooling layer. The model detects positive and negative emotions in the Stanford Twitter Sentiment (STS) dataset. The accuracy achieved is 78.5% when using randomly initialized word embeddings and reaches a maximum of 84.9% with Word2Vec embeddings. The model not only shows that randomly initialized word embeddings can achieve good accuracy, it also demonstrates the power of pre-trained word embeddings, which help to achieve a higher, competitive accuracy in sentiment classification.

Keywords: Sentiment classification · Deep learning · Convolutional neural networks · Word embeddings · Social media networks

1 Introduction

Emotions felt by humans are a very significant characteristic for understanding their psychological traits. As text is the main method of communication between humans, there are many approaches to studying text in order to identify emotions and classify sentiment [1]. Social media networks such as Twitter and Facebook are main platforms on which every event, situation and piece of news is posted and discussed. People use them as windows through which to describe their emotions and beliefs about different types of topics [2]. Posts on such sites have a natural and realistic character, as people post freely throughout the day to express themselves, which makes these sites valuable sources of textual data that can be studied and analyzed to detect sentiments and emotions [3]. Despite this enormous amount of textual data, it is nearly impossible to process it manually, which raised the need for various techniques to


automatically process and classify text [4]. Therefore, detecting emotions from text can
be applied using different approaches. We can roughly divide them to lexicon-based
approaches and machine learning approaches.
Recently, as a machine learning approach, deep learning models such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) have been used in text classification tasks [5, 6]. These models can detect implicit semantic relations between words. High-level features can also be learnt by deep learning models even when only low-level features are provided. This is achieved in text classification tasks whenever a sufficient amount of textual data is present in the training phase [7].
One of the most essential aspects of text classification tasks is the selection of features. In lexicon-based approaches, words are represented as discrete values. This representation makes the data sparse because it treats words as distinct, unrelated symbols. The problem can be solved by using a different representation, and therefore word embeddings are used. Word embeddings are dense, continuous vector representations of words. These vectors give insights about word meaning, as words with similar semantics appear close to each other in the embedding space [8]. Word embeddings can be initialized in two ways: (1) randomly initialized word embeddings, and (2) pre-trained word embeddings such as Glove [9], Word2vec [10] or fastText [11]. The first initialization simply assigns random values to the word embeddings. These values can be kept static or updated through the training phase to learn task-specific word embeddings. The second initialization uses word embeddings that are already pre-trained on huge datasets, which can likewise be kept static or made trainable during the training phase.
In this paper, a Convolutional Neural Network model is proposed that classifies negative and positive sentiments in tweets. The introduced model works with both randomly initialized word embeddings and pre-trained word embeddings. The model basically contains three CNN streams, which are merged and followed by a fully-connected Sigmoid layer. The Stanford Twitter Sentiment (STS) dataset [12] is used to train, validate and test the model. The model achieved 78.5% accuracy using randomly initialized word embeddings and 84.9% accuracy using pretrained Word2Vec word embeddings.
The paper organization goes as follows: Sect. 2 discusses the related work in
emotion detection and sentiment classification from text. In Sect. 3, the architecture of
the CNN model is explained, and each layer is described in detail. Experimental results
are shown and discussed in Sect. 4, and Sect. 5 is a summary of the proposed work.

2 Related Work

Traditional machine learning techniques and deep learning techniques are both applied
in text classification tasks. One of the early works on using machine learning with
sentiment classification is introduced in [13]. The main idea of this work is studying the
ability of machine learning methods in classifying text by sentiment (positive or
negative). Three methods are employed which are Naive Bayes, Maximum Entropy
and Support Vector Machines (SVMs). These methods are tested using a movie review
dataset from IMDB. Support vector machines give the best performance recording
82.9% accuracy while Naïve Bayes gives the lowest performance recording 78.7%
accuracy.
In [14], a Naïve Bayes model and a Maximum Entropy model are implemented to
classify positive and negative sentiment from tweets. They used the java version of the
Twitter Application Programming Interface (API) to collect tweets. As features,
Multinomial unigrams, Bernoulli unigrams and bigrams are tested. The best perfor-
mance is given using the Naïve Bayes model with unigrams.
In [12], Naive Bayes, Maximum Entropy and SVMs are applied for the classifi-
cation task of tweets. Unigrams, bigrams and unigrams with part of speech tags are the
selected features. They achieved a max accuracy of 83% using Maximum Entropy with
both unigrams and bigrams.
Recently, deep learning techniques have been applied to different classification tasks in speech recognition [15–17], image processing [18–20] and text classification [6, 21–24].
One of the deep learning models used for text classification is the CNN. CNNs were originally created for image processing and were later applied to various classification tasks, including text classification [25]. In [6], a combination of a CNN and a Recurrent Neural Network (RNN) is implemented. The model's ability to capture semantic and orthographic information from characters alone is demonstrated on the English Penn Treebank dataset, where it achieves competitive results even with 60% fewer parameters.
In [21], a character-level CNN for text classification is proposed. Large data sets are
created to test the model on. AG’s news corpus, Sogou news corpus, DBPedia ontology
dataset, Yelp reviews, Yahoo! Answers dataset, and Amazon reviews are used to
collect the data. Two deep CNNs were built. The model achieved good performance in
sentiment analysis and topic classification compared against traditional and deep
learning models.
In [22], a CNN that consists of one layer is implemented and tested for different
sentence classification tasks. The used features are word embeddings which are either
randomly initialized or pretrained (Word2Vec). Four variations of the model were built.
The models are tested on 7 text classification tasks using different datasets. It out-
performs previous work on 4 tasks out of 7.
In [23], a CNN architecture is proposed. It consists of a convolutional layer, a max-pooling layer and a soft-max layer for classifying a tweet as negative or positive. The model is tested on two sentiment analysis tasks from the SemEval-2015 Twitter Sentiment Analysis challenge. For the first, phrase-level task the accuracy achieved is 84.79%, and for the second, message-level task the accuracy achieved is 64.59%.
In [24], a CNN model is applied to a binary sentiment task (positive or negative label) and a ternary classification task (positive, negative or neutral label). Using Movie Reviews, the model achieved an 80.96% F1 score on the binary classification task and 68.31% on the ternary task. Using the Stanford Sentiment Treebank (SST), the model achieved an 81.4% F1 score for binary classification, while it achieved a 70.2% F1 score for the same task on Customer Reviews (CR).
3 The Proposed Deep Learning Architecture for Sentiment Classification

The proposed deep learning architecture, shown in Fig. 1, includes text preprocessing, CNN streams and a fully-connected layer as its main modules. Text preprocessing is applied by filtering, tokenizing and indexing sentences. The processed text is then represented in the form of word embeddings and passed to the three CNN streams. The outputs of the CNN streams are merged together and passed to the fully-connected layer. A detailed explanation of each module is given in this section.

Fig. 1. The main modules of the Deep learning architecture

3.1 Text Preprocessing


Text preprocessing is important to prepare the data for the deep learning model. We apply filtering, tokenizing and indexing to the sentences. The purpose of the filtering stage is to clean the data from unneeded words or symbols. The sentences are filtered by removing punctuation marks, stop words, links, mentions, hashtags (e.g. #happy, #home, #sick), emoticons and names. The filtered sentences are then tokenized, i.e. each sentence is split into words. Finally, indexing is applied to give each distinct word in the vocabulary a specific index, which will be the index of its word embedding. In our work, this means that when a sentence is fed into the CNN, it is fed as indices that are passed to the embedding layer to fetch the corresponding word embedding for each index in the sentence.
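As an illustration, a minimal preprocessing sketch in Python is given below. It is a sketch under stated assumptions: the regular expressions, the tiny stop-word list and the padding convention are illustrative choices of ours, not the authors' exact pipeline.

```python
import re

# Hypothetical, minimal stop-word list for illustration only
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of"}

def filter_tweet(text):
    """Remove links, mentions, hashtags, punctuation and stop words, then split into tokens."""
    text = re.sub(r"https?://\S+", " ", text)          # links
    text = re.sub(r"[@#]\w+", " ", text)               # mentions and hashtags
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())   # punctuation, emoticons, digits
    return [w for w in text.split() if w not in STOP_WORDS]

def build_index(tokenized_sentences):
    """Assign every distinct word a positive integer index (0 is kept for padding)."""
    vocab = {}
    for sent in tokenized_sentences:
        for w in sent:
            vocab.setdefault(w, len(vocab) + 1)
    return vocab

def to_indices(tokens, vocab, max_len=16):
    """Convert tokens to indices and pad/truncate to the fixed sentence length."""
    ids = [vocab.get(w, 0) for w in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

tweets = ["I love this movie! #happy http://t.co/x", "@john this is so bad..."]
tokens = [filter_tweet(t) for t in tweets]
vocab = build_index(tokens)
print([to_indices(t, vocab) for t in tokens])
```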
3.2 Word Embeddings


Word embeddings are continuous word representations that can be applied at the input of a CNN model. Each word is represented by a unique vector, either containing random values or containing meaningful values obtained from pre-training. Pretrained word embeddings are trained on huge unlabelled corpora and can be used in many natural language processing tasks. In the proposed model, both randomly initialized and pretrained word embeddings are tested in the embedding layers. For each word in the vocabulary there is a word embedding d. Given a sentence s of m words, it is represented as a sentence matrix d_{1:m} obtained by concatenating all of its word embeddings, such that:

d_{1:m} = d_1 \oplus d_2 \oplus \cdots \oplus d_m    (1)

where \oplus denotes the concatenation operator.
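To make the two initialization options concrete, the sketch below uses Python with Keras (the framework the authors report using in Sect. 4.3). The vocabulary size, dimensions and the pretrained-vector file name are assumptions for illustration, not the authors' exact values.

```python
import numpy as np
import tensorflow as tf

vocab_size = 20000   # assumed vocabulary size (illustrative)
embed_dim = 300      # 300 for Word2Vec/fastText, 100 for the Glove variants used here

# Option 1: randomly initialized, trainable word embeddings
random_embedding = tf.keras.layers.Embedding(vocab_size + 1, embed_dim, trainable=True)

# Option 2: embeddings initialized from pretrained vectors
def load_pretrained(path, vocab, dim):
    """Fill an embedding matrix from a text file of 'word v1 v2 ...' lines;
    words missing from the file keep small random values."""
    matrix = np.random.uniform(-0.05, 0.05, (len(vocab) + 1, dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if parts[0] in vocab:
                matrix[vocab[parts[0]]] = np.asarray(parts[1:dim + 1], dtype="float32")
    return matrix

# The file name below is illustrative, not the authors' exact embedding file.
# matrix = load_pretrained("glove.twitter.27B.100d.txt", vocab, 100)
# pretrained_embedding = tf.keras.layers.Embedding(
#     matrix.shape[0], matrix.shape[1],
#     embeddings_initializer=tf.keras.initializers.Constant(matrix),
#     trainable=True)
```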

3.3 CNN Streams


The main building block of our model is the CNN. The model contains 3 CNN streams.
each one is consisting of an embedding layer, a convolutional layer and a max pooling
layer. In Fig. 2 the detailed process of a sentence going through one CNN stream is
shown.

Fig. 2. The sentence flow through one CNN stream

The embedding layer holds the word embeddings of all words in the vocabulary. When an indexed sentence is passed to this layer, a sentence matrix, as described in the previous section, is produced. This sentence matrix is then passed to the convolutional layer. The number of filters is 100 in each stream, but the filter sizes differ between the three CNN streams, being 3, 5 and 7 respectively. Feature maps are produced from
the filtering and passed to the max-pooling layer, where the most important features are chosen from each map to generate a feature vector. The feature vectors generated by the three CNN streams are merged and passed to the fully-connected layer.

3.4 Fully-Connected Layer


The fully-connected layer is responsible for classifying an input as carrying negative or positive sentiment. As shown in Fig. 3, the output of the CNN streams is flattened before it is fed to this layer. This means that whatever the dimensions of the output of the previous layer, it is converted to a 1-D vector so that the fully-connected layer can operate on it.

Fig. 3. The input of the fully-connected layer

The activation function used is the Sigmoid function, given by the equation:

\phi(Z) = \frac{1}{1 + e^{-Z}}    (2)

where Z is the flattened vector passed to the output neuron. Dropout, a generalization technique introduced in [26], is also applied to this layer to prevent overfitting, with the dropout ratio set to 0.5.
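Putting Sects. 3.3 and 3.4 together, a minimal Keras sketch of the three-stream architecture could look as follows. Details not stated in the paper (activation functions, the use of global max pooling, the optimizer, and the illustrative sizes) are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, embed_dim, max_len = 20000, 300, 16   # illustrative sizes

def cnn_stream(inputs, kernel_size):
    """One stream: embedding -> 1-D convolution (100 filters) -> max pooling."""
    x = layers.Embedding(vocab_size + 1, embed_dim)(inputs)
    x = layers.Conv1D(filters=100, kernel_size=kernel_size, activation="relu")(x)
    # Global max pooling keeps the strongest response of each filter
    # (the exact pooling configuration is not stated in the paper; this is an assumption).
    return layers.GlobalMaxPooling1D()(x)

inputs = layers.Input(shape=(max_len,), dtype="int32")
streams = [cnn_stream(inputs, k) for k in (3, 5, 7)]    # the three filter sizes
merged = layers.Concatenate()(streams)
merged = layers.Dropout(0.5)(merged)                     # dropout ratio 0.5
outputs = layers.Dense(1, activation="sigmoid")(merged)  # positive vs. negative

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```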

4 Experimental Results

In this section, the dataset, evaluation metrics, configurations and results discussion
will be explained in detail.

4.1 Dataset
The model is tested on the labelled Stanford Twitter Sentiment dataset, which consists
of 1.6 million tweets. 80K randomly selected sentences are used for training and 16K
sentences are chosen for validation. For testing, we used the original testing set that is
manually annotated by [12] and consists of 359 sentences. The sentences are labelled with one of three class labels: positive emotion, negative emotion or neutral. We use only the positive and negative classes.

4.2 Evaluation Metrics


Four evaluation metrics are used: Accuracy, Precision, Recall and F1-score. These
metrics in [27] are defined as follows:
Accuracy is the ratio of correctly predicted sentences to the total number of sen-
tences as given in the equation:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}    (3)

where TP (True Positives) are the correctly classified positive sentences, TN (True Negatives) are the correctly classified negative sentences, FP (False Positives) are the wrongly classified positive sentences, and FN (False Negatives) are the wrongly classified negative sentences.
Precision is the ratio of correctly predicted positive sentences to the total predicted
positive sentences as in the equation:

\text{Precision} = \frac{TP}{TP + FP}    (4)

Recall is the ratio of correctly predicted positive sentences to all sentences in the actual positive class. It is given by the equation:

\text{Recall} = \frac{TP}{TP + FN}    (5)

F1-score is the weighted average of Precision and Recall. This score takes both false positives and false negatives into account. It is given by the equation:

\text{F1-score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}    (6)
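For completeness, these four metrics can be computed directly from the true and predicted labels, for example with scikit-learn (the library choice here is ours, not the authors'; the labels are toy values):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]      # 1 = positive, 0 = negative (toy labels)
y_pred = [1, 0, 0, 1, 0, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```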

4.3 Configurations
The maximum length of sentences has been adjusted to 16. The embedding layer
dimensions are [vocabulary size * word embedding dimensions] and each sentence
matrix is of dimensions [maximum length of sentences * word embedding length]. For
word embeddings initialization, we used 4 different initializations as follows:
Randomly initialized word embeddings: The word embedding has the dimensions
of [300 * 1], with the embedding layer and the sentence matrix dimensions as [vocab
size * 300] and [14 * 300] respectively. It took 10 epochs to train the network.
Pre-trained Glove word embeddings: the pretrained Glove Wikipedia and Glove Twitter embeddings were both used, in their 100-dimension versions. For both settings, the word embedding has the dimensions of [100 * 1], with the embedding layer and the sentence matrix dimensions as [vocab size * 100] and [14 * 100] respectively. The model took 4 epochs to train in both settings.
Pre-trained word2vec word embeddings: Google’s pretrained word embeddings are
used in this initialization. The word embedding dimensions in this setting are [300 * 1]
with the embedding layer and the sentence matrix dimensions as [vocab size * 300] and
[14 * 300] respectively. It took only 4 epochs to train the network.
Pre-trained fastText word embeddings: we used both fastText Wikipedia and fastText Crawl, in their 300-dimension versions. For both settings, the word embedding has the dimensions of [300 * 1], with the embedding layer and the sentence matrix dimensions as [vocab size * 300] and [14 * 300] respectively. Using fastText Wikipedia, the model needed 4 epochs of training, while it took 5 epochs using fastText Crawl.
The results are obtained using Intel(R) Core (TM) i7-5500U CPU @2.40 GHz
personal computer with 16.0 GB RAM. The experiments have been developed using
Keras deep learning framework, TensorFlow backend and CUDA.

4.4 Results Discussion


The testing results obtained using the different word embeddings are shown in Table 1 and Fig. 4. The model achieves 78.5% accuracy using the randomly initialized word embeddings, and the accuracy gets higher with the pretrained types of word embeddings.

Table 1. Model evaluation in terms of Accuracy, Precision, Recall and F1-score

Word embeddings | Accuracy | Precision | Recall | F1-score
Random          | 78.5     | 73.9      | 89     | 80.7
Word2Vec        | 84.9     | 83.3      | 87.9   | 85.5
Glove Wiki      | 79.3     | 81.7      | 76.3   | 78.9
Glove Twitter   | 83.0     | 84.9      | 80.7   | 82.8
fastText Crawl  | 82.1     | 82.0      | 82.8   | 82.5
fastText Wiki   | 81.3     | 83.2      | 79.1   | 81.1

Fig. 4. Model evaluation using different word embeddings



The model's performance improves when the pretrained types of word embeddings are used, which is expected, as the pretrained word embeddings already hold information about English words before the training of the model even starts. In the randomly initialized setting, by contrast, the values are totally random and the model starts from scratch to learn the relations between the words. Even with the latter setting, the model achieves good accuracy, which indicates the ability of the CNN to capture syntactic and semantic relations between words from raw text alone, without any prior knowledge.
The model reaches its maximum accuracy of 84.9% using Word2Vec word embeddings. The other types of pretrained word embeddings also raise the accuracy compared to the randomly initialized setting. This confirms that pretrained word embeddings are powerful and carry prior knowledge about words before the training even starts. Word2Vec scoring the maximum accuracy means it is the pretrained type most related to the training, validation and testing data in our work; it represents our data best among the other types of word embeddings.

5 Conclusion

A deep convolutional architecture is proposed in this paper. The model contains three main modules, namely text preprocessing, CNN streams, and a fully-connected sigmoid layer for the classification. The model is tested on the sentiment task (negative or positive emotion) using the Stanford Twitter Sentiment dataset. Using either randomly initialized or pre-trained word embeddings, the accuracy scores are good. The model achieved 78.5% accuracy using only randomly initialized word embeddings that are kept trainable through the training phase, and achieved a maximum accuracy of 84.9% using the pretrained Word2Vec word embeddings. The CNN model can work well with random initialization of word embeddings, but using pretrained ones gives the model insights into the relations between words, which helps to improve the performance. For future work, new types of pretrained word embeddings such as ELMo and BERT will be explored. Character-level embeddings will also be tested, either individually or together with word embeddings.

References
1. Polignano, M., De Gemmis, M., Basile, P., Semeraro, G.: A comparison of word-
embeddings in emotion detection from text using BiLSTM, CNN and Self-Attention. In: The
27th Conference on User Modeling, Adaptation and Personalization, pp. 63–68 (2019).
https://doi.org/10.1145/3314183.3324983
2. Gaind, B., Syal, V., Padgalwar, S.: Emotion detection and analysis on social media. arXiv
preprint arXiv:1901.08458 (2019)
3. De Choudhury, M., Gamon, M., Counts, S., Horvitz, E.: Predicting depression via social
media. In: Seventh International AAAI Conference, pp. 128–137 (2013)
4. Deng, X., Li, Y., Weng, J., Zhang, J.: Feature selection for text classification: a review.
Multimed. Tools Appl. 78(3), 3797–3816 (2019)

5. Cheng, H., Yang, X., Li, Z., Xiao, Y., Lin, Y.: Interpretable text classification using CNN
and max-pooling. arXiv preprint arXiv:1910.11236 (2019)
6. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. In:
Thirtieth AAAI Conference on Artificial Intelligence (2016)
7. Lecun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
8. Word embeddings and their use in sentence classification tasks. arXiv preprint arXiv:1610.08229 (2016)
9. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation.
In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language
Processing (EMNLP), pp. 1532–1543 (2014)
10. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
vector space, pp. 1–12. arXiv preprint arXiv:1301.3781 (2013)
11. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword
information . Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
12. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision.
Processing 150(12), 1–6 (2009)
13. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine
learning techniques. In: Proceedings of ACL-02, Conference on Empirical Methods in
Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics
(2002)
14. Parikh, R., Movassate, M.: Sentiment analysis of user-generated twitter updates using
various classification techniques. CS224N Final Report, vol. 118 (2009).
15. Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Laurent, C., Bengio, Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks. arXiv preprint arXiv:1701.02720 (2017)
16. Qian, Y., Woodland, P.C.: Very deep convolutional neural networks for robust speech
recognition. In: IEEE Spoken Language Technology Workshop (SLT), San Diego, CA,
USA, vol. 1, no. 16, pp. 481–488 (2016)
17. Purwins, H., Li, B., Virtanen, T., Schlüter, J., Chang, S.Y., Sainath, T.: Deep learning for
audio signal processing. IEEE J. Sel. Top. Signal Process. 13(2), 206–219 (2019)
18. Ren, S., He, K., Girshick, R., Zhang, X., Sun, J.: Object detection networks on convolutional
feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 39(7), 1476–1481 (2016)
19. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition, pp. 1–14. arXiv preprint arXiv:1409.1556 (2014)
20. Shang, R., He, J., Wang, J., Xu, K., Jiao, L., Stolkin, R.: Dense connection and depthwise
separable convolution based CNN for polarimetric SAR image classification. Knowl. Based
Syst. 105542 (2020)
21. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text
classification. In: Advances in Neural Information Processing Systems, pp. 649–657 (2015)
22. Kim, Y.: Convolutional Neural Networks for Sentence Classification. arXiv preprint arXiv:
1408.5882 (2014)
23. Severyn, A., Moschitti, A.: UNITN: training deep convolutional neural network for twitter
sentiment classification. In: Proceedings of the 9th International Workshop on Semantic
Evaluation, pp. 464–469 (2015). https://doi.org/10.18653/v1/S15-2079
24. Kim, H., Jeong, Y.S.: Sentiment classification using Convolutional Neural Networks. Appl.
Sci. 9(11), 2347 (2019)

25. Johnson, R., Zhang, T.: Semi-supervised convolutional neural networks for text categoriza-
tion via region embedding. In: Advances in Neural Information Processing Systems,
pp. 919–927 (2015)
26. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a
simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–
1958 (2014)
27. Tharwat, A.: Classification assessment methods. Appl. Comput. Inform. (2018). https://doi.
org/10.1016/j.aci.2018.08.003
Deep Neural Networks for Landmines Images Classification

Refaat M. Fikry and H. Kasban(&)

Nuclear Research Center, Atomic Energy Authority, P.O. 13759, Inshas, Cairo, Egypt
eng_refaat@yahoo.com, hany_kasban@yahoo.com

Abstract. This paper presents an efficient solution for automatic classification between Anti-Tank (AT) landmine signatures and standard hyperbolic signatures obtained from other objects, including Anti-Personnel (AP) landmines, based on pretrained deep Convolutional Neural Networks (CNNs). Specifically, two deep learning techniques have been tested and compared with another published landmine classification method. The first technique is based on a pretrained VGG-16 network used for both feature extraction and classification from the dataset of landmine images, while the second technique uses a pretrained Resnet-18 network for feature extraction and a Support Vector Machine (SVM) for classification. The proposed algorithms have been tested using a dataset of landmine images taken by the Laser Doppler Vibrometer based Acoustic to Seismic (LDV-A/S) landmine detection system. The results show that the deep learning-based techniques give higher classification accuracy than the published landmine classification method, and that the Resnet-18 pretrained network with SVM classification gives better average accuracy than the VGG-16 pretrained network-based classification.

Keywords: Landmines · LDV-A/S · CNN · Deep neural networks

1 Introduction

There is no doubt that landmines are among the main problems affecting the entire world, and Egypt is one of the countries affected by this problem. Over 20 million landmines are subject to explosion at any time, and they occupy a large region estimated at 3800 km2. Landmines are victim-operated explosive devices positioned on or near the ground that wait until a person or animal triggers their detonating system. Some landmines contain only explosives, while others also contain shrapnel pieces. They are categorized into AP landmines, which are made for killing humans, and AT landmines, which are designed to destroy vehicles [1]. AT landmines are heavier than AP landmines, carry more explosives, and need extra pressure or weight to detonate [2].

The main obstacles facing landmine clearance are the absence of mine maps, the change of mine locations due to physical and climatic conditions, the different types of AP and AT landmines, and the high removal costs, while the production cost of landmines is lower than their detection and removal cost [3].
Many algorithms and techniques have been created for landmine detection and removal, each with its pros and cons [1, 4, 5]. One of the accurate and efficient landmine detection systems is the LDV-A/S system. The collected data from this system include mixtures of mines, blank spots, and clutters [4]. LDV-A/S data interpretation is performed manually and off-line, depending on the experience and skills of the trained operator. This process is time and effort consuming, and its dependence on human skill leads to inconsistencies and errors in the interpretation. There are many algorithms used for landmine detection based on the LDV detection system [5–10], and other techniques are based on the GPR system, such as [11–13]. References [14–16] classify between AP and AT mines based on the LDV-A/S system using MFCC algorithms as a feature extractor and ANN and SVM as classifiers with different techniques. Also, [17] succeeds in classifying between AT and AP mines based on GPR using a CNN.
The classification process of hidden landmine objects consists of two phases, performing both the training on the input image patterns and the evaluation on the testing image sets. These are the feature extraction and classification phases, as in Fig. 1. The feature extraction stage converts the input training images into a series of feature vectors containing the information that discriminates the main image features. The output of this stage is the feature matrix, which combines the feature vector coefficients [15, 16]. This feature matrix is used as a reference during the classification process. The classifier employs a collection of feature matrices from various training images to build a model during training. After that, this model is tested using a test image set to validate its performance and obtain the design that enables classification to take place.
Recently, deep learning algorithms have been commonly used for difficult feature extraction tasks in conventional RGB object classification, medical imaging and other image fields. Conventional deep learning algorithms, such as convolutional neural networks, can solve object classification and recognition problems and achieve good classification accuracy, but they rely heavily on large amounts of data and long training [18–24]. For the classification and recognition of objects, preserving the validity of a deep learning algorithm with fewer samples is therefore of great importance. Such problems were significantly alleviated when transfer learning came into the view of researchers [25–27].
In the case of landmine recognition, there is also a considerable danger that the process poses to the people involved, and deep learning has obtained very good results in several recognition and classification tasks [28–30]. In this paper, two deep learning techniques based on CNNs are therefore chosen for landmine classification. In the first, a pretrained VGG-16 network is used as both feature extractor and classifier, while in the second, a pretrained Resnet-18 network is used as feature extractor and an SVM as classifier.
Fig. 1. Landmine classification system (training/testing images → feature extractor → feature coefficients → classifier → testing decision).

Our experimental study, running on actual LDV-A/S data obtained from a test site, highlights the key benefits of the proposed methods with respect to other state-of-the-art solutions. More specifically: (i) since our approach does not rely on any theoretical modeling, it is less vulnerable to mistakes caused by simplified interpretations or model simplifications (e.g. linearization); (ii) the proposed method is able to detect images with small patches; and (iii) the possibility of also embedding real acquisitions in the training step improves system performance up to 100% accuracy.
This paper is organized as follows. In Sect. 2, a brief overview of the LDV-A/S system is presented. Section 3 presents the traditional landmine classification approach. Section 4 shows the two proposed deep learning techniques used for classification. In Sect. 5, the obtained results are shown, including different performance measures to evaluate the classification accuracy. Finally, some concluding remarks are given in Sect. 6.

2 LDV-A/S System

Figure 2 shows the block diagram of the main components used in the LDV-A/S detection technique for buried objects. Examples of landmine images obtained with this technique are shown in Fig. 3. The LDV emits laser beams towards a vibrating surface of the land area under test [1]. The surface vibrations lead to a Doppler frequency shift of the reflected laser light. A photo detector detects the backward light returning into the LDV from the buried landmine object on the opposite path. This light is modulated and can provide details about the surface velocity along the direction of the laser beam. The voltage obtained from the output signal is directly proportional to the instantaneous vibration velocity of the surface point. A personal computer (PC) monitor shows a 2-D image of the ground surface scanned by the XY mirrors. A measuring grid is defined before scanning and superimposed on the image of the ground surface. Due to their high sensitivity for detecting AP and AT mines, excellent spatial resolution and long working distances, LDVs are especially suitable for this measuring application.
Fig. 2. Simple diagram of the LDV-A/S system

Fig. 3. Samples of landmine images

3 Conventional Landmine Classification Technique

The current conventional landmine classification, based on geometric details, has many drawbacks. Object identification from the landmine image is accomplished by thresholding the image with a certain threshold to exclude the dark background. This mask removes the background and leaves only details about the important objects. Then an area thresholding step, based on the areas of the objects, is carried out to remove unwanted area clutters. An image preprocessing algorithm, such as morphological operations, is performed prior to thresholding [1]. Figure 4 shows a block diagram of the conventional method of landmine classification [18]. Intensity thresholding may not remove all of the unwanted noise and clutter in the images, and the problem of noise effects is seldom investigated by researchers in this field.
Fig. 4. Steps of the conventional landmine classification technique (landmine image, histogram estimation with the first trough as threshold, pre-processing, intensity thresholding, area thresholding, decision making, classification result).
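To make the conventional pipeline concrete, a minimal Python (NumPy/SciPy) sketch of intensity plus area thresholding is given below. The threshold-selection rule (image mean instead of the histogram's first trough), the morphological operation and the area limit are our assumptions, not the exact choices of [18].

```python
import numpy as np
from scipy import ndimage

def conventional_classify(image, area_min=50):
    """Toy version of the conventional pipeline: pre-processing, intensity
    thresholding, then area thresholding of the connected components."""
    # Pre-processing: morphological (grey) opening to suppress small bright noise
    cleaned = ndimage.grey_opening(image, size=(3, 3))

    # Intensity thresholding: the image mean is used here as a stand-in for the
    # 'first trough' threshold of the conventional method (an assumption).
    mask = cleaned > cleaned.mean()

    # Area thresholding: keep only connected components larger than area_min pixels
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.isin(labels, np.nonzero(sizes >= area_min)[0] + 1)
    return keep  # binary map of candidate buried-object regions

# Synthetic 2-D "scan" with one bright blob standing in for a buried object
img = np.random.rand(64, 64) * 0.2
img[20:30, 20:30] += 1.0
objects = conventional_classify(img)
print("Detected object pixels:", int(objects.sum()))
```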

4 Proposed Landmine Classification Approach

4.1 VGG-16 Pretrained Network Implementation


This technique uses the transfer learning to retrain a CNN to classify a new set of
images. VGG-16 pretrained image classification network has been trained over large
number of images and it can be used images classification up to 1000 object categories.
This type of the CNN network is rich features representations for a wide range of
images. Very deep convolutional network for large scale image recognition (VGG-16)
architecture is introduced by Simonyan and Zisserman [31]. This architecture consists
of a 16-layer network comprised of convolutional layers as shown in Fig. 5. For few
data sets problem in the deep learning contribute to over-fitting and ideal local solu-
tions, transfer learning solves this problem to a certain extent. However, it is often
difficult to solve the under-adaptation problem of the transfer learning. Figure 6 shows
the steps of transfer learning from pretrained network.
Fig. 5. VGG-16 model architecture (stacked 3x3 convolution blocks with 64, 128, 256 and 512 filters, interleaved with pooling layers, followed by fully-connected layers).



Fig. 6. Transfer learning workflow
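As a rough illustration of this workflow (the authors implemented it in MATLAB's Deep Learning Toolbox; the sketch below uses Python/Keras, and the image size, head size and training settings are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Load VGG-16 pretrained on ImageNet, without its original 1000-class head
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False          # freeze the pretrained convolutional layers

# Replace the classifier head with one for the two landmine classes (AT vs. AP)
x = layers.Flatten()(base.output)
x = layers.Dense(256, activation="relu")(x)    # head size is an assumption
outputs = layers.Dense(2, activation="softmax")(x)
model = Model(base.input, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# train_ds / val_ds would hold LDV-A/S scan images resized to 224x224
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```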

4.2 Resnet Deep Featurizer and SVM Classifier


In this technique, learned image features are extracted from a pre-trained CNN and used to train a classifier. Feature extraction is the fastest and simplest way to utilize the representational power of pre-trained deep networks. An SVM is used as the classifier in this paper. The Resnet-18 network builds a hierarchical representation of the input images, where deeper layers provide higher-level features. The global pooling layer pools the input features over all spatial locations, giving 512 features in total. Figure 7 shows the proposed workflow of this technique.

Fig. 7. Feature extraction workflow from pretrained network.
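A minimal Python sketch of this featurizer-plus-SVM workflow is shown below; the paper itself uses MATLAB, so the PyTorch/scikit-learn calls, the image preprocessing and the linear SVM kernel are our assumptions (the weights argument follows recent torchvision versions).

```python
import torch
import torchvision
from torchvision import transforms
from sklearn.svm import SVC

# ResNet-18 pretrained on ImageNet; drop the final FC layer so the network
# ends at the global average pooling (a 512-dimensional feature vector).
resnet = torchvision.models.resnet18(weights="DEFAULT")
featurizer = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def extract_features(pil_images):
    """Return an (N, 512) array of deep features for a list of PIL images."""
    batch = torch.stack([preprocess(img) for img in pil_images])
    with torch.no_grad():
        feats = featurizer(batch).squeeze(-1).squeeze(-1)
    return feats.numpy()

# X_train_imgs / y_train would hold the 70% training split of AT/AP images
# features = extract_features(X_train_imgs)
# svm = SVC(kernel="linear").fit(features, y_train)
# print("Test accuracy:", svm.score(extract_features(X_test_imgs), y_test))
```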

5 Results and Discussion

In this section, the accuracy and performance of the proposed deep learning techniques based on transfer learning from pretrained networks are demonstrated. The algorithms are implemented in the MATLAB 2019A environment using the MATLAB deep learning toolbox. They are run on a laptop computer with an Intel Core i3-3120M CPU @ 2.5 GHz (2 cores, 4 logical processors), 6 GB of RAM, and a 64-bit operating system.
5.1 Image Dataset

The proposed automated classification techniques are applied in this section to 100 images of different types of AT and AP landmines buried at varying depths, scanned by the LDV-A/S device. The collected dataset contains 100 images: 50 AP images and 50 AT images. A summary of the image data is shown in Table 1. For both techniques, 70% of the collected images of both landmine types are chosen for training and 30% for testing and validating the accuracy of each technique.
To test the efficiency of both proposed deep learning techniques, three types of noise with different degrees are added to the test dataset images and the accuracy is measured. The noise models are given by the following equations.
For Gaussian noise:

P_G(z) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(z-\mu)^2}{2\sigma^2}}    (1)

where z is the gray level, \mu is the mean value and \sigma is the standard deviation. Assume in our case that \mu = 0 and \sigma = 10(x/10), x = 1:10.
For speckle noise:

R_{sn}(i, j) = M_{sn}(i, j) \, K_{sn}(i, j) + n_{sn}(i, j)    (2)

where M_{sn} is the original image, K_{sn} is the multiplicative component, R_{sn} is the observed image and n_{sn} is the additive speckle noise component, with mean = 0 and variance = x/5000, x = 1:10.
For salt and pepper noise, the mean is 0 and the variance = x/1000, x = 1:10.
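A small NumPy sketch of how such noisy test copies could be generated is given below. The parameter values are illustrative only, the speckle model is simplified to its multiplicative part, and the exact parameter-to-SNR mapping used by the authors is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian(img, sigma):
    """Additive zero-mean Gaussian noise, as in Eq. (1)."""
    return img + rng.normal(0.0, sigma, img.shape)

def add_speckle(img, var):
    """Simplified multiplicative form of Eq. (2): R = M * K with K ~ N(1, var)."""
    return img * rng.normal(1.0, np.sqrt(var), img.shape)

def add_salt_pepper(img, amount):
    """Randomly set a fraction of pixels to the minimum or maximum value."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < amount / 2] = img.min()
    noisy[mask > 1 - amount / 2] = img.max()
    return noisy

clean = rng.random((64, 64))
noisy_versions = {
    "gaussian": add_gaussian(clean, sigma=0.1),
    "speckle": add_speckle(clean, var=0.02),
    "salt_pepper": add_salt_pepper(clean, amount=0.05),
}
```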
After the training process is carried out on the training set of both landmine types using the modified VGG-16 architecture, the accuracy and loss curves are as shown in Fig. 8, which shows a perfect accuracy of 100% for the training dataset classification.

Fig. 8. Accuracy and loss values versus number of training epochs during the training process.

Table 1. Image dataset

Type & Model | Buried depth (cm) | No. of images || Type & Model | Buried depth (cm) | No. of images
AT VS 2.2    | 2.5  | 11 || AT EM 12  | 5   | 6
AT VS 2.2    | 5    | 4  || AT TMA 4  | 15  | 4
AT VS 2.2    | 10   | 3  || AT TMA 4  | 10  | 2
AT VS 1.6    | 2.5  | 3  || AP VS 5.0 | 5   | 25
AT VS 1.6    | 7.5  | 3  || AP VAL 69 | 2.5 | 3
AT M 15      | 7.5  | 6  || AP VAL 69 | 5   | 8
AT M 19      | 5    | 8  || AP PMD 6  | 5   | 7

Then the testing process is applied to the test dataset, which is chosen randomly (30% of the total images for each type), and also shows 100% classification accuracy.
To check the robustness of the techniques, the different types of noise are applied to the testing dataset; Tables 2, 3 and 4 show the classification accuracy of the different classification methods for images distorted with AWGN, speckle and salt & pepper noise, respectively.
The results demonstrate an increase in the classification rate with rising SNR. They show that using the Resnet-18 pretrained network + SVM gives a classification rate of 98.67% at SNR = 0 dB, increasing up to 100% at 10 dB SNR. When the Resnet-18 pretrained network is applied to the training set of images and the extracted features are fed to the SVM classifier, 100% accuracy is verified for both training and testing classification between AP and AT landmines.

Table 2. Classification accuracy (%) using different classification methods for images distorted with AWGN noise

SNR (dB) | Traditional approach [18] | GoogleNet pretrained network | Resnet-18 pretrained network + SVM
0        | 90 | 97.33 | 98.67
5        | 94 | 98.33 | 99.33
10       | 96 | 99.33 | 100
15       | 98 | 99.67 | 100
20       | 98 | 100   | 100
Table 3. Classification accuracy (%) using different classification methods for images distorted with speckle noise

SNR (dB) | Traditional approach [18] | GoogleNet pretrained network | Resnet-18 pretrained network + SVM
0        | 98 | 99.33 | 100
5        | 98 | 99.67 | 100
10       | 98 | 100   | 100
15       | 98 | 100   | 100
20       | 98 | 100   | 100

Table 4. Classification accuracy (%) using different classification methods for images distorted with salt & pepper noise

SNR (dB) | Traditional approach [18] | GoogleNet pretrained network | Resnet-18 pretrained network + SVM
0        | 92 | 97.33 | 100
5        | 98 | 98.67 | 100
10       | 98 | 99.33 | 100
15       | 98 | 99.67 | 100
20       | 98 | 100   | 100

6 Conclusion

In this article, an effective solution is presented for automatic classification between AT mine signatures and other items, such as AP mines or standard hyperbolic signatures, based on pretrained deep CNNs applied to the LDV-A/S system. Specifically, two deep learning techniques are chosen: in the first, a pretrained VGG-16 network is used as both feature extractor and classifier, while in the second, a pretrained Resnet-18 network is used as feature extractor and an SVM as classifier. Both techniques give very high accuracy (100%) on landmine images without any added noise, while when noise is added, only the Resnet-18 pretrained network technique maintains 100% accuracy for both training and testing classification between AP and AT landmines.

References
1. Kasban, H.: Detection of buried objects using acoustic waves, M.Sc. thesis, Faculty of
Electronic Engineering, Department of Electronics and Electrical Communications Engi-
neering, Menoufia University (2008)
2. Paik, J., Lee, C., Abidi, M.: Image processing-based mine detection techniques: a review.
Subsurf. Sens. Technol. Appl. 3, 153–202 (2002)

3. El-Qady, G., Al-Sayed, A.S., Sato, M., Elawadi, E., Ushijima, K.: Mine detection in Egypt:
Evaluation of new technology. International Atomic Energy Agency (IAEA), IAEA (2007)
4. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., Abd El-Samie, F.: Automatic object
detection from acoustic to seismic landmine images. Presented at the International
Conference on Computer Engineering & Systems, Cairo - Egypt (2008)
5. Travassos, X.L., Avila, S.L., Ida, N.: Artificial neural networks and machine learning
techniques applied to ground penetrating radar: a review. Appl. Comput. Inform. (2018)
6. Kasban, H., Zahran, O., Elaraby, S., El-Kordy, M., Abd El-Samie, F.: A comparative study
of landmine detection techniques. Sens. Imaging Int. J. 11, 89–112 (2010)
7. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., El-Rabaie, E.S., Abd El-Samie, F.:
Efficient detection of landmines from acoustic images. Prog. Electromagnet. Res. C 6, 79–92
(2009)
8. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., Abd El-Samie, F.: False alarm rate
reduction in the interpretation of acoustic to seismic landmine data using mathematical
morphology and the wavelet transform. Sens. Imaging 11(3), 113–130 (2010)
9. Makki, I.: Hyperspectral imaging for landmine detection, Ph.D. thesis, Electrical and Electronics Engineering Department, Lebanese University and Politecnico Di Torino (2017)
10. Kasban, H., Zahran, O., El-Kordy, M., Elaraby, S., El-Rabaie, E.S., Abd El-Samie, F.:
Optimizing automatic object detection from images in laser doppler vibrometer based
acoustic to seismic landmine detection system. In: National Radio Science Conference,
NRSC, Proceedings (2009)
11. Lameri, S., Lombardi, F., Bestagini, P., Lualdi, M., Tubaro, S.: Landmine detection from
GPR data using convolutional neural networks. Presented at the 25th European Signal
Processing Conference (EUSIPCO), Kos, Greece (2017)
12. Bestagini, P., Lombardi, F., Lualdi, M., Picetti, F., Tubaro, S.: Landmine detection using
autoencoders on multi-polarization GPR volumetric data, arXiv, vol. abs/1810.01316 (2018)
13. Silva, J.S., Guerra, I.F.L., Bioucas-Dias, J., Gasche, T.: Landmine detection using
multispectral images. IEEE Sens. J. 19, 9341–9351 (2019)
14. Elshazly, E., Elaraby, S., Zahran, O., El-Kordy, M., Abd El-Samie, F.: Cepstral detection of
buried landmines from acoustic images with a spiral scan. Presented at the ICENCO’2010 -
2010 International Computer Engineering Conference: Expanding Information Society
Frontiers (2010)
15. Elshazly, E., Zahran, O., Elaraby, S., El-Kordy, M., Abd El-Samie, F.: Cepstral identification
techniques of buried landmines from degraded images using ANNs and SVMs based on
spiral scan. CIIT Int. J. Digit. Image Process. 5(12), 529–539 (2013)
16. Elshazly, E., Elaraby, S., Zahran, O., El-Kordy, M., El-Rabaie, E.S., Abd El-Samie, F.:
Identification of buried landmines using Mel frequency cepstral coefficients and support
vector machines (2012)
17. Almaimani, M.: Classifying GPR images using convolutional neural networks, M.Sc. thesis,
Computer Science Department, University of Tennessee at Chattanooga, Chattanooga,
Tennessee (2018)
18. Zhang, L., Liu, J., Zhang, B., Zhang, D., Zhu, C.: Deep cascade model-based face
recognition: when deep-layered learning meets small data. IEEE Trans. Image Process. 29,
1016–1029 (2020)
19. Yayeh Munaye, Y., Lin, H.P., Adege, A.B., Tarekegn, G.B.: UAV positioning for
throughput maximization using deep learning approaches. Sensors 19(12), 2775 (2019)
20. Vishal, V., Ramya, R., Srinivas, P.V., Samsingh, R.V.: A review of implementation of
artificial intelligence systems for weld defect classification. Mater. Today Proc. 16, 579–583
(2019)
21. Verma, A., Singh, P., Alex, J.S.R.: Modified convolutional neural network architecture
analysis for facial emotion recognition. In: 2019 International Conference on Systems,
Signals and Image Processing (IWSSIP), pp. 169–173 (2019)
22. Treebupachatsakul, T., Poomrittigul, S.: Bacteria classification using image processing and
deep learning. In: 2019 34th International Technical Conference on Circuits/Systems,
Computers and Communications (ITC-CSCC), pp. 1–3 (2019)
23. Talo, M., Baloglu, U.B., Yıldırım, Ö., Rajendra Acharya, U.: Application of deep transfer
learning for automated brain abnormality classification using MR images. Cogn. Syst. Res.
54, 176–188 (2019)
24. Stephen, O., Maduh, U.J., Ibrokhimov, S., Hui, K.L., Al-Absi, A.A., Sain, M.: A multiple-
loss dual-output convolutional neural network for fashion class classification. In: 2019 21st
International Conference on Advanced Communication Technology (ICACT), pp. 408–412
(2019)
25. Deng, C., Xue, Y., Liu, X., Li, C., Tao, D.: Active transfer learning network: a unified deep
joint spectral-spatial feature learning model for hyperspectral image classification. IEEE
Trans. Geosci. Remote Sens. 57, 1741–1754 (2019)
26. Côté-Allard, U., Fall, C.L., Drouin, A., Campeau-Lecours, A., Gosselin, C., Glette, K., et al.:
Deep learning for electromyographic hand gesture signal classification using transfer
learning. IEEE Trans. Neural Syst. Rehabil. Eng. 27, 760–771 (2019)
27. Ahn, E., Kumar, A., Feng, D., Fulham, M., Kim, J.: Unsupervised deep transfer feature
learning for medical image classification. In: 2019 IEEE 16th International Symposium on
Biomedical Imaging (ISBI 2019), pp. 1915–1918 (2019)
28. Harkat, H., Ruano, A.E., Ruano, M.G., Bennani, S.D.: GPR target detection using a neural
network classifier designed by a multi-objective genetic algorithm. Appl. Soft Comput. 79,
310–325 (2019)
29. Giovanneschi, F., Mishra, K.V., Gonzalez-Huici, M.A., Eldar, Y.C., Ender, J.H.G.:
Dictionary learning for adaptive GPR landmine classification. IEEE Trans. Geosci. Remote
Sens. 57, 10036–10055 (2019)
30. Dumin, O., Plakhtii, V., Shyrokorad, D., Prishchenko, O., Pochanin, G.: UWB subsurface
radiolocation for object location classification by artificial neural networks based on discrete
tomography approach. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer
Engineering (UKRCON), pp. 182–187 (2019)
31. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. arXiv:1409.1556 (2014)
Deep Convolutional Neural Networks for ECG Heartbeat Classification Using Two-Stage Hierarchical Method

Abdelrahman M. Shaker(&), Manal Tantawi, Howida A. Shedeed, and Mohamed F. Tolba

Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
{Abdelrahman.shaker,manalmt,dr_howida,fahmytolba}@cis.asu.edu.eg

Abstract. The electrocardiogram (ECG) is widely used in computer-aided systems for arrhythmia detection because it provides essential information about heart functionality. Cardiologists use it to diagnose and detect abnormalities of the heart. Hence, automating the process of ECG heartbeat classification plays a vital role in clinical diagnosis. In this paper, a two-stage hierarchical method is proposed using deep Convolutional Neural Networks (CNNs) to determine the category of a heartbeat in the first stage and then classify the classes belonging to that category in the second stage. This work is based on 16 different classes from the public MIT-BIH arrhythmia dataset. The MIT-BIH dataset, however, is unbalanced, which degrades the classification accuracy of deep learning models. This problem is solved by using an adaptive synthetic sampling technique to generate synthetic heartbeats and restore the balance of the dataset. In this study, an overall accuracy of 97.30% and an average accuracy of 91.32% are obtained, which surpasses several ECG classification methods.

Keywords: Heartbeat classification · Arrhythmias · Convolution Neural Networks (CNN)

1 Introduction

Cardiovascular Disease (CVD) is one of the most serious health problems and the world's leading cause of death. CVD involves a large number of health conditions, including heart and blood vessel disease, heart attack, stroke, heart failure, and arrhythmia. Every year around 17.9 million people die from CVD, which is 31% of all deaths worldwide [1]. Arrhythmia refers to an abnormal heartbeat; it may be too slow, too fast, or irregular. An abnormal heartbeat can affect the normal work of the heart, for example causing it to pump inadequate blood to the body [2].
ECG is a common tool used for diagnosing cardiac arrhythmias by measuring the electrical activity of the heart over time. An ECG record consists of consecutive heartbeats, and each heartbeat is composed of three complex waves: the P, QRS, and T waves. The cardiac activity of the heart can be measured by these complex waves, and their study is vital in arrhythmia diagnosis [3].
The category of the arrhythmia can be identified by determining the classes of some
consecutive heartbeats [4]. Therefore, there is a need to determine the class of each
heartbeat. The process of analyzing an enormous number of ECG records is very
complex and requires much time from the cardiologists. Hence, automating this process
is very crucial in discovering different cardiac disorders.
There are two different approaches for automatic heartbeat classification. The first approach is to extract the features using hand-crafted methods and feed the extracted features to classification algorithms, such as Support Vector Machines (SVM) [5, 6], Feedforward Neural Networks (FNN) [7], Probabilistic Neural Networks (PNN) [8], and General Regression Neural Networks (GRNN) [9]. The second approach is to use deep neural networks, whose structure combines the feature extraction and classification stages into a single learning method without the need for hand-engineered features; examples include Convolutional Neural Networks (CNN) [10, 11], Deep Convolutional Neural Networks (DCNN) [12], Long Short-Term Memory (LSTM) [13], and combinations of CNN and LSTM [14].
Deep learning—in the last several years—has advanced rapidly and its techniques
have shown remarkable success in various fields, including computer vision [15],
bioinformatics [16], and medical diagnosis [17].
In this paper, a two-stage hierarchical approach is proposed to classify 16 classes of the public MIT-BIH arrhythmia dataset: the heartbeats are assigned to one of the five main categories in the first stage, and then the class belonging to that category is determined in the second stage, using Convolutional Neural Networks (CNN), with performance superior to other existing studies.
The rest of the paper is structured as follows: The related work is provided in
section two. The proposed architecture and methodology are described in section three.
The experimental results are presented in section four. Finally, the conclusion and
future work are provided in section five.

2 Related Work

In the literature, many researchers utilized the first approach of ECG heartbeat classification by using various feature extraction methods, including Principal Component Analysis (PCA), Discrete Wavelet Transform (DWT), Independent Component Analysis (ICA), and Higher Order Spectra (HOS). In the classification stage, several classification algorithms have been utilized, including PNN, GRNN, GNN, and SVM.
R.J. Martis et al. [18] used DWT and PCA to extract the features and SVM to classify five different classes from the MIT-BIH arrhythmia dataset; they obtained an overall accuracy of 98.11%. On the other hand, Yazdanian et al. [19] considered the same five classes and achieved an overall accuracy of 96.67% using a combination of wavelet transform features in addition to morphological and time-domain features; these features were fed into an SVM in the classification stage.
S. Sahoo et al. [20] classified four different classes from the MIT-BIH arrhythmia dataset using SVM with DWT in the feature extraction stage; they achieved an overall accuracy of 98.39%. On the other hand, S.N. Yu [21] used ICA to
extract the features and classified eight different classes using neural networks with an overall accuracy of 98.71%.
In [22], the authors proposed a feature combination of ICA and DWT with the usage of PNN for classification between five classes from the MIT-BIH dataset, and an overall accuracy of 99.28% was obtained. In [23], the feature set was composed of a combination of linear and non-linear features, and SVM was used to classify five different classes, obtaining an overall accuracy of 98.91%.
El-Saadawy et al. [24] considered 15 classes of the MIT-BIH arrhythmia dataset. They proposed a hierarchical method based on two stages. DWT and PCA were used to extract the morphological features, which were then concatenated with four RR features. SVM was used in the classification stage and an overall accuracy of 94.94% was obtained.
Deep learning approaches have the capability of learning the most relevant features automatically from the data. Hence, the traditional steps required in the first approach, namely feature extraction, feature reduction, and classification, can be combined into one learning method, which is called end-to-end learning. Recently, several studies have applied deep learning methods to ECG classification.
Zhang [10] proposed a 6-layer CNN model comprising two convolutional layers, followed by two downsampling layers and two fully-connected layers. They considered five classes of the MIT-BIH dataset, and an overall accuracy of 97.50% was obtained. In [11], the authors considered 14 classes and proposed a 1D-CNN consisting of 10 layers. They obtained an overall accuracy of 97.8%.
Acharya et al. [25] classified five different categories of the MIT-BIH arrhythmia dataset using a 9-layer CNN model. To overcome the imbalance problem in the MIT-BIH dataset, they calculated the Z-score of the ECG heartbeats and generated synthetic ones by varying the standard deviation and the mean. They achieved an overall accuracy of 94.03% using the synthetic data, and an overall accuracy of 89.07% when the model was trained only with the original data.
A. M. Shaker et al. [12] provided a generalization method of deep CNN for
classification of ECG heartbeats to 15 different types of arrhythmias from the MIT-BIH
dataset. They solved the imbalance problem by generating synthetic heartbeats using
Generative Adversarial Networks (GANs). After the dataset had been balanced using
GAN, they obtained an overall accuracy above 98.0%, precision above 90.0%,
specificity above 97.4%, and sensitivity above 97.7%.
In this study, the imbalance problem of the MIT-BIH dataset is solved by using the Adaptive Synthetic (ADASYN) sampling technique [26], which generates synthetic heartbeats based on the density distribution of the data. We also propose a two-stage hierarchical approach to avoid the hand-engineered feature extraction methods of the literature. The proposed approach classifies 16 different classes from the MIT-BIH arrhythmia dataset using data only from lead 1.
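As an illustration of this balancing step, the sketch below uses the ADASYN implementation from the imbalanced-learn package; the toy heartbeat array (segmented beats of 300 samples, as described in Sect. 3.1) and the class sizes are assumptions, not the actual data.

```python
import numpy as np
from imblearn.over_sampling import ADASYN

# X: (n_beats, 300) segmented heartbeats, y: integer class labels (toy data here)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (500, 300)),    # majority class
               rng.normal(0.5, 1.0, (40, 300))])     # minority class
y = np.array([0] * 500 + [1] * 40)

# ADASYN generates synthetic minority-class beats according to the local
# density distribution, so harder-to-learn regions receive more samples.
X_balanced, y_balanced = ADASYN(random_state=0).fit_resample(X, y)
print("Before:", np.bincount(y), "After:", np.bincount(y_balanced))
```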
3 Methodology

We discuss in this section the proposed techniques for preprocessing and classification.
A detailed description of each technique will be presented in the following sub-
sections.

3.1 Preprocessing Stage


The first step of this stage is to increase the signal-to-noise ratio by enhancing the quality of the signal. The noise of each ECG record is reduced by removing the undesirable low and high frequencies from the signal using a Butterworth filter with a passband of [0.5–40] Hz. Then, each ECG record of the MIT-BIH dataset is divided dynamically into multiple heartbeats using the positions of the R peaks; each heartbeat should contain the P, QRS, and T waves.
Detecting the beginning and the end of each heartbeat using a fixed segmentation method is not always reliable because such an assumption does not consider heart rate variations. Hence, a dynamic heartbeat segmentation method is utilized to overcome heart rate variability, as proposed in [24]. The dynamic segmentation strategy measures the number of samples before and after each R peak based on the duration between the current and previous R peaks (RR previous) as well as the duration between the current and next R peaks (RR next). Thereafter, the number of samples of the largest interval is divided into a part taken before the R peak and another part taken after the R peak. This method ensures that each heartbeat contains the three main waves in a way that is invariant to heart rate variability. Finally, each heartbeat is resized to 300 samples and the amplitude is normalized to [0–1]. Figure 1 shows the results of the preprocessing stage.

Fig. 1. Segmented and filtered heartbeats after applying the proposed preprocessing: (a) normal heartbeat, (b) premature ventricular contraction heartbeat.
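A minimal Python sketch of this preprocessing is given below. It is only an approximation: the use of SciPy, the filter order, and the exact way the largest RR interval is split around the R peak are our assumptions, and the R-peak positions would come from the dataset annotations described in Sect. 4.1.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 360  # MIT-BIH sampling frequency (Hz)

def bandpass(signal, low=0.5, high=40.0, fs=FS, order=3):
    """Butterworth band-pass filter keeping the [0.5-40] Hz range."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

def segment_beats(signal, r_peaks, out_len=300):
    """Dynamic segmentation around each R peak based on the neighbouring RR intervals."""
    beats = []
    for i in range(1, len(r_peaks) - 1):
        rr_prev = r_peaks[i] - r_peaks[i - 1]
        rr_next = r_peaks[i + 1] - r_peaks[i]
        # split the largest interval around the peak (symmetric split is an assumption)
        half = max(rr_prev, rr_next) // 2
        beat = signal[max(r_peaks[i] - half, 0): r_peaks[i] + half]
        # resample to a fixed length and normalize the amplitude to [0, 1]
        beat = np.interp(np.linspace(0, len(beat) - 1, out_len),
                         np.arange(len(beat)), beat)
        beat = (beat - beat.min()) / (beat.max() - beat.min() + 1e-8)
        beats.append(beat)
    return np.array(beats)

# Toy usage with a synthetic record and fake R-peak annotations
record = np.random.randn(FS * 10)
r_peaks = np.arange(FS, FS * 9, int(0.8 * FS))
beats = segment_beats(bandpass(record), r_peaks)
print(beats.shape)   # (n_beats, 300)
```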
3.2 The Proposed Method for Classification Stage


Based on the ANSI/AAMI EC57:1998 standard, the 16 classes of the MIT-BIH dataset are
mapped into five categories, as shown in Table 1. The proposed method, as shown in
Fig. 2, classifies each heartbeat into one of the five main categories in the first stage and
recognizes the class that falls within this category in the second stage.

Table 1. The five main categories and MIT-BIH classes mapping.


Category MIT-BIH Classes
N NOR, LBBB, RBBB, AE, NE
S APC, AP, BAP, NP
V PVC, VE, VF
Q FPN, PACE, UN
F VFN

Fig. 2. The proposed architecture of the two-stage hierarchical method: stage 1 classifies each heartbeat into one of the five categories (N, S, V, Q, F); stage 2 applies a category-specific CNN to identify the final class (5, 4, 3, 3, and 1 classes, respectively).

There is no need for a classification network in stage 2 for the F category because it
contains only one class. Only the heartbeats that have been correctly classified in
stage 1 are passed to the second stage.
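As an illustration of this routing logic (a hedged sketch, not the authors' implementation), the snippet below assumes `stage1_model` and `stage2_models` are trained classifiers with a Keras-style `predict` interface and that the stage-1 output order matches the category order of Table 1.

```python
import numpy as np

# Category -> MIT-BIH classes mapping taken from Table 1.
CATEGORY_CLASSES = {
    "N": ["NOR", "LBBB", "RBBB", "AE", "NE"],
    "S": ["APC", "AP", "BAP", "NP"],
    "V": ["PVC", "VE", "VF"],
    "Q": ["FPN", "PACE", "UN"],
    "F": ["VFN"],
}

def classify_heartbeat(beat, stage1_model, stage2_models):
    """Stage 1 predicts one of the five categories; stage 2 applies the
    category-specific model to recover the final MIT-BIH class."""
    categories = list(CATEGORY_CLASSES)                     # ["N", "S", "V", "Q", "F"]
    x = beat[None, :, None]                                 # shape (1, 300, 1)
    category = categories[int(np.argmax(stage1_model.predict(x)))]
    classes = CATEGORY_CLASSES[category]
    if len(classes) == 1:                                   # F category: single class
        return classes[0]
    return classes[int(np.argmax(stage2_models[category].predict(x)))]
```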
The proposed CNN model is inspired by the VGG network [27], with modifications
because VGG is very deep and designed for large-scale images rather than 1D signals.
The first two layers of the proposed network are 1D convolutional layers with 64 filters
and a kernel size of three, followed by one max-pooling layer with a pool size of two,
followed by another two 1D convolutional layers with 128 filters and a kernel size of
five, followed by a max-pooling layer with a pool size of two, followed by three 1D
convolutional layers with 256 filters and a kernel size of five. After that, two fully
connected layers with 128 and 64 neurons, respectively, are added. Finally, the output
layer contains N neurons, where N is the number of classes of the corresponding
category. The proposed model is shown in Fig. 3.

Fig. 3. The proposed CNN model for each category: two convolutional layers followed by a pooling layer, two convolutional layers followed by a pooling layer, three convolutional layers followed by a pooling layer, and the fully connected layers.
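A hedged Keras sketch of a model matching this description is shown below; the ReLU activations, "same" padding, and the loss function are assumptions not stated in the text, while the Adam optimizer is the one mentioned in Sect. 4.2.

```python
import tensorflow as tf

def build_cnn(num_classes, input_len=300):
    """VGG-style 1D CNN: 2x Conv(64, k=3) -> pool, 2x Conv(128, k=5) -> pool,
    3x Conv(256, k=5), then FC(128) -> FC(64) -> softmax over num_classes."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_len, 1)),
        tf.keras.layers.Conv1D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.Conv1D(64, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(128, 5, activation="relu", padding="same"),
        tf.keras.layers.Conv1D(128, 5, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(256, 5, activation="relu", padding="same"),
        tf.keras.layers.Conv1D(256, 5, activation="relu", padding="same"),
        tf.keras.layers.Conv1D(256, 5, activation="relu", padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```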

4 Experimental Results

In this section, we describe the utilized dataset and demonstrate how the data is divided
into training and testing sets. Also, the achieved results and a comparison with the
existing studies are provided.

4.1 Dataset Description


The MIT-BIH arrhythmia dataset [28] is the most utilized dataset in the literature. It
contains 48 ECG records from patients of different ages and genders; each record is
30 min long with a sampling frequency of 360 Hz. Each record is accompanied by the
beat annotations and the locations of the R peaks, which are used as the ground truth
for the training and testing stages. In this study, only ECG records from lead 1 are
considered.
The beats of the records are divided into training and testing sets, following the data
division in [24] for the sake of comparison. The training-to-testing ratio is not the same
for all classes because the number of heartbeats is not equally distributed among them.
The training set of the normal class, which is the dominant class in the dataset, consists
of 13% of the total normal heartbeats, whereas a training percentage of 40% is used for
classes that have a lower number of beats. A training percentage of 50% is used for
classes that have a very limited number of heartbeats. The division of the heartbeats is
shown in Table 2.

Table 2. Training ratio for each class utilized in this study.


Heartbeat type | Total beats | Training ratio | Training beats
Normal beat (N) 75017 13% 9753
Left Bundle Branch block (LBBB) 8072 40% 3229
Right Bundle Branch block (RBBB) 7255 40% 2902
Atrial Premature Contraction (APC) 2546 40% 1019
Premature Ventricular Contraction (PVC) 7129 40% 2852
Paced (PACE) 7025 40% 2810
Aberrated Atrial Premature (AP) 150 50% 75
Ventricular Flutter Wave (VF) 472 50% 236
Fusion of Ventricular and Normal (VFN) 802 50% 401
Blocked Atrial Premature (BAP) 193 50% 97
Nodal (junctional) Escape (NE) 229 50% 115
Fusion of Paced and Normal (FPN) 982 50% 491
Ventricular Escape (VE) 106 50% 53
Nodal (junctional) Premature (NP) 83 50% 42
Atrial Escape (AE) 16 50% 8
Unclassifiable (UN) 15 50% 7
16 Classes 110092 21.88% 24090

4.2 Results
During the preprocessing stage, the records of the MIT-BIH dataset are segmented into
separate heartbeats. The training set of 24090 heartbeats is selected randomly based on
the data division in Table 2, and the remaining 86002 heartbeats are used as the testing
set. The Adam optimizer [29] is utilized to train the proposed network, and the weights
of the network are initialized with a standard normal distribution.
In this study, only data from lead 1 of the MIT-BIH arrhythmia dataset is utilized,
and 16 classes are considered. The data augmentation using ADASYN is done across
the two stages: in the first stage, the number of samples for the classes of categories S,
V, F, and Q is increased to match the number of heartbeats in category N. In the second
stage, the number of heartbeats for each category is balanced separately based on the
dominant class in that category.
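Purely for illustration, the ADASYN step could be realized with the imbalanced-learn library (an assumption; the paper does not name its implementation):

```python
import numpy as np
from imblearn.over_sampling import ADASYN

# Toy stand-in data: 300-sample heartbeats with an imbalanced label distribution.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1100, 300))
y_train = np.array([0] * 1000 + [1] * 100)     # majority vs. minority class

# ADASYN synthesizes minority-class samples according to the local density
# distribution of the data, raising minority counts toward the majority class.
X_balanced, y_balanced = ADASYN(random_state=0).fit_resample(X_train, y_train)
print(np.bincount(y_balanced))                 # roughly balanced class counts
```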
The performance is evaluated by measuring the average accuracy for each class and
the overall accuracy across the two stages. The achieved overall accuracy in the first
stage is 98.2%, whereas the overall accuracy across the two stages is 97.3%. The
achieved average accuracy for each class per category is shown in Table 3.

Table 3. Average accuracy for each class per category.


Class Average accuracy
Normal beat (N) 99.49%
Left Bundle Branch block (LBBB) 99.79%
Right Bundle Branch block (RBBB) 99.77%
Atrial Premature Contraction (APC) 99.71%
Premature Ventricular Contraction (PVC) 99.52%
Paced (PACE) 99.90%
Aberrated Atrial Premature (AP) 85.71%
Ventricular Flutter Wave (VF) 95.57%
Fusion of Ventricular and Normal (VFN) 78.30%
Blocked Atrial Premature (BAP) 97.62%
Nodal (junctional) Escape (NE) 80.00%
Fusion of Paced and Normal (FPN) 99.57%
Ventricular Escape (VE) 96.00%
Nodal (junctional) Premature (NP) 97.22%
Atrial Escape (AE) 100.00%
Unclassifiable (UN) 33.33%
16 Classes 91.32%

The comparison between the presented work and other recent studies is given in
Table 4. It demonstrates that a larger number of classes is considered and that the
overall accuracy has been improved compared to the published results.

Table 4. Comparison of this work with other studies.


Study | #of classes | Feature set | Classifier | Overall accuracy
Martis et al. [18] 5 PCA SVM 98.11%
Yazdanian et al. [19] 5 Wavelet SVM 96.67%
Sahoo et al. [20] 4 DWT SVM 98.39%
Yu et al. [21] 8 ICA NN 98.71%
Zhang et al. [10] 5 End-to-end 1D-CNN 97.50%
Acharya et al. [25] 5 End-to-end 1D-CNN 94.03%
El-Saadawy et al. [24] 15 DWT SVM 94.94%
Proposed method 16 Two-stage hierarchical method 1D-CNN 97.30%

5 Conclusion and Future Work

In this paper, a two-stage hierarchical method has been proposed to classify 16 classes
of the public MIT-BIH arrhythmia dataset. A dynamic heartbeat segmentation method is
used in the preprocessing stage to overcome the variability of the heart rate. The
imbalance problem of the MIT-BIH dataset is solved by using an oversampling
technique (ADASYN) to restore the balance of the dataset. An overall accuracy of
97.30% across the two stages and an average accuracy of 91.32% are achieved, which
surpasses other existing studies while considering more classes (16 classes).
Further research will be done to utilize the ECG records of the two leads. Also, we aim
to deploy the proposed model in real-time monitoring systems.

References
1. World Health Organization. Cardiovascular diseases (CVDs) (2017). http://www.who.int/
mediacentre/factsheets/fs317/en/
2. American Heart Association Arrhythmia (2017). https://www.heart.org/en/health-topics/
consumer-healthcare/what-is-cardiovascular-disease
3. Artis, S.G., Mark, R.G., Moody, G.B.: Detection of atrial fibrillation using artificial neural
networks. In: Proceedings of the Computers in Cardiology, Venice, Italy, 23–26 September
1991, pp. 173–176. IEEE, Piscataway (1991)
4. Kastor, J.A.: Arrhythmias, 2nd edn. W.B. Saunders, London (1994)
5. Moody, G.B., Mark, R.G.: The impact of the MIT-BIH arrhythmia database. IEEE Eng.
Med. Biol. Mag. 20(3), 45–50 (2001)
6. El-Saadawy, H., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Electrocardiogram (ECG) clas-
sification based on dynamic beats segmentation. In: Proceedings of the 10th International
Conference on Informatics and Systems - INFOS’16 (2016). https://doi.org/10.1145/
2908446.2908452

7. Perez, R.R., Marques, A., Mohammadi, F.: The application of supervised learning through
feed-forward neural networks for ECG signal classification. In: Proceedings of the IEEE
Canadian Conference on Electrical and Computer Engineering (CCECE), Vancouver, BC,
Canada, 15–18 May 2016, pp. 1–4. IEEE, Piscataway (2016)
8. Zebardast, B., Ghaffari, A., Masdari, M.: A new generalized regression artificial neural
networks approach for diagnosing heart disease. Int. J. Innov. Appl. Stud. 4, 679 (2013)
9. Alqudah, A.M., Albadarneh, A., Abu-Qasmieh, I., Alquran, H.: Developing of robust and
high accurate ECG beat classification by combining gaussian mixtures and wavelets features.
Australas. Phys. Eng. Sci. Med. 42(1), 149–157 (2019)
10. Li, D., Zhang, J., Zhang, Q., Wei, X.: Classification of ECG signals based on 1D
convolution neural network. In: 2017 IEEE 19th International Conference on e-Health
Networking, Applications and Services (Healthcom) (2017). https://doi.org/10.1109/
healthcom.2017.8210784
11. Shaker, A.M., Tantawi, M., Shedeed, H.A., Tolba M.F.: Heartbeat classification using 1D
convolutional neural networks. In: Hassanien, A., Shaalan, K., Tolba, M. (eds) Proceedings
of the International Conference on Advanced Intelligent Systems and Informatics 2019. AISI
2019. Advances in Intelligent Systems and Computing, vol. 1058. Springer, Cham (2020)
12. Shaker, A.M., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Generalization of convolutional
neural networks for ECG classification using generative adversarial networks. IEEE Access
8, 35592–35605 (2020)
13. Yildirim, Ö.: A novel wavelet sequence based on deep bidirectional LSTM network model
for ECG signal classification. Comput. Biol. Med. 96, 189–202 (2018). https://doi.org/10.
1016/j.compbiomed.2018.03.016
14. Shaker, A.M., Tantawi, M., Shedeed, H.A., Tolba M.F.: Combination of convolutional and
recurrent neural networks for heartbeat classification. In: Hassanien, AE., Azar, A., Gaber,
T., Oliva, D., Tolba, F. (eds) Proceedings of the International Conference on Artificial
Intelligence and Computer Vision (AICV2020). AICV 2020. Advances in Intelligent
Systems and Computing, vol. 1153. Springer, Cham (2020)
15. Voulodimos, A., Doulamis, N., Doulamis, A., Protopapadakis, E.: Deep learning for
computer vision: a brief review. Comput. Intell. Neurosci. 2018, 1–13 (2018). https://doi.
org/10.1155/2018/7068349
16. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18, 851–869
(2017)
17. Bakator, M., Radosav, D.: Deep learning and medical diagnosis: a review of literature.
Multimodal Technol. Interact. 2, 47 (2018). https://doi.org/10.3390/mti2030047
18. Martis, R.J., Acharya, U.R., Mandana, K., Ray, A.K., Chakraborty, C.: Application of
principal component analysis to ECG signals for automated diagnosis of cardiac health.
Expert Syst. Appl. 39, 11792–11800 (2012)
19. Yazdanian, H., Nomani, A., Yazdchi, M.R.: Autonomous detection of heartbeats and
categorizing them by using support vector machines. IEEE (2013)
20. Sahoo, S., Kanungo, B., Behera, S., Sabut, S.: Multiresolution wavelet transform based
feature extraction and ECG classification to detect cardiac abnormalities. Measurement 108,
55–66 (2017)
21. Yu, S.N., Chou, K.T.: Integration of independent component analysis and neural networks
for ECG beat classification. Expert Syst. Appl. 34, 2841–2846 (2008)
22. Martis, R.J., Acharya, U.R., Min, L.C.: ECG beat classification using PCA, LDA, ICA and
discrete wavelet transform. Biomed. Sign. Process Contr. 8, 437–448 (2013)
23. Elhaj, F.A., Salim, N., Harris, A.R., Swee, T.T., Ahmed, T.: Arrhythmia recognition and
classification using combined linear and nonlinear features of ECG signals. Comput. Meth.
Progr. Biomed. 127, 52–63 (2016)

24. El-Saadawy, H., Tantawi, M., Shedeed, H.A., Tolba, M.F.: Hybrid hierarchical method for
electrocardiogram heartbeat classification. IET Sig. Process. 12(4), 506–513 (2018). https://
doi.org/10.1049/iet-spr.2017.0108
25. Acharya, U.R., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M., Gertych, A., San, T.R.: A deep
convolutional neural network model to classify heartbeats. Comput. Biol. Med. (2017).
https://doi.org/10.1016/j.compbiomed.2017.08.022
26. He, H., Bai, Y., Garcia, E.A., Li, S.: ADASYN: adaptive synthetic sampling approach for
imbalanced learning, In: IEEE International Joint Conference on Neural Networks (IEEE
World Congress on Computational Intelligence), pp. 1322–1328 (2008)
27. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. In: ICLR (2015)
28. MIT-BIH Arrhythmias Database. http://www.physionet.org/physiobank/database/mitdb/.
Accessed 3 Apr 2020
29. Kingma, D.P., Jimmy, B.: Adam: a method for stochastic optimization. CoRR,
abs/1412.6980 (2014)
Study of Region Convolutional Neural
Network Deep Learning for Fire Accident
Detection

Ntawiheba Jean d’Amour1,2, Kuo-Chi Chang1,2,6(&), Pei-Qiang Li1,


Yu-Wen Zhou1,2, Hsiao-Chuan Wang3, Yuh-Chung Lin1,2,
Kai-Chun Chu4, and Tsui-Lien Hsu5
1
School of Information Science and Engineering,
Fuzhou University, Fujian University of Technology, No. 33 Xuefu South Road,
New District, Fuzhou 350118, Fujian, China
albertchangxuite@gmail.com
2
Fujian Provincial Key Laboratory of Big Data Mining and Applications,
Fujian University of Technology, Fuzhou, China
3
Institute of Environmental Engineering, National Taiwan University,
Taipei, Taiwan
4
Department of Business Management, Fujian University of Technology,
Fuzhou, China
5
Institute of Construction Engineering and Management,
National Central University, Taoyuan, Taiwan
6
College of Mechanical & Electrical Engineering,
National Taipei University of Technology, Taipei, Taiwan

Abstract. Fire accidents are disasters that take human lives and destroy infrastructure,
owing to their violence or to delays in rescue. Object detection has become a popular
topic in recent years and can play a robust role in detecting fire and providing
information about such disasters more efficiently. This study presents fire detection
based on a region convolutional neural network. Images of different objects on fire are
trained using ground-truth labeling. After labeling the images and determining the
regions of interest (ROI), features are extracted from the training data, and the detector
is trained to work on each image of fire. To validate the effectiveness of this system,
the algorithm is demonstrated on images taken from our dataset.

Keywords: Fire accident detection · Convolutional neural network (CNN) · Region convolutional neural network (R-CNN) · Region of interest · Image processing

1 Introduction

The latest world fire statistics report, published in 2019 and covering 34 countries
and 35 cities, showed that in 2017 there were 49 million calls to fire safety centers,
3.1 million fires, 16.8 thousand civilian deaths, and 47.9 thousand civilian fire injuries,
as shown in Table 1. The information about the fire is mainly provided by sensors,


which are suitable for indoor situations such as houses, government buildings, and
industries [1, 2].

Table 1. Summary of 10 countries with many fires (CTIF Report No. 24, 2019)


Country Calls Fires Fire death
USA 34683500 1349500 3 400
Russia 146544 N/A 132844
Vietnam 93 000 N/A 4 197
France 66 628 4658600 306 600
Great Britain 63786 687413 199894
Italy 61000 1000071 325 941
Spain 46570 335317 130915
Argentina 44 536 166 952 55 265
Ukraine 42 486 229 313 84 083
Poland 38 413 519902 125892

Nowadays, technology is developing very fast and can be used as a tool in many
activities for the good not only of human beings but of the entire environment in
general. Deep learning is one of the trending technologies, in which a system can work
like the human neural system. CNN is a type of artificial neural network whose roots in
neuroscience date back to the proposal of the first artificial neuron in 1943 [3]. It has
achieved great success in computer vision tasks and shares many properties with the
visual system of the human brain [4, 5]. CNN is a feed-forward architecture that
introduces non-linearity and strong feature extraction, which plays a robust role in
classification [6]. In recent years, researchers have attempted to tackle object detection,
and one technique that has achieved great success is the region-based CNN (R-CNN),
an extension of CNN for object detection tasks [7]. R-CNN is defined as a visual object
detection system that combines bottom-up region proposals with features computed by
a convolutional neural network. R-CNN uses a selective search method in which
region proposals are taken from an external network [8, 9].
This study presents the use of R-CNN object detection technology for detecting
different kinds of fire. The images are labeled and trained using the Ground Truth
Labeler in the MATLAB deep learning toolbox. The paper is organized as follows: the
first section is the introduction, the second covers related works, the third describes the
implementation, and the last presents the conclusion and future works.

2 Related Works
2.1 Fire Accident Detection
Recently, many studies have demonstrated strong results using new technologies. An
ordinary camera on the scene can detect fire from real-time video data, where flames
and fire flicker are detected by analyzing the video in the wavelet domain. Video-based
Fire Detection (VFD) uses surveillance cameras and computer vision; a camera placed
on a hilltop can cover an area of 100 km2, which is well suited for wildfire detection
and can give accurate information about the fire [10, 11]. A cost-effective fire detection
method using CNN for surveillance videos has been proposed, inspired by the
GoogleNet architecture and refined with a special focus on computational complexity
and detection accuracy [12, 13]. To detect wildfire without barriers, unmanned aerial
vehicles have been proposed, using deep learning networks to achieve high accuracy
over a wide range of aerial photographs [14, 15]. On the other hand, wireless sensor
networks (WSN) are the most widely used; in such systems a gateway and coordinating
nodes are in contact and exchange data. They are used indoors for house fire detection
and outdoors mostly for forest fire detection. Sensors detect the smoke and temperature
of the place where they are installed; if the smoke or temperature reaches or exceeds
the set level, they send information to the central node, and the central node forwards it
to the base station. The Global Positioning System (GPS) and positioning techniques
can be used to find the location and information about the place on fire. We believe
that WSN will face the challenges of limited memory, limited computing, and limited
power consumption in the future [16, 17]. Moreover, fire detection with IoT has been
introduced, where a neural network developed from scratch is trained on a dataset
compiled from multiple sources for real-time fire detection, and the model is then
tested on a real-world fire dataset [18].

2.2 Region Convolution Neural Network Object Detection


Region proposals and feature extraction include two parts, discussed below.
The first is to generate the region proposals that define the set of candidates for the
detector. From an input image, R-CNN computes about 2000 bottom-up proposals.
The ROI is the part of the image that receives more attention than other parts and
contains the information needed for the detection [19].
The second is feature extraction: reducing the amount of data without losing
important or relevant information, which speeds up learning. From each region
proposal obtained by selective search, R-CNN produces a 4096-dimensional feature
vector. The CNN-compatible architecture requires an input image of 227 × 227 pixels,
which is also needed to compute the features for a region proposal. The image is then
processed through five convolutional and two fully connected layers, and the features
extracted from the image are fed to a support vector machine (SVM) to classify the
presence of the object within that candidate region proposal. Localizing objects with a
deep network and training a high-quality model with only a small quantity of annotated
data have shown that a CNN can lead to high object detection performance
(Fig. 1) [20].

Fig. 1. Result of R-CNN: region with CNN features.
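Purely as an illustration of this pipeline (the paper's own implementation uses the MATLAB deep learning toolbox, see Sect. 3), a hedged Python sketch is given below; `cnn` is assumed to be a Keras-style feature extractor, and the proposal step uses OpenCV's contrib selective-search module.

```python
import numpy as np
import cv2                                  # requires opencv-contrib-python
from sklearn.svm import LinearSVC

def propose_regions(image, max_regions=2000):
    """Bottom-up region proposals via selective search, as in R-CNN."""
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()
    return ss.process()[:max_regions]       # each row is (x, y, w, h)

def extract_features(image, boxes, cnn, size=(227, 227)):
    """Warp each proposal to the CNN input size and use the CNN output
    (e.g., a 4096-d fully connected activation) as its feature vector."""
    crops = [cv2.resize(image[y:y + h, x:x + w], size) for (x, y, w, h) in boxes]
    return cnn.predict(np.stack(crops))

# Training sketch: features of labeled fire / non-fire proposals feed a linear
# SVM that scores each candidate region.
# svm = LinearSVC().fit(train_features, train_labels)
```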

3 System Implementation and Verification

3.1 Labeling of Selected Images


This study is implemented using the MATLAB deep learning toolbox. The dataset
consists of different images of fire downloaded from Google and contains 50 images
for each category: cars, forests, and houses. From this dataset, 40 images per category
are used for training and 10 for testing. Each image to be trained is labeled using the
Ground Truth Labeler app in the MATLAB deep learning toolbox. The ROI is labeled
with a rectangle for training the object detector, as shown in Fig. 2. The ground-truth
label data facilitate creating the box label datastore that is used for training the detector.
It is composed of four coordinates; see Table 2.

Fig. 2. ROI ground truth labeling.

3.2 Object Detector Training


After labeling every image, the training options use the sigmoid function (Sgmd) as the
nonlinear activation function. It is chosen because its output lies between 0 and 1,
which suits models where we want to predict a probability as an output (Fig. 3), since
the probability of anything lies between 0 and 1. The sigmoid function is also
differentiable, which means the slope of the curve can be found at any point; when
updating the curve, the direction and the amount of the update depend on that slope.
That is why differentiation is used in almost every part of machine learning and deep
learning.

Table 2. Box label datastore (Blds).

Fig. 3. Sigmoid function graph.
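For reference, the standard definition of the sigmoid and its derivative (textbook identities, not taken from the paper) makes the differentiability argument explicit: $\sigma(x) = \frac{1}{1 + e^{-x}} \in (0, 1)$ and $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, so the slope of the curve at any point can be computed directly from the function value itself.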
The object detector is trained using the combination of the image datastore (Imds) and
the box label datastore (Blds). The bounding boxes play the biggest role in reducing the
search range for features extracted from the fire images, which conserves computing
resources (time, processor, and memory) and reduces errors. Figure 4 presents the flow
chart of object detector training.

3.3 Results and Discussion


This fire detector has been trained on a single CPU with 8 GB RAM for 25 epochs.
Training completed successfully with a score of 0.995.

Fig. 4. Object detector training flow chart.

For the results, 10 images each of houses, cars, and forests are used for verification.
The aim of this work is to present a method that is more efficient and more reliable for
fire detection; therefore, a test dataset of fire images found on Google is also used.
After the whole process, our model has been compared with those of previous research
(Table 3). However, there are some failures because R-CNN uses selective search to
find the region proposals; in future work we will use Faster R-CNN, which is faster and
more accurate than R-CNN.

Table 3. Comparison of the proposed technique with previous research


Technique Precision Recall Accuracy (%)
Proposed method 0.83 0.97 95.6
Khan Muhammad et al. [10] 0.8 0.93 94.43
Arpit Jadon et al. [18] 0.97 0.94 93.91
K. Muhammad et al. [19] 0.82 0.98 94.3

From the images in Fig. 5, the test images demonstrate the performance of the model
for fire detection. For the car images the percentage is 99.9%; for the house images,
99.9% and 92.3%; and for the forest images, 97.3% and 99.9%, which is the range for
most of the images used for testing.

Fig. 5. Fire detection results: column one shows results of forest, column two shows car and
column three shows house.

4 Conclusion and Future Works

This study built its own object detector, implemented from images of fire on different
objects, especially houses, cars, and forests, and designed and implemented a fire
detection system. International fire statistics reports show a huge number of fires in
different countries, accompanied by deaths and injuries. Fire detection can be a solution
as a fast provider of information to firefighters. It can play a robust role in reducing the
number of invalid and redundant calls and in providing full information about an
incident. In future work, the project will be implemented with Faster R-CNN to reduce
the failures and to classify the object on fire, and the detection will be linked to the
network so that, whenever there is a detection, the system provides full information to
the firefighters, such as the location, the object on fire, and the fire intensity.

References
1. Brushlinsky, N.N., Aherens, M., Skolov, S.V., Wagner, P.: World fire statistics. Russia,
International Association of Fire and Rescue Service (CTIF) (2019)
2. Chih-Cheng, L., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using
inherently safer design strategies (III) advanced thin film process and reduction of power
consumption control. J. Loss Prev. Process Ind. 43, 280–291 (2015)
3. McCulloch, W., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull.
Math. Biophys. 5, 115–133 (1943)
4. Liang, M., Hu, X.: The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), pp. 3367–3375 (2015)

5. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G
base station based on internet of things collaborative control. IEEE Access 8, 32935–32946
(2020)
6. Symeonidis, G.: Recurrent attention for deep neural object detection. Springer (2019)
7. Kriszhevsky, A., Sutkever, I., Hinton, G.E.: Image classification with deep convolution
neural network. In: Advances in Neural Information Processing System (NIPS), pp. 1097–
1105 (2012)
8. Uğur Töreyin, B., Dedeoğlu, Y., Güdükbay, U., Enis Çetin, A.: Computer vision based
method for real-time fire and flame detection, pp. 49–57. Elsevier (2006)
9. Enis Çetin, A., Dimitropoulos, K., Gouverneur, B., Grammalidis, N., Günay, O., Hakan
Habiboǧlu, Y., Uǧur Töreyin, B., Verstockt, S.: Video fire detection. Rev. Digit. Signal
Process. 23(6), 1827–1843 (2013). https://doi.org/10.1016/j.dsp.2013.07.003. ISSN 1051-
2004
10. Muhammad, K., Ahmad, J., Mehmood, I., Rho, S., Baik, S.W.: Convolutional neural
networks based fire detection in surveillance videos. IEEE Access 6, 18174–18183 (2018)
11. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for
wireless sensor networks based on an improved ant colony algorithm. J. IEEE Access 7,
105562–105571 (2019)
12. Uğur Töreyin, B., Dedeoğlu, Y., Güdükbay, U., Enis Çetin, A.: Computer vision based
method for real-time fire and flame detection (2006)
13. Chang, K.-C., Chu, K.-C., Wang, H.-C., Lin, Y.-C., Pan, J.-S.: Agent-based middleware
framework using distributed CPS for improving resource utilization in smart city. Future
Gener. Comput. Syst. 108, 445–453 (2020). https://doi.org/10.1016/j.future.2020.03.006.
ISSN 0167-739X
14. Lee, W., Kim, S., Lee, Y.-T., Lee, H.-W., Choi, M.: Deep neural networks for wild fire
detection with unmanned aerial vehicle. In: 2017 IEEE International Conference on
Consumer Electronics (ICCE), Las Vegas, NV, pp. 252–253 (2017)
15. Dener, M., Özkök, Y., Bostancıoğlu, C.: Fire detection systems in wireless sensor networks.
Procedia – Soc. Behav. Sci. 195, 1846–1850 (2015). https://doi.org/10.1016/j.sbspro.2015.
06.408. ISSN 1877-0428
16. Girshick, R., Donahue, J., Darrell, T., Malik, J.: The IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pp. 580–587 (2014)
17. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Region-based convolutional networks for
accurate object detection and segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 38(1),
142–158 (2016)
18. Jadon, A., Omama, M., Varshney, A., Ansari, M.S., Sharma, R.: FireNet: a specialized
lightweight fire & smoke detection model for real-time IoT applications. In: EEE (2019)
19. Muhammad, K., Ahmad, J., Baik, S.W.: Early fire detection using convolutional neural
networks during surveillance for effective disaster management. Neurocomputing 288, 30–
42 (2018)
20. Weinzaepfel, P., Csurka, G., Cabon, Y., Humenberger, M.: Visual localization by learning
objects-of-interest dense match regression. In: The IEEE Conference on Computer Vision
and Pattern Recognition (CVPR), pp. 15634–5643 (2019)
Document and Sentiment Analysis
Norm-Referenced Achievement Grading:
Methods and Comparison

Thepparit Banditwattanawong1(&) and Masawee Masdisornchote2


1
Department of Computer Science, Kasetsart University, Bangkok, Thailand
thepparit.b@ku.th
2
School of Information Technology, Sripatum University, Bangkok, Thailand
masawee.ma@spu.ac.th

Abstract. Grading informs learners and instructors of both current learning
ability levels and necessary improvement. For norm-referenced grading, instructors
conventionally use a statistical method. This paper proposes an algorithm for
norm-referenced grading. Moreover, the rise of artificial intelligence makes us curious
how efficient a machine learning technique is at norm-referenced grading. We therefore
compare the statistical method and our algorithm with the machine learning method.
The experiment relies on data sets of both normal and skewed distributions. The
comparative evaluation reveals that our algorithm and the machine learning method
yield similar grading results in several cases. Overall, the algorithm, the machine
learning method, and the statistical method produce the best, moderate, and lowest
grading qualities, respectively.

Keywords: K-means · Z score · Clustering · Nonbinary grading · T score

1 Introduction

There are basically two types of nonbinary grading systems [1]: criterion-referenced
grading and norm-referenced grading. The former normally calculates the percentage of
a learning score and maps it to a predefined percentage range to determine a grade. This
grading system is suitable for an examination that covers all topics of learning and thus
requires long exam-taking as well as answer-checking times. In contrast, large classes
and/or large courses widely use the norm-referenced grading system to meet exam-
taking time constraints and to save exam-answer checking resources. The system
compares the score of each individual to a relative criterion defined based on all
individuals' scores to decide a grade. The criterion is set by conventional statistical
means, either with or without conditions (e.g., a class's grade point average (GPA) must
be kept below 3.25). This paper focuses on unconditional norm-referenced grading.
We separately implement such grading by three attractive means: our proposed
algorithm, a conventional statistical method, and an unsupervised machine learning
technique, namely K-means. K-means is a popular clustering algorithm that is easy to
apply to grading. The grading results of each approach are measured and compared
to one another based on practical data sets of various distribution characteristics.


The main contributions of this paper are a simple and efficient grading algorithm
and a novel insight into the comparative performance of the statistical and machine
learning methods and our algorithm in unconditional norm-referenced grading. To
the best of our knowledge, we also demonstrate for the first time the applicability of
the K-means clustering technique to norm-referenced grading. The merit of this paper
is to help graders worldwide select the right grading method to meet their objectives.

2 Related Work

As for applying a machine learning clustering technique to learners’ achievement, [2]


analyzed the performance of students by using k-means to cluster the 10-subject marks
of 118 students. The centroid of each cluster was mapped to a grade of total 7 grades
ranging from A to G. The resulting grade of each cluster was the performance indicator
of students in the cluster. Academic planners could use such an indicator to take
appropriate action to remedy the students. Similarly, [3] clustered previous GPAs and
internal class assessments (e.g., class test marks, lab performance, assignment, quiz,
and attendance) separately by using K-means. Therefore, each student’s performance
was associated with several clusters, which were used to create a set of rules for
classifying the student's final grade. In this way, any weak students were identified
before the final exam to reduce the ratio of failing students. [4] employed K-means to create 9
groups of GPAs: exceptional, excellent, superior, very good, above average, good, high
pass, pass, and fail. Students whose GPAs belonged to the exceptional and the fail
groups were called gifted and dunce, respectively. The gifted students were enhanced
of their knowledge whereas the dunce students were remedied through differentiated
instruction. [5] clustered students from different countries based on their attributes:
average grade, the number of participated events, the number of active days, and the
number of attended chapters. They determined an optimal k value of K-means by
means of Silhouette index resulting in k = 3. Among the 3 clusters, the most compact
cluster (i.e., a cluster with the least value of within cluster sum of square) was further
analyzed for correlation between the average grade and the other attributes. [6] utilized
K-means to cluster 190 students' test scores into 4 classes (excellent, good, moderate,
and underachiever) to choose the appropriate self-development and teaching strategy of
treatment. [7] explored several machine learning techniques for early grade prediction
to allow instructors to improve students’ performance in early stages. Restricted
Boltzmann Machine was found to be most accurate for students’ grade prediction. K-
means was also used to cluster students based on technical course and nontechnical
course performance.
Regarding an automated grading and scoring approach, [8] proposed a peer grading
method to enable student evaluation at scale by having students assess each other.
Since students are not trained in grading, the method enlisted probabilistic models and
ordinal peer feedback to solve a rank aggregation problem. [9] proposed a method to
automatically construct grade membership functions, lenient-type grades, strict-type
grades, and normal-type grades, to perform fuzzy reasoning to infer students’ scores.

3 Grading Algorithm

In this section, we propose an algorithm below for norm-referenced unconditional


grading that improves our previous heuristic in [10].

Algorithm 1. Proposed Algorithm.

The algorithm is explained as follows. Line 1 ranks the scores of the learners within a
group from best to worst. Lines 2 and 3 obtain the maximum and minimum scores
from the rank to decide the best and the worst grades to be assigned to the learners. For
example, the best learner in the group might not perform well enough to deserve an A,
so nobody receives an A. Line 4 counts the number of eligible grades to be assigned.
Once the eligible grades are determined, line 5 sequentially goes through the rank to
calculate the gaps between every pair of contiguous scores. Line 6 sorts the gaps in
descending order. Line 7 selects the maximum gaps, which are used on line 8 to define
score ranges matching the number of eligible grades. For instance, four eligible grades
require four score ranges to be defined, so selectWidestGaps() returns the first three
maximum gaps. Finally, line 9 assigns grades based on the ranges. In this way, the
algorithm is simple and straightforward. Its performance will be proved in Sect. 6.
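A minimal Python sketch of this gap-based procedure is given below. It assumes the set of eligible grades has already been decided by the grader (lines 2-4 of the algorithm), that there are at least as many distinct scores as eligible grades, and that learners with equal scores receive the same grade; the tie-breaking among equally wide gaps is an illustrative choice.

```python
def grade_by_widest_gaps(scores, eligible_grades=("A", "B", "C", "D", "F")):
    """Rank the scores, locate the widest gaps between contiguous ranked scores,
    and use them as grade boundaries (one fewer boundary than eligible grades)."""
    ranked = sorted(set(scores), reverse=True)            # line 1: rank the scores
    n_grades = len(eligible_grades)                       # line 4: eligible grades
    gaps = [(ranked[i] - ranked[i + 1], i)                # line 5: contiguous gaps
            for i in range(len(ranked) - 1)]
    widest = sorted(gaps, reverse=True)[:n_grades - 1]    # lines 6-7: widest gaps
    cut_indices = sorted(i for _, i in widest)
    bounds = [ranked[i + 1] for i in cut_indices]         # line 8: top score of each lower band

    def assign(score):                                    # line 9: map score to grade
        for grade, bound in zip(eligible_grades, bounds):
            if score > bound:
                return grade
        return eligible_grades[-1]

    return {s: assign(s) for s in scores}

# Example: the widest gaps of this toy class separate the A, B, C, D, and F bands.
print(grade_by_widest_gaps([88, 86, 84, 70, 69, 55, 54, 40, 39, 20]))
```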

4 Statistical Grading

A conventional statistical grading method relies on z scores and t scores [1]. The z score
is a measure of how many standard deviations below or above the population mean a
raw score is. The z score (z) is technically defined in (1) as the signed fractional number
of standard deviations σ by which the value of an observation or data point x lies above
the mean value μ of what is being observed or measured.

$z = \frac{x - \mu}{\sigma}$  (1)
Observed values above the mean have positive z scores, while values below the
mean have negative z scores.
T score converts individual scores into standard forms and is much like z score
when a sample size is above 30. In psychometrics, T score (t) is a z score shifted and
scaled to have a mean of 50 and a standard deviation of 10 as in (2).

$t = 10z + 50$  (2)

The statistical grading method begins by converting raw scores to z scores. The z
scores are further converted to t scores to simplify interpretation because t scores
normally range from 0 to 100 unlike z scores that can be negative real numbers. The t
scores are then sorted and a range between maximum and minimum t scores is divided
by the desired number of grades to obtain an identical score interval. The interval is
used to define the t-score ranges of all grades. In this way, we can map raw scores to z
scores, the z scores to t scores, the t scores to t-score intervals, and the t-score intervals
to resulting grades, respectively. One advantage when using z score is the skipping of
some grades if no score falls in the corresponding t-score intervals of such grades.
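As an illustration (assuming NumPy and treating the class as the population when computing the standard deviation), the statistical method can be sketched as follows; how scores that fall exactly on an interval boundary are assigned is an assumption.

```python
import numpy as np

def grade_by_t_score(scores, grades=("A", "B", "C", "D", "F")):
    """Raw scores -> z scores -> t scores, then the [min t, max t] range is split
    into equal intervals, one per grade, and each t score is mapped to its interval."""
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std()                       # Eq. (1)
    t = 10 * z + 50                                    # Eq. (2)
    interval = (t.max() - t.min()) / len(grades)
    # interval index 0 is the lowest band; clamp the maximum t into the top band
    idx = np.minimum(((t - t.min()) // interval).astype(int), len(grades) - 1)
    return [grades[len(grades) - 1 - i] for i in idx]

# Example usage on a small class of raw scores.
print(grade_by_t_score([88, 79, 75, 66, 63, 54, 51, 47, 40]))
```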

5 Machine Learning-Based Grading

In this section, we explain how to apply the K-means clustering algorithm to grading,
since norm-referenced grading naturally suits unsupervised rather than supervised learning.
K-means [11] is an unsupervised machine learning technique for partitioning n
objects into k clusters. K-means begins by randomizing k centroids, one for each
cluster. Assign every object to a cluster whose centroid is nearest to the object. Re-
calculate the means of all assigned objects within each cluster to serve as k new
centroids as barycenters of the clusters. Iterate the object assignment to the clusters and
the centroid re-calculation until no more object moves between clusters. In other words,
the K-means algorithm aims at minimizing the objective function
$\sum_{j=1}^{k} \sum_{i=1}^{n_j} \lVert x_i - c_j \rVert$, where $n_j$ is the number of
objects in cluster j, $x_i = \langle x_{i1}, x_{i2}, \ldots, x_{im} \rangle$ is an object in
cluster j whose centroid is $c_j$, and $\lVert x_i - c_j \rVert$ is the Euclidean distance.
Also note that the initial centroid randomization can result in different final clusters.
When applying K-means algorithm to higher educational grading, k is set to the
number of eligible grades. Graders must decide such a number to avoid some best and
worst grades if appropriate.
The quality of clustering results can be measured by using a well-known metric,
namely the Davies-Bouldin index (DBI). Let us denote by $d_j$ the mean intra-cluster
distance of the points belonging to cluster $C_j$ to their barycenter $c_j$:
$d_j = \frac{1}{n_j} \sum_{i=1}^{n_j} \lVert x_i - c_j \rVert$. Let us also denote the
distance between the barycenters $c_{j'}$ and $c_j$ of clusters $C_{j'}$ and $C_j$ by
$D_{jj'} = \lVert c_{j'} - c_j \rVert$.

DBI is computed using (3) [12]. The lower the DBI, the better the clustering results
(i.e., low-DBI clusters have low intra-cluster distances and high inter-cluster distances).

$\mathrm{DBI} = \frac{1}{k} \sum_{j=1}^{k} \max_{j' \in \{1,\ldots,k\},\, j' \neq j} \left( \frac{d_j + d_{j'}}{D_{jj'}} \right)$  (3)
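A hedged scikit-learn sketch shows how K-means can be applied to one-dimensional scores and evaluated with DBI; the mapping from clusters to grades by descending centroid and the fixed random seed are illustrative choices, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def grade_by_kmeans(scores, grades=("A", "B", "C", "D", "F")):
    """Cluster the 1-D scores into k = number of eligible grades, order the
    clusters by centroid so the highest centroid maps to the best grade, and
    report the DBI of the resulting partition."""
    X = np.asarray(scores, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=len(grades), n_init=10, random_state=0).fit(X)
    order = np.argsort(-km.cluster_centers_.ravel())      # clusters ranked high to low
    cluster_to_grade = {c: grades[rank] for rank, c in enumerate(order)}
    labels = [cluster_to_grade[c] for c in km.labels_]
    dbi = davies_bouldin_score(X, km.labels_)
    return labels, dbi
```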

6 Evaluation

We evaluated the z score method, our algorithm, and K-means in norm-referenced
unconditional grading. We first describe the experimental configuration and the data
sets' characteristics; then, the grading results along with the performance metrics are
provided.

6.1 Experimental Configuration


We used five different data sets of accumulative term scores. They were scored on the
scale of 0.0 to 100.0 points and held the practical patterns of distributions.
The first data set has normal distribution namely ND. Table 1 shows the raw scores
of ND set. Mean and median are 63. Mode is unavailable as every score has the same
frequency of 1. Standard deviation (SD) is 13.9.

Table 1. Sorted scores of ND data set.


Record# Score Record# Score Record# Score Record# Score Record# Score Record# Score
1 88 7 76 13 66 19 60 25 50 31 38
2 86 8 75 14 65 20 59 26 49
3 84 9 74 15 64 21 54 27 48
4 79 10 73 16 63 22 53 28 47
5 78 11 72 17 62 23 52 29 42
6 77 12 67 18 61 24 51 30 40

Figure 1 projects the normal distribution of ND set. The horizontal axis represents z
score. Area under the curve represents normal distribution value computed with (4) [1].

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$  (4)

The second and the third data sets have positively and negatively skewed distributions,
namely SD+ and SD−, respectively. A positively skewed distribution is an asymmetric
bell shape with the bulk of the scores on the low side (a long right tail), probably caused
by exam questions that were overly difficult from the viewpoint of learners. Table 2
shows the raw scores of the SD+ set. Mode, median, and mean are 52, 53, and 60.9,
respectively. Figure 2 depicts the distribution of the SD+ set. The skewness equals 1.006.

Fig. 1. Distribution of the ND data set.
Fig. 2. Distribution of the SD+ data set.

Table 2. Sorted scores of SD+ data set.


Record# Score Record# Score Record# Score Record# Score Record# Score Record# Score
1 92 7 73 13 60 19 52 25 51 31 45
2 90 8 73 14 54 20 52 26 51
3 89 9 73 15 53 21 52 27 50
4 86 10 65 16 53 22 52 28 50
5 77 11 62 17 53 23 51 29 46
6 74 12 61 18 52 24 51 30 46

A negatively skewed distribution is an asymmetric bell shape with the bulk of the scores
on the high side (a long left tail), probably caused by exam questions that were overly
easy from the viewpoint of learners. Table 3 shows the raw scores of the SD− set. Mode,
median, and mean equal 87, 82, and 73.5, respectively. Figure 3 depicts the distribution
of the SD− set. The skewness is −1.078.
These 3 data sets contain the same number of raw scores and were synthesized to be
realistic and aim at clarifying the extreme behaviors of the three compared methods.
The fourth data set RD− was collected from a group of real learners taking the same
undergrad course in academic year 2019. Unlike SD+ and SD− that are heavily
skewed, RD− (and RD+) represent imperfectly normal distributions (i.e., slightly
skewed). The RD− data set has the negative skew of −0.138 as showed in Table 4 and
Fig. 4. Mode, median, and mean equal 66.7, 56.6, and 57.9, respectively.
The last data set, RD+ , was the real term scores of the other group of learners from
another university different from that of RD−. Opposite to RD−, RD+ data set has the
slightly positive skew of 0.155. The characteristics of RD+ are shown in Table 5 and
Fig. 5. Mode, median, and mean equal 82.5, 66.4, and 65.7, respectively.

Table 3. Sorted scores of SD− data set.


Record# Score Record# Score Record# Score Record# Score Record# Score Record# Score
1 94 7 86 13 84 19 74 25 62 31 34
2 93 8 86 14 84 20 73 26 61
3 87 9 86 15 83 21 72 27 52
4 87 10 85 16 82 22 65 28 50
5 87 11 85 17 77 23 64 29 38
6 87 12 85 18 75 24 63 30 36

Table 4. Sorted scores of RD− data set.


Record# Score Record# Score Record# Score Record# Score Record# Score Record# Score
1 80.8 12 70.6 23 61.4 34 55.5 45 50.7 56 44.5
2 80.2 13 69.1 24 60.7 35 55.2 46 50 57 43.5
3 78.7 14 68.7 25 60.5 36 55.2 47 48.8 58 42
4 76.8 15 68 26 59.2 37 55.1 48 48.7 59 35.7
5 76.1 16 67.6 27 58.7 38 54.7 49 48.6 60 28.4
6 75.2 17 66.7 28 58.5 39 53.9 50 46.7 61 28
7 75.1 18 66.7 29 57.8 40 52.6 51 46.4
8 72.5 19 65.8 30 57.4 41 52.5 52 46.2
9 72.1 20 63.5 31 56.6 42 51.7 53 45
10 71.6 21 61.6 32 55.7 43 51.3 54 44.9
11 70.8 22 61.5 33 55.5 44 51 55 44.6

Table 5. Sorted scores of RD+ data set.


Record# Score Record# Score Record# Score Record# Score Record# Score Record# Score
1 89.47 18 74.57 35 69.1 52 66.1 69 60.33 86 53.8
2 87.1 19 73.3 36 68.77 53 65.87 70 59.83 87 53.73
3 82.73 20 73.2 37 68.6 54 65.8 71 58.93 88 53.37
4 82.53 21 73.1 38 68.27 55 64.77 72 58.87 89 53.37
5 82.53 22 72.83 39 67.87 56 64.73 73 58.53 90 52.87
6 82.17 23 72.63 40 67.77 57 64.73 74 58.47 91 52.47
7 80.7 24 72.1 41 67.63 58 64.57 75 58.27 92 52.1
8 80.5 25 71.83 42 67.63 59 64.57 76 57.53 93 52
9 79.97 26 71.77 43 67.57 60 64.3 77 57 94 51.97
10 79.43 27 70.8 44 67.33 61 64.17 78 56.77 95 51.8
11 79.3 28 70.4 45 67.1 62 64.13 79 55 96 50.9
12 78.9 29 70.23 46 67 63 63.93 80 54.8 97 50.7
13 78.47 30 70.2 47 66.77 64 63.9 81 54.57 98 50.2
14 78.27 31 70.2 48 66.73 65 63.57 82 54.5 99 50.1
15 77.87 32 69.43 49 66.4 66 63 83 54.5 100 45
16 77.87 33 69.17 50 66.37 67 62.83 84 54.43
17 75.73 34 69.17 51 66.37 68 60.63 85 54.37

Fig. 3. Distribution of the SD− data set.
Fig. 4. Distribution of the RD− data set.
Fig. 5. Distribution of the RD+ data set.

Besides the data sets, we engaged a grading system that evaluated the scores into 5
eligible grades: A, B, C, D, and F without any class GPA constraint. We made an
assumption that there was no skipped grade. We realized the grading system in 3 ways
by using our algorithm, z score, and K-means separately.
The results of each method had their quality measured in DBI metric as if the
grades represented distinct clusters. The underlying reason of using DBI as the quality
metric in norm-referenced grading is intuitive. Recall that a DBI value becomes low if
clusters are compact and far from one another. Learners with very similar achievement
should receive the same grade (i.e., low intra-cluster distances), and different grades
must discriminate achievement between groups of learners as clearly as possible
(i.e., high inter-cluster distances). To interpret the quality results of each method:
the lower the DBI, the better the method.

6.2 Grading Result


We graded the ND data set, which has a normal distribution, by using our algorithm, z
score, and K-means and report their results, respectively, in the angle brackets shown in
Table 6. The algorithm delivered exactly the same results as the K-means method; their
DBIs equaled 0.330. The z score method yielded a DBI of 0.443. It might be
questionable, from the students' viewpoint, why graders using z score gave learners who
scored 78 and 79 the same grade A as the one who scored 84, and gave the holder of
47 marks the same grade F as the one who scored 42. Technically, this is because 78 and
79 fell in the same z-score interval as A, while 47 fell in the z-score interval of F.

Table 6. Results of 3 grading methods for ND.


Score Grade Score Grade Score Grade Score Grade Score Grade Score Grade
88 <A, A, A> 76 <B, B, B> 61 <C, C, C> 66 <C, C, C> 50 <D, D, D> 38 <F, F, F>
86 <A, A, A> 75 <B, B, B> 60 <C, C, C> 65 <C, C, C> 49 <D, D, D>
84 <A, A, A> 74 <B, B, B> 59 <C, C, C> 64 <C, C, C> 48 <D, D, D>
79 <B, A, B> 73 <B, B, B> 54 <D, D, D> 63 <C, C, C> 47 <D, F, D>
78 <B, A, B> 72 <B, B, B> 53 <D, D, D> 52 <D, D, D> 42 <F, F, F>
77 <B, B, B> 62 <C, C, C> 67 <C, C, C> 51 <D, D, D> 40 <F, F, F>

We graded the SD+ data set with the algorithm, z score, and K-means as shown in
Table 7. The algorithm delivered exactly the same results as the K-means method, with
a DBI of 0.222. The z score method gave a DBI of 0.575. There were many F grades
when using the z score method.

Table 7. Results of 3 grading methods for SD+ .


Score Grade Score Grade Score Grade Score Grade Score Grade Score Grade
92 <A, A, A> 73 <B, C, B> 60 <C, D, C> 52 <D, F, D> 51 <D, F, D> 45 <F, F, F>
90 <A, A, A> 73 <B, C, B> 54 <D, F, D> 52 <D, F, D> 51 <D, F, D>
89 <A, A, A> 73 <B, C, B> 53 <D, F, D> 52 <D, F, D> 50 <D, F, D>
86 <A, A, A> 65 <C, C, C> 53 <D, F, D> 52 <D, F, D> 50 <D, F, D>
77 <B, B, B> 61 <C, D, C> 53 <D, F, D> 51 <D, F, D> 46 <F, F, F>
74 <B, B, B> 61 <C, D, C> 52 <D, F, D> 51 <D, F, D> 46 <F, F, F>

Similarly, we graded the SD− data set as in Table 8. Here, the algorithm delivered a
DBI of 0.299, while the z score and K-means methods both gave a DBI of 0.233.

Table 8. Results of 3 grading methods for SD-.


Score Grade Score Grade Score Grade Score Grade Score Grade Score Grade
94 <A, A, A> 86 <B, A, A> 84 <B, A, A> 74 <B, B, B> 62 <C, C, C> 34 <F, F, F>
93 <A, A, A> 86 <B, A, A> 84 <B, A, A> 73 <B, B, B> 61 <C, C, C>
87 <B, A, A> 86 <B, A, A> 83 <B, A, A> 72 <B, B, B> 52 <D, D, D>
87 <B, A, A> 85 <B, A, A> 82 <B, A, A> 65 <C, C, C> 50 <D, D, D>
87 <B, A, A> 85 <B, A, A> 77 <B, B, B> 64 <C, C, C> 38 <F, F, F>
87 <B, A, A> 85 <B, A, A> 75 <B, B, B> 63 <C, C, C> 36 <F, F, F>

Since, in practice, there is no perfectly normal distribution of learners' achievement, we
also describe experimental results based on real data sets having slightly skewed
distributions. We graded the RD− data set with the 3 methods in Table 9. The gaps
between two consecutive raw scores were utilized by our algorithm, where the four
widest gaps (shown in italics in the table) were used as grading boundaries.

Table 9. Results of 3 grading methods for RD−.


Score Gap Grade Score Gap Grade Score Gap Grade Score Gap Grade Score Gap Grade
80.8 – <A, A, A> 68.7 0.4 <B, B, B> 58.7 0.5 <C, C, C> 52.6 1.3 <C, C, C> 45 1.2 <C, D, D>
80.2 0.6 <A, A, A> 68 0.7 <B, B, B> 58.5 0.2 <C, C, C> 52.5 0.1 <C, C, C> 44.9 0.1 <C, D, D>
78.7 1.5 <A, A, A> 67.6 0.4 <B, B, B> 57.8 0.7 <C, C, C> 51.7 0.8 <C, C, D> 44.6 0.3 <C, D, D>
76.8 1.9 <A, A, A> 66.7 0.9 <B, B, B> 57.4 0.4 <C, C, C> 51.3 0.4 <C, C, D> 44.5 0.1 <C, D, D>
76.1 0.7 <A, A, A> 66.7 0 <B, B, B> 56.6 0.8 <C, C, C> 51 0.3 <C, C, D> 43.5 1 <C, D, D>
75.2 0.9 <A, A, A> 65.8 0.9 <B, B, B> 55.7 0.9 <C, C, C> 50.7 0.3 <C, C, D> 42 1.5 <C, D, D>
75.1 0.1 <A, A, A> 63.5 2.3 <C, B, B> 55.5 0.2 <C, C, C> 50 0.7 <C, C, D> 35.7 6.3 <D, F, F>
72.5 2.6 <B, A, B> 61.6 1.9 <C, B, C> 55.5 0 <C, C, C> 48.8 1.2 <C, D, D> 28.4 7.3 <F, F, F>
72.1 0.4 <B, A, B> 61.5 0.1 <C, B, C> 55.2 0.3 <C, C, C> 48.7 0.1 <C, D, D> 28 0.4 <F, F, F>
71.6 0.5 <B, A, B> 61.4 0.1 <C, B, C> 55.2 0 <C, C, C> 48.6 0.1 <C, D, D>
70.8 0.8 <B, A, B> 60.7 0.7 <C, B, C> 55.1 0.1 <C, C, C> 46.7 1.9 <C, D, D>
70.6 0.2 <B, A, B> 60.5 0.2 <C, B, C> 54.7 0.4 <C, C, C> 46.4 0.3 <C, D, D>
69.1 1.5 <B, B, B> 59.2 1.3 <C, C, C> 53.9 0.8 <C, C, C> 46.2 0.2 <C, D, D>

All 3 methods produced different grading results overall. In particular, the algorithm
and the K-means method assigned A to the same group of learners, whereas the z score
and K-means methods assigned F to the same group of learners. To evaluate the quality
of each method's results, we consider the DBIs: the algorithm had a DBI of 0.375,
whereas the K-means and z score methods gave DBIs of 0.469 and 0.492, respectively.
Therefore, the algorithm delivered the best grading results with respect to the RD− data
set. The algorithm accomplished the lowest DBI because grade D has only one member
score; such a single-member grading result is comparable to the smallest possible
cluster, which DBI prefers.

We graded the RD+ data set with the algorithm, z score, and K-means methods as
shown in Table 10. We found that the algorithm, the z score method, and the K-means
method yielded DBIs of 0.345, 0.529, and 0.486, respectively. This means that the
algorithm outperformed the others.

Table 10. Results of 3 grading methods for RD+ .


Score Gap Grade Score Gap Grade Score Gap Grade Score Gap Grade Score Gap Grade
89.47 – <A, A, A> 73.1 0.1 <C, B, B> 67.63 0.14 <C, C, C> 64.17 0.13 <C, C, C> 54.57 0.23 <D, D, F>
87.1 2.37 <A, A, A> 72.83 0.27 <C, B, B> 67.63 0 <C, C, C> 64.13 0.04 <C, C, C> 54.5 0.07 <D, D, F>
82.73 4.37 <B, A, A> 72.63 0.2 <C, B, B> 67.57 0.06 <C, C, C> 63.93 0.2 <C, C, C> 54.5 0 <D, D, F>
82.53 0.2 <B, A, A> 72.1 0.53 <C, B, B> 67.33 0.24 <C, C, C> 63.9 0.03 <C, C, C> 54.43 0.07 <D, D, F>
82.53 0 <B, A, A> 71.83 0.27 <C, B, B> 67.1 0.23 <C, C, C> 63.57 0.33 <C, C, C> 54.37 0.06 <D, D, F>
82.17 0.36 <B, A, A> 71.77 0.06 <C, B, B> 67 0.1 <C, C, C> 63 0.57 <C, C, C> 53.8 0.57 <D, F, F>
80.7 1.47 <B, A, A> 70.8 0.97 <C, C, B> 66.77 0.23 <C, C, C> 62.83 0.17 <C, C, C> 53.73 0.07 <D, F, F>
80.5 0.2 <B, B, A> 70.4 0.4 <C, C, B> 66.73 0.04 <C, C, C> 60.63 2.2 <D, D, D> 53.37 0.36 <D, F, F>
79.97 0.53 <B, B, A> 70.23 0.17 <C, C, B> 66.4 0.33 <C, C, C> 60.33 0.3 <D, D, D> 53.37 0 <D, F, F>
79.43 0.54 <B, B, A> 70.2 0.03 <C, C, B> 66.37 0.03 <C, C, C> 59.83 0.5 <D, D, D> 52.87 0.5 <D, F, F>
79.3 0.13 <B, B, A> 70.2 0 <C, C, B> 66.37 0 <C, C, C> 58.93 0.9 <D, D, D> 52.47 0.4 <D, F, F>
78.9 0.4 <B, B, A> 69.43 0.77 <C, C, B> 66.1 0.27 <C, C, C> 58.87 0.06 <D, D, D> 52.1 0.37 <D, F, F>
78.47 0.43 <B, B, A> 69.17 0.26 <C, C, B> 65.87 0.23 <C, C, C> 58.53 0.34 <D, D, D> 52 0.1 <D, F, F>
78.27 0.2 <B, B, A> 69.17 0 <C, C, B> 65.8 0.07 <C, C, C> 58.47 0.06 <D, D, D> 51.97 0.03 <D, F, F>
77.87 0.4 <B, B, A> 69.1 0.07 <C, C, B> 64.77 1.03 <C, C, C> 58.27 0.2 <D, D, D> 51.8 0.17 <D, F, F>
77.87 0 <B, B, A> 68.77 0.33 <C, C, B> 64.73 0.04 <C, C, C> 57.53 0.74 <D, D, D> 50.9 0.9 <D, F, F>
75.73 2.14 <C, B, B> 68.6 0.17 <C, C, B> 64.73 0 <C,C,C> 57 0.53 <D, D, D> 50.7 0.2 <D, F, F>
74.57 1.16 <C, B, B> 68.27 0.33 <C, C, C> 64.57 0.16 <C, C, C> 56.77 0.23 <D, D, D> 50.2 0.5 <D, F, F>
73.3 1.27 <C, B, B> 67.87 0.4 <C, C, C> 64.57 0 <C, C, C> 55 1.77 <D, D, F> 50.1 0.1 <D, F, F>
73.2 0.1 <C, B, B> 67.77 0.1 <C, C, C> 64.3 0.27 <C, C, C> 54.8 0.2 <D, D, F> 45 5.1 <F, F, F>

7 Finding, Discussion, and Implication

Fig. 6. Detailed performance.
Fig. 7. Overall performance.

Figure 6 compares all measured DBIs of the 3 methods with respect to each data set.
Since a lower DBI means better clustering quality, we can conclude that the K-means
method is suitable for heavily skewed distributions (i.e., the SD+ and SD− data sets)
and unfriendly to normal (i.e., ND) and nearly normal (or slightly skewed) distributions
(i.e., RD− and RD+). K-means' DBIs have μ = 0.348 and σ = 0.112. The algorithm is
generally appropriate for all kinds of distributions (μ = 0.314 and σ = 0.052). In
contrast, the z score method is not recommended for any case (μ = 0.454 and
σ = 0.119). The absolute degree of skewness, rather than its positive or negative
polarity, impacts the methods' grading qualities. Figure 7 comparatively projects the
overall performance of each method across all 5 data sets. Our algorithm is optimal,
while the K-means method produces slightly inferior results with about 10.77% higher
DBI. The z score method performs worst (44.67% greater DBI than that of the
algorithm), mainly because it is totally blind to the raw-score gaps between different
grading levels.
The key findings are that the algorithm and K-means method lead to the same
grading results based on normal and positively skewed distributions. Z score method
and K-means method yield identical grading results based on negatively skewed dis-
tribution. K-means method is suitable for skewed distribution (i.e., SD+ and SD−). The
implication of these findings is that the algorithm is generally appropriate for all kinds
of distributions. When grading the imperfectly-normal-distribution data sets, our
algorithm yields the best DBIs followed by K-means method and z score, respectively.

Review of Several Address Assignment
Mechanisms for Distributed Smart Meter
Deployment in Smart Grid

Tien-Wen Sung, Xiaohui Hu(&), and Haiyan Ou

College of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
tienwen.sung@gmail.com, 2191905070@smail.fjut.edu.cn, 512376856@qq.com

Abstract. Deploying wireless objects or devices is a fundamental basis for


many network-based applications. The objects could be smart meters in a smart
grid, sensors or actuators in a WSN, or IoT things. After the physical deploy-
ment of those devices, a network address allocation becomes another essential
procedure to enable network communications among the devices for device
controls or message delivery purposes. This paper gives a review on several
notable address assignment mechanisms by describing the key technique of each
of these proposed approaches. The advantages as well as weaknesses are also
introduced and a brief comparison is also given in this paper. The review
provides a valuable reference in making further improvement of distributed
address allocations and a meaningful reference for the relevant applications in
the topics of smart grid, sensor networks, and Internet of things.

Keywords: Address assignment · Distributed smart meter deployment · Smart grid

1 Introduction

Smart grid technology and application are being adopted for the development of
intelligent power systems in many countries [1]. Smart meter deployment is one of the
essentials in a smart grid infrastructure. The smart meters are deployed and utilized for
measuring energy consumption as well as billing information of smart houses. A robust
communication network is fundamental for message delivery among the meters. The
communications network possibly used in smart grids is not yet clearly defined and
several choices can be selected as needed [2]. There exist three common network
structures used in smart meter communications: communication using a mobile net-
work; communication using a data concentrator (DC); and communication using a
gateway [3]. Communication technology success depends on various factors such as network medium, topology, address allocation, routing, installation environment, etc. Address allocation, in particular, is closely related to network topology and routing. This paper focuses on network address allocation and provides a
review on five notable address assignment mechanisms which were proposed for and


can be used in the distributed node deployments of smart grid, sensor network, and
Internet of things applications. The purpose is to give the relevant concepts of these
mechanisms for further improvement and employment in smart grid applications.

2 Related Works

Nowadays wireless communication has been utilized in various network-based appli-


cations. It is also employed in the infrastructure of smart grids in which smart meters
are interconnected with wireless connections [4, 5]. ZigBee, WiFi, Bluetooth and
Cellular networks are available wireless technologies for using in smart grids [6, 7].
The tree-based or cluster-based networking methods are usually utilized as a basis for
message routing in the networks which consist of smart meters, data aggregation
points, gateways, and the control center [8]. Regarding the network address allo-
cation, there have been several notable approaches proposed for these kinds of net-
works which consist of wireless distributed devices. To provide a research basis and a
valuable reference for making improvement of distributed address allocation and
adoption to relevant applications with smart meter networking, this paper reviews the
proposed Distributed Address Assignment Mechanism [9], ZigBee Adaptive Joining
Mechanism (ZAJM) [10], Expanded Distributed Address Assignment Mechanism
(EDAAM) [11], Multi-step Distributed Address Assignment Mechanism (M-DAAM)
[12], and Zigbee Connectivity Enhancement Mechanism (ZCEM) [13]. The key
technique, advantage, and weakness of these five mechanisms are introduced and
described in Sect. 3, respectively.

3 Network Address Allocation Approaches

This section reviews and describes five network address allocation approaches which
are applicable to a variety of well-known wireless ad hoc networks, including the
distributed networks of wireless smart meters, sensors, IoT things, etc.

3.1 Distributed Address Assignment Mechanism (DAAM)


The Distributed Address Assignment Mechanism is defined in the ZigBee specification
[9]. The address allocation is hierarchical and distributed, thus a tree-based topology.
A parent node owns a block of network addresses and will allocate a sub-block of the
addresses to its child node, which is a potential parent node of a possible subtree. Once the child becomes a parent node of a subtree, it will do the same step to allocate addresses to its children. If it is certain that the child of a parent node is a leaf node,
the parent will allocate a single address instead of a sub-block of addresses to the child.
In ZigBee networks, there is an algorithm to determine the range of an address sub-
block or the value of a single address which will be allocated. The algorithm is based
on the device types and the network configuration parameters. The device types are
ZigBee Coordinator (ZC), ZigBee Router (ZR), and ZigBee End Device (ZED). The
ZigBee coordinator, router, and end device can be treated as a control center, data

aggregation point, and smart meter in a neighborhood area network, respectively.


A ZED has no capability to accept a connection request from any other node. The
network configuration parameters are nwkMaxChildren (Cm), nwkMaxRouters (Rm),
and nwkMaxDepth (Lm). The address assignment can be achieved by using the
formulas:
$$
C_{skip}(d) =
\begin{cases}
1 + C_m \cdot (L_m - d - 1), & R_m = 1\\
\dfrac{1 + C_m - R_m - C_m \cdot R_m^{\,L_m - d - 1}}{1 - R_m}, & \text{otherwise}
\end{cases}
\tag{1}
$$

$$
A_n = A_{parent} + C_{skip}(d) \cdot R_m + n \tag{2}
$$

A parent node will assign an address that is one greater than its own to its first ZR child. Addresses assigned to subsequent ZR children are separated from each other by the value of Cskip(d) given in Eq. (1). Network addresses are assigned to the nth ZED child sequentially according to the value of An given in Eq. (2). Fig. 1 shows an example of address allocation using DAAM.
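To make the address computation concrete, the following Python sketch evaluates Cskip(d) and the resulting child addresses according to Eqs. (1) and (2) and the assignment rule described above. The function names and the example parameter values (Cm = 4, Rm = 4, Lm = 3) are ours and chosen only for illustration; they are not taken from the specification text.

```python
def cskip(d, Cm, Rm, Lm):
    """Size of the address sub-block reserved for each router child at depth d, per Eq. (1)."""
    if Rm == 1:
        return 1 + Cm * (Lm - d - 1)
    # The division is exact for valid ZigBee parameters, so integer division is safe here.
    return (1 + Cm - Rm - Cm * Rm ** (Lm - d - 1)) // (1 - Rm)

def router_child_address(parent_addr, d, i, Cm, Rm, Lm):
    """Address of the i-th (1-based) ZR child: parent + 1 for the first, then spaced by Cskip(d)."""
    return parent_addr + 1 + (i - 1) * cskip(d, Cm, Rm, Lm)

def end_device_address(parent_addr, d, n, Cm, Rm, Lm):
    """Address of the n-th (1-based) ZED child, per Eq. (2)."""
    return parent_addr + cskip(d, Cm, Rm, Lm) * Rm + n

# Illustrative parameters (not from the paper): Cm = 4 children, Rm = 4 routers, Lm = 3 levels.
Cm, Rm, Lm = 4, 4, 3
print(cskip(0, Cm, Rm, Lm))                        # 21 addresses reserved per router child of the coordinator
print(router_child_address(0, 0, 2, Cm, Rm, Lm))   # second ZR child of the coordinator -> 22
print(end_device_address(0, 0, 1, Cm, Rm, Lm))     # first ZED child of the coordinator -> 85
```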

Fig. 1. An example of DAAM

The advantage of DAAM is that the addresses are computable and unique. DAAM
is designed for and suitable for tree routing. The weakness of DAAM is that it can waste address space in some cases, for example, when the number of nodes in a subtree is less than the number of reserved addresses in the address block for that
subtree. Moreover, DAAM could cause an orphan node problem due to the network
constraints of its configuration parameters.

3.2 ZigBee Adaptive Joining Mechanism (ZAJM)


The ZigBee Adaptive Joining Mechanism approach [10] also basically follows the
DAAM address assignment, but it has an additional mechanism, called connection
shifting, to mitigate the orphan node problem in DAAM. The basic idea of ZAJM is
that one node may have more than one parent candidates to choose as its parent. In this
case, the node can change its connection target from original parent to another parent,
then the address assigned by the original parent can be released and assigned to another
new node. Fig. 2 illustrates the connection shifting of ZAJM and the reduction of
orphan nodes. It can be found that nodes P, D, M, and N are orphan nodes because no potential parent can accept their join requests and assign addresses to them due to the network constraints of the configuration parameters nwkMaxChildren (Cm), nwkMaxRouters (Rm), and nwkMaxDepth (Lm). However, node R can change its parent from Z to G, so that Z can accept the join request of P. Node Z takes the address of R back and assigns it to P. Similarly, node E can change its parent from A to G, then A
can take the address block back and accept the join request of D. Moreover, once D has
connected to A and obtained an address block, it can accept the join requests of M and
N. In this case, the connection shifting of node E (from A to G) reduces three orphan
nodes and raises the connection ratio.

Fig. 2. An example of ZAJM

One of the advantages of ZAJM is that ZAJM can decrease the number of orphan
nodes, therefore the ratio of connected nodes is increased. In other words, the uti-
lization of address space is improved. Another advantage is that ZAJM usually makes
the sizes of subtrees more balanced when the connection shifting is performed. That is
to say, the loads among parent nodes are more balanced. The most important advantage
is that ZAJM fully retains the rules and features of tree routing. The weakness of ZAJM is that the address of a node changes when the node performs the connection shifting mechanism.

3.3 Expanded Distributed Address Assignment Mechanism (EDAAM)


The Expanded Distributed Address Assignment Mechanism [11] is similar to the
stochastic addressing scheme [12], which generally follows the rules of DAAM if no
addressing failure occurs. The difference between EDAAM and the stochastic addressing scheme is that, once addressing by DAAM fails, the stochastic addressing scheme uses stochastic numbers generated from the unreserved addresses, whereas EDAAM uses one more DAAM-based address block of the unreserved addresses to allocate addresses. Both EDAAM and the stochastic addressing scheme have an essential precondition for performing with the desired effects: the maximum number of addresses belonging to the address block reserved by the DAAM networking parameters (nwkMaxChildren, nwkMaxRouters, and nwkMaxDepth) should be much less than the entire address space. Sufficient unreserved addresses are necessary. Under this pre-
condition, the advantage of improving utilization of overall address space can be
achieved. Besides the requirement of sufficient unreserved addresses, another weakness
of EDAAM is that additional routing tables are needed for routing among different
DAAM-based address blocks. Fig. 3 illustrates an example of EDAAM.

Fig. 3. An example of EDAAM

3.4 Multi-step Distributed Address Assignment Mechanism (M-DAAM)


The Multi-step Distributed Address Assignment Mechanism [13] is basically like
DAAM in that each router allocates its own sub-block of addresses to its children. To reduce
the number of useless addresses of a router, M-DAAM uses a multi-step method and
adjusts the network configuration parameters to improve the connection ratio. The

network parameters are able to be changed in the stepwise method of M-DAAM. The
type of a node can be selected to change to a router or end device, therefore the network
depth could increase. After the change of nwkMaxDepth, M-DAAM goes into another
step to adjust nwkMaxRouters(d) and nwkMaxChild(d), i.e., the maximum numbers of routers and children that a router at depth d can support. The adjustment is based on the depth of the network. A lower depth brings a higher value of nwkMaxRouters(d). Conversely, a higher depth brings a lower value of nwkMaxRouters(d). In addition, routers that have no children will be changed to the logical node type of end device. This decreases the number of useless routers and reduces the waste of address space. The advantage of M-DAAM is its variable network parameters, which can adjust the network topology to improve the utilization of the entire address space. It is applicable to large-scale networks and alleviates the orphan node problem. The weakness is the assumption that the node type of a node can be changed. The required memory size of each node also increases
for the multi-step process.

3.5 Zigbee Connectivity Enhancement Mechanism (ZCEM)


The Zigbee Connectivity Enhancement Mechanism [14] is a connection shifting-based
approach similar to ZAJM. ZCEM refines the connection shifting mechanism of ZAJM by addressing the issue of multiple potential parents. It is possible that, after two or more potential parents perform connection shifting, an orphan node has multiple candidates to choose from for joining the network and getting an address. The orphan nodes in ZAJM use a first-come-first-served (FCFS) strategy to choose their parent while all the candidates periodically broadcast beacons. Although the decision making is fast, the result of the choice is sometimes not the best. Accordingly, ZCEM proposes an improvement probability (IP) calculation for each node, which sums the remaining capacity ratios of the corresponding subtree. An orphan node determines its parent by choosing the potential parent that brings the largest increase in improvement probability after joining the network. The simulation results of ZCEM indicate that the number of orphan nodes can be well reduced and that the connection ratio of ZCEM is a little better than that of ZAJM. Thus, the utilization of the network address space is improved. The weakness is the overhead of the improvement probability calculation and the latency of determining the parent. An orphan node needs to wait for a certain duration to receive the related information from all the candidates before making the decision.
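The parent-selection idea of ZCEM can be sketched as follows. The scoring function below (summing remaining-capacity ratios over the routers of a candidate's subtree) is only our simplified stand-in for the improvement probability, and the candidate data are hypothetical; the exact IP formula is defined in [14].

```python
def improvement_score(subtree_routers):
    """subtree_routers: list of (used_children, max_children) pairs for routers in a candidate's subtree."""
    return sum((max_c - used) / max_c for used, max_c in subtree_routers if max_c > 0)

def choose_parent(candidates):
    """candidates: dict mapping a candidate parent's id to its subtree capacity description."""
    return max(candidates, key=lambda p: improvement_score(candidates[p]))

# Hypothetical orphan-node view of two candidate parents: G's subtree has far more spare capacity.
candidates = {"Z": [(4, 4), (3, 4)], "G": [(1, 4), (0, 4), (2, 4)]}
print(choose_parent(candidates))  # -> "G"
```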

4 Comparison

Table 1 shows a relative comparison of the above-described address allocation approaches. The comparative description is based on several aspects: (1) whether or not the approach is suitable for cooperating with a tree routing scheme; (2) whether the complexity is low, moderate, or high when the approach is performed; (3) whether the scalability is low, moderate, or high when the approach is applied to a larger-scale network; and (4) what the main weakness of the approach is. It is shown that the routing used in EDAAM can only be partially tree routing. EDAAM cannot easily perform fully tree routing because, once addressing by normal DAAM fails, it uses one more additional DAAM-based address block of the unreserved addresses to allocate addresses. This breaks the regular address numbering rule used in tree routing. EDAAM and M-DAAM have a higher complexity than the other approaches since they generally involve more operations and overheads, for example, the complexity caused by the adjustment and reconfiguration of the networking parameters in M-DAAM. However, this gives M-DAAM a higher scalability than the other approaches. Thus, M-DAAM performs well in a large-scale distributed network.

Table 1. A brief comparison of the reviewed address allocation approaches.

Approach | Tree routing  | Relative complexity | Scalability | Weakness
DAAM     | Yes           | Low                 | Moderate    | Limitation caused by networking parameters
ZAJM     | Yes           | Low                 | Moderate    | Address change caused by connection shifting
EDAAM    | Partially yes | Moderate            | Moderate    | Requirement of enough unreserved addresses
M-DAAM   | Yes           | Moderate            | High        | Assumption of changeable node type of a node
ZCEM     | Yes           | Low                 | Moderate    | Improvement probability (IP) calculation and acquisition

5 Conclusion

There are various wireless ad hoc networks such as a smart meter network, sensor
network, Internet of things, etc. These kinds of networks usually consist of many
wireless and distributed devices deployed in a certain area or region. The message
delivery among these devices relies on a constructed and configured network, and a
network address allocation method should be designed and performed for the com-
munication network. This paper reviews the proposed network addressing approaches DAAM, ZAJM, EDAAM, M-DAAM, and ZCEM, and then gives an introduction, illustration, and brief comparison of these address allocation schemes.
The review can provide an important reference for further improvement and advanced
utilization in related applications. In future works, this preliminary survey result can
facilitate our research project about the smart grid applications based on the smart
meter network.

Acknowledgement. This work is supported by the Fujian Provincial Natural Science Foun-
dation in China (Project Number: 2017J01730), the Fujian University of Technology (Project
Number: GY-Z18183), and the Education Department of Fujian Province (Project Number:
JT180352).

References
1. Chan, J., Ip, R., Cheng, K.W., Chan, K.S.P.: Advanced metering infrastructure deployment
and challenges. In: Proceedings of the 2019 IEEE PES GTD Grand International Conference
and Exposition Asia (GTD Asia), Bangkok, Thailand, 19–23 March 2019, pp. 435–439
(2019)
2. Abdulla, G.: The deployment of advanced metering infrastructure. In: Proceedings of the
2015 First Workshop on Smart Grid and Renewable Energy (SGRE), Doha, Qatar, 22–23
March 2015, pp. 1–3 (2015)
3. Chren, S., Rossi, B., Pitner, T.: Smart grids deployments within EU projects: the role of
smart meters. In: Proceedings of the 2016 Smart Cities Symposium Prague (SCSP), Prague,
Czech Republic, 26–27 May 2016, pp. 1–5 (2016)
4. Aboelmaged, M., Abdelghani, Y., Abd El Ghany, M.A.: Wireless IoT based metering
system for energy efficient smart cites. In: Proceedings of the 2017 29th International
Conference on Microelectronics (ICM), Beirut, Lebanon, 10–13 December 2017, pp. 1–4
(2017)
5. Dhivya, M., Valarmathi, K.: IoT based smart electric meter. In: Hemanth, D., Kumar, V.,
Malathi, S., Castillo, O., Patrut, B. (eds.) Emerging Trends in Computing and Expert
Technology, COMET 2019. Lecture Notes on Data Engineering and Communications
Technologies, vol. 35, pp. 1260–1269. Springer, Cham (2019)
6. Hlaing, W., Thepphaeng, S., Nontaboot, V., Tangsunantham, N., Sangsuwan, T., Pira, C.:
Implementation of WiFi-based single phase smart meter for internet of things (IoT). In:
Proceedings of the 2017 International Electrical Engineering Congress (iEECON), Pattaya,
Thailand, 8–10 March 2017, pp. 1–4 (2017)
7. Burunkaya, M., Pars, T.: A smart meter design and implementation using ZigBee based
wireless sensor network in smart grid. In: Proceedings of the 2017 4th International
Conference on Electrical and Electronic Engineering (ICEEE), Ankara, Turkey, 8–10 April
2017, pp. 158–162 (2017)
8. Wang, G., Zhao, Y., Ying, Y., Huang, J., Winter, R.M.: Data aggregation point placement
problem in neighborhood area networks of smart grid. Mob. Netw. Appl. 23(4), 696–708
(2018)
9. ZigBee Alliance: ZigBee Specification (Document 053474r13), 1 December 2006
10. Sung, T.W., Yang, C.S.: An adaptive joining mechanism for improving the connection ratio
of ZigBee wireless sensor networks. Int. J. Commun. Syst. 23(2), 231–251 (2010)
11. Hwang, H., Deng, Q., Jin, X., Kim, K.: An expanded distributed address assignment
mechanism for large scale wireless sensor network. In: Proceedings of the 2012 8th
International Conference on Wireless Communications, Networking and Mobile Computing
(WiCom), Shanghai, China, 21–23 September 2012, pp. 1–3 (2012)
12. Kim, H.S., Yoon, J.: Hybrid distributed stochastic addressing scheme for ZigBee/IEEE
802.15.4 wireless sensor networks. ETRI J. 33(5), 704–711 (2011)
13. Kim, H.S., Bang, J.S., Lee, Y.H.: Distributed network configuration in large-scale low power
wireless networks. Comput. Netw. 70, 288–301 (2014)
14. Chang, H.-Y.: A connectivity-increasing mechanism of ZigBee-based IoT devices for
wireless multimedia sensor networks. Multimed. Tools Appl. 78(5), 5137–5154 (2017).
https://doi.org/10.1007/s11042-017-4584-2
An Approach for Sentiment Analysis
and Personality Prediction Using Myers
Briggs Type Indicator

Alàa Genina1(&), Mariam Gawich2(&), and Abdelfatah Hegazy1(&)


1 College of Computing & Information Technology, Arab Academy for Science, Technology & Maritime Transport, Sheraton, Cairo, Egypt
alaa.genina@gmail.com, ahegazy@aast.edu
2 GRELITE, Université Française en Egypte, Cairo, Egypt
mariam.gawish@ufe.edu.eg

Abstract. Due to the rapid development of “Web 5.0”, in the last few years,
researchers have started to pay attention to social media using Personality
Prediction and Sentiment Analysis. However, due to the high costs and the
privacy of these datasets, this paper presents a study on Sentiment Analysis and
Personality Prediction through Social Media using Myers–Briggs Type Indi-
cator (MBTI) personality assessment test to analyze textual data with the use of
different classification algorithms. The data are collected from Kaggle with
approximately 8600 rows of Twitter data. The system is tested using 25% of the
dataset and the remaining 75% is for the training set. The results show an
average accuracy rate of 78.2% with the use of different classification algo-
rithms, and a 100% accuracy rate using the Random Forest (RF) and Decision
Tree classifiers.

Keywords: Data mining · Text mining · Sentiment Analysis · Emoticons analysis · MBTI personality prediction · Machine learning · Classification techniques

1 Introduction

Due to the rise of virtual assistants, the upcoming intelligent web “Web 5.0” will see
that applications are able to interpret information on more complex levels, emotionally
as well as logically. Social Media is a place, which is designed to allow people share,
comment, and post according to their beliefs and thoughts through websites and
applications [1]. This is the reason for increasing the amount of sentiment. Opinion
Mining or Sentiment Analysis (SA) is the way of discovering people’s feelings,
opinions, and emotions through a service or a product with the use of natural language
processing (NLP) to determine whether it is positive, negative, or neutral [2]. Sentiment
analysis can be applied by these approaches: Machine Learning, Lexicon-based, and
hybrid approach. For machine learning, it utilizes different features to build a classifier
to characterize the text which expresses the sentiment from supervised and


unsupervised data. The lexicon-based approach utilizes a wide range of words, each assigned a polarity score, in order to determine the overall sentiment score of the given content. The main asset of this technique is that it does not require any training
data [3]. Emoticons as well are used to create pictorial icons that display a sentiment or
emotion with letters, numbers, and punctuation marks [4].
Personality takes into account various dimensions to define its type based on the
Myers-Briggs Type Indicator (MBTI), which are Introversion (I) – Extroversion (E),
Sensing (S) – Intuition (N), Thinking (T) – Feeling (F), and Judging (J) – Perceiving
(P). These four dimensions lead to 16 types of personality; each type consists of four
letters that help in predicting the human personalities and the interactions between them
such as (ISTJ, ISFJ, INFJ, INTJ …). The personality assessment will help the market
place and organizations to know customers’ feedback on a product or a service. For this
reason, the Sentiment Analysis techniques can facilitate the personality assessment
through the texts and the emoticons that have been expressed on the social media
websites [3]. The purpose of this work is to use the Sentiment Analysis for emoticons
expressed and to predict type of personality using MBTI personality assessment.
This paper is organized as follows: Sect. 2 provides the literature review; Sect. 3 proposes a model for the approach; Sect. 4 demonstrates the traditional machine learning algorithm approaches; and Sect. 5 presents the conclusion.

2 Background and Related Work

Owing to the data available through social networking websites, people can share their feelings, opinions, and emotions, which has driven researchers to experiment with Sentiment Analysis and Personality Type prediction to analyze and predict the behavior of users.
In the early 2000s, sentiment analysis and opinion mining were introduced as a way to recognize and analyze opinions and feelings. Sentiment analysis can be applied at various levels of analysis (document level, sentence level, feature level, and word level). At the Document Level, the sentiment of the full document is taken in order to locate the general sentiment of each document. The Sentence Level is the same as the document level, but each sentence is considered individually when calculating sentiment. Feature Level sentiment is measured at the attribute level, particularly when applying sentiment analysis to customers' or products' feedback [3, 5, 6].
Personality prediction has different types of assessments such as (MBTI, The
Winslow Personality Profile, Disc Assessment, Big Five, The Holtzman Inkblot
Technique, Process Communication Model, etc.). There are many machine learning
algorithms that can be applied to predict the type of personality.
Authors in [Sentiment Analysis of Teachers Using Social Information in Educa-
tional Platform] have created an automatic Sentiment Analysis System that analyzes
textual reviews (in Greek language) and determines the users’ attitude and their sat-
isfaction [5].

Authors in [Reddit: A Gold Mine for Personality Prediction] have extracted a


number of linguistic and user activity features across the MBTI dimensions. Also, they
have evaluated a number of benchmark models achieved through the use of machine
learning algorithms and achieved a macro F- scores between 67% and 82% on the
individual dimension and 82% accuracy for exact or one-off accurate type prediction [7].
Authors in [Machine Learning-Based Sentiment Analysis for Twitter Accounts]
have done a comparison and figured out that TextBlob and WordNet use word sense
disambiguation (W-WSD) with greater accuracies. They have used machine learning
techniques (Naïve Bayes, SVM), sentiment lexicons (W-WSD, Senti Word Net) with
the use of python code, Tweepy API, and Weka tool [8].
Authors in [Sentiment Analysis on Social Media using Morphological Sentence Pattern Model] proposed an approach that has been useful in solving the partial matching problem and the mismatching problem [9].
Authors in [A Comparative Study of Different Classifiers for Automatic Personality
Prediction] have compared the results in several classifiers provided in Weka based on
correctly classified instances, F-measure, time taken, mean errors, and Kappa statistics
through the data of undergraduate students extracted from Twitter [10].
The aim of this work is to compare the performance of different machine learning (ML) algorithms in order to obtain the sentiment of tweets, taking into account the emoticons expressed in each tweet by converting them into text to determine whether the polarity is positive, negative, or neutral, and to predict the type of personality based on the MBTI personality assessment.

3 Proposed Model for Predicting Personality Using


Sentiment Analysis and MBTI

The proposed model analyzes tweets collected from a Twitter dataset to predict the sentiment of these tweets, taking into consideration the emoticons expressed in each tweet, and to predict the type of personality using the MBTI personality assessment. Different techniques are applied in order to determine the most efficient one.
Figure 1 shows the proposed system, which includes seven components: Data
collection, Converting Emoticons to Text, Pre-processing, Feature Extraction, Senti-
ment Analysis, Classification, and the Algorithm Stage, which is composed of a
comparison between different classifiers in order to predict the personality type.

Fig. 1. The proposed model for predicting personality using sentiment analysis & MBTI.

3.1 Data Collection Phase


The data has been collected from Kaggle, which includes approximately 8600 rows of
data. The data contains two columns. The first one is the Type, which consists of the four-letter MBTI type/code of each person. The second one is the Posts, which includes the last 50 tweets that have been posted, separated by "|||" (3 pipe characters) [11].
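A minimal sketch of loading this dataset is shown below. The file name mbti_1.csv and the column names type and posts are those commonly used in the Kaggle release [11]; they may need to be adjusted for a local copy.

```python
import pandas as pd

df = pd.read_csv("mbti_1.csv")                                  # ~8600 rows: MBTI type + last 50 posts per user
df["post_list"] = df["posts"].apply(lambda s: s.split("|||"))   # posts are separated by 3 pipe characters

print(df["type"].value_counts().head())   # distribution of the 16 MBTI types
print(len(df["post_list"].iloc[0]))       # roughly 50 posts for the first user
```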

3.2 Converting Emoticons to Text and the Pre-processing Phase


A dictionary has been made containing all the emoticons to convert them into text with the use of Kaggle Cloud, Python, and Anaconda software, e.g., converting ":D" into a smiley face.
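A tiny illustrative version of such a dictionary-based conversion is given below; only a few mappings are shown, whereas the paper's dictionary covers many more emoticons.

```python
import re

EMOTICON_MAP = {":D": "smiley face", ":)": "happy face", ":(": "sad face", ";)": "wink"}

def emoticons_to_text(text):
    # Replace each emoticon with its textual description.
    for emo, desc in EMOTICON_MAP.items():
        text = text.replace(emo, " " + desc + " ")
    return re.sub(r"\s+", " ", text).strip()    # collapse the extra spaces introduced above

print(emoticons_to_text("great lecture :D but the quiz :("))
# -> "great lecture smiley face but the quiz sad face"
```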
The Pre-processing Phase is the most important one in the data mining process, as it may affect the final pre-processing outcomes [12]. It includes the following steps (a code sketch of these steps follows the list):
– Removal of Noise; to remove characters and digits that can interfere with text analysis,
i.e. removing all special characters except hashtags in tweets.

– Removal of Punctuation i.e. “,.” “:;”


– Removal of Numbers.
– Removal of the re-duplicated tweets.
– Using Lowercase; lowercasing all the text data.
i.e. CANADA = canada
– Removal of Stop words, to remove all the commonly used words in English
language.
i.e. the movie was great = movie great.
– Using Lemmatization; to return all words to their roots.
i.e. Troubles = Trouble
– Using Bag-of-words; which is a model for classification, where the frequencies are
utilized as a feature for training a classifier.
– Using Tokenization; to break up the strings into pieces such as phrases, words,
symbols, keywords and other elements called “Tokens”.
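The sketch below illustrates these pre-processing steps with NLTK; the exact cleaning rules and stop-word list used by the authors may differ slightly.

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text):
    text = text.lower()                                    # lowercasing
    text = re.sub(r"[^a-z#\s]", " ", text)                 # drop digits and punctuation, keep hashtags
    tokens = text.split()                                  # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop-word removal
    return [lemmatizer.lemmatize(t) for t in tokens]       # lemmatization

print(preprocess("The movie was GREAT!!! Troubles ahead #exams"))
# -> ['movie', 'great', 'trouble', 'ahead', '#exams']
```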

3.3 Feature Extraction Phase


This step analyzes the tweets in order to determine certain features. Each tweet is represented by a vector that can be understood by the classifier. Two techniques are used (see the sketch after this list):
– Count Vectorizer (CV): a simple way to build a vocabulary of known words, tokenize a series of text documents, and encode new documents using that vocabulary.
– TFIDF: learns the vocabulary and the inverse document frequency weightings, tokenizes documents, and allows new documents to be encoded [13].
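A minimal scikit-learn sketch of the two options, applied to a few toy cleaned posts (the documents here are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["movie great", "movie terrible", "great great plot"]   # toy cleaned posts

cv = CountVectorizer()
X_counts = cv.fit_transform(docs)      # term-frequency (bag-of-words) matrix
print(cv.get_feature_names_out())      # learned vocabulary

tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(docs)    # TF-IDF weighted matrix
print(X_tfidf.shape)
```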

3.4 Sentiment Analysis and the Classification Phase


Sentiment Analysis (SA) is the process of analyzing the posts to detect the polarity, i.e., whether it is positive, negative, or neutral.
In order to classify the tweets, different techniques have been applied, such as XGBoost, the Stochastic Gradient Descent (SGD) classifier, Decision Tree, K-nearest Neighbors (KNN), Naïve Bayes, Logistic Regression (LR), and Random Forest (RF); a training sketch is given below.
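A toy scikit-learn sketch of this classification step follows, using the paper's 75%/25% train/test split. The documents and labels are invented for illustration, XGBoost is omitted because it requires the separate xgboost package, and n_neighbors is reduced so that KNN can run on the tiny toy set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

# Invented toy posts and MBTI labels; in the real model these come from the cleaned dataset.
docs = ["movie great fun", "boring movie", "great plot", "terrible boring", "fun great", "awful plot"]
y = ["INFP", "ISTJ", "INFP", "ISTJ", "INFP", "ISTJ"]

X = TfidfVectorizer().fit_transform(docs)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

models = {
    "SGD": SGDClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```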

3.5 Algorithm Stages


The model is implemented through the use of Python in Anaconda Software and
Kaggle Cloud. As shown in Fig. 1, the algorithm consists of six stages:
– Importing the dataset.
– Converting emoticons into text.
– Pre-processing or Cleansing and creating a tidy dataset.
– Feature Extraction.

– Sentiment Analysis through detecting the polarity of each post.


– Predicting Personality to understand the behavior or language style of each person through the MBTI personality assessment; for example, to help the marketplace if a company wants to know the feedback of its customers on a product or service.
The following parameters have been used to evaluate the performance of the proposed model (a small helper computing them is sketched after the definitions):
– Accuracy: the ratio of correctly predicted observations (true positive and true negative) to the total number of observations.
Accuracy = (TP + TN)/(TP + FP + FN + TN)
– Recall (Sensitivity): the ratio of true positive observations to all true positive and false negative observations.
Recall = TP/(TP + FN)
– Specificity: the ratio of true negative observations to all true negative and false positive observations.
Specificity = TN/(TN + FP)
– Precision: the ratio of true positive observations to the total number of true positive and false positive observations.
Precision = TP/(TP + FP)
Whereas:
– True Positive (TP): Is the result where the approach correctly predicts the positive
instances.
– True Negative (TN): Is the result where the approach correctly predicts the negative
instances.
– False Positive (FP): Is the result where the approach incorrectly predicts the positive
instances.
– False Negative (FN): Is the result where the approach incorrectly predicts the
negative instances [14].
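A small helper computing these measures from the four counts is sketched below; the counts in the example call are hypothetical.

```python
def evaluation_metrics(tp, tn, fp, fn):
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "recall":      tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
    }

print(evaluation_metrics(tp=80, tn=50, fp=20, fn=10))
# {'accuracy': 0.8125, 'recall': 0.888..., 'specificity': 0.714..., 'precision': 0.8}
```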

4 Experimental Results and Discussion

The dataset used consists of two columns: The type of personality and the last 50
tweets separated by (3 pipe characters) “|||” between each tweet. Accordingly, different
classifiers such as XGBoost, SGD, Random Forest (RF), Logistic Regression (LR), KNN, Naïve Bayes, and Decision Tree have been used in order to predict the personality type using the MBTI personality assessment, taking into consideration the emoticons expressed in each tweet by converting them into text through a dictionary of emoticons and their corresponding texts. The system is tested using 25% of the dataset and the
remaining 75% is for the training set.

Table 1. Classification of different classifiers.


Classifiers Sensitivity% Specificity% Precision%
XGBoost Classifier 87.27% 30.49% 66.81%
Stochastic Gradient Descent (SGD) 82.26% 40.3% 68.85%
Random Forest (RF) 92.29% 18% 64.35%
Logistic Regression (LR) 84.58% 38.6% 68.86%
KNN 69.23% 43.5% 66.30%
Naïve Bayes 63.39% 54.5% 69%
Decision Tree Classifier 62.35% 42.7% 63.58%

The results show that for the Sensitivity, Random Forest has the highest percentage
with 92.29%, followed by XGBoost classifier with 87.27%. Finally, Naïve Bayes has
the highest specificity with 54.50% and precision with 69%.

Fig. 2. The accuracy rate of different classifiers: XGBoost 76.19%, Stochastic Gradient Descent (SGD) 72.81%, Random Forest (RF) 100%, Logistic Regression (LR) 72.41%, KNN 78.20%, Naïve Bayes 66.45%, Decision Tree 100%.

The performance was examined with the use of different classifiers. The results
show that the Decision Tree and Random Forest (RF) have the highest accuracy with
100%, followed by KNN with 78.2%. After that, XGBoost achieved 76.19%, and SGD reported the fourth-best performance with a 72.81% accuracy rate. Finally, Naïve Bayes reports the lowest accuracy.

5 Conclusion

This paper presents an approach to classify the sentiment of tweets whether they are
positive, negative, or neutral and predicts the type of personality based on MBTI
personality assessment along with a comparison between different machine learning
classifier results in order to understand the behaviour of users, which will assist the
organization; for example, to know the feedback on a product or a service. According
to Table 1 and Fig. 2, it is found that the KNN classifier has an average accuracy of 78.2%, SGD has a sensitivity of 82.26%, and XGBoost and KNN have precisions of 66.81% and 66.30%, respectively. The experimental results show
that the XGBoost classifier has improved the model performance and speed.
Further work aims to increase the amount of data and to use deep learning techniques.

References
1. Bayu, P., Riyanarto, S.: Personality classification based on Twitter text using Naive Bayes,
KNN and SVM. In: Proceedings of International Conference on Data and Software
Engineering (ICoDSE), Yogyakarta, Indonesia (2015)
2. http://medium.com/retailmenot-engineering/sentiment-analysis-series-1-15-min-reading-b80
7db860917. Accessed 21 Sept 2018
3. Alàa, G., Mariam, G., Abdel fatah, H.: A survey for sentiment analysis and personality
prediction for text analysis. In: The First World Conference on Internet of Things:
Applications & Future (ITAF 2019), Springer, Singapore, April 2020
4. http://britannica.com/story/whats-the-difference-between-emoji-andemoticons. Accessed 22
Dec 2019
5. Nikolaos, S., Isidoros, P., Iosif, M., Michael, P.: Sentiment analysis of teachers using social
information in educational platform environments. Int. J. Artif. Intell. Tools 29(2), 1–28 (2020)
6. Murugan, A., Chelsey, H., Thomas, N.: Modeling text sentiment: learning and lexicon
models. Adv. Anal. Data Sci. 2, 151–164 (2018)
7. Matej, G., Jan, Š.: Reddit: a gold mine for personality prediction. In: Proceedings of the
Second Workshop on Computational Modeling of People’s Opinions, Personality, and
Emotions in Social Media New Orleans, Louisiana, pp. 87–97, June 2018
8. Ali, H., Sana, M., Ahmad, K., Shahaboddin, S.: Machine learning-based sentiment analysis
for Twitter accounts. Math. Comput. Appl. 23(11), 1–15 (2018)
9. Youngsub, H., Kwangmi, K.: Sentiment analysis on social media using morphological
sentence pattern model. In: 15th International Conference on Software Engineering
Research, Management and Applications (SERA), London, UK (2017)
10. Nor, N., Zurinahni, Z., Tan, Y.: A comparative study of different classifiers for automatic
personality prediction. In: 6th IEEE International Conference on Control System, Computing
and Engineering, Penang, Malaysia, November 2016
11. (MBTI) Myers-Briggs Personality Type Dataset | Kaggle. Kaggle.com. https://www.kaggle.
com/datasnaek/mbti-type. Accessed 9 Sept 2018
12. Wikipedia. https://en.wikipedia.org/wiki/Data_pre-processing. Accessed 22 Dec 2019
13. Machine learning mastery. https://machinelearningmastery.com/prepare-text-data-machine-
learning-scikit-learn/. Accessed 23 Dec 2019
14. Machine learning crash course. https://developers.google.com/machine-learning/crash-
course/classification/true-false-positive-negative. Accessed 27 May 2020
Article Reading Sequencing for English
Terminology Learning in Professional Courses

Tien-Wen Sung1, Qingjun Fang2(&), You-Te Lu3, and Xiaohui Hu1


1 College of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
tienwen.sung@gmail.com, 2191905070@smail.fjut.edu.cn
2 College of Civil Engineering, Fujian University of Technology, Fuzhou, China
89100941@qq.com
3 Department of Information and Communication, Southern Taiwan University of Science and Technology, Tainan, Taiwan
yowder@stust.edu.tw

Abstract. Reading is one of the key methods to learn English, especially for
the EFL (English as Foreign Language) students. For the purpose of assisting
students in learning the English terminology of a professional (technical) course,
this study aims at finding reading sequences among pre-collected articles and
determining one ending article for each reading sequence. The topics or contents
described in the articles are all highly related to the specific professional course.
The determination of the ending article depends on the relations among the
articles. The relation is defined by professional terminology overlap between
articles. Floyd-Warshall algorithm and clustering concepts are utilized in the
recommendation of the ending article and reading sequences. Brief simulation
results of the algorithm are also illustrated in the paper.

Keywords: Reading sequencing · Terminology · English learning · Floyd-Warshall algorithm · Clustering

1 Introduction

With the rapid development of science and technology, professional knowledge


changes with each passing day, among which quite a lot of professional information
and knowledge are written and published in English. In order to strengthen ESL
(English as Second Language) or EFL (English as Foreign Language) [1] students’
acquisition and absorption of professional knowledge not covered in the traditional
teaching materials, improve students’ reading and understanding ability of professional
English literature, and meet the requirements of enterprise development, it is necessary
to adjust the current teaching mechanism and personnel training mode for the major
courses by enhancing the reading and learning of English information and knowledge
literatures, cultivating students’ ability and habit of English reading, and improving
their self-competitiveness. With the development and popularization of the Internet and
information technology, it is necessary to add the assistance of network and infor-
mation technology into learning activities, and properly combine the technology and


good learning guidance, which can not only make students more convenient to
understand and obtain professional knowledge written in English, but also strengthen
students’ English ability. All the English word meaning, word usage, sentence com-
position, sentence meaning, article structure and professional knowledge can be further
learned and understood [2]. Accordingly, this study combines the extracurricular
learning tasks to a professional course and proposes an article reading sequencing
mechanism for students to read English professional articles in a better sequence. The
reading guidance is based on a collection of English literatures of the professional
course. After the analysis of terminology of the articles, an appropriate reading
sequence will be obtained and suggested to students to obtain a better learning effi-
ciency of English terminology and reading ability of the professional course.

2 Related Works

Information and communication technology (ICT) has been widely used in modern
learning platforms and environments, especially utilized in English language learning
[3]. There are different types of technology introduced to improve English learning. For
instance, social networking services (SNS) can be used in second and foreign lan-
guage teaching and learning [4, 5]. Gamification is also a good model for English
teaching and learning [6]. Virtual Reality (VR) technology has become an innovative
multimedia-based language learning assistance [7]. Mobile devices and wireless
communication technology are employed for mobile or ubiquitous English learning [8],
and location-based services (LBS) can provide good applications in this kind of
learning environments [9]. Moreover, some related works carried out the research on
reading activity in English learning [9, 10]. In this kind of learning approaches, arti-
ficial intelligence (AI) algorithms can be utilized to provide the functions of classifying
learning contents, recommending appropriate articles to read, giving optimal reading
sequences to learners, and etc. This paper also focuses on the reading sequence rec-
ommendation and employs Floyd-Warshall algorithm [11] and the concept of clus-
tering to find reading sequences among collected articles. The articles are used to the
improvement of acquiring and learning English terminology of a professional course
for EFL learners.

3 Article Reading Sequencing

The basic method of improving the English terminology learning of a professional


course for the EFL students is to read a series of English articles containing related
contents about the course. An assistance reading/learning tool or platform is also
essential for an information technology enhanced learning environment, as shown in
Fig. 1. The system provides the functions such as translation, learning portfolio
recording, etc. This paper focuses on the approach of article reading sequencing and does not describe the reading assistance system in detail. Before the system gives a recommendation of an article reading sequence, a sufficient number of articles containing the professional contents should be collected in advance. The sources could be Wiki

pages, book chapters, magazine articles, etc. Moreover, a set of terminology terms
related to the professional course should also be defined in advance.

Fig. 1. A basic learning system

3.1 Article Relation


The basic rule of article sequencing used in this study is that two articles with the most terminology overlap are preferred to be adjacent in the sequence. This strengthens the memory of terminology terms and their corresponding knowledge read in the articles. The terminology overlap is treated as the relation between two articles. Let A = {a_1, a_2, ..., a_n} be the set of collected articles containing related contents of the professional course, where n is the number of articles. Let T = {t_1, t_2, ..., t_m} be the set of terminology terms related to the course, and m is the number of terms. The article relation between a_i and a_j is denoted by r_ij and defined as:
$$
r_{ij} = \frac{1}{2}\,\bigl|T_i \cap T_j \cap T\bigr|\left(\frac{1}{|T_i \cap T|} + \frac{1}{|T_j \cap T|}\right) \tag{1}
$$

where T_i and T_j represent the sets of terminology terms that appear in the articles a_i and a_j, respectively. If r_ij is a low value, it means that there is little relation between the articles a_i and a_j. In that case, it is not appropriate to read a_j immediately after reading a_i. A possible reading sequence could be a_i → a_k → a_j if the values of r_ik and r_kj are high enough. This study defines a threshold value, denoted by θ_R. If r_ij ≥ θ_R, it indicates that the relation between the articles a_i and a_j is high. Otherwise, it is low, and one or more intermediate articles need to be read after a_i and before a_j. To satisfy reading sequence planning for the cases of a specified ending article of reading, for any article

there must be at least one other article whose relation with it is higher than or equal to θ_R, as shown:

$$
\forall a_i \in A,\ \exists a_j \in A \ \text{such that}\ r_{ij} \ge \theta_R \tag{2}
$$
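A small Python sketch of the relation measure in Eq. (1) is given below; the articles are represented simply by the sets of terminology terms found in them, and the example term sets and the threshold value are hypothetical.

```python
def relation(Ti, Tj, T):
    """r_ij of Eq. (1): Ti, Tj are the term sets of two articles, T is the course terminology set."""
    Ti, Tj = Ti & T, Tj & T            # keep only course terminology
    if not Ti or not Tj:
        return 0.0
    shared = len(Ti & Tj)              # |Ti ∩ Tj ∩ T|
    return 0.5 * shared * (1 / len(Ti) + 1 / len(Tj))

T  = {"node", "cluster", "routing", "address", "sensor"}   # course terminology
T1 = {"node", "cluster", "routing", "python"}              # terms appearing in article a1
T2 = {"node", "routing", "address"}                        # terms appearing in article a2
r12 = relation(T1, T2, T)
print(r12, r12 >= 0.5)   # 0.666..., True (0.5 is an arbitrary example threshold for theta_R)
```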

3.2 Article Sequencing


To find a reading sequence with the article collection, a clustering operation can be
performed as the next step. Each article can be treated as a node in the cluster, and a
specified ending article of reading can be chosen as the cluster head. This cluster is a
little different from the one-hop cluster (a star topology). There should be a one-hop or
multi-hop path from an article (a node) to the ending article (cluster head). Each
connection between two articles in the path indicates a relation of higher than or equal
to θ_R between the two directly connected articles.
Firstly, a 2-dimensional matrix D[i,j]_{n×n} is defined as follows; it converts the concept of relation into a distance between any article pair a_i and a_j. A higher relation implies a closer distance between the articles a_i and a_j.

$$
D[i,j]_{n\times n} =
\begin{cases}
1 - r_{ij}, & i \ne j,\ r_{ij} \ge \theta_R\\
0, & i = j\\
\infty, & \text{otherwise}
\end{cases}
\tag{3}
$$

By utilizing the matrix D[i,j]_{n×n} and the concept of the Floyd-Warshall algorithm, a sequence matrix S[i,j]_{n×n} can be obtained. The matrix S[i,j]_{n×n} represents the next article on the way from article a_i to a_j. Before the Floyd-Warshall algorithm is performed, S[i,j]_{n×n} is initialized as:

$$
S[i,j]_{n\times n} =
\begin{cases}
j, & i \ne j,\ r_{ij} \ge \theta_R\\
\text{null}, & \text{otherwise}
\end{cases}
\tag{4}
$$

Algorithm 1. Determine the article reading sequence from article a_i to a_j

/* initialization */
for each (a_i, a_j) where 1 ≤ i, j ≤ n and i ≠ j
    if r_ij ≥ θ_R
        S[i, j] = j
    end if
end for

/* Floyd-Warshall algorithm */
for k = 1 to n
    for each (a_i, a_j) where 1 ≤ i, j ≤ n
        if D[i, j] > D[i, k] + D[k, j]
            D[i, j] = D[i, k] + D[k, j]
            S[i, j] = S[i, k]
        end if
    end for
end for

After the Floyd-Warshall algorithm is performed with the matrix D[i,j]_{n×n} and the initialized S[i,j]_{n×n}, the final result of S[i,j]_{n×n} indicates a sequence (path) from article a_i to a_j, hop by hop. The procedure is shown as Algorithm 1. For example, if a student reads a_i as the first article and will read a_j as the ending article, the next article after a_i is a_{S[i,j]}; if S[i,j] is k, the next article after a_k is a_{S[k,j]}.
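A runnable sketch of Algorithm 1 together with the path reconstruction just described is shown below (0-based indices; the toy distance matrix is hypothetical, with ∞ marking article pairs whose relation is below θ_R).

```python
INF = float("inf")

def floyd_warshall(D):
    """D: n x n distance matrix of Eq. (3). Returns shortest distances and the next-article matrix S."""
    n = len(D)
    S = [[j if i != j and D[i][j] < INF else None for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
                    S[i][j] = S[i][k]
    return D, S

def reading_sequence(S, i, j):
    """Indices of the articles to read, in order, from article i to the ending article j."""
    if i != j and S[i][j] is None:
        return None
    path = [i]
    while i != j:
        i = S[i][j]
        path.append(i)
    return path

# Hypothetical 4-article example: D[i][j] = 1 - r_ij when r_ij >= theta_R, INF otherwise.
D = [[0, 0.4, INF, INF],
     [0.4, 0, 0.3, INF],
     [INF, 0.3, 0, 0.5],
     [INF, INF, 0.5, 0]]
D, S = floyd_warshall(D)
print(reading_sequence(S, 0, 3))   # -> [0, 1, 2, 3]
```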

3.3 Ending Article


In the cases of that article reading for professional terminology learning is an
extracurricular learning activity, the beginning and ending articles to read could be
unspecified. The number of articles asked to read could be also unspecified. Students
can freely read articles selected by themselves for the first read from the provided
article collection. In this kind of reading activity, the simple clustering method can be
modified to provide a recommendation of ending article with an optimal reading
sequence. This article will be an optimal ending article for any beginning article
selected. The optimal reading sequence will be obtained by giving the shortest multi-
hop path from the beginning article to the ending article. The shortest path implies a
better correlation among the articles on the path. The average length of all the paths
from every article ai 2 A to an recommended ending article b 2 A is defined as gðbÞ,
shown in Eq. (5).

$$
g(b) = \frac{1}{n}\sum_{i=1}^{n} D[a_i, b] \tag{5}
$$

To find an optimal ending article from the article collection (the article set A), the learning system computes the value of g(a_i) for each a_i ∈ A. The article a_END ∈ A for which g(a_END) is minimum is taken as the optimal ending article for reading:

$$
a_{END} = \arg\min_{1 \le i \le n} g(a_i) \tag{6}
$$



Algorithm 2 performs clustering on the collected articles. It determines an optimal ending article for reading when the beginning article is not specified. In other words, no matter which article students select as their first reading article, they will read the ending article at the end. However, the number of articles read by each student will be different; it depends on the length of the reading sequence (path). Once the article collection changes, the result of clustering as well as each path will also change.
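Under the same conventions, the ending-article recommendation of Eqs. (5) and (6) can be sketched as follows; the all-pairs distance matrix is a hypothetical example and would normally come from the Floyd-Warshall step above.

```python
def average_path_length(D, b):
    """g(b) of Eq. (5): average shortest distance from every article to candidate ending article b."""
    return sum(row[b] for row in D) / len(D)

def recommend_ending_article(D):
    """Eq. (6): the article index minimizing g(b)."""
    return min(range(len(D)), key=lambda b: average_path_length(D, b))

# Hypothetical shortest-distance matrix for 4 articles (already processed by Floyd-Warshall).
D = [[0, 0.3, 0.6, 0.5],
     [0.3, 0, 0.3, 0.2],
     [0.6, 0.3, 0, 0.5],
     [0.5, 0.2, 0.5, 0]]
print(recommend_ending_article(D))   # -> 1, i.e., article a2 is recommended as the ending article
```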

4 Simulation

This study uses simulations to validate the algorithm of article reading sequencing.
Numbers of total articles are 100, 200, and 300, respectively. The ending article to read
is acquired and recommended by the algorithm. The simulation results can show the
reading sequence for every freely selected beginning article. Figure 2 shows the article
clustering results. The hollow squares represent the articles and the solid square rep-
resents an ending article. Each connection between two articles indicates their distance
and implies their terminology relation. In Fig. 2(a) and Fig. 2(b), the cases of total
number of 100 and 200 articles are shown respectively. Each path shown in the figure
represents an article reading sequence.
Figure 3 shows the case of total number of 300 articles. In Fig. 3(a), it also shows
the reading path from each beginning article to the ending article acquired and rec-
ommended by the algorithm. In Fig. 3(b), it shows a special case that there are two
clusters. Each cluster has a cluster head (an ending article). This can be done by
modifying the clustering algorithm to have two initial cluster heads. After the algorithm
is performed, new (final) cluster heads will be acquired. Each article will have a path to
one of the two cluster heads, depending on the distance (relation) among articles. In
other words, reading from one beginning article will reach one of the two ending

Fig. 2. The reading sequence. (a) Number of articles = 100; (b) Number of articles = 200.

articles. In the cases of large number of articles, multi-cluster clustering is a good


approach to classify the articles into different topics.

Fig. 3. The reading sequence. (a) 300 articles, 1 cluster; (b) 300 articles, 2 clusters.

5 Conclusion

This paper proposes a method which utilizes the Floyd-Warshall algorithm and clus-
tering concepts to find the reading paths among a set of pre-collected English professional (technical) articles. This study aims to assist EFL students in learning the English terminology
terms of a professional course. The calculation of reading sequences depends on the
relationship strengths among the articles. In this study, the relation between articles is
defined as a key parameter to find the reading sequence and the ending article. The
algorithms and the brief simulation results of 100, 200, and 300 articles divided into 1 or 2
clusters are shown in the paper. In future works, an additional parameter can be
introduced into the algorithm to control the length of reading path. That is to specify a
number of articles to read in the learning activity. This can solve the problem of

different students reading different numbers of articles due to the selection of different beginning articles. More results of statistics and questionnaires could also be presented
in future.

Acknowledgement. This work is supported by the Research Project of University Teaching


Reformation granted by Education Department of Fujian Province (Project Number:
FBJG20170130).

References
1. Mousavian, S., Siahpoosh, H.: The effects of vocabulary pre-teaching and pre-questioning
on intermediate Iranian EFL learners' reading comprehension ability. Int. J. Appl. Linguist.
Engl. Lit. 7(2), 58–63 (2018)
2. Wu, T.T., Sung, T.W., Huang, Y.M., Yang, C.S., Yang, J.T.: Ubiquitous English learning
system with dynamic personalized guidance of learning portfolio. Educ. Technol. Soc. 14(4),
164–180 (2011)
3. Ahmadi, D., Reza, M.: The use of technology in English language learning: a literature
review. Int. J. Res. Engl. Educ. 3(2), 115–125 (2018)
4. Reinhardt, J.: Social media in second and foreign language teaching and learning: blogs,
wikis, and social networking. Lang. Teach. 52(1), 1–39 (2019)
5. Andujar, A., Cakmak, F.: Foreign language learning through instagram: a flipped learning
approach. In: New Technological Applications for Foreign and Second Language Learning
and Teaching, pp. 135–156. IGI Global (2020)
6. Pujolà, J.T., Appel, C.: Gamification for technology-enhanced language teaching and
learning. In: New Technological Applications for Foreign and Second Language Learning
and Teaching, pp. 93–111. IGI Global (2020)
7. Pinto, R.D., Peixoto, B., Krassmann, A., Melo, M., Cabral, L., Bessa, M.: Virtual reality in
education: learning a foreign language. In: Proceedings of the World Conference on
Information Systems and Technologies (WorldCIST), Galicia, Spain, 16–19 April 2019,
pp. 589–597 (2019)
8. Elaish, M.M., Shuib, L., Ghani, N.A., Yadegaridehkordi, E., Alaa, M.: Mobile learning for
English language acquisition: taxonomy, challenges, and recommendations. IEEE Access 5,
19033–19047 (2017)
9. Wu, T.T., Sung, T.W., Huang, Y.M., Yang, C.S.: Location awareness mobile situated
English reading learning system. J. Internet Technol. 11(7), 923–934 (2010)
10. Sung, T.W., Wu, T.T.: Dynamic e-book guidance system for English reading with learning
portfolio analysis. Electron. Libr. 35(2), 358–373 (2017)
11. Rosen, K.H.: Discrete Mathematics and Its Applications, 8th edn. McGraw-Hill, New York
(2019)
Egyptian Student Sentiment Analysis Using
Word2vec During the Coronavirus (Covid-19)
Pandemic

Lamiaa Mostafa(&)

Business Information System Department, Arab Academy for Science and Technology and Maritime Transport, Alexandria, Egypt
Lamiaa.mostafa31@aast.edu, Lamiaa.mostafa31@gmail.com

Abstract. The education field has been affected by the COVID-19 pandemic, which also changes how universities, schools, companies, and communities function. One area that has been significantly affected is education at all levels, both undergraduate and graduate. The COVID-19 pandemic puts pressure on the psychological status of students, since their learning environment has changed. The e-learning process relies on electronic means of communication and online support communities; social networking sites, in turn, help students manage their emotional and social needs during the pandemic period and allow them to express their opinions without restrictions. This paper proposes a sentiment analysis model that analyzes students' sentiments about the learning process during the pandemic using the Word2vec technique and machine learning techniques. The model starts by preprocessing the students' sentiments, selects features through word embedding, and then uses three machine learning classifiers: Naïve Bayes, SVM, and Decision Tree. The precision, recall, and accuracy of all these classifiers are reported in this paper. The paper helps in understanding Egyptian students' opinions on the learning process during the COVID-19 pandemic.

Keywords: Sentiment analysis · Coronavirus · COVID-19 · Word2Vec · Learning · Pandemic

1 Introduction

Due to the spread of COVID-19 and the resulting global pandemic, e-learning techniques are emerging all over the world. Educational institutions should manage the stress and provide a healthy learning environment [1, 2]. The author of [3] described the shift of learning technology from 'nice to have' to 'mission-critical' for the educational process; he also divided the educational process into two vital parts, technology and learning design, and emphasized that educational institutions must develop and sustain their learning content to provide an effective teaching and learning process.
Student satisfaction is very important for educational institutions. Teachers can understand students through their feedback [18]. Students can express their opinions in the following ways: classroom feedback, clickers, mobile phones, and social


media such as Facebook and Twitter. Social media feedback, however, raises problems of its own, which are discussed in Sect. 2. Sentiment classification is used to understand the feeling expressed in a written piece of text by classifying it as positive, negative, or neutral [4–7]. Data fed to a classifier must be cleaned and represented accurately. In the sentiment classification process, a feature vector is used as the representation of the data to work on. There are different types of feature vectors; two well-known techniques are Bag of Words (BOW) and word embedding. Researchers in [5] used a combination of Word2vec and Bag-of-Centroids feature vectors for the sentiment classification of online mobile consumer reviews and tested the results using different machine learning classifiers; the proposed feature vector performed well in comparison with the plain Word2vec feature vector. The rest of this paper is organized as follows: sentiment analysis is reviewed in Sect. 2, word embedding is described in Sect. 3, the Student Sentiment Analysis Model is presented in Sect. 4, its results are analyzed in Sect. 5, and Sect. 6 concludes the paper and outlines future work.

2 Sentiment Analysis

Students use social media to express their feelings. This raises several problems: teachers have to read all the feedback, which is time-consuming; a database that holds all the text must be maintained; and students' addiction level increases due to the high usage of social media. Authors in [18] analyzed student emotions in learning using Long Short-Term Memory (LSTM) networks and concluded that fusing multiple layers with LSTM improves the results over a common Natural Language Processing (NLP) method.
Researchers divide the sentiment analysis process into the following stages: data acquisition, data preparation, review analysis, and sentiment classification [13–16]. Data acquisition is the process of identifying the source of the sentiment text; data preparation includes removing irrelevant terms; review analysis uses techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BOW), and Word2vec; and finally the classification stage depends on machine learning techniques such as Naïve Bayes (NB) and Support Vector Machine (SVM).
Authors in [8] conducted sentiment analysis on 400,000 Amazon product reviews using various NLP techniques and classification algorithms. They classified the reviews through the following steps: preprocessing the reviews to obtain clean text, after which word embedding is used to convert words into numerical representations. The classifiers used are Naïve Bayes, logistic regression, and the random forest algorithm. The accuracies of all classifiers are compared, which helps companies understand their customers' opinions.
Authors in [9] used a bag of n-grams for feature representation together with mutual information for feature selection; their Naive Bayes classifier achieved an accuracy of 88.80% on the IMDB movie reviews dataset.
Authors in [4] proposed a sequential approach that uses automatic machine learning to improve the quality of natural language processing development. Gamification techniques are used in different fields; the author of [15] proposed a sentiment analysis classifier that analyzes the sentiments of students when using gamification tools in an educational course. The results showed that the Naïve Bayes classifier achieved the best accuracy; also, in a test on 1000 students, the group that agreed with using gamification in learning showed better results than the group that disagreed, which indicates that gamification enhances student performance in learning.
Student sentiments have also been collected to understand how to prioritize the appropriate service facilities, optimizing facility output in order to increase student satisfaction and decrease building life-cycle costs. Sentiments were collected from 100 students, term frequency was used for feature extraction, and SVM and NB classifiers were applied. The most important factor affecting student satisfaction was the communication between the university management and the students [17].
Two techniques are used for sentiment classification: machine learning and lexicon-based approaches [14, 17]. Machine learning uses traditional mining algorithms such as Naïve Bayes [14–17], Support Vector Machines, and Neural Networks (NN). The following section explores some of the research that implements word embedding.

3 Word Embedding

Word embedding is the process of capturing the relationships between words by mapping them to vectors, in which words with a similar context are usually mapped to similar vector representations. Word embeddings are fixed-dimension vector representations of word types. Examples of word embedding techniques are Latent Semantic Indexing, Random Indexing, and Word2Vec [5]; the most popular word embedding model today is Skip-gram [23]. Word embedding is used in sentiment analysis [10–13]. Different dimensions of Word2vec were used in [10], showing improvement compared to Bag of Words. Word2Vec was used to cluster product features, and the results, compared with a TF-IDF feature vector, were improved [11]. Authors in [12] used a combination of feature extraction methods, namely Bag of Words, TF-IDF, and Word2vec, with logistic regression and random forest classifiers. They concluded that Word2vec, together with the random forest classifier, is a better method for feature extraction in text classification. Word2vec is also more efficient than traditional feature vectors such as Bag-of-Words and term-based feature vectors, according to [13]. Authors in [19] investigated how a single word can affect the sentiment context.
Authors in [20] trained their systems using a combination of word embeddings; the focus was to validate the quality of the system using precision and recall analysis. Movie reviews were classified and labeled into five values (negative, somewhat negative, neutral, somewhat positive, positive), which were combined with word embeddings for polarity detection of movie reviews [21]. An investigation of the adoption of word embedding techniques in a content-based recommendation scenario was presented in [22]: a content-based recommendation framework was built from Wikipedia to learn user profiles based on such word embeddings, and the authors concluded that the algorithms showed results comparable to Collaborative Filtering and Matrix Factorization, especially in high-sparsity recommendation scenarios.

4 Student Sentiment Analysis Model

The Student Sentiment Analysis Model is used to analyze students' opinions of the learning process during the pandemic. The model passes through different stages, including data collection, text processing, feature selection, and classification. Figure 1 shows the three components of the model.

Fig. 1. Student Sentiment Analysis model.

4.1 Data Collection and Text Pre-processing


Sentiments were collected in English, using Google Sheets, from 1000 students taking a business course in the College of Management and Technology at the Arab Academy for Science and Technology and Maritime Transport (AAST). 700 students rejected the e-learning process and described its problems, while 300 students were interested in it. Table 1 classifies the reported problems of the e-learning process; the largest number of sentiments focuses on the lack of communication between the student and the teacher.

Table 1. Classification of sentiments describing the problems of e-learning

Problem of e-learning                          | # of sentiments
Internet connection                            | 117
Online lecture voice latency                   | 173
Online examination limited time                | 164
Lack of direct communication with the teacher  | 246

The second phase of the proposed sentiment model is text processing, which cleans the text of unneeded words [14–16]. Preprocessing steps include punctuation removal, case conversion, stop-word removal, and Porter stemming [16].
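For illustration, the following is a minimal sketch of such a preprocessing pipeline in Python; the paper itself performs these steps with KNIME nodes, so the NLTK-based code below is only an assumed equivalent, and the example sentence is invented.

import string
from nltk.corpus import stopwords        # requires nltk.download('stopwords')
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(review):
    """Clean one student sentiment and return its list of stemmed tokens."""
    review = review.translate(str.maketrans("", "", string.punctuation))  # punctuation eraser
    tokens = word_tokenize(review.lower())                                # case converter
    tokens = [t for t in tokens if t not in STOP_WORDS]                   # stop-word removal
    return [STEMMER.stem(t) for t in tokens]                              # Porter stemmer

print(preprocess("The online examination time was NOT enough for us!"))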

4.2 Features Selection


Text is represented using a collection of features. Feature selection aims to select keywords that represent the text's meaning while removing irrelevant features. Examples of feature extraction methods are Document Frequency (DF), Information Gain (IG), Mutual Information (MI) [16], and Word2Vec [5, 22].
Word2vec is a two-layer neural network technique that produces word vectors from a large amount of input text by observing the contexts in which the input words appear [22]. Every word is assigned a vector in the Word2vec space. The Word2vec algorithm places words with a similar context in close proximity to one another in the word space. Word2vec has two approaches: Skip-gram and Continuous Bag of Words (CBOW); Fig. 2 describes how each technique works, where w(t) is the target (input) word.

Fig. 2. CBOW and Skip-gram Techniques [24].

Skip-gram is the inverse of CBOW: in the Continuous Bag of Words (CBOW) architecture, the center (target) word is predicted from the neighboring words [24], whereas in the Skip-gram model the nearby context words are predicted from the center word [23]; Skip-gram is more efficient when the corpus is small. Skip-gram proceeds through the following steps: data preparation, setting the hyper-parameters, generating the training data, model training, and inference. Data preparation defines the corpus and cleans the required text; the second step builds the vocabulary; the third step encodes the words, calculates the error, and adjusts the weights; and the last step obtains the word vectors and finds similar words. The training complexity of Skip-gram is given by the following equation [25]:

Q = C × (D + D × log₂ V)     (1)

where C is the maximum context window size, D is the dimensionality of the word vectors, and V is the vocabulary size. The Student Sentiment Analysis model uses Document Frequency and Skip-gram.
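As a concrete illustration of the Skip-gram setting described above, the sketch below trains a small Word2vec model with gensim; the library choice, the hyper-parameter values, and the toy corpus are assumptions for illustration, not the paper's actual configuration.

from gensim.models import Word2Vec  # gensim 4.x API assumed

corpus = [
    ["online", "lecture", "voice", "latency", "is", "frustrating"],
    ["no", "direct", "communication", "with", "the", "teacher"],
    ["the", "internet", "connection", "keeps", "dropping"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,  # dimensionality D of the word vectors
    window=5,         # context size C
    sg=1,             # 1 = Skip-gram, 0 = CBOW
    min_count=1,      # keep every word in this toy corpus
    epochs=50,
)

vector = model.wv["teacher"]                # learned word vector
similar = model.wv.most_similar("teacher")  # nearest words in the embedding space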

4.3 Machine Learning Classifiers


After the features are selected, the classifiers work on them. Statistical methods and machine learning classifiers are usually used in sentiment classification, including Multivariate Regression Models, Decision Trees, Neural Networks, Support Vector Machines, the Concept Vector Space Model (CVSM), and Naïve Bayes [26]. The Student Sentiment Analysis model uses KNIME [27] for the preprocessing, keyword extraction, and classification processes. Each classifier is described below.
• A Naïve Bayes classifier depends on Bayes' theorem, based on the following equation [16]:

p_NB(c|d) = [ p(c) ∏_{i=1}^{m} p(f_i|c)^{n_i(d)} ] / p(d)     (2)

where c is the class, d is the document, m is the number of features, f_i is the i-th feature, and n_i(d) is the number of times f_i occurs in d.
• A Support Vector Machine separates positive and negative samples with the best separating surface; SVM solves the following optimization problem [16]:

min_{w,b,ξ} (1/2) wᵀw + C ∑_{i=1}^{l} ξ_i     (3)

where w is the weight vector, b is the bias, C is the penalty parameter, ξ_i is the slack variable of the i-th training vector, and l is the number of training vectors.
• A Decision Tree is a k-ary tree in which each node tests a specific attribute; the tree is built using the following information measure [16]:

info(D) = −∑_{i=1}^{m} p_i log₂(p_i)     (4)

where D is the data set, m is the number of classes, and p_i is the probability that an arbitrary instance in D belongs to class c_i.
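The paper runs these classifiers inside KNIME; the sketch below is an assumed scikit-learn equivalent of the same three classifiers on a tiny invented set of sentiments, using a plain bag-of-words representation for brevity (DF counts or averaged Word2vec vectors could be substituted).

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

texts = [
    "great interactive online course",
    "easy access to recorded lectures",
    "no direct communication with the teacher",
    "internet connection problems ruined the exam",
]
labels = [1, 1, 0, 0]  # 1 = accept e-learning, 0 = reject e-learning (invented data)

X = CountVectorizer().fit_transform(texts)  # bag-of-words feature matrix

for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("SVM", LinearSVC()),
                  ("Decision Tree", DecisionTreeClassifier())]:
    clf.fit(X, labels)                       # toy fit on the full set for brevity
    print(name, accuracy_score(labels, clf.predict(X)))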

5 Student Sentiment Analysis Model Results

The Student Sentiment Analysis Model passes through three steps. The first step is data collection, represented by the 1000 student sentiments. Step two is processing, in which documents are cleaned and keywords are extracted using Document Frequency and Skip-gram. Step three is classification and scoring, which uses three classifiers (NB, SVM, and Decision Tree); the scorer calculates the accuracy level. Precision and recall are calculated with the following equations [16]:

recall = (number of relevant items retrieved) / (number of relevant items in the collection)     (5)

precision = (number of relevant items retrieved) / (total number of items retrieved)     (6)
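As a quick worked example of Eqs. (5) and (6), the snippet below computes accuracy, precision, recall, and F1 with scikit-learn on a handful of invented predictions.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 1, 0, 1]  # 1 = accept e-learning, 0 = reject (invented labels)
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))   # 6/8 = 0.75
print("precision:", precision_score(y_true, y_pred))  # 4/5 = 0.80
print("recall   :", recall_score(y_true, y_pred))     # 4/5 = 0.80
print("F1       :", f1_score(y_true, y_pred))         # 0.80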

The Student Sentiment Analysis Model tests the machine learning classifiers on the 1000 student sentiments, which are classified into accept-e-learning or reject-e-learning sentiments. Naïve Bayes, SVM, and Decision Tree were used for the classification. Table 2 shows the results of the sentiment model.

Table 2. Student Sentiment Analysis Model results

Classifier     | Accuracy (DF) | Accuracy (Skip-gram) | Precision (DF) | Precision (Skip-gram) | Recall (DF) | Recall (Skip-gram)
NB             | 87%           | 91%                  | 0.85           | 0.91                  | 0.82        | 0.90
SVM            | 79%           | 89%                  | 0.77           | 0.82                  | 0.79        | 0.89
Decision Tree  | 76%           | 85%                  | 0.74           | 0.84                  | 0.71        | 0.83

The results of the Student Sentiment Analysis Model agree with the conclusions of [5, 15, 16], confirming that NB gives the highest accuracy for sentiment classification. The results also agree with [5] in showing that the accuracy of the decision tree classifier on mobile reviews was lower than that of the other classifiers used there, namely logistic regression CV (LRCV), multilayer perceptron (MLP), Random Forest (RF), and Gaussian Naïve Bayes (GNB). Most students dislike the e-learning process, and this was reflected in their sentiments, which is a useful indicator of student opinion; this is consistent with the conclusions of [17, 18].

6 Conclusion and Future Work

A Student Sentiment Analysis Model was designed and implemented. Its data set consisted of 1000 student sentiments, divided into 700 sentiments that reject the e-learning process and 300 that accept it. The model passes through text processing, feature selection (Document Frequency and Word2Vec Skip-gram), and machine learning classification; three classifiers were used: NB, SVM, and Decision Tree. The results show that the classifier with the best accuracy is NB.
The limitations of the Student Sentiment Analysis Model are as follows. The sample size should be enlarged and should involve different student majors. Different classifiers should be tested, such as Multilayer Perceptron (MLP), Random Forest (RF), and Gaussian Naïve Bayes (GNB). Arabic sentiments should be part of the author's future plan, since the sentiments are collected from Egyptian students; however, processing the Arabic language is complicated.

References
1. Cheston, C.C., Flickinger, T.E., Chisolm, M.S.: Social media use in medical education: a
systematic review. Acad. Med. 88(6), 893–901 (2013)
2. Marshall, A., Spinner, A.: COVID-19: challenges and opportunities for educators and
generation Z learners. Mayo Foundation for Medical Education and Research. In: Mayo
Clinic Proceedings (2020)
3. Schaffhauser, D.: National Federation of the Blind takes on e-text pilots. Campus
Technology (2012)
4. Polyakov, E.V., Voskov, L.S., Abramov, P.S., Polyakov, S.V.: Generalized approach to
sentiment analysis of short text messages in natural language processing. Informatsionno-
upravliaiushchie sistemy [Inf. Control Syst.] (1), 2–14 (2020). https://doi.org/10.31799/
1684-8853-2020-1-2-14
5. Poonam Choudhari, P., Veenadhari, S.: Sentiment classification of online mobile reviews
using combination of Word2vec and Bag-of-Centroids. In: Swain, D., et al. (eds.) Machine
Learning and Information Processing. Advances in Intelligent Systems and Computing, vol.
1101. Springer (2020)
6. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retrieval 2, 1–
135 (2008)
7. Van Looy, A.: Sentiment analysis and opinion mining (business intelligence 1). In: Social
Media Management. Springer Texts in Business and Economics. Springer, Cham (2016).
https://doi.org/10.1007/978-3-319-21990-5_7
8. Meenakshi, M., Banerjee, A., Intwala, N., Sawan, V.: Sentiment analysis of amazon mobile
reviews. In: Tuba, M., et al. (eds.) ICT Systems and Sustainability. Advances in Intelligent
Systems and Computing, vol. 1077. Springer (2020)
9. Narayanan, V., Arora, I., Bhatia, A.: Fast and accurate sentiment classification using an
enhanced naive Bayes model. In: Intelligent Data Engineering and Automated Learning,
IDEAL 2013. Lecture Notes in Computer Science, vol. 8206, pp. 194–201 (2013)
10. Bansal, B., Shrivastava, S.: Sentiment classification of online consumer reviews using word
vector representations. Procedia Comput. Sci. 132, 1147–1153 (2018). International
Conference on Computational Intelligence and Data Science, ICCIDS 2018, Edited by
Singh, V., Asari, V.K. Elsevier (2018)
11. Zhang, D., Xu, H., Su, Z., Xu, Y.: Chinese comments sentiment classification based on
word2vec and SVMperf. Expert Syst. with Appl. 42, 1857–1863 (2015)
12. Waykole, R.N., Thakare, A.D.: A review of feature extraction methods for text classification.
Int. J. Adv. Eng. Res. Dev. 5(04) (2018). e-ISSN (O): 2348–4470, p-ISSN (P): 2348-6406
13. Fang, X., Zhan, J.: Sentiment analysis using product review data. J. Big Data (2015). https://
doi.org/10.1186/s40537-015-0015-2
14. Mostafa, L., Abd Elghany, M.: Investigating game developers’ guilt emotions using
sentiment analysis. Int. J. Softw. Eng. Appl. (IJSEA), 9(6) (2018)
15. Mostafa, L.: Student sentiment analysis using gamification for education context. In:
Hassanien, A., Shaalan, K., Tolba, M. (eds.) Proceedings of the International Conference on
Advanced Intelligent Systems and Informatics 2019, AISI 2019. Advances in Intelligent
Systems and Computing, vol. 1058. Springer, Cham (2019)
16. Mostafa, L.: Machine learning-based sentiment analysis for analyzing the travelers reviews
on Egyptian hotels. In: Hassanien, A.E., Azar, A., Gaber, T., Oliva, D., Tolba, F. (eds.)
Proceedings of the International Conference on Artificial Intelligence and Computer Vision
(AICV2020), AICV 2020. Advances in Intelligent Systems and Computing, vol. 1153.
Springer, Cham (2020)

17. Abd Elghany, M., Abd Elghany, M., Mostafa, L.: The analysis of the perception of service facilities and their impact on student satisfaction in higher education. IJBR 19(1) (2019). ISSN: 1555–1296
18. Sangeetha, K., Prabha, D.: Sentiment analysis of student feedback using multi-head attention
fusion model of word and context embedding for LSTM. J. Ambient Intell. Hum. Comput.
(2020)
19. Dessí, D., Dragoni, M., Fenu, G., Marras, M., Reforgiato Recupero, D.: Deep learning
adaptation with word embeddings for sentiment analysis on online course reviews. In:
Agarwal, B., Nayak, R., Mittal, N., Patnaik, S. (eds.) Deep Learning-Based Approaches for
Sentiment Analysis. Algorithms for Intelligent Systems. Springer, Singapore (2020)
20. Buscaldi, D., Gangemi, A., Reforgiato Recupero, D.: Semantic web challenges. In: Fifth
SemWebEval Challenge at ESWC 2018, Heraklion, Crete, Greece, 3 June–7 June, Revised
Selected Papers, 3rd edn. Springer (2018)
21. Li, Y., Pan, Q., Yang, T., Wang, S., Tang, J., Cambria, E.: Learning word representations for
sentiment analysis. Cogn. Comput. 9(6), 843–851 (2017)
22. Cataldo Musto, C., Semeraro, G., Gemmis, M., Lops, P.: Learning word embeddings from
Wikipedia for content-based recommender systems. In: Ferro, N., et al. (eds.) ECIR 2016.
LNCS, vol. 9626, pp. 729–734. Springer (2016). https://doi.org/10.1007/978-3-319-30671-
1_60
23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of
words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
24. Yilmaz, S., Toklu, S.: A Deep Learning Analysis on Question Classification Task Using
Word2vec Representations. Springer, London (2020)
25. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Representations
in Vector Space. arXiv:1301.3781v3 [cs.CL] 7 (2013)
26. Gyongyi, Z., Molina, H., Pedersen, J.: Web content categorization using link information,
Technical report, Stanford University (2006)
27. Knime. https://www.knime.com/. Accessed 11 Sept 2019
Various Pre-processing Strategies for Domain-Based Sentiment Analysis of Unbalanced Large-Scale Reviews

Sumaia Mohammed AL-Ghuribi1,2, Shahrul Azman Noah1, and Sabrina Tiun1

1 Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia
somaiya.ghoraibi@gmail.com
2 Faculty of Applied Sciences, Department of Computer Science, Taiz University, Taizz, Yemen

Abstract. User reviews are important resources for many processes such as recommender systems and decision-making programs. Sentiment analysis is one of the processes that is very useful for extracting valuable information from these reviews. The data preprocessing step is important in the sentiment analysis process, and suitable preprocessing methods are necessary. Most of the available research that studies the effect of preprocessing methods focuses on balanced, small-sized datasets. In this research, we apply different preprocessing methods for building a domain lexicon from unbalanced, large-sized review collections. The applied preprocessing methods study the effects of stopwords, negation words, and the number of word occurrences. We then apply different preprocessing methods to determine the words that have high sentiment orientation when calculating the total review sentiment score. Two main experiments with five cases are tested on the Amazon dataset for the movie domain. The most suitable preprocessing method is then selected for building the domain lexicon as well as for calculating the total review sentiment score using the generated lexicon. Finally, we evaluate the proposed lexicon by comparing it with a general-purpose lexicon. The proposed lexicon outperforms the general lexicon in calculating the total review sentiment score in terms of accuracy and F1-measure. Furthermore, the results show that sentiment words are not restricted to adjectives and adverbs (as commonly claimed); nouns and verbs also contribute to the sentiment score and thus affect the sentiment analysis process. Moreover, the results show that negation words have a positive effect on the sentiment analysis process.

Keywords: User reviews · Sentiment analysis · Data preprocessing methods · Domain-based lexicon · Unbalanced dataset · Sentiment words

1 Introduction

Millions of people share their opinions on goods, services and deals on a regular basis,
using, among others, online channels such as social networks, forums, wikis, and
discussion boards. These reviews reflect the users’ experiences on the consumed


services and are of significant importance to users, vendors, and companies. The nature of these reviews is complicated: they are short, unstructured, and noisy, since they are written by regular, non-professional users [1]. To benefit from these reviews, many fields are involved in processing them, such as sentiment analysis. Sentiment Analysis (SA) is used to extract people's feelings or opinions towards a specific entity. It focuses on predicting the sentiment score (i.e., positive, neutral, or negative) of the given entity. SA usually works at three main levels: document level, sentence level, and aspect level. In this research, we are interested in the document-level opinion mining task, which aims to extract the sentiment score of the whole document.
Approaches to SA can be roughly divided into machine learning and lexicon-based approaches. In this research, we are interested in the lexicon-based approach, as it does not require pre-labeled training data. SA is a text analysis task in which the pre-processing and feature selection steps are significant and affect the efficacy of SA performance. As a result, the words chosen for building the lexicon and their scores (i.e., polarities) greatly affect SA performance. For example, if the lexicon incorrectly assigns a value to an opinion word or misses important sentiment-indicator words, the SA results will be negatively affected. Much research has been done to study the effects of various preprocessing steps on many languages, such as Arabic [2], English [3], and Indonesian [4]. The role of the pre-processing step differs based on the nature of the data. For example, stemming methods were claimed to significantly improve the performance of SA for the Arabic language but not for the Indonesian language. As a result, not all datasets require the same preprocessing methods: some methods have a positive effect, while others have little or no effect.
Most of the works that study the effect of pre-processing used balanced, small-sized datasets (i.e., the numbers of positive and negative reviews are almost equal), such as [5, 6]. Furthermore, these works mainly observed the effect of simple pre-processing methods such as stemming and stopword removal. In this paper, however, the focus is on an unbalanced, large-sized dataset, as well as on exploring the effect of word types (noun, adjective, adverb, and verb) on the performance of SA.

2 Related Work

The preprocessing step removes noisy data from the dataset, since such noise can negatively affect the overall result of a specific task. User reviews contain a lot of noisy data, which makes preprocessing a crucial step for such data and improves the performance of SA classifiers [5]. Many studies have examined the importance of the preprocessing step in the SA process; some of them are summarized below.
Haddi et al. [5] explored the role of pre-processing in the SA process for the movie domain. The preprocessing steps used are removing non-alphabetic signs, removing stopwords, removing movie domain-specific words, tagging negation words, and stemming. The reported results of their experiments show that appropriate preprocessing methods can clearly improve the performance of the classifier.

Jianqiang [6] studied the effect of six different preprocessing methods on sentiment classification for the Twitter domain. The methods are removing stopwords, removing numbers, replacing negative indications, removing URLs, replacing acronyms with their original words using an acronym dictionary, and reverting words with repeated letters to their original form. The results of his experiments show that replacing negative indications and replacing acronyms with their original words improve the classification accuracy, while the remaining methods hardly affect the classifier.
Krouska et al. [7] presented five preprocessing techniques to study their effects on classification performance. The methods are a weighting scheme using TF-IDF, stemming using the Snowball stemmer, removing stopwords, tokenization (unigram, bigram, trigram), and various feature selection methods. The results of their experiments show that unigrams and trigrams with the information gain feature selection method improve the accuracy of the SA process.
Zin et al. [3] studied the effects of various preprocessing strategies on SA for the movie domain. They used three tiers of cleaning strategies for the preprocessing phase: the first removes stopwords; the second removes stopwords and meaningless words; and the last removes stopwords, meaningless words, numbers, and words shorter than three characters. Their experiments were applied to the Internet Movie Database (IMDb), and the last tier gave the best improvement in SA.
Finally, we can notice that most of the research that studies the effect of the preprocessing step in SA focuses only on balanced, small-sized datasets. Also, the methods used focus only on stemming or removing stopwords and ignore the word type. Our work differs in that we focus on an unbalanced, large-sized dataset and on the syntax of each sentence, to study the effect of each word type on the SA process.

3 Methodology

In this section, the methodology of this research is described in detail. Figure 1 illustrates the framework of the study, which consists of two phases: a training phase and a testing phase. In the training phase, the lexicon is built based on three parameters: the occurrence of words in a review, the words appended to the lexicon, and the effect of stopwords and negation words. In the testing phase, the total review sentiment score is calculated, and each word's polarity is taken from the lexicon generated in the training phase. The total review sentiment score is calculated based on two parameters: the part of speech of the word in the review and the effect of negation words. Finally, we evaluate each of the above parameters to extract the ones that positively affect lexicon building and the calculation of the total review sentiment score, and then compare the best method with other baselines.

Fig. 1. The framework of the study: a training part (Amazon movie reviews → apply different preprocessing → build the domain-based lexicon), a testing part (calculate the review sentiment score), and evaluation with comparison against baselines.

A. Training Phase (Building a Domain Based Lexicon)


In this study, we focus on a domain-based lexicon, and we select the movie domain because it is one of the most used and popular domains. We also choose Kevin's method for building the lexicon [8], a hybrid method that estimates the sentiment score using a probability method and an information-theoretic method. The efficiency of this hybrid method for SA has been proved on many large and diverse corpora, and it overcomes the poor performance of supervised machine learning techniques [9]. Our work differs from Kevin's method in that we use various preprocessing methods in building the lexicon to find the most effective method, i.e., the one that most improves the SA process.
We experiment with two approaches to building the domain lexicon, based on (1) the presence and absence of words and (2) word frequencies. In each experiment, a few cases are considered, as illustrated in Table 1.

Table 1. The two approaches for building the domain lexicon

Experiment                                                                      | Case | Removal of stopwords | Removal of negation words
1 – Words represented by the presence/absence of each word in a review         | 1    | No                   | No
                                                                                | 2    | Yes                  | No
                                                                                | 3    | Yes                  | Yes
2 – Words represented by the frequency (occurrences) of each word in a review  | 1    | Yes                  | Yes
                                                                                | 2    | Yes                  | No

As mentioned before, Kevin's method depends on both the probability method and the information-theoretic method, so the word-occurrence feature is an important factor. In experiment 1, the number of occurrences of a word in each review is not taken into account (i.e., only the presence or absence of the word is considered), whereas in experiment 2 it is.
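The sketch below illustrates the general idea of a frequency-based domain-lexicon builder (experiment 2 style, with a minimum-review threshold O); it uses a simple smoothed log-odds score and is not the exact hybrid probability/information-theoretic formulation of [8].

import math
from collections import Counter

def build_lexicon(reviews, min_reviews=10):
    """reviews: iterable of (tokens, label) pairs with label in {'pos', 'neg'}."""
    pos_counts, neg_counts, doc_freq = Counter(), Counter(), Counter()
    for tokens, label in reviews:
        (pos_counts if label == "pos" else neg_counts).update(tokens)
        doc_freq.update(set(tokens))      # parameter O: number of reviews containing the word

    lexicon = {}
    for word, df in doc_freq.items():
        if df < min_reviews:              # keep only words seen in at least O reviews
            continue
        p = pos_counts[word] + 1          # add-one smoothing
        n = neg_counts[word] + 1
        lexicon[word] = math.log(p / n)   # > 0 leans positive, < 0 leans negative
    return lexicon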

B. Testing Phase (Calculating the Total Review Sentiment Score)


The testing phase focuses on converting the review text into a numerical score, known as the sentiment score (sometimes called a virtual or implicit rating). This value is used in many applications, such as recommender systems and summarization, and its accuracy affects the accuracy of the application that uses it. We implement a few methods to generate the sentiment score. Some existing research considers only adjectives as sentiment words, while other research considers both adjectives and adverbs. Our work differs in that we study the effects of all word types (noun, verb, adjective, and adverb) and thus do not restrict the sentiment words to a specific type. Additionally, we also explore the effect of using negation words in calculating the sentiment score of the review.
In this step, the lexicons built during the training phase are used to assign the sentiment polarity of each word. For the five cases illustrated previously in Table 1 (three cases in experiment 1 and two cases in experiment 2), the effect of choosing the sentiment words based on the following parameters is explored:
i. Consider all the words as sentiment words.
ii. Use a combination of Part-of-Speech (POS) tags to derive the sentiment words (adjective; adjective and adverb; noun, adjective, and adverb; noun, verb, adjective, and adverb).
iii. Assign higher priorities to the adjectives and/or adverbs when calculating the sentiment scores.
iv. Take negation words into account in parameters (ii) and (iii). The negation words described in [10] are used.
After choosing the sentiment words from each review based on the previous parameters, we use the blocking technique described in [11] for storing the generated lexicon. The blocking technique stores the lexicon in blocks (there are 27 blocks, named from A to Z); each block contains the words that start with the same letter. Our experiments deal with a large-sized dataset, so the blocking technique speeds up the lookup of each word's score in the lexicon: the algorithm does not go through the whole lexicon to get a word's score but searches only in the word's block. This reduces the number of search comparisons and the execution time, which are the main parameters for any search algorithm.
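The following sketch puts the testing-phase pieces together: POS filtering (N+V+ADJ+ADV), optional priority weights for adjectives and adverbs, the SEP/AVG treatment of negation words, and the first-letter blocking lookup. The NLTK tagger, the tag-to-weight mapping, and the small negation set are illustrative assumptions, not the paper's exact configuration.

from collections import defaultdict
import nltk  # requires nltk.download('averaged_perceptron_tagger')

KEPT_TAGS = {"NN": 1.0, "VB": 1.0, "JJ": 3.0, "RB": 2.0}  # e.g. ADJ(x3)+ADV(x2) priorities
NEGATIONS = {"no", "not", "nor", "none", "nothing", "never"}

def to_blocks(lexicon):
    """Store the lexicon in blocks keyed by each word's first letter."""
    blocks = defaultdict(dict)
    for word, score in lexicon.items():
        blocks[word[0]][word] = score
    return blocks

def review_score(tokens, blocks, negation_score=None):
    """negation_score=None -> SEP (each negation word keeps its own lexicon score);
    a float -> AVG (all negation words share that average score)."""
    total = 0.0
    for word, tag in nltk.pos_tag(tokens):
        if word in NEGATIONS:
            total += negation_score if negation_score is not None \
                     else blocks[word[0]].get(word, 0.0)
            continue
        weight = KEPT_TAGS.get(tag[:2])
        if weight is None:
            continue                                      # keep only N, V, ADJ, ADV
        total += weight * blocks[word[0]].get(word, 0.0)  # search only this word's block
    return total                                          # > 0: positive, < 0: negative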

4 Results and Evaluation

In this section, the results of the two experiments mentioned in Sect. 3 are described in detail. We use the Amazon dataset for the movie domain [12]. It consists of 1,697,533 reviews, of which 86.52% are positive and 13.47% are negative. We divide the dataset into 80% for building the lexicon and 20% for calculating the total review sentiment score using the generated lexicon.

A. Training Phase (Building a Domain Based Lexicon)


As discussed in the methodology section, there are two experimental approaches for building the lexicon, with different cases for each (refer to Table 1). As a result, five domain lexicons have been built, three for experiment 1 and two for experiment 2. Each lexicon was built based on three parameters; for example, the lexicon of experiment #1, case #1 was built based on the presence or absence of words in a review, according to a parameter O, and without removing stopwords. The parameter O is the minimum number of reviews in which a word must appear. We tested three values of O, namely 50, 30, and 10 (e.g., 50 means that the appended words appear in at least 50 reviews). The tests showed that O = 10 gave the best results in terms of F1-measure and accuracy when calculating the review sentiment score for all cases. Table 2 shows the generated lexicons based on O = 10.

Table 2. Details of the generated lexicons

Lexicon                 | Total size | Positive words | Negative words
Experiment #1, Case #1  | 115,242    | 112,153        | 3,089
Experiment #1, Case #2  | 92,181     | 88,923         | 3,258
Experiment #1, Case #3  | 92,168     | 88,868         | 3,300
Experiment #2, Case #1  | 123,738    | 118,104        | 5,634
Experiment #2, Case #2  | 123,178    | 117,657        | 5,521

B. Testing Phase (Calculating the Total Review Sentiment Score)


The aim of this phase is to calculate the total review sentiment score using the lexicon generated in the previous phase, based on four parameters. The following describes in more detail how the sentiment scores are generated using these parameters.
i. Consider all the words as sentiment words.
The review is tokenized into words, the score of each token is looked up in the generated lexicon, and the total review sentiment score is the summation of the sentiment scores of all tokens.
ii. Use POS to choose the sentiment words.
With this parameter, we want to check which words have the strongest sentiment orientation. We use four different cases for selecting the sentiment words: adjectives only (ADJ); adjectives and adverbs only (ADJ+ADV); adjectives, adverbs, and nouns only (N+ADJ+ADV); and adjectives, adverbs, nouns, and verbs (N+V+ADJ+ADV).
iii. Give higher priority to adjectives and/or adverbs when calculating their scores.
The idea here is to give the adjectives and adverbs more weight than the other word types by multiplying their scores by a value (different values are tested). For example, Exp1.Case1 (ADJ(x3)+ADV(x2)) means that in Exp1.Case1 we select only adjectives and adverbs as sentiment words and multiply the sentiment score of each adjective by 3 and of each adverb by 2.

iv. Use negation words with the previous two parameters.
During the implementation of the previous parameters, we noticed that the number of true negatives is always small, due to the difficulty of identifying negative reviews. In this part, we suggest using all the negation words to increase the possibility of true negative reviews, which in turn enhances the overall performance measures. Two treatments of the negation words are tested, AVG and SEP: AVG means that all negation words receive the same score, namely the average sentiment score of all negation words; SEP means that each negation word keeps its own sentiment score from the lexicon. For experiment 2, besides AVG and SEP, a new case named SMALL is used, in which not all the negation words mentioned in [10] are used but only the following few:

Small_Negation_Words = ['no', 'not', 'nor', 'none', 'nothing', 'isnt', 'cannot', 'werent',
    'rather', 'might', 'neither', 'dont', 'could', 'doesnt', 'couldnt', 'couldve', 'wasnt',
    'didnt', 'would', 'wouldnt', 'wouldve', 'should', 'shouldnt', 'shouldve', 'neverending',
    'nobody', 'scare', 'scares', 'less', 'hardly']

The resulting sentiment score from the previous four parameters is a value in the range [−1, 1], and we compare it with the real rating provided in the dataset to calculate the two performance measures (accuracy and F1-measure). If the real rating is between 4 and 5 and the resulting sentiment score is positive, it is counted as a correct calculation of the review score (a true positive); if the resulting score is negative, it is counted as a misclassification (and similarly for negative reviews).
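A small sketch of this correctness check is shown below; the thresholds for positive reviews (4–5 stars) follow the text, while the negative threshold and the handling of 3-star reviews are assumptions, since the paper does not spell them out.

def check_review(real_rating, predicted_score):
    """Return True/False for a correct/incorrect polarity prediction,
    or None for reviews that are not clearly positive or negative."""
    if real_rating >= 4:              # ground-truth positive review (4-5 stars)
        return predicted_score > 0
    if real_rating <= 2:              # ground-truth negative review (assumed threshold)
        return predicted_score < 0
    return None                       # 3-star reviews: handling not specified in the text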
Tables 3 and 4 present the results of the experiments in calculating the total review sentiment score. The results are sorted in descending order of accuracy. For experiment #2, only the results for case 2 are presented, as the results for case 1 were rather low.
In this section, we summarize the observations from the best results of each experiment presented in Tables 3 and 4 (the rows marked in bold in both the Accuracy and F1-Measure columns) as follows:
1. Experiment 2 gives better results than Experiment 1. This means that, in building the lexicon, counting the occurrences of a word in a review is more accurate than recording only its presence or absence.
2. All the best results from all the experiments include all the POS types (nouns, verbs, adjectives, and adverbs). This means that words with sentiment orientation are not restricted to adjectives and adverbs; nouns and verbs also have sentiment orientations that strongly affect the SA process.
3. Additionally, all the best results include negation words, which confirms that negation words have a large effect on the SA process.
4. Four of the best results give priority to adjectives only or to both adjectives and adverbs. This shows that adjectives and adverbs have stronger sentiment orientations than nouns and verbs, but it does not diminish the contribution of nouns and verbs to the SA process.

Table 3. Accuracy and F1-Measure for the methods of Experiment 1 for Case 1, 2 and 3.
Case POS/Words Negation Accuracy F1-Measure
#1 N+ADJ+ADV SEP 87.754 93.147
N+V+ADJ(x3)+ADV(x2) SEP 87.718 93.191
N+ADJ(x3)+ADV(x2) – 87.693 93.103
N+ADJ+ADV AVG 87.686 93.144
N+V+ADJ(x2)+ADV SEP 87.68 93.181
N+ADJ+ADV – 87.435 93.027
N+V+ADJ+ADV SEP 87.363 93.029
N+V+ADJ+ADV AVG 87.204 92.961
N+V+ADJ+ADV – 87.02 92.872
ADJ(x5)+ADV(x2) – 86.442 92.579
ADJ(x4)+ADV(x2) – 86.293 92.513
ADJ(x2) – 85.764 92.263
All words – 85.5 92.133
ADJ+ADV – 83.829 90.57
ADJ+ADV AVG 82.895 89.794
ADJ – 82.151 89.58
ADJ+ADV SEP 81.936 89.144
#2 N+V+ADJ(x3)+ADV SEP 88.197 93.405
N+V+ADJ(x4)+ADV SEP 88.173 93.379
N+V+ADJ(x2)+ADV SEP 88.162 93.401
N+V+ADJ(x3)+ADV(x1.5) SEP 88.137 93.38
N+V+ADJ(x1.5)+ADV SEP 88.08 93.366
N+V+ADJ(x2)+ADV(x1.5) SEP 88.05 93.348
N+V+ADJ(x5)+ADV(x2) SEP 88.047 93.311
N+V+ADJ(x3)+ADV(x2) SEP 88.047 93.336
N+V+ADJ(x3)+ADV(x2) SEP 87.978 93.282
N+V+ADJ+ADV SEP 87.894 93.271
N+ADJ(x3)+ADV(x2) SEP 87.742 93.135
N+ADJ(x5)+ADV(x3) SEP 87.63 93.06
N+ADJ+ADV SEP 87.593 93.065
All words – 87.208 92.963
ADJ+ADV SEP 83.737 90.391
#3 N+V+ADJ(x3)+ADV SEP 88.148 93.402
N+V+ADJ(x3)+ADV(x2) SEP 88.147 93.404
N+V+ADJ(x2)+ADV SEP 88.04 93.363
N+ADJ(x3)+ADV(x2) SEP 87.925 93.248
N+V+ADJ+ADV SEP 87.738 93.223
N+ADJ+ADV SEP 87.641 93.133
N+V+ADJ+ADV – 87.527 93.124
All words – 87.497 93.11
N+ADJ+ADV – 87.445 93.05

Table 4. Accuracy and F1-Measure for the methods of Experiment 2 for Case 2.
Case POS/Words Negation Accuracy F1-Measure
#2 N+V+ADJ+ADV SEP 89.431 94.018
N+V+ADJ+ADV SMALL 89.412 94.008
N+V+ADJ+ADV AVG 89.207 93.91
N+V+ADJ+ADV – 89.12 93.876
N+V+ADJ(x3)+ADV(x2)+Neg: SEP SEP 89.084 93.806
N+V+ADJ(x3)+ADV+Neg: SEP SEP 89.02 93.77
N+V+ADJ(x3)+ADV SMALL 89.009 93.771
N+ADJ+ADV SEP 88.594 93.436
N+ADJ+ADV – 88.498 93.481
ADJ+ADV – 83.64 90.381
ADJ – 81.654 89.191
ADJ+ADV SEP 81.55 88.639
ADV SEP 77.24 85.482

Finally, the Exp2.Case2 (N+V+ADJ+ADV, Neg: SEP) method obtained the highest accuracy and F1-measure, so it is chosen as the best method for calculating the total sentiment score, and Exp2.Case2 is chosen as the best approach for building the lexicon.
C. Evaluation
Having determined the Experiment #2, Case #2 (N+V+ADJ+ADV) with Negation = SEP method as the most efficient for calculating the total review sentiment score, we evaluate the domain-based lexicon generated in Experiment #2, Case #2 against a general-purpose lexicon (SentiWordNet) using this method. We use 5-fold cross-validation, and the dataset is divided into 80% for training (building the domain lexicon) and 20% for testing (calculating the review score). Figure 2 shows the accuracy and F1-measure for both the domain lexicon and SentiWordNet using the Exp2.Case2 (N+V+ADJ+ADV, Neg: SEP) method. The domain lexicon outperforms SentiWordNet in both accuracy and F1-measure.

Fig. 2. Accuracy and F1-measure of 5-fold cross-validation (Fold1–Fold5 and average) for both the domain lexicon and the general lexicon (SentiWordNet)

5 Conclusion

In conclusion, this research studies the effects of various preprocessing methods on building a domain-based lexicon and on calculating the total review sentiment score for an unbalanced, large-sized dataset. Several preprocessing methods are presented; the lexicon generated in Experiment #2, Case #2 proves to be the best lexicon, and the Experiment #2, Case #2 (N+V+ADJ+ADV) with Negation = SEP method is selected as the best method for calculating the total review sentiment score. This in turn proves that sentiment words are not restricted to adjectives and adverbs; they can also be nouns or verbs. Additionally, negation words prove their positive effect on the SA process. Finally, we compared the best proposed domain lexicon with the general lexicon in calculating the total review sentiment score, and the results show that the proposed domain lexicon outperforms the general lexicon in both accuracy and F1-measure: using the domain lexicon, the accuracy is 9.5% higher and the F1-measure 6.1% higher than with SentiWordNet. This large difference shows that using a domain-based lexicon is more efficient than using a general one. For future work, we plan to use the domain lexicon in aspect-level sentiment analysis to find the aspect sentiment scores inside a review, which is deeper than the total review sentiment score.

Acknowledgment. We acknowledge the support of the Organization for Women in Science for
the Developing World (OWSD) and Sida (Swedish International Development Cooperation
Agency).

References
1. AL-Ghuribi, S.M., Noah, S.A.M.: Multi-criteria review-based recommender system–the
state of the art. IEEE Access 7(1), 169446–169468 (2019)
2. Duwairi, R., El-Orfali, M.: A study of the effects of preprocessing strategies on sentiment
analysis for Arabic text. J. Inf. Sci. 40(4), 501–513 (2014)
3. Zin, H.M., et al.: The effects of pre-processing strategies in sentiment analysis of online
movie reviews. In: AIP Conference Proceedings. AIP Publishing LLC (2017)
4. Pradana, A.W., Hayaty, M.: The effect of stemming and removal of stopwords on the
accuracy of sentiment analysis on indonesian-language texts. Kinetik: Game Technol. Inf.
Syst. Comput. Netw. Comput. Electron. Control 4(4), 375–380 (2019)
5. Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia
Comput. Sci. 17, 26–32 (2013)
6. Jianqiang, Z.: Pre-processing boosting Twitter sentiment analysis? In: 2015 IEEE
International Conference on Smart City/SocialCom/SustainCom (SmartCity). IEEE (2015)
7. Krouska, A., Troussas, C., Virvou, M.: The effect of preprocessing techniques on Twitter
sentiment analysis. In: 2016 7th International Conference on Information, Intelligence,
Systems & Applications (IISA). IEEE (2016)
8. Labille, K., Gauch, S., Alfarhood, S.: Creating domain-specific sentiment lexicons via text
mining. In: Proceedings of the Workshop Issues Sentiment Discovery Opinion Mining
(WISDOM) (2017)

9. Labille, K., Alfarhood, S., Gauch, S.: Estimating sentiment via probability and information
theory. In: KDIR (2016)
10. Farooq, U., et al.: Negation handling in sentiment analysis at sentence level. JCP 12(5), 470–
478 (2017)
11. Thabit, K., AL-Ghuribi, S.M.: A new search algorithm for documents using blocks and
words prefixes. Sci. Res. Essays 8(16), 640–648 (2013)
12. He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with
one-class collaborative filtering. In: Proceedings of the 25th International Conference on
World Wide Web. International World Wide Web Conferences Steering Committee (2016).
http://jmcauley.ucsd.edu/data/amazon/links.html
Arabic Offline Character Recognition Model
Using Non-dominated Rank Sorting Genetic
Algorithm

Saad M. Darwish1, Osama F. Hassan2, and Khaled O. Elzoghaly2

1 Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
saad.darwish@alexu.edu.eg
2 Faculty of Science, Department of Mathematics, Damanhour University, Damanhur, Egypt
Osamafarouk@sci.dmu.edu.eg

Abstract. In recent years, there has been intensive research on Arabic Optical Character Recognition (OCR), especially the recognition of scanned, offline, machine-printed documents. However, Arabic OCR results are still unsatisfactory, and the topic remains an evolving research area. Exploring the best feature extraction techniques and selecting an appropriate classification algorithm lead to superior recognition accuracy and low computational overhead. This paper presents a new Arabic OCR approach that integrates both the Extreme Learning Machine (ELM) and the Non-dominated Rank Sorting Genetic Algorithm (NRSGA) in a unified framework with the aim of enhancing recognition accuracy. ELM is adopted as a neural network classifier that has a short processing time and avoids many difficulties faced by gradient-based learning methods, such as learning epochs and local minima. NRSGA is utilized as a feature selection algorithm that has better convergence and spread of solutions. NRSGA emphasizes ranking among the solutions of the same front along with an elite preservation mechanism, and ensuring diversity through the nearest-neighbor method reduces the run-time complexity using the simple principle of the space-time trade-off. The experimental results reveal the efficiency of the proposed model and demonstrate that the feature selection approach increases the accuracy of the recognition process.

Keywords: Arabic OCR · Extreme Learning Machine · Feature selection · NRSGA

1 Introduction

OCR is the automatic recognition of characters from images; it has many applications, such as document recovery, car plate recognition, zip code recognition, and various banking and business applications. Generally, OCR is divided into online and offline character recognition systems [1]. Online OCR recognizes characters as they are entered and utilizes the order, speed, and direction of individual pen strokes to achieve a high level of accuracy in recognizing handwritten text. Offline OCR is complicated


because this type of recognition needs to overcome many complications, including similarities between different character forms, overlaps between characters, and interconnections between surrounding characters. Although offline systems are less precise than online setups, they are widely used in specialized applications such as interpreting handwritten postal addresses on envelopes and reading currency amounts on bank checks. Furthermore, offline OCR saves the time and money of rewriting old documents in electronic format [2, 3]. Consequently, the obstacles facing offline OCR and the increasing need for OCR applications make offline OCR an exciting field of research.
An OCR system aims to achieve a high recognition rate, overcome the poor quality of scanned images (especially in historical documents), and adapt to style and size variations within the same document. Compared with other languages, Arabic OCR is still developing because of the complex nature of Arabic word structure and syntax. Some of these complexities are [2]: (1) the outline of each character depends on its place in the word, taking two or four shapes; (2) some characters have similar shapes, differing only in the position and number of dots, which can be written above or below the character; (3) characters are written connected to each other, yet some characters cannot be connected to the following character, which causes a word to have several connected components, called Pieces of Arabic Words (PAWs). Moreover, special marks called diacritics, written above or below a character, are used to modify the character's accent.
The OCR output depends on the text quality, the text image processing, and the classification methods used to improve the detection rate. Generally, an OCR system comprises six stages: image acquisition (scanning), segmentation, preprocessing, feature extraction, classification, and post-processing [4]. The two main factors that affect the OCR recognition rate are (1) a set of representative features from word images and (2) an efficient classification algorithm [5]. The selection of a stable and representative collection of features is the heart of OCR system design. This process captures the essential characteristics of a word and combines them in a feature vector, while ignoring the unimportant ones. OCR classification techniques can be broadly grouped into three categories [6, 7]: heuristic (e.g., fuzzy), template matching (e.g., dynamic time warping), and learning-based methods (e.g., neural networks). These algorithms still do not achieve satisfactory results for Arabic OCR, as they do not generalize training data well and are sensitive to common types of distortion.
Currently, the Genetic Algorithm (GA) is considered one of the most powerful unbiased optimization techniques for sampling a large solution space and is used to find the most optimized solution for a given problem [8]. To deal with the problems facing multi-objective genetic algorithms, such as computational complexity, the need for specifying a sharing parameter, and non-elitism, the Non-dominated Sorting Genetic Algorithm (NSGA-II) was introduced, which is based on Pareto dominance for measuring the quality of solutions during the search [9]. Recently, many studies have introduced variants of NSGA-II that improve its time complexity by managing the book-keeping better than the basic algorithm. One of these variants is the Non-dominated Rank Sorting Genetic Algorithm (NRSGA), which first classifies the population into fronts and then assigns ranks to the solutions within a front [10]. This technique ensures diversity with no extra niching parameter. As diversity is parameterized implicitly in the fitness criteria, the convergence property is not affected while diversity is ensured. Furthermore, to ensure diversity, distances between nearest neighbors are considered, which is simple and fast. However, some problems remain regarding how to build an optimal Arabic OCR that overcomes the peculiar nature of Arabic characters, achieves a high recognition rate, and deals with many font styles. This research is motivated by all of these challenges.
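For readers unfamiliar with Pareto-based ranking, the sketch below shows the basic dominance test and front construction that NSGA-II and NRSGA both build on; it is not the full NRSGA within-front ranking or nearest-neighbour diversity mechanism, and the example objective pairs (classification error, number of selected features) are invented.

def dominates(a, b):
    """True if solution a is at least as good as b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(objectives):
    """Split a list of objective vectors into successive Pareto fronts."""
    remaining = list(range(len(objectives)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Example: (error rate, feature count) of five candidate feature subsets.
print(non_dominated_fronts([(0.10, 40), (0.12, 25), (0.10, 25), (0.20, 10), (0.25, 60)]))
# -> [[2, 3], [0, 1], [4]]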
The main contribution of this work is a new Arabic OCR model that focuses on printed, segmentation-free images of Arabic words, through both NRSGA and ELM. NRSGA is utilized in the feature selection process with two different objective functions. Since NRSGA decreases the computational complexity of ordinary genetic algorithms, it is able to find a better spread of solutions and better convergence near the true Pareto-optimal front. In addition, the suggested model exploits an efficient generalized Single hidden Layer Feed-forward Network (SLFN) algorithm, called the Extreme Learning Machine (ELM), as the classification algorithm. The model aims to reach the lowest recognition error, the shortest running time, and the simplest structure.
The work presented in this paper is an extension of previous work [11]; it improves the
selection of the best features in an appropriate time and treats the imperfections of
traditional NSGA-II for feature selection, such as the lack of uniform diversity and the
absence of a lateral diversity-preserving operator among the current best non-dominated
solutions. The suggested model investigates the potential improvements to the recognition
accuracy from using NRSGA instead of NSGA-II for word feature extraction within
offline Arabic character recognition applications. The results show that the new variant
is on par with the traditional one and hence can be considered in situations that warrant
an alternative approach for validating results obtained by other methods, especially for
reducing the computation cost in the testing phase.
The structure of the paper is organized as follows: a short survey about previous
research is provided in Sect. 2. In Sect. 3, the proposed model is discussed in detail.
Section 4 gives experimental results demonstrating the proposed model’s performance
and evaluation. Then the paper concludes with final remarks on the study and the future
work in Sect. 5.

2 Related Work

Work in the Arabic OCR field has been particularly active in recent years, primarily
because it has been challenging to achieve both targets, increasing the recognition rate and
decreasing the computational cost, without degrading one of them. For instance, the authors
in [12] built an Arabic OCR system using Scale Invariant Feature Transform (SIFT) as
features for the classification of letters, in conjunction with an online failure prediction
method. The system scans every word with windows of increasing dimensions;
segmentation points are set where the classifier achieves maximum confidence.
To highlight the influence of image descriptors, the research in [13] concentrated on
improving the feature extraction phase by selecting efficient feature subsets using
different feature selection techniques. These techniques rank the 96 possible features
based on their importance. The research has shown that the NSGA chooses the right
feature sub-set relative to the other four methods. The system also found that the SVM
classifier gives the highest classification quality.
The concept of partial segmentation has been utilized in [14] for recognizing Arabic
machine-printed texts using the Hausdorff distance. To determine the number of
multi-size sliding windows for a given PAW, the stroke-width transform was used to
estimate the size and font style. The method uses
Hausdorff’s distance to determine the resemblance of the two images (character and
sliding window). The system gave satisfying results of the high recognition rate for the
APTI database and the PATS-A01 database. However, increasing the number of sliding
windows in each image made the processing time-consuming.
To tackle the problem of word segmentation, the authors in [15] defined each shape
of an Arabic word as a specific class, without word segmentation. The features
extracted for every word were twenty vertical sliding windows to include structural and
geometrical representations of Arabic words. The last phase was the classification
phase, where the multi-class SVM was applied. The system was tested with various
Arabic word datasets and obtained a recognition rate of 98.5%. Recently, extreme
learning machine has become a cutting edge and promising approach in image clas-
sification. Researchers in [16] developed an expert system for identifying Brazilian
vehicles’ standard license plates. Many classification algorithms were applied to
identify numbers and letters. Among the used classifiers, ELM achieved the highest
accuracy of plate characters’ detection with the smallest standard deviation.
In [11], the authors suggested a model that fuses ELM with NSGA-II to solve the
problem of Arabic OCR recognition. However, utilizing NSGA-II does not guarantee
finding the best feature subset. If the number of solutions in the first front does not exceed the
population size, diversity among the finally obtained set may not be adequately
ensured. From the survey conducted, it has been inferred that the current methods for
Arabic OCR fail to handle transformation factors since they have a major limitation in
selecting the optimal distinctive features from different words.

3 Proposed Model

The block diagram that summarizes the main components of the proposed Arabic OCR
model is depicted in Fig. 1. The model utilizes NRSGA to select the optimal features
and ELM classifier for recognition of the scanned, offline, machine-printed documents
without long training time. The model consists of two main phases: training and testing
phases. The following subsections discuss the components of the model in detail with
the clarification of the objective of each step.

3.1 Image Acquisition


Although there are many popular Arabic databases, the proposed model used well-
researched printed databases that have good resolution, and many font styles and sizes.
The first one is the PATS-A01 database that consists of 2766 text line images in eight
fonts. The second is the APTI database that contains 113,284 text images, 10 Arabic
fonts, 10 font sizes, and 4 font styles [15, 17]. The samples are variable in size, font
type, orientation, and noise degree. Since PATS-A01 images are lines of words, lines
were segmented manually to get separated word samples like the APTI samples, in
order to unify input for the preprocessing step.
Fig. 1. The proposed Arabic OCR Model by fusing ELM and Ranked NSGA-II.

3.2 Preprocessing
Preprocessing aims to produce a clear version of each image for the OCR system [13–15].
In this step, each word image undergoes five operations to prepare it for feature
extraction. These operations are: (a) transforming the image to grayscale and then to
binary format, (b) removing noise from the image by using an appropriate median filter,
(c) removing all small objects by applying morphological open and close operations,
(d) correcting the image if it is rotated, and (e) resizing the image to appropriate dimensions.
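As an illustration of this pipeline, the following Python/OpenCV sketch mirrors the five operations; the paper's original implementation was written in MATLAB, and the filter sizes, kernel, and output dimensions below are assumptions for illustration only.

import cv2
import numpy as np

def preprocess_word_image(path, out_size=(128, 64)):   # out_size is an assumed value
    img = cv2.imread(path)
    # (a) grayscale, then binary (Otsu threshold)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # (b) median filter to suppress noise
    denoised = cv2.medianBlur(binary, 3)
    # (c) morphological open and close to remove small objects and fill small holes
    kernel = np.ones((2, 2), np.uint8)
    cleaned = cv2.morphologyEx(denoised, cv2.MORPH_OPEN, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
    # (d) deskew from the angle of the minimum-area rectangle around the ink pixels
    #     (the returned angle convention varies across OpenCV versions)
    coords = np.column_stack(np.where(cleaned > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    angle = -(90 + angle) if angle < -45 else -angle
    h, w = cleaned.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(cleaned, M, (w, h), flags=cv2.INTER_NEAREST)
    # (e) resize to the fixed dimensions expected by the feature extractor
    return cv2.resize(rotated, out_size, interpolation=cv2.INTER_NEAREST)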
3.3 Segmentation
Since word segmentation is the primary source of errors in recognition, the suggested
model avoids this step and uses pre-segmented images (segmentation-free words) [1].
However, images from the PATS-A01 database are lines of words; these will be
segmented manually.

3.4 Feature Extraction and Selection


The major goal of the feature extraction stage is to maximize the recognition rate with
the least amount of features that are stored in a feature vector. The underlying concept
of this step is to extract features from word images that achieve a high degree of
similarity between samples of the same classes and a high degree of divergence
between samples of other classes [6, 12]. As stated in [5], feature extraction methods
based on second-order statistics have achieved higher levels of diversity than the
power spectrum (transform-based) and structural methods. Among these second-order
statistics, image moments achieved the best results [13]. Consequently, the suggested
model employs a set of fourteen features based on invariant moments, because
they are translation- and scale-invariant. The feature vector contains the kurtosis and
skewness of both the horizontal and vertical projections, the vertical and horizontal centers,
the number of objects in the image, and the first seven invariant moments [5, 6, 12, 13].
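A minimal Python sketch of this fourteen-element feature vector is given below; the exact normalization used by the authors (whose code was in MATLAB) is not specified, so the details here are assumptions for illustration.

import cv2
import numpy as np
from scipy.stats import kurtosis, skew

def word_features(binary_img):
    # binary_img: uint8 image with ink pixels > 0
    h_proj = binary_img.sum(axis=1).astype(float)       # horizontal projection profile
    v_proj = binary_img.sum(axis=0).astype(float)       # vertical projection profile
    m = cv2.moments(binary_img, binaryImage=True)
    area = m['m00'] or 1.0
    cx, cy = m['m10'] / area, m['m01'] / area           # horizontal and vertical centers
    n_objects, _ = cv2.connectedComponents(binary_img)  # PAWs, dots and diacritics
    n_objects -= 1                                      # discard the background label
    hu = cv2.HuMoments(m).flatten()                     # the seven invariant moments
    return np.array([kurtosis(h_proj), skew(h_proj),
                     kurtosis(v_proj), skew(v_proj),
                     cx, cy, n_objects, *hu])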
Fig. 2. Flowchart of NRSGA.

As a general rule, the proposed model needs to extract the best features that
optimize classification results and highlight the discrepancy among different classes.
Therefore, NRSGA is utilized to select the best features and reduce the dimensionality
of the training dataset. The basic concept of NRSGA is to classify the population into
fronts first and then to assign ranks to the solutions in a front. The ranking is assigned
with respect to all the solutions of the population, and each front is reclassified with this
ranking. The distinguishing features of this algorithm are: (1) Reclassifying the solu-
tions of the same front based on ranks. (2) Successfully avoiding sharing parameters by
ensuring diversity among trade-off solutions using the nearest neighbor method [10].
Algorithm 1 illustrates the main steps of NRSGA, and Fig. 2 graphically depicts its
main components [18, 19].
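The following Python sketch is only schematic: it classifies a population into non-dominated fronts and then re-ranks the members of each front by their nearest-neighbor distance (more isolated solutions ranked first); the full NRSGA procedure in [10] differs in its details.

import numpy as np

def dominates(a, b):
    # a dominates b if it is no worse in every objective and better in at least one
    return np.all(a <= b) and np.any(a < b)

def nondominated_fronts(objs):
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def rank_front(front, objs):
    # nearest-neighbor distance inside the front, used as a diversity score
    def nn_dist(i):
        others = [j for j in front if j != i]
        return min(np.linalg.norm(objs[i] - objs[j]) for j in others) if others else np.inf
    return sorted(front, key=nn_dist, reverse=True)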
Herein, we adopt two different conflicting objectives: to minimize the number of
genes (features) used in classification while maintaining acceptable classification
accuracy expressed as testing error. In general, utilizing NRSGA for feature selection
ensures diversity with no extra niching parameter. Traditional elitism does not allow an
already found Pareto-optimal solution to be deleted. As ensuring diversity is parame-
terized implicitly in the fitness criteria, convergence property does not get affected
while ensuring diversity. To ensure diversity, distances between nearest neighbors are
considered, which is fast and straightforward.
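A hedged sketch of how these two objectives can be evaluated for one candidate solution is shown below; a solution is a binary mask over the fourteen features, and train_elm/test_error are placeholders standing in for the ELM routines of Sect. 3.5, not the authors' actual code.

import numpy as np

def evaluate(mask, X_train, y_train, X_test, y_test, train_elm, test_error):
    selected = np.flatnonzero(mask)
    if selected.size == 0:                        # an empty subset is penalized
        return np.array([mask.size, 1.0])
    model = train_elm(X_train[:, selected], y_train)
    err = test_error(model, X_test[:, selected], y_test)
    return np.array([selected.size, err])         # both objectives are minimized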

3.5 Classification Using Extreme Learning Machine


Classification is the OCR system’s decision-making process that uses the features
extracted from the previous stage. The classification algorithm is trained with the
training dataset; then, it is fed with the testing dataset to recognize the different classes
(each class is a word). Achieving a high recognition rate requires a powerful
classification technique that outperforms contemporary techniques in terms of speed,
simplicity, and recognition rate. The proposed model utilizes ELM, a fast and
efficient learning algorithm, defined as a generalized Single hidden Layer Feedforward
Network (SLFN). The fundamentals of ELM are twofold [20]: universal approximation
capability with a random hidden layer, and various learning techniques with easy and
fast implementations. Figure 3 shows the structure of the ELM.

Fig. 3. ELM's structure.
ELM aims to break the barriers between the conventional artificial learning tech-
niques and biological learning mechanism and represents a suite of machine learning
techniques in which hidden neurons need not be tuned. Compared with traditional
neural networks and support vector machines, ELM offers significant advantages such
as fast learning speed, ease of implementation, and minimal human intervention. Due
to its remarkable generalization performance and implementation efficiency, ELM has
been applied in various applications. See [18, 21, 22] for more details.
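A minimal Python sketch of such an ELM is given below: random input weights and biases, a sigmoid hidden layer, and output weights obtained in one step with the Moore-Penrose pseudo-inverse, so no iterative tuning of the hidden layer is needed. The hidden-layer size and activation are assumptions; the paper does not report them.

import numpy as np

class SimpleELM:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))   # sigmoid hidden layer

    def fit(self, X, y_onehot):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))  # random, never tuned
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y_onehot   # output weights in closed form
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)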

4 Experimental Results

In this section, the accuracy of the proposed model was tested, and the results were
compared with the results of previous systems on the same benchmarked databases.
The testbed dataset contains more than 102 images and more than 50 testing samples
from PATS-A01 and APTI [15, 17]. In the experiments, the model tests only four of the
eight fonts in this database: Arial, Naskh, Simplified, and Tahoma. The
individual text lines of the PATS-A01 database were segmented manually to separate
them into words. Training classes were 22 different Arabic words in different sizes,
orientations, noise degrees, and fonts. Figure 4 shows samples of Arabic words.

Fig. 4. Arabic words samples.

The
experiments were conducted on an AMD Quad-core, 2 GHz processor, 4 GB
DDR3 RAM laptop, and Windows 8.1 operating system. The code was written in the
MATLAB language using MATLAB R2011b. Many criteria were used in the
evaluation of the suggested model; these criteria are training time, defined as the time
spent on training ELM, testing time, which is the time spent on predicting all testing
data, and training/testing accuracy, which is the root mean square of correct
classification.
The first set of experiments was performed to compare the identification accuracy
of the proposed model that employs NRSGA to determine the optimal features and the
traditional version of the model without using optimization (i.e., using 14 features from
second-order statistics). Results for the previous model that utilizes NSGA-II [11] were
included in the table to verify the difference between NRSGA and NSGA-II in terms of
recognition accuracy and computational cost (time for training and testing). A set of
features is extracted from each word image forming a feature vector for each word.
Each feature vector is then classified individually using an extreme learning classifier.
The results shown in Table 1 revealed that the use of the five optimal features [f4, f6, f7,
f8, f10] extracted by NRSGA with the classifier generates a further identification rate
improvement of about 0.2% over the same method using the six optimal features [f4, f6, f7, f8,
f12, f14] extracted by NSGA-II, for both PATS-A01 and APTI. However, this slight
improvement came at the expense of an increase in training time compared with the
NSGA-II-based recognition model, because NRSGA has the extra complexity of
reclassifying the solutions of the same front based on ranks.
One possible explanation of this result is that NRSGA maintains the diversity
among the solutions by dynamically controlling the crowding distance. NRSGA alle-
viates most of the difficulties of non-dominated sorting and sharing evolutionary
algorithms. The basic NSGA-II performs even worse since its complexity is O(MN2),
where M is the number of objectives, and N is the size of the dataset [9]. Though the
difference is negligible for small populations, there is a marked difference for popu-
lation sizes of 1000 and more. This is significant since such problems frequently
demand the use of large population sizes to yield reasonable results.
Table 1. The identification accuracy rates for the suggested model using NRSGA or NSGA-II.

Dataset                  Method                       Accuracy (%)   Testing time (s), avg   Training time (s), avg
PATS-A01 (650 samples)   With NRSGA                   98.92          14                      115
                         With NSGA-II                 98.69          16                      98
                         Without feature selection    97.04          30                      60
APTI (550 samples)       With NRSGA                   95.76          11                      96
                         With NSGA-II                 95.37          14                      73
                         Without feature selection    97.04          25                      15

The performance improvement comes from the correct identification of the word image,
because NRSGA can extract optimal (discriminative) features with the help of the
objective fitness function that incorporates the recognition error. APTI contains images
with a small Arial font, in contrast with PATS-A01, which includes images with a big Arial
font; so the accuracy for APTI decreases compared with the PATS-A01 dataset. As
expected, using only six features, on average, for each sample decreases the time
required for identification in the test phase compared with fourteen features (on
average a 56% reduction in time). For the training phase, the NSGA-II module consumes
more time for feature selection, about a 63% increase in time.
The second set of experiments was performed to show how the recognition rate of
the suggested model depends on the number of image samples per word, because if
the word has more enrolled samples, the chance of a correct hit increases. The maximum
allowed number of word images is 50 per class (for both PATS-A01 and APTI), and
the samples within a class differ through operations on the image such as rotation, scaling,
and noise. In Table 2, as expected, the recognition rate increases as the number of
samples grows due to the increase in intra-class word image variability. The accuracy
rate grows by nearly 3% for every increase of 10 samples in the dataset.

Table 2. Relationship between accuracy rate and the number of samples.

Test set                  No. of samples   Accuracy (%)   Testing time (s)
PATS-A01 (650 samples)    10               80.98          3.12
                          20               88.48          5.51
                          40               90.36          10.82
                          50               98.69          24.74
APTI (550 samples)        10               80.02          1.09
                          20               87.41          1.92
                          40               90.33          3.78
                          50               98.03          8.66
In general, increasing the number of samples within each class beyond 60 samples does
not largely improve accuracy, since the suggested model relies on extracting
characteristic features from the word image pattern, which do not vary much with the
font type and style. Combining all samples to train the proposed model increases
accuracy up to 99%, due to the NRSGA performance in choosing the best features that
represent the word image in general. This increase comes at the cost of the time taken to
train the model, but this time is negligible compared to the time consumed in the testing
phase. In the training phase, the optimal feature selection module takes the most time.
One possible justification for the reduction in accuracy on the APTI dataset is that it
contains images with a small Arial font, and resizing degrades the image quality. Our
model faces some limitations due to overlapping fonts, such as Diwani and Thuluth,
which significantly affect accuracy compared to the other fonts.

5 Conclusions and Discussions

Arabic offline OCR for printed text is a very challenging and open area of research.
This paper developed an Arabic OCR for printed words based on a combination of the
ELM classifier and the NRSGA. In the beginning, the model used fourteen features
dataset. After applying NRSGA, the datasets were reduced to five features dataset;
then, data was fed into the ELM network, which is a fast and simple single hidden layer
feed-forward network. ELM avoids the local minimum traps and long training time of
ordinary neural networks, and the hidden layer of SLFNs need not be tuned.
Moreover, NRSGA helps select the most defining features that decreased the dataset’s
complexity by 57% and significantly improved the performance. The model achieves a
high recognition accuracy of 98.87% for different samples in a short time. Future work
includes utilizing more complex ELM networks, such as optimal weight-learning
machines, which have achieved promising results on Latin and Indian OCR and should
be tested on Arabic OCR.

References
1. Lawgali, A.: A survey on Arabic character recognition. Int. J. Signal Process. Image Process.
Pattern Recogn. 8(2), 401–426 (2015)
2. Lorigo, L., Govindarajum, V.: Offline Arabic handwriting recognition: a survey. J. Pattern
Anal. Mach. Intell. 28(5), 712–724 (2006)
3. Jumari, K., Ali, M.: A survey and comparative evaluation of selected off-line Arabic
handwritten character recognition systems. Malays. J. Comput. Sci. 36(1), 1–18 (2012)
4. Bouazizi, I., Bouriss, F., Salih-Alj, Y.: Arabic reading machine for visually impaired people
using TTS and OCR. In: 4th International Conference on Intelligent Systems and Modelling,
Thailand, pp. 225–229 (2013)
5. Mohamad, M., Nasien, D., Hassan, H., Haron, H.: A review on feature extraction and feature
selection for handwritten character recognition. Int. J. Adv. Comput. Sci. Appl. 6(2), 204–
213 (2015)
6. Ismail, S., Abdullah, S.: Geometrical-matrix feature extraction for on-line handwritten
characters recognition. J. Theor. Appl. Inf. Technol. 49(1), 1–8 (2013)
7. Bhavsar, H., Ganatra, A.: A comparative study of training algorithms for supervised machine
learning. Int. J. Soft Comput. Eng. 2(4), 2231–2307 (2012)
8. Hassan, O., Gamal, A., Abdel-khalek, S.: Genetic algorithm and numerical methods for
solving linear and nonlinear system of equations: a comparative study. Intell. Fuzzy Syst.
J. 38(3), 2867–2872 (2020)
9. Golchha, A., Qureshi, G.: Non-dominated sorting genetic algorithm-II – a succinct survey.
Int. J. Comput. Sci. Inf. Technol. 6(1), 252–255 (2015)
10. D’Souza, R., Sekaran, C., Kandasamy, A.: Improved NSGA-II based on a novel ranking
scheme. J. Comput. 2(2), 91–95 (2010)
11. Darwish, S., El Nagar, S.: Arabic offline character recognition using the extreme learning
machine algorithm. Int. J. Digit. Content Technol. Appl. 11(4), 1–14 (2017)
12. Stolyarenko, A., Dershowitz, N.: OCR for Arabic using sift descriptors with online failure
prediction. J. Imaging 3(1), 1–10 (2011)
13. Abandah, G., Malas, T.: Feature selection for recognizing handwritten Arabic letters. Eng.
Sci. J. 37(2), 1–20 (2010)
14. Saabni, R.: Efficient recognition of machine printed Arabic text using partial segmentation
and Hausdorff distance. In: IEEE Conference on Soft Computing and Pattern Recognition,
Tunis, pp. 284–289 (2014)
15. Al Tameemi, A., Zheng, L., Khalifa, M.: Off-line Arabic words classification using multi-set
features. Inf. Technol. J. 10(9), 1754–1760 (2011)
16. Neto, E., Gomes, S., Filho, P., Albuquerque, V.: Brazilian vehicle identification using a new
embedded plate recognition system. Measur. J. 70(1), 36–46 (2015)
17. Slimane, F., Ingold, R., Kanoun, S., Alimi, A., Hennebert, J.: A new Arabic printed text
image database and evaluation protocols. In: IEEE International Conference on Document
Analysis and Recognition, Spain, pp. 946–950 (2009)
18. Murugavel, A., Ramakrishnan, S.: An optimized extreme learning machine for epileptic
seizure detection. Int. J. Comput. Sci. 41(4), 1–10 (2014)
19. Deb, K.: Multi-objective optimization using evolutionary algorithms: an introduction.
J. Multi-objective Evol. Optim. Prod. Des. Manuf. 1(1), 3–34 (2011)
20. Biesiada, J., Duch, W., Kachel, A., Maczka, K.: Feature ranking methods based on
information entropy with Parzen windows. In: International Conference on Research in
Electro Technology and Applied Informatics, Indonesia, pp. 1–10 (2005)
21. Huang, B., Wang, H., Lan, Y.: Extreme learning machines: a survey. Int. J. Mach. Learn.
Cybernet. 2(2), 107–122 (2011)
22. Huang, G., Zhu, Q., Siew, C.: Extreme learning machine: a new learning scheme of feed
forward neural networks. In: IEEE International Joint Conference on Neural Networks,
USA, pp. 985–990 (2004)
Sentiment Analysis of Hotel Reviews Using
Machine Learning Techniques

Sarah Anis(&), Sally Saad, and Mostafa Aref

Faculty of Computer and Information Sciences, Ain-Shams University, Cairo, Egypt
Sarrahaniss@gmail.com

Abstract. Sentiment analysis is the task of identifying opinions expressed in
any form of text. With the widespread usage of social media in our daily lives,
social media websites became a vital and major source of data about user
reviews in various fields. The domain of tourism has extended its online activity in the
most recent decade. In this paper, an approach is introduced that automatically
performs sentiment detection using the Fuzzy C-means clustering algorithm and
classifies hotel reviews provided by customers from one of the leading travel
sites. Hotel reviews have been analyzed using various techniques like Naïve
Bayes, K-Nearest Neighbor, Support Vector Machine, Logistic Regression, and
Random Forest. An ensemble learning model was also proposed that combines
the five classifiers, and results were compared.

Keywords: Sentiment analysis · Sentiment detection · Sentiment classification · Machine learning · Tourism

1 Introduction

Nowadays, people generally prefer to communicate and socialize on the web. On travel
sites, users express their opinions and write reviews of their hotel experience. Taking
advantage of these huge volumes of data is of great value to tourism associations and
organizations which aim to increase profitability and enhance or maintain customer
satisfaction. Customer reviews provided as text are considered either subjective or
objective. Sentences with subjective expressions include opinions, beliefs,
personal feelings and views, while objective expressions include facts, evidence and
measurable observations [1]. Most sentiment analysis approaches apply sentiment
detection first, which differentiates between objective and subjective reviews, and then
determine the sentiment polarity of subjective reviews, whether it is positive or neg-
ative. Fuzzy C-Means clustering algorithm was applied for sentiment detection to
classify sentences to subjective or objective. Objective sentences are filtered out and
only subjective meaningful sentences are retained.
We used five different machine learning classifiers for sentiment classification
namely Naïve Bayes, K-Nearest Neighbor, Support Vector Machine, Logistic
Regression and Random Forest. Then we applied an ensemble learning model between
the five classifiers to achieve better results. This paper is organized as follows. First, a
summary of some related work is found in Sect. 2. Section 3 introduces the proposed

system and discusses the techniques used throughout this paper. Section 4 shows the
final results and compares the performance of various techniques used. Finally, Sect. 5
presents the conclusion.

2 Related Work

The main goal of sentiment analysis is to identify the polarity of text whether it is
positive or negative. Recently, data available online related to tourism is increasing
exponentially [2]. Sentiment analysis can effectively aid decision making in tourism, by
improving the understanding of tourist experience [3].
Many machine learning methods are used for sentiment classification of hotel
reviews. Machine learning methods are categorized into supervised, semi-supervised
and unsupervised approaches. Neha Nandal et al. [4] have utilized the Support Vector
Machine to classify Amazon customer reviews, where aspect terms are identified first
for each review and a polarity score is finally given for each review. Their outcomes
showed that among the three kernels of the Support Vector machine, the Radial Basis
Function (RBF) kernel provided the best result. Vishal S. et al. [5] have performed
sentence-level sentiment analysis. They mainly focused on negation identification from
online news articles. They used machine learning algorithms like Support Vector
Machine and Naïve Bayes. They achieved accuracies of 96.46% and 94.16% for
Support Vector Machine and Naïve Bayes respectively. Aurangzeb Khan et al. [6]
proposed a sentence-level sentiment analysis approach. They extracted subjective
sentences and labeled them positive or negative based on their word-level feature using
the Naïve Bayes classifier. Then they applied the Support Vector machine for sentiment
classification. Their proposed method on average achieves an accuracy of 83%. Vishal
Kharde et al. [7] showed that machine learning methods, such as Support Vector
Machine and Naïve Bayes have the highest accuracy in sentiment classification on
twitter data. Xia et al. [8] used an ensemble framework for Sentiment Classification that
combined various feature sets namely Part-of-speech information, Word-relations, and
three machine learning techniques, which are Naïve Bayes, Maximum Entropy, and
Support Vector Machine. They found that Naïve Bayes worked better on feature sets
with a smaller size. By contrast, the Maximum Entropy and Support Vector machine
were more effective for high-dimensional feature sets. They performed different
ensemble approaches like the fixed combination, weighted combination, and Meta-
classifier combination to improve the accuracy.
Rehab M. Duwairi et al. [9] performed sentiment analysis on Arabic reviews using
machine learning methods. Three classifiers were used which are Naïve Bayes, Support
Vector Machine and K-Nearest Neighbor. Their experimental results showed that the
Support Vector machine achieved the highest precision, while the K-Nearest Neighbor
achieved the highest recall. Anjuman Prabhat et al. [10] performed sentiment classi-
fication on twitter reviews. They have used Naïve Bayes and Logistic Regression for
the classification of reviews. The results showed that Logistic regression achieved
better accuracy and precision than the Naïve Bayes. Bhavitha et al. [11] compared the
performance of the Random Forest and Support Vector Machine on sentiment analysis.
Random Forest obtained better accuracy, but requires higher processing and training
time. Gayatri Khanvilkar et al. [12] employed both Random Forest and Support Vector
Machine on sentiment analysis for sentiment classification. They claim that their
proposed system helps to improve Sentiment analysis for Product Recommendation,
using Multi-class classification.
One of the main tasks in sentiment analysis is sentiment detection. Sentiment
detection or subjectivity detection is the task of identifying subjective text. Samir
Rustamov et al. [13] used a Fuzzy Control system and an Adaptive Neuro-Fuzzy
Inference System for sentence-level subjectivity detection from movie reviews. They
used informative features that improve the accuracy of the systems with no language-
specific requirements. Iti Chaturvedi et al. [14] used both Bayesian networks and fuzzy
recurrent neural networks for subjectivity detection. Bayesian networks are used to
capture dependencies in high-dimensional data, and fuzzy recurrent neural networks are then used to
model temporal features. They claim that their proposed model can deal with standard
subjectivity detection problems, and also proved its portability across languages.

3 Proposed System

Initially, the dataset used contains 38,932 labeled hotel reviews from the Kaggle
website. This data set provides reviews of a single hotel. Some pre-processing was
done to clean and prepare data for sentiment analysis (Fig. 1). Pre-processing of text
involves eliminating irrelevant content from text [15]. Online texts usually contain lots
of noise, and uninformative parts like typos, bad grammar, URLs, Stop words, and
Expressions. The main goal of data pre-processing is to reduce the noise in the text,
which should help improve the performance of the classifier, and speed up the clas-
sification process.
One of the steps of pre-processing is the removal of stop words. Stop words such as
“the”, “a”, “an”, and “in” take valuable processing time that is not needed, so they should be
removed. Removal of punctuation is another important step in the process of data
cleaning. For example: “.”, “,”, “?” are important punctuation marks that should be retained,
while others need to be removed. Moreover, to avoid word sense disambiguation, an
apostrophe lookup is required to convert any apostrophe into standard text. For
example, “it’s a very nice place” should be transformed into “it is a very nice place”.
Sometimes words are not in their proper form; therefore, standardizing words is important
for data cleaning, for example, normalizing elongated or misspelled word forms to their standard spelling.
For feature extraction, word embedding was used to represent words in each
sentence of a review. Word embedding is a technique where words or expressions are
mapped to vectors of real numbers [16]. Word2vec technique [17] was employed for
feature extraction. The corpus used in training the word2vec model was built with all
the words available in the dataset. Word2Vec is one of the most popular methods that
uses shallow neural networks. Word2vec can extract deep semantic features between
words [17]. It computes continuous vector representations of words. The computed
word vectors retain a huge amount of syntactic and semantic regularities present in the
language [18], and transform them into relation offsets in the resulting vector space.
The number of features extracted for each word is 100.
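The sketch below illustrates this step with gensim (version 4 or later is assumed, hence the parameter name vector_size); the paper does not state which word2vec implementation was used. Each review is then represented by the sum of its word vectors and normalized, as described in the next paragraph.

import numpy as np
from gensim.models import Word2Vec

def review_vectors(tokenized_reviews, dim=100):
    model = Word2Vec(sentences=tokenized_reviews, vector_size=dim,
                     window=5, min_count=1, workers=4)
    feats = np.zeros((len(tokenized_reviews), dim))
    for i, tokens in enumerate(tokenized_reviews):
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        if vecs:
            feats[i] = np.sum(vecs, axis=0)     # sum of word vectors per review
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.where(norms == 0, 1, norms)   # normalized features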
Fig. 1. The proposed system

Each review is composed of several sentences. After tokenizing sentences into
words, the summation of all word vectors in the review is computed. Normalizing the
features was a prior step to subjectivity detection. For subjectivity detection, the Fuzzy
C-means clustering algorithm was used, to classify data into positive, negative, and
objective. The Fuzzy C-means clustering algorithm is very similar to the K-means
algorithm, but instead of using hard clustering where each data point can belong to only
one cluster, Fuzzy C-means performs soft clustering in which each data point can
belong to multiple clusters to a certain degree by means of a Membership Function. For
example, data points that are close to the center of a cluster will have a high degree of
membership for that cluster while data points that are far from the center of that cluster
will have a low degree of membership. After clustering, objective sentences are dis-
carded, and only sentences classified as positive or negative are retained for further
classification. Five different machine learning techniques have been used to build our
sentiment classification model, which are Naïve Bayes, K-Nearest Neighbor, Support
Vector Machine, Logistic Regression, and Random Forest. An Ensemble learning
model between the five classifiers was also applied to enhance the accuracy.
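A minimal fuzzy C-means sketch (three clusters, standing for positive, negative and objective) is given below; the fuzzifier m, the iteration count and the mapping of clusters to labels afterwards are assumptions of this sketch, not values reported in the paper.

import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        U = 1.0 / (dist ** (2.0 / (m - 1.0)))      # closer centers get higher membership
        U /= U.sum(axis=1, keepdims=True)
    return U, centers

# Reviews whose highest membership falls in the cluster identified as "objective"
# are discarded; the remaining (subjective) reviews go on to classification.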

3.1 Machine Learning Methods


Naïve Bayes is one of the simplest models and is based on the Bayes Theorem. It
works efficiently with large datasets as it is fast and accurate with a very low com-
putational cost [19]. Naïve Bayes classifier assumes that all the features are
independent [20], but this assumption is not always true in real-life situations. How-
ever, Naïve Bayes often works well in practice. Given a predictor X, P(C|X) is the
posterior probability of class C and P(X|C) is the probability of X given the class. P(X)
is the prior probability of X.

P(C|X) = P(X|C) × P(C) / P(X)                                   (1)

K-Nearest Neighbor (KNN) is also a very simple and easy-to-implement method
[20]. The KNN algorithm assumes that similar features or data points exist close to
each other. The most commonly used method for calculating the distance between two
points is the standard Euclidean distance. However, there are other ways of calculating
distance, and the choice depends on the problem. The input consists of the K closest
training samples in the feature space. To choose the right value for K, we run the KNN
algorithm several times with different values of K and choose the K that returns the
highest accuracy. The Nearest Neighbors have been successful in many classification
and regression problems, but the main disadvantage of KNN is that it becomes
significantly slower when the volume of data increases.
Support Vector Machine (SVM) is one of the most effective and famous classi-
fication machine learning methods. SVM has been developed from statistical learning
theory [19]. Moreover, it is memory efficient as it uses a subset of training points in the
prediction process. SVM works well with a clear margin of separation and with high
dimensional data. On the other side, it works poorly with overlapping classes and is
also sensitive to the type of kernel used.
Logistic Regression is a statistical method that is used for binary classification.
Logistic Regression measures the relationship between the test variable and our fea-
tures, by estimating probabilities using its underlying Sigmoid function. The Sigmoid
function is an S-shaped curve that can take any real-valued number and map it into a
value between the range of 0 and 1, these values are then transformed into either 0 or 1
using a threshold classifier. It is well-known for its efficiency as it does not require too
many computational resources, it is highly interpretable, it does not require input
features to be scaled, it does not require any tuning, and it is easy to implement.
A disadvantage of it is that it cannot solve non-linear problems since its decision
surface is linear and it is also vulnerable to overfitting.
Random Forest consists of a large number of individual decision trees that operate
as an ensemble. Each tree in the Random Forest returns a class prediction and Random
Forest makes decision based on the majority of votes. The ensemble learning technique
of this algorithm reduces the overfitting problem in decision trees and improves the
overall accuracy [21]. It is very stable and can handle missing values and non-linear
parameters efficiently. Random Forest is also comparatively less impacted by noise.
The disadvantages are the long training time and high complexity as it generates a lot
of trees that requires more computational power and resources than the simple decision
tree.
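The five classifiers and the majority-vote ensemble can be assembled with scikit-learn as sketched below; the hyper-parameters are library defaults except K = 17 (reported in Sect. 4), and the Gaussian Naïve Bayes variant is an assumption made here because the word2vec features are continuous.

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

classifiers = [
    ('nb', GaussianNB()),
    ('knn', KNeighborsClassifier(n_neighbors=17)),
    ('svm', SVC()),
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100)),
]
ensemble = VotingClassifier(estimators=classifiers, voting='hard')  # majority vote
# ensemble.fit(X_train, y_train); y_pred = ensemble.predict(X_test)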
4 Results and Analysis

The dataset of hotel reviews from the Kaggle website contains 26,521 positive reviews and
12,411 negative reviews. After subjectivity detection, the total number of subjective
reviews is 37,827 reviews. Sentiment classification performance was evaluated using
precision, accuracy, recall, and F-score measures. Precision is the ratio of true positives
to the total number of predicted positives, while Recall is the ratio of true positives to
the total number of true positives and false negatives. True Positives (TP) are actually
positive data points that are correctly classified as positive by the model, and False
Negatives (FN) are positive data points that are misclassified as negative by the model.
F-score is the harmonic mean of Precision and Recall, taking both false positives and
false negatives into account. According to these measures, a comparison between the
applied machine learning methods was performed, and the results are presented in Table 1.

Precision = TP / (TP + FP)                                        (2)

Recall = TP / (TP + FN)                                           (3)

F-score = 2 × (Recall × Precision) / (Recall + Precision)         (4)
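The measures of Eqs. (2)-(4) can be computed directly from the confusion-matrix counts, as in the short helper below; equivalent functions also exist in scikit-learn (precision_score, recall_score, f1_score).

def evaluate(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * (recall * precision) / (recall + precision)
    return accuracy, precision, recall, f_score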

Table 1. Evaluation of methods performance.


Method Accuracy Precision Recall F-score
Naïve Bayes 0.718 0.727 0.957 0.655
K-Nearest Neighbor 0.838 0.836 0.956 0.827
Support Vector Machine 0.863 0.875 0.938 0.859
Logistic Regression 0.859 0.867 0.942 0.854
Random Forest 0.846 0.848 0.95 0.838

Fig. 2. Accuracy of various techniques


The results obtained show that Support Vector Machine and Logistic Regression
techniques performed better than the other techniques in terms of accuracy. For K-Nearest
Neighbor, there is no ideal value for the number of neighbors K; it is selected after
testing and evaluation, and the best results were achieved when using K = 17. The
Naïve Bayes classifier had the lowest accuracy compared to the other
techniques (Fig. 2).
One of the problems that could cause errors in the classification process is that a
single review can have different or mixed opinions on different aspects, which makes
the task of assigning an overall polarity to the review hard. To achieve better results, an
ensemble model of all classifiers was taken into consideration. An ensemble learning
model is commonly known to outperform single classifiers. Ensemble models provide
higher consistency and reduce errors. The results of the ensemble classifier are shown
in Table 2. However, the ensemble model did not improve the accuracy in our case
compared to the accuracy achieved by the Support Vector Machine.

Table 2. Evaluation of ensemble model.


Method Accuracy Precision Recall F-score
Ensemble learning model 0.862 0.872 0.94 0.858

5 Conclusion

In this research, different techniques were investigated to classify hotel reviews. The
algorithms applied were Naïve Bayes, K-Nearest Neighbor, Support Vector Machine,
Random Forest, and Logistic Regression. The best results were obtained by the Support
Vector Machine. On the other hand, the Naïve Bayes classifier had the least achieved
accuracy. The Support Vector Machine achieved 86.3% accuracy, Logistic Regression
achieved 85.9%, Random Forest classifier accuracy was 84.6% while K-Nearest
Neighbor and Naïve Bayes classifier accuracies were 83.8% and 71.8% respectively.
An ensemble learning model was also created, combining those five classifiers to
increase the accuracy. But after testing the model, the ensemble learning model
achieved 86.2% accuracy, which means that the accuracy did not improve, and the
Support Vector Machine is more efficient in this case.

References
1. Feldman, R.: Techniques and applications for sentiment analysis. Commun. ACM 56(4),
82–89 (2013)
2. Alaei, A., Becken, S., Stantic, B.: Sentiment analysis in tourism: capitalising on big data.
J. Travel Res. 58(9), 175–191 (2017)
3. Valdivia, A., Luzón, M.V., Herrera, F.: Sentiment analysis in trip advisor. IEEE Intell. Syst.
32(4), 72–77 (2017)
4. Nandal, N., Tanwar, R., Pruthi, J.: Machine learning based aspect level sentiment analysis
for Amazon products. Spat. Inf. Res. 1–7 (2020)
5. Shirsat, V.S., Jagdale, R.S., Deshmukh, S.N.: Sentence level sentiment identification and
calculation from news articles using machine learning techniques. In: Iyer, B., Nalbalwar, S.,
Pathak, Nagendra Prasad (eds.) Computing, Communication and Signal Processing.
Advances in Intelligent Systems and Computing, vol. 810, pp. 371–376. Springer,
Singapore (2019)
6. Khan, A., Baharudin, B.B., Khairullah, K.: Sentence based sentiment classification from
online customer reviews. In: 8th International Conference on Frontiers of Information
Technology, Pakistan, Article no. 25, pp. 1–6 (2010)
7. Kharde, V.A., Sonawane, S.S.: Sentiment analysis of twitter data: a survey of techniques.
Int. J. Comput. Appl. 139(11), 0975–8887 (2016)
8. Xia, R., Zong, C., Li, S.: Ensemble of feature sets and classification algorithms for sentiment
classification. Inf. Sci. Int. J. 181(6), 1138–1152 (2011)
9. Duwairi, R.M., Qarqaz, I.: Arabic sentiment analysis using supervised classification. In:
2014 International Conference on Future Internet of Things and Cloud, Barcelona. pp. 579–
583. IEEE (2014)
10. Prabhat, A., Khullar, V.: Sentiment classification on big data using Naïve Bayes and logistic
regression. In: 2017 International Conference on Computer Communication and Informatics
(ICCCI 2017), Coimbatore, India, pp. 1–5. IEEE (2017)
11. Bhavitha, B.K., Rodrigues, A.P., Niranjan, N.C.: Comparative study of machine learning
techniques in sentimental analysis. In: 2017 International Conference on Inventive
Communication and Computational Technologies (ICICCT), pp. 216–221. IEEE (2017)
12. Khanvilkar, G., Vora, D.: Sentiment analysis for product recommendation using random
forest. Int. J. Eng. Technol. 7(33), 87–89 (2018)
13. Rustamov, S., Clements, M.A.: Sentence-level subjectivity detection using neuro-fuzzy
models. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity,
Sentiment and Social Media Analysis, Atlanta, Georgia, pp. 108–114 (2013)
14. Chaturvedi, I., Ragusa, E., Gastaldo, P., Zunino, R., Cambria, E.: Bayesian network based
extreme learning machine for subjectivity detection. J. Franklin Inst. 335(4), 1780–1797
(2017)
15. Haddi, E., Liu, X., Shi, Y.: The role of text pre-processing in sentiment analysis. Procedia
Comput. Sci. 17, 26–32 (2013)
16. Ray, P., Chakrabarti, A.: A mixed approach of deep learning method and rule-based method
to improve aspect level sentiment analysis. Appl. Comput. Inform. (2019). https://doi.org/10.
1016/j.aci.2019.02.002
17. Zhang, D., Xu, H., Su, Z., Xu, Y.: Chinese comments sentiment classification based on
word2vec and SVMperf. Expert Syst. Appl. 42(4), 1857–1863 (2015)
18. Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word
representations. In: Proceedings of the 2013 Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies, Georgia,
pp. 746–751 (2013)
19. Sahayak, V., Shete, V., Pathan, A.: Sentiment analysis of Twitter data. Int. J. Innovative Res.
Adv. Eng. (IJIRAE) 2(1), 30–38 (2015)
20. Baid, P., Gupta, A., Chaplot, N.: Sentiment analysis of movie reviews using machine
learning techniques. Int. J. Comput. Appl. 179(7), 45–49 (2017)
21. Dimitriadis, S.I., Liparas, D.: How random is the random forest? RF algorithm on the service
of structural imaging biomarkers for AD: from ADNI database. Neural Regeneration Res. 13
(6), 962–970 (2018)
Blockchain and Cyber Physical System
Transparent Blockchain-Based Voting
System: Guide to Massive Deployments

Aicha Fatrah1(B), Said El Kafhali1, Khaled Salah2, and Abdelkrim Haqiq1

1 Computer, Networks, Mobility and Modeling Laboratory: IR2M,
Faculty of Sciences and Techniques, Hassan First University of Settat,
26000 Settat, Morocco
{a.fatrah,said.elkafhali,abdelkrim.haqiq}@uhp.ac.ma
2 Electrical and Computer Engineering Department,
Khalifa University of Science and Technology, Abu Dhabi, UAE
khaled.salah@ku.ac.ae

Abstract. With the development of society and its people's democratic
consciousness, voting, as a crucial channel of democracy, has to satisfy the high
expectations of modern society; new technologies and techniques must be deployed
for a better voting experience and to replace cost-inefficient and laborious
traditional paper voting. Blockchain technology is currently disrupting every
industry where security and data integrity are prioritized. In this paper, we
leveraged this technology to propose a design and implementation of a
blockchain-based electronic voting system for large-scale elections. The novelty
of this paper, when compared to other state-of-the-art blockchain-based voting
systems, is that it respects voters' privacy with full transparency for auditing and
user-friendly terminals, which will boost people's confidence in the voting system
and therefore increase the number of participants in the election.

Keywords: Blockchain · Smart contracts · Electronic voting · Zero-knowledge proof

1 Introduction
The democratic regime relies on elections to enable people to formally participate
in the decision making. Currently, there exist two major means of voting: paper-
based voting and electronic voting. The paper voting system comes with some pros,
mainly the ease of use even for illiterate people and the secrecy, since the ballot
is not bound to the voter in any way. But when examining the workflow of this
system, a lot of issues can be detected, including integrity issues; since the system
is run by people, its integrity relies on the trustworthiness of these people. Thus the
system is vulnerable to corruption and human errors. There are also accessibility
issues since the location of the polling stations can be a struggle for people in
remote and rural areas or people with disabilities or even for citizens who might
not be in the country.
Electronic voting is an online voting system that uses cryptography techniques
to ensure anonymity and security. Voters can use their electronic devices
to cast their vote. Election results are automatically counted by the system. The
entire voting workflow is interconnected, so compared to the traditional paper
based system it is more convenient for voters, efficient in terms of organization
and fast. But as promising as it seems, electronic voting also has the following
disadvantages:

– Privacy issues: the voter private information or even the vote choice can be
leaked.
– Security issues: attackers can eavesdrop on transmitted data or even take
control over the system and tamper with the result.
– Integrity issues: the centralized system is a black box; voters cannot verify
whether their vote was counted or not. The centralization also puts the system
in danger of Distributed Denial-of-Service attacks.

In the domain of electronic voting, Blockchain can be the missing puzzle piece for
better e-voting. It can be used as a transparent, auditable and unalterable accounting
database, which will effectively reduce the risks and enhance the performance of the
overall voting system.
This paper is an extension of our initial proof-of-concept version [1], in which
we presented a proof of concept of our blockchain-based electronic voting system.
We made changes to the previous protocol: we removed the need for a token as a
requirement to cast the vote, which reduced the complexity and cost of the overall
system, and we also changed the use of the zero-knowledge proof from proving the
validity of the vote to proving that voters are eligible. In this system, the validity
of the vote is handled by the voting contract, which only accepts valid choices from
the list of predetermined candidates.
The rest of the paper is organized as follows: Sect. 2 presents an overview of
electronic voting state of the art for both centralized and decentralized solutions.
The key concepts of our blockchain-based voting systems are explained in Sect. 3.
Section 4 illustrates the general design of our scheme. Section 5 represents the
implementation of our proposed system and the technological stack used. The
evaluation of our system is presented in Sect. 6. Finally, Sect. 7 includes conclud-
ing remarks and future works.

2 Electronic Voting

Electronic voting evolved and started to replace traditional voting methods
thanks to both the development of Internet technology and the progress made
in modern cryptography. There is a list of important requirements for a large-scale
electronic voting system to be effective:

– Integrity: only eligible voters can participate, and each eligible voter can only
vote once, and votes cannot be altered or deleted from the system.
– Accessibility and availability: voters can remotely access the system to participate
regardless of their physical location, at any time during the entire electoral
period.
– Privacy: the voter choice should always remain anonymous during the election
and post-election period.
– Transparency: the entire system should be auditable by the public, and voters
can verify if their votes were cast and tallied.
– Security: the system should be immune to typical cyber-attacks.
– Affordability: the system has to be affordable for the government to implement
and maintain; it should also be less expensive than traditional paper voting.
– Scalability: the system can support large scale election.

2.1 Centralized Electronic Voting


General electronic voting systems are all based on cryptosystems developed by
cryptographers. David Chaum [2] proposed the first electronic voting scheme
based on blind signatures in 1981; the purpose was to hide the voter identity by
using public-key cryptography and to disconnect the voter from the corresponding
ballot. Later on, more cryptographic protocols were proposed to improve electronic
voting [3]: there exist electronic voting schemes based on blind signatures and ring
signatures, schemes based on homomorphic encryption, and schemes based on hybrid
networks. These schemes depend heavily on a trusted third party to decrypt and count
votes, which puts the voters' privacy at risk of exposure. Also, these systems are
not auditable and put full trust in the authorities managing the election.

2.2 Blockchain-Based Electronic Voting


The advent of Blockchain technology is expected to disrupt both modern elec-
tronic voting and traditional paper voting systems. Blockchain is the backbone
of electronic cryptocurrencies such as Bitcoin and Ether, it is a distributed,
immutable and transparent ledger on peer-to-peer network. Its consensus algo-
rithms like proof of work in Bitcoin solve the inconsistency problem in distributed
systems. That’s why the application of Blockchain in e-voting has gained atten-
tion of researchers and even startups like Agora and Follow my vote. The first
proposition, by Zhao and Chan [4] in 2015, was based on the Bitcoin blockchain
with a penalty and reward system for voters. Other schemes were later introduced,
like Lee, James, Ejeta and Kim [5] in 2016 and Cruz et al. [6] in 2017, but
these two schemes depend heavily on a trusted third party to manage the
system. McCorry et al. [7] used the Ethereum blockchain, which adds a business
logic layer; the voters' privacy is protected with the use of zero-knowledge proofs,
but unfortunately the voting scheme is only binary, which means voters can only
vote with a yes or no, and it does not support multiple candidates. In the same year,
researchers started experimenting with the Zcash blockchain because it offers
anonymity of transactions; it is a blockchain based on zero-knowledge proofs and
can be used to protect voters' privacy. P. Tarasov
and H. Tewari [8] proposed the first Zcash-based electronic voting system; although the
system provides anonymity, it still lacks the logic capability offered by smart
contracts in Ethereum.

3 Key Concepts of Blockchain-Based Voting System


3.1 Blockchain

In 2008, the Bitcoin whitepaper, “Bitcoin: A Peer-to-Peer Electronic Cash
System” [9], was first published. The Bitcoin whitepaper, or as some might call it,
the Satoshi whitepaper, described a decentralized peer-to-peer network that solves
the double-spending problem of digital currencies without relying on a central
authority. Transactions are verified by nodes in the network called miners, who
continuously listen to new transactions coming into the network and race
to solve a hard mathematical problem generated by the system. The work put
into solving the problem requires immense computing power; the first to solve the
problem gets the privilege to add a new block to the blockchain and receives a
bitcoin reward for the proof-of-work.

3.2 Ethereum Blockchain and Smart Contracts

Ethereum is a public blockchain-based platform; it allows building decentralized
applications [10] and runs with a modified version of the Satoshi Nakamoto consensus
via transaction-based state transitions. It supports smart contract functionality,
which adds a business logic layer: a smart contract is first written in a programming
language such as Solidity and then compiled into bytecode for the Ethereum Virtual
Machine that can be deployed to the Ethereum network. Ethereum has two
types of accounts. The first type is the Externally Owned Account (EOA), controlled
by the user; the second type is the contract account, which means a contract run
by its code. Both types of accounts are represented by a 20-byte (160-bit) address,
and both can store Ether (the Ethereum-generated token/cryptocurrency). Transactions
between different accounts have a gas cost, or fee, to encourage miners to
include transactions or the bytecode of a smart contract in the Ethereum blockchain.
Gas is a metric aiming to standardize the fees of operations inside the network.
There are three types of transactions inside the Ethereum blockchain: first, a
fund transfer between externally owned accounts; second, the deployment of a
contract into the network; and third, the execution of an already deployed one.
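The three transaction types can be illustrated with web3.py against a local Ganache node, as in the hedged Python sketch below; web3.py version 6 naming is assumed, and the contract object, ABI and vote function are placeholders rather than the exact interface of the election.sol contract described later.

from web3 import Web3

w3 = Web3(Web3.HTTPProvider('http://127.0.0.1:8545'))   # local Ganache endpoint
admin, voter = w3.eth.accounts[0], w3.eth.accounts[1]   # unlocked test accounts

# 1) fund transfer between externally owned accounts (value given in wei)
w3.eth.send_transaction({'from': admin, 'to': voter, 'value': 10**17})

# 2) contract deployment (abi and bytecode come from the Solidity compiler)
# election = w3.eth.contract(abi=abi, bytecode=bytecode)
# tx = election.constructor(candidate_list).transact({'from': admin})
# address = w3.eth.wait_for_transaction_receipt(tx).contractAddress

# 3) execution of an already deployed contract function (gas is paid by the sender)
# deployed = w3.eth.contract(address=address, abi=abi)
# deployed.functions.vote(candidate_id).transact({'from': voter})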

3.3 Zero-Knowledge Proof

The Zero Knowledge Proof (ZKP) was first introduced by Goldwasser, Micali
and Rackoff back in 1989 [11]. It uses cryptographic primitives that allow proving
that a statement is true about some secret data without actually revealing any
other information about the secret beyond that statement. There exist two types
of ZKP; interactive and non-interactive. Interactive ZKP (IZKP) requires the
prover and the verifier to interact continually to prove a statement, while the
non-interactive ZKP (NIZKP) allows verifying a statement without having the two
parties online, which makes the NIZKP faster and more efficient. In our voting
system, we need a ZKP that a voter is permitted to vote without revealing their
identity or the choice they made. A Zero-Knowledge Set Membership Proof (ZKSMP)
considers the problem of proving that a committed value belongs to some discrete
set. More specifically, for our voting system, it is the problem of proving that a
voter is within the list of eligible voters so that the voter can cast their vote
anonymously.
Zero-knowledge succinct non-interactive arguments of knowledge, or zkSNARKs
[12], allow verifying the correctness of a computation without executing it; the
content of the computation does not need to be known, only the ability to verify
that the computation was executed correctly. Succinct in zkSNARKs means that the
proofs are very small even for complex computations, which is ideal for making the
blockchain network more efficient. The use of zkSNARKs has great potential when
combined with smart contracts; it increases privacy, efficiency and scalability, any
information can be verified without having to reveal it, with only a one-way short
interaction between prover and verifier, and it can also address major scalability
issues facing blockchains by performing complex calculations off-chain. In this
paper, we implemented zkSNARKs for our zero-knowledge proof verification, since
they are fast and light, using the ZoKrates [13] toolbox.

4 System Design and Overview


4.1 System Design
The system architecture, as presented in Fig. 1, has both on-chain and off-chain components; the main role of the off-chain component is to reduce both the storage and the computation cost inside the Ethereum Virtual Machine. The users, who can be voters or the election administration, interact with the e-voting distributed application via a web interface developed in ReactJS and via MetaMask (web3.js) to interact with the blockchain. The database is used to store the list of eligible voters and candidates and to generate the smart contracts.

4.2 Workflow and Roles


The three main phases of our system (Fig. 2) are:

– Pre-voting phase: in this phase the voter needs to provide proof of identity to the election admin (the KYC step); we assume that this step is carried out via some KYC application and that it is secure enough to protect the voter's personal data. The administration has to verify the voters' identity to determine whether they are eligible to vote. If the voter's proof of identity is valid, the admin provides the voter with a secret phrase; this phrase has to be stored and secured because it is going to be used as a proof of knowledge to allow the voter to cast their vote. When the registration phase is over, the admin collects all the hash values of the eligible voters to create the arithmetic circuit with ZoKrates for the verify.sol smart contract. The admin also has to create the list of candidates for the election.sol contract.

Fig. 1. System general architecture.
– Voting phase: in this phase the voter has to provide the proof, and once the proof is validated by verify.sol the vote can be cast. The voter has to change their Ethereum address at this stage to hide their identity; this is the only phase in which the voter has to use a different address, so that even the admin cannot know their vote choice. After casting the vote, the voter gets the id of the transaction, which allows them to audit their vote and make sure it was added to the block. The Rinkeby testnet allows exploring the blocks via its interface [14].
– Post-voting phase: election.sol automatically tallies the votes and returns the election results.

Fig. 2. System workflow and roles.



5 Implementation
5.1 Stack of Technologies
To implement a proof of concept of our system we used the following stack of technologies and tools: Truffle [15] to handle smart contracts and test them locally; Remix [17], the online IDE for the Solidity programming language; MetaMask [16], a wallet gateway to communicate with the blockchain via web3.js; the Rinkeby testnet [14], used to test contracts in an Ethereum-like environment; Ganache [15], a local Ethereum blockchain for deploying contracts, which comes with free test accounts holding fake Ether; and ZoKrates [13], an Ethereum toolkit for zkSNARKs.

5.2 Generating the Proof


There exist three parts in a zkSNARK: the generator G, the prover P, and the verifier V. A third party has to generate G by giving a program c and a random number r as input parameters. The output is the proving key pk and the verification key vk:

(pk, vk) = G(c, r) (1)

The generator then shares the proving key and the verification key with the prover and the verifier. The prover generates the proof by giving a publicly available input x, a witness w, and the proving key as input, expressed as

prf = P(pk, x, w) (2)

The proof is sent to the verifier, who can verify it using vk, x and prf, obtaining true if the proof is valid:

V(vk, x, prf) → true/false (3)

In our case, the prover is the voter and the verifier is the smart contract. The voter proves knowledge of the secret phrase of one of the eligible voters without revealing any information about the voter to the smart contract.
To generate the zero-knowledge set membership proof using zkSNARKs we first need to generate the secret phrase; we used the randomstring JS library via the command line to generate an example of 64 alphabetic characters.
Then we generate the SHA-256 hash of the random phrase by first converting the phrase into a 512-bit binary string and dividing it into four parts of 128 bits each, so that it can be an input to ZoKrates, whose maximum input length is 254 bits. Each part is converted into a base-10 decimal, giving a big-integer representation of the secret phrase. The sha256packed ZoKrates function, imported from 512bitPacked.zok, takes an array of four field elements as input and returns an array of two field elements of 128 bits each. The voters need to prove that they know the secret phrase behind the hash without revealing the secret phrase itself. The arithmetic circuit is created in the ZoKrates high-level language; the circuit is built using the hash values as an alternative to the OR-gate logic that does not exist in ZoKrates. For more voters the same circuit is used by passing an array of the voters' hashes.
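To make the input preparation concrete, the following Python sketch mirrors the conversion described above: a 64-character alphabetic phrase (512 bits) is split into four 128-bit chunks expressed as base-10 integers, and its SHA-256 digest is split into two 128-bit values. The use of Python instead of the randomstring JS library, the variable names, and the exact byte-level packing are assumptions for illustration; whether this packing matches ZoKrates' sha256packed exactly depends on the circuit.

import hashlib
import secrets
import string

# Hypothetical stand-in for the randomstring JS generator: 64 alphabetic
# characters give 64 ASCII bytes = 512 bits of secret material.
secret_phrase = "".join(secrets.choice(string.ascii_letters) for _ in range(64))
raw = secret_phrase.encode("ascii")                      # 64 bytes = 512 bits

# Split the 512-bit input into four 128-bit parts and express each part as a
# base-10 integer, matching the four-field input expected by the circuit.
field_inputs = [int.from_bytes(raw[i:i + 16], "big") for i in range(0, 64, 16)]

# SHA-256 of the same 512-bit input, split into two 128-bit halves, mirroring
# the two-field output shape of ZoKrates' sha256packed.
digest = hashlib.sha256(raw).digest()
public_hash = [int.from_bytes(digest[:16], "big"),
               int.from_bytes(digest[16:], "big")]

print("private circuit inputs:", field_inputs)
print("public hash fields:", public_hash)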

5.3 Smart Contracts

The candidate.sol contract contains the candidate details; it has addCandidate, getNumberOfCandidates and getCandidate functions (Fig. 3).

Fig. 3. Candidate.sol

Fig. 4. Election.sol

The election contract is responsible for the voting action, validating candidates and finally tallying the total votes (Fig. 4).

6 Evaluation
6.1 Gas Consumption
A smart contract uses the concept of gas, which is the fuel of the system: the sender of a transaction needs to pay a fee in Wei (i.e., ETH) for the computational work executed by the invoked smart contract. The more complex the computation executed by the smart contract, the more gas is needed. Setting a gas limit is a way to protect the system from code with infinite loops. The product of gasPrice and gas represents the maximum amount of Wei needed for the execution of a transaction, and it is used by miners to prioritize transactions for inclusion in the blockchain. We introduced an off-chain component in order to reduce gas consumption.

Table 1. Gas consumption by voters

Transaction name     Gas       Cost (Ether)
Proof verification   1625259   0.001625259
Vote                 28108     0.000028108
Total (T1)                     0.001653367

Table 2. Gas consumption by Admin

Transaction name         Gas       Cost (Ether)
Add candidate list       39110     0.00003911
Sha256 contract deploy   1903793   0.001903793
Total (T2)                         0.001942903

Table 1 and Table 2 show an estimate of the gas consumption of the main transactions in our system. The transactions with the highest consumption are the proof verification transaction executed by the voters and the Sha256 contract deployment executed by the admin. The transactions made by the admin are executed only once, in the pre-voting phase, while the transactions made by the voters are executed n times, where n is the number of voters.
To obtain the overall cost of an election with n voters we can calculate

n × T1 + T2 (4)

The total cost is still relatively expensive for a large-scale election, but the code provided is just a proof of concept and has to be further optimized before being used in a real-life election.
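As a worked example of Eq. (4), the short Python sketch below plugs in the per-transaction costs from Table 1 and Table 2; the voter counts are arbitrary illustration values, not figures from the paper.

# Per-voter cost T1 (proof verification + vote) and one-off admin cost T2
# (candidate list + Sha256 contract deployment), in Ether, from Tables 1-2.
T1 = 0.001625259 + 0.000028108
T2 = 0.00003911 + 0.001903793

def election_cost_eth(n_voters: int) -> float:
    """Overall election cost in Ether for n voters (Eq. 4: n * T1 + T2)."""
    return n_voters * T1 + T2

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} voters -> {election_cost_eth(n):.4f} ETH")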

7 Conclusion
Despite the fact that the success of blockchain technology has become conspicuous, many people still do not fully understand it, which limits its ability to emerge and be exploited in fields other than cryptocurrencies. Another problem facing blockchain technology is binding the digital and physical identity of its users: the technology cannot manage identities outside the blockchain, which calls for a third party to do the management work. In this paper we presented the design and implementation of a blockchain-based electronic voting system whose main aim is to bring more transparency into the electoral system, protect voters' privacy and allow anyone to audit the system. Hence, it can increase both the number and the confidence of voters.

References
1. Fatrah, A., El Kafhali, S., Haqiq, A. and Salah, K.: Proof of concept blockchain-
based voting system. In: Proceedings of the 4th International Conference on Big
Data and Internet of Things BDIoT 2019, Article No. 31, pp. 1–5, October 2019.
https://doi.org/10.1145/3372938.3372969
2. Chaum, D.: Untraceable electronic mail, return addresses and digital pseudonyms.
Commun. ACM 24(2), 84–88 (1981)
3. Xiao S., Wang X.A., Wang W., Wang H: Survey on blockchain-based electronic
voting. In: Barolli L., Nishino H., Miwa H. (eds.) Advances in Intelligent Net-
working and Collaborative Systems. INCoS. Advances in Intelligent Systems and
Computing, vol. 1035. Springer, Cham (2019)
4. Zhao, Z., Chan, T.H.: How to vote privately using bitcoin. Springer (2015)
5. Lee, K., James, J.I., Ejeta, T.G., Kim, H.J.: Electronic voting service using block-
chain. J. Digit. Forensics Secur. Law 11, article 8 (2016)
6. Jason, P.C., Yuichi, K.: E-voting system based on the bitcoin protocol and blind
signatures. Trans. Math. Model. Appl. 10, 14–22 (2017)
7. McCorry, P., Shahandashti, S.F., Hao, F.: A smart contract for board room voting
with maximum voter privacy. In: International Conference on Financial Cryptog-
raphy and Data Security, pp. 357–375. Springer (2017)
8. Tarasov, P., Tewari, H.: Internet Voting Using Zcash. IACR Cryptology ePrint
Arch. 2017, 585 (2017)
9. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008). https://bitcoin.org/bitcoin.pdf
10. Buterin, V., et al.: Ethereum white paper. GitHub Repository 1(2013), 22–23
(2013). https://github.com/ethereum/wiki/wiki/White-Paper
11. Goldwasser, S., Micali, S., Rackoff, C.: The knowledge complexity of interactive
proof systems. SIAM J. Comput. 18(1), 186–208 (1989)
12. Reitwiessner, C.: zkSNARKS in a nutshell. 5 December 2016. https://chriseth.
github.io/notes/articles/zksnarks/zksnarks.pdf
13. Zokrates project : Toolbox for zkSNARKs on Ethereum (2019). https://github.
com/Zokrates/ZoKrates
14. Rinkeby Homepage (2020). https://www.rinkeby.io
15. Truffle Suite Homepage (2020). https://www.trufflesuite.com
16. Metamask Homepage (2020). https://metamask.io/
17. Ethereum Remix IDE (2020). https://remix.ethereum.org/
Enhanced Technique for Detecting Active
and Passive Black-Hole Attacks in MANET

Marwa M. Eid1(&) and Noha A. Hikal2


1 ECE-Department, Delta Higher Institute for Engineering & Technology, Mansoura, Egypt
marwa.3eeed@gmail.com
2 Information Technology Department, Faculty of CIS, Mansoura University, Mansoura, Egypt
dr_nahikal@mans.edu.eg

Abstract. MANETs are still in demand for further development in terms of security and privacy. However, the lack of infrastructure, dynamic topology, and limited resources of MANETs pose an extra overhead in terms of attack detection. Recently, applying modified versions of the LEACH routing protocol to MANET has proved a great routing enhancement in preserving node vitality, load balancing, and reducing data loss. This paper introduces a newly developed active and passive black-hole attack detection technique for MANET. The proposed technique, based on weighing a group of selected node features using AdaBoost-SVM on top of an AOMDV-LEACH clustering technique, is considered a stable and strong classifier that can strengthen the weights of major features while suppressing the weights of the others. The proposed technique is examined and tested for detection accuracy and routing overhead. Results show up to 97% detection accuracy with superior execution time for different mobility conditions.

Keywords: AdaBoost algorithm · AOMDV · Black-hole attack · MANET · SVM

1 Introduction

Recently, the mobile ad hoc network (MANET) has gained great importance as a mobile network for extracting and exchanging critical information. It has broad application areas, starting from mobile sensing applications and extending, but not limited, to military communications on battlefields. A drained node battery can stop the node's function and cause a link break; recent approaches therefore deploy low-energy adaptive clustering hierarchy (LEACH) as an energy-efficient MANET routing protocol [1]. Researchers have shown that applying LEACH in a MANET environment provides a long network lifetime and high link reliability under mobility. However, unlike wired networks, a node's behavior is unpredictable and unknown during the data routing operation, which makes MANET more vulnerable to different attacks. Hence, MANET security is still an open issue, and much research proposes different defenses against many kinds of attacks such as [2]: snooping attacks, poisoning attacks, denial of service (DoS) and distributed DoS (DDoS), routing table overflow, and black-

hole attacks. In general, these attacks are categorized into four types [3]: sinking, spoofing, fabrication, and flushing. The most malicious node behavior is sinking, in which one or more nodes do not cooperate in the routing operation of the network and, moreover, drop the packets. Sinking behavior falls under the DoS attack [3]. An intelligent routing algorithm must be able to decide the best route from source to destination and isolate the sinker node upon analyzing these features. The absence of an administrator and the capability of every node to act as a router increase the vulnerability of MANET. The contribution of this paper lies in the ability to exploit the data aggregated by employing LEACH with the reactive on-demand multipath Ad hoc On-demand Multipath Distance Vector (AOMDV) [4] routing protocol in MANET for detecting active and passive black-hole attacks. This protocol works as a hybrid routing technique that integrates the advantages of both reactive and proactive routing. It performs a single route discovery process and caches multiple paths; a new route discovery process occurs when all these paths are broken. The route discovery process is done in a proactive manner, while path selection is done in a reactive manner.
In this paper, a semi-parametric machine learning framework is proposed for weighting the data aggregated from cluster members to detect the black-hole attack, and hence exclude the attacker nodes and delete the paths through them. Machine learning based on semi-parametric analysis provides the advantage of high detection accuracy while reducing the computational complexity. The proposed method performs a conditional proactive routing phase during the lifetime of the MANET to gather neighbor node information, and then applies semi-parametric machine learning to detect the malicious node within an optimized time and with reasonable computational complexity. Finally, these results can be used in reactive routing fashion to guide other nodes. The novelty lies in adjusting the feature weights in an iterative way, which helps greatly in reducing the computational complexity and the detection time, and thus in preserving node vitality during mobility. In addition, the proposed framework has great flexibility in adjusting the threshold value to distinguish between malicious and benign sinker nodes.
The remainder of this paper is organized as follows. The literature survey is presented in Sect. 2. In Sect. 3, the proposed technique is presented and explained in detail. Section 4 introduces the simulation results and discussions. Finally, the conclusions are introduced in the last section.

2 Related Works

One of the most significant challenges in designing the network is securing the links, especially over insecure mediums. However, due to the limited capacity of a node, previous studies have reported that conventional security routines with large computations and communication overhead are improper for MANETs [5]. Moreover, it has been observed that adding a few new metrics and performing minor changes in the structure and operation of routing protocols can increase the performance and security of real-time applications. Ektefa [6] analyzed classification-tree and typical support vector machine (SVM) methods to detect intrusions through a set of attributes such as the information gain of each attribute, the entropy function, etc. Although applying this algorithm gives

better accuracy in the detection process, the required computations, the workload of the network administrator, and the communication overhead of the system remain open problems. The authors in [7] provided a blackhole detection mechanism that develops a dynamic threshold to detect severe changes in the normal behavior of network transactions. Additionally, [8] presented another solution that detects the blackhole attack by always considering the first route reply as the reply from the malicious node and discarding that transaction. Although this solution decreases data loss and increases throughput, it cannot distinguish between malicious and benign packet sinking. Yazhini and Devipriya [9] were among the first to propose a modified AODV routing protocol supported by an SVM model to detect blackhole attacks. A density curve with respect to time was drawn to monitor the packets sent from a source node to destination nodes, with and without a blackhole, in a simple network composed of seven nodes. It was observed that low peaks in the density curve indicate the detection of one malicious node. However, further data collection is required to determine the black-hole node and to generate complete behavior proofs that contain information from both data traffic and forwarding paths, with more evidence, to obtain a more accurate prediction result. Ardjani et al. [10] proposed an SVM enhanced by particle swarm optimization (PSO-SVM) to optimize the accuracy of SVM.
particle swarm (PSO-SVM) to optimize the accuracy of SVM. Hence, Kaur and Gupta
in [11] adopted the idea of integrating the minimum and maximum variants of Ant
colony optimization to SVM (i.e. ACO-SVM) based on AODV routing protocol to
detect only passive blackhole attacks in MANET. Additionally, the authors in [12]
applied the idea of Genetic algorithm to identify packet dropping by passive black
holes in an intrusion detection system. The authors in [13] proposed machine learning
techniques for distinguishing normal and attacked behavior of a network. Furthermore,
the Author in [14] attempted to prevent the black hole attack in MANET by applying a
little modification in AODV protocol based on introducing the reliability factor-based
approach to detect fake RREP. The value of this factor is usually checked to detect the
attacker node; this technique has introduced a better result. Although many researchers
focused on detecting passive black holes, there are limited existing solutions to detect
active black-hole attacks have been proposed. Thus, for a networking problem, a more
effective machine learning model can be introduced by extra representative out of bias
data.

3 The Proposed Technique [Methods/Experimental]

Black-hole attacks are generally classified into two types: passive and active. Figure 1 illustrates the passive and active black-hole attacks. The proposed technique is based on integrating the LEACH routing protocol with the reactive on-demand multipath AOMDV [15] and with machine learning algorithms to enhance the performance of detecting the two types of black-hole attacks. The detection process is based on testing an intelligent group of QoS-significant features. These features have been shown to be efficient indicators of active malicious behavior, compared with other features that could be noisy data distracting the accuracy of detection decisions. Moreover, deploying the AdaBoost weight adaptation algorithm to adapt each feature weight during the learning process provides an efficient monitoring action for

detecting an active black hole. To collect these features, a group of cluster head nodes
is selected periodically to work collaboratively in data aggregation and to build
exchangeable routing reply tables of trustworthy nodes.

Fig. 1. Illustration of black hole attacks; (a) Passive blackhole (b) Active blackhole.

These cluster heads are selected periodically during the whole MANET lifetime based on the LEACH-AOMDV dynamic cluster head (CH) selection technique [16]. At each round, LEACH-AOMDV applies a random method to distribute the energy load among nodes. The CH is chosen as a node with a higher energy level and a threshold probability value. AOMDV applies the basic AODV route discovery technique with an energy-economical perspective. It registers multiple paths from source to destination, one of which is the primary path while the others are alternatives. The primary path, as well as the alternative ones, is used to transmit packets, thus increasing network utilization. Multipath selection is done based on a pre-advertised hop count; the protocol rejects all replies with hop counts equal to or larger than the advertised one. The multipaths are the routes with lower hop counts.
In each round during the whole MANET lifetime, clustering is done through two stages: the cluster set-up stage and the steady-state stage. During the set-up stage the CH is selected, while in the steady-state stage data is sensed. The steady-state stage lifetime is much longer than the set-up stage, to reduce energy consumption. The CH is selected every round based on a mathematical threshold formula that is a function of the total number of nodes within the cluster, the round number, and the node's CH probability (p). To equalize the nodes' energy consumption, the CH workload is distributed among all nodes during the whole lifetime of the MANET by rotating their roles, i.e., the CH of the first round cannot be repeated within the next 1/p rounds. The proposed technique works through two concatenated phases in each round: i) data aggregation, which simulates the process of the reactive protocol to collect

neighbor node features, and ii) malicious node identification, which analyzes the collected data based on machine learning to build trustworthy routing tables, as done in proactive routing protocols. Moreover, the adaptive boosting (AdaBoost) algorithm is strongly recommended for generating stronger classifiers from a set of weak classifiers. The AdaBoost weight update algorithm [17] is applied to adjust the weight of each feature, and iterative learning is used to reduce the computational complexity. The AdaBoost algorithm plays an important role in strengthening or suppressing the weights of input features, which enhances the performance of the SVM learner. SVM is particularly chosen in the detection phase as a nonlinear machine learning algorithm characterized as a stable, strong classifier that provides high detection accuracy. Figure 2 shows a flowchart of the principal processes of the proposed method. This method is repeated periodically during the whole MANET lifetime.

3.1 Phase 1: Set-up Phase

This is the first phase, in which the entire MANET spatial area is divided into a number of clusters and a corresponding cluster head (CH) is selected for each cluster. The residual energy plays an important role in selecting the CH in each round. Each node has the same probability (p) of being a CH in the first round; as the number of rounds increases, the probability of each node being selected again as a CH decreases until all nodes are dead [18].
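For background, the sketch below shows the standard LEACH threshold commonly used for this kind of probabilistic CH rotation; the authors' exact formula is not given in the paper, so this is an assumption for illustration only.

import random

def leach_threshold(p: float, r: int, eligible: bool) -> float:
    """Standard LEACH threshold T(n) for round r with CH probability p.
    A node that already served as CH within the last 1/p rounds is not
    eligible and gets T = 0 (this is the common textbook form, assumed here)."""
    if not eligible:
        return 0.0
    return p / (1 - p * (r % int(round(1 / p))))

def elects_itself_as_ch(p: float, r: int, eligible: bool) -> bool:
    # Each eligible node draws a uniform random number and becomes CH
    # for this round if the draw falls below the threshold.
    return random.random() < leach_threshold(p, r, eligible)

print(elects_itself_as_ch(p=0.1, r=3, eligible=True))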

3.2 Phase 2: Data Aggregation


This phase starts occasionally at the CH of each suspicious cluster, based on the computed jitter value for that cluster. CMs are scattered in the deployment field, and the simulation is performed considering the following parameters: packet delivery ratio, throughput, average number of hops, normalized routing load, number of lost packets, average data delivery, dropped packets, and energy consumption.
Step 1: The CH node nk (where k = 1, 2, …, K; K = total number of suspicious clusters) sends RREQ packets to its neighbors within the cluster, i.e., the CMs.
Step 2: When a neighbor node receives an RREQ packet, it responds with an RREP packet if it has the destination; otherwise, it multicasts the RREQ to its neighbors to continue the route-finding operation.
Step 3: Conventional RREQ and RREP packets contain information about the hop count; malicious neighbors reply with a modified hop count, since a sinker node always advertises a smaller number of hops to flag itself as the shortest path.
Step 4: At each CH node nk, a route reply table (RREPT) is constructed containing essential features associated with each of its neighbor nodes (CMs) ni (i = 1, 2, …, I; I = total number of CMs in each cluster); a minimal sketch of one RREPT entry follows the list below. These features are:
i. Destination node identification number D-ID.
ii. The initiation time (IT); the time at which the RREQ was sent.
iii. Waiting time (WT); the time at which the RREP was received, the difference
between IT and WT represents an end-to-end delay.
iv. Next hop node identification number (NH).

v. The total number of hops count to reach the destination (HC).


vi. Packet delivery rate (PDR) history of each neighbor node.
vii. The residual energy of each node within the cluster
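The following Python sketch shows one possible shape of an RREPT entry holding the features (i)–(vii) above; the field names and types are assumptions for illustration, not the authors' implementation.

from dataclasses import dataclass

@dataclass
class RREPTEntry:
    # One row of the route reply table kept by a cluster head for neighbour ni.
    d_id: int               # (i)   destination node identifier
    init_time: float        # (ii)  IT: time the RREQ was sent
    wait_time: float        # (iii) WT: time the RREP was received
    next_hop: int           # (iv)  NH: next-hop node identifier
    hop_count: int          # (v)   HC: total hops to the destination
    pdr_history: float      # (vi)  packet delivery rate history of the neighbour
    residual_energy: float  # (vii) remaining energy of the neighbour

    def end_to_end_delay(self) -> float:
        # The difference between WT and IT represents the end-to-end delay.
        return self.wait_time - self.init_time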

Fig. 2. Flowchart of the proposed technique

3.3 Phase 3: Malicious Node Identification


Once the RREPT is completed, a data analysis phase is started. This phase aims to
analyze the selected RREP features mentioned above in order to give a decision
regarding the benign versus malicious behavior of nodes.

Step 1: For each feature Fj (j = 1, 2, …, J; J = total number of features), there are N observations recorded over an interval i (i = 1, …, T) under normal conditions. Based on past records, i.e., previous RREPTs, the mean value cFj and the standard deviation dFj of each feature are computed as follows [19]:

$cF_j = \frac{1}{N}\sum_{i=1}^{N} F_{ji}$  (1)

$dF_j = \sqrt{\dfrac{\sum_{i=1}^{N}\left(F_{ji} - cF_j\right)^2}{N}}$  (2)

Step 2: For each new entry, compute a normalized feature value and the Euclidean
distance dj, respectively as:

$\bar{F}_{ij} = \dfrac{F_{ij} - \min F_j}{\max F_j - \min F_j}$  (3)

$d_j\left(\bar{F}_j, cF_j\right) = \sqrt{\sum_{i=1}^{N}\left(F_{ji} - cF_j\right)^2}$  (4)

By the above equations, we can get the base of parametric sampled data used in
machine learners.
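A small Python sketch of Steps 1–2 follows, computing the per-feature mean and standard deviation of Eqs. (1)–(2) and the normalisation and distance of Eqs. (3)–(4) over a matrix of past observations; the synthetic array is a placeholder, not RREPT data.

import numpy as np

# Rows = past observations of one cluster, columns = the J RREPT features.
F_hist = np.random.default_rng(0).normal(size=(50, 7))

c = F_hist.mean(axis=0)                              # Eq. (1): per-feature mean
d_std = np.sqrt(((F_hist - c) ** 2).mean(axis=0))    # Eq. (2): standard deviation

def normalise(F_new: np.ndarray) -> np.ndarray:
    # Eq. (3): min-max normalisation of a new entry against the history.
    return (F_new - F_hist.min(axis=0)) / (F_hist.max(axis=0) - F_hist.min(axis=0))

def distance_from_mean() -> np.ndarray:
    # Eq. (4): Euclidean distance of the observations from the feature means.
    return np.sqrt(((F_hist - c) ** 2).sum(axis=0))

print(normalise(F_hist[-1]), distance_from_mean())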
Step 3: Apply the SVM classifier to the collected sampled pairs $(F_{ij}, \bar{F}_{ij})$. The learners work in a semi-parametric way, since the AdaBoost module for weight adjustment is employed. The initial feature weights all have the same value:

$W_1(t) = \frac{1}{J}$  (5)

The new weights $W_{t+1}$ are iteratively updated based on the calculated SVM classifier's error value $e$ on the sampled pairs $(F_{ij}, \bar{F}_{ij})$ and the previous weight set $W_t$ as [19]:

$W_{t+1} = \dfrac{W_t \exp\left(-\phi_t \, \bar{F}_{ij} \, h_t(F_{ij})\right)}{Z_t}$  (6)

where

$\phi_t = \frac{1}{2}\log\left(\dfrac{1 - e_t}{e_t}\right)$  (7)

and

$h_t = \arg\min_{h_j} e_j = \sum_{i=1}^{N} W_t(i)$  (8)

Here N denotes the total number of weights. The weight update process is iterated until the AdaBoost-SVM optimization problem is solved [20]:

$\min_{W} u(W, \xi) = \frac{1}{2}\|W\|^2 + C\sum_{i} \xi_i$  (9)

where $u$ denotes the optimization function subject to the sampled pairs $(F_{ij}, \bar{F}_{ij})$, $C$ is a regularization parameter, $i$ is the iteration number, and $\xi_i > 0$ is the $i$th slack variable.
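A minimal Python sketch of the boosting loop around an SVM learner (Eqs. 5–8) is given below. It uses sample weights, as in standard AdaBoost; the paper applies the same update idea to feature weights, and the synthetic data and labels are placeholders, not the aggregated RREPT data.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))          # 7 aggregated RREPT features per node
y = rng.choice([-1, 1], size=200)      # +1 benign, -1 black-hole (synthetic)

w = np.full(len(y), 1.0 / len(y))      # Eq. (5): uniform initial weights
for t in range(10):
    clf = SVC(kernel="rbf", C=1.0).fit(X, y, sample_weight=w)
    pred = clf.predict(X)
    err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)  # error e_t
    phi = 0.5 * np.log((1 - err) / err)                            # Eq. (7)
    w *= np.exp(-phi * y * pred)                                   # Eq. (6) numerator
    w /= w.sum()                                                   # Z_t normalisation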

4 Experimental Results and Discussion

To test the proposed technique, a MANET simulation environment was implemented using the NS-2 simulator ver. 2.35 on an Intel Core i3 processor at 2.40 GHz with 2 GB of RAM running Ubuntu 12.04 Linux. Simulating the MANET with the NS-2 simulator ver. 2.35 rather than NS-3 offers a more diverse set of MANET modules, in addition to unifying the implementation environment with similar research in this field [15, 21, 22]. The simulation was conducted over a 1500 m × 1500 m rectangular space with randomly distributed mobile nodes and a communication model with a constant bit rate (CBR) traffic source. The MANET parameters are shown in Table 1.

4.1 Security Analysis


In this paper, the simulations are conducted under different mobility speed scenarios to assess and investigate the performance of the network with and without the effect of an attack. The node speeds are fixed to 5 m/s and 20 m/s, and the nodes move randomly in all directions. The four scenarios are two benign scenarios with node mobility of 5 m/s and 20 m/s, and two more scenarios with the same conditions but exposed to an active sink-hole attack. Now, assume that node 4 and node 5, which belong to clusters 1 and 2 respectively, are attacker nodes. A simple comparison between normal and abnormal network routing parameters shows a significant decrease in throughput, PDR, and jitter values, which is used as a first indication of the existence of an abnormality within certain clusters, meaning that some clusters are marked as suspicious. Table 2 shows the numerical comparison among the different scenarios; the variance in performance stability in the abnormal cases can be noticed clearly when considering the average jitter values experienced in the two scenarios. Clusters whose CH senses a performance abnormality should immediately start building a trustworthy RREPT for their neighbors and announce it.

Table 1. MANET simulation parameters.

Parameter                               Value
MAC protocol                            IEEE 802.11
Antenna model                           Omni-directional
Network scale                           1500 m × 1500 m
Simulation time                         10 s
MAC type                                802.11
Application traffic, routing protocol   CBR, AOMDV/LEACH
Packet size                             1000 bytes/packet
Data rate                               0.1 Mbps
Node velocity, pause time               5, 20 m/s, 1.0 s
No. of mobile nodes                     33 [3 clusters]
Observation parameters                  PDF, end-to-end delay, jitter, throughput

Table 2. Transaction abnormality performance indices


Scenario Status End-to-end (s) Jitter
Scenario (1) Normal 0.0187 0.0048
Abnormal 0.0196 1.1732
Scenario (2) Normal 0.0199 0.0065
Abnormal 0.0242 1.3969

4.2 Detection Analysis


A MATLAB simulation program is used to perform the semi-parametric analysis and to classify the measured performance parameters of the different nodes based on AdaBoost-SVM (Eqs. 1–9). Standard classification performance metrics are used: precision, recall, accuracy, and F-measure, as in [15]. The detection threshold value has a great influence on distinguishing between benign and attacker nodes, since decreasing the threshold value leads to an increase in the false-positive rate, while increasing the threshold value leads to detection failures. Figure 3 shows the performance of the different learners for the last scenario, which is considered the worst, as the threshold value increases. Here, the proposed method shows more detection stability compared with semi-parametric learning alone.
The robustness of the proposed technique is also examined against an increasing number of black-hole attacker nodes. The key indices for testing the MANET performance under the proposed technique are the PDR, the throughput, the average end-to-end delay, and the average packet loss. Figure 4 introduces an

Fig. 3. Detection accuracy versus learner threshold value.

illustration of these key performance parameters against an increasing number of attacker nodes. The proposed algorithm clearly proves to work better compared with the other examined learners. Comparing with other common algorithms in this field, Figs. 5 and 6 respectively show the absolute error and the detection time values under the same attacks. Considering the obtained values, applying the conventional SVM [15] gives an absolute error of 0.148 and an elapsed time of 10.62 s. Moreover, by combining ant colony optimization with SVM (SVM+ACO) [23], the values are 0.122 and 6.99 s, respectively, while particle swarm optimization (SVM+PSO) [9] yields values of 0.094 and 2.19 s; this confirms a better classification accuracy but a higher execution time. The values in the case of applying the decision tree (C4.5) [24] are 0.111 and 2.48 s. Regarding the results, the proposed technique reaches a higher detection rate with a low false alarm rate, while the clustering technique limits the network overhead. Moreover, one of the major disadvantages of LEACH is that the cluster head may die for any reason, whereupon the cluster becomes useless; hence, the data gathered by that cluster's nodes would never reach its destination. Thus, in the future, another clustering protocol can be investigated and examined using the proposed method. Furthermore, to extend this work, multiple cooperative sinkhole nodes will be considered, and different attacks can be examined and tested using the proposed technique. Moreover, a more robust optimizer, such as the one proposed in [25], can be added in the feature selection stage.

Fig. 4. MANET performance against the increasing number of black-hole attacker nodes.

Fig. 5. The absolute error of the proposed algorithm compared with the common existing algorithms (Decision Tree, Naïve Bayes, Random Forest, conventional SVM, SVM-ACO, SVM-PSO).

Fig. 6. Detection time (sec.) of the proposed algorithm compared with the common existing algorithms (Decision Tree, Naïve Bayes, Random Forest, conventional SVM, SVM-ACO, SVM-PSO).

5 Conclusions

Cluster-based routing has been confirmed to be more efficient in terms of extending network lifetime, load balancing, and robustness against different attacks. Although dealing with MANET poses extra overhead in terms of attack detection, the proposed technique has achieved efficient detection accuracy and superior detection time. The selected group of features and the adaptation of their weights based on the AdaBoost algorithm further reduce the time complexity of the SVM classifier, resulting in an accurate and fast detection technique. However, the proposed method mainly depends on the correctness of the data aggregated from RREP packets, so packet modification attacks are not considered in this paper. Moreover, the LEACH algorithm has the ability to detect a passive black-hole attack, since a CH that does not transmit data all the time is marked as a black hole and its probability of being chosen is highly reduced. However, it may be very hard to detect collaborative sinkholes that cooperate to send fake requests (i.e., RREQ and RREP).

References
1. Belavagi, M.C., Muniyal, B.: Performance evaluation of supervised machine learning
algorithms for intrusion detection. Procedia Comput. Sci. 89, 117–123 (2016). https://doi.
org/10.1016/j.procs.2016.06.016
2. Jangir, S.K., Hemrajani, N.: A comprehensive review and performance evaluation of
detection techniques of black hole attack in MANET. J. Comput. Sci. 13, 537–547 (2017).
https://doi.org/10.3844/jcssp.2017.537.547
3. Abdel-Fattah, F., Farhan, K.A., Al-Tarawneh, F.H., Altamimi, F.: Security challenges and
attacks in dynamic mobile ad hoc networks MANETs. In: 2019 IEEE Jordan International
Joint Conference on Electrical Engineering and Information Technology, JEEIT 2019,
pp. 28–33 (2019). https://doi.org/10.1109/JEEIT.2019.8717449

4. Sarika, S., Pravin, A., Vijayakumar, A., Selvamani, K.: Security issues in mobile ad hoc
networks. Procedia - Procedia Comput. Sci. 92, 329–335 (2016). https://doi.org/10.1016/j.
procs.2016.07.363
5. Vimala, S., Khanaa, V., Nalini, C.: A study on supervised machine learning algorithm to
improvise intrusion detection systems for mobile ad hoc networks. Cluster Comput. 22(2),
4065–4074 (2018). https://doi.org/10.1007/s10586-018-2686-x
6. Ektefa, M., Memar, S., Affendey, L.S.: Intrusion detection using data mining techniques. In:
2010 International Conference on Information Retrieval & Knowledge Management
(CAMP), Shah Alam, Selangor, pp. 200–203 (2010)
7. Panos, C., Ntantogian, C., Malliaros, S., Xenakis, C.: Analyzing, quantifying, and detecting
the blackhole attack in infrastructure-less networks. Comput. Netw. 113, 94–110 (2017).
https://doi.org/10.1016/j.comnet.2016.12.006
8. Koujalagi, A.: Considerable detection of black hole attack and analyzing its performance on
AODV routing protocol in MANET (mobile ad hoc network). Am. J. Comput. Sci. Inf.
Technol. 06, 1–6 (2018). https://doi.org/10.21767/2349-3917.100025
9. Yazhini, S.P., Devipriya, R.: Support vector machine with improved particle swarm
optimization model for intrusion detection. Int. J. Sci. Eng. Res. 7, 37–42 (2016)
10. Ardjani, F., Sadouni, K., Benyettou, M.: Optimization of SVM multiclass by particle swarm
(PSO-SVM). In: 2010 2nd International Workshop on Database Technology and Applica-
tions, DBTA 2010, p. 3 (2010). https://doi.org/10.1109/DBTA.2010.5658994
11. Kaur, S., Gupta, A.: A novel technique to detect and prevent black hole attack in MANET.
Int. J. Innov. Res. Sci. Eng. Technol. 3, 4261–4267 (2015). https://doi.org/10.15680/
IJIRSET.2015.0406092
12. Elwahsh, H., Gamal, M., Salama, A.A., El-Henawy, I.M.: A novel approach for classifying
MANETs attacks with a neutrosophic intelligent system based on genetic algorithm. Secur.
Commun. Netw. 2018 (2018). https://doi.org/10.1155/2018/5828517
13. Nagalakshmi, T.J., Rajeswari, T.: Detecting packet dropping malicious nodes in MANET
using SVM. Int. J. Pure Appl. Math. 119, 3945–3953 (2018). https://doi.org/10.5958/0976-
5506.2018.00752.0
14. Gupta, P., Goel, P., Varshney, P., Tyagi, N.: Reliability factor based AODV protocol:
prevention of black hole attack in MANET. In: Advances in Intelligent Systems and
Computing, pp. 271–279. Springer (2019). https://doi.org/10.1007/978-981-13-2414-7_26
15. Shakya, P., Sharma, V., Saroliya, A.: Enhanced multipath LEACH protocol for increasing
network life time and minimizing overhead in MANET. In: 2015 International Conference
on Communication Networks, pp. 148–154. IEEE (2015). https://doi.org/10.1109/ICCN.
2015.30
16. Chandel, J., Kaur, N.: Energy consumption optimization using clustering in mobile ad-hoc
network. Int. J. Comput. Appl. 168, 11–16 (2017). https://doi.org/10.5120/ijca2017914405
17. Tu, C., Liu, H., Xu, B.: AdaBoost typical algorithm and its application research. In: 3rd
International Conference on Mechanical, Electronic and Information Technology Engineer-
ing (ICMITE 2017), pp. 1–6 (2017). https://doi.org/10.1051/matecconf/201713900222
18. Almomani, I., Al-kasasbeh, B., Al-akhras, M.: WSN-DS : a dataset for intrusion detection
systems in wireless sensor networks (2016)
19. Mazraeh, S., Ghanavati, M., Neysi, S.H.N.: Intrusion detection system with decision tree and
combine method algorithm. Int. Acad. J. Sci. Eng. 6, 167–177 (2019)
20. Li, X., Wang, L., Sung, E.: AdaBoost with SVM-based component classifiers. Eng. Appl.
Artif. Intell. 21, 785–795 (2008). https://doi.org/10.1016/j.engappai.2007.07.001
21. Anbarasan, M., Prakash, S., Anand, M., Antonidoss, A.: Improving performance in mobile
ad hoc networks by reliable path selection routing using RPS-LEACH. Concurr. Comput.
Pract. Exp. 31, e4984 (2019). https://doi.org/10.1002/cpe.4984

22. Arebi, P., Rishiwal, V., Verma, S., Bajpai, S.K.: Base route selection using leach low energy
low cost in MANET (2016). https://www.semanticscholar.org/paper/Base-Route-Selection-
Using-Leach-Low-Energy-Low-In-Arebi-Rishiwal/b88542600c90a97a7af0b6eb42f37d7920
c2ecf1. Accessed 19 July 2019
23. Tyagi, S., Kumar, N.: A systematic review on clustering and routing techniques based upon
LEACH protocol for wireless sensor networks. J. Netw. Comput. Appl. 36, 623–645 (2013).
https://doi.org/10.1016/j.jnca.2012.12.001
24. Pavani, K., Damodaram, A.: Anomaly detection system for routing attacks in mobile ad hoc
networks. Int. J. Netw. Secur. 6, 13–24 (2014)
25. El-Sayed, E.-K.M., Eid, M.M., Saber, M., Ibrahim, A.: MbGWO-SFS: modified binary grey
wolf optimizer based on stochastic fractal search for feature selection. IEEE Access 8,
107635–107649 (2020). https://doi.org/10.1109/ACCESS.2020.3001151
A Secure Signature Scheme for IoT Blockchain
Framework Based on Multimodal Biometrics

Yasmin A. Lotfy1(&) and Saad M. Darwish2


1 Faculty of Engineering, Department of Computers, Pharos University in Alexandria, Alexandria, Egypt
yasmin.lotfy11@gmail.com
2 Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
saad.darwish@alexu.edu.eg

Abstract. Blockchain technology has been receiving a lot of attention in recent years for its wide variety of applications in different fields. On the other hand, Internet of Things (IoT) technology is currently considered the new growth engine of the fourth industrial revolution. Current studies have developed techniques to overcome the limitations of IoT authentication for secure scalability in the blockchain network by merely managing the storage of the authentication keys. However, this authentication method does not consider security when extending the IoT to the network; as the nature of IoT ensures connectivity with multiple objects in many places, security threats increase and can cause serious damage to assets. This brings many challenges in achieving an equilibrium between security and scalability. Aiming to fill this gap, the work proposed in this paper adopts multimodal biometrics to extract a high-entropy private key for authentication to improve IoT network security. The suggested model also evaluates the security score of each IoT device using a whitelist through a blockchain smart contract to ensure that devices authenticate quickly and to limit infected machines. Experimental results prove that our model is unforgeable against adaptively chosen message attacks; it also reduces the number of infected devices in the network by up to 49% compared with conventional schemes.

Keywords: Blockchain · IoT · Fuzzy identity signature · Multimodal biometrics · Feature fusion

1 Introduction

As life increasingly moves online, one of the challenges facing Internet users is executing transactions in an environment in which there is no trust among them. This unavoidably increases the need for a cost-efficient, secure data transfer environment. Blockchain is a peer-to-peer network that is cryptographically secure, immutable, and updatable only via agreement among peers [1]. It is a promising technology whereby peers can exchange values using transactions without requiring a central authority; it protects users' privacy and prevents identity theft [2]. Digital signature verification protects a blockchain transaction by ensuring that its creator holds the right

private key [3]. Unfortunately, in several critical fields where highly strict verification is required, this technique does not ensure that the creator of a transaction is an authorized user: there is a possibility that an intruder may obtain the private key and produce illegal transactions.
Permissioned blockchains work for a wide range of industries, including banking, finance, insurance, healthcare, human resources, and digital music services [4]. Blockchain relies on asymmetric cryptography and a digital signature scheme among participants. For a blockchain to identify and maintain proof of ownership of any digital asset, each user has a pair of public and private keys. Private keys are at considerable risk of exposure, which leads to a weak blockchain system [5]. One of the potential solutions to this issue is to use biometric data such as fingerprint, face, or iris as the private key [6]. Since biometric data is part of the body, it provides an efficient way to identify the user. However, despite the advantages of biometric authentication, biometric data is noisy and tends to vary each time it is captured, since two biometric scans produced from the same trait are rarely identical. Consequently, even if the parties use a mutual secret key generated from biometric data, conventional protocols do not guarantee correctness. The fuzzy private key is a suitable solution for the current issues in biometric authentication [7]. The fuzzy private key principle is implemented on top of a fuzzy identity-based signature scheme that allows generating a signature using "fuzzy data" (i.e., noisy data such as a biometric) as the signing and verification key.
The Internet of Things (IoT) offers a more responsive environment by allowing various devices to communicate and exchange information. IoT devices collect and analyze significant amounts of data, such as personal and confidential data from daily life. However, hacking and cyber-attacks targeting IoT devices are growing each year, and, as most IoT devices are low-power and low-performance devices, it is difficult to apply the security methods implemented for traditional PCs to IoT devices; hence, they become vulnerable to cyber-attacks. To solve this problem, attention has been given to the incorporation of blockchain technology and IoT [8].
Motivated by the above challenges and aiming to address them, in this paper we introduce multimodal biometrics technology for authentication in the blockchain system to extend IoT devices securely. A multimodal biometric system decreases the potential for spoofing and helps overcome the weaknesses of unimodal biometric systems. The private keys created from two biometric features are fused to obtain the most unique, high-entropy key. Furthermore, regardless of whether the user is authenticated through the network, our model automatically evaluates the IoT device's security score using the whitelist via a smart contract and restricts the scalability of an infected device when the score is low. In addition, fuzzy matching is exploited to match the digital signature in the verification phase to handle noisy biometric features.
The rest of this paper is organized as follows: Sect. 2 describes some of the recent related works. A detailed description of the proposed model is given in Sect. 3. In Sect. 4, the results and discussion on the dataset are presented. Finally, the conclusion is given in Sect. 5.

2 Related Work

Conventional protection of blockchain private keys is mainly done in two ways: either by encrypting the keys or by relying on hardware- or software-based wallets [9]. These are unsatisfactory, as safety may not be guaranteed; furthermore, all of these wallets must still synchronize with the blockchain. To address the problems described above, in 2018 W. Dai [10] suggested a lightweight wallet based on TrustZone that can build a secure and stable code environment for operations that demand high security. Their approach is more portable than a hardware wallet and more secure than a software wallet.
In recent years, IoT has received much attention from researchers. For instance, in 2016, Samaniego, M. [11] introduced a collaborative system for automatic smart contracts between blockchain and IoT. This approach offers a secure and scalable platform with no centralized authority. The main goal is not only confirming that a correct device has generated a transaction, but ensuring that the proper user has generated that transaction intentionally. Yet, checking the user's intention when generating blockchain transactions is still a challenging issue.
In another work, in 2015, Balfanz, D. [12] used biometric information such as fingerprints and irises in secure hardware to activate the private key. Linking such an authentication method with blockchain ensures a secure blockchain system. However, it is necessary to carry a device with biometric information and to input that biometric information through that dedicated secured equipment.
Since blockchain is still considered a new technology, there is room for making it more efficient and practical in real applications. According to the review above, past studies were primarily dedicated to (1) devising different types of wallets, either offline or online, that store the private key and keep backups of the wallets; (2) creating new techniques for signing and verification by encrypting the private key, which still does not guarantee trusted, secure authentication; (3) not addressing the problem of distinguishing between a transaction issued by a legitimate user and one issued by someone who hacked the private key; and (4) not addressing the issues when an infected device extends across the network. However, to the best of our knowledge, little attention has been paid to suggesting a new approach that merges blockchain and biometrics at the algorithm level to help improve IoT security and usability in a blockchain-based framework.

3 Proposed Methodology

This paper proposes a modified model that combines multimodal biometrics and blockchain technology in a unified framework based on a fuzzy identity-based signature to secure IoT device authentication and allow extension in the blockchain network. We take into consideration that every IoT device has security flaws and is subject to infected software being installed; therefore, we evaluate the security score of the devices by using a whitelist, which defines a list of verified software, and then restrict device extension beyond that list [13]. We apply a multimodal biometrics method based on feature-level fusion of both fingerprint and finger vein to extract a unique biometric private key during the biometric key extraction phase.
During the transaction generation phase, the sender encrypts the data with his private biometric key, and the fuzzy biometric signature is created. The sender adds his biometric signature to a new blockchain transaction; this new block's transaction is then appended to the ledger awaiting approval or rejection. Finally, during the verification phase, strict verification is applied using the biometric public key from the previous transaction, the private biometric key, and the signature. If the verification is valid, the block is added to the public ledger of the blockchain and the data is transmitted through the network; otherwise, the block is rejected. The main diagram of the suggested IoT blockchain model is depicted in Fig. 1. To create and submit a transaction to any user in the IoT network, the user has to go through five phases: evaluating the security score, biometric key extraction, registration, transaction generation, and verification.

Fig. 1. Block diagram of the proposed IoT blockchain authentication model



3.1 Fuzzy Identity Based Signature Scheme


Our proposed model relies on the Fuzzy Identity-Based Signature (FIBS) [14] for generating and verifying blockchain transactions through IoT devices. Since a person can produce a different biometric key every time he initiates a transaction, conventional digital signature schemes are not suitable, as they only allow stable data to be used as a key. FIBS uses fuzzy data, such as a fingerprint, iris, or finger vein, as a cryptographic key. It allows a user with identity w to issue a signature that can be verified with identity w′ if and only if w and w′ are within a certain distance. Refer to [7] for the full steps of building FIBS.

3.2 Security Score Evaluation for IoT Devices Phase


IoT devices can be hacked through various vulnerabilities, but the most common scenarios are malicious programs installed through the user's inattention or through hacker attacks [15]. To resolve such issues, our suggested model evaluates the security score based on verification of the software installed on the IoT device using the whitelist, and records it in the blockchain via a smart contract. The whitelist, located in an agent embedded in a secure area inside the device, contains all the software installed on the IoT device. The security score evaluation is implemented through the following steps: (1) The IoT device manufacturer installs the software on the device, provides the whitelist, and creates smart contracts with both the manufacturer's whitelist and the Initial Agent Hash Value (IAHV) of the agent embedded in the IoT device, recording them in the blockchain through the Whitelist Smart Contract (WSC). (2) The device accesses the WSC recorded in the blockchain and compares the IAHV recorded in the block with the Device Agent Hash Value (DAHV) of the current whitelist installed on the device. (3) If both match, the device is neither infected nor hacked, the security score is set high, and the list of software in the whitelist is transmitted to the device; otherwise, forgery is detected, the connection is restricted, and the manufacturer is alerted with a request for a whitelist update via the smart contract. (4) A Scoring Smart Contract (SSC) that includes the device's unique data and its security status based on the agent software is created and recorded in the blockchain. (5) The SSC of the device can be queried when the device is extended to other networks, to guarantee safe authentication [16].
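The following Python sketch illustrates the hash comparison at the core of this phase: the agent hash recorded at manufacture time (IAHV) is compared with the hash of the whitelist currently on the device (DAHV), and a mismatch restricts the connection. The function names, the hashing of a sorted software list, and the two-level score are assumptions for illustration, not the paper's implementation.

import hashlib
import json

def whitelist_hash(software_list: list) -> str:
    # Canonical hash of the installed-software list (order-independent).
    return hashlib.sha256(json.dumps(sorted(software_list)).encode()).hexdigest()

def evaluate_security_score(iahv: str, installed_software: list) -> str:
    dahv = whitelist_hash(installed_software)
    # Match -> device not tampered with -> high score, extension allowed.
    # Mismatch -> forgery detected -> restrict connection, alert manufacturer.
    return "high" if dahv == iahv else "low"

manufacturer_whitelist = ["sensor-fw-1.2", "mqtt-client-3.0"]
iahv = whitelist_hash(manufacturer_whitelist)          # recorded via the WSC
print(evaluate_security_score(iahv, manufacturer_whitelist))               # high
print(evaluate_security_score(iahv, manufacturer_whitelist + ["trojan"]))  # low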

3.3 Biometric Key Extraction Phase


During this phase, the biometric data is generated by extracting unique features from
both fingerprint and finger vein. We use image enhancement techniques to improve
contrast, brightness, and remove noise. Regions of interest (ROIs) are extracted.
Finger-print vector and finger-vein vector are generated respectively based on a unified
Gabor filter framework. Both vectors are reduced in dimensionality by applying
Principal Component Analysis (PCA) and concatenated to create a Bio-Private key.
The reasons for conducting fusion of fingerprint and finger-vein characteristics are that
(1) finger-vein and fingerprint are two physiological characteristics carried by one
finger and both have reliable properties in biometrics, (2) fingerprints and finger veins
hold complementarity in universality, accuracy and security [17]. Both pre-processing

(region of interest) and fusion steps reduce key dimensions to reduce computational
processing. See [17, 18] for more details. Since we work with a private permissioned
blockchain, where all the participants have known identities, thus, our proposed
scheme is built over the biometric key infrastructure that includes Biometric Certificate
Authority (BCA) for confirming the participant identities in order to determine the
exact permissions over resources and access to information that participants have in a
blockchain network.
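A minimal Python sketch of the feature-level fusion step follows: two feature matrices standing in for the Gabor responses of fingerprint and finger vein are each reduced with PCA and then concatenated into the fused bio-key vectors. The array sizes and component counts are assumptions; the actual Gabor extraction and key derivation are omitted.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
fingerprint_feats = rng.normal(size=(106, 512))   # placeholder Gabor features
fingervein_feats = rng.normal(size=(106, 512))    # one row per enrolled user

# Reduce each modality separately, then concatenate (feature-level fusion).
fp_reduced = PCA(n_components=32).fit_transform(fingerprint_feats)
fv_reduced = PCA(n_components=32).fit_transform(fingervein_feats)
bio_key_vectors = np.hstack([fp_reduced, fv_reduced])

print(bio_key_vectors.shape)   # (106, 64): fused material for the Bio-Private key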

3.4 Registration Phase


In this phase, the user's biometric private key is generated from the user's biometric data w. The public key associated with the private key is also created and certified, then registered in the blockchain network. This phase consists of four steps: (1) confirming the user identity; (2) generating the public key and master key; (3) creating a private key associated with the user's biometric information; (4) issuing a public key certificate. The Biometric Certificate Authority (BCA) issues a Public Key Certificate (PKC) by attaching a digital signature to the public key together with a set of attributes related to the holder of the certificate, such as the user ID, the expiration date, and further information; all of these attributes are signed with the BCA's private key so that tampering will invalidate the certificate. The BCA then registers the PKC in the repository and publishes it to the network. See [19] for more details.
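The sketch below illustrates step (4): the BCA signs the certificate attributes with its own private key so that any tampering invalidates the PKC. Ed25519 and the attribute fields are assumptions used purely for illustration; the paper does not fix a concrete signature scheme for the BCA.

import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

bca_key = Ed25519PrivateKey.generate()            # illustrative BCA signing key
pkc_attributes = {
    "user_id": "user-001",                        # hypothetical holder attributes
    "public_key": "<user-biometric-public-key>",
    "expires": "2022-12-31",
}
payload = json.dumps(pkc_attributes, sort_keys=True).encode()
signature = bca_key.sign(payload)                 # BCA signs the attribute set

# Any participant holding the BCA public key can later check the certificate;
# verify() raises InvalidSignature if the attributes were tampered with.
bca_key.public_key().verify(signature, payload)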

3.5 Transaction Generation Phase


In this phase, the sender generates a new blockchain transaction which includes Owner 3's PKC (the receiver's PKC), the contents, and their hash value H. This phase consists of three steps: (1) creating a new blockchain transaction, (2) generating the fuzzy signature, and (3) issuing the signature to the blockchain. The sender attaches his biometric signature to the new blockchain transaction, and the transaction is then appended to the ledger, waiting to be confirmed or rejected. See [19] for more details regarding these steps.

3.6 Transaction Verification Phase


In this phase, we use a verification algorithm that takes as input the public parameters PP, an identity w′ such that |w ∩ w′| ≥ d, the hash of the message H, and the corresponding signature S. It returns a bit b, where b = 1 means that the signature is valid and b = 0 that it is invalid. The verification is done through a two-phase hierarchical check consisting of two steps: (1) checking the expiration date, and (2) calculating the signature verification result. If the equality holds, BVer = 1 and the verification succeeds; otherwise, BVer = 0 and the verification fails. The client application is then notified that the transaction has been immutably appended to the chain, as well as whether the transaction was validated or invalidated [18, 19].
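A condensed sketch of the two-phase hierarchical verification (expiration check, then signature check) is shown below; the fuzzy-signature verifier of [7] is abstracted behind a hypothetical fuzzy_verify callable.

```python
import time

def verify_transaction(tx: dict, pkc: dict, fuzzy_verify) -> int:
    """Return BVer = 1 if the transaction signature is valid, 0 otherwise.

    tx          : {"hash": H, "signature": S, ...}
    pkc         : Public Key Certificate whose attributes include 'expires_at'
    fuzzy_verify: hypothetical callable implementing the fuzzy identity-based
                  verification with tolerance d (returns True/False).
    """
    # Phase 1: reject transactions signed under an expired certificate.
    if time.time() > pkc["attributes"]["expires_at"]:
        return 0
    # Phase 2: check the fuzzy signature against the hash of the message.
    ok = fuzzy_verify(pkc["attributes"]["public_key"], tx["hash"], tx["signature"])
    return 1 if ok else 0
```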

4 Experimental Results

Currently, there is no open database that contains fingerprint and finger-vein images of the same person. The ready-made fingerprint and finger-vein data are taken from the SDUMLA-HMT database of Shandong University [20]. We therefore assume that the two independent biometric traits belong to the same person. The fingerprint and finger vein images were collected from 106 volunteers. We performed the evaluation on an IoT device with the following specifications: CPU: Intel Core i7-7500U processor at 2.7 GHz; memory: 8 GB; system type: 64-bit Enterprise operating system. The biometric part of the key extraction was implemented in MATLAB. The blockchain part is deployed on the open-source Hyperledger Fabric platform [21], which is used to develop and implement cross-industry blockchain applications and systems. Moreover, we use a fuzzy identity-based signature as the digital signature algorithm during this evaluation.
In our proposed model, we prevent infected software from being installed on the IoT device by automatically evaluating the security score of the device using the whitelist and the smart contract and then recording it to the blockchain. Our proposed scheme ensures maximum scalability when the security score is high and restricts scalability when the security score is low. An adversary has two common attacks at its disposal to obtain that information: one is attacking the private key, and the other is forging the signature.

Table 1. Security level comparison of our proposed model and other signature schemes against
some popular attacks
Signature schemes Private key attack or leakage Forgery of the digital signature
PKSS Low Middle
FKSSU Middle Middle-High
FIKSSM High High

In the following, we analyze the security of our proposed scheme and confirm its effectiveness against common attacks in the blockchain system. We also compare its security level against three signature schemes: the conventional private-key-based signature scheme (PKSS), the fuzzy key signature scheme based on unimodal biometrics (FKSSU), and our proposed fuzzy identity key signature scheme based on multimodal biometrics (FIKSSM). Table 1 illustrates the security level comparison of the three signature schemes. In PKSS, the long-term private key is managed in the IoT device, a cloud server, or a wallet. Therefore, there is a high risk that an adversary steals the private key, forges a digital signature, and creates an illegal blockchain transaction. Losing the private key means losing ownership of the asset forever, as there is no way to recover the key. This makes PKSS vulnerable to security breaches and unable to provide sufficient security in today's connected and data-driven world.
On the other hand, in FKSSU, the private key is not managed in the IoT device; it is derived from the user's biometric information. It is considered a secure scheme, but because it is based on only one biometric trait, it has several problems in capturing clean data, including noisy data, intra-class variations, spoof attacks, and unacceptable error rates. Therefore, FKSSU is not completely secure against this attack. In our FIKSSM, the private key is likewise not managed by the IoT device, as in FKSSU. However, we use the user's fingerprint and finger vein to derive a unique private key. Multimodal biometrics increases the amount of data analyzed, supports accurate matching during verification, and makes it exponentially more difficult for an adversary to spoof. Adding multiple modalities makes it difficult to find and use all the biometric data needed to spoof the algorithm. That makes our system more secure, robust, and able to resist spoofing attacks.
Regarding the digital signature forgery attack, the threat is to produce an illegal blockchain transaction for which the verification of the digital signature succeeds. This threat can be managed by adopting a safe algorithm for generating key pairs. In PKSS and FKSSU, if a secure signature algorithm that is difficult to forge is used, these signature schemes are safe. In FIKSSM, we use a secure fuzzy identity signature algorithm that is proved secure under the EUF-CMA model (Existential Unforgeability against Adaptive Chosen Message Attacks) [7]. In the UF-FIBS-CMA model, the adversary's probability of producing a valid signature of any message under a new private key is negligible.

Fig. 2. Comparison results of vulnerability.

The second set of experiments aims to validate the robustness of the proposed model in terms of vulnerability, defined as the flaws or weaknesses in the IoT device that leave it exposed to threats. It is computed by dividing the number of infected devices by the total number of devices in the network. In Fig. 2, the horizontal axis represents simulation time, and the vertical axis represents the number of IoT devices infected due to connections with other malicious peripheral devices. In the PKSS model, the infected devices are linked directly to the IoT devices in the network, which makes the scheme vulnerable to Distributed Denial of Service (DDoS) attacks in which the attacker uses several infected IoT devices to overwhelm a specific target node; 53% of the devices were infected due to their connection to infected devices.
The FKSSU still isn’t secure enough because it is only depend on only securing the
user authentication and verification using a unimodal biometric key, but ignore the
security score for the device when it extended to the network that make the device
vulnerable to malicious programs created by the user’s carelessness or by hackers’
attacks. On the other hand, in our FIKSSM proposed scheme, the number of devices
connected to malicious devices is reduced to 4%. The number of devices connected to
malicious devices reduced by up to 49% compared to PKSS. As our scheme is
restricted to connection depending on the security score and continuously verifies
software via the agent and the whitelist, the number of infected devices continues to
decrease thanks to the whitelist update.

5 Conclusion

In this work, we introduce a new signature scheme that resolves security and scalability issues in IoT devices based on blockchain technology. The suggested model utilizes multimodal biometrics to securely extend the connection of IoT devices in the network and to guarantee that the person who generates a blockchain transaction is the correct asset owner and not a fraudster. The proposed model improves security in two ways. The first is by using multimodal biometrics as a private key for authentication and verification, applying the fuzzy identity signature to create a blockchain transaction. The second is by deploying whitelist software on IoT devices and recording all installed software on the blockchain via a smart contract. Our proposed model reduces the spread of infected devices by up to 49% compared with conventional schemes. Therefore, the proposed scheme achieves practically high performance in terms of security against spoofing and signature forgery as well as scalability. On the other hand, our proposed model is expected to outperform conventional models in latency and throughput only at the start. For future work, we plan to extend our proposed model to a real-world deployment and examine its performance during implementation.

References
1. Bozic, N., Pujolle, G., Secci, S.: A tutorial on blockchain and applications to secure network
control-planes. In: 3rd IEEE Smart Cloud Networks & Systems Conference, United Arab
Emirates, pp. 1–8, IEEE (2016)
2. Cai, Y., Zhu, D.: Fraud detections for online businesses: a perspective from blockchain
technology. Financ. Innov. 2(1), 1–10 (2016)
3. Zyskind, G., Nathan, O., Pentland, A.S.: Decentralizing privacy: using blockchain to protect
personal data. In: Proceedings of the IEEE Security and Privacy Workshops, USA, pp. 180–
184. IEEE (2015)
4. Min, X., Li, Q., Liu, L., Cui, L.: A permissioned blockchain framework for supporting
instant transaction and dynamic block size. In: Proceedings of the IEEE
Trustcom/BigDataSE/ISPA Conference, China, pp. 90–96. IEEE (2016)

5. Gervais, A., Karame, G., Capkun, V., Capkun, S.: Is Bitcoin a decentralized currency? IEEE
Secur. Priv. 12(3), 54–60 (2014)
6. Murakami, T., Ohki, T., Takahashi, K.: Optimal sequential fusion for multibiometric
cryptosystems. Inf. Fusion 32, 93–108 (2016)
7. Yang, P., Cao, Z., Dong, X.: Fuzzy identity based signature with applications to biometric
authentication. Comput. Electr. Eng. 37(4), 532–534 (2017)
8. Fernández-Caramés, T., Fraga-Lamas, P.: A review on the use of blockchain for the Internet
of Things. IEEE Access 6, 32979–33001 (2018)
9. Goldfeder, S., Bonneau, J., Kroll, J.A., Felten, E.W.: Securing Bitcoin wallets via threshold
signatures. Master thesis, Princeton University, USA (2014)
10. Dai, W., Deng, J., Wang, Q., Cui, C., Zou, D., Jin, H.: SBLWT: a secure blockchain
lightweight wallet based on trust zone. IEEE Access 6, 40638–40648 (2018)
11. Samaniego, M., Deters, R.: Blockchain as a service for IoT. In: IEEE International
Conference on Internet of Things (iThings), China, pp. 433–436. IEEE (2016)
12. Balfanz, D.: Fido U2F implementation considerations. FIDO Alliance Proposed Standard,
pp. 1–5 (2015)
13. Dery, S.: Using whitelisting to combat malware attacks at Fannie Mae. IEEE Secur. Priv. 11
(4), 90–92 (2013)
14. Waters, B.: Efficient Identity-based encryption without random Oracles. Lecture Notes on
Computer Science, vol. 3494, pp. 114–127, Berlin (2005)
15. Alaba, A., Othman, M., Hashem, T., Alotaibi, F.: Internet of Things security: a survey.
J. Netw. Comput. Appl. 88, 10–28 (2017)
16. Christidis, K., Devetsikiotis, M.: Blockchains and smart contracts for the Internet of Things.
IEEE Access 4, 2292–2303 (2016)
17. Ross, A., Govindarajan, R.: Feature level fusion of hand and face biometrics. Biometric
Technol. Hum. Ident. II, USA 5779, 196–204 (2005)
18. Wang, Z., Tao, J.: A fast implementation of adaptive histogram equalization. In: 8th
International Conference on Signal Processing, China. pp. 1–4. IEEE (2006)
19. Zhao, H., Bai, P., Peng, Y., Xu, R.: Efficient key management scheme for health blockchain.
Trans. Intell. Technol. 3(2), 114–118 (2018)
20. Shandong University (SDUMLA). http://mla.sdu.edu.cn/info/1006/1195.htm
21. Cachin, C.: Architecture of the hyperledger blockchain fabric. In: Workshop on Distributed
Crypto-Currencies and Consensus Ledgers, Zurich. vol. 310, p. 4 (2016)
An Evolutionary Biometric Authentication
Model for Finger Vein Patterns

Saad M. Darwish1(&) and Ahmed A. Ismail2


1
Department of Information Technology, Institute of Graduate Studies
and Research, Alexandria University, Alexandria, Egypt
saad.darwish@alexu.edu.eg
2
Higher Institute for Tourism, Hotels and Computers, Seyouf,
Alexandria, Egypt
Gisapp13@gmail.com

Abstract. Finger vein technology is the biometric system that utilizes the vein
structure for recognition. Finger vein recognition has gained a great deal of
publicity because earlier biometric approaches suffered from significant pitfalls. These include the inability to handle imbalanced collections and the failure to extract salient features from finger vein patterns. Such disadvantages have triggered a lack of consistency in the optimization algorithm or have contributed to a decrease in its efficiency. The key objective of the research discussed in this paper is to examine the impact of the genetic algorithm on the selection of the optimum feature vector of the finger vein. This is done by incorporating multiple levels of control genes (a hierarchical genetic algorithm) to boost the variability of the features inside the feature vector while minimizing the correlation among features. The boosted feature selection method yields the ideal feature vector that can handle large intra-class differences and limited inter-class
similarities. The proposed model also offers the idea of reducing the finger vein
features dimension to diminish duplication, but not at the expense of decreasing
accuracy. The performance study of the proposed model is carried out through
multiple tests. The findings indicate an overall increase of 6% is achieved in
accuracy relative to some state-of-the-art finger vein recognition systems present
in the literature.

Keywords: Biometrics · Finger vein recognition · Optimization · Hierarchical genetic algorithm

1 Introduction

Biometrics is seen as a basis of highly robust authentication systems facilitating several advantages over the traditional systems. The reliability of a biometric is a measure of
the extent to which the feature or attribute is sensitive to considerable modification over
time. A highly robust biometric does not change significantly over time [1]. A gener-
alized biometric system is a functional combination of five main components, as shown
in Fig. 1 [2]. The finger vein lies inside the human body and therefore cannot be stolen or counterfeited. These advantages make this modality highly viable for application in commercial venues, residences, or other private places [3]. However, the


texture information of the finger vein is limited, and the pose variation of the finger may cause changes in the finger-vein infrared image. These finger vein variations often produce a high intra-class distance between two images of the same individual and thus degrade the matching performance, even for accurately segmented images [4].

Fig. 1. Components of biometric system and process flow diagram.

Recently, the incorporation of specialized Genetic Algorithms (GAs) into finger-vein-based human identification to improve its performance has received a great deal of attention among researchers working in this field [5]. Unfortunately, the GA chromosome and phenotype structure are assumed to be fixed or pre-defined (characterized by a specific number of parameters/genes). Also, the GA cannot be used in real-time applications because its convergence time is unknown. The Hierarchical Genetic Algorithm (HGA) is an improved version of the standard GA and differs from it in the structure of its chromosomes. The hierarchical structure consists of parameter and control genes: parameter genes exist at the lowest level, and control genes reside at the levels above the parameter genes. The HGA searches over a more extensive search space and converges to the right solution with a higher grade of accuracy [6].

1.1 Problem Statement and Contribution


The uniqueness, reliability, and tolerance to forgery are the finger vein's main features that make user authentication safe. To develop highly accurate personal
identification systems, the finger-vein patterns should be extracted accurately from the
captured images. In general, choosing the best set of discriminative features for
extracting from input data is one of the most challenging aspects of designing a
verification system. Even now, the ideal set of features for separating data can be
realized using optimization algorithms. This research aims to implement the optimum
finger vein identification model, taking into account the more discriminative descriptors
that can significantly improve biometric system performance, using HGA. In the
suggested model, HGA has been employed instead of traditional GA to evolve the
standard structure of GA’s chromosome through two levels of control genes to regulate
parametric genes. This evolved chromosome structure helps to increase the diversity of the genes, thus avoiding convergence to a single optimum (the most favourable solution). Employing the optimal feature selection process produces a narrow search space that can be used to quickly locate a particular subject within a small set of subjects instead of performing an exhaustive search of the complete database.
The rest of the paper is organized as follows: Sect. 2 describes some of the state-of-
the-art related finger vein identification schemes. The detailed description of the pro-
posed model has been made in Sect. 3. In Sect. 4, the results and discussions are given.
Finally, conclusions are drawn in Sect. 5.

2 Related Work

Existing studies regarding finger vein feature extraction can be divided into four main
groups [5, 7–9]: (1) Filtering or transforming methods, (2) Segmenting the pixels cor-
responding to the veins and reflect or directly compare those pixels. (3) Concise
extraction methods. (4) Machine learning approaches, such as neural networks. In recent
years, deep learning has been received much attention from researchers [10–12]. For
instance, in the article [10], the authors suggested a convolutional-neural-network-based
finger vein recognition framework and analysed the network’s strengths over four
publicly accessible databases. The comprehensive set of experiments recorded indicates
that the recommended solution’s precision will surpass 95% of the accurate recognition
rate for all four publicly-accessible datasets. However, the neural network application is
facing some difficulties, including hyper-parameter tuning, which is non-trivial, needs a
big dataset for proper training, is still a black box, and is comparatively slow.
Some pioneering work has been done recently by incorporating wavelet transfor-
mation to characterize the finger vein patterns explicitly. In [13], a new technique is
introduced to extract vein patterns from near-infrared images that are improved by
directional wavelet transformation and eight-way neighbourhood methods to reduce the
necessary computational expense and to conserve essential details from low-resolution
images. However, greater complexity translates into more resources required to per-
form the computation - more memory and processor cycles and time. Furthermore, the
flexibility of DWTs is a two-edged sword - it is sometimes very difficult to choose
which basis to use. The aim of the research presented in [14] is to deliver a new scheme
focused on finger vein patterns by utilizing block standard local binary template and
block two-dimensional principal component analysis approach to minimize data
redundancy efficiently. Next, a block multi-scale uniform local binary pattern feature operator based on an enhanced circular neighbourhood is used to extract the spatial texture properties of finger vein images effectively. See [15] for more details regarding current
finger vein identification approaches.
According to the review above, it can be found that past studies were primarily
devoted to (1) Developing various forms of feature extraction that are used to collect
the pattern of finger vein details (spatial/transformation- features); (2) Failure to resolve
problems relevant to the collection of suitable features from the pools of derived
features for feature extraction algorithms (most frequently depending on the methods
used); and (3) in order to obtain a compact feature vector, most dimensionality-reduction methods depend on a transformation matrix that, when all extracted features are handled, involves extensive calculations. However, to the best of our knowledge (based on Google Scholar), little attention has been given to designing new optimal feature-vector technologies or to boosting their vector-matching efficiency.

3 Methodology

Biometric data cannot be measured directly but require stable and distinctive features
first to be extracted from sensor outputs. The problem of feature selection is to take a set of candidate features and select the subset that performs best under some classification system [3, 16]. This proposal's motivation is the valorization of features that maximize the signature variation in inter-class assessments and minimize it in intra-class assessments [16]. The key difficulty in dealing with features
produced by finger vein tools is the heterogeneity of features with respect to the same
class due to several factors. The proposed finger-vein recognition algorithm consists of
two phases: the enrolment (training) phase and the verification (testing) phase. Both
stages start with finger-vein image pre-processing. After the pre-processing and the
feature extraction step, the finger-vein template database is built for the enrolment
stage. For the verification stage, the enrolled finger-vein image is matched with a
similar pattern after its optimal features are extracted. Figure 2 shows the flowchart of
the proposed model.

Fig. 2. Hierarchal genetic algorithm-based finger vein recognition model.



3.1 Finger Vein Data Acquisition


The database of finger vein images was collected from 123 persons, comprising 83 males and 40 females, from University Sains Malaysia [17]. The age of the subjects ranged from 20 to 52 years. Every subject provided four fingers: the left index, left middle, right index, and right middle fingers, resulting in a total of 492 finger classes. The captured finger images provide two important features: the geometry and the vein pattern. The spatial and depth resolution of the collected finger images were 640 × 480 pixels and 256 grey levels, respectively.

3.2 Region of Interest Extraction and Pre-processing


Collecting finger vein samples can introduce translational and rotational changes into the various images captured from the same finger or individual. Therefore, automated extraction of the region of interest (ROI), which can diminish intra-class deviations, is highly desirable. The useful area is referred to as the “Region of Interest.” The image area without an effective pattern is first discarded since it only holds surrounding (background) data [16]. In this case, the theoretical model can be checked by
a collection of benchmark finger vein datasets, which only have an ROI. The aim of
finger vein image pre-processing is to enhance some image features relevant for a
further processing task. As a result, interesting details in the image are highlighted, and
noise is removed from the image. Finger images include noise with rotational and
translational variations. To eliminate these variations, finger images are subjected to
pre-processing steps that include image filtering, enhancement, and histogram equal-
ization. See [18, 19] for more details.
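As an illustration of this pre-processing chain, the sketch below applies normalization, median filtering, and adaptive histogram equalization with SciPy/scikit-image; the filter size and CLAHE clip limit are illustrative, not the exact settings of [18, 19].

```python
import numpy as np
from scipy.ndimage import median_filter
from skimage import exposure

def preprocess_finger_vein(image: np.ndarray) -> np.ndarray:
    """Denoise and enhance a grayscale finger-vein ROI before feature extraction."""
    img = image.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)   # normalise to [0, 1]
    img = median_filter(img, size=3)                           # suppress impulse noise
    img = exposure.equalize_adapthist(img, clip_limit=0.02)    # contrast enhancement (CLAHE)
    return img
```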

3.3 Feature Extraction


In machine learning and pattern recognition, feature extraction starts from a preliminary
group of measured data. It builds derived values (features) intended to be informative
and non-redundant [3]. The feature extractor’s key objective is to switch the finger vein
images into a set of features that are like to the images of the same class and are
distinctive to the images of a different class. From another point of view, feature
extraction is a dimensionality reduction process, where an initial set of raw variables is
reduced to more manageable groups (features) for processing, while still accurately and
completely describing the original data set. There are two types of features extracted
from the images based on the application. They are local and global features. See [19]
for more details. A combination of global and local features enriches the accuracy of
the recognition with more computational overheads.
In this research, the Gabor filter was employed to extract finger vein pattern’
features. The most important advantage of Gabor filters is their invariance to rotation,
scale, and translation. Furthermore, they are robust against photometric disturbances,
such as illumination changes and image noise. The Gabor filter-based features are
directly extracted from the gray-level images. Three types of information can be found
by using Gabor filters: magnitude, phase, and orientation, which can be individually or

jointly applied in different systems. Readers looking for more information regarding
how to extract Gabor feature vector can refer to [20].
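A minimal sketch of Gabor-magnitude feature extraction over a small filter bank follows; the frequencies and number of orientations are illustrative assumptions rather than the parameters of [20].

```python
import numpy as np
from skimage.filters import gabor

def gabor_feature_vector(image: np.ndarray,
                         frequencies=(0.1, 0.2, 0.3),
                         n_orientations: int = 4) -> np.ndarray:
    """Build a feature vector from the mean and variance of Gabor magnitude responses."""
    feats = []
    for f in frequencies:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            real, imag = gabor(image, frequency=f, theta=theta)
            magnitude = np.hypot(real, imag)
            feats.extend([magnitude.mean(), magnitude.var()])
    return np.asarray(feats)
```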

3.4 Feature Selection Based on Hierarchal Genetic Algorithm


Feature selection is a widely researched topic in the area of machine learning. It has
been found to be valuable in reducing complexity and computation time while improving the accuracy of an identification problem [21]. Obtaining robust features in biometric applications is a hard task because features are influenced by inherent factors such as sexual characteristics, while outside stimuli such as varying illumination cause further difficulties in feature extraction. The suggested model utilizes a hierarchical genetic algorithm to select the optimum features from the initially chosen set. The use of the HGA is particularly essential for the structure or
topology as well as the parametric optimization [22]. Unlike the set-up of the con-
ventional GA optimization, where the chromosome structure is assumed to be fixed or
pre-defined, HGA operates without these constraints. The complicated chromosomes
may provide a good new way to solve the problem, and have demonstrated better
results in complex issues than the conventional GA [22].

Fig. 3. Hierarchical chromosome structure.

One of the main differences between the HGA and the GA is that the HGA can dynamically vary its representation thanks to active and inactive genes, which means that phenotypes of different lengths are available within the same chromosome representation. Hence, the HGA searches over a more extensive search space and converges to the right solution with a higher grade of accuracy. The chromosome structures of the conventional GA are assumed pre-defined or fixed, while the HGA works without these constraints. It utilizes multiple levels of control genes that are introduced hierarchically, as illustrated in Fig. 3 [23].
Since the chromosome structure of the HGA is fixed, even for different parameter lengths, no extra effort is required for reconfiguring the usual genetic operations. Therefore, the standard mutation and crossover methods may be applied independently to each level of genes, or even to the whole chromosome if it is homogeneous. However, genetic operations that affect the high-level genes can result in changes within the active genes, leading to multiple changes in the lower-level genes. This is precisely why the HGA can not only obtain a good set of system parameters but also reach a minimized system topology [22]. Herein, an instance of

a GA-feature selection optimization problem can be described formally as a four-tuple (R, Q, T, f) defined as [5, 6, 22, 23]:
• R is the solution space that represents a collection of n-dim Gabor feature vectors.
Each bit is a gene that characterizes the nonexistence or presence of the feature
inside the vector.
• Q is the probability predicate, i.e., the genetic operators such as crossover and mutation (applied at each level of genes). Crossover is the process of exchanging the parents' genes to produce one or two descendants that carry inherited genes from both parents, increasing the variety of individuals. Herein, a single-point crossover is employed because of its simplicity. The goal of mutation is to prevent falling into a locally optimal solution of the solved problem; a uniform mutation is employed for its simple implementation. The selection operator retains the best-fitting chromosome of one generation and selects a fixed number of parent chromosomes. Tournament selection is the most popular selection technique in genetic algorithms due to its efficiency and simple implementation.
• T is the set of feasible solutions (new-generation populations). Within these new generations, the fittest chromosome represents the finger vein vector with a set of salient elements. This vector specifies the optimal feature grouping according to the identification accuracy.
• f is the objective function (fitness function). The individual with higher fitness wins and is added to the mating pool used by the predicate operators. Herein, the fitness function is computed from the MSE value that measures the difference between the input image and the matched image (a minimal sketch of one HGA generation is given after this list).
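To make the hierarchical encoding concrete, the sketch below runs one generation of an HGA-style feature selection in which a block of control genes switches whole groups of parameter genes (feature-mask bits) on or off. The fitness is a hypothetical evaluate_mse callable, and the operator details are simplified (crossover and mutation are applied to the control level only).

```python
import random

def decode(chromosome):
    """Hierarchical decoding: a group of parameter genes counts only if its control gene is 1."""
    control_genes, param_groups = chromosome
    mask = []
    for c, group in zip(control_genes, param_groups):
        mask.extend(group if c == 1 else [0] * len(group))
    return mask  # feature-selection mask over the Gabor feature vector

def tournament(population, fitness, k=3):
    """Tournament selection: lower MSE fitness is better."""
    contenders = random.sample(list(zip(population, fitness)), k)
    return min(contenders, key=lambda pair: pair[1])[0]

def next_generation(population, evaluate_mse, p_cross=0.8, p_mut=0.3):
    """One HGA generation: tournament selection, single-point crossover,
    and uniform mutation applied to the control genes."""
    fitness = [evaluate_mse(decode(ch)) for ch in population]
    new_pop = []
    while len(new_pop) < len(population):
        ctrl_a, params_a = tournament(population, fitness)
        ctrl_b, _ = tournament(population, fitness)
        if random.random() < p_cross:                      # single-point crossover
            cut = random.randrange(1, len(ctrl_a))
            ctrl_a = ctrl_a[:cut] + ctrl_b[cut:]
        ctrl_a = [1 - g if random.random() < p_mut else g for g in ctrl_a]  # uniform mutation
        new_pop.append((ctrl_a, params_a))
    return new_pop
```

Because switching one control gene deactivates an entire group of parameter genes, phenotypes of different effective lengths coexist in the same fixed-length representation, which is the property exploited by the HGA.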

3.5 Building Finger Vein Feature Set


In the training phase, once the hierarchical genetic procedure made the selection of an
optimal feature set, the features are extracted for each training sample. They are stored
in a database that contains a finger vein image and its feature vector beside the label of
the image that identifies the user. In general, the proposed model’s quality depends on
the number of feature’s vector stored per user, which will be experimentally verified.
Overall, increasing the number of features vectors per user increases the required
storage in the database. In this work, given the training data set, the suggested model
builds the finger vein feature database. In the testing phase, given an inquiry sample PE_fm from the test dataset, the feature extraction stage is applied to obtain F_PE. Then the distance d_Pi between F_PE and each of the K class centers C_i (the feature vector of each user) is calculated as d_Pi = ||F_PE − C_i||, i = 1, 2, …, K. The inquiry sample is then allocated to the cluster with the smallest distance [24].
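Under this reading, matching reduces to a nearest-class-centre assignment, as in the brief NumPy sketch below.

```python
import numpy as np

def match_user(query_features: np.ndarray, class_centers: np.ndarray) -> int:
    """Assign the query finger-vein feature vector F_PE to the closest class centre C_i.

    class_centers: (K, d) array, one centre per enrolled user.
    Returns the index i of the user with the smallest distance ||F_PE - C_i||.
    """
    distances = np.linalg.norm(class_centers - query_features, axis=1)
    return int(np.argmin(distances))
```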

4 Experimental Results

To verify the suggested model, a series of experiments was conducted on the benchmark Finger Vein USM (FV-USM) Database [17]. The database consists of the information
of the finger vein with the extracted ROI (region of interest). The database’s images

were collected from 123 individuals comprising 83 males and 40 females, who were
staff and students of University Sains Malaysia. The age of the subject ranged from 20
to 52 years old. Every subject provided four fingers: left index, left middle, right index,
and right middle fingers resulting in a total of 492 finger classes obtained. Each finger
was captured six times in one session, and each individual participated in two sessions,
separated by more than two weeks. In the first session, a total of 2952 (123 × 4 × 6) images were collected. Therefore, from the two sessions, a total of 5904 images from 492 finger classes were obtained. The spatial and depth resolution of the captured finger images were 640 × 480 pixels and 256 grey levels, respectively.
This set of experiments was run with the following configuration parameters: Generation Number (GN) = 10, Population Size (PS) = 10, Mutation Ratio = 0.3, and Crossover Ratio = 0.8. The suggested system was implemented in MATLAB (2017) on a laptop computer with the following specifications: processor: Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz (up to 2.90 GHz); installed memory (RAM): 8 GB; system type: 64-bit operating system, x64-based processor; Microsoft Windows 10 Enterprise as the running operating system.
The identification accuracies achieved by the state-of-the-art finger-vein-based
biometric systems [10, 13, 14], that are discussed in the literature survey section, are
reported in Table 1 together with the obtained performance with the proposed HGA-
based approach, when using the same training and testing strategies. Herein, Equal
Error Rate (EER) was used for evaluation. An ERR is a point where the false accep-
tance rate and false rejection rate intersects. A device with a lower EER is regarded to
be more accurate. As it can be seen from the reported accuracies, our HGA based
identification model achieves better results. As shown in Table 1, the CNN–based
approach, when implemented to classify finger vein images, performs poorly compared
to the proposed model. Deep learning finger vein identification method performance
can be enhanced by employing large datasets, so there is a need for a large finger vein
image dataset.

Table 1. Performance evaluation of various finger vein classification techniques (using 3 training images per session)
Reference            Method                                                                   EER
R. Das et al. [10]   Convolutional-neural-network-based finger vein recognition framework     0.086
H. C. Hsien [13]     Directional wavelet transformation and eight-way neighbourhood methods   0.163
N. Hu et al. [14]    Two-dimensional principal component analysis approach                    0.124
Proposed model       Feature selection using HGA                                              0.079
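For reference, the EER can be estimated from genuine and impostor similarity-score distributions as sketched below; the scores themselves are hypothetical inputs, and the threshold scan is a generic approximation rather than the evaluation code used in the paper.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores, n_thresholds=1000):
    """Approximate the EER by scanning thresholds over similarity scores
    (higher score = more likely a genuine match)."""
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    thresholds = np.linspace(min(impostor.min(), genuine.min()),
                             max(impostor.max(), genuine.max()), n_thresholds)
    far = np.array([(impostor >= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(genuine < t).mean() for t in thresholds])    # genuine users rejected
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0
```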

The second set of experiments works to verify the function of the selection module
for features to enhance accuracy. Herein, the adaptive features selection procedure is
implemented in order to find the most significant features that help to reduce the total assessment time with no loss of accuracy. The suggested model is applied as a selection
method using both HGA and conventional GA. Tables 2 and 3 show the detailed
confusion matrix for finger vein recognition dependent on HGA and GA, where
population size is set to 100 chromosomes, and the number of maximum generations is
set to 20 for large search space. The crossover and mutation rates are set to 60% and
40%, respectively.

Table 2. Confusion matrix for HGA-based identification (average)
Prediction      Actual: Within class   Actual: Between class
Within class    97.2%                  2.8%
Between class   2.8%                   97.2%

Table 3. Confusion matrix for GA-based identification (average)
Prediction      Actual: Within class   Actual: Between class
Within class    91.3%                  8.7%
Between class   8.7%                   91.3%

As shown in Tables 2 and 3, the proposed HGA-based identification model achieves an accuracy approximately 6% higher than the traditional GA method and reduces the false acceptance rate. One explanation of this result is that the HGA relies on the existence of active and inactive genes, which means that the
phenotype with different lengths is available within the same chromosome represen-
tation. Hence, HGA will search over a larger search space and converge to the right
solution with a higher grade of accuracy. Those chromosome structures of the con-
ventional GA are assumed pre-defined or fixed, while the HGA works without these
constraints. Utilizing HGA forces the GA to maintain a heterogeneous population
throughout the evolutionary process, thus avoiding the convergence to a single opti-
mum. In general, the feature selection problem has a multimodal character because
multiple optimum solutions could be found in the search space [9].

5 Conclusions and Future Work

This study has provided an effective personal identity model focused on the finger vein.
The enhanced finger vein images were fused with local and global characteristics to
obtain the vein’s Gabor transformation-driven design. Use the hierarchal genetic
algorithm in general, offers several benefits as a resource to identify the optimal fea-
tures for the identification of finger veins. The suggested model can help with the use of
control genes to generate discriminated features vectors based on small datasets. Thus,
they provide utility to the proposed model for operating on a small data set instead of
280 S. M. Darwish and A. A. Ismail

using a large data sample to create a significant feature vector, which in turn takes a
considerable amount of time. Due to its relatively low computational complexity in an
online process, the proposed model is perfect for mobile applications. Future research
involves using fuzzy logic to improve representations of finger veins and minutiae
extraction for matching.

References
1. Jaiswal, S.: Biometric: case study. J. Glob. Res. Comput. Sci. 2(10), 19–48 (2011)
2. Vishi, K., Yayilgan, S.: Multimodal biometric authentication using fingerprint and iris
recognition in identity management. In: IEEE International Conference on Intelligent
Information Hiding and Multimedia Signal Processing, pp. 334–341, China (2013)
3. Van, H., Thai, T.: Robust finger vein identification base on discriminant orientation feature.
In: Seventh International Conference on Knowledge and Systems Engineering, pp. 348–353,
Vietnam (2015)
4. Liu, Z., Yin, Y., Wang, H., Song, S., Li, Q.: Finger vein recognition with manifold learning.
J. Netw. Comput. Appl. 33(3), 275–282 (2010)
5. Hani, M., Nambiar, V., Marsono, M.: GA-based parameter tuning in finger-vein biometric
embedded systems for information security. In: IEEE International Conference on
Communications, pp. 236–241, China (2012)
6. Qi, D., Zhang, S., Liu, M., Lei, Y.: An improved hierarchical genetic algorithm for
collaborative optimization of manufacturing processes in metal structure manufacturing
systems. Adv. Mech. Eng. 9(3), 1–10 (2017)
7. He, C., Li, Z., Chen, L., Peng, J.: Identification of finger vein using neural network
recognition research based on PCA. In: IEEE International Conference on Cognitive
Informatics & Cognitive Computing, pp. 456–460, UK (2017)
8. Kono, M., Ueki, H., Umemura, S.: Near-infrared finger vein patterns for personal
identification. Appl. Opt. 41(35), 7429–7436 (2002)
9. Wu, J., Liu, C.: Finger-vein pattern identification using principal component analysis and the
neural network technique. J. Expert Syst. Appl. 38(5), 5423–5427 (2011)
10. Das, R., Piciucco, E., Maiorana, E., Campisi, P.: Convolutional neural network for finger-
vein-based biometric identification. IEEE Trans. Inf. Forensics Secur. 14(2), 360–373 (2019)
11. Liu, Y., Ling, J., Liu, Z., Shen, J., Gao, C.: Finger vein secure biometric template generation
based on deep learning. Soft. Comput. 22(7), 2257–2265 (2017)
12. Jalilian, E., Uhl, A.: Improved CNN-segmentation-based finger vein recognition using
automatically generated and fused training labels. In: Handbook of Vascular Biometrics,
pp. 201–223. Springer, Cham (2020)
13. Chih-Hsien, H.: Improved finger-vein pattern method using wavelet-based for real-time
personal identification system. J. Imaging Sci. Technol. 62(3), 304021–304028 (2018)
14. Hu, N., Ma, H., Zhan, T.: Finger vein biometric verification using block multi-scale uniform
local binary pattern features and block two-directional two-dimension principal component
analysis. Optik 208(1), 1–10 (2020)
15. Mohsin, A., Zaidan, A., Zaidan, B., Albahri, O., et al.: Finger vein biometrics: taxonomy
analysis, open challenges, future directions, and recommended solution for decentralised
network architectures. IEEE Access 8(8), 9821–9845 (2020)

16. Parthiban, K., Wahi, A., Sundaramurthy, S., Palanisamy, C.: Finger vein extraction and
authentication based on gradient feature selection algorithm. In: IEEE International
Conference on the Applications of Digital Information and Web Technologies, pp. 143–
147, India (2014)
17. Ragan, R., Indu, M.: A novel finger vein feature extraction technique for authentication. In:
IEEE International Conference on Emerging Research Areas: Magnetics, Machines and
Drives, pp. 1–5, India (2014)
18. Yang, J., Zhang, X.: Feature-level fusion of fingerprint and finger-vein for personal
identification. Pattern Recogn. Lett. 3(5), 623–628 (2012)
19. Iqbal, K., Odetayo, M., James, A.: Content-based image retrieval approach for biometric
security using color, texture and shape features controlled by fuzzy heuristics. J. Comput.
Syst. Sci. 78(1), 1258–1277 (2012)
20. Veluchamy, S., Karlmarx, L.: System for multimodal biometric recognition based on finger
knuckle and finger vein using feature-level fusion and k-support vector machine classifier.
IET Biomet. 6(3), 232–242 (2016)
21. Unnikrishnan, P.: Feature selection and classification approaches for biometric and
biomedical applications. Ph.D. thesis, School of Electrical and Computer Engineering,
RMIT University, Australia, (2014)
22. Xiang, T., Man, K., Luk, K., Chan, C.: Design of multiband miniature handset antenna by
MoM and HGA. Antennas Wirel. Propag. Lett. 5(1), 179–182 (2006)
23. Guenounou, O., Belmehdi, A., Dahhou, B.: Optimization of fuzzy controllers by neural
networks and hierarchical genetic algorithms. In: Proceedings of the European Control
Conference (ECC), pp. 196–203, Greece (2007)
24. Itqan, S., Syafeeza, A., Saad, N., Hamid, N., Saad, W.: A review of finger-vein biometrics
identification approaches. Int. J. Sci. Technol. 9(32), 1–8 (2016)
A Deep Blockchain-Based Trusted Routing
Scheme for Wireless Sensor Networks

Ibrahim A. Abd El-Moghith(&) and Saad M. Darwish

Department of Information Technology, Institute of Graduate Studies


and Research, Alexandria University, Alexandria, Egypt
{ibrahim.abd.el.moghith,saad.darwish}@alexu.edu.eg

Abstract. Routing is one of the most important operations in Wireless Sensor Networks (WSNs) as it deals with data delivery to base stations. Routing attacks can easily destroy and significantly degrade the operation of WSNs. A trustworthy routing scheme is essential to ensure the protection of routing and the efficiency of WSNs. There is a range of studies on boosting trustworthiness between routing nodes, such as cryptographic schemes, trust protection, or centralized routing decisions. Nonetheless, most routing schemes are impossible to implement in real cases because it is challenging to efficiently classify untrusted actions of routing nodes. In the meantime, there is still no effective way to prevent malicious node attacks. In view of these problems, this paper proposes a trusted routing scheme using a fusion of deep-chain and Markov Decision Processes (MDPs) to improve the routing security and efficiency of WSNs. The proposed model relies on a proof-of-authority mechanism inside the blockchain network to authenticate the sending node's transmission process. The col-
lection of validators needed for proofing is governed by a deep learning tech-
nique focused on the characteristics of each node. In turn, MDPs are
implemented to select the right next hop as a forwarding node that can transfer
messages easily and safely. From experimental results, we can find that even in
the 50% malicious node routing environment, our routing system still has a good
delay performance compared to other routing algorithms.

Keywords: Wireless Sensor Networks · Trusted routing · Deep-chain · Blockchain · Markov Decision Processes

1 Introduction

The multi-hop routing technique is one of the critical technologies of WSN. Never-
theless, the distributed and dynamic features of WSNs make multi-hop routing vulnerable to various patterns of attacks and thus seriously affect security. Classical secure routing schemes target specific malicious or selfish attacks. They are not suitable for multi-hop distributed WSNs as they mainly rely on authentication mechanisms and encryption algorithms [1]. In certain routing algorithms, routing nodes cannot verify the truthfulness of the routing information released by other routing nodes. A malicious node may broadcast fake queue-length information to increase its probability of receiving packets, thereby affecting other routing nodes' routing
scheduling. Current routing schemes find it tricky to identify such malicious nodes, as

it is difficult to accurately distinguish the real-time change in routing information between two routing nodes [2].

Fig. 1. Black hole attack.

Fig. 2. Key elements of blockchain systems.

When a malicious node gets data packets from a neighbor node, it directly discards
packets and does not forward them to its next-hop neighbor node. This generates a data "black hole" in the network that is hard for routing nodes in WSNs to detect (see Fig. 1) [3]. These malicious nodes may be external intrusion attackers or legitimate internal nodes compromised by outside attackers. Trust management has recently become a popular way of ensuring the safety of the routing network. This method allows a routing node to effectively select relatively trustworthy routing links. On the other hand, its usage is limited since the trust values of adjacent routing nodes can only be accessed by a single routing node, which does not fully fit the distributed multi-hop nature of WSNs. The blockchain is a trusted, decentralized, self-organized ledger system ideal for WSNs spread across multiple hops. In recent years, there has been a lot of research on blockchain technology for routing algorithms [4]. The

blockchain is a distributed database maintained by multiple nodes that fundamentally deals with trust and security issues. Figure 2 illustrates the main components of the blockchain. Most crucially, the consensus process determines how the accounting nodes reach agreement to assess the validity of a transaction.
Several standard consensus algorithms are discussed in [5], among them proof of authority (PoA), a Byzantine Fault Tolerant (BFT) consensus algorithm for permissioned and private blockchains. The algorithm relies on a collection of trustworthy entities (i.e., authorities) known as validators. Validators collect, create, and add blocks to the chain for the transactions received from clients. We must, therefore, pay particular attention to the choice of validators. In our scheme, a deep-learning model determines the collection of validators from the set of properties recorded for each node [6]. Recently, reinforcement learning has been used to overcome routing issues by allowing wireless nodes to observe and gather information from their local operating environment, learn, and make efficient routing decisions on the fly. A standard decision-making approach is to choose the best next hop depending on the current scenario. Many researchers in the literature have introduced MDPs as one of the most suitable decision-making methods for such a random dynamic problem. In this case, each
hop in the routing process can be considered as a state, so that each hop decides to
select one of the best hops. Then, following sequential decisions, messages can be
efficiently and safely transmitted to the destination [7].
This article provides a modified trusted routing scheme for WSNs to address the above-mentioned black hole problem. The current model, as distinct from other solutions of the same type, utilizes proof of authority inside the blockchain network in order to authenticate the node transmission phase. To accomplish this aim, a deep neural network is used to pick, based on the characteristics of each node, the salient nodes that act as validators. In addition, MDPs are employed for better routing decisions.
The remainder of this paper is structured as follows: Sect. 2 provides a summary of
current strategies for the reliable routing scheme in WSN. Section 3 describes the
proposed model of efficient routing. Section 4 offers several experimental results to
evaluate the efficiency of the proposed model. Eventually, we finish the paper and
layout future plans in Sect. 5.

2 Related Work

In this section, we review several conventional trustworthy routing strategies for increasing route protection and reliability. Then we introduce some related
approaches to blockchain development routing schemes. Finally, we analyze current
systems implementing MDP to make the right judgment for the delivery of the mes-
sages. The authors in [8] suggested a lightweight, low-energy, adaptive hierarchy
clustering scheme used to detect suspicious behaviors between nodes. As stated in [9], many proposals were presented to provide a stable spatial routing algorithm for a wireless sensor network to classify an incident and transmit details to the base station. The
authors in [10] utilized hierarchical routing algorithms based on several factors like
distance between nodes and the base station, the nodes’ distribution density, and
residual energy of nodes to design a secure routing protocol. In [11], the author suggested a secure communication and routing architecture that embeds the security architecture in the design of the routing protocol.
Recently, several researchers have merged the tamper-proof and traceable features
of blockchain technologies with routing algorithms to increase the stability of routing
nodes. The trusted public key management framework was introduced by De la Rocha et al. [12]. The solution replaced conventional public key infrastructures with a blockchain protocol, thereby removing central authentication and offering a decentralized
inter-domain routing network. In [13], Li et al. developed a multi-link, concurrent,
blockchain-based communications network. The nodes can be marked as malicious or
non-malicious, depending on the particular interconnected factor connectivity tree
algorithm and the behavioral characteristics of the blockchain-based data routing
nodes. Ramezan et al. used smart contracts to build a blockchain-based contractual
routing protocol for routing networks with untrusted nodes [14]. The key idea is that the source node verifies each hop's routing arrival at the smart contract, and nodes with malicious behavior are registered. Subsequent packets will no longer move through an identified malicious node. However, a malicious node under the token algorithm may falsely claim that packets were received, so safety hazards remain.
Recent advances in MDP solvers have made solutions for large-scale structures feasible and have opened new lines of work in WSNs. For instance, the authors in [15]
used MDPs to establish a WSN-controlled transmission power-level routing protocol.
By identifying the optimal policy for the MDP configuration, the preferred power
source is selected. In [16], the authors proposed a solution for managing the selection
of nodes in event-detection applications using mobile nodes equipped with a directional
antenna and a global positioning system. The approach fails to offset the energy usage
of the next transmitting nodes. To summarize, most of the secure protocols offer
security against replay and routing table poisoning attacks, but do not have an adequate
means of defense against black-hole attacks. Current blockchain-based routing protocols depend on the proof-of-work principle to authenticate transactions (packets), which adds further processing overhead. Unlike these protocols, the proposed scheme relies on
proof of authority for authentication that requires less computational time as it depends
only on a few key nodes (validators).

3 The Proposed Secure Routing Scheme

This proposed scheme aims primarily to construct a reliable, trustworthy routing for
wireless sensor networks, using an integration between the deep chain and Markov
decision-making to minimize computational overhead. The main diagram of the pro-
posed scheme, shown in Fig. 3, consists of three phases: constructing a node data structure, electing validators through a deep learning model, and optimizing the next hop via MDPs. The following subsections discuss each of these phases in detail.

Fig. 3. The proposed trusted routing scheme

3.1 Step 1: Build Node Data Structure


In the beginning, all sensors run the same processes and have no role as validators or minion nodes. Each sensor has a unique identifier (e.g., an address), i.e., the sensors are not anonymous. In every transmission, all packets have the same size. In a wireless sensor network, two forms of data transmission exist: direct transmission and multi-hop data transfer; see [17] for more details. In this case, multi-hop data transmission is used. With symmetrical communication, every node in the WSN has the same initial energy, and the nodes remain static. The role of each node, initially set to unstated, is transformed into validator or minion during the initial setup. Each node in the network holds a data structure with several pieces of information about its properties, such as the selected action (validator or not), energy level, coverage, connectivity, status, and the number of its neighbors, as described with an example in Fig. 4. See [17] for more details.

[Figure content: a sample record with fields Action, Energy level (E), Coverage, Connectivity, Status (S), and No. of neighbors (ζ); example values 30, 4, 1, 0, 1, 1.]
Fig. 4. Filled node's data structure

3.2 Step 2: Validators Election Using Deep Neural Learning and Node
Authentication
After the data structure has been established for each node, the features of these nodes are used to select the most prominent nodes, which serve as validators in the authentication step of the blockchain proof framework. The selection is based on a deep neural network. Deep learning approaches aim to learn feature hierarchies, in which higher-level features are built from lower-level ones. The activation potentials provided by each of the individual input measurements of the first hidden layer are used to pick the most suitable features. The selected characteristics give better classification than the initial high-dimensional characteristics. Herein, stacked RBMs (a Deep Belief Network) are employed. See [18] for more details.
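A rough sketch of the validator-election idea follows, using scikit-learn's BernoulliRBM stacked in a pipeline as a stand-in for the Deep Belief Network of [18]; the feature matrix, the historical validator labels, and all hyper-parameters are hypothetical.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def elect_validators(node_features: np.ndarray, labels: np.ndarray, n_validators: int = 5):
    """Rank nodes with a DBN-style model and pick the top-scoring ones as validators.

    node_features: (n_nodes, n_features) matrix built from each node's data structure
                   (energy, coverage, connectivity, number of neighbours, ...), scaled to [0, 1].
    labels       : hypothetical 0/1 indicator of nodes that behaved as reliable validators.
    """
    model = Pipeline([
        ("rbm1", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)),
        ("rbm2", BernoulliRBM(n_components=8, learning_rate=0.05, n_iter=20, random_state=0)),
        ("clf", LogisticRegression(max_iter=500)),
    ])
    model.fit(node_features, labels)
    scores = model.predict_proba(node_features)[:, 1]
    return np.argsort(scores)[::-1][:n_validators]   # indices of the elected validators
```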
The suggested scheme uses the blockchain network, which is essentially a dis-
tributed ledger of tamper-resistant, decentralized, and traceable functionality through-
out the wireless sensor network to improve the trust and reliability of the routing
information. To record related information of each node, the blockchain token trans-
actions are used. The main framework is divided into two parts: the actual routing
network and the blockchain network. In general, packets from the source terminal to
the destination terminal are transmitted to a routing node Ri; this node then selects the
next-hop routing node Rp via the routing policy obtained by the local learning model
(MDP in our case). The local learning model continuously searches and collects
information on the status of the blockchain network for the appropriate network
routing. Upon continuous transmission, the packets will be sent to the targeted routing
node Rt and then to the destination terminal. A unique consensus algorithm is given in
every blockchain platform to ensure fairness of the blockchain transaction. In our
blockchain network, we use the Proof of Authority (PoA) Consensus algorithm that can
handle transactions more effectively.
Throughout our scenario, there are two types of entities in the PoA blockchain
network: (1) the validators are pre-authenticated blockchain nodes; they have advanced
authorization. Their particular tasks include smart contract execution, blockchain
transactions verification, and block release on the blockchain. In this case, a deep
neural network selects the validators. If a malicious validator appears, it can attack at most one contiguous block, and the votes of the other validators will remove it. (2) The minions are less-privileged nodes and cannot perform the verification work of validators in the PoA blockchain. Every routing node in our system is also a minion: it has fewer blockchain privileges and a unique blockchain address. Minions may execute token contracts, activate other contract functions, and check the blockchain transaction information.
We use dedicated blockchain tokens throughout the blockchain network to represent the packets sent to the target nodes. The purpose of a token is to capture, within the smart contract, the digital details of the corresponding packets. Routing nodes initiate token contracts to generate tokens and map the packet's status information onto them. Token transfers between routing nodes are performed through the token agreement. Malicious nodes cannot arbitrarily alter these token transfers because of the consensus mechanism among the validator nodes; in effect, it is the same token that represents a packet as it is exchanged among the routing nodes.
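A minimal sketch of what such a token record might look like is given below; the field names, the SHA-256 digest, and the helper function are hypothetical, since the paper does not define the token layout.

import hashlib, json, time

def make_token(packet_id: str, src: str, dst: str, current_hop: str, queue_len: int) -> dict:
    # Hypothetical packet-status fields mapped onto a token record.
    status = {
        "packet_id": packet_id,
        "src": src,
        "dst": dst,
        "current_hop": current_hop,
        "queue_len": queue_len,
        "timestamp": time.time(),
    }
    # The token carries a digest of the packet status so tampering is detectable.
    digest = hashlib.sha256(json.dumps(status, sort_keys=True).encode()).hexdigest()
    return {"status": status, "digest": digest}

token = make_token("pkt-17", src="N3", dst="N9", current_hop="N5", queue_len=12)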

3.3 Step 3: Next Hop Selection Using MDPs


The key question in WSN routing is how best to choose the next step at every hop. As noted in the literature, the main factors influencing the next-hop decision are trust, congestion probability, and distance to the target. Readers looking for more information on how to compute these factors can refer to [7]. Choosing the optimal next step is a standard decision-making problem driven by the current circumstances, and we apply MDPs to address it, as they are well suited to stochastic dynamic systems. Any hop on the route can be seen as a state, and at each hop the node must pick one of the candidate next hops.
The decision at each stage depends on the current situation, and the overall routing method is effectively a chain of such decisions. Because the number of hops from source to destination is finite, we formulate the problem as a finite Markov decision process. The basic principle is to find a sequence of good hops among the candidates by using the decision metrics above as the criterion in a finite Markov decision control system. Since the wireless sensor network is distributed, central computation of a complete path is unattractive. Every node is therefore responsible for measuring and deciding at each hop. We thus treat the next-hop decision as a single-stage decision process whose goal is to maximize the reward of each move.
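To make the single-stage decision concrete, the sketch below scores candidate next hops with a reward built from the three factors named above and greedily picks the best one; the weights and the linear reward form are assumptions, since the paper only identifies the factors.

def hop_reward(trust: float, congestion_prob: float, distance: float,
               w_trust=1.0, w_cong=1.0, w_dist=0.5) -> float:
    # Higher trust is rewarded; congestion and distance are penalized (assumed weights).
    return w_trust * trust - w_cong * congestion_prob - w_dist * distance

def choose_next_hop(candidates: dict) -> str:
    """candidates maps node id -> (trust, congestion_prob, distance_to_target)."""
    return max(candidates, key=lambda n: hop_reward(*candidates[n]))

neighbors = {
    "N4": (0.9, 0.2, 3.0),
    "N7": (0.6, 0.1, 2.0),
    "N2": (0.8, 0.7, 1.5),
}
print(choose_next_hop(neighbors))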

4 The Experimental Results

The goal of this set of experiments is to evaluate the token transaction latency of the proposed scheme. In this scenario, three kinds of malicious nodes are deployed: (i) a malicious node that reports falsely low queue-length information (10% of the exact amount) but still forwards packets to other routing nodes; (ii) a malicious node that reports correct queue-length information but does not forward any packets to other routing nodes; and (iii) a malicious node that reports falsely low queue-length information and does not forward any packets to other routing nodes. We use the transaction packaging period, i.e., the time span that miners spend packaging a token
transaction, as the measure of the average token transaction latency. We report the token transaction latency of the PoA and PoW blockchain systems as the arrival rate λ increases.
The experimental results are shown in Fig. 5. The transaction latency is relatively stable and does not fluctuate much with the arrival rate λ. The average transaction latency of our PoA blockchain system was around 0.29 ms, while that of the PoW blockchain system was around 0.52 ms. Hence, our blockchain system based on the PoA consensus mechanism saves about 44% of the transaction latency. Such a token transaction delay is acceptable and has little impact on the routing schedule, so it is practical and efficient to use our PoA blockchain system to collect and manage routing scheduling information.
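The reported saving can be checked directly from the two measured averages:

\frac{0.52\ \mathrm{ms} - 0.29\ \mathrm{ms}}{0.52\ \mathrm{ms}} \approx 0.44

i.e., the PoA system removes roughly 44% of the PoW transaction latency.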

Fig. 5. Average transaction latency for both PoA and PoW-based blockchain systems (y-axis: average transaction latency in ms; x-axis: arrival rate λ in packets/slot)

The second set of experiments was conducted to validate the efficiency of the suggested trusted routing scheme in terms of token transaction throughput. Figure 6 shows that the token transaction throughput rises steadily as the rate of concurrent requests increases, and the curve progressively flattens out as the throughput reaches its peak. The token transaction throughput of our blockchain system using the PoA consensus mechanism stabilizes at about 3300 concurrent requests per second, whereas that of the classical blockchain system using the PoW consensus mechanism stabilizes at only about 1900 concurrent requests per second. The experimental results show that the PoA-based scheme processes transactions more efficiently under high request concurrency, owing to its limited number of validators. It is therefore suitable to adopt PoA as the consensus algorithm of the blockchain system, and this PoA blockchain-based routing scheduling scheme can effectively cope with large numbers of concurrent requests in the routing environment.
Fig. 6. Throughput of transaction tokens for both PoA and PoW-based blockchain systems (y-axis: token transaction throughput in times/s; x-axis: concurrent request rate in times/s)

5 Conclusions and Future Work

In this paper, we proposed a trusted routing scheme that fuses deep learning, blockchain, and Markov decision processes to improve the routing network's performance. Blockchain tokens represent the routing packets, and each routing transaction is released to the blockchain network after confirmation by the validator nodes. Routing nodes can thus track dynamic and trustworthy routing information on the blockchain network, since every routing transaction is traceable and tamper-resistant. We also defined the MDP model in depth for the efficient discovery of suitable routes while avoiding routing links to hostile nodes. Our test results indicate that the scheme can effectively resist attacks by hostile nodes and that the system latency is low. In the future, we plan to apply our approach to more routing scheduling algorithms besides the backpressure algorithm to verify its effectiveness and portability. We also plan to incorporate blockchain-based data validation technology.

References
1. Yang, J., He, S., Xu, Y., Chen, L., Ren, J.: A trusted routing scheme using blockchain and
reinforcement learning for wireless sensor networks. Sensors 19(4), 1–19 (2019)
2. Jiao, Z., Zhang, B., Li, C., Mouftah, H.T.: Backpressure-based routing and scheduling
protocols for wireless multihop networks: a survey. IEEE Wirel. Commun. 23(1), 102–110
(2016)
3. Ahmed, F., Ko, Y.: Mitigation of black hole attacks in routing protocol for low power and
lossy networks. Secur. Commun. Netw. 9(18), 5143–5154 (2016)
4. Gomez-Arevalillo, A., Papadimitratos, P.: Blockchain-based public key infrastructure for
inter-domain secure routing. In: International Workshop on Open Problems in Network
Security, pp. 20–38, Italy (2017)
5. Bach, L.M., Mihaljevic, B., Zagar, M.: Comparative analysis of blockchain consensus
algorithms. In: Proceedings of the IEEE International Convention on Information and
Communication Technology, Electronics and Microelectronics, pp. 1545–1550, Croatia
(2018)
6. Bogner, A., Chanson, M., Meeuw, A.: A decentralized sharing app running a smart contract
on the Ethereum blockchain. In: Proceedings of the 6th International Conference on the
Internet of Things, pp. 177–178, Germany (2016)
7. Wang, E., Nie, Z., Du, Z., Ye, Y.: MDPRP: Markov decision process based routing protocol
for mobile WSNs. Commun. Comput. Inf. Sci. Book Series 698, 91–99 (2016)
8. Wang, Y., Ye, Z., Wan, P., Zhao, J.: A survey of dynamic spectrum allocation based on
reinforcement learning algorithms in cognitive radio networks. Artif. Intell. Rev. 51(3), 493–
506 (2018). https://doi.org/10.1007/s10462-018-9639-x
9. Arfat, Y., Shaikh, A.: A survey on secure routing protocols in wireless sensor networks. Int.
J. Wirel. Microw. Technol. 6(3), 9–19 (2016)
10. Deepa, C., Latha, B.: HHCS hybrid hierarchical cluster based secure routing protocol for
wireless sensor networks. In: Proceedings of the IEEE International Conference on
Information Communication and Embedded Systems, pp. 1–6, India (2014)
11. Khan, F.: Secure communication and routing architecture in wireless sensor networks. In:
Proceedings of the IEEE 3rd Global Conference on Consumer Electronics, pp. 647–650,
Japan (2014)
12. De la Rocha, A., Arevalillo, G., Papadimitratos, P.: Blockchain-based public key
infrastructure for inter-domain secure routing. In: Proceedings of the International Workshop
on Open Problems in Network Security, pp. 20–38, Italy (2017)
13. Li, J., Liang, G., Liu, T.: A novel multi-link integrated factor algorithm considering node
trust degree for blockchain-based communication. KSII Trans. Internet Inf. Syst. 11(8),
3766–3788 (2017)
14. Ramezan, G., Leung, C.: A blockchain-based contractual routing protocol for the Internet of
Things using smart contracts. Wirel. Commun. Mob. Comput. 2018, 1–14 (2018). Article ID
4029591
15. Rehan, W., Fischer, S., Rehan, M., Husain, M.: A comprehensive survey on multichannel
routing in wireless sensor networks. J. Netw. Comput. Appl. 95, 1–25 (2017)
16. Kim, H.-Y.: An energy-efficient load balancing scheme to extend lifetime in wireless sensor
networks Expert Syst. Appl. Clust. Comput. 19(1), 279–283 (2016)
17. Darwish, S., El-Dirini, M., Abd El-Moghith, I.: An adaptive cellular automata scheme for
diagnosis of fault tolerance and connectivity preserving in wireless sensor networks.
Alexandria Eng. J. 57(4), 4267–4275 (2018)
18. Wang, T., Wen, C.K., Wang, H., Gao, F., Jiang, T., Jin, S.: Deep learning for wireless
physical layer: opportunities and challenges. China Commun. 14(11), 92–111 (2017)
A Survey of Using Blockchain Aspects
in Information Centric Networks

Abdelrahman Abdellah1,3(B) , Sherif M. Saif1 , Hesham E. ElDeeb1 ,


Emad Abd-Elrahman2 , and Mohamed Taher3
1 Electronics Research Institute of Egypt, Cairo, Egypt
{abdosheham,sherif saif,eldeeb}@eri.sci.eg
2 National Telecommunication Institute, Cairo, Egypt
emad.abdelrahman@nti.sci.eg
3 Computer and Systems Engineering Department, Ain Shams University, Cairo, Egypt

Abstract. A host-centered networking paradigm was adopted in today's Internet architecture to meet the needs of early Internet users. Nevertheless, the use of the Internet has grown, with most people mainly interested in retrieving large quantities of information regardless of its physical location, and these users usually have immediate demands. Hence, the demands placed on the Internet have taken a new form and changed the web paradigm, with stronger needs for security and connectivity. This has driven researchers to consider a fundamental change to the architecture of the Web. In this respect, we review many research attempts that explored information centric networking (ICN) and how it can be coupled with blockchain technology to strengthen its security aspects and to develop an effective model for building the future Internet.

Keywords: Information Centric Network · Blockchain · Internet infrastructure · Future Internet

1 Introduction

The concept of the information centric network (ICN) is a promising common approach across various prospective Internet research projects. The technology involves in-network caching, multi-sided communication via replication, and interaction models that decouple senders and receivers [1]. The objective is to provide a better-suited network infrastructure service that is more resilient to disruption and failure. The ICN concept started under different names such as Named Objects, Named Data, and Information Aware Networking [2]. ICN represents a paradigm shift from a user-centric to a content-centric model [3]. Since the Internet is mainly used to share information rather than to connect pairs of end users, ICN aims to serve existing and anticipated requirements [3] better than the current Internet architecture,
whose networking size is expanding remarkably in terms of memory cost, power computation, and processing complexity [4]. Furthermore, the current traditional Internet architecture faces many problems regarding content availability, such as:
– content moved within the site.
– content moved to a different site or site changed domain.
– source temporarily unreachable (e.g. server overloaded).
– content permanently unavailable (e.g. company out of service).
In contrast, ICN adopts an in-network caching model and multicast mechanisms, naming information at the network layer to enable effective, real-time delivery of information to consumers. However, although the ICN model is an up-and-coming solution that tackles many issues of the current Internet, it still faces several challenges. One of these is the possibility of tampering with the original data when the publisher registers its content at ICN nodes. Another challenge arises when a malicious ICN node refuses to forward data to other ICN nodes or users, which causes additional delay in the network. To this end, with the development of blockchain technology, most of these ICN problems can be addressed using its powerful security properties. Under the blockchain paradigm, all executed transactions that record ICN node behaviours [5] are committed to the global blockchain. Each blockchain ledger stores a copy of the data; thus, no ICN node can deny or repudiate the transactions that have been committed to the blockchain. The blockchain achieves global agreement on the whole sequence of contents; therefore, an inconsistent record or transaction will be removed once it is detected. These non-repudiation and non-tampering characteristics of the blockchain guarantee the secure availability of contents in ICN.
The rest of the paper is organized as follows. Section 2 surveys related work and previous ICN implementations, followed by a comparison among them in terms of the applied security model. Section 3 discusses how blockchain can be used as a solution in ICN, and Sect. 4 covers the role of blockchain in protecting the public key system of ICN. Finally, a brief summary and conclusion are provided in Sect. 5.

2 Related Work and Previous Implementations


2.1 Data Oriented Network Architecture (DONA)
DONA radically changes naming by substituting flat names for hierarchical URLs. Unlike URLs, which are tied to specific locations by their DNS portion, the flat names in the Data Oriented Network Architecture (DONA) may remain permanent even if the data changes. This allows the caching and replication of information at the network layer, thus increasing the availability of information. DONA also allows explicit requests to target copies other than the closest one. Furthermore, DONA uses existing IP mechanisms to protect its nodes against security attacks [6]. In this architecture:
– The publisher sends a REGISTER message containing the name of the object to its local Resolution Handler (RH), which maps the content's name to its location and stores a pointer to the publisher.
– This registration is then propagated by the RH to other RHs.
– A user sends a FIND message to its local RH in order to find a content; the RH also spreads this message to other RHs according to their routing policy until a matching registration entry is found. The request then follows the pointers created by the RHs to reach the publisher [6].

2.2 Named Data Networking (NDN)

NDN intends to reconfigure the Internet protocol stack around the exchange of named data over a variety of networking technologies. Names in NDN are hierarchical and may be similar to URLs, and every name element may be anything, including a human-readable string or a hash key [7,8]. As shown in Fig. 1, all messages are forwarded hop-by-hop by Content Routers (CRs), which map information names to the output interface(s) that should be used to forward INTEREST messages towards appropriate data sources. The Content Store (CS) serves as a local cache for information objects that have passed through the CR. The subscriber sends an INTEREST that contains the name of the requested data object. When an information object that matches the requested name is found at a publisher node or in a CS, the INTEREST message is satisfied and the information is returned in a DATA message, which is forwarded back to the subscriber(s) in a hop-by-hop manner.
On the other hand, in NDN the subscriber is responsible for deciding whether to trust the content owner and the public key that was used for signing [7,9].

Fig. 1. Named data networking (NDN)

2.3 Scalable and Adaptive Internet Solutions (SAIL)

The SAIL architecture supports many extended properties, such as searching for specific data objects by entering keywords [3]. SAIL is able to combine elements present in NDN and other approaches; furthermore, it can operate in a hybrid mode and can be implemented over different routing and forwarding technologies [10]. In SAIL, a Name Resolution System (NRS) is used to map object names to locators that can be used to reach the corresponding information objects. To make an information object available, the publisher sends a PUBLISH message with its locator to the local NRS [10]. The local NRS sends a PUBLISH message to the global NRS, which stores the mapping between the publisher and the local NRS, replacing any earlier such mapping. If a subscriber is interested in an information object, it sends a GET message to its local NRS, which consults the global NRS in order to return a locator for the object. Finally, the subscriber sends a GET message to the publisher using the returned locator, and the publisher responds with the information object in a DATA message.
In line with this, the SAIL architecture relies on hash values in names, which allows self-certification of both the authority and the local parts of a name [11].

2.4 Convergence
The CONVERGENCE architecture (see Fig. 2) has many similarities with NDN; in fact, its prototype has been implemented as an enhancement of the NDN model [12]. Subscribers submit INTEREST messages to request an information object; these are forwarded hop-by-hop by the Border Nodes (BNs) to publishers or to Internal Nodes (INs) that provide caching (arrows 1–3 and 6). Publishers respond with DATA messages, which follow the opposite path (arrows 7–10).
Moreover, CONVERGENCE adopts the NDN protection strategy per DATA message, and each DATA message is electronically signed [13].

Fig. 2. Convergence model

2.5 Mobility First


This project is funded by the United States future Internet infrastructure initiative [14]. MobilityFirst offers comprehensive protocols for managing mobility and cellular communications, as well as multicast. The structure of MobilityFirst is based on separating names from network addresses for all entities connected to the network (including data objects, devices, and services) [16]. A Globally Unique Identifier (GUID) is allocated to each network object in MobilityFirst via a global naming system that converts human-readable names into GUIDs. Every MobilityFirst device has to obtain GUIDs for its information objects and its services. In MobilityFirst, all interaction begins via a Global Name Resolution Service (GNRS), with GUIDs converted into network addresses in one or more phases. A publisher wishing to make information available requests a GUID from the naming system and then registers the GUID in the GNRS with its network address. Each GUID is mapped to a set of GNRS database addresses that are contacted using regular routing [15]. The subscriber sends a GET message to its local Content Router (CR) that includes the GUID of the requested object along with its own GUID for the response [14]. Furthermore, MobilityFirst proposes a decentralized trust model for name certification, and names can be securely bound to entities via cryptographic techniques.

2.6 General Comparison Among ICN Architectures in Terms of Security

The main focus of the proposed architectures is data integrity rather than dependence on the media and on IP-based solutions. All ICN approaches identify security as an essential problem, especially in the encryption mechanisms used and at the data naming level of the existing Internet structure.
In DONA [6], self-certifying techniques are used to verify that the received data matches the requested data, while the NDN and CONVERGENCE models require the consumers to accept the publisher's public key and its signature [9]. In the MobilityFirst architecture, GUIDs perform all such tasks using self-certifying hashes and other cryptographic techniques. PURSUIT is the only noted model that checks incoming and outgoing packets at the forwarding nodes as well as at the destination nodes [17] (see Table 1).
Many implementations of ICN rely on self-certifying names (see Table 1), which require the network nodes to test whether the name on a packet fits the data within the packet. This makes it hard for consumers to determine whether the information is what they needed, or who may publish, distribute, and remove it. In fact, ICN solutions depend on cryptographic keys and trusted parties to verify information names; thus, key management mechanisms are becoming vitally necessary, yet very little research has been done in this scope [1].
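For illustration, the check that self-certifying names impose on network nodes can be sketched as follows; the use of SHA-256 is an assumption, as the surveyed architectures differ in the exact digest they bind into names.

import hashlib

def self_certifying_name(data: bytes) -> str:
    # The name of an information object is derived from the data itself.
    return hashlib.sha256(data).hexdigest()

def verify_packet(name: str, data: bytes) -> bool:
    # A node accepts a packet only if the carried data hashes to the requested name.
    return self_certifying_name(data) == name

content = b"example information object"
name = self_certifying_name(content)
assert verify_packet(name, content)          # untampered content passes
assert not verify_packet(name, b"tampered")  # modified content is rejected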
Therefore, blockchain technology has emerged as a complement to ICN to tackle these drawbacks, because the formation of trust relationships and effective privacy protection among the different parties involved remains an open issue in ICN.
Table 1. The used security modes in ICN architectures

Architecture name Signatures Self-certifying Packet level authentication


DONA No Yes No
NDN Yes No No
SAIL No Yes No
PURSUIT No No Yes
CONVERGENCE Yes No No
Mobility first No Yes No

3 Using Blockchain as a Solution in ICN


The blockchain is defined as a distributed and tamper-proof ledger that no centralized entity controls but that can be shared and accessed by all members. Each record is called a block and can be appended to the existing blocks as long as the new block is approved by all ledger nodes in the network. Using cryptographic hash functions, blockchain enforces data integrity by preventing alteration, deletion, manipulation, or invalid data from being recorded. It may be stated that blockchain naturally fits ICN approaches because of its decentralized nature and its strong security properties. The integration of blockchain technology and ICN has become more popular recently, and this combination has achieved a significant positive impact in many previous studies such as [18,19].
In ICN, all nodes operate together to improve content delivery. However, these nodes are vulnerable to many security problems and malicious behaviours, such as:

– Denial of Service (DoS) attacks: DoS attacks in ICN abuse the stateful forwarding plane, targeting either the intermediate ICN nodes or the publisher nodes [5].
– Hijacking: a malicious ICN node can, acting as a publisher, declare invalid paths to any content. Since content requests in ICN are routed along these declared but invalid paths, they will not be served in the vicinity of the malicious node.
– Cache pollution: an adversary can repeatedly request less popular content in order to undermine the popularity-based caching in ICN.

For these purposes, the authors of [5] developed Blockchain-Based Information-Centric Networking (BICN). In this model, the blockchain carries transactions that record ICN node behaviours. The system efficiently feeds the behavioural reports of ICN nodes into the ledger in order to detect and track any fraudulent entities. By applying blockchain and exploiting its advantages in ICN, many security issues can be solved, such as:

– The hash of inappropriate content rarely appears in the blockchain, because each hash value is verified by the blockchain miners.
– Jamming and hijacking attacks can be avoided. When an adversary acts as an authorized user to deliver unnecessary or harmful content, it first needs to send a request message to its local RH and broadcast this request to the parent RH. Both the local and parent RHs need to upload the request message and the address of the subscriber to the blockchain (see Fig. 3). The blockchain then verifies the credibility of the address and of the request message; if it is not legitimate, the unnecessary or malicious content request is removed.
– The blockchain system is used to identify the malicious ICN node behind hijacking attacks. When a malicious ICN node declares an invalid path for some content as a publisher, the blockchain can help detect and delete the fake register message recorded on it.

Open issues in BICN:


– Privacy concerns: it has been shown that anonymizing the transaction address still cannot guarantee the anonymity of the users, and some deliberate attacks can still pose threats.
– Compared with Bitcoin, the number of transactions in BICN is far greater; hence, broadcasting the transactions challenges the communication network.
In summary, the main challenge of ICN caching is its distributed in-network nature. Data integrity can be assured by having producers or publishers sign the contents and by authenticating the interest objects for these contents1. Moreover, the in-network caching proposed by ICN increases the chance of attacks because of its long-lived content caching. Another issue is the presence of intermediate content-aware nodes between the publisher/producer and the consumers. Designing a security model for the ICN architecture therefore faces several challenges: the model is not end to end (i.e., not transparent), caches are long-lived, and secure key management is needed for key generation, distribution, and refreshing. We consider that blockchain can overcome these challenges and provide an efficient and secure approach for ICN.

1 ICN Research Challenges (2014): https://www.ietf.org/archive/id/draft-kutscher-icnrg-challenges-02.txt.

4 The Role of Blockchain in Protecting Public Key System in ICN
As discussed in Sect. 2, many ICN architectures such as NDN use signatures and a Public Key Infrastructure (PKI) based on Certificate Authorities (CAs) to ensure authentication and data integrity [9]. However, this kind of security mechanism suffers from serious issues. For instance, a CA can be compromised by a hacker to use an unauthorized public key and produce malicious data with a counterfeit certificate.
In contrast, blockchain can offer a significant solution to this problem by providing a decentralized public key management system. This approach helps many organizations decide together on the status of a shared public key database. To accomplish this aim, blockchain enables every digital transaction to be recorded in a secure, transparent, and non-repudiable manner. The main concept is to establish a blockchain of public keys in each domain (e.g. /com, /net, /gov), maintained by many miners who validate the originality of the public keys and create blocks containing the certified keys [20]. In other words, instead of relying on an individual CA to issue a certificate, as in traditional methods, blockchain allows multiple distributed entities to verify the security certificates, using the majority rule to reach consensus about the state of the issued public keys. If more than half of the miners (the validator nodes) report a positive result about an issued key, then this key is approved and its related transactions are recorded in the blockchain.
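As a minimal illustration of this majority rule, the sketch below approves a key only when more than half of the miner (validator) votes are positive; the vote-collection step is assumed, since [20] defines its own protocol.

def approve_public_key(votes: list) -> bool:
    # A key is approved only if a strict majority of miners report a positive check.
    return sum(votes) > len(votes) / 2

miner_votes = [True, True, True, False, True]  # hypothetical results from 5 miners
print(approve_public_key(miner_votes))          # True -> key recorded on the blockchain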

Fig. 3. BICN

5 Conclusion

In this paper, we have presented a study of the information-centric networking research area. We discussed the challenges associated with using ICN, including problems related to the efficient and cost-effective delivery of content, the need for consistent and exclusive naming of the data objects, and the significant problem of security, which has been addressed by several related works that use blockchain technology to enhance the security aspects of ICN. After listing the challenges, we described a set of common information-centric implementations that can be used as building blocks to construct an architecture that satisfies the criteria raised by these issues. We also outlined the characteristics that make blockchain technology a possible candidate for many applications. For the existing networking implementations, we described each infrastructure, addressed its regulations, and outlined conventional solutions to the needed services and their challenges. Finally, we presented some related works that combine blockchain aspects with ICN approaches. Nonetheless, owing to a variety of problems, such technology implementations are still debated. Future studies should overcome these problems and verify the various blockchain implementations in specific, real-world environments.

References
1. Nikos, F., Giannis, F., George, C.: Access control enforcement delegation for
information-centric networking architectures. In: Proceedings of the Second Edi-
tion of the ICN Workshop on Information-Centric Networking, Helsinki, Finland,
pp. 85–90 (2012)
2. Kutscher, D.: It’s the network: towards better security and transport performance
in 5G. In: IEEE Conference on Computer Communications Workshops (INFOCOM
WKSHPS), San Francisco, USA, pp. 656–661 (2016)
3. Ahlgren, B., Dannewitz, C., Imbrenda, C., Kutscher, D., Ohlman, B.: A survey of
information-centric networking. IEEE Commun. Mag. 50(7), 26–36 (2012)
4. Corella, F.: User authentication with privacy and security. Unfunded Proposal to
the NSF Secure and Trustworthy Cyberspace (SaTC) Program (2012)
5. Li, H., Wang, K., Miyazaki, T., Xu, C., Guo, S., Sun, Y.: Trust-enhanced content
delivery in blockchain-based information-centric networking. IEEE Netw. 33(5),
183–189 (2019)
6. Koponen, T.: A data-oriented network architecture. Teknillinen korkeakoulu (2008)
7. Wu, T.Y., Lee, W.T., Duan, C.Y., Wu, Y.W.: Data lifetime enhancement for
improving QoS in NDN. In: ANT/SEIT, pp. 69–76 (2014)
8. Content Centric Networking project. http://www.ccnx.org/
9. NSF Named Data Networking project. http://www.named-data.net/
10. FP7 SAIL project. http://www.sail-project.eu/
11. Xylomenos, G., et al.: A survey of information-centric networking research. IEEE
Commun. Surv. Tutor. 16(2), 1024–1049 (2013)
12. FP7 CONVERGENCE project. http://www.ict-convergence.eu/
13. Salsano, S., Detti, A., Cancellieri, M., Pomposini, M., Blefari-Melazzi, N.:
Transport-layer issues in information centric networks. In: Proceedings of the Sec-
ond Edition of the ICN Workshop on Information-Centric Networking, Helsinki
Finland, pp. 19–24 (2012)
14. NSF Mobility First project. http://mobilityfirst.winlab.rutgers.edu/
15. Vu, T., Baid, A., Zhang, Y., Nguyen, T.D., Fukuyama, J., Martin, R.P., Raychaud-
huri, D.: A shared hosting scheme for dynamic identifier to locator mappings in the
global internet. In: IEEE 32nd International Conference on Distributed Computing
Systems, pp. 698–707, Washington, DC United States (2012)
16. Baid, A., Vu, T., Raychaudhuri, D.: Comparing alternative approaches for network-
ing of named objects in the future internet. In: Proceedings of IEEE INFOCOM
Workshops, pp. 298–303 (2012)
17. Lagutin, D.: Redesigning internet-the packet level authentication architecture.


Licentiate’s thesis, Helsinki University of Technology (2008)
18. Ortega, V., Bouchmal, F., Monserrat, J.F.: Trusted 5G vehicular networks:
blockchains and content-centric networking. IEEE Veh. Technol. Mag. 13(2), 121–
127 (2018)
19. Mori, S.: Secure caching scheme by using blockchain for information-centric
network-based wireless sensor networks. J. Sig. Process. 22(3), 97–108 (2018)
20. Yang, K., Sunny, J.J., Wang, L.: Blockchain-based decentralized public key man-
agement for named data networking. In: The International Conference on Com-
puter Communications and Networks, Hangzhou, China (2018)
Health Informatics and AI Against
COVID-19
Real-Time Trajectory Control
of Potential Drug Carrier Using
Pantograph “Experimental Study”

Ramy Farag1(B) , Ibrahim Badawy1 , Fady Magdy1 , Zakaria Mahmoud2 ,


and Mohamed Sallam1
1 Helwan University, Cairo 11795, Egypt
Ramy.Maher92@h-eng.helwan.edu.eg
2 Mechatronics Engineering Department, High Institute of Engineering, Giza, Egypt

Abstract. Microparticles have the potential to be used for many medical purposes inside the human body, such as drug delivery and other operations. In this paper, we present a teleoperation system that allows an operator to control the position of a microparticle by defining a trajectory for it to follow over a distance using a 2 DOF haptic device in a real-time manner. The reference position is sent continuously from the haptic device to the local setup, and in order to achieve real-time operation, minimum rates for updating the set point and the control signal are preset. The mechanism controlling the microparticle consists of four electromagnetic coils used as wireless actuators to remotely control the motion of the microparticle. To achieve closed-loop control, a microscopic camera measures the actual position of the microparticle while it floats in the water. The results show that the operator can control the microparticle while achieving a real-time system response. Moreover, an auto-tuned control system is deployed to guarantee position control with a maximum settling error of less than 8 µm in a step response experiment, making the system a candidate for further evaluations inside microfluidic channels.

Keywords: Drug carrier · Object tracking · Image processing · Real-time system · Control · Auto-tuning

1 Introduction
Microparticles can be coated with drugs and injected into the human body to work as drug delivery robots or to perform microassembly operations [1]. Under the influence of magnetic fields, the particle can be positioned at the required coordinates where the medicine needs to be delivered. Such a micro-manipulation process requires a precise micro-robotic system that allows the physician to remotely control the motion of the particle and at the
same time feel the interaction forces between the particle and the environment
inside the human body. Several researchers proposed different systems for the
wireless micro-manipulation of the magnetic particles as in [1] and [2]. Kummer
et al. [3] demonstrated a system with 8 coils to control a microrobot that has a
diameter of 500 µm. A haptic interface was presented in [4] to enable the physi-
cian to feel the interaction forces arising from the contact between the particle
and a microbead without visual feedback. Sun et al. [5] developed a similar but
autonomous system with visual servoing and precision position control.

Fig. 1. Our system, comprising 4 coils, microparticle and its reservoir, real-time embed-
ded board and microscopic camera

However, the researchers who have worked on systems similar to this one have not emphasized the design details of their controllers or an exact procedure for selecting the controller gains [6–9]. The system's parameters are dynamic, which makes accurate modeling infeasible. This uncertainty has many causes; one is the presence of the microparticle in a water container: the microparticle should float on top of the water, yet over time it continues to submerge until it fully settles on the reservoir's bottom. The microparticle is also affected by its surroundings, such as the residual magnetism left in the coils. These conditions change the system's parameters and increase the uncertainty as time passes.
In this paper, we present our developed system (see Fig. 1 and Fig. 3), in which a microparticle with a diameter of 100 µm is controlled to follow a trajectory provided by the operator. We also developed the system to respond in a real-time manner; a system working in real time is more reliable under critical conditions [16]. We also tackled the problem of selecting the controller's gains by deploying an auto-tuning optimization algorithm to select
the parameters. The system has two separate devices: a slave device that consists of 4 coils generating a magnetic field on the microparticle, and a 2 DOF haptic device that is used for controlling the trajectory of the microparticle. More details about the system are given in the next section. The microparticle's (Fig. 2) control algorithm is built on MyRio, which is a real-time embedded evaluation board.

Fig. 2. The paramagnetic particles used in the experiments. The particles have diam-
eters 100 µm [6]

Our previous work on similar systems [18–21] includes deploying control schemes for controlling the systems' output and performance.

2 Experimental Setup
The system comprises a real-time embedded evaluation board with a real-time controller running on it, 4 coils and their driver to control the microparticle in two-dimensional space, and a pantograph robot.

Fig. 3. The system setup layout



2.1 Real-Time Controller


The MyRio embedded board can operate in real time, which allows us to achieve a real-time controller. The whole monitoring and control algorithm is divided into multiple tasks (layers) (see Fig. 4 and Fig. 5); some run on the PC connected to the MyRio board and others run on the MyRio board itself. This provides computational parallelism, which minimizes the overall computation time.

Fig. 4. Main layer of the control algorithm

Fig. 5. Pantograph’s inverse kinematic layer

The PID controller is auto-tuned by running an optimization algorithm to calculate the controller gains that minimize the settling error, the rise time, and the settling time. In addition, the minimum rate for updating the control signal and for computing the new set-point position via the pantograph is set to 20 updates per second; thus real-time control is achieved.
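A minimal sketch of such a fixed-rate loop is shown below; the gains, the 1-D plant stand-ins, and the helper callables are assumptions for illustration, not the MyRio implementation used in the paper.

import time

KP, KI, KD = 1.0, 0.1, 0.05   # assumed gains; the paper auto-tunes these
DT = 1.0 / 20.0               # minimum update rate of 20 updates per second

def control_loop(read_setpoint, read_position, apply_coil_currents, steps=100):
    integral, prev_err = 0.0, 0.0
    for _ in range(steps):
        start = time.monotonic()
        err = read_setpoint() - read_position()
        integral += err * DT
        derivative = (err - prev_err) / DT
        apply_coil_currents(KP * err + KI * integral + KD * derivative)
        prev_err = err
        # Sleep the remainder of the period so updates never fall below 20 Hz.
        time.sleep(max(0.0, DT - (time.monotonic() - start)))

# Toy stand-ins so the sketch runs: a fixed target and a first-order 1-D plant.
pos = {"x": 0.0}
control_loop(lambda: 100.0,
             lambda: pos["x"],
             lambda u: pos.update(x=pos["x"] + 0.01 * u),
             steps=20)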

2.2 Pantograph Robot


It is 4 link system, with two encoders to determine the two main angles of the
its configuration (see Fig. 1). The pantograph system can be used as two degree
Real-Time Trajectory Control of Drug Carrier 309

of freedom robot. However, in our system it is used to enable the operator to


control the trajectory of the microparticle by manipulating its end-effector.
The pantograph has 4 main angles. However, two of them are passive angles,
which can be calculated using the other two angles as follows [17]:
   
f = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}
  = \begin{bmatrix} l_1 \cos q_1 + l_2 \cos q_2 - l_3 \cos q_3 - l_4 \cos q_4 - l_0 \\
                    l_1 \sin q_1 + l_2 \sin q_2 - l_3 \sin q_3 - l_4 \sin q_4 \end{bmatrix}   (1)

The two passive angles are q2 and q3 (see Fig. 6); all the link lengths are known, and q1 and q4 are given by the pantograph's encoders. q2 and q3 can be calculated iteratively using the Newton-Raphson method as follows:
q_p^{i+1} = q_p^{i} - \left( \frac{\partial f}{\partial q_p} \right)^{-1} f   (2)

where p represents the angle index (2–3) and (1) is the holonomic constraint of the pantograph. The main use of the pantograph, however, is to generate a trajectory for the microparticle to follow, so the x-y coordinates of the pantograph's end-effector are also required. This can be done in two ways; the first uses the Newton-Raphson method on the holonomic constraints (1) together with the following equations to determine the x-y coordinates:

x - l_1 \cos q_1 - l_2 \cos q_2 = 0   (3)

y - l_1 \sin q_1 - l_2 \sin q_2 = 0   (4)

Fig. 6. Pantograph’s Kinematics [17]



The second method for obtaining the x-y coordinates is the inverse kinematics method (see Fig. 5), which is preferred over the Newton-Raphson method, since the Newton-Raphson method is computationally more expensive and does not give exact solutions [17].
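For illustration, the Newton-Raphson iteration of Eqs. (1)-(2) can be sketched as follows; the link lengths and initial guesses are assumed values, and the Jacobian is formed numerically for brevity rather than analytically as in [17].

import numpy as np

l0, l1, l2, l3, l4 = 0.1, 0.15, 0.2, 0.2, 0.15  # assumed link lengths (m)

def constraints(q2, q3, q1, q4):
    # Holonomic loop-closure constraints of Eq. (1).
    f1 = l1*np.cos(q1) + l2*np.cos(q2) - l3*np.cos(q3) - l4*np.cos(q4) - l0
    f2 = l1*np.sin(q1) + l2*np.sin(q2) - l3*np.sin(q3) - l4*np.sin(q4)
    return np.array([f1, f2])

def solve_passive_angles(q1, q4, q2=1.0, q3=2.0, tol=1e-9, max_iter=100):
    for _ in range(max_iter):
        f = constraints(q2, q3, q1, q4)
        if np.linalg.norm(f) < tol:
            break
        # Numerical Jacobian d f / d [q2, q3]
        eps = 1e-6
        J = np.column_stack([
            (constraints(q2 + eps, q3, q1, q4) - f) / eps,
            (constraints(q2, q3 + eps, q1, q4) - f) / eps,
        ])
        q2, q3 = np.array([q2, q3]) - np.linalg.solve(J, f)  # Newton-Raphson step of Eq. (2)
    return q2, q3

print(solve_passive_angles(q1=0.8, q4=2.3))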

3 Results

Four experiments were conducted: the first two evaluate the system's response when it is given a trajectory to follow (see Fig. 7 and Fig. 8), and the second two evaluate the system's response to a single step input (see Fig. 9). One of the first two experiments and one of the second two are done with the PID controller gains "P1" (Fig. 7), which are calculated using the optimization algorithm to minimize the cost function (5), where E is a vector holding the error at every sampling time. The other two experiments are done with the initial PID controller gains "P2" (Fig. 8).

F(P_k) = E^{T} E   (5)
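To illustrate how such a cost can drive gain selection, the sketch below evaluates F(P) = E^T E over random candidate gain sets on a toy first-order plant and keeps the cheapest one; the plant model, the search strategy, and all numeric ranges are assumptions, since the paper does not specify its optimizer.

import numpy as np

def closed_loop_error(gains, setpoint=1.0, steps=200, dt=0.005):
    kp, ki, kd = gains
    x, integ, prev_err = 0.0, 0.0, setpoint
    errors = []
    for _ in range(steps):
        err = setpoint - x
        integ += err * dt
        u = kp * err + ki * integ + kd * (err - prev_err) / dt
        x += dt * (-x + u)           # toy first-order plant, not the coil/particle model
        prev_err = err
        errors.append(err)
    return np.array(errors)

def cost(gains):
    e = closed_loop_error(gains)
    return float(e @ e)              # F(P) = E^T E, as in Eq. (5)

rng = np.random.default_rng(1)
candidates = rng.uniform([0, 0, 0], [20, 5, 1], size=(200, 3))  # random (kp, ki, kd) sets
best = min(candidates, key=cost)
print(best, cost(best))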

Fig. 7. The microparticle’s response given a trajectory to follow with controller’s gains
“P1 ”

When the microparticle follows the operator's manipulation of the pantograph and the controller's parameters are set to "P1", the maximum error is approximately 120 µm and the mean absolute error is 27 µm, while with the controller's gains set to "P2" the maximum error is approximately 541 µm and the mean absolute error is 111 µm.

Fig. 8. The microparticle’s response given a trajectory to follow with controller’s gains
“P2 ”

Fig. 9. The step responses of the microparticle, one using the controller’s gains “P1 ”
and the other using “P2 ”

The other two experiments evaluate the optimized parameters "P1" in comparison with the initial parameters "P2" when the microparticle is given a step input.
With the "P1" gains, the rise time and settling time of the microparticle are 2.92 ms and 29 ms, respectively, while the overshoot is 2.89%. With the "P2" gains, the rise time is 5.03 ms, the settling time is more than 196 ms, and the overshoot is 9.41%.

4 Conclusion and Future Work


In this paper, we proposed the use of an optimization algorithm to compute the gains of the PID controller and its deployment in a real-time manner. We also investigated the system's response when the microparticle is given a single step input and when it is given an online trajectory to follow using a pantograph robot.
The system's settling error when given a single point is less than 8 µm, which outperforms some of the previously published work mentioned above. The results also show that the operator can control the microparticle via the pantograph with micro-scale precision while achieving a real-time system response.
This performance makes the use of optimization algorithms to compute controller gains preferable to the approaches taken in the previously mentioned work, and it makes this system a candidate for further evaluations inside microfluidic channels.

References
1. Khalil, I., Brink, F., Sukas, O., Misra, S.: Microassembly using a cluster of paramag-
netic microparticles. In: IEEE International Conference on Robotics and Automa-
tion (ICRA), Karlsruhe, Germany (2013)
2. Abbott, J., Nagy, Z., Beyeler, F., Nelson, B.: Robotics in the small, part I:
microbotics. IEEE Robot. Autom. Mag. 14(2), 92–103 (2007)
3. Kummer, M., Abbott, J., Kratochvil, B., Borer, R., Sengul, A., Nelson, B.:
OctoMag: an electromagnetic system for 5-DOF wireless micromanipulation. IEEE
Trans. Robot. 26(6), 1006–1017 (2010)
4. Lu, T., Pacoret, C., Heriban, D., Mohand-Ousaid, A., Regnier, S., Hayward, V.:
KiloHertz bandwidth, dual-stage haptic device lets you. IEEE Trans. Haptics
10(3), 382–390 (2016)
5. Sun, Y., Nelson, J.: Biological cell injection using an autonomous microrobotic
system. Int. J. Robot. Res. 21(10–11), 861–868 (2002)
6. Keuning, J., de Vries, J., Abelmann, J., Misra, S.: Image-based magnetic control of
paramagnetic microparticles in water. In: International Conference on Intelligent
Robots and Systems. IEEE/RSJ International Conference on Intelligent Robots
and Systems, San Francisco, CA, USA (2011)
7. Khalil, I., Keuning, J., Abelmann, L., Misra, S.: Wireless magnetic-based control of
paramagnetic microparticles. In: International Conference on Biomedical Robotics
and Biomechatronics, Roma, Italya, June (2012)
8. Khalil, I., Metz, R., Reefman, B., Misra, S.: Magnetic-based minimum input motion
control of paramagnetic microparticles in three-dimensional space. In: International
Conference on Intelligent Robots and Systems, Tokyo, Japan (2013)
9. El-Gazzar, A., Al-Khouly, L., Klingner, A., Misra, S., Khalil, I.: Non-contact
manipulation of microbeads via pushing and pulling using magnetically controlled
clusters of paramagnetic microparticles. In: International Conference on Intelligent
Robots and Systems, Hamburg, Germany, 2 October 2015
10. Du, X., Htet, K., Tan, K.: Development of a genetic-algorithm-based nonlinear
model predictive control scheme on velocity and steering of autonomous vehicles.
IEEE Trans. Ind. Electron. 63(11), 6970–6977 (2016)
11. Guazzelli, P., Pereira, W., et al.: Weighting factors optimization of predictive
torque control of induction motor by multiobjective genetic algorithm. IEEE Trans.
Power Electron. 34(7), 6628–6638 (2019)
12. Xu, F., Chen, H., Gong, X., Mei, Q.: Fast nonlinear model predictive control on
FPGA using particle swarm optimization. IEEE Trans. Ind. Electron. 63(1), 310–
321 (2016)
13. Smoczek, J., Szpytko, J.: Particle swarm optimization-based multivariable gener-
alized predictive control for an overhead crane. IEEE/ASME Trans. Mech. 22(1),
258–268 (2017)
14. AliZamani, A., Tavakoli, S., Etedali, S.: Fractional order PID control design for
semi-active control of smart base-isolated structures: a multi-objective cuckoo
search approach. ISA Trans. 67, 222–232 (2017)
15. Bououden, S., Chadli, M., Karimi, H.: An ant colony optimization-based fuzzy
predictive control approach for nonlinear processes. Inf. Sci. 299, 143–158 (2015)
16. Arridha, R., Sukaridhoto, S., Pramadihanto, D., Funabiki, N.: Classification exten-
sion based on IoT-big data analytic for smart environment monitoring and analytic
in real-time system. Int. J. Space-Based Situated Comput. 7(2), 82–93 (2017)
17. Khalil, I., Abu Seif, M.: Modeling of a Pantograph Haptic Device. http://www.
mnrlab.com/uploads/7/3/8/3/73833313/modeling-of-pantograph.pdf
18. Sallam, M., Ramadan, A., Fanni, M.: Position tracking for bilateral teleoperation
system with varying time delay. In: the 2013 IEEE/ASME International Conference
on Advanced Intelligent Mechatronics (AIM), Wollongong, pp. 1146-1151 (2013).
https://doi.org/10.1109/AIM.2013.6584248
19. Rashad, S.A., Sallam, M., Bassiuny, A.B., Abdelghany, A.M.: Control of mas-
ter salve system using optimal NPID and FOPID. In: 2019 IEEE 28th Interna-
tional Symposium on Industrial Electronics (ISIE), Vancouver, pp. 485-490 (2019).
https://doi.org/10.1109/ISIE.2019.8781129
20. Rashad, S.A., Sallam, M., Bassiuny, A.B., Abdelghany, A.M.: Control of master
slave robotics system using optimal control schemes. IOP Conf. Ser.: Mater. Sci.
Eng. 610, 012056 (2019). https://doi.org/10.1088/1757-899X/610/1/012056
21. Sallam, M., Ramadan, A., Fanni, M., Abdellatif, M.: Stability verification for bilat-
eral teleoperation system with variable time delay. Int. J. Mech. Mech. Eng. 5,
2477–2482 (2011)
Early Detection of COVID-19 Using
a Non-contact Forehead Thermometer

Ahmed G. Ebeid1, Enas Selem2(&), and Sherine M. Abd El-kader3


1 Faculty of Engineering, Higher Technological Institute 10th of Ramadan, Cairo, Egypt
ahmedgebeid@gmail.com
2 Faculty of Science, Suez Canal University, Ismailia, Egypt
enas.selem@yahoo.com
3 Computers & Systems Department at Electronics Research Institute, Giza, Egypt
Sherine@eri.sci.eg

Abstract. In this paper, a non-contact thermometer is designed. It calculates the temperature from the infrared radiation emitted by the subject being measured, and it can be used for industrial as well as medical purposes. For medical use, it measures the core body temperature from the forehead temperature, which depends on the ambient air temperature affecting the heat transfer coefficient. Heat transfers by conduction from the body core to the forehead and by convection from the forehead to the ambient air. The overall heat transfer coefficient is determined empirically, based on studies that measured forehead temperature and core body temperature at various ambient air temperatures for hundreds of persons. The accuracy of the proposed thermometer is ±0.3 °C compared with the Rossmax HA-500 device, which depends on the accuracy of those studies and their results under various conditions.

Keywords: Coronavirus · Non-contact thermometer · Core body temperature · Forehead · Internet of Things (IoT) · Fifth Generation (5G) · Local manufacturing

1 Introduction

The sudden appearance of the Coronavirus caused global panic and occupied the minds of researchers around the world. Egypt is at the forefront of the countries that took precautions and measures to face the Coronavirus crisis, and the Egyptian health system provided a model for dealing with the crisis professionally in accordance with the instructions of the World Health Organization (WHO). It also encourages researchers in all fields to confront the Coronavirus: many teams are trying to produce respirators, others smart masks or non-contact thermometers.
Indeed, there is a great need for the non-contact thermometer [1, 2] to detect coronavirus early by rapidly identifying persons who suffer from a high temperature. It is used to remotely measure the core body temperature; it can measure the core body
temperature within a few seconds, a few centimeters away from the body [3], records approximately 30 readings, and gives an alarm at 38 °C. The prices of this kind of thermometer range from 100 to 200 dollars. IR thermometers come in two types, namely medical and industrial thermometers. Industrial thermometers [4] can be used as medical thermometers [5], but they give inaccurate readings [6]. Many companies in Egypt have imported large quantities of thermometers at great expense, and still more are needed. Therefore, we designed and manufactured a non-contact IoT thermometer to help in the early detection of the coronavirus at a reasonable cost.
Generally, health care services suffer from enormous challenges such as the high cost of devices, the increasing number of patients, the wide spread of chronic diseases, and a lack of healthcare management resources. The utilization of IoT and 5G in medical services [7, 8] will solve many of these problems by introducing the following benefits: easy access to health care services by both patients and doctors, smooth integration with several technologies, analysis and processing of massive data, effective utilization of healthcare resources, real-time and remote monitoring established through joint healthcare services, real-time interaction between doctors and patients, and authorized health care services.
All of these benefits of IoT and 5G services will improve the performance of healthcare applications by providing different services to hospitals, clinics, and non-clinical patient environments.
IR thermometers are of two types, industrial and medical infrared thermometers [9,
10]. Table 1 summarizes the difference between clinical and industrial thermometers.

Table 1. Differences between clinical and industrial thermometers

Clinical thermometer | Industrial thermometer
Short range, as it is used for the human body | Broad range, as it is used for different substances
Used for the human body | Used for solid, liquid, and gas substances
Used in hospitals, homes, and airports | Used in laboratories and companies
Range from 35 °C to 42 °C | Range from −10 °C to 300 °C
Gives an alarm at 38 °C | No alarm
Measures core body temperature within 3 s, 10 cm away from the body | Measures the skin temperature, which differs by approximately 3 °C from the core body temperature
Price approximately 2000 L.E | Price approximately 300 L.E

In this paper, we target the design of a locally manufactured non-contact thermometer to help in the early detection of coronavirus. The designed thermometer has two modes, namely a clinical mode and an industrial mode. In clinical mode, the core body temperature is estimated with an accuracy of ±0.3 °C, which depends on the accuracy of the studies behind the Rossmax HA-500 and their results under various conditions.
The remainder of the paper is arranged as follows: related research is surveyed in Sect. 2; the method of our work is introduced in Sect. 3, whereas the connection between the IR thermometer and a WBAN network is presented in Sect. 4. The results are analyzed in Sect. 5. Finally, the conclusions are outlined in Sect. 6.

2 Related Research

Measuring the core body temperature TC without surgery (non-invasively) is one of the most important research topics. Conventional techniques for estimating TC are usually not suitable for continuous use during physical activity because they are invasive or not accurate enough. Rectal measurement of TC, or measurement at the lower end of the esophagus (close to the heart), is uncomfortable, particularly when the sensors are connected by wires. Axillary, oral, and ear (tympanic membrane) temperatures are not accurate, especially during physical activity. An alternative method for measuring TC is the use of ingestible telemetric thermometers, but these are too expensive for daily use by a large number of persons. Despite its great importance, easy, cheap, and accurate measurement of TC therefore remains a great challenge, and many techniques have been investigated for non-invasive TC measurement. These non-invasive estimates are validated against invasive measurements of core body temperature in experiments with varying assumptions. Each method is tested with a different number of volunteers, who vary in characteristics such as age, weight, height, and body fat, and who wear different clothes with different thermal and vapor resistance. The volunteers are put under different test scenarios, such as standing rest or walking on a treadmill for different durations, and engage in different numbers of test sessions. They enter test rooms with different conditions, such as snug [50% Relative Humidity (RH), 25 °C], hot-dry (20% RH, 40 °C), and hot-humid (70% RH, 35 °C). TC is measured at different places such as the pectoralis, sternum, forehead, left scapula, left thigh, and left rib cage. The non-invasive methods differ in the factors on which the TC estimate depends, such as skin temperature, ambient temperature, heat flux, and heart rate. The estimated TC is compared to the observed TC, which is a rectal temperature or one taken through a thermometer pill. In [11], the protocol estimates the core body temperature TC based on three factors: the skin temperature Ts, the ambient temperature Ta, and the Heat Flux (HF), or heat loss, defined as the heat transferred per unit time and unit area to or from an object. Several linear regression techniques were presented to predict TC, the dependent variable, from the two independent variables HF and Ts. In [12], TC was estimated as in [11] based on Ts and HF, with the main difference that the Heart Rate (HR) is also taken into consideration. In [13], the core temperature is estimated using Kalman Filtering (KF) with training and validation datasets composed of data from test volunteers. The parameters of the KF model were estimated from the training dataset using linear regression of TC against Ts, HF, and HR. The KF model used to estimate TC consists of three parts: a state-transition model (A), the noise associated with each state, and an observation model (C). The KF approach achieves an accurate assessment of core body temperature when two of the three inputs (Ts, HF, and HR) are available. In [14], a Kalman filter is used to adapt the parameters of its model to each person and to give real-time TC estimates. This model uses the Activity (Ac) of the person, HR, and Ts, together with two environmental variables, Ta and Relative Humidity (RH), to estimate the person's TC in real time. Several methods of estimating core body temperature have already been applied experimentally as patents [15, 16]. In [17], the patent introduces a
procedure for estimating the core body temperature that includes determining the heat flux from a targeted surface area of the body, thereby giving the surface temperature, and estimating the core temperature of the body as a function of two factors (ambient temperature and surface temperature), the function accounting for the skin heat loss to the environment. In [18, 19], the invention is a thermometer aimed at measuring the body cavity temperature using infrared sensing methods. Infrared radiation released by tissue surfaces is gathered by an infrared lens and directed to an infrared sensor, which produces a signal voltage based on the temperature difference between the body tissue being observed and the infrared sensor itself. To obtain the correct tissue temperature, a supplementary sensor is used to determine the ambient temperature of the infrared sensor, and this ambient temperature is combined with the signal voltage. In [20], the invention presents an IR thermometer with a method for measuring core body temperature based on contact temperature, ambient temperature, and humidity. In [21], the invention presents a non-contact forehead thermometer that measures the core body temperature from the thermal radiation of the forehead; the core body temperature is calculated as a function of Ta and Ts.
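To make the regression-based estimation of [11, 12] concrete, the following minimal Python sketch shows the general linear form TC = b0 + b1·Ts + b2·HF (+ b3·HR); the coefficients used below are arbitrary placeholders for illustration, not the values reported in those studies.

# Illustrative only: the linear-regression form of non-invasive TC estimation.
# The coefficients b0..b3 are placeholders, NOT the values fitted in [11, 12].
def estimate_core_temperature(ts, hf, hr=None,
                              b0=36.0, b1=0.05, b2=0.01, b3=0.002):
    """Estimate core temperature (degC) from skin temperature ts (degC),
    heat flux hf (W/m^2) and, optionally, heart rate hr (beats/min)."""
    tc = b0 + b1 * ts + b2 * hf
    if hr is not None:              # the extension described in [12]
        tc += b3 * hr
    return tc

print(estimate_core_temperature(34.2, 80.0, hr=72))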

3 Method

The designed IR thermometer is a non-contact temperature measuring device that detects the infrared radiation emitted from the surface of objects and converts it to the corresponding temperature readings. It enables the user to measure temperature quickly without touching the measured object. It can be used to find overheated equipment and electrical circuits, and it can also be adjusted for medical purposes to measure the core body temperature. A detailed description of the designed IR thermometer is given in the next subsections.

3.1 The Proposed Infrared Thermometer Component


The components of the designed Infrared Thermometer, as shown in Figs. 1 and 2, are:
1. 0.96” LCD
2. 8-bit microcontroller
3. IR temperature sensor
4. Buzzer
5. Trigger push button
6. Power on/off switch
7. Battery.

Fig. 1. IR Thermometer’s component Fig. 2. The designed thermometer

3.2 IR Thermometer Working Theory


The designed IR thermometer can work as an industrial thermometer or a medical thermometer through a two-mode switch, namely industry mode and medical mode. In medical mode, an additional calculation is made to estimate the core body temperature, the range is reduced to 35–42 °C, the emissivity is changed to 0.98, and the accuracy is raised. In industry mode, the estimation of the core body temperature is skipped and the range is extended by changing the amplification of the signal. The IR thermometer works as follows:
It records several readings, each taken by measuring the radiation emitted from the object. The greatest reading is chosen as a parameter in an equation combining the reading of the thermopile sensor, the reading of the Negative Temperature Coefficient (NTC) thermistor, which measures the ambient air temperature, and the emissivity of the surface. This equation is used to compute the actual temperature of the object.
In the case of medical use, the core body temperature is also estimated. Core body temperature can be estimated using different techniques and various factors; in the proposed IR thermometer it depends on three factors: the ambient air temperature measured by the on-board temperature sensor, the convection heat transfer from the skin to the ambient air, and the emissivity of the forehead skin.
The proposed IR thermometer measures the temperature at 1 cm away from the object. This is because the field of view of the used sensor is very large, ranging from 80 to 100 mm. This distance can be increased using an optics system: convex lenses reduce the field of view, but the lens must be suitable for the wavelength of the radiation emitted from the object. The wavelength range of the sensor used in the proposed IR thermometer is 5.5 to 14 µm, so the material of the convex lens must transmit the same wavelength range as the sensor. Germanium is used for medical use, while a Fresnel lens is used for industrial purposes. The distance between the thermometer and the person therefore depends on the type of convex lens.

3.3 Core Body Temperature Estimation


As mentioned above, in the designed thermometer the core body temperature is calculated based on the ambient temperature, the skin temperature, and the emissivity of the skin. First, the skin temperature is obtained with the thermopile sensor. The thermopile voltage VTP is determined by:

VTP = S · εobj · (Tobj^4 − Tsen^4)    (1)

where VTP is the thermopile output voltage, S is the instrument factor, εobj is the emissivity of the object, Tsen is the ambient (sensor) temperature, and Tobj is the object temperature. Finally, the core body temperature is calculated as follows:

Tc = (h / (q · c)) · (Ts − Ta) + Ts    (2)

where Ts is the skin temperature, Ta is the ambient temperature, c is the blood specific heat, h is the coefficient of the radiation view factor between the ambient and the skin tissue, and q is the blood flow per unit area.
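As an illustration only (not the firmware of the designed device), the following Python sketch shows how Eqs. (1) and (2) could be applied in software; the instrument factor S and the constants h, q and c are assumed placeholder values.

# Illustrative sketch of Eqs. (1) and (2); the instrument factor S and the
# constants h, q, c below are assumed placeholder values, not device firmware.
def object_temperature(v_tp, t_sen_c, s=6.4e-14, emissivity=0.98):
    """Invert Eq. (1): Tobj = ((VTP / (S*eps)) + Tsen^4)^(1/4), in kelvin."""
    t_sen = t_sen_c + 273.15
    return (v_tp / (s * emissivity) + t_sen ** 4) ** 0.25 - 273.15

def core_temperature(t_skin, t_ambient, h=5.0, q=0.0044, c=3770.0):
    """Eq. (2): Tc = (h / (q*c)) * (Ts - Ta) + Ts (temperatures in degC)."""
    return (h / (q * c)) * (t_skin - t_ambient) + t_skin

t_skin = object_temperature(6.2e-5, 25.0)    # assumed thermopile reading (V)
print(round(core_temperature(t_skin, 25.0), 1))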

3.4 The Designed Non-contact Thermometer

Operation: After turning on the power switch, the IR temperature sensor communicates with the microcontroller. After pressing the trigger push button, the microcontroller reads the temperature from the sensor and sends it to the LCD, and the buzzer then alerts that the reading has been taken. The temperature is shown on the LCD and remains until the push button is triggered again for a new temperature reading. The designed thermometer workflow is shown in Fig. 3.
Specification

• Measurement range: −70 to 380 °C


• Accuracy: ±0.3 °C
• Resolution: 0.1 °C
• Emissivity: 0.98 (For Human Skin)
• Distance spot ratio: 1:2
• 9-volt battery.

[Fig. 3 flowchart: Start → measure the forehead temperature → if the reading is out of range, the LCD prints "low"; otherwise the core body temperature is estimated → if the core temperature exceeds 38.2 °C the red LED is turned on, if it is between 37.8 °C and 38.2 °C the orange LED is turned on, otherwise the green LED is turned on → the LCD prints the core body temperature → End.]

Fig. 3. The designed thermometer workflow
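A minimal Python sketch of the decision logic depicted in Fig. 3 is given below; the 37.8 °C and 38.2 °C thresholds follow the flowchart, the range check uses the clinical-mode range of 35–42 °C, and the core-temperature estimator is passed in as a stand-in function.

# Sketch of the Fig. 3 decision logic; thresholds follow the flowchart.
def led_colour(core_temp_c):
    if core_temp_c > 38.2:
        return "RED"
    if core_temp_c > 37.8:
        return "ORANGE"
    return "GREEN"

def handle_reading(forehead_temp_c, estimate_core):
    if not (35.0 <= forehead_temp_c <= 42.0):      # clinical-mode range check
        return "LCD: LOW"
    tc = estimate_core(forehead_temp_c)
    return "LCD: {:.1f} degC  LED: {}".format(tc, led_colour(tc))

# Toy stand-in estimator; the real device uses the Sect. 3.3 estimation.
print(handle_reading(36.5, estimate_core=lambda ts: ts + 1.0))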

4 Connection Between IR Thermometer and a WBAN Network

The non-contact thermometer can be used as a wireless temperature sensor in a WBAN system [22] that gives an alarm when a person's temperature is 38 °C or more. This IoT non-contact thermometer will be connected to a mobile gateway, which is connected to a database in the medical server; persons whose temperature is around 37 °C will be marked with green, while suspected infected persons whose temperature exceeds 38 °C will be marked with red, as shown in Fig. 4. This medical server will be located in the Ministry of Health to continually follow up the temperature of a large number of people. In order to meet the 5G and IoT requirements [23, 24], the AESAS algorithm [25] will be used to increase the capacity and density of the network so that it serves a large number of devices effectively without any degradation in the quality of service, to ensure that priority is satisfied among all types of mobile gateways, to increase the overall throughput of the network by decreasing the data drop rate, and to decrease the delay or latency for delay-sensitive applications and services. The first part of the system has been completed by designing the non-contact thermometer; the second part will be the connection of all of these devices through a WBAN system [22] to ensure reliable and fast delivery of the medical data to the medical server.
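As a sketch of how a single reading might be forwarded from the gateway to the medical server, the following Python example uses a hypothetical endpoint URL and payload fields; it is not a specification of the deployed WBAN interface.

# Hypothetical example of pushing one reading to the medical server; the URL
# and payload fields are placeholders, not the deployed WBAN interface.
import requests

SERVER_URL = "https://medical-server.example/api/readings"   # placeholder

def push_reading(person_id, temperature_c, location):
    payload = {
        "person_id": person_id,
        "temperature_c": temperature_c,
        "location": location,
        "status": "red" if temperature_c >= 38.0 else "green",
    }
    response = requests.post(SERVER_URL, json=payload, timeout=5)
    response.raise_for_status()

push_reading("ID-0001", 38.4, "Cairo")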

Fig. 4. Tracking of infected people with Coronavirus

5 Result

Studies that measure both the forehead temperature and the core body temperature for many persons under various ambient air temperatures are difficult and costly to obtain, so we compare our results with another device (Rossmax HA-500), whose algorithm is based on such studies and whose output therefore represents them. By comparing the designed infrared thermometer with the Rossmax HA-500 [26], as shown in Fig. 5, the maximum variation in estimating the core body temperature is found to be 0.23 °C, after calibrating both of them at the National Institute of Standards (NIS). The Rossmax HA-500 has been tested clinically in several large teaching hospitals according to the ASTM E1965-98:2009 regulatory standard, covering enough feverish and normal core body temperatures with satisfactory clinical repeatability and measurement accuracy in comparison to oral temperature readings. The accuracy of the estimation depends on the accuracy of the underlying studies and their results under various conditions. The overall accuracy of the medical forehead thermometer is the combination of the accuracy of measuring the forehead temperature with the thermopile sensor and the accuracy of estimating the core body temperature.
Each manufacturer builds an algorithm for its forehead thermometer device either based on an estimation equation that calculates the core body temperature after measuring the ambient air temperature and the forehead temperature, or based on a look-up table saved beforehand in the EEPROM of the device, giving the core body temperature corresponding to each forehead temperature and ambient air temperature.
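For the look-up-table variant, the following Python sketch illustrates how a device could interpolate a stored table indexed by ambient and forehead temperature; the table values are made-up placeholders, not calibration data.

import numpy as np

# Illustrative 2-D look-up table with bilinear interpolation; all values are
# made-up placeholders, not calibration data.
ambient_axis = np.array([15.0, 25.0, 35.0])            # ambient temp, degC
forehead_axis = np.array([34.0, 36.0, 38.0, 40.0])     # forehead temp, degC
core_table = np.array([[36.2, 37.0, 38.0, 39.2],       # core temp for each
                       [36.4, 37.1, 38.1, 39.3],       # (ambient, forehead)
                       [36.6, 37.3, 38.3, 39.5]])      # pair, degC

def lookup_core(ambient_c, forehead_c):
    i = int(np.clip(np.searchsorted(ambient_axis, ambient_c) - 1, 0, 1))
    j = int(np.clip(np.searchsorted(forehead_axis, forehead_c) - 1, 0, 2))
    wa = (ambient_c - ambient_axis[i]) / (ambient_axis[i + 1] - ambient_axis[i])
    wf = (forehead_c - forehead_axis[j]) / (forehead_axis[j + 1] - forehead_axis[j])
    top = core_table[i, j] * (1 - wf) + core_table[i, j + 1] * wf
    bot = core_table[i + 1, j] * (1 - wf) + core_table[i + 1, j + 1] * wf
    return top * (1 - wa) + bot * wa

print(round(lookup_core(24.0, 36.5), 2))   # interpolated core temperature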

Fig. 5. Our designed thermometer estimation compared to Rossmax HA-500

6 Conclusions

In this paper, a locally manufactured, easy-to-use, low-price non-contact thermometer is presented. It can be used for industrial and medical purposes. An accuracy of 0.23 °C is achieved, which is considered very good for medical use, with a certificate from the National Institute of Standards in Egypt. The non-contact thermometer can be used as part of a WBAN system for early, real-time detection and tracking of people suffering from a fever, who might be infected with coronavirus. In future work, the non-contact thermometer will be experimentally tested with a WBAN system connected to a database in the medical server in order to track infected persons with coronavirus; those who have a fever will be marked with red.

Acknowledgment. We would like to express our special thanks of gratitude to the National
Institute of Standards (NIS) in Egypt for their support in calibration process, as well as Eng.
Mohammed Ibrahim.

References
1. Sebban, E.: Infrared Noncontact Thermometer, US. Patent 549,114, issued 21 August 2007
2. Yelderman, M., et al.: Noncontact Infrared Tympanic Thermometer, US. Patent 5,159,936,
issued 3 November 1992
3. Wenbin, C., Chiachung, C.: Evaluation of performance and uncertainty of infrared tympanic
thermometers. Sensors 10(4), 3073–3089 (2010)
4. Cascetta, F.: An evaluation of the performance of an infrared tympanic thermometer.
Measurement 16(4), 239–246 (1995)
5. Jang, C., Chou, L.: Infrared Thermometers Measured on Forehead Artery Area, US.
Patent US 2003/0067958A1, issued 10 April 2003
6. Teran, C.G., Torrez‐Llanos, J., et al.: Clinical accuracy of a non-contact infrared skin
thermometer in paediatric practice. Child Care Health Dev. 38(4), 471–476 (2012)

7. Dhanvijay, M.M., Patil, S.C.: Internet of Things: a survey of enabling technologies in


healthcare and its applications. Comput. Netw. 153(22), 113–131 (2019)
8. Alam, M.M., Malik, H., Khan, M.I., Pardy, T., Kuusik, A., Le Moullec, Y.: A survey on the
roles of communication technologies in iot-based personalized healthcare applications. IEEE
Access 6(4), 36611–36631 (2018)
9. ASTM Standard E.: 1965, Standard specification for infrared thermometers for intermittent
determination of patient temperature (2003)
10. Fraden, J., San Diego, C.A.: Medical Thermometer for Determining Body Core Temperature, US. Patent 7,785,266 B2, issued 31 August 2010
11. Xu, X., Karis, A.J., et al.: Relationship between core temperature, skin temperature and heat
flux during exercise in heat. Euro. J. Appl. Physiol. 113, 2381–2389 (2013)
12. Welles, A.P., Xu, X., et al.: Estimation of core body temperature from skin temperature, heat
flux, and heart rate using a Kalman filter. Comput. Biol. Med. 5(21), 1–6 (2018)
13. Eggenberger, P., Macrae, B.A., et al.: Prediction of core body temperature based on skin
temperature, heat flux, and heart rate under different exercise and clothing conditions in the
heat in young adult males. Front. Physiol. 10(9), 1–11 (2018)
14. Laxminarayan, S., Rakesh, V., et al.: Individualized estimation of human core body
temperature using noninvasive measurements. J. Appl. Physiol. 124(6), 1387–1402 (2017)
15. Zou, S., Province, H.: Thermometer, US. Patent D837, 668 S, issued 8 January 2019
16. Roth, J., Raton, B.: Contact and Non-Contact Thermometer, US. Patent/000346, issued 2
January 2014
17. Pompei, F., Boston, M.A.: Ambient and Perfusion Normalized Temperature Detector, EP 0
991 926 B1, issued 12 December 2005
18. Fraden, J., Jolla, L.C.: Infrared Thermometer, US. Patent 6,129,673, issued 10 October 2000
19. Fraden, J., La Jolla, Calif.: Infrared Thermometer, US. Patent 6,129,673, issued 10 October 2000
20. Jones, M.N., Park, L.F, et al.: Infrared Thermometer, US. Patent 0257469, issued 15 October
2009
21. Pompei, F., Boston, M.A.: Temporal Artery Temperature Detector, US. Patent 6,292,685,
issued 18 September 2001
22. Selem, E., Fatehy, M., Abd El-Kader, S.M., Nassar, H.: THE (Temperature Heterogeneity
Energy) aware routing protocol for iot health application. IEEE Access 7, 108957–108968
(2019)
23. Hussein, H.H., Abd El-Kader, S.M.: Enhancing signal to noise interference ratio for device
to device technology in 5G applying mode selection technique. In: 2017 International
Conference on Advanced Control Circuits Systems (ACCS) Systems & 2017 International
Conference on New Paradigms in Electronics & Information Technology (PEIT),
Alexandria, pp. 187–192 (2017)
24. Salem, M.A., Tarrad, I.F., Youssef, M.I., Abd El-Kader, S.M.: An adaptive EDCA
selfishness-aware scheme for dense WLANs in 5G networks. IEEE Access 8, 47034–47046
(2020)
25. Salem, M.A., Tarrad, I.F., Youssef, M.I., Abd El-Kader, S.M.: QoS categories activeness-
aware adaptive EDCA algorithm for dense IoT networks. Int. J. Comput. Netw. Commun. 11
(03), 67–83 (2019)
26. RossMax HA500 Thermometer Instruction Manual, Model: HA500 www.rossmax.com
The Mass Size Effect on the Breast Cancer
Detection Using 2-Levels of Evaluation

Ghada Hamed(&), Mohammed Abd El-Rahman Marey,


Safaa El-Sayed Amin, and Mohamed Fahmy Tolba

Faculty of Computer and Information Sciences,


Ain Shams University, Cairo, Egypt
ghadahamed@cis.asu.edu.eg

Abstract. Breast cancer is one of the most dangerous cancers and with the
tremendous increase in the mammograms taken daily, computer-aided diagnosis
systems play an important role for a fast and accurate prediction. In this paper,
we propose three phases to detect and classify breast tumors. First, is the data
preparation for converting DICOM files to images without losing data. Then,
they are divided into mammograms with large and small masses representing the
input to the second phase, model training. The third phase is the model evaluation through two testing levels: the first checks large masses and the second checks small masses, to output the detection results for large and small masses. The two testing levels using the trained small- and large-mass models outperform the recent YOLO-based detection work and the combined-sizes trained model by achieving an overall accuracy of 89.5%.

Keywords: Breast cancer detection · Digital mammograms classification · You Only Look Once · Computer aided diagnosis systems

1 Introduction

Breast Cancer comes after skin cancer in being one of the common and leading causes
of increasing the mortality among people and especially women in the whole world [1].
Mammography is the process of screening the breast using a small amount of X-rays to
generate mammograms that shall contain breast cancer signs if they exist. It is one of
the most common and used tools to screen for breast cancer. However, due to the
tremendous increase in the number of mammograms taken daily, the process becomes
very hard for doctors and consumes a lot of time which makes it prone to errors in the
decision and the diagnosis process [2–4].
So, Computer-Aided Diagnosis (CAD) systems play an important role as a second decision next to the doctor's decision [5, 6]. Mainly, current research works on deploying Convolutional Neural Networks (CNNs) to develop CAD systems, since CNNs are able during training to extract features representing the various contexts of images without feature engineering, which has a great impact on the detection performance.
Also, it is proved that the use of CNN overcomes the drawbacks of conventional mass
detection models [7–11].

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 324–335, 2021.
https://doi.org/10.1007/978-3-030-58669-0_30

The goal of this paper is to detect the masses present in mammograms and classify them with high accuracy. The masses in mammograms have no fixed size or narrow range of sizes; they may be very small, for example of width 10 and height 10 pixels, or large, for example of width 900 and height 800 pixels. For this wide range we utilize the You Only Look Once (YOLO) model to detect masses by first training YOLO using large masses only, then training it using small masses only, and finally combining the resultant detected large and small masses. Each time, YOLO is trained with different configurations and image properties to fit the required training objective.
The paper is organized as follows. Some of the recent literature is presented in Sect. 2, followed by a discussion of the used datasets in Sect. 3 and the proposed method in Sect. 4. Section 5 lists the used evaluation metrics and how they are calculated. Then, Sect. 6 presents the conducted experiments and their results, compared with the YOLO-based state-of-the-art work, to show the contribution of the proposed approach. Finally, the conclusion is discussed in Sect. 7.

2 Related Work

There is a lot of conducted research on breast cancer detection to prove the importance
of CADs in taking decisions regarding breast cancer diagnosis [12–15] and [16]. For
example, in [13] the CADs development impact to early detect breast cancer is proved
comparing with late cancer reveal. In [14], the authors proved the great help of CAD
systems existence to diagnose the chest radiography. In [15] and [16], the authors
showed the improvement of the radiologists’ performance to diagnose the breast
tumors using CADs.
In [17], the multi-scale belief network is used to detect the masses’ positions in the
mammograms of INbreast to obtain a sensitivity of 85%. In [18], a CNN based model
is constructed to merge the low- and high-level deep features extracted from two different CNN layers for training, achieving a classification accuracy of 96.7%. In [19], a deep CNN is trained to detect masses existing in mammograms with a sensitivity of 89.9%
using the transfer learning advantage.
In [20], all images are first preprocessed by removing the pectoral muscles of the
given mammograms and extracting the fibro-glandular. Then, all pre-processed
mammograms are divided into overlapped parts to be trained using RCNN. Their
testing detection and classification accuracies are 72% and 77%, respectively. In [21], a residual network is used to classify screening-population mammograms, obtaining AUCs of 0.886 and 0.765 for the malignant and benign masses, respectively.
In [22], the GoogLeNet and the AlexNet models are used to classify tumors that
exist in the breast to obtain Area Under Curve (AUC) of 0.88 for GoogleNet and 0.83
for AlexNet using Film Mammography dataset number 3 (BCDR-F03). In [23], the
detection performance of the existing masses is obtained very near to the human
evaluation by using feature-based classification, where the radiologists’ average AUC
is 0.814, and the system AUC is 0.840.

3 Datasets

In our experiments, we used two datasets from the public and commonly used. The
annotations of both datasets with the masses regions of interest are set using Breast
Imaging Reporting and Data System (BI-RADS) descriptors. In our work, we assigned
the BI-RADS categories 2 and 3 to the benign cases and the categories 4, 5, and 6 to the
malignant cases. Both datasets also contain Craniocaudal (CC) and Mediolateral Oblique (MLO) views for their mammograms. The two datasets are:
INbreast dataset. The INbreast dataset [24] consists of Full-Field Digital Mammograms (FFDM), which can be requested online. We selected all the mammograms containing biopsies in the INbreast dataset, which are 107 mammograms with 116 masses. The mammograms contain both small and large masses, as shown in Fig. 1(a) and (b).
CBIS-DDSM dataset. CBIS-DDSM [25] is an updated version of the Digital Database for Screening Mammography (DDSM): since the DDSM annotations indicate only general, imprecise locations of lesions, segmentation algorithms were applied by many researchers to DDSM to obtain accurate feature extraction in CBIS-DDSM. In our experiments, we worked on the 891 cases with large and small masses, as shown in Fig. 1(c) and (d). The dataset is available online.


Fig. 1. Examples from the most widely used public datasets of breast mammograms.
(a) INbreast Mammogram Example 1 with SMALL mass; (b) INbreast Mammo-gram Example
2 with LARGE mass; (c) CBISDDSM Mammogram Example 1 with SMALL mass;
(d) CBISDDSM Mammogram Example 2 with LARGE mass

4 Methods

In this paper, the deep learning You Only Look Once (YOLO) model is used to detect
masses that exist in the breast and classify them. YOLO is selected since it does not
need to go through the image by dividing it into regions of interest to detect the included objects' bounding boxes and classify them, as is done by RCNN and Faster RCNN; YOLO looks at the image in one shot [26]. Besides that, the traditional, commonly used CNN-based networks like AlexNet, GoogleNet, and RCNN achieve good detection results, but they are slow at predicting mammograms in a real-life
application. Since YOLO examines the image in one shot, it achieves good results and faster detection at the same time, which is proved later in the experimental results section.
YOLO has three versions, YOLO-V1 [26], YOLO-V2 [27], and YOLO-V3 [28], such that each version introduces updates that lead to better results. YOLO-V3 is used in our experiments; it is the deepest model, composed of 106 layers including convolutional layers, max-pooling layers, activation functions, and 3 × 3 and 1 × 1 filters with skip connections like ResNet.
We exploited the YOLO-V3 advantage of working on the image at different scales, which leads to very good results in the case of large masses [28]. The training process relies on the anchor boxes, which are a set of 9 pairs of width and height for the objects that can be detected by YOLO. In our case, the masses present in the breast don't have a narrow range of sizes; they may be very small, for example the smallest mass in INbreast is of size (W = 12, H = 8) for case ID 51049107, while on the other side there are very large masses, for example the largest mass is of size (W = 163, H = 110) for case ID 24065530. Obviously, there are large differences in the masses' sizes, which makes it very difficult to cluster all the masses of the training set into 9 anchors; and if this is done, there will be a large set of sizes missed between every two anchors. So, to overcome the variance of mass sizes in mammograms, we train YOLO-V3 two times: once with large masses and 9 large anchor boxes, and once with small masses and 9 small anchor boxes.
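To illustrate how each set of 9 anchor boxes could be obtained from the annotated mass sizes, the following Python sketch clusters (width, height) pairs with plain Euclidean k-means (YOLO's original anchor selection uses an IoU-based distance); the box values are placeholders, not the actual dataset statistics.

import numpy as np
from sklearn.cluster import KMeans

# Placeholder (width, height) pairs in pixels; in practice these come from the
# ground-truth boxes of one size group (small or large) of INbreast/CBIS-DDSM.
boxes = np.array([[12, 8], [20, 18], [35, 30], [50, 44], [64, 60],
                  [80, 70], [96, 90], [110, 100], [130, 115], [163, 110]])

kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(boxes)
anchors = sorted(map(tuple, kmeans.cluster_centers_.round().astype(int)))
print(anchors)   # 9 (width, height) anchors for this size group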
The full workflow is given in Fig. 2 and is composed of 4 blocks: the preprocessing, training, testing, and evaluation phases. In the first, preprocessing, phase, the DICOM images of both INbreast and CBIS-DDSM are converted to image formats and then scaled to 8-bit images instead of 16-bit. All the extracted mammograms are resized to 448 × 448 to be trained with YOLO. The INbreast annotations are given in XML files, so they are read and placed in a separate annotation text file for each case with the class type, xmin, ymin, mass width, and mass height. For CBIS-DDSM, the annotations are given in the form of DICOM files that contain the regions of interest, so we convert the DICOM image and extract from it the ROI coordinates in the same format used for INbreast. The mammograms of both datasets are then split into mammograms with large masses and mammograms with small masses. This is accomplished based on the area of the mass: if the mass area is less than or equal to 100,000, the mammogram is considered one of the small cases; otherwise, it is considered one of the large cases.
The second phase, the training phase, consists of training YOLO-V3 using 80% of the large cases and training it another time with 80% of the small cases. Then, the testing phase tests the model with the remaining 20% of the large and small cases to compute the model's detection performance. Finally, the last phase is the evaluation phase, which uses new mammograms different from those used in the training and testing phases. The objective of this phase is to simulate the approach as applied in real life. So, new mammograms are evaluated using the model weights generated from training on the mammograms with large masses to
detect any large masses. Then, the given new mammograms are passed through another level of evaluation using the model weights generated from training on the mammograms with small masses to detect any small masses, if they exist. For each detected object, if any, the mass coordinates are extracted together with the class probability.

Fig. 2. The proposed approach phases to detect the breast SMALL and LARGE masses

5 Evaluation Metrics

To evaluate the detection accuracy of the masses present in the mammograms of the testing set, together with their classification accuracy, we used the following metrics:
1. Intersect Over Union (IOU): It measures the correctness of the predicted bounding box by calculating the ratio of the intersection area between the bounding box of the mass ground truth and the bounding box of the predicted mass to their union area. In our experiments, a mass region of interest is counted as correctly detected only if its IOU with the ground-truth coordinates equals or exceeds 50% (see the sketch after this list).
2. The confusion matrix: A matrix used to evaluate the classification performance of a binary classifier using the True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
3. Precision: It measures the proportion of the model's positive predictions that are actually positive, and it is calculated as follows:

Precision = TP / (TP + FP)    (1)

4. Recall (Sensitivity): It is known as the true positive rate and is calculated as follows:

Recall (Sensitivity) = TP / (TP + FN)    (2)

5. Average Precision (AP) & Mean Average Precision (mAP): The AP combines both precision and recall by calculating the area under the precision-recall curve, while the mAP is the mean of the AP calculated over all the classes.
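A minimal Python sketch of the IOU computation and the 0.5 acceptance test is shown below; boxes are assumed to be given as (xmin, ymin, width, height), matching the annotation format described in Sect. 4.

# Illustrative IOU between a ground-truth box and a predicted box.
def iou(box_a, box_b):
    """Boxes given as (xmin, ymin, width, height) in pixels."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

# A mass is counted as detected only when IOU >= 0.5.
print(iou((100, 120, 60, 50), (110, 125, 55, 52)) >= 0.5)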

6 Experimental Results

In this section, the conducted experiments of the proposed CAD system and their results are presented. All are executed on an Intel Core(TM) i7-9700K desktop processor (8 cores, up to 4.9 GHz Turbo, 300 series) with 16 GB RAM and a GIGABYTE GeForce RTX 2080 Ti. The development environments used to preprocess and extract the ground-truth annotations from the INbreast and CBIS-DDSM datasets are Matlab and Python 3.7. To compile and train YOLO, we used C++ on the Ubuntu 14.04 operating system. In all experiments, we used the INbreast and CBIS-DDSM datasets in different combinations to run different trials for performance evaluation. Both are divided into 80% for the training set and 20% for the testing set. The experiments are done after setting the following configurations for training:
– All the mammograms are divided into 13 × 13 grid cells (N).
– Number of anchor boxes used during training = 9.
– The classes number (C) is 2 which are benign & malignant and the number of
coordinates (coords) predicted for each box is 4.
– Mammograms are resized to 448 × 448, i.e. the model input.
– Number of training iterations = 4000 iterations.
– Resizing augmentation is enabled during training.
– Learning Rate (LR) = 0.001; other values were tried and 0.001 is the value that leads to the best results without overfitting.
– The steps when the LR is changed during training are at 2500 and 3000.
– Scales used to change the LR during training are as follows: 0.1, 0.1.
The above configurations result in an output tensor of prediction (ToP) of size 13 × 13 × (k · (1 + 4 + 2)), i.e. N × N × (k · (1 + coords + C)).
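As a quick check of the stated tensor size, the following snippet assumes k = 3 anchor boxes per grid cell at each prediction scale (the usual YOLO-V3 setting; only the total of 9 anchors is stated above, so k = 3 is an assumption here):

# Output tensor of prediction (ToP): N x N x (k * (1 + coords + C)).
N, k, coords, C = 13, 3, 4, 2          # k = 3 anchors per scale is an assumption
print((N, N, k * (1 + coords + C)))    # -> (13, 13, 21)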

6.1 Experiment I - Large & Small Masses Detection using YOLO-V3


The objective of this experiment is to study the effect of applying the training step on large and small masses together, as shown in Table 1. This experiment is conducted so that we can compare our 2-levels detection approach with [3], since they used the same approach applied in this experiment: detecting masses in mammograms using a YOLO model trained on mammograms with different sizes of masses. So, 80% of both the INbreast and CBIS-DDSM datasets are used for training, which results in 315 mammograms of the benign and malignant classes, and the remaining 20%, 83 mammograms, are used to test the model. As shown in Table 1, the achieved mAP is 71.47%. When we check the mammograms in which no mass was detected at all, for example the mammogram IDs 20586908 and 20586934 in INbreast, we find they are missed due to the small size of their masses, as shown in Fig. 3(a) and (b); their mass areas are 11,286 and 30,456, respectively. On the other side, most of the detected mammograms are those that contain large masses, such as mammogram ID 20588562 shown in Fig. 3(c), which contains a malignant mass of area 357,984.

Table 1. Results of mass detection & classification using YOLO-V3 on mammograms with LARGE and SMALL sizes (large variances in mass sizes), where B denotes the benign class, M denotes the malignant class, AP is the average precision and mAP is the mean average precision

Training set | Testing set | B: TP-FP | M: TP-FP | B AP | M AP | Prec. | Recall | mAP@IOU = 0.5
CBIS-DDSM + INbreast | CBIS-DDSM + INbreast | 23-7 | 34-14 | 67.85% | 75.10% | 73.00% | 66.00% | 71.47%

6.2 Experiment II - Small Masses Detection using YOLO-V3


The objective of this second experiment is to study the effect of training the model using small-sized masses only, which gives the results in Table 2. Several trials are conducted using different combinations of the datasets in the training and testing sets as follows:
1. Trial 1: Trained with 94 mammograms and tested with 24 mammograms from CBIS-DDSM.
2. Trial 2: Trained with 156 mammograms from CBIS-DDSM and INbreast and tested with 24 mammograms from INbreast.
The best mAP obtained when training YOLO-V3 on mammograms with small masses is 67.98%, when the model is trained on CBIS-DDSM and tested on the same dataset. The second trial performs worse than the first, since most of the samples in the training set are from CBIS-DDSM, whose features differ from those of INbreast; the main difference is that INbreast is FFDM while CBIS-DDSM consists of screened mammograms that are not originally digital.


Fig. 3. Detected large masses versus the undetected small masses. (a) The undetected INbreast
mammogram ID of 20586908 with SMALL mass; (b) The undetected INbreast mammogram ID
of 20586934 with SMALL mass; (c) The detected INbreast mammogram ID of 20588562 with
LARGE mass

Table 2. Results of mass detection & classification using SMALL sized masses only

Trial | B: TP-FP | M: TP-FP | B AP | M AP | Precision | Recall | mAP@IOU = 0.5
Trial 1 | 7-7 | 10-2 | 57.75% | 78.21% | 65.00% | 71.00% | 67.98%
Trial 2 | 4-3 | 13-2 | 45.54% | 80.86% | 77.00% | 65.00% | 63.20%

6.3 Experiment III - Large Masses Detection using YOLO-V3


This experiment’s objective is to study the effect of training using large sizes only from
masses to obtain the results in Table 3. Many trails and conducted using different
combinations of the datasets in the training and the testing set as follows:
1. Trail 1: Trained with 136 mammograms and tested with 40 mammograms from
CBIS-DDSM
2. Trail 2: Trained with 163 mammograms and tested with 80 mammograms from
both from CBIS-DDSM and INbreast.
3. Trail 3: Trained with 85 mammograms and tested with 12 mammograms from
INbreast.
4. Trail 4: Trained with 203 mammograms from CBIS-DDSM and INbreast and tested
with 40 mammograms from INbreast.

Table 3. Results of mass detection & classification using LARGE sized masses only

Trial | B: TP-FP | M: TP-FP | B AP | M AP | Precision | Recall | mAP@IOU = 0.5
Trial 1 | 10-5 | 14-8 | 66.73% | 72.42% | 65.00% | 59.00% | 69.58%
Trial 2 | 17-10 | 48-13 | 64.48% | 77.93% | 74.00% | 78.00% | 71.20%
Trial 3 | 3-1 | 9-0 | 99.20% | 80.00% | 92.00% | 92.00% | 90.00%
Trial 4 | 4-1 | 33-1 | 90.29% | 88.73% | 95.00% | 88.00% | 89.51%

In this experiment, the obtained mAP reached 89.5% when training and testing with mammograms of large masses, which is the best result among the conducted experiments. The main reason is that the mass sizes are large, where large is computed relative to the full size of the mammogram; this makes these masses more obvious and richer in features to train on, and consequently yields better detection results in testing.

6.4 Comparative Study between the Previous 3 Experiments


The performance of the model trained with large and small masses together in Experiment I is compared with that of the other two models, as shown in Table 4. Here, completely new mammograms from CBIS-DDSM are used for testing.
The evaluation process is done as follows:
1. Select new mammogram (M) with large or small masses.
2. Test the selected mammogram with the trained model in Experiment 1 to get
(Combined Model Result).
3. Test the selected mammogram with the trained models in Experiment 2 and 3
through 2 parallel paths as follows:
– Path 1: Test M with the model trained with small masses in Experiment 2 to get
(Small Model Result).
– Path 2: Test M with the model trained with large masses in Experiment 3 to get
(Large Model Result).
4. Take the union of the results obtained from path 1 and path 2 to get the detection results of both trained models, i.e. Small Model Result ∪ Large Model Result.
5. Check the intersecting resultant masses from path 1 and path 2; if any exist, the mass with the greater confidence score is kept (see the sketch after this list).
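A Python sketch of the union-and-resolve step in points 4 and 5 is given below; detections are represented as (class, confidence, box) tuples, and the IOU threshold used to decide that two detections refer to the same mass is an assumed value.

# Illustrative merge of small-model and large-model detections.
def box_iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter) if inter else 0.0

def merge_detections(small_dets, large_dets, same_mass_iou=0.5):
    """Each detection is (cls, confidence, (xmin, ymin, w, h))."""
    merged = list(large_dets)                       # start from one path
    for det in small_dets:
        matches = [i for i, m in enumerate(merged)
                   if box_iou(det[2], m[2]) >= same_mass_iou]
        if not matches:
            merged.append(det)                      # union of both paths
        else:
            for i in matches:                       # keep higher confidence
                if det[1] > merged[i][1]:
                    merged[i] = det
    return merged

print(merge_detections([("B", 0.97, (50, 60, 40, 35))],
                       [("B", 0.83, (52, 58, 42, 36))]))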
As shown in Table 4, in the case of checking mammograms with small masses, the most accurate detection and classification results are obtained from the model trained on mammograms with small masses in Experiment II (Small Model Result). From the Small Model Result of the given mammograms, the model is able to detect all existing masses and classify them correctly. In the case of the large masses, when we compare the three models' performance, the model trained on large masses detects all the masses correctly and only one case out of the given 4 mammograms is classified wrongly; this can be treated by leaving the classification task to classifiers other than YOLO. So, to get accurate results we can depend on the LARGE-mass and SMALL-mass trained models, which yields better results than [3], which used YOLO-V1 for detection after training it with the same approach applied in the combined model.

Table 4. Results of mass detection & classification using YOLO-V3 on NEW mammograms using the COMBINED trained model versus the LARGE & SMALL trained models together (GT: Ground Truth, B: benign class, M: malignant class)

Mammogram ID | Size category | GT class | Combined model result | Small model result | Large model result
Test P_00131_LEFT CC | Small | B | B: 99% | B: 97% | Not Detected
Test P_01101_LEFT CC | Small | B | B: 30% | B: 99% | B: 83%
Test P_00576_LEFT CC | Small | M | M: 33% | M: 96% | Not Detected
Test P_00347_LEFT CC | Small | M | M: 57% | M: 95% | B: 83%
Test P_01365_LEFT CC | Large | B | M: 93% | Not Detected | B: 82%
Test P_01595_LEFT CC | Large | B | M: 53% | Not Detected | M: 99%
Test P_00296_LEFT CC | Large | M | M: 80% | Not Detected | M: 72%
Test P_00758_LEFT CC | Large | M | Not Detected | Not Detected | M: 92%

7 Conclusion

In this paper, we utilized the YOLO model to develop a new detection methodology by passing mammograms through two paths of testing. In our work, we used two of the commonly used public datasets, INbreast and CBIS-DDSM. All the mammograms selected from both datasets are divided by considering mammograms with masses of area less than or equal to 100,000 as small-mass mammograms and the rest as large-mass mammograms. Then, the same YOLO model is trained on each set separately, followed by two parallel paths of testing for the evaluation of new mammograms. The first path tests the new mammogram using the model trained on mammograms with large masses, while the second path tests the same mammogram using the model trained on mammograms with small masses. This results in an mAP of 89.51%, compared with the detection accuracy of 71.47% for the model trained on a wide range of mass sizes (large and small). Also, by implementing the proposed idea, the mass-type classification performance is improved compared with the recent YOLO-based breast mass detection.

References
1. Boyle, P., Levin, B., et al.: World Cancer Report 2008. IARC Press, International Agency
for Research on Cancer, Lyon (2008)
2. Al-antari, M.A., Al-masni, M.A., Park, S.U., Park, J.H., Metwally, M.K., Kadah, Y.M., Han,
S.M., Kim, T.S.: An automatic computer-aided diagnosis system for breast cancer in digital
mammograms via deep belief network. J. Med. Biol. Eng. 38(3), 443–456 (2017)
3. Al-masni, M., Al-antari, M.A., Park, J.M., Gi, G., Kim, T., Rivera, P., Valarezo, E., Han, S.
M., Kim, T.S.: Detection and classification of the breast abnormalities in digital
mammograms via regional convolutional neural network. In: 39th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2017), Jeju
Island, South Korea, pp. 1230–1236 (2017)
4. Al-masni, M.A., Al-antari, M., Park, J.M., Gi, G., Kim, T.Y.K., Rivera, P., Valarezo, E.,
Choi, M.T., Han, S.M., Kim, T.S.: Simultaneous detection and classifi-cation of breast
masses in digital mammograms via a deep learning YOLO-based CAD system. Comput.
Meth. Prog. Biomed 157, 85–94 (2018)

5. Al-antari, M.A., Al-masni, M.A., Park, S.U., Park, J.H., Kadah, Y.M. Han, S.M., Kim, T. S.:
Automatic computer-aided diagnosis of breast cancer in digital mammograms via deep belief
network, Global Conference on Engineering and Applied Science (GCEAS), Japan,
pp. 1306–1314 (2016)
6. Al-antari, M.A., Al-masni, M.A., Kadah, Y.M.: Hybrid model of computer-aided breast
cancer diagnosis from digital mammograms. J. Sci. Eng. 04(2), 114–126 (2017)
7. Wang, Y., Tao, D., Gao, X., Li, X., Wang, B.: Mammographic mass segmentation:
embedding multiple features in vector-valued level set in ambiguous regions. Pattern
Recognit. 44(9), 1903–1915 (2011)
8. Rahmati, P., Adler, A., Hamarneh, G.: Mammography segmentation with maximum
likelihood active contours. Med. Image Anal. 16(9), 1167–1186 (2012)
9. Domnguez, A.R., Nandi, A.: Toward breast cancer diagnosis based on automated
segmentation of masses in mammograms. Pattern Recognit. 42(6), 1138–1148 (2009)
10. Qiu, Y., Yan, S., Gundreddy, R.R., Wang, Y., Cheng, S., Liu, H., Zheng, B.: A new
approach to develop computer-aided diagnosis Scheme of breast mass classification using
deep learning technology. J. X-Ray Sci. Technol. 25(5), 751–763 (2017)
11. Hamed, G., Marey, M.A.E.R., Amin, S.E.S., Tolba, M.F.: Deep learning in breast cancer
detection and classification. In: Joint European-US Workshop on Applications of Invariance
in Computer Vision, pp. 322–333. Springer, Cham (2020)
12. Hamed, G., Marey, M., Amin, S.E.S. and Tolba, M.F.: A Proposed Model for denoising
breast mammogram images. In: 2018 13th International Conference on Computer
Engineering and Systems (ICCES), pp. 652–657. IEEE December 2018
13. Doi, K.: Computer-aided diagnosis in medical imaging: historical review, current status and
future potential. Comput. Med. Imaging Graph. 31(4), 198–211 (2007)
14. Van Ginneken, B., ter Haar Romeny, B.M., Viergever, M.: Computer-aided diagnosis in
chest radiography: a survey. IEEE Trans. Med. Imaging 20(12), 1228–1241 (2001)
15. Jiang, Y., Nishikawa, R.M., Schmidt, R.A., et al.: Improving breast cancer diagnosis with
computer-aided diagnosis. Acad. Radiol. 6(1), 2233 (1999)
16. Chan, H.-P., Doi, K., Vybrony, C.J., et al.: Improvement in radiologists detection of
clustered microcalcifications on mammograms: the potential of computer aided diagnosis.
Invest. Radiol. 25(10), 1102–1110 (1990)
17. Dhungel, N., Carneiro, G., Bradley, A.P.: Automated mass detection from mammograms using deep learning and random forest. In: International Conference on Digital Image Computing: Techniques and Applications (DICTA) (2015). https://doi.org/10.1109/dicta.2015.7371234
18. Jiao, Z., Gao, X., Wang, Y., Li, J.: A deep feature based framework for breast masses
classification. Neurocomputing 197, 221–231 (2016)
19. Suzuki, S., Zhang, X., Homma, N., Ichiji, K., Sugita, N., Kawasumi, Y., Ishibashi, T.,
Yoshizawa, M.: Mass detection using deep convolutional neural network for mammographic
computer-aided diagnosis. In: Proceedings of the SICE Annual Conference 2016, Tsukuba,
Japan, pp. 1382–1386 (2016)
20. Akselrod-Ballin, A., Karlinsky, L., Alpert, S., Hasoul, S., Ben-Ari, R., Barkan, E.: A region
based convolutional network for tumor detection and classification in breast mammography,
pp. 197–205. Springer, Cham (2016)
21. Wu, N., Phang, J., Park, J., Shen, Y., Huang, Z., Zorin, M., Jastrzbski, S., et al.: Deep neural
networks improve radiologists performance in breast cancer screening. IEEE Trans. Med.
Imaging 39(4), 1184–1194 (2019)
22. Jiang, F.: Breast mass lesion classification in mammograms by transfer learning. In: ICBCB '17, Hong Kong, pp. 59–62 (2017). https://doi.org/10.1145/3035012.3035022

23. Rodriguez-Ruiz, A., Lång, K., Gubern-Merida, A., Broeders, M., Gennaro, G., Clauser, P., Thomas, H.H., et al.: Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. JNCI: J. Natl. Cancer Inst. 111(9), 916–922 (2019)
24. Moreira, I., Amaral, I., Domingues, I., Cardoso, A., Cardoso, M., Cardoso, J.: INbreast:
toward a full-field digital mammographic database. Acad. Radiol. 19(2), 236–248 (2012)
25. Lee, R.S., Gimenez, F., Hoogi, A., Miyake, K.K., Gorovoy, M., Rubin, D.L.: A curated
mammography dataset for use in computer-aided detection and diagnosis research. Sci. Data
4, 170–177 (2017)
26. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time
object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 779–788 (2016)
27. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
28. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
An Integrated IoT System to Control
the Spread of COVID-19 in Egypt

Aya Hossam1(&), Ahmed Magdy2, Ahmed Fawzy3,


and Shriene M. Abd El-Kader4
1
Electrical Engineering Department, Faculty of Engineering (Shoubra),
Benha University, Benha, Egypt
aya.ahmed@feng.bu.edu.eg
2
Electrical Engineering Department, Faculty of Engineering,
Suez Canal University, Ismailia, Egypt
3
Nanotechnology Central Lab, Electronics Research Institute (ERI),
Cairo, Egypt
4
Computers and Systems Department, Electronics Research Institute,
Cairo, Egypt

Abstract. Coronavirus disease 2019 (COVID-19) is one of the most dangerous


respiratory illnesses of the last one hundred years. Its danger stems from its ability to spread quickly between people. This paper proposes a smart, practical solution to help the Egyptian government track and control the spread of COVID-
19. In this paper, we suggest an integrated system that can ingest big data from
different sources using Micro-Electro-Mechanical System (MEMS) IR sensors
and display results in an interactive map, or dashboard, of Egypt. The proposed
system consists of three subsystems, which are: Embedded Microcontroller
(EM), Internet of Things (IoT) and Artificial Intelligence (AI) subsystems.
The EM subsystem includes an accurate temperature measuring device using IR sensors and other detection components. It can be used at the entrances of places like universities, schools, and subways to screen and check the temperature of people from a distance within seconds and obtain data about suspected cases. Then, the IoT subsystem will transmit the collected data from individuals, such as temperature, ID, age, gender, location, phone number, etc., to the specific places and organizations. Finally, software based on AI analysis will be applied to compute statistics and forecast how and to what extent the virus
will spread. Due to the important role of Geographic Information Systems
(GIS) and interactive maps, or dashboards, in tracking COVID-19, this paper introduces an advanced dashboard of Egypt. This dashboard has been introduced to locate and tally confirmed infections, fatalities, and recoveries and to present the statistical results of the AI model.

Keywords: COVID-19 · Embedded Microcontroller (EM) · MEMS · Internet of Things (IoT) · Artificial Intelligence (AI)

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 336–346, 2021.
https://doi.org/10.1007/978-3-030-58669-0_31
An Integrated IoT System to Control the Spread of COVID-19 in Egypt 337

1 Introduction

The novel coronavirus has spread from China to several countries around the world, so it was announced as a global pandemic by the World Health Organization (WHO) on 12 March 2020 [1–3]. Since that time, all the affected countries have been exploring effective and practical solutions to tackle the problems arising due to COVID-19 [4, 5]. At the time of writing this paper, June 24, 2020, the total number of COVID-19 confirmed cases had reached 9,154,232, including around 473,650 deaths reported to WHO. For Egypt, there are 56,809 confirmed cases with around 2,278 deaths from the disease [6].
Researchers in science and engineering are attempting to propose new models and systems that can help the community fight COVID-19 [7–9]. Any system or model for monitoring COVID-19 infected people can help to contain the spread of the virus and alert people or health workers to the expected spread rate. The latest improvements in the fields of information and communication technologies (ICTs) [10, 11], the Internet of Things (IoT) [12–14], and artificial intelligence (AI) [13, 15] can help researchers to build systems and models that help stop the spread of COVID-19. These technologies can be used to handle the large amount of data coming from public health surveillance, real-time epidemic outbreak monitoring, trend nowcasting/forecasting, and regular situation briefing and updating from governmental institutions and organizations [16]. AI technology can help to fight and contain the coronavirus through many applications such as population screening, tracking how and where infection outbreaks occur, and notifications of when to seek medical help [13]. Screening the population helps to identify who is potentially ill, which is necessary for containing COVID-19. Based on that, predictions of where the virus might appear next can be obtained, helping the government to take an effective response [17]. IoT technology can provide the community with real-time tracking and live updates in various online databases such as interactive country maps or dashboards [18]. Such applications assist communities in predicting which populations are most susceptible to the negative impacts of a coronavirus spread and help to alert individuals to expected infection regions in real time so that those regions can be avoided.
In this paper, a new integrated system is proposed that can ingest big data from different sources using IR sensors and display the results in an interactive map of Egypt (see Fig. 1). This system consists of two parts: hardware and software. The hardware part includes the Embedded Microcontroller (EM) and IoT subsystems. The goal of the proposed hardware system is to manufacture a device based on MEMS IR sensors that monitors the temperature of people in the proximity and quickly determines whether they may have a fever, as one of the symptoms of the coronavirus. On the other hand, the software part includes AI models used to compute statistics on the data collected by the hardware part. The AI part also helps in introducing an advanced interactive map that locates and tallies confirmed infections, fatalities, and recoveries, with graphs illustrating the virus spread over time. The proposed system will help the competent authority in the Egyptian government to identify suspected coronavirus cases, thereby reducing the spread of the virus.

[Fig. 1 flowchart: people are assessed at a checkpoint. If there are no indicators of possible COVID-19 infection, the person is treated as a normal case. If there are indicators (high temperature), data are taken from the case (name, age, gender, ID, other symptoms, travel history, contact with confirmed cases), the importance of home isolation is explained, and the person is asked to contact 105 for Egyptian Ministry of Health help ("patient under investigation"). This operation is performed for all people and at different checkpoints. The collected data are sent through the IoT subsystem to the AI subsystem, which provides statistics of the collected data, a reporting and monitoring application, registration of patients in a database, tracking and monitoring of patients all over Egypt, and an Egypt dashboard to track COVID-19 showing the output curve of the AI model.]
Fig. 1. The Flowchart of the whole proposed Integrated System.

2 The Proposed System

This section explains in detail the proposed integrated system based on IoT to fight the spread of COVID-19 in Egypt. This integrated system has two parts: a hardware part and a software part. The hardware part includes the EM and IoT subsystems, while the software part includes the AI software-based models. Firstly, the EM subsystem includes an accurate temperature measuring device using MEMS IR sensors, a microcontroller, a digital screen with online contact, and other detection components. This proposed device can be used in places such as universities, schools, airports, and subway and railway stations. The advantage of this hardware device is that it can screen people from a distance and, within minutes, can test hundreds of individuals for fever.

Secondly, the IoT subsystem which transmits the collected data from individuals such
as temperature, ID, age, gender, location, phone number…. etc., to the specific places
for analysis. Finally, after collecting the required data, a software based on AI analysis
will be applied to execute statistics and forecast how and to what extent the virus will
spread, using a set of features and pre-determined parameters. Finally, an advanced
interactive map, denotated as 3AS digital dashboard, of Egypt has been introduced to
locate confirmed infections, fatalities and recoveries. This service allows Geographic
Information System (GIS) users to consume and display disparate data inputs without
central hosting or processing to ease data sharing and speed information aggregation.
The 3AS dashboard helps to predict the virus spread over time and regions. Fig. 1
shows the flowchart of the whole proposed integrated system to control the spread of
COVID-19.

2.1 Hardware Part


This section explains the hardware part of the proposed system in detail. This paper proposes a new hardware device based on MEMS IR sensors to check the temperature of people at the entrances of various places. This part includes the EM and IoT subsystems.
The Embedded Microcontroller Subsystem is considered one of the most important modern systems worldwide. In the proposed system, the EM subsystem has an indispensable function and represents the link between the system's brain and its other parts. The basic component responsible for temperature detection is the MEMS IR sensor. Infrared (IR) radiation is emitted from any object with a temperature above absolute zero. The frequency band ranges from 3×10^11 to 4×10^14 Hz, with wavelengths from 0.75 to 1000 µm. Thus, IR radiation is defined as the electromagnetic radiation lying between the visible light and microwave ranges. Some important factors affect both the IR spectrum and the energy density, namely the object type, surface shape, surface temperature, and other factors. The Planck radiation formula [19] represents the relationship between IR radiation and temperature. The ideal IR-radiating object is denoted as a "black body".
In the proposed EM system, the MEMS IR sensor helps in the detection of COVID-19 as the basic element that detects the body temperature of people. It consists of an LED and a photodiode. The principle of operation is based on the emission of light from the IR LED and the sensing of IR radiation by the photodiode. When IR radiation falls on the photodiode, the photodiode resistance changes with the intensity of the radiation, and the voltage drop across the photodiode changes as well. A voltage comparator is used to measure the voltage drop and produce the output accordingly, as shown in Fig. 2(a, b) [20].
The positioning of the LED and photodiode is classified into two techniques: direct and indirect. In the direct method, the IR LED and photodiode are in line of sight. In the case of indirect incidence, which is used in the proposed thermal gun device, the IR LED and photodiode are placed in parallel (side by side), both facing in the same direction. In this position, as shown in Fig. 2(d), when an object is placed in front of the IR pair, the IR light is reflected by the object and absorbed by the photodiode. For the proposed thermal gun device, a sensor is needed that senses the temperature of the human body (not the ambient temperature) without any direct contact with the person.

One of the working principles can be summarized as follows: consider two different materials
A and B. When IR radiation is gathered by the absorber, as shown in Fig. 2(c), the
thermocouple junction warms up. The temperature difference between the hot
junction and the cold junction stabilizes, and the Seebeck effect generates a voltage between
the open ends as follows:

V_out = (α_A − α_B) ΔT    (1)

where α_A and α_B are the Seebeck coefficients of thermoelectric materials A and B,
respectively. A thermopile is a series-connected array of thermocouples; thus, the
voltage generated by the thermopile IR detector is directly proportional to the number
of thermocouples N:

V_out = N (α_A − α_B) ΔT = (α_A − α_B) ΔT_total    (2)

where ΔT_total is the sum of the temperature differences across the thermocouples.
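
As a numerical illustration of Eqs. (1) and (2), the sketch below computes the thermopile output voltage for an assumed pair of Seebeck coefficients and an assumed number of junctions; the coefficient values and the 2 K temperature difference are placeholders, not the sensor's datasheet figures.

```python
def thermocouple_voltage(alpha_a, alpha_b, delta_t):
    """Eq. (1): V_out = (alpha_A - alpha_B) * dT for a single thermocouple."""
    return (alpha_a - alpha_b) * delta_t

def thermopile_voltage(n_junctions, alpha_a, alpha_b, delta_t):
    """Eq. (2): N series-connected thermocouples multiply the output voltage."""
    return n_junctions * thermocouple_voltage(alpha_a, alpha_b, delta_t)

if __name__ == "__main__":
    # Assumed example values: Seebeck coefficients in V/K, 2 K hot/cold difference
    ALPHA_A, ALPHA_B = 200e-6, -70e-6
    DELTA_T = 2.0
    N = 60
    v_single = thermocouple_voltage(ALPHA_A, ALPHA_B, DELTA_T)
    v_pile = thermopile_voltage(N, ALPHA_A, ALPHA_B, DELTA_T)
    print(f"Single junction: {v_single * 1e6:.1f} uV, thermopile of {N}: {v_pile * 1e3:.2f} mV")
```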

Fig. 2. Components and working principle of IR sensor (a) Example of MEMS IR sensor [20],
(b) Main components of MEMS IR sensor, (c) MEMS thermoelectric IR sensor, and (d) Indirect
incidence.

IoT System is an important field that helps to capture real-time data at scale. These
data will be used by the AI and data-analysis systems to understand healthcare trends,
model risk associations, and predict outcomes. IoT can help to develop a real-time
tracking map for following COVID-19 cases across Egypt.
Therefore, this paper uses IoT in the proposed system to transmit the collected
data from the different checkpoints, in real time, to the specific places and organizations for
analysis. IoT also helps in updating the numbers on the proposed 3AS dashboard every day.
There are many possible design solutions for the IoT subsystem; this paper introduces
two of them that can be used in the proposed system. The suitable design solution can be
chosen according to the place and the number of people to be monitored. Fig. 3 presents
the components of the two design solutions of the IoT subsystem, and Table 1 provides a
comparison between these two models of the proposed IoT system.
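
To make the data path concrete, here is a minimal, hypothetical sketch of how a checkpoint node could forward one reading to the analysis side. The paper does not specify the transport; the HTTP endpoint URL and the JSON field names below are assumptions made purely for illustration.

```python
import json
import time
import urllib.request

API_URL = "https://example.org/api/checkpoint-readings"  # hypothetical endpoint

def publish_reading(temperature_c, person_id, age, gender, location, phone):
    """POST one screening record from a checkpoint to the analysis backend."""
    record = {
        "timestamp": time.time(),
        "temperature_c": temperature_c,
        "id": person_id,
        "age": age,
        "gender": gender,
        "location": location,
        "phone": phone,
    }
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status  # 200/201 expected on success

# Example call (requires a reachable endpoint):
# publish_reading(38.2, "EG-0001", 34, "F", "Cairo University gate 3", "+20100000000")
```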

Fig. 3. The components of the two design solutions of IoT subsystem.

Table 1. A comparison between Raspberry Pi and ESP models of the proposed Hardware part.

Raspberry Pi Model
Principle of working: the Raspberry Pi is used as the main brain (processor) of the embedded system to perform any analysis on the EM system. This model uses GSM and GPS modules to link the Raspberry Pi with the IoT and AI subsystems.
Advantages: low cost; DSI display port for connecting a Raspberry Pi touchscreen display; micro SD port for loading the operating system and storing data.
Disadvantages: low GUI performance; low reliability.

Smart Phone Model
Principle of working: the smart phone performs the data entry in this model and provides the connection between the IoT subsystem and the EM system. This model uses an ESP module for the direct connection between the sensors and the IoT and AI subsystems.
Advantages: high GUI performance; reliable system; more secure; supports smartphone platform features such as Firebase.
Disadvantages: high cost.

2.2 Software Part: AI Model


The software part includes the AI models, which play a vital role in analyzing the
data collected by the proposed hardware part. Since the coronavirus outbreak,
researchers have scrambled to use AI models and other data-analytic tools to explain
COVID-19 infections and to predict and monitor the virus spread. This can help the
government manage and limit the socio-economic impacts.
In this paper, the proposed system uses an AI model to track and predict how the
COVID-19 disease spreads, not only over time but also over areas. After collecting the
required data from the hardware part, AI-based analysis software is applied to compute
statistics and forecast how, and to what extent, the virus will spread, given a set of
pre-determined parameters and characteristics as shown in Fig. 4.

Fig. 4. The operation of AI in the proposed integrated system.
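
The paper does not detail the forecasting model, so the following is only a minimal sketch of the kind of analysis the AI component could run: fitting a logistic growth curve to a short series of cumulative case counts and extrapolating a few days ahead. The sample numbers are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, capacity, growth_rate, midpoint):
    """Logistic growth curve commonly used for cumulative epidemic counts."""
    return capacity / (1.0 + np.exp(-growth_rate * (t - midpoint)))

# Illustrative (made-up) cumulative confirmed cases for 10 consecutive days
days = np.arange(10)
cases = np.array([12, 18, 30, 47, 70, 104, 150, 210, 280, 360], dtype=float)

# Fit the curve; initial guesses: capacity ~ 10x last count, moderate growth, mid series
params, _ = curve_fit(logistic, days, cases, p0=[cases[-1] * 10, 0.5, 5.0], maxfev=10000)

# Forecast the next 5 days from the fitted parameters
future = np.arange(10, 15)
forecast = logistic(future, *params)
print("Forecast:", np.round(forecast).astype(int))
```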

This paper also introduces an advanced AI-supported interactive map (or dashboard) that locates
and tallies confirmed infections, fatalities and recoveries in Egypt. The role of the AI
model in this dashboard is that it can estimate and divide regions into groups of no-risk,
moderate-risk, and high-risk regions. The identified high-risk areas can then be quarantined
earlier to help the government reduce the spread of the coronavirus. This web
service allows GIS users to display different data inputs without central processing, which
helps to ease data sharing and speed up information aggregation.
The proposed dashboard also presents an "outbreaks near you" feature, or alert
message, that informs individual users about nearby infected areas based on their
current location as obtained from their web browser or smartphone [6, 7]. The proposed
interactive map helps users to detect the areas that should be put under quarantine.
Communication through the proposed dashboard provides accessible information to
Egyptian people around the country so that they can protect themselves and their communities.
This type of tool improves data transparency and helps authorities disseminate information.

3 Results

Geographic Information Systems (GIS) and interactive maps, or dashboards, are
considered important and critical tools for tracking, monitoring, and combating
COVID-19. In response to that, this paper develops the first interactive dashboard in
Egypt to visualize and track the daily reported cases of COVID-19 in real time. This
developed dashboard, called the 3AS dashboard after the first letters of the authors'
names, declares the location and number of confirmed COVID-19 cases, deaths, and
recoveries in Egypt. It also helps researchers, scientists, and public health authorities
to make searches and use AI models to produce statistics. All collected and displayed data
are made available and are taken from the Egyptian Health Ministry, WHO reports, and
Google Sheets about COVID-19. The reported data displayed on the developed
3AS dashboard align with the daily WHO situation reports for Egypt (see
Fig. 5). Furthermore, the 3AS dashboard is particularly effective at capturing data
about infected cases, deaths, etc., of COVID-19 in newly infected regions all over
Egypt. The developed 3AS dashboard provides much information related to COVID-19,
such as the detailed reported data for each governorate in Egypt, hotline links to the
Egyptian Health Ministry, the international WHO COVID-19 link, and the daily/accumulated
statistical results of the AI model (see Fig. 6). Also, newly confirmed cases
can use the 3AS dashboard to find the nearest hospital that has empty intensive-care beds.

Fig. 5. The developed 3AS dashboard of Egypt.



Fig. 6. The developed 3AS dashboard indicates Alex map and detailed report.

4 Conclusion

This paper introduces an integrated IoT system to control COVID-19. The proposed
system consists of two parts: hardware and software. The hardware part
includes the EM and IoT subsystems, while the software part includes the AI software-
based models. The hardware part was designed to help screen people's temperature
and to provide the AI models with the collected data for statistical analysis. MEMS IR
sensors in the EM system were used to obtain faster and more accurate temperature
readings. Also, this paper introduces the two best IoT models that can be used
to transmit the data for processing and analysis.
This paper developed the first interactive dashboard, named the 3AS dashboard, spe-
cialized for Egypt. The 3AS dashboard declares the location and number of confirmed
COVID-19 cases, deaths, and recoveries in Egypt. It provides the community with much
information related to COVID-19, such as the detailed reported data in Egypt, hotline
links to the Egyptian Health Ministry, the international WHO COVID-19 link, the
location of hospitals with empty intensive-care beds, and the daily/accumulated statistical
results of the AI model.
With respect to further improvement, ongoing developments indicate that
important enhancements to the proposed integrated system, as well as the design and
implementation of the proposed EM system, can be expected in the near future. These
enhancements will have a great effect on the proposed 3AS dashboard for controlling the
spread of COVID-19 in Egypt.

References
1. Brüssow, H.: The novel coronavirus – a snapshot of current knowledge. Microb. Biotechnol.
13, 607–612 (2020). https://doi.org/10.1111/1751-7915.13557
2. Zhu, N., Zhang, D., Wang, W., et al.: A novel coronavirus from patients with pneumonia in
China, 2019. N. Engl. J. Med. 382, 727–733 (2020). https://doi.org/10.1056/
NEJMoa2001017
3. WHO/Europe—Coronavirus disease (COVID-19) outbreak. http://www.euro.who.int/en/
health-topics/health-emergencies/coronavirus-covid-19. Accessed 30 May 2020
4. Huang, C., Wang, Y., Li, X., et al.: Clinical features of patients infected with 2019 novel
coronavirus in Wuhan, China. Lancet 395, 497–506 (2020). https://doi.org/10.1016/S0140-
6736(20)30183-5
5. Rapid AI Development Cycle for the Coronavirus (COVID-19) Pandemic: Initial Results for
Automated Detection & Patient Monitoring using Deep Learning CT Image Analysis.
https://arxiv.org/abs/2003.05037. Accessed 24 June 2020
6. Coronavirus COVID-19 (2019-nCoV). https://www.arcgis.com/apps/opsdashboard/index.
html#/bda7594740fd40299423467b48e9ecf6. Accessed 30 May 2020
7. COVID-19: China’s Resilient Digital & Technologies—Accenture. https://www.accenture.
com/cn-en/insights/strategy/coronavirus-china-covid-19-digital-technology-learnings. Accessed
24 June 2020
8. Abnormal respiratory patterns classifier may contribute to large-scale screening of people
infected with COVID-19 in an accurate and unobtrusive manner. https://arxiv.org/abs/2002.
05534. Accessed 24 Jun 2020
9. Ting, D.S.W., Carin, L., Dzau, V., Wong, T.Y.: Digital technology and COVID-19. Nat.
Med. 26, 459–461 (2020)
10. New Report: How Korea Used ICT to Flatten the COVID-19 Curve. https://www.ictworks.
org/korea-used-ict-flatten-covid-19-curve/#.XuJ990UzbIU. Accessed 11 June 2020
11. Okereafor, K., Adebola, O.: The role of ICT in curtailing the global spread of the
coronavirus disease (2020). https://doi.org/10.13140/RG.2.2.35613.87526
12. Singh, R.P., Javaid, M., Haleem, A., Suman, R.: Internet of Things (IoT) applications to
fight against COVID-19 pandemic. Diabetes Metab Syndr 14, 521–524 (2020). https://doi.
org/10.1016/j.dsx.2020.04.041
13. Vaishya, R., Javaid, M., Khan, I.H., Haleem, A.: Artificial Intelligence (AI) applications for
COVID-19 pandemic. Diabetes Metab. Syndr. Clin. Res. Rev. 14, 337–339 (2020). https://
doi.org/10.1016/j.dsx.2020.04.012
14. Selem, E., Fatehy, M., El-Kader, S.M.A., Nassar, H.: THE (Temperature heterogeneity
energy) aware routing protocol for IoT health application. IEEE Access 7, 108957–108968
(2019). https://doi.org/10.1109/ACCESS.2019.2931868
15. Hu, Z., Ge, Q., Li, S., et al.: Artificial Intelligence Forecasting of Covid-19 in China (2020).
https://arxiv.org/abs/2002.07112
16. Selem, E., Fatehy, M., El-Kader, S.M.A.: E-Health applications over 5G networks:
challenges and state of the art. In: ACCS/PEIT 2019 - 2019 6th International Conference on
Advanced Control Circuits and Systems and 2019 5th International Conference on New
Paradigms in Electronics and Information Technology. Institute of Electrical and Electronics
Engineers Inc., pp 111–118 (2019)

17. Ahmed, E.M., Hady, A.A., El-Kader, S.M.A., et al.: Localization methods for Internet of
Things: current and future trends. In: ACCS/PEIT 2019 - 2019 6th International Conference
on Advanced Control Circuits and Systems and 2019 5th International Conference on New
Paradigms in Electronics and Information Technology. Institute of Electrical and Electronics
Engineers Inc., pp 119–125 (2019)
18. Kamel Boulos, M.N., Geraghty, E.M.: Geographical tracking and mapping of coronavirus
disease COVID-19/severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
epidemic and associated events around the world: how 21st century GIS technologies are
supporting the global fight against outbreaks and epidemics. Int. J. Health Geogr. 19, 8
(2020). https://doi.org/10.1186/s12942-020-00202-8
19. Xu, D., Wang, Y., Xiong, B., Li, T.: MEMS-based thermoelectric infrared sensors: a review.
Front. Mech. Eng. 12(4), 557–566 (2017). https://doi.org/10.1007/s11465-017-0441-2
20. D6T MEMS Thermal Sensors—OMRON - Americas. https://www.components.omron.com/
product-detail?partNumber=D6T. Accessed 21 June 2020
Healthcare Informatics Challenges: A Medical
Diagnosis Using Multi Agent Coordination-
Based Model for Managing the Conflicts
in Decisions

Sally Elghamrawy1,2(&)
1
Computer Engineering Department, MISR Higher Institute for Engineering
and Technology, Mansoura, Egypt
sally_elghamrawy@ieee.org, sally@mans.edu.org
2
Scientific Research Group in Egypt (SRGE), Mansoura, Egypt

Abstract. Healthcare Informatics is mainly concerned with the management of
patient medical information using different information technologies. Automated
medical diagnosis is one of the most challenging tasks in the healthcare informatics
field, due to diverse clinical considerations and the conflicting diagnoses that might
occur. To this end, a Multi Agent Coordination-based Model (MACM) is presented in
this paper to manage the conflicts in decisions that might occur during the diagnosis
process. In MACM, coordination between different agents is applied in the form of
competition and negotiation processes. A Bidding Contract Competition Module (BCCM)
is proposed to handle the bidding and contracting between agents. In addition, an
Adaptive Bidding Protocol (ABP) is proposed to manage the bidding and selecting
phases in BCCM. The performance of the proposed BCCM module is evaluated using a
number of experiments. The obtained results show better performance when
compared to different multi-agent systems.

Keywords: Healthcare informatics · Multi-Agent System (MAS) · Medical
diagnosing · Agent competition · Agent negotiation · Bidding protocols

1 Introduction

Healthcare Informatics is a multidisciplinary area [1, 2] that combines medical, social and
computer sciences. Healthcare informatics employs information technologies to extract
knowledge from medical data and to manage healthcare information. Automated medical
diagnosis is a primary challenge in healthcare informatics: healthcare workers need
automated medical diagnosis that saves their time and effort. In this sense, many
researchers have provided innovative diagnosis models for various diseases (e.g., breast
cancer [3], COVID-19 [4, 21] and Alzheimer's disease [5]). The automated diagnosis models
help clinicians reach diagnostic decisions using medical decision-support systems.
However, there are some cases where two or more diagnostic decisions conflict with each
other, causing uncertainty in the provided data and affecting the accuracy of the
decision-making process. A Multi-Agent System (MAS) [6] is the best choice for resolving
this conflict because its nature is inspired by the coordination and communication skills
of human structural models working in team mode.
The coordination within a MAS is essential for managing the inner dependencies
among agents [7, 8]. Furthermore, a MAS can represent different perspectives in a
medical situation. For example, during the COVID-19 pandemic [9], senior medical
specialists may remotely monitor and diagnose patients while being located far
from the care centre, which is staffed by junior doctors with less experience. This situation may
lead to conflicting decisions. In this paper, a Multi Agent Coordination-based Model
(MACM) is proposed to manage the conflicting decisions that might occur during the
diagnosis process. The proposed coordination-based model simulates the interaction
between two agents, Senior (S-Agents) and Junior (J-Agents) medical agents, during the
diagnosis of a patient's case. To ensure the success of agents' actions and interactions in
MACM, their coordination, including competition and cooperation, becomes impera-
tive. The negotiation module in MACM is used to resolve the conflicts that might
occur when agents try to gain more reward in the assignment of a specific task or goal. As a
result, this paper's main goal is to develop a coordination module that provides the
agents with the ability to resolve their conflicts and reach a compromise.
The paper is organized as follows: Sect. 2 presents recent work on developing
MASs in the healthcare informatics field. The medical diagnosis process using the proposed
Multi Agent Coordination-Based Model (MACM) is presented in Sect. 3. In Sect. 4, the
Bidding Contract Competition Module (BCCM) of the Competition Module is proposed,
considering the bidding and contracting between agents and showing the main con-
tributions that can help in the development of MACM. In addition, an Adaptive
Bidding Protocol (ABP) is proposed to manage the bidding and selecting phases in
BCCM. The performance of BCCM is evaluated experimentally in Sect. 5. Finally,
Sect. 6 concludes the paper's main contributions and proposes topics for future research.

2 Related Work

Multi-Agent Systems (MAS) are extensively considered in the healthcare informatics
field [11], for example in medical diagnosis [12–15], patient scheduling [16–18], and
medical decision-support systems [19]. The authors in [12] proposed a distributed
architecture for diagnosis using artificial immune systems (AIS). The architecture
used four common agents in their MAS: the Cure provider agent, Grouped diagnosis agent,
Diagnosis agent and B-cell agents, and a fuzzy inference system is presented for automated
diagnosis. A real distributed medical diagnosis application is presented in [13]
using a mixture of possibilistic logic and argumentation theory; a query dialogue
method is presented to detect ambiguous and inconsistent data. An agent-based
modelling framework is proposed in [14] for analysing Alzheimer's MRI scans using image
segmentation and processing; the framework depends on the cooperation and nego-
tiation among different agents. The patient scheduling challenge is one of the recent
research areas in which many researchers have made efforts to present models that reduce
patients' waiting times. Gao et al. [16] presented a decentralized MAS model for
services scheduling. The model presented a negotiation process between agents using
game-theoretic approaches.
This research assumed that patient selection is one type of agent and the
scheduling priorities are the other, using a contract net protocol. In
addition, the authors in [18] presented two agent negotiation models to automatically
schedule patients' meetings using a counterproposal approach; the first model's main goal
is to propose new slots for the meeting, while the second attempts to ensure that the
patient attends the particular slots. Jemal et al. [19] proposed a Decision Support
System based on a MAS using intuitionistic fuzzy logic to allow the implementation of the
considered project in healthcare spots using cloud and mobile computing technologies.
These researchers developed many agent-based negotiation models for the healthcare
domain. However, limited efforts have been presented on how the agents are governed in
a cooperative or competitive manner. In this context, this paper proposes a multi-agent
coordination-based model that manages the interaction between agents using competi-
tion, negotiation and cooperation modules. The authors in [20] addressed the problem
of negotiating agents operating in a MAS by applying an optimization algorithm; an
integration of a machine learning technique with a negotiation optimization
algorithm is presented to analyze intelligent supply-chain management. Finally, a number
of researchers [22–24] presented solutions for the communication between different
types of agents using ontology mapping systems, in order to provide semantic inter-
operability between agents.

3 The Proposed Multi Agent Coordination-Based Model (MACM) During Medical Diagnosing

The main goal of MACM is to propose a multi-agent system for coordination between
two different medical opinions during medical diagnosis. The first opinion is that of the senior
medical specialists, who may be remotely monitoring and diagnosing patients while being
located far from the care centre; this opinion is simulated in MACM as Senior agents
(S-Agents). The second opinion is that of the trainee (junior) doctors with less experience, who
have enough knowledge about the case of the patient but have difficulty reaching
a diagnostic decision without consulting the senior; this opinion is simulated in
MACM as Junior agents (J-Agents). This situation may lead to conflicting decisions.
In this context, a Multi Agent Coordination-based Model (MACM) is proposed to
manage the conflicting decisions that might occur during the diagnosis process.
MACM simulates the interaction between Senior (S-Agent) and Junior
(J-Agent) medical agents during the diagnosis of a patient's case. MACM is
responsible for providing agents with the ability to coordinate the interactions among dif-
ferent agents. This coordination can take the form of cooperation or competition with
other agents. Figure 1 shows the coordination module and its interaction with the other
modules. The S-Agent and J-Agent are associated with a set of task-specific agents.
According to the user's needs, these agents cooperate with each other to achieve the user's
required task. The coordination module attempts to allocate diagnoses to other agents
and synthesizes the results from these agents to generate an overall output.
The S-Agent and J-Agent are defined as the two main agents in MACM using the Java Agent
Development Framework (JADE). The medical knowledge of patients' cases is stored
and used by the S-Agents, while the J-Agents use the raw data in the patient-case database,
which contains the signs and personal data of the patients. When J-Agents need to cooperate with
S-Agents to reach a specific diagnosis, a dialogue is initiated between them.
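
The agents themselves are implemented in JADE (Java); purely to illustrate the J-Agent-to-S-Agent consultation flow described above, here is a minimal Python sketch with hypothetical message and agent classes (none of these class or field names come from the paper).

```python
from dataclasses import dataclass, field

@dataclass
class ConsultRequest:
    """Message a J-Agent posts when it cannot reach a diagnosis on its own."""
    patient_id: str
    signs: dict            # raw signs/personal data from the patient-case database
    tentative_diagnosis: str

@dataclass
class SeniorAgent:
    name: str
    knowledge: dict = field(default_factory=dict)  # stored medical knowledge of cases

    def review(self, request: ConsultRequest) -> str:
        """Return the senior opinion; falls back to the junior's view if the case is unknown."""
        key = tuple(sorted(request.signs))
        return self.knowledge.get(key, request.tentative_diagnosis)

@dataclass
class JuniorAgent:
    name: str

    def consult(self, senior: SeniorAgent, request: ConsultRequest) -> str:
        senior_opinion = senior.review(request)
        if senior_opinion != request.tentative_diagnosis:
            # Conflict detected: in MACM this is where the competition and
            # negotiation modules would take over, rather than preferring one side.
            return f"CONFLICT: junior={request.tentative_diagnosis}, senior={senior_opinion}"
        return senior_opinion

if __name__ == "__main__":
    senior = SeniorAgent("S1", {("cough", "fever"): "suspected COVID-19"})
    junior = JuniorAgent("J1")
    req = ConsultRequest("P-100", {"fever": 38.5, "cough": True}, "seasonal flu")
    print(junior.consult(senior, req))
```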

Fig. 1. The Multi Agent Coordination based Model (MACM) for managing medical diagnosis

4 The Proposed Bidding Contract Competition Module (BCCM)

The Agent Competition module in MACM is activated when the coordination module of an
agent gets information indicating that this agent will compete with other agents to win a
specific diagnosis, i.e., when there is a conflict in decision between the S-Agent and the J-Agent.
To win a specific diagnosis, the agents in competition need to propose a bid, and this bid is
then revised. After revising the proposed bids, the diagnosis is assigned to the pair of
agents with the highest bid. As a result, there must be contract messages between these
agents to control and manage the interaction between them. To manage this sequence
of competition between agents, a Bidding Contract Competition Model (BCCM) is
proposed, as shown in Fig. 2. BCCM's main goal is to allow agents to compete with
each other in order to give the right diagnostic decision, based on the result of the pro-
posed bidding protocol, and to facilitate agents in dynamically generating contracts with
other agents. The J-Agents are the agents that announce that they need help with the
diagnosis task; they create a task/plan repository to provide information about the
diagnosis process. The S-Agents are responsible for responding to these
announcements by proposing a bid and then performing the delegated diagnosis.
A competition is held among different S-Agents in BCCM using the proposed
bidding protocol, each trying to obtain the right to take a specific task. As shown in Fig. 2,
BCCM consists of four main phases. (1) The Broadcasting Phase is used to connect the
J-Agents' needs with the S-Agents' offers. It consists of three basic modules.
Fig. 2. The Bidding Contract Competition Model (BCCM)

The Task Announcer is used to make each J-Agent announce the task or sub-task that needs to be
diagnosed; the J-Agent broadcasts its information based on a pre-defined diagnosis
repository. The Blackboard Checker: each J-Agent posts its needs for executing a
task or sub-task on the blackboard space, and the S-Agents then fetch the queries coming
from the J-Agents using this module. (2) The Bidding Phase is used to handle the bids
proposed by S-Agents to allocate a specific task, depending on the capabilities and
behaviour of the S-Agents. It consists of four basic modules. The Agent Bidder Creator
Module is used to collect information about the requested diagnosis broadcast;
all registered S-Agents then create bids to start the competition for allocating
the diagnosis task. The Capabilities Evaluator Module is used to evaluate the
capabilities tuple of each S-Agent for performing the announced diagnosis. The Beha-
viours Evaluator Module: each S-Agent also evaluates its behaviours when performing
the broadcast task or sub-task [10]. The Bid Formulator Module: after each S-Agent
evaluates its capabilities and behaviours for performing the broadcast task/sub-task, it
formulates a corresponding bid based on the bidding protocol, which is in turn sent to the
selection phase. (3) The Selection Phase is used to revise the bids proposed by each S-
Agent and then choose and modify the bid after one bid iteration. It consists of three
main modules. The Bid/Agent Association Module is used to associate each bid
with its desired task/sub-task and with its formulating S-Agent and stores it in the bid/agent
library. The Bid Comparison Module: the J-Agent uses this module to compare the
bids delivered from the S-Agents. The Agent Selector Module is used to select the pair of S-
Agents with the maximum bids for executing a specific diagnosis task. (4) The
Contracting Phase is used to control the interaction between the competing S-Agents
by generating contract messages between them. It consists of three main modules. The
Result Evaluator Module is used to evaluate the performance. The Contract Message Creator
Module is used to generate the contract between the selected pair of S-Agents. The Agent
Pair/Contract Association collects the output (result) of the BCCM phases by associ-
ating each contract message with its corresponding pair of S-Agents and stores it in the
agent pair/contract library.

4.1 The Proposed Adaptive Bidding Protocol (ABP)


A bid represents an offer to execute a specific diagnosis based on its announcement.
In the BCCM context, the bid is defined as an indicator of the capability of an S-Agent to
perform a specific diagnosis. Generally, agents' bids in any MAS are influenced by
different factors depending on the auction situation in which they compete. Thus,
an Adaptive Bidding Protocol (ABP) is proposed for the bidding and selecting phases
of the BCCM, as shown in Fig. 3.


Fig. 3. The ABP protocol used in the bidding phase of BCCM

In the bidding phase, ABP is used to generate the S-Agent's bid in terms of its
capabilities and behaviours and in terms of the desired diagnosis's demands in the
task-allocation auction. In the selection phase, ABP is used to determine
which factor has the deepest impact on selecting the pair of S-Agents with the highest bid.
The J-Agents announce the diagnoses that need to be allocated on the blackboard, and the
S-Agents check the blackboard for the announced tasks. The ABP protocol is then
used by the S-Agents in the bidding phase of BCCM, as shown in Fig. 3, to generate
the bids. This ABP protocol mainly focuses on the factors that describe the
capability of any S-Agent and on the factors that describe the tasks. A mapping is performed
after each S-Agent checks the announced task's demands and its own capabilities.
From this mapping, U1 is obtained, which represents the factors describing the S-Agent's
capabilities and behaviours and the announced task's demands.
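
The exact bid formula of ABP is not given in the text (only that a bid is derived from mapping the task's demands onto the agent's capabilities and behaviours, and that assignments are screened by the minimum of their strength and certainty), so the sketch below is a hypothetical reading of that description with invented field names.

```python
from dataclasses import dataclass

@dataclass
class TaskDemand:
    required: dict       # demanded level per skill, e.g. {"radiology": 0.8}

@dataclass
class SAgentProfile:
    capabilities: dict   # capability level per skill, in [0, 1]
    behaviour: float     # behaviour score from past performance, in [0, 1]

def formulate_bid(agent: SAgentProfile, task: TaskDemand, threshold: float = 0.5):
    """Return a bid for the announced task, or None when the agent cannot satisfy it
    (mirroring the 'can satisfy / cannot satisfy' branch of ABP)."""
    for skill, demand in task.required.items():
        if agent.capabilities.get(skill, 0.0) < demand:
            return None                              # a demand is not covered: no bid
    strength = sum(agent.capabilities.get(s, 0.0) for s in task.required) / len(task.required)
    certainty = agent.behaviour
    if min(strength, certainty) < threshold:         # screen by min(strength, certainty)
        return None
    return strength * certainty                      # bid forwarded to the selecting phase

if __name__ == "__main__":
    task = TaskDemand({"radiology": 0.8, "virology": 0.6})
    agent = SAgentProfile({"radiology": 0.9, "virology": 0.7}, behaviour=0.8)
    print(formulate_bid(agent, task))   # 0.8 * 0.8 = 0.64 in this example
```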

5 The Experimental Evaluation

A number of experiments are used to validate the performance of the proposed com-
petition module in BCCM. Simply speaking, the problem discussed here is an opti-
mization problem of agent competition for the diagnosis process. The objective is to
maximize the utility, reduce the cost by shortening the processing time for allocating the
diagnosis to the competing agents, and reduce the failure rate. In each experiment, the
agents' bids are generated with random utility. For each bid, the sum of all the agents'
utilities is calculated to find the pair of bids that maximizes the total utility of the agents.
There are three experiments in this stage. The number of agents ranges from 100 to 600,
the number of diagnosis tasks from 1 to 15, and each agent is limited to 10 bids.
The code was implemented using .NET technology (Visual Studio .NET 2019)
and run on an Intel Core i5-8250U processor with 8 GB of RAM.
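
As a rough sketch of this experimental setup (random bid utilities per agent, then selecting the pair of agents whose bids maximize the summed utility), under the simplifying assumption that a bid's utility is just a random number in [0, 1]:

```python
import itertools
import random

def run_allocation(num_agents=100, num_tasks=15, bids_per_agent=10, seed=0):
    """For each diagnosis task, draw random bid utilities and pick the pair of
    distinct agents whose best bids maximize the summed utility."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_tasks):
        # each agent's best bid out of its (at most) 10 random bids
        best_bid = [max(rng.random() for _ in range(bids_per_agent))
                    for _ in range(num_agents)]
        pair = max(itertools.combinations(range(num_agents), 2),
                   key=lambda p: best_bid[p[0]] + best_bid[p[1]])
        results.append((pair, best_bid[pair[0]] + best_bid[pair[1]]))
    return results

if __name__ == "__main__":
    for task_id, (pair, utility) in enumerate(run_allocation(num_agents=100), start=1):
        print(f"task {task_id:2d}: agents {pair}, total utility {utility:.3f}")
```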
Experiment One: In this experiment, the optimal utility of the proposed BCCM is
measured with different numbers of agents. The numbers of agents per task are taken
as 100, 200, 300, 400, 500 and 600. The results of implementing BCCM are shown in
Fig. 4.


Fig. 4. The optimal utility of the proposed BCCM

This figure shows that as the number of agents per task increases, better optimality
rates are obtained. In addition, the proposed BCCM achieves much higher
(near-optimal) results when compared with recent models, SCM [20] and DSS
[16]; it can secure over 98% of the optimal utility, as shown in Fig. 5.

Fig. 5. The optimal utility of BCCM compared to recent models DSS [16] and SCM [20]

This means that the utility of BCCM increases as the number of diagnosis tasks increases, due to
the rewards that each diagnosis grants to the S-Agents that attempt to allocate or bid
for winning the task; this leads to an increase in the utility of these agents.
Experiment Two: In this experiment, the cost needed by BCCM is measured. Fig-
ure 6 shows the computation time needed by BCCM, DSS [16] and AIS [12] with 100
agents. In the proposed BCCM, as expected, the computational cost grows rapidly as
the size of the contract constraints (the problem size) increases. The cost of BCCM is
slightly higher than that of DSS [16] and AIS [12]; however, BCCM gives the highest
utility and the lowest failure rate when compared with them.

Fig. 6. The CPU time for BCCM compared to recent models DSS [16] AIS [12]

In BCCM, the agents have the benefit of generating the bids by themselves, finding
the winning combinations of S-Agents, and assigning contracts to each pair of winning
S-Agents; this prevents any conflict of interest that might occur between
mediators and also guarantees the autonomy and independence of the agents.
Experiment Three: The failure rate of BCCM is measured and compared to the rates
of the recent models DSS [16] and AIS [12], as shown in Fig. 7. BCCM yields a lower
failure rate than DSS and AIS. Note that DSS depends on a mediator for bid
generation and AIS has a limited number of bids, so an agent can cover only a narrow
portion of its utility space with its own bids. As a result, there is a risk of not finding an
overlap between the bids of the negotiating agents, which increases the failure rate.

Fig. 7. The failure rate of BCCM compared to recent models DSS [16] AIS [12]

6 Conclusions and Future Work

The agent coordination module in MACM is responsible for giving agents the ability to
coordinate with other agents. This coordination can take the form of cooperation or
competition with other agents. The cooperation module in MACM is used for coordination
among collaborative agents, whereas the competition module is used for coordination among
selfish or competitive agents. The negotiation module in MACM is used to resolve
conflicts that might occur when agents compete for the assignment of a specific task. The main
contribution of this paper is the simulation, in MACM, of the behaviour of two types of
medical decision makers (senior and junior agents) during medical diagnosis. These
agents may have conflicting decisions during the diagnosis; for this reason, a Bidding
Contract Competition Module (BCCM) is proposed to handle the bidding and con-
tracting between agents using an Adaptive Bidding Protocol (ABP). Finally, a number
of experiments are performed to validate the effectiveness of the proposed modules
and their associated algorithms, through comparative studies between the results
obtained from these modules and those obtained from recent models. The preliminary
results demonstrate the efficiency of BCCM and its associated algorithms. As
future work, we intend to handle the unknown dependencies and relations
between the available agents, since these dependencies lead to a complex cooperation
process.

References
1. O’Donoghue, J., Herbert, J.: Data management within mhealth environments: patient
sensors, mobile devices, and databases. J. Data Inf. Qual. 4, 5:1–5:20 (2012). https://doi.org/
10.1145/2378016.2378021
2. Hassan, M.K., El Desouky, A.I., Elghamrawy, S.M., Sarhan, A.M.: Big data challenges and
opportunities in healthcare informatics and smart hospitals. In: Security in Smart Cities:
Models, Applications, and Challenges 2019, pp. 3–26. Springer, Cham (2019)
3. Almurshidi, S.H., Abu-Naser, S.S.: Expert System for Diagnosing Breast Cancer. Al-Azhar
University, Gaza, Palestine (2018)
4. ELGhamrawy, S.M.: Diagnosis and prediction model for COVID19 patients response to
treatment based on convolutional neural networks and whale optimization algorithm using
CT images. medRxiv, 1 January 2020
5. Shaju, S., Davis, D., Reshma, K.R.: A survey on computer aided techniques for diagnosing
Alzheimer disease. In: 2016 International Conference on Circuit, Power and Computing
Technologies (ICCPCT), 18 March 2016, pp. 1–4. IEEE (2016)
6. Ferber, J.: Multi-Agent Systems: An Introduction to Distributed Artificial Intelligence.
Addison-Wesley, Reading (1999)
7. Tweedale, J., Ichalkaranje, N., Sioutis, C., Jarvis, B., Consoli, A., Phillips-Wren, G.:
Innovations in multi-agent systems. J. Netw. Comput. Appl. 30(3), 1089–1115 (2007)
8. Bosse, T., Jonker, C.M., Van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and
verification of dynamics in agent models. Int. J. Coop. Inf. Syst. 18(01), 167–193 (2009)
9. Gorbalenya, A.E.: Severe acute respiratory syndrome-related coronavirus – the species and its
viruses, a statement of the Coronavirus Study Group. BioRxiv (2020)
10. El-Ghamrawy, S.M., Eldesouky, A.I.: An agent decision support module based on granular
rough model. Int. J. Inf. Technol. Decis. Making 11(04), 793–820 (2012)
11. Li, M., Huang, F.: Formal describing the organizations in the pervasive healthcare
information system: multi-agent system perspective. In: ICARM 2016 – 2016 International
Conference on Advanced Robotics and Mechatronics, pp. 524–529 (2016). https://doi.org/
10.1109/icarm.2016.7606975
12. Rocha, D., Lima-Monteiro, P., Parreira-Rocha, M., Barata, J.: Artificial immune systems
based multi-agent architecture to perform distributed diagnosis. J. Intell. Manuf. 30(4),
2025–2037 (2019). https://doi.org/10.1007/s10845-017-1370-y
13. Yan, C., Lindgren, H., Nieves, J.C.: A dialogue-based approach for dealing with uncertain
and conflicting information in medical diagnosis. Auton. Agents Multi-Agent Syst. 32(6),
861–885 (2018)
14. Allioui, H., Sadgal, M., El Faziki, A.: Alzheimer detection based on multi-agent systems: an
intelligent image processing environment. In: International Conference on Advanced
Intelligent Systems for Sustainable Development, pp. 314–326. Springer, Cham, July 2018
15. Nachabe, L., El Hassan, B., Taleb, J.: Semantic multi agent architecture for chronic disease
monitoring and management. In: International Conference on Emerging Internetworking,
Data & Web Technologies, pp. 284–294. Springer, Cham, February 2019

16. Gao, J., Wong, T., Wang, C.: Coordinating patient preferences through automated
negotiation: a multiagent systems model for diagnostic services scheduling. Adv. Eng.
Inform. 42, 100934 (2019)
17. Ahmadi-Javid, A., Jalali, Z., Klassen, K.J.: Outpatient appointment systems in healthcare: a
review of optimization studies. Eur. J. Oper. Res. 258(1), 3–34 (2017)
18. Rodrigues Pires De Mello, R., Angelo Gelaim, T., Azambuja Silveira, R.: Negotiation
strategies in multi-Agent systems for meeting scheduling. In: Proceeding of 2018 44th Latin
American Computing Conference. CLEI 2018, pp. 242–250 (2018). https://doi.org/10.1109/
clei.2018.00037
19. Jemal, H., Kechaou, Z., Ben Ayed, M.: Multi-agent based intuitionistic fuzzy logic
healthcare decision support system. J. Intell. Fuzzy Syst. 37(2), 2697–2712 (2019). https://
doi.org/10.3233/jifs-182926
20. Chen, C., Xu, C.: A negotiation optimization strategy of collaborative procurement with
supply chain based on multi-agent system. Math. Probl. Eng. 2018 (2018). https://doi.org/10.
1155/2018/4653648
21. Khalifa, N.E.M., Taha, M.H.N., Hassanien, A.E., Elghamrawy, S.: Detection of coronavirus
(COVID-19) associated pneumonia based on generative adversarial networks and a fine-
tuned deep transfer learning model using chest X-ray dataset. arXiv preprint (2020). arXiv:
2004.01184
22. Calvaresi, D., Schumacher, M., Calbimonte, J.P.: Agent-based modeling for ontology-driven
analysis of patient trajectories. J. Med. Syst. 44(9), 1–11 (2020)
23. El-Ghamrawy, S.M., El-Desouky, A.I.: Distributed multi-agent communication system
based on dynamic ontology mapping. International J. Commun. Netw. Distrib. Syst. 10(1),
1–24 (2013)
24. Elghamrawy, S.M., Eldesouky, A.I., Saleh, A.I.: Implementing a dynamic ontology mapping
approach in multiplatform communication module for distributed multi-agent system. Int.
J. Innovative Comput. Inf. Control 8(7), 4547–4564 (2012)
Protection of Patients’ Data Privacy
by Tamper Detection and Localization
in Watermarked Medical Images

Alaa H. ElSaadawy1(&), Ahmed S. ELSayed1, M. N. Al-Berry1,


and Mohamed Roushdy2
1
Faculty of Computer and Information Sciences, Ain Shams University,
Cairo, Egypt
alaa_elsaadawy@cis.asu.edu.eg
2
Faculty of Computers and Information Technology,
Future University in Egypt, New Cairo City, Egypt

Abstract. Sharing medical documents among different specialists in different
hospitals has become popular as a result of modern communication technolo-
gies. Accordingly, protecting patients' data and authenticity against any unau-
thorized access or modification is a must. Watermarking is one of the
solutions for protecting patients' information against signal processing or
geometric attacks. This paper uses a tamper detection and localization water-
marking technique. It embeds a Quick Response (QR) code generated from the
patient's information into a medical image. In the extraction step, it can detect
whether the watermarked image has been attacked or not. The proposed approach detects
signal processing and geometric attacks and localizes the tampering resulting from text
addition, content removal and copy-and-paste attacks.

Keywords: Tamper detection · Tamper localization · Medical imaging ·
Watermark · QR code

1 Introduction

The use of shared medical images in services like telemedicine, telediagnosis, and
teleconsultation has been facilitated by the availability of computer networks.
Sharing patient information among specialists in different hospitals is a must to
understand diseases and avoid misdiagnosis [1–3]. One of the available techniques
for protecting medical images against corruption or unauthorized access while they are
transferred over the internet is watermarking [4].
Hiding the patient's data in the medical image without distorting the image during
transmission is essential to ensure the confidentiality of the transmitted data. Recovering
the hidden data and the original medical image without errors is the priority in Elec-
tronic Patient Record (EPR) data hiding [5, 6]. Since any modification of
medical images may lead to misdiagnosis, authenticity, which ensures that the source is
valid and belongs to the right patient, and integrity control, which checks that the image
has not been tampered with, are the major purposes of medical image watermarking [7–9].


Watermarking techniques can be categorized using three different criteria (each
criterion splits the methods into different categories), namely the working domain,
human perception and reversibility. From the working-domain perspective, water-
marking techniques can be classified into transform-domain and spatial-domain techniques.
In the spatial domain, the values of the pixels are modified directly to change the colour and
intensity of the pixels, i.e., information is added by modifying the pixel values [10]. Least
Significant Bit (LSB), additive watermarking and text mapping are examples of
spatial-domain techniques. Spatial-domain techniques are simple and have low com-
putational complexity [11]. Working in the transform domain is more complex than in the
spatial domain, but it is more robust against attacks [12]. It depends on transforming the
medical image into another domain before embedding the watermark, which can be done
using the Discrete Wavelet Transform (DWT) [13], the Discrete Cosine Transform
(DCT) [14] or Singular Value Decomposition (SVD) [15].
Watermarking techniques can also be divided, with respect to reversibility, into reversible
and irreversible techniques. Reversible techniques ensure the recovery of both the
medical image and the embedded bits without any distortion [16], while irreversible
techniques allow recovering the embedded bits only [17].
On the other hand, watermarking techniques can be classified based on human
perception into two classes: visible watermarks, like logos, and invisible watermarks
that can be used in authentication and integrity applications [17, 18]. Invisible
watermarking methods can be divided into three groups: robust, fragile, and semi-
fragile. For copyright protection, robust watermarking techniques have been used; as
they are robust against multiple attacks [17]. Fragile techniques are used mainly in
authentication; as they are sensitive to any linear or nonlinear modification [19, 20].
Finally, semi-fragile methods are used for fuzzy authentications; as they combine the
advantages of both robust and fragile techniques [21, 22].
Attacks are one of the most popular challenges of watermarking techniques. The
two common attacks are signal processing attacks (like image compression, adding
noise and different filters) and geometric attacks (such as rotation, translation and
scaling) [23]. In this paper, a fragile, spatial domain watermarking technique is used to
detect and localize tampers due to different attacks.
The rest of the paper is organized as follows: Sect. 2 presents the literature review,
Sect. 3 explains the proposed technique, Sect. 4 shows the experimental results and
finally, Sect. 5 contains the conclusions and future work.

2 Literature Review

Authentication, integrity and data hiding are priorities in watermarking techniques [24].
Protecting transmitted medical documents against attacks and detecting any tampering
with the medical images have become the common objectives of watermarking techniques.
A brief literature review of watermarking techniques for medical images is presented in
this section.
Y. AL-Nabhani et al. [25] developed a blind, invisible and robust watermarking
technique against some signal processing attacks, such as Gaussian noise and the median
filter, and other attacks like JPEG compression, rotation and cropping.
The proposed technique used the wavelet domain to embed the watermark: in embed-
ding, the watermark was inserted in the middle-frequency coefficient block of three DWT
levels, while in extraction, a Probabilistic Neural Network (PNN) was used. The
technique was able to extract the watermark in all cases but, depending on the
attack, the quality of some extracted watermarks was poor.
A. Sharma et al. [26] evaluated their proposed method on Magnetic Reso-
nance Imaging (MRI), Computed Tomography (CT) scan and ultrasound images
[27] of size 512 × 512 with a watermark of size 256 × 256. The method was
evaluated against salt & pepper noise, Gaussian noise, the low-pass filter, histogram
equalization and speckle noise from the signal processing attacks, and against rotation,
JPEG compression and cropping from the geometric attacks. Their method decomposes the
medical image into a Region of Interest (ROI) and a Non-Region of Interest (NROI) using
a second-level DWT, then embeds the hashed watermark image in the ROI and encrypts
the EPR and embeds it in the NROI. Normalized Correlation (NC) and Bit Error Rate
(BER) were used for evaluating their results.
In [12], L. Laouamer et al. presented a tamper detection and localization approach
robust against attacks such as compression, added noise, rotation, cropping and the median
filter. The Peak Signal-to-Noise Ratio (PSNR) was used to measure the robustness of
the approach on eight grayscale images of size 255 × 255 with a watermark
of size 85 × 85. The presented approach is semi-blind; it can detect the tampered
blocks by extracting the attacked watermark and comparing it with the original one.
In the spatial domain, a robust watermarking technique was presented by M.
E. Moghaddam et al. [28]. Their approach is based on changing the least
significant colour of the 5 × 5 neighbourhood of a certain location, which is selected using
the Imperialistic Competition Algorithm (ICA). PSNR was used to evaluate the results;
the PSNR changed after applying some attacks, especially JPEG compression, which
indicates that the extracted watermark is far from the original.
A blind watermarking technique was proposed by R. Thanki et al. [4]. The pro-
posed scheme is robust against geometric and signal processing attacks. After applying a
Discrete Cosine Transform (DCT), the DCT is applied on High-Frequency (HF) blocks of
size 8 × 8 pixels; then a White Gaussian Noise (WGN) sequence is used to modify the
mid-band frequency DCT coefficients of each DCT block. The correlation properties of the
WGN sequence are used for watermark extraction.
Tamper detection and localization approaches were developed by S. Gull et al. [24].
The proposed approaches are robust against multiple signal processing and geometric
attacks and detect tampering caused by text addition, copy-and-paste and content
removal attacks. The approach applies the LSB algorithm to image blocks of size
4 × 4 pixels and was tested on some medical images of size 256 × 256 using PSNR and BER.
After analyzing the literature, we have noticed that some techniques are
robust against signal processing and geometric attacks, while others provide tamper
detection and localization for attacks like text addition, content removal and copy
and paste. All the discussed techniques used greyscale images as the watermark. In [29],
we presented a medical image watermarking technique that generates a QR code
containing the patient's data and embeds it in the medical image. In this paper, we
extend it to tamper detection and localization: we detect signal processing and geometric
attacks in addition to text addition, content removal and copy-and-paste attacks [24].

3 Proposed Tamper Detection Watermarking Technique

3.1 Embedding Technique


The used method [29], which is based on the method in [24] and shown in Fig. 1, splits the
input medical image into blocks of size 4 × 4 pixels (B). It then sets the last two least
significant bits to zero and computes the mean of the block (M), which is embedded into
the upper half of the block (B). In addition, the proposed method takes the QR code as a
watermark, encrypts it by XORing it with the mean (M), and then embeds it in the lower
half of the block (B).

Fig. 1. Embedding technique
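
A compact NumPy sketch of this embedding step is given below. It assumes an 8-bit grayscale image whose dimensions are multiples of 4 and a binary QR matrix with one bit per 4 × 4 block; the exact bit placement inside each half-block is an assumption, since the paper does not spell it out.

```python
import numpy as np

def embed(image: np.ndarray, qr_bits: np.ndarray) -> np.ndarray:
    """Embed one QR bit per 4x4 block of an 8-bit grayscale image.

    Assumed bit layout (illustrative only):
      - the two lowest bits of every pixel are cleared,
      - the 8 bits of the block mean M go into the LSBs of the 8 upper-half pixels,
      - the QR bit, encrypted as bit XOR (M & 1), goes into the LSBs of the
        8 lower-half pixels.
    """
    out = (image.astype(np.uint8) & 0b11111100).copy()     # clear the last two LSBs
    h, w = out.shape
    for bi, r in enumerate(range(0, h, 4)):
        for bj, c in enumerate(range(0, w, 4)):
            block = out[r:r + 4, c:c + 4]                   # view into `out`
            mean = int(block.mean())                        # M of the zeroed block
            for k in range(8):                              # M -> upper-half LSBs
                i, j = divmod(k, 4)
                block[i, j] |= (mean >> k) & 1
            enc_bit = int(qr_bits[bi, bj]) ^ (mean & 1)     # encrypt the QR bit with M
            block[2:, :] |= enc_bit                         # bit -> lower-half LSBs
    return out

# Example usage with random data (sizes must be multiples of 4):
# img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
# qr = np.random.randint(0, 2, (16, 16), dtype=np.uint8)
# watermarked = embed(img, qr)
```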

3.2 Extraction Technique


The main target of the proposed extraction technique is to detect whether the encrypted
medical image has been attacked or not and to extract the QR code.
As shown in Fig. 2, the proposed method splits the encrypted medical image into
4 × 4 pixel blocks (B), then computes the mean of the block (B) after setting the least
two significant bits to zero (M). The QR code bit is extracted from the lower half of the
block (B), and the extracted bit is then decrypted by XORing it with the mean (M). To apply
tamper detection, the original mean (Mb) is extracted from the upper half of the block
(B) and XORed with the computed mean (M). If the result is zero, the encrypted
medical image has not been attacked; otherwise, there is an attack on the image.

Fig. 2. Extraction technique
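
Continuing the same assumed layout as in the embedding sketch, a matching extraction and tamper-check sketch could look like this (again, only an illustration of the described procedure, not the authors' exact code):

```python
import numpy as np

def extract(watermarked: np.ndarray):
    """Recover the QR bits and flag tampered 4x4 blocks.

    Returns (qr_bits, tampered): tampered[bi, bj] is True when the mean stored
    in the upper half no longer matches the mean recomputed from the block,
    i.e. when the XOR of the two is non-zero, as in the described check.
    """
    img = watermarked.astype(np.uint8)
    h, w = img.shape
    qr_bits = np.zeros((h // 4, w // 4), dtype=np.uint8)
    tampered = np.zeros((h // 4, w // 4), dtype=bool)
    for bi, r in enumerate(range(0, h, 4)):
        for bj, c in enumerate(range(0, w, 4)):
            block = img[r:r + 4, c:c + 4]
            recomputed = int((block & 0b11111100).mean())         # M, recomputed as in embedding
            stored = sum((int(block[k // 4, k % 4]) & 1) << k     # M read back from the upper half
                         for k in range(8))
            enc_bit = int(block[2, 0]) & 1                        # lower-half pixels carry the QR bit
            qr_bits[bi, bj] = enc_bit ^ (stored & 1)              # decrypt with the stored mean
            tampered[bi, bj] = (stored ^ recomputed) != 0         # non-zero XOR => attacked block
    return qr_bits, tampered
```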

3.3 Attacks and Tampers


The proposed method has been evaluated against multiple signal (Gaussian noise, salt
and pepper noise, median filter, histogram equalization, sharpening and low pass filter,
JPEG), geometric attacks (resize, rotation and crop) and tamper localization (content
removal, copy and paste and text addition).
a. Singal attacks: For salt and pepper attack we have used Matlab function with noise
density 0.05. While the median filter attack using the default 3-by-3 neighbourhood.
In Gaussian noise attack, we add white noise of mean zero and variance 0.01 to the
image.
b. Geometric attacks: in the rotation attack, we rotate the image by 30° clockwise, while
in the resize attack the image is resized up to double the size of the original image. For
the crop attack, a fixed block of 50 × 50 pixels is cropped from the image.
c. Tamper localization: in content removal, we remove a block of size ~50–150 pixels,
while in copy and paste we take a 50-pixel part and paste it at another position in the
image. Finally, in text addition we add a single four-character word, "text", with a font
size of ~30–60.
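
The attacks themselves were applied in MATLAB; purely as a reference point, here is a rough NumPy re-implementation of the two noise attacks from item (a). The density and variance values are copied from the text; everything else (function names, seeding, scaling) is assumed.

```python
import numpy as np

def salt_and_pepper(image: np.ndarray, density: float = 0.05, seed: int = 0) -> np.ndarray:
    """Flip a 'density' fraction of pixels to pure black or white."""
    rng = np.random.default_rng(seed)
    noisy = image.copy()
    mask = rng.random(image.shape)
    noisy[mask < density / 2] = 0                              # pepper
    noisy[(mask >= density / 2) & (mask < density)] = 255      # salt
    return noisy

def gaussian_noise(image: np.ndarray, variance: float = 0.01, seed: int = 0) -> np.ndarray:
    """Add zero-mean white Gaussian noise with the given variance (image scaled to [0, 1])."""
    rng = np.random.default_rng(seed)
    scaled = image.astype(np.float64) / 255.0
    noisy = scaled + rng.normal(0.0, np.sqrt(variance), size=image.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)
```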

4 Results and Discussion

The proposed method has been tested on 138 grey medical images taken from OPENi
[30] medical images database as shown in Fig. 3, and the QR code is generated using
patients' data. It has been tested on different sizes of the medical images and QR codes
as follows: (64 × 64, 128 × 128, 256 × 256, 512 × 512, 1024 × 1024, 2048 × 2048
and 4096 × 4096) for medical images and (16 × 16, 32 × 32, 64 × 64, 128 × 128,
256 × 256, 512 × 512 and 1024 × 1024) for QR codes.

Fig. 3. Sample of dataset



We have evaluated the presented scheme for tamper detection using the Bit Error
Rate (BER). BER is the percentage of bits with an error relative to the total number of
bits [31], as shown in Eq. 1:

BER = NE / NB    (1)

where NE is the number of bits with an error and NB is the total number of bits. When the
BER is not equal to zero, it indicates that the image has been attacked.
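
Equation (1) translates directly into code; the sketch below compares the extracted QR bits against the original bit matrix, both assumed to be NumPy arrays of 0/1 values.

```python
import numpy as np

def bit_error_rate(original_bits: np.ndarray, extracted_bits: np.ndarray) -> float:
    """Eq. (1): BER = NE / NB, the fraction of extracted bits that differ."""
    errors = np.count_nonzero(original_bits != extracted_bits)   # NE
    return errors / original_bits.size                           # divided by NB

# Example: a BER of 0 means the extracted watermark matches the original exactly
# original = qr_bits; extracted, _ = extract(attacked_image)
# print(bit_error_rate(original, extracted))
```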

4.1 Tamper Detection


The results of tamper detection for the attacks tested on the sample image in Fig. 4 are presented
in Table 1 and Table 2; the values represent the percentage of tampered blocks in
the attacked image. The QR codes extracted after different attacks on the sample image
with various sizes are presented in Fig. 5. After analyzing the QR codes extracted after the
attacks, it is observed that a visually acceptable QR code cannot be extracted, so the
acceptance of the extracted QR code was tested using BER. The average BER of each attack
on the 138 medical images is recorded in Table 3 and Table 4. After analyzing the
results, we realized that the BER value for most of the attacks exceeds 80%,
which indicates that the extracted watermark differs from the original watermark.
For the crop attack, the BER decreases as the image size increases; this is because we crop a
block of the same size for all image sizes. Regarding sharpening and the low-pass filter,
we apply the same filter for all sizes, so it affects small images more than large ones.
Finally, the BER of the salt-and-pepper attack is the same for all sizes; this is because we add the
same ratio of salt-and-pepper noise, so it affects all of them with the same percentage. Also, we
add a small ratio (0.05), so the BER for this attack is the smallest, around 50%.
For the cropping attack, the proposed method was tested using a fixed cropping
block of size 100 × 100 pixels, so the extracted watermark is the same as the
original one except for the cropped block. As the image size increases, the BER for the
cropping attack decreases.

Fig. 4. Tested medical image


Fig. 5. Results of attacks on image (a) Signal attacks (b) Geometric attacks

Table 1. Tamper Detection results for Signal attacks on a sample image


Size of QR code | Gaussian noise | Salt & pepper noise | Median filter | Histogram equalization | Sharpening | Low pass filter | JPEG
16 × 16 | 0.996 | 0.511 | 0.921 | 1.0 | 0.953 | 1.0 | 0.996
32 × 32 | 0.996 | 0.52 | 0.873 | 0.999 | 0.896 | 0.993 | 0.966
64 × 64 | 0.993 | 0.523 | 0.874 | 0.998 | 0.879 | 0.997 | 0.947
128 × 128 | 0.993 | 0.533 | 0.898 | 0.999 | 0.859 | 0.989 | 0.938
256 × 256 | 0.994 | 0.53 | 0.91 | 0.999 | 0.846 | 0.995 | 0.935
512 × 512 | 0.994 | 0.527 | 0.911 | 0.999 | 0.836 | 0.996 | 0.93
1024 × 1024 | 0.994 | 0.528 | 0.908 | 0.999 | 0.815 | 0.996 | 0.927

Table 2. Tamper Detection results for Geometric attacks on a sample image


Size of QR code | Resize | Rotation (30°) | Crop
16 × 16 | 0.99 | 0.882 | 0.027
32 × 32 | 0.971 | 0.854 | 0.004
64 × 64 | 0.967 | 0.846 | 0.004
128 × 128 | 0.965 | 0.843 | 0.001
256 × 256 | 0.963 | 0.84 | 0.001
512 × 512 | 0.961 | 0.839 | 0.0003
1024 × 1024 | 0.961 | 0.839 | 0.0002

Table 3. Average BER results for Signal Attack on 138 medical images
Size of QR code | Gaussian noise | Salt & pepper noise | Median filter | Histogram equalization | Sharpening | Low pass filter | JPEG
16 × 16 | 0.997 | 0.523 | 0.896 | 0.997 | 0.913 | 0.97 | 0.996
32 × 32 | 0.997 | 0.528 | 0.841 | 0.997 | 0.872 | 0.939 | 0.992
64 × 64 | 0.997 | 0.529 | 0.825 | 0.998 | 0.835 | 0.909 | 0.987
128 × 128 | 0.997 | 0.529 | 0.816 | 0.999 | 0.804 | 0.888 | 0.979
256 × 256 | 0.997 | 0.529 | 0.811 | 0.999 | 0.78 | 0.877 | 0.973
512 × 512 | 0.997 | 0.529 | 0.809 | 1 | 0.764 | 0.87 | 0.973
1024 × 1024 | 0.997 | 0.529 | 0.804 | 0.999 | 0.738 | 0.867 | 0.973

Table 4. Average BER results for geometric attacks on 138 medical images

Size of QR code | Resize | Rotation (30°) | Crop
16 × 16 | 0.981 | 0.997 | 0.66
32 × 32 | 0.972 | 0.997 | 0.628
64 × 64 | 0.968 | 0.995 | 0.148
128 × 128 | 0.963 | 0.991 | 0.037
256 × 256 | 0.96 | 0.987 | 0.01
512 × 512 | 0.958 | 0.986 | 0.002
1024 × 1024 | 0.957 | 0.986 | 0.001

4.2 Tamper Localization


Our scheme has also been tested for tamper localization, in addition to the previous attacks, using copy-paste, text addition and content removal. As shown in Fig. 6, the proposed scheme can detect tampering even in a small region. For text addition, the word "Text" was added in two fixed places of the watermarked medical image and was detected. For the copy-paste attack, a part of the watermarked medical image was copied and pasted elsewhere in the same image, and the technique again detected the added region. Finally, for content removal, a small block was removed from the watermarked medical image and refilled with random colours in the range of the surrounding blocks, and the scheme was also able to detect the removed region. The average BER of copy-paste, text addition and content removal over the 138 medical images is recorded in Table 5. The results show that the BER for all tampering over all images is greater than zero, which indicates that, in all cases and attacks, the tamper is detected regardless of its size and location.

Fig. 6. Results of tamper localization on sample images



Table 5. Average BER results for tamper localization attacks on 138 medical images

Size of QR code | Copy and paste | Text addition | Content removal
16 × 16 | 0.0623 | 0.659 | 0.125
32 × 32 | 0.062 | 0.61 | 0.046
64 × 64 | 0.0623 | 0.3511 | 0.056
128 × 128 | 0.062 | 0.038 | 0.056
256 × 256 | 0.061 | 0.06 | 0.018
512 × 512 | 0.06 | 0.015 | 0.0045
1024 × 1024 | 0.059 | 0.0073 | 0.0011

5 Conclusion

In this paper, a tamper detection and localization watermarking technique is proposed. The proposed method uses the EPR-generated QR code as a watermark image and a greyscale medical image as a host image. The host image is split into blocks of 4 × 4 pixels; the mean of each block is computed after setting the two least significant bits to zero and is embedded in the upper half of the block, while the watermark pixel is encrypted and embedded in the lower half of the block. To extract the QR code, the watermarked medical image is divided into 4 × 4 pixel blocks and the mean is computed as in the embedding step; the watermark is then extracted from the lower half of each block and the mean from the upper half. The medical image is flagged as attacked whenever the computed mean and the extracted one differ.
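The verification idea summarized above can be sketched as follows; this is only a simplified illustration under stated assumptions (4 × 4 blocks, block mean computed after clearing the two least significant bits, stored means compared against recomputed ones) and does not reproduce the actual embedding layout or the encryption of [29].

```python
import numpy as np

def block_means(image: np.ndarray, block: int = 4) -> np.ndarray:
    """Mean of each block x block tile after setting the two LSBs to zero."""
    h, w = image.shape
    cleared = image & ~np.uint8(0b11)                 # clear the two least significant bits
    tiles = cleared[:h - h % block, :w - w % block].reshape(
        h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))

def detect_tamper(received: np.ndarray, stored_means: np.ndarray, block: int = 4) -> np.ndarray:
    """Boolean map of blocks whose recomputed mean disagrees with the stored mean."""
    return ~np.isclose(block_means(received, block), stored_means)

# Illustrative use: in the real scheme the stored means would be recovered from the
# upper half of each watermarked block; here they are simply kept from the original.
img = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
means = block_means(img)
img[10:14, 10:14] = 255                               # simulate a tampered region
print(np.argwhere(detect_tamper(img, means)))         # indices of the flagged blocks
```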
The proposed method was tested on 138 medical images of various sizes against geometric and signal processing attacks, using BER as the measure. Tamper localization was tested on the same 138 images using text addition, content removal and copy-paste attacks. The BER results indicate that the difference between the extracted watermark and the original one is above approximately 80% for the geometric and signal attacks, except for the crop and salt-and-pepper attacks. For the crop attack, the cropped block is small relative to the watermark size. For low-pass filtering and sharpening, the BER decreases as the QR code size increases because the same filter is applied to all sizes, so its effect diminishes for larger images. For salt-and-pepper noise, the added noise is only about 0.05 of the medical image size, which keeps the BER low. The proposed method detected all the tested attacks for all sizes of medical images and QR codes; the tampering is detected and can be localized regardless of its size and location.
In future work, we aim to propose a technique that protects the transmitted image against attacks rather than only detecting them. This can be done by moving the transmitted image to the frequency domain.

References
1. Mousavi, S.M., Naghsh, A., Abu-Bakar, S.A.R.: Watermarking techniques used in medical
images: a survey. J. Digit. Imaging 27(6), 714–729 (2014). https://doi.org/10.1007/s10278-
014-9700-5
2. Kuang, L.Q., Zhang, Y., Han, X.: A Medical image authentication system based on
reversible digital watermarking. In: 2009 1st International Conference on Information
Science and Engineering (ICISE), pp. 1047–1050 (2009)
3. Bhatnagar, G., Jonathan, W.U.Q.M.: Biometrics inspired watermarking based on a fractional
dual tree complex wavelet transform. Future Gener. Comput. Syst. 29(1), 182–195 (2013)
4. Thanki, R., Borra, S., Dwivedi, V., Borisagar, K.: An efficient medical image watermarking
scheme based on FDCuT–DCT. Eng. Sci. Technol. Int. J. 20(4), 1366–1379 (2017)
5. Navas, K.A., Thampy, S.A., Sasikumar, M.: EPR hiding in medical images for telemedicine.
Int. J. Electron. Commun. Eng. 2(2), 223–226 (2008)
6. Munch, H., Engelmann, U., Schroter, A., Meinzer, H.P.: The integration of medical images
with the electronic patient record and their webbased distribution. Acad. Radiol. 11(6), 661–
668 (2004)
7. Rahman, A.U., Sultan, K., Musleh, D., Aldhafferi, N., Alqahtani, A., Mahmud, M.: Robust
and fragile medical image watermarking: a joint venture of coding and chaos theories.
J. Healthcare Eng. (2018)
8. Jabade, V.S., Gengaje, S.R.: Literature review of wavelet. Int. J. Comput. Appl. 31(1), 28–
35 (2011)
9. Adnan, W.W., Hitam, S., Abdul-Karim, S., Tamjis, M.R.: A review of image watermarking.
In: Proceedings of Student Conference on Research and Development, Swedan (2003)
10. Chandrakar, N., Bagga, J.: Performance comparison of digital image watermarking
techniques: a survey. Int. J. Comput. Appl. Technol. Res. 2(2), 126–130 (2013)
11. Saqib, M., Naaz, S.: Spatial and frequency domain digital image watermarking techniques
for copyright protection. Int. J. Eng. Sci. Technol. (IJEST) 9(6), 691–699 (2017)
12. Laouamer, L., AlShaikh, M., Nana, L., Pascu, A.C.: Robust watermarking scheme and
tamper detection based on threshold versus intensity. J. Innov. Digit. Ecosyst. 2(1–2), 1–12
(2015)
13. Ahmad, A., Sinha, G.R., Kashyap, N.: 3-level DWT image watermarking against frequency
and geometrical attacks. Int. J. Comput. Netw. Inf. Secur. 6(12), 58 (2014)
14. Zengzhen, M.: Image quality assessment in multiband DCT domain based on SSIM. Optik
Int. J. Light Electron Opt. 125(12), 6470–6473 (2014)
15. Benhocine, A., Laouamer, L., Nana, L., Pascu, A.C.: New images watermarking scheme
based on singular value decomposition. J. Inf. Hiding Multimed. Signal Process. 4(1), 9–18
(2013)
16. Kaur, M., Kaur, R.: Reversible watermarking of medical images authentication and
recovery-a survey. Inf. Oper. Manage. 3(1), 241–244 (2012)
17. Mousavi, S.M., Naghsh, A., Abu-Bakar, S.A.R.: Watermarking techniques used in medical
images: a survey. Digit. Imaging 27(6), 714–729 (2014)
18. Mohanty, S.P., Ramakrishnan, K.R.: A dual watermarking technique for images. In:
Proceedings of the 7th ACM International Multimedia, pp. 49–51 (1999)
19. Alomari, R.S., Al-aer, A.: A fragile watermarking algorithm for content authentication. Int.
J. Comput. Inf. Sci. 2(1), 27–37 (2004)
20. Zhao, Y.: Dual domain semi-fragile watermarking for image authentication (2003)
21. Yu, X., Wang, C., Zhou, X.: Review on semi-fragile watermarking algorithms for content
authentication of digital images. Future Internet 56(9), 1–17 (2017)

22. Lin, E.T., Podilchuk, C.I., Delp III, E.J.: Detection of image alterations using semifragile
watermarks. In: Proceedings of the SPIE—Security and Watermarking of Multimedia
Contents II, USA (2000)
23. Hosny, K.M., Darwish, M.M., Li, K., Salah, A.: Parallel multi-Core CPU and GPU for fast
and robust medical image watermarking. In: IEEE Access (2018)
24. Gull, S., Loan, N.A., Parah, S.A., Sheikh, J.A., Bhat, G.M.: An efficient watermarking
technique for tamper detection and localization of medical images. J. Ambient Intell.
Humaniz. Comput. 11(5), 1799–1808 (2018)
25. Yahya, A.N., Jalab, H.A., Wahid, A., Noor, R.M.: Robust watermarking algorithm for
digital images using discrete wavelet and probabilistic neural network. J. King Saud Univ. –
Comput. Inf. Sci. 27(4), 393–401 (2015)
26. Sharma, A., Singh, A.K., Ghrera, S.P.: Robust and secure multiple watermarking for medical
images. Wireless Pers. Commun. 92(4), 1611–1624 (2018)
27. Zhang, L., Zhou, P.P.: Localized affine transform resistant watermarking in region-of-
interest. Telecommun. Syst. 44(3), 205–220 (2010)
28. Moghaddam, M.E., Nemati, N.: A robust color image watermarking technique using
modified imperialist competitive algorithm. Forensic Sci. Int. 233(1), 193–200 (2013)
29. ElSaadawy, A.H., ELSayed, A.S., Al-Berry, M.N., Roushdy, M.: Reversible watermarking
for protecting patient’s data privacy using an EPR-generated QR code. In: AICV (2020)
30. OPENi Medical Image Database. https://openi.nlm.nih.gov/. Accessed 10 Sep 2019
31. Shimonski, R.J., Eaton, W., Khan, U., Gordienko, Y.: Exploring the sniffer pro interface. In:
Sniffer Pro Network Optimization and Troubleshooting Handbook, pp. 105–158 (2002)
Breast Cancer Classification
from Histopathological Images with Separable
Convolutional Neural Network and Parametric
Rectified Linear Unit

Heba Gaber1, Hatem Mohamed2, and Mina Ibrahim3

1 Department of Information Systems, Faculty of Computers and Information, Menoufia University, Menoufia, Egypt. hebanewg@gmail.com
2 Department of Information Systems, Faculty of Computers and Information, Menoufia University, Alexandria, Egypt. hatem6803@yahoo.com
3 Department of Information Systems, Faculty of Computers and Information, Menoufia University, Cairo, Egypt. mina.ibrahim@ci.menofia.edu.eg

Abstract. The convolutional neural network has achieved great success in the
classification of medical imaging including breast cancer classification. Breast
cancer is one of the most dangerous cancers impacting women all over the
world. In this paper, we propose a deep learning framework. This framework
includes the proposed pre-processing phase and the proposed separable con-
volutional neural network (SCNN) model. Our pre-processing uses patch
extraction and data augmentation to enrich the training set and improve the
performance. The SCNN model uses separable convolution and parametric
rectified linear unit (PRELU) as an activation function. The SCNN shows superior performance and is faster than the pre-trained neural network models.
The SCNN approach is evaluated using the BACH2018 dataset [1]. We test the
performance using 40 random images. The framework achieves accuracy
between 97.5% and 100%. The best accuracy is 100% for multi-class and binary
class. The framework provides superior classification performance compared to
existing approaches.

Keywords: Deep learning · SCNN · PRELU · Convolutional Neural Network (CNN)

1 Introduction

Breast cancer is one of the most difficult and dangerous diseases a person can face, and women are the most affected. According to a 2020 study by the American Cancer Society (ACS), in the USA the estimated number of deaths of women due to breast cancer is close to 42,170, about 276,480 new cases of invasive breast cancer will be diagnosed in women, and about 48,530 new cases of carcinoma in situ will be diagnosed [2]. Early and correct diagnosis helps in the

process of treatment and reduces the number of deaths. Pathologists play a very important role in the diagnosis process, which is performed manually; the manual process may lead to diagnostic errors, is stressful for pathologists, consumes a lot of time, and depends on the accuracy and clarity of the image [3]. Computer-aided detection (CAD) systems have been used to overcome the misdiagnosis problem. We present a framework that increases the performance of breast cancer classification and reduces the time spent in the diagnosis process. The framework is based on a convolutional neural network; our model relies on separable convolution, which requires fewer multiplications than traditional convolution and is therefore faster [4]. The framework is evaluated using the BACH 2018 dataset [1]. Accuracy, specificity, sensitivity, precision, and F1-score were used as evaluation metrics.
This paper is organized as follows: Sect. 2 reviews related work; Sect. 3 describes the proposed methods; Sect. 4 presents the materials used; Sect. 5 reports the results and discussion; Sect. 6 analyses and compares the results; Sect. 7 concludes and outlines future work.

2 Related Work

The traditional machine learning approaches including the support vector machine,
principal component analysis, and random forest encountered major shortcomings in
image classification. Therefore, researchers have tended to use deep learning. Nowa-
days, a deep learning approach provides a superior performance of the classification of
medical imaging with different modalities. Bayramoglu et al. [5] proposed a CNN model with convolution kernels of size 7 × 7, 5 × 5, and 3 × 3; they evaluated the model on the BreakHis dataset, performed patient-level classification, and reported 83.25% accuracy for binary classification. In another study, Spanhol et al. [6] proposed a model similar to AlexNet with different fusion techniques for image-level and patient-level classification of breast cancer; on the BreakHis dataset it reported 90% and 85.6% accuracy for image-level and patient-level classification, respectively. Araujo et al. [7] proposed a CNN-based approach to classify the BC Classification Challenge 2015 dataset and achieved approximately 77.8% accuracy for four classes and 83.3% for the binary case. Chennamsetty et al. [8] presented a multi-class classification of breast cancer from histopathological images using an ensemble of pre-trained neural networks (ResNet-101, DenseNet-161) on the BACH 2018 dataset; the ensemble achieved 97.5% accuracy on 40 randomly chosen test images and 87% on the 100 test images provided by the organizers, and won first place in the ICIAR 2018 Grand Challenge on Breast Cancer Histology Images. Kwok [9] proposed a method using a pre-trained model (Inception-ResNet-v2) for multi-class classification of breast cancer on the BACH 2018 dataset; different data augmentation methods and patch extraction were employed to improve accuracy, and the accuracy on the 100 test images provided by the organizers was 87%. This framework also placed first in the ICIAR 2018 Grand Challenge. In 2019, Alom et al. [10] proposed the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model for breast cancer classification, which combines the strengths of the Inception network (Inception-v4), the Residual network (ResNet) and the Recurrent Convolutional Neural Network (RCNN); tested on the BC Classification Challenge 2015 dataset, it achieved 99.05% and 98.59% testing accuracy for the binary and multi-class cases respectively.
In this paper, we present the framework used to classify the BACH 2018 dataset [1]. The framework is divided into two parts, as shown in Fig. 1. Part 1 includes the pre-processing stage and the proposed SCNN model; in Part 2 the trained model is used to predict the test set. Finally, we evaluated the performance of the framework using accuracy, precision, recall, F1-score and the confusion matrix [12]. The pre-processing stage consists of two parallel processes, patch extraction and data augmentation, after which the images are resized, as shown in Fig. 1.

Fig. 1. Overview of the framework.



3 Methods
3.1 Patch Extraction
This process was performed to enrich the training set with samples and improve the performance of the SCNN model. A sequential patch extraction was used in which the input image is cropped into patches. The patch size depends on the original size of the images in the dataset: in BACH 2018, patches were cropped from each image using a patch size of 1495 × 1495 pixels and a stride of 100 pixels [9]. The 400 histological images were cropped into 4000 patches, i.e. 10 patches per sample. Figure 2 shows the patch extraction process.

Fig. 2. Process of patch extraction applied on BACH 2018.
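A short sliding-window sketch of this sequential patch extraction is shown below; the patch size and stride follow the text, while the loop bounds and the exact cropping grid (which the paper does not fully specify) are assumptions.

```python
import numpy as np

def extract_patches(image: np.ndarray, patch: int = 1495, stride: int = 100):
    """Crop every full patch x patch window, stepping by `stride` pixels."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append(image[top:top + patch, left:left + patch])
    return patches

# A BACH 2018 image is 2048 x 1536 pixels with 3 colour channels; the authors
# report ten patches per image, whose exact grid is not reproduced here.
image = np.zeros((1536, 2048, 3), dtype=np.uint8)
print(len(extract_patches(image)))
```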

3.2 Data Augmentation


Data augmentation is used to increase the amount of training data using information from the training set only and to reduce overfitting on the test set. We applied different augmentation techniques, including vertical flipping, horizontal flipping, combined vertical and horizontal flipping, cropping, Gaussian blur with a random sigma between 0 and 0.5, additive Gaussian noise, average pooling, and affine transformations of each image (scale = 2.5, translate percent = 0.04 and rotate = 15%) [8, 10]. The augmented samples generated by this process were selected randomly from the ten different augmentation techniques. In the BACH 2018 dataset, the number of samples in the training set was 320; the total number of augmented samples generated was 3200, from which 1280 samples were selected randomly.
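One possible realization of these augmentations is sketched below with the imgaug library; the paper does not name the library it used, so the library choice, the crop ratio and the noise scale are assumptions, while the remaining parameters follow the text.

```python
import numpy as np
import imgaug.augmenters as iaa

# Each image receives one randomly chosen augmentation, mirroring the random
# selection described in the text.
augmenter = iaa.OneOf([
    iaa.Fliplr(1.0),                                        # horizontal flip
    iaa.Flipud(1.0),                                        # vertical flip
    iaa.Sequential([iaa.Fliplr(1.0), iaa.Flipud(1.0)]),     # both flips
    iaa.Crop(percent=(0, 0.1)),                             # crop (ratio assumed)
    iaa.GaussianBlur(sigma=(0.0, 0.5)),                     # random sigma in [0, 0.5]
    iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255)),       # noise scale assumed
    iaa.AveragePooling(2),                                  # average pooling
    iaa.Affine(scale=2.5, translate_percent=0.04, rotate=15),  # rotate read as degrees
])

images = np.random.randint(0, 256, size=(8, 299, 299, 3), dtype=np.uint8)
augmented = augmenter(images=images)
print(augmented.shape)
```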

3.3 Resize Process


All samples generated by the data augmentation and patch extraction processes, together with the original training set, were resized. In the BACH 2018 dataset, the samples were resized to 299 × 299 pixels.

3.4 Proposed Separable Convolutional Neural Network (SCNN) Model


After the pre-processing, the training set is ready to be passed through the model, as illustrated in Fig. 1. The proposed model is a convolutional neural network that relies on separable convolution (SeparableConv2D) units and the parametric rectified linear unit (PRELU) as the activation function. Separable convolution uses depthwise spatial convolution [4], i.e. a depthwise convolution followed by a pointwise convolution that mixes the resulting output channels. It is faster than traditional convolution because it uses one-dimensional filters and fewer multiplication operations, while giving better performance. PRELU is a non-linear activation function; it was evaluated in [11] and outperformed the other activation functions tested. The main advantage of the SCNN model is that it provides better performance with fewer network parameters than pre-trained neural networks.
The structure of the SCNN model is as follows. The model starts with a fully connected layer that receives the input image of size 48 × 48 × 3, followed by three blocks, which are described in the next paragraph. After the third block, two fully connected layers are applied, each followed by a dropout layer with a 50% dropout rate. Finally, a fully connected output layer with a softmax activation function is used for multi-class classification and a sigmoid function for binary classification. The model contains 39 layers with 4,844,132 network parameters. Table 1 lists the 39 layers together with the input, output and number of parameters of each layer.
The model consists of three blocks, each containing one or more sub-blocks. Each sub-block consists of a SeparableConv2D layer, an activation layer with the PRELU activation function and a batch normalization layer. The sub-blocks are followed by a max-pooling layer with a pooling size of 2 × 2 and a dropout layer with a 25% dropout rate, as shown in Fig. 3. The first block has one sub-block, whose SeparableConv2D layer uses 32 filters with a 3 × 3 kernel and 'same' padding. The second block has two sub-blocks, whose SeparableConv2D layers use 64 filters each, with a 3 × 3 kernel and 'same' padding. The third block has three sub-blocks, whose SeparableConv2D layers use 128 filters each, with a 3 × 3 kernel and 'same' padding.

Fig. 3. Diagram of SCNN model for two blocks.



Table 1. The 39 layer of the proposed SCNN model.


Layer number Layer name Input Output Parameters
1 dense_1 (Dense) (48, 48, 32) (48, 48, 32) 128
2 activation_1 (Activation = PRELU) (48, 48, 32) (48, 48, 32) 0
3 batch_normalization_1 (48, 48, 32) (48, 48, 32) 128
4 dropout_1 (Dropout = 0.25) (48, 48, 32) (48, 48, 32) 0
5 separable_conv2d_1(filters = 32) (48, 48, 32) (48, 48, 32) 1344
6 activation_2 (Activation = PRELU) (48, 48, 32) (48, 48, 32) 0
7 batch_normalization_2 (48, 48, 32) (48, 48, 32) 128
8 max_pooling2d_1(Pooling_size = (2,2)) (48, 48, 32) (24, 24, 32) 0
9 dropout_2 (Dropout = 0.25) (24, 24, 32) (24, 24, 32) 0
10 separable_conv2d_2(filters = 64) (24, 24, 32) (24, 24, 64) 2400
11 activation_3 (Activation = PRELU) (24, 24, 64) (24, 24, 64) 0
12 batch_normalization_3 (24, 24, 64) (24, 24, 64) 256
13 separable_conv2d_3(filters = 64) (24, 24, 64) (24, 24, 64) 4736
14 activation_4 (Activation = PRELU) (24, 24, 64) (24, 24, 64) 0
15 batch_normalization_4 (24, 24, 64) (24, 24, 64) 256
16 max_pooling2d_2(Pooling_size = (2,2)) (24, 24, 64) (12, 12, 64) 0
17 dropout_3 (Dropout = 0.25) (12, 12, 64) (12, 12, 64) 0
18 separable_conv2d_4(filters = 128) (12, 12, 64) (12, 12, 128) 8896
19 activation_5 (Activation = PRELU) (12, 12, 128) (12, 12, 128) 0
20 batch_normalization_5 (12, 12, 128) (12, 12, 128) 512
21 separable_conv2d_5(filters = 128) (12, 12, 128) (12, 12, 128) 17664
22 activation_6 (Activation = PRELU) (12, 12, 128) (12, 12, 128) 0
23 batch_normalization_6 (12, 12, 128) (12, 12, 128) 512
24 separable_conv2d_6(filters = 128) (12, 12, 128) (12, 12, 128) 17664
25 activation_7 (Activation = PRELU) (12, 12, 128) (12, 12, 128) 0
26 batch_normalization_7 (12, 12, 128) (12, 12, 128) 512
27 max_pooling2d_3(Pooling_size = (2,2)) (12, 12, 128) (6, 6, 128) 0
28 dropout_4 (Dropout = 0.25) (6, 6, 128) (6, 6, 128) 0
29 dense_2 (Dense) (6, 6, 128) (6, 6, 512) 66048
30 activation_8 (Activation = PRELU) (6, 6, 512) (6, 6, 512) 0
31 batch_normalization_8 (6, 6, 512) (6, 6, 512) 2048
32 dropout_5 (Dropout = 0.5) (6, 6, 512) (6, 6, 512) 0
33 flatten_1 (Flatten) (6, 6, 512) (18432) 0
34 dense_3 (Dense) (18432) (256) 4718848
35 activation_9 (Activation = PRELU) (256) (256) 0
36 batch_normalization_9 (256) (256) 1024
37 dropout_6 (Dropout = 0.5) (256) (256) 0
38 dense_4 (Dense) (256) (4)multi-class 1028
39 activation_10 (Activation = softmax) (4) (4) 0
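A condensed Keras sketch of the block structure described above and in Table 1 is given below (assuming TensorFlow/Keras). It follows the reported layer order; exact parameter counts will differ slightly from Table 1, for example because of the PReLU slope parameters, so it should be read as an approximation rather than the authors' exact implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters, n_sub_blocks):
    """Sub-blocks of SeparableConv2D + PReLU + BatchNorm, then MaxPool and Dropout."""
    for _ in range(n_sub_blocks):
        x = layers.SeparableConv2D(filters, (3, 3), padding="same")(x)
        x = layers.PReLU(shared_axes=[1, 2])(x)
        x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    return layers.Dropout(0.25)(x)

def build_scnn(num_classes=4):
    inputs = layers.Input(shape=(48, 48, 3))
    x = layers.Dense(32)(inputs)                  # dense_1 of Table 1
    x = layers.PReLU(shared_axes=[1, 2])(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.25)(x)
    x = conv_block(x, 32, 1)                      # block 1
    x = conv_block(x, 64, 2)                      # block 2
    x = conv_block(x, 128, 3)                     # block 3
    x = layers.Dense(512)(x)                      # dense_2
    x = layers.PReLU(shared_axes=[1, 2])(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256)(x)                      # dense_3
    x = layers.PReLU()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # sigmoid for binary
    return models.Model(inputs, outputs)

model = build_scnn()
model.summary()
```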

3.5 Training Methodology


Initially, we split the dataset randomly into 80% for the training set, 10% for the validation set (used to monitor progress and tune the model), and 10% for the testing set [8]. This partition ratio follows recent studies so that our results can be compared with theirs directly. The training set therefore contains 320 samples, the validation set 40 samples and the test set 40 samples. We prepared five pre-processing methodologies, shown in Table 2: data augmentation only, 10 patches only, 14 patches only, data augmentation with 10 patches, and data augmentation with 14 patches.

Table 2. The pre-processing methodologies.

Methodology | Number of samples | Training | Total
Data augmentation | 1280 (selected randomly) | 320 (selected randomly) | 1600
14 patches | 14 * 400 = 5600 | 320 | 5920
10 patches | 10 * 400 = 4000 | 320 | 4320
Data augmentation and 14 patches | 1280 + 5600 = 6880 | 320 | 7200
Data augmentation and 10 patches | 1280 + 4000 = 5280 | 320 | 5600

Every one of the five methodologies is added to the training set, giving five different training sets, and the model was trained on each of them.
We used a machine with two GPUs (NVIDIA GeForce GTX 1060 Ti). Back-propagation is performed with the Adagrad optimization function and a constant learning rate of 0.05; this learning rate suits the optimization function and the batch size. We use a batch size of 64, chosen according to the number of samples in the training set and to help keep the model stable.
These parameters depend on each other; with them the model is very stable and reaches its optimal performance within 40 epochs. The objective function is categorical cross-entropy for multi-class classification and binary cross-entropy for binary classification.
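A sketch of this training configuration (Adagrad, learning rate 0.05, batch size 64, 40 epochs, categorical cross-entropy) is shown below; it assumes the Keras model from the previous sketch and uses placeholder arrays in place of the prepared training and validation sets.

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays standing in for the prepared training/validation sets.
x_train = np.random.rand(128, 48, 48, 3).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 4, 128), 4)
x_val = np.random.rand(32, 48, 48, 3).astype("float32")
y_val = tf.keras.utils.to_categorical(np.random.randint(0, 4, 32), 4)

model = build_scnn()                              # SCNN from the previous sketch
model.compile(
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.05),
    loss="categorical_crossentropy",              # binary cross-entropy for 2 classes
    metrics=["accuracy"],
)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          batch_size=64, epochs=40)
```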

3.6 Evaluation Metrics


To evaluate the proposed framework, we used accuracy (1), precision (2), recall or sensitivity (3), specificity (4), F1-score (5), and the confusion matrix as evaluation metrics [12], where TP, FP, FN and TN denote true positives, false positives, false negatives and true negatives. Accuracy is the number of correctly identified predictions for each class divided by the total size of the dataset; it is calculated using formula (1):

ACCURACY = (TP + TN) / (TP + TN + FP + FN)   (1)

PRECISION = TP / (TP + FP)   (2)

RECALL = TP / (TP + FN)   (3)

SPECIFICITY = TN / (TN + FP)   (4)

F1-score is a measure of test accuracy, and it uses both precision and recall to compute
the scores. It is a good metric when test data is imbalanced. F1-score is calculated using
the following formula (5):

F1-SCORE = 2 × ((PRECISION × RECALL) / (PRECISION + RECALL))   (5)
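For completeness, a small NumPy sketch computing the metrics of Eqs. (1)-(5) from binary labels is given below; the variable names and example labels are illustrative only.

```python
import numpy as np

def binary_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy, precision, recall/sensitivity, specificity and F1 for 0/1 labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * (precision * recall) / (precision + recall),
    }

print(binary_metrics(np.array([1, 0, 1, 1, 0]), np.array([1, 0, 0, 1, 0])))
```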



4 Materials

The Breast Cancer Classification Challenge 2018 dataset (BACH 2018) was made available as part of the ICIAR 2018 grand challenge [1]. It consists of high-resolution (2048 × 1536) annotated H&E-stained pathology images for breast cancer classification, released in 2018. The dataset contains 400 images equally distributed across four classes (100 samples per class): normal tissue, which is a non-cancerous sample; benign, a non-cancerous breast condition with unusual growths; in situ carcinoma, a non-invasive cancer in which abnormal cells are found in the lining of the breast milk duct; and invasive carcinoma, in which abnormal cells that began forming in the milk ducts have spread beyond the ducts into other parts of the breast tissue. For binary classification, the normal and benign classes are combined into a non-carcinoma class and the in situ and invasive classes into a carcinoma class. BACH 2018 is the update of the Bioimaging 2015 dataset classified in Araujo's study [7]. Sample images of the four classes are shown in Fig. 4.

Fig. 4. Sample images of the four classes: normal tissue, benign lesion, in situ carcinoma and invasive carcinoma.

5 Results and Discussion


5.1 The Model Performance Evaluation
After training the model on the five training sets prepared with the five methodologies, we evaluate the model optimization using the accuracy and the loss function. The loss is calculated on the training and validation sets, and its interpretation is based on how well the model is doing on these two sets: it is the sum of the errors made for each sample in the training or validation set, and its value indicates how poorly or how well the model behaves after each iteration. Over 40 epochs, the experimental results show that the highest accuracy with the lowest loss (dark blue line) is obtained when the training set is prepared with the 10 patches plus data augmentation methodology, as shown in Figs. 5 and 6, while the lowest accuracy with the highest loss (green line) occurs when only data augmentation is used.

Fig. 5. Training and validation accuracy with the five pre-processing methodologies through 40 epochs. Aug means data augmentation and patch means patch extraction.

Fig. 6. Training and validation loss with the five pre-processing methodologies through 40 epochs. Aug means data augmentation and patch means patch extraction.

After training the model with the five training sets prepared using the five methodologies, we used the trained models to predict 40 randomly chosen images from the four classes; the results are shown in Table 3. The experimental results show that the highest accuracy is achieved with methodology (5), even though its training set is smaller than that of methodology (4), and the lowest test accuracy occurs with methodology (1), in which the training set is built with data augmentation only. Methodology (5) also performs best for the binary classification of the carcinoma and non-carcinoma classes.

Table 3. Prediction accuracy when using 40 samples of four classes.

Model + methodology | Total number of training samples | Test accuracy
SCNN model + data augmentation (1) | 1600 | 68%
SCNN model + 14 patches (2) | 5920 | 90%
SCNN model + 10 patches (3) | 4320 | 93%
SCNN model + 14 patches + data augmentation (4) | 7200 | 95%
SCNN model + 10 patches + data augmentation (5) | 5600 | 100%

5.2 Evaluation of Methodology (5) Using Evaluation Metrics


For the multi-class case, we generate a confusion matrix for 40 randomly chosen samples with an imbalanced distribution over the four classes (benign = 11, in situ = 9, invasive = 6, normal = 14); we use the imbalanced distribution to demonstrate the quality of the results, and a balanced distribution produces the same result. From the confusion matrix in Table 4 we compute precision, recall, F1-score, accuracy and specificity, all of which reach 100%, as shown in Table 6. For the binary case, we generate a confusion matrix for 40 randomly chosen samples with an imbalanced distribution over the two classes (carcinoma = 19, non-carcinoma = 21). From the confusion matrix in Table 5 we again obtain 100% for precision, recall, F1-score, accuracy and specificity, as shown in Table 7.

Table 4. Confusion matrix of 4 classes.

Truth \ Prediction | Benign | In situ | Invasive | Normal
Benign | 11 | 0 | 0 | 0
In situ | 0 | 9 | 0 | 0
Invasive | 0 | 0 | 6 | 0
Normal | 0 | 0 | 0 | 14

Table 5. Confusion matrix of 2 classes.

Truth \ Prediction | Carcinoma | Non-carcinoma
Carcinoma | 19 | 0
Non-carcinoma | 0 | 21

Table 6. Test performance of four classes.

Class name | # of images | Precision | Recall | F1-score | Accuracy | Specificity
Benign | 11 | 100% | 100% | 100% | 100% | 100%
In situ | 9 | 100% | 100% | 100% | 100% | 100%
Invasive | 6 | 100% | 100% | 100% | 100% | 100%
Normal | 14 | 100% | 100% | 100% | 100% | 100%

Table 7. Test performance of 2 classes.

Class name | # of images | Precision | Recall | F1-score | Accuracy | Specificity
Carcinoma | 19 | 100% | 100% | 100% | 100% | 100%
Non-carcinoma | 21 | 100% | 100% | 100% | 100% | 100%

6 Analysis and Comparisons of Results

Tables 9 and 10 summarize previous studies that classified the BACH 2018 dataset. The studies by Golatkar [13], Rakhlin [14], Chennamsetty [8] and Kwok [9] used pre-trained neural network models and achieved accuracies of 93%, 93.8%, 97.5% and 98% respectively for binary classification, and 85%, 87.2%, 97.5% and 98% for multi-class classification. The studies by Araújo [7] and Alom [10] classified the Bioimaging 2015 dataset: the first used a CNN + SVM and achieved 83.3% (binary) and 77.8% (multi-class), while the second proposed the IRRCNN model with data augmentation and achieved 99.09% (binary) and 98.59% (multi-class). Our proposed model with 10 patches and data augmentation achieves 100% on all evaluation metrics for both binary and multi-class classification. Therefore, our method improves the state of the art for both binary and multi-class breast cancer recognition. The computation time of the experiment is given in Table 8; methodology (5) reports better results in less time than methodology (4).

Table 8. Computational time per sample for the breast cancer classification experiments.

Dataset | Model | Number of samples | Epochs | Time (m)
BACH 2018 | SCNN + methodology 5 (10 patches + augmentation) | 5600 | 40 | <13 m
BACH 2018 | SCNN + methodology 4 (14 patches + augmentation) | 7200 | 40 | <17 m

Table 9. Recent studies that classify the BACH 2018 dataset for the binary class case.

Paper name | Approach (model) | Accuracy | Sensitivity | Specificity | Precision | F1-score | Rank
(Araújo, 2017) [7] | CNN + SVM | 83.3% | 66.7% non-carcinoma, 95.6% carcinoma | ——— | ——— | ——— | 7
(Golatkar, 2018) [13] | Inception-v3 + patches | 93% | ——— | ——— | ——— | ——— | 6
(Rakhlin, 2018) [14] | (ResNet-50, Inception-v3, VGG-16) + LightGBM | 93.8% | 96.5% | 88.0% | ——— | ——— | 5
(Chennamsetty, 2018) [8] | ResNet-101, DenseNet-161 | 97.5% | 95% | 100% | ——— | ——— | 4
(Kwok, 2018) [9] | Inception-ResNet-v2 | 98% | ——— | ——— | ——— | ——— | 3
(Alom, 2019) [10] | IRRCNN + Aug | 99.09% | ——— | ——— | ——— | ——— | 2
Proposed model | (SCNN model) + Aug + 10 patches | 100% | 100% | 100% | 100% | 100% | 1

Table 10. Recent studies that classify the BACH 2018 dataset for the multi-class case.

Paper name | Approach (model) | Accuracy | Sensitivity | Specificity | Precision | F1-score | Rank
(Araújo, 2017) [7] | CNN + SVM | 77.8% | 77.8% normal and in situ, 66.7% benign, 88.9% invasive | ——— | ——— | ——— | 7
(Golatkar, 2018) [13] | Inception-v3 + patches | 85% | ——— | ——— | ——— | ——— | 6
(Rakhlin, 2018) [14] | (ResNet-50, Inception-v3, VGG-16) + LightGBM | 87.2% | ——— | ——— | ——— | ——— | 5
(Chennamsetty, 2018) [8] | ResNet-101, DenseNet-161 | 97.5% | 100% for normal, in situ and benign; 91% for invasive | 100% for normal, in situ and invasive; 97% for benign | ——— | ——— | 4
(Kwok, 2018) [9] | Inception-ResNet-v2 | 98% | ——— | ——— | ——— | ——— | 3
(Alom, 2019) [10] | IRRCNN + Aug | 98.59% | ——— | ——— | ——— | ——— | 2
Proposed model | (SCNN model) + Aug + 10 patches | 100% (best accuracy) | 100% | 100% | 100% | 100% | 1

7 Conclusion and Future Work

In this study, we have proposed a binary and multi-class breast cancer classification
using the Separable Convolutional Neural Network (SCNN) model, with a parametric
rectified linear unit (PRELU) as an activation function. The experiments were con-
ducted using the SCNN model on the BACH 2018 dataset. We have tested five
methodologies of the pre-processing method. Preparing the training set with 10 patches
+ data augmentation (methodology 5) is better than 14 patches + data augmentation
(methodology 4). The performance was evaluated using different performance metrics.
The proposed framework shows 100% testing accuracy and 100% for sensitivity and
Specificity for binary class and multi-class breast cancer recognition on BACH 2018
dataset. The model converged to its optimal accuracy within 40 epochs, in <13 min.
Thus, the experimental results show state-of-the-art testing accuracy for breast cancer
recognition compared with existing methods. In future work, we will develop the
model to reach the optimal performance in less than 40 iterations.

References
1. Aresta, G., Araújo, T., Kwok, S., Chennamsetty, S.S., Safwan, M., Alex, V., Marami, B.,
Prastawa, M., Chan, M., Donovan, M., Fernandez, G., Zeineh, J., Kohl, M., Walz, C.,
Ludwig, F., Braunewell, S., Baust, M., Vu, Q.D., To, M.N.N., Aguiar, P.: BACH: grand
challenge on breast cancer histology images. Med. Image Anal. 56, 122–139 (2019)

2. American Cancer Society (2017). https://www.cancer.org


3. Elmore, J.G., Longton, G.M., Carney, P.A., Geller, B.M., Onega, T., Tosteson, A.N.A.,
et al.: Diagnostic concordance among pathologists interpreting breast biopsy specimens.
JAMA 313(11), 1122–1132 (2015)
4. Wang, G., Yuan, G., Li, T., Lv, M.: An multi-scale learning network with depthwise
separable convolutions. IPSJ Trans. Comput. Vis. Appl. 10(1), 1–8 (2018)
5. Bayramoglu, N., Kannala, J., Heikkilä, J.: Deep learning for magnification independent
breast cancer histopathology image classification. In: 2016 23rd International Conference on
Pattern Recognition (ICPR), pp. 2440–2445. IEEE (2016)
6. Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L.: Breast cancer histopathological image
classification using convolutional neural networks. In: 2016 International Joint Conference
on Neural Networks (IJCNN), pp. 2560–2567. IEEE (2016)
7. Araujo, T., Aresta, G., Castro, E., Rouco, J., Aguiar, P., Eloy, C., Polónia, A., Campilho,
A.: Classification of breast cancer histology images using convolutional neural networks.
PLOS ONE 12(6), 1–14 (2017)
8. Chennamsetty, S.S., Safwan, M., Alex, V.: Classification of breast cancer histology image
using an ensemble of pre-trained neural networks. In: Campilho, A., Karray, F., ter Haar
Romeny, B. (eds.) Image Analysis and Recognition, pp. 804–811. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-93000-8_91
9. Kwok, S.: Multiclass classification of breast cancer in whole-slide images. In: Campilho, A.,
Karray, F., ter Haar Romeny, B. (eds.) Image Analysis and Recognition, pp. 931–940.
Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93000-8_106
10. Alom, M.Z., Yakopcic, C., Nasrin, M.S., Taha, T.M., Asari, V.K.: Breast cancer
classification from histopathological images with inception recurrent residual convolutional
neural network. J. Digit. Imaging 32(4), 605–617 (2019)
11. Zhang, Y.D., Pan, C., Chen, X., Wang, F.: Abnormal breast identification by nine-layer
convolutional neural network with parametric rectified linear unit and rank-based stochastic
pooling. J. Comput. Sci. 27, 57–68 (2018)
12. Nahid, A.A., Kong, Y.: Involvement of machine learning for breast cancer image
classification: a survey. Comput. Math. Meth. Med. (2017)
13. Golatkar, A., Anand, D., Sethi, A.: Classification of breast cancer histology using deep
learning. In: Campilho, A., Karray, F., ter Haar Romeny, B. (eds.) Image Analysis and
Recognition. Lecture Notes in Computer Science (Including Subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics) LNCS, vol. 10882, pp. 837–844.
Springer, Cham (2018)
14. Rakhlin, A., Shvets, A., Iglovikov, V., Kalinin, A.A.: Deep convolutional neural networks
for breast cancer histology image analysis. In: Campilho, A., Karray, F., ter Haar Romeny,
B. (eds.) Image Analysis and Recognition. (Including Subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics) LNCS, vol. 10882, pp. 737–744. Springer,
Cham (2018)
Big Data Analytics and Service Quality
Big Data Technology in Intelligent Distribution
Network: Demand and Applications

Zhi-Peng Ye1 and Kuo-Chi Chang1,2,3,4

1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China. 1275499928@qq.com, albertchangxuite@gmail.com
2 Fuzhou University, No. 33 Xuefu South Road, New District, Fuzhou 350118, Fujian, China
3 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
4 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan

Abstract. With the development and maturity of big data technology, big data analysis is increasingly widely used in practice, and more and more data are gradually being applied to China's smart grid. Since smart grid data exhibit the 4 V characteristics of big data (large volume, high velocity, many types, low value density), big data technology can extract more accurate economic value from power information at lower cost. An analysis of the development of China's power industry shows that the distribution network clearly lags behind the development of power generation and the transmission network. At present, more than 95% of blackouts are caused by the distribution network, and half of the power losses occur there, so distribution network automation urgently needs the support of new technologies. This paper first outlines several key elements of big data technology, including big data collection, storage and analysis, and then describes several big data analysis methods. On this basis, big data technology is applied to the intelligent distribution network, especially to distribution forecasting, where it can provide stronger technical support for the operation of the smart distribution network, continuously improve the technical level of China's smart distribution network, and promote the optimization and upgrading of the smart grid. Finally, an optimized prediction model is proposed, the application of the newly developed 5G technology is discussed, and its contribution to data acquisition for big data applications is analyzed.

Keywords: Big data analysis technology · Intelligent distribution network · Electricity generation forecast · 5G communication technology


1 Introduction

By definition, big data refers to large, complex structures and many types of data sets
that existing tools and software cannot capture, manage, store, search, share, analyze
and visualize in a short period of time. The distribution network, which is located at the
end of the power system, has the characteristics of wide geographical distribution, large
scale of power grid, many kinds of equipment, various forms of network connection,
changeable operation mode and so on. Therefore, the electric power data that needs to
be collected has various structures and a large amount of data, which has already
reached the state of “big” data [1]. In the distribution link of power system in our
country, it is necessary to analyze the electricity demand of users before power dis-
patching and distribute the power load to each end-user scientifically and reasonably.
Therefore, power demand analysis is particularly important, which determines the
economy of power grid operation. At present, however, the acquisition frequency of the power data acquisition system is too low to meet the data needs of the power system effectively, the traditional manual collection method cannot accurately collect valid data samples, and the construction of the database also remains to be solved.
At present, the smart grid in China is developing rapidly, but the application of big
data in the distribution network is almost blank. Without data, analysis becomes empty
talk. In recent years, the distribution link has been paid more and more attention, a large
number of intelligent acquisition equipment have been installed in the distribution
network, and big data has obtained the acquisition source [2]. The deficiency is that
there is no scientific and unified standard to manage the distribution network data.
There are great differences in platform data in different regions, which brings great
difficulties to the use of the data. In applying big data technology, a big data model alone is not enough to accurately predict actual power demand, because many correlated factors affect user electricity consumption. For example, by studying holiday data, population migration, weather and other factors [3], the relationship between these factors and electricity consumption can be identified, the load forecasting model can be optimized, and the accuracy and economy of power load forecasting can be continuously improved.

2 Source and Characteristics of Big Data in Distribution Network

Over the past ten years of intelligent distribution network development, power enterprises have accumulated a large amount of data. Because of the differences in the database types of the various systems, these data fall into three kinds: structured, semi-structured and unstructured data, including user watt-hour meter data, load monitoring data, dispatching operation data, maintenance records and so on. In addition to the data measured by smart devices, there are also large volumes of operational data, customer service data and data from outside the power companies, including Internet data [4].

There are many kinds of distribution network big data, which can be roughly
divided into three types according to its sources: power enterprise measurement data,
power enterprise operation data and power enterprise external data. At present, the
power measurement data is most commonly used in our country, which can be used to
predict distribution and detect equipment faults and line fault areas. The use of the latter
two types is less, but it is also necessary to study the relationship between these two
types of data and distribution network, so it is more urgent to make better use of
measurement data and mine the value of these data. These data sources and classifi-
cations can be shown in Fig. 1.

Fig. 1. Source and characteristics of big data in distribution network.

2.1 Collection Method of Measurement Data


The power consumption information of users and the operation parameters of distri-
bution network equipment are collected by various watt-hour meters and terminal
equipment. It includes watt-hour meter, low-voltage user meter reading collector, low-
voltage user meter reading concentrator, feeder terminal unit (FTU), distribution
transformer terminal unit (TTU), distribution terminal unit (DTU) and so on. After collection, the measurement data are sent by these devices to the distribution automation master station over wireless or wired links, where they are stored and processed.
Compared with the traditional single data collection method, the collection function of
big data technology has the characteristics of high reliability, high speed, wide source,
time-saving and labor-saving. In order to prevent data information damage, loss and
other problems, we can also upload it to the storage space or do a good backup work,
so that the risk of storage data damage can be reduced to a minimum, but at the same
time, it also puts forward higher requirements for the security of the network envi-
ronment [5].

2.2 Key Techniques and Analysis Methods of Big Data

Storage and Disposal of Big Data. At present, stream processing and batch pro-
cessing are the main data storage and processing technologies. Stream processing is
suitable for situations with high real-time requirements in distribution networks, such
as online evaluation of multi-source heterogeneous data, load scheduling, on-line
monitoring and so on. In the case of batch processing, such as distribution network

planning, we can use the historical data of power construction to carry out new power
grid planning and construction.
Due to the large number of measurement points, large amount of data and strong
real-time performance of the intelligent distribution network, it is difficult to process
and ensure the reliability of the data if it is logically concentrated. Cloud computing,
which has sprung up in recent years, integrates distributed file system, distributed
processing system and so on. It provides a platform and technical support for big data
storage and processing. The recent popularity of edge computing [6], a terminal-based
service, also provides a backup for cloud computing.
Analytical Techniques of Big Data. Big data analysis is the process of discovering
hidden patterns and unknown relationships and mining useful information by analyzing
a large number of various types of data. Big data technology includes data mining and
visualization. Among them, the purpose of data mining is to reveal hidden, previously
unknown and potentially valuable information from a large number of data in the
database, and to establish the relationship between events and models. Inductive rea-
soning can be made through the model to help decision-making. Data visualization is to
define the data in the database as elements to promote the composition of the data
image, so as to carry out data analysis from multiple angles.
In the intelligent distribution network, the collected data are strongly real-time, so research on multi-dimensional indexing and data association technology needs to be strengthened. Because of the large volume of distribution network data, computation and advanced communication technology must also be strengthened as supporting tools so that big data can play its full role.

3 Big Data Analysis Method and Means

3.1 Feature Clustering of Distribution Network Data


Data feature clustering divides the data into clusters according to their inherent properties. Combined with the characteristics of distribution network big data, a clustering solution can be designed to handle the spatio-temporal characteristics of the distribution network. For example, data can be extracted from the load monitoring system, classified and plotted as curves according to user type and region, and then subjected to load characteristic cluster analysis to guide power sales and load management. Clustering of load curves can serve as a preprocessing step for load forecasting and electricity price forecasting, and by analyzing line and equipment fault information of the distribution network, clusters of curves with similar behaviour can be formed to better estimate and contain the impact of faults.
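As a simple illustration of the load-curve clustering described above, the scikit-learn sketch below groups synthetic daily load curves; the data, the number of clusters and the choice of k-means are assumptions, since the paper does not specify a particular algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic daily load curves: 200 feeders x 24 hourly load values.
residential = 50 + 30 * np.sin(np.linspace(0, 2 * np.pi, 24)) + rng.normal(0, 3, (100, 24))
industrial = 120 + rng.normal(0, 5, (100, 24))        # flat, high base load
curves = np.vstack([residential, industrial])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(curves)
print(np.bincount(kmeans.labels_))                    # size of each load-characteristic cluster
```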

3.2 Distribution Network Data Feature Classification


Classification refers to methods in which a trained classification function or model maps data objects to several given categories. Classification analysis is a form of supervised learning: given samples with known class labels, a model is trained to identify new samples. In the fault identification and diagnosis of distribution transformers in China, Bayesian classification has been used to classify transformer faults as internal or external grounding and short-circuit faults, and neural network algorithms have been used to distinguish normal from abnormal operating states. The Bayesian classifier uses the following Bayes formula [7]:

P(yi | x) = P(x | yi) P(yi) / P(x)   (1)

In formula (1), P(yi) is the prior probability of class yi, and P(yi | x) is the posterior probability that the item x to be classified belongs to category yi. Formula (1) can also be expressed as:

P(yi | a1, a2, ..., an) = P(a1, a2, ..., an | yi) P(yi) / P(a1, a2, ..., an)   (2)

where ai is an attribute of the item x to be classified. If P(yk | x) = max{P(y1 | x), P(y2 | x), ..., P(yn | x)}, then x belongs to class yk.
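A minimal sketch of this Bayesian decision rule, applied to a toy transformer-fault example under a naive conditional-independence assumption, is given below; the attribute values and class names are illustrative only and are not taken from a real data set.

```python
import numpy as np

# Toy training data: each row holds two categorical attributes; labels are fault classes.
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 1]])
y = np.array(["internal", "internal", "external", "external", "internal", "external"])
classes = np.unique(y)

def posterior_scores(x):
    """Scores proportional to P(a1, ..., an | yi) P(yi), Eq. (2), assuming independence."""
    scores = {}
    for c in classes:
        Xc = X[y == c]
        prior = len(Xc) / len(X)                                   # P(yi)
        likelihood = np.prod(                                      # Laplace-smoothed P(aj | yi)
            [(np.sum(Xc[:, j] == x[j]) + 1) / (len(Xc) + 2) for j in range(len(x))])
        scores[c] = prior * likelihood
    return scores

scores = posterior_scores([0, 1])
print(scores, "->", max(scores, key=scores.get))   # x is assigned to the arg-max class
```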

3.3 The Technical Route of Big Data Analysis of Distribution Network


Starting from the problems to be solved, a complete business problem can be solved by
combining a variety of data mining methods. For example, in solving the problem of
data budget processing, we can use the method of statistical description to analyze and
process the data, uniformly standardize the data with dimensionless model, and explore
the application points with the method of cluster analysis. Based on the clustering
classification method described above, the fast processing method of big data in large-
scale active distribution network can be developed.
Based on the big data clustering and classification technology of the distribution network, fast big data processing for large-scale distribution networks is studied. The technical route of this high-speed data analysis and processing technology is shown in Fig. 2.

Fig. 2. Fast processing method of big data of distribution network.

4 Application of Big Data Technology in Distribution Network to Load Forecasting

As the number of factors affecting users' power consumption increases, new intelligent forecasting methods have been proposed. Because these algorithms have good nonlinear fitting ability, a large number of research results have appeared in the field of load forecasting in recent years; artificial neural networks (ANN), new clustering methods, multi-model partitioning algorithms (MMPA) [8, 9] and others are used for load forecasting. Intelligent forecasting of this kind enables researchers to extract more valuable information from distribution network big data and provides more effective data support for decisions on regional planning and construction, power grid dispatching, load forecasting and so on. The following is a brief description of the implementation of a short-term load forecasting model based on cluster analysis and a neural network.

4.1 Clustering Method


According to the actual situation, different time has the greatest impact on the power
load, for example, during the working day, the load is high and the electricity con-
sumption is the largest. On the other hand, the power consumption of the rest day on
the weekend decreases obviously, so the power data can be preliminarily clustered
according to the number of weeks. Monday is the beginning of the working day, and
the current to start the equipment is large and falls into a separate category. Overtime
Saturdays are grouped separately, Sundays and holidays are grouped together. As a
result, the week can be divided into four categories. In this study, considering the
relationship between temperature range and prediction accuracy, temperatures are
divided into four categories, namely, [0 °C, 20 °C], [20 °C, 30 °C], [30 °C, 35 °C] and
[35 °C, 45 °C]. Above of, historical data were divided into 16 categories. According to
the clustering results, historical data similar to the predicted daily data were selected as
training samples to train the neural network model. The input quantity includes the date
temperature, and the output quantity is the load value at the corresponding time.
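The 4 × 4 categorization described above (four day types times four temperature bands, giving 16 categories) can be sketched as follows; the labels and the helper function are illustrative, not taken from the paper.

```python
TEMP_BANDS = [(0, 20), (20, 30), (30, 35), (35, 45)]        # degrees Celsius

def day_type(weekday: int, is_holiday: bool) -> int:
    """0: Monday, 1: Tuesday-Friday, 2: Saturday, 3: Sunday or holiday."""
    if is_holiday or weekday == 6:
        return 3
    if weekday == 5:
        return 2
    return 0 if weekday == 0 else 1

def category(weekday: int, is_holiday: bool, temperature: float) -> int:
    band = next(i for i, (lo, hi) in enumerate(TEMP_BANDS) if lo <= temperature < hi)
    return day_type(weekday, is_holiday) * len(TEMP_BANDS) + band   # one of 16 categories

print(category(weekday=0, is_holiday=False, temperature=32.5))      # Monday, 30-35 band
```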

4.2 Construction of Load Forecasting Model


The three-layer neural network consists of an input layer, a hidden layer and an output layer [10]. Based on the clustering method and the neural network, the load prediction model is built as shown in Fig. 3: the historical data are classified, the historical data most similar to the predicted day are selected as training samples, and the simulation model is established. The input includes the forecast day and its temperature range, and the output is the predicted daily load value.

Fig. 3. Load forecasting model based on cluster analysis.
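A minimal sketch of training such a three-layer network is given below, using scikit-learn's MLPRegressor on synthetic data; the input encoding, network width and single-value daily output are simplifications and assumptions, not the authors' exact model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Inputs: [day-type/temperature category (0-15), temperature]; target: daily load value.
X = np.column_stack([rng.integers(0, 16, 500), rng.uniform(0, 45, 500)])
y = 200 + 5 * X[:, 1] + 10 * (X[:, 0] % 4) + rng.normal(0, 5, 500)   # synthetic load

# Three-layer structure: input layer, one hidden layer, output layer.
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
model.fit(X[:450], y[:450])
print(model.score(X[450:], y[450:]))      # R^2 on the held-out samples
```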

5 Prospect of 5G Technology in Distribution Information Collection

Like earlier generations of network technology, 5G is a cellular mobile network, but it offers high-speed transmission and low latency. At the various information collection points of the distribution network, intelligent instruments based on 5G technology can be widely used to transmit monitoring data back to the server more efficiently and accurately [11, 12]. The use of 5G can therefore readily solve the problems noted in this paper, namely that the data acquisition frequency of existing intelligent instruments is low and the reliability of the collected data is limited, and it also provides excellent data for building a more accurate prediction model.

5.1 Application in the Operation and Management of Intelligent Distribution Network
In the daily operation point inspection management of equipment, the handheld
intelligent terminal based on 5G technology can obtain the state data of monitoring
points and transmit the maintenance and emergency repair information of intelligent
distribution network. Since the transmission rate of 5G can be up to 10 Gbps, the video
and audio information generated in the process of spot inspection and emergency repair
can also be saved to the cloud platform in real time to better record the running status of
the equipment and provide a basis for maintenance or replacement of electrical
equipment.

5.2 Application in Distribution Network Control


In case of common faults in the distribution network, such as single-phase short circuit,
the fault indicator transmits the fault information to the server. The fault detection
system of the server sends out the fault removal instruction according to the real-time monitoring information and fault information of the equipment. When the fault information and instructions are transmitted over the 5G network, the fault clearing time is greatly reduced, the power supply reliability is improved, and the loss caused by the fault is reduced. The application of the 5G network in the above two situations is
shown in Fig. 4.

Fig. 4. Suggestions on the application of 5G technology.

The following is the comparison of the technical indicators of 5G and 4G. The
superior performance of 5G technology can be seen from Table 1.

Table 1. Comparison of the technical indicators of 5G and 4G.


Technical index           | 4G            | 5G            | Increase multiple
Time delay                | 10 ms         | 1 ms          | 0.1 times
User rate                 | 10 Mbps       | 0.1–1 Gbps    | 10–100 times
Peak rate                 | 1 Gbps        | 20 Gbps       | 20 times
Flow density              | 0.1 Tbps/km²  | 10 Tbps/km²   | 100 times
Connection number density | 10⁵/km²       | 10⁶/km²       | 10 times
Mobility                  | 350 km/h      | 500 km/h      | 1.43 times
Energy efficiency         | 1 times       | 100 times     | 100 times
Spectral efficiency       | 1 times       | 3–5 times     | 3–5 times

The main role of 5G technology in distribution control is to reduce the time delay of
fault instructions. The increase in the user rate enables users to transmit larger volumes of data, such as high-definition video, and the higher energy efficiency of 5G compared with 4G is more in line with the current concept of environmental protection.

6 Conclusion

In conclusion, accurate load forecasting can effectively reduce the cost of power
generation and improve economic and social benefits. In order to improve the accuracy
of the short-term load forecasting model, a forecasting model based on a clustering method and a neural network is proposed that accounts for the influence of date and temperature. Considering that the power system is driven by the joint action of many factors, follow-up work should analyze the influence of additional factors on the prediction, consider better clustering methods, and establish a practical and usable prediction model. 5G technology can be used in data acquisition to make training data samples more reliable and to improve the accuracy of model prediction.

References
1. He, X., Ai, Q., Qiu, R.C., et al.: A big data architecture design for smart grids based on
random matrix theory. IEEE Trans. Smart Grid 8(2), 674–686 (2017)
2. Tu, C., He, X., Shuai, Z., et al.: Big data issues in smart grid – a review. Renew. Sustain.
Energy Rev. 79, 1099–1107 (2017)
3. Raza, M.Q., Khosravi, A.: A review on artificial intelligence based load demand forecasting
techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 50, 1352–1372 (2015)
4. Lomotey, R.K., Deters, R.: Topics and terms mining in unstructured data stores. In:
Computational Science and Engineering, pp. 854–861 (2013)
5. Bayindir, R., Colak, I., Fulli, G., et al.: Smart grid technologies and applications. Renew.
Sustain. Energy Rev. 66, 499–516 (2016)
6. Han, W., Xiao, Y.: Edge computing enabled non-technical loss fraud detection for big data
security analytic in smart grid. J. Ambient Intell. Hum. Comput. 11, 1–12 (2019)
7. Zhu, J., Chen, J., Hu, W., et al.: Big learning with Bayesian methods. Nat. Sci. Rev. 4(4),
627–651 (2017)
8. Kuo, P., Huang, C.: A high precision artificial neural networks model for short-term energy
load forecasting. Energies 11(1), 213 (2018)
9. Singh, P., Dwivedi, P.: Integration of new evolutionary approach with artificial neural
network for solving short term load forecast problem. Appl. Energy 217, 537–549 (2018)
10. Liang, Y., Niu, D., Hong, W., et al.: Short term load forecasting based on feature extraction
and improved general regression neural network model. Energy 166, 653–663 (2019)
11. Borenius, S., Costarequena, J., Lehtonen, M., et al.: Providing network time protocol based
timing for smart grid measurement and control devices in 5G networks. In: International
Conference on Communications, pp. 1–6 (2019)
12. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G
base station based on Internet of Things collaborative control. IEEE Access 8, 32935–32946
(2020)
Memory Management Approaches in Apache
Spark: A Review

Maha Dessokey1(&) , Sherif M. Saif1 , Sameh Salem2 ,


Elsayed Saad2 , and Hesham Eldeeb1
1 Electronics Research Institute, Cairo, Egypt
mdessoky@eri.sci.eg
2 Faculty of Engineering, Helwan University, Helwan, Egypt

Abstract. In the era of Big Data, processing large amounts of data through data-intensive applications presents a challenge. Apache Spark, an in-memory distributed computing system, is often used to speed up big data applications. It caches intermediate data in memory, so there is no need to repeat the computation or reload data from disk when these data are reused later. This mechanism of caching data in memory makes Apache Spark much faster than other systems. When the memory used for caching data is full, the cache replacement policy used by Apache Spark is Least Recently Used (LRU); however, the LRU algorithm performs poorly on some workloads. This review gives an insight into the different replacement algorithms proposed to address the LRU problems, categorizes the different selection factors, and provides a comparison between the algorithms in terms of selection factors, performance and the benchmarks used in the research.

Keywords: Apache Spark · Cache management · Replacement algorithms · In-memory data processing

1 Introduction

Every day, large amounts of data are generated from different sources such as social data, machine data and transactional data. This ever-growing data raises the problem of how to store and analyze such huge heterogeneous datasets. To benefit from this huge amount of data, researchers have been working on novel data analysis techniques for Big Data more than ever before, which has led to the continuous development of many different Big Data algorithms and platforms [1].
Big Data platforms include the pioneering MapReduce framework initially pro-
posed by Google, the open-source Hadoop MapReduce framework [2] and Apache
Spark framework [3] that improves the performance of Hadoop by up to 100x through
in-memory cluster computing [4]. Apache Spark caches data required for computation
in the memory of the nodes in the cluster, so data can be referred back without
reloading it from disk; thus eliminating expensive intermediate disk writes. Apache
Spark is widely used in many application domains, including Bioinformatics and
Biomedicine [5, 6], Finance [7], and Astronomy [8], etc.

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 394–403, 2021.
https://doi.org/10.1007/978-3-030-58669-0_36

This paper discusses the Apache Spark platform, how it caches data and how it frees space when the cache is full, and then reviews the research done to enhance its performance.
The rest of this paper is organized as follows: Sect. 2 introduces the Apache Spark platform, Sect. 3 introduces the cache mechanism in Apache Spark, Sect. 4 reviews different Apache Spark cache replacement policies, Sect. 5 reviews some caching optimization techniques in Apache Spark, and Sect. 6 concludes and gives recommendations.

2 Apache Spark Platform

This section provides background knowledge about Apache Spark platform, features
and cache memory management, to help understand the different techniques introduced
to improve the Apache Spark performance.
Apache Spark employs the master-slave architecture. When a user submits a job to
Apache Spark, the master initializes a driver and a slave initializes a pre-defined
number of executors to run the job. The driver splits the input data into partitions and
assigns each partition to an executor for processing. The executor loads the data block
corresponding to that partition and performs operations on the data.
Apache Spark exposes a programming model to Big Data application developers based on resilient distributed datasets (RDDs) [9]. As a key abstraction in Apache Spark, an RDD is a collection of objects partitioned across the nodes of the Apache Spark cluster, and all partitions can be computed in parallel. More importantly, RDDs leverage the distributed memory to cache intermediate results. Recently, the DataFrame abstraction was built on top of RDDs, in which data is organized in rows and can be used with Spark SQL.
Apache Spark supports two types of operations: transformations and actions.
Apache Spark Transformations. A transformation is a function that produces a new RDD from existing RDDs. Examples of such operations are map(), filter(), groupByKey(), and join(). Applying transformations builds an RDD lineage containing all the parent RDDs of the final RDD(s). The RDD lineage, also known as the RDD operator graph or RDD dependency graph, is a Directed Acyclic Graph (DAG) of all the parent RDDs of an RDD.
Apache Spark Actions. When the actual dataset needs to be worked with, an action is performed. When an action is triggered, it launches a computation on an RDD and returns a value to the program or writes data to external storage. Examples of action operations are count(), collect(), countByValue() and save().
The lazy evaluation feature in Apache Spark means that transformations are not executed immediately; execution starts only when an action is triggered. Hence, with lazy evaluation, data is not loaded until it is necessary, and some optimization can be done before execution given the DAG, as the short example below illustrates.
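The following is a minimal PySpark sketch of this behavior; the dataset and operations are illustrative only.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1, 11))           # base RDD
evens = rdd.filter(lambda x: x % 2 == 0)     # transformation: nothing is executed yet
squares = evens.map(lambda x: x * x)         # transformation: only extends the lineage

print(squares.count())                       # action: triggers execution of the whole lineage
print(squares.collect())                     # action: returns [4, 16, 36, 64, 100]

spark.stop()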
A DAG is a set of vertices and edges where, in Apache Spark, the vertices represent the RDDs and the edges represent the operations to be applied to the RDDs. In an Apache Spark DAG, every edge is directed from earlier to later in the sequence. When an action is called, the created DAG is submitted to the DAG scheduler, which further splits the graph into stages of tasks [10]. Figure 1 shows a DAG example and Fig. 2 shows how the DAG scheduler splits it into stages.

Fig. 1. DAG example

Fig. 2. DAG scheduler example
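The lineage that the DAG scheduler works from can also be inspected directly. The illustrative PySpark snippet below builds a small word count, where the shuffle introduced by reduceByKey marks a stage boundary; the input data is made up for the example.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dag-demo").getOrCreate()
sc = spark.sparkContext

words = sc.parallelize(["a b a", "b c"]).flatMap(lambda line: line.split())
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)   # shuffle -> new stage

print(counts.toDebugString().decode())   # prints the RDD lineage (dependency graph)
spark.stop()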

If an intermediate partition is not cached or a failure occurs on any node holding the
cache, Apache Spark will reconstruct the partition using the DAG and continue to
execute computation.
In Sect. 4, it is discussed how DAG information can be used to select which RDDs to persist in the cache and which to remove when the memory is full.

3 Cache Mechanism in Apache Spark

In this section, the Apache Spark cache memory management will be discussed.
Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory is used to process intermediate data in shuffle, join, sort and aggregation operations, while storage memory refers to the memory used for caching and propagating internal data across the cluster. In Spark, execution and storage share a unified region: when no execution memory is used, storage can acquire all the available memory, and vice versa. Storage memory is managed by the Block Manager, which runs on every node (driver and executors) and provides interfaces for putting and retrieving blocks, both locally and remotely, into various stores. When RDD partitions have been cached in memory during iterative computation, an operation which needs the partitions obtains them through the Cache Manager. The partitions are cached by the Cache Manager, and all operations, including reading or caching in the Cache Manager, mainly depend on the API of the Block Manager. The Block Manager decides whether partitions are obtained from memory or disk.
In Apache Spark, intermediate data caching is triggered by calling the persist() method on an RDD and specifying a storage level. The storage level designates the use of disk only, of both memory and disk, etc.; for RDDs the default is memory only (see the short sketch below). When the memory used for caching data reaches its capacity limit, the Block Manager chooses which data to discard to make room for new data, and the discarded data need to be recovered when they are used again.
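A short PySpark sketch of explicit caching is given below; the storage level and the workload are chosen only for illustration.

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("persist-demo").getOrCreate()
sc = spark.sparkContext

counts = (sc.parallelize(range(100000))
            .map(lambda x: (x % 10, 1))
            .reduceByKey(lambda a, b: a + b))

counts.persist(StorageLevel.MEMORY_AND_DISK)   # the RDD default is MEMORY_ONLY (equivalent to cache())
print(counts.count())      # first action: computes the lineage and fills the cache
print(counts.collect())    # later actions reuse the cached partitions instead of recomputing
counts.unpersist()
spark.stop()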
The cache replacement policy in Apache Spark is LRU [9] because of its simplicity
and low overhead. However, the LRU algorithm performs poorly for the following
workloads:
Scanning Workload. Because an LRU algorithm evicts the least-recently-used block,
recently accessed blocks reside in the cache. However, the blocks in the scanning
workload are accessed only a single time.
Cyclic Access (loop-like). Workload in which loop length is greater than cache size.
For instance, when the cache size is 3 and the workload’s block request sequence is 1-
2-3-4-1-2-3-4-1-2-3-4, the LRU algorithm always generates a cache miss. In this case,
Block 1 will be evicted as a result of the insertion of Block 4. Thus, the next request of
Block 1 cannot be a cache hit. Therefore, if the cache size is smaller than the work-
load’s cyclic pattern size, the LRU algorithm always generates a cache miss.
Besides these drawbacks of LRU on some workloads, the LRU algorithm also neglects the lazy evaluation feature in Apache Spark.
Memory caching has a long history and has been widely employed in storage systems, databases, file systems, web servers, operating systems, and processors, and many buffer cache algorithms have been proposed to address the problems of LRU [11]. Among them, two cache algorithms, adaptive replacement cache (ARC) [12] and low inter-reference recency set (LIRS) [13], have shown the best performance across multiple workloads. These two LRU-stack-based approaches overcome the limitations of the LRU algorithm.

4 Cache Replacement Algorithms for Apache Spark

In this section, the different selection factors which researchers have used in selecting the RDDs to be replaced when the cache is full are categorized and discussed, followed by an extensive review of the different algorithms.

4.1 Selection Factors


For Apache Spark, researchers have studied different factors for selecting the RDDs to be replaced when the cache is full, and they have compared the performance with the LRU algorithm used in Apache Spark. As shown in Fig. 3, some selection factors can be considered history-based factors, such as recency, frequency and computational cost, while others make use of the lazy evaluation feature in Apache Spark and the DAG information in selecting the RDDs to be replaced.

Fig. 3. Selection factors

History Based Factors


Recency. LRU considers the recency factor only for the chosen RDD. When the cache
memory is insufficient, the LRU algorithm will release the least recently used RDD.
Computational Cost Factor. It was used in [14, 15]. When discarded data need to be reused, there is unnecessary computational overhead; hence the computational cost of a partition, sometimes referred to as the recovery cost, should be a crucial factor, and partitions with higher cost should not be replaced. Duan et al. [14] defined the computational cost of a whole RDD as the maximum computational cost of its partitions, with the cost of partition j of RDD i calculated as the difference between its starting time ST_ij and finishing time FT_ij, which roughly expresses its execution and communication time, so the computational cost of RDD partition j is calculated as

Cost_ij = FT_ij − ST_ij                                        (1)

Partition Size. It was considered in [14, 16]. When other factors are consistent, it is
preferable to delete the RDDs occupying a large memory space to release more
resources.
DAG Based Factors
Reference Count. It was used and defined in [17] as the count of dependent child
blocks that have not been computed yet.
Effective Reference Count. It was used in [18]. If data block is referenced by task t,
then this reference is effective if task t’s dependent blocks, if computed, are all cached
in memory. The effective reference count of a data block is the number of its effective
references.

Composition Reference Count. It was used in [19], which defined RCintra_b as the reference count for block b in the current stage and RCinter_b as the inter-stage reference count, i.e. the reference count for block b in downstream stages, and then calculates the composition reference count as in Eq. 2:

RCcom_b = RCintra_b + beta · RCinter_b / Δt_s^ts               (2)

where s is the id of the current stage, ts is the id of the nearest stage in which b will be visited again in the future, Δt_s^ts indicates the distance between s and ts, and beta, which is not less than zero, is an adjustment factor indicating the weight given to the inter-stage reference count.
Reference Distance. It was used in [20] and defined as the relative distance between
the current step in the application execution and the step in the workflow that the data
block is needed. The reference distance is initially calculated by parsing the DAG, and
later while the application is executed, the reference distance for each block is updated
by simply decrementing the value based on the stage ID that is currently executing.

4.2 Cache Replacement Algorithms


The Weight Replacement (WR) algorithm [14] considers the computation cost, the number of uses of partitions and the sizes of RDDs in selecting the RDDs to be replaced. The PageRank algorithm was used to test the system performance under different conditions. WR shows less execution time and a higher memory occupancy rate than the default Apache Spark replacement policy.
Least Cost Strategy (LCS) [15] obtains dependency information between cached data by analyzing the application and calculates the recovery cost at runtime. By predicting how many times cached data will be reused and using this to weight the recovery cost, it evicts the data that lead to the minimum future recovery cost. The Hi-Bench
benchmark [21] was used to evaluate the performance. Experimental results showed
that this approach achieved better performance when memory space was not sufficient,
and reduced 30% of the total execution time compared with the default Apache Spark
replacement policy.
The Least Reference Count (LRC) [17] algorithm uses the reference count factor and evicts the cached data blocks with the smallest reference count; LRC can also safely evict data blocks with a zero reference count, which will not be used again in the remaining
computations. Spark-Bench [22] was used to evaluate the performance and compared
with LRU, LRC improved the caching performance by 22%.
Least Effective Reference Count (LERC) [18] uses the effective reference count factor and evicts the blocks with the smallest effective reference count. LERC sped up job completion time by up to 37% compared to LRU.
Least Composition Reference Count (LCRC) [19] uses the composition reference count and evicts the blocks with the smallest composition reference count. Performance was improved by up to 65% compared to LRU.
Most Reference Distance (MRD) [20] algorithm uses reference distance factor and
evicts the data block whose reference distance is the largest, and pre-fetches the data

blocks whose reference distance is the smallest. Spark-Bench was used to evaluate the
performance. MRD had low overhead and performance was improved by an average of
53% compared to LRU.
In previous studies, any change in an RDD required reloading the whole RDD from the external storage system, but recently the Self-adaptive Weight Cache Replacement Algorithm (SWCR) [16] uses partial RDD updates and introduces a weight model that combines frequency, size, calculation cost and dependency integrity, setting a weight for each feature that is adjusted according to each application's requirements. SWCR evicts the data blocks with the smallest weight, and it improved performance by an average of 21% compared to LRU.
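As a purely illustrative sketch of the weight-based idea behind WR and SWCR (the weights, statistics and block names below are hypothetical and not taken from either paper), an eviction order could be computed as follows:

def eviction_order(blocks, w_freq=1.0, w_cost=1.0, w_size=1.0):
    # blocks: dict mapping block id -> (access frequency, recovery cost, size in MB)
    def score(stats):
        freq, cost, size = stats
        return w_freq * freq + w_cost * cost - w_size * size   # larger blocks are evicted earlier
    return sorted(blocks, key=lambda b: score(blocks[b]))       # first element is evicted first

cached = {"rdd_3_p0": (5, 2.0, 128), "rdd_7_p1": (1, 0.5, 512), "rdd_9_p2": (2, 4.0, 64)}
print(eviction_order(cached))   # ['rdd_7_p1', 'rdd_3_p0', 'rdd_9_p2']: rarely used, cheap, large first

Adjusting such weights per application is the role of the self-adaptive weight model described above.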
The latest versions of Apache Spark have started to consider partition-level replacement.
Table 1 shows a comparison between the different algorithms used for cache replacement in Apache Spark in terms of the replacement factors used, the improvement percentage achieved in comparison with the default Apache Spark replacement algorithm, the implementation, and the benchmarks used in evaluating the algorithms.

Table 1. Comparative analysis of replacement algorithms in Apache Spark


Replacement technique | Year | Replacement factors | Improvement / Apache Spark version | Implementation (physical/virtual cluster) | Benchmarks used
LRU       | 2010 | Recency                                                   | Default Apache Spark    | –                        | –
WR [14]   | 2016 | Computation cost, frequency and size                      | 40% / Apache Spark 1.1.0 | 6 virtual nodes          | PageRank
LCS [15]  | 2017 | Recovery cost                                             | 30% / Apache Spark 1.5.2 | 8 physical nodes cluster | PageRank, HiBench
LRC [17]  | 2017 | Reference count                                           | 60% / Apache Spark 1.6.1 | 20 virtual nodes         | PageRank, Spark-Bench
LERC [18] | 2017 | Effective reference count                                 | 37% / Apache Spark 1.6.1 | 20 virtual nodes         | Spark zip jobs
LCRC [19] | 2018 | Composition reference count                               | 65% / Apache Spark 2.3.0 | 4 physical nodes cluster | KMeans
MRD [20]  | 2018 | Most reference distance                                   | 53% / Apache Spark 2.0.0 | 25 virtual nodes         | Spark-Bench
SWCR [16] | 2019 | Frequency, size, calculation cost and dependency integrity | 21% / Apache Spark 2.4.0 | 4 virtual nodes          | PageRank

It can be seen that there are many factors, other than the recency factor used by LRU, that could be used to enhance Apache Spark performance; in addition, the use of the lazy evaluation feature in Apache Spark and the DAG information can introduce a set of selection factors that keep the most important data blocks resident in the cache, ready for use.

A classified and active caching strategy was introduced specifically for iterative applications [24]. In the first iteration of an iterative application, when the application reads the data from storage, it uses the active caching algorithm to create the corresponding RDD. After one iteration is finished, the RDDs that are useful for the next iteration can be kept or moved into the cache, and RDDs that will not be needed for the next iteration are eliminated from the cache.

5 Caching Optimization Techniques for Apache Spark

This section reviews other research that introduces further optimization techniques to improve memory performance in Apache Spark.
Some studies [23, 25] addressed the problem of selecting which data to cache in memory and which cache level to use, so that the decision does not have to be made by the user alone. Adaptive algorithms were introduced to automatically determine the most valuable intermediate datasets to be stored in memory, and [25] adaptively uses different in-memory cache levels according to runtime information of the cluster.
Other researchers have studied the performance of Apache Spark on different disk types. Doppio [26] introduced an I/O-aware performance analysis for Apache Spark, using different combinations of Hard Disk Drives (HDDs) and Solid-State Drives (SSDs) to measure the I/O impact and varying the number of CPU cores to discover the relation between computation and I/O access; hence, the model can be used to find the optimal configuration in the public cloud.
Another way to improve the performance of Apache Spark is to use a separate data caching/storage layer, which can make use of SSDs for data caching due to their high read speeds. This data caching/storage layer sits between compute and storage.
Examples of such a layer are:
RubiX [27], an open source project, uses SSDs rather than reserving operating memory for caching purposes. It is used in the data caching service for Azure HDInsight, which improves the performance of Apache Spark jobs [28].
Databricks Delta Lake [29] is an open source storage layer which also uses the nodes' local storage for caching; data is cached automatically whenever a file has to be fetched from a remote location, so successive reads of the same data are performed locally, which results in significantly improved reading speed. This caching process does not require any action from the user.
Open Cache Acceleration Software (Open CAS) [30] interoperates with node memory to create a multilevel cache that optimizes the use of system memory and automatically determines the best cache level for active data, allowing applications to perform even faster than running fully on SSDs.
Alluxio [31] is an open source data orchestration layer for big data and machine learning. It can be used with Apache Spark, and it caches data that can be accessed concurrently by multiple application frameworks.

6 Conclusion

Over recent years, Apache Spark has been widely used as an in-memory large-scale data processing platform. An important feature of Apache Spark is the caching of intermediate data. If the data size becomes larger than the storage size, accessing and managing the data efficiently becomes challenging.
Up to the latest Apache Spark release, LRU has been the only replacement algorithm used, despite the fact that memory caching has a long history, has been widely employed in different systems, and many cache replacement algorithms have been proposed to address the problems of LRU. LRU also neglects the lazy evaluation feature in Apache Spark and the DAG information, which introduce a set of selection factors to keep the most important data blocks residing in the cache.
In this paper, different replacement algorithms with different selection factors were discussed. We recommend using a weight replacement algorithm in which different weights are given to the selection factors and the weights can be adjusted automatically according to the requirements of each application.
We also listed some of the open source data caching/storage layers that can be used with Apache Spark to improve its performance. All of the separate caching layers can make use of local SSDs for caching data.

References
1. Singh, D., Reddy, C.K.: A survey on platforms for big data analytics. J. Big Data 2(1), 1–20
(2014). https://doi.org/10.1186/s40537-014-0008-6
2. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun.
ACM 51(1), 107–113 (2008)
3. Zaharia, M., et al.: Spark: cluster computing with working sets. HotCloud 10(10–10), 95
(2010)
4. Gu, L., Li, H.: Memory or time: performance evaluation for iterative operation on Hadoop
and spark. In: 2013 IEEE 10th International Conference on High Performance Computing
and Communications (2013)
5. Costa, C.H.A., et al.: Optimization of genomics analysis pipeline for scalable performance in
a cloud environment. In: 2018 IEEE International Conference on Bioinformatics and
Biomedicine (BIBM) (2018)
6. Sarumi, O.A., Leung, C.K.: Exploiting anti-monotonic constraints in mining palindromic
motifs from big genomic data. In: 2019 IEEE International Conference on Big Data (Big
Data) (2019)
7. Zhou, H., et al.: A big data mining approach of PSO-based BP neural network for financial
risk management with IoT. IEEE Access 7, 154035–154043 (2019)
8. Zhang, Z., et al. Scientific computing meets big data technology: an astronomy use case. In:
2015 IEEE International Conference on Big Data (Big Data) (2015)
9. Karau, H., et al.: Learning Spark: Lightning-Fast Big Data Analysis. O’Reilly Media,
Newton (2015)
10. Zaharia, M.: An Architecture for Fast and General Data Processing on Large Clusters.
Association for Computing Machinery and Morgan & Claypool Publishers (2016)

11. Berger, D.S., Sitaraman, R.K., Harchol-Balter, M.: Adaptsize: orchestrating the hot object
memory cache in a content delivery network. In: Proceedings of the 14th USENIX
Conference on Networked Systems Design and Implementation, pp. 483–498. USENIX
Association, Boston (2017)
12. Megiddo, N., Modha, D.S.: ARC: a self-tuning, low overhead replacement cache. In:
Proceedings of the 2nd USENIX Conference on File and Storage Technologies, pp. 115–
130. USENIX Association, San Francisco (2003)
13. Jiang, S., Zhang, X.: LIRS: an efficient low inter-reference recency set replacement policy to
improve buffer cache performance. SIGMETRICS Perform. Eval. Rev. 30(1), 31–42 (2002)
14. Duan, M., et al.: Selection and replacement algorithms for memory performance
improvement in Spark. Concurr. Comput.: Pract. Exp. 28(8), 2473–2486 (2016)
15. Geng, Y., et al.: LCS: an efficient data eviction strategy for Spark. Int. J. Parallel Program. 45
(6), 1285–1297 (2017)
16. Zhao, C., et al.: Research cache replacement strategy in memory optimization of spark. Int.
J. New Technol. Res. (IJNTR) 5(9), 27–32 (2019)
17. Yu, Y., et al. LRC: dependency-aware cache management for data analytics clusters. In:
IEEE INFOCOM 2017-IEEE Conference on Computer Communications. IEEE (2017)
18. Yu, Y., et al. LERC: coordinated cache management for data-parallel systems. In:
GLOBECOM 2017-2017 IEEE Global Communications Conference. IEEE (2017)
19. Wang, B., et al.: LCRC: a dependency-aware cache management policy for Spark. In: 2018
IEEE International Conference on Parallel and Distributed Processing with Applications.
IEEE (2018)
20. Perez, T.B.G., Zhou, X., Cheng, D.: Reference-distance eviction and prefetching for cache
management in Spark. In: Proceedings of the 47th International Conference on Parallel
Processing, Association for Computing Machinery, p. Article 88, Eugene (2018)
21. Huang, S., et al.: The HiBench benchmark suite: characterization of the MapReduce-based
data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops
(ICDEW 2010). IEEE (2010)
22. Li, M., et al.: SparkBench: a spark benchmarking suite characterizing largescale in-memory
data analytics. Cluster Comput. 20(3), 2575–2589 (2017)
23. Yang, Z., et al.: Intermediate data caching optimization for multi-stage and parallel big data
frameworks. In: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD).
IEEE (2018)
24. Niu, D., et al.: The classified and active caching strategy for iterative application in Spark. In:
2018 27th International Conference on Computer Communication and Networks (ICCCN).
IEEE (2018)
25. Xu, E., Saxena, M., Chiu, L.: Neutrino: revisiting memory caching for iterative data
analytics. In: 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 2016) (2016)
26. Zhou, P., et al. Doppio: I/O-aware performance analysis, modeling and optimization for in-
memory computing framework. IEEE. (2018)
27. RubiX. https://github.com/qubole/rubix
28. Azure HDInsight. https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-
improve-performance-iocache
29. Databricks Delta Lake. https://docs.databricks.com/delta/optimizations/delta-cache.html
30. Open Cache Acceleration. https://open-cas.github.io/
31. Alluxio. https://www.alluxio.io/
The Influence of Service Quality on Customer
Retention: A Systematic Review in the Higher
Education

Aisha Alshamsi1 , Muhammad Alshurideh1,2 ,


Barween Al Kurdi3 , and Said A. Salloum4(&)
1 University of Sharjah, Sharjah, UAE
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. This paper aims to identify the influence of service quality on cus-
tomer retention and the factors that affect this relationship using a systematic
review and meta-analysis method to use in the second stage in examining the
relationship of service quality on customer retention in higher educations.
A systematic review method was conducted to select the studies that would assist the current study. This systematic review covered 32 research articles published in peer-reviewed journals from 1996 to 2018, which were reviewed critically. The main findings of the study indicate that service quality-related factors are the most common, followed by customer satisfaction, trust, commitment, and loyalty. Moreover, it was noticed that the quantitative method using questionnaires was the primary research method relied upon for collecting data, followed by focus groups. Furthermore, 75% of the analyzed studies recorded positive research outcomes, and most of the analyzed studies with a positive outcome were conducted in the United Kingdom, followed by the United States. In terms of context, most of the analyzed studies were done for banks, followed by the mobile service industry, retailing industry, small firms, steel industry, tourism industry, airline industry, zoos, and advertising services, respectively. To that end, this systematic review
and Advertising service respectively. To that end, this systematic review
attempts to investigate the relationship between service quality and customer
retention and the factors affecting this relationship.

Keywords: Systematic review · Service Quality · Customer Retention · Customer satisfaction · Trust · Commitment · Loyalty

1 Introduction

Service Quality (SQ) is a very important topic and is considered a critical factor for modern service companies [1, 2], as it is one of the strongest tools for differentiating a business from its competitors [3, 4] and for gaining a competitive advantage that enables companies to attract new customers; it is also a

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 404–416, 2021.
https://doi.org/10.1007/978-3-030-58669-0_37

very important means for customer retention (CR) [5–7]. Previous studies have advised firms to increase the level of their CR, since the cost of customer acquisition is higher than that of serving repeat customers [8–11], and higher retention leads to an increase in profits [12–14]. It has been recognized that excellence in service improves the customer experience and helps build loyal customers [15–17]. Moreover, SQ is the key antecedent of a successful customer relationship [5, 18–20]. CR and SQ are very serious and important issues for the continuity and success of a business [21, 22]. Various studies on this topic have been done, and it is believed that CR is affected by SQ [23–25]; yet further examination is required from other research perspectives. Therefore, a systematic review is conducted here and, as per [26, 27], a systematic review aims to summarize a large number of studies about a phenomenon. Accordingly, this systematic review combines the CR studies related to SQ to provide a comprehensive analysis of the collected studies and to study the relationship between them. More precisely, this systematic review poses the following five research questions:
RQ1: What are the main research purposes of the selected studies?
RQ2: What are the main research methods of the selected studies?
RQ3: What are the active countries in the context of the selected studies?
RQ4: What are the main disciplines/contexts of the selected studies?
RQ5: What are the years of publication of the selected studies?

2 Literature Review

Customer retention is described by many scholars as customers’ stated continuation of


a business relationship with a specific firm [22, 28, 29]. On the other hand, service
quality is defined as the overall impression of customers of the relative
inferiority/superiority of the organization and its services [30, 31]. Customer retention
and service quality are issues of concern in every service sector [32, 33]. They can have a direct impact, either positive or negative, on institutions and market sectors; some studies have connected service quality with customer retention, as it is specified in some of the literature that to retain a customer the institution should improve its quality of service [34, 35]. Some findings show that service quality contributes to long-term relationships and an increase in customer retention [2, 36, 37]. From the above, we can understand the importance of studying the relationship between service quality and customer retention, and for that reason a systematic review was conducted to add more value to the literature. A systematic review is a type of literature review that collects and critically analyzes several research studies using methods selected according to one or more research questions and answers them with a structured methodology [38, 39].

3 Methods

Before conducting any study, it is crucial to carry out a critical literature review [40]. To find the relevant publications for the selected topic addressing service quality and customer retention, a structured approach was followed based on the suggestion of Webster and Watson [41]. The review was done in four stages [42]: specifying the inclusion and exclusion criteria, data sources and search strategies, quality assessment, and data coding and analysis [40]. The details of these stages are described in the following subsections.

3.1 Inclusion/Exclusion Criteria


The articles that are critically analyzed in this review study should meet the inclusion and exclusion criteria described in Table 1 [43].

Table 1. Describes the inclusion and exclusion criteria for this systematic review report
No. | Criteria        | Inclusion                                                                                          | Exclusion
1   | Date            | All                                                                                                | –
2   | Source type     | Peer-reviewed articles, scholarly journals, case studies, academic journals, dissertations & theses | Non-peer-reviewed articles, newspapers, book reviews, and other types of publications
3   | Language        | English                                                                                            | Papers that use languages other than English
4   | Type of studies | Peer-reviewed, quantitative, qualitative, empirical studies, systematic reviews                    | Annual reports, audio/video clips, advertisements, directories, films, and other studies
5   | Study design    | Meta-analyses, randomized and controlled studies, surveys, interviews, case studies                | –
6   | Measurement     | Service quality and retention                                                                      | –
7   | Outcome         | Relationship between service quality and retention                                                 | –
8   | Context         | Should involve service quality and retention                                                       | All contexts that do not mention service quality and retention in the title

3.2 Data Sources and Research Strategies


As per [44], the research articles included in this systematic literature review come from an extensive search of existing studies using several databases (ProQuest One Academic, Emerald Insight, Scopus, Epsco and the Google Scholar search engine). The search for these studies was undertaken in February 2020. The keywords
that were included in the search terms were ((quality service*) and (retention)),
(“customer retention” and (quality of service)), AND (service quality “customer
retention”). The search was done using the title only as advance search criteria using

the above databases. From each article, the following number of articles were extracted
respectively: ProQuest One Academic (N = 227), Epsco (N = 19), Emerald (N = 39),
Google scholar (N = 255), so the total number of studies are (N = 568) articles were
founded using the above-mentioned keywords. (N = 495) Articles were found non-
relevant and duplicated. Therefore, they were filtered out. As a result, the overall
reaming articles become (N = 73). And after going through them and performing the
inclusion and exclusion criteria for each study, the number of articles becomes
(N = 44) research article was found that meet the inclusion criteria, after that the
articles without trustworthiness were also excluded, and the remaining articles are
(N = 32). In this manner, the relevant studies were selected and included in the sys-
tematic review process, as shown in Table 2. Moreover, the selection took place in four
phases, as explained below in Table 3, and Fig. 1 demonstrates the process for the
systematic review and the number of studies determined at each step [45].

Table 2. The data sources and search keywords.


Set # | Searched for                                             | Database              | Results | Used
S1    | ti(quality service*) AND ti(retention)                   | ProQuest One Academic | 23      | 15
S2    | TI service* quality AND TI retention                     | Epsco                 | 19      | 4
S4    | title: "customer retention" AND (quality of service)     | Emerald insight       | 39      | 6
S5    | (TITLE (service* AND quality) AND TITLE (retention))     | Scopus                | 28      | 1
S6    | allintitle: service quality "customer retention"         | Google Scholar        | 255     | 6
Total |                                                          |                       |         | 32

The search and selection process was directed by four consecutive steps [45], as described in Table 3.

Table 3. The selection criteria for articles from each database.
Filter | Description | ProQuest | Epsco | Emerald | Scopus | Google Scholar | Total
Step 1 | Articles with the selected keywords, after merging the results from the different databases to assist in finding the most relevant studies that meet the objectives of the study, and deleting duplicate articles | 227 | 19 | 39 | 28 | 204 | 568
Step 2 | After reading the titles and eliminating the non-relevant articles | 45 | 7 | 8 | 3 | 10 | 73
Step 3 | Hand searching | 15 | 4 | 6 | 1 | 6 | 32
Step 4 | Citation tracing, used as reference and evidence to support the literature | 5 | 0 | 0 | 0 | 2 | 7
Final sample of selected studies (without the references) | | | | | | | 32

The flowchart in Fig. 1 below indicates the number of selected studies and the process of narrowing down the number of articles to reach 32 studies [46]; all these studies were analyzed as a final sample using Excel.

Fig. 1. Systematic review process.

4 Result

Based on the 32 research studies about the influence of service quality on customer retention and the research questions mentioned in the introduction [40], the findings of this systematic review are reported as follows:

RQ1: Distribution of research purposes:


From the systematic review and meta-analysis, many research studies were prepared to examine the effect of service quality and its influence on customer retention. Each paper was categorized into one of five groups: financial factors, internal factors, customer-related factors, service quality-related factors, and contextual factors. As described in Fig. 2 below, around 50% of the collected papers involved service quality-related factors (N = 47). Moreover, approximately 15% of those articles were extended by contextual factors (N = 14). Additionally, the third-highest category (N = 11), at around 12%, is factors from other theories or models. Furthermore, in the fourth category, the collected articles were extended by internal factors, around 6% (N = 10), and the last category is financial factors, with around 3% (N = 7). It can be seen that service quality-related factors have the highest percentage in the selected studies.

Fig. 2. Distribution of studies based on research purpose.

Figure 2 classifies the influence of service quality across the analyzed studies to
determine the most common factors that are affecting customer retention in the analyzed
studies. From Fig. 2, it seems that service quality was positively affecting customer
retention [2, 20, 32, 47]. Moreover, customer satisfaction is significantly influencing
customer retention as it acts as a moderator in some studies [48, 49]. Furthermore, price,
customer service, reliability, assurance, tangibles, empathy, customer relationships,
value, and responsiveness are affecting customer retention positively [50–52]. On the
other hand, it has been noticed that few studies considered the following factors:
switching cost, innovation management, Process service quality, and Trust, although all
of them have shown a positive effect on customer retention. Hence, further research
should focus on investigating these factors' effects on customer retention [14, 53–55]. Figure 3 below shows the most common factors; from it we can see that the most frequent variables are, respectively, service quality, customer satisfaction, customer service, reliability, assurance, tangibles, empathy, responsiveness, price, value, customer relationship, switching cost, innovation management, and trust. Hence, we can understand how critical a factor service quality is, with a potentially negative or positive effect on customer retention.


Fig. 3. Distribution of studies based on research purpose

RQ2: What are the main research methods of the selected studies?
Distribution of research methods:
Figure 4 shows that 63% of the analyzed studies depended only on a quantitative survey (N = 20) for data collection, followed by 16% using both a quantitative survey and a focus group interview (N = 5), then 9% using qualitative secondary data (N = 3), 6% using a quantitative survey with a two-stage SEM approach (N = 2), and 3% each for a qualitative focus group and a meta-analysis (N = 1). From this, we can see that the most common method used by the reviewed studies is the quantitative survey, followed by the focus group interview.

Fig. 4. Distribution of studies by research methods.



Figure 5 indicates that 75% of the analyzed studies (N = 23) registered positive research outcomes, followed by 20% (N = 8) with neutral outcomes and 5% (N = 1) with a negative outcome.

Fig. 5. Distribution of research outcomes.

Figure 6 below shows the distribution of the collected articles across the countries in which the studies were conducted. From it, we can see that the research articles used in this study were most frequently carried out in the United Kingdom (N = 11), followed by the United States (N = 3), Ghana (N = 2), India (N = 2), Thailand (N = 1), Nigeria (N = 1), the Netherlands (N = 1), South Sulawesi Province (N = 1), Korea (N = 1), Jordan (N = 1), Indonesia (N = 1), Greece (N = 1), Australia (N = 1) and Amsterdam (N = 1), while (N = 2) studies did not mention the country. It was noticed that most studies with positive outcomes were from the United Kingdom, followed by the United States. We can also see that no studies were done in the UAE, which is an excellent opportunity to conduct a research study on this topic in the UAE.


Fig. 6. Distribution of studies by country.



RQ4: Distribution of the studies by disciplines/contexts: Turning to Fig. 7 below, most of the studies were done in banks (N = 9), followed by the mobile service industry (N = 7), the retailing industry (N = 2), small firms (N = 2), and advertising services, the airline industry, the home appliances business, hotels, IS services, restaurants, schools, the steel industry, the tourism industry and zoos with (N = 1) each, respectively. Higher education sectors and universities are not covered, which opens a new dimension for a research study in this regard.

Fig. 7. Distribution of studies in terms of disciplines.

RQ5: Distribution of the studies by year of publication: In terms of publication year, Fig. 8 below illustrates the distribution of studies on service quality and its effect on customer retention across their publication years. As shown, the studies range from 1996 to 2018, with (N = 1) study in most years; in 2004 the number of studies increased to (N = 2), and in 2009 to (N = 3). Moreover, the number of studies increased in 2011 and 2015 to (N = 5), but decreased in 2016 (N = 3) and 2018 (N = 2).


Fig. 8. Distribution of studies in terms of publication year.



5 Conclusion

Earlier studies have provided insight into the research trends. However, this study has investigated the relationship between service quality and customer retention and which variables affect this relationship. In this study, we conducted a systematic review of service quality in relation to customer retention in order to provide a complete analysis of the existing studies and to discuss the implications of the analysis results. The present review revealed six findings. First, the most frequent factors are service quality-related factors, followed by contextual factors, factors from other theories or models, internal factors and financial factors, respectively, and the service quality factors affect customer retention positively. Second, customer satisfaction significantly influences service quality and customer retention. Third, the quantitative method, including surveys and questionnaires, was found to be the most used method in the collected studies. Fourth, most of the articles in this research study focused on the banking sector, the mobile service industry, and the retailing industry, respectively. Fifth, the majority of the studies were done in the United Kingdom, followed by the United States, Ghana and India, and the majority of the surveys registered positive outcomes in these countries. Sixth, regarding the year of publication, 2011 and 2015 saw a remarkable increase in publications.
As a limitation, this systematic review focused on particular databases, and the search criteria were based on the title only. Therefore, not all studies related to service quality and customer retention were covered; these can be included in future studies.

References
1. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of
e-service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and
e-trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
2. Karin, A., Pervez, N.: Service quality and customer retention: building long-term
relationships. Eur. J. Mark. 38, 1577–1598 (2004)
3. Obeidat, B., Sweis, R., Zyod, D., Alshurideh, M.: The effect of perceived service quality on
customer loyalty in internet service providers in Jordan. J. Manag. Res. 4(4), 224–242
(2012)
4. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84 (2012)
5. Alshurideh, M.: The factors predicting students’ satisfaction with universities’ healthcare
clinics’ services: a case-study from the Jordanian higher education sector. Dirasat Adm. Sci.
161(1524), 1–36 (2014)
6. Ashurideh, M.: Customer service retention–a behavioural perspective of the UK mobile
market. Durham University (2010)
7. Alshurideh, M.T., et al.: The impact of Islamic Bank’s service quality perception on
Jordanian customer’s loyalty. J. Manag. Res. 9, 139–159 (2017)
8. Al Kurdi, B., Alshurideh, M., Salloum, S.A., Obeidat, Z.M., Al-dweeri, R.M.: An empirical
investigation into examination of factors influencing university students’ behavior towards
elearning acceptance using SEM approach. Int. J. Interact. Mob. Technol. 14(02), 19–41
(2020)

9. Alshurideh, et al.: Understanding the quality determinants that influence the intention to use
the mobile learning platforms: a practical study. Int. J. Interact. Mob. Technol. 13(11), 157–
183 (2019)
10. Al Dmour, H., Alshurideh, M., Shishan, F.: The influence of mobile application quality and
attributes on the continuance intention of mobile shopping. Life Sci. J. 11(10), 172–181
(2014)
11. Alshurideh, M., Masa’deh, R., Al kurdi, B.: The effect of customer satisfaction upon
customer retention in the Jordanian mobile market: an empirical investigation. Eur. J. Econ.
Financ. Adm. Sci. 47(12), 69–78 (2012)
12. AlShurideh, M., Alsharari, N.M., Al Kurdi, B.: Supply chain integration and customer
relationship management in the airline logistics. Theor. Econ. Lett. 9(02), 392–414 (2019)
13. Alshurideh, D.M.: Do electronic loyalty programs still drive customer choice and repeat
purchase behaviour? Int. J. Electron. Cust. Relatsh. Manag. 12(1), 40–57 (2019)
14. Edward, M.: Role of switching costs in the service quality, perceived value, customer
satisfaction and customer retention linkage. Asia Pac. J. Mark. Linguist. 23(3), 327–345
(2011)
15. Alshurideh, M., Nicholson, M., Xiao, S.: The effect of previous experience on mobile
subscribers’ repeat purchase behaviour. Eur. J. Soc. Sci. 30(3), 366–376 (2012)
16. Ammari, G., Al kurdi, B., Alshurideh, M., Alrowwad, A.: Investigating the impact of
communication satisfaction on organizational commitment: a practical approach to increase
employees’ loyalty. Int. J. Mark. Stud. 9(2), 113–133 (2017)
17. Alshurideh, M.T.: A theoretical perspective of contract and contractual customer-supplier
relationship in the mobile phone service sector. Int. J. Bus. Manag. 12(7), 201–210 (2017)
18. Alshurideh, M.: A qualitative analysis of customer repeat purchase behaviour in the UK
mobile phone market. J. Manag. Res. 6(1), 109 (2014)
19. Alshurideh, M., Alhadid, A., Al kurdi, B.: The effect of internal marketing on organizational
citizenship behavior an applicable study on the University of Jordan employees. Int. J. Mark.
Stud. 7(1), 138 (2015)
20. Ennew, C.T., Binks, M.R.: The impact of service quality and service characteristics on
customer retention: small businesses and their banks in the UK1. Br. J. Manag. 7, 219–230
(1996)
21. Zu’bi, Z., Al-Lozi, M., Dahiyat, S., Alshurideh, M., Al Majali, A.: Examining the effects of
quality management practices on product variety. Eur. J. Econ. Financ. Adm. Sci. 51(1),
123–139 (2012)
22. Alshurideh, M.: Scope of customer retention problem in the mobile phone sector: a
theoretical perspective. J. Mark. Consum. Res. 20, 64–69 (2016)
23. Alshurideh, M.: A behavior perspective of mobile customer retention: an exploratory study
in the UK Market. The end of the pier? competing perspectives on the challenges facing
business and management British Academy of Management Brighton–UK. Br. Acad.
Manag. 1–19 (2010)
24. Alshurideh, M.: Exploring the main factors affecting consumer choice of mobile phone
service provider contracts. Int. J. Commun. Netw. Syst. Sci. 9(12), 563–581 (2016)
25. Kettinger, W.J., Park, S.H.S., Smith, J.: Understanding the consequences of information
systems service quality on IS service reuse. Inf. Manag. 46(6), 335–341 (2009)
26. Sampaio, R.F., Mancini, M.C.: Systematic review studies: a guide for careful synthesis of
scientific evidence. Rev. Bras. Fisioter. 11(1), 83–89 (2007)
27. Corrêa, V.S., Vale, G.M.V., de Resende Melo, P.L., de Almeida Cruz, M.: O 'Problema da Imersão' nos Estudos do Empreendedorismo: Uma Proposição Teórica. Revista de Administração Contemporânea / Journal of Contemporary Administration, pp. 232–244 (2020)

28. Alshurideh, M.: Is customer retention beneficial for customers: a conceptual background.
J. Res. Mark. 5(3), 382–389 (2016)
29. Keiningham, T.L., Cooil, B., Aksoy, L., Andreassen, T.W., Weiner, J.: The value of different
customer satisfaction and loyalty metrics in predicting customer retention, recommendation,
and share of wallet. Manag. Serv. Qual. Int. J. 17(4), 361–384 (2007)
30. Valarie, A., Leonard, L.: Servqual: a multiple-item scale for measuring consumer perc.
J. Retail. 64(1), 12 (1988)
31. Zeithaml, V.A., Berry, L.L.: A conceptual model of service quality and its implications for
future research. J. Mark. 49(4), 41–50 (1985)
32. Alshurideh, M., Al Kurdi, B., Salloum, S.: Examining the main mobile learning system
drivers’ effects: a mix empirical examination of both the Expectation-Confirmation Model
(ECM) and the Technology Acceptance Model (TAM). In: International Conference on
Advanced Intelligent Systems and Informatics, pp. 406–417 (2019)
33. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Ayadeh, I.: An investigation of
factors affecting patients waiting time in primary health care centers: an assessment study in
Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
34. Obeidat, R., Alshurideh, Z., Al Dweeri, M., Masa’deh, R.: The influence of online revenge
acts on consumers psychological and emotional states: does revenge taste sweet?. In: 33
IBIMA Conference Proceedings - Granada, Spain, 10–11 April 2019 (2019)
35. Yildiz, S., Yildiz, E.: Social and administrative sciences. J. Soc. Adm. Sci. 2(2), 53–61
(2015)
36. Alshurideh, M., Al Kurdi, B., Abumari, A., Salloum, S.: Pharmaceutical promotion tools
effect on physician’s adoption of medicine prescribing: evidence from Jordan. Mod. Appl.
Sci. 12(11), 210–222 (2018)
37. Alshurideh, M., Salloum, S.A., Al Kurdi, B., Al-Emran, M.: Factors affecting the social
networks acceptance: an empirical study using PLS-SEM approach. In: 8th International
Conference on Software and Computer Applications (2019)
38. Salloum, S.A.S., Shaalan, K.: Investigating students’ acceptance of E-learning system in
higher educational environments in the UAE: applying the extended Technology Acceptance
Model (TAM). The British University in Dubai (2018)
39. Salloum, S.A., Alhamad, A.Q.M., Al-Emran, M., Monem, A.A., Shaalan, K.: Exploring
students’ acceptance of e-learning through the development of a comprehensive technology
acceptance model. IEEE Access 7, 128445–128462 (2019)
40. Al-Emran, M., Mezhuyev, V., Kamaludin, A.: Technology acceptance model in m-learning
context: a systematic review. Comput. Educ. 125, 389–412 (2018)
41. Webster, J., Watson, R.T.: Analyzing the past to prepare for the future : writing a literature
review reproduced with permission of the copyright owner. Further reproduction prohibited
without permission. MIS Q. 26(2), xiii–xxiii (2002)
42. Al-Emran, M., Mezhuyev, V., Kamaludin, A., Shaalan, K.: The impact of knowledge
management processes on information systems: a systematic review. Int. J. Inf. Manage. 43
(July), 173–187 (2018)
43. Fetters, L., Figueiredo, E.M., Keane-Miller, D., McSweeney, D.J., Tsao, C.C.: Critically
appraised topics. Pediatr. Phys. Ther. 16(1), 19–21 (2004)
44. Costa, V., Monteiro, S.: Key knowledge management processes for innovation: a systematic
literature review. VINE J. Inf. Knowl. Manage. Syst. 46(3), 386–410 (2016)
45. Calabrò, A., Vecchiarini, M., Gast, J., Campopiano, G., De Massis, A., Kraus, S.: Innovation
in family firms: a systematic literature review and guidance for future research. Int. J. Manag.
Rev. 21(3), 317–355 (2019)
416 A. Alshamsi et al.

46. Ramezani, A., Ghazimirsaeed, S.J., Azadeh, F., Bandboni, M.E., YektaKooshali, M.H.: A
meta-analysis of service quality of Iranian university libraries based on the LibQUAL model.
Perform. Meas. Metr. (2018)
47. Abu Zayyad, H.M., Obeidat, Z.M., Alshurideh, M.T., Abuhashesh, M., Maqableh, M.,
Masa’deh, R.E.: Corporate social responsibility and patronage intentions: the mediating
effect of brand credibility. J. Mark. Commun. 14, 1–24 (2020)
48. Al Kurdi, B.: Investigating the factors influencing parent toy purchase decisions: reasoning
and consequences. Int. Bus. Res. 10(4), 104–116 (2017)
49. Steiner, W.J., Siems, F.U., Weber, A., Guhl, D.: How customer satisfaction with respect to
price and quality affects customer retention: an integrated approach considering nonlinear
effects. J. Bus. Econ. 84(6), 879–912 (2014)
50. Alshurideh, M.: Do we care about what we buy or eat? a practical study of the healthy foods
eaten by Jordanian youth. Int. J. Bus. Manag. 9(4), 65 (2014)
51. Parida, B.B., Baksi, A.K.: Customer retention and profitability: CRM environment.
SCMS J. Indian Manag. 8(2), 66–84 (2011)
52. Chadichal, S.S., Misra, S.: Exploring web based servqual dimensions in green banking
services impact on developing e-CRM. Asia Pacific J. Manag. Entrep. Res. 1(3), 289–312
(2012)
53. Al-Dmour, H., Alshuraideh, M., Salehih, S.: A study of Jordanians’ television viewers
habits. Life Sci. J. 11(6), 161–171 (2014)
54. Alshurideh, et al.: Determinants of pro-environmental behaviour in the context of emerging
economies. Int. J. Sustain. Soc. 11(4), 257–277 (2019)
55. Alzoubi, H., Alshurideh, M., Al Kurdi, B., Inairata, M.: Do perceived service value, quality,
price fairness and service recovery shape customer satisfaction and delight? a practical study
in the service telecommunication context. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
The Impact of Ethical Leadership on Employees Performance: A Systematic Review

Hind AlShehhi1, Muhammad Alshurideh1,2, Barween Al Kurdi3, and Said A. Salloum4(&)

1 University of Sharjah, Sharjah, UAE
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. The purpose of this paper is to study the impact of ethical leadership
on employees’ performance within the business organization. Corporate leaders
are expected to lead by example in fostering ethical behaviors and actions
among their team members. Many studies have explored in depth the efficacy of top leaders on employees and their performance. Data captured
from 30 articles indicated that ethical leadership has a positive effect on
workers’ in-role job performance and all hypotheses were confirmed. These
findings have significant implications for research and practice. Moreover, this
research will focus on the effects of an ethical leadership approach on the
performance of workers since ethical leadership is considered critical in
enhancing the adopted business strategy in the achievement of organizational
goals and objectives.

Keywords: Ethical leadership · Employee performance · Moral sensitivity · Moral perspective

1 Introduction

Ethical leadership plays a critical role in enhancing the productivity of employees within business organizations [1, 2]. In the wake of the ever-increasing competition, the
management and leadership of business organizations need to foster effective examples
as far as ethical behaviors are concerned [3]. Ethical leadership is multidimensional and
involves the evaluation of worker commitment, the psychological well-being of the team members, and job satisfaction [4–6]. Such factors are critical in determining the
productivity of the team members in business settings. Ethical behaviors include
important concepts such as transparency, fairness, integrity, and compassion as well as

empathy, especially by the top leadership towards their team members or followers [7–
9]. The goal of this paper is to analyze other studies on the impact of ethical leadership
on employees’ performance within business organizations. This existing systematic
review investigates ethical leadership with regard to employees’ performance from the standpoint of research questions, research methods, country distribution, and year of publication. Therefore, this review is planned to
study the following research question to provide a comprehensive analysis of the
collected data. The question is:
• What is the impact of using ethical leadership on employee performance within
business organizations?

2 Literature Review

2.1 Major Issues and Gaps


Most studies agree that leadership is one of the major factors that has an indisputable
influence on the performance of the team members in all business organizations [10].
The performance of workers under different leaderships remains central to business concerns about ways of improving the productivity of the team members [11, 12]. Government leadership plays critical roles in the performance of the team members as well as in the attainment of the set organizational goals and objectives [13, 14]. According to recent studies on corporate leadership, extraverted leaders enable their employees to attain improved performance, particularly when the workers are passive. The assessment of various mechanisms of ethical leadership is critical in deter-
mining the performance of the employees. Most studies did not explore the benefits of
treating employees fairly, listening, and being understanding and sensitive to their
matters. The primary goals of effective and ethical leadership include ensuring improved employee performance [6]. The findings from studies in the human resource
management field have shown that increasing employees’ performance is probable
through embracing ethical leadership, especially the top leadership of the business
corporations. A study by Ewest demonstrated that ethical leadership practices help
enable the employees to discover their talents and remain committed in their contri-
butions to the hiring business organization.

2.2 Key Terms


In this section, the definitions of the main terms are provided, as illustrated in Table 1.

Table 1. Key terms.


Ethical leadership | According to Dust et al., ethical leadership refers to the demonstration of the most appropriate behaviors and personal demeanors as well as improved interpersonal relationships, and the fostering of such values in the team members. It is also defined as the leadership that is ruled by respect for ethical beliefs and values and the dignity and rights of others. In the conducted studies there were many dimensions of ethical leadership, such as moral sensitivity, moral perspective, incorporating transparency, self-efficacy, and treating employees with respect
Moral sensitivity | The ability of a leader to identify prevailing moral issues and understand the ethical implications of such factors in the decision-making processes [15]
Moral perspective | It is defined as an individual’s evaluation of the extent to which an action is right or wrong; because moral perspectives are personal, they are not easy to measure. Two of the most influential measurements are the Multidimensional Ethics Scale (MES) [16] and the Defining Issues Test (DIT) [17]
Employees’ performance | This refers to the general productivity of the employees as far as their organizational roles and responsibilities are concerned [18]

3 Methods

3.1 Search Strategy and Data Sources


This study applied the systematic review approach. A large number of studies, such as [19–22], have used this method. The systematic review for this paper follows a review design built on a descriptive sub-design, which involves the use of several secondary sources of information such as peer-reviewed journals published within the last 5 years. The sources include the combination of both qualitative and quantitative sources of information through the following databases: Science Direct, Wiley Online Library, JSTOR, EBSCO, LISTA, ProQuest, Lexis Nexis Academy, Emerald, and UNWTO e-library. The measurement analysis used to inform the research is the ability of ethical leadership styles to impact production, working schedules, and the level of trust and commitment among employees. The objections that would hinder the study include the inability of leaders to inspire immediate change after their appointment.
Another shortcoming that would hamper the research is the inadequate resources to
influence desired change whereby insufficient resources make it challenging to main-
tain a highly responsive and sustainable work environment. This study could also
explain the remedying efforts one can use to reduce the shortcomings resulting from the
implementation of the factors discussed above. This study also applies to either small-
scale or large-scale deployment, and the outcome may vary according to efficiency.
Moreover, it will highlight some of the areas that may require more research to inform
future studies.

3.2 Data Collection and Search Criteria


Figure 1 below shows the main steps that this study followed to elicit the chosen articles used to study the topic at hand. The articles had to comply with the following conditions (a minimal screening sketch follows this list).
1. Date: published sources in the period 2015 to 2020 are used.
2. Language: English
3. Type of studies and study design: qualitative and quantitative; secondary sources such as surveys, questionnaires, interviews, and case studies
4. Outcome: employees’ performance
5. Context: all types of organizations
6. Exclusion: any articles on ethical leadership that do not address the issues of employees’ performance.
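As an illustration only (the record fields below are hypothetical and this is not the authors’ screening tool), the conditions above can be applied programmatically to a list of candidate records; a minimal Python sketch:

# Illustrative screening filter for the six conditions above
# (hypothetical record fields; not the authors' actual screening procedure).
def meets_criteria(article):
    return (
        2015 <= article["year"] <= 2020                   # condition 1: date
        and article["language"] == "English"              # condition 2: language
        and "ethical leadership" in article["keywords"]   # topic scope
        and article["addresses_employee_performance"]     # conditions 4 and 6
    )

candidates = [
    {"year": 2018, "language": "English",
     "keywords": ["ethical leadership", "job performance"],
     "addresses_employee_performance": True},
    {"year": 2013, "language": "English",
     "keywords": ["ethical leadership"],
     "addresses_employee_performance": True},
]
print([a["year"] for a in candidates if meets_criteria(a)])  # [2018]; the 2013 record fails the date condition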

Fig. 1. PRISMA flowchart for the selected studies.



Table 2. Inclusion and exclusion criteria.


Inclusion | Exclusion
Should involve ethical leadership and employees’ performance | Ethical leadership that is not used within the organization
Should involve job or employees’ performance | Ethical leadership that does not address the impact on organizations and workers
Articles should be in English, published from 2015 to 2020 | Articles that use other languages and published before 2015

Table 3. Keywords.
“Ethical leadership” AND “employees performance” | Ti(ethical leadership) AND AB(employees performance)
“Ethical leadership” AND “Job performance” | Ti(ethical leadership) AND (employees performance)
“Ethical leadership” AND “performance” | ab(ethical leadership) AND (employees performance)

Table 4. Search results across the database


Database | Frequency | Final frequency | Keywords used
Emerald | 7 | 2 | Ti(ethical leadership) AND (employees performance)
Sage Journals | 7 | 1 | “Ethical leadership” AND “employees performance”
Science Direct | 26 | 7 | “Ethical leadership” AND “employees performance”; “Ethical leadership” AND “performance”
ProQuest | 9 | 1 | ab(ethical leadership) AND (employees performance)
JSTOR | 42 | 5 | “Ethical leadership” AND “Job performance”
LexisNexis | 72 | 4 | “ethical leadership” AND “employees performance”
EBSCO | 29 | 6 | “Ethical leadership” AND “employees performance”; “Ethical leadership” AND “performance”
UNWTO eLibrary | 44 | 3 | “Ethical leadership” AND “employees performance”
Wiley Online Library | 79 | 1 | “Ethical leadership” AND “employees performance”; Ti(ethical leadership) AND AB(employees performance)
Total | 315 | 30 |

3.3 Inclusion and Exclusion Criteria


The selected articles analyzed in this systematic review should meet the following inclusion and exclusion criteria (Table 2) and the search terms used (Table 3). Table 4 shows the frequencies of the selected articles through each research database.

3.4 Quality Assessment


In this part, a quality assessment of the selected studies is used together with the inclusion and exclusion criteria; the assessment questions are listed in Table 5. The 30 studies included in this systematic review were gathered through 9 databases using different keywords, as explained in Table 2, Table 3, and Table 4. A quality assessment checklist with 9 criteria was established to evaluate the quality of the selected studies (N = 30), with the results demonstrated in Table 6. Each question was scored according to a three-point scale: “YES” worth 1 point, “NO” worth 0 points, while “partially” worth 0.5 points.
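As a concrete illustration of this scoring scheme (the answers below are hypothetical and this sketch is not part of the original review), a short Python snippet that turns nine checklist answers into the total and percentage reported in Table 6:

# Minimal sketch of the quality-assessment scoring described above.
# Answers are hypothetical; the nine positions correspond to Table 5.
SCORES = {"YES": 1.0, "PARTIALLY": 0.5, "NO": 0.0}

def quality_score(answers):
    """Return (total points, rounded percentage) for one study's nine answers."""
    total = sum(SCORES[a.upper()] for a in answers)
    return total, round(100 * total / len(answers))

example = ["YES", "partially", "YES", "YES", "partially", "YES", "YES", "YES", "YES"]
print(quality_score(example))  # (8.0, 89), i.e. 8 points out of 9 = 89%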

Table 5. Quality assessment questions


# Question
1 Are the research aims specified?
2 Was the study designed to achieve these aims?
3 Are the variables considered by the study specified?
4 Is the study context/discipline specified?
5 Are the data collection methods adequately detailed?
6 Does the study explain the reliability/validity of the measures?
7 Are the statistical techniques used to analyze the data adequately described?
8 Do the results add to the literature?
9 Does the study add to your knowledge or understanding?

4 Result

The main findings from the analyzed articles are as follows. Ethical leadership has a direct impact on employees’ performance; this was found across many if not all the studies. Out of the 30 articles that were gathered and selected, all confirmed that there was indeed a direct positive relationship between employees’ performance and ethical leadership. Moreover, employees give their best when treated well: almost all the articles agreed that performance improved when the employees were treated well.
As expected, trust, job security, commitment, and efficiency moderate the effects of
ethical leadership on employees’ performance. This research has some theoretical
contributions to the relationship between ethical leadership and employees’ perfor-
mance. First, earlier research works showed that ethical leadership has a positive effect
on performance [23–25]. Researchers have clarified the mechanism by which ethical leadership affects performance based on social learning and social exchange theory. This
research contributes to the relationship between ethical leadership and performance
using social identity theory. In addition, this research shows that ethical leadership can
affect performance through determining an individual’s organizational commitment,
which widens the outcome of ethical leadership and the antecedent of performance [4,
26–28].

Table 6. Quality assessment results


Study Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Total %
S1 1 0.5 1 1 0.5 1 1 1 0.5 7.5 83%
S2 1 0.5 1 1 0.5 1 1 1 0.5 7.5 83%
S3 1 0.5 1 1 0.5 0 0.5 1 0.5 6 67%
S4 1 1 1 1 0.5 0 0.5 1 1 7 78%
S5 1 0.5 1 1 1 0.5 0.5 1 1 6.5 72%
S6 1 0.5 0.5 1 1 0 0.5 0.5 1 6 67%
S7 1 0.5 1 0.5 1 1 1 1 1 8 89%
S8 1 1 0.5 1 1 1 1 0.5 0.5 7.5 83%
S9 1 1 0.5 1 1 1 1 1 1 8.5 94%
S10 1 1 0.5 1 1 1 0.5 0.5 1 6 67%
S11 1 1 1 1 0.5 1 0.5 0.5 1 6 67%
S12 1 1 1 1 0.5 1 0.5 0.5 0.5 7 78%
S13 1 1 1 1 0.5 0.5 0.5 1 0.5 7 78%
S14 1 0.5 1 1 0.5 0.5 1 1 0.5 7 78%
S15 1 1 1 1 0.5 0.5 1 1 0.5 7.5 83%
S16 1 0.5 1 1 1 0.5 1 1 0.5 7.5 83%
S17 1 0.5 1 1 1 0.5 1 1 1 8 89%
S18 1 0.5 1 1 0 1 0.5 1 0.5 6.5 72%
S19 1 1 1 1 1 1 0.5 0.5 1 8 89%
S20 1 1 1 0.5 0.5 1 0.5 1 1 8.5 94%
S21 1 1 0.5 0.5 0.5 1 0.5 0.5 1 6.5 72%
S22 1 1 1 0.5 1 1 0.5 1 1 8 89%
S23 1 0.5 1 1 0.5 1 0.5 1 1 7.5 83%
S24 1 0.5 1 1 0.5 1 0.5 1 1 7.5 83%
S25 1 1 1 1 1 1 0.5 1 1 8.5 94%
S26 1 1 0.5 1 1 0.5 1 1 1 8 89%
S27 1 1 1 1 0.5 1 0.5 0.5 0.5 7 78%
S28 1 1 0.5 0.5 0.5 0 1 0.5 0.5 5.5 61%
S29 1 0.5 1 1 0.5 1 0.5 0.5 1 7 78%
S30 1 1 1 1 0.5 1 0.5 0.5 1 7.5 83%

The articles were collected across different countries to capture data from different perspectives and cultures. Almost 40% of the studies on ethical leadership and employees’ performance were conducted in the USA (N = 8), followed by China, the UK, Korea, and other countries. Figure 2 below shows the distribution of the selected articles across the countries.

(Bar chart: number of selected studies per country; categories shown include the USA, UK, Egypt, Poland, Pakistan, Malaysia, and the UAE.)
Fig. 2. Distribution of studies in terms of country

5 Conclusion

This systematic review of scholarly and current work on the topic of ethical leadership and its impact on employees’ performance and satisfaction has developed an actual understanding of some of the measures that should be put in place to ensure better performance at the workplace. The workers should be treated well and given some care so that they feel valued and respected [8, 29–31]. This approach will be strategic in ensuring that the employees can perform and deliver [30–33]. The emphasis should be placed mainly on what is in the best interest of the employees rather than on what the workers are forced to do [33–35]. In organizations, based on the logic highlighted through the articles reviewed, it is necessary to ensure that the employees can be part of the policies formulated. This is key to ensuring that they will feel valued and motivated to work. The workers should be at the center of strategic as well as
organizational change policies. The proposed study also lays critical groundwork for
future research studies in the field of HRM, especially concerning ethical leadership [5,
24, 26, 36].

References
1. Obeidat, R., Alshurideh, Z., Al Dweeri, M., Masa’deh, R.: The influence of online revenge
acts on consumers psychological and emotional states: does revenge taste sweet? In: 33
IBIMA Conference proceedings, Granada, Spain, 10–11 April 2019
2. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: International Conference on Knowledge Management in Organiza-
tions, pp. 94–106 (2018)
3. Dhar, R.L.: Ethical leadership and its impact on service innovative behavior: the role of
LMX and job autonomy. Tour. Manag. 57, 139–148 (2016)

4. Alshurideh, M., Al Kurdi, B., Abu Hussien, B., Alshaar, H.: Determining the main factors
affecting consumers’ acceptance of ethical advertising: a review of the Jordanian market.
Mark. Commun. 23(5), 513–532 (2017)
5. Alshraideh, A., Al-Lozi, M., Alshurideh, M.: The impact of training strategy on
organizational loyalty via the mediating variables of organizational satisfaction and
organizational performance: an empirical study on Jordanian agricultural credit corporation
staff. J. Soc. Sci. 6, 383–394 (2017)
6. Chughtai, A., Byrne, M., Flood, B.: Linking ethical leadership to employee well-being: the
role of trust in supervisor. J. Bus. Ethics 128(3), 653–663 (2015)
7. Alshurideh, M., Al Kurdi, B., Vij, A., Obiedat, Z., Naser, A.: Marketing ethics and
relationship marketing-an empirical study that measures the effect of ethics practices
application on maintaining relationships with customers. Int. Bus. Res. 9(9), 78–90 (2016)
8. Ammari, G., Al kurdi, B., Alshurideh, M., Alrowwad, A.: Investigating the impact of
communication satisfaction on organizational commitment: a practical approach to increase
employees’ loyalty. Int. J. Mark. Stud. 9(2), 113–133 (2017)
9. Shin, Y., Sung, S.Y., Choi, J.N., Kim, M.S.: Top management ethical leadership and firm
performance: mediating role of ethical and procedural justice climate. J. Bus. Ethics 129(1),
43–57 (2015)
10. Wang, D., Gan, C., Wu, C., Wang, D.: Ethical leadership and employee voice: Employee
self-efficacy and self-impact as mediators. Psychol. Rep. 116(3), 751–767 (2015)
11. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84–95 (2012)
12. Alshurideh, M., Alhadid, A., Al kurdi, B.: The effect of internal marketing on organizational
citizenship behavior. Int. J. Mark. Stud. 7(1), 138 (2015)
13. Al Shurideh, M., Al Sharari, N.M., Al Kurdi, B.: Supply chain integration and customer
relationship management in the airline logistics. Theor. Econ. Lett. 9(02), 392–414 (2019)
14. Ren, S., Chadee, D.: Ethical leadership, self-efficacy and job satisfaction in China: the
moderating role of guanxi. Pers. Rev. 46(2), 371–388 (2017)
15. Dust, S.B., Resick, C.J., Margolis, J.A., Mawritz, M.B., Greenbaum, R.L.: Ethical leadership
and employee success: examining the roles of psychological empowerment and emotional
exhaustion. Leadersh. Q. 29(5), 570–583 (2018)
16. Reidenbach, R.E., Robin, D.P.: Toward the development of a multidimensional scale for
improving evaluations of business ethics. J. Bus. Ethics 9(8), 639–653 (1990)
17. Rest, J.R.: Development in Judging Moral Issues. University of Minnesota Press (1992)
18. Anderson, H.J., Baur, J.E., Griffith, J.A., Buckley, M.R.: What works for you may not work
for (Gen) Me: limitations of present leadership theories for the new generation. Leadersh. Q.
28(1), 245–260 (2017)
19. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
20. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
21. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
22. Nedal Fawzi Assad, M.T.A.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
23. Alshurideh, M., et al.: Determinants of pro-environmental behaviour in the context of emerging economies. Int. J. Sustain. Soc. 11(4), 257–277 (2019)
24. Wang, Y.-D., Sung, W.-C.: Predictors of organizational citizenship behavior: ethical
leadership and workplace jealousy. J. Bus. Ethics 135(1), 117–128 (2016)
25. Yang, Q., Wei, H.: The impact of ethical leadership on organizational citizenship behavior:
the moderating role of workplace ostracism. Leadersh. Organ. Dev. J. 39(1), 100–113 (2018)
26. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of e-
service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and e-
trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
27. Frericks, M.: Why leadership should monitor mentions too. Biz Blog (2015)
28. Abu Zayyad, H.M., Obeidat, Z.M., Alshurideh, M.T., Abuhashesh, M., Maqableh, M.,
Masa’deh, R.: Corporate social responsibility and patronage intentions: the mediating effect
of brand credibility. J. Mark. Commun. 1–24 (2020)
29. Alshurideh, M., Shaltoni, A., Hijawi, D.: Marketing communications role in shaping
consumer awareness of cause-related marketing campaigns. Int. J. Mark. Stud. 6(2), 163
(2014)
30. Alzoubi, H., Alshurideh, M., Al Kurdi, B., Inairata, M.: Do perceived service value, quality,
price fairness and service recovery shape customer satisfaction and delight? A practical study
in the service telecommunication context. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
31. Alshurideh, M.T.: Exploring the main factors affecting consumer choice of mobile phone
service provider contracts. Int. J. Commun. Netw. Syst. Sci. 9(12), 563–581 (2016)
32. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Ayadeh, I.: An investigation of
factors affecting patients waiting time in primary health care centers: an assessment study in
Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
33. Alshurideh, M., et al.: Loyalty program effectiveness: theoretical reviews and practical
proofs. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
34. Ghannajeh, A., et al.: A qualitative analysis of product innovation in Jordan’s pharmaceu-
tical sector. Eur. Sci. J. 11(4), 474–503 (2015)
35. Alshurideh, M.: The factors predicting students’ satisfaction with universities’ healthcare
clinics’ services: a case-study from the Jordanian higher education sector. Dirasat Adm. Sci.
161(1524), 1–36 (2014)
36. Alkalha, Z., Al-Zu’bi, Z., Al-Dmour, H., Alshurideh, M., Masa’deh, R.: Investigating the
effects of human resource policies on organizational performance. Eur. J. Econ. Finan. Adm.
Sci. 51(1), 44–64 (2012)
Data Mining, Decision Making,
and Intelligent Systems
Evaluating Non-redundant Rules of Various
Sequential Rule Mining Algorithms

Nesma Youssef1,2(&), Hatem Abdulkader1, and Amira Abdelwahab1,3


1 Faculty of Computers and Information, Minoufia University, Shibin El Kom, Egypt
Hatem6803@yahoo.com, amira.ahmed@ci.menofia.edu.eg
2 Department of Information System, Sadat Academy for Management Science, Cairo, Egypt
Nesma.hassan@sadatacademy.edu.eg
3 College of Computer Science and Information Technology, King Feisal University, Al-Hofuf, Saudi Arabia

Abstract. The data mining techniques help discover hidden knowledge from a
huge database. In the pattern mining field, the main goal is to discover inter-
esting patterns in large databases. The sequential pattern mining technique is
specialized for discovering sequential patterns with only one measure, called support, which is not sufficient and can be misleading for the user. Sequential rule mining is a good solution that takes another measure, called confidence, into account.
This paper presents a comparative analysis between two sequential rule mining
algorithms, namely non-redundant with dynamic bit vector (NRD-DBV), and
TRuleGrowth algorithm. The study clarifies the execution time, the number of
rules, and the memory usage for each algorithm. In addition, the most suitable application field for each algorithm to achieve elevated efficiency is identified.

Keywords: Sequential rule mining · TRuleGrowth · Non-redundant sequential rule · Closed sequential patterns

1 Introduction

There is an essential problem in discovering the temporal relationships in large sequence databases. Doing so provides the user with better information about the data and sets the base for the prediction process. Various techniques have been proposed for discovering relations in a database. One of the most common techniques is sequential pattern mining (SPM), which is used to find frequent sequential patterns in sequence databases [10]. SPM depends only on the support measure, i.e., the number of occurrences of items in a database. It can be misleading and not sufficient for making a prediction. The alternative to SPM that addresses this problem is sequential rule mining (SRM) [11]. SRM considers an additional measure called confidence, which estimates the probability of the following pattern. Mining sequential rules has faced many challenges; for example, similar rules can be classified differently. Also, some rules are considered uninteresting individually, which leads to them not being discovered. So, specific rules are scarce for use in prediction. These challenges were a fundamental reason for


producing an enormous number of redundant sequential rules, which makes the mining
process inefficient in an intelligent system. Many researchers have proposed enhanced
methods of SRM to reduce redundancy in many ways and improve the efficiency of
these algorithms.
There are two types of sequential rule mining: standard and partially ordered sequential rules. Many researchers have proposed algorithms to enhance the efficiency of SRM algorithms along the following two directions:
First, for standard sequential rules, improve the performance of the sequential patterns (SPs) through the mining process. It is divided into two phases: (a) mining frequent sequential patterns; (b) generating sequence rules based on the first phase. Many researchers pay attention to enhancing the efficiency of this phase by eliminating unaffected sequences that do not impact the final results. They mine frequent closed sequential patterns, which help to generate rules based on more compact information. We refer to that type of algorithm in this paper with the (NRD-DBV) algorithm.
Second, for partially ordered sequential rules, extend the mining of SRs by using an additional constraint. Sequential rules are mined in a partially ordered manner, meaning that items in the antecedent and consequent sides do not need to be ordered. This approach uses the pattern-growth technique for incrementally detecting all valid rules. Researchers develop enhanced algorithms that accept an additional constraint to improve the primary algorithms. For example, the TRuleGrowth algorithm accepts a window size constraint. It helps to reduce the number of rules generated, decreases the runtime, and reduces the disk space required for storing the produced rules, which enables the user to analyze the results more easily.
This paper presents an extensive study of the two types of SRM algorithms: standard and partially ordered sequential rules. These algorithms help to obtain non-redundant rules among data items in large sequence databases. We compare how each algorithm performs, namely non-redundant rules with dynamic bit vector and the TRuleGrowth algorithm, to give the most suitable domains for each algorithm. The two algorithms are compared on these criteria: the execution time, the number of rules generated, and the memory usage.

2 Literature Review

Many researchers have proposed algorithms to improve the process of mining sequential patterns. The main problem of SPM is that setting a support threshold with a low value produces irrelevant sequential patterns. Sequential rule mining is a natural extension of sequential patterns that can help users to understand the order of a sequence in a sequence database. SRM has been applied in many areas such as e-learning, manufacturing simulation, customer behavior analysis, and recommendation [1, 2].
The most common algorithm for SRM is that of Mannila & Verkano [3], which addresses the problem of predicting the behavior of sequences by discovering all episodes that happen frequently in a sequence database.

The RuleGen algorithm [4] has been proposed to generate the full set of the
sequential rules from frequent patterns and eliminate redundant rules in the next phase
of the mining process. It has to scan the database many times for calculating the support
of every prefix, which causes high complexity and additional cost.
Later, researchers generated sequential rules on the training set of sequence data, called partially ordered sequential rules (POSR), in which the items in the antecedent and consequent sides of a rule are unordered. Two baseline algorithms of POSR are called CMRule and CMDeo. CMRule is the first algorithm; it removes the temporal information and produces all rules that achieve the minimum support. It depends on the number of association rules, which makes it inefficient. The second algorithm, CMDeo, performs more efficiently than CMRule as it generates all valid rules of size 1 * 1 through left and right expansion procedures. The RuleGrowth algorithm has been proposed to overcome the problems in CMDeo by growing rules recursively to discover individual items. It can expand rules while ensuring that only valid rules exist in the sequence database [5].
Many algorithms have been proposed that depend on the prefix tree to achieve better efficiency. These concentrate on improving SR by removing redundant rules, such as CNR, MNSR, and IMSR [6]. They sort frequent sequences in ascending order before generating rules, which reduces the number of scans for each sequence and reduces the complexity to O(n^2).
The researchers have developed an extension of the RuleGrowth algorithm called
TRuleGrowth. This algorithm discovers the sequential rules occurring within a sliding
window constraint to generate a much smaller set of rules. It improves efficiency by
reducing the required space to store the produced rules [4, 7].
An efficient algorithm has been proposed in [8] called NRD-DBV. It utilizes a
dynamic bit vector data structure with a prefix tree. This algorithm helps in eliminating
uninteresting candidates early. So, it can minimize the runtime and memory usage.

3 Description and Implementation for Two Enhanced Algorithms of SRM

The most common type of sequential rule mining is a standard sequential rule. It
describes a sequential relationship between two sequential patterns. It accepts only two
parameters named minimum support and minimum confidence determined by the user.
It processes only integer items in a sequence and generates all rules whose support and confidence are higher than the thresholds. It can be discovered by many algorithms such as RuleGen, CNR, MNSR, and NRD-DBV. It is intended specifically for producing important decisions or predictions.
The other type of sequential rule is the newest type of rules named partially ordered
sequential rules. It’s more general than standard sequential rules. It represents a
sequential relationship between two unordered item-sets, so the items within the antecedent and the consequent of a rule need not be sequentially ordered. POSRs are interested in predicting one item at a time. They generate SRs based on matching rules: if the antecedent side exists in the sequence, then a rule is matched. The mining of SRs can be extended by adding constraints, as in the TRuleGrowth algorithm. TRuleGrowth
allows a parameter called window size to be specified, providing functionality comparable to that of the NRD-DBV algorithm.
The two types of sequential rule mining are represented through the following two algorithms, which are considered among the newest: the TRuleGrowth and the NRD-DBV algorithm. They can provide higher prediction accuracy.

3.1 TRuleGrowth an Extension of RuleGrowth Algorithm


POSR represents a new form of mining sequence rules. The items on the antecedent and consequent sides can be in any order. This structure replaces equivalent rules with a single rule and increases prediction accuracy. The RuleGrowth algorithm was presented to mine POSR. It depends on the pattern-growth approach for detecting more precise rules. This approach performs in an incremental manner that starts with two items and then grows rules one element at a time by expanding the left and right sides of the rule. So, the algorithm can easily accept restrictions for specific application needs.
One specific extension of RuleGrowth is TRuleGrowth. It detects rules occurring within a maximum number of consecutive sets of items in each sequence. It is a more efficient algorithm than RuleGrowth because it generates a much smaller set of rules. So, it
reduces the disk space requirements for storage. It makes it easier to analyze the results.
Adjusting the window size constraints can increase the accuracy of the prediction when
using rules to predict.
The TRuleGrowth algorithm consists of three main phases. The first phase is converting the dataset into a sequential list and determining the minimum support threshold. The second phase is generating rules of size 1 * 1 and performing two processes to expand each rule on both the left and right sides. Finally, the window size constraint and the minimum confidence threshold are set, and the validity of all rules is examined to generate the POSR, as shown in Fig. 1.

Fig. 1. Framework of TRuleGrowth algorithm



3.1.1 The TRuleGrowth Algorithm Implementation


The algorithm first scans the sequence database once to find the sides for each item. Then, it identifies all items whose support is higher than or equal to minSup to generate all valid rules of size 1 * 1. For calculating sides(x → y) and sides(y → x), the algorithm scans the 1-sequences and considers each pair of items x, y one by one. The support of the rule is calculated by dividing |sides(x → y)| by |S|. If the support is not lower than minSup, then two procedures are called to expand the left and right sides [4], and the sliding window constraint is applied to all rules of size 1 * 1. The algorithm stores all occurrences of each item and indicates their positions by the itemsets containing that item. For example, the occurrences of b in the sequence {a, b}, {d}, {b}, {a, b, e} are 1, 3 and 4. Then, a hash table is used for checking each item located before item y and the items located before x in a sequence database. This procedure is performed within a window size constraint. The two procedures are achieved in five steps as follows: first, initialize a hash table with a null value and check the item-sets in each sequence’s sides. Second, eliminate all items that do not satisfy the sliding window constraint. Third, if the size of ‘hash b’ equals the size of ‘b’, then add every item c ∈ a ∩ m to the hash table. Fourth, if the size of the ‘hash table’ is less than ‘b’, then add every item with the position of m as d ∈ b ∩ m. Finally, if the size of ‘hash a’ equals the size of ‘a’ and the size of ‘hash b’ equals the size of ‘b’, add the side to the available sides (a ∪ {c} → b) for each item c ∉ a ∪ b whose id occurs before the first item of ‘b’ within the window size.
Implementing the additional parameter named the sliding window adds the following features (a simplified window-check sketch is given after this list):
1. Pruning the search space by several orders of magnitude, which decreases the execution time.
2. Reducing the disk space of discovered rules by producing a much smaller number of rules, which makes it easier for a user to analyze the result.
3. Proving its importance in real-life applications, especially for temporal patterns such as analyzing the data of a stock market.
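The window check itself can be sketched as follows (a simplified, single-item-per-side illustration written for this description, not the actual TRuleGrowth code; counting the window inclusively over consecutive itemsets is an assumption made here):

# Simplified sketch of the sliding-window idea: x -> y is only counted in a
# sequence if some occurrence of x is followed by an occurrence of y and both
# fall inside `window` consecutive itemsets (inclusive count assumed here).
def matches_within_window(sequence, x, y, window):
    positions_x = [i for i, itemset in enumerate(sequence) if x in itemset]
    positions_y = [i for i, itemset in enumerate(sequence) if y in itemset]
    return any(ix < iy and (iy - ix + 1) <= window
               for ix in positions_x for iy in positions_y)

seq = [{"a", "b"}, {"d"}, {"b"}, {"a", "b", "e"}]
print(matches_within_window(seq, "a", "b", window=3))  # True: itemsets 1 and 3 fit a window of 3
print(matches_within_window(seq, "a", "b", window=2))  # False: no qualifying pair of occurrences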

3.2 Non-redundant Rule Based on Dynamic Bit Vector


The NRD-DBV algorithm utilizes a dynamic bit vector structure to mine frequent closed sequences, depending on the vertical data format, which has proved its efficiency by scanning the database only once. Mining frequent closed sequences has a significant advantage in that it overcomes the problem of generating an exponential number of patterns in long sequence databases. It is a lossless compression of sequential patterns that reduces the number of patterns while retaining the full information. By applying this approach, there is no super-sequence with the same support as its parent. So, it is more efficient, as it reduces the memory usage and the execution time required for mining long sequence databases. Additionally, it adopts the prefix tree to store all frequent closed sequences, which makes it more efficient to generate non-redundant sequential rules.
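The vertical bit-vector idea can be sketched in a few lines (a simplified illustration written for this description, not the authors’ NRD-DBV implementation; it ignores item order and the dynamic compression of the bit vectors): each item maps to a bitmask over sequence ids, and the support of a combination is the popcount of the intersected masks.

# Simplified illustration of the vertical bit-vector representation:
# bit i of an item's mask is set when sequence i contains that item.
def build_bitmaps(db):
    bitmaps = {}
    for sid, sequence in enumerate(db):
        for itemset in sequence:
            for item in itemset:
                bitmaps[item] = bitmaps.get(item, 0) | (1 << sid)
    return bitmaps

def support(bitmaps, *items):
    """Number of sequences containing all the given items (order ignored here)."""
    if not items:
        return 0
    mask = bitmaps.get(items[0], 0)
    for item in items[1:]:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")  # popcount of the intersection

db = [[{"a"}, {"b"}], [{"a"}, {"c"}], [{"b"}, {"c"}]]
bitmaps = build_bitmaps(db)
print(support(bitmaps, "a"), support(bitmaps, "a", "b"))  # 2 1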

3.2.1 The NRD-DBV Algorithm Implementation


There are five phases to perform the NRD-DBV algorithm, as shown in Fig. 2:
1. Convert a sequence database to the DBV structure and store the 1-sequences in a prefix tree, with the root of the tree initialized with a null value.
2. Check and remove all prefixes that do not extend to a frequent closed sequence. For example, if item B usually occurs after item A and has the same support as A, then it should be absorbed; e.g., if A(DE) = 30% and A(DE)C = 30%, then with the examination of downward closure we must eliminate A(DE).
3. Store all frequent sequences as child nodes and check whether they are frequent closed sequences or prefix generators.
4. Perform extension for each child by utilizing the closed pattern extension method. The method is performed in two forms: first, sequence extension, which grows patterns by adding an item as a new item-set after the last existing item-set; second, item-set extension, which grows patterns by adding the item to the last item-set of the pattern.
5. Generation of non-redundant rules by applying a condition to stop generating rules that do not meet the minConf threshold (a small confidence-check sketch follows this list), denoted as

If sup(Sequence of Sn) / sup(Sequence of pre) ≥ minConf

6. Then the non-redundant sequential rule set (NR-SeqRule) is set equal to NR-SeqRule union r. Otherwise, discontinue generating the rules for the child nodes. Also discontinue generating the rules when sup(n2) < sup(n1), because if sup(n1)/sup(n2) < minConf, then sup(n2)/sup(n) < minConf.
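The confidence test in step 5 can be expressed compactly; the sketch below uses hypothetical helper names and stands in for the actual rule-generation code of the algorithm:

# Sketch of the stopping condition: a rule prefix -> extension is emitted
# only when sup(extension) / sup(prefix) reaches minConf (hypothetical names).
def keep_rule(sup_prefix, sup_extension, min_conf):
    return sup_prefix > 0 and (sup_extension / sup_prefix) >= min_conf

rules = []
if keep_rule(sup_prefix=10, sup_extension=7, min_conf=0.5):
    rules.append(("prefix", "extension", 7 / 10))  # confidence 0.7 >= 0.5, so the rule is kept
print(rules)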

Fig. 2. Framework of NRD-DBV algorithm



By applying this approach, there is no super-sequence with the same support as its parent. So, it is more efficient as:
1. It reduces the memory usage and the execution time required for mining long sequence databases.
2. It adopts the prefix tree to store all frequent closed sequences, which makes it more efficient to generate non-redundant sequential rules.

4 Evaluate the Performance of Each Algorithm

Experiments were conducted to evaluate the effect of minSup on the runtime, the number of rules, and the memory usage. The implementation of both the NRD-DBV algorithm and the TRuleGrowth algorithm was done on a laptop with an Intel Core i5 2.3 GHz processor and 6.58 GB of RAM running Windows 7. Both algorithms were encoded in Python and run on JetBrains PyCharm.
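The three evaluation criteria can be measured with the Python standard library alone; the sketch below assumes a hypothetical mine(...) function standing in for either algorithm and reports the elapsed time and the peak memory traced during the call.

import time
import tracemalloc

def measure(mine, *args, **kwargs):
    """Run one mining call and return (result, elapsed seconds, peak traced MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = mine(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak / (1024 * 1024)

# Usage with a hypothetical mining function and database:
# rules, seconds, peak_mb = measure(mine, db, min_sup=0.001, min_conf=0.5)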
Three real datasets with different features, downloaded from SPMF [9], were applied to evaluate the performance. The first dataset, named BMSwebview1 (Gazelle), contains 59,601 sequences of clickstream data from an e-commerce website. It includes 497 distinct items with an average length of 2.42. The most important thing that distinguishes it is the variance of its items, as it includes items that are repeated rarely.
The second dataset is Korsarak, a huge dataset containing 990,000 sequences of click-stream data from a Hungarian news portal. It includes 41,270 items with an average sequence length of 8.1. Due to the difficulty this caused for TRuleGrowth, which led to an exceeded overhead limit, we implemented a subset of Korsarak that includes only 25,000 sequences.
The third dataset is BMSWebView2 (Gazelle), which was used in the KDD CUP 2000. It contains 77,512 clickstream sequences from e-commerce with 3,340 distinct items and an average length of 4.62.
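The SPMF sequence files used here are plain text in which items are integers, -1 closes an itemset and -2 closes a sequence; the minimal loader below was written for this description (the file name in the usage comment is hypothetical), not taken from the SPMF library.

def load_spmf_sequences(path):
    """Parse an SPMF sequence file: -1 ends an itemset, -2 ends a sequence."""
    database = []
    with open(path) as handle:
        for line in handle:
            tokens = line.split()
            if not tokens or tokens[0].startswith(("#", "@")):
                continue  # skip blank, comment, or metadata lines
            sequence, itemset = [], []
            for token in tokens:
                value = int(token)
                if value == -1:      # end of the current itemset
                    sequence.append(itemset)
                    itemset = []
                elif value == -2:    # end of the sequence
                    break
                else:
                    itemset.append(value)
            database.append(sequence)
    return database

# db = load_spmf_sequences("BMS1_spmf.txt")  # hypothetical local file name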
We analyzed the performance of the TRuleGrowth and the NRD-DBV algorithm
phase by phase and studied the effect of the minSup on the runtime, the number of
generated rules, and the memory usage.
In the BMSwebview1 dataset, we set a lower minSup value due to the variance of the sequences, which means that items are not repeated frequently in the dataset. When setting the minSup value as in the other experiments, we did not obtain any rules during the mining process. The value of the minConf threshold was set to 0.5 for all cases in the experiment. The values of the parameters were determined after carrying out many initial experiments to acquire the highest performance.
We notice that the runtime increased with decreasing minSup, as shown in Fig. 3; there is an inverse relationship between them. This relationship is most visible in the NRD-DBV algorithm and in the TRuleGrowth algorithm when a high value is set for the window size. When the window size value decreases, we notice that it takes less time and produces a smaller number of sequence rules. That is because there is no heavy computation for generating sequential rules.

Fig. 3. Runtime of sequential rules for webview 1 with various minSup values and (minConf = 0.5)
Fig. 4. Comparison of memory usage for webview 1 with various minSup values and (minConf = 0.5)

Table 1. Rule count of sequential rules on BMSwebview1


minSup TRuleGrowth-w6 TRuleGrowth-w10 TRuleGrowth-w14 NRD-DBV
0.0006 5239 32726 152611 16288
0.0007 4843 31002 132947 19184
0.00075 4565 29633 116043 13149
0.0008 4326 28432 101104 6960
0.00085 4087 26686 84909 3607
0.0009 3929 25341 74946 2280

Table 1 shows the number of generated rules; there is also an inverse relationship between the number of rules and the minSup value. But there is a further increase in the number of rules in the TRuleGrowth algorithm with a high value of window size. The NRD-DBV algorithm generates a number of sequential rules close to that of the TRuleGrowth algorithm when the window size value decreases and the minSup value increases at the same time.
The second dataset, Korsarak, is considered one of the largest sequential datasets. We utilized a subset of the global dataset that contains, without duplication, the locations on the news portal that users browsed in a specific session.
When utilizing the original Korsarak dataset, TRuleGrowth ceased to produce any rules and an exceeded overhead limit occurred. The NRD-DBV algorithm succeeded in generating the rules on the Korsarak dataset with 990,000 sequences. It produced from 4 to 21 rules when setting minSup from 0.07 to 0.03, in times from 192 s to 258 s. We dealt with this problem by using a subset of Korsarak with 25,000 sequences.

Table 2. Rule count of sequential rules on Korsarak


minSup TRuleGrowth-w10 TRuleGrowth-w16 NRD-DBV
0.01 31 34 92
0.02 12 13 35
0.03 9 9 21
0.04 8 8 15
0.05 6 6 12
0.06 4 5 5
0.07 4 4 4
0.08 3 3 3

Figure 5 shows that the TRuleGrowth algorithm proved its effectiveness when minSup was set to a lower value, generating rules in less time than the NRD-DBV algorithm. There is not much difference when changing the value of the window size constraint from 10 to 16. When decreasing the minSup value, the number of rules generated by the NRD-DBV algorithm was almost three times greater than that of the TRuleGrowth algorithm, as shown in Table 2.

Fig. 5. Runtime of sequential rules for Korsarak with various minSup values and (minConf = 0.5)
Fig. 6. Comparison of memory usage for Korsarak with various minSup values and (minConf = 0.5)

The third dataset, named BMSWebView2, differs from BMSWebView1 in that the total number of item occurrences in it is 358,278, while BMSWebView1 contains 149,639. Figure 7 shows the noticeable difference when the window size constraint is set to a value of 6 compared to a value of 10. As the window size constraint value increases, the runtime increases, particularly when a low value is set for minSup. But the TRuleGrowth algorithm still took the largest portion of the runtime to generate the sequential rules.
Similarly, for the number of rules, the lowest number of rules was generated by the TRuleGrowth algorithm with a low value of the window size constraint, while the number of rules increased in TRuleGrowth with a higher window size, and it takes additional time for computations when the minSup value decreases.

The NRD-DBV algorithm showed a further increase, generating the highest number of sequence rules, especially when the value of minSup decreased, as shown in Table 3.

Table 3. Rule count of sequential rules on BMSWebView2


minSup TRuleGrowth-w6 TRuleGrowth-w10 NRD-DBV
0.001 1305 9050 18000
0.002 641 1872 2706
0.003 262 587 698
0.004 120 236 260
0.005 68 117 124
0.006 41 64 65
0.007 22 35 35
0.008 12 12 16

Concerning the memory usage, it is expected that the amount of memory required increases as the minSup value decreases because of the increasing number of sequence rules. However, the NRD-DBV algorithm has proven more efficient in memory usage than the TRuleGrowth algorithm in all experiments. This is because it utilizes the DBV structure and prunes early the child nodes of all prefixes, which removes unimportant rules, as shown in Figs. 4, 6 and 8.

Fig. 7. Runtime of sequential rules for webview2 with various minSup values and (minConf = 0.5)
Fig. 8. Comparison of memory usage for webview2 with various minSup values and (minConf = 0.5)

5 Conclusion and Future Work

This paper presented a significant comparison of two sequential rule mining algorithms.
By applying them to three real datasets with different features, we observed that the NRD-DBV algorithm could minimize the number of sequential rules and the
required memory usage. This relies on reducing the search space by using the DBV structure with a prefix tree, which leads to early pruning of child nodes. Additionally, the NRD-DBV algorithm produces more rules than the TRuleGrowth algorithm because it takes the arrangement of items into consideration. In contrast, mining with the TRuleGrowth algorithm with a specific value of the window size constraint can perform faster and improves the accuracy of the discovered sequential rules, which are not restricted to the arrangement. We also concluded that the most appropriate algorithm is chosen based on the characteristics of the database and the parameters specified by the user. If the database contains items that rarely repeat, then the most appropriate algorithm in terms of execution time and the number of rules is the TRuleGrowth algorithm, while the NRD-DBV algorithm is the most appropriate solution for mining very huge databases. In all cases, the window size constraint in the TRuleGrowth algorithm must be used with a lower value, because using it with a higher value is an obstacle that creates high computational overhead.
The NRD-DBV algorithm is more useful in domains that necessitate the arrangement of items, such as the medical area, error detection, intervention, and bug tracking. For example, in the medical area, if the patient is suffering from a fever, which is followed by a decrease in the level of coagulation, followed by the appearance of red specks on the body, it is probable that the patient will need to be treated for dengue fever. This order of events is important in predicting the appropriate type of treatment. We also concluded from the experiments the limitations of the NRD-DBV algorithm, as follows:
1. Several steps are needed to construct the data structure of the sequences before producing the rules, which requires a lot of time.
2. Its performance depends on the nature of the dataset and the degree of frequency of the items in it. A dataset having a high variance of items did not perform well with the usual value of the minSup threshold.
The implementation of the TRuleGrowth algorithm allows optional parameters to be customized, such as increasing the number of items that appear in the antecedent and consequent of a rule. It can help provide product recommendations and make quick decisions. The TRuleGrowth algorithm also suffers from limitations:
1. When a large value is set for the window size constraint, it does not provide a major improvement.
2. It has proven inefficient in performance when applied to a larger dataset, causing an exceeded overhead limit.
For future work, we plan to enhance the NRD-DBV algorithm in such a way as to improve its efficiency on long sequence databases by utilizing a parallel approach for producing the rules.

References
1. Noughabi, E.A.Z., Albadvi, A., Far, B.H.: How can we explore patterns of customer
segments’ structural changes? a sequential rule mining approach. In: International
Conference on Information Reuse and Integration, pp. 273–280. IEEE (2015)

2. Jannach, D., Jugovac, M., Lerche, L.: Adaptive recommendation-based modeling support for
data analysis workflows. In: Proceedings of the 20th International Conference on Intelligent
User Interfaces, pp. 252–262. ACM. March 2015
3. Zaki, M.J.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1–2), 31–60 (2001)
4. Fournier-Viger, P., et al.: Mining partially-ordered sequential rules common to multiple
sequences. IEEE Trans. Knowl. Data Eng. 27(8), 2203–2216 (2015)
5. Ijeast., et al.: Survey on sequence mining algorithms. Int. J. Eng. Appl. Sci. Technol. 58–64.
IJEAST (2016)
6. Pham, T.-T., Luo, J., Vo, B.: An effective algorithm for mining closed sequential patterns
and their minimal generators based on prefix trees. Int. J. Intell. Inform. Data. Syst. 7(4),
324–339 (2013)
7. Setiawan, F., Yahya, B.N.: Improved behavior model based on sequential rule mining. Appl.
Soft Comput. 68, 944–960 (2017)
8. Tran, M.-T., et al.: Mining non-redundant sequential rules with dynamic bit vectors and
pruning techniques. Appl. Intell. 45(2), 333–342 (2016)
9. Fournier-Viger, P., et al. The SPMF open-source data mining library version 2. In: Joint
European conference on machine learning and knowledge discovery in databases, pp 36–40.
Springer, Cham (2016)
10. Rezig, S., Achour, Z., Rezg, et al.: Using data mining methods for predicting sequential
maintenance activities. Appl. Sci. 8(11), 2184 (2018)
11. Kour, A.: Sequential rule mining, methods and techniques: a review. Int. J. Comput. Intell.
Res. 13(7), 1709–1715 (2017)
Impact of Fuzzy Stability Model on Ad Hoc
Reactive Routing Protocols to Improve
Routing Decisions

Hamdy A. M. Sayedahmed1(&), Imane M. A. Fahmy2(&), and Hesham A. Hefny2(&)

1 Central Metallurgical Research and Development Institute (CMRDI), Cairo, Egypt
hamdi@cmrdi.sci.eg
2 Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, Egypt
{iman_fahmy,hehefny}@cu.edu.eg

Abstract. The Mobile Ad hoc Network (MANET) is the cornerstone of the Internet of Things (IoT) and Vehicle to Everything (V2X) networks; its devices are remarkably lightweight so as to be portable and are free to join or leave the network. One of the MANET routing types is reactive routing protocols. Reactive routing protocols set up the connection between devices either on demand, such as the Ad Hoc On-demand Distance Vector (AODV) protocol, or by source routing, such as the Dynamic Source Routing (DSR) protocol. Reactive routing protocols incur routing overhead when discovering new routes; thus, routing overhead and delay will increase. In this paper, the method of gener-
ating a fuzzy model is presented. Also, a tuned fuzzy stability model is intro-
duced to handle the imprecision of routing decisions by the Fuzzy Stability
model for Ad Hoc On-demand Distance Vector (FSAODV) and Fuzzy Stability
model for Dynamic Source Routing (FSDSR). The results showed that
FSAODV and FSDSR have outperformed the state of art protocols AODV and
DSR respectively.

Keywords: MANET  Reactive routing  AODV  DSR  Fuzzy stability


model  FSDSR  FSAODV

1 Introduction

A mobile ad hoc network (MANET), also referred to as a wireless ad hoc network, is a self-organized, infrastructure-less network of mobile nodes, i.e., it has no central coordination. In a MANET, the nodes are connected through radio waves, and each node inside the network works either as a host or as a router. The transmission range of each node enables it to connect directly with other nodes within its range, while nodes need another node (a router node) to relay their packets if the connection lies outside their ranges. Besides, the nodes' movements make the network topology change frequently and link failures occur more often, according to [6] and [20].


Determining the MANET challenges helps to build effective applications [3, 14]. These challenges include routing between any communicating pair of nodes, multicast routing, reliability, and security between nodes when exchanging information [21]. Quality of service (QoS) in exchanging information, battery constraints, inter-networking, and the hidden terminal problem (which refers to packet collisions at receiving nodes) are other MANET challenges that should be considered when setting up an application with better performance.
Many routing protocols have been proposed for MANETs and are classified as proactive (table-driven) or reactive (on-demand) routing protocols. In proactive routing protocols, each node keeps an up-to-date routing table by periodically broadcasting control packets within its transmission range; the main issue in proactive routing is that all nodes must keep up-to-date routing tables. In reactive routing protocols, each node invokes the route discovery procedure to discover a path to a destination, and this path is valid as long as the destination is reachable. The main issue in reactive routing protocols is the routing overhead of discovering new routes, whether by dynamic source routing or on-demand routing [13, 22].
Multipath routing is a solution for resource saving, as multipath discovery helps to transfer data through different backup links. Relying on multiple links decreases the end-to-end delay and the bit error rate (BER), and saves limited power [5, 23]. However, a new technique that handles the lack of information and the ambiguity of MANETs' nature could provide further enhancement, since routing in MANETs depends on different parameters with different values, and these values, together or separately, form gradual degrees of precision. Fuzzy logic exploits ambiguity and lack of information to make approximate decisions, which can be used in MANET routing. In this paper, the process of generating a fuzzy model is introduced in detail, the tuning process for the fuzzy model is shown, and a tuned fuzzy stability model is introduced to enhance the performance of the Ad Hoc On-demand Distance Vector (AODV) and Dynamic Source Routing (DSR) protocols.
This paper is organized as follows: Sect. 2 provides an overview of related work.
Section 3 introduces a review of reactive routing protocols. Section 4 presents the used
fuzzy model. Section 5 describes a simulation environment. Section 6 discusses the
collected results. Section 7 presents the conclusion and future work.

2 Literature Review

Routing stability management is an important issue for any ad hoc network. There are several methods to improve link stability, such as statistical methods, prediction models, clustering approaches, and fuzzy models. Selecting the appropriate technique depends on parameters such as the predicted network size, the number of participating nodes, the type of routing protocol, and the kind of service to be provided. The use of fuzzy logic and classical methods for routing stability in MANETs has recently produced several studies in the literature. It has been shown that multipath routing performs better than single-path routing under high traffic loads when AODV is compared to the Ad hoc On-demand Multipath Distance Vector protocol (AOMDV) [2, 5, 15]. Also, selecting candidate nodes or finding the intersecting nodes along routes in a multipath source routing protocol improves the overall ad hoc network performance [16, 17]. The quality of service (QoS) in a MANET can be improved by preventing data loss when route breaks are reduced [4].
A fuzzy logic system for caching decisions has been shown to improve routing efficiency: a fuzzy routing algorithm was used to balance the load along multiple paths, and the fuzzy optimization tended to reduce the disadvantages of both uni-path and multipath routing [1]. In [7], a novel scheme of fuzzy logic-based dynamic routing in MANETs was proposed, where fuzzy logic was applied to manage routing policies and enhance routing performance dynamically. The proposed algorithm depended on mobility, signal power, bandwidth, and packet forwarding ratio, where the node segmentation reduced the overhead of the entire network and sped up the routing process.
The authors of [18] suggested a stable routing protocol by embedding a fuzzy logic system that considered hop count and a stability factor as input metrics; the performance analysis showed that the fuzzy logic-based scheme achieves a better packet delivery ratio and delay than DSR. The work in [28] also used an adaptive fuzzy inference system to enhance dynamic source routing by ordering the routes inside the route cache based on hop count, energy, and delay.
The work in [29] proposes compressed fuzzy logic-based multi-criteria AODV routing in the ad hoc environment; this proposal aims to enhance the routing decision mechanism by jointly considering the number of relays, the distance factor, the direction angle, and the vehicle speed variance. A fuzzy logic strategy that uses certificate authorization, energy auditing, and node trust was employed to improve data stability by detecting attacks as network density increases [30].

3 Reactive Routing Protocols

Reactive routing protocols abandon the idea of keeping routing information all the time and cut the routing overhead by keeping only up-to-date paths. Communication between the source and the destination requires a route discovery procedure, and the route is maintained by route maintenance rules until it is no longer valid or no longer desired [9, 11]. In reactive routing protocols, the source node floods the network with a route request packet (RREQ) to find a path to a destination on demand.
The common drawback of this type of protocol is the higher delay that accompanies new route discovery. In particular, the drawbacks of the Ad hoc On-demand Distance Vector protocol (AODV) are the large number of control packets generated upon link failure, the consumption of network bandwidth, and the decrease of the QoS level with increasing network density [24]. The drawbacks of the Dynamic Source Routing protocol (DSR) are that it is not scalable, it takes a long time to obtain routing information, and stale routes remain in the route cache. Therefore, reactive routing protocols are preferred for medium-sized networks.

3.1 Dynamic Source Routing (DSR)


The DSR protocol is classified as source routing, i.e., it groups the nodes' addresses from source to destination in the established route. Grouping addresses allows the intermediate nodes to update their route caches by inserting the source address of each received control packet; this is also called the route discovery process. Routing overhead increases with increasing mobility and number of nodes (Trivedi et al., 2015 [12]). In route maintenance, a node sends a route error (RERR) packet if a route fails, and each node that used that link removes the route from its route cache.

3.2 Ad-Hoc On-Demand Distance Vector (AODV)


AODV [8] is classified as on-demand routing. It allows a connection to be set up between a source node and a destination node through a path discovery process without keeping all routing information in routing tables.
Path discovery starts when a node needs to connect with another node that is not in its routing table. Each node maintains a node sequence number and a broadcast ID. The source node floods the network with route request packets (RREQ) to its neighbors. The RREQ includes:
<sourceaddr; source sequence #; broadcast id; destaddr; dest sequence #; hop cnt>
The pair <sourceaddr; broadcast id> uniquely identifies the RREQ packet. The reverse path setup [25] starts when a source node broadcasts the RREQ to its neighbors. Each node either replies with an RREP or rebroadcasts the RREQ until the destination is reached or found unreachable; intermediate nodes record the first sender of the RREQ and drop duplicated RREQs. Each route reply sent back through the intermediate nodes forms the reverse path setup [24]. In the forward path setup, each node sets up a pointer to the node from which it received the RREP along the established reverse path. Nodes that are not on the reverse path time out after ACTIVE_ROUTE_TIMEOUT (3000 ms) and drop their reverse pointers.

4 Fuzzy Model-Based Reactive Protocols

The fuzzy stability model can be built as shown in Fig. 1, which depicts the generation process. In step 1, the features that affect the behavior of the MANET are determined; grouped together, these features represent gradual degrees. In our model, the features are the total routes in the route cache of each node (TR), the number of routes for a specified destination in the route cache (NS), and the speed of a node (S).
In step 2, either a standard dataset published in known libraries is used, or a dataset containing the discriminant features is generated. Step 3 presents two methods that can be used to extract the relations between the features, also referred to as extracting IF…THEN rules. The first method is to ask experts using the selected dataset; the second is to use a fuzzy clustering algorithm such as the fuzzy C-means algorithm or a fuzzy density-based algorithm. The fuzzy clustering algorithm provides the model with the universe of discourse for each linguistic term of each feature, the number of linguistic terms for each feature, and the number of IF…THEN rules [26, 27]. To create the fuzzy stability model, the fuzzy C-means algorithm was used with intensive analysis.

The presented fuzzy stability model is a tuned version of the fuzzy model proposed by the authors in [10]. The earlier model contained TR and S as inputs and NS as output, with a rule base of 24 IF…THEN rules.
The tuning process has two phases. The first phase runs the fuzzy C-means algorithm (FCM) with 24 clusters for 30 runs and takes the average of each cluster over the 30 runs; the average is used to avoid the random initial cluster centers chosen by the FCM algorithm. The 24 IF…THEN rules are then minimized by applying the distance matrix. The second phase replaces the triangular membership functions of the input and output linguistic terms with Gaussian membership functions, and the mean and variance of each cluster are calculated to determine the width of each linguistic term. After tuning, the fuzzy stability model changed as follows.

4.1 Input Variables


TR and S were used as the inputs. TR is described by 5 linguistic terms (v.low, low, medium, below.high, high), and S by 4 linguistic terms (resident, move, medium, fast). The fuzzy C-means algorithm set the universe of discourse for TR and S to [0, 79] and [0, 15], respectively. Figures 2 and 3 show the membership functions for TR and S with their associated linguistic terms for the tuned model.

Fig. 1. Fuzzy model building structure



Fig. 2. Total routes (TR) input variable

Fig. 3. Node Speed (S) Input Variable

4.2 IF…THEN…Rules
“IF-THEN” rules represent the knowledge in the system. The fuzzy inference system (FIS) type is Mamdani, the “AND” method is min, the “OR” method is max, and the defuzzification method is the centroid. The rules were minimized to the 5 rules in Table 1, which were obtained from the distance matrix according to [19].

Table 1. IF…THEN…Rules

1. IF TR is very low AND S is medium THEN NS is not stable
2. IF TR is low AND S is fast THEN NS is near stable
3. IF TR is medium AND S is resident THEN NS is stable
4. IF TR is below high AND S is medium THEN NS is very stable
5. IF TR is high AND S is move THEN NS is consistent

4.3 Output Variable


The output variable represents the number of routes in the route cache for a specified destination, also referred to as node stability (NS). NS is described by 5 linguistic terms (not stable, near stable, stable, consistent, very stable). The defuzzification membership function is Gaussian for all linguistic terms, and the universe of discourse for NS is [0, 18]. Figure 4 shows the membership functions with their linguistic terms.

Fig. 4. Node stability (NS) output
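To make the construction above concrete, the following is a minimal sketch of the tuned Mamdani model using the scikit-fuzzy library, with the universes of discourse and the five rules of Table 1 taken from the paper. The Gaussian (mean, sigma) pairs are illustrative assumptions, since the FCM-derived means and variances are not listed here, so the numeric output will not exactly reproduce the worked example given below (NS = 5.51 for TR = 40 and S = 2).

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Universes of discourse from the paper: TR in [0, 79], S in [0, 15], NS in [0, 18].
tr = ctrl.Antecedent(np.arange(0, 80, 1), 'TR')
s = ctrl.Antecedent(np.arange(0, 16, 1), 'S')
ns = ctrl.Consequent(np.arange(0, 19, 1), 'NS')  # centroid defuzzification is the default

# Gaussian membership functions; the (mean, sigma) values are assumed for illustration.
for label, mean in zip(['v.low', 'low', 'medium', 'below.high', 'high'], [0, 20, 40, 60, 79]):
    tr[label] = fuzz.gaussmf(tr.universe, mean, 8)
for label, mean in zip(['resident', 'move', 'medium', 'fast'], [0, 5, 10, 15]):
    s[label] = fuzz.gaussmf(s.universe, mean, 2)
for label, mean in zip(['not stable', 'near stable', 'stable', 'consistent', 'very stable'],
                       [0, 4.5, 9, 13.5, 18]):
    ns[label] = fuzz.gaussmf(ns.universe, mean, 2)

# The five rules of Table 1 (Mamdani inference, AND = min, OR = max).
rules = [
    ctrl.Rule(tr['v.low'] & s['medium'], ns['not stable']),
    ctrl.Rule(tr['low'] & s['fast'], ns['near stable']),
    ctrl.Rule(tr['medium'] & s['resident'], ns['stable']),
    ctrl.Rule(tr['below.high'] & s['medium'], ns['very stable']),
    ctrl.Rule(tr['high'] & s['move'], ns['consistent']),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input['TR'] = 40   # total routes in the route cache
sim.input['S'] = 2     # node speed
sim.compute()
print(sim.output['NS'])  # crisp node stability estimate
```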

The aim of using the fuzzy model is to enhance reactive routing protocols, whether on-demand such as AODV or dynamic source routing such as DSR, to obtain better network performance, and to exploit the lack of information and the ambiguity of node speed and number of routes to keep the best level of quality of service.
The fuzzy model works as shown in Procedure 1. In reactive routing protocols, the route discovery process starts when a source node wants to transfer packets to a destination node by broadcasting an RREQ packet into the network. When an intermediate node receives the RREQ packet, it evaluates the input parameters TR and S; the fuzzy model then evaluates the output parameter NS and determines whether the node is available to take part in that request. If the node is available, the RREQ is re-broadcast; otherwise, the node drops the RREQ. This process is repeated by each intermediate node until the request reaches the destination. The route reply packet generated by the destination is sent back to the source node via the path stored in the routing table or route record. When a node on a route receives the RREP packet, the fuzzy model re-evaluates the node stability due to node mobility or route expiry time; if the node is available, it forwards the packet toward the potential destination. Therefore, either FSAODV or FSDSR guarantees that each node along a path is a near-optimal node, which reduces route errors (RERR) and network overhead. As an example of the fuzzy stability model output, the node stability is 5.51 when the total number of routes is 40 and the node speed is 2. The fuzzy model can work standalone or embedded in a reactive routing protocol.
The described procedure allows the state-of-the-art protocols AODV and DSR to decrease the number of control packets in the MANET based on fuzzy logic; moreover, it helps to decrease delay and improve the packet delivery ratio.

Procedure 1: Fuzzy Stability Model Workflow


Start:
1- If packet is RREQ packet
   I- Node calculates TR & S
   II- If (NS satisfies the threshold) Then send RREQ
       Else
           Discard the RREQ
           Wait for another RREQ
2- Else If packet is RREP packet
   I- Node calculates TR & S
   II- If (NS satisfies the threshold) Then send RREP
       Else
           Discard the RREP
           Wait for another RREP
3- Else packet is Source Route (SR), Acknowledgement (Ack.), or Route Error (RERR) packet Then
   1. Send the packet
   2. Add node address to route cache
End
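As a complement to the pseudocode, the sketch below is a hedged, self-contained Python rendering of Procedure 1. The Node and Packet classes, the NS_THRESHOLD value, and the crude evaluate_ns stub are assumptions introduced only for illustration; in the protocol itself the NS value comes from the tuned fuzzy model of this section.

```python
from dataclasses import dataclass, field
from typing import List

NS_THRESHOLD = 9.0  # assumed cut-off; the simulations accept a node when NS is "consistent"

@dataclass
class Packet:
    kind: str  # 'RREQ', 'RREP', 'SR', 'Ack', or 'RERR'

@dataclass
class Node:
    address: int
    speed: float
    route_cache: List[int] = field(default_factory=list)

    def forward(self, packet: Packet) -> None:
        print(f"node {self.address} forwards {packet.kind}")

def evaluate_ns(tr: int, s: float) -> float:
    # Placeholder for the Mamdani model sketched in Sect. 4; a crude monotone proxy here.
    return min(18.0, tr / 5.0 + (15.0 - s) / 3.0)

def handle_packet(node: Node, packet: Packet) -> None:
    if packet.kind in ('RREQ', 'RREP'):
        tr = len(node.route_cache)        # TR: total routes in the node's route cache
        ns = evaluate_ns(tr, node.speed)  # NS: fuzzy stability estimate
        if ns >= NS_THRESHOLD:
            node.forward(packet)          # stable enough: re-broadcast RREQ / send RREP on
        # else: discard the packet and wait for another RREQ/RREP (Procedure 1)
    else:
        node.forward(packet)              # SR, Ack, or RERR packets are sent on
        node.route_cache.append(node.address)

handle_packet(Node(address=1, speed=2.0, route_cache=list(range(40))), Packet('RREQ'))
```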

5 Simulation Environment

The fuzzy stability model was tested using OPNET Modeler 14.5 and the MATLAB R2014b fuzzy toolbox. The simulation settings are listed in Table 2. The simulation was run in multiple scenarios with node densities of 10, 20, 30, 40, and 50 nodes. The fuzzy model accepts a node when NS is consistent. A random waypoint mobility model was selected in all scenarios due to its similarity to natural environments. The collected results represent global network performance: global statistics describe the behavior of the specified protocol in the whole simulated system and are shared by all objects in the simulation.

Table 2. Environment settings


Simulation parameter Value
Protocol FSDSR/DSR/AODV/FSAODV
Mobility Random waypoint
Node type “manet_station_adv”
No. of nodes 10, 20, 30, 40, 50
Area 1000 × 1000 m²
Simulation time 45 min

The packet delivery ratio (PDR) is the ratio of the data packets delivered to the destinations to those generated by the CBR sources. Equation 1 defines the PDR:

PDR = \frac{1}{N} \sum_{i=1}^{N} \frac{R_i}{T_i}    (1)
where N is the total number of flows, i is the node ID, R_i is the number of packets received from node i, and T_i is the number of packets transmitted from node i.
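A tiny numerical illustration of Eq. (1); the per-flow counts are made up for demonstration only.

```python
# R_i: packets received from flow i; T_i: packets transmitted by flow i.
received    = [95, 80, 60]
transmitted = [100, 100, 80]

pdr = sum(r / t for r, t in zip(received, transmitted)) / len(received)
print(f"PDR = {pdr:.3f}")  # (0.95 + 0.80 + 0.75) / 3 = 0.833
```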
The number of retransmission attempts is the sum of all packets sent by each node inside the network until those packets are successfully transmitted or discarded upon reaching the short or long retry limit. Throughput represents the total number of bits per second forwarded from the wireless LAN layers to the higher layers in all WLAN nodes of the network.
Wireless LAN delay is the end-to-end delay of all packets received by the nodes; the delay increases as the number of exchanged control packets increases. The wireless LAN delay can be expressed as in Eq. 2:

D_{end-to-end} = N (D_{trans} + D_{prop} + D_{proc} + D_{queuing})    (2)

where D_{end-to-end} is the total end-to-end delay, D_{trans} is the transmission delay, D_{prop} is the propagation delay, D_{proc} is the processing delay, D_{queuing} is the queuing delay, and N is the total number of packets.
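A quick numeric check of Eq. (2); the per-packet delay components below are assumed values, not measurements from the simulations.

```python
N = 100  # total number of packets
d_trans, d_prop, d_proc, d_queue = 1e-3, 2e-4, 5e-5, 3e-4  # seconds (assumed)
d_end_to_end = N * (d_trans + d_prop + d_proc + d_queue)
print(f"total end-to-end delay = {d_end_to_end:.3f} s")  # 100 * 1.55e-3 = 0.155 s
```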

6 Result Discussion

Figure 5 shows that the packet delivery ratio of FSDSR and FSAODV outperforms that of DSR and AODV, respectively. Because the number of control packets is reduced, each node using the fuzzy stability model processes fewer packets, so the overall packet exchange improves: the denominator of Eq. (1) decreases, and the numbers of transmitted and received packets in the numerator decrease as well. Unrestricted control packets, on the other hand, increase the overhead on each node. In some cases (10 nodes and 50 nodes), the PDR values of AODV and FSAODV are close to each other, and those of DSR and FSDSR are close to each other; the reason for this closeness is that the output variable (NS) has a lower membership value as an outcome of the fuzzy model.
Fig. 5. Packet delivery ratio

Figure 6 shows that the number of retransmission attempts for nodes using AODV or DSR is higher than when using FSAODV and FSDSR. In AODV or DSR, all paths, with both long and short hop counts, are discovered, which requires more control packets to be exchanged between each hop and the next. FSAODV and FSDSR, in contrast, discover short paths with fewer control packet exchanges as a result of the fuzzy model's control.

Fig. 6. No. of retransmission attempts



Processing more control packets requires more throughput, as when using AODV or DSR. Figure 7 shows the throughput of AODV, DSR, FSAODV, and FSDSR with respect to node density; FSAODV and FSDSR are better than AODV and DSR, respectively.

Fig. 7. Throughput

As Fig. 8 shows, by decreasing the number of control packets in FSAODV and FSDSR, the wireless LAN delay decreases, since N, the total number of packets, decreases, as do D_trans, D_proc, D_prop, and D_queuing, as Eq. (2) shows. Therefore, a lower number of retransmission attempts leads to fewer control packets sent and received between nodes within a certain route. Moreover, transferring packets in FSAODV or FSDSR keeps the route overhead stable and low. In the 10-node case, the fuzzy and standard protocols behave the same.

Fig. 8. Wireless LAN delay

7 Conclusion

In this paper, the processes of constructing and tuning a fuzzy model were presented. A fuzzy stability model was tuned and applied to handle the imprecise values of the total routes in the route cache and the speed of nodes, in order to enhance the reactive routing protocols AODV and DSR. The tuned fuzzy model works in the route discovery process of each protocol, whether multiple-route discovery as in DSR or single-route discovery as in AODV. The derived results showed that the fuzzy stability model can help reach better decisions in reactive routing protocols, whether dynamic source routing or on-demand routing, and it can be applied to other reactive protocols.
The tuned fuzzy models FSAODV and FSDSR outperformed the standard AODV and DSR, respectively, in packet delivery ratio, number of retransmission attempts, and wireless LAN delay. Although not all QoS metrics were included in the comparison, it can be concluded that considering the total routes in the route cache and the speed of nodes can improve the performance of a mobile ad hoc network in terms of packet delivery ratio, number of retransmission attempts, and wireless LAN delay. At some node densities, FSAODV or FSDSR behaves the same as AODV or DSR, as illustrated. Fuzzy logic can handle many cases that probability models cannot, due to their memory and time complexity, as addressed by this work. Future work will consider total cached replies sent, route-cache size, and network load to widen the view of the model; furthermore, the model will also be applied in the route maintenance process.

References
1. Gowri, A., Valli, R., Muthuramalingam, K.: A review: optimal path selection in ad hoc
networks using fuzzy logic. Int. J. Appl. Graph Theory Wirel. Ad Hoc Netw. Sensor Netw. 2
(4), 1–6 (2010)
2. Jhaveri, R.H., Patel, N.M.: Mobile ad-hoc networking with AODV: a review. Int. J. Next-
Gener. Comput. 6(3), 165–191 (2015)
3. Bang, A.O., Ramteke, P.L.: MANET: history, challenges and applications. Int. J. Appl.
Innov. Eng. Manage. 2(9), 249–251 (2013)
4. Dana, A., Babaei, M.H.: A fuzzy based stable routing algorithm for MANET. Int. J. Comput.
Sci. 8(1), 367–371 (2011)
5. Pudashine, K., Gawain, D.: Comparative analysis of unipath and multipath reactive routing
protocols in mobile ad hoc network. Int. J. Res. (IJR) 1(6), 705–708 (2014)
6. Singhal, A., Daniel, A.K.: Fuzzy logic based stable on-demand multipath routing protocol
for mobile ad hoc network. In: Fourth International Conference on Advanced Computing &
Communication Technologies (2014)
7. Chaythanya, B.P., Ramya, M.M.: Fuzzy logic based approach for dynamic routing in
MANET. Int. J. Eng. Res. Technol. 3(6), 1437–1441 (2014). ISSN: 2278-0181

8. Cunha, F., Villas, L., Boukerche, A., Maia, G., Viana, A., Mini, R.A., Loureiro, A.A.: Data
communication in VANETs: protocols, applications and challenges. Ad Hoc Netw. 44, 90–
103 (2016)
9. Abbas, N.I., Ilkan, M., Ozen, E.: Fuzzy approach to improving route stability of the AODV
routing protocol. EURASIP J. Wirel. Commun. Netw. 2015(1), 235 (2015)
10. Sayedahmed, H.A., Fahmy, I.M., Hefny, H.A.: Improving multiple routing in mobile ad hoc
networks using fuzzy models. In: International Conference on Advanced Intelligent Systems
and Informatics, pp. 642–654. Springer, Cham (2017)
11. Ghayvat, H., Pandya, S., Shah, S., Mukhopadhyay, S.C., Yap, M.H., Wandra, K.H.:
Advanced AODV approach for efficient detection and mitigation of wormhole attack in
MANET. In: 2016 10th International Conference on Sensing Technology (ICST), pp. 1–6.
IEEE (2016)
12. Trivedi, N., Kumar, G., Raikwar, T.: Performance and evolution of routing protocol DSR,
AODV and AOMDV in MANET. Int. J. Comput. Appl. 109(8), 1–8 (2015)
13. Palta, P., Goyal, S.: Comparison of OLSR and TORA routing protocols using OPNET
modeler. Int. J. Eng. Res. Technol. 1(5), 984–990 (2012)
14. Goyal, P., Parmar, V., Rishi, R.: MANET: vulnerabilities, challenges, attacks, application.
IJCEM Int. J. Comput. Eng. Manage. 11, 32–37 (2011)
15. Motegi, S., Horiuchi, H.: Proposal on AODV-based multipath routing protocol for mobile ad
hoc networks. In: Proceedings of of First International Workshop on Networked Sensing
System (2004)
16. Upadhayaya, S., Gandhi, C.: Node disjoint multipath routing considering link and node
stability protocol: a characteristic evaluation. Int. J. Comput. Sci. 7(1) (2010). No. 2
17. Mallapur, S.V., Patil, S.R.: Stable backbone based multipath routing protocol for mobile ad-
hoc network. In: International Conference on Circuits, Power and Computing Technologies,
ICCPCT (2013)
18. Sharma, V., Alam, B., Doja, M.N.: Fuzzy weighted metrics routing in DSR in MANTs. In:
Proceedings of First International Conference on Information and Communication
Technology for Intelligent Systems, vol. 2 (2016)
19. King, R.S.: Cluster Analysis and Data Mining: An Introduction. Stylus Publishing, LLC,
Sterling (2015)
20. Conti, M., Giordano, S.: Mobile ad hoc networking: milestones, challenges, and new
research direction. IEEE Commun. Mag. 52(1), 85–96 (2014)
21. Lou, W., Fang, Y.: A survey of Wireless Security in Mobile Ad Hoc Networks: Challenges
and Available Solutions. Kluwer Academic Publishers, Dordrecht (2013)
22. Safa, H., Artail, H., Tabet, D.: A cluster-based trust-aware routing protocol for mobile ad hoc
networks. Wirel. Netw. 16(4), 969–984 (2010)
23. Yi, J., Adnane, A., David, S., Parrein, B.: Multipath optimized link state routing for mobile
ad hoc networks. Ad Hoc Netw. 9(1), 28–47 (2011)
24. Hassnawi, L.A., Ahmad, R.B., Yahya, A., Aljunid, S.A., Elshaikh, M.: Performance analysis
of various routing protocols for motorway surveillance system cameras’ network. Int.
J. Comput. Sci. 9(2), 7 (2012)
25. Maurya, P.K., Sharma, G., Sahu, V., Roberts, A., Srivastava, M.: An overview of AODV
routing protocol. Int. J. Mod. Eng. Res. 2(3), 728–732 (2012)
26. Wang, W., Pedrycz, W., Liu, X.: Time series long-term forecasting model based on
information granules and fuzzy clustering. Eng. Appl. Artif. Intell. 41, 17–24 (2015)
27. Nadiri, A.A., Gharekhani, M., Khatibi, R., Moghaddam, A.A.: Assessment of groundwater
vulnerability using supervised committee to combine fuzzy logic models. Environ. Sci.
Pollut. Res. 24(9), 8562–8577 (2017)

28. Sharma, V., Alam, B., Doja, M.N.: An improvement in DSR routing protocol of MANETs
using ANFIS. In: Malik, H., Srivastava, S., Sood, Y.R., Ahmad, A. (eds.) Applications of
Artificial Intelligence Techniques in Engineering. AISC, vol. 697, pp. 569–576. Springer,
Singapore (2019). https://doi.org/10.1007/978-981-13-1822-1_53
29. Fahad, T.O., Ali, A.A.: Compressed fuzzy logic based multi-criteria AODV routing in
VANET environment. Int. J. Electr. Comput. Eng. (IJECE) 9(1), 397–401 (2019)
30. Arulkumaran, G., Gnanamurthy, R.K.: Fuzzy trust approach for detecting black hole attack
in mobile adhoc network. Mob. Netw. Appl. 24(2), 386–393 (2019)
A Multi-channel Speech Enhancement Method
Based on Subband Affine Projection Algorithm
in Combination with Proposed Circular Nested
Microphone Array

Ali Dehghan Firoozabadi¹, Pablo Irarrazaval², Pablo Adasme³, Hugo Durney¹, Miguel Sanhueza Olave¹, David Zabala-Blanco⁴, and Cesar Azurdia-Meza⁵
¹ Department of Electricity, Universidad Tecnológica Metropolitana, Av. Jose Pedro Alessandri 1242, 7800002 Santiago, Chile. {adehghanfirouzabadi,hdurney,msanhueza}@utem.cl
² Electrical Engineering Department, Pontificia Universidad Católica de Chile, Santiago, Chile. pim@uc.cl
³ Electrical Engineering Department, Universidad de Santiago de Chile, Santiago, Chile. pablo.adasme@usach.cl
⁴ Department of Computing and Industries, Universidad Católica Del Maule, 3466706 Talca, Chile. davidzabalablanco@hotmail.com
⁵ Department of Electrical Engineering, Universidad de Chile, Santiago, Chile. cazurdia@ing.uchile.cl

Abstract. In this paper, a novel multi-channel speech enhancement system is introduced based on a proposed circular nested microphone array (C-NMA) in combination with a subband affine projection algorithm (SB-APA). Multi-channel speech enhancement methods have better accuracy than single-channel methods because of information redundancy. Firstly, a novel C-NMA with low computational complexity compared with other speech recording microphone arrays is proposed; the C-NMA eliminates spatial aliasing in the microphone signals. Then, a subband step is implemented based on the speech components to increase the frequency resolution. The affine projection algorithm is implemented adaptively on the subband signals of the C-NMA. Finally, the subband signals are combined by the synthesis filters and the enhanced signal is produced. The accuracy of the proposed method is compared with least mean square (LMS), traditional APA, recursive least square (RLS), and real-time generalized cross-correlation non-negative matrix factorization (RT-GCC-NMF). The results show the superiority of the proposed method over previous works in noisy and reverberant environmental conditions.

Keywords: Multi-channel speech enhancement · Adaptive filters · Affine projection algorithm · Speech processing · Subband processing


1 Introduction

Speech enhancement is one of the most important research fields in speech communication systems and smart meeting rooms. The aim of enhancement algorithms is to reduce the level of noise and reverberation without introducing speech distortion. In the recent decade, noise cancellation of the speech signal has been an important task, especially in mobile system development [1, 2]. The aim of speech enhancement systems is to increase intelligibility for the human hearing system or to enhance the features extracted from the speech signal in recognition systems. The undesirable noise factors in a speech signal are classified by the way they combine with it: additive, multiplicative, or convolutive noise. Different methods have been proposed for speech enhancement because of the various applications; increasing speech intelligibility and decreasing distortion are two important goals of speech enhancement systems.
Various single- and multi-channel speech enhancement methods have been proposed in recent years. Machine learning-based speech enhancement methods have been developed for many applications; for example, the non-negative matrix factorization (NMF) method is based on decomposing a matrix into basis and activation factors with non-negative elements [3, 4]. Methods based on adaptive filters [5] and subspace decomposition algorithms [6] are important in multi-channel conditions. Well-known algorithms in denoising systems include optimal distributed minimum-variance beamforming [7] and the perceptual-properties-based subspace method [8]. Other speech enhancement methods are based on adaptive filters, where the most important algorithms are least mean square (LMS) [9], recursive least square (RLS) [10], and the affine projection algorithm (APA) [11]. The APA is a generalized form of the normalized LMS denoising method, and it offers a trade-off between the low convergence rate of the LMS algorithm and the high computational complexity of the RLS method.
In this paper, a multi-channel speech enhancement method is introduced based on the proposed circular NMA (C-NMA) in combination with the subband APA (SB-APA). A nested microphone array increases the accuracy of speech enhancement methods because of information redundancy, but spatial aliasing is one of the challenges in the use of microphone arrays. Firstly, the C-NMA is proposed to eliminate spatial aliasing, and the array dimensions are designed to be applicable in real conditions. Since the speech components differ across frequency bands, a subband method is used for speech signal processing; this provides high frequency resolution for the low-frequency speech components. Finally, the AP algorithm, as an adaptive speech enhancement method, is implemented on the subband signals of the C-NMA. Since each APA block works on a specific subband with limited information, both accuracy and speed of convergence increase. In the last step, synthesis filters are applied to generate the enhanced speech signal. The proposed SB-APA system is compared with LMS, traditional APA, RLS, and real-time generalized cross-correlation non-negative matrix factorization (RT-GCC-NMF) [12]. The results show the superiority of the proposed system over previous works in all environmental conditions.

In Sect. 2, the microphone signal model, proposed C-NMA, and SB-APA are
presented. Section 3 shows the simulation and results of the proposed method in
comparison with other previous works. Section 4 includes some conclusions.

2 The Proposed Speech Enhancement System Based


on the C-NMA and SB-APA

2.1 Microphone Signal Model


The ideal and the real microphone signal models are both used in speech enhancement applications. In this paper, the real model is selected to reproduce conditions similar to real environments. The microphone signal model in a real scenario is defined as:

x_m[n] = s[n] * c_m[d̃(s), n] + v_m[n],    (1)

where x_m[n] is the signal received at the m-th microphone, s[n] is the speech signal at the sound source, c_m is the impulse response between the source and the m-th microphone, v_m[n] is the additive noise at the m-th microphone, d̃(s) is the distance between the source and the m-th microphone, and * denotes convolution.
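The following minimal NumPy sketch builds a synthetic microphone observation according to Eq. (1). The white-noise source, the toy impulse response, and the target SNR are illustrative assumptions, not data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(fs)              # 1-second stand-in for the clean source s[n]

c_m = np.zeros(2000)                     # toy impulse response for microphone m
c_m[[0, 400, 900]] = [1.0, 0.5, 0.25]    # direct path plus two assumed reflections

x_clean = np.convolve(s, c_m)[:s.size]   # s[n] convolved with c_m
snr_db = 0                               # target SNR of the additive noise v_m[n]
noise = rng.standard_normal(x_clean.size)
noise *= np.sqrt(np.mean(x_clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
x_m = x_clean + noise                    # Eq. (1): reverberant speech plus additive noise
print(x_m.shape)
```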

2.2 The Proposed Circular Nested Microphone Array


A microphone array increases the performance of speech enhancement algorithms by providing more information, but spatial aliasing caused by the microphone distances corrupts the microphone information and decreases the speech enhancement accuracy. A NMA has the capability to eliminate spatial aliasing. In this section, a C-NMA is proposed whose characteristics are fixed in the horizontal plane for different speaker directions because of its circular shape [13]. Figure 1 shows the block diagram of the system proposed in this paper; the C-NMA part is shown on the left side and is defined by its analysis filters and down-sampler blocks.

Fig. 1. The block diagram of the proposed method for speech enhancement by considering the
C-NMA.

The proposed C-NMA is designed to cover the frequency range [50–7800] Hz with sampling frequency F_s = 16000 Hz. The proposed array is structured as 4 sub-arrays. The first sub-array is designed for the frequency range B1 = [3900–7800] Hz with central frequency f_c1 = 5850 Hz. The inter-microphone distance (d) is chosen as d < λ/2 (λ is the wavelength of the highest frequency component) to avoid spatial aliasing, which gives d1 < 2.2 cm for the first sub-array. The second sub-array covers the frequency range B2 = [1900–3900] Hz with central frequency f_c2 = 2900 Hz; the inter-microphone distance is d2 = 2·d1 < 4.4 cm. The third sub-array is defined for the frequency range B3 = [950–1900] Hz with central frequency f_c3 = 1425 Hz; its inter-microphone distance is d3 = 4·d1 < 8.8 cm. Finally, the fourth sub-array is designed for the frequency range B4 = [50–950] Hz with central frequency f_c4 = 500 Hz and inter-microphone distance d4 = 8·d1 < 17.6 cm. Table 1 summarizes the information used to design the C-NMA.

Table 1. The information to design analysis filters for C-NMA and subband filter bank.
Band | Bandwidth | Analysis filter bank (subband processing) | f_c (Hz) | d (cm)
1 | B1 = [3900–7800] Hz | B1,1 = [6825–7800] Hz; B1,2 = [5850–6825] Hz; B1,3 = [4875–5850] Hz; B1,4 = [3900–4875] Hz | 5850 | <2.2
2 | B2 = [1900–3900] Hz | B2,1 = [2900–3900] Hz; B2,2 = [1900–2900] Hz | 2900 | <4.4
3 | B3 = [950–1900] Hz | B3,1 = [1425–1900] Hz; B3,2 = [950–1425] Hz | 1425 | <8.8
4 | B4 = [50–950] Hz | B4,1 = [500–950] Hz; B4,2 = [50–500] Hz | 500 | <17.6
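The spacing column of Table 1 follows from the d < λ/2 rule. The short sketch below recomputes it, assuming a speed of sound of 343 m/s, and also applies the doubling rule d_i = 2^(i-1)·d_1 used in the text, which yields the 2.2/4.4/8.8/17.6 cm bounds.

```python
C = 343.0                                 # assumed speed of sound (m/s)
f_max = [7800, 3900, 1900, 950]           # upper edge of each sub-band (Hz)
d1 = 100 * C / (2 * f_max[0])             # strictest lambda/2 bound, in cm (about 2.2)
for i, f in enumerate(f_max, start=1):
    alias_bound = 100 * C / (2 * f)       # lambda/2 bound for this band
    used_bound = d1 * 2 ** (i - 1)        # the paper's d_i = 2^(i-1) * d_1 rule
    print(f"sub-array {i}: lambda/2 = {alias_bound:.1f} cm, design bound < {used_bound:.1f} cm")
```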

The C-NMA is built so that the closest microphones are paired according to the designed structure. Therefore, the first sub-array contains the microphone pairs {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}, {6, 7}, {7, 8}, and {8, 1} with inter-microphone distance d1 = 2.2 cm. The microphone pairs {1, 3}, {3, 5}, {5, 7}, {7, 1}, {2, 4}, {4, 6}, {6, 8}, and {8, 2} are allocated to the second sub-array with d2 = 4.2 cm. The third sub-array contains the microphone pairs {1, 4}, {2, 5}, {3, 6}, {4, 7}, {5, 8}, {6, 1}, {7, 2}, and {8, 3} with inter-microphone distance d3 = 5.6 cm. For the last sub-array, the microphone pairs {1, 5}, {2, 6}, {3, 7}, and {4, 8} are used, with d4 = 6 cm. The selected microphone pairs for each sub-array are shown in Fig. 2.
Each designed sub-array needs analysis and synthesis filters to avoid spatial aliasing. The analysis filters H_i(z) (i = 1, …, 4) are shown in Fig. 3a. A tree structure containing a high-pass filter HP_i(z), a low-pass filter LP_i(z), and a down-sampler block D_i is used to implement these analysis filters.

Fig. 2. The proposed C-NMA with allocated microphones for each sub-array.

Fig. 3. The tree structure of a) analysis filters, and b) synthesis filters for C-NMA.

H_1(z) = HP_1(z)
H_2(z) = LP_1(z) HP_2(z^2)
H_3(z) = LP_1(z) LP_2(z^2) HP_3(z^4)            (2)
H_4(z) = LP_1(z) LP_2(z^2) LP_3(z^4)

The synthesis filters G_i(z) are the inverse versions of the analysis filters and are designed with the same tree structure. Figure 3b shows the tree structure of the synthesis filters.
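A minimal SciPy sketch of how such a tree-structured analysis filter bank could be assembled. It assumes a single half-band prototype pair reused at every stage, whereas the paper designs separate HP_i/LP_i filters per stage, so this is only an approximation of Eq. (2).

```python
import numpy as np
from scipy.signal import firwin

lp = firwin(65, 0.5)                      # linear-phase low-pass prototype, cutoff at half Nyquist
hp = lp * (-1.0) ** np.arange(lp.size)    # spectral inversion gives the matching high-pass

def upsample_taps(h, m):
    """Replace z with z**m by inserting m-1 zeros between the taps."""
    out = np.zeros(m * (h.size - 1) + 1)
    out[::m] = h
    return out

# Tree-structured analysis filters corresponding to Eq. (2).
H1 = hp
H2 = np.convolve(lp, upsample_taps(hp, 2))
H3 = np.convolve(np.convolve(lp, upsample_taps(lp, 2)), upsample_taps(hp, 4))
H4 = np.convolve(np.convolve(lp, upsample_taps(lp, 2)), upsample_taps(lp, 4))
print([len(h) for h in (H1, H2, H3, H4)])
```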

2.3 The Proposed Subband-APA


Speech is a wideband, non-stationary signal; each frequency band of the speech signal therefore has different components from the other bands. Speech also exhibits windowed-disjoint orthogonality (W-DO) [14], which means that each time-frequency (TF) point of the speech spectrum is, with high probability, related to just one speaker. Subband processing increases the accuracy and the speed of convergence of the AP algorithm. The output of the analysis filter bank in the C-NMA is:

x_{m,i}[n] = x_m[n] * h_i[n],  m = 1, …, 8,  i = 1, …, 4,    (3)

where h_i[n] is the analysis filter of the C-NMA and x_{m,i}[n] is the filter output. These signals then enter the analysis filter bank for subband processing:

y_{m,i,j}[n] = x_{m,i}[n] * f_j[n],  j = 1, …, 10,    (4)

where f_j[n] is the analysis filter of the proposed subband processing shown in Table 1.

The AP algorithm is implemented on each subband microphone signal from the analysis filter bank. The AP algorithm was proposed to increase the speed of convergence of gradient-based algorithms; in this paper, the speed of convergence and the accuracy of the AP algorithm are improved by using subband data [15].
The core of the AP algorithm is the filter update equation, in which N input vectors (N is the projection order) are used instead of the single vector used in the normalized least mean square (NLMS) algorithm. The change in the L adaptive filter coefficients is:

Δw_L[n] = w_L[n] − w_L[n−1].    (5)

To compute the AP coefficients, Eq. (6) is minimized subject to the N constraints in Eq. (7):

||Δw_L[n]||^2 = Δw_L^T[n] Δw_L[n],    (6)

w_L^T[n] y_{m,i,j(L)}[n−k] = d[n−k],  k = 0, …, N−1,    (7)

where y_{m,i,j(L)}[n] is a subband vector of the speech signal and d[n] is the desired signal. Figure 4 shows the subband AP structure used in the proposed denoising method.

Fig. 4. The adaptive structure in subband AP algorithm for the proposed denoising system.

Solving this minimization problem yields the update equation of the AP algorithm:

w_L[n] = w_L[n−1] + A^T[n] (A[n] A^T[n])^{-1} e_N[n],    (8)

where

A[n] = [ y_{m,i,j(L)}[n], y_{m,i,j(L)}[n−1], …, y_{m,i,j(L)}[n−N+1] ]^T.    (9)

e_N[n] is the N × 1 vector

e_N[n] = d_N[n] − A[n] w_L[n−1],    (10)

where d_N[n] is the desired signal of size N × 1:

d_N^T[n] = ( d[n], d[n−1], …, d[n−N+1] ).    (11)

Equation (8) can be rewritten to show the effect of the affine projection method:

w_L[n] = w_L[n − 1 − a(N−1)] + μ A_s^T[n] (A_s[n] A_s^T[n] + δI)^{-1} e_{Ns}[n],    (12)

with e_{Ns}[n] = d_{Ns}[n] − A_s[n] w_L[n − 1 − a(N−1)]. Also, A_s[n] is expressed as

A_s[n] = [ y_{m,i,j(L)}[n], y_{m,i,j(L)}[n−s], …, y_{m,i,j(L)}[n−(N−1)s] ]^T,    (13)

and

d_{Ns}^T[n] = ( d[n], d[n−s], …, d[n−(N−1)s] ).    (14)

As Eq. (12) shows, the N data vectors used to update the coefficients are not always taken from the most recent signal samples. In this paper, the SB-APA with parameters (a = 0, δ = 0, s = 1) [16] is combined with the C-NMA. The AP algorithm with subband processing increases the speed of convergence because each subband and microphone pair only deals with its own specific frequency components.
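The NumPy sketch below shows one way the affine projection update of Eqs. (8)-(11) could be realized for a single subband, in a toy system-identification setting. The function and variable names, the regularization delta, and the test scenario are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def apa_update(w, A, d_N, mu=1.0, delta=1e-6):
    """One affine projection step: w has shape (L,); A holds the last N input vectors,
    shape (N, L); d_N holds the last N desired samples, shape (N,). Eqs. (8) and (10)."""
    e = d_N - A @ w                                               # Eq. (10)
    g = np.linalg.solve(A @ A.T + delta * np.eye(A.shape[0]), e)  # (A A^T + delta I)^{-1} e
    return w + mu * A.T @ g                                       # Eq. (8)

# Toy run: identify a short FIR filter from noisy observations.
rng = np.random.default_rng(1)
L, N, n_samples = 16, 4, 4000            # filter length and projection order
h_true = rng.standard_normal(L)
x = rng.standard_normal(n_samples)
d = np.convolve(x, h_true)[:n_samples] + 0.01 * rng.standard_normal(n_samples)

w = np.zeros(L)
for n in range(L + N, n_samples):
    A = np.stack([x[n - k - L + 1:n - k + 1][::-1] for k in range(N)])  # rows: y[n-k]
    w = apa_update(w, A, d[n - N + 1:n + 1][::-1])
print("coefficient error:", np.linalg.norm(w - h_true))
```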

3 Simulation and Results

In this section, the accuracy of the proposed C-NMA combined with the SB-APA method is evaluated on data simulated from the TIMIT dataset [17]. A 7-second male speech signal is used for the evaluations. Additive white Gaussian noise is used to create conditions similar to a real environment. The image model [18] is selected to simulate the impulse responses of the experimental environment; this model generates the impulse response from the room dimensions, source location, microphone location, sampling frequency, impulse response length, and surface reverberation coefficients. The room dimensions and speaker location are (475, 592, 420) cm and (374, 146, 110) cm, respectively. The SNR values {−10, −5, 0, 5, 10, 15} dB are chosen for the simulations in noisy conditions. The room reverberation time (RT60) is set to 350 ms to approximate real environments. A Hamming window of 30 ms length is used for signal windowing in the simulations. The projection order is selected as N = 4 in APA(N) so that the method can be implemented in real time with a reasonable computational complexity. Also, the step size is set to μ = 1 to obtain a suitable speed of convergence.
Figure 5 shows the time-domain noisy signal (SNR = 0 dB), the noisy spectrum, the time-domain enhanced signal, and the spectrum of the enhanced signal. As seen in this figure, the proposed method eliminates the white background noise with high accuracy, and the distortion in the enhanced signal is very low. The proposed algorithm converges faster than previous works because it estimates the noise in each specific band, where the noise is more stationary.

Fig. 5. a) Time-frame of the noisy signal (SNR = 0 dB), and noisy signal spectrum, b) time-
frame of the enhanced speech signal and enhanced signal spectrum.

The proposed C-NMA with SB-APA is compared with the LMS [9], traditional APA [11], RLS [10], and RT-GCC-NMF [12] algorithms. The segmental signal-to-noise ratio (segmental SNR) and the perceptual evaluation of speech quality (PESQ) criteria are selected for comparing the proposed method with previous works. Figure 6 shows the comparison between these methods based on the considered criteria. As seen in these curves, the proposed C-NMA with SB-APA has better accuracy than previous works, especially at low SNRs, based on the segmental SNR and PESQ results; all methods give almost the same results as the SNR reaches 15 dB. The segmental SNR and PESQ are considered as quantitative and qualitative criteria, respectively, to show the superiority of the proposed method over previous works.

Fig. 6. The comparison between the proposed C-NMA with SB-APA method, LMS, traditional APA, RLS, and RT-GCC-NMF with a) Segmental SNR, and b) PESQ criteria.

4 Conclusions

Speech enhancement is one of the most important applications in speech processing. The proposed method in this paper is based on a novel C-NMA in combination with the SB-APA. The benefit of a microphone array is information redundancy, which increases the accuracy of the enhancement algorithms, but spatial aliasing appears because of the inter-microphone distances; the C-NMA is proposed to eliminate this spatial aliasing, and its output signals increase the speech enhancement accuracy. Then, a subband processing step is proposed to obtain higher frequency resolution in the low-frequency components, where the speech signal carries more information. Finally, the SB-APA is implemented on these subband nested microphone array signals as a common method for speech enhancement. The projection order is selected as N = 4 to allow real-time implementation with a reasonable computational complexity. The proposed method is compared with the LMS, traditional APA, RLS, and RT-GCC-NMF algorithms based on the segmental SNR and PESQ criteria. The results show the superiority of the proposed method over previous works, especially at low SNRs, with a low distortion level.

Acknowledgment. The authors acknowledge financial support from: FONDECYT


No. 3190147 and FONDECYT No. 11180107.

References
1. Prasad, P.B.M., Ganesh, M.S., Gangashetty, S.V.: Two microphone technique to improve
the speech intelligibility under noisy environment. In: 14th International Colloquium on
Signal Processing & Its Applications, Batu Feringghi, pp. 13–18 (2018)
2. Fukui, M., Shimauchi, S., Hioka, Y., Nakagawa, A., Haneda, Y.: Acoustic echo and noise
canceller for personal hands-free video IP phone. IEEE Trans. Consum. Electron. 62(4),
454–462 (2016)
3. Kwon, K., Shin, J.W., Kim, N.S.: NMF-based speech enhancement using bases update.
IEEE Signal Process. Lett. 22(4), 450–454 (2015)
4. Chung, H., Badeau, R., Plourde, E.: Training and compensation of class-conditioned NMF
bases for speech enhancement. Neurocomputing 284, 107–118 (2018)
5. Compernolle, D.V.: Switching adaptive filters for enhancing noisy and reverberant speech
from microphone array recordings. In: IEEE International Conference on Acoustics, Speech
and Signal Processing, Albuquerque, USA, pp. 833–836 (1990)
6. Ephaim, Y., Van Trees, H.L.: A signal subspace approach for speech enhancement. IEEE
Trans. Speech Audio Process. 3, 251–266 (1995)
7. Markovich-Golan, S., Bertrand, A., Moonen, M.: Optimal distributed minimum-variance
beamforming approaches for speech enhancement in wireless acoustic sensor networks.
IEEE Trans. Sig. Process. 107, 4–20 (2015)
8. Cheng, N., Liu, W.: Perceptual properties based signal subspace microphone array speech
enhancement algorithm. Acta Autom. Sin. 35(12), 1481–1487 (2009)
9. Haykin, S.: Adaptive Filter Theory, 3rd edn. Prentice Hall Inc., Pearson (1996)
10. Rakesh, P., Kishore Kumar, T.: A novel RLS based adaptive filtering method for speech
enhancement. In: 17th International Conference on Communications, Control and Signal
Processing, London, pp. 176–181 (2015)
11. Gonzalez, A., Ferrer, M., Albu, F., Diego, M.: Affine projection algorithms: evolution to smart and fast algorithms and applications. In: Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, pp. 1965–1969 (2012)
12. Wood, S.U.N., Rouat, J.: Unsupervised low latency speech enhancement with RT-GCC-
NMF. IEEE J. Sel. Top. Sig. Process. 13, 332–346 (2019)
13. Firoozabadi, A.D., Abutalebi, H.R.: Combination of nested microphone array and subband
processing for multiple simultaneous speaker localization. In: 6th International Symposium
on Telecommunications (IST), Tehran, pp. 907–912 (2012)
14. Yilmaz, O., Rickard, S.: Blind separation of speech mixtures via time-frequency masking.
IEEE Trans. Signal Process. 52, 1830–1847 (2004)
15. Gonzalez, A., Ferrer, M., Albu, F., Diego, M.D.: Affine projection algorithms: Evolution to
smart and fast algorithms and applications. In: Proceedings of the 20th European Signal
Processing Conference (EUSIPCO), Bucharest, pp. 1965–1969 (2012)
16. Ozeki, K., Umeda, T.: An adaptive filtering algorithm using an orthogonal projection to an
affine subspace and its properties. Electron. Commun. Jpn. 67(5), 19–27 (1984)
17. Garofolo, J.S., et al.: TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. Web
Download. Philadelphia: Linguistic Data Consortium (1993). https://catalog.ldc.upenn.edu/
LDC93S1. Accessed March 2019
18. Allen, J., Berkley, D.: Image method for efficiently simulating small-room acoustics.
J. Acoust. Soc. Am. 65(4), 943–950 (1979)
Game Theoretic Approach to Optimize
Exploration Parameter in ACO MANET
Routing

Marwan A. Hefnawy and Saad M. Darwish
Department of Information Technology, Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
marwan.hefnawy@gmail.com, saad.darwish@alexu.edu.eg

Abstract. A Mobile Ad hoc Network (MANET) is a set of communicating


mobile devices (nodes) without network infrastructure. Finding the optimum
path between communicating nodes in MANETs is a challenging issue. This is
due to the dynamic nature of nodes and the lack of a central routing authority in
the network. The nature of the problem guided many researchers to follow the
Ant Colony Optimization (ACO) approach because of the similarity between the
two processes. In ACO, communication packets are simulated as real ants that
come out of their nest and search for food. ACO protocols face a trade-off between exploring new routes and exploiting the best routes followed by other ants. The two contradicting behaviors are tuned in the ACO model by a set of parameters. In this research, we introduce a novel game-theoretic approach to determine an online balance between route exploration and exploitation in ACO-based MANET routing. This approach combines the adaptability of online parameter tuning with the low computational cost of game theory. Experimental results show that this approach outperforms competitive algorithms.

Keywords: ACO · Mobile ad hoc networks · AntHocNet · Game theory · Parameter optimization

1 Introduction

Mobile ad hoc networks are essential in many real-life activities where it is not possible to establish a permanent communication infrastructure. They are used extensively in the military sector, in rescue and disaster-relief activities, and in vehicular networking [1, 2]. Routing in MANETs is a challenging research area because of: (1) the mobility of nodes, which leads to a dynamic network topology, (2) the absence of dedicated router devices, and (3) the limited power resources available to nodes, which makes them leave the network arbitrarily [3]. These limitations have led to many approaches for finding the best routes from a source node to a destination node. Routing protocols in MANETs are classified into four main categories based on the underlying architecture of the network: reactive protocols, table-driven protocols, hybrid routing protocols, and location-aware protocols; see [4, 5] for more details.


ACO is a type of swarm intelligence optimization that has many real-life uses, especially in problems that need autonomous agent solutions and lack a central administration authority [6]. In ACO-based MANET routing, individual ant agents stochastically try several routes from the source node to the destination node and deposit pheromone amounts that indicate the quality of the traversed routes. The deposited pheromone amounts from all passing ant agents accumulate on the route, while a continuous evaporation (decrement) process reduces the pheromone amount on all routes over time in order to identify unused routes. With time and the passage of more ant agents, the optimal routes become distinguished by a heavy pheromone concentration, while abandoned routes are marked by low pheromone amounts. Successor ant agents have to decide whether to follow the optimal routes used by their ancestors or to try new routes and keep them as good backups for future use in case the current optimal routes fail [7].
ACO-based routing algorithms follow the same classification as the general MANET routing algorithms discussed earlier: there are table-driven (proactive) protocols, on-demand (reactive) protocols, and hybrid protocols [7]. Parameter tuning in ACO is the process of finding the optimal values of a certain parameter or a combination of parameters; in the literature, it is classified into offline tuning and online tuning [8]. To the best of our knowledge, only a few researchers, such as Deepalakshmi et al. [9], have dealt with online parameter tuning in ACO-based MANET routing.

1.1 Contribution of This Work


The algorithm introduced in this paper is a game-theoretic enhancement of the AntHocNet routing algorithm. Our contribution is using game theory to perform online tuning of the β parameter that controls the exploration-exploitation balance during the reactive path setup phase. The players of this game are the two competing concepts, exploration and exploitation; the target of each player is to dominate the behavior of most of the crawling ants. Online parameter tuning makes routing algorithms more flexible to changes in the MANET environment, but online parameter tuning algorithms require heavy calculations compared with offline parameter tuning. In the proposed algorithm, we benefit from the low computational cost of game theory and the high adaptability of online parameter tuning.
The rest of this paper is organized as follows: Sect. 2 investigates the related work in the literature, Sect. 3 introduces the proposed algorithm, Sect. 4 presents the experimental results, and Sect. 5 summarizes the conclusion.

2 Literature Survey

Ad hoc On-demand Distance Vector (AODV) is one of the most famous reactive routing protocols in the MANET environment [3]. In this algorithm, the source node searches for the destination node in its routing table. Although this algorithm ensures reaching the destination, it suffers from high routing overhead. Game-theoretic approaches have been suggested for MANET routing from many perspectives. Naserian et al. [10] used game theory to control the flooding behavior in the AODV protocol: each intermediate node that receives an RREQ packet to propagate further inside the network is considered a game player, whose strategy is either to forward the packet or to drop it. The trade-off is between the cost of forwarding the packet and the network gain factor. This method limits the nodes that receive the RREQ packet across the network and hence reduces the flooding effect in the network.
Regarding the implementation of ACO in MANET routing, the authors of [11]
introduced the AntHocNet algorithm. It is a hybrid ACO routing algorithm in MANET.
It is composed of reactive and proactive components. Route quality is determined by a
combination of several metrics such as delay time, number of hops, etc. The contin-
ually changing measurements of quality metrics are transformed, in the algorithm, into
pheromone values that indicate the relative goodness of each route. In this case, the
ant agents choose either following the routes having the highest pheromone values or
exploring new routes randomly. One parameter in the equation that controls the
exploratory ants' movement is responsible for either increasing the exploration
behavior or increasing the exploitation behavior of the ants. The AntHocNet uses an
offline parameter tuning approach with a predefined set of values for all parameters.
This makes the algorithm less adaptable to network environment changes.
Parameter tuning in ACO for MANET routing has been investigated through several
approaches. Deepalakshmi et al. [9] used an online particle swarm optimization
algorithm to optimize three ACO parameters in MANET routing. The algorithm starts
with a predefined set of values for the parameters, then it gathers the performance
values associated with the current values of the parameters and makes a correlation
between the inputs and the outputs. If there is an enhancement in the performance from
the old instance to the new instance, then the algorithm continues changing the
parameters’ values in the same manner (increment or decrement). Otherwise, it reverses
the changing manner of the parameters that caused performance degradation. Although
the algorithm reaches optimal or near-optimal values for the parameters, it
suffers from a high computational time. Sandhya et al. [12] implemented a fuzzy
approach to tune the ACO parameters in a vehicle routing environment. They applied
three online fuzzy adaptive strategies to optimize a combination of the ACO parameters
in each strategy. Although the fuzzy approach has a low computational cost, it needs an
expert system to assign the membership function of the system.

3 Proposed Algorithm
3.1 ACO Component
ANTHOCNET [6] routing protocol is used as a general framework for finding routes
inside the network. In ANTHOCNET, during the reactive path setup phase, if a source
node s requires to start a communication session with a destination node d, the node s
checks if it has a pheromone value for d in its routing table. If no route information to

destination d exists at s, then it broadcasts forward exploratory ants throughout the


network. At each intermediate node i, the forward ant is further broadcasted to all its
neighboring nodes with a weighted mechanism. The next hop n for the forward ant is
chosen with probability P_nd based on the relative goodness of node n to deliver the
forward ant to the destination d. Every node i keeps a pheromone table T^i that is used
as a routing table. The rows of the table are the possible destination nodes, while its
columns are the neighboring nodes that exist in the direct communication range of the
node that carries the table. The entries of the table are real numbers T^i_nd that represent
the relative goodness of going from the current node i to the destination node d through the
neighboring node n. The probability of choosing a certain neighbor n as the next hop
for the exploratory ants is calculated as follows:

P_{nd} = \frac{(T^i_{nd})^{\beta}}{\sum_{j \in N^i_d} (T^i_{jd})^{\beta}}, \quad \beta \ge 1    (1)

where N^i_d is the set of all neighboring nodes of i. The T^i_nd values are accumulative
numbers and are calculated as follows. When the forward ants reach the destination
node d, they are transformed into backward ants that traverse back the same route they
followed for arriving at d. During the return trip, the backward ant collects route quality
information such as end-to-end delay, number of hops, signal quality, etc. Upon
reaching the intermediate node i, the backward ant carries the path information in a
value called s^i_d. This value updates the pheromone table as follows:

T^i_{nd} = \gamma\, T^i_{nd} + (1 - \gamma)\, s^i_d    (2)

That is, the updated state of the route (s^i_d) is incorporated into the pheromone table and is
added to the existing historical value in the table T^i_nd with a weighted mechanism as
described in Eq. 2. The γ parameter is set to 0.7 in ANTHOCNET [6].
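As an illustration of how Eqs. 1 and 2 work together, the following minimal Python sketch draws a next hop for a forward ant and folds a new path-quality sample into the pheromone table (the dictionary-based table, the neighbor names, and the sample values are illustrative assumptions, not part of the actual ANTHOCNET implementation):

```python
import random

def choose_next_hop(pheromone_row, beta):
    """Eq. 1: pick neighbor n with probability proportional to (T_nd)^beta."""
    weights = {n: t ** beta for n, t in pheromone_row.items()}
    total = sum(weights.values())
    r = random.uniform(0.0, total)
    acc = 0.0
    for neighbor, w in weights.items():
        acc += w
        if r <= acc:
            return neighbor
    return neighbor  # guard against floating-point round-off

def update_pheromone(pheromone_row, neighbor, s_d, gamma=0.7):
    """Eq. 2: T_nd <- gamma * T_nd + (1 - gamma) * s_d, with gamma = 0.7."""
    pheromone_row[neighbor] = gamma * pheromone_row[neighbor] + (1.0 - gamma) * s_d

# Pheromone entries of node i towards destination d, one per neighbor (illustrative).
row = {"n1": 0.2, "n2": 0.9, "n3": 0.4}
nxt = choose_next_hop(row, beta=20)
update_pheromone(row, nxt, s_d=0.8)
```

With a high β such as 20, the weighting is sharply skewed towards the best entry, which reproduces the exploitation-heavy behavior discussed next.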
Low values of β lead to considering all neighboring nodes of i to be of nearly
similar goodness as possible routes to destination d and hence lead to what the authors
of ANTHOCNET called ant proliferation. In this case, the network is overwhelmed
with the exploratory ants, which represents a burden on the MANET. On the other
hand, it was found experimentally in their research that optimal network performance is
obtained for β = 20. The value of the β parameter in their research was fixed during the
simulation execution. In our algorithm, we need to keep a table whose columns
represent the performance metrics gathered from the network through the s^i_d data.
The rows of the new table are the last five values of s^i_d for each performance metric.
The aim is to use these values in the game theory calculations.

3.2 Game Theory Component


Formal Definition of the Problem: In a normal form game, we have a set I of m players,
a strategy profile S, and a set U of utility functions. Any individual player k ∈ I has a set
S_k of strategies from which the player chooses only one strategy s_k to play, such that
s_k ∈ S_k. The whole game strategy profile is a vector s = {s_1, s_2, ..., s_m}, that is, the
strategies chosen by all players. The utility function u_k(s) is the gain of player k when
the strategy profile s is chosen by all players. To refer to the strategies chosen by all
players except player k, we write s_{-k}. So, we can rewrite the strategy profile chosen by
all players as s = {s_k, s_{-k}}. Nash equilibrium is a strategy profile in which
no player gains any benefit from changing its strategy unilaterally. So, the strategy
profile s* = {s*_1, s*_2, ..., s*_m} is said to be a Nash equilibrium of the game if [13]:

\forall k \in I \text{ and } \forall s_k \in S_k: \quad u_k(s^*_k, s^*_{-k}) \ge u_k(s_k, s^*_{-k})    (3)

The aim of our model is to achieve the Nash equilibrium and set the optimal value of
the exploration parameter β in Eq. 1. The flowchart of the proposed algorithm is
illustrated in Fig. 1.

Fig. 1. Flowchart of the proposed routing algorithm

– Implementation of game theory in the proposed algorithm: Table 1 contains the


mapping between game theory terminology and its implementation in our problem.
So, the measured network performance metrics are used as input for changing the β
parameter to decide the chosen strategy for each player. On the other hand, these
metrics change based on the new β parameter's value and are collected again

to calculate the player’s utility (payoff) value. The game is played until we obtain
the equilibrium.

Table 1. Modelling of the current problem in game theory

Game terminology | Corresponding elements in the MANET
Players | Exploration and Exploitation
Strategy | Action taken by the player to get maximum benefit. In our environment, it is setting the values of β. This action is taken based on the incoming measurements of the network performance metrics
Utility function | Network performance metrics after setting the new β value

– Strategies of the players: We associated the exploration player’s strategy with the
Signal to Noise Ratio (SNR) metric, and associated the exploitation player’s
strategy to the End to End Delay (EED) metric. Definitions of the selected metrics
are found in [14]. The raw measurement of each metric is transformed into (Low,
Moderate, and High) categories. The boundaries between Low, Moderate, and High
categories of a certain metric are arbitrary and are metric-environment dependent.
As an example, for the EED metric, if we are operating a live video streaming
application, then the boundaries should be set to very small values of the time. In
other non-live applications, we can relax the values of Low, Moderate, and High
quality of the metric. Other player-metric associations can also be experimented with.
The exploration player has a permanent target of lowering the β value to achieve
higher exploration, and vice versa for the exploitation player. Since the β value is set in
the original ANTHOCNET to 1 ≤ β ≤ 20 [11], we set the limits of the exploration
player's freedom to change β within the range 1 ≤ β < β_Limit, and the Exploitation
player's freedom within β_Limit ≤ β ≤ 20. The parameter β_Limit is the limit that separates
the range that the first player can assign to β from the corresponding range of
player 2. We can formulate this as:

1 ≤ β_1 ≤ β_Limit ≤ β_2 ≤ 20    (4)

where β_k is the β value chosen by player k. This value of β_k is chosen in our algorithm
to be one of three points of the allowed range for player k, according to the incoming category of
the performance metric (Low, Moderate, or High). Table 2 demonstrates how each
player specifies its strategy.

Table 2. Calculations of the strategy of each player

Associated metric's value | Strategy of Exploration player | Strategy of Exploitation player
Low | β_1L = 1 | β_2L = β_Limit
Moderate | β_1M = 1 + (β_Limit − 1)/2 | β_2M = β_Limit + (20 − β_Limit)/2
High | β_1H = β_Limit | β_2H = 20

where β_kL, β_kM, β_kH are the strategies (β values) chosen by player k when the
associated metric returns Low, Moderate, and High measurements, respectively.
When the exploration player gets the worst measures for the associated metric, it is
forced to use the lowest possible β value in its allowed range, that is β = 1, and so on.
Generally, a player chooses the suitable β value according to the measure of the
associated network metric within the allowed range of β for that player. Since we have
only one β variable inside Eq. 1, we average the two β values calculated by the two
players. The payoff matrix of the proposed algorithm is shown in Table 3.

Table 3. The payoff matrix of the proposed algorithm

Player 1: Exploration (based on Signal to Noise Ratio) \ Player 2: Exploitation (based on EED) | s1: Low quality | s2: Moderate quality | s3: High quality
s1: Low quality | β = (β_1L + β_2L)/2 | β = (β_1L + β_2M)/2 | β = (β_1L + β_2H)/2
s2: Moderate quality | β = (β_1M + β_2L)/2 | β = (β_1M + β_2M)/2 | β = (β_1M + β_2H)/2
s3: High quality | β = (β_1H + β_2L)/2 | β = (β_1H + β_2M)/2 | β = (β_1H + β_2H)/2
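The strategy choices of Table 2 and the averaging in the payoff matrix of Table 3 can be sketched in a few lines of Python (a sketch under the paper's range 1 ≤ β ≤ 20 and the experimentally chosen β_Limit = 8; the lower-case category labels are illustrative):

```python
def exploration_beta(snr_category, beta_limit):
    """Exploration player's strategy (Table 2, range 1 .. beta_limit)."""
    return {"low": 1.0,
            "moderate": 1.0 + (beta_limit - 1.0) / 2.0,
            "high": beta_limit}[snr_category]

def exploitation_beta(eed_category, beta_limit):
    """Exploitation player's strategy (Table 2, range beta_limit .. 20)."""
    return {"low": beta_limit,
            "moderate": beta_limit + (20.0 - beta_limit) / 2.0,
            "high": 20.0}[eed_category]

def play_round(snr_category, eed_category, beta_limit=8.0):
    """One payoff-matrix entry of Table 3: the two chosen strategies are
    averaged into the single beta used in Eq. 1."""
    b1 = exploration_beta(snr_category, beta_limit)
    b2 = exploitation_beta(eed_category, beta_limit)
    return (b1 + b2) / 2.0

print(play_round("moderate", "high"))  # 12.25 with beta_limit = 8
```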

– Utility functions of the players: The utility function of player 1 (the Exploration player)
is SNR(β), and the utility function of player 2 (the Exploitation player) is EED(β). In
other words, after setting the β value in Table 3, we use Eq. 1 to forward several
exploratory ants and wait for the backward ants carrying the new SNR and EED
values corresponding to the selected β. We consider that the equilibrium is reached if,
after several iterations, no change in the value of the utility functions of both players
occurs, or at least the change is within a predetermined threshold. In our experi-
ments, we chose no more than a 2% change in 5 consecutive iterations.

Game Theater: The game in our algorithm is played during the reactive path setup
phase in ANTHOCNET. All the nodes involved in routing are eligible to initiate the
game to change the exploration behavior in regions of the network. A node initiates the
game after detecting degradation in the incoming path quality measures s^i_d over a certain
successive number of received ants with a certain threshold of degradation. We set a 30%
degradation of the metric's measure as the triggering event to start the game.
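A compact sketch of how a node could trigger and iterate this game is given below, reusing play_round from the previous sketch; the 30% degradation trigger and the 2%-over-5-iterations equilibrium test are the values stated above, while measure_metrics and categorize are hypothetical hooks standing in for the s^i_d feedback of the backward ants and its Low/Moderate/High binning:

```python
def should_start_game(recent_quality, baseline_quality):
    """Trigger: incoming path quality degraded by 30% or more."""
    return recent_quality <= 0.7 * baseline_quality

def equilibrium_reached(history, tol=0.02, window=5):
    """Nash equilibrium declared when both utilities (SNR and EED) change
    by at most 2% over 5 consecutive iterations."""
    if len(history) < window + 1:
        return False
    for old, new in zip(history[-window - 1:-1], history[-window:]):
        for key in ("snr", "eed"):
            if old[key] == 0 or abs(new[key] - old[key]) / abs(old[key]) > tol:
                return False
    return True

def run_game(measure_metrics, categorize, beta_limit=8.0, max_iters=50):
    """Iterate the two-player game until equilibrium or the iteration cap."""
    history, beta = [], beta_limit
    for _ in range(max_iters):
        metrics = measure_metrics(beta)      # e.g. {"snr": 18.0, "eed": 0.4}
        history.append(metrics)
        if equilibrium_reached(history):
            break
        beta = play_round(categorize("snr", metrics["snr"]),
                          categorize("eed", metrics["eed"]), beta_limit)
    return beta
```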

4 Experimental Results

In this section we compare the proposed algorithm’s performance against the original
ANTHOCNET [6] in terms of EED; then we also test the impact of changing the β_Limit
parameter on the proposed algorithm's EED. We first introduce the simulation
environment and then describe the results.

4.1 Simulation Environment


We used the NS2.34 network simulator [15] on a PC with Intel Core i3 CPU having
3.3 GHz clock speed and 8 GB RAM. The number of mobile nodes in the simulation
environment is tested with different settings: 20, 40, 60, 80 and 100 nodes. The
settings used for the simulation environment are presented in Table 4. Regarding the com-
parison between the proposed algorithm and ANTHOCNET, we chose the EED metric
for comparison as it is the main concern for the end-users. We changed the number of
nodes from 20 to 100 gradually, re-run the simulation, and recorded the average EED
in every simulation session.

Table 4. Fixed parameters of the simulation environment


Simulation duration 100 s
Mobility model Random way point
Speed of the nodes 100 m/s
Radio-propagation model Two-ray ground-reflection model
MAC type 802.11
Interface queue type CMU priqueue
Link Layer Type LL
Max packet in ifq 50
Antenna type Antenna/OmniAntenna
Simulation area 1000 m × 1000 m
Pause time 20 s

Fig. 2. EED for different network sizes (proposed algorithm vs. ANTHOCNET; x-axis: number of nodes, y-axis: EED in ms).

For the first experiment, we set β = 20 for ANTHOCNET as in [11] and set
β_Limit = 8 for the proposed algorithm. As shown in Fig. 2, we found that the proposed
algorithm shows nearly similar EED to ANTHOCNET for numbers of nodes < 60.
For large numbers of nodes (≥ 60), the proposed algorithm outperformed ANTHOC-
NET. The reason for the advantage of the proposed algorithm over ANTHOCNET for

large network size is that it gives the flexibility to search more routes in case of
network’s performance metrics degradation, or retain the current routes if they are
satisfactory. In contrast, the β parameter in ANTHOCNET is constant during the
simulation session. Constant small β values lead to high ant exploration, which causes
delay until finding the optimum route. On the other hand, constant high β values cause
excessive utilization of good routes, which leads to high congestion and causes high
EED in the case of large networks.
The other experiment, as shown in Fig. 3, is the impact of the β_Limit parameter upon
the EED metric of the proposed algorithm. We set the number of nodes to 100 in this
experiment. We tested β_Limit in the range 2 ≤ β_Limit ≤ 20. The β_Limit parameter
represents the upper limit that the Exploration player can assign to the β variable and, at
the same time, the lower limit that the Exploitation player can assign to it.
Extreme values of β_Limit yield a poor EED impact on the network. In the case of small
values of β_Limit, the exploration player has a low upper limit, and the exploitation player
has a low lower limit. Revisiting the calculation method of the final β variable at the
node that holds the game, these conditions produce a low resultant β value,
which means more exploration is done. The other extreme case of higher values of
β_Limit produces higher resultant β values, which means more exploitation is done. The
balance between exploration and exploitation provides the best EED measures and is
achieved, according to the experiment, with β_Limit around 8.

Fig. 3. EED for different values of β_Limit in the proposed algorithm (x-axis: β_Limit from 2 to 20, y-axis: EED in ms).

5 Conclusion

We introduced a novel game-theoretic approach to control the exploratory behavior of


the ANTHOCNET routing algorithm, one of the most efficient ACO routing algo-
rithms in MANET. The players are exploration and exploitation, which are semantic
players. The proposed algorithm is competitive with ANTHOCNET in small and mod-
erate size networks (up to 60 nodes), and outperforms ANTHOCNET in large size

networks. For future work, we intend to experiment with several player-metric associations
and examine the effect on network performance.

References
1. Rosas, E., Hidalgo, N., Gil-Costa, V., Bonacic, C., Marin, M., Senger, H., Arantes, L.,
Marcondes, C., Marin, O.: Survey on simulation for mobile Ad-Hoc communication for
disaster scenarios. J. Comput. Sci. Technol. 31(2), 326–349 (2016)
2. Papakostas, D., Eshghi, S., Katsaros, D., Tassiulas, L.: Energy-aware backbone formation in
military multilayer Ad Hoc networks. Ad Hoc Netw. 81, 17–44 (2018)
3. Boukerche, A., Turgut, B., Aydin, N., Ahmad, M.Z., Bölöni, L., Turgut, D.: Routing
protocols in Ad Hoc networks: a survey. Comput. Netw. 55(13), 3032–3080 (2011)
4. Haque, I.T.: On the overheads of Ad Hoc routing schemes. IEEE Syst. J. 9(2), 605–614
(2015)
5. Walikar, G.A., Biradar, R.C.: A survey on hybrid routing mechanisms in mobile Ad Hoc
networks. J. Netw. Comput. Appl. 77, 48–63 (2017)
6. Ducatelle, F., Di Caro, G., Gambardella, L.M.: Using ant agents to combine reactive and
proactive strategies for routing in mobile Ad-Hoc networks. Int. J. Comput. Intell. Appl. 5
(02), 169–184 (2005)
7. Rathi, P.S., Mallikarjuna Rao, C.H.: Survey paper on routing in MANETs for optimal route
selection based on routing protocol with particle swarm optimization and different ant colony
optimization protocol. In: Smart Intelligent Computing and Applications, pp. 539–547.
Springer, Singapore (2020)
8. Stützle, T., López-Ibánez, M., Pellegrini, P., Maur, M., De Oca, M.M., Birattari, M., Dorigo,
M.: Parameter adaptation in Ant colony optimization. In: Autonomous Search, pp. 191–215.
Springer, Berlin (2011)
9. Deepalakshmi, P., Radhakrishnan, S.: Online parameter tuning using particle swarm
optimization for Ant-based Qos routing in mobile Ad-Hoc networks. Int. J. Hybrid Intell.
Syst. 9(4), 171–183 (2012)
10. Reina, D.G., Toral, S.L., Johnson, P., Barrero, F.: A survey on probabilistic broadcast
schemes for wireless Ad Hoc networks. Ad Hoc Netw. 25, 263–292 (2015)
11. Ducatelle, F., Di Caro, G.A., Gambardella, L.M.: An analysis of the different components of
the anthocnet routing algorithm. In: International Workshop on Ant Colony Optimization
and Swarm Intelligence, pp. 37–48. Springer (2006)
12. Sandhya, Goel, R.: Fuzzy based parameter adaptation in ACO for solving VRP. Int. J. Oper.
Res. Inf. Syst. 10(2), 65–81 (2019)
13. Kusyk, J., Sahin, C.S., Zou, J., Gundry, S., Uyar, M.U., Urrea, E.: Game theoretic and bio-
inspired optimization approach for autonomous movement of MANET nodes. In: Zelinka, I.,
Snášel, V., Abraham, A. (eds.) Handbook of Optimization. Intelligent Systems Reference
Library, pp. 129–155. Springer, Berlin (2013)
14. Quy, V.K., Ban, N.T., Nam, V.H., Tuan, D.M., Han, N.D.: Survey of recent routing metrics
and protocols for mobile Ad-Hoc networks. J. Commun. 14(2), 110–120 (2019)
15. The Network Simulator - ns2. https://www.isi.edu/nsnam/ns/
Performance Analysis of Spectrum
Sensing Thresholding Methods
for Cognitive Radio Networks

Rhana M. Elshishtawy1(B) , Adly S. Tag Eldien1 , Mostafa M. Fouda1 ,


and Ahmed H. Eldeib2
1
Department of Electrical Engineering, Faculty of Engineering at Shoubra,
Benha University, Benha, Egypt
marwan.hefnawy@gmail.com
2
Department of Electrical and Communication Engineering,
Canadian International College for Engineering, Cairo, Egypt

Abstract. Cognitive radio (CR) is an innovative solution for the scarcity


of spectrum bandwidth. Spectrum sensing is a pivotal process to facili-
tate CR. Spectrum sensing indicates the availability/absence of the pri-
mary user (PU) which helps secondary users (SUs) accessing the spec-
trum band when it is idle while avoiding any interference. This leads to
an efficient use for the spectrum. At a low signal-to-noise ratio (SNR),
noise fluctuations (i.e., noise uncertainty) is the main reason for missed
detection or false alarm; which results in higher interference. This paper
introduces an efficient adaptive detection scheme for CR networks, where
various SUs participate in distinguishing inactive spectrum bands, improving
the detection's efficiency, overcoming interference by decreasing the error
probability in spectrum sensing, and overcoming node failure using the fusion
center technique. Monte Carlo simulation is used to analyze the efficiency of
detection under single and adaptive double thresholds.

Keywords: Cognitive radio · Sensing · Energy detection · Fusion


center · Adaptive double threshold · Monte Carlo

1 Introduction
Scarcity of the frequency spectrum is increasingly becoming an obstacle for the
implementation of new wireless technologies. To enhance the spectrum employ-
ment, cognitive radio (CR) as a principle has been used to enable unlicensed
users to employ the band that is not utilized by any licensed users; they are also
called secondary users (SUs) and primary users (PUs), respectively [8]. Thus,
the vitality of spectrum sensing lies in accurately identifying the idle bands while
preventing interference with the PUs. Many algorithms were developed targeting the
reliability of the spectrum sensing process. This is accomplished using various

techniques of spectrum sensing which are categorized as cooperative and non-


cooperative [2,20], and [14]. The use of CR enhances the efficiency of the wireless
frequency spectrum resources. The steps used to manage this process are men-
tioned in [21]. One of the most important purposes for CR is to get the most
accurate decision for presence/absence of the PU; thus, researchers are seeking
this target. The presence of the confusion region that includes two undecided
targets (missed detection & false alarm) will decrease the credibility of detection,
which is an obstacle facing the sensing process. This is discussed and enhanced
in our work by using the cooperative spectrum sensing (CSS) technique.
The paper is arranged as follows: Sect. 2 offers a brief illustration of the
literature on the introduced sensing methods. Section 3 illustrates the system
model and the energy detection (ED) and CSS techniques. Section 4 presents the
results of the analysis for both techniques. Section 5 concludes the paper.

2 Related Work

There are many works outlining the use of ED with the traditional single thresh-
olding operation, as in [9] and [24], which discuss the ED thresholding method,
its sensing performance, and the hypothesis metrics used in decision
making, as well as how the threshold is adapted dynamically to the noise level depend-
ing on the Constant False Alarm Rate method as in [5]. Recognition of the
free holes in a spectrum band enhances the execution of the double threshold ED
method over additive white Gaussian noise (AWGN), as declared in [7,22]. How-
ever, ED spectrum sensing has some failures as it is a semi-blind technique that
may cause tolerance in confirming the presence of the signal as in [9]. The region
that may suffer from missed targets or false targets could be named confu-
sion region. Alongside the noise uncertainty, this causes an unreliable detection
process. Many researches worked on some enhancements in the traditional ED
threshold [19]. Moreover, some researches used the CSS with single threshold-
ing technique to enhance the performance [1]. The authors in [6,13] introduce
a double threshold using the logic rule; the proposed technique is analyzed
by comparing its performance with that of [6], which
shows an improvement in the accuracy of the sensing process.
At a low signal-to-noise ratio (SNR), noise fluctuation (i.e., noise uncertainty)
influences the reliability of detection causing false alarms and missed targets.
Moreover, node failure is another factor in fusing many local decisions into a
single global decision, which affects the detection accuracy, as discussed over
a fading channel in [18]. In this paper, the noise uncertainty problem at the low
SNR wall, which results in unreliable detection and missed targets and leads to
interference, is addressed using an adaptive double threshold (ADT) to enhance the
sensing through the confusion region. ED and CSS are both based on binary
phase shift keying (BPSK) modulation, and Monte Carlo simulation is
implemented to obtain the optimal sensing and thresholding parameters, decreasing
interference and increasing the performance efficiency, while also decreasing the

probability of error using ADT. Thus, we predict the detection probability and
false alarm probability for diverse numbers of samples, test the effect of changing
some parameters, and find the optimal performance, using the following
methods:
1. Single threshold (ST) method [19] using ED spectrum sensing technique
(namely ED-ST).
2. Adaptive double threshold (ADT) method [12] using ED or CSS (namely,
ED-ADT and CSS-ADT, respectively).

3 System Model
Binary hypothesis testing [4] is used in ED, to discover the availability of the PU
signal. This is done by contrasting the signal’s received energy in a particular
frequency band with a pre-defined threshold. The received signal is given as
follows:

X(t) = \begin{cases} n(t), & H_0 \\ h(t) * s(t) + n(t), & H_1 \end{cases}    (1)
Here, X(t) is the received signal, the sensing channel's gain is represented by h(t),
n(t) is the zero-mean AWGN, and the PU's transmitted signal is given by s(t).
The availability of the PU is indicated by the two hypotheses, depending on whether the
energy (E) is larger than the threshold value (λ) or not. Under H_0 the PU is absent,
while under H_1 the PU is present. After sampling X(t), the nth sample X(n) is given by:

X(n) = \begin{cases} n(n), & 1 \le n \le M, \; H_0 \\ h(n) * s(n) + n(n), & 1 \le n \le M, \; H_1 \end{cases}    (2)
Here, M is the total number of samples. The occupancy decision for the band is
confirmed by contrasting the decision metric Λ with the threshold value λ.
The Neyman–Pearson criterion is applied to the binary hypothesis given in Eq. 2;
the test statistic of ED is as follows [4]:

\Lambda = \sum_{n=1}^{M} |X(n)|^2.    (3)

The procedures taken to implement the ED spectrum sensing technique are
declared in [3], as Fig. 1 shows:
– At first, a band pass filter (BPF) is used to suppress noise and select the
band of interest; then the signal passes through an analog-to-digital converter
(ADC).
– Moreover, when the number of samples is very large, the fast Fourier transform
(FFT) is usually employed to reduce the complexity.
– The N samples received from the FFT are squared, then averaged and
contrasted with the threshold value λ to get the sensing decision and confirm
the availability/absence of the PU.
Here we will compare ST and ADT from a thresholding point of view
to get the best performance.
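A minimal NumPy sketch of the decision rule built from Eqs. 1–3 is shown below (the sample count, noise variance, and threshold values are illustrative; a complex-Gaussian noise model is assumed, matching the paper's setting):

```python
import numpy as np

def energy_statistic(x):
    """Eq. 3: sum of |X(n)|^2 over the M received samples."""
    return np.sum(np.abs(x) ** 2)

def ed_decision(x, lam):
    """Declare H1 (PU present) if the statistic exceeds the threshold, else H0."""
    return 1 if energy_statistic(x) > lam else 0

# Illustrative noise-only block of M complex-Gaussian samples with variance sigma^2.
M, sigma2 = 500, 1.0
rng = np.random.default_rng(0)
noise = rng.normal(0, np.sqrt(sigma2 / 2), M) + 1j * rng.normal(0, np.sqrt(sigma2 / 2), M)
print(ed_decision(noise, lam=1.1 * M * sigma2))  # usually 0: noise alone rarely exceeds the threshold
```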

3.1 Single Threshold Energy Detection Based

The ED performance is assessed by the following parameters depending on the


binary hypothesis test.
According to the test statistic, using the binary hypothesis tests H_0
and H_1, there are four detection probability cases, which we describe as
follows:

– The 1st and the 2nd cases represent the first performance metric, which depends on
the detection probability (P_d); it confirms the presence of the signal when H_1
is true. When it is P(H_1|H_1), it is a true positive; when it is P(H_0|H_0),
it is a true negative.
– The 3rd case is the false alarm probability (P_f). It decides on the presence
when H_0 is true, i.e., P(H_1|H_0) is a false alarm: in reality the signal is not
present, but the detection reveals its presence, so the SU will not utilize the
spectrum because of that false detection. P_f thus leads to poor spectrum usage,
which is why a higher P_f degrades the performance.
– The 4th case is the missed detection probability (P_m). It decides the absence
of the signal when H_1 is true, i.e., P(H_0|H_1) is a missed detection, and thus
P_m = 1 − P_d: the signal is present in reality, but the decision is that it is absent,
so the SU will use the spectrum while the PU is already utilizing it. Hence,
P_m causes interference between the SU and the PU. Increasing P_m increases
the inefficiency due to destructive interference. Finally, for efficient
detection, we need to achieve high P_d, low P_m, and low P_f. The simulation
will confirm that through the following equations:

P_f(\Lambda > \lambda \mid H_0) = Q\!\left(\frac{\lambda - M\sigma^2}{\sqrt{2M}\,\sigma^2}\right),    (4)

P_d(\Lambda > \lambda \mid H_1) = Q\!\left(\frac{\lambda - M\sigma^2(1+\mathrm{SNR})}{\sqrt{2M}\,\sigma^2(1+\mathrm{SNR})}\right).    (5)
The missed detection probability can be calculated by:

P_{md} = 1 - P_d.    (6)

The detection's probability of error can be calculated by:

P_e = P_f + P_{md}.    (7)

Fig. 1. Energy detection spectrum sensing steps block diagram [3]: BPF → ADC → FFT → squaring and mean value → thresholding device → decision (H_0 or H_1).



The pre-defined threshold (λ) is the factor that confirms the availabil-
ity/absence of a signal. The performance metrics are the factors that decide
its operating value, as it depends on P_f [4], as shown in the following equation:

\lambda = Q^{-1}(P_f)\,\sqrt{2M\sigma^4} + M\sigma^2,    (8)

where Q^{-1} is the inverse of the Q-function, and M is the number of samples, given as follows:

M = \left(\frac{Q^{-1}(P_f) - Q^{-1}(P_d)\,\sqrt{2\,\mathrm{SNR}+1}}{\mathrm{SNR}}\right)^{2}.    (9)
The fourth performance metric, obtained by plotting the previous equations,
is the receiver operating characteristic (ROC) curve. It is the relation between
P_d and P_f for a given SNR.
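The performance metrics above can be evaluated numerically as in the sketch below, which mirrors the reconstructed forms of Eqs. 4, 5, 8 and 9 (SciPy is assumed to be available; the Q-function is expressed through the complementary error function):

```python
import numpy as np
from scipy.special import erfc, erfcinv

def Q(x):
    """Gaussian Q-function."""
    return 0.5 * erfc(x / np.sqrt(2.0))

def Qinv(p):
    """Inverse of the Q-function."""
    return np.sqrt(2.0) * erfcinv(2.0 * p)

def p_false_alarm(lam, M, sigma2):
    """Eq. 4 (as reconstructed above)."""
    return Q((lam - M * sigma2) / (np.sqrt(2.0 * M) * sigma2))

def p_detection(lam, M, sigma2, snr):
    """Eq. 5 (as reconstructed above)."""
    return Q((lam - M * sigma2 * (1.0 + snr)) / (np.sqrt(2.0 * M) * sigma2 * (1.0 + snr)))

def threshold(pf_target, M, sigma2):
    """Eq. 8: threshold meeting a target false-alarm probability."""
    return Qinv(pf_target) * np.sqrt(2.0 * M) * sigma2 + M * sigma2

def samples_needed(pf_target, pd_target, snr):
    """Eq. 9: sample count meeting both targets at a given SNR."""
    return ((Qinv(pf_target) - Qinv(pd_target) * np.sqrt(2.0 * snr + 1.0)) / snr) ** 2

lam = threshold(0.1, M=500, sigma2=1.0)
print(p_false_alarm(lam, 500, 1.0), p_detection(lam, 500, 1.0, snr=0.1))
```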

3.2 Adaptive Double Threshold CSS


The noise uncertainty problem in transmission systems makes the usage of
the traditional ED-ST sensing technique a complex solution. Moreover, the per-
formance of detection will not be at its highest efficiency under low SNR [23].
To solve this problem, ADT based on CSS (CSS-ADT) was introduced in [10]; the
model is given in [11] and [15], as shown in Fig. 2, which also declares the CSS
fusion center (FC) paradigm, in which results from each SU are collected by the
FC under specific rules (e.g., the OR logic rule). Figure 3 shows the detailed system
model of the whole sensing process. The detection probability (Q_d)
and false alarm probability (Q_f) are given in [10] as follows:

Q_d = 1 - \prod_{i=1}^{K} (1 - P_{di}),    (10)

Fig. 2. Steps for CSS spectrum sensing [11], and [15]: each SU (SU1 ... SUn) performs energy detection and threshold adaptation and reports to the fusion center.




Q_f = 1 - \prod_{i=1}^{K} (1 - P_{fi}),    (11)

where P_di is the detection probability and P_fi is the false alarm probability of
the ith SU, and K is the number of CRs collaborating in CSS. The FC then
makes a global decision about the PU signal.
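A short sketch of the OR-rule fusion behind Eqs. 10 and 11 is given below (the local probabilities of the K SUs are illustrative values):

```python
import numpy as np

def cooperative_probabilities(p_d_list, p_f_list):
    """OR-rule fusion over K SUs: the FC declares the PU present
    if at least one SU reports it (Eqs. 10 and 11)."""
    q_d = 1.0 - np.prod([1.0 - p for p in p_d_list])
    q_f = 1.0 - np.prod([1.0 - p for p in p_f_list])
    return q_d, q_f

# Three cooperating SUs with individual (P_di, P_fi) values.
print(cooperative_probabilities([0.60, 0.55, 0.70], [0.05, 0.04, 0.06]))
```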
The number of samples for CSS is given in [16] as:
M = \frac{\left[\sqrt{2}\,Q^{-1}(P_{fi}) - \sqrt{4\gamma+2}\,Q^{-1}(P_{di})\right]^{2}}{\gamma^{2}},    (12)

where γ = σ_s²/σ_n² is the ratio of the SU's received signal power to the noise power.
So, to enhance the detection performance by reducing the interference resulting
from the confusion region, our proposed CSS contribution uses the follow-
ing threshold:

σn2 −1 1 γ 4γ + 2 
λ(CSS) = + Q (1 − P̂f i )( + + ln 2γ + 1, (13)
2 4 2 Mγ
where λ1 and λ2 are given by:

λ1 = (1 − ρ)λ(CSS) , (14)

λ2 = (1 + ρ)λ(CSS) , (15)
where ρ represents the uncertainty parameter and ρ > 1 is the noise uncertainty
size. The decision metric in Eq. 3 is used to decide the location of the PU signal:
if it lies outside the confusion region (greater than λ2 or smaller than λ1), then

Fig. 3. System model diagram.



the generated 1 or 0 indicates the availability or absence of the PU, respectively.


On the other hand, if the value of energy is located in the region of confusion
which is the region that includes the higher probabilities for the missed detection
or false alarm targets, according to which sub-region that Λ falls, the decimal
values (DV) will be generated as shown in Fig. 4, that was introduced in [19] as
follows: ⎧ ⎫
⎪ 00, λ1 < Λ ≤ A ⎪

⎨ ⎪

01, A < Λ ≤ B
DV = , (16)

⎪ 10, B < Λ ≤ C ⎪ ⎪
⎩ ⎭
11, C < Λ ≤ λ2
where A = (1 − α)λ_(CSS), B = λ_(CSS), and C = (1 + α)λ_(CSS), so that λ1 < A < B < C < λ2.
α represents the estimated test statistic related to each single decision received from the SU.
Moreover, the decision of the CR/SU user using the ADT scheme towards the
PU signal's detection is declared in the following logic rule (LR) [19]:

LR = \begin{cases} H_0 = 0, & \Lambda \le \lambda_1 \ (\text{PU absent}) \\ H = DV, & \lambda_1 < \Lambda < \lambda_2 \ (\text{not sure}) \\ H_1 = 1, & \Lambda \ge \lambda_2 \ (\text{PU present}) \end{cases}    (17)
The PU’s detection decision is confirmed when Λ is in the region of confusion,
depending on n value which is compared to a specific threshold that reaches the
needed probability of false alarm:
 
n = DV, λ1 < Λ < λ2 , (18)
Each received decision has a test statistic [11] that can be estimated as:

RX_i(FC) = \begin{cases} \frac{\lambda_1 + A}{2}, & 00 \\ \frac{A + B}{2}, & 01 \\ \frac{B + C}{2}, & 10 \\ \frac{C + \lambda_2}{2}, & 11 \end{cases}    (19)

Fig. 4. The confusion region and its four sub-regions (00, 01, 10, 11) between the lower threshold λ1 and the upper threshold λ2, covering missed detection and false alarm [19].



The global test statistic [15], which was estimated, is given by:

GRX = \sum_{i=1}^{M} \beta_i \, RX_i    (20)

where β_i is the weight assigned to the ith estimated test statistic (RX_i), with
β_1 + β_2 + ... + β_M = 1; GRX is then contrasted with the threshold λ_(CSS) to arrive at the
final decision. The threshold λ_(CSS) can be evaluated from (13); three different
values are given to this threshold, corresponding to the targeted
P_fi in the following cases:

1. Using the threshold with P̂_fi = P_fi to avoid interference of the PU with the SUs.
2. Using the threshold with P̂_fi = P_fi/3 to increase the SU's utilization of the spec-
trum.
3. For comparing between the previous two cases, the threshold with P̂_fi = P_fi/2 was
used.

After we apply the previous values to the threshold, P_md will be affected
as well and will decrease as P_fi decreases, thus decreasing the interference
between the PU and the CR users, which enhances the sensing performance.
Moreover, using the enhanced technique under any of the above thresholds tends
to provide higher performance than the traditional ED [17]. In Fig. 5, the flow
chart for the improved CSS-ADT is shown, illustrating the sensing steps.
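The CSS-ADT decision logic described in Eqs. 14–20 can be summarized by the following sketch (a minimal reading under the reconstructed region boundaries λ1 < A < B < C < λ2; the order in which the FC combines hard bits and soft codes is one plausible interpretation of the flow chart in Fig. 5, and the weights β_i are assumed to sum to one):

```python
def adt_local_report(stat, lam_css, rho, alpha):
    """One SU's local report: a hard 0/1 outside [lam1, lam2] (Eq. 17),
    otherwise the representative energy of the DV sub-region (Eqs. 16 and 19).
    Requires alpha < rho so that A, B, C lie inside the confusion region."""
    lam1, lam2 = (1 - rho) * lam_css, (1 + rho) * lam_css        # Eqs. 14-15
    a, b, c = (1 - alpha) * lam_css, lam_css, (1 + alpha) * lam_css
    if stat <= lam1:
        return ("hard", 0)                       # PU absent (H0)
    if stat >= lam2:
        return ("hard", 1)                       # PU present (H1)
    if stat <= a:
        return ("soft", (lam1 + a) / 2)          # sub-region 00
    if stat <= b:
        return ("soft", (a + b) / 2)             # sub-region 01
    if stat <= c:
        return ("soft", (b + c) / 2)             # sub-region 10
    return ("soft", (c + lam2) / 2)              # sub-region 11

def fuse(reports, weights, lam_css):
    """Fusion center: OR rule over confident decisions; otherwise the weighted
    global statistic GRX of Eq. 20 compared with the threshold."""
    hard = [v for kind, v in reports if kind == "hard"]
    soft = [v for kind, v in reports if kind == "soft"]
    if any(v == 1 for v in hard):
        return 1                                 # some SU confidently saw the PU
    if soft:
        grx = sum(w * v for w, v in zip(weights, soft))
        return int(grx >= lam_css)
    return 0                                     # every SU confidently reported H0

# Illustrative usage: three SUs report their energy statistics to the FC.
reports = [adt_local_report(s, lam_css=525.0, rho=0.2, alpha=0.1)
           for s in (430.0, 580.0, 610.0)]
print(fuse(reports, weights=[0.4, 0.3, 0.3], lam_css=525.0))
```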

4 Simulation Results
Here, a simulation study was introduced to analyze the performance of the intro-
duced paradigm. First, we built a model in MATLAB for the ED-ST
method based on Monte Carlo simulation. Moreover, we assume that all the
signals are complex Gaussian; the algorithm used is as follows (a sketch of this loop is given after the list):

1. Assume that only noise is received, i.e., the PU is not available.
2. If the noise energy alone lies above the threshold, this corresponds to a false alarm.
3. Run this scenario for a number of iterations.
4. Probability of false alarm = (count of energy values above the threshold)/(number of iterations).
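A minimal NumPy sketch of that noise-only Monte Carlo loop is shown below (parameter values are illustrative; complex Gaussian noise is assumed, as stated above):

```python
import numpy as np

def monte_carlo_pf(lam, M, sigma2, iterations=10_000, seed=0):
    """Estimate Pf: generate noise-only blocks and count how often the
    energy statistic of Eq. 3 exceeds the threshold lam."""
    rng = np.random.default_rng(seed)
    false_alarms = 0
    for _ in range(iterations):
        noise = (rng.normal(0, np.sqrt(sigma2 / 2), M)
                 + 1j * rng.normal(0, np.sqrt(sigma2 / 2), M))
        if np.sum(np.abs(noise) ** 2) > lam:
            false_alarms += 1
    return false_alarms / iterations

# The estimate should approach the analytical Pf of Eq. 4 as the iteration count grows.
print(monte_carlo_pf(lam=1.05 * 500, M=500, sigma2=1.0))
```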

Figure 6 shows that the ED-ADT spectrum sensing technique provides
around 15% higher performance than the ED-ST spectrum sensing technique.
Figure 7 shows the ROC curve of ED-ADT versus CSS-ADT with a number of samples
N = 500: using CSS, P_d is much higher than when using ED alone,
showing an enhancement of about 30%.

Fig. 5. The flow chart for the improved CSS-ADT: each SU computes its energy and compares it with the double thresholds, sends a hard bit (0/1) or a two-bit code (00/01/10/11) to the FC, and the FC fuses the available decisions by the OR rule or by calculating GRX to declare the PU present or absent.

Moreover, Fig. 8 shows the relation between SNR and the coop-
erative detection probability (Q_d), which confirms the higher performance of
CSS-ADT over ED-ADT across various SNR values. Finally, Fig. 9 shows the rela-
tion between the error probability (P_e) and the SNR, showing a lower P_e with
CSS-ADT. Thus, to deal with the drawbacks of node failure and interference,
CSS was used, as declared and proved here by simulation.

Fig. 6. ROC curve between Pd vs Pf for ED-ST and ED-ADT.

Fig. 7. ROC curve between Pd vs Pf for ED-ADT and CSS-ADT (axes: cooperative Qd vs Qf).



Fig. 8. Relation between Qd and SNR (dB) for CSS-ADT and ED-ADT.

Fig. 9. Relation between Pe vs SNR (dB) for single threshold ED and ADT (CSS-ADT vs ED-ADT).

5 Conclusion
Cooperative spectrum sensing (CSS) is performed at each secondary user (SU) using
the adaptive double threshold (ADT) technique. A fair opportunity is given to
the doubtful user to guarantee reputation. Thus, the fusion center (FC) is used
to receive two kinds of reports (local decisions and energy values). An improved model is
proposed under which the observed energy values are averaged by the FC and
then compared with the threshold (λ) to confirm a decision. In addition, it combines
all the SUs' local decisions by the OR logic rule of the FC to get a final decision. Significant
improvement is noticed in the detection performance after using CSS-ADT.
Alongside this, the sensing failure issue has also been solved. For future enhancement of

this model, the sensing time will be decreased so that SUs can get the most
out of using this system.

References
1. Al-Jarrah, M.A., Al-Dweik, A., Ikki, S.S., Alsusa, E.: Spectrum-occupancy aware
cooperative spectrum sensing using adaptive detection. IEEE Syst. J. 1–12 (2019,
in press). https://doi.org/10.1109/JSYST.2019.2922773
2. Ali, A., Hamouda, W.: Advances on spectrum sensing for cognitive radio networks:
theory and applications. IEEE Commun. Surv. Tutor. 19(2), 1277–1304 (2017)
3. Arjoune, Y., El Mrabet, Z., El Ghazi, H., Tamtaoui, A.: Spectrum sensing:
enhanced energy detection technique based on noise measurement. In: 2018 IEEE
8th Annual Computing and Communication Workshop and Conference (CCWC),
pp. 828–834. IEEE (2018)
4. Atapattu, S., Tellambura, C., Jiang, H.: Energy Detection for Spectrum Sensing
in Cognitive Radio. Springer, Cham (2014)
5. Bunch, J.R., Fierro, R.D.: A constant-false-alarm-rate algorithm. Linear Algebra
Appl. 172, 231–241 (1992)
6. Charan, C., et al.: Double threshold based cooperative spectrum sensing with
consideration of history of sensing nodes in cognitive radio networks. In: 2018
2nd International Conference on Power, Energy and Environment: Towards Smart
Technology (ICEPE), pp. 1–9. IEEE (2018)
7. Elshishtawy, R.M., Eldien, A.S.T., Fouda, M.M., Eldeib, A.H.: Implementation
of multi-channel energy detection spectrum sensing technique in cognitive radio
networks using LabVIEW on USRP-2942R. In: 2019 15th International Computer
Engineering Conference (ICENCO), pp. 1–6. IEEE (2019)
8. Fadlullah, Z.M., Nishiyama, H., Kato, N., Fouda, M.M.: Intrusion detection system
(IDS) for combating attacks against cognitive radio networks. IEEE Netw. 27(3),
51–56 (2013)
9. Fouda, M.A., Eldien, A.S.T., Mansour, H.A.: FPGA based energy detection spec-
trum sensing for cognitive radios under noise uncertainty. In: 2017 12th Interna-
tional Conference on Computer Engineering and Systems (ICCES), pp. 584–591.
IEEE (2017)
10. Ghasemi, A., Sousa, E.S.: Collaborative spectrum sensing for opportunistic access
in fading environments. In: First IEEE International Symposium on New Frontiers
in Dynamic Spectrum Access Networks (DySPAN), pp. 131–136. IEEE (2005)
11. Ghazizadeh, E., Abbasi-moghadam, D., Nezamabadi-pour, H.: An enhanced two-
phase SVM algorithm for cooperative spectrum sensing in cognitive radio networks.
Int. J. Commun. Syst. 32(2), e3856 (2019)
12. Gorcin, A., Qaraqe, K.A., Celebi, H., Arslan, H.: An adaptive threshold method for
spectrum sensing in multi-channel cognitive radio networks. In: 2010 17th Inter-
national Conference on Telecommunications, pp. 425–429. IEEE (2010)
13. Hai, W., Zhang, Y., Chen, Z., Guo, X., He, C.: A signal marker method based
on double threshold energy detection. In: 2018 12th International Symposium on
Antennas, Propagation and EM Theory (ISAPE), pp. 1–4. IEEE (2018)
14. He, Y., Xue, J., Ratnarajah, T., Sellathurai, M., Khan, F.: On the performance of
cooperative spectrum sensing in random cognitive radio networks. IEEE Syst. J.
12(1), 881–892 (2016)

15. Lee, Y.L., Saad, W.K., El-Saleh, A.A., Ismail, M.: Improved detection performance
of cognitive radio networks in AWGN and Rayleigh fading environments. J. Appl.
Res. Technol. 11(3), 437–446 (2013)
16. Liu, X., Zhang, C., Tan, X.: Double-threshold cooperative detection for cognitive
radio based on weighing. Wirel. Commun. Mob. Comput. 14(13), 1231–1243 (2014)
17. Muthumeenakshi, K., Radha, S.: Improved sensing accuracy using enhanced energy
detection algorithm with secondary user cooperation in cognitive radios. Int. J.
Commun. Netw. Inf. Secur. 6(1), 17–28 (2014)
18. Niu, R., Chen, B., Varshney, P.K.: Fusion of decisions transmitted over Rayleigh
fading channels in wireless sensor networks. IEEE Trans. Signal Process. 54(3),
1018–1027 (2006)
19. Omer, A.E.: Review of spectrum sensing techniques in cognitive radio networks.
In: 2015 International Conference on Computing, Control, Networking, Electronics
and Embedded Systems Engineering (ICCNEEE), pp. 439–446. IEEE (2015)
20. Qin, Z., Wang, J., Chen, J., Wang, L.: Adaptive compressed spectrum sensing
based on cross validation in wideband cognitive radio system. IEEE Syst. J. 11(4),
2422–2431 (2015)
21. Ranjan, A., Singh, B., et al.: Design and analysis of spectrum sensing in cognitive
radio based on energy detection. In: 2016 International Conference on Signal and
Information Processing (IConSIP), pp. 1–5. IEEE (2016)
22. Sarala, B., Devi, S.R., Sheela, J.J.J.: Spectrum energy detection in cognitive radio
networks based on a novel adaptive threshold energy detection method. Comput.
Commun. 152, 1–7 (2020)
23. Tandra, R., Sahai, A.: SNR walls for signal detection. IEEE J. Sel. Top. Signal
Process. 2(1), 4–17 (2008)
24. Umebayashi, K., Hayashi, K., Lehtomäki, J.J.: Threshold-setting for spectrum
sensing based on statistical information. IEEE Commun. Lett. 21(7), 1585–1588
(2017)
The Impacts of Communication Ethics
on Workplace Decision Making
and Productivity

Alyaa Alyammahi1 , Muhammad Alshurideh1,2 ,


Barween Al Kurdi3 , and Said A. Salloum4(&)
1
University of Sharjah, Sharjah, UAE
2
Faculty of Business, University of Jordan, Amman, Jordan
3
Amman Arab University, Amman, Jordan
4
Research Institute of Sciences and Engineering, University of Sharjah,
Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. Communication is the first and most important tool for workplace
success. Effective communication ethics increases workplace productivity in
various ways, as management and employees alike become capable
of making the right decisions concerning various issues within a workplace.
Effective communication ethics governs all the decisions made within any work-
place situation. It is established that ethics guide and direct all business
operations, an element that is crucial in the overall success of any business. The
databases have been reviewed to inform the systematic review. The finding
was that communication ethics is key in any business decision-making process.
There is a need for any workplace management to understand the role of
effective workplace communication ethics and culture and to integrate such ethics into
everyday practice within the workplace. The systematic review reveals the
impact of workplace communication ethics on decision making and the
overall productivity of a workplace.

Keywords: Communication  Ethics  Productivity  Decision making

1 Introduction

Communication ethics is more than just a concept applied to enhance the image
of a corporation; ethics is a key foundation of success [1, 2]. Communication ethics
should be applied from the very moment a business opens its doors [3]. Communication
ethics is made up of the actions of the individuals working within a given workplace [4].
Managers make decisions daily that affect their entire workplaces as a whole.
Such decisions rely on how effective communication is [5, 6]. Not only do these
decisions impact jobs and livelihoods, but they also have consequences that are
both positive and negative for businesses as a whole, including personnel, customers, and
the community in general [7–9]. Ethics of communication in the workplace is often a
critical component of safeguarding individuals and groups from the potential
negative consequences of poor managerial decision making [10, 11]. These impacts

bear directly on the decision-making process. Communication ethics often relates
to how a workplace or corporation handles situations that require moral decisions.
According to [12], building on the foundations of effective communication ethics
within a workplace helps in creating a long-lasting positive effect, as such ethics has a
significant impact on the wellbeing and general operations of the business and its
success based on the right decision-making process [13].
According to the literature, communication ethics in the workplace is critical in shaping
workplace decision making. The existing literature has focused on
the essential role of ethics in business and management in guiding the
decision-making process in the workplace. Various studies have given a
valuable understanding through a synthesis of the role of communication ethics in the
workplace [13]. Consequently, the present study systematically synthesized the ethical
practices in the workplace, affording a comprehensive analysis of the collected studies.
Research questions
RQ1. What are the significant effects of communication ethics in workplace deci-
sion making and overall workplace productivity?
RQ2. What are the main research methods and the outcomes addressed in the
collected studies?
RQ3. What is the active database in the context of communication ethics in
workplace decision making?

Research Importance: The research aims to establish the importance of communi-


cation ethics in the decision-making process. The research is therefore
important in informing organizations' decision-making processes. For the UAE, it is
essential in promoting the best management practices through effective ethics in
communication.
Research Objectives: The key objectives of the research include:
• To establish the role of communication ethics in the business decision-making
process
• To enhance the understanding of the role played by the ethics of communication in
a business decision-making process.
• The systematic review aims to establish some of the critical research conducted on
the aspect of communication ethics and its contribution to the decision-making
process.

2 Literature Review

Recently, numerous studies have been conducted on developing an understanding of


the impact of ethics in business and corporate management. This knowledge is
far more key than any other asset in workplace management. Communication
ethics has become a much more prevalent research trend in the workplace [7].

Communication ethics is defined as "the accepted set of moral values and corporate
standards of communication in a workplace setting" [7]. The guiding principle or
framework of communication ethics permeates all the various levels of work-
places and standards of decision making. It is about having the wisdom to determine
the differences between right and wrong decisions within a workplace. Com-
munication ethics fundamentally epitomizes the workplace's codes of corporate gov-
ernance and hence governs workplace productivity [14]. It works to
stipulate the essential morality standards and behavioral patterns expected of the
individual and the workplace at large [15]. The numerous moral benchmarks can be
perceived in terms of the microenvironment and the macro-environment of the
workplace and its decision-making ability [13].
According to the literature, different researchers have introduced the various impacts
of communication ethics on a workplace decision-making process. [16] states that
communication ethics is "the study of the standards of business behaviors that work to
promote welfare and the good". Communication ethics, according to the literature,
manifests both as written and unwritten codes of moral standards that are often
crucial to the current activities and future aspirations of any workplace environment
[17]. They can usually differ from one workplace to another [18]. The differences
are based on the elements of cultural perspectives, operational structures, strategy, and
operations. Regarding the effects of workplace communication ethics, depending on
the specific workplace culture, the workplace can reap positive or
negative outcomes. Concerning the effect of the literature review studies, [19] conducted a
systematic review to analyze the studies related to the role that communication
ethics plays within a workplace, in an attempt to construct a better understanding.
Also, in [20], the authors conducted a systematic review to analyze the negative
impact of a workplace's communication ethics on the decision-making process. In
[16], the authors conducted a systematic review of the knowledge revolving around the role
of communication ethics in the workplace employee and management process. Bedi
et al. in [21] conducted a systematic review of some of the positive effects of
communication ethics on the success of a workplace, while relating some of the
harmful practices in workplace decision making to poor ethics in communi-
cation [15]. The authors in [22] carried out a systematic review of the causes of the
existence of poor communication ethics in the workplace and how to manage them.
Based on the above literature reviews, none of the studies considers the actual specific
ethics practices, which can be classified as both positive and negative communication
ethics, in workplace decision making and the productivity of the workplace [13].
• Communication ethics: the notion that communication is governed by the communicators' morals, which in turn affect communication [16].
• Productivity: the effectiveness of productive effort, especially in industry, as
measured in terms of the rate of output per unit of input [23].
• Decision making: the process of identifying and choosing alternatives based on
values, preferences, and beliefs [6].

3 Methods

A critical review is a crucial stage before conducting any research study [23]. It forms
the foundation for accumulating the necessary knowledge, which in turn makes
possible the extension of theory and the development of critical relationships that
inform the various elements of the study [18]. A literature review can be viewed as a
systematic review only when the analysis is based on specific research
questions, determines and analyzes relevant research studies, and evaluates their critical
qualities against a given criterion [20]. In this review, the study was conducted based
on the effects of ethics on decision making. The analysis of each of the collected studies
was performed manually.

3.1 Research Strategy


3.1.1 Keywords and Database Table
The articles that will be critically analyzed in this review study should meet the
inclusion and exclusion criteria described in Table 1 below:

Table 1. The data sources and search keywords.

Keywords: "Communication Ethics" AND "productivity" AND "decision making"; Duration: 2015–2020; Total: 30

Journal database | Frequency
Google Scholar | 11
ProQuest | 4
Emerald | 4
Sage Journals | 5
Journal Citation Reports | 3
JSTOR | 3

The diagram shown in Fig. 1 indicates the criteria used to arrive at
the eligible articles and journals used in the systematic review. It begins broadly
with the various articles on the same topic and narrows down to the specific n = 30 articles
included in the review.

3.1.2 Inclusion and Exclusion Criteria


Table 2 gives the inclusion and exclusion criteria for the choice of the articles
used in the systematic review. An article which meets all the
requirements for the systematic review was considered, while any which missed any of
the criteria was excluded.

Fig. 1. Systematic review process.

Table 2. Inclusion and exclusion criteria.

Inclusion criteria | Exclusion criteria
Should involve the impact of communication ethics on workplace decision making | Impact of communication ethics not related to workplace decision making
Should involve the impact of communication ethics on workplace decision making and productivity | Impact of communication ethics relating to the workplace decision making and productivity
Should be published between 2015–2020 | Published earlier than 2015
Should be in English | Published in other languages other than English

3.1.3 Factor Table


Table 3 outlines which elements of the dependent and independent factors
of the communication ethics process are key in the decision-making element. It gives an
overall summary of what each reviewed source contributes to the research, as summarized in
Table 4.

Table 3. Factor frequency


Source Dependent factors Independent factors
Knowledge Education Culture Technology Moral Values Personal
goals
1 [14] X
2 [19] X X x
3 [21] X X
4 [12] X x
5 [23] X X x
6 [24] X
7 [18] X
8 [25] X X
9 [20] X
10 [7] X
11 [26] X X
12 [27] X x
13 [28] X X x
14 [17] X x
15 [22] X
16 [16] X X X
17 [15] X X x
18 [29] X x
19 [30] X X X x
20 [31] X
21 [32] X
22 [33] X X
23 [34] X X x
24 [35] X x x
25 [36] X X
26 [37] X X
27 [38] X X
28 [6] X X x
29 [39] X
30 [13] X
Frequency 11 13 8 8 6 6 7

Table 4. Factor frequency summary

Factor | Frequency
1 Knowledge | 11
2 Education | 13
3 Culture | 8
4 Technology | 8
5 Moral | 6
6 Values | 6
7 Personal goals | 7

Table 5. 30 articles analysis brief


No Source Study purpose Country Method Finding
1 [14] Effect of knowledge on Spain Qualitative The result indicated that
communication ethics on communication ethics are critical in
workplace decision making specific decision-making skills
development and productivity
levels
2 [19] Effect of education Japan Quantitative The result indicated that workplace
communication ethics on growth is directly affected by the
ethical aspect of workplace
operations
3 [21] Communicating ethic on UK Quantitative The result indicated that the
workplace management workplace could reap positively or
negatively depending on the
communication ethics
4 [12] Impact of ethics on Canada Qualitative The result indicated that employees
workplace management work with a high demand for ethics
in all facets of their operation do
perform their responsibilities at
more elevated levels
5 [23] Communication ethics USA Quantitative The result indicated that ethically
running a business from top to
bottom often build a stronger bond
between individual on the
management team
6 [24] To analyze impact ethics on UAE Surveys Findings indicated that guides the
management generally management decisions
7 [18] Relationship between USA Qualitative The result indicated that that
communication ethics and communication ethics the study of
employee conducts decision the standards of business behaviors
making that works to promotes welfare and
the good.
8 [25] Ethical decision making EU Qualitative The systematic review results
within the business recommend more qualitative studies
in business ethics
9 [20] Role of managers in Malaysia Quantitative The outcome indicated that
impacting communication managers change the employee’s
ethics behavior communication ethics
10 [7] Workplace communication UK Quantitative The result indicated that
ethics Communication ethics
fundamentally epitomizes the
workplace’s codes of corporate
governances
11 [26] Communication ethics France Quantitative The outcome indicated that
importance managers who are using the best
communication ethics in their
business do find employees easy to
manage as they follow such
footsteps
12 [27] Role of communication China Quantitative The findings indicate that
ethics in a workplace communication ethics is dynamics
and is ever-changing hence needs
and an ever-integrating approach
13 [28] Relationship between USA Not Most workplaces, especially in the
communication ethics and specified USA, do have stronger
employees’ value communication ethical operations
and hence are very successful
14 [17] Effects Personal goals on Indonesia Quantitative The result indicated that
workplace communication Communication ethics primarily
ethics epitomizes the workplace’s codes of
corporate governances
15 [22] Role and responsibility of Not Qualitative The result indicated that most
managers in fostering specified businesses have put in place the
workplace communication best ethical application which
ethics focuses on the employees and not
managers
16 [16] Workplace communication USA Quantitative The result indicated that work
ethics importance which is failing in ethical
consideration are doing so in all
aspects of their operations as well
17 [15] Role of managers in USA Quantitative The result indicated that workplaces
impacting ethics behavior which have succeeded have the best
ethical application
18 [29] To analyze the impact of UK Quantitative The result indicated that ethics is
workplace communication present in every element of operation
ethics in a workplace setting
19 [30] Role of workplace ethics on France Quantitative The result indicated that any
management workplace operation relies on the
nature of the business skills
impacted by the manager
20 [31] Effect of workplace UK Qualitative The result indicated that it affects
communication ethics on decision-making skills incubation
workplace skills and and growth
productivity
21 [32] Workplace Ethical Japan Quantitative The result indicated that ethics
considerations in consideration does not have any
management procure of application
22 [33] Role of workplace ethics on Not Quantitative The result indicated that the
management specified managers at all levels foster
workplace ethics
23 [34] Communication Ethical UK Quantitative The result indicated that ethical
implication of any practice is either written or
workplace practice unwritten code of workplace
operations
24 [35] To determine the role of Norway Quantitative The result indicated that managers
managers in fostering do play a crucial role in impacting
communication ethics on employee’s communication
ethical understanding
25 [36] Relationship between USA Qualitative Findings reveal that harmful
communication ethics and communication ethical practice
decision making harms workplace operations
26 [37] Role and responsibility of Canada Quantitative The result indicated that managers
managers in fostering are the custodian of a workplace
effective communication communication ethical practice
ethics
27 [38] To analyze the impact of ethics on USA Quantitative The result indicated that ethics impacts
workplace productivity the workplace in various ways
depending on how it’s applied
28 [6] Effect of communication Canada Quantitative The result indicated that it is the
ethics on workplace skills primary tool for workplace
development decision-making skills
29 [39] Relationship between UK Quantitative The result indicated that it is the tool
effective communication used by employees to operate
ethics and employee borrowed from the management
conducts
30 [13] Role of communication Canada Qualitative The result indicated that
ethics on workplace Communication ethics often relate
management to how a workplace handles a
situation that requires moral
decision making

4 Result

Concerning the 30 published research studies on the effects of communication


ethics on workplace decision making and productivity from 2015 to 2020, as seen in Table 5
and Fig. 2, the findings of the systematic review are reported based on the three
research questions:


Fig. 2. Country frequency of the article reviewed.



RQ1. What are the significant effects of workplace ethics in workplace decision
making and productivity?
RQ2. What are the main research methods and the outcomes addressed in the
collected studies?
RQ3. What is the most active database in the context of communication ethics in
workplace decision making and productivity?
From the three research questions, a general understanding of the effects of
workplace ethics on workplace decision making is obtained. According to RQ1, several
research studies have been conducted on the impact of workplace communication ethics.
The results indicated a variety of effects, both positive and negative, depending on
the characteristics of the communication ethics itself and its application. Besides,
according to RQ2, the principal methods addressed in the collected studies involve
e-business implementation and the use of social media. Moreover, the ERP system was
also applied in the process.

5 Discussion

A workplace environment whose operation is based on good communication ethics
has all employees aligned towards achieving the same level of ethical behavior; the
positive element of this is the smooth flow of decision-making activities as a result of
improved workplace productivity [14, 40, 41]. Ethically running a
workplace from top to bottom often builds a stronger bond among individuals on the
management team and further creates stability within the company. Whenever a man-
ager leads a workplace based on a set of effective communication ethics,
employees are often obliged to follow in those footsteps [7]. Employees also make
better decisions in less time with communication ethics as a guiding principle;
this element, according to [15], increases the productivity of employees as well as
boosting their morale. Businesses where employees work with a high
demand for communication ethics in all facets of their operation perform their
responsibilities at a higher level and are also more inclined to stay loyal to the
workplace or management [18].

6 Conclusion

From the systematic review, communication ethics impacts workplace productivity


through employee coordination. Any workplace calls for numerous decisions
daily, decisions regarding everything from which vendor to use for various services,
among others [26]. Some of the communication ethical requirements for a business
are confined to law, such as environmental regulation, the minimum wage, and
restrictions against insider trading and collusion. According to [24], management
teams set the tone for how the entire company runs daily [7]. When the
prevailing management philosophy is based on ethical practices and behaviors,
workplace leaders and managers can direct employees by example
and guide them towards making the right decisions that are not only beneficial to them
as individuals but also to the workplace as a whole.

References
1. Alshurideh, M., Al Kurdi, B., Vij, A., Obiedat, Z., Naser, A.: Marketing ethics and
relationship marketing-an empirical study that measures the effect of ethics practices
application on maintaining relationships with customers. Int. Bus. Res. 9(9), 78–90 (2016)
2. Alshurideh, M., Al Kurdi, B., Abu Hussien, A., Alshaar, H.: Determining the main factors
affecting consumers’ acceptance of ethical advertising: a review of the Jordanian market.
J. Mark. Commun. 23(5), 513–532 (2017)
3. Alshurideh, M.: Pharmaceutical promotion tools effect on physician’s adoption of medicine
prescribing: evidence from Jordan. Mod. Appl. Sci. 12(11), 210–222 (2018)
4. Ammari, G., Al kurdi, B., Alshurideh, A., Alrowwad, A.: Investigating the impact of
communication satisfaction on organizational commitment: a practical approach to increase
employees’ loyalty. Int. J. Mark. Stud. 9(2), 113–133 (2017)
5. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84 (2012)
6. Trevino, L.K., Nelson, K.A.: Managing Business Ethics: Straight Talk about How to Do It
Right. Wiley, Hoboken (2016)
7. Ferrell, O.C., Fraedrich, J.: Business Ethics: Ethical Decision Making and Cases. Nelson
Education (2015)
8. Ghannajeh, A., et al.: A qualitative analysis of product innovation in Jordan’s pharmaceu-
tical sector. Eur. Sci. J. 11(4), 474–503 (2015)
9. Alshurideh, M., Al Kurdi, B., Salloum, S.A.: Examining the Main Mobile Learning System
Drivers’ Effects: A Mix Empirical Examination of Both the Expectation-Confirmation
Model (ECM) and the Technology Acceptance Model (TAM), vol. 1058 (2020)
10. Abu Zayyad, H.M., Obeidat, Z.M., Alshurideh, M.T., Abuhashesh, M., Maqableh, M.,
Masa’deh, R.: Corporate social responsibility and patronage intentions: the mediating effect
of brand credibility. J. Mark. Commun. 1–24 (2020)
11. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Ayadeh, I.: An investigation of
factors affecting patients waiting time in primary health care centers: an assessment study in
Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
12. Bin Salahudin, S.N., Binti Baharuddin, S.S., Abdullah, M.S., Osman, A.: The effect of
Islamic work ethics on organizational commitment. Procedia Econ. Financ. 35, 582–590
(2016)
13. Wang, L.C., Calvano, L.: Is business ethics education effective? An analysis of gender,
personal ethical perspectives, and moral judgment. J. Bus. Ethics 126(4), 591–602 (2015)
14. Akrivou, K., Bradbury-Huang, H.: Educating integrated catalysts: transforming business
schools toward ethics and sustainability. Acad. Manag. Learn. Educ. 14(2), 222–240 (2015)
15. Murtaza, G., Abbas, M., Raja, U., Roques, O., Khalid, A., Mushtaq, R.: Impact of Islamic
work ethics on organizational citizenship behaviors and knowledge-sharing behaviors.
J. Bus. Ethics 133(2), 325–333 (2016)
16. Maylor, H., Blackmon, K., Huemann, M.: Researching business and management.
Macmillan International Higher Education (2016)
17. Kolk, A.: The social responsibility of international business: from ethics and the
environment to CSR and sustainable development. J. World Bus. 51(1), 23–34 (2016)
18. Crane, A., Matten, D., Glozer, S., Spence, L.: Business ethics: managing corporate
citizenship and sustainability in the age of globalization. Oxford University Press (2019)
19. Beaudoin, C.A., Cianci, A.M., Tsakumis, G.T.: The impact of CFOs’ incentives and
earnings management ethics on their financial reporting decisions: the mediating role of
moral disengagement. J. Bus. Ethics 128(3), 505–518 (2015)
20. Demirtas, O., Akdogan, A.A.: The effect of ethical leadership behavior on ethical climate,
turnover intention, and affective commitment. J. Bus. Ethics 130(1), 59–67 (2015)
21. Bedi, A., Alpaslan, C.M., Green, S.: A meta-analytic review of ethical leadership outcomes
and moderators. J. Bus. Ethics 139(3), 517–536 (2016)
22. Lehnert, K., Park, Y., Singh, N.: Research note and review of the empirical ethical decision-
making literature: boundary conditions and extensions. J. Bus. Ethics 129(1), 195–219
(2015)
23. Bowie, N.E.: Business Ethics: a Kantian Perspective. Cambridge University Press,
Cambridge (2017)
24. Çelik, A., Dedeoğlu, S., İnanir, B.B.: Relationship between ethical leadership, organizational
commitment and job satisfaction at hotel workplaces. Ege Akad. Bakış Derg. 15(1), 53–64
(2015)
25. Lehnert, K., Craft, J., Singh, N., Park, Y.: The human experience of ethics: a review of a
decade of qualitative ethical decision-making research. Bus. Ethics A Eur. Rev. 25(4), 498–
537 (2016)
26. Freeman, R.E.: Ethical leadership and creating value for stakeholders. In: Business Ethics:
New Challenges for Business Schools and Corporate Leaders, pp. 94–109. Routledge (2016)
27. Guerci, M., Radaelli, G., Siletti, E., Cirella, S., Shani, A.B.R.: The impact of human resource
management practices and corporate sustainability on organizational ethical climates: an
employee perspective. J. Bus. Ethics 126(2), 325–342 (2015)
28. Jaramillo, J., Bande, B., Varela, J.: Servant leadership and ethics: a dyadic examination of
supervisor behaviors and salesperson perceptions. J. Pers. Sell. Sales Manag. 35(2), 108–124
(2015)
29. Pearson, R.: Business ethics as communication ethics: public relations practice and the idea
of dialogue. In: Public Relations Theory, pp. 111–131. Routledge (2017)
30. Ng, T.W.H., Feldman, D.C.: Ethical leadership: meta-analytic evidence of criterion-related
and incremental validity. J. Appl. Psychol. 100(3), 948 (2015)
31. Quarshie, A.M., Salmi, A., Leuschner, R.: Sustainability and corporate social responsibility
in supply chains: the state of research in supply chain management and business ethics
journals. J. Purch. Supply Manag. 22(2), 82–97 (2016)
32. Sadgrove, K.: The Complete Guide to Business Risk Management. Routledge (2016)
33. Schaltegger, S., Burritt, R.: Business cases and corporate engagement with sustainability:
differentiating ethical motivations. J. Bus. Ethics 147(2), 241–259 (2018)
34. Schermerhorn Jr., J.R., Bachrach, D.G.: Exploring Management. Wiley, Hoboken (2017)
35. Schwepker, C.H., Schultz, R.J.: Influence of the ethical servant leader and ethical climate on
customer value enhancing sales performance. J. Pers. Sell. Sales Manag. 35(2), 93–107
(2015)
36. Shafer, W.E.: Ethical climate, social responsibility, and earnings management. J. Bus. Ethics
126(1), 43–60 (2015)
37. Lu, L.-C., Chang, H.-H., Chang, A.: Consumer personality and green buying intention: the
mediate role of consumer ethical beliefs. J. Bus. Ethics 127(1), 205–219 (2015)
38. Shin, Y., Sung, S.Y., Choi, J.N., Kim, M.S.: Top management ethical leadership and firm
performance: mediating role of ethical and procedural justice climate. J. Bus. Ethics 129(1),
43–57 (2015)
39. Wallace, M., Sheldon, N.: Business research ethics: participant observer perspectives. J. Bus.
Ethics 128(2), 267–277 (2015)
40. Alkalha, Z., Al-Zu’bi, Z., Al-Dmour, H., Alshurideh, M., Masa’deh, R.: Investigating the
effects of human resource policies on organizational performance: an empirical study on
commercial banks operating in Jordan. Eur. J. Econ. Financ. Adm. Sci. 51(1), 44–64 (2012)
41. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of e-
service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and e-
trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
A Comparative Study of Various Deep
Learning Architectures for 8-state Protein
Secondary Structures Prediction

Moheb R. Girgis(&) , Enas Elgeldawi,


and Rofida Mohammed Gamal

Department of Computer Science, Faculty of Science,


Minia University, El-Minia, Egypt
{moheb.girgis,enas.elgeldawi}@mu.edu.eg,
rofida21_gamal@yahoo.com

Abstract. In recent years, deep learning (DL) techniques have been applied in
the structural and functional analysis of proteins in bioinformatics, especially in
8-state (Q8) protein secondary structure prediction (PSSP). In this paper, we
have explored the performance of various DL architectures for Q8 PSSP, by
developing six DL architectures, using convolutional neural networks (CNNs),
recurrent neural networks (RNNs), and combinations of them. These architec-
tures are: CNN-SW (CNNs with sliding window); CNN-WP (CNNs with whole
protein as input); LSTM+ (Long Short-Term Memory (LSTM) & Bidirec-
tional LSTM (BLSTM)); GRU+ (Gated Recurrent Unit (GRU) & bidirectional
GRU (BGRU)); CNN-BGRU (CNNs & BGRUs); and CNN-BLSTM (CNNs &
BLSTMs). They include batch normalization, dropout, and fully-connected
layers. We have used CB6133 and CB513 datasets for training and testing,
respectively. The experiments showed that combining CNN with BLSTM or
BGRU overcame overfitting, and achieved better Q8 accuracy, precision, recall
and F-score. The experiments on CB513 showed that CNN-SW, CNN-BGRU,
and CNN-BLSTM achieved Q8 accuracy comparable with some state-of-the-art
models.

Keywords: Protein secondary structures · Q8 prediction · Convolutional neural network · Recurrent neural networks · Long Short-Term Memory · Gated Recurrent Unit · Overfitting · Deep learning

1 Introduction

In bioinformatics, protein secondary structure prediction (PSSP) is very important for


medicine and biotechnology, for example drug design [1] and the design of novel
enzymes. PSSP also plays an important role in protein tertiary structure prediction [2].
Protein secondary structures are classified into two categories: the 3-state category (Q3),
which includes helix (H), strand (E), and coil (C); and the 8-state category (Q8), which has
been proposed by the DSSP program [3] and includes 3₁₀ helix (G), α-helix (H), π-helix
(I), β-strand (E), bridge (B), turn (T), bend (S), and others (C).
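For illustration only, this Q8 label set can be held in a small mapping (a minimal sketch; the ordering of the codes here is arbitrary and not prescribed by the paper):

# DSSP 8-state (Q8) secondary-structure codes listed above.
Q8_STATES = {
    "G": "3-10 helix",
    "H": "alpha-helix",
    "I": "pi-helix",
    "E": "beta-strand",
    "B": "bridge",
    "T": "turn",
    "S": "bend",
    "C": "others (coil)",
}

# Example: recover a one-letter code from a predicted class index
# (the index order is illustrative only).
Q8_CODES = list(Q8_STATES)
print(Q8_CODES[0])  # "G"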


Protein sequence contains two types of sequence information: local context and
long-range interdependencies [4, 5]. Local contexts denote the correlations between
residues with distance less than or equal to a predefined threshold, while long-range
correlations are those between residues with a distance greater than the threshold [2].
It is known that local contexts are critical for PSSP. Specifically, the secondary
structure category information of the neighbors of an amino acid are the most effective
features for determining the secondary structure of this amino acid [5]. Convolutional
neural networks (CNNs) [6], a specific type of deep neural networks (DNNs) using
translation-invariant convolutional kernels, can be used in extracting local contextual
features. On the other hand, long-range interdependency of amino acids also holds vital
evidences for the category of a secondary structure. Recurrent neural networks (RNNs)
are designed to capture dependencies across a distance larger than the extent of local
contexts [5]. RNNs with gate and memory structures, including long short-term
memory (LSTM) [7], gate recurrent units (GRUs) [8], bidirectional RNN (BRNN) [9],
bidirectional GRU (BGRU) [5], and bidirectional LSTM (BLSTM) [10] can artificially
learn to remember and forget information by using specific gates to control the
information flow.
The aim of this paper is to explore the performance of various deep learning
(DL) architectures for Q8 PSSP. To this end, we have developed six different DL
architectures for Q8 PSSP, which use CNNs, RNNs, and combinations of them. These
architectures are: CNN-SW, CNN-WP, LSTM+, GRU+, CNN-BGRU, and CNN-
BLSTM.
The paper is organized as follows: Sect. 2 presents review of related work. Sec-
tion 3 describes the architectures of the explored PSSP models. Section 4 presents the
experiments setup, and the results of evaluating the performance of the presented
models in Q8 PSSP, and comparing them with some state-of-the-art models. Finally,
Sect. 5 presents the conclusion of this work.

2 Related Work

This section presents a review of recent related work that applied DL methods to PSSP,
specifically Q8 prediction.
Sønderby and Winther [10] used a BLSTM for PSSP. Zhou and Troyanskaya [4]
used combination of convolutional and supervised generative stochastic network (SC-
GSN) for low-level structured prediction that is sensitive to local information, while
getting high-level and distant features. Wang et al. [11] proposed DeepCNF, which is a
DL extension of Conditional Neural Fields (CNF). It combines the advantages of both
CNFs and deep CNNs. Li and Yu [5] proposed a deep convolutional and recurrent
neural network (DCRNN), which includes multiscale CNN layers and stacked BGRUs
to obtain local and long-range contact information. Busia and Jaitly [12] proposed
Next-Step Conditioned CNN (NCCNN). It is composed of a multi-scale and residual
convolutional architecture, enhanced with next-step conditioning on structure labels.
Heffernan et al. [13] employed LSTM-BRNN to capture long range interactions, in a
tool named SPIDER3. Zhou et al. [2] proposed a DL model, named CNNH_PSS, by
using multi-scale CNN with highways between neighbor convolutional layers that can
extract both local contexts and long-range interdependencies. Fang et al. [14] proposed
a DNN architecture, named the Deep inception-inside-inception network, and imple-
mented it as a tool MUFOLD-SS. MUFOLD-SS enables effective processing of local
and global interactions between amino acids in making accurate prediction. Zhang et al.
[15] proposed a DL model, named CRRNN. In this model, 1D CNN and original data
were constructed into a local block to capture adjacent amino acid information, and a
residual network connected an interval BGRU network to improve modeling long-
range dependencies. Kumar et al. [16] proposed a DL model, which consists of hybrid
features with combination of CNN and BRNN.

3 The Explored PSSP Models

We have developed six DL models for Q8 PSSP, and explored their performance. They
can be categorized as follows: (1) two CNN models, namely: CNN with local sliding
window (CNN-SW) and whole protein prediction CNN (CNN-WP); (2) two RNN
models, namely: LSTM+ and GRU+; and (3) two CNN-RNN combined models,
namely: CNN-BGRU and CNN-BLSTM.
Each one of these DL models consists of two main parts: feature learning &
extraction part, and classification part. In all models, except the second one (CNN-
WP), the classification part consists of three fully-connected (dense) layers, which
classify the extracted features, as shown in Fig. 1. The features are flattened before
presenting them to the first dense layer. In the last dense layer, the softmax activation
outputs the predicted results in the 8-state category. The architectures of these DL
models are described in the following subsections.

Dropout Flatten Fully-connected Fully-connected Softmax

Fig. 1. The Classification Part (CP) of our models

3.1 CNN with Local Sliding Window (CNN-SW) Model


In the CNN-SW model, we have used a local sliding window of a limited number of
residues as the input example for the CNN. The sliding window is shifted along the
sequence, and for each window the secondary structure (8 classes) is predicted at a
single location, the center of the window. The window size has been chosen to be
17 residues as it yields the best results (performance/training time trade-off).
Here, we have used a 1D CNN to model the local dependencies of adjacent amino
acids. In the 1D CNN, a kernel acts as a filter for removing outliers from the data or as a
feature detector. Given the sequence data:

X = (x_1, x_2, x_3, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots, x_n),    (1)



where x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{im}) is the feature vector of the i-th residue. The residues are
convolved by the 1D CNN as follows:

h_i = f(W * x_{i:i+k-1} + b)    (2)

where “*” denotes the convolutional operation, W is the weights matrix, and k repre-
sents the kernel size. The kernel sizes of CNN were set to 3 and 5. These small kernel
sizes enable the CNN to effectively capture the local information. In the first two CNN
layers, 128 filters were used, and in the last layer 64 filters were used; and each layer
contains a rectified linear unit (ReLU) activation function applied to its output.
Figure 2 shows the architecture of the CNN-SW model. Its feature learning &
extraction part consists of 3 CNN_1D Layers, in which each CNN_1D layer is fol-
lowed by a Batch Normalization layer and a Dropout layer. The Batch Normalization
layer is used to transform inputs so that they are standardized [17]. Batch Normal-
ization accelerates the training of DNNs, and also acts as a regularizer [18]. The
Dropout layer is used to reduce overfitting and regularize DNNs [19].

Input CNN_1D_5 BatchNorm Dropout CNN_1D_5

Output CP BatchNorm CNN_1D_3 Dropout BatchNorm

Fig. 2. CNN with local sliding window model architecture
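A minimal Keras sketch of a model in the spirit of CNN-SW follows, for illustration only: it assumes a 17-residue window with 57 features per residue and an 8-class output for the centre residue; the filter counts and kernel sizes (128/128/64 with kernels 5/5/3) follow the text, the dropout rate of 0.38 is the one reported in Sect. 4.4, and the dense-layer widths are placeholders.

from tensorflow.keras import layers, models

def build_cnn_sw(window=17, n_features=57, n_classes=8):
    # Feature learning & extraction (Fig. 2): three Conv1D layers, each
    # followed by batch normalization and dropout.
    return models.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.Conv1D(128, 5, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.38),
        layers.Conv1D(128, 5, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.38),
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.38),
        # Classification part (Fig. 1): flatten, then fully-connected layers
        # ending in a softmax over the 8 states of the centre residue.
        layers.Flatten(),
        layers.Dense(128, activation="relu"),  # width: placeholder
        layers.Dense(64, activation="relu"),   # width: placeholder
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_cnn_sw()
model.summary()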

3.2 Whole Protein Prediction CNN (CNN-WP) Model


In the CNN-WP model, we have used the whole protein sequence (primary structure)
as the input example for the CNN, with an output of dimension 700 × 9, the sequence of the
predicted secondary structure. Figure 3 shows the architecture of the CNN-WP model.
Its feature learning & extraction part consists of 3 CNN_1D Layers, and the classifi-
cation part includes only one fully-connected layer. The kernel sizes of these CNN
layers were set to 11, as the average length of an α-helix is 11 residues and the average
length of a β-sheet is 6 residues. Each CNN_1D layer is followed by a Dropout layer.

Input CNN_1D_11 Dropout CNN_1D_11 Dropout

Output Softmax Dropout CNN_1D_11

Fig. 3. Whole protein prediction CNN (CNN-WP) model architecture
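A comparable sketch for the CNN-WP idea, again an illustration rather than the exact implementation: it assumes the whole 700-residue sequence with 57 features per residue as input, three Conv1D layers with kernel size 11 each followed by dropout, and a per-residue softmax over the 9 labels (8 states plus 'NoSeq'); the filter counts are placeholders.

from tensorflow.keras import layers, models

def build_cnn_wp(seq_len=700, n_features=57, n_labels=9):
    return models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.Conv1D(128, 11, padding="same", activation="relu"),  # filters: placeholder
        layers.Dropout(0.38),
        layers.Conv1D(128, 11, padding="same", activation="relu"),
        layers.Dropout(0.38),
        layers.Conv1D(64, 11, padding="same", activation="relu"),
        layers.Dropout(0.38),
        # Dense applied to a 3D tensor acts on the last axis, so this produces
        # one softmax over the 9 labels at each of the 700 positions.
        layers.Dense(n_labels, activation="softmax"),
    ])

model = build_cnn_wp()
print(model.output_shape)  # (None, 700, 9)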

3.3 LSTM+ Model


LSTM is a RNN architecture with feedback connections. LSTM was developed to deal
with the exploding and vanishing gradient problems that can be encountered when
training traditional RNN [20]. The LSTM unit is composed of memory cell, input gate,
output gate, input modulation gate and forget gate. The memory cell remembers values
over arbitrary time intervals and the four gates regulate the flow of information into and
out of the cell.
BLSTM is an extension of LSTM that can improve model performance on
sequence classification problems. In problems where all time steps of the input
sequence are available, BLSTM combines two LSTMs, one moves forward through
time beginning from the start of the sequence, and one moves backward beginning
from the end of the sequence. This can provide additional context to the network and
result in faster and even fuller learning on the problem [21].
Figure 4 shows the architecture of our LSTM+ model. Its feature learning &
extraction part consists of a LSTM Layer, followed by a Dropout Layer, and a BLSTM
layer. In the LSTM and BLSTM layers, 32 and 9 filters were used, respectively.

Input LSTM Dropout BatchNorm BLSTM BatchNorm CP Output

Fig. 4. LSTM+ model architecture

3.4 GRU+ Model


GRU is another RNN architecture, similar to LSTM. The GRU unit is composed of
reset and update gates instead of input, output and forget gates of the LSTM unit.
BGRU combines a GRU that moves forward through time beginning from the start
of the sequence along with another GRU that moves backward through time beginning
from the end of the sequence.
Figure 5 shows the architecture of our GRU+ model. Its feature learning &
extraction part consists of a GRU Layer, followed by a Dropout Layer, and a BGRU
layer. In the GRU and BGRU layers, 32 and 9 filters were used, respectively.

Input GRU Dropout BatchNorm BGRU BatchNorm CP Output

Fig. 5. GRU+ model architecture
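A minimal sketch of the recurrent variants using the Keras Bidirectional wrapper: the unit counts (32 for the unidirectional layer, 9 per direction in the bidirectional layer) follow the text, while the input arrangement, the per-residue output, and the dense width are assumptions; swapping layers.GRU for layers.LSTM gives the LSTM+ counterpart.

from tensorflow.keras import layers, models

def build_gru_plus(seq_len=700, n_features=57, n_labels=9):
    return models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        # Unidirectional recurrent layer, then dropout and batch normalization,
        # followed by a bidirectional recurrent layer (Fig. 5).
        layers.GRU(32, return_sequences=True),
        layers.Dropout(0.38),
        layers.BatchNormalization(),
        layers.Bidirectional(layers.GRU(9, return_sequences=True)),
        layers.BatchNormalization(),
        # Classification part: dense layers with a per-residue softmax.
        layers.Dense(64, activation="relu"),   # width: placeholder
        layers.Dense(n_labels, activation="softmax"),
    ])

model = build_gru_plus()
print(model.output_shape)  # (None, 700, 9)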

Input Local Block BGRU BGRU BatchNorm

Output CP BatchNorm BGRU BGRU


(a)
CNN_1D_5 TimeDistrib Dropout CNN_1D_3 TimeDistributed(BatchNorm)
TimeDistributed(Flatten) TimeDistributed(Dropout)
(b)

Fig. 6. (a) CNN-BGRU model architecture, (b) local block architecture



3.5 CNN-BGRU Model


Figure 6(a) shows the architecture of our CNN-BGRU model. Its feature learning &
extraction part consists of a local block, followed by 4 BGRU Layers. The local block,
as shown in Fig. 6(b), consists of 2 CNN_1D Layers, with local sliding window of size
17 and kernel sizes 5 and 3. In this block, the TimeDistributed (layer) wrapper was
used to apply a layer to every temporal slice (time step) of an input.

3.6 CNN-BLSTM Model


Figure 7 shows the architecture of our CNN-BLSTM model. Its feature learning &
extraction part consists of a local block, shown in Fig. 6, followed by 4 BLSTM
Layers.

Input Local Block BLSTM BLSTM BatchNorm

Output CP BatchNorm BLSTM BLSTM

Fig. 7. CNN-BLSTM model architecture
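A sketch of the combined models, under the assumption that the input is arranged as one 17-residue window (57 features per residue) per sequence position, so that the TimeDistributed wrapper can apply the local block to every position before the stacked bidirectional layers; the filter counts, recurrent unit sizes, and dense width are placeholders, and replacing GRU with LSTM yields the CNN-BLSTM variant.

from tensorflow.keras import layers, models

def build_cnn_birnn(seq_len=700, window=17, n_features=57, n_labels=9,
                    rnn=layers.GRU):
    return models.Sequential([
        layers.Input(shape=(seq_len, window, n_features)),
        # Local block (Fig. 6b): two Conv1D layers (kernels 5 and 3) applied to
        # each position's window via TimeDistributed, then flattened.
        layers.TimeDistributed(layers.Conv1D(64, 5, padding="same", activation="relu")),
        layers.TimeDistributed(layers.BatchNormalization()),
        layers.TimeDistributed(layers.Dropout(0.38)),
        layers.TimeDistributed(layers.Conv1D(64, 3, padding="same", activation="relu")),
        layers.TimeDistributed(layers.BatchNormalization()),
        layers.TimeDistributed(layers.Dropout(0.38)),
        layers.TimeDistributed(layers.Flatten()),
        # Four stacked bidirectional recurrent layers (Fig. 6a / Fig. 7).
        layers.Bidirectional(rnn(32, return_sequences=True)),
        layers.Bidirectional(rnn(32, return_sequences=True)),
        layers.BatchNormalization(),
        layers.Bidirectional(rnn(32, return_sequences=True)),
        layers.Bidirectional(rnn(32, return_sequences=True)),
        layers.BatchNormalization(),
        # Classification part: per-residue dense layers ending in softmax.
        layers.Dense(64, activation="relu"),
        layers.Dense(n_labels, activation="softmax"),
    ])

cnn_bgru = build_cnn_birnn()                    # CNN-BGRU
cnn_blstm = build_cnn_birnn(rnn=layers.LSTM)    # CNN-BLSTM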

4 Experiments

4.1 Datasets
In this study, the training dataset CB6133, produced by PISCES CullPDB server [22],
is used. This dataset contains 6128 non-homologous sequences, each of 39900 features.
In the 6128 proteins, 5600 proteins are training samples, 256 proteins are validation
samples and 272 proteins are testing samples. This dataset is publicly available from
literature [4]. Protein sequences in CB6133 have a similarity less than 25%, a reso-
lution better than 3.0 Å and an R factor of 1.0. The redundancy with test datasets was
removed using cd-hit [23]. The 6128 proteins × 39900 features were reshaped into
6128 proteins × 700 amino acids × 57 features. The test dataset CB513, obtained from
[4], is used. It is widely used to evaluate the performance of the PSSP methods. It
consists of 26143 α-helix, 1180 β-bridge, 17994 β-strand, 3132 3₁₀-helix, 30 π-helix,
10008 Turn, 8310 Bend, and 17904 Coil.

4.2 Input Representation


Since we consider proteins with a maximal sequence length of 700 amino acids,
proteins shorter than 700 AA were padded with zeros and the corresponding outputs
are labeled with “NoSeq”. This padding allows us to have the same input shape for any
protein. Thus, amino acid chains are described by a 700 × 57 matrix. The 700 denotes
the peptide chain and the 57 denotes the number of features in each amino acid. Among
the 57 features, 22 represent the primary structure (20 amino acid, 1 unknown or any
amino acid, 1 ‘NoSeq’ - padding), 22 represent the Protein Profile (same as primary
structure), 2 represent N- and C-terminals, 2 represent relative and absolute solvent


accessibility and 9 represent the secondary structure (8 possible states, 1 ‘NoSeq’ -
padding).
The input to each model consists of position-specific scoring matrix (PSSM), N-
and C-terminals, and relative and absolute solvent accessibility, used only for training
(absolute accessibility is thresholded at 15; relative accessibility is normalized by the
largest accessibility value in protein and thresholded at 0.15; original solvent acces-
sibility is computed by DSSP (Define Secondary Structure of Proteins) (https://swift.
cmbi.umcn.nl)). To generate a PSSM, we ran PSI-Blast [24] to search the NCBI non-
redundant database through three iterations with E-value = 0.001. To ensure the net-
work gradients decrease smoothly, the above 57 features were normalized by logistic
function. The secondary structure labels are generated by DSSP [3], and the absolute
solvent accessibility is predicted by the neural networks [25].
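As a sketch of this preparation step (illustrative only: the file name, the feature index ranges, and the contiguous train/validation/test split below are assumptions standing in for the 5600/256/272 protein counts and the feature groups described above):

import numpy as np

# Assumed file name for the flattened CullPDB/CB6133 array.
raw = np.load("cullpdb+profile_6133.npy")      # expected shape: (6128, 39900)
data = raw.reshape(-1, 700, 57)                # (proteins, residues, features)

# Illustrative feature slices; the exact index ranges must be taken from the
# dataset documentation and are placeholders here.
SEQ = slice(0, 22)        # one-hot primary structure (20 AA + unknown + 'NoSeq')
LABELS = slice(22, 31)    # 8 secondary-structure states + 'NoSeq'
PROFILE = slice(35, 57)   # protein profile (PSSM)

x = np.concatenate([data[:, :, SEQ], data[:, :, PROFILE]], axis=-1)
y = data[:, :, LABELS]

# Contiguous split used only as an illustration of the 5600/256/272 counts.
x_train, y_train = x[:5600], y[:5600]
x_val, y_val = x[5600:5856], y[5600:5856]
x_test, y_test = x[5856:], y[5856:]
print(x_train.shape, y_train.shape)   # (5600, 700, 44) (5600, 700, 9)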

4.3 Evaluation Methods


We have used accuracy, precision, recall, and F-score metrics for measuring the Q8
prediction performance of our DL models. In addition, for each model:
• Eight AUC-ROC curves [26], for the 8 individual secondary structure classes, were
plotted to measure how much the model is capable of distinguishing each class from
other classes.
• Training and validation losses and accuracy curves were plotted. Training Curves,
calculated from a training dataset, indicate how well the model is learning. Vali-
dation Curves, calculated from a validation dataset, indicate how well the model is
generalizing. These curves are used to diagnose an underfit, overfit, or well-fit
model.
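A sketch of how these measures could be computed with scikit-learn (illustrative only: it assumes one-hot true labels and softmax scores already flattened over residues with the 'NoSeq' padding removed, and macro averaging is one possible choice, not necessarily the one used in the paper):

import numpy as np
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             roc_auc_score)

def q8_metrics(y_true, y_prob):
    # y_true: (n_residues, 8) one-hot labels; y_prob: (n_residues, 8) softmax scores.
    true_cls = y_true.argmax(axis=-1)
    pred_cls = y_prob.argmax(axis=-1)
    acc = accuracy_score(true_cls, pred_cls)
    prec, rec, f1, _ = precision_recall_fscore_support(
        true_cls, pred_cls, average="macro", zero_division=0)
    # One AUC-ROC value per secondary-structure class (one-vs-rest).
    aucs = [roc_auc_score(y_true[:, c], y_prob[:, c]) for c in range(y_true.shape[1])]
    return acc, prec, rec, f1, aucs

# Example with random placeholder data.
rng = np.random.default_rng(0)
y_true = np.eye(8)[rng.integers(0, 8, size=1000)]
y_prob = rng.random((1000, 8))
y_prob /= y_prob.sum(axis=1, keepdims=True)
print(q8_metrics(y_true, y_prob))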

4.4 Experimental Results and Comparative Analysis


The developed models were implemented in Keras, which is a publicly available DL
software. Weights in each model were initialized using default values. The learning rate
was set to 0.0009. The Categorical Cross-Entropy loss function was used to train the
model. The dropout rate and batch size were set to 0.38 and 64, respectively. The
models were trained on Lenovo ideapad520, Processor core i7, NVIDIA GeForce, with
8 GB memory.
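A minimal sketch of the corresponding training configuration (learning rate 0.0009, categorical cross-entropy loss, batch size 64, dropout 0.38 inside the models); the choice of Adam as the optimizer and the tiny stand-in model and random data below are assumptions added only so the snippet runs on its own:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Tiny stand-in model and random data; in practice `model` would be one of the
# architectures sketched above and the data the reshaped CB6133 arrays.
model = models.Sequential([
    layers.Input(shape=(700, 57)),
    layers.Conv1D(32, 11, padding="same", activation="relu"),
    layers.Dropout(0.38),
    layers.Dense(9, activation="softmax"),
])
x_train = np.random.rand(32, 700, 57).astype("float32")
y_train = np.eye(9, dtype="float32")[np.random.randint(0, 9, size=(32, 700))]

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0009),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=64, epochs=1)  # 35 or 50 epochs in the text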
For all models, except CNN-SW and CNN-WP, the Q8 accuracy results were
reported only for 35 epochs, as no improvement was achieved with more epochs. The
running times for the models were 10, 3, 17, 11, 44 and 43 s, respectively.
The CNN-SW Model, shown in Fig. 2, has been trained using CB6133, and
achieved Q8 prediction accuracy of 71% for 35 epochs and 72% for 50 epochs. It
achieved Q8 prediction accuracy of 69.19% on CB513. Figure 8 shows the learning
accuracy and loss curves, which indicate good fit.

Fig. 8. Training and validation accuracy and losses of the CNN-SW model

The CNN-WP Model, shown in Fig. 3, has been trained with CB6133 for 50
epochs, and achieved Q8 prediction accuracy 90.18%. It achieved Q8 prediction
accuracy of 92.07% for 50 epochs on CB513. Figure 9 shows the learning accuracy
and loss curves, which indicate good fit.
The LSTM+ Model, shown in Fig. 4, has been trained using CB6133, and
achieved Q8 prediction accuracy 70.60%. It achieved Q8 prediction accuracy of
65.55% on CB513. Figure 10 shows the learning accuracy and learning loss curves.
There are large gaps between training and validation accuracy and loss curves, which
indicate that this model is overfitting the training and validation data.

Fig. 9. Training and validation accuracy and losses of the CNN-WP model

Fig. 10. Training and validation accuracy and losses of the LSTM+ model

The GRU+ Model, shown in Fig. 5, has been trained using CB6133, and achieved
Q8 prediction accuracy of 70.26%. It achieved Q8 prediction accuracy of 66.68% on
CB513. Figure 11 shows the learning accuracy and loss curves, which indicate that this
model is overfitting the training and validation data.
The CNN-BGRU Model, shown in Fig. 6, has been trained using CB6133, and
achieved Q8 prediction accuracy of 71.72%. It achieved Q8 prediction accuracy of
68.13% on CB513. Figure 12 depicts the learning accuracy and learning loss curves,
which show good fit. This indicates that CNN-BGRU is more robust to overfitting and
is generalizing better than GRU+.

Fig. 11. Training and validation accuracy and losses of the GRU+ model

The CNN-BLSTM Model, shown in Fig. 7, has been trained using CB6133, and
achieved Q8 prediction accuracy of 70.82%. It achieved Q8 prediction accuracy of
68.00% on CB513. Figure 13 depicts the learning accuracy and loss curves, which
show good fit. This indicates that CNN-BLSTM is more robust to overfitting and is
generalizing better than LSTM+.

Fig. 12. Training and validation accuracy and losses of the CNN-BGRU model

Fig. 13. Training and validation accuracy and losses of the CNN-BLSTM model

Table 1. AUC-ROC values for individual secondary structure classes with the proposed models
Secondary structure CNN-SW CNN-WP LSTM+ GRU+ CNN-BGRU CNN-BLSTM
L 0.87 0.87 0.87 0.88 0.87 0.89
B 0.84 0.84 0.84 0.85 0.84 0.87
E 0.96 0.96 0.95 0.95 0.96 0.96
G 0.89 0.86 0.87 0.87 0.86 0.89
I 0.91 0.85 0.88 0.72 0.86 0.86
H 0.98 0.97 0.97 0.97 0.97 0.98
S 0.87 0.85 0.85 0.85 0.85 0.87
T 0.90 0.89 0.89 0.89 0.89 0.90

Table 1 shows the AUC-ROC values for individual secondary structure classes
with the proposed models. As the table shows, the values for each class for all models
are close to 1, which means the models have a good measure of separability, i.e. they are capable
of distinguishing between classes. The table shows also that CNN-BLSTM achieved
the highest values for all classes, except class I.
Tables 2 and 3 show comparisons of the Q8 accuracy, precision, recall and F-score,
on CB6133 and CB513, respectively, between the proposed models. As the tables
show, CNN-WP achieved the highest accuracy, but CNN-BLSTM achieved the
highest precision, recall and F-score, followed by CNN-BGRU and CNN-SW.

Table 2. A comparison of Q8 accuracy, precision, recall and F-score on CB6133 dataset


between the proposed models
CNN-SW CNN-WP LSTM+ GRU+ CNN-BGRU CNN-BLSTM
Accuracy % 71.71 90.18 70.60 70.26 71.72 70.819
Precision 0.8295 0.5576 0.6541 0.7109 0.8295 0.9284
Recall 0.8747 0.7795 0.3710 0.3972 0.8747 0.8867
F-score 0.8515 0.6502 0.4735 0.5097 0.8515 0.9071

Table 3. A comparison of the Q8 accuracy, precision, recall and F-score on CB513 test set
between the proposed models
CNN-SW CNN-WP LSTM+ GRU+ CNN-BGRU CNN-BLSTM
Accuracy % 68.38 92.07 65.55 66.68 68.13 68.00
Precision 0.8791 0.4826 0.6133 0.6956 0.8791 0.9492
Recall 0.8443 0.3559 0.3771 0.4181 0.8443 0.8593
F-score 0.8614 0.4096 0.4671 0.5223 0.8614 0.9021

Table 4 shows a comparison of the Q8 accuracy on CB513 between our models


and the following state-of-the-art models: DCRNN [5], DeepCNF [11], SC-GSN
[4], BLSTM [10], NCCNN [12], and CRRNN [15]. As the table shows,
CNN-WP achieved the highest accuracy, followed by CRRNN, NCCNN and DCRNN.
It also shows that CNN-SW, CNN-BGRU, CNN-BLSTM, and DeepCNF reached
nearly similar accuracies.

Table 4. A comparison of the Q8 accuracy on CB513 between our models and state-of-the-art
models
Proposed models Accuracy % State-of- the-art models Accuracy %
CNN-SW 68.38 SC-GSN 66.4
CNN-WP 92.07 BLSTM 67.4
LSTM+ 65.55 DeepCNF 68.3
GRU+ 66.68 DCRNN 69.7
CNN-BGRU 68.13 NCCNN 70.3
CNN-BLSTM 68.00 CRRNN 71.4

From the above comparative study, we can conclude that:


• Although CNN-WP achieved very high accuracy, CNN-SW achieved higher pre-
cision, recall and F-score. Therefore, we used CNN-SW in CNN-BGRU and CNN-
BLSTM.
• Combining CNN with each of BLSTM and BGRU has overcome the overfitting,
which occurred with LSTM+ and GRU+ models. Also, they achieved better Q8
accuracy, precision, recall and F-score than LSTM+ and GRU+, respectively.
• CNN-BGRU and CNN-BLSTM consume more running time than other models.
• CNN-BLSTM achieved the highest AUC-ROC values for almost all individual
secondary structure classes, and it achieved, on CB6133 and CB513, the highest
precision, recall and F-score, followed by CNN-BGRU and CNN-SW.
• The comparison of the Q8 accuracy on CB513 between our models and state-of-the-
art models, showed that CNN-SW, CNN-BGRU, and CNN-BLSTM achieved
results comparable with some state-of-the-art models.

5 Conclusion

This paper explored the performance of various DL architectures for Q8 PSSP, by


developing 6 DL architectures, using CNNs, RNNs, and combinations of them. These
architectures are: CNN-SW, CNN-WP, LSTM+, GRU+, CNN-BGRU, and CNN-
BLSTM. They include batch normalization, dropout, and fully-connected layers.
We have used CB6133 and CB513 as training and test datasets, respectively. The
experiments showed that combining CNN with BLSTM or BGRU has overcome
overfitting, which occurred with LSTM+ and GRU+ models, and CNN-BLSTM
achieved the best Q8 accuracy, precision, recall and F-score values. Also, CNN-
BLSTM achieved the highest AUC-ROC values for almost all individual secondary
structure classes. The experiments on CB513 showed that CNN-SW, CNN-BGRU, and
CNN-BLSTM achieved Q8 accuracy comparable with some state-of-the-art models.
We have noticed that CNN-WP achieved very high Q8 accuracy, but achieved very
low precision, recall and F-score.

References
1. Noble, M.E., Endicott, J.A., Johnson, L.N.: Protein kinase inhibitors: insights into drug
design from structure. Science 303(5665), 1800–1805 (2004)
2. Zhou, J., Wang, H., Zhao, Z., Xu, R., Lu, Q.: CNNH_PSS: protein 8-class secondary
structure prediction by convolutional neural network with highway. BMC Bioinform. 19
(60), 99–119 (2018)
3. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of
hydrogen-bonded and geometrical features. Biopolymers 22(12), 2577–2637 (1983)
4. Zhou, J., Troyanskaya, O.G.: Deep supervised and convolutional generative stochastic
network for protein secondary structure prediction. In: 31st International Conference on
Machine Learning (ICML 2014), pp. 745–53. PMLR, Bejing (2014)
5. Li, Z., Yu, Y.: Protein secondary structure prediction using cascaded convolutional and
recurrent neural networks. In: 25th International Joint Conference on Artificial Intelligence
(IJCAI 2016), pp. 2560–2567. AAAI Press, California (2016)
6. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
7. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
8. Cho, K., Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio,
Y.: Learning phrase representations using RNN encoder–decoder for statistical machine
translation. In: 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 1724–1734. Association for Computational Linguistics, Doha (2014)
9. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal
Process. 45(11), 2673–2681 (1997)
10. Sønderby, S.K., Winther, O.: Protein secondary structure prediction with long short term
memory networks. arXiv:1412.7828v2 [q-bio.QM] (2014)
11. Wang, S., Peng, J., Ma, J., Xu, J.: Protein secondary structure prediction using deep
convolutional neural fields. Sci. Rep. 6, Article number 18962 (2016)
12. Busia, A., Jaitly, N.: Next-step conditioned deep convolutional neural networks improve
protein secondary structure prediction. In: Conference on Intelligent Systems for Molecular
Biology and European Conference on Computational Biology (ISMB/ECCB 2017).
International Society of Computational Biology, Leesburg (2017)
13. Heffernan, R., Yang, Y., Paliwal, K., Zhou, Y.: Capturing non-local interactions by long
short-term memory bidirectional recurrent neural networks for improving prediction of
protein secondary structure, backbone angles, contact numbers and solvent accessibility.
Bioinformatics 33(18), 2842–2849 (2017)
14. Fang, C., Shang, Y., Xu, D.: MUFOLD-SS: new deep inception-inside-inception networks
for protein secondary structure prediction. Proteins 86(5), 592–598 (2018)
15. Zhang, B., Li, J., Lü, Q.: Prediction of 8-state protein secondary structures by a novel deep
learning architecture. BMC Bioinform. 19(293), 1–13 (2018)
16. Kumar, P., Bankapur, S., Patil, N.: An enhanced protein secondary structure prediction using
deep learning framework on hybrid profile based features. Appl. Soft Comput. J. 86
(105926), 1–10 (2020)
17. Brownlee, J.: Better Deep Learning: Train Faster, Reduce Overfitting, and Make Better
Predictions. v1.7. edn. Machine Learning Mastery, Vermont (2020)
18. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing
internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
19. Brownlee, J.: How to Reduce Overfitting with Dropout Regularization in Keras. https://
machinelearningmastery.com/how-to-reduce-overfitting-with-dropout-regularization-in-
keras/. Accessed 3 Oct 2019
20. Long short-term memory, From Wikipedia, https://en.wikipedia.org/wiki/Long_short-term_
memory. Accessed 29 Aug 2019
21. Brownlee, J.: How to Develop a Bidirectional LSTM for Sequence Classification in Python
with Keras. Long Short-Term Memory Networks. Accessed 16 June 2017
22. Wang, G., Dunbrack, R.L.: PISCES: a protein sequence culling server. Bioinformatics 19
(12), 1589–1591 (2003)
23. Li, W., Godzik, A.: Cd-hit: a fast program for clustering and comparing large sets of protein
or nucleotide sequences. Bioinformatics 22(13), 1658–1659 (2006)
24. Altschul, S.F., Gertz, E.M., Agarwala, R., Schaaffer, A.A., Yu, Y.K.: PSI-Blast pseudo
counts and the minimum description length principle. Nucleic Acids Res. 37(3), 815–824
(2009)
25. Chen, H., Zhou, H.X.: Prediction of solvent accessibility and sites of deleterious mutations
from protein sequence. Nucl. Acids Res. 33(10), 3193–3199 (2005)
26. Narkhede, S.: Understanding AUC - ROC Curve. https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5. Accessed 26 June 2018
Power and Control Systems
Energy Efficient Spectrum Aware Distributed
Clustering in Cognitive Radio Sensor Networks

Randa Bakr1(&), Ahmad A. Aziz El-Banna1, Sami A. A. El-Shaikh2,


and Adly S. Tag ELdien1
1
Electrical Engineering Department, Faculty of Engineering at Shoubra,
Benha University, Benha, Egypt
randa.bakr@gmail.com
2
Mechanical and Electrical Research Institute, National Water Research Center,
Al Qanatir Al Khayriyah 13411, Egypt

Abstract. Cognitive Radio Sensor Network (CRSN) is an advanced wireless


communication paradigm which overcomes the energy and bandwidth constraints of
the conventional Wireless Sensor Network (WSN) by avoiding overcrowded
transmission bands. Sensor nodes in a CRSN function as secondary users, gaining
opportunistic access to unoccupied channels in a band that is primarily licensed to a
primary user. In this paper, we propose a technique that constructs energy-efficient
distributed clusters in a dynamic frequency medium in a self-organized fashion.
Intra-cluster connectivity is enhanced by picking the cluster head through a rotation
mechanism that depends on the nodes' residual energy, number of available channels,
single-hop neighbors, distance to the sink, and communication cost. Moreover, based
on these network parameters, the appropriate number of clusters is determined.
Simulation results proved the superiority of the proposed technique over other
algorithms in terms of the lifespan of the network.

Keywords: Cognitive radio sensor networks · Clustering algorithm · Energy efficient · Wireless sensor networks · Lifespan of the network

1 Introduction

Wireless Sensor Network (WSN) is a network of small devices, known as sensor


nodes. These sensor nodes are typically clustered and collaborate to transmit
information from the monitored field using the ISM (Industrial, Scientific,
and Medical) bands. WSN technology offers several advantages over traditional
networking solutions, such as scalability, lower cost, accuracy, reliability, flexibility, and
ease of deployment, which make it popular in a wide range of applica-
tions. However, WSN also has limited resources, which include limited bandwidth,
short transmission range, constrained energy and restricted storage and processing
capability [1]. The electromagnetic spectrum faces a scarcity problem, as the ISM
bands are overcrowded by the rapidly increasing number of devices linked
through Internet of Things (IoT) network technology, whereas the licensed
bands often go underutilized. Cognitive Radio (CR) has been suggested as an approach for
utilization of radio spectrum with an opportunistic allocation policy [2]. CR is a


wireless networking technology, that enables the transceivers from detecting the vacant
communication channels of the licensed users (Primary Users (PUs)), then the trans-
ceivers communicate immediately by these vacant channels, the transceivers with this
functionality are classified as a Secondary Users (SUs). However, CR has challenges
regarding the dynamic spectrum access and spectrum-efficient communication [3, 4].
Existing WSN approaches do not consider CR features and thus do not resolve the
challenges of a dynamic spectrum. In addition, approaches have been proposed for
Cognitive Radio Network (CRN) that may not take the hardware and energy challenges
into consideration. Therefore, it is proposed to smartly combine cognitive radio net-
works with wireless sensor networks in [5]. With the wide-range and overlaid
deployments, CRSNs represent a promising solution for the hardware and energy limitation
issues resulting from WSN and the dynamic spectrum issues resulting from CR. Moreover,
CRSNs inherit the characteristics of the conventional WSNs which are restricted by
energy, constrained storage and processing resources. Therefore, an energy efficient
communication is considered a major challenge in CRSNs, as it requires extending the
lifetime of the network through a certain design topology of maintenance techniques in
the dynamic spectrum environment [6]. Recently, researchers focus on clustering,
nodes connectivity, and establishing paths between nodes to enhance network lifetime,
scalability, and stability [7, 8]. Clustering is a process of grouping and organizing
similar sensor nodes with certain objectives to prolong network lifetime by saving
energy and bandwidth resources. Whereas, each sensor node transmits its information
to its Cluster Head (CH), which forwards the aggregated data information to the Base
Station (BS) [9]. The clustering for CRSN takes into account the common available
channels for nodes.
Within this paper, a distributed energy-efficient clustering strategy aware of spec-
trum is proposed. Whereas, the appropriate number of clusters is defined. In addition,
the proposed technique takes into account the process of electing CH for each cluster,
which prolongs the network lifetime.
The remainder of this paper is organized as follows. Section 2 provides a
description of the previous relevant research. In Sect. 3, the system model is explained.
Section 4 describes the proposed approach, while the performance evaluation is out-
lined in Sect. 5. Finally, Sect. 6 concludes the paper.

2 Related Work

Much attention has already been given to the issue of clustering,
whether in WSNs or CRNs, but only some of these schemes are completely appli-
cable to CRSN. Some considerations must be taken into account to fulfill the main
objective of CRSN, which is to collect application-specific source information quickly
and accurately. In addition, CRSN inherits some resource constraints from traditional
WSNs such as energy scarcity. In [10, 11], the energy-efficient Low Energy
Adaptive Clustering Hierarchy (LEACH) protocol is proposed, where each node automatically
decides whether it is a cluster head node or a member node depending on the prob-
ability assigned to it, and then member nodes join the nearest cluster head.
This protocol prolongs the network lifetime, but it assumes fixed channel, thus not
applicable for CRSN. In [12], another approach is developed, and named Hybrid
Energy Efficient Distributed (HEED) protocol, where CH election depends on node’s
residual energy and the proximity to neighbors. This approach also improves the
lifetime and throughput of the network significantly, but it has fixed channel, which
limits its applicability to CRSN. The authors in [13], investigate the route detection and
implement CRN clustering techniques, ensuring a connectivity, nevertheless, their
work is considered non-energy-efficient in allocating resources. Conversely, energy effi-
ciency is considered in [14], where clusters are formed according to the occurrence
of such an event; in addition, clusters are created between the location of the event and
the sink to reduce excessive formation of clusters. However, for the networks where
rate of event occurrence is high, frequent re-clustering remains a challenge. A spec-
trum-aware clustering scheme for CRSN is mentioned in [15], by introducing a Cen-
tralized Spectrum-Aware Clustering algorithm (CSAC) and a Distributed Spectrum-
Aware Clustering (DSAC) technique, the main idea for the two algorithms is the same,
but DSAC has higher stability and lower complexity than CSAC. This is due to the
self-manner for sensor nodes in the clustering formation process for DSAC. This leads
to no need for a central control unit as in CSAC, which decreases the intensive
exchange of messages between nodes and CHs and works to utilize energy efficiently.
The authors in [16, 17] design clustering algorithms for CRSN that extend
the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol, where the choice
of a node as a cluster head depends only on the number of
unoccupied channels, used as a weight. In [18], an approach that considers both dynamic
spectrum and energy challenges is proposed. The CH election process depends on the
available channels of the nodes, the residual energy values, the number of neighbors
and the distance from the sink.

3 System Model

We consider a CRSN model consisting of a BS and N nodes with CR capabilities,


divided into a number of clusters K and randomly deployed through a uniform dis-
tribution within a network region of area M². Inside this area, the spectrum resource is divided
into a number of channels (ch), which are shared by the sensor nodes n_i, where i ranges from
1 to N. In addition, there are P PUs. The process of selecting a licensed
channel for each sensor node is established based on the activities of the PUs, which are
deployed alongside the sensor nodes. Any node within the protection area of an
active PU considers a channel busy if that active PU uses it. Formation of
the clusters depends on the sensor nodes that communicate through utilizing the vacant
channels at their particular locations.

4 The Proposed Approach


4.1 Spectrum Aware Clustering for CRSN
The proposed spectrum-aware clustering structure is formed such that the
CRSN nodes located within the protected range of the PUs cannot use the
channels occupied by those PUs. A number of neighboring nodes with common
channels form clusters according to group-wise restrictions [7]. One of the member
nodes for a particular cluster is selected as a CH for this cluster. During intra-cluster
communication, the readings of source information for CRSN nodes are sent to their
CH via the local common channel. In addition, for inter-cluster communication, the CH
aggregates the source data information and sends the data information to the sink.

4.2 Minimizing Communication Energy


The main aim of this work is to reduce the power consumption considering the
appropriate number of clusters in the network to maintain the power consumed in intra-
and inter-cluster connectivity. It should be mentioned that increasing the intra-cluster
distances would increase the intra-cluster communication energy. For the process of
clustering, the neighborhood-information is exchanged between a pair of clusters to
obtain the local minimum distance between them. Based on the local nearest distance
and the common available channel, a pair of clusters can be merged. The energy
consumed by the CH nodes comes from receiving the signals of the member
nodes, aggregating them, and transmitting the aggregated signal to the
BS. The model defined in [10], shown in Fig. 1, is used for calculating the energy
consumption of the sensor nodes. The energy dissipated by the CH is expressed as [10]:

E_{CH} = l E_{elec} (n - 1) + l E_{DA} n + l E_{elec} + l \epsilon_{mp} d_{toBS}^{4}    (1)

where l is the total number of bits in each data message, E_{elec} is the transmitter
electronics energy, n is the number of nodes in a cluster, E_{DA} is the energy dissipated per bit
due to data aggregation, \epsilon_{mp} is the power amplifier parameter of the multipath model, and d_{toBS} is the
distance between the cluster head node and the BS.

Fig. 1. Energy dissipation model.



The dissipated energy for any node of the member nodes is expressed as [10]:

E_{nonCH} = l E_{elec} + l \epsilon_{fs} d_{toCH}^{2}    (2)

where \epsilon_{fs} is the power amplifier parameter of the free-space model.


Assuming the area occupied by each cluster is approximately M^{2}/k, and according to the
derivation in [10], the optimal number of clusters can be expressed as:

k_{opt} = \frac{\sqrt{N}}{\sqrt{2\pi}} \sqrt{\frac{\epsilon_{fs}}{\epsilon_{mp}}} \frac{M}{d_{toBS}^{2}}    (3)
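A small numerical sketch of Eqs. (1)-(3) follows; the radio constants below are illustrative placeholders and are not values given in the paper:

import math

# Illustrative radio parameters (placeholders, not taken from the paper).
L = 4000             # bits per data message (l)
E_ELEC = 50e-9       # J/bit, transmitter/receiver electronics
E_DA = 5e-9          # J/bit, data aggregation
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath amplifier model
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier model

def e_cluster_head(n, d_to_bs):
    # Eq. (1): energy spent by a CH serving n member nodes.
    return (L * E_ELEC * (n - 1) + L * E_DA * n
            + L * E_ELEC + L * EPS_MP * d_to_bs ** 4)

def e_member(d_to_ch):
    # Eq. (2): energy spent by a non-CH member node.
    return L * E_ELEC + L * EPS_FS * d_to_ch ** 2

def k_opt(n_nodes, area_side, d_to_bs):
    # Eq. (3): appropriate number of clusters.
    return (math.sqrt(n_nodes) / math.sqrt(2 * math.pi)
            * math.sqrt(EPS_FS / EPS_MP) * area_side / d_to_bs ** 2)

print(e_cluster_head(n=10, d_to_bs=120.0))
print(e_member(d_to_ch=25.0))
print(round(k_opt(n_nodes=100, area_side=100.0, d_to_bs=120.0)))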

4.3 Cluster Head Election


Keeping one specific node as the cluster head would deplete its power faster than the
other nodes in the cluster, because the CH consumes more power than the other nodes
in the same cluster. Therefore, a CH rotation criterion is employed: a weight is allocated
to each node in the cluster every round, and the node with the maximum weight is chosen
as the CH. The weight allocated to a cluster node is expressed as:

$W(n_i) = E_{n_i} \cdot |ch(n_i)| \cdot Neigh(n_i) \cdot \dfrac{1}{cost_{comm}(n_i)} \cdot \dfrac{1}{d^{2}(n_i, s)}$   (4)

where E_{n_i} is the residual energy of node n_i, |ch(n_i)| is the number of channels
available to the node inside the cluster, Neigh(n_i) is the number of the node's neighbors
in the cluster, d(n_i, s) is the distance from the node to the sink, and cost_comm(n_i) is
the communication cost of the node inside the cluster.
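The weight of Eq. (4) is a simple product of per-node quantities; a minimal sketch is given below, with example values that are assumptions for illustration only.

```python
def node_weight(residual_energy, n_channels, n_neighbors, comm_cost, dist_to_sink):
    """Cluster-head election weight W(n_i) of Eq. (4)."""
    return (residual_energy * n_channels * n_neighbors
            / comm_cost / dist_to_sink ** 2)

# The node with the maximum weight in each cluster becomes the CH
# (the numbers below are assumed example values):
nodes = {
    "n1": node_weight(0.45, 2, 4, 1.0, 60.0),
    "n2": node_weight(0.50, 3, 3, 1.5, 80.0),
}
cluster_head = max(nodes, key=nodes.get)
print(cluster_head)
```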
The communication cost of each node is calculated from the variable transmission power
levels that the sensor nodes use during intra-cluster communication, depending on the
transmission distances. Moreover, to avoid the disadvantages of both static and dynamic
clustering approaches, a hybrid method is used for clustering the network. It should be
mentioned that, in the hybrid method, the CH election does not take place in every round.
In the proposed clustering technique, each node in the network area initially considers
itself a disjoint cluster. The node information consists of the node ID, the channels
available to the node, and its residual energy. In each round, all cluster nodes should be
aware of cluster information such as the cluster sizes, the channels available to their
neighbors, and the distances between nodes and their neighbors. Each cluster then
transmits a merge invitation to the nearest cluster that shares a common channel; if it
receives a matching invitation from that cluster, the two merge into a new cluster. Each
node continuously performs channel sensing, and if PU activity changes, only the affected
node loses its common channel and announces itself as a new cluster. This protects the
network from frequent re-clustering and increases its stability. The process continues
until the appropriate number of clusters $k_{opt}$ is reached. After

clusters are formed, the CH election process is executed by assigning each node a weight
that depends on its residual energy, the number of channels available to it, its number of
neighbors inside the cluster, its distance to the sink, and its communication cost inside the
cluster. The same CH is kept for several rounds until its residual energy reaches a
threshold value αE_in, where α is a constant between zero and one. If the CH detects that
its residual energy has fallen below this threshold, it can no longer be considered as a CH
and cluster head re-election is applied. If all nodes have already served as cluster heads
and still have residual energy, each node of the cluster joins the closest cluster that shares
a common channel. If a node cannot find a cluster with a similar common channel, it
remains in its cluster and the CH election depends only on the maximum weight until all
nodes are dead. The proposed technique is described by the flowchart in Fig. 2, and a
simplified sketch of the clustering loop is given below.
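The sketch below is a compact, simplified view of this loop: clusters are merged pairwise through their nearest common-channel neighbor until k_opt clusters remain, and the CH is kept until its residual energy drops below αE_in. The data structures and the helpers `nearest_pair`, `weights`, `residual` and `e_init` are assumptions for illustration; the actual protocol is distributed and message-driven.

```python
def merge_clusters(clusters, k_opt, nearest_pair):
    """Greedily merge the nearest pair of clusters sharing a common channel
    until only k_opt clusters remain.  `clusters` maps a cluster id to a dict
    with a set of 'channels' and a set of 'nodes' (assumed representation)."""
    while len(clusters) > k_opt:
        pair = nearest_pair(clusters)   # (id_a, id_b) with a common channel, or None
        if pair is None:
            break                       # no further merge is possible
        a, b = pair
        clusters[a]["nodes"] |= clusters[b]["nodes"]
        clusters[a]["channels"] &= clusters[b]["channels"]  # keep only common channels
        del clusters[b]
    return clusters

def elect_ch(cluster, weights, current_ch, residual, e_init, alpha):
    """Keep the current CH until its residual energy drops below alpha * E_in,
    then re-elect the member with the maximum weight (Eq. (4))."""
    if current_ch is not None and residual[current_ch] >= alpha * e_init:
        return current_ch
    return max(cluster["nodes"], key=lambda n: weights[n])
```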

Fig. 2. Flowchart of the proposed technique



5 Performance Evaluation

The performance of the proposed technique is evaluated in this section by comparing it
with LEACH [10] and CogLEACH [16]. For all simulations, 100 CRSN nodes and 10 PUs
are randomly deployed in a 100 m × 100 m area. The protection range of the PUs is 20 m
and the maximum range of a CRSN node is 50 m. Three channels are used by both the PUs
and the CRSN nodes. The communication energy parameters are set as E_elec = 50 nJ/bit,
ε_fs = 10 pJ/bit/m² and ε_mp = 0.0013 pJ/bit/m⁴. The energy for data aggregation is
E_DA = 5 nJ/bit/signal, and 75 m ≤ d_toBS ≤ 185 m.
The number of clusters is varied from 1 to 11 and the average energy dissipated per
round is measured. As shown in Fig. 3, the minimum dissipated energy is obtained when
about five to six clusters are allocated. This finding supports the theoretical analysis of
Sect. 4.2, where the proposed technique yields an appropriate number of clusters of about
five for the assumed parameters.

Fig. 3. Optimal Number of clusters

Comparing the network lifetime of the analyzed protocols, Fig. 4 shows the number of
live nodes for the proposed technique, CogLEACH, and LEACH. Nodes under the
proposed technique last longer than under the others. The clustering process is carried out
in a distributed and self-organized manner that is sensitive to the dynamic spectrum.
Furthermore, the proposed technique selects cluster head nodes in an efficient way to save
communication energy.

Fig. 4. Network lifetime



By changing the value of α (0.1, 0.5, 0.9), the network lifetime changes. Figure 5 shows
that the network lifetime increases as the value of α decreases. This is attributed to the
lower residual energy required for a node to keep working as a cluster head for several
rounds.

Fig. 5. The comparison between different values of α for the proposed technique.

For α = 0.9, the residual energy required for a node to function as a CH is very high.
When the residual energy of the node acting as CH falls below the threshold value αE_in,
the CH node becomes a member node, which quickly leads to a scenario with no available
cluster head nodes. It should be mentioned that in each round some energy is spent on
announcing a node as CH; the CH selection is then made through the maximum node
weight without considering the threshold value αE_in. So, with α = 0.9, the network
energy is rapidly wasted and the network lifetime is very short.

Fig. 6. The relation between the network's average consumed power and the number of rounds with
variation of α.

For α = 0.5, the residual energy required for a node to work as a CH is not as high. This
lets the node work as a CH for several rounds, which saves power for a longer time.

For α = 0.1, the residual energy required for a node to work as a CH is very small. This
lets the node work as a CH for even more rounds, which saves power for the longest time.
Thus, it can be inferred that the smaller the value of α, the longer the network lifetime and
the lower the network power consumption.
Figure 6 shows the total power consumed by all nodes in the network for different
values of α. It is clear that the energy consumed by the network decreases as the value of α
decreases, which means a longer network lifetime.

Fig. 7. The relation between the variation of α for the proposed technique and the number of
network rounds.

Figure 7 shows the relation between the variation of α and the number of network rounds.
The number of rounds decreases as the value of α rises, which shortens the lifetime of the
network.

6 Conclusion

In this paper, an energy-efficient distributed clustering technique is proposed for
cognitive radio sensor networks. The proposed technique groups CRSN nodes into an
appropriate number of clusters and then applies a cluster head selection procedure. It is an
energy-efficient and spectrum-aware clustering technique. Moreover, extensive simulations
demonstrate that the proposed technique consumes a lower amount of energy than other
techniques, which maintains the network lifetime for a longer time.

References
1. Shanthi, S., Nayak, P., Dandu, S.: Minimization of energy consumption in wireless sensor
networks by using a special mobile agent. In: Soft Computing and Signal Processing,
pp. 359–368. Springer, Berlin (2019)
2. Sambana, B., Reddy, L.S., Nayak, D.R., Rao, K.C.B.: Integrative spectrum sensing in
cognitive radio using wireless networks. In: Proceedings of International Conference on
Remote Sensing for Disaster Management, pp. 613–623. Springer (2019)

3. Haykin, S.: Cognitive radio: brain-empowered wireless communications. IEEE J. Sel. Areas
Commun. 23(2), 201–220 (2005)
4. Ahmad, A., Ahmad, S., Rehmani, M.H., Hassan, N.U.: A survey on radio resource allocation
in cognitive radio sensor networks. IEEE Commun. Surv. Tutor. 17(2), 888–917 (2015)
5. Akan, O.B., Karli, O.B., Ergul, O.: Cognitive radio sensor networks. IEEE Netw. 23(4), 34–
40 (2009)
6. Usman, M., Har, D., Koo, I.: Energy-efficient infrastructure sensor network for ad hoc
cognitive radio network. IEEE Sens. J. 16(8), 2775–2787 (2016)
7. Zhang, H., Zhang, Z., Dai, H., Yin, R., Chen, X.: Distributed spectrum-aware clustering in
cognitive radio sensor networks. In: 2011 IEEE Global Telecommunications Conference-
GLOBECOM, pp. 1–6. IEEE (2011)
8. Wu, H., Yao, F., Chen, Y., Liu, Y., Liang, T.: Cluster-based energy efficient collaborative
spectrum sensing for cognitive sensor network. IEEE Commun. Lett. 21(12), 2722–2725
(2017)
9. Daneshvar, S.M.H., Mohajer, P.A.A., Mazinani, S.M.: Energy-efficient routing in WSN: a
centralized cluster-based approach via grey wolf optimizer. IEEE Access 7, 170019–170031
(2019)
10. Heinzelman, W.B., Chandrakasan, A.P., Balakrishnan, H.: An application-specific protocol
architecture for wireless microsensor networks. IEEE Trans. Wireless Commun. 1(4), 660–
670 (2002)
11. Roy, N.R., Chandra, P.: A note on optimum cluster estimation in leach protocol. IEEE
Access 6, 65690–65696 (2018)
12. Younis, O., Fahmy, S.: HEED: a hybrid, energy-efficient, distributed clustering approach for
ad hoc sensor networks. IEEE Trans. Mob. Comput. 3(4), 366–379 (2004)
13. Xu, F., Zhang, L., Zhou, Z., Ye, Y.: Spectrum-aware location-based routing in cognitive
UWB network. In: 2008 3rd International Conference on Cognitive Radio Oriented Wireless
Networks and Communications (CrownCom 2008), pp. 1–5. IEEE (2008)
14. Tabassum, M., Razzaque, M.A., Miazi, M.N.S., Hassan, M.M., Alelaiwi, A., Alamri, A.: An
energy aware event-driven routing protocol for cognitive radio sensor networks. Wirel.
Netw. 22(5), 1523–1536 (2016)
15. Zhang, H., Zhang, Z., Yuen, C.: Energy-efficient spectrum-aware clustering for cognitive
radio sensor networks. Chin. Sci. Bull. 57(28–29), 3731–3739 (2012)
16. Eletreby, R.M., Elsayed, H.M., Khairy, M.M.: CogLEACH: A spectrum aware clustering
protocol for cognitive radio sensor networks. In: 2014 9th International Conference on
Cognitive Radio Oriented Wireless Networks And Communications (CROWNCOM),
pp. 179–184. IEEE (2014)
17. Latiwesh, A., Qiu, D.: Energy efficient spectrum aware clustering for cognitive sensor
networks: CogLeach-C. In: 2015 10th International Conference on Communications and
Networking in China (ChinaCom), pp. 515–520. IEEE (2015)
18. Saini, D., Misra, R., Yadav, R.N.: Distributed event driven cluster based routing in cognitive
radio sensor networks. In: 2016 IEEE Annual India Conference (INDICON), pp. 1–6. IEEE
(2016)
The Autonomy Evolution in Unmanned Aerial
Vehicle: Theory, Challenges and Techniques

Mohamed M. Eltabey1 , Ahmed A. Mawgoud2(&) ,


and Amr Abu-Talleb3
1
Mechanical Power Department, Faculty of Engineering, Cairo University,
Giza, Egypt
mo.eltabey@gmail.com
2
Information Technology Department, Faculty of Computers and Artificial
Intelligence, Cairo University, Giza, Egypt
aabdelmawgoud@pg.cu.edu.eg
3
Computer Science Department, Faculty of Computers Science,
University of People, Pasadena, USA
amr.emad@yahoo.com

Abstract. The research areas in the field of UAVs have grown considerably in recent
years, and the research in this field is driven by the specific needs of each organization
conducting it. There are two main research areas: the first is operational, conducted by
governmental institutions and universities, and the second is technological, conducted
mainly by companies. This paper discusses the current technological research topics in
the field of UAVs, focusing on fuzzy-logic based methods, which are employed in many
control problems to increase the level of autonomy. Fuzzy logic is considered a promising
subject that contains many active research topics and several potential tools for solving
complex control problems, extending UAV capabilities to perform functions such as
optimal path planning, collision avoidance, and trajectory motion and path following
autonomously, without a human pilot and with minimal human supervision. The paper
illustrates the different levels, functions, and challenges of autonomy, and a comparative
analysis is conducted of four potential directions that are considered promising areas for
fuzzy-logic based approaches. It also highlights the two main areas of AI research in the
field of UAV autonomous flight, (1) the imitation of the human pilot and (2) high-level
applications such as image evaluation, and how to tackle some of the problems in these
areas with the aid of fuzzy-logic based machine learning algorithms.

Keywords: UAV · Artificial Intelligence · Autonomous Flight · Fuzzy Logic

1 Introduction

UAVs are considered an essential part of military industries and operations and also play
a main role in many civil, scientific, and commercial applications. Currently, UAVs are
being developed to satisfy various needs, and these needs are the main

motive behind the growing UAV research activity; many government institutions,
universities, research institutes, and public and private sector entities conduct research
that fits their interests [1]. UAV research is divided into two main areas. The first is the
operational research area, which focuses on effective usage in terms of policies,
certifications, etc. The second is technological; the private sector is active in this area, and
universities and research institutes take part in some of its research activities. The two
areas affect one another and are interconnected, e.g., the level of autonomy determines the
type of missions a UAV can achieve. Generally speaking, both domains focus on making
the most of the UAV [2].
In human societies, autonomy is the ability of an individual to make an informed,
self-made decision under his own rules and to remain goal directed in a constantly
changing environment. Autonomy for a machine is the ability to perform a collection of
functions such as sensing, perceiving, analyzing, communicating, planning, and decision
making without human intervention, and to remain goal directed in unpredictable
situations that may vary greatly, from low to high altitudes in a crowded airspace. The
theory behind autonomous flight borrows from many disciplines such as aeronautical
engineering, automatic control, and artificial intelligence [3]. It provides a control
architecture that merges algorithms for mission planning, trajectory generation, path
following, and adaptive control theory under strict performance requirements, such as
requiring all maneuvers to be collision free [4]. Table 1 below lists the ten levels of
automation, and Table 2 lists the main UAV research areas (functions).

Table 1. Automation Levels


Level Description
1 No computer assistance; the operator must make all decisions and take actions
2 System provides a wide-ranging set of decision/action
3 System narrows the selection down to a few
4 System provides an alternative
5 System executes a suggestion after the operator approval
6 System allows the operator a limited time to veto before the execution
7 System executes automatically, then informs operators
8 System informs the operator only if asked
9 System informs the operator only if it decides to
10 System decides whole actions and acts autonomously, ignoring the operator

Table 2. Functions (Research Areas) of the Unmanned Aerial Vehicles


Number Research Area
1 Sensor and other information fusion
2 Communication management
3 Optimal path planning
4 Collision avoidance
5 Trajectory motion and path following
6 Target identification and threat evaluation
7 Abort decision-making/ response
8 Task scheduling

UAVs are classified mainly into two types, as is the case for any aircraft: 1) fixed wing
and 2) rotorcraft, with different theories of operation, flight dynamics, control systems,
and applications. This paper focuses on potential techniques for increasing the automation
level by using a suitable control architecture for both types. UAVs are classified in order
to establish a common terminology as a reference that enhances communication between
different parties, taking into consideration that each entity has its own categories [5].
Figure 1 below represents the main difference between fixed wing and rotary wing, while
Fig. 2 shows the basic fixed-wing aircraft components.

Fig. 1. An illustration between various types of UAVs

Fig. 2. Sketch illustrates the main components of the ordinary aircraft



This paper is organized as follows: Sect. 2 is a literature review of previous studies
related to automation enhancement in UAVs through different techniques, Sect. 3
discusses the main challenges that face automation in UAVs, Sect. 4 shows the role of
fuzzy logic in enhancing the UAV automation level, and finally Sect. 5 summarizes the
ideas discussed in the paper.

2 Literature Review

In the past, there has been extensive research by control experts and scientists on
developing powerful fault-tolerant control systems for different aircraft that can sustain
steady flight in adverse conditions. This paper's main objective is to present the main
topics of fuzzy-logic based autonomous navigation and control and to analyze the major
work in these fields. In this section, we present four papers that used artificial intelligence
approaches with different control strategies to improve the autonomous control level of
UAVs [6].
(C. Sabo et al., 2012) There are many cases in which Unmanned Aerial Vehicles
encounter obstacles while maneuvering in an environment, especially when they have
little prior understanding of the surrounding objects. To be able to move towards any
target in unknown conditions and environments in real time, an algorithm is needed that
can perform dynamic motion and path planning. This paper develops a fuzzy-logic based
approach for two-dimensional motion planning: the fuzzy system takes information about
the obstacles, gathered with the aid of sensing devices, together with the target location,
and modifies the heading angle and the speed. The performance of the fuzzy-logic
controller was evaluated by validation and testing methods such as Monte Carlo
simulation. The controller enables the UAV to follow the exact path provided by the
optimal path algorithm with a very low failure rate, which shows the potential for further
exploration of such controllers. The fuzzy-logic control method, with about a 3% failure
rate versus about 18% for a common intelligent control method called the Artificial
Potential Field (APF), illustrates the benefit of such a system: adaptability to complex
situations with minimum effort [7].
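For illustration, the sketch below shows a minimal zero-order Sugeno-style fuzzy controller of this general kind for two-dimensional obstacle avoidance. The membership functions, rule base, and speed set-points are assumptions chosen for the example and are not those of [7].

```python
def ramp_up(x, a, b):
    """Membership rising linearly from 0 at a to 1 at b."""
    return 0.0 if x <= a else 1.0 if x >= b else (x - a) / (b - a)

def ramp_down(x, a, b):
    return 1.0 - ramp_up(x, a, b)

def fuzzy_avoidance(obstacle_dist_m, obstacle_bearing_deg):
    """Return (heading change in deg, commanded speed in m/s)."""
    near  = ramp_down(obstacle_dist_m, 10.0, 50.0)
    far   = ramp_up(obstacle_dist_m, 10.0, 50.0)
    left  = ramp_down(obstacle_bearing_deg, -10.0, 10.0)   # obstacle to the left
    right = ramp_up(obstacle_bearing_deg, -10.0, 10.0)     # obstacle to the right

    # Rule firing strengths (min as the AND operator).
    r1 = min(near, left)    # near and left  -> steer right (+30 deg)
    r2 = min(near, right)   # near and right -> steer left  (-30 deg)
    r3 = far                # far            -> keep heading (0 deg)

    w = r1 + r2 + r3
    d_heading = (30.0 * r1 - 30.0 * r2) / w if w else 0.0
    speed = 5.0 * near + 15.0 * far   # slow down near obstacles (assumed set-points)
    return d_heading, speed

# Obstacle 15 m away, slightly to the left -> turn right and slow down:
print(fuzzy_avoidance(15.0, -5.0))
```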
(M. Norton et al., 2014) proposed in their study an adaptive fuzzy multi-surface sliding
control (AFMSSC) for trajectory tracking of 6-degree-of-freedom aerial vehicles with
multiple inputs and multiple outputs (MIMO). They explained that an adaptive
fuzzy-logic based function approximator can be the main tool for estimating the uncertain
aspects of the system, while the flight is controlled with an iterative multi-surface sliding
control. Using AFMSSC with MIMO autonomous flight systems can provide a type of
control that handles matched and mismatched uncertainties, internal dynamic excitation,
and disturbances of the system. The AFMSSC system assures output tracking and the
boundedness of the tracking error. They also presented simulation results to validate their
analysis [8].
(N. Ernest et al., 2016) The authors of this paper introduce ALPHA, an Artificial
Intelligence that controls Unmanned Combat Aerial Vehicles in virtual combat within a
high-precision simulation environment. They utilized fuzzy-logic based Artificial
Intelligence approaches that can be used to solve highly complex control problems; this
represents one of the most complex applications of fuzzy-logic based Artificial
Intelligence to an Unmanned Combat Aerial Vehicle control problem. The advancement
was made possible by progress in genetic fuzzy tree methodology. Higher performance,
increased computational efficiency, robustness to uncertainties, adaptability to changing
situations, verification and validation that the controller follows safety specifications and
operating instructions, and ease of design and execution are some of the strong points of
this type of control [9].
(M. Talha, 2018) This paper proposes a fuzzy logic-based position and speed control
auto-landing technique. The system controls the attitude of the UAV using velocity
information and real-time position. The proposed fuzzy-logic controller, which can be
considered a hybrid position and velocity control algorithm, provides fast and
autonomous landing. Controlling the velocity is responsible for a secure landing and
protects the UAV from hitting the ground at high speed. Position control is important to
determine the altitude and to generate commands that overcome the in-ground effect for
an easy and fast landing. This fusion of position and speed control improves efficiency
because it reduces the landing time and improves safety compared with conventional
controllers [10]. The response time during landing can be further enhanced by adopting a
lookup-table based fuzzy-logic technique, which improves the execution time compared
with a normal fuzzy technique. With the aid of the simulation environment, the proposed
landing controller was compared with a conventional PID controller, and the results
illustrated a clear refinement over the previous method. Moreover, the technique was
applied to a quadcopter to verify its capability in the real world, and the results confirm its
practical utilization in real time. The proposed controller is safer and quicker than
conventional methods and can be applied to all types of copters without significant
modification [11].

3 Autonomous Unmanned Aerial Vehicles Challenges

The autonomy of a UAV is affected by many factors, including environmental difficulty,
mission complexity, and the level of human intervention needed to achieve the mission
goals. The main objective of UAV regulations is to maintain safe operations, quantified as
a level of safety equivalent to that of manned aircraft. The level of UAV autonomy poses
certification issues such as: compliance of the human-machine interaction with current air
traffic control (ATC) instructions, handling of UAV failures, collision avoidance, and
avoidance of sensitive areas and objects. Aircraft sense-and-avoid (SAA) systems focus
on ensuring an adequate level of safety by executing self-separation and collision
avoidance [12].
Small UAVs pose a difficult regulatory challenge: thousands of them are sold yearly,
and a beginner can easily build a small UAV from parts sold locally or over the internet.
UAVs can pose a real threat to many aerial vehicles, including commercial aircraft, and to
ground facilities.

Control can also be lost between the operators and the UAV during flight. It has been
reported that UAVs have been used in many illegal activities, such as smuggling
contraband to prisoners and across borders, although no serious accident has occurred.
Regarding security and privacy, improper usage of UAVs represents a risk to people's
privacy: with their high altitude and camera recording capabilities, they can be used in
many ways to make video recordings of people's property, so this aspect has not yet
reached the required level of minimal risk. These challenges have been addressed by
governmental bodies with multiple strategies; the ownership and operation of UAVs are
regulated, and law enforcement agencies ensure that the rules are applied, supported by
technological tools that range from signal jamming to capturing or attacking the UAV to
bring it down [13].

3.1 Artificial Intelligence


AI applications in UAVs can be divided into two main areas:
• First Area: Imitation of the human pilot's behavior by the machine in different
scenarios, to maintain the controllability and stability of the aircraft's systems and
ensure that the aircraft achieves its objectives; common examples are object detection
and obstacle avoidance. Currently, many flying aids support the human pilot in many
tasks, and there is ongoing work to introduce UAVs into civil airspace. The current
operations of UAVs depend heavily on human pilots; Artificial Intelligence methods
are powerful tools to increase the level of independence and can contribute to smarter
UAVs that could operate without even the need for human supervision [14]. There are
many challenges to be explored in this area.
• Second Area: Image/data evaluation; many technologies are involved here, such as
automatic image recognition, a common example being nametag assignment for
captured photographs. Current technologies for examining media files, especially
videos and photos, are usually expensive and time-consuming; using image
recognition with a suitable applied AI approach would let the AI simplify the
identification process for the person in the loop for further assessment and decision
making [15].
AI and software systems are required to gather the sensor information and make
proper decisions. To make sure that these systems achieve their objectives in highly
uncertain situations, extensive simulations are performed to assure the behavior of these
autonomous systems. These technologies should also be designed and tested to verify
their level of safety, and AI systems should be trained with simulation models of different
dynamic environments, airspace conditions, and adverse weather [16]. By using different
tailored simulations, the development of optimized, safe, and reliable systems can be

faster for testing and certification than using physical models. The potential of AI
applications in UAVs is huge: with just a few flying hours connected to the internet, a
UAV can use a camera to take pictures, process them, and increase its intelligence level
[17]. This makes it able to refine its software to better achieve the assigned objectives. In
tests, AI-powered UAVs have proven able to explore various environments without
intervention using just trial and error. UAVs can use the data gathered from their sensors
and analyze obstacles to help plan their way; 'fuzzy logic' is one of the main Artificial
Intelligence approaches applied for this purpose. UAVs are used in many industry sectors
(i.e., mining, telecommunications, insurance, media, security, transport, and infrastructure)
[18], as shown in Fig. 3 below.

[Fig. 3 chart data, usage rate by sector: Infrastructure 35.5, Agriculture 25.5, Transport 10.2, Security 8.2, Media 6.9, Insurance 5.3, Telecom 5.0, Mining 3.4]

Fig. 3. UAV usage rate by various industry sectors (Commercial Drone Professional, 2018)

4 Fuzzy-Logic Role in Increasing UAV Autonomy Level

Fuzzy logic is a form of many-valued logic in which truth values can be any real number
between zero and one. Its main function is to represent partial truth, where the truth value
can lie anywhere between entirely true and entirely false. The main target of this section is
to highlight four previous studies that discussed autonomy improvements in UAVs
through fuzzy logic to enhance their control systems [19]. In Table 3, a comparative study
of the four papers is made to clarify the different fuzzy-logic techniques used for solving
various UAV autonomy challenges.
As future work, a MATLAB/Simulink-based UAV simulation system can be used for a
comparative study of the four previously discussed papers with various parameters [20],
in order to measure the autonomy improvement achieved in each experiment; an example
simulation sketch is shown in Fig. 4.

Table 3. A comparative study of four papers on fuzzy-logic and artificial-intelligence applications in autonomous flight

(C. Sabo et al., 2012)
  Challenge: maneuvering and moving towards a target in real time under unknown environment conditions
  Technique: a fuzzy-logic based approach for two-dimensional motion planning
  Goal: an algorithm that can conduct dynamic motion and path planning

(M. Norton et al., 2014)
  Challenge: providing a type of control that handles matched and mismatched uncertainties, internal dynamic excitation, and disturbances of the system
  Technique: an adaptive fuzzy multi-surface sliding control (AFMSSC) for trajectory tracking of 6-degree-of-freedom UAVs with multiple inputs and multiple outputs (MIMO)
  Goal: an analyzed and validated AFMSSC system for output tracking and boundedness of the tracking error

(N. Ernest et al., 2016)
  Challenge: achieving higher performance, increased computational efficiency, robustness to uncertainties, and adaptability to changing situations
  Technique: ALPHA, an Artificial Intelligence that controls Unmanned Combat Aerial Vehicles in virtual combat within a high-precision simulation environment
  Goal: solving highly complex control problems using a fuzzy-logic based Artificial Intelligence approach

(M. Talha et al., 2018)
  Challenge: reducing the landing time without compromising safety
  Technique: a fuzzy-logic based position and speed control auto-landing technique
  Goal: controlling the UAV attitude system using velocity information and real-time position

Fig. 4. An example of MATLAB/Simulink usage-based UAV simulation system



5 Conclusion

This paper has discussed the current challenges in using fuzzy-logic control to increase
the level of autonomy for complex UAV control problems. The papers discussed
introduced both ordinary fuzzy-logic approaches and fuzzy-logic based machine learning
algorithms for solving such problems. For (I) general control problems: 1) the AFMSSC
system is proved to guarantee asymptotic output tracking and ultimate uniform
boundedness of the tracking error, with simulation results presented to validate the
analysis; 2) a fuzzy-logic based approach for two-dimensional motion planning provides
an algorithm that can conduct dynamic motion and path planning. For (II) more specific
applications: 3) ALPHA, an Artificial Intelligence that controls Unmanned Combat Aerial
Vehicles in virtual combat within a high-precision simulation environment, solves highly
complex control problems using a fuzzy-logic based Artificial Intelligence approach,
giving higher performance, increased computational efficiency, robustness to
uncertainties, and adaptability to changing situations; 4) a fuzzy-logic based position and
speed control auto-landing technique controls the UAV attitude system using velocity
information and real-time position, reducing the landing time without compromising
safety. This work adds to the body of evidence that this methodology is well suited to a
very wide range of control problems and can serve as a robust control system under the
adverse conditions and uncertainties of the surrounding environment.

References
1. Kumari, P., Raghunath, I.: Unmanned aerial vehicle (DRONE). Int. J. Eng. Comput. Sci. 5
(6), 16761–16764 (2016). ISSN: 2319-7242
2. Prystai, A., Ladanivkyy, B.: UAV application for geophysical research. Geofizicheskiy Zh.
39(2), 109–125 (2017)
3. Zhou, J., Zhu, H., Kim, M., Cummings, M.: The impact of different levels of autonomy and
training on operators’ drone control strategies. ACM Trans. Hum. Robot Interact. 8(4), 1–15
(2019)
4. Mokhtar, M., Matori, A., Yusof, K., Embong, A., Jamaludin, M.: Assessing UAV landslide
mapping using unmanned aerial vehicle (UAV) for landslide mapping activity. Appl. Mech.
Mater. 567, 669–674 (2014)
5. Altena, B., Goedemé, T.: Assessing UAV platform types and optical sensor specifications.
ISPRS Ann. Photogrammetry Remote Sens. Spat. Inf. Sci. 2(5), 17–24 (2014)
6. Mekki, H., Djerioui, A., Zeghlache, S., Bouguerra, A.: Robust adaptive control of coaxial
octorotor UAV using type-1 and interval type-2 fuzzy logic systems. Adv. Model. Anal. C.
73(4), 158–170 (2018)
7. Sabo, C., Cohen, K.: Fuzzy logic unmanned air vehicle motion planning. Adv. Fuzzy Syst.
2012, 1–14 (2012)
8. Norton, M., Stojcevski, A., Kouzani, A., Khoo, S.: Adaptive fuzzy multi-surface sliding
control of multiple-input and multiple-output autonomous flight systems. IET Control Theor.
Appl. 9(4), 587–597 (2015)

9. Ernest, N., Carroll, D.: Genetic fuzzy based artificial intelligence for unmanned combat
aerial vehicle control in simulated air combat missions. J. Defense Manag. 06(01), 0374–
2167 (2016)
10. Talha, M., Asghar, F., Rohan, A., Rabah, M., Kim, S.: Fuzzy logic-based robust and
autonomous safe landing for UAV quadcopter. Arab. J. Sci. Eng. 44(3), 2627–2639 (2018)
11. Kim, D., Park, G.: Uncertain rule-based fuzzy technique: nonsingleton fuzzy logic system
for corrupted time series analysis. Int. J. Fuzzy Logic Intell. Syst. 4(3), 361–365 (2004)
12. Anicho, O., Charlesworth, P., Baicher, G., Nagar, A.: Conflicts in routing and UAV
autonomy. J. Telecommun. Digit. Econ. 6(4), 96–108 (2018)
13. Zhi, Y., Fu, Z., Sun, X., Yu, J.: Security and privacy issues of UAV: a survey. Mob. Netw.
Appl. 25(1), 95–101 (2019)
14. Mawgoud, A.A., Hamed, N., Taha, M., El Deen, M., Khalifa, N.: Cyber security risks in
MENA region: threats, challenges and countermeasures. In: International Conference on
Advanced Intelligent Systems and Informatics. Springer, Cham, pp 912–921 (2020)
15. Ropero, F., Muñoz, P., R-Moreno, M.: TERRA: a path planning algorithm for cooperative
UGV–UAV exploration. Eng. Appl. Artif. Intell. 78, 260–272 (2019)
16. Zhuo, X., Koch, T., Kurz, F., Fraundorfer, F., Reinartz, P.: Automatic UAV image geo-
registration by matching UAV images to georeferenced image data. Remote Sens. 9(4), 376
(2017)
17. Ampatzidis, Y., Partel, V.: UAV-based high throughput phenotyping in citrus utilizing
multispectral imaging and artificial intelligence. Remote Sens. 11(4), 410 (2019)
18. Cho, J., Lim, G., Biobaku, T., Kim, S., Parsaei, H.: Safety and security management with
unmanned aerial vehicle (UAV) in oil and gas industry. Procedia Manuf. 3, 1343–1349
(2015)
19. Mawgoud, A.A., Ali, I.: Statistical insights and fraud techniques for telecommunications
sector in Egypt. In: International Conference on Innovative Trends in Communication and
Computer Engineering (ITCE), pp 143–150. IEEE (2020)
20. El Karadawy, A.I., Mawgoud, A.A., Rady, M.H.: An empirical analysis on load balancing
and service broker techniques using cloud analyst simulator. In: International Conference on
Innovative Trends in Communication and Computer Engineering (ITCE). Aswan, Egypt,
pp. 27–32. IEEE (2020)
A Non-destructive Testing Detection Model
for the Railway Track Cracks

Kamel H. Rahouma(&), Samaa A. Mohammad,


and Nagwa S. Abdel Hameed

Electrical Engineering Department, Faculty of Engineering,


Minia University, Minia, Egypt
kamel_rahouma@yahoo, samaa_ahmed.pg@eng.s-mu.edu.eg,
nagwa_minia@yahoo.com

Abstract. The aim of this paper is to detect cracks that occur in the non-electrified track
areas of the Egyptian National Railways (ENR) with respect to the braking distance. The
paper depends on an electrical model that uses Non-Destructive Long-Range Ultrasonic
Testing (NDT-LRUT). The model is validated using MATLAB/SIMULINK software.
Comparing the results of this model with the related work in [1], our model detects cracks
at a distance of 1192 m, which is lower than the 1722 m detected in [1], but it is more
flexible: it can easily detect larger distances by changing the frequency of the ultrasonic
source, and it is simpler than [1] in analysis, calculation, and application. As a weakness
that needs more work, the model needs further development on how exactly to connect to
the track circuit block.

Keywords: Egypt railway · Railway crack detection · Non-electrified railway track · Non-Destructive Testing (NDT) · Long-Range Ultrasonic Testing (LRUT)

1 Introduction

The danger of railway track cracks lies in the possibility of three effects: a deviation of the
train from its normal route, a complete turnover of the train, especially when it is at its
highest speed, or a collision between the train and another train or anything around it [2].
The three possibilities may also occur at the same time, which has a very bad effect on the
lives of the people on the train and leads to great material losses to the country in terms of
trains and installations, as well as the costs of restoration and maintenance. In addition,
wheel wear and track misalignment are among the causes of the continuous growth of rail
fatigue, which leads to cracks or fractures in the tracks [3]. Although there are many
reasons that lead to cracks in railway tracks, they can be summarized as one basic cause:
an incompatibility between the maximum load a track can carry and the actual load placed
on it. When the actual load is larger than the design load, fractures or cracks appear in the
railway tracks. For example, goods trains require special tracks that can carry more weight
than those used


for passenger trains, so using normal tracks for goods-train traffic can cause cracks.
Another example is when passenger trains carry twice the intended number of passengers,
which increases the weight of the train and places a load on the tracks that they may not be
able to bear, so they fracture. Also, when there is a significant temperature difference, the
rail steel expands and contracts and becomes liable to fracture or crack under the slightest
load [2]. Many accidents occur due to cracks. In 2017, in India, some cracks on the rail
tracks caused a deviation of a train near Mahuba Station in Uttar Pradesh state. The
accident caused both physical and human losses: it destroyed about 400 m of rail and
injured about 52 passengers [4]. The purpose of this paper is to develop an electrical
model to detect crack occurrence on the railway track with respect to the braking distance
required to stop the train according to its speed, guaranteeing a safe stop.

2 Related Work

There are many trends in detecting railway cracks. In [5], Sharma, Kumawat, Maheshwari
and Jain conducted a survey of the various methods used to detect obstacles along railway
tracks. They showed that LRUT, image processing (video), ground-penetrating radar
(GPR), light-emitting diode (LED) light with a photoresistor (LDR) assembly, and fuzzy
gates are methods for detecting broken or cracked tracks. In [6], Acinash and Aruna
specified fixed points for the LED-LDR assembly to detect broken railroad tracks, but did
not report the detected distance. In [7], Campos, Gharaibeh, Mudge and Kappatos used a
long-range ultrasonic testing (LRUT) technique that examines the head, web, and foot of a
rail to detect any cracks or manufacturing defects. In [8], Maneesha, Sameer, Jay and John
provided a scanning technique that could detect missing track sections using image
processing methods. From this we concluded that the ultrasonic testing technique can be
relied upon to detect cracks in railway tracks, as the test gives a detailed picture of the
track's condition, so we decided to use Long-Range Ultrasonic Testing (LRUT).
In [1], Rahouma, Mohammad and Abdel Hameed provided an electrical model to
detect crack occurrence and crack distance along the railway track. They consider the
railway track as a transmission line that can be simplified into an RL circuit, and by
analyzing the circuit as a two-port network the crack occurrence is detected. They could
detect a crack at a distance of 1722 m along the track. The advantage of this method is that
it detects the crack distance accurately, but its disadvantage is that its measurements are
considered complex.

3 Methodology

The equivalent circuit of the railway track can be represented as shown in Fig. 1, which
shows two cascaded sections of the track circuit [1]. A sound signal is used as the input
and the output response is studied to detect whether there is a crack or not.

Fig. 1. Equivalent circuit of two section cascaded track circuits

3.1 Non-destructive Testing


Non-destructive testing (NDT) is a technique used to evaluate both the integrity and the
properties of a material without causing any damage to it. There are various NDT
techniques; Acoustic Emission Testing (AE) and Electromagnetic Testing (ET) are among
the most popular methods [9], and Ultrasonic Testing (UT) and Visual Testing (VT) are
also important means of NDT [10]. NDT, especially long-range ultrasonic testing
(LRUT), is chosen to test and detect cracks along the railway track because it does not
cause any damage to the track itself. It describes main parameters of the track such as
resistance, connectivity, temperature, and pressure, so it can give an image of the
continuous status of the track, making it easy to detect any crack. The advantage of using
LRUT is that the test can be performed by fixed transducer points placed at the edges of
the track, which guarantees that the whole track is examined, as shown in Fig. 2; it can
also be performed using portable devices [11].

Fig. 2. Fixed points - transducers along the track within the test

3.2 Long Range Ultrasonic Testing (LRUT)


LRUT is usually used to examine pipelines and water and sewage pipes; in pipelines,
LRUT can examine pipes up to about 180 m long [12]. LRUT can be performed by a ring
of transducers fitted around the track. These transducers transmit and receive
high-frequency ultrasonic waves along the track, and the crack can be detected from the
returning echoes.

3.3 Physical Modeling of LRUT for Crack Detection in Railway Track

• Equivalent Circuit of Track Circuit


Figure 3 shows the equivalent circuit of a single-section track circuit block [1]. For LRUT,
the source can be represented by a sinusoidal waveform. Equation (1) gives the track
impedance.

Fig. 3. Equivalent circuit for track circuit block

$Z = R + j\omega L$   (1)

• Mathematical Modeling of LRUT Track Circuit


The sound speed (velocity) can be calculated from Eqs. (2) and (3); the symbols are listed
in Table 1.

$C = f \lambda$   (2)

Table 1. Track parameters and units

Symbol    Identification                                       Unit
Z         Rail impedance                                       ohm (Ω)
R         Rail resistance                                      ohm (Ω)
L         Rail inductance                                      henry (H)
C         Sound speed in steel = 5960                          m/s
f         Frequency of the sound signal (here f = 3 MHz)       cycles/s = Hz
T         Periodic time of the sound signal                    s
λ         Wavelength of the sound signal                       m
d         Distance travelled by the sound wave                 m
t         Time consumed by the travelling sound wave           s
d_Crack   Crack distance                                       m

$C = \dfrac{d}{t}$   (3)

$f = \dfrac{1}{T}$   (4)

The difference between Eq. (2) and Eq. (3) is that Eq. (2) calculates the speed (velocity)
from a single sinusoidal wave, so λ represents the wavelength of that wave, as shown in
Fig. 4. Equation (3) calculates the speed (velocity) of the whole sinusoidal waveform in
the steel, so d represents the distance the wave travels from the source to the crack point,
as shown in Fig. 6; in other words, d covers the whole path of the travelled and reflected
wave.

Fig. 4. The wavelength of sound signal

This means that the wavelength (λ) is the fixed wavelength of a single sinusoidal signal,
and the distance (d) is a number of wavelengths at a specific point or time of the travelling
sound wave. For the example in Fig. 5, it is easy to see a distance d = 2λ.

Fig. 5. The distance of wave at specific point



• Mechanism of working for LRUT


When no crack is detected: from Eq. (2), the wavelength of the signal can be calculated as
in Eq. (5):

$\lambda = \dfrac{C}{f}$   (5)

$\lambda = \dfrac{5960}{3 \times 10^{6}} = 1.98 \times 10^{-3}\ \mathrm{m} \approx 2\ \mathrm{mm}$   (6)

$T = \dfrac{1}{f} = \dfrac{1}{3 \times 10^{6}} = 0.33\ \mu\mathrm{s}$   (7)

The value λ = 2 mm in Eq. (6) is the fixed minimum distance between two maxima or two
minima of the sinusoidal signal, with a fixed period equal to 0.33 µs.
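The values in Eqs. (6) and (7) can be reproduced with a trivial computation, shown here mainly to fix the units:

```python
C = 5960.0   # sound speed in steel, m/s
f = 3e6      # source frequency, Hz

wavelength = C / f    # Eq. (5)/(6): ~1.99e-3 m, i.e. about 2 mm
period = 1.0 / f      # Eq. (7): ~3.3e-7 s, i.e. about 0.33 microseconds
print(wavelength, period)
```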
When crack is detected:

Fig. 6. Mechanism of working for LRUT

Figure 6 shows the working mechanism of LRUT. The travelling wave moves from point
to point to the right until it runs into a crack, shown in the figure by the X point, and then
it is reflected in the opposite direction toward the source, as shown by the motion of the
reflected wave. The travelling wave moves, completing a total wavelength λ in each
period T, until it runs into the crack at the X point, where it starts to be reflected in the
opposite direction. By the time the reflected wave reaches the source, the total elapsed
time is twice the time at which the crack was encountered, so when the crack distance is
calculated, the measured time is divided by two. For example, suppose a total time of
0.4 s is measured between transmitting the signal and receiving its reflection from the
crack:

time consumed by the reflected (one-way) wave $= \dfrac{t}{2}$   (8)

Using Eq. (3):

$d = C \cdot \dfrac{t}{2}$   (9)

$d = 5960 \times \dfrac{0.4}{2} = 1192\ \mathrm{m} = d_{Crack}$   (10)

Figure 7 shows the relationship between the signal wavelength and the crack distance of
the reflected signal. It shows that the crack distance can be expressed as a specific number
of wavelengths and is determined by the time consumed by the reflected wave to get back
to the source.

Fig. 7. Relationship between signal wavelength and crack distance

4 Proposed System

Figure 8 shows the proposed system. It depends on connecting an acoustic source to the
track, within the frequency range 1–20 MHz, to produce an ultrasonic waveform. A test is
first performed at the steady-state point, when no crack has occurred; in other words, the
standard measurements are obtained when the track is well designed and has no problems.
It can then be concluded that a crack has occurred when a change appears relative to the
steady-state variables. When no crack is detected, the wavelength λ and period T of the
travelling wave are measured from Eqs. (2), (4) and (5). When a crack is detected, the
crack distance d_Crack and the consumed time t of the reflected wave are obtained from
Eqs. (3), (8) and (9). Once the crack distance is determined, the action to stop the train can
be taken; a sketch of this decision flow is given below.
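The following is a minimal sketch of this decision step, combining Eqs. (8) and (9) with the braking-distance check used in Sect. 5 (766 m for a 120 km/h train [1]). The function names and interface are assumptions for illustration only.

```python
STEEL_SOUND_SPEED = 5960.0       # m/s
BRAKING_DISTANCE_120KMH = 766.0  # m, braking distance for a 120 km/h train [1]

def crack_distance(round_trip_time_s, c=STEEL_SOUND_SPEED):
    """Eqs. (8)-(9): the echo travels out and back, so divide the measured time by two."""
    return c * round_trip_time_s / 2.0

def safe_to_stop(round_trip_time_s, braking_distance=BRAKING_DISTANCE_120KMH):
    """True when the detected crack lies beyond the braking distance."""
    return crack_distance(round_trip_time_s) > braking_distance

# With the 0.4 s echo of Eq. (10): 1192 m crack, which exceeds 766 m.
print(crack_distance(0.4), safe_to_stop(0.4))   # 1192.0 True
```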

Fig. 8. The Model Flowchart

5 Results
5.1 Results for Model
Figure 9 shows the output response of NDT-LRUT for the railway track. Figure 9(a)
shows the output when no crack is detected, with a periodic time equal to 0.2 s.
Figure 9(b) shows the output when a crack is detected, with a periodic time equal to 0.4 s.
By applying Eq. (9), the crack distance can be calculated.

Fig. 9. Output response (a) No crack detected (b) Crack detected

$d = C \cdot \dfrac{t}{2} = 5960 \times \dfrac{0.4}{2} = 1192\ \mathrm{m} = d_{Crack}$   (11)

Fig. 10. Displacement between the travelling and reflected wave

Figure 10 shows the displacement that occurs between the travelling and the reflected
wave when a crack is detected. The crack distance is larger than the braking distance
required to stop a 120 km/h train, which equals 766 m [1], so the train can be stopped
safely. This result means that the ultrasonic technique used in this paper successfully
detects a crack at a distance of 1192 m using a sinusoidal frequency of 3 MHz, which is at
the low end of the ultrasonic frequency spectrum. This implies that the higher the
ultrasonic signal frequency, the longer the distance at which cracks can be detected.

5.2 Discussion and Comparison of the Results


The use of LRUT is very helpful for detecting railway track cracks. Table 2 shows a
comparison between the two crack-detection models.

Table 2. Comparison between our model and the model in [1]

Methodology:
  Transmission-Line Model [1]: transmission-line theory; two-port network analysis
  NDT-LRUT Model: acoustic analysis
Relation with signaling system:
  Transmission-Line Model [1]: depends mainly on the signaling system, especially on the track circuit block
  NDT-LRUT Model: does not depend on the signaling system
Validation method:
  Transmission-Line Model [1]: MATLAB/SIMULINK
  NDT-LRUT Model: MATLAB/SIMULINK
Crack distance:
  Transmission-Line Model [1]: up to 1722 m
  NDT-LRUT Model: up to 1192 m
Strengths:
  Transmission-Line Model [1]: accurate mathematical analysis
  NDT-LRUT Model: cost effective; simplified analysis; more flexible in detecting larger distances; can be applied at fixed points or using portable LRUT devices; covers the non-electrified areas of tracks
Limitations:
  Transmission-Line Model [1]: complex mathematical analysis; needs re-establishment of the signaling system; expensive; can be applied only at fixed points
  NDT-LRUT Model: needs further development to design a transducer suitable for railway applications; assumes the track is continuous, without joint or welding points

6 Conclusion and Future Work

The paper provides an electrical model based on the technology of non-destructive
long-range ultrasonic testing (NDT-LRUT) to detect crack occurrence along the track.
The model is built to solve the problem of crack detection on non-electrified tracks that
are not connected directly to the electrical signaling system of the Egyptian Railways. The
model uses LRUT over a distance reaching 1192 m, which is a very useful property of this
method, since the traditional LRUT used in pipelines can check a pipe of only about
180 m in length; the paper therefore extends LRUT to larger lengths of railway track. For
future work, it is planned to design and implement a transducer suitable for railway
applications in order to apply NDT-LRUT testing to railway tracks, and to study the effect
of joint points between track sections on the analysis.

References
1. Rahouma, K.H., Mohammad, S.A., Hameed, N.S.A.: A mathematical model for detection of
railway track cracks based on the track signalling system. Egypt. Comput. Sci. J. 44(2), 32–
50 (2020)
2. Rohan Damodar, S.: How do cracks occur in railway tracks? – Quora (2017). https://
www.quora.com/How-do-cracks-occur-in-railway-tracks. Accessed 17 Feb 2020
3. Doherty, A., Clark, S., Care, R., Dembosky, M.: Articles - why rails crack? Ingenia Online
23, 23–28 (2005)
4. The Times of India. Crack in tracks apparently led to derailment: Railways | India News -
Times of India (2017)
5. Sharma, K., Kumawat, J., Maheshwari, S., Jain, N.: Railway security system based on
wireless sensor networks: state of the art. Int. J. Comput. Appl. 96(25), 32–35 (2014)
6. Vanimireddy, A., Kumari, D.A.: Automatic broken track detection using LED-LDR
assembly. Int. J. Eng. Trends Technol. 4(July), 2842–2845 (2013)

7. Campos-Castellanos, C., Gharaibeh, Y., Mudge, P., Kappatos, V.: The application of long
range ultrasonic testing (LRUT) for examination of hard to access areas on railway tracks.
In: IET Conference Publication, vol. 2011, no. 581 (2011)
8. Singh, M., Singh, S., Jaiswal, J., Hempshall, J.: Autonomous rail track inspection using
vision based system. In: Proceedings of the 2006 IEEE International Conference on
Computational Intelligence for Homeland Security and Personal Safety, CIHSPS 2006, vol.
2006, no. October, pp. 56–59 (2006)
9. What is Non-Destructive Testing (NDT)? - Methods and Definition – TWI. https://www.twi-
global.com/technical-knowledge/faqs/what-is-non-destructive-testing. Accessed 17 Feb
2020
10. Types of Non-Destructive Testing (NDT) | Nucleom: Knowledge section. https://nucleom.
ca/en/knowledge/types-of-non-destructive-testing/. Accessed 20 Feb 2020
11. Long Range Ultrasonic Flaw Detector With A Scan, Portable Ultrasonic Flaw Inspection -
Buy Ultrasonic Inspection, Ultrasonic Nondestructive Test, Non Destructive Testing Product
on Alibaba.com. https://www.alibaba.com/product-detail/long-range-ultrasonic-flaw-
detector-with_60317227583.html?spm=a2700.7724857.normalList.17.a1e769dbBo0S44.
Accessed 17 Feb 2020
12. Long range ultrasonic testing (LRUT) | LMATS. https://lmats.com.au/services/advanced-
ndt-solutions/gw-lrut-guided-wave-long-range-ultrasonic-testing. Accessed 20 Feb 2020
Study of Advanced Power Load Management
Based on the Low-Cost Internet of Things
and Synchronous Photovoltaic Systems

Elias Turatsinze1,2,4,5, Kuo-Chi Chang1,2,4,5,6(&), Pei-Qiang Li1,4,5,


Cheng-Kuo Chang1,2,4,5, Kai-Chun Chu3,4,5, Yu-Wen Zhou1,2,4,5,
and Abdalaziz Altayeb Ibrahim Omer1,2,4,5
1
School of Information Science and Engineering, Fujian University
of Technology, Fuzhou, China
albertchangxuite@gmail.com
2
Fujian Provincial Key Laboratory of Big Data Mining and Applications,
Fujian University of Technology, Fuzhou, China
3
Institute of Environmental Engineering, National Taiwan University,
Taipei, Taiwan
4
Department of Business Management, Fujian University
of Technology, Fuzhou, China
5
Institute of Construction Engineering and Management,
National Central University, Taoyuan, Taiwan
6
College of Mechanical and Electrical Engineering, National Taipei University
of Technology, Taipei, Taiwan

Abstract. The main problem with today's power consumption is determining which load
is consuming the most power so that it can be disconnected to reduce high electricity
bills. A major concern of current research is to find ways of installing grid-connected PV
systems or other renewable resources free of harmful gases so that they add to the
electrical grid and provide enough power. This study presents how to synchronize a
photovoltaic system with the grid to save money, together with electrical load
management based on IoT technology. Current sensing is used to sense the current when
loads are connected and to send the instantaneous current value to an Arduino Uno,
which is programmed to compute the total power consumed; an ESP8266 WiFi module
then uploads the data to ThingSpeak for analysis and monitoring, where the utility
company or users can access the information about electricity usage, since many PV
systems are in remote areas. A simulation of an inverter with a PLL circuit for generating
the desired signal waveform and frequency is carried out in MATLAB/Simulink. The
objective was successfully achieved: the output voltage is a sinusoidal waveform of the
same frequency and phase angle as the grid, and a current sensor can be installed at the
output of the inverter to sense the generated power for analysis and monitoring of the
proposed PV system.

Keywords: Load management · Internet of Things · PLL · PV system · Voltage Controlled Oscillator


1 Introduction

In recent years, many developing countries have faced power outages, instability of
power systems, and power reliability issues in the load sector [1–3]. The PV solar system is among the simplest renewable sources, with a simple design and structure and a low operating cost, and it can be prioritized for grid-connected systems through proper design of the boost converters and inverters to meet the requirements of synchronism [4, 5]. IoT is one of the emerging green technologies, in which sensors, actuators, and many other devices communicate to perform a specific task without human intervention [6–8]. This study’s objective is to synchronize
the generated PV solar energy to the grid and to provide an IoT based power moni-
toring system. The PV system mainly uses solar energy to convert solar energy into
electricity using a whole set of photovoltaic panels. An important issue of a PV system
is to match the inverter output voltage/current waveform with the grid voltage wave-
form. When there is a mismatch, there will be safety concerns when the system is
connected in parallel. Inverter output signals are currently generated using PLL (Phase
Locked Loop) circuits [9–12]. The Internet of Things technology is explored in the
electrical load management system (ELMS) which will lead to smart grids to meet the
expectation of today’s power systems [13–17]. The remainder of this study covers the methodology, the internal components of the proposed IoT system, the software design for the proposed system, the grid-connected PV system, the results and their interpretation, and the conclusion and future work. The combination of grid synchronization and IoT-based load monitoring is the main contribution of this research.

2 Methodology

This study uses an IoT-based power load management system to effectively manage the power consumed and minimize electricity costs by incorporating a photovoltaic system. The IoT is used to develop an embedded system, installed at the output of a distribution transformer, based on a low-cost Arduino Uno and an ESP 8266 WiFi module. The system monitors the loads connected to the power distribution network and provides updates about power usage, and the control station manager takes action based on the recorded measurements. When peak power demand is reached, the proposed system can disconnect some of the connected loads, thereby effectively managing the power system; at that moment the disconnected loads are powered by the photovoltaic system (Fig. 1).
This study focuses on the internal unit, which includes the control part of the
system. External components are regular parts of the power system, which are: power
generation, transmission, and distribution. This proposed system is designed in a way
that all equipment can work together to achieve the required power monitoring goals.

Fig. 1. Block diagram of IoT ELMS.

3 Design the Software for This Proposed System

The embedded system runs software written in the Arduino IDE to control its operating modes. After the DC power supply unit is powered, the IoT system designed in this project is initialized. All available loads are then powered on through relays, which are controlled and monitored by the Arduino Uno. As soon as the sensor is powered and the loads draw current, the current sensor measures the current flowing through the conductor and sends the data to the Arduino Uno, which is programmed with electrical power formulas to calculate the power consumed (Fig. 2).

Fig. 2. Flowchart.

The electrical power consumed by electrical loads is calculated by measuring the


potential difference between two conductors and the current flowing through that live
conductor. Since electrical loads used here are AC loads, the power consumed was
calculated using the following formula (1):

P = V × I        (1)

where P is the power consumed, V is the root mean square (RMS) value of the supply voltage, and I is the RMS current flowing in the conductor.
The voltage here is 220 V and the current sensor provides instantaneous current
drawn by the loads. At the same time, Arduino will transmit the energy consumption
value to the ESP 8266 WiFi module, which helps us to upload these data to the
Thingspeak server for further analysis and monitoring. Once the peak power demand
time is reached, the system can disconnect the low-priority loads to avoid high elec-
trical bills by setting a threshold value. As long as the internal system is powered on,
the system will continue to update the Thingspeak server so that the user or utility grid
staff can get information about the usage of electricity. When the grid has to disconnect some loads, the photovoltaic solar system continues supplying electricity to the local loads but not to the grid, since that could cause an islanding phenomenon; an optocoupler device is used to avoid this scenario. In addition, when the photovoltaic system produces surplus energy, it can feed the grid, which reduces the monthly cost of electricity.
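The measurement-and-upload cycle described above can be sketched in a few lines of Python. The sensor-reading function, the demand threshold, the API key, and the update endpoint below are illustrative assumptions for a host-side prototype, not the authors’ firmware (which runs on the Arduino Uno and ESP 8266).

```python
# Minimal sketch of the described monitoring loop; not the authors' firmware.
# read_rms_current(), the threshold and the API key are hypothetical placeholders.
import time
import requests

V_RMS = 220.0                  # supply voltage used in the paper (V)
POWER_THRESHOLD_W = 2000.0     # illustrative peak-demand threshold
THINGSPEAK_KEY = "YOUR_WRITE_API_KEY"   # placeholder

def read_rms_current() -> float:
    """Placeholder for the current-sensor reading (A, RMS)."""
    return 4.2   # dummy value for illustration

def monitor_once() -> float:
    i_rms = read_rms_current()
    power = V_RMS * i_rms                     # Eq. (1): P = V x I
    if power > POWER_THRESHOLD_W:
        print("peak demand reached: shed low-priority loads, supply them from PV")
    try:
        # ThingSpeak-style channel update (field1 = consumed power); endpoint assumed
        requests.get("https://api.thingspeak.com/update",
                     params={"api_key": THINGSPEAK_KEY, "field1": power},
                     timeout=5)
    except requests.RequestException:
        pass   # keep monitoring even if the upload fails
    return power

if __name__ == "__main__":
    while True:
        print(f"instantaneous power: {monitor_once():.1f} W")
        time.sleep(15)   # roughly the minimum update interval of a free channel
```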

4 Grid-Connected PV System

The PV system’s interconnection with the grid requires precise control of the syn-
chronization between the inverter and the grid. For power systems connected in par-
allel, important parameters such as system voltage, phase, and frequency must be
matched. The DG unit is connected to the utility grid. In the grid-parallel scheme, the voltage value and phase shift are set by the grid at the PCC (point of common coupling), against which the DPG (distributed power generation) voltage must be synchronized. Grid measurements of the voltage amplitude, phase angle, and frequency are therefore the main inputs for stable synchronous operation. The actual connection between the public power grid and the DPG is then made for smooth interconnection. The DPG and grid voltages should be identical, but an undistorted waveform from the solar photovoltaic generation cannot be guaranteed (Fig. 3).

Fig. 3. Parallel diagram of power grid and PV system.



As shown in Fig. 4, this study focuses on the point of common coupling of two power supply systems in parallel. The PV arrays generate DC voltage and current, which the inverter converts to AC; the inverter output is then matched to the voltage and frequency at the PCC using the phase-locked loop. The whole design is implemented in MATLAB/Simulink.

Fig. 4. Design block diagram of parallel system for this study.

4.1 System Analysis and Algorithm


The AC output of the inverter must match the grid in the important parameters of voltage, frequency, phase angle, and waveform (a sinusoid) before it can be synchronized to the grid, because this distributed generation has to feed the grid whenever the generated power exceeds the user’s demand. Synchronizing two or more power sources is a crucial step in interconnecting them, since it requires matching the voltage and frequency of the existing generating unit: no power can be delivered to the grid unless both operate at the same frequency and voltage. A synchronizing relay is useful for eliminating human response time in the process, or where no operators are present, such as at remotely controlled power plants. Lamps or a synchroscope are sometimes added to automatic relays for manual use or for monitoring generator sets. In this study, however, a PLL is used to synchronize the photovoltaic solar energy with the power grid. The PLL is an important building block of modern electronic technology; it receives the reference signal of the required input waveform and mainly consists of a phase detector, an LPF (low-pass filter), and a VCO (voltage-controlled oscillator) (Fig. 5).

Fig. 5. PLL system block diagram.

An input voltage (Vi) with frequency fi is applied to the phase detector. The phase detector compares fi with the VCO output frequency fo and produces an error voltage Ver containing components at (fi + fo) and (fi − fo). The LPF removes the high-frequency (fi + fo) component and noise, leaving a stable DC voltage Vf proportional to (fi − fo). This DC voltage drives the VCO, whose output frequency fo varies with the input DC voltage. The feedback loop keeps comparing and adjusting fo until it is locked to fi. In this manner, the PLL passes through the following stages: free running, capture, and phase lock. When the system is free running, no input signal is applied. When an input frequency is present, the VCO begins to change and produces an output frequency for comparison; this phase is termed the capture phase. The last phase occurs when fo has been adjusted to equal fi, at which point the frequency comparison stops and the loop is locked.
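As a rough illustration of these three stages, the following discrete-time Python sketch implements a multiplier phase detector, a PI loop filter acting as the LPF, and a phase-accumulating VCO. The sample rate, loop gains, and frequencies are illustrative assumptions only; the paper’s PLL is built from Simulink blocks, not from this code.

```python
# Toy software PLL: multiplier phase detector -> PI loop filter (LPF) -> VCO.
# All numeric values are illustrative assumptions, not the Simulink design.
import numpy as np

fs = 10_000.0            # sampling rate (Hz)
f_in = 50.0              # grid frequency to be tracked (Hz)
f_free = 48.0            # VCO free-running frequency (Hz)
kp, ki = 40.0, 1500.0    # PI gains chosen by trial for this toy example

t = np.arange(0.0, 2.0, 1.0 / fs)
v_in = np.sin(2 * np.pi * f_in * t)      # reference input signal

theta = 0.0              # VCO phase
integ = 0.0              # integrator state of the loop filter
f_est_log = np.empty_like(t)

for n, v in enumerate(v_in):
    v_vco = np.cos(theta)                # VCO output, 90 deg from the locked input
    err = v * v_vco                      # phase detector: (fi-fo) term plus (fi+fo) ripple
    integ += ki * err / fs               # integral path (the LPF smooths the ripple)
    f_est = f_free + kp * err + integ    # instantaneous frequency command
    theta += 2 * np.pi * f_est / fs      # VCO phase accumulation
    f_est_log[n] = f_est

# After the capture phase the estimate settles near the input frequency (phase lock).
print(f"mean estimated frequency over last 0.5 s: {f_est_log[-int(0.5 * fs):].mean():.2f} Hz")
```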
After the PV system generates electrical power, the generated values need to be recorded and uploaded to Thingspeak for monitoring the PV power generation; since the total load power consumed is also recorded, this technology gives a clear picture of both the generation and the usage of the solar energy. Furthermore, the utility company can use this information to know how much power is fed to the grid, since many PV systems are in remote areas.

4.2 Inverter Simulation Using MATLAB/Simulink


MATLAB/Simulink was used to draw the circuit diagram of the inverter and observe the behavior of its input and output electrical parameters. The goal of this simulation was to make sure that the inverter output is a sinusoidal waveform matching that of the grid at the same frequency, to ensure proper synchronization. The output voltage of the three-phase inverter should match the operating voltage of the grid (Fig. 6).

Fig. 6. Snapshot of the block diagram of an inverter in MATLAB/Simulink.

5 Results and Interpretation

In this study, the photovoltaic system is synchronized with the grid to supply AC loads, especially during high peak demand, in order to reduce electricity bills, while IoT technology records the instantaneous power consumed and the solar energy generated and uploads the data to Thingspeak for further analysis. A simulation was run to check the voltage waveform characteristics of the three-phase inverter output. As the MATLAB results show, sinusoidal output voltage waveforms were successfully obtained with ratings that match the utility grid, at a frequency of 50 Hz (Fig. 7).

Fig. 7. A snapshot of scopes from MATLAB/Simulink showing the simulation results.



In the simulation, the grid system is first simulated to see the waveform of the
voltage and current (Fig. 8).

Fig. 8. The results of simulation voltage and current waveforms for this study grid.

The DG unit was also simulated, as it is the main concern of this paper. The AC voltage and current from the inverter were filtered using RLC circuits to reduce harmonics that could distort the system. A simulation was carried out at this stage to verify that the design matches the PV system to the power grid. Figure 9 shows the inverter output; comparing it carefully with the voltage/current waveforms of the grid system in Fig. 8 shows that the two are almost identical.

Fig. 9. Simulation results of inverter output voltage/current in this study.

When the grid voltage and the inverter output voltage are viewed on the same scope, the two waveforms cannot be told apart, as shown in Fig. 10. They overlap each other, which shows that each line voltage of the inverter is in phase with its corresponding grid line voltage, at the same frequency and amplitude.

Fig. 10. The snapshot of the overlapped waveforms of the grid voltage and output inverter
voltage.

From the simulation and design carried out, the DG equipment can be directly connected to the power grid. At this stage, electric power is generated, and what remains is to monitor and control this generated power and to monitor the energy consumed by the user. A current sensor has to be installed at the output of the inverter to sense the current flowing through it and send the value to the controller, which calculates the power and uploads the data to the Thingspeak server for monitoring and analysis.

6 Conclusion and Future Work

Synchronization is a crucial issue in a grid-connected PV system, and electrical load management is another aspect to be considered in order to reduce electricity bills and monitor loads from anywhere using the internet and a smartphone. Overall, this paper designs a grid-connected PV system with electrical load management based on IoT technology, in which the power consumed is recorded and uploaded to the internet so that every concerned party can access the data for monitoring and analysis. Simulation results showed that the proposed system fulfills all the requirements of synchronism, and the design objectives were successfully achieved. Once the PV system and the grid can be synchronized in parallel, real-time monitoring and sharing of the distributed system load is the main function we expect; furthermore, the power generated by the PV system and the power consumed can be recorded and uploaded to the server. Further research should be conducted to find the best IoT communication technology for monitoring the generated power and the energy consumed by electrical loads.

References
1. Sakhare, R.V., Deshmukh, B.T.: On electric power management using zigbee wireless
sensor network. Int. J. Adv. Eng. Technol. 4(1), 492–500 (2012)
2. Ghaderi, D., Maroti, P.K., Sanjeevikumar, P., Holm-Nielsen, J.B., Hossain, E., Nayyar, A.:
A modified step-up converter with small signal analysis-based controller for renewable
resource applications. Appl. Sci. 10(1), 102 (2020)
3. Solanki, A., Nayyar, A.: Green internet of things (G-IoT): ICT technologies, principles,
applications, projects, and challenges. In: Kaur, G., Tomar, P. (eds.) Handbook of Research
on Big Data and the IoT, pp. 379–405. IGI Global, Hershey (2019)
4. Khan, M.W., et al.: Synchronization of photo-voltaic system with a grid. J. Electr. Electron.
Eng. (IOSR-JEEE) 7(4), 01–05 (2013)
5. Yang, Y., Blaabjerg, F.: Synchronization in single-phase grid-connected photovoltaic
systems under grid faults. In: 2012 3rd IEEE International Symposium on Power Electronics
for Distributed Generation Systems (PEDG). IEEE (2012)
6. Srinivasan, M., Ravikumar, R.: Synchronization and smooth connection of solar
photovoltaic generation to utility grid. Int. J. Electr. Eng. 9(1), 51–56 (2016)
7. Munde, S.S., et al.: Automatic load management of electric power by use of zigbee
technology. Int. J. Electr. Electron. Res. 3(2), 106–110 (2015)
8. Leonardo Energy: Importance of load management, January 2009. http://www.leonardo-
energy.org
9. Dike, D.O., Ogu, R.E., Uzoechi, L.O., Ezenugu, I.A.: Development of an internet of things
based electricity load management system. Am. J. Eng. Res. (AJER) 5(8), 199–205 (2016)
10. IEA-PVPS, Cumulative Installed PV Power, October 2005. http://www.iea-pvps.org
11. Shahidehpour, M., Schwartz, F.: Don’t let the sun go down on PV. IEEE Power Energy
Mag. 2(3), 40–48 (2004)
12. Blaabjerg, F., Chen, Z., Kjaer, S.: Power electronics as efficient interface in dispersed power
generation systems. IEEE Trans. Power Electron. 19(5), 1184–1194 (2004)
13. Blaabjerg, F., Teodorescu, R., Liserre, M., Timbus, A.V.: Overview of control and grid
synchronization for distributed power generation systems. IEEE Trans. Ind. Electron. 53,
1398–1409 (2006)
14. Chang, K.-C., Chu, K.-C., Wang, H.-C., Lin, Y.-C., Pan, J.-S.: Agent-based middleware
framework using distributed CPS for improving resource utilization in the smart city. Future
Gener. Comput. Syst. 108, 445–453 (2020). https://doi.org/10.1016/j.future.2020.03.006
15. Chu, K.-C., Horng, D.-C., Chang, K.-C.: Numerical optimization of the energy consumption
for wireless sensor networks based on an improved ant colony algorithm. IEEE Access 7,
105562–105571 (2019)
16. Chang, K.-C., Chu, K.-C., Wang, H.-C., Lin, Y.-C., Pan, J.-S.: Energy saving technology of
5G base station based on internet of things collaborative control. IEEE Access 8, 32935–
32946 (2020)
17. Sheril, A.A., Babu, M.R.: Synchronization control of grid-connected photovoltaic system.
Middle East J. Sci. Res. 25(4), 864–870 (2017)
Power Grid Critical State Search Based
on Improved Particle Swarm Optimization

Jie Luo1,2(&), Hui-Qiong Deng1,2(&), Qin-Bin Li2(&),
Rong-Jin Zheng2(&), Pei-Qiang Li2(&), and Kuo-Chi Chang3,4(&)

1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou 350108, China
{550461238,1123233466}@qq.com
2 Fujian Provincial University Engineering Research Center of Smart Grid Simulation Analysis and Integrated Control, Fuzhou 350108, China
{550461238,1123233466,394725843,626424482,596902510}@qq.com
3 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
2507416643@qq.com
4 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan

Abstract. This article proposes a method for finding the closest critical initial running state of a power grid, aimed at the possible chain-reaction failures caused by branch faults in power systems. First, assuming an initial failure in the power grid and considering the specific behavior of relay protection, the critical state of a branch with respect to cascading trips is analyzed in detail, and the electrical distance between the current running state and the cascading fault, expressed in terms of nodal injection power, is given. Second, on the basis of the relationship between nodal injection power and system safety, a safety-level model under which the system will not trigger cascading failures is established. Finally, an improved simplified mean particle swarm optimization algorithm is used to solve the model, and an example on the IEEE 14-node system is analyzed. The example shows the effectiveness and feasibility of the proposed method, which provides a reference for further prevention of cascading trip accidents.

Keywords: Power system · Cascade tripping · Critical state · Simplified mean · Particle swarm optimization (PSO)

1 Introduction

Minor power system failures rarely cause major power outages. Major power outages
are often caused by cascading failures that lead to escalating events. Although the
possibility of cascading failure is very small, the loss caused by it cannot be estimated.
Therefore, in the process of system reliability evaluation, the possibility of low-
probability events must be evaluated [1]. The cascading failure will lead to the

continuous escalation of the system failure, which may cause the overload of some
branches or equipment. In this case, if operators do not have a clear understanding of the actual running state of the system, a serious power failure can easily result. Thus,
cascading failure analysis and preventive control have very important practical sig-
nificance; this topic has also become a hot spot of power system research [2].
Literature [3, 4] mainly studies the influence of nodal injection power on whether or not cascading tripping occurs in the power grid after the initial failure, and analyzes the electrical relationship between the nodal injection power and the branch current. Literature [5,
6] proposes an index and a calculation method for measuring the safety margin of the
power grid considering the cascading trip, and an optimization model for preventing
the cascading trip is established based on the safety margin.
Aiming at the cascading fault caused by the branch fault, this paper firstly analyzes
the running state of the power grid, and proposes an index to measure the security of
the power grid based on the nodal injection power. Then, taking as the objective function the shortest power distance between the current operating state and a state that triggers cascading trips, an optimization model for computing this objective is
established. Finally, the improved simplified mean particle swarm optimization is used
to solve the model. The simulation results verify the effectiveness and feasibility of the
method.

2 Mathematical Expression of Cascading Trip

In this article, only the current-type protection is considered. Supposing an initial


failure occurs in a branch of the power grid, then after the faulted branch is removed and the power flow is redistributed, whether a cascading trip will occur on each remaining branch can be judged using Eq. (1) [3]:
   
Ip.dst = |Ip.s| − |Ip|        (1)

Equation (1) is expressed in terms of the currents after the power flow is redistributed: Ip represents the current of branch Lp, Ip.s represents the current-protection setting value of branch Lp, and Ip.dst is the electrical distance between Ip.s and Ip.
According to the theory of cascading trips, when Ip.dst > 0, branch Lp remains in the running state; when Ip.dst ≤ 0, branch Lp is removed by the protection device, that is, a cascading trip occurs in Lp. Ip.dst = 0 is the exceptional case in which branch Lp is exactly at the critical state of triggering a chain failure.
Assuming that after the initial failure is removed, the number of remaining branches
is n, the remaining branches are analyzed according to formula (1), and the diagonal
matrix of the following formula (2) can be given:
 
K = diag(I1.dst, …, Ip.dst, …, In.dst)        (2)

3 Critical State of Chain Failure

It is assumed that at a certain moment the initial failure branch is cut off. After the power flow of the grid is redistributed, if all the remaining branches satisfy formula (3), the power grid is exactly at the critical state of triggering a cascading trip:

|K| = 0,  Ip.dst ≥ 0,  p = 1, 2, …, n        (3)
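As a small numerical illustration of Eqs. (1)–(3), the sketch below builds the Ip.dst values and the diagonal matrix K for a set of remaining branches and classifies the state. The branch currents and protection settings used here are invented values, not data from the IEEE 14-node case.

```python
# Illustrative check of the cascading-trip / critical-state conditions (Eqs. (1)-(3)).
# Branch currents and protection settings below are invented for demonstration only.
import numpy as np

I_set = np.array([3.0, 3.0, 3.0, 3.0])      # protection setting of each remaining branch (kA)
I_branch = np.array([2.1, 2.7, 3.0, 1.5])   # post-redistribution branch currents (kA)

I_dst = np.abs(I_set) - np.abs(I_branch)    # Eq. (1) for every remaining branch
K = np.diag(I_dst)                          # Eq. (2)

if np.any(I_dst < 0):
    print("at least one branch exceeds its setting: cascading trip occurs")
elif np.isclose(np.linalg.det(K), 0.0) and np.all(I_dst >= 0):
    print("critical state: some branch sits exactly on its protection threshold (Eq. (3))")
else:
    print("normal state: every branch keeps a positive margin")

print("I_dst =", I_dst)
```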

Referring to the literature [4], after the initially faulted branch Li exits operation, the power flow equation of the power grid is as follows:

Y U̇ = (S̃ / U̇)*        (4)

In Eq. (4), Y and U̇ are, respectively, the node admittance matrix and the node voltage vector after branch Li exits operation. Because Y is a matrix with fixed values, U̇ in Eq. (4) depends mainly on the nodal injection power S̃. Ip in Eq. (1) depends mainly on the node voltages, and therefore Ip ultimately depends on the nodal injection power. To sum up, for a given initial failure, the critical running state of the power grid after the initial failure is removed depends mainly on the nodal injection power.

4 Safety Index of System

This article will use nodal injection power to describe the running state of the power
grid. Depending on the initial running state in which the grid operates when the initial failure occurs, some states lead to cascading trips while others allow the grid to keep operating normally. The boundary between these two kinds of states is called the critical initial running state of the power grid.
There may be many critical initial operating states in the actual power grid. In these
critical states, we should focus on the critical initial running state closest to the current
initial running state of the power grid. If the power grid is very close to a critical state, a slight change in the nodal injection power is likely to push it into that state. Let S′ be the nodal injection power vector in the current initial running state s0 of the power grid, and let S be the nodal injection power in a critical initial running state s1; then the distance between S′ and S can be expressed as:

D(S) = ‖S′ − S‖        (5)

In Eq. (5), D(S) is the norm of the difference between S′ and S. The value of D can be used as an index of system safety. Therefore, the closest critical initial running state should satisfy Eq. (6).

F = min D(S)        (6)

According to power system theory, before the initial fault occurs, the power grid should also satisfy the power flow equation for steady-state operation in state s1, abbreviated as Eq. (7):

R0(x) = 0        (7)

Here, R0 is the mapping corresponding to the power flow before the initial failure, and x is the state variable.
Before the fault occurs, normal operation of the power grid should also satisfy inequality constraints, abbreviated in the following form:

J0(x) ≤ 0        (8)

When the initial fault occurs, the power grid running in state s1 should satisfy the corresponding steady-state power flow equation, abbreviated as Eq. (9):

Rk(x) = 0        (9)

Combined with the above analysis, a model for finding the closest critical initial
running state can be established. The variable to be optimized is the nodal injection
power under the closest critical initial running state; the model is expressed as Eq. (10):
min D(S) = ‖S′ − S‖
s.t.  R0(x) = 0
      Rk(x) = 0
      J0(x) ≤ 0
      |K| = 0
      Ip.dst ≥ 0,  p = 1, 2, …, n        (10)

The constraint |K| = 0 in formula (10) is written as f(x) = 0, and the constraint Ip.dst ≥ 0 is written as J1(x) ≥ 0. To transform the constrained problem into an unconstrained one, this paper uses the penalized objective function shown below:

D′ = D + Σ_{k=1}^{N1} (1/δk)·[min(0, −J0(x))]² + Σ_{k=1}^{N1} (1/ζk)·[min(0, J1(x))]² + (1/μ)·[f(x)]²        (11)

Here δk, ζk, and μ are penalty factors; δk is set to 0.5, ζk to 0.5, and μ to 0.8.
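A compact way to see how Eq. (11) folds the constraints of Eq. (10) into a single objective is the sketch below. The constraint functions are simple stand-ins, since the real R0, Rk, J0, J1, and f come from the power flow model; only the penalty structure and the factor values (0.5, 0.5, 0.8) follow the text, and a single constraint of each type replaces the sums over k.

```python
# Penalty-function form of Eq. (11) with placeholder constraint functions.
# d(S), J0, J1 and f below are toy stand-ins; only the penalty structure follows the paper.
import numpy as np

delta_k, zeta_k, mu = 0.5, 0.5, 0.8      # penalty factors from the text

S_current = np.array([1.0, 0.8, -0.5])   # S': current nodal injection power (made up)

def d(S):                                # D(S) = ||S' - S||, Eq. (5)
    return np.linalg.norm(S_current - S)

def J0(S):                               # placeholder inequality, feasible when <= 0
    return np.sum(S ** 2) - 4.0

def J1(S):                               # placeholder for the Ip.dst >= 0 type constraint
    return S[0] + 1.0

def f(S):                                # placeholder for |K| = 0
    return S[1] - 0.6

def penalized_objective(S):
    pen = (1.0 / delta_k) * min(0.0, -J0(S)) ** 2 \
        + (1.0 / zeta_k) * min(0.0, J1(S)) ** 2 \
        + (1.0 / mu) * f(S) ** 2
    return d(S) + pen                    # D' in Eq. (11)

print(penalized_objective(np.array([0.5, 0.5, -0.4])))
```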

5 Algorithm Optimization
5.1 Basic PSO
PSO is an algorithm that relies on swarm intelligence to search randomly. In the
iterative calculation, each particle updates its own speed and position by constantly
updating Pbest and gbest, so as to find the best position of the particle. Particle position
and velocity are updated according to formula (12):
vi^(k+1) = w·vi^k + c1·r1·(Pbest,i − xi^k) + c2·r2·(gbest − xi^k)
xi^(k+1) = xi^k + vi^(k+1)        (12)

In formula (12), w is the inertia weight, c1 and c2 are acceleration factors, r1 and r2 are random numbers uniformly distributed in [0, 1], and k is the current iteration number.
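For reference, the standard update of formula (12) can be written in a few lines of Python; the swarm size, bounds, and test objective here are arbitrary choices used only to show the update rule.

```python
# One iteration of the standard PSO update of formula (12) on a toy objective.
# Swarm size, bounds and the sphere objective are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
dim, n_particles = 2, 5
w, c1, c2 = 0.8, 2.0, 2.0

x = rng.uniform(-5, 5, (n_particles, dim))      # positions
v = np.zeros_like(x)                            # velocities
pbest = x.copy()
pbest_val = np.sum(x ** 2, axis=1)              # objective: sphere function
gbest = pbest[np.argmin(pbest_val)]

r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update
x = x + v                                                    # position update
print(x)
```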

5.2 Improved PSO


5.2.1 Simplified PSO
This paper, based on the basic PSO and in combination with reference [7], proposes a simplified mean PSO that removes the velocity term from the algorithm; its position update formula is shown in Eq. (13):
Xi^(t+1) = w·Xi^t + c1·r1·((pbest + gbest)/2 − Xi^t) + c2·r2·((pbest + gbest)/2 − Xi^t)        (13)

This modification moves each particle toward the mean of its own best position and the global best position, so that the particle can approach the global optimum faster, effectively alleviating the premature-convergence problem of the algorithm.

5.2.2 Acceleration Factor


In PSO, the acceleration factors reflect the information exchange between particles. In this paper, referring to the literature [8], c1 is given a larger value and c2 a smaller value in the early stage of the search, while c1 is reduced and c2 increased in the later stage, so that the particles rely more on their own experience early on and learn more from the global optimum later. Accordingly, this article adopts the acceleration factor schedule of formula (14):
 
c1(k) = c1ini − (c1ini − c1fin)·(k/Tmax)
c2(k) = c2ini + (c2fin − c2ini)·(k/Tmax)        (14)

In formula (14), c1ini and c1fin are the initial and final values of the acceleration factor c1, c2ini and c2fin are the initial and final values of c2, k is the current iteration number, and Tmax is the maximum number of iterations.

5.2.3 Dynamic Inertia Weight


The inertia weight is a key parameter for balancing local and global search. This paper proposes a strategy for dynamically changing the inertia weight, expressed as follows:

w = wmin + (wmax − wmin)·cos(π·t/(2·Tmax)) + r·betarnd(a, b)        (15)

Here, r is the inertia adjustment factor and betarnd(a, b) generates random numbers following the beta distribution. The third term uses the beta distribution to shape the overall value distribution of the inertia weight, and the inertia adjustment factor in front of betarnd controls the amount by which w deviates, further improving the search accuracy of the algorithm.

5.3 Improved PSO Calculation Flow


According to the above analysis, the procedure for finding the closest critical initial running state is as follows; a minimal code sketch of the search loop is given after the steps.
Step 1: Set the initial fault branch and calculate the data after power flow.
Step 2: Read in system parameters, set algorithm parameters and objective functions.
Step 3: In the algorithm, the nodal injection power vector of power grid is used as
particles and initialize the particle population.
Step 4: The fitness value is calculated to find the individual optimal value and the
global optimal value.
Step 5: The acceleration factors and the inertia weight are calculated by formulas (14) and (15), and the position of each particle is updated by formula (13).
Step 6: Determine whether the stop condition is satisfied, if it is satisfied, exit the
program. Otherwise, continue the iteration until the stop condition is satisfied.
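The following Python sketch strings Steps 1–6 together using the update rules of Eqs. (13)–(15). The objective being minimized is a toy function standing in for the penalized distance D′ of Eq. (11), and the population size and Tmax follow Sect. 6; everything else (bounds, beta-distribution parameters, the inertia adjustment factor, the stand-in objective) is an illustrative assumption.

```python
# Sketch of the improved simplified-mean PSO (Eqs. (13)-(15)) on a stand-in objective.
# The real objective would be the penalized distance D' of Eq. (11); a toy function is used here.
import numpy as np

rng = np.random.default_rng(1)

def objective(s):                       # stand-in for D'(S); replace with Eq. (11)
    return np.sum((s - 0.3) ** 2)

dim, n_particles, t_max = 14, 40, 200   # population size 40, Tmax = 200 as in Sect. 6
w_min, w_max, sigma = 0.1, 0.9, 0.1     # inertia limits; sigma = inertia adjustment factor (assumed)
c1_ini, c1_fin, c2_ini, c2_fin = 2.0, 0.5, 0.5, 2.0
a, b = 2.0, 2.0                         # beta-distribution parameters (assumed)

X = rng.uniform(-1.0, 1.0, (n_particles, dim))
fit = np.apply_along_axis(objective, 1, X)
pbest, pbest_fit = X.copy(), fit.copy()
gbest = pbest[np.argmin(pbest_fit)].copy()

for t in range(1, t_max + 1):
    c1 = c1_ini - (c1_ini - c1_fin) * t / t_max          # Eq. (14)
    c2 = c2_ini + (c2_fin - c2_ini) * t / t_max
    w = (w_min + (w_max - w_min) * np.cos(np.pi * t / (2 * t_max))
         + sigma * rng.beta(a, b))                        # Eq. (15)
    mean_best = (pbest + gbest) / 2.0                     # simplified mean attractor
    r1 = rng.random((n_particles, 1))
    r2 = rng.random((n_particles, 1))
    X = w * X + c1 * r1 * (mean_best - X) + c2 * r2 * (mean_best - X)   # Eq. (13)

    fit = np.apply_along_axis(objective, 1, X)
    improved = fit < pbest_fit
    pbest[improved], pbest_fit[improved] = X[improved], fit[improved]
    gbest = pbest[np.argmin(pbest_fit)].copy()

print("best objective found:", pbest_fit.min())
```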

6 Example Analysis

This paper takes the IEEE 14-node system as an example to further explain the proposed algorithm. In the algorithm, the parameters of each system component and the calculated D value are expressed in per-unit, with a reference capacity of 100 MVA.
In this paper, the improved PSO and the basic PSO are tested by MATLAB simulation. The specific parameters are: population size 40 and maximum number of iterations Tmax = 200. In the basic PSO, the acceleration factors are c1 = c2 = 2 and the inertia weight w decreases linearly from 0.9 to 0.1; in the improved PSO, the acceleration factors are c1ini = 2, c1fin = 0.5, c2ini = 0.5, c2fin = 2, and the inertia weight limits are wmin = 0.1 and wmax = 0.9.
The initial failure is assumed to be the branch between node 3 and node 4, that is, branch L42. To facilitate analysis and comparison, it is assumed in this paper that each branch of the IEEE 14-node system is equipped with current-type back-up protection, with a protection current setting of 3 kA.

Figure 1 shows the result of D-value obtained by PSO, with 200 iterations. The
abscissa represents the number of iterations, and the ordinate represents the distance. It
can be seen from the figure that the shortest power distance D corresponding to the
optimal value is 3.2266.

Fig. 1. Power shortest distance D of basic PSO.

Figure 2 shows the result of D-value obtained by the improved PSO, with 200
iterations. It can be seen from the figure that the shortest power distance D corre-
sponding to the optimal value is 2.3642.

Fig. 2. Power shortest distance D of improved PSO.

Figure 3 shows the results of the optimal fitness obtained by PSO, with 200 iter-
ations. The abscissa in the figure represents the iteration times, and the ordinate rep-
resents the fitness. We can see from the figure that the best fitness is 0.0663.

Fig. 3. Optimal fitness of basic PSO.

Figure 4 shows the results of the optimal fitness obtained by the improved PSO,
with 200 iterations. We can see from the figure that the best fitness is 0.0557.

Fig. 4. Optimal fitness of improved PSO.

It can be seen from the comparison that, for the shortest distance between the current state and the critical state of the system, the improved PSO reduces the value from 3.2266 to 2.3642, and for the optimal fitness it reduces the value from 0.0663 to 0.0557. When the improved PSO is run repeatedly, the D value converges to about 2.364 and the optimal fitness converges to about 0.063. This shows that the improved PSO raises the optimization level, enlarges the search range, and improves the calculation accuracy. Therefore, the improved algorithm can find the closest critical point for triggering a cascading fault more quickly, which provides a reference for preventing cascading trip accidents.

7 Conclusion

In recent years, although the power system has continued to improve its stability, power failure accidents still happen frequently, and the main cause is the cascading trip. This article starts from the early-stage manifestation of the cascading trip: on the basis of the action mechanism of relay protection, with relay protection acting as the overload protection of the branch, an optimization model is established to find the critical point of the power grid. The variable to be optimized is the nodal injection power corresponding to the running state on the cascading trip boundary. An improved PSO is applied to solve the model; on the basis of the simplified PSO, a simplified mean PSO with dynamic adjustment of the acceleration factors and inertia weight is given. By modifying the individual and global optimal positions in the position-update formula, adding dynamic beta-distribution adjustment to the inertia weight, and introducing dynamic acceleration factors, the algorithm not only increases the diversity of the population but also gains good global convergence ability. The simulation results show that the improved PSO achieves better optimization results in all target directions. For the same number of iterations, the improved PSO has faster convergence, better stability, and higher calculation accuracy than the basic PSO, and can quickly and accurately find the critical point for triggering a cascading fault, which provides a reference for further research on preventing cascading trips in the future.

Acknowledgment. This research was financially supported by Scientific Research Development


Foundation of Fujian University of Technology under the grant GY-Z17149, and Scientific and
Technological Research Project of Fuzhou under the grant GY-Z18058.

References
1. Gan, D.-Q., Jiang-Yi, H., Han, Z.-X.: Thinking on some international blackouts in 2003.
Power Syst. Autom. 28(3), 1–5 (2004)
2. Huang, X.-Q., Huang, Y., Liu, H., et al.: Analysis of the importance of the root causes of
power production accidents based on the dynamic weight Delphi method. Electr. Technol. 18
(3), 89–93 (2017)
3. Deng, H.Q., Lin, X.Y., Wu, P.P., Li, Q.B., Li, C.G.: A method of power network security
analysis considering cascading trip. In: Pan, J.S., Lin, J.W., Liang, Y., Chu, S.C. (eds.)
Genetic and Evolutionary Computing. ICGEC 2019. Advances in Intelligent Systems and
Computing, vol. 1107. Springer, Singapore (2020)
4. Zhu, J.-W.: Power System Analysis. China Power Press, Beiping (1995)
5. Deng, H.-Q., Li, C.-G., Yang, B.-L., Alaini, E., Ikramullah, K., Yan, R.: A Method of
Calculating the Safety Margin of the Power Network Considering Cascading Trip Events.
Springer (2020)
6. Deng, H.Q., Wu, P.P., Lin, X.Y., Lin, Q.B., Li, C.G.: A method to prevent cascading trip in
power network based on nodal power. In: Pan, J.S., Lin, J.W., Liang, Y., Chu, S.C. (eds.)
Genetic and Evolutionary Computing. ICGEC 2019. Advances in Intelligent Systems and
Computing, vol. 1107. Springer, Singapore (2020)

7. Huang, Y., Lu, H.-Y., Xu, K.-B., Shen, G.-Q.: Simplified mean particle swarm optimization
algorithm with dynamic adjustment of inertia weight. Microcomput. Syst. 39(12), 2590–2595
(2018)
8. Teng, Z.-J., Lv, J.-L., Guo, L.-W., Wang, Z.-X., Xu, H., Yuan, L.-H.: Particle swarm
optimization algorithm based on dynamic acceleration factor. Microelectr. Comput. 34(12),
125–129 (2017)
Study of PSO Optimized BP Neural Network
and Smith Predictor for MOCVD Temperature
Control in 7 nm 5G Chip Process

Kuo-Chi Chang1,2,7(&), Yu-Wen Zhou1,2, Hsiao-Chuan Wang3, Yuh-Chung Lin1,2,
Kai-Chun Chu4, Tsui-Lien Hsu5, and Jeng-Shyang Pan6

1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
albertchangxuite@gmail.com
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
4 Department of Business Management, Fujian University of Technology, Fuzhou, China
5 Institute of Construction Engineering and Management, National Central University, Taoyuan, Taiwan
6 College of Computer Science and Engineering, Shandong University of Science and Technology, Shandong, China
7 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan

Abstract. The semiconductor integrated circuit industry will play an important role in information technology after 2020. GaN is used in third-generation advanced semiconductors for 7 nm 5G chips, and MOCVD is a key technology for preparing high-quality communication semiconductor crystals. This study proposes a PID controller based on PSO and a BP neural network algorithm to improve the temperature control of MOCVD. The results show that the proposed PSO-BP neural network intelligent PID controller with a Smith predictor has better dynamic performance: the response rises from 0 to 1 and settles between 150 and 500 s with no oscillation, no overshoot, and a short adjustment time, approaching ideal control.

Keywords: MOCVD · 5G chip process · BP neural network · Smith predictor · Intelligent control · PSO algorithm

1 Introduction

The semiconductor integrated circuit industry will play an important role in information technology after 2020, especially in 5G mobile communications. With the rapid development of third-generation semiconductor materials, MOCVD process technology has a profound influence on the development of batteries, lighting, communications, and other fields, and MOCVD reactors are key technical equipment for 5G communication chips. At present, the investment scale of semiconductor factories is huge; the key commercial node is 7 nm and below (Fig. 1) [1–3], production costs are high, and precise control of the thin-film process is required. Table 1 summarizes the latest application trends of 7 nm chips.

Fig. 1. Semiconductor key size trend in 2020.

Table 1. Summary of the latest application trends of 7 nm chips.



GaN is used in third-generation advanced semiconductor 5G chips, and MOCVD is a key technology for preparing high-quality communication semiconductor crystals. Temperature is one of the most common process parameters in the industrial control of chip processes, and temperature control is also an important part of the MOCVD process system (Fig. 2), since it directly affects the quality of film growth [4–6]. The temperature of the reaction chamber must be accurately controlled throughout the entire material growth process to ensure that the material quality meets the required characteristics.

Fig. 2. The architecture of MOCVD system in 7 nm FAB

With the progress of science and technology, many intelligent control algorithms have emerged. Nevertheless, PID control is still widely used in fields such as electromechanical systems, petroleum, the chemical industry, thermal engineering, and metallurgy, especially in low-level industrial production process control, because advanced intelligent algorithms cannot yet be used directly as controllers with complete stability in industrial settings. PID control has the advantages of a simple algorithmic principle, stable and reliable operation, easy realization, strong adaptability, and good robustness to model parameter perturbations, as well as a clear physical meaning that is easy to understand. Therefore, an intelligent PID controller based on a BP neural network is adopted here, exploiting the self-adaptive and self-learning ability of the BP neural network, while PSO contributes fast convergence. At present, MOCVD process control in engineering applications still uses traditional control methods; some theoretical research, including fuzzy control, has made progress and its correctness and feasibility have been verified by simulation, but the control effect has not yet reached very precise control, so there is still room for optimization and further research. The PSO algorithm is mainly used to overcome the uncertainty of the BP neural network and its tendency to fall into local optima, thereby improving control performance: PSO optimizes the network weights and improves the network’s ability to adjust the PID parameters adaptively, and a Smith predictor is added. Smith predictive control is a typical compensation scheme that improves the control of plants with large time delays. This is an important contribution of this research to the intelligent temperature control of current advanced equipment.

2 PID Controller Design of BP Neural Network Combined with PSO

Figure 3 shows the structure of the PSO-BP-PID controller. The main idea is to optimize the BP neural network through PSO, modifying the initial weights so that the network can adjust the process parameters; in this way the self-learning ability and convergence speed of the BP neural network are improved for the entire PID controller. In this study, the network finally outputs the three PID parameters P, I, and D, with which the conventional PID controller improves the closed-loop control performance of the controlled object. In the structure diagram of the PSO-BP-PID controller, rin represents the set working temperature, yout represents the actual output temperature, and e represents the temperature error [7–9].

Fig. 3. Schematic diagram of the PID controller using PSO to optimize the BP neural network.

The steps used in this study to optimize the BP neural network with the PSO algorithm are as follows:
Step (1): Encode the connection weights of all neurons in the BP network structure so that each individual becomes a real-valued code string. The BP neural network in this study takes the set value, the error, and the actual value as inputs and the three PID parameters as outputs, with a hidden layer of 5 neurons, forming a 3-5-3 structure; the particle vector dimension is therefore 30 (3 × 5 + 5 × 3).
Step (2): Initialize the particle swarm. In this study, the population size is set to 30, c1 and c2 are both taken as 2, w is taken as 0.8, and the maximum number of iterations is 500.

The minimum and maximum particle velocities are set as vmin and vmax, and the initial velocities are generated randomly within this interval.
Step (3): Map each particle to the BP neural network to form the network weights, and then construct the fitness function as in Eq. (1):

f = (1/2)·[r(k + 1) − y(k + 1)]²        (1)

Here r(k + 1) is the expected value at time k + 1 and y(k + 1) is the actual value at time k + 1. By evaluating the fitness of each particle, the individual best values and the overall (global) best value are obtained.
Step (4): Update the velocity and position of the particles at each iteration, and check that the updated velocities remain within the set range.
Step (5): Check whether the number of iterations has reached the set value or the system output has met the objective. If either condition holds, the iteration terminates and the optimal solution is produced: the optimal initial weights wij^(2)(0) and wli^(3)(0) of the BP neural network as optimized by PSO. Otherwise, return to Step (3) and continue iterating.
The weights optimized by the PSO algorithm are used as the initial weights of the BP neural network, which is then trained. The flow chart of the PSO-BP neural network is shown in Fig. 4.

Fig. 4. Flow chart of the PSO-BP network in this study.
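To make the particle encoding of Step (1) and the fitness of Eq. (1) concrete, the sketch below maps a 30-dimensional particle onto the 3-5-3 network’s two weight matrices and evaluates the fitness for one control step. The reference/actual values and the activation choices are placeholders; the actual training and PID output stage run inside the Simulink S-function.

```python
# Mapping a 30-dimensional particle to the 3-5-3 BP network and evaluating Eq. (1).
# The reference/actual values and the tanh/sigmoid choices are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(42)

N_IN, N_HID, N_OUT = 3, 5, 3            # 3-5-3 structure -> 3*5 + 5*3 = 30 weights

def decode(particle: np.ndarray):
    """Split one particle (length 30) into the two weight matrices."""
    w_ih = particle[: N_IN * N_HID].reshape(N_IN, N_HID)    # input -> hidden
    w_ho = particle[N_IN * N_HID :].reshape(N_HID, N_OUT)   # hidden -> output
    return w_ih, w_ho

def pid_params(particle, net_in):
    """Forward pass: (set value, error, actual value) -> Kp, Ki, Kd."""
    w_ih, w_ho = decode(particle)
    hidden = np.tanh(net_in @ w_ih)                  # hidden activation (assumed tanh)
    out = 1.0 / (1.0 + np.exp(-(hidden @ w_ho)))     # positive PID gains (assumed sigmoid)
    return out                                       # [Kp, Ki, Kd]

def fitness(r_next: float, y_next: float) -> float:
    """Eq. (1): f = 0.5 * (r(k+1) - y(k+1))^2."""
    return 0.5 * (r_next - y_next) ** 2

particle = rng.uniform(-1.0, 1.0, N_IN * N_HID + N_HID * N_OUT)
net_in = np.array([1.0, 0.2, 0.8])       # (set value, error, actual value), placeholders
print("PID gains:", pid_params(particle, net_in))
print("fitness  :", fitness(r_next=1.0, y_next=0.8))
```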



Combining the three methods described above yields the PID control structure with Smith predictor based on the PSO-optimized BP network, shown in Fig. 5.

Fig. 5. PID control structure with Smith predictor based on the PSO-optimized BP network.

3 Discussion of Experimental Results

The temperature control system model of the MOCVD reactor can be obtained from its measured step-response (rise) curve. The gain is 3.2 (°C/0.1 V), the dead time of the controlled object is 150 s, and the inertia time constant of the controlled process is 200 s, giving the object model of formula (2) [10–12]:

G(s) = 3.2/(200s + 1) · e^(−150s)        (2)
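For readers without Simulink, the first-order-plus-dead-time model of formula (2) can be simulated with a simple discrete approximation, as in the sketch below; the sampling period is an arbitrary choice.

```python
# Discrete step response of G(s) = 3.2 e^(-150 s) / (200 s + 1)  (formula (2)).
# Sampling period chosen arbitrarily; a zero-order-hold style approximation is used.
import numpy as np

K, T, L = 3.2, 200.0, 150.0      # gain, time constant (s), dead time (s)
dt = 1.0                         # sampling period (s), assumed
n_steps = 1000
delay = int(round(L / dt))

a = np.exp(-dt / T)              # discrete pole of the first-order lag
b = K * (1.0 - a)                # matching DC gain

u = np.ones(n_steps)             # unit step input
y = np.zeros(n_steps)
for k in range(1, n_steps):
    u_delayed = u[k - 1 - delay] if k - 1 - delay >= 0 else 0.0
    y[k] = a * y[k - 1] + b * u_delayed

print(f"output at t=150 s: {y[150]:.3f}  (still ~0 because of the dead time)")
print(f"output at t=950 s: {y[950]:.3f}  (approaching the gain {K})")
```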

In this study, the MATLAB Simulink toolbox was used to simulate the MOCVD temperature control system. The simulation is built from a step-signal module, an incremental PID controller, the transfer function of the temperature control system, and a scope. Figure 6 shows the simulation structure of the PSO-BP neural network PID control with Smith predictor; the PSO-BP algorithm is programmed in the S-function module of the diagram.

Fig. 6. Simulation structure of the PSO-BP neural network PID controller with Smith predictor.

The temperature control of MOCVD is simulated with the Simulink toolbox of MATLAB. The parameters of the incremental PID controller and the Smith predictor are tuned according to the characteristics of the plant, with the PID parameters adjusted continuously by trial and error until the simulation results are obtained. In the proposed scheme, PSO optimizes the BP-neural-network PID controller with Smith predictor, after which the PID parameters are adjusted online and the optimal control output is produced through the network’s self-adaptive and self-learning ability.
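The comparison baseline, an incremental PID controller with a Smith predictor acting on the plant of formula (2), can be sketched in discrete time as follows. The PID gains and sampling period are illustrative assumptions, not the trial-and-error values tuned in Simulink.

```python
# Incremental PID with a Smith predictor for G(s) = 3.2 e^(-150 s)/(200 s + 1).
# The PID gains and the 1 s sampling period are assumed values, not the paper's tuning.
import numpy as np

K, T, L = 3.2, 200.0, 150.0          # plant of formula (2)
dt = 1.0
delay = int(L / dt)
a, b = np.exp(-dt / T), K * (1.0 - np.exp(-dt / T))

kp, ki, kd = 0.25, 0.01, 0.0         # illustrative incremental-PID gains
setpoint = 1.0
n = 1500

u = np.zeros(n)                      # controller output
y = np.zeros(n)                      # real plant output (with dead time)
ym = np.zeros(n)                     # internal model output (dead time removed)
e1 = e2 = 0.0
uk = 0.0

for k in range(n - 1):
    ym_delayed = ym[k - delay] if k >= delay else 0.0
    y_fb = y[k] + ym[k] - ym_delayed            # Smith-predictor feedback signal
    e = setpoint - y_fb
    uk += kp * (e - e1) + ki * e + kd * (e - 2 * e1 + e2)   # incremental PID
    e2, e1 = e1, e
    u[k] = uk

    ym[k + 1] = a * ym[k] + b * u[k]            # delay-free internal model
    u_delayed = u[k - delay] if k >= delay else 0.0
    y[k + 1] = a * y[k] + b * u_delayed         # real plant with dead time

print(f"plant output at t = {(n - 1) * dt:.0f} s: {y[-1]:.3f} (setpoint {setpoint})")
```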

Fig. 7. MOCVD temperature control response simulation curve.



Through trial and error with conventional PID control, a relatively good response curve was obtained; the PID parameter values, constrained to the 0–1 range, are 1, 0.08, and 0.1 respectively. The two temperature-control simulations in Fig. 7 can be compared after the same limits are imposed on the PID parameters. With the incremental PID control method with Smith predictor alone, the system exhibits overshoot, the output swings between about 0.5 and 1.7, the response curve oscillates, the settling time is long, and the control effect is poor. In contrast, the method proposed in this study, the PSO-BP neural network intelligent PID controller with Smith predictor, has better dynamic performance: the output rises from 0 to 1 and settles between 150 and 500 s with no oscillation, no overshoot, and a short adjustment time, approaching ideal control. The BP-neural-network adaptive PID controller clearly offers stable control output and convenient adjustment. A conventional BP neural network, however, converges slowly and tends to become stuck in local optima, which limits the control quality and tracking achievable for a plant with pure lag and large inertia and leaves overshoot in the step response; the PSO-optimized BP neural network controller therefore significantly improves the MOCVD temperature control results. This provides useful guidance for improving the control accuracy of the MOCVD industrial process, which is of great significance for the development of the semiconductor industry, currently focused on the 5G chip process market.

4 Conclusion and Suggestion

The semiconductor integrated circuit industry will play an important role in information technology after 2020. GaN is used in third-generation advanced semiconductor 5G chips, and MOCVD is a key technology for preparing high-quality communication semiconductor crystals. This study proposed a PID controller based on PSO and a BP neural network algorithm to improve the temperature control of MOCVD. The results show that the proposed PSO-BP neural network intelligent PID controller with Smith predictor has better dynamic performance: the output rises from 0 to 1 and settles between 150 and 500 s with no oscillation, no overshoot, and a short adjustment time, approaching ideal control. Future work will, first, seek a better optimization algorithm to further improve the initialization time of the neural network weights and the control performance, and second, build a simulation environment of the actual control system and apply the proposed control method there to obtain results closer to real control experiments.

References
1. Chang, K.-C., Lin, Y.-C., Chu, K.-C.: Application of edge computing technology in the
security industry. Front. Soc. Sci. Technol. 1(10), 130–134 (2019). https://doi.org/10.25236/
FSST.2019.011016
2. Zhou, Y.W., et al.: Study on IoT and big data analysis of furnace process exhaust gas
leakage. In: Pan, J.S., Li, J., Tsai, P.W., Jain, L. (eds.) Advances in Intelligent Information
Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies,
vol. 156. Springer, Singapore (2020)
3. Lu, C.-C., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently
safer design strategies (IV). The advanced thin film manufacturing process design and
adjustment. J. Loss Prev. Process Ind. 43, 280–291 (2016)
4. Chang, K.-C., Chu, K.-C., Wang, H.-C., Lin, Y.-C., Pan, J.-S.: Agent-based middleware
framework using distributed CPS for improving resource utilization in smart city. Future
Gener. Comput. Syst. 108, 445–453 (2020). https://doi.org/10.1016/j.future.2020.03.006
5. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G
base station based on internet of things collaborative control. IEEE Access 8, 32935–32946
(2020)
6. Amesimenu, D.K., et al.: Home appliances control using android and arduino via bluetooth
and GSM control. In: Hassanien, A.E., Azar, A., Gaber, T., Oliva, D., Tolba, F.
(eds) Proceedings of the International Conference on Artificial Intelligence and Computer
Vision (AICV2020). AICV 2020. Advances in Intelligent Systems and Computing, vol
1153. Springer, Cham (2020)
7. Stadel, O., Schmidt, J., Liekefett, M., Wahl, G., Gorbenko, O.Y., Kaul, A.R.: MOCVD
techniques for the production of coated conductors. IEEE Trans. Appl. Supercond. 13(2),
2528–2531 (2003)
8. Li, C.H., Xu, S.X., Xie, Y., Zhao, J.: The application of PSO-BP neural network PID
controller in variable frequency speed regulation system. Appl. Mech. Mater. 599–601,
1090–1093 (2014).https://doi.org/10.4028/www.scientific.net/amm.599-601.1090
9. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for
wireless sensor networks based on an improved ant colony algorithm. J. IEEE Access 7,
105562–105571 (2019)
10. Mickevičius, J., Dobrovolskas, D., Steponavičius, T., Malinauskas, T., Kolenda, M., Kadys,
A., Tamulaitis, G.: Engineering of InN epilayers by repeated deposition of ultrathin layers in
pulsed MOCVD growth. Appl. Surf. Sci. 427, 1027–1032 (2014). https://doi.org/10.1016/j.
apsusc.2017.09.074
11. Chih-Cheng, L., Chang, K.-C., Chen, Chun-Yu.: Study of high-tech process furnace using
inherently safer design strategies (III) advanced thin film process and reduction of power
consumption control. J. Loss Prev. Process Ind. 43, 280–291 (2015)
12. Chang, K.C., Pan, J.S., Chu, K.C., Horng, D.J., Jing, H.: Study on information and
integrated of MES big data and semiconductor process furnace automation. In: Pan, J.S.,
Lin, J.W., Sui, B., Tseng, S.P. (eds) Genetic and Evolutionary Computing. ICGEC 2018.
Advances in Intelligent Systems and Computing, vol 834. Springer, Singapore (2019)
Study of the Intelligent Algorithm
of Hilbert-Huang Transform in Advanced
Power System

Cheng Zhang1, Jia-Jing Liu1, Kuo-Chi Chang1,2,6(&),
Hsiao-Chuan Wang3, Yuh-Chung Lin1,2, Kai-Chun Chu4,
and Tsui-Lien Hsu5

1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
albertchangxuite@gmail.com
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
4 Department of Business Management, Fujian University of Technology, Fuzhou, China
5 Institute of Construction Engineering and Management, National Central University, Taoyuan, Taiwan
6 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan

Abstract. With the rapid increase in population and electricity consumption, power grids have long since formed large-scale interconnected systems. The power system is a complex multi-dimensional dynamic system, and traditional methods of processing system parameters have gradually shown their limitations, which affects the stability and reliability analysis of the system. This study introduces a popular time-frequency analysis methodology for nonlinear, non-stationary signals, the Hilbert-Huang transform (HHT) algorithm, and then summarizes the applications of the HHT method to low-frequency oscillation in advanced power systems, power quality detection, and harmonic analysis, combining the research achievements of domestic and foreign scholars in recent years. The study also discusses the interpolation function and the endpoint effect of the HHT method in practice, as well as further research on its application in advanced power systems.

Keywords: Advanced power system · Hilbert-Huang transform algorithm · Low frequency oscillation · Power quality detection · Harmonic analysis

1 Introduction

Interconnection of power systems in China has become a prominent feature in recent


years. In nonlinear non-stationary power systems, the Fourier transform is previous
methods of data processing, mainly deal with linear and non-stationary signals, so there
are some constraints on the Fourier transform. It could merely be used to analyze linear
and stationary signals because it requires that the analysis system must be linear and


periodic or stationary; if this is not the case, the resulting spectrum has little practical significance. A number of other signal-processing methods in history are likewise restricted to linear or stationary signals, so nonlinear, non-stationary signals cannot be analyzed completely. Norden E. Huang of NASA proposed a new signal-processing method, the Hilbert-Huang transform, in 1998, which is considered one of the most important applied mathematical methods ever devised at NASA [1–3]. This method has been widely used in power systems and has been further studied by many scholars.
As a new time-frequency analysis method, HHT algorithm breaks the limitation of
Fourier transform, solves the problem of instantaneous frequency identification and has
its own adaptability [4]. This method is suitable for time-frequency analysis of non-
linear, non-stationary signals; it can calculate the instantaneous frequency and amplitude of the signal and thereby contributes to the theoretical basis of power system parameter analysis. This study then reviews the research results on HHT theory as applied to low frequency oscillation of advanced power systems, power quality detection, and power system harmonic analysis, and discusses further research on its application in advanced power systems.

2 Methodology of Hilbert-Huang Transform

HHT is a new data analysis method, mainly composed of Hilbert spectrum analysis and
EMD (empirical mode decomposition). The most important feature of this method is the
introduction of IMF (intrinsic mode functions) thinking to solve local signal charac-
teristics. This section introduces the main process by which the HHT method handles nonlinear, non-stationary signals. First, the signal to be analyzed is decomposed into IMF components by the EMD algorithm; then each IMF component is analyzed with the Hilbert transform to obtain the instantaneous frequency and amplitude of the signal. The key part of EMD is a step-by-step sifting (filtering) process. Assume that the signal to be decomposed is s(t).
Step (1): First find all local maxima and minima of the data, then use spline interpolation to fit the upper envelope v1(t) and the lower envelope v2(t), and compute their mean as in Eq. (1).

m = (v1(t) + v2(t)) / 2    (1)
Step (2): Define h = s(t) − m. h is accepted as an IMF when it meets two requirements: (A) over the whole data set, the number of extrema and the number of zero crossings are equal or differ by at most one; and (B) at any point, the mean of the envelope formed by the local maxima and the envelope formed by the local minima is zero. If both hold, h is the first IMF; otherwise h is taken as the new s(t) and the previous operation is repeated until an h satisfying the IMF requirements is obtained [5].
Step (3): The decomposition stops when the residual component is monotonic or small enough to be regarded as measurement error. At this point the signal has been

decomposed into several intrinsic mode functions ci and a residual component r; s(t) is then given by Eq. (2).

s(t) = Σ_{i=1}^{n} ci + r    (2)

In practice, the standard deviation SD between two successive sifting results is used as the stopping criterion for extracting an intrinsic mode function; it is calculated as in Eq. (3).

SD = Σ_{t=0}^{T} [h_{k−1}(t) − h_k(t)]² / h²_{k−1}(t)    (3)

In general, the smaller the value of SD, the better the linearity and stationarity of the resulting intrinsic mode function. Extensive practice shows that the decomposition effect of EMD is best when SD is between 0.2 and 0.3. The detailed flow chart of the HHT method is shown in Fig. 1, and a minimal sifting sketch is given after it.

[Flow chart summary: input original signal s(t) → find all maxima and minima of s(t) → fit the upper envelope v1(t) and lower envelope v2(t) → compute the envelope mean m = (v1(t) + v2(t))/2 → h = s(t) − m → check whether h meets the stopping criterion (SD between 0.2 and 0.3); if not, repeat → record IMF ci → check whether the residual r satisfies the residual condition → output ci(t).]
Fig. 1. Flow chart of calculation steps of HHT method.
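As an illustration only (not the authors' implementation), the following Python sketch performs the EMD decomposition of Steps (1)–(3) using Eqs. (1)–(3); SciPy's CubicSpline and argrelextrema are assumed for envelope fitting and extrema detection, and the stopping guards are simplifications.

```python
# Minimal EMD sifting sketch (illustration only), following Steps (1)-(3).
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def emd(s, t, sd_threshold=0.3, max_imfs=10, max_sift=50):
    imfs, residual = [], np.asarray(s, dtype=float).copy()
    for _ in range(max_imfs):
        h_prev = residual.copy()
        for _ in range(max_sift):
            max_idx = argrelextrema(h_prev, np.greater)[0]
            min_idx = argrelextrema(h_prev, np.less)[0]
            if len(max_idx) < 3 or len(min_idx) < 3:
                return imfs, residual              # residual (nearly) monotonic: stop
            v1 = CubicSpline(t[max_idx], h_prev[max_idx])(t)   # upper envelope
            v2 = CubicSpline(t[min_idx], h_prev[min_idx])(t)   # lower envelope
            m = (v1 + v2) / 2.0                    # Eq. (1): envelope mean
            h = h_prev - m                         # candidate IMF, h = s(t) - m
            sd = np.sum((h_prev - h) ** 2 / (h_prev ** 2 + 1e-12))   # Eq. (3)
            if sd < sd_threshold:
                break
            h_prev = h
        imfs.append(h)
        residual = residual - h                    # Eq. (2): s(t) = sum(ci) + r
    return imfs, residual
```

For example, imfs, r = emd(signal, time_axis) returns the IMF components ci and the residual r of Eq. (2).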



3 Application of Hilbert-Huang Transforms Method


3.1 Advanced Power System Low Frequency Oscillation Applications
As the scale of interconnected large power grids continues to expand, with ever closer interconnection and long-distance, large-capacity transmission lines in continuous operation, the probability of low frequency oscillation increases greatly, which seriously affects the reliability and safety of the advanced power system.
Low frequency oscillation of a power system refers to the phenomenon that, after the system is disturbed, the relative swing between the rotors of synchronous generators operating in parallel causes the system power and power angles to oscillate at frequencies of about 0.2 to 0.5 Hz. For a long time the low frequency oscillation problem has been studied through linearized small-signal stability analysis, with certain results; however, non-linearity is a typical feature of power systems, and as the system grows in scale and complexity the shortcomings of the linearization methods are exposed. With the development of nonlinear research, many scholars have introduced bifurcation and chaos theory into the study of low frequency oscillation; although such methods solve some problems that linearization could not, they are still limited by the size of the system and the order of the equations [6–8].
The HHT algorithm has been applied to analyze the dynamic oscillation modes of low frequency oscillation in advanced power systems and to extract transient information about system faults. Comparisons with Prony's algorithm and the wavelet transform on the two-area four-machine system and the EPRI-36-node system show that the EMD method has high resolution, can effectively process short data records, and avoids the difficulty of selecting a wavelet basis in the wavelet transform. In some work, before empirical mode decomposition the approximate frequency range of each mode is first determined with the Fourier transform, and the signal is then filtered according to the density of the obtained modal frequencies [9, 10].
This approach can identify the characteristic parameters of low frequency oscillation accurately. Many scholars combine the HHT algorithm with traditional low frequency oscillation analysis and Prony's method. For example, in one study EMD is used to decompose the signal, and signal components are then distinguished from noise by the fact that signal components are mutually correlated while noise components are only weakly correlated; on this basis the signal is de-noised and reconstructed, and the reconstructed signal is finally analyzed by Prony's method, which mitigates Prony's sensitivity to noise. In references [11, 12], after the signal is decomposed by EMD, the energy of each IMF is used to distinguish noise from signal modes, and the IMF component with the largest energy weight is taken as the dominant

mode of oscillation. In short, applying the HHT method to low frequency oscillation of advanced power systems has become a trend. A large number of experiments have proved that this method achieves more effective and accurate results when dealing with nonlinear signals in the power system and can extract the characteristics of the oscillation modes; a hedged sketch of the dominant-mode selection is given below.
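As a hedged illustration of the IMF-energy idea described above (not the exact procedure of references [11, 12]), the sketch below selects the largest-energy IMF as the dominant oscillation mode and estimates its instantaneous amplitude and frequency with the Hilbert transform; fs denotes the sampling rate.

```python
import numpy as np
from scipy.signal import hilbert

def dominant_mode(imfs, fs):
    """Pick the IMF with the largest energy and estimate its instantaneous
    amplitude and frequency via the Hilbert transform."""
    energies = [float(np.sum(c ** 2)) for c in imfs]
    k = int(np.argmax(energies))                    # dominant mode index
    analytic = hilbert(imfs[k])                     # analytic signal of that IMF
    amplitude = np.abs(analytic)                    # instantaneous amplitude
    phase = np.unwrap(np.angle(analytic))
    freq_hz = np.diff(phase) * fs / (2.0 * np.pi)   # instantaneous frequency in Hz
    return k, amplitude, freq_hz
```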

3.2 Power Quality Detection


Power supply quality often encounters transient effects of switching operations, power
system failures, and harmonic distortion. For the transient phenomenon of power
system, many scholars at home and abroad use wavelet technology to solve it [13, 14].
The HHT analysis method is used to effectively analyze abnormal power quality disturbance signals. It breaks the signal down into several IMFs by EMD, and these intrinsic mode functions contain the local characteristics of the signal. The amplitude and frequency of the first IMF are then calculated, from which the time and magnitude of the disturbance can be obtained. Past research developed an electric energy detection system based directly on the HHT method, combining software and hardware to provide a convenient detection platform for field workers.
However, when the HHT method is used to detect a voltage discontinuity signal at an inflection point, only the moment when the disturbance begins can be detected, while detection fails at the termination time. The solution to this problem is to superimpose a harmonic signal on the original discontinuous voltage signal. Adding a harmonic signal, such as the third harmonic, is intended to add local extremum points so that the amplitude at the disturbance termination can also be detected. In the application of the HHT method, the problem of mode aliasing also appears: an interference signal occupies the position of the original physical process curve, so that each IMF no longer reflects the real physical process [15, 16]. To extract the transient oscillation signal, a known high-frequency signal is superimposed on the original signal; after decomposing the combined signal with EMD, the known high-frequency signal is subtracted from the first IMF component to obtain IMF', which is then analyzed with the Hilbert transform. In addition, the EMD method cannot accurately separate all components of a harmonic signal when the fundamental carries too much energy. A Fourier transform is therefore performed on the signal before EMD; this step determines the approximate frequency range of each mode, and a low-pass filter or other filter is then used to select the frequency band of interest so that each component is retained for HHT analysis [17, 18]. For power quality detection, the HHT method plays an extremely important role: it can check in real time whether the power quality in the power system is good or bad, and it lays an important foundation for the next step of quality improvement, so as to ensure the safe operation of the power supply system. At present, this method is still being studied in this field; a hedged sketch of the masking-signal idea described above follows.
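The masking-signal workaround described in this subsection can be sketched as follows; this is an assumption-laden illustration, with emd_func standing for any EMD routine (for example the sketch in Sect. 2) and the mask frequency and amplitude chosen arbitrarily.

```python
import numpy as np

def masked_first_imf(s, t, emd_func, f_mask=1000.0, a_mask=1.0):
    """Add a known high-frequency masking signal before EMD, then subtract it
    from the first IMF to obtain the transient/high-frequency component."""
    mask = a_mask * np.sin(2.0 * np.pi * f_mask * t)
    imfs, _ = emd_func(s + mask, t)
    return imfs[0] - mask
```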

3.3 Harmonic Analysis of Advanced Power System


Nowadays, the power supply system is often severely polluted by harmonics. The
analysis and suppression of power system harmonics is an important and very

meaningful work. Harmonic detection is the foundation and main basis of this work; only accurate real-time detection of harmonics can prepare for subsequent suppression. Since the early 1980s, China has paid much attention to harmonic control and has established several power quality indexes to limit the allowable harmonic levels of the public power grid. The active power filter (APF) is a power electronic device that can dynamically suppress harmonics and improve power quality. As power quality requirements continue to rise, scholars at home and abroad continue to explore new methods for advanced power system harmonic detection, and HHT has been introduced into harmonic detection in power systems, where it can extract harmonic signals of any frequency. However, the envelope and endpoint problems of the HHT method in practice affect the accuracy of harmonic detection: the cubic spline interpolation used in the method exhibits overshoot and undershoot, and because the end points of the data are not necessarily extrema, a "flying wing" (end swing) phenomenon appears. Many scholars have addressed these issues. The envelope problem is improved by using Hermite interpolation instead of the original cubic spline interpolation, and the point-symmetric extension method is used to alleviate the end-point flying wing; the endpoint effect has been further treated by combining artificial neural networks with point-symmetric extension. Furthermore, pre-filtering the signal improves the aliasing problem, which further improves the accuracy of harmonic analysis.
Because HHT cannot effectively distinguish harmonics of similar frequencies, an iterative HHT method is often used to identify the stationary components inside the signal and improve detection accuracy [19, 20]. Overall, the HHT method has a large impact on harmonic detection in the power supply system and is also the basis for harmonic suppression; the problems that affect its detection accuracy are not fully solved by the improvements summarized above and remain the subject of continuing research.

4 Research Discussions and Prospects for Future Study

In recent years, HHT method has become a hot topic in the research of scholars at home
and abroad, and its application in power system has also attracted much attention and
achieved good results. It has been proved that this method has a broad application
prospect in power systems; however, this method still has some problems to be solved, as outlined below.

4.1 Interpolation Function Problem


In the process of EMD decomposition, fitting the envelope curve is a very important step, so the choice of interpolation function matters greatly. The original method uses cubic spline interpolation, but overshoot and undershoot phenomena occur, which affect the accuracy of the algorithm. Many scholars have therefore proposed other interpolation functions; various improvements are listed in Table 1 below.

Table 1. Several interpolation function methods.


Scholar name Interpolation function method Published date
R. He Mobility Model 2018
C. Zheng Ultra-low frequency oscillation 2018
Guang Xiaolei Piecewise cubic Hermite 2011
Jin Tao Subsection power function 2005
Hu Jinsong High-order spline interpolation 2003

Although these methods improve the accuracy and running time of the algorithm, the problem merits further study; a minimal comparison sketch is given below.
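As a non-authoritative illustration of the interpolation issue behind Table 1, the snippet below fits an upper envelope to a synthetic signal with SciPy's CubicSpline and with PchipInterpolator (a piecewise cubic Hermite interpolant, which is shape-preserving between knots and therefore avoids the overshoot mentioned above).

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator
from scipy.signal import argrelextrema

t = np.linspace(0.0, 1.0, 2000)
s = np.sin(2 * np.pi * 5 * t) + 0.4 * np.sin(2 * np.pi * 60 * t)   # synthetic test signal

idx = argrelextrema(s, np.greater)[0]                # local maxima of the signal
spline_env = CubicSpline(t[idx], s[idx])(t)          # cubic spline envelope (may overshoot)
hermite_env = PchipInterpolator(t[idx], s[idx])(t)   # Hermite envelope (shape-preserving)

# How far each envelope rises above the largest maximum used to build it:
print("spline overshoot :", spline_env.max() - s[idx].max())
print("hermite overshoot:", hermite_env.max() - s[idx].max())
```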

4.2 Endpoint Effect Problem


When the envelope is fitted through the maxima and minima, the flying-wing phenomenon at the ends of the curve affects the accuracy of the HHT analysis, mainly because the boundary points are not necessarily extrema. To solve this problem, the boundary must be extended using an extension method. Many scholars have put forward extension methods and proved their effectiveness; Table 2 lists several extension methods for improving the end effect problem.

Table 2. Several extension methods.


Scholar name Extension method Published date
Vergura, S. HHT and wavelet analysis 2018
Ucar, F. Fast extreme learning machine 2018
Qi Quanquan Method of removing endpoints 2011
Wang Ting Minimum similarity distance compared with the ends 2009
Su Yuxiang Artificial neural networks + mirror extension 2008

These methods have been proven to effectively improve end effects and are widely used. Because of the importance of this problem, it still needs further study; a minimal mirror-extension sketch is shown below.
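A minimal sketch of a mirror-style extension, one of the ideas listed in Table 2, is given below; it is an illustrative simplification rather than the exact method of any entry in the table.

```python
import numpy as np

def mirror_extend(s, n_ext):
    """Reflect n_ext samples at each end so that the boundary samples are no
    longer treated as extrema when the envelopes are fitted; the extension
    is discarded after EMD."""
    left = s[1:n_ext + 1][::-1]          # mirror about the first sample
    right = s[-n_ext - 1:-1][::-1]       # mirror about the last sample
    return np.concatenate([left, s, right])

# Example usage (with the emd sketch from Sect. 2): extend, decompose, then
# crop each IMF back to the original length, e.g. c[n_ext:-n_ext].
```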

4.3 Other Applications of Power Systems


The application of the HHT methodology to the design of low frequency oscillation controllers for advanced power systems can be studied further, as can improving the precision of low-frequency oscillation modal parameter extraction under a strong noise background. In addition, the application of HHT to other aspects of the power system, such as fault diagnosis and voltage security analysis, deserves further study.

5 Conclusion and Suggestion

With the expansion and growing complexity of the power system, the HHT method is well suited to the development and stable operation of advanced power systems. This study introduces the principle of the HHT method and, combining the research results of domestic and foreign scholars, summarizes its application in advanced power systems in three directions: low frequency oscillation of the power system, power quality detection, and power system harmonic detection and analysis. The intelligent method can extract the dynamic modes and oscillation information of low-frequency oscillation of the power system and prepare for the next step of oscillation suppression. Moreover, it can effectively analyze non-stationary power quality disturbance signals and monitor harmonic signals in real time to improve power quality. In general, applying the intelligent algorithm to the traditional power system will greatly improve the reliability of the power grid and reduce the losses caused by network insecurity. Finally, further research directions for this method are discussed; since research on the HHT method started relatively late, the shortcomings of this intelligent method in practical application stated in the previous section need further exploration and practice.

Acknowledgment. Project supported by the Natural Science Foundation of Fujian Province, China (Grant No. 2015J01630) and the Fujian University of Technology Research Fund Project (GY-Z18060).

References
1. He, R., Ai, B., Stüber, G.L., Zhong, Z.: Mobility model-based non-stationary mobile-to-
mobile channel modeling. IEEE Trans. Wirel. Commun. 17(7), 4388–4400 (2018)
2. Zhang, J., Tan, X., Zheng, P.: Non-destructive detection of wire rope discontinuities from
residual magnetic field images using the Hilbert-Huang transform and compressed sensing.
Sensors 17, 608 (2017)
3. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G
base station based on internet of things collaborative control. IEEE Access 8, 32935–32946
(2020)
4. Huang, N.E., Shen, Z., Long, S.R., et al.: The empirical mode decomposition and the Hilbert
spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A 454(1971), 903–995 (1998)
5. Chen, D., Lin, J., Li, Y.: Modified complementary ensemble empirical mode decomposition
and intrinsic mode functions evaluation index for high-speed train gearbox fault diagnosis.
J. Sound Vib. 424, 192–207 (2018). https://doi.org/10.1016/j.jsv.2018.03.018
6. Zheng, C., et al.: Analysis and control to the ultra-low frequency oscillation in southwest
power grid of China: a case study. In: 2018 Chinese Control and Decision Conference
(CCDC), Shenyang, pp. 5721–5724 (2018)
7. Jiang, K., Zhang, C., Ge, X.: Low-frequency oscillation analysis of the train-grid system
based on an improved forbidden-region criterion. IEEE Trans. Ind. Appl. 54(5), 5064–5073
(2018)

8. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for
wireless sensor networks based on an improved ant colony algorithm. J. IEEE Access 7,
105562–105571 (2019)
9. Wang, Y., Dong, R.: Improved low frequency oscillation analysis based on multi-signal
power system. Control Eng. China 26(07), 1335–1340 (2019)
10. Lu, C.-C., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently
safer design strategies (IV). The advanced thin film manufacturing process design and
adjustment. J. Loss Prev. Process Ind. 43, 280–291 (2016)
11. Lijie, Z.: Application of Prony algorithm based on EMD for identifying model parameters of
low-frequency oscillations. Power Syst. Protect. Control 37(23), 9–14+19 (2009)
12. Ucar, F., Alcin, O.F., Dandil, B., Ata, F.: Power quality event detection using a fast extreme
learning machine. Energies 11, 145 (2018)
13. Lu, C.-C., Chang, K.-C., Chen, C.-Y.: Study of high-tech process furnace using inherently
safer design strategies (III) advanced thin film process and reduction of power consumption
control. J. Loss Prev. Process Ind. 43, 280–291 (2015)
14. Sahani, M., Dash, P.K.: Automatic power quality events recognition based on Hilbert Huang
transform and weighted bidirectional extreme learning machine. IEEE Trans. Ind. Inf. 14(9),
3849–3858 (2018)
15. Vergura, S., Carpentieri, M.: Phase coherence index, HHT and wavelet analysis to extract
features from active and passive distribution networks. Appl. Sci. 8, 71 (2018)
16. Zhao, J., Ma, N., Hou, H., Zhang, J., Ma, Y., Shi, W.: A fault section location method for
small current grounding system based on HHT. In: 2018 China International Conference on
Electricity Distribution (CICED), Tianjin, pp. 1769–1773 (2018)
17. Li, K., Tian, J., Li, C., Liu, M., Yang, C., Zhang, G.: The detection of low frequency
oscillation based on the Hilbert-Huang transform method. In: 2018 China International
Conference on Electricity Distribution (CICED), Tianjin, pp. 1376–1379 (2018)
18. Shi, Z.M., Liu, L., Peng, M., Liu, C.C., Tao, F.J., Liu, C.S.: Non-destructive testing of full-
length bonded rock bolts based on HHT signal analysis. J. Appl. Geophys. 151, 47–65
(2018). https://doi.org/10.1016/j.jappgeo.2018.02.001
19. Kabalci, Y., Kockanat, S., Kabalci, E.: A modified ABC algorithm approach for power
system harmonic estimation problems. Electr. Power Syst. Res. 154, 160–173 (2018).
https://doi.org/10.1016/j.epsr.2017.08.019
20. Bečirović, V., Pavić, I., Filipović-Grčić, B.: Sensitivity analysis of method for harmonic
state estimation in the power system. Electr. Power Syst. Res. 154, 515–527 (2018). https://
doi.org/10.1016/j.epsr.2017.07.029
Study of Reduction of Inrush Current on a DC
Series Motor with a Low-Cost Soft Start
System for Advanced Process Tools

Governor David Kwabena Amesimenu1,2, Kuo-Chi Chang1,2,6(✉), Tien-Wen Sung1,2, Hsiao-Chuan Wang3, Gilbert Shyirambere1,2, Kai-Chun Chu4, and Tsui-Lien Hsu5

1 School of Information Science and Engineering, Fujian University of Technology, Fuzhou, China
albertchangxuite@gmail.com
2 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Institute of Environmental Engineering, National Taiwan University, Taipei, Taiwan
4 Department of Business Management, Fujian University of Technology, Fuzhou, China
5 Institute of Construction Engineering and Management, National Central University, Taoyuan, Taiwan
6 College of Mechanical and Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan

Abstract. The existence of high armature current is the main limitation while
starting a DC series motor. This initial current can have dangerous effects on the
DC motor such as damage of windings and reduction in the lifetime of the
machine. There are two basic methods of starting a DC motor, namely the
resistor starting method and the soft start using solid-state devices. In this
project, to lessen the inrush current, the solid-state method was adopted to start
the motor smoothly. The system uses Arduino Atmega328p-PU Microcontroller
to send pulses to the motor driver which regulates the running of the motor. By
focusing on DC series motor, power electronic converters and microcontrollers,
the system was modeled and simulated in Proteus ISIS Software. The results
recorded from the soft starter system were compared with a direct online starting
system considering the starting current and voltage of the motor with an open
loop control system. The soft starter system was able to decrease the direct
online current value from 0.996 A to 0.449 A which represents approximately
28.1% reduction of the starting current to protect the motor.

Keywords: DC series motor · Arduino Atmega328p-PU · Inrush current · Low-cost soft start system · Advanced process tools


1 Introduction

Almost every mechanical motion that is seen around is achieved by an electric motor.
Changing from electrical energy to mechanical energy is done by devices called
electric motors. Per the power supply demand, generally there are two categories of
electric motors which are the DC and AC motors. This project serves to provide a
concise account of the soft starting of DC motors. DC motors consist of two main parts
namely the stator which is the stationary part having the field windings and the rotor
being the rotating part containing the armature windings. In a permanent magnet DC motor, however, the stator is made up of one or more permanent magnet pieces. Based
on the construction of DC motors and the mode of excitation, there are five major types
of DC motors. They are permanent magnet DC motor, compound DC motor, shunt DC
motor, series DC motor, and separately excited DC motor [1].
The permanent magnet motor uses a magnet to supply the field flux and has better speed regulation and starting ability. Its drawback is that it can only drive a limited amount of load, so it is usually used in low-power applications. In the series motor, the field windings and the armature windings are connected in series; the series motor develops a high torque at start and its speed varies widely between full load and no load.
Considering the connection between the field windings and the armature windings of
the shunt DC motor, it is done in parallel and operates under better regulation of speed.
The winding of the field is connected to the same source as the winding of the armature
and can also be separately excited. An advantage of the shunt field being separately
excited is the ability of a variable speed drive to establish an independent control of the
field and armature [1].
In compound motors, the field is connected in series with the armature and a
separately excited shunt field. The series field can cause hitches in a variable speed
drive application [1]. In a separately excited motor, the voltage source for the armature
circuit is separated from the field excitation circuit, that is, both the armature circuit and
field circuit have different voltage supply sources. Although DC motors are self-
starting, one major drawback is that a DC motor draws a high amount of current during starting, which can damage the armature windings. The reason for the high initial armature current is the absence of back emf at starting [2]. In view of this, there is a need to limit the high starting armature current using a
starter.

1.1 Review of Related Work


Soft starting the DC motor will prevent high inrush current and also prevent the
armature windings of the motor from burning. Researchers have implemented various
starting methods to curb the high inrush current to the armature circuit when the motor
is started. Resistor starting method and the solid-state starting method [3] are the two
major methods implemented in the start of DC motors. The methods used, the out-
comes, merits and demerits of the work of some researchers are going to be discussed.
Resistors are used to reduce the high inrush current to the armature windings by connecting them in series with the armature windings of the motor. There are different

types of resistor starting method used depending on the type of DC motor [4]. These
include the 2-point starter, 3-point starter, and the 4-point starter [4]. A 2-point starter is
connected in series with the armature circuit to protect the motor from high inrush
current during starting [4]. The two-point starter is used for only series motors and has
a no-load release coil [2]. The no-load release coil helps to stop the motor when it is
started without a load. Since the two-point starter has a limitation of being used in
series motors only, a three-point starter is used in shunt DC motors or compound
wound DC motors [4]. The three-point starter is preferred for armature speed control of
DC motors; it cannot be used where a wide range of speed control by shunt field control is required [3]. The four-point starter is preferable for field speed control of DC motors. A drawback of this starter is that it fails when there is an open circuit in the
field windings. A common application of rectifier circuits is to convert an AC voltage
input to a DC voltage output [5]. There are two main types of rectifiers, half wave
rectifier and full wave rectifier. A half wave rectifier only conducts during one half-
cycle of the input AC voltage. This introduces harmonics into the output current; such harmonics are undesirable for a DC load and lead to increased power losses, poor power factor and interference [6].
In [3], three conventional means of controlling and monitoring the armature current
level when starting a dc motor are presented, namely: the use of a gradually decreasing
tapped resistance between the supply voltage and the motor armature circuit, the use of
a chopper circuit between the supply voltage and the motor armature circuit and the use
of a variable DC voltage source. During the gradual buildup of the motor speed, the armature circuit was connected in series with a resistance that was slowly removed as the motor generated its own emf. Steady-state analysis was used to calculate the period of movement from one tap to the other [7, 8]. From the results obtained, it was observed that the peak armature current did not exceed twice its rated value. Secondly, a step-up converter was used in controlling
the armature current. The hysteresis controller was used to bias the controlled switch of
the chopper circuit [9].
Finally, a variable DC voltage source was used in an indirect way to control the
armature current. The level of the voltage source was minimal at the start-up and the
controlled full-wave rectifier was used to increase the back emf bit by bit. The peak armature current showed minimal ripple and did not exceed the rated value. In conclusion, the peak armature current appeared to be well controlled by the variable DC voltage source method, and the ripple of the armature current was minimal [8].
The significance of adding a starter circuit to different kinds of DC motors is explained in [10]. A soft-starting technique is introduced as a measure of controlling how the motor is started. The paper surveys the different types of starters used in industry and elaborates on why motors burn out when starters are not used. The armature of a motor generates no back emf when the motor is not in motion, so a large amount of current would be drawn, owing to its relatively small resistance, when the stationary armature has full voltage applied across it [10]. This large current is capable of causing damage to the windings, brushes and commutators; to mitigate this, a resistance is connected in series with the armature winding to alleviate damage to the armature during the

starting period only [11]. When the motor reaches an appreciable speed and generates an emf that can regulate the motor speed, the resistance is slowly cut out. In conclusion, DC motors need an external starter for starting; after the motor has gained a good speed, the starter can be cut out. A starter basically ramps the supply from zero voltage to rated voltage, reducing the inrush current and controlling the starting current to a safer amount until the rated speed and torque of the motor are achieved.
Two methods of starting a PV fed DC motor, namely resistor start and a hysteresis
control of armature current are presented in [2]. The hysteresis control of armature
current method aims at including a chopper circuit with a hysteresis controller to the
armature circuit to restrict the high armature current. The armature current is monitored and kept between two pre-set threshold values by switching a MOSFET whenever either limit is reached [2]. A comparative study between the resistor starter
and the chopper circuit with hysteresis controller starter was conducted. According to
this paper's simulation results, the latter method of a chopper circuit with a hysteresis controller appears to reduce the initial armature current and avoid the energy wastage of the conventional resistor start method, while the resistor starting method presents a
shorter settling time.
The starting of DC motors using SCR module is presented in [12]. This SCR
module consists of a bridge rectifier which is designed with two thyristors [12], two
diodes and a firing circuit [13] that takes a single-phase supply. The rectifier gives a
variable DC voltage output that is fed to the terminal of the armature. Laboratory test
result of a 220 V, 5 hp DC shunt motor indicates a starting current of approximately
12 A at no load condition. By using the SCR module, the starting current is reduced to
2 A.

1.2 Study Aims


Designing a soft starting circuit to limit the high amount of current which enters the
armature circuit of DC motor during the starting of the motor in order to protect the
machine from any damages is the purpose of this project. The objectives of this project
are to:
(1) Review existing literature on starting methods of DC motors.
(2) Model the DC motor soft starter for both DC and AC sources in Proteus ISIS
software.
(3) Study the performance of the dc motor soft starter and give recommendations.
(4) Compare the direct-online starting method and the soft start.

2 System Design and Production

A microcontroller, the driver circuit, the relay and a rectifier form the soft starter circuit.
AC and DC power sources are the two power sources used; the AC power is converted to DC power by the rectifier circuit. The AC power source is rectified
and is prioritized as the main source of supply to the DC motor. The DC power source
serves as a back-up to the AC power source. The relay in the circuit switches between

the rectified AC source and the DC source. The Arduino Atmega328p-PU Micro-
controller is the controller used in this proposed system to generate and send pulses that control the switching action of the MOSFETs in the driver circuit. The pulses generated by the microcontroller determine the duty cycle of the chopper circuit in the
driver circuit which determines the output of the chopper circuit. The driver circuit
receives signals from the microcontroller and feeds DC power into the DC motor in
varying amounts thereby indirectly controlling the motor speed and torque. A real time
clock (RTC) records the date and starting time of the motor while the display shows
the recorded time. The soft starter employed in the system helps to control the supply
voltage to the DC motor to protect the motor windings from burning and damaging.
The system will be executed through modeling and simulation using Proteus ISIS
software. The block diagram of the soft starter system is shown in Fig. 1 [14–18].

Fig. 1. Block diagram of the soft starter system.

2.1 AC Power Supply


The circuit of the AC power supply is represented in Fig. 2. The AC power supply consists of a transformer, an uncontrolled single-phase bridge rectifier, a voltage regulator and filtering capacitors. The transformer steps the supply AC voltage down from 240 V to 12 V RMS. An AC-DC single-phase full-wave uncontrolled bridge rectifier converts the alternating voltage to a direct voltage to be applied to the DC motor. The rectifier consists of four diodes in a bridge arrangement, passing the positive half of the wave and inverting the negative half of the sine wave to create a pulsating DC output. Two filtering capacitors C1 and C2, of 100 µF and 470 µF respectively, smooth the pulsating DC output. The voltage regulator provides a steady 12 V supply to the motor driver circuit.

Fig. 2. Schematic diagram of the AC power supply circuit.

2.2 DC Power Circuit


Figure 3 shows the DC power source circuit which mainly consists of DC power
source, switch and a diode. The DC power source provides a steady 12 V power to the
motor. The diode ensures unidirectional current flow and prevents reverse flow of power to the source.

Fig. 3. Direct current power source.

2.3 Liquid Crystal Display (LCD)


The liquid crystal display shows the real time at which the motor starts, recorded by the
real time clock (RTC), the PWM level of the controller and the PWM level expressed
in percentage. A 16 × 2 LCD displays 16 characters per line. A variable resistor is
connected to the LCD to adjust the contrast of the screen. The diagram of the liquid
crystal display is shown in Fig. 4.

Fig. 4. Liquid Crystal Display (LCD).

2.4 Microcontroller
Figure 5 shows the Arduino Atmega328p-PU Microcontroller unit in the circuit which
is a self-contained system with a processor and other peripherals. It has an 8-bit processor core that executes the code written to it and operates between 1.8 and 5.5 V. It is the intelligent unit of the system: it is programmed to take inputs from the device it is controlling and to retain control by sending signals to different parts of that device. The controller has six PWM output channels which drive the power electronic switches of the starter; a conceptual sketch of the soft-start ramp it implements is given after Fig. 5.

Fig. 5. Arduino Atmega328p-PU Microcontroller.
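The soft-start behaviour implemented by the controller can be sketched conceptually as follows; this is not the actual Arduino firmware, and set_duty is a hypothetical callback standing in for writing the PWM output.

```python
import time

def soft_start_ramp(set_duty, ramp_time_s=5.0, steps=50):
    """Raise the PWM duty cycle gradually from 0 to 100% instead of applying
    full voltage at once, so the armature current builds up slowly."""
    for i in range(steps + 1):
        duty = i / steps                  # 0.0 ... 1.0
        set_duty(duty)                    # hypothetical: write the PWM output
        time.sleep(ramp_time_s / steps)

# Example: print the duty levels that would be applied
soft_start_ramp(lambda d: print(f"duty = {d:.0%}"), ramp_time_s=0.0)
```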

2.5 Real Time Clock (RTC)


The DS1302 real time clock is used in the system; it communicates with the microcontroller through I2C communication and its operating voltage is between 3.3 and 5 V. This means that it communicates with the microcontroller using just two pins. It keeps track of the current time, so the date and time do not have to be set every time the motor is started. The RTC has a crystal oscillator, uses little power and is very cheap. Figure 6 shows a diagram of the real time clock.

Fig. 6. Real Time Clock (RTC).

2.6 Motor Driver


The circuit in the motor driver links the microcontroller to the DC series motor. The
motor driver circuit in this project has two main functions. It drives the motor in either
forward or backwards direction and the voltage given to the DC motor is controlled by
a converter. The circuit is made of an H-bridge converter with four MOSFETs whose gates are triggered by signals from the microcontroller. The motor driver receives signals from the microcontroller and applies the corresponding voltage to the motor to control its working speed and direction [9]. The relationship between the input and
output of the rectifier is given by Eq. (1) and (2):

VDC = 0.9 × VRMS    (1)

or

VDC = 2Vmax / π    (2)

where VDC is the output voltage, VRMS is the RMS input voltage and Vmax is the peak value of a half cycle; a short worked example is given after Fig. 7.
The switching of the MOSFET gives a variable output voltage of the converter to
the DC series motor. Figure 7 shows the motor driver used in this system.

Fig. 7. Motor driver.
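As a hedged worked example of Eqs. (1) and (2), assuming the 12 V RMS transformer secondary of Sect. 2.1 (the values are illustrative only), both formulas give roughly the same average rectified voltage, which the chopper then scales by its duty cycle.

```python
import math

V_RMS = 12.0                          # transformer secondary voltage (RMS), from Sect. 2.1
V_max = math.sqrt(2) * V_RMS          # peak value of a half cycle
v_dc_eq1 = 0.9 * V_RMS                # Eq. (1): about 10.8 V
v_dc_eq2 = 2.0 * V_max / math.pi      # Eq. (2): about 10.8 V

def chopper_output(v_dc, duty):
    """Average voltage applied to the motor for a given PWM duty cycle (0..1)."""
    return v_dc * duty

print(v_dc_eq1, v_dc_eq2, chopper_output(v_dc_eq1, 0.25))
```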

2.7 Relay
A relay is an electromechanical switching device that allows or interrupts current flow in a circuit by opening and closing its contacts. NO stands for normally open, meaning that when the relay is not energized the contacts are open and no current flows; NC stands for normally closed and behaves in the opposite way. The contact states change when current is applied to the relay coil. Therefore, in this project a relay is used to select between the
types of power supply source to the motor. The AC source is set as primary hence it is
connected to the normally open (NO). Thus, when the rectified AC current passes
through the coil of the relay, it becomes energized to change the state of the contactor.
Figure 8 shows a diagram of the relay.

Fig. 8. Relay.

3 Experimental Results and Discussion

In this section, the starting and running behavior of the soft starter system for the DC series motor is discussed and compared with the direct online system by modeling the system in Proteus ISIS Software. Figure 9 shows the wiring diagram of
the whole soft starter operation in this study.

Fig. 9. Wiring diagram of the soft starter system.

Considering direct online starting in the simulation, the startup current rises sharply to about 0.996 A at a maximum speed of 498 rpm with full voltage applied; after the motor reaches constant speed, a steady-state current of 0.4978 A is recorded. This shows how high an inrush current the motor receives before the current falls with time to its steady-state value. With the soft starter, in contrast, the startup current the DC motor receives is smaller and grows gradually with time: the initial current was 0.78 A, starting from zero voltage and increasing gradually with the duty cycle of the converter, which represents about a 21.3% decrease in starting current in the simulation.
Also, in the Proteus ISIS simulation of the direct online starting system, the DC motor starts with a very high inrush current that later decreases as the motor speed rises to its rated value with time. The soft starter system, on the other hand, starts the DC motor at a lower current that increases gradually as the motor speed rises to its rated value. The current-speed characteristics of the direct-online starter and the soft starter systems are shown in Fig. 10; a rough illustrative model of the effect follows the figure.

Fig. 10. Current-speed graph of direct-online and soft starter.
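To illustrate why ramping the voltage lowers the current peak, the following is a rough first-order series-motor approximation (not the Proteus model); all parameter values are assumptions chosen only to show the trend.

```python
import numpy as np

R, L, K = 12.0, 0.05, 0.05    # assumed armature resistance [ohm], inductance [H], motor constant
J, B = 1e-4, 1e-5             # assumed rotor inertia and viscous friction
dt, T = 1e-4, 2.0             # integration step and simulated time [s]

def peak_current(v_of_t):
    """Forward-Euler simulation of a DC series motor; returns the peak current."""
    i = w = i_peak = 0.0
    for k in range(int(T / dt)):
        v = v_of_t(k * dt)
        di = (v - R * i - K * i * w) / L      # series motor: back emf proportional to i*w
        dw = (K * i * i - B * w) / J          # electromagnetic torque proportional to i^2
        i += di * dt
        w += dw * dt
        i_peak = max(i_peak, i)
    return i_peak

print("direct-online peak:", peak_current(lambda t: 12.0))
print("soft-start peak   :", peak_current(lambda t: 12.0 * min(t / 1.0, 1.0)))
```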

4 Conclusion and Recommendation

The soft starter system was designed to reduce the starting voltage of the DC motor, which in turn also limits the current at the start of the motor. The system comprises an Arduino Atmega328p-PU Microcontroller which sends PWM signals to the driver circuit, and the driver circuit drives the motor according to the signals received from the microcontroller. The soft starter circuit was designed and simulated using Proteus ISIS Software, and the results obtained were compared with those of the direct-online starter system. The direct-online starter system recorded a starting current of 0.942 A against 0.716 A for the soft starter system, and a current of 0.449 A was recorded at maximum speed. Based on these results, it can be concluded that the soft starter system ramps the supply from zero voltage to rated voltage, reduces the inrush current and keeps the starting current at a safe level until the rated speed and torque of the motor are achieved, in contrast to the direct-online starting method. Smooth stopping of the motor was a challenge encountered in this work; an improved method of stopping the motor can be employed in the future to make the project more efficient.

References
1. Shrivastava, K., et al.: A review on types of DC motors and the necessity of starter for its
speed regulation. Int. J. Adv. Res. (2016)
2. Sudhakar, T., et al.: Soft start methods of DC motor fed by a PV source. Int. J. Appl. Eng.
Res. 10(66) (2015). ISSN 0973-4562
3. Taleb, M.: A paper on Matlab/Simulink models for typical soft starting means for a DC
motor. Int. J. Electr. Comput. Sci. IJECS-IJENS 11(2) (2011)
4. Koya, S.A.: DC Motor Control: Starting of a DC Motor
5. Theraja, B., Theraja, A.: A Textbook of Electrical Technology Volume I Basic Electrical
Engineering in S.I. System of Units. S. Chand & Company, New Delhi (2005)
6. Rashid, M.H.: Power Electronics Handbook. Academic Press, San Diego (2001)
7. Sen, P.C.: Principles of Electric Machines and Power Electronics. Wiley, New York (1989)
8. Matlab Software Version 6.5, The Math Works Inc. (2002)
9. Chen, H.: Value Modeling of Hysteresis Current Control in Power Electronics (2015)
10. Kumar, R.: 3- Coil starter use for starting D.C. motor. Int. J. Sci. Res. Eng. Technol.
(IJSRET), March (2015)
11. Theraja, B., Theraja, A.: Electrical Technology, AC & DC Machines Volume II. S. Chand &
Company, New Delhi (2005)
12. Holonyak, N.: The silicon P-N-P-N switch and controlled rectifier (thyristor). IEEE Trans.
Power Electron. 16(1), 8–16 (2001)

13. In: IEEE 7th Power Engineering and Optimization Conference (PEOCO), 22 July (2013)
14. Atmel 8-bit AVR Microcontrollers ATmega 328IP datasheet summary
15. USB Radio Clock Meinburg. Accessed 20 Oct 2017
16. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Agent-based middleware
framework using distributed CPS for improving resource utilization in smart city. Future
Gener. Comput. Syst. 108, 445–453 (2020). https://doi.org/10.1016/j.future.2020.03.006
17. Chang, K.C., Chu, K.C., Wang, H.C., Lin, Y.C., Pan, J.S.: Energy saving technology of 5G
base station based on internet of things collaborative control. IEEE Access 8, 32935–32946
(2020)
18. Chu, K.C., Horng, D.J., Chang, K.C.: Numerical optimization of the energy consumption for
wireless sensor networks based on an improved ant colony algorithm. J. IEEE Access 7,
105562–105571 (2019)
Co-design in Bird Scaring Drone Systems:
Potentials and Challenges
in Agriculture

Moammar Dayoub1(B), Rhoda J. Birech2, Mohammad-Hashem Haghbayan1, Simon Angombe2, and Erkki Sutinen1

1 University of Turku (UTU), 20500 Turku, Finland
modayo@utu.fi
2 University of Namibia, Windhoek, Namibia

Abstract. In Namibia, agricultural production depends on small-scale


farms. One of the main threats to these farms is the birds’ attack. There
are various traditional methods that have been used to control these
pest birds such as making use of chemicals, fires, traps, hiring people to
scare the birds, as well as using different aspects of agricultural mod-
ifications. The main problem with such methods is that they are expensive, and many are harmful to the environment or demand extra human resources. In this paper, we investigate the potential and challenges of
using a swarm of drones as an Intelligent surveillance and reconnaissance
(ISR) system in a bird scaring system targeting a specific type of bird
called Quelea quelea, i.e., weaver bird, and on Pearl millet crop. The idea
is to have a co-design methodology of the swarm control system, involv-
ing technology developers and end-users. To scare away the birds from
the field, a disruption signal predator-like sound will be produced by the
drone. This sound is extremely threatening and terrifying for most bird
species. The empirical results show that the aforementioned technology has great potential to increase food security and sustainability in Africa.

Keywords: Swarm of drones · Bird control · Precision agriculture ·


Quelea quelea · Pearl millet

1 Introduction

The Red-billed Quelea Quelea bird is the most important pest in agriculture
affecting cereal crops in Africa. The damage caused to small grain crops in the
semi-arid zones of Africa was estimated at 79.4 million US$ per annum in the year
2011 [1]. At the same time, birds are an important component of agroecosystems.
The birds feed on rodents, insects, and other arthropods, hence balancing the
ecosystem [2]. They depend on agriculture for food in the form of grains, seeds,
and fruits.

Various techniques have been used by people to control the Quelea, which
consist of scaring away, trapping and catching birds for food, harvesting eggs,
disturbing or destroying the nests, burning the birds while roosting and poisoning
with organophosphate avicide. All these methods create disruptions to the birds,
which finally die or vacate the area and migrate to another place, resulting in
an imbalanced ecosystem and a biodiversity threat [3]. Killing the birds is not a
proper solution to the bird problem as attempted mass killing of Sturnus vulgaris
in Europe and Quelea birds in Africa have been disapproved by the international
conventions [2].
Most small-holder farmers in Africa have no access to costly and sophisti-
cated equipment and materials to control birds such as aircraft, boom-sprayers,
chemical avicides, firebombs, and dynamite, and have instead relied on tradi-
tional methods, which are largely effective and environmentally friendly, but
time-consuming, tedious and limited in the scale of application [4,5]. Unmanned
aerial system (UAS) is a technological innovation with great potential to improve
the management efficiency of agriculture and natural resource systems such as
crops, livestock, fisheries, and forests [6].
The most important factors limiting pearl millet production in Namibia are
unfavorable climate and erratic weather patterns, the widespread use of tradi-
tional farming practices, limited farm sizes, and Red-billed quelea birds (Quelea
quelea lathamii) [7]. The quelea is a small weaver bird native and endemic to
sub-Saharan Africa whose main food is the seed of annual grasses. It attacks
small-grained cereal food crops in the absence of such grasses [8].
The cost of protecting crops from the Quelea is very high for a country like
Namibia. We are looking for safe, cheap, and effective ways to protect crop yield
for farmers, ensure food security, and increase the level of livelihood in those
areas.

2 The Main Challenges of Agriculture in Namibia

Namibia is the driest country in Sub-Saharan Africa. It is classified to be an


arid and semi-arid country, with 55% of the land being arid, receiving less than 300 mm of rain per year, and 40% being semi-arid, receiving between 300 and 500 mm
per year. The main crops grown in Namibia include pearl millet, maize, sorghum,
wheat, grapes, and date palm. Pearl millet, locally known as Mahangu, is the
most widely grown crop (by land area) by smallholder communal farmers under rain-fed conditions [9]. In communal farming areas, in particular the drier
North Central Namibia, pearl millet is grown almost exclusively (Ohangwena,
Oshikoto, Oshana, Omusati) (300 mm–450 mm of rainfall) [10].
Pearl millet is the most important food security cereal crop in Northern
Namibia as it occupies 80% of cropped land; followed by maize, then sorghum
takes the third position. Sorghum occupies almost the same land area as maize
and is planted almost exclusively by communal farmers. Pearl millet accounts
for approximately 40% of cereal grain intake and 24% of total calorie intake by
Namibian consumers [11]. It is therefore the principal source of food security to

the majority of the country’s rural population and forms a crucial part of the
national diet [12]. The local cultivars of pearl millet are predominantly planted
due to their hard kernel which stores well, strong stalks that do not lodge easily
and have good food value. Pearl millet matures in 120–150 days, giving yields
that are the lowest among the cultivated cereals (250–300 Kg/ha) [13]. Pearl
millet grain is nutritionally better than other cereals such as maize, rice, and
sorghum due to its high protein content [14]. They are also a valuable animal
feed, with higher protein content and a better-balanced amino acid profile than
most cereals, such that less protein concentrate is required in a pearl millet-based
animal feed ration [15].

3 Economic Losses Caused by Queleas


The Quelea is considered the greatest biological limitation of cereal production
in Africa. It is mentioned as the most numerous and destructive bird on earth
[16]. The estimated adult population of Quelea in Africa is at least 1.5 billion,
causing agricultural losses in excess of 50 million US$ annually according to FAO.
A colony of Quelea in Namibia was estimated to number 4.8 million adults and
4.8 million fledglings, and to consume approximately 13 tonnes of insects and
800–1200 tonnes of grass and cereal seeds during its breeding cycle [17,18]. One
Quelea bird consumes grain that is half of its size (10 g) per day, meaning that 10
tonnes of crops can be consumed daily by a swarm of one million birds [18,19].
Crop losses can range from 50 to 100% depending on the extent and duration of the invasion; reference [20] estimates that 2 million Quelea birds can destroy 50 tonnes of rice crop in a day, valued at 600,000 USD. Despite these losses of phenomenal scale,
little research is currently being conducted on the Queleas. The most damage
is inflicted on the crops if the birds attack a crop when their seeds are at milk
stage. In cereals, the milky stage is the period when a milky white substance
begins to accumulate in the developing grains. The bird is also destructive at
the dough stage and less destructive at the grain maturity and harvesting stages.
In Northern Namibia, the period from the milk stage to the harvesting stage lasts
for 2–3 months and this long duration of protracted control is a huge cost to the
farmer.

4 Control of Queleas
4.1 Modelling and Early Warning
The dependence of Quelea breeding on rainfall allows for the prediction of inva-
sion based on rainfall occurrence. Models for the occurrence of an invasion in a location have been developed based on these rainfall patterns [21].

4.2 Application of Chemical Avicide


Aerial application of Fenthion, an organophosphate pesticide has been used
extensively against birds in crop farms. It is highly poisonous to birds and slightly
poisonous to mammals. The chemical is applied at night in the nesting area [22].

4.3 Fire-Bombs
Another way of controlling Quelea birds is by blowing them up using firebombs
or dynamite as they concentrate to roost. Explosives made of diesel and petrol
mixtures are set and detonated to create fires that kill birds and their breeding
colonies [23].

4.4 Traditional Methods


Bird scaring - Bird scaring is done by using scarecrows, chasing, and shouting at
the birds to scare them away. This practice is mostly done by children throwing
stones, clapping hands, and beating metals [24]. However, the birds soon become
habituated to the scarecrows and noise and they ignore them. The birds get into
the fields as early as possible and by the time farmers wake up, the birds are
already in the fields feeding. Sorghum farmers in Namibia and Botswana guard
sorghum and millet fields daily from early morning until evening from the months
of March to May. They ignite leaves and throw stones to drive them away. All
other household duties are temporarily put aside and delayed during that period.
Farmers need to guard the fields otherwise the yield will be completely lost [25].

4.5 Killing the Birds and Use as Food


The Quelea bird is eaten by people in many parts of Africa [1,26]. Both the
adult birds and the chicks can be harvested and used for human consumption
at subsistence level or sold in the market. The harvesting of birds and chicks is
done communally every night until the surviving birds flee from the area [25].
Various means of trapping Quelea birds have been used in many parts of Africa
[22]. One example is spreading a fishing net near the roost to trap the swarming
birds returning to roost. Trapped birds are collected and used as food.

4.6 Changing Agricultural Practices


Planting as early as possible enables the crop to pass through its most vulnerable
stage, the grain-milk stage, before the birds arrive. Other cultural practices include
disturbing the nests and the eggs, which causes the colony to vacate the nests and
move to another place [1]. In Zimbabwe, the threat of Quelea birds to small grain
crops over the years has led farmers to change from small cereals to maize. The
maize may be attacked by wild pigs, monkeys, and baboons, but control of those
animals by scaring is easier than control of Quelea birds [27]. Unfortunately, in
Namibia the crop alternatives are limited and farmers are left with the option of
pearl millet, which is adapted to the harsh climate.

5 Using the Drones for Pest Bird Control


Drones, as an ISR system, serve farmers to protect crops from pests as well
as support the timely, efficient, and optimized use of inputs such as soil amendments,
fertilizers, seeds, and water. This process leads to an increase in yields

and reduces the overall cost of farm operations. Nowadays, the Unmanned Aerial
Vehicle (UAV) is a low-cost option in sensing technology and data analysis systems
[28]. The farmer can use a UAV to scare birds away from orchards or crops
to avoid yield losses. The UAV carries a loudspeaker broadcasting distress signals,
and the drone can be designed to imitate a huge predator bird. Research has shown
that a UAV can deter pest birds within a 50 m radius centered on the UAV,
confirming that one UAV is capable of protecting a farm smaller than 25 ha. This
implies that a swarm of UAVs can be used to protect larger farms [18]. The nuisance
and trouble caused by the Quelea bird are best understood by the farmer, so it is
important to apply the co-design approach in the development of a robust UAV
solution. Co-design is a technology production method that involves the end-users
in the design of the technology, in this case the drones. This approach engages
farmers and other stakeholders in the technology development process. Stakeholder
contributions and insights are valuable in guiding the design process and in bringing
about successful outcomes and sustainability [29, 30]; see Fig. 1.

Fig. 1. System diagram of the proposed swarm drone bird scaring system (drones with camera and sound, ground camera, software, computer, and a ground-based bird detection system)

5.1 Methodology: Design of Swarm Drones

The drone system is composed of a sensor on the ground, which surveys the space
and determines when to fly the drones; it thus protects the crop from incoming birds.
A drone has a multi-dimensional appearance with visual movement; it is equipped
with a convincing sound device and imitates the flight and sound of a natural
predator, which scares the pest birds. It is an effective way to protect a large area
from Quelea birds without inflicting any harm on them. We use alarm sounds
released by the drone to keep birds from approaching it. In this study, we use
different alarms to evaluate the effectiveness of distress calls on birds and to
measure the effectiveness of control. The drone will be deployed to protect
agricultural fields from birds and will be used during some agricultural practices.
The main benefit of drones is their ability to reduce the damage to the farm as well
as to the birds. The study will also evaluate the response of Quelea birds to drones
flying at different altitudes. It will assess the effectiveness of alerts when drones are
flying within 30 m above ground level (AGL) and at lower altitudes.

5.2 Technical Setup and Analysis Results

In this section we present the technical setup used to control and steer the swarm
and some results on the amount of energy consumed by each drone. The weight of
each drone is around 8 kg including the sound producer and other peripherals. To
design this system, two parts must be covered: 1) the control part and 2) the mission
part. The control part comprises all the processes that control the drones and the
collision avoidance that is directly related to the swarm, while the mission part is
focused on performing the mission, that is, scaring the birds. For the control system
we use a non-linear technique from [31] for take-off and from [32] for landing the
drones. The main strategy is to run near-optimal algorithms on each drone to save as
much battery energy as possible and also to suppress drone vibration so that object
(i.e., bird) detection and tracking is done appropriately. To guarantee suitable
collision avoidance we used the technique explained in [33]. To cover the whole
area of a field, a simultaneous localization and mapping (SLAM) technique is
applied [34]. GPS is used to perform SLAM since the drone operates in open areas.
A GPS module is mounted on each drone and the location of each drone is reported
via GPS. However, SLAM can also be done via techniques other than GPS, such as
visual odometry [34], for semi-closed areas, e.g., under trees and in special weather
conditions; that is future work for this paper. For the mission part, object detection
and tracking are done via a normal convolutional neural network (CNN); the main
technique is explained in [35].

The land area is taken to be 4 hectares in a square shape, that is, 4 × 10^4 m².
Three drones operate as a swarm to cover the area. The swarm operates so that each
drone keeps a 10 m distance from the other drones in a line formation, and the
central drone is the leader [31, 36]. If a drone weighs 8 kg, the energy for each drone
to climb to the 30 m working altitude, without considering air friction, is
8 × 10 × 30 = 2400 J. If the friction of the air is negligible, the elapsed time for
each total sweep of the land is 4 × 10^4/(40 × v), where v is the speed of the drone
and 40 m is the sweep-line spacing. In our experiment the speed of each drone is
50 km/h, i.e., 13.8 m/s. The overall energy consumption of the swarm is
3 × 4 × 10^4/40 × 13.8 × 2400 J ≈ 1 MJ. This shows that our drones, each with two
normal 12 V, 2200 mAh batteries, can sweep the whole land with one charging
break. Please note that we consider the air friction negligible and leave out the
energy of drone acceleration/deceleration.
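To make the arithmetic above easy to reproduce, the short Python sketch below recomputes the per-drone lift energy and the sweep time from the figures stated in the text (8 kg mass, 30 m altitude, 40 m sweep-line spacing, 4 ha area, 13.8 m/s speed). It is only an illustration of those numbers under the same no-friction assumption, not part of the control software.

```python
# Recompute the sweep figures quoted above (no air friction, no
# acceleration/deceleration energy); all values come from the text.

MASS_KG = 8.0           # drone mass including sound producer and peripherals
G = 10.0                # gravitational acceleration rounded as in the text (m/s^2)
ALTITUDE_M = 30.0       # working altitude above ground level
AREA_M2 = 4e4           # 4 ha field
SWEEP_SPACING_M = 40.0  # distance between sweep lines
SPEED_MS = 13.8         # 50 km/h

lift_energy_j = MASS_KG * G * ALTITUDE_M      # 8 * 10 * 30 = 2400 J per drone
sweep_path_m = AREA_M2 / SWEEP_SPACING_M      # 1000 m of flight per sweep
sweep_time_s = sweep_path_m / SPEED_MS        # roughly 72 s per drone per sweep

print(f"lift energy per drone: {lift_energy_j:.0f} J")
print(f"sweep path: {sweep_path_m:.0f} m, sweep time: {sweep_time_s:.1f} s")
```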

5.3 Co-designing the Drones

In order to ensure the functional design of the drone-based solution to the bird
problem, we will follow the co-design approach. Co-design in technology
development involves engaging stakeholders, particularly during the design
development process. Co-design enables early-stage alterations and guarantees
delivery of a usable product while ensuring that the technology matches the needs
of the end-users. By using co-design, we can link the design with the needs of the
community in Namibia, and the end-user is credited with creating a home-grown
solution for crop protection against a notorious bird pest [37]. Co-design involves a
set of users representing diverse stakeholders throughout the design process, from
ideation to the final product. The co-design team will consist of the following
stakeholders: farmers, ICT specialists, agronomists, extension staff, ecologists, local
authorities, etc. In the technical design of the drones, we will make use of a team of
MSc students in software engineering and ICT. The team will be supervised by our
support team at the University of Turku in Finland. The aspects of sustainability will
be catered for from the onset of the project: during the stakeholder interface,
pertinent questions on economics, environment, energy, culture, and ethics are
addressed and factored into the drone design.
• Economy and business: How will the budget for the design and the running costs be covered? How can the technology attain low-cost status and remain profitable to users?
• Environment: How best can environmentally friendly materials be used, and how may they be recycled?
• Energy: Can the gadget be charged by solar power?
• Culture: What cultural considerations should be integrated, from the perspective of subsistence farmers, women, and youth?
• Ethics: What are the general principles of ownership, use, and Intellectual Property Rights (IPRs)?

5.4 Comparison of the Cost of Different Quelea Bird Control Methods

The cost of chemical control: the application rate of fenthion on millet, in the form
of Queletox®, is on average 2 kg/ha (1.5–2.4 kg/ha) [6]. At a price of 105 USD for
1 kg of fenthion, it costs 210 USD to purchase the 2 kg of chemical needed to
control Quelea on one hectare. The cost of bird scaring: the other option is to hire
people to scare the birds (3 persons × 60 days × 5 USD per day), which costs
900 USD per hectare. Our drone costs 400 USD per piece and can cover at least
2 ha. If swarm drones are used (3 drones, for instance), they can cover a larger area
and serve the farm for several years. The cost of buying the drone is recovered in
the first year and the technology could work for at least 5 years. Besides that, the
environment is preserved when the birds are scared away instead of being killed,
and no risk is posed to the food chain. The technology uses less energy compared
to other traditional methods.
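As a rough illustration of the figures quoted above (a sketch using only the numbers stated in the text; the 2 ha protected area and 5-year lifetime are the text's own assumptions), the per-option totals can be compared as follows.

```python
# Back-of-the-envelope comparison of the three control options, using the
# per-hectare figures quoted in the text for a 2 ha field over 5 seasons.

HECTARES = 2
SEASONS = 5

chemical_per_ha = 2 * 105     # 2 kg/ha of fenthion at 105 USD/kg -> 210 USD/ha
scaring_per_ha = 3 * 60 * 5   # 3 persons x 60 days x 5 USD/day -> 900 USD/ha
drone_one_off = 400           # one drone, covers at least 2 ha, lasts ~5 years

print("chemical control:", chemical_per_ha * HECTARES * SEASONS, "USD")
print("bird scaring:    ", scaring_per_ha * HECTARES * SEASONS, "USD")
print("drone purchase:  ", drone_one_off, "USD")
```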

6 Conclusion, Regulations, and Future-Works


To reduce the damage to crops by pests, appropriate management practices are
required and smart solutions should be deployed on the farm. The amount of
damage will vary depending on the extent to which smart solutions are applied in
the field. A drone has obvious design and sound advantages, combined with its
ability to fly quickly and randomly across an area, which means it is quickly
recognized by birds as a threat. We will exploit sound in this smart bird control
technology so that when the birds hear the various distress signals, they feel
threatened and escape from the area. The digital audio circuit will be designed with
specific bandwidth and sufficient memory to generate alarms repeatedly. The
frequency of broadcasts will vary depending on altitude and time of day [38].

There are various challenges affecting the implementation of UAVs in agricultural
systems. Regarding regulatory requirements, all drones flying in Namibia require
certification and permissions. The UAV must be kept below a height of 120 m from
the ground and below the top of a man-made object, e.g., a building or tower. The
drone will not fly at night because the birds are inactive, whereas during the day the
drone can recognize the birds clearly. A drone is not allowed to fly near airports or
other restricted airspaces. Using drones requires the permission of landowners. The
drone should be kept within the line of sight of the operator. This means it cannot
be flown behind obstacles or through cloud or fog [39].

Quality software plays a vital role in the applicability of drone technology [6]. In
the future, the map of the area can be loaded into the drone controller. This is then
set up with boundary areas and preset points to create flight paths for the drone. The
best flight paths to use and the height to fly the drone depend on the bird quantity,
where the birds are living, and the time of day the drone is operated. Various flight
paths can also be set up and used at different times. The drone has long flight times,
allowing it to stay airborne for half an hour to one hour (depending on the model).
When the drone returns from its flight, all that is required is to replace the battery
before sending the drone to fly again. In an agricultural rural setting, the stakeholder
knowledge fed into the technology during the co-design process produces an
ultimate ICT-based ‘African drone’ embedded with intelligent systems, to become a
bird control best practice and a means of increasing production, preserving
resources, improving income, and achieving food security via the digitization of
African agriculture.

References
1. Jaeger, M., Elliott, C.C.: Quelea as a Resource, pp. 327–338. Oxford University
Press, Oxford (1989)
2. Dhindsa, M.S., Saini, H.K.: Agricultural ornithology: an Indian perspective. J.
Biosci. 19, 391–402 (1994)
3. Kale, M., Balfors, B., Mörtberg, U., Bhattacharya, P., Chakane, S.: Damage to agri-
cultural yield due to farmland birds, present repelling techniques and its impacts:
an insight from the Indian perspective. J. Agric. Technol. 1, 49–62 (2012)

4. Elliot, C.: The New Humanitarian-Quelea-Africa’s most hated bird (2009). https://
www.thenewhumanitarian.org/news/2009/08/19
5. Anderson, A., Lindell, C.A., Moxcey, K.M., Siemer, W.F., Linz, G.M., Curtis, P.D.,
Carroll, J.E., Burrows, C.L., Boulanger, J.R., Steensma, K.M., Shwiff, S.A.: Bird
damage to select fruit crops: the cost of damage and the benefits of control in five
states. Crop Prot. 52, 103–109 (2013)
6. Sylvester, G.: E-agriculture in Action: Drones for Agriculture, p. 112, FAO (2018)
7. FAO and WFP: Crop, Livestock and Food Security Assessment Mission to Namibia (2009). http://www.fao.org/3/ak334e/ak334e00.htm
8. Craig, A.F.J.K.: Quelea quelea, Birds of Africa, vol. 7 (2004)
9. Matanyairel, C.M.: Pearl Millet Production System(s) in the Communal Areas
of Northern Namibia: Priority Research Foci Arising from a Diagnostic Study.
Technical report (1996)
10. NAMIBIA, “Country Report on the State of Plant Genetic Resources for Food
and Agriculture Namibia,” Technical report (2008)
11. Rohrbach, E., Lechner, D.D., Ipinge, W.R., Monyo, S.A.: Impact from Investments
in Crop Breeding: the case of Okashana 1 in Namibia, International Crops Research
for the Semi Arid tropics, Technical report (1999)
12. Shifiona, T.K., Dongyang, W., Zhiguan, H.: Analysis of Namibian main grain crops
annual production, consumption and trade-maize and pearl millet. J. Agric. Sci.
8(3), 70–77 (2016)
13. Singh, G., Krishikosh Viewer Krishikosh (2003). https://krishikosh.egranth.ac.in/
displaybitstream?handle=1/5810014632
14. Roden, P., Abraha, N., Debessai, M., Ghebreselassie, M., Beraki, H., Kohler, T.:
Farmers’ appraisal of pearl millet varieties in Eritrea, SLM Eritrea Report 8, Tech-
nical report (2007)
15. Macauley, H.: Background paper Cereal Crops: Rice, Maize, Millet, Sorghum,
Wheat. ICRISAT, Technical report (2015)
16. Lagrange, M.: Innovative approaches in the control of quelea, Quelea quelea lathamii. In: Proceedings of the Thirteenth Vertebrate Pest Conference, p. 63, Zimbabwe (1988)
17. Pimentel, D.: Encyclopedia of Pest Management. CRC Press (2007) www.crcpress.
com
18. Wang, Z., Griffin, A.S., Lucas, A., Wong, K.C.: Psychological warfare in vineyard:
using drones and bird psychology to control bird damage to wine grapes. Crop
Prot. 120, 163–170 (2019)
19. Elliott, C.C.H.: The Pest Status of the Quelea. Africa’s Bird Pest. pp. 17–34.
Oxford University Press, Oxford (1989)
20. Oduntan, O.O., Shotuyo, A.L., Akinyemi, A.F., Soaga, J.A.: Human-wildlife con-
flict: a view on red-billed Quelea quelea. Int. J. Mol. Evol. Biodivers. V5, 1–4
(2015)
21. Cheke, R.A., Veen, J.F., Jones, P.J.: Forecasting suitable breeding conditions for
the red-billed quelea Quelea quelea in Southern Africa. J. Appl. Ecol. 44(3), 523–
533 (2007)
22. Cheke, R.A., Sidatt, M.E.H.: A review of alternatives to fenthion for quelea bird
control, pp. 15–23, February 2019
23. Cheke, R.A.: Crop Protection Programme Quelea birds in Southern Africa: pro-
tocols for environmental assessment of control and models for breeding forecasts
R8314. Natural Resources Institute, University of Greenwich at Medway, Technical
report (2003)

24. Lost Crops of Africa. National Academies Press, February 1996
25. CABI, Quelea quelea (weaver bird). https://www.cabi.org/isc/datasheet/66441
26. Mulliè, W.C.: Traditional capture of red-billed quelea Quelea quelea in the Lake
Chad Basin and its possible role in reducing damage levels in cereals. Ostrich
71(1–2), 15–20 (2000)
27. Mathew, A.: The feasibility of small grains as an adoptive strategy to climate
change. Russ. J. Agric. Socio-Econ. Sci. 41(5), 40–55 (2015)
28. Norasma, C.Y.N., Fadzilah, M.A., Roslin, N.A., Zanariah, Z.W.N., Tarmidi, Z.,
Candra, F.S.: Unmanned aerial vehicle applications in agriculture. In: IOP Con-
ference Series: Materials Science and Engineering, vol. 506, pp. 012063 (2019)
29. Wojciechowska, A., Hamidi, F., Lucero, A., Cauchard, J.R.: Chasing lions:
co-designing human-drone interaction in Sub-Saharan Africa, CoRR, vol.
abs/2005.02022 (2020). https://arxiv.org/abs/2005.02022
30. Myllynpää, V., Misaki, E., Apiola, M., Helminen, J., Dayoub, M., Westerlund,
T., Sutinen, E.: Towards holistic mobile climate services for farmers in Tambuu,
Tanzania. In: Nielsen, P., Kimaro, H.C. (eds.) Information and Communication
Technologies for Development. Strengthening Southern-Driven Cooperation as a
Catalyst for ICT4D, pp. 508–519. Springer International Publishing, Cham (2019)
31. Tahir, A., Böling, J.M., Haghbayan, M.H., Plosila, J.: Navigation system for land-
ing a swarm of autonomous drones on a movable surface. In: Proceedings of the
34th International ECMS Conference on Modelling and Simulation, ECMS 2020,
Wildau, Germany, 9–12 June 2020, pp. 168–174. European Council for Modeling
and Simulation (2020)
32. Tahir, A., Boling, J.M., Haghbayan, M.H., Plosila, J.: Comparison of linear and
nonlinear methods for distributed control of a hierarchical formation of UAVs.
IEEE Access 8, 95667–95680 (2020)
33. Yasin, J.N., Haghbayan, M.H., Heikkonen, J., Tenhunen, H., Plosila, J.: Formation
maintenance and collision avoidance in a swarm of drones. In: ISCSIC 2019: 3rd
International Symposium on Computer Science and Intelligent Control, Amster-
dam, The Netherlands, 25–27 September 2019, pp. 1:1–1:6. ACM (2019)
34. Mohamed, S.A.S., Haghbayan, M.H., Westerlund, T., Heikkonen, J., Tenhunen,
H., Plosila, J.: A survey on odometry for autonomous navigation systems. IEEE
Access 7, 97466–97486 (2019)
35. Rabah, M., Rohan, A., Haghbayan, M.H., Plosila, J., Kim, S.: Heterogeneous par-
allelization for object detection and tracking in UAVs. IEEE Access 8, 42784–42793
(2020)
36. Yasin, J.N., Haghbayan, M.H., Heikkonen, J., Tenhunen, H., Plosila, J.: Unmanned
aerial vehicles (UAVs): collision avoidance systems and approaches. IEEE Access
8, 105139–105155 (2020)
37. Dayoub, M., Helminen, J., Myllynpää, V., Pope, N., Apiola, M., Westerlund, T.,
Sutinen, E.: Prospects for climate services for sustainable agriculture in Tanzania.
In: Tsounis, N., Vlachvei, A. (eds.) Advances in Time Series Data Methods in
Applied Economic Research, pp. 523–532. Springer International Publishing, Cham
(2018)
38. Berge, A., Delwiche, M., Gorenzel, W.P., Salmon, T.: Bird control in vineyards
using alarm and distress calls. Am. J. Enol. Vitic. 58(1), 135 LP–143 LP (2007)
39. Maintrac Group: Drones for bird control - how it works - Main-
trac Group (2019). https://www.maintracgroup.com/blogs/news/drones-for-bird-
control-how-they-work
Proposed Localization Scenario
for Autonomous Vehicles in GPS Denied
Environment

Hanan H. Hussein¹, Mohamed Hanafy Radwan², and Sherine M. Abd El-Kader¹

¹ Computers and Systems Department, Electronics Research Institute, Giza, Egypt
{hananhussein,sherine}@eri.sci.eg
² Valeo Inter-Branch and Automotive Software Corporate, Cairo, Egypt
mohamed.radwan@valeo.com

Abstract. The improvement of Advanced Driver Assistance Systems (ADAS) has
increased significantly in recent decades. One of the important topics related to
ADAS is the autonomous parking system. This system faces many challenges in
obtaining an accurate estimate of the node (vehicle) position in indoor scenarios,
especially in the absence of GPS. Therefore, an alternative localization system with
precise positioning is mandatory. This paper addresses different indoor localization
techniques with their advantages and disadvantages. Furthermore, several
localization technologies have been studied, such as Ultra-WideBand (UWB),
WiFi, Bluetooth, and Radio Frequency Identification Device (RFID). The paper
compares these different technologies, highlighting their coverage range,
localization accuracy, applied techniques, advantages, and disadvantages. The key
contribution of this paper is proposing a scenario for an underground garage.
Finally, a localization system based on UWB Time Difference of Arrival (TDoA)
is suggested, deployed on an IEEE 802.15.4a UWB-PHY standard hardware
platform with several Anchor Nodes (ANs), enabling communication and
localization with significant accuracy.

Keywords: Indoor localization · Autonomous vehicle · UWB

1 Introduction

In the last couple of years, localization [1] has become one of the most interesting
topics in industry, especially in the vehicular technology era. Localization systems
have become the key factor for many deployed applications, like Advanced Driver
Assistance Systems (ADAS), Intelligent Transportation Systems (ITS), military
applications, automation in precision farming, the Internet of Things (IoT) [2], etc.
In all these applications, it is mandatory to have an accurate position.

In an autonomous parking system, vehicles drive autonomously to their parking
slots inside an underground garage [3]. Such a system needs a special wireless
network in order to connect the vehicles with sensors in the garage. In addition, the
localization system requires high positioning precision (i.e., a few centimeters).
Nevertheless, in GPS-denied environments such as underground parking garages or
indoor places, the Global Navigation Satellite System (GNSS) does not support
localization. Thus, different local positioning systems have been proposed recently
[4] to solve this problem with high accuracy. In the autonomous parking scenario,
the main problems are:
• Navigating a vehicle autonomously inside the underground parking garage.
• The navigation process should have localization accuracy in the range of centimeters, with high flexibility, robustness, and resolution.
• All localization estimation and management of the vehicle's movement should be done by the vehicle itself.
• The distribution model of the communication network inside the garage should be optimized in order to cover the whole garage area.

In this paper, we discuss different techniques and technologies that support indoor
localization systems. Besides, the main motivation and the proposed solution are
presented to support the autonomous parking scenario.

The remainder of the paper is organized as follows: Sect. 2 and Sect. 3 present
localization techniques and technologies, respectively. A network scenario for the
underground parking garage application is suggested in Sect. 4. Finally, conclusions
and future work are presented in Sect. 5.

2 Localization Techniques

In this section, several techniques which are widely applied for localization will be
discussed.

2.1 Received Signal Strength Indicator (RSSI)


The Received Signal Strength Indicator (RSSI) technique is considered the simplest
and most commonly applied technique for indoor localization [5]. The RSSI is the
measured received power level at the Rx in dBm. The actual distance d between a
pair (i.e., Tx and Rx) can be predicted using RSSI, where the distance d is inversely
proportional to the RSSI level as

RSSI = −10 · n · log10(d) + A,   (1)

where n is the path loss exponent. In free space, n equals 2, while it equals about 4
in indoor places. A is the RSSI level at the Rx at the reference distance.

Note that RSSI-based localization needs at least three reference points (trilateration)
or N points. For example, RSSI is measured at the vehicle to estimate d between the
vehicle and the trilateration reference points. Finally, simple geometry can be
applied to predict the vehicle's location relative to the reference points.
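As a minimal sketch of how Eq. (1) is inverted in practice to estimate the distance from a measured RSSI value (the function name and the example numbers below are illustrative, not taken from the paper):

```python
def distance_from_rssi(rssi_dbm: float, a_dbm: float, n: float = 2.0) -> float:
    """Invert Eq. (1), RSSI = -10*n*log10(d) + A, giving d = 10**((A - RSSI)/(10*n)).

    rssi_dbm -- RSSI measured at the receiver (dBm)
    a_dbm    -- the A term of Eq. (1), the reference RSSI level (dBm)
    n        -- path-loss exponent (about 2 in free space, about 4 indoors)
    """
    return 10 ** ((a_dbm - rssi_dbm) / (10.0 * n))


# Example: reference level -40 dBm, measured -70 dBm, indoor exponent n = 4
print(distance_from_rssi(rssi_dbm=-70.0, a_dbm=-40.0, n=4.0))  # ~5.6 m
```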

2.2 Channel State Information (CSI)


Channel State Information (CSI) refers to the known channel characteristics of a
wireless link. This information describes how a signal propagates from Tx to Rx
and captures the combined effects of scattering, fading, and power decay with
distance. CSI usually delivers higher localization accuracy than RSSI, as CSI
captures both the channel amplitude response and the channel phase response at
different frequencies [6]. However, CSI is a complex technique. It can be
formulated in polar form as

H(f) = |H(f)| · e^(jφ(f)),   (2)

where |H(f)| is the channel amplitude response at frequency f, while the channel
phase response is expressed as φ(f). Currently, several IEEE 802.11 Network
Interface Controller (NIC) cards can offer subcarrier-level channel measurements
for the OFDM approach, which can be translated into richer multipath information,
more stable measurements, and higher localization accuracy [6].
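Eq. (2) simply splits a complex channel estimate into its amplitude and phase responses; the tiny sketch below shows that decomposition for one illustrative subcarrier value (the number is made up for the example).

```python
import cmath

H_f = 0.8 - 0.6j                     # illustrative complex channel estimate H(f)
amplitude, phase = cmath.polar(H_f)  # |H(f)| and phi(f) of Eq. (2)
print(amplitude, phase)              # 1.0, about -0.6435 rad
```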

2.3 Fingerprinting Technique


The fingerprinting technique is employed in indoor localization systems. For
localization, it commonly demands an additional history or survey of the
surrounding environment to acquire that environment's fingerprints [7]. Primarily,
RSSI measurements are collected during an offline phase. Once the system is
deployed, online measurements are matched with the offline measurements to
estimate the vehicle's location. Numerous algorithms have been developed to
compare the online measurements with the offline measurements, such as
probabilistic methods, Artificial Neural Networks, K-Nearest Neighbors (KNN),
and Support Vector Machines (SVM).
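A minimal nearest-neighbour sketch of the matching step described above (the fingerprint database, positions, and RSSI values are invented for illustration; a real deployment would use a full offline survey and possibly KNN with k > 1):

```python
import math

# Offline database: surveyed position -> RSSI vector (one value per access point).
offline_db = {
    (0.0, 0.0): [-45, -60, -72],
    (5.0, 0.0): [-58, -49, -70],
    (0.0, 5.0): [-62, -71, -48],
}

def locate(online_rssi):
    """Return the surveyed position whose stored fingerprint is closest
    (Euclidean distance) to the online RSSI measurement."""
    return min(offline_db, key=lambda pos: math.dist(offline_db[pos], online_rssi))


print(locate([-60, -50, -69]))   # -> (5.0, 0.0)
```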

2.4 Angle of Arrival (AoA)


The Angle of Arrival (AoA) technique is widely applied at the Rx. This technique
is based on antenna arrays [8] to estimate the direction of the signal from the Tx,
where the differences in arrival time at the elements of the antenna array are used
to estimate the incident angle at the Rx, as in Fig. 1.

Fig. 1. AoA technique [8]

2.5 Time of Flight (ToF)


Time of Arrival (ToA), or Time of Flight (ToF), uses the signal propagation
characteristics to determine d [9], in which the speed of light (c = 3 × 10^8 m/s) is
multiplied by the ToF value to estimate d.

Unfortunately, ToF demands synchronization between Tx and Rx. In several
situations, timestamps are transmitted with the signal to eliminate the
synchronization issue. Let t1 be the time at which the data are transmitted from the
Tx, while t2 is the time measured at the Rx after receiving the data, where
t2 = t1 + tp and tp is the signal propagation time [9]. Hence Dij, the measured
distance between Txi and Rxj, can be formulated as

Dij = (t2 − t1) × v,   (3)

where v is the signal propagation speed.

2.6 Time Difference of Arrival (TDoA)


Time Difference of Arrival (TDoA) is similar to ToF. The difference between the
two techniques is that TDoA depends on the difference in signal propagation times
from several Tx. Usually, TDoA is measured at the Rx [1]. The actual distance Dij
is obtained as

Dij = c × TDij,   (4)

where TDij is the measured TDoA and c is the speed of light. Dij is expressed as

LD(i, j) = √((Xi − x)² + (Yi − y)² + (Zi − z)²) − √((Xj − x)² + (Yj − y)² + (Zj − z)²),   (5)

where (Xi, Yi, Zi) are the coordinates of the Tx/reference node i and (x, y, z) are the
coordinates of the Rx/user.
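The following sketch evaluates Eqs. (3)–(5) directly for illustrative values (the anchor coordinates and timings are made up; c is taken as the propagation speed):

```python
import math

C = 3e8  # propagation speed (speed of light), m/s

def tof_distance(t1: float, t2: float) -> float:
    """Eq. (3): distance from the one-way time of flight, D = (t2 - t1) * v with v = c."""
    return (t2 - t1) * C

def tdoa_range_difference(td_ij: float) -> float:
    """Eq. (4): range difference obtained from a measured TDoA, D_ij = c * TD_ij."""
    return C * td_ij

def ld(anchor_i, anchor_j, p):
    """Eq. (5): difference of the Euclidean distances from point p to anchors i and j."""
    return math.dist(anchor_i, p) - math.dist(anchor_j, p)


print(tof_distance(0.0, 33.3e-9))             # ~10 m one-way distance
print(tdoa_range_difference(33.3e-9))         # ~10 m range difference
print(ld((0, 0, 2), (10, 0, 2), (3, 4, 0)))   # range difference for a sample point
```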
Table 1 offers a comparison between the mentioned techniques for indoor local-
ization and addresses main advantages and drawbacks of these techniques.

3 Technologies for Localization

This section will introduce various communication technologies like WiFi, Bluetooth,
Ultra-WideBand (UWB). These technologies are exploited by localization techniques
to improve indoor localization accuracy.

3.1 WiFi
WiFi commonly operates in the ISM (Industrial, Scientific, and Medical) band. It
follows the IEEE 802.11ah standard (i.e., categorized for IoT services) with a
coverage range of about 1 km [10]. This technology exists in laptops, smart phones,
and other portable user devices. Thus, WiFi is considered one of the most popular
technologies exploited for indoor localization [11]. Additionally, WiFi access points
can be exploited as reference points for localization system calculations (i.e., they
can be used without any additional infrastructure). The techniques mentioned
above, such as RSSI, CSI, ToF, AoA, and any combination of them, can be
employed to offer a WiFi-based localization system.

Table 1. Main advantages and drawbacks of indoor localization techniques

| Technique | Advantages | Drawbacks |
|---|---|---|
| RSSI | Simple implementation, low cost, can be exploited with various technologies | Affected by fading and environmental noise, prone to low accuracy, fingerprinting is needed |
| CSI | Immune to noise and fading | Complexity |
| Fingerprinting | Simple implementation | Any minor variation in the environment requires new fingerprints |
| AoA | High positioning precision, no fingerprinting needed | Complex implementation, directional antennas and complex algorithms needed, performance degrades as distance increases |
| ToF | High positioning precision, no fingerprinting needed | Time synchronization between Tx and Rx is needed; in some scenarios additional antennas and timestamps are needed at Tx and Rx; LOS is required for high accuracy |
| TDoA | High positioning precision, no fingerprinting needed | Clock synchronization is needed, large BW is needed, might need timestamps |

3.2 Bluetooth
IEEE 802.15.1, or Bluetooth technology, is applied to connect various fixed or
moving nodes over a limited distance. Various indoor localization techniques like
RSSI, ToF, and AoA can be based on Bluetooth technology. Based on several
studies [12], RSSI is the most applied positioning technique based on Bluetooth,
due to its simplicity. Unfortunately, the main drawback of applying the RSSI
technique over Bluetooth technology is the limited accuracy in localizing nodes
(vehicles). However, Bluetooth in its original form can still be exploited for
positioning due to its coverage range, low transmitted power, and low energy
consumption. Two common Bluetooth-based protocols applied for indoor
localization are iBeacon (by Apple Inc.) and Eddystone (by Google Inc.) [13].

3.3 Radio Frequency Identification Device (RFID)


Radio Frequency Identification Devices (RFID) are employed for transmitting and
storing data by electromagnetic transmission from a Tx to any RF-compatible
circuit [14]. RFID systems are categorized into two main types (i.e., active RFID
and passive RFID).
Active RFID: It usually operates in the UHF (Ultra High Frequency) and
microwave bands. Active RFIDs are attached to a power source. Their IDs are
transmitted periodically, and their coverage range is hundreds of meters from the
RFID reader. These active devices can be exploited for tracking objects and indoor
localization because of their coverage range, low budget, and simplicity of
implementation. Nevertheless, active RFID technology has limited localization
accuracy and is not available on most portable user devices.
Passive RFID: Unlike active RFIDs, passive RFIDs have a short coverage range
(1–2 m) and do not require batteries. These devices are smaller in size, lighter in
weight, and lower in cost than the active type. Passive RFIDs operate in the low,
high, UHF, and microwave bands. These devices are considered an alternative to
bar codes, specifically in non-line-of-sight environments. However, they are not
efficient for localization due to their limited coverage range. These passive devices
can be applied for proximity-based services by exploiting brute-force mechanisms,
but additional complex modifications are still needed, such as transmitting an ID
that can be used to identify the devices.

3.4 Ultra Wideband (UWB)


In UWB, ultra-short pulses with a duration of less than 1 ns are transmitted over a
large bandwidth (>500 MHz), in the frequency range from 3.1 to 10.6 GHz, with a
low duty cycle [15].
This technology is exploited for localization by propagating a radio signal from the
Tx to reference nodes (i.e., nodes at known locations). ToF, TDoA, AoA, and RSSI,
or hybrid techniques, are the most applied techniques based on UWB technology
[16].
Table 2 compares the different wireless technologies applied to support indoor
localization in terms of coverage range, accuracy, applied localization techniques,
advantages, and disadvantages.
As stated in Table 2, UWB has numerous advantages such as low power
consumption, large bandwidth, low cost, high data rate, high localization accuracy,
robustness against environmental variations, and immunity to noise and fading due
to its very short pulses. In addition, the UWB signal can penetrate many materials
and has the capability of data transmission and localization simultaneously.
All these characteristics make UWB a strong candidate for vehicle localization.
Nowadays, integrated UWB radio communication chips implementing the IEEE
802.15.4a standard [17] are available on the market. Such a chip offers the ability
to implement UWB technology in any application scenario efficiently.
Unlike RSSI, the accuracy of the ToF and TDoA techniques can be improved by
increasing either the SNR or the effective bandwidth. One limitation of ToF, as
stated before, is the necessity of time synchronization among all nodes. Time
synchronization is also vital in TDoA, but it is much easier in this case, as it is
necessary only among the reference nodes. Because the offset time is similar for
each of the ToF calculations, the offset is cancelled when these ToFs are subtracted.
Besides, AoA is another approach that can be applied together with TDoA to obtain
more accuracy. Thus, implementing a TDoA or AoA technique supported by UWB
technology can greatly improve indoor localization precision; some studies report
accuracies in the range of 10–30 cm [18].

Table 2. Comparison among various indoor localization technologies

| Technology | Coverage range | Accuracy | Localization technique | Advantages | Disadvantages |
|---|---|---|---|---|---|
| WiFi | 150 m | 10–15 m | RSSI, CSI, ToF, and AoA | Widely available, high accuracy, no additional hardware required | Affected by noise and fading, additional processing algorithms required |
| Bluetooth | 70–100 m | <8 m | RSSI, Fingerprinting, AoA, and ToF | High throughput, high energy efficiency | Low accuracy, affected by fading and noise |
| RFID | 1–2 m | 10 cm | RSSI, Fingerprinting | Low power consumption, high accuracy | Low coverage range |
| UWB | 100 m | 10–30 cm | ToF, TDoA, AoA, and RSSI | Robust against interference, high accuracy | Short coverage range, additional hardware required |

4 Proposed Network Scenario

In the case of an autonomous underground parking garage, the TDoA technique
supported by a UWB localization system is suggested in order to allow vehicles to
localize themselves and drive autonomously inside the garage. The system concept
is visualized in a parking garage scenario as in Fig. 2. The network is covered by
Anchor Nodes (ANs) distributed in the parking garage. These ANs communicate
with a UWB tag node attached to the vehicle in order to help the vehicle locate
itself inside the garage.

It is preferable to optimize the distribution of these ANs in the garage, so that a
vehicle at any location inside the garage is able to connect with at least three LOS
ANs. These localization processes help the vehicle to monitor its past track and to
determine and plan its future track, as in Fig. 2.

Fig. 2. Autonomous underground parking garage based on UWB technology

The garage should have various ANs spread as a fixed infrastructure inside the
garage. All UWB AN IDs and locations in the network should be defined and
programmed. It is assumed that these ANs are mounted at a height of 2 m in the
garage. Therefore, when a vehicle fitted with a UWB tag node arrives at the parking
garage, it requests the list of the current ANs in the network by broadcast messages.
Then, in reply to that broadcast, an AN sends the garage infrastructure data (i.e., the
AN positions and a detailed map).

After the AN list is acquired, the tag node periodically picks the available ANs for
deploying the UWB-TDoA standard measurement protocol in order to locate the
vehicle accurately. The tag node attached to the vehicle attempts to calculate the
actual distance between the vehicle and each available AN through the TDoA
technique. In order to obtain this distance, timestamps are submitted. Consequently,
the distances obtained from each available AN are listed in a measurement matrix
that is updated dynamically based on the vehicle's movement. This matrix provides
the recent location, current velocity, and direction. Figure 3 clarifies the main
procedures for estimating the vehicle's location inside the garage. Such positioning
information can be fused with the vehicle's internal odometry calculations using
one of the popular fusion techniques, such as the Extended Kalman
Filter, as in [18].
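As an illustrative sketch of the TDoA-based position estimation step (not the authors' implementation: the anchor coordinates, tag height, and the simple grid search below are assumptions made only for this example), the range differences measured to three LOS ANs can be turned into a position estimate as follows.

```python
import numpy as np

C = 3e8  # m/s, converts measured TDoA values into range differences

# Three line-of-sight anchor nodes mounted at 2 m height (illustrative coordinates).
anchors = np.array([[0.0, 0.0, 2.0],
                    [20.0, 0.0, 2.0],
                    [0.0, 20.0, 2.0]])

def range_diffs(p):
    """Range differences to anchors 1..N-1 relative to anchor 0 (cf. Eq. (5))."""
    d = np.linalg.norm(anchors - p, axis=1)
    return d[1:] - d[0]

true_pos = np.array([7.0, 11.0, 0.5])   # used here only to simulate measurements
measured = range_diffs(true_pos)        # in practice: c * measured TDoA values

# Coarse grid search over candidate tag positions on the garage floor (0.1 m step).
best, best_err = None, float("inf")
for x in np.arange(0.0, 20.0, 0.1):
    for y in np.arange(0.0, 20.0, 0.1):
        err = float(np.sum((range_diffs(np.array([x, y, 0.5])) - measured) ** 2))
        if err < best_err:
            best, best_err = (round(float(x), 1), round(float(y), 1)), err

print(best)   # approximately (7.0, 11.0)
```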

Fig. 3. Vehicle’s location estimation procedures for autonomous parking scenario

5 Conclusion and Future Work

In this paper, the importance of indoor localization systems in GPS-denied
environments is shown. Some indoor localization techniques are presented, such as
RSSI, CSI, AoA, fingerprinting, ToF, and TDoA. The paper states the advantages
and disadvantages of each localization technique. Moreover, some communication
technologies that support localization systems, such as Bluetooth, RFID, WiFi, and
UWB, are presented. It is shown that implementing the TDoA technique supported
by UWB technology can greatly improve indoor localization accuracy. Finally, the
system is characterized in a scenario for vehicle navigation in an underground
parking garage.
Future work will consider a comprehensive study to evaluate the hardware platform
and measure its positioning accuracy. In addition, we plan to introduce an artificial
intelligence algorithm that could learn from the surrounding environment to
enhance localization accuracy.

References
1. Ahmed, E.M., et al.: Localization methods for internet of things: current and future trends.
In: 2019 6th International Conference on Advanced Control Circuits and Systems (ACCS) &
2019 5th International Conference on New Paradigms in Electronics & information
Technology (PEIT), pp. 119–125. IEEE (2019)
2. Salem, M.A., Tarrad, I.F., Youssef, M.I., El-Kader, S.M.A.: QoS categories activeness-
aware adaptive EDCA algorithm for dense IoT networks. Int. J. Comput. Netw. Commun.
(IJCNC) 11(3), 67–83 (2019)
3. Hussein, H.H., et al.: Internet of Vehicles (IoV) enabled 5G D2D technology using proposed
resource sharing algorithm. In: 2019 6th International Conference on Advanced Control
Circuits and Systems (ACCS) & 2019 5th International Conference on New Paradigms in
Electronics & Information Technology (PEIT), pp. 126–131. IEEE (2019)
4. Gaber, H., et al.: Localization and mapping for indoor navigation: survey. In: Robotic
Systems: Concepts, Methodologies, Tools, and Applications, pp. 930–954. IGI Global (2020)
5. Yang, B., et al.: A novel trilateration algorithm for RSSI-based indoor localization. IEEE
Sens. J. 20(14), 8164–8172 (2020)
6. Gao, R., et al.: Extreme learning machine ensemble for CSI based device-free indoor
localization. In: 2019 28th Wireless and Optical Communications Conference (WOCC),
pp. 1–5. IEEE (2019)
7. Duan, Y., et al.: Packet delivery ratio fingerprinting: towards device-invariant passive indoor
localization. IEEE Internet Things J. 7(4), 2877–2889 (2020)
8. Al-Sadoon, M.A.G., et al.: AOA localization for vehicle tracking systems using a dual-band
sensor array. IEEE Trans. Antennas Propag. 68(8), 6330–6345 (2020)
9. Li, C., et al.: CRLB-based positioning performance of indoor hybrid AoA/RSS/ToF
localization. In: 2019 International Conference on Indoor Positioning and Indoor Navigation
(IPIN), pp. 1–6. IEEE (2019)
10. Soares, S.M., Carvalho, M.M.: Throughput analytical modeling of IEEE 802.11 ah wireless
networks. In: 2019 16th IEEE Annual Consumer Communications & Networking
Conference (CCNC), pp. 1–4. IEEE (2019)
11. Cheng, H.-M., Song, D.: Localization in inconsistent wifi environments. In: Robotics
Research, pp. 661–678. Springer, Cham (2020)
12. Khan, A., et al.: Bluetooth and ZigBee: a network layer architecture gateway. Int. J. Simul.
Syst. Sci. Technol. 20, 1–10 (2019)
13. Bbosale, A., et al.: Indoor navigation system using BLE beacons. In: 2019 International
Conference on Nascent Technologies in Engineering (ICNTE), pp. 1–6. IEEE (2019)
14. Liu, Q.: Automated logistics management and distribution based on RFID positioning
technology. Telecommun. Radio Eng. 79(1), 17–27 (2020)
15. Delamare, M., et al.: Static and dynamic evaluation of an UWB localization system for
industrial applications. Sci 2(1), 7 (2020)
16. Chen, Y.-Y., et al.: UWB system for indoor positioning and tracking with arbitrary target
orientation, optimal anchor location, and adaptive NLOS mitigation. IEEE Trans. Veh.
Technol. (2020)
17. De Dominicis, C.M., Pivato, P., Ferrari, P., Macii, D., Sisinni, E., Flammini, A.:
Timestamping of IEEE 802.15.4a CSS signals for wireless ranging and time synchronization. IEEE Trans. Instrum. Meas. 62(8), 2286–2296 (2013)
18. Stoll, H., et al.: GPS-independent localization for off-road vehicles using ultra-wideband
(UWB). In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems
(ITSC). IEEE (2017)
Business Intelligence
E-cash Payment Scheme in Near Field
Communication Based on Boosted Trapdoor
Hash

Ahmed M. Hassan¹ and Saad M. Darwish²

¹ Naval College, Alexandria, Egypt
A2hassan1968@yahoo.com
² Institute of Graduate Studies and Research, Department of Information Technology, Alexandria University, Alexandria, Egypt
saad.darwish@alexu.edu.eg

Abstract. In the modern world, there is rapid progress in the Internet and
information technology, which is becoming more important in the field of
electronic commerce. The majority of clients use the electronic cash process as an
alternative to using real cash. But in this E-cash method there is a problem faced by
most customers, namely the privacy and safety of the client. In order to preserve the
privacy of this mechanism, the trapdoor hash function plays a vital role in the
construction of secure digital signatures. Such secure digital signatures are broadly
used in different fields nowadays. Near Field Communication (NFC) is nowadays a
popular technology to facilitate the payment technique of consumers with a mobile
token system. This paper suggests an adapted scheme that integrates a boosted
trapdoor hash and NFC technology to enhance E-cash payment robustness with
minimum computation cost. The boosted trapdoor hash relies on appending some
bits to the original signature to increase the digital signature's security and reduce
the modular reduction in online computation. Since the processing does not require
a division operation, it is easier than other schemes to implement in the smartphone
environment. The suggested scheme can be used effectively with mobile devices in
E-payment.

Keywords: Near field communication · Boosted trapdoor hash · E-cash · E-Payment

1 Introduction

With the extensive use of smartphones, the payment of financial commitments has
nowadays become easier for most people. Near Field Communication (NFC)-based
smartphones are utilized to offer various information that is obtained very easily by
the users. Data are exchanged using NFC technology between systems that are
within a distance of a few centimeters, and NFC has recently become a common
wireless communication technology. This technology has been deployed and used
widely for making payments without the use of credit or debit cards [1, 2].

For payments made without contact on NFC-based smartphones, the device
operates in card emulation mode, so that it appears to the peripheral reader as a
conventional contactless smart card. A smart card chip stores the mobile user's
information using a secure element that protects the stored data and performs
transactions in a protected manner. By adopting NFC on smartphones, it is easier to
make several payments safely online, although the communication is done over a
short distance. Securing the payment process with a key agreement is regularly
done using an elliptic curve algorithm [3]. This mechanism is also identified as a
public-key encryption algorithm. NFC on smartphones is employed to support
various functions, such as mobile-to-mobile methods and also TAG-to-mobile. By
using these functions, NFC on smartphones can transfer large data volumes
generously.

Activities like transactions and retrieving information using micropayments or
payment transactions can be done using NFC tags. The tag depends on the NFC
reader built into the customer's phone. A significant key feature of NFC on
smartphones is that it has the benefit of performing the services without revealing
the user's private secrets, such as financial data. Thus, NFC arises as an easy
process, as it is employed to handle the complexity of card payments at the
accessible end of the point-of-sale system. The drawback of NFC tags is that they
cannot perform any action on their own; they are engaged simply for transferring
data to an active device, namely a smartphone [4, 5]. The procedure of enabling this
is called NFC provisioning, whereby the security keys are provided to the user
whenever he makes transactions, and the wallet or payment application is stored in
a safe environment.
The Trusted Service Manager (TSM) is the reliable third-party resource that offers
OTA (Over the Air) facilities to the NFC payment application issuer. Numerous
TSMs can be concerned with the delivery of the payment applications. The ultimate
aim of TSM technology in the NFC network is to make the NFC payment
application manageable on the user's smartphone. A small number of services, such
as life-cycle administration of the NFC payment application on the user's smart
mobile phone, OTA activation or delivery of the NFC payment, and transferring
and bridging services for a new phone when needed, are delivered by the TSM [6].

NFC smartphones are the irrevocable component of NFC communication, and they
are naturally composed of numerous integrated circuits, including the NFC
communication module, as illustrated in Fig. 1. The NFC communication module is
composed of an NFC Contactless Front-end (NFC CLF), an NFC antenna, and a
fused chipset referred to as the NFC Controller (NFCC), whose function is to
manage the emission and reception of the signals and the
modulation/demodulation [2].
To overcome the security issues and boost safety in the NFC platform, the NFC-SEC
(security) process is recommended, which offers secure channel services. The PID is
defined as the part of the NFC-SEC cryptography according to the invocation of
searchable symmetric encryption (SSE). It can set up a common secret key
verification and agreement mechanism. By using the activation request (ACT_REQ)
and activation response (ACT_RES), the peer NFC-SEC entities establish a common
secret key that depends on the NFC-SEC cryptography. Also, through the
Verification Request (VFY_REQ) and Verification Response (VFY_RES), the peer
NFC-SEC entities verify their agreed secret key based on the NFC-SEC
cryptography [7, 8].

To eliminate the vulnerabilities regarding the security issues of traditional NFC-SEC,
we suggest a modified e-cash payment scheme using a boosted trapdoor hash
function on smart mobile devices. Herein, instead of utilizing a traditional trapdoor
hash, the suggested scheme exploits a digital signature that replaces the modular
reduction with a conventional multiplication to cut down the online computation.
With only these primitive operations, we improve efficiency using a crypto
algorithm that can perform high-speed operation. Furthermore, a user impersonation
attack is impossible, as no one can get a session key, because the signatures change
in every session.

The rest of the paper is organized as follows: Sect. 2 describes some state-of-the-art
related work. A detailed explanation of the suggested scheme is given in Sect. 3. In
Sect. 4, the security analysis is given. Finally, conclusions are drawn in Sect. 5.

Fig. 1. General architecture of an NFC smartphone.



2 Related Work

Recently, the NFC Controller Interface (NCI) has emerged to facilitate the
integration of chipsets manufactured by different chip manufacturers, and it defines
a common level of functionality and interoperability between the components
inside an NFC device, which may take any form, e.g., operating systems and
hardware [2, 3]. Actually, in today's smartphones, NFC stacks hardly meet all these
requirements because of business-model-related issues. In addition, other authors
suggested a new NFC stack architecture for mobile devices by analyzing OS
services and NFC Forum standards. This NFC stack runtime environment has also
been validated on the Android and Windows Phone OSs.

According to a recent study [6], the NFCIP-1 protocol allows error handling,
provides an ordered data flow, and performs reliable, error-free communication in
the link layer. In an alternative study [7], a simulation model for NFCIP-1 over a
network simulator is offered. The study showed that the NFCIP-1 protocol needs to
be supported with additional methods such as flow control mechanisms. In [8], the
authors discussed the realization of an IP link using tunneling over the NFCIP-1
protocol, which enables devices to easily exchange data over the network. Such a
tunneling implementation may bring new possibilities for peer-to-peer mode
applications.

In some studies, as stated in [7], a secure version of the LLCP has been established,
called LLCPS (i.e., the Logical Link Control Protocol protected by TLS), which
defends the transactions in peer-to-peer mode. The protocol has been validated on
two experimental platforms and delivers strong mutual authentication, privacy, and
integrity. An alternative significant protocol in this layer is the Simple NDEF
Exchange Protocol (SNEP), which allows an application on an NFC-enabled device
to exchange NDEF messages with a different device in peer-to-peer mode. The
protocol makes use of the LLCP connection-oriented transport mode to deliver a
reliable data exchange.

In accordance with peer-to-peer protocols, as stated in [8], one study analyzed
available peer-to-peer protocols and offered the OPEN-NDEF Push Protocol (NPP)
as an open-source library. NPP is a simple protocol built on top of LLCP, designed
by Google to push an NDEF message from one device to another on Android
devices. An enhanced version, called the OPEN-SNEP library, is offered as an
update to NPP, and the technical details of the OPEN-SNEP solution are analyzed.
The main dissimilarities between NPP and SNEP are provided with use cases as
well. The suggested protocol tries to enrich HCE security through utilizing a
trapdoor hash instead of an NDEF message, which diminishes the overhead
throughout the network. In comparison to the approach suggested in [4], which uses
trapdoor hash functions for e-payment in mobile apps, the proposed scheme uses an
improved variant of the trapdoor hash to improve protection and minimize
computational costs.

3 The Proposed E-payment Scheme

We suggest a high-speed authentication operation and key agreement for NFC
environments. We provide a scheme for carrying out safe correspondence that gives
no details to the other party, including the smartphone or the user. First, all mobile
devices go through a one-time pre-registration phase. Authentication is then
effected by using the secret information issued during the registration phase. The
mobile pre-paid system can be divided into three phases: the registration phase, the
payment phase, and the deposit phase. The following is a thorough summary of the
proposed scheme.

3.1 Registration Phase


The user identification information shall be given by the trustworthy authentication
center and the Third Party (TP) for efficient communication. Only authentication is
needed to validate the source of the transaction. Private details must be entered by
the users to execute the transaction, while an unregistered consumer is not allowed
to conduct a transaction. This means fraudulent transactions can be stopped. The
authorized mobile device submits both the ID and the password to the TP. After
obtaining this content, the UV (User Value) and USS (User Shared Secret Number)
are stored in the DB by the TP. The USS and UV are then re-transmitted to the
mobile device through a secure path and are handled safely by the user and the
mobile app via the following function [9–11]:

USS = h(ID || PW || x),  UV = h(ID || x)   (1)

Then select a random number x1 ∈R {0, 1}^l and calculate y1 = g^x1 mod n. The
public key of the trapdoor hash is HK = (g, n, y1) and the trapdoor key is x1.
Randomly generate a message m1 ∈R {0, 1}^l and a number r1 ∈R {0, 1}^(2l+k).
Then calculate the trapdoor hash value A = TH_HK(m1, r1, y1) = g^r1 · y1^m1 mod n,
and transmit the identity message IDi and the trapdoor hash value A to the bank.
The bank uses its signing key, specified as d, to sign the message and produces an
electronic payment credential identified as σ. The bank sends the electronic
payment credential σ to the consumer as:

σ = H(IDi, A)^e mod n   (2)

The customer then verifies the σ value. If it is correct, the customer stores the
random pair (m1, r1, x1) and the e-payment credential in the smart phone device.
One more pair (m2, r2) is calculated in the payment phase. The trapdoor hash
values are generated by both pairs of keys.

3.2 Authentication Phase


The purpose of the authentication method is to ensure that the authorized users and
the source use the pre-registered details. The scheme introduced in this research
carries out the verification without revealing any personal details to the opposing
party. The reader can use the qualified results of the verification process to validate
the information provided by the TAG or the user. The initiating device creates two
request messages [11–14]:

Generate N1;  MsgreqM1 = h(ID1 || USS1 || N1),  MsgreqM2 = UV1 ⊕ N1   (3)

The generated information, MsgreqM1 and MsgreqM2, is transmitted to the reader,
and the reader performs the corresponding function to produce its own messages,
MsgreqM3 and MsgreqM4, as:

Generate N2;  MsgreqM3 = h(ID2 || USS2 || N2),  MsgreqM4 = UV2 ⊕ N2   (4)

The reader transmits ID1, MsgreqM1, MsgreqM2, ID2, MsgreqM3, and MsgreqM4
to the TP, and the TP verifies MsgreqM1, MsgreqM2, MsgreqM3, and MsgreqM4.
After confirming that all the received information is correct, it produces a nonce
value N3 and then performs the following functions:

MsgresM = h(UV1 || N3) ⊕ N1   (5)

MsgresM1 = (ID1 ⊕ N3) ⊕ h(UV1 || N1)   (6)

MsgresM2 = h(N3 || UV1 || N1)   (7)

MsgresM3 = (ID2 ⊕ N3) ⊕ h(UV2 || N2)   (8)

MsgresM4 = h(N3 || UV2 || N2)   (9)

MsgresM5 = MsgresM ⊕ N3   (10)

The TP sends MsgresM1 || MsgresM2 || MsgresM3 || MsgresM4 || MsgresM5 to the
receiving devices. The reader confirms MsgresM3, MsgresM4, and MsgresM5.

3.3 Key Agreement Phase


The key agreement step is needed so that a session key can be used to share
payment details on the basis of the authentication procedure. The reader equipment
confirms MsgresM3, MsgresM4, and MsgresM5. It transmits MsgresM1,
MsgresM2, and MsgresM5 to the initiating user. The initiating user, as the recipient
of the data, must create the session key after confirming the received information,
as SK = h(N3 || MsgresM). The devices shall use this shared session key to interact
securely.
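A toy sketch of the message construction in Eqs. (3)–(10) and of the session key derivation above is given below; the hash function choice (SHA-256), byte encodings, and nonce lengths are illustrative assumptions, since the paper does not fix them.

```python
import hashlib
import os

def h(*parts: bytes) -> bytes:
    """Hash function of the scheme; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(b"".join(parts)).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Illustrative registration values for the initiating device (tag/user side).
ID1, USS1 = b"tag-001", b"uss-secret-1"
UV1 = h(b"id1-and-x")                 # stands in for UV1 = h(ID1 || x)
N1 = os.urandom(32)                   # nonce generated by the initiating device

# Eq. (3): the initiating device's request messages.
Msgreq_M1 = h(ID1, USS1, N1)
Msgreq_M2 = xor(UV1, N1)

# TP side, after verification: Eq. (5), Eq. (10), and the shared session key.
N3 = os.urandom(32)
Msgres_M = xor(h(UV1, N3), N1)        # Eq. (5)
Msgres_M5 = xor(Msgres_M, N3)         # Eq. (10)

SK = h(N3, Msgres_M)                  # session key agreed after key agreement
print(SK.hex())
```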

3.4 Payment Phase


Details on the invoice, the customer and the price can be supplied to the package. The
Client shall determine the document m2 ¼ HðIDi ; IDM ; IDB ; MN; MPÞ. The document
shall contain the identification of the issuer, the client, the vendor, the name and price
E-cash Payment Scheme in Near Field Communication 627

of the products bought. The saved information (m1, r1, x1) is retrieved from the customer's smart mobile device. First, a new trapdoor key x2 ∈R {0, 1}^l is selected and y2 = g^x2 mod n is computed. Then the boosted trapdoor operation is executed:

TH_TK(m2) = r2 = r1 + x1·m1 - x2·m2    (11)

This implies that A = g^r1 · y1^m1 = g^r2 · y2^m2 mod n. Let the payment information be α = (σ, A, IDi, IDM, IDB, MN, MP, r2, y2); the merchant uses the received α to verify the payment credential. The merchant then adds the transaction information TI to the order, forms the quotation, and signs it with its signing key:

S = SigM(σ, A, IDi, IDM, IDB, MN, MP, r2, y2, TI)    (12)

The merchant then sends S to the bank for verification. The introduction of NFC in smartphones makes communication between users simpler. Figure 2 illustrates the interactions between the elements of the NFC payment system. Users may obtain detailed product information via the TAG and can then complete the payment through contact with the appropriate readers or mobile payment systems. The NFC scheme is therefore a handy tool, because it can solve the problems of card- and movement-based purchases in current POS (point of sale) schemes.

Fig. 2. NFC payment system– near field communication.

The modification made to the existing system lies in the use of a boosted trapdoor hash function instead of the traditional trapdoor hash to create the session key. A modular reduction within the traditional trapdoor hash requires more computational power than the multiplication used in the boosted trapdoor hash, so the observed result is consistent with expectations once the modular reduction is replaced with a conventional multiplication. In the boosted trapdoor hash, the quantity r2 is computed using integer arithmetic. The bit length of r2 may vary over a wide range because r2 is a
result of integer arithmetic, so the new definition is forced to switch the positions of r and m. At the cost of more appended bits and a longer hash operation, the scheme obtains a trapdoor hash function that is very efficient in online computation. The proposed scheme is appealing for battery-powered computing devices because no further modular reduction is required in the online process. Here, the trapdoor operation is carried out in the online phase, while the hash procedure is performed during the offline phase.
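As a sanity check of this offline/online split, the toy-sized Python sketch below fixes (m1, r1, x1) and A offline and, for a new message m2, performs only the integer computation of Eq. (11) online before confirming that both openings map to the same trapdoor hash value A. The bit lengths, the modulus n (a prime used purely as a stand-in), and the generator g are illustrative choices and not a secure instantiation of the scheme.

import secrets

l, k = 64, 64                    # toy security parameters
n = 2**127 - 1                   # stand-in modulus; a real scheme would use an RSA modulus
g = 5

# ---- offline (pre-computation) phase ----
x1 = secrets.randbits(l)                     # trapdoor key
y1 = pow(g, x1, n)
m1 = secrets.randbits(l)                     # random cover message
r1 = secrets.randbits(2 * l + k)             # randomiser of 2l+k bits
A  = (pow(g, r1, n) * pow(y1, m1, n)) % n    # trapdoor hash value fixed at registration

# ---- online (payment) phase ----
m2 = secrets.randbits(l)                     # stands in for m2 = H(IDi, IDM, IDB, MN, MP)
x2 = secrets.randbits(l)                     # fresh trapdoor key
y2 = pow(g, x2, n)
r2 = r1 + x1 * m1 - x2 * m2                  # Eq. (11): pure integer arithmetic, no reduction

# ---- verification: both openings hash to the same value A ----
assert (pow(g, r2, n) * pow(y2, m2, n)) % n == A
print("collision verified, A =", A)

The only arithmetic performed online is the computation of r2, which matches the claim that the modular reduction of the traditional trapdoor hash is replaced by plain integer multiplication.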
Within NFC Payment System, vulnerability issues can emerge, as illustrated in
Fig. 3, such as the copying of TAGs, exposure to unauthorized TAGs, and leaving
identification details on record via a hidden reader or mobile device, as contact between
TAGs and mobile devices occurs in a wireless world. Attacks such as “Man-in-the-
Middle,” replay, and recovery of authentication details in the connection between a
mobile device or a reader and a certification center server can also occur. Therefore, our goal in this paper is to propose a lightweight authentication approach and a safe way to respond to such attacks that requires fewer modular and hash operations to complete the purchase and deposit phases. See [15, 16] for more details.

Fig. 3. NFC payment System –vulnerability of NFC.

4 Security Analysis

Consider the following scenarios. When paying with a credit card, we choose one of the many cards we carry and hand it to the clerk. After the clerk applies the card to the reader, the card reader starts communicating with the remote server to process the transaction, and when the transaction is complete, a signature must be placed on a receipt. When paying with an NFC smartphone, we simply present the smartphone to the NFC reader; the reader then contacts the remote server to start the payment, and the PIN prompt is presented through the mobile device. Once payment has been made, an electronic receipt is produced and
stored automatically on the device. Therefore, no data is exposed, and the proposed method is not vulnerable to replay attacks. The use of a nonce is a key principle of our proposed framework for preventing replay attacks: the nonce value varies in every session and is involved in all phases of operation.
Our scheme carries out the authentication through the TP so that user information is not left on the other party's device. The only values held by the other side are the ID, ReqM1, and ReqM2, and it is important to note that the keys USS and UV cannot be recovered from them. Even if someone obtains ResM1, ResM2, and ResM5 via the TP, they cannot recover the nonce created by the consumer. Consequently, all intercepted values except the ID are meaningless, and nobody can obtain the information for malicious use.

4.1 Undeniability
The quotation must be signed with the trader’s signing key for each transaction that is
made between the customer and the trader. After completion of the signature, the trader
shall pass the quotation to the bank, and the bank will check it using the public key
released by the trader. The trader cannot refuse to sign the quote details. In the payment process, the customer instead uses the trapdoor key to compute the trapdoor hash collision TH_TK(m2) and obtain (m2, r2, y2) for the hash value A. Only the customer who knows the trapdoor key could compute TH_TK(m2); the consumer therefore cannot deny the details of the transaction.

4.2 Accuracy
The quote provided by the dealer can be submitted to any party for verification: anyone wishing to verify the authenticity of the transaction can check the signature and the transaction data against the public key published by the merchant. Furthermore, to generate a session key, user U2 must be able to compute h(N3 || MsgresM). Because MsgresM is h(UV1 || N3 || N1), an attacker cannot construct MsgresM even by combining MsgresM1 and MsgresM2. The intruder therefore cannot build the session key, because the values of UV1 and N1 cannot be discovered in any session. A previous session key cannot be reused either, because N3 changes in every session. Finally, an attack on the user is unlikely.

4.3 Prevention of Double-Spending


The payment information (σ, A, IDi, IDM, IDB, MN, MP, r2, y2, TI) is generated after the transaction between the customer and the merchant, and the payment information is held by the bank in its database. When a new payment request arrives, the bank checks the records to ensure that no identical payment details already exist. The request for a new payment is denied if the same payment information is found, which effectively prevents the customer from double-spending. With regard to replay attacks, the proposed scheme uses the nonce value to handle this type of attack: nonce values take part in all phases, and the nonce value varies in every session. The
proposed approach uses TP-based authentication to guarantee that user information is not retained on the other party's devices.
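A minimal sketch of the bank-side record check described in this subsection is given below: the bank stores every payment tuple it has honoured and rejects any request whose payment information already appears in the records. The in-memory set and the field names are illustrative stand-ins for the bank's database, not part of the scheme's specification.

seen_payments = set()

def accept_payment(payment_info):
    # Reject the request if identical payment information is already on record.
    if payment_info in seen_payments:
        return False
    seen_payments.add(payment_info)
    return True

deposit = ("sigma", "A", "IDi", "IDM", "IDB", "MN", "MP", "r2", "y2", "TI")
assert accept_payment(deposit) is True    # first deposit of the credential is honoured
assert accept_payment(deposit) is False   # re-submitting the same payment information is denied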

5 Conclusion

The e-commerce network is flourishing with the exponential growth of the Internet and
information technology. Unfortunately, the latest e-cash schemes have proven to be
cumbersome. In this article, we suggested a modified e-cash payment protocol using
the boosted trapdoor hash on smart mobile devices to increase security by building a one-time session key with a low computational cost. By utilizing the improved trapdoor hash function, this scheme reduces computing costs, makes the system applicable to most mobile devices, and satisfies the security requirements of e-cash commerce systems.
This is in contrast to current protocols, which require a significant number of modular and hash operations to complete the purchasing and deposit phases and whose implementation on mobile devices faced a range of limitations, including insufficient battery capacity and inadequate computing capability, that could not be resolved. The scheme built in this study uses improved trapdoor hash functions to reduce the shortcomings of online computing. Enhanced trapdoor hash functions take advantage of pre-computation (offline computing); therefore, only an integer multiplication is required during the online operation, making the payment system practical for mobile device applications. In future work, we will investigate more efficient protocols for the customer and the bank.

References
1. Shadi, N.: Secure authentication protocol for NFC mobile payment systems. Int. J. Comput.
Sci. Netw. Secur. 17(8), 257–263 (2017)
2. Mohamad, B., Rouba, B.: A lightweight security protocol for NFC-based mobile payments.
In: International Conference on Ambient Systems, Networks and Technologies, Procedia
Computer Science, Dubai, pp. 705–711 (2016)
3. Nour, E., Guy, P.: Security Enhancements in EMV protocol for NFC mobile payment. In:
The 15th IEEE International Conference on Trust, Security and Privacy in Computing and
Communications, China, pp. 1–7 (2016)
4. Mayada, A., Ali, A.: Online security protocol for NFC mobile payment applications. In:
International Conference on Information Technology (ICIT), Amman, pp. 827–832 (2017)
5. Sekhar, V., Sarvabhatla, M.: Secure lightweight mobile payment protocol using symmetric
key techniques. In: Proceedings of International Conference on Computer Communication
and Informatics, India, pp. 1–6 (2012)
6. Eun, H., Lee, H., Oh, H.: Conditional privacy preserving security protocol for NFC
applications. IEEE Trans. Consum. Electron. 59(1), 153–160 (2013)
7. Information Technology Telecommunications and Information Exchange between Systems
—NFC Security—Part 1: NFC-SEC NFCIP-1 Security Service and Protocol; ISO/IEC
13157–1:2010; ISO/IEC: Geneva, Switzerland (2010)
8. Information Technology Telecommunications and Information Exchange between Systems—NFC Security—Part 2: NFC-SEC Cryptography Standard Using ECDH and AES; ISO/IEC 13157-2:2010; ISO/IEC: Geneva, Switzerland (2010)
9. Liaw, H., Lin, J., Wu, W.: A new electronic traveler’s check scheme based on one-way hash
function. Electron. Commer. Res. Appl. 4(6), 499–508 (2007)
10. Yang, F.: Improvement on a trapdoor hash function. Int. J. Netw. Secur. 1(9), 17–21 (2009)
11. Wang, J., Yang, F., Paik, I.: A novel E-cash payment protocol using trapdoor hash function
on smart mobile devices. Int. J. Comput. Sci. Netw. Secur. 11(6), 12–19 (2011)
12. Chandrasekhar, S., Chakrabarti, S., Singhal, M.: A trapdoor hash-based mechanism for
stream authentication. IEEE Trans. Dependable Secure Comput. 5(9), 699–713 (2012)
13. Gao, W., Li, F., Wang, X.: Chameleon hash without key exposure based on schnorr
signature. Comput. Stan. Interfaces 31(1), 282–285 (2009)
14. Jain, A., Shanbhag, D.: Addressing security and privacy risks in mobile applications. IT
Prof. 5(14), 28–33 (2012)
15. Wenzheng, L., Xiaofeng, W., Wei, P.: State of the Art: Secure Mobile Payment. IEEE
Access 10, 1–18 (2019)
16. Mahdi, G., Morteza, N.: An anonymous and secure key agreement protocol for NFC
applications using pseudonym. Wirel. Netw. 1–16 (2020)
Internal Factors Affect Knowledge
Management and Firm Performance:
A Systematic Review

Aaesha Ahmed Al Mehrez1, Muhammad Alshurideh1,2, Barween Al Kurdi3, and Said A. Salloum4(&)
1 University of Sharjah, Sharjah, UAE
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. This research paper sets out to prepare a systematic review of two essential topics in any organization: knowledge management (KM) and firm performance (FP). The research aimed to learn more about the relationship between KM and FP. This systematic review covered 41 articles published between 2010 and 2020, drawn from different databases such as ProQuest, Emerald, ScienceDirect, Taylor & Francis, and Google Scholar. The results were filtered by a set of inclusion and exclusion criteria, and the most frequently occurring factors are highlighted as the main findings for future study; a proposed framework can be defined at that stage for future research. A comprehensive analysis then examines the 41 articles and answers the research questions. The main finding of this research is that knowledge management is positively related to firm performance. Many factors affect this relation, such as organizational learning, intellectual capital, human resource management, information technology, soft total quality management, knowledge management practices, strategy, and structure.

Keywords: Systematic review · Knowledge management · Firm performance · Organization performance

1 Introduction

Knowledge management is a process that takes place within the organization, which
helps to find information and knowledge like creating knowledge and sharing it [1, 2].
It plays an essential role in many administrative activities, such as solving problems, generating new ideas, and helping decision-makers reach their decisions [3–6]. There are many benefits of using knowledge management in firms: first, it is used as a wealth-generating tool; second, it rediscovers human resources as the leading and essential resource of knowledge management; third, good knowledge management practices support the firm in remaining competitive with others in the market [7]. The authors of [8] argued that knowledge management as a concept is very complicated because
it exists in the human mind; it is not easy for others to understand, as the knowledge is contextualized and customized [4, 9].
Some researchers have used different related terms when studying knowledge management, such as knowledge-based [10], knowledge management capability [11], strategic knowledge management [12], tacit and explicit knowledge [13], knowledge management practices and systems [14], and knowledge management capabilities [15].

2 Literature Review

2.1 Research Questions


This systematic review is focusing on the relationship between knowledge management
and firm performance and studying the factors and variables that affect this relation-
ship. More specifically, this systematic review will be analyzed by answering the
following questions:
RQ1: What are the main research methods of the selected studies?
RQ2: What are the participating countries in the context of the selected studies?
RQ3: What is the context of the selected studies?
RQ4: How are the KM and FP studies distributed by year of publication?

2.2 Research Objectives


This systematic review is focusing on developing many skills for us as new DBA
students, such as:
1. Developing our research skills and techniques using different databases.
2. Learning the process to conduct a systematic review.
3. Extend our knowledge about the search topic.
4. Explore the factors that affect the relationship between KM and FP.
5. Generate ideas about future studies for the same topic.

2.3 Research Importance

Many firms use KM as a competitive advantage over other organizations. If knowledge is used in the right way among all parties in the organization, everyone will act positively to achieve the organization's goals and objectives. This research will help us as researchers to see what others have done in this field, build on where they stopped, and understand various issues related to our topic of KM and FP, as well as explore the factors that affect the relationship between KM and FP.

3 Literature Review
3.1 Knowledge Management
Knowledge management is an essential tool in any organization. The authors of [16] defined KM as a collection of management practices and techniques used by the organization to distribute information, know-how, expertise, and intellectual capital so that knowledge is used and reused within the organization. Besides that, [17] defined KM practices as a set of managerial actions that support organizational knowledge processes to maximize the value created by organizational knowledge assets. The authors of [18] stated that KM comprises strategies and procedures designed to identify, obtain, structure, value, control, and share an organization's intellectual assets to enrich its performance and competitiveness. The authors of [8] argue that knowledge management as a concept is very complicated because it exists in the human mind; it is not easy for others to understand, as the knowledge is contextualized and customized. [19] contended that successful knowledge management requires not only transferring internalized tacit knowledge into explicit, organized knowledge that can be shared systematically or technically, but also that individuals adopt and make personally meaningful the collected knowledge once it is retrieved from the knowledge management system. The authors of [20] acknowledged that knowledge management is a mix of processes and practices that help the organization gain knowledge-based competitive advantages. Knowledge management has four types of success factors: human-oriented, organization-oriented, technology-oriented, and management-process-oriented [21]. Academics and practitioners have defined the concept of knowledge management in terms of three main activities: knowledge acquisition, knowledge dissemination, and responsiveness to knowledge [22].
The authors of [8] used organizational learning to investigate collaboration, teamwork, freedom, and reward and recognition and their relationship with knowledge management and firm performance, which showed a positive impact. [23] applied transformational leadership, organizational learning, and organizational innovation to examine the relationship between KM and FP. [13] explored the role of tacit and explicit knowledge in converting management innovation into firm performance. The authors of [15] focused their research on decision quality as a primary administrative process that affects knowledge management and FP.

3.2 Organizational Performance


The concept of an organization refers to a group of people working in one place to achieve specific goals and objectives, and the performance of the organization can be assessed by checking the achievements of that group [24–26]. Older scholars defined this concept as the ability to achieve the organization's targets [27–29]. [30] described performance at the individual level as work fulfillment, accomplished objectives, and personal change; at the group level, it refers to assurance, cohesion, efficiency, and
profitability; and at the organizational level, it is about benefit, proficiency, efficiency, absenteeism rate, turnover rate, and adaptability. Newer descriptions view performance in terms of organizational effectiveness, productivity, profitability, quality, continuous improvement, work quality, and social responsibility as leading indicators of performance [25, 31–33].

4 Methods

In this assessment, we use a systematic review of knowledge management and firm performance. The research goes through several stages, namely the inclusion/exclusion criteria, data sources, search strategy, and data coding and analysis; the details are given in the following paragraphs.

4.1 Inclusion/Exclusion Criteria


The inclusion and exclusion criteria were set up to guide our research. For inclusion, as seen in Table 1, the searched articles must involve knowledge management and firm performance; second, they should show the relationship between KM and FP; third, the articles should be from the years 2010–2020; fourth, they should be written in English; and finally, no sector restriction is applied. For the exclusion criteria, first, articles with a purely technical (IT) background are excluded; second, old articles on knowledge management and firm performance; and last, articles written in other languages. Table 1 below shows the inclusion and exclusion criteria used in this research.

Table 1. Inclusion and exclusion criteria


Inclusion criteria:
• Should involve knowledge management and firm performance
• Should show the relation between KM and FP
• No sector indicators
• Published 2010–2020
• Should be written in English

Exclusion criteria:
• Articles with IT backgrounds
• Other languages
• Old articles

4.2 Data Sources and Research Strategies


This study applied a systematic approach to answer the research questions; many scholars have used this method, e.g., [34–37]. To conduct this systematic literature review, we used several databases and search engines: ProQuest, Emerald, Google Scholar, Taylor & Francis, and ScienceDirect. The search terms used were "knowledge management AND firm performance" and "knowledge management AND organization performance". We followed the PRISMA statement and used the paper "The impact of knowledge management processes on information systems: a systematic review" [38] as a guideline for our research. In the identification stage, our search found 1361 articles from the mentioned databases and search engines. In the screening stage, we filtered the results and removed the duplicated articles, most of which came from Google Scholar, and limited the results to peer-reviewed articles published from 2010 to 2020; this left 313 articles. After that, we checked the titles and abstracts of all articles, which reduced the number to 207 articles. We then requested the full articles and required the keywords to appear in the title, which reduced the results to 73 articles in the eligibility stage. Finally, we applied the inclusion and exclusion criteria to all articles, which reduced the set to the 41 articles included in the final stage, as seen in Table 2. Figure 1 illustrates the flowchart of the selected studies.

Table 2. Data sources and Databases


No | Database | Articles (Stage 1) | Articles (Stage 2) | Articles (Stage 3) | Articles (Stage 4)
1 | ProQuest | 123 | 89 | 30 | 12
2 | Emerald | 117 | 42 | 12 | 9
3 | ScienceDirect | 98 | 71 | 15 | 11
4 | Taylor & Francis | 34 | 9 | 5 | 1
5 | Google Scholar | 989 | 102 | 11 | 8
Total | | 1361 | 313 | 73 | 41

Fig. 1. PRISMA flowchart for the selected studies

4.3 Data Coding and Analysis


In this research, the coded data for each study included the research method (Fig. 2), the country (Fig. 3), the context (Fig. 4), and the year of publication (Fig. 5), following the approach that [39] used in his paper. This set of questions gives a clear idea about the quality of the papers, and at this stage a researcher can include or exclude articles by answering the following four questions:
RQ1: What are the main research methods of the selected studies?
RQ2: What are the participating countries in the context of the selected studies?
RQ3: What is the context of the selected studies?
RQ4: How are the KM and FP studies distributed by year of publication?
The results of the systematic review of the 41 research articles on knowledge management and firm performance from 2010 to 2020 are presented below according to the answers to these four questions.

5 Results and Analysis


5.1 RQ1: Distribution of Research Methods
In terms of the distribution of studies by research method, the pie chart in Fig. 2 shows that most research papers depend on surveys, as this is the most popular data collection method. Other studies used interviews, and frameworks, quantitative and qualitative designs, literature reviews, mail surveys, and conceptual studies were also used as methods.

Fig. 2. Distribution of studies by research methods

5.2 RQ2: Distribution of Countries


Figure 3 shows that China produced the most research articles on knowledge management and firm performance, reflecting its position as one of the leading industrial countries. Spain and Iran come second with four articles each, and India, Taiwan, and Malaysia come third. There was one article from each of the following countries: South Korea, the UK, South Africa, Finland, Slovenia, Bahrain, Serbia, Egypt, Pakistan, Brazil, Bangladesh, and Japan.

Fig. 3. Research studies by countries.

5.3 RQ3: Distribution of Research Context


In terms of context, this systematic review covered a wide variety of settings, as seen in Fig. 4: the banking sector, investment banks, HR organizations, high-technology firms, electronic and electrical firms, hospitality services, domestic companies, manufacturing organizations, SMEs, technology and IT firms, cross-sectional industries, the textile industry, family businesses, PLU-SMEs, engineering firms, and firms that adopted a knowledge management system. The chart shows that small and medium enterprises account for six articles, manufacturing firms come second, and technology and IT firms come third.

Fig. 4. Distribution of studies in terms of context/discipline



5.4 RQ4: Distribution of Studies by Year of Publication


In terms of publication year, Fig. 5 illustrates the distribution of the studies investigating knowledge management and firm performance from 2010 to 2020; the total number of research papers is 41. There are three articles in each of 2010, 2012, and 2016, and four articles in each of 2011, 2014, and 2017. The number of articles reaches six in each of 2013, 2015, and 2019, while it dropped to two articles in 2018.

Fig. 5. A number of studies according to the year

6 Conclusion

This systematic review of knowledge management and firm performance yields several findings. First, many factors recur across the studies, such as culture, structure, strategy, technology, intellectual capital, leadership, process, and rewards; most of these repeated elements are located within the organization and the firm, which lights the path for future study of the internal drivers that affect KM and FP. Second, most of the articles showed a positive relationship between knowledge management and firm performance, indicating a cumulative connection between KM and FP: if there is excellent knowledge management in the firm, there will be top firm performance. Third, the survey was found to be the primary data collection method, used in 78% of the research. Fourth, 87.8% of the articles showed positive research outcomes, 7.3% negative, and 4% were not applicable. Several limitations were mentioned in this systematic review. First, many articles highlighted limitations in the theoretical part as well as in the conceptual model, which needed more constructs. Second, many articles focused on specific contexts and countries, which makes it difficult to generalize the findings. Third, small sample sizes and low reliabilities were reported in many articles. Fourth, some studies did not take into consideration other possible mediators and moderators, such as trust, policy, rewards, etc. Finally, many biases were reported in many articles.

References
1. Altamony, H., Alshurideh, M., Obeidat, B.: Information systems for competitive advantage:
implementation of an organizational strategic management process. In: Proceedings of the
18th IBIMA Conference on Innovation and Sustainable Economic Competitive Advantage:
From Regional Development to World Economic, Istanbul, Turkey, 9–10 May (2012)
2. Shannak, R., Masa’deh, R., Al-Zu’bi, Z., Obeidat, B., Alshurideh, M., Altamony, H.: A
theoretical perspective on the relationship between knowledge management systems,
customer knowledge management, and firm competitive advantage. Eur. J. Soc. Sci. 32(4),
520–532 (2012)
3. AlShurideh, M., Alsharari, N.M., Al Kurdi, B.: Supply chain integration and customer
relationship management in the airline logistics. Theor. Econ. Lett. 9(02), 392–414 (2019)
4. Salloum, S.A., Al-Emran, M., Shaalan, K.: Mining social media text: extracting knowledge
from Facebook. Int. J. Comput. Digit. Syst. 6(2), 73–81 (2017)
5. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: International Conference on Knowledge Management in Organiza-
tions, pp. 94–106 (2018)
6. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: 13th International Conference, KMO 2018 (2018)
7. Baker, J.H.: Is servant leadership part of your worldview? weLEAD Online Magazine,
January 2001
8. Jain, A.K., Moreno, A.: Organizational learning, knowledge management practices and
firm’s performance: an empirical study of a heavy engineering firm in India. Learn. Organ.
22(1), 14–39 (2015)
9. Mhamdi, C., Al-Emran, M., Salloum, S.A.: Text mining and analytics: a case study from
news channels posts on Facebook, vol. 740 (2018)
10. Wu, I.L., Chen, J.L.: Knowledge management driven firm performance: the roles of business
process capabilities and organizational learning. J. Knowl. Manag. 18(6), 1141–1164 (2014)
11. Habib, A., Bao, Y.: Impact of knowledge management capability and green supply chain
management practices on firm performance. Int. J. Res. Bus. Soc. Sci. (2147–4478) 8(5),
240–255 (2019)
12. Davila, G., Varvakis, G., North, K.: Influence of strategic knowledge management on firm
innovativeness and performance. Braz. Bus. Rev. 16(3), 239–254 (2019)
13. Magnier-Watanabe, R., Benton, C.: Management innovation and firm performance: the
mediating effects of tacit and explicit knowledge. Knowl. Manag. Res. Pract. 15(3), 325–335
(2017)
14. Santosh, Y.K., Dennis, J., Jigeesh, N.: The influence of knowledge management practices
and systems on firm performance. Int. J. Appl. Eng. Res. 10(18), 39338–39344 (2015)
15. Yu, H., Shang, Y., Wang, N., Ma, Z.: The mediating effect of decision quality on knowledge
management and firm performance for Chinese entrepreneurs: an empirical study.
Sustainability 11(13), 1–15 (2019)
16. Iandoli, L.: Organizational Cognition and Learning: Building Systems for the Learning
Organization: Building Systems for the Learning Organization. IGI Global, London (2007)
17. Kianto, A., Andreeva, T.: Knowledge management practices and results in service-oriented
versus product-oriented companies. Knowl. Process Manag. 21(4), 221–230 (2014)
18. Zaied, A.N.H.: An integrated knowledge management capabilities framework for assessing
organizational performance. Int. J. Inf. Technol. Comput. Sci. 4(2), 1–10 (2012)
19. Nonaka, I., Takeuchi, H.: The Knowledge-Creating Company: How Japanese Companies
Create the Dynamics of Innovation. Oxford University Press, New York (1995)
20. Alavi, M., Leidner, D.E.: Knowledge management and knowledge management systems:
conceptual foundations and research issues. MIS Q. 25, 107–136 (2001)
21. Heisig, P.: Harmonisation of knowledge management–comparing 160 KM frameworks
around the globe. J. Knowl. Manag. 13(4), 4–31 (2009)
22. Darroch, J., McNaughton, R.: Beyond market orientation. Eur. J. Mark. 37(3/4), 572–593
(2003)
23. Noruzy, A., Dalfard, V.M., Azhdari, B., Nazari-Shirkouhi, S., Rezazadeh, A.: Relations
between transformational leadership, organizational learning, knowledge management,
organizational innovation, and organizational performance: an empirical investigation of
manufacturing firms. Int. J. Adv. Manuf. Technol. 64(5–8), 1073–1085 (2013)
24. Abu Zayyad, H.M., Obeidat, Z.M., Alshurideh, M.T., Abuhashesh, M., Maqableh, M.,
Masa’deh, R.: Corporate social responsibility and patronage intentions: the mediating effect
of brand credibility. J. Mark. Commun. 1–24 (2020)
25. Alkalha, Z., Al-Zu’bi, Z., Al-Dmour, H., Alshurideh, M., Masa’deh, R.: Investigating the
effects of human resource policies on organizational performance: an empirical study on
commercial banks operating in Jordan. Eur. J. Econ. Financ. Adm. Sci. 51(1), 44–64 (2012)
26. Alshraideh, A., Al-Lozi, M., Alshurideh, M.: The impact of training strategy on
organizational loyalty via the mediating variables of organizational satisfaction and
organizational performance: an empirical study on Jordanian agricultural credit corporation
staff. J. Soc. Sci. 6, 383–394 (2017)
27. Alshurideh, M., Masa’deh, R., Al kurdi, B.: The effect of customer satisfaction upon
customer retention in the Jordanian mobile market: an empirical investigation. Eur. J. Econ.
Financ. Adm. Sci. 47(12), 69–78 (2012)
28. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84 (2012)
29. Sloma, R.: How to Measure Managerial Performance. Beard Books, Washington (1999)
30. Ivancevich, J.M.: Different goal setting treatments and their effects on performance and job
satisfaction. Acad. Manag. J. 20(3), 406–419 (1977)
31. Alshurideh, M.T., et al.: The impact of Islamic bank’s service quality perception on
Jordanian customer’s loyalty. J. Manag. Res. 9, 139–159 (2017)
32. Obeidat, B., Sweis, R., Zyod, D., Alshurideh, M.: The effect of perceived service quality on
customer loyalty in internet service providers in Jordan. J. Manag. Res. 4(4), 224–242
(2012)
33. Bolat, T., Yilmaz, Ö.: The relationship between outsourcing and organizational performance:
is it myth or reality for the hotel sector? Int. J. Contemp. Hosp. Manag. 21(1), 7–23 (2009)
34. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
35. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
36. Alshurideh, M.T., Assad, N.F.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
37. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
38. Al-Emran, M., Mezhuyev, V., Kamaludin, A., Shaalan, K.: The impact of knowledge
management processes on information systems: a systematic review. Int. J. Inf. Manage. 43
(July), 173–187 (2018)
39. Al-Emran, M., Mezhuyev, V., Kamaludin, A.: Technology acceptance model in M-learning
context: a systematic review. Comput. Educ. 125, 389–412 (2018)
Enhancing Our Understanding
of the Relationship Between Leadership,
Team Characteristics, Emotional Intelligence
and Their Effect on Team Performance:
A Critical Review

Fatima Saeed Al-Dhuhouri1, Muhammad Alshurideh1,2, Barween Al Kurdi3, and Said A. Salloum4(&)
1 University of Sharjah, Sharjah, UAE
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. Teams are the core base of every business and institute, targeted
toward achieving particular common projects and strategies through effective
collaboration. Many scholars studied the various variables that affect the team
performance level. Recently, they shed light on the psychological factors,
including emotional intelligence (EI). This paper aims at elevating our understanding of the relationship between leadership, team characteristics, and emotional intelligence and their effect on team performance through the application of a systematic review approach to review and synthesize EI studies related to team performance, providing a comprehensive analysis of 19 research articles from 2010 to 2020. The main findings include the discovery of 14 external factors that have a relationship with EI and team performance in two or more of the relevant studies. Most importantly, 6 of the 14 factors were selected, all of which demonstrated a positive relationship with EI and team performance. Moreover, a quantitative approach was the research method most relied upon for data collection. Additionally, most of the analyzed studies were undertaken in the US, followed by Australia and other countries. Finally, most of the analyzed studies were conducted in an academic context, followed by construction and other contexts.

Keywords: Leadership · Team characteristics · Flexibility · Team size · Team diversity · Emotional intelligence · Trust · Homogenization · Team performance

1 Introduction

Highly performing teams are considered a competitive advantage for the organization’s
survival in a dynamic business environment [1–4]. There are various internal and external factors that impact team performance. Recently, researchers have shed light on the
psychological factors, including emotional intelligence (EI). Empirical evidence proved


that there is a positive association between EI and team performance. Various studies
analyzed EI’s effect on team performance from different angles, including “knowledge
sharing, team characteristics, leadership and so much more”. The present systematic
review aims to analyze collected studies to elevate our understanding of the factors that
impact team performance through a psychological lens (Emotional intelligence, EI).
The systematic review provides a means of methodical search and analysis of all
relevant empirical studies using a scientific approach to provide a comprehensive
explanation of research results [5, 6]. It uses readily available evidence to provide
information for decision-makers and for developing new research insight [7, 8].

1.1 Research Importance


The results and findings of this systematic review will provide value for organizations by clarifying the main variables, mediators, and moderators that promote team performance in the presence of emotional intelligence (EI), which will enable them to manage their performance. Specifically, this study provides critical information regarding the significance of leadership and team characteristics for emotional intelligence and team performance. The results can also be used to evaluate and develop strategies for enhancing the team's attitudes, skills, and knowledge.

1.2 Research Questions


This paper aims to answer a critical research question: “Does emotional intelligence
(EI) affect team performance and what are the roles of leadership, team characteristics,
trust, and team homogenization in that relationship?”.

1.3 Research Objectives


We enlisted a set of research objectives to answer our research question:
• Objective 1: To assess the effect of EI on team performance.
• Objective 2: Evaluating the impact of trust as a mediator between EI and team
performance.
• Objective 3: Evaluating the effects of homogenization as a mediator between EI and
team performance.
• Objective 4: To assess the effect of leadership on EI.
• Objective 5: To assess the effect of team characteristics (team diversity, flexibility,
and size) on EI.
• Objective 6: To highlight the significance and importance of emotional intelligence
among leaders, team members in the work setting.

2 Literature Review
2.1 Emotional Intelligence
EI is generally defined as the ability to recognize and regulate one's own and others' emotions in order to steer individuals' emotions, thoughts, and actions [9]. A growing number of researchers have recently paid attention to emotional intelligence (EI) and its importance. Wayne Payne coined the term "emotional intelligence" in 1985; since then, EI has been considered an influential area [9]. There are four EI models with basic dimensions, namely Mayer and Salovey, Bar-On, Goleman, and Cooper & Sawaf, each with different definitions and dimensions [10]. EI occurs not only at the individual level but also at the team level, as noted in [11].

2.2 Team Performance


Various studies have shown that EI affects team performance, although the relationship is not likely to be direct [11–13]. [14] defined teams as "identifiable social work units consisting of two or more people with several unique characteristics". Simply put, team performance is defined as the degree to which the team is able to achieve its output goals.

2.3 Trust
Trust is defined (according to [15], p. 605) as a "multicomponent variable consisting of 4 distinct but related dimensions; propensity to trust, perceived trustworthiness, co-operative and lack of monitoring behaviors". As pointed out by many scholars, such as [16–21], intrateam trust promotes cooperation, task support, communication, and coordination (known as positive interpersonal dynamics). Simply put, team members characterized by high EI tend to regulate their own emotions and show intense sensitivity, recognition, and responsiveness to others' emotions, leading to the demonstration of dependability and trustworthiness [21].

2.4 Team Homogenization


Team homogenization is considered the opposite of team conflict. Intrateam conflict is defined as "the process arising from perceived incompatibilities or differences between team members" ([22], p. 116). Homogenized teams are capable of regulating and understanding their emotions (high EI), leading to high team performance [11].

2.5 Leadership
Researchers in [23] provided an integrative definition of leadership; “A leader is one or
more people who select, equips, trains, and influences one or more follower(s) who
have diverse gifts, abilities, and skills and focuses the follower(s) to the organization’s
mission and objectives causing the follower(s) to willingly and enthusiastically expend
spiritual, emotional, and physical energy in a concerted, coordinated effort to achieve
the organizational mission and objectives”. Leaders have a significant influence on


team performance, processes, and characteristics [21]. Successful leaders exhibit emotions properly and identify others' emotions effectively [24]. As a result, leaders influence teams' emotions and values, which eventually affect their performance [24].

2.6 Team Characteristics


In this paper, we underlined three-team characteristics, including; team flexibility, team
size, and diversity.

2.7 Team Flexibility


Researchers in [25] defined flexibility as an "individual/organization able to be resilient, adaptable and proactive, in addition to its capacity to change and to adapt to the challenging environment". It has been shown to act as an independent variable affecting emotional interactions within teams and their performance by enabling them to view problems from different angles and find unconventional solutions [25].

2.8 Team Size


Researchers in [26] discovered that a large team size promotes a positive relationship between EI and team member information.

2.9 Team Diversity


It is defined as the degree to which team members differ from each other [26]. Previous
research grouped team diversity into two categories; one is information-related attri-
butes such as educational background, and the other is non-informational diversity such
as personality and demographics. In this study, we will focus on non-informational
diversity. [26] discovered that high non-informational team diversity facilitates the
positive relationship between EI and team performance.

3 Methods

In this section, we followed certain steps, as follows: determination of inclusion and


exclusion criteria, data sources and literature search strategies, and quality assessment.
The details of each section are illustrated in the following paragraphs.

3.1 Inclusion and Exclusion Criteria


The critically analyzed articles in this systematic review must meet the inclusion and
exclusion criteria [5, 27–29] described in the following lines:

The inclusion criteria:


1. Date: published in the period 2010 to 2020
2. Language: English.
3. Study design: meta-analyses, randomized, and controlled (any).
4. Should include emotional intelligence impact on team performance.
5. Should consider team level.
The exclusion criteria:
1. Non-English papers.
2. Articles that consider individual level.

3.2 Data Sources and Literature Search Strategy


We conducted a systematic review to achieve our research objective in which we
obtained multiple studies from various databases, including ProQuest, Google scholar,
and Wiley Online Library. We used different search keywords to identify relevant
articles (Table 1).

Table 1. Keyword search


Keyword Search
“Emotional Intelligence” AND “Team Performance”
“EI” AND “Team Performance”

After identifying the keywords, we conducted an initial search in multiple databases


(refer to Table 2).

Table 2. Initial search results across the databases.


Journal database | "Emotional Intelligence" AND "Team Performance" | "EI" AND "Team Performance"
ProQuest | 11 (search format: ti(Emotional Intelligence) AND ti(Team Performance)) | 20 (search format: ti(EI) AND ti(Team Performance))
Google Scholar | 15 | 10
Wiley Online Library | 2 (search format: "Emotional Intelligence" in Title and "team performance" in Title) | 0 (search format: "EI" in Title and "team performance" in Title)
Total | 28 | 30

Following the search, 58 studies were obtained based on the search terms described in Table 1. Afterward, we screened the results to remove articles that did not align with our inclusion/exclusion criteria (26) and to remove duplicates (13). Thus, the total number of collected articles became 19, each of which satisfied the inclusion and exclusion criteria.

3.3 Quality Assessment


In addition to the inclusion and exclusion criteria, we applied a quality assessment analysis. Researchers in [30] proposed a quality assessment checklist (Table 3) that provides a means of evaluating the quality of the studies. The quality assessment technique enables us to select appropriate studies that provide sufficient information and comply with the key research components, in order to elevate the validity of the systematic review.

Table 3. Quality assessment checklist.


# Question
1 Are the research aims specified?
2 Was the study designed to achieve these aims?
3 Are the variables considered by the study specified?
4 Is the study context/discipline specified?
5 Are the data collection methods adequately detailed?
6 Does the study explain the reliability/validity of the measures?
7 Are the statistical techniques used to analyze the data adequately described?
8 Do the results add to the literature?
9 Does the study add to your knowledge or understanding?

Based on this comprehensive quality assessment checklist, we analyzed the 19 studies (Table 4). We found three low-quality articles (S2, S4, and S11) that do not provide sufficient information (with a shortage of key research components). However, most of the selected articles contained sufficient information, including a clear research aim, clearly specified variables, a detailed data collection method, and so on. Also, 2 of the 19 selected articles provided full and sufficient information covering all key research components.
As shown in Fig. 1, we illustrated the systematic review process and the number of
studies obtained at each stage.
Once valid articles were identified, and their qualities were assessed, we presented
all the factors (found in the studies).

Table 4. Quality Assessments of the selected studies.


SN Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Total
S1 1 1 1 1 1 1 1 1 1 100%
S2 1 0 0 0 0 0 0 0.5 0.5 22%
S3 1 1 1 1 1 1 1 1 1 100%
S4 1 1 0.5 0.5 0 0 0 0.5 0.5 44%
S5 0.5 0.5 1 1 1 0.5 0.5 0.5 1 72%
S6 1 1 0.5 1 1 0.5 1 1 1 88%
S7 0.5 1 1 1 0.5 1 1 0.5 0.5 77%
S8 1 0.5 1 1 0.5 0.5 1 1 0.5 77%
S9 0.5 1 1 0.5 1 1 1 1 0.5 83%
S10 1 0.5 1 1 1 1 1 0.5 1 88%
S11 1 0 1 1 0 0 0 1 1 55%
S12 1 1 1 0.5 1 0.5 1 0.5 0.5 77%
S13 1 0.5 0.5 1 0.5 0.5 1 0.5 0.5 67%
S14 1 1 0.5 1 0.5 1 1 1 1 89%
S15 1 1 1 0.5 1 1 0.5 1 1 89%
S16 1 1 1 1 1 1 0.5 1 1 94%
S17 1 0 0 0 0 0 0 0.5 1 28%
S18 1 1 1 1 1 1 0.5 1 1 94%
S19 1 1 0.5 1 1 1 1 1 1 94%

Fig. 1. Systematic review process.



4 Result and Discussion

Overall, (38) external factors were identified and assessed in the (22) studies. However,
it was determined that only (14) external factors had a relationship with team perfor-
mance in two or more of the relevant studies.

4.1 Distribution of the Studies in Term of Disciplines


We report the outcomes of this systematic review, which analyzes 19 research articles on EI models in team management, with respect to three research questions.
RQ1: Distribution of studies in terms of research methods.


Fig. 2. Distribution of studies in terms of research methods

We categorized each collected study according to their research method (quantitative,


qualitative, experimental, and others). As shown in Fig. 2, most of the articles used the
quantitative research method (in the form of questionnaires) as a primary tool.

(Chart categories for Fig. 3: Malaysia, China, Turkey, USA, Australia, India, Spain, Norway, and not specified; horizontal axis: number of studies, 0–8.)

Fig. 3. Distribution of countries

RQ2: Distribution of countries.


Furthermore, we distributed the collected studies (N = 19) across the countries that
carried out these studies. Figure 3 shows that most of the studies were conducted in the
USA, followed by Australia and others.


Fig. 4. Distribution of studies in terms of publication year

RQ3: Distribution of studies in terms of publication year.

Also, we analyzed and distributed the collected articles across their publication years (refer to Fig. 4). As observed, the studies range from 2011 to 2020, and the number of studies peaked in 2013 (with four studies).

4.2 Distribution of the Frequency of the Factor


Overall, 44 external factors were identified and assessed in the 19 studies. However, it was determined that only 8 external factors (awareness of own emotions, management of own emotions, awareness of others' emotions, management of others' emotions, EI, intrateam trust, heterogeneity, and conflict management) had a relationship with team performance in two or more of the relevant studies, as shown in Fig. 5.


Fig. 5. Distribution of the frequency of the factor.

5 Conclusion

The impact of emotional intelligence (EI) on team performance has not been completely investigated; however, many scholars have proposed and tested various interesting models. By analyzing them in the form of a systematic review, a holistic framework was established in which we linked various factors (leadership, team flexibility, team diversity, team size, trust, and homogenization) to EI and team performance. Specifically, we collected 19 articles, mostly from the US, using three databases (ProQuest, Google Scholar, and Wiley Online Library). Most of the studies were conducted in 2013. Furthermore, the majority of them used a quantitative research method (in the form of questionnaires) as a primary tool. Besides, most of the analyzed studies were conducted in an academic context, followed by construction and other contexts. As a limitation, this paper relied on specific databases for collecting the articles (ProQuest, Google Scholar, and Wiley Online Library), which may not provide a comprehensive representation of all studies related to EI and team performance. Therefore, we recommend expanding the current study by including studies from other databases such as SAGE and SCOPUS. Furthermore, we recommend expanding the current study to include the effect of EI on team effectiveness.

References
1. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84–95 (2012)
2. Ghannajeh, A., et al.: A qualitative analysis of product innovation in Jordan’s pharmaceu-
tical sector. Eur. Sci. J. 11(4), 474–503 (2015)
3. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: International Conference on Knowledge Management in Organiza-
tions, pp. 94–106 (2018)
4. Altamony, H., Alshurideh, M., Obeidat, B.: Information systems for competitive advantage:
implementation of an organizational strategic management process. In: Proceedings of the
18th IBIMA Conference on Innovation and Sustainable Economic Competitive Advantage:
From Regional Development to World Economic, Istanbul, Turkey, 9th–10th May 2012
5. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
6. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
7. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
8. Nedal Fawzi Assad, M.T.A.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
9. Syed, F., Rafiq, A., Ahsan, B., NadeemMajeed, M.: An efficient framework based on
emotional intelligence to improve team performance in developing countries. Int. J. Mod.
Educ. Comput. Sci. 5(12), 16 (2013)
10. Bal, C.G., Firat, I.: The impact of emotional intelligence on team performance and learning
organization of employees. Int. J. Acad. Res. Bus. Soc. Sci. 7(7), 304–325 (2017)
11. Rezvani, A., Barrett, R., Khosravi, P.: Investigating the relationships among team emotional
intelligence, trust, conflict and team performance. Team Perform. Manage.: Int. J. (2019)
12. Alshurideh, A.A., Al kurdi, B.: The effect of internal marketing on organizational citizenship
behavior an applicable study on the University of Jordan employees. Int. J. Mark. Stud. 7(1),
138 (2015)
13. Alshurideh, M.: The factors predicting students’ satisfaction with universities’ healthcare
clinics’ services: a case-study from the Jordanian higher education sector. Dirasat. Adm. Sci.
161(1524), 1–36 (2014)
14. Salas, E., Rosen, M.A., King, H.: Managing teams managing crises: principles of teamwork
to improve patient safety in the emergency room and beyond. Theor. Issues Ergon. Sci. 8(5),
381–394 (2007)
15. Costa, A.C.: Work team trust and effectiveness. Pers. Rev., 605–622 (2003)
16. Alshurideh, et al.: Loyalty program effectiveness: theoretical reviews and practical proofs.
Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
17. Ashurideh, M.: Customer service retention–A behavioural perspective of the UK mobile
market. Durham University (2010)
18. Al Dmour, H., Alshurideh, M., Shishan, F.: The influence of mobile application quality and
attributes on the continuance intention of mobile shopping. Life Sci. J. 11(10), 172–181
(2014)
19. Alshurideh, M., Masa’deh, R., Al kurdi, B.: The effect of customer satisfaction upon
customer retention in the Jordanian mobile market: an empirical investigation. Eur. J. Econ.
Finan. Adm. Sci. 47(12), 69–78 (2012)
20. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of e-
service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and e-
trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
21. Chang, J.W., Sy, T., Choi, J.N.: Team emotional intelligence and performance: interactive
dynamics between leaders and members. Small Gr. Res. 43(1), 75–104 (2012)
22. Greer, L.L., Caruso, H.M., Jehn, K.A.: The bigger they are, the harder they fall: linking team
power, team conflict, and performance. Organ. Behav. Hum. Decis. Process. 116(1), 116–
128 (2011)
23. Winston, B.E., Patterson, K.: An integrative definition of leadership. Int. J. Leadersh. Stud. 1
(2), 6–66 (2006)
24. Mishra, N., Mishra, R., Singh, M.K.: The impact of transformational leadership on team
performance: the mediating role of emotional intelligence among leaders of hospitality and
tourism sector. Int. J. Sci. Technol. Res. 8(11), 3111–3117 (2019)
25. Günsel, A., Açikgöz, A.: The effects of team flexibility and emotional intelligence on
software development performance. Gr. Decis. Negot. 22(2), 359–377 (2013)
26. Paik, Y., Seo, M.-G., Jin, S.: Affective information processing in self-managing teams: The
role of emotional intelligence. J. Appl. Behav. Sci. 55(2), 235–267 (2019)
27. Salloum, S.A.S., Shaalan, K.: Investigating students’ acceptance of E-learning system in
higher educational environments in the UAE: applying the extended technology acceptance
model (TAM). The British University in Dubai (2018)
28. Salloum, S.A., Alhamad, A.Q.M., Al-Emran, M., Monem, A.A., Shaalan, K.: exploring
students’ acceptance of e-learning through the development of a comprehensive technology
acceptance model. IEEE Access 7, 128445–128462 (2019)
29. Alhashmi, S.F.S., Salloum, S.A., Abdallah, S.: Critical success factors for implementing
artificial intelligence (AI) projects in Dubai Government United Arab Emirates
(UAE) Health Sector: Applying the Extended Technology Acceptance Model (TAM), vol.
1058 (2020)
30. Kitchenham, S., Charters, B.: Guidelines for performing systematic literature reviews in
software engineering. Softw. Eng. Group Sch. Comput. Sci. Math. Keele Univ. 1–57 (2007)
Factors Affect Customer Retention: A Systematic Review

Salama S. Alkitbi1, Muhammad Alshurideh1,2, Barween Al Kurdi3, and Said A. Salloum4(&)

1 University of Sharjah, Sharjah, UAE
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. Many review studies have been conducted to provide valuable insights into
customer retention and the factors that could influence it positively and effectively.
This study systematically reviews and analyzes customer retention and its related
factors across 30 research studies published from 2005 to 2019. The main findings
indicate that the most common factors affecting customer retention are service quality,
satisfaction, trust, and commitment. Moreover, most customer retention studies focused
on the banking sector, followed by studies concerned with retail industry issues.
Additionally, most of the studies were undertaken in Indonesia, followed by Nigeria and
India. The findings of this review provide an overview of the current studies and
analyses of customer retention and the factors that affect it.

Keywords: Customer retention · Satisfaction · Trust · Commitment

1 Introduction

Generally, it is recognized that there is a positive relationship between customer
retention and profitability. Customer retention enables a company to increase
profitability and revenue [1–3], so even a small increase in customer retention could
have a positive impact on profitability [4–6]. According to [7, 8], customer retention
reflects a customer's intention to repurchase a service or a product from the service
provider. In [9], the authors defined customer retention as "the future propensity of a
customer to stay with the service provider". Finding customers and retaining them in a
long-term relationship should be a continuous process [10–12].
The purpose of this study is to review previous studies and to identify the main
factors affecting the retention of existing customers. In addition, the study aims to
investigate the impact of customer satisfaction, trust, and commitment on customer
retention in different sectors and contexts. The review addresses the following six
research questions:

RQ1: What are the main disciplines/contexts of the selected studies?
RQ2: In which countries were the selected studies conducted?
RQ3: How are the selected studies distributed across their year of publication?
RQ4: What are the research methods of the selected studies?
RQ5: What are the active databases in the context of customer retention?
RQ6: What are the main factors that influence customer retention?
Thus, the importance of this study lies in its attempt to contribute and add value to the
extant literature by covering up-to-date research on customer retention and by exploring
some of the main factors that influence it across different countries, such as customer
satisfaction, customer trust, and customer commitment. It also identifies contexts and
disciplines in which customer retention and its influencing factors have not yet been
examined explicitly and in detail.
The structure of this review study is as follows. Section 2 reviews the literature
related to customer satisfaction, customer retention, trust, and commitment to explore
the relationships between these concepts. Section 3 presents the study methodology,
covering the inclusion/exclusion criteria, data sources and search strategies, quality
assessment, and data coding and analysis. The fourth section discusses the results,
followed by the proposed model and hypotheses. Finally, the fifth section presents the
conclusion with limitations and some recommendations.

2 Literature Review

Customer retention occurs when companies can fulfill customer expectations and
maintain them in long-term relationships that secure long-term buying decisions
[13–15]. The topic of customer retention is commonly discussed in business economics
from the perspective of relationship marketing, which treats customer relationships as a
primary concern with the long-term objective of developing and maintaining them
[16–18]. Many previous studies indicated that companies should continuously manage
customer satisfaction to reach the retention stage. According to [19], "satisfaction is an
overall customer attitude towards a service provider". In [20], the authors added that
satisfaction is an emotional reaction regarding what customers expect and what they
receive, including the fulfillment of needs and goals. Customer retention represents a
desired future outcome of satisfaction, so a long-term relationship is demonstrated by
satisfaction. Although customer satisfaction does not guarantee repurchase, it still plays
a vital role in ensuring customer retention. While many studies on customer retention
have long focused on customer satisfaction, additional factors such as trust and
commitment are also reported to influence customer retention. [21], in "The
Commitment-Trust Theory of Relationship Marketing", one of the most influential works
in relationship marketing, suggests that commitment and trust are at the center of
successful relationship marketing. The authors stressed that commitment and trust build
a positive relationship between a company and its customers and encourage efficiency,
productivity, and effectiveness. The degree of trust between service provider and
customer is significantly influenced by the quality of the service, which results in an
effective commitment to the provider, and
enhancing commitment is important since it leads to an intention to invest further and
reinforce the relationship with the provider.
As mentioned in the previous section, this review study attempts to highlight the
recent findings of scholars on customer retention and to explore some new or
under-researched contexts and countries that the literature has not covered sufficiently
and in which the relationship of customer retention with satisfaction, trust, and
commitment has not been investigated and examined.

3 Methods

The process of conducting this systematic review was guided by Kitchenham and
Charters's guidelines [22], a method used by many scholars such as [23–26].
Accordingly, the review was conducted in four stages: the identification of inclusion
and exclusion criteria, data sources and search strategies, quality assessment, and data
coding and analysis. The details of these phases are given in the following sub-sections.

3.1 Inclusion/Exclusion Criteria


The studies that were critically analyzed in this review study meet the inclusion and
exclusion criteria described in Table 1.

Table 1. Inclusion and exclusion criteria.

Inclusion criteria:
- Should involve customer retention, which should appear in the title
- Should involve one or all of the terms satisfaction, trust, and commitment, in the title, abstract, or anywhere in the document
- Should be written in English
- Should be published between 2005 and 2019
- Could include articles, reports, or theses

Exclusion criteria:
- Any study that does not mainly discuss customer retention

3.2 Data Sources and Research Strategies


The studies were identified using different keywords related mainly to customer
retention combined with other factors such as satisfaction, trust, and commitment, in
several search rounds, until a suitable number of related and targeted articles and
studies had been reached for selection and analysis. Table 2 shows the keyword
searches. The search process
was conducted through five online sources: four journal databases (ProQuest,
ScienceDirect, Taylor, and Emerald) and Google Scholar as a search engine. The search
was run several times on different days, starting on Sunday, 26 January 2020, with the
last search on Saturday, 1 February 2020. The initial search, which looked for customer
retention in the title and for the other keywords in the title, abstract, or anywhere in the
document across the online databases and Google Scholar, returned 137 studies. The
studies were downloaded and then filtered to remove 20 duplicates, leaving 117 studies;
their distribution across the databases is presented in Table 3. After applying the
inclusion and exclusion criteria, 105 studies remained. After a further screening to
focus on studies that contain a substantive discussion and analysis of customer
retention and the targeted factors that influence it, 30 studies were selected for analysis
in this systematic review. These 30 studies cover various sectors and countries and were
published within the last 15 years. The whole process is described in Fig. 1.
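For illustration only, the following Python sketch shows how the deduplication and inclusion screening described above and in Table 1 could be expressed programmatically; the Record type, its fields, and the helper names are hypothetical and are not part of the original review.

```python
# Hypothetical sketch of the study-selection pipeline described above; the Record
# fields and helper names are illustrative and not part of the original review.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    title: str
    abstract: str
    year: int
    language: str
    kind: str  # "article", "report", or "thesis"


def meets_inclusion_criteria(r: Record) -> bool:
    text = (r.title + " " + r.abstract).lower()
    return (
        "customer retention" in r.title.lower()            # retention must appear in the title
        and any(t in text for t in ("satisfaction", "trust", "commitment"))
        and r.language.lower() == "english"
        and 2005 <= r.year <= 2019
        and r.kind in ("article", "report", "thesis")
    )


def select_studies(raw_records: List[Record]) -> List[Record]:
    deduplicated = list(dict.fromkeys(raw_records))        # drop exact duplicates, keep order
    return [r for r in deduplicated if meets_inclusion_criteria(r)]
```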

Table 2. Keyword searches.

Keyword search
“Satisfaction” AND “Customer retention”
“Satisfaction” AND “Trust” AND “Customer retention”
“Satisfaction” AND “Trust” AND “Commitment” AND “Customer retention”

Table 3. The total number of articles.


Journal databases Frequency
ProQuest 51
ScienceDirect 22
Taylor 23
Emerald 15
Google Scholar 26
Total 137
Fig. 1. PRISMA flowchart for the selected studies.

3.3 Quality Assessment


Quality assessment is another check that can be applied to examine the studies'
quality, alongside the inclusion and exclusion criteria. A quality assessment checklist
with nine criteria, adapted from those suggested by [22], was formulated to evaluate the
quality of the research studies selected for further analysis (N = 30); the checklist is
given in Table 4. Each question was scored on a three-point scale, with "Yes" worth
1 point, "Partially" worth 0.5 points, and "No" worth 0 points. Thus, each study could
score between 0 and 9, and the higher the total score, the better the study addresses the
research questions. Table 5 reports the quality assessment results for all 30 studies. All
of the studies passed the quality assessment, which means that all of them are qualified
to be used for analysis.
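As a minimal illustration of this scoring scheme, the short Python sketch below computes a study's total score and percentage from its nine checklist answers; the example answer pattern is hypothetical.

```python
# Illustrative computation of the quality scores described above; the example
# answer pattern is hypothetical.
SCORES = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}


def quality_score(answers):
    """answers: a 'Yes'/'Partially'/'No' judgement for each of the nine checklist questions."""
    assert len(answers) == 9, "the checklist has nine criteria"
    total = sum(SCORES[a] for a in answers)
    return total, round(total / 9 * 100)


# A study rated 'Partially' on Q1 and 'Yes' on the rest scores 8.5 (about 94%),
# on the same scale used in Table 5.
print(quality_score(["Partially"] + ["Yes"] * 8))  # (8.5, 94)
```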

Table 4. Quality assessment criteria.


# Question
1 Are the research aims clearly specified?
2 Was the study designed to achieve these aims?
3 Are the variables considered by the study clearly specified?
4 Is the study context/discipline clearly specified?
5 Are the data collection methods adequately detailed?
6 Does the study explain the reliability/validity of the measures?
7 Are the statistical techniques used to analyze the data adequately described?
8 Do the results add to the literature?
9 Does the study add to your knowledge or understanding?
Table 5. Quality assessment results


Study Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Total Percentage
1. 0.5 1 1 1 1 1 1 0.5 0.5 7.5 83%
2. 1 1 1 1 1 1 1 0.5 0.5 8 88%
3. 1 1 1 0.5 1 1 1 0.5 0.5 7.5 83%
4. 1 1 0.5 1 1 1 1 0.5 0.5 7.5 83%
5. 1 1 1 1 1 1 1 0.5 1 8.5 94%
6. 1 1 1 0.5 1 1 1 1 1 8.5 94%
7. 0.5 0.5 1 0.5 1 0.5 1 0.5 0.5 6 66%
8. 1 1 1 0.5 1 1 1 0.5 0.5 7.5 83%
9. 0.5 0.5 1 0.5 1 1 1 0.5 0.5 6.5 72%
10. 0.5 0.5 1 0.5 1 0 0.5 0.5 0.5 5 55%
11. 0.5 0.5 1 1 1 1 1 0.5 0.5 7 77%
12. 1 1 1 0.5 1 1 1 0.5 0.5 7.5 83%
13. 0.5 0.5 0.5 0.5 1 1 1 0.5 0.5 6 66%
14. 1 1 1 0.5 1 0.5 1 1 1 8 88%
15. 1 1 1 0.5 1 1 1 0.5 0.5 7.5 83%
16. 1 1 1 0.5 1 1 0.5 0.5 0.5 7 77%
17. 1 0.5 1 1 0.5 0.5 0.5 0.5 1 6.5 72%
18. 1 1 0.5 1 1 0.5 0.5 0.5 0.5 6.5 72%
19. 1 1 0.5 1 0.5 0 1 0.5 0.5 6 66%
20. 1 1 0.5 1 0.5 1 0.5 0.5 0.5 6.5 72%
21. 1 1 1 1 1 1 1 0.5 0.5 8 88%
22. 1 1 1 1 1 0.5 1 0.5 0.5 7.5 83%
23. 1 1 1 1 0.5 0.5 1 0.5 0.5 7 77%
24. 1 1 0.5 1 1 1 1 0.5 0.5 7.5 83%
25. 1 1 1 0.5 1 1 1 0.5 0.5 7.5 83%
26. 1 1 1 0.5 1 1 1 0.5 0.5 7.5 83%
27. 1 1 1 0.5 1 0.5 1 0.5 0.5 7 77%
28. 1 1 1 1 0.5 0.5 0.5 0.5 0.5 6.5 72%
29. 1 1 1 0.5 0.5 0.5 1 0.5 0.5 6.5 72%
30. 1 1 1 1 1 0.5 1 0.5 0.5 7.5 83%

3.4 Data Coding and Analysis


The characteristics related to the research methodology and quality were coded,
including database, authors, year of publication, context of the study, independent and
dependent factors, research method (quantitative or qualitative), sample size, and
mediators/moderators. During the analysis and data extraction process, any study that
did not clearly describe customer retention and its relation to at least one of the
targeted factors was excluded from the synthesis.
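For illustration, the coding scheme described above could be represented as a simple record structure; the sketch below uses hypothetical field values and is not taken from the original coding sheet.

```python
# A sketch of one possible record structure for the coded study characteristics
# listed above; the example values are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodedStudy:
    database: str                   # e.g. "Emerald" or "Google Scholar"
    authors: str
    year: int
    context: str                    # sector/discipline, e.g. "banking"
    independent_factors: List[str]  # e.g. ["satisfaction", "trust", "commitment"]
    dependent_factor: str
    method: str                     # "quantitative", "qualitative", or "mixed"
    sample_size: Optional[int] = None
    mediators_moderators: List[str] = field(default_factory=list)


example = CodedStudy(
    database="Google Scholar",
    authors="Hypothetical, A.",
    year=2015,
    context="banking",
    independent_factors=["satisfaction", "trust"],
    dependent_factor="customer retention",
    method="quantitative",
    sample_size=200,
)
```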
4 Results

Based on the 30 studies on customer retention published from 2005 to 2019, the results
of this systematic review are reported according to the six research questions.

4.1 Distribution of the Studies in Terms of Disciplines/Contexts


The context/discipline is the field, sector, or industry in which customer retention is
researched. In this research, the collected articles were distributed across
disciplines/contexts. The banking sector is the context that attracted the most studies
investigating customer retention and the factors that influence it: of the 30 studies
analyzed in this research, about seven focused on the banking sector. The second most
frequent context is the mobile phone market, with about five of the studies examining
customer retention together with other factors in this field. The least investigated field
is the hotel or hospitality industry, which is recommended for further exploration.
Figure 2 shows the distribution of the collected articles across the contexts/disciplines
in which they were conducted.

Fig. 2. Distribution of studies in terms of context/discipline.
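The distributions reported in this and the following subsections (Figs. 2–6) amount to simple frequency counts over the coded studies. As a minimal illustration (the coded context values below are hypothetical placeholders, not the actual data of the 30 studies), such counts can be produced with collections.Counter.

```python
# Minimal illustration of the frequency counts behind Figs. 2-6; the coded values
# below are hypothetical placeholders, not the actual data of the 30 studies.
from collections import Counter

coded_contexts = ["banking", "banking", "mobile phone market", "retail", "hospitality"]
by_context = Counter(coded_contexts)
for context, count in by_context.most_common():
    print(f"{context}: {count} studies")
```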

4.2 Distribution of the Studies in Terms of Countries in the Context


The majority of these studies were carried out in Indonesia, with four studies, followed
by three studies each in Nigeria and India, and then by Jordan, Pakistan, Malaysia, and
Korea, with two studies each. From each of the remaining countries,
at least one study was analyzed. Figure 3 shows the distribution of the analyzed articles
across the countries in which these research studies were conducted.

Fig. 3. Distribution of studies in terms of country.

4.3 Distribution of the Studies in Terms of Publication Year


In terms of publication year, Fig. 4 shows the distribution of the analyzed studies
across the years in which they were published. The studies were conducted from 2005
to 2019. The number of published studies increased from one study in 2005 to about
four to six per year in recent years, with a remarkable increase in the number of
published articles in 2013 and 2019.

Fig. 4. Distribution of studies in terms of publication year.


4.4 Distribution of the Studies in Terms of Research Method


In terms of research method, the majority of the studies (about 80% of the 30 selected
studies) relied on a quantitative method, mainly questionnaire surveys, for data
collection. Three studies applied a qualitative method, based mainly on interviews, and
two of the 30 studies applied both methods. Figure 5 shows the distribution of the
studies in terms of research method.

Fig. 5. Distribution of studies in terms of research method.

4.5 Distribution of the Studies in Terms of Database


This study searched for customer retention and the other variables in four main
databases (Taylor, ProQuest, Emerald, and ScienceDirect), in addition to the Google
Scholar search engine. As shown in Fig. 6, the majority of the selected studies (about
20) came from Google Scholar, which was the most productive source, followed by the
Emerald database with four studies and two studies each from the ProQuest,
ScienceDirect, and Taylor databases.

Fig. 6. Distribution of studies in terms of database.


5 Conclusion

This review study aimed to review previous studies in order to identify the main
factors affecting customer retention and, in particular, to investigate the impact of
customer satisfaction, trust, and commitment on customer retention in different sectors
and contexts, drawing on various databases and a search engine across several years of
publication [27–30]. Previous customer retention studies have provided valuable
results on the relationship between customer retention and other factors that companies
could consider in order to strengthen their development and gain a competitive
advantage over competitors [31–34]. Still, there are many variables that could be
examined in different sectors, and that is what this review study tries to highlight
through the 30 analyzed studies. The studies were selected through a defined process
and specific criteria, and their quality was assessed before analysis by applying the
inclusion/exclusion criteria as well as the quality assessment checklist with nine
criteria. The main dimensions of analysis are the independent and dependent variables,
year of publication, context, country, database, and method. The main finding of this
review is that the factors that positively affect customer retention are customer
satisfaction, service quality, trust, commitment, and loyalty [35–38]. As mentioned in
the previous sections, some variables such as trust and loyalty and their relations with
other external factors need further examination, especially in the hospitality industry,
such as hotels. Moreover, most of the analyzed studies had a limited sample size or
group, which makes generalization difficult since the samples did not represent the
whole population; for example, the samples in [39] were students, so that study's
results may not generalize to non-student populations. In addition, this review focused
on a limited set of databases, so it is recommended to expand the search to other
databases, such as Scopus and IEEE, and to different industries, to find other valuable
studies.

References
1. Abu Zayyad, H.M., Obeidat, Z.M., Alshurideh, M.T., Abuhashesh, M., Maqableh, M.,
Masa’deh, R.E.: Corporate social responsibility and patronage intentions: the mediating
effect of brand credibility. J. Mark. Commun. 1–24 (2020)
2. Alshurideh, D.M.: Do electronic loyalty programs still drive customer choice and repeat
purchase behaviour? Int. J. Electron. Cust. Relatsh. Manag. 12(1), 40–57 (2019)
3. Zeithaml, V.A., Berry, L.L., Parasuraman, A.: The behavioral consequences of service
quality. J. Mark. 60(2), 31–46 (1996)
4. Alshurideh, M.T., et al.: The impact of Islamic Bank’s service quality perception on
jordanian customer’s loyalty. J. Manag. Res. 9, 139–159 (2017)
5. Alshurideh, M.T.: A theoretical perspective of contract and contractual customer-supplier
relationship in the mobile phone service sector. Int. J. Bus. Manag. 12(7), 201–210 (2017)
6. Ntabo, K.O., Aunda, A.O.: Influence of customer relational management practices on
customer retention. Barat. Interdiscip. Res. Journal 6, 35–36 (2016)
7. Edward, M.: Role of switching costs in the service quality, perceived value, customer
satisfaction and customer retention linkage, 23(3), 327–345 (2011)
8. Alshurideh, M.T.: Exploring the main factors affecting consumer choice of mobile phone
service provider contracts. Int. J. Commun. Netw. Syst. Sci. 9(12), 563–581 (2016)
9. Danesh, S.N., Nasab, S.A., Ling, K.C.: The study of customer satisfaction, customer trust
and switching barriers on customer retention in Malaysia hypermarkets. Int. J. Bus. Manag. 7
(7), 141–150 (2012)
10. Alshurideh, M.: Scope of customer retention problem in the mobile phone sector: a
theoretical perspective. J. Mark. Consum. Res. 20, 64–69 (2016)
11. Alshurideh, M.: Is customer retention beneficial for customers: a conceptual background.
J. Res. Mark. 5(3), 382–389 (2016)
12. Ammari, G., Al kurdi, B., Alshurideh, M., Alrowwad, A.: Investigating the impact of
communication satisfaction on organizational commitment: a practical approach to increase
employees’ loyalty. Int. J. Mark. Stud. 9(2), 113–133 (2017)
13. Al-Dmour, H., Alshuraideh, M., Salehih, S.: A study of Jordanians’ television viewers
habits. Life Sci. J. 11(6), 161–171 (2014)
14. Alshurideh, M.: The factors predicting students’ satisfaction with universities’ healthcare
clinics’ services: a case-study from the Jordanian higher education sector. Dirasat. Adm. Sci.
161(1524), 1–36 (2014)
15. Preikschas, M.W., Cabanelas, P., Rüdiger, K., Lampón, J.F.: Value co-creation, dynamic
capabilities and customer retention in industrial markets. J. Bus. Ind. Mark. 32(3), 409–420
(2017)
16. Al Dmour, H., Alshurideh, M., Shishan, F.: The influence of mobile application quality and
attributes on the continuance intention of mobile shopping. Life Sci. J. 11(10), 172–181
(2014)
17. Obeidat, B., Sweis, R., Zyod, D., Alshurideh, M.: The effect of perceived service quality on
customer loyalty in internet service providers in Jordan. J. Manag. Res. 4(4), 224–242
(2012)
18. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of e-
service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and e-
trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
19. Levesque, T., McDougall, G.H.G.: Determinants of customer satisfaction in retail banking.
Int. J. bank Mark. 14(7), 12–20 (1996)
20. Hansemark, O.C., Albinsson, M.: Customer satisfaction and retention: the experiences of
individual employees. Manag. Serv. Qual. Int. J. 14(1), 40–57 (2004)
21. Morgan, R.M., Hunt, S.D.: The commitment-trust theory of relationship marketing. J. Mark.
58(3), 20–38 (1994)
22. Kitchenham, S., Charters, B.: Guidelines for performing systematic literature reviews in
software engineering. Softw. Eng. Group Sch. Comput. Sci. Math. Keele Univ. 1–57 (2007)
23. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
24. Nedal Fawzi Assad, M.T.A.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
25. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
26. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
27. Al-Duhaish, A., Alshurideh, M., Al-Zu’bi, Z.: The impact of the basic reference group usage
on the purchasing decision of clothes (A field study of Saudi youth in Riyadh city). Dirasat.
Adm. 41(2), 205–221 (2014)
28. Alshurideh, M., Al Kurdi, B., Abumari, A., Salloum, S.: Pharmaceutical promotion tools
effect on physician’s adoption of medicine prescribing: evidence from Jordan. Mod. Appl.
Sci. 12(11), 210–222 (2018)
29. Alshurideh, M.: A qualitative analysis of customer repeat purchase behaviour in the UK
mobile phone market. J. Manag. Res. 6(1), 109 (2014)
30. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84–95 (2012)
31. Alkalha, Z., Al-Zu’bi, Z., Al-Dmour, H., Alshurideh, M., Masa’deh, R.: Investigating the
effects of human resource policies on organizational performance: an empirical study on
commercial banks operating in Jordan. Eur. J. Econ. Financ. Adm. Sci. 51(1), 44–64 (2012)
32. Shannak, R., Masa’deh, R., Al-Zu’bi, Z., Obeidat, B., Alshurideh, M., Altamony, H.: A
theoretical perspective on the relationship between knowledge management systems,
customer knowledge management, and firm competitive advantage. Eur. J. Soc. Sci. 32(4),
520–532 (2012)
33. Altamony, H., Alshurideh, M., Obeidat, B.: Information systems for competitive advantage:
implementation of an organisational strategic management process. In: Proceedings of the
18th IBIMA Conference on Innovation and Sustainable Economic Competitive Advantage:
From Regional Development to World Economic, Istanbul, Turkey, 9th–10th May (2012)
34. Alshurideh, R., Masa’deh, R., Al kurdi, B.: The effect of customer satisfaction upon
customer retention in the Jordanian mobile market: an empirical investigation. Eur. J. Econ.
Finan. Adm. Sci. 47(12), 69–78 (2012)
35. Alshurideh, M., Shaltoni, A., Hijawi, D.: Marketing communications role in shaping
consumer awareness of cause-related marketing campaigns. Int. J. Mark. Stud. 6(2), 163
(2014)
36. Alshurideh, M.N., Xiao, S.: The effect of previous experience on mobile subscribers’ repeat
purchase behaviour. Eur. J. Soc. Sci. 30(3), 366–376 (2012)
37. Ashurideh, M.: Customer service retention–A behavioural perspective of the UK mobile
market. Durham University (2010)
38. Al-Dmour, H., Al-Shraideh, M.T.: The influence of the promotional mix elements on
Jordanian consumer’s decisions in cell phone service usage: an analytical study.
Jordan J. Bus. Adm. 4(4), 375–392 (2008)
39. Al-Tit, A.A.: The effect of service and food quality on customer satisfaction and hence
Customer retention. Asian Soc. Sci. 11(23), 129 (2015)
The Effect of Work Environment Happiness on Employee Leadership

Khadija Alameeri1, Muhammad Alshurideh1,2(&), Barween Al Kurdi3, and Said A. Salloum4

1 University of Sharjah, Sharjah, UAE
malshurideh@sharjah.ac.ae
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. The concept of leadership has attracted the attention of scholars over the
recent past as companies struggle to manage stiff competition in local and international
markets. The aim of this systematic review is to critically synthesize the relationship
between employee leadership and work environment happiness within the theoretical
paradigm of the contemporary academic literature. Primary data was obtained through
semi-structured interviews, while secondary data was collected from 21 scholarly
sources, including books and journal articles on the subject matter, and further analyzed
with the help of mixed quantitative-qualitative methods to answer seven predefined
research questions investigating the role of the work environment. Results of the
systematic review showed that happiness in the workplace directly influences the
ability of employees to ascend to top management positions. The study identified six
independent variables that directly affect the ability of employees to become leaders:
1) job satisfaction, 2) employee engagement, 3) workplace safety, 4) valued social
position, 5) support from work friends, and 6) work-life conciliation. Findings provide
a foundation for further educational research with potential recommendations for
employers. Potential limitations include judgmental sampling and failure to incorporate
socioeconomic variables that are sensitive to culture.

Keywords: Employee engagement · Job satisfaction · Happiness · Workplace environment · Safety · Social support · Leadership

1 Introduction

According to [1], "happiness" is the modern word, usually translated from the original
Greek (eudaimonia), used to describe the good life. It is accomplished by living well
and doing well over time. However, happiness in the workplace is one of the major
concerns that many organizations are trying to address as a way of enhancing
productivity and leadership [2–4]. According to [5], in the current society, the majority
of adults spend most of their time in the workplace. The tough economic times have
forced many people to take two or three jobs just to ensure that they can earn enough
income to meet their personal needs. Others work overtime to increase their earnings. It
means that they have very little time to spend with family and friends. The nature of the
workplace environment, especially in terms of satisfaction that workers get, signifi-
cantly affects their lives [6]. As [7] observed, it is important to ensure that the workplace
environment is conducive to enhancing the performance of the employees by
making them feel valued, respected, and trusted with tasks assigned to them. Such
employees would feel indebted to the firm, and as such, would be willing to sacrifice
their time to ensure that they deliver on every assignment given to them. Scholars have
been interested in investigating how different factors affect employees' happiness in the
workplace. They have also developed models that explain how happiness of workers
influences performance, and its subsequent relationship with leadership, especially
among women. Studies have shown that women thrive in an environment that is
supportive. Organizations that have created a happy workplace environment tend to
have a higher number of women in top leadership positions than those with high levels
of stress. Women tend to be more sensitive to criticism, especially when they are trying
to climb the career ladder. When they feel that they lack the support they need, they
will avoid pursuing promotions for fear of making mistakes that would lead to further
criticism. In this paper, the researcher seeks to investigate the effect of happiness in the
workplace environment on employee performance and satisfaction, and its relationship
with leadership, especially among women.
Research Questions: It is important to define a specific question that will guide the
entire process of collecting and analyzing primary data from various sources. As
explained in the introduction, the study seeks to investigate the relationship between
happiness at the workplace and employees’ performance/satisfaction. As such, the
following research question was formulated to help in the process of collecting primary
data from respondents:
– RQ1. What are the factors that affect happiness at work, and how do they influence
employees' performance and leadership?
The following supportive research questions will help in the data collection process:
– RQ2. What are the factors affecting the work environment?
– RQ3. How can organizations design a healthy work environment?
– RQ4. How does the work environment help in supporting women leaders?
– RQ5. How does the work environment affect employee performance?
– RQ6. How does employee performance affect employee satisfaction?
– RQ7. How does employee satisfaction affect employee leadership?
To answer the research questions, the study applied the systematic review approach,
which has been used by many scholars such as [8–11].
2 Literature Review

The performance of employees has always been linked with their level of satisfaction at
work [12–14]. In a society where a coercive approach to management is becoming less
effective and undesirable, managers are redefining their approach to enhancing the
productivity of their workers [15]. In order to answer the clearly formulated research
questions responding to the objectives of this study, a systematic review is utilized.
With systematic and reproducible methods to identify, select, and evaluate the relevant
research, this type of literature review is efficient in collecting and analyzing the
included data. Findings made by other scholars will help in identifying factors that
affect happiness in the workplace, which in turn influences job satisfaction and the
ability of women to rise to top managerial positions. In this section, the researcher has
identified five independent variables that directly affect the dependent variable.

2.1 Independent Variable 1: Employee Engagement


Employee engagement, according to [16], refers to a concept that focuses on the nature
of the relationship between a given entity and its employees. Many firms have realized
that the best way of empowering their workers is through employee engagement.
Scholars have developed different ways of promoting employee engagement within an
organizational setting. Figure 1 below identifies various opportunities that a firm can
use to improve employee engagement as a way of enhancing their productivity.

Fig. 1. Employee engagement opportunities


As shown in the figure above, one of the ways of promoting employee engagement
is to encourage personal growth. A firm should create an environment where workers
can experience personal growth at work. It is also important to ensure that there is
happiness in the workplace. Factors that may cause confrontation or undesirable
arguments among workers should be eliminated. The model identifies ambassadorship
as another approach of promoting employee engagement. In this case, a firm will
encourage its workers to represent it in various contexts as a way of promoting the
brand. Effective representation of the brand would help to strengthen the firm’s position
in the market. A firm should find ways of enabling employees to strengthen their
relationships with peers. Workers should be willing and capable of helping their colleagues
to achieve career growth. Relationship with managers is equally important [17]. The
relationship can be strengthened by having an effective vertical communication system
within the company. Recognition should not be underestimated when promoting
employee engagement. The researcher in [18] argues that it is crucial to acknowledge the
excellent work that employees do in their respective areas of work. The model also
identifies an effective feedback system as a way of promoting employee engagement.
The feedback helps them to identify their strengths and weaknesses. It makes it pos-
sible for these workers to have a continuous improvement program as they climb the
career ladder. Wellness is another major factor. Issues such as occupational health and
safety, and emotional wellness of workers should be taken seriously by the manage-
ment. Alignment is the ninth factor as shown in the figure above. The human resource
(HR) department should ensure that skills and experience of workers is aligned with
responsibilities assigned to them. The last factor is compensation based on the job that
an employee is assigned. Authors in [19] believed that it is crucial for the management
to ensure that employees feel adequately compensated. When each of these factors is
taken care of within an organization, employees will be satisfied with their work and
their performance will ultimately improve. Such an environment will also be conducive
for women to achieve career growth. The following questions will need to be
addressed: (1) What are the factors that promote employee engagement in the
workplace? and (2) What are the benefits of employee engagement to an organization?

2.2 Independent Variable 2: Job Satisfaction


Job satisfaction is another major factor that defines the environment under which an
employee works. The authors in [20] define job satisfaction as the extent to which a given
worker feels contented and self-motivated with their job. When an individual feels
satisfied with their work, there is always the desire to register a better performance
because they feel they are getting as much as they give. Figure 2 below identifies
factors that define whether an employee would feel satisfied with their job.
Fig. 2. Job satisfaction

A good salary is one of the main factors that define job satisfaction. The authors in [21]
explained that employees tend to move from one company to another in search of good
remuneration. Higher salaries enable them to meet their financial obligations, making
them satisfied with what they do. Career growth is another major factor. An
overwhelming majority of workers often desire to climb the career ladder [22]. As such,
firms should have systems that assure their workers of career growth depending on the
time they have worked for the company and their productivity. The process should be
seen to be fair and based on specific criteria that are meaningful to the stakeholders
involved. Having a work-life balance is critical in enhancing job satisfaction within an
organization. As employees focus on spending a lot of time at work to increase their
income and enhance their career growth, their private life should not be compromised
[20]. The management should ensure that these workers have enough time to spend
with their friends and family to help fight burnout [23–25]. The model also emphasizes
the need to have job security. Workers should not feel that their job will be threatened
when they make a mistake or in instances when the firm is going through economic
challenges. Just like the case of employee engagement, job satisfaction also champions
for recognition of employees when they register exemplary performance. Challenges
also define the experience of workers within a firm. Researchers in [15] explained that
it is often advisable to subject workers to meaningful challenges in their assignment
that motivates critical thinking and creativity. When they overcome these challenges,
especially through the use of new skills and practices, these employees will be more
satisfied with their work. The following questions will need to be addressed: (1) what
are the factors that affect workplace environment, and (2) How does employee’ per-
formance affects employee satisfaction?

2.3 Independent Variable 3: Workplace Safety


The health, safety, and wellbeing of workers are another independent variable that
cannot be ignored when discussing employees' satisfaction, performance, and the
ability to promote leadership, especially among female workers. As [17] observes,
workers want to feel safe in their places of work. Many companies in China's Hubei
province have recently opted to have their employees work from home because of the
fear of coronavirus infection. These companies understand the need to protect the
health of their workers and are committed to embracing policies that would limit the
spread of the disease to their workers. Occupational safety is another factor that is
clearly defined in various laws, especially those enacted by the International Labor
Organization (ILO). Companies are expected to have internal policies that enhance the
safety of all workers and visitors who come to the premises. It is necessary to ensure that
various hazards are eliminated for the benefit of workers. According to [18], it is
equally important to ensure that employees are educated on issues relating to their
safety when they are in their respective workplaces. Providing them with the right
protective gear may not be enough if they do not understand and appreciate the need to
use them.
The well-being of the employees is another factor that defines workplace safety
[16]. Sometimes an employee may be suffering from emotional instability caused by
issues in their private lives. It is dangerous to ignore emotionally distraught employees
because they not only expose themselves to danger but can also be a major threat to
safety of others [17]. It is necessary to have a system that can help such traumatized
individuals as soon as it becomes apparent that they are emotionally unstable. Workplace
safety defines employees' satisfaction and their performance at work. The following
questions will need to be addressed:
– How can an organization design a healthy workplace environment?
– How does an enabling work environment help in supporting women leadership?

2.4 Independent Variable 4: Valued Social Position


The need to have an environment where employees feel valued in the workplace is also
important for promoting employee satisfaction. According to [21], it is normal for
people to demand respect from those they
work with. It is also important to ensure that they are given valued social position based
on the time they have been working for the organization and their performance.
Maslow's hierarchy of needs model [26] can help explain this concept. At the bottom
of the model are physiological needs. When an employee is newly recruited within a
firm, they will only have basic needs [17]. The most important thing to such employees
is the fact that they have a job and can afford to pay their bills. When such an employee
has spent a year or two at the firm, they will graduate from basic needs to the second
level of safety needs. They will demand to have a workplace environment where their
safety and security is not compromised. They will be interested in ensuring that the
organization abides by ILO laws and other legal requirements within the country meant
to protect their interests [7]. When the employee has spent more than five years at a
given organization, belongingness and love needs would emerge. They would expect
respect based on their position, age, and the time they have spent in the firm. Esteem
needs are the second highest level in this model, as shown in the figure below. In
most of the cases, such an employee would be in the mid to senior managerial levels.
They are often heads of department who are expected to make critical decisions that
would enhance the overall performance of the organization.
They expect their position to be accompanied by appropriate remuneration and
other benefits. At the apex of this model are the self-actualization needs [21]. Most of
those with these needs are chief executive officers or individuals holding top man-
agerial positions. When one’s social position within the firm is valued, they will feel
contented and committed to achieving a career growth.
The following questions will need to be addressed:
– What are the factors that affect workplace environment?
– How does work environment affect employee performance?

2.5 Independent Variable 5: Friends' Support and Work-Life Conciliation
The support that one gets from friends and coworkers also defines their job satisfaction
and the ability to become leaders in their organizations. According to [17], people often
want reassurance from their peers to pursue top leadership positions. Being a leader is
often characterized by numerous challenges and there is always the fear that one might
fail. As such, it is often helpful if one gets the support of their friends who will assure
them that they can make it in leadership positions. Women tend to require more regular
assurance than men whenever they consider pursuing a higher office. [19] explains that
one of the main reasons why some people, especially women, avoid higher offices is
the problem of criticism. Accordingly, it is important for an organization to promote a
culture where workers learn to support their colleagues instead of engaging in criticism.
As [15] states, positive criticism is constructive because it involves pointing out both
strengths and weaknesses and explaining how one can use unique skills and capabilities
to overcome weaknesses and achieve success. Employees should focus on peer reviews
as an effective way of promoting growth based on merits. Of interest in such reviews
should be to improve the overall performance of colleagues without making them feel
less valued. It is the responsibility of an individual employee to associate with positive
people outside their workplace environment [18]. They should associate with people
who believe in their capacity to be successful leaders. These five independent variables
directly affect the following dependent variable. The following questions will need to
be addressed:
– How does work environment affect employee performance?
– How does employee satisfaction affect employee leadership?

2.6 Dependent Variable 1: Employee (Women) Leadership


The concept of leadership has attracted the attention of scholars over the past several
decades because of changes in the socio-economic and political environment.
Leadership approaches that were used prior to the twenty-first century are no longer
effective. The authors in [22] explain that highly successful companies such as Google
Inc., Apple, and Microsoft have learned the importance of embracing the right lead-
ership approaches that emphasize the need to motivate workers instead of coercing
them to undertake a given responsibility. It has also become apparent that women play
a critical role in the management of organizations irrespective of their size [27]. Some
of the qualities unique to women such as compassion and the ability to multitask make
them good leaders, especially when handling young workers. Companies are currently
committed to promoting their workers to higher managerial positions based on their
competency, regardless of their gender.
In the context of this study, the ability of female employees to rise to the rank of
senior leadership positions is considered a dependent variable. According to [20], the
global society, including the United Arab Emirates, has come to realize the importance
of having women in managerial positions. However, it takes individual commitment
among female employees for them to rise to the position of top management. The five
variables discussed above, which include employee engagement, job satisfaction,
workplace safety, valued social position, and friends' support, all have a direct influence
on the ability of women to get into managerial positions [22]. They define the environment under
which these employees work and the ease with which they can rise to higher man-
agerial positions. Having the right policies both in private and public entities would
help create an environment that can nurture women to become successful corporate
leaders in the country. The following questions will need to be addressed: (1) What are
the factors that promote women's leadership in local organizations? and (2) How can an
organization promote career growth among employees?

3 Research Model

It is necessary to develop a research model that helps in explaining how the independent
variables relate to the dependent variable in a given study [16]. As shown in Fig. 3
below, there are five independent variables: employee engagement, job satisfaction,
workplace safety, valued social position, and friends' support together with work-family
conciliation. These all affect employee performance and employee satisfaction. The
authors in [28] believe that employee satisfaction and performance significantly
influence their chances of becoming leaders in their organization. It means that the
chances of female employees rising to positions of management depend on how the
independent variables influence the intermediate variables, which in turn influence the
dependent variable. The model below summarizes these relationships.
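For illustration only, the structure of the model in Fig. 3 can be written down as a simple mapping; the sketch below follows the variable names in the text and is not the authors' formal specification.

```python
# Illustrative encoding of the research model in Fig. 3 (structure only; the names
# follow the text above, and the representation is a sketch, not a formal specification).
research_model = {
    "independent_variables": [
        "employee engagement",
        "job satisfaction",
        "workplace safety",
        "valued social position",
        "friends' support and work-family conciliation",
    ],
    "intermediate_variables": ["employee performance", "employee satisfaction"],
    "dependent_variable": "employee (women) leadership",
}

# Each independent variable is hypothesized to influence the intermediate variables,
# which in turn influence the dependent variable.
paths = [(iv, mv) for iv in research_model["independent_variables"]
         for mv in research_model["intermediate_variables"]]
paths += [(mv, research_model["dependent_variable"])
          for mv in research_model["intermediate_variables"]]
```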

4 Research Hypotheses

It was important to develop hypotheses based on the primary research question and the
objectives of the study. The hypotheses would be rejected or confirmed based on the
findings from the analysis of primary data. Table 1 shows the hypotheses that were
developed for the study.

Fig. 3. Relationship between variables.

Table 1. Research hypotheses


# Hypothesis
H1o There is no direct relationship between happiness in the workplace environment and
employees’ satisfaction and performance
H1a There is a direct relationship between happiness in the workplace environment and
employees’ satisfaction and performance
H2o There is no direct relationship between employees’ satisfaction/performance and
employee leadership
H2a There is a direct relationship between employees’ satisfaction/performance and
employee leadership
H3o Women do not flourish in their careers to become successful leaders when they have a
happy workplace environment
H3a Women tend to flourish in their careers to become successful leaders when they have a
happy workplace environment

5 Methodology

To the best of the researcher’s knowledge, the most effective approach to this study
would be a combination of qualitative and quantitative methods. After reviewing the
relevant literature, the researcher will focus on the collection of primary data from a
sample of respondents. The primary data will help in answering the set research
questions. Table 2 below is the proposed timeline of the activities that will be con-
ducted to obtain data from a sample of respondents.
Table 2. Gantt chart

Activity (2020)           Time
Proposal development      Feb 10-30
Proposal approval         Mar 1-15
Literature review         Feb 10-May 28
Primary data collection   Mar 25-Apr 20
Data analysis             Apr 21-28
Write-up and editing      May 1-15

According to [29], when planning to collect primary data from specific individuals,
it is often important to define the criteria that would be used in selecting participants.
The inclusion/exclusion criteria help in ensuring that those who are selected to take part
in the investigation have the right skills that can enable the researcher to collect the
right information. Table 3 below shows the inclusion/exclusion criteria.

Table 3. Inclusion/exclusion criteria.

Inclusion criteria:
- Has been a resident of the UAE for the last five years
- Is currently working in either a public or private corporation
- Has adequate knowledge hence understands the concept of leadership (at least a college graduate)

Exclusion criteria:
- Is an expatriate who has been in the country for less than five years
- Is retired or yet to be employed hence lacks the desired experience of the current workplace environment
- Has limited academic knowledge hence lacks knowledge of the concept of leadership (is not a college graduate)
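As an illustration of how the screening criteria in Table 3 might be operationalized during participant selection, the following Python sketch checks a candidate against them; the Participant fields and the sample values are hypothetical and not part of the original study.

```python
# Illustrative screening of a candidate participant against the criteria in Table 3;
# the Participant fields and the sample values are hypothetical.
from dataclasses import dataclass


@dataclass
class Participant:
    years_resident_in_uae: int
    works_in_public_or_private_corporation: bool
    college_graduate: bool   # used here as a proxy for understanding leadership


def eligible(p: Participant) -> bool:
    return (
        p.years_resident_in_uae >= 5
        and p.works_in_public_or_private_corporation
        and p.college_graduate
    )


print(eligible(Participant(6, True, True)))  # True
```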

In this study, judgmental sampling will be appropriate to ensure that individuals with
specific qualities are selected to participate in the study [30]. The researcher will
develop a questionnaire that will be used to collect data from the selected participants.
Face-to-face interviews will be conducted with the respondents to collect primary data.
Once the data is collected, the researcher will use a mixed-methods approach to conduct
the analysis. Table 4 below shows the keyword searches used to identify online
secondary data.
Table 4. Keyword search


Keyword Search
“Happiness” AND “Satisfaction”
“Happiness” AND “Performance”
“Women” AND “Leadership”
“Women” AND “Career”

Table 5 below shows the databases where online secondary data was obtained and
their search frequency.

Table 5. Databases frequencies.


Databases Frequency
Google Scholar 48
SAGE Journals 26
Elsevier 12
Wiley Online Library 7
APA PsycNet 5

Figure 4 below shows how digital data (online journal articles and books) were
identified and included in this study. The same process will be used when conducting
the study:

Fig. 4. PRISMA flowchart.


6 Conclusion

This systematic review investigates the relationship between employee leadership and
happiness in the work environment. The analysis of secondary and primary data shows
that five variables contribute to the level of workers' satisfaction in the workplace:
employee engagement, job satisfaction, workplace safety, valued social position, and
friends' support together with work-family conciliation. The degree to which these
aspects matter to a worker depends on gender, culture, and individual perceptions.

References
1. Gavin, J.H., Mason, R.O.: The virtuous organization: the value of happiness in the
workplace. Organ. Dyn. 33(4), 379–392 (2004)
2. Alzoubi, H., Alshurideh, M., Al Kurdi, B., Inairata, M.: Do perceived service value, quality,
price fairness and service recovery shape customer satisfaction and delight? A practical study
in the service telecommunication context. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
3. Ashurideh, M.: Customer service retention–a behavioural perspective of the UK mobile
market. Durham University (2010)
4. Alshurideh, M.: A qualitative analysis of customer repeat purchase behaviour in the UK
mobile phone market. J. Manag. Res. 6(1), 109 (2014)
5. Al-Emran, M., Mezhuyev, V., Kamaludin, A., Shaalan, K.: The impact of knowledge
management processes on information systems: a systematic review. Int. J. Inf. Manag. 43,
173–187 (2018)
6. Alshurideh, M., Masa’deh, R., Al kurdi, B.: The effect of customer satisfaction upon
customer retention in the Jordanian mobile market: an empirical investigation. Eur. J. Econ.
Finance Adm. Sci. 47(12), 69–78 (2012)
7. Tasnim, Z.: Happiness at workplace: building a conceptual framework. World J. Soc. Sci. 6
(2), 62–70 (2016)
8. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
9. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
10. Alshurideh, M.T., Assad, N.F.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
11. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
12. Alshurideh, M., Nicholson, M., Xiao, S.: The effect of previous experience on mobile
subscribers’ repeat purchase behaviour. Eur. J. Soc. Sci. 30(3), 366–376 (2012)
13. Alshurideh, M.: The factors predicting students’ satisfaction with universities’ healthcare
clinics’ services: a case-study from the Jordanian higher education sector. Dirasat Adm. Sci.
161(1524), 1–36 (2014)

14. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of
e-service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and
e-trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
15. Sandhu, K.: Leadership, management, and adoption techniques for digital service
innovation. IGI Global, Hershey, PA (2020)
16. Serrat, O.: Building a learning organization. In: Knowledge Solutions. Springer, Singapore
(2017)
17. Pisano, G.P.: Toward a prescriptive theory of dynamic capabilities: connecting strategic
choice, learning, and competition. Ind. Corp. Change 26(5), 747–762 (2017)
18. King, P.: Persuasion Tactics (Without Manipulation): Covert Psychology Strategies to
Influence, Persuade, & Get Your Way. London, UK Publ. Drive (2019)
19. Kane, G.: The technology fallacy: people are the real key to digital transformation. Res.
Manag. 62(6), 44–49 (2019)
20. Pauleen, D.J.: Personal Knowledge Management: Individual, Organizational and Social
Perspective. Routledge, New York (2016)
21. Bratianu, C.: Organizational learning and the learning organization. Res. 5(1), 1–20 (2018)
22. van Dam, N.: Elevating Learning & Development. Lulu.com (2018)
23. Alshurideh, M., Salloum, S.A., Al Kurdi, B., Al-Emran, M.: Factors affecting the social
networks acceptance: an empirical study using PLS-SEM approach. In: 8th International
Conference on Software and Computer Applications (2019)
24. Kurdi: Healthy-Food Choice and Purchasing Behaviour Analysis: An Exploratory Study of
Families in the UK. Durham University (2016)
25. AlShurideh, M., Alsharari, N.M., Al Kurdi, B.: Supply Chain Integration and Customer
Relationship Management in the Airline Logistics (2019)
26. Cunningham, J.B.: Strategic Human Resource Management in the Public Arena: A
Managerial Perspective. Macmillan International Higher Education, London (2016)
27. Budhwar, P.S., Mellahi, K.: The Middle East context: an introduction. In: Handbook of
Human Resource Management in the Middle East. Edward Elgar Publishing (2016)
28. Pawirosumarto, S., Sarjana, P.K., Gunawan, R.: The effect of work environment, leadership
style, and organizational culture towards job satisfaction and its implication towards
employee performance in Parador Hotels and Resorts, Indonesia. Int. J. Law Manag. 59(6),
1337–1358 (2017)
29. Ravitch, S.M., Riggan, M.: Reason & Rigor: How Conceptual Frameworks Guide Research.
Sage Publications, Thousand Oaks (2016)
30. Boeren, E.: The methodological underdog: A review of quantitative research in the key adult
education journals. Adult Educ. Q. 68(1), 63–79 (2018)
Performance Appraisal on Employees' Motivation: A Comprehensive Analysis

Maryam Alsuwaidi1, Muhammad Alshurideh1,2, Barween Al Kurdi3, and Said A. Salloum4(&)
1 University of Sharjah, Sharjah, UAE
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. Several analysis studies have been carried out with a view to providing valuable knowledge about the existing research outline of performance appraisal and employee motivation. The current study systematically reviews and synthesizes the performance appraisal and employee motivation literature, aiming to provide a comprehensive analysis of 27 articles from 2015 to 2020. The research aims to establish the impact of performance appraisal fairness on employees' motivation in an organization. To achieve this objective, the study will adopt a descriptive research design. It will take the form of a survey, and a sample will be selected to make the process economical. Different techniques of information collection will therefore be used, since the data to be collected are primary data. The sample will be interviewed, and their responses will be noted down. The presence of the researcher may influence some people, and this necessitates the use of questionnaires that respondents fill in on their own. In addition, most of the analyzed studies were conducted in Malaysia, China, Pakistan, and India. Besides, most of the analyzed studies were conducted in the job satisfaction and performance context and the employee motivation context, followed by the organizational effectiveness context. To that end, the findings of this review study provide an insight into the current trend of how performance appraisal affects employees' motivation.

Keywords: Performance appraisal · Employee motivation · Employee performance · Systematic review

1 Introduction

The system of performance appraisal has proven to be among the well-known paradoxes in the effectiveness of human resource management in any of the world's organizations [1, 2]. Performance appraisal aims to enhance the efficacy and efficiency of employees' performance [3, 4]. The process involves the evaluation of employees' performance in their respective departments against the organization's set rules, objectives, and procedures [5, 6]. The prime purpose of performance appraisal is to promote an accurate assessment of the employee's performance and to relate it to the likelihood of rewards [7–9]. Many benefits accrue to the firm from performance appraisal, such as job satisfaction, employee morale, increased employee commitment, reduced employee turnover, a feeling of equity, and the association of performance with rewards [10–12]. Following all these benefits that result from performance appraisal, it is evident that employees are motivated through the practice [7, 13, 14]. The main question that has caused debate is the impact that the system has on the motivation of the organizational workers [2, 5]. This is due to the opinions of [4]. Indeed, the employees in any organization are highly motivated through rewards, but this is not the only way [15, 16]. Additionally, there are conditions for the rewards to be extended to them, and this calls for the employment of performance appraisal. Through the process, several factors will promote the motivation of the employees. The proposed research will focus on the impacts of performance appraisal on employees' motivation [17].

1.1 Research Questions


The existing systematic literature review illustrates the importance of performance appraisal fairness for employees' motivation, which is affected by other important factors. This study examines the collected studies by focusing on the factors that determine performance appraisal and employees' motivation [18]. Understanding the factors that are related to performance appraisal and employees' motivation will help the researcher plan ahead to explore the impact of other factors that are missing in the present body of literature. Overall, this study systematically reviews and synthesizes the performance appraisal studies related to employees' motivation in order to analyze the collected studies comprehensively. Based on the 27 research articles published on performance appraisal and employees' motivation from 2015 to 2020, the results of this systematic review are reported according to five research questions. In particular, the following five research questions are raised by this review study:
– RQ1: How does performance appraisal affect employee motivation?
– RQ2: What are the factors that change employees' motivation?
– RQ3: How does the evaluation of employees influence their motivation?
– RQ4: What are the main research methods of the selected studies?
– RQ5: What are the main disciplines/contexts of the selected studies?

1.2 Research Importance


In many organizations, one of the intrigues of management is the performance appraisal system, which aims at enhancing fairness and effectiveness as well as boosting job performance within the departments of an organization. Performance appraisal has been in existence for decades now, being utilized to assess employees' performance against specific laid-down procedures and rules. Performance appraisal is necessary for raising the performance levels of employees in terms of output quality and time management. There are many positive outcomes of a performance appraisal program, which include increased motivation and teamwork spirit, morale among the employees, reduction in employee turnover, and job satisfaction, as well as the possibility of a reward system [19, 20]. The breadth of the phrase performance appraisal gives it different meanings to different people, one of them being that it is a process through which people's performance is gauged to enable a reward system to be put in place. In a bid to determine whether performance appraisal fairness has any significant impact on employees' motivation, and to help interrogate the issues faced by employees that are related to performance appraisal and fairness in an organization, this paper embarks on a systematic review of the existing literature on the effect of performance appraisal on employees' motivation, a method that is trusted across many evidence-based fields.

2 Literature Review

This section involves a detailed discussion of the theories that reinforce the study. Additionally, it discusses the components of the performance appraisal system and the factors that are said to influence the motivation of employees [21–23]. The preset goals and objectives of the firm give the employees the direction they will take and make them aware of the organization's targets. It is through this outline that the employees will be able to assess themselves and know their progress. Authors in [24] mentioned that the evaluation of the employees' performance would only be possible and effective if their results are compared against these set rules and goals of the organization. Another prominent researcher on performance appraisal was Harrington. Harrington said that different employees set different purposes in different situations of their work. He moreover argued that not all goals that are set can be reached. However, it has been mentioned that the level of achievement could be improved through employees' motivation [25, 26]. Most of these researchers, among others, suggested that all organizations are obligated to have motivated employees. The basis of the employees' motivation is an evaluation of their performance.
Once employees are evaluated, they will be able to know their contribution to the business. It will be a better feeling for the employee to know that he is contributing to the success of the business [27]. However, even when the performance of the employee does not meet the required targets, the evaluation is still beneficial, since employees will know their weaknesses and will be corrected and given the right directions and techniques for performing their tasks. Other organizations take the initiative of sending their employees for training to improve the results of the performance appraisal. The key terms in this research study are performance appraisal, employee motivation, and employee performance.
• Performance Appraisal: The measure of achievements of goals in relation to the vision and mission of the company ([4], pp. 16–20).
• Employee's Motivation: The level of commitment, energy, and creativity that the workers have in the course of their duties ([28], p. 27).
• Employee Performance: The measure of achievements of goals in relation to the vision and mission of the company ([29], pp. 224–247).

• Systematic review: A systematic review summarizes the results of carefully designed studies available and critically appraises them, synthesizing the findings quantitatively or qualitatively [30].

3 Methods

A comprehensive review of the literature is an important step before any research is


carried out. It builds the basis for the accumulation of knowledge, which facilitates the
expansion and improvement of the theories, closes gaps in research, and exposes areas
in which previous research has been missing [31, 32]. The review was undertaken in
various stages: the identification of inclusion and exclusion criteria, data sources, and
quality assessment. The details of these stages are described in the following sub-
sections.

3.1 Inclusion/Exclusion Criteria


The articles that will be critically analyzed in this review study should meet the
inclusion and exclusion criteria described in Table 1.

Table 1. Inclusion and exclusion criteria.

No. | Criteria | Inclusion | Exclusion
1 | Date | All | –
2 | Source type | Peer-reviewed articles, scholarly journals, academic journals, dissertations & theses | Non-peer-reviewed articles, case studies, newspapers, book reviews, and other types of publications
3 | Language | English | Papers that use languages other than English
4 | Type of studies | Quantitative, qualitative, systematic review | Annual reports, advertisement, directory, film, and other studies
5 | Study design | Survey, interview, case study | –
6 | Measurement | Service quality and retention | –
7 | Outcome | Relationship between performance appraisal and employees' motivation | –
8 | Context | Should involve employees' motivation and performance appraisal | All contexts that do not mention employees' motivation and performance appraisal in the title and the abstract

3.2 Data Sources and Correlation Analysis


The studies included in this systematic literature review were collected through a broad search of available databases such as ProQuest One Academic, Science Direct, Wiley Online Library, Sage Journals, and JSTOR. Many scholars, such as [33–36], have used the systematic review method widely to identify the high-quality articles that need to be analyzed.
The search found 160 articles using the aforementioned keywords, as seen in Table 2. We filtered out 23 articles that were found to be duplicates. Thus, the total number of collected papers becomes 137, and their distribution across the databases they belong to is presented in Table 3.

Table 2. The data sources and search keywords.


Keywords research
“Performance appraisal” AND “employee’s motivation”
“Performance appraisal influence” AND “employee’s motivation”
“Ti(appraisal performance)” AND “employee’s motivation”

Table 3. The number of articles collected from each database.
Databases #
ProQuest One Academic 86
Science Direct 41
Wiley Online Library 11
Sage journals 5
JSTOR 17
Total 160

Overall, 27 research articles met the inclusion criteria and have been used in the
analysis process. Figure 1 shows the systematic review process and the number of
articles determined at each stage.
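
The de-duplication step described above (160 retrieved records, 23 duplicates removed before screening) can be illustrated with a short sketch. This is a minimal, illustrative example only: the record layout and the matching rule (case-insensitive, whitespace-normalised titles) are assumptions, as the paper does not state how duplicates were detected.

```python
# A minimal, illustrative sketch of the de-duplication step described above.
# The record layout and the title-based matching rule are assumptions.
def deduplicate(records):
    seen, unique = set(), []
    for record in records:
        key = " ".join(record["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

sample = [
    {"title": "Performance appraisal and employee motivation", "database": "ProQuest One Academic"},
    {"title": "Performance Appraisal and  Employee Motivation", "database": "Science Direct"},
]
print(len(deduplicate(sample)))  # 1 -> the second record is treated as a duplicate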

3.3 Data Coding and Analysis


The features linked to the research methodology quality were coded, including the
following items [37]. (a) Title, (b) Authors, (c) Year, (d) Place, (e) Context, (f) Inde-
pendent Factors, (g) Mediating Variable, (h) Dependent factors, (i) Data Collection
Methods, (j) Sample Size, (k) Items and questions used, (l) Level &Trustworthiness,
(m) Journal, (n) data basis, (o) number of Citation, (p) effect and the findings.
Throughout this data analysis stage, I have excluded the studies that did not describe the effect of performance appraisal on employee motivation [38]. Many figures were drawn below
to show the data that were extracted from the selected studies. In order to appropriately
and critically analyze the collected studies, we will take the same idea of the TAM

model, and will divide the factors into five groups [39–42]. The five groups are 1)
financial factors: all factors related to financing or money, 2) internal factors: all factors
related to the business strategy such as customer relationships, innovation management,
customer service, 3) customer-related factors: which are all factors related to the cus-
tomers, such as Customer Loyalty, and Customer retention, and 4) service quality-
related factors: will include service quality dimension like Responsiveness, Empathy,
Tangibles, Reliability, Assurance, 5) Contextual factors: factors that have a controlling
or mediating impact, such as Satisfaction and commitment.
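
For illustration, the coded items (a)–(p) listed above can be captured as a simple record structure so that each study is extracted consistently. The sketch below is one possible representation only; the field names paraphrase the coded items, and the dataclass and the example entry are assumptions rather than part of the original study.

```python
# A minimal, illustrative coding sheet for the items (a)-(p) listed above.
# Field names paraphrase the coded items; the dataclass itself is an assumption.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CodedStudy:
    title: str
    authors: List[str]
    year: int
    place: str
    context: str
    independent_factors: List[str] = field(default_factory=list)
    mediating_variables: List[str] = field(default_factory=list)
    dependent_factors: List[str] = field(default_factory=list)
    data_collection_method: str = ""
    sample_size: int = 0
    database: str = ""
    citations: int = 0
    findings: str = ""

# Hypothetical example entry, used only to show how one row of the coding sheet looks.
example = CodedStudy(
    title="Example study", authors=["Author A"], year=2018, place="Malaysia",
    context="Job satisfaction",
    independent_factors=["Performance appraisal"],
    dependent_factors=["Employee motivation"],
    data_collection_method="Questionnaire", database="ScienceDirect",
)
print(example.title, example.year)
```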

Fig. 1. Systematic review process.

4 Results

4.1 The Frequency of the Factors


The pie charts in Figs. 2 and 3 below show the percentage of each dependent and independent factor in the 27 analyzed articles. For the dependent factors, as seen in Fig. 2, job satisfaction was the highest, with 32%, followed by employee motivation with 21%. For the independent factors, the highest was performance appraisal with 40%, followed by employee performance with 35% and then employee engagement.

Fig. 2. The frequency of dependent factors.

Fig. 3. The frequency of independent factors.

Dependent factor | Frequency | Independent factor | Frequency
Job satisfaction | 23 | Performance appraisal | 19
Employee motivation | 15 | Employees' performance | 17
Job security | 11 | Employees' engagement | 12
Commitment | 11 | |
Employee turnover | 11 | |
Organizational effectiveness | 13 | |
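
The percentages reported for Figs. 2 and 3 are shares of each factor's frequency in the relevant total. The sketch below only illustrates that arithmetic; the frequencies are taken from the table above, while the whole-percent rounding and the choice of denominator are assumptions, so the exact values reported in the text may be based on a different denominator.

```python
# A minimal, illustrative sketch: converting factor frequency counts into
# percentage shares for a pie chart. Rounding and denominator are assumptions.
def percentage_shares(frequencies):
    total = sum(frequencies.values())
    return {name: round(100 * count / total) for name, count in frequencies.items()}

dependent = {
    "Job satisfaction": 23,
    "Employee motivation": 15,
    "Job security": 11,
    "Commitment": 11,
    "Employee turnover": 11,
    "Organizational effectiveness": 13,
}
print(percentage_shares(dependent))
```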
# | Source | Country | Database | Method
1 | [31] | Bucharest, Romania | ScienceDirect | Qualitative method: Survey
2 | [43] | Malaysia | ScienceDirect | Qualitative method: Survey
3 | [44] | Bali, Indonesia | ScienceDirect | Quantitative method: Survey
4 | [45] | Pakistan | ProQuest | Qualitative method: Interview
5 | [46] | Kelantan, Malaysia | ProQuest | Quantitative method: Questionnaire
6 | [18] | Kampala, Uganda | ProQuest | Cross-sectional, correlational method: Questionnaire
7 | [47] | San Diego | ProQuest | Quantitative method: Survey
8 | [43] | St Louis | ScienceDirect | Qualitative method: Literature
9 | [48] | Punjab, Pakistan | ProQuest | Quantitative method: Questionnaire
10 | [49] | Norway | ScienceDirect | Quantitative method: Questionnaire
11 | [50] | India | JSTOR | Cross-sectional method: Survey questionnaire
12 | [51] | Pakistan | JSTOR | Quantitative method: Survey
13 | [52] | Taiwan | JSTOR | Quantitative method: Questionnaire
14 | [53] | Australia | JSTOR | Qualitative method: Survey
15 | [54] | Literature | JSTOR | Qualitative - Literature
16 | [55] | Ujjain | JSTOR | Qualitative method
17 | [56] | Canada | Wiley Online Library | Quantitative method: Survey
18 | [57] | China | Wiley Online Library | Quantitative method: Survey
19 | [58] | Hong Kong, China | Wiley Online Library | Quantitative method: Survey
20 | [59] | Malaysia | Sage Journals | Quantitative method: Survey
21 | [60] | Germany | Sage Journals | Quantitative method: Interview
22 | [61] | China | Sage Journals | Quantitative method: Survey
23 | [62] | Malaysia | ScienceDirect | Cross-sectional quantitative method: Survey
24 | [12] | Malaysia | ScienceDirect | Quantitative method: Questionnaire
25 | [26] | United States of America | ProQuest | Quantitative method: Questionnaire
26 | [53] | Australia | Taylor & Francis | Qualitative case study methodology
27 | [63] | Peshawar, Pakistan | ProQuest | Qualitative method: Survey

For this research, the articles were distributed and grouped according to the countries in which the studies were conducted. The table above shows that performance appraisal research articles were most frequently produced in Malaysia, followed by Pakistan, China, India, Taiwan, Canada, Australia, Germany, the United States of America, and others. The figure below shows the distribution of studies in terms of country, and the table illustrates the distribution of the total collected articles across the different countries in which these studies were conducted: the majority of the performance appraisal on employee motivation studies (N = 4) were undertaken in Malaysia and Pakistan, followed by China (N = 4), among the other countries.

Fig. 4. Distribution of studies in terms of country

5 Conclusion

The systematic review of performance appraisal and employee motivation has indicated that fair performance appraisals play a massive role in determining the motivation levels of employees. The different approaches used by various organizations in the process, whether in the private or public sector, shape the way workers perceive it. Ideally, performance appraisal ought to assess the competencies that the employees possess and their performance, and provide the data so that the management can use those data to reward them accordingly. Some of the results of performance appraisal are growth-oriented, and therefore employee commitment to the organizational goals is pegged on the process. Performance appraisal is a crucial process in any organization. For any organization to achieve its desired goals of management and employee output, an effective performance appraisal system has to be put in place. An effective appraisal system engages in performance planning as a foundation. It also prescribes the steps to be taken when managing and reviewing performance, as well as performance measuring and monitoring techniques. The process should ensure that there is impartiality and that employees have the right perception, in order to safeguard their motivation and commitment towards organizational ideals.

References
1. Alkalha, Z., Al-Zu’bi, Z., Al-Dmour, H., Alshurideh, M., Masa’deh, R.: Investigating the
effects of human resource policies on organizational performance: an empirical study on
commercial banks operating in Jordan. Eur. J. Econ. Finance Adm. Sci. 51(1), 44–64 (2012)
2. Ammari, G., Alkurdi, B., Alshurideh, M., Alrowwad, A.: Investigating the impact of
communication satisfaction on organizational commitment: a practical approach to increase
employees’ loyalty. Int. J. Mark. Stud. 9(2), 113–133 (2017)
3. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84–95 (2012)
4. Bowra, Z.A., Nasir, A.: Impact of fairness of performance appraisal on motivation and job
satisfaction in banking sector of Pakistan. J. Basic Appl. Sci. Res. 4(2), 16–20 (2014)
5. Alshurideh, M., Alhadid, A., Alkurdi, B.: The effect of internal marketing on organizational
citizenship behavior an applicable study on the University of Jordan employees. Int. J. Mark.
Stud. 7(1), 138 (2015)
6. Obeidat, B., Sweis, R., Zyod, D., Alshurideh, M.: The effect of perceived service quality on
customer loyalty in internet service providers in Jordan. J. Manag. Res. 4(4), 224–242
(2012)
7. Alshraideh, A., Al-Lozi, M., Alshurideh, M.: The impact of training strategy on
organizational loyalty via the mediating variables of organizational satisfaction and
organizational performance: an empirical study on jordanian agricultural credit corporation
staff. J. Soc. Sci. 6, 383–394 (2017)
8. Obeidat, Z., Alshurideh, M., Al Dweeri., R., Masa’deh, R.: The influence of online revenge
acts on consumers psychological and emotional states: does revenge taste sweet? In: 33
IBIMA Conference proceedings, Granada, Spain, 10–11 April 2019 (2019)
9. Abu Zayyad, H.M., Obeidat, Z.M., Alshurideh, M.T., Abuhashesh, M., Maqableh, M.,
Masa’deh, R.: Corporate social responsibility and patronage intentions: the mediating effect
of brand credibility. J. Mark. Commun., 1–24 (2020)
10. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of e-
service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and e-
trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
11. Alshurideh, M., Masa’deh, R., Alkurdi, B.: The effect of customer satisfaction upon
customer retention in the Jordanian mobile market: an empirical investigation. Eur. J. Econ.
Finance Adm. Sci. 47(12), 69–78 (2012)
12. Hamid, S., Hamali, J.B.H., Abdullah, F.: Performance measurement for local authorities in
Sarawak. Procedia-Soc. Behav. Sci. 224, 437–444 (2016)
13. Ashurideh, M.: Customer service retention–a behavioural perspective of the UK mobile
market. Durham University (2010)
14. Alshurideh, M.T.: Exploring the main factors affecting consumer choice of mobile phone
service provider contracts. Int. J. Commun. Netw. Syst. Sci. 9(12), 563–581 (2016)
15. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Ayadeh, I.: An investigation of
factors affecting patients waiting time in primary health care centers: an assessment study in
Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
16. Alshurideh, M.: A qualitative analysis of customer repeat purchase behaviour in the UK
mobile phone market. J. Manag. Res. 6(1), 109 (2014)
17. Teo, S.T.T., Bentley, T., Nguyen, D.: Psychosocial work environment, work engagement,
and employee commitment: a moderated, mediation model. Int. J. Hosp. Manag. 88, 102415
(2019)

18. Sendawula, K., Nakyejwe Kimuli, S., Bananuka, J., Najjemba Muganga, G.: Training,
employee engagement and employee performance: evidence from Uganda’s health sector.
Cogent Bus. Manag. 5(1), 1470891 (2018)
19. Alshurideh, M., et al.: Loyalty program effectiveness: theoretical reviews and practical
proofs. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
20. Alzoubi, H., Alshurideh, M., Al Kurdi, B., Inairata, M.: Do perceived service value, quality,
price fairness and service recovery shape customer satisfaction and delight? A practical study
in the service telecommunication context. Uncertain Supply Chain Manag. 8(3), 1–10 (2020)
21. Alshurideh, M.T., et al.: The impact of Islamic bank’s service quality perception on
Jordanian customer’s loyalty. J. Manag. Res. 9, 139–159 (2017)
22. Al Dmour, H., Alshurideh, M., Shishan, F.: The influence of mobile application quality and
attributes on the continuance intention of mobile shopping. Life Sci. J. 11(10), 172–181
(2014)
23. Meng, F., Wu, J.: Merit pay fairness, leader-member exchange, and job engagement:
evidence from Mainland China. Rev. Public Pers. Adm. 35(1), 47–69 (2015)
24. Lothian, N.: Measuring corporate performance: a guide to non-financial indicators.
Chartered Institute of Management Accountants (1987)
25. Alshurideh, D.M.: Do electronic loyalty programs still drive customer choice and repeat
purchase behaviour? Int. J. Electron. Cust. Relatsh. Manag. 12(1), 40–57 (2019)
26. Harrington, J.R., Lee, J.H.: What drives perceived fairness of performance appraisal?
Exploring the effects of psychological contract fulfillment on employees’ perceived fairness
of performance appraisal in US federal agencies. Public Pers. Manag. 44(2), 214–238 (2015)
27. Rubel, M.R.B., Kee, D.M.H.: High commitment compensation practices and employee
turnover intention: mediating role of job satisfaction. Mediterr. J. Soc. Sci. 6(6 S4), 321
(2015)
28. Edirisooriya, W.A.: Impact of rewards on employee performance: with special reference to
ElectriCo. In: Proceedings of the 3rd International Conference on Management and
Economics, vol. 26, no. 1, pp. 311–318 (2014)
29. Sharma, N.P., Sharma, T., Agarwal, M.N.: Measuring employee perception of performance
management system effectiveness. Empl. Relat. 38, 224–247 (2016)
30. Martinic, M.K., Pieper, D., Glatt, A., Puljak, L.: Definition of a systematic review used in
overviews of systematic reviews, meta-epidemiological studies and textbooks. BMC Med.
Res. Methodol. 19(1), 203 (2019)
31. Arnăutu, E., Panc, I.: Evaluation criteria for performance appraisal of faculty members.
Procedia-Social Behav. Sci. 203, 386–392 (2015)
32. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: International Conference on Knowledge Management in Organiza-
tions, pp. 94–106 (2018)
33. Assad, N.F., Alshurideh, M.T.: Financial reporting quality, audit quality, and investment
efficiency: evidence from GCC economies. WAFFEN-UND Kostumkd. J. 11(3), 194–208
(2020)
34. Alshurideh, M.T., Assad, N.F.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
35. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
36. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)

37. Al-Emran, M., Mezhuyev, V., Kamaludin, A., Shaalan, K.: The impact of knowledge
management processes on information systems: a systematic review. Int. J. Inf. Manag. 43
(July), 173–187 (2018)
38. Calabrò, A., Vecchiarini, M., Gast, J., Campopiano, G., De Massis, A., Kraus, S.: Innovation
in family firms: a systematic literature review and guidance for future research. Int. J. Manag.
Rev. 21(3), 317–355 (2019)
39. Salloum, S.A., Al-Emran, M.: Factors affecting the adoption of e-payment systems by
university students: extending the TAM with trust. Int. J. Electron. Bus. 14(4), 371–390
(2018)
40. Alhashmi, S.F.S., Salloum, S.A., Abdallah, S.: Critical success factors for implementing
Artificial Intelligence (AI) Projects in Dubai Government United Arab Emirates
(UAE) health sector: applying the extended Technology Acceptance Model (TAM). In:
International Conference on Advanced Intelligent Systems and Informatics, pp. 393–405
(2019)
41. Alshurideh, M., Al Kurdi, B., Salloum, S.: Examining the main mobile learning system
drivers’ effects: a mix empirical examination of both the Expectation-Confirmation Model
(ECM) and the Technology Acceptance Model (TAM). In: International Conference on
Advanced Intelligent Systems and Informatics, pp. 406–417 (2019)
42. Al-Emran, M., Mezhuyev, V., Kamaludin, A.: Technology acceptance model in M-learning
context: a systematic review. Comput. Educ. 125, 389–412 (2018)
43. Nair, M.S., Salleh, R.: Linking performance appraisal justice, trust, and employee
engagement: a conceptual framework. Procedia-Soc. Behav. Sci. 211, 1155–1162 (2015)
44. Syafii, L.I., Thoyib, A., Nimran, U.: The role of corporate culture and employee motivation
as a mediating variable of leadership style related with the employee performance (studies in
Perum Perhutani). Procedia-Soc. Behav. Sci. 211, 1142–1147 (2015)
45. Khaskhelly, F.Z.: Investigating the impact of training on employee performance: a study of
non-government organizations at Hyderabad division. University of Sindh, Jamshoro (2018)
46. Baleghi-Zadeh, S., Ayub, A.F.M., Mahmud, R., Daud, S.M.: Behaviour Intention to use the
learning management: integrating technology acceptance model with task-technology fit.
Middle-East J. Sci. Res. 19(1), 76–84 (2014)
47. Kumar, R.: Research Methodology: A Step-by-step Guide for Beginners. Sage Publications
Limited, Thousand Oaks (2019)
48. Imran, M., Hamid, S.N.B.A., Aziz, A.B., Wan, C.Y.: The effect of performance appraisal
politics on employee performance in emergency services of Punjab, Pakistan. Acad. Strateg.
Manag. J. 18, 1–7 (2019)
49. Saether, E.A.: Motivational antecedents to high-tech R&D employees’ innovative work
behavior: self-determined motivation, person-organization fit, organization support of
creativity, and pay justice. J. High Technol. Manag. Res. 30(2), 100350 (2019)
50. Shrivastava, A., Purang, P.: Performance appraisal fairness & its outcomes: a study of Indian
banks. Indian J. Ind. Relat. 51, 660–674 (2016)
51. Sattar, T., Ahmad, K., Hassan, S.M.: Role of human resource practices in employee
performance and job satisfaction with mediating effect of employee engagement. Pak. Econ.
Soc. Rev. 53, 81–96 (2015)
52. Yang, J.-T., Wan, C.-S., Wu, C.-W.: Effect of internal branding on employee brand
commitment and behavior in hospitality. Tour. Hosp. Res. 15(4), 267–280 (2015)
53. Gollan, P.J., Kalfa, S., Agarwal, R., Green, R., Randhawa, K.: Lean manufacturing as a
high-performance work system: the case of cochlear. Int. J. Prod. Res. 52(21), 6434–6447
(2014)
54. Dobre, O.-I.: Employee motivation and organizational performance. Rev. Appl. Soc.-Econ.
Res. 5(1) (2013)

55. Jain, R.: Employee innovative behavior: a conceptual framework. Indian J. Ind. Relat. 51, 1–
16 (2015)
56. Schmidt, C.G., Foerstl, K., Schaltenbrand, B.: The supply chain position paradox: green
practices and firm performance. J. Supply Chain Manag. 53(1), 3–25 (2017)
57. Ren, T., Xiao, Y., Yang, H., Liu, S.: Employee ownership heterogeneity and firm
performance in China. Hum. Resour. Manag. 58(6), 621–639 (2019)
58. Kim, T., Wang, J., Chen, T., Zhu, Y., Sun, R.: Equal or equitable pay? Individual differences
in pay fairness perceptions. Hum. Resour. Manag. 58(2), 169–186 (2019)
59. Hanaysha, J.R.: An examination of the factors affecting consumer’s purchase decision in the
Malaysian retail market. PSU Res. Rev. 2, 7–23 (2018)
60. Hauff, S., Alewell, D., Hansen, N.K.: Further exploring the links between high-performance
work practices and firm performance: a multiple-mediation model in the German context.
Ger. J. Hum. Resour. Manag. 32(1), 5–26 (2018)
61. Li, F., Chen, T., Lai, X.: How does a reward for creativity program benefit or frustrate
employee creative performance? The perspective of transactional model of stress and coping.
Gr. Organ. Manag. 43(1), 138–175 (2018)
62. Ibrahim, Z., Ismail, A., Mohamed, N.A.K., Raduan, N.S.M.: Association of managers’
political interests towards employees’ feelings of distributive justice and job satisfaction in
performance appraisal system. Procedia-Soc. Behav. Sci. 224, 523–530 (2016)
63. Naeem, M., Jamal, W., Riaz, M.K.: The relationship of employees’ performance appraisal
satisfaction with employees’ outcomes: evidence from higher educational institutes.
FWU J. Soc. Sci. 11(2), 71–81 (2017)
Social Media and Digital Transformation
Social Media Impact on Business: A Systematic Review

Fatima Ahmed Almazrouei1, Muhammad Alshurideh1,2(&), Barween Al Kurdi3, and Said A. Salloum4
1 University of Sharjah, Sharjah, UAE
malshurideh@sharjah.ac.ae
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. Social media is a multifaceted phenomenon that significantly affects business competence, mainly because it spearheads the evolutionary process. The primary purpose of this systematic review is to evaluate social media as a model that influences business enterprises at the local and international levels. The systematic review utilized four primary hypotheses to determine the influence of social media on businesses: social media (SM) significantly influences sales (SL) in business; SM has a strong relationship with business loyalty (LO); SM influences business through awareness (AW); and SM significantly influences the level of business performance (BP). Different research studies established that social media significantly contributes to the competence of firms, mainly because of its global reach. Facets of social media include social media knowledge and various platforms such as Facebook, Instagram, Twitter, YouTube, and LinkedIn. In this way, social media has fostered the emergence of various business capabilities, such as brand awareness, brand loyalty, and sales. Social media is thus a platform that profoundly influences the level of business competence through the advancement of business capabilities.

Keywords: Social media (SM) · Platforms · Systematic review · Business capabilities

1 Introduction

Social media was initially created as a mechanism of connection and reconnection among people and has since demonstrated immense impacts on the lifestyle of people around the world. Social media has become significantly important to people, as it has come to mean much more to the global community [1–9]. The advancement of the internet infrastructure, specifically social media and networking, has affected the online community. Authors in [10] assert that the overall availability of the internet has allowed various people to use social media platforms such as Facebook, Twitter, Snapchat, Google+, LinkedIn, YouTube, Instagram, and many others. More so, in [11], the authors postulate that the social media environment has further been employed in the creation of online content to add to the interaction process. The source also indicates that social media platforms and online communities have increased the accessibility of information. These sites have facilitated the ease of sharing information and content without making physical contact. Delaveau et al. in [12] noted that it is with these advancements that social networking sites (SNSs) have emerged and taken center stage in e-commerce.
Research Importance: Social media has become especially important in commercial advertising. Consumers have benefited in the process by sharing information and recommendations about goods and services, thereby strengthening a critical factor of marketing. The utilization of social media in commercial advertisement is a trending strategy and is here to stay in today's extremely competitive world.
Research Objectives: The general objective of this systematic review is to examine the use of social media in business. This is illustrated by various studies on the outcomes of using social media in the marketing process and in business as a whole.
Research Questions: This systematic review includes several questions that lead to building the hypotheses and research model. The main question is "What is the impact of social media on business?", and there are three sub research questions, as shown in Fig. 1.

Fig. 1. Sub research questions



2 Literature Review

Hypotheses indicate the anticipated outcomes or the present reports in the literature concerning social media use in business, and they frame the key terms addressed in this part. Based on the literature, it is hypothesized that social media significantly influences the level of business performance. Additionally, it is hypothesized that social media has a strong relationship with businesses through the enhancement of business capabilities, and that social media influences business performance through the various elements of business capabilities. Tan et al. in [13] researched the effectiveness of utilizing social media in business. The findings of their study indicate that text content created and developed on the platforms that utilizes particular words has a positive influence on the attention of users.
The key terms in this research study are social media, commercial advertising, and platforms.
– Social Media: describes various forms of electronic communication, like blogging websites and those used for social networking, that allow individuals to share personal messages, ideas, various content, and information.
– Platforms: a number of technological developments that act as a base for the development of other technologies, processes, or applications.
– Systematic Review: a sort of literature review that uses systematic approaches to gather secondary data and objectively analyze research studies.
– Business Capability: a representation and expression of what a business does and has the ability to do.

2.1 Literature Gaps


Although there has been considerable research on the impact of social media, little is known about its impacts on business. The key points addressed in this section are the gaps, or the limited information, on the research topic. Various social media platforms have facilitated customer activities and commercial advertising, and businesses have benefited a lot through them. For instance, platforms like LinkedIn, Facebook, Instagram, Twitter, and other sites have been advantageous in accessing a more significant number of people and engaging them to buy or sell certain products. This has been effective since a significant number of people are reached within a short time, although the impacts on business have not been fully reported.

3 Methods

An extensive literature review was conducted to build a foundation for theory expansion, close the existing research gap, and uncover areas that previous research has missed. The literature review was carried out based on the systematic review conducted in this context. The study used distinct stages, namely, inclusion and exclusion criteria, data sources and search strategies, quality assessment, and data coding, as shown in Tables 1, 2, and 3.

Table 1. Inclusion and exclusion criteria.

Inclusion criteria | Exclusion criteria
Should involve social media and business | Social media is not used by business
Should be peer-reviewed for the document type | Articles on health and science
Should be a scholarly journal for the source type | Non-scholarly journals are excluded
The articles should be in English | Articles not in English
Should be published between 2015–2020 | Papers published in 2014 and back are excluded

Table 2. Data sources and correlation analysis

Keyword search
"Social Media" AND "Business Performance"
"Social Media Influence" AND "Business" ✓
"Social Media influence" AND "Sales"
Allintitle: Social media AND Business ✓

3.1 Search Strategy


The studies used in this systematic review were identified by conducting an advanced search of the available studies published between 2015 and 2020 in the following databases: Google Scholar, Science Direct, IEEE Explore, JSTOR, Emerald, and Sage. The keywords used to filter the results were "Social Media" AND "Business Performance", "Social Media influence" AND "Business", "Social Media influence" AND "Sales", and All in the title: Social media AND Business. In fact, two keywords were chosen to be used in the search, "Social Media influence" AND "Business" and All in the title: Social media AND Business, in order to get better and more specific results. These keywords resulted in 747 articles from the six databases, and after filtering the duplicated articles out, 45 articles remained. After reading the abstracts and scanning the 45 articles, 30 articles were selected for the systematic review.

Table 3. Data sources and databases.

Databases | Frequency | After filtering & removing duplicates | Selected studies
Google Scholar (keywords, all in title) | 330 | 11 | 9
Science Direct | 45 | 8 | 4
IEEE Explore | 134 | 5 | 2
JSTOR | 103 | 6 | 4
Emerald | 121 | 12 | 10
Sage | 14 | 3 | 1
Total | 747 | 45 | 30

In fact, some databases, such as Google Scholar, give a hundred thousand results, and for this reason the keyword "All in the title: Social media AND Business" was chosen for this database to narrow and specify the results so that they are more related to the topic. As a result, the results were reduced from a hundred thousand articles to 330 articles.
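
The narrowing of results described above can be tracked as a simple screening funnel. The sketch below merely restates the counts reported in the text and in Table 3 (747 records identified across six databases, 45 after de-duplication, 30 selected after abstract screening); the function name and structure are illustrative assumptions, and nothing new is computed.

```python
# A minimal sketch that restates the screening funnel from Sect. 3.1 and Table 3.
# Per-database counts and totals come from the text; the function name is an assumption.
per_database = {
    "Google Scholar (allintitle)": 330,
    "Science Direct": 45,
    "IEEE Explore": 134,
    "JSTOR": 103,
    "Emerald": 121,
    "Sage": 14,
}

def report_funnel(counts, after_dedup, selected):
    identified = sum(counts.values())                        # 747 records identified
    print(f"Identified: {identified}")
    print(f"After removing duplicates: {after_dedup}")       # 45
    print(f"Selected after abstract screening: {selected}")  # 30

report_funnel(per_database, after_dedup=45, selected=30)
```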

3.2 Data Extraction


The characteristics related to the research model that were extracted included the research model, methods, discipline, and level of education. The studies that did not meet the inclusion criteria were excluded from the synthesis, as seen in Table 4. The steps of the systematic review process are explained in Fig. 2.

Table 4. Data sources and databases.

N | Databases | Number of articles (Stage 1) | Number of articles (Stage 2) | Number of articles (Stage 3) | Number of articles (Stage 4)
1 | ProQuest | 123 | 89 | 30 | 12
2 | Emerald | 117 | 42 | 12 | 9
3 | Science Direct | 98 | 71 | 15 | 11
4 | Taylor & Francis | 34 | 9 | 5 | 1
5 | Google Scholar | 989 | 102 | 11 | 8
Total | | 1361 | 313 | 73 | 41

3.3 Quality Assessment


A quality assessment, together with the inclusion and exclusion criteria already mentioned, was used to evaluate the quality of the articles, as seen in Table 5. A quality assessment checklist was developed in which points were awarded on a three-point scale: "Yes" was worth 1 point, "No" had no points, while "Partially" was awarded 0.5 points. The score for each research study therefore ranged between 0 and 9, with the highest scores indicating that the research study addresses most of the research questions, as seen in Table 6. The 30 articles remaining after removing duplicates are the ones that passed the quality assessment, as seen in Table 7. These studies thus qualified for further analysis.
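
Under the scoring rule described above (nine questions, Yes = 1, Partially = 0.5, No = 0), each study's quality percentage is simply its total score divided by the maximum of 9. The sketch below is a minimal illustration of that arithmetic, using study S1 from Table 6 as the worked example; the function name and answer encoding are assumptions.

```python
# A minimal sketch of the quality-assessment arithmetic described above:
# nine questions scored Yes = 1, Partially = 0.5, No = 0, maximum score 9.
SCORES = {"yes": 1.0, "partially": 0.5, "no": 0.0}

def quality_percentage(answers):
    total = sum(SCORES[a.lower()] for a in answers)
    return total, round(100 * total / 9)

# Study S1 in Table 6: seven "Yes" answers and two "Partially" answers.
s1 = ["Yes"] * 7 + ["Partially"] * 2
print(quality_percentage(s1))  # (8.0, 89) -> matches the 89% reported for S1
```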

Fig. 2. Systematic review Process

Table 5. Quality assessment questions


# Question
1 Are the research aims clearly specified?
2 Was the study designed to achieve these aims?
3 Are the variables considered by the study clearly specified?
4 Is the study context/discipline clearly specified?
5 Are the data collection methods adequately detailed?
6 Does the study explain the reliability/validity of the measures?
7 Are the statistical techniques used to analyze the data adequately described?
8 Do the results add to the literature?
9 Does the study add to your knowledge or understanding?

Table 6. Quality assessment results.


S# Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Total Percentage
S1 1 1 1 1 1 1 1 0.5 0.5 8 89%
S2 1 1 1 1 1 1 1 1 0.5 8.5 94%
S3 1 1 1 0.5 1 1 1 1 1 8.5 94%
S4 1 1 1 0.5 1 1 1 0.5 1 8 89%
S5 1 1 0.5 1 1 1 1 1 1 8.5 94%
S6 1 1 1 1 1 0.5 1 1 1 8.5 94%
S7 1 1 0.5 1 1 1 1 1 1 8.5 94%
S8 1 0.5 1 1 1 0.5 1 1 1 8 89%
S9 1 1 1 1 0.5 1 1 1 1 8.5 94%
S10 1 1 0.5 1 1 0.5 1 1 1 8 89%
S11 1 1 1 1 0.5 1 1 1 1 8.5 94%
S12 1 1 1 0.5 1 1 1 1 1 8.5 94%
S13 1 1 1 1 0.5 1 1 0.5 1 8 89%
S14 1 0.5 1 1 1 1 1 1 1 8.5 94%
S15 1 1 1 0.5 1 1 1 1 0.5 8 89%
S16 1 1 1 1 1 0.5 1 1 1 8.5 94%
S17 1 1 1 0.5 1 1 0.5 1 1 8 89%
S18 1 1 0.5 1 1 1 1 1 1 8.5 94%
S19 1 1 1 1 0.5 1 1 1 1 8.5 94%
S20 0.5 1 0.5 1 1 1 1 1 1 8 89%
S21 1 1 1 1 1 0.5 1 1 1 8.5 94%
S22 1 0.5 1 1 0.5 1 1 0.5 1 7.5 83%
S23 1 1 1 0.5 1 1 1 1 1 8.5 94%
S24 1 1 0.5 1 1 1 1 1 1 8.5 94%
S25 1 1 1 1 1 1 1 0.5 1 8.5 94%
S26 1 0.5 1 1 1 0.5 1 1 1 8 89%
S27 0.5 1 1 1 1 1 1 1 1 8.5 94%
S28 1 1 1 1 0.5 0.5 1 1 1 8 89%
S29 1 0.5 1 0.5 1 1 1 0.5 1 7.5 83%
S30 1 1 0.5 1 1 1 0.5 1 1 8 89%

Table 7. Data analysis.


Source | Dependent factors (Brand awareness, Sales, Brand loyalty) | Independent factors (Social media knowledge, Social media advertising, Facebook, LinkedIn, Instagram, Google, Social media platforms)
[14] x x
[15] x x x x x x x
[16] x x x x
x x x x x
[17] x x x x
[18] x x x x x
[19] x x x
[20] x x x x x
[21] x x x
[22] x x x x x
[23] x x x x x
[24] x x x
[25] x x x x
[26] x x x x
[27] x x x
[28] x x x
[29] x x
[30] x x
[31] x
[32] x x
[33] x x x x
[34] x x x x
[35] x
[36] x x
[37] x x x x x
[38] x x
[39] x x x x x
[40] x x
[41] x x x x
[42] x x x x x x

4 Results and Analysis

Social media encompasses an array of tools and knowledge that significantly influence the competence of business through increased sales and performance [43, 44]. Primarily, social media acts as a platform that enhances the improvement of business operations, mainly because of the intensified global connectivity. In this case, businesspeople acquire dynamic capabilities through various social media tools. These capabilities include advanced brand awareness, loyalty, innovative insights, product diversification, and international business networks. The main objective of the research model entails unveiling the relationship between social media and businesses. Therefore, the model demonstrates the significant patterns in terms of independent, mediator, and dependent factors. I chose this model as it offers a context that assists companies to utilize the various capabilities acquired through social media to boost the competence of the business.
Businesses collect information about their competence level through various capability entities and social media. These are referred to as strategic business points, which companies use to create competent marketing responses. The various capability entities encompass brand awareness, brand loyalty, product innovation, product diversification, and global business networks. Insights from these entities can be used to increase the quality of business operations, such as the process of developing products and delivering services. Businesses can also use this information to customize their products and services to fit individual tastes and preferences. Effective business strategies contribute to the enhancement of business operations and competence through the information gathered from the various social media platforms. With this research, businesses will be equipped with knowledge of how to effectively use social media, through the acquired dynamic capabilities, in their sales and marketing activities, which eventually leads to an increase in sales.

5 Conclusion

Social media is a multifaceted phenomenon that profoundly influences the level of business competence, mainly because of the enhancement of its capabilities. In most cases, social media plays the supporting role of an interactive platform and source of knowledge. The intensification of connectivity creates a niche market for businesses to exploit through a sufficient flow of information. As a result, enterprises acquire dynamic capabilities to use in gaining a competitive position in the niche market. Therefore, social media is a platform that significantly influences businesses through the enhancement of competence.

References
1. Salloum, S.A., Al-Emran, M., Monem, A.A., Shaalan, K.: A survey of text mining in social
media: Facebook and twitter perspectives. Adv. Sci. Technol. Eng. Syst. J. 2(1), 127–133
(2017)
2. Mhamdi, C., Al-Emran, M., Salloum, S.A.: Text mining and analytics: a case study from
news channels posts on Facebook, vol. 740 (2018)
3. Salloum, S.A., Al-Emran, M., Shaalan, K.: Mining text in news channels: a case study from
Facebook. Int. J. Inf. Technol. Lang. Stud. 1(1), 1–9 (2017)
4. Habes, M., Alghizzawi, M., Khalaf, R., Salloum, S.A., Ghani, M.A.: The relationship
between social media and academic performance: Facebook perspective. Int. J. Inf. Technol.
Lang. Stud. 2(1), 12–18 (2018)
5. Alshurideh, M., Salloum, S.A., Al Kurdi, B., Al-Emran, M.: Factors affecting the acceptance
of the social network: an empirical study using PLS-SEM approach. In: ACM International
Conference Proceeding Series, vol. Part F1479 (2019)
6. Salloum, S.A., Mhamdi, C., Al Kurdi, B., Shaalan, K.: Factors affecting the adoption and
meaningful use of social media: a structural equation modeling approach. Int. J. Inf. Technol.
Lang. Stud. 2(3), 96–109 (2018)
7. Salloum, S.A., Maqableh, W., Mhamdi, C., Al Kurdi, B., Shaalan, K.: Studying the social
media adoption by university students in the United Arab Emirates. Int. J. Inf. Technol.
Lang. Stud. 2(3), 83–95 (2018)
8. Salloum, S.A., Al-Emran, M., Habes, M., Alghizzawi, M., Ghani, M.A., Shaalan, K.:
Understanding the impact of social media practices on e-learning systems acceptance, vol.
1058 (2020)
9. Al-Maroof, R.S., Salloum, S.A., AlHamadand, A.Q.M., Shaalan, K.: A Unified model for
the use and acceptance of stickers in social media messaging. In: International Conference on
Advanced Intelligent Systems and Informatics, pp. 370–381 (2019)
10. Lu, Y., Zhao, L., Wang, B.: From virtual community members to C2C e-commerce buyers:
Trust in virtual communities and its effect on consumers’ purchase intention. Electron.
Commer. Res. Appl. 9(4), 346–360 (2010)

11. Lai, L.S.L., Turban, E.: Groups formation and operations in the Web 2.0 environment and
social networks. Group Decis. Negot. 17(5), 387–402 (2008)
12. Delaveau, F., Mueller, A., Ngassa, C.K., Guillaume, R., Molière, R., Wunder, G.:
Perspectives of physical layer security (physec) for the improvement of the subscriber
privacy and communication confidentiality at the air interface. Perspectives (Montclair) 27,
28 (2016)
13. Tan, W.J., Kwek, C.L., Li, Z.: The antecedents of effectiveness interactive advertising in the
social media. Int. Bus. Res. 6(3), 88 (2013)
14. Veldeman, C., Van Praet, E., Mechant, P.: Social media adoption in business-to-business: IT
and industrial companies compared. Int. J. Bus. Commun. 54(3), 283–305 (2017)
15. Kumarasamy, T., Srinivasan, J.: Impact of social media applications on small and medium
business entrepreneurs in India. Int. J. Commer. Manag. Res. 3(10), 50–53 (2017)
16. Flanigan, R.L., Obermier, T.R.: An assessment of the use of social media in the industrial
distribution business-to-business market sector. J. Technol. Stud. 42(1), 18–29 (2016)
17. Bouwman, H., Nikou, S., Molina-Castillo, F.J., de Reuver, M.: The impact of digitalization
on business models. Digit. Policy Regul. Gov. 20, 105–124 (2018)
18. Gavino, M.C., Williams, D.E., Jacobson, D., Smith, I.: Latino entrepreneurs and social
media adoption: personal and business social network platforms. Manag. Res. Rev. 42, 469–
494 (2019)
19. Ahmad, S.Z., Bakar, A.R.A., Ahmad, N.: Social media adoption and its impact on firm
performance: the case of the UAE. Int. J. Entrep. Behav. Res. 25, 84–111 (2019)
20. Parveen, F., Jaafar, N.I., Ainin, S.: Social media’s impact on organizational performance and
entrepreneurial orientation in organizations. Manag. Decis. 54, 2208–2234 (2016)
21. Eid, R., Abdelmoety, Z., Agag, G.: Antecedents and consequences of social media
marketing use: an empirical study of the UK exporting B2B SMEs. J. Bus. Ind. Mark. 35,
284–305 (2019)
22. Cao, Y., Ajjan, H., Hong, P., Le, T.: Using social media for competitive business outcomes.
J. Adv. Manag. Res. 15, 211–235 (2018)
23. Moy, M.M., Cahyadi, E.R., Anggraeni, E.: The impact of social media on knowledge
creation, innovation, and performance in small and medium enterprises. Indonesian J. Bus.
Entrep. 6(1), 23 (2020)
24. Kwon, J., Han, I., Kim, B.: Effects of source influence and peer referrals on information
diffusion in Twitter. Ind. Manag. Data Syst. 117, 896–909 (2017)
25. Tóth, Z., Liu, M., Luo, J., Braziotis, C.: The role of social media in managing supplier
attractiveness. Int. J. Oper. Prod. Manag. (2019)
26. del Carmen Alarcón, M., Rialp, A., Rialp, J.: The effect of social media adoption on
exporting firms’ performance. In: Entrepreneurship in International Marketing. Emerald
Group Publishing Limited (2015)
27. Jones, N., Borgman, R., Ulusoy, E.: Impact of social media on small businesses. J. Small
Bus. Enterp. Dev. 22, 611–632 (2015)
28. Grizane, T., Jurgelane, I.: Social media impact on business evaluation. Procedia Comput.
Sci. 104, 190–196 (2017)
29. Rugova, B., Prenaj, B.: Social media as marketing tool for SMEs: opportunities and
challenges. Acad. J. Bus. 2(3), 85–97 (2016)
30. Wamba, S.F., Carter, L.: Social media tools adoption and use by SMEs: an empirical study.
In: Social Media and Networking: Concepts, Methodologies, Tools, and Applications,
pp. 791–806. IGI Global (2016)
31. Thelijjagoda, S., Hennayake, T.M.: The impact of social media networking (SMN) towards
business environment in Sri Lanka. In: 2015 Fifteenth International Conference on Advances
in ICT for Emerging Regions (ICTer), pp. 207–213 (2015)

32. Aral, S., Dellarocas, C., Godes, D.: Introduction to the special issue—social media and
business transformation: a framework for research. Inf. Syst. Res. 24(1), 3–13 (2013)
33. Mirchandani, A., Gaur, B.: Current trends & future prospects of social media analytics in
business intelligence practices. Viitattu 7, 2019 (2019)
34. Nisar, T.M., Prabhakar, G., Strakova, L.: Social media information benefits, knowledge
management and smart organizations. J. Bus. Res. 94, 264–272 (2019)
35. Prodanova, J., Van Looy, A.: A systematic literature review of the use of social media for
business process management. In: International Conference on Business Process Manage-
ment, pp. 403–414 (2017)
36. Tajvidi, R., Karami, A.: The effect of social media on firm performance. Comput. Human
Behav., 105174 (2017)
37. Wang, Z., Kim, H.G.: Can social media marketing improve customer relationship
capabilities and firm performance? Dynamic capability perspective. J. Interact. Mark. 39,
15–26 (2017)
38. Tajudeen, F.P., Jaafar, N.I., Ainin, S.: Understanding the impact of social media usage
among organizations. Inf. Manag. 55(3), 308–321 (2018)
39. He, W., Wang, F.-K., Chen, Y., Zha, S.: An exploratory investigation of social media
adoption by small businesses. Inf. Technol. Manag. 18(2), 149–160 (2017)
40. Abed, S.S., Dwivedi, Y.K., Williams, M.D.: Social commerce as a business tool in Saudi
Arabia’s SMEs. Int. J. Indian Cult. Bus. Manag. 13(1), 1–19 (2016)
41. Cesaroni, F.M., Consoli, D.: Are small businesses really able to take advantage of social
media? Electron. J. Knowl. Manag. 13(4), 257 (2015)
42. Pourkhani, A., Abdipour, K., Baher, B., Moslehpour, M.: The impact of social media in
business growth and performance: a scientometrics analysis. Int. J. Data Netw. Sci. 3(3),
223–244 (2019)
43. Alghizzawi, M., Habes, M., Salloum, S.A., Ghani, M.A., Mhamdi, C., Shaalan, K.: The
effect of social media usage on students’e-learning acceptance in higher education: a case
study from the United Arab Emirates. Int. J. Inf. Technol. Lang. Stud. 3(3), 13–26 (2019)
44. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: International Conference on Knowledge Management in Organiza-
tions, pp. 94–106 (2018)
Digital Transformation and Organizational Operational Decision Making: A Systematic Review

Ala'a Ahmed¹, Muhammad Alshurideh¹,², Barween Al Kurdi³, and Said A. Salloum⁴

¹ University of Sharjah, Sharjah, UAE
malshurideh@sharjah.ac.ae
² Faculty of Business, University of Jordan, Amman, Jordan
³ Amman Arab University, Amman, Jordan
⁴ Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. The digital transformation concept relates to how organizations utilize the power and possibilities of technologies such as cloud computing, artificial intelligence, machine learning, and IoT to enhance business operations and customers' experience or to create new business lines and methods. This study presents a systematic literature review focusing on the organizational perspective of operational decision-making approaches under the digital transformation umbrella. Finally, the study concludes with a set of gaps in previous studies and provides new research venues.

Keywords: Systematic review · Digital transformation · Artificial intelligence · Organizational operational

1 Introduction

The fourth industrial revolution has begun to change the old understanding of business operations and decision-making approaches: we now deal with a very dynamic market, rapidly shifting consumer behavior, and an accelerating curve of technology evolution [1]. Understanding the implications of this new era has become necessary for an organization to sustain itself, grow, and survive. Digital transformation (DT) and the data-driven organization have become a strategic goal on many organizations' and senior managers' agendas [2], but there is little literature on how organizational decision-making ability is affected. Digital transformation is defined in [2] as the "use of new digital technologies (social media, mobile, analytics or embedded devices) to enable significant business improvement such as enhancing users' experience, streamlining operations, or creating new business models." A review of the previous literature shows growth in the number of studies; Fig. 1 below shows the dramatic increase in research over 2010–2019. As the focus turns to digital transformation and the impact of big data analytics, many theories are now applied, such as information technology business value.

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 708–719, 2021.
https://doi.org/10.1007/978-3-030-58669-0_63
Also, this review shows that not many studies address the value of digital transformation and its impact on an organization's ability to make decisions based on data analytics, and how this supports business growth [3]. This systematic review focuses on two main theoretical backgrounds: first, the established resource-based view theory and, second, the dynamic capabilities view. Both theories represent a solid foundation for our study, which tries to evaluate and identify the importance of each factor in organizational data-driven decisions and its effect on business performance.

Fig. 1. Digital transformation and operational decision-making publication growth (2010–2019).

The study is structured as follows. Section 2 gives a brief overview of the research contribution and gap. Section 3 details the methodology used in our systematic literature review, describing all steps taken to select and filter the articles and to extract the data used to build a factor frequency table. Section 4 explains the aspects of digital transformation in the business context and the areas of operational decision making in the organizational digital transformation context. Section 5 concludes with final remarks on the overall framework of organizational digital transformation and its impact on business performance and value.

2 Research Contribution and Gaps

Previous literature has delivered state-of-the-art contributions on the impact of artificial intelligence (AI) and decision support systems (DSS) on organizational operations and on building the dynamic capabilities that prepare an organization for digital transformation and the adoption of AI-based decision systems. This study develops a different approach, analyzing collected quantitative data to measure the effects of digital transformation, specifically AI-based decision support systems (AIDSS), on organizational survival, growth, and performance in highly dynamic sectors such as the technology and manufacturing industries. The research question of this study is: what are the effects of digital transformation, specifically the use of AI-based decision support systems, on organizational survival, growth, and performance?
This study will demonstrate the effect of adopting digital transformation, specifically the use of an AI-based decision support system, on organizational survival, growth, and performance, in order to provide a better understanding of how organizations can utilize the resources and processes necessary to achieve maximum performance and profit. Finally, the study's outcome could be extended to other service sectors as a guideline for organizations during and after digital transformation.

3 Methods

A systematic literature review method is followed, as in [4–8]. A four-stage review is conducted: first, identifying relevant studies according to established search criteria; second, applying inclusion and exclusion criteria to the articles and studies; third, fully reviewing the articles and studies; and finally, performing data extraction and analysis. The sections below describe each stage in detail.

3.1 Identify Relevant Studies


Based on the research question, "How deep is the impact of digital transformation on business operational decision making?", a multi-stage review of articles was used to identify relevant studies, as shown in Fig. 2 below.

Fig. 2. Articles selection process.

3.2 Inclusion/Exclusion Criteria


Selecting articles and studies is the most critical stage of this systematic review; the inclusion and exclusion criteria applied are shown in Table 1 below:

Table 1. Inclusion and exclusion criteria.


Date 2010–2020
Language English
Type of Studies Peer-Reviewed, Q1
Study Design Random and Controlled
Measurements Studies about the impact of digital transformation on decision making
3.3 Search Strategy


The Scopus, ProQuest, and EBSCO databases were used in the search. To reduce and control the search, the keyword combinations listed in Table 2 below were used; these keywords were searched within the title, abstract, and keywords of the articles.

Table 2. Keywords and databases used.

Keywords | Databases
(Digital transformation AND artificial intelligence-based AND organization decisions making approaches) | ProQuest, Scopus, and EBSCO
(AI-Based Digital Transformation AND organization decisions making approaches) | ProQuest, Scopus, and EBSCO

The search returned 276 articles from ProQuest, 163 from Scopus, and 51 from EBSCO, all published in scholarly journals; these were identified and processed in the RefWorks reference manager. In stage 2, the titles of all articles were reviewed to identify relevant studies, and studies not related to the business context of digital transformation were excluded; as an outcome, 235 studies passed to stage 3, where abstracts were reviewed for their relevance to the main research question, "How deep is the impact of digital transformation on business operational decision making?". The outcome of this stage is 69 articles to be reviewed in depth to assess their relevance and quality in order to build a novel contribution.
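The multi-stage screening described above can be thought of as a simple filtering pipeline. The following is a minimal illustrative sketch, not the authors' actual tooling: the record fields, keyword lists, and screening rules are assumptions made purely for illustration.

```python
# Minimal sketch of a multi-stage screening pipeline (illustrative only).
# Record fields and filtering keywords are assumptions, not the authors'
# actual RefWorks workflow.

from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    title: str
    abstract: str
    source: str  # "ProQuest", "Scopus", or "EBSCO"

def deduplicate(records):
    """Merge the database exports and drop records with duplicate titles."""
    seen, unique = set(), []
    for r in records:
        key = r.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def title_in_scope(r, terms=("digital transformation",)):
    """Stage 2: keep titles related to the business context of digital transformation."""
    return any(t in r.title.lower() for t in terms)

def abstract_in_scope(r, terms=("decision", "operation", "business")):
    """Stage 3: keep abstracts relevant to the main research question."""
    return any(t in r.abstract.lower() for t in terms)

def screen(records):
    stage2 = [r for r in deduplicate(records) if title_in_scope(r)]
    stage3 = [r for r in stage2 if abstract_in_scope(r)]
    return stage2, stage3

if __name__ == "__main__":
    exports = [
        Record("Digital transformation and operational decisions", "How DT shapes decision making.", "Scopus"),
        Record("Digital transformation and operational decisions", "How DT shapes decision making.", "ProQuest"),
        Record("A survey of wireless sensor protocols", "Unrelated engineering topic.", "EBSCO"),
    ]
    after_titles, after_abstracts = screen(exports)
    print(len(after_titles), len(after_abstracts))  # 1 1
```

In the review itself, the same funnel narrows 490 exported records down to 235 title-relevant studies and then to 69 abstract-relevant articles for full-text assessment.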

3.4 Quality Assessment


The remaining 69 articles were each assessed individually, considering the scientific approach followed in the study and the credibility of the articles, assessing whether the findings are valuable to organizations, and finally considering Q1-ranked journals to increase the value of the contribution of our systematic review. Following this assessment, 27 articles were included in our study, as seen below in Table 3.

Table 3. Selected Articles.


Author Journals Rank
[9] MIT Sloan Management Review Q1
[10] Journal of Knowledge Management Q1
[11] MIT Sloan Management Review Q1
[12] Journal of Manufacturing Technology Management Q1
[12] MIT Sloan Management Review Q1
[13] International Journal of Production Research Q1
[14] Journal of Big Data Q1
[15] Journal of Knowledge Management Q1
[16] Journal of Big Data Q1
[17] Information Systems Frontiers Q1
[18] Information Systems Frontiers Q1
[15] Information Systems Frontiers Q1
[19] Journal of Knowledge Management Q1
[20] Journal of Knowledge Management Q1
[21] Journal of Knowledge Management Q1
[22] MIT Sloan Management Review Q1
[23] Journal of Big Data Q1
[24] Journal of Business Research Q1
[25] Government Information Quarterly Q1
[26] Telecommunications Policy Q1
[27] Government Information Quarterly Q1
[28] International Journal of Production Economics Q1
[29] Journal of Manufacturing Technology Management Q1
[30] Journal of Information Technology Q1
[31] Energies Q1
[32] IEEE Access Q1
[33] International Journal of Production Research Q1

3.5 Data Extraction


To properly synthesize the results across all selected studies, the studies were categorized in spreadsheets along several dimensions, such as study methodology (case study, qualitative approach, quantitative approach), publication journal, publication rank, level of analysis, key findings, and future research suggestions. A coding scheme was then used to analyze the frequency of the selected factors, as seen below in the frequency table (Table 4).

Table 4. Frequency of extracted factors (columns in order: Strategy, Governance, Methods, Software, Technology, People and Culture, Decision Making Performance).

Artificial Intelligence for Innovation in Austria | 1 0 0 0 1 1 1
Barriers, practices, methods and knowledge management tools in startups | 1 0 1 0 0 1 1
Big data in the public sector: Uncertainties and readiness | 1 0 0 0 1 1 0
Big Data: Deep Learning for financial sentiment analysis | 0 0 0 1 1 0 0
Conceptual Approach to the Development of Financial Technologies in the Context of Digitalization of Economic Processes | 0 0 0 1 1 0 1
Digital supply chain model in Industry 4.0 | 0 0 1 1 1 0 1
Digitalization, business models, and SMEs: How do business model innovation practices improve the performance of digitalizing SMEs? | 1 0 1 1 1 1 1
Driving innovation through big open linked data (BOLD): Exploring antecedents using interpretive structural modeling | 1 1 1 1 1 1 1
An empirical study on innovation motivators and inhibitors of Internet of Things applications for industrial manufacturing enterprises | 0 0 0 0 0 1 0
Examining the Core Dilemmas Hindering Big Data-related Transformations in Public-Sector Organisations | 0 0 0 0 1 0 0
Factors Introducing Industry 4.0 to SMEs | 1 0 0 0 1 0 0
Industry 4.0: State of the art and future trends | 0 0 0 0 1 0 0
Information and reformation in KM systems: big data and strategic decision-making | 1 0 0 1 1 1 1
Integration of big-data ERP and business analytics (BA) | 0 0 0 1 1 1 1
Knowledge Management: A Tool for Implementing the Digital Economy | 0 0 0 0 1 1 1
Management of Service Level Agreements for Cloud Services in IoT: A Systematic Mapping Study | 0 1 0 0 1 0 0
Mapping the values of IoT | 0 0 1 1 1 1 1
Multi-agent systems applications in energy optimization problems: A state-of-the-art review | 0 0 0 1 1 0 1
Organizational implications of a comprehensive approach for cloud-storage sourcing | 0 0 0 1 1 1 1
Smart Maintenance: a research agenda for industrial maintenance management | 0 0 1 0 1 0 1
Standardization framework for sustainability from the circular economy 4.0 | 0 1 0 0 0 0 1
Technology blindness and temporal imprecision: rethinking the long term in an era of accelerating technological change | 1 0 0 0 0 0 1
The impact of big data analytics on firms' high-value business performance | 0 0 0 0 0 1 1
The Impact of Digitalization on the Insurance Value Chain and the Insurability of Risks | 1 0 0 0 1 0 1
The Internet of Everything: Smart things and their impact on business models | 0 0 1 1 1 0 1
The role of future-oriented technology analysis in e-Government: a systematic review | 1 0 0 0 1 0 0
The role of leadership in a digitalized world: A review | 1 0 0 0 0 1 1
Theory-driven or process-driven prediction? Epistemological challenges of big data analytics | 0 0 0 1 0 0 1
Towards a technological, organizational, and socio-technical well-balanced KM initiative strategy: a pragmatic approach to knowledge management | 1 0 0 0 1 1 1
Total | 12 3 7 12 22 14 21
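The frequency row at the bottom of Table 4 is simply the column-wise sum of the 1/0 codes assigned to each study. Below is a minimal sketch of that tally; the dictionary holds only an excerpt of the coded rows, and the names are illustrative rather than the authors' actual coding sheet.

```python
# Minimal sketch of the coding-scheme tally behind Table 4 (illustrative only).
# Each study is coded 1/0 against seven factors; summing each column gives the
# frequency totals. Only a few coded rows are reproduced here.

FACTORS = ["Strategy", "Governance", "Methods", "Software",
           "Technology", "People and Culture", "Decision Making Performance"]

coded_rows = {
    "Artificial Intelligence for Innovation in Austria":      [1, 0, 0, 0, 1, 1, 1],
    "Digital supply chain model in Industry 4.0":              [0, 0, 1, 1, 1, 0, 1],
    "Driving innovation through big open linked data (BOLD)":  [1, 1, 1, 1, 1, 1, 1],
    # ... the remaining coded studies are omitted for brevity ...
}

def factor_frequencies(rows):
    """Sum each factor column across all coded studies."""
    totals = [0] * len(FACTORS)
    for values in rows.values():
        for i, v in enumerate(values):
            totals[i] += v
    return dict(zip(FACTORS, totals))

print(factor_frequencies(coded_rows))
# Applied to the full set of 29 coded studies, this reproduces the totals in
# Table 4: Strategy 12, Governance 3, Methods 7, Software 12, Technology 22,
# People and Culture 14, Decision Making Performance 21.
```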

4 Results
4.1 Digital Transformation in the Business Context
This section explains the disruptive impact of digital transformation on an organization's business model and innovation. To determine the characteristics of digital innovation, we use a dynamic capabilities framework to understand the nature of strategic change in the organization [34, 35]. This framework gives us the ability to describe an organization's capacities. Dynamic capabilities are innovation-based and distinguishable from a firm's operational capabilities; conventional capabilities help a firm in the present by maintaining the status quo, but this leaves the organization vulnerable to environmental change [36–40]. [41] explains that although ordinary capabilities enable the organization to perform operational tasks, conventional capabilities in functions such as accounting, human resources management, and sales are now easily
replicable because they can be outsourced. The literature clarifies the relationship between dynamic capabilities, strategy, and business models with the argument that a business model is "a reflection of the firm's realized strategy" [42]; business models (a present or short-term perspective) are adjusted to face either upcoming or existing contingencies. In other words, dynamic capabilities represent the intermediary between strategy and business models, ensuring the strategic renewal of organizations [43]. The authors in [44] emphasize that firms require a system of dynamic capabilities to orchestrate resources and evolve the business model; balanced redundancy, requisite variety, and cognitive discretion were the dynamic capabilities that supported the evolution of a news organization's digital business model. For organizations, [45] argues, "In many cases, corporate strategy dictates business model design. At times, however, the arrival of new general-purpose technology (e.g., the Internet) opens opportunities for radically new business models to which corporate strategy must then respond." Building sensing, seizing, and transforming capabilities thus allows a firm to craft a future strategy that designs, creates, and refines a defensible business model, guides organizational transformation, and provides a durable source for obtaining a competitive advantage [45]. For organizations to stay competitive, operations managers must have a good understanding of the competition and their competitors [46]. Organizations need to set the correct operations strategy and reflect it across overall organizational activity, from design to planning, in order to enhance the organization's position and competitive edge by improving operational processes and becoming more flexible and faster at adopting market changes. We also need to understand the concept of a decision, namely that there is an "individual decision-maker facing a choice involving uncertainty about outcomes" [47]. Putting the individual decision-maker at the center of decision-making is the most intuitive approach to studying algorithmic decision-making in organizations [48], where the individual is the recipient of automated decisions or recommendations, for instance, operative decision-makers who provide IT-enabled services [49]. One study examined the implementation, design, and evaluation of performance management systems (PMS) in the context of SME use: PMS providers targeting SMEs use the empirical nature of PMS to provide proof of the capabilities of their software offerings, and such work can also be considered a pathway to evaluate and improve existing PMS; the study shows the interest of business owners in proposed PMS features that were unavailable in existing PMS [50]. Another study designed a new performance modeling system based on data envelopment analysis (DEA) and an artificial neural network (ANN) to demonstrate overall business efficiency performance predictions, to be tested empirically under a performance modeling framework; it focuses on advancing research in performance benchmarking and modeling by proposing a new model that can help managers make decisions and solve theory-practice issues. Changing management approaches and the overall culture of the organization is necessary, since culture plays a major role in the successful implementation of any innovative system or concept; where age group was a key factor in one study, a significant difference in perception was found when answering the same question about the Industry 4.0 concept. Another study shows the deep relationship between humans and artificial intelligence decision-making algorithms, between detaching in the context of rational distance and attaching to decision making caused by accidental and
infrastructural proximity, imposed engagement, and effective adhesion. The study demonstrates that the results of unbalanced relationships are delayed decisions and solutions and manipulation of data, and it also points out the important role of media studies in explaining behavior created during digital transformation in the organization (Bader and Kaiser, 2019). This extends to organizational behavior in the data-driven and business digitalization era, where being data-driven has to become part of the organizational culture. However, the multidisciplinary nature of big data analytics and data science (BDADS) seems to collide with the dominant "functional silo design" that characterizes business schools; the scope and breadth of this radical, digitally enabled change necessitate a global questioning of the nature and structure of business education. Understanding how incumbent firms in traditional industries build dynamic capabilities for digital transformation, conceptualized as a process of building dynamic capabilities for ongoing strategy renewal, helps us understand the importance of the time factor in organizations' digital transformation and the maintenance of their competitive edge [51]. We created a research model highlighting the impact of and correlation between organizations building dynamic capabilities and business performance, taking into consideration the impact of internal and external factors, as seen below in Fig. 3.

Fig. 3. Digital transformation and data frameworks.

4.2 Area of Operational Decision Making in Organization Digital Transformation Context
One of the essential issues is converting digital transformation insights into decisive action. Utilizing the data produced by digital transformation is now key for many organizations, where the adoption of data-driven decisions is starting to be one of the key competitive advantages of firms in the new, dynamic nature of the market. Current literature considers utilizing the data sufficient to create the expected outcome for the organization, whereas it is also important to study the techniques and obstacles that block this value. We suggest studying the impact of external factors, such as environmental conditions and data-breach threats, on digital transformation, where they affect the organization's decision-making ability and expected performance.

5 Conclusion

Our systematic review presents a deep theoretical framework tackling an organization's digital transformation from different angles and aspects, where the dynamic capabilities of any organization and external and internal factors relate to business growth and maintaining performance. Additional studies need to test the proposed framework empirically through surveys, interviews, and focus groups targeting experts and practitioners in this field, such as operations managers and CEOs, with a combined (qualitative and quantitative) data collection methodology to be adopted. Finally, the main goal of this review is to give digital transformation a bigger frame and role in organizational performance and growth, not only as a technology used to improve processes within the organization.

References
1. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: International Conference on Knowledge Management in Organiza-
tions, pp. 94–106 (2018)
2. Woo, S.-R., et al.: STING-dependent cytosolic DNA sensing mediates innate immune
recognition of immunogenic tumors. Immunity 41(5), 830–842 (2014)
3. Constantiou, I.D., Kallinikos, J.: New games, new rules: big data and the changing context
of strategy. J. Inf. Technol. 30(1), 44–57 (2015)
4. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
5. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
6. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
7. Alshurideh, M.T., Assad, N.F.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
8. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in
software engineering, pp. 1–57. Software Engineering Group, School of Computer Science
and Mathematics, Keele University (2007)
9. Prem, E.: Artificial intelligence for innovation in Austria. Technol. Innov. Manag. Rev. 9
(12), 5–15 (2019)
10. Oliva, F.L., Kotabe, M.: Barriers, practices, methods and knowledge management tools in
startups. J. Knowl. Manag. 23, 1838–1856 (2019)
11. Klievink, B., Romijn, B.-J., Cunningham, S., de Bruijn, H.: Big data in the public sector:
uncertainties and readiness. Inf. Syst. Front. 19(2), 267–283 (2017)
12. Garay-Rondero, C.L., Martinez-Flores, J.L., Smith, N.R., Morales, S.O.C., Aldrette-
Malacara, A.: Digital supply chain model in Industry 4.0. J. Manuf. Technol. Manag. (2019)
13. Da Xu, L., Xu, E.L., Li, L.: Industry 4.0: state of the art and future trends. Int. J. Prod. Res.
56(8), 2941–2962 (2018)
14. Bouwman, H., Nikou, S., Molina-Castillo, F.J., de Reuver, M.: The impact of digitalization
on business models. Digit. Policy Regul. Gov. 20, 105–124 (2018)
15. Dwivedi, Y.K., et al.: Driving innovation through big open linked data (BOLD): exploring
antecedents using interpretive structural modelling. Inf. Syst. Front. 19(2), 197–212 (2017)
16. Heinis, T.B., Hilario, J., Meboldt, M.: Empirical study on innovation motivators and
inhibitors of Internet of Things applications for industrial manufacturing enterprises.
J. Innov. Entrep. 7(1), 10 (2018)
17. Kuoppakangas, P., Kinder, T., Stenvall, J., Laitinen, I., Ruuskanen, O.-P., Rannisto, P.-H.:
Examining the core dilemmas hindering big data-related transformations in public-sector
organisations. NISPAcee J. Public Adm. Policy 12(2), 131–156 (2019)
18. Latilla, V.M., Frattini, F., Petruzzelli, A.M., Berner, M.: Knowledge management,
knowledge transfer and organizational performance in the arts and crafts industry: a
literature review. J. Knowl. Manag. 22(6), 1310–1331 (2018)
19. Intezari, A., Gressel, S.: Information and reformation in KM systems: big data and strategic
decision-making. J. Knowl. Manag. 21, 71–91 (2017)
20. Shi, Z., Wang, G.: Integration of big-data ERP and business analytics (BA). J. High Technol.
Manag. Res. 29(2), 141–150 (2018)
21. Mizintseva, M.F., Gerbina, T.V.: Knowledge management: a tool for implementing the
digital economy. Sci. Tech. Inf. Process. 45(1), 40–48 (2018)
22. Mubeen, S., Asadollah, S.A., Papadopoulos, A.V., Ashjaei, M., Pei-Breivold, H., Behnam,
M.: Management of service level agreements for cloud services in IoT: a systematic mapping
study. IEEE Access 6, 30184–30207 (2017)
23. Nicolescu, R., Huth, M., Radanliev, P., De Roure, D.: Mapping the values of IoT. J. Inf.
Technol. 33(4), 345–360 (2018)
24. Krishnaswamy, V., Sundarraj, R.P.: Organizational implications of a comprehensive
approach for cloud-storage sourcing. Inf. Syst. Front. 19(1), 57–73 (2017)
25. Bokrantz, J., Skoogh, A., Berlin, C., Wuest, T., Stahre, J.: Smart maintenance: a research
agenda for industrial maintenance management. Int. J. Prod. Econ., 107547 (2019)
26. Ávila-Gutiérrez, M.J., Martín-Gómez, A., Aguayo-González, F., Córdoba-Roldán, A.:
Standardization framework for sustainability from circular economy 4.0. Sustainability 11
(22), 6490 (2019)
27. Dorr, A.: Technology blindness and temporal imprecision: rethinking the long term in an era
of accelerating technological change. Foresight 18, 391–413 (2016)
28. Popovič, A., Hackney, R., Tassabehji, R., Castelli, M.: The impact of big data analytics on
firms’ high value business performance. Inf. Syst. Front. 20(2), 209–222 (2018)
29. Eling, M., Lehmann, M.: The impact of digitalization on the insurance value chain and the
insurability of risks. Geneva Pap. Risk Insur. Pract. 43(3), 359–396 (2018)
30. Langley, D.J., van Doorn, J., Ng, I.C.L., Stieglitz, S., Lazovik, A., Boonstra, A.: The internet
of everything: smart things and their impact on business models. J. Bus. Res. (2020)
31. Sánchez-Torres, J.M., Miles, I.: The role of future-oriented technology analysis in e-
Government: a systematic review. Eur. J. Futures Res. 5(1), 15 (2017)
32. Cortellazzo, L., Bruni, E., Zampieri, R.: The role of leadership in a digitalized world: a
review. Front. Psychol. 10, 1938 (2019)
33. Elragal, A., Klischewski, R.: Theory-driven or process-driven prediction? Epistemological
challenges of big data analytics. J. Big Data 4(1), 1–20 (2017)
34. Teece, D., Peteraf, M., Leih, S.: Dynamic capabilities and organizational agility: risk,
uncertainty, and strategy in the innovation economy. Calif. Manag. Rev. 58(4), 13–35
(2016)
35. Schilke, O., Hu, S., Helfat, C.E.: Quo vadis, dynamic capabilities? A content-analytic review
of the current state of knowledge and recommendations for future research. Acad. Manag.
Ann. 12(1), 390–439 (2018)
36. Alshurideh, M., Shaltoni, A., Hijawi, D.: Marketing communications role in shaping
consumer awareness of cause-related marketing campaigns. Int. J. Mark. Stud. 6(2), 163
(2014)
37. Ghannajeh, A., et al.: A qualitative analysis of product innovation in Jordan’s pharmaceu-
tical sector. Eur. Sci. J. 11(4), 474–503 (2015)
38. Alshurideh, M., Al Kurdi, B., Abumari, A., Salloum, S.: Pharmaceutical promotion tools
effect on physician’s adoption of medicine prescribing: evidence from Jordan. Mod. Appl.
Sci. 12(11), 210–222 (2018)
39. AlShurideh, M., Alsharari, N.M., Al Kurdi, B.: Supply chain integration and customer
relationship management in the airline logistics, vol. 9, no. 02, pp. 392–414 (2019)
40. Alshurideh, et al.: Determinants of pro-environmental behaviour in the context of emerging
economies. Int. J. Sustain. Soc. 11(4), 257–277 (2019)
41. Teece, D.J.: A dynamic capabilities-based entrepreneurial theory of the multinational
enterprise. J. Int. Bus. Stud. 45(1), 8–37 (2014)
42. Trkman, M., Trkman, P.: A framework for increasing business value from social media.
Econ. Res. istraživanja 31(1), 1091–1110 (2018)
43. Agarwal, R., Helfat, C.E.: Strategic renewal of organizations. Organ. Sci. 20(2), 281–293
(2009)
44. Velu, C.: A systems perspective on business model evolution: the case of an agricultural
information service provider in India. Long Range Plann. 50(5), 603–620 (2017)
45. Teece, D.J.: Dynamic capabilities as (workable) management systems theory. J. Manag.
Organ. 24(3), 359–368 (2018)
46. Hodgkinson, G.P., Johnson, G.: Exploring the mental models of competitive strategists: the
case for a processual approach. J. Manag. Stud. 31(4), 525–552 (1994)
47. Peterson, M.: An Introduction to Decision Theory. Cambridge University Press, Cambridge
(2017)
48. Davenport, T.H., Dyché, J.: Big data in big companies. Int. Inst. Anal. 3, 1–31 (2013)
49. Chae, B.K., Yang, C., Olson, D., Sheu, C.: The impact of advanced analytics and data
accuracy on operational performance: a contingent resource based theory (RBT) perspective.
Decis. Support Syst. 59, 119–126 (2014)
50. Bennett, L.M., Gadlin, H., Marchand, C.: Collaboration Team Science: Field Guide. US
Department of Health & Human Services, National Institutes of Health … (2018)
51. Leitner, K.-H.: The future of innovation: hyper innovation, slow innovation, and no
innovation. BHM Berg-und Hüttenmännische Monatshefte 162(9), 386–388 (2017)
The Impact of Innovation Management
in SMEs Performance: A Systematic Review

Fatema Al Suwaidi¹, Muhammad Alshurideh¹,², Barween Al Kurdi³, and Said A. Salloum⁴

¹ University of Sharjah, Sharjah, UAE
{malshurideh,ssalloum}@sharjah.ac.ae
² Faculty of Business, University of Jordan, Amman, Jordan
³ Amman Arab University, Amman, Jordan
⁴ Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE

Abstract. This paper presents a method of evaluating innovation in small and medium enterprises; the main purpose of the article is to examine how innovation affects business performance. Innovation is an important factor that produces quality and improves competitiveness. This study presents a comprehensive analysis of 17 research articles from 2015 to 2019. Questionnaires, surveys, and case studies were the primary methods of data collection and were undertaken in Pakistan, Mexico, Kuwait, Indonesia, and Italy, among other countries. To that end, the findings in the studies offer an investigation of the circumstances that lead SMEs to adopt innovation to enhance their performance.

Keywords: Small & medium enterprises · Innovation · Systematic review

1 Introduction

Small and medium enterprises (SMEs) are labeled the backbone of a growing economy since they lead countries toward profitability [1–6]. The curiosity of many researchers, such as [7–12], has been triggered to identify the secret behind SMEs' success stories and how they manage to survive against conglomerate companies. The innovation paradigm is our main focus in measuring the performance of SMEs, given the fact that innovation is part of our human traits: we enjoy being creative and collecting praise for coming up with good ideas or adding value to what we do, either in our personal or professional life [13]. The best way to measure and trace innovation in SMEs is when a company is naturally required to change in different areas of its business activities, and the process of change leads to the appearance and disappearance of objects within the company, resulting in a change in the level of innovation [11, 12, 14, 15]. The review addresses the following three research questions:

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 720–730, 2021.
https://doi.org/10.1007/978-3-030-58669-0_64
• RQ1: What are the main research methods in the collected studies?
• RQ2: What is the distribution of countries in the selected studies?
• RQ3: What are the database sources used to collect the studies?

2 Methods

To present new research and beneficially contribute to the available literature, this study adopts the systematic review method used by many scholars, such as [16–19]. This study was conducted through many steps. Initially, our task was to read and fully assess what fellow researchers have published on the same topic by dedicating time and effort to revising their findings. In doing so, we started a rigorous literature review with a similar framework in mind to determine how other researchers' insights and results could be used as the stepping-stone for writing this paper. We relied on many criteria and followed specific systematic-review guidelines to find the most recent and suitable articles that would add to our knowledge in the area of our research, with a focus on adding credibility when presenting our research findings [16, 20–23]. We established the precise inclusion criteria mentioned in Table 1, which produced a wide spectrum of literature tackling innovation performance, particularly in the SME segment, which was the context of our paper, when searched in the different database search engines.

Table 1. Inclusion and exclusion criteria.

Inclusion criteria | Exclusion criteria
Should involve innovation management | Papers that do not involve innovation management
Should involve SMEs | Papers that don't involve SMEs
Should be written in the English language | Papers that use a language other than English
Should be published between 2015 and 2019 | Papers published before 2015

2.1 Inclusion/Exclusion Criteria


The most crucial step in conducting high-quality research is to create an articulated study question that acts as a guideline for refining a more focused question for the study; that is why we used the PICO frame model to locate the relevant published studies, as clearly described in Table 2.
Table 2. PICO frame model.


PICO
Population Small and medium enterprises
Intervention Innovation management performance
Comparison Traditional management
Outcome The impact of innovation management

2.2 Data Sources and Research Strategies


Starting with simple keywords, we broadened our knowledge about innovation in SMEs' performance and were able to find and collect interesting articles in different databases such as Science Direct, ProQuest One Academic, ABI/INFORM Global, and Wiley. We then started to narrow down the search by relying on specific keywords to bring us closer to what others have presented on our topic of interest, to enable us to build on their contribution and bring a new perspective or develop a new outlook on the examined topic. The first attempt was ("Small and medium enterprises" AND "innovation"), which resulted in 8,048 papers but didn't give us the exact feel of what we were searching for; afterward, we added more focus on measuring innovation capacity by including it in the search ("Small and medium enterprises" AND "Innovation capacity"), which returned 690 papers but took us far from the purpose of the study. Finally, we reached our targeted articles by concentrating on focal keywords that identified exactly what we were seeking to measure: ("SMEs performance" AND "innovation management"). The initial result was 155 articles, as described in Table 3. The crucial and difficult process was the filtration applied in several steps to the collected studies to screen out and exclude low-quality papers that add no contribution to our study: the first step was to eliminate the 38 duplicated articles, and step two was a thorough review based on the abstract and body of work of the remaining articles, which decreased the total count to 17 articles that not only met the inclusion criteria but measured a variety of variables affecting SME performance when it comes to innovation management, as illustrated in Fig. 1.

Table 3. Data sources and databases.

Keyword search | Science Direct | ProQuest One Academic | ABI/INFORM Global | Wiley | Frequency
"Small and medium enterprises" AND "innovation" | 1,492 | 1,871 | 3,170 | 1,515 | 8,048
"Small and medium enterprises" AND "Innovation capacity" | 367 | 172 | 126 | 25 | 690
"SMEs performance" AND "innovation management" | 23 | 27 | 97 | 8 | 155
Fig. 1. Flow chart of the systematic review selection process.

2.3 Quality Assessment


Selecting and assessing the quality of the studies involved two phases: first, an initial screening focused mainly on the title and abstract against the inclusion criteria and, second, a rigorous screening of the full papers. To assess the 17 articles against the inclusion and exclusion criteria, we used the Data Quality Assessment (DQA) checklist of questions to estimate the impact of each study, as seen in Table 4. Table 5 measures the quality of the articles by scoring them on a three-point scale: if the answer is "YES" the score is 1 point, "NO" scores 0 points, and "PARTIALLY" scores 0.5 points, so each study can score between 0 and 8; the higher the score, the greater the degree to which the study addresses the research questions, as illustrated in Table 5.
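The three-point DQA scale can be applied mechanically: each checklist answer maps to 1, 0.5, or 0, and the percentage reported in Table 5 is the total score divided by the maximum of 8. Below is a minimal sketch of that calculation; the function name is illustrative, and the example answers are taken from study S1 in Table 5.

```python
# Minimal sketch of the DQA scoring used for Table 5 (illustrative only).

SCALE = {"YES": 1.0, "PARTIALLY": 0.5, "NO": 0.0}

def dqa_score(answers, max_score=8):
    """Score a study against the checklist and return (points, percentage)."""
    total = sum(SCALE[a] for a in answers)
    percentage = int(100 * total / max_score + 0.5)  # round half up
    return total, percentage

# Study S1 from Table 5: 1, 0.5, 0, 0.5, 1, 1, 1, 0
s1_answers = ["YES", "PARTIALLY", "NO", "PARTIALLY", "YES", "YES", "YES", "NO"]
print(dqa_score(s1_answers))  # (5.0, 63), matching the 5 points / 63% row for S1
```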

Table 4. Quality assessment checklist.


# Question
1 Are the research aims clearly specified?
2 Was the study designed to achieve these aims?
3 Are the variables considered by the study clearly specified?
4 Is the study context/discipline clearly specified?
5 Are the data collection methods adequately detailed?
6 Does the study explain the reliability/validity of the measures?
7 Are the statistical techniques used to analyze the data adequately described?
8 Do the results add to the literature?
9 Does the study add to your knowledge or understanding?
Table 5. Quality assessment results


Study Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Total Percentage
S1 1 0.5 0 0.5 1 1 1 0 5 63%
S2 1 1 0.5 1 0 0.5 1 1 6 75%
S3 0.5 1 0.5 0.5 1 0 0.5 1 4.5 56%
S4 0.5 0 0.5 1 0.5 1 0 0.5 4 50%
S5 1 0.5 1 1 1 1 1 0.5 7 86%
S6 1 1 1 1 1 1 0 0.5 6.5 81%
S7 0.5 0.5 1 1 0.5 0.5 1 1 6 75%
S8 1 1 1 0.5 1 1 0.5 1 7 86%
S9 1 1 1 1 0 1 1 0.5 6.5 81%
S10 1 1 0.5 0.5 1 0.5 0.5 1 6 75%
S11 1 1 0 1 0.5 1 1 0.5 6 75%
S12 0 0.5 1 1 0.5 0.5 0.5 1 5 63%
S13 1 1 1 0.5 0.5 1 1 0.5 6.5 81%
S14 1 1 1 1 1 1 1 1 8 100%
S15 1 1 1 0.5 0.5 0 1 1 6 75%
S16 1 1 1 1 1 1 1 1 8 100%
S17 1 1 1 0.5 0 1 1 1 6.5 81%

2.4 Data Coding and Analysis


The attributes relating to our research methodology quality were coded to display the structure of the analyzed articles as (a) data collection methods (e.g., questionnaire, survey, interview, case studies, etc.), described in Fig. 2, and (b) country, as seen in Fig. 3.

Fig. 2. Distribution of studies by research methods (Survey 59%, Interview 17%, Case study 6%, Not specified 18%).


Fig. 3. Distribution of studies in terms of country (countries shown include China, Kuwait, Indonesia, Portugal, Italy, the United Kingdom, and Finland, among others).

3 Result and Discussion

After careful classification of the external factors in each of the 17 published articles (2015 to 2019) in Tables 6 and 7, we observed how the articles tackle the innovation concept in SMEs from different interpretations: some focus on the effect of internal factors such as leadership and training, others on external factors such as financial performance and risk.
The findings revolve around the results of the systematic review, organized according to the three research questions.
– RQ1: What are the main research methods in the collected studies?
Figure 2 indicates that 59% of the analyzed studies depended mainly on surveys (n = 10).
– RQ2: What is the distribution of countries in the selected studies?
We distributed the collected articles across the countries in which these studies were conducted; Fig. 3 indicates how SME innovation variables were examined across countries around the world, including Italy, Malaysia, Pakistan, Brazil, Kuwait, Mexico, and others.
– RQ3: What are the database sources used to collect the studies?
In terms of databases, Tables 6 and 7 illustrate the process of obtaining the published studies through 4 different database sources (Science Direct, ProQuest One Academic, ABI/INFORM Global, and Wiley) over the last 5 years, with an emphasis on keywords examining the relations between SMEs and innovation performance.
Table 6. Classification of innovation performance in SMEs in terms of external variables: Financial performance, Technological capability, Training and development, Leadership, and Risk management (classified sources: [24–40]).
Table 7. Analysis of innovation performance in SMEs across the analyzed 17 papers.

Author/year | Country | Context | Dependent factors | Data collection method | Sample size
[24] | Finland | SMEs | Management of continuous innovation | Quantitative | 2,400 SMEs
[26] | Pakistan | SMEs | Managerial capabilities impact on innovation | Qualitative (survey) | 210 firms
[25] | EU member states and Norway | SMEs | Innovation in SMEs | Quantitative (interviews) | 27 companies
[29] | Mexico | SMEs | Business performance | Qualitative (survey) | 415 SMEs
[27] | Korea | SMEs | The ratio of export/sales volume in 2017 | Quantitative | 1,200 firms
[41] | Italy | SMEs | Internationalization of SMEs | Quantitative | 161 Mexican SMEs
[30] | Portugal | Exporting SMEs | International performance of SMEs | Quantitative (pre-test interviews) | 120 exporting SMEs
[31] | Pakistan | SMEs | Organization innovation | Quantitative (cross-sectional survey) | 239 SMEs
[32] | Indonesia | SMEs | Innovation power | Quantitative (questionnaire) | 84 SMEs
[33] | Brazil | SMEs | Internationalization process of firms | Quantitative (survey using a structured questionnaire) | 112 Brazilian industrial SMEs
[34] | Kuwait | SMEs | Organization performance | Quantitative (survey using a structured questionnaire) | 500 CEOs
[35] | Malaysia | SMEs | Innovation culture | Quantitative (questionnaire-based survey) | 140
[36] | UK & Italy | SMEs | Innovation process | Qualitative (two case studies) | 2
[37] | Mexico | SMEs | Innovation capabilities | Qualitative (questionnaire survey) | 308
[38] | China | SMEs | Innovation performance | Qualitative (survey) | 206
[39] | Pakistan | SMEs | Relationship between HPWS and innovation performance | Qualitative (self-administered questionnaire) | 237
[40] | Malaysia | SMEs | Effect of innovation adoption on the performance | Quantitative (interviews) | 360
4 Conclusion

This systematic review assessment with regard to knowledge management (KM) and firm performance (FP) yields many findings. First, many factors are repeated across the research, such as culture, structure, strategy, technology, intellectual capital, leadership, process, and rewards; most of these repeated elements are located within the organization and the firm, which lights the path for future study of the internal drivers that affect KM and FP. Second, most of the articles showed a positive relationship between knowledge management and firm performance; there is a cumulative connection between KM and FP, so excellent knowledge management in the firm is accompanied by top firm performance. Third, the survey was found to be the primary data collection method in most of the research (78%). Fourth, 87.8% of the articles showed positive research outcomes, 7.3% negative, and 4% N/A. Several limitations are mentioned in this systematic review. First, many articles highlighted limitations in the theoretical part as well as in the conceptual model, which needed more constructs. Second, many articles focused on specific contexts and countries, which makes it very difficult to generalize the studies. A third limitation is that sample sizes were small and low reliabilities were reported in many articles. Fourth, some of the studies did not take into consideration other possible mediators and moderators, such as trust, policy, rewards, etc. Finally, many biases appeared in many articles.

References
1. Alshurideh, M.T., Assad, N.F.: Financial reporting quality, audit quality, and investment
efficiency: evidence from gcc economies. WAFFEN-UND Kostumkd. J. 11(3), 194–208
(2020)
2. Abu Zayyad, H.M., Obeidat, Z.M., Alshurideh, M.T., Abuhashesh, M., Maqableh, M.,
Masa’deh, R.: Corporate social responsibility and patronage intentions: the mediating effect
of brand credibility. J. Mark. Commun. 1–24 (2020)
3. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Ayadeh, I.: An investigation of
factors affecting patients waiting time in primary health care centers: an assessment study in
Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
4. Mehmood, T., Alzoubi, H., Alshurideh, M., Ahmed, G., Al-Gasaymeh, A.: Schumpeterian
entrepreneurship theory: evolution and relevance. Acad. Entrep. J. 25(4), 1–10 (2019)
5. AlShurideh, M., Alsharari, N.M., Al Kurdi, B.: Supply chain integration and customer
relationship management in the airline logistics. Theor. Econ. Lett. 9(02), 392–414 (2019)
6. Alshurideh, et al.: Determinants of pro-environmental behavior in the context of emerging
economies. Int. J. Sustain. Soc. 11(4), 257–277 (2019)
7. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of e-
service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and e-
trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
8. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84–95 (2012)
9. Alshurideh, M., Masa’deh, R., Al Kurdi, B.: The effect of customer satisfaction upon
customer retention in the Jordanian mobile market: an empirical investigation. Eur. J. Econ.
Finance Adm. Sci. 47(12), 69–78 (2012)
10. Alkalha, Z., Al-Zu’bi, Z., Al-Dmour, H., Alshurideh, M., Masa’deh, R.: Investigating the
effects of human resource policies on organizational performance: an empirical study on
commercial banks operating in Jordan. Eur. J. Econ. Finance Adm. Sci. 51(1), 44–64 (2012)
11. Shannak, R., Masa’deh, R., Al-Zu’bi, Z., Obeidat, B., Alshurideh, M., Altamony, H.: A
theoretical perspective on the relationship between knowledge management systems,
customer knowledge management, and firm competitive advantage. Eur. J. Soc. Sci. 32(4),
520–532 (2012)
12. Altamony, H., Alshurideh, M., Obeidat, B.: Information systems for competitive advantage:
implementation of an organisational strategic management process. In: Proceedings of the
18th IBIMA Conference on Innovation and Sustainable Economic Competitive Advantage:
From Regional Development to World Economic, Istanbul, Turkey, 9–10 May 2012 (2012)
13. McCallion, J.: Innovation, innovation, innovation. IT Pro, Retrieved (2017)
14. Kaczmarska, B., Gierulski, W.: Innovation map in the process of enterprise evaluation. In:
Key Engineering Materials, vol. 669, pp. 497–513 (2016)
15. Zu’bi, Z., Al-Lozi, M., Dahiyat, S., Alshurideh, M., Al Majali, A.: Examining the effects of
quality management practices on product variety. Eur. J. Econ. Finance Adm. Sci. 51(1),
123–139 (2012)
16. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
17. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
18. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
19. Alshurideh, M.T., Assad, N.F.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
20. Alhashmi, S.F.S., Salloum, S.A., Mhamdi, C.: Implementing artificial intelligence in the
United Arab Emirates healthcare sector: an extended technology acceptance model. Int.
J. Inf. Technol. Lang. Stud. 3(3), 27–42 (2019)
21. Alhashmi, S.F.S., Salloum, S.A., Abdallah, S.: Critical success factors for implementing
artificial intelligence (AI) projects in Dubai government United Arab Emirates (UAE) health
sector: applying the extended technology acceptance model (TAM), vol. 1058 (2020)
22. Salloum, S.A., Alhamad, A.Q.M., Al-Emran, M., Monem, A.A., Shaalan, K.: Exploring
students’ acceptance of e-learning through the development of a comprehensive technology
acceptance model. IEEE Access 7, 128445–128462 (2019)
23. Salloum, S.A.S., Shaalan, K.: Investigating students’ acceptance of e-learning system in
higher educational environments in the UAE: applying the extended technology acceptance
model (TAM). The British University in Dubai (2018)
24. Saunila, M.: Managing continuous innovation through performance measurement. Compet.
Rev. Int. Bus. J. 27(2), 179–190 (2017)
25. Segarra-Ciprés, M., Bou-Llusar, J.C.: External knowledge search for innovation: the role of
firms’ innovation strategy and industry context. J. Knowl. Manag. 22(2), 280–298 (2018)
26. Ali, Z., Sun, H., Ali, M.: The impact of managerial and adaptive capabilities to stimulate
organizational innovation in SMEs: a complementary PLS–SEM approach. Sustainability 9
(12), 2157 (2017)
27. Yoo, W.-J., Choo, H.H., Lee, S.J.: A study on the sustainable growth of SMEs: the
mediating role of organizational metacognition. Sustainability 10(8), 2829 (2018)
28. Santoro, G., Vrontis, D., Thrassou, A., Dezi, L.: The Internet of Things: building a
knowledge management system for open innovation and knowledge management capacity.
Technol. Forecast. Soc. Change 136, 347–354 (2018)
29. Soto-Acosta, P., Popa, S., Martinez-Conesa, I.: Information technology, knowledge
management and environmental dynamism as drivers of innovation ambidexterity: a study
in SMEs. J. Knowl. Manag. 22(4), 824–849 (2018)
30. Prange, C., Pinho, J.C.: How personal and organizational drivers impact on SME
international performance: the mediating role of organizational innovation. Int. Bus. Rev.
26(6), 1114–1123 (2017)
31. Rasheed, M.A., Shahzad, K., Conroy, C., Nadeem, S., Siddique, M.U.: Exploring the role of
employee voice between high-performance work system and organizational innovation in
small and medium enterprises. J. Small Bus. Enterp. 24(4), 670–688 (2017)
32. Ismanu, S., Kusmintarti, A.: Innovation and firm performance of small and medium
enterprises. Rev. Integr. Bus. Econ. Res. 8, 312 (2019)
33. Oura, M.M., Zilber, S.N., Lopes, E.L.: Innovation capacity, international experience and
export performance of SMEs in Brazil. Int. Bus. Rev. 25(4), 921–932 (2016)
34. Sawaean, F., Ali, K.: The impact of entrepreneurial leadership and learning orientation on
organizational performance of SMEs: the mediating role of innovation capacity. Manag. Sci.
Lett. 10(2), 369–380 (2020)
35. Hanifah, H., Halim, H.A., Ahmad, N.H., Vafaei-Zadeh, A.: Can internal factors improve
innovation performance via innovation culture in SMEs? Benchmarking Int. J. 27(1), 382–
405 (2019)
36. Usai, A., Scuotto, V., Murray, A., Fiano, F., Dezi, L.: Do entrepreneurial knowledge and
innovative attitude overcome ‘imperfections’ in the innovation process? Insights from SMEs
in the UK and Italy. J. Knowl. Manag. 22(8), 1637–1654 (2018)
37. dos Santos, B.C.P., Torres, M.F., Durán-Sánchez, A., Maldonado-Erazo, C.P.: Tourism
regions of mainland Portugal and its position on the social network Facebook. Tour.
Hosp. Int. J. 8(2), 114–139 (2017)
38. Zhou, Q., Fang, G., Yang, W., Wu, Y., Ren, L.: The performance effect of micro-innovation
in SMEs: evidence from China. Chinese Manag. Stud. 11(1), 123–138 (2017)
39. Shahzad, K., Arenius, P., Muller, A., Rasheed, M.A., Bajwa, S.U.: Unpacking the
relationship between high-performance work systems and innovation performance in SMEs.
Pers. Rev. 48(4), 977–1000 (2019)
40. Al Mamun, A.: Diffusion of innovation among Malaysian manufacturing SMEs. Eur.
J. Innov. Manag. 21(1), 113–141 (2018)
41. Santoro, G., Mazzoleni, A., Quaglia, R., Solima, L.: Does age matter? The impact of SMEs
age on the relationship between knowledge sourcing strategy and internationalization.
J. Bus. Res. (2019)
The Effect of Digital Transformation
on Product Innovation: A Critical Review

Jasim Almaazmi¹, Muhammad Alshurideh¹,², Barween Al Kurdi³, and Said A. Salloum⁴

¹ University of Sharjah, Sharjah, UAE
malshurideh@sharjah.ac.ae
² Faculty of Business, University of Jordan, Amman, Jordan
³ Amman Arab University, Amman, Jordan
⁴ Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. The first part of this review highlights a brief update on digital transformation, which is considered essential to the survival of companies. Digital technologies bring major changes in the culture, people, business processes, and business models of organizations. A wide range of organizations have either completed their digital transformation journey or are on the way to achieving it, and all of these organizations are expecting positive, critical changes in the way business is conducted. This research aims to test and provide an overview of the impact of digital transformation on an organization's innovation capabilities. The researchers believe that the impact and results of post-digitalization are not addressed by academia as widely as pre-digitalization. A systematic review approach is selected to test what new research is discussing and highlighting on the subject of innovation in the digital era. For this purpose, different databases were searched with the related keywords, and a new model of relationships is proposed.

Keywords: Post-digital transformation · Innovation in the digital world · Digital innovation

1 Introduction

According to IDC, worldwide spending on digital transformation will reach $2.3 trillion in 2023 [1]. Digital transformation has become a necessary trend for sustaining the business and has been adopted across all industries, including the banking, healthcare, automotive, telecommunications, and manufacturing sectors. It enables innovation practices and new value creation as well as new business models. Digital transformation has become a high priority on leadership agendas, with nearly 90% of business leaders in the U.S. and U.K. expecting IT and digital technologies to make an increasing strategic contribution to their overall business in the coming decade. The question is no longer when companies need to make digital transformation a strategic priority (this tipping point has passed) but how to embrace it and use it as a competitive advantage [2].

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 731–741, 2021.
https://doi.org/10.1007/978-3-030-58669-0_65
732 J. Almaazmi et al.

Digital transformation goes beyond technology transfer; it is a broad managerial undertaking that addresses business efficiency, human resources, and the redesign of business processes.
Digital transformation turns companies into technology companies and changes the landscape of marketing, client engagement, and commerce. Several reasons justify an organization going through digital transformation. The acceleration of change in the market requires new digital solutions to maintain a position of industry leadership. Digitally born startups are disrupting the market, and companies are under pressure to compete digitally and change their business models. Customer expectations also increase with the entrance of new players into the market. The next challenge for the organization is then to sustain its success through innovation as the main differentiator from competitors.
Research Questions: This research aims to answer the question of how the key attributes of digital transformation impact an organization's innovation, especially in terms of new product creation, and what role different variables play in this capability.
• Research question 1: How does digital transformation impact product innovation?
• Research question 2: How does leadership impact product innovation after a successful digital transformation?
• Research question 3: How does work culture impact product innovation after a successful digital transformation?
• Research question 4: How do digital skills impact product innovation after a successful digital transformation?
• Research question 5: How do digital business processes impact product innovation after a successful digital transformation?
• Research question 6: How does technology impact product innovation after a successful digital transformation?

Research Objectives: This research aims to answer the question of how the key attributes of digital transformation impact an organization's innovation, especially in terms of new product creation, and what effect the different variables have on this capability. This objective was selected because of the limited number of studies in this domain identified during the article selection stage.
Research Importance: This research covers the area of post-digital transformation and addresses the importance of product innovation as one of the major expected outcomes of the digital transformation adopted by organizations. The research conducted above shows that academia has made only a limited contribution in this area.

2 Literature Review

Digital transformation is no longer a choice for organizations; the world is full of new business models that are disrupting every previously stable industry [3–6]. A lot of organizations have started and completed their transformation journey, while others are still on the way. To understand the size and value of the transformation, let us establish a common understanding of the subject. The guiding questions are: What is digital transformation? Why is it needed, and what is its expected value?
Digital transformation is not just about implementing more and better technologies. It involves aligning the organization with the demands of the digital environment by increasing its appetite for risk, investing in digital opportunities for employees, and streamlining organizational structures for agility. Only then can the organization move from doing digital to being digital [7]. Companies on their digital journey need to recognize that digital is more than a technology to implement. Instead, digital requires a corporate shift in mindset that companies of all sizes and sectors need to embrace to position themselves to compete successfully now and in the future [8]. Digital leaders need to empower employees, display relationship orientation (empowering employees without overburdening them), and teach employees how to work efficiently in virtual, self-organizing, or cross-organizational teams. Organizational culture should be transformed into a culture of involvement, in which decisions are taken together; a culture of innovation, which ensures agility based on the acceptance of suggestions; and a culture of training, in which staff are constantly developed [9]. Accelerated digitalization, in parallel with the transformation of business models, could add millions in revenues to economic growth, attract additional international investment, and increase international competitiveness [10]. Digital transformation needs to come from the top, and companies should designate a specific executive or executive committee to spearhead efforts. Companies should take small steps, via pilots and skunkworks, and invest in the ones that work [11]. Agility and open innovation are considered essential factors for maintaining competitiveness and, ultimately, for the survival of a company [12]. The ultimate goal of digital transformation is to change the existing situation and to create new production processes, new products, and new markets. Digital transformation provides the organization with competitive advantages [13–15]. Moreover, the capability of any organization to be agile in product creation and innovation may add a competitive advantage compared to its competitors and create new value and perhaps new markets to work in [16–20].

3 Methods

The main objective of this research is to explore the effect of post-digital transformation through a literature review. The research used a systematic literature review, following the protocol of [21]. It is a rigorous approach for searching, analyzing, and collating all relevant empirical evidence. Applied to a given domain, it allows a complete interpretation of research results, including gaps in the research. The systematic literature review follows the steps shown in Fig. 1 below:

Fig. 1. Systematic literature review method.

Starting with a literature review to build a foundation of knowledge is an important first step in this research study. It is then supported by a systematic review of almost 30 articles from different journals. For this exercise, multiple databases were used with different keywords. The review was undertaken in distinct stages: the identification of inclusion and exclusion criteria, data sources and search strategies, quality assessment, and data analysis. These stages are further detailed in the following sections.

3.1 Inclusion/Exclusion Criteria


The inclusion criteria for this systematic review are as follows:
1. Date: published in the period 2012 to 2020
2. Language: articles in English
3. Type of studies: conference papers and proceedings, magazines, newspapers, and scholarly journals
4. Study design: meta-analyses and controlled studies
5. Measurement: studies that measure the effect of digital transformation on the organization
6. Outcome: areas and dimensions of impact
7. Context: studies related to digital transformation, its implementation in different organizations, and digital innovation
A brief screening sketch covering the first three of these criteria is given below.
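As a rough illustration only, the following sketch shows how the mechanical criteria (1–3) above could be applied as a screening filter. The record structure, field names, and the helper itself are assumptions introduced here for illustration; they are not part of the reported methodology, and the judgment-based criteria (4–7) are not modeled.

# Minimal sketch of a screening filter for inclusion criteria 1-3 (date range,
# English language, publication type). The record fields and helper name are
# assumptions for illustration, not the authors' actual procedure.

ALLOWED_TYPES = {"conference paper", "proceeding", "magazine", "newspaper", "scholarly journal"}

def meets_inclusion_criteria(record: dict) -> bool:
    """Check criteria 1-3: published 2012-2020, written in English, allowed type."""
    return (
        2012 <= record.get("year", 0) <= 2020
        and record.get("language", "").lower() == "english"
        and record.get("type", "").lower() in ALLOWED_TYPES
    )

# Hypothetical record used only to exercise the filter:
print(meets_inclusion_criteria({"year": 2019, "language": "English", "type": "Scholarly Journal"}))  # True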

3.2 Data Sources and Research Strategies


This study adopts the systematic review technique, a method used by many scholars such as [5, 22–24]. The method aims to go through a set of research databases and select the articles most relevant to the topic at hand. The ProQuest Central database, Google Scholar, and Scopus were used to search for the key terms, and the following generic search filters were applied during the search:
1. Conference papers and proceedings, magazines, newspapers, and scholarly journals
2. Published in the period 2012 to 2020
3. Articles in English
4. Document types: article, case study, conference, conference paper, conference proceeding, essay, letter to the editor, literature review, reference document, report, review
The search was conducted using combinations of search terms such as "Digital Transformation AND Innovation", "Digital Innovation", and "Organizational Transformation AND Digital Transformation", as sketched below.
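Purely as an illustration of how such a search can be organized (the paper does not report any tooling, so the query-string format, the filter dictionary, and the helper name below are assumptions), the following sketch enumerates the database/keyword combinations and the shared filters described above:

# Illustrative bookkeeping sketch only. The databases, date range, and keyword
# combinations come from the text; the query syntax varies by database, so the
# strings and helper below are assumptions, not the authors' tooling.

DATABASES = ["ProQuest Central", "Google Scholar", "Scopus"]

KEYWORD_COMBINATIONS = [
    '"Digital Transformation" AND "Innovation"',
    '"Digital Innovation"',
    '"Organizational Transformation" AND "Digital Transformation"',
]

GENERIC_FILTERS = {
    "years": (2012, 2020),    # inclusion criterion 1
    "language": "English",    # inclusion criterion 2
}

def build_queries():
    """Enumerate every (database, keyword combination) pair with the shared filters."""
    for db in DATABASES:
        for terms in KEYWORD_COMBINATIONS:
            yield {"database": db, "terms": terms, **GENERIC_FILTERS}

if __name__ == "__main__":
    for q in build_queries():
        print(q)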

Searches on multiple databases and search engines were carried out with proper tuning, focusing the search on the title and abstract. The final result is presented in Table 1 below:

Table 1. Number of articles per search engine.


Search engine Number of articles
ProQuest 152
Google Scholar 288
Scopus 128

These articles then went through another quick scan of their abstracts to identify those related to this research, and only 25 articles were selected. During the review of these articles, 4 more were added to the list from the references of the main articles, so 29 articles were finally selected for this research. Figure 2 below illustrates the selection steps:

Fig. 2. Step of search and selection.

3.3 Quality Assessment


A manual quality assessment of the selected articles was carried out during the filtration stage of the article search by simply validating the content of each article and how closely it relates to the subject. In addition, a quality assessment checklist with question-based criteria was developed and used to appraise the quality of the research articles, as seen in Table 2 and Table 3. The checklist was adapted from the one suggested by [21].

Table 2. Quality assessment questions


# Question
1 Are the research aims specified?
2 Was the study designed to achieve these aims?
3 Are the variables considered by the study specified?
4 Is the study context/discipline specified?
5 Are the data collection methods adequately detailed?
6 Does the study explain the reliability/validity of the measures?
7 Are the statistical techniques used to analyze the data adequately described?
8 Do the results add to the literature?
9 Does the study add to your knowledge or understanding?

Each question was scored 0, 0.5, or 1, and all of the selected articles passed the quality assessment with a minimum score of 68.75%, as shown in Table 3 below (a small sketch of the scoring scheme follows the table):

Table 3. Quality assessment results.


Article Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Total Percentage
[25] 1 1 1 1 1 1 1 1 8 100%
[15] 1 1 1 0 0.5 0.5 1 1 6 75.00%
[26] 1 1 1 0 0.5 1 0.5 1 6 75.00%
[27] 1 1 1 1 1 1 1 1 8 100%
[28] 1 1 1 0 0.5 1 0.5 1 6 75%
[9] 1 1 1 1 1 1 1 1 8 100%
[29] 1 1 1 1 1 0.5 1 1 7.5 93.75%
[30] 1 1 1 1 1 1 1 1 8 100%
[31] 1 1 1 1 1 0.5 0.5 1 7 87.50%
[32] 1 1 1 1 1 1 1 1 8 100%
[2] 1 1 1 1 1 1 1 1 8 100%
[33] 1 1 1 1 1 0.5 0.5 1 7 87.50%
[12] 1 1 1 0 0.5 1 1 1 7.5 93.75%
[34] 1 1 1 0 0.5 1 1 1 6.5 81.25%
[35] 1 1 1 1 1 0.5 0.5 1 7 87.50%
[33] 1 1 1 1 1 1 1 1 8 100%
[36] 1 1 1 1 1 1 1 1 8 100%
[10] 1 1 1 0 0.5 0.5 0.5 1 5.5 68.75%
[37] 1 1 1 1 1 1 1 1 8 100%
[38] 1 1 1 1 1 1 1 1 8 100%
[39] 1 1 1 0 0.5 1 0.5 1 6 75%
[40] 1 1 1 1 1 0.5 1 1 7.5 93.75%
[8] 1 1 1 1 1 1 1 1 8 100%
[41] 1 1 1 0 0.5 1 1 1 6.5 81.25%
[42] 1 1 1 0 0.5 0.5 0.5 1 5.5 68.75%
[43] 1 1 1 0 0.5 1 1 1 6.5 81.25%
[7] 1 1 1 1 1 1 0.5 1 7.5 93.75%
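As a minimal sketch of the scoring scheme just described (not the authors' actual tooling), the helper below scores one article's checklist answers: each question gets 0, 0.5, or 1, and the percentage is the total over the maximum of 8 points for the eight questions scored in Table 3.

# Minimal sketch of the quality-assessment scoring, assuming the 0 / 0.5 / 1
# scheme described above. The helper and example scores are illustrative only.

from typing import Dict, List

def appraise(scores: List[float], max_per_question: float = 1.0) -> Dict[str, float]:
    """Return the total score and percentage for one article's checklist answers."""
    if any(s not in (0.0, 0.5, 1.0) for s in scores):
        raise ValueError("each question must be scored 0, 0.5, or 1")
    total = sum(scores)
    percentage = 100.0 * total / (max_per_question * len(scores))
    return {"total": total, "percentage": percentage}

# Hypothetical example reproducing one row pattern from Table 3:
# six full scores and two half scores give 7/8 = 87.5%.
print(appraise([1, 1, 1, 1, 1, 0.5, 0.5, 1]))   # {'total': 7.0, 'percentage': 87.5}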

4 Results

The following subsections summarize the outcomes of the study of the 29 selected articles.

4.1 Country-Wise Distribution


The selected articles were conducted across various regions of the world, as seen in Fig. 3; seven articles did not specify the region and eight articles were conducted globally. The figure below reflects the regions of the selected articles and highlights that digital transformation is a global trend attracting global attention.

Fig. 3. Country-wise distribution of articles

4.2 Industry-Wise Distribution


The industries targeted by the selected articles cover a wide mix; however, 11 articles did not specify the industry type. Figure 4 below presents the industries of focus for the selected articles.

Fig. 4. Industry-wise distribution of articles

4.3 Sample Size Used


When it comes to sample size, the research split the sizes into five groups, as seen in Fig. 5: not specified, 1–9, 10–99, 100–999, and 1000–9999. Ten articles did not specify the sample size. Further information is illustrated in the following graph.

Fig. 5. Sample size frequency



5 Conclusion

The reviewed articles all agree that digital transformation is much more than a technology upgrade; some studies even discuss transformation without technology changes. At the same time, all articles agree that business model changes, the acquisition of new skills, new business processes, work culture, and digital leadership are the main pillars of digital transformation. However, the outcome of these pillars after the transformation and their impact on organizational performance are yet to be analyzed. In other words, the subject of post-transformation requires further study, as the majority of studies and research focus on the pre-digital transformation and during-transformation stages. During our research, we could not find a study that tests and evaluates the outcome of digital transformation and the improvements achieved in these organizations in a systematic way. Hence, the researchers recommend that academics give attention to this domain in further studies.

References
1. IDC: Worldwide Spending on Digital Transformation Will Reach $2.3 Trillion in 2023,
More Than Half of All ICT Spending, According to a New IDC Spending Guide (2019)
2. Hess, T., Matt, C., Benlian, A., Wiesböck, F.: Options for formulating a digital
transformation strategy. MIS Q. Exec. 15(2), 123–139 (2016)
3. Salloum, S.A., Al-Emran, M., Shaalan, K.: The impact of knowledge sharing on information
systems: a review. In: 13th International Conference, KMO (2018)
4. Alhashmi, S.F.S., Salloum, S.A., Abdallah, S.: Critical success factors for implementing
artificial intelligence (AI) Projects in Dubai Government United Arab Emirates
(UAE) Health Sector: Applying the Extended Technology Acceptance Model (TAM), vol.
1058 (2020)
5. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
6. Zainal, A.Y., Yousuf, H., Salloum, S.A.: Dimensions of agility capabilities organizational
competitiveness in sustaining. In: Joint European-US Workshop on Applications of
Invariance in Computer Vision, pp. 762–772 (2020)
7. Kiron, D., Kane, G.C., Palmer, D., Phillips, A.N., Buckley, N.: Aligning the organization for
its digital future. MIT Sloan Manag. Rev. 58(1), 1–17 (2016)
8. Kane, G.C., Palmer, D., Phillips, A.N., Kiron, D.: Is your business ready for a digital future?
MIT Sloan Manag. Rev. 56(4), 37 (2015)
9. Schwarzmüller, T., Brosi, P., Duman, D., Welpe, I.M.: How does the digital transformation
affect organizations? Key themes of change in work design and leadership. mrev Manag.
Rev. 29(2), 114–138 (2018)
10. Yanovska, V., Levchenko, O., Tvoronovych, V., Bozhok, A.: Digital transformation of the
Ukrainian economy: digitization and transformation of business models. In: SHS Web of
Conferences, vol. 67, p. 5003 (2019)
11. Woo, S.-R., et al.: STING-dependent cytosolic DNA sensing mediates innate immune
recognition of immunogenic tumors. Immunity 41(5), 830–842 (2014)

12. Burchardt, C., Maisch, B.: Digitalization needs a cultural change–examples of applying
agility and Open Innovation to drive the digital transformation. Procedia CIRP 84, 112–117
(2019)
13. Shannak, R., Masa’deh, R., Al-Zu’bi, Z., Obeidat, B., Alshurideh, M., Altamony, H.: A
theoretical perspective on the relationship between knowledge management systems,
customer knowledge management, and firm competitive advantage. Eur. J. Soc. Sci. 32(4),
520–532 (2012)
14. Altamony, H., Alshurideh, M., Obeidat, B.: Information systems for competitive advantage:
Implementation of an organisational strategic management process. In: Proceedings of the
18th IBIMA Conference on Innovation and Sustainable Economic Competitive Advantage:
From Regional Development to World Economic, Istanbul, Turkey, 9th–10th May 2012
15. Lozic, J.: Core concept of business transformation: from business digitization to business
digital transformation. In: Economic and Social Development (Book of Proceedings), 48th
International Scientific Conference on Economic and Social Development, vol. 1, no. m3,
p. 159 (2019)
16. ELSamen, A., Alshurideh, M.: The impact of internal marketing on internal service quality: a
case study in a Jordanian pharmaceutical company. Int. J. Bus. Manag. 7(19), 84 (2012)
17. Alkalha, Z., Al-Zu’bi, Z., Al-Dmour, H., Alshurideh, M., Masa’deh, R.: Investigating the
effects of human resource policies on organizational performance: An empirical study on
commercial banks operating in Jordan. Eur. J. Econ. Finan. Adm. Sci. 51(1), 44–64 (2012)
18. Alshurideh, M.T., et al.: The impact of Islamic Bank’s service quality perception on
Jordanian customer’s loyalty. J. Manag. Res. 9, 139–159 (2017)
19. Ashurideh, M.: Customer service retention–A behavioural perspective of the UK mobile
market, Durham University (2010)
20. Nedal Fawzi Assad, M.T.A.: Financial reporting quality, audit quality, and investment
efficiency: evidence from GCC economies. WAFFEN-UND Kostumkd. J. 11(3), 194–208
(2020)
21. Kitchenham, B., Charters, S.: Guidelines for performing systematic literature reviews in software engineering. Softw. Eng. Group Sch. Comput. Sci. Math. Keele Univ., 1–57 (2007)
22. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
23. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
24. Nedal Fawzi Assad, M.T.A.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
25. Berman, S.J.: Digital transformation: opportunities to create new business models. Strateg.
Leadersh. 40(2) (2012)
26. Shahin, I.: Speaker verification in emotional talking environments based on three-stage
framework. In: 2017 International Conference on Electrical and Computing Technologies
and Applications (ICECTA), pp. 1–5 (2017)
27. Cha, K.J., Hwang, T., Gregor, S.: An integrative model of IT-enabled organizational
transformation. Manag. Decis. 53(8) (2015)
28. Kotarba, M.: Digital transformation of business models. Found. Manag. 10(1), 123–142
(2018)
29. Ivančić, L., Vukšić, V.B., Spremić, M.: Mastering the digital transformation process:
business practices and lessons learned. Technol. Innov. Manag. Rev. 9(2), 36–50 (2019)
30. Bouwman, H., Nikou, S., Molina-Castillo, F.J., de Reuver, M.: The impact of digitalization
on business models. Digit. Policy Regul. Gov. 20(2) (2018)

31. Berman, S., Marshall, A.: The next digital transformation: from an individual-centered to an
everyone-to-everyone economy. Strateg. Leadersh. (2014)
32. Caruso, L.: Digital innovation and the fourth industrial revolution: epochal social changes?
Ai Soc. 33(3), 379–392 (2018)
33. Boskovic, A., Primorac, D., Kozina, G.: DIGITAL Organizations and digital transformation.
Econ. Soc. Dev. B. Proc. 263–269 (2019)
34. Kazim, F.A.B.: Digital transformation and leadership style: a multiple case study. ISM J. Int.
Bus. 3(1), 24–33 (2019)
35. Fitzgerald, M., Kruschwitz, N., Bonnet, D., Welch, M.: Embracing digital technology: A
new strategic imperative. MIT Sloan Manag. Rev. 55(2), 1 (2014)
36. Pelletier, C., Cloutier, L.M.: Conceptualising digital transformation in SMEs: an ecosystemic
perspective. J. Small Bus. Enterp. Dev. (2019)
37. Mhlungu, N.S.M., Chen, J.Y.J., Alkema, P.: The underlying factors of a successful
organisational digital transformation. South African J. Inf. Manag. 21(1), 1–10 (2019)
38. Müller, J.M., Traub, J., Gantner, P., Veile, J.W., Voigt, K.-I.: Managing digital disruption of
business models in Industry 4.0. In: ISPIM Conference Proceedings, pp. 1–19 (2018)
39. Michelle Kirk IGP, C.R.M.: Be the change: driving digital transformation in your
organization. Inf. Manag. 52(3), 44–46 (2018)
40. Weill, P., Woerner, S.L.: Is your company ready for a digital future? MIT Sloan Manag.
Rev. 59(2), 21–25 (2018)
41. Doherty, E., Carcary, M., Conway, G., Crowley, C.: Customer experience management
(CXM)–development of a conceptual model for the digital organization. In: ECISM 2017
11th European Conference on Information Systems Management, p. 103 (2017)
42. Delmond, M.-H., Coelho, F., Keravel, A., Mahl, R.: How information systems enable digital
transformation: a focus on business models and value co‐production. In: HEC Paris Res.
Pap. No. MOSI-2016-1161 (2016)
43. Shrivastava, S.: Digital disruption is redefining the customer experience: the digital
transformation approach of the communications service providers. Telecom Bus. Rev. 10(1),
41 (2017)
Women Empowerment in UAE: A Systematic Review

Asma Omran Al Khayyal1, Muhammad Alshurideh1,2(✉), Barween Al Kurdi3, and Said A. Salloum4

1 University of Sharjah, Sharjah, UAE
malshurideh@sharjah.ac.ae
2 Faculty of Business, University of Jordan, Amman, Jordan
3 Amman Arab University, Amman, Jordan
4 Research Institute of Sciences and Engineering, University of Sharjah, Sharjah, UAE
ssalloum@sharjah.ac.ae

Abstract. Women in the UAE have long been recognized as equal partners in national development, and many efforts have been made to empower women in the cultural, social, and economic fields. This systematic review is distinctive in that it synthesizes research on the influence of different factors affecting women's leadership and entrepreneurship positions in the UAE. It aims to investigate the key factors affecting female entrepreneurs and leaders and to develop a conceptual model of the principal social and cultural factors driving the success of Emirati women in attaining senior leadership roles or becoming leading entrepreneurs. Moreover, this study proposes a set of new directions for future research to address the current dearth of empirical work on women's leadership in the UAE, despite all the empowering programs and initiatives. It includes articles using quantitative and qualitative methods from different databases such as ProQuest and EBSCO. The study reviews the databases and selects high-quality articles to be reviewed using a set of inclusion and exclusion criteria; 23 articles were chosen after removing duplicates. From a practical perspective, this study is important as it summarizes and evaluates the factors affecting women's empowerment in leadership roles and offers a model for the large number of women empowerment drivers within the UAE setting.

Keywords: Women · Empowerment · Leadership · Entrepreneurship · UAE · Impacts · Challenges

1 Introduction

Women’s empowerment has been identified with many different perspectives, con-
cepts, interventions, and consequences. Literature offers different definitions of
empowerment and it seems to be no universally accepted definition, due to the vari-
ations in the cultural context that affect how empowerment may occur. Nonetheless,
much of the research agrees that Women empowerment refers to increasing the political,
social, educational, or economic strength of individuals and communities of women.

It has become a significant topic of discussion with regard to development and economics. Nations, businesses, and communities can benefit from the implementation of programs and policies that adopt the notion of women's empowerment.
The United Nations outlined five key components of empowerment:
1. Women’s sense of self-worth
2. Women’s right to have and to determine choices
3. Women’s right to have access to opportunities and resources
4. Women’s right to have the power to control their own lives, both within and outside
the home
5. Women’s ability to influence the direction of social change to create more just
social and economic order, nationally and internationally.
A growing number of successful female entrepreneurs, one aspect of women's empowerment, are becoming a powerful segment in the context of developing countries.

1.1 Research Country Context


The United Arab Emirates, often referred to as the UAE, was formed as a federation in 1971 out of seven individual emirates. It is located on the eastern side of
the Arabian Peninsula, at the entrance to the Arabian Gulf. It has coastlines on the
Oman Gulf and the Arabian Gulf, with Saudi Arabia to the west and southwest, and
Oman to the southeast and Musandam’s eastern tip. Arabic is the official language and
Islam is the official religion in the country. The UAE has experienced unprecedented
economic growth and urban development, thanks to significant oil revenues over the
past three decades [1–4]. The population of the UAE estimated at 9.89 million in 2019
(69.12% male, 30.88% female). UAE in the last few decades made significant progress
in the fields of women’s empowerment, having a GII (Gender Inequality Index) value
of 0.113, ranking it 26 out of 162 countries in the 2018 index. Women hold 22.5% of
parliamentary seats and 78.8% of adult women have at least secondary education,
contrasted with 65.7% of their male counterparts (UNDP Human Development Report
2019). Despite the initiatives of the UAE government to empower women and the
growing percentage of educated females, the level of female entrepreneurship in the
UAE remains very low compared to other countries with similar GDP levels. The UAE
provides a fertile and interesting context for this research, given that within the small
but growing research on women empowerment in the Arab world, there is little data on
women entrepreneurs in the UAE. To better understand whether the overall environment in the UAE, which includes all policies, regulations, and social, cultural, and religious aspects regarding women's empowerment, acts as a barrier to successful female entrepreneurial activity or is supportive of it, we have conducted a review of the available literature.

1.2 Research Objectives


The primary objective of this review was to examine the impact of the local envi-
ronment on women’s individual-level empowerment in the United Arab Emirates using

evidence from rigorous quantitative and qualitative evaluations. The secondary objective was to examine the perspectives of female participants on their experiences of
empowerment as a result of the new governmental efforts in the UAE using evidence
from high-quality evaluations. We conducted an integrated mixed-methods systematic
review that examined data generated through both quantitative and qualitative research
methods.

1.3 Research Question


This study aims to answer the following question:
RQ1: What factors affect female leadership and entrepreneurial activities in the
United Arab Emirates? Do we have a supportive environment?

2 Literature Review

There is an abundance of literature available on women as entrepreneurs, the challenges they face, and the opportunities they possess around the world. However, very limited literature is available on women entrepreneurs in the UAE. This study is based on research that involved searching academic online databases and using the Google Scholar search engine to source material. It evaluated the scope, quality, and accuracy of the publications used in this paper as well as their validity and importance [5–7]. The bulk of the references were peer-reviewed articles, and most sources were from peer-reviewed journals. Broadly, the conceptual dimensions of empowerment commonly found throughout the literature include the psychological [8, 9], social [10, 11], cultural [12, 13], and legal dimensions [12, 14–16].
The authors in [12] confirmed the significant influence of the national government, the Islamic work ethic, and the family on women's empowerment. They highlighted that, remarkably, the problems faced by Emirati working women are distinct from those in the rest of the region, and that the Emirati perspective is relevant to the international literature because it advances the concept of women's empowerment research. In [9], the authors' study showed that despite the different governmental initiatives in the UAE, female entrepreneurship is not a popular option among Emirati women. They described the enablers in the UAE context and found that spotting market trends and customer needs was rated by Emirati female entrepreneurs as the primary enabler of entrepreneurship, followed by the growth of management skills and sustainable competitive advantage. The authors of [8] suggested that personal, environmental, and government support factors affect the success of women-owned SMEs in the UAE positively and significantly. In [13], the author's study showed that women's advancement was measured and affected by an eight-factor model comprising economic needs, job efficiency, type and practices of leadership, marital status, social needs, organizational commitment, organizational satisfaction, and public policy. In her research, [17] examined the challenges faced by women entrepreneurs in the UAE, the effect of cultural values on these obstacles, and the growth of women's enterprises. She emphasized that cultural values and contextual factors explain the low rates of women's entrepreneurship in the UAE and the growth of their businesses.

In the study done by [11], the authors interviewed 16 women who did not seem to experience any tension between their entrepreneurial life and their personal, family, social, leisure, and friendship lives. Though satisfied with being in business, these women faced certain barriers when starting their ventures, arising mainly from a lack of support, social structure and traditions, and family and personal reasons. The authors in [18] described the main reasons for the under-representation of Emirati women in the information technology (IT) sector of the United Arab Emirates (UAE), together with the obstacles and challenges that national women have personally experienced while working in this sector of the national economy. The study showed that although national women have made considerable advances into nearly all careers and occupations in recent years, they are still notably underrepresented in IT, especially in the private sector, and few of them hold senior-level positions. The results showed that cultural and family considerations tend to discourage many young Emiratis from pursuing occupations in this field, and that negative gendered behavioral stereotypes regarding women are still prominent in the local IT industry. Based on [19], who examined how local Emirati women navigate participation in the workforce, the main challenges identified relate to social norms, family, and personal aspects. The authors in [20] described three trends that affect women's careers: the effect of family on professional lives, individual attitudes towards professional preparation, and career development in the workforce.
In [21], the authors' study discussed the new regulations set by the United Arab Emirates to support women in entrepreneurial ventures, as it attempts to involve all its citizens in economic and social development. They reviewed the main areas that affect the progress of UAE female entrepreneurship: governmental efforts to improve female entrepreneurship; the socio-cultural realities that restrict women in business ventures; the impact of the UAE's highly collectivistic culture on women's business networking; and, eventually, UAE women's motivation for entrepreneurship. The authors in [22] described and discussed the obstacles and barriers faced by Arab women in the United Arab Emirates (UAE) in their professions, considering the UAE government's significant efforts to encourage women in leadership. They reported that the main obstacles included adverse laws, work/family tensions, and a lack of support and entrepreneurial opportunities. The authors in [23] reported that although female Emirati students have little opportunity to develop essential leadership skills during childhood and youth, they enjoy being challenged. The research addressed how important challenges can be, as challenging difficulties teach them as much as, if not more than, positive scenarios. The authors in [24] defined a set of variables that affect female entrepreneurship, including the gender of the owner, private sources of funds, external sources of funds, use of technology, business expenses, number of weekly hours worked by the owner, outsourcing or subcontracting, business age, and the number of family members assisting the owner in the management of the business. The authors in [25] provided an overview of a group of Dubai business students who were optimistic about the role that universities can play in fostering their involvement in entrepreneurship, both through their education and as an incubator for their new ventures. The authors in [26] raised the question of whether gender matters in the UAE and examined whether there is a difference in entrepreneurial intentions between male and female students in the UAE.

3 Method

This study applied the systematic review approach, a method used by many scholars such as [27–30]. The method employs a critical literature review, which is an important step before conducting any research study: it establishes the groundwork for knowledge accumulation, enables theories to be extended and developed, identifies gaps in current knowledge, and uncovers areas that previous research has missed. A literature review can be considered a systematic literature review only when it is based on explicit research questions, identifies and analyzes relevant research studies, and evaluates their quality against specified criteria [31]. We followed the conduct and reporting standards for systematic reviews of social and economic interventions set out by the Campbell Collaboration [32], including the development and publication of a protocol with pre-determined inclusion criteria and an analysis plan, which was published in the Campbell Collaboration Library [33]. This systematic review includes clear inclusion and exclusion criteria, an explicit search strategy, systematic coding and analysis of the included studies, and a quality assessment. The details of these stages are described in the following sub-sections.

3.1 Search Terms


Table 1 shows the main terms and synonyms used in the research step.

Table 1. Terms and synonyms


Term Synonyms
Women Female, feminize
Empowerment Leadership, entrepreneurship, management
UAE United Arab Emirates, Dubai, Abu Dhabi, Sharjah
Impacts Factors, challenges, barrier, motivation, enablers

3.2 Inclusion/Exclusion Criteria


The articles that will be critically analyzed in this review study should meet the
inclusion and exclusion criteria described in Table 2.

Table 2. Inclusion/Exclusion criteria


Inclusion criteria Exclusion criteria
Should involve “Women” “empowerment” “leadership Articles specialized in health,
or entrepreneurship” education
Should be written in the English language Papers that use languages other
than English
Should be published in the period from 2000–2019 articles published before the year
2000
Should be Peer-reviewed articles Non-peer reviewed
Should be studying the UAE context Other countries than the UAE

Full details about the data extraction forms and critical appraisal tools are available
in the full systematic review in the Campbell Library [34].

3.3 Initial Search Result


Further search strategies using ancestry review of all cited references in included
articles, which add 12 more articles. In total 114 articles were considered as seen in
Table 3.

Table 3. Initial search result.


Database Key terms Result Relevant
ProQuest ab(women) AND ab(leadership) AND ab(UAE) 4 1
ProQuest ab(women) AND ab(leadership) AND ab(United Arab 5 1
Emirates)
ProQuest ab(women) AND ab(entrepreneurship) AND ab(UAE) 4 0
ProQuest ab(women) AND ab(entrepreneurship) AND ab(United 8 1
Arab Emirates)
ProQuest ab(women) AND ab(entrepreneur) AND ab(UAE) 6 2
EBSCO (women or female or woman or females) And (leadership or leader or entrepreneurship or entrepreneur) And (UAE or the United Arab Emirates or Dubai or Abu Dhabi) And (challenges or barriers or difficulties or issues or problems or limitations) 59 6
Total 93 11

Fig. 1. Systematic review process.

3.4 Search Strategy


The studies included in this systematic literature review were collected through a broad search of the following online academic databases and search engines: ProQuest, EBSCO, and Google Scholar. The search terms included the keywords ("Women" AND "Empowerment") and ("Women" AND "leadership OR entrepreneurship") in the "UAE" context. Our search found 93 articles using the aforementioned keywords. We assessed the scope, content, and accuracy, as well as the authority and relevance, of the articles, reports, and web-based material used in this paper, following guidance for finding, retrieving, and evaluating journal and web-based information. Moreover, we filtered out 82 articles that we found to be duplicated after full-text reviews of the relevant articles. Thus, the total number of collected papers became 11, and their distribution according to the databases they belong to is presented in Table 3. For each study, the researchers confirmed the inclusion and exclusion criteria. Further search strategies using an ancestry review of all cited references in the included articles added 12 more articles. Overall, 23 research articles met the inclusion criteria and were used in the analysis process (a short bookkeeping sketch of this selection flow is given at the end of this subsection). Figure 1 illustrates the systematic review process and the number of articles determined at each stage. To minimize bias in the study inclusion criteria, this study follows guidelines for the

preparation of review protocols by the Campbell Collaboration as well as PRISMA


guidelines for systematic reviews [35]. However, the majority of studies were cross-
sectional and non-experimental
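The selection flow described above can be tallied with a small bookkeeping sketch. The figures below are taken directly from the text (93 retrieved, 82 removed as duplicates, 12 added by the ancestry review); the helper itself is only an illustration, not part of the reported methodology.

# Illustrative bookkeeping of the selection flow; not the authors' procedure.

def selection_flow(retrieved: int, removed_duplicates: int, ancestry_added: int) -> dict:
    """Return the article counts at each stage of the screening process."""
    after_screening = retrieved - removed_duplicates
    included = after_screening + ancestry_added
    return {
        "retrieved": retrieved,
        "after_duplicate_removal": after_screening,
        "added_by_ancestry_review": ancestry_added,
        "included": included,
    }

print(selection_flow(retrieved=93, removed_duplicates=82, ancestry_added=12))
# {'retrieved': 93, 'after_duplicate_removal': 11, 'added_by_ancestry_review': 12, 'included': 23}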

3.5 Data Coding and Analysis


The characteristics related to the quality of the research methodology were coded, including (a) the main affecting factors, (b) research methods (e.g., survey, interviews, experiment, etc.), (c) publication year, (d) participants, and (e) database.

4 Result

Concerning the 23 published research studies on the main factors influencing women's empowerment in the UAE from 2000 to 2019, the findings of this systematic review are reported based on the following research questions.

RQ1: What are the main factors influencing women empowerment in UAE?
This study classifies the factors affecting women's leadership and entrepreneurship positions in the UAE across the analyzed studies to determine which factors are the most frequent. Table 4 shows the most frequently used factors.

Table 4. Frequency of the factors across the analyzed studies (S1–S23).

Government regulations and laws: 13
Technology/financial resources availability: 8
Skills/training/education: 13
Social-family support: 21
Gender inequality/discrimination: 0
Cultural factors: 9
Religious factors: 4
Personal traits: 15
Political situation: 0
National economy development: 4
Balancing work with life responsibilities: 4
Role model existence: 2

Fig. 2. Affecting factors.

Table 4 shows the factors most frequently found to affect women's empowerment in the UAE. We can notice that social-family support is the most frequently studied factor (N = 21), followed by personal traits (N = 15), government regulations and laws and skills/training/education (N = 13 each), cultural factors (N = 9), technology/financial resources availability (N = 8), religious factors, national economic development, and balancing work with life responsibilities (N = 4 each), and role model existence (N = 2), while none of the studies mentioned the gender discrimination factor or the political situation, although these are considered global factors affecting women's empowerment, as seen in Fig. 2.
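The tally behind Table 4 can be reproduced with a few lines of code. The sketch below is illustrative only: the per-study factor lists are hypothetical placeholders (the real mapping is the S1–S23 matrix in the paper), and only the counting logic itself is shown.

# Minimal sketch of the frequency tally behind Table 4; the study-to-factor
# mapping below is a hypothetical placeholder, not the paper's data.

from collections import Counter

study_factors = {
    "S1": ["social-family support", "personal traits", "cultural factors"],
    "S2": ["government regulations and laws", "social-family support"],
    # ... one entry per analyzed study (S3-S23)
}

def factor_frequencies(mapping: dict) -> Counter:
    """Count how many studies mention each factor (each factor counts once per study)."""
    counts = Counter()
    for factors in mapping.values():
        counts.update(set(factors))
    return counts

for factor, n in factor_frequencies(study_factors).most_common():
    print(f"{factor}: N = {n}")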

Fig. 3. Main research methods.



According to the earlier results, we propose the following hypotheses:

H1. There is a significant direct relationship between women's empowerment and personal factors.
H2. Support from government authorities/HR departments/other women's associations is positively related to women's empowerment.
H3. Environmental factors are positively related to women's empowerment.
H4. Environmental factors are positively related to women's empowerment.

RQ2: What are the main research methods addressed in the collected studies?
Figure 3 indicates that 43% of the analyzed studies mainly depended on interviews (N = 10) for data collection; this is followed by questionnaires (N = 6), surveys (N = 5), and focus groups (N = 2).

Fig. 4. Types of participants.

RQ3: What are the types of participants in the collected studies?
Figure 4 shows the distribution of the analyzed studies in terms of participant types. We can observe that women empowerment studies in the UAE primarily focused on entrepreneurs (N = 10). This is followed by studies that focused on leaders and managers (N = 6) and on student participants (N = 5), while 2 studies were done with unspecified levels of employees. We can notice that the majority of studies focused on female opinions rather than male ones.
RQ4: How are the women empowerment studies distributed across years of publication?

Fig. 5. Articles' publication year (2004: 1, 2005–2007: 0, 2008: 1, 2009: 2, 2010: 3, 2011: 3, 2012: 1, 2013: 1, 2014: 3, 2015: 1, 2016: 2, 2017: 5).

Concerning the publication year, Fig. 5 describes the distribution of the analyzed articles over the years in which they were published, ranging from 2000 to 2019. The number of published articles fluctuated over the years: in 2010, 2011, and 2014, three studies each met the researchers' criteria. Furthermore, there has been a remarkable increase in published articles since 2015. It is worth noting that the number of published articles in 2017 was the largest, which could reflect the increasing awareness in this field.

Fig. 6. Database used.



RQ5: What are the active databases in the context of women empowerment in the UAE?
This section is dedicated to determining the most active databases publishing studies related to women's empowerment in the UAE. Figure 6 shows the distribution of the collected studies in terms of databases. ProQuest is the most productive database, with 17 published articles, followed by Google Scholar (N = 4), and Google and EBSCO with one study each.

5 Conclusion

Twelve factors affecting women's empowerment in the leadership and entrepreneurship world were identified, and contextual relationships between them were established using the frequency model. While significant improvements have been made in empowering Emirati women, who represent 59% of the total government workforce (as announced statistically in 2013), further personal, organizational, and social development initiatives are needed to make entrepreneurship a more attractive option for Emirati women. Hopefully, the findings of the study will help legislators and organizations formulate strategies toward a better business environment for female entrepreneurs in the UAE.

References
1. Nedal Fawzi Assad, M.T.A.: Financial reporting quality, audit quality, and investment
efficiency: evidence from GCC economies. WAFFEN-UND Kostumkd. J. 11(3), 194–208
(2020)
2. Aburayya, A., Alshurideh, M., Albqaeen, A., Alawadhi, D., Ayadeh, I.: An investigation of
factors affecting patients waiting time in primary health care centers: an assessment study in
Dubai. Manag. Sci. Lett. 10(6), 1265–1276 (2020)
3. AlShurideh, M., Alsharari, N.M., Al Kurdi, B.: Supply chain integration and customer
relationship management in the airline logistics. Theor. Econ. Lett. 9(02), 392–414 (2019)
4. Al-Jarrah, I., Al-Zu’bi, M.F., Jaara, O., Alshurideh, M.: Evaluating the impact of financial
development on economic growth in Jordan. Int. Res. J. Financ. Econ. 94, 123–139 (2012)
5. Ashurideh, M.: Customer service retention–a behavioral perspective of the UK mobile
market. Durham University (2010)
6. Kurdi: Healthy-food choice and purchasing behaviour analysis: an exploratory study of
families in the UK. Durham University (2016)
7. Al-dweeri, R., Obeidat, Z., Al-dwiry, M., Alshurideh, M., Alhorani, A.: The impact of e-
service quality and e-loyalty on online shopping: moderating effect of e-satisfaction and e-
trust. Int. J. Mark. Stud. 9(2), 92–103 (2017)
8. Gupta, N., Mirchandani, A.: Investigating entrepreneurial success factors of women-owned
SMEs in UAE. Manag. Decis. 56(1), 219–232 (2018)
9. Jabeen, F., Faisal, M.N.: Imperatives for improving entrepreneurial behavior among females
in the UAE. Gend. Manag. Int. J. 33(3), 234–252 (2018)
10. Tlaiss, H.A.: Women managers in the United Arab Emirates: successful careers or what?
Equal. Divers. Incl. Int. J. 32(8), 756 (2013)

11. Itani, H., Sidani, Y.M., Baalbaki, I.: United Arab Emirates female entrepreneurs: motivations
and frustrations. Equal. Divers. Incl. Int. J. 30(5), 409–424 (2011)
12. Shaya, N., Khait, R.A.: Feminizing leadership in the Middle East. Gend. Manag. Int. J. 32
(8), 590–608 (2017)
13. Yaghi, A.: Is it the human resource policy to blame? Gend. Manag. Int. J. 31(7), 479–495
(2016)
14. Hurlbert, J.S., Beggs, J.J., Haines, V.A.: Social networks and social capital in extreme
environments. In: Social Capital, Routledge, pp. 209–231 (2017)
15. Alshurideh, M.: Is customer retention beneficial for customers: A conceptual background.
J. Res. Mark. 5(3), 382–389 (2016)
16. Ghannajeh, A., et al.: A qualitative analysis of product innovation in Jordan’s pharmaceu-
tical sector. Eur. Sci. J. 11(4), 474–503 (2015)
17. Tlaiss, H.A.: Women’s entrepreneurship, barriers and culture: insights from the United Arab
Emirates. J. Entrep. 23(2), 289–320 (2014)
18. Al Marzouqi, A.H., Forster, N.: An exploratory study of the under‐representation of Emirate
women in the United Arab Emirates’ information technology sector. Equal. Divers. Incl. Int.
J. 30(7), 544–562 (2011)
19. Marmenout, K., Lirio, P.: Local female talent retention in the Gulf: Emirati women bending
with the wind. Int. J. Hum. Resour. Manag. 25(2), 144–166 (2014)
20. Kemp, L.J., Zhao, F.: Influences of cultural orientations on Emirati women’s careers. Pers.
Rev. 45(5), 988–1009 (2016)
21. Goby, V.P., Erogul, M.S.: Female entrepreneurship in the United Arab Emirates: legislative
encouragements and cultural constraints. Women’s Stud. Int. Forum 34(4), 329–334 (2011)
22. Miller, K., Kyriazi, T., Paris, C.M.: Arab women employment in the United Arab Emirates:
exploring opportunities, motivations and challenges. Int. J. Sustain. Soc. 9(1), 20–40 (2017)
23. Madsen, S.R.: Transformational learning experiences of female UAE college students. Educ.
Bus. Soc. Contemp. Middle East Issues 2(1), 20–31 (2009)
24. Al Roomi, O., Ibrahim, M.: Performance determinants of home‐based businesses in Dubai.
J. Econ. Adm. Sci. 20(2), 61–82 (2004)
25. Gallant, M., Majumdar, S., Varadarajan, D.: Outlook of female students towards
entrepreneurship. Educ. Bus. Soc. Contemp. Middle East Issues 3(3), 1–12 (2010)
26. Majumdar, S., Varadarajan, D.: Students’ attitude towards entrepreneurship: does gender
matter in the UAE?. Foresight (2013)
27. Alhashmi, S.F.S., Alshurideh, M., Al Kurdi, B., Salloum, S.A.: A systematic review of the
factors affecting the artificial intelligence implementation in the health care sector. In: Joint
European-US Workshop on Applications of Invariance in Computer Vision, pp. 37–49
(2020)
28. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Mining in educational data: review
and future directions. In: Joint European-US Workshop on Applications of Invariance in
Computer Vision, pp. 92–102 (2020)
29. Salloum, S.A., Alshurideh, M., Elnagar, A., Shaalan, K.: Machine learning and deep
learning techniques for cybersecurity: a review. In: Joint European-US Workshop on
Applications of Invariance in Computer Vision, pp. 50–57 (2020)
30. Nedal Fawzi Assad, M.T.A.: Investment in context of financial reporting quality: a
systematic review. WAFFEN-UND Kostumkd. J. 11(3), 255–286 (2020)
31. Khan, K.S., Kunz, R., Kleijnen, J., Antes, G.: Five steps to conducting a systematic review.
J. R. Soc. Med. 96(3), 118–121 (2003)
32. Quinn, E., Huckel-Schneider, C., Campbell, D., Seale, H., Milat, A.J.: How can knowledge
exchange portals assist in knowledge management for evidence-informed decision making in
public health? BMC Public Health 14(1) (2014)

33. Brody, C., Dworkin, S., Dunbar, M., Murthy, P., Pascoe, L.: The effects of economic self-help group programs on women's empowerment: a systematic review (2013). Accessed 24 Oct 2015
34. Brody, C., et al.: Economic self-help group programs for improving women’s empowerment:
a systematic review. Campbell Syst. Rev. 11(1), 1–182 (2015)
35. Shlonsky, A., Noonan, E., Littell, J.H., Montgomery, P.: The role of systematic reviews and
the Campbell Collaboration in the realization of evidence-informed practice. Clin. Soc.
Work J. 39(4), 362–368 (2011)
Robotic, Control Design and Smart
Systems
Lyapunov-Based Control of a Teleoperation System in Presence of Time Delay

Mohamed Sallam1(✉), Ihab Saif1, Zakaria Saeed2, and Mohamed Fanni3

1 Helwan University, Cairo 11795, Egypt
sallam.mohamed@h-eng.helwan.edu.eg
2 Mechatronics Engineering Department, High Institute of Engineering, Giza, Egypt
3 Egypt-Japan University of Science and Technology, Alexandria 21934, Egypt

Abstract. Time delay can significantly degrade the performance of teleoperation systems or even render the system unstable. In the literature, a Lyapunov-like function was used to derive a condition that can guarantee the position tracking of a teleoperator in the presence of time delay. This paper extends our previous work to verify the correctness of such a tracking condition on a different system and in various cases. The simulation and experimental results show that in some cases, even when the condition is satisfied, position tracking is not achieved. The same Lyapunov-like function is used in this research to derive a new, verified tracking condition. The experimental setup consists of two identical Phantom Premium 1.5/6DOF devices.

1 Introduction

A teleoperation system increases human accuracy and allows procedures to be performed on a small scale, as in telesurgery. It consists of two manipulators that exchange information about their positions over a communication channel, which delays the signals. In 1989, Anderson et al. [1] pointed out that even a small time delay, like 40 ms, can make a teleoperation system unstable. Since then, several researchers have presented many approaches to control a teleoperation system in the presence of time delay, as in [8,9] and [10].
In 2009, Nuno et al. [6] used a Lyapunov-like function to derive a condition under which the position error of the teleoperator is bounded. Such conditions help in selecting the control gains that guarantee position tracking.
In 2013, we proved in [7] that the tracking condition obtained by Nuno et al. in [6] is not always correct, and we derived a new tracking condition. Only after we published our results in 2013 did Nuno et al. cite the new condition in their publications, as in remark 3 in [4], remark 5 in [3], proposition 2 in [5], and remark 2 in [2]. In none of their publications did they refer to their old condition.
In this paper, we extend our previous work [7] to show, by analysis, simulation, and experiments, that the tracking condition obtained by Nuno for the P-like

Fig. 1. Bilateral teleoperation system using two Phantoms.

controller is not always correct. Afterwards, the derivation and verification of a new tracking condition are presented. Two identical Phantom 1.5/6DOF devices are used to validate the obtained results experimentally (Fig. 1).

2 Dynamic Model of the Teleoperation System

The system is modeled as a pair of 2-DOF serial links as shown in Fig. 2.

M_l(q_l)\,\ddot{q}_l + C_l(q_l,\dot{q}_l)\,\dot{q}_l = \tau_l - \tau_h, \qquad M_r(q_r)\,\ddot{q}_r + C_r(q_r,\dot{q}_r)\,\dot{q}_r = \tau_e - \tau_r    (1)
where q̈l, q̇l, ql and q̈r, q̇r, qr are the accelerations, velocities, and positions of the local and remote manipulators, τl, τr are the control signals of the local and remote manipulators, τh, τe are the forces exerted by the human operator and by the environment interaction, Ml(ql), Mr(qr) are the inertia matrices of the local and remote manipulators, and Cl(ql, q̇l), Cr(qr, q̇r) are the Coriolis and centrifugal force matrices. In the P-like control scheme, the local manipulator sends its position to the remote manipulator as a signal and vice versa, and the torque on each manipulator is proportional to the position error plus a damping injection term, Fig. 3.

Fig. 2. Two 2-link manipulators representing the teleoperation system.



Fig. 3. P-like controller for bilateral teleoperation system.

The control laws for the local and remote manipulators are given by Eq. 2, where
Kl and Kr are the proportional gains for local and remote manipulators, and Bl
and Br are the damping injection terms.

\tau_l = K_l\left[q_r(t - T_r(t)) - q_l\right] - B_l\,\dot{q}_l, \qquad \tau_r = K_r\left[q_r - q_l(t - T_l(t))\right] + B_r\,\dot{q}_r    (2)
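To make the wiring of the delayed P-like law in Eq. 2 concrete, the following is a minimal numerical sketch. It replaces each manipulator with a single-DOF inertia with viscous friction (not the 2-DOF model of Eq. 1 or the Phantom dynamics); the masses, friction coefficients, delays, and operator-torque profile are assumed values, and the gains are borrowed from the first gain set used later in Sect. 4. The signs follow Eqs. 1 and 2 as written.

import numpy as np

# Simplified sketch of the delayed P-like scheme of Eq. 2 on 1-DOF stand-ins.
# Masses, friction, delays, and the operator torque are assumed values; the
# gains (Kl = Kr = 1, Bl = Br = sqrt(3)/2) are borrowed from Sect. 4.

dt, T_end = 1e-3, 5.0
steps = int(T_end / dt)
Tl = Tr = 0.05                                   # assumed one-way delays [s]
dl, dr = int(Tl / dt), int(Tr / dt)

Kl, Kr = 1.0, 1.0
Bl = Br = np.sqrt(3) / 2
ml, mr, bl, br = 0.1, 0.1, 0.02, 0.02            # assumed inertia / friction

ql = np.zeros(steps); qr = np.zeros(steps)
vl = np.zeros(steps); vr = np.zeros(steps)

def delayed(signal, k, d):
    """Value of a stored signal d samples ago (zero before any data exists)."""
    return signal[k - d] if k >= d else 0.0

for k in range(steps - 1):
    tau_h = 0.1 if k * dt < 1.0 else 0.0          # operator pushes for 1 s (assumed)
    tau_e = 0.0                                    # free motion at the remote side
    # P-like control law, Eq. 2, with delayed position exchange
    tau_l = Kl * (delayed(qr, k, dr) - ql[k]) - Bl * vl[k]
    tau_r = Kr * (qr[k] - delayed(ql, k, dl)) + Br * vr[k]
    # 1-DOF stand-ins for the dynamics of Eq. 1 (same sign conventions)
    al = (tau_l - tau_h - bl * vl[k]) / ml
    ar = (tau_e - tau_r - br * vr[k]) / mr
    vl[k + 1] = vl[k] + al * dt; ql[k + 1] = ql[k] + vl[k] * dt
    vr[k + 1] = vr[k] + ar * dt; qr[k + 1] = qr[k] + vr[k] * dt

print("final position error |ql - qr| =", abs(ql[-1] - qr[-1]))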

3 The Derivation of the Old Tracking Condition


Nuno et al. considered in [6] the following Lyapunov-like function V(qi , q̇i , t)
that has a clear energy interpretation to obtain a condition for stable tracking.
It contains the energy terms associated with the teleoperator components.
V = \frac{1}{2}\dot{q}_l^{T}M_l(q_l)\dot{q}_l + \frac{K_l}{2K_r}\dot{q}_r^{T}M_r(q_r)\dot{q}_r + \frac{K_l}{2}\lvert q_l - q_r\rvert^{2} + \int_{0}^{t}\Big(\dot{q}_l^{T}\tau_h - \frac{K_l}{K_r}\dot{q}_r^{T}\tau_e\Big)\,d\sigma + k_l + \frac{K_l}{K_r}k_r        (3)

where the first two terms are the kinetic energy of the local and remote robots, the third term is the energy of the controller, and the integral term is the energy of the human and environment.

The time derivative of V(qi, q̇i, t) is first obtained, and then, by integrating V̇(qi, q̇i, t) from 0 to t, one obtains V(t) − V(0). The system energy is bounded if V(t) − V(0) < 0, and therefore system stability is achieved. After a lengthy derivation, the following condition was obtained for system stability:

4 B_l B_r > \big({}^{*}T_l^{2} + {}^{*}T_r^{2}\big)\,K_l K_r        (4)

According to [6], setting the gains such that the inequality Eq. 4 is satisfied means that the velocities and the position error are bounded. In this paper, we refer to condition Eq. 4 as the old tracking condition, and to our condition derived in [7] as the new tracking condition.

4 Analytical Verification for the Old Tracking Condition


Through the derivation presented in [6] for the P-like controller, Nuno et al. proved that the Lyapunov-like function is bounded only if the control gains are selected such that λi > 0 and αi > 0 in the following two equations:
\lambda_l = B_l - \frac{K_l}{2}\Big(\alpha_l + \frac{{}^{*}T_l^{2}}{\alpha_r}\Big), \qquad \lambda_r = \frac{K_l B_r}{K_r} - \frac{K_l}{2}\Big(\alpha_r + \frac{{}^{*}T_r^{2}}{\alpha_l}\Big)

In other words, the gains holding the condition Eq. 4 must be a solution for the two inequalities Eq. 5 and Eq. 6, i.e., allow positive values for λl, λr, αl, and αr.

B_l - \frac{K_l}{2}\Big(\alpha_l + \frac{{}^{*}T_l^{2}}{\alpha_r}\Big) > 0        (5)

\frac{K_l B_r}{K_r} - \frac{K_l}{2}\Big(\alpha_r + \frac{{}^{*}T_r^{2}}{\alpha_l}\Big) > 0        (6)
Here is a first set of gains that satisfies the tracking condition Eq. 4 but is not a solution of the two inequalities Eq. 5 and Eq. 6:

Kl = 1, Kr = 1, Tl = 1, Tr = 1, Br = √3/2, Bl = √3/2

Substituting these gains into the two inequalities Eq. 5 and Eq. 6, one obtains

\frac{\sqrt{3}}{2} - \frac{1}{2}\Big(\alpha_l + \frac{1}{\alpha_r}\Big) > 0, \qquad \frac{\sqrt{3}}{2} - \frac{1}{2}\Big(\alpha_r + \frac{1}{\alpha_l}\Big) > 0
In order to solve these two inequalities graphically, both are plotted as shown in Fig. 4. If a solution of the two inequalities exists, the two groups of curves should intersect in at least one point.
As shown in Fig. 4, there is no intersection between the two groups of curves representing the two inequalities. Consequently, although the selected gains satisfy condition Eq. 4, there is no solution of the two inequalities Eq. 5 and Eq. 6 at those gains. This means that the boundedness of the Lyapunov-like function Eq. 3 is not guaranteed even if condition Eq. 4 is satisfied, which is strong evidence that condition Eq. 4 is incorrect.
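The graphical argument above can also be checked numerically. The following Python sketch performs a grid search over positive values of αl and αr and reports whether any pair makes both λl and λr positive; the grid range and resolution are arbitrary assumptions, and the script only illustrates the verification rather than reproducing the procedure used to generate Fig. 4.

```python
import numpy as np

def lambdas(Kl, Kr, Bl, Br, Tl, Tr, al, ar):
    """lambda_l and lambda_r from Eqs. 5 and 6 (al and ar may be numpy grids)."""
    lam_l = Bl - (Kl / 2.0) * (al + Tl**2 / ar)
    lam_r = (Kl * Br) / Kr - (Kl / 2.0) * (ar + Tr**2 / al)
    return lam_l, lam_r

def positive_solution_exists(Kl, Kr, Bl, Br, Tl, Tr, n=800, a_max=10.0):
    """Grid search for alpha_l, alpha_r > 0 that make both lambdas positive."""
    al = np.linspace(1e-3, a_max, n)[:, None]    # alpha_l along the rows
    ar = np.linspace(1e-3, a_max, n)[None, :]    # alpha_r along the columns
    lam_l, lam_r = lambdas(Kl, Kr, Bl, Br, Tl, Tr, al, ar)
    return bool(np.any((lam_l > 0) & (lam_r > 0)))

# First gain set of the text: satisfies Eq. 4 yet admits no positive (alpha_l, alpha_r).
print(positive_solution_exists(Kl=1, Kr=1, Bl=np.sqrt(3) / 2, Br=np.sqrt(3) / 2, Tl=1, Tr=1))
# Second gain set of the text: same outcome.
print(positive_solution_exists(Kl=1.3, Kr=0.4, Bl=0.31, Br=0.33, Tl=0.55, Tr=0.55))
```

Both calls print False, consistent with the absence of an intersection in the plotted curves.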
Here is another set of gains that satisfies condition Eq. 4 but not the two inequalities Eq. 5 and Eq. 6:

Kl = 1.3, Kr = 0.4, Bl = 0.31, Br = 0.33, *T(t) = 0.55

Substituting these gains into the two inequalities Eq. 5 and Eq. 6, one obtains

0.31 - \frac{1.3}{2}\Big(\alpha_l + \frac{0.55^{2}}{\alpha_r}\Big) > 0

\frac{1.3 \times 0.33}{0.4} - \frac{1.3}{2}\Big(\alpha_r + \frac{0.55^{2}}{\alpha_l}\Big) > 0

Fig. 4. Level curves representing the two inequalities at the first set of gains that satisfy
the old tracking condition Eq. 4.

Both inequalities are plotted in Fig. 5. As shown, there is no intersection between the two families of curves. Consequently, although the selected gains satisfy condition Eq. 4, there is no solution of the two inequalities at those gains. This confirms that the boundedness of the Lyapunov-like function Eq. 3 is not guaranteed even if condition Eq. 4 is satisfied.

Fig. 5. Level curves representing the two inequalities at the second set of gains that satisfy the old tracking condition Eq. 4.

5 Practical Verification for the Old Tracking Condition


First, using simulation, we check the response of the system in the presence of gains that satisfy the old tracking condition Eq. 4 but not the inequalities Eq. 5 and Eq. 6. This test is conducted using the same gains as in the analytical verification:

Kl = 1.3, Kr = 0.4, Bl = 0.31, Br = 0.33, *T(t) = 0.55

Figure 6 shows that position tracking is not achieved and the system becomes unstable. This test confirms the analytical results of the previous section.

Fig. 6. Simulation test - unstable tracking at gains satisfying the condition Eq. 4

In the experimental test (Fig. 7), a set of gains is selected such that the tracking condition Eq. 4 is satisfied. The local and remote manipulators are set to different initial positions. As the control algorithm starts to run, each manipulator receives the control action to track the position of the other manipulator. If condition Eq. 4 works properly, then tracking will be achieved. If the human does not move the manipulator and there is no interaction with the environment, the position error converges to zero. The gains in this test are

Kl = 2, Kr = 2, Br = 0.6, Bl = 0.6, *Tl = *Tr = 0.3

Figure 8 shows that tracking is not achieved although condition Eq. 4 holds.

Fig. 7. Real-time Simulink model of the teleoperation system using two Phantoms.

Fig. 8. Experiment result - unstable tracking at gains satisfying condition Eq. 4

6 Deriving a New Tracking Condition


The derivation of the new tracking condition, presented in this section, is similar
to the derivation presented in [6] up to the two inequalities Eq. 5 and Eq. 6. After
re-arranging the terms of Eq. 5, one obtains

\alpha_r > \frac{{}^{*}T_l^{2} K_l}{2B_l - K_l\alpha_l}        (7)

Similarly, after re-arranging the terms of Eq. 6, one obtains

\alpha_r < \frac{2B_r\alpha_l - K_r\,{}^{*}T_r^{2}}{K_r\alpha_l}        (8)

From Eq. 7 and Eq. 8, we can conclude the following inequality:

\frac{{}^{*}T_l^{2} K_l}{2B_l - K_l\alpha_l} < \frac{2B_r\alpha_l - K_r\,{}^{*}T_r^{2}}{K_r\alpha_l}        (9)

After manipulating the inequality Eq. 9, one obtains

\frac{\big(4B_l B_r + K_l K_r\,{}^{*}T_r^{2} - K_l K_r\,{}^{*}T_l^{2}\big)\,\alpha_l - 2B_r K_l\alpha_l^{2} - 2B_l K_r\,{}^{*}T_r^{2}}{K_r\alpha_l\,(2B_l - K_l\alpha_l)} > 0        (10)

The only condition that satisfies the inequality Eq. 10 is the following:

4 B_l B_r > K_l K_r\,\big({}^{*}T_r + {}^{*}T_l\big)^{2}        (11)
As such, setting the control gains to fulfill condition Eq. 11 ensures that V(t) < V(0); thus, the Lyapunov-like function Eq. 3 is bounded.
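As a quick illustration of the difference between Eq. 4 and Eq. 11, the following Python sketch evaluates both conditions for the gain sets used in this paper; it is a plain restatement of the two inequalities, with Tl and Tr standing for the delay bounds *Tl and *Tr.

```python
def old_condition(Kl, Kr, Bl, Br, Tl, Tr):
    """Old tracking condition, Eq. 4: 4*Bl*Br > (Tl**2 + Tr**2) * Kl * Kr."""
    return 4 * Bl * Br > (Tl**2 + Tr**2) * Kl * Kr

def new_condition(Kl, Kr, Bl, Br, Tl, Tr):
    """New tracking condition, Eq. 11: 4*Bl*Br > Kl * Kr * (Tl + Tr)**2."""
    return 4 * Bl * Br > Kl * Kr * (Tl + Tr) ** 2

# Gain sets discussed in this paper.
gain_sets = [
    dict(Kl=1.0, Kr=1.0, Bl=3 ** 0.5 / 2, Br=3 ** 0.5 / 2, Tl=1.0, Tr=1.0),
    dict(Kl=1.3, Kr=0.4, Bl=0.31, Br=0.33, Tl=0.55, Tr=0.55),
    dict(Kl=1.0, Kr=1.0, Bl=1.1, Br=1.1, Tl=1.0, Tr=1.0),
]
for g in gain_sets:
    print(g, "old:", old_condition(**g), "new:", new_condition(**g))
```

The first two sets satisfy the old condition but not the new one, while the third set satisfies both, matching the analysis in the next section.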

7 Analytical Verification for the New Tracking Condition
The new tracking condition Eq. 11 was derived for the first time in [7]. Since then, it has been used by Nuno et al. in many of their publications, as in remark 3 in [4], remark 5 in [3], proposition 2 in [5], and remark 2 in [2]. In addition, it is noticed that the two sets of gains (Kl = 1, Kr = 1, Tl = 1, Tr = 1, Br = √3/2, Bl = √3/2) and (Kl = 1.3, Kr = 0.4, Bl = 0.31, Br = 0.33, *T(t) = 0.55), which satisfied condition Eq. 4 but were not solutions of the two inequalities Eq. 5 and Eq. 6, do not satisfy the new condition Eq. 11. This is, in general, a promising indication of the correctness of the new condition.
In this first test, gains are selected such that the new condition Eq. 11 is satisfied:

Kl = 1, Kr = 1, Tl = 1, Tr = 1, Br = 1.1, Bl = 1.1

Substituting these gains into the two inequalities Eq. 5 and Eq. 6, one obtains

1.1 - \frac{1}{2}\Big(\alpha_l + \frac{1}{\alpha_r}\Big) > 0

1.1 - \frac{1}{2}\Big(\alpha_r + \frac{1}{\alpha_l}\Big) > 0
The two inequalities are solved graphically as shown in Fig. 9. If a solution of the two inequalities exists, the two groups of curves should intersect in at least one point. Since the two families of curves do intersect, there are positive values of αl and αr that produce positive values of λl and λr, and hence a solution of the two inequalities can be found. This ensures position tracking.
In this second test, the gains are selected such that the old condition Eq. 4 is satisfied, while the new condition Eq. 11 is critically not satisfied. “Critically” means that the two sides of condition Eq. 11 are equal, i.e., 4BlBr = KlKr(*Tl + *Tr)².

Kl = 1, Kr = 1, Tl = 1, Tr = 1, Br = 1, Bl = 1

Substituting these gains into the two inequalities Eq. 5 and Eq. 6, one obtains

1 - \frac{1}{2}\Big(\alpha_l + \frac{1}{\alpha_r}\Big) > 0

1 - \frac{1}{2}\Big(\alpha_r + \frac{1}{\alpha_l}\Big) > 0
The two inequalities are plotted as shown in Fig. 10. The two families of curves intersect only when λl = λr = 0, which is not acceptable. Once λl or λr becomes positive, the two families no longer intersect. This means that there are no positive values of αl and αr that produce positive values of λl and λr; hence, position tracking is not guaranteed.

Fig. 9. Level curves representing the two inequalities at a set of gains that satisfies the new tracking condition Eq. 11.

Fig. 10. Level curves representing the two inequalities at a set of gains that critically does not satisfy the new tracking condition Eq. 11.

8 Conclusion and Future Work


This paper investigates the correctness of the condition derived by Nuno et al. in
[6]. This condition helps in selecting and adjusting the control gains to achieve
position tracking in teleoperation systems. It is proved by analysis, simulation
and experiments that Nuno’s condition is not always correct. It is shown that
in some cases, even if the condition is satisfied, the time delay can render the
system unstable. Afterwards, a new tracking condition is derived and verified
based on a Lyapunov-like function. More control schemes could also be tested using the modified tracking condition.

References
1. Anderson, R., Spong, M.: Bilateral control of teleoperators with time delay. IEEE
Trans. Autom. Control 34, 494–501 (1989). https://doi.org/10.1109/9.24201
2. Aldana, C.I., Romero, E., Nuno, E., Basañez, L.: Pose consensus in networks of
heterogeneous robots with variable time delays. Int. J. Robust Nonlinear Control
25, 2279–2298 (2015). https://doi.org/10.1002/rnc.3200
3. Nuno, E., Valle, D., Sarras, I., Basañez, L.: Leader-follower and leaderless consensus
in networks of flexible-joint manipulators. Eur. J. Control 20, 249–258 (2014).
https://doi.org/10.1016/j.ejcon.2014.07.003
4. Nuno, E., Sarras, I., Basañez, L.: Consensus in networks of nonidentical Euler–
Lagrange systems using P+d controllers. IEEE Trans. Robot. 29, 1503–1508
(2013). https://doi.org/10.1109/TRO.2013.2279572
5. Nuno, E., Sarras, I., Basañez, L., Kinnaert, M.: Control of teleoperators with joint
flexibility, uncertain parameters and time-delays. J. Robot. Auton. Syst. 62, 1691–
1701 (2014). https://doi.org/10.1016/j.robot.2014.08.003
6. Nuno, E., Basañez, L., Ortega, R., Spong, M.: Position tracking for non-linear
teleoperators with variable time delay. Int. J. Robot. Res. 28, 895–910 (2009).
https://doi.org/10.1177/0278364908099461
7. Sallam, M., Ramadan, A., Fanni, M.: Position tracking for bilateral teleoperation
system with varying time delay. In: The 2013 IEEE/ASME International Con-
ference on Advanced Intelligent Mechatronics (AIM), Wollongong, pp. 1146–1151
(2013). https://doi.org/10.1109/AIM.2013.6584248
8. Rashad, S.A., Sallam, M., Bassiuny, A.B., Abdelghany, A.M.: Control of mas-
ter salve system using optimal NPID and FOPID. In: 2019 IEEE 28th Interna-
tional Symposium on Industrial Electronics (ISIE), Vancouver, pp. 485–490 (2019).
https://doi.org/10.1109/ISIE.2019.8781129
9. Rashad, S.A., Sallam, M., Bassiuny, A.B., Abdelghany, A.M.: Control of master
slave robotics system using optimal control schemes. In: IOP Conference Series:
Materials Science and Engineering, vol. 610, p. 012056 (2019). https://doi.org/10.
1088/1757-899X/610/1/012056
10. Sallam, M., Ramadan, A., Fanni, M., Abdellatif, M.: Stability verification for bilat-
eral teleoperation system with variable time delay. Int. J. Mech. Mechatron. Eng.
5, 2477–2482 (2011)
Development and Control
of a Micro-robotic System
for Medical Applications

Fady Magdy(B) , Ahmed Waheed, Ahmed Moustafa, Ramy Farag,


Ibrahim M. Badawy, and Mohamed Sallem

Helwan University, Cairo, Egypt


FadyMagdy@h-eng.helwan.edu.eg

Abstract. The wireless control of microparticles as controllable micro-robots has the potential to contribute to resolving medical issues that current medical approaches cannot treat efficiently. This paper presents our designed system for controlling a 100 µm spherical paramagnetic particle. The system setup design was based upon simulations of the magnetic flux densities over the microparticle water reservoir in the COMSOL Multiphysics modeling software. LABVIEW software was used to code and interface with the system. Also, the use of ready-to-use optimization controllers, such as the LABVIEW auto-tuning PID controller, was proposed instead of the trial and error approach taken previously. The trial and error approach is not recommended since it is time-costly and not optimal in terms of error minimization. Using the auto-tuning PID controller, a settling error of less than 4.22 µm and a maximum trajectory-following error of 4.4 µm could be achieved, while the trial and error approach achieved a 44.5 µm settling error and a maximum trajectory-following error of 0.6 µm. Achieving this performance makes the microparticles a candidate for novel medical operations that require the navigation of micro-objects within arteries and the human body in general.

Keywords: Microparticles · Microrobots · Optimization controller ·


Drug carriers · Computer vision · Object tracking

1 Introduction
As medicine develops continuously, researchers are able to discover deeper causes of many diseases, e.g., diabetic retinopathy [4]. This might require medical intervention inside sensitive and complex organs; in addition, certain drugs are better delivered directly to the affected tissues, such as chemotherapy for cancer tissues [1]. Externally powered paramagnetic microparticles were introduced in [9] to diagnose diseased tissues by calculating the interaction force between the particle and the tissues, as infected tissues have a different palpation than healthy ones. They can also be used for the therapy of blood clots in delicate capillaries (see Fig. 1) [2]. Indeed, this method is safer because it avoids causing trauma to the patient, as the particle can be injected into the body through the body orifices or delivered to the blood vessels using a syringe. This is especially relevant for patients with conditions, such as diabetes, that present barriers to surgery. Additionally, the chemical therapy is targeted only at the infected tissues without affecting the whole body [3].

Fig. 1. Body blood vessels

Moreover, the motion of the particle has to be precise and able to overcome the circumstances that these particles might encounter inside the human body, such as blood flow pressure and various blood flow rates. Therefore, closed-loop controllers ought to be deployed [5]. Different types of closed-loop controllers could be used; however, a PID controller is used in the experiments, since PID controllers have been deployed successfully in nonlinear systems such as this one.
The PID controller parameters should minimize the error while keeping an appropriate rise time. In fact, there are many ways to select the parameters, such as trial and error or a mathematical model representation, but the latter applies only when the system's parameters are static. However, the system's parameters here are dynamic, depending on time and on the setup configuration of every experiment, which makes a reliable model of the system hard to achieve. Online system identification is one way to solve this problem; another is the use of auto-tuning controllers. In this work, the components of the system hardware are described briefly in Fig. 2, while focusing on how LABVIEW is used to control the system and how the built-in online auto-tuning block was employed. This block performs a number of trials to calculate the most suitable parameters for the current setup configuration and exposes editable parameters to obtain the required response, so there is no need to build a customized tuning algorithm.

Fig. 2. System layout: the system comprises (from left to right) power supply, controller and control signal DAC unit, coil driver, coils, microparticle and its water reservoir, and microscope.

2 Experimental Setup
The system in Fig. 3 consists of a power supply unit, coils for driving the microparticle, a 4-channel DC driver for actuating the coils, a microscope that feeds the microparticle position back to the controller, a reservoir to house the microparticle, and a controller connected to a digital-to-analog converter unit.

Fig. 3. The system for controlling the microparticle in 2D space

2.1 General Considerations


In these experiments, some limitations are accepted in order to focus on the target. First, in the real world a type of scan would be used to track the particle [10]; here, a camera is used instead, which would not be applicable in the real world. Also, the positioning is done in 2D, while the real world requires 3D positioning [12]. In addition, the workspace in these experiments is much smaller than the patient's body, so the coils would have to be relatively bigger to fit the real environment. Furthermore, the way to get the particle out of the patient's body is neglected, as biodegradable materials might be used in the particle manufacturing [11].

2.2 Reservoir
The reservoir is made of acrylic with dimensions of 10 × 10 × 9 mm. These dimensions are based on the properties of the microparticle, the size limitation of the coils, and the surface tension of the reservoir with the fluid that holds the microparticle, which is water. First, due to surface tension between the walls of the reservoir and the water, the water takes a concave shape at the surface of the acrylic container; to minimize the effect of this on the experiments, the area of the container should be large relative to the particle size. However, the size of the reservoir should also be chosen such that the magnetic field produced by the coils covers the surface area of the reservoir.
Secondly, to benefit from the maximum forces of the coils' magnetic fields, the microparticle should be held at the top of the water in the same plane as the coil centers. Thus, the height of the reservoir is made bigger than the coil's radius, such that the water level reaches the center of the coils.
Finally, the bottom of the reservoir is made white, so that disturbances that could emerge due to color variations are minimized, and the color contrast between the black microparticle and the white reservoir makes the detection of the microparticle by image processing easier.

2.3 Coils

Fig. 4. (a) Coil's dimensions; (b) simulation of the magnetic flux density from a single coil.

Four coils are used to control the 2D position of the microparticle on the water. Using the COMSOL Multiphysics software, the magnetic flux density produced by each coil can be simulated (see Fig. 4). All real-world coil parameters are applied to this model: an insulated copper wire of 0.7 mm diameter is used (electrical conductivity 5.998 × 10^7 S/m, resistivity 1.667 × 10^-8 Ω·m) with a galvanized steel core of 20 mm diameter (electrical conductivity 1.12 × 10^7 S/m) and about 1200 turns (see Fig. 4). The magnetic flux density is studied over the 2D plane in which the microparticle moves. The simulations show that the magnetic flux density has its highest value at the coil's outer surface and decreases when moving away from the coil (see Fig. 4).
The magnetic flux density is also studied over the X-Z plane in which the microparticle moves, as shown in Fig. 5. The results show that the magnetic flux density has its highest value at the coil's upper surface and decreases when moving further along the Z-axis. Because of the change of magnetic flux over this area, a region of interest (4.57 × 3.42 mm²) in which the flux does not change observably is used.

Fig. 5. Magnetic flux density simulation

2.4 Paramagnetic Microparticles

The particles are made of a paramagnetic material, specifically iron oxide in lactic acid, with a diameter of approximately 100 µm. The velocity of the particle is a function of the viscous drag force and the magnetic force exerted on the microparticle by the magnetic field of the coils. The particle reaches its maximum velocity when its acceleration decays to zero, i.e., when the magnetic force balances the viscous drag force (Eq. 6).
The magnetic force exerted on the microparticle can be expressed as follows:

F = \nabla\big(\alpha_p V_p B^{2}\big)        (1)

where Vp is the volume of the microparticle and B is the magnetic flux density. B is a function of time and of the distance between the microparticle and the coil exerting the magnetic force. Since αp and Vp are constants, Eq. 1 can be expressed as follows:

F = \frac{4}{3}\pi\alpha_p r_p^{3}\,\nabla B^{2}        (2)

where rp is the radius of the microparticle. The viscous drag force can be expressed as follows:

F_d = -6\pi\eta r_p\nu        (3)

where η is the dynamic viscosity and ν is the velocity of the microparticle. According to Newton's second law of motion:

F = m_p a_p        (4)

\frac{4}{3}\pi\alpha_p r_p^{3}\,\nabla B^{2} - 6\pi\eta r_p\nu = m_p a_p        (5)

\nu = \frac{\frac{4}{3}\pi\alpha_p r_p^{3}\,\nabla B^{2} - m_p a_p}{6\pi\eta r_p}        (6)

From Eq. 5, the maximum velocity νm can be calculated when the viscous drag force balances the magnetic force, at which point the acceleration vanishes:

\nu_m = \frac{2}{9}\,\frac{\alpha_p r_p^{2}}{\eta}\,\nabla B^{2}        (7)
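A minimal numerical sketch of Eq. 7 is given below; every parameter value (the particle coefficient αp, the viscosity, and the gradient of B²) is an assumed placeholder for illustration and not a measured value of the setup.

```python
# Illustrative evaluation of Eq. 7 (maximum/terminal velocity of the microparticle).
# Every numeric value below is an assumed placeholder, not a measurement of the setup.

def terminal_velocity(alpha_p, r_p, eta, grad_B2):
    """nu_m = (2/9) * (alpha_p * r_p**2 / eta) * grad(B^2), Eq. 7."""
    return (2.0 / 9.0) * (alpha_p * r_p**2 / eta) * grad_B2

r_p = 50e-6        # particle radius [m] (100 um diameter)
eta = 1.0e-3       # dynamic viscosity of water [Pa*s]
alpha_p = 1.0      # magnetic coefficient of the particle (assumed, material dependent)
grad_B2 = 1.0e-3   # magnitude of grad(B^2) over the workspace (assumed) [T^2/m]

print(f"terminal velocity ~ {terminal_velocity(alpha_p, r_p, eta, grad_B2):.3e} m/s")
```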

2.5 Control Algorithm


The control algorithm is implemented using LABVIEW software, which takes advantage of the multithreading feature and the high clock speed of the computer's microprocessor. The control algorithm consists of two main layers: the first one is the interface, and the second layer is where data are processed and control signals are calculated. This second layer can be divided into sequential tasks, as shown in Fig. 6, which can be summed up as follows:

Fig. 6. (a) Block diagram of the closed-loop system; (b) flow chart representing the sequence of the control algorithm.

– The microscope captures frames at 25–30 fps. Each frame is processed by converting it from RGB to gray-scale and then applying an erosion filter to eliminate noise in the captured image so that the particle can be located accurately. Finally, the centroid of the particle is detected after applying a threshold that filters out everything except the microparticle (a sketch of this step follows the list).
– The auto-tuning PID controller block takes the coordinates from the image processing, then calculates the next control signal to be applied to the coils, and the control signal values are sent to the microcontroller.
– The microcontroller receives the control signals and converts them to analog signals using its PWM modules.
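The per-frame processing of the first step can be sketched with OpenCV in Python as shown below; the original system is implemented in LABVIEW, so this is only an illustrative equivalent, and the threshold value and erosion kernel size are assumptions.

```python
import cv2
import numpy as np

def locate_particle(frame_bgr, thresh_val=60, kernel_size=3):
    """Return the (x, y) centroid of the dark microparticle on the white reservoir."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # The particle is dark on a white background, so use an inverted threshold.
    _, mask = cv2.threshold(gray, thresh_val, 255, cv2.THRESH_BINARY_INV)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=1)   # remove small noise blobs
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None                                # particle not found in this frame
    return m["m10"] / m["m00"], m["m01"] / m["m00"]

# Example usage with a synthetic frame (white background, dark spot at 320, 240):
frame = np.full((480, 640, 3), 255, np.uint8)
cv2.circle(frame, (320, 240), 5, (0, 0, 0), -1)
print(locate_particle(frame))
```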

2.6 Auto-tuning Methodology

The PID gain auto-tuning methodology is handled by the built-in LABVIEW block during the aforementioned loop.
When the auto-tune button is pressed by the user, a window pops up with some instructions to be followed and some parameters to be filled in, such as the current gain values, the type of controller, the desired response performance, the number of iteration cycles, etc. After executing the process, the new gains are generated.

2.7 Peripheral Devices

– An Arduino UNO microcontroller is used as a data acquisition unit; it provides the PWM signals to the 4-channel DC driver.
– A 4-channel DC driver is used to distribute the voltage to each coil; the power is supplied by a 12 V power supply.

3 Results

Using different sets of PID gain values obtained by the trial and error method and by the online auto-tuning method, different trajectories were executed that represent part of a network of blood vessels, as illustrated in Fig. 1.
As shown in Fig. 7, the trial and error method (kp = 1, ki = 0.01) could achieve acceptable responses with a rise time of approximately 0.4–0.63 s and an overshoot of 0.7–11.1%. The circle trajectory in Fig. 8 was then followed with a maximum error of 0.25 mm,

Fig. 7. X and Y positioning responses with the trial and error PI.

Fig. 8. Left: circle trajectory. Right: another trajectory representing a blood vessel, with the trial and error PI.

Fig. 9. Another trajectory representing a blood vessel, with the trial and error PI.

while the other trajectory in Fig. 8 was followed with a maximum error of 3.48 mm and the trajectory in Fig. 9 with a maximum error of 3.56 mm.
The auto-tuning method, on the other hand, evaluated the values kp = 1.583602, ki = 0.031190, which achieved better responses with a rise time of approximately 0.57–1.5 s and overshoots of 3.7–4.4% in point-to-point motion, as shown in Fig. 10. The circle trajectory in Fig. 11 was then followed with a maximum error of 0.23 mm, while the other trajectory

Fig. 10. X and Y positioning responses with the auto-tuned PI.

Fig. 11. Left: circle trajectory. Right: another trajectory representing a blood vessel, with the auto-tuned PI.

Fig. 12. Another trajectory representing a blood vessel, with the auto-tuned PI.

in Fig. 11 was followed with a maximum error of 0.22 mm and the trajectory in Fig. 12 with a maximum error of 0.6 mm.
In contrast, the PID gains (kp = 13.669023, ki = 0.045388, kd = 0.11347) not only showed erratic performance, as shown in Fig. 13, but also failed to follow the trajectories, owing to the presence of the derivative term, which depends on the noise in the system and can become very large over time.
Also, the gain values evaluated by the auto-tuner are not fully dependable, as any variation in the system may invalidate them.

Fig. 13. X & Y positioning responses by auto tuned PID.

The circle trajectory in Fig. 14 (left) was achieved using the gain values Kp = 6.043791, ki = 0.028992 with a maximum error of 4.4 µm. However, as shown in Fig. 14 (right), these values later did not perform as well with the system, resulting in a maximum error of about 0.15 mm.

Fig. 14. Left, first trial circle trajectory. Right, repeated trajectory with different
results.

4 Conclusion and Future Work


Using a fixed set of PID gain parameters, as done before [6], does not remain valid for long with the same system. This drove us to use an auto-tuning method to choose the best parameters for the current circumstances. Alternatively, in future work, an adaptive controller might achieve better results.

With this controller, the setup is a candidate for navigating microparticles inside a body where the error tolerance is less than 4.4 µm. From the conducted experiments, the deployment of an auto-tuning controller based on an optimization algorithm is more efficient than the use of the trial and error approach.
Furthermore, instead of having the user supply discrete setpoints or an array of setpoints, it would be more practical to use a master-slave system [7,8] to direct the particle.

References
1. Wu, Z., Troll, J., Jeong, H., Wei, Q., Stang, M., Ziemssen, F., Wang, Z., Dong, M., Schnichels, S., Qiu, T., Fischer, P.: A swarm of slippery micropropellers penetrates the vitreous body of the eye. Sci. Adv. 4, eaat4388 (2018). https://doi.org/10.1126/sciadv.aat4388
2. Kim, B., Park, H., Lim, M., Park, J.: A vibrating foxtail based locomotive mech-
anism for hunting for blood clots. In: The 39th International Symposium on
Robotics, Seoul, pp. 301–304 (2008)
3. Han, J., Zhen, J., Nguyen,V., Go, G., Choi, Y., Ko, S., Park, J., Park, S.: Hybrid-
actuating macrophage based microrobots for active cancer therapy. School of
Mechanical Engineering, Chonnam National University, Gwangju, Korea. https://
doi.org/10.1038/srep28717
4. Wu, Z., Troll, J., Jeong, H., Wei, Q., Stang, M., Ziemssen, F., Wang, Z., Dong, M., Schnichels, S., Qiu, T., Fischer, P.: A swarm of slippery micropropellers penetrates the vitreous body of the eye (2018). https://doi.org/10.1126/sciadv.aat4388
5. Salim, S., Zainon, M.: Control system engineering (2010). 978-983-2948-90-2
6. Keuning, J.D., Vries, J., Abelmann, L., Misra, S.: Image-based magnetic control of
paramagnetic microparticles in water. In: the IEEE/RSJ International Conference
on Intelligent Robots and Systems, SanFrancisco, pp. 421–426 (2011). https://doi.
org/10.1109/IROS.2011.6048703
7. Rashad, S.A., Sallam, M., Bassiuny, A.B., Abdelghany, A.M.: Control of master slave robotics system using optimal control schemes. In: IOP Conference Series: Materials Science and Engineering, vol. 610, p. 012056 (2019). https://doi.org/10.1088/1757-899X/610/1/012056
8. Sallam, M., Ramadan, A., Fanni, M., Abdellatif, M.: Stability verification for bilat-
eral teleoperation system with variable time delay. Int. J. Mech. Mechatron. Eng.
5, 2477–2482 (2011)
9. Khalil, I., Metz, R., Abelmann, L., Misra, S.: Interaction force estimation during
manipulation of microparticles. In: International Conference on Intelligent Robots
and Systems. IEEE/RSJ (2012). 978-1-4673-1735-1/12/S31.00
10. Pané, S., Puigmartı́-Luis, J., Bergeles, C., Chen, X., Pellicer, E., Sort, J., Pocep-
cová, V., Ferreira, A., Nelson, B.: Imaging technologies for biomedical micro- and
nanoswimmers. Adv. Mater. Techno. - Adv. Intell. Syst. (2018). https://doi.org/
10.1002/admt.201800575
11. TirgarBahnamiri, P., Bagheri-Khoulenjani, S.: Biodegradable microrobots for tar-
geting cell delivery. Med. Hypotheses (2017). https://doi.org/10.1016/j.mehy.2017.
02.015
12. Kummer, M., Abbott, J., Kratochvil, B., Borer, R., Sengul, A., Nelson, B.:
OctoMag: an electromagnetic system for 5-DOF wireless micromanipulation. IEEE
Trans. Rob. 26(6), 1006–1017 (2010). https://doi.org/10.1109/TRO.2010.2073030
Wake-up Receiver for LoRa-Based Wireless
Sensor Networks

Amal M. Abdel-Aal(&), Ahmad A. Aziz El-Banna,


and Hala M. Abdel-Kader

Electrical Engineering Department, Faculty of Engineering at Shoubra,


Benha University, Cairo, Egypt
Eng.amal1988@gmail.com

Abstract. Wireless Sensor Networks (WSN) are one of the most widespread but challenging wireless technologies. Two typical challenges in WSN, i.e., energy consumption and coverage area, have recently been considered to enhance WSN performance. In this paper, we present a combination of two technologies towards an efficient WSN: we use Wake-up Receiver (WUR) technology, which reduces energy consumption, along with Long-Range Wide Area Network (LoRaWAN) technology, which increases the coverage area of the network. The proposed network design is based on two different transmission scenarios between the network nodes, which are equipped with a WUR and a LoRa chip and are connected in a star topology to reduce the probability of collisions. Moreover, the paper studies and tests the network using the OMNeT program to verify that the proposed network design achieves better energy consumption results than traditional LoRa, while also achieving a network with fewer collisions, distinct packet delivery, and negligible delay.

Keywords: Wake-up receiver · IoT · LoRaWAN · FLoRa library · OMNeT · Wireless sensor network

1 Introduction

Wireless sensor networks (WSN) are the basic building block of Internet of Things (IoT) systems, where sensor nodes are connected to the Internet dynamically and use it to work together and achieve their tasks. Moreover, IoT is considered a new generation of the Internet that enables connectivity between devices such as sensors and different artificial intelligence tools, and allows users to remotely control devices without the need to connect directly to them [1]. However, numerous challenges face the design of a robust WSN, such as error control, fault tolerance, power management, and energy efficiency. These challenges are still open problems that have attracted the attention of researchers in the communication community; consequently, in this work we address power management and energy-efficient techniques.
The main components of a WSN are sensors, modems, a gateway, and the Internet. One critical component is the sensor, which is a battery-powered device spread over a specific area to sense some quantity and collect data, then send it to the server for further analysis and/or control purposes. The sensors are usually placed in hard-to-reach places; therefore, it is important to achieve two goals: the first one is to extend the sensor's lifetime, and the second goal is the ability to cover a large area to collect as much data as possible. Achieving these goals is the motivation to combine two promising techniques, namely the wake-up receiver (WUR) and the long-range wide area network (LoRaWAN).
To achieve the first objective, we identify the main parameter that affects the sensor lifetime as its power consumption; how to improve the sensor operation to decrease the power consumption is the main problem that we aim to solve in this work. This could be achieved by decreasing the power consumed by the sensor while performing its various tasks. Looking into the architecture of the sensor node, we can easily see that the element that consumes most of the sensor power is the transceiver, as it is often working all the time whether or not there are data to send or receive. However, many techniques can be applied to reduce this consumed power; one of these techniques is the wake-up strategy.
A WUR is an additional circuit attached to the sensor node; its main task is to put the node transceiver into a deep sleep mode to reduce the consumed power, waking it up only when data transmission or reception is needed. Wake-up receiver schemes can be classified into diverse categories based on the following labels [2]:
1. Power source (Passive (external)/Active (internal))
2. Destination specification (Identity-based/Range-based)
3. Wake-up signal type (Radio-based/Acoustic)
4. Wake-up channel (Shared/Separate (Single channel/Multiple channels))
A typical wake-up receiver circuit consists of several sub-circuits (envelope detector and multiplier, data slicer, preamble detector and low-cut filter, PWM decoder and SPI adapter, and the sensor node); the proposed network design considers the structure of this WUR circuit [3].
On the other hand, to realize the second goal and cover a spacious area, a recent technology named LoRaWAN can be employed. LoRa was developed by Cycleo and was introduced in recent years to permit long-range connectivity for IoT systems. It includes features such as spread spectrum modulation, adaptive data rates, and adaptive power levels, which are used to achieve long range, low cost, and power efficiency. LoRa automatically chooses a spreading factor (SF) between 7 and 12. The architecture of a LoRa network consists of three main parts: end devices, gateways, and a network server. End devices are divided into three classes, named class A, class B, and class C. There are some differences between these classes from the device perspective, as well as variations in the data frame structures [4]. Class A operation must be supported by all LoRa devices because it is considered the most efficient in power consumption.
In this work, we consider LoRaWAN 1.0.3 which uses device time request as MAC
command for synchronizing device real-time clock (class A). The rest of the paper is
organized as follows: Sect. 2 introduces the related work of the WURs and LoRa.
Section 3 illustrates the design and implementation of the LoRa-based network with
the proposed operating scenarios. Section 4 shows the simulation results, while Sect. 5
concludes the work done through this paper.

2 Related Work

The basic motivation for integrating the WUR and LoRaWAN techniques in WSNs is to enable sensors to spread over a large area and to work for longer times. This allows the sensors to detect whether any packets need to be sent or received without worrying about coverage-area limitations or the amount of consumed power that affects the battery lifetime, which is a core requirement for any battery-powered device.
Early work studied different WUR circuits to reduce power consumption; thereafter, the LoRaWAN protocol was proposed and the research direction started to turn towards the integration of both technologies to enhance the network performance in terms of consumed energy. The following section presents some of the research efforts in the two techniques separately and sheds light on other studies that have recently taken their integration into account.

2.1 Wake-Up-Radio (WUR)


A lot of research has been conducted to reduce power consumption using different wake-up techniques, such as the use of a secondary very low power wake-up channel based on free-space optical (FSO) transmission. The FSO channel is used to send small wake-up signals, and after the wake-up is received successfully, the higher-power radio channel is used to transfer data packets more efficiently, as in [5]. The FSO wake-up scheme reduces latency by a factor of 10, and when configured for equivalent latency, the FSO receiver has a power consumption that is eight times lower than that of B-MAC. Another study [6] designed a smart power unit that allows the board to perform maximum power point tracking (MPPT) to improve the efficiency of the energy harvesting process and makes it possible to apply a power management policy through communication with the node and the use of a radio wake-up receiver. In addition, another point of view is presented in [7], which relates wireless sensor networks to system-level design; the authors could achieve very low power consumption (in the few-µW range) at a very low cost. Moreover, another way to reduce power consumption is to use a passive WUR, which harvests the power required for operating the WUR unit from the received signal, as in [8]. Furthermore, there are many different WUR designs aiming at lower power consumption, as in [3] and [9]. Others implement a receiver-initiated consecutive packet transmission WuR (RI-CPT-WuR) MAC protocol to improve energy efficiency, packet reliability ratio, and delay for WuR-enabled IoT networks, as in [10]. Recently, other research has worked on identifying the research opportunities and practical challenges regarding the practical application of WUR technology for upcoming IoT applications, as in [11].

2.2 Low Power Long Range Communication


LoRa is a wireless modulation technology (a physical layer to connect things) with the important feature of being open, i.e., no license is required to transmit data over long distances, e.g., 2 km. In addition, LoRaWAN is considered the protocol used to operate LoRa networks. Recently, LoRa has received great attention from researchers. There are three main parameters for LoRa: code rate (CR), spreading factor (SF), and bandwidth (BW). The authors in [12] analyzed the impact of these parameters on the performance of LoRa. In addition, the work in [13] tested the ability of LoRa to monitor health and wellness across a large campus using a transmission power of 14 dBm and the maximum LoRa spreading factor of 12 in the 868 MHz ISM band. On the other hand, there are a large number of transmission parameter combinations for LoRaWAN, almost 6720, as reported in [14]; there, the authors tried to find a way to help select suitable parameters for each use of LoRa devices. Although most LoRa device applications use a star topology due to its simplicity, the authors in [15] tried a different topology by using a mesh topology and proposed a comparison between star and mesh topologies in terms of PDR (packet delivery ratio).

2.3 Wake-Up-Radio (WUR) and Low Power Long Range Communication
Further studies have combined WUR and LoRa to address the energy efficiency challenge. In [1], the authors combined a WUR with standard LoRa and applied on-demand TDMA to improve the performance; however, they did not use any random distributions characteristic of real conditions, long periods of time were not evaluated, and the mode of data transfer from the server to the nodes was not considered. In [16], the authors proposed a WUR design for LoRa systems and evaluated the performance resulting from controlling the bit rate, divided into two categories: high bit rate and low bit rate. Another architecture is presented in [17], based on a MAC protocol design that allows each end device to operate as a cluster head (CH) using LoRa when sending data and to operate as a normal node using the WUR the rest of the time; that work analytically achieved low power consumption and low latency.

3 Design and Implementation

In this section, we present the architecture and the design of the proposed LoRa-based WSN, built on a typical IoT topology and designed to achieve better energy consumption and coverage area by using two types of communication.

3.1 Network Implementation


Based on the principle of data access to the transmission medium, there are two common types of network:
A. Controlled data access using access time-sharing.
B. Random data access using a collision resolution algorithm.

In the first network type, the node selects a suitable time to transmit its data using algorithms that avoid collisions, like ALOHA (a simple communication scheme where each node in the network sends data when it has a frame, and only after this frame reaches the destination successfully does the next frame start). The main disadvantage of this type is that it increases the delay, but its benefit appears when the network load increases, because the data are still transmitted within the allotted time.
In the second type, the node sends data at any time, so the delay decreases, but collisions increase; as collisions increase, the number of dropped packets increases, which requires more packet retransmissions and thus increases the power consumption as well. This type is therefore preferable for networks with low loads.
We followed these principles in our design, as shown in the next subsection.

3.2 Network Design


We consider the IoT network topology as (data system - gateway - devices), in that order from top to bottom, where the gateway is responsible for collecting data from the various devices in the network and then sending it to the data server. In this design, we combine two types of communication, long-range and short-range: the first is applied to the gateway, which uses LoRa devices, and the second is applied to the end devices, which are assumed to be equipped with WUR units. By employing this technique, we can use the LoRa capabilities for long-range communication in the WSN to increase its coverage area while at the same time using the WUR to reduce the power consumed by the network nodes.
Considering the star topology in Fig. 1, we can operate this network in two different transmission scenarios:

Fig. 1. Communication on the designed network



The first scenario is performed when the server requests to collect data from all end devices and proceeds according to the following sequence:
1. When the server needs to collect data from all EDs, it sends the request packet to the CH (using LoRa, i.e., long-range communication), represented by the dashed line in Fig. 1.
2. The CH, in turn, broadcasts wake-up signals to wake up all the EDs (using short-range communication), represented by the solid line in Fig. 1.
3. The EDs send their data to the CH, and the CH then sends the packets back to the server.
If a node has collected data and needs to send it to the server, it does not wait for a wake-up signal from the server but sends it immediately. Although this may result in some collisions, which could affect the network performance, it decreases the consumed power dramatically because the node stays in deep sleep for longer times until it wakes up again.
The second scenario is performed when the server requests to collect data from a specific end device and proceeds in the following sequence (an illustrative sketch of the dispatch logic for both scenarios is given after this list):
1. When the server needs to collect data from a specific ED, it sends a packet with the required ED MAC address or ID to the CH using long-range communication.
2. The CH wakes up only the ED with the given MAC address, using short-range communication.
3. The ED sends its data to the CH, and the CH then sends the data packets back to the server.
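The cluster-head dispatch logic of the two scenarios can be illustrated with the following Python sketch; the class names, message format, and sensor reading are assumptions for illustration only and do not reproduce the OMNeT/FLoRa implementation.

```python
# Illustrative cluster-head (CH) dispatch logic for the two request scenarios.
# Message format, node API, and the sensor reading are assumptions, not the OMNeT model.

class EndDevice:
    def __init__(self, mac):
        self.mac = mac
        self.awake = False

    def wake_up(self):                   # short-range WUR signal received
        self.awake = True

    def read_sensor(self):
        return {"mac": self.mac, "value": 42}   # placeholder reading

class ClusterHead:
    def __init__(self, end_devices):
        self.eds = {ed.mac: ed for ed in end_devices}

    def handle_server_request(self, request):
        """Scenario 1: no target MAC -> broadcast; Scenario 2: wake a single ED."""
        target = request.get("mac")
        targets = self.eds.values() if target is None else [self.eds[target]]
        replies = []
        for ed in targets:
            ed.wake_up()                       # short-range wake-up signal
            replies.append(ed.read_sensor())   # ED answers with its data packet
        return replies                         # forwarded to the server over LoRa

ch = ClusterHead([EndDevice("ed-01"), EndDevice("ed-02")])
print(ch.handle_server_request({}))                # scenario 1: collect from all EDs
print(ch.handle_server_request({"mac": "ed-02"}))  # scenario 2: collect from one ED
```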
Both scenarios are studied and compared with the standard model of the traditional LoRa network, in which all the nodes are active all the time whether or not there are data to send or receive.
In the proposed design, we consider the server to be located kilometers away from the cluster head (CH); it is always listening for any incoming packets, and since it is wall-powered, it usually does not suffer from power consumption problems. The CH is considered a LoRaWAN device since it needs to listen continuously for incoming messages from both sides with low power consumption, so it is chosen as a LoRaWAN class A device. Furthermore, each ED is assumed to be equipped with a WUR to reduce the node's power consumption by pushing it into deeper sleep modes when no communication activity is required. The WUR is assumed to work with an on-off keying (OOK) modulated signal.
We implement the network using the OMNeT software [18]. We implement two networks: the first one, called standard mode, uses standard LoRa devices for all the network nodes, based on the LoRa FLoRa library, which uses a random-access mechanism; in the second one, called WUR mode, we try to avoid the increase in power consumption by using the WUR. There are different types of LoRa chips, as listed in Table 1; we use the SEMTECH SX1272.

Table 1. Various LoRa chip specifications

Vendor                                        SEMTECH    SEMTECH    MURATA        MICROCHIP
Model                                         SX1272     SX1276     CMWX1ZZABZ    RN2483
Supply voltage (V)                            3.3        3.3        3.3           3.3
Sleep current (mA)                            0.0001     0.0002     0.0014        0.0013
Idle current (mA)                             0.0015     0.0015     –             2.8
Receive current, Band 1, BW = 125 kHz (mA)    9.7        10.3       21.5          –
Transmit current, 14 dBm (mA)                 28         28         47            38.9
Sleep power (uW)                              0.33       0.66       4.62          4.29
Idle power (mW)                               0.00495    0.00495    0             9.24
Receive power, Band 1, BW = 125 kHz (mW)      32.01      33.99      70.95         0
Transmit power, 14 dBm (mW)                   92.4       92.4       155.1         128.37

4 Experimental Results

In this section, we present the experimental results of the network designed in Sect. 3, obtained with the OMNeT program using a core model from the FLoRa library. The network server and nodes support dynamic management of configuration parameters through adaptive data rate (ADR). A description of the experiment setup and the achieved results follows.

4.1 Experiment Setup


4.1.1 OMNeT
We consider a network with 2-450 sensor nodes installed as end devices (EDs) used for sensing purposes, with one sensor node acting as the cluster head (CH) and a single server node, as illustrated in Fig. 1. At the EDs, we configure the WUR unit with the following specifications: the detector of the WUR consumes 0.27 uW at 1.5 V and the decoder consumes 8 nW, at a test frequency of 433.92 MHz and a data rate of 2 to 80 kb/s, with a sensitivity of -51 dBm and preamble-detector (+SAW filter) interference filtering. In addition, we consider the LoRa chip SEMTECH SX1272 with the following operating parameters at 3.3 V: 0.33 uW/0.0001 mA in sleep mode, 0.00495 mW/0.0015 mA in idle listening, 32.01 mW/9.7 mA in receiving (Band 1, BW = 125 kHz), and 92.4 mW/28 mA in transmitting (14 dBm).
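A back-of-the-envelope energy estimate can be obtained by multiplying the per-state power of the SX1272 quoted above by the time spent in each state, as in the following Python sketch; the state durations used in the example are arbitrary assumptions, not measured duty cycles.

```python
# Rough per-node energy estimate from the SX1272 figures quoted above.
# The state durations are arbitrary assumptions used only to illustrate the model.

POWER_MW = {            # SX1272 at 3.3 V
    "sleep": 0.00033,   # 0.33 uW
    "idle": 0.00495,    # idle listening
    "rx": 32.01,        # receiving, Band 1, BW = 125 kHz
    "tx": 92.4,         # transmitting at 14 dBm
}

def energy_mj(durations_s):
    """Energy in millijoules: sum of P[state] (mW) * t[state] (s)."""
    return sum(POWER_MW[state] * t for state, t in durations_s.items())

# Example: one hour dominated by sleep, with short receive and transmit windows.
one_hour = {"sleep": 3580.0, "idle": 10.0, "rx": 8.0, "tx": 2.0}
print(f"~{energy_mj(one_hour):.1f} mJ per hour")
```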
In the proposed network, we employed three different node models: The gateway
model, LoRa node model, and LoRa node WUR model.
1. The gateway model contains some objects from the standard INET framework, which determine the interaction algorithms for the UDP and IP protocols. The second part of the objects provides the algorithm of the LoRa protocol; the main one is the network card model (LoRaGWNIC), as shown in Fig. 2(a).
2. The LoRa node model consists of two objects - a LoRaNIC and a simple application. The simple application generates objects for transmission to the server and receives packets from the server, as shown in Fig. 2(b).
3. The LoRa node WUR model is similar to the standard node, with an additional WUR unit attached to receive and process the activation signal, as shown in Fig. 2(c).

Fig. 2. (a) LoRa gateway model; (b) LoRa node; (c) LoRa node WUR

4.1.2 Scenarios
We consider a random distribution and modify the way data are exchanged between the server and the end devices to figure out the best way to communicate and achieve better energy efficiency and a low collision ratio. We configured three scenarios to distinguish between the different ways of communicating:
• Description of the first scenario (Standard mode)
In this scenario, the network is modeled in the standard mode. Each node creates a packet at a random time and sends it to the server; the intervals between packets follow an exponential distribution with a mean of 180 s. For every 4 packets, the server sends a message to the node as a reply confirming that the server received the packets. To send a message to the server, the node uses a random-access mechanism on the medium. In the event of a collision, the packet is retransmitted after a random time interval; however, the packet is dropped after 15 unsuccessful transmission attempts (a simplified sketch of this node-side behaviour is given below).
When the server sends a packet to a host, the packet is first sent to the gateway. At the gateway node, the packet is queued and waits for the desired node to enter the packet reception mode. According to the standard LoRa operating mode, the node switches to the packet reception mode after each packet transmission to the server.
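The node-side behaviour of the standard mode can be approximated with the following Monte-Carlo Python sketch (exponential inter-packet times with a 180 s mean, random back-off after a collision, and a drop after 15 failed attempts); the fixed collision probability is a stand-in assumption and does not model the OMNeT channel.

```python
import random

MEAN_INTERVAL_S = 180.0   # exponential inter-packet time, as in the scenario
MAX_ATTEMPTS = 15         # packet dropped after 15 unsuccessful attempts
P_COLLISION = 0.1         # stand-in collision probability (NOT the OMNeT channel model)

def send_with_retries(rng):
    """Return (attempts_used, back_off_delay_s); attempts_used is None if dropped."""
    delay = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if rng.random() >= P_COLLISION:      # this attempt got through
            return attempt, delay
        delay += rng.uniform(0.5, 5.0)       # assumed random back-off before retrying [s]
    return None, delay                       # all attempts collided: packet dropped

def simulate(n_packets=10_000, seed=1):
    rng = random.Random(seed)
    t, attempts, dropped = 0.0, 0, 0
    for _ in range(n_packets):
        t += rng.expovariate(1.0 / MEAN_INTERVAL_S)   # next packet creation time
        used, _ = send_with_retries(rng)
        if used is None:
            dropped += 1
        else:
            attempts += used
    delivered = n_packets - dropped
    return t / 3600.0, attempts / max(delivered, 1), dropped

hours, mean_attempts, dropped = simulate()
print(f"simulated {hours:.1f} h, mean attempts per delivered packet {mean_attempts:.2f}, dropped {dropped}")
```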

• Description of the second scenario (WUR mode scenario 1)


In this scenario, the packet-sending algorithm is the same as in the first scenario. The node sends packets to the server, and the server sends a packet to the host after every few received packets. The node also uses a random-access mechanism, which allows collisions and retransmissions if they occur.
The WUR mode is used to receive packets from the server. In this scenario, a packet from the server to a node does not wait for the node to enter the packet reception mode after a transmission. When the packet arrives at the gateway, the GW broadcasts the activation packet to all nodes (EDs), and after a node is activated, it sends its data packet.
• Description of the third scenario (WUR mode scenario 2)
In this scenario, the interaction algorithm between the node and the server is changed. The server uses a routing table to communicate with each end device, and at specified times it sends an activation packet to a specific node. The activation packet is first transmitted to the GW node, and the GW then transfers it to the required node. The node receives the activation packet and sends its data packet to the server.

4.2 Evaluation Metrics


We employed three metrics to evaluate the network performance which are:
A. Number of collisions: the number of packets that failed to reach their destination because several nodes transmitted at the same time;
B. Power consumption: the amount of energy consumed by the network nodes during a certain period of time; and
C. End-to-end delay, also called OWD (One-Way Delay): the packet transmission time from the sender to the destination node (note that it differs from the RTT, the round-trip time). It can be calculated for the regular LoRa network and for the proposed WUR network, respectively, as:

OWD = T_Pkg        (1)

OWD = T_WUs + T_Pkg        (2)

where T_Pkg is the time taken by the data packet to reach the destination and T_WUs is the time taken by the wake-up signal to reach the end device.
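For completeness, Eqs. 1 and 2 translate directly into the following small Python helpers; the timing values in the example calls are placeholders.

```python
def owd_standard(t_pkg_s):
    """Eq. 1: one-way delay of the regular LoRa network."""
    return t_pkg_s

def owd_wur(t_wus_s, t_pkg_s):
    """Eq. 2: one-way delay when a wake-up signal precedes the data packet."""
    return t_wus_s + t_pkg_s

# Placeholder timings, for illustration only.
print(owd_standard(t_pkg_s=0.20))           # -> 0.20 s
print(owd_wur(t_wus_s=0.05, t_pkg_s=0.20))  # -> 0.25 s
```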

4.3 Network Performance Analysis


In this section, we present and discuss the results achieved by deploying the proposed network design in OMNeT. Specifically, we discuss the performance of the network in terms of the number of collisions, the power consumption, and the end-to-end delay.

4.3.1 Number of Collisions


Figure 3 shows the percentage of collisions for each node in the network for the
different scenarios explained in Sect. 4.1.2.

Fig. 3. Number of collisions per node vs. number of nodes for the different simulation scenarios (standard mode, WUR mode scenario 1, WUR mode scenario 2).

It is shown that when using the WUR in the first scenario, the percentage of collisions increases slightly compared to the standard mode. This can be interpreted as follows: more collisions happen because an additional wake-up packet has to be transmitted, which increases the busy time of the channel and thus the probability of collisions. However, when using the WUR with scenario 2, there are no collisions, because the server wakes up only one specific node to send its data, so that node occupies the channel alone while sending its packets.

4.3.2 Power Consumption


The main goal of this work is to reduce the energy consumed by the designed network in order to extend the lifetime of the sensor nodes. We employed the two proposed scenarios to compare their performance with respect to the standard mode. We measure the total consumed energy for the whole network, as shown in Fig. 4.

Fig. 4. Total energy consumed (mJ) vs. number of nodes for the different simulation scenarios (standard mode, WUR mode scenario 1, WUR mode scenario 2).



The standard scenario consumes the most energy, and the other two WUR scenarios outperform it to different degrees; however, at higher numbers of nodes, WUR (scenario 1) consumes more total energy than the standard one, for the following reasons.
For packet reception, by employing the WUR, the typical consumed energy decreases dramatically compared to the standard model, as shown in Fig. 5, because the node goes into deep sleep, which removes the energy consumed by continuous listening; this is fully achieved in WUR (scenario 2). However, with WUR (scenario 1), the consumed energy increases with the number of nodes in the network: as the density of nodes increases, the number of collisions increases, which requires more retransmissions, keeping in mind that extra wake-up signals are also present. In WUR (scenario 2), only one node at a time receives data (addressed by its ID), which dramatically decreases the power consumption, especially when the network becomes dense at higher numbers of nodes, as can be concluded from Fig. 5.

Fig. 5. Receive energy consumed per node (mJ) vs. number of nodes (standard mode, WUR mode scenario 1, WUR mode scenario 2).

For packet transmission, sending a data packet consumes a fixed amount of energy, which is almost constant across all the scenarios, as shown in Fig. 6.

Fig. 6. Transmitted energy consumed per packet (mJ) vs. number of nodes (standard mode, WUR mode scenario 1, WUR mode scenario 2).

In summary, we found that operating in the WUR (scenario 2) mode, which is the proposed scenario for communication between the server and the end devices, makes use of the advantages of LoRa and the WUR together and achieves very low power consumption.

4.3.3 End to End Delay


Figure 7 illustrates the measured time interval for sending a packet from a node to the server for the three scenarios.

Fig. 7. End-to-end delay of upstream traffic (node to server, in seconds) vs. number of nodes (standard mode, WUR mode scenario 1, WUR mode scenario 2).

The WUR (scenario 2) shows a very small delay, while WUR (scenario 1) and the standard mode exhibit a growing delay because both of them experience a high collision rate, as explained in Sect. 4.3.1, and hence have to retransmit packets, which consumes more time. Moreover, the delay for WUR (scenario 1) grows slightly larger than for the standard mode because of the additional wake-up packets; for WUR (scenario 2), however, the nodes send their data immediately without waiting for any control signals and with a lower probability of retransmission thanks to the collision-free channel, which results in an almost zero delay.

5 Conclusions

In this paper, we investigated the effect of integrating a wake-up receiver with a listening power of 270 nW and the LoRa chip SEMTECH SX1272 on the performance of wireless sensor networks and the lifetime of the sensor node, aiming at lower power consumption with a smaller delay. Comparing the standard LoRaWAN technique with our proposed technique (integration of the wake-up receiver with the LoRa chip) using the OMNeT software, the experimental results show that it is generally better to use the WUR mode than the standard LoRaWAN technique. However, for dense networks, the integration between WUR and LoRaWAN in general should be configured with a specific communication scheme to achieve better power consumption and delay. When employing the second proposed mode, we enhanced the network performance by reducing the collisions and delay and consuming lower power levels, which increases the lifetime of the sensor node.

References
1. Piyare, R., Murphy, A., Magno, M., Benini, L.: On-demand LoRa: asynchronous TDMA for
energy efficient and low latency communication in IoT. Sensors 18(11), 3718 (2018)
2. Demirkol, I., Ersoy, C., Onur, E.: Wake-up receivers for wireless sensor networks: benefits
and challenges. IEEE Wireless Commun. 16(4), 88–96 (2009)
3. Marinkovic, S.J., Popovici, E.M.: Nano-power wireless wake-up receiver with serial
peripheral interface. IEEE J. Sel. Areas Commun. 29(8), 1641–1647 (2011)
4. Bouguera, T., Diouris, J.-F., Chaillout, J.-J., Jaouadi, R., Andrieux, G.: Energy consumption
model for sensor nodes based on LoRa and LoRaWAN. Sensors 18(7), 2104 (2018)
5. Mathews, J., Barnes, M., Young, A., Arvind, D.K.: Low power wake-up in wireless sensor
networks using free space optical communications. In: 2010 Fourth International Conference
on Sensor Technologies and Applications, pp. 256–261 (2010)
6. Magno, M., Marinkovic, S., Brunelli, D., Popovici, E., O’Flynn, B., Benini, L.: Smart power
unit with ultra-low power radio trigger capabilities for wireless sensor networks. In: 2012
Design, Automation & Test in Europe Conference & Exhibition (DATE) (2012)
7. Popovici, E., Magno, M., Marinkovic, S.: Power management techniques for wireless sensor
networks: a review. In: 5th IEEE International Workshop on Advances in Sensors and
Interfaces IWASI (2013)
8. Huo, L.: A Comprehensive Study of Passive Wake-up Radio in Wireless Sensor Networks,
Ms.c. Thesis, TuDelft Univ. (2014)
9. der Doorn, B.V., Kavelaars, W., Langendoen, K.: A prototype low-cost wakeup radio for the
868 MHz band. Int. J. Sensor Netw. 5(1), 22–32 (2009)
10. Guntupalli, L., Ghose, D., Li, F.Y., Gidlund, M.: Energy efficient consecutive packet
transmissions in receiver-initiated wake-up radio enabled WSNs. IEEE Sens. J. 18(11),
4733–4745 (2018)

11. Bello, H., Xiaoping, Z., Nordin, R., Xin, J.: Advances and opportunities in passive wake-up
radios with wireless energy harvesting for the internet of things applications. Sensors 19(14),
3078 (2019)
12. Noreen, U., Bounceur, A., Clavier, L.: A study of LoRa low power and wide area network
technology. In: 2017 3rd International Conference on Advanced Technologies for Signal and
Image Processing (ATSIP) (2017)
13. Petajajarvi, J., Mikhaylov, K., Hamalainen, M., Iinatti, J.: Evaluation of LoRa LPWAN
technology for remote health and wellbeing monitoring. In: 2016 10th International
Symposium on Medical Information and Communication Technology (ISMICT) (2016)
14. Bor, M., Roedig, U.: LoRa transmission parameter selection. In: 2017 13th International
Conference on Distributed Computing in Sensor Systems (DCOSS) (2017)
15. Lee, H.-C., Ke, K.-H.: Monitoring of large-area IoT sensors using a LoRa wireless mesh
network system: design and evaluation. IEEE Trans. Instrum. Measur. 67(9), 2177–2187
(2018)
16. Aoudia, F.A., Gautier, M., Magno, M., Gentil, M.L., Berder, O., Benini, L.: Long-short
range communication network leveraging LoRa™ and wake-up receiver. Microprocess.
Microsyst. 56, 184–192 (2018)
17. El Hoda Djidi, N., Magno, M.: Opportunistic cluster heads for heterogeneous networks
combining LoRa and wake-up radio. In: International Conference on EWSN, pp. 200–205
(2020)
18. Omnetpp.org. Simulation Models And Tools (2020). https://omnetpp.org/download/models-
and-tools. Accessed 21 Mar 2020
Smart Approach for Discovering Gateways
in Mobile Ad Hoc Network

Kassem M. Mostafa and Saad M. Darwish

Department of Information Technology, Institute of Graduate Studies and Research,
Alexandria University, Alexandria, Egypt
{kassem.mohammed,saad.darwish}@alexu.edu.eg

Abstract. Providing Internet access for nodes in wireless mesh networks or Mobile Ad hoc
Networks (MANETs) is one of the most critical and challenging processes. Multiple
gateways (GWs) are deployed to enhance the capability of a MANET. Many of the existing
routing protocols, based either on traditional routing approaches or on tree-based
approaches, have been adapted and enhanced to work with MANET. However, because MANETs
are dynamic and temporary networks, current methods fail to discover new or failed GWs
and impose more overhead on the network. To handle this gap, the work presented in this
paper utilizes swarm intelligence techniques to build a routing approach dedicated to
MANET that discovers new GWs and checks and maintains existing paths to the GWs.
Furthermore, it can discover new paths to existing GWs, detect any link failure in any
path, and try to fix it. Many experiments were conducted to validate the suggested
routing approach, and the results reveal the applicability of the suggested model.

Keywords: MANET · Routing protocol · Swarm intelligence · Gateway discovery

1 Introduction

A MANET is a group of wireless nodes that form a temporary network arranged in an
ad-hoc manner without any fixed infrastructure services; this type of network is
dynamically self-organized and self-configured. Each node acts as a source as well as a
router and can forward packets [1, 2]. The distinctive attributes of MANET have made it
useful for a large number of applications [3]. Most applications and users in a MANET
require Internet resources, and the GW is usually the most important component of
Internet access. Multiple GWs should be deployed to provide high-performance and reliable
Internet access [1]. Because of the nature and characteristics of a MANET, which lacks
any fixed infrastructure services, the process of discovering GWs that join or leave the
network, or even detecting failed or new links to existing GWs, is difficult; it causes
much overhead and load on the network. Many existing routing protocols have tried to
provide and enhance Internet access in MANET [4]. In the literature, routing protocols
are classified into three approaches: reactive (on-demand), such as Ad hoc On-Demand
Distance Vector (AODV); proactive (table-driven); and hybrid routing protocols. See
[5, 6] for more details.


The hybrid protocols are a combination of both reactive and proactive approaches. They
initially establish the route from routing information available in the routing table
(proactive) and use the reactive approach for updates and further demands. However, these
protocols are either built on the traditional approach, which does not support multiple
paths or multiple GWs, or built on a tree-based routing approach [5]. In tree-based
routing, the GW discovery process starts from the GW itself, which announces its presence
using broadcast, and a lot of broadcasting is used to discover GWs or to get updates
about existing GWs. This amount of broadcast in a low-bandwidth, low-capacity network
such as a MANET causes many issues in network throughput, packet loss, and protocol
overhead [6–9].
Swarm intelligence approaches such as Ant Colony Optimization (ACO) and Artificial Bee
Colony (ABC) are more efficient than earlier approaches in providing loop-free,
energy-aware, and multi-path routing in mobile ad-hoc networks [10, 11]. They are more
promising in terms of routing overhead, packet delivery ratio, and average end-to-end
delay [12]. Motivated by the need to reduce the heavy broadcasting involved in
discovering GWs or updating existing GWs, the work presented in this paper proposes a
modified bio-inspired hybrid routing technique based on ACO that enhances the process of
discovering new GWs, checks and maintains existing paths to the GWs, discovers new paths
to existing GWs, and detects and tries to fix any link failure along a path. Utilizing
ACO minimizes the number of broadcasts in the network, which decreases the network
overhead and the discovery time compared with tree-based routing protocols [4, 7, 8].
The rest of this paper is organized as follows: Sect. 2 describes some recent related
works. Section 3 gives a detailed description of the proposed approach. Section 4
presents the results and discussion. Finally, Sect. 5 concludes the paper.

2 Literature Review

In this section, we briefly introduce previous research that provides performance
analysis and comparison of existing routing protocols in MANET, along with some of their
limitations. We also present some of the research that focuses on the benefits and
advantages of implementing ACO in MANET routing protocols. Sureshkumar et al. [13]
provided an analysis of the performance of the AODV protocol. AODV still needs
improvement in QoS (Quality of Service), packet delivery, and node energy. It is
difficult to measure the expiry time of a route if no data is transferred. Furthermore,
this type of protocol does not support the detection of new paths or new gateways after
the initial paths to the initial gateway are discovered.
The authors in [14] discussed the performance analysis of enhanced versions of AODV, such
as Ad-hoc On-demand Distance Vector routing from Uppsala University (AODV-UU), which
supports IPv6 and multicasting, and Ad-hoc On-demand Multipath Distance Vector routing
(AOMDV), which is a multi-path extension of AODV. Hence, it needs to handle more control
packets, which in turn increases the routing overhead [15, 16].

The authors in [17] provided in-depth details of the Hybrid Wireless Mesh Protocol
(HWMP), the default routing protocol of IEEE 802.11s. HWMP is based on AODV and has a
configurable extension for tree-based proactive routing. It exploits tree-based routing
combined with an on-demand routing protocol to address mobility, which causes a
bottleneck at the root node. HWMP suffers from frequent route reconstruction due to the
unstable link state in the wireless environment, and path diversity is not efficiently
exploited by single-path routing. The authors in [6] suggested a Multi-Gateway Multipath
routing protocol (MGMP) that handles the limitations of HWMP. It is designed to explore
multiple paths during route discovery in multi-gateway MANET. However, it still uses the
tree architecture of HWMP, which causes a bottleneck at the root node and more
broadcasting.
Recently, ACO has been one of the most successful swarm intelligence algorithms in
several aspects of MANET design [18]. For instance, the authors in [19] presented a study
showing that ACO is an efficient and comparatively better way to enhance the overall
performance of a MANET routing protocol in terms of overhead and connectivity. Ant-AODV
is a hybrid protocol that can provide reduced end-to-end delay and higher connectivity
compared with AODV. Ant Dynamic Source Routing (Ant-DSR) is a reactive protocol that
implements a proactive route optimization method through the constant verification of
cached routes, which increases the probability that a given cached route reflects network
reality. Furthermore, the hybrid ant colony optimization routing algorithm for mobile ad
hoc networks (HOPNET), which involves characteristics of the Zone Routing Protocol, is
discussed in much research [12, 18, 19]. Unlike hybrid routing approaches that utilize
the ACO technique only in the proactive phase to discover gateways, the suggested
approach uses ACO in both the reactive and proactive phases, so that it enhances the
routing functionality in terms of both discovery time and network overhead.

3 Methodology

The proposed gateway-discovery approach aims to provide an automatic and efficient way
for nodes in a MANET to find new Internet gateways added to the network, discover new
paths, and maintain existing routes to the existing Internet gateways. Thus, the
suggested approach falls under multipath multi-gateway routing. Each node creates a data
structure, saved in an array called the pheromone table (PH-Table), that contains the
attributes of each path, as shown in Table 1. The PH-Q is calculated from the minimum
link bandwidth along the path (BWmin), the maximum node queue length along the path
(Qmax), and the number of nodes along the path (Npath). The EV takes the value 1, 0.5, or
0, where 1 indicates that the path is verified, 0.5 indicates that the path needs to be
checked, and 0 indicates that the path has expired. The proposed routing approach
consists of three phases: the GW offloading phase, the reactive phase, and the proactive
phase, as illustrated in Fig. 1. Herein, ACO is utilized to discover the paths to the GW
and collect the necessary data to build the PH-Table. To achieve this goal, a control
packet represented as an ant, called the ant agent, is used.

Table 1. Description of the PH-Table


Abbreviation | Name | Description
NID | Neighbor ID | Represented by the node MAC address
GWID | Gateway ID | Represented by the GW MAC address
NNID | Next node ID | The ID of the next node on the GW path
PH-Q | Pheromone quantity | Represents the quality of the GW path
EV | Evaporation variable | A variable representing the evaporation value of the GW path
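As an illustration only (the paper does not give an implementation, and the exact PH-Q formula combining BWmin, Qmax, and Npath is not specified), a PH-Table entry and the table itself could be represented in Python roughly as follows; the class and method names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class PathEntry:
    """One PH-Table row describing a candidate path to a gateway (see Table 1)."""
    nid: str      # NID: neighbor MAC address
    gwid: str     # GWID: gateway MAC address
    nnid: str     # NNID: next node on the GW path
    phq: float    # PH-Q: path quality derived from BWmin, Qmax, and Npath
    ev: float     # EV: 1 = verified, 0.5 = needs checking, 0 = expired

@dataclass
class PheromoneTable:
    entries: list = field(default_factory=list)

    def add_or_update(self, entry):
        # Replace an existing (GWID, NNID) row or append a new one.
        for i, e in enumerate(self.entries):
            if e.gwid == entry.gwid and e.nnid == entry.nnid:
                self.entries[i] = entry
                return
        self.entries.append(entry)

    def valid_paths(self, gwid=None):
        # Paths that have not expired (EV > 0), optionally filtered by gateway.
        return [e for e in self.entries
                if e.ev > 0 and (gwid is None or e.gwid == gwid)]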

Fig. 1. The suggested ant-based gateway discovery approach



After building the PH-Table, there may be many paths through which an ant could reach the
GW, and the node chooses among them probabilistically, based on the PHQ associated with
the next hops [20].

P_{NID_G} = \frac{(PHQ_{nid})^{\beta_1}}{\sum_{J \in N_{path}} (PHQ_J)^{\beta_1}}, \qquad \beta_1 \ge 1    (1)

where PNIDG is the probability the NNID to reach the GW, PHQnid is the PHQ of the
NNID , Npath is all the NNID of all available GW paths and b1 is a parameter value that
can control the exploratory behavior of the ants. In most ACO routing protocol, b1 = 1
[1] to choose the best path but in our proposed protocol and because it is multi paths
multiple GWs we gave it a large number b1  20, which will lead to that if several
paths have a similar quality, data will be spread over them. However, if one path is
clearly better than the other, it will almost be preferred.
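Equation (1) translates directly into a small routine; the sketch below (an illustration, not the authors' code) computes the selection probabilities over the next-hop candidates of the known GW paths and draws one probabilistically.

import random

def next_hop_probabilities(phq_by_nnid, beta1=20.0):
    """Eq. (1): P(NNID) = PHQ_nnid**beta1 / sum over J of PHQ_J**beta1."""
    weights = {nnid: phq ** beta1 for nnid, phq in phq_by_nnid.items()}
    total = sum(weights.values())
    return {nnid: w / total for nnid, w in weights.items()}

def choose_next_hop(phq_by_nnid, beta1=20.0):
    probs = next_hop_probabilities(phq_by_nnid, beta1)
    nnids, p = zip(*probs.items())
    return random.choices(nnids, weights=p, k=1)[0]

# With beta1 >= 20, a clearly better path dominates, while similar-quality
# paths still share the traffic.
print(next_hop_probabilities({"n1": 0.90, "n2": 0.88, "n3": 0.40}))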

3.1 GW Offloading Phase


In this phase, each GW announces itself to its neighbor nodes by periodically sending a
broadcast packet with TTL = 1, so that the broadcast ends at the direct neighbors. This
packet includes information about the GW, so each neighbor calculates the status and
quality of that GW every time it receives the packet, and it considers the GW down when
no packet is received. Within the suggested routing protocol, in the reactive and
proactive phases, each node in the network sends ants periodically either to discover the
GW or to check its availability. The GW would have to reply to all those ants, which
would overload it with serving network discovery. Therefore, we offload that process from
the GW to its neighbor nodes and dedicate the GW to its primary job, providing Internet
access. Each node that receives the GW broadcast announcement tags itself as a GW
neighbor (GWN) and replies to any node's request to check the GW availability on behalf
of the GW.
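A rough sketch of this offloading behavior (message formats and timeouts are assumptions, not details from the paper) is a node that tags itself as a GW neighbor when it hears the periodic TTL = 1 announcement and then answers availability checks on the GW's behalf:

import time

GW_TIMEOUT_S = 30.0   # assumed: the GW is considered down if no announcement arrives within this window

class GwNeighborState:
    def __init__(self):
        self.is_gwn = False          # becomes True once a GW announcement is heard
        self.gw_id = None
        self.gw_quality = 0.0
        self.last_announce = 0.0

    def on_gw_announcement(self, gw_id, gw_quality):
        # Periodic broadcast from the GW sent with TTL = 1, so it stops at direct neighbors.
        self.is_gwn = True
        self.gw_id = gw_id
        self.gw_quality = gw_quality
        self.last_announce = time.monotonic()

    def gw_available(self):
        return self.is_gwn and (time.monotonic() - self.last_announce) < GW_TIMEOUT_S

    def on_availability_check(self, requester_id):
        # A GWN replies on behalf of the GW instead of forwarding the request to it.
        return {"to": requester_id, "gw_id": self.gw_id,
                "quality": self.gw_quality, "alive": self.gw_available()}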

3.2 The Reactive Phase


The reactive phase is used in the initial discovery, when the node does not have any GW
in its PH-Table, or in continuous discovery, when it already has GWs but is discovering
new ones. In this phase, the algorithm sends an ant agent called a Forward Ant (FW-Ant)
to each neighbor that does not appear as the next node in any GW path; in the initial
discovery, it sends to all neighbors. Each neighbor that receives the FW-Ant searches its
PH-Table. If there is no GW, the current node repeats the search and sends the FW-Ant to
all its neighbors except the node that sent the packet. But if it has one or more GWs, it
checks the EV of each GW path. For each GW path, if EV = 1, the path is verified and up
to date; consequently, the FW-Ant tour ends, and the FW-Ant is converted into a Backward
Ant (BW-Ant). If the GW path's EV = 0.5, the path is valid but not up to date, so the
node starts the proactive phase for that path to update it, then converts the FW-Ant into
a BW-Ant and sends it back to the source node. The BW-Ant updates the PH-Table of the
source node with the GWID, the GW path, and its PHQ, which depends on the BWmin, Qmax,
and Npath of the path from the GW to the current node. It also sets the path's EV = 1.

3.3 The Proactive Phase


If any existing GW path has not been used or checked recently, the node changes its EV
from 1 to 0.5, which means the path needs to be updated; that update is achieved by the
proactive phase. In the proactive phase, the node checks whether there is any GW path
with EV = 0.5, and if so, the suggested model sends a FW-Ant to the NNID of that path to
ask for an update. The next node checks its PH-Table: if it has no GW path, it starts the
reactive process, and if it has one or more GW paths, it checks the EV of each one. If
EV = 1, it creates a BW-Ant and sends it to the source node with all the information
about the path. If the path's EV = 0.5, this node in turn starts the proactive process to
update the path by forwarding the FW-Ant to the NNID, until all FW-Ants are converted
into BW-Ants either by nodes along the path or by reaching the GWN. The GWN is the end
point of any FW-Ant, and no FW-Ant reaches the GW itself because of the offloading
process. The BW-Ant returns along the same FW-Ant path and updates the PH-Table of each
node along the path as described previously. In addition, each node compares the GWID
carried by each BW-Ant with the GWID for which the FW-Ant was sent through the same NNID;
if they are the same, the existing path's EV is updated to 1, and if they differ, a new
GW path is added. If no BW-Ant's GWID matches the FW-Ant's GWID, the existing GW path's
EV is updated to 0, which means it is no longer valid and should expire.
In general, in our approach, the reactive phase is used to discover new GWs or GW paths
through neighbors that do not appear in any existing GW path, whereas the proactive phase
is used to discover new GWs or GW paths through neighbors that do appear in existing GW
paths. The proactive phase is also used to update or expire existing GW paths.
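In the same spirit, the EV aging and the BW-Ant bookkeeping of the proactive phase might look as follows. This is again a sketch built on the PathEntry/PheromoneTable classes above, handling a single returning BW-Ant; the field names are assumptions.

def age_paths(table, recently_used_gwids):
    # Paths neither used nor checked recently are marked for verification (EV 1 -> 0.5).
    for entry in table.entries:
        if entry.ev == 1.0 and entry.gwid not in recently_used_gwids:
            entry.ev = 0.5

def handle_bw_ant(table, bw_ant, queried_gwid):
    # A backward ant either confirms the path it was sent to verify or reveals a new GW.
    if bw_ant.gwid == queried_gwid:
        for entry in table.entries:
            if entry.gwid == queried_gwid and entry.nnid == bw_ant.nnid:
                entry.phq, entry.ev = bw_ant.phq, 1.0   # refresh the existing path
    else:
        # A different GW answered through this neighbor: add it as a new path
        # and expire the stale path that was being checked.
        table.add_or_update(PathEntry(bw_ant.sender, bw_ant.gwid, bw_ant.nnid, bw_ant.phq, 1.0))
        for entry in table.entries:
            if entry.gwid == queried_gwid and entry.nnid == bw_ant.nnid:
                entry.ev = 0.0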

4 Experimental Results

In the simulation, we used MATLAB 9.1 (R2016b) to compare our routing approach with AODV
and the Hybrid Wireless Mesh Protocol (HWMP). AODV is widely used in MANET, and HWMP is
the default IEEE routing protocol for MANET. The simulation was implemented in three
rounds with different numbers of nodes, different numbers of GWs, and multiple simulation
times, as shown in Table 2, to confirm its stability. In the first round, the network
contains 100 nodes, five of which support Internet access ("GWs"). The objective of this
round is to measure the performance of the reactive phase for each protocol; we measured
the time for all nodes to discover all GWs with all available paths. In the second round,
new GWs were added to the network in groups of 1, 3, and 5 new GWs. The objective of this
round is to measure the time for all nodes to discover all the new GWs. In the third
round, new nodes were added to the existing network, and the objective is to assess the
scalability of the suggested routing model in terms of the time it takes to discover all
GWs. Furthermore, the following criteria are evaluated: the routing overhead, which is
the ratio of routing control packets to the total packets transmitted over the network,
and the GW overhead, which is the ratio of routing control packets at the GW to the total
packets that the GW processes to and from the Internet. Table 2 describes the parameters
used in the simulation.

Table 2. Simulation parameters


Parameter | Value | Parameter | Value
1-Area | 1000 × 1000 m | 2-No. of nodes | 50, 100+
3-No. of GWs | From 5 to 10 | 4-Radio type | 802.11b
5-Data rate | 2 Mbps | 6-Channel frequency | 2.4 GHz
7-No. of channels | 1 | 8-Antenna model | Omnidirectional
9-Application type | HTTP and video streaming | 10-Packet size (Byte) | 512
11-Simulation time | 120 s, 300 s and 500 s | |

Figure 2a reveals that the proposed model allows nodes to discover all available GWs in
the network in less time than AODV and HWMP. One possible explanation is that, although
all protocols initiate the discovery phase with a broadcast, AODV and HWMP require the
discovery packet to reach the GW, whereas the proposed model stops the discovery process
at any node along the path that has already discovered the GW. From Fig. 2b, we found
that discovering a newly added GW takes a long time in AODV because it is a reactive
(on-demand) protocol. Both HWMP and the proposed model take less time to discover new GWs
because they are hybrid protocols that rely on a proactive phase. Yet the proposed model
is quicker than HWMP because any node can learn about the new GW from any neighbor that
has already discovered it, whereas HWMP waits until all nodes receive the announcement
broadcast from the GW. Furthermore, as confirmed by Fig. 3, for any new node added to the
network, the proposed protocol provides a good time to discover all available GWs. In the
proposed model, when the new node begins the discovery process by broadcasting, all the
neighbors respond immediately with all available GWs, while in the case of AODV and HWMP
the discovery broadcast must reach the GW and return to the source node.
It can be observed from Fig. 4 that HWMP requires more routing overhead as it applies two
broadcasting steps: the first from the node to discover the network, and the second from
the GW to announce itself. For both AODV and the suggested approach, the routing overhead
decreases after the initial phase, whereas HWMP keeps using broadcast in the reactive
phase, so its routing overhead increases. The proposed protocol updates its routing table
(the pheromone table) by negotiating with the neighbors for any update using the FW-Ant.
The AODV overhead decreases because it does not provide a proactive technique, which is
one of its limitations. Figure 5 validates the GW offloading feature of the proposed
protocol: the GW overhead of the proposed protocol is about 0.25% relative to the AODV
protocol and about 0.35% relative to HWMP. The GWN responds on behalf of the GW, which
reduces the GW overhead to the minimum.

Fig. 2. (a) Initial discovery time. (b) Time to discover GWs

Fig. 3. Time for a new node to discover all GWs.

Fig. 4. Routing overhead over time



Fig. 5. GW overhead over time.

5 Conclusion

In this paper, we introduced a new hybrid routing discovery protocol based on a
bio-inspired technique (ACO). It is a multi-path multi-gateway protocol that provides
many paths through parallel computation to enhance network capacity. The suggested
approach implements both the reactive and proactive phases from the source node using
ACO, yet it uses minimal broadcasting in the network. The results confirm the superiority
of the approach in terms of GW discovery time compared with the state-of-the-art routing
protocols AODV and HWMP. Our proposed model still needs improvement in its initial
discovery phase, and it is not yet suitable for internal network discovery, that is,
discovering destinations inside the network and their paths. In future work, we plan to
overcome these limitations to build a complete routing protocol for MANET.

References
1. Ian, F., Wangb, X., Wang, W.: Wireless mesh networks: a survey. Comput. Netw. 47(4),
445–487 (2005)
2. Nandiraju, N., Nandiraju, D., Santhanam, L., He, B., Wang, J., Agrawal, D.: Wireless mesh
networks: current challenges and future directions of web-in-the-sky. IEEE Wirel. Commun.
14(4), 79–89 (2007)
3. Shabana, S., Noor, R., Hossein, M.: Review on MANET based communication for search
and rescue operations. Wirel. Pers. Commun. 94(1), 31–52 (2015)
4. Hu, Y., He, W., Wei, H., Yang, S., Zhou, Y.: Multi-gateway multi-path routing protocol for
802.11s WMN. In: Proceedings of the 6th IEEE International Conference on Wireless and
Mobile Computing Networking and Communications, Canada, pp. 308–315 (2010)
5. Shobana, M., Karthik, S.: A performance analysis and comparison of various routing
protocols in MANET. In: Proceedings of the International Conference on Pattern
Recognition Informatics and Mobile Engineering, India, pp. 391–393 (2013)
6. Sharma, A., Kumar, R.: Performance comparison and detailed study of AODV, DSDV,
DSR, TORA and OLSR routing protocols in Ad Hoc networks. In: Proceedings of the 4th
International Conference on Parallel Distributed and Grid Computing, India, pp. 732–736
(2016)

7. Gandhi, S., Chaubey, N., Shah, P., Sadhwani, M.: Performance evaluation of DSR, OLSR
and ZRP protocols in MANETs. In: Proceedings of the IEEE International Conference on
Computer Communication and Informatics, India, pp. 1–5 (2012)
8. Sunil, J., Jagdish, S.: Evaluating performance of OLSR routing protocol for multimedia
traffic in MANET using NS2. In: Proceedings of the 5th IEEE International Conference on
Communication Systems and Network Technologies, India, pp. 225–229 (2015)
9. Singh, M., Lee, S.-G., Lee, H.: Non-root-based hybrid wireless mesh protocol for wireless
mesh networks. Int. J. Smart Home 7(2), 71–84 (2013)
10. Habboush, A.K.: Ant colony optimization (ACO) based MANET routing protocols: a
comprehensive review. Comput. Inf. Sci. 12(1), 82–92 (2019)
11. Mingchuan, Z., Meiyi, Y., Qingtao, W., Ruijuan, Z., Junlong, Z.: Smart perception and
autonomic optimization: a novel bio-inspired hybrid routing protocol for MANETs. Future
Gener. Comput. Syst. 81, 505–513 (2017)
12. Zhang, H., Wang, X., Memarmoshrefi, D.: A survey of ant colony optimization-based
routing protocols for mobile AdHoc networks. IEEE Access 5, 24139–24161 (2017)
13. Sureshkumar, A., Ellappan, V., Manivel, K.: A comparison analysis of DSDV and AODV
routing protocols in mobile ADHOC networks. In: Proceedings of the IEEE Conference on
Emerging Devices and Smart Systems, India, pp. 234–237 (2017)
14. Daxesh, N., Sejal, B., Hemangi, R., Kothadiya, D., Rutvij, H.: A survey of reactive routing
protocols in MANET. In: Proceedings of the IEEE International Conference on Information
Communication and Embedded Systems, India, pp. 1–6 (2014)
15. Kumar, A., Rahul, H.: Performance analysis of DSDV, I-DSDV, OLSR, ZRP proactive
routing protocol in mobile AdHoc networks in IPv6. Int. J. Adv. Sci. Technol. 77(3), 25–36
(2015)
16. Patel, B., Srivastava, S.: Performance analysis of zone routing protocols in mobile AdHoc
networks. In: Proceedings of the IEEE National Conference on Communications, India,
pp. 1–5 (2010)
17. Yang, K., Ma, J., Miao, Z.: Hybrid routing protocol for wireless mesh network. In:
Proceedings of the IEEE International Conference on Computational Intelligence and
Security, China, pp. 547–551 (2009)
18. Liu, X.: Routing protocols based on ant colony optimization in wireless sensor networks: a
survey. IEEE Access 5, 26303–26317 (2017)
19. Kumar, A., Sadawarti, H., Kumar, A.: MANET routing protocols based on ant colony
optimization. Int. J. Model. Optim. 2(1), 42–49 (2012)
20. Ducatelle, F., Caro, G., Gambardella, L.: Using ant agents to combine reactive and proactive
strategies for routing in mobile ad hoc networks. Int. J. Comput. Intell. Appl. 5(2), 169–184
(2005)
Computational Intelligence Techniques
in Vehicle to Everything Networks: A Review

Hamdy A. M. Sayedahmed1, Emadeldin Mohamed2,
and Hesham A. Hefny2
1
Central Metallurgical Research and Development Institute (CMRDI),
Cairo, Egypt
hamdi@cmrdi.sci.eg
2
Faculty of Graduate Studies for Statistical Research, Cairo University,
Giza, Egypt
{eemmohamed,hehefny}@cu.edu.eg

Abstract. The remarkable developments in wireless communication technologies in recent
years along with the anticipation of further advances in these
technologies have stimulated the efforts to investigate vehicle to everything
(V2X) communication to increase road safety, improve traffic management, and
facilitate infotainment applications. V2X communication is based either on
cellular infrastructures such as 4G and 5G, or wireless LAN technologies such
as the IEEE 802 protocol family. Several challenges, however, face the
deployment of V2X. Examples of these challenges are routing, security, and
analysis of collected data. Several research efforts have investigated computa-
tional intelligence methods to tackle these issues. In this paper, we review recent
advances in computational intelligence in V2X communications. First, we
provide preliminaries of V2X communication. Second, we discuss classical and
computational intelligence solutions in V2X communication. Last, we present
open problems and research challenges that need to be addressed to realize the
full potential of V2X systems.

Keywords: V2X · V2V · V2I · Smart city · Intelligent transport systems · Computational intelligence

1 Introduction

Recently, demand for wireless-based systems has been witnessing an enormous increase in
smart cities. Transportation systems, in particular, have started to deploy
wireless technologies such as WiFi, Bluetooth, 4G, and 5G to support communications
between vehicles. Connected vehicles, also referred to as vehicle-to-everything (V2X),
enable many new services that aim at enhancing road safety, improving traffic man-
agement, and providing infotainment applications [1]. These applications signify the
need for next-generation mobile vehicular networks that provide low latency, high
throughput, and better packet delivery. The promising 5G communication systems use
various technologies such as new radio frequencies (NR), massive MIMO, edge
computing, beam-forming, millimeter-wave, and small cells to reach data rates up to 10
Gbps, support real-time applications, and cover vehicle speeds from 0 km/h up to
500 km/h with a robust quality of service (QoS) to support a variety of services [2].
In V2X communication, vehicles have the main characteristics of sensing, com-
municating, computing, and actuating with other devices. V2X communication can be
defined as the exchange of data between vehicles and other units such as pedestrians,
Internet gateways, and transport units such as traffic lights and signs that are part of an
intelligent transportation system (ITS). The term V2X communication encompasses
several communication systems such as vehicle-to-vehicle (V2V), vehicle-to-
infrastructure (V2I), and vehicle-to-pedestrian (V2P) communications. V2V commu-
nication is typically characterized by limited sensor range and computational capa-
bility. V2I connects user equipment (UE) to roadside units (RSU), which typically are
connected to long term evaluation (LTE) network. V2P communication connects
vehicles and pedestrian equipment either directly or through infrastructure. In general,
traffic information is routed to gateway vehicles whose responsibilities are to collect
traffic information from other vehicles and transfer them to a control center [3].
Supporting communications over various V2X environments is hard. For instance,
in urban areas, increased vehicle density reduces spectrum efficiency. In contrast, in
rural areas where vehicles are rare, V2X communication still does not perform well due
to the smaller number of users’ terminals. To solve this problem, the European
Telecommunications Standards Institute (ETSI) has standardized V2X communication
in ITS using the cooperative awareness messages (CAMs) and the decentralized
environmental notification messages (DENMs). CAMs are periodic and convey vehicle
status, whereas DENMs are used to warn users of dangerous events [4, 5].
V2X communication supports the exchange of a huge amount of information. By
forming datasets obtained from multiple sources, V2X applications could intelligently
carry out multiple objectives such as reducing road crashes, reducing traffic congestion,
and improving fuel consumption. These datasets could also be used to forecast traffic
for better management and road safety. However, dealing with such vast datasets
requires massive computational resources if not done smartly. Luckily, we already have
experience dealing with large datasets in other disciplines such as medical diagnoses,
robotics, industry production, and data science using computational intelligence (CI).
CI methods comprise approaches such as fuzzy set theory [6], artificial neural networks
(ANNs) [7], swarm intelligence (SI) [8], and genetic algorithms (GAs) [9]. They
introduce low-cost approximate solutions to uncertainty and nonlinearities. Uncertainty
comes from many reasons such as lack of information, conflicting evidence, and
ambiguity of a situation [10]. Generally, CI is efficient for data with highly sparse
relevancy.
This paper provides a comprehensive review of recent advances in computational
intelligence solutions in V2X communication. It presents preliminaries in V2X com-
munications as well as a discussion of classical and computational intelligence-based
methods that enhance V2X communications. It also presents open problems and
research challenges in V2X systems. The rest of the paper is organized as follows.
Section 2 provides an overview of V2X networks. Section 3 introduces a review of CI
techniques. Section 4 surveys recent research efforts in CI solutions for V2X systems.
Section 5 presents the challenges and open issues. Section 6 presents the conclusion
and future work.

2 V2X Networks

Each vehicle in a V2X network can communicate with other vehicles and devices
(Fig. 1) either directly through wireless communication or indirectly via multi-hop
connections. In V2V networks, vehicles can share information such as their velocities,
locations, directions, and traffic hazards to avoid vehicle crashes. In V2I networks,
vehicles communicate with road systems such as cameras and lane markers for better
traffic management. Generally, a V2I network contains as a minimum an onboard unit
(OBU), a roadside unit (RSU), and a communication channel. The V2P network senses
the surrounding environment from portable devices and shares this information with
other vehicles/pedestrian/motorcycles to prevent accidents. In the V2D network, con-
nections occur between vehicles and any electronic devices. In V2N networks, vehicles
communicate with each other through an LTE network that broadcasts or unicasts
packets using an application server. V2G networks connect to the service providers
through gateways or other networks [11–14].

[Figure: V2X (vehicle-to-everything) encompasses V2I (vehicle-to-infrastructure), V2N (vehicle-to-network), V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), V2D (vehicle-to-device), and V2G (vehicle-to-grid).]

Fig. 1. V2X different networks.

From the protocol perspective, a 5G V2X network is based on two main tech-
nologies: dedicated short-range communications (DSRC) and long-term evolution-
vehicle (LTE-V). DSRC is single/multiple short-range to medium-range wireless
communication channels specifically designed for automotive purposes. DSRC is
achieved over reserved radio spectrum bands, which differ in North America, Europe,
and Japan, posing incompatibility problems [11]. DSRC relies on radio communication
transceivers mounted on vehicles as well as roadside units. The DSRC can be mapped
into the TCP/IP protocol stack (Fig. 2). One of the main challenges for DSRC tech-
nology is that the initial cost of installing the infrastructure could be high [13, 15, 16].
In addition, the IEEE 802.11p has many issues such as coverage area, mobility limi-
tation, long end-to-end delay, incomplete use cases, and weakness in reliability.
The Third Generation Partnership Project (3GPP) is a standardization organiza-
tion that develops protocols for mobile telephony. A well-known work of 3GPP is the
5G NR (new radio) designed to be the global standard for the air interface of 5G
networks [17]. There are two frequency bands in 5G NR: the sub-6 GHz bands and the 24-100 GHz bands.

[Figure: DSRC protocol stack mapped onto TCP/IP layers — safety application and message sublayers (SAE J2735) with IEEE 1609.2 security at the application layer, WSMP/IEEE 1609.3 at the transport and network layers, and IEEE 802.2 LLC, IEEE 1609.4 MAC extension, and IEEE 802.11p MAC/physical below.]

Fig. 2. DSRC TCP/IP layers mapping [16].

Based on LTE-4G infrastructure, 5G NR will be launched initially in non-standalone (NSA)
mode, before the launch of the full standalone (SA) mode. 5G NR will help in promoting
V2X communications that
provide V2V, V2I, and V2P communications, leading to an increase in autonomous
(self-driving) vehicles and better use of the Internet of Things (IoT).
While initial specifications enabled non-standalone 5G radio systems to be inte-
grated with LTE-4G, the scope of Release 15 expands to cover ‘standalone’ 5G with a
new radio system complemented by a next-generation core network. It also embraces
enhancements to LTE and, implicitly, the evolved packet core (EPC). This crucial
waypoint has enabled vendors to progress rapidly with chip design and initial network
implementation during 2019. As the Release 15 work has matured, the group’s focus is
now shifting to the first stage of Release 16, often informally referred to as ‘5G Phase
2'. Studies and research on Release 17 are in progress, covering Multimedia Priority
Service, 5G V2X application layer services, 5G satellite access, Local Area Network
support in 5G, wireless and wire-line convergence for 5G, terminal positioning and
location, communications in vertical domains and network automation, and novel radio
techniques as interesting topics. Further studies have been launched or progressed on
security, codecs and streaming services, LAN interworking, network slicing, and the
IoT.
Both DSRC and cellular (LTE-V) technologies have advantages and disadvantages
(Table 1 presents a comparison between the two approaches). Hybrid approaches are
required for a better quality of service. For better hybridization, four issues have to be
considered. 1) separation between DSRC and LTE-V bands is required to avoid
interference as declared in Europe by 5GAA. 2) Artificial intelligence techniques can
be utilized to control traffic, routing strategies, and joint scheduling by learning patterns
from real-time data. 3) Software-defined vehicular networking (SDVN) such as multi-
hop routing, dynamic resource allocation, and mobility can be used to ease the man-
agement of the heterogeneous 5G V2X networks. 4) New Millimeter-Wireless Access
in Vehicle Environment (mm-WAVE) dedicated spectrum or modified IEEE 802.11ad
can be used to provide higher data rates [13, 18, 19].

Table 1. V2X DSRC versus LTE-V


Criteria | DSRC | LTE-V
Applications targeted | Better support for safety applications | Better support for non-safety applications
Algorithm used | Carrier-sense multiple access with collision avoidance (CSMA/CA) protocol | Beacon scheduling, resource selection algorithm, and side-link
Performance | Shorter symbol duration, so it presents high performance at speeds up to 250 km/h | Concerning data rate and coverage, LTE offers significantly more than DSRC

3 Computational Intelligence Techniques

Computational Intelligence (CI) can be defined as a combination of soft computing (SC)
and numerical processing that can introduce low-cost approximate solutions to
uncertainty. Different CI techniques are used for problems that have no definite model,
or that have models whose computations are hard and complex. CI comprises fuzzy set
theory (FS), artificial neural networks (ANNs), swarm intelligence (SI), artificial
immune systems (AIS), and evolutionary computation (EC), as shown in Fig. 3 [20]. FS is
inspired by human thinking processes and the ability to reason based on imprecise
information. ANNs can perform natural tasks such as perception and recognition. SI
imitates the social behavior of organisms living in swarms or colonies. AIS is inspired
by natural immune systems, which have the matching ability to distinguish between foreign
cells entering the body and cells that belong to the body. EC is inspired by natural
evolution, which is based on the survival-of-the-fittest concept.

Fig. 3. CI methods

3.1 Fuzzy Logic


Fuzzy logic (FL) is used in systems where fuzziness exists, such as systems that depend
on human observation, have continuous inputs and outputs, are vague, or are impossible to
model. It is efficient in handling nonlinearities and provides low-cost approximate
solutions with effective degrees of precision by exploiting the tolerance for
imprecision. To build a fuzzy model, either fuzzy clustering is applied or a domain
expert is consulted. Fuzzy clustering is the process of grouping objects that are similar
and close to each other. In fuzzy clustering, no predefined groups exist and knowledge
about the relations between data items is unknown; therefore, clustering is an
unsupervised process.
Many fuzzy clustering algorithms have been proposed in the literature and can be
classified as partitional, hierarchical, density-based, grid-based, and subspace
clustering. The number of clusters must be given for all clustering algorithms except
density-based ones, which instead require the radius among other parameters. Generally,
fuzzy clustering can be implemented by applying a clustering algorithm and then forming a
membership function for the selected variables, i.e., determining the shape of the
membership function. FL uses a list of rules rather than complicated mathematical
expressions. The major components of a fuzzy system are the knowledge base (rule base and
dataset), decision making, fuzzification, and defuzzification [6, 10, 21].
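As a tiny, self-contained illustration of the fuzzification and rule-evaluation steps sketched above (the membership breakpoints, variables, and rule are invented for the example and do not come from any surveyed system):

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzify two crisp inputs (values and breakpoints are illustrative).
speed_kmh, density = 72.0, 0.65
speed_high = tri(speed_kmh, 50, 90, 130)
density_high = tri(density, 0.4, 0.8, 1.0)

# Rule: IF speed is high AND density is high THEN congestion risk is high.
# Mamdani-style min for AND; a full system would aggregate several rules
# and defuzzify (e.g., centroid) to obtain a crisp output.
risk_high = min(speed_high, density_high)
print(f"activation of 'congestion risk is high': {risk_high:.2f}")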

3.2 Artificial Neural Networks


Artificial neural networks (ANNs) are computational networks that attempt to mimic the
neuron networks of the human biological nervous system. They are trained on patterns,
either known or unknown, and are well suited to nonlinear or stochastic problems. ANNs
consist of a series of neurons that are interconnected very similarly to biological ones.
ANNs can solve classification problems, with their capability depending on the number of
neurons and the shape of the interconnections. However, a convergence problem may occur
due to overfitting while training a model, and the design of an ANN depends on many
trials. ANNs include recurrent ANNs (RANNs), feed-forward ANNs, and conventional ANNs
[7]. Generally, ANNs have an advantage over conventional computing in that they are
highly parallel.
An ANN consists of an input layer that includes the features or training attributes, a
hidden layer that consists of a number of neurons, and an output layer that produces the
target. The number of neurons in the hidden layer is determined by experimentation and
testing.
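A minimal forward pass through the input/hidden/output structure described above could look like this in plain NumPy (layer sizes, weights, and the input vector are arbitrary examples; no training loop is shown):

import numpy as np

rng = np.random.default_rng(0)

# 4 input features -> 8 hidden neurons -> 1 output neuron.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def forward(x):
    """One feed-forward pass: ReLU hidden layer, sigmoid output in (0, 1)."""
    h = np.maximum(0.0, x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

x = np.array([0.3, 0.7, 0.1, 0.9])   # e.g., normalized traffic features
print(forward(x))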

3.3 Evolutionary Computation


Evolutionary computation (EC) provides optimization algorithms that are inspired by
biological systems. These algorithms proceed by trial, and the solution is usually based
on metaheuristic functions. In EC, a set of candidate solutions, referred to as the
initial set, is obtained and recursively modified. Each modification produces a new
solution and removes another of lower quality (fitness). Therefore, the set of solutions
gradually attains better fitness, and the final solution is chosen using a fitness
function [22].
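The generate-evaluate-replace loop described above can be written very compactly; the following is a generic steady-state sketch with an invented one-dimensional fitness function, not an algorithm from any of the surveyed papers:

import random

def fitness(x):
    # Illustrative objective: the closer x is to 3, the fitter the solution.
    return -(x - 3.0) ** 2

def evolve(pop_size=20, generations=200, step=0.5):
    population = [random.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        parent = random.choice(population)
        child = parent + random.gauss(0.0, step)   # mutation of a chosen candidate
        worst = min(population, key=fitness)
        if fitness(child) > fitness(worst):        # new solution replaces a less fit one
            population[population.index(worst)] = child
    return max(population, key=fitness)

print(round(evolve(), 2))   # expected to end up near 3.0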

3.4 Swarm Intelligence


Swarm Intelligence (SI) refers to self-organized algorithms inspired by the collective
behavior of social insect colonies and other animal swarms. SI algorithms usually involve
many relatively homogeneous individuals, where the interactions between individuals
follow colony/swarm rules that create local information to be exchanged with the other
individuals of the colony or swarm. This behavior is said to be self-organized and can be
used to design scalable and parallel systems [8, 31, 37].

3.5 Artificial Immune System


The Artificial Immune System (AIS) is inspired by natural immune systems, which can
distinguish between foreign cells entering the body and cells that belong to the body. It
comprises mathematical and computational techniques that maintain a memory of past
encounters and can continually learn about new patterns. Generally, it is a rule-based
machine learning system [20].

4 Computational Intelligence in V2X Communication

On the one hand, CI methodologies can be used to represent and process uncertainty and
nonlinearities when dealing with large datasets. On the other hand, 5G V2X communication
creates a huge amount of data. These datasets can be analyzed for better traffic
management, road safety, and infotainment services. In addition, network performance
depends on different dynamic parameters whose values, collectively or individually, vary
in gradual degrees. It is therefore natural to utilize CI techniques in V2X communication
systems. The following is a literature review of the CI techniques used to support V2X
communication.

4.1 V2X Based FS and ANNs


Wu Celimuge et al., 2018 [23] present a two-level cluster-based approach that gives a
solution for limited bandwidth in a dense vehicular network. The first level is a fuzzy
logic algorithm that takes a velocity factor, a leadership factor, and a signal factor as
inputs, uses predefined fuzzy IF-THEN rules, and outputs a competency value. The second
level is a Q-learning algorithm that is used to tune the number of gateway nodes. The
approach is compared against LTE and CDS-SVB using ns2. However, the experiments assume a
connected network topology, unicast communications for V2V, and data sent from the cloud
to vehicles.
H. Zhang et al., 2018 [25] proposed the Security Aware Fuzzy Enhanced Reliable Ant Colony
Optimization (SAFERACO) routing protocol. It identifies misbehaving vehicles and excludes
them to improve network performance. However, it does not consider the misbehavior of
trusted vehicles under dynamic attacks, and the simulation environment parameters are not
realistic.

A composite model for traffic delay estimation using fuzzy logic modeling and ANNs is
introduced by Rani Pushpi et al., 2018 [24]. The mean squared error (MSE) and mean
absolute error (MAE) are used to validate the accuracy of the model, and the validation
results show that the model has better accuracy than earlier ones. The system involves
determining environmental conditions and estimating traffic delay: fuzzy logic is used to
evaluate the environmental conditions for a particular duration using climate and street
conditions, threshold capacity, and traffic volume measured by sensors positioned at road
links, and a neural network model is used to estimate the traffic delay of the flow.
Zhiguang Cao et al., 2017 [27] use fuzzy logic and a multi-agent system to minimize
vehicle routing delay. However, agent vehicles receive the route recommendation from
infrastructure agents, which renders the solution expensive as it requires an
infrastructure agent at every intersection. Zahid Khan et al., 2019 [28] use fuzzy logic
and Q-learning to improve data dissemination in V2X networks. However, the work does not
consider coverage, lifetime, and energy consumption; besides, the Q-learning algorithm
introduces additional complexity, and the analytical model is not realistic.
Raras Tyasnurita et al., 2017 [26] present a Time Delay Neural Network (TDNN), a type of
artificial neural network, to shorten the vehicle routing process. However, increasing
the number of vehicles leads to expensive routing, and the solution is not scalable.

4.2 V2X Based EC and SI


Jon Henly Santillan et al., 2018 [29] use a Cuckoo search algorithm, a type of swarm
intelligence, and treat routing in V2V networks as an optimization problem. The work
demonstrates the potential of the hybrid algorithm in acquiring solutions for routing
problems but creates additional complexity. Another optimization solution that utilizes
the Bee colony algorithm is given in [30] for the capacitated vehicle routing problem.
To improve the Quality of Service (QoS) in terms of reliability and availability of
beacons, a swarm intelligence optimization (ant colony algorithm) is used in V2V and V2I
networks in [31]. The simulation environment does not depend upon the exchange of beacon
messages; besides, the mobility model is not realistic and does not consider data
generated by applications, which may result in congestion or packet loss.
To better secure V2X communications, a study based on block-chains is proposed in [32].
The objective is to relieve data reconnaissance and to obtain a cheap and durable
solution. A block-chain is typically an ordered and time-stamped list of blocks
comprising multiple transactions. V2X communications include roadside units as major
participants in the network. Each vehicle in the network can verify data transmitted to
others after passing an RSU to reach a distributed ledger. The distributed ledger stores
transactions in its database for every connected vehicle and for other ledgers as well;
thus, a new vehicle entering the network can immediately be verified. However, the
development and business communities have not yet agreed on standards for blockchain
technology. In a different study [33], a sub-layer that enhances privacy, authentication,
and confidentiality is added to the V2X communication protocols. This sub-layer performs
statistical inference that observes vehicles' behavior and is not cryptography-based.
Detection of routing attacks (black hole and wormhole) is one of the V2X issues that
could be solved using the intelligent water drops (IWD) algorithm, one of the swarm
intelligence algorithms.

Table 2. Comparison in 5G V2X literature work


Citation | CI technique | Evaluation | Advantages | Disadvantages
Wu Celimuge et al., 2018 [23] | Fuzzy logic and Q-learning | ns2 | A solution for limited bandwidth in a dense network | ns2 consumes CPU and memory for a large number of nodes
Rani Pushpi et al., 2018 [24] | Fuzzy logic and ANNs | MATLAB | Measures traffic delay based on environmental conditions | Model is not realistic
H. Zhang et al., 2018 [25] | Fuzzy logic and Ant colony | ns3 | Detects flooding attacks and outperforms AntHocNet | Does not consider the misbehavior of trusted vehicles under some attacks
Raras Tyasnurita et al., 2017 [26] | ANNs | MATLAB | Decreases the routing process | Increasing vehicle number leads to expensive routing
Zhiguang Cao et al., 2017 [27] | Fuzzy logic and multi-agent system | Simulation of Urban Mobility (SUMO) | Minimizes vehicle routing delay | An expensive solution
Zahid Khan et al., 2019 [28] | Fuzzy logic and Q-learning | Simulation of Urban Mobility (SUMO) and MATLAB | Improves data dissemination in a V2X network | Not a realistic model (does not consider coverage, lifetime, and energy consumption)
Jon Henly Santillan et al., 2018 [29] | Cuckoo Search Algorithm | The algorithm was coded using Java | Optimizes vehicle routing | Creates additional complexity
[30] | Enhanced Artificial Bee Colony Algorithm | Tested on two standard benchmark instance sets | Enhances the capacitated vehicle routing problem | The computational time increases with increasing scale of instances
[31] | Situational awareness (SA) concept and an ant colony system (ACS) | OMNET++ | The algorithm achieved a reliable data transmission | Obtaining the geographical information may consume the resources
Vasiliy Krundyshev et al., 2018 [34] | Intelligent water drops (IWD) algorithm | ns3 | Reduced the routing attacks | It is not suitable for highway scenarios
R. Kasana et al., 2017 [37] | Cat swarm Algorithm | The algorithm was coded | Improves packet delivery ratio and normalizes routing load | Assumes vehicles do not change their velocities
Sahar Ebadinezhad et al., 2019 [39] | Clustering and Ant colony algorithm | ns2 and SUMO | Improves throughput, drop ratio, cluster numbers, and average delay | ns2 consumes CPU and memory when simulating a large number of nodes

Vasiliy Krundyshev et al., 2018 [34] introduce this approach to reduce routing attacks in
V2V and V2I networks. The algorithm consists of three phases: building a route,
maintaining existing routes, and updating a trust estimation to detect routing anomalies
and build a new route bypassing the suspicious node. However, the simulation assumes only
1000 vehicles with speeds between zero and 140 km/h, which is not typical of highway
scenarios.
Optimizing the capacitated vehicle routing problem using the bat algorithm, one of the
swarm intelligence algorithms, is experimented with in [35], addressing the vehicle
routing problem with time windows (VRPTW), and in [36], referred to as the Hybrid Bat
Algorithm with path relinking (HBA-PR). In [35] the results are based on statistical
collection, while in [36] it is assumed that vehicles maintain the same speed and
distance.
R. Kasana et al., 2017 [37] investigate a solution for selecting the next forwarding
vehicle using geographical routing based on cat swarm optimization, namely Cat Swarm
Optimization based Geographical Routing (CSO-GR). CSO-GR is compared with the Geographic
Distance Routing protocol (GEDIR) with respect to packet delivery ratio (PDR) and
normalized routing load.
Ant colony optimization (ACO) combined with a genetic algorithm (GA) is used to solve the
capacitated vehicle routing problem with split deliveries [38]; a random probability
model for choosing the next stop is used, and the obtained results show improvement. In a
different work, ACO is used to decrease network delay and increase network scalability
[39] in the Clustering-Based Modified Ant Colony Optimizer for the Internet of Vehicles
(CACOIOV). ACO is invoked in two stages of packet routing with the aid of a mobility
algorithm, Dynamic Aware Transmission Range on Local traffic Density (DA-TRLD). Table 2
summarizes the CI techniques in V2X communication.

5 Challenges and Open Issues

Several research challenges should be tackled to achieve the full potential of 5G V2X
communications. As introduced earlier, there are two main approaches for V2X
communications: DSRC and LTE-V. Each approach has its strengths and weaknesses. Hybrid
approaches that utilize both DSRC and LTE-V can be used to combine the advantages of both
and avoid their disadvantages. The main issue facing this hybrid approach is DSRC and
LTE-V band separation, since the interference between V2V and V2I may result in
collisions that lead to a longer delay.
An important issue is the need for dynamic, fast, and smart delivery of information to
the appropriate destinations. The 5G V2X comprises multiple resources that produce a huge
amount of data, such as the locations, speeds, and directions of vehicles, at a high data
rate that is enabled by the new Millimeter-Wireless Access in Vehicle Environment
(mm-WAVE) or the modified IEEE 802.11ad. These data have to be shared in a dynamic and
fast manner. Due to the dynamics of the environment and the fast mobility of vehicles,
some uncertainties may arise, and in such situations CI techniques seem suitable to
handle them.
A third issue is resource allocation in V2I communications. Any vehicle may keep
communicating with a V2I resource to obtain infotainment service for example. This
may degrade the overall network performance and affect other safety services. A viable
solution to this issue is SDVN.
Fourth, while CAMs and DENMs are standardized messages used in ITS, no
routing protocol is standardized for the 5G V2X. Typically, Vehicular Ad Hoc Net-
work routing protocols are used, which have their issues.
Last, the evaluation of new solutions is a challenge, as there is no comprehensive simulation environment that contains all the elements of 5G V2X communication systems. Evaluation of new solutions is therefore typically conducted by experimenting in real environments or by utilizing several simulators.

6 Conclusion

This paper addresses V2X communications, which is becoming increasingly vital in smart cities. V2X has the potential for better traffic management, safer roads, and better infotainment services for end-users. In this paper, we presented an overview of V2X communications and its two main approaches: the cellular-based and the DSRC. We surveyed recent CI solutions that address the main issues of V2X communications. It is noticed that most of the CI work in V2X is based on either fuzzy logic and neural networks or evolutionary computing aided by other algorithms to handle use cases, security, or routing. Moreover, each algorithm handles one or at most two objectives in the 5G V2X communication systems. Besides, to the best of the authors' knowledge, no specific work gives attention to routing between vehicles and mobile devices. Last, in most of the surveyed research, performance is evaluated using a single tool, which needs further validation.
V2X communication shares a huge amount of information between different elements. This information could be better organized in datasets to carry out multiple tasks. The size of these datasets grows with the number of participating vehicles in each sub-network. It is well known that CI approaches can deal with such amounts of data. However, a single CI approach cannot handle these datasets. Some researchers used fuzzy logic with the aid of other algorithms such as Q-learning, multiagent systems, or ANNs. Also, most of the research utilizes evolutionary computation such as ACO, Cat optimization, Bat optimization, or Bee optimization. In summary, research on V2X is quite broad. Several open challenges remain unaddressed, such as routing, security, resource allocation, and performance evaluation.

References
1. Chiti, F., et al.: Communications protocol design for 5G vehicular networks. In: 5G Mobile
Communications, pp. 625–649. Springer, Cham (2017)
2. Kassim, M., Rahman, R.Ab., Aziz, M.A.A., Idris, A., Yusof, M.I.: Performance analysis of
VoIP over 3G and 4G LTE network. In: 2017 International Conference on Electrical,
Electronics and System Engineering (ICEESE). IEEE (2017)
3. Khan, Z., Fan, P.: A multi-hop moving zone (MMZ) clustering scheme based on cellular-
V2X. China Commun. 15(7), 55–66 (2018)

4. Seo, H., et al.: LTE evolution for vehicle-to-everything services. IEEE Commun. Mag. 54
(6), 22–28 (2016)
5. Tseng, Y.-L.: LTE-advanced enhancement for vehicular communication. IEEE Wirel.
Commun. 22(6), 4–7 (2015)
6. Garg, M.K., Singh, N., Verma, P.: Fuzzy rule-based approach for design and analysis of a
trust-based secure routing protocol for MANETs. Procedia Comput. Sci. 132, 653–658
(2018)
7. Kumar, P., Tripathi, S., Pal, P.: Neural network based reliable transport layer protocol for
MANET. In: 2018 4th International Conference on Recent Advances in Information
Technology (RAIT). IEEE (2018)
8. Sharma, A., Kim, D.S.: Robust bio-inspired routing protocol in MANETs using ant
approach. In: International Conference on Ubiquitous Information Management and
Communication. Springer, Cham (2019)
9. Choudhary, R., Sharma, P.K.: An efficient approach for power aware routing protocol for
MANETs using genetic algorithm. In: Emerging Trends in Expert Applications and Security,
pp. 133–138. Springer, Singapore (2019)
10. Zimmermann, H.-J.: Fuzzy Set Theory—and Its Applications. Springer, Berlin (2011)
11. Abboud, K., Omar, H.A., Zhuang, W.: Interworking of DSRC and cellular network
technologies for V2X communications: a survey. IEEE Trans. Veh. Technol. 65(12), 9457–
9470 (2016)
12. Festag, A.: Standards for vehicular communication—from IEEE 802.11 p to 5G. e & i
Elektrotechnik und Informationstechnik 132(7), 409–416 (2015)
13. Zhao, L., et al.: Vehicular communications: standardization and open issues. IEEE Commun.
Stand. Mag. 2(4), 74–80 (2018)
14. Bey, T., Tewolde, G.: Evaluation of DSRC and LTE for V2X. In: 2019 IEEE 9th Annual
Computing and Communication Workshop and Conference (CCWC). IEEE (2019)
15. Intelligent Transport Systems (ITS): Vehicular Communications; GeoNetworking; Part 5:
Transport Protocols; Sub-part 1: Basic Transport Protocol, ETSI EN Standard 302 636-5-1
V1.2.1, August 2014
16. Bouk, S.H., et al.: Hybrid adaptive beaconing in vehicular ad hoc networks: a survey. Int.
J. Distrib. Sens. Netw. 11(5), 390360 (2015)
17. https://www.etsi.org/deliver/etsi_ts/138300_138399/138300/15.03.01_60/ts_138300v150301p.
pdf
18. Del Ser, J., et al.: Bioinspired computational intelligence and transportation systems: a long
road ahead. IEEE Trans. Intell. Transp. Syst. 21(2), 466–495 (2019)
19. Filippi, A., Moerman, K., Daalderop, G., Alexander, P.D., Schober, F., Pfliegl, W.: Ready to
roll: why 802.11 p beats LTE and 5G for V2x. NXP Semiconductors, Cohda Wireless and
Siemens White Paper (2016)
20. Siddique, N., Adeli, H.: Computational Intelligence: Synergies of Fuzzy Logic, Neural
Networks and Evolutionary Computing. Wiley, Hoboken (2013)
21. Chaythanya, B.P.: Fuzzy logic based approach for dynamic routing in MANET. Int. J. Eng.
Res. 3(6), 1434–1441 (2014)
22. Kallel, L., Naudts, B., Rogers, A.: Theoretical Aspects of Evolutionary Computing.
Springer, Berlin (2013)
23. Wu, C., et al.: Cluster-based content distribution integrating LTE and IEEE 802.11 p with
fuzzy logic and Q-learning. IEEE Comput. Intell. Mag. 13(1), 41–50 (2018)
24. Rani, P., Shaw, D.K.: Artificial neural networks approach induced by fuzzy logic for traffic
delay estimation. J. Eng. Technol. 6(2), 127–141 (2018)

25. Zhang, H., Bochem, A., Sun, X., Hogrefe, D.: A security aware fuzzy enhanced reliable ant
colony optimization routing in vehicular Ad hoc networks. In: IEEE Intelligent Vehicles
Symposium (IV) Changshu, Suzhou, China, 26–30 June 2018
26. Tyasnurita, R., Özcan, E., John, R.: Learning heuristic selection using a time delay neural
network for open vehicle routing. In: 2017 IEEE Congress on Evolutionary Computation
(CEC), pp. 1474–1481. IEEE (2017)
27. Cao, Z., Guo, H., Zhang, J.: A multiagent-based approach for vehicle routing by considering
both arriving on time and total travel time. ACM Trans. Intell. Syst. Technol. (TIST) 9(3), 1–
21 (2017)
28. Khan, Z., et al.: Two-level cluster based routing scheme for 5G V2X communication. IEEE
Access 7, 16194–16205 (2019)
29. Santillan, J.H., Tapucar, S., Manliguez, S., Calag, V.: Cuckoo search via Lévy flights for the
capacitated vehicle routing problem. J. Ind. Eng. Int. 14(2), 293–304 (2018). https://doi.org/
10.1007/s40092-017-0227-5
30. Szeto, W.Y., Wu, Y., Ho, S.C.: An artificial bee colony algorithm for the capacitated vehicle
routing problem. Eur. J. Oper. Res. 215(1), 126–135 (2011)
31. Eiza, M.H., Owens, T., Ni, Q., Shi, Q.: Situation-aware QoS routing algorithm for vehicular
ad hoc networks. IEEE Trans. Veh. Technol. 64(12), 5520–5535 (2015)
32. Ortega, V., Bouchmal, F., Monserrat, J.F.: Trusted 5G vehicular networks: blockchains and
content-centric networking. IEEE Veh. Technol. Mag. 13(2), 121–127 (2018)
33. Bian, K., Zhang, G., Song, L.: Security in use cases of vehicle-to-everything communi-
cations. In: 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall). IEEE (2017)
34. Krundyshev, V., Kalinin, M., Zegzhda, P.: Artificial swarm algorithm for VANET protection
against routing attacks. In: 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 795–
800. IEEE (2018)
35. Osaba, E., Carballedo, R., Yang, X.-S., Fister Jr., I., Lopez-Garcia, P., Del Ser, J.: On
efficiently solving the vehicle routing problem with time windows using the bat algorithm
with random reinsertion operators. In: Nature-Inspired Algorithms and Applied Optimiza-
tion, pp. 69–89. Springer, Cham (2018)
36. Zhou, Y., Luo, Q., Xie, J., Zheng, H.: A hybrid bat algorithm with path relinking for the
capacitated vehicle routing problem. In: Metaheuristics and Optimization in Civil
Engineering, pp. 255–276. Springer, Cham (2016)
37. Kasana, R., Kumar, S.: A geographic routing algorithm based on cat swarm optimization for
vehicular ad-hoc networks. In: IEEE, 4th International Conference on Signal Processing and
Integrated Networks (SPIN), pp. 86–90 (2017)
38. Rajappa, G.P., Wilck, J.H., Bell, J.E.: An ant colony optimization and hybrid metaheuristics
algorithm to solve the split delivery vehicle routing problem. Int. J. Appl. Ind. Eng. 3(1), 55–
73 (2016)
39. Ebadinezhad, S., Dereboylu, Z., Ever, E.: Clustering-based modified ant colony optimizer
for internet of vehicles (CACOIOV). Sustainability 11(9), 2624 (2019)
Simultaneous Sound Source Localization
by Proposed Cuboids Nested Microphone
Array Based on Subband Generalized
Eigenvalue Decomposition

Ali Dehghan Firoozabadi1(&), Pablo Irarrazaval2, Pablo Adasme3,


Hugo Durney1, Miguel Sanhueza Olave1, David Zabala-Blanco4,
and Cesar Azurdia-Meza5
1
Department of Electricity, Universidad Tecnológica Metropolitana, Av. Jose
Pedro Alessandri 1242, 7800002 Santiago, Chile
{adehghanfirouzabadi,hdurney,msanhueza}@utem.cl
2
Electrical Engineering Department, Pontificia Universidad Católica de Chile,
Santiago, Chile
pim@uc.cl
3
Electrical Engineering Department, Universidad de Santiago de Chile,
Santiago, Chile
pablo.adasme@usach.cl
4
Department of Computing and Industries, Universidad Católica del Maule,
3466706 Talca, Chile
davidzabalablanco@hotmail.com
5
Department of Electrical Engineering, Universidad de Chile, Santiago, Chile
cazurdia@ing.uchile.cl

Abstract. Multiple sound source localization is an important application in speech processing. In this paper, a cuboids nested microphone array (CuNMA) is proposed for sound acquisition. Also, spatial aliasing is eliminated by the use of this array. Then, subband processing is proposed based on the GammaTone filter bank. Next, the generalized eigenvalue decomposition (GEVD) algorithm is implemented on all microphone pairs of the CuNMA and for each subband of the GammaTone filter bank. In each subband, the standard deviation (SD) is calculated for all direction of arrival (DOA) estimations, and the subbands with improper information are eliminated. Then, K-means clustering with the silhouette criterion is implemented on all DOAs for estimating the number of speakers and allocating the related DOAs to each cluster. The proposed method is compared with the steered response power-phase transform (SRP-PHAT), Geometric Projection, and spectral source model-deep neural network (SSM-DNN) methods on simulated data in noisy and reverberant conditions, and the results show the superiority of the proposed method in comparison with other previous works.

Keywords: Sound source localization · Nested microphone array · GammaTone filter bank · Subband processing · Clustering


1 Introduction

Sound source localization is an active and important field in speech signal processing, where many research works have been done. There are many applications for sound source localization, such as hearing aid systems [1], robotics [2], and videoconferencing [3]. Different strategies are utilized for source localization based on the time difference of arrival (TDOA) [4] and energy propagation [5]. The computational complexity is smaller in TDOA-based localization methods, but the accuracy is lower in noisy and reverberant conditions. The energy-based methods are slower because of their high computational complexity, but they have higher accuracy and robustness in undesirable conditions.
Some particular algorithms have been proposed for direction of arrival (DOA) estimation based on microphone arrays. The most common methods are estimation of signal parameters via rotational invariance technique (ESPRIT) [6] and multiple signal classification (MUSIC) [7]. The sparse component analysis (SCA) [8] algorithm localizes multiple sources by considering at least a fixed-time analysis region for each sound source. In this condition, the DOA estimation for multiple sound sources is converted to DOA estimation for single sources. Ma et al. proposed a binaural source localization method, which is implemented by a combination of model-based methods for speech spectrum components and a deep neural network (DNN) [9]. First, models are estimated for the target and background sources using speech spectrum components in the training step. In the next step, the sources' models are considered simultaneously, and the DOAs are estimated by the DNN from the microphone signals. Long et al. proposed a method for source localization based on the Geometric Projection algorithm [10]. Four types of power functions are proposed in the context of acoustic source localization (ASL) algorithms. The similarity between the proposed and traditional methods is shown by comparing these power functions for ASL. Finally, three fusion methods (arithmetic, normalized, and geometric) are presented for wideband source localization.
In this paper, a TDOA-based method is proposed for three-dimensional multiple sound source localization. First, a cuboids nested microphone array (CuNMA) with microphones distributed in three dimensions is proposed to solve the problem of spatial aliasing. Since speech is a non-stationary signal with non-uniform information in different frequency bands, the GammaTone filter bank, as an efficient filter, is introduced for speech subband processing. Next, the generalized eigenvalue decomposition (GEVD) algorithm is implemented on all microphone pairs for each subband to estimate the DOAs. The subbands with inappropriate information are ignored by calculating the standard deviation (SD) and defining a threshold for all subbands. Then, all DOAs for each speaker are classified by K-means clustering and the silhouette criterion. Finally, the 3D position of each speaker is estimated by the intersection of all DOAs in the clusters related to the microphone pairs. The proposed algorithm is evaluated in noisy and reverberant conditions and is compared with some previous works.
In Sect. 2, the microphone signal model and proposed cuboids nested microphone
array are presented. Section 3 shows the GammaTone filter bank, subband GEVD
algorithm, thresholding on SDs, and clustering with silhouette criteria. The simulations
and results are discussed in Sect. 4. Section 5 includes some conclusions.

2 Microphone Signal Model and Proposed Nested Array

The microphone signal model is one of the basic assumptions in speech signal processing, especially in localization and tracking applications. A realistic model is appropriate for simulations, since it contains the reverberation effects and undesirable environmental conditions. A microphone array increases the accuracy of localization algorithms through information redundancy, but its disadvantage is spatial aliasing caused by inter-microphone distances. The linear nested microphone array has been proposed for eliminating spatial aliasing in speech enhancement applications [11]. However, this array is not useful for localization scenarios because the microphones are located in only one dimension. In this section, the cuboids nested microphone array is proposed as an appropriate array for 3D sound source localization algorithms.
The frequency range of the speech signal considered in localization applications is [50–8000] Hz, and the sampling frequency is selected as $f_s = 16000$ Hz. Therefore, the nested microphone array is designed to cover the frequency range B = [50–7600] Hz. The proposed cuboids NMA is structured with four subarrays. The first subarray is designed to cover the frequency range B1 = [3800–7600] Hz, which is the highest frequency range of the considered speech signal. The inter-microphone distance (d), with the condition $d < \lambda/2$ ($\lambda$ is the wavelength at the highest frequency of this range), is selected as $d_1 < 2.3$ cm. The second subarray is designed for the frequency range B2 = [1900–3800] Hz; therefore, the inter-microphone distance is calculated as $d_2 = 2 d_1 < 4.6$ cm. The third subarray is designed for the frequency range B3 = [950–1900] Hz, with inter-microphone distance $d_3 = 4 d_1 < 9.2$ cm. Finally, the fourth subarray covers the frequency range B4 = [50–950] Hz, where the inter-microphone distance is $d_4 = 8 d_1 < 18.4$ cm. Figure 1 shows the block diagram of the proposed method, where the nested microphone array is presented on the left side of the diagram.
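To make the spacing rule concrete, the short Python sketch below (not part of the paper; the speed-of-sound value is an assumption) reproduces the $d_i < \lambda/2$ bounds of the four subarrays.

```python
# Anti-aliasing spacing bound for the four CuNMA subarrays.
# c ~ 343 m/s is assumed; the rounded bounds in the text (2.3, 4.6, 9.2, 18.4 cm)
# correspond to a slightly different speed of sound.
c = 343.0                          # speed of sound in m/s (assumption)
f_max = [7600, 3800, 1900, 950]    # upper edge of bands B1..B4 in Hz

for i, f in enumerate(f_max, start=1):
    wavelength = c / f                     # shortest wavelength in the subband
    bound_cm = 100.0 * wavelength / 2.0    # d_i < lambda/2, expressed in cm
    print(f"Subarray {i}: d_{i} < {bound_cm:.2f} cm")
```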

Fig. 1. The block diagram of the proposed method for multiple sound source localization by the
use of CuNMA and subband GEVD.

The nearest microphone pairs (1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 7), (7, 8), and (8, 5) are located to have inter-microphone distance $d_1 = 2.3$ cm. These microphones are adjusted on both sides of the CuNMA to provide spatial symmetry. The microphone pairs (1, 3), (2, 4), (5, 7), and (6, 8) of the second subarray are placed with inter-microphone distance $d_2 = 3.25$ cm; these microphone pairs are used for frequency band B2. The microphone pairs (4, 7), (8, 3), (5, 4), (1, 8), (6, 1), (5, 2), (6, 3), and (2, 7) are designed with inter-microphone distance $d_3 = 9.2$ cm for the third subarray. Finally, the inter-microphone distance is $d_4 = 9.75$ cm for the farthest microphone pairs (3, 5), (6, 4), (8, 2), and (1, 7). Figure 2 shows the 4 subarrays of the proposed CuNMA.

Fig. 2. The designed 4 subarrays for the proposed CuNMA.

Each subarray requires an analysis filter bank to avoid spatial aliasing and to prepare the appropriate frequency bands for the related microphone pairs. Therefore, an analysis filter $H_i(z)$ and a down-sampler $D_i$ are considered for multi-rate sampling. The output of the analysis filter bank is expressed as:

$x_{m,i}[n] = x_m[n] * h_i[n], \quad m = 1, \ldots, 8, \; i = 1, \ldots, 4, \qquad (1)$

where $x_{m,i}[n]$ is the $m$-th microphone signal at the output of the $i$-th analysis filter, and $h_i[n]$ is the impulse response of the analysis filter. Then, $x_{m,i}[n]$ is considered as an input to the GammaTone filter bank.

3 The Proposed Sound Source Localization Method Based on GammaTone Filter Bank and Subband GEVD
3.1 The GammaTone Filter Bank for Subband Processing
Speech is a wideband and non-stationary signal with the W-disjoint orthogonality (W-DO) property. This means that each time-frequency point in the speech spectrogram is, with high probability, related to a single source. Therefore, subband processing provides more accurate information for each separate speaker. The GammaTone filter bank was proposed by Johannesma to model the human hearing system based on the correlation function [12]. The time-domain impulse response of this filter is given as:

$h_g(t) = v(c, b)\, t^{c-1} e^{-bt} \cos(\omega t + \varphi)\, u(t), \qquad (2)$

where $v(c, b)$ is the normalization constant based on the bandwidth factor $b$ and the filter order $c$; $\omega$, $\varphi$, and $u(t)$ are the central frequency in radians, the phase shift, and the unit step, respectively. For a fixed order $c$, $b$ is considered a scaling parameter that changes the filter bandwidth, while the order $c$ controls the filter shape. The nested microphone array signals at the output of the GammaTone filter bank are given as:

$x_{m,i,j}[n] = x_{m,i}[n] * h_{g_j}[n], \quad j = 1, \ldots, 16, \qquad (3)$

where $h_{g_j}[n]$ and $x_{m,i,j}[n]$ are the discrete impulse response and the microphone signal at the output of the GammaTone filter bank, respectively.
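As a hedged sketch (not the authors' code), the GammaTone bank of Eqs. (2)–(3) could be generated and applied as follows; the filter order c = 4, the ERB-based bandwidth factor, the geometric spacing of the 16 centre frequencies, and the peak normalization used in place of $v(c, b)$ are all assumptions.

```python
import numpy as np

def gammatone_ir(fc, fs, c=4, duration=0.05, phi=0.0):
    """Impulse response following Eq. (2): t^(c-1) * exp(-b t) * cos(w t + phi) * u(t)."""
    t = np.arange(0.0, duration, 1.0 / fs)
    erb = 24.7 + 0.108 * fc           # equivalent rectangular bandwidth (assumed model)
    b = 2.0 * np.pi * 1.019 * erb     # bandwidth factor b (assumption)
    h = t ** (c - 1) * np.exp(-b * t) * np.cos(2.0 * np.pi * fc * t + phi)
    return h / np.max(np.abs(h))      # peak normalization in place of v(c, b)

fs = 16000
centre_freqs = np.geomspace(50, 7600, 16)    # 16 subbands, as in Eq. (3)
bank = [gammatone_ir(fc, fs) for fc in centre_freqs]

x_mi = np.random.randn(fs)                   # stand-in for one analysis-filter output x_m,i[n]
subbands = [np.convolve(x_mi, h, mode="same") for h in bank]   # x_m,i,j[n] of Eq. (3)
```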

3.2 The Proposed Subband GEVD for DOA Estimation


The DOA estimation is one of the techniques for sound source localization. GEVD is a novel algorithm for localization, which is defined based on the eigenvalue decomposition of the covariance matrix. In this paper, the GEVD algorithm is implemented on subbands, and some other processes are combined to localize the speakers' locations. The relation between the signals of each microphone pair is defined as:

$x_1^T[n]\, g_2 = x_2^T[n]\, g_1, \qquad (4)$

where:

$x_i[n] = [x_i[n], x_i[n-1], \ldots, x_i[n-M+1]]^T, \quad i = 1, 2, \qquad (5)$

where $x_i[n]$ is the speech signal recorded by the microphones. The signal $x_i[n]$ is the same as $x_{m,i,j}[n]$; the notation is changed for simplicity, and $T$ denotes the transpose of a vector. The impulse response vector of length $M$ is defined as:

$g_i = [g_{i,0}, g_{i,1}, \ldots, g_{i,M-1}]^T, \quad i = 1, 2. \qquad (6)$

The covariance matrix for the microphone pair $m_1$ and $m_2$ is expressed as:

$R = \begin{bmatrix} R_{x_1 x_1} & R_{x_1 x_2} \\ R_{x_2 x_1} & R_{x_2 x_2} \end{bmatrix}, \qquad (7)$

where the matrix components are $R_{x_i x_j} = E\{x_i[n]\, x_j^T[n]\}$, $i, j = 1, 2$. The vector $u$ is defined as:

$u = \begin{pmatrix} g_2 \\ -g_1 \end{pmatrix}. \qquad (8)$

It can be seen from Eqs. (4) and (7) that $Ru = 0$, which means that the vector $u$ is an eigenvector of the covariance matrix $R$ for the eigenvalue 0. The GEVD method uses stochastic gradient algorithms for estimating the generalized eigenvector related to the smallest eigenvalue of the noise covariance matrix $R_{b_M}$ and the microphone covariance matrix $R_{x_M}$ [13]. The noise covariance matrix $R_{b_M}$ is estimated from the silence parts of the recorded microphone signals. The vector $u$ is calculated by minimizing the cost function $u^T R_{x_M} u$ in an iterative process. Then, the error function for the adaptive algorithm is considered as:

$e[n] = \dfrac{u^T[n]\, x_m[n]}{\sqrt{u^T[n]\, R_{b_M}\, u[n]}}, \qquad (9)$

which is minimized by the least mean square (LMS) algorithm as:

$u[n+1] = u[n] - \mu\, e[n]\, \dfrac{\partial e[n]}{\partial u[n]}, \qquad (10)$

where $\mu$ is the adaptation step of the adaptive algorithm. After calculating the gradient in Eq. (10) and simplifying, with a normalization step to avoid error propagation, the vector $u$ is calculated as:

$u[n+1] = \dfrac{\tilde{u}[n+1]}{\sqrt{\tilde{u}^T[n+1]\, R_{b_M}\, \tilde{u}[n+1]}}, \qquad (11)$

where:

$\tilde{u}[n+1] = u[n] - \mu\, e[n]\, \{ x_m[n] - e[n]\, R_{b_M}\, u[n] \}. \qquad (12)$
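As a compact illustration, the following NumPy sketch implements one possible reading of this adaptive update (Eqs. (9), (11), and (12)); the initialization, the frame construction, and the single pass over the samples are assumptions rather than the authors' implementation, while the defaults M = 128 and mu = 1e-7 match the values reported in Sect. 4.

```python
import numpy as np

def adaptive_gevd(x1, x2, Rb, M=128, mu=1e-7):
    """Sketch of the adaptive update of Eqs. (9), (11) and (12) for one microphone
    pair's subband signals x1, x2. Rb is the 2M x 2M noise covariance matrix
    estimated from silence; u stacks [g2; -g1] as in Eq. (8)."""
    u = np.zeros(2 * M)
    u[0] = 1.0                                   # arbitrary non-zero start (assumption)
    u /= np.sqrt(u @ Rb @ u)
    for n in range(M, len(x1)):
        xm = np.concatenate((x1[n::-1][:M], x2[n::-1][:M]))  # stacked length-M frames
        e = (u @ xm) / np.sqrt(u @ Rb @ u)                   # error signal, Eq. (9)
        u_tilde = u - mu * e * (xm - e * (Rb @ u))           # gradient step, Eq. (12)
        u = u_tilde / np.sqrt(u_tilde @ Rb @ u_tilde)        # normalization, Eq. (11)
    g2, g1 = u[:M], -u[M:]                                   # recover the impulse responses
    return g1, g2
```

In an AED-style reading of [13], the TDOA of the pair would then follow from the offset between the dominant peaks of $g_1$ and $g_2$, which gives the DOA for that subband.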

The impulse responses $g_1$ and $g_2$ are calculated from the estimated vector $u$. In the next step, the SD of the DOAs is estimated over different time frames in each subband. The subbands with a small SD ($|SD| < 10°$) have proper DOAs, which are centralized around one point, showing that there is just one dominant speaker in the subband. These subbands are passed, and their DOAs are considered for the next process. The subbands with a larger SD show dispersion of the estimated DOAs; they are not considered for the next process and are eliminated.
The DOAs passed from the SD-based selection step are entered into the clustering stage. Since the number of clusters (the number of speakers) is unknown, the K-means clustering method is a proper choice for this algorithm. The silhouette criterion is implemented for estimating the value of K (the number of speakers or clusters) [14]. In the next step, the DOAs in each cluster are plotted in the x, y, and z directions for calculating the intersections. This process is implemented for all clusters to estimate the 3D locations of the speakers.

$(x_k, y_k, z_k) = DOA_{k,1} \cap DOA_{k,2} \cap \cdots \cap DOA_{k,N}, \quad k = 1, \ldots, K, \qquad (13)$

where $K$ is the number of clusters and $N$ is the number of DOAs in each cluster. Finally, the 3D speaker locations $(x_k, y_k, z_k)$ are calculated by the proposed subband GEVD algorithm using the CuNMA.
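A minimal scikit-learn sketch of this speaker-counting step is shown below; the tested range of K, the two-angle DOA parameterization, and the synthetic test DOAs are assumptions used only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_doas(doas, k_max=5):
    """Choose the number of speakers K by the silhouette criterion and
    return the DOAs grouped per cluster (k_max is an assumption)."""
    best_k, best_score, best_labels = 2, -1.0, None
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(doas)
        score = silhouette_score(doas, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, [doas[best_labels == k] for k in range(best_k)]

# One row per retained subband DOA estimate, e.g. (azimuth, elevation) in degrees.
doas = np.vstack([np.random.normal([30, 10], 2.0, (40, 2)),
                  np.random.normal([120, 5], 2.0, (40, 2))])
K, clusters = cluster_doas(doas)
print(K, [len(c) for c in clusters])   # expected: 2 clusters of about 40 DOAs each
```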

4 Simulations and Results

The evaluations are implemented on simulated data, where the TIMIT dataset is considered for the simulations [15]. The aim of the proposed method is multiple simultaneous sound source localization. The simulations are designed for 2 and 3 simultaneous speakers to cover a high percentage of real scenarios. Additive white Gaussian noise is considered for simulating the noisy conditions. The image model is selected to simulate conditions similar to real scenarios [16]. Also, a Hamming window of 60 ms length is selected for data blocking. In the simulations, the estimated impulse response length for the GEVD algorithm is set to 128 samples. Also, the adaptation step is set to $\mu = 10^{-7}$ in the GEVD algorithm to obtain a suitable speed of convergence. The evaluations are implemented in a room with dimensions (350, 300, 400) cm, where 3 speakers are located at (45, 255, 170) cm (S1), (320, 250, 175) cm (S2), and (60, 75, 165) cm (S3), respectively. Figure 3 shows a view of the simulated room.

Fig. 3. A view of the simulated room with location of speakers and microphone array.

The mean square error (MSE) criterion is selected for evaluating the precision and robustness of the proposed method. The proposed cuboids nested microphone array-subband GEVD (CuNMA-SBGEVD) method is compared with the steered response power-phase transform (SRP-PHAT) [17], Geometric Projection [10], and SSM-DNN [9] algorithms. Figure 4 shows the results for 2 simultaneous speakers. Figure 4a represents the results for SNR = 10 dB and different ranges of $RT_{60}$. As seen, the proposed method has smaller MSE values in comparison with the other previous
works. Figure 4b shows the results for $RT_{60} = 500$ ms and different ranges of SNR. This figure also shows the superiority of the proposed method and its robustness at different SNRs.
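As a small illustration of the metric (the exact aggregation used in the paper is not stated, so this is one common definition with purely illustrative estimates), the MSE over the 3D position errors can be computed as:

```python
import numpy as np

# True positions of S1 and S2 in the simulated room (in cm) and illustrative estimates.
true_pos = np.array([[45, 255, 170], [320, 250, 175]])
est_pos = np.array([[47, 252, 168], [316, 254, 179]])

mse = np.mean(np.sum((est_pos - true_pos) ** 2, axis=1))   # mean squared 3D error
print(f"MSE = {mse:.1f} cm^2")
```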

Fig. 4. The MSE for the proposed CuNMA-SBGEVD method in comparison with SRP-PHAT [17], Geometric Projection [10], and SSM-DNN [9] algorithms for 2 simultaneous speakers, a) SNR = 10 dB and different range of RT60, and b) RT60 = 500 ms and different SNRs.

Figure 5 shows the MSE results for 3 simultaneous speakers. Figure 5a represents the results for SNR = 10 dB and different $RT_{60}$ values, and Fig. 5b is for $RT_{60} = 500$ ms and different ranges of SNR. This figure also shows the smaller MSE values of the proposed method in comparison with the SRP-PHAT [17], Geometric Projection [10], and SSM-DNN [9] algorithms. These two figures represent the superiority of the proposed method in undesirable conditions and for different numbers of speakers. The results for 2 simultaneous speakers are slightly better than for 3 speakers because the mixture of information in frequency is smaller for 2 speakers.

Fig. 5. The MSE for the proposed CuNMA-SBGEVD method in comparison with SRP-PHAT [17], Geometric Projection [10], and SSM-DNN [9] algorithms for 3 simultaneous speakers, a) SNR = 10 dB and different range of RT60, and b) RT60 = 500 ms and different SNRs.

5 Conclusions

In this paper, a cuboids nested microphone array is proposed for simultaneous sound source localization. This array localizes the sources accurately because of its 3D shape, and it eliminates spatial aliasing through its nested structure. Also, the speech signal has different information in different frequency bands; therefore, the GammaTone filter bank is proposed for subband processing as a human-hearing-based filter. The GEVD algorithm is implemented per subband on all microphone pairs of the nested microphone array. Then, the SD is calculated over all DOAs in each subband, and the DOAs with an SD bigger than a threshold are eliminated. The number of speakers and their DOAs are estimated by K-means clustering and the silhouette criterion. Finally, the 3D position of each speaker is estimated by the intersection of the DOAs in each cluster. The proposed method is compared with the SRP-PHAT, Geometric Projection, and SSM-DNN algorithms in noisy and reverberant conditions for 2 and 3 simultaneous speakers. The MSE results show the superiority of the proposed method in comparison with other previous works.

Acknowledgment. The authors acknowledge financial support from FONDECYT No. 3190147 and FONDECYT No. 11180107.

References
1. Simon, H.J.: Bilateral amplification and sound localization: then and now. J. Rehabil. Res.
Dev. 42(4), 117–132 (2005)
2. Wu, X., Gong, H., Chen, P., Zhong, Z., Xu, Y.: Surveillance robot utilizing video and audio
information. J. Intell. Robot. Syst. 55(4/5), 403–421 (2009)
3. Wang, C., Griebel, S., Brandstein, M.: Robust automatic videoconferencing with multiple
cameras and microphones. In: IEEE International Conference on Multimedia and Expo, New
York, NY, USA, pp. 1585–1588 (2000)
4. Blandin, C., Ozerov, A., Vincent, E.: Multi-source TDOA estimation in reverberant audio
using angular spectra and clustering. Sig. Process. 92(8), 1950–1960 (2012)
5. Sheng, X., Hu, Y.H.: Maximum likelihood multiple-source localization using acoustic
energy measurements with wireless sensor networks. IEEE Trans. Sig. Process. 53(1), 44–53
(2005)
6. Roy, R., Kailath, T.: Esprit-estimation of signal parameters via rotational invariance
techniques. IEEE Trans. Acoust. Speech Sig. Process. 37(7), 984–995 (1989)
7. Schmidt, R.: Multiple emitter location and signal parameter estimation. IEEE Trans.
Antennas Propag. 34(3), 276–280 (1986)
8. Pavlidi, D., Griffin, A., Puigt, M., Mouchtaris, A.: Real-time multiple sound source
localization and counting using a circular microphone array. IEEE Trans. Audio Speech
Lang. Process. 21(10), 2193–2206 (2013)
9. Ma, N., Gonzalez, J.A., Brown, G.J.: Robust binaural localization of a target sound source
by combining spectral source models and deep neural networks. IEEE/ACM Trans. Audio
Speech Lang. Process. 26, 2122–2131 (2018)
10. Long, T., Chen, J., Huang, G., Benesty, J., Cohen, I.: Acoustic source localization based on
geometric projection in reverberant and noisy environments. IEEE J. Sel. Top. Sign. Process.
13(1), 143–155 (2019)

11. Zheng, Y.R., Goubran, R.A., El-Tanany, M.: Experimental evaluation of a nested
microphone array with adaptive noise cancellers. IEEE Trans. Instrum. Measur. 53(3),
777–786 (2004)
12. Boer, E.D., Kruidenier, C.: On ringing limits of the auditory periphery. Biol. Cybern. 63(6),
433–442 (1990)
13. Benesty, J.: Adaptive eigenvalue decomposition algorithm for passive acoustic source
localization. J. Acoust. Soc. Am. 107, 384–391 (2000)
14. Peter, J.R.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis.
Comput. Appl. Math. 20, 53–65 (1987)
15. Garofolo, J.S., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., Dahlgren, N.L., Zue, V.:
TIMIT acoustic-phonetic continuous speech corpus LDC93S1. Linguistic Data Consortium,
Philadelphia. https://catalog.ldc.upenn.edu/LDC93S1. Accessed 20 May 2019
16. Allen, J., Berkley, D.: Image method for efficiently simulating small-room acoustics.
J. Acoust. Soc. Am. 65(4), 943–950 (1979)
17. Do, H., Silverman, H.F.: SRP-PHAT methods of locating simultaneous multiple talkers
using a frame of microphone array data. In: IEEE International Conference on Acoustics,
Speech and Signal Processing, Dallas, TX, pp. 125–128 (2010)
A Framework for Analyzing 4G/LTE-A Real
Data Using Machine Learning Algorithms

Nihal H. Mohammed1, Heba Nashaat1(&), Salah M. Abdel-Mageid2,


and Rawia Y. Rizk1
1
Electrical Engineering Department, Port Said University,
Port Said 42523, Egypt
{nihalhossny,hebanashaat,r.rizk}@eng.psu.edu.eg
2
Computer Engineering Department, Collage of Computer Science
and Engineering, Taibah University, Medina, Saudi Arabia
sabdelmageid@taibahu.edu.sa

Abstract. Current cellular systems require manual configuration and management of networks, which is expensive and time-consuming due to the increasing number of mobile users and nodes. Reducing the manual work of the Mobile Network Operator (MNO) is a challenge for cellular networks. Therefore, it is important to investigate and understand the traffic patterns of a massive number of cells. Machine learning (ML) and Network Functions Virtualization (NFV) have been used as necessary tools to analyze data and improve network efficiency. In this paper, an ML-based framework is proposed to analyze a real 4G/LTE-A mobile network. The experimental results prove the efficiency of the proposed framework in clustering the cells according to their behavior. This behavior is represented by the user downlink (DL) throughput in the cell or on the edge during peak hours, which is impacted by traffic load balance and the utilization of system resources under the right radio conditions. This gives the operator a chance to optimize the network performance.

Keywords: 4G/LTE-A · Data analysis · KPIs · Machine learning · Traffic load balance

1 Introduction

The main aim of the Mobile Network Operator (MNO) is to provide Quality of Service (QoS) for multimedia [1–3]. The 4G/LTE-A technologies have been developed to meet user requirements and provide high network enhancement. In order to monitor and optimize the network performance, there is a need for Key Performance Indicators (KPIs), which are the result of the network optimization process. The KPIs can control the quality of provided services and achieve resource utilization. A KPI could be based upon network statistics, user drive testing, or a combination of both. KPI logs may reach one GB in size, and analyzing this data helps in optimizing and understanding network performance. The time interval used for KPI evaluation should be agreed upon when defining KPI targets. The results from daily averages are likely to be optimistic relative to busy-hour performance. With the increase in demand, it is highly

desirable to search for throughput enhancing techniques, particularly on the downlink of 4G and 5G wireless systems [4].
Recently, the usage of machine learning (ML) and artificial intelligence (AI) techniques to analyze and perform autonomous operations in cellular networks has been widely studied. Some studies show that it is possible to deploy ML algorithms and AI in cellular networks effectively. Evaluation of the gains of a data-driven approach with real large-scale network datasets is studied in [5]. In [6], a comprehensive strategy of using big data and ML algorithms to cluster and forecast traffic behaviors of 5G cells is presented. This strategy uses a traffic forecasting model for each cluster using various ML algorithms. The Self-Optimized Network (SON) function configuration is updated in [7] such that the SON functions contribute better toward achieving the KPI target. The evaluation is done on a real data set, which shows that the overall network performance is improved by including SON management. Also, realistic KPIs are used to study the impact of several SON function combinations on network performance; eight distinct cell classes have been considered, enabling a more detailed view of the network performance [8]. In [9], the authors try to achieve load balance and traffic steering; two different ML algorithms for throughput-based load balancing and traffic steering are introduced in LTE-A.
In this paper, a real 4G mobile network data set is collected hourly for three weeks in a heavy-traffic compound in Egypt in order to analyze user QoS limitations. These limitations may correspond to system resources and traffic load. An ML-based framework is introduced to analyze the real 4G/LTE-A mobile network. It uses visualization, dimension reduction, and clustering algorithms to enhance the user DownLink (DL) throughput in the cell or on the edge and to balance the load.
The rest of this paper is organized as follows: Sect. 2 describes KPI types and their usage. Section 3 presents the ML-based framework for 4G/LTE-A performance evaluation. Experimental results and discussion are introduced in Sect. 4. Finally, Sect. 5 presents the main conclusion and future work.

2 4G/LTE Network KPIs

The main purpose of Radio Access Network (RAN) KPIs is to check the performance of the network. Post-processing usually checks, monitors, and optimizes KPI values and counters to enhance the QoS or to obtain better usage of network resources [10, 11]. KPIs are categorized into radio network KPIs (from 1 to 6) and service KPIs (7 and 8) [12]:
1. Accessibility KPI measurements assist the network operator with information about
whether the services requested by a user can be accessed with specified levels of
tolerance in some given operating conditions.
2. Retainability KPIs measure the capacity of systems to endure consistent reuse and perform their intended functions. Call drop and call setup rates measure this category.
3. Mobility KPIs are used to measure the performance of a network that can manage
the movement of users and keep the attachment with a network such as a handover.
The measurements include both intra and inter radio access technology (RAT) and
frequency success rate (SR) handover (HO).

4. Availability KPIs measure the percentage of time that a cell is available. A cell is
available when the eNB can provide radio bearer services.
5. Utilization KPIs are used to measure the utilization of network and distribution of
resources according to demands. It consists of uplink (UL) resource block
(RB) utilization rate and downlink (DL) RB utilization rate.
6. Traffic KPIs are used to measure the traffic volumes on LTE RAN. Traffic KPIs are
categorized based on the type of traffic: radio bearers, downlink traffic volume, and
uplink traffic volume.
7. Integrity KPIs are used to measure the benefits introduced by the network to its users. They indicate the impact of eNBs on the service quality provided to the user, such as the throughput for the cell and the user and the latency with which users are served.
8. Latency KPIs measure the amount of service latency for the user or the amount of
latency to access a service.
In our research, three types of KPIs are analyzed to observe the Cell Edge User (CEU) throughput and its relation to the traffic load among bands. These are integrity KPIs, utilization KPIs, and traffic KPIs.

3 An ML-Based Framework for 4G/LTE-A Performance Evaluation

This section presents detailed design phases for investigating the network performance,
retrieving management information, and managing the performance of networks. The
proposed structure consists of five phases, as in Fig. 1. These phases are described as
follows:

Fig. 1. The main phases of the ML-based framework for 4G/LTE-A performance evaluation

3.1 Phase 1: Real Data Set Collection


The data set used in the behavior evaluation is based on the monitoring of logs generated by 104 eNBs (312 cells). The selected area is “Tagamoaa Elawal” in Egypt, which is a massive traffic area. It has more than 4743470 user elements (UEs) per day. The base stations in the dataset belong to a 4G LTE-A, 2 × 2 Multiple Input Multiple Output (MIMO) deployment with three bands of the three frequencies that exist in Egypt applied in each cell: 2100 MHz, 1800 MHz, and 900 MHz, with 10, 10, and 5 MHz bandwidth (BW), respectively, assigned to each band. It represents the most advanced cellular technology commercially deployed on a large scale in Egypt. This data is collected hourly for three weeks as a 104-Megabyte log file with more than 77 features and 259224 time rows.

3.2 Phase 2: Preparing Data for ML


There are four steps to prepare data for ML algorithms: Formatting, data cleaning,
features selection, and dimension reduction.

3.2.1 Formatting
ML algorithms can acquire their knowledge by extracting patterns from raw data. This capability allows them to perform tasks that are not complicated for humans but require more subjective and intuitive knowledge and, therefore, are not easily described using a set of logical rules. Log files collected from the network optimizer should be entered into the machine in Excel or CSV file format.

3.2.2 Data Cleaning


The pandas data frame [13] provides a tool to read data from a wide variety of sources. Either a Jupyter notebook or Google Colab is used for this step. Data cleaning and preparation is a critical step in any ML process. Cleaning the data means removing any null or zero value and its corresponding time row using Python code to avoid any mistakes during the ML algorithms later. After the cleaning step in our framework, the data is reduced to 53 features and 222534 time rows.
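A minimal pandas sketch of this cleaning step is shown below; the file name and the rule of dropping rows that contain zeros in numeric columns are illustrative assumptions, not the exact preprocessing script.

```python
import pandas as pd

df = pd.read_csv("lte_kpi_log.csv")            # hourly KPI log exported as CSV (assumed name)

df = df.dropna(axis=1, how="all")              # drop completely empty feature columns (assumption)
df = df.dropna()                               # drop time rows with null values
numeric = df.select_dtypes(include="number")   # KPI counters and measurements
df = df[(numeric != 0).all(axis=1)]            # drop time rows containing zero values

print(df.shape)                                # the text reports 53 features and 222534 time rows
```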

3.2.3 Features Selection


This step aims to select and exclude features. Measured features after data cleaning are
summarized in Table 1. It considers the necessary parameters for the 4G/LTE-A net-
work, such as DL traffic volume, average throughput distributed for a specific cell,
average throughput for users, maximum and average number of UEs in a particular cell,
and network utilization. The physical resource block (PRB) utilization can be considered as a PRB percentage, which represents the percentage of resource distribution of each band according to demands and the available frequency BW. The scheduler should take into account the demanded BW and the traffic load when assigning resources to a band. Therefore, the scheduler does not allocate PRBs to users who are already satisfied with their current allocation; instead, these resources are allocated to other users who need them according to the band load and the available BW.

The Channel Quality Indicators (CQIs) are features number 6 to 8. They represent the percentage of users in three categories of CQI: lowest, good, and best, as in Table 1. The features numbered from 13 to 19 represent the indexes with Timing Advance (TA), which can be considered an indication of the coverage of each cell. The TA located at each index is a negative offset; this offset is necessary to ensure that the downlink and uplink subframes are synchronized at the eNB [14]. The used Modulation and Coding Scheme (MCS) (numbered in Table 1 from 21 to 52) is also taken into account. The MCS depends on the radio link quality and defines how many useful bits can be transmitted per Resource Element (RE). A UE can use the MCS index (IMCS) from 0–31 to determine the modulation order (Qm), and each IMCS is mapped to a transport block size (TBS) index to assess the number of physical resource blocks. In LTE, the following modulations are supported: QPSK, 16QAM, 64QAM, and 256QAM. To indicate whether the most proper MCS level is chosen, an average MCS (feature number 4 in Table 1) is used. It takes values in the range from 1 to 30: it represents a poor MCS choice when it is under 8, a good one from 10 to 20, and an excellent MCS when it is above 20. Both MCS and CQI are used as an indication of the radio conditions [15].
By applying sklearn's feature selection module [16] to the data set of the 4G/LTE-A network, no feature has zero variance, and there are no features with the same value in all columns; therefore, no features are removed when sklearn's feature selection module is used. The output of the correlation code in Python is applied to these 53 features; the closer the value is to 1, the higher the correlation between two features, as in Fig. 2. Univariate feature selection works by selecting the best features based on univariate statistical tests [17]. Sklearn's SelectKBest [17] is used to choose the features to keep. This method uses statistical analysis to select the features having the highest correlation to the target (our target here is the user DL throughput in the cell and on the edge); these are the top 40 features (denoted by * in Table 1).
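Continuing from the cleaned DataFrame df above, a hedged sketch of the correlation and SelectKBest steps could look as follows; the target column name ('user_dl_throughput') and the f_regression scoring function are assumptions, not the paper's exact code.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

num = df.select_dtypes(include="number")      # keep numeric KPI features only
corr = num.corr()                             # feature-to-feature correlations (Fig. 2)

y = num["user_dl_throughput"]                 # target: user DL throughput (assumed column name)
X = num.drop(columns=["user_dl_throughput"])

selector = SelectKBest(score_func=f_regression, k=40).fit(X, y)
kept = X.columns[selector.get_support()]      # the 40 features marked with * in Table 1
print(sorted(kept))
```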

3.2.4 Dimension Reduction


Figure 2 shows that many features are highly correlated (redundant) and could be eliminated. Dimensionality reduction transforms the features to a lower dimension. Principal Component Analysis (PCA) is a dimensionality reduction technique used to project the data into a lower-dimensional space. It reduces our features to the first 20 features in Table 1, which are less or moderately correlated and related to our target.
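The text keeps the first 20 features of Table 1; a conventional scikit-learn PCA projection onto 20 components, sketched below under the assumption that the selected features are standardized first, illustrates this dimension-reduction step.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X[kept])   # PCA is scale-sensitive (assumption)
pca = PCA(n_components=20).fit(X_scaled)
X_reduced = pca.transform(X_scaled)                  # 20-dimensional representation

print(pca.explained_variance_ratio_.sum())           # variance retained by 20 components
```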

3.3 Phase 3: Data Visualization


Visualization is the graphical representation of information and data. With visual
elements like charts, graphs, and maps, data visualization tools provide an accessible
way to see and understand trends, outliers, and patterns in data. The distribution of traffic, user DL throughput, and indexes & TA are plotted to understand the data characteristics:
1. Distribution of traffic in three bands: Table 2 shows the traffic density of three bands
in MegaByte (MB). The L2100 band has a huge traffic density, and most traffic is
congested in its cells. Therefore, load balancing must be applied to transfer load
from overloaded cells to the neighboring cells with free resources for more balanced
load distribution to maintain appropriate end-user experience and performance.

Table 1. Used features after cleaning.

No. | Feature name | Description
*0 | Traffic DL volume | Measure DL traffic volume on LTE radio access NW
*1 | Cell Name | Name of the cell
*2 | Cell DL Avg. TH. | Average throughput distributed for a specific cell
*3 | User DL Avg. TH. | Average throughput for users on a specific cell
*4 | Avg. MCS selection | An indication for efficient/suitable MCS selection on a specific cell
*5 | Avg. PRB utilization | Measure the system capability to meet the traffic demand
*6 | CQI 0–4% | Percentage of users having channel quality indicator QPSK (lowest)
*7 | CQI 5–9% | Percentage of users having channel quality indicator 16QAM (good)
*8 | CQI 10–15% | Percentage of users having channel quality indicator level 64QAM (best)
*9 | CEU cell DL Avg. TH. | Avg. predicted DL throughput of cell edge users for a specific cell
*10 | CEU user DL Avg. TH. | Avg. throughput for users on an edge
*11 | Avg. UE No. | Avg. No. of UEs in a specific cell
*12 | Max UE No. | Max. No. of UEs in a specific cell
*13 | TA & Index0 | eNB coverage 39 m & TA is 0.5 m
*14 | TA & Index1 | eNB coverage 195 m & TA is 2.5 m
*15 | TA & Index2 | eNB coverage 429 m & TA is 5.5 m
*16 | TA & Index3 | eNB coverage 819 m & TA is 10.5 m
*17 | TA & Index4 | eNB coverage 1521 m & TA is 19.5 m
*18 | TA & Index5 | eNB coverage 2769 m & TA is 35.5 m
*19 | TA & Index6 | eNB coverage 5109 m & TA is 65.5 m
*20 | L.PRB.TM2 | Capacity monitoring by PRB
*21 | MCS.0 | No. of users with modulation QPSK & index TBS(0)
*22 | MCS.1 | No. of users with modulation QPSK & index TBS(1)
*23 | MCS.2 | No. of users with modulation QPSK & index TBS(2)
*24 | MCS.3 | No. of users with modulation QPSK & index TBS(3)
*25 | MCS.4 | No. of users with modulation QPSK & index TBS(4)
*26 | MCS.5 | No. of users with modulation QPSK & index TBS(5)
*27 | MCS.6 | No. of users with modulation QPSK & index TBS(6)
*28 | MCS.7 | No. of users with modulation QPSK & index TBS(7)
*29 | MCS.8 | No. of users with modulation QPSK & index TBS(8)
*30 | MCS.9 | No. of users with modulation QPSK & index TBS(9)
*31 | MCS.10 | No. of users with modulation 16QAM & index TBS(9)
*32 | MCS.11 | No. of users with modulation 16QAM & index TBS(10)
*33 | MCS.12 | No. of users with modulation 16QAM & index TBS(11)
*34 | MCS.13 | No. of users with modulation 16QAM & index TBS(12)
*35 | MCS.14 | No. of users with modulation 16QAM & index TBS(13)
*36 | MCS.15 | No. of users with modulation 16QAM & index TBS(14)
*37 | MCS.16 | No. of users with modulation 16QAM & index TBS(15)
*38 | MCS.17 | No. of users with modulation 64QAM & index TBS(15)
*39 | MCS.18 | No. of users with modulation 64QAM & index TBS(16)
*40 | MCS.19 | No. of users with modulation 64QAM & index TBS(17)
41 | MCS.20 | No. of users with modulation 64QAM & index TBS(18)
42 | MCS.21 | No. of users with modulation 64QAM & index TBS(19)
43 | MCS.22 | No. of users with modulation 64QAM & index TBS(19)
44 | MCS.23 | No. of users with modulation 64QAM & index TBS(20)
45 | MCS.24 | No. of users with modulation 64QAM & index TBS(21)
46 | MCS.25 | No. of users with modulation 64QAM & index TBS(22)
47 | MCS.26 | No. of users with modulation 64QAM & index TBS(23)
48 | MCS.27 | No. of users with modulation 64QAM & index TBS(24)
49 | MCS.28 | No. of users with modulation 64QAM & index TBS(25)
50 | MCS.29 | No. of users with modulation QPSK & TBS index reserved
51 | MCS.30 | No. of users with modulation 16QAM & TBS index reserved
52 | MCS.31 | No. of users with modulation 64QAM & TBS index reserved

Fig. 2. Data features correlations

2. A Scatter plot in Fig. 3 is used to represent the distribution between DL throughput,


traffic volume, and PRB utilization. An increase in the usage of PRB and traffic
causes a decrease in DL throughput for UEs. Also, average DL throughput for
CEUs is plotted against the average UE number. It is found that an increase in the number of UEs may lead to a decrease in CEU throughput, and vice versa, following a polynomial distribution. With an increasing number of UEs, the DL throughput decreases toward zero across the three bands, as in Fig. 4.
3. TA and index: There are significant differences between LTE bands in terms of
performance. The 900 MHz band offers superior indoor penetration and rural cov-
erage, while the 1800 MHz provides slightly improved spectrum efficiency due to
the higher possibility that MIMO channels are available. Finally, 2100 MHz assigns
better spectrum efficiency than 1800 MHz, and 900 MHz and provides better cov-
erage near the eNB. A bar plot of the three bands' indexes is shown in Fig. 5. It is shown that most traffic comes from Index0 (distance 39 m from the eNB) and Index1 (distance 195 m from the eNB). However, other indexes such as Index4, Index5, and Index6 must be used with the 1800 and 900 bands to cover the users on the edge.

Table 2. Traffic volume distribution in three bands


Band | L900 | L1800 | L2100
Avg. DL traffic volume (MB) | 297.177527 | 278.716868 | 1215.516581

Fig. 3. User DL TH according to traffic and utilization
Fig. 4. Average user DL throughput versus max UEs number

Fig. 5. TA & indexes in three bands

3.4 Phase 4: Clustering


For more visualization and clarification, the K-means clustering algorithm is used for the unlabeled data. The K-means clustering algorithm is widely used because of its simplicity and fast convergence. However, the K value of the clustering needs to be given in advance, and the choice of the K value directly affects the convergence result [18, 19]. The initial centroid of each class is determined using the distance as the metric. The elbow method is used to determine the number of clusters. Implementing the elbow method in our framework indicates that the number of clusters should be three. A scatter plot in three dimensions verifies the number of clusters, as in Fig. 6.

Fig. 6. Real data clustering
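A short sketch of the elbow method and the final K-means fit on the reduced features (X_reduced from the Phase 2 sketch; the tested K range is an assumption) is given below.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

inertia = []
ks = range(1, 10)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_reduced)
    inertia.append(km.inertia_)                # within-cluster sum of squares

plt.plot(ks, inertia, marker="o")              # the elbow appears at K = 3
plt.xlabel("Number of clusters K")
plt.ylabel("Within-cluster sum of squares")
plt.show()

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)
```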



3.5 Phase 5: Analyzing Quality Metric


This phase is responsible for discovering and analyzing the network performance. The twenty features resulting from the preceding phases are used to cluster the cells with the same attributes and to discover the overall system performance decline. Therefore, all network problems, such as throughput troubleshooting for UEs in the cell or on the edge (which we focus on), traffic load balance, and PRB utilization distribution, can be discovered during this phase. The analysis considers the overall DL throughput, the traffic volume, the number of UEs, and the network efficiency during peak hours.

4 Experimental Results and Discussion

For the first part of the analysis, the summarized results are presented based on the number of clusters. Table 3 shows the big difference in the minimum DL throughput for UEs and the minimum DL throughput for CEUs in the three clusters. This happens both during and outside peak hours. As shown in the results, the lowest throughput is recorded in the second cluster. Also, the minimum utilization is found in the second cluster, and it corresponds to the lowest traffic. However, the second cluster does not have a fair PRB utilization distribution according to each band's BW. MCS and CQI indicate that all sites are under good radio conditions; therefore, the channel condition is not the reason for the throughput degradation. Figure 7 indicates the average traffic volume for the three clusters, which shows that the third cluster has the most traffic and the second cluster has the lowest. Although the traffic volume varies largely across the three clusters, there is not much dissimilarity in the average DL throughput, as shown in Fig. 8. The performance of the proposed framework is compared against [9], which is a throughput-based traffic steering framework. The authors in [9] apply a mathematical model of throughput estimation to the data set and use it for load balancing and traffic steering. It is worth mentioning that the estimation process leads to an error in the range of 2.1% to 2.9%, which affects the load-balancing accuracy. On the other hand, the real DL throughput data set is used in the proposed framework for analyzing the network performance, which has a 0% estimation error.

Table 3. Network performance for three clusters

Features | First cluster | Second cluster | Third cluster
Avg. traffic volume (MB) | Medium (1250 to 1500 MB) | Low (200 to 400 MB) | Very high (3000 to 3600 MB)
Avg. UEs DL throughput (Mbps) | 7 to 16 | 7 to 18 | 7 to 16
Min. UEs DL throughput (Mbps), L900 | 1.2974 | 0.0549 | 1.6552
Min. UEs DL throughput (Mbps), L1800 | 0.4471 | 0.1297 | 2.4382
Min. UEs DL throughput (Mbps), L2100 | 2.9597 | 0.7952 | 0.7111
Min. CEU user DL throughput (Mbps), L900 | 0.0164 | 0.0022 | 0.043
Min. CEU user DL throughput (Mbps), L1800 | 0.0174 | 0.0012 | 0.5553
Min. CEU user DL throughput (Mbps), L2100 | 0.0462 | 0.0141 | 0.1055
Max UEs no. in each cluster, L900 | 62 | 230 | 97
Max UEs no. in each cluster, L1800 | 103 | 78 | 53
Max UEs no. in each cluster, L2100 | 169 | 150 | 340
PRB utilization, L900 | 41.6% | 12.7% | 70%
PRB utilization, L1800 | 41.6% | 12.8% | 62.7%
PRB utilization, L2100 | 23.7% | 8.8% | 47.9%
Min DL user throughput during peak hours | Low (0.5 to 4 Mbps) | Very low (0.2 to 3.8 Mbps) | Reasonable (1 to 5 Mbps)
Min. CEU DL throughput during peak hours | Low (0.5 to 1 Mbps) | Very low (0.0039 to 0.15 Mbps) | Low (0.3 to 0.5 Mbps)

In order to evaluate the performance of the clusters, the second cluster is analyzed in detail, while the behavior of the other clusters is mentioned in brief. The number of output rows in the second cluster is 10953 time rows for 99 eNBs. Peak hours are defined from 5 PM to 1 AM according to the maximum traffic volume time. Table 4 represents the minimum throughput during these hours in the cell or on the edge. The minimum DL throughput in L2100 ranges from 0.79 Mbps at 12 AM with 70 UEs (about 46.6% of the maximum UEs recorded for that cluster, as in Table 3) to 3.8 Mbps at 6 PM. However, the minimum DL user throughput in L900 is between 0.1 Mbps at 12 AM and 1 Mbps at 7 PM for all UEs. At 7 PM, the maximum number of UEs is recorded in this cluster for this band, as in Table 4. On the other hand, the minimum DL throughput in L1800 is between 0.5 Mbps and 1 Mbps at (1 AM, 5 PM) for a number of UEs in the range of 41% to 93% of the total UEs recorded in this cluster, as in Table 4. CEUs also have very low DL throughput during peak hours in the three bands (from 0.003 Mbps to 0.1 Mbps).

Fig. 7. Avg. traffic volume
Fig. 8. Avg. DL user throughput



Table 4. Performance parameters during peak hours in the second cluster


Peak hours | Min DL throughput for UEs (Mbps): L900, L1800, L2100 | Min DL throughput for CEUs (Mbps): L900, L1800, L2100 | Max UEs number: L900, L1800, L2100
12:00:00 AM 0.156 0.2306 0.7952 0.0121 0.0278 0.0248 36 29 70
1:00:00 AM 0.3787 0.1297 2.9959 0.0553 0.0137 0.0684 46 32 48
5:00:00 PM 0.416 0.5426 3.3255 0.0885 0.0078 0.0652 43 73 99
6:00:00 PM 0.5506 0.3564 3.8762 0.0781 0.0323 0.1415 228 53 84
7:00:00 PM 1.0832 0.1982 2.9897 0.0039 0.0078 0.1563 230 68 71
8:00:00 PM 0.8198 0.5107 1.8825 0.0564 0.0169 0.1875 34 39 82
9:00:00 PM 0.8327 0.6064 3.5418 0.0393 0.0689 0.1523 34 37 76
10:00:00 PM 0.8962 0.5864 1.9299 0.0134 0.0469 0.2578 32 55 66
11:00:00 PM 0.6756 0.2293 2.1229 0.0438 0.0443 0.0547 44 57 60

5 Conclusion

This paper focuses on using real LTE-A heavy-traffic data to study real mobile network problems. Although clustering of sites in the cellular network seems to be saturated, analyzing a data set of 312 cells with 20 radio KPI features revealed several problems. The timing advance index indicates that all cell bands cover users near the site regardless of far users; this is one of the reasons for the bad DL throughput of CEUs, and the 1800 and 900 bands should cover users on the edge. PRB utilization is not well distributed: L2100 had the lowest utilization even though it has the largest BW (10 MHz) and the largest traffic volume in all clusters. The second cluster has the lowest minimum DL throughput at peak hours. Moreover, all UEs (100% of the maximum UE count) experience this minimum throughput in this cluster, although CQI and MCS are good. In the second cluster, CEUs have very bad throughput during the peak in all bands. The low delivered throughput is due to poor load distribution among the three bands of each site and inadequate resource utilization, where network parameters should be optimized to give users better QoS and to enhance the coverage of each band. Therefore, our aim in future work is to optimize network parameters using ML to achieve the predicted traffic load volume in order to enhance DL throughput, especially for CEUs, and to use an appropriate regression algorithm to quantify the enhancement in spectrum efficiency.

References
1. Nashaat, H., Rizk, R.: Handover management based on location based services in F-HMIPv6
networks. KSII Trans. Internet Inf. Syst. (TIIS) 15(4), 192–209 (2018)
2. Rizk, R., Nashaat, H.: Smart prediction for seamless mobility in F-HMIPv6 based on
location based services. China Commun. 15(4), 192–209 (2018)
3. Nashaat, H.: QoS-aware cross layer handover scheme for high-speed vehicles. KSII Trans.
Internet Inf. Syst. (TIIS) 12(1), 135–158 (2018)

4. Kukliński, S., Tomaszewski, L.: Key performance indicators for 5G network slicing. In:
2019 IEEE Conference on Network Softwarization (NetSoft), Paris, France, pp. 464–471
(2019)
5. Salo, J., Eduardo Zacarías, B.: Analysis of LTE radio load and user throughput. Int.
J. Comput. Netw. Commun. (IJCNC) 9(6), 33–45 (2017)
6. Luong, V., Do, S., Bao, S., Paul, L., Li-Ping, T.: Applying big data, machine learning, and
SDN/NFV to 5G traffic clustering, forecasting, and management. In: 2018 IEEE
International Conference on Network Softwarization and Workshops (Netsoft), Montreal,
Canada (2018)
7. Lars, C., Shmelz, S.: Adaptive SON management using KPI measurements. In: IEEE IFIP
Conference on Network Operations and Management Symposium (NOMS), Istanbul,
Turkey (2016)
8. Soren, H., Michael, S., Thomas, K.: Impact of SON function combinations on the KPI
behavior in realistic mobile network scenarios. In: IEEE Wireless Communication and
Network Conference Workshops (WCNCW), Barcelona, Spain (2018)
9. Gimenez, L., Kovacs, I., Wigard, J., Pedersen, K.: Throughput-based traffic steering in LTE-
advanced HetNet deployments. In: IEEE Vehicular Technology Conference, Boston, USA
(2015)
10. Abo Hashish, S., Rizk, R., Zaki, F.: Joint energy and spectral efficient power allocation for
long term evolution-advanced. Comput. Electr. Eng. 72, 828–845 (2018)
11. Nashaat, H., Refaat, O., Zaki, F., Shaalan, E.: Dragonfly-based joint delay/energy LTE downlink scheduling algorithm. IEEE Access 8, 35392–35402 (2020)
12. Ralf, K.: Key performance indicators and measurements for LTE radio network optimiza-
tion. In: LTE Signaling, Troubleshooting and Performance Measurement, pp. 267–336
(2015)
13. Wang, K., Fu, J., Wang, K.: SPARK-a big data processing platform for machine learning. In:
International Conference on Industrial Informatics - Computing Technology, Intelligent
Technology, Industrial Information Integration (ICIICII), pp. 48–51 (2016)
14. Bejarano, J., Toril, M.: Data-driven algorithm for indoor/outdoor detection based on
connection traces in a LTE network. IEEE Access 7, 65877–65888 (2019)
15. Salman, M., Ng, C., Noordin, K.: CQI-MCS mapping for green LTE downlink transmission.
In: Proceedings of the Asia-Pacific Advanced Network, vol. 36, pp. 74–82 (2013)
16. Yong seog, K., Filippo, M., Nick, W.: Feature selection in data mining. In: Data Mining: Opportunities and Challenges, pp. 80–105 (2003)
17. A little book of python for multivariate analysis. https://python-for-multivariate-analysis.
readthedocs.io/a_little_book_of_python_for_multivariate_analysis.html
18. Theyazn, T., Joshi, M.: Integration of time series models with soft clustering to enhance
network traffic forecasting. In: 2016 Second International Conference on Research in
Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India
(2016)
19. Yuan, C., Yang, H.: Research on K-value selection method of K-means clustering algorithm.
J. Multi. Sci. J. 2(2), 226–235 (2019)
Robust Kinematic Control of Unmanned
Aerial Vehicles with Non-holonomic
Constraints

Ahmad Taher Azar1,2(B), Fernando E. Serrano3, Nashwa Ahmad Kamal4, and Anis Koubaa1

1 Robotics and Internet of Things Lab (RIOTU), Prince Sultan University, Riyadh, Saudi Arabia
{aazar,akoubaa}@psu.edu.sa
2 Faculty of Computers and Artificial Intelligence, Benha University, Banha, Egypt
ahmad.azar@fci.bu.edu.eg, ahmad_t_azar@ieee.org
3 Faculty of Engineering and Architecture, Universidad Tecnologica Centroamericana (UNITEC), Zona Jacaleapa, Tegucigalpa, Honduras
feserrano@unitec.edu, serranofer@eclipso.eu
4 Faculty of Engineering, Cairo University, Giza, Egypt
nashwa.ahmad.kamal@gmail.com

Abstract. This paper presents a robust kinematic control of unmanned aerial vehicles (UAVs) with non-holonomic constraints. The studied system consists of a 2D UAV non-holonomic kinematic model represented as a driftless system with its state and control inputs. The first part of this study proves that the model under study is non-holonomic in view of its involutivity properties. The second part of
this study consists of the design of a robust kinematic controller for an
unmanned aerial vehicle (UAV) in which the states of the systems are
used for feedback control and the desired angular and linear velocities
are precisely tracked by the proposed controller approach. This control
strategy is achieved by designing the appropriate Lyapunov functional to
meet the robust stability conditions and by finding the switching gains to
track the desired profile. The control strategy obtained is tested in the
proposed 2D mathematical model of the unmanned aerial vehicle and
it is confirmed that the system variables track the desired profile while
keeping the angular and linear velocity bounded. This study concludes
with a discussion of the results and the respective conclusions.

Keywords: Unmanned aerial vehicles (UAV) · Non-holonomic constraints · Robust kinematic control · Driftless systems · Lyapunov stability

© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 839–850, 2021.
https://doi.org/10.1007/978-3-030-58669-0_74

1 Introduction
Non-holonomic constraints can be found in different types of robotic and mechan-
ical systems. Holonomy is directly related to involutivity, i.e. when the Lie
bracket of any pair of vector fields spans the base set. Non-holonomic con-
straints allow the definition of a robotic kinematic model and unmanned aerial
vehicles as driftless control systems with state and input variables that allow
for an appropriate kinematic control design. Examples of non-holonomic sys-
tems can be found, for example, in [1], where spherical formation tracking control of non-holonomic aircraft-like vehicles is demonstrated. The control strategy presented in that work is based on a backstepping controller in which two filters are used to replace the virtual variables in the backstepping control design [3,17,19–21]. Another interesting strategy can be found in [11], where an adaptive finite-time tracking control for multiple non-holonomic unmanned aerial vehicles is designed. The backstepping controller is obtained by applying Lyapunov theory to solve the tracking problem for the UAV swarm, and the non-holonomic constraints are addressed by a transverse function. In [10], a time-varying formation tracking control for multiple UAVs with non-holonomic constraints is presented; similar to the previous study, a backstepping controller is designed, and chattering is avoided by means of a hysteretic quantizer.
Scientists have researched non-holonomy for different forms of mechanical
systems. Studies such as [12] show that the problem with the rolling motion
of two rigid cylindrical bodies leads to a non-linear motion equation and con-
firm that the geometric theory of non-holonomic constraint systems provides an
efficient tool to solve this kind of problem. Normal forms and singularities of non-
holonomic robots can be found in [16]. The authors in that study intended to
describe the singularities of non-holonomic robotic systems with their respective
controllers. In addition, the authors in [5] provided an important contribution to
the control of the formation of non-holonomic agents. A distributed controller is
designed to control the navigation of non-holonomic agents. A quad-rotor guidance control for non-holonomic vehicles is proposed in [6], where a velocity-based strategy for following a 3-D prescribed path is developed. In [22], a control of redundant robot arms with null-space compliance and singularity-free orientation representation is provided. One of the important results of that research is that the controller does not need sensor information.
Robust control has been an important strategy to solve various stabilization
problems in mechanical systems. Examples of such controllers can be found, for
example, in [2,4], where a robust controller is designed to control an unmanned underwater vehicle in the light of parametric interval uncertainties. Another example can be found in [13], where the design of a robust controller for automated lateral driving control is shown. Very interesting results that can be implemented in mechanical systems can be found in [15], where robust output feedback control is applied to different nonlinear complex systems. In [14], robust output feedback stabilization of the angular velocity of a rigid body is demonstrated; other interesting control strategies can be found in studies such as [9].

A robust kinematic control of non-holonomic unmanned aerial vehicles is


presented in this paper. The first part of this paper is devoted to showing the
2D non-holonomical kinematic model of the UAV along with the definitions of
holonomy and involutivity showing that this mathematical model has the limita-
tions mentioned above [18]. The second part of this study consists of formulating
robust kinematic control laws that consider the robust stability of the feedback
system in order to track the angular and linear reference velocity [8]. Robust
switching gains are achieved by maintaining the stability of the closed loop sys-
tem. A numerical experiment is being conducted to corroborate the theoretical
results explained in this study.
This paper is organized as follows: In Sect. 2, the problem formulation is presented. The robust kinematic controller design is described in Sect. 3. Numerical experiments are presented in Sect. 4. A discussion of the results is presented in Sect. 5. Finally, the paper is concluded in Sect. 6 along with future directions.

2 Problem Formulation
First of all, it is important to mention the definitions of holonomy and involu-
tivity. The Pfaffian constraint is defined as follows [7]:
⟨ωi, q̇i⟩ = 0    (1)
where qi are the generalized coordinates. The following property describes the holonomy of a system [18]:
Property 1. If there is a function hi with ωi = dhi for (1), where dhi = [∂hi/∂q1, ..., ∂hi/∂qn], the constraints are holonomic; otherwise, the constraints are non-holonomic.
Now, consider the following property for involutivity [18]:
Property 2. Consider the vector fields g1 and g2. If the Lie bracket [g1, g2] belongs to Δ = span{g1, ..., gn}, then the constraints are holonomic; otherwise they are non-holonomic. This means that [g1, g2] can be represented as a linear combination of the elements of Δ or, in other words, the distribution is involutive.
Another important definition is the description of a driftless system such as:
q̇ = Σᵢ gi(q) ui    (2)

where ui are the inputs of the control systems and gi (q) are the respective vector
fields if gi ∈ Ω ⊥ where ωi ∈ Ω [18]. In order to establish the mathematical model
for the controller design, consider the following non-holonomic 2D kinematic
model of an unmanned aerial vehicle [10,11]:
[ẋ, ẏ, φ̇]^T = [0, 0, 1]^T w + [cos(φ), sin(φ), 0]^T v
q̇ = g1 u1 + g2 u2    (3)

Where q = [x, y, φ]T , u1 = w ∈ R is the input angular velocity and u2 = v ∈


R is the linear velocity. Consider the Lie bracket for g1 and g2 [18]:
[g1, g2] = Lg1 g2 − Lg2 g1 = (∂g2/∂q) g1 − (∂g1/∂q) g2    (4)
So, from (3) and (4), the following result is obtained:
[g1, g2] = [−sin(φ), cos(φ), 0]^T    (5)
Therefore, (5) is not included in Δ = span[g1 , g2 ] and this confirms that the
constraints are non-holonomic.
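The involutivity test above can also be checked symbolically. The following short sketch (an illustration added here, not part of the original derivation) computes the Lie bracket of g1 and g2 for the model (3) and verifies that it does not lie in span{g1, g2}:

```python
import sympy as sp

x, y, phi = sp.symbols('x y phi')
q = sp.Matrix([x, y, phi])
g1 = sp.Matrix([0, 0, 1])                      # vector field multiplying the angular velocity w
g2 = sp.Matrix([sp.cos(phi), sp.sin(phi), 0])  # vector field multiplying the linear velocity v

# Lie bracket [g1, g2] = (dg2/dq) g1 - (dg1/dq) g2, as in Eq. (4)
bracket = g2.jacobian(q) * g1 - g1.jacobian(q) * g2
print(bracket.T)                               # -> [-sin(phi), cos(phi), 0], matching Eq. (5)

# If the bracket were in span{g1, g2}, the 3x3 matrix [g1 g2 bracket] would be singular.
det = sp.simplify(sp.Matrix.hstack(g1, g2, bracket).det())
print(det)                                     # -> 1 (nonzero), so the constraints are non-holonomic
```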

3 Robust Kinematic Controller Design


A robust kinematic controller is designed from (3), treating it as a driftless system as defined in (6):
q̇ = g1 u1 + g2 u2    (6)
where q̇ = f (q, u). So, the following term is obtained as found in [8]:
D = Lf V + αv
D = ∇V · f(q, u) + αv
D = ∇V · g1 u1 + ∇V · g2 u2 + αv
D = ψ1 u1 + ψ2 u2 + ψ0    (7)
where ψ0 = αv, ψ1 = ∇V · g1 and ψ2 = ∇V · g2 with ψ0, ψ1, ψ2 ∈ R. The following condition is then imposed [8]:
K = {u ∈ U : ψ0 + ψ1 u1 + ψ2 u2 ≤ 0} (8)
Consider the following error variables:
u1 = λ1 (wd − w) = λ1 ew
u2 = λ2 (vd − v) = λ2 ev (9)
where λ1 , λ2 ∈ R are the controller gains. Please note that the measured outputs
are as follows:
w = φ̇
v = ‖[ẋ, ẏ]^T‖ = ‖[v cos(φ), v sin(φ)]^T‖    (10)
Then, by substituting the error variables (9) into (8), the condition is con-
verted into:
K = {u ∈ U : ψ0 + λ1 ψ1 ew + λ2 ψ2 ev ≤ 0}
K = {u ∈ U : ψ0 + λ1 δ1 + λ2 δ2 ≤ 0} (11)

where δ1 = ψ1 ew and δ2 = ψ2 ev .
In order for the condition (11) to be met, the robust gains are tuned as:

λ1 = −ψ0/(2δ1)  when ψ0 > 0 and δ1 > 0;   λ1 = k1  when ψ0 ≤ 0 and δ1 ≤ 0
λ2 = −ψ0/(2δ2)  when ψ0 > 0 and δ2 > 0;   λ2 = k2  when ψ0 ≤ 0 and δ2 ≤ 0    (12)

for k1, k2 ∈ R+. Now, considering again that q̇ = f(q, u), take the Lyapunov function V(q) = (1/2) q^T q and αv = q^T q. From the derivative V̇(q) = q^T q̇ = ∇V(q) · f, we conclude that ∇V(q) = q^T. The block diagram of the proposed robust kinematic automatic control system is shown in Fig. 1, where the non-holonomic constraints are included in the kinematic controller design.
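As a rough illustration of how the switching law (12) and the error feedback (9) could be put together, the sketch below simulates the driftless model (3) under the proposed gain rule. It is only a sketch under stated assumptions: the desired velocity profiles, the constant gains k1, k2 and the handling of the sign combinations not covered by (12) (for example ψ0 > 0 with δ ≤ 0) are choices made here for illustration, not values given in the paper.

```python
import numpy as np

def switching_gain(psi0, delta, k):
    """Switching gain of Eq. (12); the mixed-sign cases are not specified in the
    paper, so this sketch falls back to the constant gain k there (an assumption)."""
    if psi0 > 0 and delta > 0:
        return -psi0 / (2.0 * delta)
    return k

def simulate(dt=1e-3, T=14.0, k1=2.0, k2=2.0):
    q = np.array([0.1, -0.05, 0.1])       # state [x, y, phi]; initial condition assumed
    w = v = 0.0                            # previously applied angular / linear velocities
    history = []
    for t in np.arange(0.0, T, dt):
        wd = 0.6 * np.sin(0.5 * t)         # desired angular velocity (assumed profile)
        vd = 1.0 * np.sin(0.5 * t)         # desired linear velocity (assumed profile)
        ew, ev = wd - w, vd - v            # error variables of Eq. (9)
        g1 = np.array([0.0, 0.0, 1.0])
        g2 = np.array([np.cos(q[2]), np.sin(q[2]), 0.0])
        psi0 = q @ q                       # alpha_v = q^T q
        delta1, delta2 = (q @ g1) * ew, (q @ g2) * ev
        lam1 = switching_gain(psi0, delta1, k1)
        lam2 = switching_gain(psi0, delta2, k2)
        w, v = lam1 * ew, lam2 * ev        # control inputs u1 = w, u2 = v
        q = q + dt * (g1 * w + g2 * v)     # Euler step of the driftless model (3)
        history.append((t, q.copy(), w, v))
    return history
```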

Fig. 1. Block diagram of the robust kinematic automatic control system

4 Numerical Experiments
In this section, the non-holonomic kinematic model of an unmanned aerial vehicle (3) is used to test the designed controller. Two numerical examples are provided: the first implements the desired sinusoidal linear and angular velocity profiles, and the second implements a step profile to check the time required for the error variable to converge to zero.

4.1 Numerical Experiment 1

The desired velocity profiles wd and vd are sinusoidal functions truncated at a certain value. These input functions were chosen because they are suitable for evaluating the dynamic behavior when non-holonomic constraints are present in the analyzed system. It is important to remember that the results of the proposed control strategy are compared with updated versions of the robust controllers used in [15].

[Figure: velocity for q̇3 (rad/s) versus time (s); curves: Reference, Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 2. Angular velocity q̇3

In Fig. 2, the angular velocity q̇3 = φ̇ is compared with the reference variable along with the results of the strategies shown in [15]. The results show that the velocity profile is tracked with a small error relative to the other control approaches. The integral squared error (ISE) for this variable is reported in Table 1, confirming that the smallest ISE is obtained by the proposed control strategy, even though this result is very close to the results obtained by the approaches of [15].
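For reference, the ISE values of Table 1 can be computed from the simulated error samples with a simple numerical quadrature; the sketch below uses the trapezoidal rule and assumes the time and error arrays come from the simulation (they are not reproduced here).

```python
import numpy as np

def integral_squared_error(t, e):
    """ISE = integral of e(t)^2 dt over the simulation horizon,
    approximated with the trapezoidal rule on sampled data."""
    t, e = np.asarray(t), np.asarray(e)
    return np.trapz(e ** 2, t)
```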
Something similar can be corroborated in Fig. 3 and Fig. 4, where the linear velocity is precisely tracked by the proposed control approach while the error is reduced when the velocity trajectory profile is followed. It can also be observed in both figures that the results obtained outperform those of [15].

[Figure: velocity for q̇1 (m/s) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 3. Velocity q̇1



[Figure: velocity for q̇2 (m/s) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 4. Velocity q̇2

Table 1. Integral squared error for the velocity variable q̇3

Control strategy          | Integral squared error
Proposed strategy         | 3.27232
Mohamed and Alamir, 2018  | 3.88184
Alamir et al., 2017       | 4.48075

In Fig. 5 and Fig. 6, the linear position in y and the rotation angle φ of the UAV are shown, respectively. The results show the evolution of both variables over time, which maintain a sinusoidal trajectory driven by the reference velocity. Finally, in Fig. 7 and Fig. 8, the control inputs for the kinematic non-holonomic UAV are shown to confirm that the input velocities w and v do not reach excessively high values, so these input variables can be implemented in practice.

[Figure: distance q2 (m) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 5. Position q2

[Figure: angle q3 (rad) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 6. Angle q3

[Figure: input u1 (rad/s) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 7. Input u1

[Figure: input u2 (m/s) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 8. Input u2

[Figure: velocity for q̇3 (rad/s) versus time (s); curves: Reference, Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 9. Evolution in time of the variable q̇3 = φ̇

4.2 Numerical Experiment 2


Similar to the previous example, the results obtained are compared with those of [15], now implementing a step reference in order to measure the time at which the error variable converges to zero.
As can be noticed in Fig. 9, the proposed controller strategy provides the most accurate result compared to the modified strategies shown in [15]. The accuracy is significantly improved by the proposed controller and, as confirmed later, it is obtained with an appropriate control effort and a small error.
As shown in Fig. 10, the error converges to zero by 5.1 s (0.1 s after the step transition), so it can be corroborated that this strategy achieves a shorter convergence time than the other approaches.

[Figure: error for the variable q̇3 (rad/s) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 10. Error ew of the variable q̇3



[Figure: input u1 (rad/s) versus time (s); curves: Proposed Strategy, Mohamed and Alamir 2018, Alamir et al. 2017]

Fig. 11. Input variable u1 for the experiment 2

Finally, in Fig. 11, the control effort is shown; as verified, this control input is bounded and therefore no excessive values are reached. The theoretical and experimental results are analyzed in depth in the following discussion section.

5 Discussion
The results shown in the theoretical derivations of the proposed controller show
that robust stability is achieved by selecting appropriate controller gains. As
confirmed in the numerical simulation examples, despite their structure, the
obtained switching control gains do not produce chattering or unwanted oscil-
lations. The non-holonomic constraints found in the kinematic model of the
unmanned aerial vehicle analyzed have been effectively overcome by a robust
controller to provide high maneuverability.
The results obtained in this paper can be of assistance to scientists who need
a robust, scalable and clear kinematic control technique for unmanned aerial
vehicles of any type. Another important finding is that the obtained velocity is
not excessive due to the robust controller action and the switching gains yield
bounded inputs, provided that the unmanned aerial vehicle kinematic model is
designed as a driftless system.

6 Conclusion
This paper proposes a robust kinematic controller for unmanned aerial vehicles with non-holonomic constraints. Non-holonomic constraints are found in many types of mechanical systems, and involutivity is the key property for concluding whether the constraints are holonomic. The robust control laws obtained in this paper are essentially based on the angular and linear velocities in order to track the desired position and velocity. The gains are tuned by a switching law that avoids the chattering effect, which produces unwanted oscillations that can lead to instability, as demonstrated by the theoretical derivation and corroborated by the numerical experiments.

References
1. Ai, X., Chen, Y.Y., Zhang, Y.: Spherical formation tracking control of non-
holonomic aircraft-like vehicles in a spatiotemporal flowfield. J. Franklin Inst.
(2020)
2. Ammar, H.H., Azar, A.T.: Robust path tracking of mobile robot using fractional
order PID controller. In: The International Conference on Advanced Machine
Learning Technologies and Applications (AMLTA 2019). Advances in Intelligent
Systems and Computing, vol. 921, pp. 370–381. Springer, Cham (2020)
3. Azar, A.T., Serrano, F.E., Flores, M.A., Vaidyanathan, S., Zhu, Q.: Adaptive
neural-fuzzy and backstepping controller for port-Hamiltonian systems. Int. J.
Comput. Appl. Technol. 62(1), 1–12 (2020)
4. Azar, A.T., Serrano, F.E., Hameed, I.A., Kamal, N.A., Vaidyanathan, S.: Robust
H-infinity decentralized control for industrial cooperative robots. In: Proceedings
of the International Conference on Advanced Intelligent Systems and Informatics
2019. Advances in Intelligent Systems and Computing, vol. 1058, pp. 254–265.
Springer, Cham (2020)
5. Barogh, S.A., Werner, H.: Cooperative source seeking with distance-based for-
mation control and non-holonomic agents. IFAC-PapersOnLine 50(1), 7917–7922
(2017)
6. Bouzid, Y., Bestaoui, Y., Siguerdidjane, H., Zareb, M.: Quadrotor guidance-control
for flight like nonholonomic vehicles, pp. 980–988 (2018)
7. Choset, H., Lynch, K.M., Hutchinson, S., Kantor, G.A., Burgard, W., Kavraki,
L.E., Thrun, S. (eds.): Principles of Robot Motion: Theory, Algorithms, and Imple-
mentation. Intelligent Robotics and Autonomous Agents Series. The MIT Press,
Cambridge (2005)
8. Freeman, R.A., Kokotovic, P.V.: Robust Nonlinear Control Design. Birkhauser
(1996)
9. Hernández-Torres, D., Riu, D., Sename, O.: Reduced-order robust control of a fuel
cell air supply system. IFAC-PapersOnLine 50(1), 96–101 (2017)
10. Hu, J., Sun, X., He, L.: Time-varying formation tracking for multiple UAVs with
nonholonomic constraints and input quantization via adaptive backstepping con-
trol. Int. J. Aeronaut. Space Sci. 20(3), 710–721 (2019)
11. Hu, J., Sun, X., Liu, S., He, L.: Adaptive finite-time formation tracking control for
multiple nonholonomic UAV system with uncertainties and quantized input. Int.
J. Adapt. Control Signal Process. 33(1), 114–129 (2019)
12. Janova, J., Musilova, J.: Non-holonomic mechanics: a geometrical treatment of
general coupled rolling motion. Int. J. Non-Linear Mech. 44(1), 98–105 (2009)
13. Korus, J.D., Karg, P., Ramos, P.G., Schutz, C., Zimmermann, M., Muller, S.:
Robust design of a complex, perturbed lateral control system for automated driv-
ing. IFAC-PapersOnLine 52(8), 1–6 (2019)
14. Mazenc, F., Astolfi, A.: Robust output feedback stabilization of the angular velocity
of a rigid body. Syst. Control Lett. 39(3), 203–210 (2000)
15. Mohamed, A., Alamir, M.: Robust output feedback controller for a class of nonlin-
ear systems with actuator dynamics. IFAC-PapersOnLine 51(25), 275–280 (2018)
16. Ratajczak, J., Tchon, K.: Normal forms and singularities of non-holonomic robotic
systems: a study of free-floating space robots. Syst. Control Lett. 138, 104661 (2020)

17. Shukla, M.K., Sharma, B.B., Azar, A.T.: Control and synchronization of a frac-
tional order hyperchaotic system via backstepping and active backstepping app-
roach. In: Mathematical Techniques of Fractional Order Systems. Advances in
Nonlinear Dynamics and Chaos (ANDC), pp. 559–595. Elsevier (2018)
18. Spong, M., Hutchinson, S., Vidyasagar, M.: Robot Modeling and Control. Wiley,
Hoboken (2006)
19. Vaidyanathan, S., Azar, A.T.: Adaptive backstepping control and synchronization
of a novel 3-D jerk system with an exponential nonlinearity. In: Advances in Chaos
Theory and Intelligent Control, pp. 249–274. Springer, Berlin (2016)
20. Vaidyanathan, S., Idowu, B.A., Azar, A.T.: Backstepping controller design for the
global chaos synchronization of Sprott’s jerk systems. In: Chaos Modeling and
Control Systems Design. Studies in Computational Intelligence, vol. 581, pp. 39–
58. Springer, Cham (2015)
21. Vaidyanathan, S., Jafari, S., Pham, V.T., Azar, A.T., Alsaadi, F.E.: A 4-D chaotic
hyperjerk system with a hidden attractor, adaptive backstepping control and cir-
cuit design. Arch. Control Sci. 28(2), 239–254 (2018)
22. Vigoriti, F., Ruggiero, F., Lippiello, V., Villani, L.: Control of redundant robot
arms with null-space compliance and singularity-free orientation representation.
Robot. Auton. Syst. 100, 186–193 (2018)
Nonlinear Fractional Order System
Synchronization
via Combination-Combination
Multi-switching

Shikha Mittal1, Ahmad Taher Azar2,3(B), and Nashwa Ahmad Kamal4

1 Department of Mathematics, Jesus and Mary College, University of Delhi, New Delhi, India
sshikha7014@gmail.com
2 Robotics and Internet of Things Lab (RIOTU), Prince Sultan University, Riyadh, Saudi Arabia
aazar@psu.edu.sa
3 Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
ahmad.azar@fci.bu.edu.eg, ahmad_t_azar@ieee.org
4 Faculty of Engineering, Cairo University, Giza, Egypt
nashwa.ahmad.kamal@gmail.com

Abstract. This manuscript focuses on multi-switching combination-combination synchronization of nonlinear fractional-order systems. Two master systems and two slave systems are considered for the combination-combination multi-switching synchronization scheme, in which different states of the master systems are synchronized with different states of the slave systems. The scheme is applied to the nonlinear fractional-order Lorenz system. The stability of the fractional-order error dynamics is studied via pole placement. Theoretical results and numerical simulations are given to show that the method is accurate and practical.

Keywords: Fractional order chaotic system · Chaos synchronization ·


Commensurate system · Fractional stability

1 Introduction
The main advantage of fractional calculus is that it has memory, which makes it a very suitable way to represent the memory and hereditary properties of different materials and processes [17]. A fractional-order system describes certain structures in interdisciplinary fields more accurately than an integer-order one. Fractional-order chaotic systems have more complex dynamics than integer-order chaotic systems [1,7]. Therefore, the study of their behavior and the assessment of their dynamics is a critical issue that has attracted much of the researchers' attention today. Furthermore, applications of fractional calculus
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 851–861, 2021.
https://doi.org/10.1007/978-3-030-58669-0_75

have been reported in many areas such as image processing, signal processing and automatic control [16]. The importance of analyzing and studying fractional-order complex systems is well illustrated by these and many other examples.
In 1990, the master-slave system concept was employed by Pecora and Carroll [15] to synchronize chaotic systems, and Ott et al. [8] published the OGY approach to chaos control in the same year. A variety of methods have been developed to control chaos and to synchronize identical and non-identical systems, for example optimal control [18], adaptive control [4,23–25,27], active control [2,20], sliding mode control [4] and backstepping control [28], in the search for better chaos control and synchronization strategies. As a result of the rapidly growing interest in chaos control and synchronization, different synchronization schemes have been proposed and reported, for example complete synchronization [2], generalized synchronization [6,13], combination-combination synchronization [5], projective synchronization [12,14,26], hybrid synchronization [10,11] and combination synchronization [6].
This paper generalizes the combination synchronization scheme, from which other types of synchronization schemes can be derived. The combination-combination synchronization method is simple and applicable to real-world devices. Moreover, it also provides a better understanding of the dynamic synchronization and multiple pattern configurations that take place in real-world systems, because real-world synchronization is dynamic. The theory of multi-switching synchronization has made some progress over the past years [19,21,22]. In multi-switching synchronization schemes, various states of the master system are synchronized with the desired states of the slave system. The relevance of these forms of synchronization to information security is evident from the extensive range of possible synchronization directions that multi-switching can achieve. Although such schemes offer improved resistance and anti-attack capability for secure communication, only a few studies of this kind have been reported. Motivated by the above discussion, in this paper we examine and study multi-switching combination-combination synchronization of fractional-order chaotic systems.
This paper is organized as follows: Preliminaries on fractional derivatives are presented in Sect. 2. The problem formulation for multi-switching combination-combination synchronization is defined in Sect. 3. In Sect. 4, an illustrative example is given by considering the nonlinear fractional-order Lorenz chaotic system. Numerical simulations are given in Sect. 5. Finally, concluding comments are given in Sect. 6.

2 Preliminaries

The Caputo fractional derivative [3] is defined as follows:
D_t^p x(t) = J^{m−p} x^{(m)}(t), with 0 < p ≤ 1    (1)
where m = [p], i.e., m is the first integer not less than p, x^{(m)} is the mth-order derivative in the usual sense, and J^q (q > 0) is the q-order Riemann-Liouville integral operator defined by
J^q y(t) = (1/Γ(q)) ∫₀ᵗ (t − τ)^{q−1} y(τ) dτ    (2)
where Γ denotes the Gamma function.


Lemma 1 [16]. Suppose f(t) has a continuous kth derivative on [0, t] (k ∈ N, t > 0), and let p, q > 0 be such that there exists some ℓ ∈ N with ℓ ≤ k and p, p + q ∈ [ℓ − 1, ℓ]. Then
D_t^p D_t^q f(t) = D_t^{p+q} f(t)    (3)

Lemma 2 [17]. The Laplace transform of the Caputo fractional derivative reads
L{D_t^p f(t)} = s^p F(s) − Σ_{k=0}^{n−1} s^{p−k−1} f^{(k)}(0),   (p > 0, n − 1 < p ≤ n).    (4)
In particular, when p ∈ (0, 1], we have L{D_t^p f(t)} = s^p F(s) − s^{p−1} f(0).

Lemma 3 [17]. The Laplace transform of the Riemann-Liouville fractional integral satisfies
L{J^p f(t)} = s^{−p} F(s),   (p > 0).    (5)

Lemma 4 [7]. The n-dimensional fractional-order linear system D_t^{p_i} x_i(t) = Σ_{j=1}^{n} a_ij x_j(t), 1 ≤ i ≤ n, where 0 < p_i ≤ 1, is asymptotically stable if all roots λ of the equation
det( diag(λ^{M p_1}, λ^{M p_2}, ..., λ^{M p_n}) − A ) = 0    (6)
satisfy |arg(λ)| > π/(2M), where M is the least common multiple of the denominators of the p_i's and A = (a_ij)_{n×n}.

Lemma 5 [9]. The n-dimensional fractional order system: Dtp X (t) = AX (t),
where 0 < p ≤ 1 and A ∈ Rn×n , is asymptotically stable if A is a negative
definite matrix.

3 Problem Formulation
In this section, we present the multi-switching combination-combination synchronization scheme for nonlinear fractional-order chaotic systems.
Consider the fractional-order master systems described below:
d^q x(t)/dt^q = A1 x + B1(x)    (7)

d^q y(t)/dt^q = A2 y + B2(y)    (8)
where the state vectors of the master systems (7) and (8) are given by x = (x1, x2, ..., xn)^T ∈ R^n and y = (y1, y2, ..., yn)^T ∈ R^n, respectively, A1, A2 ∈ R^{n×n} are constant matrices, and B1, B2 : R^n → R^n are nonlinear vector functions.
Let’s consider the two slave systems via equation

dq z(t)
= A3 z + B3 (z) + ξ (9)
dtq
dq w(t)
= A4 w + B4 (w) + η (10)
dtq
where the state vectors of the response systems (9) and (10) are given by z = (z1, z2, ..., zn)^T and w = (w1, w2, ..., wn)^T, A3, A4 ∈ R^{n×n} are constant matrices, and B3, B4 : R^n → R^n are nonlinear vector functions. The control vectors ξ = (ξ1, ξ2, ..., ξn)^T and η = (η1, η2, ..., ηn)^T are designed to accomplish the desired result.
Definition 1. Multi-switching combination-combination synchronization is achieved between the two master systems (7) and (8) and the two slave systems (9) and (10) if the following requirement is met:
lim_{t→∞} ‖e‖ = lim_{t→∞} ‖Kx + Ly − Mz − Nw‖ = 0
where K, L, M and N ∈ R^{n×n} are constant diagonal matrices, not all zero, ‖·‖ denotes the vector norm, and e = Kx + Ly − Mz − Nw is the synchronization error vector.
For convenience, let us assume K = diag(α1, α2, ..., αn), L = diag(β1, β2, ..., βn), M = diag(γ1, γ2, ..., γn) and N = diag(δ1, δ2, ..., δn); then the components of the error vector e are expressed as
e_klmn = αk xk + βl yl − γm zm − δn wn
The appropriate options for the indices to obtain the error states are:
k = l = m ≠ n,  k = l = n ≠ m,  k = m = n ≠ l,  m = l = n ≠ k,
k = l ≠ m = n,  k = m ≠ n = l,  k = n ≠ m = l,  k = l ≠ n ≠ m,
k = m ≠ n ≠ l,  k = n ≠ l ≠ m,  k ≠ l = m ≠ n,  k ≠ l ≠ m = n,
k ≠ m ≠ l = n,  k = l = m = n
The transformed error dynamical system is expressed with the help of equations
(7), (8), (9) and (10) as:
d^q e(t)/dt^q = K(A1 x + B1(x)) + L(A2 y + B2(y)) − M(A3 z + B3(z)) − N(A4 w + B4(w)) − U(x, y, z, w)    (11)
where
U(x, y, z, w) = Mξ + Nη

The goal is to design the controller U(x, y, z, w) so as to achieve multi-switching combination-combination synchronization in accordance with Definition 1, and to reduce the error dynamics to
d^q e(t)/dt^q = (A − P) e(t)    (12)
in such a manner that the eigenvalues of A − P satisfy |arg(λi)| > απ/2, i = 1, 2, 3, ..., n, which in turn ensures the achievement of multi-switching combination-combination synchronization of systems (7) and (8) with systems (9) and (10).
Remark 1. The constant matrices K, L, M and N are called the scaling matri-
ces. These scaling matrices can be assumed to be the functional matrices of state
variables x, y, z and w.

Remark 2. If M = 0 or N = 0, the multi-switching combination-combination synchronization problem reduces to multi-switching combination synchronization.

Remark 3. If K = 0, M = I, N = 0 or K = 0, M = 0, N = I or L = 0, M = I, N = 0 or L = 0, M = 0, N = I, then the combination-combination synchronization reduces to projective synchronization, where I is the identity matrix of order n.

Remark 4. If the scaling matrices satisfy K = L = M = 0 or K = L = N = 0, then the combination-combination synchronization reduces to a chaos control problem.

4 Illustration

Consider the fractional-order Lorenz system [29] as the first master system:
d^q x1/dt^q = a(x2 − x1)
d^q x2/dt^q = m x1 − x1 x3 − x2    (13)
d^q x3/dt^q = x1 x2 − b x3
The second master system is represented by:
d^q y1/dt^q = a(y2 − y1)
d^q y2/dt^q = m y1 − y1 y3 − y2    (14)
d^q y3/dt^q = y1 y2 − b y3
Let us consider the first slave system represented by:
d^q z1/dt^q = a(z2 − z1) + ξ1
d^q z2/dt^q = m z1 − z1 z3 − z2 + ξ2    (15)
d^q z3/dt^q = z1 z2 − b z3 + ξ3
Let us consider the second slave system represented by:
d^q w1/dt^q = a(w2 − w1) + η1
d^q w2/dt^q = m w1 − w1 w3 − w2 + η2    (16)
d^q w3/dt^q = w1 w2 − b w3 + η3
where ξ = (ξ1, ξ2, ξ3) and η = (η1, η2, η3) are the controllers to be designed. Under the appropriate conditions on the indices k, l, m, n = 1, 2, 3, as defined in Definition 1, there are various possible switching combinations that define the error states for the master-slave systems (13)–(16), as listed below.

For k = l = m ≠ n, we have e1112, e1113, e2221, e2223, e3331 & e3332
For k = l = n ≠ m, we have e1121, e1131, e2212, e2232, e3313 & e3323
For k = m = n ≠ l, we have e1211, e1311, e2122, e2322, e3133 & e3233
For m = l = n ≠ k, we have e1222, e1333, e2111, e2333, e3111 & e3222
For k = l ≠ m = n, we have e1122, e1133, e2211, e2233, e3311 & e3322
For k = m ≠ n = l, we have e1212, e1313, e2121, e2323, e3131 & e3232
For k = n ≠ m = l, we have e1221, e1331, e2112, e2332, e3113 & e3223
For k = l ≠ n ≠ m, we have e1123, e1132, e2213, e2231, e3312 & e3321
For k = m ≠ n ≠ l, we have e1213, e1312, e2123, e2321, e3132 & e3231
For k = n ≠ l ≠ m, we have e1231, e1321, e2132, e2312, e3123 & e3213
For k ≠ l = m ≠ n, we have e1223, e1332, e2113, e2331, e3112 & e3221
For k ≠ l ≠ m = n, we have e1233, e1322, e2133, e2311, e3122 & e3211
For k ≠ m ≠ l = n, we have e1323, e1232, e2313, e2131, e3212 & e3121

In this paper, we define results from different switching possibilities for three
randomly selected error state vector combinations.
Let us consider the error system as follows:

e1213 = α1 x1 + β2 y2 − γ1 z1 − δ3 w3
e2123 = α2 x2 + β1 y1 − γ2 z2 − δ3 w3    (17)
e3231 = α3 x3 + β2 y2 − γ3 z3 − δ1 w1

The error dynamical system is obtained as follows:


d^q e1213/dt^q = α1 a(x2 − x1) + β2(m y1 − y1 y3 − y2) − γ1(a(z2 − z1) + ξ1) − δ3(w1 w2 − b w3 + η3)
d^q e2123/dt^q = α2(m x1 − x1 x3 − x2) + β1 a(y2 − y1) − γ2(m z1 − z1 z3 − z2 + ξ2) − δ3(w1 w2 − b w3 + η3)    (18)
d^q e3231/dt^q = α3(x1 x2 − b x3) + β2(m y1 − y1 y3 − y2) − γ3(z1 z2 − b z3 + ξ3) − δ1(a(w2 − w1) + η1)
Denote
U1 = γ1 ξ1 + δ3 η3
U2 = γ2 ξ2 + δ3 η3    (19)
U3 = γ3 ξ3 + δ1 η1
To realize the control gain, we propose the following theorem:
Theorem 1.
U1 = −α1 a x2 − β2(m y1 − y1 y3 − y2) + γ1 a(z2 − z1) + δ3(w1 w2 − b w3) + a(β2 y2 − γ1 z1 − δ3 w3) − p1 e1213
U2 = −α2(m x1 − x1 x3 − x2) − β1 a y2 + γ2(m z1 − z1 z3 − z2) + δ3(w1 w2 − b w3) + a(α2 x2 − γ2 z2 − δ3 w3) − p2 e2123    (20)
U3 = −α3 x1 x2 − β2(m y1 − y1 y3 − y2) + γ3(z1 z2 − b z3) + δ1 a(w2 − w1) + b(β2 y2 − γ3 z3 − δ1 w1) − p3 e3231

where P = diag(p1, p2, p3) is a gain matrix and A − P should have eigenvalues all satisfying the stability condition stated in Sect. 3. Based on Theorem 1, we have the following corollaries.
Corollary 1. (i) If N = 0 and the control functions U1, U2 and U3 are designed as follows:
U1 = −α1 a x2 − β2(m y1 − y1 y3 − y2) + γ1 a(z2 − z1) + a(β2 y2 − γ1 z1) − p1 e1213
U2 = −α2(m x1 − x1 x3 − x2) − β1 a y2 + γ2(m z1 − z1 z3 − z2) + a(α2 x2 − γ2 z2) − p2 e2123    (21)
U3 = −α3 x1 x2 − β2(m y1 − y1 y3 − y2) + γ3(z1 z2 − b z3) + b(β2 y2 − γ3 z3) − p3 e3231
then the master systems (13) and (14) will achieve multi-switching combination synchronization with the response system (15).
(ii) If M = 0 and the control functions U1, U2 and U3 are designed as follows:
U1 = −α1 a x2 − β2(m y1 − y1 y3 − y2) + δ3(w1 w2 − b w3) + a(β2 y2 − δ3 w3) − p1 e1213
U2 = −α2(m x1 − x1 x3 − x2) − β1 a y2 + δ3(w1 w2 − b w3) + a(α2 x2 − δ3 w3) − p2 e2123    (22)
U3 = −α3 x1 x2 − β2(m y1 − y1 y3 − y2) + δ1 a(w2 − w1) + b(β2 y2 − δ1 w1) − p3 e3231
then the master systems (13) and (14) will achieve multi-switching combination synchronization with the slave system (16).

Corollary 2. (i) If K = 0, M = I, N = 0 and the control functions U1, U2 and U3 are designed as follows:
U1 = −β2(m y1 − y1 y3 − y2) + a(z2 − z1) + a(β2 y2 − z1) − p1 e1213
U2 = −β1 a y2 + m z1 − z1 z3 − z2 − a z2 − p2 e2123    (23)
U3 = −β2(m y1 − y1 y3 − y2) + z1 z2 − b z3 + b(β2 y2 − z3) − p3 e3231
then the master system (14) will achieve multi-switching projective synchronization with the slave system (15).
(ii) If K = 0, M = 0, N = I and the control functions U1, U2 and U3 are designed as follows:
U1 = −β2(m y1 − y1 y3 − y2) + w1 w2 − b w3 + a(β2 y2 − w3) − p1 e1213
U2 = −β1 a y2 + w1 w2 − b w3 − a w3 − p2 e2123    (24)
U3 = −β2(m y1 − y1 y3 − y2) + a(w2 − w1) + b β2 y2 − p3 e3231
then the master system (14) will achieve multi-switching projective synchronization with the slave system (16).
(iii) If L = 0, M = I, N = 0 and the control functions U1, U2 and U3 are designed as follows:
U1 = −α1 a x2 + a(z2 − z1) − a z1 − p1 e1213
U2 = −α2(m x1 − x1 x3 − x2) + m z1 − z1 z3 − z2 + a(α2 x2 − z2) − p2 e2123    (25)
U3 = −α3 x1 x2 + z1 z2 − b z3 − b z3 − p3 e3231
then the master system (13) will achieve multi-switching projective synchronization with the slave system (15).
(iv) If L = 0, M = 0, N = I and the control functions U1, U2 and U3 are designed as follows:
U1 = −α1 a x2 + w1 w2 − b w3 − a w3 − p1 e1213
U2 = −α2(m x1 − x1 x3 − x2) + (w1 w2 − b w3) + a(α2 x2 − w3) − p2 e2123    (26)
U3 = −α3 x1 x2 + a(w2 − w1) − b w1 − p3 e3231
then the master system (13) will achieve multi-switching projective synchronization with the slave system (16).

5 Numerical Examples

Matlab is used to simulate the effectiveness of the designed controllers.


Let us consider the following values: (α1, α2, α3) = (1, 1, 1), (β1, β2, β3) = (1, 1, 1), (γ1, γ2, γ3) = (1, 1, 1), (δ1, δ2, δ3) = (1, 1, 1), and the fractional order q = 0.97.

The parameter values for which the fractional-order Lorenz system shows chaotic behavior are taken as (a, b, m) = (10, 8/3, 28). The master and slave systems' initial states are randomly selected as (x1(0), x2(0), x3(0)) = (0.3, 0.5, 0.8), (y1(0), y2(0), y3(0)) = (0.1, 0.1, 0.1), (z1(0), z2(0), z3(0)) = (0.2, 0.2, 0.7) and (w1(0), w2(0), w3(0)) = (0.4, 0.1, 0.6). We set all eigenvalues of A − P to −2, which satisfy |arg(λi)| > απ/2, i = 1, 2, 3. Thus, using the pole placement technique, we obtain the gain matrix as A − P = diag(−2, −2, −2). Figure 1 displays the time response of the synchronization errors e1213, e2123, e3231.
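As a rough cross-check of the closed-loop error behavior, the reduced error dynamics (12) with A − P = diag(−2, −2, −2) and q = 0.97 can be integrated numerically. The sketch below uses an explicit Grünwald-Letnikov approximation (a choice made here for illustration; the Matlab simulation reported in this section may use a different solver), with the initial errors computed from (17) and the initial states listed above.

```python
import numpy as np

def gl_weights(q, n):
    """Grunwald-Letnikov binomial weights w_j = (-1)^j C(q, j),
    computed with the recurrence w_j = (1 - (q + 1)/j) w_{j-1}."""
    w = np.ones(n + 1)
    for j in range(1, n + 1):
        w[j] = (1.0 - (q + 1.0) / j) * w[j - 1]
    return w

def simulate_error_dynamics(q=0.97, h=0.005, T=10.0):
    """Explicit GL scheme for D^q e = (A - P) e with A - P = diag(-2, -2, -2)."""
    AmP = np.diag([-2.0, -2.0, -2.0])
    n_steps = int(T / h)
    w = gl_weights(q, n_steps)
    e = np.zeros((n_steps + 1, 3))
    # e(0) from Eq. (17) with unit scaling and the initial states of Sect. 5:
    # e1213(0) = 0.3 + 0.1 - 0.2 - 0.6, e2123(0) = 0.5 + 0.1 - 0.2 - 0.6,
    # e3231(0) = 0.8 + 0.1 - 0.7 - 0.4
    e[0] = np.array([-0.4, -0.2, -0.2])
    for n in range(1, n_steps + 1):
        memory = sum(w[j] * e[n - j] for j in range(1, n + 1))
        e[n] = (h ** q) * (AmP @ e[n - 1]) - memory
    return e  # each error component decays towards zero, consistent with Fig. 1

errors = simulate_error_dynamics()
```

Note that this only integrates the reduced linear error system (12), not the full coupled master-slave systems with the controllers of Theorem 1.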

Fig. 1. Synchronization error between states of master and slave systems

6 Conclusion
In this manuscript, we discussed multi-switching combination-combination synchronization of nonlinear fractional-order systems. The scheme was successfully demonstrated on the fractional-order chaotic Lorenz system. Finally, numerical results were provided to validate the effectiveness of the proposed synchronization scheme. The theoretical and numerical results are in good agreement.

References
1. Azar, A.T., Vaidyanathan, S., Ouannas, A.: Fractional Order Control and Syn-
chronization of Chaotic Systems. Studies in Computational Intelligence, vol. 688.
Springer, Berlin (2017)

2. Bhat, M.A., Shikha: Complete synchronisation of non-identical fractional order


hyperchaotic systems using active control. Int. J. Autom. Control 13(2), 140–157
(2019)
3. Caputo, M.: Linear models of dissipation whose Q is almost frequency independent-II. Geophys. J. Roy. Astron. Soc. 13(5), 529–539 (1967)
4. Khan, A., Shikha: Robust adaptive sliding mode control technique for combination
synchronisation of non-identical time delay chaotic systems. Int. J. Model. Ident.
Control 31(3), 268–277 (2019)
5. Khan, A., Singh, S., Azar, A.T.: Combination-combination anti-synchronization
of four fractional order identical hyperchaotic systems. In: Hassanien, A.E., Azar,
A.T., Gaber, T., Bhatnagar, R., Tolba, F.M. (eds.) The International Conference
on Advanced Machine Learning Technologies and Applications (AMLTA2019).
Advances in Intelligent Systems and Computing, vol. 921, pp. 406–414. Springer,
Cham (2020)
6. Kocarev, L., Parlitz, U.: Generalized synchronization, predictability, and equiva-
lence of unidirectionally coupled dynamical systems. Phys. Rev. Lett. 76(11), 1816
(1996)
7. Matignon, D.: Stability results for fractional differential equations with applications
to control processing. In: Computational Engineering in Systems Applications, pp.
963–968 (1996)
8. Ott, E., Grebogi, C., Yorke, J.A.: Controlling chaos. Phys. Rev. Lett. 64(11), 1196
(1990)
9. Ouannas, A., Al-sawalha, M.M., Ziar, T.: Fractional chaos synchronization schemes
for different dimensional systems with non-identical fractional-orders via two scal-
ing matrices. Optik - Int. J. Light Electron. Opt. 127(20), 8410–8418 (2016)
10. Ouannas, A., Azar, A.T., Abu-Saris, R.: A new type of hybrid synchronization
between arbitrary hyperchaotic maps. Int. J. Mach. Learn. Cybernet. 8(6), 1887–
1894 (2017)
11. Ouannas, A., Azar, A.T., Vaidyanathan, S.: New hybrid synchronisation schemes
based on coexistence of various types of synchronisation between master-slave
hyperchaotic systems. Int. J. Comput. Appl. Technol. 55(2), 112–120 (2017)
12. Ouannas, A., Azar, A.T., Ziar, T.: On inverse full state hybrid function projec-
tive synchronization for continuous-time chaotic dynamical systems with arbitrary
dimensions. Differ. Equ. Dyn. Syst. (2017). https://doi.org/10.1007/s12591-017-
0362-x
13. Ouannas, A., Azar, A.T., Ziar, T., Radwan, A.G.: Generalized synchronization of
different dimensional integer-order and fractional order chaotic systems. In: Azar,
A.T., Vaidyanathan, S., Ouannas, A. (eds.) Fractional Order Control and Synchro-
nization of Chaotic Systems, Studies in Computational Intelligence, vol. 688, pp.
671–697. Springer, Cham (2017)
14. Ouannas, A., Azar, A.T., Ziar, T., Vaidyanathan, S.: On new fractional inverse
matrix projective synchronization schemes. In: Azar, A.T., Vaidyanathan, S.,
Ouannas, A. (eds.) Fractional Order Control and Synchronization of Chaotic Sys-
tems, Studies in Computational Intelligence, vol. 688, pp. 497–524. Springer, Cham
(2017)
15. Pecora, L.M., Carroll, T.L.: Synchronization in chaotic systems. Phys. Rev. Lett.
64(8), 821 (1990)
16. Podlubny, I.: Fractional Differential Equations. Academic Press, New York (1999)
17. Samko, S.G., Kilbas, A.A., Marichev, O.I.: Fractional Integrals and Derivatives:
Theory and Applications. Gordon and Breach Science Publishers, London (1993)

18. Singh, S., Azar, A.T.: Controlling chaotic system via optimal control. In: Interna-
tional Conference on Advanced Intelligent Systems and Informatics. Springer, pp.
277–287 (2019)
19. Singh, S., Azar, A.T.: Multi-switching combination synchronization of fractional
order chaotic systems. In: Joint European-US Workshop on Applications of Invari-
ance in Computer Vision, pp. 655–664. Springer (2020)
20. Singh, S., Azar, A.T., Bhat, M.A., Vaidyanathan, S., Ouannas, A.: Active control
for multi-switching combination synchronization of non-identical chaotic systems.
In: Advances in System Dynamics and Control, pp. 129–162. IGI Global (2018)
21. Singh, S., Azar, A.T., Vaidyanathan, S., Ouannas, A., Bhat, M.A.: Multiswitching
synchronization of commensurate fractional order hyperchaotic systems via active
control. In: Mathematical Techniques of Fractional Order Systems, pp. 319–345.
Elsevier (2018)
22. Singh, S., Azar, A.T., Zhu, Q.: Multi-switching master-slave synchronization of
non-identical chaotic systems. Innovative Techniques and Applications of Mod-
elling, Identification and Control, pp. 321–330. Springer (2018)
23. Vaidyanathan, S., Azar, A.T.: A novel 4-D four-wing chaotic system with four
quadratic nonlinearities and its synchronization via adaptive control method.
Advances in Chaos Theory and Intelligent Control, pp. 203–224. Springer, Berlin
(2016)
24. Vaidyanathan, S., Azar, A.T.: Adaptive control and synchronization of Halvorsen
circulant chaotic systems. Advances in Chaos Theory and Intelligent Control, pp.
225–247. Springer, Berlin (2016)
25. Vaidyanathan, S., Azar, A.T.: Dynamic analysis, adaptive feedback control and
synchronization of an eight-term 3-D novel chaotic system with three quadratic
nonlinearities. Advances in Chaos Theory and Intelligent Control, pp. 155–178.
Springer, Berlin (2016)
26. Vaidyanathan, S., Azar, A.T.: Generalized projective synchronization of a novel
hyperchaotic four-wing system via adaptive control method. Advances in Chaos
Theory and Intelligent Control, pp. 275–290. Springer, Berlin (2016)
27. Vaidyanathan, S., Azar, A.T.: Qualitative study and adaptive control of a novel
4-d hyperchaotic system with three quadratic nonlinearities. In: Azar, A.T.,
Vaidyanathan, S. (eds.) Advances in Chaos Theory and Intelligent Control, pp.
179–202. Springer, Cham (2016)
28. Vaidyanathan, S., Idowu, B.A., Azar, A.T.: Backstepping controller design for the
global chaos synchronization of sprott’s jerk systems. In: Chaos Modeling and
Control Systems Design, pp. 39–58. Springer (2015)
29. Wu, X.J., Shen, S.L.: Chaos in the fractional-order Lorenz system. Int. J. Comput. Math. 86(7), 1274–1282 (2009)
Leader-Follower Control of Unmanned
Aerial Vehicles with State Dependent
Switching

Ahmad Taher Azar1,2(B), Fernando E. Serrano3, Nashwa Ahmad Kamal4, and Anis Koubaa1

1 Robotics and Internet of Things Lab (RIOTU), Prince Sultan University, Riyadh, Saudi Arabia
{aazar,akoubaa}@psu.edu.sa
2 Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
ahmad.azar@fci.bu.edu.eg, ahmad_t_azar@ieee.org
3 Faculty of Engineering and Architecture, Universidad Tecnologica Centroamericana (UNITEC), Zona Jacaleapa, Tegucigalpa, Honduras
feserrano@unitec.edu, serranofer@eclipso.eu
4 Faculty of Engineering, Cairo University, Giza, Egypt
nashwa.ahmad.kamal@gmail.com

Abstract. This paper proposes a leader-follower controller for unmanned aerial vehicles (UAVs). The strategy consists of implementing state-dependent switching laws based on the stability of the closed loop during each switching mode. Multiple UAVs are considered to fly in formation around the leader in order to achieve the desired trajectory profile, where the trajectory of the UAVs is driven to follow the leader along the desired path. State-dependent switching provides advantages compared to other approaches, since accurate control action is needed when a change in the leader's state trajectory occurs, so that the appropriate path command is received to drive each UAV in the desired direction. The Lyapunov stability theorem is implemented, making use of Metzler matrices to achieve the stability of the closed-loop system in the different switching modes. The results obtained in this study are validated by two numerical examples under different conditions, which test the theoretical results provided in this paper and demonstrate the mathematical contributions made in this study.

Keywords: Unmanned aerial vehicles (UAV) · Leader-follower controller · Lyapunov theory · Stability analysis

1 Introduction
Unmanned Aerial Vehicles (UAVs) have been extensively used in various forms
of operations, such as rescue, surveillance and other civil and military roles [12].
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
A. E. Hassanien et al. (Eds.): AISI 2020, AISC 1261, pp. 862–872, 2021.
https://doi.org/10.1007/978-3-030-58669-0_76

Formation flight is required in order to accomplish particular tasks, and the formation technique consists of following an aerial or ground leader along a predefined desired direction according to the type of mission that is required. Leader-follower consensus is achieved in a variety of ways, as found in the literature. One of the most popular approaches is the implementation of graph theory; e.g. in [5], leader-follower consensus of a multi-agent system is achieved under switching topologies, where the Lyapunov direct method is implemented to ensure leader-following agreement. Another example can be found in [8], where a flocking system for unmanned aerial vehicle groups is shown. It is corroborated in that paper that an appropriate control system is obtained by avoiding collisions between the UAVs while keeping their speed equal to that of the virtual pilot. In [25], a further consensus flight control system for UAVs is shown, where a multivariable model reference control is carried out taking into account linear dynamic uncertainties and disturbances. Similar examples can also be found in [6,14], where the switching topology approach is extended to various types of systems.
There are two types of switching for the controllers. The first is related to
time-dependent switching topology, the most common of which is the average
dwell time (ADT) type. The second is the state-dependent switching topology in
which states are used to choose switching modes. Examples of time-dependent
switching topologies can be found in [9,10], where the interaction between state-
dependent and time-dependent switching is shown in the first case and an adap-
tive controller with mode-dependent average dwell time is used in the second
case. Controller design with state-dependent switching topology results can be
found in [13,23], where, in the first case, an H∞ controller for discrete-time switched systems is provided under a dwell-time constraint. In the second study, state-dependent switching is used to develop an output feedback controller for switched systems with delays. Another interesting example of the application of state-dependent switching to the controller design of a switched system can be found in [22], where a state-dependent switching topology for the controller design of an inverted pendulum is provided. To conclude, the following references contain important results related to the stability and control of switched systems, in their continuous or discrete form, implementing the state-dependent switching topology [3,4,7,21,24].
The design of a leader-follower control strategy for UAVs with state-
dependent switching is proposed in this paper. The design of the controller con-
sists of the selection of suitable Lyapunov functions with their respective matrices
in order to create the correct control law along with the state-dependent topol-
ogy [1,2,11,15–20]. The error variable is defined taking into account the desired
path along with the measured output for each UAV. Control law is developed in
order to ensure the exponential stability of the system and, at the same time,
to achieve a state-dependent switching topology.
This paper is organized as follows: In Sect. 2, the problem formulation is presented.
Leader-Follower control strategy for Unmanned Aerial Vehicles is described in
Sect. 3. Numerical experiments are presented in Sect. 4. Discussion of the results

is presented in Sect. 5. Finally, the paper is concluded in Sect. 6 along with the
future directions.

2 Problem Formulation
Consider the following dynamic model of unmanned aerial vehicles:

⎡ ⎤ ⎡ ⎤
η v
x=⎣v⎦ fσ(t) (x) = ⎣ −g3 ⎦ (1)
−1
Ω −JBi Ωsk JBi Ω
⎡ ⎤
0 0  
W
gσ(t) (x) = ⎣ m1i RBi

0 ⎦ U= (2)
−1 τb
0 JBi

So, this dynamic model can be expressed in affine form:

ẋ = fσ(t) (x) + gσ(t) (x)U (3)


where:
$$
\Omega_{sk}=\begin{bmatrix} 0 & -\Omega_r & \Omega_q\\ \Omega_r & 0 & -\Omega_p\\ -\Omega_q & \Omega_p & 0 \end{bmatrix} \tag{4}
$$

$$
R_{B_i}=\begin{bmatrix} R_{B_i} & R_{B_i} & R_{B_i} & R_{B_i} \end{bmatrix},\qquad
W=\begin{bmatrix} W_1\\ W_2\\ W_3\\ W_4 \end{bmatrix} \tag{5}
$$

Then, Wj = [0, 0, Tj]^T. The rotation matrix is denoted as RBi and Tj is the lifting torque of each actuator. The position vector is given by η = [x, y, z]^T, the angular velocity vector is given by Ω = [Ωp, Ωq, Ωr]^T, and 3 = [0, 0, 1]^T.
Finally, mi is the UAV’s mass and JBi is the respective inertia matrix.
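As an illustrative complement to the affine model (1)–(3), the following Python sketch assembles f_σ(t)(x) and g_σ(t)(x) for a single UAV. The mass, inertia and rotation matrix used here are placeholders, and the thrust input W is treated as a single resultant body-frame force rather than the four per-actuator forces of (5), so this is only a minimal sketch of the model structure, not the implementation used in the experiments of Sect. 4.

```python
import numpy as np

def skew(omega):
    """Skew-symmetric matrix Omega_sk of Eq. (4), with omega = [p, q, r]."""
    p, q, r = omega
    return np.array([[0.0, -r, q],
                     [r, 0.0, -p],
                     [-q, p, 0.0]])

def uav_dynamics(x, U, m, J, R_B=np.eye(3), g=9.81):
    """Affine model xdot = f(x) + g(x) U of Eqs. (1)-(3).

    State x = [eta (3), v (3), Omega (3)]; input U = [W (3), tau_b (3)],
    where W is taken as a resultant thrust vector (an assumed simplification).
    """
    v, omega = x[3:6], x[6:9]
    e3 = np.array([0.0, 0.0, 1.0])                       # the vector '3' of Eq. (1)
    f = np.concatenate([v,
                        -g * e3,
                        -np.linalg.solve(J, skew(omega) @ J @ omega)])
    gx = np.zeros((9, 6))
    gx[3:6, 0:3] = R_B / m                               # (1/m_i) R_Bi block
    gx[6:9, 3:6] = np.linalg.inv(J)                      # J_Bi^{-1} block
    return f + gx @ U

# illustrative call: hover-like thrust for a 0.94 kg vehicle with J = 0.02 I
x0 = np.zeros(9)
U0 = np.array([0.0, 0.0, 0.94 * 9.81, 0.0, 0.0, 0.0])
print(uav_dynamics(x0, U0, m=0.94, J=0.02 * np.eye(3)))  # approximately zero
```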

3 Leader-Follower Control Strategy for Unmanned Aerial Vehicles
In Fig. 1, the resulting block diagram of the proposed control strategy is shown.

Fig. 1. Block diagram of the proposed controller

Before deriving the proposed control strategy, consider the error variable e = xd − x and its derivative ė = ẋd − ẋ, where xd is the desired trajectory for the UAV leader. An important remark is that a Metzler matrix is a square matrix with non-negative off-diagonal components [4]. In order to design the controller, consider the following function [3,4]:

$$
V(e)=\frac{1}{2}\min_{i=1,\dots,N} e^{T}P_{i}\,e \tag{6}
$$
where Pi is a diagonal matrix of appropriate dimension. The following theorem, whose proof is omitted [3,4], is required to describe the closed-loop stability of the switched system.
Theorem 1. The error system given by ė = ẋd − ẋ, where ẋ is shown in (3), is stabilized if its Lyapunov function (6) meets the following condition:

$$
\lambda_{i}=\left\{ e\in\mathbb{R}^{n}:\ \dot{V}(e)<-e^{T}\Big(\sum_{j\in P}\Pi_{ji}P_{j}\Big)e \right\} \tag{7}
$$

for any Πji ∈ M (Metzler matrices) and with the state-dependent switching law shown in:

$$
\sigma(x(t))=\arg\min_{i\in P}\, e^{T}P_{i}\,e \tag{8}
$$

The derivative of the Lyapunov function (6) is given as:

$$
\dot{V}(e)=e^{T}P_{k}\dot{e}=e^{T}P_{k}\left[\dot{x}_{d}-\dot{x}\right]=e^{T}P_{k}\dot{x}_{d}-e^{T}P_{k}\dot{x} \tag{9}
$$

Now substituting system (3) into (9) yields:

$$
\dot{V}(e)=e^{T}P_{k}\dot{x}_{d}-e^{T}P_{k}f_{\sigma(t)}(x)-e^{T}P_{k}g_{\sigma(t)}(x)U \tag{10}
$$

Obtaining the following control law:

$$
U=g_{\sigma(t)}^{-1}(x)\dot{x}_{d}-g_{\sigma(t)}^{-1}(x)f_{\sigma(t)}(x)-g_{\sigma(t)}^{-1}(x)e-g_{\sigma(t)}^{-1}(x)\dot{e}+g_{\sigma(t)}^{-1}(x)P_{k}^{-1}Re \tag{11}
$$

Making

$$
\sum_{j\in P}\Pi_{ji}P_{j}=R,\qquad w=e^{T}\Big(\sum_{j\in P}\Pi_{ji}P_{j}\Big)e=e^{T}Re \tag{12}
$$

By substituting (11) into (10), the following result is obtained:

$$
\dot{V}(e)=e^{T}P_{k}e+e^{T}P_{k}\dot{e}-e^{T}Re=e^{T}P_{k}e+e^{T}P_{k}\dot{e}-w \tag{13}
$$

Implementing Theorem 1 yields:

$$
\dot{V}(e)=e^{T}P_{k}\dot{e}<-e^{T}P_{k}e \tag{14}
$$

Now, in order to corroborate that the closed-loop system is stable, consider the following differential inequality obtained from (14):

$$
\frac{dV}{dt}<-V \tag{15}
$$

which becomes:

$$
\int_{V_{0}}^{V}\frac{dV}{V}<-\int_{t_{0}}^{t}dt,\qquad
\frac{e^{\ln|V|}}{e^{\ln|V_{0}|}}<e^{-(t-t_{0})},\qquad
V<V_{0}\,e^{-(t-t_{0})} \tag{16}
$$

So, with (16), the exponential stability of the closed-loop system is proved.
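To make the closed-loop construction concrete, the sketch below evaluates the state-dependent switching law (8) and the control law (11) numerically. The matrices Pi, R and the square, invertible g used here are illustrative assumptions only (the UAV model's g(x) is not square, so a pseudo-inverse stands in for the inverse of g_σ(t)); this is a sketch, not the implementation used for the experiments of Sect. 4.

```python
import numpy as np

def switching_mode(e, P_list):
    """State-dependent switching law (8): sigma = argmin_i e^T P_i e."""
    return int(np.argmin([e @ P @ e for P in P_list]))

def control_law(e, e_dot, xd_dot, f_k, g_k, P_k, R):
    """Control law (11), with g_sigma^{-1} replaced by a pseudo-inverse."""
    g_inv = np.linalg.pinv(g_k)
    return g_inv @ (xd_dot - f_k - e - e_dot + np.linalg.inv(P_k) @ R @ e)

# --- illustrative data (placeholders, not the paper's parameters) ---
rng = np.random.default_rng(0)
n = 4
P_list = [np.diag(rng.uniform(0.5, 2.0, n)) for _ in range(3)]   # diagonal P_i
R = 0.1 * np.eye(n)                    # stands in for sum_j Pi_ji P_j of Eq. (12)
e = rng.standard_normal(n)             # current tracking error
e_dot = np.zeros(n)
xd_dot = np.zeros(n)
f_k = np.zeros(n)
g_k = np.eye(n)                        # kept square only to simplify the demo

k = switching_mode(e, P_list)
U = control_law(e, e_dot, xd_dot, f_k, g_k, P_list[k], R)
print("active mode:", k, "control input:", U)
```

In a full simulation, the active mode k and the input U would be recomputed at every integration step from the current error, which is what produces the mode sequence reported later in Fig. 10.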

4 Numerical Experiments

In this section, two numerical experiments are shown to validate the theoretical
results obtained in this study. The parameters of the three UAVs used are m1 = 0.94 kg, m2 = 0.90 kg, m3 = 0.96 kg, with Ixx1 = Ixx2 = Ixx3 = 0.02, Iyy1 = Iyy2 = Iyy3 = 0.02 and Izz1 = Izz2 = Izz3 = 0.02.

4.1 Numerical Example 1

In Fig. 2, the time evolution of the position variable is shown when a step reference is implemented for tracking purposes. The trajectory in x, y and z for the UAV 1 is tracked accurately by implementing the proposed control strategy. Meanwhile, in Fig. 3, the velocities of the UAVs are shown; the velocities do not reach excessive values, even during the transition of the step reference, and reach the origin in finite time.
Finally, in Fig. 4, the time evolution of the error variable is shown, in which, as observed in Fig. 2, the origin is reached precisely and quickly by means of the control action.
(Figure: three panels plotting X1, X2 and X3 (m) versus Time (s), each comparing the reference with the obtained trajectory.)

Fig. 2. Time evolution of the position variable x

(Figure: three panels plotting Velocity X1, X2 and X3 (m/s) versus Time (s).)

Fig. 3. Time evolution of the velocity variable ẋ



(Figure: three panels plotting error1, error2 and error3 (m) versus Time (s).)

Fig. 4. Time evolution of the error variable e

4.2 Numerical Experiment 2

In Fig. 5, the trajectory of the leader and follower for the unmanned aerial vehicle UAV 1 is shown. It can be noticed that the follower tracks the trajectory of the leader accurately, avoiding collisions when there is a change of trajectory. The reference plot is considered as the leader, and the follower is denoted as the obtained trajectory.

(Figure: 3D plot of the obtained trajectory and the reference in the x (m), y (m), z (m) space.)

Fig. 5. Trajectory of the follower UAV1 in comparison with the leader



Fig. 6. Angular velocity Ωp

Figures 6, 7 and 8 display the angular velocity of the UAV 1 while following
the prescribed trajectory generated by the leader. It can be seen that these
velocities are not excessive, showing that there are no sudden changes in the
maneuvering action to keep the trajectory smooth while following the leader’s
trajectory.

Fig. 7. Angular velocity Ωq

Fig. 8. Angular velocity Ωr



(Figure: three panels plotting Velocity X1, X2 and X3 (m/s) versus Time (s).)

Fig. 9. Time evolution of the linear velocities of the UAV 1

In Fig. 9, the time evolution of the linear velocities of the UAV 1 is shown, in which it can be noticed that these velocities remain moderate while the follower UAV maneuvers with respect to the leader.
Finally, in Fig. 10, the time evolution of the switching modes is shown. These modes are generated by the switching law depending on the state. It can be noticed how the three UAVs switch modes until the trajectory of the leader is followed while avoiding collisions.

Fig. 10. Evolution in time of the switching modes



5 Discussion
The results obtained in this study show how state-dependent switching control is applied to track the trajectory of the leader-follower system of unmanned aerial vehicles. Compared to other graph-theory approaches, the proposed control strategy offers a more stable and consistent alternative due to the switching topology of the Lyapunov-based strategy, which ensures the stability of the closed-loop system while avoiding collisions between the leader, the follower and the other unmanned aerial vehicles. The first numerical example supports the theoretical findings when a step reference is tracked, while the second example is a more practical scenario showing that the UAV 1 follows the leader according to the desired trajectory profile.

6 Conclusion
This paper proposes the design of a leader-follower controller for unmanned aerial vehicles with state-dependent switching. The proposed controller is obtained by using the Lyapunov approach under the conditions required for closed-loop stability and by using Metzler matrices. The state-dependent switching topology guarantees the exponential stability of the system. Numerical results corroborate the theoretical results and the good performance of the proposed control approach.

References
1. Azar, A.T., Serrano, F.E., Hameed, I.A., Kamal, N.A., Vaidyanathan, S.: Robust
h-infinity decentralized control for industrial cooperative robots. In: Proceedings
of the International Conference on Advanced Intelligent Systems and Informatics
2019, Advances in Intelligent Systems and Computing, vol. 1058, pp. 254–265.
Springer, Cham (2020)
2. Azar, A.T., Serrano, F.E., Koubaa, A.: Adaptive fuzzy type-2 fractional order
proportional integral derivative sliding mode controller for trajectory tracking of
robotic manipulators. In: 2020 IEEE International Conference on Autonomous
Robot Systems and Competitions (ICARSC), pp. 183–187 (2020)
3. Ding, X., Liu, X.: On stabilizability of switched positive linear systems under state-
dependent switching. Appl. Math. Comput. 307, 92–101 (2017)
4. Galbusera, L., Bolzern, P.: H-infinity control of time-delay switched linear systems
by state-dependent switching. IFAC Proc. 43(2), 218–223 (2010)
5. Jiang, J., Jiang, Y.: Leader-following consensus of linear time-varying multi-agent systems under fixed and switching topologies. Automatica 113, 108804 (2020)
6. Karthick, S., Sakthivel, R., Wang, C., Ma, Y.K.: Synchronization of coupled mem-
ristive neural networks with actuator saturation and switching topology. Neuro-
computing 383, 138–150 (2020)
7. Kim, J.S., Yoon, T.W., Persis, C.D.: Discrete-time supervisory control of input
constrained neutrally stable linear systems via state dependent dwell time logic.
IFAC Proc. Vol. 37(12), 397–402 (2004)
8. Liu, W., Gao, Z.: A distributed flocking control strategy for UAV groups. Comput.
Commun. 153, 95–101 (2020)

9. Mandal, D.: On the importance of the coexistence of time and state-dependent switching. Chaos, Solitons Fractals 115, 154–159 (2018)
10. Niu, B., Zhao, P., Liu, J.D., Ma, H.J., Liu, Y.J.: Global adaptive control of switched uncertain nonlinear systems: an improved MDADT method. Automatica 115, 108872 (2020)
11. Ouannas, A., Azar, A.T., Vaidyanathan, S.: New hybrid synchronisation schemes
based on coexistence of various types of synchronisation between master-slave
hyperchaotic systems. Int. J. Comput. Appl. Technol. 55(2), 112–120 (2017)
12. Samanta, S., Mukherjee, A., Ashour, A.S., Dey, N., Tavares, J.M.R.S.,
Abdessalem Karâa, W.B., Taiar, R., Azar, A.T., Hassanien, A.E.: Log transform
based optimal image enhancement using firefly algorithm for autonomous mini
unmanned aerial vehicle: An application of aerial photography. Int. J. Image Graph.
18(04), 1850019 (2018)
13. Sang, H., Nie, H.: Asynchronous h-infinity control for discrete-time switched sys-
tems under state-dependent switching with dwell time constraint. Nonlinear Anal.:
Hybrid Syst. 29, 187–202 (2018)
14. Surapong, N., Mitsantisuk, C.: Position and force control of the scara robot based
on disturbance observer. Proc. Comput. Sci. 86, 116–119 (2016)
15. Vaidyanathan, S., Azar, A.T.: A novel 4-D four-wing chaotic system with four
quadratic nonlinearities and its synchronization via adaptive control method.
Advances in Chaos Theory and Intelligent Control, pp. 203–224. Springer, Berlin
(2016)
16. Vaidyanathan, S., Azar, A.T.: Adaptive control and synchronization of Halvorsen
circulant chaotic systems. Advances in Chaos Theory and Intelligent Control, pp.
225–247. Springer, Berlin (2016)
17. Vaidyanathan, S., Azar, A.T.: Dynamic analysis, adaptive feedback control and
synchronization of an eight-term 3-D novel chaotic system with three quadratic
nonlinearities. Advances in Chaos Theory and Intelligent Control, pp. 155–178.
Springer, Berlin (2016)
18. Vaidyanathan, S., Azar, A.T.: Generalized projective synchronization of a novel
hyperchaotic four-wing system via adaptive control method. Advances in Chaos
Theory and Intelligent Control, pp. 275–290. Springer, Berlin (2016)
19. Vaidyanathan, S., Azar, A.T.: Qualitative study and adaptive control of a novel
4-d hyperchaotic system with three quadratic nonlinearities. In: Azar, A.T.,
Vaidyanathan, S. (eds.) Advances in Chaos Theory and Intelligent Control, pp.
179–202. Springer, Cham (2016)
20. Wang, Z., Volos, C., Kingni, S.T., Azar, A.T., Pham, V.T.: Four-wing attractors
in a novel chaotic system with hyperbolic sine nonlinearity. Optik - Int. J. Light
Electron Opt. 131, 1071–1078 (2017)
21. Wu, Y., Gao, Y., Li, W.: Finite-time synchronization of switched neural networks
with state-dependent switching via intermittent control. Neurocomputing 384,
325–334 (2020)
22. Yamakawa, S., Yamada, A., Fujimoto, H.: State dependent switching control for
inverted pendulum system. IFAC Proc. 38(1), 1142–1147 (2005)
23. Yang, D., Li, X., Qiu, J.: Output tracking control of delayed switched systems via
state-dependent switching and dynamic output feedback. Nonlinear Anal.: Hybrid
Syst. 32, 294–305 (2019)
24. Zhao, G., Wang, J.: Stability and stabilization of switched polynomial nonlinear
systems. IFAC Proc. Vol. 44(1), 5365–5370 (2011)
25. Zhen, Z., Tao, G., Xu, Y., Song, G.: Multivariable adaptive control based consensus flight control system for UAVs formation. Aerosp. Sci. Technol. 93, 105336 (2019)
Maximum Power Extraction
from a Photovoltaic Panel Connected
to a Multi-cell Converter

Arezki Fekik1,2, Ahmad Taher Azar3,4(B), Nashwa Ahmad Kamal5, Fernando E. Serrano6, Mohamed Lamine Hamida2, Hakim Denoun2, and Nacira Yassa2

1 Akli Mohnd Oulhadj University of Bouira, Bouïra, Algeria
arezkitdk@yahoo.fr
2 Electrical Engineering Advanced Technology Laboratory (LATAGE), University Mouloud Mammeri of Tizi-Ouzou, Tizi Ouzou, Algeria
ml hamida@yahoo.com, akim danoun2002dz@yahoo.fr, yassa.nacera@yahoo.fr
3 Robotics and Internet-of-Things Lab (RIOTU), Prince Sultan University, Riyadh, Saudi Arabia
aazar@psu.edu.sa
4 Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
ahmad.azar@fci.bu.edu.eg
5 Faculty of Engineering, Cairo University, Giza, Egypt
nashwa.ahmad.kamal@gmail.com
6 Universidad Tecnológica Centroamericana (UNITEC), Tegucigalpa, Honduras
serranofer@eclipso.eu

Abstract. This article presents a control strategy to extract the maximum power point (MPP) of a solar photovoltaic (PV) system. The Perturb and Observe (P&O) technique is used as a DC converter controller to operate the photovoltaic panels at the highest power value under different weather conditions. To improve the quality and performance of the MPPT (P&O) control, the conventional converter is replaced by a multicell converter. The simulation results show good performance of the suggested converter: the voltage balance of the two floating capacitors is successfully obtained and the maximum power is extracted.

Keywords: Multi cell converter · Maximum Power Point Tracking (MPPT) · Perturb and Observe · Photovoltaic system

1 Introduction

Important strides have been made in the past few years in the study and development of clean energy technologies such as wind, sea-wave and solar energy [1,2,15,16]. Among these resources, solar energy is considered one of the most abundant, least toxic, harmless and noiseless. As a major way of exploiting solar energy, PV systems have attracted increasing interest in recent years [7,10,21]. Unfortunately, the PV system has a disadvantage, which is basically due to its low energy-conversion rate caused by the nonlinear characteristic of the photovoltaic generator. To solve such a problem, a maximum power point tracking (MPPT) strategy is necessary to reach the MPP of the PV generator under different working conditions [5,6,8,12,22].
The MPPT method may be dependent on or independent of the array model. In the first case, dependent methods are used to generate (offline) a database of parameters (Vref and/or duty cycle) which ensures producing the PV maximum power. To do this, these methods use data collected from a set of typical power (Ppv) curves as a function of the voltage (Vpv) of the PV system under different irradiance and temperature conditions. Following these conditions (irradiance and temperature), the Vref and/or duty cycle corresponding to the MPP is selected from the (Vref and/or duty cycle) database [4,11,17]. In the case of dependent methods, among different intelligent controllers such as the Artificial Neural Network (ANN) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) [3], the Fuzzy Logic Controller (FLC) is the simplest to implement. Recently, FLC has received increasing attention from researchers. This method provides better responses than other conventional controllers [18,20]. FLC and ANN methods focus on the nonlinear characteristics of the PV. Some drawbacks are related to rule definition, algorithm complexity and the response time needed to reach the MPP. The use of various power electronics converters in renewable energy applications has seen a major increase over the last years, namely the use of DC-DC converters for extracting maximum power and DC-AC converters for injecting electrical energy into power grids or for powering motors [17,19]. This work proposes a simple and classical maximum power extraction (MPPT) algorithm for a solar PV system. The P&O technique is used as a DC converter controller to operate the PV panels at the highest power value under changing weather conditions. To improve the quality and performance of the MPPT (P&O) control, the conventional converter is replaced by a multi-cell voltage converter. The results of the simulation and the experiment showed a good performance of the suggested converter. The voltage balance of the two floating capacitors is successfully obtained and the maximum power is extracted.
The paper is organized as follows. Photovoltaic system modeling is discussed in Sect. 2. Maximum power point tracking algorithms are presented in Sect. 3. Section 4 includes the simulation results and discussion. Finally, Sect. 5 concludes the research work.

2 PV System Modeling
The photovoltaic system contains the SHELL SP75 PV generator connected to a DC load (resistive load) via a multi-cell converter (3 cells), in a so-called single-stage power conversion, as shown in Fig. 1.
In this study, a single-diode model of the photovoltaic cell is chosen; the following equations give the typical I-V characteristic of a solar array:

(Figure: block diagram of the PV system, with the PV array driven by irradiance (G) and temperature (T), a power conversion stage feeding the load, and an MPPT controller generating the duty cycle through PWM from the measured Vpv and Ipv.)

Fig. 1. PV system.

$$
I_{pv}=I_{ph}-I_{0}\left[\exp\!\left(\frac{V_{pv}+R_{s}I_{pv}}{V_{T}}\right)-1\right]-\frac{V_{pv}+R_{s}I_{pv}}{R_{sh}} \tag{1}
$$

$$
I_{ph}=\frac{G}{G_{n}}\left(I_{sc}+K_{I}\,\Delta T\right) \tag{2}
$$

$$
I_{0}=I_{rs}\left(\frac{T}{T_{ref}}\right)^{3}\exp\!\left[\frac{-qE_{g}}{nK}\left(\frac{1}{T}-\frac{1}{T_{ref}}\right)\right] \tag{3}
$$

$$
I_{rs}=\frac{I_{sc}}{\exp\!\left(\dfrac{V_{oc}}{nN_{s}V_{T}}\right)-1} \tag{4}
$$

where

$$
V_{T}=\frac{nKT}{q} \tag{5}
$$
The parameters of the photovoltaic module used in this study are given in Table 1.

Table 1. PV panel parameters

Module parameters            Values
Power at MPP: Pmax           75 W
Open-circuit voltage: Voc    21.7 V
Short-circuit current: Isc   4.8 A
Voltage at MPP: Vmpp         17 V
Current at MPP: Impp         4.4 A

With:
VT: The thermodynamic potential (J/C).
n: Ideality factor of the solar cell.
G: The irradiance (W/m2).
K: Boltzmann's constant (1.3805 × 10−23 J/K).
q: Electron charge (1.6 × 10−19 C).
T: The operating cell temperature (K).
Tref: Reference temperature (T = 283 K).
ΔT: The difference T − Tref (K).
Ipv, Vpv, Ppv: The cell output current (A), voltage (V) and power (W), respectively.
Iph: The light-generated current (A), which is directly proportional to G.
Isc, Voc: The short-circuit current (A) and the open-circuit voltage (V).
KI: Temperature coefficient of the short-circuit current (A/K).
I0: The cell reverse saturation current (A).
Ns: The number of cells connected in series.
Rs and Rsh: The series and shunt resistances (Ω), respectively.
Eg: The physical band-gap energy (eV) (1.12 eV for Si).
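As a numerical complement to the single-diode model (1)–(5), the Python sketch below solves the implicit I-V relation by bisection and sweeps the voltage to locate the approximate MPP. Only the Table 1 ratings come from the paper; the series and shunt resistances, the ideality factor, the cell count and the temperature coefficient are assumed values chosen for illustration.

```python
import numpy as np

# Ratings from Table 1; the remaining parameters are assumptions.
ISC, VOC = 4.8, 21.7
Q, K = 1.602e-19, 1.3805e-23
NS, N_IDEAL = 36, 1.3          # assumed number of series cells and ideality factor
RS, RSH = 0.3, 200.0           # assumed series / shunt resistances (ohm)
KI, G_N, T_REF = 2e-3, 1000.0, 298.15

def pv_current(vpv, G=1000.0, T=298.15):
    """Solve Eq. (1) for Ipv by bisection on its residual (monotone in Ipv)."""
    vt = N_IDEAL * K * T / Q * NS                 # module-level thermal voltage
    iph = (G / G_N) * (ISC + KI * (T - T_REF))    # Eq. (2)
    i0 = ISC / (np.exp(VOC / vt) - 1.0)           # saturation current via Eq. (4)
    def residual(i):
        return iph - i0 * (np.exp((vpv + RS * i) / vt) - 1.0) \
                   - (vpv + RS * i) / RSH - i
    lo, hi = -1.0, iph + 1.0                      # residual(lo) > 0 > residual(hi)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return max(0.5 * (lo + hi), 0.0)

v = np.linspace(0.0, VOC, 300)
p = np.array([vv * pv_current(vv) for vv in v])
print("approximate MPP: %.1f W at %.1f V" % (p.max(), v[p.argmax()]))
```

With these assumed values the sweep lands close to the 75 W and 17 V rating of Table 1; this is only a consistency check of the model, not a validation of the converter control.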

3 MPPT Algorithms

MPPT control strategies are considered an essential part of the PV system because they increase the output power of a PV system and thus increase the efficiency of the array. There are distinct strategies used to track the maximum power point. The Perturb and Observe (P&O) strategy is considered the traditional technique. The P&O technique is commonly used because its algorithm is easy to implement. This operation is executed by disturbing the system, raising or reducing the operating voltage of the module, and then observing its impact on the power generated by the module. Figure 2 displays the flowchart of the P&O algorithm as it should be applied on the control microprocessor. From Fig. 2, the calculation of the current output power P(k) depends on the measurement of the current (I) and the voltage (V). Then, a comparison is drawn between the value P(k) and the prior value P(k−1) produced by the last measurement. If the output power has increased, the perturbation will continue in the same direction. If the power has decreased since the last measurement, then the perturbation of the output voltage will be reversed with respect to the last cycle. With this technique, the operating voltage is perturbed at each cycle of the MPPT. Once the MPP is reached, the operating point oscillates around the ideal operating voltage V. This leads to a power loss that depends on the step width Cp of a single perturbation. If Cp is large, the MPPT algorithm responds quickly to sudden changes in the operating conditions, but losses increase under stable or slowly changing conditions. If Cp is very small, losses under stable or slowly changing conditions are reduced, but the system can no longer keep up with rapid changes in temperature or insolation.
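A minimal P&O update consistent with the flowchart of Fig. 2 could look like the following Python sketch. The perturbation step Cp, the toy panel curve and the direct use of the voltage reference as the operating voltage are illustrative assumptions; a real implementation would act on the converter duty cycle.

```python
def perturb_and_observe(v, i, state, cp=0.05):
    """One P&O step (Fig. 2): perturb the voltage reference, observe the power."""
    p = v * i
    if state['p'] is not None:
        dp, dv = p - state['p'], v - state['v']
        # power increased: keep perturbing in the same direction;
        # power decreased: reverse the direction of the perturbation
        if dp >= 0.0:
            step = cp if dv >= 0.0 else -cp
        else:
            step = -cp if dv >= 0.0 else cp
        state['vref'] += step
    state['p'], state['v'] = p, v
    return state['vref']

def toy_current(v):
    """Hypothetical I-V curve whose maximum power point sits near 17 V."""
    return max(4.8 * (1.0 - (v / 21.7) ** 10), 0.0)

state = {'p': None, 'v': 0.0, 'vref': 12.0}
v = state['vref']
for _ in range(200):
    v = perturb_and_observe(v, toy_current(v), state)
print("P&O settled near %.2f V" % v)   # oscillates around the MPP, as described
```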

Fig. 2. Flowchart of the Perturb and Observe algorithm (CP is the perturbation step
width)

3.1 Multi Cell Converter

The multicellular converter is obtained by connecting several capacitor-isolated switching cells [13]. There are two complementary switches in each cell. To ensure the proper operation of the whole device, the voltages at the terminals of each cell must be well specified. There are eight operating modes in the case of a three-cell converter, as shown in Fig. 3 [9,14]. Table 2 shows the different configurations depending on the state of the switches (S3, S2, S1) and the related output voltages.

Fig. 3. Structure of a three-cell converter powered by a PV panel

This model is used for validating the controls only. It is accurate since, at every instant, it takes the switch states (on or off) exactly into account, together with the current through the capacitors and the switching commands Sk = 0, 1 (k = 1, 2, 3). Table 2 shows the configuration of the multi-cell converter.

Table 2. Configuration of the multi-cell converter

State   S3 S2 S1   Vout
0       0  0  0    0
1       0  0  1    E/3
2       0  1  0    E/3
3       0  1  1    2E/3
4       1  0  0    E/3
5       1  0  1    2E/3
6       1  1  0    2E/3
7       1  1  1    E

The capacitor currents are related to the control signals Sk as follows:

$$
I_{C_k}=(S_{k+1}-S_{k})\,I_{load} \tag{6}
$$

$$
I_{C_k}=C_{k}\,\frac{dV_{C_k}}{dt} \tag{7}
$$

Combining Eqs. (6) and (7) gives:

$$
\frac{dV_{C_k}}{dt}=\left(\frac{S_{k+1}-S_{k}}{C_{k}}\right)I_{load} \tag{8}
$$

The voltage equations of the converter's two floating capacitors are therefore:

$$
\frac{dV_{C_1}}{dt}=\left(\frac{S_{2}-S_{1}}{C_{1}}\right)I_{load},\qquad
\frac{dV_{C_2}}{dt}=\left(\frac{S_{3}-S_{2}}{C_{2}}\right)I_{load} \tag{9}
$$

According to the mesh theorem, the load voltage is the sum of the voltages at the switch terminals:

$$
V_{Load}=\sum_{k=1}^{p}\left(V_{C_k}-V_{C_{k-1}}\right)S_{k} \tag{10}
$$

where VC0 = 0 V and VCp = E. The current in the load is given by:

$$
\frac{dI_{Load}}{dt}=\frac{V_{Out}}{L}-\frac{R}{L}\,I_{Load} \tag{11}
$$

By substituting (10) into (11), we obtain for a three-cell converter:

$$
\frac{dI_{Load}}{dt}=-\frac{R}{L}\,I_{Load}-\left(\frac{S_{2}-S_{1}}{L}\right)V_{C_1}-\left(\frac{S_{3}-S_{2}}{L}\right)V_{C_2}+S_{3}\,\frac{E}{L} \tag{12}
$$
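To illustrate Eqs. (9)–(12), the Python sketch below first reproduces the Table 2 output voltages from Eq. (10) under the balanced-capacitor assumption (VC1 = E/3, VC2 = 2E/3), and then performs a single forward-Euler update of the converter state. The supply voltage E and the R, L, C values are illustrative assumptions, not the parameters of the simulated system.

```python
E = 21.0                                 # assumed supply voltage (V)
VC1, VC2 = E / 3.0, 2.0 * E / 3.0        # balanced flying-capacitor voltages

def v_out(s1, s2, s3, vc1=VC1, vc2=VC2, e=E):
    """Eq. (10) with VC0 = 0 and VCp = E: sum of the switched cell voltages."""
    return s1 * vc1 + s2 * (vc2 - vc1) + s3 * (e - vc2)

# Reproduce Table 2 for the eight switch states (state number = S3 S2 S1 in binary).
for state in range(8):
    s3, s2, s1 = (state >> 2) & 1, (state >> 1) & 1, state & 1
    print("state %d (S3,S2,S1)=(%d,%d,%d)  Vout = %5.2f V" %
          (state, s3, s2, s1, v_out(s1, s2, s3)))

def euler_step(vc1, vc2, i_load, switches, dt=1e-6,
               r=10.0, l=5e-3, c1=40e-6, c2=40e-6):
    """One forward-Euler step of Eqs. (9) and (12); all parameter values are assumed."""
    s1, s2, s3 = switches
    dvc1 = (s2 - s1) / c1 * i_load
    dvc2 = (s3 - s2) / c2 * i_load
    di = (-r * i_load - (s2 - s1) * vc1 - (s3 - s2) * vc2 + s3 * E) / l
    return vc1 + dt * dvc1, vc2 + dt * dvc2, i_load + dt * di

print("one step from rest with state 7:", euler_step(VC1, VC2, 0.0, (1, 1, 1)))
```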

4 Simulation Results and Discussions

A numerical simulation was conducted in MATLAB/SIMULINK to validate the feasibility of the control strategy analyzed in this research.
Figure 4 represents the voltages of the floating capacitors and the output voltage of the PV panel with the MPPT (P&O) controller for the three-cell converter simulation. After the transient phase has passed, it can be noted that the voltages of the two floating capacitors approach their equilibrium values. The voltage Vc1 oscillates between 14 and 15 V, corresponding to the value 2E/3, and the voltage Vc2 oscillates between 7 and 8 V, around 7.13 V, which is equivalent to E/3.

(Figure: Voltage (V) versus Time (s) for Vpv, Vc1 and Vc2.)

Fig. 4. Capacitor voltages and panel voltage with irradiation = 1000 W/m2 and temperature = 25 °C

The evolution of the load current is shown in Fig. 5. The load current stabilizes at a set value under an illumination of 1000 W/m2, following a transient period, to maintain the power demanded by the load.

(Figure: Load current (A) versus Time (s).)

Fig. 5. Load current with irradiation = 1000 W/m2 and temperature = 25 °C

In Fig. 6, the characteristic of the panel power as a function of the voltage is presented for different values of the irradiation. The panel power is given by the product of the panel voltage Vpv and its current Ipv. The system is subjected to different illumination values of 400 W/m2, 600 W/m2, 800 W/m2 and 1000 W/m2. The proper working of the algorithm can be verified in a simple and consistent manner: whenever the irradiation increases, the power increases. The MPPT controller keeps oscillating around the maximum power point.

(Figure: Power (W) versus Voltage (V) for irradiances of 400, 600, 800 and 1000 W/m2.)

Fig. 6. Characteristic of the panel power as a function of the voltage for different irradiations

Figure 7 shows the power of the load for an illumination of 1000 W/m2. It can be seen that the system works at its maximum power point.

(Figure: Load power (W) versus Time (s).)

Fig. 7. Load power with irradiation = 1000 W/m2 and temperature = 25 °C

5 Conclusion

In this study, the design of an MPPT algorithm connected to a multi-cell converter is proposed. The theoretical and numerical simulation results demonstrate that the proposed controller improves the robustness and performance of the system even when the irradiance changes over several levels. It is important to note that the voltages of the two floating capacitors of the converter follow their references and stabilize at their reference levels.

References
1. Abdelmalek, S., Rezazi, S., Azar, A.T.: Sensor faults detection and estimation for
a DFIG equipped wind turbine. Energy Proc. Mater. Energy I 139, 3–9 (2015)
2. Abdelmalek, S., Azar, A.T., Dib, D.: A novel actuator fault-tolerant control strat-
egy of DFIG-based wind turbines using takagi-sugeno multiple models. Int. J.
Control Autom. Syst. 16(3), 1415–1424 (2018)
3. Amara, K., Fekik, A., Hocine, D., Bakir, M.L., Bourennane, E.B., Malek, T.A.,
Malek, A.: Improved performance of a PV solar panel with adaptive neuro fuzzy
inference system ANFIS based MPPT. In: 2018 7th International Conference on
Renewable Energy Research and Applications (ICRERA), pp 1098–1101. IEEE
(2018)
4. Amara, K., Bakir, T., Malek, A., Hocine, D., Bourennane, E.B., Fekik, A., Zaouia,
M.: An optimized steepest gradient based maximum power point tracking for PV
control systems. Int. J. Electr. Eng. Inform. 11(4), 662–683 (2019)
5. Ammar, H.H., Azar, A.T., Shalaby, R., Mahmoud, M.I.: Metaheuristic optimization of fractional order incremental conductance (FO-INC) maximum power point tracking (MPPT). Complexity 2019, 1–13 (2019). 7687891
6. Smida, M.B., Sakly, A., Vaidyanathan, S., Azar, A.T.: Control-based maximum
power point tracking for a grid-connected hybrid renewable energy system opti-
mized by particle swarm optimization. In: Azar, A.T., Vaidyanathan. S. (eds.)
Advances in System Dynamics and Control, Advances in Systems Analysis, Soft-
ware Engineering, and High Performance Computing (ASASEHPC), pp. 58–89.
IGI Global (2018)
7. Busquets-Monge, S., Rocabert, J., Rodriguez, P., Alepuz, S., Bordonau, J.: Multi-
level diode-clamped converter for photovoltaic generators with independent voltage
control of each solar array. IEEE Trans. Ind. Electron. 55(7), 2713–2723 (2008)
8. Ghoudelbourk, S., Dib, D., Omeiri, A., Azar, A.T.: MPPT control in wind energy
conversion systems and the application of fractional control (piα) in pitch wind
turbine. Int. J. Modell. Ident. Control 26(2), 140–151 (2016)
9. Hamida, M.L., Denoun, H., Fekik, A., Vaidyanathan, S.: Control of separately
excited dc motor with series multi-cells chopper using pi-petri nets controller. Non-
linear Eng. 8(1), 32–38 (2019)
10. Jordehi, A.R.: Maximum power point tracking in photovoltaic (PV) systems: a
review of different approaches. Renew. Sustain. Energy Rev. 65, 1127–1138 (2016)
11. Kamal, N.A., Ibrahim, A.M.: Conventional, intelligent, and fractional-order control
method for maximum power point tracking of a photovoltaic system: a review. In:
Fractional Order Systems, pp 603–671. Elsevier (2018)
12. Kamal, N.A., Azar, A.T., Elbasuony, G.S., Almustafa, K.M., Almakhles, D.: PSO-
based adaptive perturb and observe MPPT technique for photovoltaic systems. In:
Hassanien, A. (ed) Proceedings of AISI 2019, Advances in Intelligent Systems and
Computing, vol. 1058, pp. 125–135. Springer, Cham (2020)
13. Lamine, H.M., Hakim, D., Arezki, F., Nabil, B., Nacerddine, B.: Cyclic reports
modulation control strategy for a five cells inverter. In: 2018 International Con-
ference on Electrical Sciences and Technologies in Maghreb (CISTEM), pp. 1–5.
IEEE (2018)
14. Lamine, H.M., Hakim, D., Arezki, F., Dyhia, K., Nacereddine, B., Youssef, B.:
Control of three-cell inverter with a fuzzy logic-feedback linearization strategy to
reduce the harmonic content of the output current. In: 2019 International Con-
ference of Computer Science and Renewable Energies (ICCSRE), pp. 1–5. IEEE
(2019)

15. Meghni, B., Dib, D., Azar, A.T., Ghoudelbourk, S., Saadoun, A.: Robust adap-
tive supervisory fractional order controller for optimal energy management in wind
turbine with battery storage. In: Azar, A.T., Vaidyanathan, S., Ouannas, A. (eds.)
Fractional Order Control and Synchronization of Chaotic Systems, Studies in Com-
putational Intelligence, vol. 688, pp. 165–202. Springer, Cham (2017)
16. Meghni, B., Dib, D., Azar, A.T., Saadoun, A.: Effective supervisory controller
to extend optimal energy management in hybrid wind turbine under energy and
reliability constraints. Int. J. Dyn. Control 6(1), 369–383 (2018)
17. Muhammad, S., Musa, H.: Comparison of an optimized fractional order fuzzy and
fuzzy controllers based MPPT using PSO for photovoltaic applications. In: Pro-
ceedings of the 2019 2nd International Conference on Electronics and Electrical
Engineering Technology, pp. 137–142 (2019)
18. Patel, S., Sarabakha, A., Kircali, D., Kayacan, E.: An intelligent hybrid artificial
neural network-based approach for control of aerial robots. J. Intell. Robot. Syst.
97(2), 1–12 (2019)
19. Ramli, M.A., Twaha, S., Ishaque, K., Al-Turki, Y.A.: A review on maximum
power point tracking for photovoltaic systems with and without shading condi-
tions. Renew. Sustain. Energy Rev. 67, 144–159 (2017)
20. Rao, V.V., Kumar, A.A.: Artificial neural network and adaptive neuro fuzzy control
of direct torque control of induction motor for speed and torque ripple control.
In: 2018 2nd International Conference on Trends in Electronics and Informatics
(ICOEI), pp. 1416–1422. IEEE (2018)
21. Roman, E., Alonso, R., Ibañez, P., Elorduizapatarietxe, S., Goitia, D.: Intelligent
PV module for grid-connected PV systems. IEEE Trans. Ind. Electron. 53(4),
1066–1073 (2006)
22. Scarpa, V.V., Buso, S., Spiazzi, G.: Low-complexity MPPT technique exploiting
the PV module MPP locus characterization. IEEE Trans. Ind. Electron. 56(5),
1531–1538 (2008)
Hidden and Coexisting Attractors
in a New Two-Dimensional Fractional
Map

Amina-Aicha Khennaoui1(B), Adel Ouannas2, and Giuseppe Grassi3

1 Laboratory of Dynamical System and Control, University of Larbi Ben M’hidi, Oum El Bouaghi, Algeria
khennaoui.amina@univ-oeb.dz
2 Laboratory of Mathematics, Informatics and Systems (LAMIS), University of Laarbi Tebessi, 12002 Tebessa, Algeria
ouannas.adel@univ-tebessa.dz
3 Dipartimento Ingegneria Innovazione, Universita del Salento, 73100 Lecce, Italy
giuseppe.grassi@unisalento.it

Abstract. In this paper, we derive a novel two-dimensional fractional map based on discrete fractional calculus. This map has no equilibrium point, yet it can exhibit rich and complex dynamics. Chaos and bifurcation of the novel fractional map are analyzed by employing various tools, including bifurcation diagrams, Lyapunov exponents and phase portraits.

Keywords: Discrete fractional calculus · Chaos · Coexisting attractors

1 Introduction

Recently, researchers have developed an interest in discrete fractional calculus and its applications in science and engineering. The vast majority of the literature related to discrete fractional calculus was published in the last decade. This gave rise to many two- and three-dimensional fractional maps, such as [1–5]. Researchers have claimed that these systems are superior to their integer-order counterparts and exhibit richer dynamical behaviors.
Recently, more attention has been paid to the study of the complex dynamics of chaotic systems with no equilibrium point, due to their importance in engineering applications. The attractors associated with these systems have no fixed point; such attractors are called hidden attractors. Several nonlinear continuous models with hidden attractors have been proposed in the literature, such as [6].
On the other hand, only little effort has been made for discrete models [7–11]. Motivated by the previous work of Ouannas et al. [7], in this paper we present a new two-dimensional fractional map without equilibrium points. Dynamic properties of this fractional map are obtained through bifurcation diagrams and largest Lyapunov exponents. It is shown that the proposed fractional map has the property of coexisting attractors.

2 The New 2D Fractional Map


In this paper, we consider the following two-dimensional difference equation:

$$
\begin{cases}
{}^{C}\Delta_{a}^{\nu}x(t)=y(t-1+\nu)-x(t-1+\nu),\\[2pt]
{}^{C}\Delta_{a}^{\nu}y(t)=-\alpha\,y(t-1+\nu)-0.37\,y^{2}(t-1+\nu)+0.81\,x(t-1+\nu)\,y(t-1+\nu)+1.79,
\end{cases} \tag{1}
$$

where x and y are the states of the fractional map, α is a system parameter, and ${}^{C}\Delta_{a}^{\nu}$ is the Caputo-like difference operator of fractional order ν ∈ (0, 1). The definition of the ν-th fractional operator is given as follows:
Definition 1. For n = 1, ν > 0 and y(t) ∈ Na, we define the ν-th order Caputo-like operator as

$$
{}^{C}\Delta_{a}^{\nu}y(t)=\frac{1}{\Gamma(1-\nu)}\sum_{s=a}^{t-(1-\nu)}(t-s-1)^{(-\nu)}\,\Delta_{s}y(s), \tag{2}
$$

where the symbol Γ(·) represents the Euler gamma function and t ∈ Na+1−ν.
The definition of the ν-th fractional sum operator is given by:

Definition 2. For ν > 0, we define the ν-th fractional sum $\Delta_{a}^{-\nu}y(t)$ as

$$
\Delta_{a}^{-\nu}y(t)=\frac{1}{\Gamma(\nu)}\sum_{s=a}^{t-\nu}(t-s-1)^{(\nu-1)}\,y(s). \tag{3}
$$

Using the ν-th fractional sum, Eq. (1) can also be rewritten in the form of the Volterra integral equation (4):

$$
\begin{cases}
x(t)=x_{0}+\dfrac{1}{\Gamma(\nu)}\displaystyle\sum_{s=a+1-\nu}^{t-\nu}(t-s-1)^{(\nu-1)}\big(y(s+\nu-1)-x(s+\nu-1)\big),\\[8pt]
y(t)=y_{0}+\dfrac{1}{\Gamma(\nu)}\displaystyle\sum_{s=a+1-\nu}^{t-\nu}(t-s-1)^{(\nu-1)}\big(-\alpha\,y(s-1+\nu)-0.37\,y^{2}(s-1+\nu)+0.81\,x(s-1+\nu)\,y(s-1+\nu)+1.79\big).
\end{cases} \tag{4}
$$

In the present work, numerical methods are adopted to investigate the complex dynamics of this fractional map. Firstly, we discuss the equilibrium points of the novel model. The equilibrium points are obtained by solving the following equations:

$$
\begin{cases}
y-x=0,\\
-\alpha y-0.37y^{2}-0.81xy+1.79=0,
\end{cases} \tag{5}
$$

From the system of Eq. (5), it follows that

$$
-\alpha y-1.18y^{2}+1.79=0. \tag{6}
$$



Thus, the fractional map (1) has no equilibrium point when −2.9067 < α < 2.9067. This result shows that the fractional map (1) can generate a hidden chaotic attractor.
Secondly, we present the numerical formula of the fractional map (1). Setting the initial point a to 0, letting s + ν = j, and replacing $(t-s-1)^{(\nu-1)}/\Gamma(\nu)$ by $\Gamma(t-s)/\big(\Gamma(\nu)\Gamma(t-s-\nu+1)\big)$, the above Eq. (4) is changed to:

$$
\begin{cases}
x(n)=x_{0}+\dfrac{1}{\Gamma(\nu)}\displaystyle\sum_{j=1}^{n}\dfrac{\Gamma(n-j+\nu)}{\Gamma(n-j+1)}\big(y(j-1)-x(j-1)\big),\\[8pt]
y(n)=y_{0}+\dfrac{1}{\Gamma(\nu)}\displaystyle\sum_{j=1}^{n}\dfrac{\Gamma(n-j+\nu)}{\Gamma(n-j+1)}\big(-\alpha\,y(j-1)-0.37\,y^{2}(j-1)+0.81\,x(j-1)\,y(j-1)+1.79\big),
\end{cases} \tag{7}
$$

where x(0) and y(0) are the initial states. In the next section, the dynamic characteristics of the novel 2D fractional map are analyzed numerically.
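A direct Python implementation of the numerical formula (7) could look like the sketch below. The parameter values, the orbit length and the log-gamma evaluation of the memory weights are illustrative choices, not the exact settings used for the figures.

```python
import math

ALPHA, NU = 1.73, 0.96          # illustrative system parameter and fractional order

def rhs(x, y, alpha=ALPHA):
    """Right-hand sides of the fractional map (1)."""
    fx = y - x
    fy = -alpha * y - 0.37 * y * y + 0.81 * x * y + 1.79
    return fx, fy

def iterate_map(x0, y0, n_steps, nu=NU):
    """Numerical formula (7): every new state re-weights the whole memory."""
    xs, ys, fxs, fys = [x0], [y0], [], []
    for n in range(1, n_steps + 1):
        fx, fy = rhs(xs[-1], ys[-1])
        fxs.append(fx)
        fys.append(fy)
        acc_x = acc_y = 0.0
        for j in range(1, n + 1):
            # Gamma(n-j+nu)/Gamma(n-j+1), computed through log-gammas for stability
            w = math.exp(math.lgamma(n - j + nu) - math.lgamma(n - j + 1))
            acc_x += w * fxs[j - 1]
            acc_y += w * fys[j - 1]
        xs.append(x0 + acc_x / math.gamma(nu))
        ys.append(y0 + acc_y / math.gamma(nu))
    return xs, ys

xs, ys = iterate_map(1.78, -0.79, 300)
print("last states:", [(round(x, 3), round(y, 3)) for x, y in zip(xs[-3:], ys[-3:])])
```

Because of the memory term, each step has cost proportional to the orbit length, so long orbits are noticeably more expensive than for an ordinary (integer-order) map.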

3 Bifurcations and Largest Lyapunov Exponents


When plotting bifurcation diagrams, two sets of symmetrical initial states are considered. The bifurcation diagram is plotted in blue for the initial state x0 = 1.78, y0 = −0.79, and in red for the initial state x0 = −1.78, y0 = 0.79.

3.1 Bifurcation and Largest Lyapunov Exponents Versus System Parameter α

Firstly, we study the bifurcation diagram of the fractional map (1) as the parameter α is varied from 1.35109 to 1.9199. Bifurcation diagrams and largest Lyapunov exponents of the state variable x(n) are studied for two different values of ν, as shown in Fig. 1 and Fig. 2. As can be seen, the states of the fractional map (1) change qualitatively with the variation of α and ν. Figure 1(a) illustrates the bifurcation diagram of the fractional map (1) with ν = 0.9362. When α increases from 1.35109 to 1.9199, the system goes from periodic to chaotic motion via a period-doubling bifurcation. A periodic window is observed in (1.7029, 1.7324). It is worth noting that the fractional map (1) exhibits chaotic behavior over larger intervals for the initial condition x0 = 1.78, y0 = −0.79. As shown in Fig. 2, when the fractional order increases from 0.9362 to 0.992, the fractional map (1) shows chaotic motion over most of the range (1.7387, 1.9136), with a small periodic motion at α = 1.7977.
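A coarse sketch of how the data behind such a bifurcation diagram could be generated is given below: for each sampled α, the tail of an orbit computed with formula (7) is summarized by its extremes. The orbit length, the tail length and the α samples are arbitrary illustrative choices; an actual diagram would plot the full set of tail points over a fine grid of α values.

```python
import math

def orbit_x(alpha, nu, x0=1.78, y0=-0.79, n_steps=250):
    """Compact re-implementation of formula (7) returning the x-orbit."""
    xs, ys, fxs, fys = [x0], [y0], [], []
    for n in range(1, n_steps + 1):
        fxs.append(ys[-1] - xs[-1])
        fys.append(-alpha * ys[-1] - 0.37 * ys[-1] * ys[-1]
                   + 0.81 * xs[-1] * ys[-1] + 1.79)
        sx = sy = 0.0
        for j in range(1, n + 1):
            w = math.exp(math.lgamma(n - j + nu) - math.lgamma(n - j + 1))
            sx += w * fxs[j - 1]
            sy += w * fys[j - 1]
        xs.append(x0 + sx / math.gamma(nu))
        ys.append(y0 + sy / math.gamma(nu))
    return xs

# summarize the asymptotic behaviour for a few values of alpha at nu = 0.9362
for alpha in (1.40, 1.55, 1.70, 1.80, 1.90):
    tail = orbit_x(alpha, nu=0.9362)[-60:]
    print("alpha = %.2f   min x = %7.3f   max x = %7.3f" % (alpha, min(tail), max(tail)))
```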

3.2 Bifurcation Versus Fractional Order ν

To observe the influence of the order ν on the dynamic behavior of the fractional map (1), its bifurcation with respect to the order ν is considered. We choose to fix the parameter α = 1.73 and vary the value of ν in the interval [0, 1]. The bifurcation diagram and the largest Lyapunov exponent are illustrated in Fig. 3(a) and Fig. 3(b), respectively. As one can see, the system has a positive Lyapunov exponent when ν takes the smallest values, indicating that the fractional map (1) is chaotic. When the order ν ∈ [0.9362, 0.9402] ∪ ]0.9816, 0.9834], the fractional map (1) is in a periodic state, while for the remaining ranges the fractional map (1) shows chaotic behavior.

Fig. 1. (a) Bifurcation diagrams of the fractional order map (1) versus α for ν = 0.9362, (b) Largest Lyapunov exponent diagram corresponding to (a).

Fig. 2. (a) Bifurcation diagrams of the fractional order map (1) versus α for ν = 0.992,
(b) Largest Lyapunov exponent diagram corresponding to (a).

(Figure: panel (a) shows xmax versus ν and panel (b) shows λmax versus ν, for ν between 0.92 and 1.)

Fig. 3. (a) Bifurcation diagram versus ν for α = 1.73, (b) Largest Lyapunov exponent corresponding to diagram (a).

(Figure: three phase-portrait panels (a), (b) and (c) in the (x, y) plane.)

Fig. 4. The coexisting attractors of fractional map (1) with system parameter 1.73 and initial condition (−1.78, 0.79) for the red attractor and (1.78, −0.79) for the blue attractor; (a) hidden chaotic attractor and periodic hidden attractor for ν = 0.9992; (b) periodic orbit and hidden chaotic attractor for ν = 0.9362; (c) two coexisting hidden attractors for ν = 0.96.

4 Coexisting Attractors
Here, the dynamics of the fractional map (1) are analyzed using phase portraits with two different sets of initial conditions, fixing the parameter α and the initial conditions as above. For ν = 0.992, the fractional map (1) shows a coexisting hidden chaotic attractor and a hidden periodic attractor corresponding to the initial conditions (1.78, −0.79) and (−1.78, 0.79), as shown in Fig. 4(a). Similarly, when we choose the order ν = 0.9992, the fractional map (1) shows coexisting periodic and chaotic hidden attractors corresponding to the initial conditions (−1.78, 0.79) and (1.78, −0.79), respectively, as plotted in Fig. 4(b). Furthermore, if we fix the order ν to 0.96, a pair of coexisting hidden attractors is obtained, as depicted in Fig. 4(c).

5 Conclusion
A fractional map without equilibrium, which exhibits rich dynamics and hidden chaotic attractors, was examined in this work. The proposed fractional map was constructed based on the Caputo-like difference operator and has no fixed point. Through phase portraits, bifurcation diagrams and largest Lyapunov exponents, we have shown that chaos exists in this fractional map and that the type and range of chaotic behavior depend on the fractional order. Also, numerical experiments have shown that the system exhibits the property of coexisting attractors.

References
1. Wu, G.-C., Baleanu, D.: Discrete chaos in fractional delayed logistic maps. Non-
linear Dyn. 80(4), 1697–1703 (2014)
2. Khennaoui, A.-A., Ouannas, A., Bendoukha, S., Grassi, G., Lozi, R.-P., Pham,
V.-T.: On fractional-order discrete-time systems: chaos, stabilization and synchro-
nization. Chaos Solitons Fractals 119, 150–162 (2019)
3. Khennaoui, A.-A., Ouannas, A., Bendoukha, S., Grassi, G., Wang, X., Pham,
V.-T., El-Alsaadi, F.: Chaos, control, and synchronization in some fractional-order
difference equations. Adv. Differ. Equ. 2019, 412 (2019)
4. Ouannas, A., Khennaoui, A.-A., Grassi, G., Bendoukha, S.: On the Q-S chaos syn-
chronization of fractional-order discrete-time systems: general method and exam-
ples. Discret. Dyn. Nat. Soc. 1–8 (2018)
5. Gasri, A., Ouannas, A., Khennaoui, A.A., Bendoukha, S., Pham, V.-T.: On the
dynamics and control of fractional chaotic maps with sine terms. Int. J. Nonlinear
Sci. Numer. Simul. 1 (2020)
6. Jafari, S., Sprott, J.-C., Nazarimehr, F.: Recent new examples of hidden attractors.
Eur. Phys. J. Spec. Top. 224, 1469–1476 (2015)
7. Ouannas, A., Wang, X., Khennaoui, A.A., Bendoukha, S., Pham, V.T., Alsaadi, F.:
Fractional form of a chaotic map without fixed points: chaos, entropy and control.
Entropy 20, 720 (2018)
8. Khennaoui, A.-A., Ouannas, A., Boulaaras, S., Pham, V.-T., Taher Azar, A.: A
fractional map with hidden attractors: chaos and control. Eur. Phys. J. Spec. Top.
229, 1083–1093 (2020)

9. Khennaoui, A.-A., Ouannas, A., Grassi, G., Azar, A.T.: Dynamic analysis of a
fractional map with hidden attractor. In: Joint European-US Workshop on Appli-
cations of Invariance in Computer Vision, pp. 731–739. Springer, Cham (2020)
10. Ouannas, A., Khennaoui, A.-A., Momani, S., Grassi, G., Pham, V.-T.: Chaos and
control of a three-dimensional fractional order discrete-time system with no equi-
librium and its synchronization. AIP Adv. 10(4), 045310 (2020)
11. Ouannas, A., Khennaoui, A., Momani, S., Pham, V.T., El-Khazali, R.: Hidden
attractors in a new fractional-order discrete system: chaos, complexity, entropy
and control. Chin. Phys. B 29, 050504 (2020)
12. Abdeljawad, T.: On Riemann and Caputo fractional differences. Comput. Math.
Appl. 62, 1602–1611 (2011)
13. Atici, F.-M., Eloe, P.-W.: Discrete fractional calculus with the nabla operator.
Electron. J. Qual. Theory Differ. Equ. Spec. Ed. I 3, 1–12 (2009)
14. Anastassiou, G.-A.: Principles of delta fractional calculus on time scales and
inequalities. Math. Comput. Model. 52, 556–566 (2010)
Author Index

A Angombe, Simon, 598


Abd El-Kader, Shriene M., 314, 336 Anis, Sarah, 227
Abd El-Moghith, Ibrahim A., 282 Aref, Mostafa, 115, 227
Abdel Hameed, Nagwa S., 537 Aref, Mostafa M., 65
Abdel-Aal, Amal M., 779 Azar, Ahmad Taher, 839, 851, 862, 873
Abdel-Kader, Hala M., 779 Aziz El-Banna, Ahmad A., 51, 517, 779
Abdellah, Abdelrahman, 292 Azurdia-Meza, Cesar, 455, 816
Abdel-Mageid, Salah M., 826
Abd-Elrahman, Emad, 292 B
Abdelwahab, Amira, 429 Badawy, Ibrahim, 305
Abdulkader, Hatem, 429 Badawy, Ibrahim M., 769
Abu-Talleb, Amr, 527 Bakr, Randa, 517
Adasme, Pablo, 455, 816 Banditwattanawong, Thepparit, 159
Adly, HebatAllah, 51 Birech, Rhoda J., 598
Ahmed, Ala’a, 708
Al Khayyal, Asma Omran, 742 C
Al Kurdi, Barween, 644, 656, 668, 681, 697, Chang, Cheng-Kuo, 548
708, 720, 731, 742 Chang, Kuo-Chi, 90, 148, 385, 548, 558, 568,
Al Suwaidi, Fatema, 720 577, 586
Alameeri, Khadija, 668 Chu, Kai-Chun, 148, 548, 568, 577, 586
Al-Berry, M. N., 358
Al-Dhuhouri, Fatima Saeed, 644 D
AL-Ghuribi, Sumaia Mohammed, 204 Darwish, Saad M., 3, 16, 38, 215, 261, 271,
Alkitbi, Salama S., 656 282, 465, 621, 793
Almaazmi, Jasim, 731 Dayoub, Moammar, 598
Almazrouei, Fatima Ahmed, 697 Deng, Hui-Qiong, 558
Alshamsi, Aisha, 404 Denoun, Hakim, 873
AlShehhi, Hind, 417 Dessokey, Maha, 394
Alshurideh, Muhammad, 100, 404, 417, 488, Durney, Hugo, 455, 816
632, 644, 656, 668, 681, 697, 708, 720,
731, 742 E
Alsuwaidi, Maryam, 681 Ebeid, Ahmed G., 314
Alyammahi, Alyaa, 488 Eid, Marwa M., 247
Amesimenu, Governor David Kwabena, 586 El Kafhali, Said, 237
Amin, Safaa El-Sayed, 324 El-Aziz, Reham Kamel Abd, 51


ElDeeb, Hesham E., 292 I


Eldeeb, Hesham, 394 Ibrahim, Mina, 370
Eldeib, Ahmed H., 475 Irarrazaval, Pablo, 455, 816
Eldien, Adly S. Tag, 475 Ismail, Ahmed A., 271
Elgeldawi, Enas, 501
Elghamrawy, Sally, 347 J
El-Kader, Sherine M. Abd, 608 Jean d’Amour, Ntawiheba, 148
ElSaadawy, Alaa H., 358
ELSayed, Ahmed S., 358 K
El-Shaikh, Sami A. A., 517 Kamal, Nashwa Ahmad, 839, 851, 862, 873
Elshishtawy, Rhana M., 475 Kasban, H., 126
Eltabey, Mohamed M., 527 Khalil, Hassan A., 3
Elzoghaly, Khaled O., 215 Khennaoui, Amina-Aicha, 883
Koubaa, Anis, 839, 862
Kurdi, Barween Al, 100, 404, 417, 488, 632
F
Fahmy, Imane M. A., 441
L
Fang, Qingjun, 187
Li, Pei-Qiang, 148, 548, 558
Fanni, Mohamed, 759
Li, Qin-Bin, 558
Farag, Ramy, 305, 769
Lin, Yuh-Chung, 148, 568, 577
Fatrah, Aicha, 237
Liu, Jia-Jing, 577
Fawzy, Ahmed, 336
Lotfy, Yasmin A., 261
Fawzy, Noorhan K., 65
Lu, You-Te, 187
Fekik, Arezki, 873
Luo, Jie, 558
Fikry, Refaat M., 126
Luo, Ling, 90
Firoozabadi, Ali Dehghan, 455, 816
Fouda, Mostafa M., 475
M
Madbouly, Magda M., 16, 38
G Magdy, Ahmed, 336
Gaber, Heba, 370 Magdy, Fady, 305, 769
Gamal, Rofida Mohammed, 501 Mahmoud, Zakaria, 305
Gawich, Mariam, 179 Marey, Mohammed A., 65
Genina, Alàa, 179 Marey, Mohammed Abd El-Rahman, 324
Girgis, Moheb R., 501 Masdisornchote, Masawee, 159
Grassi, Giuseppe, 883 Mawgoud, Ahmed A., 527
Mehrez, Aaesha Ahmed Al, 632
H Mittal, Shikha, 851
Haghbayan, Mohammad-Hashem, 598 Mohamed, Emadeldin, 803
Halmaoui, Houssam, 79 Mohamed, Hatem, 370
Hamdi, Eman, 115 Mohammad, Samaa A., 537
Hamed, Ghada, 324 Mohammed, Nihal H., 826
Hamida, Mohamed Lamine, 873 Mokhtar, Yasser F., 16
Haqiq, Abdelkrim, 79, 237 Mostafa, Aya S. A., 26
Hassan, Ahmed M., 621 Mostafa, Kassem M., 793
Hassan, Osama F., 3, 215 Mostafa, Lamiaa, 195
Hefnawy, Marwan A., 465 Moustafa, Ahmed, 769
Hefny, Hesham A., 441, 803
Hegazy, Abdelfatah, 179 N
Hikal, Noha A., 247 Nashaat, Heba, 826
Hossam, Aya, 336 Noah, Shahrul Azman, 204
Hsu, Tsui-Lien, 148, 568, 577, 586
Hu, Peng-Jun, 90 O
Hu, Xiaohui, 171, 187 Olave, Miguel Sanhueza, 455, 816
Hussein, Hanan H., 608 Omer, Abdalaziz Altayeb Ibrahim, 548

Ou, Haiyan, 171 Shyirambere, Gilbert, 586


Ouannas, Adel, 883 Sung, Tien-Wen, 171, 187, 586
Sutinen, Erkki, 598
P
Pan, Jeng-Shyang, 568 T
Tag Eldien, Adly S., 51
R Tag ELdien, Adly S., 517
Radwan, Mohamed Hanafy, 608 Taher, Mohamed, 292
Rady, Sherine, 115 Tantawi, Manal, 137
Rahouma, Kamel H., 26, 537 Tiun, Sabrina, 204
Rizk, Rawia Y., 826 Tolba, Mohamed F., 137
Roushdy, Mohamed, 358 Tolba, Mohamed Fahmy, 324
Turatsinze, Elias, 548
S
Saad, Elsayed, 394 W
Saad, Sally, 227 Waheed, Ahmed, 769
Saeed, Zakaria, 759 Wang, Hsiao-Chuan, 148, 568, 577, 586
Saif, Ihab, 759
Saif, Sherif M., 292, 394
Y
Salah, Khaled, 237
Yang, Liu, 90
Salem, Sameh, 394
Yassa, Nacira, 873
Sallam, Mohamed, 305, 759
Ye, Zhi-Peng, 385
Sallem, Mohamed, 769
Youssef, Nesma, 429
Salloum, Said A., 100, 404, 417, 488, 632,
644, 656, 668, 681, 697, 708, 720, 731,
742 Z
Sayedahmed, Hamdy A. M., 441, 803 Zabala-Blanco, David, 455, 816
Selem, Enas, 314 Zalat, Mohamed S., 38
Serrano, Fernando E., 839, 862, 873 Zhang, Cheng, 577
Shah, Syed Faisal, 100 Zhang, Ping-Jun, 90
Shaker, Abdelrahman M., 137 Zheng, Rong-Jin, 558
Shedeed, Howida A., 137 Zhou, Yu-Wen, 148, 548, 568
