
Advances in Intelligent Systems and Computing 880

Kohei Arai
Rahul Bhatia
Supriya Kapoor   Editors

Proceedings
of the Future
Technologies
Conference (FTC)
2018
Volume 1
Advances in Intelligent Systems and Computing

Volume 880

Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
The series “Advances in Intelligent Systems and Computing” contains publications on theory,
applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all
disciplines such as engineering, natural sciences, computer and information science, ICT, economics,
business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the
areas of modern intelligent systems and computing such as: computational intelligence, soft computing
including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms,
social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and
society, cognitive science and systems, Perception and Vision, DNA and immune based systems,
self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics including
human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent
data analysis, knowledge management, intelligent agents, intelligent decision making and support,
intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings
of important conferences, symposia and congresses. They cover significant recent developments in the
field, both of a foundational and applicable character. An important characteristic feature of the series is
the short publication time and world-wide distribution. This permits a rapid and broad dissemination of
research results.

Advisory Board
Chairman
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
e-mail: nikhil@isical.ac.in
Members
Rafael Bello Perez, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
e-mail: rbellop@uclv.edu.cu
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
e-mail: escorchado@usal.es
Hani Hagras, University of Essex, Colchester, UK
e-mail: hani@essex.ac.uk
László T. Kóczy, Széchenyi István University, Győr, Hungary
e-mail: koczy@sze.hu
Vladik Kreinovich, University of Texas at El Paso, El Paso, USA
e-mail: vladik@utep.edu
Chin-Teng Lin, National Chiao Tung University, Hsinchu, Taiwan
e-mail: ctlin@mail.nctu.edu.tw
Jie Lu, University of Technology, Sydney, Australia
e-mail: Jie.Lu@uts.edu.au
Patricia Melin, Tijuana Institute of Technology, Tijuana, Mexico
e-mail: epmelin@hafsamx.org
Nadia Nedjah, State University of Rio de Janeiro, Rio de Janeiro, Brazil
e-mail: nadia@eng.uerj.br
Ngoc Thanh Nguyen, Wroclaw University of Technology, Wroclaw, Poland
e-mail: Ngoc-Thanh.Nguyen@pwr.edu.pl
Jun Wang, The Chinese University of Hong Kong, Shatin, Hong Kong
e-mail: jwang@mae.cuhk.edu.hk

More information about this series at http://www.springer.com/series/11156


Kohei Arai · Rahul Bhatia · Supriya Kapoor
Editors

Proceedings of the Future Technologies Conference (FTC) 2018
Volume 1

Editors

Kohei Arai
Saga University
Saga, Japan

Rahul Bhatia
The Science and Information (SAI) Organization
Bradford, West Yorkshire, UK

Supriya Kapoor
The Science and Information (SAI) Organization
Bradford, UK

ISSN 2194-5357 ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-02685-1 ISBN 978-3-030-02686-8 (eBook)
https://doi.org/10.1007/978-3-030-02686-8

Library of Congress Control Number: 2018957983

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Editor’s Preface

The Future Technologies Conference (FTC) 2018 was held on November 13–14, 2018,
in Vancouver at the Marriott Pinnacle Downtown Hotel, with sweeping views
of the coastal mountains, Coal Harbour, and Vancouver's city skyline. Vancouver
is considered one of the most beautiful cities in the world.
It is our great privilege to present the Proceedings of FTC 2018 to readers in
two volumes. We hope that you will find them useful, exciting, and inspiring.
FTC 2018 aimed to paint a bright picture of the landscape of future technologies
by providing a platform to present the best of current systems research and
practice, emphasizing innovation and quantified experience. The ever-changing
scope and rapid development of future technologies create new problems and
questions, resulting in a real need to share brilliant ideas and stimulate
awareness of this important research field.
Researchers, academics, and technologists from leading universities, research
firms, government agencies, and companies from more than 50 countries presented
the latest research at the forefront of technology and computing. After the
double-blind review process, we selected 173 full papers, including six poster
papers, for publication.
We would like to express our gratitude and appreciation to all of the reviewers
who helped us maintain the high quality of manuscripts included in this conference
proceedings. We would also like to extend our thanks to the members of the
organizing team for their hard work. We are tremendously grateful for the contri-
butions and support received from authors, participants, keynote speakers, program
committee members, session chairs, organizing committee members, steering
committee members, and others in their various roles. Their valuable support,
suggestions, dedicated commitment, and hard work have made FTC 2018 a suc-
cess. Finally, we would like to thank the conference’s sponsors and partners:
Western Digital, IBM Research, and Nature Electronics.
We believe this event will help further disseminate new ideas and inspire more
international collaborations.


We hope that all the participants of FTC 2018 had a wonderful and fruitful time
at the conference and that our overseas guests enjoyed their sojourn in Vancouver!
Kind Regards,
Kohei Arai
Contents

Towards in SSVEP-BCI Systems for Assistance in Decision-Making . . . 1
Rodrigo Hübner, Linnyer Beatryz Ruiz Aylon, and Gilmar Barreto
Image-Based Wheel-Base Measurement in Vehicles: A Sensitivity
Analysis to Depth and Camera’s Intrinsic Parameters . . . . . . . . . . . . . . 19
David Duron-Arellano, Daniel Soto-Lopez, and Mehran Mehrandezh
Generic Paper and Plastic Recognition by Fusion of NIR
and VIS Data and Redundancy-Aware Feature Ranking . . . . . . . . . . . . 30
Alla Serebryanyk, Matthias Zisler, and Claudius Schnörr
Hand Gesture Recognition with Leap Motion . . . . . . . . . . . . . . . . . . . . 46
Lin Feng, Youchen Du, Shenglan Liu, Li Xu, Jie Wu, and Hong Qiao
A Fast and Simple Sample-Based T-Shirt Image Search Engine . . . . . . 55
Liliang Chan, Pai Peng, Xiangyu Liu, Xixi Cao, and Houwei Cao
Autonomous Robot KUKA YouBot Navigation Based on Path
Planning and Traffic Signals Recognition . . . . . . . . . . . . . . . . . . . . . . . . 63
Carlos Gordón, Patricio Encalada, Henry Lema, Diego León,
and Cristian Peñaherrera
Towards Reduced Latency in Saccade Landing Position Prediction
Using Velocity Profile Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Henry Griffith, Subir Biswas, and Oleg Komogortsev
Wireless Power Transfer Solutions for ‘Things’ in the Internet
of Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Tim Helgesen and Moutaz Haddara
Electronic Kintsugi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Vanessa Julia Carpenter, Amanda Willis, Nikolaj “Dzl” Møbius,
and Dan Overholt

A Novel and Scalable Naming Strategy for IoT Scenarios . . . . . 122
Alejandro Gómez-Cárdenas, Xavi Masip-Bruin, Eva Marín-Tordera,
and Sarang Kahvazadeh
The IoT and Unpacking the Heffalump’s Trunk . . . . . . . . . . . . . . . . . . 134
Joseph Lindley, Paul Coulton, and Rachel Cooper
Toys That Talk to Strangers: A Look at the Privacy Policies
of Connected Toys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Wahida Chowdhury
A Reinforcement Learning Multiagent Architecture Prototype
for Smart Homes (IoT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Mario Rivas and Fernando Giorno
Real-Time Air Pollution Monitoring Systems Using Wireless Sensor
Networks Connected in a Cloud-Computing, Wrapped
up Web Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Byron Guanochanga, Rolando Cachipuendo, Walter Fuertes,
Santiago Salvador, Diego S. Benítez, Theofilos Toulkeridis, Jenny Torres,
César Villacís, Freddy Tapia, and Fausto Meneses
A Multi-agent Model for Security Awareness Driven by Home
User’s Behaviours . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Farhad Foroughi and Peter Luksch
Light Weight Cryptography for Resource Constrained IoT Devices . . . 196
Hessa Mohammed Zaher Al Shebli and Babak D. Beheshti
A Framework for Ranking IoMT Solutions Based on Measuring
Security and Privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Faisal Alsubaei, Abdullah Abuhussein, and Sajjan Shiva
CUSTODY: An IoT Based Patient Surveillance Device . . . . . . . . . . . . . 225
Md. Sadad Mahamud, Md. Manirul Islam, Md. Saniat Rahman,
and Samiul Haque Suman
Personal Branding and Digital Citizenry: Harnessing the Power
of Data and IOT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Fawzi BenMessaoud, Thomas Sewell III, and Sarah Ryan
Testing of Smart TV Applications: Key Ingredients, Challenges
and Proposed Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Bestoun S. Ahmed and Miroslav Bures
Dynamic Evolution of Simulated Autonomous Cars in the Open
World Through Tactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Joe R. Sylnice and Germán H. Alférez
Exploring the Quantified Experience: Finding Spaces for People
and Their Voices in Smarter, More Responsive Cities . . . . . . . . . . . . . . 269
H. Patricia McKenna
Prediction of Traffic-Violation Using Data Mining Techniques . . . . . . . 283
Md Amiruzzaman
An Intelligent Traffic Management System Based on the Wi-Fi
and Bluetooth Sensing and Data Clustering . . . . . . . . . . . . . . . . . . . . . . 298
Hamed H. Afshari, Shahrzad Jalali, Amir H. Ghods, and Bijan Raahemi
Economic and Performance Based Approach to the Distribution
System Expansion Planning Problem Under
Smart Grid Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
Hatem Zaki, R. A. Swief, T. S. Abdel-Salam, and M. A. M. Mostafa
Connecting to Smart Cities: Analyzing Energy Times Series
to Visualize Monthly Electricity Peak Load in Residential Buildings . . . 333
Shamaila Iram, Terrence Fernando, and Richard Hill
Anomaly Detection in Q & A Based Social Networks . . . . . . . . . . . . . . 343
Neda Soltani, Elham Hormizi, and S. Alireza Hashemi Golpayegani
A Study of Measurement of Audience in Social Networks . . . . . . . . . . . 359
Mohammed Al-Maitah
Predicting Disease Outbreaks Using Social Media: Finding
Trustworthy Users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Razieh Nokhbeh Zaeem, David Liau, and K. Suzanne Barber
Detecting Comments Showing Risk for Suicide in YouTube . . . . . . . . . 385
Jiahui Gao, Qijin Cheng, and Philip L. H. Yu
Twitter Analytics for Disaster Relevance and Disaster
Phase Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Abeer Abdel Khaleq and Ilkyeun Ra
Incorporating Code-Switching and Borrowing in Dutch-English
Automatic Language Detection on Twitter . . . . . . . . . . . . . . . . . . . . . . . 418
Samantha Kent and Daniel Claeser
A Systematic Review of Time Series Based Spam Identification
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Iqra Muhammad, Usman Qamar, and Rabia Noureen
CNN with Limit Order Book Data for Stock Price Prediction . . . . . . . . 444
Jaime Niño, German Hernandez, Andrés Arévalo, Diego Leon,
and Javier Sandoval
Implementing Clustering and Classification Approaches for Big Data
with MATLAB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
Katrin Pitz and Reiner Anderl
Visualization Tool for JADE Platform (JEX) . . . . . . . . . . . . . . . . . . . . . 481
Halim Djerroud and Arab Ali Cherif
Decision Tree-Based Approach for Defect Detection and Classification
in Oil and Gas Pipelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
Abduljalil Mohamed, Mohamed Salah Hamdi, and Sofiene Tahar
Impact of Context on Keyword Identification and Use in Biomedical
Literature Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
Venu G. Dasigi, Orlando Karam, and Sailaja Pydimarri
A Cloud-Based Decision Support System Framework for Hydropower
Biological Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Hongfei Hou, Zhiqun Daniel Deng, Jayson J. Martinez, Tao Fu, Jun Lu,
Li Tan, John Miller, and David Bakken
An Attempt to Forecast All Different Rainfall Series by Dynamic
Programming Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
Swe Swe Aung, Shin Ohsawa, Itaru Nagayama, and Shiro Tamaki
Non-subsampled Complex Wavelet Transform Based Medical
Image Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
Sanjay N. Talbar, Satishkumar S. Chavan, and Abhijit Pawar
Predicting Concussion Symptoms Using Computer Simulations . . . . . . . 557
Milan Toma
Integrating Markov Model, Bivariate Gaussian Distribution
and GPU Based Parallelization for Accurate Real-Time Diagnosis
of Arrhythmia Subclasses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
Purva R. Gawde, Arvind K. Bansal, and Jeffery A. Nielson
Identification of Glioma from MR Images Using Convolutional
Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
Nidhi Saxena, Rochan Sharma, Karishma Joshi, and Hukum Singh Rana
Array of Things for Smart Health Solutions Injury Prevention,
Performance Enhancement and Rehabilitation . . . . . . . . . . . . . . . . . . . . 598
S. M. N. Arosha Senanayake, Siti Asmah @ Khairiyah Binti Haji Raub,
Abdul Ghani Naim, and David Chieng
Applying Waterjet Technology in Surgical Procedures . . . . . . . . . . . . . 616
George Abdou and Nadi Atalla
Blockchain Revolution in the Healthcare Industry . . . . . . . . . . . . . . . . . 626
Sergey Avdoshin and Elena Pesotskaya
Effective Reversible Data Hiding in Electrocardiogram Based
on Fast Discrete Cosine Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640
Ching-Yu Yang, Lian-Ta Cheng, and Wen-Fong Wang
Semantic-Based Resume Screening System . . . . . . . . . . . . . . . . . . . . . . . 649
Yu Hou and Lixin Tao
The Next Generation of Artificial Intelligence: Synthesizable AI . . . . . . 659
Supratik Mukhopadhyay, S. S. Iyengar, Asad M. Madni,
and Robert Di Biano
Cognitive Natural Language Search Using Calibrated
Quantum Mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
Rucha Kulkarni, Harshad Kulkarni, Kalpesh Balar, and Praful Krishna
Taxonomy and Resource Modeling in Combined
Fog-to-Cloud Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
Souvik Sengupta, Jordi Garcia, and Xavi Masip-Bruin
Predicting Head-to-Head Games with a Similarity Metric
and Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
Arisoa S. Randrianasolo and Larry D. Pyeatt
Artificial Human Swarms Outperform Vegas Betting Markets . . . . . . . 721
Louis Rosenberg and Gregg Willcox
Genetic Algorithm Based on Enhanced Selection and Log-Scaled
Mutation Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
Neeraj Gupta, Nilesh Patel, Bhupendra Nath Tiwari, and Mahdi Khosravy
Second-Generation Web Interface to Correcting ASR Output . . . . . . . . 749
Oldřich Krůza and Vladislav Kuboň
A Collaborative Multi-agent System for Oil Palm Pests
and Diseases Global Situation Awareness . . . . . . . . . . . . . . . . . . . . . . . . 763
Salama A. Mostafa, Ahmed Abdulbasit Hazeem,
Shihab Hamad Khaleefah, Aida Mustapha, and Rozanawati Darman
Using Mouse Dynamics for Continuous User Authentication . . . . . . . . . 776
Osama A. Salman and Sarab M. Hameed
Ten Guidelines for Intelligent Systems Futures . . . . . . . . . . . . . . . . . . . 788
Daria Loi
Towards Computing Technologies on Machine Parsing of English
and Chinese Garden Path Sentences . . . . . . . . . . . . . . . . . . . . . . . . . . . 806
Jiali Du, Pingfang Yu, and Chengqing Zong
Music Recommender According to the User Current Mood . . . . . . . . . . 828
Murtadha Al-Maliki
Development of Extreme Learning Machine Radial Basis Function
Neural Network Models to Predict Residual Aluminum for Water
Treatment Plants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835
C. D. Jayaweera and N. Aziz
Multi-layer Mangrove Species Identification . . . . . . . . . . . . . . . . . . . . . 849
Fenddy Kong Mohd Aliff Kong, Mohd Azam Osman,
Wan Mohd Nazmee Wan Zainon, and Abdullah Zawawi Talib
Intelligent Seating System with Haptic Feedback
for Active Health Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856
Peter Gust, Sebastian P. Kampa, Nico Feller, Max Vom Stein,
Ines Haase, and Valerio Virzi
Intelligence in Embedded Systems: Overview and Applications . . . . . . . 874
Paul D. Rosero-Montalvo, Vivian F. López Batista, Edwin A. Rosero,
Edgar D. Jaramillo, Jorge A. Caraguay, José Pijal-Rojas,
and D. H. Peluffo-Ordóñez
Biometric System Based on Kinect Skeletal, Facial and
Vocal Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884
Yaron Lavi, Dror Birnbaum, Or Shabaty, and Gaddi Blumrosen
Towards the Blockchain-Enabled Offshore Wind
Energy Supply Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 904
Samira Keivanpour, Amar Ramudhin, and Daoud Ait Kadi
Optimal Dimensionality Reduced Quantum Walk
and Noise Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 914
Chen-Fu Chiang
Implementing Dual Marching Square Using Visualization
Tool Kit (VTK) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 930
Manu Garg and Sudhanshu Kumar Semwal
Procedural 3D Tile Generation for Level Design . . . . . . . . . . . . . . . . . . 941
Anthony Medendorp and Sudhanshu Kumar Semwal
Some Barriers Regarding the Sustainability of Digital Technology
for Long-Term Teaching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 950
Stefan Svetsky and Oliver Moravcik
Digital Collaboration with a Whiteboard in Virtual Reality . . . . . . . . . . 962
Markus Petrykowski, Philipp Berger, Patrick Hennig,
and Christoph Meinel
Teaching Practices with Mobile in Different Contexts . . . . . . . . . . . . . . 982
Anna Helena Silveira Sonego, Leticia Rocha Machado,
Cristina Alba Wildt Torrezzan, and Patricia Alejandra Behar
Accessibility and New Technology MOOC- Disability and Active
Aging: Technological Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 992
Samuel A. Navarro Ortega and M. Pilar Munuera Gómez
Lecturing to Your Students: Is Their Heart In It? . . . . . . . . . . . . . . . . . 1005
Aidan McGowan, Philip Hanna, Des Greer, and John Busch
Development of Collaborative Virtual Learning Environments
for Enhancing Deaf People’s Learning in Jordan . . . . . . . . . . . . . . . . . . 1017
Ahmad A. Al-Jarrah
Game Framework to Improve English Language Learners’
Motivation and Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1029
Monther M. Elaish, Norjihan Abdul Ghani, Liyana Shuib,
and Abdulmonem I. Shennat
Insights into Design of Educational Games: Comparative Analysis
of Design Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1041
Rabail Tahir and Alf Inge Wang
Immersive and Collaborative Classroom Experiences
in Virtual Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1062
Derek Jacoby, Rachel Ralph, Nicholas Preston, and Yvonne Coady
The Internet of Toys, Connectedness and Character-Based Play
in Early Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1079
Pirita Ihamäki and Katriina Heljakka
Learning Analytics Research: Using Meta-Review to Inform
Meta-Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1097
Xu Du, Juan Yang, Mingyan Zhang, Jui-Long Hung, and Brett E. Shelton
Students’ Evidential Increase in Learning Using Gamified
Learning Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1109
V. Z. Vanduhe, H. F. Hassan, Dokun Oluwajana, M. Nat, A. Idowu,
J. J. Agbo, and L. Okunlola
Improving the Use of Virtual Worlds in Education Through Learning
Analytics: A State of Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1123
Fredy Gavilanes-Sagnay, Edison Loza-Aguirre, Diego Riofrío-Luzcando,
and Marco Segura-Morales
Design and Evaluation of an Online Digital Storytelling Course
for Seniors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1133
David Kaufman, Diogo Silva, Robyn Schell, and Simone Hausknecht
The Role of Self-efficacy in Technology Acceptance . . . . . . . . . . . . . . . . 1142
Saleh Alharbi and Steve Drew
An Affective Sensitive Tutoring System for Improving Student's
Engagement in CS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1151
Ruth Agada, Jie Yan, and Weifeng Xu
Multimedia Interactive Boards as a Teaching and Learning Tool
in Environmental Education: A Case-Study with
Portuguese Students . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1164
Cecília M. Antão
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1171
Towards in SSVEP-BCI Systems
for Assistance in Decision-Making

Rodrigo Hübner¹,³, Linnyer Beatryz Ruiz Aylon², and Gilmar Barreto³

¹ Computer Department, Computer Interfaces Research Group,
Federal University of Technology - Paraná, Campo Mourão, Paraná 87301–899, Brazil
rodrigohubner@utfpr.edu.br
² Manna Research Group, State University of Maringá,
Maringá, Paraná 87020–900, Brazil
³ School of Electrical and Computer Engineering, Intelligent Systems and Control
Laboratory, State University of Campinas, Campinas, São Paulo 13083–970, Brazil

Abstract. In recent years, Brain-Computer Interface (BCI) research has had a
major focus on systems outside the clinical scope. These systems have been
used to control electrical and electronic equipment, digital games, and other
kinds of “control”. Such control can be accomplished through decision-making
by a BCI system. A paradigm known for this purpose is SSVEP (the steady-state
visually evoked potential paradigm), in which it is possible to distinguish
targets flickering at different frequencies through the visual responses they
evoke. This paper proposes a human-computer interaction system using SSVEP
for assistance in decision-making. In particular, the work describes a
prototype of traffic lights proposed as a case study. The experiments with
this prototype create decision-making situations, allowing the SSVEP-BCI
system to assist the individual in deciding correctly.

Keywords: BCI · SSVEP · Decision-making

1 Introduction
Brain-Computer Interfaces (BCI) [3,7,19] are commonly used for the development
of systems that can improve the quality of life of people who have a physical
constraint (visual, auditory, or motor) that limits their capacity. In this
way, a BCI system should minimize the subject's disability by assisting in
tasks that the subject could not perform alone. An example is [10], a system
in which a subject with a speech impairment focuses on an array of letters on
a monitor; through the visual stimuli generated, the BCI system can classify
which letter the subject is looking at and display it.
A BCI system can also aid the decision-making of healthy subjects. Some
situations can be considered risky, for example, braking a vehicle while
driving upon seeing a red traffic light or a car's headlights flashing ahead.
In such situations, a BCI system can assist the driver if the decision he
takes is not the correct one.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1–18, 2019.
https://doi.org/10.1007/978-3-030-02686-8_1

With this premise, we are developing work to investigate the SSVEP
(Steady-State Visually Evoked Potential) paradigm [13–15], used to determine
which flickering target a subject is focused on, which can be recognized with
electroencephalography (EEG) equipment. For the BCI system to make the right
decision, the different events must be presented at different flicker
frequencies.
To conduct this research, we built simulations that reproduce techniques based
on SSVEP, because when this concept of decision-making is applied to the real
world, such situations cannot be reproduced in the same way using the
traditional SSVEP paradigm: real bright targets do not present a flicker
frequency that can be classified by the BCI system, besides putting the lives
of the experiment subjects at risk. In this context, the objective of this
paper is to present an empirical study of the techniques used for processing
SSVEP signals, aiming at the development of an SSVEP-BCI system to assist
decision-making in situations close to the real world. For this, we have built
a prototype of traffic lights with Light-Emitting Diodes (LEDs) to create
decision-making situations.
To fulfill this objective, a set of experiments based on the SSVEP paradigm
was reproduced using a public database, with the intention of evaluating the
programming methods. We also constructed databases of acquired EEG signals to
be evaluated with a prototype of LED-based traffic lights, which generate the
visual evocation necessary for the experiments. Finally, we investigated
different SSVEP signal stimulation strategies, making the constructed
prototype traffic lights behave closer to reality, without displaying the
traditional flicker frequencies of the SSVEP paradigm.
This paper is organized as follows. Section 2 presents a brief background on
the SSVEP paradigm. Section 3 presents some related works. Section 4 presents
experiments with a public database and with the constructed prototype, using
the traditional SSVEP model. Section 5 presents directions for a BCI system
for evaluating decision-making at traffic lights, using the SSVEP paradigm
with non-flickering targets. Finally, Sect. 6 presents the conclusion.

2 SSVEP-BCI Background
The BCI paradigms determine what the subject must do, and how, to produce
certain known patterns that can be interpreted by a BCI system. The subject
must generally undergo equipment calibration and training before the
experiment. The configuration of the physical environment, the positioning of
the electrodes, and the software set are directly associated with the paradigm
used. The paradigms currently used in BCI systems are selective attention and
motor imagery [18]. In this paper, we focus on selective attention.

Selective Attention. BCI paradigms based on selective attention require
external stimuli that result in response patterns from the brain [8]. Such
stimuli may be visual, auditory, or tactile. In this method, each stimulus is
associated with a specific command, and the user must focus his attention on a
target stimulus to generate the corresponding action. In this work, visual
stimuli are used; the main paradigms employing them are Steady-State Evoked
Potentials (SSEP) and P300.

– P300: The P300 paradigm consists of obtaining a series of positive peaks
  in the input signal, with a variation in amplitude over a short period of
  time. This variation should occur after the appearance of an infrequent
  target stimulus among several frequent ones [6]. In this way, it is possible
  to visualize a variation in signal amplitude in the time domain. Stimuli can
  be auditory, visual, or sensory. An example of a visual stimulus is a letter
  or symbol on a computer screen on which the subject is focused; when it
  receives a contrast change (generally becoming lighter), a peak appears in
  the signal approximately 300 milliseconds after the stimulus evocation. This
  peak is named P300 (peak at 300 ms).
– SSEP: Periodic external stimuli can be verified in the signal obtained from
  the corresponding region of the cortex. They may be sensory or auditory, but
  are mainly visual, in which case they are known in the literature as SSVEP.
– SSVEP: SSVEP responses can be triggered by a visual stimulus flickering at a
  given frequency. Usually these stimuli are generated by a computer
  simulation on the monitor screen, but it is also common to use LEDs [25].
  When using a monitor screen, the experiment must be set up so that the
  screen refresh rate is a multiple of the flicker frequencies used as
  targets. A target may be a light flickering at a frequency of 8 Hz: when a
  subject is visually focused on it, it is possible to recognize a response at
  a frequency around 8 Hz in the electroencephalogram (EEG) signal obtained
  from the visual cortex. In a study conducted by [20], it was found that
  stimulated frequencies can range from 5 to 100 Hz. The SSVEP signal has
  other characteristics, such as luminance, contrast, and chromaticity, that
  can be modulated together with the flicker frequencies of a target
  stimulus [4].

2.1 Signal Processing in the SSVEP Paradigm

A BCI experiment based on the SSVEP paradigm is related to how the stimuli
are presented to the subject and how the signals obtained through the EEG
equipment are processed. We present the processing steps of the SSVEP signal.

Pre-processing of EEG Signals. In pre-processing, the EEG signal is filtered
without losing relevant information. Filtering can also improve the signal by
separating out the noise present, raising the signal-to-noise ratio (SNR). When
the SNR of a signal is low, detectable patterns are difficult to find; when the
SNR is high, the patterns are easy to identify.
4 R. Hübner et al.

Signal filtering techniques can be applied in combination, facilitating the
isolation of the signals of interest.
Temporal and spatial filters are used for signal preprocessing. In this paper we
used bandpass temporal filtering with the finite impulse response (FIR)
method [22] and the Common Average Reference (CAR) spatial filtering
method [17], which consists of the point-by-point subtraction, from each signal,
of the mean of the EEG signals obtained from all the electrodes.
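The CAR step just described can be sketched in a few lines of NumPy (a minimal illustration of the point-by-point mean subtraction, not the exact code used in this work):

```python
import numpy as np

def car_filter(eeg):
    """Common Average Reference: at every time point, subtract the
    mean over all channels from each individual channel.

    eeg: array of shape (n_channels, n_samples)."""
    return eeg - eeg.mean(axis=0, keepdims=True)

# Toy input: 3 "electrodes", 4 time points
signal = np.array([[1.0, 2.0, 3.0, 4.0],
                   [1.0, 2.0, 3.0, 4.0],
                   [4.0, 5.0, 6.0, 7.0]])
referenced = car_filter(signal)
print(referenced.mean(axis=0))  # each time point now averages to zero
```

After CAR, activity common to all electrodes (e.g. a shared artifact) cancels out, while activity localized to a few channels is preserved.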

Feature Extraction. This step searches for the features that best describe
the expected properties of the input signal. Such features can be obtained
using the signal waveform analyzed in the time domain; frequency components
in the frequency domain; the power density spectrum; time-frequency
analysis (e.g. the Short-Time Fourier Transform, STFT); autoregressive
models; etc. [11].
In SSVEP-BCI systems, feature extraction methods are based on the spectral
information present in the EEG signal. For a given set of evoked frequencies,
the Power Spectral Density (PSD) calculation can extract from the signal the
information of interest to be classified. The main methods used for SSVEP
frequency density analysis are the filter bank, the spectrogram, the Welch
method [2] and the multitaper method [16]. In this work the multitaper method
was used, as implemented in the MNE-Python tool (http://martinos.org/mne).
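The idea behind PSD-based feature extraction can be sketched with SciPy's implementation of the Welch method (also cited above); this is an illustrative stand-in for the multitaper routine actually used, with synthetic data:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(7)
fs = 512.0                          # sampling rate, as in the g.USBamp (Hz)
t = np.arange(0, 4.0, 1.0 / fs)
# Synthetic signal: an 8 Hz "SSVEP" component buried in noise
eeg = np.sin(2 * np.pi * 8.0 * t) + 0.5 * rng.standard_normal(t.size)

# Welch PSD estimate; the evoked frequency shows up as the dominant bin
freqs, psd = welch(eeg, fs=fs, nperseg=1024)
peak_hz = freqs[np.argmax(psd)]
print(f"dominant frequency: {peak_hz:.1f} Hz")
```

The bin with maximal power identifies the evoked frequency, which is exactly the quantity later passed to the classifier.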

Feature Selection. Feature extraction can produce a large number of
variables to be analyzed later by a classifier. In this step, the most relevant
features of the set obtained by feature extraction are selected, improving the
performance of the classifier in terms of both execution speed and effectiveness.
Among feature selection techniques are filter methods (Pearson's correlation
coefficients and the Davies-Bouldin index) and the wrapper technique [2]. The
wrapper-based Recursive Feature Elimination (RFE) technique is used in this
work because, in general, it presents better performance in the work cited
above.
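RFE can be sketched with scikit-learn, the library used in this work; the data below are synthetic and the feature counts are illustrative:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# 60 samples x 8 features; only features 0 and 1 carry class information
X = rng.standard_normal((60, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# RFE wraps a linear SVM and recursively drops the weakest-weighted feature
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=2)
selector.fit(X, y)
print(np.flatnonzero(selector.support_))  # indices of the features kept
```

Only the surviving feature columns are then handed to the final classifier, shrinking the input and speeding up training.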

Classification. Classification is the final stage of EEG signal processing, in
which it is decided which action or command should be executed. Feature
selection outputs a feature vector used to classify the data into different
classes. Classifiers that follow the supervised learning approach use sets of
labeled examples called training sets. Such a set is formed by several labeled
samples of each class, so that the classifier becomes able to recognize new
samples and assign them to one of the classes that make up the set.
There are several supervised classification algorithms, such as the Support
Vector Machine (SVM) and Linear Discriminant Analysis (LDA). In this work
we chose the SVM classifier, based on its performance as presented in [15].
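As an illustration of this final stage, the sketch below uses hypothetical PSD features (power at three candidate frequencies) and a linear SVM from scikit-learn; the data generation is ours, not the paper's:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_trials(target_idx, n=30):
    """Mock PSD features for n trials: columns = power at 8, 10, 12 Hz;
    the evoked frequency receives a clearly larger power."""
    X = rng.uniform(0.0, 1.0, size=(n, 3))
    X[:, target_idx] += 2.0
    return X

X = np.vstack([make_trials(k) for k in range(3)])
y = np.repeat([8, 10, 12], 30)       # labels = evoked frequency in Hz

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.2, 2.5, 0.3]]))  # strongest power at 10 Hz -> [10]
```

Because the evoked frequency dominates its own PSD bin, even a linear decision boundary separates the classes cleanly in this toy setting.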


3 Related Works
The main works that contributed to the development of this paper are presented
below.
In Development of an SSVEP-based BCI spelling system adopting
a QWERTY-style LED keyboard [12], a speller system was developed in the
QWERTY layout using 30 LEDs, one for each key of the keyboard, flickering at
different frequencies. This method allows the individual to select a character
without the need for multiple steps as in traditional BCI speller systems. It was
possible to obtain fine frequency resolution, distinguishing, for example,
flickering stimuli only 0.1 Hz apart. The experiments were performed with ten
healthy subjects, of whom five participated in an offline experiment and five in
an online experiment. 68 English words were used for the evaluations. In the
offline results, accuracies of 76.67% and 72.33% were obtained for viewing
angles of 40 and 30 degrees, respectively. The online results were better because
the best angle and the best combination of electrodes were used (Oz and O2 in
the 10–20 system), with accuracy depending on the amount of time participants
took to recognize each character: 5 s (84.69%), 6 s (86.17%) and 7 s (89.53%).
From this work it was possible to obtain important information about the
distance and positioning angle of the LEDs for a better result, as well as the
best electrode positions.
In A novel stimulation method for multi-class SSVEP-BCI using
intermodulation frequencies [4], a method was developed using different
intermodulation frequencies for SSVEP-BCIs with targets flickering at the same
frequency of 15 Hz, a setup that allows a greater number of targets. The authors
encoded nine target objects on an LCD screen, in which square shapes were
arranged in a 3 × 3 matrix. The modulation frequency for each target was
generated by a color characteristic (C), alternating the frames between green,
red and gray; a luminance characteristic (L), alternating frames with a
difference of 20 cd/m²; and the mixture of the two (CL), forming three
approaches. As a result, the average accuracy for the online assessment of the
three approaches was 85%, with the mixture of the two (CL) achieving the
highest, 96.41%. This work presents alternatives within the SSVEP paradigm
that make it possible to recognize different targets flickering at the same
frequency.
In the work Towards an optimization of stimulus parameters for
brain-computer interfaces based on steady state visual evoked potentials
[5], the influence of several characteristics of the visual stimulus on the
SSVEP signal is presented. Five characteristics of the targets were evaluated:
size, distance, color, shape and the presence of a fixation point in the middle
of each flickering object. The distance between the stimulation targets and the
presence or absence of the fixation point had no significant effect on the results,
while the color and size of the flickering target played an important role in the
SSVEP response. Experiments were performed with 5 subjects, and four stimuli
were presented on the monitor screen with different flickering frequencies. A
group of LEDs was added adjacent to each object shown on the screen,
responsible for randomly generating the imposed luminance. The spectral
responses are largest for white, followed by the colors yellow, red, green and
blue. Regarding object size, the quality of the spectral information grows in
proportion to the size of the object. Other features did not have relevant effects
in this study. This work provided important information for characterizing the
environment in which the prototype of our work is inserted.
The work Use of high-frequency visual stimuli above the critical
flicker frequency in a SSVEP-based BMI [21] presents an evaluation using
frequencies above those traditionally used in SSVEP-BCI systems. Green (low
luminance) and blue (high luminance) LEDs were used to verify the accuracy of
the system and the level of visual fatigue of the subjects. Subjects fixated on
green and blue flickering lights (30 and 70 Hz, respectively), and the SSVEP
amplitude was evaluated. The subjects were asked to indicate whether the
stimulus was visibly flickering and to report their subjective level of discomfort.
The study also compared visible frequencies (41, 43 and 45 Hz) against invisible
frequencies (61, 63 and 65 Hz). As a result, accuracies of 93.1% and 88% were
obtained for the visible and invisible stimuli, respectively. In addition, it was
concluded that high frequencies continue to offer good performance and that
visual fatigue is reduced. In our paper we investigate the use of high flickering
frequencies (invisible to the human eye) to approach a real situation.
The related work presented encourages the use of new concepts beyond the
traditional SSVEP method. These concepts can contribute to an SSVEP-BCI
system applied in a real situation. The next section presents the conduct of
the preliminary experiments.

4 Preliminary Experiments
This section presents two experimental sets that are the basis for our investiga-
tion. The two sets are divided as follows:
1. Development of codes for the evaluation of a public SSVEP-BCI database;
and
2. Construction of a prototype using traffic lights with LEDs as flickering targets.
Initially, we demonstrate the results of the code produced as part of this work
to evaluate a public database. After that evaluation, a second experimental
set was performed to evaluate a database produced by us, using a prototype
of traffic lights constructed with LEDs, in which the LEDs produce traditional
SSVEP stimuli based on targets flickering at fixed frequencies. By analyzing
these results, in addition to investigating new methods linked to SSVEP-BCI
systems, it will be possible to develop a new BCI system for decision-making
with non-flickering targets using the same physical components as the second
experimental set. The proposal resulting from this research is in Sect. 5.
In all experiments we used the MNE-Python tool [9], a set of libraries
written in the Python programming language for the purpose of analyzing
EEG and MEG data. We also used the Scikit-learn library
(http://scikit-learn.org) for routines based on computational intelligence,
also written in Python.

4.1 Public Database SSVEP-BCI


In this section we present the experiment performed with the AVI SSVEP
database (http://www.setzner.com/avi-ssvep-dataset/), developed by [24] and
built as part of a work by the same author [23] that developed a "speller with
dictionary support". First the database built by [24] is introduced, and then
the algorithmic strategies developed by us are presented, detailing the loading
and preparation of the data, the procedures and the results, respectively.

Description of the Public Database AVI SSVEP. The database contains
EEG data measured from healthy subjects exposed to flickering targets to
obtain SSVEP responses. Data were recorded using three electrodes (Oz, Fpz
and Pz) positioned according to the 10–20 system. The data obtained from the
electrode Oz are the only ones recorded in the database; the electrode Fpz was
used as reference and the electrode Pz as ground. A BenQ XL2420T LCD
monitor with a refresh rate of 120 Hz was used for stimulus generation. The
EEG equipment used was the g.USBamp, which has a sampling rate of 512 Hz
and gold-plated electrodes moistened with electrolytic gel. During the
experiment, subjects had to concentrate on targets of 2.89 cm2 on the monitor
screen, seated at a distance of 60 cm from it.
Two types of experiments were performed to compose this database. The
first was performed with a single target (ST) to verify the existence of the VEP
signal. Four subjects participated, each in a single session, focusing on a single
target for thirty seconds, four times. The frequencies chosen in each trial were
random, but they were the same for every subject. The second experiment
was performed with multiple targets (MT), comprising seven targets at different
frequencies. Five subjects participated in two sessions each, focusing on the
targets for sixteen seconds, ten times. In each trial the subject focused on one
of the indicated flickering targets; the indicated sequence was also random, but
the same for the five subjects.

Loading and Data Preparation. The code developed for the ST analysis
was necessary because that experiment has a single target; taking into account
our main research scenario at traffic lights, only one light will be lit at a time.
The MT data were also analyzed because they offer a greater variation of
samples, making it possible to construct and evaluate a greater combination of
strategies.
In the ST data, each subject performed only one session with four trials, but
since there are twenty-seven trials in each session, the training and test data
could be divided in different proportions within the same session: 33% of
the samples (9 samples) were used for training, while 67% of the samples
(18 samples) were used for testing. In the MT data, the training and test data
of the classifier were divided into different sessions, because few samples are
available, each session comprising ten trials, but each subject performed two
sessions. In this way, the second session of each subject, with ten samples, was
used for training the classifier, and the first session of the same subject for the
tests.

Experimental Procedures. Regardless of the division of data in each
experiment, the algorithms for preprocessing, feature extraction, feature
selection and classification were the same. Figure 1 shows the execution flow
and the algorithms applied in each experimental stage.

Fig. 1. General flow of execution of the experiments presenting the algorithms used in
each step.

Generally, the classification algorithm uses different combinations of extracted
features for training. In this experiment, the only feature extracted is
the Power Spectral Density (PSD) of the SSVEP signal, which makes it possible
to train the classification model independently of the class. This is because,
regardless of the frequency stimulated, the PSD should have a higher value at
that frequency than at the rest of the non-evoked frequencies. Thus, training
models of any frequency can be applied to classify any test sample.

Results. In the analysis of the results with the ST data, three combinations of
data were used for training and testing, since each subject performed the same
experimental sequence three times. Thus, the first session was used for training
the classification model and the second and third for testing, with the other two
possible combinations used to cover three different possibilities in total.
The best frequency range for feature extraction used a standard deviation
equal to 0.3 (found by an exhaustive execution); that is, if feature extraction
was performed around a frequency of 6 Hz, the frequency range went
from 5.7 to 6.3 Hz.
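The ±0.3 Hz window can be sketched as a simple mask over the PSD bins (an illustrative helper of ours, not the exact code of this work):

```python
import numpy as np

def band_feature(freqs, psd, target_hz, spread=0.3):
    """Mean PSD inside [target_hz - spread, target_hz + spread]."""
    mask = (freqs >= target_hz - spread) & (freqs <= target_hz + spread)
    return psd[mask].mean()

freqs = np.round(np.arange(0.0, 30.0, 0.1), 1)  # 0.1 Hz resolution
psd = np.ones_like(freqs)
psd[freqs == 6.0] = 10.0                        # peak at the evoked 6 Hz
print(band_feature(freqs, psd, 6.0))            # averages the 5.7-6.3 Hz bins
```

A trial whose window around the evoked frequency yields a clearly larger value than the windows around the other candidate frequencies is classified as that frequency.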
Figure 2a presents the bar plot with the results of the experiment with the
ST data. The best result was with subject 4, for whom the accuracy for the three
sessions was 100%. The worst result was with subject 3 using the first session
as a test, for which an accuracy of 14% was obtained. The overall mean accuracy
of all subjects was 70.75%.

Fig. 2. Results of the experiment with the ST data from the AVI database.

The PSD charts were analyzed to explain the low results presented by
subject 3. In the first session, the target evoked a signal at 6.0 Hz, but the
PSD is higher around 12.0 Hz. This harms both the training of the
classifier and the use of these data for testing, resulting in low accuracy.
Figure 2b presents the PSD of the first session performed by subject 4, who
obtained the highest accuracy (100%). It can be observed that the PSD is
highest around the evoked frequency, while the rest of the frequencies have low
values. Such data provide good classifier training and also result in good
accuracy when used for testing.
In the MT experiment, it was considered that the second session of each
subject would be better used for classifier training. The best frequency range
for feature extraction was again obtained with the standard deviation
equal to 0.3.
Figure 3a shows the bar plot with the results of the MT experiment. Most of
the results were better using the second session for training, with the exception
of subject 2. The best results were with subjects 4 and 5, for whom the accuracy
was 100% in both cases when training with the second session. The worst
results were with subject 3, using either the first or the second session for
training, with accuracies of 50% and 60%, respectively. The overall mean
accuracy of all subjects was 84%.
The PSD charts were analyzed to explain the low results presented by
subject 3. Figure 3b presents the PSD of the first session performed by this
subject: a signal of 9.3 Hz was evoked, but the PSD is larger around 6.5 Hz.
The tests performed with the database of [24] demonstrated that it
is possible to use the code developed in our work to evaluate an SSVEP-BCI
system.

Fig. 3. Results of the experiment with the MT data from the AVI database.

Fig. 4. Traffic lights built with LEDs used in experiment 2 prototype.

4.2 SSVEP-BCI System Based on Flickering Traffic Lights

In this experimental stage, we started the construction of our database for the
evaluation of the prototype using traffic lights with flickering LEDs, and tested
the functioning of the EEG equipment used.

Description of Equipment Used. For the development of the prototype,
two traffic lights made up of LEDs were used. Figure 4a shows the traffic light
constructed with the rest of the prototype, built with three diffuse 10 mm
LEDs in red, yellow and green. Figure 4b presents the traffic light built
with three high-brightness 5 mm LEDs and one high-brightness 3 mm LED:
two in red, one yellow and one green (3 mm).

The two traffic-light variants were constructed to verify the difference in
the EEG signal when using diffuse or high-brightness LEDs, since the
latter have a higher light intensity, despite causing visual discomfort.
The traffic lights are operated with the aid of an Arduino UNO
(https://www.arduino.cc/), an open hardware electronic prototyping platform
based on the ATmega328P microcontroller, with 32 KB of flash memory and a
16 MHz clock. In addition to the LEDs connected to the traffic lights, a push
button was added to manually start each session or to stop it if necessary.
The EEG equipment used in the experiments is the 32-bit OpenBCI board
(http://openbci.com) with 8 channels for EEG/EMG/ECG (electroencephalo-
gram/electromyogram/electrocardiogram) measurements, plus three auxiliary
channels used for a gyroscopic sensor. The equipment can be expanded to
16 channels using the Daisy module that accompanies it.
A helmet produced with a 3D printer, the Ultracortex Mark 3
(https://github.com/OpenBCI/Ultracortex/tree/master/Mark 3), was used to
hold the electrodes and the OpenBCI board. The electrodes used in the
experiments are made of a silver-silver chloride (Ag-AgCl) alloy, dispensing
with the use of electrolytic paste or gel and thus allowing easy placement of
the helmet on different subjects during an experiment.

Experimental Procedures. To make the traffic-light LEDs flicker at the
desired frequencies, code was developed for the microcontroller that allows the
frequency of each LED to be specified. In a conventional SSVEP-BCI
experiment it is desirable for multiple targets to flicker at different frequencies,
so Eq. 1 was applied in the Arduino code. The interval I between LED toggles
is obtained by dividing the unit by the desired frequency f, dividing by 2 to
account for the two half-cycles of the LED on/off period, multiplying by 1000
to express the time in milliseconds, and finally subtracting ε, the delay of the
code loop running on the hardware. This delay was calculated using an LDR
light sensor connected to an Arduino: the sensor was pointed at the LED lit
at different frequencies and the sensor readings were sent to the computer for
analysis on a graph as a function of time. It was found that this delay varies
from 1 to 2 ms, so the average of these values (1.5 ms) was assigned to ε.

I = [(1/f)/2] × 1000 − ε (1)
The following frequencies were configured for the LEDs: red = 8 Hz,
yellow = 10 Hz and green = 12 Hz. Frequencies that are not multiples of one
another were chosen, which prevents overlapping phenomena in the
spectrogram, since the signal magnitude is also high around the harmonics of
the evoked frequency.
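Applying Eq. 1 (with the measured average loop delay of 1.5 ms) to the three configured frequencies gives the toggle intervals; a quick numeric check in Python:

```python
EPS_MS = 1.5  # measured average loop delay (ms)

def toggle_interval_ms(freq_hz):
    """Eq. 1: half-cycle LED toggle interval, in milliseconds."""
    return (1.0 / freq_hz) / 2.0 * 1000.0 - EPS_MS

for f_hz in (8, 10, 12):  # red, yellow and green targets
    print(f"{f_hz:2d} Hz -> toggle every {toggle_interval_ms(f_hz):.2f} ms")
```

For example, the 8 Hz red LED is toggled every 61.00 ms (62.5 ms half-cycle minus the 1.5 ms loop delay).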
Figure 5 shows a flowchart of the experiment, detailing the software and
hardware used, as well as the communication model between them.

The EEG signal is obtained from the OpenBCI board with the OpenBCI GUI
v2 software (https://github.com/OpenBCI/OpenBCI GUI). This software
streams the acquired signal through the Lab Streaming Layer (LSL) interface
(https://github.com/sccn/labstreaminglayer) to code written in Python, which
receives the EEG signal and writes it to a FIF file (the MNE tool's file format)
along with the markers received from the microcontroller's serial port. Such
markers are timestamps that denote the moment each light of the traffic light
was lit.

Fig. 5. Representation of the flow of experiment 2.

This stage of the experiments was performed with only one subject, since the
objective was to test the correct functioning of the EEG equipment and to
verify whether the prototype is sufficient to evoke a good SSVEP signal. The
following protocol was adopted for the sessions:

– Internal environment with low luminosity.
– Subject seated approximately one meter away from the target.
– Subject exposed to two sessions. Figure 6c shows how the sequence
of a session is performed. In each session the SSVEP signal was evoked twenty
times with a random light sequence at the target. During the session, each LED
is active for 10 s, with intervals of 5 s between one activation and the next. In
this way, a session lasts 15 min and 42 s.
– EEG data and markers were recorded in a single FIF file (the MNE tool's
format) in a database for further offline analysis.


Fig. 6. Illustrations of the protocol for experiment 2.

The electrodes were positioned on the subject's scalp over the occipital,
parieto-occipital and parietal lobes, following the 10–20 system. Figure 6a
shows the positions of the eight electrodes that measure the EEG signal (O1,
Oz, O2, PO3, PO4, PO7, PO8 and Pz), plus two electrodes used for reference
and grounding (Fz on the frontal lobe and A2 on the right earlobe,
respectively). Finally, Fig. 6b shows the complete assembly of the OpenBCI
board connected to the Ag-AgCl electrodes together with the Ultracortex
Mark 3 helmet.

Results. Code was developed with some modifications relative to that used
in experimental set 1. In this experiment we added the Common Average
Reference (CAR) spatial filter, taking as reference the channels Oz, O2, PO4
and PO7, as they were the channels with the highest VEP response, in addition
to FIR filters (Hamming window) with cut-off frequencies of 5 Hz and 50 Hz
and notch filters at the frequencies of 60 Hz and 120 Hz.
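This filter chain can be sketched with SciPy; the sampling rate and filter order below are illustrative assumptions of ours, not the exact values of the experiment:

```python
import numpy as np
from scipy.signal import filtfilt, firwin, iirnotch

FS = 250.0  # assumed sampling rate (Hz)

# FIR band-pass (firwin's default window is Hamming), 5-50 Hz
bp = firwin(numtaps=101, cutoff=[5.0, 50.0], pass_zero=False, fs=FS)

# Notch filters at 60 Hz (mains) and its 120 Hz harmonic
b60, a60 = iirnotch(w0=60.0, Q=30.0, fs=FS)
b120, a120 = iirnotch(w0=120.0, Q=30.0, fs=FS)

# Demo: a 10 Hz "SSVEP" component contaminated with 60 Hz mains noise
t = np.arange(0, 2.0, 1.0 / FS)
raw = np.sin(2 * np.pi * 10.0 * t) + np.sin(2 * np.pi * 60.0 * t)
clean = filtfilt(b120, a120, filtfilt(b60, a60, filtfilt(bp, [1.0], raw)))

# Amplitude spectrum: the 10 Hz component survives, 60 Hz is suppressed
amp = np.abs(np.fft.rfft(clean)) / (t.size / 2)
f = np.fft.rfftfreq(t.size, 1.0 / FS)
print(f"10 Hz amp ~ {amp[np.argmin(np.abs(f - 10.0))]:.2f}, "
      f"60 Hz amp ~ {amp[np.argmin(np.abs(f - 60.0))]:.4f}")
```

Zero-phase filtering with `filtfilt` avoids shifting the SSVEP peaks in time, which matters when epochs are cut around the marker timestamps.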
The training and test data were divided into portions of 30% and 70%,
respectively, in a cross-validation scheme: first the initial 30% (the first six
trials) were used for training and the remainder for testing; then trials two
to seven were used for training, and so on, until fifteen different combinations
were completed.
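The fifteen combinations can be sketched as a sliding window over the trial indices (an illustrative helper, not the exact code used):

```python
def sliding_splits(n_trials=20, train_size=6):
    """Yield (train, test) trial-index lists: a block of `train_size`
    consecutive trials for training, all remaining trials for testing."""
    for start in range(n_trials - train_size + 1):
        train = list(range(start, start + train_size))
        test = [i for i in range(n_trials) if i not in train]
        yield train, test

splits = list(sliding_splits())
print(len(splits), splits[0][0])  # 15 combinations; first trains on trials 0-5
```

With 20 trials and a training block of 6, the window can start at 15 positions, giving the fifteen train/test combinations reported below.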

Fig. 7. Accuracy of results obtained from cross-validation of experiment 2.

Fig. 8. Evoked 8 Hz with multiple channels.

The best frequency range for feature extraction was obtained with the standard
deviation equal to 1.0. This value was found through an exhaustive execution
using the first 30% of the trials for training the classifier (SVM).
Figure 7 shows the graph with the results of experiment 2 using cross-
validation. The best result was obtained with the 9th portion of data used for
classifier training, for which the accuracy was 100%. The worst results were
with the 8th and 14th portions of data used for classifier training, for which an
accuracy of 78% was obtained in both cases. The overall mean accuracy across
the cross-validation was 86%.
Figure 8 shows the PSDs of the session performed with stimuli at the
frequencies of 8, 10 and 12 Hz, which obtained the highest accuracy (100%). It
can be observed that the PSD is highest around each evoked frequency, while
the rest of the frequencies have low values. Such data provide good classifier
training and also result in good accuracy when used for testing.

By analyzing the results of experimental set 2, it was possible to find a
set of flickering frequencies that can be used to evaluate the constructed
traffic-light prototype, in addition to validating the EEG equipment used for
data acquisition. In the next section we present the directions we are taking to
develop an SSVEP-BCI system with non-flickering targets.

5 Towards a New SSVEP-BCI System

In this section we present some hypotheses raised through the research carried
out in the previous experiments. The idea is to develop an SSVEP-BCI system
for decision-making at traffic lights in which the targets do not have a visible
flicker frequency. In this context, the decision is to determine which of the
lights of a traffic light is active. Thus, the objective of this third experimental
set will be to construct a new BCI system whose targets do not visibly flicker
to human vision, approaching the real decision-making situation a driver faces
when viewing a traffic light while driving.
Some hypotheses are presented for this future third experimental set, which
consists of taking advantage of strategies of the SSVEP paradigm presented in
the related works and in the previous experiments, using the same prototype
as the second experiment.
The first hypothesis is to set targets with different flicker frequencies,
such that these frequencies are not visible to the human eye.
Strategy: The system should be able to identify frequencies above those used
in traditional SSVEP-BCI systems, which generally use frequencies of up to
30 Hz. This strategy will be applied by presenting increasing frequencies above
30 Hz (in steps of 1 Hz) to a set of subjects; each subject will report the point
at which the flicker is no longer visible. Then three different frequencies not
visible to the subjects will be configured for the targets.
Potential problems: The work of [21] shows that SSVEP stimuli above
traditional frequencies can be used in BCI systems, but they are more difficult
to detect because they evoke a very weak SSVEP signal, which may imply low
accuracy in the proposed system.
The second hypothesis considers using the same flickering frequency for
the targets when active, with this frequency likewise not visible to the
human eye.
Strategy: The BCI system should be able to differentiate color/luminance by
the amplitude of the VEP response for the same stimulated frequency. The
same strategy presented in the first hypothesis will be used to find frequencies
not visible to the human eye, and the lowest of them will be used to configure
the targets. The VEP responses of the different targets will then be analyzed
using the amplitude difference as the main feature. Supporting this hypothesis,
the work carried out by [1] shows how the colors used as targets can influence
the phase value in an SSVEP system.

Potential problems: Even with different response values for the different
colors/luminances of the LEDs, such values may be poorly discriminative,
resulting in low accuracy of the proposed system. For this reason, a third and
final hypothesis is raised.
The third hypothesis is the development of a BCI system that combines
the first and second hypotheses.
Strategy: This model joins the two previous hypotheses with the premise of
improving the performance of the proposed system. Assuming positive results
for the first and second hypotheses (not necessarily good results), the intention
of this strategy is to obtain the maximum performance from the two strategies
used. For this, the classifier training model must be applied to a data sequence
that contains at least all possible combinations of the different frequencies and
the different LED colors.
Potential problems: The problems of this hypothesis are the same as those
presented for the first and second hypotheses. In addition, different flickering
frequencies can evoke different values in the VEP signal regardless of the colors
of the LEDs, because for some evoked frequencies the VEP response is stronger
than for others.
In this work we identify new strategies that can be used in SSVEP-BCI
systems applied in real situations. In our context, we want to apply them to
aid decision-making at traffic lights. Building on the hypotheses raised, the
next step is to develop and evaluate a system embodying these concepts.

6 Conclusion
In this paper, we investigated SSVEP-BCI systems, evaluated a public database
using newly developed code, and created our own database through a simulation
of decision-making with traffic lights. The decision process applied has well-
known actions: on green, the driver can continue driving normally; on red, the
driver must decelerate and stop the car; and, for some models of traffic lights,
on yellow, the driver should pay more attention at the intersection,
consequently reducing the speed of the vehicle.
However, we have verified that traditional SSVEP-BCI systems usually
use flickering frequencies visible to the human eye, which makes such a model
unfeasible in future real situations. The bibliographic survey of related works
allowed us to identify characteristics of this model that can be useful for
developing a simulation closer to reality. It was also possible to identify other
factors in the methodology of these works that contribute to the development
of our system: the algorithms used in SSVEP signal processing, the session
time performed by the subjects, rest time, the number of times each experiment
was performed, possible experimentation scenarios, the position of the
electrodes for EEG acquisition, etc.
The practical experiments carried out have already contributed to much of
what we want to develop, since it was possible to evaluate the code developed,
the prototype built and the EEG equipment used, in addition to generating
satisfactory results for our research. With this, it was possible to formulate
hypotheses for the new system. The first two hypotheses will certainly be
developed, while the third will be developed in light of the results obtained
from the first and second, resulting in the proposed SSVEP-BCI system.

Acknowledgment. We would like to thank CNPq (Brazilian Council for Scientific
and Technological Development) for the scholarship, Brazil (311685/2017-0).

References
1. Cao, T., Wan, F., Mak, P.U., Mak, P.I., Vai, M.I., Hu, Y.: Flashing color on the
performance of SSVEP-based brain-computer interfaces. In: 2012 Annual Interna-
tional Conference of the IEEE Engineering in Medicine and Biology Society, pp.
1819–1822. IEEE, San Diego, August 2012
2. Carvalho, S.N., Costa, T.B., Uribe, L.F., Soriano, D.C., Yared, G.F., Coradine,
L.C., Attux, R.: Comparative analysis of strategies for feature extraction and clas-
sification in SSVEP BCIs. Biomed. Signal Process. Control. 21, 34–42 (2015)
3. Chaudhary, U., Birbaumer, N., Ramos-Murguialday, A.: Brain-computer interfaces
for communication and rehabilitation, pp. 513–525 (2016)
4. Chen, X., Wang, Y., Zhang, S., Gao, S., Hu, Y., Gao, X.: A novel stimulation
method for multi-class SSVEP-BCI using intermodulation frequencies. J. Neural
Eng. 14(2), 026013 (2017)
5. Duszyk, A., Bierzyńska, M., Radzikowska, Z., Milanowski, P., Kuś, R., Suffczyński,
P., Michalska, M., Labecki, M., Zwoliński, P., Durka, P.: Towards an optimization
of stimulus parameters for brain-computer interfaces based on steady state visual
evoked potentials. PLoS ONE 9(11), e112099 (2014)
6. Fazel-Rezai, R., Ahmad, W.: P300-Based Brain-Computer Interface Paradigm
Design. INTECH Open Access Publisher (2011)
7. Fouad, M.M., Amin, K.M., El-Bendary, N., Hassanien, A.E.: Brain computer inter-
face: a review. In: Hassanien, A.E., Azar, A.T. (eds.) Brain-Computer Interfaces:
Current Trends and Applications, pp. 3–30. Springer International Publishing,
Cham (2015)
8. Graimann, B., Allison, B., Pfurtscheller, G.: Brain-computer interfaces: a gentle
introduction. In: Brain-computer interfaces. In: Graimann, B., Pfurtscheller, G.,
Allison, B. (eds.) The Frontiers Collection, pp. 1–27. Springer, Heidelberg (2010)
9. Gramfort, A., Luessi, M., Larson, E., Engemann, D., Strohmeier, D., Brodbeck,
C., Goj, R., Jas, M., Brooks, T., Parkkonen, L., Hämäläinen, M.: MEG and EEG
data analysis with MNE-Python. Front. Neurosci. 7, 267 (2013). http://journal.
frontiersin.org/article/10.3389/fnins.2013.00267
10. Halder, S., Pinegger, A., Käthner, I., Wriessnegger, S.C., Faller, J., Antunes, J.B.P.,
Müller-Putz, G.R., Kübler, A.: Brain-controlled applications using dynamic P300
speller matrices. Artif. Intell. Med. 63(1), 7–17 (2015)
11. Yang, B.-H., Yan, G.-Z., Wu, T., Yan, R.: Subject-based feature extraction using
fuzzy wavelet packet in brain-computer interfaces. Signal Process. 87(7), 1569–
1574 (2007)
12. Hwang, H.-J., Lim, J.-H., Jung, Y.-J., Choi, H., Lee, S.W., Im, C.-H.: Development
of an SSVEP-based BCI spelling system adopting a QWERTY-style LED keyboard. J.
Neurosci. Methods 208(1), 59–65 (2012)
18 R. Hübner et al.

13. Lin, K., Cinetto, A., Wang, Y., Chen, X., Gao, S., Gao, X.: An online hybrid BCI
system based on SSVEP and EMG. J. Neural Eng. 13(2), 026020 (2016)
14. Lin, Y.-P., Wang, Y., Jung, T.-P.: Assessing the feasibility of online SSVEP decod-
ing in human walking using a consumer EEG headset. J. Neuro Eng. Rehabil.
11(1), 119 (2014)
15. Martišus, I., Damaševičius, R.: A prototype SSVEP based real time BCI gaming
system. Intell. Neurosci. 2016, 18 (2016)
16. McCoy, E.J., Walden, A.T., Percival, D.B.: Multitaper spectral estimation of power
law processes. IEEE Trans. Signal Process. 46(3), 655–668 (1998)
17. McFarland, D.J., McCane, L.M., David, S.V., Wolpaw, J.R.: Spatial filter selec-
tion for EEG-based communication. Electroencephalogr. Clin. Neurophysiol. 103(3),
386–394 (1997)
18. Mühl, C., Gürkök, H., Bos, D.P.-O., Thurlings, M.E., Scherffig, L., Duvinage, M.,
Elbakyan, A.A., Kang, S., Poel, M., Heylen, D.: Bacteria hunt: evaluating multi-
paradigm BCI interaction. J. Multimodal User Interfaces 4(1), 11–25 (2010). Open
Access
19. Prashant, P., Joshi, A., Gandhi, V.: Brain computer interface: a review. In: 2015
5th Nirma University International Conference on Engineering (NUiCONE), pp.
1–6. IEEE, Ahmedabad, November 2015
20. Regan, D.: Steady-state evoked potentials. J. Opt. Soc. Am. 67(11), 1475–1489
(1977)
21. Sakurada, T., Kawase, T., Komatsu, T., Kansaku, K.: Use of high-frequency visual
stimuli above the critical flicker frequency in an SSVEP-based BMI. Clin. Neurophysiol.
126(10), 1972–1978 (2015)
22. Shenoi, B.A.: Introduction to Digital Signal Processing and Filter Design. Wiley-
Interscience (2005)
23. Vilic, A., Kjaer, T.W., Thomsen, C.E., Puthusserypady, S., Sorensen, H.B.D.: DTU
BCI speller: an SSVEP-based spelling system with dictionary support. In: 2013
35th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), pp. 2212–2215. IEEE, Osaka, July 2013
24. Vilic, A.: AVI SSVEP dataset (2014). http://www.setzner.com/avi-ssvep-dataset
25. Zhu, D., Bieger, J., Molina, G.G., Aarts, R.M.: A survey of stimulation methods
used in SSVEP-based BCIs. Intell. Neurosci. 2010, 1:1–1:12 (2010)
Image-Based Wheel-Base Measurement
in Vehicles: A Sensitivity Analysis to Depth
and Camera’s Intrinsic Parameters

David Duron-Arellano(&), Daniel Soto-Lopez, and Mehran Mehrandezh

University of Regina, 3737 Wascana Pkwy, Regina, SK S4S 0A2, Canada


duad92@gmail.com,
{sotolopd,mehran.mehrandezh}@uregina.ca

Abstract. Image-based metric measurement has been widely used in industry


for the past decade due to the recent advancement in processing power and also
the unobtrusiveness of this method. In particular, this method is gaining atten-
tion in the realm of real-time detection, classification, and inspection of vehicles
used in intelligent transportation systems for law enforcement. These systems
have proven themselves as a plausible competition to under-the-pavement loop
sensors. In this paper, we analyze the sensitivity in image-based metric mea-
surement for vehicles’ wheel base estimation. Results lead to a simple guideline
for calculating the optimal configuration yielding the highest resolution and
accuracy. More specifically, we address the sensitivity of the metric measure-
ments to the depth (i.e., the distance between the camera and the vehicle) and
also internal calibration parameters of the visible-light imaging system (i.e.,
camera’s intrinsic parameters). We assumed a pinhole projection model with
added barrel effect, aka, lens distortion. A 3D video simulation was developed
and used as a Hardware-in-the-Loop (HIL) testbed for verification and valida-
tion purposes. Through a simulated environment, three case studies were con-
ducted to verify and validate theoretical data from which we concluded that the
error due to lens distortion accounted for 0.014% of the total error, whereas the
uncertainty in the depth of the vehicle with respect to the location of the camera
accounted for 99.8% of the total error.

Keywords: Image-processing · Digital-metrology · Vision-systems

1 Introduction

As the vehicle population has been increasing exponentially over the years, new and
cost-effective technologies for monitoring and controlling traffic have been developed.
Intelligent systems, such as vision-based vehicle classification systems, have been
continuously investigated for their affordability and efficiency. Two major applications of
these systems are toll collection and law enforcement, which make use of a wide
variety of techniques to detect, characterize, count and classify vehicles.
These techniques are usually implemented in accordance with
the 13-vehicle classification scheme [1] described by the Federal Highway

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 19–29, 2019.
https://doi.org/10.1007/978-3-030-02686-8_2
20 D. Duron-Arellano et al.

Administration (FHWA). This scheme classifies vehicles by wheelbase and number
of axles. Although several technologies have been explored to comply with the
FHWA regulations, under-pavement loop sensors have been the most broadly
implemented ones, mainly because of their reliability and robustness.
Nevertheless, one of the biggest concerns with loop sensors is their intrusiveness.
That is to say, when one of the sensors has to be replaced, the pavement
needs to be removed and rebuilt, which is an expensive, complex and time-
consuming process.
Alternative techniques, such as vision systems, which are non-intrusive and do not
compromise accuracy, efficiency or affordability, are being further explored.
Therefore, the purpose of our research narrows down to selecting a vision system
and analyzing the sensitivity of its parameters, which would lead us to the most
effective configuration for accurately measuring wheelbase and counting axles.
Through our analysis we conclude that lens distortion and the depth assumption are
the parameters that carry the biggest error in metrology applications of this
nature.
Due to the convexity of the lens, the error on the output measurements obtained
from an image grows non-linearly as the observed features approach the borders of
the image. Also, as depth is not an implicit parameter of the vision system, it has to be
assumed to provide the scale factor for the measurement, which is originally depicted in
pixels. This assumption carries a range of uncertainty which accounts for significant
errors on the output.
In this paper, the sensitivity to the latter uncertainties is analyzed by means of the Pinhole
Projection (PHP) and the Brown-Conrady (BC) models, and an optimized setup is
proposed as a result of the analysis. Also, a 3D simulated environment is presented to
run tests for verification and validation of the theoretical data.

2 Definition of Parameters

In the case study presented in this paper, it is first assumed that a camera is located on
the side of a 2-lane freeway regulated under the FHWA. The objective of this analysis
is to observe how the wheelbase estimation is affected due to major uncertainties in the
process, provided an assumed depth.
As depicted in Fig. 1, there are six parameters involved in the process: wheelbase Wl,
vehicle length Vl, vehicle width Vw, lane width lw, distance from the camera to the
center of the lane Zc, and position of the vehicle in the lane Pl. In addition, there
are the assumed depth Za and the real depth Zr, which are not depicted in Fig. 1.
Even though all these parameters may vary, given specific conditions such as a
fixed location and configuration, or because they do not have a direct relation to
the output, most can be disregarded as uncertainties. Thus, the assumed depth Za
and the lens distortion are regarded as the major uncertainties, and the only ones
that pertain to the analysis.
Image-Based Wheel-Base Measurement in Vehicles 21

Fig. 1. Camera located in the freeway side perpendicular to the vehicle.

Vehicle Length. Although the uncertainty due to vehicle length is disregarded in the
wheelbase estimation itself, as it is implicit in the wheelbase, it is relevant when
defining the camera location to guarantee the required field of view.
Lane Width. For this case study the width of the lane is set to be 3.6 m, as it is the
required width of a single lane on any rural/urban freeway according to the FHWA [2].
Wheelbase Length. According to the FHWA 13-vehicle classification scheme,
under Function Class 11, depicted in Table 1, which describes the Urban Interstate
Freeway statistics, the overall wheelbase distribution falls between 1 and 45 ft (0.3048
to 13.716 m). Nevertheless, at least 75.7% of the samples fall in class 2 (Table 1),
between 6 and 10.10 ft (1.8288 to 3.0784 m), and at least 93.8% of the samples fall
between 6 and 23.09 ft (1.8288 to 7.0378 m). Although this parameter does not directly
affect the process of wheelbase estimation, it is considered as it defines the required
field of view. Moreover, understanding the distribution helps us narrow down the
case study.

Table 1. Urban Interstate Freeways wheelbase range for the FHWA 13-vehicle classification
scheme, Function Class 11 [7]

Class  Vehicles on the road (%)  Wheelbase range (ft)
1      0.2                       1.00–5.99
2      75.7                      6.00–10.10
3      15.7                      10.11–23.09
4      0.2                       23.10–40.00
5      1.6                       6.00–23.09
6      0.8                       6.00–23.09
7      0                         6.00–23.09
8      1.1                       6.00–26.00
9      3.9                       6.00–30.00
10     0.2                       6.00–26.00
11     0.2                       6.00–30.00
12     0.1                       6.00–26.00
13     0.6                       6.00–45.00

Since the variation of wheelbase is considerably broad, it is
important to note that as the wheels' position moves within the image frame, the
estimation is subjected to higher distortions due to lens convexity as the features of
interest (wheels) approach the edges.

Fig. 2. (a) Vehicle close to left lane line is perceived smaller; (b) vehicle close to right lane line
is perceived bigger.

Position in the Lane Pl. As depicted in Fig. 2, the position of the vehicle in the
lane directly affects the perceived dimension of the object. Therefore, for this
case study it is assumed that the car moves only within the lane and that its position
follows a normal distribution with an average location in the middle of the lane,
1.8 m from the sideline.
Vehicle Width Vw. Just like the position in the lane, the width of the vehicle also
modifies the real depth, which directly modifies the perceived dimension.
Although vehicle width may vary from virtually 0 to the maximum allowable width
of 2.6 m, established by the Federal-Aid Highway Act [3], the average vehicle width
is 1.8 m [4], and this is the value considered for this analysis.
Therefore, under the previously described assumptions for Pl and Vw, we can
establish a variation in depth of Pl − Vw/2 = 1.8 m − 0.9 m = 0.9 m.
It can be observed that vehicle width and position in the lane act simultaneously,
as together they determine the real depth Zr, which deviates from the assumed depth
Za, as described in the next section.

3 Depth Assumption Uncertainty

For this case study, a Canon EOS 7D with an EF-S 18-135 mm lens set at 50 mm (focal
length) and a resolution of 2592 × 1728 pixels has been used. The focal length in
pixels, i.e., the distance between the lens and the point where the rays converge to
a focus, obtained by means of the MATLAB Calibration Toolbox, is 5922.84 ± 54
pixels. This focal length does not vary along this analysis, as the proposed
system is fixed, and it has been chosen for the desired visibility at a given
distance from the object of interest.
As stated before, the vehicle is assumed to be moving within ±0.9 m of the
center of the lane. Since the assumed depth Za in the estimation of the
wheelbase should be the distance between the camera and the visible wheels (the outer
face of the vehicle), the assumed depth must be 0.9 m before the center of the lane.

Assuming that the field of view is determined by the maximum length to be
perceived, which is that of a single-trailer semi-truck (65 ft ≈ 20 m), by means of the
PHP model we can obtain

    Zc = Xc f / x = (20 m × 5922.84 pixels) / 2592 pixels = 45.70 m    (1)

where Zc is the distance to the object, which for this first calculation is assumed to be in
the center of the lane, Xc is the length of the object in meters, f is the focal length in
pixels and x is the length of the object in pixels.
Since the distance to the center of the lane should be 45.70 m to perceive a
maximum length of 20 m, the assumed depth Za, considering the outer face of the
vehicle and its width and position in the lane variations, should be 44.80 m ± 0.9 m.
To illustrate the variation in the wheelbase estimation error due to the depth
assumption, a random vehicle with a 2.5 m wheelbase (Xc) is considered. By means
of the basic PHP model we can estimate

    x = Xc f / Zc = (2.5 m × 5922.84 pixels) / 44.80 m = 330.51 pixels    (2)

where x is the estimated wheelbase in pixels and Zc is the distance to the object.
This previous analysis gives us the wheelbase in pixels when the side face of the
car is exactly 44.80 m away from the camera (the vehicle is centered), disregarding all
other uncertainties. Nevertheless, as discussed before, the actual depth may vary by up
to 0.9 m as the car moves within the lane. This uncertainty in depth is reflected as
follows:

    Xc = Zc x / f = ((44.80 ± 0.9) m × 330.51 px) / 5922.84 px = 2.50 ± 0.05 m    (3)

It is important to note that, since the convexity of the lens is not being considered,
the variation in the estimation Xc is linear due to the linearity of the equation. It can
also be observed that there is a 2% uncertainty in the estimation of the wheelbase.
It stands out that as either the distance from the camera to the object or the focal
length increases, the variation in the estimation decreases. Nevertheless, this decrease
in variation is directly proportional to a decrease in resolution. Therefore, the
accuracy of the results relies on a point where both parameters, variation due to the
depth assumption and resolution, are optimized.
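The three computations above share one pinhole relation; as a sketch (the helper function names are ours, not from the paper), Eqs. (1)–(3) can be reproduced as follows:

```python
def depth_for_length(X_m, f_px, x_px):
    """Eq. (1): depth Zc at which an object of metric length X_m spans x_px pixels."""
    return X_m * f_px / x_px

def projected_length_px(X_m, f_px, Z_m):
    """Eq. (2): projected length in pixels of an object of length X_m at depth Z_m."""
    return X_m * f_px / Z_m

def metric_length(x_px, f_px, Z_m):
    """Eq. (3): metric length recovered from a pixel measurement at an assumed depth."""
    return Z_m * x_px / f_px

f = 5922.84                                  # calibrated focal length in pixels
Zc = depth_for_length(20.0, f, 2592)         # Eq. (1): ~45.70 m
x = projected_length_px(2.5, f, 44.80)       # Eq. (2): ~330.51 px
# propagate the +/-0.9 m depth uncertainty through Eq. (3)
Xc_low = metric_length(x, f, 44.80 - 0.9)    # ~2.45 m
Xc_high = metric_length(x, f, 44.80 + 0.9)   # ~2.55 m
```

The ±0.05 m spread recovered here matches the 2% uncertainty discussed above.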

4 Camera Intrinsic Parameter Uncertainty

For the case when the intrinsic parameters are regarded as uncertainties, in our analysis,
barrel distortion along with tangential distortion accounts for the major variation.
To account for these distortions, the BC equation (4) [5] has been
utilized for the case of estimating wheelbase.

   
    x2 = x1 (1 + k1 r² + k2 r⁴) + [2 p1 x1 y1 + p2 (r² + 2 x1²)]    (4)

where x2 is the distorted point, x1 and y1 are the real point coordinates, k1, k2 are the
radial distortion coefficients of the lens, p1, p2 are the tangential distortion coefficients,
and r = √(x1² + y1²).
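A direct transcription of Eq. (4) can be sketched as follows; note that the symmetric y-component is our addition (the paper only prints the x-component), and the evaluation points are arbitrary:

```python
def brown_conrady(x1, y1, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to a
    normalized image point (x1, y1), following Eq. (4)."""
    r2 = x1 * x1 + y1 * y1
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x2 = x1 * radial + 2.0 * p1 * x1 * y1 + p2 * (r2 + 2.0 * x1 * x1)
    y2 = y1 * radial + p1 * (r2 + 2.0 * y1 * y1) + 2.0 * p2 * x1 * y1
    return x2, y2

# coefficients reported later in this section for the camera used here
k1, k2 = -0.0941, 0.1017
p1, p2 = -0.0012, 0.0051

# displacement grows with distance from the projection center
near = brown_conrady(0.1, 0.1, k1, k2, p1, p2)
far = brown_conrady(0.5, 0.5, k1, k2, p1, p2)
disp_near = ((near[0] - 0.1) ** 2 + (near[1] - 0.1) ** 2) ** 0.5
disp_far = ((far[0] - 0.5) ** 2 + (far[1] - 0.5) ** 2) ** 0.5
```

Evaluating both points shows the displacement increasing away from the center, which is exactly why features near the frame edges suffer the larger errors discussed below.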
For this case study, following previously stated assumptions, most importantly the
camera location and configuration, and isolating this newly presented uncertainty
source, two cases arise: (1) as the length of the vehicle increases, the features
(axles) get closer to the edges and are thus subjected to higher distortions; (2) as the
height of the wheels deviates from the average, set at the center of the projection, the
features also get closer to the edges and are subjected to higher distortions.

Fig. 3. Barrel and tangential distortion on unitary frame.

As seen in Fig. 3, the previously described situation for the above-mentioned
camera lens used in this case analysis is represented by blue vectors on a unitary
frame. These vectors represent the deviation of the pixels from their real location before
the lens distortion, and it can be observed that the deviation is slightly bigger for the
vectors at the bottom because of the tangential distortion. The radial distortion
coefficients [−0.0941, 0.1017] and the tangential distortion coefficients [−0.0012, 0.0051]
have been obtained through the Image Calibration Toolbox by MATLAB.
Below we present two case studies. In the first, we show how the perceived
location of the points of interest (POI) varies due to lens distortion depending on their
location on the x-axis, by analyzing two scenarios: one with an average
small vehicle and one with an average large vehicle. In the second case
study, on the other hand, we analyze the variation depending on the position of the POI
on the y-axis.
For case 1, since the distortion grows non-linearly, as seen in (4), the wheelbase
variation close to the center of the projection is less sensitive than that closer to the
edges.

In a first scenario, we assume that small vehicles (from 2 to 6 m long) will
present the least variation given their proximity to the center of the image, as explained
before.
When the average small vehicle (4 m) is considered, according to the BC equation
in (4), we observe a variation from the real value of 0.014 m. In a second scenario,
considering the biggest vehicle assumed for this analysis, a single-trailer semi-truck
(20 m), and according to Eq. 4, we obtain a variation of 0.152 m.
It can be observed that the sensitivity increases with the length of the vehicle as the
features approach the edges of the frame. When the size of the vehicle increases 5
times, the error increases non-linearly: more than 10 times (0.152/0.014 ≈ 10.85).
For case 2, as stated before, variations in wheel height also affect the output
non-linearly as the features deviate from the center in either direction.
In order to obtain the least variation in the output, the center of projection of the
camera is matched with the center of the axle of the average wheel, 16 in. (40.64 cm)
[6], which is 20.32 cm above the pavement.
In the first scenario for this second case, taking a semitrailer-truck’s wheel as the
highest allowable wheel size, 22.5 in. (57.15 cm), a maximum variation of 8.255 cm in
the positive y-axis from the average height is considered. Then, by means of (4), we
calculate an error of 0.0049 m.
On the other hand, in the second scenario, when we consider the same variation of
8.255 cm but towards the negative direction of the y-axis, we obtain an
error of 0.0051 m. From this we can observe that the variation is slightly more sensitive
when wheels are smaller than average than when they are bigger. As can be seen
in the representation of the distortions in Fig. 3, the distortions tend to be bigger in −y;
this is attributed to the tangential distortion of the current camera setup.
A similar process is followed when analyzing the sensitivity of the wheelbase
estimation when the distance from the camera to the object (Zc) is considered to be
uncertain and at the same time considering the image to be subjected to lens distortion.
In this case, the image is subjected to two different uncertainty sources, which lead to
even bigger variations on the wheelbase estimations.
Nevertheless, it is well understood that resolution plays a bigger role when varia-
tions due to lens distortion can be minimized. That is to say, when a closer picture
of an object is taken, the error due to the minimized lens distortion is compensated and
even outweighed by the increase in resolution. This is possible because the barrel
distortion is almost completely eradicated when undistorting the frames by means of
the BC model [5], and the tangential distortion is negligible.

5 Validation and Verification Using a 3D Simulated Environment

Accuracy in wheelbase measurement requires ground-truth values that are a challenge
to collect due to the nature of real-world scenarios. To gain a better understanding of how
variations in the vehicle width and its position in the lane affect the accuracy of the
result, a 3D simulated environment was created. The wheelbase of a vehicle rendered in

real time was displayed on an LED monitor and measured by counting the number of
pixels between the centers of the axles, as well as physically measured with a ruler. The
center of each wheel was denoted with a one-pixel red dot for easier reference.
Unlike measuring the wheelbase of a real vehicle, with this method it is possible to find
the wheelbase of the vehicle with absolute accuracy. This proposed methodology
creates a validation tool: a simulated test bench for testing and evaluating
visual sensors used for inspecting wheelbase in a structured lab environment, without
having to go into the field. In order to reproduce the
setup of a camera located beside the freeway as shown in Fig. 1, a video camera was
placed in front of the LED monitor as displayed in Fig. 4. To simulate the depth change
due to the position of the vehicle within the width of the lane, the rendered vehicle was
resized so that the camera perceives the size of the vehicle as portrayed in Fig. 5.

Fig. 4. Experimentation setup of camera located on the freeway side of a lane, perpendicular to
the vehicle.

Fig. 5. (a) Vehicle in the middle of the lane is perceived at one size; (b) vehicle far in the left
lane is perceived as smaller, while the vehicle in (c), close to the right lane line, is perceived as bigger.

In the simulation setup, an LG LED LCD E250V monitor with a native resolution
of 1920 × 1080 pixels and a screen size of 54.85 cm (diagonal) was utilized to render
a 3D simulation of a Class 2 vehicle. Also, a Sanyo Xacti VPC-FH1 video camera with
a built-in lens set at 5.95 mm was used to record video at a resolution of 1920 × 1080
pixels. The focal length (f) in pixels, 2181 ± 0.95 pixels, was obtained by means of the
MATLAB Calibration Toolbox.
The field of view is determined by the maximum length to be perceived; for this
experiment the maximum length is the width of the monitor, 47.8 cm. By means of
Eq. 1 we obtain that the distance of the camera to the monitor Zc is 54.3 cm, where
Xc is the length of the monitor in centimeters, f is the focal length in pixels and x is the
length of the monitor in pixels.
Once we obtained the ideal distance of the camera to the monitor, we subtracted
f = 5.95 mm and obtained the final distance of the camera with respect to the
monitor as 53.7 cm. We achieved the alignment of the 1920 × 1080 pixels of the
monitor with the 1920 × 1080 pixels of the video samples recorded with the camera
through exhaustive calibration.
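The monitor-distance figures above follow from the same Eq. (1) relation; a small sketch under the stated setup values:

```python
f_px = 2181.0            # calibrated focal length of the video camera, in pixels
monitor_width_cm = 47.8
frame_width_px = 1920.0

# Eq. (1): distance at which the monitor exactly fills the frame
Zc_cm = monitor_width_cm * f_px / frame_width_px   # ~54.3 cm

# subtract the 5.95 mm physical focal length to place the camera body
final_distance_cm = Zc_cm - 0.595                  # ~53.7 cm
```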
With the above-mentioned parameters the absolute distance between the camera
and the monitor Zr is 54.3 cm. To recreate the variation in depth demonstrated in
Fig. 5, the rendered size of the vehicle was decreased by 5 pixels in case 2 to
illustrate a greater depth and increased by 5 pixels in case 3. For all cases, the
recreated wheelbase Xr is 5.85 cm and the assumed value of Za is 54.3 cm.
One video for each case was recorded and for each video a frame was extracted for
analysis when the rendered vehicle was located at the closest point to the center of the
field of view.
Each frame was analyzed to obtain the wheelbase x for the PHP model and undistorted
using the radial distortion coefficients [−0.1604, 0.0653] and the tangential distortion
coefficients [7.5313e−04, −6.0965e−04] obtained through the Image Calibration
Toolbox by MATLAB. The wheelbase in pixels x was measured in each of the six
pictures. The measurement was made using the area of pixels with the highest red
contrast denoting the center of each wheel. By measuring the corresponding values of
x1–x3 and Zc1–Zc3, we observed that for each of the extracted frame samples, the
wheelbase x values were exactly the same number of pixels displayed on the monitor.
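The red-dot measurement can be automated; this is a sketch under the assumption that the markers are the only strongly red pixels (the threshold values are ours), demonstrated on a synthetic frame:

```python
import numpy as np

def wheelbase_px(frame):
    """Pixel distance between the two red wheel-center markers in an RGB frame."""
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    mask = (r > 200) & (g < 80) & (b < 80)   # hypothetical "strong red" threshold
    ys, xs = np.nonzero(mask)
    left = xs < np.median(xs)                # split pixels into the two markers
    cx1, cy1 = xs[left].mean(), ys[left].mean()
    cx2, cy2 = xs[~left].mean(), ys[~left].mean()
    return float(np.hypot(cx2 - cx1, cy2 - cy1))

# synthetic frame with one-pixel markers 235 px apart, mimicking case 1
frame = np.zeros((100, 400, 3), dtype=np.uint8)
frame[50, 80] = (255, 0, 0)
frame[50, 315] = (255, 0, 0)
```

On this synthetic frame, `wheelbase_px(frame)` recovers the 235 px separation exactly, which is why pixel-level ground truth is available in the simulated setup.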
For this experiment we performed wheelbase estimation for each of the recreated
depth values for case 1, case 2 and case 3 as depicted in Fig. 5.
By means of Eq. 3, we calculated the wheelbase distance in centimeters Xc for
cases 1, 2 and 3.
In case 1, the assumed depth is fixed at 54.3 cm. From Table 2, using Eq. 3, it can be
seen that the picture taken at the assumed depth Za shows a variation in X1 of 0.0020 cm
for the PHP model and 0.0005 cm for the BC model, accounting for an error due to
distortion of 0.021% for the PHP model and 0.004% for the BC model. For case 2, using
the same Za value, we observe a variation in X2 of 0.1240 cm for the PHP model and
0.1235 cm for the BC model, accounting for an error due to distortion of 0.004% for both
models and causing a final total error of 2.119% for PHP and 2.111% for BC. Lastly, in
case 3, we observe a variation in X3 of 0.1235 cm for PHP and 0.1247 cm for BC,

Table 2. Results for Case 1, Case 2 and Case 3

Case / model          Wl (cm)  Za (cm)  Zr (cm)  x (px)   Zc (cm)   Xc (cm)  Offset (cm)  Error, distortion (%)  Error, Za + distortion (%)
Case 1 (x1 = 235 px)
  PHP                 5.85     54.3     54.3     235.05   54.2814   5.8520   0.0020       0.021                  0.034
  BC                  5.85     54.3     54.3     234.99   54.2953   5.8505   0.0005       0.004                  0.009
Case 2 (x2 = 230 px)
  PHP                 5.85     54.3     55.47    229.99   55.4733   5.7260   0.1240       0.004                  2.119
  BC                  5.85     54.3     55.47    230.01   55.4708   5.7265   0.1235       0.004                  2.111
Case 3 (x3 = 240 px)
  PHP                 5.85     54.3     53.16    239.93   53.1774   5.9735   0.1235       0.029                  2.111
  BC                  5.85     54.3     53.16    239.98   53.1663   5.9747   0.1247       0.008                  2.132

Wl: recreated wheelbase; Za: assumed depth; Zr: recreated depth; x: observed wheelbase;
Zc: calculated depth; Xc: calculated wheelbase; Offset: |Xc − Wl|.

accounting for an error due to distortion of 0.029% for PHP and 0.008% for BC,
causing a final error of 2.111% for PHP and 2.132% for BC.
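The total-error column of Table 2 is simply each offset expressed as a percentage of the 5.85 cm recreated wheelbase; a quick check:

```python
wl_cm = 5.85   # recreated wheelbase
# offsets |Xc - Wl| in cm, from Table 2
offsets = {
    "case1_php": 0.0020, "case1_bc": 0.0005,
    "case2_php": 0.1240, "case2_bc": 0.1235,
    "case3_php": 0.1235, "case3_bc": 0.1247,
}
total_error_pct = {k: 100.0 * v / wl_cm for k, v in offsets.items()}
```

The computed percentages round to the values reported in the last column of Table 2.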

6 Conclusions

In this paper, by means of the Pinhole Projection and Brown-Conrady models, we
analyzed how the wheelbase estimation is affected by major uncertainties in the
measuring process when a certain depth is assumed. Through a simulated
environment, three case studies were conducted to verify and validate the theoretical
data, from which we can conclude that in the three cases, the error due to radial and
tangential distortion reached up to 0.03%, accounting for 0.014% of the total error with
the PHP model and 0.004% with the BC model in case 3, whereas the uncertainty in the
depth of the vehicle with respect to the location of the camera represented an error of up
to 2.132% in Xc3, accounting for 99.8% of the total error. The distortion model has
proven to minimize the sensitivity of the wheelbase estimation, although further
applications should prioritize estimating accurate depth, as it is the most sensitive
source of variation and accounts for the highest errors. Finally, it can also be concluded
that for metrology applications through vision systems, even though there are several
uncertainty sources to be considered, and apart from correction models, resolution and
processing speed, precise measurements depend to a very high percentage on an
accurate estimation of depth.

References
1. Hallenbeck, M.E., Selezneva, O.I., Quinley, R.: Verification, Refinement, and Applicability
of Long-Term Pavement Performance Vehicle Classification Rules. No. FHWA-HRT-13-091
(2014)
2. Stein, W.J., Neuman, T.R.: Mitigation Strategies for Design Exceptions. No. FHWA-SA-07-
011 (2007)
3. Weingroff, R.F.: Federal-aid highway act of 1956: creating the interstate system. Public Roads
60(1) (1996)

4. DoT, U.S.: Federal size regulations for commercial motor vehicles (2004)
5. Brown, D.C.: Decentering distortion of lenses (PDF). Photogramm. Eng. 32(3), 444–462
(1966)
6. Blow, P.W., Woodrooffe, J.H., Sweatman, P.F.: Vehicle Stability and Control Research for
US Comprehensive Truck Size and Weight (TS&W) Study. No. 982819. SAE Technical
Paper (1998)
7. Hajek, J.J., Selezneva, O.J., Mladenovic, G., Jiang, Y.J.: Estimating Cumulative Traffic
Loads, Volume II: Traffic Data Assessment and Axle Load Projection for the Sites with
Acceptable Axle Weight Data, Final Report for Phase 2. No. FHWA-RD-03-094 (2005)
Generic Paper and Plastic Recognition
by Fusion of NIR and VIS Data
and Redundancy-Aware Feature Ranking

Alla Serebryanyk1(B) , Matthias Zisler2 , and Claudius Schnörr1


1 University of Applied Sciences Munich, Munich, Germany
alla.serebryanyk@hm.edu, schnoerr@cs.hm.edu
2 Institute of Applied Mathematics, University of Heidelberg, Heidelberg, Germany
zisler@math.uni-heidelberg.de
http://schnoerr.userweb.mwn.de/

Abstract. Near infrared (NIR) spectroscopy is used in many applications
to gather information about the chemical composition of materials.
For paper waste sorting, given a small number of scores computed from NIR
spectra and assuming more or less unimodally clustered data, a pixel
classifier can still be crafted by hand using knowledge about chemical
properties and a reasonable amount of intuition. Additional information can
be gained from visual data (VIS). However, it is not obvious which features,
e.g. based on color, saturation, or textured areas, are ultimately important
for successfully separating the paper classes in feature space. Hence,
a rigorous feature analysis becomes inevitable. We have chosen a generic
machine-learning approach to successfully fuse NIR and VIS information.
By exploiting a classification tree and a variety of additional visual
features, we could increase the recognition rate to 78% for 11 classes,
compared to 63% using NIR scores only. A modified feature-ranking
measure, which takes redundancies between features into account, allows us to
analyze the importance of features and reduce them effectively. While
some visual features like color saturation and hue proved to be important,
some NIR scores could even be dropped. Finally, we generalize this
approach to analyze raw NIR spectra instead of score values and apply
it to plastic waste sorting.

Keywords: Near Infrared (NIR) Spectroscopy · Waste sorting ·
Visual Features (VIS) · CART · Feature ranking · Machine-learning

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 30–45, 2019.
https://doi.org/10.1007/978-3-030-02686-8_3

1 Introduction

More than 16 million tons of waste paper are processed each year in Germany
[4]. At our partner facility, around 130,000 tons per year are handled. A high
sorting quality of the waste paper is critical to achieve a high grade of recycled
paper while keeping the environmental footprint to a minimum. In [10], a
general overview of many methods in the field of paper waste sorting is given,
and the impact these methods can have on the conservation of natural resources,
in terms of energy and water consumption, CO2 footprint, and environmental
pollution, is emphasized. Ultimately, good knowledge about the input material
may be used to optimize the parameters of the sorting facility, e.g. the conveyor
belt speed.
We address this paper sorting problem by using near infrared (NIR) and
additional RGB (red-green-blue) visual data. From the visual data, we use the
RGB and HSV (hue-saturation-value) color components and compute a huge
variety of features consisting of classical and statistical texture sensitive features
(VIS-features).
There is also a strong need to optimize the parameters of sorting facilities
for plastic waste based on the composition of the input material, in order to
improve the throughput and the sorting quality. In the European Community
alone, there are 26 million tons of plastic waste to be sorted, of which only 30%
are recycled1. This is all the more important since China has refused to accept
plastic waste from Europe any longer. The quality of the sorted output in terms
of purity and attainable constant properties of the sorted fractions is crucial for
its usability in many applications and thus for the price of the recycled materials.
Our classifier implementation of a Classification and Regression Tree (CART)
allows a ranking of the features by importance and thus can be used to select only
the most important features. Furthermore, the complexity of the classifier can
be parameterized to create simpler decision trees, which have proven to be more
robust in case of high measurement errors and partly non-representative data.
The optimal decision tree ultimately results from a cross-validation training scheme.
For paper waste, we compare the classification performance in three experi-
ments: First, only NIR scores are used for training, then RGB and HSV data is
added, and finally a whole variety of visual (VIS) features is combined. Based on
the set of NIR and VIS features we were able to show the power of an importance
ranking for an effective feature selection.
For plastic waste, we have direct access to the raw spectra, so we can analyse
the raw spectra of a NIR camera instead of pre-processed score values, as we
were limited to do in the paper waste case. In this case the improved feature
ranking is able to identify the wavelengths with most discriminative power for
the trained plastic sorts.
The rest of the paper is organized as follows: in Sect. 2, the setting for the
recording of the paper and plastic waste material is sketched and the charac-
teristics of the available sensor data are described. Section 3 briefly mentions
classic approaches to analyse and classify waste material, and a list of feature
ranking approaches is given; one of them, based on the CART, is pursued further
and discussed in more detail in Sect. 4. In particular, in Sect. 4.2, our modification
of the CART feature ranking is given to adequately account for the redundancy
of features. This modification is empirically verified on a synthetic data example.
Section 4.3 states a modification to the pruning of the CART to improve its
robustness. The preprocessing of the paper data and plastic spectra is stated in
Sect. 4.4. Section 5 describes how the recognition rate could be increased from
63% to 78% by fusing NIR and VIS data, and the effectiveness of our feature
ranking and reduction method is demonstrated on the paper features used and
on the plastic spectra. Finally, Sect. 6 summarizes the main results and states
ideas for future work.

1 According to a recent newspaper report.

2 Characteristics of Waste Data


2.1 Paper Data
Line scan cameras for NIR and RGB were used to image the conveyor belt
transporting the waste paper. The system used in a real paper sorting plant
recorded 172 NIR tracks and 1204 RGB tracks at 175 scans per second and a
belt speed of around 0.5 m/s, covering a width of circa 90 cm (see top of
Fig. 1).

Fig. 1. Example visualization of the classification results on real world paper data. The
upper image shows the RGB data of a section of the conveyor belt. Each color in the
lower image represents the recognized paper class. The background is colored in black.

Overall, 29 NIR-based features or scores were used for the classification prob-
lem and were processed from the raw NIR spectra similarly to [9]. A third
party project partner, a NIR camera manufacturer, provided these scores. These
consist of 11 scores discriminating plastic versus paper, 15 scores sensitive to
different paper classes, and 3 values measuring the content of characteristic
chemicals: talcum, kaolin, and lignin. Plastic content may result from coated
paper classes, adhesive tapes or foils, for example.

Table 1. Paper classes to be discriminated, with N = ∑i Ni = 4175121 samples
in total.

Class index Abbreviation Description Samples Ni


0 BG Background 853573
1 ZD Newspaper 473144
2 MGWD Magazine/advertising print 854485
3 BP Bureau paper 540297
4 WPb Corrugated paper brown 196494
5 WPw-u Corrugated paper white covered and uncoated 217558
6 WP-g Corrugated paper coated 118834
7 KA-u Carton package uncoated 90218
8 KA-g Carton package coated 538842
9 SV Other packages 152433
10 UN Unassigned objects 139243

Based on the visual RGB data, a huge variety of features is computed,
consisting of co-occurrence features, histogram moments, Haar wavelet filters,
anisotropic Gaussian filters, and first and second order spatial derivatives for
various mask widths and orientation angles (VIS features).
The NIR scores and VIS features are then combined in a feature vector of
dimension d, x ∈ Rd, for each pixel of a track. The set of feature vectors
X = {xi}, i ∈ {1, . . . , N}, along with a class label from labeled data, forms the
training data set we operate on. Thus, NIR and VIS features are fused in these
vectors and treated uniformly by the classifier and the feature ranking procedure.
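The fusion into per-pixel feature vectors can be sketched as follows. This is a minimal illustration in Python/NumPy; the array sizes (29 NIR scores, 390 visual features, a 4×6 pixel track) are assumptions chosen for the example, not the actual system dimensions:

```python
import numpy as np

# Hypothetical sizes: 29 NIR scores and 390 visual features per pixel,
# for a small track of H x W pixels (numbers chosen for illustration).
H, W, N_NIR, N_VIS = 4, 6, 29, 390
rng = np.random.default_rng(0)

nir_scores = rng.normal(size=(H, W, N_NIR))  # per-pixel NIR score maps
vis_feats = rng.normal(size=(H, W, N_VIS))   # per-pixel visual feature maps

# Fuse along the feature axis: each pixel yields one vector x in R^d.
fused = np.concatenate([nir_scores, vis_feats], axis=-1)
X = fused.reshape(-1, N_NIR + N_VIS)         # training matrix, one row per pixel

print(X.shape)  # (24, 419)
```

Once stacked like this, the classifier and the feature ranking see NIR and VIS dimensions as interchangeable columns of one matrix.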
We discriminate 10 paper classes which were defined by a third party project
partner. The conveyor belt is treated as a separate background class. Thus, a
total of 11 classes is discriminated for the results in this paper (see
Table 1).

2.2 Plastic Data


To test the recognition of plastic waste, only one bottle per plastic class was
available. The bottles were cleaned, and labels or markers were removed. This is
only a small data set, and the preparation is bound to lead to overly optimistic
results in terms of recognition rates, but we wanted to check two aspects:
– Does our generic approach have a chance to be successfully transferred to the
treatment of plastic waste?
– Can the feature selection analysis be successfully applied to raw NIR spectra
as well, to overcome the need for expert experience to compute application-
dependent score values?
For plastic objects, the NIR camera recorded 320 tracks perpendicular to the
belt movement in the range of 900–1200 nm with a wavelength resolution of
256 values. The background was suppressed by an intensity threshold. For the
training of the background as a separate class, some additional measurements
were taken from an empty belt. The background data were reduced as in the
paper data experiments, so that the background does not dominate the other
classes and hence the determined recognition rate. Based on these data, a labeled
training set was built up.
Note that some PET classes only differ in color. Table 2 lists all defined
plastic classes.

Table 2. Plastic classes to be discriminated

Class index Abbreviation Description

0 BG Background
1 PET raw Polyethylene Terephthalate raw material
2 PET bottles PET bottles
3 PET blue PET blue
4 PET brown PET brown
5 PET green PET green
6 PET transp PET transparent
7 ABS Acrylonitrile-Butadiene-Styrene
8 PE Polyethylene
9 PE UHMW PE ultra high-molecular
10 PE UHMW TG 1.2 PE ultra high-molecular TG 1.2
11 PE hard Polyethylene hard
12 Polyester resin Polyester resin
13 PA Polyamide
14 PC Polycarbonate
15 PP Polypropylene
16 PVC hard Polyvinylchloride hard
17 PAK Polyacrylate

3 Related Work
NIR spectroscopy is a well established technique for material identification in
general and paper sorting in particular [9–11]. Besides characteristic absorption
bands, also first and second order derivatives are used to preprocess the raw
reflectance spectra. Smoothing filters like Savitzky-Golay are used to reduce
noise in the derivatives [9]. Furthermore, Principal Component Analysis (PCA)
is used to reduce the dimension of the feature space [7]. Classification is then
carried out by evaluating several subsequent binary decision rules, for which
Partial Least Squares (PLS) regression is applied. The order of these substeps
is based on a sequence of manual analysis steps or on rather intuitive decisions.

Along with PCA, other techniques for feature analysis, like Fisher Linear
Discriminant Analysis (LDA) or the divergence measure based on the
Kullback-Leibler distance for probability distributions, among others, have been
used for similar problems in pattern recognition [3]. Generally, the linear
techniques PCA and LDA will only be optimal if the class distributions are well
separated and Gaussian in feature space.
Well known classifiers include Classification and Regression Trees (CART)
[2], Randomized Trees or Random Forests [1] and Support Vector Machines
(SVM), besides many others [3]. Feature ranking can be done, e.g. by using a
CART with surrogates [2], Randomized Trees [5], or Recursive Feature Elimina-
tion (RFE) using weight parameters of trained SVMs [6].
We decided to use a CART classifier, since it is a rule-based and parameter-free
technique which can handle a large number of features and performs well on
arbitrary distributions, provided a large number of training samples is available,
which is clearly the case in our application [2].
In [8], the approach of a generic data fusion of VIS and NIR data using a
classifier and a Machine-Learning approach was first described. In the following
sections, we describe the progress of this work and the first step towards an
application of the methods to the task of plastic waste sorting by analyzing
whole raw NIR spectra.

4 Methodology
4.1 Classifier

We use our own C++ implementation of the CART algorithm which is based
on the principles presented in [2]. The CART algorithm trains a binary decision
tree. In each node the pattern set is split at a threshold for a feature which
minimizes the impurity in the following subsets. As impurity metric we use the
Gini diversity index for a node t as proposed by [2]:

i(t) = ∑_{j≠k} p(j|t) p(k|t),   (1)

where the indices j and k represent different classes. A splitter s is defined by the
feature which is used to split and the corresponding threshold. The decrease of
impurity from one node to the left and right child nodes tL and tR by a splitter
s is described by the delta impurity

Δi(s, t) = i(t) − pR i(tR ) − pL i(tL ), (2)

where pL and pR are the proportions of data in tL and tR, respectively. The
splitter s which maximizes Δi(s, t) is then used as primary splitter. Each leaf of
the tree finally represents a class. To use a trained classification tree, the tree is
traversed for a given pattern according to the splits in each node and the class
of the reached leaf node is returned.
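Equations (1) and (2) can be illustrated with a small self-contained sketch in Python/NumPy; the toy labels, feature values and threshold are invented for the example:

```python
import numpy as np

def gini(labels):
    """Gini diversity index i(t) = sum_{j != k} p(j|t) p(k|t) = 1 - sum_j p(j|t)^2."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def delta_impurity(labels, feature, threshold):
    """Delta impurity (2): i(t) - p_L i(t_L) - p_R i(t_R) for one split."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    p_l = len(left) / len(labels)
    p_r = len(right) / len(labels)
    return gini(labels) - p_l * gini(left) - p_r * gini(right)

# Toy node: one feature perfectly separates the two classes.
y = np.array([0, 0, 0, 1, 1, 1])
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
print(gini(y))                    # 0.5
print(delta_impurity(y, x, 0.5))  # 0.5, since both children become pure
```

The identity 1 − ∑j p(j|t)² used in the code is equivalent to the pairwise sum in (1), because the class probabilities sum to one.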

4.2 Feature Ranking and Selection

In order to rate the importance of features, surrogates are chosen in each node
of the tree. Therefore, splitting thresholds for the other features not used in
the primary splitter are sought so that the resulting child trees would be most
similar to the trees created by the original primary splitter. For each surrogate
s∗ and the primary splitter s, the delta impurity measure from (2) is calculated.
Finally these delta impurities are summed up over all nodes for each feature,
which gives a measure M (xm ) for the importance of each feature xm :

M(xm) = ∑_{t∈T} ( Δi(s*m, t) + Δi(sm, t) ),   (3)

where m ∈ {1, . . . , d} denotes the index of the specific feature, T is the set of
all nodes representing the decision tree and s∗m and sm denote the surrogates
and the primary splitter which involve feature xm . As opposed to the importance
measure found in [2], which ignores the delta impurity for the primary splitter, we
deliberately included it, since we think the feature actually used in the primary
splitter is important by definition. Tests with an artificially designed test dataset
also yielded more realistic importance measures when the primary splitter was
included.
Moreover, we defined an importance measure M′(xm) which only sums up the
delta impurities of the primary splitter of each node, thus leaving out those of
the surrogate splitters. This means that only features actually used by the
classifier gain importance. This has the effect that the importance ranking selects
between similarly important but redundant features, thus dropping unnecessary
features, as we observed in the selection of characteristic wavelengths in raw
NIR spectra of plastic waste (see later in Sect. 5.2).
To validate this observation, we created an artificial dataset comprising 1000
samples each of 11 overlapping Gaussian distributions with identity covariance
matrices, i.e. they scatter isotropically. One distribution is centered at the
origin, and the others are placed on the coordinate axes at increasing distances
from the origin. These distributions overlap mostly with the distribution around
the origin and not with each other. A sketch is given in Fig. 2 for d = 2 features.
A CART classifier can easily separate the distribution centered at the origin
from a distant distribution by one threshold on the corresponding coordinate
axis, that is, on the corresponding feature. The farther away a distribution is,
the smaller the overlap and thus the more important that feature. When
applying the CART, the measure M(xm) leads to an increasing feature ranking
of features 1, 2, . . . , 10, as expected.
In a next step, we replicated feature 5 in the data set as feature 11. Thus,
these two features are completely redundant. As expected, these features are
assigned the same importance by M(xm), as shown in Table 3. Incidentally, a
Randomized-Trees classifier leads to the same ranking result.
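The redundant-feature experiment can be approximately reproduced with an off-the-shelf CART implementation. The sketch below (Python with scikit-learn; sample counts and distances are illustrative) builds data in the spirit of the synthetic setup above, replicates one feature, and recomputes a primary-splitter-only importance in the sense of M′(xm) directly from the fitted tree arrays. Note that scikit-learn's built-in impurity-based importances count only features actually used in splits (no surrogates), so the two agree:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
d, n_per_class = 10, 1000

# Class 0 sits at the origin; class k (k = 1..10) sits on axis k-1 at an
# increasing distance, so it overlaps less and less with class 0.
centers = np.zeros((d + 1, d))
for k in range(1, d + 1):
    centers[k, k - 1] = 1.0 + k  # distances 2, 3, ..., 11 (illustrative)

X = np.vstack([rng.normal(loc=c, size=(n_per_class, d)) for c in centers])
y = np.repeat(np.arange(d + 1), n_per_class)
X = np.hstack([X, X[:, [4]]])  # replicate feature 5 as a redundant feature 11

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Primary-splitter-only importance: sum the weighted impurity decrease of
# each node's actual split, credited only to the split feature.
t = clf.tree_
w = t.weighted_n_node_samples
imp = np.zeros(X.shape[1])
for node in range(t.node_count):
    l, r = t.children_left[node], t.children_right[node]
    if l == -1:  # leaf node, no split
        continue
    imp[t.feature[node]] += (w[node] * t.impurity[node]
                             - w[l] * t.impurity[l] - w[r] * t.impurity[r])
imp /= imp.sum()

# scikit-learn's impurity-based importances use exactly this accounting.
assert np.allclose(imp, clf.feature_importances_)
```

With such a measure, the credit for the two identical columns is distributed only over the splits actually taken, rather than being duplicated via surrogates.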

Fig. 2. A sketch of two isotropic Gaussian distributions overlapping to a different
degree with the distribution centered at the origin. The circles represent the contour
lines of the distributions. Feature x2 can better separate classes 1 and 3 by a threshold
than x1 can separate classes 1 and 2; thus feature x2 is regarded as more important
than x1 by the ranking measure.

Table 3. Normalized feature ranking by M(xm) with two redundant features 5 and 11
ranked equally

Feature Importance
10      1
9       0.90932413
8       0.86688438
7       0.76397420
6       0.66307053
5       0.65805597
11      0.65805597
4       0.47340730
3       0.18303054
2       0.11822442
1       0

Table 4. Normalized feature ranking by M′(xm) with two redundant features 5 and 11.
Note that feature 11 is ranked 0 in this case.

Feature Importance
10      1
9       0.90932413
8       0.86688438
7       0.76397420
6       0.66307053
5       0.65805597
4       0.47340730
3       0.18303054
2       0.11822442
1       0
11      0

In contrast, when using the measure M′(xm), the classifier decides to use
feature 5 and rates the completely redundant feature 11 as worthless, as shown in
Table 4. This is the sort of feature ranking we need to strongly reduce the feature
count while retaining most of the information about the material classes.

4.3 Robustness Improvement

If the classifier is trained until each leaf contains a single training pattern, the
classifier will likely be overfitted, since outliers are also learned 'by heart' and
might be confused with representative data from other classes. This problem
is addressed by an internal cross-validation scheme that prunes back the fully
trained tree to some degree until it generalizes well on the given dataset.
However, in a real-world scenario with changing side conditions, feature
measurements might be slightly influenced by additional effects not covered by
the original training dataset. We address this problem by continuing the pruning
process of the trained tree to make it more robust against small changes in the
measurement conditions. As a side effect, this also leads to simpler trees.
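Our pruning is implemented inside our own C++ CART. Purely as an illustration of the idea (not our implementation), cost-complexity pruning in scikit-learn shows how continuing to prune yields progressively simpler trees; the dataset and the alpha values are made up for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)

# A larger cost-complexity parameter prunes the fully grown tree further,
# trading a little training fit for a simpler, more robust tree.
sizes = []
for alpha in [0.0, 0.002, 0.01]:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X, y)
    sizes.append(clf.tree_.node_count)

print(sizes)  # node count shrinks (or stays equal) as alpha grows
assert sizes[0] >= sizes[1] >= sizes[2] >= 1
```

Since each more strongly pruned tree is a subtree of the less pruned one, the node count can only decrease as the pruning parameter grows.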

4.4 Data Preprocessing


Paper Data. The training data is compiled from mono-fraction recordings for
each class. As a preprocessing step the paper objects were separated from the
background by using a threshold on the intensity of the visual data.
For the results in this paper, the visual resolution of 1204 pixels per scan was
scaled down to the resolution of 172 pixels of the NIR data, by a simple data
reduction.
Since the background class of the conveyor belt proved to be quite dominant
and very well distinguishable from the paper classes, the background data was
resampled to roughly the same amount as the next-largest classes. This prevents
the overall recognition rate from being overly optimistic just because of a good
background recognition.
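The background resampling can be sketched as follows (Python/NumPy; the class counts are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical label counts: background (class 0) dominates the paper classes.
y = np.repeat([0, 1, 2], [50000, 9000, 8000])

# Downsample the background to roughly the size of the next-largest class,
# so the overall recognition rate is not inflated by easy background pixels.
target = max(np.sum(y == c) for c in np.unique(y) if c != 0)
bg_idx = rng.choice(np.flatnonzero(y == 0), size=target, replace=False)
keep = np.concatenate([bg_idx, np.flatnonzero(y != 0)])
y_balanced = y[keep]

print(np.bincount(y_balanced))  # [9000 9000 8000]
```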

Plastic Data. According to [12], varying intensities from scan line to scan line
were caused by varying distances between camera and the objects and by diffuse
scattering effects. Following the norming procedure described in [12], all spectra
are normed so that
∑_{i=1}^{d} |xi| = const = 256,

where xi is a component of the feature vector x ∈ Rd, in this case the intensity
value at a particular wavelength of the spectrum at a pixel of the scan track.
Essentially, this normalization removes a constant bias. The constant value 256 is
chosen to avoid inaccuracies due to floating point errors for big or small spectral
values. Imposed PP-spectra, normalized and smoothed, are shown in Fig. 3 as
an example. These spectra match quite well, they don’t spread much vertically.
Since the spectra don’t show sharp peaks, no peak retaining smoothing fil-
ter is necessary. We used simple Gaussian smoothing filters, and calculated the
first and second derivatives by derivated Gaussian filters as additional spectral
features used in the material classification.
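The preprocessing chain for one spectrum might look as follows; a minimal sketch in Python with NumPy/SciPy, where the random input spectrum and the filter width sigma are illustrative assumptions. Stacking the smoothed spectrum with its first and second derivatives yields 3 × 256 = 768 values, consistent with the feature count used later in Sect. 5.2 (assuming this composition):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
spectrum = rng.random(256) * 40.0 + 10.0  # hypothetical raw NIR spectrum, d = 256

# Norm the spectrum so that sum_i |x_i| = 256, removing intensity variations
# caused by varying camera-object distance and diffuse scattering.
normed = spectrum / np.abs(spectrum).sum() * 256.0
assert np.isclose(np.abs(normed).sum(), 256.0)

# Gaussian smoothing plus first and second derivative-of-Gaussian responses
# as additional spectral features (sigma is an illustrative choice).
smoothed = gaussian_filter1d(normed, sigma=2.0, order=0)
d1 = gaussian_filter1d(normed, sigma=2.0, order=1)
d2 = gaussian_filter1d(normed, sigma=2.0, order=2)

features = np.concatenate([smoothed, d1, d2])
print(features.shape)  # (768,)
```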

Fig. 3. Example of superimposed spectra for the plastic sort Polypropylene (PP) after
normalization and smoothing, showing the variation in the spectra. The spectra do not
spread much vertically after normalization (the color scale represents the frequency of
overlapping spectra and can be ignored here).

5 Experimental Results
5.1 Paper Data

The dataset used for the following results consisted of about 4.2 million
samples, of which 80% were used as training set and 20% as validation set in a
3-fold cross-validation scheme. To be clear, the purpose of this cross-validation
is to obtain a most accurate estimate of the real recognition rate. We emphasize
that this dataset originates from a real sorting facility with all real-world effects
like probe contamination, light scattering, changing detector-probe distances,
shadow effects, etc.
Solely using the given NIR features as described in Sect. 2.1, our classifier
achieved an overall recognition rate of 63%. The classification statistics are given
in Table 5, and the corresponding error matrix or confusion matrix F is visualized
in Fig. 4. Ni /N is the fraction of data belonging to class i. The elements Fij of F
are the number of samples from class i which are classified as class j, where i is
the row index and j the column index. The diagonal elements of F represent the
frequency of correct classification decisions, while the off-diagonals show false-
positive and false-negative decision rates. From F the diagonal elements diag(F )
are extracted and the F1 measure is computed. The F1 measure is the harmonic
mean of precision and recall and thus also considers false positives and false
negatives. The overall recognition rate is calculated as 1 − P (F ), where P (F ) is
the error probability.
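These quantities can be computed from F directly; a minimal sketch in Python/NumPy, with a made-up 3-class confusion matrix as input:

```python
import numpy as np

def rates_from_confusion(F):
    """Overall recognition rate 1 - P(F) and per-class F1 from a confusion
    matrix F, where F[i, j] counts class-i samples classified as class j."""
    total = F.sum()
    recognition_rate = np.trace(F) / total          # 1 - P(F)
    tp = np.diag(F).astype(float)
    precision = tp / np.maximum(F.sum(axis=0), 1)   # guard empty columns
    recall = tp / np.maximum(F.sum(axis=1), 1)      # guard empty rows
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return recognition_rate, f1

# Tiny hypothetical 3-class example.
F = np.array([[8, 1, 1],
              [2, 6, 2],
              [0, 1, 9]])
rate, f1 = rates_from_confusion(F)
print(round(rate, 3))  # 0.767
```

For class 0 in this toy matrix, precision and recall are both 8/10, giving an F1 of 0.8.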
Adding the RGB and HSV channels, the recognition rate could be raised to
69%. In a first attempt to include other features, a variety of 386 additional
visual features were computed consisting of co-occurrence features, histogram
moments, Haar wavelet filters, anisotropic Gaussian filters, and first and second
order spatial derivatives for various mask widths and orientation angles. The
total of 419 features resulted in a recognition rate of around 77%.
As a remark, the trained CART classifier consists of 484054 decision nodes
and 33371 leaves in this case. Two reasons led us to the decision not to use a
Randomized Tree (RT) instead of a CART: first, an RT ranks the features like a
CART with surrogate rules according to M(xm). Second, the couple of minutes
needed to read in a trained RT consisting of e.g. 100 CART classifiers is
somewhat prohibitive in a real facility environment.

Table 5. Classification statistics for all NIR features (d = 29)

Class index i 0 1 2 3 4 5 6 7 8 9 10
Class abbrev. BG ZD MGWD BP WPb WPw-u WP-g KA-u KA-g SV UN
Ni /N 16.65 11.87 21.44 13.56 4.93 5.46 2.98 2.26 13.52 3.83 3.49
F1 measure 95.09 54.68 60.35 65.75 43.68 36.32 36.03 19.23 68.98 30.82 34.39
diag(F ) 16.169 7.120 14.346 9.618 2.284 1.702 0.736 0.276 9.060 0.789 0.858
1 − P (F ) = 62.958

Table 6. Classification statistics for the best d = 59 features selected among NIR,
RGB, HSV and a mixture of visual features

Class index i 0 1 2 3 4 5 6 7 8 9 10
Class abbrev. BG ZD MGWD BP WPb WPw-u WP-g KA-u KA-g SV UN
Ni /N 16.65 11.87 21.44 13.56 4.93 5.46 2.98 2.26 13.52 3.83 3.49
F1 measure 96.49 72.60 75.19 80.84 82.79 70.18 63.42 69.81 75.57 62.53 61.99
diag(F ) 16.026 8.704 17.086 11.074 4.079 3.629 1.641 1.457 10.242 2.172 1.973
1 − P (F ) = 78.082

By iteratively deleting the most unimportant features (according to the
measure described in Sect. 4.2), the number of features could be reduced to just 59,
while even slightly improving the recognition rate to 78%. The error statistics
are listed in Table 6, and the corresponding error matrix F is visualized in Fig. 5.
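The iterative deletion can be sketched as a simple backward-elimination loop. The sketch below uses Python with scikit-learn; the synthetic dataset and the step size of four features per round are illustrative assumptions, not our actual procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the paper dataset: informative, redundant and
# pure-noise features mixed together.
X, y = make_classification(n_samples=3000, n_features=40, n_informative=8,
                           n_redundant=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

active = np.arange(X.shape[1])  # indices of the currently kept features
history = []
while len(active) > 4:
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:, active], y_tr)
    history.append((len(active), clf.score(X_te[:, active], y_te)))
    # Drop the 4 currently least important features, then retrain.
    order = np.argsort(clf.feature_importances_)
    active = active[order[4:]]

# Accuracy typically stays flat until informative features start to go.
for n_feats, acc in history:
    print(n_feats, round(acc, 3))
```

Plotting `history` reproduces the qualitative shape of Fig. 6: a long plateau followed by a drop once genuinely informative features are removed.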
It is worth noting that the increase in recognition rate from 63% to 78% is
attributable mainly to the paper classes and not to the background class
(compare the F1 measures in Tables 5 and 6). An example of classified paper
waste is shown at the bottom of Fig. 1, where the paper classes are labeled by
different colors.
To further illustrate the feature selection process and its relevance to the
achievable recognition rate, Fig. 6 shows the recognition rate versus the number
of selected features among the 419 total features. At the far right, when all NIR
and VIS features are used, 77% recognition rate is achieved. Surprisingly, when
moving to the left in this plot, a further deletion of features results in a slight
increase of the recognition rate, because the classifier is no longer distracted by
useless and redundant information in the data set. However, the CART classifier
is a parameter-free approach and deals robustly with useless information. The
most important result is, however, that the features can be reduced down to 59
with no loss in the recognition rate, which leads to 78%. Only when reducing
the features further, a significant decrease of the recognition rate results (see far
left in Fig. 6). Thus, with appropriate feature selection, the computational cost
can be reduced, since only the best visual features need to be computed.

Fig. 4. Visualization of the class error matrix F for 29 NIR features. With i being
the row index and j the column index, the elements Fij are the number of samples
from class i which are classified as class j. Low values are colored in blue, high
values in red.

Fig. 5. Visualization of the class error matrix F for the best 59 NIR+VIS features
(see peak in Fig. 6). The recognition rate is much improved compared to Fig. 4.
Interestingly, our feature ranking also showed that the H and S channels of
the HSV data are quite important, which is also stated in [9]. More surprisingly,
almost half of the original NIR features could be dropped in arriving at the
remaining set of 59 features, even the values for talcum and lignin.
While [10] states that rule-based classifiers like CART are generally too slow
for real-time applications, we would be able to process at a conveyor speed of
4 m/s on a standard 4-core computer based on 29 NIR, 3 RGB and 3 HSV
features, without the need for further parallelization by hardware. This would be
eight times the actual conveyor speed. When, however, exploiting many hundreds
of visual features, more sophisticated data preprocessing steps need to be applied.

5.2 Plastic Data


In the first experiment, a CART classifier was trained for all 17 classes with
768 features. The size of the training data is big enough, and the classifier uses
an internal cross-validation, so that overfitting is avoided. The class error matrix
in Fig. 7 nevertheless shows an almost perfect recognition of all classes, with
1 − P(F) = 89.57%. Even the five PET classes, which only differ in color and
cause most of the recognition errors, are recognized quite well. This is an overly
optimistic result, of course, but it shows that it is worthwhile to proceed with
our generic approach.

In the next experiment, only the most important classes from an application
point of view are considered further by merging all PET classes (1–6) and all
PE classes (8–11) into one PET and one PE class, respectively, and dropping
classes 7, 12, and 17; see Table 7 and compare with Table 2.

Table 7. Most important plastic classes to be discriminated, with N = 537267 samples
in total. The class index runs from 0, . . . , c with c = 6 classes plus background

Class index i Abbreviation Class Pattern samples Ni


0 BG Background 192678
1 PET Polyethylene Terephthalate 192676
2 PE Polyethylene 105113
3 PA Polyamide 12078
4 PC Polycarbonate 2641
5 PP Polypropylene 15059
6 PVC hard Polyvinylchloride hard 17022
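The merging of the Table 2 labels into the Table 7 classes can be expressed as a simple lookup; the mapping below is written out as an assumption based on the two tables, with dropped classes mapped to −1, and the raw label array is invented for the example:

```python
import numpy as np

# Merge fine-grained plastic labels (Table 2) into application-level classes
# (Table 7): PET 1-6 -> 1, PE 8-11 -> 2; ABS (7), polyester resin (12) and
# PAK (17) are dropped, i.e. mapped to -1 here.
merge = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1,
         8: 2, 9: 2, 10: 2, 11: 2,
         13: 3, 14: 4, 15: 5, 16: 6}

labels = np.array([0, 3, 6, 9, 12, 13, 16])  # hypothetical raw labels
mapped = np.array([merge.get(int(c), -1) for c in labels])
keep = mapped >= 0

print(mapped[keep])  # [0 1 1 2 3 6]
```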

Fig. 6. Recognition rate over selected features. Best trade-off with 59 features and
recognition rate of 78%.

Table 8. Classification statistics for 6 important classes with d = 768 features

Class index i 0 1 2 3 4 5 6
Class abbrev. BG PET PE PA PC PP PVC
Ni /N 35.86 35.86 19.56 2.25 0.49 2.80 3.17
F1 measure 99.79 99.62 99.74 99.94 94.48 99.47 97.15
diag(F ) 35.809 35.734 19.507 2.247 0.454 2.793 3.061
1 − P (F ) = 99.604%

Fig. 7. Class error matrix for all plastic classes. The overall recognition rate is
1 − P(F) = 89.57%. Mostly the differently colored PET classes contribute to the
recognition error.

Fig. 8. Class error matrix for 6 important plastic classes. The overall recognition
rate is 1 − P(F) = 99.604%.

Fig. 9. Recognition rate versus selected feature count for the Breiman measure (blue)
and the primary-splitter-only measure (red). Fewer features are needed in the red case.

Figure 8 shows the related class error matrix, and Table 8 the classification
statistics. As before, the recognition rate is very good, now at almost 100%.
The effect of considering only the primary splitter in the feature ranking is
shown in Fig. 9. The recognition rate drops at a smaller number of features
compared to the feature selection based on the original ranking criterion. That
is because the ranking now selects between equally important but redundant
features, thus also dropping highly ranked but unnecessary features.
Figure 10 shows the second derivative of the spectra of various plastic materials.
The grey bars indicate the importance assigned to wavelengths for this feature
by the importance measure M′(xm). Wavelengths where this feature shows great
diversity are rated high.
Fig. 10. Importance (grey bars) of the 2nd derivative of spectra versus wavelength.

As mentioned above, these recognition rates are overly optimistic due to (a)
the careful probe preparation and (b) the data set being far from realistic for
all possible appearances of plastic waste in a real facility. But the results show
that even identical PET probes, only differently colored, can be recognized well,
and that the feature selection scheme can be applied to whole raw NIR spectra
too. This is all the more important as
– it is a generic approach without the need of any expert knowledge, and
– the amount of data of a raw spectrum is about eight times that of preprocessed
score values, hence the need for data reduction increases considerably.

6 Conclusion and Outlook


The experimental results including additional visual features show a significant
improvement over NIR scores alone. Our results on the real-world paper data
confirm the preliminary results attained on a laboratory dataset with 14 different
paper classes. The feature ranking of the CART classifier enables us to use many
potential features at first and to automatically select only the best subset for a
productive environment.
The application of the material recognition methods to raw NIR spectra of
plastic waste reveals that wavelengths can be selected in a generic way, where
material classes exhibit characteristic diversity; thus, preprocessed scores
dependent on the experience of a particular camera manufacturer are no longer
necessary. This way, the amount of data of raw spectra can be successfully
reduced as well while retaining the crucial information.
For the future, we plan to exploit the full visual resolution in order to capture
finer structural details in paper waste. At the same time, intelligent data fusion
of multivariate data of different resolutions is needed to avoid resubstitution
errors due to partially replicated data. With a sevenfold higher resolution, the
computational costs will also be a critical factor. Therefore, we want to
investigate the applicability of a regional pre-clustering procedure and other
data reduction techniques. We also intend to compare the feature ranking
technique used in our CART classifier to other possible techniques, such as
l1-regularized data reduction. Compared to a simple RGB camera, a NIR sensor
is rather expensive. Thus, it is also of interest whether visual features alone
suffice to achieve an at least acceptable recognition rate at a lower price. Since
real-world paper waste is not guaranteed to only contain paper, detection of
problematic material like
Generic paper and plastic recognition and redundancy-aware feature ranking 45

inflammable materials or rigid objects which might damage the sorting plant
would be much appreciated. For these classes it is generally hard to gather much
training data, as the variety of possible objects is huge.
The recognition results for plastics on a small data set of raw NIR-spectra
are quite promising and advice us to determine the recognition rates on a large
scale in a real sorting facility for plastic materials as well.

References
1. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). ISSN: 0885-6125
2. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression
Trees. Chapman & Hall/CRC, Boca Raton (1984). ISBN: 978-0-412-04841-8
3. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New
York (2000). ISBN 0-471-05669-3
4. Verband Deutscher Papierfabriken e.V. Facts about Paper (2015). Brochure.
Accessed 30 Nov 2015. http://www.vdp-online.de/en/papierindustrie/statistik
5. Genuer, R., Poggi, J.-M., Tuleau-Malot, C.: Variable selection using random
forests. Pattern Recognit. Lett. 31(14), 2225–2236 (2010)
6. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1–3), 389–422 (2002). ISSN:
0885-6125
7. Jolliffe, I.T.: Principal component analysis. In: Springer Series in Statistics.
Springer, New York (1986). ISBN: 0-387-96269-7
8. Klippel, P., Zisler, M., Schröder, F., Schleich, S., Serebryanyk, A., Schnörr, C.:
Improvement of dry paper waste sorting through data fusion of visual and NIR
data. In: Pretz, T., Wotruba, H. (eds.) 7th Sensor-Based Sorting & Control 2016,
Shaker (2016)
9. Leitner, R., Rosskopf, S.: Identification of flexographic-printed newspapers with
NIR spectral imaging. Int. J. Comput. Inf. Syst. Control. Eng. 2(8), 68–73 (2008).
ISSN: 1307-6892
10. Rahman, M.O., Hussain, A., Basri, H.: A critical review on waste paper sorting
techniques. Int. J. Environ. Sci. Technol. 11(2), 551–564 (2014). ISSN: 1735–1472.
English
11. Rahman, M.O., Hussain, A., Scavino, E., Basri, N.E.A., Basri, H., Hannan, M.A.:
Waste paper grade identification system using window features. J. Comput. Inf.
Syst. 6(7), 2077–2091 (2010). ISSN: 1553-9105
12. Siesler, H.W., Ozaki, Y., Kawata, S., Heise, H.M.: Near-Infrared Spectroscopy.
Principles, Instruments, Applications. Wiley-VCH Verlag GmbH (2002)
Hand Gesture Recognition
with Leap Motion

Lin Feng1, Youchen Du1, Shenglan Liu1(B), Li Xu2, Jie Wu1, and Hong Qiao3

1 Dalian University of Technology, Dalian, China
liusl@mail.dlut.edu.cn
2 Neusoft Co. Ltd., Shenyang, China
3 Chinese Academy of Sciences, Beijing, China

Abstract. Hand gestures are a natural way for people to communicate and
play an important role in Human-Computer Interaction (HCI). Nowadays,
many developers build HCI applications on top of hand gesture recognition,
but recognizing hand gestures accurately still has a long way to go. The recent
introduction of depth cameras like the Leap Motion Controller (LMC) allows
researchers to exploit depth information to recognize hand gestures more
robustly. This paper proposes a novel hand gesture recognition system based
on the LMC. Histogram of Oriented Gradient (HOG) features are extracted
from binarized and undistorted Leap Motion sensor images. We feed these
features into a multi-class Support Vector Machine (SVM) classifier to
recognize the performed gesture. The results show that our model is much
more accurate than previous work.

Keywords: Hand gesture recognition · Support Vector Machine (SVM)
· Histogram of Oriented Gradient (HOG) · Leap Motion

1 Introduction
In recent years, with the enormous development in the field of machine learning,
problems such as understanding human voice, language, movement, and posture
have become more and more popular; hand gesture recognition, as one of these
fields, has attracted many researchers' interest [1]. The hand is an important part of
the human body, and as a supplement to human language, gestures play an
important role in daily life; in human-computer interaction, robotics, and sign
language, recognizing hand gestures is one of the core issues [2–4]. In previous
work, orientation histograms have been used to recognize hand gestures [5], and a
variant of the Earth Mover's Distance (EMD) has also been applied to this task [6].
Recently, depth cameras such as Time-of-Flight cameras and the Microsoft Kinect
have been marketed one after another, and the use of depth features has been added
to gesture recognition based on low-dimensional feature extraction [7]. A
volumetric shape descriptor has been used to achieve robust pose recognition in
real time [8], and adding features such as distance, elevation,
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 46–54, 2019.
https://doi.org/10.1007/978-3-030-02686-8_4

and curvature based on 3D information about the hand shape and finger posture
contained in depth data has also improved accuracy [9]. Recognizing hand gestures
from contours has also been explored [10], recognition based on finger
segmentation has been tested [11], and using HOG features with an SVM to
recognize hand gestures has also been proposed [12].
The Leap Motion Controller (LMC) is a consumer-oriented tool for gesture
recognition and finger positioning developed by Leap Motion. Unlike the Microsoft
Kinect, it is based on binocular visual depth and provides data on fine-grained
locations such as hands and knuckles. Due to its different design concept, it can
only work normally at close range, but it performs well on data accuracy, with an
accuracy of 0.2 mm [13]. Many studies have tried to recognize hand gestures with
the LMC [14, 15]. Combining the Leap Motion and Kinect for hand gesture
recognition has also been proposed and achieved good accuracy [16].
Our main contributions are as follows:
1. We propose an LMC hand gesture dataset containing 13 subjects and 10
gestures; each subject repeats each gesture 20 times, giving 2600 samples in total.
2. We use the Leap Motion only. We extract HOG features from LMC sensor
images, which significantly improves gesture recognition accuracy.
This paper is organized as follows: In Sect. 2, we give a brief introduction to
our model architecture, methods, and dataset. In Sect. 3, we present the HOG
feature extracted from binarized LMC sensor images. In Sect. 4, we analyze and
compare the performance of the HOG feature against the work presented by
Marin et al. In Sect. 5, we conclude and discuss future work.

2 Overview
In this section, we describe the model architecture we use and how the data is
handled (Sect. 2.1), as well as how we collected our dataset with the LMC (Sect. 2.2).

2.1 System Architecture


Figure 1 shows in detail the recognition model we designed. We retrieve sensor
images from the LMC and binarize them, then extract the HOG feature, and
finally feed these features into a One-vs-One multi-class SVM to classify the
hand gesture.

2.2 Hand Gesture Dataset


In order to evaluate the performance of the HOG feature on the raw sensor
images, we propose a new dataset; the setup is shown in Fig. 2. The dataset
contains a total of 10 gestures (Fig. 3) performed by 13 individuals, and each
gesture is repeated 20 times, so the dataset contains 2600 samples. The tracking

Fig. 1. System architecture.

Fig. 2. Capture setup.



data and sensor images are captured simultaneously, and each individual is told
to perform gestures within the LMC's valid visual range; translation and
rotation are allowed, with no other prior constraints.

Fig. 3. Gestures in dataset.

3 Feature Extraction from Sensor Images


3.1 Sensor Images Preprocessing

Barrel distortion is introduced by the LMC's hardware (Fig. 4). In order to obtain
realistic images, we use an official method provided by Leap Motion that applies
bilinear interpolation to correct the distorted images.
We then apply threshold filtering to the corrected image; after doing so, the
image is binarized, retaining the area of the hand and removing the non-hand
area as much as possible, as shown in Fig. 5.
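As a rough illustration, such a fixed-threshold binarization of a grayscale sensor image can be sketched as follows; the cutoff value of 80 is an assumed example, since the paper does not state the threshold it uses:

```python
import numpy as np

def binarize(gray, thresh=80):
    """Threshold a grayscale LMC sensor image: the IR-illuminated hand is
    bright and the background is dark, so a fixed cutoff keeps the hand
    region and suppresses most of the background."""
    return np.where(gray > thresh, 255, 0).astype(np.uint8)
```

In practice the threshold would be tuned on sample images, trading off residual background against eroded hand pixels.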

3.2 Histogram of Oriented Gradient

The HOG feature is a feature descriptor used for object detection in computer
vision and image processing. In essence, it is a statistic of the image gradient
information. In this paper, we use the HOG feature to extract feature information
about gestures from the binarized, undistorted sensor images.
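To make the descriptor concrete, the following minimal NumPy sketch computes per-cell orientation histograms. It is a simplified HOG without block normalization, for illustration only; a full implementation such as skimage.feature.hog would normally be used:

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Simplified HOG: magnitude-weighted orientation histograms per cell,
    concatenated into one feature vector (no block normalization)."""
    gx = np.zeros(img.shape, dtype=float)
    gy = np.zeros(img.shape, dtype=float)
    # Central-difference gradients in x and y.
    gx[:, 1:-1] = img[:, 2:].astype(float) - img[:, :-2].astype(float)
    gy[1:-1, :] = img[2:, :].astype(float) - img[:-2, :].astype(float)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180  # unsigned gradient directions
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)
```

For a 16 × 16 image with 8 × 8 cells and 9 bins, this yields a 36-dimensional vector; real HOG implementations additionally normalize histograms over overlapping blocks to gain illumination invariance.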

Fig. 4. Raw images from LMC.

Table 1. Tracking features accuracy on both datasets

Marin et al.   Ours
79.80%         82.30%

Fig. 5. Binarized images.

4 Experiments and Results


4.1 Comparison Between Different Datasets
In order to show that our dataset has a data distribution similar to previous work
and holds no special preference for our HOG feature, we reconstruct the
calculations for features such as fingertip angles, fingertip distances, and fingertip
elevations from [16]; the results are shown in Table 1.

4.2 HOG Feature with Different Classifiers


We compare the performance of the HOG feature with different classifiers: LR,
SVM (RBF), SVM (linear), RF, KNN, and MLP. In each round, we split the dataset
into an 80% training set and a 20% test set, then train these classifiers with the same

data and validate their performance. The results of 10 rounds show that SVM with
an RBF kernel outperforms the other classifiers by a significant margin, as shown
in Table 2.

Table 2. Performance of HOG feature on different classifiers

Classifier Precision
LR 88.15%
SVM(RBF) 96.42%
SVM(linear) 96.31%
RF 82.50%
KNN 94.69%
MLP 94.00%

4.3 SVM Details

We use the One-vs-One strategy for the multi-class SVM with an RBF kernel to
classify 10 classes. For each pair of classes there is one SVM, resulting in a total of
10 · (10 − 1)/2 = 45 classifiers; the final classification is based on the votes received.
For hyperparameters (C, γ), we use the grid search method on 80% of the samples
with 10-fold cross-validation; C is searched from 10⁰ to 10³, and γ is searched from
10⁻⁴ to 10⁰.
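As an illustration of this tuning procedure, the sketch below runs the same grid with scikit-learn. The bundled digits dataset is used as a stand-in for our HOG feature matrix, which is not packaged here; only the grid bounds match the ranges above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in for the HOG feature matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

param_grid = {"C": np.logspace(0, 3, 4),       # 10^0 .. 10^3
              "gamma": np.logspace(-4, 0, 5)}  # 10^-4 .. 10^0
# SVC decomposes the 10-class problem into one-vs-one binary SVMs internally.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```

Grid search with 10-fold cross-validation fits 10 models per (C, γ) pair, so the grid size directly multiplies training time.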
We present our best results with parameters searched by grid search in
Table 3.

Table 3. Best results with parameters searched by grid search

Classifier Precision
SVM(RBF) 98.27%

5 Conclusions and Future Works

In this paper, we proposed an LMC hand gesture dataset containing 13 subjects
and 10 gestures. We proposed a way to extract the HOG feature from raw LMC
sensor images by binarizing and undistorting them. We compared the
performance of the HOG feature with different classifiers and presented the best
results of our experiments.
In future work, we will explore the characteristics of the tracking data; we think
the characteristics of the joints will also affect the accuracy of the overall
classification due to the correlation between joints. We will try to perform feature

fusion between the tracking features and the HOG feature, which should bring
considerable gains. The current training process consumes much time in our
experiments, so we will continue to optimize it by introducing techniques such as
removing linearly dependent features with PCA. At the same time, we will study
the interaction between the system and virtual reality application scenarios.

Acknowledgments. This work was supported in part by the National Natural Science
Foundation of China under Grants 61627808, 91648205, 61602082, and 61672130.
This work was also supported in part by the Development of Science and Technology
of Guangdong Province Special Fund Project under Grant 2016B090910001 and the
Open Program of the State Key Laboratory of Software Architecture (item number
SKLSAOP1701).

References
1. Rautaray, S.S., Agrawal, A.: Vision based hand gesture recognition for human
computer interaction: a survey. Artif. Intell. Rev. 43(1), 1–54 (2015)
2. Ohn-Bar, E., Trivedi, M.M.: Hand gesture recognition in real time for automotive
interfaces: a multimodal vision-based approach and evaluations. IEEE Trans. Intell.
Transp. Syst. 15(6), 2368–2377 (2014)
3. Wan, C., Yao, A., Van Gool, L.: Hand pose estimation from local surface normals.
In: European Conference on Computer Vision, pp. 554–569. Springer (2016)
4. Chaudhary, A., Raheja, J.L., Das, K., Raheja, S.: Intelligent approaches to interact
with machines using hand gesture recognition in natural way: a survey. arXiv
preprint arXiv:1303.2292 (2013)
5. Freeman, W.T., Tanaka, K.-i., Ohta, J., Kyuma, K.: Computer vision for computer
games. In: Proceedings of the Second International Conference on Automatic Face
and Gesture Recognition, pp. 100–105. IEEE (1996)
6. Ren, Z., Yuan, J., Meng, J., Zhang, Z.: Robust part-based hand gesture recognition
using kinect sensor. IEEE Trans. Multimed. 15(5), 1110–1120 (2013)
7. Suarez, J., Murphy, R.R.: Hand gesture recognition with depth images: a review.
In: RO-MAN, 2012 IEEE, pp. 411–417. IEEE (2012)
8. Suryanarayan, P., Subramanian, A., Mandalapu, D.: Dynamic hand pose recogni-
tion using depth data. In: 2010 20th International Conference on Pattern Recog-
nition (ICPR), pp. 3105–3108. IEEE (2010)
9. Dominio, F., Donadeo, M., Zanuttigh, P.: Combining multiple depth-based descrip-
tors for hand gesture recognition. Pattern Recognit. Lett. 50, 101–111 (2014)
10. Yao, Y., Yun, F.: Contour model-based hand-gesture recognition using the kinect
sensor. IEEE Trans. Circuits Syst. Video Technol. 24(11), 1935–1944 (2014)
11. Chen, Z.-h., Kim, J.-T., Liang, J., Zhang, J., Yuan, Y.-B.: Real-time hand gesture
recognition using finger segmentation. Sci. World J. 2014 (2014)
12. Feng, K.-p., Yuan, F.: Static hand gesture recognition based on hog characters and
support vector machines. In: 2013 2nd International Symposium on Instrumenta-
tion and Measurement, Sensor Network and Automation (IMSNA), pp. 936–938.
IEEE (2013)
13. Weichert, F., Bachmann, D., Rudak, B., Fisseler, D.: Analysis of the accuracy and
robustness of the leap motion controller. Sensors 13(5), 6380–6393 (2013)
14. Ameur, S., Khalifa, A.B., Bouhlel, M.S.: A comprehensive leap motion database
for hand gesture recognition. In: 2017 International Conference on Information and
Digital Technologies (IDT), pp. 514–519. IEEE (2017)

15. Wei, L., Tong, Z., Chu, J.: Dynamic hand gesture recognition with leap motion
controller. IEEE Signal Process. Lett. 23(9), 1188–1192 (2016)
16. Marin, G., Dominio, F., Zanuttigh, P.: Hand gesture recognition with leap motion
and kinect devices. In: 2014 IEEE International Conference on Image Processing
(ICIP), pp. 1565–1569. IEEE (2014)
A Fast and Simple Sample-Based T-Shirt Image
Search Engine

Liliang Chan(✉), Pai Peng, Xiangyu Liu, Xixi Cao, and Houwei Cao

Department of Computer Science, New York Institute of Technology, New York, USA
{lchen25,ppeng,xliu24,xcao01,hcao02}@nyit.edu

Abstract. In this paper, we propose TColor, a fast and simple sample-based
T-shirt image retrieval system that can effectively search T-shirt images by
main color and optional secondary colors. We considered several distinct
properties of T-shirt images. Instead of traversing all pixels in a T-shirt image,
we search T-shirts by color based on 12 representative pixels extracted from
the estimated effective T-shirt area. We evaluated our system on a small amount
of pilot T-shirt image data. Our results indicate that the proposed system
significantly outperforms the straightforward, brute-force unfiltered traversal
search, and obtains results similar to a much more complex, time-consuming
filtered traversal algorithm that removes the background color of the T-shirt
image during the search.

Keywords: T-shirt image · Image search · Search engine

1 Introduction

In the information age, a dramatic number of images are being distributed and
shared over the web. As a result, many search engines, such as Google, Baidu, and
Bing, have added an image search function. The most common approach to image
search is "content-based" image retrieval, which relies on image analysis to extract
low-level visual properties such as color, shape, and texture [1, 2]. Other systems
search images based on visual similarity, regardless of the content of the real
images [3]. The first step in image retrieval is feature extraction. Most image
search engines use a color space feature extractor and a composition space feature
extractor to extract image features, and then search for the best image based on the
similarities. During the search process, a perceptual hash algorithm is usually used
to generate a "fingerprint" string for each picture, and the similarity between
images can be measured by comparing the fingerprints of different pictures.
Although image search has been successfully applied in many search engines and
applications, it is not trivial, and many challenges are encountered in the search
process. For example, simplifying colors and calculating the gray-scale average of
pixels can take a very long time on large image databases. In addition, compared
with general image search, T-shirt search has some distinctive characteristics and
challenges. In this paper, we propose a fast and simple sample-based T-shirt
image search engine. By considering several

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 55–62, 2019.
https://doi.org/10.1007/978-3-030-02686-8_5

distinct properties of T-shirt images, our system can effectively search T-shirts by
main color and optional secondary colors.
In this paper, we propose a very fast and simple sample-based T-shirt image search
engine that can effectively search T-shirts by color. Compared with general image
search, T-shirt search has some distinctive characteristics and challenges. For
example, T-shirt images usually have a large portion of background, and the
background color can perturb search accuracy. On the other hand, T-shirt images
usually have a symmetrical structure and are located in a relatively fixed position
within the entire image. By considering these distinct properties, we propose a
simple but effective system that searches T-shirt images by main color and optional
secondary colors. For each T-shirt image, instead of traversing all pixels, we first
select 12 pixels based on sampling rules derived from analyzing a small amount of
pilot data, and extract the RGB data of these pixels [5]. Then we transform the
three-dimensional microscopic RGB data into visual colors. In this process, we
chose 12 common colors and classify pixels with different RGB values into these
colors based on Euclidean distance [6]. Meanwhile, we compute the proportion of
each color and store the information in our T-shirt image database for future
searches. Based on the pilot evaluation results on 200 T-shirt images, our proposed
system significantly outperforms the general unfiltered traversal search, and
obtains results similar to a much more complex, time-consuming filtered traversal
algorithm that removes the background color of the T-shirt image.

2 Methods

In this section, we introduce how we implement the proposed sample-based T-shirt
search algorithm.

2.1 Selection of Representative Pixels

Instead of traversing all pixels, our proposed sample-based system searches
T-shirts based on only a few sampled pixels. Selecting representative sampling
points is crucial for search accuracy. Here we introduce our data sampling
strategies.
First of all, as most T-shirts are symmetrical, we only focus on the left half of the
image. Chopping off half of the image obviously decreases the search time and
complexity, reducing the data size from 2n to n. On the other hand, T-shirt images
usually have a large portion of background. We try to avoid the background area
and only select samples from the effective T-shirt region. To do that, we determine
the relative position of the T-shirt boundary in four directions (left, right, upper,
and lower) based on a statistical analysis of 50 pilot T-shirt images in our dataset.
Figures 1(a) and 1(b) show the histograms of the boundary locations for the 50
pilot images, clearly indicating the range of boundary locations. Based on that, we
can roughly determine the valid area of T-shirt images, as shown in Fig. 2. We then
randomly sample 12 pixels from the valid area; an example T-shirt image showing
how the 12 selected pixels are distributed can also be found in Fig. 2.
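The sampling step above can be sketched as follows. The fractional bounds of the valid area are illustrative placeholders, since the actual bounds come from the boundary histograms in Fig. 1:

```python
import random

# Hypothetical valid-area bounds as fractions of image width/height;
# only the left half of the image is sampled, per the symmetry argument.
LEFT, RIGHT = 0.10, 0.50
UPPER, LOWER = 0.20, 0.85

def sample_pixels(width, height, n=12, seed=None):
    """Draw n random (x, y) coordinates from the estimated T-shirt area."""
    rng = random.Random(seed)
    x0, x1 = int(width * LEFT), int(width * RIGHT)
    y0, y1 = int(height * UPPER), int(height * LOWER)
    return [(rng.randint(x0, x1 - 1), rng.randint(y0, y1 - 1))
            for _ in range(n)]
```

Because the bounds are fixed fractions, the same sampler applies to images of any resolution without re-deriving pixel coordinates.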

Fig. 1a. Histogram for left/right boundary distribution.

Fig. 1b. Histogram for upper/lower boundary distribution.

Fig. 2. Valid search area (left) & example of how 12 selected pixels distributed.

2.2 Determining the Color for Selected Pixels

For each selected pixel, we can easily obtain the corresponding microscopic
R-G-B data with the Python Imaging Library (PIL) [4]. However, the microscopic

R-G-B information is not visual enough, so we need to transform the microscopic
R-G-B values into macroscopic candidate colors [7–9]. In our proposed T-shirt
search system, we give users 12 candidate colors to choose from: black, white, red,
orange, yellow, green, cyan, blue, purple, pink, grey, and brown. We therefore
divide the R-G-B three-dimensional space into 12 parts based on Euclidean
distance [10]. Computing the distance between a sampled pixel and each standard
color as in (1), the sample pixel is assigned to the color category Ci with the
shortest distance:

D(sample pixel, standard color) = (R − R′)² + (G − G′)² + (B − B′)²,  (1)

D(P, Ci) = min_C D(P, C)  (2)
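A minimal version of this nearest-color assignment might look like the following. The reference RGB values for the 12 candidate colors are approximations, since the paper does not list its exact standard-color coordinates:

```python
import math

STANDARD_COLORS = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "orange": (255, 165, 0), "yellow": (255, 255, 0), "green": (0, 128, 0),
    "cyan": (0, 255, 255), "blue": (0, 0, 255), "purple": (128, 0, 128),
    "pink": (255, 192, 203), "grey": (128, 128, 128), "brown": (139, 69, 19),
}

def classify_pixel(rgb):
    """Assign an (R, G, B) pixel to the nearest standard color, i.e. the
    argmin over categories of the distance in Eqs. (1)-(2)."""
    return min(STANDARD_COLORS,
               key=lambda c: math.dist(rgb, STANDARD_COLORS[c]))
```

Note that the argmin is the same whether the distance or the squared distance is minimized, so the presence or absence of a square root in (1) does not affect the assignment.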

2.3 Traversal-Based T-shirt Retrieval

Two traversal-based search algorithms are implemented as well for the sake of
comparison.

Unfiltered Traversal Search. We first consider the most straightforward search
approach, the unfiltered traversal search. This simple brute-force approach does
not take the background color of the T-shirt image into account. We traverse every
pixel of the image, get the corresponding R-G-B data for each pixel, and then
classify it into one of the twelve candidate colors [11].

Filtered Traversal Search. In the filtered traversal search, we try to filter out the
background color of the T-shirt image. Since the Euclidean distance between two
pixels of obviously different colors is much larger than that between two pixels of
similar colors, we can identify whether a pixel is located on a boundary by
examining the Euclidean distance between the current pixel and the adjacent pixel
during the traversal. Figure 3 shows the Euclidean distance across the boundary
for the 50 pilot T-shirt images

Fig. 3. Euclidean Distance across the boundary on 50 pilot T-shirt images.



in our dataset. We can see that the minimum distance across the boundary is 1500.
As a result, we choose this value as the threshold for filtering the background color
in the filtered traversal algorithm.
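The boundary test can be sketched as below. We treat the reported threshold of 1500 as a squared-distance value, which is our assumption: with 8-bit channels, the plain Euclidean distance cannot exceed about 441, whereas the squared distance can:

```python
def crosses_boundary(pixel, prev_pixel, threshold=1500):
    """Return True when the squared RGB distance between two adjacent
    pixels exceeds the empirical threshold from the 50 pilot images,
    indicating a jump between background and T-shirt regions."""
    d2 = sum((a - b) ** 2 for a, b in zip(pixel, prev_pixel))
    return d2 > threshold
```

During the traversal, pixels encountered before the first boundary crossing on a scan line would be treated as background and excluded from the color statistics.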

3 Results

3.1 Dataset

3000 T-shirt images were collected for our study. In our pilot study, 200 testing
T-shirt images were labelled by human labelers. Specifically, the labelers label
each T-shirt image with one of the 12 common colors: black, white, red, orange,
yellow, green, cyan, blue, purple, pink, grey, and brown. The main color of the
T-shirt is the color occupying more than 45 percent of the T-shirt. A secondary
color is one occupying less than the main color but more than 0% of the T-shirt
area. Figure 4 shows the color distribution of the main color and secondary colors
on the pilot test dataset.


Fig. 4. Color distribution of the main color and secondary colors for the 200 pilot test T-shirt
images.

3.2 Evaluations

Table 1 compares the performance of the three different search approaches. We
consider two different evaluation metrics, MAP (Mean Average Precision) and
MRR (Mean Reciprocal Rank). MAP is used to evaluate the system precision in
general. Unlike a standard image search engine, in a T-shirt search engine the
accuracy of searching by main color is much more significant, so we compute the
MRR to evaluate the main color search as well.
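For reference, MRR over a set of queries can be computed as below (a generic sketch, not the authors' evaluation script):

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: average over queries of 1/rank of the first relevant result,
    contributing 0 when no relevant result is retrieved."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

For main-color search, the "relevant set" for each query is the set of images whose labelled main color matches the queried color.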

Table 1. Performance of the three different search approaches


Algorithm applied to the engine MAP MRR for main color search
Sampling Algorithm 0.61 0.90
Traverse Algorithm (Filtered) 0.63 0.90
Traverse Algorithm (Unfiltered) 0.52 0.78

From Table 1, we can see that the MAP for the sample-based search is 0.61, which
is comparable with the 0.63 obtained by the filtered traversal search, and
significantly better than the simple brute-force traversal search with its 0.52 mean
average precision. Similar results can be seen for MRR. The MRR for main color
search is 0.90 for the two search algorithms that benefit from removing the
background colors during the search, while the MRR is much lower for the simple
traversal search. A significance test was also performed. The improvement of the
proposed sample-based search over the simple traversal search is statistically
significant, with a p-value of 0.02. There is no significant difference (p-value 0.15)
between the sample-based search and the filtered traversal search.
We also evaluated the three systems by testing their execution speed in the same
testing environment. The results are shown in Table 2. It is clear that the search
engine using the proposed sampling-based algorithm has a clear advantage in
execution efficiency: its execution time is less than 1/50 of that of the other two
engines. The filtered traversal search takes the longest time to search a T-shirt
image among the three approaches.

Table 2. Comparison on execution speed


Algorithm applied to the engine Average consuming time for analyzing color information for
one T-shirt image
Sampling Algorithm 10 ms
Traverse Algorithm (Filtered) 900 ms
Traverse Algorithm (Unfiltered) 760 ms

We are also interested in how our proposed T-shirt color search engine works on
different colors, so we further break down the results for each color; Fig. 5 shows
the results. First of all, we can see that our proposed system performs significantly
differently on different colors. For example, the system can search for red and
green T-shirts with very high MAP, while it did not perform well on T-shirts in
cyan, pink, grey, purple, and brown.


Fig. 5. Break-down MAP (Mean Average Precision) for each color based on the proposed
sample-based T-shirt image search.

4 Conclusions

This paper focuses on the T-shirt image search task. We considered several distinct
properties of T-shirt images and proposed a fast and simple sample-based T-shirt
image search engine that can effectively search T-shirts by main color and optional
secondary colors. Instead of traversing all pixels, our proposed system searches
T-shirts based on only a few sampled pixels, and selecting representative sampling
points is crucial for search accuracy. In this study, 12 representative pixels were
extracted from the estimated T-shirt area, and several statistical analyses were
performed to bound the sampling region. We evaluated our system on 200 pilot
T-shirt images. Both the MAP and MRR results indicate that the proposed system
significantly outperforms the straightforward, brute-force unfiltered traversal
search, and obtains results similar to a much more complex, time-consuming
filtered traversal algorithm that removes the background color during the search.
We further broke down the results for each color, and the results indicate that the
proposed system performs significantly differently on different colors: it can
search for red and green T-shirts with very high MAP, while it did not perform
well on purple and brown T-shirts. We also evaluated the three systems by testing
their execution speed in the same testing environment. The proposed system shows
a clear advantage in execution efficiency, with an execution time of less than 1/50
of that of the other two engines. In the future, we will validate our proposed
sample-based T-shirt search engine on a larger dataset with more T-shirt images.

References

1. Veltkamp, R.C., Tanase, M.: Content-Based Image Retrieval Systems: A Survey. Technical
Report UU-CS-2000-34, Dept. of Computing Science, Utrecht University (2002)
2. Ortega, M., Rui, Y., Chakrabarti, K., Mehrotra, S., Huang, T.S.: Supporting similarity queries
in MARS. In: Proceedings of ACM Conference on Multimedia, pp. 403–413 (1997)
3. Terragalleria. http://www.terragalleria.com
4. Pajankar, A.: Raspberry Pi Image Processing Programming: Develop Real-Life Examples
with Python, Pillow, and SciPy. Apress (2017)
5. Zhang, Q., Song, X., Shao, X., Zhao, H., Shibasaki, R.: From RGB-D images to RGB images:
single labeling for mining visual models. ACM Trans. Intell. Syst. Technol. 6(2), 16 (2015)
6. Huang, X.Y., Chen, W.W.: Study on image search engine based on color feature algorithm.
Adv. Mater. Res. 267, 1010–1013 (2011)
7. Huang, X., Chen, W.: A modular image search engine based on key words and color features.
In: Transactions on Edutainment VIII. LNCS, vol. 7220, pp. 200–209 (2012)
8. Tedore, C., Johnsen, S.: Using RGB displays to portray color realistic imagery to animal eyes.
Curr. Zool. 63, 27–34 (2017)
9. Lieb, A.: Color indexing for images. US20080044081 (2008)
10. Claussen, R.: Algorithms: Euclidean algorithm. ACM (1960)
11. Leon, K., et al.: Color measurement in L*a*b* units from RGB digital images. Food Res. Int.
39(10), 1084–1091 (2006)
Autonomous Robot KUKA YouBot Navigation
Based on Path Planning and Traffic Signals
Recognition

Carlos Gordón(✉), Patricio Encalada(✉), Henry Lema(✉), Diego León(✉),
and Cristian Peñaherrera(✉)

Facultad de Ingeniería en Sistemas, Electrónica e Industrial, Universidad Técnica de Ambato,
Ambato 180150, Ecuador
{cd.gordon,pg.encalada}@uta.edu.ec

Abstract. We present a successful demonstration of autonomous KUKA
YouBot robot navigation based on path planning and traffic signal recognition.
The integration of the two capabilities, path planning and traffic signal
recognition, was carried out thanks to the interoperation of the Robot Operating
System, MATrix LABoratory software, and Open Source Computer Vision
Library environments. The Robot Operating System allows the simulation of
autonomous robot navigation using Gazebo and provides the implementation of
the algorithms on simulated and real platforms. MATrix LABoratory software
improves the communication tasks by taking advantage of its data processing
tools in the path planning process. Finally, the Open Source Computer Vision
Library enables traffic signal recognition using the Scale-Invariant Feature
Transform and Speeded-Up Robust Features algorithms. The integration of the
Robot Operating System, MATrix LABoratory software, and the Open Source
Computer Vision Library is a promising approach for providing autonomous
navigation capability to any mobile robot, even in uncontrolled environments.

Keywords: Autonomous navigation · KUKA YouBot


Robot operating system component · Path planning · Traffic signals recognition

1 Introduction

Autonomous robot navigation (ARN) in uncontrolled environments is an extraordinary ability for any mobile robot that must achieve a specific goal or perform a task without external assistance [1]. ARN requires a set of subsystems working together: building a map of the surrounding world, localizing the robot and the goal point within the map, making a motion plan according to the map and the locations of the start and goal points, executing that plan, and reacting when something changes during the motion execution. All the subsystems must run at the same time, which is a challenging task for mobile robots [2]. Several working environments have been used to provide autonomous navigation with artificial vision techniques in robots. Among them we can mention: ROS (Robot Operating System, a leading development environment in robotics providing tools and libraries for the development
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 63–78, 2019.
https://doi.org/10.1007/978-3-030-02686-8_6
of robotic systems) [3], Matlab (MATrix LABoratory software, which has included the Robotics System Toolbox since the R2015a release) [4], and OpenCV (Open Source Computer Vision Library, specially designed for the capture, processing and visualization of images in a wide range of areas such as pattern recognition in robotics, biometrics, segmentation, etc.) [5].
Different algorithms have been developed to integrate the set of subsystems for ARN in uncontrolled environments, such as path planning [6] and traffic signals recognition [7]. On one hand, path planning is the method of finding the best feasible path from the start to the goal location. This topic is an area of active research, and different techniques have been reported to implement the path planning approach. Among them is the Probabilistic RoadMap (PRM), a motion planning algorithm used to find a path from the start to the goal point in an occupancy grid map [8]. Other path planning approaches have included normal probability [9], efficient interpolation [10], and heuristics [11]. On the other hand, traffic signals recognition is required to ensure autonomous robot navigation, and it needs the integration of artificial vision techniques to perform the recognition task [12]. Artificial vision not only allows the recognition of traffic signals but also supports decision making when robots navigate autonomously and new sceneries appear in the robot's trajectory [13].
The aim of this work is to present the viability of integrating the ROS, Matlab and OpenCV working environments to develop autonomous KUKA YouBot robot navigation based on path planning and traffic signals recognition. ROS Hydro Medusa, the seventh ROS distribution release, allows the simulation of the autonomous robot navigation in Gazebo and provides the implementation of the algorithm on the real KUKA YouBot platform. Matlab with the Robotics System Toolbox improves the communication tasks with ROS, taking advantage of data processing tools in the path planning process. Besides, OpenCV allows the traffic signals recognition by using the combined SIFT (Scale-Invariant Feature Transform) [14] and SURF (Speeded-Up Robust Features) [15] algorithm. It is important to mention that we mainly focus on the robot KUKA YouBot reaching the goal location and do not consider the time the robot requires to do so, because the path planning and traffic signals recognition algorithms working together consume considerable computation time. We are working further on the implementation and optimization of other path planning and traffic signals recognition algorithms in order to reduce the execution time. Finally, the integration of ROS, Matlab and OpenCV is a promising approach to provide autonomous navigation capability in any mobile robot.
The following sections describe the process carried out in the demonstration of autonomous KUKA YouBot robot navigation based on path planning and traffic signals recognition. Sect. 2 describes the integration of the ROS, Matlab and OpenCV working environments. Then, Sect. 3 introduces the implemented path planning and traffic signals recognition algorithms. Next, Sect. 4 presents the features of the robot KUKA YouBot on which all the algorithms were tested. Then, Sect. 5 explains in detail the results reached in the
simulation and experimental testing. Finally, Sect. 6 summarizes the conclusions of the present work.

2 Working Environments Integration

As aforementioned, achieving the autonomous robot navigation approach required integrating the ROS, Matlab and OpenCV working environments, as shown in Fig. 1. ROS Hydro Medusa, the seventh ROS distribution release, offers tools and libraries for the development of robotic systems. In recent years, ROS has gained wide currency for the creation of working robotic systems, not only in the laboratory but also in industry. The autonomous navigation of the KUKA YouBot was simulated using the Gazebo simulator, which is integrated with ROS. To achieve ROS integration with stand-alone Gazebo, a set of ROS packages named gazebo_ros_pkgs provides wrappers around stand-alone Gazebo; they provide the necessary interfaces to simulate a robot in Gazebo using ROS messages, services and dynamic features [16]. It is important to mention that the youBot Gazebo packages incorporate geometry, kinematics, dynamics and visual models of the KUKA youBot in the Unified Robot Description Format (URDF), as well as the launch files and tools needed to operate the robot in Gazebo. The Robotics System Toolbox included in Matlab provides complete integration between Matlab, Simulink and ROS. The toolbox makes it possible to write, compile and execute code on ROS-enabled robots and on robot simulators such as the aforementioned Gazebo, allowing a ROS node to be generated from a Simulink model and implemented in the ROS network [17]. The artificial vision algorithm for traffic signals recognition was implemented using OpenCV, the Open Source Computer Vision Library, specially designed for the capture, processing and visualization of images in a wide range of areas such as robotics, biometrics, segmentation, human–computer interaction, monitoring and object recognition.

Fig. 1. Integration of working environments.

A detailed architecture of the ROS, Matlab and OpenCV integration is depicted in Fig. 2. ROS is fundamentally a client/server system. It consists of a series of nodes (programs) that communicate with each other through topics (publish/subscribe dissemination) or services (interactive communication). A controller process provides a hard-realtime-compatible

Fig. 2. Architecture of ROS, Matlab and OpenCV integration.



loop to control a robot mechanism, usually designed in a modular way, so that a system is formed by different controllers such as diff_drive_controller, position_controllers, force_torque_sensor_controller and others. The ROS working environment mainly includes three nodes: image processing, user application and controller node.
ROS Node: Image_processing converts images from ROS to OpenCV format and vice versa through CvBridge, a library that enables sending and receiving images for OpenCV image processing. This node also obtains images through its subscribers from the publishers established in the ROS Node: User_application, and sends commands with its publisher to the subscriber in the ROS Node: Controller_node.
ROS Node: User_application executes the communication between client and server via the ROS Action Protocol, which is built on top of ROS messages. The client and server then provide a simple API (application program interface, a set of routines, protocols, and tools for building software applications) for users to request goals (on the client side) or to execute goals (on the server side) via function calls and callbacks. The communication between the User_application and controller nodes provides the controller node with the logical commands to be interpreted as physical actions. The ROS Action Clients send the position and trajectory information processed with the API and other tools and protocols to the Action Server of the controller node, while the ROS Publisher of the User_application node sends commands such as velocity to the ROS Subscriber of the controller node for the next stage of the communication process.
ROS Node: Controller_node transforms commands into measures or signals that can be understood by the actuators of the robot.
ROS Node: Matlab_global_node corresponds to the script or program created in Matlab, which receives the data from the controller_node, processes the information, and sends a new command through a publisher to the controller_node in order to perform an action in the different actuators of the robot KUKA YouBot.
OpenCV image processing handles images using different scripts, libraries and techniques such as SIFT & SURF. The images are processed thanks to the communication between cv::Mat (the OpenCV class that stores images) and CvBridge (the ROS library that converts image formats).
Finally, the YouBot hardware is the space where the robot system is represented as a combination of decoupled functional subsystems. The manipulator arm and the base platform are arranged as combinations of several joints, and each joint is in turn defined as a combination of a motor and a gearbox. The communication between the hardware and the driver is done over a serial EtherCAT connection.

3 Implemented Algorithms

The ARN was performed by the application of two algorithms: the path planning algorithm and the traffic signals recognition algorithm. Considering the path planning requirement, different algorithms were studied. We can mention the probabilistic roadmap (PRM), a probabilistic method whose main virtue is its efficiency in calculating trajectories for robots with many degrees of freedom; it can be used in single-query or multiple-query mode [18]. There is also the Lazy PRM algorithm, a single-query variant whose pre-processing phase is quite simple, since it is not necessary to generate a complete network, but simply one that helps solve the particular problem [19]. Finally, another algorithm is the rapidly exploring random tree (RRT), a sub-optimal, static model-based, probabilistic planning algorithm that builds a single, unidirectional, tree-like graph; the tree starts from the starting point and expands throughout the working environment through a sampling process that looks for random points until it reaches the end point, at which point it stops [20]. The features of the cited path planning algorithms are summarized in Table 1 in terms of processing time, space-constrained solutions, robustness, and computational cost.

Table 1. Algorithms for path planning

  Algorithm   Processing time   Space-constrained   Robustness   Computational
              (seconds, s)      solutions (%)       (%)          cost (IPS)
  PRM         Average           Low                 Low          Average
  Lazy PRM    Low               Average             Average      Low
  RRT         Average           Average             High         High

Taking into account the features of the reviewed path planning algorithms, we chose the PRM algorithm, which provides average processing time and average computational cost; in fact, the PRM algorithm avoids increasing the processing time of the integrated ROS, Matlab and OpenCV architecture. The path planning was implemented in Matlab through the pure pursuit algorithm combined with probabilistic roadmaps (PRM) for robot navigation. The flow chart of the implemented PRM algorithm is depicted in Fig. 3. First, we consider the robot and algorithm parameters: robot dimensions, start and goal points, number of PRM nodes, and PRM minimum distance. Then, we get the image and process the scenery from Gazebo. Next, we generate the occupancy grid of the processed grayscale image (considering 0 as free and 1 as occupied). The following step is to inflate the map according to the robot dimensions. Then, random paths are found and the decision process is performed by asking: is the path empty? If the answer is true, the map is updated and the number of nodes is incremented. Otherwise, the path is free and navigation continues until the goal location is reached.
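The Matlab implementation itself is not listed in the paper; the following dependency-free Python sketch re-creates the same PRM steps from the flow chart (occupancy grid with 0 as free and 1 as occupied, map inflation by the robot footprint, random node sampling, visibility edges, and a shortest-path search). The toy map and all parameter values are invented for illustration:

```python
import heapq
import math
import random

def inflate(grid, r):
    """Inflate the map: mark every cell within r cells of an obstacle as occupied."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] == 1:
                for di in range(-r, r + 1):
                    for dj in range(-r, r + 1):
                        if 0 <= i + di < rows and 0 <= j + dj < cols:
                            out[i + di][j + dj] = 1
    return out

def line_free(grid, a, b):
    """Collision-check the straight segment a-b by dense sampling of grid cells."""
    steps = int(math.dist(a, b) * 4) + 1
    for k in range(steps + 1):
        t = k / steps
        x, y = a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])
        if grid[int(round(x))][int(round(y))] == 1:
            return False
    return True

def prm(grid, start, goal, n_nodes=150, radius=6.0, seed=0):
    """Probabilistic roadmap: sample free nodes, connect mutually visible
    neighbors within a radius, then run Dijkstra from start (0) to goal (1)."""
    random.seed(seed)
    rows, cols = len(grid), len(grid[0])
    nodes = [start, goal]
    while len(nodes) < n_nodes + 2:
        p = (random.randrange(rows), random.randrange(cols))
        if grid[p[0]][p[1]] == 0:
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d <= radius and line_free(grid, nodes[i], nodes[j]):
                edges[i].append((j, d))
                edges[j].append((i, d))
    dist, prev, pq = {0: 0.0}, {}, [(0.0, 0)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == 1:                       # goal reached: rebuild the path
            path = [nodes[1]]
            while u != 0:
                u = prev[u]
                path.append(nodes[u])
            return path[::-1]
        if d > dist.get(u, math.inf):
            continue
        for v, w in edges[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return None                          # roadmap did not connect start and goal

# 20 x 20 toy map: a wall across most of column 10, leaving a gap at the bottom.
grid = [[0] * 20 for _ in range(20)]
for i in range(15):
    grid[i][10] = 1
grid = inflate(grid, 1)                  # account for the robot's footprint
# PRM is probabilistic, so retry with a few seeds if the roadmap fails to connect.
path = next(filter(None, (prm(grid, (0, 0), (19, 19), seed=s) for s in range(5))), None)
print(path is not None)
```

As the text notes, the path returned by PRM is feasible but not optimal; the "update map and increment nodes" branch of the flow chart corresponds here to retrying with a denser roadmap when no path is found.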
Different algorithms were then reviewed in order to implement the traffic signals recognition requirement. Among them we can mention the Binary Robust Independent Elementary Features (BRIEF) algorithm, which works with bit strings to describe characteristic points. For this reason, the BRIEF algorithm is much faster than the SIFT and SURF algorithms. BRIEF also reduces the complexity of the matching and detection process between images, which lets low-powered devices run the algorithm [21]. It is important to mention that BRIEF is not invariant to rotation, as it can only handle a maximum difference of 10 to 15 degrees. Another interesting algorithm is the Oriented FAST and Rotated BRIEF (ORB) algorithm. ORB was created from BRIEF and modified to be invariant to rotation and robust against noise [22]. This method uses the FAST (Features from Accelerated

Fig. 3. Flow chart of the PRM algorithm.

Segment Test) detector to obtain points and the BRIEF descriptor. As a result, ORB can run on devices with reduced processing capacity. Finally, an advanced algorithm is the combination of the Scale-Invariant Feature Transform and Speeded-Up Robust Features algorithms. The SIFT & SURF algorithm allows automatic traffic signals detection in real time [23]. The main advantage of this algorithm is that the extraction of interest points is acceptable and provides the best features under changes in scale, illumination and rotation. As an added value, the SIFT & SURF algorithm provides higher robustness, indicated by lower BER values [24, 25]. The features of the studied algorithms for traffic signals recognition are summarized in Table 2 in terms of processing time, accuracy, robustness, computational cost and rotation.

Table 2. Algorithms for traffic signals recognition

  Algorithm     Processing time   Accuracy          Robustness   Computational   Rotation
                (seconds, s)      (dispersion, σ)   (%)          cost (IPS)      (degrees, °)
  BRIEF         High              Medium            Low          Low             10°–15°
  ORB           High              Medium            Medium       Low             Invariant
  SIFT & SURF   Low               High              High         High            Invariant

Finally, the traffic signals detection system was implemented in OpenCV with the SIFT & SURF algorithm, chosen for its high accuracy, high robustness and invariance to rotation. It is important to mention that in the present work we do not consider the processing time and computational cost features; we are working further on reducing processing time and computational cost with other algorithms in future studies.

4 Robot KUKA YouBot

The integration of ROS, Matlab and OpenCV was implemented experimentally on the KUKA youBot, an open, expandable and modular robotic system specially developed for research purposes with emphasis on robotics. The KUKA youBot mainly consists of an omnidirectional platform, a robotic arm with five degrees of freedom, and a two-finger gripper, as depicted in Fig. 4. All the data

Fig. 4. KUKA YouBot, available in Technical University of Ambato in Ecuador.



acquisition and the experimental demonstration were developed in the robotics laboratory of the Technical University of Ambato in Ecuador.

5 Simulation and Experimental Results

The simulation of the system in Gazebo consists of having the robot with its actuators and sensors in a three-dimensional environment, where the traffic signals are placed so that they are in the line of sight of the camera. The procedure begins with the modeling of the robot, obtained from the YouBot Store repository, and of its surroundings with 3D models made in SketchUp and Blender. These models must be managed by Gazebo, so the physical properties of each 3D object, such as mass, inertia, texture, shape and color, are configured in the .config and .sdf files to be imported into the Gazebo workspace, where they can be used to assemble the navigation environment of the mobile robot. Finally, we can execute the movement control scripts of both the omnidirectional platform and the robotic arm. The pictures of the simulation of the robot KUKA YouBot in the Gazebo environment are depicted as follows. Figure 5(a) sketches the robot KUKA YouBot in the 3D environment. Figure 5(b) depicts a zoomed-in view of the robot in the 3D environment. Figure 5(c) shows the robot KUKA YouBot close

Fig. 5. Gazebo Simulation. (a) Robot KUKA Youbot in 3D environment. (b) Zoom in of robot
KUKA Youbot in 3D environment. (c) Robot KUKA Youbot and stop traffic signal. (d) Robot
KUKA Youbot and one way traffic signal.

to the stop traffic signal, and Fig. 5(d) depicts the robot KUKA YouBot close to the one way traffic signal.
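For reference, the physical properties configured in the .sdf files mentioned above are declared roughly as follows. This is an illustrative fragment with made-up values, names and mesh paths, not the actual model files used by the authors:

```xml
<model name="stop_sign">
  <static>true</static>
  <link name="panel">
    <inertial>
      <mass>1.5</mass>
      <inertia>
        <ixx>0.01</ixx><iyy>0.01</iyy><izz>0.01</izz>
        <ixy>0</ixy><ixz>0</ixz><iyz>0</iyz>
      </inertia>
    </inertial>
    <visual name="visual">
      <geometry><mesh><uri>model://stop_sign/meshes/sign.dae</uri></mesh></geometry>
    </visual>
    <collision name="collision">
      <geometry><box><size>0.6 0.05 2.0</size></box></geometry>
    </collision>
  </link>
</model>
```

A companion .config file then gives Gazebo the model name, author and the path to this .sdf file so the object can be dropped into the navigation environment.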
The path planning algorithm was implemented on the created road map (25 m * 20 m) depicted in Fig. 6, where the start location, goal location, one way traffic signal and stop traffic signal are marked with asterisks within the map. The purpose of the one way traffic signal is to change the path, and the stop traffic signal makes the robot wait for 60 s before continuing along the path. Moreover, the result of the PRM algorithm applied to the probabilistic road map is depicted in Fig. 7, in which we can identify 60 nodes; we do not use a greater number of nodes, with the intention of reducing the computational effort. The resulting path is shown as an orange line. It is important to mention that the solutions provided by PRM are not the optimal path. The optimized path, obtained via mean square optimization, is depicted as a green dashed line. Besides, the real trajectory performed by the robot KUKA YouBot is shown as a red continuous line; we mainly appreciate the changes in the trajectory due to the traffic signal detection and decision making. It is necessary to mention that we avoid some features, such as the proximity to walls and other objects, in order to reduce complexity.
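The paper does not detail the mean square optimization used to obtain the green dashed path. One common way to derive such a smoothed path from a piecewise-linear PRM result is iterative least-squares smoothing, sketched below; this is a generic technique under our own assumptions, not necessarily the authors' exact procedure:

```python
def smooth_path(path, alpha=0.1, beta=0.3, iterations=200):
    """Gradient-descent smoothing: pull each interior waypoint toward its
    original position (weight alpha) and toward its neighbors (weight beta),
    minimizing a sum-of-squared-deviations objective. Endpoints stay fixed."""
    s = [list(p) for p in path]
    for _ in range(iterations):
        for i in range(1, len(path) - 1):
            for d in range(2):
                s[i][d] += alpha * (path[i][d] - s[i][d]) \
                         + beta * (s[i - 1][d] + s[i + 1][d] - 2 * s[i][d])
    return [tuple(p) for p in s]

raw = [(0, 0), (2, 5), (4, 0), (6, 5), (8, 0)]   # zig-zag PRM-style path
smoothed = smooth_path(raw)
print(smoothed[0], smoothed[-1])   # endpoints unchanged: (0, 0) (8, 0)
```

Increasing beta relative to alpha trades fidelity to the original waypoints for a smoother curve, which is the same trade-off any mean-square path optimization must make.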

Fig. 6. Road Map with start, goal and traffic signals locations.

Fig. 7. Road Map with PRM execution. Probabilistic Path in orange line, Optimized path in green
dashed line, Real trajectory in red continuous line.

The artificial vision techniques based on the SIFT & SURF algorithm allowed performing the traffic signals recognition in the real platform execution. The process was carried out in the following way. First, it is necessary to have the pattern library of the traffic signals; the pattern of the stop traffic signal is sketched in Fig. 8(a). Second is the acquisition of the image with a Microsoft HD camera located in the fingers of the gripper while the KUKA YouBot is executing the path; the image obtained from the camera is sketched in Fig. 8(b). Third is the extraction of the features of the pattern image; Fig. 8(c) shows the features extracted from the pattern. Fourth is the extraction of the features of the image obtained from the camera, which is depicted in Fig. 8(d). Fifth is the comparison of the features between the two previous extractions; Fig. 8(e) depicts the feature comparison. Finally, we have the detection result of the traffic signal, which is shown in Fig. 8(f).
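The OpenCV code itself is not reproduced in the paper. The decision step it relies on, keeping only descriptor matches that pass Lowe's ratio test and declaring a detection when enough good matches survive, can be sketched without OpenCV as follows (toy 2-D descriptors stand in for real 128-dimensional SIFT or 64-dimensional SURF vectors; the threshold values are illustrative):

```python
import math

def match_descriptors(pattern, scene, ratio=0.75):
    """Lowe's ratio test: keep a match only when the best scene descriptor is
    clearly closer than the second best, discarding ambiguous matches."""
    good = []
    for i, p in enumerate(pattern):
        dists = sorted((math.dist(p, s), j) for j, s in enumerate(scene))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            good.append((i, dists[0][1]))
    return good

def signal_detected(pattern, scene, min_matches=3):
    return len(match_descriptors(pattern, scene)) >= min_matches

# Toy "descriptors": the scene contains near-copies of the pattern plus clutter.
pattern = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
scene = [(0.05, 0.0), (1.0, 0.05), (0.0, 0.95), (1.05, 1.0), (9.0, 9.0)]
print(signal_detected(pattern, scene))  # -> True
```

In the real pipeline the descriptors come from steps three and four above, and the surviving matches are what Fig. 8(e) visualizes as lines between the pattern and the camera image.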


Fig. 8. SIFT & SURF algorithm execution. (a) Pattern of the Stop traffic signal, (b) Obtained image from the camera, (c) Features extraction from the pattern, (d) Features extraction from the obtained image, (e) Feature comparison, and (f) Detection result.

The pictures taken when the robot KUKA YouBot meets the traffic signals during the real test are depicted in Figs. 9 and 10. The KUKA YouBot with the one way traffic signal is sketched in Fig. 9, while Fig. 10 shows the moment when the robot reaches the place where the stop traffic signal is located. In the simulation and real platforms, the tasks were performed with an average linear velocity of around 0.20 m/s and an average angular velocity of around 0.45 rad/s. The average time for the robot KUKA YouBot to reach the goal was around 2 min. We mainly focus on the robot KUKA YouBot reaching the goal location and do not consider the time the robot requires to do so, because the path planning and traffic signals recognition algorithms working together consume considerable computation time and effort. We are working further on the implementation and optimization of other path planning and traffic

signals recognition algorithms in order to reduce the execution time. Besides, we are looking into the implementation of machine learning algorithms in order to improve the recognition of all available traffic signals.

Fig. 9. Robot KUKA YouBot and Stop traffic signal.



Fig. 10. Robot KUKA YouBot and One way traffic signal.

6 Conclusions

In conclusion, autonomous KUKA YouBot robot navigation based on path planning and traffic signals recognition has been presented. The two capabilities were integrated by combining the ROS, Matlab and OpenCV working environments. ROS allowed the simulation of the autonomous robot navigation in Gazebo and provided the implementation of the algorithm on the real KUKA YouBot platform. Matlab improved the communication tasks by taking advantage of data processing tools in the path planning process. Finally, OpenCV allowed the traffic signals recognition using the SIFT & SURF algorithm. We have successfully demonstrated that the integration of ROS, Matlab and OpenCV is a promising approach to provide autonomous navigation capability in any mobile robot. Finally, it is important to mention that the capability of traffic signals recognition opens new areas of research in the fields of artificial intelligence and object recognition, since the fundamentals of traffic signals recognition can be applied to the recognition of other kinds of objects.

Acknowledgement. The authors acknowledge the Technical University of Ambato in Ecuador for providing all support and facilities, including the robot KUKA YouBot.

References

1. Perez, A., Karaman, S., Shkolnik, A., Frazzoli, E., Teller, S., Walter, M.R.: Asymptotically-
optimal path planning for manipulation using incremental sampling based algorithms. In:
IEEE/RSJ International Conference Intelligent Robots and Systems, pp. 4307–4313 (2011)
2. Corke, P.: Integrating ROS and MATLAB. IEEE Robot. Autom. Mag. 22(2), 18–20 (2015)
3. Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T.: ROS: an opensource robot operating
system. In: ICRA Workshop on Open Source Software, vol. 3, no. 2, p. 5 (2009)
4. Matlab: Robotics System Toolbox. http://mathworks.com/help/robotics/index.html.
Accessed 21 Mar 2018
5. Bradski, G., Kaehler, A.: OpenCV. Dr. Dobb’s journal of software tools, 3ed (2000)
6. Kumar, N., Zoltán, V., Szabó-Resch, Z.: Robot path pursuit using probabilistic roadmap. In:
IEEE 17th International Symposium on Computational Intelligence and Informatics (CINTI),
pp. 000139–000144 (2016)
7. Adorni, G., Monica, M., Agostino, P.: Autonomous agents coordination through traffic signals
and rules. In: IEEE Conference on Intelligent Transportation System (ITSC 1997), pp. 290–
295 (1997)
8. Kavraki, L.E., Švestka, P., Latombe, J.C., Overmars, M.H.: Probabilistic roadmaps for path
planning in high-dimensional configuration spaces. IEEE Trans. Robot. Autom. 12(4), 566–
580 (1996)
9. Amith, A.L., Singh, A., Harsha, H.N., Prasad, N.R., Shrinivasan, L.: Normal probability and
heuristics based path planning and navigation system for mapped roads. Procedia Comput.
Sci. 89, 369–377 (2016)
10. Đakulović, M., Ikeš, M., Petrović, I.: Efficient interpolated path planning of mobile robots based on occupancy grid maps. IFAC Proc. 45(22), 349–354 (2012)
11. Jun, J.Y., Saut, J.P., Benamar, F.: Pose estimation-based path planning for a tracked mobile
robot traversing uneven terrains. Rob. Auton. Syst. 75, 325–339 (2016)
12. Mahadevan, S.: Machine learning for robots: a comparison of different paradigms. In:
Workshop on Towards Real Autonomy, IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 1996) (1996)
13. Lidoris, G., Rohrmuller, F., Wollherr, D., Buss, M.: The Autonomous City Explorer (ACE)
project—mobile robot navigation in highly populated urban environments. In: IEEE
International Conference on Robotics and Automation (ICRA 2009), pp. 1416–1422 (2009)
14. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis.
60(2), 91–110 (2004)
15. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: European Conference on Computer Vision, pp. 404–417. Springer, Berlin (2006)
16. Craig, C.: A Robotics Framework for Simulation and Control of a Robotic Arm for Use in
Higher Education. MS in Computer Science Project Reports (2017)
17. Galli, M., Barber, R., Garrido, S., Moreno, L.: Path planning using Matlab-ROS integration
applied to mobile robots. In: IEEE International Conference on Autonomous Robot Systems
and Competitions (ICARSC), pp. 98–103 (2017)
18. Kavraki, L.E., Latombe, J.C.: Probabilistic roadmaps for robot path planning. In: Practical Motion Planning in Robotics: Current Approaches and Future Directions, pp. 1–21 (1998)

19. Bohlin, R., Kavraki, L.E.: Path planning using lazy PRM. In: Proceedings of the IEEE
International Conference on Robotics and Automation, vol. 1, pp. 521–528 (2000)
20. LaValle, S.M.: Rapidly-exploring random trees: a new tool for path planning. Technical report TR 98-11, Computer Science Department, Iowa State University (1998)
21. Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary
features. In: Proceedings of the 11th European Conference on Computer Vision, ser. ECCV
2010, pp. 778–792. Springer, Berlin (2010)
22. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to SIFT or
SURF. In: IEEE International Conference on Computer Vision (ICCV) (2011)
23. Dreuw, P., Steingrube, P., Hanselmann, H., Ney, H.: SURF-face: face recognition under viewpoint consistency constraints. In: BMVC, pp. 1–11 (2009)
24. Tareen, S.A.K., Saleem, Z.: A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. In: International Conference on Computing, Mathematics and Engineering Technologies (iCoMET) (2018)
25. Zrira, N., Hannat, M., Bouyakhf, E. H., Ahmad, H.: 2D/3D object recognition and
categorization approaches for robotic grasping. In: Advances in Soft Computing and Machine
Learning in Image Processing, pp. 567–593. Springer, Cham (2018)
Towards Reduced Latency in Saccade Landing
Position Prediction Using Velocity Profile Methods

Henry Griffith1, Subir Biswas1, and Oleg Komogortsev2

1 Department of Electrical and Computer Engineering, Michigan State University,
East Lansing, MI 48824, USA
{griff561,sbiswas}@msu.edu
2 Department of Computer Science and Engineering, Michigan State University,
East Lansing, MI 48824, USA
ok@msu.edu

Abstract. Saccade landing position prediction algorithms are a promising approach for improving the performance of gaze-contingent rendering systems. Amongst the various techniques considered in the literature, velocity profile methods operate by first fitting a window of velocity data obtained at the initiation of the saccadic event to a model profile known to resemble the empirical dynamics of the gaze trajectory. The research described herein proposes an alternative approach to velocity profile-based prediction aimed at reducing latency. Namely, third-order statistical features computed during a finite window at the saccade onset are mapped to the duration and characteristic parameters of the previously proposed scaled Gaussian profile function with a linear support vector machine regression model, using an offline fitting process over the entire saccade duration. Prediction performance is investigated for a variety of window sizes on a data set consisting of 9,109 horizontal saccades of a minimum mandated data quality induced by a 30-degree step stimulus. An RMS saccade amplitude prediction error of 1.5169° is observed for window durations of one-quarter of the saccade duration using the newly proposed method. Moreover, the method is demonstrated to reduce prediction execution time by three orders of magnitude versus techniques mandating online fitting.

Keywords: Eye movement prediction · Gaze-contingent rendering ·
Foveated rendering

1 Purpose

While gaze-contingent rendering systems (GCRS) offer tremendous potential for enhancing the user experience in virtual reality (VR) environments, latency concerns during saccadic eye movements remain an area of open interest in the academic literature [1, 2]. To address these limitations, a variety of techniques for predicting the landing position at the onset of saccadic events continue to be proposed [3, 4]. A subclass of these techniques develops predictions based upon fitting kinematic gaze data to a characteristic function known to resemble the empirical dynamics of saccadic trajectories [5]. Approaches fitting eye velocity data to a model profile consistent with the main
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 79–91, 2019.
https://doi.org/10.1007/978-3-030-02686-8_7

sequence relationship between saccade velocity, amplitude, and duration [6], hereby referred to as velocity profile methods, have been previously considered. While promising in principle with respect to their capacity to produce physiologically-meaningful gaze location estimates across the entire saccadic duration through direct integration, current approaches instead use profile parameters obtained from the fitting process as predictor values in a linear regression model for amplitude determination. Furthermore, application of this technique assumes the feasibility of performing the requisite optimization for fitting in an online capacity, which may prove challenging depending upon the computational capacity of the deployment hardware, along with the specific profile model and optimization algorithm utilized [7].
The research described herein seeks to address these concerns by introducing an alternative technique for velocity profile-based prediction of saccade landing position. The proposed approach performs the requisite profile fitting in an offline training process. This modification allows fitting over the entire saccade duration, thereby improving adherence to the model profile versus online methods, which fit only over the initial portion of the saccade. Using these results, linear support vector machine regression models are developed which map simple features computed over a finite-duration window near the saccade onset to both the parameter sets defining the profile function and the saccade duration. These models are subsequently utilized in online operation, thus providing physiologically-meaningful estimates of the saccadic trajectory throughout its duration without requiring the previously mandated online fitting process. Results are presented for a data set consisting of 9,109 horizontal saccades induced by a 30-degree step stimulus, each subjected to specified quality inclusion criteria. Details regarding the experimental procedure, data quality filtering, algorithm development and analysis, and plans for further research are provided in the remainder of this manuscript.

2 Background/Significance

Eye tracking technology has long been employed across a variety of research domains. Specific applications range from fundamental endeavors, such as exploring the nature of information processing through the human visual system (HVS) [8], to more applied efforts, including visual marketing [9] and biometrics [10]. Commercial interest in the technology has recently accelerated, as indicated by considerable acquisition activity in the space (e.g., Google's acquisition of Eyefluence, Facebook's acquisition of Eye Tribe, Apple's acquisition of SensoMotoric Instruments (SMI)). Amongst emerging applications, eye tracking is especially promising for integration within VR environments, due to its potential to improve display performance through application of gaze-contingent rendering paradigms [11].
GCRS operate by varying display content as a function of the user's assumed point of gaze, which is obtained through use of an eye tracker. Such foveated rendering strategies exploit the inherent asymmetry in visual acuity across the HVS, where high-quality vision is isolated to the center of the visual field. This asymmetry is associated with the dense concentration of photoreceptors in the fovea, along with the supporting processing capacity throughout the remainder of the visual pathway [12]. While GCRS have received attention in the literature for research investigating the unique contributions of central and peripheral vision during various tasks (e.g., reading [13], visual search [14]), commercial applications seeking to enhance display performance through improved efficiency and reduced latency have also been considered. Namely, studies modulating various determinants of display quality, such as spatial resolution [15] and color [16], have been investigated.
While the specifications of display and eye tracking hardware are continuously improving, system latency remains a fundamental limitation for implementing GCRS [17]. Latency concerns are especially pronounced during the rapid eye movements between points of fixation known as saccades, where substantial misalignment between the optimized display region and the true gaze location may occur. While saccadic suppression is generally believed to mitigate the effect of misalignment by reducing the sensitivity of the HVS during the saccadic event, examples of intrasaccadic perception have been noted in the literature [18, 19]. Moreover, such misalignments are problematic after the saccade has ended, as evidence suggests that perception is restored rapidly (between 10 and 50 ms) after completion [20]. To help avoid misalignments in the presence of saccades, GCRS may utilize saccade landing position prediction (SLPP) techniques, in which the subsequent display update is adjusted based upon the anticipated gaze landing point. Predictions are performed at the initiation of the saccadic event, as identified using online eye movement classification algorithms (e.g., I-VT) [21].
A variety of techniques for SLPP have been proposed in the literature over the past two decades. While diverse in their approaches, recent research [4] has proposed partitioning current methods into those regressing data onto a specific model motivated by the anatomy and physiology of the underlying oculomotor system, and those which operate independently of such models. With respect to model-based algorithms, techniques leveraging functions derived from an underlying oculomotor plant model [22], along with approaches which assume a model profile function based upon empirical observations of eye movement trajectories, have been proposed [5, 7]. Amongst the latter class of solutions, algorithms performing standard linear regression [3], along with an approach based upon a Taylor series expansion [4], have been demonstrated.

3 Methods

3.1 Experimental Procedure

Data was obtained from an eye-tracking study conducted at Texas State University in 2014 under a protocol approved by the Institutional Review Board. A total of 335 participants (178 male, 157 female), ranging in age from 18 to 46, were initially enrolled in the study, which required completion of a variety of tasks aimed at investigating multiple oculomotor behaviors of interest (e.g., performing horizontal and oblique saccades under the induction of a stimulus, reading). Of those initial enrollees, 322 participants completed two consecutive sessions of the horizontal stimulus (HS) task under consideration within this research.

Within the HS task, saccades were induced by varying a stimulus along the horizontal axis of a 474 × 297 mm (1680 × 1050 pixel) ViewSonic 22″ display in 30-degree steps. Participants were positioned 550 mm from the display, which had a black background. The stimulus was a white circle of diameter corresponding to approximately 1° of the visual angle, enclosing a smaller black circle to promote focus at its center. Beginning at the origin, the stimulus displaced horizontally, oscillating between −15° and 15° for 100 iterations, remaining stationary for 1 s between each step.
Oculomotor behaviors were recorded using an SR Research EyeLink 1000 eye tracking sensor. The sensor performs monocular eye tracking at a sampling rate of 1000 Hz, with a specified typical accuracy of 0.25–0.50° and a spatial resolution of 0.25° during saccadic events. An example of the raw eye tracker output over an HS task session is depicted in Fig. 1.

Fig. 1. Sample eye tracker output (Subject 1, Trial 1).

3.2 Data Inclusion Criteria

To ensure adequate data quality, inclusion criteria were established at both the session and event level. Namely, session-level data was screened according to the mean accuracy computed during post-calibration verification, along with the proportion of lost data and the spatial precision computed during each session. Intra-recording precision was computed as the root-mean-square (RMS) value of the inter-sample angular distances [23] occurring during classified inter-stimulus fixation events of at least 500 ms duration, with fixation events identified using the offline eye movement classifier described in [24]. A visualization of two classified fixation events of varying duration occurring during the stimulus stationary period is depicted in Fig. 2.
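The RMS precision metric described above can be sketched in a few lines; the snippet below assumes gaze samples within one fixation are given as (x, y) pairs in degrees, and the function name is ours rather than the paper's.

```python
import math

def rms_precision(gaze_deg):
    """RMS of inter-sample angular distances over one fixation.

    gaze_deg: list of (x, y) gaze positions in degrees of visual angle.
    """
    # Squared Euclidean distance between each pair of consecutive samples.
    d2 = [(x1 - x0) ** 2 + (y1 - y0) ** 2
          for (x0, y0), (x1, y1) in zip(gaze_deg, gaze_deg[1:])]
    return math.sqrt(sum(d2) / len(d2))
```

A perfectly still signal yields a precision of 0°; jitter of 0.01° between consecutive samples yields 0.01°.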

Fig. 2. Visualization of varying-duration fixation events occurring during the stationary stimulus interval for precision computation.

The distribution of all three session-level inclusion metrics across the 644 sessions
is depicted in Fig. 3, with the associated inclusion thresholds summarized in Table 1.

Fig. 3. Distribution of session-level data quality inclusion metrics across candidate data set.

To produce a symmetrical data set (i.e., two sessions per participant), the matching session for each participant was also removed for records violating the session-level inclusion criteria. The resulting data set after preliminary quality filtering consisted of 91 subjects, having a mean accuracy of 0.3908° ± 0.1044°, a proportion of lost data during recording of 0.8724% ± 0.7570%, and a precision of 0.0149° ± 0.0058° (mean ± std).

Table 1. Session-level data quality thresholds

Data quality inclusion metric              Threshold value
Maximum mean accuracy                      0.6° of the visual angle
Maximum proportion of lost data samples    3%
Minimum intra-recording mean precision     0.05°

Additional inclusion criteria were applied at the saccadic event level, with events identified using the aforementioned offline eye movement classification algorithm. Namely, all classified saccades whose amplitudes were not consistent with the induced stimulus (e.g., corrective saccades, partitions of the stimulus interval into two saccadic events) were discarded. Moreover, events exhibiting any lost data samples or physiologically infeasible eye velocities were also removed from the analysis set. Finally, to remove scenarios in which classifier timing errors might corrupt results due to either delayed detection or premature termination, maximum initial and final velocity values were also mandated. Saccadic event-level exclusion criteria are summarized in Table 2.

Table 2. Event-level data quality thresholds

Data quality inclusion metric           Threshold value
Allowable amplitude range               28°–32°
Maximum number of lost data samples     0
Maximum velocity                        800°/s
Maximum initial and final velocity      100°/s
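Applied per classified event, these thresholds amount to a simple inclusion predicate. The sketch below uses the threshold values stated above; the function name and argument layout are ours.

```python
def keep_saccade(amplitude_deg, velocities_dps, n_lost_samples):
    """Event-level inclusion predicate using the stated thresholds."""
    return (28.0 <= amplitude_deg <= 32.0      # consistent with 30-degree stimulus
            and n_lost_samples == 0            # no lost data samples
            and max(velocities_dps) <= 800.0   # physiologically feasible velocity
            and velocities_dps[0] <= 100.0     # guards against delayed detection
            and velocities_dps[-1] <= 100.0)   # guards against premature termination

# Example: a 30-degree saccade with clean onset/offset velocities passes.
accepted = keep_saccade(30.0, [50.0, 400.0, 600.0, 80.0], 0)
```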

The aggregate application of session- and event-level data inclusion criteria produced an analysis data set of 9,109 saccades. The distribution of amplitudes of classified saccadic events in both the original and analysis data sets is depicted in Fig. 4.

Fig. 4. Distribution of saccade amplitudes for entire classifier output and analysis subset.

3.3 Analysis Methods


The scaled Gaussian velocity profile specified in (1), originally introduced in [7] for SLPP applications, was employed as the model velocity function within this work:

    v(t; p) ≈ a · e^(−((t − b)/c)²)    (1)

where p = [a, b, c]′ denotes the characteristic parameter vector of the profile function, a is a scaling parameter representing the maximum saccade velocity, b is a location parameter representing the time of occurrence of the maximum velocity, and c is a shape parameter related to the width of the profile.
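As an illustration, the profile in (1) can be evaluated directly. The parameter values below are hypothetical, chosen only to resemble a large saccade sampled at 1 kHz; the helper name is ours.

```python
import math

def velocity_profile(t, a, b, c):
    # v(t) = a * exp(-((t - b) / c)**2): a = peak velocity (deg/s),
    # b = time of the peak (s), c = width of the profile (s).
    return a * math.exp(-(((t - b) / c) ** 2))

# Hypothetical parameters resembling a large saccade (not from the paper's data):
a, b, c = 500.0, 0.040, 0.020
samples = [velocity_profile(k / 1000.0, a, b, c) for k in range(81)]  # 1 kHz grid
peak_index = max(range(len(samples)), key=samples.__getitem__)  # sample of the peak
```

By construction the profile attains its maximum a at t = b, so the peak lands on sample 40 here.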
To begin, an offline procedure was performed for each element of the analysis set, where optimal parameter values were computed by fitting the velocity data of each sample over the entire saccadic event to the profile function in (1) via non-linear least squares optimization, as specified in (2):

    min_p  f(v, p) = Σᵢ rᵢ²
    s.t.   pᵢ ∈ Iᵢ    (2)

where Σᵢ rᵢ² is the residual sum of squares loss function, v is the velocity data computed from the eye tracker output using a second-order Savitzky–Golay filter, and Iᵢ is the interval bound on the ith component of the parameter vector. To control for variability associated with classifier performance in detecting the saccade onset, all records were adjusted such that any preliminary data for which the radial velocity was below 20° per second was truncated (i.e., reducing excessive data in the case of premature detection; no such adjustments were performed for late detection cases, as they were addressed in the data pre-filtering process). Interval bounds were established using physiological information and empirical analysis as a function of the local data profile as follows:

    a ∈ [0.9 · vmax, 1.1 · vmax],  b ∈ [0.7 · D/2, 1.3 · D/2],  c ∈ [0, 1.3 · D/2],

where vmax is the maximum value of the velocity sample, and D is the duration of the velocity sample. All fitting operations were performed using the MATLAB fit function, which performs non-linear least squares optimization via the Levenberg–Marquardt algorithm.
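The paper performs this fit with MATLAB's fit function (Levenberg–Marquardt). As a dependency-free illustration of bounded profile fitting — a coarse grid search over the stated interval bounds, not the paper's implementation — the procedure can be sketched as follows.

```python
import math

def profile(t, a, b, c):
    # Scaled Gaussian velocity profile of eq. (1).
    return a * math.exp(-(((t - b) / c) ** 2))

def grid_fit(times, velocities, n=21):
    """Coarse bounded grid search minimising the residual sum of squares.

    Interval bounds follow the paper: a in [0.9, 1.1]*vmax,
    b in [0.7, 1.3]*(D/2), c in (0, 1.3*(D/2)], where vmax is the peak
    observed velocity and D the event duration.  (A crude stand-in for
    Levenberg-Marquardt, used here only to make the setup concrete.)
    """
    vmax = max(velocities)
    D = times[-1] - times[0]
    def grid(lo, hi):
        return [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    best_rss, best_p = float("inf"), None
    for a in grid(0.9 * vmax, 1.1 * vmax):
        for b in grid(0.7 * D / 2, 1.3 * D / 2):
            for c in grid(1.3 * D / 2 / n, 1.3 * D / 2):  # avoid c = 0
                rss = sum((v - profile(t, a, b, c)) ** 2
                          for t, v in zip(times, velocities))
                if rss < best_rss:
                    best_rss, best_p = rss, (a, b, c)
    return best_p, best_rss
```

On clean synthetic data generated from (1), the search recovers the generating parameters up to the grid resolution.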
Next, a feature set based upon up to third-order statistics of the windowed time series was computed for the three window durations of interest:

    X_W = [v*_W, n_v | v_W = v*_W, s(v_W), k(v_W), a*_W, n_a | a_W = a*_W, s(a_W), k(a_W)]′    (3)

where v_W and a_W denote the fixed windowed velocity and acceleration data (determined as the traditional derivative of the velocity signal) of duration W, (·)* denotes the maximum value of the windowed time series, n_v and n_a denote the sample indices at which the respective maxima occur, and s(·) and k(·) denote the standard deviation and skewness operators, respectively. For the current experiment, the considered window durations were W ∈ {D/2, D/4, D/8}. The feature set was chosen in an ad hoc fashion on the basis of its simplicity, along with initial analysis and supporting domain intuition.
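A minimal sketch of the per-window feature computation in (3), assuming the window's velocity and acceleration series are given as plain lists; the function name and output ordering are ours.

```python
import math

def window_features(velocity, acceleration):
    """Eq. (3)-style features for one window: for each of the velocity and
    acceleration series, the maximum, the sample index of the maximum,
    the (population) standard deviation, and the skewness - 8 features."""
    def stats(series):
        n = len(series)
        peak = max(series)
        idx = series.index(peak)
        mean = sum(series) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
        skew = (sum((x - mean) ** 3 for x in series) / n) / std ** 3 if std else 0.0
        return [peak, idx, std, skew]
    return stats(velocity) + stats(acceleration)
```

In practice the acceleration series would be obtained by differentiating the filtered velocity signal, as described above.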
Once the feature set had been computed for the various window durations, predictive linear support vector machine regression models, hereafter denoted as φⱼ, j ∈ {1, 2, 3, 4}, were developed for both the characteristic parameter set elements and the saccade duration. All models were obtained using the fitrsvm function in MATLAB under default algorithm hyperparameters, with 5-fold cross-validation performed. A summary of the proposed modified prediction workflow versus the previously proposed online method is depicted in Fig. 5.
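The mappings above are linear SVM regressions fit with MATLAB's fitrsvm over the full feature vector. As a dependency-free stand-in that illustrates only the idea of regressing a target on a windowed feature, here is ordinary least squares for a single feature; the numbers are illustrative, not from the paper's data.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = w*x + b with a single feature
    (a minimal stand-in for the paper's linear SVM regression models)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return w, my - w * mx

# Illustrative: regress saccade duration (s) on peak windowed velocity (deg/s).
w, b = fit_linear([300.0, 400.0, 500.0], [0.06, 0.07, 0.08])
```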
A visualization of profile estimates for the various fixed window durations considered herein is depicted in Fig. 6. As noted, all three symmetrical estimates are unable to model the demonstrated skewness of the velocity data associated with large-amplitude saccades.

Fig. 5. Proposed workflow for modified velocity profile-based SLPP.

Fig. 6. Predicted velocity profiles for varying window durations.

4 Results

The online amplitude estimation procedure depicted in Fig. 5 was employed across the entire analysis data set. The RMS error of the saccade amplitude prediction was used as the metric of prediction accuracy. Requisite computational time, as quantified using MATLAB's internal timer through the native tic and toc functions (Intel i7-7500U processor, 16 GB RAM), was also recorded. Amplitude estimates were formulated on kinematic principles, as denoted in (4); integrations were estimated numerically in MATLAB using the trapz function.

    Eᵢ = Âᵢ − Aᵢ = ∫₀^D v(t) dt − Aᵢ    (4)
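The integral in (4) is evaluated in the paper with MATLAB's trapz. An equivalent trapezoidal-rule sketch, applied to a predicted profile over the predicted duration D (the parameter values are illustrative, not taken from the paper's data):

```python
import math

def trapezoid(ys, dt):
    # Trapezoidal rule on uniformly spaced samples (a trapz analogue).
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# Integrate a predicted velocity profile over the predicted duration D
# to obtain the amplitude estimate of eq. (4).
a, b, c, D = 500.0, 0.040, 0.020, 0.080  # illustrative profile parameters
dt = 0.001  # 1 kHz sampling
v = [a * math.exp(-(((k * dt - b) / c) ** 2))
     for k in range(int(round(D / dt)) + 1)]
amplitude = trapezoid(v, dt)  # estimated saccade amplitude in degrees
```

For these parameters the estimate is close to the closed-form value a·c·√π·erf(2) ≈ 17.6°.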

To perform preliminary benchmarking of the efficacy of the proposed approach, amplitude estimates were also developed using a variation of the technique described in [7]. Namely, fitting of the velocity data for fixed window durations was performed online, in a manner identical to that presented above for the offline training procedure introduced herein. A linear regression model (fit using MATLAB's fitlm function) was then developed using 5-fold cross-validation to estimate the saccade amplitude as a function of the four parameters proposed in the original work, as estimated from the online fitting procedure (i.e., a, b, c, and c/a). It should be noted that this method does not provide an estimate of the velocity trajectory over the remainder of the saccade, owing to its inability to directly estimate the saccade duration. This benchmarking approach differs slightly from that originally proposed in [7], in that a rolling window with convergence criteria is replaced by fixed windows to promote comparability between the two methods.
The RMSE of the amplitude predictions is presented in Table 3 for both the newly proposed method and the benchmarking algorithm. Corresponding mean execution times required for each prediction are presented in Table 4. While the traditional method produces better accuracy, by a factor bounded by 2 across the various durations considered, the newly proposed method reduces execution time by three orders of magnitude for the computational workflow (i.e., algorithm and architecture parameters) used in this analysis. Furthermore, for both methods, inclusion of a larger portion of the saccade duration within the prediction window provides limited or no marginal improvement in prediction accuracy. For the newly proposed method, the reduction in accuracy observed when expanding the window duration from D/4 to D/2 may be associated with a reduction in the diversity of the considered feature set (for example, in the limiting case where the window includes the profile peak, the maximum velocity feature should be nearly identical across events, as suggested by the main sequence relationship for the constant step stimulus used in data generation).

Table 3. Comparative RMSE accuracy

Window duration   RMSE(Eᵢ), new method (°)   RMSE(Eᵢ), traditional method (°)
W = D/8           1.6917                     0.9758
W = D/4           1.5169                     0.9624
W = D/2           1.7006                     0.9408

Table 4. Comparative mean execution times

Window duration   Mean exec. time, new method (s)   Mean exec. time, traditional method (s)
W = D/8           19.1 × 10⁻⁶                       32.2 × 10⁻³
W = D/4           18.3 × 10⁻⁶                       29.1 × 10⁻³
W = D/2           17.1 × 10⁻⁶                       29.5 × 10⁻³

Preliminary investigation has been conducted to identify common sources of error in amplitude estimates for the newly proposed method. Namely, manual investigation of the worst-case estimates across the data set has been performed, with initial analysis indicating that estimates are particularly corrupted for velocity profiles deviating from the ideal case (i.e., noisy profiles whose dynamics are not consistent with the ideal scenario of a concave function). While additional pre-filtering may be utilized to remove these results in subsequent analysis attempting to quantify the best-case performance of the proposed approach, options for best handling such noisy profiles in practice are a primary concern for future research.

5 Conclusions

A novel method for reducing the latency of existing velocity profile-based SLPP algorithms is introduced and explored herein. Rather than performing the requisite fitting process for determination of the characteristic parameter set in real time, the proposed method uses linear SVM mappings relating simple third-order statistical features, computed during fixed-duration windows at the saccade onset, to both the profile's characteristic parameter set and the saccade duration. Models are developed offline based upon fitting conducted over the entire saccade duration. This methodology offers the benefit of producing physiologically meaningful saccade landing position predictions without requiring the online solution of the underlying non-linear optimization problem mandated in determining the characteristic parameter set for the previously proposed scaled Gaussian profile. Benchmarking versus a slight variation of the previously proposed technique demonstrated that although RMSE prediction accuracy was reduced by roughly a factor of 2 (corresponding to an RMSE percent accuracy reduction of 2.25% computed for the ideal step stimulus amplitude), requisite execution time is reduced by three orders of magnitude for the computational workflow considered herein. For all cases considered, increasing the window duration provided limited to no marginal improvement in prediction accuracy. This latter result is promising for enhancing prediction speed in practical implementations.
While the reported results are encouraging, their generalizability is inherently limited by the level of pre-filtering that was performed to yield empirical profiles resembling the produced model functions. This analysis approach was chosen to establish a performance baseline under the highest possible data quality conditions. Future research will establish the performance of various saccade prediction methods in cases of varied data quality, and for a more diverse set of amplitude values and directions (i.e., vertical and oblique saccades). Furthermore, subsequent work will attempt to optimize the general workflow introduced herein through application of standard best practices in regression approximation, including utilization of traditional feature selection algorithms, consideration of alternative regression models and optimization of associated hyperparameters, along with the consideration of alternative velocity profiles suitable for modeling a broader range of trajectories encountered in practice, such as the skewed model profile based upon the Wald distribution recently proposed in [25]. This latter modification is especially promising for predicting the known skewed velocity profiles of large-amplitude saccades.

References

1. Padmanaban, N., Konrad, R., Stramer, T., Cooper, E.A., Wetzstein, G.: Optimizing virtual
reality for all users through gaze-contingent and adaptive focus displays. In: Proceedings of
the National Academy of Sciences, p. 201617251 (2017)
2. Albert, R., Patney, A., Luebke, D., Kim, J.: Latency requirements for foveated rendering in
virtual reality. ACM Trans. Appl. Percept. 14(4), 25 (2017)
3. Arabadzhiyska, E., Tursun, O.T., Myszkowski, K., Seidel, H.-P., Didyk, P.: Saccade landing
position prediction for gaze-contingent rendering. ACM Trans. Gr. 36(4), 50 (2017)
4. Wang, S., Woods, R.L., Costela, F.M., Luo, G.: Dynamic gaze-position prediction of saccadic
eye movements using a Taylor series. J. Vis. 17(14), 3 (2017)
5. Han, P., Saunders, D.R., Woods, R.L., Luo, G.: Trajectory prediction of saccadic eye
movements using a compressed exponential model. J. Vis. 13(8), 27 (2013)
6. Bahill, A.T., Clark, M.R., Stark, L.: The main sequence, a tool for studying human eye
movements. Math. Biosci. 24(3–4), 191–204 (1975)
7. Paeye, C., Schütz, A.C., Gegenfurtner, K.R.: Visual reinforcement shapes eye movements in
visual search. J. Vis. 16(10), 15 (2016)
8. Rayner, K.: Eye movements in reading and information processing: 20 years of research.
Psychol. Bull. 124(3), 372 (1998)
9. Wedel, M., Pieters, R.: A review of eye-tracking research in marketing, pp. 123–147. Emerald
Group Publishing Limited (2008)
10. Bednarik, R., Kinnunen, T., Mihaila, A., Fränti, P.: Eye-movements as a biometric, pp. 780–
789 (2005)
11. Patney, A., et al.: Towards foveated rendering for gaze-tracked virtual reality. ACM Trans.
Graph. 35(6), 179 (2016)
12. Banks, M.S., Sekuler, A.B., Anderson, S.J.: Peripheral spatial vision: limits imposed by
optics, photoreceptors, and receptor pooling. J. Opt. Soc. Am. A 8(11), 1775 (1991)
13. Rayner, K.: The gaze-contingent moving window in reading: development and review. Vis.
Cognit. 22(3–4), 242–258 (2014)
14. Nuthmann, A.: How do the regions of the visual field contribute to object search in real-world
scenes? Evidence from eye movements. J. Exp. Psychol. Hum. Percept. Perform. 40(1), 342
(2014)
15. Prince, S.J., Rogers, B.J.: Sensitivity to disparity corrugations in peripheral vision. Vis. Res.
38(17), 2533–2537 (1998)
16. Duchowski, A.T., Bate, D., Stringfellow, P., Thakur, K., Melloy, B.J., Gramopadhye, A.K.:
On spatiochromatic visual sensitivity and peripheral color LOD management. ACM Trans.
Appl. Percept. 6(2), 9 (2009)
17. Saunders, D.R., Woods, R.L.: Direct measurement of the system latency of gaze-contingent
displays. Behav. Res. Methods 46(2), 439–447 (2014)
18. Diamond, M.R., Ross, J., Morrone, M.C.: Extraretinal control of saccadic suppression. J.
Neurosci. 20(9), 3449–3455 (2000)
19. Mathôt, S., Melmi, J.-B., Castet, E.: Intrasaccadic perception triggers pupillary constriction.
PeerJ 3, e1150 (2015)
20. Anliker, J.: Eye movements: online measurement, analysis, and control. In: Eye Movements
and Psychological Processes (1976)

21. Salvucci, D.D., Goldberg, J.H.: Identifying fixations and saccades in eye-tracking protocols,
pp. 71–78 (2000)
22. Bahill, A.T., Latimer, J.R., Troost, B.T.: Linear homeomorphic model for human movement.
IEEE Trans. Biomed. Eng. 11, 631–639 (1980)
23. Holmqvist, K., Nyström, M., Mulvey, F.: Eye tracker data quality: what it is and how to
measure it, pp. 45–52 (2012)
24. Friedman, L., Rigas, I., Abdulin, E., Komogortsev, O.V.: A novel evaluation of two related
and two independent algorithms for eye movement classification during reading. Behav. Res.
Methods (2018)
25. Griffith, H., Biswas, S., Komogortsev, O.V.: Towards improved saccade landing position
estimation using velocity profile methods. In: IEEE SoutheastCon 2018, St. Petersburg, FL
(2018)
Wireless Power Transfer Solutions for ‘Things’
in the Internet of Things

Tim Helgesen(✉) and Moutaz Haddara

Westerdals – Oslo School of Arts, Communication and Technology, Oslo, Norway
Timrobbyh@gmail.com, Hadmoa@westerdals.no

Abstract. The Internet of Things (IoT) has several applications in various industries and contexts. During the last decade, IoT technologies were mainly dominated by the supply chains and warehouses of large manufacturers and retailers. Recently, IoT technologies have been adopted in virtually all other fields, including healthcare, smart cities, and self-driving cars. While the opportunities for IoT applications are endless, challenges do exist. These challenges can be broadly classified as social, political, organizational, privacy, security, environmental, and technological. In this paper, we focus on one dimension of the technological challenges, specifically on how IoT products/devices can be powered and charged without interruption, while either in use or in motion, since they are known to be intensively power-consuming objects. This literature review explores how the emerging technology of Wireless Power Transfer (WPT) could aid in solving power and charging problems for various IoT devices. Our findings suggest that, in theory, WPT can indeed be used to solve the charging and power challenges of IoT's intelligent devices, or "things". However, we found that human exposure and safety, industrial context, environmental issues, and cost of technology are important factors that could affect WPT adoption in organizations.

Keywords: Wireless power transfer · Internet of Things · Wireless energy transfer · Literature review

1 Introduction

The Internet of Things (IoT) domain has increased in popularity and research focus in
recent years, and is sometimes even described as the next big thing, much like the internet
back in its early days [1]. IoT can be broadly described as a cyber-physical network
where “smart” objects, or “things”, communicate and cooperate with each other (and
with humans) to create new applications, or services to achieve a common goal [2, 3].
There are several formal definitions of IoT, and Vermesan et al. [4] proposed an ideal
one:
“The Internet of Things could allow people and things to be connected anytime, anyplace, with
anything and anyone, ideally using any path/network and any service.” [4, p. 12].

Through this connection of people and things, the goal is to achieve a better world
where things know what we like, what we want, and give them to us with minimal human
intervention [5]. Yet some simply describe IoT as increased machine-to-machine communication.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 92–103, 2019.
https://doi.org/10.1007/978-3-030-02686-8_8

However, as pointed out by Isenberg et al. [6], at its core, the Internet of Things is more than just communication technologies; it goes beyond communication, as it endows the individual object, or "thing", with intelligence. These intelligence-equipped objects, or "smart/intelligent products", can be described as physical objects
equipped or coupled with computational software [6, 7]. Wong et al. [8] proposed several requirements for intelligent products: (1) the object should have a unique identification; (2) be able to communicate with its environment, such as other objects; (3) retain or store data about itself; (4) deploy a language to display its features, production requirements, etc.; and (5) participate in or make decisions relevant to its own destiny. These criteria must also be met to enable interaction between things [6]. One of the main challenges related to these intelligent products is their power consumption: the power they need to be able to perform their functions normally [6, 7]. These functions could include communication through wireless technology, or use of sensors [6]. Power is limited because these objects often move around, and they therefore need a self-sufficient energy source, such as batteries, to power the aforementioned functions [9]. Power consumption is a challenge that could affect the decision of which wireless technology can be used and adopted, because of the potential latency in communication [9], and it could be a potential performance bottleneck [10]. Another problem is that battery replacement could be costly, especially in large-scale deployments and IoT infrastructures [11]. Moreover, discarded batteries add to the ever-increasing electronic waste issue. One solution to the power consumption problem is "clustering", as proposed by López et al. [7]. Clustering makes it possible to manage the power of the devices by electing so-called representative network "members", which have the responsibility to collect and forward all communication within the network. These members are elected based on their residual energy, where devices under a pre-set percentage of energy will not be elected. However, this solution only slows battery consumption, as battery charging or replacement is still needed at some point. Another potential solution is the use of Bluetooth low-energy (BTLE) technology, which allows greater battery efficiency compared to other communication technologies [12]. But this solution, again, only delays the inevitable: the replacement of batteries.
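The clustering scheme described above — electing representative members by residual energy and excluding devices below a pre-set percentage — can be sketched as follows. The function name, the device names, and the 20% threshold are illustrative, not taken from [7].

```python
def elect_cluster_head(devices, min_energy_pct=20.0):
    """Elect the representative member with the highest residual energy,
    skipping devices below a pre-set energy percentage.

    devices: mapping of device name -> residual energy in percent.
    Returns the elected member's name, or None if no device is eligible.
    """
    eligible = {name: e for name, e in devices.items() if e >= min_energy_pct}
    return max(eligible, key=eligible.get) if eligible else None

# Example election: "sensor-b" is excluded for being under the threshold.
head = elect_cluster_head({"sensor-a": 85.0, "sensor-b": 15.0, "sensor-c": 60.0})
```

Re-running the election as energies drain rotates the head role, which is how such schemes spread the forwarding burden; it slows, but does not eliminate, battery depletion.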
The remainder of the paper is structured as follows. First, an overview of wireless power transfer technology opportunities and challenges is provided in Sect. 2. The research methodology is discussed in Sect. 3, followed by an overview of the reviewed articles in Sect. 4. Section 5 provides an overview of the literature review's main findings. A discussion is provided in Sect. 6. Finally, research conclusions are provided in Sect. 7.

2 Wireless Power Transfer

Wireless power transfer (WPT) technology (see Fig. 1) is also known as wireless charging, or wireless energy transfer (WET) [13]. WPT can be briefly explained as the process of transmitting electricity from one power system to another across an air gap via, for instance, an electromagnetic field or electromagnetic radiation [10]. Wireless charging occurs when one of the transmitting systems is constantly powered, and therefore continues to transfer power until the other system/device is fully charged [14]. The object that transmits power is commonly referred to as the power source (e.g., a charging station), and the object that receives the power is commonly referred to as the energy-harvesting object, or simply the "load" (e.g., a robot) [15, 16].

Fig. 1. Generic wireless power transfer illustration.

While this technology has the potential to completely reshape the IoT landscape, there is little research surrounding wireless power transfer in the IoT context. The aim of this study is to explore the current literature, identify the potential uses and applications of WPT technologies to wirelessly charge intelligent products, or things, and answer the following two main research questions:
• What wireless power transfer technologies could potentially solve the power challenges related to intelligent products?
• What are the challenges following the use of wireless power transfer technologies?

3 Methodology

Literature review papers represent a well-established method for accumulating existing, documented, and state-of-the-art knowledge within a domain of interest. In this article, we have applied a systematic review approach as described by Webster and Watson [17]. This approach is characterized by the adoption of explicit procedures and conditions, involving the use of a variety of procedures combined with various search criteria to minimize bias as much as possible [18].
The review covers articles published between the years 2007 and 2018 (February). We narrowed down the search process through the condition that articles must be published in peer-reviewed journals, edited books, or conference proceedings. Moreover, no delimitation was imposed on the outlets' field, to enable potential research results from various fields. The following search procedures were applied to provide a comprehensive and systematic methodology.
1. An initial search was done through Google Scholar. The search was limited to
articles' titles. The keywords wireless charging, wireless power transfer, wireless
energy transfer, IoT, internet of things, and their combinations were used.
2. Due to their high relevance for research, additional research databases were used:
ACM Digital Library, IEEE Xplore Digital Library, EBSCOhost, and Springer. The
search was restricted to the same keywords as in the previous step.
Wireless Power Transfer Solutions for ‘Things’ in the IoT 95

In addition to the title, the abstract and keyword fields of the articles were also
included in the search.
3. To narrow the search results, we added the constraint that papers included in this
review must have at least five citations.
4. Additionally, we conducted a secondary search by scanning all of the selected
articles' reference lists, to identify further potential literature sources.
5. The articles' abstracts were then carefully read by both authors to check their
relevance for this review. Only articles directly addressing wireless power transfer
technologies within the IoT domain were selected.
6. Based on the preliminary review, two main categories of wireless transfer
technology ranges were identified. Hence, the articles were classified into two main
groups: near-field and long-field power transfer technologies.
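The selection conditions above (publication window, outlet type, minimum citations)
can be sketched as a simple filter. This is our illustration, not tooling the authors used;
the `Article` fields and example entries are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    year: int
    outlet_type: str  # "journal", "book", or "conference" (hypothetical encoding)
    citations: int

# Outlet types accepted by the review: peer-reviewed journals,
# edited books, and conference proceedings.
ALLOWED_OUTLETS = {"journal", "book", "conference"}

def meets_criteria(a: Article) -> bool:
    """Apply the review's three inclusion conditions."""
    return (2007 <= a.year <= 2018
            and a.outlet_type in ALLOWED_OUTLETS
            and a.citations >= 5)

pool = [
    Article("WPT survey", 2016, "journal", 120),
    Article("Recent WPT note", 2018, "journal", 2),   # too few citations
    Article("Old RFID study", 2005, "journal", 300),  # outside the window
]
selected = [a.title for a in pool if meets_criteria(a)]
print(selected)  # only the first article passes all three conditions
```

The citation threshold in the last condition is exactly the limitation the authors note:
recent, lightly cited papers are filtered out along with low-quality ones.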
The authors independently classified the articles into a concept matrix [17], which
included the research themes. The results were then compared and discussed in order to
reach a consensus on each article's classification. It is important to mention that an
article could fall into one or more themes, depending on the article's technology focus.
One of the main limitations of this research methodology is that some potentially
relevant papers may have been omitted because they did not meet our minimum-citation
condition. In particular, more recent papers with few citations were excluded, which
affected the scope of this literature review.

4 Overview of the Articles

In total, we reviewed thirty articles published in various outlets; of these, 24 are
journal articles, 1 is a conference paper, and 5 are book chapters. As seen in the
following figure (Fig. 2), the review shows a gradual increase in research interest in
wireless power transfer, with a maximum of 9 publications in 2016.


Fig. 2. Number of publications per year.



5 Main Findings

In the literature, several potential wireless power transfer technologies were identified
and split into two main categories, near-field and long-field wireless charging
technologies, as shown in the following table and discussed in this section (Table 1).

Table 1. Overview of research topics and their corresponding papers.


Range category | WPT technology                           | Papers
Near-field     | Inductive Power Transfer (IPT)           | [10, 15, 19–24]
Near-field     | Resonant Inductive Power Transfer (RIPT) | [10, 14–16, 25–32]
Near-field     | Capacitive Power Transfer (CPT)          | [33, 34]
Long-field     | Radio Frequency (RF) radiation           | [13, 35–38]
Long-field     | Microwave Power Transfer (MPT)           | [10, 15, 38–40]
Long-field     | Laser Power Transfer (LPT)               | [10, 41, 42]

5.1 Near-Field Power Transfer

(1) Inductive Power Transfer


Inductive power transfer (IPT), also known as inductive coupling, transfers power
from one coil to another, and has been used for powering RFID tags and medical
implants [26]. The field IPT generates is in the kilohertz range, and is typically used
within a few millimeters to a few centimeters (up to about 20 cm) of the targeted load [15].
Transferred power ranges from watts to kilowatts, depending on transmission
efficiency [33]. The transmission efficiency decreases as range increases, and even more
so if there is any misalignment between the coils [23, 25]. If the alignment or range
changes, the coils must be recalibrated to work [39]. Electricity lost through
misalignment, range, or metallic objects between the coils leads to an increase in
heat [14, 15]. Due to its low transmission efficiency, the field is considered safe for
humans [15]. In the IoT domain, this technology has been recommended for several
applications. For example, Rim and Mi [24] explored the possibilities of wireless power
transfer to electric vehicles and other mobile devices.
(2) Resonant Inductive Power Transfer
One of the earliest implementations of resonant inductive power transfer (RIPT)
is Nikola Tesla's magnifying transmitter, or coil [43] (Fig. 3). The magnifying
transmitter succeeded in wirelessly transmitting power to energy-harvesting objects,
such as lamps, as shown in Fig. 3. Resonant inductive power transfer follows the same
basic principles as IPT. However, this technology makes use of magnetic resonant
coils, which operate at the same resonance frequency [10]. This technique creates a
stronger coupling, and therefore increases the potential range and efficiency. An
influential demonstration of RIPT for WPT was performed by Kurs et al. [28], who
achieved a transmission efficiency of around 90% at 1 m and 40% at 2 m. Transferred
power ranges from watts to kilowatts, depending on transmission efficiency [34].
As with IPT, the transmission efficiency decreases as range increases, though RIPT
has proven to have a longer range and better efficiency [15, 20, 28]. As with IPT,
RIPT requires calibration for each change made to distance or coils [39]. RIPT
technology can charge multiple receivers at the same time, even if the receivers are
out of sight [15]. As with IPT, the resonant field is considered safe for humans, as
shown by Ding et al. [19]. Building on this, Bito et al. [32] have developed a real-time,
electrically controlled wireless charging infrastructure, with algorithms that can be
used to recharge biomedical and implanted devices (e.g. pacemakers). This could
effectively eliminate the need for the surgical procedures currently required for
occasional battery replacement.

Fig. 3. Tesla’s magnifying transmitter wirelessly powering a lamp [44].
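The range and alignment sensitivity discussed above for IPT and RIPT can be
summarized with a standard result from the coupled-coil literature (our addition, not
stated in the reviewed articles; the symbols k, Q1, and Q2 are our notation): for two
coils with coupling coefficient k and quality factors Q1 and Q2, the maximum achievable
link efficiency is

```latex
\eta_{\max} \;=\; \frac{k^{2} Q_{1} Q_{2}}{\left(1 + \sqrt{1 + k^{2} Q_{1} Q_{2}}\right)^{2}}
```

Since k falls off steeply with coil separation and misalignment, the maximum efficiency
drops quickly with range; resonant operation raises the effective quality factors, which
is consistent with the longer range and higher efficiency reported for RIPT over plain
IPT in [15, 20, 28].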

(3) Capacitive Power Transfer


Capacitive power transfer (CPT) uses a coupling formed by two metal surfaces,
between which electricity is transferred [33]. Though potentially cheaper than IPT and
RIPT, CPT requires close contact between the two metal surfaces, and is hence greatly
limited by its range requirements [27, 33, 34]. CPT was largely overlooked until 2008,
which could explain why the technology has only recently reached kilowatt-scale
loads [33].

5.2 Long-Field Power Transfer


(1) Radio Frequency Radiation
Radio frequency (RF) radiation uses radio waves emitted from an antenna to carry
radiant energy [10]. It can send power from a meter up to several kilometers, depending
on the technique used [15]. However, it has a very low efficiency rate and requires line
of sight to deliver power [29]; one project reported a transmission efficiency of around
1% at 30 cm [10]. It also needs to know the location of the intended target [15]. Due to
the health risks of exposure, radio frequency transfer is commonly operated at low
power levels [15]. Boshkovska et al. [35] proposed a simultaneous wireless information
and power transfer (SWIPT) model that delivers information and power on the same
waveforms. This model also extends the possibilities for IoT energy-harvesting devices,
which need continuous communication [35, 40].
One of the paramount obstacles for far-field wireless power implementations is
end-to-end power transfer efficiency: the optimization needed to increase the
direct-current power level at the output of the rectenna (energy harvester) without
increasing the transmission power and waveform output [36]. Through simulations,
Clerckx and Bayguzina [36, 37] and Huang and Clerckx [45] have provided models
and algorithms that could potentially increase the transmitted waveform output and
decrease power loss during radio-frequency-to-direct-current conversion in far-field
transmissions.
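The steep drop in far-field efficiency noted above (around 1% at 30 cm in one project)
follows directly from free-space path loss. As an illustration, not drawn from the
reviewed papers, the Friis transmission equation gives the received power for ideal
antennas in free space; the frequency and distance values below are arbitrary examples:

```python
import math

def friis_received_power(p_tx_w, gain_tx, gain_rx, freq_hz, distance_m):
    """Friis transmission equation: ideal free-space received power (watts)."""
    wavelength = 3e8 / freq_hz  # speed of light / frequency
    return p_tx_w * gain_tx * gain_rx * (wavelength / (4 * math.pi * distance_m)) ** 2

# Example: 1 W transmitted at 915 MHz with unity-gain antennas, 5 m apart.
p_rx = friis_received_power(1.0, 1.0, 1.0, 915e6, 5.0)
print(f"{p_rx * 1e6:.1f} uW")  # tens of microwatts from a full transmitted watt
```

The received power falls with the square of distance even before rectification losses,
which is why far-field schemes rely on waveform and rectenna optimization [36, 37, 45]
rather than simply raising transmit power.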
(2) Microwave Power Transfer
Microwave power transfer (MPT) is a technique that increases transmission efficiency
and range through, for instance, a parabolic dish that focuses the radio waves [14, 22].
However, MPT requires complicated tracking mechanisms and large devices [15].
Galinina et al. [22] have proposed a framework for applying MPT techniques to
transfer power to 5G devices, such as wearables, through beacons that facilitate a
continuous supply of power, creating self-sustaining devices. Finally, Di Renzo and
Lu [38] developed a stochastic mathematical model to analyze and optimize low-energy,
cellular-enabled mobile devices that have dual wireless information and beam power
transmission capabilities.
(3) Laser Power Transfer
Another long-field technique is optical laser power transfer (LPT), which transmits
power at visible or near-infrared frequencies [10]. However, like MPT, it requires
complicated tracking mechanisms and large devices [15]. One of the potential
applications of LPT is Industry 4.0, otherwise known as the fourth industrial
revolution [2]. On a larger scale, with the emergence of cloud computing and current
advancements in mobile networks, billions of heterogeneous smart devices with
different application requirements are connected to networks, and are generating large
volumes of data that need to be processed in distributed cloud infrastructures [42].
Hence, Munoz et al. [42] have presented a platform, currently under development, that
utilizes fifth-generation (5G) mobile network technologies to develop new radio
interfaces that cope with the exponential traffic growth, and to integrate diverse
networks from end to end with distributed cloud resources to deliver E2E IoT and
mobile services. Moreover, a paper by Liu et al. [41] explored the possibilities of
transforming the current Chinese power grid into a smart grid to enable IoT
applications. The paper focuses on optical/laser technologies as enablers for IoT
devices' communication and wireless charging through the grid.

6 Discussion

The reviewed articles are spread across 20 different outlets. Among these outlets, we
identified only one special journal issue focusing on wireless power transfer
technologies within the IoT context. As research interest in WPT for IoT is increasing,
research outlets should pay more attention to this domain. In general, 30 articles across
a 12-year period is a low number of publications. Although the need for research on
WPT for IoT was recognized in previous literature, the amount of research conducted
on this issue remains very limited. Thus, more research needs to be carried out in order
to gather sufficient knowledge about this phenomenon, as WPT in IoT has not received
attention comparable to other IoT-related topics.
Based on our WPT in IoT literature review, in the following part we answer our
research questions, and present some research gaps and future research suggestions.
To answer the first question (what wireless power transfer technologies could
potentially solve the power challenges related to intelligent products?): it is apparent
that virtually all of the technologies identified in the literature could solve the
device-charging and power-harvesting challenges discussed earlier in this paper.
However, the decision of which of these technologies would be the best fit should be
based on several factors. One factor is the target environment. For instance, one type
of environment could be an industrial workplace, where intelligent devices are used to
inform users about exposure to hazardous equipment, as in the case of Kortuem
et al. [46]. Since this would most likely be a very open and dynamic environment,
microwave power transfer could be used through power beacons (PBs), as
recommended by Huang and Lau [39]. Likewise, a capacitive power transfer solution
is also viable, where the smart object is placed on top of a charging platform when at
rest, though this would require the device's battery to last until it is charged. This
technology is very similar to existing wireless mobile phone charging stations. The
decision of which solution would be the best fit should also take into consideration
another factor: cost. Though costs might be reduced by the absence of batteries that
need replacement, the high implementation costs of the technology must still be
considered. Implementation costs include, for example, the price of replacing
traditional charging cords with wireless chargers and the cost of installing wireless
power receivers in the intelligent products [15], though this depends on the chosen
technology. Cost is also affected by the required charging range, as long-range charging
is less efficient than wired charging and therefore consumes more electricity. Another
factor is the size of the
object/device. It has been pointed out that both inductive and resonant inductive
coupling require a relatively large receiver for effective long-range charging [15],
though this most likely depends on the amount of power the device needs; as Cannon
et al. [26] showed, one large source-coil transponder can be used to charge many small
load-coil receivers. However, the most important factor should be the planned
performance level of the smart object, as on-the-go charging allows for higher power
consumption and therefore opens the way for more functions. The goal should be to
utilize this extra power to increase the performance of the smart object. To illustrate
this, clustering, as explained in the introduction, was proposed to slow power
consumption at the cost of real-time data, and could lead to potentially disconnected
environments. In contrast, always having the power needed to perform their functions
would give devices always-available real-time data, communication, and coordination,
which is closer to the ideal definition of IoT.
Regarding the second research question (what are the challenges following the use of
wireless power transfer technologies?): there are some general challenges with the use
of WPT technology. One of the paramount challenges is whether the business value
gained can outweigh the cost of acquiring and using the technology. Another challenge
is to implement the technology in an optimal way, so that it does not disrupt or slow
down business processes. In addition, the technology must be implemented in a way
that does not pose any potential health risks to humans in the vicinity.
Based on this review's findings, several research gaps have been identified. For
example, it is evident that the majority of the reviewed papers focused more on
near-field wireless power transfer technologies than on the long-field context. As
discussed earlier, the longer the range, the more wireless power is needed to charge
distant objects, which for the time being can be inefficient and costly. Thus, more
research is needed to find power optimization techniques between available power
sources and power-harvesting devices. It is also apparent that very little research has
been conducted on laser power transfer within the long-field WPT domain. This lack
of research could possibly be explained by the expensive infrastructure required to
implement the technology. In addition, as virtually all of the papers reviewed are highly
technical (mostly from IEEE outlets), there is also an apparent research gap concerning
the business value and feasibility of the different WPT technologies from a business
perspective. Furthermore, almost none of the papers reported a real-world case study
of WPT implementation within businesses. This could explain the slow adoption of
WPT technology in this particular domain, as research bridging technical and business
issues is needed to reach managers and to increase businesses' awareness of such
technologies.

7 Conclusions

This paper contributes to both research and practice by providing a comprehensive
literature review on the potential of wireless power transfer technologies in the IoT
domain. For practice, the paper sheds light on past and recent issues, as well as
challenges, that can guide IoT consultants, vendors, and clients in their future projects.
For researchers, the organization of the literature into the different WPT technologies
can aid in identifying the topics, findings, and gaps discussed for each technology of
interest. Finally, we have provided our observations and future research suggestions,
which we hope will enrich knowledge in this domain.

References

1. Sajid, O., Haddara, M.: NFC mobile payments: are we ready for them? In: SAI Computing
Conference (SAI), 2016, pp. 960–967 (2016)
2. Haddara, M., Elragal, A.: The readiness of ERP systems for the factory of the future. Procedia
Comput. Sci. 64, 721–728 (2015)
3. Misra, G., Kumar, V., Agarwal, A., Agarwal, K.: Internet of Things (IoT)—a technological
analysis and survey on vision, concepts, challenges, innovation directions, technologies, and
applications (an upcoming or future generation computer communication system
technology). Am. J. Electr. Electron. Eng. 4, 23–32 (2016)
4. Vermesan, O., Friess, P., Guillemin, P., Gusmeroli, S., Sundmaeker, H., Bassi, A., et al.:
Internet of Things strategic research roadmap. In: Internet of Things-Global Technological
and Societal Trends, vol. 1, pp. 9–52 (2011)
5. Perera, C., Zaslavsky, A., Christen, P., Georgakopoulos, D.: Context aware computing for
the Internet of Things: a survey. IEEE Commun. Surv. Tutor. 16, 414–454 (2014)
6. Isenberg, M.-A., Werthmann, D., Morales-Kluge, E., Scholz-Reiter, B.: The role of the
Internet of Things for increased autonomy and agility in collaborative production
environments. In: Uckelmann, D., Harrison, M., Michahelles, F. (eds.) Architecting the
Internet of Things, pp. 195–228. Springer, Berlin (2011)
7. López, T.S., Brintrup, A., Isenberg, M.-A., Mansfeld, J.: Resource management in the Internet
of Things: clustering, synchronisation and software agents. In: Uckelmann, D., Harrison, M.,
Michahelles, F. (eds.) Architecting the Internet of Things, pp. 159–193. Springer, Berlin
(2011)
8. Wong, Y., McFarlane, D., Zaharudin, A.A., Agarwal, V.: The intelligent product driven
supply chain. In: 2002 IEEE International Conference on Systems, Man and Cybernetics, vol.
4, p. 6 (2002)
9. Mattern, F., Floerkemeier, C.: From the internet of computers to the Internet of Things. In:
Sachs, K., Petrov, I., Guerrero, P. (eds.) From Active Data Management to Event-Based
Systems and More, pp. 242–259. Springer, Berlin (2010)
10. Xie, L., Shi, Y., Hou, Y.T., Lou, A.: Wireless power transfer and applications to sensor
networks. IEEE Wirel. Commun. 20, 140–145 (2013)
11. Miorandi, D., Sicari, S., De Pellegrini, F., Chlamtac, I.: Internet of Things: vision, applications
and research challenges. Ad Hoc Netw. 10, 1497–1516 (2012)
12. Swan, M.: Sensor mania! the Internet of Things, wearable computing, objective metrics, and
the quantified self 2.0. J. Sens. Actuator Netw. 1, 217–253 (2012)
13. Yuan, F., Jin, S., Wong, K.K., Zhao, J., Zhu, H.: Wireless information and power transfer
design for energy cooperation distributed antenna systems. IEEE Access 5, 8094–8105 (2017)
14. Chawla, N., Tosunoglu, S.: State of the art in inductive charging for electronic appliances and
its future in transportation. In: 2012 Florida Conference on Recent Advances in Robotics, pp.
1–7 (2012)
15. Lu, X., Wang, P., Niyato, D., Kim, D.I., Han, Z.: Wireless charging technologies:
fundamentals, standards, and network applications. IEEE Commun. Surv. Tutor. 18, 1413–
1452 (2016)
16. Lu, X., Wang, P., Niyato, D., Han, Z.: Resource allocation in wireless networks with RF
energy harvesting and transfer. IEEE Netw. 29, 68–75 (2015)

17. Webster, J., Watson, R.T.: Analyzing the past to prepare for the future: writing a literature
review. MIS Q. 26, xiii–xxiii (2002)
18. Bryman, A.: Social Research Methods. OUP, Oxford (2012)
19. Ding, P.-P., Bernard, L., Pichon, L., Razek, A.: Evaluation of electromagnetic fields in human
body exposed to wireless inductive charging system. IEEE Trans. Magn. 50, 1037–1040 (2014)
20. Hui, S.Y.R., Zhong, W., Lee, C.K.: A critical review of recent progress in mid-range wireless
power transfer. IEEE Trans. Power Electron. 29, 4500–4511 (2014)
21. Zhao, B., Kuo, N.-C., Niknejad, A.M.: An inductive-coupling blocker rejection technique for
miniature RFID tag. IEEE Trans. Circuits Syst. I Regul. Pap. 63, 1305–1315 (2016)
22. Galinina, O., Tabassum, H., Mikhaylov, K., Andreev, S., Hossain, E., Koucheryavy, Y.: On
feasibility of 5G-grade dedicated RF charging technology for wireless-powered wearables.
IEEE Wirel. Commun. 23, 28–37 (2016)
23. Imura, T., Hori, Y.: Maximizing air gap and efficiency of magnetic resonant coupling for
wireless power transfer using equivalent circuit and Neumann formula. IEEE Trans. Ind.
Electron. 58, 4746–4752 (2011)
24. Rim, C.T., Mi, C.: Wireless Power Transfer for Electric Vehicles and Mobile Devices. Wiley,
Hoboken (2017)
25. Beh, T.C., Kato, M., Imura, T., Oh, S., Hori, Y.: Automated impedance matching system for
robust wireless power transfer via magnetic resonance coupling. IEEE Trans. Ind. Electron.
60, 3689–3698 (2013)
26. Cannon, B.L., Hoburg, J.F., Stancil, D.D., Goldstein, S.C.: Magnetic resonant coupling as a
potential means for wireless power transfer to multiple small receivers. IEEE Trans. Power
Electron. 24, 1819–1825 (2009)
27. Hui, S.: Planar wireless charging technology for portable electronic products and Qi. Proc.
IEEE 101, 1290–1301 (2013)
28. Kurs, A., Karalis, A., Moffatt, R., Joannopoulos, J.D., Fisher, P., Soljačić, M.: Wireless power
transfer via strongly coupled magnetic resonances. Science 317, 83–86 (2007)
29. Xie, L., Shi, Y., Hou, Y.T., Sherali, H.D.: Making sensor networks immortal: an energy-renewal
approach with wireless power transfer. IEEE/ACM Trans. Netw. 20, 1748–1761 (2012)
30. Choi, B.H., Thai, V.X., Lee, E.S., Kim, J.H., Rim, C.T.: Dipole-coil-based wide-range
inductive power transfer systems for wireless sensors. IEEE Trans. Ind. Electron. 63, 3158–
3167 (2016)
31. Yeo, T.D., Kwon, D., Khang, S.T., Yu, J.W.: Design of maximum efficiency tracking control
scheme for closed-loop wireless power charging system employing series resonant tank. IEEE
Trans. Power Electron. 32, 471–478 (2017)
32. Bito, J., Jeong, S., Tentzeris, M.M.: A real-time electrically controlled active matching circuit
utilizing genetic algorithms for wireless power transfer to biomedical implants. IEEE Trans.
Microw. Theory Tech. 64, 365–374 (2016)
33. Dai, J., Ludois, D.C.: A survey of wireless power transfer and a critical comparison of
inductive and capacitive coupling for small gap applications. IEEE Trans. Power Electron.
30, 6017–6029 (2015)
34. Dai, J., Ludois, D.C.: Wireless electric vehicle charging via capacitive power transfer through
a conformal bumper. In: 2015 IEEE Applied Power Electronics Conference and Exposition
(APEC), pp. 3307–3313 (2015)
35. Boshkovska, E., Koelpin, A., Ng, D.W.K., Zlatanov, N., Schober, R.: Robust beamforming
for SWIPT systems with non-linear energy harvesting model. In: 2016 IEEE 17th
International Workshop on Signal Processing Advances in Wireless Communications
(SPAWC), pp. 1–5 (2016)

36. Clerckx, B., Bayguzina, E.: Waveform design for wireless power transfer. IEEE Trans. Signal
Process. 64, 6313–6328 (2016)
37. Clerckx, B., Bayguzina, E.: Low-complexity adaptive multisine waveform design for wireless
power transfer. IEEE Antennas Wirel. Propag. Lett. 16, 2207–2210 (2017)
38. Renzo, M.D., Lu, W.: System-level analysis and optimization of cellular networks with
simultaneous wireless information and power transfer: stochastic geometry modeling. IEEE
Trans. Veh. Technol. 66, 2251–2275 (2017)
39. Huang, K., Lau, V.K.: Enabling wireless power transfer in cellular networks: architecture,
modeling and deployment. IEEE Trans. Wirel. Commun. 13, 902–912 (2014)
40. Bi, S., Zeng, Y., Zhang, R.: Wireless powered communication networks: an overview. IEEE
Wirel. Commun. 23, 10–18 (2016)
41. Liu, J., Li, X., Chen, X., Zhen, Y., Zeng, L.: Applications of Internet of Things on smart grid
in China. In: 2011 13th International Conference on Advanced Communication Technology
(ICACT), pp. 13–17 (2011)
42. Munoz, R., Mangues-Bafalluy, J., Vilalta, R., Verikoukis, C., Alonso-Zarate, J., Bartzoudis,
N., et al.: The CTTC 5G end-to-end experimental platform: integrating heterogeneous
wireless/optical networks, distributed cloud, and IoT devices. IEEE Veh. Technol. Mag. 11,
50–63 (2016)
43. Brown, W.C.: The history of power transmission by radio waves. IEEE Trans. Microw.
Theory Tech. 32, 1230–1242 (1984)
44. Tesla, N.: The Problem of Increasing Human Energy: With Special Reference to the
Harnessing of the Sun’s Energy. Cosimo Inc., New York (2008)
45. Huang, Y., Clerckx, B.: Waveform optimization for large-scale multi-antenna multi-sine
wireless power transfer. In: 2016 IEEE 17th International Workshop on Signal Processing
Advances in Wireless Communications (SPAWC), pp. 1–5 (2016)
46. Kortuem, G., Kawsar, F., Sundramoorthy, V., Fitton, D.: Smart objects as building blocks
for the Internet of Things. IEEE Internet Comput. 14, 44–51 (2010)
Electronic Kintsugi
An Investigation of Everyday Crafted Objects in Tangible
Interaction Design

Vanessa Julia Carpenter1(✉), Amanda Willis2, Nikolaj “Dzl” Møbius3,
and Dan Overholt1

1 Technical Doctoral School of IT and Design, Aalborg University, Copenhagen, Denmark
{vjc,dano}@create.aau.dk
2 Simon Fraser University, Surrey, Canada
3 HumTek, Roskilde University, Roskilde, Denmark

Abstract. In the development of enhanced and smart technology, we explore
the concepts of meaningfulness, tangible design, and interaction with everyday
objects through Kintsugi, the Japanese craft of repairing broken ceramics with
gold. Through two workshops, this emergent design research develops an iterative
prototype, Electronic Kintsugi, which explores how we can facilitate more
human-to-human or human-to-self connection through a hybrid crafted everyday
object. We identify three themes: (1) enhancing human connection through
embedded or “magic” technology; (2) using everyday objects to prompt personal
reflection and development; and (3) exploring transferable design principles for
smart products with a device of undefined purpose, converging traditional craft
and technology.

Keywords: Craft · Internet of Things (IoT) · Tangible interaction · Everyday objects

1 Introduction

This work explores Kintsugi, the Japanese craft of repairing broken ceramics with
gold, and examines how capacitive touch can facilitate tangible interaction with an
everyday, crafted object. We situate ourselves within interaction design and look to
related works in craft and tangible interaction.
The grounding question for this work asks how we can facilitate more human-to-human
or human-to-self connection through a digital/crafted hybrid everyday object, and
which design benefits this can offer future technology. We explore this through three
themes that emerge in our work on technology, craft, and interaction. Much of the
recent work on tangible interaction within interaction design has shown an increased
focus on traditional craft work [1–4] and a return from screen interaction to tangible
interaction [5–7]. Despite this focus on craft and the tangible, in commercial settings a
strong focus on app-based interaction, digital displays, and screen-based solutions has
become the norm, even pushing towards virtual or augmented reality. Meanwhile,

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 104–121, 2019.
https://doi.org/10.1007/978-3-030-02686-8_9
Electronic Kintsugi 105

a number of critical views on the value of the Internet of Things (IoT) have recently
been published [3, 8], and a wave of research and devices around the themes of
mindfulness, self-exploration, reflection, and well-being is emerging [9, 10].
In this area of overlap, between screens and tangible interaction, between making
devices and traditional craft, between IoT devices and mindfulness tools, we are
interested in exploring the potential engagement qualities of non-screen, tangible
interaction in the form of everyday crafted objects. We are specifically interested in
the physical nature of both IoT gadgets and mindfulness tools, as they tie into the
physicality of crafted objects. We rely on physical objects in our lives, and while
designing future smart homes, offices, cars, etc., we might benefit from a deeper
understanding of how we relate to these physical things [11]. Núñez Pacheco and Loke
elaborate: “A focus on a more reflective approach can offer fresh ways of understanding
how the lived body interacts with artefacts, products and spaces” [12]. This speaks to
how we can look further into understanding how humans interact with ‘things’; our
focus is to take that further and ask how we can facilitate more human-to-human or
human-to-self connection through a hybrid crafted everyday object.

2 Introducing Kintsugi as a Device to Explore Connection and Meaning Making

Electronic Kintsugi was developed as a tool to investigate how we could use everyday
objects to explore human-to-human and human-to-self connection, and to find out
whether we could develop something that intrigued and engaged people, moving from
the Internet of Things (IoT) towards an appreciation and use of crafted, tangible,
interactive, everyday objects. Electronic Kintsugi is a platform for exploration and
meaning-making, an opportunity to engage with others and with oneself, and to create
new narratives. Our context was Japan's artisanal craft of Kintsugi: we developed our
work with a Kintsugi artist, and our focus was on the tangible, non-screen interaction
properties of how a device with an undefined purpose might exist between the realms
of traditional craft, technology, and sound.
Inspired by Tsaknaki and Fernaeus' work Expanding on Wabi-Sabi as a Design
Resource in HCI [13], in which they explored unfinished craft and interaction design,
the authors created a device and facilitated two participatory workshops exploring the
Japanese craft of Kintsugi: mending broken ceramics with a precious metal to make
them more beautiful and valuable than before. These concepts were adopted in the
creation of Electronic Kintsugi: a sound- or light-reactive piece of repaired ceramics
with touch interaction on the precious-metal seams. Our interest is in the aesthetics of
individuality and human touch, and in exploring and respecting the tradition of the
craft of Kintsugi itself (video of Electronic Kintsugi: https://youtu.be/p5Pu0-gZ3u0)
(Fig. 1).
106 V. J. Carpenter

Fig. 1. Electronic Kintsugi in a design expert’s home; The Kintsugi artist creating traces; First
workshop explorations with light and sound.

3 Related Works: Exploring the Physical Qualities of Hybrid Tangible Embedded Interaction, Through Crafted “Things”

Our literature review examined works in which craft is referenced for the transferable
physical qualities of interaction design: material, texture, touch, and the recognition of
craftsmanship, as opposed to the sleek, smooth, machined surfaces of our current smart
products. We see this as a natural progression from a screen-based society, moving
towards embodied engagement and beyond the swipe interaction of the “black mirror”
(screen) as described by Rose [14]. Three thematic findings informed our prototype and
workshop development.

3.1 Traditional Craft as a Starting Point for Exploration

Tsaknaki and Fernaeus explore craft in depth in a variety of their works, thereby
evaluating the role of interaction design in craft. In their work on Wabi-Sabi, Tsaknaki
and Fernaeus [13] present the concept of Wabi-Sabi and the idea that designs can
“approach perfection through explicitly unfinished designs”. We embrace the concept
of unfinished design with Electronic Kintsugi, deliberately designing an unfinished
device to prompt curiosity and exploration of the prototype. In their work with leather,
Tsaknaki, Fernaeus, and Schaub [15] explore how leather can be a touch-based, rich
material for tangible interactions. This work informs how we can look to everyday
materials, in our case ceramics, for stroking interaction, much like the leather
interactions of their SoundBox.
In exploring silversmithing, Tsaknaki, Fernaeus, Rapp and Belenguer [16] both
engaged local artisans and focused especially on the “cultural and historical significance”
of the craft, and explored the design space of “screen-less” interactions. This
finding informed our choice to work with the Japanese artisanal craft of Kintsugi,
where we developed our work with a Kintsugi artist and our focus was on the tangible,
non-screen interaction properties of how a device with an undefined purpose might exist
in between these realms of traditional craft and technology.
Electronic Kintsugi 107

3.2 Designing from Everyday Things with Social Implications in Mind


In recent works about the Internet of Things (IoT), Cila, Smit, Giaccardi and Kröse [8],
Nordrum [17], and Lingel [3] all explore the social significance of the “thing” and
suggest that we need to look not only at the everyday (home and workplace) but also at
the social and cultural implications of these everyday interactions with things. Our work
focuses on this “thing”, and thus on the development of Electronic Kintsugi.

3.3 Technology and Touch

Significant work has been done in the field of interaction design with regards to touch,
and in the interests of space we do not cover it all here; however, the particular work by
Cranny-Francis [18] covers a sizeable portion of the touch research done within design.
In Semefulness: a social semiotics of touch, Cranny-Francis introduces the experience
of touch as ‘semefulness’: “multiply significant, physically, emotionally, intellectually,
spiritually, politically” [18]. She describes the ‘tactile regime’ of touch in culture and
how it shapes the ways we engage with one another or with the tools we design and then
use. She writes that “Touch is semeful in that it is full of meanings - physical, emotional,
intellectual, spiritual and those meanings are socially and culturally specific and
located.” Here we can begin to touch upon the multi-faceted nature of Electronic
Kintsugi. It is culturally and location specific to traditional Japanese craft; it is emotional
to some, as an heirloom or a piece of valuable art; it fosters social interaction when acting
as Electronic Kintsugi (see Sect. VI. C); and it is physical in nature: it requires touching,
stroking, and holding the bowl. One ambition of Electronic Kintsugi is to enable meaningful
experiences for the participants, and by addressing Cranny-Francis’ ‘semeful’ attributes,
we may begin to explore this domain.

3.4 A Focus on Audio and Playfulness

Schoemann and Nitsche [4] use the “Stitch Sampler”, a sewable musical instrument, to
focus on embodiment via the act of sewing, and on audio feedback “to respond to the
crafter’s personality”. These qualities of craft, tangible non-screen interaction, and
playfulness with sound inform our process, helping to frame the area we are exploring.
Electronic Kintsugi allows participants to explore the interaction qualities of a hybrid
crafted device and consider its potential uses in their lives. We encourage curiosity,
unexpected encounters, and reflection on those encounters. This speaks to our objective
to inform future smart product design and encourage a tangible, non-screen interface
which utilizes craft and the qualities of curiosity and reflectivity.

4 Methodology

Initially, we were fascinated by the idea of Kintsugi and made a basic prototype to
explore possible values of Electronic Kintsugi. This work spans from the first prototype
to two workshops, one in Japan and one in Denmark, six months apart. We present an
overview of methods here and then describe each workshop and its findings in the
following sections.

4.1 Workshop 1: Methods


The first workshop was designed in a collaborative process with FabCafe Tokyo and
Kintsugi artist Kurosawa, where we combined electronics with an everyday “craft”
object together with the artisan [16], so that they could both introduce us to the nuances
of the craft and help us to understand what we should be paying attention to.
Following the process described by Tsaknaki, Fernaeus and Schaub [15] in their
leather material explorations, we created a workshop session to explore the properties
of Kintsugi and gain insight into the craft, and to investigate how our prototype was
received by participants in that context.
We used thin strips of copper tape to conduct electrical current and worked with the
Kintsugi artist to carefully overlay the traces of precious metals where the repair had
been, to emulate the traditional Kintsugi.1
The workshop consisted of two of the authors (one an electrical and mechanical
engineer, the other an interaction designer), the Kintsugi artist, and seven participants
of varying electronics skill levels who were recruited through an open FabCafe Tokyo
Facebook event.
During the workshop, the Kintsugi artist presented and demonstrated their process,
allowing participants to try their hand at creating Kintsugi. The authors presented their
work and the thoughts behind the Electronic Kintsugi. The workshop explored Kintsugi
and interaction with it, using two familiar outputs, sound and light, which would act as
examples of possible outputs, so that participants were able to extrapolate from this in
terms of what the Electronic Kintsugi might be used for.
We conducted the workshop in a focus group style, and did two rounds of explorative,
hands-on evaluation. A questionnaire was developed to capture their experience (Results
in section “First Workshop”).

4.2 Second Iteration of the Electronic Kintsugi

Cila, Smit, Giaccardi and Kröse [8] describe the interventionist product for creating
dialogues, which senses, responds to, and interprets data. The Electronic Kintsugi was
developed to sense touch and respond to it, and, for the second workshop, to interpret
data such as how often it was being stroked.
After feedback from the first workshop, the Electronic Kintsugi was updated to be more
responsive, and we deliberately left open how the light interaction would emerge and
how it would progress, in order to prompt explorative and playful behaviour with the
device. Rather than a fixed mapping, it had a certain level of ambiguity [19] via the
programmed adaptive behaviours, based on how much it was interacted with and for
how long, e.g., whether it had been left alone or off for a period.

1 http://www.kurovsya.com/.

Several touch-to-sound and touch-to-light reactions were developed for the workshop.
Each reaction took input from the touch interface2 and created a specific output in the
form of either light or sound. Light was output on a strip of NeoPixels, and sound was
synthesized using a software library3 and output to a speaker.
The light reactions transform a single parameter from the touch interface into a
specific light pattern on the LED display. Likewise, the sound reactions transform a
single parameter from the touch interface into single tones, chords or evolving sound
figures.
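As an illustration of this single-parameter mapping, the sketch below (in Python, with hypothetical function names, scale, and constants; the actual prototype ran on Arduino hardware with NeoPixels and a synthesis library) shows how one normalized touch value might drive both a light pattern and a quantized tone:

```python
# Illustrative sketch only, not the authors' firmware: one normalized
# touch parameter (0.0-1.0) drives a light pattern and a tone.

def touch_to_brightness(touch, n_pixels=12):
    """Map touch intensity to per-pixel brightness (0-255), brightest
    at the centre of the strip and falling off towards the ends."""
    centre = (n_pixels - 1) / 2
    levels = []
    for i in range(n_pixels):
        falloff = 1.0 - abs(i - centre) / (centre + 1)
        levels.append(int(255 * max(0.0, touch) * falloff))
    return levels

def touch_to_note(touch, scale=(0, 2, 4, 7, 9)):
    """Quantize touch intensity onto a pentatonic scale, returning a
    MIDI note number; stronger touches select higher notes."""
    base = 60  # middle C, an assumed base note
    index = min(int(touch * len(scale)), len(scale) - 1)
    return base + scale[index]

if __name__ == "__main__":
    print(touch_to_brightness(0.5, n_pixels=5))
    print(touch_to_note(0.9))
```

A chord reaction could simply return several such notes at fixed intervals above the selected one; the evolving sound figures would additionally vary over time.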
In the second iteration, we wanted to increase the complexity [20] of interacting with
the device so that the interaction was less binary, such as one touch = one sound. Instead,
it was decided to make the coupling between the input and output less apparent, giving
the device the autonomy to interpret the frequency of interaction and respond accordingly.
Within the second iteration algorithm, there exist five cases of interaction modalities for
either sound or light, meaning five for sound and five for light. There is a manual switch
on the Electronic Kintsugi so participants can choose whether they are interacting with
light or sound. These five cases were variations in type of output, cycled through on a
timer based on interaction. If the user was interacting with the Electronic Kintsugi, it
would remain in that mode longer, until they paused interacting, so as not to interrupt
their flow of interaction. Then it would move to the next mode. Each mode was a
variation in output; for sound, for example, it might be different chords or tones.
This had the purpose of giving the participant less time to recognize patterns in the
behaviour, enhancing the user’s curiosity. We focused on how the interaction between
the participants and the Electronic Kintsugi could be more tightly or loosely coupled,
yet also incorporate elements of surprise; and what implications this interaction had for
the participants’ association to the Electronic Kintsugi as a device, versus as an instrument,
companion, or tool.
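The deferred mode switching described above can be sketched as a small state machine. This is our reading of the behaviour; the class name, dwell time, and idle threshold are hypothetical (the prototype’s actual timings are not reported):

```python
# Sketch of the second iteration's mode cycling: five output modes
# advance on a timer, but a switch is deferred while the user is still
# touching, so a mode change never interrupts a flow of interaction.

class ModeCycler:
    def __init__(self, n_modes=5, dwell=30.0, idle_gap=3.0):
        self.n_modes = n_modes
        self.dwell = dwell        # minimum seconds spent in each mode
        self.idle_gap = idle_gap  # seconds without touch before switching
        self.mode = 0
        self.mode_started = 0.0
        self.last_touch = float("-inf")

    def on_touch(self, now):
        """Record a touch event at time `now` (seconds)."""
        self.last_touch = now

    def update(self, now):
        """Advance to the next mode only when the dwell time has passed
        AND the user has paused interacting; return the current mode."""
        dwell_over = now - self.mode_started >= self.dwell
        user_paused = now - self.last_touch >= self.idle_gap
        if dwell_over and user_paused:
            self.mode = (self.mode + 1) % self.n_modes
            self.mode_started = now
        return self.mode
```

While the user keeps stroking the traces, `update` keeps returning the current mode even past the dwell time; only a pause lets the device move on, which is one simple way to realise the looser input-output coupling described above.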

4.3 Workshop 2: Methods


The second workshop was scheduled six months after the first, due to travel and revisions
to the technology and workshop design.
Approaching workshop two, Wakkary et al. [11] published “Morse Things”,
wherein they utilised a methodology for engaging design researchers to evaluate an
everyday object by having the object in their homes for some weeks, and then
following up with a workshop with the design researchers to explore their experiences
with the object. We adopted this methodology for our work and asked four design
researchers to evaluate the Electronic Kintsugi in their homes for a period of five weeks,
followed by a workshop. We chose this method in agreement with Wakkary et al.,
who explain, “A key motivation in our approach was the desire to deepen our investigation
by including a wider range of experts that have the design expertise to perceive
and investigate the nuanced and challenging notions of thing-centeredness.”

2 We followed instructions from: http://www.instructables.com/id/Touche-for-Arduino-Advanced-touch-sensing/.
3 We used this library: https://github.com/dzlonline/the_synth.

4.4 Participant Selection and Introduction to Electronic Kintsugi


Opportunity sampling was used to select experts in design research from different
backgrounds, aged 30–38 and living in Copenhagen, to ensure different perspectives on
the experience and imagined future uses. Participants’ names have been changed for their
privacy. Their backgrounds are crossovers between the fields of engineering, interaction
design, dance, performance design, industrial design, robotics, and hardware
development.
Participants were recruited by email and it was explained to the participants that
they’d have the object in their home for 5 weeks and engage with it for a minimum of
15 min per week, spending another 15 min per week journaling their experiences.
Participants were asked to keep a record of their thoughts and experiences and to both
keep these as a document and bring these thoughts to the workshop at the end.
We found four researchers who were available to review how the device worked. Our
goal here was to invite these experts to explore with us and find out what questions to
ask participants [21].
We describe the specific methods we used during workshop 2 in the section “Second
Workshop” to maintain continuity and legibility of this work (Fig. 2).

Fig. 2. Touching the traces on the Kintsugi bowl with the Electronic Kintsugi boxes displaying
light and playing sound.

5 First Workshop: FabCafe Tokyo

Workshop 1 informed our work and set the scene for workshop 2. The workshop was
conducted in both English and Japanese, and participants could communicate in their
preferred language. We used a written questionnaire so participants could answer in
their preferred language. We briefly present workshop one and then move to reflect on
findings from workshops one and two.
After a brief demonstration of function, the Electronic Kintsugi was explored by
participants. They touched the traces with one, two or all fingers, and tried turning the
ceramics over, holding it in one hand or two. We explained “the output could be
anything, it could start your car, or feed your pet”.

Since participants were familiar with the interaction technique after exploring the
sound interaction, they approached the light interaction quite differently. Participants
knew how they could touch it, with one or several fingers, and they now focused on
lighter or harder touches, strokes, or resting their fingers on the traces. The light was
much more unpredictable than the sound. Whereas with the sound they were acting
almost as musicians, experimenting to find patterns and particular notes, with the light
it was more about getting a bigger or smaller reaction than about the nuances in between
these small or large bursts of light. One participant asked, “I want to know how much
it’s me that is controlling it and how much it is doing on its own”.

5.1 Findings

We highlight several responses here from the questionnaire, to inform future researchers
in this field who might be interested in working further with this.
• Encouraging senses and emotions
– Being able to handle the Kintsugi was a special experience, “There is a different
feel to a real Kintsugi. It’s rare to see the hitting of the device so profoundly.”
(P-1A) and “We’re often not given permission to touch traditional art. It feels
good to be encouraged to touch it.” (P-1E).
• An interest in other senses: taste, smell, and food
– One participant suggested that it be used as a bowl to eat from: “Japanese people
eat with bowls close to their mouth, so I want to see some sound installation when
someone is eating” (P-1A), and another suggested that it could be used as a cat or
dog food request device: “imagine the cat’s tongue licking the Kintsugi!” (P-1C).
• Light – Unpredictable but has potential
– One participant noted that the light reminded them of a starry sky and stated, “In
a larger, or aesthetically ordered or different setting (night), it would be very
soothing” (P-1C). Another participant was inspired and shared an idea “The
combination of the craft and the touch with the light feedback reminded me of the
challenges of regaining fine motor control in a finger after an accident. The focus
required and the tranquility of the lights may be a fun alternative physical therapy.”
(P-1E)
• Sound – Alive characteristics
– One participant remarked, “Craft has character, especially as it ages. How might
that character be represented as sound? I feel the sounds were lovely but not
aligned with the character of the craftwork. Or maybe it had juxtaposition of sound
quality and physical character which enhances the contrast between tradition and
technology.” (P-1F). Two participants related to the object in an anthropomorphic
way, stating “It was like the cup was telling me how he/she’s doing. Since Kintsugi
part is a past wound, sometimes I felt like it’s telling me it had pain.” (P-1E).

5.2 Findings Summary


The workshop provided us with some considerations about the role of art and objects
and potential interactivity from these objects. Participants were excited to play with art
and traditional craft based objects. They were fascinated by the light and sound output
and could extrapolate to imagine other interaction scenarios. They explored the aesthetic
interaction qualities and played the Kintsugi like an instrument, using expressive hand
gestures to explore the touch interaction. And they could reflect on the role of technology
and tradition and how we live our lives: “Developing a closer, more physical relationship
with the objects in our lives feels meaningful.” (P-1E).

6 Second Workshop: Copenhagen

To prepare for the second workshop, we asked participants to spend 20 min in silence
[22] to complete a written activity to gather their pre-workshop thoughts and feedback
prior to engaging in dialogue.
We used Kujala, Walsh, Nurkka, and Crisan’s [23] method of sentence completion
to extract these initial reactions. We instructed participants to answer quickly
(20 questions in 20 min); the beginning of each sentence was given and was then
completed by the participant in a way they saw fit. Kujala and Nurkka [24] used
categories of user values to classify questions. In Fig. 1, one can see the sentences
we defined, as per each value category. We tried to make a nearly even number of
positive and negative questions, and allowed extra space for comments if participants wished.

6.1 Sentence Completion Tool



A Likert scale [25] was used to determine participants’ reactions to the sound and light
interactions. We asked participants to rate the light and sound interaction. For light, we
asked “I found the light output to be:” and gave one end of the scale the value “Calming”
and the other end “Attention Seeking”. For sound, we asked the same, but added an
additional scale from “noise” to “music”.
We spent the remaining 2.5 hours engaged in a group discussion about their experiences,
comparing, contrasting, and exploring possible future interactions.

6.2 Findings of Workshop Two


We used mind mapping as a technique to map out the responses from the discussion and
journals [26]. We present here the results of the sentence completion as well as the
discussion and journals.

6.3 Sentence Completion

We compared the sentence completion responses sentence by sentence and by category.


The Electronic Kintsugi was described as “enjoyable, calming, interesting, and
different” in the one-word descriptions. The findings from participants, ordered by the
Sentence Completion Tool headlines [23], were:
General: Participants felt a sense of achievement when interacting with others and
felt connected to the device when it “reacted to my own and others touching it”.
General: Predictability. They were disappointed and frustrated with the light
interaction: “the light interaction was unpredictable, non-responsive and not interesting”. It
is noted here that in both workshops, the light was reported to be not as responsive as
the sound. Participants in both workshops reported that they were more fascinated
by the sound feedback, particularly because there were more nuances in the sound
than in the light.
Emotional: Participants described their emotional responses as “playfulness and
companionship, calming, joy and puzzled” and again highlighted their frustration with
the lights, describing them as “underwhelm(ing), disappoint(ing), and distanced”. Two
participants referenced social values and stated that their best experiences were while
playing with others.
Stimulation and epistemic: Participants described the changing soundscape and
mentioned their desire to use the device when someone asked about it.
Growth and self-actualization: Participants described both relaxation and concentration,
as well as creative thinking and social interaction, as outcomes of their interactions
with the Electronic Kintsugi.
Traditional values: Participants noted that, as an object in their home, it was “cute
and modern”, “playful and interactive” and that it “combined ceramics with playfulness”.
Finally, in the extra space provided, three responses were thought-provoking:
• I kept receipts in it and I liked how it became less precious and more functional
• I wonder if you were tracking my use
• It was a search into new creative possibilities.
The Likert scales gave us the below results, indicating that, while results varied, light
was generally thought to be more attention seeking than calming, sound was found to
be generally more calming than attention seeking, and sound was more musical than
noisy.

“I found the light output to be:” (Calming = 1, Attention Seeking = 10): average rating 5.75 (actual ratings: 8, 4, 4, 7)
“I found the sound output to be:” (Calming = 1, Attention Seeking = 10): average rating 3.75 (actual ratings: 3, 3, 7, 2)
Extra question for sound (Noise = 1, Music = 10): average rating 6.25 (actual ratings: 6, 5, 5, 9)
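As a quick arithmetic check, the reported averages follow directly from the four individual ratings on each 1–10 scale (the dictionary keys below are our own labels):

```python
# Recomputing the reported Likert averages from the individual
# ratings given by the four design researchers (scale 1-10).
ratings = {
    "light: calming(1)-attention seeking(10)": [8, 4, 4, 7],
    "sound: calming(1)-attention seeking(10)": [3, 3, 7, 2],
    "sound: noise(1)-music(10)": [6, 5, 5, 9],
}
averages = {q: sum(v) / len(v) for q, v in ratings.items()}
for question, avg in averages.items():
    print(f"{question}: {avg}")
# Yields 5.75, 3.75 and 6.25, matching the averages reported above.
```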

From the discussion and journaling, three primary categories of interest emerged:
(1) enhancing human connection through embedded or “magic” technology, (2) using
a craft based object in prompting personal reflection and development, and (3) exploring
transferable design principles of smart products with a device which has no defined
purpose, and which converges traditional craft and technology. In the accounts below,
participants focused primarily on the sound-based interaction, as they were not interested
in the light interaction and spent most of their time with sound (Fig. 3).

Fig. 3. The Electronic Kintsugi bowl with a design researcher; she is playing with the light as a
break from work.

7 Three Themes Identified

7.1 Enhancing Human Connection Through Embedded or “Magic” Technology


There were several accounts of how the Electronic Kintsugi sparked social connections
and interactions. Antonio had placed it in the kitchen, and he explained that while the
bowl on its own might not have sparked curiosity, the box did, and visitors asked what
it was and then wanted to play with it. Sandra was having an evening of entertaining
guests, and as they were finally leaving (she was tired), she stood in the doorway and
absent-mindedly touched the bowl as they were putting on their shoes. The guests
became immediately intrigued and asked questions and wanted to play with it, which
was both charming and exhausting, since, as Sandra explained, she was ready for them
to go home, but also happy to play and show them the bowl. For Henry, it was a social
life saver as he suddenly found himself spending time with his father in law who doesn’t
speak much English, and Henry doesn’t speak much Danish. The Electronic Kintsugi
came to the rescue as a medium they could explore together, without a need for verbal
language. Martin explained that he took it on the bus and it was “totally inappropriate”
there, it was loud and kept making screeching noises. He was frustrated with it, and
imagined if it was quiet and making nicer sounds as it often did (though, not on the bus)
then he could have asked others to join in on the playing.
The ‘magic’ of the object was intriguing to people who didn’t know what it was and
sparked both play and conversation, even, in Sandra’s case, when the guests should have
been leaving. It offered a needed social lubricant in Henry’s case and sparked ideas on
how to engage strangers on the bus for Martin. Having an everyday object with ‘magical’
and unexpected properties, without it being a gadget or being used for some other
purpose (a fancy remote, a communications device, etc.), seemed to be the key to
sparking this social interaction. Unexpected qualities of playfulness via a changing
soundscape were the right recipe for the Electronic Kintsugi.

7.2 Using an Everyday Object in Prompting Personal Reflection and Development
Our experts felt that an everyday object combining traditional craft and technology was
important, commenting that they “wanted to come back to it again, it levels up, it evolves
over time” (Martin) and “I love that it’s not intuitive, you have to spend time with it and
get to know it. It’s nice that it doesn’t have a defined purpose, somehow it’s good to just
have something nice and electronic in your home, especially with the copper tape, it
feels like a crafted aesthetic, you can see craft, and the time put into it, but you can’t see
code, so somehow this makes tangible the craft of the code” (Henry). Sandra likened it
to a “Tibetan singing bowl, you have to hit it just right and there’s a pleasure behind
controlling that energy”. And Martin continued, “The electronics force you into movement,
I’ve never done this with an Ikea bowl”.
Bringing together physical and digital materials, considering both the craft of the
object and the craft of the code, and considering the social surroundings that the object
inhabits were important aspects of creating a hybrid craft [16].
For us, it is the combination of these things which is a significant part of designing
for meaningful interactions and experiences when working with future smart everyday
products in the home.

7.3 The Role of an Object with a Non-defined Purpose

The fact that the purpose of the object was open-ended was well liked, and the
participants used this opportunity to explore the possibilities with it. Some of their comments
included “I love that it’s not intuitive, you have to spend time with it and get to know
it” (Martin) and “It was interesting, as a dancer, that I played a lot with the hand
movements and did improvised hand movements” (Sandra).

It was briefly discussed what it might be like to grow up with an object like this in
your home, instead of an iPad or TV, and how that might change your perceptions of
how you interact with the world, and come to appreciate objects. Sandra explained “I
prefer it as an ornament, something non-connected. It can be a companion, or a container,
such as for my receipts.” The combination of a non-defined interaction purpose with the
functionality of a common object, a bowl, seemed to work well to invite playful and
curious interactions.
While some experts poured water into the bowl to explore the sound, Antonio took
it a step further and ate his breakfast cereal from the bowl: “it made me aware of how
fast I was eating”. (Interestingly, participants in workshop one also suggested that it
could be nice to eat from the bowls.) The choice to use a bowl came from our fascination
with Kintsugi and the tendency there to repair bowls, and we learned that, as a starting
object for this exploration, a bowl has many inherent properties: it is something to eat
from and to store things in, a decorative object, a historical object; it is nice to hold, and
it exists in many cultures and many homes.
Creating an object with a non-defined purpose can be one way to encourage curiosity,
playfulness, and opportunities for the creation of meaningful or important moments in
one’s life, especially when there is a human-to-self (self-development) or human-to-human
(social) aspect. However, further interaction design would be necessary once an
object moves beyond being something with a non-defined purpose. In this work, our
focus on a non-defined purpose does not disregard designing interactions for a specific
context; rather, our focus is on designing interaction concepts at an earlier phase of
project development.

8 Discussion

It is worthwhile to revisit Borgmann (as described by Fallman [19]) here, who worried
that technology would “turn us into passive consumers, increasingly disengaged from
the world and from each other” [19]. Our aim with Electronic Kintsugi, and a focus on
designing for ambiguous interactions with everyday objects, is to move back towards
each other, towards engagement with familiar objects, towards creativity and playful‐
ness and that it is “not simply [a] neutral means for realizing human ends, but actively
help[s] to shape our experiences of the world” [19].
Despite work in academia developing tangible, non-screen devices or criticising IoT
(as presented earlier), the products which emerge on the market today are not abundantly
reflective of this. These products do not necessarily engage people on a human-to-human
or human-to-self level and instead often cater to fixing a small problem without
necessarily considering a more holistic impact. Cila, Smit, Giaccardi and Kröse [8] describe
the current approach to IoT as short-sighted and emphasize the potential role of
interaction design in new smart things. In our work, we expand on this and emphasize
a need for smart things to perhaps be rooted in craft to enhance meaning-making, to
utilize non-screen interaction, and to move towards facilitating human-to-human or
human-to-self exploration.

We further emphasize the role of a device with an undefined interaction purpose, as
opposed to the very specific devices emerging on the market today, such as smart
candles4 (controllable via app) or smart hairbrushes.5
Although we needed to use copper tape to achieve the conductivity, in the future we
would like to explore which material properties would allow a Kintsugi artist to
create something more conductive using the traditional precious metals. Even so, the
most significant aspect was the conceptual consideration of how one might interact with
an object which had been created by an artist but is otherwise an ‘everyday object’ (one
which we might find in our homes anyway, such as a bowl).
Returning to Cranny-Francis’ semefulness, we can see the physical, emotional,
intellectual, spiritual, social, and cultural aspects [18] in the Electronic Kintsugi. We
essentially augment a crafted object with technology, with the aim of creating an
enchanted [14] everyday object with a historical, crafted background which is open to
interpretation and explorative play. The role of an enchanted [14] everyday object is
especially important to consider in a world of increasing IoT gadgets. Considering a
future vision of connected everything, we feel it is important that we do not become too
focused on the technology, such as having RFIDs under our skin [27] or being laden
with smart tablets, smart watches or smart water bottles, but rather that we embrace
humanness.
We want to create devices which provoke thoughtful and critical reflection, and
engage people on a tangible level; not just a screen asking if you’ve been mindful today
[28]. When considering the design of new ‘smart’ objects, we should perhaps ask, “does
it need to be connected, and if so, why?”, or “how can I enhance the existing values in
this everyday object?” A door handle, for example, doesn’t just open a door; it is the
literal door to coming home from work, relaxing after a long day, seeing your family
again, and more.6 The affordances inherent in everyday objects are many, and it is our
job as interaction designers to not only invent new technologies and uses but to consider
how to support these values and avoid turning the objects in our world into cloud-
connected gadgets.
Electronic Kintsugi embraces new technology and established craft practices,
emphasizing curiosity and playfulness while facilitating interaction between people and
the self. Furthermore, we felt that the aspect of craft was a key identifier in what made
the everyday object special. The history and delicate quality of the Kintsugi prompted
varied reactions: the participants in Japan were intrigued that they were allowed to play
with a piece of art, and the participants in Denmark were eager to engage with, and learn
more about, Kintsugi. Our primary concern was the investigation of a non-screen, tangible
everyday object coming from a place of craft, and in future work we hope to further
investigate how we could work with a Kintsugi artist to create a fully functional piece
of Electronic Kintsugi, with capacitive traces in the piece.

4 https://www.ludela.com/.
5 https://www.kerastase-usa.com/connected-brush.
6 From an interview with designer Carl Alviani (http://meaningfuldevices.vanessacarpenter.com/2017/08/10/anything-but-personal-is-a-failure/).

9 Conclusion

In this work, we have presented Electronic Kintsugi: an exploration of how an everyday
object (a bowl), in combination with artisanal craft (Kintsugi) and electronics (conductive
sensing), could result in more human-to-human connection and human-to-self
development. Through two workshops, one in Japan with a Kintsugi artist and participants,
and one in Denmark with design research experts, we explored the properties of
this Electronic Kintsugi, an interactive object with no defined purpose and two main
interaction outputs - sound and light. We found that sound as feedback was of significant
interest due to its nuanced nature and reactiveness, and between workshops, the sound
was programmed to evolve over time with use.
Using copper tape, we augment a traditional crafted object, namely Kintsugi, with
electronics, and call it Electronic Kintsugi, creating an open platform for play,
exploration and development. In future work, we hope to continue working with Kintsugi
artists to find a material which can be used in the craft practice while also being
conductive enough for Electronic Kintsugi.
We identified three categories of reflection from our studies with participants, which
future smart products can look to in order to enable more meaningful interactions
between human and human, and between human and device. These categories are: (1) enhancing
human connection through embedded or “magic” technology, (2) using everyday objects
to prompt personal reflection and development, and (3) exploring transferable design
principles of smart products with a device of undefined purpose which converges
traditional craft and technology.
Finally, we discussed that, as interaction designers, we would like to focus on
embracing humanness in future technology designs, looking to the values and
affordances inherent in everyday objects to bring these out and to design for these
moments in our lives.

Acknowledgment. We are grateful to FabCafe Tokyo, Kurosawa-San, the participants of
workshop one, the design experts of workshop two, and all the user testers and helpers along the
way.

References

1. Zheng, C., Nitsche, M.: Combining practices in craft and design. In: Proceedings of the Tenth
International Conference on Tangible, Embedded, and Embodied Interaction (TEI 2017), pp.
331–340. ACM, New York (2017). https://doi.org/10.1145/3024969.3024973
2. Zoran, A., Buechley, L.: Hybrid reassemblage: an exploration of craft, digital fabrication and
artifact uniqueness. Leonardo 46(1), 4–10 (2013)
3. Lingel, J.: The poetics of socio-technical space: evaluating the internet of things through craft.
In: Proceedings of Conference on Human Factors in Computing Systems (CHI 2016). ACM,
New York (2016). https://doi.org/10.1145/2858036.2858399

4. Schoemann, S., Nitsche, M.: Needle as input: exploring practice and materiality when crafting
becomes computing. In: Proceedings of the Eleventh International Conference on Tangible,
Embedded, and Embodied Interaction (TEI 2017). ACM, New York (2017). https://doi.org/
10.1145/3024969.3024999
5. Hogan, T., Hornecker, E.: Feel it! See it! Hear it! Probing tangible interaction and data
representational modality. In: Proceedings of DRS 2016, Design Research Society 50th
Anniversary Conference, Brighton, UK (2016)
6. Kettley, S., Sadkowska, A., Lucas, R.: Tangibility in e-textile participatory service design
with mental health participants. In: Proceedings of DRS 2016, Design Research Society 50th
Anniversary Conference, Brighton, UK (2016)
7. Mols, I., van den Hoven, E., Eggen, B.: Informing design for reflection: an overview of current
everyday practices. In: Proceedings of the 9th Nordic Conference on Human–Computer
Interaction (NordiCHI 2016). ACM, New York (2016). https://doi.org/
10.1145/2971485.2971494
8. Cila, N., Smit, I., Giaccardi, E., Kröse, B.: Products as agents: metaphors for designing the
products of the IoT age. In: Proceedings of the 2017 CHI Conference on Human Factors in
Computing Systems (CHI 2017), pp. 448–459. ACM, New York (2017). https://doi.org/
10.1145/3025453.3025797
9. Akama, Y., Light, A., Bowen, S.: Mindfulness and technology: traces of a middle way. In:
Proceedings of the 2017 Conference on Designing Interactive Systems (DIS 2017), pp. 345–
355. ACM, New York (2017). https://doi.org/10.1145/3064663.3064752
10. Mols, I., van den Hoven, E., Eggen, B.: Balance, cogito and dott: exploring media modalities
for everyday-life reflection. In: Proceedings of the Eleventh International Conference on
Tangible, Embedded, and Embodied Interaction (TEI 2017), pp. 427–433. ACM, New York
(2017). https://doi.org/10.1145/3024969.3025069
11. Wakkary, R., Oogjes, D., Hauser, S., Lin, H., Cao, C., Ma, L., Duel, T.: Morse things: a design
inquiry into the gap between things and us. In: Proceedings of the 2017 Conference on
Designing Interactive Systems (DIS 2017), pp. 503–514. ACM, New York (2017). https://
doi.org/10.1145/3064663.3064734
12. Núñez Pacheco, C., Loke, L.: Tacit narratives: surfacing aesthetic meaning by using wearable
props and focusing. In: Proceedings of the Eleventh International Conference on Tangible,
Embedded, and Embodied Interaction (TEI 2017), pp. 233–242. ACM, New York (2017).
https://doi.org/10.1145/3024969.3024979
13. Tsaknaki, V., Fernaeus, Y.: Expanding on wabi-sabi as a design resource in HCI. In:
Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI
2016), pp. 5970–5983. ACM, New York (2016). https://doi.org/10.1145/2858036.2858459
14. Rose, D.: Enchanted Objects: Design, Human Desire, and the Internet of Things. Simon and
Schuster, New York (2014)
15. Tsaknaki, V., Fernaeus, Y., Schaub, M.: Leather as a material for crafting interactive and
physical artifacts. In: Proceedings of the 2014 Designing Interactive Systems (DIS 2014).
ACM, New York (2014). https://doi.org/10.1145/2598510.2598574
16. Tsaknaki, V., Fernaeus, Y., Rapp, E., Belenguer, J.S.: Articulating challenges of hybrid
crafting for the case of interactive silversmith practice. In: Proceedings of the 2017
Conference on Designing Interactive Systems (DIS 2017), pp. 1187–1200. ACM, New York
(2017). https://doi.org/10.1145/3064663.3064718
17. Nordrum, A.: Popular Internet of Things Forecast of 50 Billion Devices by 2020 Is Outdated
(2016). https://spectrum.ieee.org/tech-talk/telecom/internet/popular-internet-of-things-
forecast-of-50-billion-devices-by-2020-is-outdated

18. Cranny-Francis, A.: Semefulness: a social semiotics of touch. Soc. Semiot. 21(4), 463–481
(2011). https://doi.org/10.1080/10350330.2011.591993
19. Fallman, D.: The new good: exploring the potential of philosophy of technology to contribute
to human–computer interaction. In: Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems (CHI 2011), pp. 1051–1060. ACM, New York (2011). https://
doi.org/10.1145/1978942.1979099
20. Hobye, M.: Designing for Homo Explorens: Open Social Play in Performative Frames, pp.
16–17. Malmö University, Malmö (2014)
21. Bødker, S.: When second wave HCI meets third wave challenges. In: Mørch, A., Morgan,
K., Bratteteig, T., Ghosh, G., Svanaes, D. (eds.) Proceedings of the 4th Nordic Conference
on Human–Computer Interaction: Changing Roles (NordiCHI 2006), pp. 1–8. ACM, New
York (2006). https://doi.org/10.1145/1182475.1182476
22. Martin, B., Hanington, B.: Universal Methods of Design. Rockport Publishers, Beverly
(2012)
23. Kujala, S., Walsh, T., Nurkka, P., Crisan, M.: Sentence completion for understanding users
and evaluating user experience. Interact. Comput. 26(3), 238–255 (2014). https://doi.org/
10.1093/iwc/iwt036
24. Kujala, S., Nurkka, P.: Identifying user values for an activating game for children. In:
Lugmayr, A., Franssila, H., Sotamaa, O., Näränen, P., Vanhala, J. (eds.) Proceedings of the
13th International MindTrek Conference: Everyday Life in the Ubiquitous Era (MindTrek
2009), pp. 98–105. ACM, New York (2009). https://doi.org/10.1145/1621841.1621860
25. Brooke, J.: SUS: a quick and dirty usability scale. In: Jordan, P., Thomas, B., Weerdmeester,
B.A., McClelland, I. (eds.) Usability Evaluation in Industry, pp. 189–194. Taylor & Francis,
London (1996)
26. Wheeldon, J., Faubert, J.: Framing experience: concept maps, mind maps, and data collection
in qualitative research. Int. J. Qual. Methods. (2009). https://doi.org/
10.1177/160940690900800307
27. Astor, M.: Microchip implants for employees? One company says yes. New York Times
(2017). https://www.nytimes.com/2017/07/25/technology/microchips-wisconsin-company-
employees.html
28. Newman, K.M.: Free Mindfulness Apps Worthy of Your Attention. Mindful (2017). https://
www.mindful.org/free-mindfulness-apps-worthy-of-your-attention/
A Novel and Scalable Naming Strategy
for IoT Scenarios

Alejandro Gómez-Cárdenas(✉), Xavi Masip-Bruin,
Eva Marín-Tordera, and Sarang Kahvazadeh

Advanced Network Architectures Lab (CRAAX), Universitat Politècnica de Catalunya (UPC),
Barcelona, Spain
{alejandg,xmasip,eva,skahvaza}@ac.upc.edu

Abstract. Fog-to-Cloud (F2C) is a novel paradigm aimed at increasing the
benefits brought by the growing Internet-of-Things (IoT) device population at
the edge of the network. F2C is intended to manage the available resources from
the core to the edge of the network, allowing services to choose and use either a
specific cloud or fog offer or a combination of both. Recognizing the key benefits
brought by F2C systems, such as low latency for real-time services, location-aware
services, mobility support and the possibility to process data close to
where they are generated, research efforts are being made towards the creation of
a widely accepted F2C architecture. However, in order to achieve the desired F2C
control framework, many open challenges must be solved. In this paper, we
address the identity management challenges and propose an Identity Management
System (IDMS) that is based on the fragmentation of network resource IDs.
In our approach, we divide the IDs into smaller fragments and then, when two
nodes connect, they use a portion of their full ID (n fragments) for mutual
identification. The conducted experiments have shown that an important reduction in
both the query execution times and the space required to store IDs can be
achieved when our IDMS is applied.

Keywords: IDMS · Identity management · Fog-to-Cloud · Resource identity

1 Introduction

The Internet of Things (IoT) is a communication paradigm that allows all kinds of objects
to connect to the Internet. According to [1], by 2020 the number of connected
devices will reach 50 billion, that is, 6.58 times the estimated world population
for the same year. Aligned with the constant growth of the IoT device population, the
amount of data they generate at the edge of the network is growing as well. Every day,
large volumes of data in all formats (video, pictures, audio, plain text, among others)
are generated and then moved to cloud datacenters to be processed. In fact, it is estimated
that in the near future a single autonomous car will produce up to 4 TB of data on a daily
basis [2].
It is widely accepted that useful information can be extracted from data, using cloud-
based data mining techniques. Nevertheless, moving large amounts of data from the

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 122–133, 2019.
https://doi.org/10.1007/978-3-030-02686-8_10

edge to the datacenters located at the core of the network may incur significant overhead
in terms of time, network throughput, energy consumption and cost [3]. To overcome
these issues, novel computing paradigms such as fog computing have emerged at the
edge of the network.
Fog computing is a paradigm intended to extend cloud computing capacities to the
edge of the network, allowing data to be processed and aggregated close to where it is
generated [4]. The fact that Fog computing is deployed close to the end users devices
facilitates some key characteristics for IoT services and applications, such as for
example, low-latency, mobility, and location-awareness [5]. Indeed, Fog computing
emerged to collaborate with cloud computing rather than to compete with it.
Nowadays, the combined fog-to-cloud (F2C) paradigm [6] has been proposed to ease
service execution in a hierarchical fashion across fog, cloud, or a combination of both.
There are two ongoing efforts to deploy such a hierarchical and combined F2C system:
the OpenFog Consortium [7] and the mF2C project [8]. At an early stage, the mF2C
project proposed a hierarchical and layered architecture in which the whole set of
resources can be executed in the cloud, in the fog, or in a combination of both. In mF2C,
distributed fog nodes can be utilized for delay-sensitive services that demand low
latency and processing at the edge of the network while, in parallel, the cloud can be
used for massive and long-term processing and storage.
In a realistic scenario, F2C is shown as a hierarchical three-tier architecture [9]
where the most constrained devices are located at the lowest tier. The middle tier is
composed of nodes that act as aggregators of the available resources of the lower tier
(see Fig. 1) and finally, at the top of the hierarchy, the cloud datacenter is located.

Fig. 1. Fog-to-Cloud general topology.

Certainly, the F2C resource continuum must be managed by a control strategy (a sort
of control plane), but because there are still many challenges to be solved, the control
concept as a whole is still an open issue for fog and, surely, for F2C systems.
One of the challenges to be addressed in F2C systems is the lack of an Identity
Management System that meets the specific paradigm requirements. In F2C, the Identity
Management System (IDMS) is the set of functions aiming to provide a mechanism to
assign and, in general, to manage the resource identities of both physical and virtual
devices. According to [10], the management of resource identifiers at the edge is
very important for programming, addressing, thing identification, data communication,
authentication, and security. Thus, the IDMS is a key component of the F2C control
plane framework.
In short, some of the features an IDMS should provide in an F2C system are: (i) the
capability to scale smoothly in parallel with the network; (ii) support for device
mobility without loss of identity; (iii) security and privacy protection; (iv)
interoperability among different service providers; and (v) support for highly
dynamic network topologies.
In this paper, we focus on the IDMS challenge and propose a solution that addresses
the aforementioned system requirements. The key contributions of our work compared
with other available solutions include mobility support, that is, the capability
of edge devices to keep their identifiers even when they are on the move. Such ID
persistence eases the mutual identification and authentication processes between a node
and an aggregator node in future interactions. Likewise, the IDMS strategy we
propose allows the size of the identifiers that resources use in the network to be
adjusted without losing the identity uniqueness property. Finally, unlike other
solutions, our proposal is focused on reducing the computational load required to
identify the resources in the network. This undoubtedly benefits the entire network,
especially the lowest layer in the hierarchy, where the resources are very constrained
and therefore a more efficient management of them is required.
The remainder of this paper is organized as follows. In Sect. 2 other IDMS solutions
are reviewed. In Sect. 3 our IDMS proposal is described. The evaluation and results are
presented in Sect. 4 and finally, in Sect. 5 the conclusions and future work are discussed.

2 Related Work

In computer networks, the name and the address of a device stand for two different
things. The general distinction between a name and an address is that a name can remain
with an entity even when that entity is mobile and moves among different locations (i.e.
addresses) [11]. From the IDMS perspective, the mobility support offered by F2C means
that the identifiers assigned to the network resources are persistent, i.e., they remain even
if attributes such as the location of the device change. Therefore, the usage of
addressing techniques to manage resource identity in F2C is not the proper solution.
Rather, an IDMS that supports both static and mobile nodes in the network must
be considered.
Under this premise, in this section we pay special attention to IDMS solutions whose
targets include IoT devices. The rationale for this decision is that, generally speaking,
IoT puts together static and mobile devices; thus, supporting all of them is mandatory
in any solution to be deployed in the IoT arena.
In [12], authors present a smart home operating system for the IoT named EdgeOSH.
In EdgeOSH, the architecture component in charge of managing device identities
is the naming module. This module allocates unique, human-friendly names describing
the location (where), role (who) and data description (what) of the devices, for example,
LivingRoom.CellingLight.Bulb2. These names are used by the operating system to
manage services, data and devices.

Nevertheless, the way in which EdgeOSH manages device identities presents
several drawbacks that prevent it from being used in F2C environments. For example,
human-meaningful names make it easier to disclose sensitive information and to access
unauthorized network resources through masquerade attacks. Another issue refers to the
fact that it is not prepared to support the tremendously large number of devices expected
in F2C; that is, it is not scalable. As a consequence, the authors concluded that an
efficient IDMS for the IoT is still an open problem and further investigation is required.
Motivated by the need of an identity information service where the provider of the
service is unable to access the information that passes through their servers, authors in
[13] proposed BlindIdM, an Identity Management as a Service (IDaaS) model with a
focus on data privacy protection. In such a model, three main types of actors are defined:
users, service providers and identity providers. The user is a node in the network with
the identity information of a set of entities and its goal is to transfer such information to
the service provider in a secure fashion. The authors claim that through encryption
techniques, BlindIdM permits to send the identity information from the user to the
service provider without the identity provider being able to read it. To achieve this, the
information is initially encrypted by the user, then re-encrypted by the identity provider
and finally decrypted by the service provider. The results obtained during the evaluation
of the proposal show acceptable times for the three cryptographic operations; however,
it is important to note that these operations were performed by powerful cloud data
centers. Given the decentralized nature of the F2C paradigm, it is likely that some of the
key functions of the control plane will be executed at the edge of the network, including
the identity management service. In this sense, the three cryptographic operations
proposed by the authors may cause an important bottleneck, degrading the overall
system quality of service (QoS) in terms of response times.
In [14], the authors introduce a user-centric identity management framework for IoT.
They propose the creation of a global identity provider (gIdP), responsible for
maintaining global identities. The gIdP is used by the service providers (SPs) to generate
local identities. However, this proposal has two major drawbacks: (i) the global identity
provider represents a single point of failure in the system, and such centralization
contradicts the F2C paradigm; (ii) the proposed framework is intended to provide
identities to users rather than to devices. In F2C, regardless of whether several devices
belong to the same person, every node in the network must have its own unique identifier;
thus, an object-centric approach should be applied.
The work in [15] presents a machine-to-machine IDMS that allows network devices
to generate multiple pseudonyms to be used as identifiers in different applications. It
uses anonymous attestation to verify the pseudonyms, i.e., an interactive method for one
party to prove to another that a pseudonym is valid and should be accepted, without
revealing anything other than the validity of the pseudonym. The problem of
implementing this identity management strategy in F2C systems is that anonymous
attestation involves a set of complex mathematical expressions that the nodes have to
solve in order to validate the identity of other nodes. Thus, the calculations devoted to
validating the identity of other devices will add a significant delay to connection
establishment between nodes, mainly because of the low computational power of the
devices at the lowest F2C layer.

3 IDMS Proposal

Our IDMS proposal partitions globally unique IDs into a set of smaller fragments
(fg). This partitioning allows network resources to be identified by a fraction of their
ID instead of the full identifier, according to their position in the hierarchical F2C
network, as shown in Fig. 2.

Fig. 2. Identifier fragmentation.

First of all, we define the hierarchical F2C network connection between two nodes.
In F2C, the connection type is given by the node at the higher hierarchical level.
According to [9], three layers are identified at this early stage of the F2C system.
However, this three-layer F2C system does not consider inter-service-provider
interaction; therefore, we assume a fourth layer, as follows:
– Edge: This F2C connection covers all connections among resources (physical
devices or virtual entities) under the same fog node. The resources that form an area
at the edge layer are located geographically close to each other. For example, an area
at the edge can be a hospital building or a school.
– Fog: The fog layer connection includes the connections among fog nodes and the
resources they aggregate. An example of this connection layer is a connection
between a sensor and another device grouped under different fog nodes.
– Cloud: This connection layer includes all resource connections established under the
same service provider. The main difference from the fog layer connection is that
resources may be located geographically far from each other, for example, resources
in different cities connected by the same Internet Service Provider.
– Global: This connection layer covers all connections among resources in a global
scope. In this context, the resources may or may not be located close to each other,
and thus inter-service-provider connectivity plays a key role, for example, a
connection between two smart cities served by two service providers.
Figure 3 presents the four described F2C connection layers and their border
specifications. Since the number of layers in the F2C architecture may change, the set of
F2C connection layers and the ID fragmentation policy may change as well, so as to
remain properly aligned with the number of F2C layers. Therefore, it is worth
highlighting that this is a simple approach.

Fig. 3. Hierarchical F2C network connections.

Once all the F2C connection types have been defined for the F2C network topology,
we divide the resource identifiers into n parts, where n is the number of F2C connection
layers defined in the F2C system. Now, every time a connection between two nodes is
established in the network, the nodes use a fraction of the identifier rather than the full
identifier for mutual identification. The number of fragments to be used in each
connection depends on the node at the higher hierarchical level. For example, in the F2C
network topology illustrated in Fig. 3, when device (b) connects to fog node #2, that
connection is set as a fog connection; then, only two fragments of the global identifier
are utilized during the identification process.
In fact, from the F2C connection and topological perspective, nodes located at higher
layers need to use more ID fragments, and consequently, the ID utilized during
connections with other nodes will be longer. The reason for this is that nodes in higher
layers have more devices as children; therefore, to be able to identify each of these
devices, longer identifier fragments are required.
Regarding the division of identifiers into fragments, we note that the length of the
fragments may vary according to different use cases and implementation needs.
The lowest layer in an F2C system is the IoT layer. In the IoT layer, the length of the first
fragment would depend on the maximum number of resource IDs that a fog node can
store in its cache during a given period of time, that is, the identifier cache size. Larger
identifier caches in the fog nodes entail larger identifier fragments. IoT devices
may have limited resources; therefore, small cache sizes can be expected in this layer.
Fog nodes can play a key role in adjusting the ID fragment length so that collision
problems do not arise. Collision problems in naming are addressed in [16–18]. In the
proposed identity management, a collision occurs when two or more resources in an F2C
connection use the same identifier. Thus, since the purpose of IDs is to identify a
resource unambiguously, the collision probability must be reduced.
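Because truncated identifiers share a much smaller name space, the collision risk can be estimated with the standard birthday-bound approximation. The sketch below is our own illustration rather than part of the proposal; it assumes identifiers are uniformly random strings over the hexadecimal charset, one character per byte, as in the experiment of Sect. 4.

```python
import math

def collision_probability(num_ids: int, id_chars: int) -> float:
    """Birthday-bound estimate of the probability that at least two of
    num_ids random hexadecimal identifiers of id_chars characters collide."""
    space = 16 ** id_chars  # number of distinct hex identifiers
    return 1.0 - math.exp(-num_ids * (num_ids - 1) / (2.0 * space))

# For a fog node caching 1,000 IDs, 4-character fragments collide almost
# surely, while 16-character fragments make collisions negligible.
for chars in (4, 8, 16):
    print(chars, collision_probability(1000, chars))
```

An estimate of this kind is one way a fog node could pick a first-fragment length that matches its identifier cache size.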

In order to enhance IDMS security and privacy, the full resource identifier is neither
propagated nor stored throughout the network; it is only known by: (i) the resource to
which the ID belongs; (ii) the fog node, as long as the resource is connected to the F2C
network through it; and (iii) other resources in a global connection that require the full
resource ID for proper identification. In short, preventing collisions during the
identification process is the reason that drives nodes in a global connection to use their full
ID instead of a fraction of it.
In our proposal, fog nodes play a key role because they perform ID fragmentation
and share the required resource ID fragments with other nodes according to the F2C
connection layers of the F2C system.

4 Evaluation and Results

In this section we describe the experiment used to validate our proposal and the
results obtained. We have compared the storage required to hold the resource identifiers
and the query execution times when the resources use their full identifier in the network
and when they use a fraction of it; hence, two parameters have been considered during
the evaluation.
In F2C, the resources grouped in the lowest layer of the network hierarchy will be
the most challenging to identify. Such complexity is caused by the tremendous number
of devices concentrated at the bottom of the network topology (user devices, sensor
networks and other IoT artifacts), the lack of control that the service provider will have
over those devices, and the highly dynamic network topology caused by the inherent
mobility of many devices. Thus, recognizing the aforementioned as fact, in this section
we focus on the IoT layer, evaluating the performance of our proposal when using
the first ID fragment.

4.1 Experiment Description

In the conducted experiment, we have used a Raspberry Pi 3 Model B. This device
integrates an ARM 1.2 GHz quad-core processor and 1 GB of RAM. The reason
for using this device is that we consider its specifications to be the minimum hardware
requirements that a device should meet in order to be considered for the fog node role
in an F2C system.
The software we preinstalled on the Raspberry Pi was Ubuntu 16.04 as the operating
system and a SQL Database Management System (DBMS). Subsequently, we created
five databases and filled each with a million synthetic resource identifiers. The length
of the resource identifiers in the first database was set to 128 bytes (according to the
length used in [19]). This first database was the one with the full identifiers. The next
four databases stored truncated versions of the identifiers in the first database, cut at 32,
16, 8 and 4 bytes, respectively. In all cases, the identifiers were generated using only
the hexadecimal charset.
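The database setup can be reproduced along the following lines. The paper does not name the DBMS, the table schema or the ID generator, so sqlite3, the `resources` table and `secrets.token_hex` below are our assumptions, and the row count is reduced for illustration.

```python
import secrets
import sqlite3

def build_id_table(db_path: str, id_chars: int, count: int) -> int:
    """Fill a database with random hexadecimal resource identifiers of
    id_chars characters each; returns the number of rows stored."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS resources (id TEXT PRIMARY KEY)")
    # Hexadecimal charset only, as in the experiment; duplicates are dropped.
    ids = {secrets.token_hex(id_chars)[:id_chars] for _ in range(count)}
    conn.executemany("INSERT OR IGNORE INTO resources VALUES (?)",
                     ((i,) for i in ids))
    conn.commit()
    stored = conn.execute("SELECT COUNT(*) FROM resources").fetchone()[0]
    conn.close()
    return stored

# One database per identifier length (the paper stored 1,000,000 IDs each).
for length in (128, 32, 16, 8, 4):
    build_id_table(f"ids_{length}.db", length, 1000)
```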

4.2 Used Storage


In F2C, the IoT layer is the one with the most limited resources. In fact, many of the
devices that operate in the lowest layer do not even have the hardware resources
necessary to process the data they generate; therefore, effective resource management
is a must.
In this sense, storage is one of the most constrained aspects of the devices in the
IoT layer. An F2C framework that requires excessive storage capacity for the data
generated at runtime may rule out a large number of devices as fog nodes, causing, in
the worst scenario, the existing fog nodes to reject new connections because they are
overloaded.
The storage required to hold the resource IDs in the fog nodes is the first parameter
we have evaluated. The results obtained during the validation (Table 1) show that
truncating the identifiers that resources use in the IoT layer reduces the disk space
required to store them.

Table 1. Database sizes

Database    Size (MB)   %
128 Bytes   162.17      100.00
32 Bytes    67.09       41.37
16 Bytes    51.08       31.50
8 Bytes     42.08       25.95
4 Bytes     37.06       22.85

Table 1 shows the size in megabytes of the databases described previously. The
right-hand column presents the percentage of space required by the truncated databases
with respect to the database that stores the full resource identifiers, that is, the 128-byte
identifiers.
It can be highlighted from the table that the difference in megabytes between the
databases with identifiers of 8 and 4 bytes is minimal, even though the identifiers stored
in the former are larger. This is rooted in the fact that the size of the indexes the DBMS
uses is not a function of the length of the fields in the tables.
In all cases, the disk space required to store the identifier fragments is between
58.63% and 77.15% less than the space needed to store the full identifiers.
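As a quick arithmetic check, the reported range follows directly from the Table 1 sizes (in MB):

```python
# Database sizes in MB, copied from Table 1.
sizes = {128: 162.17, 32: 67.09, 16: 51.08, 8: 42.08, 4: 37.06}
full = sizes[128]
for length in (32, 16, 8, 4):
    saving = (1 - sizes[length] / full) * 100
    print(f"{length}-byte IDs use {saving:.2f}% less space than full IDs")
# The savings range from 58.63% (32-byte IDs) to 77.15% (4-byte IDs).
```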

4.3 Query Times


One of the main advantages that the F2C paradigm offers is the possibility of executing
applications and services with lower delay than cloud computing. This opens the
door to the development and deployment of all kinds of novel services that require
real-time responses, such as e-health services, online videogames, earthquake alarm
triggers, etc. To achieve this goal, it is imperative that the individual components
that make up the F2C framework are highly efficient and avoid adding delays to the
internal processes.
In the F2C framework, the IDMS component should be able to identify the resources
quickly enough that devices on the move can switch among different fog nodes without
interrupting their activities. This identification process includes the database
lookup task. In this sense, our proposal aims at reducing the database lookup times
by reducing the amount of information that the fog nodes store.

130 A. Gómez-Cárdenas et al.
In the validation phase, we have used the databases described in Sect. 4.1 to
measure the lookup times. We measured, ten times, the time required to fetch among
200, 400, 600, 800 and 1,000 thousand records for each database, and then calculated
the averages of the results obtained (Table 2).
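The measurement procedure can be sketched roughly as follows. This is a simplified illustration using SQLite; the paper does not state which DBMS was used, and the table schema and function names here are our invention.

```python
import sqlite3
import time

def build_db(ids):
    # Hypothetical schema: a single table keyed by the (possibly
    # truncated) resource identifier.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resources (id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO resources VALUES (?)", ((i,) for i in ids))
    conn.commit()
    return conn

def avg_lookup_time(conn, ids, runs=10):
    # Average, over `runs` repetitions, of the time needed to fetch
    # every identifier with an indexed point query.
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        for rid in ids:
            conn.execute("SELECT 1 FROM resources WHERE id = ?", (rid,)).fetchone()
        total += time.perf_counter() - start
    return total / runs

ids = [format(i, "032x") for i in range(1000)]  # toy 32-character IDs
conn = build_db(ids)
print(f"average lookup time: {avg_lookup_time(conn, ids):.4f} s")
```

Repeating the same comparison across databases whose IDs are truncated to different lengths reproduces the shape of the experiment, though the absolute numbers depend entirely on the DBMS and hardware used.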

Table 2. Queries execution times


ID length IDs in the Fog Node (thousands)
200 400 600 800 1,000
128 Bytes 2.97 6.29 9.62 12.93 16.68
100.00% 100.00% 100.00% 100.00% 100.00%
32 Bytes 1.51 2.49 3.97 4.95 7.55
50.87% 39.65% 41.27% 38.28% 45.24%
16 Bytes 1.29 2.34 3.43 4.92 6.34
43.30% 37.24% 35.67% 38.01% 38.01%
8 Bytes 1.26 2.20 2.93 4.42 5.52
42.51% 34.90% 30.43% 34.16% 33.10%
4 Bytes 1.16 1.91 3.14 3.98 5.02
38.99% 30.31% 32.62% 30.80% 30.11%

Table 2 and Fig. 4 summarize the results obtained. For the sake of comparison,
percentages relative to the first database are also included in Table 2. As can be
observed, using a fraction of the full resource identifier significantly reduces the
time required to search for an item in the database. By using a quarter of the
device name, our proposal has shown a reduction of up to 49.13% in the search time.
In fact, a 32-byte ID is still a large identifier for the lower F2C layer, which
means that the ID length can be reduced even further, and with it the search time.

Fig. 4. Queries execution times.

A Novel and Scalable Naming Strategy for IoT Scenarios 131
It is worth noting that, in general, the times obtained when using 8- and 4-byte
identifiers are very similar. This means that the time behaves exponentially, which
is explained by the management of indexes and primary keys used by the DBMS to
improve the data retrieval process.
In Fig. 4, the query execution times are presented graphically. The blue bars
represent the lookup times in the database that stores the full resource identifiers.
It can easily be observed that in all cases the times required to search in that
database are considerably longer than the query execution times when the resources
use a fraction of their full ID. The figure also shows the exponential behavior of
the query execution times more clearly, a trend that becomes more evident as the
volume of data to be handled increases.
From the results shown in Table 2 and Fig. 4, we can conclude that when the edge
devices use a fraction of their full identifier instead of the complete version, the
lookup time decreases significantly (between 54.76% and 69.89% for large volumes of
data), all without affecting the ID uniqueness property, that is, while keeping a
very low collision probability.
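The "very low collision probability" can be made concrete with the standard birthday-bound estimate. This is a back-of-the-envelope check, not a calculation from the paper; it assumes the fragments behave as uniformly distributed raw bytes (reasonable for hash-derived identifiers) and that uniqueness only needs to hold within one fog node's scope.

```python
import math

def collision_probability(n_ids: int, fragment_bytes: int) -> float:
    # Birthday-bound approximation: n identifiers drawn uniformly from a
    # space of N = 256**fragment_bytes values collide with probability
    # P ~= 1 - exp(-n^2 / (2N)).
    space = 256 ** fragment_bytes
    return 1.0 - math.exp(-(n_ids ** 2) / (2.0 * space))

# With a thousand resources under a single fog-node scope, even a 4-byte
# fragment keeps collisions unlikely, and 8 bytes makes them negligible.
for k in (4, 8, 16):
    p = collision_probability(1_000, k)
    print(f"{k}-byte fragment, 1,000 IDs in scope: P(collision) ~ {p:.3e}")
```

The same formula also shows why the fragment length must grow with the connection scope: the wider the scope, the larger n becomes, and the more bytes are needed to keep the bound small.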

5 Conclusions and Future Work

The F2C computing paradigm has arisen as a novel solution that aims both to manage
the resource continuum from the edge of the network to the cloud datacenter and to
overcome some of the cloud's inherent limitations, for instance by offering remote
resources at the edge with reduced latency for delay-sensitive services that require
real-time responses. However, there is still a list of open challenges that must be
addressed before a deployable F2C framework can be realized. One of those challenges
is the management of resource identities in the network, especially in the lower
hierarchical layer, where most of those resources will be concentrated.
In this paper, we propose a strategy to manage the identity of the resources that
consists of splitting the unique global resource ID into smaller fragments. Each time
a connection to a resource is established, the fog node that aggregates the resource
to the network determines the connection scope and, thereafter, the number of
fragments required for mutual unambiguous identification.
The results obtained during the validation phase show that implementing our proposal
reduces both the disk space required to store the resource identifiers in the fog
nodes and the query execution times, thereby achieving a more efficient use of
resources in the IoT layer and streamlining the resource identification process.
Future work on this topic includes implementing the proposal in a real scenario to
validate its effectiveness in the whole F2C environment, and proposing an algorithm
to determine the optimal fragment lengths for each level in the network hierarchy.

Acknowledgment. This work is supported by the H2020 mF2C project (730929), by the
Spanish Ministry of Economy and Competitiveness and the European Regional Development
Fund under contract TEC2015-66220-R (MINECO/FEDER), and, for Alejandro
Gómez-Cárdenas, by the Consejo Nacional de Ciencia y Tecnología de los Estados Unidos
Mexicanos (CONACyT) under Grant No. 411640.

The IoT and Unpacking the Heffalump's Trunk

Joseph Lindley ✉, Paul Coulton, and Rachel Cooper

Imagination, Lancaster University, Lancaster, UK
{j.lindley,p.coulton,r.cooper}@lancaster.ac.uk
Abstract. In this paper we highlight design challenges that the Internet of Things
(IoT) poses in relation to two of the guiding design paradigms of our time: Privacy
by Design (PbD) and Human Centered Design (HCD). The terms IoT, PbD, and HCD are all
suitcase terms, meaning that they have a variety of meanings packed within them.
Depending on how the practices behind the terms are applied, and notwithstanding
their well-considered foundations, intentions, and theory, we explore how PbD and
HCD can, if not considered carefully, become Heffalump traps and hence act in
opposition to the very challenges they seek to address. In response to this assertion
we introduce Object Oriented Ontology (OOO) and experiment with its theoretical
framing in order to articulate possible strategies for mitigating these challenges
when designing for the Internet of Things.

Keywords: Internet of Things · Privacy by Design · Human-Centered Design

1 Introduction

Although the term the Internet of Things (IoT) is employed regularly, particularly in
discussions relating to emerging technologies, its actual meaning is ambiguous, as it
is defined differently depending on who is using it and in what context. Although it
was preceded by other terms such as ubiquitous computing and pervasive computing, it
has gained traction with a general audience, perhaps because the terms 'internet' and
'things' are more accessible. However, having ambiguity baked into the term means
that 'the IoT' is likely to be interpreted differently depending on the meanings a
particular individual might associate with these terms. This ambiguity means there is
huge variation within discourses utilizing the term. Although the research presented in this paper
is aimed at contributing to practices relating to the design of IoT products and services,
it also resonates with other, more general, discussions relating to emerging technologies.
In particular it seeks to contribute to the debates about privacy, ethics, trust and security
in the IoT [37] and understand potential barriers to adoption that may arise through the
establishment of problematic design patterns.
Our title is a play on the word trunk being synonymous with suitcase, and makes
reference to Marvin Minsky's term, suitcase words. These words describe complex
concepts that, when one tries to define them, reveal a nested series of other
meanings contained within. The other odd term in the title, Heffalump, refers to a
fictional elephant-like creature appearing in A.A. Milne's books about Winnie the
Pooh. In one story Pooh and his friend Piglet decide to catch a Heffalump in a
cunning trap; unfortunately they

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 134–151, 2019.
https://doi.org/10.1007/978-3-030-02686-8_11

only succeed in trapping themselves. The irony of this story has given rise to
Heffalump traps being used by political journalists to describe strategies in which a
politician sets a rhetorical trap to catch an opponent, only for it to backfire and
leave the trapper looking foolish. Thus, despite their intentions, and often fine
execution, Heffalump traps fail to achieve their aims and instead are detrimental to
the desired outcome. In this paper we illustrate how the suitcase terms IoT, Privacy
by Design (PbD), and Human Centered Design (HCD) can become Heffalump traps by virtue
of their nested complexities.
The paper is structured as follows. First, we discuss PbD, paying particular
attention to the linguistic complications of trying to define what it really means,
using the example of the ambiguity present in the European Union's invocation of the
term in the recently introduced General Data Protection Regulation (GDPR). Next, we
discuss the challenge to the well-established paradigms of Human-Centered Design
(HCD) resulting from the complexities introduced by the networked nature of IoT
products and services. Third, we argue that, if interpreted hubristically, PbD and
HCD can result in unintended consequences and, in essence, become Heffalump traps.
Finally, we propose the use of new design research techniques, incorporating concepts
derived from contemporary philosophies of technology, that can be used to develop and
test strategies for navigating the complexities of the IoT and thus minimizing the
risk of becoming caught in a Heffalump trap.

2 Privacy by Design (and This by That)

It is important to start this discussion by acknowledging that PbD does not exist in
isolation; there are other propositions which overlap with it, such as privacy,
security and/or data protection by default. The semantics of these terms do not aid
our understanding; for example, configuring something by default is not the same as
creating something in a particular way, or, put differently, by design. Although, for
something to have a default configuration implies that it must have been designed
that way. Adding to this confusion is the fact that in the English language the word
'design' can be used in a multitude of different ways to mean very different things,
e.g. the designer uses her/his knowledge of design to design a thingamajig, which was
part of the final system design (which was built in accordance with the original
design schematic). It was perhaps inevitable for confusion to result when the terms
appeared in an influential report in the form "incorporates Privacy by Design
principles by default" [6].
The already murky waters that contain PbD are made more difficult to navigate when
we introduce the complex abstractions like ‘privacy’ and ‘security’. To unpack these
very quickly: privacy is not the same as security, but in some circumstances, privacy
may be delivered by security and conversely security may be delivered by privacy. It is
also evident that disciplinary idiosyncrasies can also come into play when trying to bring
some clarity to a particular situation. For example, an engineer may interpret security
operationally in terms of a particular implementation, like access control lists, whereas
a psychologist may draw their understanding from a psychological theory, such as
Maslow's hierarchy of needs. While both considerations are equally valid, even when
their epistemological roads intersect a common understanding will not necessarily
emerge. These definitional complexities are not, in themselves, anything to do with
how one delivers PbD, but they must be acknowledged within any critical discussion. Whilst the
argument in this research is relevant to wider discourses of emerging technology,
primarily the specific issues we are concerned with are (1) Privacy by Design [6] and
(2) Data protection by design and by default as referred to in article 25 of the GDPR [42].
Whilst the term PbD emerged originally in a 1995 report1, it came to prominence in
2012 through the work of Ann Cavoukian and Jeff Jonas [6]. Introducing PbD, Cavoukian
quotes the words of a 13th century Persian poet who posits that to ‘reinvent the world’
one must ‘speak a new language’. The premise is that technological progress is itself a
new language that brings with it fundamental challenges to the notion of privacy. Going
on to provide more concrete examples, the report describes the use of a one-way hash
function to protect data subjects’ privacy so that even if patterns can be observed in the
data, it cannot be reverse engineered to reveal the names of the participants. While
this and the other examples provided are compelling, they are arguably a little
naïve. Although such approaches can protect the privacy of the individuals
represented in the data in particular contexts, in the increasingly heterogeneous
contexts the IoT represents they can be extremely vulnerable to exploitation through
amalgamation with other, seemingly unconnected, data sources, and complete reliance
on them could prove detrimental. In the report Cavoukian builds upon the technical contribution of Jeff Jonas to
propose seven principles for the creation of systems that are private by design. These
include:
• Full attribution of each data record;
• Data is tethered (any changes to data are recorded at the time of change);
• Analytics only occur when data has been anonymized;
• Tamper-resistant audit can be performed;
• Systems are created that tend towards false negative rather than false positive in
borderline cases;
• Self-correcting conclusions (conclusions can be changed based on new data anal‐
ysis);
• Information flows are transparent (data movements should be trackable and traceable
—whether that is through a hard copy, appears on monitor, or is sent to another
system).
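The one-way hashing approach mentioned above can be sketched as follows. This is our illustrative reading of the idea; the keyed-hash construction, the "pepper" handling, and the names are our assumptions, not details from Cavoukian and Jonas's report.

```python
import hashlib
import hmac
import os

# Secret key ("pepper") held by the data processor: without it, even an
# attacker who can enumerate every plausible name cannot brute-force the
# pseudonyms back to identities.
PEPPER = os.urandom(32)

def pseudonymize(name: str) -> str:
    # Keyed one-way hash: equal inputs map to equal pseudonyms, so
    # patterns across records remain analyzable, but the mapping cannot
    # be inverted to recover the original names.
    return hmac.new(PEPPER, name.encode(), hashlib.sha256).hexdigest()

records = ["Alice", "Bob", "Alice"]
pseudonyms = [pseudonymize(n) for n in records]
print(pseudonyms[0] == pseudonyms[2])  # same person yields same pseudonym
print(pseudonyms[0] != pseudonyms[1])  # different people stay distinct
```

As the preceding discussion notes, such pseudonyms can still be re-identified by linking them with auxiliary data sources, which is why hashing alone is not a general privacy solution.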
These principles are aimed at what the report refers to as 'sense making systems':
systems that synthesize data from multiple sources, such as payroll, customer
relationship management, and financial accounting, in order to reach new workflow
conclusions. While the principles make some sense within the bounded context
described, they are regrettably too specific to become generally applicable to the
heterogeneous user groups and devices found within the IoT.
In her discussion of PbD Sarah Spiekermann notes “Data is like water: it flows and
ripples in ways that are difficult to predict” [33], the implication being that PbD is rather

1
http://www.ontla.on.ca/library/repository/mon/10000/184530.pdf.

idealistic and when implemented in practice can be as simple as utilizing Privacy-
Enhancing Technologies with additional security, with the aspiration being an
apparently "fault-proof" system. Although such an aim is worthy, and the approach is
valid, as she states, "the reality is much more challenging". Spiekermann
problematizes this idealism by reflecting on the business models of Google and
Facebook. They provide a range of apparently 'free' services but "without personal
data such services are unthinkable". She argues that proponents of PbD "hardly
embrace these economic facts in their reasoning". In other words, it may not be
possible to create feature-rich systems that are profitable for the companies that
supply them without contravening some of PbD's fundamental ideals.
In her response, whilst broadly agreeing with Spiekermann's analysis, Cavoukian
insists "the challenges of PbD are not as great as Spiekermann suggested; the
engineers I have met have embraced the PbD principles, finding implementation not
difficult" [5]. Whilst this may be true, it somewhat misses the more interesting
element of Spiekermann's analysis, which touches on potentially systemic shortcomings
at the core of PbD's rhetoric: a 'fault-proof' landscape is unrealistic when the
'economic facts' of many business models are not acknowledged. Spiekermann's critique
highlights that to do PbD effectively, it must become part of overall organizational
culture, cutting across management, finance, marketing, design and engineering. This
is perhaps the reason why PbD stagnates and struggles to move from principles to
practicalities, particularly in consumer goods. An alternative perspective on this
echoes Shapiro's suggestion that neither engineers nor customers are able to properly
articulate, understand, or analyze the impact of 'non-functional' requirements like
privacy [32]. These hard-to-grasp requirements operate at a completely different
level of abstraction to what either engineers or customers are accustomed to thinking
about.
To recap, the new language of technology is making our world anew, but we are not yet
fluent in this emerging language. While purely technical responses to privacy
sometimes appear to offer faultless solutions (e.g. processing irreversibly hashed
data), rarely will such a solution be generalizable across a range of contexts. While
the principles of PbD appear to be useful mechanisms, they can be easily compromised
when the complexities of 'in the wild' contexts are encountered. Whilst we are not
disputing that PbD has demonstrably helped inform the delivery of privacy-aware
projects with buy-in from developers, customers, and management alike, such examples
appear to be in very specific contexts and do not necessarily cut through the
aforementioned issues. Although the rhetoric deployed for PbD hints at the
practicality of creating a 'fault-proof' approach to privacy, this fails to
appreciate the economic realities of what currently makes data-centric businesses
viable.
On 25th May 2018, when GDPR became active, the data protection legislation across a
large swathe of Europe immediately changed. As GDPR protects citizens regardless of
where the data pertaining to them is held, it has also impacted any organization
holding data about European citizens. We are yet to fully understand how GDPR will
play out in practice; test cases and precedents will need to emerge before its full
implications are understood. Notwithstanding this uncertainty, GDPR is being cited as
a legal framework that will clarify and enforce PbD, because article 25 of GDPR
explicitly mentions data protection by default and design [40]. The opening words of

the article say that data controllers must take "the state of the art" approaches of
PbD into account, however no indication is given as to what state of the art might
mean in practice [14]. Given that this assertion is made under the heading 'data
protection by design and default', we might reasonably infer that there is a
relationship between the two, although the nature of that relationship is undefined.
Article 25 also makes reference to the 'by default' trope, stating that appropriate
measures should be taken to ensure that by default "only personal data which are
necessary for each specific purpose of the processing are processed". Thus, it
appears that GDPR's interpretation of data protection by design, and relatedly by
default, is at best ambiguous and certainly does not progress our understanding of
how to effectively operationalize the rather abstract principles of PbD. This lack of
specificity with respect to PbD (and its relatives) is not confined to the document
defining GDPR. The UK Information Commissioner's Office (ICO), the UK organization
responsible for interpreting and enforcing GDPR, calls on data controllers to utilize
PbD, but does not proffer any guidance as to how this may be practically enacted.2
While the definitional challenges facing European regulators are undoubtedly
significant, by including the terminology within the text of GDPR without attending
to PbD's inherent ambiguity, further challenges almost certainly abound.

3 Human-Centered Design

In his book The Design of Everyday Things [27] Don Norman presented principles for
designing ‘things’ in such a way that human interaction with them is smooth and fruitful.
Until relatively recently such interactions tended to occur predominantly between users,
things and/or systems that were standalone and self-contained. In the book Norman
provides numerous examples including a refrigerator, a telephone, and a clock. Despite
the fact that some of his examples, such as the telephone, depend upon several
technologies interacting across a diverse technical infrastructure, the experience of
using the phone is encapsulated within a discrete interface made up of handset, dialer, and
ringer. Today, interactions occur in much more complex contexts which present
designers with new challenges. The "networkification of the devices that previously
made up our non-Internet world" [29] is creating the IoT, and while interactions with
these devices may appear familiar on the surface, they inevitably produce an
associated digital residue. This digital residue is data, and in stark contrast to
the "visibility, appropriate clues, and feedback of one's actions" that Norman
highlights as key properties of HCD [27:8–9], the full impact of the data is rarely
visible either during or after actual user interactions with connected, or IoT,
devices. While this data is necessary to support business models, to train algorithms
and, ultimately, to make stuff work, it is possible that by obscuring the agency of
the underlying data, models and algorithms at the point of interaction, designers are
in fact operating against the underlying ideology of HCD.
The foundations of HCD are in ergonomics, with the aim of supporting the "ways in
which both hardware and software components of interactive systems can enhance
human-system interaction" [43]. Despite being demonstrably useful [2, 16], this
engineering-derived paradigm relied on simplifications of complex contexts [11, 13, 38].
2
https://ico.org.uk/for-organisations/guide-to-data-protection/privacy-by-design/.

These reductive stances are incompatible with other, more modern approaches that have
become integral to HCD and acknowledge that "the coherence of action is not
adequately explained by either preconceived cognitive schema or institutionalized
social norms" [36:177]. The result is that HCD methods have become extremely diverse,
built upon a variety of theoretical and epistemological stances, and applied
variously as both evaluative and generative tools [13, 23, 34]. The spectrum of
approaches to utilizing HCD now includes methodological assemblages that can draw
upon ethnography, participatory design, cultural probes, workshop techniques,
scenarios, extreme users, and personas. Applied sensitively, these techniques can
produce designs that are "physically, perceptually, cognitively and emotionally
intuitive" [13], while also matching "the needs and capabilities of the people for
whom they are intended" [27:9]. Whilst it is true that "there is no simple recipe for
the design or use of human-centered computing" [17], HCD, particularly among the
design research community, has become ubiquitous and greatly influences the
technologies that we shape, and that then ultimately shape us.
Even amongst this diverse methodological landscape, a core theme that pervades HCD
utilization is the axiom of simplicity. This is often interpreted to mean that HCD
should inform the design of services and software that are efficient, effortless, and
edifying to use; that fade into the background, becoming invisible; and that ensure
any complexity is that of the underlying task and not of the tool that has been
developed to achieve it [25:197, 26]. Norman himself acknowledges that dogmatically
blunt interpretations of this simplicity axiom can, perhaps unsurprisingly, introduce
unintended consequences that drive HCD towards a "limited view of design" and result
in analysis preoccupied with narrowly focused "page-by-page" and "screen-by-screen"
[24] evaluations. This narrow focus can stifle potential users, and/or researchers,
from being able to fully intuit a particular designed 'thing' on a crucial cognitive,
emotional, and perceptual level. In the hyper-connected and data-mediated assemblages
of the IoT, the prevalent assumption that simpler-is-better is already proving highly
problematic, as the recent revelations concerning Facebook's use of data illustrate.
While some aspects of HCD are worthy and hold fast, the complexity, ubiquity, and
interconnectedness of the systems represented by the IoT mean that HCD needs to be
reevaluated. In the age of the IoT, whilst we need to reflect the human-centered
ideals of HCD, it may be necessary to accept that there are, effectively, multiple
centers and actants relevant to any given interaction.

4 Hubris and Heffalumps

The common thread that connects the previous discussions of PbD and HCD relates to
the risk that occurs when their principles are interpreted hubristically, with
excessive self-confidence. To illustrate this, take a moment to think about the story
of the Titanic. The ship employed cutting-edge technology in an effort to make it as
safe as possible and was famed for being 'unsinkable'. As well as explaining the lack
of lifeboats on board, this inflated confidence meant that even though a spotter saw
the iceberg in good time, the helmsman was never asked to take avoiding action, for if the ship is

unsinkable, why avoid a sinking hazard? After the tragedy the owners were accused of
using misleading rhetoric about her sinkability; in response they pointed out that
their claim was only that the ship was designed to be unsinkable (as opposed to
actually being unsinkable). The tale of the Titanic illustrates that hubristic
reliance can, if circumstances conspire, be extremely dangerous.
Relying on supposed guidelines and principles for HCD and PbD is, arguably,
equivalent to the Titanic relying on cutting-edge anti-sinking technologies. Hence,
we cast HCD and PbD as potential Heffalump traps. By relying solely on these
approaches, despite their unequivocally worthy aims and demonstrated practical
virtues, technologists may inadvertently end up ensnared by the very issues that HCD
or PbD sought to avoid (see Fig. 1). The problem, in many ways, is with binary and
didactic positions. Describing ships as unsinkable, systems as private, or designs as
human centered is irrational. The results of such irrational beliefs may, at worst,
lead to tragedies like the Titanic. The IoT is so pervasive that the scope of
resulting impacts ranges from the relative inconsequence of the Mirai botnet taking
down Netflix, through to the destabilization of national infrastructure and the
potential dissolution of democratic processes.

Fig. 1. Depiction of a Heffalump Trap.

If treated insensitively, ideals like PbD and HCD may coerce technologists into
believing that privacy is something that can be 'achieved' and that a system's
simplicity is analogous to its being 'human centered'. Notions of apparently perfect
systems are as dangerous as considering a ship unsinkable; these positions are
misconceptions. Ship captains, system developers, and Heffalump trappers alike: be
careful. Don't suggest your ocean liner is unsinkable, don't believe your door-lock
is uncrackable, don't attempt to trap the made-up animal, and refrain from assuming
that it might be feasible to design a computerized device that is perfectly private
by design. Do, however, embrace those driving ideals, just with a healthy skepticism
towards their hubristic tendencies. In the following we describe theoretically
informed strategies to mitigate the dangers of hubris and Heffalumps.

5 Tempering the Hubris; Designing a Philosophical Response

5.1 Object Oriented Ontology

In the following we introduce Object Oriented Ontology (OOO), a modern philosophy
which can help to make sense of the complex heterogeneous contexts emerging from the
IoT that are so problematic for PbD and HCD. This framework is enacted with a
contemporary speculative design methodology, Design Fiction [7, 19], to develop
responses to the problematic aspects of PbD and HCD's Heffalump traps. We are not
scholars of philosophy; hence we do not intend to discuss the nuances of OOO's place
within the broader gamut of philosophy and theory. However, in order to add some
context, in the following we offer a short introduction to OOO, specifically within
the context of computing and HCD.
Philosophically underpinning HCD’s simplicity axiom in studies of Human-
Computer Interaction, Heidegger’s seminal Being and Time argues most objects and
tools make most sense in relation to human use. Heidegger uses a hammer as an example:
technologies are either ‘ready-to-hand’ (in their normal context of use) or
‘present-at-hand’ (if the ‘norm’ is disrupted, for example if the head fell off the hammer).
The metaphysics of this distinction are fascinating, but the salient issue is that the
hammer comes to ‘Be’ through interaction with a human. As such the hammer’s very
existence is the product of a correlation between the human mind and the physical world
[3]. This conceptual configuration is described as ‘correlationism’ [15]. What OOO does
differently is to reject correlationism, and by doing so creates the possibility that objects
have realities that are independent from human use and the mind/world correlation. Seen
this way anything from a fiber optic cable, to a blade of grass, to a quantum computer,
to an apple pie—may be given agency in its own ontological limelight. If we imagine
every individual concept—the fiber cable or the blade of grass—giving off a little
light in this way, then we might say their collective hue is the “flat ontology” that scholars
of OOO refer to [4].
“In short, all things equally exist, yet they do not exist equally […] This maxim may seem like a
tautology—or just a gag. It’s certainly not the sort of qualified, reasoned, hand-wrung ontolog‐
ical position that’s customary in philosophy. But such an extreme take is required for the curious
garden of things to flow. Consider it a thought experiment, as all speculation must be: what if
we shed all criteria whatsoever and simply hold that everything exists, even things that don’t?
[…] none’s existence fundamentally different from another, none more primary nor more orig‐
inal.” [3:11]

Bogost uses the famously ill-fated video game E.T. the Extra-Terrestrial as an example
of how a single thing can be broken into many different types of OOO object. He notes
142 J. Lindley et al.

that the game is simultaneously: a series of rules and mechanics; source code; source
compiled into assembly; radio frequency signals; a game cartridge; memory etched on
silicon; intellectual property; arguably ‘the worst game ever made’; a portion of the
728,000 Atari games that were once buried in the ground in New Mexico;3 a conglom‐
erate of all of these. There is no fundamental thing which defines The E.T. video game.
Instead it is all of these things simultaneously, and all of them independently of any
human interaction. Contemplating what this sort of shift in ontology could mean, Bogost
muses “the epistemological tide ebbed, revealing the iridescent shells of realism they
had so long occluded” [3].
This branch of metaphysics may seem very far removed from the development of
technology; however, through a more practically oriented approach known as Carpentry,
it can be materialized. Carpentry involves the creation of “machines” that attempt to
reveal clues about the phenomenology of objects. While it’s accepted that objects’
experiences can never be fully understood, the machines of carpentry act as proxies for
the unknowable. They proffer a “rendering satisfactory enough to allow the artifact’s
operator to gain some insights into an alien thing’s perspective” [3:100]. Whether
achieved through programming or through other practice, “through the
making of things we do philosophy” [41]: lending the theory a material tangibility is
the kernel of Carpentry. The purpose of Carpentry is to give the otherwise ethereal study
of ontology a very practical legitimacy:
“If a physician is someone who practices medicine, perhaps a metaphysician ought be someone
who practices ontology. Just as one would likely not trust a doctor who had only read and written
journal articles about medicine to explain the particular curiosities of one’s body, so one ought
not trust a metaphysician who had only read and written books about the nature of the
universe.” [3:91]

5.2 Design Fictions

All design usually seeks to change the current context, and thus to create futures by
answering questions or solving problems [22]. Speculative design is somewhat different:
it uses design to pose questions about possible futures, rather than to answer them.4 This
family of design practices does not aim to create products for market or to solve a
real problem; instead it uses the traditions of design to elicit insights and
provoke new understandings [1, 8, 9] (a stance that is central to ‘Research through
Design’ [10, 12]). The speculative design landscape is quite broad5 however the specific
approach we employed in this work is Design Fiction.
There continues to be much disagreement about the ‘best’ ways to do Design Fiction,
but the ‘Design Fiction as World Building’ approach [7] is the one we adopted with this
work. Doing Design Fiction this way involves designing a series of artifacts which all

3 cf. https://en.wikipedia.org/wiki/E.T._the_Extra-Terrestrial_(video_game).
4 “A/B” is an excellent keyword-based summary of the contrast between affirmative and speculative design [30].
5 Dunne and Raby’s book [9] provides a thorough overview of speculative design practice, and Tonkinwise’s review of the book offers some useful critique of speculation too [39].

contribute to the same fictional world. Individual artifacts act as ‘entry points’ into the
fictional world by depicting parts of it at a range of different scales (Fig. 2). This results
in a reciprocal prototyping effect; the artifacts define the world, the world prototypes
the artifacts, which, in turn, prototype the world.

Fig. 2. Design Fiction as World Building.

We utilize Design Fiction this way in a form of Bogostian Carpentry. In Bogost’s
examples he explores the inner world of objects by using computer code. The flexibility
of code allows him to, effectively, ‘play God’ within that realm. The demiurgic quality
afforded Bogost by using computer code also exists when building Design Fiction
worlds. However, instead of functions, APIs and code of the computer’s domain, it is
the essence of Design Fiction worlds—and the designed things that define them—that
are the tools of this particular creationist trade.

The World’s First Truly Smart Kettle. Employing the world building approach, we
attempted to enact Bogostian carpentry in the design of a smart kettle—the kettle is
branded as Polly, in reference to the nursery rhyme Polly Put the Kettle On. The contours
of Polly’s world are crafted through the creation of various artifacts, including a fictional
press release for the kettle, packaging materials, and user interfaces. The press release
describes many of the kettle’s features; these include smart notifications, integration
with social media, voice commands, energy tracking, location-based boiling, and the
trademarked JustRight smart fill meter. Some of these features are prototyped in user
interface designs (e.g. Fig. 3) and the artifacts aim to provide historical context to the
Polly world too: the product was originally crowdfunded before subsequently being
bought out by Amazon’s IoT division; it is regulated by a government organization, and
in order to achieve its accreditation it must utilize the Minimum Necessary Datagram
Protocol [cf. 20, 22].

Fig. 3. Polly’s OOO-inspired timeline and volumetric data graph.

When building Polly’s fictional world we built from the assumption that continuing
IoT adoption will result in even greater ubiquity of data-collecting devices [35]. Among
these, presumably devices such as kettles will (continue to) collect data too. Today, the
visibility of the data shared by these devices is at best opaque and at worst absent,
isolating the user from the underlying data transactions. While PbD principles can
protect the user from unwanted or nefarious processing of their personal data, on
occasions where that sort of processing is necessary to facilitate the device’s functional
requirements, the best alternative would be to communicate the nature of the data
transactions rather than disguising them. We may liken this to an autonomous car that
chooses an optimized route to its destination. Most of the time, routing designed to
reduce journey times is desirable, but if the car were designed in such a way that it would
not reveal precisely what that route was, it would likely engender a feeling of distrust.
Responding to this need we constructed two key features in Polly’s fictional world.
Figure 3 (left) shows a timeline depicting events taking place over the course of a day.
From the timeline, we can tell that, in data terms, Polly was dormant for over 4 h since
the ‘daily cloud pingback’, which uploads usage data to the cloud and downloads
configuration, security, and update data from the cloud. We can also see Polly was
removed from its base and partially refilled, at which point the kettle’s software anticipates
it may be boiled soon. We can see that removing the kettle from the base and refilling
it result in immediate sharing of data to the cloud. The anticipation event however does
not share data to the cloud but does share data with the home’s smart meter and other
appliances to inform them of an impending power-consumption spike.
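Purely as an illustration, the timeline's distinction between cloud-shared and locally-shared events might be modeled as below. Polly is a fictional device; the event names and this data model are our own invented sketch, not artifacts of the design fiction itself.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical destinations for an event's data; the names are invented.
LOCAL = "local_network"
CLOUD = "cloud"


@dataclass
class DeviceEvent:
    name: str
    destinations: List[str] = field(default_factory=list)


# Events drawn from the fictional timeline described above.
timeline = [
    DeviceEvent("daily_cloud_pingback", [CLOUD]),
    DeviceEvent("removed_from_base", [CLOUD]),
    DeviceEvent("refilled", [CLOUD]),
    DeviceEvent("anticipate_boil", [LOCAL]),  # shared with the smart meter only
]


def cloud_events(events):
    """Return the names of events whose data leaves the home network."""
    return [e.name for e in events if CLOUD in e.destinations]
```

A display like Polly's timeline could then be rendered directly from such a structure, making the local/cloud distinction explicit rather than hidden.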
The right-hand side of Fig. 3 depicts the volume of the data uploaded from Polly,
downloaded to Polly, and moving around the local network. This display differs from
the timeline in that we cannot tell from it why data is moving around. However, we can
tell the relative amount of data this smart kettle consumes and generates, and the relative
volume of each transaction. Both displays are intended to be used in conjunction with
each other, such that Polly is quite transparent about what it
communicates and for what purposes. Based on the examples we can infer that Polly
downloads much less data than it uploads. The specific reason for the upload/download
disparity is not important; rather, the takeaway point is that by utilizing Carpentry and
Design Fiction, considering the reality of the kettle itself and giving the kettle’s Object
Oriented perspective as much weight as the user’s and the manufacturer’s
perspectives, a more egalitarian interface can be designed that doesn’t detract from the
usability forwarded by HCD or the privacy credentials of PbD, but that does reveal the
reality of what is happening and why, thus mitigating the dangers of hubris.

Orbit, a Privacy Enhancing System. This project was in part motivated to explore
how the European Union’s GDPR may impact on user/technology interactions. We were
minded to develop a system that could obtain GDPR-compliant consent in a modern,
simple and transparent way. Although legal precedents are yet to be tested and estab‐
lished in court, the articles of the GDPR theoretically protect various rights including:
the right to be aware of what personal data is held about an individual; the right to access
personal data; the right to rectify inaccurate data; the right to move personal data from
one place to another; the right to refuse permission for profiling based on personal data;
the right that any consent obtained relating to personal data must be verifiable, specific,
unambiguous and given freely.
The process by which users consent to have their data collected and processed is an
area of particular contemporary relevance. The alleged involvement of the British marketing
company Cambridge Analytica in Donald Trump’s election victory, and the question of how,
if this is shown to be true, consent was gained for the collection and processing of data from
Facebook, is one factor driving interest in consent. Although some advances have been
made in recent years—for example, pre-checked boxes and non-consensual cookie usage
were both outlawed in Europe in 2011⁶—tick boxes for users to indicate they have
understood and agree to conditions of use are still the norm. There are fundamental
problems with this approach, the most obvious of which is that while users often
tick boxes saying they have read terms and conditions, the tick is no indication of whether
they have actually read the text, nor whether they have understood it. In one study only
25% of participants looked at the agreement at all, and as few as 2% could demonstrate
comprehension of the agreement’s content [28]. User agreements that obtain a wide
spectrum of consent, whereby a user gives all the permission a device or service could
ever possibly need, stifle users’ agency to be selective about which features of a system
they would like to use (which in turn seems to contravene the GDPR-protected right for
specific and unambiguous consent). These systems also fail to account for changes over
time; once consent has been gained it is frequently impossible (or very difficult) to
remove or change the nature of the consent.
Again using the Design Fiction world building approach, we decided to use an IoT
lock device to build the world around. Inspired by IoT locks that already exist on the
market,7 the fictional lock was imbued with the following features:
• Using short-range radio instead of a key;
• Location-based access (geofencing);
• Temporary access codes (for guests);
• Integration with voice agents (e.g. smart assistants);
• Integration with other services such as If This Then That (IFTTT).

6 http://www.bbc.co.uk/news/world-europe-15260748.
7 cf. http://uk.pcmag.com/surveillance-cameras/77460/guide/the-best-smart-locks-of-2017.

Each feature has a different relationship with collected data, where data is stored, and
how it is processed. Using a short-range radio (NFC) instead of a key only relies on data
inside the user’s own network; location-based access requires that data be accessed and
stored by the lock company; utilizing services like IFTTT would lead to data being
shared with any number of 3rd parties. Given that our purpose was to explore GDPR-compliant
consent mechanisms, our crafting of the Design Fiction only paid brief attention
to the technical implementation (we assumed that the lock would utilize an IoT
radio standard such as ZigBee and that suitable APIs would facilitate integration with external
services such as IFTTT).
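The feature-to-data relationships just described can be sketched as a simple mapping. This is purely illustrative: the lock is fictional, the category and feature names are ours, and the assignments for voice agents and guest codes are our own assumptions (the text only specifies the NFC, geofencing, and IFTTT cases).

```python
# Hypothetical data-locality categories, ordered from least to most exposed.
LOCALITY = ["local_network", "lock_company_cloud", "third_parties"]

# Each fictional lock feature mapped to where its data would need to live,
# following the relationships described in the text above.
FEATURE_DATA = {
    "nfc_key": "local_network",           # short-range radio instead of a key
    "geofencing": "lock_company_cloud",   # location-based access
    "guest_codes": "lock_company_cloud",  # assumption: codes issued via the vendor
    "voice_agent": "third_parties",       # assumption: assistant platforms are 3rd parties
    "ifttt": "third_parties",             # integration with IFTTT-style services
}


def most_exposed(enabled_features):
    """Return the widest data exposure implied by the enabled features."""
    ranks = [LOCALITY.index(FEATURE_DATA[f]) for f in enabled_features]
    return LOCALITY[max(ranks)] if ranks else "local_network"
```

Even this toy model shows why a single consent map is hard: the exposure a user agrees to depends on which combination of features they enable, not on the device alone.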
Our original aim with this project was to design a map that could be used during a
consent procedure to show a user what data goes where, so that they would be
“informed by design” [21]. However, this aim was immediately challenged by the vast
number of possible variations, even within a relatively small and straightforward IoT
context. Figure 4 illustrates a scenario with an IoT lock which has been configured to
turn on a smart lighting system when the user opens their door. While the cause and
effect are simple and clear to the user (opening the door makes the lights turn on), there
are actually several cloud-based services behind the scenes that are necessary to make the
hardware work. There may also be unknown 3rd parties using the data too (e.g. data
brokers). Hence, to turn this into a map that details precisely where data goes, when,
and in what circumstances, is simply not possible. A significant factor driving this chal‐
lenge is that each specific situation needs to be treated as an ad hoc scenario, as something
completely unique [31].

Fig. 4. Diagram showing how a user opening the door may trigger a number of possible data
flows around the constellation, and that there is no single end point.

In order to progress, some of the design parameters had to be amended. Initially we
made our investigation more tightly scoped: rather than addressing GDPR compatibility
per se, we focused solely on personal identifiability. Next, it was necessary to forget the

reducible concept of a map that would represent specific and quantifiable measures of
probable risk and accept that any map would require much more extensive use of ‘shades
of grey’. As a result of these changes our experiment with OOO went in directions we
had not predicted.
Our original intention was that OOO’s tiny ontologies would provide us with the
means to investigate the lock, the associated data streams, and potential users; our
attempt at carpentry, we thought, would lead us to a deeper understanding of those
objects directly. Contrastingly, however, what came to pass is that our carpentry resulted
in the creation of an entirely original object (complete with its own tiny ontology). The
purpose of this new object is to provide a new lens for looking at collections of IoT
devices, platforms, the data that mediates between these, and the people that use them.
These new objects—referred to as Orbits—communicate the relative likelihood that
a person may be identified based upon device use. They present this in a fashion that
distinguishes between data held locally, with known providers, or with unknown 3rd
parties. These ‘maps’ provided some means to bridge between the vast gamut of possi‐
bilities in the computer-world and the succinct concreteness of judging acceptability in
the human-world. They facilitate value judgements.
The privacy Orbits map IoT systems and the data they utilize, and communicate the
likelihood of identifiability based on data held in different places. The ‘levels’ (i.e. each
concentric circle) represent data that is held locally, with known providers, or with
unknown 3rd parties (see labels in Fig. 5). The definition (blurriness or sharpness) at the
edge of each level describes the probability, or certainty, of the user being identifiable
based on the data at that specific level. If the inner-most level has a pin-sharp edge, then
it is almost certain that the user could be identified based on those data (e.g. the
right-hand diagram’s 1st level in Fig. 5). Blurrier levels mean that the chance of identifiability
is reduced (e.g. the left-hand diagram’s 3rd level in Fig. 5).

Fig. 5. Example identifiability Orbits (the name ‘Orbit’ stems from a visual similarity to the
diagrams used in the Bohr model of the hydrogen atom (https://en.wikipedia.org/wiki/
Bohr_model)).
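As a purely hypothetical sketch of the Orbit concept, each concentric level could pair a data-locality category with an identifiability probability, with the edge's sharpness derived from that probability. The class names, the numeric values, and the probability-to-sharpness mapping below are our own assumptions; the design fiction specifies only the visual idea.

```python
from dataclasses import dataclass

# The three data-locality levels described above.
LEVELS = ("local", "known_provider", "unknown_third_party")


@dataclass
class OrbitLevel:
    locality: str           # one of LEVELS
    identifiability: float  # probability the user can be identified (0..1)

    def edge_sharpness(self) -> float:
        """Render hint: 1.0 is a pin-sharp edge (near-certain identification);
        lower values render as blurrier edges (less certain)."""
        return max(0.0, min(1.0, self.identifiability))


# Example: a configuration where local data almost certainly identifies
# the user, but data held by unknown third parties is far less conclusive.
orbit = [
    OrbitLevel("local", 0.95),
    OrbitLevel("known_provider", 0.5),
    OrbitLevel("unknown_third_party", 0.1),
]
```

The point of such a model is that the interface degrades gracefully: where exact data flows cannot be enumerated, the 'shades of grey' discussed above become an explicit, renderable quantity rather than a hidden uncertainty.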

The Design Fiction world we had created was a useful tool into which to import the
identifiability Orbits and to prototype how they might be used. We created a short film
that shows a user installing a new IoT smart lock device in their home8 using a voice
interface and a supporting app. In essence the user is provided with a slider which enables
or disables all the possible functions of the lock; the Orbits communicate how the
associated changes in data flows impact on identifiability.
The same scenario may be extended to show the implications of dynamically modi‐
fying settings, for example to temporarily provide access to a delivery agent using a
system similar to Amazon Key.9 If the user has configured their system for maximum
privacy (or, minimal identifiability) then Orbits could be used to temporarily provide
access to the 3rd party and to show the user what the impact on data flows would be.
Though this interaction is clearly achievable, it raises a host of other questions relating
to the temporality of consent. For example, if a user gives consent for their data to be
used by a 3rd party for a few hours, what happens to that data after those hours have
elapsed?

6 Discussion and Conclusions

Our OOO-informed Design Fictions work within the boundaries of the following
sentiments: “the Internet must be grasped in metaphorical terms” [29] and that “Security by
design and privacy by design can be achieved only by design. We need a firmer grasp
of the obvious” [32]. Of course, acting on such sentiments is easier said than done,
particularly when each of the constructs that we deal with—IoT, PbD and HCD—are
all suitcase terms with multiple possible meanings. Because of this network of prob‐
lematic aspects, we assert that drawing on philosophy, and employing speculative
design, is a productive way to begin to unpack the problem (as opposed to more directly
applied/engineering-led approaches). The examples we have provided above are
intended to be used in two ways. First, we wish to forward the method itself: enacting
Bogostian Carpentry as a way of practicing OOO to address the complexities of PbD
and HCD in an IoT context. This conclusion is relatively straightforward; we invite other
researchers and technologists to apply a similar method and in doing so research the
concepts further. Second, using Design Fiction as a method of Research through Design
[10, 12], we offer the following primary contributions which may be directly applied by
technologists.

Augmenting HCD with Constellations. Our critique and exploration of HCD is not
meant unkindly. We acknowledge and applaud the rich history that HCD has, and rather
than calling out shortcomings we wish to augment it for the 21st century. Thus, we
propose the ‘Constellation’ design metaphor. This is a wrapper for the complexities of
OOO and calls upon designers, developers and analysts to understand and acknowledge
multiple different perspectives in their products. Just as the constellations in the night

8 https://youtu.be/A37SmnNFstA.
9 https://www.theverge.com/2017/10/25/16538834/amazon-key-in-home-delivery-unlock-door-prime-cloud-cam-smart-lock.

sky appear different depending on where you stand, the constellations of devices, data,
networks, and users of the IoT appear different depending on who you are. Rather than
obfuscating this complexity, interfaces such as those exemplified in Polly and Orbit
should communicate and reveal the complexity so as to inform all parties of any relevant
others’ interests, activities, and agency. In doing so, the well-developed tools
in HCD’s toolbox may be leveraged to produce technologies that
deliver on the promise of the IoT without compromising users’ interests.

Humbling the Hubris; Toward Informed by Design. Precisely echoing our explora‐
tion of HCD, the perspective we present on PbD is not a scornful one. However, we
cannot escape the temptation to use guidelines and principles as a kind of ‘safety
blanket’ beneath which technologists may hide if they hubristically argue that ‘because
I have ticked the boxes my system design is good enough to protect privacy’. Systems
should be designed in such a way that the potential conflation of understanding relating
to privacy, security, and data protection by design (and/or) default is reduced—this may
be achieved by purposeful disambiguation. This disambiguation may involve acknowl‐
edging that manufacturers cannot guarantee total privacy and explaining the factors
which underpin that uncertainty (as demonstrated in the privacy Orbits in particular).
The complexities of non-functional requirements, particularly in IoT contexts, should
be approached heuristically; users, and every other actor in the given constellation,
should be given the agency to understand any given situation for themselves.

Avoid Heffalump Traps. Adoption of IoT devices has unequivocal societal and
economic benefits, but to capitalize on those benefits designers, engineers and policy-
makers need to set aside beliefs that are founded on the conceptual possibility of ‘perfect’
systems. Such beliefs are incongruous with the unavoidable realities of privacy, trust,
and security issues. Instead, the IoT needs to be designed with a considered approach
that accepts IoT devices definitely do pose problems for individuals’ privacy, but that
those problems can be tempered by subtly shifting our design paradigms such that they
incorporate constellations of meaning and inform all participants in a constellation of
their roles within it. To reinvent the world, we must speak a new language, and that
language should ensure that Heffalump traps are not part of the vernacular.

Acknowledgements. This research was supported by the RCUK Cyber Security for the Internet
of Things Research Hub PETRAS under EPSRC grant EP/N02334X/1.

References

1. Auger, J.: Speculative design: crafting the speculation. Dig. Creat. 24(1), 11–35 (2013).
https://doi.org/10.1080/14626268.2013.767276
2. Bevan, N.: How you could benefit from using ISO standards. In: Extended Abstracts of the
ACM CHI 2015 Conference on Human Factors in Computing Systems, pp. 2503–2504
(2015). https://doi.org/10.1145/2559206.2567827
3. Bogost, I.: Alien Phenomenology, or What It’s Like to Be a Thing. University of Minnesota
Press, Minneapolis (2012)

4. Bryant, L.R.: Democracy of Objects. Open Humanities Press, London (2011). https://doi.org/
10.3998/ohp.9750134.0001.001
5. Cavoukian, A.: Operationalizing privacy by design. Commun. ACM 55(9), 7 (2012). https://
doi.org/10.1145/2330667.2330669
6. Cavoukian, A., Jonas, J.L.: Privacy by Design in the Age of Big Data (2012)
7. Coulton, P., Lindley, J., Sturdee, M., Stead, M.: Design fiction as world building. In:
Proceedings of the 3rd Biennial Research Through Design Conference (2017). https://doi.org/
10.6084/m9.figshare.4746964
8. Dunne, A.: Hertzian Tales: Electronic Products, Aesthetic Experience, and Critical Design.
The MIT Press, London (2006)
9. Dunne, A., Raby, F.: Speculative Everything. The MIT Press, London (2013)
10. Frayling, C.: Research in art and design. R. Coll. Art Res Pap. 1(1), 1–9 (1993)
11. Gasson, S.: Human-centered vs. user-centered approaches to information system design. J.
Inf. Technol. Theory Appl. 5(2), 29–46 (2003)
12. Gaver, W.: What should we expect from research through design? In: Proceedings of the 2012
ACM Annual Conference on Human Factors in Computing Systems - CHI 2012, p. 937
(2012). https://doi.org/10.1145/2207676.2208538
13. Giacomin, J.: What is human centred design? Des. J. 17(4), 606–623 (2014). https://doi.org/
10.2752/175630614X14056185480186
14. Von Grafenstein, M., Douka, C.: The “state of the art” of privacy- and security-by-design
(measures). In: Proceedings of MyData (2017)
15. Gratton, P., Ennis, P.J.: The Meillassoux Dictionary. Edinburgh University Press, Edinburgh
(2014)
16. Jokela, T., Iivari, N., Matero, J., Karukka, M.: The standard of user-centered design and the
standard definition of usability. In: Proceedings of the Latin American Conference on Human-
Computer Interaction - CLIHC 2003, pp. 53–60 (2003). https://doi.org/
10.1145/944519.944525
17. Kling, R., Star, S.L.: Human centered systems in the perspective of organizational and social
informatics. ACM SIGCAS Comput. Soc. 28(1), 22–29 (1998). https://doi.org/
10.1145/277351.277356
18. Lindley, J., Coulton, P.: On the Internet Everybody Knows You’re a Whatchamacallit (or
a Thing). Making Home: Asserting Agency in the Age of IoT Workshop (2017). http://
eprints.lancs.ac.uk/84761/1/On_the_Internet_Everybody_Knows_Youre_a_Thing.pdf
19. Lindley, J., Coulton, P.: Back to the future: 10 years of design fiction. In: British HCI 2015
Proceedings of the 2015 British HCI Conference, pp. 210–211 (2015). https://doi.org/
10.1145/2783446.2783592
20. Lindley, J., Coulton, P., Cooper, R.: Why the Internet of Things needs object orientated
ontology. Des. J. (2017). https://doi.org/10.1080/14606925.2017.1352796
21. Lindley, J., Coulton, P., Cooper, R.: Informed by design. In: Living in the Internet of Things:
PETRAS Conference (2018)
22. Lindley, J., Sharma, D., Potts, R.: Anticipatory ethnography: design fiction as an input to
design ethnography. In: Ethnographic Praxis in Industry Conference Proceedings 2014, vol.
1, pp. 237–253 (2014). https://doi.org/10.1111/1559-8918.01030
23. Macdonald, N., Reimann, R., Perks, M., Oppenheimer, A.: Beyond human-centered design?
Interactions (2005). https://doi.org/10.1145/1013115.1013184
24. Norman, D.A.: HCD Harmful? A Clarification - jnd.org. http://www.jnd.org/dn.mss/
hcd_harmful_a_clari.html
25. Norman, D.A.: The Invisible Computer: Why Good Products Can Fail, the Personal Computer
is So Complex, and Information Appliances are the Solution. The MIT Press, London (1998)

26. Norman, D.A.: Human-centered design considered harmful. Interactions 12(4), 14 (2005).
https://doi.org/10.1145/1070960.1070976
27. Norman, D.A.: The Design of Everyday Things, Revised edn. Basic Books, New York (2013)
28. Obar, J.A., Oeldorf-Hirsch, A.: The biggest lie on the internet: ignoring the privacy policies
and terms of service policies of social networking services. In: The 44th Research Conference
on Communication, Information and Internet Policy (2016). https://doi.org/10.2139/ssrn.
2757465
29. Pierce, J., DiSalvo, C.: Dark clouds, Io $ #! +, and? [Crystal Ball Emoji]: projecting network
anxieties with alternative design metaphors. In: Proceedings of the 2017 Conference on
Designing Interactive Systems, DIS 2017, pp. 1383–1393 (2017). https://doi.org/
10.1145/3064663.3064795
30. Raby, F., Dunne, A.: A/B (2009). http://www.dunneandraby.co.uk/content/projects/476/0.
Accessed 27 Oct 2014
31. Schraefel, M.C., Gomer, R., Alan, A., Gerding, E., Maple, C.: The Internet of Things:
interaction challenges to meaningful consent at scale. Interactions 24(6), 26–33 (2017).
https://doi.org/10.1145/3149025
32. Shapiro, S.S.: Privacy by design. Commun. ACM 53(6), 27 (2010). https://doi.org/
10.1145/1743546.1743559
33. Spiekermann, S.: The challenges of privacy by design. Commun. ACM 55(7), 38 (2012).
https://doi.org/10.1145/2209249.2209263
34. Steen, M.: Tensions in human-centred design. CoDesign 7(1), 45–60 (2011). https://doi.org/
10.1080/15710882.2011.563314
35. Sterling, B.: The Epic Struggle of the Internet of Things. Strelka Press, Moscow (2014)
36. Suchman, L.: Human-Machine Reconfigurations: Plans and Situated Actions. Cambridge
University Press, Cambridge (2007)
37. Taylor, P., Allpress, S., Carr, M., Norton, J., Smith, L.: Internet of Things: Realising the
Potential of a Trusted Smart World (2018). https://www.raeng.org.uk/publications/reports/
internet-of-things-realising-the-potential-of-a-tr
38. Thomas, V., Remy, C., Bates, O.: The limits of HCD. In: Proceedings of the 2017 Workshop
on Computing Within Limits - LIMITS 2017, pp. 85–92 (2017). https://doi.org/
10.1145/3080556.3080561
39. Tonkinwise, C.: How we intend to future review of Anthony Dunne. Des. Philos. Pap. 12(2),
169–187 (2014). https://doi.org/10.2752/144871314X14159818597676
40. Vollmer, N.: Article 25 EU General Data Protection Regulation (EU-GDPR) (2017) http://
www.privacy-regulation.eu/en/article-25-data-protection-by-design-and-by-default-
GDPR.htm. Accessed 15 Jan 2018
41. Wakkary, R., Oogjes, D., Hauser, S., Lin, H., Cao, C., Ma, L., Duel, T.: Morse things: a design
inquiry into the gap between things and us. In: Proceedings of the 2017 Conference on
Designing Interactive Systems, pp. 503–514 (2017). https://doi.org/
10.1145/3064663.3064734
42. Summaries of Articles contained in the GDPR. http://www.eugdpr.org/article-
summaries.html. Accessed 15 Sept 2017
43. ISO 9241-210. Ergonomics of human-system interaction – Part 210: Human-centred design
for interactive systems. International Organization for Standardization (2015). https://
www.iso.org/standard/52075.html
Toys That Talk to Strangers: A Look at the Privacy
Policies of Connected Toys

Wahida Chowdhury
University of Ottawa, Ottawa, ON, Canada
Wahida.Chowdhury@hotmail.ca

Abstract. Toys that are connected to the Internet are able to record data from
users and share the data with company databases. The security and privacy of
user data thus depend on companies’ privacy policies. Though there is a rising
concern about the privacy of children and parents who use these connected toys,
there is a scarcity of research on how toy companies are responding to the concern.
We analyzed privacy policies of 15 toy companies to investigate the ways toy
companies publicly document digital standards of their connected products. Our
results show that most toy companies are either unclear or do not mention in their
privacy policy documents how their toys protect the security and privacy of users.
We recommend measures that toy companies may adopt to explicitly respond to
security and privacy concerns so parents can make informed decisions before
purchasing the connected toys for their children.

Keywords: Connected toys · Smart toys · Internet of Things · Information privacy · Data security · Privacy policies · Digital standards · Children · Parents

1 Introduction

Toys that gather information from their owners via microphone, camera or user inputs, and share that information over the Internet with whomever they are connected to, are known as connected toys. These toys may replace traditional friends by being highly interactive, for example by recording the child's preferences and by talking back to the child. They may also replace traditional babysitters and keep the child busy while parents are working. Toy companies quickly noted these benefits and advertised their connected products to children and parents while obscuring the associated risks to privacy and data security. For example, Edwin the Duck uses Bluetooth technology to broadcast lullabies to its young users; however, the toy company also collects and retains everything the child says and shares that information with "trusted" third parties. The purpose of our research was to investigate the extent to which connected toy companies address the benefits versus the threats to consumers' privacy and data security.
We analyzed the privacy policies of 15 connected toys; the connected products were selected from the privacy guide developed by the Mozilla Foundation, a not-for-profit organization that supports and promotes the use of connected products. We asked 16
questions about the privacy and data security of each product and looked through the manufacturers' privacy policies for answers. The results provide a snapshot of the informational practices of the connected toy companies, and we recommend ways to make privacy policies more explicit so that consumers can make informed decisions before purchasing.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 152–158, 2019.
https://doi.org/10.1007/978-3-030-02686-8_12

2 Literature Review

Connected toys relate to ‘a future in which digital and physical entities can be linked,
by means of appropriate information and communication technologies, to enable a whole
new class of applications and services’ [1]. A wide variety of toys fall under the domain
of connected toys. Some of these toys are connected to voice and/or image recognition
software (e.g. Hello Barbie™ or the Hatchimals); some are connected to app-enabled
robots, and other mechanical toys (e.g. Dash and Dot); and others are connected to video
games (e.g. Skylanders or Lego Dimensions) [2]. Some connected toys are connected
to the Internet but do not simulate human-like behaviour; some toys simulate human
interaction by talking to users; and other toys such as connected robots can be coded by
users to perform novel activities [3].
Mascheroni and Holloway (Eds.) (2017) identified articles about connected toys from 12 countries (Australia, Austria, Finland, Germany, Italy, Lithuania, Malta, Portugal, Romania, Serbia, Slovenia and Spain), and documented the benefits of connected toys as reported by parents. The benefits included the development of digital literacy, creativity, motivation to learn, reading and writing literacy, social skills, physical activity, etc. Despite the benefits, however, concerns about the security and privacy of users (who are primarily children) have been documented in the literature since the early days of connected toys [4].
Concerns about children's security and privacy were already in place as social networking, gaming, and other websites gathered, stored, and shared data from child users with third parties, often without the child users' knowledge or consent [5]. Connected toys intensified these concerns by making data collection from children easier (for example via microphone, camera, location tracker, and movement detectors) and by being able to collect more personal data (for example by following child users everywhere and by being always "on"). These developments exacerbated the risk of easy access to personal information, obtainable simply by hacking company databases. Recent examples include the hacking of data collected from millions of child users by the connected toys Hello Barbie and VTech [2].
The security and privacy concerns imply that toy makers should incorporate effective
measures from inception to completion of the development process of connected toys
[6]. Our research looks into the privacy policies of toy companies to report how the
companies are addressing public hopes and fears surrounding connected toys.

3 Methodology

The Mozilla Foundation published a report, Privacy Not Included, in December 2017 that reviewed openly accessible privacy policies of different connected products. The
report aimed to draw buyers' attention to three questions related to privacy and security before purchasing the products: (1) How do the products spy on users? (2) What information about the users do the products collect? and (3) What could happen to users if data breaches occur? For example, the Mozilla guide reports that the connected toy Dash the Robot is a one-eyed robot that can sing, dance, and play to give a highly interactive and fun experience to children; however, parents should be warned that the robot can spy on children via its microphone and that parents have no control over the data that the robot collects.
To extend the Mozilla product reviews and provide a more in-depth synopsis of users' privacy and data security related to connected products, we conducted further analyses of the privacy policies of 15 toys and game consoles listed in the Mozilla report. These
connected products were: Smart letters, Edwin the Duck, Adidas miCoach Smart Soccer
Ball, Ozobot Evo, Beasts of Balance, Toymail Talkie, Sphero SPRK+, Osmo, Dash the
robot, BB-8 by Sphero, Airjamz Air Guitar, Hello Barbie, Microsoft Xbox One, Sony
Playstation 4, and Nintendo Switch.
We developed 16 distinct questions from the open access Digital Standards, created
by Consumer Reports, Disconnect, Ranking Rights and the Cyber Independent Testing
Lab to evaluate the privacy and security of the 15 connected toys. For example, we
investigated how secure user information is when using a connected product; we looked
through the product’s privacy policies to determine if the company routinely audits user
data and restricts third-party access to the data. The questions covered what privacy measures were in place, what privacy controls were available, and what kinds of information the companies gathered from users and disclosed to third parties.
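Our 16-question instrument can be organized as a simple per-product checklist tally. The sketch below is illustrative only: the question keys, toy names and answers are hypothetical placeholders, not our actual coding sheet.

```python
# Sketch of the evaluation tally: each product gets an answer per question,
# and we count how many products satisfy each criterion.
# Question keys and answers here are illustrative, not the full instrument.
QUESTIONS = [
    "data_secure", "password_required", "data_encrypted", "user_controls_data",
]

def tally(answers):
    """answers: {product: {question: 'yes' | 'no' | 'unclear'}} -> counts per question."""
    counts = {q: {"yes": 0, "no": 0, "unclear": 0} for q in QUESTIONS}
    for product_answers in answers.values():
        for q in QUESTIONS:
            counts[q][product_answers.get(q, "unclear")] += 1
    return counts

answers = {
    "Toy A": {"data_secure": "unclear", "password_required": "yes",
              "data_encrypted": "no", "user_controls_data": "no"},
    "Toy B": {"data_secure": "unclear", "password_required": "yes",
              "data_encrypted": "yes", "user_controls_data": "unclear"},
}
print(tally(answers)["password_required"]["yes"])  # prints 2
```

Tallying per question rather than per product makes the cross-company comparison in the results section immediate.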

4 Results

4.1 How secure is users’ data?

Almost all the companies we studied claim to take steps or comply with standards to protect user data, but they are not always clear about what steps they take or what standards they follow. Furthermore, none of the companies we studied claim to be hack-proof; all admit that security breaches can still happen.

4.2 Do users need to make a password?

Most companies require users to create a password. However, passwords are not required to be complex or secure, which means user accounts could be easily compromised.

4.3 Does the company encrypt users’ information?


Only four (27%) of the companies we studied fully encrypt user data; the others partly encrypt user data or do not encrypt at all. This means that user information could be easily read if hacked.

4.4 Can users control the data that the company collects?
About half of the companies we studied (53%) do not mention whether users can control their own data. In fact, a few companies, such as the Osmo toy, automatically collect information without user control.

4.5 Can users delete their data when they leave the service?

Almost all the companies we studied allow users to delete data when they leave their services, but perhaps not completely. For example, companies may retain non-personally identifiable data, as well as cached or backup copies of user data that the companies are not explicit about. This means that even after users leave a service, their information could be hacked.

4.6 Do users know what information the company collects?

Almost all the companies we studied give users a snapshot of what information is collected from them. However, the detailed rules are often too complex to understand and easy to overlook.

4.7 Does the company collect only the information needed for the product to
function?
Almost all the companies we studied collect more information from users than is needed to make their products work.

4.8 Is users’ privacy protected from third parties by default?

None of the companies we studied protect user data from third parties by default. Some companies allow users to review and change their privacy settings. However, it is not clear to what extent users can protect their privacy without losing access to services.

4.9 How does the company use users’ data?

The privacy documents of almost all the companies we studied explicitly state how they might use user data. However, most companies place the responsibility on users to control their own privacy, and users are warned that they might not get the best service if they restrict access to their data.

4.10 Does the company have a privacy policy document?


All the companies we studied have privacy policy documents. However, the documents are often very long, written in impenetrable language, and often do not answer important questions.

4.11 Will users receive a notification if the company changes its privacy policy?
Less than half (40%) of the companies we studied send notifications when their privacy policies change. Most companies either do not mention changes at all or simply update the date at the top of their policy documents, which users are very unlikely to re-read and notice.

4.12 Does the company comply only with legal and ethical third-party requests
for users’ information?

Only 27% of the companies we studied explicitly state that they comply only with legal and ethical third-party requests for user information. Most companies claim to share non-identifiable information or are not explicit about how information requests are handled.

4.13 Does the company require users to verify identity with government-issued
identification, or with other forms of identification that could be connected
to users’ offline identity?

None of the companies we studied require users to verify identity with government-
issued identification, indicating that users can register for services under false names.

4.14 Does the company notify users for any unauthorized access to data?
Only two (13%) of the companies we studied notify users of security breaches. This means that users may continue to use connected products even after they have been hacked.

4.15 Is the company transparent about its practices for sharing users’ data with
the government and third parties?

Only four (27%) of the companies we studied were transparent about sharing practices
with the government and third parties.

4.16 Does the company send notifications if the government or third parties
request access to users’ data?

Only three (20%) of the companies we studied notify users of third-party requests. This means that third parties may collect users' information without their awareness.
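As a sanity check on the percentages reported in these subsections, each proportion is simply a count out of the 15 companies studied; the conversion is plain arithmetic, not data from the policies themselves:

```python
# Fractions of the 15 companies studied, as reported in Sects. 4.3-4.16.
TOTAL = 15
for label, count in [("fully encrypt (4.3)", 4),
                     ("no data-control mention (4.4)", 8),
                     ("change notifications (4.11)", 6),
                     ("breach notification (4.14)", 2),
                     ("third-party request notice (4.16)", 3)]:
    print(f"{label}: {count}/{TOTAL} = {round(100 * count / TOTAL)}%")
```

Note in particular that 3 of 15 companies corresponds to 20%.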

5 Discussion

Childhood experiences are rapidly becoming digital, with connected toys and games that let children connect to strangers effortlessly from the comfort of their home. Although this may seem fun and safe, our findings indicate that none of the toys provided satisfactory answers to all 16 questions related to privacy and data security. There remain a variety of ways a connected toy company may gather information, such as recording users' preferences, tracking a user's IP address, and turning on a device's camera every time the toy is used. The security of user information thus relies on the security of the databases of the connected toy company or of the third parties the company shares information with. If hackers or even employees access the databases with any wrong motive, from having fun to stealing money to initiating a cyber-war, strangers can talk back to the young users and make them do inappropriate things.
To prevent data breaches, the privacy policy documents of the 15 toy companies we analyzed claimed to have privacy measures in place; this might lead parents to trust the companies as responsible caretakers of their children. However, the privacy policies of almost all the companies conceded that their databases might not be secure enough to prevent data breaches. Companies seem to posit that users are responsible for their own security, yet users were often threatened with losing services if they exercised control of their privacy, for example if they did not share data with third parties.
Each company's privacy policy attempts to document its data collection and sharing practices, which might give consumers the feeling of making an informed decision about purchasing the company's products. However, the policies do not follow a standardized format and are not always written in a way that the general user can understand. Also, the definitions of privacy measures such as data control and data collection are not standardized across companies. This means that many parents may not be aware of the information that companies gather about their children, which may limit their ability to make fully informed decisions about the products they are purchasing. For example, when a parent signs up for an account for various toys or consoles, certain information is asked of them, but the sign-up mechanisms do not draw the parent's attention to the fact that the toy's microphone may be accessed or that the child's IP address and/or Wi-Fi information may be stored on the company's servers.
Furthermore, users may skip reading lengthy documents, such as ambiguous privacy policies, that describe before purchase what a certain connected toy does. For example, users may ignore an ambiguous warning that a toy may be harmful if it does not state clearly why or how the toy may be harmful. Users may also assume that if a product is on the market, the company must have done security checks. For example, if a new car is on the market, users should not have to wonder whether the car is safe to drive, let alone investigate whether children's toys are safe to play with.

6 Recommendations for Toy Companies

Our findings suggest, first, that a Frequently Asked Questions (FAQ) section that itemizes privacy-related questions, the way we did in this report, should accompany privacy policy documents, so that it is easier for people to see how their information is collected, used and disclosed. Secondly, if the concerns stem from sharing data with company databases, toy companies should reconsider the necessity of sharing data with remote databases that can be hacked, rather than storing data locally within the toy itself, where it can only be compromised if the child loses the toy.
Furthermore, more evaluations need to be done as new toys are developed, to ensure that children's information is given the highest level of protection. Manufacturers should strive to make connected toys more reliable and capable each year, while service providers, software engineers, governments, private organizations, and technical experts should strive to prevent and solve the security and socio-economic problems arising from connected toys.

Acknowledgment. The author wishes to thank Diana Cave (Criminology Department, University of Ottawa) for assisting in conducting the research, and Professor Valerie Steeves (Criminology Department, University of Ottawa) for her valuable comments on previous drafts of this article.

References

1. Miorandi, D., Sicari, S., De Pellegrini, F., Chlamtac, I.: Internet of Things: vision, applications and research challenges. Ad Hoc Netw. 10(7), 1497–1516 (2012). https://doi.org/10.1016/j.adhoc.2012.02.016
2. Holloway, D., Green, L.: The internet of toys. Commun. Res. Pract. 2(4), 506–519 (2016)
3. Mascheroni, G., Holloway, D. (eds.): The Internet of Toys: A Report on Media and Social Discourses Around Young Children and IoToys. DigiLitEY, London (2017)
4. Dobbins, D.L.: Analysis of security concerns and privacy risks of children's smart toys. Ph.D. Dissertation. Washington University St. Louis, St. Louis, MO, USA (2015)
5. Steeves, V., Jones, O.: Surveillance, children and childhood (Editorial). Surveill. Soc. 7(3/4), 187–191 (2010)
6. Nelson, B.: Children's Connected Toys: Data Security and Privacy Concerns. United States Congress Senate Committee on Commerce, Science, and Transportation, 14 December 2016. https://www.hsdl.org/?view&did=797394. Accessed 4 July 2017
A Reinforcement Learning Multiagent Architecture
Prototype for Smart Homes (IoT)

Mario Rivas and Fernando Giorno
Instituto de Pesquisas Tecnológicas – IPT, São Paulo, Brazil
mariorivas@hotmail.com, fgiorno@gmail.com

Abstract. Continuous technology progress is fueling the delivery of new and less expensive IoT components, providing a variety of options for the Smart Home. Although most of the components can be easily integrated, achieving an optimal configuration that prioritizes environmental goals over individual performance strategies is a complex task that requires manual fine-tuning. The objective of this work is to propose an architecture model that integrates reinforcement learning capabilities in a Smart Home environment. In order to ensure the completeness of the solution, a set of architecture requirements was elicited. The proposed architecture is extended from the IoT Architecture Reference Model (ARM), with specific components designed to coordinate the learning effort, as well as data governance and general orchestration. Besides confirming the fulfillment of the architecture requirements, a simulation tool was developed to test the learning capabilities of a system instantiated from the proposed architecture. After 6.4 million execution cycles, it was verified that the system was able to learn in most of the configurations. Unexpectedly, results show very similar performance for collaborative and competitive environments, suggesting that a more varied selection of agent scenarios should be tested as an extension of this work, to confirm or contest the Q-Learning hypothesis.

Keywords: IoT · Reinforcement learning · Q-Learning · Architecture

1 Introduction

Considering the continuous progress in the scientific landscape that facilitates the delivery of new IoT (Internet of Things) components, and the absence of a single fully adopted industry standard [1], the goal of achieving an optimal efficiency setup for a Smart Home relies on empirical approaches and context-based rules rather than AI techniques. Furthermore, strategies to achieve context-specific goals like energy efficiency, home safety or environmental control require pre-emptive knowledge of the components and their interactions, reducing flexibility and resilience.
The objective of this work is to propose an abstract architecture model that integrates reinforcement learning capabilities in a Smart Home environment, allowing real-time agent configuration and information-exchange governance. By this means, concrete systems derived from this architecture will be able to learn optimal strategies to achieve environmental goals.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 159–170, 2019.
https://doi.org/10.1007/978-3-030-02686-8_13

The rest of this paper is structured as follows. Section 2 introduces related research that contributed to the background of this work. The proposed architecture is presented in Sect. 3, describing the architecture requirements, the design approach and the final architecture description. Section 4 details the testing process, introducing the design of the testing tool, the simulation cases and their results. Finally, conclusions and future scope are included in Sect. 5.

2 Related Work and Research Contribution

2.1 Related Work

Several approaches have been proposed to resolve the complex interaction issues of IoT environments and their dynamic configuration requirements. In the specific field of manufacturing, Katasonov et al. [2] introduced the concept of multiagent platforms with autonomous behaviour to overcome the interoperability issues derived from the multiplicity of standards and protocols. Wang et al. [3] proposed an agent-based hybrid service delivery, composed of four subsystems: (1) hybrid services based on agents, (2) a hybrid-service ontological search engine, (3) a service enablers repository and (4) a service-oriented agent lifecycle manager.
In order to reduce the uncertainty of the inherently stochastic IoT environment, Nastic et al. [4] introduced the platform U-GovOps to manage elastic IoT systems, applying a declarative proprietary language to define policies and resolve real-time issues.
While most authors defined solutions based on multiagent systems, few references to agent learning techniques were found [3], and no specific mention of reinforcement learning.

2.2 Research Contribution

This paper presents an integrated vision of several recent studies related to Smart Home architectures powered by multiagent systems and reinforcement learning techniques, defining a framework to instantiate concrete architectures.
In order to verify the learning capacity of the resulting architecture, a simulation tool was created in which 64 different scenarios were tested. The results of these simulations are relevant for understanding the impact of the hyperparameters in the reinforcement learning approach.

3 Proposed Architecture

The current section describes the proposed architecture model covering the objectives explained in Sect. 1. First a summary of the architecture requirements is listed, then an explanation of the design approach, and finally the architecture description itself.

3.1 Architecture Requirements


The general objective of this architecture is to provide learning capabilities in a Smart Home architecture based on a multiagent system, supporting online reconfiguration and resilience. The list of architecture requirements, classified into functional and non-functional requirements, is presented in Table 1.

Table 1. Architecture requirements

Functional requirements:
– Initial system configuration: Architecture provides suitable artefacts for system configuration
– Learning process oversight: Individual agent learning progress is calculated and utilized in the reward provision
– New agent inclusion: New agents are included in real time
– Agent removal: Architecture provides artefacts to remove agents in real time
– System parameters modification: System parameters are modifiable in real time
– Information consumers coordination: External information consumers can be added/removed in real time
– External information flow: System information flows externally to the consumers, according to the information governance in place
– System governance control: Architecture provides artefacts to define and manage governance

Non-functional requirements:
– Resilience: Architecture provides a redundant structure to support operations continuity
– Scalability: System resource requests are anticipated and capacity limitations are proactively managed
– Performance: Component orchestration is aligned with system performance

3.2 Design Approach


The architecture reference model for IoT (ARM) was developed in a joint effort by the European Platform on Smart Systems (EPoSS) and the IoT-A project. Its main function is to provide a common structure and a set of guidelines for elaborating concrete IoT architectures in different contexts. ARM consists of a set of interdependent sub-models describing the basic aspects of a reference architecture. The intersection of this model with the system requirements determines the instantiated architecture, represented by views and perspectives. The basic models described by ARM are: IoT Domain (physical entities and their logical representation, etc.), Information Domain (information structures, service modelling, etc.), Functional Domain (groups of functionalities included), Communication Domain, and Trust, Security and Privacy Domain.

The approach of Bassi et al. [5] to generating architectures based on ARM is supported by the usage of views and perspectives as described by Rozanski and Woods [6]. The set of basic views suggested by these authors [6] is: Functional, Information, Concurrent, Development, Deployment and Operational. This collection of views provides a comprehensive description of the architecture; however, it does not explicitly consider non-functional requirements like information security or resilience. Since these requirements are orthogonal to the functional requirements, Rozanski and Woods [6] suggest documenting them as "perspectives", describing their intersection with functional views as a complement to the main description. Following the recommendation of the authors, this work considered the following perspectives: Information Security, Performance and Scalability, Availability and Resilience, and Evolution. A graphic view of the ARM components and their interaction is represented in Fig. 1.

Fig. 1. Architecture reference description model after ARM.

3.3 Architecture Description


The main concept of the proposed architecture is based on the virtualization of the agents and their asynchronous learning management. It is composed of the following elements: Physical Context, Virtual Agent Farm (VAF), Asynchronous Data Layer (ADL), Data Exchange Manager (DEM), Context Manager and Learning Manager, displayed below in Fig. 2.

Fig. 2. Proposed architecture main components.

The Physical Context is the representation of the elements that compose the environment, like sensors, actuators and other hardware devices. Depending on the number of sensors and actuators, several non-exclusive combinations may be defined to instantiate corresponding agents.
The VAF is the logical component that hosts the virtual agents and their system processes. The ADL intermediates data traffic among the different components, assuring the persistence and resilience of the information. Data exchange with external/internal consumers/publishers is managed by the DEM, based on the information governance defined in the configuration and administrated by the Context Manager. This component is in charge of the system orchestration, initiating and controlling all processes and resources.
The Learning Manager calculates and distributes rewards to the agents and oversees the system learning process. While a full representation of the views and perspectives of the proposed architecture exceeds the scope of this paper, the functional view, information view and context view are briefly described below (Figs. 3, 4 and 5).
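To make the component responsibilities concrete, the sketch below mimics the real-time agent inclusion/removal flow through the Context Manager and the Virtual Agent Farm. All class and method names here are our own illustrative assumptions, not part of the ARM-derived specification:

```python
class VirtualAgentFarm:
    """Hosts virtual agents; supports inclusion/removal at run time."""
    def __init__(self):
        self.agents = {}

    def include(self, agent_id, agent):
        self.agents[agent_id] = agent

    def remove(self, agent_id):
        self.agents.pop(agent_id, None)


class ContextManager:
    """Orchestrates the system: owns the farm and applies configuration changes."""
    def __init__(self, farm):
        self.farm = farm

    def on_new_device(self, agent_id, agent):
        # A new sensor/actuator combination instantiates a corresponding agent.
        self.farm.include(agent_id, agent)

    def on_device_lost(self, agent_id):
        self.farm.remove(agent_id)


farm = VirtualAgentFarm()
cm = ContextManager(farm)
cm.on_new_device("thermostat-1", object())
print(len(farm.agents))  # prints 1
cm.on_device_lost("thermostat-1")
print(len(farm.agents))  # prints 0
```

The point of the sketch is only the control flow: agents enter and leave the farm through the Context Manager, never directly.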

Fig. 3. Functional view.

The functional view describes four main logical components and their basic interactions. As depicted here, the Context Manager orchestrates and supervises most of the functional flows within the system. Although the ADL is embedded in the background of this visual representation, none of its functionalities justify its inclusion as a logical component.

Entities included in the information view diagram represent the main information concepts and their composition/aggregation relationships. An information flow diagram (usually described using a UML message flow diagram) complements this view, representing the information system lifecycle. The most important entities of the proposed architecture are described in Fig. 4.

Fig. 4. Information view.

As defined by Rozanski and Woods [6], the context view describes the relationships, dependencies and interactions between the system and its environment. The proposed architecture is defined by the logical and physical contexts. Data and software components are included in the logical context, while hardware components are included in the physical context. External entities interacting with the system are represented as out-of-system-boundary in Fig. 5.

Fig. 5. Context view.

4 Testing

The proposed architecture was designed to cover all the architecture requirements mentioned in Sect. 3; however, its material verification cannot be performed without a concrete system derived from it. While creating a concrete IoT system to assess the feasibility of the proposed architecture is out of the scope of this work, an execution simulator tool (EST) was designed to confirm the learning capabilities of the solution.

4.1 EST Design


The EST was developed as a functional prototype of the proposed architecture, considering its main structures and the relationships among components. Due to the experimental approach and the limited resources, some particularities were defined:
– The physical environment was reduced to a bi-dimensional space;
– Every agent has an individual id, a pair of coordinates (x, y) and a reference to two other agents, known as its "vertices";

– There are nine (9) possible actions for an agent at any cycle: stand still or move in one of eight (8) possible directions (0°, 45°, 90°, 135°, 180°, 225°, 270° or 315°);
– At every cycle each agent knows its current coordinates and the coordinates of each of its vertices;
– The individual reward calculation is based on the angular difference between the triangle formed by the agent and its vertices and a hypothetical equilateral triangle. To calculate the difference, every internal angle of the triangle is compared with the 60° target, and the sum of the three absolute differences is subtracted from 120:

R = 120 − (|ang1 − 60| + |ang2 − 60| + |ang3 − 60|)
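The reward can be computed from the three agents' coordinates via the law of cosines. The sketch below is our own reconstruction of that calculation from the description above; function names and the coordinate convention are assumptions, not the paper's code:

```python
import math

def internal_angles(p1, p2, p3):
    """Internal angles (degrees) of the triangle p1-p2-p3, via the law of cosines."""
    a = math.dist(p2, p3)  # side opposite p1
    b = math.dist(p1, p3)  # side opposite p2
    c = math.dist(p1, p2)  # side opposite p3
    ang1 = math.degrees(math.acos((b**2 + c**2 - a**2) / (2 * b * c)))
    ang2 = math.degrees(math.acos((a**2 + c**2 - b**2) / (2 * a * c)))
    ang3 = 180.0 - ang1 - ang2  # angles of a triangle sum to 180
    return ang1, ang2, ang3

def reward(p1, p2, p3):
    """R = 120 - (|ang1-60| + |ang2-60| + |ang3-60|): maximal when equilateral."""
    return 120.0 - sum(abs(ang - 60.0) for ang in internal_angles(p1, p2, p3))

# An equilateral triangle earns the maximum reward of 120.
eq = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(reward(*eq))
```

A degenerate triangle (all angles far from 60°) drives the reward toward its minimum, so agents are pushed to arrange themselves into an equilateral formation.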

4.2 Software Project

The prototype was developed on Linux Ubuntu 16, using the Python 3.6 language and the machine learning library PyTorch 0.2.0. Its neural network was built using a deep Q-learning approach, with five (5) input parameters, a hidden layer of thirty (30) neurons and nine (9) output parameters (one per possible agent action). The loss optimization function applied was adaptive moment estimation (Adam), and the neuron activation function was the rectified linear unit (ReLU).
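The described network shape (5 inputs, one hidden layer of 30 ReLU neurons, 9 Q-value outputs) can be sketched framework-independently. The forward pass below uses NumPy with random placeholder weights, since the authors' PyTorch 0.2.0 code and trained weights are not given; it illustrates only the shape, not training:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer shapes follow the paper: 5 inputs, one hidden layer of 30 neurons, 9 outputs.
W1, b1 = rng.normal(size=(5, 30)), np.zeros(30)
W2, b2 = rng.normal(size=(30, 9)), np.zeros(9)

def q_values(state):
    """Forward pass: state (5,) -> one Q-value per action (9,). ReLU activation."""
    hidden = np.maximum(0.0, state @ W1 + b1)  # rectified linear unit
    return hidden @ W2 + b2

state = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
print(q_values(state).shape)  # prints (9,)
```

In the actual prototype, the Adam optimizer would adjust W1, b1, W2, b2 to minimize the temporal-difference loss; that training loop is omitted here.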

4.3 Simulation Cases

In order to define the simulation cases, the following parameters were considered: (1)
number of agents, (2) algorithm learning rate (alpha), (3) softmax policy temperature
(tau) and (4) number of execution cycles.
The number of agents was limited to four, using the following configurations: (a)
three agents with only one active agent, (b) three agents with only two active agents, (c)
three agents all active and (d) four agents all active.
Learning rate is a parameter utilized by the Q-Learning algorithm [7] to define the
prevalence of new knowledge over previous knowledge. Large alpha values (closer
to 1) imply a faster substitution of knowledge, while lower values (closer to 0) imply
a more conservative approach. The values defined for the simulation cases were
{0.2, 0.5, 0.8}.
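For illustration, the role of alpha can be sketched with the tabular form of the Q-Learning update [7] (the prototype approximates Q with a neural network; the discount factor gamma shown here is a hypothetical value not reported in the text):

```python
def q_update(q, state, action, reward, next_state, alpha, gamma=0.9):
    """One Q-Learning step: alpha weighs the new estimate (reward plus
    discounted best next value) against the stored one."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

With alpha close to 1 the stored value is almost fully replaced by the new target; with alpha close to 0 it barely moves.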
The decision policy utilized by the tool is a version of Softmax [8] implemented on
the PyTorch library. This policy aims to define whether to choose the greedy action (the
best-known action for a specific state) or the random action (to explore the environment)
based on the amount of knowledge currently harvested, i.e. the more knowledge the
more likely to choose a greedy action. The “temperature” parameter (tau) provides a
magnitude to the policy. Values chosen to create the test cases were {0.01, 0.1, 1, 10,
100}.
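A minimal sketch of softmax (Boltzmann) action selection with temperature tau, assuming a vector of Q-values, one per action:

```python
import numpy as np

def softmax_action(q_values, tau, rng=None):
    """Softmax action selection with temperature `tau`: low temperatures
    concentrate probability on the greedy action, high temperatures make
    the choice nearly uniform (more exploration)."""
    rng = rng or np.random.default_rng()
    z = np.asarray(q_values, dtype=float) / tau
    z -= z.max()                          # for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(probs), p=probs)), probs
```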
The number of execution cycles was determined after some exploratory cases,
aiming to gather enough executions to support the conclusions, within the expected
timeframe. As result, the target number of execution cycles was ten thousand (10,000).
168 M. Rivas and F. Giorno

For each agent configuration, fifteen (15) parameter combinations were defined
(three learning rate values × five policy temperatures) plus a scenario without learning
capabilities, totaling sixty-four (64) combinations. Each scenario was executed ten
(10) times, through ten thousand (10,000) cycles, completing six million four hundred
thousand cycles.
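The scenario count can be reproduced with a short enumeration (the configuration labels are illustrative):

```python
from itertools import product

# The four agent configurations from Sect. 4.3
agent_configs = ["3 agents, 1 active", "3 agents, 2 active",
                 "3 agents, all active", "4 agents, all active"]
alphas = [0.2, 0.5, 0.8]            # learning rates
taus = [0.01, 0.1, 1, 10, 100]      # softmax temperatures

# 15 learning scenarios per configuration, plus one without learning
scenarios = list(product(agent_configs, alphas, taus))
scenarios += [(c, None, None) for c in agent_configs]

runs = 10                 # repetitions per scenario
cycles = 10_000           # cycles per run
total_cycles = len(scenarios) * runs * cycles   # 64 * 10 * 10,000
```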

4.4 Results
Every execution generated a text file containing the reward of the system (calculated as
the sum of the individual rewards) for each cycle, and its corresponding graph. In order
to evaluate the convergence of the learning curve, reward values were segmented into
fifty (50) stages, and the standard deviation was calculated for each one. Whenever the
standard deviation keeps decreasing or remains stable at a very low value, learning
curve convergence is confirmed.
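A sketch of this convergence check, with a hypothetical numeric tolerance standing in for "a very low value":

```python
import numpy as np

def stage_deviations(rewards, n_stages=50):
    """Split a run's reward series into stages and return each stage's
    standard deviation."""
    stages = np.array_split(np.asarray(rewards, dtype=float), n_stages)
    return [float(s.std()) for s in stages]

def converged(deviations, tol=0.05):
    # Hypothetical criterion: the final deviation is small and no
    # larger than the initial one.
    return deviations[-1] <= tol and deviations[-1] <= deviations[0]
```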
In general, all test cases confirmed the convergence of the learning curves, except
for a few cases where policy temperature was very low and (as expected) scenarios with
no learning capabilities. Figure 6 describes the results of the simulations consolidated
by policy temperature.

Fig. 6. Learning curve convergence by policy temperature.

When scenarios were compared by learning rate, no relevant difference was found,
as shown in Fig. 7.

Fig. 7. Learning curve convergence by learning rate.

5 Conclusions

The objective of this work was to define an architecture of reference that provides
learning capabilities to a Smart Home environment, allowing for real-time component
configuration and external information governance, as described on the architecture
requirements section.
The proposed architecture defines components and functionalities covering the
architecture requirements, introducing reinforcement learning features. The simulated
scenarios also confirmed the learning curve convergence of the system under
several different configurations.
According to the Q-Learning algorithm definition [7], collaborative multiagent
systems should converge to an optimal policy in a finite number of cycles; however,
this is not guaranteed for competitive environments. Unexpectedly, test results showed
a very similar convergence curve for collaborative and competitive environments,
suggesting that a more varied selection of agent scenarios should be tested as an
extension of this work, to confirm or contest the Q-Learning hypothesis.
Future extensions of this work may cover the study of learning convergence curves
for more varied configurations, eventually approaching real-life smart home setups.
Another study path suggested by the results of this work refers to the possibility of
sharing intelligence among different configurations, by persisting the agent neural
networks.
Figure 8 represents a consolidated view of the simulations executed by agent
configuration.

Fig. 8. Learning curve convergence by agent configuration.

References

1. Madakam, S., Ramaswamy, R., Tripathi, S.: Internet of Things (IoT): a literature review. J.
Comput. Commun. 3, 164–173 (2015)
2. Katasonov, A., Kaykova, O., Khriyenko, O., Nikitin, S., Terziyan, S.: Smart semantic
middleware for the Internet of Things. In: Proceedings of the 5th International Conference on
Informatics in Control, Automation and Robotics, Portugal, pp. 169–178 (2008)
3. Wang, J., Zhu, Q., Ma, Y.: An agent-based hybrid service delivery for coordinating internet
of things and 3rd party service providers. J. Netw. Comput. Appl. 36, 1684–1695 (2013)
4. Nastic, S., Copil, G., Truong. H., Dustdar. S.: Governing elastic IoT cloud systems under
uncertainty. In: 2015 IEEE 7th International Conference on Cloud Computing Technology and
Science, pp. 131–138. IEEE, Canada (2015)
5. Bassi, A., Bauer, M., Fiedler, M., Kramp, T., Van Kranenburg, R., Lange, S., Meissner, S.:
Enabling Things to Talk: Designing IoT solutions with the IoT Architectural Reference Model,
p. 349. Springer, Berlin (2013)
6. Rozanski, N., Woods, E.: Software Systems Architecture. Working with Stakeholders Using
Viewpoints and Perspectives, p. 529. Pearson, London (2005)
7. Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, UK (1989)
8. Tuyls, K., Weiss, G.: Multiagent learning: basics, challenges and prospects. AI Mag. 3, 41–52
(2012)
Real-Time Air Pollution Monitoring
Systems Using Wireless Sensor Networks
Connected in a Cloud-Computing,
Wrapped up Web Services

Byron Guanochanga1, Rolando Cachipuendo1, Walter Fuertes1(B),
Santiago Salvador1, Diego S. Benı́tez2, Theofilos Toulkeridis1, Jenny Torres3,
César Villacı́s1, Freddy Tapia1, and Fausto Meneses1

1 Universidad de las Fuerzas Armadas ESPE, 171-5-231B Sangolquı́, Ecuador
{beguanochanga,recachipuendo,wmfuertes,mssalvador,ttoulkeridis,
cjvillacis,fmtapia,fhmeneses}@espe.edu.ec
2 Universidad San Francisco de Quito USFQ, Campus Cumbayá, Casilla Postal,
17-1200-841 Quito, Ecuador
dbenitez@usfq.edu.ec
3 Escuela Politécnica Nacional, P.O. Box 17-01-2759, Quito, Ecuador
jenny.torres@epn.edu.ec

Abstract. Air pollution continues to grow at an alarming rate, decreasing
the quality of life around the world. As part of preventive measures,
this paper presents the design and implementation of a secure and
low-cost real-time air pollution monitoring system. In that sense, a
three-layer architecture system was implemented. The first layer contains
sensors connected to an Arduino platform towards the data processing
node (Raspberry Pi), which sends messages through a wireless network
using the Message Queuing Telemetry Transport (MQTT) protocol. As
a failback method, strings are stored in flat files within the data
processing nodes and sent via the SSH File Transfer Protocol (SFTP) as a
restore operation in case the MQTT message protocol fails. The
application layer consists of a server published in the cloud infrastructure
having an MQTT Broker service, which performs the gateway functions
for the messages sent from the sensor layer. Information is then published
within a control panel using the NODE-RED service, which allows
communication flows to be drawn, the received information to be used,
and its posterior storage in a NoSQL database named “MongoDB”.
Furthermore, a RESTful Web service was shared in order to transmit
the information for a posterior analysis. The client layer can be accessed
from a Web browser, on a PC or a smartphone. The results demonstrate
that the proposed message architecture is able to translate JSON strings
sent by the Arduino-based sensor nodes and the Raspberry Pi gateway
node; information about several types of air contaminants has been
effectively visualized using web services.

Keywords: Air pollution · IoT · IaaS · WSN · Web services


c Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 171–184, 2019.
https://doi.org/10.1007/978-3-030-02686-8_14
172 B. Guanochanga et al.

1 Introduction
The World Health Organization (WHO) [1] reported that “Air pollution is the
biggest environmental risk to health, carrying responsibility for about one in
every nine deaths annually”. Although industry and the scientific community
have developed various solutions based on conventional Wireless Sensor
Networks (WSN) for air pollution monitoring, the existing products and the
generated results fail to represent low-cost solutions: some require hiring hosting
or web services, and others allow only a limited number of messages, without a
failback method.
The aim of this work is to develop a secure environmental monitoring
system based on WSN, integrated into the Internet of Things (IoT) concept,
increasing the capacity and life span of the sensor nodes of the WSN at
relatively low cost. Therefore, first, a hardware and software prototype has been
assembled using Arduino and Raspberry Pi platforms, comprising several air
pollution sensors as well as newly designed and constructed wireless expansion
modules. Second, a three-layer architecture, which leverages a real-time air pol-
lution monitoring system has been designed and implemented: (1) The first
sensor layer includes the electronic hardware circuits and the software compo-
nents, both for the Arduino-based sensor nodes and the gateway node, which
was assembled using a Raspberry Pi together with a low-cost wireless expansion
module for capturing the data. (2) The application layer, where a Web service
has been designed and implemented using a set of protocols and formats that
are used to process the data and store them in a MongoDB Database as part of
the Cloud infrastructure. (3) The client layer, which consists of a Web graphical
user interface, providing visual information about environmental parameters
in order to allow the communication with the WSN and users.
The main contributions of this paper include: (1) The creation of a low-cost
wireless monitoring system (i.e., software) as an IoT application to visualize the
levels of air pollution. (2) The implementation of a novel three-layer message
architecture to translate JSON strings sent by Arduino-based sensor Nodes and
the Raspberry Pi gateway node, which are effectively visualized in Web services.
(3) A failback method as a process for restoring operations via SFTP protocol,
in case the MQTT message protocol fails.
The remainder of this paper is organized as follows: Sect. 2 discusses related
work, Sect. 3 presents the experimental setup, as well as the implementation
of electronic devices and web services, while Sect. 4 provides the experimental
results; finally, Sect. 5 ends the paper with the conclusion and future work.

2 Related Work
The scientific community has been developing innovative alternatives to mea-
sure air pollution using WSN. Nevertheless, several studies have been designed
conventionally.
In relation to low-power wireless communication protocols, similar to this
work, some authors such as [2–14] have used ZigBee technology (based on
Real-Time Air Pollution Monitoring Systems Using WSN 173

the IEEE 802.15.4). Conversely, in this work the NRF24L01 radio frequency
transceiver module [15], which has advanced energy management, was used.
The NRF24L01 has an Enhanced ShockBurst hardware protocol accelerator,
which helps to implement a robust and advanced wireless network with low-cost
micro-controllers.
In relation to the connection platform for the different nodes, the study pro-
posed by [7] used Octopus II. The sensor node implemented had a humidity
sensor, temperature and a CO sensor. In [11], the same device was used, with
the difference that the 501A Dust sensor module (DSM501A) was added, which
was designed to detect particles larger than 1 µm. In [5,16,17] the Waspmote
platform was applied, which is characterized by the use of lower energy consump-
tion. In [6], nodes were prepared to monitor gases such as carbon monoxide (CO),
nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), metals such as lead
(Pb) and particulate matter. In [16] authors proposed a clustering protocol for
the sensor network.
For the connection of different sensors, different models of the Arduino plat-
form have been used. For instance, in [12] the Arduino Mega 128 microcontroller
was used together with the MQ-7 sensitive gas sensor detector in order to deter-
mine CO. For the implementation of the sensor node in [18], the Arduino one
with the Digi XBee module were used for the wireless mesh communication of
the nodes. Similarly, in [19] authors used the Arduino R3 board that has an
Atmel Atmega328 microcontroller with a clock speed of 16 MHz, together with
a XBee model. Raspberry Pi model B was also used for the base station, where
a database has been available for the storage of the received readings and a
Web application was used for data presentation. The majority of these studies
[12,19–34,36] resemble this work since the same open-source Arduino platform
is used. However, they differ in the way the data is transmitted towards the
database, since a Raspberry Pi acting as the Gateway node is used in this work,
using a three-layer message architecture, together with the NRF24L01 module
for Wireless communication.
Regarding the number of sensors for measuring air quality parameters, in [32]
a device was implemented to monitor the CO in different industrial plants. In
[35] temperature and relative humidity data were collected using the SHT11 and
SHT75 sensors, respectively. In [36], a predesigned sensor node called CanarIT
was used, which includes several sensors. Data from each sensor node were
stored in the cloud by GPRS communication. In [37], the sensors used were
MG-811 for CO2, MQ-7 for CO and GP2Y1010AU0F for powder particles. In
comparison with this study, most sensor nodes determined only up to four
pollutants, the most common being CO, CO2 and particulate matter.
Nonetheless, more sensors were implemented in this study in order to mea-
sure more pollutants, including CO, CO2, methane (CH4), sulfur dioxide (SO2),
hydrogen sulfide (H2S), NO2 and particulate material (2.5 and 10 µm).
Furthermore, similar to the study proposed in [37], in this work all data
have been stored in a non-relational database and processed in a private cloud
computing infrastructure.

3 Experimental Setup
The general architecture of the real-time air pollution monitoring system is illus-
trated in Fig. 1. The system has been divided into three layers. First, the Sensors
layer is formed by the sensor nodes (SN) connected by Arduino R3 boards located
in a distributed manner and the Gateway node, consisting of a Raspberry Pi
board, forming a WSN. The sensor nodes send the polluting gas measurement
information to the corresponding Gateway node wirelessly. Second, the Gateway
node with Internet access sends the received information to an application
server in the cloud. The information is stored in a non-relational
database, MongoDB. Third, this information is published on a Web
page so that users would be able to access it through their Web browser and
smartphones.

Fig. 1. Architecture that leverages the WSN system.

3.1 Sensor Nodes

The electronic circuit diagram of a typical Sensor Node prototype, which depicts
the connections made in each sensor node, is shown in Fig. 2. It consists of the
Arduino board, the Wireless module NRF24L01, and the CO, CO2, CH4, SO2,
H2S, NO2 and particulate material sensors. For the measurement of polluting
gases, the modules MQ-7 (CO), MG-811 (CO2), MQ-4 (CH4), MQ-136 (SO2,

Fig. 2. Schematic electronic circuit diagram for each Sensor Node.

and H2S) and MICS-2714 (NO2) were used. Finally, for the measurement of
particulate material of 2.5 and 10 µm, the digital sensor HK-A5 was used.
The CO, CH4, SO2, H2S and NO2 sensors are Metal Oxide Semiconductor
(MOS) based sensors. This type of sensor contains a small heating element inside
as well as an electrochemical sensing element. The heater is necessary in order to
bring the sensor to its proper operating conditions, since the sensitive surface of the
sensor will react only at certain temperatures. The detection principle is based
on the change of resistance due to incoming gas contact. The CO2 sensor is a
chemical sensor that operates under the principle of a solid electrolyte cell. When
the sensor is exposed to CO2, chemical reactions occur in the cell producing an
electromotive force. The temperature of the sensor must be high enough for these
reactions to occur. Therefore, a heating circuit was used to heat up the sensor
to an adequate temperature.
The MOS sensors required signal conditioning circuits to convert their
readings into voltages that are measured by the Arduino board. Similarly, the
CO2 module has an amplifier circuit to improve the accuracy of the measure-
ments since the output voltage of the sensor is relatively low. Sensor voltages
were measured by the analog inputs of an Arduino Mega microcontroller board.
The particulate material sensor communicates serially with the Arduino board.
Figure 2 shows the sensor connections with the Arduino Mega board. For wireless
communication, the NRF24L01 transceiver module was used, operating in
the 2.4 GHz ISM band. Since the Arduino board lacks enough connections for
powering the sensor modules, a shield-type board was designed for connecting
the board with the sensors and the wireless extension module. The Gateway
node consists of a Raspberry Pi board, and a NRF24L01 Wireless expansion
module. This node receives the measurements from all connected sensor nodes.

3.2 Hardware Prototype


The sensor node prototype was implemented inside a sealed chamber in which the
sensor modules were placed, as shown in Fig. 3. The chamber has two air ducts:
air is sucked in by a fan and then escapes towards the outside.
The sensors and the wireless module were connected to the shield and to the
Arduino board, where all the sensor node is controlled. Signals from the sensors
were interpreted to different gas concentrations according to the characteristic
curves described in their corresponding data sheets.
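A typical conversion of this kind can be sketched as below; all constants (load resistor, clean-air resistance R0, curve parameters a and b) are placeholders for the datasheet values of each specific sensor, not values from the paper:

```python
def adc_to_ppm(adc_value, r_load=10.0, vcc=5.0, a=100.0, b=-1.5, r0=10.0):
    """Illustrative conversion of a 10-bit Arduino analog reading from a
    MOS gas sensor into a concentration. The characteristic curve is
    approximated as a log-log straight line: ppm = a * (Rs/R0)**b.
    All constants here are hypothetical placeholders."""
    v_out = adc_value * vcc / 1023.0          # ADC counts to volts
    rs = r_load * (vcc - v_out) / v_out       # sensor resistance (kOhm)
    return a * (rs / r0) ** b
```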

Fig. 3. Photograph of the sensor node prototype.

3.3 Web Services


In order to send messages from the Gateway node to the application server, the
MQTT protocol was used with the character string and format illustrated in
Fig. 4. The format uses the JavaScript Object Notation (JSON) type, composed
of the sensor node ID, the IP address, date and time of measurement, latitude
and longitude as well as measurements of the sensor.
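The construction of such a string can be sketched as follows; the key names are hypothetical, since the paper specifies the contents (node ID, IP address, timestamp, coordinates, sensor readings) but not the exact field names:

```python
import json
from datetime import datetime, timezone

def build_payload(node_id, ip, lat, lon, readings):
    """Assemble the JSON string sent over MQTT (cf. Fig. 4); the field
    names used here are hypothetical."""
    return json.dumps({
        "id": node_id,
        "ip": ip,
        "datetime": datetime.now(timezone.utc).isoformat(),
        "lat": lat,
        "lon": lon,
        "measurements": readings,   # e.g. {"co2": 962, "ch4": 210}
    })
```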
As a failback method for handling errors, this string is also stored in the
Gateway node in a flat file and sent to the application server through
the SFTP protocol, with the purpose of processing it and acting as redundancy

Fig. 4. Character string format based on the MQTT protocol.

in case the MQTT message protocol fails. The application server has an MQTT
Broker service, which represents a central node or broker server and is
responsible for managing the network by receiving messages sent from the
Gateway nodes.
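The failback logic can be sketched independently of any MQTT or SFTP library (in practice, clients such as paho-mqtt and paramiko could fill these roles); here `publish` is any callable that raises an exception on failure:

```python
def publish_or_spool(payload, publish, spool_path):
    """Try to send the JSON string via MQTT; on failure, append it to a
    flat file that a separate restore job would later push over SFTP.
    `publish` is any callable that raises on failure (for example a
    wrapper around an MQTT client's publish call)."""
    try:
        publish(payload)
        return "sent"
    except Exception:
        with open(spool_path, "a") as spool:
            spool.write(payload + "\n")
        return "spooled"
```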
The system has a Delay-Access process, which allows the reception of messages
from the processing nodes to be synchronized. This process is always checking
the status of messages to guarantee their availability and to verify whether the
failback option has to be performed for a failing node, should that be the case.
With the implementation of the NODE-RED service installed on the server,
several information flows have been created in order to publish the measurement
data received from the MQTT Broker on the Web, and at the same time to store
them within a MongoDB database. Additionally, an information flow was created
through a RESTful Web service and a GET method, which allowed retrieving
information from the database in order to be shared by other systems.
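A client consuming that GET endpoint might look like the sketch below; the endpoint URL and the document layout are hypothetical, as the paper only states that MongoDB documents are exposed via a RESTful GET method:

```python
import json

def parse_measurements(body):
    """Extract the measurement sub-documents from the JSON array the
    GET endpoint is assumed to return (layout is hypothetical)."""
    return [doc["measurements"] for doc in json.loads(body)]

# In a real client, `body` would come from an HTTP GET against the
# NODE-RED endpoint, e.g. urllib.request.urlopen(url).read().
```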
Figure 5 presents the control panel information about the state of the system’s
central node; it displays the temperature, CPU load, and memory consumption,
which may allow the status of the WSN to be diagnosed. Figure 6, on the other
hand, shows an example of the real-time monitoring of methane by one of the
sensors, sent by the central node.

4 Results and Discussion

For the proof of concept of this monitoring solution, several pollutant measure-
ments were taken every seven seconds. These measurements were conducted
around three different locations in Ecuador: a university campus located in the
city of Sangolquı́, in the southern zone of the city of Machachi, and in the “La
Virgen Santisima” cave in Tena [38]. A total of 260 samples were obtained for
CO, CO2, CH4, SO2, H2S, NO2 gases, and powder density of type PM2.5 and
PM10. The obtained measurements were within the detection ranges of the gas sen-
sors used in the prototype. Table 1 shows the sensors ranges together with the
typical concentrations of such gases in the environment.
Figure 7 shows the resulting CO2 measurements for the 3 locations; as can be
seen, the concentration inside the cave is much higher than in the cities of
Sangolquı́ and Machachi. An average of 1240 ppm was obtained inside the karstic
cave, with a standard deviation (SD) of 319 ppm. In Sangolquı́ it was about
962 ppm with a SD of 112 ppm, while in Machachi an average of 794 ppm, with
a corresponding SD of 89 ppm, was obtained.

Fig. 5. Control panel of the central node of the system.

Fig. 6. Example of real-time monitoring of methane by one of the sensors.



Table 1. Types of polluting gases and measurement ranges for the sensors used

Polluting gas          | Sensor    | Range          | Reference
Carbon monoxide, CO    | MQ-7      | 20–200 ppm     | Ecua. Stand.
Carbon dioxide, CO2    | MG-811    | 400–10000 ppm  | Ref. value
Methane, CH4           | MQ-4      | 200–10000 ppm  | Ref. value
Sulfur dioxide, SO2    | MQ-136    | 1–200 ppm      | WHO
Hydrogen sulfide, H2S  | MQ-136    | 1–100 ppm      | OSHA(a)
Nitrogen dioxide, NO2  | MICS-2714 | 0.05–5 ppm     | WHO
PM2.5/PM10             | HK-A5     | 0–999 ug/m3    | WHO

(a) Occupational Safety and Health Administration (OSHA), USA.
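The detection ranges in Table 1 can be encoded as a simple validity check for incoming readings (a sketch; the paper does not describe such a filter):

```python
# Detection ranges from Table 1 (ppm; particulate matter in ug/m3).
SENSOR_RANGE = {
    "CO": (20, 200), "CO2": (400, 10000), "CH4": (200, 10000),
    "SO2": (1, 200), "H2S": (1, 100), "NO2": (0.05, 5),
    "PM": (0, 999),   # covers both PM2.5 and PM10 (HK-A5)
}

def within_range(gas, value):
    """True when a reading falls inside the sensor's detection range,
    as the paper reports for all obtained measurements."""
    low, high = SENSOR_RANGE[gas]
    return low <= value <= high
```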

Fig. 7. Comparison of CO2 measurement results in the three different locations.

Figure 8, on the other hand, shows the data obtained from the 2.5 micron
particulate material. In Sangolquı́, it reached up to 10 ug/m3, a higher
concentration density than in Machachi, with about 5 ug/m3, while for the cave
this concentration is about 6 ug/m3. Nevertheless, the Ecuadorian standard
of air quality [1] specifies as limits an average of about 50 ug/m3 over a day of
monitoring and 15 ug/m3 as an annual average; therefore, the density of dust
particles in the studied sectors remains within the recommended levels.

Fig. 8. Measurements of PM2.5 at three different locations.

Fig. 9. Measurements of PM10 at three different locations.



Finally, Fig. 9 illustrates the measurements of particulate material finer than
10 microns at the 3 locations. Here, the Sangolquı́ sector again presents mostly
data of about 12 ug/m3, higher than the sector of Machachi with 6 ug/m3 and
than the Amazonian cave with about 8 ug/m3. Similarly, the
Ecuadorian standard of air quality establishes that the annual PM10 concen-
tration should not exceed 50 ug/m3; and the daily average should not exceed
100 ug/m3. Therefore, the measurements obtained for the 3 locations comply
with the values recommended by the WHO and the Ecuadorian standard of air
quality.

5 Conclusions and Future Work


This paper focused on the design and implementation of a real-time Air
Pollution Monitoring System based on the use of WSN under the concept of IoT,
using a Cloud Computing infrastructure. A three-layer architecture was
designed and implemented with low-cost electronic hardware, such as Arduino-
based sensor nodes as well as a Raspberry Pi-based gateway node with a low-cost
wireless expansion module that captures the data. In addition, a Web service
was also designed and implemented using a set of protocols and formats used to
process the data and store them in a MongoDB Database as part of the Cloud
infrastructure. The implemented Web graphical user interface allowed
communication with the WSN and users. Compared with other proposed solutions
described in the literature, the solution proposed here is secure, since the same
string has been stored within the data processing nodes in a flat file and sent
to the application layer by means of SFTP, acting as a failback method, in
order to process it and keep it as redundancy in case the MQTT message protocol
fails.
Next steps will include the integration of the proposed solution with an ana-
lytical data system based on big data tools, as well as performance improvements
on the capture of the frames by using an Odroid electronic board.

Acknowledgment. The authors would like to thank the financial support of the
Ecuadorian Corporation for the Development of Research and the Academy (RED
CEDIA) in the development of this work, under Project Grant CEPRA-XI-2017-13.

References
1. World Health Organization. Ambient air pollution: A global assessment of exposure
and burden of disease (2016)
2. Zhi-gang, H., Cai-hui, C.: The application of Zigbee based wireless sensor network
and GIS in the air pollution monitoring. In: 2009 International Conference on
Environmental Science and Information Application Technology, Wuhan, pp. 546–
549 (2009). https://doi.org/10.1109/ESIAT.2009.192
3. Banghong, X., Yang, L., Honglei, Z., Junfeng, L.: Application design of wireless
sensor networks in environmental pollution monitoring. Comput. Measur. Control
2, 003 (2009)

4. Postolache, O.A., Dias Pereira, J.M., Silva Girao, P.M.B.: Smart sensors network
for air quality monitoring applications. IEEE Trans. Instrum. Measur. 58(9), 3253–
3262 (2009). https://doi.org/10.1109/TIM.2009.2022372
5. Eren, H., Al-Ghamdi, A., Luo, J.: Application of Zigbee for pollution monitoring
caused by automobile exhaust gases. In: 2009 IEEE Sensors Applications Sympo-
sium, New Orleans, LA, pp. 164–168 (2009). https://doi.org/10.1109/SAS.2009.
4801799
6. Bader, S., Anneken, M., Goldbeck, M., Oelmann, B.: SAQnet: experiences from
the design of an air pollution monitoring system based on off-the-shelf equipment.
In: 2011 Seventh International Conference on Intelligent Sensors, Sensor Networks
and Information Processing, Adelaide, SA, pp. 389–394 (2011). https://doi.org/
10.1109/ISSNIP.2011.6146632
7. Liu, J.H., Chen, Y.F., Lin, T.S., Lain, D.W., Wen, T.H., Sun, C.H., Jiang, J.A.:
Developed urban air quality monitoring system based on wireless sensor networks.
In: 2011 Fifth International Conference on Sensing Technology, Palmerston North,
pp. 549–554 (2011). https://doi.org/10.1109/ICSensT.2011.6137040
8. Zhou, G., Chen, Y.: The research of carbon dioxide gas monitoring platform based
on the wireless sensor networks. In: 2011 2nd International Conference on Artifi-
cial Intelligence, Management Science and Electronic Commerce (AIMSEC), Deng
Leng, pp. 7402–7405 (2011). https://doi.org/10.1109/AIMSEC.2011.6010423
9. Yan, Z., Eberle, J., Aberer, K.: OptiMoS: optimal sensing for mobile sensors. In:
2012 IEEE 13th International Conference on Mobile Data Management, Bengaluru,
Karnataka, pp. 105–114 (2012). https://doi.org/10.1109/MDM.2012.43
10. Mao, X., Miao, X., He, Y., Li, X.Y., Liu, Y.: CitySee: urban CO2 monitoring
with sensors. In: 2012 Proceedings IEEE INFOCOM, Orlando, FL, pp. 1611–1619
(2012). https://doi.org/10.1109/INFCOM.2012.6195530
11. Wang, C.H., Huang, Y.K., Zheng, X.Y., Lin, T.S., Chuang, C.L., Jiang, J.A.: A self
sustainable air quality monitoring system using WSN. In: 2012 Fifth IEEE Inter-
national Conference on Service-Oriented Computing and Applications (SOCA),
Taipei, pp. 1–6 (2012). https://doi.org/10.1109/SOCA.2012.6449427
12. Devarakonda, S., Sevusu, P., Liu, H., Liu, R., Iftode, L., Nath, B.: Real-time air
quality monitoring through mobile sensing in metropolitan areas. In: Proceedings
of the 2nd ACM SIGKDD International Workshop on Urban Computing, p. 15,
August 2013. https://doi.org/10.1145/2505821.2505834
13. Kadri, A., Yaacoub, E., Mushtaha, M., Abu-Dayya, A.: Wireless sensor network
for real-time air pollution monitoring. In: 2013 1st International Conference on
Communications, Signal Processing, and their Applications (ICCSPA), Sharjah,
pp. 1–5 (2013). https://doi.org/10.1109/ICCSPA.2013.6487323
14. Kelly, S.D.T., Suryadevara, N.K., Mukhopadhyay, S.C.: Towards the Implementa-
tion of IoT for environmental condition monitoring in homes. IEEE Sens. J. 13(10),
3846–3853 (2013). https://doi.org/10.1109/JSEN.2013.2263379
15. Fuertes, W., Carrera, D., Villacı́s, C., Toulkeridis, T., Galárraga, F., Torres, J.,
Aules, H.: Distributed system as internet of things for a new low-cost, air pollution
wireless monitoring on real time. In: IEEE/ACM 19th International Symposium
on Distributed Simulation and Real Time Applications (DS-RT), Chengdu, China,
pp. 58–67 (2015). https://doi.org/10.1109/DS-RT.2015.28
16. Mansour, S., Nasser, N., Karim, L., Ali, A.: Wireless sensor network-based air
quality monitoring system. In: 2014 International Conference on Computing, Net-
working and Communications (ICNC), Honolulu, HI, pp. 545–550 (2014). https://
doi.org/10.1109/ICCNC.2014.6785394

17. Kim, J.Y., Chu, C.H., Shin, S.M.: ISSAQ: an integrated sensing systems for real-
time indoor air quality monitoring. IEEE Sens. J. 14(12), 4230–4244 (2014).
https://doi.org/10.1109/JSEN.2014.2359832
18. Abraham, S., Li, X.: A cost-effective wireless sensor network system for indoor
air quality monitoring applications. Procedia Comput. Sci. 34, 165–171 (2014).
https://doi.org/10.1016/j.procs.2014.07.090
19. Ferdoush, S., Li, X.: Wireless sensor network system design using Raspberry Pi
and Arduino for environmental monitoring applications. Procedia Comput. Sci.
34, 103–110 (2014). https://doi.org/10.1016/j.procs.2014.07.059
20. Liu, S., Xia, C., Zhao, Z.: A low-power real-time air quality monitoring system
using LPWAN based on LoRa. In: 2016 13th IEEE International Conference on
Solid-State and Integrated Circuit Technology (ICSICT), Hangzhou, pp. 379–381
(2016). https://doi.org/10.1109/ICSICT.2016.7998927
21. Sugiarto, B., Sustika, R.: Data classification for air quality on wireless sensor net-
work monitoring system using decision tree algorithm. In: 2016 2nd International
Conference on Science and Technology-Computer (ICST), Yogyakarta, pp. 172–176
(2016). https://doi.org/10.1109/ICSTC.2016.7877369
22. Pieri, T., Michaelides, M.P.: Air pollution monitoring in lemesos using a wireless
sensor network. In: 2016 18th Mediterranean Electrotechnical Conference (MELE-
CON), Lemesos, pp. 1–6 (2016). https://doi.org/10.1109/MELCON.2016.7495468
23. Boubrima, A., Bechkit, W., Rivano, H.: Optimal WSN deployment models for
air pollution monitoring. IEEE Trans. Wirel. Commun. 16(5), 2723–2735 (2017).
https://doi.org/10.1109/TWC.2017.2658601
24. Pavani, M., Rao, P.T.: Real time pollution monitoring using Wireless Sensor Net-
works. In: 2016 IEEE 7th Annual Information Technology, Electronics and Mobile
Communication Conference (IEMCON), Vancouver, BC, pp. 1–6 (2016). https://
doi.org/10.1109/IEMCON.2016.7746315
25. Pavani, M., Rao, P.T.: Urban air pollution monitoring using wireless sensor net-
works: a comprehensive review. Int. J. Commun. Netw. Inf. Secur. (IJCNIS) 9(3)
(2017)
26. Hojaiji, H., Kalantarian, H., Bui, A.A.T., King, C.E., Sarrafzadeh, M.: Temper-
ature and humidity calibration of a low-cost wireless dust sensor for real-time
monitoring. In: 2017 IEEE Sensors Applications Symposium (SAS), Glassboro,
NJ, pp. 1–6 (2017). https://doi.org/10.1109/SAS.2017.7894056
27. Jaladi, A.R., Khithani, K., Pawar, P., Malvi, K., Sahoo, G.: Environmental mon-
itoring using Wireless Sensor Networks (WSN) based on IOT. Int. Res. J. Eng.
Technol. (IRJET) 4, 1371–1378 (2017)
28. Sivamani, S., Choi, J., Bae, K., Ko, H., Cho, Y.: A smart service model in green-
house environment using event-based security based on wireless sensor network.
Concurrency Comput. Pract. Exp. 30, 1–11 (2018). https://doi.org/10.1002/cpe.
4240
29. Yadav, M., Sethi, P., Juneja, D., Chauhan, N.: An agent-based solution to energy
sink-hole problem in flat wireless sensor networks. In: Next-Generation Networks,
vol. 638, pp. 255–262. Springer, Singapore (2018). https://doi.org/10.1007/978-
981-10-6005-2-27
30. Aznoli, F., Navimipour, N.J.: Deployment strategies in the wireless sensor net-
works: systematic literature review, classification, and current trends. Wirel. Pers.
Commun. 95, 819–846 (2017). https://doi.org/10.1007/s11277-016-3800-0
184 B. Guanochanga et al.

31. Xu, Y., Liu, F.: Application of wireless sensor network in water quality monitoring.
In: 2017 IEEE International Conference on Computational Science and Engineering
(CSE) and IEEE International Conference on Embedded and Ubiquitous Comput-
ing (EUC), Guangzhou, pp. 368–371 (2017). https://doi.org/10.1109/CSE-EUC.
2017.254
32. Yu, J., Wang, W., Yin, H., Jiao, G., Lin, Z.: Design of real time monitoring system
for rural drinking water based on wireless sensor network. In: 2017 International
Conference on Computer Network, Electronic and Automation (ICCNEA), Xi’an,
pp. 281–284 (2017). https://doi.org/10.1109/ICCNEA.2017.102
33. Yang, J., Zhou, J., Lv, Z., Wei, W., Song, H.: A real-time monitoring system
of industry carbon monoxide based on wireless sensor networks. Sensors 15(11),
29535–29546 (2015)
34. Nikhade, S.G.: Wireless sensor network system using Raspberry Pi and Zigbee
for environmental monitoring applications. In: 2015 International Conference on
Smart Technologies and Management for Computing, Communication, Controls,
Energy and Materials (ICSTM), pp. 376–381 (2015)
35. Delamo, M., Felici-Castell, S., Pérez-Solano, J.J., Foster, A.: Designing an open
source maintenance-free environmental monitoring application for wireless sensor
networks. J. Syst. Softw. 103, 238–247 (2015)
36. Moltchanov, S., Levy, I., Etzion, Y., Lerner, U., Broday, D.M., Fishbain, B.: On the
feasibility of measuring urban air pollution by wireless distributed sensor networks.
Sci. Total Environ. 502, 537–547 (2015)
37. Chen, Z., Hu, C., Liao, J., Liu, S.: Protocol architecture for wireless body area
network based on nRF24L01. In: 2008 IEEE International Conference on Automa-
tion and Logistics, Qingdao, pp. 3050–3054 (2008). https://doi.org/10.1109/ICAL.
2008.4636702
38. Constantin, S., Toulkeridis, T., Moldovan, O.T., Villacis, M., Addison, A.: Caves
and karst of Ecuador - state-of-the-art and research perspectives. Physical Geog-
raphy in press (2018). https://doi.org/10.1080/02723646.2018.1461496
A Multi-agent Model for Security Awareness
Driven by Home User’s Behaviours

Farhad Foroughi(✉) and Peter Luksch

Institute of Computer Science, University of Rostock, Rostock, Germany
{farhad.foroughi,peter.luksch}@uni-rostock.de

Abstract. Computer users have a limited capacity for multitasking and for processing information. These limitations affect their decisions and their full attention to security tasks. The majority of cybercrimes and frauds, which turn on making effective security decisions and practising security management, are related to human factors, even for experts. Information security awareness and effective home user training depend on concrete information and accurate observation of user behaviours and their circumstances. Users’ awareness and consciousness of security threats and alternatives motivate them to take proper actions in a security situation. This research proposes a multi-agent model that provides security awareness based on users’ behaviours in interaction with the home computer. The model uses machine learning to profile users based on their activities in a cloud infrastructure. Machine learning improves the intelligent agent’s accuracy, while cloud computing makes the model flexible and scalable and enhances its performance.

Keywords: Home user’s behaviour · Security awareness · Intelligent multi-agent model · User profiling

1 Introduction

Computer users have a limited capacity for multitasking and for processing information. These limitations affect their decisions and their full attention to security tasks. Two significant factors in choosing the best action are the individual’s perception climate and self-efficacy [1, 2].

There is a wide range of home computer usage by different types of users. Moreover, research on home computer security is challenging because there is no canonical definition of the home computer user. A home user may use a computer for shopping, banking and other normal daily tasks. The user could be a student who uses the computer for learning purposes and educational software. Age and gender may also affect how a computer is used at home.

Given these conditions and contexts, users’ information security behaviour is very dynamic and changeable. The differences between users affect their decisions to support security or, often, to ignore it [3]. In addition, information technology brings new technology into homes, and the focus of security solutions is likewise technological. The majority of cybercrimes and frauds, even for experts, are related to human factors, including making effective security decisions and practising security management [4, 5].

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 185–195, 2019.
https://doi.org/10.1007/978-3-030-02686-8_15
The analysis of Bryant et al. [6] shows that computer knowledge and expertise affect the perceived importance of new threats. For example, integrity perception is significant for users with extensive knowledge. They also provide evidence that users ignore privacy settings in order to follow their habits.
Information security awareness and effective home user training depend on concrete information and accurate observation of user behaviours and their circumstances. Without this information, it would be difficult to provide effective advice or create proper policy. In addition, as long as individuals fail to behave securely and to interact with the computer safely, the organisations that provide them with online services, such as government, financial institutions and online shops, are also in danger and at risk.

This paper proposes a multi-agent model that provides security awareness and training material based on users’ behaviours and home computer interactions. The model uses machine learning on a cloud platform to analyse behaviours in real time, or very close to it.
Section 2 discusses the significant human factors that influence a user’s decision in a security situation, together with the required characteristics of an effective awareness program for home users. Section 3 introduces user profiling, the process of capturing user-computer interaction to model a user’s behaviours. Finally, Sect. 4 proposes the multi-agent model and discusses each of its elements.

2 Human Factors

Psychologists and cognitive scientists hold that personal behaviours are linked to the personality profile, which includes factors such as age or age group, gender, personal interests and hobbies, occupation, education, and history of actions.

It is important to understand users’ online activities and behaviours, as well as their personality and occupation (or computer role), in order to provide appropriate security awareness and training.

Systematic awareness and guidance, based on behaviour, for all users sharing a computer or home network could grow into a security culture: “Every [security] system is inadequate if there is no security culture shared by the whole staff” [7].
An information security culture for home users is an important element in achieving effective, continuous secure and safe behaviour [7]. Information security responsibility, as well as the physical security of users, is an essential piece of a comprehensive approach to information security management. Metalidou et al. categorise the related human factors into four groups: (1) user interfaces of security-related systems; (2) information security management concerns for risk, business processes and finance; (3) organisational issues related to information security behaviour; and (4) counterproductive computer usage [5].

Ultimately, it is individuals who make the decisions in any information security implementation, yet most home users’ security decisions are limited to their technological solutions.
Improved security controls do not mean users are free from risk. West shows that individuals maintain what they perceive to be an appropriate level of risk and danger [2]. In the home security context, this means that implementing or improving a security control can increase users’ risky behaviour.

Technical security controls influence users’ actions by providing security functions and mechanisms, but human factors, including motivation, knowledge, attitude and values, also affect individuals’ decisions. The quality and accuracy of risk perception impact users’ awareness, consciousness and behaviour, and motivate them to take proper action in an information security management system [8]. In addition, any awareness program and education plan depends on facilitating people to make relevant and effective security choices and thus achieve more suitable information security outcomes [9].

2.1 Security Awareness Program


When a home user is in a security situation or engaging in risky behaviour, having appropriate skills or knowledge against the threat leads the user to play an active role. Confidence based on appropriate solutions pushes users to choose adaptive behaviours rather than maladaptive actions [10].

Awareness training generally covers the security situations that may occur, the risks confronted, fundamental methods of security, how to build effective security behaviour, and the recommended resources and support in a security scenario.
Within the home security context, users are able to decide whether and how to carry out security actions because their options and alternatives are voluntary and subjective. To follow the decision-making process and to analyse the situation, researchers recognise five factors that influence users’ decisions in computer security situations [3]:

(1) Recognition, awareness and consciousness of safe practices.
(2) Recognition, awareness and consciousness of the possible negative consequences of unsafe actions.
(3) Recognition, awareness and consciousness of the possible supportive resources for safe practices.
(4) Probability and likelihood of negative consequences.
(5) Cost of consequences.
These five factors can be grouped into two general divisions: (1) awareness and knowledge of risks and their consequences, and (2) awareness and understanding of defensive and protective measures [3]. Therefore, to provide an effective security awareness program, it is essential to support the human factors that influence users’ decisions.

Home users, like other individuals, are unmotivated and have a limited capacity for information processing, specifically in multitasking scenarios. Users need motivation to improve their capabilities.
When a user has to evaluate alternative options in a situation to make the best choice, outcomes that are abstract in nature, such as security and protection, are likely to be less persuasive than those that are concrete. Consequently, users need to have a concrete understanding of security definitions [11].
In a typical learning situation, a behaviour is formed by positive reinforcement whenever the “right” action is taken. Hence, users need feedback and learning from specific security-related decisions, not merely from generic protective or dangerous choices.

The gain from protection and safety measures is generally conceptual, whereas the negative effects and consequences are stochastic, costly and immediate. Accordingly, users should be able to evaluate any security and risk trade-off.
Furthermore, security benefits and gains are usually intangible or conceptual, whereas security costs or losses are more tangible [12]. Because of this, the perception of cost and loss is a more important influencing factor than that of gain and benefit when individuals evaluate security risks. Tversky and Kahneman showed that individuals are far more likely to avoid risk when options are framed as benefits, and to take risk when alternatives are framed as losses [13]. They also confirmed that when users perceive a gain and a loss to be of the very same magnitude, the loss is considerably more motivating in choosing between alternatives (Fig. 1). For example, online shoppers respond more readily to an understanding of the likelihood of negative threats than to awareness of the threats themselves [3].

Fig. 1. Losses carry more value than gains when both are perceived as equal.
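The asymmetry sketched in Fig. 1 can be written down explicitly using the standard prospect-theory value function of Tversky and Kahneman. The sketch below is illustrative only: the exponent (α ≈ 0.88) and loss-aversion coefficient (λ ≈ 2.25) are their widely cited median parameter estimates, not values given in this paper.

```python
def prospect_value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function: gains are valued as x**alpha,
    losses as -lam * (-x)**alpha, so a loss outweighs an equal-sized gain."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** alpha

# With lam > 1, a loss of 100 units is felt more than twice as strongly
# as a gain of 100 units, matching the steeper loss branch of Fig. 1.
gain, loss = prospect_value(100), prospect_value(-100)
print(gain, loss)
```

Framing a security recommendation in terms of what the user stands to lose, rather than what protection they gain, therefore predicts a stronger behavioural response.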

Fear manipulation influences the perceived intensity of a risk or threat; an increase in fear appeals also raises the perceived likelihood of a threat being realised. Rewards could be individual pleasure or fulfilment by peers, and social acceptance might also act as a reward. Since fear arousal can adjust both threat (risk) perception and threat (risk) probability, providing a threat (risk) evaluation is considered to prevent maladaptive reactions [14].

As discussed, users usually feel they are at less risk than others. Based on these findings, it is almost always necessary to improve users’ risk perception awareness in order to increase their security and protection compliance.
Raising risk perception and understanding can also be collective and comprehensive, decreasing the probability of security policy violations. This means home user security awareness should be assembled to produce sufficient information and knowledge, and to support all family members or individuals who share a computer in eliminating security risks.

Clearly, where home users have to take extra steps to increase their level of protection, those steps should not be difficult, and the cost of applying security controls should be reduced as much as possible with efficient support.

3 User Profiling

Computer security awareness and training has to be personalised to provide the home user with an effective learning experience aligned with his or her day-to-day occupation, activities, time availability, interests, generation and relationship with the technology they own.

Data analysis that correlates information from a broad range of sources over substantial time periods can yield a clear and efficient understanding of home users’ activities and behaviours. Applying such analysis to big data sources enables a security awareness program to categorise users into different risk groups and to provide them with the appropriate information and training.
For this reason, recognising user behaviour in real time is an important element of providing relevant information and of helping the user take a suitable action or decision. User modelling can make this process automatic by means of an application or intelligent agent [15]. It has been shown that the user should be understood in a variety of contexts; therefore, a context-aware system should be used to identify the user context in a given time period [16]. This drives the idea of using data science and machine learning to automate user behaviour analysis and so provide a data-driven decision-making model.
A home user can be recognised in cyberspace by a digital profile [3]. Research by Weber et al. shows that a user profile represents (1) the user’s behavioural patterns or preferences, (2) the user’s characteristics, (3) the user’s skills, and (4) the cognitive process by which the user chooses an action [17, 18].

The primary function of user profiling is capturing the user’s information within a domain of interest. This information may be used to learn more about an individual’s knowledge and skills, to improve user satisfaction, or to help make a proper decision. The user profile consists of all the information about a user that can be known by the system.
User profiling is usually either knowledge-based or behaviour-based. The knowledge-based strategy uses statistical models to place a user in the closest category based on dynamic attributes. The behaviour-based strategy uses the user’s behaviours and actions as a model, discovering useful patterns by applying machine learning techniques. Real-time user behaviour analysis requires online monitoring to predict users’ actions; these behaviours can be extracted through monitoring and logging tasks [19]. Batch analysis, or offline monitoring, can be carried out at time intervals, or after a user has finished a task, based on statistical parameters of user actions. Using the online and offline monitoring modes together provides both statistical and dynamical analysis of user actions [20].
Generally, user profiling begins with user data retrieval and collection. Collecting user information (action details) is the first step in creating a user model; it covers “what” information is required and “how” to collect it. The data-gathering model can be explicit or implicit [21].

In the explicit model, the computer user is asked to provide a certain amount of information, but only a few users participate in such a process and, furthermore, the information they provide tends to be of poor quality. In addition, if the data must be kept up to date, this collection model becomes even more challenging [22].

The implicit model is a “silent” process that collects information by analysing observed user actions and reactions in a computer interaction environment [22].
A hybrid profiling model considers both the static characteristics and features of a user and tries to retrieve behavioural information about the user. This strategy creates a more efficient profile and maintains the accuracy of user data by keeping it up to date.

A major attribute of discovery through observation is adaptation to user change: when the user’s interests, preferences, habits and goals change over time, these changes can be reflected in the user profile to keep it up to date. This is made possible by profiling techniques that adapt the content of user profiles as new observation data arrive. User feedback can also play an essential role in this process [23].
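As a concrete illustration of a behaviour-based profile that adapts to change, the sketch below keeps exponentially decayed frequencies of observed actions, so that older habits fade as new observations arrive. The class design, the decay factor and the action labels are hypothetical, not taken from the paper:

```python
class BehaviourProfile:
    """Illustrative behaviour-based user profile: keeps exponentially
    decayed frequencies of observed actions, so the profile adapts as the
    user's interests and habits change over time."""

    def __init__(self, decay=0.9):
        self.decay = decay   # fraction of weight retained by old observations
        self.freq = {}       # action -> decayed frequency

    def observe(self, action):
        # Decay all existing counts, then reinforce the observed action.
        for a in self.freq:
            self.freq[a] *= self.decay
        self.freq[action] = self.freq.get(action, 0.0) + 1.0

    def dominant(self):
        # The action with the highest decayed frequency characterises
        # the user's current behaviour.
        return max(self.freq, key=self.freq.get)

profile = BehaviourProfile()
for action in ["browse", "browse", "shop", "bank", "bank", "bank"]:
    profile.observe(action)
print(profile.dominant())  # recent, repeated actions dominate: "bank"
```

A real implementation would of course observe far richer features (browser history, system settings, file-system events) and feed them to a trained classifier, but the same decay mechanism is one simple way to keep the profile up to date without explicit user input.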
Collecting a wide range of user data creates specific challenges and needs an infrastructure that supports several requirements, including security, privacy and performance. Data collection should be as transparent as possible, with minimum user interaction, and should not limit system computing or network performance. Because the behaviour analysis model may require different types of data over a time period, the data collector architecture should be flexible enough to cover various sensor types and technologies on different platforms.

4 Multi-agent Model

A multi-agent system is built from multiple heterogeneous software entities (agents) that interact with each other, directly or indirectly, in a complex system with common or conflicting goals [24]. Direct communication might be via messaging, while indirect communication can occur by making an effect on the environment that the other agent(s) can sense [25].
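The two communication styles can be shown in a minimal sketch: direct communication by placing a message in another agent's inbox, and indirect communication by modifying a shared environment that other agents sense. All names here (Agent, Environment, the UI/UP labels) are illustrative only, not part of the paper's design:

```python
from queue import Queue

class Environment:
    """Shared state that agents can modify and sense (indirect channel)."""
    def __init__(self):
        self.signals = {}

class Agent:
    def __init__(self, name, inbox):
        self.name, self.inbox = name, inbox

    def send(self, other, msg):        # direct communication: messaging
        other.inbox.put((self.name, msg))

    def mark(self, env, key, value):   # indirect: change the environment
        env.signals[key] = value

    def sense(self, env, key):         # another agent senses the change
        return env.signals.get(key)

env = Environment()
ui = Agent("UI", Queue())
profiler = Agent("UP", Queue())

ui.send(profiler, "features:browser_history")  # direct message
ui.mark(env, "risk_flag", True)                # indirect signal

print(profiler.inbox.get())            # ('UI', 'features:browser_history')
print(profiler.sense(env, "risk_flag"))  # True
```

The queue-based inbox keeps the sketch single-process; in a deployed system the same pattern would run over a messaging middleware between distributed agents.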
An agent exhibits notable characteristics including autonomy, social ability (interacting with other agents), reactivity, proactivity, trustworthiness, rationality and learning. Reactivity enables agents to maintain an ongoing interaction with the system, while proactivity and rationality shape the agent’s behaviour in accordance with its goal [20].
The environment in which a home user interacts with a computer is continual, observable, dynamic, accessible and non-deterministic. Such a complex environment requires a multi-agent system that provides an infrastructure in which agents can interact with each other to achieve the system goal.
An intelligent agent is an ideal rational agent that chooses actions to reach the highest level of its performance measure using the available evidence and built-in knowledge. The performance measure defines the criterion of success for an agent, but it should be carefully defined to account for conflicting criteria.

A rational agent performs the right actions to achieve its goal as successfully as possible; it has to be reasonable and sensible and exercise good judgment. Rationality depends on the performance measure (which determines the level of success), the agent’s perception of the past (prior knowledge), the agent’s understanding of the environment (the perception sequence) and the possible actions [26].
An intelligent agent relies on a learning model to run its inference engine. A feature extraction block receives information from the sensors, extracts useful features and passes them to the inference engine. The trained inference engine uses this information, according to the learning model, to predict a result; the learning model itself is constructed by machine learning algorithms [27]. The inference engine produces a decision and sends it to the actuator, which is responsible for performing the necessary action(s).

Machine learning, stored knowledge and condition rules are typical techniques for making an agent intelligent. Machine learning imparts intelligence through labelled data and a training process, which makes it possible to extract patterns and relationships and to predict unknown data in order to solve the problem [27].
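The sensor → feature extraction → inference engine → actuator chain described above can be sketched as follows. A simple threshold rule set stands in for the trained learning model; all function names, event labels and weights are hypothetical:

```python
def extract_features(raw_events):
    """Feature extraction block: turn raw sensor events into numeric features."""
    return {
        "downloads": sum(e == "download" for e in raw_events),
        "unknown_sites": sum(e == "unknown_site" for e in raw_events),
    }

def inference_engine(features, rules):
    """Stand-in for a trained model: a weighted score with a threshold."""
    score = sum(features[k] * w for k, w in rules.items())
    return "warn" if score >= 2 else "ok"

def actuator(decision):
    """Performs the action chosen by the inference engine."""
    return "show security warning" if decision == "warn" else "no action"

rules = {"downloads": 1, "unknown_sites": 1}   # hypothetical learned weights
events = ["unknown_site", "download", "browse"]
decision = inference_engine(extract_features(events), rules)
print(actuator(decision))  # prints "show security warning"
```

In the full model the hand-written rule set would be replaced by a classifier trained on labelled behaviour data, but the pipeline shape stays the same.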
A distributed multi-agent architecture can supply the required functions in the necessary locations. It also reduces the programming and system-control challenges, by employing global objectives to supply the knowledge and experience that enable agents to solve complex problems with greater autonomy [28]. A distributed multi-agent system backed by cloud computing power is a combination of distributed, independent, autonomous and incomplete agents that work together to address a complex global issue with no need for centralised system control [28]. In this cloud architecture, data is decentralised and computing is asynchronous [29]. This architecture lets devices implement more features despite limited storage and processing capabilities.
The proposed model (Fig. 2) develops an architecture that integrates a cloud computing approach with a multi-agent architecture to provide a dynamic, flexible, robust and scalable intelligent system.
Fig. 2. Proposed multi-agent model.

In this architecture, the user interface (UI) agent interacts directly with the user and the computer to collect the required data through independent sensors, and also presents relevant information such as warnings or training materials. The UI agent has sensor modules, comprising different independent sensors, which capture users’ actions from a wide range of resources such as browser history, system settings, the file system, network interfaces and user data (via an explicit method). It extracts the relevant features and also generates data logs. Because these logs contain personal and private details about a user, appropriate security measures are needed to keep them safe and confidential. For this reason, the UI agent requires two types of data storage to create and maintain (update) information.

Online storage holds the user data and profile. For security purposes, a Secured Virtual Diffused File System (SVDFS) using a private cloud is proposed; the data exchange between the UI agent and the cloud is also protected by a secure communication protocol using PKI.

Offline storage holds log files and user activities for further analysis, or until they are transmitted to the server. These log files are stored in an encrypted, password-protected container.
The user profiler (UP) agent receives the extracted features from the UI agent and uses machine learning to process the information and create (and update) the user profile. The UP agent uses cloud computing to provide a dynamic, distributed and scalable service.

The risk evaluator (RE) agent receives user profile information from the UP agent, and recent threats and vulnerabilities from the threat finder (TF) agent. According to the user profile, which describes the necessary level of security and the relevant security measures, the RE agent analyses the user’s actions using machine learning techniques and provides a risk level and related threat information to the awareness provider agent.
The awareness provider (AP) agent uses an awareness and security control repository to create appropriate awareness and training material covering the threats, vulnerabilities, risk level and required protective or preventive actions. This information is sent to the UI agent to be presented by a suitable method through the visualiser modules.
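The handoffs between the UI, UP, TF, RE and AP agents described above can be sketched as a simple chain of calls. The function bodies, the two-level risk scale and the example threat feed are hypothetical stand-ins for the machine-learning components:

```python
def ui_agent(sensor_events):
    """UI agent: collects raw user actions and extracts features."""
    return {"visits_unknown_sites": sensor_events.count("unknown_site")}

def up_agent(features):
    """UP agent: builds/updates the user profile from extracted features."""
    risky = features["visits_unknown_sites"] > 2   # hypothetical threshold
    return {"risk_tolerance": "high" if risky else "low"}

def tf_agent():
    """TF agent: supplies recent threats and vulnerabilities."""
    return ["phishing_campaign"]

def re_agent(profile, threats):
    """RE agent: combines profile and threat feed into a risk level."""
    return "high" if profile["risk_tolerance"] == "high" and threats else "low"

def ap_agent(risk_level):
    """AP agent: selects awareness material matching the assessed risk."""
    return f"training: avoid phishing (risk={risk_level})"

sensor_events = ["unknown_site"] * 3 + ["shop"]
material = ap_agent(re_agent(up_agent(ui_agent(sensor_events)), tf_agent()))
print(material)  # prints "training: avoid phishing (risk=high)"
```

In the actual architecture these would be asynchronous, message-passing agents distributed over the cloud rather than direct function calls, but the data flow is the same.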
Figure 3 illustrates the layered architecture of the multi-agent model and the communication links between the agents.

Fig. 3. Layered architecture of proposed multi-agent model.

5 Conclusion

Home users, like other individuals, are unmotivated and have a limited capacity for information processing in security situations. Users’ awareness and consciousness of security threats and alternatives motivate them to take proper action in an information security management system. Effective security awareness requires a concrete understanding of security definitions and learning from specific security-related decisions. It should also support the evaluation of security controls and of risk trade-offs, since the perception of loss and cost is considerably more motivating when choosing between alternatives. Risk perception awareness is a significant factor in increasing users’ security and protection compliance. This research has proposed a multi-agent model that provides security awareness based on users’ behaviours in interaction with the home computer. The model uses machine learning to profile users based on their activities in a cloud infrastructure. Machine learning improves the intelligent agent’s accuracy, while cloud computing makes the model flexible and scalable and enhances its performance.
This research is limited to home users’ requirements, and the awareness program is based on the security risks that might occur in the course of general users’ activities. Moreover, it is essential to handle a huge amount of data in an online mode and to process the data streams in real time. Therefore, many machine learning classifiers based on neural networks (NN), Bayesian learning, decision trees and statistical analysis tools should be trained and tested against samples, to be collected through a volunteer program, in order to find the best possible online classifier.

The next challenge in this field is to identify the monitoring sensors required to observe users’ behaviour, and to compare machine learning algorithms in order to achieve the best performance.

References

1. Hazari, S., Hargrave, W., Clenney, B.: An empirical investigation of factors influencing
information security behavior. J. Inf. Priv. Secur. 4(4), 3–20 (2008)
2. West, R.: The psychology of security. Commun. ACM 51(4), 34–40 (2008)
3. Howe, A.E., et al.: The psychology of security for the home computer user. In: 2012 IEEE
Symposium on Security and Privacy (SP). IEEE (2012)
4. Wash, R.: Folk models of home computer security. In: Proceedings of the Sixth Symposium
on Usable Privacy and Security. ACM (2010)
5. Metalidou, E., et al.: The human factor of information security: unintentional damage
perspective. Proc. Soc. Behav. Sci. 147, 424–428 (2014)
6. Bryant, P., Furnell, S., Phippen, A.: Improving protection and security awareness amongst
home users. Adv. Netw. Comput. Commun. 4, 182 (2008)
7. Malcolmson, J.: What is security culture? Does it differ in content from general organisational
culture? In: 43rd Annual 2009 International Carnahan Conference on Security Technology
(2009)
8. Albrechtsen, E.: A qualitative study of users’ view on information security. Comput. Secur.
26(4), 276–289 (2007)
9. Mai, B., et al.: Neuroscience foundations for human decision making in information
security: a general framework and experiment design. In: Information Systems and
Neuroscience, pp. 91–98. Springer, Berlin (2017)
10. Milne, G.R., Labrecque, L.I., Cromer, C.: Toward an understanding of the online consumer’s
risky behavior and protection practices. J. Consum. Affairs 43(3), 449–473 (2009)
11. Borgida, E., Nisbett, R.E.: The differential impact of abstract vs. concrete information on
decisions. J. Appl. Soc. Psychol. 7(3), 258–271 (1977)
12. Zurko, M.E., Simon, R.T.: User-centered security. In: Proceedings of the 1996 Workshop on
New Security Paradigms. ACM (1996)
13. Tversky, A., Kahneman, D.: Rational choice and the framing of decisions. J. Bus. 59, S251–
S278 (1986)
14. McKenna, S.P.: Predicting health behaviour: research and practice with social cognition
models. In: Conner, M., Norman, P. (eds.). Open University Press, Buckingham (1996).
ISBN 0-335-19320-X
15. Iglesias, J.A., et al.: Creating evolving user behavior profiles automatically. IEEE Trans.
Knowl. Data Eng. 24(5), 854–867 (2012)
16. Dinoff, R., et al.: Learning and managing user context in personalized communications
services. In: Proceedings of the International Workshop in Conjunction with AVI 2006 on
Context in Advanced Interfaces. ACM (2006)
17. Weber, E.U., Blais, A.R., Betz, N.E.: A domain-specific risk-attitude scale: measuring risk
perceptions and risk behaviors. J. Behav. Decis. Mak. 15(4), 263–290 (2002)
18. Iglesias, J.A., Ledezma, A., Sanchis, A.: Evolving systems for computer user behavior
classification. In: 2013 IEEE Conference on Evolving and Adaptive Intelligent Systems
(EAIS). IEEE (2013)
19. Middleton, S.E., Shadbolt, N.R., De Roure, D.C.: Ontological user profiling in recommender
systems. ACM Trans. Inf. Syst. 22(1), 54–88 (2004)
20. Kussul, N., Skakun, S.: Intelligent system for users’ activity monitoring in computer
networks. In: Intelligent Data Acquisition and Advanced Computing Systems: Technology
and Applications, IDAACS 2005. IEEE (2005)
21. Schölkopf, B., et al.: Estimating the support of a high-dimensional distribution. Neural
Comput. 13(7), 1443–1471 (2001)
22. Ouaftouh, S., Zellou, A., Idri, A.: User profile model: a user dimension based classification.
In: 2015 10th International Conference on Intelligent Systems: Theories and Applications
(SITA). IEEE (2015)
23. Schiaffino, S., Amandi, A.: Intelligent user profiling. In: Artificial Intelligence an
International Perspective, pp. 193–216. Springer (2009)
24. Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game-Theoretic, and
Logical Foundations. Cambridge University Press, Cambridge (2008)
25. Maes, P.: Pattie Maes on software agents: humanizing the global computer. IEEE Internet
Comput. 1(4), 10–19 (1997)
26. Stuart, R., Peter, N.: Artificial Intelligence-A Modern Approach, vol. 3. California, Berkeley
(2016)
27. Joshi, P.: Artificial Intelligence with Python. Packt Publishing, Birmingham (2017)
28. Rodríguez, S., et al.: Cloud computing integrated into service-oriented multi-agent
architecture. In: Balanced Automation Systems for Future Manufacturing Networks, pp. 251–
259. Springer, Berlin (2010)
29. Wooldridge, M.: An Introduction to Multiagent Systems. Wiley, London (2009)
Light Weight Cryptography for Resource
Constrained IoT Devices

Hessa Mohammed Zaher Al Shebli and Babak D. Beheshti

New York Institute of Technology, Old Westbury, NY 11568, USA
Babak.beheshti@nyit.edu

Abstract. The Internet of Things (IoT) is going to change the way we live
dramatically. Devices like alarm clocks, lights, and speaker systems can
interconnect and exchange information. Billions of devices are expected to be
interconnected by the year 2020, raising a very important issue: security.
People have to be sure that their information will stay private and secure; if
someone hacks into a medical device such as a smart watch, he can view all of
the owner's medical records and could use them against the owner, and if one
device is hacked, the entire network may be compromised. Transmitting
information securely between IoT devices using traditional crypto algorithms
is not possible, because those devices have a limited energy supply, limited
chip area, and limited memory size. Because of these constraints, a new type
of crypto algorithm came into place: lightweight crypto algorithms. As the name
implies, these algorithms are light and can be used in devices with low
computational power. In this paper, we start by describing some of the heavy
(conventional) ciphers. We also highlight some lightweight ciphers and the
attacks known against them.

Keywords: Light weight cryptography · IoT devices · Grain cipher ·
Present cipher · Hight cipher

1 Introduction

Security is the key concern in the technology world. With the rapid increase in the
number of devices connecting to the Internet these days, transmitting confidential
information in a secure manner is what people try to achieve when they use encryption.
Encryption hides the content of the original message (using an encryption algorithm
and a key) so that only the intended user can decrypt and read it. Figure 1 illustrates
the basic flow of information through encryption.
Encryption algorithms are divided into two main categories, symmetric algorithms
and asymmetric algorithms. Symmetric algorithms use only one key to perform both
the encryption and the decryption process, while asymmetric algorithms use two keys
(public and private), one to encrypt and the other to decrypt.
Symmetric algorithms are further divided into two main groups, stream ciphers and
block ciphers: as their names indicate, a stream cipher encrypts bit by bit, while a
block cipher encrypts a block of bits together.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 196–204, 2019.
https://doi.org/10.1007/978-3-030-02686-8_16

Fig. 1. Encryption process.

In this paper we start by introducing the general categories of symmetric-key and
asymmetric-key crypto algorithms. We then survey some leading lightweight algorithms,
introducing each algorithm's fundamental structure followed by the attacks studied
against it. At the end we present a comparison of performance parameters for these
algorithms.

1.1 Symmetric Algorithms – Block Cipher (AES)

We take AES as an example of a symmetric algorithm since it is the most widely used
algorithm these days. AES stands for Advanced Encryption Standard; it is also known
by its original name, Rijndael [10]. AES encrypts a fixed block size of 128 bits and
supports key sizes of 128, 192, or 256 bits. AES was developed to replace DES, which
had become vulnerable to brute-force attacks. AES encrypts blocks of data in a number
of rounds that depends on the key size; for example, a 256-bit key uses 14 rounds [8].
The relation between the number of rounds and the key size is illustrated in Table 1.

Table 1. Number of rounds (R) in relation to cipher key size


No. rounds Key size (bits)
10 128
12 192
14 256

For encryption, each round of processing includes four steps: byte substitution, shift
rows, mix columns, and add round key. All rounds are identical except for the last one,
which omits the mix-columns step. One round is shown in Fig. 2.
The byte substitution step simply replaces each byte with a byte from a 16 × 16
lookup table. The shift rows step cyclically shifts the rows of the state to the left: the
first row is not shifted, the second row is shifted one byte, the third row two bytes,
and the fourth row three bytes (Fig. 3).
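The shift-rows step can be sketched in a few lines of Python. This is a didactic toy that assumes a row-major 4 × 4 state of byte values, not a full AES implementation:

```python
def shift_rows(state):
    """Cyclically shift row r of a 4x4 AES state left by r bytes.

    `state` is a list of four rows, each a list of four byte values.
    Row 0 is unshifted; rows 1-3 rotate left by 1, 2 and 3 bytes.
    """
    return [row[r:] + row[:r] for r, row in enumerate(state)]

state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]
shifted = shift_rows(state)
# Row 1 has rotated one byte to the left: [0x11, 0x12, 0x13, 0x10]
```

The inverse operation used during decryption simply rotates each row right by the same amounts.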

Fig. 2. AES encryption round steps.

Fig. 3. Shift rows.

AES remains secure against all known practical attacks. However, AES requires a lot of
power and chip area to perform the encryption and decryption process. While this is not
an issue in devices like workstations and laptops, it is a concern for small devices that
must save power and have limited chip area. An AES implementation with a 128-bit key
requires about 3,400 GE1 of chip area, while only about 2,000 GE are typically allocated
for security in an IoT device [9].

1.2 Asymmetric Algorithms – RSA

RSA is another widely used crypto algorithm; it is an asymmetric algorithm that uses
two different but mathematically linked keys, one to encrypt (the public key) and one
to decrypt (the secret key). RSA got its name from the initials of the three scientists
who first publicly described the algorithm in 1977 (Ron Rivest, Adi Shamir, and
Leonard Adleman).
There are two steps in the RSA algorithm:
1. Key generation.
2. RSA encryption and decryption.

1
A gate equivalent (GE) is a unit of measure for the manufacturing-technology-
independent complexity of digital electronic circuits. For today's CMOS technologies,
the silicon area of a two-input drive-strength-one NAND gate usually constitutes the
technology-dependent unit area commonly referred to as a gate equivalent. A
specification in gate equivalents for a certain circuit reflects a complexity measure,
from which a corresponding silicon area can be deduced for a dedicated manufacturing
technology (https://en.wikipedia.org/wiki/Gate_equivalent).

In the key generation step (generating a public key and a corresponding private key),
two large prime numbers, p and q, have to be generated. The modulus n is then computed
by multiplying these two primes. Generating the modulus is easy, but factoring it back
into the two primes is considered hard even with today's supercomputers. Next, φ(n) is
calculated using the formula φ(n) = (p − 1)(q − 1). The public exponent e is then chosen
as a number between 3 and φ(n) that is coprime to φ(n). The final public key is the pair
(e, n). The private exponent d is the multiplicative inverse of e modulo φ(n), and the
private key is the pair (d, n). Encryption and decryption both use the formula:

F(m, k) = m^k mod n

where k is the public exponent or the private exponent.
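The key generation and encryption steps above can be illustrated with a toy example. The primes here are far too small to be secure and are chosen only so the numbers stay readable; real RSA uses primes hundreds of digits long:

```python
# Toy RSA with tiny primes p and q -- for illustration only.
p, q = 61, 53
n = p * q                 # modulus n = 3233
phi = (p - 1) * (q - 1)   # phi(n) = 3120
e = 17                    # public exponent, coprime to phi(n)
d = pow(e, -1, phi)       # private exponent: inverse of e modulo phi(n)

def f(m, k):
    """F(m, k) = m^k mod n -- the same formula encrypts and decrypts."""
    return pow(m, k, n)

m = 65              # message; must be smaller than n
c = f(m, e)         # encrypt with the public exponent
assert f(c, d) == m # decrypting with the private exponent recovers m
```

The three-argument `pow` performs modular exponentiation efficiently, and `pow(e, -1, phi)` (Python 3.8+) computes the modular inverse, which is the hard-to-reverse link between the two keys.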


Asymmetric algorithms (also known as public key algorithms) rely on mathematical
operations such as factorization for their security. These operations need a lot of
resources and a large hardware footprint, making them too expensive for IoT devices.

2 Light Weight Cryptography

Lightweight cryptography is designed to secure the communication between IoT
devices, since traditional cryptographic algorithms are not an option. IoT devices
(also known as constrained devices) have constraints when it comes to speed, power
consumption, area, processing, memory space, and size [14]. The challenge is to reduce
some of the algorithm parameters, such as the number of rounds, key length, and
processing requirements, without affecting the overall security of the algorithm.
There are two ways to design a lightweight cryptographic algorithm: the first is to
develop it from scratch, as with the PRESENT cipher, and the second is to optimize the
functionality of an existing traditional cryptographic algorithm such as AES or RSA.
Lightweight algorithms fall into two main categories, hardware-oriented and
software-oriented, based on the requirements of the cipher. Hardware-oriented ciphers
are used when we are concerned about the number of clock cycles and chip size, while
software-oriented ciphers are used when we are concerned about memory space and
power consumption.
A standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1
of the International Organization for Standardization (ISO) and the International
Electrotechnical Commission (IEC) has been working on a lightweight cryptography
project. ISO/IEC 29192 is the standard for lightweight cryptography; parts two and
three of ISO/IEC 29192 specify block ciphers and stream ciphers, respectively. Some
lightweight ciphers are introduced below.

2.1 Grain (Stream Cipher)


Grain is a bit-oriented synchronous stream cipher in which the keystream is generated
independently of the plaintext. The cipher operates in two phases: first, the internal
state is initialized using the secret key and the initialization vector [7]; the state
is then repeatedly updated and used to generate keystream bits. There are two variants,
Grain v1 and Grain-128. The overall algorithm block diagram is illustrated
in Fig. 4.

Fig. 4. Grain v1 algorithm.

Grain v1 uses an 80-bit key, receives a 64-bit initialization vector, and requires
160 initialization cycles. Grain-128 uses a 128-bit key, receives a 96-bit
initialization vector, and requires 256 initialization cycles.
Figure 4 shows the basic structure of the Grain v1 algorithm. "f" and "g" are two
feedback polynomials of degree 80, used as the feedback functions of the two shift
registers: the linear feedback shift register (LFSR) and the non-linear feedback shift
register (NFSR). The filter function "h" uses selected bits from both feedback shift
registers; bits from the NFSR are XORed together and added to the output of "h". During
the initialization phase this output is fed back into the LFSR and the NFSR (shown by
the light blue lines), and during normal operation it is released as the keystream.
Specifically, one bit of the NFSR and four bits of the LFSR are supplied to the
nonlinear 5-to-1 filter function, and its output is linearly combined with 7 bits of
the LFSR and released as output.
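Grain's exact feedback polynomials are given in [7]; the underlying mechanism, a shift register whose tapped bits are fed back while the oldest bit is output, can be illustrated with a toy linear feedback shift register. The taps below correspond to the primitive polynomial x^4 + x + 1 and are purely illustrative; they are not Grain's:

```python
def lfsr_keystream(state, taps, nbits):
    """Toy Fibonacci LFSR keystream generator.

    `state` is a list of 0/1 bits, oldest bit first; `taps` are indices
    into the state whose XOR forms the feedback bit. Each step outputs
    the oldest bit, shifts, and appends the feedback bit.
    """
    state = list(state)
    out = []
    for _ in range(nbits):
        out.append(state[0])        # output the oldest bit
        fb = 0
        for t in taps:
            fb ^= state[t]          # linear feedback (XOR of tapped bits)
        state = state[1:] + [fb]    # shift left and insert feedback
    return out

# 4-bit maximal-length LFSR (recurrence s[n+4] = s[n] ^ s[n+1],
# i.e. polynomial x^4 + x + 1): the keystream has period 15.
bits = lfsr_keystream([1, 0, 0, 1], taps=(0, 1), nbits=15)
```

An LFSR alone is cryptographically weak because its output is linear in the initial state; Grain's security comes from combining the LFSR with the non-linear register and the non-linear filter "h".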

2.2 Present (Block Cipher)

PRESENT is an ultra-lightweight block cipher with a block length of 64 bits, two key
lengths of 80 and 128 bits, and 31 rounds [6]. Its block diagram is shown in
Fig. 5.
The Present cipher design got its characteristics from the Serpent cipher (the
non-linear substitution layer, S-box) and DES (the linear permutation layer, pLayer).
There are three stages involved in PRESENT: the first stage is addRoundKey; the
second stage is sBoxLayer; the third stage is the bit permutation pLayer [6].
Figure 5 shows that each of the 31 rounds begins with an XOR operation to introduce
a round key ki, 1 ≤ i ≤ 32, where k32 is used for post-whitening (post-whitening is
combining the data with portions of the key to increase the security of a block
cipher), followed by a linear bitwise permutation and a non-linear substitution layer.
The non-linear layer uses a 4-bit to 4-bit S-box which is applied 16 times in parallel
in each round.

Fig. 5. Present cipher.

The key can be 80 or 128 bits long; it is stored in a key register K, represented (for
the 80-bit case) as k79k78 … k0. At round i, the 64-bit round key ki, denoted
k63k62 … k0, consists of the leftmost 64 bits of the contents of the register K. After
the round key ki is extracted, the register K is rotated 61 bit positions to the left,
the leftmost 4 bits are passed through the S-box, and the round counter value i is
XORed into the least significant bits of the register on the right; the whole
operation is then repeated.
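The key-register update just described can be sketched for the 80-bit variant as follows. The S-box constants and the counter position (bits 19..15) are taken from the PRESENT specification [2]; treat this as an illustrative sketch rather than a verified reference implementation:

```python
# PRESENT 4-bit S-box, as given in the PRESENT specification.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def present80_round_keys(key, rounds=32):
    """Derive the 64-bit round keys from an 80-bit PRESENT key register.

    `key` is an 80-bit integer. Each round key is the leftmost 64 bits
    of the register; the register is then rotated 61 bits to the left,
    its top nibble is passed through the S-box, and the round counter
    is XORed into bits 19..15.
    """
    k = key
    round_keys = []
    for i in range(1, rounds + 1):
        round_keys.append(k >> 16)                      # leftmost 64 of 80 bits
        k = ((k << 61) | (k >> 19)) & ((1 << 80) - 1)   # rotate left by 61
        k = (k & ~(0xF << 76)) | (SBOX[k >> 76] << 76)  # S-box on top nibble
        k ^= i << 15                                    # XOR in round counter
    return round_keys

rks = present80_round_keys(0x00000000000000000000)
# For the all-zero key this yields round keys starting
# 0x0000000000000000, 0xC000000000000000, 0x5000180000000001, ...
```

Because the rotation amount (61) and the register size (80) are coprime, every key bit cycles through the round-key window, and the S-box plus counter injection break the schedule's linearity and symmetry.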

2.3 Hight (Block Cipher)

Hight is a lightweight encryption algorithm that was proposed one year before the
Present cipher. It consists of 32 rounds. Hight makes use of XOR and addition modulo
256 operations, which give it good performance in hardware [1]. Figure 6 shows the
cipher block diagram.
Hight has a block size of 64 bits and a 128-bit key length. The encryption starts
with an Initial Transformation (IT) that is applied to the plaintext together with
input whitening keys (WKs) [11]. A Final Transformation (FT) is applied to the output
of the last round, together with output whitening keys, in order to obtain the
ciphertext [1].
The plaintext is divided into 8 bytes, denoted P = P7, P6, …, P0; likewise the
ciphertext is divided into 8 bytes, denoted C = C7, C6, …, C0. The 64-bit intermediate
values are represented as Xi = Xi,7, Xi,6, …, Xi,0, and the master key is divided into
16 bytes, denoted MK = MK15, MK14, …, MK0 [13]. The key schedule has two algorithms:
one generates the whitening keys (WK) and the other generates the subkeys (SK). The
cipher uses 8 whitening keys, 4 for the initial transformation and another 4 for the
final transformation, and 128 subkeys are generated throughout the process, 4 of which
are used in each round.

Fig. 6. Hight cipher.

In the initial transformation, the whitening keys are mixed into the plaintext bytes:
the first intermediate byte is computed as X0,0 = P0 ⊞ WK0, where ⊞ denotes addition
modulo 2^8, and subsequent bytes are combined with WK1, WK2, and WK3 in a similar
fashion, alternating addition modulo 2^8 with XOR, up to X0,7 [12].
In each round, Xi is transformed into Xi+1; for example, Xi+1,0 = Xi,7 XOR
(F(Xi,6) ⊞ SK4i+3), where F is an auxiliary function. This repeats for every byte up
to X32,0. The final transformation mirrors the initial transformation, with the
plaintext notation P replaced by the ciphertext notation C, the whitening keys WK4 to
WK7 used instead of WK0 to WK3, and the input taken from X32,0 to X32,7.
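The byte-wise whitening pattern of the initial transformation can be sketched as below. The placement of addition versus XOR (addition on bytes 0 and 4, XOR on bytes 2 and 6) follows the HIGHT draft specification [11]; this is an isolated illustrative fragment, not a full HIGHT implementation:

```python
def hight_initial_transform(p, wk):
    """HIGHT initial transformation on 8 plaintext bytes.

    Bytes 0 and 4 absorb whitening keys by addition modulo 2**8,
    bytes 2 and 6 by XOR; the remaining bytes pass through unchanged.
    `p` is a list of 8 byte values; `wk` holds the 4 input whitening keys.
    """
    x = list(p)
    x[0] = (p[0] + wk[0]) % 256   # addition modulo 2^8
    x[2] = p[2] ^ wk[1]           # XOR
    x[4] = (p[4] + wk[2]) % 256
    x[6] = p[6] ^ wk[3]
    return x

x0 = hight_initial_transform([0xFF, 1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4])
```

Mixing the two group operations (addition mod 2^8 and XOR) is what makes this cheap whitening non-linear over either group alone, which is the design point the paper's description of HIGHT's hardware efficiency rests on.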

3 Comparison Between the Algorithms

Since Present and Hight are both block ciphers, it is fair to compare them to each
other. Table 2 compares key performance criteria of PRESENT and
HIGHT.

Table 2. Comparison between PRESENT and HIGHT ciphers


Algorithm Key size (bits) Area (GE) RAM requirement (bytes)
Present 80 1570 142
Hight 128 3048 18

We assume a block size of 64 bits for both algorithms. Table 2 shows that the Present
cipher does not need as much area as the Hight cipher does [2].

As for the stream cipher Grain, Table 3 shows the area requirements.

Table 3. Grain cipher


Algorithm Key size Area “GE”
Grain 80 bits 1294

4 Attacks Against Lightweight Algorithms

The designers of the Present cipher presented security margins for differential,
linear, and algebraic cryptanalysis. Since then, it has been discovered that 32% of
Present keys (80-bit key size) are weak against linear cryptanalysis. In 2009, a study
on linear hull and algebraic cryptanalysis of Present was conducted. The study
proposed a linear attack on 25 rounds of Present (128-bit key size) and an algebraic
attack on 5 rounds of Present (80-bit key size). A year after this study, an attack on
25-round Present was proposed that can recover the 80-bit secret key with 2^62.4 data
complexity [3].
In linear cryptanalysis an attacker tries to find biased linear approximations for the
non-linear components of a cipher (e.g., an S-box) and then uses them to find a biased
linear approximation for the entire cipher. One is then able to use these biased
approximations to recover certain subkey bits. Afterwards, the remaining key bits are
recovered by brute force [4].
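To make the first step concrete, the bias of every linear approximation of a single 4-bit S-box can be computed exhaustively. The sketch below builds a linear approximation table for the PRESENT S-box; it is a didactic illustration of where the biases come from, not the attack of [4] itself:

```python
# PRESENT 4-bit S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(x):
    """Parity (XOR of bits) of an integer: 0 or 1."""
    return bin(x).count("1") & 1

def lat(sbox):
    """Linear approximation table of a 4-bit S-box.

    Entry [a][b] counts the inputs x for which the input mask a and
    output mask b agree (parity(a & x) == parity(b & S(x))), minus 8,
    the count an unbiased approximation would have. Nonzero entries
    are the biased approximations an attacker can exploit.
    """
    n = len(sbox)
    return [[sum(parity(a & x) == parity(b & sbox[x]) for x in range(n)) - n // 2
             for b in range(n)]
            for a in range(n)]

table = lat(SBOX)
# The largest |entry| over nonzero masks bounds the best single-S-box
# linear approximation available to the attacker.
best = max(abs(table[a][b]) for a in range(1, 16) for b in range(1, 16))
```

For a well-chosen 4-bit S-box the best single approximation still has a noticeable bias (here 4/16); a cipher's resistance comes from forcing any full-cipher approximation to chain many such S-box approximations, multiplying the biases down.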
One of the studied attacks against the Present cipher is the Statistical Saturation
Attack, which takes advantage of a weakness in its diffusion layer. Present can also
be exploited using a differential key attack [5].

5 Conclusion and Future Work

Resource-constrained devices like RFID (radio-frequency identification) tags are
entering our lives more and more because of their low prices, making cryptographic
solutions for them a necessity. While many lightweight ciphers have been proposed,
their security has to be studied further against emerging attacks.
In this paper we highlighted three lightweight algorithms and compared them; we also
touched on possible attacks against the Present cipher.
For future work, we plan to simulate several key lightweight crypto-algorithms on
multiple embedded platforms and profile their performance. These performance
comparisons will be important for recognizing each algorithm's internal computation
affinity to specific CPU architectures.

References

1. Ozen, O., Varıcı, K., Tezcan, C., Kocair, C.: Lightweight Block Ciphers Revisited:
Cryptanalysis of Reduced Round PRESENT and HIGHT. http://citeseerx.ist.psu.edu
2. Bogdanov, A., et al.: PRESENT: An Ultra-Lightweight Block Cipher. http://lightweightcrypto.org

3. Lacko-Bartošová, L.: Algebraic Cryptanalysis of Present Based on the Method of Syllogisms.
www.sav.sk
4. Bulygin, S.: More on linear hulls of PRESENT-like ciphers and a cryptanalysis of full-round
EPCBC–96. http://eprint.iacr.org
5. Collard, B., Standaert, F.X.: A Statistical Saturation Attack against the Block Cipher
PRESENT. http://citeseerx.ist.psu.edu
6. Aura, T.: Cryptanalysis of Lightweight Block Ciphers, November 2011. http://into.aalto.fi
7. Grain: A Stream Cipher for Constrained Environments (n.d.). https://cr.yp.to
8. Block and Stream Cipher Based Cryptographic Algorithms: A Survey (n.d.).
www.ripublication.com
9. Simon and Speck: Block Ciphers for the Internet of Things, July 2015. https://csrc.nist.gov
10. Single-Cycle Implementations of Block Ciphers (n.d.). https://csrc.nist.gov
11. Han, B., Lee, H., Jeong, H., Won, Y.: The HIGHT Encryption Algorithm draft-kisa-hight-00,
November 2011. https://tools.ietf.org
12. Impossible Differential Cryptanalysis of the Lightweight Block Ciphers TEA, XTEA and
HIGHT (n.d.). https://eprint.iacr.org
13. IP Core Design of Hight Lightweight Cipher and Its Implementation (n.d.). http://airccj.org
14. Rekha, R., Babu, P.: On Some Security Issues in Pervasive Computing: Light Weight
Cryptography, February 2012. http://www.enggjournals.com
A Framework for Ranking IoMT Solutions
Based on Measuring Security and Privacy

Faisal Alsubaei1,2, Abdullah Abuhussein3, and Sajjan Shiva1

1 University of Memphis, Memphis, TN 38152, USA
{flsubaei,sshiva}@memphis.edu
2 University of Jeddah, Jeddah, Saudi Arabia
3 St. Cloud State University, St. Cloud, MN 56301, USA
aabuhussein@stcloudstate.edu

Abstract. The Internet of Medical Things (IoMT) is growing rapidly, with
Internet-enabled devices helping people track and monitor their health, obtain
early diagnoses of health issues, treat illnesses, and administer therapy.
Because of this increasing demand and the accessibility of high-speed Internet,
the IoMT has opened doors for security vulnerabilities in healthcare systems.
The lack of security awareness among IoMT users can provoke serious and perhaps
fatal security issues. The disastrous consequences of these issues will not
only disrupt medical services (e.g., through ransomware), causing financial
losses, but will also put patients' lives at risk. This paper proposes a
framework to compare and rank IoMT solutions based on their protection and
defense capability using the Analytic Hierarchy Process. The proposed framework
measures the security, including privacy, of the compared IoMT solutions
against a set of user requirements and using a detailed set of assessment
criteria. This work aims to help in determining and avoiding risks associated
with insecure IoMT solutions and to reduce the gap between solution providers
and consumers by increasing security awareness and transparency.

Keywords: IoMT · Quantitative evaluation · Security · Assessment ·
Metrics · Measurements · Privacy

1 Introduction

The Internet of Medical Things (IoMT), also known as the healthcare Internet of
Things (IoT), can be described as a collection of medical devices and applications that
are connected through heterogeneous networks. IoMT solutions are being utilized by
many healthcare providers to facilitate the management of diseases and drugs, improve
treatment methods and the patient experience, and reduce cost and errors. Currently,
about a third of IoT devices are found in healthcare; this number is expected to increase
by 2025, with healthcare accounting for the largest percentage (approximately 40%) of
the total global worth of IoT technology ($6.2 trillion) [1]. Further, approximately 60%
of healthcare organizations have already adopted IoT technologies, and that percentage
is expected to rise to approximately 87% by 2019 [2].

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 205–224, 2019.
https://doi.org/10.1007/978-3-030-02686-8_17

One of the most prevalent problems currently facing IoMT solutions is security
fragility [3]. A survey of more than 370 organizations using the IoMT found that
approximately 35% suffered at least one cybersecurity incident in 2016 [4]. The lack
of security awareness among IoMT users is a key factor in the security issues of the
IoMT. According to a recent survey, only 17% of connected medical device makers and
15% of medical professionals are aware of potential security issues and take serious
measures to prevent them [5]. This could explain why more than 36,000 healthcare-
related devices in the U.S. alone are easily discoverable on Shodan, a search engine
for IoT devices [6].
In addition, while there is a lack of security standards for the IoT in general, extra
efforts are needed to regulate and ensure security in the IoMT. Unlike other domains,
security in the medical field is vital due to the sensitivity of medical data and the
critical nature of the operations involved. The U.S. Food and Drug Administration
(FDA) has taken steps to secure medical devices; however, only 10% of these devices
are classified under FDA Class III, which covers devices designed to support or
sustain life (e.g., pacemakers) [7]. Reduced patient wellbeing is not the only
consequence of IoMT attacks, as these attacks can also have negative effects on
medical data privacy, brand reputation, business continuity, and financial stability.
Moreover, there is a lack of consensus among stakeholders in healthcare organizations
regarding security requirements [8]. This dissension and the lack of security
awareness leave adopters unsure about which security features are relevant to their
solutions [9]. IoMT adopters are usually compelled to accept the default security of
a solution. Adopters should instead be able to measure and verify security themselves
to make well-educated decisions. It is also important to enable adopters to select
security features based on their requirements (i.e., priorities), because security
goals depend not only on the scenario but also on the assets and the tolerance to
risks.
Due to the rapid evolution of IoMT technologies, there is a need for a structured
quantitative model that is expandable and offers opportunities to improve security.
Thus, we propose a framework to assess the security and privacy levels provided by
IoMT solutions using the Analytic Hierarchy Process (AHP). The proposed framework
allows users to make knowledgeable choices when obtaining new or enhancing existing
IoMT solutions. It also allows adopters to define security priorities that reflect
their security objectives and utilize them to rank prospective solutions in terms of
security. The AHP-based method uses a list of detailed security assessment criteria
collected by examining security controls published by specialized organizations such
as the Open Web Application Security Project (OWASP), the International Organization
for Standardization (ISO), and the FDA, among others. In addition, our method draws
on previous IoMT attacks and available IoMT solutions.
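The AHP weighting step the framework relies on can be sketched minimally as follows. The pairwise comparison values are invented for illustration, the goal names are hypothetical, and the weights are computed with the common row-geometric-mean approximation rather than the principal eigenvector:

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights from a pairwise comparison matrix.

    Uses the row geometric mean of the matrix, normalized to sum to 1,
    which is a standard approximation of the principal eigenvector.
    """
    gms = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(gms)
    return [g / total for g in gms]

# Hypothetical adopter judgments over three security goals, in the order
# [authentication, secure updates, privacy assurance]; an entry
# pairwise[i][j] > 1 means goal i is preferred over goal j.
pairwise = [
    [1,     3,   5],   # authentication strongly preferred
    [1 / 3, 1,   2],
    [1 / 5, 1 / 2, 1],
]
weights = ahp_weights(pairwise)
# Candidate IoMT solutions are then ranked by the weighted sum of their
# per-goal security scores using these priority weights.
```

In the full framework each candidate solution's yes/no metric answers are aggregated per goal, and the AHP weights let the final ranking reflect the adopter's own priorities rather than a fixed weighting.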
The rest of this paper is organized as follows: the literature for measuring the
security in IoMT is discussed briefly in Sect. 2. Section 3 presents the assessment
criteria used in the framework. The security assessment method is demonstrated in
Sect. 4. Section 5 presents a case study of the framework by assessing the degree of
security in real IoMT solutions. Sections 6 and 7 discuss the evaluation and limitations,
respectively. Lastly, in Sect. 8, we draw concluding remarks and outline some future
works.

2 Related Work

This section surveys previous work in assessing the security of IoMT solutions. The
main gaps in the current literature can be summarized as follows:
• The assessment criteria are specific to a set of IoMT scenarios (e.g., patient
monitoring) [10, 11].
• The security recommendations are abstract and target only manufacturers, who
primarily focus on one part of the IoMT (e.g., devices) to the exclusion of others,
such as mobile and back-end [12–15].
• There is no assessment model that helps adopters quantify and compare the security
of potential IoMT solutions according to their security priorities [16–22].
• The focus is only on assessing existing solution(s) by utilizing post-deployment
parameters such as configurations and current users' feedback, which requires
technical knowledge that most IoMT users often lack [14, 23, 24].
Although these works are valuable contributions, they cannot be incorporated
efficiently into an assessment method for the IoMT, and they do not provide a
practical assessment method that considers the user's security priorities. In this
paper, we build upon and complement these past efforts by proposing a framework for
quantifying security in IoMT solutions that is twofold: (1) a detailed list of
security assessment criteria that includes over 200 assessment questions for IoMT
security, gathered by examining IoMT security considerations from different sources
and IoMT solution providers; and (2) an AHP-based security assessment method for IoMT
solutions utilizing these criteria. The proposed framework enables users to rank
candidate IoMT solutions based on their security to help them make educated decisions.
The importance of our framework lies in its ability to aid adopters in selecting or
improving IoMT solutions in light of their security priorities.

3 Assessment Criteria

Because of the rapid development of IoT technologies, and therefore the complexity of
the IoMT, it is imperative to design a simple-to-use yet detailed list of assessment
criteria that covers any IoMT solution. We therefore utilize the goal-question-metric
(GQM) approach in designing the assessment criteria [25]. GQM is a popular approach
that measures assessment goals by identifying questions and developing metrics to
answer those questions [26]. These metrics are then used to ensure that the goals are
met. GQM is utilized in our framework such that for every IoMT component (Fig. 1)
there is a list of yes/no questions and corresponding answers (i.e., metrics).
A small sample of the assessment criteria is shown in Table 1, organized as follows.

Fig. 1. Typical IoMT components.

Table 1. A sample of the assessment criteria.


Component Security feature Question
Goal Sub-goal
Secure 1. Intrusion 1. Can the IoMT ecosystem detect endpoints that are
Endpoint prevention connecting to abnormal service, or connecting to
(E) service at unusual times?
2. Can IoMT ecosystem detect endpoints leaving or
joining a communication network at erratic intervals?
3. Can endpoint devices detect a significantly abnormal
network traffic fingerprint of other devices?
4. Do endpoint devices have secure event logging?
2. Strong 1. Do endpoint devices require users to authenticate
authentication themselves before using/access any function?
2. Do the endpoint devices provide mechanisms to
prevent brute force attacks?
3. Do endpoint devices use cryptographic certificates
for self-authentication or to verify the broker identity of
a user?
4. Does the IoMT ecosystem ensure that no hardcoding
or default passwords are allowed in endpoint devices?
3. Secure updates 1. Does the IoMT ecosystem provide automated alerts,
via SMS or email, for available manual updates for
endpoint devices?
2. Are endpoint devices updates and patches, including
extensions or plugins, verified (e.g., binary signing and
hash values) after download and before installation to
ensure their legitimacy?
3. Does the IoMT ecosystem clearly identify the
endpoints software running version?
(continued)
A Framework for Ranking IoMT Solutions 209

Table 1. (continued)
Component Security feature Question
Secure Endpoint (E), continued:
4. Protected memory: Is the use of direct memory access in endpoint devices by other peripherals carefully managed and controlled?
5. Secure communications: Do endpoint devices renegotiate and verify communication security keys each time they reconnect to the communication network?
6. Secure administration: Do management systems distinguish between active and inactive endpoint devices?
7. Secure hardware: Do endpoint devices use epoxy covering for core circuit components?
8. Secure software: Are all debugging and test technologies disabled in the endpoint devices?
9. Secure web interface: Is the web interface of endpoint devices presented over hyper-text transfer protocol secure (HTTPS)?
10. Secure storage: Are all data stored in the endpoints' removable media protected cryptographically?
11. Regulatory compliance: Are the medical endpoint devices approved by the FDA?
12. Secure root of trust: Are the roots of trust certified by FIPS or CC?
Secure Gateway (G):
1. Secure communications: Does the gateway provide standard bidirectional end-to-end encryption?
2. Secure storage: Does the gateway cryptographically store data collected from endpoint devices?
3. Intrusion prevention: Does the gateway have robust security logging of all events?
4. Secure hardware: Does the gateway provide countermeasures against physical attacks?
5. Strong authentication: Does the gateway cryptographically authenticate endpoint devices to different components and vice versa?
6. Secure updates: Does the gateway allow for modular updates and monitoring of extensions and plugins?
7. Secure web interface: Is the gateway's web interface presented over HTTPS?
Secure Mobile (M):
1. Secure communications: Are the communications in mobile devices always encrypted?
2. Intrusion prevention: Does the mobile device provide alerts for mobile status (e.g., connectivity or power outages)?
3. Strong authentication: Do mobile applications or devices support biometric authentication (e.g., fingerprint, face recognition)?
4. Secure updates: Are mobile vendor-specific security updates checked and installed automatically?
(continued)
210 F. Alsubaei et al.

Table 1. (continued)
Secure Mobile (M), continued:
5. Secure software: Is the application certified and listed in vendors' application stores (e.g., Apple App Store, Google Play)?
6. Secure web interface: Is the mobile's web interface presented over HTTPS?
7. Secure storage: Does the mobile application share any data with third parties?
Secure Back-end (B):
1. Secure cloud environment:
   1. Are the cloud services always available, even during scaling up/down?
   2. Does the cloud service provider hide information about the servers' physical locations?
   3. Does the cloud have countermeasures against data leakage in multi-user storage services?
   4. Does the cloud service provider have an official insider threat program?
2. Secure software:
   1. Does the back-end utilize an API for the application to cryptographically identify itself to its peers?
   2. Are back-end third-party libraries actively monitored, managed, and audited?
   3. Are the back-end applications designed to mitigate buffer errors using the operating system's mechanisms?
3. Secure web interface: Does the back-end web interface use certificates that are signed by a certificate authority?
4. Regulatory compliance: Does the back-end use standard protocols and technologies?
5. Risk assessment: Did the IoMT solution provider identify the assets, risk factors, and threat agents?
6. Privacy assurance: Does the IoMT solution provider have a process to ensure that the privacy of individuals' personal and medical information complies with the latest relevant privacy laws (e.g., the Health Insurance Portability and Accountability Act (HIPAA), the Health Information Technology for Economic and Clinical Health Act (HITECH), the General Data Protection Regulation (GDPR), the Personally Controlled Electronic Health Records Act, etc.) in effect over user control of their data?
7. Secure development lifecycle: Does the IoMT solution provider validate management of the supply chain, the software, the sources of the equipment, and the purchaser and supplier aspects of the infrastructure?
(continued)
A Framework for Ranking IoMT Solutions 211

Table 1. (continued)
Secure Back-end (B), continued:
8. Incident response: Does the IoMT solution provider have an incident response procedure in place for information recovery?
9. Secure storage: Are the back-end authentication credentials (i.e., usernames, passwords, device IDs, etc.) salted and hashed before being stored?
10. Secure communications: Does the back-end have quality-of-service mechanisms for delivery of targeted messages to specific components?
11. Secure updates: Does the back-end report and update the service infrastructure's third-party components (both software and hardware) regularly to ensure the latest security updates are installed once available?
12. Strong authentication: Does the authentication service gather metrics to determine whether the user changed to an alternative computing platform but still uses the former token?
13. Secure administration: Does the back-end include load-balancing features and redundancy systems?
14. Intrusion prevention: Does the back-end protect against malware-based attacks?

3.1 Goals
The goals are the IoMT components to be secured (first column in Table 1). The typical IoMT components we use, as outlined in Fig. 1, are defined as follows [27]:
Endpoints: These are connected medical devices that typically have embedded sen-
sors to collect data and forward it to the back-end servers. Based on their operating
system, hardware, communication media, mobility, etc., these devices can be of various
kinds but collaborate heterogeneously to perform a common task. Endpoint devices can
be wearable sensors (e.g., blood pressure monitors, heart monitors, pulse oximeters),
implantable devices (e.g., embedded cardiac function monitoring systems, swallowable
camera capsules), ambient sensors (e.g., motion sensors, pressure sensors, room tem-
perature sensors), or stationary devices (e.g., computerized tomography scanners,
surgical instruments).
Gateway: These are optional devices that support weaker endpoint devices. Some more capable endpoint devices can have gateway capabilities and serve as gateways themselves; in this case, these devices are called border routers. Gateways act as a bridge network to aggregate the data collected from the endpoint devices and transmit it to the back-end. Because of their location, gateways also serve as a secure channel between the insecure, but trusted, local network and the untrusted public Internet.
Back-end: Most current IoMT environments have back-end server(s) that are often hosted on the cloud for better scalability. IoT platforms are often utilized for provisioning, management, and automation of endpoint devices. They also provide other common server-side tasks such as centralized data storage, backups, reports, and analytics.
Mobile: IoMT systems can also have mobile applications to control endpoint devices
and provide limited back-end capabilities and instant alerts.
Every goal (i.e., IoMT component in the first column of Table 1) has sub-goals (i.e.,
security features in the second column) to ensure that the security goals are achieved.
For instance, to secure the endpoint devices, the identified sub-goals are as follows:
secure administration, strong authentication, secure updates, intrusion prevention,
protected memory, secure communications, secure web interface, secure hardware,
secure software, secure storage, regulatory compliance, and secure root of trust.

3.2 Questions
The assessment questions (third column of Table 1) were thoroughly examined and
collected from various reliable resources that include:
• Medical-specific sources, such as guidelines from the FDA [28], ISO [15], the
Medical Device Risk Assessment Platform (MDRAP) assessment questionnaire
[13], and the Naval Medical Logistics Command (NMLC) [29], among others.
• General IoT security considerations provided by OWASP [17], the Cloud Security
Alliance (CSA) [16], the Global System for Mobile Communication Association
(GSMA) [19], and others [18].
• The documentation of popular IoMT solutions and their accompanying Security
Level Agreements (SecLAs).
The yes/no questions are less demanding because the possible answers are given to the respondent. Thus, they are quick and easy to answer and provide an accurate and consistent assessment. These questions precisely measure the different levels of security within the security features. For example, the security of encryption depends on the algorithm used and the encryption key size; hence, our questions consider, and quantify, such levels of security. Due to space constraints, Table 1 lists only a sample of the assessment criteria, with one question included per security feature except for the questions used in the case study. The full list will be made available in future publications.

3.3 Metrics
A single metric is a score that depends on the answer to a question. Our proposed framework utilizes the documentation presented by solution providers to determine the metrics. Metrics measure the degree of achieving a sub-goal and, ultimately, a goal.
The overall degree of security provided by a security feature is the total scores for all
the assessment questions under that feature. The security features are then used to
calculate the degree of security of a component. As illustrated in Fig. 3, this forms a
hierarchy for the assessment.
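The goal/sub-goal/question hierarchy can be represented as a nested mapping. A minimal sketch in Python, using a small hypothetical excerpt of Table 1 rather than the full criteria:

```python
# Hypothetical excerpt of the assessment hierarchy: goal (component) ->
# sub-goal (security feature) -> assessment questions.
hierarchy = {
    "Secure Gateway (G)": {
        "Secure communications": [
            "Does the gateway provide standard bidirectional "
            "end-to-end encryption?",
        ],
        "Secure web interface": [
            "Is the gateway's web interface presented over HTTPS?",
        ],
    },
}

# A component's score aggregates its features' scores, which in turn
# aggregate the per-question metrics (see Sect. 4).
n_questions = sum(len(qs) for feats in hierarchy.values()
                  for qs in feats.values())
```
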
Fig. 2. The proposed framework flow.

Fig. 3. Sample profile represented in hierarchy.


4 IoMT Security Assessment

In this section, we present an assessment method that employs the presented hierarchical list of assessment criteria and suits its hierarchical structure. IoMT security depends on multiple factors; therefore, multiple-criteria decision-making (MCDM) is required such that all goals (and sub-goals) are assessed and their scores are aggregated into a meaningful score. Hence, we use the AHP in our assessment method to achieve this task. AHP is a popular technique for solving MCDM problems [30]. What makes the AHP more suitable in this scenario than other MCDM techniques is its flexibility as well as its ability to address inconsistencies across requirements. It also allows composite quantitative and qualitative weighted questions to be compared easily because of its pairwise comparisons of decision criteria [31]. The results of the pairwise comparisons and the weights for every criterion are structured into a hierarchy. These comparisons of the questions and weights are the basis for the security assessment of IoMT solutions. As shown in Fig. 2, there are three main stages in our assessment method, which are described as follows.

4.1 Defining Security Profiles


In this stage of the framework, security profiles are defined in preparation for the comparisons in the next stage. In other words, profiled IoMT solutions are described in terms of their security capabilities, producing an IoMT solution profile. The user's desired degree of security is also captured. Thus, the output of this stage is a user profile that includes the user requirements (i.e., security priorities) and at least two IoMT solution security profiles. This allows the user to (1) verify that a solution's security matches their requirements, and (2) compare the security of two or more solutions. The two types of security profiles are described as follows.
User Requirements Profile. This is where IoMT users specify their desired security degree. The user assigns weights to all elements in the second, third, and/or fourth levels as in Fig. 3. This detailed profiling is crucial for better accuracy when comparing the relative importance of two (or more) elements within the same level, and it ensures that all the user's security priorities are met. The framework provides flexibility in assigning weights: users may assign weights on a scale of 1 to 10 (where a weight of 10 denotes extreme importance relative to others and 1 denotes equal importance), binary weights (1 or yes denotes required, 0 or no denotes not required), or a mix at various layers of the hierarchy. For example, a user may mark one component as very important, assign quantitative weights to the security features in another component, and assign Boolean (yes/no) weights in a third component. A weight of 0 can be assigned to irrelevant question(s) so that they are excluded from the assessment.
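Sibling weights at each level are normalized before they are combined. A minimal sketch, assuming a simple sum-to-one normalization (the helper name is illustrative; the example weights 9, 9, and 5 are chosen so that they reproduce the Case 3 normalization quoted in Sect. 5):

```python
def normalize(weights):
    """Normalize sibling weights (1-10 scale or 0/1 Booleans) to sum to 1.

    Items with weight 0 (irrelevant questions) drop out of the assessment.
    """
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]

# Component-level weights B = 9, E = 9, M = 5 (cf. Case 3 in Sect. 5):
print([round(w, 2) for w in normalize([9, 9, 5])])  # prints [0.39, 0.39, 0.22]
```
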
Solution Profiles. To compare the degree of security in IoMT solutions, the assessment criteria questions (described in Sect. 3) should be answered for each IoMT solution individually. One can use the publicly available specifications of an IoMT solution (e.g., from the product FAQ page), or contact the solution provider's customer service, to answer the assessment criteria questions. For open-source solutions, security experts can be involved in answering these questions.
4.2 Security Quantification


In this stage, the security profiles generated in the previous stage (i.e., the user requirements profile and the solutions' profiles) are used to assess the security of the solutions and to check whether they match the user's security requirements. The terms used in the assessment are shown in Table 2.

Table 2. Description of assessment terms.


Term: Description
q: Assessment question
S_i: Solution i, where i ∈ {1, ..., n} and n denotes the number of IoMT solutions to be compared
V_i,q: Metric (answer) of q provided by S_i
S_i,q: S_i provides q with value V_i,q
U: IoMT user (adopter)
V_u,q: The user-required value of q
S_1/S_2: Relative rank ratio of S_1 over S_2, regarding q
S_2/S_1: Relative rank ratio of S_2 over S_1, regarding q
S_i,q/U: Relative rank ratio of S_i over U, which indicates whether S_i fulfills V_u,q

Since the questions in our assessment criteria require only yes or no answers, these values can be represented as 1 and 0. The relationship between solutions (S) for a question value (V) can be defined as a ratio:

S_1/S_2 = V_1/V_2 = 0 if (V_1 = 0 ∧ V_2 = 0) ∨ (V_1 = 0 ∧ V_2 = 1)
S_1/S_2 = V_1/V_2 = 1 if (V_1 = 1 ∧ V_2 = 1) ∨ (V_1 = 1 ∧ V_2 = 0)   (1)

For example, assume two IoMT solutions, S_1 and S_2, have values V_1,q = 0 and V_2,q = 1, respectively, for question q, which user U requires (thus, V_u,q = 1). The pairwise comparison ratio of S_1 and U is defined as V_1,q/V_u,q = 0, which means that S_1 is not satisfying the user requirement. However, the pairwise comparison ratio V_2,q/V_u,q = 1 means that S_2 is fulfilling the user requirement.
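The yes/no ratio of Eq. (1) and the worked example above can be sketched as follows (the helper name is illustrative):

```python
def rank_ratio(v1: int, v2: int) -> int:
    """Relative rank ratio of one profile over another for a yes/no
    question (Eq. 1). The ratio is 1 exactly when the first profile
    provides the feature (v1 == 1), regardless of v2; otherwise 0."""
    if v1 == 0:   # covers (0, 0) and (0, 1)
        return 0
    return 1      # covers (1, 1) and (1, 0)

# Worked example from the text: V1,q = 0, V2,q = 1, Vu,q = 1.
assert rank_ratio(0, 1) == 0  # S1 does not satisfy the user requirement
assert rank_ratio(1, 1) == 1  # S2 fulfills it
```
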
This stage relies on the pairwise comparison matrix (CM) of the security questions in the solutions' profiles and the user requirements profile. Using a CM for a question over all profiles, we obtain a one-to-one comparison where V_1,q/V_2,q denotes the relative rank of S_1 over S_2. If there are n IoMT solutions, the one-to-one CM (including the user requirement profile) will be of size (n + 1) × (n + 1):

(2)
4.3 Ranking
The relative ranking of all the IoMT solutions for any question, which is known as the
priority vector (PV), is derived by calculating the eigenvector of the CM. The PV
transforms the CM into a meaningful vector that summarizes the results of all comparisons (ratios) into a normalized numerical ranking. The eigenvector principle in AHP helps reduce human error in the judgment process [32]. The following example PV shows that solutions 2 and 3 meet the user requirement U.
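The PV computation can be sketched numerically (a sketch assuming numpy; the function name is illustrative, and the CM entries follow Eq. (1), under which the ratio of profile i over profile j depends only on profile i's answer):

```python
import numpy as np

def priority_vector(answers):
    """Normalized principal eigenvector (PV) of a question's CM.

    answers: yes/no metrics (1/0) for each profile; by convention the
    user requirements profile is listed last. Per Eq. (1), the ratio of
    profile i over profile j is simply answers[i].
    """
    v = np.asarray(answers, dtype=float)
    cm = np.outer(v, np.ones_like(v))        # CM[i][j] = answers[i]
    eigvals, eigvecs = np.linalg.eig(cm)
    pv = np.abs(np.real(eigvecs[:, np.argmax(np.real(eigvals))]))
    return pv / pv.sum() if pv.sum() else pv

# Example from the text: S1 = 0, S2 = 1, S3 = 1, and U requires the
# feature (1). S1 gets rank 0; S2, S3, and U share the weight equally.
pv = priority_vector([0, 1, 1, 1])
```

For these example values the computed PV is approximately (0, 1/3, 1/3, 1/3), consistent with the statement that solutions 2 and 3 meet the user requirement.
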

After all PVs (i.e., rankings) for all questions are computed, they are aggregated
(from bottom to top) to determine the overall security rankings of IoMT solutions.
All the questions’ PVs are combined with their assigned relative weights from the
previous stage.

PV_aggregated = Σ_{j=1}^{g} w_j · PV_j   (3)

where PV_j denotes the PV of the CM of question j, w_j denotes the relative weight assigned to the question, and g is the number of all questions. If the user wants to
compare the security in the underlying levels, then the weights of the upper levels will
not be considered in the aggregation. For example, if a user wants to compare only the
security features in one component, then only the weights of the security features, and
their corresponding questions will be considered.
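The aggregation in Eq. (3) is a weighted sum over the per-question PVs. A minimal sketch, assuming numpy (the function name and the two example PVs are illustrative):

```python
import numpy as np

def aggregate(pvs, weights):
    """Eq. (3): PV_aggregated = sum over j of w_j * PV_j."""
    pvs = np.asarray(pvs, dtype=float)      # shape (g, number of profiles)
    w = np.asarray(weights, dtype=float)    # relative weights w_j
    return (w[:, None] * pvs).sum(axis=0)

# Two equally weighted questions over profiles (S1, S2, U):
overall = aggregate([[0.0, 0.5, 0.5],
                     [1/3, 1/3, 1/3]], [0.5, 0.5])
# overall = [1/6, 5/12, 5/12]
```
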

5 Case Study

In this section, we demonstrate how the framework can be used to assess and rank three popular real-world cloud-based IoT platforms that are widely used in healthcare. We examined the SLAs and other available documentation describing the offered security to answer the questions in our assessment criteria. As a result, we have three distinct security profiles for these platforms. We consulted their customer service in order to answer the questions that we could not answer using their publicly published documentation. The questions for which we could not find relevant information are treated as answered "no", because if the answer were "yes", the provider would likely have used it to market the security of their products. To illustrate the flexibility of our framework, we show three examples of hypothetical weights (i.e., user requirements) for a sample of the assessment criteria, as described in Table 3 (where yes and no are denoted by 1 and 0, respectively).
Case 1
In this detailed case, the user assigned Boolean weights by answering all relevant questions (i.e., yes denotes required, and no denotes not required). For every question,
Eq. 1 is used to perform pairwise comparisons on its CM. Thus, the CM of B.2.3 is:

Then, PV_B.2.3 is calculated by finding the normalized eigenvector of CM_B.2.3.

Table 3. Case study assessment values.


Goals Question Metrics User
requirements
(Weights)
Component Security feature S1 S2 S3 U1 U2 U3
Secure Endpoint (E) 1 1 1 1 1 0 2 9
2 1 1 1 1
3 1 1 1 0
4 1 1 1 1
2 1 1 1 1 0
2 1 1 1 1
3 1 1 1 1
4 1 1 1 0
3 1 1 1 1 0
2 1 1 1 0
3 1 1 1 1
Secure Mobile (M) 1 1 1 1 0 1 0 4 5
2 1 0 1 1 1 1
Secure Back-end (B) 1 1 1 0 1 0 8 4 9
2 1 1 1 1
3 1 1 1 1
4 1 1 1 1
2 1 1 1 1 1 0
2 1 1 1 0
3 0 1 1 1

This indicates that S_1 does not satisfy the user requirement, whereas S_2 and S_3 fulfill it. The same step is applied to all questions to obtain the relative ranking of the lowest level in the hierarchy. Then, to aggregate the PVs from bottom to top, the normalized weights of each level are considered to prepare all questions' PVs
for the final aggregation. Thus, the final weighted PV_B.2.3 = (1/3) × (1/2) × (1/3) × 1 × PV_B.2.3. Similarly, the weighted PV_E.2.4 = (1/3) × (1/3) × (1/4) × 0 × PV_E.2.4. Now that all PVs are weighted, the aggregation of all of them will reveal the final overall rankings.
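The level-by-level weighting described here reduces to multiplying the normalized weights along a question's path in the hierarchy. A minimal sketch (the helper name is illustrative; the fractions follow the normalized Case 1 weights quoted in the text):

```python
from math import prod

def path_weight(normalized_weights):
    """Effective weight of one question: the product of the normalized
    weights along its path (component -> feature -> question -> answer)."""
    return prod(normalized_weights)

# B.2.3 per Case 1: one of 3 components, one of 2 required features
# under B, one of 3 questions under B.2, and required by the user (1):
w_b23 = path_weight([1/3, 1/2, 1/3, 1])    # 1/18
# E.2.4 is marked not required (final factor 0), so it contributes nothing:
w_e24 = path_weight([1/3, 1/3, 1/4, 0])    # 0.0
```
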

Thus, in case 1, S_2 fulfills the user security requirements (Fig. 4c). The lower levels' rankings can also be compared in the same way. Figures 4a and b show the component-level comparisons and security feature-level comparisons, respectively.

Fig. 4. Case study assessment results.

Case 2
In this case, the user assigned priority weights at various levels. For level 1, E is
assigned a weight of 2, which denotes low importance. For level 2, B.1 is assigned a
weight of 8, which denotes relatively high importance, whereas B.2 is not required and
hence has an assigned weight of 0. Thus, the normalized weights for B.1 and B.2 are 1
and 0, respectively. For instance, the weighted PV_B.1.1 = 0.4 × 1 × (1/4) × PV_B.1.1.
Finally, for the lowest level, M.1.2.1 is assigned a weight of 0 (not required), and M.2.1
is assigned 1. Applying the steps described in case 1 for all questions with the new
weights, the final rankings are:

As Fig. 4c shows, unlike other cases, only S3 satisfies the user security require-
ments. This is because, in this case, the endpoint security features are not important and
were given a low weight. Since S3 fully satisfies the other components, it shows a better
ranking.
Case 3
In this case, the user assigned weights only to level 1. The normalized weights are
B = E = 0.39, M = 0.22. Thus, as shown in Fig. 4c, the final ranking reveals that only
S2 fulfills the user requirements.

6 Evaluation

To evaluate the framework, we present two methods. First, to verify the completeness of the list of assessment criteria, we tested its ability to identify and help avoid known real-world
security incidents. Since our list of assessment criteria is collected from publications by
multiple specialized organizations, it should cover all security considerations related to
the IoMT. We verified this by gathering all IoMT-related vulnerabilities reported during the last three years (as of April 2018) in NIST's National Vulnerability Database (NVD)1 and CVE Details2. The keywords used in this
extensive search are IoT, IoMT, medical, health, medical device, and healthcare. Upon
filtering all found vulnerabilities to exclude the ones that are irrelevant to IoMT (e.g.,
non-medical endpoints), we found 40 distinct vulnerabilities. Then, we analyzed the
details of each vulnerability and mapped it to corresponding security feature(s). This
way, we verified our framework’s accuracy in highlighting all missing or inadequate
security features. Table 4 shows the results of our analysis for each vulnerability with
Common Vulnerabilities and Exposures (CVE) ID and the most relevant feature(s) for
each vulnerability in the affected IoMT component. It is very likely that every vulner-
ability is covered by more than one security feature. As shown in Table 4, these vul-
nerabilities have diverse characteristics in terms of the affected IoMT component,
solution type, and scenario. Since our framework is successfully able to provide security
considerations that safeguard from these varied vulnerabilities, we believe it can scale
well to different and unknown vulnerabilities. This also demonstrates the framework’s
extensibility and cross-domain applicability.
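The mapping exercise can be expressed as a simple coverage check. A minimal sketch using a three-entry excerpt of Table 4 (the helper name is illustrative):

```python
# Excerpt of Table 4: CVE ID -> security feature(s) it maps to.
cve_to_features = {
    "CVE-2017-12725": ["E.2"],
    "CVE-2017-7911":  ["B.2"],
    "CVE-2017-7726":  ["G.7"],
}

def uncovered(mapping, criteria_features):
    """CVEs whose mapped feature(s) are absent from the criteria list."""
    return [cve for cve, feats in mapping.items()
            if not all(f in criteria_features for f in feats)]

criteria = {"E.2", "B.2", "G.7"}
assert uncovered(cve_to_features, criteria) == []   # all samples covered
```
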
To verify the effectiveness of the framework in capturing missing or inadequate security features, we analyzed two commercial IoMT solutions that are known to have, or to have had, serious security issues. For example, Medfusion 4000 syringe infusion

1 https://nvd.nist.gov.
2 https://www.cvedetails.com.
pumps3 are stationary medical endpoints that are used to deliver small doses of
medication in acute care settings. These pumps were vulnerable to eight serious security issues (vulnerabilities 1–8 in Table 4). These vulnerabilities are discussed in detail in an advisory issued by the U.S. Computer Emergency Readiness Team (US-CERT) [33]. Using our framework to assess the security of this device (before
applying patches) and compare it with other devices would show that the device has a low security score, especially regarding authentication. This information should
help future adopters in making better decisions. For instance, they can choose a better
alternative or wait until the vulnerabilities are patched. This helps users or adopters to
avoid the severe consequences associated with these unpatched endpoint devices,
which were highlighted in the Common Vulnerability Scoring System (CVSS)4 as
medium to high [33]. Similarly, Kaa5 is an IoT platform that allows healthcare systems to establish cross-device connectivity and implement smart features in medical devices and related software systems. Kaa is vulnerable (no. 9) to code injection attacks.
Comparing its security with other platforms will result in a low score in feature B.2.

Table 4. IoMT vulnerabilities and their relevancy to our assessment framework.


No. Vulnerability CVE ID Relevant feature(s)
1 2017-12726 E.2
2 2017-12725 E.2
3 2017-12724 E.2
4 2017-12720 E.2
5 2017-12723 E.10
6 2017-12722 E.4
7 2017-12721 E.5
8 2017-12718 E.8
9 2017-7911 B.2
10 2017-11498 B.2
11 2017-11497 B.2
12 2017-11496 B.2
13 2017-6780 B.14
14 2017-7730 G.3
15 2017-7729 G.1
16 2017-7728 G.5
17 2017-7726 G.7
18 2017-3215 M.3
19 2017-3214 M.7
20 2017-8403 M.3
(continued)

3 https://www.smiths-medical.com.
4 https://nvd.nist.gov/vuln-metrics/cvss.
5 https://www.kaaproject.org/healthcare/.
Table 4. (continued)
No. Vulnerability CVE ID Relevant feature(s)
21 2017-5675 E.8
22 2017-5674 E.8
23 2017-14002 E.2
24 2018-5457 B.11
25 2016-8355 E.6, E.8
26 2017-6018 E.9
27 2017-5149 E.5
28 2015-3958 E.1
29 2015-3957 E.10
30 2015-3955 E.7
31 2015-1011 E.2
32 2015-3459 E.5
33 2017-14008 B.12
34 2017-14004 B.12
35 2017-14006 B.12
36 2017-14101 B.13
37 2018-5438 B.12
38 2016-9353 B.9
39 2016-8358 E.5
40 2017-12713 B.12

7 Limitations

Solution providers cannot be forced to cooperate by making the technical details of their products available to the public, due to service abstraction constraints. This lack of technical details is one limitation of this work, as these details are required for the
assessment. Nevertheless, adopters can always contact the solution providers’ customer
service to inquire about missing information. This will also give adopters the opportunity to learn how cooperative and knowledgeable the customer service teams of the candidate solutions are. We do not anticipate that providers will voluntarily
make their security features publicly available. Our work can motivate them to
cooperate to meet customers’ needs and compete with others transparently. Moreover,
the assessment criteria might not be easy to understand, especially for novice users, such as patients and medical professionals, who often lack the technical knowledge. However, this work encourages them to learn about the security features and the potential issues. Also, some users might find the process followed in this framework lengthy and
complex. Nevertheless, we argue that it is worth the initial effort and time investment
because it helps in discovering and avoiding severe consequences of improper security.
8 Conclusion and Future Work

Security plays a vital role in IoMT success. In this paper, we presented a security
assessment framework to increase the trust in IoMT solutions. This framework pro-
vides a list of security assessment criteria for IoMT solutions, composed of detailed and
simple-to-use questions. Using these assessment criteria, the framework also provides an
assessment method for IoMT solutions. The significance of this work lies in its ability
to assess a wide range of (1) stakeholders’ requirements (e.g., patients, medical pro-
fessionals, system administrators etc.); (2) solutions (services, devices, platforms, etc.);
and (3) architectures (e.g., mobile-controlled, cloud-based, etc.).
This work educates IoMT users (e.g., patients, medical professionals, etc.) who
often have a low level of awareness about the IoMT security issues and how to address
them. The benefits of this work are not only limited to adopters. This framework can
also be beneficial to IoMT solution providers in assessing their products and comparing them to other IoMT solutions. This encourages healthier and more transparent competition
among solution providers. Moreover, researchers and legislators/standardization bodies
can utilize it to understand the security issues in order to better design security solutions
and regulations.
Our future work includes updating the list of assessment criteria that was mentioned
in this paper as well as in our previous work [34] to adapt to the continuous and rapid
evolution of IoMT solutions and their technologies. We will also develop a web-based
tool based on the framework presented in this paper.

References
1. A Guide to the Internet of Things Infographic. https://intel.com/content/www/us/en/internet-
of-things/infographics/guide-to-iot.html
2. 87% of Healthcare Organizations Will Adopt Internet of Things Technology by 2019 (2017).
https://www.hipaajournal.com/87pc-healthcare-organizations-adopt-internet-of-things-
technology-2019–8712/
3. Alsubaei, F., Abuhussein, A., Shiva, S.: Security and privacy in the internet of medical
things: taxonomy and risk assessment. In: 2017 IEEE 42nd Conference on Local Computer
Networks Workshops (LCN Workshops), pp. 112–120 (2017)
4. Cyber Risk Services|Deloitte US|Enterprise Risk Services. https://www2.deloitte.com/us/en/
pages/risk/solutions/cyber-risk-services.html
5. Synopsys, Inc.: Synopsys and Ponemon study highlights critical security deficiencies in medical
devices. https://www.prnewswire.com/news-releases/synopsys-and-ponemon-study-
highlights-critical-security-deficiencies-in-medical-devices-300463669.html
6. Medical Devices are the Next Security Nightmare. https://www.wired.com/2017/03/medical-
devices-next-security-nightmare/
7. Hamlyn-Harris, J.H.: Three Reasons Why Pacemakers are Vulnerable to Hacking. http://
theconversation.com/three-reasons-why-pacemakers-are-vulnerable-to-hacking-83362
8. Jalali, M.S., Kaiser, J.P.: Cybersecurity in hospitals: a systematic, organizational perspective.
J. Med. Internet Res. 28, 10059 (2018)
9. MSV, J.: Security is Fast Becoming the Achilles Heel of Consumer Internet of Things.
https://www.forbes.com/sites/janakirammsv/2016/11/05/security-the-fast-turning-to-be-the-
achilles-heel-of-consumer-internet-of-things/
10. Abie, H., Balasingham, I.: Risk-based adaptive security for smart IoT in eHealth. In:
Proceedings of the 7th International Conference on Body Area Networks, pp. 269–275.
ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications
Engineering) (2012)
11. Savola, R.M., Savolainen, P., Evesti, A., Abie, H., Sihvonen, M.: Risk-driven security
metrics development for an e-health IoT application. In: Information Security for South
Africa (ISSA), pp. 1–6. IEEE (2015)
12. Food and Drug Administration: Postmarket Management of Cybersecurity in Medical
Devices (2016). https://www.fda.gov/downloads/MedicalDevices/DeviceRegulationand
Guidance/GuidanceDocuments/UCM482022.pdf
13. MDRAP|Home Page. https://mdrap.mdiss.org/
14. McMahon, E., Williams, R., El, M., Samtani, S., Patton, M., Chen, H.: Assessing medical
device vulnerabilities on the Internet of Things. In: 2017 IEEE International Conference on
Intelligence and Security Informatics (ISI), pp. 176–178. IEEE (2017)
15. Medical Equipment in General. https://www.iso.org/ics/11.040.01/x/
16. New Security Guidance for Early Adopters of the IoT. https://cloudsecurityalliance.org/
download/new-security-guidance-for-early-adopters-of-the-iot/
17. OWASP Internet of Things Project-OWASP. https://owasp.org/index.php/OWASP_Internet_of_Things_Project#tab=Medical_Devices
18. [Press Release WP29] Opinion on the Internet of Things|CNIL. https://www.cnil.fr/en/press-
release-wp29-opinion-internet-things
19. GSMA IoT Security Guidelines-Complete Document Set. https://www.gsma.com/iot/gsma-
iot-security-guidelines-complete-document-set/
20. Laplante, P.A., Kassab, M., Laplante, N.L., Voas, J.M.: Building caring healthcare systems
in the internet of things. IEEE Syst. J. 12, 1–8 (2017)
21. Islam, S.M.R., Kwak, D., Kabir, M.H., Hossain, M., Kwak, K.S.: The internet of things for
health care: a comprehensive survey. IEEE Access. 3, 678–708 (2015)
22. Williams, P.A., Woodward, A.J.: Cybersecurity vulnerabilities in medical devices: a
complex environment and multifaceted problem. Med. Devices Auckl. NZ. 8, 305–316
(2015)
23. Leister, W., Hamdi, M., Abie, H., Poslad, S.: An evaluation framework for adaptive security
for the iot in ehealth. Int. J. Adv. Secur. 7(3&4), 93–109 (2014)
24. Wu, T., Zhao, G.: A novel risk assessment model for privacy security in Internet of Things.
Wuhan Univ. J. Nat. Sci. 19, 398–404 (2014)
25. Caldiera, V., Rombach, H.D.: The goal question metric approach. Encycl. Softw. Eng. 2,
528–532 (1994)
26. Bayuk, J., Mostashari, A.: Measuring systems security. Syst. Eng. 16, 1–14 (2013)
27. OWASP Internet of Things Project-OWASP. https://www.owasp.org/index.php/OWASP_
Internet_of_Things_Project
28. Health, C. for D. and R.: Digital Health-Cybersecurity. https://www.fda.gov/
MedicalDevices/DigitalHealth/ucm373213.htm
29. Naval Medical Logistics Command (NMLC): Medical Device Risk Assessment Question-
naire Version 3.0. (2016). http://www.med.navy.mil/sites/nmlc/Public_Docs/Solicitations/
RFP/MDRA%203.0-20160815RX.PDF
30. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1, 83–98
(2008)
31. Cheng, Y., Deng, J., Li, J., DeLoach, S.A., Singhal, A., Ou, X.: Metrics of Security. In: Kott,
A., Wang, C., Erbacher, R.F. (eds.) Cyber Defense and Situational Awareness, pp. 263–295.
Springer International Publishing, Cham (2014)
32. Saaty, T.L.: Decision-making with the AHP: why is the principal eigenvector necessary. Eur.
J. Oper. Res. 145, 85–91 (2003)
33. Smiths Medical Medfusion 4000 Wireless Syringe Infusion Pump Vulnerabilities (Update
A)|ICS-CERT. https://ics-cert.us-cert.gov/advisories/ICSMA-17-250-02A
34. Alsubaei, F., Abuhussein, A., Shiva, S.: Quantifying security and privacy in Internet of
Things solutions. In: NOMS 2018–2018 IEEE/IFIP Network Operations and Management
Symposium, pp. 1–6 (2018)
CUSTODY: An IoT Based Patient Surveillance Device

Md. Sadad Mahamud(✉), Md. Manirul Islam, Md. Saniat Rahman, and Samiul Haque Suman

American International University-Bangladesh, Dhaka, Bangladesh
{sadad,manirul,saniat,samiul}@aiub.edu

Abstract. In this paper, the authors present an assistance device for patient surveillance. An IoT based system is developed for monitoring a patient's heart rate, body temperature, and saline rate. An Arduino microcontroller is used for processing the data, an ESP32 module is used for monitoring the patient's data through the internet, and a GSM module is used for notifying the doctors in emergency cases. The main objective of this project is to help doctors and nurses monitor a patient's health condition through the internet and over the cellular network. If the monitored parameters exceed their nominal values, an alert message is sent to the concerned duty doctor as well as the attendant, displayed on the LCD screen, and a specific audio alert is played for urgent awareness.

Keywords: IoT · ESP32 module · Arduino · Heart rate · Body temperature ·
Saline measurement · GSM · Micro SD card module · Audio · LCD display

1 Introduction

The arrival of modern technology has made our lives much easier and more
comfortable than in previous decades. Yet despite this technology, many medical
patients still die each year because these technologies are not integrated and
made accessible at an affordable cost. It is very difficult for a doctor to
monitor a patient incessantly, 24/7, when the patient is suffering from a
critical disease or some corporal malady. One CNN health report listed ten
shocking medical mistakes leading to patient deaths [1], and most of them
occurred for lack of timely care. Hence, to reduce human error and lessen the
burden on doctors and nurses of monitoring a patient continuously, this paper
proposes a low-cost surveillance system called CUSTODY for monitoring a patient
through the internet with the support of GSM technology. The health monitoring
system measures the patient's condition at regular intervals. This paper
describes the design of an IoT based pulse rate, saline level, and body
temperature measuring system built with an Arduino microcontroller and an ESP32
module. The system raises an alarm when the pulse rate, body temperature, or
saline level rises above or falls below its threshold value and sends an
emergency alert notification to the concerned doctor and a family member. The
patient's real-time monitoring parameters can also be viewed via the internet
at any time.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 225–234, 2019.
https://doi.org/10.1007/978-3-030-02686-8_18

2 Related Works

A heart rate monitoring system was designed by P. A. Pawar [2] using an IR-based
sensor that can measure the heart rate and send the signal through a GSM module.
That system is also based on an Arduino microcontroller and was mainly designed
as an effective home-based heart rate monitoring system. An LPC2129 health
monitoring system was designed by M. Pereira [3]. In that paper the authors
presented an IoT based device using an ARM7 processor together with ECG, heart
rate (AD8232), and body fat percentage modules. The main idea presented in that
paper is to provide better and more efficient health services to patients by
implementing a networked information cloud, so that experts and doctors can
make use of this data and provide a quick and efficient solution. An IoT based
health monitoring system was proposed by N. Gupta and co-authors using an
Android app for health monitoring [4]. That paper presents a health monitoring
system using a pulse oximeter sensor, temperature sensors, and a PIR motion
sensor; the patient's data is uploaded to a custom server over the GPRS
network. According to that study, such a system is an efficient way to keep
track of one's health condition. C. Raj and co-authors proposed an IoT based
e-health care system for remote telemedicine [5]. To test their system, they
used body temperature, pulse oximeter, ECG, GSR, and EMG sensors to measure the
patient's body parameters. The paper mainly focused on building a common
monitoring interface between multiple remote centers and medical practitioners.
An IoT based smart health care system using CNT electrodes was designed by
M. Bansal and co-authors [6]. The main objective of that work is to give people
an effective solution to live comfortably in their homes or workplaces instead
of going to expensive hospitals. S. Lavanya and co-authors developed a remote
prescription and i-home healthcare system based on IoT [7]. The authors used a
heart rate sensor, a real-time clock, RFID tags, and a Raspberry Pi server for
network connectivity. In general, that paper presents an IoT-based intelligent
home-centric healthcare platform which seamlessly connects smart sensors
attached to the human body for physiological monitoring and daily medication
management. Many more studies are ongoing in this vast research field.

3 Architecture Model of the System

Our system is based on an Arduino Mega microcontroller unit board and an ESP32
WiFi module board. All sensor data are fetched and decoded by the
microcontroller and then sent in real time using the ESP32 module. Figure 1
describes the architecture model of the proposed system.

Fig. 1. Architecture model of the system.

4 System Design

The total system is primarily based on the Arduino Mega microcontroller, which
serves as the main controlling unit. After receiving the data from the
temperature sensor, saline load sensor, and pulse sensor, the microcontroller
unit decodes the data for the final operation. The ESP32 WiFi module is used
for communication with the public network. All the data received by the Arduino
is stored on the micro SD card module, and that stored data is made available
to the web server through the ESP32 module. Figure 2 shows the simulation model
of the total system; the simulation was done with the Fritzing software [8].

Fig. 2. Simulation circuit diagram.



If any sensor value crosses its predefined nominal value, a pre-defined SMS or
call is sent to the doctor and a specific audio alert is played. The LCD
display panel shows the current state of the patient.
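The alert behavior just described can be sketched as simple threshold logic.
The following Python snippet is a minimal illustration only; the nominal ranges
and the function name are our assumptions, since the paper does not state its
exact thresholds.

```python
# Minimal sketch of the alert logic; the nominal ranges below are
# illustrative assumptions, not values taken from the paper.
NOMINAL = {
    "pulse_bpm": (60, 100),      # assumed resting heart-rate range
    "temp_f": (97.0, 99.5),      # assumed body-temperature range (deg F)
    "saline_level": (2, 3),      # level 1 means the packet is almost empty
}

def check_patient(readings):
    """Return the overall status and the parameters outside their range."""
    alarms = [name for name, (lo, hi) in NOMINAL.items()
              if not lo <= readings[name] <= hi]
    if len(alarms) >= 2:
        return "emergency", alarms   # place a call, play emergency audio
    if alarms:
        return "alert", alarms       # send SMS, play audio, update LCD
    return "normal", alarms
```

For example, a reading set with only the pulse outside its range would return
an "alert" status listing just that parameter, matching the single-parameter
SMS/audio behavior described above.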

4.1 Arduino Mega 2560


The Arduino Mega 2560 is a microcontroller board based on the ATmega2560,
designed for more complex projects, with 54 digital I/O pins and 16 analog
inputs [9]. The Mega is the main controlling unit of this system.

4.2 Pulse Sensor

The heart beat can be measured from the optical power variation as light is
scattered or absorbed on its path through the blood while the heart beats [10].
In this system we have used an ear-clip pulse sensor. Let value_1 and value_2
be the first and last pulse counter values. Then
Ten_Pulse_time = value_1 - value_2 and
Single_pulse_time = Ten_Pulse_time/10, so our final equation for beats per
minute (BPM) is:

Heart rate (BPM) = 60/Single_pulse_time (1)

After calculating the pulse rate using (1), the Arduino stores the current rate
on the internet server through the ESP32 module; if the pulse rate crosses the
nominal value, it sends an SMS, plays an audio alert, and shows the current
condition on the display. Figures 3 and 4 show the change in pulse rate
measured in the Arduino Serial Monitor for our test patient, a 23-year-old
male.
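As a minimal illustration of Eq. (1), the computation can be sketched as
follows. The function name is ours, and the assumption that the counter values
are millisecond timestamps (e.g. from the Arduino millis() counter) is ours as
well, not stated in the paper.

```python
def bpm_from_counts(value_1, value_2):
    """Beats per minute from the first and last of ten pulse counter
    values, following Eq. (1). Assumes the counters are millisecond
    timestamps; that unit is our assumption for illustration."""
    ten_pulse_time = abs(value_1 - value_2) / 1000.0  # seconds spanned by 10 beats
    single_pulse_time = ten_pulse_time / 10.0         # seconds per beat
    return 60.0 / single_pulse_time                   # Eq. (1): BPM
```

For example, if ten beats span 8000 ms, the function returns 75 BPM.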

Fig. 3. Normal pulse rate. Fig. 4. Increased pulse rate.

4.3 Temperature Sensor

This system uses the water-proof DS18B20 temperature sensor. The DS18B20
provides 9 to 12-bit (configurable) temperature readings over a 1-Wire
interface [11, 12]. Here, Temp = output voltage * 0.48828125, and then finally

Temp_final = (Temp * 1.8) + 32 (2)



By using (2) [13], the Arduino calculates the patient's body temperature and
executes its operation.
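The two steps can be sketched together as below. The function name is a
hypothetical helper of ours; the 0.48828125 factor (500/1024, i.e. a 10-bit ADC
with a 5 V reference scaled to degrees Celsius) comes from the text, and
Eq. (2) is the Celsius-to-Fahrenheit conversion.

```python
def adc_to_fahrenheit(adc_value):
    """Convert a raw 10-bit ADC reading to degrees Fahrenheit using the
    paper's two formulas: Temp(C) = reading * 0.48828125, then Eq. (2)."""
    temp_c = adc_value * 0.48828125   # 500/1024: ADC steps -> degrees Celsius
    return (temp_c * 1.8) + 32.0      # Eq. (2): Celsius -> Fahrenheit
```

A reading of 76, for instance, corresponds to about 37.1 °C, i.e. roughly
98.8 °F.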

4.4 Load Sensor Module


For this system we have used a strain gauge load cell module [14]. The idea
behind the load sensor is to measure the saline weight, because no sensor can
be placed inside the saline packet. The load sensor measures the remaining
saline in liters and divides it into three levels: level 3 indicates that the
saline is full, level 2 that it is half full, and level 1 that it is almost
finished and the nurse or doctor should change the saline packet. The load
sensor values are fetched by the Arduino Mega microcontroller, checked against
the set values, and an alarm is triggered if necessary.
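The level classification can be sketched as follows. The weight cut-offs (2/3
and 1/10 of the full packet weight) are illustrative assumptions of ours, since
the paper does not specify its thresholds; the 250 g default matches the 250 ml
test bottle used in the results section.

```python
def saline_level(weight_g, full_weight_g=250.0):
    """Classify the remaining saline into the paper's three levels from
    a load-cell weight reading. The 2/3 and 1/10 cut-offs are assumed
    for illustration; the paper does not give its threshold values."""
    if weight_g > full_weight_g * 2.0 / 3.0:
        return 3  # full
    if weight_g > full_weight_g / 10.0:
        return 2  # about half
    return 1      # almost finished: the packet should be changed
```

Level 1 is the state that triggers the SMS and audio alert described in the
results.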

4.5 ESP32 WIFI Module

The ESP32 comes integrated with an antenna, a power amplifier, low-noise
amplifiers, filters, and a power management module, so the entire solution
takes up a minimal amount of printed circuit board area. The board uses
2.4 GHz dual-mode Wi-Fi and Bluetooth chips built with TSMC 40 nm low-power
technology [15]. In this system the ESP32 connects the device to the cloud: it
reads the sensor data saved on the SD card and pushes it to the cloud. A
private cloud domain server, "custody.com", was created for testing this
system. The monitoring web portal is written in PHP and all the data is stored
in a MySQL server. After logging into the web portal, a user can see the
patient's real-time condition by entering the patient ID. The ESP32 module
operates in the network layer of the OSI model [16]. Figure 5 shows the current
health condition of a patient in the CUSTODY web portal.

Fig. 5. Private cloud domain server custody.com



4.6 SIM900A GSM Module


GSM is mainly used in devices like mobile phones as well as for long-distance
communication; it transmits and receives data over GPRS and supports calls and
SMS [17]. In this project a SIM900A GSM module is used for sending SMS. When a
sensor value exceeds its set range, the GSM module sends an SMS to selected
numbers. Figures 10 and 12 show the SMS received by the cell phone describing
the patient's condition.

4.7 Micro SD Card Module

The micro SD card module transfers data from an SD card. The Arduino connects
to the SD card through the breakout board, and the audio commands are saved on
this card; the connection of the module with the Arduino is shown in Fig. 2.
When any sensor value crosses its set range, an audio output is generated to
warn people about the danger. The module also stores the sensor data on the SD
card for transfer to the internet with the help of the Arduino and the ESP32
module.

4.8 Audio Amplifier and Speaker

When the audio commands are played from the micro SD card, the audio volume is
relatively low. To make it louder, we used our own custom-made 9 V audio
amplifier, built with an LA4440 IC and an 8 Ω speaker.

4.9 16 * 2 LCD Display


A 16 * 2 LCD display is connected to the system. This display shows the
current sensor values and the patient's current condition.

5 Hardware Model

Figure 6 shows the hardware model of the system. All the sensors are connected
to the Arduino, and the output results are displayed on the LCD module along
with the emergency audio output.

Fig. 6. Hardware model of the system.

6 Results

Table 1 shows the test results of this system for the audio and SMS outputs.
We tested the system with only one patient. Different analysis results are
given below. Figures 7 and 8 show the test result when both the patient and the
saline level are in normal condition: no SMS is triggered and no audio is
played, while the web portal carries the patient's current data. Figures 9 and
10 show the test output when the saline is at a low level. For testing
purposes, we used a 250 ml bottle as the saline packet. When the saline is
almost finished, the load sensor reads a very small weight, an SMS is sent,
and the audio is played. Figures 11 and 12 show the test output when the
temperature increases: we raised the temperature manually, and an SMS was sent
to the pre-defined number and an audio alert was played as well.

Table 1. Results analysis for audio and SMS output


Condition Audio SMS
Normal condition No audio No SMS
Normal condition No audio No SMS
Normal condition No audio No SMS
Body temperature increased Audio played SMS sent
Pulse rate Increased Audio played SMS sent
Saline level low Audio played SMS sent

Figure 13 shows the web server monitoring window when the patient's pulse rate
is increased. Figure 14 shows that if two or three parameters fall outside
their nominal values at the same time, the system reports the patient's
condition as an emergency; in this case a call is placed to the attending
doctor and an emergency audio alert is played by the system.

Fig. 7. Normal condition. Fig. 8. Normal saline condition test.

Fig. 9. Saline level low. Fig. 10. SMS received for saline low.

Fig. 11. Temperature increased. Fig. 12. SMS sent for temp. increased.

Fig. 13. Web server monitoring when patient pulse rate increased.

Fig. 14. Web server monitoring when more than one sensor parameter crosses its nominal value.

7 Conclusion

The main objective of this paper is to create a low-cost IoT based medical
surveillance system that can be a true virtual assistant to a doctor using
smart techniques. Real-time monitoring of the patient's current health
condition by family members is an added advantage of this system. The initial
test run of the prototype was successful, but some future work is needed: more
advanced sensors can be used to calculate the pulse rate, and since this is an
IoT based system, the patient's data must be kept secure and the data
processing must become faster. In the future, further research can be carried
out to improve the algorithm of the system.

References

1. 10 shocking medical mistakes—CNN. https://www.cnn.com/2012/06/09/health/medical-mistakes/index.html. Accessed 2018
2. Heart rate monitoring system using IR base sensor and Arduino Uno—IEEE Conference
Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/7057005/.
Accessed 25 Apr 2018
3. A novel IoT based health monitoring system using LPC2129—IEEE Conference Publication.
Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/8256660/. Accessed 25 Apr
2018
4. IOT based health monitoring systems—IEEE Conference Publication. Ieeexplore.ieee.org
(2018). https://ieeexplore.ieee.org/document/8276181/. Accessed 25 Apr 2018
5. HEMAN: Health monitoring and nous: An IoT based e-health care system for remote
telemedicine—IEEE Conference Publication. Ieeexplore.ieee.org (2018). https://
ieeexplore.ieee.org/document/8300134/. Accessed 19 Jun 2018
6. IoT based smart health care system using CNT electrodes (for continuous ECG monitoring)
—IEEE Conference Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/
document/8230002/. Accessed 19 Jun 2018
7. Remote prescription and I-Home healthcare based on IoT—IEEE conference publication.
Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/8094069/. Accessed 19 Jun
2018
8. Fritzing. Fritzing.org (2018). http://fritzing.org/home/. Accessed 25 Apr 2018
9. A. [closed]: Arduino Mega 2560 serial port location. Arduino.stackexchange.com (2018).
https://arduino.stackexchange.com/questions/47727/arduino-mega-2560-serial-port-
location. Accessed 25 Apr 2018
10. Grove—Ear-clip Heart Rate Sensor| Techshopbd. Techshopbd.com (2018). https://
www.techshopbd.com/product-categories/biometrics/1389/grove-ear-clip-heart-rate-
sensor-techshop-bangladesh. Accessed 25 Apr 2018
11. https://playground.arduino.cc/Learning/OneWire. Accessed 25 Apr 2018
12. DS18B20 Digital Temperature Sensor (CN) | Techshopbd. Techshopbd.com (2018). https://
www.TechSoup.com/product-categories/temperature/2796/ds18b20-digital-temperature-
sensor-cn-techshop-bangladesh. Accessed 25 Apr 2018
13. Sensing heart beat and body temperature digitally using Arduino—IEEE Conference
Publication. Ieeexplore.ieee.org (2018). https://ieeexplore.ieee.org/document/7955737/.
Accessed 25 Apr 2018
14. D. Load Cell—200 kg, S. Load Cell—10 kg and D. Load Cell—50 kg: Getting started with
load cells—learn.sparkfun.com. Learn.sparkfun.com (2018). https://learn.sparkfun.com/
tutorials/getting-started-with-load-cells. Accessed 25 Apr 2018
15. Overview | Espressif Systems. Espressif.com (2018). https://www.espressif.com/en/
products/hardware/esp32-devkitc/overview. Accessed 25 Apr 2018
16. What is OSI model (Open Systems Interconnection)?—Definition from WhatIs.com.
SearchNetworking (2018). https://searchnetworking.techtarget.com/definition/OSI.
Accessed 25 Apr 2018
17. Sim900a Gsm Module Interfacing with Arduino Uno. Electronicwings.com (2018). http://
www.electronicwings.com/arduino/sim900a-gsm-module-interfacing-with-arduino-uno.
Accessed 25 Apr 2018
Personal Branding and Digital Citizenry:
Harnessing the Power of Data and IOT

Fawzi BenMessaoud(✉), Thomas Sewell III, and Sarah Ryan

School of Informatics and Computing, Indiana University and Purdue University,


Indianapolis, IN 46202, USA
fawzbenm@iu.edu

Abstract. With all that the internet has to offer, it is easy to get lost in
the myriad of resources available to us both academically and socially. We have
so many ways to learn, connect, and promote ourselves that, in trying to stay
current in today's digital world, we can quickly find ourselves overwhelmed. To
be successful, we need a way to conveniently organize educational materials and
references while also ensuring that only our very best self is on display.
According to a study we conducted on this subject, the idea of personal online
management is something which many value highly but are unsure how to fully
realize. We feel this is problematic for any modern user, but it can be
resolved. Using multiple data collection methods in our research, we explored
the concept of "Digital Citizenship", defined as a way of expressing the online
presence and personal brand that users have curated in a digital space, as well
as a simpler, more efficient way to store and organize a personal digital
library. We present an app that would help to fill this need within the realm
of academia and beyond. This app is a way of simplifying our lives, making the
internet more accessible, and managing personal, educational, and academic
materials, online profiles, and social media accounts.

Keywords: Personal brand · Digital footprint · Digital Citizenry · Social media

1 Major Aspects

Our study was based on three factors that we felt were interdependent. We were
interested in seeing how people consider their public data, represent their images
online, and how they store personal data. These topics were labeled as Personal Brand,
Web Presence, and Digital Content Storage.

2 Personal Brand

Personal Brand encompasses the way a person presents themselves online. Bridgen
asserted that a person can become successful by developing and marketing their
personal brand, highlighting themselves in a positive light and developing
their online self in such a way as to be engaging to others [2].

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 235–240, 2019.
https://doi.org/10.1007/978-3-030-02686-8_19

Over a period of time, successful individuals obtain a reputation and position based
on a combination of their expertise and “connectedness”, which makes them attractive
to other players operating in the same space. An authentic personal brand therefore
delivers both a track record and a promise of the ongoing delivery of value. From the
Journal of Business Strategy, we see the statement, “… in most cases authentic per-
sonal brand builders are genuinely strong performers who are highly sought after by
employers because they have the ability to use their personal social capital for the
benefit of the organization and their own career progression within it” [3].

2.1 Web Presence


Inextricably linked to one’s personal brand is their web presence, particularly in the
context of social media. Web presence is the public way an individual is observed from
the point of view of an audience while on the internet. Jones postulated that the more
connections people make, the larger their digital footprint, and the more likely potential
employers will find less positive aspects of a person’s digital life [4]. This is important
to consider, particularly when searching for a career. However, no one lives in
a vacuum, and digital connections with friends, family members, or even
professional contacts are virtually inevitable in the world we live in today.
According to Brake, “Profiles and entries on Facebook, Twitter and many other
such services can contain diaristic or confessional material that looks as if it is only for
the author to read or perhaps for trusted friends and family - but although social media
services often include tools to keep such writings private, many are visible to a large
number of people or even published openly on the web with potential audience of
millions.” [1].
The solution then is not simply to be aware, but to be able to manage one’s image
in the digital space, promoting positive aspects while diminishing the aspects that are
less so. In the article by Harris & Rae, the authors state, “… the ‘digital divide’ between
the ‘haves’ and the ‘have nots’ in the developed world is now less about access to the
web than it is about understanding how to actively participate in the networked society”
[3] having the power to manage one’s overall web presence is key to success in modern
times.

2.2 Digital Content Library


Digital Content Storage is conceived as a library: a single collection place
for all the digital content that a person owns and uses. This is similar to
other methods used to save and share files of different types and sizes, such
as Dropbox or Google Docs. The app we are presenting as a solution works in
the same way, but with the added bonus of a "vault" feature: a specific space
within the library with extra security for more sensitive and restricted
documents and information.

3 Motivation

Our motivation for this research was based on our hypothesis that the general
populace lacks awareness of the importance of monitoring their online presence
and has difficulty managing the vast resources available to them. We tested
this theory.

3.1 Methods
In choosing a method of study, we thought it would be appropriate to use an
online survey in order to reach a variety of respondents within our triple
constraints: it let us reach the largest number of people in the given time by
the most cost-effective means. We conducted our study using Google Forms and
posted several links to the survey on Facebook and Twitter to gain a wide
audience. Distributing the survey this way allowed us to get feedback from
those who may no longer be students or in the academic world, and it did not
assume any prior knowledge of our topics, giving us the widest possible net to
cast for data. The survey included questions on personal brand, web presence,
and digital content storage, gauging the participants both on their current
knowledge of these topics and on their current usage of applications and
software/hardware specific to these subjects.
This initial survey was left open for one week. We first used the analytics
provided by Google Docs and then analyzed the raw data ourselves to draw our
conclusions. We split the survey into sections, each specific to one of the
three topics we were testing. This allowed us to get a general idea of the
prior knowledge our participants had of each topic. For example, "Are you
familiar with Personal Brand?" was a question we asked in order to gauge what
the general public might or might not already know about the subject, an
approach we felt was useful in giving meaning to the survey.

3.2 Findings
The data we collected from our surveys proved to hold a number of patterns
which we found in the process of our analysis. Our initial survey collected
data from 60 volunteer participants, which gave us a useful variety of data to
draw on.
One of our main interests was determining how important people considered
their social media presence. Our results showed that over 50% of respondents
placed their social media in the mid-range: 81.67% rated it at 3 or higher on
a scale of 1–5, with 66.67% rating it at 3 or 4, the middle rankings (see
Fig. 1).
Another interesting pattern was the gender distribution of our respondents and
the way it affected the survey. As shown in Fig. 2, our respondents were 66%
female and 33% male, so we thought it prudent to break down some responses by
gender to look for important differences in the use of social media and web
accounts.

Fig. 1. Results of a survey question regarding participant’s ranking of the importance of their
own social media presence.

Fig. 2. Side by Side graphic showing media accounts held by gender. We did not find gender to
have any significant impact on number of social media accounts held by participants.

According to these results, gender is not a strongly determining factor in the
number of social media accounts currently in use. This is especially
interesting given the gender split of respondents: as stated, 66% were female
and 33% male, yet our data shows a low level of disparity between the two. We
concluded that this further supports the idea that a better knowledge of one's
digital footprint is universally beneficial (see Fig. 2).
We were also very interested to see how highly people rated the importance of
securing their saved content, asking them to rank that importance on a scale
of 1–5. Interestingly, zero respondents rated security importance at one, the
lowest. Conversely, 56.67% of respondents rated their interest in the security
of saved data at 5, the highest point on the scale, which shows very clearly
how highly security is regarded (see Fig. 3).

Fig. 3. Figure graphing the result of our question regarding importance of saved content. Over
50% of all respondents indicated that it was of extreme importance to them by ranking security at
the highest possible level.

4 Conclusion

In summary, we found that our initial hypothesis was correct. The internet is
ever-expanding, producing more connections than have existed at any time
prior. The many nuances of our presence in this digital space are often missed
or not fully understood, and this can result in unexpected repercussions. The
goal of our research was to see to what extent the people we surveyed were
aware of their larger online presence and of the way they navigated the
digital landscape. In examining the responses we received, the patterns showed
us that while the people we surveyed said they understood each of the three
categories we asked about, they lacked a big-picture perspective of how those
categories were intertwined. Digital Citizenry combines these concepts,
providing users with a way to manage their online selves by understanding the
overlap that comes from a digital space, and thereby empowering people to make
the best decisions for both their present and their future.

References
1. Brake, D.R.: Sharing Our Lives Online: Risks and Exposure in Social Media. Palgrave
Macmillan, Hampshire (2014)

2. Bridgen, L.: Emotional labour and the pursuit of the personal brand: Public relations
practitioners’ use of social media. J. Med. Pract. 12(1), 61–76 (2011). https://doi.org/10.1386/
jmpr.12.1.61_1
3. Harris, L., Rae, A.: Building a personal brand through social networking. J. Bus. Strategy 32(5), 14–21 (2011). https://doi.org/10.1108/02756661111165435. Accessed 9 Apr 2018
4. Jones, C., et al.: Net generation or digital natives: is there a distinct new generation entering
university? Comput. Educ. 54(3), 722–732 (2010). https://doi.org/10.1016/j.compedu.2009.
09.022
Testing of Smart TV Applications: Key
Ingredients, Challenges and Proposed
Solutions

Bestoun S. Ahmed(✉) and Miroslav Bures

Department of Computer Science, Faculty of Electrical Engineering,


Czech Technical University, Karlovo nám. 13, 121 35 Praha 2, Czech Republic
{albeybes,buresm3}@fel.cvut.cz

Abstract. Smart TV applications are software applications designed to run on
smart TVs, televisions with integrated Internet features. Nowadays, smart TVs
are coming to dominate the television market, and the number of connected TVs
is growing exponentially. This growth is accompanied by an increase in
consumers and in the use of the smart TV applications that drive these
devices. Due to the increasing demand for smart TV applications, especially
with the rise of Internet of Things (IoT) services, it is essential to build
applications with a certain level of quality. Despite the analogy between
smart TV and mobile apps, testing smart TV applications differs in many
aspects due to the different nature of user interaction and the development
environment. To develop the field and formulate the concepts of smart TV
application testing, this paper aims to provide the essential ingredients,
solutions, answers to the most critical questions, and open problems. In
addition, we offer initial results and a proof of concept for a creeper
algorithm that detects the essential views of an application. This paper is an
effort to report the key ingredients and challenges of smart TV application
testing systematically to the research community.

Keywords: Smart TV application testing · Software testing ·
Model-based testing · Internet of Things (IoT)

1 Introduction
A connected TV, popularly called a smart TV, is a technological assemblage of
computer and traditional television. The device is a combination of a
conventional TV terminal, an operating system (OS), and digital content, all
of which are connected to the Internet. Smart TVs provide different digital
services such as multimedia, gaming, Internet browsing, on-demand
entertainment access, and various online interactive sessions in addition to
broadcast media. Indeed, these devices are expected to become more
intelligent, interactive, and useful in the future [1]. Recently, electronics
companies along with IT firms have been raising their investments in the
technological advancement of these devices
c Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 241–256, 2019.
https://doi.org/10.1007/978-3-030-02686-8_20

by launching new terminals and applications for smart TVs. It is expected that
these devices will shortly be a frequent part of our smart homes within an
Internet of Things (IoT) context. This explains the projections that the smart
TV market would be worth $265 billion by 2016.
Just like other new smart devices, smart TVs are operated by an OS with
different applications (apps) installed on it. Although the OS is the key
software for operation, it is the installed apps that bring different uses and
functionalities to the device. At a glance, a smart TV app may look like a
mobile app due to the similarities of the OSs and development kits. Because of
this "fake" similarity, one may think of testing smart TV apps just like
mobile apps. In fact, however, testing smart TV apps is different due to the
nature of the user's interaction with the app itself.
In mobile apps, the user interacts with the device touchscreen (i.e., the
application) directly by hand, whereas with smart TVs the user interacts with
the app through another device, the remote controller. Some vendors do provide
touchscreen interaction, but the way the application behaves is still based on
the remote control device when it comes to testing practices. In addition, the
user of any TV (including a smart TV) usually stays away from the screen and
almost always uses the remote to operate the apps.
In the literature, mobile app testing is well studied, and many research directions have been established (e.g., [2–4]). Testing smart TV apps, however, is a new area: many challenges are still without a solution, and many research questions remain unanswered. To address these challenges and questions, it is essential to explore the app structures, interaction styles, development environments, and the technology behind the apps. To that end, this paper examines the key ingredients of smart TV app testing and aims to address the most pressing questions. The paper also discusses the challenges addressed so far in the literature and the open problems for test automation and generation. Based on that, a systematic framework for testing applications on smart TVs is illustrated through a prototype. The framework includes the testing process, its steps, and the test generation strategy. This will help to validate the different aspects of applications before release, and it could also serve as a starting point for further research. The framework will help to address and formulate more open problems and research questions.
The rest of this paper is organized as follows. Section 2 summarizes the related
works in the literature and those efforts in smart TV app testing that could
be useful here. Section 3 explains the technology behind the smart TV apps.
Section 4 illustrates some analogy and differences between mobile and smart
TV apps. Section 5 describes the navigation and control mechanism of smart
TV apps. Section 6 discusses the open research problems in the smart TV app
testing. Section 7 defines a prototype for a systematic automated testing strategy.
Section 8 discusses the functional and non-functional testing opportunities in

1 https://read.bi/2L4CDSI.
2 https://bit.ly/2HxnMkL.
Testing of Smart TV Applications 243

smart TV applications. Finally, Sect. 9 gives concluding remarks and future research recommendations.

2 Motivation and Literature

Testing software applications on smart devices can be considered a development and evolution of testing practice from traditional user interfaces (UIs) such as graphical user interfaces (GUIs) and web applications. Testing practices for these UIs have been studied extensively in the last decade, and as a result, many sophisticated methods, algorithms, and tools have been developed. Banerjee et al. [5] studied more than 230 articles published between 1991 and 2013 in the area of GUI testing, and Li et al. [6] surveyed two decades of literature on web application testing.
Mobile application testing could be considered the first effort towards smart application testing. There are many differences between mobile apps and graphical/web UIs. The main issue that differentiates the testing process is user interaction with the application. In standard GUI and web applications, the keyboard-and-mouse combination is still the standard input for interacting with the application. This is not the case for mobile apps, where the user interacts with the device touchscreen by fingers, and hence interaction behavior differs from user to user. Although this issue led to the development of new testing strategies for mobile apps, many of these strategies still benefit, wholly or partially, from earlier methods and practices published for GUI and web application testing. For example, Amalfitano et al. [7] developed the MobiGUITAR strategy for systematic mobile application testing from the GUITAR strategy [8] for GUI testing. An extensive study on mobile application testing is presented in [2].
The smart TV application is a new type of smart device application, and its views are unlike those of other applications. The application structure looks like a web application, as it relies on HTML, CSS, and JavaScript; however, user interaction with the application differs from other application types. Usually, the user does not interact with the application directly by hand but through another input device, the remote control. This could suggest that the testing process is similar to GUI or web application testing. However, the remote control does not behave like a standard mouse. While a standard mouse can move in every direction over the application, remote control movement is restricted to four explicit directions. This interaction difference creates many obstacles and difficulties when it comes to the testing process. While the general concepts of model-based testing are applicable here, the construction of the model and the model type make the difference. For example, due to the different interaction nature, Nguyen et al. [8] used the Event Flow Graph (EFG) as a model for GUI testing, whereas Amalfitano et al. [7] used a state machine as a model for mobile application testing. In a smart TV app, neither the EFG nor the state machine model is directly applicable. In a smart TV app, each transition from one state to another is practically just one step, while
this is not the case in other applications. For example, in a mobile app, the distance between two icons (states) does not matter for a transition, while it is very important in a smart TV application, and this leads to a different model. An important effort to formulate this model was made recently by Cui et al. [9], who proposed the Hierarchical State Transition Matrix (HSTM) as a model for Android smart TV applications. While the model is promising, it still needs to be developed and formulated for the complex structure of different applications.
In fact, testing smart TV apps can be seen from different angles. For example, usability testing is one of the critical testing issues, addressing the interaction between the user and the smart TV through the remote control; it helps to improve the quality of application user interfaces. Ingrosso et al. [10] addressed this issue by using several users to test an e-commerce application on a smart TV. Security testing is also an essential issue in smart TV apps; however, we could not find a published study addressing security in smart TV apps. Recently, Sabina C. [11] discussed and described some of the testing platforms for smart TV apps. The study chose the Opera and Samsung TV stores for testing the applications. The testing process relies on uploading the applications to the Opera and Samsung application stores, which verify them based on how the code is written. Hence, the testing strategy itself is not defined, and this cannot be considered a formal testing process. The study also addressed the importance of functional testing of these applications without giving details, since it is a bachelor's thesis with limitations.
Although it is essential from an industrial point of view, we could not find many companies offering solutions for smart TV app testing. One of the interesting projects so far is the suite.st framework3. The framework depends on a record-and-replay testing style using two different devices, one for recording the actions and the other acting as an emulator. The platform deals with the application just like a web application and uses the record-and-replay style of testing employed by SeleniumHQ4. The framework is a good starting point for the industry to adapt the Selenium style of testing to smart TV apps. Although the framework claims to address functional testing, the pass/fail criteria are not clear from an academic point of view. As a result, there is a need to define a test oracle for the framework. In addition, the framework does not rely on an automatic test generator for fully testing the applications. In fact, defining a test oracle for smart TV applications could be a new research direction, as we address later in this paper.

3 Smart TV Apps Development and Technology


Just like Android apps, smart TV apps are developed using software development kits (SDKs). The new versions of the Android SDK support the development of smart TV apps; however, these applications can run on Android smart TV
3 https://suite.st.
4 http://www.seleniumhq.org/.
devices only. In fact, few SDKs have been available for cross-platform development. For example, the Joshfire5 Smart TV SDK was a platform for developing applications that work on Google and Samsung TV devices but not on LG TV devices. The Mautilus6 Smart TV SDK is also a development platform, but the applications work only on some device versions. The Smart TV Alliance7 was the most advanced SDK, supporting different features and platforms; however, the project has been shut down, and the SDK is no longer available for download.
The Samsung Tizen SDK provides a set of tools and frameworks to develop smart TV apps through Tizen Studio. The SDK depends on the latest web technologies such as JavaScript, CSS, HTML5, and W3C widget packaging. In fact, Samsung has established Tizen.NET, a new cross-platform application development environment integrated with Visual Studio.
Nowadays, most SDK tools rely on a unified approach to the development technology for smart TV apps. The technologies behind the applications are JavaScript, HTML5, and CSS3. JavaScript is used as the standard programming language to program the behavior of the applications; it adds page-jumping capability and enables the developer to code complex expressions and calculations such as conditional branches and loops. The fifth version of the Hypertext Markup Language (HTML5) is used to develop the structure and content of the web elements. HTML5 alone is sufficient to build the structure of an application page even without JavaScript code, but the page will then lack interactivity with the user [12]. Finally, the third version of Cascading Style Sheets (CSS3) is used for the presentation of these web elements and for polishing them for better visualization. These essential components form the current technology of the smart TV application, and they are also the newest technology of the World Wide Web.
In general, a smart TV app can be one of two types: installed or cloud-based. An installed TV app is a stand-alone app installed on the smart TV without the need for an Internet connection, while a cloud-based TV app works as an interface between the cloud and the TV, with shallow content (almost no additional functionality) when there is no Internet connection.
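This distinction can be sketched in JavaScript, the language these apps are written in. The function below is a hypothetical illustration, not taken from any real SDK: a cloud-based app degrades to shallow, non-interactive content when the cloud catalog cannot be fetched.

```javascript
// Hypothetical sketch: how a cloud-based smart TV app might degrade to
// shallow content when there is no Internet connection. All names are
// illustrative; real SDKs expose their own APIs for this.
function buildHomeScreen(cloudCatalog) {
  // cloudCatalog is null when the network request failed.
  if (cloudCatalog === null) {
    return { views: ["offline-notice"], interactive: false };
  }
  // With a connection, every catalog entry becomes a selectable view.
  return { views: cloudCatalog.items.map(i => i.title), interactive: true };
}
```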

4 The Analogy and Differences of Smart TV and Mobile Apps

There are many similarities and differences between mobile and smart TV apps. They can be seen in three dimensions: (1) functionality, (2) design, and (3) user interaction.
Both types of applications run on smart devices, and hence the functionality could be similar, as both are connected to the Internet. Mobile apps

5 https://www.joshfire.com/.
6 https://www.mautilus.com.
7 http://www.smarttv-alliance.org.

can be useful even without an Internet connection, whereas several smart TV apps are useless without a network connection. The computational power of the smart device can also define the functionality of the application itself. In fact, mobile apps can be more functional than smart TV apps, because mobile devices nowadays may have more computational power than smart TVs. In addition, the aims of mobile apps largely differ from those of smart TV apps.
Regarding application design, there are many differences. For example, the size of the screen and icons can define the layout of the application. Smart TV screens are wider than those of mobile devices, and the background color of smart TV apps can differ from that of mobile apps. From the user interaction point of view, smart TV apps involve less text entry, as it is difficult to enter text via the remote control. Most smart TV apps are designed to fetch content from the Internet when connected, whereas this is not the case for mobile apps, which can be standalone applications without Internet interfaces8. The typical smart TV application is much simpler than the mobile app, especially in the design layout.
The way the user interacts with the application defines an essential difference between smart TV and mobile apps. The user of a mobile app interacts directly with the application without an intermediate device, while in a smart TV application, the user interacts with the help of a remote control. In fact, the UI of a smart TV app is sometimes called a 10-foot user interface, since 10 ft (3 m) is the standard distance between the user and the TV. Developers consider this distance when designing the user interface [11]. Using the remote control at this distance is neither user-friendly nor responsive; hence, the UI must account for this significant difficulty. As mentioned in Sect. 2, this interaction difference is also significant when approaching the testing process with model-based testing.

5 Navigation and Control in Smart TV Apps


As mentioned previously, navigation in a smart TV application happens through the remote control. Although some new TV devices offer direct interaction with the screen, the most common interaction with the TV is still the remote control. The remote control provides four essential navigation buttons: Right, Left, Up, and Down. In addition, it has an OK button to select any focused view in the application after exploration. These five key buttons should work properly while using an application. Figure 1 shows an example of a TV remote control.
In addition to those five buttons, there are many other buttons on the remote control that vary from one TV brand to another depending on the level of functionality. Some of them are related to the hardware functionality of the TV itself, such as the power button to turn the TV on and off. There are also ten buttons (0–9) for channel jumps and even for entering numbers in text fields if necessary.
8 https://bit.ly/2IiNb30.
Fig. 1. TV remote device.

The UI layout of any application plays a primary role in the testing process. Understanding the layout can lead to an efficient test generator and runner. Smart TV apps follow a limited number of layout patterns. Figure 2 shows the three main patterns that most smart TV apps follow. In fact, layout (b) is the most used, since it puts many views in one window.

Fig. 2. Three main layout design patterns for smart TV apps [13].

The remote control constrains navigation from one view to another because it supports only single-step navigation. Hence, each move on the layout is a step. This is not a problem when two views are adjacent; however, for non-adjacent views, more than one step is needed to move from one view to another. This navigation is very important when it comes to a test generation strategy based on the application's model.
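The single-step constraint can be made concrete with a small JavaScript sketch. Assuming a hypothetical grid layout where each view sits at a [row, column] position, the key-press sequence needed to move between two views grows with their distance, exactly the property that a distance-aware model must capture.

```javascript
// Sketch: compute the remote-control key presses needed to move between
// two views on a grid-style layout (cf. pattern (b) in Fig. 2). The
// [row, col] positions are an assumption for illustration.
function pressesBetween(from, to) {
  const keys = [];
  const dr = to[0] - from[0]; // vertical distance in views
  const dc = to[1] - from[1]; // horizontal distance in views
  for (let i = 0; i < Math.abs(dr); i++) keys.push(dr > 0 ? "Down" : "Up");
  for (let i = 0; i < Math.abs(dc); i++) keys.push(dc > 0 ? "Right" : "Left");
  return keys;
}
```

Adjacent views cost one press, while a view one row and two columns away costs three; a model that ignores this distance would generate unrealistic test steps.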

6 Open Problems and Challenges


In this section, we discuss the problems and challenges that need to be addressed in smart TV app testing. In the following subsections, we address each problem, the challenges involved in solving it, and our suggestions.

6.1 Start Point of Navigation

One of the first problems that the tester face when testing a smart TV app is
the position of the navigational cursor. Technically speaking, from a JavaScript
developer point of view, this happened when the focus point is not set in the
application. For several applications on the store, this focus point is not set by
the developers. As a result, when the application runs on the emulator, there is
no pre-selected view on the application. The user must use the remote device to
chose a view. Hence, the starting point of the navigator is missing. This problem
is happening clearly with cloud-based TV apps because the views are changing
in real-time with the cloud content. In fact, this is a challenging issue because it
prevents the pre-generation of test sets.
One solution to this problem is to let the tester choose the starting point of
the testing. Yet, there could be a problem of good or bad selection point. Some
starting points may lead to explore the app window sooner by navigating faster
on the views.
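A simple way to realize this solution is sketched below (a hypothetical JavaScript helper, not part of any framework): use the developer-set focus if present, otherwise the tester's choice, otherwise a heuristic default such as a view near the middle of the window.

```javascript
// Sketch: choose the starting view for exploration and test generation.
// All parameter names are illustrative assumptions.
function startView(appFocusedView, testerChoice, allViews) {
  if (appFocusedView !== null) return appFocusedView; // developer set a focus point
  if (testerChoice !== null) return testerChoice;     // manual tester selection
  return allViews[Math.floor(allViews.length / 2)];   // heuristic: middle view
}
```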

6.2 Repository and Benchmark

In general, any software testing, verification, and validation process should be evaluated on some benchmarks. These benchmarks could be real instrumented programs with properties suitable for testing. For example, many testing strategies use the benchmarks available at the Software-artifact Infrastructure Repository website9 for benchmarking and evaluation. For Android testing, various applications are available; for instance, many papers have used TippyTipper10, PasswordMaker Pro11, MunchLife, K-9 Mail12, Tomdroid13, AardDict14, and a few other applications for testing.
In smart TV app testing, we have neither enough applications for benchmarking nor a repository to store benchmarks. There are two reasons behind this. First, smart TV apps are new, and more time may be needed for developers to create and publish open-source applications. Second, the testing process for smart TV apps is not defined yet and the research is not yet initialized; this paper could be an effort toward that. Samsung maintains a page with some simple applications and examples15.
One solution to this difficulty is to develop applications for testing purposes. Here, the reliability of the testing process would be an issue; however, for better reliability, the testing and development groups could be separated.

9 http://sir.unl.edu/portal/index.php.
10 https://tinyurl.com/yd77qfzd.
11 https://tinyurl.com/ma65bc8.
12 https://tinyurl.com/6mzfdaa.
13 https://launchpad.net/tomdroid.
14 https://github.com/aarddict/android/issues/44.
15 https://bit.ly/2qC5ncS.

6.3 Test Generator

In mobile app testing, most test generation strategies were inspired by other UI test generation strategies. For example, the test generator of the MobiGUITAR [7] framework was adapted from the GUITAR [8] framework for GUI testing. However, this approach cannot be followed for smart TV apps: due to the different user interaction, it is hard to adapt a test generation strategy from GUI or mobile app testing. For this reason, there is a need to develop a new test generation strategy.
Although relying on previously investigated strategies is not promising at this early stage, following the principles and concepts of model-based testing is still valid. Here, after deciding on the model and notation, the coverage criteria of the testing strategy are another issue. Defining the coverage criteria depends mainly on the functional and non-functional requirements under test.
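As a sketch of what such a strategy could look like, the JavaScript function below applies one plausible coverage criterion, namely that every transition of a state-machine model is exercised at least once, by prefixing each transition with the shortest key sequence that reaches its source state. The model shape used here is our own assumption, not a published notation.

```javascript
// Sketch of a model-based test generator under transition coverage.
// model: { state: [[key, targetState], ...] } -- an assumed notation.
// Returns one key-press test case per transition in the model.
function generateTests(model, start) {
  // Breadth-first search: shortest key sequence from start to each state.
  const prefix = { [start]: [] };
  const queue = [start];
  while (queue.length > 0) {
    const s = queue.shift();
    for (const [key, target] of model[s] || []) {
      if (!(target in prefix)) {
        prefix[target] = [...prefix[s], key];
        queue.push(target);
      }
    }
  }
  // One test case per transition: reach the source, then press the key.
  const tests = [];
  for (const s of Object.keys(model)) {
    for (const [key] of model[s]) {
      if (s in prefix) tests.push([...prefix[s], key]);
    }
  }
  return tests;
}
```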

6.4 Activity Exploration

The test generation stage cannot be performed without input to the generator algorithm. For functional or non-functional testing, the input will most probably consist of two things: the number of events to test and the coverage criteria. As mentioned previously, the coverage criteria can be defined based on a predefined testing strategy. However, obtaining the input views for the test generation algorithm may require an exploration of the entire UI activity (i.e., window) of the smart TV app.
Activity exploration is not a big issue (at least technically) when we have the source code of the application, i.e., in white-box testing. A simple code crawler could scan the HTML5 and CSS3 files, detect the views by parsing the code, and then feed these views to the generator algorithm. However, catching the views without having the source code (i.e., black-box testing) could be a tricky job; in fact, a special algorithm is needed due to the special interaction with the application through the remote control.
In Sect. 7.1, we introduce an algorithm to creep over the significant views of the application activity in a black-box manner.
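For the white-box case, the crawler can be as simple as a pattern scan over the HTML5 source. The sketch below assumes, purely for illustration, that views are marked with a `view` CSS class and carry `id` attributes written after the class; a real crawler would parse the DOM and the CSS3 layout properly.

```javascript
// Sketch of white-box view extraction: scan HTML5 source text for
// elements marked with a (hypothetical) "view" class and collect their
// ids. The class-before-id attribute order is an assumed convention.
function extractViewIds(html) {
  const ids = [];
  const re = /<[^>]*class="[^"]*\bview\b[^"]*"[^>]*id="([^"]+)"/g;
  let match;
  while ((match = re.exec(html)) !== null) ids.push(match[1]);
  return ids;
}
```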

6.5 Stopping Criteria

Stopping criteria for smart TV apps can be an issue, especially for cloud-based applications. In an installed TV app, there is a finite number of views that the creeper can catch and the testing strategy can cover; when these coverage criteria are met, the testing strategy may stop. Hence, coverage can serve as a stopping criterion. However, in cloud-based apps, there could be an unbounded number of events appearing in real time from the cloud. For example, the YouTube smart TV app presents new views (i.e., videos) when scrolling down in the application. Practically, there could be a massive, effectively infinite number of views, and the number of views may also vary with each new start of the application.
One solution to this challenge is to define a finite number of iterations over which the creeper can iterate, or to limit the number of views to be covered before stopping.

6.6 Test Suite Ripper

When generating test cases, we expect some obsolete or invalid ones. For example, some views detected during the creeping process may not be valid, yet they may still appear in the test cases. To this end, a test ripper is needed to repair the invalid test cases. The test ripper may follow an algorithm to repair the test cases, for example, by defining several predefined patterns of invalid test cases or of transitions from one view to another.
Another repair process for the test cases could be specific to the remote control. For example, the color buttons on the remote control can be used for several functional and non-functional requirements depending on the application configuration.
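A minimal form of such a ripper is sketched below in JavaScript. It assumes, hypothetically, that each test step is a (view, key) pair and that invalid transitions detected during creeping are available as patterns; any test case containing one is dropped.

```javascript
// Sketch of a test ripper: discard generated test cases that contain a
// transition matching a predefined invalid pattern. The (view, key)
// step representation is an assumption for illustration.
function ripTests(tests, invalidPatterns) {
  const bad = new Set(invalidPatterns.map(([view, key]) => view + ">" + key));
  return tests.filter(steps =>
    steps.every(([view, key]) => !bad.has(view + ">" + key)));
}
```

A repairing (rather than discarding) ripper would instead truncate or reroute the offending step, which is the more interesting research problem.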

6.7 Test Runner

Once the creeper has detected the views and the test cases have been generated and repaired by the test generator and ripper, a test runner is needed to execute them. A test runner simply takes the test suite and runs the test cases one by one automatically. Here, the test runner strategy used in Android app testing could be followed for smart TV app testing; however, executing the test cases depends on the development kit.

6.8 Fault Taxonomy and Categorization

After running the test cases on the application, an important task is to identify the encountered faults and the test cases to which these faults relate. However, the faults of smart TV apps are not known yet, and classical mutation testing is not applicable here. For example, Deng et al. [14] recently identified different faults in Android apps within a mutation testing framework for mobile devices; however, those faults are Android-oriented and not applicable here. In addition, some of those faults relate to Activity faults, for example, changing the screen orientation, which is also not appropriate because the smart TV screen is too big to be frequently reoriented. Normally, classical mutation testing tools like MuDroid [15] or MuJava [16] are used for mobile, web, or desktop apps; as mentioned, those tools are platform-specific. An important effort in this direction was made by Cui et al. [9], who identified eight different types of faults in smart TV applications: TV system halt, TV system reboot, displaying a black screen, having sound but no images, playing images with delay, application exit by exception, playing images with a blurry screen, and a key with no response or the
wrong response. While this is an excellent effort toward fault categorization, there is a need to identify more faults related to the application itself; some of the identified faults may also relate to the TV device itself. There is also a need to identify a method for injecting these faults into the smart TV. A significant effort here would be a study defining a taxonomy of faults in smart TV apps. A useful input to such a study could come from the smart TV industry, especially companies that track and collect user feedback in the cloud. An analytical study categorizing the faults in such data would be an excellent contribution.

6.9 Defining Test Oracle

Defining the pass and fail criteria is a challenging task in any software testing process. Within test automation, the mechanism for determining whether a given test case passes or fails is called the test oracle, and the distinction between correct and incorrect behavior is called the "test oracle problem" [17]. A classical approach to the test oracle is the manual identification of pass and fail by the developer; however, for a significant number of test cases, this is inaccurate and impractical.
Automating test oracles in smart TV app testing is not an easy task, since we do not know precisely the nature and kinds of faults the application faces. In addition, the dynamic behavior of cloud-based smart TV applications may lead to random new views being loaded. In fact, this task is connected to the fault taxonomy and categorization discussed in Sect. 6.8: once we know the faults and can categorize them, we can define the test oracle for the automated testing framework.
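Once a fault taxonomy exists, a first-cut automated oracle can classify a run by scanning the emulator log for known symptoms. The sketch below uses symptom strings loosely derived from the categories of Cui et al. [9]; the log message formats are invented, since real emulator logs differ by platform.

```javascript
// Sketch of a log-based test oracle. FAULT_SYMPTOMS paraphrases the
// fault categories of Cui et al. [9]; the matching strings are
// hypothetical and would need adapting to a real emulator's log format.
const FAULT_SYMPTOMS = [
  "system halt", "system reboot", "black screen",
  "no image", "application exit", "no key response",
];
function verdict(logLines) {
  const hit = logLines.find(line =>
    FAULT_SYMPTOMS.some(s => line.toLowerCase().includes(s)));
  return hit === undefined ? { result: "pass" } : { result: "fail", symptom: hit };
}
```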

7 Towards an Automated Testing Strategy

Based on the problems and challenges presented so far, we can propose an automated framework for testing smart TV apps. This framework presents our vision of a strategy to automate the testing process. The framework works within the Tizen SDK, which includes a smart TV emulator; however, it is general and applicable to other SDKs that may emerge in the future. Figure 3 shows an overview of this framework and illustrates its essential components and their relationships.
The framework supports both white-box and black-box testing styles. The tester chooses between these two options depending on source code availability and the application type. As mentioned previously, even when the source code is available, if the application is cloud-based, the tester must treat this case as black-box testing. When the source code is available, the tester imports the project and lets the framework do the rest automatically: the creeper scans the source code and tries to identify the essential views in the UI.

Fig. 3. Smart TV App testing framework.

In the case of black-box testing or a cloud-based app, which is probably the most critical case, the creeper must use a special algorithm to creep over and detect all the views. Details of this algorithm are presented in the following section (Sect. 7.1). Here, the creeper uses the log messages from the TV emulator to validate the views.
In both the white-box and black-box approaches, the creeper detects the essential views and converts all the views and their relationships into a state machine graph model. This model is the input to the test generator, which consists of a model-based generation algorithm and a test ripper to repair the test cases. The repair is based on some predefined patterns of invalid test cases, and this process iterates as long as an invalid test case remains. The framework then executes the test cases through a test runner on the TV emulator, and an automated test oracle module validates them one by one. Finally, a test report is presented to the user.

7.1 Application Creeper

To detect all the necessary views in the application that need to be present in the model for test generation, we have developed an algorithm called EvoCreeper. Object detection in the UIs of mobile, desktop, and web apps is not new; there are algorithms called crawlers that crawl over the UI and detect these objects. None of those algorithms is useful here, since user interaction behavior in smart TV apps is entirely different. Besides, we think the name "creeper" suits what we want to do, as the word "crawler" carries a different meaning due to its use in web and search engine technologies. Algorithm 1 shows the steps of EvoCreeper.
If the focus point is not set by the app developer, EvoCreeper starts with an action from the tester choosing at least one view to start from; otherwise, it starts from the focused view. From this view, the creeper creeps over the UI evolutionarily and incrementally. The algorithm takes the four directions DUp, DDown, DLeft, DRight plus the OK button from each view. When a new view is discovered in some direction (i.e., newView = Active), the algorithm adds

Algorithm 1. EvoCreeper Steps

1 Input: v1 is the user-selected view
2 Output: list of views to be modeled Lv
3 Iteration It ← 1
4 Maximum iteration Itmax ← max
5 While ((It < Itmax) ∧ (newView ≠ null))
6   Use v1 as a start point
7   From v1 generate five possible directions DUp, DDown, DLeft, DRight, OK
8   For each direction
9     Navigate a step
10    Monitor emulator log for reaction
11    If newView = Active
12      add newView to Lv
13    End If
14    It++
15  End For
16 End While

it to the list of views to be modeled, Lv. The algorithm continues until no new views are discovered. As another stopping criterion, the algorithm takes a preset number of iterations to avoid an endless discovery loop in some special cases of cloud-based apps. In the following section (Sect. 7.2), we present an example as a graphical proof of concept for this algorithm.
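Algorithm 1 can also be sketched as executable JavaScript. The app is abstracted here as a step(view, direction) callback that returns the view reached or null, standing in for the emulator-log monitoring of the real framework; the iteration cap plays the role of Itmax for cloud-based apps. This is an illustrative reconstruction, not the authors' implementation.

```javascript
// Executable sketch of Algorithm 1 (EvoCreeper). step(view, direction)
// abstracts one remote-control navigation plus emulator-log monitoring;
// it returns the view reached, or null if the move leads nowhere.
const DIRECTIONS = ["Up", "Down", "Left", "Right"];

function evoCreeper(startView, step, maxIterations) {
  const views = [startView];      // Lv: list of views to be modeled
  let frontier = [startView];     // views discovered in the last round
  let iteration = 0;
  while (frontier.length > 0 && iteration < maxIterations) {
    const discovered = [];
    for (const view of frontier) {
      for (const dir of DIRECTIONS) {
        const reached = step(view, dir);   // navigate one step
        if (reached !== null && !views.includes(reached)) {
          views.push(reached);             // newView is active: record it
          discovered.push(reached);
        }
      }
    }
    frontier = discovered;        // creep on from the newly found views
    iteration++;                  // cap avoids endless cloud-fed loops
  }
  return views;
}
```

On a 12-view grid activity with the start view in a corner, the first round discovers the right and down neighbors and later rounds creep outward from each newly found view, in the same style as the worked example in Sect. 7.2.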

7.2 Proof of Concept

In this section, we present a proof of concept for the application creeper of Algorithm 1. We consider a cloud-based app as a pilot example, as it is the most difficult scenario. As shown in Fig. 4, each activity window has 12 views, and as the user shifts down or right, new activities may appear. We consider three iterations of the algorithm and assume that the tester chooses v1 as a start point. In fact, v1 is the worst-case choice, and we observed that choosing a view in the middle of the window may lead to fewer iterations and better recognition of the views. From v1, the algorithm considers the four main directions DUp, DDown, DLeft, DRight plus the OK button. However, here we consider only the four directions, because the OK button may open a new window in the app.
For each direction, the creeper algorithm checks for new events, which are most likely new views. In the first iteration, starting from v1, the up and left directions Du and Dl do not lead to new views, while the right direction Dr leads to v2 and the down direction Dd leads to v5. In the next iteration, the creeper starts from the newly discovered views, here v2 and v5. From v2, the new views v3 and v6 are identified by the creeper algorithm; v1 is also discovered in the Dl direction but is neglected by the creeper, as it is already in the view list. From v5, the views v1, v9, and v6 lie in the three directions Du, Dd, and Dr respectively; however, only v9 is considered a new view.
254 B. S. Ahmed and M. Bures

Fig. 4. Proof of concept of the EvoCreeper.

The third iteration also starts from the newly discovered views, v3, v6, and v9. In the same way, considering the four directions from each view and filtering out all repeated views, four new views are identified: v4, v7, v10, and v13.
The EvoCreeper algorithm works in an iterative evolutionary style to discover new views and events in the application under test. As mentioned, this pilot example considers a cloud-based app, so there is no expectation of a finite number of views in the application. To this end, our proposed stopping criteria could be useful here: the creeper algorithm will continue for a preset number of iterations or until no new views are discovered.
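
The discovery loop described above is essentially a breadth-first traversal of the view graph with an iteration cap. A minimal Python sketch follows (our own illustration, not the paper's implementation; the `neighbors` helper is a hypothetical stand-in for navigating one remote-control step in each direction and reading the emulator log for the newly active view):

```python
from collections import deque

def creep(start_view, neighbors, max_iterations=100):
    """Iteratively discover views, mirroring the EvoCreeper loop."""
    discovered = [start_view]       # L_v: the list of views to be modeled
    frontier = deque([start_view])  # views whose directions are unexplored
    iterations = 0
    # Stop when no new views remain or the preset iteration cap is reached.
    while frontier and iterations < max_iterations:
        view = frontier.popleft()
        for new_view in neighbors(view):  # D_Up, D_Down, D_Left, D_Right
            if new_view not in discovered:
                discovered.append(new_view)
                frontier.append(new_view)
        iterations += 1
    return discovered
```

For a window like the one in Fig. 4, modeled as a 4-by-3 grid of views, this traversal discovers all 12 views even when started from the worst-case corner view.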

8 Functional and Non-functional Testing Opportunities in Smart TV Applications

For testing functional or non-functional requirements in smart TV apps, we need a measure. This measure can be used in the test generation process as a coverage criterion and also in the design of the test oracle. While this is straightforward for functional requirements, converting a non-functional requirement into an exact measure is a tricky task. Here, an approximation could be useful.
Many problems could be addressed here. For example, addressing the minimum hardware requirements for a specific smart TV application would be an interesting idea to investigate. Most smart TV devices on the market today rely on CPUs with low computational power and limited memory. Extra hardware may be used to measure the energy consumption of the CPU during the testing process.
Covering event interactions at different levels is also an interesting functional testing opportunity. Here, full, partial, or systematic coverage of the events is a decision that must be made by the tester. A comparison of these three coverage criteria is also an important study topic, to determine which approach is better for fault finding.
Testing of Smart TV Applications 255

The limitation in memory and CPU leads to another interesting non-functional requirement that may also be used in the testing process: the execution time. It would be interesting to know the situations and sequences in a smart TV application that cause long or short execution times. This could also be useful for identifying and detecting security vulnerabilities. In fact, security is an essential issue in smart TV applications that has never been addressed before.
Probably the most essential non-functional requirement that must be addressed in smart TV applications is usability. Because interaction happens through the remote control, usability testing is necessary. In fact, the remote control remains the main constraint on the usability of smart TV applications. At this early research stage, it is useful to address how to make the applications more usable and which factors affect their usability. It is true that a user-oriented testing technique could be more realistic here; however, an automated testing method could support the final usability testing report.

9 Conclusion and Future Work


In this paper, we have presented the key ingredients, challenges, and some proposed solutions for smart TV app testing. We think that in the near future, smart TV apps will be an essential piece of software in the whole context of IoT services. Despite this importance, we could not find a systematic and robust testing strategy for smart TV apps in the literature. After an extensive study of these applications, we discovered many open problems and challenges, which we have illustrated in this paper. We found that the most crucial problem to be solved is the test generation strategy. In this paper, we proposed a fully automated framework to test smart TV apps. In addition, we have also illustrated our EvoCreeper algorithm, which creeps the views available in the application window. The algorithm uses an iterative evolutionary style to discover new views. The output of the algorithm will be the input to the test generation strategy that generates the necessary test cases for the automated testing framework.
Depending on the testing process, there are many opportunities for smart TV app testing. For example, security, usability, scalability, and robustness testing are essential issues that have not been addressed in the literature. Our proposed framework is also useful for these non-functional properties by simply altering the test oracle and test generator components. As part of our future work, we are planning to present a more comprehensive strategy with testing results of different smart TV apps.

Acknowledgment. This research is conducted as a part of the project TACR TH02010296 Quality Assurance System for Internet of Things Technology.

Dynamic Evolution of Simulated
Autonomous Cars in the Open World
Through Tactics

Joe R. Sylnice and Germán H. Alférez

School of Engineering and Technology, Universidad de Montemorelos,
Apartado 16-5, Montemorelos, N.L. 67500, Mexico
1140134@alumno.um.edu.mx, harveyalferez@um.edu.mx

Abstract. There is an increasing level of interest in self-driving cars. In fact, it is predicted that fully autonomous cars will roam the streets by 2020. For an autonomous car to drive by itself, it needs to learn. A safe and economic way to teach a self-driving car to drive by itself is through simulation. However, current car simulators are based on closed-world assumptions, where all possible events are already known at design time. Nevertheless, during the training of a self-driving car, it is impossible to account for all the possible events in the open world, where several unknown events may arise (i.e., events that were not considered at design time). Instead of carrying out particular adaptations for known context events in the closed world, the system architecture should evolve to safely reach a new state in the open world. In this research work, our contribution is to extend a car simulator trained by means of machine learning to evolve at runtime with tactics when the simulation faces unknown context events.

Keywords: Autonomous car · Tactics · Dynamic evolution · Open world · Machine learning

1 Introduction

A human driver learns by practicing how to drive and how to detect problems in the car and on the road. It is basically the same in the case of autonomous cars: they learn how to drive from historical data.
However, a self-driving vehicle is really expensive to build and maintain. In fact, there are reports that NVIDIA is selling its self-driving processing unit for about $15,000 [1], which is really expensive considering that this is the price of the processing unit alone. Also, it is dangerous and careless to unleash a self-driving car without proper training and testing. Simulations to prove new approaches in autonomous cars could be used to solve the aforementioned problems in the academic world, and especially in developing countries with limited financial resources.
In the closed world, all the possible context events are known beforehand (i.e., at design time or during training under a machine-learning approach). However, in the open world, unknown context events can arise (e.g., a sudden malfunction in one of the car sensors). These kinds of events have to be handled efficiently in order to prevent problems for the driver and passengers. Moreover, although there are open-source simulators, these simulators do not manage uncertainty in the open world.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 257–268, 2019.
https://doi.org/10.1007/978-3-030-02686-8_21
In this research work, our goal is to extend the applicability of machine learning by means of tactics to carry out the dynamic evolution of simulated autonomous cars in the open world. Tactics are last-resort surviving actions to be used when the simulated car does not have predefined adaptation actions to deal with arising problematic context events in the open world [2]. In order to apply tactics in the open world, the source code of a car video game was modified. First, the car was trained with the following supervised learning algorithms: K-Nearest Neighbors, Logistic Regression, Support Vector Machines, and Decision Trees. Then, unknown context events were injected at runtime to evaluate how the car faces those events with tactics.
This paper is organized as follows. Section 2 presents the justification for this research work. Section 3 presents the underpinnings of our approach. Section 4 discusses related work. Section 5 presents the results, followed by the conclusions and future work.

2 Justification

The research field of self-driving cars is a hot topic nowadays. However, the technology behind a self-driving car relies heavily on state-of-the-art software and really expensive hardware. That is why simulation tools are being increasingly used in the field: they provide the mechanisms to test and evaluate the system of a self-driving car without having to buy (or even damage) really expensive hardware [3].
Predefined adaptation actions for known context events in the closed world are not enough in the open world, where several unknown context events can arise. Despite the recognized need for handling unexpected events in self-adapting systems (SAS) [4], the dynamic evolution of SAS in the open world is still an open and challenging research topic.
In order to visualize the impact of unknown context events in the open world, let us imagine a self-driving car that has been trained with machine learning. The training was carried out with datasets composed of known historical data (e.g., data related to sonar and LiDAR sensors). In other words, the training was applied in the closed world. However, at runtime several unknown events may arise in the open world. For instance, although the sensors are highly calibrated and thoroughly revised, it is possible that a sensor starts recording inaccurate data (e.g., because of a broken sonar sensor). This is a dangerous situation, because inaccurate data could lead to an accident. If the car was not trained to face these kinds of situations, then the following question arises: what will the car do? In order to answer this question, in addition to applying machine learning to train self-driving cars, it is necessary to have mechanisms that lead the car to make the best decision despite unknown context events.

3 Underpinnings of Our Approach


Our approach is based on the following concepts (Fig. 1).

Fig. 1. Underpinnings of our approach.

3.1 Machine Learning

Machine learning can be defined as computational methods that use experience to improve performance or to make accurate predictions. Experience refers to past data that is used by the learner. The quality and size of the data are very important for the accuracy of the predictions made by the learner [5].

3.2 Tactics

Tactics are last-resort surviving actions to be used when a system does not have predefined adaptation actions to deal with arising problematic context events in the open world [2]. The use of tactics is common in sports, in war, and even in daily matters to accomplish an end. For example, the most important goal during a battle is to win. However, unknown or unforeseen events, such as surprise assaults, may arise. These events may negatively affect the expected goal. Therefore, it is necessary to choose among a set of tactics to reach the goal (e.g., to escape vs. to launch a frontal attack). Tactics are predefined at design time and are used at runtime to trigger the dynamic evolution of the self-driving car. The tactics are required to be known beforehand in order for the self-driving car to face uncertainty. However, these tactics are not associated with any specific reconfiguration actions (as dynamic adaptation is) [6].
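
The distinction between predefined adaptations and last-resort tactics can be sketched in a few lines of Python (the event names and actions below are invented for illustration; they are not part of the paper's implementation):

```python
# Known context events map to predefined adaptation actions (closed world).
ADAPTATIONS = {
    "rain_detected": "enable_wipers",
    "low_fuel": "route_to_station",
}

# Last-resort tactics, predefined at design time but not tied to any
# specific reconfiguration action for a particular event.
TACTICS = ["decelerate", "pull_over"]

def react(event):
    """Adapt if the event is known; otherwise fall back to a tactic."""
    if event in ADAPTATIONS:
        return ADAPTATIONS[event]   # dynamic adaptation
    return TACTICS[0]               # dynamic evolution through a tactic
```

The point of the sketch is that the tactic branch fires for any event outside the closed-world map, without requiring a per-event reconfiguration rule.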

3.3 Dynamic Evolution

A self-driving car has to go from dynamic adaptation in the closed world to dynamic evolution in the open world in order to respond to unforeseen ongoing events. Dynamic adaptation refers to punctual changes made to face particular events by activating and deactivating system features based on the current context. Dynamic evolution, in contrast, is not just about applying punctual adaptations to concrete events; it is the gradual growth of the system to a better state depending on the current context events [2].

3.4 Open World

The open world can be described as a context where events are unpredictable, requiring software to react to these events by adapting and organizing its behavior by itself [7]. As far as we know, current simulated autonomous cars are based on the closed-world assumption, where the relationship between the car and its surroundings is known and unchanging. Nevertheless, in the open world, where the aforementioned relationship is unknown, unpredictable, and constantly changing, the simulated car has to be able to evolve.

4 Related Work

A fully autonomous car, or self-driving vehicle, is a car designed to do all the work of maneuvering without the passenger ever having to, or being expected to, take control at any given moment [8]. A self-driving vehicle has to be able to identify faults in its system. If the faults are critical, the vehicle has to either fix these faults or isolate them so that the system is not compromised [9].
Self-driving vehicles are equipped with state-of-the-art sensors and cameras. They also use powerful software behind the hardware to maneuver themselves. The software learns how to drive through machine learning and sees through computer vision.
There are several self-driving cars in development. For example, the Google Car is being developed by Google, which hopes to have self-driving cars on the road by 2020; however, the company does not intend to become a car manufacturer. Uber also entered the world of self-driving cars in April 2015. In addition, Tesla expects to launch a fully autonomous car sometime in 2018. Also, in April 2015, BMW partnered with Baidu, the "Chinese Google," to develop self-driving technology.
There are several research works that propose simulations of autonomous
cars. For instance, in [10] the authors propose a shader-based sensor to simulate
the LiDAR and Radar sensors instead of the common method of ray tracing.
They mention that sensor simulations are very important in the field of self-
driving cars. In this way, the sensors can be evaluated, tested and optimized.
The authors state that ray tracing is an intensive task for the CPU. It is not
problematic when the number of simulated rays and detected objects are small.
However, in reality it becomes problematic or even impossible. According to the
authors, a shader-based sensor simulation is an efficient alternative to ray casting
because it uses parallelism in the GPU and this helps in sparing CPU resources
that the software can use in other areas.
In [11], the authors mention that they have used a simulation tool called Scene Suite to generate simulated scenes of traffic scenarios. The tool allows 2.5D simulations and uses patented virtual sensor models. The goal of this work is to show how the data from real-world sensor models can be extracted and how the results can then be simulated using scene-based pattern recognition. This paper also introduced an approach for learning sensor models, with a manageable demand on computational power, based on a statistical analysis of measurement data clustered into scene primitives.
In [12], the authors focus on the agent-based simulation framework MATSim and how it can be applied to the field of self-driving cars. Agent-based simulations are state-of-the-art transport models; agent-based approaches combine activity-based demand generation and dynamic traffic assignment. MATSim is an activity-based multi-agent transport simulation. It is an open-source framework written in Java under the GNU license. MATSim's strength is its modular design around a core, allowing new users to customize it without much effort. This work is based on the simulation of autonomous vehicles in a realistic environment at a large scale, with individual travelers (vehicles) that adapt their movement dynamically with the others.
In [13], the author uses an open-source simulator to evaluate and apply a reinforcement learning approach to the problem of controlling the steering of a vehicle. Reinforcement learning (RL) is an area of machine learning in which an agent is placed in a certain environment and is required to learn how to take proper actions without having any previous knowledge about the environment itself. If the agent's behavior is right, it is rewarded; if the behavior is wrong, the agent is punished. This learning system is called trial and error. In order to evaluate this approach, The Open Racing Car Simulator (TORCS) was used. In the TORCS environment, a car is referred to as a robot.
In [3], the authors use an integrated architecture comprising both a traffic simulator and a robotics simulator in order to contribute to self-driving car simulation. Specifically, the proposed approach uses the traffic simulator SUMO and the robotics simulator USARSim. These tools are open source and have good community support. On the one hand, SUMO is a microscopic road traffic simulator written in C++. It was designed by the Institute of Transportation Systems at the German Aerospace Center to handle large road networks. On the other hand, USARSim is an open-source robotics simulator written in UnrealScript, which is the language of the Unreal game engine. It has high-quality sensor simulation and physics rendering. The authors modified the SUMO and USARSim simulators in order to implement the architecture for the self-driving car simulation. The result is a simulator in which a self-driving vehicle can be deployed in a realistic traffic flow.
In [14], the authors describe the global architecture of the simulation/prototyping tool named Virtual Intelligent Vehicle Urban Simulator (VIVUS), developed by the SeT Laboratory. The VIVUS simulator simulates vehicles and sensors. It also takes into account the physical properties of the simulated vehicle while prototyping artificial intelligence algorithms such as platooning solutions and obstacle avoidance devices. The goal of VIVUS is therefore to overcome the general drawbacks of classical solutions by providing the possibility of designing a virtual vehicle prototype with simulated embedded sensors.
In [15], the authors combine a traffic simulator and a driving simulator into an integrated framework. They used the driving simulator SCANeR, developed by Renault and Oktal, and the AIsum traffic simulator, developed by TSS-Transport Simulation Systems. The framework enables a driver to use the simulator with a local traffic situation managed by a nano traffic model that is realistic for the driver and that also provides a realistic global traffic situation in terms of flow and density. The framework can provide information on the simulated vehicles and the traffic situation for the short-range sensors (camera and radar) as well as the long-range sensors (wireless and embedded navigation). It also enables the driver and other systems to be involved in an extensive assortment of traffic situations: accidents, rerouting, road-work zones, and so on.

5 Results

5.1 Methodology

This project was broken down into the following steps:

Looking for an Open Source Car Simulator: To find an open source car simulator, Google Search was used with the term "open source car simulator" in December 2017. The following is the list of the open source car simulators found:
– TORCS1 : TORCS is a multi-platform car racing simulation. It is used as an
ordinary car racing game, as an artificial intelligence (AI) racing game, and
as a research platform.
– Apollo2 : Apollo is an open-source autonomous driving platform created by
Baidu. It has a high performance and flexible architecture that supports fully
autonomous driving capabilities and also has car simulation functionalities.
– Udacity’s Self-Driving Car Simulator3 : This simulator was built for
Udacity’s Self-Driving Car nanodegree to teach students how to train cars
and how to navigate road courses using deep learning.

Comparing Different Open Source Car Simulators: The criteria for choosing the car simulator were the following: (1) it had to be open source, to find the points at which it could be extended; (2) it had to be mature enough in terms of documentation; (3) it had to be supported by the developer community; and (4) it had to be easily extensible in terms of programming. The results of the comparison are as follows:
1. TORCS meets three of the four criteria. Although it is open source, mature, well known in the scientific world, and greatly supported by the developer community, it misses the fourth criterion because it is not easily extensible in terms of programming.
1 http://torcs.sourceforge.net/index.php?name=Sections&op=viewarticle&artid=1.
2 https://github.com/ApolloAuto/apollo.
3 https://github.com/udacity/self-driving-car-sim.

2. Apollo is a fully fledged open autonomous driving platform that meets two of our criteria: it is open source and mature. However, it is a fully autonomous driving platform, much more complex than a simulator. Also, since it was released only a couple of months prior to our search, it does not yet have wide developer community support, and the documentation, written in Chinese, has not yet been translated.
3. Udacity's self-driving car simulator falls short when it comes to documentation. As a result, although it is open source software, the lack of free documentation makes it difficult to extend the code.
According to the evaluation, none of these simulators fulfilled our needs. Therefore, instead of searching for open source autonomous car simulators, we looked for an open source car game that could be trained by means of machine learning and extended for usage in the open world.
We found an open source car game named Lapmaster4. It is a simple car game built with the pygame Python library. It consists of a car running around a circuit for a certain number of laps; the player is also able to shift gears. The goal of the game is to complete the laps as fast as possible. Fig. 2 shows a screenshot of this game.

Fig. 2. Screenshot of the Lapmaster game.

4 http://pygame.org/project-Lap+Master-2923-4798.html.

Extending the Car Simulator: In this step, the Lapmaster car simulator was extended for the open world. Specifically, two steps were carried out: (1) collecting data from the context of the car for training; and (2) training the simulated car with machine learning. These steps are described as follows.
1. Collecting data from the context of the car: The source code of the car game was modified to collect the position (x and y coordinates) and the direction (0 - forward, 1 - right, and 2 - left) of the car in every frame. Listing 1.1 shows the modified lines of the car's source code. On line 1, a while loop indicates that the code is executed while the car simulator is running. On line 2, the program detects the key that is pressed. On line 3, if the car is moving, the program checks whether the "d" (right) or "a" (left) key is pressed; the direction d is set to 1 for right, 2 for left, and 0 if neither key is pressed. On line 12, the x and y coordinates and the direction are stored in the l_data list. On lines 13-14, if the l_data list is not empty, it is passed to the Writer function together with the path of the log in which the contextual data is to be written. Listing 1.2 presents the Writer function, which writes the data in the comma-separated values (CSV) format. The CSV file contains 4,149 instances. This number of instances was obtained by running the game four times. The x and y coordinates were taken as the features for training, and the direction as the class.
1  while running:
2      key = pygame.key.get_pressed()
3      if red.gear > 0:
4          if key[K_d]:
5              red.view = (red.view + 2) % 360
6              d = 1
7          elif key[K_a]:
8              red.view = (red.view + 358) % 360
9              d = 2
10         else:
11             d = 0
12         l_data = [red.xc, red.yc, d]
13         if l_data:
14             data.Writer(l_data, path)

Listing 1.1. A fragment of the modified code of Lapmaster's source file.

import csv

def Writer(data, path):
    with open(path, "a") as c_file:
        write = csv.writer(c_file, delimiter=',')
        write.writerow(data)

Listing 1.2. Implemented function for data writing.

2. Training the simulated car: For the training of the simulated car, four supervised machine learning algorithms from the scikit-learn5 Python library were employed. The algorithms are the following [16]:
(a) K-Nearest Neighbors (KNN): It is a simple algorithm that stores all available cases and classifies new cases by a majority vote of the k nearest neighbors.
5 http://scikit-learn.org/stable/#.

(b) Logistic Regression (LR): It is a classification algorithm used to estimate discrete values based on a given set of independent variables. It predicts the probability of occurrence of an event by fitting data to a logit function.
(c) Support Vector Machine (SVM): In this classification algorithm, each data point is plotted in an n-dimensional space (n being the number of features), where the value of each feature is the value of a particular coordinate. Then a line called the separating hyperplane (or decision boundary) splits the data points into two or more groups. The further the data points are from the decision boundary, the more confident the algorithm is about the prediction. The data points closest to the separating hyperplane are known as support vectors.
(d) Decision Trees (DT): In this classification algorithm, the data is split into two or more homogeneous sets based on the most significant attributes that make the sets distinct.
The following are the steps used to train the simulated car: (1) a user ran the game to generate a dataset; (2) the KNN, LR, SVM, and DT algorithms were executed to obtain a classification for each class, the classes being 0 for forward, 1 for right, and 2 for left; (3) the models were evaluated in terms of cross validation; and (4) the simulated car was extended to use the most accurate classifier.
A fragment of the script that generates the classification models from the collected data is presented in Listing 1.3. The first line declares a list containing the information of the four classifiers used in the experiments. Next, a for loop is used to iterate over this list in order to train and generate a model for each algorithm. Line 9 specifies the location and the name of the model that is going to be trained. Line 11 indicates that values are going to be taken randomly from the dataset. On lines 12 and 13, the program splits the data into training and test sets. On lines 14-16, a classification model is created and its cross-validation score is evaluated. On line 17, the accuracy of each algorithm is computed. On lines 18-19, each model is evaluated and a classification report is generated. Finally, on line 21, the model generated by each algorithm is saved.
1  classifiers = [
2      ('kNN', KNeighborsClassifier(n_neighbors=4)),
3      ('LR', LogisticRegression()),
4      ('SVM', SVC()),
5      ('DT', DecisionTreeClassifier())
6  ]
7
8  for name, clf in classifiers:
9      filename = 'models/%s_%s.pickle' % (name, data.filename)
10     print('training: %s' % name)
11     rs = np.random.RandomState(42)
12     X_train, X_test, y_train, y_test = \
13         train_test_split(X, y, test_size=0.2, random_state=rs)
14     model = clf.fit(X_train, y_train)
15     cv = cross_val_score(clf, X_test, y_test, cv=10,
16                          scoring='accuracy')
17     acc = np.mean(cv)
18     predictions = clf.predict(X_test)
19     report = classification_report(y_test, predictions)
20     print('training %s done... acc = %f' % (name, acc))
21     pickle.dump(model, open(filename, 'wb'))
22     bm.append('%s %s' % (name, report))

Listing 1.3. A fragment of code to train and generate classification models.



Injecting Dynamic Evolution Through Tactics: In this step, we emulated a malfunctioning sonar sensor. This situation can cause accidents, since the car will not be able to properly "see" its environment (e.g., other cars). To trigger this event, a button on the keyboard was pressed. When the car system recognizes that an unknown context event has arisen, the "decelerate tactic" is triggered. This tactic progressively slows down the car until it reaches a full stop. The reasoning behind this tactic is to prevent the car from continuing without properly detecting its surroundings. The implemented tactic is shown in Listing 1.4: when the "s" key is pressed on the keyboard, the slow variable is set to True to indicate that the car has to reduce its speed until it fully stops.
1  slow = False
2
3  key = pygame.key.get_pressed()
4  if key[K_s]:
5      slow = True
6  if slow:
7      red.speed = .95 * red.speed - .05 * (2.5 * red.gear)

Listing 1.4. A fragment of the source code for the decelerate tactic.
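To see why this update rule necessarily brings the car to rest, the tactic can be simulated outside the game. The `Car` class, the fixed gear, and the clamping at zero below are assumptions made for this sketch; the real system updates `red.speed` once per frame inside the game loop on its own car object:

```python
# Minimal stand-alone sketch of the decelerate tactic's update rule.
# "Car" is a hypothetical stand-in for the game's car object.

class Car:
    def __init__(self, speed, gear):
        self.speed = speed
        self.gear = gear

def decelerate_until_stop(car):
    """Apply the tactic's update once per simulated frame until a full stop."""
    frames = 0
    while car.speed > 0:
        # Same rule as Listing 1.4, clamped at zero (an assumption: the
        # game itself presumably never lets speed go negative).
        car.speed = max(0.0, 0.95 * car.speed - 0.05 * (2.5 * car.gear))
        frames += 1
    return frames

car = Car(speed=120.0, gear=3)
frames = decelerate_until_stop(car)
print(frames, car.speed)
```

Because each frame multiplies the speed by 0.95 and subtracts a positive gear-dependent term, the speed decays below zero in a finite number of frames, so the clamped loop always terminates at a full stop.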

5.2 Outcomes

The accuracies of the models generated with the four algorithms are as follows:
kNN = 0.9313, LR = 0.8927, SVM = 0.8927, DT = 0.929. Table 1 shows the
cross-validation results of each model generated with the four classifiers. Also, in
Table 1, only two classes are shown: 0 for forward and 1 for right, because
the circuit in the Lapmaster game only has right turns. Although the kNN
algorithm has the best accuracy, the DT algorithm has better results in terms
of precision, recall, and F1-score. These three terms are defined as
follows [17]:

– Precision is the ability of the classifier not to label as positive a sample
  that is negative.
– Recall is the ability of the classifier to find all the positive samples.
– F1-score is a weighted harmonic mean of the precision and recall.
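These per-class metrics can be computed directly with scikit-learn, in the same way the classification reports above were produced. The labels below are invented for illustration only (0 = forward, 1 = right) and are not the study's data:

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels, illustrative only: 0 = forward, 1 = right.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

# Per-class precision, recall, F1 (harmonic mean of the two), and support.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1])
print(prec, rec, f1, support)
```

For class 0 here, 4 of the 5 samples predicted as 0 are truly 0 (precision 0.80) and 4 of the 5 true 0 samples are found (recall 0.80); `classification_report` prints the same numbers in the tabular form shown in Table 1.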

5.3 Discussion

We published a video6 in which the “decelerate tactic” is effectively triggered at
runtime. Although machine learning works well in the closed world, i.e., where
there are no unknown events (e.g. malfunctioning sensors), in the open world it
is necessary to have additional mechanisms to face uncertainty. Therefore,
we argue that autonomous cars that are trained by means of machine learning
need to be extended with highly general tactics that try to defend the car in
extreme conditions of uncertainty.

6 www.harveyalferez.com/autonomous-car-demo.html.
Dynamic Evolution of Simulated Autonomous Cars in the Open World 267

Table 1. Report for each of the algorithm models.

Algorithm   Class       Precision   Recall   F1-score
kNN         0           0.95        0.99     0.97
            1           0.83        0.56     0.67
            Avg/Total   0.94        0.94     0.94
LR          0           0.89        1.00     0.94
            1           0.00        0.00     0.00
            Avg/Total   0.80        0.89     0.84
SVM         0           0.90        1.00     0.95
            1           1.00        0.03     0.07
            Avg/Total   0.91        0.90     0.85
DT          0           0.97        0.98     0.97
            1           0.82        0.71     0.76
            Avg/Total   0.95        0.95     0.95

6 Conclusions and Future Work

This research work extended the applicability of machine learning by means of
tactics to carry out the dynamic evolution of a simulated self-driving car in the
open world. To this end, four classifiers were executed and four models were
open world. To this end, four classifiers were executed and four models were
generated and evaluated. The DT model was used in the simulated car after
evaluation. Then, a tactic to face a simulated unknown context event in the
open world was implemented. This tactic was used to prevent a situation in
which the life of the passengers could be put in jeopardy.
Since this research work was limited to the implementation and application
of one tactic, as future work we would like to propose additional tactics. For
example, tactics related to non-functional requirements, such as availability and
performance, could be used to keep or improve service levels. Also, these tactics
could be handled during execution by means of models at runtime as proposed
in our previous work [2]. Moreover, we plan to test our approach in other tracks
in which complex unknown context events could arise.

References
1. Frederic, L.: All new Teslas are equipped with NVIDIA’s new drive PX 2 AI
platform for self-driving. https://goo.gl/xNSo8B
2. Alférez, G.H., Pelechano, V.: Achieving autonomic web service compositions with
models at runtime. Comput. Electr. Eng. 63, 332–352 (2017)
3. Pereira, J.L., Rossetti, R.J.: An integrated architecture for autonomous vehicles
simulation. In: Proceedings of the 27th Annual ACM Symposium on Applied Com-
puting, pp. 286–292. ACM (2012)
4. Cheng, B.H., De Lemos, R., Giese, H., Inverardi, P., Magee, J., Andersson,
J., Becker, B., Bencomo, N., Brun, Y., Cukic, B., et al.: Software engineering for
self-adaptive systems: a research roadmap. Software engineering for self-adaptive
systems, pp. 1–26. Springer, Heidelberg (2009)
5. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of machine learning.
MIT Press (2012)
6. Alférez, G.H., Pelechano, V.: Facing uncertainty in web service compositions. In:
2013 IEEE 20th International Conference on Web Services (ICWS), pp. 219–226.
IEEE (2013)
7. Baresi, L., Di Nitto, E., Ghezzi, C.: Toward open-world software: issues and chal-
lenges. Computer 39(10), 36–43 (2006)
8. Coles, C.: Automated vehicles: a guide for planners and policymakers (2016)
9. Maurer, M., Gerdes, J.C., Lenz, B., Winner, H.: Autonomous driving: technical,
legal and social aspects. Springer, Heidelberg (2016)
10. Wang, S., Heinrich, S., Wang, M., Rojas, R.: Shader-based sensor simulation for
autonomous car testing. In: 2012 15th International IEEE Conference on Intelligent
Transportation Systems, pp. 224–229. IEEE (2012)
11. Simon, C., Ludwig, T., Kruse, M.: Extracting sensor models from a scene based
simulation. In: 2016 IEEE International Conference on Multisensor Fusion and
Integration for Intelligent Systems (MFI), pp. 259–264. IEEE (2016)
12. Boesch, P.M., Ciari, F.: Agent-based simulation of autonomous cars. In: 2015
American Control Conference (ACC), pp. 2588–2592. IEEE (2015)
13. Piovan, A.G.: A neural network for automatic vehicles guidance. ACE 10, 2 (2012)
14. Gechter, F., Contet, J.-M., Galland, S., Lamotte, O., Koukam, A.: Virtual intel-
ligent vehicle urban simulator: application to vehicle platoon evaluation. Simul.
Modell. Pract. Theory 24, 103–114 (2012)
15. That, T.N., Casas, J.: An integrated framework combining a traffic simulator and
a driving simulator. Procedia-Soc. Behav. Sci. 20, 648–655 (2011)
16. Harrington, P.: Machine Learning in Action. Manning Publications (2012)
17. Scikit-Learn: sklearn.metrics.precision_recall_fscore_support. https://goo.gl/4xxkGJ
Exploring the Quantified Experience: Finding
Spaces for People and Their Voices in Smarter,
More Responsive Cities

H. Patricia McKenna

AmbientEase and the UrbanitiesLab, Victoria, BC V8V 4Y9, Canada
mckennaph@gmail.com

Abstract. The objective of this paper is to explore the quantified experience in
the context of finding spaces for people and their voices in smarter and more
responsive cities. Using the construct of awareness, this exploration is situated
theoretically at the intersection of affective computing, social computing, and
pervasive computing. This paper problematizes the quantified experience in
human computer interactions (HCI), arguing for smart and responsive cities to be
enabled by more aware people interacting with and influencing aware technologies.
Aware people and aware technologies refer to the dynamic interweaving of
sensing, sensors, and sensor networks through the Internet of Things (IoT), the
Internet of People (IoP), and the Internet of Experiences. The methodology for
this paper includes an exploratory case study approach and the research design
incorporates multiple methods of data collection including survey and interviews.
Findings from this work highlight the need for qualitative data using content
analysis and other analytic techniques to augment, complement, and enhance the
quantitative data being generated and gathered in urban spaces. This work is
significant in that it: (a) explores elements of the contemporary urban quantified
experience through the lens of awareness and the sub-constructs of adaptability
and openness; (b) advances a framework for people-aware quantified experiences
in support of spaces for people and their voices in smarter, more responsive cities;
and (c) further develops and innovates the research and practice literature for
smart and responsive cities, in relation to people-aware quantified experiences.

Keywords: Affective computing · Awareness · Human Computer Interactions (HCI) ·
Internet of Experiences · Internet of Things (IoT) · Internet of People (IoP) ·
Pervasive computing · Quantified experience · Responsive cities ·
Sensing and sensor networks · Smart cities · Social computing

1 Introduction

The main objective of this paper is to explore the quantified experience in the context
of finding spaces for people and their voices in smarter and more responsive cities. This
work problematizes the quantified experience in human computer interactions (HCI),
arguing for smart and responsive cities to be enabled by more aware people interacting

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 269–282, 2019.
https://doi.org/10.1007/978-3-030-02686-8_22

with and influencing aware technologies. Aware people and aware technologies refer to
the dynamic interweaving of sensing, sensors, and sensor networks through the Internet
of Things (IoT), the Internet of People (IoP), and the Internet of Experiences. Using the
construct of awareness to explore the quantified experience, this work is situated
theoretically at the intersection of affective computing, social computing, and pervasive
computing. Methodologically, an exploratory case study approach is used in this work
and the research design incorporates multiple methods of data collection including
survey and interviews. Additional details about the methodology are provided in
Sect. 3 of this paper. Briefly, data were gathered from diverse individuals across multiple
small to medium to large sized cities in several countries. Content analysis was used in
the analysis of qualitative data and descriptive statistics in the analysis of quantitative
data. A literature review was conducted for the Internet of Things, People, and
Experiences and the complementing of quantified experiences in the context of smart and
responsive cities. The literature review enabled formulation of a theoretical perspective
for this work. This work is significant in that it: (a) explores elements of the
contemporary urban quantified experience through the lens of awareness and the sub-constructs
of adaptability and openness; (b) advances a framework for people-aware quantified
experiences in support of spaces for people and their voices in smarter and more
responsive cities; and (c) further develops and innovates the research and practice
literature for smart and responsive cities, in relation to people-aware quantified experiences.
In the context of smart cities, future cities, and rapid urbanization globally, the need
for a new urban agenda is advanced by the UN [1] that is, among other things,
“people-centered and measurable”. Konomi and Roussos [2] observe a movement beyond the
earlier conception of smart cities that emerged over the last decade “towards a deeper
level of symbiosis among smart citizens, Internet of Things and ambient spaces”.
Goldsmith and Crawford [3] advance the notion of responsive cities, leveraging digital
technologies and data analytics in combination with civic engagement and governance. In
relation to the digital and aware technologies of sensing, sensors, and the Internet of
Things (IoT), Hotho et al. [4] define a sensor, using the Oxford English Dictionary, as “a
device which detects or measures a physical property and records, indicates, or otherwise
responds to it”. Hotho et al. [4] extend this definition to encompass “technological
sensors as well as human sensors” and sensing that “relates to the psychosocial
environment” as in “sensing danger”, as well as enabling “a higher level of integration and
interpretation of different external and internal signals”. Friberg [5] combines the notion
of atmosphere and aesthetic education to propose an approach to the exploration of
performing everyday practices in relation to an awareness of the sensorial and bodily in
urban spaces. As such, the multi-sensorial capabilities of people described by Lévy [6]
from a human geography perspective emerge as awareness, an important form of
sensing.
This introduction and background give rise to the main research question under
exploration in this work, using the construct of awareness and the sub-constructs of
adaptability and openness.
Q1: How and why do people figure strongly in the making of more aware, adaptive, and open
analytic spaces to complement existing approaches to quantified experience in contemporary
urban environments?

In summary, the primary purpose of this paper is to explore, innovate, and extend
spaces for theoretical and practical debate for quantified experiences in ways that involve
people more directly, knowingly, and creatively. What follows is the development of a
theoretical perspective for this work in the formulation of a conceptual framework for
more people-aware quantified experiences. The framework will then be operationalized
for use in this work using quantitative data complemented with qualitative data. The
methodology for this work is described and the findings are presented along with an
analysis and discussion. The limitations and mitigations of the work are discussed and
future directions are identified, followed by the conclusion.

2 Theoretical Perspective

A review of the research literature was conducted for smart and responsive cities; the
Internet of Things, the Internet of People, and the Internet of Experiences; and
opportunities for complementing the quantified experience. This theoretical perspective forms
the basis for the formulation of a conceptual framework for more people-aware
quantified experiences.

2.1 Smart and Responsive Cities

Townsend [7] describes smart cities as “places where information technology is
combined with infrastructure, architecture, everyday objects and even our bodies, to
address social, economic, and environmental problems”. Kyriazopoulou [8] provides a
literature review of architectures and requirements for the development of smart cities,
highlighting the sectors identified by Giffinger et al. [9] of smart economy, people,
governance, mobility, environment, and living as the focus for improvement. According
to Kyriazopoulou [8], “offering citizens a great experience” is a primary goal of smart
cities. Gil-Garcia et al. [10] identify 14 dimensions in conceptualizing smartness in
government such as citizen engagement, openness, creativity, technology savvy, and
resilience, to name a few. According to Gil-Garcia et al. [10], citizen engagement “allows
two-way communication and enables collaboration and participation, fostering stronger
and more intelligent relationships” while resilience contributes to the ability to “adapt
to change”. Khatoun and Zeadally [11] provide a smart city model consisting of the
Internet of Things (IoT), the Internet of Services (IoS), the Internet of Data (IoD), and
the Internet of People (IoP) where the IoP highlights smart living and smart people.

2.2 Internet of Things, People, and Experiences

Herzberg [12] describes the Internet of Things (IoT) as “a network that enables physical
objects to collect and exchange data” while describing the Internet of Everything as “a
future wherein devices, appliances, people, and process are connected via the global
Internet”. Vilarinho et al. [13] describe the use of activity feeds in social computing as
a unified communication mechanism for connecting the IoT with the IoP. Li [14]
maintains that the IoP “refers to digital connectivity of people through the Internet
infrastructure forming a network of collective intelligence and stimulating interactive
communication among people”. An infrastructure is proposed by Miranda et al. [15] in
support of “moving from the Internet of Things to the Internet of People” where
“smartphones play a central role, reflecting their current use as the main interface connecting
people to the Internet”. According to Miranda et al. [15], key principles of the IoP include:
social, personalized, proactive, and predictable. Indeed, Miranda et al. [15] employ the
IoP concept to draw “the IoT closer to people, for them to easily integrate into it and
fully exploit its benefits.” Conti et al. [16] argue for “a radically new Internet paradigm”
in the form of “the Internet of People (IoP)” in which people move beyond “end users
of applications” to “become active elements of the Internet.” McKenna [17] explored
the experience of contemporary city environments through urban edges, surfaces,
spaces, and the in-between in an effort to “complement, extend, and enrich algorithmic
and network views.” Wellsandt et al. [18] describe the Internet of Experiences (IoE) in
terms of an experience-centered approach “to complement human-centered innovation
with experiences from artificial systems.”

2.3 Complementing Quantified Experiences


The United Nations [1] notes that, “urban space is being reimagined” while Casini [19]
calls for smart city initiatives to move beyond a focus on “individual areas” toward a
more “integrated approach” taking advantage of “new enabling infrastructures” in
combination with sensor technologies. In this way, cities are encouraged to build upon
existing structures in “exploiting synergies and interoperability between systems to
deliver added value services for citizens to improve their quality of life” [19]. Falcon
and Hamamoto [20] claim that the mass amounts of data being generated in everyday
life “through the Web” and “on city streets” are opening the way for “bodies of data
together with algorithms” that “will shape who we think we are” and “who we will
become.” As mentioned earlier, Gil-Garcia et al. [10] identify creativity and openness
as two of 14 key drivers for conceptualizing smartness in government. It is worth noting
that, according to Amabile [21], a component of creativity is the open-endedness or
heuristic dimension as distinct from “having a single, obvious solution (purely
algorithmic).” And Dourish [22] points out that, “our experience of algorithms can change
as infrastructure changes.”
McKenna et al. [23] explored the potential for the assessment of creativity through
an adaptation of the Consensual Assessment Technique (CAT) for use in technology-pervasive
learning environments. Using a social radio application as an example of a
social media space, McKenna et al. [23] explored environments “characterized by
awareness, autonomy, collaboration, and real time data analytics potential.” McKenna
and Chauncey [24] introduced the CAT into library, information, and learning spaces,
proposing the technique be adapted to accommodate the assessment of creativity,
innovation, and value in everyday, in-the-moment activities. As such, the CAT was explored
[24] in terms of involving people more directly and knowingly in new partnering and
collaborative opportunities in relation to data and learning analytics. By extension, this
current work proposes the consideration of similar techniques for more meaningfully
and directly involving people in the analysis and assessment of quantified experiences
in the context of smarter and more responsive cities. Indeed, Baumer [25] proposes a
human-centered algorithm design (HCAD) to address gaps or disconnects between
algorithm metrics focused on performance on the one hand and concerns with
incorporating “human and social interpretations” on the other. In making algorithmic design
more people centered, Baumer [25] identifies three approaches focused on the
theoretical, speculative, and participatory. McKenna [26] explores “the three key enrichment
mechanisms of awareness, creativity, and serendipity in the context of the IoT and the
IoP” pointing to “the potential for a shift to occur” possibly opening new spaces “for
the combining of algorithmic and heuristic activities” and the evolving of
“algorithmic/heuristic relationships in smart cities.”

2.4 Conceptualizing People-Aware Quantified Experiences


This theoretical background enables formulation of a conceptual framework for more
people-aware quantified experiences. As depicted in Fig. 1, the people-technologies-cities
dynamic in public spaces utilizes a combination of the Internet of Things (IoT),
the Internet of People (IoP), and the Internet of Experiences (IoE). This combination
brings together aware people and aware technologies, in the form of responsive,
engaging, and evolving mechanisms and approaches, contributing to greater awareness,
adaptability, and openness for fostering future technology spaces with potential for
developing and accommodating people-aware quantified experiences.

Fig. 1. Conceptual framework for people-aware quantified experiences.

The research question (Q1) identified in Sect. 1 of this work is reformulated as a
proposition for exploration in this paper, as follows:
P1: People and their multi-sensorial capabilities, in combination with aware technologies, enable
the enhancing of sensing, sensors, and the Internet of Things, People, and Experiences,
contributing to greater awareness, adaptability, and openness in support of greater potentials for more
creative and people-aware analytic spaces to complement existing approaches to quantified
experience in contemporary urban environments.

3 Methodology

An emergent, exploratory case study approach was used for this work, said to be
particularly appropriate for the study of contemporary phenomena [27]. Contemporary urban
environments constituted the case for this study. Sections 3.1–3.3 describe the process
followed for this study, the sources of evidence, and the data analysis techniques used.

3.1 Process

A website was used to describe the study, invite participation, and enable sign up.
Demographic data were gathered during registration for the study including location,
age range, and gender. People were able to self-identify in one or more categories (e.g.,
educator, learner, community member, city official, business, etc.). Registrants were
invited to complete a survey containing 20 questions as an opportunity to think about
smart cities in relation to awareness, adaptability, and openness for improved livability.
In-depth interviews with participants enabled discussion of urban experiences and ideas
about smart cities. A pre-tested survey instrument was used for this study as well as a
pre-tested interview protocol.

3.2 Sources of Evidence

This study attracted international interest with participants located mostly in small to
medium to large sized cities in Canada (e.g., St. John’s, Ottawa, Greater Victoria),
extending also to other countries such as Israel (e.g., Tel Aviv). Survey responses
provided the main source of quantitative data for this study while interview data provided
qualitative evidence for this study along with data provided in response to open-ended
survey questions. Three questions common to both the survey instrument and interview
protocol were adapted from Anderson’s [28] body insight scale (BIS), as a mechanism
for exploring the human-centered sensing of cities as a form of awareness. By contrast,
other scales such as that by Teixeira et al. [29] pertain to human sensing using computing
technologies for the detection of elements such as presence, count, location, track, and
identity. More appropriate for this study, the BIS was designed for “assessing
subtle human qualities”; this body insight scale [28], formerly the body intelligence
scale [28], consists of three subscales: energy body awareness (E-BAS), comfort body
awareness (C-BAS), and inner body awareness (I-BAS). Anderson encourages use of
the scale in other domains and as such, the BIS is explored in this work in relation to
people and their experience of everyday urban environments. Also of note is the
importance of feeling and affect in human computer interactions, where emotion is considered
to be “a critical element of design for human experience” [30], applicable here in the
context of smart and responsive cities. The three questions adapted for use in this work
correspond to each of the BIS sub-scales and are slightly altered in terms of wording,
as follows:

1. Regarding your body awareness in your city, would you agree that your body lets
you know when your environment is safe (On a scale of 1 to 5 on a continuum of
disagree to agree)?
2. Regarding your comfort body awareness in the world, would you agree that you
feel comfortable in the world most of the time (On a scale of 1 to 5)?
3. Regarding your inner body awareness in your city, would you agree that you can
feel your body tighten up when you are angry (On a scale of 1 to 5)?
In parallel with this study, evidence was also gathered through individual and group
discussions with people from diverse sectors across multiple cities (e.g., Toronto,
Vancouver, and Greater Victoria). Perspectives across the city emerged from those in
business (architectural design, ecology, energy, information technology (IT), tourism),
government (city councilors, policy makers, IT staff), educators (secondary and post-
secondary, researchers, IT staff), students (post-secondary – engineering/design/
computing/education/media), and community members (IT professionals, urban
engagement leaders, urban designers, and policy influencers).

3.3 Data Analysis


Qualitative data were analyzed using the content analysis technique involving inductive
analysis to identify emerging terms from the data collected while deductive analysis
enabled the identification of terms emerging from the review of the research literature.
Data were then analyzed for patterns and emergent insights. Descriptive statistics were
used in the analysis of quantitative data. Qualitative evidence gathered from discussions
in parallel with this study supported further analysis, comparison, and triangulation of
data, contributing further insight and rigor.
Overall, data were analyzed for an n = 61 spanning the age ranges of people in their
20s to their 70s, consisting of 39% females and 61% males.
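The descriptive statistics reported in the findings take the form of percentage distributions over scale positions (as in Tables 1 and 2). A minimal sketch of that computation is shown below; the `scale_distribution` helper and the sample responses are invented for illustration and are not the study's data:

```python
from collections import Counter

def scale_distribution(responses, scale_max=5):
    """Percentage of responses at each position of a Likert-type scale."""
    counts = Counter(responses)
    n = len(responses)
    return {pos: round(100 * counts.get(pos, 0) / n)
            for pos in range(1, scale_max + 1)}

# Invented example: three responses on a 5-point disagree/agree scale.
responses = [4, 4, 5]
print(scale_distribution(responses))  # {1: 0, 2: 0, 3: 0, 4: 67, 5: 33}
```

Passing `scale_max=7` accommodates the extended 7-point scale adopted in the 2016 to 2018 phase of the study.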

4 Findings

The findings of this paper are presented in terms of the main construct of awareness,
with attention given to the sub-constructs of adaptability and openness, in relation to
the proposition explored in this work and in response to the research question.

4.1 Awareness

Regarding technology awareness, City IT staff described the IoT as “more about the
instrumentation of things, with everything connected and communicated”. A community
member in St. John’s observed that “we’re not smart about how we use the technology”.
A student noted the pervasive sharing of “very traditional things” and events in daily
lives where people are “all videoing them, sharing them constantly in social media,”
described as “a seamless behavior” contributing to a “seamless interrelationship” of the
“local and global” generating “concurrent awareness.”

Based on questions adapted for the city in this study from the body insight scale
(BIS), an emerging example of a people-aware quantified experience is presented in
Table 1. During the 2015 to 2016 phase of this study, an abbreviated version of
Anderson's 5-point scale was used to assess urban awareness in relation to the energy body
and feeling safe; the comfort body; and the inner body and feelings of tightness when
angry. Responses from individuals show feelings of safety at the upper end of the scale
with 67% at position 4 and 33% at position 5. Feelings of comfort in the world tend
toward the high end of the scale with 67% at position 5 and 33% at the neutral position
of 3. Feelings of tightness related to anger are spread equally at 33% across the neutral
position of 3 and the upper end of the scale at positions 4 and 5.

Table 1. Awareness in the city – body insight scale (2015/2016)


Awareness                         1     2     3     4     5
Energy body: feeling safe                     	    67%   33%
Comfort body: in the world                    33%         67%
Inner body: tightens when angry               33%   33%   33%

In discussions with respondents about the BIS questions, it was suggested that the
term “world” contributed to confusion when assessing levels of comfort in a particular
city. Based on this user experience, it was suggested that the phrase “the world” be
replaced with “your city.” The 5-point scale was also found to be too restrictive, and it
was suggested that the scale be extended from 5 to 7 points.

Table 2. Awareness in the city – body insight scale (2016/2018)


Awareness                         1     2     3     4     5     6     7
Energy body: feeling safe               33%                           67%
Comfort body                                  67%                     33%
Inner body: tightens when angry                     33%   33%   33%

Guided by feedback from respondents in 2015 to 2016, wording and scale
adaptations were pre-tested and approved for use in this study from 2016 going forward. This
enriched and emerging example of a people-aware quantified experience is presented in
Table 2. Survey responses from individuals show that feelings of safety continue to
emerge at the upper end of the scale in position 7 (67%), with people indicating that their
body lets them know when their environment is safe. However, 33% responded at the
much lower end of the scale at position 2. During interviews it was possible to discuss
the scale rating choices to learn more about underlying factors. Open-ended survey
responses also provided additional insight. For example, in the case of those residing
outside the city or urban area, the response rate drops sharply toward the lower end of
the scale (33%) for feelings of safety during experiences of visiting the city. Regarding
comfort levels in the city, responses varied from the high end at one extreme at position
7 (33%) to an increased concentration appearing at the much lower position of 3 on the
scale (67%). Where urban comfort levels tended toward the higher end of the scale in
cities in 2015 to 2016, comfort levels shifted noticeably toward the lower end of the
scale in 2016 to 2018. In part, comfort was influenced by urban design elements,
such as the placement of benches. Feelings of tension in the city, such as anger, appearing
in Table 1 (33% at the 3, 4, and 5 positions) seem to remain relatively consistent with
those emerging in Table 2, tending toward the mid to higher positions of the scale with
33% at the 4, 5, and 6 positions. During interviews it was reported that feelings of
tenseness and anger depended upon the city where, in a smaller scale city, the inability to
find a parking spot may contribute to anger, while in a much larger urban center such
as London, being tense “would be normal,” pointing to “a difference in how you carry
yourself” depending on the city.

4.2 Adaptability

Mechanisms and approaches to accommodate new forms of adaptability in urban
interactions emerged in a variety of ways. For example, an educator in Vancouver described
the importance of people coming together in the city where “the meeting becomes the
technology that changes everything.” A building designer noted that, “people want to
be able to interact and really be in an overall environment” calling for changes in urban
design. A community organizer in Victoria observed how City Council members “go
where the citizens are” when there is “an opportunity for public engagement.” In the
case of wanting “to reengage with our bylaws about growing food on city land,” Council
members and/or city staff will attend “city events” rather than “just posting something
on their website” as “a really effective way to engage the community.” From a creativity
perspective, a community leader articulated the need to figure out how to “move away
from sector driven strategies to ones that” feature “clusters” so as to “bring industries
and sectors together rather than that sort of silo” approach. Cross-sector initiatives were
identified related to "connected cities," while recognizing the potential for, and importance of, funding for smart cities.

4.3 Openness

City IT staff commented that “fundamentally there is a desire to be very, very open with
the available data” as public data. It was noted that “the other element we’re trying to
share is even just the processes of City Hall” using the example of permit applications.
A locally developed mobile app was described by an educator in terms of the capability
of being “able to open this kind of feedback” potential to anyone in the city as a way
“to transform contributions both in terms of unique ideas and patterns into the design of
some urban space or buildings” as in “smart infrastructure.” A building designer
described the focus on creating a “whole urban space” enabling a coming together of
people so as "to make it feel like it's not this closed-in community." The designer
suggested the potential for “having buildings or alleyways” serve as “more than just that
intended use” so as to become multi-use and multi-purpose spaces. A community leader
suggested that, “one of the challenges that the building community faces in doing these
things is financial.” Reference was made to the importance of planning for “an open
innovation event” designed to be “more engaging” inviting proposals to “pilot ideas” to
address urban challenges going forward. Regarding social media and openness, a student
questioned the veracity of information provided to platforms, pointing to the frequent contributing of "made up" details in an effort to maintain some degree of privacy.
Explored quantitatively, as illustrated in Table 3, when asked to assess the extent to
which openness is associated with smart cities on a 7-point scale (1 – Not at all, 2 – Not
sure, 3 – Maybe, 4 – Neutral, 5 – Sort of, 6 – Sure, 7 – Absolutely) the majority of
responses emerge toward the upper end of the scale with 33% at positions 6 and 7 along
with a 33% response at the neutral position of 4.

Table 3. Openness and smart cities – assessments

Smart cities  1  2  3  4    5  6    7
Openness               33%     33%  33%

Exploring quantitatively the potentials for attuning, sharing, and trust, people were
asked to assess these elements in relation to city-focused social media and other aware
technologies on a scale of 1 to 7 (not at all to absolutely). As illustrated in Table 4,
assessments of attuning to urban spaces tended toward the upper end of the scale with
33% at the 6 position and 67% at position 7. Again, sharing is strong with 67% at the
upper end of the scale in position 7 and 33% in position 6. Trust emerges toward the
upper end of the scale with 67% of responses at the 5 position and 33% at 7.

Table 4. Attuning, sharing, and trust – assessments

Smart cities  1  2  3  4  5    6    7
Attuning                       33%  67%
Sharing                        33%  67%
Trust                    67%        33%

A summary of findings is presented in Table 5 in terms of the three constructs of awareness, adaptability, and openness in relation to the technologies of the Internet of
Things (IoT), the Internet of People (IoP), and the Internet of Experiences (IoE). IoT
technologies emerge in relation to awareness as instrumented, as meeting spaces for
adaptability, and as mobile apps for openness. IoP technologies highlight awareness in
relation to seamless behaviour, as clusters for adaptability, and as piloting ideas across
diverse sectors for openness. IoE technologies contribute to multi-dimensional awareness, connected cities for adaptability, and to calls for attention to the veracity of data
in social media and other online platforms in relation to openness and associated
concerns with privacy in urban spaces.

Table 5. Summary of findings

Tech  Awareness          Adaptability      Openness
IoT   Instrumented       Meeting spaces    Mobile app
IoP   Seamless behavior  Clusters          Piloting ideas
IoE   Multi-dimensional  Connected cities  Veracity/privacy

5 Discussion

Awareness-based findings suggest an instrumented, technology perspective from information technology professionals balanced by community member voices highlighting
the importance of being "smart about how we use the technology." The seamless intermingling of the IoT-IoP-IoE emerges in the observations of a student articulating the
“concurrent awareness” of the local and the global. The nature of pervasive sharing
described in the findings enriches the quantitative details provided in Table 4 for
attuning and sharing. Trust level assessments in Table 4, while relatively strong, suggest
an underlying tentativeness with 67% at position 5 and 33% at the upper end of the scale
at 7, when compared with responses for attuning and sharing. The multi-dimensionality
of the urban experience is highlighted through early-stage use of the body insight scale
(BIS) to explore feelings of safety, comfort, and tension levels more directly with people.
Early indications of factors influencing responses to use of the BIS pertain to city size,
urban design elements, familiarity with the city, and other emerging and evolving aspects
of cities and city regions that may include density (e.g., increasing urbanization over
time) and geographic location. Adaptability-related findings emphasize the importance
of figuring out effective ways to bring people together – meetings, clusters, technologies
– in support of more community focused approaches to engagement and governance for
connected cities. Openness-related findings pertained to the use of an urban app for more
inclusive use as smart infrastructure; the piloting of ideas in developing designs for
greater connection in multi-use urban spaces; and the veracity of social media and other
platform data in the face of underlying privacy concerns, shedding light on Table 3 and
quantitative assessments of openness, with implications for quantified experiences.

6 Future Directions

Findings from this work highlight the need for qualitative data to augment, complement,
and enhance the quantitative data being generated and gathered in urban spaces. Issues
related to the veracity of the large amounts of data that provide the basis for algorithmic activities give rise to concerns identified here with "made up" details and the resulting effect
on algorithmic accuracy. As such, this work points to new pathways for the involvement
of people more meaningfully and directly in the creation of spaces, both in theory and
practice, for interaction in algorithmic realms. Such spaces will contribute to the shaping
of debates, algorithmic designs, and new possibilities and potentials for more creative outcomes in innovating quantified experiences that are more people-aware.

7 Challenges, Limitations, and Mitigations

Limitations of this work related to small sample size are mitigated by in-depth and rich
detail from a wide range of individuals across small to medium to large urban centers.
Challenges related to geographic location are mitigated by the potential to extend this
work to other cities, including megacities and regions exceeding 10 million people. The
challenge of studying emergent, dynamic, and evolving understandings of smart cities
through awareness, adaptability, and openness is mitigated by opportunities to explore the making of openings and spaces for innovative opportunities going forward for
quantified experiences. While only a limited number of possible body insight scale (BIS)
questions were adapted for exploration in this work, opportunities exist for further validation of these questions for use in urban environments going forward and for the inclusion of additional questions.

8 Conclusion

This paper provides an exploration of the evolving area of aware people and aware
technologies in relation to quantified experiences in smart cities. Key contributions of
this work include: (a) the use of awareness, adaptability, and openness in relation to the
Internet of Things (IoT), the Internet of People (IoP), and the Internet of Experiences
(IoE) as aspects of smart cities, in exploring the potential for innovating quantified
experiences; (b) formulation of a conceptual framework for people-aware quantified
experiences; (c) early-stage exploration of adaptations to the body insight scale (BIS)
for use in the study of quantified experiences in contemporary urban environments; and
(d) further development of the smart cities research and practice literature in relation to
innovations for quantified experiences. A major takeaway from this work is the critical
importance of aware people in combination with aware technologies in fostering new
potentials for the making of innovative spaces to accommodate people more meaningfully and directly in the algorithmic realm in smart cities. This work will be of interest
to technology developers, researchers, research think tanks, urban practitioners,
community members, and anyone concerned with more creative and innovative quantified experience initiatives for future tech, smarter cities, and more responsive cities.

References

1. Habitat, U.N.: Urbanization and Development: Emerging Futures—World Cities Report 2016. UN Habitat, Nairobi (2016)
2. Konomi, S., Roussos, G.: Enriching Urban Spaces with Ambient Computing, the Internet of
Things, and Smart City Design. IGI Global, Hershey (2017)
3. Goldsmith, S., Crawford, S.: The Responsive City: Engaging Communities Through Data-
Smart Governance. Jossey-Bass, San Francisco (2014)
4. Hotho, A., Stumme, G., Theunis, J.: Introduction: new ICT-mediated sensing opportunities.
In: Loreto, V., Haklay, M., Hotho, A., Servedio, V.D.P., Stumme, G., Theunis, J., Tria, F.
(eds.) Participatory Sensing, Opinions and Collective Awareness, pp. 3–8. Springer, Cham
(2017)
5. Friberg, C.: Performing everyday practices: atmosphere and aesthetic education. Ambiances
Int. J. Sens. Environ. Archit. Space Var. 464, 1–12 (2014)
6. Lévy, J. (ed.): The City: Critical Essays in Human Geography. Contemporary Foundations
of Space and Place Series. Routledge, London (2016)
7. Townsend, A.M.: Smart Cities: Big Data, Civic Hackers and the Quest for a New Utopia.
WW Norton, New York (2013)

8. Kyriazopoulou, C.: Architectures and requirements for the development of smart cities: a
literature study. In: Elfhert, M., et al. (eds.) Smartgreens 2015 and Vehits 2015, CCIS 579,
pp. 75–103. Springer, Cham (2015)
9. Giffinger, R., Fertner, C., Kramar, H., Kalasek, R., Pichler-Milanovic, N., Meijers, E.: Smart
Cities: Ranking of European Medium-Sized Cities. University of Technology, Vienna (2007)
10. Gil-Garcia, J.R., Puron-Cid, G., Zhang, J.: Conceptualizing smartness in government: an
integrative and multi-dimensional view. Gov. Inf. Q. 33(3), 524–534 (2016)
11. Khatoun, R., Zeadally, S.: Smart cities: concepts, architectures, research opportunities.
Commun. ACM 59(8), 46–57 (2016)
12. Herzberg, C.: Smart Cities, Digital Nations: How Digital Urban Infrastructure can Deliver a
Better Life in Tomorrow’s Crowded World. Roundtree Press, Petaluma (2017)
13. Vilarinho, T., Farshchian, B.A., Floch, J., Mathisen, B.M.: A communication framework for
the Internet of People and Things based on the concept of activity feeds in social computing.
In: Proceedings of the 9th International Conference on Intelligent Environments, pp. 1–8
(2013)
14. Li, M.: Editorial: Internet of People. Concurr. Comput. Pract. Exp. 29, 1–3 (2017)
15. Miranda, J., Mäkitalo, N., Garcia-Alonso, J., Berrocal, J., Mikkonen, T., Canal, C., Murillo,
J.M.: From the Internet of Things to the Internet of People. IEEE Internet Comput. 19(2), 40–
47 (2015)
16. Conti, M., Passarella, A., Das, S.K.: The Internet of People (IoP): a new wave in pervasive
mobile computing. Pervasive Mob. Comput. 41, 1–27 (2017)
17. McKenna, H.P.: Edges, surfaces, and spaces of action in 21st century urban environments—
connectivities and awareness in the city. In: Kreps, D., Fletcher, G., Griffiths, M. (eds.)
Technology and Intimacy: Choice or Coercion, Advances in Information and Communication
Technology, vol. 474, pp. 328–343. Springer, Cham (2016)
18. Wellsandt, S., Wuest, T., Durugbo, C., Thoben, K.D.: The Internet of Experiences—towards
an experience-centred innovation approach. In: Emmanouilidis, C., Taisch, M., Kiritsis, D.
(eds.) Advances in Production Management Systems, Competitive Manufacturing for
Innovative Products and Services, APMS 2012. IFIP Advances in Information and
Communication Technology, vol. 397, pp. 669–676. Springer, Berlin (2013)
19. Casini, M.: Green technology for smart cities. In: IOP Conference Series: Earth and
Environmental Science, vol. 83, p. 012014, 2nd International Conference on Green Energy
Technology, pp. 1–10 (2017)
20. Falcon, R., Hamamoto, B.: Bodies of Data: Who are We Through the Eyes of Algorithms.
Future Now. Institute For The Future (IFTF), Palo Alto (2017)
21. Amabile, T.M.: Componential theory of creativity. In: Kessler, E.H. (ed.) Encyclopedia of
Management Theory. Sage, Los Angeles (2013)
22. Dourish, P.: Algorithms and their others: algorithmic culture in context. In: Big Data and
Society, pp. 1–11 (2016)
23. McKenna, H.P., Arnone, M.P., Kaarst-Brown, M.L., McKnight, L.W., Chauncey, S.A.:
Application of the consensual assessment technique in 21st century technology-pervasive
learning environments. In: Proceedings of the 6th International Conference of Education,
Research and Innovation (iCERi2013), pp. 6410–6419 (2013)
24. McKenna, H.P., Chauncey, S.A.: Exploring a creativity assessment technique for use in 21st
century learning, library, and instructional collaborations. In: Proceedings of the 8th
International Conference of Education, Research and Innovation (iCERi), pp. 5371–5380
(2015)
25. Baumer, E.P.S.: Toward Human-Centered Algorithm Design. In: Big Data & Society, pp. 1–
12 (2017)

26. McKenna, H.P.: Creativity and ambient urbanizing at the intersection of the Internet of Things
and People in smart cities. In: Universal Access in Human–Computer Interaction, Virtual,
Augmented, and Intelligent Environments. Lecture Notes in Computer Science, vol. 10908.
Springer, Cham (2018)
27. Yin, R.K.: Case Study Research and Applications: Design and Methods. Sage, Los Angeles
(2018)
28. Anderson, R.: Body Intelligence Scale: defining and measuring the intelligence of the body.
Hum. Psychol. 34(4), 357–367 (2006)
29. Teixiera, T., Dublon, G., Savvides, A.: A survey of human-sensing: methods for detecting
presence, count, location, track, and identify. ENALAB Technical Report 09-2010, vol. 1,
no. 1 (2010)
30. Hanington, B.: Design and emotional experience: introduction. In: Jeon, M. (ed.) Emotions
and Affect in Human Factors and Human–Computer Interaction, pp. 165–183. Elsevier,
London (2017)
Prediction of Traffic-Violation Using Data
Mining Techniques

Md Amiruzzaman

Kent State University, Kent, OH 44242, USA


mamiruzz@kent.edu

Abstract. This paper presents the prediction of traffic-violations using data mining techniques, more specifically, when a traffic-violation is most likely to happen. Also, the contributing factors that may cause greater damage (e.g., personal injury, property damage, etc.) are discussed in this paper. The national database for traffic-violations was considered for the mining, and the analyzed results indicated that a few specific times are most probable for traffic-violations. Moreover, most accidents happened on specific days and times. The findings of this work could help prevent some traffic-violations or reduce their chance of occurrence. These results can be used to increase caution and improve traffic-safety tips.

Keywords: Traffic · Prediction · Crime · Violations · Data mining

1 Introduction

According to [1], the approximate population of the US is 326,200,000, and there are 196,000,000 licensed drivers [2]. However, based on the data presented in [2],
on average 112,000 tickets are issued every day for different types of traffic-violations (mainly speeding). Altogether, approximately 41,000,000 tickets are issued every year (see Table 1). These statistics provide an overview of traffic-violations in the US, and there are a number of reasons that cause traffic-violations. As the number of vehicles increases every day, so does the chance of traffic-violations [3,4]. Often, traffic-violations lead to road accidents and injuries [3,7].
Chen et al. [3] classified different types of crime by law-enforcement level, for example, sex crime at law-enforcement level two, and theft (e.g., robbery, burglary, larceny, etc.) at law-enforcement level three. In their classification, traffic-violation is one of the common local crimes [3]. In general, bad weather, unskilled drivers, drunk drivers, and drivers who pay less attention while driving may cause traffic-violations, as well as road accidents. However, there may be other contributing factors that lead to traffic-violations and road accidents, for example, speeding, reckless driving, driving under the influence of drugs or alcohol, hit-and-run, road rage, etc. The research in [3] mainly focused on crimes and who commits them, rather than traffic-violations.
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 283–297, 2019.
https://doi.org/10.1007/978-3-030-02686-8_23

Table 1. Traffic-violation statistics

Driving citation statistics
Average number of people per day that receive a speeding ticket        112,000
Total annual number of people who receive speeding tickets             41,000,000
Total percentage of drivers that will get a speeding ticket this year  20.6%

Solomon et al. (2006) analyzed traffic-violation data to develop a traffic safety program [4]. Their research focused on identifying places where traffic-violations occurred and how to better monitor those places. Solomon et al. (2006) proposed using more cameras/surveillance to monitor the identified high traffic-violation places and using the surveillance footage to identify responsible parties [4]. This research [4] helped to improve traffic-safety programs.
In a separate study, Saran and Sreelekha (2015) found correlations between drunk driving, careless driving, driving over the speed limit, and road accidents [5]. However, these findings are not something new to the law-enforcement agencies and research communities. Moreover, [5] mainly focused on statistical analysis (i.e., correlation analysis) and surveillance. In their paper, Saran and Sreelekha [5] used an Artificial Neural Network (ANN) for vehicle detection. They also focused on Intelligent Transport Systems (ITS), which incorporate the latest computer technologies and computer vision [5]. Saran and Sreelekha (2015) indicated that the ANN is superior in classifying moving vehicles to the Support Vector Machine (SVM) and k-nearest neighbor (k-NN) algorithms. Note that SVM and k-NN are two of the most popular algorithms widely used in data mining.
Gupta, Mohammad, Syed and Halgamuge (2016) found a correlation between crime rates and accidents in the city of Denver, Colorado [6]. Note that traffic-violations may lead to violent crimes as well. For example, a drunk driver may cause property damage or injury to others. From their mining research, Gupta et al. (2016) were able to predict that most crimes are likely to occur in the months of January and February. These findings were helpful to the law-enforcement agencies (Gupta et al. 2016). The major drawback of the research in [6] is that the authors focused on only one specific city of one state. Analyzing a national database is necessary to understand how traffic-violations occur across the US.
Nath (2006) indicated that most criminals, along with other crimes, committed traffic-violations as well [7]. One of the interesting findings from Nath (2006) was the claim that 10% of criminals commit 50% of the crimes. Chen et al. (2004) mentioned that traffic-violations are a primary concern for city, county, and state level law-enforcement agencies. In [7], the authors mainly focused on where and how many Closed-Circuit Television (CCTV) cameras would be helpful to find responsible parties.
The purpose of this study is to predict traffic-violations based on previous incidents. The national database for traffic-violations is examined to determine the factors that contributed to previous traffic-violations and to develop the prediction. The times and days when most violations occur are determined through the mining as well.

The rest of this paper is organized as follows: Sect. 2 reviews the existing literature. Section 3 describes the method used in this study, and Sect. 4 summarizes the experimental results. Section 5 presents a discussion of the experimental results, and Sect. 6 concludes the paper with implications and future work.

2 Literature Review
Chen et al. (2004) studied different types of crime, such as traffic-violations, sex crime, theft, fraud, arson, gang/drug offenses, violent crime, and cybercrime [3]. They also classified these crime types into different law-enforcement levels (e.g., level one, level two, etc.). Chen et al. (2004) identified traffic-violations as a level one crime and one of the common local crimes [3]. They mentioned that speeding, reckless driving, causing property damage or personal injury in a collision, driving under the influence of drugs or alcohol, hit-and-run, and road rage are common reasons for traffic-violations [3]. According to Chen et al. (2004), traffic-violations are mostly considered a less harmful crime; however, sometimes this type of crime can cause severe bodily injury or property damage [3]. Even though Chen et al. [3] discussed traffic-violations and other crimes, their work did not actually focus on traffic-violation analysis. Rather, it focused on analysis of other types of crime and prediction of those crimes to help law-enforcement agencies.
Solomon, Nguyen, Liebowitz and Agresti (2006) demonstrated how to use data mining (DM) to evaluate cameras that monitor red-light signals at traffic intersections [4]. Based on their findings, they proposed some techniques to improve traffic safety programs. In their work, they used different modeling techniques, such as decision trees, neural networks, market-basket analysis, and k-means. Solomon et al. (2006) focused on identifying places where red-light-signal violations occurred and how to better monitor those places. The red-light violation is known as red light running (RLR), and according to the Federal Highway Administration (FHWA), approximately 1,000 Americans were killed and 176,000 were injured in 2003 because of RLR.
To describe the severity of RLR and its damage to the economy, Solomon et al. (2006) in [4] wrote, "The California Highway Patrol estimates that each RLR fatality costs the United States $2,600,000 and other RLR crashes cost between $2,000 and $183,000, depending on severity (California State Auditor, 2002)" (p. 621). As for the recommendation, they proposed using more cameras/surveillance to monitor the identified high traffic-violation places and using the surveillance footage to identify responsible parties. As for their data, they used traffic-violation data from the Washington, DC area; the data was collected between the years 2000 and 2003 (Solomon et al. 2006). In terms of findings, their work [4] helped law-enforcement agencies find responsible parties using red light cameras (RLCs). However, placing RLCs in the right places is not an easy task. Data mining techniques can be helpful for determining high-accident zones and placing RLCs in appropriate locations.
In a separate study [5], Saran and Sreelekha (2015) found correlations between drunk driving, careless driving, driving over the speed limit, and road accidents. However, these findings are not something new to the law-enforcement agencies or to the research communities [5]. Their work [5] was more of a classification than data mining. Videos obtained from closed-circuit television (CCTV) cameras placed along roadsides or driveways were used for surveillance. They used artificial neural networks (ANN) to detect different types of vehicles [5]. While detecting different types of vehicles is important and interesting work, the need for traffic-violation data mining remains unsolved. In their work [5], Saran and Sreelekha (2015) mainly focused on road safety and surveillance systems.
Gupta, Mohammad, Syed and Halgamuge (2016) found a correlation between crime rates and accidents in the city of Denver, Colorado. Note that traffic-violations may lead to violent crimes as well [6]. For example, a drunk driver may cause property damage or injury to others. To describe the phenomenon, they wrote in [6], "The major cause of road accidents is drink driving, over speed[ing], carelessness, and the violation of traffic rules" (p. 374). From their mining research, Gupta et al. (2016) were able to predict that most crimes are likely to occur in the months of January and February. These findings were helpful to the law-enforcement agencies (Gupta et al. 2016). They used data from the National Incident-Based Reporting System (NIBRS); the dataset contained 15 attributes and 372,392 instances [6]. While Gupta et al. (2016) presented interesting findings based on their data mining research, their work focused mainly on a specific city of a specific state. It is important that a research study focus on the entire US and try to generalize the findings mentioned in [6].
Nath (2006) in [7] indicated that most criminals, along with other crimes, committed traffic-violations as well. One of the interesting findings from Nath (2006) was the claim that 10% of criminals commit 50% of the crimes. Chen et al. (2004) mentioned that traffic-violations are a primary concern for city, county, and state level law-enforcement agencies. They also added that traffic-violations and other criminal activities may be related, and information obtained from traffic-violations can be further used to find criminals. They focused on getting contact information from the Department of Motor Vehicles (DMV).
This paper provides an overview of traffic-violation data mining as well as some interesting findings that can be helpful for maintaining caution and preventing unwanted traffic-violations. The proposed data mining predicts where and at what time of day the incidents (traffic-violations) will occur based on the national database, and also what combinations of factors contribute to traffic-violations.

3 Method
Several data mining algorithms were used to analyze the data, for example, Naïve Bayes, the J48 decision tree, Decision Table, and Support Vector Machine. Also, a few statistical analyses, such as linear regression, correlation analysis, and reliability analysis, were considered for the final data. Multiple tools were used to process and analyze the data. For example, SPSS (i.e., a statistics software package developed by IBM) tests helped to determine which attributes should be considered for data mining. Also, the WEKA1 (i.e., Waikato Environment for Knowledge Analysis) tool was used to run data mining algorithms [8] on the research dataset.

3.1 Data
The data was downloaded from the national database for public data2. The original database consists of 36 attributes. However, many attributes did not show any variation; for example, the accident attribute only had "No" as a value. Such attributes do not contribute to data analysis, so they were deleted before the final analysis. The database consisted of over one million records. Of course, some of the rows had missing or wrong values, which seemed to be due to human error. The database included demographic information, such as the gender of vehicle drivers, the place of incidents, driver state, driver city, etc.

3.2 Preprocessing

The initial task of preprocessing was to identify which attributes to keep and which to discard. Of course, the database included an overwhelming amount of data; however, for the data mining, only the most important and relevant attributes were considered for the final analysis. The preprocessing included deleting missing data, deleting irrelevant attributes, modifying records into a meaningful format, etc.

– SPSS tests helped to determine which attributes could be deleted or not included for data mining as well as the final analysis (see Table 2).
– Missing and repeating attributes were discarded, and wrong entries were discarded from the final selection for data analysis.
– The dataset was divided into a training set and a testing set. The training set consisted of 67% of the data, whereas the testing set consisted of 33% of the total number of records. The holdout method was used to determine the training and testing sets.
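The cleanup and holdout steps above can be sketched with pandas and scikit-learn (the paper itself used SPSS and WEKA; the toy data and column names below are hypothetical stand-ins for the real 36-attribute dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the national traffic-violation dataset.
df = pd.DataFrame({
    "accident": ["No"] * 6,              # no variation -> will be dropped
    "personal_injury": [1, 0, 0, 1, 0, 1],
    "property_damage": [0, 1, 0, 1, 1, 0],
})

# Discard attributes that show no variation (e.g., an always-"No" column).
df = df.loc[:, df.nunique(dropna=True) > 1]

# Discard rows with missing values (none in this toy data).
df = df.dropna()

# Holdout method: 67% training, 33% testing.
train, test = train_test_split(df, test_size=0.33, random_state=0)
```

With six rows, scikit-learn rounds the test set up to two records, leaving four for training.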

Initial Processing. After determining the training and testing sets and deciding which candidate attributes to keep, SPSS tests were executed again to determine which attributes should be deleted to further increase the accuracy of the results. Mainly, the "if item deleted" reliability statistics helped to determine which item should be deleted to increase the reliability value. For example, SPSS tests indicated that the time of the incident should be deleted to increase the reliability of the results.

1
https://www.cs.waikato.ac.nz/~ml/weka/downloading.html.
2
https://catalog.data.gov/dataset.

Table 2. Inter-item correlation matrix

                          Personal  Property  Alcohol  Contributed
                          injury    damage             to accident
Personal injury            1.000    −0.016     0.013    0.346
Property damage           −0.016     1.000     0.019    0.368
Alcohol                    0.013     0.019     1.000    0.014
Contributed to accident    0.346     0.368     0.014    1.000

Initial Results. Initial processing suggested that most traffic-violations happened in Maryland, more specifically in the Washington, DC, area. Also, after modifying the date of incident to weekdays (e.g., Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday), it was noticed that most traffic-violations happened on Tuesday and Wednesday (see Fig. 1). This is perhaps because people are more anxious mid-week (i.e., we call it the mid-week effect).
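The date-to-weekday conversion described above can be sketched with pandas; the column name `date_of_stop` is a hypothetical stand-in for the dataset's actual date attribute:

```python
import pandas as pd

# Hypothetical date column; the real attribute name in the dataset may differ.
df = pd.DataFrame({"date_of_stop": ["2016-05-10", "2016-05-11", "2016-05-15"]})

# Map each incident date to its weekday name; counting these per weekday
# yields the distribution plotted in Fig. 1.
df["weekday"] = pd.to_datetime(df["date_of_stop"]).dt.day_name()
print(df["weekday"].tolist())  # → ['Tuesday', 'Wednesday', 'Sunday']
```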

Fig. 1. Number of incidents by day. (x-axis: days, from Sunday on the left to Saturday on the right; y-axis: number of incidents.)

4 Results
4.1 SPSS
Correlation analysis helped to determine that property damage and alcohol were correlated (17%). Similarly, contributed to accident and property damage were correlated (37%), and contributed to accident and personal injury were correlated (34%). The correlation values were calculated using the following equation (see (1)):

$$ r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \qquad (1) $$

where
$r_{xy}$ is the correlation value between variables $x$ and $y$,
$\sum$ is the symbol for "sum up",
$x_i$ is an individual value of variable $x$, and
$\bar{x}$ is the mean of variable $x$.
Similarly, $y_i$ is an individual value of variable $y$, and
$\bar{y}$ is the mean of variable $y$.
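Equation (1) maps directly to code; a minimal NumPy sketch (the values are illustrative, not the study's data):

```python
import numpy as np

def pearson_r(x, y):
    """Correlation coefficient r_xy from Eq. (1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Perfectly linearly related variables give r = 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```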
In this analysis, linear regression was used to verify some of the predictions made by the WEKA software. The regression equation can be expressed as (see (2)):

$$ y_i = a + b x_i + c \qquad (2) $$
where,
Y is the dependent variable that the equation tries to predict,
X is the independent variable used to predict Y,
x_i ∈ X, with i = 1, 2, 3, ..., n,
y_i ∈ Y, with i = 1, 2, 3, ..., n,
a is the Y-intercept of the line,
b is the slope,
and c is the regression residual, which can be calculated as |ŷ_i − y_i|, where ŷ_i is the expected value of y.
The values of a and b are selected so that the sum of the squared regression residuals is minimized.
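The least-squares fit of Eq. (2) can be sketched as follows (again illustrative; the paper used SPSS, and the sample points below are hypothetical):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (Eq. (2))."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Slope minimizes the sum of squared residuals.
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)
    a = ybar - b * xbar
    residuals = [abs((a + b * x) - y) for x, y in zip(xs, ys)]  # |ŷ_i − y_i|
    return a, b, residuals

a, b, res = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # exact line y = 1 + 2x
```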
More detail about the regression equation and examples of regression can be found online3. The results obtained from the linear regression analysis are presented in Table 3.

Table 3. Linear regression analysis

Model  R      R²     Adjusted R²  Std. error of the estimate
1      0.404  0.163  0.163        0.125

Reliability values were calculated using the equation below (see (3)):

α = (N × c̄)/(v̄ + (N − 1) × c̄)   (3)
3 http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm.

where N is the number of items, c̄ is the average inter-item covariance, and v̄ is the average variance.
The reliability of the four attributes (i.e., personal injury, property damage, alcohol, and contributed to accident) was 0.435 (see Table 4).

Table 4. Reliability statistics

Cronbach's α  Cronbach's α based on standardized items  N of items
0.435         0.362                                     4
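Equation (3) can be computed directly from the item scores; a small sketch with hypothetical item vectors (not the paper's data):

```python
def cronbach_alpha(items):
    """items: list of N equal-length score lists. Implements Eq. (3)."""
    N = len(items)
    n = len(items[0])

    def mean(v):
        return sum(v) / len(v)

    def cov(u, v):
        mu, mv = mean(u), mean(v)
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

    variances = [cov(it, it) for it in items]
    covs = [cov(items[i], items[j])
            for i in range(N) for j in range(i + 1, N)]
    c_bar = sum(covs) / len(covs)   # average inter-item covariance
    v_bar = sum(variances) / N      # average variance
    return (N * c_bar) / (v_bar + (N - 1) * c_bar)

# Two identical items are perfectly consistent, so α = 1.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```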

4.2 Naïve Bayes

The Naïve Bayes classifier is one of the most popular classifiers in data mining. To describe the strength of Naïve Bayes, [9] wrote: "The naïve Bayes classifier computes the likelihood that a program is malicious given the features that are contained in the program. This method used both strings and byte-sequence data to compute a probability of a binary's maliciousness given its features" (p. 6). Results obtained from Naïve Bayes are presented in Table 5.

Table 5. Comparisons of different methods

Method name        Correctly       Incorrectly     Kappa      RMSE  Precision  Recall
                   classified (%)  classified (%)  statistic
J48 decision tree  97.67           2.32            0.24       0.14  0.98       0.99
Naïve Bayes        97.60           2.39            0.06       0.13  0.97       0.99
SVM                97.61           2.38            0.00       0.15  0.97       1.00
Decision table     97.64           2.35            0.24       0.13  0.98       0.99

The following mathematical definition helps to explain how the Naïve Bayes classifier works.

Let the dataset be d, the set of classes C = {c1, c2, ..., cn}, and the predicted class c ∈ C. The Naïve Bayes classification can be expressed as (see (4)):

P(c|d) = P(d|c)P(c)/P(d)   (4)
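A toy version of Eq. (4) for categorical features, assuming conditional independence as Naïve Bayes does; the feature values and class labels below are hypothetical, not the paper's dataset:

```python
from collections import Counter, defaultdict

def nb_train(rows, labels):
    """Fit class priors P(c) and per-feature likelihoods P(x_k | c)."""
    prior = Counter(labels)
    total = len(labels)
    like = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for k, v in enumerate(row):
            like[(c, k)][v] += 1

    def predict(row):
        best, best_p = None, -1.0
        for c in prior:
            p = prior[c] / total                      # P(c)
            for k, v in enumerate(row):
                # MLE estimate of P(x_k | c); no smoothing (a simplification),
                # so unseen feature values give probability 0.
                p *= like[(c, k)][v] / prior[c]
            if p > best_p:
                best, best_p = c, p
        return best
    return predict

# Hypothetical rows: (time band, alcohol involved) -> violation type.
rows = [("night", "yes"), ("night", "yes"), ("rush", "no"), ("rush", "no")]
labels = ["injury", "injury", "damage", "damage"]
predict = nb_train(rows, labels)
```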

Roughly 500,000 instances were analyzed using Naïve Bayes (WEKA could not return results for more than 0.5 million records); 67% of them were used as the training set and 33% as the testing set.
The confusion matrix helped to compute the accuracy of the classifying algorithms. The accuracy of a classifying algorithm can be defined as (see (5)):
Accuracy = (TP + TN)/(TP + FP + TN + FN)   (5)
here, T P = True Positive, T N = True Negative, F P = False Positive, and F N
= False Negative.
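Applying Eq. (5) to the Naïve Bayes confusion-matrix cells reported in Table 6 reproduces the stated accuracy:

```python
# Confusion-matrix cells for Naïve Bayes, taken from Table 6.
TP, FN, FP, TN = 327107, 331, 7715, 297

accuracy = (TP + TN) / (TP + FP + TN + FN)   # Eq. (5), ≈ 0.976
```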
With 97.6% accuracy, the Naïve Bayes algorithm was able to classify traffic violations: personal injury, property damage, and the presence of alcohol. The confusion matrix of Naïve Bayes shows that only 297 records were classified as "True Negative" (see Table 6).

Table 6. Confusion matrix (Naı̈ve Bayes)

Predicted class
No Yes
Actual class No True positive = 327107 False negative = 331
Yes False positive = 7715 True negative = 297

In the database, different types of vehicles were reported: for example, motorcycle, automobile, station wagon, limousine, etc. The Naïve Bayes algorithm was able to classify traffic violations based on vehicle type with an accuracy of 87.444%. The Naïve Bayes algorithm also reported that automobiles had the highest incident records.

4.3 J48
The J48 decision tree algorithm was used to visualize and determine how predictions were made. The J48 algorithm uses information gain to determine which variable fits best for predicting the target variable. Other data mining studies, such as [10], have also used J48 decision trees to predict their outcome variables.
The following mathematical definition helps to explain how the decision tree classifier works.
Let the dataset be d, and let the dependent variable be Y (i.e., the target variable that the algorithm is trying to classify). The dataset d consists of vectors x composed of the features x1, x2, x3, ..., that are used to make the classification or the decision tree.
Then, the decision tree algorithm can be expressed as (see (6))

(x, Y ) = (x1 , x2 , x3 , . . . , xk , Y ) (6)

where k is the number of features in vector x.
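The information-gain criterion that J48 relies on can be sketched as the entropy reduction achieved by splitting on a feature (the labels below are toy values, not the paper's data):

```python
import math

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def info_gain(feature_values, labels):
    """Entropy reduction from splitting the labels on a feature."""
    n = len(labels)
    by_value = {}
    for v, c in zip(feature_values, labels):
        by_value.setdefault(v, []).append(c)
    remainder = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
    return entropy(labels) - remainder

# A feature that perfectly separates balanced binary labels gains 1 bit.
g = info_gain(["yes", "yes", "no", "no"], ["injury", "injury", "none", "none"])
```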



Most traffic violations that happened around 5:00 pm did not involve alcohol, which makes sense, as most people leave work at that time; however, the rush to get home may have caused those violations. On the other hand, most traffic violations between 12:00 am and 1:00 am involved alcohol, which indicates that they were caused by drunk drivers. Perhaps law-enforcement agencies should look into those incidents and exercise more caution. The J48 algorithm achieved 97.6% correct classification. The confusion matrix of J48 shows that only 1290 records were classified as "True Negative" (see Table 7).

Table 7. Confusion matrix (J48)

Predicted class
No Yes
Actual class No True positive = 326350 False negative = 1088
Yes False positive = 6722 True negative = 1290

In addition, the J48 algorithm was able to classify traffic violations based on vehicle type with an accuracy of 87.433%. The J48 algorithm also reported that automobiles had the highest incident records.

4.4 Support Vector Machine (SVM)

The support vector machine (SVM) is one of the most powerful data classification tools. The SVM was invented at AT&T Bell Laboratories by Cortes and Vapnik in 1995 [11]. To describe the strength of the SVM classification algorithm, Kim et al. (2003) in [11] wrote, "The SVM learns a separating hyperplane to maximize the margin and to produce a good generalization ability" (p. 2757).
Witten and Frank [12] mentioned, "Support vector machines select a small number of critical boundary instances called support vectors from each class and build a linear discriminant function that separates them as widely as possible" (p. 188).
The following mathematical definition helps to explain how the SVM classifier works:

Let the dataset be d, the set of classes C = {c1, c2, ..., cn}, and the predicted class c ∈ C. Also let the input set be X = {x1, x2, ..., xn}, with x ∈ X. Here, X is the input and C is the output. Now, if we want to classify c = f(x, α), where α are the parameters of the function, then the SVM can be expressed as (see (7))

f (x, {w, b}) = sign(w × x + b) (7)

where w is the weight vector and b is the bias.
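Equation (7) amounts to a sign test on a linear score; a minimal sketch with a hypothetical separating hyperplane (the weights and bias are illustrative, not learned from the paper's data):

```python
def svm_predict(x, w, b):
    """Linear SVM decision rule f(x) = sign(w·x + b), per Eq. (7)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 3 = 0.
label = svm_predict([2, 2], w=[1, 1], b=-3)   # point above the plane → +1
```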



The SVM algorithm was able to classify traffic violations based on vehicle type with an accuracy of 87.433%. It also reported that automobiles had the highest incident records. The confusion matrix shows the accuracy of the SVM classifier (see Table 8).

Table 8. Confusion matrix (SVM)

Predicted class
No Yes
Actual class No True positive = 327438 False negative = 0
Yes False positive = 8012 True negative = 0

4.5 Decision Table


The decision table (DT) is a rule-based classification model. This type of method generates association rules from the data and groups or classifies the data. The decision table uses best-first search and cross-validation for evaluation [12].
Here, the symbol "≝" represents a defining relationship; for example, f(x) ≝ x + 1 defines the relationship of x with the function f. A predictive relationship using a DT can then be defined as (see (8)):

R(x, y) ≝ (y = x)   (8)

where R is the relationship function between x and y, which indicates that some y helps to predict x.
The DT algorithm was able to classify traffic violations based on vehicle type with an accuracy of 87.451%. The DT analysis reported that automobiles had the highest incident records. The confusion matrix shows the accuracy of the DT classifier (see Table 9).

Table 9. Confusion matrix (Decision table)

Predicted class
No Yes
Actual class No True positive = 326203 False negative = 1235
Yes False positive = 6664 True negative = 1348

5 Discussion
5.1 Learning from the Data Processing

The original data was downloaded as a comma-separated values (CSV) file. However, it was important that the CSV file be converted to a WEKA-supported file format. A Java program was written to convert the CSV file to the Attribute-Relation File Format (ARFF). During the conversion process, it was discovered that the ARFF format is sensitive to the date format: whatever format is used in the file must be explicitly declared in the ARFF header, otherwise the WEKA software cannot recognize the data type.
During data processing and analysis with the visualization tool provided by WEKA, it was discovered that WEKA supports CSV files as input as well. To make sense of the time of each incident, the time attribute was discretized to the nearest hour value in 24-hour format; an Excel function was used to accomplish this task (e.g., MROUND(B2, "1:00")). Also, during the presentation and feedback from experts, it was suggested to include the date of the incident. However, the date itself was not very informative, so it was converted to a day of the week; a built-in Excel function was used (e.g., WEEKDAY(A2), with the cell format then changed to dddd to get the day name).
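The same discretization can be done outside Excel; a Python sketch approximating MROUND-to-the-nearest-hour and WEEKDAY/dddd (the timestamp string format here is an assumption, not the dataset's actual format):

```python
from datetime import datetime

def to_hour_and_day(stamp):
    """Round a timestamp to the nearest hour (like MROUND(time, "1:00"))
    and name its weekday (like WEEKDAY + dddd formatting)."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    hour = (dt.hour + (1 if dt.minute >= 30 else 0)) % 24
    return hour, dt.strftime("%A")

hour, day = to_hour_and_day("2017-09-26 16:40")   # 16:40 rounds to hour 17
```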
During the analysis, the κ value was calculated; κ measures the relative improvement over a random predictor. The κ statistic was computed using the following equation (see (9)):
κ = (D_observed − D_random)/(D_perfect − D_random)   (9)
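Equation (9) in code, with hypothetical observed and random agreement values (D_perfect = 1 for a perfect predictor):

```python
def kappa(observed, random, perfect=1.0):
    """Relative improvement over a random predictor, per Eq. (9)."""
    return (observed - random) / (perfect - random)

# Illustrative values: 97.6% observed accuracy vs. a 95% baseline.
k = kappa(observed=0.976, random=0.95)   # → 0.52
```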
In terms of success measures, precision and recall values were calculated as well. For precision, (10) was used.

precision = TP/(TP + FP)   (10)
where the number of true positives is TP and the number of false positives is FP. Comparisons of the different algorithms in terms of precision are shown in Table 10.

Table 10. Precision comparison

Naı̈ve Bayes J48 SVM Decision table


0.977 0.980 0.976 0.980

For the recall value, (11) was used.

recall = TP/(TP + FN)   (11)
where the number of true positives is TP and the number of false negatives is FN.
Comparisons of the different algorithms in terms of recall are shown in Table 11.

Table 11. Recall comparison

Naı̈ve Bayes J48 SVM Decision table


0.999 0.997 1.000 0.996

After obtaining the precision and recall values, the F-measure was computed (see (12)):

F-measure = (2 × recall × precision)/(recall + precision)   (12)

Comparisons of the different algorithms in terms of the F-measure are shown in Table 12. All algorithms yielded the same F-measure value.

Table 12. F-measure comparison

Naı̈ve Bayes J48 SVM Decision table


0.988 0.988 0.988 0.988
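Applying Eqs. (10)–(12) to the Naïve Bayes confusion-matrix cells of Table 6 reproduces the tabulated precision, recall, and F-measure:

```python
def prf(tp, fp, fn):
    """Precision (Eq. 10), recall (Eq. 11), and F-measure (Eq. 12)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * recall * precision / (recall + precision)
    return precision, recall, f

# Naïve Bayes cells from Table 6: TP = 327107, FP = 7715, FN = 331.
p, r, f = prf(327107, 7715, 331)   # ≈ 0.977, 0.999, 0.988
```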

To evaluate the prediction accuracy, the root mean-squared error (RMSE) was computed (see (13)):

RMSE = √( (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)² )   (13)

where y_i is the observed value for the ith observation and ŷ_i is the predicted value.
Comparisons of the different algorithms in terms of root mean-squared error are shown in Table 13.

Table 13. Root mean-squared error (RM SErrors) comparison

Naı̈ve Bayes J48 SVM Decision table


0.132 0.143 0.152 0.131
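Equation (13) in code, with toy prediction values:

```python
import math

def rmse(y_true, y_pred):
    """Root mean-squared error, per Eq. (13)."""
    n = len(y_true)
    return math.sqrt(sum((yh - y) ** 2
                         for yh, y in zip(y_pred, y_true)) / n)

err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # perfect predictions → 0.0
```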

Fig. 2. Number of traffic violations over 24 h. (x-axis: hours, starting with 0 (i.e., 24) on the left, then 1, 2, ..., ending with 23 on the right; y-axis: number of incidents).

6 Conclusion

The results obtained from data mining and statistical analysis suggested that personal injury was almost certain if the driver was drunk. Also, around 1:00 am was the most dangerous time to be out (see Fig. 2); most property damage and personal injury happened because of drunk drivers between 11:00 pm and 1:00 am. This was also the time when most incidents occurred. Among all the cities, the DC area seemed to be most consistent with these results. Therefore, it may be wise to avoid the DC area during these times.
Perhaps analyzing more data and the latest databases from law-enforcement agencies could help us find more interesting information. Also, using different data mining algorithms could help us to understand the data better. Having a domain expert could be beneficial for interpreting the findings and adding more implications.
As for future studies, visualization techniques can be used to show the intensity of traffic violations over geographic locations and accident-prone areas. Moreover, deep learning can be applied to identify or classify areas based on their violation probability.

Acknowledgment. The author would like to thank the open data website (https://catalog.data.gov/dataset) for making the dataset available for research and analysis. A special thank you to those who participated in the initial presentation and provided valuable feedback (part of this paper was presented and was submitted as a class project). Also, thanks to Dr. Kambiz Ghazinour for helping me to think further about the data and analysis process.

References
1. Estimates, A.P.: U.S. and world population clock (2017). Accessed 19 Nov 2017
2. Statistics Brain: Driving Citation Statistics (2016). Accessed 20 Nov 2017
3. Chen, H., Chung, W., Xu, J.J., Wang, G., Qin, Y., Chau, M.: Crime data mining:
a general framework and some examples. Computer 37(4), 50–56 (2004)
4. Solomon, S., Nguyen, H., Liebowitz, J., Agresti, W.: Using data mining to improve
traffic safety programs. Ind. Manag. Data Syst. 106(5), 621–643 (2006)
5. Saran, K.B., Sreelekha, G.: Traffic video surveillance: vehicle detection and classi-
fication. In: 2015 International Conference on Control Communication and Com-
puting India (ICCC) (2015)
6. Gupta, A., Mohammad, A., Syed, A., Halgamuge, M.N.: A comparative study of
classification algorithms using data mining: crime and accidents in Denver City
the USA. Education 7(7), 374–381 (2016)
7. Nath, S.V.: Crime pattern detection using data mining. In: 2006 IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent Technology
Workshops, WI-IAT 2006 Workshops, pp. 41–44 (2006)
8. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.:
The weka data mining software: an update. SIGKDD Explor. Newsl. 11(1), 10–18
(2009)
9. Schultz, M.G., Eskin, E., Zadok, F., Stolfo, S.J.: Data mining methods for detection
of new malicious executables. In: 2001 IEEE Symposium on Security and Privacy,
S&P 2001 Proceedings, pp. 38–49. IEEE (2001)
10. Olson, D.L., Delen, D., Meng, Y.: Comparative analysis of data mining methods
for bankruptcy prediction. Decis. Support. Syst. 52(2), 464–473 (2012)
11. Kim, H.C., Pang, S., Je, H.M., Kim, D., Bang, S.Y.: Constructing support vector
machine ensemble. Pattern Recognit. 36(12), 2757–2767 (2003)
12. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Tech-
niques, 2nd edn. Elsevier Inc., Amsterdam (2005)
An Intelligent Traffic Management System
Based on the Wi-Fi and Bluetooth Sensing
and Data Clustering

Hamed H. Afshari1(✉), Shahrzad Jalali2, Amir H. Ghods1, and Bijan Raahemi2



1 SMATS Traffic Solutions Inc., Ottawa, ON K1Y 3B5, Canada
h.h.afshari@gmail.com
2 Knowledge Discovery and Data Mining Lab, Telfer School of Management,
University of Ottawa, 55 Laurier Ave. E., Ottawa, ON K1N 6N5, Canada

Abstract. This paper introduces an automated clustering solution that applies to


Wi-Fi/Bluetooth sensing data for intelligent route planning and city traffic
management. The solution is based on sensing Wi-Fi and Bluetooth MAC
addresses, preprocessing the collected real data and implementing clustering
algorithms for noise removal. Clustering is used to recognize Wi-Fi and Bluetooth
MAC addresses that belong to passengers traveling by a public transit bus. The
main objective is to build an intelligent system that automatically filters out MAC
addresses that belong to persons located outside the bus for different routes in the
city of Ottawa. This system alleviates the need for defining restrictive thresholds
that might reduce the accuracy, as well as the range of applicability of the solution
for different routes. Various clustering models are built to filter out the noise based
on four features of the average of the signal strength, its variance, number of
detections, and travel time. We compare the performance of clustering using the
Silhouette analysis and the Homogeneity-Completeness-V Measure score. We
conclude that K-means and hierarchical clustering algorithms have a superior
performance for clustering.

Keywords: Wi-Fi/Bluetooth sensing · Clustering · Intelligent transportation

1 Introduction

1.1 Problem Statement

The cost of city congestion in North America was estimated at about $120B in 2012.
This is in addition to its negative impacts on the environment, as well as on the economy
that relies on the speed and efficiency of mobility. Public urban transit systems provide
a convenient and affordable solution for this problem. However, the limited revenue
obtained from bus fares limits the number of operating lines for public transit buses.
Hence, to overcome the problem of traffic congestion, optimal operational decisions on bus transit planning have a crucial role. Such decisions rely on estimating the number
of passengers, identifying their origins and destinations, and optimizing the travel cost.
Traditional methods of transit data gathering and transit decision planning were mainly

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 298–312, 2019.
https://doi.org/10.1007/978-3-030-02686-8_24

based on human labor, and they were expensive and time-consuming. Even though some transit companies use data obtained from smart card transactions, those data can only be used to find the origins of passengers, not their destinations or ride times.
A new approach to the traffic congestion problem is based on using Wi-Fi/Bluetooth sensing technologies to estimate the number of passengers, as well as their origins and destinations. Nowadays, Bluetooth and Wi-Fi signals are constantly being emitted by smartphones, tablets, and vehicular embedded systems. These signals can be identified by their device's unique Media Access Control (MAC) address. Note that every MAC address is unique to its device and does not change over time. Sensors can detect such information and, moreover, track the device and the individual who moves with it over time. These individuals can be drivers, passengers of vehicles, pedestrians, or cyclists. The main concern with such technologies is recognizing the MAC addresses that belong to passengers traveling on the bus from those that belong to individuals outside the bus.

1.2 Literature Review


There has been a large number of studies in recent years that focus on using Wi-Fi and/
or Bluetooth sensors to manage traffic congestions. The Wi-Fi and/or Bluetooth MAC
addresses may be tracked to find the number of individuals in crowded places such as
store lines, supermarkets, public buses, stations, etc. Some of these studies were applied
to public transportation systems such as buses, trains, and undergrounds, while the other
only focused on individual vehicles. Wi-Fi and/or Bluetooth sensors may furthermore
be used to estimate the origin-destination (OD) of passengers, their wait time, and their
travel time. Dunlap et al. [1] have used Wi-Fi and Bluetooth sensing technologies to
estimate OD of passengers in transit buses. They mounted sensors on four buses to
collect Wi-Fi, Bluetooth, and GPS data in four weeks. They applied some preprocessing
steps on collected data in addition to numeric thresholds to remove noise. They moreover
estimated OD data of passengers at different bus stops and validated the results using
ground truth bus routes [1]. Ji et al. [2] have employed Wi-Fi sensors and boarding data
to present a hierarchical Bayesian model for estimating the OD flow matrix and the
sampled OD flow data. They evaluated the accuracy of their method using a bus route
empirically. Kostakos et al. [3] have developed a Bluetooth detection system that records
behaviors of passengers. They showed that approximately 12% of passengers carried
Bluetooth devices, and they measured the flow of passenger’s daily movements with
80% accuracy [3]. Blogg et al. [4] have estimated the OD data using MAC addresses of
Bluetooth devices embedded in vehicles and cell phones of motorists. They showed that
the use of Bluetooth technologies for capturing OD data in limited networks is a cost-
effective solution. Kostakos et al. [5] have introduced an automatic method to collect
passengers’ end to end trip data. They collected the location of the bus, the ticket data,
and the number of people on the bus using a Bluetooth detection sensor. They calculated
the OD matrix, related graphs and analyzed them to optimize transit plans by redesigning
routes and providing new services [5].

1.3 Contributions
This paper introduces an intelligent and automated system to recognize the Wi-Fi
and/or Bluetooth MAC addresses that belong to persons in the bus. This system is based
on defining some features and clustering them into distinct groups. Experiments are
conducted to show the performance of this method for real-world applications.
Section 2 briefly reviews some clustering approaches used in this paper. Section 3
presents the test setup and the experiment design. Section 4 discusses cluster modeling
and analysis.

2 Main Approaches for Clustering

2.1 Center-Based Clustering

Center-based clustering refers to a class of clustering techniques in which the


cluster’s centroids are calculated based on a user-specified number of clusters. After
that, data points are classified into these clusters such that every cluster contains a set
of data points that are more similar (closer in the distance) to its centroid [6]. Center-
based clustering techniques mainly include K-means, fuzzy K-means, and K-medoids.
The K-means algorithm divides data points into groups of equal variance by minimizing
the within-cluster sum of squared error. The K-means algorithm attempts to cluster a
set of N data points into K disjoint clusters, where the cluster centroid is calculated by
the mean μj of data points. The cost function is the within-cluster sum of squared error
(the Euclidean norm) and is given by [7]:
Σ_{i=0}^{n} min_{μ_j ∈ C} ‖x_i − μ_j‖²   (1)

The K-means algorithm is sensitive to noise and outliers. To overcome this issue, the K-medoids algorithm uses the Manhattan norm (instead of the Euclidean norm l2) as the distance between data points [8]. The medoid is defined as the most centrally located object within a cluster, the one with the smallest average dissimilarity to the other objects in the cluster. Compared to K-means, K-medoids is more robust to noise and outliers [8]. K-means and K-medoids are exclusive clustering techniques [6] in which every data point is assigned to a single cluster. There are many cases in which a data point may belong to more than one cluster with a specific probability. Fuzzy K-means clustering assigns every data point to every cluster with a membership weight between 0 and 1: membership 0 means that the object does not belong to the cluster, whereas membership 1 means that it fully belongs. The sum of the weights (probabilities) for each object is assumed to equal 1.
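As an illustration, the within-cluster objective of Eq. (1) can be minimized with Lloyd's algorithm; a minimal pure-Python sketch using toy 2-D points and first-k initialization (in practice a library implementation such as scikit-learn's KMeans would be used):

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm for the within-cluster sum-of-squares objective.
    Initial centroids are simply the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = [sum(dim) / len(c) for dim in zip(*c)]
    return centroids

# Two well-separated toy blobs; centroids should land near each blob's mean.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = kmeans(pts, 2)
```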

2.2 Graph-Based Clustering

Graphs are used to represent data in some data mining applications, in which the nodes
are data points, and the links are the connections among data points [6]. The

agglomerative hierarchical clustering is an example of graph-based clustering. It starts with every data point as a single cluster. After that, new clusters are repeatedly generated by merging the two nearest clusters until a single cluster that includes all data points is produced [6]. The key idea of hierarchical clustering is the calculation of the proximity function between two clusters. There are several metrics for calculating the proximity function used to merge the two nearest clusters [7, 9]: (1) the Ward metric, which minimizes the sum of squared differences of data points inside a cluster; (2) the maximum metric, which minimizes the maximum distance between data points of every two clusters; (3) the group-average metric, which minimizes the average of the distances between all data points of every two clusters.

2.3 Density-Based Clustering


The key idea of density-based clustering is that a cluster is a dense region of data points surrounded by a region of low density. This idea is used to create clustering algorithms that perform well when clusters are irregular or intertwined, and when the data include noise and outliers [6]. In such situations, center-based or graph-based clustering approaches cannot deliver a satisfactory performance. Density-based clustering techniques find regions of high density that are separated from each other by low-density regions. DBSCAN [6] is one of the most effective density-based clustering techniques; it determines the number of clusters automatically and generates partitioned clusters. Moreover, it can isolate data points in the low-density regions as noise and remove them from the clustering subspace.
A center-based density metric is used to quantify the density of data points. It may be calculated by counting the number of data points located within a specified radius, named Eps, of every point [6]. The center-based density metric classifies each point into one of three main categories: core points, border points, and noise points. A core point is a point located inside a density-based cluster. A border point is a point that is not a core point but is located within a close neighborhood of a core point. A noise point is a point that is neither a core point nor a border point and is located relatively far from the centroids [6].
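The core/border/noise categorization above can be sketched directly (toy 2-D points; the Eps and MinPts values are illustrative, not the paper's settings):

```python
def classify_points(points, eps=1.5, min_pts=3):
    """Label each point core/border/noise using the center-based density
    metric: count neighbours within radius eps (Euclidean distance)."""
    def near(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) <= eps ** 2

    # First pass: a point with >= min_pts neighbours (itself included)
    # is a core point.
    labels = ["core" if sum(1 for q in points if near(p, q)) >= min_pts
              else "pending" for p in points]

    # Second pass: a non-core point near a core point is a border point;
    # everything else is noise.
    out = []
    for p, lab in zip(points, labels):
        if lab == "core":
            out.append("core")
        elif any(l == "core" and near(p, q)
                 for q, l in zip(points, labels)):
            out.append("border")
        else:
            out.append("noise")
    return out

pts = [(0, 0), (0, 1), (1, 0), (2.4, 0.1), (10, 10)]
labels = classify_points(pts)
```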

3 Test Setup and Experiment Design

3.1 Sensing Device: Smats TrafficBox™

The Smats TrafficBox™ is a pole-mounted, battery-operated Bluetooth and Wi-Fi sensor
that was designed and built at SMATS Traffic Solutions Inc. Sensors operate inside a
ruggedized shockproof and waterproof case. It is ideal for tasks that require putting the
sensor at a specific location to collect data for several days. It can scan for up to 4 days per charge. The ruggedized case is equipped with a pole-mount configuration, such
that it can scan for days without the need for monitoring. TrafficBox™ sensors can collect
data on moving vehicles as well as in stationary positions. Sensors have adjustable
detection zones that cover a circular or a directional area for detecting Bluetooth and
Wi-Fi devices. Figure 1 shows a typical TrafficBox™ mounted on a pole. TrafficBox™

detects Bluetooth Classic and Low Energy devices. Note that Bluetooth devices are often
detected in the discovery mode. If a device is in this mode, the chance of detection by
a sensor is extremely high. However, few Bluetooth devices are in this mode. TrafficBox™ can additionally detect Bluetooth devices in the paired mode. In this mode,
two devices are connected and communicating with each other.

Fig. 1. A typical Smats TrafficBox™ device that collects Bluetooth and Wi-Fi data.

TrafficBox™ not only stores data offline, but also can send data in real-time for online
storage and real-time traffic monitoring. For offline data collection, the data is saved
onto a micro SD card. Data are later uploaded to a computer as a raw data set, or are
uploaded to the Smats cloud server and can be analyzed in their analytics platform.
TrafficBox™ sensors collect the following data: MAC addresses, detection time stamps,
type of devices (Bluetooth or Wi-Fi, with Bluetooth Low Energy optional), the signal
strength, and GPS location data.

3.2 Experiment Design

Ground truth experiments were conducted using public urban transit buses traveling in
the city of Ottawa. TrafficBox™ is placed inside the bus to collect MAC address data
under two different test scenarios; each corresponds to a specific route. Note that
collected raw data contain noise and outliers that mainly correspond to MAC addresses
outside the bus. Before feeding raw data into clustering algorithms, they need to pass
through some preprocessing steps (see Sect. 3.3). After clustering MAC addresses and
identifying the ones that belong to passengers on the bus, they can be used for further
applications. These applications include calculation of the OD matrix, estimation of the
wait and travel times for every passenger, optimizing bus transit plans, etc. Two routes are considered for testing, each realizing a test scenario. The first test uses route 101, which starts from the St. Laurent 3C station and ends at the Bayshore 1A station. The GPS data are used to locate bus stops over time. Figure 2 shows a Google map view of route 101, used in test scenario #1.

Fig. 2. Google map view of the route 101 in the city of Ottawa.

The second test uses route 85, which starts from the Bayshore 4B station and ends at the Lebreton 2A station. Figure 3 shows a Google map view of route 85. A large part of route 85 passes through downtown Ottawa, where it is usually more crowded than route 101. Route 85 is used to check the performance of the clustering algorithms in scenarios that include a large number of passengers, crowded streets, and crowded bus stations. Note that during the experiments, the number of passengers on the bus, as well as the number of entries and exits at every stop, was manually counted. These numbers are later used to intuitively check the performance of the clustering algorithms. Data collected by TrafficBox™ are uploaded to a computer using a USB port.

Fig. 3. Google map view of the route 85 inside the city of Ottawa.

3.3 Data Cleaning and Preprocessing

Collected Bluetooth and Wi-Fi data include MAC addresses that belong to all detected devices within a certain range of distance. This range may be changed by replacing
the passive scanner antenna of TrafficBox™. However, under real practical conditions,
this range depends on some factors such as the weather condition, indoor obstacles,
obstruction of urban infrastructure, etc. For two test scenarios in which TrafficBox™ is

placed inside the bus, the range for Wi-Fi/Bluetooth detections is estimated to be about
200 m. TrafficBox™ generates a CSV file that includes the MAC address, the device
type, the signal strength, location coordinates, and the time stamp for every detection.
Note that sensors only detect Wi-Fi MAC addresses belonging to devices that are actively communicating with the network. In contrast, sensors detect all paired Bluetooth devices without the need for them to be communicating with another source.
The raw data collected by sensors contain a considerable amount of noise, outliers,
and other inconsistency. For instance, at every bus stops, sensors detect MAC addresses
that belong to boarding passengers as well as the ones that belong to pedestrians, non-
passengers, or other individuals. Sensors may furthermore detect MAC addresses that
belong to other moving vehicles nearby the bus, or other individuals whose distance
from the bus is less than 200 m. Moreover, stationary Wi-Fi routers may have a long
detection range, and they should be considered as a source of noise [1]. In practical
situations, some passengers may turn their Bluetooth and/or Wi-Fi devices on or off
during the trip [1]. Hence, it is sometimes difficult to recognize noise and other outlier
MAC addresses, even by eye. In this context, to alleviate the negative impacts of noise and
outliers, some preprocessing steps are recommended. In these steps, soft thresholds
(instead of strict thresholds that completely remove outliers) are defined and applied to
the raw data to remove outstanding outliers. Remaining outliers are automatically removed
through clustering. In this research, data preprocessing is performed in Python 3 with the
Pandas library.
Dunlap et al. [1] have explained preprocessing steps that include applying strict
thresholds. This research uses some of their preprocessing steps, but with smaller (softer)
thresholds. In the first step, based on the type of device, Wi-Fi MAC addresses are
separated from the Bluetooth ones. Clustering algorithms are separately applied to the
Wi-Fi and Bluetooth MAC addresses. In the next step, a threshold is defined based on
the number of detections Ndetect for every unique MAC address. MAC addresses whose
number of detections does not exceed Ndetect are removed. In this research, Ndetect is set to
Ndetect = 2, such that

Detections per travel >Ndetect . (2)

Another important preprocessing factor is the travel time, defined as the
difference in time between the first and the last detection. The next step is to remove
MAC addresses whose travel time is smaller than a threshold, Ttravel, such that

Detections with travel time > Ttravel . (3)

In this research, a threshold on the travel time for both Bluetooth and Wi-Fi devices is
set to Ttravel = 30 s. This means that MAC addresses with a travel time smaller than 30 s
are removed.
In the final step, unique MAC addresses (Bluetooth and Wi-Fi separately) are
identified, and the average of their signal strength over all detections is calculated. After
that, MAC addresses with the average signal strength greater than a threshold Sstrength
are kept such that
An Intelligent Traffic Management System Based on the Wi-Fi and Bluetooth 305

Average signal strength > Sstrength . (4)

In this research, the threshold on the average signal strength for Wi-Fi and Bluetooth
detection data is set to Sstrength = −80 dB. This means that MAC addresses with an
average signal strength smaller than −80 dB are filtered out.
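The three soft thresholds above can be sketched with pandas, which the paper already uses for preprocessing; the column names and sample rows below are illustrative stand-ins, not the actual TrafficBox™ CSV schema:

```python
import pandas as pd

# Illustrative detection log: one row per detection of a MAC address.
df = pd.DataFrame({
    "mac": ["a", "a", "a", "b", "c", "c", "c"],
    "rssi": [-60, -65, -62, -90, -70, -72, -71],        # signal strength, dB
    "timestamp": pd.to_datetime([
        "2018-05-01 10:00:00", "2018-05-01 10:01:00", "2018-05-01 10:02:00",
        "2018-05-01 10:00:30", "2018-05-01 10:00:10", "2018-05-01 10:00:20",
        "2018-05-01 10:01:10"]),
})

g = df.groupby("mac")
stats = pd.DataFrame({
    "n_detect": g["rssi"].size(),                                # detections
    "travel_s": (g["timestamp"].max()
                 - g["timestamp"].min()).dt.total_seconds(),     # travel time
    "avg_rssi": g["rssi"].mean(),                                # avg strength
})

# Soft thresholds from the paper: N_detect = 2, T_travel = 30 s, S_strength = -80 dB.
kept = stats[(stats["n_detect"] > 2)
             & (stats["travel_s"] > 30)
             & (stats["avg_rssi"] > -80)]
print(kept.index.tolist())   # MACs surviving all three filters: ['a', 'c']
```

Outliers that survive these soft filters are then left to the clustering stage, as described above.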

3.4 Feature Extraction and Feature Engineering

Clustering is the task of dividing data points into groups such that
data points in the same group have more similar properties than data points in other
groups. In this context, clustering algorithms can be used to detect anomalies (discords).
Anomalies are unusual or unexpected patterns that occur in a dataset [10].
To use clustering algorithms for anomaly detection in time-series data,
there are three main approaches [10]: (1) model-based approaches, (2) feature-
based approaches, and (3) shape-based approaches. In the model-based approach, a
parametric model is created for each time-series dataset; i.e., the raw time-series
dataset is converted into model parameters. Later on, a proper model distance and
a clustering algorithm are selected to cluster the dataset into groups. In the feature-
based approach, every time-series dataset is converted into a feature vector. The
clustering algorithm is then applied to the feature vectors to divide them into distinct
groups. The third approach is shape-based clustering, in which the shapes of time-series
datasets are compared based on a similarity index. Some nonlinear stretching and
contracting transformations are initially applied to the datasets to match them as much as possible [10].
In this research, the feature-based approach is used, in which the time series of
detections for each MAC address (passed through the preprocessing steps) is converted into a
feature vector. After that, the generated feature vectors are fed into clustering algorithms to
cluster the MAC addresses that belong to passengers inside the bus into one group. Note
that clustering algorithms divide datasets into groups based on statistical properties
of the features. In this research, the feature vector is defined based on statistical properties
of the MAC addresses that belong to passengers inside the bus. It is given by:
θ = [avg(s)  var(s)  n  ΔT]T . (5)

where avg(s) and var(s) are respectively the average and the variance of the signal strength
values and are calculated over all detections for every unique MAC address. Moreover,
n and ΔT are the number of detections and the travel time for each MAC address,
respectively. The number of feature vectors is equal to the number of unique MAC
addresses. Note that before applying clustering, feature vectors are normalized such that
they have a zero mean and a unit Euclidean norm.
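The eq. (5) feature vector can be built per unique MAC address as sketched below; the input layout and the normalization shown (zero-mean features, then unit-Euclidean-norm rows) are one reading of the paper's description, not its actual code:

```python
import numpy as np

def feature_matrix(detections):
    """Eq. (5) features [avg(s), var(s), n, dT] for each unique MAC.

    detections: dict mapping MAC -> (RSSI list, timestamp list in seconds).
    """
    rows = []
    for rssi, ts in detections.values():
        rssi = np.asarray(rssi, dtype=float)
        rows.append([rssi.mean(), rssi.var(), len(rssi), max(ts) - min(ts)])
    return np.asarray(rows)

def normalize(X):
    # Center each feature to zero mean, then scale each feature
    # vector (row) to unit Euclidean norm.
    Xc = X - X.mean(axis=0)
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # guard against degenerate rows
    return Xc / norms

detections = {"a": ([-60, -65, -62], [0, 60, 120]),
              "c": ([-70, -72, -71], [10, 20, 70])}
Xn = normalize(feature_matrix(detections))
print(Xn.shape)   # one 4-dimensional feature vector per unique MAC
```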

4 Cluster Modeling and Analysis

Most classic clustering algorithms need the number of clusters as an input,
e.g., K-means clustering, K-medoids clustering, hierarchical clustering, etc. In contrast,
some advanced clustering algorithms automatically select the number of clusters, e.g.,
Affinity Propagation, Mean shift, DBSCAN, etc. Hence, to apply classic clustering
algorithms, the optimal number of clusters is required. In this context, there are some
statistical measures in the literature [11] (e.g., the Davies–Bouldin index, Silhouette
analysis, etc.) that may be used to determine the best number of clusters.

4.1 Number of Clusters

The Silhouette analysis is used in this research to determine the optimal number of
clusters for the classic clustering algorithms. Silhouette analysis [12] is a powerful tool for
the interpretation and validation of the consistency within clusters of data points. It is mainly
based on the evaluation of the separation distance between the clusters that are generated
by a clustering algorithm [12]. The Silhouette analysis provides an index that shows
how similar a data point is to its own cluster (cohesion) compared to other clusters
(separation). This index is in the range [−1, +1], where a high value near +1 indicates that
the corresponding datum is well matched to its cluster and is far from neighboring
clusters. An index of 0 indicates that the corresponding data point is very close to the
decision boundary between two neighboring clusters, and a negative index indicates that
the datum is assigned to a wrong cluster [12].
The Silhouette index can furthermore be used to visually determine the proper
number of clusters. The Silhouette index is calculated based on the mean intra-cluster
distance a, and the mean nearest-cluster distance b for each data point [12]. Therefore,
the Silhouette coefficient s(i) for data point i is given by [12]:

s(i) = (b(i) − a(i)) / max{a(i), b(i)}. (6)
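As a direct reading of Eq. (6), the coefficient for a single point can be computed as below (Euclidean distances; the point's own cluster is assumed to have at least two members):

```python
import numpy as np

def silhouette_i(X, labels, i):
    """Silhouette coefficient s(i) of Eq. (6) for data point i."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    own = labels[i]
    d = np.linalg.norm(X - X[i], axis=1)            # distances from point i
    others = np.arange(len(X)) != i
    a = d[(labels == own) & others].mean()          # mean intra-cluster distance
    b = min(d[labels == k].mean()                   # mean nearest-cluster distance
            for k in set(labels.tolist()) if k != own)
    return (b - a) / max(a, b)

X = [[0, 0], [0, 1], [5, 5], [5, 6]]
labels = [0, 0, 1, 1]
print(round(silhouette_i(X, labels, 0), 3))   # near +1: point 0 fits cluster 0 well
```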

Note that b(i) is the mean distance between data point i and the points in the nearest
cluster of which i is not a member. It follows from Eq. (6) that −1 ≤ s(i) ≤ 1. Figure 4 presents values of
the Silhouette index versus the number of clusters for Wi-Fi data under two test
scenarios. According to Fig. 4, the optimal number of clusters for both
test scenarios is equal to 3, since the corresponding Silhouette index for each scenario
has the largest value. Moreover, Fig. 5 presents a graphical representation of the Silhouette
index obtained by the K-means algorithm. Figure 5 confirms that clustering the data
into 3 clusters results in well-separated groups of data points, where all clusters pass the
average Silhouette index (i.e., the dashed line). Due to the lack of space, this paper only
presents results corresponding to the Wi-Fi MAC address data.
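The sweep over candidate cluster counts behind Fig. 4 can be reproduced with scikit-learn, assuming it is available; the synthetic blobs below merely stand in for the real normalized feature vectors:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature vectors: three well-separated groups.
X, _ = make_blobs(n_samples=200, centers=3, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 7):                     # same k range as Tables 1 and 2
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)      # k with the largest Silhouette index
print(best_k)
```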

Fig. 4. Values of the Silhouette index versus the number of clusters for Wi-Fi data.

Fig. 5. Graphical representation of Silhouette analysis for 3 clusters (Wi-Fi MAC addresses).

Tables 1 and 2 present numeric values of the Silhouette index versus the number of
clusters obtained by the K-means algorithm for each test scenario. As presented, independent
of the cluster number, the Silhouette index has a positive value close to 1, which
confirms the proper performance of the K-means algorithm for clustering Wi-Fi data.
For both scenarios, the optimal value of the cluster number is equal to 3.

Table 1. Silhouette index versus cluster numbers under test scenario #1


Number of clusters: 2 3 4 5 6
Silhouette coefficient: 0.68 0.72 0.71 0.57 0.53

Table 2. Silhouette index versus cluster numbers under test scenario #2


Number of clusters: 2 3 4 5 6
Silhouette coefficient: 0.51 0.57 0.54 0.54 0.55

4.2 Building Cluster Models


In this research, some algorithms are selected from the three clustering
approaches discussed above. They are applied to the feature vectors, and their performances
in recognizing Wi-Fi MAC addresses are compared under the two test scenarios. Note that
the feature vectors are generated from the preprocessed data, and hence the outstanding
noise and outliers have already been removed. The K-means, fuzzy K-means, and K-medians
clustering algorithms are selected from the center-based approach. The agglomerative
hierarchical clustering and spectral clustering algorithms are selected from the graph-
based approach. The DBSCAN and Gaussian mixture algorithms come from
the density-based approach.
All of the above algorithms, except DBSCAN, need the number of clusters as an
input. As discussed in Sect. 4.1, the optimal number of clusters is equal to 3. In this
context, cluster 1 contains Wi-Fi MAC addresses that certainly belong to persons traveling
on the bus. Cluster 2 represents the ones that certainly belong to persons outside
the bus. Moreover, cluster 3 contains MAC addresses that more likely belong to people
outside but near the bus. The decision on cluster labels is made by looking at the
clusters' centroids. Simulation results need to be manually checked to ensure the proper
performance of the algorithms. Note that route 101 mostly passes through areas that are far
from the downtown, whereas route 85 mostly passes through the downtown. Hence, test
scenario #2 deals with the clustering of a larger dataset collected from a crowded bus and bus
stops.

Fig. 6. Profiles of signal strengths over time for Wi-Fi data collected from route 101.

Fig. 7. Profiles of signal strengths over time for Wi-Fi data collected from route 85.

Figure 6 presents profiles of signal strengths for Wi-Fi MAC addresses of test
scenario #1, before and after clustering. Figure 7 presents the corresponding profiles under
test scenario #2. The clustered data are obtained using the K-means algorithm.
Following Figs. 6 and 7, it is deduced that the K-means algorithm successfully separates
the Wi-Fi MAC addresses that belong to passengers on the bus under the two different test
scenarios. To intuitively check the performance of the clustering algorithms, it is a good
idea to look at the clustered features. Figure 8 presents 2D plots of the features related to
test scenario #1, clustered using the K-means algorithm. There are three main
clusters, whose centroids are represented by the numbers 1, 2, and 3, respectively.
The clusters' centroids are surrounded by data points whose features have closer
values. Figure 8 shows that the K-means algorithm successfully clusters the data points
into three groups based on their feature values.
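A sketch of fitting several of the algorithms named above with scikit-learn (assuming it is available; K-medians and fuzzy K-means have no scikit-learn implementation and are omitted here); the blob data again stands in for the real feature vectors:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=1)

models = {
    "K-means": KMeans(n_clusters=3, n_init=10, random_state=1),
    "Hierarchical": AgglomerativeClustering(n_clusters=3),
    "Spectral": SpectralClustering(n_clusters=3, random_state=1),
    "Gaussian mixture": GaussianMixture(n_components=3, random_state=1),
}
labels = {name: m.fit_predict(X) for name, m in models.items()}
for name, lab in labels.items():
    print(name, np.bincount(lab))   # per-cluster sizes found by each algorithm
```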

Fig. 8. 2D plots of clustered features generated by K-means clustering for Wi-Fi data of route
101.

4.3 Performance Evaluation of Clustering Algorithms


There are two main approaches for evaluating the performance of clustering algorithms.
The first approach concentrates on defining a statistical measure that numerically quantifies
how well similar data points are clustered into a group, without knowing the labels.
In contrast, the second approach needs knowledge of the ground-truth classes (similar to
supervised learning) and is based on the manual assignment of labels to data points during
the experiments. In this paper, the Silhouette analysis was employed to determine the
optimal number of clusters. Note that values of the Silhouette index may further be used
as a statistical measure for evaluating the clustering performance. The Silhouette index
for the first and the second test scenario, assuming 3 clusters, is 0.72 and 0.57,
respectively. Values of the Silhouette index versus the number of clusters
were presented in Fig. 4. As shown, values of the Silhouette index are positive and
relatively close to 1, and hence the proper performance of the K-means algorithm for
clustering similar data is statistically confirmed.
To follow the second approach and evaluate the clustering performance manually,
the Wi-Fi MAC address data obtained from the two test scenarios are labeled. After that,
the accuracy of the clustering algorithms is evaluated based on metrics that include
the Adjusted Rand Index [7], the Adjusted Mutual Information index [7, 13], the
Homogeneity-Completeness-V-measure score [7], etc. Homogeneity is a measure that checks
whether each cluster K contains only members of a single class C [7]. Conversely,
Completeness checks whether all members of a given class C are assigned to the same
cluster K [7]. Both the Homogeneity and Completeness scores are in the range [0, 1],
where a larger value represents better performance.
The Homogeneity and Completeness scores are respectively calculated by [7]:

h = 1 − H(C|K) / H(C), (7)

c = 1 − H(K|C) / H(K), (8)

where H(C|K) is the conditional entropy of classes given the cluster labels and is calcu‐
lated by [7]:
H(C|K) = − ∑_{c=1}^{|C|} ∑_{k=1}^{|K|} (n_{c,k}/n) log(n_{c,k}/n_k), (9)

Moreover, H(C) is the entropy of the classes and is calculated by [7]:


H(C) = − ∑_{c=1}^{|C|} (n_c/n) log(n_c/n). (10)

Note that n is the number of data points, nc and nk are respectively the numbers of data
points that belong to class c and cluster k, and nc,k is the number of data points from class
c that are assigned to cluster k [7]. Moreover, the harmonic mean of Homogeneity and
Completeness is referred to as the V-measure and is used to evaluate the agreement of
two independent assignments on the same dataset [7, 13]. The V-measure score ranges
over [0, 1] and is calculated by [7]:

v = 2 (h × c) / (h + c). (11)
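Eqs. (7)–(11) can be implemented directly from a contingency table of class/cluster counts; the hand-rolled sketch below is for illustration (scikit-learn's `sklearn.metrics.homogeneity_completeness_v_measure` computes the same triple):

```python
import numpy as np

def hcv(classes, clusters):
    """Homogeneity, completeness and V-measure per Eqs. (7)-(11)."""
    classes, clusters = np.asarray(classes), np.asarray(clusters)
    n = len(classes)
    cs, ks = np.unique(classes), np.unique(clusters)
    # Contingency counts n_{c,k}: data points of class c placed in cluster k.
    cont = np.array([[np.sum((classes == c) & (clusters == k)) for k in ks]
                     for c in cs], dtype=float)
    nc, nk = cont.sum(axis=1), cont.sum(axis=0)

    def entropy(counts):             # H(C) of Eq. (10) / its cluster analogue
        p = counts[counts > 0] / n
        return -np.sum(p * np.log(p))

    nz = cont > 0                    # skip empty cells so log() stays defined
    H_c_k = -np.sum(cont[nz] / n * np.log((cont / nk[np.newaxis, :])[nz]))
    H_k_c = -np.sum(cont[nz] / n * np.log((cont / nc[:, np.newaxis])[nz]))
    h = 1.0 if entropy(nc) == 0 else 1 - H_c_k / entropy(nc)    # Eq. (7)
    c = 1.0 if entropy(nk) == 0 else 1 - H_k_c / entropy(nk)    # Eq. (8)
    v = 0.0 if h + c == 0 else 2 * h * c / (h + c)              # Eq. (11)
    return h, c, v

# A clustering that is perfect up to label permutation scores 1 on all three.
h, c, v = hcv([0, 0, 1, 1], [1, 1, 0, 0])
print(round(float(h), 3), round(float(c), 3), round(float(v), 3))   # 1.0 1.0 1.0
```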
Table 3 presents the Homogeneity-Completeness-V-measure scores calculated for
the clustering algorithms under test scenario #1. According to Table 3, the K-means, the
hierarchical, and the spectral clustering algorithms have the best performance. Note that
in this research, some of the clustering algorithms (e.g., DBSCAN and the Affinity
Propagation algorithm) did not show an acceptable performance, and hence they are not
considered for comparison.

Table 3. Values of homogeneity-completeness-V-measure scores for test scenario #1


Clustering algorithm     Homogeneity score  Completeness score  V-measure
K-means                  0.896              0.953               0.924
K-medians                0.821              0.818               0.820
Fuzzy K-means            0.893              0.886               0.890
Hierarchical clustering  0.896              0.953               0.924
Gaussian Mixture         0.857              0.852               0.855
Spectral clustering      0.896              0.953               0.924

5 Conclusion

This paper presented applications of clustering algorithms for removing noise and
outliers from Wi-Fi and Bluetooth MAC address detections. To estimate the traffic load
and provide an intelligent automated transit plan for public transit buses, it is important
to separate MAC addresses that belong to passengers on the bus from those belonging
to persons outside the bus. The Wi-Fi and Bluetooth detection data were initially passed
through preprocessing steps that included applying soft thresholds to remove
outstanding noise and outliers. After that, clustering algorithms were used to automatically
filter out the noise based on four features: (a) the average of the signal
strength over all detections; (b) its variance; (c) the number of detections; and (d) the
travel time. The performances of the clustering algorithms were moreover compared in
terms of the Homogeneity-Completeness-V-measure score. It is concluded that the K-means,
the hierarchical clustering, and the spectral clustering algorithms had the best clustering
performance.
Future studies include using the clustering algorithms for origin-destination (OD)
estimation, predicting the traffic load at each bus stop, and building an automated
intelligent transit plan for public transit buses.

Acknowledgments. This research was supported by the Ontario Centres of Excellence (OCE)
Grant 27911–2017, and NSERC Engage Grant EGP 514854–17, in collaboration with SMATS
Traffic Solutions.

References

1. Dunlap, M., Li, Z., Henrickson, K., Wang, Y.: Estimation of origin and destination
information from Bluetooth and Wi-Fi sensing for transit. Transp. Res. Rec. J. Transp. Res.
Board 2595, 11–17 (2016)
2. Ji, Y., Zhao, J., Zhang, Z., Du, Y.: Estimating bus loads and OD flows using location-stamped
farebox and Wi-Fi signal data. J. Adv. Transp. (2017)
3. Kostakos, V., Camacho, T., Mantero, C.: Towards proximity-based passenger sensing on
public transport buses. Pers. Ubiquitous Comput. 17(8), 1807–1816 (2013)
4. Blogg, M., Semler, C., Hingorani, M., Troutbec, R.: Travel time and origin-destination data
collection using Bluetooth MAC address readers. In: Australasian Transport Research Forum,
vol. 36 (2010)
5. Kostakos, V., Camacho, T., Mantero, C.: Wireless detection of end-to-end passenger trips on
public transport buses. In: 13th IEEE International Conference on Intelligent Transportation
Systems (ITSC), Funchal, Madeira Island, Portugal, pp. 1795–1800 (2010)
6. Tan, P., Steinbach, M., Kumar, V.: Introduction to Data Mining, 1st edn. Pearson Education
Inc, Boston (2006)
7. Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory
Methods 3, 1–27 (1974)
8. Park, H., Jun, C.H.: A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl.
36(2), 3336–3341 (2009)
9. Rafsanjani, M., Varzaneh, Z., Chukanlo, N.: A survey of hierarchical clustering algorithms.
J. Math. Comput. Sci. 5(3), 229–240 (2012)
10. Aghabozorgi, S., Shirkhorshidi, S., Wah, T.: Time-series clustering: a decade review. Inf.
Syst. 53, 16–38 (2015)
11. Legany, C.: Cluster validity measurement techniques. In: Proceedings of the 5th WSEAS
International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases,
Madrid, Spain (2006)
12. Muca, M., Kutrolli, G., Kutrolli, M.: A proposed algorithm for determining the optimal
number of clusters. Eur. Sci. J. 11(36), 1857–7881 (2015)
13. Rosenberg, A., Hirschberg, J.: V-measure: a conditional entropy-based external cluster
evaluation measure. In: Joint Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning, Prague (2007)
Economic and Performance Based Approach
to the Distribution System Expansion Planning
Problem Under Smart Grid Framework

Hatem Zaki1, R. A. Swief2, T. S. Abdel-Salam2, and M. A. M. Mostafa2
1
BC Hydro, Vancouver, BC, Canada
hatemzaki@mail.com
2
Ain Shams University, Cairo, Egypt
rania.swief@gmail.com, tarekabdelsalam@gmail.com,
mahmoud.a.mostafa@hotmail.com

Abstract. This paper proposes a new vision of the Distribution System
Expansion (DSE) problem considering new system performance measures. The
mathematical model has been rebuilt with a new combined multi-objective
formula, minimizing the system expansion Capital costs, Operations and
Maintenance (OM) costs and achieving the best combined performance measure
consisting of a combination of Reliability, Resiliency and Vulnerability. A new
practical weighted combined system performance index is applied and tested to
be used by utilities replacing the common simple reliability indices. The new
model uses the application of multi-objective optimization utilizing mixed
integer design variables, which include a combination of seven logical and
technical constraints to provide the best description of the real existing system
constraints. In addition to the newly proposed system performance index, a new
algorithm for checking the system's radial topology is proposed. The objective is
to find the optimum sizing, timing, and location of substations in the
power distribution network. The proposed approach has been tested on a real
14-bus distribution system to demonstrate its validity and effectiveness on
real systems. The proposed approach has also been tested on the IEEE 37-bus model
distribution system with modified parameters that are significantly larger and
more complex than the parameters frequently found in the literature.

Keywords: Distribution system expansion · Smart grids ·
Reliability · Resiliency · Vulnerability · Genetic algorithm

1 Introduction

The distribution system is a vital part of the electric power system, dedicated to connecting
the transformer substations and the customers. DSE is a fundamental task for system
planners, asset managers, and operators. DSE is usually driven by the need to add
capacity in the system due to load growth and the inability of existing systems to serve
future loads. Finding an optimized solution to the DSE problem helps in making right,
sound and justifiable decisions, and forms a good defense for any investment decision,

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 313–332, 2019.
https://doi.org/10.1007/978-3-030-02686-8_25
314 H. Zaki et al.

especially the decision related to building or expanding a transformer substation with a
significant investment. This investment may impact electricity rates and hence affect
the financial performance of the utility.
DSE Planning involves decision making with multiple conflicting criteria, such as
capital investment and OM costs, energy losses, and reliability. A utopian solution that
optimizes all these objectives at the same time does not exist. Instead, a set of optimal
trade-off solutions exists, in which an improvement in any objective leads to deteriorations
in other objectives. For example, a reduction in the investment cost by using
smaller cross-sectional area conductors or lower-class metal (such as using Aluminum
instead of Copper) increases energy losses and limits the ability to transmit power for
longer distances hence dictates lower utilization of assets [1].
In traditional research, the DSE problem was modeled using a single-objective
formula, often called the cost function. This cost function followed the model presented
by Gonen et al. in several publications, with many improvements over the years [2].
This model was further improved and may now include non-financial measures such
as energy losses or reliability, after aggregating them to fit into the financial formula [1].
Even though Gonen's school of thought dealt with a wide spectrum of DSE
problems, it lacked the smart grid dimension of including performance as a measure
when making such decisions.
In today’s decision making, under Smart Grid (SG) approaches, system performance
is a vital characteristic of the DSE problem and must be combined as part of the
objectives when making such significant decision [3]. Several performance measures
have been proposed by researchers in the past, most of them based on the Energy (or
Demand) Not Served (ENS) principle [4]. Some researchers and utilities included
reliability in the form of customer choices as a newer direction in reliability measures.
This formed a means of addressing customer expectations and was called Customer-Based
Reliability (CBR) [5]. Many utilities have used one or a combination of common reliability
indices as a decision-making objective for system expansions and alterations [6].
Lately, after a few destructive storms in North America, the IEEE standards and guides
introduced the concept of resiliency as a performance measure [7]. The resiliency of a
system is part of the common Customers Experiencing Lengthy Interruption Durations
reliability index (CELID), but looks for outages of extremely long durations,
essentially more than 12 h. Researchers and planners have used the resiliency indices for
the purpose of allocating sectionalizing switches on distribution system feeders to
avoid entire feeder outages in case of contingencies [8]. Resiliency measures can be
applied to a type of outage characterized by being significant but expected (such as storms
and hurricanes). These outage causes are characterized by a local impact on a limited
footprint. In this paper, a new modified resiliency index has been applied, combining the
commonly used resiliency index with the number of years under study.
Vulnerability of a system can also be one of the performance measures of integrated
systems. It has been widely used to assess cyber-security on data management and
control systems including power system Supervisory Control And Data Acquisition
(SCADA) [9]. Originally Vulnerability has been widely applied to water systems and
electric power generation systems. Nowadays Vulnerability has been applied to electric
power transmission systems with a quantitative risk approach. Vulnerability has several
definitions based on the infrastructure it addresses.
Economic and Performance Based Approach to DSE Planning Problem 315

In the electric power system, Vulnerability can be defined as the impact and likelihood of the outage of critical
equipment in the system [10]. It defines the ability of the system to stay in-service
under an unusual disastrous event, such as significant destructive earthquakes with an
unpredicted destruction area, or terrorist attacks.
Recently, Invulnerability has been applied to Distribution System Planning
utilizing graph theory, by ranking all nodes in terms of their criticality
with respect to the source node [11]. This approach forms the basis for understanding
Vulnerability and the criticality of the assets of a distribution system;
however, a vulnerability measure was not presented in these recent studies.
In this paper, a new weighted combination of reliability, resiliency and vulnerability
indices is proposed to be applied to distribution systems. These indices cover
all expected and unexpected outage causes that may affect the distribution system
infrastructure. A new index is then formed and used in the objectives of the DSE
planning problem.
To solve the newly formulated DSE model, multi-objective optimization (also called
multi-criteria, multi-performance, or vector optimization) is used, utilizing
an evolutionary solution algorithm [12]. Multi-objective optimization can be defined as
the problem of finding a vector of decision variables which satisfies constraints and
optimizes a vector function whose elements represent the objective functions [13].
After the significant improvement of computer software and the evolution of
Artificial Intelligence and nature-inspired techniques for solving complex multi-objective
optimization problems [14], researchers have included reliability as an
additional part of the objective function. Most researchers who have included reliability
as a separate objective have utilized the Energy (or Demand) Not Served (ENS) concept
as their main argument for modelling reliability, such as Cossi et al. [3, 15].
A powerful class of optimization heuristic methods is the family of Metaheuristic
Techniques. The Genetic Algorithm (GA) became particularly suitable for the DSE
problem, once a well-established formulation for dealing with multi-objective problems
has been achieved [16].
In this paper, a commonly available Multi-Objective GA (MOGA) is used as a
means of finding the optimum, or near optimum solution with applications to modified
IEEE test cases as well as real life test cases.
This paper is divided into six sections. In addition to this introduction, Sect. 2
describes the DSE Problem including the new proposed parameters, in addition to
presenting a new approach in determining the Radial Structure of the distribution
system during the solution algorithm. The Mathematical Formulation and the solution
algorithm are discussed in Sects. 3 and 4, respectively. Test Cases are presented and
discussed in Sect. 5 and a conclusion is provided in Sect. 6.

2 Problem Description

The DSE problem is usually represented as a mixed-integer multi-variable problem
[17]. The list of variables in this paper represents the substation locations and line
segment status (opened/closed or in-service/out-of-service). This model is presented to
find the optimum size, timing, and location of the distribution substation, as well as
determining the optimum status of each line section (opened or closed) recommended
for operation [18]. The optimum line section statuses, hence, identify the system
configuration. The model used in this paper was evaluated, and
many complexities have been added to make it as close as possible to real systems.
The developed model is simple but includes all necessary objectives and constraints
to plan and operate the system. These constraints can be divided into logical and
technical constraints. The logical constraints ensure that the solution provides a radial
system, that all nodes are connected to one substation, and that one of the new substations
is selected while the existing substations remain in service. The technical constraints
include the voltage limits, the line-segment conductor current (thermal) limit, and the
power balance of the system (supply capacity equals total load). In this paper, the
objective function is formed of two parts.
The first part is the total life cycle asset cost which includes the installation capital
costs and the Operations and Maintenance costs (OM). This part is represented using a
Cost Index (COSTINDEX). COSTINDEX is the Capital and the present value of Life
Cycle costs referred to the maximum asset cost of the system. The purpose of this
referral is to normalize the value obtained and make it homogenous with the other
components of the objective function.
The second part represents a combined system Contingency Index (CONDEX).
This index consists of three weighted components giving the planner the choice to
prioritize one component over the other by adjusting the three weights as required by
the utility’s strategic approach. CONDEX is formed of the following components:
(a) The Unified Reliability Index (URI) – this index has been previously used as a sole
indicator of reliability by utilities [19]. It is formed of four (or more) common
reliability indices: the System Average Interruption Frequency Index (SAIFI), the
System Average Interruption Duration Index (SAIDI), the percentage of Customers
Experiencing Multiple Interruptions of 4 or more (CEMI-4), and Customers Experiencing
Lengthy Interruption Durations of 6 or more hours (CELID-6). In this paper, only
these four indices are used due to the practical nature of the distribution system.
Other indices would require special, unusual measuring equipment to provide enough
data to be used in their calculation.
(b) The System Resiliency Index (SRI) – this is one of the reliability indices but with a
more stringent condition. The SRI is measured using CELID-12, which represents
the percentage of customers experiencing outage durations of 12 h or more per
year (or per study period). This definition is also provided by IEEE Std 1366-
2012 and has been used by many utilities across the world [7]. SRI was slightly
modified to include a measure of the past number of years in order to add an argument
that expresses the period over which the resiliency events happen during a certain study
period. For example, if the study period is measured over 5 years and the 12-h
outages occurred in 3 of the 5 years, then SRI becomes the sum of CELID-12 and the
number 3 (assuming the total number of customers has not changed over the
study period). This makes the SRI range anywhere from zero to six for a study period
of five years. By using this methodology in calculating SRI, both the number of
customers and the outage periods are included in this performance measure.
Economic and Performance Based Approach to DSE Planning Problem 317

(c) The System Vulnerability Index (SVI) – This is the new index presented in this work. SVI represents the ability of the distribution system to stay in service during and after a massive disaster such as a massive earthquake, a one-of-a-kind storm with destructive wind speeds (not annual storms), large permanent floods, etc. To use a predictive vulnerability index, three weighted arguments are created and selected to form the SVI. This performance index is a function of the following arguments:
– Node Distance Index (NDI), which represents the distance between each node and its source (a substation in most cases)
– Node Failure Rate Index (NFRI), which represents the failure rate of each node route as linked to its source
– Node Failure Duration Index (NFDI), which represents the failure duration of each node route as linked to its source.
SVI is then formed of the sum of the weighted values of NDI, NFRI, and NFDI. Each of these measures is weighted according to its criticality to the distribution system planner and combined in the SVI.
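As a quick illustration (our own sketch, not the paper's code), the SRI example given in item (b) above can be computed directly; the CELID-12 value here is an assumed figure for illustration only:

```python
# Worked sketch of the SRI example: a 5-year study period with 12 h+ outages
# occurring in 3 of the 5 years. CELID-12 is an assumed illustrative value.
celid_12 = 0.30                 # assumed fraction of customers with 12 h+ outages
years_with_long_outages = 3     # outage years within the 5-year study period
SRI = celid_12 + years_with_long_outages   # ranges from 0 to 6 over 5 years
```

With these assumed numbers SRI evaluates to 3.3, matching the paper's later comparison run where an SRI of 3.3 is cited.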
CONDEX is hence formed of the weighted sum of URI, SRI and SVI. The presence of these weights provides enough flexibility to adjust the system configuration toward the highest-priority index, in line with the strategy of the utility.
The above-mentioned Objectives are subjected to a number of constraints. These
constraints limit the optimum solution to a practical implementable solution, making
the model as close as possible to the systems implemented in real life. These constraints
are explained further in the Mathematical Formulation section. Figure 1 shows an
overview of the proposed model of the DSE problem objective function.

Fig. 1. Overview of the proposed model of the DSE problem

These objectives are subjected to four logical constraints and three typical technical constraints, all of which are considered together in the solution of the DSE problem. These constraints are applied to represent real-world distribution systems, which usually operate under such constraints.
318 H. Zaki et al.

One of these constraints has also been rebuilt, with a new model, to better represent real systems. This is the radiality constraint, whereby the final solution must respect the radial nature of the distribution power system to be operated.

2.1 Checking the Radial Structure of the System


In the DSE problem, previous studies used a single check-point to identify whether the system is radial. Some algorithms compare the number of nodes to the number of line sections after generating the element-node incidence matrix [20]. The Floyd-Warshall algorithm has also been used to find the shortest path in single-source distribution systems [21]. Another method for representing the radiality constraint is to employ the branch-node incidence matrix [22]. These methods were typically oriented to special cases with stringent conditions and cannot be generalized.
By studying these past algorithms, it can be observed that they have worked in the past but were conditioned by one or more of the following:
(a) Test systems must NOT have internal loops supplied from the same line
(b) All systems used have one source, or are modified to a single source before applying the algorithm

Fig. 2. Radial structure checking algorithm overview



In this paper, all radial structure conditions are combined under one algorithm. The
proposed algorithm uses an Iterative methodology to check for internal loops within the
system. Before it terminates, the algorithm uses a connectivity check algorithm to
ensure all nodes are connected to a source and to only one source. Figure 2 presents an
overview of the proposed algorithm.
The proposed radial checking algorithm starts by isolating all power sources of the system (such as DG, energy storage, etc.), turning it into the classical well-known distribution system supplied by substations. The algorithm then performs the following checks:
(A) Checking for Internal loops
Internal loops are nodes and branches on the same feeders emerging from one node on
the feeder and terminating on another node on the same feeder. Internal loops in graphs
are called Cycles (or Network Cycles). In graph theory there are many numerical
methodologies capable of determining the presence of cycles in a graph [23]. One of
these methodologies is the Iterative Loop Counting Algorithm (ILCA). This method is
characterized by returning the total number of cycles in a graph, as well as its ease of
programming.
ILCA searches for loops by moving along a dynamic path. The use of this dynamic
path essentially turns the network into a tree, and the path at any given time is a line
from the top of the tree to any of the nodes on the branches. Loops occur whenever a
node ID exists in two separate places on the path.
(B) Checking for connectivity to a supply node
The connectivity to a supply (or a substation) can be determined using the well-known
Floyd-Warshall (Shortest Paths Algorithm), which is part of the graph theory appli-
cations [24].
(C) Checking if any node is supplied by more than one source
This is a simple algorithm which also uses the connectivity algorithm explained before
to determine if any of the nodes in the network is supplied by more than one substation.
As mentioned in the above explanation, the algorithm for determining the presence of loops makes extensive use of graph theory. It is very similar to the spanning tree algorithm, with a different alignment to match the required results.
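The three checks above can be sketched in a few lines of code. This is our own illustration, not the paper's implementation: helper names such as `radial_flag` are ours, and a simple forest edge-count test stands in for the full ILCA cycle search.

```python
from collections import defaultdict

def build_adjacency(branches):
    """branches: list of (node_a, node_b) pairs for in-service line sections."""
    adj = defaultdict(set)
    for a, b in branches:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def reachable_from(adj, source):
    """All vertices connected to `source` (simple DFS connectivity check)."""
    seen, stack = {source}, [source]
    while stack:
        n = stack.pop()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def radial_flag(nodes, branches, sources):
    """Return 1 if the network is radial: no internal loops and every load
    node connected to exactly one source; otherwise return 0."""
    adj = build_adjacency(branches)
    supply_count = {n: 0 for n in nodes}
    for s in sources:                      # checks (B) and (C)
        for n in reachable_from(adj, s):
            if n in supply_count:
                supply_count[n] += 1
    if any(c != 1 for c in supply_count.values()):
        return 0                           # disconnected node, or fed twice
    # check (A): with one tree per source, a loop-free network satisfies
    # |edges| = |vertices| - |sources|; any extra edge closes a cycle
    total_vertices = len(nodes) + len(sources)
    if len(branches) > total_vertices - len(sources):
        return 0
    return 1
```

For example, a chain S–1–2–3 fed by source S returns 1, while adding a branch between nodes 3 and 1 (an internal loop) or a second source feeding the same chain returns 0.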

3 Mathematical Formulation

As mentioned in the Problem Description section, the problem is formed of two objectives. The objective function is formed of two parts to be aggregated and minimized under one representation. In order to build the model on an index basis, the two parts of the objective function are normalized by referring them to maximum values in the system. As such, the mathematical minimization problem can simply be stated as follows:

Minimize

$$ COSTINDEX = F_{Normalized} + CONDEX_{Normalized} \tag{1} $$

Equation (1) describes the overview of the objective function. The components of
the Objective function are as follows:

3.1 Minimization of Assets Life Cycle Costs (F)


Capital investment costs and the net present value of the OM costs are combined under the following formula:

$$ F = \frac{1}{C_{st,t,\max}} \sum_{t=1}^{T} \left[ \sum_{i=1}^{stn} \left( C_{st,t}\, X_i + OM_{st,t} \right) + \sum_{j=stn+1}^{m+stn} \left( C_{l,t}\, X_j + OM_{l,t} \right) \right] \tag{2} $$

Where,
F is the total life cycle cost of the assets during the study period
T is the number of years of the study period
stn is the total number of substations, including old and new substations
m is the total number of line sections
Cst,t and Cl,t are the total investment costs of substation st and line section l at year t
OMst,t and OMl,t are the net present values of the Operation and Maintenance costs for substation st and line section l at year t
X is the binary design variable reflecting the status of substations and line sections
Cst,t,max is the highest asset cost in the system
In order to accommodate the unit differences, all values were normalized by referral
to the highest asset cost in the system. This way all objective function arguments can be
added with no compromise of units or values.
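As a rough sketch of Eq. (2) in code (our own helper with illustrative data; the paper's implementation is not reproduced), with substations indexed first in the binary status vector X and line sections after them:

```python
# Sketch of Eq. (2): life-cycle cost normalized by the highest asset cost.
def life_cycle_cost_index(C_st, OM_st, C_l, OM_l, X, C_max):
    """C_st/OM_st: per-year cost vectors for the stn substations;
    C_l/OM_l: per-year cost vectors for the m line sections;
    X: binary in-service status of each asset; C_max: highest asset cost."""
    stn = len(C_st[0])
    total = 0.0
    for t in range(len(C_st)):                  # sum over the study years
        for i in range(stn):                    # substation terms
            total += C_st[t][i] * X[i] + OM_st[t][i]
        for j in range(len(C_l[t])):            # line-section terms
            total += C_l[t][j] * X[stn + j] + OM_l[t][j]
    return total / C_max                        # normalization, Eq. (2)
```

For instance, with one substation, two line sections, a single study year and illustrative costs, the function returns the summed costs of the in-service assets divided by the largest asset cost.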

3.2 Minimization of the Contingency Index (CONDEX)

$$ CONDEX = A \cdot URI + B \cdot SRI + C \cdot SVI \tag{3} $$

Where,
A is the weighting factor of the Unified Reliability Index (URI),
B is the weighting factor of the System Resiliency Index (SRI)
C is the weighting factor of the System Vulnerability Index (SVI)

URI is mathematically defined as:

$$ URI = a_1 \cdot SAIFI + a_2 \cdot SAIDI + a_3 \cdot CEMI\text{-}4 + a_4 \cdot CELID\text{-}6 \tag{4} $$



Where,
a1, a2, a3 and a4 are the weighting factors of each reliability index
SAIFI is the reliability index known as the System Average Interruption Frequency Index
SAIDI is the reliability index known as the System Average Interruption Duration Index
CEMI-4 is the percentage of Customers Experiencing Multiple Interruptions of 4 or more
CELID-6 is the percentage of Customers Experiencing Lengthy Interruption Durations of 6 h or more
SRI is mathematically defined as:

$$ SRI = CELID\text{-}12 + N_{yrs} \tag{5} $$

Where,
CELID-12 is the percentage of Customers Experiencing Lengthy Interruption Durations of 12 h or more over a given number of years
Nyrs is the number of years, within the span over which CELID-12 is measured, in which outages of 12 h or more occurred
It is common in most cases to use 5 years as the ultimate number of years for system resiliency measurement.
SVI is mathematically defined as:

$$ SVI = c_1 \cdot NDI + c_2 \cdot NFRI + c_3 \cdot NFDI \tag{6} $$

Where,
c1, c2 and c3 are the weighting factors of each vulnerability index,
and NDI, NFRI and NFDI are as previously defined in the Problem Description section.
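A minimal sketch of how Eqs. (3)–(6) combine (all weights and index values below are illustrative placeholders, not values from the paper):

```python
# Sketch of Eqs. (3)-(6): CONDEX as a weighted sum of URI, SRI and SVI.
def uri(saifi, saidi, cemi4, celid6, a=(0.25, 0.25, 0.25, 0.25)):
    return a[0] * saifi + a[1] * saidi + a[2] * cemi4 + a[3] * celid6   # Eq. (4)

def sri(celid12, n_outage_years):
    return celid12 + n_outage_years                                      # Eq. (5)

def svi(ndi, nfri, nfdi, c=(1.0, 1.0, 1.0)):
    return c[0] * ndi + c[1] * nfri + c[2] * nfdi                        # Eq. (6)

def condex(URI, SRI, SVI, A=1.0, B=1.0, C=1.0):
    return A * URI + B * SRI + C * SVI                                   # Eq. (3)
```

Adjusting A, B and C shifts the optimization toward reliability, resiliency or vulnerability, as described above.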
The above-mentioned objectives are subjected to a number of constraints to make the simulation as close as possible to the system in the field. These constraints consist of two sets: (a) logical and (b) technical, as follows:
(a) Logical constraints:
i Radiality of the system – This constraint ensures the distribution system is optimized as a radial system with no loops. It is implemented by an algorithm, shown in Sect. 2.1, returning a flag called RadialFlag. If the flag returns 1, the system is radial; if it returns 0, the system still has loops.
ii Connectivity of all nodes to a source – This constraint ensures all nodes are supplied by at least one source (substation). It is also implemented by an algorithm returning a flag called concheck. If the flag returns 1, the system is healthy and fed by its available sources; if it returns 0, the system still has a disconnected node. The algorithm uses the path function of graph theory as its basis to check connectivity between nodes and substations.

iii Selection of only one new substation – This is performed using the fact that the
addition of the status variable of all new substations proposed to expand the
distribution system must be equal to unity. The formula for this constraint is as
follows:

$$ \sum_{i=1}^{n_{newsubs}} X_i = 1 \tag{7} $$

Where,
X is the decision variable of the optimization problem
n_newsubs is the number of candidate new substations from which one is selected

iv Keeping the existing substations – If an existing substation has enough useful life, it should be kept in service and must be selected as part of the model. This is achieved using the fact that the product of the status variables of all existing substations in the distribution system must equal unity. The formula for this constraint is as follows:

$$ \prod_{i=1}^{n_{existingsubs}} X_i = 1 \tag{8} $$

Where,
X is the decision variable of the optimization problem
n_existingsubs is the number of existing substations
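Both logical constraints (7) and (8) reduce to one-line checks; the sketch below is a hypothetical helper of ours, where `X` maps a substation id to its binary status:

```python
import math

# Sketch of the logical constraints: exactly one new substation selected
# (Eq. 7), and every existing substation kept in service (Eq. 8).
def one_new_substation_selected(X, new_ids):
    return sum(X[i] for i in new_ids) == 1              # Eq. (7)

def existing_substations_kept(X, existing_ids):
    return math.prod(X[i] for i in existing_ids) == 1   # Eq. (8)
```

Using a product for the existing substations means a single out-of-service existing substation (status 0) violates the constraint, exactly as the unity condition requires.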

(b) Technical constraints:


i Voltage limits – Voltages of all system nodes must be within the standard range between a minimum value and a maximum value.

$$ V_{\min} \le V_i \le V_{\max} \quad \forall\, i = 1, 2, 3, \ldots, n \tag{9} $$

Where,
n is the total number of nodes not including source nodes
Vmin and Vmax are the standard allowable voltage limits
Vi is the node voltage

ii Current thermal Limit

$$ I_i \le I_{\max} \quad \forall\, i = 1, 2, 3, \ldots, m \tag{10} $$

Where,
m is the total number of line sections
Imax is the allowable thermal current limit of the line section conductor
Ii is the line section current flow

iii Power balance for each substation – By adding all power flowing out of a substation and comparing this power to the substation capacity, a power balance index can be formulated. This is usually achieved by performing a load flow and adding the power flow in the first section of each feeder emerging from each substation. This condition can be expressed as follows:
$$ \frac{\sum_{i=1}^{n_{feeders}} PowerFlow_i}{Substation_{capacity}} \; \begin{cases} > 1 & \Rightarrow \; PowerFlag = 1 \\ < 1 & \Rightarrow \; PowerFlag = 0 \end{cases} \tag{11} $$

Where,
n_feeders is the number of feeders emerging from the substation
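The three technical checks can be sketched as follows (illustrative helper names and limits of our own; the PowerFlag convention follows Eq. (11), where a flag of 1 indicates feeder flows exceeding the substation capacity):

```python
# Sketch of the technical constraints (9)-(11).
def voltages_within_limits(V, v_min, v_max):
    return all(v_min <= v <= v_max for v in V)              # Eq. (9)

def currents_within_limits(I, I_max):
    return all(i <= i_max for i, i_max in zip(I, I_max))    # Eq. (10)

def power_flag(feeder_flows, substation_capacity):
    # flag = 1 when the summed first-section feeder flows exceed capacity
    return 1 if sum(feeder_flows) / substation_capacity > 1 else 0  # Eq. (11)
```

In a per-unit system the voltage limits would typically be values such as 0.95 and 1.05, while the current limits are the conductor thermal ratings of each line section.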

4 Solution Methodology

The solution methodology of the DSE problem, modeled in this paper, starts by storing the values and parameters of the original system for comparison purposes. The methodology then calculates the objective function and starts optimizing the system by finding the minimum objective function value subject to the identified constraints. Figure 3 shows an overview of the proposed solution methodology.
The solution of the optimization problem was obtained using the MOGA. The GA is an established algorithm that appeared in the early 1990s [25]. GAs (Goldberg 1989) are search algorithms based on the principles of natural genetics and evolution.
Figure 4 shows the flow chart of the GA based algorithm for solving the opti-
mization problem.

Fig. 3. Overview of the solution methodology

The stopping criteria, mentioned in Fig. 4, determine when to stop the GA. These include reaching the maximum number of iterations, obtaining a solution within the maximum tolerance of the previous solution, reaching the maximum number of population generations, etc.
GAs have proven to be a useful approach to a wide variety of optimization problems. Being a population-based approach, the GA is well suited to solving multi-objective optimization problems. In this work, MOGA is applied to solve the proposed multi-objective, single-representation DSE planning problem.
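For illustration, a minimal single-objective GA over a binary decision vector might look as follows. This is a toy sketch with a stand-in fitness function; the paper's MOGA with the full objective and constraint set is not reproduced here.

```python
import random

# Minimal GA sketch over a binary decision vector (illustrative only).
def genetic_algorithm(objective, n_bits, pop_size=30, generations=50,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=objective)

    def select():
        # binary tournament selection
        a, b = rng.sample(pop, 2)
        return a if objective(a) < objective(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = select()[:], select()[:]
            if rng.random() < crossover_rate:        # one-point crossover
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for i in range(n_bits):              # bit-flip mutation
                    if rng.random() < mutation_rate:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children[:pop_size]
        best = min(pop + [best], key=objective)      # elitist bookkeeping
    return best

# toy run: minimizing the number of selected assets drives toward all zeros
solution = genetic_algorithm(lambda x: sum(x), n_bits=10)
```

In the DSE setting, the bit vector would encode the status variables X of substations and line sections, and the objective would be Eq. (1) with penalty terms for any violated constraint flags.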

Fig. 4. Genetic algorithm flow chart

5 Case Studies

Test cases have been performed to demonstrate the viability and effectiveness of the proposed model and the optimized solutions obtained. Two test cases were chosen and presented in this paper: the 14-node and the 37-node test systems.
The first test case, the 14-node test system, is presented in detail with deep analysis of its parameters and the obtained solutions. The proposed model was examined on this test case using two basic scenarios. The first scenario, called Case (a), reflects a case with all line sections having outage durations of less than 12 h. The second scenario, called Case (b), reflects a case where two line sections were modified to have outage durations of more than 12 h.

The second test case, the 37-node test case, is very similar to the first one and is therefore presented only briefly, with some discussion of its obtained results. This test case was modified from the typical IEEE 37-node test case to reflect a balanced system, as well as the addition of a large DG connected directly to the existing substation.
Both test cases obtained good results, with clear improvement using the proposed combination of cost and performance parameters in the objective function.

5.1 Test Case I: The 14-Node Test System


Several scenarios were used for testing the algorithm on the 14-node test system. The first scenario provides parameters such that the SRI index is zero, meaning all line sections require less than 12 h to restore in case of an outage.
The parameters of this test case are presented in Table 1. The failure rate and duration of outages are functions of each line section's age, installation quality, environment, erosion factors and location. These numbers are typical values for test purposes only and can be modified as required.

Table 1. 14-node test case line parameters

Line section no. | From | To | Conductor size (AWG) | Length (m) | Original status | Modified status | Failure rate (failures/year) | Duration of outage (h/year)
1 | 1 | 10 | 556.5 | 7290 | 1 | 1 | 1 | 0.2
2 | 2 | 10 | 556.5 | 5180 | 1 | 1 | 1 | 0.3
3 | 3 | 10 | 556.5 | 24,390 | 1 | 0 | 1 | 0.5
4 | 3 | 11 | 556.5 | 700 | 0 | 1 | 1 | 8.5
5 | 8 | 11 | 556.5 | 4530 | 0 | 1 | 1 | 0.7
6 | 9 | 11 | 556.5 | 1625 | 0 | 1 | 1 | 0.9
7 | 1 | 6 | 350 | 7320 | 1 | 0 | 2 | 1
8 | 2 | 4 | 350 | 5260 | 1 | 1 | 5 | 1.5
9 | 2 | 5 | 350 | 4770 | 1 | 1 | 2 | 0.6
10 | 2 | 7 | 350 | 6250 | 0 | 1 | 3 | 0.4
11 | 6 | 8 | 350 | 1890 | 1 | 1 | 7 | 7
12 | 7 | 8 | 350 | 4630 | 1 | 0 | 4 | 0.9
13 | 8 | 9 | 350 | 1000 | 1 | 0 | 2 | 0.3
14 | 12 | 13 | 556.5 | 700 | 0 | 0 | 1 | 0.2
15 | 3 | 12 | 556.5 | 725 | 0 | 0 | 1 | 0.6
16 | 12 | 14 | 556.5 | 121 | 0 | 0 | 1 | 0.4
17 | 5 | 13 | 350 | 3850 | 1 | 1 | 2 | 6
18 | 7 | 14 | 350 | 5100 | 1 | 1 | 3 | 1.1

The existing and the proposed substations' data for this system are shown in Table 2.

Table 2. Substations of 14-node test system

Substation node ID | Capacity (kVA) | Capital cost ($k) | O&M annual costs ($k) | Existing/New
10 | 1000 | 7500 | 100 | Existing
11 | 2500 | 4000 | 150 | New
12 | 1500 | 1200 | 60 | New

The original 14-node test system and the optimized system are both shown in
Fig. 5. While Case (a) presents the original system that required attention from the
planner, Case (b) presents the proposed modified system after applying the objective
functions and all constraints.

Fig. 5. 14 Node existing and modified test systems

Figure 5, Case (a), shows the original system, which was a radial system supplied by substation 10. As a result of the load growth of the system, two feasible substations are proposed in two different locations, each with three emerging feeders to supply the load growth. It is required to select only one substation and determine the optimum system configuration that minimizes overall costs while achieving the best reliability indices.
After running the proposed algorithm on this system using all constraints, the result becomes Case (b), which proposes the transfer of four nodes from substation 10 to substation 11. Substation 11 is the selected candidate, and the system can now operate using the proposed configuration.

Table 3 presents the cost and performance values for both cases shown in Fig. 5. It is obvious that, in order to improve the system performance and change it from a fully radial system to an open-loop system, there will be an increase in costs. The open-loop system operates in a radial fashion with internal open line sections, called ties, used mainly during contingencies. The cost has increased by approximately 1.5 times; however, there is a significant improvement in URI and SVI, which represent the system performance in this case. SRI shows zero values in this case because none of the line sections has a failure duration of more than 12 h.

Table 3. 14-node test case comparison of objective values

Index | Original system, Case (a) | Failure durations less than 12 h, Case (b) | Failure durations greater than 12 h on lines 5 and 6
F | 1.0312 | 1.5578 | 1.1894
URI | 19.6933 | 4.2031 | 4.956
SRI | 0 | 0 | 0
SVI | 11.0736 | 5.7909 | 6.3029
COSTINDEX | 31.7981 | 11.5518 | 12.4483
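As a quick consistency check (ours, not from the paper), the COSTINDEX rows above equal the plain sum F + URI + SRI + SVI for each case, in line with Eq. (1):

```python
# Cross-check: COSTINDEX = F + URI + SRI + SVI for each column of Table 3.
cases = {
    "original":       (1.0312, 19.6933, 0.0, 11.0736, 31.7981),
    "case_b":         (1.5578, 4.2031,  0.0, 5.7909,  11.5518),
    "lines_5_6_long": (1.1894, 4.956,   0.0, 6.3029,  12.4483),
}
checks = {name: abs(F + URI + SRI + SVI - total) < 1e-3
          for name, (F, URI, SRI, SVI, total) in cases.items()}
```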

To further test the system, another run was made on the same test case after changing some of the line section failure durations to more than 12 h. For comparison purposes, the result of this new modified case is also presented in Table 3.
Introducing SRI to the optimization process changes its result. In the same original test case, presented in Fig. 5, Case (a), the outage durations of line sections 5 and 6 were increased to 13 and 15 h, respectively. If Case (b) had been maintained, its SRI would have become 3.3 and the total objective function value would have been 18.6.
The impact of this failure duration change was that the optimization algorithm chose substation 10 to be in service, fulfilling the constraint of keeping the existing substation, and selected substation 12, instead of substation 11 of Case (b), as part of the solution to the expansion problem. Line sections 14 and 16 were accordingly recommended to be in service, and the final configuration became as seen in Fig. 6.
This shift in substation choice is logical, as the algorithm tried to avoid supplying the system through the high-failure-duration sections 5 and 6. SRI still measures zero because these two lines were avoided.

Fig. 6. Modified Network Supplied from Substations 10 and 12 after increasing durations of
outages of line Sections 5 and 6.

As expected, the objective function's total value for the new case is higher than that of Case (b). However, the final values are still much better than the original objective function value with substation 10 supplying the entire system.

5.2 Test Case II: The 37-Node Test System


Similar to the 14-node test system, a 37-node test system was also used and analyzed to test the proposed optimization solution. In this test system, one existing substation supplies the entire load of the system, and there are three proposed substations in different locations and at different distances from the existing line sections. The 37-node test system is shown in Fig. 7.
While the 14-node test system did not contain a DG connected to the system, the 37-node test system has a DG connected directly to the existing substation tied to node 4. The parameters of this 37-node test system are similar to those of the 14-node test system, except with a larger number of substations and line sections. The proposed substations are numbered 38, 39 and 40, and they represent three different locations with three and four feeders, as shown in Fig. 7. Due to the size of this test case, and to avoid crowded figures, only the original system is presented.

Fig. 7. The 37-node test system

The indices and the objective function values of the original and the optimized systems are shown in Table 4. While the SRI value remained almost the same and the cost (F) deteriorated, URI and SVI improved, hence improving the total objective function value.

Table 4. 37-node test case comparison of objective values

Index | Original system case | Optimized system case
F | 1.0475 | 1.3425
URI | 6.0262 | 5.1416
SRI | 3.0495 | 3.0487
SVI | 5.2298 | 4.4474
COSTINDEX | 15.353 | 13.98

Line sections to be closed: 40, 41, 42 and 43. Line sections to be opened: 4, 9 and 23.

In this test case, line sections 10 and 39 are assumed to have failure durations of 15 and 18 h per failure per year. Since line section 39 is the main line of one of the proposed substations to the system, the algorithm was able to avoid it in the optimization process by excluding substation 38 from the selection. Line 10, however, is on the pathway of all substations; hence it was selected in all options and is unavoidable when optimizing the system. As in the original case, line 10 is also one of the main components of the system and cannot be set to open. Therefore, the improvement in SRI was marginal, as the algorithm searched for a lower value by avoiding other line sections with fewer customers, given its inability to change the failure duration or to set this line section to open.

6 Conclusion

In this work, a new model for the DSE problem was proposed. The new model combined three performance indicators with the commonly used cost function in a multi-objective function. Seven constraints were used in the solution for the first time. The proposed model demonstrates its viability in arriving at an optimum solution, considering the modern approaches of smart grids, including performance, when planning the expansion of distribution systems. After testing the model on two test systems with variable parameters, it can be concluded that the model is a practical, implementable model that proposes a solution suitable for finding a trade-off between cost and performance. The model can be easily applied in utilities and is recommended for use by planners to help them make the best investment decisions.

References
1. Luong, N.H., Grond, M.O.W., La Poutre, H., Bosman, P.A.N.: Scalable and practical multi-
objective distribution network expansion planning. In: IEEE Power and Energy Society
General Meeting (2015)
2. Vaziri, M., Tomsovic, K., Bose, A., Gonen, T.: Distribution expansion problem: formulation
and practicality for a multistage globally optimal solution. In: IEEE, Power Engineering
Society Winter Meeting (2001)
3. Cossi, A.M., da Silva, L.G., La Zaro, R.A.R., Mantovani, J.R.S.: Primary power distribution
systems planning taking into account reliability, operation and expansion costs. In: IEEE,
The Institute of Engineering and Technology (IET) Generation, Transmission and
Distribution, no. ISSN 1751-8687 (2011). https://doi.org/10.1049/iet-gtd.2010.0666
4. de Souza, J., Rider, M.J., Mantovani, J.R.S.: Planning of distribution systems using mixed-
integer linear programming models considering network reliability. J. Control Autom. Electr.
Syst. (2015). https://doi.org/10.1007/s40313-014-0165-z
5. Mazhari, S.M., Monsef, H., Romero, R.: A multi-objective distribution system expansion
planning incorporating customer choices on reliability. IEEE Trans. Power Syst., 1330–1340
(2015). https://doi.org/10.1109/TPWRS.2015.2430278
6. Muñoz-Delgado, G., Contreras, J., Arroyo, J.M.: Reliability assessment for distribution
optimization models: a non-simulation-based linear programming approach. In: IEEE, Power
and Energy Society General Meeting (2017)
7. IEEE std. 1366-2012 IEEE Guide for Electric Power Distribution Reliability Indices. IEEE
Power and Energy Society (2013)
8. Zare-Bahramabadi, M., Abbaspour, A., Fotuhi-Firuzabad, M., Moeini-Aghtaie, M.:
Resilience-based framework for switch placement problem in power distribution systems.
IET Gener. Transm. Distrib. 12(5), 1223–1230 (2018). https://doi.org/10.1049/iet-gtd.2017.
0970

9. Chee-Wooi, T., Chen-Ching, L., Govindarasu, M.: Vulnerability assessment of cybersecurity


for SCADA systems. IEEE Trans. Power Syst. 23(4), 1836–1846 (2008)
10. Johansson, J.: Risk and vulnerability analysis of large-scale technical infrastructures. Ph.D.
thesis, Media-Tryck, Lund University, Lund, Sweden (2007)
11. Chen, J., Peng, M., Gao, X., Li, G.: Multi-objective distribution network planning
considering invulnerability. In: IEEE 2nd Information Technology, Networking, Electronic
and Automation Control Conference (ITNEC), Chengdu, China (2017)
12. Yang, X.-S.: Nature-Inspired Optimization Algorithms. Elsevier, Waltham (2014)
13. Ramírez-Rosado, I.J., Bernal-Agustín, J.L.: Genetic algorithms applied to the design of large
power distribution systems. IEEE Trans. Power Syst. 13(2), 696–703 (1998)
14. Yang, X.-S.: Nature-Inspired Optimization Algorithms. Elsevier Inc., New York (2014)
15. Pereira Jr., B.R., Contreras, J., Mantovani, J.R.S., Cossi, A.M.: Multiobjective multistage
distribution system planning using tabu search. In: IEEE, The Institute of Engineering and
Technology (IET) Generation, Transmission and Distribution, no. ISSN 1751-8687 (2013).
https://doi.org/10.1049/iet-gtd.2013.0115
16. Coello, C.A.C.: An updated survey of GA-based multiobjective optimization techniques.
ACM Comput. Surv. 32(2), 109–143 (2000)
17. Turkay, B.: Distribution system planning using mixed integer programming. In:
ELEKTRIK, Istanbul, Tubutak Emo, vol. 6, no. 1 (1998)
18. Gonen, T., Ramirez-Rosado, I.J.: Optimal multi-stage planning of power distribution
systems. IEEE Trans. Power Deliv., 512–519 (1987). https://doi.org/10.1109/TPWRD.1987.
4308135
19. Sindi, H., El-Saadany, E.: Unified reliability index development for utility performance
assessment. Intell. Ind. Syst. 2(2), 149–161 (2016)
20. Aghaei, J., Muttaqi, K.M., Azizivahed, A., Gitizadeh, M.: Distribution expansion planning
considering reliability and security of energy using modified PSO algorithm. University of
Wollongong Research online, Faculty of Engineering and Information Sciences papers,
Wollongong, Australia (2014)
21. Kumar, V., Krishan, R., Sood, Y.R.: Optimization of radial distribution networks using path
search algorithm. Int. J. Electron. Electr. Eng. 1(3), 182–187 (2013)
22. Abdelaziz, A.Y., Osama, R.A., El-Khodary, S.M.: Reconfiguration of distribution systems
for loss reduction using Hyper-Cube Ant Colony optimization algorithm. IET Gener.
Transm. Distrib. 6(2), 176–187 (2012)
23. Balakrishnan, R., Ranganathan, K.: A Textbook of Graph Theory. Springer, New York (2013)
24. Floyd, R.W.: Algorithm 97: shortest path. Commun. ACM 5(6), 345–350 (1962)
25. Heidari, S., Fotuhi-Firuzabad, M., Kazemi, S.: Power distribution network expansion
planning considering distribution automation. IEEE Trans. Power Syst. 30(3), 1261–1269
(2015)
Connecting to Smart Cities: Analyzing Energy
Time Series to Visualize Monthly Electricity Peak
Load in Residential Buildings

Shamaila Iram¹(✉), Terrence Fernando², and Richard Hill¹

¹ University of Huddersfield, Huddersfield, UK
S.Iram@hud.ac.uk
² University of Salford, Greater Manchester, UK

Abstract. The rapidly growing energy consumption rate is considered an alarming threat to economic stability and environmental sustainability. There is an urgent need for novel solutions that mitigate the drastic impact of increased energy demand in urban cities and improve energy efficiency in smart buildings. It is commonly agreed that exploring, analyzing and visualizing energy consumption patterns in residential buildings can help to estimate their energy demands. Moreover, visualizing the energy consumption patterns of residential buildings can also help to diagnose any unpredictable increase in energy demand at a certain time period. However, visualizing and inferring energy consumption patterns from typical line graphs, bar charts and scatter plots is obsolete and less informative, and does not provide deep and significant insight into daily domestic energy utilization. Moreover, these methods become less significant when high temporal resolution is required. In this research work, advanced data exploration and data analytics techniques are applied to energy time series. Data exploration results are presented in the form of a heatmap, which provides significant insight into energy utilization behavior during different times of the day. Heatmap results are articulated from three analytical perspectives: descriptive analysis, diagnostic analysis and contextual analysis.

Keywords: Energy efficiency · Smart buildings · Data analytics · Heatmap

1 Introduction

In recent years, energy data analytics has received tremendous attention from researchers, economists, industrialists, and policy makers all over the world. This may be because of the shortage of natural resources, environmental destruction, or the proliferation of energy demand due to the development of urban cities. Confronted with this rapid increase in energy demand, researchers and scientists are increasingly interested in designing and developing advanced techniques and methods that can help cope with energy crises, or at least mitigate their worst consequences.
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 333–342, 2019.
https://doi.org/10.1007/978-3-030-02686-8_26

Moreover, the rapidly increasing energy consumption rate poses an alarming threat to worldwide environmental sustainability and economic stability. International Energy Agency (IEA) statistics reveal that 32% of total final energy is consumed by buildings [1]. This percentage is even higher in non-industrial areas. How people consume energy depends on human behaviour and other social, economic, environmental and geographical factors [2].
In recent years, energy efficiency and saving strategies have become a priority objective for energy policies, due to the proliferation of energy consumption and CO2 emissions in the built environment. According to statistics, 40% of all primary energy is consumed in and by buildings [3]. The International Energy Agency (IEA) in [5] claims that "Energy efficiency is a critical tool to relieve pressure on energy supply and it can also mitigate in part the competitive impacts of price disparities between regions".
Analyzing energy patterns and identifying variations in energy usage with the help of data mining techniques will help to build energy-efficient buildings. It has been evident over the past 40 years that increasing the energy efficiency of buildings helps not only to combat climate change but also to reduce energy consumption [4].
Furthermore, this research work presents a framework that brings multi-domain
knowledge to an interdisciplinary project to solve an unaddressed, or only partially
addressed, issue in the domain of energy-efficient smart buildings. In doing so, it
elucidates the importance of mapping multi-domain experts' opinions when developing
new policies and deploying significant changes. This new approach, which combines
social, economic, behavioural and psychological, environmental, statistical and
computational phenomena, offers a dynamic and compelling framework for designing
energy-efficient buildings. This research work also acts as a bridge across the
communication gap between the research community and policy makers, enabling
intelligent decisions based on scientific evidence.

1.1 Time Series Analysis


Time series analysis is concerned with forecasting a specific quantity, given that the
variations in that quantity over time are already known. Other predictive models, which
do not involve time series, mainly focus on analysing a cross-section of the data that
has no time-varying component. As stated by Kosorus et al. in [6], "When
a variable is measured sequentially in time over or at a fixed interval (sampling interval)
the resulting data represents a time series". They further elaborated that a time series is
a collection of observations arranged in a natural order, where each observation is
associated with a particular instance or interval of time.
More specifically, a time series, unlike common data, holds a natural temporal
ordering, whereas common data does not necessarily have a natural ordering of the
observations. Furthermore, Millan et al. [7] defined time series analysis as the process
of using statistical techniques to model and explain a time-dependent series of data
points, whereas time series forecasting uses a prediction model to forecast future
events based on past events.
This research work also presents the application of different kinds of analytical and
visualization techniques to understand energy utilization patterns in a residential
building. The data analytical results are visualized as a heatmap, and the heatmap
results are articulated from three analytical perspectives: descriptive analysis,
diagnostic analysis and contextual analysis. The rest of the paper is structured as
follows: state-of-the-art work
Connecting to Smart Cities 335

is presented in Sect. 2, followed by the methodological framework in Sect. 3.
Exploratory data analytical techniques are elaborated in Sect. 4, whereas Sect. 5
details the data used in this research work along with the data preprocessing
techniques. The application of heatmap examples is explained in Sect. 6. Section 7
provides a brief summary of the work along with conclusions and future research
directions.

2 Literature Survey

Platchkov and Pollitt [8] critically analysed the longer-run trends of increasing
global electricity demand and explained their potential impact on UK electrification.
They claimed that the underlying resource cost of the energy used at different times
of the day or year changes accordingly. For instance, on an off-peak day the price per
megawatt hour (MWh) in the power market does not rise above £50/MWh, whereas on a
peak day the price may reach £800/MWh for half-hour periods across a 24-h period.
This implies that, on median days, there is a comparatively strong incentive to use
electricity during the night. The main emphasis of their work is that demand will
increase steadily over time, and that a possible coping solution is to shift energy
demand to off-peak times.
Therefore, a small demand response, either reducing consumption or shifting it to a
cheaper time, can make a significant difference in cost for residential as well as
commercial buildings. This shows the significance of shifting demand to off-peak
times, which is also called load balancing. Furthermore, identifying the factors that
trigger peak energy demand for a specific period of time in a building could
potentially help to improve the building's heating, ventilation and air conditioning
(HVAC) system. In addition, a sudden peak in energy consumption can be caused by a
malfunction or by exceptional human behavior. Finding the possible causes of high
energy demand over a certain period of time can lead to appropriate solutions and
ultimately to control of energy demand. Understanding this demand and supply
behavior in residential areas will further support sustainable and renewable energy
technology.
Hsu [2] states that selecting key variables and interactions is an important step in
achieving more accurate predictions, better interpretations, and identification of key
subgroups in energy datasets for further analysis. Jenkins et al. [8] visualize energy
data to examine the monthly demand of substations and their synthesized equivalents.
Walker and Pokoski [9] developed a model of residential electric load in which they
introduced psychological factors, based on a person's availability, that can affect the
individual use of electrical appliances at a given time. Before that, in the early
nineties, Capasso et al. [10] applied a bottom-up approach to develop the "Capasso
Model". This model uses socioeconomic and demographic data, for instance the stock of
appliances and their usage pattern in a household, to model a load curve. The load
shape shows the relationship between the demand of residential customers and the
psychological and behavioral factors of the house occupants. Later, in 2002, Willis
[11] used the bottom-up approach to model a typical demand forecasting scheme for
individual customers.

3 Methodological Framework

The proposed methodological framework for energy-efficient smart buildings, shown in
Fig. 1, provides a foundation for the complex, diverse, contextually aware, eco-driven
and intelligently monitored nature of energy demand, which frequently requires a
multi-domain, interdisciplinary approach to research. The framework articulates the
energy efficiency paradigm with respect to four significant attributes that should be
considered to improve end-use energy efficiency and to reduce energy demand. The
embedded features are predicated on issues related to global climate change, social
behavior, economic productivity, and modelling exceptionally large energy datasets to
explore and interpret interesting, useful patterns of energy usage.

Fig. 1. A methodological framework for cross-disciplinary knowledge exchange for the
design and development of energy-efficient smart buildings.

The first crucial step towards a particular milestone is to identify and analyze the
problems, issues and concerns of different stakeholders in order to develop a shared
vision with common understanding and clear targets. The most important factor to be
considered in constructing smart buildings or smart cities is "human beings":
everything that we construct should be human-oriented. Creating a comprehensive
roadmap will help us focus on high-return predictive analytics with clear, pre-defined
destinations and achievable milestones, which is a starting point for gaining a better
understanding of customers' requirements.
Hence, as part of this research work, one of the milestones is to classify the
prerequisites that provide a foundation for a globally acceptable socio-technical
strategy for building smart buildings and smart cities. This will help to tackle all
the issues that are in the mutual interest of different stakeholders. Since this is a
long-term ongoing project, this first part of the research work has already been
accomplished and published [12].

Our next research question is: what is the role of data science in the design and
development of energy-efficient smart buildings? In this research work, advanced
analytical methods and visualization techniques are used to explore complex energy
datasets in order to understand the energy consumption patterns of a residential
building.

4 Data Exploration: A Possible Solution

Data can be explored, analyzed, visualized and described at different levels of
maturity. Most of the existing literature distinguishes four informative levels of
data exploration, depending on the complexity of the case studies in question. These
are recognized as descriptive analysis, diagnostic analysis, predictive analysis and
prescriptive analysis [1].
However, what is neglected in most case study analyses is understanding the
circumstances in which a particular event happened. This is usually called contextual
awareness. Credibility of the results can only be attained by linking the outcome of a
particular analysis with the situation in which it occurs. We recommend contextual
analysis as a complementary method for describing any analytical results. Data
analytics can therefore be described from five different perspectives, as listed in
Table 1.

Table 1. Data exploration types, description and examples

Analytic type           Description                              Example
Descriptive analysis    What is happening?                       Historical data reports
Diagnostic analysis     Why did it happen?                       Fault detection
Predictive analytics    What is likely to happen?                Cost prediction
Prescriptive analysis   What should we do about it?              Cost optimization
Context analysis        In which circumstances did it happen?    Situation dependency

As mentioned earlier, this research work aims to understand the energy utilization
patterns of a residential building and to identify any unusual data behavior and its
causes. Hence, the analysis is carried out from three different perspectives:
• Understanding energy utilization patterns → Descriptive Analysis
• Identifying extreme or abnormal data values → Diagnostic Analysis
• Finding the root cause of normal and extreme behavior → Context Analysis

5 Data Description

For this preliminary research, data is collected from 32 different houses in the
Manchester area, across several domains. In the Building Information domain, data is
collected on the archetype of the buildings, their age, their addresses (as longitude
and latitude), class, construction type, ownership, floor area and air test. Fifteen
different archetypes of buildings were found in the area, named as BISF, Brick and
block, Detached 1980s brick and block, End terrace pre 1919 solid wall, Flat wimpey-
no-finess non-trad, Mid terrace pre 1919 solid wall, Semi-detached pre 1919 solid
wall, Semi-detached 1919 solid wall, Semi-detached 1920s solid wall, Semi-detached
1930s solid wall, Semi-detached 1970s brick and block cavity, Semi-detached pre 1800
brick, Terraced pre 1919 solid wall and Wates. The age of the buildings is categorised
as 1920s, 1930s, 1950s, 1960s, 1970s, 1980s, pre 1800 and pre 1919. Classes are
defined as Detached, End-terraced, Flats, Mid-terraced and Semi-detached. Construction
type is recognized as Traditional or Non-traditional. Floor area is measured in square
meters (m²) and is further classified into three bands: Small (<50 m²), Medium
(50–100 m²) and Large (>100 m²). Air permeability results for the air leakage test are
categorised into three bands: <5 m³/(m²·h), 5–10 m³/(m²·h) and >10 m³/(m²·h).
Demographic information collected in the Human Information domain comprises age,
gender, family composition and health status. Family composition is further recognised
as single occupants, working couples, small family, small family of three, family of
four, family of five, family of six, retired singles, retired couples, family of five
with retired couples, and short-term occupants with complex needs. In the Services
domain, data is collected on electricity and gas usage in kWh/m² for one complete
year. Electricity data is clustered into three bands (<35 kWh/m², 35–40 kWh/m² and
>40 kWh/m²), and gas data is likewise clustered into three bands (<120 kWh/m²,
120–140 kWh/m² and >140 kWh/m²).
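The electricity and gas bands above are simple threshold bins. As an illustrative sketch only (the paper's preprocessing was done in R; this Python helper and its sample values are our assumptions, not the study's code or data), the banding can be expressed as:

```python
from bisect import bisect_left

def band(value, edges, labels):
    """Assign a value to a band given sorted upper edges; bisect_left puts a
    value exactly on an edge into the lower band."""
    return labels[bisect_left(edges, value)]

# Band edges taken from the text: electricity <35, 35-40, >40 kWh/m^2;
# gas <120, 120-140, >140 kWh/m^2.
ELEC_EDGES, ELEC_LABELS = [35, 40], ["<35", "35-40", ">40"]
GAS_EDGES, GAS_LABELS = [120, 140], ["<120", "120-140", ">140"]

print(band(37.5, ELEC_EDGES, ELEC_LABELS))   # falls in the 35-40 band
print(band(150.3, GAS_EDGES, GAS_LABELS))    # falls in the >140 band
```

The text does not specify where boundary values fall, so the closed/open convention at the band edges is also an assumption.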

5.1 Data Preprocessing


To understand the data distribution, to find any outliers caused by extreme external
behavior or malfunctioning sensor devices, and to prepare the data for analyzing and
visualizing the heatmap, the energy dataset is preprocessed. First, the empirical
Cumulative Distribution Function (CDF) is applied to the datasets to understand the
probability distribution of the random variables they contain. Equations (1) and (2)
define the empirical cumulative distribution function Fn(t), which is an estimate of
the true CDF obtained without making any assumptions about the underlying
distribution.

F(t) = P(X ≤ t)                                          (1)

Fn(t) = (number of sample values ≤ t) / n                (2)
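Equation (2) is straightforward to compute directly. The following sketch (our illustration; the readings are hypothetical, not the study's data) estimates Fn(t) for a small sample:

```python
def empirical_cdf(sample, t):
    """Eq. (2): Fn(t) = (number of sample values <= t) / n."""
    return sum(1 for x in sample if x <= t) / len(sample)

# Hypothetical half-hourly temperature readings (deg C)
readings = [18.2, 19.1, 19.5, 20.0, 20.4, 21.3, 22.0, 22.8]
print(empirical_cdf(readings, 20.0))  # 0.5: half the readings are at or below 20 degrees
```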
Figure 2(a) is a visual representation of the CDF of the temperature dataset for the
whole building over one month, covering the hallway, lounge and bedrooms. Figure 2(b)
shows a boxplot diagram used to identify extreme data behavior, which is sometimes due
to a malfunction in the devices.

Fig. 2. (a) Cumulative distribution of dataset. (b) Outliers identification with Boxplot diagram.
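The boxplot in Fig. 2(b) flags outliers by the usual whisker rule: points beyond 1.5 × IQR from the quartiles. A minimal sketch of that rule, assuming linearly interpolated quartiles and made-up readings (the paper does not give its exact quartile convention or data):

```python
def iqr_outliers(values):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], the usual boxplot whisker rule."""
    s = sorted(values)

    def quantile(q):
        # simple linear-interpolation quantile
        pos = q * (len(s) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

temps = [19.0, 19.5, 20.1, 20.3, 20.8, 21.0, 21.4, 55.0]  # 55.0 mimics a sensor fault
print(iqr_outliers(temps))  # only the implausible reading is flagged
```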

The temperature dataset was collected for one complete year for all 40 buildings.
However, to keep the analysis and visualization simple for this research work, a
one-month dataset (January) was selected for one residential building. The dataset was
prepared using functions from R¹ packages such as lubridate and timeseries, and the R
classes POSIXct and POSIXlt.
After discussion, it was decided to resample the datasets to a different timestamp to
remove any suspicious or null values. The temperature data was originally collected
every five seconds, 24 h a day, for one year. To reduce the probability of outliers,
the dataset was resampled to half-hour intervals, removing the chance that extreme or
malfunction-related data behavior could affect the results. After that, the heatmap
algorithms were designed using the R package ggplot2. Details of the heatmap
application are articulated in the next section.
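The paper performs this resampling in R; the following Python sketch (the function name and sample data are our illustrative assumptions) shows the same reduction from 5-second readings to half-hour averages, discarding null values along the way:

```python
from datetime import datetime, timedelta

def resample_half_hourly(samples):
    """Average (timestamp, value) pairs into 30-minute buckets, dropping None values.

    `samples` is an iterable of (datetime, value-or-None); returns
    {bucket_start_datetime: mean_value}, mirroring the paper's reduction of
    5-second readings to half-hour resolution.
    """
    buckets = {}
    for ts, value in samples:
        if value is None:          # discard null/suspicious readings
            continue
        bucket = ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)
        buckets.setdefault(bucket, []).append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

start = datetime(2016, 1, 1, 0, 0, 0)
raw = [(start + timedelta(seconds=5 * i), 20.0 + (i % 2)) for i in range(720)]  # one hour of 5-s data
halfhour = resample_half_hourly(raw)
print(halfhour)  # two buckets, starting 00:00 and 00:30
```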

6 Peak Identification: A Heatmap Example

Once the data is preprocessed and cleaned, the next step is to visualize the energy
utilization patterns of a residential building. For this, we selected a building
occupied by a working couple. The idea is to understand the usual energy utilization
behavior for each day of a month and, beyond that, to diagnose whether any extreme or
unusual data patterns can be identified in the datasets.
As explained earlier, the R library ggplot2 was selected to implement the heatmap.
Figure 3 provides a visual representation of the heatmap; the data values are
categorized from 0–2000 kWh, and the color bar uses dark blue, red and yellow, where
dark blue represents the lowest data values and yellow the most extreme. Each data
point in the heatmap represents a value for half an hour, extending from

¹ https://www.r-project.org/.

0–24 h on the x-axis, while the y-axis represents each day of the month. The heatmap
will help us to perform descriptive, diagnostic and contextual analysis.

Fig. 3. Heatmap example to diagnose regular and extreme data behavior for a residential building.
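The heatmap itself was built with ggplot2 in R; underneath, it is a pivot of the half-hourly series into a day-by-slot matrix (48 half-hour slots per day). A Python sketch of that pivot, and of locating the brightest (yellow) cell, using hypothetical values of our own:

```python
def pivot_day_by_slot(readings):
    """Turn (day, slot, kwh) triples into a {day: [48 values]} matrix; slot is 0..47."""
    matrix = {}
    for day, slot, kwh in readings:
        row = matrix.setdefault(day, [0.0] * 48)
        row[slot] = kwh
    return matrix

def hottest_cell(matrix):
    """Return (day, slot, value) of the maximum cell, i.e. the brightest heatmap point."""
    return max(((d, s, v) for d, row in matrix.items() for s, v in enumerate(row)),
               key=lambda cell: cell[2])

# Hypothetical half-hourly consumption values, not the study data
demo = [(1, 14, 350.0), (1, 36, 900.0), (2, 14, 300.0), (2, 44, 1950.0)]
m = pivot_day_by_slot(demo)
print(hottest_cell(m))  # the cell that would render in the yellow band
```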

As Fig. 3 shows, there are some regular and some irregular energy utilisation patterns
across the days of the month. From 11:00 PM to 7:00 AM the data values fall within the
blue band, indicating low energy usage; this period is highlighted as night time in
the figure. From 7:00 AM to around 11:30 AM there is comparatively higher electricity
usage, probably because everyone in the home is using electricity for normal household
activities at that time of day; this appears as red squares in the figure. During the
daytime there is again not much activity at home, comparable to the night time,
probably because the occupants have left the house for work. Then, between 5:30 PM and
11:00 PM, higher energy consumption can be visualised, when usually everybody is at
home and engaged in different activities.
Moreover, it is evident from the description above that linking analytical results to
their particular context helps to explain the lowest and highest electricity
consumption at particular times of the day.
Apart from the normal energy utilisation patterns, some extreme data behaviour can
also be seen in the heatmap. For instance, the yellow points in the map indicate
extreme or abnormal energy utilisation. This implies that there could be an
abnormality in the devices integrated in the house, or that the residents behaved
unusually. Identifying abnormal or extreme behaviour in energy consumption patterns is
called diagnostic analysis of the data. It also suggests that further investigation is
warranted to find the root causes of the extreme behaviours responsible for extreme
energy utilisation.

7 Summary and Conclusion

The increased energy demand of residential as well as commercial buildings in recent
years is depleting our natural energy resources and degrading the whole ecosystem. New
and effective solutions are required to control the high rate of energy consumption in
buildings. This research work proposed a holistic multidisciplinary framework for
exchanging knowledge and understanding across different domains for the design and
development of sustainable, energy-efficient buildings. The framework also presents a
collaboration model for sharing knowledge among different stakeholders and domain
experts in order to implement effective policies that help improve energy efficiency.
This research work focuses on exploring data science techniques to understand users'
energy consumption patterns in residential buildings. Electricity data was collected
from 32 different residential buildings for one year. The raw data was visualized
using the Cumulative Distribution Function to understand its distribution, and boxplot
diagrams were used to visualize outliers in the dataset. The dataset was resampled to
a different timestamp to eliminate the probability of unwanted data values. Once the
data was preprocessed, a heatmap algorithm was designed and implemented to understand
the electricity consumption patterns of one residential building.
The descriptive analytical method was used to elaborate the results of the heatmap,
while the unusual or extreme energy utilization behavior noticed in the consumption
pattern was elaborated using the diagnostic analytical method. Contextual analysis of
the results helped to explain the rationale behind both normal and unusual energy
consumption patterns. Peaks identified in the heatmap indicate extreme energy
consumption, which can sometimes be due to a fault in the devices integrated at home;
however, it also suggests the need to understand the residents' own energy-use
behavior.
The energy analysis results reinforce our statement that identifying the factors that
trigger peak energy demand for a specific period of time in a building could
potentially help to improve the building's heating, ventilation and air conditioning
(HVAC) system. In addition, a sudden peak in energy consumption can be caused by a
malfunction or by exceptional human behavior. Finding the possible causes of high
energy demand over a certain period of time can lead to appropriate solutions and
ultimately to control of energy demand. Understanding this demand and supply behavior
in residential areas will further support sustainable and renewable energy technology.
As part of future research, the authors intend to explore different data analytical
techniques for analyzing the requirements that stakeholders want integrated into smart
buildings.

References

1. Fan, C., Xiao, F., Wang, S.: Development of prediction models for next-day building energy
consumption and peak power demand using data mining techniques. Appl. Energy 127,
1–10 (2014)
2. Hsu, D.: Identifying key variables and interactions in statistical models of building energy
consumption using regularization. Energy 83, 144–155 (2015)
3. Pérez-Lombard, L., Ortiz, J., Pout, C.: A review on buildings energy consumption
information. Energy Buildings 40(3), 394–398 (2008)
4. Pacala, S., Socolow, R.: Stabilization wedges: solving the climate problem for the next 50
years with current technologies. Science 305(5686), 968–972 (2004)
5. International Energy Agency (IEA): World Energy Outlook 2015. OECD/IEA, Paris
(2014)
6. Kosorus, H., Honigl, J., Kung, J.: Using R, WEKA and RapidMiner in time series analysis
of sensor data for structural health monitoring. In: 22nd International Workshop on Database
and Expert Systems Applications (DEXA), pp. 306–310. 29 Aug.-2 Sept., IEEE, France
(2011)
7. Millan, P., et al.: Time series analysis to predict link quality of wireless community networks.
Comput. Netw. 93(2), 342–358 (2015)
8. Platchkov, L.M., Pollitt, M.G.: The Economics of Energy (and Electricity) Demand.
Cambridge University, 13–14 May 2011
9. Walker, C.F., Pokoski, J.L.: Residential load shape modelling based on customer behavior.
IEEE Trans. Power Appar. Syst. 104(7), 1703–1711 (1985)
10. Capasso, A., et al.: A bottom-up approach to residential load modeling. IEEE Trans. Power
Syst. 9(2), 957–964 (1994)
11. Willis, H.L.: Spatial Electric Load Forecasting, 2nd edn. CRC Press, New York (2002)
12. Iram, S., Fernando, T., Bassanino, M.: Exploring cross-domain data dependencies for smart
homes to improve energy efficiency. In: Companion Proceedings of the 10th International
Conference on Utility and Cloud Computing, pp. 221–226. ACM, USA (2017)
Anomaly Detection in Q & A Based
Social Networks

Neda Soltani¹, Elham Hormizi², and S. Alireza Hashemi Golpayegani¹

¹ Computer and IT Engineering Department,
Amirkabir University of Technology, Tehran, Iran
{neda.soltani,sa.hashemi}@aut.ac.ir
² Computer and IT Engineering Department,
University of Science and Technology, Babol, Mazandaran, Iran
elham.hormozi@gmail.com

Abstract. Detection of anomalies in question-and-answer based social networks is
important for finding the best answers and removing unrelated posts. These networks
are usually based on users' posts and comments, and the best answer is selected
according to users' ratings. The problem with such scoring systems is that users might
collude to rate unrelated posts or to boost their reputation; some malicious users
might also spam the discussion. In this paper, we propose a network analysis method
based on network structure and node properties for exploring and detecting these
anomalies.

Keywords: Anomaly detection · Q&A social networks · Reputation boosting ·
Spam detection

1 Introduction

Widespread participation in question-and-answer sites, where users answer specialized
questions, has led to the creation of massive data collections that are growing
rapidly. On the other hand, it is hard to detect related, correct, non-spam responses.
In order to identify spam, misleading or irrelevant answers posted in reply to a
question or discussion, it is necessary to analyze those responses. Besides
natural-language analysis methods, which involve many complexities, some of these
anomalies can be identified from the structure of communication between individuals
and the content of the posts. For instance, the authors of [1] state that spammers
create star-like sub-networks.
Anomaly means deviation from expected behavior: patterns exist in the observed data
that do not match the definition of normal behavior. In social networks, anomalies are
interaction patterns that differ significantly from the network as a whole. In fact,
the definition of anomaly depends on the nature of the problem. Various types of
anomalies can be defined in social network environments, depending on the network in
question. For example, spam emails are known as anomalies, and in a network-based
trust system, collusion is identified as another type of anomaly. These are just
examples of anomaly types in network structures. Considering the total amount of
resources, time, and cost spent on these anomalies, it is necessary to

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 343–358, 2019.
https://doi.org/10.1007/978-3-030-02686-8_27
develop solutions to this issue. According to statistics, 67% of email traffic between
January and June 2014 was spam; moreover, in 82% of cases social networks were used
for online abuse. These examples indicate the importance of the issue. Anomalies
appear as abrupt changes in interactions that are completely different from the usual
form in a particular network. For instance, subnets created for collusion have
characteristic forms of interaction; another symptom of anomalies is highly
interconnected subnets or star-like structures. Solutions proposed for detecting
anomalies in social networks fall into two categories:
• Checking and comparing the network model with a normal interaction model.
• Checking network attributes.
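As a sketch of the second category (checking network attributes), the star-like structures mentioned above can be scored by how few links exist among a node's neighbours. The score below is simply one minus the local clustering coefficient, and the toy graph is entirely our own illustration, not data from the paper:

```python
def star_score(adjacency, node):
    """Fraction of possible neighbor-neighbor links that are absent.

    1.0 means a perfect star (no links among the node's neighbors), the
    spammer-like ego-net shape noted in the literature; 0.0 means a clique.
    """
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges between distinct neighbor pairs (each pair once)
    links = sum(1 for a in neighbors for b in neighbors
                if a < b and b in adjacency.get(a, set()))
    possible = k * (k - 1) // 2
    return 1.0 - links / possible

graph = {
    "spammer": {"u1", "u2", "u3", "u4"},
    "u1": {"spammer"}, "u2": {"spammer"}, "u3": {"spammer"}, "u4": {"spammer"},
}
print(star_score(graph, "spammer"))  # 1.0 -> perfectly star-like
```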
Therefore, detection of anomalies in social networks involves the selection and
calculation of network characteristics, followed by classification and observation in
the characteristic space. The first challenge is the definition of normal behavior.
Social networks do not have a fixed, balanced structure across all components, owing
to the diversity of the individuals and nodes they contain, so defining a normal
structure in such networks is not possible. Another issue is that the distribution of
node degrees and the community structure of the network change over time. The
scenarios presented for a normal structure are not necessarily real-time, and a
network may change before its structure is extracted. Anomaly detection includes the
following steps [1]:
(1) Determining the smallest unit affected by the behavior.
(2) Identifying characteristics that differ from normal states.
(3) Determining the context.
(4) Calculating the characteristics and extracting a characteristic space.
(5) Calculating the distance between observations.
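Steps (4) and (5) can be made concrete with a tiny feature-space example (entirely our illustration; the feature choice and values are hypothetical): each user becomes a vector of characteristics, the features are standardized, and each observation is scored by its distance from the centre of the standardized space, with large distances marking candidate anomalies.

```python
from math import sqrt

def zscores(features):
    """Standardize each feature column, then score each observation by its
    Euclidean distance from the origin in z-score space."""
    n, dims = len(features), len(features[0])
    means = [sum(row[d] for row in features) / n for d in range(dims)]
    # `or 1.0` guards against zero variance in a feature column
    stds = [sqrt(sum((row[d] - means[d]) ** 2 for row in features) / n) or 1.0
            for d in range(dims)]
    return [sqrt(sum(((row[d] - means[d]) / stds[d]) ** 2 for d in range(dims)))
            for row in features]

# Hypothetical (post count, average answer score) per user;
# the last user posts a lot with very low scores
users = [(5, 4.1), (6, 3.9), (4, 4.3), (5, 4.0), (40, 0.5)]
scores = zscores([list(u) for u in users])
print(scores.index(max(scores)))  # index of the most anomalous user
```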
The difference between anomaly detection in social networks and other areas is that in
social networks we have individuals, each with their own characteristics, and the
relationships between them, which are relevant to those characteristics. Networks may
be static or dynamic, labeled or unlabeled, and local or global, all of which affects
the definitions used in the network, including the definition of anomalies. Therefore,
a method used for anomaly detection in a friendship network does not necessarily give
optimal results in an authors' network.
In this paper, we use social network analysis methods to detect anomalies in content
posted by users of a question-and-answer based social network. To achieve this goal we
must first define the anomaly type and, second, present a detection method based on
the network and anomaly properties. We then apply the presented method to the selected
network using network analysis methods. The main contribution of this paper is the use
of node properties along with graph structure to detect anomalies.
The remainder of the paper is organized as follows: the next section reviews recent
work in this area. In Sect. 3, the problem statement is presented in detail and our
proposed solution methodology is explained. Section 4 covers the experiments and the
results of our tests; finally, in Sect. 5 we conclude our work and discuss future
work.

2 Related Work

In terms of anomaly detection, anomalies fall into the following categories [1]:
static unlabeled, static labeled, dynamic unlabeled, and dynamic labeled anomalies.
Detection of anomalies is critical in preventing malicious activities such as
bullying, planning terrorist attacks and disseminating counterfeit information. The
authors of [2] surveyed the work done on detecting anomalies in social networks,
focusing on the effects of new anomalies in social media and the newest techniques for
identifying specific types of anomalies. There are also various studies on the
detection of anomalies, data types and data attributes in social networks. Anomalies
are detected in network data in [3–5, 8], which focus on graph data, including edge
weights. An "ego-net" comprises the sub-graph of a focal node and its neighboring
nodes; the "oddball" approach considers the sphere around each node formed by its
adjacent nodes, and a small list of numerical features is then designed for it.
Detection of anomalies in temporal data has been addressed by [7, 9, 10]. The key idea
is to build a Granger graphical model on a reference dataset and, using a series of
constraints on the model and assuming time dependence as in the reference data, to
test the dataset in question; detection of anomalies is also sped up by several
randomized and parallel optimization algorithms. The methods proposed in these papers
improve accuracy and stability.
In [11], the author discusses advances in detecting fraud and malformation in social
network data, including point anomaly detection, through a taxi-driving fraud
detection system. To implement the system, GPS traces from 500 taxi drivers were used,
and the drivers' counterfeit activities were systematically investigated. The author
of [12] uses an algorithm called WSARE 3.0 that can detect anomalies in simulated data
with the earliest possible detection time and a low number of false positives. Several
articles also discuss the detection of group anomalies in social networks,
applications, and systems.
In [13], in order to identify implicit social relations and close entities in the
dataset, a framework is used to find similarly unusual users in real-world datasets.
This approach requires a model of communications, a model of independent users, and a
method for distinguishing between them.
In [14], a graphical model called GLAD is able to discover the group structure of
social networks and detect group anomalies; the required tests are performed on real
and synthetic datasets with anomaly injection. That approach automatically checks the
nodes of a multi-layer network based on how similar each node is to a star in the
different layers and, by parallelizing the feature extraction and anomaly detection
operations across the layers and distributing the inputs to different machine cores,
the calculations are significantly accelerated. In [16], the author analyzes the
distribution of input times and the volume of events, such as comments and views of
online surveys, in order to rank and detect suspicious users such as spammers, bots
and Internet fraudsters. That paper presents a model called VOLTIME that measures the
distribution of input times of real users.
Another study builds on the idea that most anomalous user behavior diverges from
what can be considered 'normal behavior', and performs a risk assessment in which
greater divergence yields greater risk [17]. Because similar users follow a series of
similar rules on social networks, this assessment is organized in two phases: similar
users are first grouped together; then, for each identified group, one or more models of
their normal behavior are constructed [18]. Recorded sessions are used to decide
whether each session is abnormal and to determine the degree of anomaly in each
session. Implementing robust statistical analyses on such data is very challenging, as
the number of observed sessions is much smaller than the number of network users.
The method put forward in that work detects anomalies in very high-dimensional data
based on hypergraphs, an important extension of graphs in which an edge can
simultaneously connect more than two vertices. Table 1 shows a comparison of the
abovementioned studies.

3 Problem Statement and Solution Methodology

As mentioned in the introduction, we are looking for anomalies in this dataset. We
limit the anomaly types to spam and reputation sub-networks. Therefore, the following
questions are to be answered on the database:
1. Which users submit answers that are irrelevant to the question, are spam, or aim at
misleading the discussion?
2. Which users boost their reputation on a fraudulent basis?
We have ignored comments for several reasons. First, we want to keep track of the
discussion, which is mainly contained in the posts, not the comments. Second, it would
be time-consuming to merge the comments into the posts, as the dataset provides
comments separately. Furthermore, comments are written in response to a single
post and mostly contain details about that post, not the whole question. Finally, ratings
and badges are based on posts, not comments. So the specific types of anomaly we are
looking for would be found in posts.

3.1 Methodology
In this section, we present our analysis of the proposed network. The analyses
aim at detecting spammer accounts and, as a result, the spam answers.
Based on [4, 6], spammers create a star-like network, so we first detect star-like
sub-networks. To do so, we create an ego-net for each individual node and then
study its neighbor nodes. A star-like sub-network is detected if few of the neighbors
connect directly to one another. The node at the center of a star-like sub-network
is, with high probability, a spammer.
The other question mentioned in the previous section concerns detecting nodes
that try to falsely boost their reputation. This is done by detecting communities
whose internal communications are too tight [19].
Anomaly Detection in Q & A Based Social Networks 347

Table 1. Comparison between recent researches on Social Networks Anomaly Detection.


Reference | Anomaly type | Target network | Method | Node/Edge property included
[3–5, 8] | Anomalies | Weight graph | OddBall, ego-net patterns, hybrid method for outlier node detection | Density, weights, ranks and eigenvalues; node and edge
[7, 9, 10] | Time-series anomaly detection | Weight graph | Granger graphical model | Edge, weight
[11] | Point anomaly detection | Weight graph | Taxi driving fraud detection system | Edge, weight
[12] | Bayesian network anomaly detection | Bayesian network | WSARE 3.0 algorithm, simulation | Edge, time
[13] | Intrusion detection | Graph network | Tribes algorithm | Node
[14] | Group anomaly detection | Graph network | Group Latent Anomaly Detection (GLAD) model, d-GLAD | Node, weight
[15] | Multilayer networks | Unsupervised, parameter-free, network | ADOMS (Anomaly Detection On Multilayer Social networks) | Node, edge, weight
[16] | Suspicious users anomaly detection | Unsupervised | VolTime model | Time
[17] | User anomalous behaviors | Online social networks | Two-phase risk assessment approach | Time, node
[18] | Anomaly detection | Weighted graphs | OddBall algorithm | Node, density, weights, ranks

Finding Star-Like Structures. In order to detect star-like structures, we have to
detect cliques of size 3, i.e. triads, in the ego network of each node. To study the ego
networks, we choose the nodes with the highest betweenness, as these nodes, which
connect components of the network to each other, are likely to create star-like
structures. Figure 1 shows pseudo code for the algorithm proposed in this paper for
detecting star-like ego-networks.
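The pseudo code is given in Fig. 1. As a rough illustration of the idea only (this is not the paper's algorithm; the function names and the density threshold are our own assumptions), star-likeness of an ego network can be checked by measuring how densely a node's neighbors connect to one another:

```python
def ego_density(adj, node):
    """Density of the sub-network induced by `node`'s neighbors (the ego
    node itself excluded): the fraction of possible neighbor-neighbor
    edges that actually exist. `adj` maps each node to its neighbor set."""
    nbrs = adj[node]
    n = len(nbrs)
    if n < 2:
        return 0.0
    # Each neighbor-neighbor edge is counted twice (once per endpoint).
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
    return links / (n * (n - 1) / 2)

def is_star_like(adj, node, threshold=0.1):
    """The ego network is star-like when few neighbors connect directly,
    i.e. it contains almost no triads through the ego node."""
    return len(adj[node]) >= 2 and ego_density(adj, node) < threshold
```

A density near zero means the ego network contains almost no triads, which matches the triad-counting criterion described above; a high-betweenness node passing this check would be flagged as a probable spammer.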
Fig. 1. Pseudo code for the proposed algorithm

Detecting Highly Interconnected Communities. Another type of anomaly considered
in this paper is collusion aimed at boosting reputation. Based on [1, 19], this type of
anomaly is detected by finding highly interconnected communities. Communities with
this property are almost isolated from the whole network and have a large number of
internal edges. When looking for this type of community, edge weights become
important. In the first scenario we used to create the network, we did not consider
edge weights. To weight the edges so that they reflect the level of connectivity
between two nodes, we set the edge weight to the number of times one node has
answered the other node's questions.
Considering the nature of the anomaly we want to detect, we can omit edge
directions, as we are looking for high interconnectedness. We assume that these sub-
networks contain malicious users who try to boost their own reputation by asking or
answering one another's questions. Communities are detected by identifying isolated
components of the network (Fig. 2).

Fig. 2. Pseudo code for the algorithm we presented for detecting anomalous communities.
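A rough sketch of this procedure (illustrative only; the minimum component size of 4 nodes follows the text, but the density threshold and decision rule are our assumptions, not the pseudo code of Fig. 2):

```python
def components(adj):
    """Connected components of an undirected graph given as
    {node: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def suspicious_communities(adj, min_size=4, density_threshold=0.6):
    """Flag isolated components that are highly interconnected, i.e.
    contain a large share of all possible internal edges."""
    flagged = []
    for comp in components(adj):
        if len(comp) < min_size:
            continue  # the paper omits components with fewer than 4 nodes
        edges = {frozenset((u, v)) for u in comp for v in adj[u]}
        density = len(edges) / (len(comp) * (len(comp) - 1) / 2)
        if density >= density_threshold:
            flagged.append(comp)
    return flagged
```

Edge weights (the number of answers exchanged between two users) could additionally be summed per component to rank the flagged communities; they are omitted here for brevity.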

4 Experiments and Results


4.1 Dataset Specifications
The dataset has been downloaded from the Stack Exchange site and includes questions
from the "Android" category of that site. The dataset contains user information,
badges, comments posted below posts, questions and answers, the history of post changes,
post links, and the registered votes for each post, each stored in a separate
XML file [18]. The Stack Exchange site applies a moderation mechanism to posts
and users. Each post gets a negative or positive rating from users; badges are
awarded to people based on their posts. People's reputation is likewise based
on their posts, the number of answers marked correct by the rest of the users, and so on. To
work with this dataset, we first import the information into Excel and save the
sections in CSV format. Then, so that the data can be loaded into the Pajek software,
a Java program reads the files and writes the nodes and edges into separate files.
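The conversion step above is done with a Java program; a comparable sketch in Python, relying on the row attributes of the public Stack Exchange dump schema (PostTypeId 1 = question, 2 = answer; an answer's ParentId points to its question), might look like:

```python
import xml.etree.ElementTree as ET

def answer_edges(posts_xml):
    """Extract directed (answerer -> asker) user edges from the
    Posts.xml contents of a Stack Exchange data dump."""
    root = ET.fromstring(posts_xml)
    question_owner = {}   # question id -> asker's user id
    answers = []          # (answerer user id, question id)
    for row in root.iter('row'):
        owner = row.get('OwnerUserId')
        if owner is None:
            continue
        if row.get('PostTypeId') == '1':
            question_owner[row.get('Id')] = owner
        elif row.get('PostTypeId') == '2':
            answers.append((owner, row.get('ParentId')))
    # Edge (u1, u2): user u1 answered user u2's question.
    return [(a, question_owner[q]) for a, q in answers if q in question_owner]
```

The resulting edge list can then be written out to the separate node and edge files that Pajek expects.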
Network Creation Scenarios
One method of detecting spam is detecting spammer accounts. Therefore, if we create
a network of users and analyze it in order to find the spammer accounts, we can
simply flag posts by those accounts as spam. Obviously, we will not be able to detect
spam sent by normal users this way.
In the aforementioned network, nodes are users. Each edge represents a reply by a
user to another user's post: an edge connecting user u1 to user u2 shows that
user u1 has answered one of user u2's questions. Edges are directed (from u1 towards u2).
Therefore, a user with a high in-degree is one whose questions have been answered by
many users, and a user with a high out-degree is one who has answered the questions of
many users. The latter users are more important to us now, as we consider spam answers.
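Given these edge semantics, the in- and out-degrees (and the set of non-solitary users, who appear in at least one edge) can be computed directly from the edge list; a minimal sketch with hypothetical helper names:

```python
from collections import Counter

def degree_stats(edges):
    """From directed (answerer, asker) edges, compute each user's
    out-degree (questions of others they answered) and in-degree
    (how often their own questions were answered)."""
    out_deg, in_deg = Counter(), Counter()
    for u1, u2 in edges:
        out_deg[u1] += 1   # u1 answered u2's question
        in_deg[u2] += 1
    # Solitary users never appear in any edge, so they are absent here.
    active = set(out_deg) | set(in_deg)
    return out_deg, in_deg, active
```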
Nodes have properties including id, reputation, account creation date, name, age,
upvote count, downvote count, and badges. We use these properties
to detect spammer users.
A large number of users are solitary; i.e., they have neither asked nor answered
any questions. We remove solitary nodes, which results in the network illustrated
in Fig. 3.
The network is created from users based on the answers of each user to other users'
questions. The network has several separate components. In plenty of cases, a user
has asked only one question, answered by only one other user, and neither of them
interacts with the rest of the users.
In the following section, we explain the implementation of our proposed solution.
There are several visualizations of the resulting network, which represent nodes as small
circles (each representing a user either answering a question or asking
one). A connection between two nodes shows an answer from one user to the other's
question.

Fig. 3. Network created based on scenario.

4.2 Implementation
Detecting Star-Like Ego-Nets. In order to find possible spammer accounts, we
choose nodes based on betweenness and examine them first. The first
experiment is done on user 137, who has the highest betweenness. Figure 4 shows the
neighbor network, Fig. 5 shows the ego-net of node 137, and Fig. 6 shows the triads of
the network in Fig. 4.

Fig. 4. Neighbor network of user 137.

50 of the total 105 nodes create a connected neighbor network with 137. Therefore, the ego
network of 137 is not a star-like structure, as more than 70% of its neighbors are
connected to each other. Table 2 shows the properties of node 137, which are used to
decide whether anything abnormal exists about this node.
Fig. 5. Ego network of node 137.

Fig. 6. Triads of neighbor network of node 137.

Table 2. Node 137 properties.

ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age | Cb
137 | 14905 | 2010-09-14T02:48:38.087 | Matt | 1236 | 18 | | 0.0040

The next node in the betweenness ranking is 16575. Figures 7 and 8 show the
ego-net and the neighbor network of this node, respectively. There are 502 nodes in
16575's neighborhood, but only 135 of them are connected to each other. To analyze
this node further, we check its properties (Table 3). Considering the upvote count of
this node compared to its downvotes, its high reputation, and its 79 badges, it is
unlikely that this node is a spammer, although the ego network of this user is quite
close to a star structure.
Fig. 7. Ego network of 16575.

Fig. 8. Neighbor network of 16575.

Table 3. Properties of node 16575

ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Cb
16575 | 45479 | 2012-07-02T20:06:13.047 | Izzy | 1452 | 213 | 0.0034

The third experiment is done on user 1465. 110 of the 272 nodes in 1465's
neighborhood are connected to each other (45%). Considering this node's properties,
we can see that it has a high reputation, but its downvotes outnumber its upvotes, so it
is possible that 1465 is a spammer (Figs. 9 and 10). Considering other properties of
this node, we can see that this user has 1012 posts with an average rating of 3.32, an
average view count of 20,500, an average of 1.33 answers per question, and an average
of 1.42 comments per post. We compare these numbers to the overall average values
(Table 4). The average values for user 1465 are above, or almost equal to, the overall
values, from which we conclude that user 1465 is not a spammer, despite the prior
guess.
Other nodes with a high betweenness are studied in the same way.
Detecting Communities. Communities are detected by identifying isolated components
of the network. We omit components having fewer than 4 nodes; the result is
shown in Fig. 11. We consider the biggest component, detecting communities in it
and removing the edges that connect the communities to each other (Fig. 12).
Fig. 9. Ego network of 1465.

Fig. 10. Neighbor network of 1465.

Table 4. Properties of node 1465 compared to the overall average

Average | Score | ViewCount | AnswerCount | CommentCount | FavoriteCount
All data | 1.75 | 2937.04 | 1.175 | 1.226 | 1.655
1465 | 3.32 | 20500.61 | 1.33 | 1.42 | 5.762

In order to detect highly interconnected communities, each community is studied
individually. For each community, we study the degree distribution, the most central
node, and the average reputation of the community. As seen in Fig. 13, the sub-network
has a star-like structure and is not highly interconnected. The most central node has the
following properties (Table 5).
This user’s reputation is higher than the total average reputation. Nothing is
anomalous about this node so we move on to the next community.
One of the communities does not have star-like structure (which makes it possible
to be interconnected – Fig. 14). The biggest clique in it is as represented in Fig. 15.
All the nodes in Table 6 were created within two weeks. Most of them have a high
reputation, and their upvotes greatly outnumber their downvotes. The clique created
in the aforementioned community is possibly an anomaly, because it resembles a
highly interconnected subnetwork. Given that the other communities share a similar
star-like structure, this structure is abnormal.

Fig. 11. Communities in the network.

Fig. 12. Communities inside the biggest component of the network after removing components
having fewer than 4 nodes and the edges between components.

Fig. 13. Community with the highest number of nodes.

Table 5. Properties of node 40036

ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age
40036 | 3705 | 2013-08-25T09:42:20.677 | RossC | 913 | 885 |

Fig. 14. A community which is not star-like.

Fig. 15. Biggest clique.
The reason most communities have a star-like structure is that experts in each field
answer questions in their own area of expertise and rarely answer questions in all
fields. Therefore, most users have asked few questions, and these questions have been
answered by a small number of experts in that specific field, who sit at the centers of
the stars.
For the community with a different structure, there are two possible hypotheses:
(1) there exist a number of experts who communicate with one another and rarely
answer the questions of other users, or (2) there are users in it who have joined the
network in order to get badges and boost their reputation. Considering the creation
times of the users in this clique, the second hypothesis is further reinforced.

Table 6. 10 highest degree centrality nodes


ID | Reputation | CreationDate | DisplayName | UpVotes | DownVotes | Age | Degree
137 | 14905 | 2010-09-14 | Matt | 1236 | 18 | | 70
10 | 18945 | 2010-09-13 | Bryan Denny | 1481 | 30 | 29 | 65
482 | 15609 | 2010-09-27 | Lie Ryan | 3591 | 141 | | 56
15 | 4856 | 2010-09-13 | gary | 1498 | 44 | | 31
594 | 3820 | 2010-10-02 | Edelcom | 376 | 2 | 54 | 23
366 | 915 | 2010-09-22 | Casebash | 154 | 1 | 28 | 18
86 | 2168 | 2010-09-13 | FoleyIsGood | 165 | 1 | 33 | 17
382 | 1804 | 2010-09-22 | BrianCooksey | 119 | 0 | 49 | 16
7 | 1687 | 2010-09-13 | Jonas | 78 | 17 | | 16
280 | 520 | 2010-09-21 | Radek | 159 | 0 | | 15

Other communities exist that have structures different from the star-like sub-networks;
Fig. 16 shows them:

Fig. 16. Other communities with non-star structure.



5 Conclusion and Discussion

In this paper, we presented a solution for detecting anomalies in social networks. We
focused on a famous Q & A network; accordingly, the anomalies were defined as
inappropriate answers (e.g. spam) and false reputation boosting. To detect these two
types of anomalies, we suggested and applied two different approaches: for detecting
spammers, we used a methodology that detects star-like ego networks, and for
detecting false reputation boosting, we detected highly interconnected sub-networks.
As another contribution of this paper, we considered network structure and node
properties at the same time, which helps to obtain more accurate results.
Detecting anomalies in social networks depends highly on the type, structure, and
content of the network. Different network scenarios exist depending on the type of
anomaly to be detected, and the solution differs depending on the network creation
scenario. All of this makes it impossible to present a general-purpose anomaly
detection method.
The limitations of this research include the challenge of combining network
analysis results with mining results on node properties. As seen in this paper, we
analyzed node properties only after finding the most probably abnormal nodes using
network methods; there is as yet no single systematic solution to this.
As future paths for this research, one can consider the following:
• Analysis and detection of other possible types of anomalies in a typical Q & A
social network, such as spurious expertise, irrelevant answers, offensive comments,
etc.
• Extension of the research to user-feedback-based areas like product reviews, discussion
forums, and social groups, each of which is a potential target for spam and
reputation boosting.
• Implementation of different network generation scenarios, e.g. a graph of users
weighted by the number of interactions between two users, or a second network
layer generated from the keywords of users and questions. These scenarios might
help detect abnormal behavior within the current context.

References
1. Savage, D., Zhang, X., Yu, X., Chou, P., Wang, Q.: Anomaly detection in online social
networks. Soc. Netw. 39, 62–70 (2014)
2. Liu, Y., Chawla, S.: Social media anomaly detection: challenges and solutions. In:
Proceedings of the Tenth ACM International Conference on Web Search and Data Mining,
pp. 817–818. ACM, Cambridge (2017)
3. Akoglu, L., McGlohon, M.: Anomaly detection in large graphs. CMU-CS-09-173 Technical
Report (2009)
4. Akoglu, L., McGlohon, M., Faloutsos, C.: Oddball: spotting anomalies in weighted graphs.
In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) Advances in Knowledge Discovery
and Data Mining. PAKDD 2010. LNCS, vol. 6119. Springer, Berlin (2010)

5. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection for discrete sequences: a survey.
IEEE Trans. Knowl. Data Eng. 24(5), 823–839 (2012)
6. Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood formation and anomaly
detection in bipartite graphs. In: Fifth IEEE International Conference on Data Mining,
pp. 418–425. IEEE Computer Society, Washington, DC (2005)
7. Cheng, H., Tan, P.N., Potter, C., Klooster, S.: Detection and characterization of anomalies in
multivariate time series. In: Proceedings
8. Tong, H., Lin, C.-Y.: Non-negative residual matrix factorization with application to graph
anomaly detection. In: Proceedings of the 2011 SIAM International Conference on Data
Mining, pp. 143–153. Society for Industrial and Applied Mathematics (2011)
9. Qiu, H., Liu, Y., Subrahmanya, N.A., Li, W.: Granger causality for time-series anomaly
detection. In: IEEE 12th International Conference on Data Mining (ICDM), pp. 1074–1079.
IEEE (2012)
10. Sun, P., Chawla, S., Arunasalam, B.: Mining for outliers in sequential databases. In:
Proceedings of the 2006 SIAM International Conference on Data Mining, pp. 94–105.
Society for Industrial and Applied Mathematics (2006)
11. Ge, Y., Xiong, H., Liu, C., Zhou, Z.H.: A taxi driving fraud detection system. In: 2011 IEEE
11th International Conference on Data Mining (ICDM), pp. 181–190. IEEE (2011)
12. Wong, W.K., Moore, A.W., Cooper, G.F., Wagner, M.M.: Bayesian network anomaly
pattern detection for disease outbreaks. In: Proceedings of the 20th International Conference
on Machine Learning (ICML-03), pp. 808–815. IEEE (2003)
13. Friedland, L., Jensen, D.: Finding tribes: identifying close-knit individuals from employment
patterns. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge
discovery and data mining, pp. 290–299. ACM, Vancouver, August 2007
14. Yu, R., He, X., Liu, Y.: Glad: group anomaly detection in social media analysis. ACM
Trans. Knowl. Discov. Data (TKDD) 10(2), 18 (2015)
15. Bindu, P.V., Thilagam, P.S., Ahuja, D.: Discovering suspicious behavior in multilayer social
networks. Comput. Hum. Behav. 73, 568–582 (2017)
16. Chino, D.Y., Costa, A.F., Traina, A.J., Faloutsos, C.: VolTime: unsupervised anomaly
detection on users’ online activity volume. In: Proceedings of the 2017 SIAM International
Conference on Data Mining, pp. 108–116. Society for Industrial and Applied Mathematics
(2017)
17. Laleh, N., Carminati, B., Ferrari, E.: Risk assessment in social networks based on user
anomalous behaviour. IEEE Trans. Dependable Secure Comput. (2016)
18. Stack Exchange Data Dump. https://archive.org/details/stackexchange. Accessed 9 Nov
2017
19. Pandit, S., Chau, D.H., Wang, S., Faloutsos, C.: Netprobe: a fast and scalable system for
fraud detection in online auction networks. In: Proceedings of the 16th International
Conference on World Wide Web, pp. 201–210. ACM (2007)
A Study of Measurement of Audience
in Social Networks

Mohammed Al-Maitah (✉)

Computer Science Department, Community College, King Saud University, Riyadh, Saudi Arabia
malmaitah@ksu.edu.sa

Abstract. This article is dedicated to surveying and analyzing Facebook account
performance and to developing a set of indicators that can describe the audience of a
Facebook user. The raw experimental data were gathered and analyzed using statistical
methods developed initially for Twitter. Based on them, the audience was classified
into categories; then the main attributes of updates were carefully studied to develop
derived indicators that can show not only audience quality but also information
coverage and, partly, influence (e.g. growth of authority and so on), demonstrated
using graphical charts. The indicators were generalized into formulae, building a base
for further studies of Facebook account activity. Directions of future work are also
listed in the conclusion.

Keywords: Social network · Performance · Facebook · Influence · Account survey

1 Introduction

The Facebook engine provides a very small number of attributes to analyze. Most
posts are attributed with a quantity of "likes" (i.e. the number of people who marked a
certain post) and "shares" (i.e. the number of people who also placed a certain post on
their own page). These two attributes are not interdependent: a user can mark but not
share, and likewise share but not mark. But even in such a simple estimation system
there is a set of difficulties. Firstly, there is no way to determine whether a "like" really
expresses liking: a number of events are marked but not actually liked by the users,
for example a message about someone's death or other sad news [1]. The official
position of Facebook is that actions on the social network are focused on positive
interactions, while negative reactions must be expressed through the comments of other
users. Moreover, if a post contains a link to another resource accompanied by a short
comment, there is no way to determine whether the link itself was liked, or the user's
comment on it. Empirical studies thus show that a "like" demonstrates just interest,
acknowledgment, or support: the resource was worth enough to attract the attention of
other people, but not enough to be preserved on a personal timeline.
Hence, a "share" marks an event or text so important to the user that he decided to
preserve it. But the same issues arise as with a "like": we cannot determine what
exactly is important, the shared resource itself or the comment on it. We cannot even
determine the exact number of shares, as a post can be copied directly onto a user's
page, with or without reference to its author. And there are certain social network
aggregators, special sites that gather news from social networks and reprint them.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 359–368, 2019.
https://doi.org/10.1007/978-3-030-02686-8_28
360 M. Al-Maitah

This means that we have very little raw data with which to estimate the efficiency
of a Facebook account. It is obvious that the average quantity of likes and shares can
show a level of efficiency (and most network services act just that way), but to make a
proper estimation we have to know more.
For example, the Klout service tries to measure the influence that a given user of a
social network has. It gains access to the user's account and tracks their activity in
terms of the impact of every action, then summarizes them and produces a final
estimate in Klout points from 0 to 100. The algorithms of Klout are closed and heavily
protected by patents, but the main parameters of its estimation are simple [2, 3]:
• Quantity of followers (or subscribers; Klout uses the same approach for all major
world-wide social networks, including Facebook and Twitter);
• Quantity of likes and shares;
• Quantity of users engaged in conversation (i.e. the number of users who leave at least
one comment);
• Interactions with other users who have higher scores.
According to Klout, the highest scores in 2014 belonged to Barack Obama,
Beyoncé, and Britney Spears; later, in 2018, to Barack Obama, Justin Bieber, Zooey
Deschanel, and, surprisingly, The Beatles: that is, accounts which very often publish
newsworthy information. This can be compared with the Nielsen rating for TV shows:
the more people watch a show, the higher the estimate [4]. There are even studies
showing that the Klout score depends on a logarithmic function of the number of
subscribers and of users enlisted in conversations [3].
So a Klout estimate can show popularity, but it does not show efficiency. Moreover, it
does not work for people who create original yet highly specialized content and have
their own devoted audience [3]. Take as an example Drew Karpyshyn, one of the
bestselling writers for Star Wars (and a script writer for the award-winning game Mass
Effect). Is it a surprise that his Klout rating is only 53? Dan Abnett has a similarly low
rating of 54 points, and even Umberto Eco has 55 points. These people are not
unpopular, quite the opposite; but their popularity does not rely on frequent activity
and being in touch with the main global events or memes.
So the crucial question is not the estimate itself. We need a method that measures
relative popularity and efficiency: not in the global context of the social network, but
in the context of an account's potential and devoted audience. Such a method will show
popularity more realistically than estimates based on frequent updates. To reach this
goal we need a proper measurement of that audience, which is the main subject of this
article.

2 Related Works

In recent years a number of studies of social networks have been performed. Most of
them concern the Twitter platform. One of the most comprehensive works was
conducted by Kwak, Lee, Park and Moon [5], who surveyed more than 4000 trending
topics and about 106 million tweets.

A Study of Measurement of Audience in Social Networks 361

Such complex analysis is possible due to the small size of a "tweet", a short message
or even just a hashtag (a short slogan used for trending topics on Twitter).
Questions of influence in social networks were covered by the theoretical works
of [6–9], who suggested that a social network can be described as a graph of
relationships; hence influence can be modeled with threshold and cascade
approximations. Kempe also proposed a set of mathematical approaches for
maximizing influence within social networks using marketing strategies.
Newman, Watts et al. [10] suggest that the analysis can also be conducted using
random graphs with given degree distributions. Such a model allows describing not
only the social network as a whole, but also sub-networks such as groups and
communities.
On the contrary, [11] considers a social network as a net of directed links, which can
be marked, propagated and mentioned. They differentiate influence in terms of marking
(likes, etc.) from influence in terms of propagation; perhaps they were the first to point
out that a high in-degree does not necessarily mean real influence over other users.
But all these surveys were conducted on Twitter, due, as mentioned before, to its
short-messaging nature. Facebook remains a much less attractive platform for
conducting statistical and estimation studies, and relevant studies of its content are
rare. So we mainly use Twitter-based works as our base.

3 Data Extraction and Analysis

Measurement of an audience can be performed only on a very specific group (or
segment) within a social network. We developed a set of requirements for such a
group: (a) the group must be large enough; (b) there must be at least three opinion
leaders within it; (c) the group must have a high update rate; (d) updates must contain
original content or original comments, to ensure that the audience has minimal
influence from outside.
Hence, as our experimental space we selected the Ukrainian segment of Facebook.
Here is a checklist according to our criteria:
Large enough network segment: According to [12], it has 2,143,140 users, about
4.72% of the total country population (SocialBakers Facebook Statistics Ukraine,
2017).
Opinion leaders: In Ukraine, at least 20 influential opinion leaders exist who reside
primarily on Facebook (i.e. their original content appears there earlier than in national
media) [13]. Moreover, the Proceedings of ECSM-2014 outline that in Ukraine about
40% (to be precise, from 49% to 38%, depending on the internal situation) of the
population describes Facebook as the primary source of important events [14].
High update rate: Our observations show that in Ukrainian political and social life
at least three main events emerge daily (on the hybrid war, on the political process, on
everyday life) and about ten events of smaller value. So the daily update rate of an
average Facebook account with a certain number of readers is about 3 to 5 updates a
day, ranging from long posts to one-liners.

Original content: ECSM-2014 also shows that content in the Ukrainian segment of
Facebook more often contains original information and opinions than traditional
media does [13, 14].
So we picked eight influential accounts that already have a devoted audience, a
certain number of readers (more than 10,000), and a certain position in Ukrainian
society, and observed them over one month, October 2017. This period was also the
last month of an electoral rally, so the active Facebook audience was maximized and
the measurement quite accurate. To preserve privacy, we identify the observed
accounts only by their initials and summarize their characteristics in Table 1.

Table 1. Base characteristics of observed accounts


User | Updates per month | Subscribers
O. T. | 11 | 34761
A. Y. | 46 | 290344
A. A. | 28 | 244785
A. G. | 94 | 75070
H. H. | 134 | 18268
Y. S. | 24 | 43037
P. P. | 94 | 264982
Y. T. | 50 | 76600

Fig. 1. Account update performance (likes).

This performance can be measured using certain indicators. Pay attention to Fig. 1,
which shows the detailed performance of one account (namely H. H.), chosen because
it has a very large number of updates.
The selected accounts performed throughout October as shown in Table 2.

Table 2. Account raw performance.


User | Average likes | Average shares | Updates with higher like rate | Min likes | Max likes
O. T. | 132 | 21 | 3 | 15 | 302
A. Y. | 4143 | 350 | 21 | 26 | 9972
A. A. | 4288 | 500 | 9 | 482 | 21989
A. G. | 1222 | 157 | 40 | 97 | 4341
H. H. | 255 | 24 | 40 | 12 | 2306
Y. S. | 1129 | 150 | 7 | 60 | 5964
P. P. | 2663 | 207 | 37 | 750 | 8436
Y. T. | 406 | 32 | 18 | 96 | 943

We can see in this figure that the performance of different posts varies from very low
to very high. Such wide diversity allows us to split the general audience into three main
categories:
• Supporters (or devoted audience): their number is described by the minimal like
rate. This is the lowest level of interested audience of a certain account. Such people
tend to like every post of a befriended or tracked account just to support it, even sad
or bad news, which cannot be positively marked.
• Regular audience: their number is described by the average like rate. This is the
number of guaranteed readers on which a Facebook user can count when posting a
new update.
• Potential audience: their number is described by the maximal number of likes. It is
the current potential which the account can reach if a proper information policy is
conducted.
Similarly, we can build a chart for shares, which is displayed in Fig. 2. This indicator demonstrates not so much the audience as the sensitivity level of the account owner (i.e., how well his updates correspond with the feelings and views of his subscribers). Hence, we have the following categories of topics, depending on their share rate.

Fig. 2. Accounts update performance (shares).

• Notes of zero importance: Such updates have zero shares. Mostly these are everyday notes containing information useful only to the account owner.
• Notes for limited audience: Topics of this type mostly have “friends-only” visibility and are intended for sharing only among close friends, partners, and those with similar interests. They include questions, requests and so on. The share rate for such topics is below average.
• Main topics: These are updates with an average (with a certain spread of values) share rate. They contain the main topics that attract people to this account, typically an opinion on a specific interest – e.g. economics, politics, games, music, etc. – which can be described as a serious hobby or professional activity of the account owner.
• Socially important topics (or hit topics): This category contains the hits of shares. The higher the rate, the more important the topic to which the update is dedicated. Hits are very rare (see the chart) and usually, though not always, have a very high share rate compared to most updates on other topics.
It is possible to point out empirically that the Klout rating highly depends on hits. If an account has a small number of hits, it will have a low Klout rate, as will other statistically based popularity estimates.
Using this raw data, we can build at least two indicators which can be used to measure the audience of a certain account.
Active and passive audience: The active audience A1 is calculated as the ratio between the average number of likes and the total number of readers. The passive audience Ā1, respectively, is the complementary value, obtained simply as the difference between 100 percent and the value of A1 (see Fig. 3).

A1 = (Navg.like / Nreaders) · 100% (1)

Ā1 = 100% − A1 (2)
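As a small illustrative sketch (Python; the function names are ours, and the sample figures are O. T.'s values from Tables 1 and 2), Eqs. (1) and (2) can be computed as:

```python
def active_audience(avg_likes: float, readers: int) -> float:
    """A1 (Eq. 1): average likes as a percentage of total readers."""
    return avg_likes / readers * 100.0

def passive_audience(a1: float) -> float:
    """Passive audience (Eq. 2): the complement of A1."""
    return 100.0 - a1

# Account O. T.: 132 average likes (Table 2), 34761 subscribers (Table 1).
a1 = active_audience(132, 34761)
print(f"active: {a1:.2f}%  passive: {passive_audience(a1):.2f}%")
# -> active: 0.38%  passive: 99.62%
```

Even a highly visible account thus engages well under one percent of its nominal readership on an average post, which is exactly why the paper distinguishes active from passive audience.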

Fig. 3. Active audience percentage (blue bars) and social importance (red bars).

The social importance of an account is the ratio between the average number of shares and the total number of readers, similar to the previous indicator.

A2 = (Navg.shares / Nreaders) · 100% (3)

Social importance cannot be high for personal accounts – otherwise it is not a personal account but a global or local medium which is a primary source for a very large number of other accounts. This indicator can indeed be used for determining whether an account belongs to a real person or is a media frontend. An importance of more than 0.5% is very good for a person, and an importance higher than 10% is a mark of a media outlet.
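This person-versus-media check can be sketched in a few lines of Python. The helper names and the three-way labels below are ours; only the 0.5% and 10% thresholds and A. A.'s sample figures come from the text:

```python
def social_importance(avg_shares: float, readers: int) -> float:
    """A2 (Eq. 3): average shares as a percentage of total readers."""
    return avg_shares / readers * 100.0

def classify_account(a2: float) -> str:
    """Apply the thresholds from the text: above 10% suggests a media
    account; above 0.5% is a very good result for a personal account."""
    if a2 > 10.0:
        return "media"
    if a2 > 0.5:
        return "strong personal account"
    return "personal account"

# Account A. A.: 500 average shares, 244785 subscribers.
a2 = social_importance(500, 244785)
print(round(a2, 3), classify_account(a2))  # -> 0.204 personal account
```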
Having these two base indicators, we can proceed to derived indicators.
For example, the sensitivity level of an account can be measured as the ratio between the average like rate and the number of updates per month.

E1 = (Navg.like / Nupd/month) · 100% (4)

This indicator can be used to determine how the account owner's main topics are valued by his/her audience. Moreover, the monthly change of the sensitivity level can be used to evaluate the growth or decline of the account's authority within its regular audience. This indicator does not depend on hit topics; hence it will be more precise than other statistical ratings.
The next indicator is calculated as the ratio between the minimal and maximal like rates. This indicator shows audience coverage.

E2 = (Nmin.likes / Nmax.likes) · 100% (5)

Using this indicator and its monthly change, it is also possible to measure the growth of popularity of a certain account. Likewise, with a careful study of hit topics along with the monthly change of audience coverage, we can evaluate how the account owner's views correspond with the views and interests of his/her subscribers.
And finally, using the ratio between the average number of shares and the average like rate, we can determine relevance. Updates that are important to the account's audience will be not only “liked” but also “shared”, so the higher the percentage of such updates, the higher the value of this indicator.

E3 = (Navg.shares / Navg.likes) · 100% (6)

Similar to social importance, this indicator can also be used to determine whether an account is a media outlet. For a personal account it shows the grade of opinion leadership: the persons with the highest values of this indicator are the opinion leaders of the group.
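The three derived indicators, Eqs. (4)–(6), can be sketched in Python as follows (the function names are ours; the sample values are H. H.'s row from Tables 1 and 2):

```python
def sensitivity(avg_likes: float, updates_per_month: int) -> float:
    """E1 (Eq. 4): average like rate relative to posting frequency."""
    return avg_likes / updates_per_month * 100.0

def audience_coverage(min_likes: int, max_likes: int) -> float:
    """E2 (Eq. 5): ratio of minimal to maximal like rate."""
    return min_likes / max_likes * 100.0

def relevance(avg_shares: float, avg_likes: float) -> float:
    """E3 (Eq. 6): ratio of average shares to average likes."""
    return avg_shares / avg_likes * 100.0

# H. H.: 134 updates/month, 255 avg likes, 24 avg shares, 12 min / 2306 max likes.
print(round(sensitivity(255, 134), 1),       # -> 190.3
      round(audience_coverage(12, 2306), 1), # -> 0.5
      round(relevance(24, 255), 1))          # -> 9.4
```

Tracking these values month over month, as the text suggests, turns the static snapshot into a trend indicator.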
For completeness, let us calculate these indicators for our test subjects (see Fig. 4).
Of course, this study is just an approach to an estimation method, but even in such a short form it can be used for analysis in social networks.
It can even address the problem of the “invisible audience”. In social networks, the audience largely remains invisible to users and can be estimated only indirectly, via feedback. But the latter is unstable and varies from day to day, because users can simply log out, miss a given post, and so on. For big media products the audience can be estimated via surveys and web analytics, but for individuals such things are unreachable, so they cannot see their audience. Yet that “invisible audience” is critical for them, and our method can quantify it and help them improve their media activity.

Fig. 4. Sensitivity (blue bars), audience coverage (red bars) and relevance (green bars).

4 Conclusion and Future Works

This article covers an experiment conducted over only one month, from raw data to a certain degree of generalization, recapped as a set of indicators and formulas. Given the high rate of events in the selected social network segment, this survey is just an outline, merely an approach to a more complicated and more general method of estimation.
For example, we did not include the number of comments in our survey for two main reasons: (a) we simply do not have a method to determine whether a comment is automated or belongs to a real person and represents a real opinion; (b) we do not have an appropriate method for estimating a comment's value (Facebook only allows liking a comment).
The question of fake accounts and automated comments is open and highly disputable. Facebook itself estimates that from 5.5% to 11.2% of the accounts on its platform are fake [15]. There are also certain web services that estimate the quantity of fakes among the friends of a given account, based on certain criteria [16, 17]. Such tools are provided by SocialBakers [12], and there are also methods to distinguish fakes from real profiles [18]. But these allow estimating only the general quantity of fakes, not the nature of a certain comment and its author. So there is a need for a detailed study of comments, which is one of our main goals for future work.
Our next goal is to create an integral rating estimate for an account, which can provide an alternative to Klout and other frequency-dependent statistical tools. We intend to make a close survey of selected accounts over longer periods and determine not only the base indicators but also the dynamics of their change.
The third direction of our future work is surveying trending topics on Facebook, their origins, flow and process of propagation, along with an analysis of the interest spaces related to them.
Such complex studies will be useful not only for exploring information flow in social networks, but will also help people to improve their popularity and promote their original content without the necessity of frequent updates and dependence on global news traffic.

References

1. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online
social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet
Measurement Conference, New York, USA, pp. 49–62 (2009)
2. Golliher, S.: How I reverse engineered klout score. Online journal by Sean Golliher. http://
www.seangolliher.com/2011/uncategorized/how-i-reversed-engineered-klout-score-to-an-
r2-094/
3. Stevenson, S.: What your klout score really means wired. http://www.wired.com/2012/04/
ff_klout/all/. Accessed Apr 2012
4. Drula, G.: Social and online media research—data, metrics and methods. Rev. Appl. Socio
Econ. Res. 3, 77–86 (2012)
5. Haewoon, K., Changhyun, L., Hosung, P., Sue, M.: What is Twitter, a social network or a
news media. In: Proceedings of the 19th International Conference on World Wide Web, New
York, USA, pp. 591–600 (2010)
6. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social
network. Theory Comput. Open Access J. 11, 105–147 (2015)
7. Ruixu, G.: Research on information spreading model of social network. In: Second
International Conference on Instrumentation and Measurement, Computer, Communication
and Control, Beijing, China, pp. 918–920 (2012)
8. Tang, J.: Computational models for social network analysis. A brief survey. In: Proceedings
of the 26th International Conference on World Wide Web Companion, Perth, Australia, pp.
921–925 (2017)
9. Jingbo, M., Lourdes, M., Amanda, H., Minwoong, C., Jeff, C.: Research on social networking
sites and social support from 2004 to 2015: a narrative review and directions for future
research. Cyberpsychol. Behav. Soc. Netw. 20(1), 44–51 (2017)
10. Newman, M.E.J., Watts, D.J., Strogatz, S.H.: Random graph models of social networks. Proc.
Nat. Acad. Sci. U.S.A. 99 (2002)
11. Cha, M., Haddadi, H., Benevenuto, F., Gummadi, K.P.: Measuring user influence in Twitter:
the million follower fallacy. In: Proceedings of the 4th International AAAI Conference on
Weblogs and Social Media (ICWSM), ICWSM 2010 on Weblogs and Social, Washington
DC, USA, pp. 10–17 (2010)
12. SocialBakers Facebook Statistics (Ukraine). http://www.socialbakers.com/statistics/
facebook/pages/total/ukraine/
13. Jaitne, M., Kantola, H.: Countering threats: a comprehensive model for utilization of social
media for security and law enforcement authorities. In: Proceedings of the 13th European
Conference on Cyberwarfare and Security, Greece, pp. 102–109 (2014)
14. Ronzhyn, A.: The use of Facebook and Twitter during the 2013–2014 protests in Ukraine.
In: Proceedings of the European Conference on Social Media, University of Brighton, UK,
pp. 442–448 (2014)
15. Facebook estimates from 5.5% to 11.2% of accounts are fake. The Next Web. http://thenextweb.com/facebook/2014/02/03/facebookestimates-5-5-11-2-accounts-fake/
16. Veerasamy, N., Labuschagne, W.: Determining trust factors of social networking sites. In:
Proceedings of 12th European Conference on Information Warfare and Security, Finland, pp.
288–297 (2013)

17. Sirivianos, M., Cao, Q., Yang, X., Pregueiro, T.: Aiding the detection of fake accounts in large
scale social online services. In: Proceedings of the 9th USENIX Conference on Networked
Systems Design and Implementation, USENIX Association Berkeley, CA, USA, pp. 15–15
(2012)
18. Cook, D.: Identity multipliers and the mistaken Twittering of birds of feather. In: Proceedings
of the 13th European Conference on Cyberwarfare and Security, Greece, pp. 42–48 (2014)
Predicting Disease Outbreaks Using
Social Media: Finding Trustworthy Users

Razieh Nokhbeh Zaeem, David Liau, and K. Suzanne Barber

Center for Identity, The University of Texas at Austin, Austin, USA


{razieh,sbarber}@identity.utexas.edu, davidliau@utexas.edu

Abstract. The use of Internet data sources, in particular social media,


for biosurveillance has gained attention and credibility in recent years.
Finding related and reliable posts on social media is key to performing
successful biosurveillance utilizing social media data. While researchers
have implemented various approaches to filter and rank social media
posts, the fact that these posts are inherently related by the credibility
of the poster (i.e., social media user) remains overlooked. We propose six
trust filters to filter and rank trustworthy social media users, as opposed
to concentrating on isolated posts. We present a novel biosurveillance
application that gathers social media data related to a bio-event, pro-
cesses the data to find the most trustworthy users and hence their trust-
worthy posts, and feeds these posts to other biosurveillance applications,
including our own. We further present preliminary experiments to eval-
uate the effectiveness of the proposed filters and discuss future improve-
ments. Our work paves the way for collecting more reliable social media
data to improve biosurveillance applications.

Keywords: Biosurveillance · Social media · Twitter · Trust

1 Introduction
Thanks to the ever-growing use of social media, the Internet is now a rich source
of opinions, narratives, and information, expressed by millions of users in the
form of unstructured text. These users report, among many other things, their
encounters with diseases and epidemics. Internet biosurveillance utilizes the data
sources found on the Internet (such as news and social media) to improve detec-
tion, situational awareness, and forecasting of epidemiological events. In fact,
since the mid-1990s, researchers have used Internet biosurveillance techniques to
predict a wide range of events, from influenza [5] to earthquakes [9]. Internet
biosurveillance takes advantage of what is called hivemind on social media—the
collective intelligence of the Internet users.
The sources of Internet biosurveillance (e.g., social media) are, generally,
timely, comprehensive, and available [10]. These sources, however, are enormous
and noisy. An important pre-processing step to draw meaningful results from
these sources is to filter and rank the most related parts of the data sources.
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 369–384, 2019.
https://doi.org/10.1007/978-3-030-02686-8_29
370 R. N. Zaeem et al.

Such filtering and ranking is widely recognized in the literature. For instance, in
their overview of Internet biosurveillance [10], Hartley et al. break the process
of Internet biosurveillance into four steps: (1) the collection of data from the
Internet; (2) the processing of the data into information; (3) the assembling
of that information into analysis; and (4) the propagation of the analysis to
biosurveillance experts. They identify relevancy ranking as one of the important
sub-steps of processing data into information in step two, before the actual
analysis begins in step three.
In order to filter and rank the posts (i.e., Twitter posts or news articles),
researchers have implemented various approaches, like Machine Learning (e.g.
Naive Bayes and Support Vector Machines [6,19]), and Natural Language Pro-
cessing (e.g., Keyword and Semantic-based Filtering [8] and Latent Dirichlet
Allocation [7]). All the previous efforts, however, have focused on ranking the
posts independently [12], ignoring the fact that these posts (Twitter posts or
news articles) are inherently related by virtue of the credibility of the poster
(the Twitter user or news agency).
Furthermore, users of social media can post about anything they wish to talk about. Some users talk about their illnesses online, and these are the users we wish to monitor, as they give us a sampling of the nation's infectious disease state. However, users can talk about being ill to elicit sympathy from other users, or they can just be faking it. It is important to evaluate the trustworthiness of users before extracting data for analysis.
Unlike previous work, we observe the fact that the credibility of the users
with respect to a given epidemiological event should be taken into account when
filtering and ranking related posts. We propose six trust filters that filter and
rank social media users who post about epidemiological events: Expertise, Expe-
rience, Authority, Reputation, Identity and Proximity. These trust filters obtain
the credibility or trustworthiness of a user by considering the structure of the
social network (e.g., the number of Twitter followers), the user’s history of posts,
the user’s geo-location, and his/her most recent post.
While we focus on the relevancy ranking sub-step by measuring the user
trustworthiness, we introduce a comprehensive framework that performs the
entire cycle of Internet biosurveillance as explained by the four steps mentioned
by Hartley et al. [10]. We leave technical details of some of the steps out of this
paper, and discuss them separately elsewhere.
Finally, in a preliminary set of experiments, we collect the posts and geo-
locations of 2,000 real Twitter users. We investigate the effectiveness of our pro-
posed trust filters. We observe the statistics of the filter scores and correlations
between the filters and suggest future improvements.

2 Overview: Surety Bio-Event App

The Surety Bio-Event App is our Internet biosurveillance application developed


at the University of Texas at Austin for the DTRA Biosurveillance Ecosystem
(BSVE) [18] framework. The BSVE provides capabilities allowing for disease
Predicting Disease Outbreaks Using Social Media 371

Fig. 1. Overview of the Surety Bio-Event App.

prediction and forecasting, similar to the functionality of weather forecasting.


The BSVE is a virtual platform with a set of integrated tools and data ana-
lytics which support real-time biosurveillance for early warning and course of
action analysis. The BSVE provides a platform to access a large variety of social
media data feeds, a software development kit to create applications (apps), var-
ious tools, and the cloud service to host a web-based user interface. Developers
develop BSVE apps and deploy them to the BSVE to be ultimately used by
biosurveillance experts and analysts.
Our Surety Bio-Event app covers the entire cycle of Internet biosurveil-
lance according to previous work [10]. Figure 1 shows a high level picture of
the Surety Bio-Event App. The four steps are: (1) Multi-Source Real-Time Data
which collects data (Sect. 5), (2) Trust Filter which processes data into infor-
mation (Sect. 3), (3) Surveillance Optimization (including early detection, situ-
ational awareness and prediction) which assembles the information into analysis
(Sect. 6), and (4) Forecasts and Predictions which propagates the analysis to
experts through a Graphical User Interface (Sect. 4). Furthermore, the Surety
app is user customizable and receives Goals and Situational Awareness as well
as Historical Data, Detections, and Predictions from biosurveillance experts.
Figure 2 shows a more detailed view of the App. In this paper, we concen-
trate on the second step, the trust filter, while we broadly review the other steps
too. With data collected from social media, the trust filter component of the
App evaluates the data sources to find the most trustworthy social media users
with respect to a given surveillance goal. The trust filter component optimizes
range, availability and quality of data using the combination of algorithms mea-
suring six dimensions of trust: Expertise, Experience, Authority, Reputation,
Identity and Proximity. The primary functions of the trust filter component are:
(1) improving the quality of data employed by BSVE applications and analysts

Fig. 2. Diagram of data collection and analysis with the Surety Bio-Event App (SBEA).

to make biosurveillance decisions, (2) tracking and quantifying trustworthiness


of known, preferred users to guard against data bias and quality drift for BSVE
applications and analysts, and (3) expanding the landscape of possible trusted
social media users by offering trusted but previously unexplored users via rec-
ommendation notifications to BSVE applications and analysts.

3 Trust Filters
In order to determine user trustworthiness, we introduce the concept of a trust
filter—a score between 0 and 1 assigned to a user (e.g., a Twitter user) which
rates his/her trustworthiness with respect to a given criterion. We propose six
trust filters:

Expertise. Expertise measures a user's involvement in the subject of interest [3]. We define Expertise as the probability that a user will generate content
on the topic in question (e.g., an Influenza outbreak). Using the user’s history of
posts, Expertise can be calculated as how often a specific user has written about
the subject of interest in the past.
Expertise(ui, t) = p(t|ui) = #Posts(ui, t) / #Posts(ui), where ui is a user in the social media network, t is a topic, and p(t|ui) is the probability that the user has generated content on that topic. We calculate this probability by counting the number of that user's posts on the topic and dividing by his/her total number of posts. For all the filters, we use a keyword-based classifier to distinguish the posts concerning the topic of interest and the users posting about that topic.

Experience. Experience is the degree to which a user’s posts are corroborated


by other users. Informally, Experience seeks to measure how a user’s posts about
a subject are corroborated by the ground truth. Assuming that the average

involvement of all users in the subject of interest reveals the truth about the
outside world (e.g., everybody posts about flu when a flu outbreak actually
happens), we can use this average to calculate Experience. In order to do so,
we measure the difference between a user’s involvement in the subject using
Expertise and the average Expertise. To get a score that is between 0 and 1, and
using the fact that Expertise is already between 0 and 1, we calculate Experience
as Experience(ui, t) = 1 − |Expertise(t) − Expertise(ui, t)|.
The closer one's Expertise is to the average Expertise, the higher his/her Experience score.
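Under these definitions, Expertise and Experience reduce to a few lines of Python. The sketch below uses a trivial keyword classifier and toy posts; none of the names are from the authors' implementation:

```python
def on_topic(post: str, keywords=("flu", "influenza")) -> bool:
    """Stand-in for the keyword-based classifier used by all the filters."""
    return any(k in post.lower() for k in keywords)

def expertise(posts: list) -> float:
    """Expertise(u, t) = #Posts(u, t) / #Posts(u)."""
    return sum(on_topic(p) for p in posts) / len(posts)

def experience(user_posts: list, all_posts: list) -> float:
    """Experience(u, t) = 1 - |average Expertise - Expertise(u, t)|."""
    avg = sum(expertise(p) for p in all_posts) / len(all_posts)
    return 1.0 - abs(avg - expertise(user_posts))

users = [["Got the flu today", "Nice weather"],                     # Expertise 0.5
         ["Concert tonight", "Flu shot done", "New job", "Lunch"]]  # Expertise 0.25
print(expertise(users[0]), experience(users[0], users))  # -> 0.5 0.875
```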

Authority. Authority is the number and quality of social media links a user
receives from Hubs as an Authority [3]. A link is the relationship between users,
e.g., likes and comments on Facebook, and following on Twitter. We utilize the
Hyperlink-Induced Topic Search (HITS) [11] algorithm, a link analysis algorithm
widely used to rank Web pages and other entities that are connected by links, to
get a score between 0 and 1. In this algorithm, certain users, known as Hubs, serve
as trustworthy pointers to many other users, known as Authorities. Therefore,
Authorities are the users that have been recognized within the social media
community.

Reputation. Reputation is the number and quality of social media links to


a user. We utilize the PageRank algorithm [2], another widely used ranking
algorithm, to get a score between 0 and 1.
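The Authority and Reputation scores can be approximated with textbook power-iteration versions of HITS and PageRank. The sketch below is our own simplified implementation over a {user: followed-users} dict, not the authors' code, and it ignores dangling-node mass for brevity:

```python
def pagerank(follows: dict, d: float = 0.85, iters: int = 50) -> dict:
    """Reputation: simplified PageRank over the follow graph."""
    nodes = set(follows) | {v for vs in follows.values() for v in vs}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1.0 - d) / len(nodes) for n in nodes}
        for u, outs in follows.items():
            for v in outs:  # u passes a share of its rank to each followed user
                nxt[v] += d * rank[u] / len(outs)
        rank = nxt
    return rank

def hits_authority(follows: dict, iters: int = 50) -> dict:
    """Authority: HITS scores; users pointed to by strong hubs rank high."""
    nodes = set(follows) | {v for vs in follows.values() for v in vs}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iters):
        auth = {n: sum(hub[u] for u, outs in follows.items() if n in outs)
                for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        hub = {n: sum(auth[v] for v in follows.get(n, ())) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return auth

follows = {"alice": ["carol"], "bob": ["carol"], "carol": []}
pr = pagerank(follows)
print(max(pr, key=pr.get))  # -> carol
```

As the paper notes, the two scores capture different aspects of connectivity: here carol tops both rankings, but in larger graphs Reputation and Authority can diverge substantially.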

Identity. Identity is the degree of familial or social closeness between a user and
the person afflicted with the disease. The Identity filter is defined as the rela-
tionship between the posting user that talks about the disease and the subject of
the post that has somehow encountered the disease. If the user is reporting the
disease about himself/herself, the Identity score assigned would be the maximum
value, which is 1. If the user reports about a closer family member, the score
would be higher compared to when the user reports about an acquaintance of
his/hers. We utilize Natural Language Processing and Greedy algorithms to cal-
culate this score. This trust filter first finds all possible grammatical subjects of
a sentence (e.g., a Twitter post), then using the words in the family tree, it finds
the closest family relationship to those subjects and reports that family relation-
ship (e.g., self, mother, co-worker, son) for Identity. A score is assigned to this
relationship ranging from 1 (i.e., reporting disease about self) to 0 (i.e., talking
about total strangers). In order to get the Identity score of a user, the Identity values of all of his/her posts about the subject of interest are calculated and averaged. More
details on this filter can be found in our previous work [13].
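A toy sketch of this scoring follows. The values 1.0 for self and 0.5 for nuclear family come from the paper (0.5 is the value reported for nuclear family in the experiments); the intermediate scores and the word list are our illustrative assumptions, and a real implementation would use proper grammatical-subject extraction rather than word matching:

```python
# 1.0 (self) and 0.5 (nuclear family) follow the paper; other values are assumed.
RELATION_SCORE = {"i": 1.0, "me": 1.0,
                  "mother": 0.5, "father": 0.5, "son": 0.5, "daughter": 0.5,
                  "grandmother": 0.4, "cousin": 0.3,
                  "friend": 0.2, "co-worker": 0.2, "coworker": 0.2}

def identity(post: str) -> float:
    """Score the closest relationship word found in the post; strangers -> 0."""
    words = post.lower().replace(",", " ").replace(".", " ").split()
    return max((RELATION_SCORE.get(w, 0.0) for w in words), default=0.0)

def identity_score(posts: list) -> float:
    """User-level Identity: average over the user's on-topic posts."""
    return sum(identity(p) for p in posts) / len(posts)

print(identity("I caught the flu"),             # -> 1.0
      identity("My mother has the flu"),        # -> 0.5
      identity("Someone at work had the flu"))  # -> 0.0
```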

Proximity. Proximity estimates the distance of a user from the event (e.g.,
disease outbreak location). Using relationship distance (i.e., Identity score) and
geographical distance (through geo-tagged posts and the geo-location of the

user), Proximity utilizes a greedy algorithm to perform graph traversal over the
social media network and then combines the Identity value with the distance
value to calculate the Proximity as shown in Algorithm 1.

Algorithm 1. Proximity Algorithm


Input : Directed user graph G
Output: Proximity scores user.proximity
1 Initialize Identity threshold: T ;
2 for user in users do
3 if user.identity > T then
4 user.separation = 1/user.identity;
5 else
6 user.separation = ∞;
7 end
8 end
9 for user u in G do
10 for user v in G − {u} do
11 distance = v → u;
12 u.separation = min(u.separation, v.separation × distance);
13 end
14 end
15 for user in users do
16 user.proximity = 1 − user.separation;
17 end

Note that, the network graph that the trust filters use is pruned so that it
contains only those users that have posted (at least once) about the subject of
interest. As a result, trust filter scores are calculated focusing on the community
that discusses a particular subject on social media.
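Algorithm 1 translates almost line-for-line into Python. The sketch below is our rendering, not the authors' code: `users` maps each user to an Identity score, `dist` maps a directed edge (v, u) to the distance factor for the link v → u, pairs without an edge are treated as infinitely far, and the single relaxation pass mirrors the pseudocode:

```python
import math

def proximity(users: dict, dist: dict, threshold: float = 0.1) -> dict:
    """Proximity per Algorithm 1: separation = 1/Identity (or infinity
    below the threshold), relaxed once through the edges, then inverted."""
    sep = {u: (1.0 / ident if ident > threshold else math.inf)
           for u, ident in users.items()}
    for (v, u), d in dist.items():  # distance of the link v -> u
        sep[u] = min(sep[u], sep[v] * d)
    return {u: 1.0 - s for u, s in sep.items()}

users = {"alice": 1.0, "bob": 0.05}  # bob's Identity falls below the threshold
dist = {("alice", "bob"): 2.0}
print(proximity(users, dist))  # -> {'alice': 0.0, 'bob': -1.0}
```

Note that a separation above 1 drives the score below 0, which is consistent with the authors' own remark in the experiments that Proximity should be re-defined.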

4 Trust Filter GUI


Figure 3 displays the Graphical User Interface (GUI) of the trust filter tab of
the Surety app. The GUI is composed of four smaller windows. On the top left,
the social media users are listed, and for each, the value of each of the six trust
filters is shown. Next to the gear icons, the names of the six trust filters appear:
Identity, Reputation, Experience, Expertise, Authority, and Proximity. The last
column is the Combined trust score, currently the average of the six filters.
On the GUI, the analyst or BSVE app developer selects a trust filter. He/she
can then sort the users with respect to that score (descending or ascending).
The higher the score, the more trustworthy the user with respect to that trust
filter. In Fig. 3, the users are sorted based on Proximity in descending order.
The analyst or BSVE app developer can also select favorite users that overtime
he/she has found trustworthy and mark them with a star. The GUI suggests

Fig. 3. Trust filter GUI of the Surety Bio-Event App.

social media users that have a higher combined score compared to the favorite
users with a blue glow under the user name (trusted but previously unexplored
users) as shown in the figure. The analyst can review the favorite users (bring
all the favorites to the top) too.
On the GUI, the Network Graph is the top right window, which displays
the users on social media as nodes and their links (e.g., following on Twitter)
and sizes. The analyst can select a trust filter to size the nodes in the Network
Graph. In this figure, the node sizes are based on Identity.
On the bottom left of the GUI, under Node Histogram, the GUI charts the
trust filter scores of users with the top five users for the selected filter.
On the bottom right, under Trust Score Distribution, the GUI displays the
range of user trustworthiness, based on each filter and the combined score. The
distribution of user trust scores with tunable granularity (set to 0.1 in this figure)
shows the number of social media users that have a given trust score.

5 Data Collection
In this section and the next, we briefly overview the first and third steps of the
biosurveillance process, namely data collection and optimization, for the sake of
completeness.
The Surety app (1) uses data already available on the BSVE and (2) collects
data and uploads to the BSVE. The data sources monitored within the BSVE
include well established and trusted data providers such as the Centers for Dis-
ease Control (CDC) and the World Health Organization (WHO). Data from
these sources show the analyst working with the BSVE the best possible mea-
sure of the state of disease within the country. In addition, the BSVE collects
data from news sources and Twitter. Of the sources the BSVE already provides, Twitter contains a treasure trove of information. However, other sources such as blogs, Instagram, and Reddit have been underused. The Surety app
aims to fix these gaps in data collection. The trust filter part of the Surety
App seeks to collect data from other sources not currently supported by the
BSVE that contain connectivity network information, and are typically focused
on individuals as opposed to news feeds.
Figure 4 demonstrates some of the data sources for the Surety app. Note that
not all the data sources are candidates to be used with trust filters. Some of these
data sources provide only time series data which is used by the optimization part.
The data sources that are appropriate for trust filters are as follows. For these
sources, we have implemented methods within our API to collect historical user
data as well as connections to streaming APIs: Twitter, WordPress, Instagram,
Tumblr, Reddit, and Wikipedia.

6 Optimization
The third step of the biosurveillance process analyzes large collections of trusted
data sources to assemble systems that efficiently achieve user specified surveil-
lance goals, such as early outbreak detection. This analysis is accomplished
through optimization algorithms that evaluate data collections through com-
parison to historical and simulated bio-events. The Surety app yields trusted
data sources, along with statistical models and performance metrics to support
future surveillance activities. The trust filter part of the Surety App is capable
of collecting a wide-range of data then formatting that data into the required
time series data source for the optimization part. Our optimization algorithms,
discussed elsewhere, include early detection, situational awareness and predic-
tion [14].

7 Implementation
Our app is implemented with a Python Flask back-end and JavaScript front-
end. The back-end was developed to support user interactivity in the front-end.
It serves JSON data generated from the algorithms to the user interface. The
application is integrated into the BSVE.

Fig. 4. Data collection sources of the Surety App.

8 Experiments
We have designed a preliminary set of experiments to answer the following
research question: How well do the proposed filters perform? In order to answer
this question, we plan to use seed data (e.g., a synthetic network of users, posts,
and disease outbreaks) as well as actual data (e.g., actual network of Twitter
users and their posts).
1. We observe the value of the trust filters and their trends.
2. We compare filter scores against hospital data to judge the ability of the trust
filters to detect disease outbreaks.
In this paper, we observe the trend of the proposed trust filters for a real network
of 2,000 Twitter users with their posts. The use of seed data as well as the
comparison with hospital data is work in progress.
For this set of experiments, we downloaded the posts and geo-location of 2,000
Twitter users. In order to do so, we performed a keyword search of the word ‘flu’
on Twitter API and then downloaded the user profile information (including geo-
location coordinates), the user’s friends’ time-lines, lists of friends and followers,
and past 30 days of tweets. We started the download on July 22, 2016 and,
because of Twitter’s bandwidth limitations, it took us a week to download 2,000

users who had posted at least once with the word ‘flu’, totaling 33 GB. Note
that not all the posts of these users over the past 30 days are necessarily about flu.
We use a keyword-based classifier to distinguish flu-related posts.
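The paper does not publish its keyword classifier, but a minimal sketch of such a keyword-based filter might look as follows (the keyword list here is illustrative, not the authors' actual list):

```python
# Minimal keyword-based classifier for flu-related posts. The keyword
# list is illustrative; the paper does not publish its actual list.
FLU_KEYWORDS = {"flu", "influenza", "fever", "sick"}

def is_flu_related(post: str) -> bool:
    """Return True if any flu keyword appears as a token of the post."""
    tokens = (tok.strip(".,!?#@") for tok in post.lower().split())
    return any(tok in FLU_KEYWORDS for tok in tokens)

posts = [
    "Got the flu, staying home today",
    "Beautiful weather in Austin!",
    "High fever all night...",
]
flu_posts = [p for p in posts if is_flu_related(p)]   # keeps the 1st and 3rd
```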
Figure 5 shows the filters’ maximum, minimum and average values. The Iden-
tity trust filter has an average (as well as peak) value at about 0.48, which means
that, when people do post about flu, they tend to post about flu encounters of
their nuclear family members, as 0.5 is assigned to nuclear family members for
the Identity score. Reputation and Authority scores are uniformly close to 0,
implying that the network we downloaded had very little connectivity. The low
degree of connectivity is expected since people who post about flu do not neces-
sarily tend to follow others who post about flu. The average value of Expertise
was close to 0 too, meaning that even among those who have posted about flu
at least once, the number of flu-related posts over a 30-day period was relatively
low. The average value of 0.95 for Experience shows that most users’
Expertise scores were close to the average Expertise, i.e., close to 0. Investigating
the outliers should point to users who were unusually concerned about flu.
Finally, we found that Proximity should be re-defined to make it independent of
Identity, to show concrete distance from outbreak locations.

Fig. 5. Statistics of trust filters.

Figures 6, 7, 8, and 9 display the most interesting correlations we found
between the filter values. Figure 6 shows that the combined score is most heavily
influenced by Identity; these two filters are related with R² = 0.49. Therefore,
we might need to normalize and weight the filters to obtain a new, less-biased
definition of the Combined score.
Figure 7 charts the correlation between the Reputation and Authority filters
(R² = 0.15). These two filters are not closely related. Therefore, while both
measure the connectivity of the network, they consider different aspects of
connectivity.

Fig. 6. Correlation between Combined Filter and Identity.

Fig. 7. Correlation between Reputation and Authority.

Figure 8 confirms that Experience and Expertise are inversely correlated. We
might need to update the definition of Experience to measure the corroboration
by others differently.

Fig. 8. Correlation between Expertise and Experience.

Finally, while Proximity is initialized with Identity, as Fig. 9 shows, it is rather
independent of Identity. While the Proximity of users to a potential outbreak
location can be compared to one another, the absolute value of Proximity still
does not show the concrete physical distance between the user and a flu outbreak
location.

8.1 Feature Importance


We compare our trust filters with other simple features which are widely studied
in processing Twitter data [16]. Figure 10 and Table 1 show the feature impor-
tance score from the Scikit-Learn kit [17]. We use the Extremely Randomized
Tree Classifier as our method to evaluate the importance of each feature. We
utilize a library [1,15] in which the Gini coefficient is used as a measure of the
importance of each feature. In short, the importance scores sum to one, and
the larger the score, the more important that feature is to the decision. As
Table 1 shows, the best feature from the Extremely Randomized Tree Classifier is the
number of posts by a specific user within the given period of time. Consequently,
the filters that are based on the number of related posts, such as Experience and
Expertise, work well. However, the number of posts can be easily forged with
posting robots or Spam posts. Two other features that are known to perform
well in similar types of problems are the average post length and the number
of tagged Twitter IDs which start with the symbol “@” [4]. Therefore, potential
filters to consider can be based on these features. Identity, Reputation, and
Proximity all perform better than the other features studied in previous work,
including retweet, and whether or not the posts contain ‘?’ and ‘!’. Finally,
Authority performs poorly and can be considered irrelevant.

Fig. 9. Correlation between Identity and Proximity.

Fig. 10. Feature importance.
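The importance-scoring procedure can be sketched with scikit-learn's ExtraTreesClassifier; the synthetic data below merely stand in for the real per-user features (number of posts, Experience, Expertise, and so on), so this illustrates the method rather than reproducing Table 1:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-ins for per-user features; column 0 plays the role of
# "number of posts", which drives the (synthetic) label.
rng = np.random.default_rng(0)
X = rng.random((500, 3))
y = (X[:, 0] > 0.7).astype(int)

clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
scores = clf.feature_importances_   # Gini importances; they sum to one
```

As in the paper, the scores sum to one, and the feature the label actually depends on receives the largest share.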

Table 1. Features and corresponding importance scores.

Feature Importance score


Number of posts 0.205
Experience 0.143
Expertise 0.132
Avg. post length 0.129
Number of @ tags 0.111
Identity 0.100
Reputation 0.099
Proximity 0.033
Retweet 0.029
Contains ‘?’ 0.010
Contains ‘!’ 0.009
Authority 0.002

9 Conclusion

Filtering and ranking social media posts is essential to biosurveillance applications that monitor them to detect and forecast disease outbreaks. We introduced
a novel way to filter and rank social media posts by concentrating on the trust-
worthiness of social media users with respect to a given subject. We proposed six
trust filters and used them in the context of a complete biosurveillance applica-
tion. We further evaluated these trust filters by observing how they perform on a
real set of Twitter posts downloaded from 2,000 users for over 30 days. Improv-
ing the filter definitions and judging the effectiveness of the filters in finding
actual disease outbreaks are two major future work directions.

Acknowledgment. Surety Bio-Event App is a long-term project of the Center for
Identity. The authors thank Guangyu Lin, Roger A. Maloney, Ethan Baer, Nolan
Corcoran, Benjamin L. Cook, Neal Ormsbee, Haowei Sun, Zeynep Ertem, Kai Liu, and
Lauren A. Meyers for their contribution to this project. This work has been funded
by Defense Threat Reduction Agency (DTRA) under contract HDTRA1-14-C-0114
CB10002.

References
1. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regres-
sion Trees. Statistics/Probability Series. Wadsworth Publishing Company, Belmont
(1984)
2. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine.
Comput. Netw. ISDN Syst. 30(1), 107–117 (1998)
3. Budalakoti, S., Barber, K.S.: Authority vs affinity: modeling user intent in expert
finding. In: 2010 IEEE Second International Conference on Social Computing
(SocialCom), pp. 371–378. IEEE (2010)
4. Castillo, C., Mendoza, M., Poblete, B.: Information credibility on Twitter. In:
Proceedings of the 20th International Conference on World Wide Web, WWW
2011, pp. 675–684. ACM, New York (2011)
5. Collier, N., Son, N.T., Nguyen, N.M.: OMG U got flu? Analysis of shared health
messages for bio-surveillance. J. Biomed. Semant. 2(5), S9 (2011)
6. Denecke, K., Krieck, M., Otrusina, L., Smrz, P., Dolog, P., Nejdl, W., Velasco, E.:
How to exploit Twitter for public health monitoring. Methods Inf. Med. 52(4),
326–39 (2013)
7. Diaz-Aviles, E., Stewart, A., Velasco, E., Denecke, K., Nejdl, W.: Epidemic intelli-
gence for the crowd, by the crowd. Int. AAAI Conf. Web Soc. Media 12, 439–442
(2012)
8. Doan, S., Ohno-Machado, L., Collier, N.: Enhancing Twitter data analysis with
simple semantic filtering: example in tracking influenza-like illnesses. In: IEEE
Second International Conference on Healthcare Informatics, Imaging and Systems
Biology (HISB), pp. 62–71 (2012)
9. Doan, S., Vo, B.-K.H., Collier, N.: An analysis of Twitter messages in the 2011
Tohoku earthquake. In: International Conference on Electronic Healthcare, pp.
58–66. Springer (2011)
10. Hartley, D.M., Nelson, N.P., Arthur, R., Barboza, P., Collier, N., Lightfoot, N.,
Linge, J., Goot, E., Mawudeku, A., Madoff, L.: An overview of internet biosurveil-
lance. Clin. Microbiol. Infect. 19(11), 1006–1013 (2013)
11. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM
(JACM) 46(5), 604–632 (1999)
12. Lamb, A., Paul, M.J., Dredze, M.: Separating fact from fear: tracking flu infections
on Twitter. In: HLT-NAACL, pp. 789–795 (2013)
13. Lin, G., Nokhbeh Zaeem, R., Sun, H., Barber, K.S.: Trust filter for disease surveil-
lance: Identity. In: IEEE Intelligent Systems Conference, pp. 1059–1066, September
2017
14. Liu, K., Srinivasan, R., Ertem, Z., Meyers, L.: Optimizing early detection of emerg-
ing outbreaks. Poster presented at: Epidemics 6, Sitges, Spain, November 2017
15. Louppe, G., Wehenkel, L., Sutera, A., Geurts, P.: Understanding variable impor-
tances in forests of randomized trees. In: Proceedings of the 26th International
Conference on Neural Information Processing Systems, NIPS 2013, USA, vol. 1,
pp. 431–439. Curran Associates Inc. (2013)
16. O’Donovan, J., Kang, B., Meyer, G., Höllerer, T., Adalı, S.: Credibility in context:
an analysis of feature distributions in Twitter. In: 2012 International Conference
on Privacy, Security, Risk and Trust and 2012 International Conference on Social
Computing, pp. 293–301, September 2012

17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine
learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
18. Digital Infuzion: DTRA Biosurveillance Ecosystem (BSVE) (2017)
19. Torii, M., Yin, L., Nguyen, T., Mazumdar, C.T., Liu, H., Hartley, D.M., Nelson,
N.P.: An exploratory study of a text classification framework for internet-based
surveillance of emerging epidemics. Int. J. Med. Inform. 80(1), 56–66 (2011)
Detecting Comments Showing Risk for Suicide
in YouTube

Jiahui Gao1, Qijin Cheng2(✉), and Philip L. H. Yu1

1 Department of Statistics and Actuarial Science, The University of Hong Kong,
Pok Fu Lam, Hong Kong
2 Department of Social Work, The Chinese University of Hong Kong, Shatin,
Hong Kong
qcheng@cuhk.edu.hk

Abstract. Natural language processing (NLP) for Cantonese, a mixture of
Traditional Chinese, borrowed characters representing spoken terms, and English, is largely underdeveloped. Applying NLP to detect social media posts
showing suicide risk, a rare event in the general population, is even more
challenging. This paper tried different text mining methods to classify whether
Cantonese comments on YouTube indicate suicide risk. Based on word
vector features, classification algorithms such as SVM, AdaBoost, Random
Forest, and LSTM are employed to detect the comments’ risk level. To address
the imbalance of the data, both re-sampling and focal loss methods are
used. With improvements at both the data and algorithm levels, the LSTM
algorithm achieves satisfactory test classification results (84.3% and
84.5% g-mean, respectively). The study demonstrates the potential of automatically detecting suicide risk in Cantonese social media posts.

Keywords: Suicide · Text mining · Social media · Cantonese ·
Sentiment analysis

1 Introduction

Suicide is a serious public health concern globally and Hong Kong is no exception. The
latest suicide rate in Hong Kong is about 11.7 per 100,000 [1], which is about the
medium level in the global context [2]. In addition, suicide is the leading cause of death
among young people in Hong Kong [3]. Due to the popularity of social networking
sites in recent years, many young people were found to disclose their emotional distress
and even suicidal thoughts through social media [4]. Suicide prevention professionals
are, therefore, highly concerned with those online contents and hope to detect online
posts showing risk for suicide as early as possible so that interventions can be delivered
and lives can be saved.

Q. Cheng—Equal first author.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 385–400, 2019.
https://doi.org/10.1007/978-3-030-02686-8_30

1.1 Related Work


Some pioneering efforts have been conducted to detect textual content showing suicide
risk. Some basic machine learning methods were used to classify suicide notes,
achieving 71% accuracy [6]. However, the accumulation of suicide notes is restricted
by very limited data sources and can be time consuming. Thanks to the instantaneity of
social media content, detecting suicidal ideation in social networks can strengthen
suicide prevention to a large extent. However, little work on detecting suicidal text in
social media has been conducted. In 2007, blogs were first used to detect users at risk.
Yen-Pei Huang [7] applied simple counting methods based on suicide-related keywords
to detect bloggers with suicidal tendencies [8], achieving only a 35% success
rate with low accuracy. Based on simple token unigram bag-of-words features, machine
learning algorithms were also used to predict suicidal tendency on Twitter [9].
Concerning users’ behavioral features in social networks, M. Johnson Vioulès [10] applied
a martingale framework for suicide warning signs detection. However, Vioulès’ study
was run on only two Twitter users’ data.
In Mainland China, researchers have tried different statistical and machine learning
methods to detect Weibo (a Chinese social media site) posts showing emotional distress
and suicide risk [11]. Although achieving promising results, they also noted a few
challenges. First, dataset for detecting suicide risk is often highly imbalanced, given
that suicidal behavior is a rare event. A number of solutions to the class-imbalance
problem have been proposed at both the data and algorithm levels [12]. At the data level,
researchers often had to conduct re-sampling to adjust the imbalance, such as random
over-sampling of the minority class with replacement, random under-sampling of the majority
class, direct over-sampling, direct under-sampling, and so on [13]. At the algorithm level,
adjusting the cost function of the algorithm is suggested. In addition, those studies often
retrospectively collected data from social media and used the historical data for training
and testing. However, such solutions will make it questionable to directly apply the
results in real life, where suicide is indeed a rare event and social media contents are
constantly updating and evolving.
Although the populations of both Mainland China and Hong Kong are mainly ethnic Chinese,
Hong Kong people speak the Cantonese dialect and often write in a mixture of Cantonese
and English due to the city’s history as a British colony. Due to the absence of Cantonese
natural language processing tools, text feature extraction in Cantonese is often
based on simple n-gram features rather than word features [14]. A study found that
Cantonese pre-treated by a Mandarin word segmentation tool consistently outperforms
the character n-gram split [15]. In order to classify the at-risk online text better, we need
to do Cantonese word segmentation using a satisfactory method.
The main contribution of this paper is fourfold. First, this might be the first time
that Cantonese social media texts’ word vector features are used for detecting suicide
risk. We conducted Cantonese word segmentation based on a relatively complete
Cantonese dictionary by combining dictionaries on the internet. Second, unlike previous
suicide detection that relied on retrospective accumulation, we investigated an
algorithm that detects suicide risk immediately from a comment’s text features. Third,
a deep learning method was used on the word vector features and achieved a better
result than conventional machine learning models. Lastly, we introduced the focal loss, in

addition to the re-sampling method, to tackle the imbalance issue in the text domain and
achieved a satisfactory result. Focal loss, a new loss function, is found to be an effective
alternative for dealing with class imbalance [16].

1.2 Paper Outline


In the next section, the construction of the Cantonese resource base is briefly introduced.
Section 3 presents the methods we used to preprocess the suicide-related
comments. Section 4 introduces the feature extraction and classification methods,
along with the evaluation metrics. Section 5 analyzes the
experimental results. In the last section, the paper is concluded and future work is
discussed.

2 Construction of Cantonese Resource Base

Social media posts are openly available at large. However, to label which posts show
risk for suicide requires annotation by suicide prevention professionals. Besides, even
though text mining for Simplified Chinese and English is relatively mature, little work has
been done on Cantonese text mining. The absence of a popular Cantonese dictionary is also
an obstacle in the field.

2.1 Data Collection and Annotation


There has been a surge of student suicides in Hong Kong in recent years, which was
prominently reported by local press and generated wide discussion among the public.
One of the authors, QC, has been monitoring how people responded to this issue in
social media. She identified 162 YouTube videos relating to this issue published during
the 2015/16 school year, to which there were 5051 comments posted in the public
domain. The comments were downloaded by calling YouTube API and annotated by
QC and a trained research assistant (RA). Those comments indicating that the com-
menter was having or had serious suicidal thoughts, including having attempted suicide,
were labelled as at-risk. QC and the research assistant first coded a
random sample of 100 comments separately. The inter-rater reliability, measured
by Cohen’s kappa coefficient, was 0.91, which indicated high agreement. Then the RA
completed the annotation of the rest of the comments.

2.2 Construction of Cantonese Corpus


In fact, Cantonese is primarily a spoken language. The most important mechanism by
which Cantonese is represented in written form is phonetic borrowing. When
confronting the ‘sound but no character’ problem, Cantonese speakers sometimes resort to
creating a new character to represent a Cantonese word [17].
Similar to comments in YouTube, local online forums also contain a large amount
of short Cantonese texts mixed with extra characters. In order to acquire more written

Cantonese corpus, 4,310,566 written Cantonese posts were crawled from a popular
local online forum [18].

2.3 Construction of Cantonese Dictionary


Word segmentation is an essential step before text classification, and a good
Cantonese dictionary is important for doing it well. By combining 26
Cantonese lexicons from Sogou [19], a popular text input software in China, we
constructed a Cantonese dictionary containing 597,731 Cantonese words.
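Dictionary-based segmentation can be illustrated with a simple forward-maximum-matching pass. Jieba's actual algorithm (a prefix dictionary plus an HMM for unknown words) is more sophisticated, and the ASCII toy dictionary below merely stands in for the 597,731-word Cantonese dictionary:

```python
# Forward maximum matching: at each position, take the longest dictionary
# word; fall back to a single character for out-of-vocabulary text.
def segment(text, dictionary, max_len=10):
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# ASCII stand-ins for dictionary entries
toy_dict = {"hong", "kong", "hongkong", "people", "speak", "cantonese"}
print(segment("hongkongpeoplespeakcantonese", toy_dict))
# -> ['hongkong', 'people', 'speak', 'cantonese']
```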

3 Text Preprocessing

YouTube comments are mainly written in Cantonese. However, English is also a
popular and official language in Hong Kong; 9% of the total comments that we collected
from YouTube are in English. To enable a full analysis, the English comments
were first translated into Cantonese.

3.1 Translation
Because there is no direct translation tool from English to Hong Kong Cantonese, two
steps were taken to translate English comments into Cantonese. First, English words were
translated into simplified Chinese using the Google Translate API [20] for Python.
Second, Open Chinese Converter Project (OpenCC) [21] was used to convert simpli-
fied Chinese to Hong Kong Cantonese. OpenCC is an open source project for con-
version between Traditional Chinese and Simplified Chinese, supporting regional
idioms in Mainland China and Hong Kong [22].

3.2 Filtering
Stop words, by definition, are those words that appear in the texts frequently but do not
carry significant information [23]. Effective text mining can be achieved by removal of
stop words. Cantonese and Mandarin Chinese are within the same language family, so
their written forms share a number of words in common [15]. Due to the absence of
a Cantonese stop-word dictionary, we used a Mandarin stop-word dictionary to filter
comments. Similar to English stop words, Chinese stop words are usually those words
with part of speeches like adjectives, adverbs, prepositions, interjections, and auxil-
iaries. Adverb “ ” (of), preposition “ ” (in), conjunction “ ” (because of) and
“ ” (so) are some examples [23].
According to the guidelines for manual annotation, a comment would be labelled as
non-risk if it contains only stop words, punctuation, or emoji, because these simple
terms cannot provide sufficient information for the readers to assess suicide risk.
Following this guideline, if a comment only contains these terms, it will be detected
and classified as non-risk comment at first. For other comments, these terms will be
removed first and the remaining text will be classified using the classification models.

4 Text Classification for Suicidality Detection


4.1 Feature Representation
It is common practice to represent a document as a vector. In this paper, we utilized
the Jieba [24] segmentation tool and the word2vec [25] model to acquire sentence
vectors.
Unlike English, Chinese sentences do not contain spaces between words. Therefore,
word boundaries cannot be detected automatically without segmentation. Based on the
Cantonese dictionary constructed in the last section, we conducted text segmentation
using Jieba [24], a Chinese text segmentation tool, to split sentences into words.
The distributed representation of words in a vector space can group similar words
better and help algorithms achieve better results. This paper used the word2vec
model developed by Mikolov [25] for learning vector representations of words. We set
the dimensionality of vectors as 100 and learned the word vectors from the huge dataset
(4,310,566 Cantonese posts) collected from the local forum. Then, we averaged the
word vectors in a comment document to acquire its document vector.
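The averaging step can be sketched as follows; the 3-dimensional vectors are made-up stand-ins for the paper's 100-dimensional word2vec vectors:

```python
# Document vector = mean of the word vectors of in-vocabulary tokens.
word_vectors = {           # made-up 3-d stand-ins for 100-d word2vec vectors
    "exam":   [0.9, 0.1, 0.0],
    "stress": [0.8, 0.3, 0.1],
    "school": [0.7, 0.2, 0.2],
}

def doc_vector(tokens, vectors):
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:                       # no in-vocabulary token
        dim = len(next(iter(vectors.values())))
        return [0.0] * dim
    return [sum(v[d] for v in known) / len(known) for d in range(len(known[0]))]

vec = doc_vector(["exam", "stress", "unknown"], word_vectors)   # [0.85, 0.2, 0.05]
```

Out-of-vocabulary tokens are simply skipped, so the document vector is the mean over the words the model knows.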
Figure 1 shows the word ‘ (suicide in Traditional Chinese)’ and its 100
neighbouring words according to the cosine similarity between word vectors. The
100-dimensional word vectors were projected into 3 dimensions using the Principal
Component Analysis (PCA).

Fig. 1. Word vector visualization.



4.2 Classifier
After filtering out those comments containing only stop words, punctuation, or emoji as
non-risk data, the remaining comments need to be classified.
Both machine learning and deep learning methods are popular in text classification.
The paper used algorithms in both fields to detect whether a comment shows risk for
suicide.
Support Vector Machine (SVM). Support Vector Machine (SVM) has been shown to
be highly effective at traditional text categorization [26]. This method searches for a
hyperplane represented by a vector that can separate document vectors of two classes
with maximum margin.
AdaBoost. Adaptive Boosting (AdaBoost) aims at constructing a “strong” classifier by
combining a number of “weak” classifiers [14]. AdaBoost uses weights
to increase the importance of misclassified data and decrease the importance of correctly
classified data. By combining these weak classifiers based on their relative
performance, AdaBoost can achieve an improved accuracy.
Random Forest (RF). Random forest is a variant of bagging methods proposed by
Breiman [27]. Similar to bagging, random forest constructs a decision tree for each of
the bootstrap samples drawn from the data. But unlike bagging, random forest ran-
domly selects a subset of predictors to determine the optimal splitting rule in each node
of the trees in order to avoid overfitting [28].
Long short-term memory network (LSTM). Long Short-Term Memory network
(LSTM) [29] is a special kind of recurrent neural network, capable of learning long-
term dependencies. We trained the LSTM model based on words, using the pre-trained
word2vec embedding layer with 100 dimensions. As shown in Fig. 2, the model
takes the mean of the outputs of all LSTM cells to form a feature vector and
then applies multinomial logistic regression to this feature vector [30].

Fig. 2. Long short-term memory.
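The pooling-plus-softmax head described above can be sketched as a plain forward pass; the LSTM outputs and the weights below are made up, since in the paper both are learned end-to-end in Keras:

```python
import math

# Mean-pool the per-timestep LSTM outputs into one feature vector ...
lstm_outputs = [[0.2, -0.1], [0.4, 0.3], [0.0, 0.1]]   # 3 timesteps, 2 units
pooled = [sum(step[d] for step in lstm_outputs) / len(lstm_outputs)
          for d in range(len(lstm_outputs[0]))]        # -> [0.2, 0.1]

# ... then apply a softmax (multinomial logistic regression) head.
W = [[1.0, -1.0], [-1.0, 1.0]]   # made-up weights: 2 classes x 2 features
b = [0.0, 0.0]
logits = [sum(w * x for w, x in zip(row, pooled)) + bi
          for row, bi in zip(W, b)]
exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]   # class probabilities, sum to 1
```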

Topic seed words classification model. The suicide-related comment data studied
here are extremely imbalanced with a lot of non-risk comments. The paper designed a
topic seed-word classification model to filter out the non-risk comments first and then use
the relatively balanced data to train the classifier. First, the seed words [31]
relating to the suicide topic were summarized under the guidance of suicide research
experts. The seed word list is shown in Table 1.
Based on the similarity of documents, if a document vector is far away from the
seed list, it can be predicted as non-risk. We measure the similarity by the cosine
similarity between a document and the seed list.

In Fig. 3, the x-axis shows the cutoff value for cosine similarity below which a
comment is predicted to be non-risk, and the y-axis shows the misclassification rate. We
find that from 0.6 to 0.65 the misclassification rate does not increase much, until it
increases suddenly at cutoff = 0.7. As we use seed words here to filter out the non-risk
comments, we decided to choose 0.65 as the cutoff value for the cosine similarity. If a
comment’s cosine similarity from the seed list is smaller than 0.65, it will be classified
as non-risk and will be removed in the first stage. The remaining comments will then be
studied in the second stage for identification of at-risk comments.
Using 0.65 as the cutoff to filter out non-risk comments with the seed words, only
0.22% of the comments in the training data were misclassified. Table 2 shows the top 10
non-risk comments with the highest cosine similarity filtered by the seed words.
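The first-stage filter can be sketched as below; the document and seed-list vectors are made-up stand-ins for the averaged word2vec vectors:

```python
import math

CUTOFF = 0.65   # cosine-similarity cutoff chosen from Fig. 3

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

seed_vec = [1.0, 0.0, 0.5]            # made-up vector for the seed-word list
comments = {
    "c1": [0.9, 0.1, 0.6],            # near the suicide topic: kept
    "c2": [0.0, 1.0, 0.0],            # unrelated: filtered out as non-risk
}
kept = {cid for cid, v in comments.items()
        if cosine(v, seed_vec) >= CUTOFF}   # -> {"c1"}
```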

4.3 Loss Function


Two kinds of loss function are used in this paper. Cross entropy loss is used
when the model is trained on the balanced dataset. Focal loss [16] is used when the
model is trained on the imbalanced dataset.
Cross Entropy Loss. The cross entropy (CE) loss for binary classification is

$$\mathrm{CE}(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases}$$

where $y \in \{\pm 1\}$ specifies the class and $p \in [0, 1]$ is the model’s estimated probability for the class $y = 1$.

Table 1. Seed words

Seed Words Translation in English


Suicide (Simplified Chinese)
Suicide (Traditional Chinese)
will go die (Both Traditional and Simplified Chinese)
Go die (Both Traditional and Simplified Chinese)
Why I am a human being (Cantonese)
Press (Traditional Chinese)
Pressure (Simplified Chinese)
Suffering (Traditional Chinese)
End one’s life (Traditional Chinese)


End one’s life (Simplified Chinese)
Jump off (Cantonese)
Die (Both Traditional and Simplified Chinese)
End (Both Traditional and Simplified Chinese)
Vile (Traditional Chinese)
Disgust (Traditional Chinese)
Going to die (Both Traditional and Simplified Chinese)
Want to die (Both Traditional and Simplified Chinese)
Negative energy (Traditional Chinese)
Cry (Cantonese)
Very hard (Both Traditional and Simplified Chinese)
Very tired (Cantonese)
Cutting wrist (Cantonese)
Jump off a building (Traditional Chinese)
Jump off a building (Simplified Chinese)
Cutting wrist (Mandarin)
Cutting hand (Mandarin)
Leave this world (Cantonese)
Very stressful (Traditional Chinese)
Super stressful (Traditional Chinese)
Give up (Traditional Chinese)
Heartbroken (Traditional Chinese)
Jump off (Mandarin)
Unhappy (Cantonese)
Helpless (Traditional Chinese)
Garbage (Both Traditional and Simplified Chinese)
No hope (Cantonese)
Pain (Both Traditional and Simplified Chinese)
Collapse (Traditional Chinese)
Don’t want to live (Cantonese)
End one’s own life (Traditional Chinese)
Want suicide (Traditional Chinese)
End life (Traditional Chinese)


Kill oneself (Traditional Chinese)
Hopeless (Traditional Chinese)
What is the point to live on (Traditional Chinese)
Die (Both Traditional and Simplified Chinese)
Better to die (Cantonese)
Jumped (Cantonese)
What is the meaning of life (Traditional Chinese)
Kill (Traditional Chinese)

Fig. 3. Misclassification rate for various cosine similarity cutoff values.

Define $p_t$:

$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}$$

Then the cross entropy can be rewritten as

$$\mathrm{CE}(p, y) = \mathrm{CE}(p_t) = -\log(p_t)$$



Table 2. Selected comments filtered by seed words

Cosine similarity | Label (1: at-risk, 0: non-risk) | English translation of the comment
0.6499 | 0 | Actually you have a good point
0.6498 | 0 | Don’t feel sad
0.6497 | 0 | Come on try your best
0.6495 | 0 | We English teachers do a lot of homework
0.6495 | 0 | Why so many things are arranged in the same week
0.6494 | 0 | Thought there was something wrong
0.6493 | 0 | But believe we are the best
0.6493 | 0 | Have you thought how many scores you can get
0.6490 | 0 | Come on I believe you can do it
0.6490 | 0 | My mom forced me to take Belilios (Note: a school in Hong Kong)

Focal Loss. To address this class imbalance problem, focal loss [16] was designed by
reshaping the standard cross entropy loss such that it down-weights the loss assigned to
well-classified examples. The focal loss was defined as [16]:

$$\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$

where $\alpha_t \in [0, 1]$ for the positive class and $\alpha_t = 1 - \alpha$ for the negative class, and $\gamma \ge 0$ is the tunable focusing parameter. The weight $\alpha_t$ is introduced to balance the importance of positive/negative examples. The modulating factor $(1 - p_t)^{\gamma}$ is added to balance easy/hard examples (an example with large loss is defined as a hard example).
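A direct translation of the loss into code, as a sketch (the α and γ values below are the common defaults from [16], not necessarily the ones used in this paper):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t) for one example."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A well-classified example (p_t = 0.9) is down-weighted far more than a
# hard one (p_t = 0.1) by the modulating factor (1 - p_t)**gamma.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)

# With gamma = 0 and alpha = 1 the focal loss reduces to cross entropy.
ce = focal_loss(0.3, 1, alpha=1.0, gamma=0.0)   # equals -log(0.3)
```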

4.4 Evaluation
The aim of this paper is to predict whether a YouTube comment shows
suicide risk. The confusion matrix, as shown in Table 3, is commonly used in
classification evaluation.
Here, we take the at-risk class as the positive class. Our purpose is to find the at-risk users
and save as many lives as possible, so the costs of false positive and false negative
predictions are not the same. A false negative prediction is a serious matter, as
we might miss the chance to save a life. Besides, the non-risk class dominates the
data. Given such extremely imbalanced data, the error rate is no longer an appropriate
performance measure [32].

Table 3. Confusion matrix


Predicted positive Predicted negative
Positive class True positive (TP) False negative (FN)
Negative class False positive (FP) True negative (TN)

In this paper, we use the geometric mean of the accuracies (G-mean) [33] as the
performance measure:

$$\text{True Positive Rate } (Acc^{+}) = \frac{TP}{TP + FN}$$

$$\text{True Negative Rate } (Acc^{-}) = \frac{TN}{TN + FP}$$

$$\text{G-mean} = \sqrt{Acc^{+} \times Acc^{-}}$$

G-mean is a popular performance evaluation measure for imbalanced
training data. The idea is to maximize the accuracy on each of the two classes while
keeping these accuracies balanced [32]. For example, a high accuracy on negative
examples combined with a low accuracy on positive examples will result in a poor g-mean value.
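Computed from the confusion matrix, for example (the counts below are illustrative, not the paper's results):

```python
import math

def g_mean(tp, fn, tn, fp):
    acc_pos = tp / (tp + fn)    # True Positive Rate (Acc+)
    acc_neg = tn / (tn + fp)    # True Negative Rate (Acc-)
    return math.sqrt(acc_pos * acc_neg)

# High accuracy on the dominant negative class cannot compensate for
# missing the positive (at-risk) class:
balanced = g_mean(tp=40, fn=10, tn=800, fp=160)   # Acc+ = 0.80, Acc- ~ 0.83
skewed = g_mean(tp=5, fn=45, tn=955, fp=5)        # Acc+ = 0.10, Acc- ~ 0.99
```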

5 Experiment and Results

This paper performed suicide-related comment classification with improvements at
both the data and algorithm levels.

5.1 Experimental Setting


Experimental Data. The data crawled from YouTube consist of 5051 comments (251
at-risk comments, 4800 non-risk comments), which were split into two datasets with
80% for training and 20% for testing. To tackle the imbalance
problem, we designed our models in two ways. One possibility is to apply under-
sampling to randomly select a balanced training dataset consisting of 201 at-risk
comments and 201 non-risk comments. Then the balanced dataset was used to train
classifiers using the cross-entropy loss as the loss function. Alternatively, we can use
the raw imbalanced training dataset (201 at-risk comments and 3840 non-risk comments)
to train classifiers using the focal loss.
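The under-sampling option can be sketched as follows (the comment lists are dummy placeholders; the real data are the annotated YouTube comments):

```python
import random

# Randomly under-sample the majority (non-risk) class down to the size of
# the minority (at-risk) class, yielding the 402-comment balanced Set A.
random.seed(0)
at_risk = [f"risk_{i}" for i in range(201)]        # placeholder comments
non_risk = [f"nonrisk_{i}" for i in range(3840)]   # placeholder comments

non_risk_sample = random.sample(non_risk, len(at_risk))
set_a = at_risk + non_risk_sample   # 402 comments in total
```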
Parameter Setting. This paper used the scikit-learn [34] library in Python to train
the SVM, AdaBoost, and Random Forest models; the gensim [35] tool in Python to
396 J. Gao et al.

Table 4. Model parameters


Model Parameters
SVM kernel = ‘rbf’, C = 1.5, gamma = 0.05
(RBF)
Adaboost Base_estimator = decision tree, n_estmator = 50, learning_rate = 1,
algorithm = ‘SAMME.R’
Random max_depth = 5, n_estimators = 10, max_features = 1
forest
Word2vec size = 100, min_count = 5, sg = 1
LSTM vocab_dim = 100 # output dimension in embedding layer
batch_size = 32 # number of samples per gradient update
n_epoch = 4 #number of epochs to train the model

train word2vec model; used the keras [36] framework in Python to train the LSTM
model. Model parameters are shown in the following Table 4:
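Table 4's scikit-learn settings map approximately onto the current API as below. This is a sketch: `base_estimator` and `algorithm='SAMME.R'` are left at AdaBoost's defaults here, since recent scikit-learn versions renamed or removed those arguments, and the default weak learner is already a decision tree.

```python
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# SVM with an RBF kernel, as in Table 4.
svm = SVC(kernel="rbf", C=1.5, gamma=0.05)
# AdaBoost; the default base learner is a decision tree stump.
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
# Random forest with shallow trees and one feature considered per split.
rf = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1)
```

Each model is then fitted on the word2vec feature vectors of the training comments via the usual `fit`/`predict` interface.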

5.2 Experimental Results


Recall that comments consisting only of stop words, punctuation, or emoji are all
non-risk; such comments in the training data are directly classified as non-risk.
Topic seed word classification can also be applied in advance to first filter out
non-risk comments and balance the dataset. The various classifiers described in
Sect. 4 were then trained on the remaining data. Finally, these methods were applied
to the testing data, and the testing results are shown in Table 5.

Table 5. Testing results of classification based on improvement on data level (Testing data: 50
at-risk comments and 960 non-risk comments)
Feature extraction Classifier G-mean (%)
Set A CE loss SVM - no seed filter 78.3
SVM - seed filter 78.4
AdaBoost - no seed filter 79.2
AdaBoost - seed filter 78.6
RF - no seed filter 74.3
RF - seed filter 69.7
LSTM-no seed filter 84.3
LSTM-seed filter 82.3

Notice that using the under-sampling method, Set A consisting of 402 balanced
comments was generated and used to train classification models.
It can be seen from Table 5 that the deep learning algorithm LSTM performed
better than the traditional machine learning algorithms (SVM, AdaBoost and RF).
The LSTM classifier without filtering by the seed words performed the best, with
84.3% g-mean. The filter of seed words did not have a significant impact on
classification, even though it performs well in balancing training and testing
comments. This is because, with the under-sampling method, a balanced dataset was
already used to train the model, so the seed word filter is not necessary.
Given that the LSTM model performs well, this paper chose to address the imbalance
problem at the algorithm level based on the LSTM model. The raw imbalanced training
dataset (Set B) without under-sampling was used to train the model. Here the focal
loss was introduced in the LSTM model (setting α = 0.75, γ = 1).
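The focal loss [16] down-weights easy, well-classified examples so that the rare at-risk class dominates the gradient. A NumPy sketch of the binary case with the paper's setting α = 0.75, γ = 1 (the actual model used a Keras implementation, which is not reproduced here):

```python
import numpy as np

def focal_loss(y_true, p_pred, alpha=0.75, gamma=1.0, eps=1e-7):
    """Binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t) [16].

    y_true contains labels in {0, 1}; p_pred is the predicted probability
    of the positive (at-risk) class. With gamma = 0 and alpha = 1 this
    reduces to the ordinary cross-entropy loss."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class weighting
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
```

With γ = 1, a confidently correct prediction (p_t near 1) contributes almost nothing, while a badly missed at-risk comment keeps a large loss, which is the desired behavior on the raw imbalanced Set B.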
Due to the use of an imbalanced dataset to train the model, we cannot just use the
0.5 cutoff to predict the comment’s risk level. Based on the training dataset, we choose
the threshold which can achieve the highest g-mean as the model’s prediction cutoff.
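Selecting the prediction cutoff can be sketched as a scan over candidate thresholds on the training scores, keeping the one with the highest g-mean (an illustrative procedure; the paper does not publish its exact implementation):

```python
import math

def best_cutoff(scores, labels):
    """Return the probability threshold that maximizes G-mean on training data."""
    best_t, best_g = 0.5, -1.0
    for t in sorted(set(scores)):  # candidate thresholds: the observed scores
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if tp + fn == 0 or tn + fp == 0:
            continue  # degenerate split, g-mean undefined
        g = math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g
```

On heavily imbalanced data this cutoff lands well below 0.5, consistent with the 0.20 and 0.25 cutoffs reported in Table 6.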

Table 6. Testing results of classification based on improvement on algorithm level (Testing
data: 50 at-risk comments and 960 non-risk comments)

Feature extraction  Classifier             G-mean (%)  Cutoff
Set B FC loss       LSTM - no seed filter  81.8        0.20
                    LSTM - seed filter     84.5        0.25

As shown in Table 6, with the topic seed word filter, the LSTM model with focal
loss achieved 84.5% g-mean, which is slightly higher than the g-mean achieved by the
LSTM with cross-entropy loss based on the balanced dataset (84.3% g-mean).
Using the LSTM model and focal loss, the top 5 comments with the highest predicted
probability of risk are shown in Table 7.

Table 7. Comments with highest predicted probability


6 Conclusion

This paper compared the performance of different classification algorithms based on
word vector features. Because YouTube comments are sequential text, the LSTM, which
can learn sequential information, performed better than the other machine learning
algorithms. Combined with the topic seed word classification model and the improved
loss function, it achieved the best testing performance (84.5% g-mean). The focal
loss was also effective in addressing the imbalanced text classification problem. In
addition, when combined with under-sampling, LSTM again performed better than the
other machine learning algorithms, reaching 84.3% g-mean.
The study has pushed forward natural language processing for Cantonese, a complicated
dialect that mixes Traditional Chinese, borrowed characters representing spoken
terms, and English. It also demonstrates the potential of using machine learning
methods to detect suicide risk in real social media settings. As suicide prevention is a
battle against the clock, every minute saved in detecting suicide risk and alerting
intervention can be crucial. However, it is challenging to employ staff to monitor and
review online content 24/7. Based on the computerized algorithm, suicide professionals
can scale up the real-time monitoring of online content to detect potentially at-risk
posts, based on which more timely interventions can be implemented.

Acknowledgements. The study was supported by Hong Kong General Research Fund (Ref No.:
17628916).

References
1. Centre for Suicide Research and Prevention, The University of Hong Kong. https://csrp.hku.
hk/statistics/. Accessed 30 Mar 2018
2. World Health Organization Webpage. http://www.who.int/mental_health/suicide-
prevention/world_report_2014/en/. Accessed 30 Mar 2018
3. Cheng, Q., Chen, F., Lee, E.S.T., Yip, P.S.F.: The role of media in preventing student
suicides: a Hong Kong experience. J. Affect. Disord. 227, 643–648 (2018)
4. Cheng, Q., Kwok, C.L., Zhu, T., Guan, L., Yip, P.S.F.: Suicide communication on social
media and its psychological mechanisms: an examination of Chinese microblog users. Int.
J. Environ. Res. Public Health 12(9), 11506–11527 (2015)
5. Chan, M., et al.: Engagement of vulnerable youths using internet platforms. PLoS ONE 12
(12), e0189023 (2017)
6. Pestian, J.P., Matykiewicz, P., Grupp-Phelan, J.: Using natural language processing to
classify suicide notes. In: Proceedings of the Workshop on Current Trends in Biomedical
Natural Language Processing. Association for Computational Linguistics (2008)
7. Huang, Y.-P., Goh, T., Liew, C.L.: Hunting suicide notes in web 2.0-preliminary findings.
In: Ninth IEEE International Symposium on Multimedia Workshops, ISMW 2007. IEEE
(2007)
8. Moreno, M.A., et al.: Feeling bad on Facebook: depression disclosures by college students
on a social networking site. Depress. Anxiety 28(6), 447–455 (2011)
9. O’Dea, B., Wan, S., Batterham, P.J., Calear, A.L., Paris, C., Christensen, H.: Detecting
suicidality on Twitter. Internet Interv. 2(2), 183–188 (2015)
10. Vioulès, M.J., Moulahi, B., Azé, J., Bringay, S.: Detection of suicide-related posts in Twitter
data streams. IBM J. Res. Dev. 62(1), 7:1–7:12 (2018)
11. Cheng, Q., Li, T.M.H., Kwok, C.L., Zhu, T., Yip, P.S.F.: Assessing suicide risk and
emotional distress in Chinese social media: a text mining and machine learning study.
J. Med. Internet Res. 19(7), e243 (2017)
12. Kotsiantis, S.B.: Supervised machine learning: a review of classification techniques. Emerg.
Artif. Intell. Appl. Comput. Eng. 160, 3–24 (2007)
13. Estabrooks, A., Jo, T., Japkowicz, N.: A multiple resampling method for learning from
imbalanced data sets. Comput. Intell. 20(1), 18–36 (2004)
14. Zhang, Z., Ye, Q., Li, Y.: Sentiment classification of Internet restaurant reviews written in
Cantonese. Expert Syst. Appl. 38(6), 7674–7682 (2011)
15. Zhang, Z., Ye, Q., Li, Y., Law, R.: Sentiment classification of online Cantonese reviews by
supervised machine learning approaches. Int. J. Web Eng. Technol. 5(4), 382–397 (2009)
16. Lin, T.-Y., et al.: Focal loss for dense object detection. arXiv preprint arXiv:1708.02002
(2017)
17. Cheung, K.-H., Bauer, R.S.: The representation of Cantonese with Chinese characters.
University of California, Project on Linguistic Analysis (2002)
18. LIHKG Webpage. https://lihkg.com/category/30. Accessed 30 Mar 2018
19. Sogou Webpage. https://pinyin.sogou.com/dict/search/search_list/%D4%C1%D3%EF/
normal. Accessed 30 Mar 2018
20. Python Webpage. https://pypi.python.org/pypi/googletrans. Accessed 30 Mar 2018
21. Python Webpage. https://pypi.python.org/pypi/OpenCC. Accessed 30 Mar 2018
22. GitHub Webpage. https://github.com/BYVoid/OpenCC. Accessed 30 Mar 2018
23. Zou, F., Wang, F.L., Deng, X., Han, S., Wang, L.S.: Automatic construction of Chinese stop
word list. In: Proceedings of the 5th WSEAS International Conference on Applied Computer
Science (2006)
24. GitHub Webpage. https://github.com/fxsjy/jieba. Accessed 30 Mar 2018
25. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of
words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. (2013)
26. Joachims, T.: Text categorization with support vector machines: learning with many relevant
features. In: European Conference on Machine Learning (1998)
27. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
28. Liaw, A., Wiener, M.: Classification and regression by randomForest. R. News 2(3), 18–22
(2002)
29. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computat. 9(8), 1735–
1780 (1997)
30. Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text
classification. Adv. Neural Inf. Process. Syst. (2015)
31. Kim, S.-M., Hovy, E.: Determining the sentiment of opinions. In: Proceedings of the 20th
International Conference on Computational Linguistics. Association for Computational
Linguistics (2004)
32. Liu, X.-Y., Wu, J., Zhou, Z.-H.: Exploratory undersampling for class-imbalance learning.
IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 39(2), 539–550 (2009)
33. Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided
selection. ICML, Vol. 97 (1997)
34. Scikit-learn Webpage. http://scikit-learn.org/stable/. Accessed 30 Mar 2018
35. Gensim Webpage. https://radimrehurek.com/gensim/models/word2vec.html. Accessed 30
Mar 2018
36. Keras Webpage. https://keras.io/models/sequential/. Accessed 30 Mar 2018
Twitter Analytics for Disaster Relevance
and Disaster Phase Discovery

Abeer Abdel Khaleq(&) and Ilkyeun Ra

University of Colorado, Denver, CO 80204, USA


{abeer.abdelkhaleq,ilkyeun.ra}@ucdenver.edu

Abstract. Natural disasters happen at any time and in any place. Social media
can provide an important means for both affected people and emergency personnel
to share and receive relevant information as the disaster unfolds across its
different phases. Focusing on the phases of preparedness, response and recovery,
certain information needs to be retrieved due to the critical mission of
emergency personnel. Depending on the disaster phase, such information can be
directed towards warning citizens, saving lives, or reducing the disaster's
impact. In this paper, we present an analytical study of Twitter data for three
recent major hurricane disasters covering the three main disaster phases of
preparedness, response and recovery. Our goal is to identify relevant tweets
that carry important information for disaster phase discovery. To achieve this
goal, we propose a cloud-based system framework built around three main
components: disaster relevance classification, disaster phase classification
and knowledge extraction. The framework is general enough for the three main
disaster phases and specific to a hurricane disaster. Our results show that
relevant tweets from different disaster data sets spanning different disaster
phases can be classified for relevancy with an accuracy of around 0.86, and for
disaster phase with an accuracy of 0.85, from which key information for
disaster management personnel can be extracted.

Keywords: Twitter analytics · Twitter data mining · Social media classification ·
Disaster relevance classification · Disaster phase classification ·
Cloud-based analytics · Disaster management

1 Introduction

Natural disasters are large scale in impact and many of them span multiple disaster
phases. Some disasters need more focus on preparedness, some on response and some
on recovery. It is necessary to direct each agency to its mission during a disaster based
on the disaster phase. For example, warning systems and evacuation plans need to be in
place during preparedness, medical personnel need to act during response, and relief
agencies will provide shelters during recovery. Twitter provides a rich platform for key
information during a disaster. Analyzing and extracting informational tweets from
Twitter during disasters has been an active text mining research topic in recent
years [1].
However, Twitter data is highly unstructured and contains a lot of noise and
irrelevant messages, and identifying relevant tweets is a challenge [2]. There is a
need to identify those relevant tweets during the disaster phases and uncover
insightful information.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 401–417, 2019.
https://doi.org/10.1007/978-3-030-02686-8_31
During a disaster, we may have a massive number of disaster-related tweets coming
from many different sources and carrying important disaster information. Our idea is
to build a general system framework that can process the large number of
disaster-related tweets and filter the relevant ones that may carry important
information and can be used for managing the disaster. From the collected
disaster-relevant Twitter data, the disaster phase, the disaster location and other
key information will be extracted. Our system will be hosted in the cloud for its
storage and analytics processing capabilities and to protect against the potential
loss of resources during a disaster.
To accomplish our goal, we conducted an analytical study on Twitter data from
three recent hurricane disasters, hurricane Matthew from 2016 and Harvey and Irma
from 2017, across the three disaster phases of preparedness, response and recovery,
to obtain a diverse and general data set. We chose hurricanes as the disaster type
for our analytical study because they can be predicted and have a sustained impact
through response and recovery. Hurricanes are natural disasters that affect the US
and other countries every year. They cause great loss of life and damage and
devastation that go far beyond expectations. Many lives can be saved, and many
resources can be preserved with minimal damage, if the proper information is
delivered to the right personnel at the right time during the right phase of the
disaster. This makes hurricanes applicable to our study, where the three disaster
phases of preparedness, response and recovery can be further identified to provide
the needed resources.
Since each disaster phase has its own requirements and valuable information, it is
important to distinguish between these phases and extract the right information for each
phase. The contributions of our paper are as follows:
(1) Provide a general cloud-based framework for Twitter data analytics in hurricane
disaster management.
(2) Identify relevant tweets during a disaster from different hurricane disaster data
sets.
(3) Classify the disaster phase of preparedness, response and recovery from relevant
tweets.
(4) Extract key knowledge from relevant tweets text such as location, key phrases and
key terms that can be used by disaster emergency personnel.
Our study is not geared toward creating new classification algorithms. Rather it is
limited to the use of existing classification algorithms and methodologies to uncover
the disaster relevance, the disaster phase and disaster key knowledge from the massive
Twitter data that comes during a disaster. In this study we present our work on
static hurricane Twitter data to build the classification models; in future work we
will implement the system on streaming real-time disaster data.
The paper is organized as follows. Section 2 describes related work in Twitter
disaster relevance, Sect. 3 describes the proposed Twitter analytics system framework
along with the hurricane data sets used for the experiments, Sect. 4 presents the disaster
relevance classification experiment on the tweets, Sect. 5 presents the disaster phase
discovery experiment on both labeled and unlabeled tweets, Sect. 6 describes the
knowledge extraction experiment for the disaster location and other key information
from relevant tweets, and finally Sect. 7 presents a conclusion and future work
directions.

2 Related Work

It has been widely acknowledged that Humanitarian Aid and Disaster Relief (HADR)
responders can gain valuable insights and situational awareness by monitoring social
media-based feeds, from which tactical, actionable data can be extracted from the text
[3]. Ashktorab et al. [4], for example, introduced Tweedr, a Twitter-mining tool that
extracts actionable information for disaster relief workers during natural disasters. The
Tweedr pipeline consists of three main parts: classification, clustering, and extraction.
Imran et al. [5] developed an artificial intelligence system for disaster response that
classifies real-time Twitter data into relevant disaster categories based on keywords
hashtags. Imran et al. [6] performed disaster-relevant information extraction on Twitter
data for both hurricane Sandy in 2012 and Joplin tornado in 2011. In their work they
proposed a two-step method for disaster-related information extraction which are
classification of relevance and information extraction from tweets using off-the-shelf
free software. In the same context, Stowe et al. [2] performed Twitter data classification
for relevance before, during and after the hurricane Sandy 2012 disaster. Their method
was based on binary classification for both relevance and fine-grained categories such
as action, preparation, movement, etc. They concluded that tweets can be classified
accurately combining a variety of linguistic and contextual features which can sub-
stantially improve classifier performance.
These research efforts address tweet classification and fine-grained category
classification during a disaster without identifying the disaster phase. Wang et al.
[7] pointed out that most studies, with the exceptions of Haworth et al. [8] and Yan
et al. [9], have focused on disaster response instead of the other phases because of
a lack of data through those phases. This data sparsity problem in phases like
mitigation, preparedness and recovery may cause unreliable analytical results. They
emphasized that future work is needed to overcome this limitation and that effort
needs to be directed toward gaining more useful information for all phases of
disaster management through mining social media data.
To the best of our knowledge, there is no work on establishing a general
classification framework for Twitter data that classifies the three main disaster
phases of preparedness, response and recovery. Most research work is focused on
response and on the subcategories of fine-grained classification. There is also a
lack of a general hurricane disaster classification framework; thus our work focuses
on the characteristics of a disaster across the three shared phases of preparedness,
response and recovery, specific to a hurricane natural disaster. Our work differs in
the following aspects:
1. We propose a general hurricane disaster classification framework based on three
natural hurricane disaster datasets, with accuracy as the measurement for
classification.
2. We identify relevant tweets based on textual context by manually examining and
labeling the tweets, rather than relying on hashtags and keywords, for a more
general and accurate classification.
3. We uncover the disaster phases of preparedness, response and recovery through
classification of relevant tweets, with accuracy as the measurement for
classification. We believe these three disaster phases can be found easily in
tweets related to natural disasters like hurricanes.

3 System Framework and Data Set

Our proposed system framework will have a Twitter analytics component for disaster
relevancy and phase discovery specially tuned for hurricanes as part of a complete
cloud-based platform for disaster management and response. This can serve as a
foundation for a micro-service architecture where new components can be added, or
existing ones can be updated for a new disaster phase or new requirements. As the
focus of our study is on the Twitter analytics component, we plan to pursue
implementing the cloud-based framework in our future work.

Fig. 1. System framework for Twitter analytics.



Figure 1 provides the general system framework along with the Twitter analytics
system workflow. Our focus in this study is on tweets texts for location and key
knowledge extraction. The date and time of a disaster can be extracted from the
created_at1 field of the tweets and will be part of the complete framework imple-
mentation of consecutive studies.
Our work is focused on static Twitter data that was collected from recent hurricane
disasters including hurricane Matthew, Harvey and Irma. All three disasters had sig-
nificant impact on US and other areas with casualties and damage. As we are aiming on
having a general classification framework for a hurricane disaster, we sampled the data
from three hurricanes to have a more general data set. We also made sure to diversify
the data by covering the disaster phases of preparedness, response and recovery from
each hurricane disaster. We identified the disaster phase based on the disaster evolving
date and time and the available hurricane information. We applied variable number of
queries with geo-tagged and non-geotagged queries as our focus is on identifying
relevance over a general data set using the different sets of disasters and different
queries without adding any bias to certain tweets on the classifier. We used Gnip2 for
the historic Matthew data set and Twitter API streaming for the disasters of Harvey and
Irma as they were unfolding. Table 1 provides a more detailed look at the data sets
collected from the three hurricanes, listing the query used and the corresponding dis-
aster phase.

Table 1. Collected data sets for the three hurricanes

Hurricane  Date       Query                                                               Disaster phase  Tweets collected
Matthew    10/7/2016  track=("Hurricane Matthew") (flood OR wind OR storm OR              Preparedness    27,000 over the 3 days
                      heavy OR rain), no retweets, lang='en'
Matthew    10/8/2016  track=("Hurricane Matthew") (flood OR wind OR storm OR              Response
                      heavy OR rain), no retweets, lang='en'
Matthew    10/9/2016  track=("Hurricane Matthew") (flood OR wind OR storm OR              Recovery
                      heavy OR rain), no retweets, lang='en'
Harvey     8/25/2017  bounding box including Corpus Christi, San Antonio, west of         Preparedness    7,728
                      Houston; lang='en', track='Hurricane Harvey', no retweets
Harvey     8/28/2017  bounding box around Houston area; lang='en', track='Hurricane,      Response        121,658
                      Harvey, flood, help, rescue, rain', no retweets
Harvey     8/30/2017  lang='en', track=Houston, no location, no retweets                  Recovery        61,940
Irma       9/5/2017   track='Hurricane Irma', lang='en', no retweets                      Preparedness    34,445
Irma       9/10/2017  bounding box around Florida; track='irma', lang='en', no retweets   Response        1,128
Irma       9/11/2017  track='Hurricane Irma', lang='en', no retweets                      Recovery        9,099

1 Tweet object: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.
2 Gnip: http://support.gnip.com/.

4 Disaster Relevance Classification


4.1 Disaster Relevance Annotation
Our goal is to classify a general tweet during a hurricane disaster for relevance. We
manually examined the data for quality of the tweet texts, and manually labeled a
sample of each disaster set over every phase for relevance. We examined the
relevance of each tweet text to the disaster phase. If the tweet text contains any
crucial information related to the disaster phases, such as "need", "water",
"evacuate", or "rescue", we label it as relevant. If the tweet text does not carry
any crucial information, we label it as non-relevant. For example, a relevant
message is "Storm getting stronger: 2 million urged to leave", which carries
information about evacuation that is important for preparedness. However, a message
like "We pray for those in the path of Hurricane Matthew. If you are in an area that
may be affected by the disaster phases and…" will be labeled as non-relevant. It is
important to point out that during this initial step we are classifying for
relevance only and not for the disaster phase. As we cannot manually label the huge
number of tweets across the three disaster sets, we randomly sampled a smaller data
set from each. Table 2 shows the sampled data sets across the three disaster phases
of the three hurricanes. Our initial plan was to sample the same number of tweets
from each data set over each phase, but some data sets have a lot of noise and
repeated tweets, which explains the lower number of tweets for some sets. However,
we believe we have captured the three disaster stages over a hurricane disaster with
this sample data set, which is our focus.

Table 2. Sampled data set for disaster relevance classification

Disaster phase  Hurricane Matthew  Hurricane Harvey  Hurricane Irma    Total
Preparation     200 Relevant       200 Relevant      200 Relevant      600 Relevant
                200 Non-relevant   157 Non-relevant  106 Non-relevant  463 Non-relevant
Response        188 Relevant       130 Relevant      191 Relevant      509 Relevant
                109 Non-relevant   50 Non-relevant   105 Non-relevant  264 Non-relevant
Recovery        171 Relevant       31 Relevant       126 Relevant      328 Relevant
                74 Non-relevant    16 Non-relevant   110 Non-relevant  264 Non-relevant
Total           559 Relevant       361 Relevant      517 Relevant      1437 Relevant
                383 Non-relevant   178 Non-relevant  321 Non-relevant  927 Non-relevant

4.2 Relevance Classification Model


We have utilized Microsoft Azure Machine Learning Studio3 to conduct our experiment,
both because we plan on having a cloud-based framework and because Azure Machine
Learning Studio has a vast number of classification and text analytics models that
can be easily tuned for performance. We combined the three data sets into one.

3 Microsoft Azure Machine Learning Studio: https://azure.microsoft.com/en-us/services/machine-learning-studio/.

We cleaned and removed missing data based on text and other important fields, which
resulted in 2311 tweets: 1434 relevant and 877 non-relevant. We preprocessed the
data by removing special characters, URLs and, for privacy, user mentions. We kept
numbers as they are important for hurricane category, number of casualties,
addresses, etc. We then tokenized, stemmed and removed stop words.
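These cleaning steps can be sketched with the standard library. The regexes and the stop-word subset are illustrative (the study used Azure ML Studio's text preprocessing modules), and stemming is omitted here for brevity:

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "to", "of", "in", "on"}  # illustrative subset

def preprocess(tweet):
    """Strip URLs and user mentions, drop special characters while keeping
    digits (hurricane category, casualties, addresses), lowercase, tokenize,
    and remove stop words."""
    text = re.sub(r"https?://\S+", " ", tweet)   # remove URLs
    text = re.sub(r"@\w+", " ", text)            # remove user mentions (privacy)
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)  # drop special characters, keep numbers
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("Category 4 #Irma is coming!! @user stay safe https://t.co/xyz")` keeps the category number 4 while dropping the URL, the mention, and the punctuation.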

4.3 Binary Classification Algorithm


The work of Stowe et al. [2] showed that logistic regression with uni-gram features
and cross validation achieved the best accuracy for binary classification of tweet
relevance. Habdank et al. [10] pointed out that uni-grams achieve better accuracy
than bi-grams in tweet text classification for relevance, as shown in other
researchers' experiments. We have also experimented with binary classification
algorithms on Twitter data in previous work, including logistic regression, support
vector machine, Naïve Bayes and the Stanford classifier, and found that logistic
regression with uni-gram features gave us the best accuracy. We applied the TF-IDF
(Term Frequency-Inverse Document Frequency) weighting function to uni-gram counts,
which up-weights words that appear frequently in a single record but are rare across
the entire dataset. We used filter-based feature selection to reduce the
dimensionality, choosing 1000 features with chi-squared as the score function to
calculate the correlation between the label column and the text vector. We split the
data 70% for training and 30% for testing. For parameter tuning, we split the
testing data 50% for parameter tuning and 50% for scoring. We also used 10-fold
cross validation to alternate between training and testing data and to assess both
the variability of the dataset and the reliability of the training model.
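A scikit-learn equivalent of this setup (uni-gram TF-IDF, chi-squared feature selection, logistic regression with cross validation) might look like the following sketch. The study was built in Azure ML Studio, so the components and the tiny corpus below are stand-ins; the actual experiment used 2311 labeled tweets, k = 1000 features, and 10-fold cross validation:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Tiny illustrative corpus: 4 "relevant" and 4 "non-relevant" tweet texts, repeated.
tweets = [
    "evacuate now storm surge warning", "need rescue water rising fast",
    "flood waters entering homes help", "shelter open for hurricane victims",
    "great sunny day at the beach", "new album out this friday",
    "watch our cooking show tonight", "cute dog video goes viral",
] * 4
labels = [1, 1, 1, 1, 0, 0, 0, 0] * 4

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),  # uni-gram TF-IDF weighting
    ("select", SelectKBest(chi2, k=10)),             # chi-squared feature selection
    ("clf", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, tweets, labels, cv=4)  # the paper used 10-fold
```

Putting the vectorizer and selector inside the `Pipeline` ensures the chi-squared scores are recomputed on each training fold, avoiding feature-selection leakage into the validation folds.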

4.4 Evaluation Measurement


Having a classifier model that can accurately classify relevant tweets during an
emergency is an important part of measuring the classifier's performance. A tweet
can be a matter of saving or losing a life if it is not correctly classified as
relevant. Habdank et al. [10] explained how accuracy and recall are very important
evaluation measures: the higher the recall value, the fewer relevant tweets have
been falsely marked negative. Precision and F1 score are also significant measures.
Precision measures false positives, and the F1 score is the harmonic mean of
precision and recall. We focused in our experiment on accuracy as the main
evaluation metric, in addition to recall, precision and F1 score.
Table 3 shows the logistic regression results across different feature extraction
settings. The best accuracy we obtained was around 0.86, using 10-fold cross
validation and uni-grams with TF-IDF feature extraction, which is slightly better
than the 0.856 achieved by Stowe et al. [2]. This shows that tweets from multiple
data sets over different disaster phases for a certain disaster type can be
classified for relevance with an accuracy similar to, and slightly better than, that
for one single data set, which helps in building a classifier that can be more
general for a certain disaster type such as hurricanes.
408 A. A. Khaleq and I. Ra

Table 3. Results of binary classification for disaster relevance

Binary classification                                  Average accuracy  Precision  Recall  F1 score
Two-class logistic regression, uni-gram with TF-IDF,   0.858             0.868      0.90    0.886
cross validation
Two-class logistic regression, uni-gram, feature       0.852             0.857      0.91    0.884
selection, parameter tuning, cross validation
Two-class logistic regression, uni-gram with TF-IDF    0.841             0.852      0.90    0.876
Two-class logistic regression, uni-gram with feature   0.835             0.85       0.893   0.871
selection, parameter tuning

5 Disaster Phase Discovery

Once the tweets are classified for relevance, we need to identify the disaster phase
from the relevant tweets. We focus our work on the three main disaster phases of
preparedness, response and recovery, as these are the main phases that most natural
disasters, especially hurricanes, go through. We have experimented with LDA (Latent
Dirichlet Allocation) for topic discovery on unlabeled data and with multi-class
classification on labeled data. The following sections describe our findings.

5.1 LDA for Disaster Phase Discovery on Unlabeled Data


LDA uses a generative approach on unlabeled data. The algorithm generates a
probabilistic model that identifies groups of topics, which can then be used to
classify either existing training cases or new cases. It uses the distribution of
words to mathematically model topics [11]. The topic model gives us two major pieces
of information for any collection of documents: (1) the number of topics contained
within a corpus; and (2) for each document within the corpus, what proportion of
each topic it contains [12]. It is important to note that during a disaster, tweets
will usually be coming from one phase at a given time, with some overlap. For this
reason, we are not using LDA to uncover the disaster phase as the disaster unfolds
in real time; rather, we are identifying the disaster phase from static data to help
in discovering disaster phases. Based on similar terms among the disaster phases
across the three different disaster sets, we can potentially label the data. In LDA,
every topic is a collection of words: each topic contains all the words in the
corpus, with a probability of each word belonging to that topic. LDA finds the most
probable words for a topic; associating each topic with a theme is left to the user.
The LDA approach requires careful validation of the topical clusters.
Twitter Analytics for Disaster Relevance and Disaster Phase 409

We applied LDA in Azure Machine Learning Studio on the relevant tweets. An
important parameter in LDA is the number of topics. We experimented with different
topic counts and different data sets to find the best topic discovery for the disaster
phases. When we applied LDA with three topics and uni-grams to a single data set
covering the three disaster phases, such as hurricane Irma, we got good separation by
disaster phase. Table 4 shows a sample of the results, where we can identify topic 1
with assessment and recovery, topic 2 with response, and topic 3 with preparedness and
updates. However, when we applied LDA to the combined data set for the three
hurricanes, we got mixed results; as the number of topics increases, subcategories of the
disaster such as warning, update, and death emerge more clearly. We conclude that
LDA can be a good choice for identifying the disaster phase on a single data set, but it
does not perform well on a more diverse data set.

Table 4. Sample topics identified from LDA on hurricane Irma data set
Tweet text | Topic1 | Topic2 | Topic3
drone footage naples florida shows complete devastation hurricane irma | 0.997509 | 0.001245 | 0.001245
hurricane irma 10 dead cuba record flooding hits northern florida latest news | 0.000831 | 0.998337 | 0.000831
nc dps state ready hurricane irmas effects reach north carolina | 0.000997 | 0.000997 | 0.998006

5.2 Multi-class Classification for Disaster Phase Discovery on Labeled Data
As LDA did not accurately identify the three disaster phases, we applied multi-class
classification to assign each relevant tweet to a disaster phase. We combined data sets
from the three different disasters, covering the three disaster phases of preparedness,
response, and recovery, to obtain a well-balanced data set. Only relevant tweets were
taken, with phase label 1 for preparedness, 2 for response, and 3 for recovery. The data
was labeled manually based on the disaster phase. We obtained a balanced data set with
a total of 981 relevant tweets, consisting of 327 tweets from each disaster phase across
the three different disasters.
The data was preprocessed in the same way as for our binary classification and split
into 70% training and 30% testing. We performed the experiment in Azure Machine
Learning Studio. We selected several multi-class classification algorithms to evaluate
for accuracy, based on recommendations from the work of Huang et al. [13] and Azure
machine learning [14]. The classifiers were chosen for their known high accuracy on
multi-class text classification. Table 5 provides the results of the multi-class classifiers
on the data set.
410 A. A. Khaleq and I. Ra

Table 5. Results of multi-class text classification for disaster phase identification


Multi-classifier algorithm | Average accuracy | Overall accuracy | Micro-average precision | Macro-average precision | Micro-average recall | Macro-average recall
Neural network, uni-gram feature hashing, parameter sweeping | 0.85 | 0.775 | 0.775 | 0.777 | 0.775 | 0.775
Two-class logistic regression with one-vs-all multi-classifier, uni-gram feature hashing, parameter sweeping | 0.85 | 0.775 | 0.775 | 0.775 | 0.775 | 0.775
Multi-class decision forest with feature hashing, parameter sweeping | 0.845 | 0.768 | 0.768 | 0.77 | 0.768 | 0.768

We can see that both the neural network with uni-gram feature hashing and parameter
tuning and the two-class logistic regression with a one-vs-all multi-classifier gave an
average accuracy of 85% and an average recall of about 78%.
Comparing our results to previous work on multi-class text classification: Stowe
et al. [2] performed binary classification on fine-grained subcategories of disaster
tweets, with a best feature precision of around 0.71 and recall of around 0.80. Huang
et al. [13] applied logistic regression binary classification to fine-grained subcategories
of the disaster and obtained an overall precision of 0.647 and recall of 0.711. Our
results show that we can achieve an average accuracy of 0.85 on the more general
disaster phase discovery task rather than on fine-grained subcategories. This shows that
relevant tweets can be classified for disaster phase discovery with good accuracy.
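The one-vs-all scheme used above to turn a two-class logistic regression into a three-phase classifier can be sketched as follows. The binary scorer here is a stand-in keyword model with invented phase words, not the trained model from the experiment; in the real system each per-phase score would come from a fitted logistic regression.

```python
# One-vs-all multi-class scheme: one binary scorer per phase, argmax wins.
# The phase word lists are hypothetical, chosen only to make the sketch run.
PHASE_WORDS = {
    1: {"prepare", "brace", "evacuation"},  # preparedness
    2: {"rescue", "shelter", "help"},       # response
    3: {"rebuild", "recovery", "damage"},   # recovery
}

def binary_score(tokens, positive_words):
    """Stand-in for P(phase | tweet) from a two-class model."""
    return sum(1 for t in tokens if t in positive_words) / max(len(tokens), 1)

def one_vs_all_predict(text):
    tokens = text.lower().split()
    scores = {phase: binary_score(tokens, words)
              for phase, words in PHASE_WORDS.items()}
    return max(scores, key=scores.get)  # phase with the highest binary score
```

Swapping the keyword scorer for any probabilistic binary classifier leaves the one-vs-all wrapper unchanged, which is exactly why the scheme composes with the existing relevance classifier.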

6 Knowledge Extraction

6.1 Location
After tweets are classified for relevance and disaster phase, useful information needs to
be extracted. One key piece of information is the location of the disaster. Tweets can be
geo-tagged by the user to indicate where the tweet is coming from. This information is
represented in the coordinates field of the tweet, which is in geoJSON form (longitude
first, then latitude). For example: "coordinates": {"coordinates": [-75.14310264,
40.05701649], "type": "Point"}. The problem is that not all tweets are geo-tagged: in
our data set of 1973 tweets for hurricanes Matthew and Harvey, only 1% of tweets are
geo-tagged. Another field in which a user can share a location is the place field, which,
when present, indicates that the tweet is associated with, but not necessarily originating
from, a Place. In the same data set, only 5% of the tweets are associated with a place.
Extracting location from text will aid in identifying the main areas affected by the
disaster [15]. In the following sections we present how we extract tweet location from
the text, coordinates, and place fields of a tweet object.
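A minimal sketch of reading the coordinates field described above, using only the standard library; note the longitude-first ordering of geoJSON points.

```python
import json

# Tweet fragment with a geoJSON point, matching the example in the text.
raw = '''{"coordinates": {"coordinates": [-75.14310264, 40.05701649],
                          "type": "Point"}}'''

tweet = json.loads(raw)

def extract_lat_lon(tweet):
    """Return (lat, lon) from a geo-tagged tweet, or None if absent."""
    geo = tweet.get("coordinates")
    if not geo:
        return None  # most tweets are not geo-tagged (about 1% here)
    lon, lat = geo["coordinates"]  # geoJSON stores longitude first
    return lat, lon

point = extract_lat_lon(tweet)  # (40.05701649, -75.14310264)
```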

6.1.1 Text Based Location Extraction


Our data set consists of 981 relevant tweets, with 327 tweets from each disaster phase
across the three hurricane disasters of Matthew, Harvey, and Irma. Given the scarcity
of geo-tagged coordinates, we examined the tweet text to extract the location. We
applied the Named Entity Recognition module in Azure learning studio [16], which
identifies the names of things in text such as people, companies, and locations.
Figure 2 presents the locations extracted from the tweet text. The extracted location
names correspond to the hurricanes' actual paths. For example, hurricane Matthew was
targeting Florida, Haiti, North Carolina, and South Carolina; hurricane Harvey was
targeting Houston, Texas; and hurricane Irma was targeting South Carolina, North
Carolina, and Florida. We can also identify the name of the hurricane, such as Harvey,
Matthew, or Irma. This gives a holistic view of where the disaster is happening; for a
precise location, the geo-tagged coordinates field gives the exact address.

Fig. 2. Extracted locations from relevant tweets text for the three disasters Matthew, Harvey and
Irma sampled data set.
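The experiment uses Azure's Named Entity Recognition module; as a rough stand-in, the sketch below does the same job with a tiny hand-built gazetteer (an assumption for illustration, not the module's actual mechanism).

```python
# Toy gazetteer of place names; a real system would use an NER model or a
# much larger place list. Multi-word places are matched greedily first.
GAZETTEER = {"florida", "haiti", "houston", "texas",
             "north carolina", "south carolina"}

def extract_locations(text):
    """Return gazetteer places mentioned in the tweet, in order of appearance."""
    tokens = text.lower().split()
    found, i = [], 0
    while i < len(tokens):
        two = " ".join(tokens[i:i + 2])
        if two in GAZETTEER:       # try bigrams like 'north carolina' first
            found.append(two)
            i += 2
        elif tokens[i] in GAZETTEER:
            found.append(tokens[i])
            i += 1
        else:
            i += 1
    return found

locs = extract_locations("hurricane matthew heads to florida and north carolina")
```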

6.1.2 Coordinates and Place Fields Location Extraction


In this section we present a holistic approach to uncovering a disaster location from the
three tweet fields of text, coordinates, and place, and compare the results for consistency.
Our data set consists of 121,658 tweets of hurricane Harvey during the response phase.
We extracted the latitude and longitude from the coordinates field of the geo-tagged
tweets and uploaded them to Google Maps for visual representation using Google
Fusion Tables. Figure 3 shows the geo-tagged tweets on a Google map from the
hurricane Harvey data set during the response phase; they mainly originate from
Houston, TX, the main affected area during the hurricane.

Fig. 3. Coordinates of geo-tagged tweets of hurricane Harvey, response phase data set.

The place field in a tweet object consists of subfields such as country, country_code,
name, and place_type, all within a bounding box of coordinates. We extracted those
subfields for the tweets where the user chose to share a place. Again, the place is not
necessarily where the tweet originates from. Figure 4 shows the city names based on
the place name field in the same data set. Around 4000 tweets associate Houston with
the tweet place, and about 1500 tweets associate Texas with it. This indicates that the
disaster place is associated with Houston, Texas.
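Tallying place names like the Houston/Texas counts above reduces to a frequency count over the place subfields; the tweet records below are invented examples.

```python
from collections import Counter

# Count how often each place name appears across tweets' place fields.
# These records are made-up stand-ins, not rows from the actual data set.
tweets = [
    {"place": {"name": "Houston", "country_code": "US", "place_type": "city"}},
    {"place": {"name": "Houston", "country_code": "US", "place_type": "city"}},
    {"place": {"name": "Texas", "country_code": "US", "place_type": "admin"}},
    {"place": None},  # most tweets carry no place field (about 5% here do)
]

place_counts = Counter(
    t["place"]["name"] for t in tweets if t.get("place")
)
top_place, top_n = place_counts.most_common(1)[0]
```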
To compare these results with the text-extracted location on the same data set, we
applied the named entity recognition module in Azure to the tweet text, which yielded
around 6000 mentions of Houston and about 4200 mentions of Texas, as shown in
Fig. 5. The results of the coordinates, place, and text field extraction are consistent,
confirming that the disaster mainly affected Houston, TX.

Fig. 4. Extracted location from tweets place field for hurricane Harvey, response phase data set.

Fig. 5. Extracted location from tweets text for hurricane Harvey, response phase data set.

We can also see from the results that many more disaster location names come from the
tweet text than from the place and coordinates fields, confirming that tweet text carries
key information during a disaster phase. Other fields, such as the user profile, could
also be used to extract location; a profile location is not necessarily where the tweet
originates from, but the correlation can be studied in future work.

6.2 Key Knowledge Extraction


For key knowledge extraction, we experimented with both term frequency and the Key
Phrase Extraction module in Azure [17]. Our data set consists of 981 tweets from the
three different disaster sets, balanced across the disaster phases, with one third of the
tweets for each phase of preparedness, response, and recovery.
For term frequency, we created the matrix of terms using R in Azure for each
disaster phase. The preparedness phase resulted in 693 key terms. Table 6 shows the
top key terms for each disaster phase. For key phrase extraction, the module is a
wrapper around a natural language processing API for key-phrase extraction. Phrases
are treated as potentially meaningful in the context of the sentence for various reasons,
such as whether the phrase captures the topic of the sentence or contains a combination
of modifier and noun that indicates sentiment. The output is a data set containing
comma-separated key phrases from the text. Figure 6 gives the output of applying the
module to the preprocessed data set for each of the disaster phases.
Comparing the two outputs, we can see the similarity among the key terms for each
disaster phase. The key phrases module gives us more meaningful, high-frequency
phrases for each disaster phase, which will be very helpful for disaster personnel. Term
frequency can be used as a complementary module for verification and for building a
key term dictionary for each disaster phase. The hurricane names and locations can be
stripped out to generalize the dictionary terms to any disaster.
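The term matrix was built in R inside Azure; an equivalent per-phase term-frequency computation can be sketched in Python as follows, with invented stand-in tweets and a toy stopword list.

```python
from collections import Counter

# Per-phase term frequencies. The tweets below are invented stand-ins for
# the labeled data set; a real run would use the 981 preprocessed tweets.
phase_tweets = {
    "preparedness": ["brace for hurricane", "evacuation order for coast"],
    "response": ["rescue teams help flood victims", "shelter opens"],
    "recovery": ["storm damage toll rises", "power restored after storm"],
}

def top_terms(texts, k=3, stop={"for", "after"}):
    """Top-k terms by raw frequency, minus a tiny stopword list."""
    counts = Counter(w for t in texts for w in t.split() if w not in stop)
    return [w for w, _ in counts.most_common(k)]

per_phase = {phase: top_terms(texts) for phase, texts in phase_tweets.items()}
```

Stripping hurricane names and locations from the counts before ranking, as suggested above, is a one-line filter on the same Counter.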

Table 6. Key terms for each disaster phase based on tweet text term frequency
Disaster phase | Top key words in order
Preparedness | Hurricane, storm, Matthew, Harvey, Irma, Florida, category, Haiti, coast, Texas, disaster, death, wind, strengthen, toll, mph, dead, brace, deadly, surge, barrel, hit, near, Caribbean, news, atlantics, evacuation, head, immense, intensify, prepare, suffer, update, order, safe, threaten, approach, flee, flood, declare, expect
Response | Hurricane, Matthew, storm, Florida, Irma, flood, help, key, coast, surge, landfall, wind, Harvey, Houston, batter, category, power, Jacksonville, rain, hit, people, Carolina, foot, downgrade, feel, victim, weaken, need, help, rescue, kill, shelter, emergency, fear, relief, deadly, death, threaten, damage
Recovery | Hurricane, Matthew, storm, Irma, Carolina, north, state, flood, Florida, death, destruction, major, face, governor, rain, fatality, Houston, toll, damage, power, leave, surge, hit, devastation, cholera, expect, river, destructive, head, outbreak, cause, effect, collapse

Fig. 6. Top key phrases for preparedness, response and recovery disaster phases in order from
left to right.

7 Conclusion and Future Work

In this paper we proposed a general framework for a cloud-based Twitter analytics
platform for disaster relevance identification and disaster phase discovery. We
examined three major hurricanes and focused especially on studying three main
disaster phases: disaster preparedness, disaster response, and disaster recovery. Our
proposed system consists of three main Twitter analytics components: relevance
classification, disaster phase classification, and knowledge extraction. Our experiments
demonstrate that we can build a general classifier with good accuracy (around 86%) to
classify relevant tweets from a hurricane disaster. Disaster phase discovery using
multi-class text classification turns out to be a better choice for uncovering the three
main disaster phases than LDA, which gives mixed results depending on the data set
size and diversity. We were able to classify the disaster phases of preparedness,
response, and recovery using a multi-class classifier with an accuracy of around 85%.
Relevant tweets for a given disaster phase carry important information for emergency
management personnel. We extracted the disaster location name from the tweet text
and from the geo-tagged coordinates and place fields. As the number of geo-tagged
tweets is usually very
limited, the extracted text-based location becomes helpful in identifying the general
location of a disaster. We have also extracted the key phrases and key terms for each
disaster phase, which can be used to uncover more fine-grained categories and
potentially to build a disaster phase key term dictionary.
Our study is limited in scope to the use of existing classification algorithms for
Twitter text classification of relevance and disaster phase discovery on static hurricane
disaster data. We focused on extracting meaningful disaster knowledge from tweet
text. However, more disaster information needs to be extracted, including the disaster
time and the disaster scale for assessment and recovery. Novel approaches will be
needed to uncover those areas from other tweet fields in addition to the text field.
As we continue working on this framework, we plan to build a general Twitter
platform that can be utilized in a cloud-based disaster management application as a
service. The platform needs to be general enough to allow for dynamic requirement
updates through a micro-service architecture. Identifying relevant tweets in real time is
another goal, as we plan to implement the system for real-time streamed data. We
would like to test our work on disasters from different domains, which will help in
discovering similarity among the different disasters and the disaster phases via key
words or other similarity measures. Through our work, we also see a need for novel
labeling mechanisms for Twitter data based on text context. Presenting the extracted
information about the disaster in a user-friendly or standard format is another area to
work on.

Acknowledgements. Special thanks to Dr. Farnoush Banaei-Kashani, University of Colorado
Denver. This work is supported by the Department of Education GAANN Program, Fellowship #
P200A150283, focused on Big Data Science and Engineering.

References
1. Win, S.S.M., Aung, T.N.: Target oriented tweets monitoring system during natural disasters.
In: 16th IEEE/ACIS International Conference on Computer and Information Science (ICIS),
pp. 143–148. IEEE, Wuhan (2017)
2. Stowe, K., Paul, M.J., Palmer, M., Palen, L., Anderson, K.: Identifying and categorizing
disaster-related tweets. In: The Fourth International Workshop on Natural Language
Processing for Social Media, pp. 1–6. Association for Computational Linguistics, Austin
(2016)
3. Vieweg, S.E.: Situational awareness in mass emergency: a behavioral and linguistic analysis
of microblogged communications. Doctoral dissertation, University of Colorado at Boulder,
Boulder, CO (2012)
4. Ashktorab, Z., Brown, C., Nandi, M., Culotta, A.: Tweedr: mining twitter to inform disaster
response. In: Hiltz, S.R., Pfaff, M.S., Plotnick, L., Shih, P.C. (eds.) 11th International
ISCRAM Conference, pp. 354–358. The Pennsylvania State University, Pennsylvania
(2014)
5. Imran, M., Castillo, C., Lucas, J., Meier, P., Vieweg, S.: AIDR: artificial intelligence for
disaster response. In: 23rd International Conference on World Wide Web, pp. 159–162.
ACM, Seoul (2014)

6. Imran, M., Elbassuoni, S., Castillo, C., Diaz, F., Meier, P.: Practical extraction of
disaster-relevant information from social media. In: 22nd International Conference on World
Wide Web, pp. 1021–1024. ACM, Rio de Janeiro (2013)
7. Wang, Z., Ye, X.: Social media analytics for natural disaster management. Int. J. Geogr. Inf.
Sci. 32(1), 49–72 (2018)
8. Haworth, B., Bruce, E., Middleton, P.: Emerging technologies for risk reduction: assessing
the potential use of social media and VGI for increasing community engagement. Aust.
J. Emerg. Manag. 30(3), 36 (2015)
9. Yan, Y., Eckle, M., Kuo, C.L., Herfort, B., Fan, H., Zipf, A.: Monitoring and assessing
post-disaster tourism recovery using geotagged social media data. ISPRS Int. J. Geo-Inf. 6(5),
144 (2017)
10. Habdank, M., Rodehutskors, N., Koch, R.: Relevancy assessment of tweets using supervised
learning techniques: mining emergency related tweets for automated relevancy classification.
In: 4th International Conference on Information and Communication Technologies for
Disaster Management (ICT-DM), pp. 1–8. IEEE, Münster (2017)
11. Latent Dirichlet Allocation. https://docs.microsoft.com/en-us/azure/machine-learning/studio-
module-reference/latent-dirichlet-allocation. Accessed 02 Feb 2018
12. Anastasopoulos, L.J., Moldogaziev, T.T., Scott, T.A.: Computational Text Analysis for
Public Management Research: An Annotated Application to County Budgets (2017)
13. Huang, Q., Xiao, Y.: Geographic situational awareness: mining tweets for disaster
preparedness, emergency response, impact, and recovery. ISPRS Int. J. Geo-Inf. 4(3),
1549–1568 (2015)
14. Machine learning algorithm cheat sheet for Microsoft Azure machine learning studio. https://
docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-cheat-sheet. Accessed
02 Feb 2018
15. Spielhofer, T., Greenlaw, R., Markham, D., Hahne, A.: Data mining Twitter during the UK
floods: investigating the potential use of social media in emergency management. In: 3rd
International Conference on Information and Communication Technologies for Disaster
Management (ICT-DM), pp. 1–6. IEEE, Vienna (2016)
16. Named Entity Recognition. https://docs.microsoft.com/en-us/azure/machine-learning/studio-
module-reference/named-entity-recognition. Accessed 02 Feb 2018
17. Extract key phrases from text. https://docs.microsoft.com/en-us/azure/machine-learning/
studio-module-reference/extract-key-phrases-from-text. Accessed 02 Feb 2018
Incorporating Code-Switching and Borrowing
in Dutch-English Automatic Language
Detection on Twitter

Samantha Kent(&) and Daniel Claeser

Fraunhofer Institut FKIE, Fraunhoferstrasse 20, 53343 Wachtberg, Germany


{samantha.kent,daniel.claeser}@fkie.fraunhofer.de

Abstract. This paper presents a classification system to automatically identify
the language of individual tokens in Dutch-English bilingual Tweets. A
dictionary-based approach is used as the basis of the system, and additional features
are introduced to address the challenges associated with identifying closely
related languages. Crucially, a separate system aimed specifically at differentiating
between code-switching and borrowing is designed and then implemented
as a classification step within the language identification (LID) system. The
separate classification step is based on a linguistic framework for distinguishing
between borrowing and CS. To test the effectiveness of the rules in the LID
system, they are used to create feature vectors for training and testing machine
learning systems. The discussion centres on a Decision Tree Classifier
(DTC) and Support Vector Machines (SVM). The results show that there is only
a small difference between the rule-based LID system (micro F1 = .95) and the
DTC (micro F1 = .96).

Keywords: Code-switching · Borrowing · Dutch · English · Twitter ·
Machine learning · Decision trees · SVM

1 Introduction

In the European Union, it is estimated that just over half of all European citizens are
able to speak at least one other language in addition to their mother tongue [1]. Online
micro-blogging platforms such as Twitter provide the perfect setting for multilingual
communication, and Tweets containing Dutch and English, as in (1) below, are not
uncommon.
(1) oke give me some reasons waarom jij denkt dat het real is
ok give me some reasons why you think it’s real
Currently, multilingual communication poses a challenge for Natural Language
Processing (NLP) tasks such as Part-of-Speech tagging, machine translation, and
Named Entity Recognition. Improving the ability to process multilingual
communication is vital, as it will contribute to further solving these tasks.
Automatic language identification (LID) is the task of determining the language of
a document, sentence or word. Language identification at Tweet level reaches accuracy

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 418–434, 2019.
https://doi.org/10.1007/978-3-030-02686-8_32
Incorporating Code-Switching and Borrowing in Dutch-English ALD on Twitter 419

levels of over 95% for many languages. Nevertheless, one reason the language of a
Tweet is incorrectly identified, aside from the marked Twitter register, is that Tweets
can contain code-switching. Code-switching (CS) is defined as "the alternation of
two languages within a single discourse, sentence or constituent" [2]. CS can consist of
multi-word utterances or single-word insertions. To determine whether or not a Tweet
contains multiple languages, an analysis at token level needs to be conducted.
While there are many different LID methods, arguably one of the simplest
approaches is based on a lexical lookup system. In this method, dictionaries, which are
lists of lexical items extracted from a particular language, are used to verify that a word
is part of the lexicon of that language. This method was used as a starting point to
identify the language of tokens in Spanish-English, German-Turkish, and
Dutch-English Tweets [3]. The results suggested that a dictionary-based LID system
performs much better for language pairs that are less closely related than
Dutch-English. In the case of Dutch and English, many Dutch words are borrowed
from English and have been integrated into the Dutch lexicon. The challenge, therefore,
lies in determining whether the English words are in fact borrowed and part of a
monolingual Tweet, or whether they are English (CS) words included in a multilingual
Tweet. Without distinguishing between these two types of words, it is very difficult to
accurately identify the language of tokens in sentences that contain both English and
Dutch.
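A minimal sketch of the dictionary lookup described above, with tiny stand-in word lists rather than real dictionaries. Tokens found in both lists surface exactly the lexical-overlap problem discussed: they cannot be resolved by lookup alone.

```python
# Toy word lists standing in for real Dutch and English dictionaries,
# chosen to cover examples (1) and (2) from the text.
DUTCH = {"ik", "heb", "een", "waarom", "jij", "denkt", "dat", "het", "is",
         "test", "online"}
ENGLISH = {"give", "me", "some", "reasons", "real", "is", "test", "online"}

def tag_token(token):
    t = token.lower()
    in_nl, in_en = t in DUTCH, t in ENGLISH
    if in_nl and in_en:
        return "AMBIGUOUS"  # lexical overlap: needs the CS/borrowing step
    if in_nl:
        return "NL"
    if in_en:
        return "EN"
    return "UNK"

tags = [tag_token(t) for t in "waarom jij denkt dat het real is".split()]
# 'real' is English-only here, while 'is' sits in both lexicons
```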
Thus, in order to address this issue, this study presents a method for distinguishing
between borrowed and code-switched English words in order to improve the overall
language classification of tokens in Dutch-English Tweets. To do so, the method in this
paper combines a LID system based on a dictionary lookup with a synonym detection
method that identifies whether the token in question is code-switched or borrowed.
Even though "words are seldom exactly synonymous" [4], comparing the use of a
token and its possible synonyms provides an indication of how well a token is
integrated into a language.

2 Code-Switching and Borrowing

To fully understand CS, a distinction between CS and lexical borrowing needs to be
made. Lexical borrowing is defined as "the incorporation of lexical items from one
language in the lexicon of another language" and is, together with CS, one of the more
prominent language contact phenomena [5, p. 189]. CS and borrowing are closely
related in the sense that lexical items that were once classified as foreign-word CS may
be absorbed into the lexicon of a host language over time [6]. Example (2) below
illustrates that it is not always easy to determine whether a word should be identified
as a foreign word or not.
(2) ik heb een video klaarliggen… een social test met mn docent, wanneer moet die
online?
I have a video ready to go… a social test with my teacher, when should it go
online?
420 S. Kent and D. Claeser

At first glance, it would seem as though 'social', 'test' and 'online' are all English
words in this sentence. In fact, according to the Woordenlijst Nederlandse Taal,1 the
only word that is actually English is 'social', as its Dutch equivalent is 'sociaal'. The
other two words are identical to English but are also part of the Dutch lexicon. They
should therefore not be identified as code-switching but as borrowing.
Numerous attempts have been made to distinguish between borrowing and
code-switching. They range from establishing a set of specific criteria with which to
identify borrowing and CS to the assertion that there is no clear-cut distinction
between the two. In the first view, one of the main distinguishing features is the
number of words: lexical borrowings consist of only one word, whereas CS can
consist of multiple words [7]. Having said this, the difficulty in distinguishing
between the two lies not in the difference between single-word lexical borrowings
and multi-word alternations, but rather between lexical borrowing and single-word
CS inclusions. Table 1 provides a set of criteria to establish whether foreign inclusions
can be classified as borrowing or CS [7]. These criteria are used as guidelines to
differentiate between the two phenomena.
By delineating these criteria, the impression is given that there are only two
possibilities for classifying a single-word inclusion: CS or borrowing. However, it is
argued that this strict separation of the two phenomena is not always possible, and there
are many exceptions that do not fall into either category. Instead of strictly
differentiating between the two, CS and borrowing can be viewed as a continuum, with
the canonical forms of CS and borrowing at either end of the spectrum [8]. This
continuum makes it possible to account for tokens that may not be precisely in either
stage, but are instead transitioning into becoming fully-fledged loanwords.
The definition of borrowing adopted in this paper is that borrowed words stem from
a foreign language and have been integrated into the lexicon of a native language. In
contrast, words that are classified as code-switching are not integrated. Rather than
having to define a frequency at which a token is automatically classified as CS or
borrowing, the approach taken here relies on the difference between the frequency of
the token and that of any possible alternatives in the native language. This ensures that
instead of assigning an arbitrary threshold, the unique difference between the tokens
determines whether a word is CS or borrowing.
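The frequency-difference criterion can be sketched as a direct comparison between the foreign token's corpus frequency and that of its native synonym. The counts below are invented; in practice they would come from a monolingual corpus.

```python
# Invented corpus frequencies: 'test' outnumbers its Dutch synonym 'toets',
# while 'sociaal' dominates 'social', matching the discussion of example (2).
corpus_freq = {"test": 900, "toets": 150,
               "social": 40, "sociaal": 800}

def classify_inclusion(token, native_synonym, freq):
    """Borrowing if the foreign token is used more than its native synonym."""
    if freq.get(token, 0) > freq.get(native_synonym, 0):
        return "borrowing"
    return "code-switching"

label_test = classify_inclusion("test", "toets", corpus_freq)
label_social = classify_inclusion("social", "sociaal", corpus_freq)
```

Because the decision rests on the relative frequencies of each token pair, no global cutoff frequency has to be chosen, which is the point made in the paragraph above.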

3 Related Work

Code-switching in Tweets was the topic of the shared task for the workshops on
computational approaches to code-switching during the conference on Empirical
Methods in Natural Language Processing (EMNLP) in 2014 and 2016. CS detection
methods ranged from deep learning algorithms to traditional machine learning
approaches and various dictionary-based approaches [9, 10]. The best result was

1
Woordenlijst Nederlandse Taal is a word list that contains the correct spelling of current Dutch
words. It is maintained by de Taalunie http://woordenlijst.org/.

Table 1. Characteristics of borrowing and CS [5, 7].


Criteria Borrowing Code-switching
No more than one word + −
Phonological adaptation + ±
Morphological adaptation + −
Syntactic adaptation + −
Frequent use + −
Replaces own word + −
Recognized as own word + −

obtained by [11] for Spanish-English, with an F1 score of 91.3%. The performance of
the submissions for the Arabic language pair ranges from an F1 of 66% to an F1 of
83% for the best-performing system [12]. The results suggest that the more similar a
language pair is, the more difficult it is to accurately detect CS.
To the best of our knowledge, there are currently only two studies that present a
method for automatically identifying CS and borrowing on social media; neither
incorporated its results into a LID method. The first focused on English-Hindi CS and
on developing a method that automatically detects whether a foreign-language
inclusion is CS or borrowing [13]. That method is similar to the one in this paper, as its
starting point is also the assumption that CS and borrowing can be distinguished by
looking at the distribution of use of a foreign word in a native language. They achieve
this by looking at the frequency of use of a token in a monolingual Hindi newspaper.
Alternatively, [14] propose three different metrics to measure word usage: the Unique
User Ratio (UUR), the Unique Tweet Ratio (UTR), and the Unique Phrase Ratio
(UPR). Their overall micro precision/recall is 0.33 for the UUR metric, compared to a
baseline of 0.19 established in [13].
It is clear from previous studies that multilingual text within one Tweet still
provides a challenge for automatic language detection. The systems described above
cite similar reasons for the misclassification of certain tokens. Firstly, the highly
informal nature of Tweets makes it difficult to capture the language of all tokens.
Secondly, the presence of named entities complicates the LID task [15]. Thirdly, words
that share the same spelling in both languages are difficult to detect [15, 16]; this
challenge seems to grow the more similar the languages in the pair are, as it appears to
be harder to detect the language of tokens when there is a high level of lexical overlap.

4 Resources

A Dutch-English code-switching corpus was created for the purpose of training and
testing the classifier, compiled with the aim of collecting as many Dutch Tweets
containing English CS as possible. The corpus was compiled using the search function
in the Twitter streaming API, with both a specific language setting (Dutch) and specific
search words used to find Tweets containing Dutch-English code-switching. The top
25 most frequently used Dutch words on Dutch Wikipedia, consisting solely of
grammatical function words, were used as search terms.
The language identification method presented in [3] was used to make a pre-selection
of Dutch Tweets that are likely to contain English tokens. Based on these language tags,
all Tweets with only Dutch or only English tokens were separated from the Tweets
containing both Dutch and English tokens. It was necessary to select not only Tweets that
were correctly identified by the LID system as CS, but also Tweets that were incorrectly
identified, so as not to introduce a bias. Therefore, some Tweets in the corpus contain
only Dutch words that were mistakenly identified as English; these are used to test the
classifier's ability to recognize code-switched and borrowed tokens. The authors
manually selected 1250 Tweets for annotation. The following four categories were used
in the manual annotation of the Tweets:
in the manual annotation of the Tweets:
• Dutch (NL) – This category consists of all Dutch words, including Dutch words
borrowed from English. Because borrowed words are often overlooked and easily
annotated as English by mistake, particular attention was paid to them, and they
were double-checked against the Dutch word list.
• English (EN) – All English words are labelled as English. If there is doubt about
whether a word is English or Dutch, the same criteria as described in the Dutch
category are applied.
• Social Media Token (SMT) – It proved useful to create a separate category for all
social media related tokens [16]. It includes all tokens that are specifically related to
Twitter, such as at-mentions containing people’s usernames, hashtags and URLs,
but it also includes tokens such as ‘hahahah’, ‘lol’ or ‘aww’.
• Ambiguous (AM) – This category includes tokens that cannot be assigned to a
particular language. Like the SMT category, these tokens are used in both
languages and are therefore considered language independent. For example,
company names such as Twitter or Google, as well as the names of places and
people, are categorized as ambiguous.
The annotation was conducted by a native speaker of both Dutch and English, and a
second native speaker annotated 100 randomly selected Tweets to check the accuracy
of the annotation. A comparison of the Tweets annotated by both annotators shows a
high inter-rater agreement (Cohen’s Kappa = 0.949). 1000 Tweets were used as
training material and 250 Tweets were used to test the classifier. An overview of the
distribution of Tweets in the training and testing sets is given in Table 2 below. Note
that while the category ambiguous (AM) has been included for the purpose of com-
pleteness, it is not taken into account in any further classification or analysis.
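The inter-rater agreement can be computed directly from the two label sequences. The following is a minimal, stdlib-only Python sketch of Cohen's Kappa; the function name and the example labels are ours, not the actual annotations:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's Kappa for two annotators labelling the same token sequence."""
    assert len(ann1) == len(ann2) and ann1
    n = len(ann1)
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Chance agreement, estimated from each annotator's label marginals.
    c1, c2 = Counter(ann1), Counter(ann2)
    expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical token-level labels from two annotators:
a = ["NL", "NL", "EN", "SMT", "NL", "EN"]
b = ["NL", "NL", "EN", "SMT", "NL", "NL"]
print(round(cohens_kappa(a, b), 3))  # → 0.714
```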
The synonym dictionaries used in the LID system stem from three different sources.
The first dictionary was obtained from Open Dutch WordNet [17]. Open Dutch
WordNet is a lexical semantic database containing 117,914 synonym sets, of which
51,588 contain at least one Dutch synonym. The second dictionary is from a Dutch
language foundation called Open Taal.2 They provide language resources for the

2 http://data.opentaal.org/opentaalbank/woordrelaties/.
Incorporating Code-Switching and Borrowing in Dutch-English ALD on Twitter 423

Table 2. Number of tokens in each of the four categories in the annotated Tweet training and
testing sets
Category No. of tokens in training set No. of tokens in testing set
Dutch (NL) 73% (n = 10637) 73% (n = 2680)
English (EN) 15% (n = 2220) 17% (n = 612)
Social Media Token (SMT) 9% (n = 1281) 9% (n = 341)
Ambiguous (AM) 3% (n = 438) 1% (n = 41)
Total 14576 3674

creation of Dutch language software. The final dictionary was created using Dutch
Wiktionary.3 The synonyms for each of the Dutch entries in the dictionaries were
extracted and used to compile a specific synonym dictionary. The addition of multiple
synonym dictionaries not only increases the number of synonym sets but also means
that entries can be cross-checked.
The word frequency dictionaries were created using the Wikipedia dumps for
Dutch and English (version: “all pages with edit history” on 01/03/2017). This
particular version contains the pages themselves as well as a user discussion section
where Wikipedia users may comment on the page content. The dictionaries therefore
contain both formal and informal language, as well as a wide range of vocabulary
from different topics. Each word list was created by stripping the raw input of all
special characters, tokenizing the sentences, and sorting the tokens according to their
rank. The rank lists were cut at five million types because entries below that point are
single tokens with a frequency of one.
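The list construction described above can be sketched as follows; a simplified, stdlib-only version in which the lowercasing step is our assumption (the actual pipeline processes the full Wikipedia dump text and cuts the list at five million types):

```python
import re
from collections import Counter

def build_rank_dict(raw_text, max_types=5_000_000):
    """Strip special characters, tokenize, and map each token to its
    frequency rank (rank 1 = most frequent)."""
    cleaned = re.sub(r"[^\w\s]", " ", raw_text.lower())
    counts = Counter(cleaned.split())
    ranked = counts.most_common(max_types)
    return {token: rank for rank, (token, _) in enumerate(ranked, start=1)}

ranks = build_rank_dict("de kat zat op de mat, en de mat was rood")
print(ranks["de"], ranks["mat"])  # → 1 2
```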
The Social Media Token (SMT) word list consists of a combination of different
elements. The SMT list provided in [16] forms the basis of the list used here, which is
supplemented by two additional resources. Firstly, the addition of an emoticon list from
Wikipedia allows tokens such as “xD” to be captured. Secondly, a list of onomatopoeic
words, such as ‘haha’ and ‘pff’, retrieved from the training corpus was also added. To
ensure that as many of these tokens as possible are identified as SMT, the list was
extended to include different forms of the same token, so that alongside ‘haha’ and
‘pff’, variants such as ‘hahahah’ and ‘pffff’ are also covered.

5 Classification

In this section, the classification process is described. Section 5.1 contains an overview
of the rule-based system, whereas Sect. 5.2 describes how the features derived from the
classification rules are extracted for use in various machine learning classifiers.

3 https://nl.wiktionary.org/wiki/Hoofdpagina.

5.1 Rule-Based LID System


The notion of word frequency plays a central role in the design of the system. It is
assumed that the Dutch and English word frequency dictionaries are large enough for
all tokens to be present in both dictionaries. Crucially, a token's rank differs between
the two, as the token is more frequent in its language of origin than in the other
language. Thus, in the first step of the LID system, a token is assigned a language tag
based on whether its rank is higher in the Dutch or the English dictionary. In the rare
instance that a token is present in neither dictionary, it is assigned the tag ‘none’. As a
final step, all ‘none’ tags are replaced with the majority language (NL) of the Tweet.
Aside from the binary classification of either Dutch or English, tokens that are
specific to social media also need to be taken into account. Tweets contain many
additional tokens, such as @-mentions, hashtags, and abbreviations, which do not
strictly belong to either of the two languages. To account for these tokens, an additional
rule containing Social Media Tokens (SMT) is introduced. Once the initial classifi-
cation based on the rank information is made, an additional lookup is performed in an
SMT word list. Without this list, almost all of the SMT tokens would be tagged as
English, simply because they are more frequent in the English rank dictionary com-
pared to the Dutch one. All tokens present in this SMT list are tagged as such and are
excluded from any further steps or rules in the LID system.
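These two base rules can be sketched as follows. This is a minimal illustration under our own naming, with placeholder rank dictionaries; a lower rank number means a higher frequency:

```python
def base_tag(token, nl_ranks, en_ranks, smt_list, majority="NL"):
    """Base LID rules: SMT lookup, rank comparison, majority-language fallback."""
    if token in smt_list:
        return "SMT"                  # excluded from all further rules
    nl, en = nl_ranks.get(token), en_ranks.get(token)
    if nl is None and en is None:
        return majority               # 'none' defaults to the Tweet's majority language
    if nl is None:
        return "EN"
    if en is None:
        return "NL"
    return "NL" if nl < en else "EN"  # the more frequent (lower) rank wins

# Hypothetical ranks for the example from the text:
nl_ranks = {"school": 615}
en_ranks = {"school": 325}
print(base_tag("school", nl_ranks, en_ranks, {"lol"}))  # → EN (rank 325 beats 615)
```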
The lexical overlap between Dutch and English means that it is challenging to
capture the language of tokens that are orthographically identical in both languages.
For example, the word “school” is used in both Dutch and English and should therefore
also be classified as such. However, if the word “school” has a rank of 615 in Dutch
dictionary and a rank of 325 in the English dictionary, the classifier will tag the word as
English. If the LID system were to just consist of a basic dictionary lookup without any
additional rules, all Dutch occurrences of the word would be misclassified.
In order to account for these tokens, two additional rules have been incorporated
into the classifier. The first additional rule is the inclusion of a synonym detection
method to determine whether a token is code-switched or borrowed. To start, the token
that is being classified is matched to an equivalent synonym in the Dutch synonym
dictionary. If there is no match for the token, and therefore no synonym, the token is
classified as English. If there is a match, the token is classified as Dutch in the fol-
lowing two conditions:
• If the rank of the original English token is higher than that of the selected synonym
in the Dutch word frequency dictionary (i.e., the token is used more frequently than
its synonym), the token is tagged as Dutch and therefore treated as borrowed. For
example: ‘soul’ (rank = 6914) vs. ‘ziel’ (rank = 7291).
• If the difference in rank between the original English token and the selected
synonym is less than 30,000, the token is also tagged as Dutch and treated as
borrowed. For example: ‘power’ (rank = 4092) vs. ‘macht’ (rank = 1316).
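The two conditions above can be sketched as a single decision function; the ranks below are the illustrative values from the text, and the function assumes a synonym has already been selected:

```python
MAX_SYN_DISTANCE = 30_000  # iteratively determined on the training data

def is_borrowed(token_rank_nl, synonym_rank_nl, max_distance=MAX_SYN_DISTANCE):
    """Decide whether an English-tagged token should be re-tagged as Dutch (borrowed).

    token_rank_nl:   rank of the original token in the Dutch frequency dictionary
    synonym_rank_nl: rank of its selected Dutch synonym in the same dictionary
    """
    # Condition 1: the token ranks higher (is more frequent) than its synonym.
    if token_rank_nl < synonym_rank_nl:
        return True
    # Condition 2: the rank distance to the synonym is below the threshold.
    return token_rank_nl - synonym_rank_nl < max_distance

print(is_borrowed(6914, 7291))  # 'soul' vs. 'ziel'   → True (condition 1)
print(is_borrowed(4092, 1316))  # 'power' vs. 'macht' → True (condition 2)
```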
The maximum rank distance of 30,000 was determined iteratively using a list of
English words from the training data that could potentially be borrowings or
code-switches. To select the corresponding synonym, the original English token is
compared to each of the synonym sets; if the token is present in a set, its synonyms
are added to a match list. Once the match lists have been created, the correct
synonym is selected using a
process of elimination. In the first step, the synonym that occurs most frequently as a
synonym match is selected. Secondly, if there is a tie, the synonym with the highest
rank in the Dutch language dictionary is selected. The information obtained from the
synonym dictionaries outweighs the frequency information gained in step one only if
there is an actual synonym match. Otherwise, the classifier assigns the original tag.
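The elimination process can be sketched as follows (synonym sets and ranks are hypothetical; we read “highest rank” as the lowest rank number, i.e. the most frequent synonym):

```python
from collections import Counter

def select_synonym(token, synonym_sets, nl_ranks):
    """Pick the Dutch synonym for `token`: most frequent match first,
    Dutch dictionary rank as the tie-breaker."""
    matches = []
    for syn_set in synonym_sets:
        if token in syn_set:
            matches.extend(s for s in syn_set if s != token)
    if not matches:
        return None  # no synonym match: the original tag is kept
    counts = Counter(matches)
    best = max(counts.values())
    tied = [s for s, c in counts.items() if c == best]
    return min(tied, key=lambda s: nl_ranks.get(s, float("inf")))

sets = [{"power", "macht"}, {"power", "kracht"}, {"power", "macht"}]
print(select_synonym("power", sets, {"macht": 1316, "kracht": 900}))  # → macht
```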
The second additional rule considers the context of a token. It applies to tokens
where the token is in one language and the preceding and the following token are in
another language. In these cases, the token is assigned a language tag that matches the
language of the surrounding tokens. For example, if token ‘n’ is Dutch and tokens
‘n − 1’ and ‘n + 1’ are English, it is possible that the middle token ‘n’ is, in fact,
English and should be reassigned as such. An essential addition to this rule is that it
only comes into effect when the ranks of the token are sufficiently similar in the Dutch
and English frequency dictionaries (Fig. 1).
If a maximum rank distance is not set, all tokens will be reassigned to match their
context and all one-word code-switches could be incorrectly classified and lost. After a
distance of 1000 ranks, English recall starts to decrease considerably. Therefore, in
order to optimize the identification of the English tokens, the rank distance has been set
to the maximum of 1000. To summarize, the steps in the LID system are as follows:
• Base rule: dictionary lookup using the rank information in the Dutch and English
Wikipedia dictionaries.
• Base rule: SMT lookup.
• Additional rule 1: Synonym dictionary lookup.
• Additional rule 2: The context rule.
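The context rule (the final step above) can be sketched as follows; names are ours and the ranks are the illustrative values used earlier:

```python
MAX_CONTEXT_DISTANCE = 1000  # beyond this, English recall drops considerably

def apply_context_rule(tags, tokens, nl_ranks, en_ranks,
                       max_distance=MAX_CONTEXT_DISTANCE):
    """Re-tag a token whose neighbours agree on the other language, but only
    if its Dutch and English ranks are sufficiently similar."""
    out = list(tags)
    for i in range(1, len(tags) - 1):
        left, mid, right = tags[i - 1], tags[i], tags[i + 1]
        if left == right and mid != left and left in ("NL", "EN"):
            nl, en = nl_ranks.get(tokens[i]), en_ranks.get(tokens[i])
            if nl is not None and en is not None and abs(nl - en) <= max_distance:
                out[i] = left
    return out

tokens = ["i", "school", "you"]
tags = ["EN", "NL", "EN"]
print(apply_context_rule(tags, tokens, {"school": 615}, {"school": 325}))
# |615 - 325| = 290 <= 1000, so 'school' is re-tagged as EN
```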

5.2 Machine Learning


The four steps in the LID system have been converted into numeric vectors to use as an
input for the classifiers in scikit-learn 0.18. This allows the system to be tested in a
formal classification framework and be exported for further use. The resulting vector
has four different features: rank EN, rank NL, SMT, synonym rank, each corresponding
to the information derived from the rule-based LID system described in Sect. 5.1.
Rank EN, rank NL and synonym rank are all integers containing the absolute ranks
retrieved from the language dictionaries. For the SMT token, we converted the Boolean
‘present/absent’ in a social media token list to either returning an integer of 0 or 1.
A second variation of the vector was also tested. The absolute synonym rank infor-
mation was replaced with the difference in ranks between the token in question and its
corresponding synonym. All other vector dimensions remained the same. The differ-
ence between the first and second version of the vector is that in the first the difference
in ranks between the token and the synonym are returned implicitly. The information is
inherent in the synonym rank and the rank of the Dutch token and is thus already in the
vector. In the second version, the difference in ranks is explicitly added as a feature.
This distinction was made to allow the classifiers to be trained on different information
and to see if they would learn the rank difference without being explicitly given the
information. We trained and tested eight different classifiers using 10-fold
cross-validation; the results can be found in Table 3 below. The two best classifiers
will be discussed in more detail in the following section.

Fig. 1. Dutch and English precision and recall with differing maximum rank distance.
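The feature extraction can be sketched in plain Python (names and example ranks are ours; the resulting vectors would then be passed to the scikit-learn classifiers):

```python
def token_features(token, nl_ranks, en_ranks, smt_list, synonym_rank, default=0):
    """Four features per token: [rank_EN, rank_NL, SMT, synonym_rank].

    The second vector variant replaces `synonym_rank` with the rank
    difference between the token and its selected synonym."""
    return [
        en_ranks.get(token, default),
        nl_ranks.get(token, default),
        1 if token in smt_list else 0,  # Boolean presence encoded as 0/1
        synonym_rank if synonym_rank is not None else default,
    ]

# Hypothetical ranks for 'power' with selected synonym 'macht' (rank 1316):
vec = token_features("power", {"power": 4092}, {"power": 850}, set(), 1316)
print(vec)  # → [850, 4092, 0, 1316]
```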

6 Evaluation

In this section, the results for the LID system and the best performing classifier, the
Decision Tree classifier, are presented. Additionally, in Sect. 6.2, the code-switching
and borrowing detection rule is evaluated separately.

6.1 General Evaluation


The LID system and the Decision Tree Classifier were evaluated on a held-out set of
250 Tweets. The results are very similar. The precision, recall and F1 for the individual
categories, NL, EN, and SMT, in the Decision Tree classifier, are shown in Table 4
below. The best result is for NL, with an F1 of 97.19%, followed by SMT and EN with
F1 scores of 96.47% and 88.73% respectively. Compared to the LID system, both precision
and recall for the NL and EN improved. The overall F1 scores for the LID system and
the DTC are 94.66% and 95.69% respectively, which is a significant improvement
compared to the baseline (F1 = 85.29%) for Dutch-English CS detection in Claeser
et al. [3]. Both systems illustrate that it is easier to identify Dutch, the main language of
the Tweets, although there is an improvement in the classification of the EN tokens in
the DTC. All figures for the DTC do not include any post-processing, since the effect of
the context rule on the output of the classifier was below the variance of the results of
different test splits within cross-validation.
The confusion matrix in Table 5 provides the misclassified tokens for the DTC.
Most errors stem from tokens that should have been classified as either NL or EN.

Table 3. Classifier performance (micro F1)
Classifier Micro F1
Decision tree classifier 0.9537
Support vector machine 0.9240
Ada boost classifier 0.9096
Linear discriminant analysis 0.8187
Quadratic discriminant analysis 0.8186
Logistic regression 0.7729
Neural network 0.7503

Table 4. P, R, F1 for the individual categories in the DTC and LID system
Language Precision (%) Recall (%) F1 (%)
Decision tree classifier
Dutch (NL) 97.16 97.23 97.19
English (EN) 88.58 88.87 88.73
Social Media Token (SMT) 97.10 95.86 96.47
Rule-based LID system
Dutch (NL) 95.85 97.22 96.53
English (EN) 86.50 80.21 83.23
Social Media Token (SMT) 97.56 98.00 97.77

The SMT tokens are rarely misclassified, and if they are it is because a token is a more
unusual version of an SMT token already present in the SMT list.
One of the largest sources of errors consists of Dutch tokens that should have been
classified as English. This includes tokens such as ‘god’, ‘pianist’, ‘pressure’, and
‘dreaming’. There are two main types of errors. Firstly, single word inclusions were
misclassified due to the context in which they appeared. For example, ‘god’ and
‘pianist’ are part of the Dutch and English lexicon, and were misclassified in these
cases because they were used in an English context but classed as borrowed (NL) by
the inclusion of a synonym rank. Secondly, tokens have been misclassified because the
matched synonym is incorrect. A manual inspection of the tokens and their selected
synonyms shows, for example, that the synonym that was selected for ‘love’ is ‘rose’.
While these tokens are related in some way, they cannot be considered to be synonyms
of one another. However, because ‘love’ is more frequent than ‘rose’, it is automati-
cally classified as being a borrowed (NL) word because the English token is more
frequent than its supposed Dutch synonym.
Another source of errors is English tokens that should have been classified as Dutch.
In most cases, they were not detected as borrowed words by the classifier. One of the
main reasons is that for these tokens the synonyms were not included in any of the three
external synonym dictionaries. For example, ‘respect’, ‘defect’, ‘story’, ‘highlight’ and
‘trends’ are all part of the Dutch lexicon, but have been classified as English. The second
reason for misclassifications is the inclusion of multi-word code-switched segments.

Table 5. Confusion matrix of the decision tree classifier
NL EN SMT Total
NL 2674 66 10 2750
EN 67 611 3 681
SMT 7 2 314 323
Total 2748 679 327 3754

For example, ‘minute’ is misclassified as English. However, if it is used as part of the
phrase ‘last minute’, it should be considered Dutch: only the phrase as a whole is
Dutch, not the individual tokens within it. To capture such instances, multi-word
token sequences would need to be included in the dictionaries, whereas the classifier
currently operates only on single tokens.
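The per-class scores can be recomputed from a confusion matrix like Table 5; a stdlib sketch assuming rows hold the gold labels and columns the predictions:

```python
def prf_from_confusion(cm, labels):
    """Per-class precision, recall and F1 from a confusion matrix
    (rows = gold labels, columns = predicted labels)."""
    stats = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column total
        gold = sum(cm[i])                      # row total
        p = tp / predicted if predicted else 0.0
        r = tp / gold if gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        stats[label] = (p, r, f1)
    return stats

# Counts from Table 5:
cm = [[2674, 66, 10],
      [67, 611, 3],
      [7, 2, 314]]
stats = prf_from_confusion(cm, ["NL", "EN", "SMT"])
print(round(stats["NL"][1] * 100, 2))  # NL recall: 2674 / 2750 → 97.24
```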

6.2 Evaluation of the Synonym Selection Rule


To evaluate the effect of the synonym detection step on the overall classification
process, a list of 400 words tagged as English in the base step of the LID system was
extracted for further analysis. Each token was tagged as either borrowed or
code-switching based on the information from the synonym dictionaries, and the
original language (EN) was appended to Dutch whenever the system indicated that the
word may be borrowed. This output was then compared to the gold standard, which
was based on the presence or absence of a word in the “Woordenlijst Nederlandse
Taal”. The analysis is based solely on the 400 individual tokens, without taking their
context in the Tweet into account. In total, 82% of the tokens were correctly identified
as either borrowed or code-switched: 260 of the 289 tokens that should have been
classified as code-switching were identified correctly, and 71 out of 97 tokens were
correctly identified as borrowing.
Without this additional step, based on the initial rank information, all of these tokens
would have been classified as code-switched (EN), even though many of these are
indeed part of the Dutch lexicon and should, therefore, be tagged accordingly. This
demonstrates the importance of distinguishing between borrowing and CS in a lan-
guage identification system that classifies closely related languages.
As well as analyzing the impact of the synonym dictionary rule as a whole, the two
different conditions in which a token is tagged as borrowing have also been examined
(see Sect. 5.1 for a description of the conditions). Each condition considers the rank
information of the token and the synonym that has been selected as an equivalent
match. The first enables the detection of borrowed tokens that have a higher rank than
its equivalent Dutch synonym. A total of 53 borrowed words were correctly identified
using this method. Among the correctly identified tokens are ‘we’, ‘must’, ‘budget’,
‘crash’, ‘super’, ‘sale’ and ‘media’, ‘perfect’, ‘modern’, and ‘ranking’. The information
that was used to classify the tokens is provided in Fig. 2 below. For each of the tokens,
the English version was used more frequently than its Dutch equivalent. In some cases,
the distance between the ranks of the two synonyms is much larger than others.

The larger the rank distance between the two synonyms, the larger the difference in
frequency of use of the borrowed word compared to the Dutch equivalent synonym.
In the second rule, a token is classified as borrowed if the distance between the rank
of a token and its selected synonym is less than 30,000. The CSB system correctly
identified 30 tokens using this rule, among them the selection of tokens provided in
Fig. 3. In these instances, the frequency of use is higher in the Dutch synonym
equivalent than in the token. For example, ‘ticket’ is used relatively frequently in
Dutch, although the Dutch version ‘kaartje’ is still used more frequently. In other
words, the original Dutch token is used more frequently than the borrowed equivalent
of the word. Interestingly, this rule enables the identification of borrowed nouns as well
as highly frequent grammatical tokens. The synonym pair ‘me’ and ‘mij’ demonstrated
the CSB system’s ability to recognize that ‘me’ is both a Dutch and English pronoun
(Fig. 3).
Whilst the first borrowing rule identified more borrowed tokens overall, a direct
comparison of the proportion of correctly identified tokens shows that the two rules
are equally capable of identifying borrowed tokens: 89.9% of the tokens classified by
the first borrowing rule were correct, and 90.9% of the tokens classified by the
second rule were correct.
The synonym selection process was crucial to successfully differentiating between
borrowed and code-switched tokens. In order to judge whether the synonyms are a
correct match or not, two Dutch native speakers separately annotated the synonym
match lists. The judgment was based solely on whether the two tokens could be
synonyms, without taking any context into account. These figures do not take into
account whether or not the token was classified correctly; it focuses solely on whether
the synonym match is correct. Generally, there was agreement between the annotators,
and the final judgments for each annotator were merged to create an overall judgment
list. In total, out of the 97 borrowed tokens identified by the system, 79.4% of all
synonyms have a correct match. Table 6 below shows that of the 77 correct synonym
matches, only 5% of tokens were incorrectly classified as borrowing. Contrastingly,
40% of tokens with an incorrect synonym were incorrectly classified as borrowing.
Therefore, there seems to be a correlation between whether the synonym identified by
the system is a correct match and the resulting classification as borrowing or
code-switching: if the synonym match is correct, the system is more likely to
correctly identify whether a token is borrowed or code-switched.
Overall, the system is relatively accurate at identifying whether an English token is
in fact just English, or whether it also belongs to the Dutch lexicon. These tokens have
been defined as borrowed tokens in the context of this study, even though strictly
speaking not all tokens are actually borrowed from English and some may share
another etymology. Nevertheless, the system is able to identify if a token should also be
classified as Dutch, and as a method for differentiating between these two closely
related languages, the classifier will be a valuable tool.

Fig. 2. A selection of correctly identified borrowed (NL) tokens. The token is marked in bold
and supplemented by its rank in the Dutch Wikipedia dictionary as well as the synonym selected
by the classifier and its matching rank.

Fig. 3. A selection of correctly identified borrowed (NL) tokens using the maximum rank
distance rule.

Table 6. Correlation between synonym matches and the number of correctly classified
borrowed (NL) tokens
No. incorrectly classified tokens No. correctly classified tokens
Correct matches 5% (n = 4) 95% (n = 73)
Incorrect matches 40% (n = 8) 60% (n = 12)

7 General Discussion

Our initial assumption was that the Decision Tree classifier would be the most suitable
classifier for features extracted from the rule-based LID system. However, even though
it was the best performing classifier, the rule related to the rank distance between a
token and its corresponding synonym did not transfer. This is true for both versions of
the vector. The rule was not learned whether or not the rank distance was explicitly
provided. In each case, a different tree is generated, but both are equally complex:
the classifier bypasses the synonym rank rule and the model is based on grouping
tokens with similar ranks to create paths. We suspect the reason that the classifier did
not learn the rule is that the algorithm that builds the decision tree has the objective
of finding the most efficient local split. It aims to create the purest subset with maximum
information gain, and consequently fails to detect the global optimum. Instead, the
classifier generated hundreds of specific paths to classify small groups of tokens.
The second best performing classifier, aside from the rule-based LID system, is the
Support Vector Machine. In contrast to the Decision Tree, the SVM does, in fact, learn
the synonym rank rule. We believe that this is because the RBF kernel enables the
classifier to generalize and learn the concept of a rank threshold for the synonyms. It
does so by transforming the non-linear data from the dictionary rank lists to a
hyperspace that allows for the separation of the otherwise intertwined examples of
borrowing and CS in the rank lists. This assumption is supported by the observation
that giving either the ranking distance as explicit information or just the synonym rank
has no visible influence on either runtime or performance of the resulting SVM. Neither
does changing the default value for non-existing synonyms from 0 to −10 million, a
value whose magnitude exceeds the size of the dictionary.
Interestingly, the rule-based LID system performed very similarly to the machine
learning classifiers. A similar finding was reported in [16], where the results for the
rule-based system were actually slightly better than those of the machine learning
systems, suggesting that if the rules are designed carefully, language detection for this
particular language pair can be just as accurate in rule-based systems as in machine
learning systems.
The performance of systems depends greatly on the quality of the external mate-
rials. While designing the systems, we noticed both advantages and disadvantages for
the different types of external resources. Firstly, the synonym dictionaries proved to be
quite difficult to obtain. The decision was made to combine multiple synonym dic-
tionaries in order to compensate for incomplete dictionaries. The main reason for doing
so is the ability to cross-reference entries for the lemmas. This allows for a verification
of whether the entry is actually correct. For example, for some dictionary entries, the
English translation of a word is listed as a synonym even though it is not officially a
part of the Dutch lexicon. These tokens caused issues, as they were not included as an
English token in the annotated gold standard, and were consequently incorrectly
classified. The most frequently occurring example is ‘why’, which is listed as a syn-
onym for ‘waarom’ in the Open Taal synonym dictionary. This mistaken entry would
be easy to rectify if all synonyms not present in at least one other dictionary are
disregarded as synonym matches. However, this would not be possible with the current
synonym dictionaries as many of the matches only occurred in one dictionary. Too
many entries would be lost and the performance of the identification of the borrowed or
Dutch tokens would decrease. If such a frequently occurring word is listed as a syn-
onym even though it is not, it is likely that this is also true for other entries, which may
cause issues in the classification of other tokens in the future.
Secondly, the Wikipedia rank list turned out to be a highly suitable external
resource. A comparison of studies describing just a basic dictionary lookup approach
with the results obtained in this system illustrates that the quality of the Wikipedia
dictionaries enhanced the performance of the first step in the LID system. The system
in [16] obtained an F1 score of 38% for identifying English tokens, [18] obtained
similar figures (38% and 35%) for the English-Hindi and English-Bengali language
pairs, and [19] obtained the highest F1 scores in this comparison, with 71% and 73%
for Spanish-English and Nepali-English respectively. In the LID system we present,
the most basic version without any additional rules achieved a micro F1 of
72%. This suggests that the quality of the dictionaries is good, because based on just
the lookup alone, the results are better than initially anticipated based on previous
research.
Having said this, a few issues still remain. Firstly, it must also be considered that
while Wikipedia contains a large variety of topics and registers, there may be some
topics that are overrepresented on Wikipedia and tokens related to that topic are
consequently also more frequent in the dictionaries than they would be in other cir-
cumstances. Secondly, the use of quotations or names in the articles may also mis-
represent the actual frequency of certain tokens. Names of books or films are not
translated into Dutch and they are often used in the original language. Consequently,
the article ‘the’ is extremely frequent in the Dutch Wikipedia pages even though it is
not a Dutch token. In the English rank dictionary, ‘the’ is the most frequently used
token and is ranked at one. In the Dutch dictionary, it is ranked at 63. Even if it is
highly ranked, the assumption that words are more frequent in their language of origin
still holds. Nevertheless, according to the Dutch Wikipedia rank dictionary, the word
‘the’ is more frequent than most Dutch lexical items and it does not match the fre-
quency information that one would expect of words that are not a part of the Dutch
lexicon.

8 Conclusion

The question posed in this paper was whether or not a dictionary-based LID system is
suitable for token-level language detection in a closely related language pair. Previous
research [3] indicated that lexical items present in both languages, in this case
Dutch-English, caused misclassifications in a dictionary-based lookup system. It was
difficult to identify whether or not a token was code-switched because many English
tokens were classified as Dutch. The solution presented in this paper was to
incorporate a system designed specifically to differentiate between borrowing and
code-switching. The results show that incorporating this method into token-level
language classification yields a micro F1 of 94.66% and 95.69% for the rule-based
LID system and the DTC respectively, a considerable improvement over the baseline
(F1 = 85.29%) for Dutch-English CS detection in [3].
Even though the overall result is highly competitive with other similar systems, future
research could benefit from adding a number of improvements. Firstly, named entities
were excluded from classification altogether, because as far as we are aware, there are
no suitable external named entity recognition systems for code-switched Dutch-English
tweets. The systems could benefit from the addition of named entity recognition, but
more importantly, it should be included for the purpose of completing the classification
of a Tweet as a whole. Secondly, the synonym selection method could be improved, if
context were to be taken into account. Currently, the context information is only used
in the final step of the LID system to correct any misclassifications by the frequency
dictionary lookup and synonym dictionary lookup. It would be interesting to see
whether performance improves if this step were implemented within the synonym
selection process, rather than as a final step.
One of the challenges in the design of the system was acquiring good external
resources. The dictionaries based on Dutch and English Wikipedia are a highly suitable
source for the creation of the language-specific word frequency lists. The inclusion of
formal and informal language and a wide range of topics ensures that many of the tokens
are in fact present in the dictionaries. However, there seems to be a lack of freely
available material for Dutch natural language processing. The synonym dictionaries, in
particular, are not ideal, as three separate dictionaries were necessary to achieve the
results in this paper. The performance of the systems would improve with a better-quality
synonym dictionary. It is possible to improve the current dictionaries and tailor them
specifically to the task at hand by verifying the synonym sets and adding other forms of
the tokens already present. This would increase not only the likelihood of a synonym
being present in the dictionary, but also the likelihood that the synonym is a correct
match. Finally, both systems were developed using the language pair Dutch-English,
and because the design of the classifiers is quite simple and not necessarily tied
to a particular language, it would be interesting to see how they would perform
on a different closely related language pair.
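As a concrete illustration of the frequency-dictionary lookup on which both systems build, a token-level LID step might be sketched as follows. This is a minimal sketch with toy word frequencies; the dictionaries, values, and the `classify_token` helper are hypothetical stand-ins for the Wikipedia-derived frequency lists described above.

```python
# Minimal sketch of token-level language identification via
# frequency-dictionary lookup (toy dictionaries for illustration;
# the real systems use Wikipedia-derived frequency lists).

# Relative frequencies per language (hypothetical values, not real counts).
DUTCH_FREQ = {"ik": 0.9, "huis": 0.7, "mooi": 0.6, "chill": 0.1}
ENGLISH_FREQ = {"i": 0.9, "house": 0.7, "nice": 0.6, "chill": 0.5}

def classify_token(token):
    """Label a token 'nl', 'en', or 'unk' by comparing relative frequencies."""
    t = token.lower()
    nl, en = DUTCH_FREQ.get(t, 0.0), ENGLISH_FREQ.get(t, 0.0)
    if nl == en == 0.0:
        return "unk"
    return "nl" if nl > en else "en"

def classify_tweet(tokens):
    """Tag each token of a tweet independently."""
    return [classify_token(t) for t in tokens]

print(classify_tweet(["ik", "chill", "house"]))
```

A word like "chill", present in both lists, is resolved to whichever language uses it more frequently; in the full system this decision is further corrected by the context step discussed above.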

References
1. European Commission: Europeans and their languages. Special Eurobarometer 386 (2012)
2. Poplack, S.: Sometimes I’ll start a sentence in Spanish Y TERMINO EN ESPANOL: toward
a typology of code-switching. Linguistics 18, 581–618 (1980)
3. Claeser, D., Felske, D., Kent, S.: Token-level code-switching detection using Wikipedia as a
lexical resource. In: Rehm, G., Declerck, T. (eds.) GSCL 2017. Language Technologies for
the Challenges of the Digital Age. Lecture Notes in Artificial Intelligence, Lecture Notes in
Computer Science, vol. 10713, pp. 192–198. Springer, Heidelberg (2018)
4. Johnson, S.: A dictionary of the English language: a digital edition of the 1755 classic. In:
Besalke, B. (ed.) The History of the English Language. https://johnsonsdictionaryonline.
com/the-history-of-the-english-language/. Accessed 15 April 2014
5. Muysken, P.: Code-switching and grammatical theory. In: Milroy, L., Muysken, P. (eds.)
One Speaker, Two Languages: Cross-Disciplinary Perspectives on Code-Switching,
pp. 177–198. Cambridge University Press, Cambridge (1995)
6. Auer, P.: Bilingual Conversation. Benjamins, Amsterdam/Philadelphia (1984)
7. Poplack, S., Sankoff, D.: Borrowing: the synchrony of integration. Linguistics 22, 99–135
(1984)
8. Clyne, M.: Dynamics of Language Contact. Cambridge University Press, Cambridge (2003)
9. Solorio, T., Blair, E., Maharjan, S., Bethard, S., Diab, M., Gohneim, M., Hawwari, A., Al-
Ghamdi, F., Hirschberg, J., Chang, A., Fung, P.: Overview for the first shared task on
language identification in code-switched data. In: Proceedings of the First Workshop on
Computational Approaches to Code Switching, pp. 62–72. Doha, Qatar (2014)
10. Molina, G., AlGhamdi, F., Ghoneim, M., Hawwari, A., Rey-Villamizar, N., Diab, M.,
Solorio, T.: Overview for the second shared task on language identification in code-switched
data. In: Proceedings of the Second Workshop on Computational Approaches to Code
Switching, pp. 40–49. Austin, Texas (2016)

11. Shirvani, R., Piergallini, M., Gautam, G.S., Chouikha, M.: The Howard University system
submission for the shared task in language identification in Spanish-English Codeswitching.
In: Proceedings of the Second Workshop on Computational Approaches to Code Switching,
pp. 116–120. Austin, Texas (2016)
12. Samih, Y., Maharjan, S., Attia, M., Solorio, T.: Multilingual code-switching identification
via LSTM recurrent neural networks. In: Proceedings of the Second Workshop on
Computational Approaches to Code Switching, pp. 50–59. Austin, Texas (2016)
13. Bali, K., Sharma, J., Choudhury, M., Vyas, Y.: I am borrowing ya mixing?: An analysis of
English-Hindi code mixing in Facebook. In: Proceedings of the First Workshop on
Computational Approaches to Code Switching, Doha, Qatar, pp. 116–126 (2014)
14. Patro, J., Samanta, B., Singh, S., Basu, A., Mukherjee, P., Choudhury, M., Mukherjee, A.:
All that is English may be Hindi: enhancing language identification through automatic
ranking of the likeliness of word borrowing in social media. In: Proceedings of the 2017
Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark,
pp. 2264–2274, 7–11 September 2017
15. Nguyen, D., Doğruöz A.: Word level language identification in online multilingual
communication. In: Proceedings of the 2013 Conference on Empirical Methods in Natural
Language Processing, Seattle, Washington, pp. 857–862 (2013)
16. Dongen, N.: Analysis and prediction of Dutch-English code-switching in social media
messages. Unpublished master’s thesis. University of Amsterdam (2017)
17. Postma, M., van Miltenburg, E., Segers, R., Schoen, A., Vossen, P.: Open Dutch WordNet.
In: Proceedings of the Eight Global Wordnet Conference, Bucharest, Romania (2016)
18. Das, A., Gambäck, B.: Code-mixing in social media text: the last language identification
frontier? Trait. Autom. Lang. 54(3), 41–64 (2013)
19. Maharjan, S., Blair, E., Bethard, S., Solorio, T.: Developing language-tagged corpora for
code-switching tweets. In: Proceedings of LAW IX - The 9th Linguistic Annotation
Workshop, Denver, Colorado, pp. 72–84 (2015)
A Systematic Review of Time Series
Based Spam Identification Techniques

Iqra Muhammad, Usman Qamar, and Rabia Noureen

National University of Sciences and Technology, H-12, Islamabad, Pakistan
iqra1804@gmail.com, usmanq@ceme.nust.edu.pk, rabia.noureen15@ce.ceme.edu.pk

Abstract. Reviews are an essential resource for marketing a company's products
on e-commerce websites. Professional spammers are hired by companies to
demote competing products and increase their own product ratings. Researchers
are now adopting unique methodologies to detect spam on e-commerce websites.
Time-series-based spam detection has gained popularity in recent years. We
need techniques that can help us catch spammers in real time, using fewer
resources. Hence, an analysis involving the use of time series is of utmost
importance for real-time spam detection. We focus on systematically analyzing and
grouping spam detection techniques that either involve the use of temporal
features or have used time series, and we analyze the techniques in terms of
accuracy and results. In this paper, a survey of different time-series-based
spam detection techniques is presented and the limitations of the techniques
are discussed.

Keywords: Review spam · Time series · Techniques

1 Introduction

In the past decade, the increasing use of e-commerce websites for online shopping has
also encouraged users to write reviews on products. This evolution of writing reviews
on merchant websites has also led to spammers posting spam reviews. Companies that sell
products on e-commerce websites hire spammers to post spam reviews that demote
competitors' products. Spam has lessened the credibility of online reviews, and people
become reluctant to buy a product when they are unsure whether the online reviews about
it are spam. Online spam reviews affect both buyers and sellers. Researchers
have adopted a number of approaches for detecting review spam. Conventional
approaches to detecting review spam focus on one reviewer or a single
online review [1]. The authors of previous approaches [1] have detected duplicated
reviews in a dataset as spam. In addition, some previous methods of spam detection
have focused on using n-gram features for spam identification [2]. Our study provides
a critical analysis of the spam detection techniques that make use of time series
to identify spam.
Some spam detection techniques involve the use of psychological and behavioral
features to identify fake reviews [3, 4]. In addition, some state-of-the-art work

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 435–443, 2019.
https://doi.org/10.1007/978-3-030-02686-8_33

focuses on identifying temporal patterns for the detection of spam [5]. Temporal patterns
involve the exploration of temporal burstiness for the detection of opinion spam [5].
The authors in [6] introduce a robust spam identification approach in which content-based
factors, rating deviation, and the activeness of reviewers are employed alongside time
series to identify spam in online reviews. The authors in [6] have listed the advantages
and disadvantages of the proposed technique in terms of increasing time efficiency and
reducing high computation requirements.
The authors in [7] have linked burstiness with reviewers. Bursts of reviews are
defined as abnormal peaks in a time series of reviews. Bursts can occur in a time
series for several reasons: the first is a sudden rise in a product's sales on the
merchant website, and the second is a spam attack. Many current state-of-the-art
techniques capture these bursts in time series to identify spam attacks. A spam review
and a spammer can be related in a burst: spammers like to work in groups while posting
reviews, hence spam reviews are related to one another within a burst. Non-fake reviews
are likewise related to other non-fake reviews in a burst of the review time series.
The authors in [8] have used a time-series-based fake review detection approach in
which they combine content and usage information. This study [8] covers product reviews
and the behavioral qualities of reviewers. Lastly, the authors in [9] have highlighted
the technique of using correlated temporal features for identifying spam attacks. Their
methodology [9] revolves around the creation of a multidimensional time series derived
from the aggregation of statistics. The time series [9] has been constructed to show the
effectiveness of using correlations.
In the current study, a comparative analysis of existing time-series-based spam
detection techniques has been performed. The focal point of our review is that, after
going through the techniques discussed here, experts can devise an efficient time-series-based
spam detection approach that uses novel temporal features. Researchers can benefit from
this review by identifying the limitations of the existing techniques and proposing new
temporal-based spam detection methods. The paper is organized as follows: Sect. 2 defines
the terms spam detection and time series, Sect. 3 presents a critical analysis of several
time-series-based spam detection techniques, Sect. 4 discusses the techniques, and
Sect. 5 concludes and outlines future work.

2 Definitions

2.1 Time Series

A time series is defined as a series of data points arranged in time order. Time
series are widely used in the banking sector to identify credit card fraud and are
also applied to anomaly detection [9]. The authors in [9] use multivariate
time series as a tool for anomaly detection. Time series have also recently been used
in the literature for the detection of opinion spam [8]. A time series can be defined
mathematically using the simple regression model:

y(t) = x(t)β + ε(t),  (1)

where y(t) = {y_t; t = 0, 1, 2, …} is a sequence indexed by the time subscript t. The
model includes an observable signal sequence x(t) = {x_t} and an unobservable
white-noise sequence ε(t) = {ε_t} [16].
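To make Eq. (1) concrete, the following sketch simulates a series from the model and recovers β by ordinary least squares. The signal distribution, noise level, and sample size are illustrative choices, not values taken from any of the surveyed papers.

```python
import random

random.seed(42)
beta = 2.5  # true coefficient
T = 500     # number of time steps

# Observable signal x(t) and unobservable white noise eps(t), as in Eq. (1).
x = [random.uniform(-1.0, 1.0) for _ in range(T)]
eps = [random.gauss(0.0, 0.1) for _ in range(T)]
y = [x[t] * beta + eps[t] for t in range(T)]

# Ordinary least squares estimate of beta: sum(x*y) / sum(x*x).
beta_hat = sum(xt * yt for xt, yt in zip(x, y)) / sum(xt * xt for xt in x)
print(round(beta_hat, 2))  # should be close to the true beta of 2.5
```

With white noise of small variance, the least-squares estimate recovers β closely, which is the sense in which the observable signal explains the series up to noise.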

2.2 Review Spam Detection Techniques

Review spam is defined as the set of fake reviews posted on e-commerce websites.
Opinion spam detection techniques [2, 3] have been widely used by researchers to detect
fake reviews. Such techniques assist e-commerce websites in automating spam detection.

3 Systematic Review

This section gives an overview of papers in the literature that have used time
series or temporal features for the detection of opinion spam.

3.1 On the Temporal Dynamics of Opinion Spamming

In [5], a hybrid technique is used to identify spamming on time series of Yelp
reviews. The authors in [5] discovered temporal patterns in time series and their
relationship with the posting rates of spammers. They used vector autoregression
methods to predict the fraud rate under multiple spamming policies. The authors in [5]
also examined the effect of filtered reviews on future review ratings. Three types of
spamming policies are covered in [5]; accordingly, the restaurants on Yelp were grouped
by policy. The authors computed a set of 10 behavioral modalities of normalized time
series and, for each behavioral modality, applied time series clustering within a given
policy. The authors in [5] also characterized the reasons for spamming by comparing the
time series of deceptive ratings with those of truthful ratings, using weeks as the
time interval of the series. They also identified the major causes of deceptive ratings
using correlation techniques. The authors carried out 5-fold cross validation with
classification on time series features, behavioral features, and n-gram features. This
technique lacked ten-fold cross validation when applying classification to the review
features. The authors could also have used an additional set of textual features from
the review text to improve the accuracy of the model. A comparison of the different
spam detection techniques is shown in Table 1.
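The k-fold evaluation protocol mentioned above can be sketched in plain Python. The fold logic below is a generic illustration of 5-fold splitting, not the authors' code, and the feature extraction is omitted.

```python
def k_fold_indices(n_samples, k=5):
    """Split sample indices into k disjoint (train, test) folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder when n_samples % k != 0.
        end = (i + 1) * fold_size if i < k - 1 else n_samples
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        folds.append((train, test))
    return folds

folds = k_fold_indices(10, k=5)
for train, test in folds:
    print(len(train), len(test))  # 8 train / 2 test per fold
```

Each review appears in exactly one test fold, so every sample is scored by a classifier that never saw it during training; using k=10 instead of k=5 is the change the criticism above suggests.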

Table 1. Precision, recall, F-score, and accuracy for all techniques ("x" = metric not reported).

Approach | Dataset | Precision | Recall | F-score | Accuracy
On the temporal dynamics of opinion spamming [5] (late spamming) | Yelp hotel and restaurant review dataset [14] | 86.3 | 95.3 | 90.6 | 90.1
Exploiting burstiness in reviews for review spammer detection [7] (burst review with LBP and local observation) | Amazon review dataset [13] | 83.7% | 68.6% | 75.4% | 77.6%
Fake review detection via exploitation of spam indicators and reviewer behavior characteristics [8] | Amazon review dataset [13] | 75.2 | 75.7 | 74.9 | x
Detection of fake opinions using time series [6] | Amazon review dataset [13] | 82 | 88 | 86 | x
Bimodal distribution and co-bursting in review spam detection [10] | Dianping's real-life filtered (fake or spam) reviews [15] | x | x | x | x
Modelling review spam using temporal patterns and co-bursting behaviors [12] | Dianping's real-life filtered (fake or spam) reviews [15] | x | x | x | x
Review spam detection via temporal pattern discovery [11] | Review website (www.resellerratings.com) [11] | x | x | x | x

3.2 Exploiting Burstiness in Reviews for Review Spammer Detection

A sudden rise in the popularity of products or the presence of spam attacks can produce
bursts in a time series. The authors in [7] have captured these bursts in time series of
reviews. Spam reviews are related to other spam reviews in a burst, because spammers
work in groups and post spam reviews collectively; real reviews are likewise related to
other real reviews in the time series. The authors in [7] have proposed a robust spam
detection framework that uses a network of reviewers appearing in the peaks of the time
series. They model reviewers and their co-occurrence in the peaks as a Markov Random
Field, and use the Loopy Belief Propagation technique to decide whether a reviewer
should be marked as a spammer. They also applied feature-engineering techniques in the
loopy belief network for network inference. Lastly, they evaluated their approach using
supervised classification on the reviews. A limitation of this technique [7] is that it
has not been tested on other review datasets, which would increase its validity.
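The burst-capturing idea — flagging abnormal peaks in the review time series — can be illustrated with a simple threshold rule. The mean-plus-two-standard-deviations threshold below is an assumption for illustration only; it is not the detection rule used in [7].

```python
from statistics import mean, stdev

def find_bursts(counts, num_sd=2.0):
    """Return indices of time bins whose review count is an abnormal peak,
    defined here as exceeding mean + num_sd * standard deviation."""
    mu, sd = mean(counts), stdev(counts)
    threshold = mu + num_sd * sd
    return [i for i, c in enumerate(counts) if c > threshold]

# Weekly review counts for one product; week 5 is a suspicious spike.
weekly_counts = [3, 4, 2, 5, 3, 40, 4, 3, 2, 4]
print(find_bursts(weekly_counts))  # the spike at index 5 is flagged
```

Reviewers who co-occur in such flagged windows form the nodes of the reviewer network on which the belief-propagation inference operates.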

3.3 Fake Review Detection via Exploitation of Spam Indicators and Reviewer
Behavior Characteristics
In [8], the authors have proposed a novel spam detection framework for the identification
of spam reviews. This technique combines content and usage information for the
identification of spam product reviews. The model includes reviewers' behavioral
characteristics as well as the product reviews themselves, and the authors derive a
relationship between reviews and spammers. Their proposed model [8] identifies bursts
in order to examine suspicious time intervals of product reviews. The technique also
employs each reviewer's past reviewing record to derive an authorship attribute, which
is a strong indicator of spam in product reviews. The technique [8] considers not only
reviews within burst intervals but also reviews outside them. The authors employ [8]
basic spam indicators such as rating deviation, number of reviews, and content
similarity; reviews captured from burst time intervals contribute indicators such as
content similarity and burst activity. The technique's last step is a linear weighted
scoring function, which integrates the individual scores and computes a mean output as
the overall spam score.
Lastly, the technique [8] has been validated on a real-world review dataset. A
limitation of this technique is the possible lack of effective features. The feature
set used for identifying spam reviews could be improved with additional reviewer-based
features, such as the reviewer's location and writing style. A different weighted
scoring function for assigning scores might also improve the accuracy of the model.
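The final linear weighted scoring step can be sketched as follows. The indicator names and weights are hypothetical illustrations — the paper combines indicators such as rating deviation, content similarity, and burst activity, but its exact weights are not reproduced here.

```python
def spam_score(indicators, weights):
    """Combine per-indicator scores (each in [0, 1]) into one overall spam
    score via a normalized linear weighted sum."""
    total_weight = sum(weights[name] for name in indicators)
    weighted = sum(indicators[name] * weights[name] for name in indicators)
    return weighted / total_weight

# Hypothetical indicator scores for one review (weights are illustrative).
indicators = {"rating_deviation": 0.9, "content_similarity": 0.8, "burst_activity": 0.6}
weights = {"rating_deviation": 2.0, "content_similarity": 1.0, "burst_activity": 1.0}

print(round(spam_score(indicators, weights), 2))  # prints 0.8
```

Normalizing by the total weight keeps the output in [0, 1] regardless of how many indicators are available for a given review, which is convenient when burst-only indicators are missing for reviews outside burst intervals.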

3.4 Detection of Fake Opinions Using Time Series

The authors in [6] focus on the implementation of a unique time-series-based spam
detection algorithm. The algorithm uses factors such as rating deviation, reviewer
activeness, and other content-based factors for the detection of spam reviews.
Conventional spam detection techniques have certain flaws, notably high time
consumption and high computational cost when searching for spam in large review
datasets, which the proposed technique [6] tries to overcome. The technique is based on
the assumption that spammers work in groups and that the frequency of spam reviews
rises during certain time intervals. The authors [6] propose that the system can be
used as a real-time spam filtering system, so large review datasets can easily be
cleaned of spam reviews. Their proposed model achieved an F-score of 0.86. One
limitation of this study is that it does not take into account spam reviews that might
exist outside the time series bursts. Secondly, the authors could have increased the
accuracy of the model by employing features focused on the characteristics of a
spammer, such as the spammer's IP address. Lastly, the validity of the proposed
technique [6] could be increased by applying it to multiple datasets; the technique is
domain dependent because it was created for application to review datasets.
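One of the factors used above, rating deviation, can be sketched as the gap between a review's rating and the product's average rating. The function below is a generic illustration (with a 1-5 star scale assumed), not the authors' exact formulation.

```python
from statistics import mean

def rating_deviation(review_rating, all_ratings):
    """Absolute deviation of one review's rating from the product average,
    normalized by the maximum possible gap on a 1-5 star scale."""
    avg = mean(all_ratings)
    return abs(review_rating - avg) / 4.0  # 4.0 = max gap between 1 and 5

ratings = [4, 5, 4, 4, 5, 1]  # the lone 1-star review deviates strongly
print(round(rating_deviation(1, ratings), 2))
```

A review whose rating sits far from the product consensus, especially inside a burst interval, scores high on this factor and contributes to the overall spam score.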

3.5 Bimodal Distribution and Co-bursting in Review Spam Detection

The authors in [10] highlight the issue of spam detection and propose a hybrid approach
using bimodal distributions and co-bursting factors. According to the authors, online
reviews are critical for the comparison of different products on merchant websites [10].
As explained earlier in the article, spammers and fraudsters take advantage of online
reviews and post fake opinions to attract customers to certain products. Previous
approaches have made use of review contents, reviewers' behavioral traits, and rating
patterns; this research [10] focuses on exploiting reviewers' posting rates. The
authors [10] discovered that reviewers' posting rates follow a bimodal distribution.
According to [10], spammers post reviews collectively within short intervals of time, a
phenomenon called co-bursting. The authors in [10] have discovered patterns in
reviewers' temporal dynamics and use a two-mode labeled hidden Markov model to detect
spamming from a single reviewer's posting times. The method is then extended to a
coupled hidden Markov model to identify posting behavior and co-bursting signals. They
have also proposed a co-bursting network-based model, which aids in the detection of
spammers. The proposed approach [10] lacks evaluation of the model through the use of
supervised machine learning techniques.
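The bimodal posting-rate observation can be illustrated by looking at the gaps between a reviewer's posting times: spamming phases show very short inter-arrival gaps, normal phases long ones. The two-hour cut-off below is a hypothetical threshold for illustration only; the paper fits a labeled hidden Markov model over the posting times rather than thresholding.

```python
def posting_modes(post_times_hours, short_gap=2.0):
    """Label each inter-post gap as 'burst' (spam-like short gap) or
    'normal' (long gap), given posting times in hours."""
    gaps = [b - a for a, b in zip(post_times_hours, post_times_hours[1:])]
    return [("burst" if g <= short_gap else "normal") for g in gaps]

# Hypothetical posting times (hours): a bursting episode, then quiet periods.
times = [0.0, 0.5, 1.0, 1.2, 50.0, 120.0]
print(posting_modes(times))
```

A reviewer whose gap sequence is dominated by the short mode, and whose bursts coincide with those of other reviewers, is exactly the co-bursting signal the coupled model is designed to capture.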

3.6 Review Spam Detection via Temporal Pattern Discovery

The proposed approach [11] provides evidence that spam attacks are bursty. The bursts
in a time series can be either positive or negative. The authors in [11] propose a
correlated temporal approach to detect spam that targets singleton review (SR) spam
identification and maps it to a correlated pattern detection problem. The proposed
approach [11] is based on a multidimensional time series anomaly detection algorithm:
a multi-scale time series is constructed, and jointly anomalous statistics are used as
an indicator of spam. The detected statistics involve factors such as the average
rating and the ratio of singleton reviews. An SR spam detection model is then built on
this time series. The algorithm also integrates the longest common subsequence and
curve fitting; both are used to find abnormal sections in each dimension of the time
series.
The authors [11] have introduced a ranking technique that sums the anomalies across
dimensions to detect abnormal sections in the time series. Fluctuations are common in
time series, so the algorithm uses a time window of more than two months to smooth out
noise. If a singleton review spam attack occurs in the time series, the window size is
decreased so that abnormal patterns become more obvious. The constructed
multidimensional time series is then used to cast the task as an abnormally correlated
pattern detection problem. The results of this methodology show that it is quite
effective at identifying singleton review spam attacks. A limitation of this approach
is that the technique is not

applicable to other types of spam, such as SMS and email spam. In addition, the model
has been tested on only a single dataset.
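The longest common subsequence step compares sections of the time series; a standard LCS on symbolized values might be sketched as below. The symbolization into up/down/steady moves is an assumption for illustration, not necessarily the discretization used in [11].

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def symbolize(series):
    """Turn a numeric series into up ('u'), down ('d'), steady ('s') moves."""
    return ["u" if b > a else "d" if b < a else "s"
            for a, b in zip(series, series[1:])]

s1 = symbolize([3.0, 3.5, 3.2, 3.2, 4.0])
s2 = symbolize([2.0, 2.6, 2.4, 2.4, 3.1])
print(lcs_length(s1, s2))  # identical shapes -> full-length match of 4
```

A long common subsequence between two dimensions (e.g., average rating and singleton-review ratio) indicates their anomalies are correlated in shape even if their absolute levels differ, which is the signal the ranking step aggregates.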

3.7 Modelling Review Spam Using Temporal Patterns and Co-bursting Behaviors

This technique [12] is based on a real-life dataset from a review hosting site called
Dianping. The authors [12] discovered that reviewers' posting rates were bimodal and
that the transitions between different states could be used to distinguish spammers
from real reviewers. The proposed technique involves a two-mode labeled hidden Markov
model for the identification of spammers on review websites. The findings show that
this approach can outperform supervised machine learning algorithms. Spammers are
keener on writing reviews in groups, and hence bursts are created in the time series of
reviews. The authors in [12] propose a co-bursting-based approach for identifying
spammers; this framework enables more precise detection of spammers and outperforms
the state of the art discussed in [12].
The authors also note that the bimodal distributions are disparate and were
identified for both review spammers and non-spammers. A limitation of this approach is
that it requires the time stamps of reviews in a dataset; without time stamps, the
approach is not applicable to real-life datasets. An advantage of the algorithm is
that it can be applied in commercial review spam filters.
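The co-bursting signal can be approximated as the overlap between two reviewers' active posting windows. The Jaccard-style overlap below is a hypothetical simplification for illustration; the paper itself uses a coupled hidden Markov model rather than a set overlap.

```python
def co_bursting_score(weeks_a, weeks_b):
    """Fraction of active weeks shared by two reviewers (Jaccard overlap)."""
    a, b = set(weeks_a), set(weeks_b)
    if not a | b:
        return 0.0
    return len(a & b) / len(a | b)

# Weeks in which each reviewer posted; heavy overlap suggests co-bursting.
spammer_1 = [10, 11, 12, 30]
spammer_2 = [10, 11, 12, 31]
print(round(co_bursting_score(spammer_1, spammer_2), 2))  # prints 0.6
```

Pairs with high overlap can be linked in a reviewer network, and densely connected groups in that network are candidate spammer groups.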

4 Discussion

We have compared all the approaches using the metrics of precision, recall, F-score,
and accuracy. The values for each technique have been taken from the articles cited in
Table 1. The comparison is made keeping in view the fact that most of these approaches
have been applied to similar datasets, and it shows that only some of the algorithms in
Table 1 report precision, recall, F-score, and accuracy. The first article in Table 1
uses the Yelp dataset [14]. Yelp [14] is a website that provides reviews of hotels and
restaurants; spammers work in groups to post fake reviews about certain hotels and
restaurants, causing their ratings to decrease. This approach [5] achieved an accuracy
of 90.1 with late spamming, which yielded the best precision and accuracy among all
three types of spamming. The second approach [7] in Table 1 is based on the exploration
of burstiness in reviews for spammer identification and produced its results using LBP
and local observation techniques. This algorithm used the Amazon review dataset [13], a
large-scale dataset covering a wide range of products that includes product ratings,
reviews, and other attributes. This approach achieves an accuracy of 77.6 with LBP and
local observations. The third approach [8] in the table is based on spam detection
using reviewer characteristics and various spam indicators; it also used the Amazon
review dataset [13]. The approach [8] did not use the metric of accuracy to evaluate
its model, and achieved an F-score of 74.9%. The fourth algorithm [6] in the table
makes use of time series and other reviewer traits to detect spam in reviews, also on
the Amazon review dataset [13]. The model achieved an F-score of 86%. This model [6]
did not use any supervised machine learning technique to classify the suspicious set
of reviews as spam or non-spam.
The fifth article [10] in Table 1 is based on a bimodal distribution model used to
detect review spam. This model used Dianping's real-life dataset [15]. Dianping [15]
is a Chinese website that includes reviews of consumer products and retail services.
The Dianping dataset is the single largest dataset with spam and non-spam classes, and
each review is posted by a single individual. There are Yelp datasets [14] with class
labels in the literature, but these are much smaller than the Dianping dataset [15].
The authors in [10] have reasonably argued their choice of dataset because of its large
size and the presence of labels, and the models proposed in [10] outperform existing
models on this large dataset [15]. This paper [10] did not report metrics such as
accuracy, precision, recall, and F-score for its spam detection model. The next
technique [12] in Table 1 used temporal patterns and co-bursting factors to identify
spam, also on Dianping's real-life dataset [15]; temporal features were extracted from
the dataset's time stamps [12]. The authors of this article [12] did not report
precision, recall, F-score, or accuracy for the proposed model. The last technique [11]
in Table 1 highlighted the importance of temporal features in reviews for spam
detection. Temporal patterns were discovered in the reviews of a reseller website [11].
The dataset [11] contained around 408,469 reviews, each identifiable by a unique id.
The authors in [11] used the dataset for suspicious store detection via the
identification of singleton spam attacks; human evaluators validated the results by
reading reviews from all 53 stores and singling out the suspicious ones. This technique
did not employ metrics such as precision, recall, F-score, and accuracy to evaluate its
model. In conclusion, all the approaches in Table 1 use time series based on the
assumption that spammers work in groups when posting spam reviews; their collective
manner of working produces bursts in the time series of reviews, and these bursts can
be captured for spam detection.

5 Conclusion and Future Work

This paper highlighted state-of-the-art methods that use time series for spam
detection in online reviews. It made a critical comparative analysis of the techniques
in the literature and detailed each related article on time-series-based spam
detection. We also provided a summarized overview of all the techniques, the datasets
they used, and the metrics used to evaluate the proposed models. This review can serve
experts as an asset when searching for the state of the art relevant to
time-series-based spam detection. Future work includes proposing a hybrid approach to
time-series-based spam detection; such a model could include more diverse feature
engineering techniques and apply supervised machine learning to the suspicious reviews
filtered by the time series.

References

1. Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the International
Conference on Web Search and Web Data Mining - WSDM 2008 (2008)
2. Li, J., Ott, M., Cardie, C., Hovy, E.: Towards a general rule for identifying deceptive opinion
spam. In: Proceedings of the 52nd Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers) (2014)
3. Dewang, R.K., Singh, P., Singh, A.K.: Finding of review spam through “Corleone, review
genre, writing style and review text detail features”. In: Proceedings of the Second
International Conference on Information and Communication Technology for Competitive
Strategies - ICTCS 2016 (2016)
4. Mukherjee, A., Kumar, A., Lin, B., Wang, J., Hsu, M., Castellanos, M.: Spotting opinion
spammers using behavioral footprints. In: Proceedings of the 19th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 632–640 (2013)
5. Kc, S., Mukherjee, A.: On the temporal dynamics of opinion spamming. In: Proceedings of
the 25th International Conference on World Wide Web - WWW 2016 (2016)
6. Heydari, A., Tavakoli, M., Salim, N.: Detection of fake opinions using time series. Expert
Syst. Appl. 58, 83–92 (2016)
7. Fei, G., Mukherjee, A., Liu, B., Hsu, M., Castellanos, M., Ghosh, R.: Exploiting burstiness
in reviews for review spammer detection. In: Kiciman, E., et al. (eds.) ICWSM. The AAAI
Press (2013)
8. Dematis, I., Karapistoli, E., Vakali, A.: Fake review detection via exploitation of spam
indicators and reviewer behavior characteristics. In: SOFSEM 2018: Theory and Practice of
Computer Science, Lecture Notes in Computer Science, pp. 581–595 (2017)
9. Li, J., Pedrycz, W., Jamal, I.: Multivariate time series anomaly detection: a framework of
hidden Markov models. Appl. Soft Comput. 60, 229–240 (2017)
10. Li, H., Fei, G., Wang, S., Liu, B., Shao, W., Mukherjee, A., Shao, J.: Bimodal distribution
and co-bursting in review spam detection. In: Proceedings of the 26th International
Conference on World Wide Web - WWW 2017 (2017)
11. Xie, S., Wang, G., Lin, S., Yu, P.S.: Review spam detection via temporal pattern discovery.
In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining - KDD 2012 (2012)
12. Li, H., Fei, G., Wang, S., Liu, B., Shao, W., Mukherjee, A.: Modeling review spam using
temporal patterns and co-bursting behaviors. arXiv preprint arXiv:1611.06625 (2016)
13. Amazon: Amazon (2018). http://snap.stanford.edu/data/amazon/productGraph/. Accessed 4
Feb 2018
14. Yelp: Yelp (2017). http://www.yelp.com. Accessed 6 Dec 2017
15. Dianping Chinese Review dataset. http://liu.cs.uic.edu/download/dianping/. Accessed 6 Apr
2018
16. Hamilton, J.D.: Time Series Analysis, vol. 2. Princeton University Press, Princeton (1994)
CNN with Limit Order Book Data
for Stock Price Prediction

Jaime Niño¹, German Hernandez¹, Andrés Arévalo¹, Diego Leon², and Javier Sandoval²

¹ Universidad Nacional de Colombia, Bogotá, Colombia
{jhninop,gjhernandezp,ararevalom}@unal.edu.co
² Universidad Externado de Colombia, Bogotá, Colombia
{diego.leon,javier.sandoval}@uexternado.edu.co

Abstract. This work presents a short-term forecasting method for Financial Time Series (FTS). Most approaches to FTS modeling work directly with prices, since transaction data is more readily available. In this work we instead use Limit Order Book (LOB) data, which records all trade intentions from market participants and therefore provides richer information for prediction. We use Deep Convolutional Neural Networks (CNN), which excel at pattern recognition on images. To apply them, we build an image-like representation of LOB and transaction data that is fed into the CNN, so that it can recognize hidden patterns and classify short-term FTS movements. We present a step-by-step methodology for encoding financial time series into this image-like representation. Results show a Directional Accuracy (DA) between 63% and 66%, with the additional advantages of reducing model parameters and making inputs time invariant.

Keywords: Short-term forecasting · Deep Learning · Convolutional Neural Networks · Limit Order Book · Pattern recognition

1 Introduction

Finance has become a highly sophisticated scientific discipline that depends on
innovations from computer science to analyze huge flows of data in real time.
Finance offers nonlinear relationships and large data sets on which Machine
Learning (ML) flourishes, but these also impose tremendous challenges when
applying computational techniques, due to data noisiness and nonlinearities,
among other characteristics of financial systems. The literature is vast in reporting
applications of machine learning methods to FTS modeling [4,6,9,11,15];
works include Artificial Neural Networks and Support Vector Machines, among
others. Lately, Deep Learning has emerged as a superior ML technique for a
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 444–457, 2019.
https://doi.org/10.1007/978-3-030-02686-8_34

wide variety of fields, including Image Recognition, Audio Classification, Natural
Language Processing, as well as FTS Forecasting and Algorithmic Trading,
among others. In this work, we use a Convolutional Neural Network to predict
movements of FTS, working with both LOB and transaction (tick) data. LOB
data contains all traders' intentions to negotiate an asset at a particular price
and quantity at a certain time t. LOB information is richer than transaction
data, which only records the prices and quantities exchanged at a certain time t.
In order to use a CNN, we represent both LOB and tick data as images. Results
are very competitive when compared to other DL approaches reported in
[1,3,7,16,20], with the advantage of using the same trained model for different
assets.
This paper continues as follows: Sect. 2 explains how LOB and tick data are
transformed into images, Sect. 3 gives a brief summary of CNNs, Sect. 4 explains
the methodology to process and classify the image data, Sect. 5 shows results and
Sect. 6 gives final remarks, conclusions, and further work opportunities.

2 Limit Order Book and Tick Data Transformation


2.1 Definitions
Limit Order Book. Order book data records market agents' buy/sell intentions;
it includes a time-stamp, a quantity and a price to buy/sell. This data is
known as the Limit Order Book (LOB). Formally, an order x = (p, q, t, s) sent at
time t_x with price p_x, quantity q_x (number of shares) and side s_x (buy/sell)
is a commitment to buy/sell up to q_x units of an asset at price p_x. Orders are
sorted by arrival time t and quoted price p; sell orders have higher prices than
buy orders [5,8,18]. Some other useful concepts include [8,18]:
– The spread is the difference between the best sell and best buy price.
– The bid price is the highest price among all active buy orders at time t; conversely,
the ask price is the lowest price among all active sell orders at time t. Both are
called the best quotes.
– The LOB L(t) is the set of all active orders at time t.
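These definitions map directly to code. The following is a minimal sketch of our own (the function name and the order-tuple layout are illustrative assumptions, not from the paper) that extracts the best quotes and the spread from a set of active orders:

```python
def best_quotes(active_orders):
    """Bid, ask and spread from a set of active orders.

    active_orders: list of (price, qty, side) tuples, side in {"buy", "sell"}.
    Bid  = highest price among active buy orders.
    Ask  = lowest price among active sell orders.
    Spread = Ask - Bid (definitions from Section 2.1).
    """
    buys = [price for price, qty, side in active_orders if side == "buy"]
    sells = [price for price, qty, side in active_orders if side == "sell"]
    bid, ask = max(buys), min(sells)
    return bid, ask, ask - bid
```

Note that in a well-formed book the spread is positive, since sell orders quote higher prices than buy orders.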
The dynamics of the LOB are complex [5,8], since it reflects interactions among
market agents with different points of view and different trading strategies. For a
particular time t, the LOB concepts are illustrated in Fig. 1.
When all recorded intentions are joined, they can be visualized as an image (Fig. 2).
In this representation, the y-axis represents prices, the x-axis time, and each
point a quantity offered for trade; the darker the color, the larger the quantity
q at a given price p. In [19], the authors used this graphic representation to cluster
LOB patterns in order to build a classifier. Building on that work, LOB data can
be seen as a list of price–quantity tuples at which agents expect to negotiate.
Numerically, this representation can be seen as a multivariate FTS.¹

¹ Some considerations apply, particularly related to the dimensionality of the FTS.

Fig. 1. LOB snapshot, taken from [8].

LOB Representation. For a set of successive timestamps, LOB data can be
represented as a matrix-like object, where column labels are timestamps, row
labels are prices, and each cell holds the number of shares to bid/ask. Each cell
contains a quantity q with subindices for side s, time t and price line p. The
order side can be either ask (a) or bid (b). Because there are order imbalances,
the price-line subindices are k for the ask side and j for the bid side (Table 1).

Table 1. LOB matrix representation

                 t_0         t_1  ...  t_n
AskPrice_k       q_{a,0,k}   ...  ...  q_{a,n,k}
AskPrice_{k-1}   q_{a,0,k-1} ...  ...  q_{a,n,k-1}
...              ...         ...  ...  ...
AskPrice_0       q_{a,0,0}   ...  ...  q_{a,n,0}
BidPrice_0       q_{b,0,0}   ...  ...  q_{b,n,0}
...              ...         ...  ...  ...
BidPrice_{j-1}   q_{b,0,j-1} ...  ...  q_{b,n,j-1}
BidPrice_j       q_{b,0,j}   ...  ...  q_{b,n,j}

Normalizing each q_{s,t,i} into the range 0–255 produces a LOB gray-scale image.
However, there is much more information in the LOB data. Because each order is
recorded individually and sorted by arrival time, it is possible to aggregate volumes
at the same price; by doing so, one obtains how many different orders
(quotes) are placed at that price. Formally, for each unique price p we add all
quantities q_k, where q = [q_1, q_2, ..., q_m], m being the last entered order at price p.
This information is important because having many distinct agents interested
at one particular price is different from having just a few. Under real market
conditions, this goes hand in hand with how much volume (quantity) of the asset
is available at that particular price p. In other words, it is important to have
some sense of the distribution: a lot of volume concentrated in a single participant
is different from the same volume distributed across many. To introduce this
information into our representation, we use max_{p_k}(q) for each unique price p
at line k, signaling a sense of the volume distribution.
As a result, we represent LOB data in a 4-channel form, which can be seen
as an RGBA image (Fig. 2), where:

– the R channel holds only ask volumes q_a, 0 otherwise;
– the G channel holds only bid volumes q_b, 0 otherwise;
– the B channel holds the total number of orders placed at a unique price p;
– the A channel represents the volume distribution for a unique price p,
  taking max_{p_k}(q).
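As an illustration of this encoding, the sketch below builds one column of the 4-channel LOB image from the individual orders resting at each price line. The function name, the data layout, and the exact stacking of ask rows above bid rows are our own assumptions; the paper only specifies what each channel holds:

```python
import numpy as np

def lob_column_to_rgba(ask_orders, bid_orders, depth=10):
    """Encode one LOB timestamp as a (2*depth, 4) RGBA column.

    ask_orders / bid_orders: lists (one per price line, best quote first)
    of the individual order quantities resting at that price.
    Channel layout follows the paper:
      R = total ask volume, G = total bid volume,
      B = number of orders at the price, A = largest single order (max_p(q)).
    """
    column = np.zeros((2 * depth, 4))
    for k, orders in enumerate(ask_orders[:depth]):
        row = depth - 1 - k                  # asks stacked above the bids
        column[row, 0] = sum(orders)         # R: total ask volume q_a
        column[row, 2] = len(orders)         # B: order count at this price
        column[row, 3] = max(orders)         # A: volume-distribution proxy
    for j, orders in enumerate(bid_orders[:depth]):
        row = depth + j
        column[row, 1] = sum(orders)         # G: total bid volume q_b
        column[row, 2] = len(orders)
        column[row, 3] = max(orders)
    return column
```

Stacking one such column per timestamp and rescaling each channel to 0–255 yields the RGBA image of Fig. 2.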

Fig. 2. LOB as image, taken from [19].

Tick Data. Tick data records transactions, that is, the prices and quantities
exchanged for a particular asset. Formally, a transaction occurs when at time
t the bid price equals the ask price. At this point, a transaction T = (p, q, t)
occurs, where p_T is the price, q_T is the quantity of shares exchanged and t_T is the
transaction time-stamp [18]. Tick data is a univariate time series.²

² Bivariate if volumes are included.

Tick Data Graphical Representation. As mentioned before, tick data is the
most widely used data when modeling FTS, because it is easier to obtain.
LOB data is more difficult to get and usually costs a lot, not only in money
but also in storage. Transactions are heavily influenced by the intentions
recorded in the LOB, but they lack its richness. Nevertheless, we expect that,
in conjunction with the LOB, they will yield better results. In order
to homogenize inputs, it is necessary to transform tick data into a matrix-like
representation. In [22], the authors show a step-by-step methodology that transforms
a univariate time series into an image representation. This transformation, called
the Gramian Angular Field (GAF), consists of the following steps:³
– Normalize the time series to $[-1, 1]$.
– Convert the series from Cartesian to polar coordinates:
$$\phi = \arccos(x_t), \qquad r = \frac{t_i}{N}, \quad t_i \in \mathbb{N} \tag{1}$$
– Deduce the GAF matrix, defined as:
$$G = \begin{bmatrix} \langle x_1, x_1 \rangle & \cdots & \langle x_1, x_n \rangle \\ \langle x_2, x_1 \rangle & \cdots & \langle x_2, x_n \rangle \\ \vdots & \ddots & \vdots \\ \langle x_n, x_1 \rangle & \cdots & \langle x_n, x_n \rangle \end{bmatrix},$$
where $\langle x, y \rangle = x \cdot y - \sqrt{1 - x^2} \cdot \sqrt{1 - y^2}$.
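These steps can be sketched in a few lines of NumPy. This is our own minimal implementation of the GAF transform described in [22], with the standard min-max mapping used for the rescaling step:

```python
import numpy as np

def gramian_angular_field(series):
    """Gramian Angular Field of a 1-D series, following Wang & Oates [22].

    Steps: rescale to [-1, 1], map to polar angles phi = arccos(x), then
    build G[i, j] = cos(phi_i + phi_j)
                  = x_i * x_j - sqrt(1 - x_i^2) * sqrt(1 - x_j^2).
    """
    x = np.asarray(series, dtype=float)
    # min-max rescale to [-1, 1] so that arccos is well defined
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    # clip guards against tiny negative values from floating-point error
    sin_part = np.sqrt(np.clip(1.0 - x ** 2, 0.0, None))
    return np.outer(x, x) - np.outer(sin_part, sin_part)
```

The matrix is symmetric, and its intensity peaks mark extremes of the rescaled signal, which is what makes the representation useful for pattern recognition.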
The authors in [22] used this transformation for non-financial time series. In
this paper, we apply the same general steps to obtain a graphical version of the
tick data, as illustrated in Table 2.
One advantage of this transformation is that it marks peaks of the input signal
through intensity levels (Table 2). This is useful for pattern recognition because
it helps to differentiate price variations within the original signal. On the other
hand, the transformed input can be rolled back to the original signal [22]. We
expect that in this new space patterns will be easier to identify, since CNNs'
learning capabilities have proven good in frequency spaces. In fact, in a
previous work we showed how a wavelet transformation improves results over a
pure time-space approach [1].⁴

3 Deep Learning - Convolutional Neural Networks


The concept of Deep Learning (DL) was adopted from Neuroscience [13], where
the seminal authors [17] proposed a novel account of how our visual cortex processes
data coming in through the visual system using a layered representation, starting
at the retina all the way up to the visual cortex. Their proposal consisted of
making sparse representations of the input data in order to obtain an appropriate
representation. In other words, any instance of data can be reconstructed as a
linear combination of components from sparse representations of the original
data, or more complex representations of the data can be built at each layer by
combining the representations of the previous layer [13].
³ For full details please refer to [22].
⁴ We used other DL topologies.

Table 2. Original tick data vs. image representation of tick data.
[The table cells contain a tick-data line chart and its corresponding GAF image; images not reproduced here.]

This development became computationally feasible only in 2006, when the
seminal authors of [10] proposed a novel unsupervised learning algorithm to train
deep architectures consisting of Restricted Boltzmann Machines (RBM). This
model was capable of building complex representations of data at deeper layers
by capturing sparse representations from the previous ones. At the time, this
algorithm won an image classification contest, which established the introduction
of DL [13].
Since its emergence, DL has facilitated the successful application of different
neural network topologies in different fields, because DL tackles the issue of
vanishing gradients when training multilayer networks. As a result, different
network topologies are being used with DL, including the traditional Multilayer
Perceptron (MLP), Recurrent Neural Networks (RNN), Long Short-Term
Memory (LSTM) networks, Deep Belief Networks (DBN) and Convolutional
Neural Networks (CNN). Each topology has its own particularities. CNNs have
been used for image processing and classification tasks. A CNN is a variation
of a Multilayer Perceptron, which means it is a feed-forward network; however,
it requires less processing than an MLP, due to the mechanism used to process
input data. Moreover, a CNN's main characteristic is space invariance, which is
due to the convolution operator that transforms the data inputs.
CNNs are biologically inspired, trying to emulate the mammalian visual cortex,
where neural cells are specialized to distinguish particular features. The
building blocks of a CNN architecture perform this feature detection by
activating or de-activating sets of neurons. Since market agents' decisions are
mostly made from visual analysis of price changes and events in the LOB, we
expect that an algorithm can learn such patterns in order to help trigger trading
decisions. In fact, [18,19] showed that a visual dictionary could be constructed
from LOB data and that this dictionary had predictive capabilities.
The two main building blocks of a CNN are the convolution layer and the pooling
layer, which in conjunction with a dense layer complete a CNN.

Convolution Layer. It applies the convolution operator to the input matrix;
in other words, it applies a kernel to filter the data input. Depending on
the parameters used, it can reduce or maintain the input's dimensionality. The
purpose of convolving is to identify edges, that is, to identify or separate features
that can later be used to construct more complex representations in deeper layers.

Pooling Layer. It is a local operator that takes the convolution output and maps
subregions into a single number. The pooling operator can extract the maximum
value of the mapped subregion (max pooling) or its average value (average
pooling). In other words, it subsamples the output of the convolution layer.
Usually both layers are treated as one layer in the CNN topology; however,
it is not necessary to pair one convolution layer with one pooling layer.
Additionally, CNN topologies usually include several convolution-plus-pooling
layers, so networks extract simple features in the first layers and, by combining
those, learn more complex features in deeper layers.

Dense Layer. Finally, the deepest convolutional layer is connected to a dense
(fully connected) layer, from which the network obtains its outputs. As mentioned
before, a CNN topology may have one or more dense layers.

AlexNet and LeNet: Well-Known CNN Architectures. LeNet-5 is a CNN
created by [14], aimed at hand-written digit recognition. It consists of 7 layers
(Input, Conv + Pool, Conv + Pool, Dense + Output). At the time, computing
resources were scarce, which constrained the technique. However, as computing
resources improved in performance and cost, training this particular architecture
became easy, and it has become a baseline in image recognition contests.
AlexNet was created in 2012 by [12] and became famous for reducing the
classification error in an image recognition contest to 15.3% at the time;
nowadays, classification error is much lower. As a pioneer, AlexNet has become
a baseline architecture like LeNet. It took advantage of hardware developments,
particularly parallel processing through Graphics Processing Units (GPUs). It
has more filters than LeNet as well as stacked convolution layers; as a result, it
is deeper, with more parameters.

We decided to compare different CNN topologies in terms of DA, in order to
analyze the advantages and disadvantages of each one, including a comparison
with a self-created CNN topology.
The next section gives a step-by-step explanation of our experiment.

4 Classifying Financial Time Series with CNN


4.1 Why a CNN for FTS Classification
– Firstly, DL models have demonstrated great effectiveness in both classification
and prediction tasks in different domains such as video analysis, audio
recognition, text analysis and image processing. Their superiority is due to
the fact that they are able to learn useful representations from raw data,
avoiding the local-minimum issue of ANNs, by learning in a layered way using
a combination of supervised and unsupervised learning to adjust the weights W.
– Secondly, DL applications in computational finance are limited [2,3,7,21,23]
and, to our knowledge, there is no publication applying CNNs to FTS,
particularly using LOB data for short-term forecasting.
– Thirdly, CNNs are good for pattern recognition, and real traders have told us
that they try to identify patterns by following buy/sell intentions in numeric
form. In previous work, [18] identified volume-barrier patterns to translate
them into trading decisions, and [19] identified visual patterns and clustered
them into a bag-of-words model to predict market movements. As a result of
these works, we decided to extend them and use a technique better suited
for pattern recognition, such as a CNN on an image-like representation of
market data.
– Finally, by changing the input space (from time to frequency), we expect
that the CNN will recognize patterns more effectively; indeed, the authors in
[1] improved their results by using wavelets to represent high-frequency data
of several financial assets. Even though our images are not natural ones, we
expect the CNN's layers to distinguish simple frequency changes (edges) at
lower layers in order to identify more complex patterns at deeper ones.

4.2 Experimental Setup


– Data acquisition: The original dataset is composed of LOB and transaction data
for 12 stocks listed on the Colombian Stock Market, from Feb 16, 2016 to Dec
28, 2017. The dataset includes 184,450 LOB files and 612,559 ticks (transactions),
totaling 590 MB on disk.⁵
– Data preparation: For each stock, data normalization was conducted, taking
into account considerations that include the handling of missing orders at some
price levels in the LOB data, liquidity constraints and LOB events. Details
are given in the next subsections.

⁵ Data provided by DataDrivenMarket Corporation.

– Data transformation: For each stock, both LOB data and tick data are
transformed into an image-like representation, following the methodology
previously explained.
– CNN modeling: We chose a base CNN architecture and trained and tested it
with the transformed data.
– Model comparison across different CNN architectures: We use two further
CNNs, which mimic the standard LeNet and AlexNet architectures, in order to
compare against the proposed model.
– CNN comparison against other DL topologies: We compare the results
obtained in this work against others that addressed similar problems
(short-term forecasting) with different Deep Learning topologies (RNN, LSTM,
Multilayer Perceptron, DBN).

The following paragraphs provide further details of our experimental setup.

4.3 Data Preparation


Data Normalization. For each stock, prices, volumes (quantities) and the
number of orders at the same price were normalized into (0, 1]. Given that
LOB data may have price levels with no demand/offer, minimum values
were offset by a small factor so that they take a small value above
zero. Since empty cells in the LOB have the value 0, we can thus
differentiate a missing LOB entry from an entry with a very low volume or just
one order at a certain price p.
Normalizing by stock equalizes magnitudes across all stock data, regardless
of nominal prices or volumes. In other words, we homogenize the image
representation along several dimensions: prices, quantities and number of orders.
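A sketch of this normalization follows. The size of the floor value `eps` is our own choice; the paper only says "a small factor":

```python
import numpy as np

def normalize_with_floor(values, eps=1e-3):
    """Normalize positive LOB quantities into (0, 1], keeping true zeros at 0.

    Empty LOB cells stay exactly 0, while the smallest real entry is floored
    at eps, so a "no order" cell remains distinguishable from a cell with a
    very low volume or a single order.
    """
    v = np.asarray(values, dtype=float)
    out = np.zeros_like(v)
    mask = v > 0
    if mask.any():
        out[mask] = np.maximum(v[mask] / v[mask].max(), eps)
    return out
```

Applying this per stock (and per channel: prices, quantities, order counts) yields comparable image magnitudes regardless of nominal values.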

Handling of Liquidity Constraints. Given that the Colombian market
is not highly liquid, for each stock we only took LOB data with enough
entries in a single trading day. That is, we took trading days with more
than 100 LOB files per stock, which is equivalent to having at least one
LOB event every three and a half minutes on average, for any given stock. For
classification purposes, mixing low-liquidity days with high-liquidity days would
not matter. However, for practical purposes, liquidity constraints are very
important in financial markets, because spreads may widen considerably as
liquidity drops. That is why we chose samples corresponding to highly liquid days.

Handling of LOB Events. We took an event-based approach, analyzing a
fixed number of LOB events (10 in this case). This means that the
LOB matrix explained in Sect. 2, Table 1, was partitioned into fixed segments of
10 events, and we took all the ticks that occurred between those 10 LOB records
to create the corresponding tick-data image. Figure 3 illustrates this procedure.
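The windowing can be sketched as follows. The data layout and function name are illustrative, not from the paper:

```python
def window_lob_events(lob_events, ticks, window=10):
    """Partition LOB events into fixed windows of `window` events and attach
    the ticks that occurred inside each window's time span.

    lob_events: list of (timestamp, snapshot) pairs sorted by timestamp.
    ticks:      list of (timestamp, price) pairs sorted by timestamp.
    Returns a list of (lob_window, tick_window) pairs; a trailing partial
    window is dropped.
    """
    pairs = []
    for start in range(0, len(lob_events) - window + 1, window):
        chunk = lob_events[start:start + window]
        t0, t1 = chunk[0][0], chunk[-1][0]
        tick_chunk = [tk for tk in ticks if t0 <= tk[0] <= t1]
        pairs.append((chunk, tick_chunk))
    return pairs
```

Each pair then yields one LOB image and one tick (GAF) image for the same market interval.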

Fig. 3. LOB events.

Handling of LOB Depth. LOB data may have many different lines or
prices on both sides (bid/ask). Depth varies widely depending on market
conditions; there is not always a symmetric number of lines on each book side.
We decided to work with LOB data of 10 lines of depth, that is, the first 10
different prices on each side, starting from the best quotes and moving down
(bid) or up (ask).

Additional Considerations. It is important to note the following:

– Price dynamics can produce a price matrix with more than 20 rows (prices)
(cf. Fig. 1). In other words, each LOB 10-event image has an unequal height.
Table 3 shows this graphically.
– Prices with no volume have the value 0, which after normalization is always
distinct from the lowest actual volume, as mentioned before.
– To make LOB image sizes homogeneous for modeling purposes, we resize each
image to a width of 10 and a height of 40. Individual price matrices have
different heights because of price dynamics; in other words, there is a
different set of prices at each time t, depending on traders' intentions.

4.4 CNN Modeling


Data Input. Four-channel images are used, one set for LOB data and another for
tick data. A five-dimensional tensor is used for data input, with size [n, 2, 10, 40, 4].
The first dimension is the number of samples, the second the number of image
categories (LOB/tick), and the other three the image dimensions (width, height,
channels).
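In NumPy terms (a sketch; the sample count is illustrative), the input tensor and its two image categories look like:

```python
import numpy as np

n = 8  # number of samples (illustrative)
# [samples, image category (LOB / tick), width, height, channels]
batch = np.zeros((n, 2, 10, 40, 4), dtype=np.float32)

lob_images = batch[:, 0]   # shape (n, 10, 40, 4)
tick_images = batch[:, 1]  # shape (n, 10, 40, 4)
```

Slicing along the second axis separates the LOB images from the tick (GAF) images while keeping them paired per sample.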

Table 3. LOB data images. [The table cells contain example 10-event LOB images of varying height; images not reproduced here.]

Data Labeling. Data is classified into three classes:

– Class 0: upward movement
– Class 1: downward movement
– Class 2: no trending movement

Class assignment is based on how the following set of ticks behaves after a
window of 10 LOB events. A tick analysis was done to obtain the thresholds;
Table 4 lists the rules.

Table 4. Three-class rules

Price direction      Rule                                                               Class
Upward movement      Last tick price above +0.03% vs. last tick of the previous window  0
Downward movement    Last tick price below −0.03% vs. last tick of the previous window  1
Flat movement        Otherwise                                                          2
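The rules above can be expressed as a small labeling function. This is a sketch: the paper gives no code, and the relative-change formula is our reading of the 0.03% rule:

```python
def label_window(prev_last_tick, last_tick, threshold=0.0003):
    """Three-class label: compare the window's last tick price against the
    last tick of the previous window, with a +/-0.03% flat band."""
    change = last_tick / prev_last_tick - 1.0
    if change > threshold:
        return 0   # upward movement
    if change < -threshold:
        return 1   # downward movement
    return 2       # flat / no trending movement
```

Applying this per 10-event window turns the image pairs into a labeled classification dataset.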

CNN Architecture. We use a standard CNN architecture consisting of
(Input + Conv + Pool + Conv + Pool + Dense + Dropout); input images are
10 × 40. We compare it to AlexNet and LeNet, for which we made some
modifications to the input image size (20 × 40) as well as to filter sizes in some
convolution layers, particularly for the AlexNet-type configuration.
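To see how the 10 × 40 input flows through a Conv + Pool + Conv + Pool stack, a small output-shape calculator is useful. The kernel, stride and pool sizes here are illustrative assumptions; the paper does not list them:

```python
def conv2d_shape(h, w, kernel, stride=1, padding="same"):
    """Output spatial size of a 2D convolution with a square kernel."""
    if padding == "same":
        return (-(-h // stride), -(-w // stride))  # ceil division
    return ((h - kernel) // stride + 1, (w - kernel) // stride + 1)

def pool2d_shape(h, w, pool=2):
    """Output spatial size after non-overlapping pooling."""
    return (h // pool, w // pool)

# Proposed stack: Input -> Conv -> Pool -> Conv -> Pool -> Dense
h, w = 10, 40
h, w = conv2d_shape(h, w, kernel=3, padding="same")  # -> (10, 40)
h, w = pool2d_shape(h, w)                            # -> (5, 20)
h, w = conv2d_shape(h, w, kernel=3, padding="same")  # -> (5, 20)
h, w = pool2d_shape(h, w)                            # -> (2, 10)
flat = h * w  # units per filter feeding the dense layer
```

With these choices the dense layer sees 2 × 10 = 20 units per filter, which illustrates why the small fixed 10 × 40 input keeps the parameter count low compared to the deeper AlexNet-type configuration.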
We set up two different experiments, one using LOB data only and another using
both LOB and tick data, meaning that the second one had more input information.
We used TensorFlow.
Training considerations included dropout at 40% and a batch size of 100.
The dataset of 67,348 samples was split into 90% for training and 10% for testing.
Moreover, a similar setup was built taking into account only LOB data, i.e., a
whole set of experiments working with 2D convolutions.

5 Results
5.1 Model Comparison Across Different CNN Architectures
The CNNs were used to classify the three target classes (up, down, flat). Table 5
shows the performance of the three different architectures on the testing samples.
The combination of LOB and tick data as model features clearly increased model
accuracy, achieving values greater than 65%. LeNet* and AlexNet* performed
better than the proposed topology, but they require much more computational
power for training, which could become a serious problem in a real high-frequency
trading strategy. On the other hand, the proposed CNN topology sacrifices some
performance (less than 1%) but is simpler and easier to train. This property is
useful in a real environment, since it allows the model to be retrained and
redeployed quickly.

Table 5. Result summary for different architectures

Experiment   Topology            Data input   Performance
2D-LeNet     LeNet*              LOB          59.56%
2D-AlexNet   AlexNet*            LOB          63.15%
2D-Own       Other CNN topology  LOB          58.23%
3D-LeNet     LeNet*              LOB+Tick     66.09%
3D-AlexNet   AlexNet*            LOB+Tick     66.83%
3D-Own       Other CNN topology  LOB+Tick     65.31%

5.2 Model Comparison Against Other DL Topologies


As observed in Table 6, the proposed model is very competitive, with the
advantage that a single model runs for several assets.

Table 6. Comparison against other DL topologies

DL Topology Classes Data used Directional accuracy


Multilayer Perceptron [1] 2 1-Stock, tick data 66%
Deep Belief Network [16] 2 1-Stock, LOB + Tick data 57%
Proposed Model (CNN) 3 12-stocks, LOB + Tick data 65.31%

6 Conclusion and Future Research


CNNs for FTS prediction worked well. DA shows that the results are very
competitive, in fact better than other approaches tested before [1,16,19]. As
expected, performance improves when LOB and tick data are used in
conjunction, and the main reason is simple: there is more market information.
The image-like representation is useful and could even be extended; that is, it is
possible to have more channels in the original input image (matrix).

Perceived advantages:

– One network for multiple assets. This is not usually the case, given that
each asset has its own dynamics. The image-like representation homogenizes
inputs, producing images that represent market information and letting the
network find patterns across the whole image set, regardless of the asset.
– Extended lifetime of the trained model. In financial applications frequent
retraining is the norm. This approach extends the lifetime of the trained
model due to the time invariance associated with images.
Perceived disadvantages:

– It is a data-intensive technique. Results improve as more images become
available for training.
– Training times are long, particularly for complex architectures such as AlexNet,
which uses several channels and several layers.
– Preprocessing can be tricky; there are many details to take into account
when transforming raw data.

In our experience, we suggest a trade-off analysis between training times and the
lifetime of the trained model. For real implementations with an expected lifetime
ranging from 5 minutes to a couple of hours, we think the approach is hugely
advantageous. This model should be tested with data from more liquid markets,
to check preprocessing times as well as performance. We think there are many
possibilities for improvement, including the use of combined approaches (LSTM
and CNN) and coding more information into more channels, for example,
technical information.

References
1. Arévalo, A., Nino, J., Hernández, G., Sandoval, J.: High-Frequency Trading Strat-
egy Based on Deep Neural Networks, pp. 424–436 (2016). https://doi.org/10.1007/
978-3-319-42297-8 40
2. Arnold, L., Rebecchi, S., Chevallier, S., Paugam-Moisy, H.: An introduction to
deep learning. In: ESANN (2011). https://www.elen.ucl.ac.be/Proceedings/esann/
esannpdf/es2011-4.pdf
3. Chao, J., Shen, F., Zhao, J.: Forecasting exchange rate with deep belief networks.
In: The 2011 International Joint Conference on Neural Networks, pp. 1259–1266.
IEEE (2011). http://ieeexplore.ieee.org/articleDetails.jsp?arnumber=6033368,
http://ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=6033368
4. Chen, M., Ebert, D., Hagen, H., Laramee, R.S., van Liere, R., Ma, K.L., Ribarsky,
W., Scheuermann, G., Silver, D.: Data, information, and knowledge in visualiza-
tion. IEEE Comput. Graph. Appl. 29(1), 12–19 (2009)
5. Cont, R., Stoikov, S., Talreja, R.: A stochastic model for order book dynamics.
Oper. Res. 58, 549–563 (2010)

6. De Goijer, J., Hyndman, R.: 25 years of time series forecasting. J. Forecast. 22,
443–473 (2006)
7. Ding, X., Zhang, Y., Liu, T., Duan, J.: Deep learning for event-driven stock pre-
diction. In: Proceedings of the Twenty-Fourth International Joint Conference on
Artificial Intelligence (ICJAI) (2015). http://ijcai.org/papers15/Papers/IJCAI15-
329.pdf
8. Gould, M.E.A.: Limit order books. Quant. Financ. 13, 42 (2010)
9. Hamid, S., Habib, A.: Financial forecasting with neural networks. Acad. Acc.
Financ. Stud. J. 18, 37–56 (2014)
10. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief
nets. Neural Comput. 18(7), 1527–1554 (2006). https://doi.org/10.1162/neco.2006.18.7.1527.
PMID: 16764513
11. Huang, G.E.A.: Trends in extreme learning machines: a review. Neural Netw. 61,
32–48 (2015)
12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. In: Proceedings of the 25th International Conference on
Neural Information Processing Systems, NIPS 2012, vol. 1, pp. 1097–1105. Curran
Associates Inc., USA (2012). http://dl.acm.org/citation.cfm?id=2999134.2999257
13. Laserson, J.: From neural networks to deep learning: zeroing in on the human
brain. XRDS 18(1), 29–34 (2011). https://doi.org/10.1145/2000775.2000787
14. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
15. Längkvist, M., Karlsson, L., Loutfi, A.: A review of unsupervised feature learn-
ing and deep learning for time-series modeling. Pattern Recognit. Lett. 42, 11–24
(2014). http://www.sciencedirect.com/science/article/pii/S0167865514000221
16. Nino, J., Hernandez, G.: Price direction prediction on high frequency data using
deep belief networks. In: Applied Computer Sciences in Engineering, pp. 74–83.
Springer (2016)
17. Olshausen, B.A., Field, D.J.: Natural image statistics and efficient coding.
Network Comput. Neural Syst. 7(2), 333–339 (1996). https://doi.org/10.1088/0954-898X_7_2_014.
PMID: 16754394
18. Sandoval, J.: Empirical shape function of the limit-order books of the USD/COP
spot market. In: ODEON, p. 7 (2013). https://ssrn.com/abstract=2408087
19. Sandoval, J., Nino, J., Hernandez, G., Cruz, A.: Detecting informative pat-
terns in financial market trends based on visual analysis. Procedia Com-
put. Sci. 80, 752–761 (2016). http://www.sciencedirect.com/science/article/pii/
S1877050916308407. International Conference on Computational Science 2016,
ICCS 2016, 6-8 June 2016, San Diego, California, USA
20. Shen, F., Chao, J., Zhao, J.: Forecasting exchange rate using deep belief networks
and conjugate gradient method. Neurocomput. 167, 243–253 (2015). https://doi.
org/10.1016/j.neucom.2015.04.071
21. Takeuchi, L., Lee, Y.: Applying Deep Learning to Enhance Momentum Trading
Strategies in Stocks (2013)
22. Wang, Z., Oates, T.: Encoding Time Series as Images for Visual Inspection and
Classification Using Tiled Convolutional Neural Networks (2015). https://pdfs.
semanticscholar.org/32e7/b2ddc781b571fa023c205753a803565543e7.pdf
23. Yeh, S., Wang, C., Tsai, M.: Corporate Default Prediction via Deep Learning
(2014). http://teacher.utaipei.edu.tw/cjwang/slides/ISF2014.pdf
Implementing Clustering and Classification Approaches for Big Data with MATLAB

Katrin Pitz and Reiner Anderl

Technische Universität Darmstadt, 64283 Darmstadt, Germany
pitz@dik.tu-darmstadt.de

Abstract. Data sets grow rapidly, driven by increasing storage capacities as well as by the wish to equip more and more devices with sensors and connectivity. In mechanical engineering, Big Data offers new possibilities to gain knowledge from existing data for product design, manufacturing, maintenance and failure prevention. Typical interests when analyzing Big Data are the identification of clusters, the assignment to classes or the development of regression models for prediction. This paper assesses various Big Data approaches and chooses adequate clustering and classification solutions for a data set of simulated jet engine signals and life spans. These solutions include k-means clustering, linear discriminant analysis and neural networks. MATLAB is chosen as the programming environment for implementation because of its dissemination in engineering disciplines. The suitability of MATLAB as a tool for Big Data analysis is to be evaluated. The results of all applied clustering and classification approaches are discussed and prospects for further adaption and transferability to other scenarios are pointed out.

Keywords: Big Data · Clustering · Classification · K-means · Discriminant analysis · Neural networks · MATLAB

1 Introduction

When it comes to Big Data, there is no solitary, generally agreed-on definition, neither
in academia nor in industry [1]. However, most experts agree on Big Data exceeding
common storing capacities and computing methods [2]. It has also become popular to
outline Big Data via the 3 Vs introduced by [3]: volume, velocity, and variety. Volume
means that an increasing amount of data is to be handled, even though the specific
numbers for when to start labeling data as Big Data vary. Velocity stresses the fact that
data is generated, processed or modified at high speeds, in some applications close to
real time. Variety describes the state the data is in. This can range from structured data
to semi-structured or unstructured data. Text written or spoken by humans is often
referred to as unstructured data. However, [2] emphasizes that many sources of Big Data
are not as unstructured as they may seem at first glance; rather, it takes some
extra time and effort to find the logical flow they do possess. In addition to the three Vs,
wider definitions have been proposed over the years, leading to five or even more Vs
depending on the source consulted. For example, [4] presents value and veracity as
additional Vs with value considering the potential to contribute to entrepreneurial or

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 458–480, 2019.
https://doi.org/10.1007/978-3-030-02686-8_35

scientific progress and veracity assessing the consistency and trustworthiness of the
data. Some other characteristics of Big Data are its exhaustiveness (capturing entire
populations or systems), flexibility (offering the possibility to add new aspects or
expand in size) and relational character (allowing for linking to other databases) [1].
The sources and drivers of Big Data are numerous. Web data is referred to as the
original Big Data [2] and often involves interests such as understanding customer
behavior. It may include social media data, interaction data or voluntarily submitted
data. The authors of [5] name mobile sensors, video surveillance, smart grids, geophysical
exploration and medical experimentation as further drivers of the data deluge. In the
field of mechanical engineering the focus lies on data generated by machinery.
A growing number of sensors and actuators are embedded into technical systems so
that some even reach the state of operating completely autonomously. Furthermore, the
interest in monitoring devices and equipment while they are in use is increasing rapidly.
Cameras, GPS units and radio frequency identification (RFID) tags are only some
examples of how this development currently manifests itself [1].
Big Data is closely linked to the fields of business intelligence (BI) and data
mining. It can be considered an extension of BI solutions as they are primarily built to
analyze structured data whereas Big Data approaches aim to handle all kinds of data
[6]. Still, BI solutions should not be discarded too quickly for the sake of Big Data
strategies. It seems more promising to integrate Big Data with the data a business
already has and with the methods that have proved successful throughout its history [2,
7]. Data mining, on the other hand, denotes a set of methods to make use of data by
discovering similarities, patterns, trends, outliers or clusters [8]. Established data
mining techniques are focused on analyzing traditional, structured data [6]. Big Data
now aims at larger amounts of data which are more complex in their structure. This
does not necessarily mean that existing methods need to be overthrown and replaced,
but it at least poses questions of scalability and adaption [2]. Moreover, it is to be
discussed whether the tried and trusted database language SQL (structured query
language) will still serve these purposes. NoSQL, columnar databases, massively parallel
processing (MPP) databases, cloud computing and frameworks like Hadoop are some
of the new technologies on the rise [2, 6].
This paper addresses various Big Data approaches, highlights their advantages as
well as their shortcomings and describes how they can be implemented with the help of
MATLAB 2017a, an established software tool for engineering applications [9]. The
data on which the implementation and validation is based stems from the National
Aeronautics and Space Administration (NASA) – Prognostics Center of Excellence
(PCoE). This institution collects and provides data sets from science and engineering
that are free of cost and allow researchers and practitioners to explore and enhance data
mining and machine learning algorithms [10]. The focus of this paper lies on clustering
and classification. In addition to implementation matters, general conclusions on
MATLAB’s suitability for Big Data purposes are drawn and the scalability of existing
MATLAB code is discussed.
The paper divides into seven sections. The introduction given in this section is
followed by a description of the data base in Sect. 2. Section 3 explains the criteria
based on which the approaches for clustering and classification are chosen and outlines

their theoretical foundations. The implementation of these approaches in MATLAB is
part of Sect. 4. Section 5 presents and discusses the results of both clustering and
classification. The paper concludes with an outlook on future work in Sect. 6 and a
summary in Sect. 7.

2 Database

The data set chosen for this paper is part of the NASA PCoE data repository. This
repository currently comprises 16 data sets ranging from biology to electrical or
mechanical engineering topics. What they all have in common is a time dependency
and information on failure, i.e. they represent time series from a specific starting
condition until failure [10]. As this work is located in the field of mechanical engi-
neering a data set with an according background is chosen: “6 Turbofan Engine
Degradation Simulation Data Set”. This set, introduced by [11], deals with a classical
jet engine with the following main components: the low pressure compressor (LPC), the
high pressure compressor (HPC), the outer shaft (N1), the core shaft (N2), the high
pressure turbine (HPT) and the low pressure turbine (LPT).
The data are the results of simulations using an engine model; they are not a record of
signals transmitted by physically existing engines operated by airlines. Variations
in the production quality of the original engines and degradation effects are included in
the simulation. Each time series in the data set starts at an arbitrary point in the engine’s
life where it is not as good as new anymore but has not failed yet.
The data set separates into training data and test data. The training data serve to
train a model whereas the test data are used to validate the accuracy of the created
model. The time series from the training data provide the time of failure. They contain
all data points from starting condition to failure. The test data time series, on the
contrary, cut off at a point prior to engine failure. The created model can then be used to
estimate the remaining useful life (RUL) of the engine.
Time series enclose 21 different signals an engine would provide, e.g. temperatures,
pressures, shaft speeds and amounts of fuel and coolant. Three more signals that are
useful to determine an engine’s operation condition are available in each time series:
flight altitude, Mach number, and throttle angle. However, these signals shall not be
discussed in more detail as one of the paradigm shifts in applying Big Data approaches
is to focus more on what the data itself reveal on a statistical level and less on building
physical models that are comprehensible in all its interrelationships [12].
The entire data set is divided into five different subsets varying in complexity.
Some subsets show 6 different operating conditions, some only show 1 operating
condition. Analogously, some subsets exhibit 2 different failure mechanisms while
others only have 1 failure mechanism. This information on subsets, operating condi-
tions and failure mechanisms is available with the data set itself. Table 1 gives an
overview of how the data set divides into subsets.

Table 1. Subsets of the engine data set

Subset | Number of operating conditions | Number of failure mechanisms
------ | ------------------------------ | ----------------------------
1      | 1                              | 1
2      | 6                              | 1
3      | 1                              | 2
4      | 6                              | 2
5      | 6                              | 1

The size of the chosen data set is 12 MB. This is a relatively small size, considering
that some authors claim the lower boundary of Big Data to be several terabytes or
petabytes [4]. However, a clear definition of how big Big Data has to be does not exist
[5]. Even though the data set may not have the highest volume, the remaining V criteria
should not be dismissed. For example, it exhibits high variety and value characteristics.
Furthermore, it is feasible to test Big Data approaches with this data set while
simultaneously allowing for upscaling to larger amounts of data in the implementation.

3 Chosen Approaches

There are different motivations for building models based on the jet engine data
described above. Typical engineering questions that would be of interest for an engine
operator as well are:
• Are operating conditions and failure mechanisms identifiable based on the signals
alone?
• How should an alarm system for imminent engine failures be designed?
• How can the remaining useful life of an engine be estimated?
In terms of data analysis, the first question relates to clustering, the second to
classification and the third to regression or, more generally, prognostics. This paper
focuses on the former two as they lay a base for further prognostic tools. Moreover,
assessing clustering and classification techniques allows a comparison of supervised
versus unsupervised learning [13].

3.1 Clustering
Clustering aims at identifying different groups of related data within a larger data set.
The grouping is carried out based on the data alone. No additional information stating
which point or series belongs to which group is available. A verification whether or not
the data have been clustered correctly is not possible. Clustering is therefore considered
a method of unsupervised learning [13].
For the chosen data set it is known that 6 different operating conditions and 2
different failure mechanisms exist. However, it cannot be retrieved which time series is
from which group. It can be considered a classical clustering scenario, extended by the
fact that the number of clusters is explicitly given.

Data within one cluster shall be as homogeneous as possible whereas the clusters
themselves shall be as distant from one another as possible. Different distance measures
are a main distinguishing feature between different clustering methods [14]. Established
methods include hierarchical clustering, k-means clustering and Gaussian mixture
models. Hierarchical clustering methods do not need a priori information on how many
clusters are expected, but reveal an initially unknown cluster structure within the data
set. The major drawback is that hierarchical methods are accompanied by high com-
putational costs [15]. k-means clustering and Gaussian mixture models both belong to
the field of partitioning clustering. They both need the information on the number of
clusters to be found. k-means clustering strictly assigns data points to clusters whereas
Gaussian mixture models calculate belonging probabilities. For this work, k-means
clustering is chosen as it is computationally efficient [15] and well compatible with
MATLAB and other Big Data technologies such as Hadoop and MapReduce.
The basic idea of deploying k-means clustering is to divide all n elements into
k disjoint clusters so that the Euclidean distance between elements and cluster centers is
minimized. The clusters’ centers are denoted in the matrix M = [m1, …, mk]. Each
vector mj contains the center of the j-th cluster Cj which is calculated as follows:

m_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i, \qquad (1)

with nj being the number of elements belonging to the j-th cluster and xi the values of the
i-th observation in this cluster. The algorithm for performing k-means clustering can
then be described by the following four steps [14]:
• Initialize clusters by specifying cluster centers, either randomly or deliberately.
Calculate the preliminary matrix M based on the specified cluster centers.
• Assign each element in the data set to its nearest cluster Cl, i.e.

x_i \in C_l \quad \text{if} \quad \| x_i - m_l \| < \| x_i - m_j \| \qquad (2)

for i = 1, \dots, n; \; j = 1, \dots, k; \; j \neq l.

• Update matrix M based on the current assignment of elements to clusters using (1).
• Repeat the second and third step until no further changes occur in the cluster
allocation.
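The four steps above can be sketched without any library support. The following Python snippet is an illustration only (in MATLAB the built-in kmeans function performs this procedure); the two-dimensional points and initial centers are made up:

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate assignment and center update
    until the cluster allocation no longer changes."""
    k = len(centers)
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each element to its nearest cluster center (Eq. 2).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 4: stop when no further changes occur in the allocation.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: update each center as the mean of its members (Eq. 1).
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster runs empty
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                )
    return centers, assignment

# Two well-separated toy clusters around (0, 0) and (10, 10).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, [(0, 0), (10, 10)])
```

As the text notes below, the result depends on the initial centers; with deliberately placed starting centers as here, the algorithm converges after one update.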
k-means clustering is dependent on the initial choice of cluster centers. The algo-
rithm converges to a local minimum of distances between elements and centers.
Depending on the initial centers the final clusters may vary. Choosing them therefore
becomes an essential part of performing k-means clustering. However, choosing them
by hand is laborious and opposing to the idea of evaluating Big Data as automatically
as possible. The purely random selection of initial cluster centers, on the other hand,
may lead to long run times of the algorithm and clusters that are not close to the optimal
solution [16]. An algorithm that overcomes both shortcomings by choosing starting
centers based on weighed probabilities that account for the structure in the data is called
k-means++ and was first proposed in [17].

k-means++ chooses the first center c1 randomly from all elements available in the
data set. It then calculates the distances D(xi) of all elements to the first center. The
following center c2 is chosen based on a weighted probability, ensuring that elements
are more likely to be chosen the higher their D2 value, i.e. their squared distance from the first
center, is. After that, D(xi) is calculated again for each element, now denoting the
smallest distance between xi and any center chosen so far. The next center is chosen
based on the updated D2 probabilities. These last two steps are repeated until all
k starting centers have been set.
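This seeding procedure can likewise be sketched in a few lines of Python (an illustration; the toy data and the fixed random seed are made up). The D²-weighted draw is the core of the method:

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each further center is drawn with probability
    proportional to D(x)^2, the squared distance to the nearest center
    chosen so far."""
    centers = [rng.choice(points)]  # first center: uniform random draw
    while len(centers) < k:
        # D(x)^2 for every point, given the centers chosen so far
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # weighted draw: far-away points are more likely to be picked;
        # already-chosen points have weight 0 and cannot recur
        next_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(next_center)
    return centers

rng = random.Random(0)
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans_pp_init(pts, 2, rng)
```

The returned centers would then serve as the starting centers for the k-means loop itself.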
Modifications of k-means clustering are k-medians clustering and k-medoids
clustering. The use of medians makes the method more robust against outliers. k-
medoids clustering extends the original method by requiring that each cluster center
needs to coincide with an element of the data set. This makes the method applicable for
categorical data as well. However, both extensions are not necessary for the data
considered in this paper so that k-means clustering is chosen for implementation. Prior
to running the classical k-means clustering, the above-mentioned k-means++ is applied
to determine the cluster centers to start with.

3.2 Classification
Classification follows a similar aim as clustering but is part of supervised learning [13].
It also intends to sort data into groups, in this case called classes, which are as
homogeneous as possible. What sets classification apart from clustering is that in
classification procedures information on the actual class affiliation is available. The
model is trained with a set of training data for which the true class of each element is
known. The trained model can then be used to assign new data for which the class
affiliations are unknown to the appropriate classes.
The main interest in the jet engine scenario lies on the remaining useful life of the
individual engines. An operator of engines might wish to know which engines are close
to failure so that failure may be avoided by means of shop visits and maintenance.
Proximity to failure is indicated by low RUL values, given in flight cycles, e.g. RUL = 5
means that the engine will only be able to perform five more flights before it fails.
Creating a warning system based on RUL values and their criticality is a legitimate,
self-evident use case for classification. Three classes are defined in Table 2.

Table 2. Classes for engine failure warning system

Class no. | Range of values | Significance                  | System action
--------- | --------------- | ----------------------------- | -------------
1         | 0 ≤ RUL ≤ 25    | Engine very close to failure  | Alarm
2         | 25 < RUL ≤ 125  | Engine heading toward failure | Warning
3         | RUL > 125       | Normal operation              | None
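The class boundaries of Table 2 translate directly into a simple lookup rule. A Python sketch (the function name is our own, chosen for illustration):

```python
def warning_class(rul):
    """Map a remaining-useful-life value (in flight cycles) to the
    warning classes of Table 2."""
    if rul < 0:
        raise ValueError("RUL cannot be negative")
    if rul <= 25:
        return 1   # alarm: engine very close to failure
    if rul <= 125:
        return 2   # warning: engine heading toward failure
    return 3       # normal operation

labels = [warning_class(r) for r in (5, 25, 26, 125, 126)]
```

Note that the boundary values 25 and 125 belong to classes 1 and 2, respectively, matching the inequalities in the table.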

Classification methods include decision trees, k-nearest neighbors, support vector
machines, naive Bayes, and discriminant analysis. An extensive introduction can be
found in [18]. All methods have advantages as well as shortcomings so that a general
statement on which method is superior to another without considering the specific use

case is hardly possible. A problem of classification that might arise regardless of the
chosen method is the phenomenon of overfitting. Overfitting denotes the effect that a
classification algorithm adapts overly well to the training data, i.e. scores a high
accuracy within this subset of data, but has a high error rate when classifying test data
[8]. One way to reduce overfitting is the use of cross validation. The data set is then
divided into k subsets. The algorithm is trained with k − 1 of these sets leaving the k-th
one for validation. This procedure is repeated until each subset has once been the
validation set. It obviously increases the computational cost compared to the more
basic holdout validation which only once divides the data set into training and vali-
dation data. It can be considered a trade-off between overfitting reduction and com-
putational efficiency. In this work, the decision is taken in favor of holdout validation.
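The difference between k-fold cross validation and holdout validation comes down to how the index set is partitioned. A minimal Python sketch (for brevity the indices are not shuffled, which one would do in practice; function names are ours):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k folds; each fold serves once as the
    validation set while the remaining k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for v in range(k):
        val = folds[v]
        train = [i for j, fold in enumerate(folds) if j != v for i in fold]
        yield train, val

def holdout_indices(n, val_fraction=0.2):
    """Single split into training and validation indices (holdout)."""
    cut = int(n * (1 - val_fraction))
    return list(range(cut)), list(range(cut, n))

splits = list(kfold_indices(6, 3))     # three train/validation pairs
train0, val0 = holdout_indices(10)     # one train/validation pair
```

The k-fold variant trains k models instead of one, which is exactly the computational trade-off discussed above.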
Linear Discriminant Analysis. For this implementation a linear discriminant analysis
is chosen because the linear case is efficient to calculate, allows quick
classification and is supported by MATLAB's capabilities. The main reasons to dismiss
the other classification possibilities are that naive Bayes is a rather simple method that
has its strength in serving as a benchmark for other methods. Support vector machines
allow quick classification and are highly generalizable but go along with high
computational effort, the need for transformations in specific cases [19] and an
incompatibility with MATLAB's Big Data functions. k-nearest neighbors is dismissed
because it is prone to outliers [15] and adverse in terms of memory space, as the whole
data set has to be kept available as long as the algorithm is carried out. Decision trees
give the opportunity to understand the classification but need downstream pruning
steps [18] or parallelization in the form of random forests [20] to handle overfitting.
Discriminant analysis is a method from the field of multivariate statistics. At first, a
distribution function is calculated for each class. Commonly, a multivariate normal
distribution is chosen whose density function is [21]:

f_X(x) = \frac{1}{\sqrt{(2\pi)^p \det(\Sigma)}} \cdot \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right). \qquad (3)

X is the p-dimensional random variable that, in the engine data example, is composed
of the different signals each engine provides, as mentioned in Sect. 2. μ, the vector of
means, and Σ, the covariance matrix, are to be determined individually for each class.
The borders between two classes are defined as where their density functions have the
same value. The functions describing those borders are called discriminant functions. If
the assumption of identical covariance matrices among all classes is fair, the method
simplifies to linear discriminant analysis. The discriminant functions are then
hyperplanes or, regarding a two-dimensional case, linear functions as shown in Fig. 1.
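As an illustration of this rule, the following Python sketch evaluates the density of Eq. (3) for p = 2 with a covariance matrix shared by all classes and assigns an observation to the class with the larger density value; all numbers are made up:

```python
import math

def mvn_density_2d(x, mu, sigma):
    """Density of a bivariate normal (Eq. 3 with p = 2), with the 2x2
    inverse and determinant written out explicitly."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mu[0], x[1] - mu[1])
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * q) / math.sqrt((2 * math.pi) ** 2 * det)

def classify(x, means, sigma):
    """Assign x to the class whose density at x is largest; with a shared
    covariance matrix this is linear discriminant analysis."""
    densities = [mvn_density_2d(x, mu, sigma) for mu in means]
    return densities.index(max(densities))

means = [(0.0, 0.0), (4.0, 4.0)]   # class centers (made up)
sigma = ((1.0, 0.0), (0.0, 1.0))   # shared identity covariance
label = classify((3.5, 3.0), means, sigma)
```

The class border lies exactly where both densities are equal, which with the shared covariance matrix is a straight line between the two means.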
Neural Network. As an alternative to linear discriminant analysis, classification is carried out with the help of a neural network. The
reasoning behind that is to create the option of comparison and to give a prospect for
future work that might expand into the field of regression for which neural networks are
also suitable [18]. Neural networks have become popular, sometimes being advertised
as a magical solution to all computational problems [18]. They are in fact a very

Fig. 1. Example of a linear discriminant analysis for two dimensions [22].

powerful and general method that can in theory approximate any complex interrelations
[8]. A neural network is a nonlinear statistical model whose number of layers and
whose activation functions influence the complexity the model is able to represent
[18]. It is best applied in settings where prediction is more important than interpretation
of results [18].
Neural networks can be considered a simulation of the human brain and its learning
process. They involve neurons, weighted connections, and external stimuli. In the living
organism learning signifies the strengthening of synaptic connections between neurons
in response to an external stimulation that has been received. In the neural network this
can be modeled via weights and activation functions [8].
Figure 2 shows the general structure of a neural network with its input neurons,
output neurons, and two exemplary hidden layers. Hidden layers do their name justice
as they are not directly observed but only used internally in the calculation process.

Fig. 2. General structure of a neural network [23].



In a classification scenario with k classes the number of neurons in the output layer
is k as well so that each neuron represents one class. The input neurons stand for the
signals the model is fed with. The hidden layers in between represent the model to be
trained in order to assign a data element with certain input signal characteristics to its
appropriate class. This means that in the jet engine use case 21 signals can be drawn
upon for input neurons, and the 3 classes defined in Table 2 serve as output neurons.
Each connection is allocated a weight wij. The first index i denotes the predecessor
this connection comes from; the second index j stands for the layer of the
network that is currently in focus. The variables ai state whether or not a connection is
activated. The sum of all incoming ai, weighted with the associated wij, is calculated by

z_j = \sum_{i=0}^{n} w_{ij} a_i, \qquad (4)

with n being the number of preceding neurons. The value zj is then fed into the so-called
activation function g(zj). Typically, the sigmoid function

g_{\text{sigmoid}}(z_j) = \frac{1}{1 + e^{-z_j}} \qquad (5)

is chosen for this purpose. An alternative worth considering, especially in regard to
performance in MATLAB [24], is the hyperbolic tangent function

g_{\tanh}(z_j) = \frac{2}{1 + e^{-2 z_j}} - 1. \qquad (6)

The result of function g(zj) gives the activation aj the neuron propagates into the
next layer of the network. Figure 3 illustrates the activation process. The neuron shown
in this figure exhibits a bias fed into it, represented by a0 which is constantly 1 and
weight w0j, which is a standard modeling technique [18, 25].
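Equations (4) to (6) amount to a weighted sum followed by a squashing function. A small Python sketch with made-up inputs and weights; the bias is modeled as a0 = 1 with weight w0j, as described above:

```python
import math

def neuron_output(inputs, weights, activation):
    """Compute z_j per Eq. (4), including the bias as a_0 = 1 with
    weight w_0j, and pass it through the activation function."""
    a = [1.0] + list(inputs)           # a_0 = 1 models the bias
    z = sum(w * ai for w, ai in zip(weights, a))
    return activation(z)

def g_sigmoid(z):
    """Sigmoid activation, Eq. (5)."""
    return 1.0 / (1.0 + math.exp(-z))

def g_tanh(z):
    """Hyperbolic tangent activation, Eq. (6); equal to math.tanh(z)."""
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0

# Made-up example: two inputs, weights [w_0j, w_1j, w_2j].
out = neuron_output([0.5, -1.0], [0.1, 0.8, 0.3], g_tanh)
```

Here z = 0.1 + 0.8 · 0.5 + 0.3 · (−1.0) = 0.2, and the neuron propagates g(0.2) into the next layer.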

Fig. 3. Model of one neuron [25].



If a neural network only sends signals to its subsequent layers, as discussed so far, it
is called a feedforward network. This is the most widely used approach [18]. Networks
which send signals back to their preceding layers exist as well and are sometimes
referred to as networks possessing a memory. The more common name is recurrent
neural network [26].
One difficulty in using a neural network for classification is to determine an ade-
quate size. There are no established rules on how many layers and neurons to use. It
rather is an iterative process of experimentation, facilitated by expertise and experience,
to find the right size for the specific scenario [18]. If too many neurons are chosen,
overfitting occurs. If there are too few neurons, the network might not be able to
sufficiently model complex interrelations in the data. The size of the neural network can
either be determined in a destructive approach or in a constructive one [27]. Destructive
in this case means that the starting point is a big network from which neurons are then
gradually being removed until the performance of the network starts to decrease.
Opting for the constructive approach is to start with a small network and add neurons
until the performance is not enhanced any further.
Once the structure of the neural network is set, it has to be trained. The training data
subset is used for this step. The generic approach to minimize errors is to use a gradient
descent method, also called backpropagation. Detailed equations can be found in [18].
The fastest algorithm MATLAB offers for training neural networks with up to several
hundreds of neurons is the Levenberg-Marquardt backpropagation algorithm [28]. It was
first proposed in [29] and applied to neural networks in [30]. The main underlying idea
is to avoid calculating the computationally intensive Hessian matrix and to use an
approximation instead.
In this work, a standard feedforward neural network with bias and hyperbolic
tangent activation functions is chosen. The size of the network is determined via the
destructive approach described above. Backpropagation is carried out via the
Levenberg-Marquardt algorithm.
Both the results of the linear discriminant analysis and those of the neural network are
discussed and compared in Sect. 5.

4 Implementation with MATLAB 2017a

The implementation of the chosen approaches to process the engine data set and solve
the problems of clustering and classification is carried out using the programming
environment MATLAB, release 2017a. Even though MATLAB may not be the most
popular programming language when judged in an overall comparison, it is still listed
around rank 20 in current rankings [31, 32]. It has its strengths in matrix-based
numerical calculations and is widely used in science and engineering. Since release
2016b MATLAB offers new functionalities for handling Big Data, e.g. tall arrays, a
new data type that allows users to carry out calculations on data that would otherwise
be too big to fit into working memory by breaking it down into chunks and
evaluating operations on them repeatedly. This process can also be parallelized.
Through free educational licenses for teaching staff and affordable student licenses,
MATLAB has gained some popularity in academia. This may explain why
graduates are acquainted with it and have established it in industry as well. Assuming that
MATLAB is an available tool to practitioners in the field of mechanical engineering,
this paper aims at exploring how and to which extent it can be used to dive into Big
Data analysis.
In order to fully reproduce the results discussed in this paper, the following
MATLAB components are required:
• MATLAB R2016b or newer,
• MATLAB Parallel Computing Toolbox,
• MATLAB Statistics and Machine Learning Toolbox,
• MATLAB Neural Network Toolbox.

4.1 Workflow
Regardless of the Big Data approach in focus, a general workflow for implementation
can be deployed. The workflow this paper follows, for clustering as well as for
classification, is shown in Fig. 4.

Fig. 4. Workflow for the implementation of Big Data approaches.

It follows the recommendation given in [33] and is in line with MATLAB's
guidelines [34]. The problem definition has already been laid out in Sect. 3. The first
three preparation steps are described in the following paragraph, with a focus on
dimensional reduction. The processing step will be dealt with under the headline of
parallel and distributed computing. Model design, validation and upscaling are dis-
cussed in Sect. 5.

4.2 Data Preparation


First of all, a transformation is performed on the input signals in order to create
z-scores. In statistics, z-scores are standardized variables with mean 0 and standard deviation
1. MATLAB offers the function zscore to obtain them. Using this standardized
form of variables helps comparing them and making them processible by statistical
methods. It can be regarded a step of preprocessing as depicted in Fig. 4.
Subsequently, 5000 data points are randomly sampled from the training data. A data
point here does not mean that a certain engine is chosen, but that one value from one signal
of an engine at an arbitrary time is picked. Operating conditions or failure mechanisms
are not yet considered. datasample is the MATLAB function used for this sampling
step. The reasoning behind the sampling step is that the randomly chosen points are
representative of the entire set and that it is more efficient in terms of computational
cost to explore the sample rather than the entire set.
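These two preparation steps can be sketched as follows; Python is used for illustration, with zscore mirroring MATLAB's zscore (normalization by n − 1) and sample standing in for datasample (which draws with replacement by default). The signal values are made up:

```python
import math
import random

def zscore(values):
    """Standardize to mean 0 and sample standard deviation 1,
    mirroring MATLAB's zscore, which normalizes by n - 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def sample(values, k, rng):
    """Draw k data points at random, with replacement."""
    return [rng.choice(values) for _ in range(k)]

signal = [10.0, 12.0, 14.0, 16.0, 18.0]       # made-up raw signal values
standardized = zscore(signal)
picked = sample(standardized, 3, random.Random(42))
```

After standardization, all signals share the same scale, which is what makes the scatter plots described next comparable across signals.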
Scatter plots are chosen as an easy and intuitively accessible means of data
exploration. MATLAB’s function to create these is named scatter. Figure 5
exemplarily shows the scatter plots for signals 10 to 21. All x-axes show the negative
RUL value. All y-axes are without unit because of standardization.

Fig. 5. Signals of the engine data set, subset 1, training data, sample of 5000 points, z-scored,
plotted over negative RUL values.

It is evident that some signals show a clear trend over time while others remain
unaffected by time or just react with increased noise as time advances. Signal 11 for
example has a positive trend, i.e. when an engine is close to failure signal 11 tends to
have high values. Signal 21 gives an example of a negative trend, i.e. its values
decrease the closer an engine gets to failure. Signals like those two examples should be

included into models because their tendencies can help to categorize new data. Signal
10 exhibits no trend over time but stays constant. Therefore, it cannot contribute
information to a model that is built on time-dependencies. Signal 17 shows a weak
positive effect but not as distinct as others do. It could be argued whether or not to
include it. To err on the safe side, it is dismissed in this work. Signal 14 is exemplary
for a signal that has a varying amount of noise. One might tend to interpret the points
close to RUL = 0 as an upward trend, but indeed they are just scattered further around
a signal value of 0. As the time span just before failure is of special interest for a
warning system, a signal with high noise in this area is of little help and should also be
excluded from the model.
Applying this reasoning to all available signals, both the first half not shown in
Fig. 5 and the second half documented in Fig. 5, the list of relevant signals for
training time-dependent models results in:

2; 3; 4; 7; 8; 11; 12; 13; 15; 20; 21:

This can be considered a dimensional reduction. The original 21 signals were
reduced to 11 relevant ones. Reducing dimensions is a standard step in preparing data
for statistical learning algorithms. The less unnecessary information is carried along,
the more efficiently the algorithms work. Choosing the relevant inputs manually works
fine for a reasonable number of input variables. If the number increases, the process can
easily be automated, e.g. with the help of correlation coefficients. corrcoef is the
corresponding MATLAB function.
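Such an automated selection can be sketched as follows. This Python snippet is a standard-library stand-in for the corrcoef-based check, not the authors' code; the threshold of 0.3 and the signal names are illustrative assumptions:

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

def select_signals(neg_rul, signals, threshold=0.3):
    """Keep signals whose |correlation| with negative RUL exceeds threshold."""
    return [name for name, values in signals.items()
            if abs(pearson(neg_rul, values)) > threshold]

# Toy data: one clearly trending signal, one signal without a trend
neg_rul = list(range(-100, 0))
signals = {
    "sig_trend": [0.05 * t for t in neg_rul],                # clear trend -> kept
    "sig_flat": [1.0 + (i % 2) * 1e-3 for i in range(100)],  # no trend -> dropped
}
print(select_signals(neg_rul, signals))  # ['sig_trend']
```

The same threshold idea transfers to any number of input variables, which is what makes the automated variant attractive when manual inspection becomes impractical.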
Note that Fig. 5 only shows a subset of the engine data set. The reduced list of
signals is to be seen as a first attempt at the least complex case of 1 operating condition
and 1 failure mechanism which is represented by subset 1. Processing other subsets
may require further selection of signals.

4.3 Parallel and Distributed Computing


When processing large amounts of data, as is typical for Big Data applications, there
are two steps to be considered in order to optimize computing times: parallelizing and
distributing the computation. Parallel computing refers to the internal processes in one
device, e.g. a laptop, workstation computer or computing server. Computations are
divided among multiple processor cores of this device. Distributed computing enhances
this concept by involving more than one device. Computing clusters are one way to
realize this.
Making data fit for parallel and distributed computing usually requires some
preparatory steps. Working with MATLAB and the described engine data set, those are the
following: First of all, CSV files are created. Each CSV file contains the data of one
engine. All files are then pooled together with the help of a datastore object.
A datastore object in MATLAB does not create one large variable or container
with all the separate data in it but solely captures the storage paths of the files. When data
are needed for calculation they are transformed from the datastore object into a
tall array. tall arrays do not load all the data into the working memory at once but
process data in chunks. When tall arrays appear in a MATLAB script the respective
Implementing Clustering and Classification Approaches 471

equations are not evaluated immediately. An explicit gather command is needed to
execute calculations. The general aim when writing MATLAB code for Big Data is to
reduce gather commands to a minimum, because they are what drives computational
cost. It should also be checked whether all functions used are compatible with tall
arrays. Some examples used in this work that support the use of tall arrays are:
zscore, kmeans, discretize, and double. Self-written functions can handle
tall arrays as well.
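The deferred-evaluation idea behind tall arrays can be illustrated in plain Python. This is a conceptual sketch, not MATLAB's implementation: operations on a column are merely recorded, and only an explicit gather-like call streams the data through in fixed-size chunks:

```python
class LazyColumn:
    """Records transformations lazily; only gather_sum() touches the data,
    streaming it in fixed-size chunks, similar in spirit to tall arrays."""
    def __init__(self, source, ops=None):
        self.source = source          # callable returning a fresh iterable
        self.ops = ops or []          # queued element-wise operations

    def map(self, fn):
        return LazyColumn(self.source, self.ops + [fn])

    def gather_sum(self, chunk_size=1000):
        total, chunk = 0.0, []
        for x in self.source():
            for fn in self.ops:
                x = fn(x)
            chunk.append(x)
            if len(chunk) == chunk_size:   # process one chunk at a time
                total += sum(chunk)
                chunk = []
        return total + sum(chunk)

col = LazyColumn(lambda: iter(range(10_000)))
doubled = col.map(lambda x: 2 * x)   # nothing is computed yet
print(doubled.gather_sum())  # 99990000
```

Keeping the number of gather-like calls small matters for the same reason as in MATLAB: each one forces a full pass over the data.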
In this paper, tall arrays are evaluated locally, using all processor cores available.
This form of parallelization is why the MATLAB Parallel Computing Toolbox is
necessary for executing the code. The size of the data set does not make the use of
distributed computing necessary. However, if bigger data sets were processed, the same
MATLAB code would still be applicable with only slight adjustments via the
mapreduce function. This would allow for the use of computer clusters or cloud
computing solutions such as Hadoop and Spark.
The neural network used for comparative purposes in the classification scenario
functions without tall arrays but has its performance optimized by the MATLAB
Neural Network Toolbox as well as by parallelization.
The third toolbox in use, MATLAB Statistics and Machine Learning Toolbox, does
not provide for parallel or distributed computing but for the statistical methods
themselves. It offers pre-defined functions for support vector machines, decision trees,
k-nearest neighbors, k-means, k-medoids, hierarchical clustering and many more, some
of which are directly applied to obtain the results discussed in the next section and
some of which were evaluated beforehand for comparison in order to find the right
approaches for the engine data scenario.

5 Results and Discussion

This section presents and discusses the results of both the clustering and the classifi-
cation problem.

5.1 Clustering
Clustering is carried out in order to determine groups of engines with similar operating
conditions and failure mechanisms.
Clustering for Operating Conditions. Different operating conditions are only
prevalent in subsets 2, 4, and 5. Therefore, only those subsets are subject to this kind of
clustering. Input variables are the three condition signals flight altitude, Mach number
and throttle angle as mentioned in Sect. 2. Since these three signals allow one to
deduce how the engine is operated, while all other signals are merely simulated
sensor signals recording internal processes in the engine, they are an easy and
obvious choice. Clustering has been performed on the training data only. Two
iterations of k-means clustering were needed to identify all six clusters shown in Fig. 6.

Fig. 6. Identified clusters for operating conditions, subset 2, training data, sample of 5000
points, using negative RUL values.

All clusters turn out to be very concentrated, making them appear like six single points
even though a total of 5000 points is plotted. The cluster centers are given in Table 3.
The results for subsets 4 and 5 are similar, showing highly concentrated centers as well.
Clusters as clearly distinguishable as these could have been identified manually
just as well. Nevertheless, automated clustering requires much less effort and is
more generalizable, as it can also be used for complex, spread-out clusters.

Table 3. Cluster centers for operating conditions in subset 2


Cluster Cond. 1 (flight altitude) Cond. 2 (Mach number) Cond. 3 (throttle angle)
1 (yellow) 2 0.00 100
2 (green) 25003 0.62 60
3 (red) 45003 0.84 100
4 (purple) 20003 0.70 100
5 (blue) 10003 0.25 100
6 (orange) 35003 0.84 100

The results obtained from the training data can be transferred to the test data. No
modifications need to be made. It could be considered to use the cluster centers
identified from the training data as starting points for a clustering algorithm applied on
the test data. Still, the k-means++ algorithm, which does not need manual input for
starting centers, proved to be very effective in this scenario as well, given that
only two iterations were necessary.
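For reference, the core of the k-means step can be sketched in a few lines of plain Python. This is a textbook Lloyd iteration, not the MATLAB kmeans implementation used above; the points and starting centers are toy values:

```python
import math

def kmeans(points, centers, iters=10):
    """Plain Lloyd iteration: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            d = [math.dist(p, c) for c in centers]
            clusters[d.index(min(d))].append(p)
        centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers

# Two tight, well-separated toy clusters, as in the operating conditions
pts = [(0.0 + i * 0.01, 0.0) for i in range(5)] + \
      [(10.0 + i * 0.01, 1.0) for i in range(5)]
print(sorted(kmeans(pts, [(0.0, 0.0), (9.0, 0.0)])))
```

With tight, well-separated clusters such a loop converges almost immediately, which is consistent with the small number of iterations reported above.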
Clustering for Failure Mechanisms. Subsets 3 and 4 exhibit different failure
mechanisms and have therefore been considered in this part of clustering. It is assumed
that failure is a time-dependent phenomenon for the engine scenario. The closer an

engine is to failure the higher or lower certain signals will be, indicating malfunctions
in parts of the engine. 21 signals are available in total. Section 4.2 gives a list reduced
to 11 signals that show a clear tendency over time. For this clustering it has been
compared whether using all 11 signals or further reducing the number of input variables
is more efficient. The decision was taken in favour of reduction. The essential signals
could be reduced to:
7; 12; 15; 20; 21:

Figure 7 shows why they are the most useful signals for identifying clusters of
failure mechanisms.

Fig. 7. Signals of the engine data set, subset 3, training data, sample of 5000 points, z-scored,
plotted over negative RUL values, different failure modes color coded in blue and red.

All signals chosen as inputs have a clear diverging trend towards RUL = 0.
A comparison with Fig. 5, in which a subset with only 1 failure mode is plotted and no
such diverging point clouds can be spotted, suggests that this divergence is a valid
indicator for the failure modes in this case. The two different failure modes identified via k-means
clustering are already highlighted in Fig. 7. For signal 15 for example, it can be
concluded that high values towards the end of the engine’s life indicate the first failure
mode (red) while low values indicate the second one (blue).
Plotting the clusters as in Fig. 6 is no longer feasible as more than three
dimensions are used for failure mechanism clustering. Cluster centers are summarized
in Table 4. It is striking that the cluster centers are very close to each other with respect
to all five signals. Table 3 showed greater distances, at least for input Cond. 1. Still, the
k-means algorithm could identify failure mechanism clusters as efficiently as before.
Again, results are obtained after two iterations.

Table 4. Cluster centers for failure mechanisms in subset 3


Cluster Sign. 7 Sign. 12 Sign. 15 Sign. 20 Sign. 21
1 551.62 519.92 8.52 38.47 23.09
2 567.57 534.94 8.24 39.57 23.75

Only training data are used for clustering. The time-dependency makes data points
close to RUL = 0 more valuable than those with high RUL values. Hence, only the last
ten points of each time series are considered. In some cases those ten data points from
the same engine are not all assigned to the same cluster. However, as an engine is
assumed to only fail from one failure mechanism, a clear assignment to one or the other
cluster has to be made. Whenever this case occurs, the cluster the engine is assigned to
most often out of the ten times is chosen.
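This majority-vote rule can be written down directly (illustrative Python mirroring the rule just described; the assignments are hypothetical):

```python
from collections import Counter

def engine_cluster(last_assignments):
    """Assign the engine to the cluster chosen most often among the
    per-point cluster assignments of its last ten data points."""
    return Counter(last_assignments).most_common(1)[0][0]

# Hypothetical per-point assignments for one engine's final ten points
print(engine_cluster([1, 1, 2, 1, 1, 2, 1, 1, 1, 2]))  # 1
```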
Time-dependency is what makes it difficult to transfer the failure mode clustering
from the training data to the test data. In the training data set all time series are available
until the event of failure whereas in the test data set time series are cut off at a random
RUL value, potentially a high one. For test data with a low RUL value it might be
possible to apply the clusters identified from the training data, as the diverging trends
in the relevant signals already show their effects. For new data with high RUL values
this will, if it is possible at all, be accompanied by great uncertainty.
Furthermore, it should be stated that subset 4 requires nested clustering as multiple
operating conditions and multiple failure mechanisms are present at the same time. This
is why the result for subset 4 consists of six pairs of clusters. The clustering for failure
mechanisms is carried out after the clustering for operating conditions but otherwise
does not differ from the procedure described before.

5.2 Classification
Classification has the aim of assigning elements of the engine data set to the right class
of criticality regarding the RUL value. Three classes have been defined in Table 2.
The quality of classification can be evaluated as the actual class affiliations are
available. Some erroneous classifications may be rated more undesirable than others.
Considering a warning system for engine failure, it is worse to receive a normal
operation prompt when actually a warning should be given than to receive an erroneous
warning when the engine is still in normal condition.
The clustering results are further used as additional inputs for classification.
For example, engines that were identified as belonging to the same failure mechanism
may be more likely to fall into the same class of criticality as well.
Classification via Linear Discriminant Analysis. The first method applied for
classification is linear discriminant analysis. It needs a training time of 1.3 s on a
typical contemporary laptop (Lenovo Thinkpad E550, Intel Core i5-5200 processor).
Training and processing of the entire data set takes approximately 10 s. The results are
summarized in the form of a confusion matrix in Fig. 8.
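For intuition, a two-class linear discriminant on a single feature can be sketched as follows. This is a textbook sketch with toy data, not the MATLAB model trained above: class means, a pooled variance and class priors define a linear decision rule.

```python
import math
from statistics import mean

def fit_lda_1d(xs, ys):
    """Fit a two-class linear discriminant on one feature:
    class means, pooled variance, and class priors."""
    classes = sorted(set(ys))
    mus, priors, sq = {}, {}, 0.0
    for c in classes:
        vals = [x for x, y in zip(xs, ys) if y == c]
        mus[c] = mean(vals)
        priors[c] = len(vals) / len(xs)
        sq += sum((v - mus[c]) ** 2 for v in vals)
    var = sq / (len(xs) - len(classes))      # pooled variance estimate
    return mus, priors, var

def predict(x, mus, priors, var):
    """Pick the class maximizing the linear discriminant
    delta_c(x) = x*mu_c/var - mu_c^2/(2*var) + ln(prior_c)."""
    return max(mus, key=lambda c: x * mus[c] / var
               - mus[c] ** 2 / (2 * var) + math.log(priors[c]))

# Toy training data: low values 'normal', high values 'alarm'
xs = [0.0, 0.2, 0.1, 2.0, 2.1, 1.9]
ys = ["normal", "normal", "normal", "alarm", "alarm", "alarm"]
model = fit_lda_1d(xs, ys)
print(predict(0.05, *model), predict(2.05, *model))  # normal alarm
```

The real model works analogously, only with many input dimensions and three classes instead of one feature and two.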
Fig. 8. Confusion matrix for classification with linear discriminant analysis, subset 4.

The diagonal of the confusion matrix documents correct classification, e.g. the
upper left corner of the matrix states that 8.6% of all data elements (5249 in absolute
numbers) have been classified as alarm and were real alarm cases. The lower right
corner sums up all diagonal entries, showing that in total 74% of all elements have been
classified correctly whereas 26% have suffered misclassification.
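Computing such a confusion matrix and the overall accuracy from target and predicted labels takes only a few lines (an illustrative Python sketch with hypothetical labels, not the actual evaluation code):

```python
def confusion(targets, predictions, labels):
    """Confusion matrix as nested dicts: rows are target classes,
    columns are predicted classes; also returns overall accuracy."""
    m = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(targets, predictions):
        m[t][p] += 1
    correct = sum(m[c][c] for c in labels)
    return m, correct / len(targets)

labels = ["alarm", "warning", "normal"]
targets = ["alarm", "alarm", "warning", "normal", "normal", "normal"]
preds = ["alarm", "warning", "warning", "normal", "normal", "alarm"]
m, acc = confusion(targets, preds, labels)
print(round(acc, 2))  # 0.67
```

The off-diagonal entries m[t][p] with t != p are exactly the misclassification groups discussed below.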
Three groups of misclassifications should be looked at more closely: cases in
which the target class was alarm but the model only chose warning or normal, and
cases in which the engine was operating normally but the model gave an alarm. The
first two mislead the operator to overestimate the engine's performance and not
consider checkup or maintenance work. The latter may lead to premature shop visits
and cause unnecessary costs. For safety reasons the first two are to be considered even
more critical than the latter one, which has economic consequences only. The fact that
all three of these misclassifications occur at a very low rate, 2.0, 0.0 and 0.1%
respectively, indicates the good quality of the trained model.
Another aspect to be considered when developing an engine failure warning system
is that there should be at least one alarm before engine failure. Engines failing without
prior notice are highly undesirable in the intended system. Figure 9 shows that no such
case occurred for the linear discriminant analysis model.
Vertical lines in Fig. 9 represent individual engines. The y-axis shows each engine's
simulated life in flight cycles. It can be concluded from the plot that most engines start
in normal condition, actually operating normally and correctly classified so. As
simulations start at an arbitrary point of time in an engine's life, some already show
warning condition at the beginning of the recorded time. The red tips of all lines
demonstrate that each engine has given multiple alarms before failure. Engine 118 for
example has the highest line in the plot and passes through all three phases, starting in
normal condition, transitioning into warning and finally reaching alarm state, initially
giving some premature alarms but then being correctly classified once RUL ≤ 25. The
fact that all engines give alarms, in case of doubt rather too early than not at all,
emphasizes the well-functioning of the warning system based on linear discriminant
analysis.

Fig. 9. Displayed alarms, warnings and normal conditions when system is trained via linear
discriminant analysis, subset 4.

Classification via Neural Network. The second method applied for classification is a
neural network. Following the destructive approach leads to 20 neurons in
1 hidden layer. 17 inputs, consisting of a reduced number of signals according to
Sect. 4.2 and clustering results, are used. Figure 10 shows the neural network as
modeled in MATLAB.

Fig. 10. Neural network modelled in MATLAB.

Training this network to the point that a valid model is found takes 170 iterations
on average. Using the same laptop as before this is equivalent to approximately 12 s.
The classification results obtained via the described neural network are summarized in
the confusion matrix in Fig. 11.
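The shape of this network, 17 inputs, 20 tanh hidden neurons and a 3-class output, can be sketched as a plain forward pass. The weights here are random placeholders, not the trained MATLAB model, and the softmax output layer is an assumption of this sketch:

```python
import math
import random

def forward(x, w1, b1, w2, b2):
    """One-hidden-layer forward pass: tanh hidden units, softmax output."""
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    z = [sum(wi * hi for wi, hi in zip(row, h)) + b
         for row, b in zip(w2, b2)]
    e = [math.exp(v - max(z)) for v in z]          # numerically stable softmax
    return [v / sum(e) for v in e]

rng = random.Random(0)
n_in, n_hidden, n_out = 17, 20, 3
w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.uniform(-1, 1) for _ in range(n_out)]

probs = forward([0.1] * n_in, w1, b1, w2, b2)
print(len(probs), round(sum(probs), 6))  # 3 1.0
```

Training then consists of adjusting w1, b1, w2, b2 so that the output probabilities match the target classes, which the paper delegates to the Neural Network Toolbox.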
The sum of all diagonal elements is 74.1%, almost the same as with linear dis-
criminant classification. 25.9% of all data elements are still misclassified. However, the
three most severe misclassifications have values of 2.4, 0.0, and 0.1%, again almost
identical to the results obtained via linear discriminant analysis, which are acceptably
low. The neural network scores a slightly worse rate for classifying alarm conditions as
such but is slightly better at correctly classifying warnings.

Fig. 11. Confusion matrix for classification with neural network, subset 4.

Figure 12, when compared to Fig. 9, also highlights the fact that the warning
system trained via the neural network behaves almost identically to the one based on
linear discriminant analysis. All engines display alarms before failure, which is the
preferred characteristic for the warning system discussed in this work.

Fig. 12. Displayed alarms, warnings and normal conditions when system is trained via neural
network, subset 4.

6 Outlook

The results presented in this paper offer various starting points for further research.
One promising next step may be to broaden the focus from clustering and classification
to also include regression. In the considered use case regression models could be used
to estimate the remaining useful life of the engines. It should be examined to what
extent regression models can profit from the clustering and classification results
already obtained for the data set.
Further enhancements could include image or video data to prove that the methods
are also applicable to high-variety data. In general, bigger data sets should be
considered for further validation. Integration of cloud solutions or distributed server
structures should be tested. Applying the approaches to data sets from other technical
systems could further prove their generalizability.

7 Summary

In this paper, a data set for applying Big Data approaches in a mechanical engineering
scenario has been chosen. Various Big Data approaches have been assessed and
compared. A problem definition of clustering and classification has been formulated.
For these two problems k-means clustering, linear discriminant analysis and neural
networks have been identified as adequate methods.
All three methods have been implemented using the programming environment
MATLAB 2017a. Above all, datastore objects, tall arrays and gather com-
mands are crucial for enabling MATLAB scripts for Big Data. The code produced
constitutes a basis for further extension. Bigger data sets could be processed by
spreading the computation among a greater number of cores with the help of
MATLAB's Parallel Computing Toolbox, or by involving computing clusters or cloud
solutions via mapreduce settings. Moreover, existing MATLAB scripts for any purposes can be adapted
for Big Data use based on the insights gained by these examples. All that has to be
considered is whether all functions that are used support tall arrays and whether the
program sequence should be adjusted to minimize the number of gather commands.
MATLAB proved to be an adequate tool for analyzing large amounts of stored data
stemming from engine simulations. Whether it is still powerful enough when additional
challenges like near real-time data or highly unstructured social media data arise
remains to be proven.
The results of the methods themselves show that k-means clustering with
k-means++ initialization is very fast and effective in identifying operating condition
and failure mechanism clusters in the engine data, reaching plausible results within two iterations.
Comparing linear discriminant analysis and a feedforward neural network with one
hidden layer shows a very similar performance for both when three defined classes for
RUL values are the underlying scenario. Both reach approximately 74% of correct
classifications and 2% or less for misclassifications considered especially severe. The
neural network is easier to implement in MATLAB and more generalizable, but less
suitable whenever interpretation of the results is also a focus. The linear discriminant
analysis proved to be slightly faster than the neural network.
Visualization Tool for JADE
Platform (JEX)

Halim Djerroud(B) and Arab Ali Cherif

Université Paris8, Laboratoire d’Informatique Avancée de


Saint-Denis (LIASD), 2 Rue de la liberté, 93526 Saint-Denis, France
{hdd,aa}@ai.univ-paris8.fr

Abstract. This article presents JEX, a useful visualization extension to
the JADE platform. JEX provides the MAS (multi-agent systems)
community using JADE with the possibility to visualize and interpret the
simulations they develop under it. Why this contribution? Agent-based
modeling is widely used to study complex systems. Therefore, several
platforms have been developed to answer this need. However, in many
platforms, the graphical representation of the environment and agents is
not fully implemented. In the case of JADE, it is completely nonexistent.
Implementing such a graphical representation within JADE is of interest
because JADE is a powerful multi-agent platform and FIPA compliant.
Adding an extra feature like JEX will greatly help the scientific
community and industry to represent and interpret their MAS models.

Keywords: Spatial simulation · JADE · Multi-agent systems

1 Introduction
Multi-agent systems (MAS) have become an active area of research. According to
Weiss [1], a multi-agent system (MAS) is defined as a system involving two or
more agents that cooperate with each other while achieving local goals. Multi-agent
systems are acknowledged as a suitable paradigm for modeling complex systems.
They are applied in various domains such as collaborative decision support sys-
tems and robotics. The software development process of MAS requires robust
platforms to address the complexity of these tasks by offering MAS key features
such as agent development, monitoring and analysis. The development efficiency
can be significantly enhanced by using a platform capable of dedicated representation.
Agent-based modeling [2] is the discipline aimed at understanding the interaction
of agents in their environment. Multi-agent systems are used in two cases: (a)
simulation of complex phenomena [3], which implies the simulation of interactions
between agents. This simulation is meant to define the system's evolution in order
to predict its future organization, such as the food chain study. (b) Distributed
problem solving [1], such as the study of virus propagation in computer networks.
The study of complex phenomena often involves entities that evolve in space
and time. Implementation of these systems in an MAS requires the representation
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 481–489, 2019.
https://doi.org/10.1007/978-3-030-02686-8_36
482 H. Djerroud and A. A. Cherif

of the environment in which they evolve and the integration of their positions within it.
We refer to this kind of simulation as “agent-based spatial simulation”.
Agent-based spatial simulation is a key tool in the study of complex systems [4]. It
has grown considerably among the scientific community, and within many social
science disciplines such as psychology, e.g. in simulating human behavior
during emergency evacuations [5].
Since the modeling method used to represent systems varies according to their
characteristics, it is essential to represent the environment as well as the agents
developed in it. For this purpose, several multi-agent platforms have chosen to
integrate a graphical interface that makes direct visualization of the
agents' interaction and the development of the environment possible.
JADE1 is one of the most popular multi-agent platforms [6]. This platform
is widely used in research works because it implements the FIPA standard [7].
Thus, it is easily interoperable with other platforms that
implement the same standard, like ZEUS [17], FIPA-OS [18], LEAP [19] and
JACK [20]. Furthermore, we can state that JADE is particularly well docu-
mented [8] and has an impressive track record of systems that have been
developed with it.
Using JADE, we faced one main issue: JADE does not implement one
key feature, namely a spatial representation module. Thus, dedicated and rigid
platforms like GAMA [9] and NetLogo [12] seem to be more appealing as they
offer this functionality natively. JEX (JADE Environment Extension) has been
developed in order to address the lack of this key module in JADE. JEX is
a spatial representation module and, technically, a Java library that integrates
easily with JADE.
This article is structured as follows: First, it presents a state of the art of
several multi-agent system architectures that illustrates the interest of our
contribution. Second, it describes the JEX extension as well as all the possibilities it
provides. Finally, it compares our contribution with the tools presented in the Related
Work section. We conclude the article with a discussion of the perspectives
of our contribution.

2 Related Work
More and more applications are developed using MAS, but there are few multi-
agent oriented implementation tools and powerful agent programming languages.
MAS design relies on existing languages and programming techniques, and it is
often hard to develop MAS (implementation, distribution, communications). The
trend in this context is Multi-Agent Oriented Programming, meaning
programming MAS with MAS tools. Many standards have been developed in
this regard, such as FIPA2 , MASIF3 and DARPA4 . In this section, we introduce
1
JADE: An open source tool available at: http://jade.tilab.com/.
2
IEEE FIPA: Foundation for Intelligent Physical Agents.
3
MASIF-OMG (Object Management Group) : OMG effort to standardize mobile
agents - middleware services and internal middleware interfaces.
4
DARPA: The DARPA Knowledge Sharing Effort.
Visualization Tool for JADE Platform (JEX) 483

and compare some agent platforms such as JADE, NetLogo, GAMA and Mason
[10,11].
JADE [6] (Java Agent Development Framework) is one of the most pop-
ular agent technology platforms. JADE has become a major open source soft-
ware project with a worldwide scope. It is an agent-oriented middleware that
facilitates the development of multi-agent systems. It is FIPA compliant, FIPA
being the IEEE Foundation for Intelligent Physical Agents. JADE is developed
in Java. It includes a runtime environment in which JADE agents live and where
one or more agents can be run on a host; a class library that programmers
must/can use to develop their agents; and a suite of graphical tools that allow the
administration and monitoring of the activity of agents during execution.
However, JADE has no tools to visualize agents and the environment.
NetLogo [12] is a multi-agent environment focused on modeling tools
[13,14]. It integrates its own programming language that can be described as a
high-level language. The environment is discrete and is represented in 2D or
3D form depending on the version used. In NetLogo, agents necessarily live
within the environment and cannot exist independently of it. Under NetLogo,
it is possible to depict a third type of entity referred to as links. A link
connects two agents and symbolizes the relationship between them.
GAMA [9] The GAMA platform (GIS & Agent-based Modelling Architec-
ture) is, like NetLogo, a platform that offers a complete modelling language,
GAML (GAMA Modelling Language), allowing modellers to build models
quickly and easily. However, unlike NetLogo, which is limited to the construction
of simple models, GAMA allows the construction of very complex models, as rich
as those built by a computer scientist with tools such as Repast Simphony. In
particular, GAMA offers very advanced tools for space management.
Mason [15] MASON is a fast, discrete, Java-based multi-agent simulation
library designed to serve as a foundation for large customized Java simulations
and to provide sufficient utility for many lighter simulation needs. MASON con-
tains both a model library and an optional suite of 2D and 3D visualization
tools.

3 JEX Architecture

JEX is a visualization extension for the JADE framework; this section presents
JEX's general architecture. The main goal is to provide JADE with an easy and
effective viewer module like the NetLogo interface; therefore, JEX is inspired
by NetLogo's visual architecture and functionalities.
To provide a visual representation of a MAS, we need to represent agents,
patches and links. Agents can act on the environment; to simplify the complex
implementation of the environment, it is decomposed into small parts called
patches. Links are relations between agents.
For JEX, we propose the following architecture. We consider the three types of
entities mentioned above: agents, links and environment. The three entities have
been implemented as JADE agent classes named JexAgent, JexLink and
JexEnvironnement, as illustrated in Fig. 1.
484 H. Djerroud and A. A. Cherif

Fig. 1. JEX architecture, UML class diagram.

These three classes are derived from JexGenericAgent, which simply extends the
JADE Agent class. We have chosen this implementation in order to take full
advantage of the functionality offered by the JADE Agent superclass and to
remain fully compatible with the framework.
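The class hierarchy of Fig. 1 can be sketched as plain Java. Note that jade.core.Agent is replaced here by a local stand-in class so the sketch compiles without the JADE jar, and the class bodies are placeholders, not the actual JEX implementation:

```java
// Local stand-in for jade.core.Agent, so this sketch compiles without the JADE jar.
class Agent { }

// Common superclass of the three JEX entities; in JEX it extends jade.core.Agent.
class JexGenericAgent extends Agent { }

class JexAgent extends JexGenericAgent { }         // a visualizable agent
class JexLink extends JexGenericAgent { }          // a relation drawn between two agents
class JexEnvironnement extends JexGenericAgent { } // the environment, a set of patches

public class JexHierarchySketch {
    public static void main(String[] args) {
        // Every JEX entity is (transitively) a JADE agent, which is what keeps
        // JEX fully compatible with the framework.
        assert new JexAgent() instanceof Agent;
        assert new JexLink() instanceof Agent;
        assert new JexEnvironnement() instanceof JexGenericAgent;
        System.out.println("all JEX entities are Agents");
    }
}
```

This design means any JEX entity can be started, monitored and administered with the standard JADE tooling, exactly like a user-written agent.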

– The JexEnvironnement class consists of a set of patches. The user can choose
the environment dimensions and global characteristics (patch size, world
wrapping5, colors, etc.). The dimensions of the patches can also be chosen, and
each element (patch) can be manipulated independently. From a technical point
of view, JexEnvironnement is a static class with static members, in order to
avoid multiple instances of the environment and to ease agent access. Other
global characteristics have been added, such as the posting6 delay, the origin
position (position (0,0)) of the environment, and other parameters that are
fully listed in the JEX documentation7.
– JexAgent and JexLink are used by JexObserver, which can be considered an
agent acting as a registration point for the agents willing to subscribe to the
graphical representation module. JexObserver provides other services, such as
creating links (Links) and initializing the environment, and proposes the
JexAgentObserver interface for agents wishing to use the graphical
representation functionalities.

We emphasize that these various actions are completely transparent to the user
and are performed automatically. In the next section we describe how to
integrate JEX into a JADE project.

5 Connects the edges of the environment.
6 Step-by-step execution, or time unit of execution.
7 http://djerroud.halim.info/index.php/jex.
Visualization Tool for JADE Platform (JEX) 485

4 Integration to JADE
JEX (JADE Environment Extension) comes in the form of a Java library, jex.jar.
This library provides JADE with a graphical environment that makes it possible
to visualize the agents and the environment.
Integrating JEX into a JADE project does not require any modification of the
project; it only requires creating an agent of type JexObserver. This agent
makes it possible to configure the environment, e.g., the length and width of
the environment, the refresh time, and so on. If none of these parameters are
specified, default values are used. The code selection below shows how the
JexObserver agent is created.
We observe in the code selection that follows that the JexObserver agent is
created in the same way as any JADE agent. This is possible because JEX agents,
as indicated in the previous section, are JADE agents; more precisely, they are
derived from the JADE Agent class.
import jade.core.Agent;
import jade.wrapper.AgentController;
import jade.wrapper.ContainerController;
import jex.JexEnvironnement2D;
import jex.JexObserver;

public class JexTesterAgent extends Agent {
    protected void setup() {
        JexEnvironnement2D.init2D();
        Object args[] = new Object[1];
        args[0] = "";
        ContainerController cc = getContainerController();
        try {
            AgentController ac = cc.createNewAgent("JexObs",
                    "jex.JexObserver", args);
            ac.start();
            // ...
        } catch (Exception e) { /* ... */ }
    }
}

In order to maintain the flexibility of JADE, the JEX library does not monitor
all agents systematically; it is up to the user to choose the agents to
observe. To monitor an agent, the agent only needs to register with the
JexObserver agent, as shown in the following code:
import jade.core.Agent;
import jex.JexAgent;
// ...

public class AgentToObserve extends Agent {
    // ...
    protected void setup() {
        jexObserver.subscribe(this.getLocalName());
        // ...
        addBehaviour(/* ... */);
    }
}

Once the observer agent JexObserver is created and the agents wishing to
benefit from JEX have registered with the observer, all that remains is to
animate these agents. For that, JEX offers a set of functions that allow the
manipulation of the various agents in the environment. Among the functions that
JEX offers are the initialization functions, which give the initial position of
an agent in the environment; this position can be defined by the user, or JEX
can propose a random position.
Another set of functions gives a shape to the agents. The shape is either a
basic geometrical shape, e.g., a square, rectangle or circle, or a specific
form defined by the user via an image file.
Finally, there is the set of functions that perform the movement itself. These
functions can directly indicate a position to converge to, or give an
orientation and a movement. Other functions specify the color of the agents,
the text on display, etc. All of these functions are described in the JEX
documentation.
The code selection below gives an example of an implementation of an agent that
performs initialization and basic movements.
// ...
JexAgent jexAgent = jexObserver.getJexAgent(this.getLocalName());
// ...
jexAgent.setRadius(10);
jexAgent.setShape(jexAgent.CERCLE);
jexAgent.setColor(new JexColor(200, 0, 0));
jexAgent.setInitPos(50, 50);
// ...
addBehaviour(new ...Behaviour(...) {
    protected void onTick() {
        jexAgent.setHeading(270);
        jexAgent.forward(5);
    }
});
// ...

As indicated in the previous section, JEX allows the addition of links between
agents. These links (Links) are represented in the graphical environment by
lines that connect the agents to each other, and they are particularly useful
for representing graphs. The following code shows how to add these links
(Links) in JEX.

// ...
jexObserver.addLink(
    jexAgent.getJexAgentLocalName(),
    "agent attached", false
);
// ...

We end this section with a graphic illustration (Fig. 2). We have chosen an
example that illustrates the possibilities of JADE associated with JEX, namely
an implementation of a simulation of the propagation of viruses in a computer
network.
The model, displayed in Fig. 2, shows the spread of a virus through a network.
Although the model is somewhat abstract, its interpretation is the following:
each node represents a computer, and the modeling represents the progression of
a computer virus through this network. Each node has two states: infected or
not. In academia, such epidemic models are often discussed in terms of the SIR
model.
The blue nodes represent the uninfected machines, and the red nodes represent
the infected machines. The links that exist between these machines are drawn as
lines connecting the nodes.

Fig. 2. Computer network, spread of viruses.



5 Discussion
The existing multi-agent platforms are more or less specialized. Consider again
the example of NetLogo, which makes it possible to accomplish feats in terms of
visual rendering and spatial representation of agents; however, this tool is
little used for large-scale scientific work because of its limited robustness
and the specificity of its language, which reduce the working possibilities.
JADE is written in Java and is easy to use. It implements the FIPA protocol,
which makes it one of the best multi-agent platforms. However, it does not
offer a graphical environment for the spatial representation of agents.
Attempts to combine the two platforms have already been tested [16];
communication between the two systems is possible via the exchange of XML
files.
Spatial representation is essential for the study of complex phenomena, as we
have shown in Sect. 1. Integrating a spatial representation tool into the
powerful JADE platform is therefore an important contribution. We described in
this article how to provide JADE with graphical means similar to those of
NetLogo, which inspired our work.
For the future of JEX, we have developed tools for 2D representation, and we
plan to add a 3D representation of the environment as well as to improve the
API that we presented.
We share this work under a free license; the whole source code, as well as the
jar file and the documentation, can be downloaded from the link8.

6 Conclusion

In this paper, we have proposed JEX, a spatial representation tool for MAS
agents, as an extension of the JADE framework. We discussed its architecture
and, more importantly, its effectiveness and complementary contribution to
JADE. We expect that this easily integrated enhancement will be very beneficial
to JADE's developer community.

References
1. Weiss, G.: Multiagent Systems: A Modern Approach to Distributed Artificial Intel-
ligence. MIT Press, Cambridge (1999)
2. Vidal, J.M., Buhler, P., Goradia, H.: The past and future of multiagent systems.
In: AAMAS Workshop on Teaching Multi-agent Systems (2004)
3. Amigoni, F., Schiaffonati, V.: A multiagent approach to modelling complex phe-
nomena. Found. Sci. 13(2), 113–125 (2008)
4. Macal, C.M., North, M.J.: Agent-based modeling and simulation: ABMS examples.
In: Winter Simulation Conference (WSC 2008). IEEE (2008)
5. Pan, X., et al.: A multi-agent based framework for the simulation of human and
social behaviors during emergency evacuations. AI & Society 22(2), 113–132 (2007)

8 http://djerroud.halim.info/index.php/jex.

6. Bellifemine, F., Agostino, P., Giovanni, R.: JADE-A FIPA-compliant agent frame-
work. In: Proceedings of PAAM, vol. 99, pp. 97–108 (1999)
7. O’Brien, P.D., Nicol, R.C.: FIPA-towards a standard for software agents. BT Tech-
nol. J. 16(3), 51–59 (1998)
8. Bellifemine, F.L., Giovanni, C., Dominic, G.: Developing Multi-agent Systems with
JADE, vol. 7. Wiley (2007)
9. Taillandier, P., et al.: GAMA: a simulation platform that integrates geographical
information data, agent-based modeling and multi-scale control. In: International
Conference on Principles and Practice of Multi-Agent Systems. Springer, Heidel-
berg (2010)
10. Nguyen, G., et al.: Agent platform evaluation and comparison. Rapport technique,
Institute of Informatics, Bratislava, Slovakia (2002)
11. Trillo, R., Sergio, I., Eduardo, M.: Comparison and performance evaluation of
mobile agent platforms. In: Third International Conference on Autonomic and
Autonomous Systems ICAS 2007. IEEE (2007)
12. Tisue, S., Uri, W.: Netlogo: a simple environment for modeling complexity. In:
International Conference on Complex systems, vol. 21 (2004)
13. Tisue, S., Uri, W.: NetLogo: design and implementation of a multi-agent modeling
environment. Proc. Agent (2004)
14. Kornhauser, D., Rand, W., Wilensky, U.: Visualization tools for agent-based mod-
eling in NetLogo. Proc. Agent, 15–17 (2007)
15. Luke, S., et al.: Mason: a multiagent simulation environment. Simulation 81(7),
517–527 (2005)
16. Reis, J.C., Rosaldo, J.F.R., Gil, G.: Towards NetLogo and JADE Integration: an
industrial agent-in-the-loop approach
17. Nwana, H.S., Ndumu, D.T., Lee, L.C.: ZEUS: an advanced tool-kit for engineering
distributed multi-agent systems. In: Proceedings of PAAM, vol. 98 (1998)
18. Poslad, S., Phil, B., Rob, H.: The FIPA-OS agent platform: open source for open
standards. In: Proceedings of the 5th International Conference and Exhibition on
the Practical Application of Intelligent Agents and Multi-Agent, vol. 355 (2000)
19. Bergenti, F., Poggi, A.: Leap: a FIPA platform for handheld and mobile devices.
In: International Workshop on Agent Theories. Architectures and Languages.
Springer, Heidelberg (2001)
20. Winikoff, M.: JACK™ intelligent agents: an industrial strength platform. In: Multi-
Agent Programming, pp. 175–193. Springer, Boston (2005)
Decision Tree-Based Approach for Defect
Detection and Classification in Oil and Gas
Pipelines

Abduljalil Mohamed1, Mohamed Salah Hamdi1, and Sofiene Tahar2

1 Information Systems Department, Ahmed Bin Mohamed Military College, Doha, Qatar
{ajmaoham,mshamdi}@abmmc.edu.qa
2 Electrical and Computer Engineering Department, Concordia University, Montreal, Canada
tahar@ece.concordia.ca

Abstract. Metallic pipelines are used to transfer crude oil and natural gas.
These pipelines extend for hundreds of kilometers, and as such, they are very
vulnerable to physical defects such as dents, cracks, corrosion, etc. These
defects may lead to catastrophic consequences if not managed properly. Thus,
monitoring these pipelines is an important step in the maintenance process to
keep them up and running. During the monitoring stage, two critical tasks are
carried out: defect detection and defect classification. The first task is
concerned with determining whether a defect has occurred in the monitored
pipeline. The second task is concerned with classifying the detected defect as
serious or tolerable. In order to accomplish these tasks, maintenance engineers
utilize Magnetic Flux Leakage (MFL) data obtained from a large number of
magnetic sensors. However, the complexity and amount of MFL data make the
detection and classification of pipelines defects a difficult task. In this study, we
propose a decision tree–based approach as a viable monitoring tool for the oil
and gas pipelines.

Keywords: Defect detection and classification · Decision tree · Data mining ·
Pipeline monitoring and maintenance

1 Introduction

Oil and gas pipeline defect monitoring is an essential component of the pipeline
maintenance process. In order to maintain the pipeline in a properly working order,
different inspection tools such as magnetic flux leakage (MFL), ultrasonic waves, and
closed circuit television (CCTV) are used to detect and classify pipeline defects [1–3].
The complexity and amount of data obtained by such diverse tools require the use of
sophisticated defect detection and classification techniques. Most of the approaches
reported in the literature [4] have been proposed for the purpose of either prediction of
defect dimensions, detection of defects, or classification of defect types. To achieve

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 490–504, 2019.
https://doi.org/10.1007/978-3-030-02686-8_37
Decision Tree-Based Approach for Defect Detection 491

these objectives, techniques such as machine learning [5–7], wavelets [8–13], and
signal processing [14–16] are widely used.
The focus of this paper, however, is on developing a pipeline monitoring tool
that incorporates both tasks, namely defect detection and defect
classification. The main inference engine for both tasks is a decision tree
that takes as input the crucial MFL depth and length parameters.

2 Pipeline Monitoring

In this paper, we propose a new monitoring approach for oil and gas pipelines. The
general structure of the proposed approach is shown in Fig. 1.

Fig. 1. The proposed monitoring approach for the oil and gas pipelines.

MFL Signals. MFL data are collected from autonomous devices known as
intelligent pigs. An increase in flux leakage may indicate metal loss, which,
in turn, suggests the possibility of a defect. Thus, at the location of the
potential defect, the depth and length of the flux leakage are measured, or
estimated using artificial neural networks.
Defect Detection. These two crucial MFL parameters are first entered into the
defect detection unit, in which a decision tree is realized as the defect
detection technique. If no defect is detected, the monitoring process
terminates. On the other hand, if a pipeline defect is detected, the two
parameters are passed on to the classification unit.
Defect Classification. In this unit, based on its severity level, the defect is
classified into one of two categories: Type I or Type II. In this work, Type I
is considered a very serious pipeline defect that requires immediate action and
repair. Type II is considered less serious and can be scheduled for later
maintenance.
492 A. Mohamed et al.

3 Decision Tree-Based Approach for Defect Detection and Classification

The decision tree utilized in this work is derived from the simple divide-and-conquer
algorithm. The decision tree is expressed recursively as described in the following
sections.
MFL Signal Depth and Length Attributes. In order to detect/classify pipeline
defects, the obtained MFL signals are first normalized and mapped into depth and
length ranges. According to the industry standard [17], the depth range for the MFL
signals is normalized between 0 and 1; and the length range for the MFL signals is
normalized between 0 and 6. These two ranges constitute the MFL attributes, and are
divided into different values as described below.
The MFL depth attribute values are:
Very high = [0.80, 1.00]
High = [0.60, 0.79]
Medium = [0.40, 0.59]
Low = [0.20, 0.39]
Very low = [0.00, 0.19]
The MFL length attribute values are:
Large = [3.81, 6.00]
Medium = [1.81, 3.80]
Small = [0.61, 1.80]
Very small = [0.00, 0.60]
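Mapping the normalized depth and length measurements onto these attribute values is a simple range lookup. The sketch below is illustrative; the class and method names are ours, not the authors':

```java
public class MflAttributes {
    // Maps a depth value normalized to [0, 1] onto the five depth labels above.
    static String depthLabel(double d) {
        if (d >= 0.80) return "Very high";
        if (d >= 0.60) return "High";
        if (d >= 0.40) return "Medium";
        if (d >= 0.20) return "Low";
        return "Very low";
    }

    // Maps a length value normalized to [0, 6] onto the four length labels above.
    static String lengthLabel(double l) {
        if (l >= 3.81) return "Large";
        if (l >= 1.81) return "Medium";
        if (l >= 0.61) return "Small";
        return "Very small";
    }

    public static void main(String[] args) {
        System.out.println(depthLabel(0.65) + " / " + lengthLabel(2.0)); // prints "High / Medium"
    }
}
```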
Defect Detection. Based on the information given in [17], the MFL attributes can now
be used to identify the status of the MFL signals as shown in Table 1. The MFL signal
can either be identified as abnormal (defect) or normal.
Constructing Decision Tree. To construct the decision tree for defect
detection, an attribute is first selected and placed at the root node, and a
branch is made for each possible value. This splits the MFL signals into
subsets, one for every value of the attribute. The process is repeated
recursively for each branch, using only those instances that actually reach the
branch. If the instances at a particular node are all either abnormal or
normal, we stop developing that part of the tree. There are two possibilities
for the first split, and they produce the two trees shown in Figs. 2 and 3 for
the depth and length attributes, respectively.
The numbers of class-2 (abnormal) and class-1 (normal) instances are shown at
the leaves. Any leaf with only one class (i.e., only 2 or only 1) needs no
further splitting, and the recursive process terminates there. In order to
reduce the size of the trees, the information gain at each node is measured:
the information gain for the two attributes is calculated, and the split is
made on the attribute that gains the most information.
Tree Structure. The informational value of creating a branch on the MFL-depth
attribute and on the MFL-length attribute is then calculated. The numbers of
normal and abnormal instances at the leaf nodes in Fig. 2 are [0 4], [1 3],
[2 2], [2 2], and [4 0], respectively.

Table 1. MFL signal abnormal and normal status based on its depth and length range

MFL-depth | MFL-length | Status
Very high | Very small | Abnormal (2)
Very high | Small      | Abnormal (2)
Very high | Medium     | Abnormal (2)
Very high | Large      | Abnormal (2)
High      | Very small | Normal (1)
High      | Small      | Abnormal (2)
High      | Medium     | Abnormal (2)
High      | Large      | Abnormal (2)
Medium    | Very small | Normal (1)
Medium    | Small      | Normal (1)
Medium    | Medium     | Abnormal (2)
Medium    | Large      | Abnormal (2)
Low       | Very small | Normal (1)
Low       | Small      | Normal (1)
Low       | Medium     | Abnormal (2)
Low       | Large      | Abnormal (2)
Very low  | Very small | Normal (1)
Very low  | Small      | Normal (1)
Very low  | Medium     | Normal (1)
Very low  | Large      | Normal (1)

Fig. 2. The decision tree for the MFL depth attribute. The abnormal status is referred to by 2;
while the normal status is referred to by 1.

The numbers of normal and abnormal instances at the leaf nodes in Fig. 3 are
[4 1], [3 2], [1 4], and [1 4], respectively.
Calculating the information gain for each attribute yields the tree structure
shown in Fig. 4. As shown in Fig. 5, the decision tree uses three values of the
MFL-depth attribute and four values of the MFL-length attribute: Low, Medium,
and High for the MFL-depth attribute, and Very Small, Small, Medium, and Large
for the MFL-length attribute.
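The information gain driving this choice of root attribute can be reproduced from the [normal, abnormal] leaf counts with a short entropy calculation. The following is a sketch of the standard ID3-style computation, not the authors' code:

```java
public class SplitGain {
    // Binary entropy of a [normal, abnormal] count pair, in bits.
    static double entropy(int a, int b) {
        double n = a + b, h = 0.0;
        for (double c : new double[]{a, b})
            if (c > 0) h -= (c / n) * Math.log(c / n) / Math.log(2);
        return h;
    }

    // Information gain of splitting the parent node into the given leaves.
    static double gain(int[][] leaves) {
        int normal = 0, abnormal = 0;
        for (int[] leaf : leaves) { normal += leaf[0]; abnormal += leaf[1]; }
        int total = normal + abnormal;
        double g = entropy(normal, abnormal);
        for (int[] leaf : leaves)
            g -= ((double) (leaf[0] + leaf[1]) / total) * entropy(leaf[0], leaf[1]);
        return g;
    }

    public static void main(String[] args) {
        // Leaf counts from Figs. 2 and 3: [normal, abnormal] per attribute value.
        int[][] depth  = {{0, 4}, {1, 3}, {2, 2}, {2, 2}, {4, 0}};
        int[][] length = {{4, 1}, {3, 2}, {1, 4}, {1, 4}};
        System.out.printf("gain(depth)=%.3f gain(length)=%.3f%n",
                          gain(depth), gain(length));
        // The depth attribute gains more information, so it is placed at the root.
        assert gain(depth) > gain(length);
    }
}
```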

Fig. 3. The decision tree for the MFL length attribute. The abnormal status is referred to by 2;
while the normal status is referred to by 1.

Fig. 4. The decision tree structure for the defect detection.

Defect Classification. The MFL data used for classifying the defect severity
level are shown in Table 2. The table shows how the two attribute values
indicate whether the defect is of Type I or of Type II.

Fig. 5. The defect detection based on the two MFL attributes.

Table 2. MFL signal defect (i.e., Type I, Type II) status based on its depth and length range

MFL-depth | MFL-length | Defect
High      | Small      | Type II (2)
High      | Medium     | Type I (1)
High      | Large      | Type I (1)
Medium    | Small      | Type II (2)
Medium    | Medium     | Type II (2)
Medium    | Large      | Type I (1)
Low       | Small      | Type II (2)
Low       | Medium     | Type II (2)
Low       | Large      | Type II (2)

Constructing Decision Tree. The two trees produced by the two attributes are
shown in Figs. 6 and 7. As was the case for the defect detection tree, the
information gain at each node is measured, and the split is made on the
attribute that gains the most information.
Tree Structure. The informational value of creating a branch on the MFL-depth
attribute and on the MFL-length attribute is then calculated. The numbers of
defect Type I and Type II instances at the leaf nodes in Fig. 6 are [2 1],
[1 2], and [0 3], respectively. The numbers of defect Type I and Type II
instances at the leaf nodes in Fig. 7 are [0 3], [1 2], and [2 1], respectively.
Calculating the information gain for each attribute yields the tree structure
shown in Fig. 8. As shown in Fig. 9, the decision tree uses three values of the
MFL-depth attribute and three values of the MFL-length attribute: Low, Medium,
and High for the MFL-depth attribute, and Small, Medium, and Large for the
MFL-length attribute.
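Taken together, Tables 1 and 2 amount to a compact rule set: one rule per (depth, length) pair for detection, and one per pair for severity. A direct, hypothetical encoding of those rules (the class and method names are ours) is:

```java
public class PipelineRules {
    // Table 1: is the MFL signal abnormal (a defect) for these attribute values?
    static boolean isDefect(String depth, String length) {
        switch (depth) {
            case "Very high": return true;                         // always abnormal
            case "Very low":  return false;                        // always normal
            case "High":      return !length.equals("Very small"); // normal only if very small
            default:          // "Medium" and "Low" behave identically in Table 1
                return length.equals("Medium") || length.equals("Large");
        }
    }

    // Table 2: severity of a detected defect ("I" = serious, "II" = tolerable).
    static String defectType(String depth, String length) {
        if (depth.equals("High"))
            return length.equals("Small") ? "II" : "I";
        if (depth.equals("Medium"))
            return length.equals("Large") ? "I" : "II";
        return "II"; // "Low" depth defects are always Type II in Table 2
    }

    public static void main(String[] args) {
        assert isDefect("Very high", "Very small");
        assert !isDefect("Low", "Small");
        assert defectType("High", "Medium").equals("I");
        assert defectType("Medium", "Medium").equals("II");
        System.out.println("rules consistent with Tables 1 and 2");
    }
}
```

The learned decision trees of Figs. 4 and 8 compress exactly this rule set, which is why only a subset of the attribute values appears in the internal nodes.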

Fig. 6. The decision tree for the MFL-depth attribute. The defect status of type I is referred to by
1; while type II is referred to by 2.

Fig. 7. The decision tree for the MFL-length attribute. The defect status of type I is referred to
by 1; while type II is referred to by 2.

Fig. 8. The decision tree structure for the defect classification.



Fig. 9. The defect classification based on the two MFL attributes.

4 Performance Evaluation

The performance of the proposed approach is measured by two important criteria:
receiver operating characteristic (ROC) curves and confusion matrices. In a ROC
curve, the true positive rate (sensitivity) is plotted against the false
positive rate (1 − specificity) for different cut-off points. For a specific
severity class, the closer its ROC curve is to the upper left corner of the
graph, the higher its classification accuracy. In the confusion matrix plots,
the rows correspond to the predicted class (output class), and the columns show
the true class (target class). For both defect detection and classification,
the proposed approach is compared with four well-known classifiers, namely the
Naive Bayesian (NB) classifier, the k-nearest neighbor (KNN) classifier, the
Artificial Neural Network (ANN) classifier, and the Support Vector Machine
(SVM) classifier.
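The two criteria are closely related: the cells of a binary confusion matrix directly yield the rates plotted on a ROC curve. A minimal sketch of that bookkeeping, with hypothetical counts unrelated to the paper's dataset:

```java
public class RocPoint {
    // True-positive rate (sensitivity) from confusion-matrix counts.
    static double tpr(int tp, int fn) { return (double) tp / (tp + fn); }

    // False-positive rate (1 - specificity) from confusion-matrix counts.
    static double fpr(int fp, int tn) { return (double) fp / (fp + tn); }

    public static void main(String[] args) {
        // Hypothetical counts: 95 defects caught, 5 missed,
        // 8 false alarms, 92 correct rejections.
        int tp = 95, fn = 5, fp = 8, tn = 92;
        System.out.printf("TPR=%.2f FPR=%.2f%n", tpr(tp, fn), fpr(fp, tn)); // prints "TPR=0.95 FPR=0.08"
        // A point near the upper left corner (high TPR, low FPR)
        // indicates an accurate detector.
    }
}
```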
Data. The available MFL dataset used in the experimental work is categorized as
follows. For defect detection, there are 907 samples of the normal status and
2721 samples of the abnormal status. For defect classification, there are 907
samples for each type of defect. The data samples have been further divided as
follows: 70% for training, 15% for validation, and 15% for testing.
Defect Detection. The confusion matrix and the ROC curves for each detector model
are shown in Figs. 10, 11, 12, 13 and 14 for the models NB, KNN, ANN, SVM, and
the proposed decision tree (DT). In these figures, the normal status of the MFL signal is
referred to by Class 1, and abnormal status is referred to by Class 2.

Fig. 10. The defect detection confusion matrix (a) and ROC curves (b) for the NB model.

Fig. 11. The defect detection confusion matrix (a) and ROC curves (b) for the KNN model.

Defect Classification. The confusion matrix and the ROC curves for each
classifier model are shown in Figs. 15, 16, 17, 18 and 19 for the models NB,
KNN, ANN, SVM, and the proposed decision tree (DT). In these figures, defect
Type I is referred to by Class 1, and defect Type II is referred to by Class 2.

Fig. 12. The defect detection confusion matrix (a) and ROC curves (b) for the ANN model.

Fig. 13. The defect detection confusion matrix (a) and ROC curves (b) for the SVM model.

It should be noted from these figures that the proposed DT model outperforms
all the other models, yielding 99.2% accuracy for both detection and
classification. The artificial neural network model yields the worst
performance, at 70.2% detection accuracy and 71.4% classification accuracy. The
defect detection and classification performance of all models is summarized in
Table 3.

Fig. 14. The defect detection confusion matrix (a) and ROC curves (b) for the DT model.

Fig. 15. The defect classification confusion matrix (a) and ROC curves (b) for the NB model.

Fig. 16. The defect classification confusion matrix (a) and ROC curves (b) for the KNN model.

Fig. 17. The defect classification confusion matrix (a) and ROC curves (b) for the ANN model.

Fig. 18. The defect classification confusion matrix (a) and ROC curves (b) for the SVM model.

Fig. 19. The defect classification confusion matrix (a) and ROC curves (b) for the DT model.

Table 3. Detection and classification accuracy for the NB, KNN, ANN, SVM, and DT models.

Classifier model | Detection | Classification
NB               | 87%       | 83.8%
KNN              | 98.8%     | 96.8%
ANN              | 70.2%     | 71.4%
SVM              | 89.5%     | 90%
DT               | 99.2%     | 99.2%

5 Conclusion

The monitoring process for oil and gas pipelines consists of two main tasks:
defect detection and defect classification. The complexity and amount of the
MFL monitoring data make both tasks very difficult. In this work, we proposed a
decision tree-based approach as a viable monitoring tool. The new approach is
evaluated using two important criteria: receiver operating characteristic (ROC)
curves and confusion matrices, and its performance is compared with that of
other well-known monitoring tools. Extensive experimental work has been carried
out, and the performance of the proposed approach, along with four other
well-known techniques, is reported. The new approach outperforms all of them,
with 99.2% accuracy for both the detection and classification tasks.

Acknowledgment. This work was made possible by NPRP Grant # [5-813-1-134] from Qatar
Research Fund (a member of Qatar Foundation). The findings achieved herein are solely the
responsibility of the authors.

References
1. Park, G.S., Park, E.S.: Improvement of the sensor system in magnetic flux
leakage-type nondestructive testing. IEEE Trans. Magn. 38(2), 1277–1280 (2002)
2. Jiao, J., et al.: Application of ultrasonic guided waves in pipe’s NDT. J. Exp. Mech. 1, 000
(2002)
3. Jiao, J., et al.: Application of ultrasonic guided waves in pipe’s NDT. J. Exp. Mech. 17(1),
1–9 (2002)
4. Layouni, M., Tahar, S., Hamdi, M.S.: A survey on the application of neural
networks in the safety assessment of oil and gas pipelines. In: 2014 IEEE
Symposium on Computational Intelligence for Engineering Solutions. IEEE (2014)
5. Khodayari-Rostamabad, A., et al.: Machine learning techniques for the analysis of magnetic
flux leakage images in pipeline inspection. IEEE Trans. Magn. 45(8), 3073–3084 (2009)
6. Lijian, Y., et al.: Oil-gas pipeline magnetic flux leakage testing defect reconstruction based
on support vector machine. In: Second International Conference on Intelligent Computation
Technology and Automation, ICICTA 2009, vol. 2. IEEE (2009)
7. Vidal-Calleja, T., et al.: Automatic detection and verification of pipeline construction
features with multi-modal data. In: 2014 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2014). IEEE (2014)
8. Song, S., Que, P.: Wavelet based noise suppression technique and its application to
ultrasonic flaw detection. Ultrasonics 44(2), 188–193 (2006)
9. Hwang, K., et al.: Characterization of gas pipeline inspection signals using wavelet basis
function neural networks. NDT E Int. 33(8), 531–545 (2000)
10. Mukhopadhyay, S., Srivastava, G.P.: Characterisation of metal loss defects from magnetic
flux leakage signals with discrete wavelet transform. NDT E Int. 33(1), 57–65 (2000)
11. Han, W., Que, P.: A modified wavelet transform domain adaptive FIR filtering algorithm for
removing the SPN in the MFL data. Measurement 39(7), 621–627 (2006)
12. Joshi, A., et al.: Adaptive wavelets for characterizing magnetic flux leakage signals from
pipeline inspection. IEEE Trans. Magn. 42(10), 3168–3170 (2006)

13. Qi, S., Liu, J., Jia, G.: Study of submarine pipeline corrosion based on ultrasonic detection
and wavelet analysis. In: 2010 International Conference on Computer Application and
System Modeling (ICCASM), vol. 12. IEEE (2010)
14. Afzal, M., Udpa, S.: Advanced signal processing of magnetic flux leakage data obtained
from seamless gas pipeline. NDT E Int. 35(7), 449–457 (2002)
15. Guoguang, Z., Penghui, L.: Signal processing technology of circumferential magnetic flux
leakage inspection in pipeline. In: 2011 Third International Conference on Measuring
Technology and Mechatronics Automation (ICMTMA), vol. 3. IEEE (2011)
16. Kandroodi, M.R., et al.: Defect detection and width estimation in natural gas pipelines using
MFL signals. In: 2013 9th Asian Control Conference (ASCC). IEEE (2013)
17. Cosham, A., Hopkins, P., Macdonald, K.A.: Best practice for the assessment of defects in
pipelines—corrosion. Eng. Fail. Anal. 14(7), 1245–1265 (2007)
Impact of Context on Keyword Identification
and Use in Biomedical Literature Mining

Venu G. Dasigi1, Orlando Karam2, and Sailaja Pydimarri3

1 Bowling Green State University, Bowling Green, OH, USA
vdasigi@bgsu.edu
2 Kennesaw State University, Marietta, GA, USA
orlando.karam@gmail.com
3 Life University, Marietta, GA, USA
sailaja.pydimarri@life.edu

Abstract. The use of two statistical metrics in automatically identifying
important keywords associated with a concept such as a gene by mining
scientific literature is reviewed. Starting with a subset of MEDLINE® abstracts
that contain the name or synonyms of a gene in their titles, the aforementioned
metrics contrast the prevalence of specific words in these documents against a
broader “background set” of abstracts. If a word occurs substantially more
often in the document subset associated with a gene than in the background set
that acts as a reference, then the word is viewed as capturing some specific
attribute of the gene.
The keywords thus automatically identified may be used as gene features in
clustering algorithms. Since the background set is the reference against which
keyword prevalence is contrasted, the authors hypothesize that different
background document sets can lead to somewhat different sets of keywords to be
identified as specific to a gene. Two different background sets are discussed
that are useful for two somewhat different purposes, namely, characterizing the
function of a gene, and clustering a set of genes based on their shared
functional similarities. Experimental results that reveal the significance of
the choice of background set are presented.

Keywords: Literature mining · Automatic keyword identification · TF-IDF


Z-score · Background set · Features · Clustering

1 Objectives and Goals

The usefulness of certain text mining approaches for automatic identification of keywords associated with documents, and of using those keywords for additional analysis, such as classification and clustering of documents, has been studied previously [1, 4, 7]. Keywords are identified by the strength of their association with documents or document classes, such as tweets [4] or research abstracts associated with specific genes [1, 7]. Keywords thus identified are used as features for additional purposes, such as classification of tweets based on sentiment [4] or organizing genes into groups or clusters based on functional similarity [1, 7].

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 505–516, 2019.
https://doi.org/10.1007/978-3-030-02686-8_38
506 V. G. Dasigi et al.

The strength of association a keyword has to a document or a collection is generally not determined in isolation or in absolute terms, but within the context of its contrast to its strength in a reference or "background" set of documents. In this work, the focus is on the significance of the context provided by the background set.
The objective in this work is to understand the impact of context, provided by such
a background collection of documents, in text mining to describe the function of a set
of genes, and in explicating possible similarities in their function by grouping them into
clusters. The task of clustering genes is carried out as a two-step process: First, keywords
specific to each gene of interest are algorithmically extracted from a subset of
MEDLINE® documents, based on two metrics: Z-score [1], a well-known statistical
concept, and TF-IDF [8], a classic term weight metric from information retrieval. The
formulation of these metrics helps identify how important and distinguishing a keyword
is for a particular gene. In the second step, that of clustering, the classic K-means algorithm [5] is used to group related genes into clusters based on the keyword features. Each of these clusters is interpreted as being composed of functionally related genes, as indicated by the keywords that the genes in each cluster share among themselves.
To achieve these stated goals, the extracted keywords should represent two aspects
of the genes: they should be sufficiently specific as to characterize the gene and at the
same time, some of them should be shared among multiple genes so the genes may be
organized into functionally related clusters. To capture these two aspects for keywords,
two different background sets of documents are needed to provide a reference context.
Others have evaluated strengths of different features relative to the same background
set, as Ikeda and Suzuki did in identifying peculiar composite strings as in DNA
sequences [6]. However, few others have attempted to understand the impact of different
background sets in identifying keywords that are used for different purposes.
As pointed out above, keywords for a concept, such as a gene in this work, are
identified based on the strength of their association with the concept. Two alternative
metrics, namely, Z-scores and a less explored variant of TF-IDF (defined below), are
considered in this work to capture the strength associated with keywords. The quality
of keywords extracted for some genes from each metric is evaluated by an expert. The quality of the clusters resulting from K-means is evaluated by calculating the purity of clusters, which measures the overall similarity of the computed clusters of genes against expert-defined clusters [2].

2 Methods

Keywords capture and represent the content of documents, such as biomedical abstracts. Keywords that appear more often in a document are considered more likely to be representative of the content of the document. This ability of a keyword to represent the content of a document is called the representation aspect. Useful keywords also need to be able to distinguish between documents. A word that occurs in most documents obviously cannot distinguish among those documents. This ability of a keyword to discriminate between documents will be called the discrimination aspect. Thus, a word that occurs in only a few documents, i.e., one with a low document frequency, can set the small number of documents that it does occur in apart from the many that it does not. A word that rates well in both the representation aspect and the discrimination aspect would thus be a good keyword. When a concept, such as a gene, is captured by a set of documents, it is useful to extend these notions from a single document to a group of documents [3]. Thus, a keyword may be thought of as characterizing a group of documents (related to a specific concept, such as a gene) and as distinguishing the group from other groups (related to other concepts, such as other genes). In this extended view, the keyword may also be viewed as characterizing the concept itself, such as the gene underlying the group of documents, and as distinguishing it from the other concepts or genes underlying the other groups of documents. In order to capture the representation and discrimination aspects of keywords relative to various concepts, the distribution of the keyword across the various (possibly overlapping) groups of documents, which correspond to the concepts in question, is of interest.
TF-IDF has traditionally focused on the representation and discrimination aspects of keywords relative to individual documents in information retrieval [8]. Andrade and Valencia, and others following them, have used the Z-score more naturally to capture the distribution of a keyword within groups of documents [1]. The Z-score is thus directly suitable for capturing the representation and discrimination aspects of keywords relative to groups of documents, and the concepts underlying them, as Andrade and Valencia did with protein families. In order to take advantage of the powerful notion of TF-IDF, while adapting it to the context of concepts represented by groups of documents, the original definition is modified here. A brief definition of the Z-score is presented first, followed by a discussion of the adapted variant of TF-IDF that extends to groups of documents.

2.1 The Z-Score


Well known in statistics, the Z-score of a word a relative to a gene (or other concept) g is defined as follows, where F stands for a frequency that simply counts the number of documents containing a word:

Z_g^a = (F_g^a − F̄^a) / σ^a,

where F_g^a, F̄^a, and σ^a all relate to the word a, and are respectively its frequency (the number of documents that contain the word a, as mentioned above) in the group corresponding to the gene g, its average frequency across the groups corresponding to all genes of interest, and the standard deviation of its frequency across the groups of documents corresponding to all genes of interest. While the standard deviation plays a useful role in defining the Z-score, it is not a focus of this paper.
Thus, the Z-score is a measure of how many standard deviations away the frequency of the word in the group of documents corresponding to a given gene is from the average frequency of the word across the groups of documents corresponding to all the genes of interest; for instance, a Z-score of 3 means that the frequency in question is 3 standard deviations above average. The set of the various groups of documents corresponding to all the different genes of interest, used as a reference against each individual group of documents that corresponds to a specific gene, is referred to as the background set. The need for selecting appropriate background sets for different purposes is discussed in Sect. 2.3.
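The definition above can be sketched directly in code. The following Python snippet is a minimal illustration only, not the implementation used in this work; the toy corpus, the tokenized-document representation, and the function names are all hypothetical:

```python
from statistics import mean, pstdev

def document_frequencies(groups, word):
    """For each gene's document group, count the documents containing the word
    (this is the F in the Z-score definition)."""
    return [sum(1 for doc in docs if word in doc) for docs in groups]

def z_score(groups, gene_index, word):
    """Z-score of `word` for the gene at `gene_index`, relative to the
    background formed by all groups (mean and population std. deviation)."""
    freqs = document_frequencies(groups, word)
    sigma = pstdev(freqs)
    if sigma == 0:  # word equally frequent in every group: no signal
        return 0.0
    return (freqs[gene_index] - mean(freqs)) / sigma

# Toy corpus: each gene maps to a list of tokenized abstracts.
groups = [
    [{"cell", "cycle", "ace2"}, {"ace2", "kinase"}],  # gene 0
    [{"cell", "membrane"}, {"membrane", "mnn1"}],     # gene 1
    [{"cell", "cycle"}, {"cycle"}],                   # gene 2
]
print(z_score(groups, 0, "ace2"))  # high: "ace2" occurs only in gene 0's group
print(z_score(groups, 0, "cell"))  # zero: "cell" is uniform across the background
```

A word concentrated in one gene's group receives a high positive Z-score for that gene, while a word spread evenly across the background receives no score at all, which is exactly the discrimination behavior described above.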

2.2 TF-IDF and Its Variant TF-IGF


The TF-IDF score is classic and well known in the information retrieval literature, and has been used to capture the strength of individual words to characterize documents and distinguish them from other documents [8]. In contrast, in the present work, keywords are of interest that distinguish a gene from other genes (or a concept from other similar concepts). The authors have previously extended the notion of TF-IDF to this new context [3]. Since the extension is not as well known as the Z-score, it is briefly explained here. The entire document collection may be thought of as comprising (not necessarily exhaustively) a number of possibly overlapping groups of documents corresponding to the different genes of interest. There may also be other documents that are unrelated to any of these genes; thus the overlapping groups of documents do not necessarily exhaust the entire document collection. Here the focus is on characterizing the representation and discrimination aspects of words relative to each gene (which corresponds to a group of documents), and not relative to each document (as was the focus of TF-IDF). The extension involves defining the term frequency TF_g^a of a term a relative to a gene (represented by a group of documents) g; the group frequency of a term a (similar to the document frequency of a term), denoted GF^a; the inverse group frequency of a term a, denoted IGF^a; and finally the combined notion TF-IGF_g^a, the group variant of TF-IDF, that brings all the pieces together.
TF_g^a is defined as the sum of the number of times the word a appears in the documents corresponding to the gene g, that is,

TF_g^a = Σ_{d ∈ g} tf_d^a,

where g is used to refer to a gene, as well as to the group of documents associated with it, the summation is over all documents d associated with the gene (or group of documents) g, and tf_d^a is the frequency of the term a in d. (TF_g^a is also sometimes called the collection frequency of the term in the set of documents, and counts the total number of occurrences of the term in all the documents of the collection. It differs from the document frequency of a term, which just counts how many documents contain the term, with no distinction on the number of occurrences.)
GF^a is defined simply as the number of genes, or groups of documents, that include at least one document containing the word a. With G denoting the entire set of genes, or groups of documents:

GF^a = Σ_{g ∈ G} [1 if ∃d ∈ g such that a is in d; 0 otherwise]

IGF^a is defined much as the classic IDF:

IGF^a = log(|G| / GF^a),

where |G| is the cardinality of the set of gene groups (44 in the present work with yeast genes). Finally, TF_g^a and IGF^a are multiplied to form

TF-IGF_g^a = TF_g^a · IGF^a

Above, G has denoted the entire set of genes, or groups of documents, used in computing the inverse group frequency IGF^a of a word a. This component is intended to capture the ability of keywords to distinguish a gene associated with a particular group of documents from all the other genes and the document groups associated with them. As in the case of the Z-score, the entire set of groups G that is used as the reference against which individual groups are contrasted is called the "background set" here. The significance of the background set is discussed in more detail in the next subsection.
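The pieces of the definition fit together as in the following Python sketch (a minimal illustration under a hypothetical toy corpus; the names and data layout are not from the paper). Note that, unlike the Z-score's document counts, TF here counts total occurrences within each group:

```python
import math

def tf_igf(groups, word):
    """TF-IGF of `word` for every gene group.

    TF_g = total occurrences of the word in gene g's documents;
    GF   = number of groups containing the word in at least one document;
    IGF  = log(|G| / GF); the score for group g is TF_g * IGF.
    """
    tf = [sum(doc.count(word) for doc in docs) for docs in groups]
    gf = sum(1 for t in tf if t > 0)
    igf = math.log(len(groups) / gf) if gf else 0.0
    return [t * igf for t in tf]

# Toy corpus: documents as token lists, since occurrence counts matter for TF.
groups = [
    [["ace2", "cell", "ace2"], ["ace2", "kinase"]],  # gene 0
    [["cell", "membrane"], ["mnn1"]],                # gene 1
    [["cell", "cycle"], ["cycle"]],                  # gene 2
]
print(tf_igf(groups, "ace2"))  # nonzero only for gene 0
print(tf_igf(groups, "cell"))  # all zero: the word occurs in every group, so IGF = 0
```

As with the classic IDF, a word present in every group of the background set gets IGF = 0 and is useless for discrimination, however frequent it is.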

2.3 The Background Set


In the definitions of both the Z-score and TF-IGF, the reference set of the various groups of documents corresponding to all the different genes of interest has been called the "background set". The background set is roughly the universe of interest. The focus is on how a word can distinguish a "foreground" set of documents, which corresponds to a specific gene, from a background set of documents, which corresponds to all the genes, and possibly all other concepts at large. Since each gene corresponds to a group of documents, the term "gene" and the phrase "group of documents" are sometimes used interchangeably when it suits the context (often with the symbol g left ambiguous between a gene and a group of documents).
The Z-score tries to capture how far the frequency of a word in a specific group of documents, corresponding to a "foreground" gene, deviates (in terms of standard deviations) from the average frequency of the word in the groups of documents corresponding to the background set of genes. Thus, the average frequency F̄^a and the standard deviation σ^a are both computed from the background set.
If a word is contained in only one group of documents corresponding to a specific gene, then the average frequency of the word in the background set would be very small, so the word would have a high Z-score for that gene, and potentially negative Z-scores for all the other genes. This in turn captures the notion that the word is very significant for that particular gene. It thus helps in distinguishing the gene from the others, and possibly in capturing part of its functional description.
TF-IGF attempts to capture how high the frequency of a word is in a specific group of documents corresponding to a "foreground" gene, while the word occurs relatively infrequently in (the groups of documents corresponding to) the background set of genes. For any given word a, the first aspect is captured by a high TF_g^a for the specific group of documents corresponding to a gene g, and the second aspect is captured by a high IGF^a in the background set. As with the Z-score, if a word a has a high TF-IGF_g^a for a gene g, the word helps distinguish the gene from others, possibly capturing part of its functional description.
Keywords identified for a gene using the Z-score or TF-IGF could conceivably serve at least two distinct purposes. They could be used to characterize or describe the function of the gene as uniquely or distinctly as possible. Here, the focus would be on distinguishing each gene from the others. Alternatively, the keywords might be used to identify possible functional similarities and overlaps between the different genes (indicated by possibly shared functional keywords). In this case, it would be desirable to see the keywords capture as much of the functionality of each gene as possible, rather than emphasize their distinction from other genes.
It appears that the specific choice of background set can impact the appropriateness of the keywords selected for a gene for the two distinct purposes discussed above. In order to obtain keywords that uniquely characterize a gene, the keywords should be associated with the gene in question, but not with any or most of the other genes. A natural background set for this purpose would be one that includes the groups of documents that correspond to each of the genes of interest, and no others. Every document in the background set would be associated with one or more of the genes being studied that we seek to distinguish from one another. There would be no documents in the background set that are unrelated to one gene or another from the set of genes being studied. In the rest of the paper, this background set of documents is referred to as the restricted background set.
On the other hand, suppose the focus were instead on grouping the various genes from the set being studied into clusters based on similarities of function, indicated by any keywords associated with each gene that are shared with at least one other gene. In this scenario, it would be very useful to allow the keywords identified for different genes to overlap somewhat, indicating potential similarities in function between pairs of genes, based on any keywords the pair shares. For this purpose, a background set such as the one described in the preceding paragraph would be inappropriate, because it tends to focus on distinguishing the various genes, rather than on whether they could be similar. A different background set that includes many general documents (including other biomedical documents, not necessarily about any of the genes being studied) might provide a broader and more neutral reference. For instance, the entire MEDLINE® document collection, which includes many documents that are not necessarily about any of the genes in question, could be such a background set. This kind of background set is naturally called unrestricted.
In this work, a restricted background set and an unrestricted background set are created for use in identifying slightly different keyword sets for each gene. The hypothesis, to be verified, is that the former background set is more suitable for selecting keywords that are better for characterizing gene function uniquely, while the latter is more appropriate for selecting keywords used as gene features for functional clustering of genes. The restricted background set is simply formed from the 44 groups of documents that correspond to the 44 yeast genes; it is the union of all these documents, 2,233 in total. The unrestricted background set is the entire collection of 6,791,729 MEDLINE® abstracts (which is a superset of the restricted set, since they were all downloaded at the same time). This entire set is divided randomly into 44 groups, so as to keep the methodology consistent and comparable.
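The construction of the two background sets can be sketched as follows. This is an illustrative Python sketch only: the function names, the (doc_id, text) representation, and the toy inputs are assumptions, not the pipeline actually used on the MEDLINE® data.

```python
import random

def restricted_background(gene_groups):
    """Union of all documents associated with the genes under study;
    a document shared by several genes is included only once."""
    seen, union = set(), []
    for docs in gene_groups.values():
        for doc_id, text in docs:
            if doc_id not in seen:
                seen.add(doc_id)
                union.append((doc_id, text))
    return union

def unrestricted_background(all_docs, n_groups=44, seed=0):
    """Randomly partition the full corpus into n_groups groups, so the same
    group-based metrics (Z-score, TF-IGF) can be applied unchanged."""
    docs = list(all_docs)
    random.Random(seed).shuffle(docs)
    return [docs[i::n_groups] for i in range(n_groups)]

# Toy example: two genes sharing one abstract (document id 2).
gene_groups = {"ace2": [(1, "..."), (2, "...")], "mnn1": [(2, "..."), (3, "...")]}
restricted = restricted_background(gene_groups)            # three distinct documents
partition = unrestricted_background(range(100), n_groups=44)
```

Partitioning the unrestricted corpus into the same number of groups as there are genes keeps the group-based formulas, and the comparison between the two background sets, directly comparable.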

3 Results and Analysis

As indicated before, a set of 44 genes involved in the cell cycle of budding yeast has been chosen for this study, since others have studied them as well. For example, Cherepinsky et al. include a study specifically on gene clustering, where they also include an expert-defined clustering based on functions and transcriptional activators [2]. In this work, that same expert-defined clustering (not shown here) is used as the basis for comparing the quality of clustering.
Using both TF-IGF and the Z-score, with context provided by the restricted and unrestricted background sets, the N top-ranking keywords were generated, varying N from 10 to 100 for each gene. Thus, four combinations of experiments in all were performed for generating keywords and for clustering genes. The top 30 keywords generated by both TF-IGF and Z-scores for three different genes were evaluated by an expert. Using the top N keywords as features, the K-means algorithm was used to compute gene clusters [5]. The flow of data for the computations of Z-score, TF-IGF, and K-means is illustrated in Fig. 1.
For the 44 yeast genes in consideration, the purity of the computed clustering was evaluated relative to the expert-generated clustering from Cherepinsky et al. [2], as mentioned at the beginning of this section. A clustering is a set of sets of genes; each inner set of genes is sometimes called a cluster, so a clustering is a set of clusters. Purity is calculated by first computing, for each inner set of genes in the computed clustering, the best degree of match against any inner set of genes in the expert clustering, and then averaging this measure over all inner sets in the clustering.
For clustering purposes, once all the keywords for each of the 44 genes were identified, any keywords unique to a single gene, that is, those not shared by at least two genes, were eliminated. Note that these eliminated words are very important for an entirely different purpose, namely, describing the potentially unique functional aspects of the respective genes, although they are not that useful for clustering by K-means. A little less than half of the total keywords were unique and eliminated for clustering purposes, leaving a little over half of the total keywords shared by at least two genes.
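This filtering step is simple to state in code. The following Python sketch is illustrative only; the function name and the hypothetical top-keyword lists are assumptions:

```python
def shared_keywords(keyword_lists):
    """Drop keywords unique to a single gene: only words appearing in the
    top-N lists of at least two genes can link genes into clusters."""
    counts = {}
    for words in keyword_lists.values():
        for w in set(words):
            counts[w] = counts.get(w, 0) + 1
    return {gene: [w for w in words if counts[w] >= 2]
            for gene, words in keyword_lists.items()}

# Hypothetical top-keyword lists for two genes.
top = {"ace2": ["cycle", "renal"], "cdc21": ["cycle", "thymidylate"]}
print(shared_keywords(top))  # only "cycle" survives, for both genes
```

The words dropped here are exactly the ones that best serve the other purpose discussed above: describing each gene's unique function.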
Fig. 1. The entire MEDLINE corpus constitutes the unrestricted background set; the restricted background set is the subset of documents that corresponds to the 44 yeast genes. Four sets of keywords are computed, based on Z-scores using the restricted background set (K-Z-R) and the unrestricted one (K-Z-U), as well as based on TF-IGF using the restricted background set (K-T-R) and the unrestricted one (K-T-U). Eventually four clusterings (C-Z-R, C-Z-U, C-T-R, and C-T-U) are computed by K-means using these respective sets of keywords.

3.1 Impact of Different Background Sets – Keyword Quality

An expert was asked to evaluate the top 30 ranking keywords for three genes, namely, ace2, cdc21, and mnn1, from all four combinations of experiments. Not surprisingly, the name of the gene itself is ranked at the top in most cases. According to the expert, the keywords obtained using TF-IGF were better than those based on Z-scores. Contrary to initial expectation, at first cut, the quality of the keywords did not appear to depend significantly on the background set, although there were differences. However, an interesting observation was made for ace2, which is the name of both a yeast gene and a human gene. When Z-scores were computed using the restricted background set, more keywords related to the function of the human gene (renal activity) were selected than with the unrestricted background set. This surprising result has an interesting explanation: the restricted background set results in keywords that are less likely to be shared between the different genes, and keywords related to the human functions of ace2 are less likely to be shared by other yeast genes. This expectation was at the heart of the rationale behind the original hypothesis that keywords selected with the smaller, restricted background set are better for defining the functions of the genes, with a particular focus on their distinctness relative to all the other genes represented in the background set, while those selected with the larger, unrestricted background set are better for clustering! Space considerations prohibit listing the keywords identified for the genes under the four combinations.
3.2 Impact of Different Background Sets – Functional Clustering of Genes


After identifying the sets of keywords associated with the genes of interest, the keywords shared by more than one gene were used as features that form the basis of K-means clustering. An initial set of clustering experiments was conducted separately with the values of TF-IGF and Z-score associated with each keyword as feature weights for the clustering algorithm. Those initial experiments produced clustering results that were not particularly meaningful or interesting, prompting the authors to switch to simple binary weights for the features (keywords) instead. A binary weight was defined as 1 if the word appears in at least one document associated with the gene, and 0 otherwise. Simplistic as these weights are, binary weights on keywords intuitively capture the notion of shared keywords in the context of the clustering algorithm. Experiments were repeated based on the 10, 20, 30, 50, 70, and 100 top-ranking keywords for each gene from each of the lists generated by TF-IGF and Z-scores. Tables 1 and 2 show the purity results for the clusters computed by the K-means algorithm based on keywords generated using TF-IGF and Z-scores, respectively, both within the context of the restricted background set. In the bottom row, the tables also show the total number of distinct keywords used by the clustering algorithm across all 44 genes.

Table 1. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs,
with keywords based on TF-IGF computed in the context of the restricted background set.
Top 10 Top 20 Top 30 Top 50 Top 70 Top 100
Micro purity 0.636 0.659 0.682 0.562 0.500 0.546
Macro purity 0.707 0.723 0.742 0.643 0.559 0.567
Keywords 315 600 830 1383 1833 2530

Table 2. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs,
with keywords based on Z-scores computed in the context of the restricted background set.
Top 10 Top 20 Top 30 Top 50 Top 70 Top 100
Micro purity 0.409 0.477 0.477 0.432 0.432 0.409
Macro purity 0.455 0.523 0.511 0.496 0.489 0.443
Keywords 475 1010 1524 2280 2888 3623
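The binary-weight clustering described above can be sketched as follows. This Python sketch is a minimal stand-in, not the authors' implementation: it uses a plain K-means over 0/1 feature vectors, and the four-gene example and all names are hypothetical.

```python
import random

def binary_matrix(gene_keywords, vocab):
    """Binary feature vectors over a fixed vocabulary:
    1 if the keyword was selected for the gene, 0 otherwise."""
    return {g: [1 if w in kws else 0 for w in vocab]
            for g, kws in gene_keywords.items()}

def kmeans(vectors, k, iters=100, seed=0):
    """Plain K-means with squared Euclidean distance;
    returns a gene -> cluster-index assignment."""
    rng = random.Random(seed)
    genes = list(vectors)
    centers = [list(vectors[g]) for g in rng.sample(genes, k)]
    assign = {}
    for _ in range(iters):
        def dist(g, c):
            return sum((a - b) ** 2 for a, b in zip(vectors[g], centers[c]))
        assign = {g: min(range(k), key=lambda c: dist(g, c)) for g in genes}
        for c in range(k):
            members = [vectors[g] for g in genes if assign[g] == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Hypothetical shared-keyword sets for four genes, two functional pairs.
gene_keywords = {
    "ace2": {"cycle", "division"}, "swi5": {"cycle", "division"},
    "mnn1": {"glycosylation", "membrane"}, "och1": {"glycosylation", "membrane"},
}
vocab = sorted({w for kws in gene_keywords.values() for w in kws})
clusters = kmeans(binary_matrix(gene_keywords, vocab), k=2)
```

With binary weights, genes that share keywords have nearby feature vectors, so the two functional pairs in the toy example land in separate clusters.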

Purity values are averaged across all clusters in two possible ways. As mentioned before, each clustering (which is a set of sets of genes) may be viewed as a set of clusters, where each cluster is a set of genes. Macro-averaging involves simply averaging the purities of the individual clusters across all clusters of a clustering. Since the purity of each cluster is a ratio, the alternative technique of micro-averaging (which is not really a kind of averaging in the mathematical sense) involves taking the ratio of the sum of the numerators to the sum of the denominators, without reducing any of the individual ratios. The micro purity and macro purity rows in the tables refer to the micro-averaged purity and macro-averaged purity across the clusters of the computed clustering.
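The two averaging schemes can be made concrete with a short Python sketch (an illustration under a toy example; the function names and the small clusterings are hypothetical, not the paper's data):

```python
def cluster_purities(computed, expert):
    """For each computed cluster, the best overlap with any expert cluster,
    returned as a (matched, size) pair: numerator and denominator of purity."""
    return [(max(len(c & e) for e in expert), len(c)) for c in computed]

def macro_purity(computed, expert):
    """Average of the per-cluster purity ratios."""
    ratios = [m / n for m, n in cluster_purities(computed, expert)]
    return sum(ratios) / len(ratios)

def micro_purity(computed, expert):
    """Ratio of summed numerators to summed denominators (size-weighted)."""
    pairs = cluster_purities(computed, expert)
    return sum(m for m, _ in pairs) / sum(n for _, n in pairs)

computed = [{"ace2", "swi5", "mnn1"}, {"cdc21"}]
expert   = [{"ace2", "swi5"}, {"mnn1", "cdc21"}]
print(macro_purity(computed, expert))  # (2/3 + 1/1) / 2
print(micro_purity(computed, expert))  # (2 + 1) / (3 + 1)
```

Micro-averaging weights each cluster by its size, while macro-averaging treats every cluster equally, which is why the two rows in the tables can differ.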
From Tables 1 and 2, it is interesting to notice that, when the restricted background set is used in computing the metrics, the purity of the clusters based on keywords identified using TF-IGF is substantially better than that based on Z-scores. The results with TF-IGF are better in terms of both higher purity and fewer keywords than those with Z-scores! Fewer features also allow for faster clustering.
The experiments were continued with the unrestricted background set to compute the TF-IGF and Z-scores, and to select keywords based on those computations. The unrestricted background set has a much larger number of documents; the documents were divided randomly into 44 groups for calculating the IGF and Z-scores. The 10, 20, 30, 50, 70, and 100 top-ranking keywords were once again obtained for each gene from the lists generated based on the TF-IGF and Z-score metrics. Only the keywords shared by at least two genes were considered, and the binary feature weights were used once again. The K-means algorithm was again repeated 1000 times with 9 clusters so as to reach the optimal solution many times. Tables 3 and 4 show the purity results of clusters computed by the K-means algorithm based on keywords generated using TF-IGF and Z-scores, respectively, this time both within the context of the unrestricted background set. As in Tables 1 and 2, the bottom rows of Tables 3 and 4 show the total number of distinct keywords used by the clustering algorithm across all 44 genes.

Table 3. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs, with keywords based on TF-IGF computed in the context of the unrestricted background set.
Top 10 Top 20 Top 30 Top 50 Top 70 Top 100
Micro purity 0.682 0.614 0.682 0.659 0.636 0.636
Macro purity 0.749 0.674 0.640 0.696 0.716 0.674
Keywords 247 417 590 885 1168 1563

Table 4. Clustering results for 9 clusters using binary keyword weights, based on 1000 runs,
with keywords based on Z-scores computed in the context of the unrestricted background set.
Top 10 Top 20 Top 30 Top 50 Top 70 Top 100
Micro purity 0.682 0.659 0.614 0.636 0.636 0.568
Macro purity 0.708 0.699 0.708 0.728 0.630 0.663
Keywords 309 547 747 1139 1526 2067

Tables 3 and 4 indicate that, in the context of the unrestricted background set, the purity of clustering with either metric is comparable to that with the other. The total number of keywords extracted with TF-IGF is always smaller than that with Z-scores, though, indicating faster clustering with TF-IGF.
The purities of the clusterings based on keywords with the N top-ranking Z-scores, computed relative to the restricted and unrestricted background sets, are compared next, from Tables 2 and 4. It can be readily seen that the purity is much better across the board for the clusterings computed in the context of the unrestricted background set than for those computed in the context of the restricted background set. The positive impact of the unrestricted background set is also evident from a comparison of the numbers of keywords used in computing the clusterings for each threshold of top-ranking keywords. Fewer keywords in the context of the unrestricted background set obviously means that more keywords are shared between different genes. These results substantiate the original hypothesis for the Z-score that an unrestricted background set allows for the identification of more shared keywords for genes, and consequently, better clustering by gene function.
Now, consider the purity of the clusterings generated by keywords with the top N TF-IGF scores, computed in the context of the restricted and unrestricted background sets, respectively, from Tables 1 and 3. These results with the TF-IGF metric are less categorical in terms of cluster purity than those with Z-scores, although the purity results with the unrestricted background set are somewhat better or the same in most (actually 75%) of the cases presented, and in all the cases when 50 or more top-ranked keywords are considered. Another interesting point to note is that, as with the Z-score, there is always a smaller number of distinct words among the top N ranking words when the unrestricted background set is used, indicating that more keywords are shared within the context of an unrestricted background set. The chance of more keywords being shared is higher when more keywords are considered, in general, which is borne out by the previous observation that purity is clearly improved with the unrestricted background set when 50 or more keywords are considered. These results once again lend substantial support to the original hypothesis that, even for TF-IGF, use of a broader or unrestricted background set is better for functional clustering of genes than a narrower or more restricted one.

4 Summary and Conclusion

In this paper, two metrics have been reviewed for identifying keywords that have a strong association with a particular concept of interest, such as a gene, based on the prevalence of the keyword in documents that are about the concept, contrasted with the keyword's distribution in a general "background" set of documents. The two metrics used in working with a set of 44 yeast genes are the standard statistical metric of Z-score and an extension of the classic TF-IDF weight metric from information retrieval, which has been named TF-IGF. The initial hypothesis is that different choices of background sets of documents lead to keywords with somewhat different properties, suitable for different purposes.
In relation to the ability of keywords to uniquely characterize the genes, especially as distinguished from other genes, TF-IGF seemed to yield somewhat better keywords, as judged by an expert. Some weak evidence was also found for the hypothesis that a restricted background set might be more suitable for identifying keywords that are likely to uniquely characterize the genes, in the context of the Z-score.
As for clustering of genes, TF-IGF produced keywords that led to clustering with better
purity than the Z-score, with either background set. The results were also achieved with
fewer keywords with TF-IGF than with the Z-score, an additional bonus that leads to
faster clustering. In addition, strong evidence was found for the hypothesis that an
unrestricted background set is more suitable, with either Z-score or TF-IGF, for identifying
keywords that could potentially be shared between different genes and thus more suitable
516 V. G. Dasigi et al.

for use in the K-means clustering algorithm. The evidence was supportive of the original
hypothesis in two respects: a higher averaged cluster purity was obtained, and with fewer
keywords, with the unrestricted background set, irrespective of whether Z-score or TF-IGF
was used to identify the keywords.
A final observation about our hypothesis on the impact of the choice of background
set on keyword quality, both for characterizing each gene and for clustering of
genes, needs to be made in relation to the Z-score metric. For characterizing each
gene distinctly, it is important to identify as many unique keywords as possible for
each gene (preferably not shared with many other genes). For clustering of genes
based on shared function, it is important to allow more keywords with strong
association to the genes to be shared between multiple genes. The results
presented in Sects. 3.1 and 3.2 support this hypothesis much more decisively for the
Z-score metric than for TF-IGF. Indeed, this is not so surprising, because the
hypothesis has a more intuitive basis in the definition of the Z-score metric!

Acknowledgments. The authors acknowledge that the MEDLINE® data used in this research are
covered by a license agreement supported by the U.S. National Library of Medicine. Thanks are
also due to Professor Rajnish Singh (Kennesaw State University) for her assistance in relation to
evaluating the keywords for the various genes, and for her help in other ways related to this work.

A Cloud-Based Decision Support System
Framework for Hydropower Biological Evaluation

Hongfei Hou1,2(✉), Zhiqun Daniel Deng1,3, Jayson J. Martinez1, Tao Fu1, Jun Lu1,
Li Tan2, John Miller2, and David Bakken4

1 Pacific Northwest National Laboratory, Energy and Environment Directorate,
Richland, WA 99352, USA
hongfei.hou@wsu.edu
2 School of Engineering and Applied Sciences, Washington State University Tri-Cities,
2710 Crimson Way, Richland, WA 99354, USA
3 Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA, USA
4 School of Electrical Engineering and Computer Science, Washington State University,
355 NE Spokane St., Pullman, WA 99163, USA

Abstract. Hydropower is one of the most important energy sources: it accounts
for more than 80% of the world's renewable electricity and 16% of the world's
electricity. Significantly more hydropower capacity is planned to be developed.
However, hydro-structures, including hydroelectric dams, may have adverse
biological effects on fish, especially on migratory species. For instance, fish can
be injured or even killed when they pass through turbines. This is why biological
evaluations on hydro-structures are needed to estimate fish injury and mortality
rates. The Hydropower Biological Evaluation Toolset (HBET) is an integrated
suite of science-based desktop tools designed to evaluate whether the hydraulic
conditions of hydropower structures are fish friendly by analyzing collected data
and providing estimated injury and mortality rates. The Sensor Fish, a small
autonomous sensor package, is used by HBET to record data describing the
conditions that live fish passing through a hydropower structure will experience.
In this paper, we present a plan to incorporate cloud computing into HBET, and
migrate it into a cloud-based decision support system framework for hydropower
biological evaluation. These enhancements will make the evaluation system more
scalable and flexible; however, they will also introduce a significant challenge:
how to maintain security while retaining scalability and flexibility. We discuss
the technical methodologies and algorithms for the proposed framework, and
analyze the relevant security issues and associated security countermeasures.

Keywords: Decision support system · Hydropower · Dam · Fish injury ·
Fish-friendly turbine

1 Introduction

A decision support system (DSS) is a type of interactive knowledge-based software that
uses predefined models to process data input from various data sources to help
businesses and organizations in decision-making activities [1]. A DSS is composed of
three fundamental components [2]: a data management component imports/stores data
and provides data access to other components; a decision-making component, containing
predefined decision-making models, compiles useful information from the data provided
by the data management component to make decisions; and a presentation component
enables users to interact with the system (Fig. 1). A DSS will be incorporated into the
Hydropower Biological Evaluation Toolset (HBET), an integrated suite of science-based
desktop tools designed to evaluate the degree to which the hydraulic conditions
of hydropower structures (e.g., turbine, spillway, overshot weir, undershot weir, and
pumped storage) affect entrained fish by analyzing the collected data and providing
estimated injury and mortality rates based on experimentally derived, species-specific
dose-response relationships [3].

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 517–529, 2019.
https://doi.org/10.1007/978-3-030-02686-8_39

Fig. 1. Architecture of DSS [4].
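The three-component architecture above can be sketched as plain classes. Everything here (the class and method names, the toy pressure-drop rule) is illustrative and not part of HBET:

```python
# Illustrative sketch of the three DSS components described above.
# All names and the decision rule are hypothetical, not taken from HBET.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataManagement:
    """Imports/stores records and provides data access to other components."""
    _store: List[Dict] = field(default_factory=list)

    def import_record(self, record: Dict) -> None:
        self._store.append(record)

    def records(self) -> List[Dict]:
        return list(self._store)


@dataclass
class DecisionMaking:
    """Applies a predefined model to data supplied by the data management component."""
    model: Callable[[Dict], str]

    def decide(self, dm: DataManagement) -> List[str]:
        return [self.model(r) for r in dm.records()]


@dataclass
class Presentation:
    """Formats decisions for the user-facing layer."""
    def render(self, decisions: List[str]) -> str:
        return "\n".join(f"- {d}" for d in decisions)


# Wire the three components together.
dm = DataManagement()
dm.import_record({"pressure_drop_psi": 3.2})
dm.import_record({"pressure_drop_psi": 11.8})

# Toy rule standing in for a predefined decision model.
engine = DecisionMaking(model=lambda r: "flag" if r["pressure_drop_psi"] > 10 else "ok")
ui = Presentation()
print(ui.render(engine.decide(dm)))
```

The point of the separation is that the decision model can be swapped (new species, new study type) without touching data storage or presentation.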

Hydropower is one of the most important energy sources, accounting for more than
80% of the world’s renewable electricity and about 16% of the entire world electricity
supply [5]. Significantly more hydropower capacity is planned to meet demand [5].
However, hydro-structures, including hydroelectric dams and hydraulic turbines, may
have adverse biological effects on fish, especially on migratory species. For example,
fish can be injured or even killed when they pass through turbines [6–10]. This is why
biological evaluations of hydro-structures are needed to estimate fish injury and
mortality rates.
HBET uses the Sensor Fish (SF), a small autonomous sensor package [11], to collect
data describing the conditions that would be experienced by live fish passing through
a hydro-structure. SF and HBET can support evaluations of turbines and sites, including
the physical components (barriers, trash racks, spillways, etc.) fish interact with
during downstream passage, to identify the most fish-friendly alternatives.
Currently, HBET is platform dependent and is available only to users who have access
to computers where HBET has been installed. To increase its availability and usage, we
will incorporate cloud computing into HBET and migrate it into a cloud-based DSS
framework for hydropower biological evaluation. This will make HBET available to
users wherever they are, as long as they have an internet connection, and users will
always use the latest version without installation or upgrading. It will also make
HBET more scalable and flexible in incorporating new dose-response relationships, new
fish species, and new study types. However, at the same time, it introduces a
significant concern: how to maintain system security without adversely impacting
scalability and flexibility, so that no proprietary information is compromised. In this
paper, we discuss the technical methodologies and algorithms for the proposed
framework, and analyze the relevant security issues and associated security
countermeasures.

2 Overview of the Framework

The framework contains three major components, which reside in the cloud (Fig. 2).
The first component is data acquisition and integration (DAI), which contains modules
that receive data. There are two types of data sources for hydropower evaluation: the
internal database and external SF files. The second component is decision-making (DM),
which contains modules that estimate injury and mortality rates for different fish
species. For example, one module could estimate the barotrauma mortal injury rate, and
another, the major injury rate due to shear. Currently, HBET can assess strike, shear,
and barotrauma stressors; the fish species it supports include Chinook salmon,
Australian bass, gudgeon, Murray cod, and silver perch. New stressors or new fish
species can be added through this component. The third component is data validation
and self-monitoring (DVSM), which contains modules that validate input data for
DAI modules and monitor the outputs and behaviors of DM modules. For each module in
the first two components, there will be a corresponding data validation and
self-monitoring module. Each component can adopt as many modules as needed, and these
modules can be used for different purposes (i.e., different fish species, different
study types, and so on). For example, the DM component can contain DM modules for
Chinook salmon and for Australian bass. Similar to a typical DSS, the proposed
framework also includes a knowledge base containing information such as rules, logic,
and corresponding conditions.

Fig. 2. Architecture of the proposed framework.

In the proposed framework, we will implement four countermeasures to address
cloud-related security concerns:
1. Introduce DVSM modules into the framework. Data validation is used to filter out
invalid input data. Self-monitoring will use data mining to predict each DM module's
output, and compare the predicted result with the actual output from the DM module
to determine whether the output is expected. Self-monitoring will also monitor modules'
behavior, such as resource usage and execution time (i.e., the proposed framework
will monitor its own behavior during runtime).
2. Use data encryption so that data cannot be interpreted even if it is exposed to
unauthorized users.
3. Use a login token and temporary password so that any call to the cloud interfaces
and APIs requires a valid login.
4. Create a module set for each study type so that a module set's failure in one study
type does not affect other study types.

3 System Security and Countermeasures

In order to analyze security levels of the proposed framework, we first need to identify
its vulnerabilities. There are common vulnerabilities that exist in all types of DSSs, such
as security issues in account authentication and lack of security education [12]. In this
research, we focus on the vulnerabilities that exist only in cloud-based DSSs, but not in
desktop ones:
1. Insecure cloud interfaces and APIs. Cloud-based systems provide cloud interfaces
and APIs [13] through which to communicate with other systems and/or devices,
and thus their security will depend on the security of those interfaces and APIs.
These issues include insecure cloud interfaces, immature cloud APIs, insufficient
input data validation, and insufficient self-monitoring [14].
2. Resource overbooking. Resources can be overused if the modules in the cloud-based
DSS are modeled inaccurately [15]. This can also happen if attackers intentionally
design a module to allocate or occupy resources without limits. If resource
overbooking occurs, the services of a cloud-based DSS will become unavailable (i.e.,
the DSS will be inaccessible). Typical methods employed by attackers to overbook
resources include unlimited memory allocation, unlimited occupation of storage,
and unlimited occupation of bandwidth.
3. Data exposure. Input data should only be accessible and exposed to the desired DAI
modules, and output from DM modules should only be exposed to the desired devices
or DVSM modules. However, since the data or training set data are saved in the
cloud-based database, they can be co-located with the data owned by competitors or
intruders because of weak separation [16].
4. Vulnerabilities in virtual machines and hypervisors. Cloud-based systems run
in virtual machines or hypervisors. Compromises of virtual machines and
hypervisors may introduce data leakage [17] and resource overbooking.
Threats in cloud computing will also exist in a cloud-based DSS. There are 12 top
security threats faced by cloud-based services, called the "Treacherous 12" [18], as
shown in the first 12 rows of Table 1.
“Data breach” is always the major concern for all systems, including both desktop-
based systems and cloud-based ones. “Data encryption” can be applied to saved data so
that data cannot be interpreted even if it is breached. “Insufficient identity, credential,
and access management,” “system vulnerabilities”, “account or service hijacking”,
“malicious insiders”, “advanced persistent threats”, “data loss”, and “insufficient due
diligence” can bring security risks to input data, stored data, and systems themselves.
In this research, we aim to maintain flexibility and scalability but retain security when
migrating a desktop DSS into a cloud-based one. Thus, we will only focus on the threats
that are unique to cloud-based systems and some threats that are major concerns: “data
breaches”, “insecure cloud interfaces and cloud APIs”, “abuse and nefarious use of cloud
services”, “denial of services”, and “shared-technology vulnerabilities”. The counter‐
measures we will implement in the proposed framework are to address the focused
threats.

Table 1. Threats in cloud-based DSS


Data breaches: Data will be breached if it is accessed by unauthorized services or
function calls, or when authorized services or function calls use the data in an
improper way. Data breaches are not unique to cloud-based DSSs, but they are the top
concern for cloud-based DSS users [15, 19].

Insufficient identity, credential, and access management: If identity, credential, and
access management is not sufficient, sensitive data can be exposed to unauthorized
entities, and data and applications can be manipulated unexpectedly [19].

Insecure cloud interfaces and cloud APIs: Cloud interfaces and cloud APIs are the
fundamental parts of cloud-based DSSs; they are the bridges between system components
and databases. If the cloud interfaces and APIs are not secure, attackers can use them
to access data and submit operations as often as they wish.

System vulnerabilities: System vulnerabilities include bugs or issues in operating
systems or software. Exploiting system vulnerabilities is a common way for attackers
to carry out their attacks.

Account or service hijacking: After hijacking an account or service, attackers can
bypass the authentication process and then pretend to be legitimate users, operators,
or software developers in order to achieve their goals [19].

Malicious insiders: Malicious insiders can cause much more damage than other threats.
For example, a system administrator can access any data and any application, and thus
can inflict any kind of damage.

Advanced persistent threats: Advanced persistent threats (APTs) are cyberattacks used
to gain control over systems in order to steal data.

Data loss: Stored input data or training set data can be deleted or erased once
attackers take control of a system. This can also occur because of human error, but
that is not the focus of this research.

Insufficient due diligence: Without due diligence, the wrong technologies or system
configurations can be applied, introducing a potentially large risk.

Abuse and nefarious use of cloud services: If cloud services are not secured, they can
be abused to achieve certain specific goals; for example, email spam.

Denial of services: If a resource is overused, the system may have no resources left
to process incoming legitimate requests.

Shared-technology vulnerabilities: Sharing technology makes cloud services more
scalable; however, it brings vulnerabilities at the same time.

Insecure virtual machines and hypervisors: If the virtual machines or hypervisors of a
cloud-based system are not secure, the cloud-based system will be at risk.

The First Countermeasure is to Introduce DVSM Modules into the Framework.
Each time a new DM module is added to the system, the system will use the input
domain (e.g., Turbine) and subdomain information (e.g., Francis) to find a matching
DVSM module, and then create a new instance of the module found for the newly added
module. If there is no existing module, the system will display user interfaces to
request the information needed to generate the corresponding DVSM module. There are
three steps to collect this information. The first step is to collect information
about data validation, such as data type or data sequence format. The second step is
to collect information about self-monitoring, such as conditions and the corresponding
actions when the conditions are met; for example, if execution time exceeds 30 s,
change the module's status to "Suspicious". The third step is to input a training data
set, which will be used for the self-monitoring part of the newly generated module.
The training data set will be saved into the database for further reference. The newly
generated DVSM module and training data set will be reviewed and verified before being
put into use.

In the data validation part of each DVSM module, we use a structured validation
consisting of several operations applied to all newly acquired data [20]. The first
operation checks whether the input datasets are in the correct format. For example,
the dataset collected from a Sensor Fish should be "<pressure><acceleration_x>
<acceleration_y><acceleration_z><temperature><voltage><rotation_x><rotation_y>
<rotation_z><magnetic_x><magnetic_y><magnetic_z>". The second operation checks the
data type of each field; for example, the data type of "pressure" should be "float".
The third operation is a data range check, to make sure the acquired data are within
reasonable limits; for example, the pressure should be greater than 0 psi. The last
operation, a data frequency check, makes sure that data are collected at the expected
intervals.

In the self-monitoring part of each DVSM module, we will implement the k-nearest
neighbors algorithm (KNN), a non-parametric algorithm with lazy learning [21], for
data mining on modules' outputs and behaviors. We chose KNN for the following reasons:
KNN is efficient because the lazy learning algorithm can use the training data set
without any generalization; KNN has been used widely and can be applied to data with
arbitrary distributions because it is non-parametric; and KNN is ranked among the top
10 data mining algorithms [22].
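The four validation operations above can be sketched as follows. The field layout follows the SF record format described in the text, while the sample interval, tolerance, and range limits are illustrative assumptions rather than HBET's actual values:

```python
# Sketch of the four-step structured validation described above:
# format, type, range, and frequency checks. Thresholds are assumptions.

FIELDS = ["pressure", "acceleration_x", "acceleration_y", "acceleration_z",
          "temperature", "voltage", "rotation_x", "rotation_y", "rotation_z",
          "magnetic_x", "magnetic_y", "magnetic_z"]

def validate_record(record: dict, expected_interval_s: float,
                    timestamps: list) -> list:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    # 1. Format check: every expected field present, nothing extra.
    if set(record) != set(FIELDS):
        errors.append("format: field set does not match SF layout")
    # 2. Type check: each field must be a float.
    for name in FIELDS:
        if name in record and not isinstance(record[name], float):
            errors.append(f"type: {name} is not float")
    # 3. Range check: e.g., pressure must be greater than 0 psi.
    if isinstance(record.get("pressure"), float) and record["pressure"] <= 0.0:
        errors.append("range: pressure must be > 0 psi")
    # 4. Frequency check: consecutive samples must arrive at the expected
    #    interval (10% tolerance, an assumed figure).
    for t0, t1 in zip(timestamps, timestamps[1:]):
        if abs((t1 - t0) - expected_interval_s) > 0.1 * expected_interval_s:
            errors.append(f"frequency: gap {t1 - t0:.4f}s at t={t0:.4f}")
    return errors

good = {name: 1.0 for name in FIELDS}
good["pressure"] = 14.7
print(validate_record(good, 0.001, [0.000, 0.001, 0.002]))  # []

bad = dict(good, pressure=-1.0)
print(validate_record(bad, 0.001, [0.000, 0.005]))
```

A DVSM module would run checks like these before any data reach the corresponding DAI module.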
In this research, we use SF as the data source. For each study, we deploy SFs at the
desired study site to obtain a sufficient sample size for the statistical analysis and
required precision. Each time an SF is released, the corresponding hydro-structure's
environmental characteristics are recorded. After all SFs are released and recovered,
the data files are downloaded from the SFs. DAI modules will upload these downloaded
SF files into the system, and then pass the interpreted data to the hydropower
evaluation DM modules. DVSM modules will use the attributes shown in Table 2 to
monitor the outputs from the DM modules.
Table 3 shows part of the training data set, which contains combinations of attribute
values and expected outputs. Multiple classes describe the injury rates associated
with the corresponding stressors: BMIR refers to the barotrauma mortal injury rate,
and SMIR refers to the shear major injury rate.

Table 2. Attribute list to monitor HBET decision-making module


DN: Domain name, such as Hydropower Biological Evaluation (represented as an integer;
e.g., "1"). Read from the configuration files.
STN: Study-type name, such as Turbine (represented as an integer; e.g., "1"). Read
from the configuration files.
SSTN: Sub-study-type name, such as Francis (represented as an integer; e.g., "0").
Read from the configuration files.
FS: Fish species studied, such as Chinook salmon (represented as an integer; e.g., "11").
AFD: Actual total flow discharge of the study site, in thousands of cubic feet per second.
TFD: Target total flow discharge of the study site, in thousands of cubic feet per second.
APG: Actual power generation of the study site, in megawatts.
TPG: Target power generation of the study site, in megawatts.
BP: Barometric pressure measured when the SF is released, in pounds per square inch.
ERD: Estimated release depth when the SF is released, in feet.
BA: Blade angle of the turbine, in percent.
WGO: Wicket gate opening, in percent.
TE: Tailwater elevation of the study site, in feet.
FB: Forebay elevation of the study site, in feet.
HHE: Hydraulic head elevation of the study site, in feet.
Table 3. Training data to monitor HBET decision-making module


DN 1 1 1 1 1
STN 1 1 1 1 1
SSTN 0 0 0 0 0
FS 11 11 11 11 11
AFD 50.087894 50.017833 50.143431 50.094383 50.043149
TFD 80 80 80 80 80
APG 92.438843 91.543232 93.431293 91.738209 91.637234
TPG 150 150 150 150 150
BP 14.721021 14.697332 14.719908 14.716734 14.700632
ERD 127.989454 124.548293 126.431829 125.438219 126.008943
BA 0.15 0.15 0.15 0.15 0.15
WGO 0.57 0.57 0.57 0.57 0.57
TE 17.895445 17.047384 18.089433 17.894343 18.047854
FB 120.483943 119.483943 120.894320 119.823343 120.439083
HHE 102.588498 102.436559 102.804887 101.929000 102.391229
BMIR 0.045684 0.053612 0.047534 0.048893 0.051234
SMIR 0.021367 0.031267 0.029123 0.035623 0.013434

After processing each SF data file, DVSM modules will retrieve the corresponding
information for each attribute shown in Table 2 to generate a vector, which is then
used to calculate the Euclidean distance (ED), the square root of the sum of the
squares of the differences between the corresponding values of two vectors (Eq. 1)
[23], against each row of the training set.

ED(x, y) = √( Σᵢ₌₁ⁿ (xᵢ − yᵢ)² )    (1)

After calculating the EDs for all rows in the training set, the system adds the
results as a new column to the training set and sorts it by ED in ascending order.
The predicted class is the majority class among the top K rows, and it is compared
with the result produced by the corresponding DM module for the given input data set.
Comparison results are accumulated to calculate the error rate. In this research, we
choose K = 11 based on the accuracy shown in Fig. 3.

Fig. 3. Relationship between value of K and accuracy of KNN.
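The distance-and-vote procedure above can be sketched directly. The tiny two-attribute training set here is invented for illustration; real rows would hold the Table 2 attribute vectors and their expected injury-rate classes:

```python
# Minimal sketch of the self-monitoring prediction step described above:
# Euclidean distance (Eq. 1) against every training row, then a majority
# vote over the K nearest rows. The training data below are made up.

import math
from collections import Counter

def euclidean_distance(x, y):
    # Eq. 1: square root of the sum of squared component differences.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(query, training_rows, k):
    """training_rows: list of (vector, class_label) pairs.
    Returns the majority class among the k nearest rows."""
    ranked = sorted(training_rows,
                    key=lambda row: euclidean_distance(query, row[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy vectors standing in for (AFD, APG, ...) attribute rows.
training = [
    ([50.1, 92.4], "expected"),
    ([50.0, 91.5], "expected"),
    ([50.1, 93.4], "expected"),
    ([80.3, 40.2], "suspicious"),
    ([79.8, 41.0], "suspicious"),
]

predicted = knn_predict([50.05, 92.0], training, k=3)
print(predicted)  # expected

# A DVSM module would compare `predicted` with the DM module's actual
# output and accumulate mismatches into an error rate.
```

With the real data, `query` would be the attribute vector built from one SF file and `k` would be 11, as chosen above.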

The Second Countermeasure is to Use Data Encryption. When registering to use
the cloud-based framework, an organization will be provided a public/private key pair.
Input data will be interpreted by the DAI module, then encrypted using the provided
public key and saved into the database. The training data set will also be encrypted
and saved into the database. Any time a DM module retrieves data from the database,
the private key will be used to decrypt the data for further processing. The public
key is shared publicly, and the private key file is distributed by the organization
only to authorized users and saved on an encrypted USB drive. When using the
cloud-based DSS, the USB drive holding the private key file must be connected.
Without the private key, the data cannot be interpreted.
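A minimal sketch of this encrypt-on-write, decrypt-on-read flow, using the third-party pyca/cryptography package as an assumed choice (the paper does not name a library). Note that RSA can only encrypt small payloads, so a real deployment would wrap a symmetric data key (hybrid encryption); the record here is kept small for clarity:

```python
# Sketch of the public-key flow described above, under the assumption that
# the pyca/cryptography package is used. Key sizes and the record content
# are illustrative.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Key pair issued to the organization at registration time.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# DAI side: encrypt interpreted input data with the public key before storage.
record = b"pressure=14.72,release_depth=127.99"
ciphertext = public_key.encrypt(record, oaep)

# DM side: decrypt with the private key, which stays on the user's
# encrypted USB drive and never enters the cloud database.
plaintext = private_key.decrypt(ciphertext, oaep)
print(plaintext == record)  # True
```

The design point is the asymmetry: the cloud side only ever needs the public key, so a database compromise exposes ciphertext alone.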

The Third Countermeasure is to Use a Login Token and Temporary Password.
A login token is generated when a user logs into the system, and it expires whenever
the user logs out or the session times out. When the server side processes a login
request, it first validates it by checking the username and password. If the validation
passes, it then sends a temporary password to the user's email or cell phone, based on
the user's selection. The user must input the correct temporary password for the system
to successfully log the user in and generate the login token. The generated token is
used in each call of the cloud interfaces and application programming interfaces (APIs)
as one of the properties in the parameter JavaScript Object Notation (JSON) object.
When the server side of the cloud-based DSS receives a service request with the
passed-in JSON object, it first retrieves and validates the login token. If the login
token is valid, the system processes the request and moves forward; otherwise, the
request is discarded.
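The two-step login and per-request token check can be sketched with the standard library. Token storage, out-of-band delivery of the temporary password, and the expiry policy are all simplified assumptions:

```python
# Stdlib sketch of the login flow described above: password check, one-time
# temporary password, then a session token carried in every JSON request.

import json
import secrets
import time

SESSIONS = {}          # token -> expiry timestamp
PENDING = {}           # username -> one-time temporary password
SESSION_TTL_S = 1800   # assumed 30-minute session timeout

def start_login(username: str, password: str, stored_password: str) -> str:
    """Step 1: verify credentials, then issue a temporary password that
    would be delivered out of band (email or cell phone)."""
    if password != stored_password:           # real systems compare salted hashes
        raise PermissionError("bad credentials")
    PENDING[username] = secrets.token_hex(3)  # short one-time code
    return PENDING[username]                  # stands in for the email/SMS send

def finish_login(username: str, temp_password: str) -> str:
    """Step 2: verify the one-time code and mint the session token."""
    if PENDING.pop(username, None) != temp_password:
        raise PermissionError("bad temporary password")
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = time.time() + SESSION_TTL_S
    return token

def handle_request(raw_json: str) -> str:
    """Every cloud interface/API call carries the token in its JSON parameters."""
    token = json.loads(raw_json).get("token")
    if SESSIONS.get(token, 0) > time.time():
        return "processed"
    return "discarded"

code = start_login("alice", "s3cret", "s3cret")
token = finish_login("alice", code)
print(handle_request(json.dumps({"token": token, "study": "Turbine"})))    # processed
print(handle_request(json.dumps({"token": "forged", "study": "Turbine"})))  # discarded
```

Logging out or timing out simply deletes the token from the session table, which invalidates all further requests that carry it.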

The Fourth Countermeasure is to Create a Module Set for Each Study Type. For
example, for a Turbine study, there will be a module set including a DAI module and
its corresponding DVSM module, and a DM module and its corresponding DVSM module.
Thus, the failure of any module in a Turbine study's module set will be isolated from
other study types.
Besides applying the above-mentioned countermeasures, we will also act quickly on
suggestions by system providers, such as upgrading or installing patches. Table 4
shows the specific countermeasures proposed for the threats that may be encountered.

Table 4. Threats and countermeasures


Data breaches: data encryption; login token and temporary password; self-monitoring.
Insufficient identity, credential, and access management: data encryption; login token
and temporary password; self-monitoring.
Account or service hijacking: login token and temporary password; self-monitoring.
Advanced persistent threats: login token and temporary password; self-monitoring.
Insecure cloud interfaces and cloud APIs: login token and temporary password;
self-monitoring.
Abuse and nefarious use of cloud services: data validation and scanning;
self-monitoring.
Denial of services: data validation and scanning; self-monitoring.
Shared-technology vulnerabilities: independent component set; data encryption; login
token and temporary password; data validation and scanning; self-monitoring.

For threats not listed in Table 4, we will take the actions suggested by system
providers. By applying the latest upgrades and installing the latest patches, we can
prevent the security risks due to "system vulnerabilities". By improving employee
screening and hiring practices, we can reduce the issues that can be caused by
"malicious insiders". Providing sufficient security education will significantly
improve the security level against risks caused by "insufficient due diligence" [24].
"Data loss" is caused mainly by human error; by educating employees and incorporating
login tokens and temporary passwords, organizations can significantly reduce data
loss. By configuring the virtual machine as suggested by the vendors, applying all
security patches, installing all security upgrades, and pursuing regular monitoring,
the risks introduced by "virtual machine vulnerabilities" will be controlled. DVSM is
the fundamental component in the proposed framework; compared to existing methods,
this component will not only monitor the input data and the outputs from DM modules,
but also monitor the modules' behaviors. Under these conditions, the proposed
framework can maintain security when migrating from a desktop DSS.

4 Conclusion

To increase the availability and usage of HBET, we will incorporate cloud computing
and migrate it into a cloud-based DSS framework. This will make the systems more
scalable and flexible. To maintain security while retaining scalability and flexibility, we
will implement several security countermeasures in the proposed framework. By
applying data encryption, login tokens and temporary passwords, and data validation
and self-monitoring, the proposed DSS framework can address threats including data
breaches; insufficient identity, credential, and access management; account or service
hijacking; advanced persistent threats; insecure cloud interfaces and cloud APIs; abuse
and nefarious use of cloud services; denial of services; and shared-technology
vulnerabilities. For threats not mentioned above, we will take actions suggested by system
providers. By applying the latest upgrades and installing the latest patches, we can
prevent security risks due to system vulnerabilities. By improving employee screening
and hiring practices, we can reduce issues caused by malicious insiders. Providing
sufficient security education will significantly reduce risks caused by insufficient due
diligence. Data loss is caused mainly by human error. By educating the employees and
incorporating login tokens and temporary passwords, organizations can significantly
reduce data loss. By configuring the virtual machine as suggested by the vendors,
applying all security patches, installing all security upgrades, and pursuing regular
monitoring, risks introduced by virtual machine vulnerabilities will be controlled. We
conclude that the proposed framework can maintain security when migrating from a
desktop DSS. For future work, we will use this paper as the basis to implement the
proposed cloud-based DSS framework and deploy it into the cloud.

Acknowledgments. The work described in this article was funded by the U.S. Department of
Energy Water Power Technologies Office.

References

1. Power, D.J.: Decision Support Systems: Concepts and Resources for Managers. Greenwood
Publishing Group, Santa Barbara (2002)
2. Sage, A.P.: Decision Support Systems Engineering, 1st edn. Wiley, Hoboken (1991).
ISBN-10: 047153000X, ISBN-13: 978-0471530008
3. Hou, H., Deng, Z.D., Martinez, J., Fu, T., Duncan, J.P., Johnson, G.E., Lu, J., Skalski, J.R.,
Townsend, R.L., Tan, L.: A hydropower biological evaluation toolset (HBET) for
characterizing hydraulic conditions and impacts of hydro-structures on fish. Energies 11(4),
990 (2018)
4. Turban, E., Aronson, J.E.: Decision Support Systems and Intelligent Systems, 6th edn.
Prentice Hall, Upper Saddle River (2001). ISBN:0130894656, 9780130894656
5. REN21: Renewables 2016 Global Status Report (Paris: REN21 Secretariat) (2016). ISBN:
978-3-9818107-0-7
6. Brown, R.S., Colotelo, A.H., Pflugrath, B.D., Boys, C.A., Baumgartner, L.J., Deng, Z.D.,
Silva, L.G.: Understanding barotrauma in fish passing hydro structures: a global strategy for
sustainable development of water resources. Fisheries 39(3), 108–122 (2014)
7. Cada, G.F.: The development of advanced hydroelectric turbines to improve fish passage
survival. Fisheries 26(9), 14–23 (2001)
8. Cushman, R.M.: Review of ecological effects of rapidly varying flows downstream from
hydroelectric facilities. N. Am. J. Fish. Manag. 5(3A), 330–339 (1985)
9. Pracheil, B.M., DeRolph, C.R., Schramm, M.P., Bevelhimer, M.S.: A fish-eye view of
riverine hydropower systems: the current understanding of the biological response to turbine
passage. Rev. Fish Biol. Fish. 26(2), 153–167 (2016)
10. Trumbo, B.A., Ahmann, M.L., Renholds, J.F., Brown, R.S., Colotelo, A.H., Deng, Z.D.:
Improving hydroturbine pressures to enhance salmon passage survival and recovery. Rev.
Fish Biol. Fish. 24(3), 955–965 (2014)
11. Deng, Z.D., Lu, J., Myjak, M.J., Martinez, J.J., Tian, C., Morris, S.J., Carlson, T.J., Zhou, D.,
Hou, H.: Design and implementation of a new autonomous sensor fish to support advanced
hydropower development. Rev. Sci. Instrum. 85(11), 115001 (2014)
12. Hashizume, K., Rosado, D.G., Fernández-Medina, E., Fernandez, E.B.: An analysis of
security issues for cloud computing. J. Internet Serv. Appl. 4, 5 (2013)
13. Dawoud, W., Takouna, I., Meinel, C.: Infrastructure as a service security: challenges and
solutions. In: The 7th International Conference on Informatics and Systems (INFOS), pp. 1–
8. IEEE Computer Society (2010)
14. Carlin, S., Curran, K.: Cloud computing security. Int. J. Ambient Comput. Intell. 3(1), 14–
19 (2011)
15. Catteddu, D.: Cloud computing: benefits, risks and recommendations for information
security. In: Serrão, C., Aguilera Díaz, V., Cerullo, F. (eds.) Web Application Security.
Communications in Computer and Information Science, vol. 72. Springer, Berlin (2010)
16. Viega, J.: Cloud computing and the common man. Computer 42(8), 106–108 (2009)
17. Rittinghouse, J.W., Ransome, J.F.: Cloud Computing: Implementation, Management, and
Security. CRC Press, Boca Raton (2009). ISBN 9781439806807
18. Violino, B.: The Dirty Dozen: 12 Top Cloud Security Threats for 2018. CSO Online (2018)
19. Cloud Security Alliance: Top Threats to Cloud Computing. V1.0 (2010)
20. Zio, M.D., Fursova, N., Gelsema, T., Gießing, S., Guarnera, U., Petrauskien, J., Kalben, L.Q.,
Scanu, M., Bosch, K.O., Loo, M., Walsdorfer, K.: Methodology for Data Validation 1.0.
ESSnet ValiDat Foundation (2016)
A Cloud-Based Decision Support System Framework 529

21. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am.
Stat. 46(3), 175–185 (1992)
22. Zhang, Z.: Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med. 4(11),
218 (2016)
23. Shirkhorshidi, A.S., Aghabozorgi, S., Wah, T.Y.: A comparison study on similarity and
dissimilarity measures in clustering continuous data. PLoS ONE 10(12), e0144059 (2015)
24. Popovic, K., Hocenski, Z.: Cloud computing security issues and challenges. In: Proceedings
of the 33rd International Convention MIPRO, pp. 344–349. IEEE Computer Society,
Washington DC (2010)
An Attempt to Forecast All Different Rainfall
Series by Dynamic Programming Approach

Swe Swe Aung1,3(&), Shin Ohsawa2, Itaru Nagayama3,


and Shiro Tamaki3
1
Department of Software, University of Computer Studies, Taunggyi, Myanmar
2
Weathernews Inc., Okinawa, Japan
3
Department of Information Engineering,
University of the Ryukyus, Okinawa, Japan
{sweswe,nagayama,shiro}@ie.u-ryukyu.ac.jp

Abstract. Unexpected heavy rainfall occurs in most parts of the world, especially
during the monsoon season. As a serious consequence, people in areas battered by
heavy rainfall face many hardships, and prevention is the best way of minimizing
these negative effects. We therefore developed a rainfall series prediction
system for different series patterns by applying the dynamic programming
approach, aiming to estimate the rainfall level over the whole rainfall cycle. The
simple idea behind the proposed dynamic programming approach is to find the
similarity of two rainfall sequences based on the maximum match of the rainfall
levels of those sequences. Based on 2011 and 2013 real data sets collected from
the WITH radar, which is installed on the rooftop of the Information Engineering
building at the University of the Ryukyus, we compare the conventional approach
(polynomial regression) with the proposed approach. These correlation
experiments confirm that the dynamic programming approach is more efficient
for predicting different rainfall series.

Keywords: Dynamic programming · Rainfall series · Polynomial regression ·
WITH radar

1 Introduction

Rainfall forecasting, practiced meticulously, plays an important role in predicting
severe natural disasters, with a view to preventing potential threats and damage.
As reported by online news, heavy rainfall lashed Sierra Leone in Africa on
August 14, 2017, and left the region with landslides and mudslides due to heavy
flooding. On June 13, 2017, torrential rainfall hit Bangladesh and triggered deadly
mudslides in that region. The same deadly damage caused by guerrilla rainfall occurred
in Sri Lanka during the final week of May 2017. On July 5, 2017, many people went
missing in the massive landslides and floods from heavy rainfall that battered Fukuoka,
Japan. On July 21, 2017, the heaviest rainfall hit lower Myanmar, and many people were
temporarily displaced due to landslides and floods. Figure 1 shows the flood in the city
of Nago, Okinawa, Japan, caused by heavy rain on July 9, 2014.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 530–547, 2019.
https://doi.org/10.1007/978-3-030-02686-8_40
An Attempt to Forecast All Different Rainfall Series 531

Fig. 1. Floods in Okinawa, Japan, caused by heavy rain on July 9, 2014 [1].

The damage caused by guerrilla rainfall points to the importance of localized
rainfall prediction with accurate estimates to prevent the after-effects. Quick changes
in rainfall are among the factors that make long-term prediction decisions most
difficult.
The state of a cumulonimbus cloud can alternate very rapidly between development
and decay. Fortunately, the small-dish WITH aviation radar includes functions for
observing and capturing rapidly developing cumulonimbus clouds in high resolution to
deal with those difficulties.
For this purpose, a prediction model is designed using the concept of the dynamic
programming algorithm. The dynamic programming approach is a powerful tool for
investigating the similarity between a pair of rainfall series. The similarity between
two rainfall sequences is defined according to the maximum number of matching
rainfall levels. Direct comparison of two rainfall sequences is not an entirely
appropriate way to compute the similarity and to generate the rainfall-level
relationships between the two rainfalls. Thus, dynamic programming came to
our attention as an approach well suited to predicting different rainfall series
patterns.
Another approach to predicting the whole rainfall series is a conventional curve-
fitting approach, polynomial regression. The concept of polynomial regression is to
generate a prediction model of an independent variable x and a dependent variable y
corresponding to the nonlinear relationship between them. In this paper, the two
approaches, both of which primarily aim to find the most similar rainfall series for a
newcomer, are presented.
The rest of this paper is organized as follows. Section 2 describes related work.
Section 3 discusses the WITH radar and how rainfall levels are generated. Section 4
describes the phenomenon of localized rainfall. Section 5 details the construction of
the rainfall-level data model, and Sect. 6 details the dynamic programming model for
rainfall series prediction. Section 7 covers polynomial regression, and Sect. 8 presents
the analytical results and discussion. Section 9 concludes.
532 S. S. Aung et al.

2 Related Works

Short-term prediction systems for the level or amount of rainfall have been
implemented by many researchers in many countries, applying various prediction
methodologies to different kinds of rainfall data resources. In this context, many
powerful machine learning approaches have come to the attention of researchers
working on short-term rainfall prediction systems.
In [2], the authors proposed a system for predicting rainfall from radar reflectivity
data by applying five machine learning approaches (neural network, random
forest, classification and regression tree, support vector machine, and k-nearest
neighbor) in a watershed basin at Oxford. The purpose of the paper was to select the
algorithm that could predict the rainfall with the highest precision. As reported by
the experimental results, the artificial neural network (MLP NN) performed best
in comparison to the other algorithms.
In [3], the authors designed a short-term rain forecasting system for the
northeastern part of Thailand by applying machine learning techniques (decision tree
(DT), artificial neural network (ANN), and support vector machine (SVM)). According
to the comparative results, the artificial neural network and the support vector machine
are more suitable than the decision tree for predicting short-term rainfall amounts.
Aung et al. [4] proposed short-term prediction of localized rainfall from radar
images by applying a dual-kNN approach, aiming to produce one-minute, three-minute,
and five-minute forecasts. They utilized the dual-kNN approach to upgrade the
ordinary classification routine of classical k-nearest neighbors (k-NN) and to improve
the prediction accuracy. They confirmed experimentally, with test cases and simulations,
that dual-kNN performs more effectively than classical k-NN.
Inafuku et al. [5] designed a short-term prediction method for guerrilla rainstorms
using a state-transition method. For short-term prediction, they introduced rapid state
transitions based on short-period sampling data to overcome the weakness of the
classical state-transition method. In addition, they introduced a method for estimating the
coordinates of the center-of-gravity movement of rain clouds to obtain more precise forecasts.
In [6], the authors proposed an approach for searching for similarities in the amino acid
sequences of two proteins to determine whether significant homology exists between the
proteins, by applying a dynamic programming matching approach.
The systems described above emphasize only short-term rainfall prediction using
various powerful machine learning approaches; in other words, each system makes a
forecast for only one part of the rainfall series. Thus, this paper intends to predict
the rainfall level of the whole rainfall series, or rainfall cycle, by applying the dynamic
programming approach. Dynamic programming is a powerful approach for solving
sequence-decision problems [7]. The underlying idea is to find the similarity of two
sequences by applying an alignment method.

3 WITH Radar and Rainfall Level

The small-dish aircraft radar dubbed the WITH radar, owned by Weathernews
Inc., is a Doppler radar for observing and capturing cumulonimbus clouds that can cause
torrential rainstorms. It has the following features [8]:
• The diameter of the radar is about 1000 mm.
• It can capture the development processes of cumulonimbus clouds that cause
guerrilla rainstorms.
• It can observe altitudes of 2 km and below.
• Observations use the Doppler method.
• The frequency is 9340 MHz (X-band).
• Electric power is 30 W.
• Sampling time is six seconds.
• The observable range is a 50 km radius.
• Spatial resolution is a 150 m mesh.
Figure 2 combines three pictures: the leftmost is the WITH radar installed on
the rooftop of the Information Engineering building; the middle picture is a cross-
section scan from the WITH radar showing a cumulonimbus cloud forming near
Okinawa Island; and the rightmost is the color scheme for rainfall levels 0 to 14. The
quantity of rainfall is defined by the equation 2.67 × Rain Level, corresponding to a
quantity of precipitation from 0 mm/h to 40 mm/h.

Fig. 2. Left to right: WITH radar; an observed localized cumulonimbus cloud near Okinawa;
and the colors denoting the various rain levels.

In Fig. 2, the middle image represents a sample cross-section scan of a rain
cloud. In this image, the weather radar locates the area where a suspected rain cloud
produces a heavy rainstorm. The intensity of the rainfall levels is represented by 15
different colors (black, off-white, sky blue, light blue, blue, dark blue, dark green,
green, light green, light yellow, yellow, yellow-orange, light pink, pink, and red), as
shown in the rightmost section of Fig. 2. In digital format, the rainfall level runs from
0 to 14, where 0 is clear (i.e., not raining). Light rain covers rainfall levels 1 to 5,
moderate rain levels 6 to 11, and heavy rain levels 12 to 14.

Table 1 illustrates the intensity of each rainfall level in digital format. The
intensity of each rainfall level is computed by applying the following equation:

Intensity of Rainfall = ((Rainfall in mm/h) / Number of Levels) × Rainfall Level   (1)

Table 1. Intensity of rainfall levels

Rainfall level | Intensity of rainfall level (2.66 × rainfall level)
0  | 0
1  | 2.66
2  | 5.32
3  | 7.98
4  | 10.64
5  | 13.3
6  | 15.96
7  | 18.62
8  | 21.28
9  | 23.94
10 | 26.6
11 | 29.26
12 | 31.92
13 | 34.58
14 | 37.24
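The level-to-intensity mapping of Table 1 is simple enough to express directly in code. The sketch below assumes the 2.66 mm/h-per-level factor of Table 1; the function name is ours, not part of the paper:

```python
def rainfall_intensity(level: int) -> float:
    """Approximate intensity in mm/h for a WITH-radar rain level (0-14),
    per Table 1: 40 mm/h spread over 15 levels is about 2.66 mm/h per level."""
    if not 0 <= level <= 14:
        raise ValueError("rain level must be in 0..14")
    return round(2.66 * level, 2)

print([rainfall_intensity(k) for k in (0, 1, 5, 14)])  # [0.0, 2.66, 13.3, 37.24]
```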

4 Phenomenon of Localized Rainfall

Usually, the development and decay of a cumulonimbus cloud lasts from 30 min to 1 h.
The phenomenon can occur over small islands such as Okinawa. Figure 3 demonstrates
12 rainfall series; the Y axis represents the rainfall level, and the X axis denotes minutes.
Figures 3 and 4 illustrate the phenomenon of torrential rainfall based on 2011 and
2013 weather data. In Fig. 3, it can be clearly seen that red dots represent growth
and blue dots represent decay. It is obvious that higher rainfall levels cover a larger
rainfall area; the two go hand in hand. Figure 4 illustrates a series of torrential rainfall
levels over a time increment lasting around 30 min. In this figure, the X axis is
time, and the Y axis is rainfall level.
Figure 5 demonstrates the development and decay conditions of torrential rainfall:
it usually starts small, grows bigger, and finally begins a slow decay.
A rainfall cycle that usually lasts 30 min includes around 300 images, because
the radar takes one picture every six seconds. From each rainfall cycle, we used only
some of the more important images to illustrate the characteristics of the rainfall cycle
in Fig. 5.

Fig. 3. Twelve rainfall series based on 2011 and 2013 rainfall data.

Fig. 4. Rainfall levels for torrential rain lasting about 30 min.

Fig. 5. Radar images of rainfall lasting about 30 min.

5 Rainfall Level Data Model Construction

Before going into the detailed discussion of dynamic programming matching, we
want to discuss how to create a rainfall-level data model for rainfall prediction.
Figure 6 illustrates the rainfall level of each pixel, P(ri)(x, y), extracted from radar
images, where {P(ri)(x, y): i = 0, 1, 2, …, 14} denotes the rainfall level at coordinates
(x, y) and i ∈ {0, 1, 2, …, 14}. The pixel values P(r0)(x, y), P(r1)(x, y), …, P(r14)(x, y)
represent each rainfall level in a single image; an image may contain different
rainfall levels corresponding to the captured image and weather conditions.
After generating the pixel values (rainfall levels), the intensity of each rainfall level is
computed by applying (1). We create a data model, as shown in Table 2, for
the rainfall prediction system; it includes 15 features (R_Level0, R_Level1, …,
R_Level14) belonging to 15 different class types (R_Intensity). A radar
image contains many pixels denoting different rainfall levels. Therefore, in the data
model, each instance represents only one image, and one image is a combination of the
15 features R_Level0 to R_Level14. In detail, R_Level0 indicates rainfall level 0, and
its value is the total number of occurrences of rainfall level 0 in the image; R_Level1
likewise counts all occurrences of rainfall level 1 extracted from the same image, and
so on. This yields the simplest data model for rainfall prediction, as shown in Table 2.

Fig. 6. Rainfall level of each pixel extracted from radar images.
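As an illustration of this data model, one radar image can be reduced to its 15-feature instance by counting pixels per rainfall level. The function below is our own hypothetical sketch, not the authors' code:

```python
from collections import Counter

def image_to_instance(pixel_levels):
    """Reduce one radar image to a row of the data model.

    `pixel_levels` is any iterable of integer rain levels (0-14), one per
    pixel of the image; the result is the 15-feature vector
    [R_Level0, ..., R_Level14], where each entry counts the pixels
    observed at that rainfall level.
    """
    counts = Counter(pixel_levels)
    return [counts.get(level, 0) for level in range(15)]

row = image_to_instance([0, 0, 3, 3, 3, 14])
# row[0] == 2, row[3] == 3, row[14] == 1, all other entries are 0
```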

6 Dynamic Programming Matching for Rainfall Series Prediction

The dynamic programming method, originated by Needleman and Wunsch (1970), has
become a useful and powerful approach in a variety of applications in the field of
computer science. The simple strategy underlying dynamic programming is to
investigate the similarity between two sequences according to the maximum match
along a certain path.

Table 2. Rainfall level data model

R_Level0 | R_Level1 | … | R_Level14

Fig. 7. Sample rainfall series named Rainfall_S1.

Fig. 8. Sample rainfall series named Rainfall_S2.
For the rainfall series prediction system, let us consider two sample rainfall series,
Rainfall_S1 and Rainfall_S2, as shown in Figs. 7 and 8. Each of the two sample series
has five images, and each image represents a different rainfall level. In the discussion
below, we often use the term "node", which also refers to an image of a rainfall series.
For the two rainfall series, the similarity can be denoted mathematically as follows:

Similarity(Rainfall_Si, Rainfall_Sj) = Score(Optimal Alignment of Rainfall_Si and Rainfall_Sj)   (2)

As stated in (2), the similarity of two rainfall sequences is defined by the best, or
optimal, alignment: the one with the highest alignment score among all alignments of
the two rainfall sequences. The best optimal path represents the predicted rainfall
series for a newcomer series.

Fig. 9. A sample graph for two rainfall sequences.
Figure 9 illustrates the construction of a directed graph G = (V, E), consisting of a set
of nodes (V) connected by edges (E), used to perform a Needleman-Wunsch alignment of
two rainfall series. Each node has two properties: a pointer to the predecessor node
that gives the optimal sub-alignment, and an alignment score.
As a first step, to find the best alignment for each node, we consider the scores of
three corresponding sub-alignments (Score1, Score2, and Score3). Score1 is
the sum of the best score of node Node[i, j − 1] and Score(gap). Score2 is
computed by adding the score of node Node[i − 1, j − 1] and Score(match). Likewise,
Score3 is the sum of the score of node Node[i − 1, j] and Score(gap).
Then, the alignment with the highest score is selected as the best alignment for
the current node. For finding the best alignment score, the following equations are
given:

Score1 = Score(sub-alignment1) + Score(gap)   (3)

Score2 = Score(sub-alignment2) + Score(match)   (4)

Score3 = Score(sub-alignment3) + Score(gap)   (5)

Where Score(gap) = −2, Score(matched pair) = Similarity(Image[i], Image[j]), and
Score(mismatched pair) = −1.
In detail, Score(gap) means there is no value to match between the two nodes;
Score(mismatched pair) means the two nodes each have a value, but the values are not
the same; and Score(matched pair) means the values of the two nodes are the same. In
the rainfall series prediction system, we define the threshold Similarity(Image[i],
Image[j]) > 90% for deciding whether two images match. As discussed in the previous
section, a rainfall image is a combination of 15 rainfall levels (R_Level0, R_Level1,
R_Level2, …, R_Level14). Thus, the similarity between two rainfall-level images is
defined in terms of the average distance over the 15 rainfall levels, as denoted in
Eq. (6). If the similarity is greater than 90%, we assume that the two images are
identical, i.e., a matched pair.
The average similarity between two rainfall images can be defined, as a percentage,
by the following equation:

Similarity(Image[i], Image[j]) = (1/15) Σ_{m=0}^{14} [1 − |Imagei.Level[m] − Imagej.Level[m]| / (Imagei.Level[m] + Imagej.Level[m])]   (6)

Where n is the number of rainfall levels and |Imagei.Level[m] − Imagej.Level[m]| is
the distance between Level[m] of Imagei and Imagej. This distance, divided by the sum
of Level[m] of Imagei and Imagej, gives the distance between the two images as a
percentage. The similarity is then evaluated by subtracting the distance from 1. As a
final result, the average similarity between Imagei and Imagej is obtained by summing
the similarities over the 15 rainfall levels and dividing by the number of levels.
If the similarity is greater than 90%, the DP matching algorithm takes the two
images into account in the matching process; otherwise, it refuses to consider them in
creating the optimal sub-alignment. The following step-by-step procedure illustrates
how to evaluate the similarity between two images (Imagei, Imagej).
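Equation (6) and the 90% matching threshold can be sketched in code as follows. These are our own illustrative functions, not the authors' implementation; in particular, counting a level that is empty in both images as a perfect match is our assumption, since Eq. (6)'s denominator is zero there:

```python
def image_similarity(levels_i, levels_j):
    """Average per-level similarity of two radar images, per Eq. (6).

    `levels_i` and `levels_j` are the 15-element count vectors of the
    Sect. 5 data model (R_Level0 ... R_Level14). A level absent from both
    images contributes a perfect match (assumption: Eq. (6) is undefined
    there because both counts are zero).
    """
    total = 0.0
    for a, b in zip(levels_i, levels_j):
        if a + b == 0:
            total += 1.0  # assumed: identical (both empty) at this level
        else:
            total += 1.0 - abs(a - b) / (a + b)
    return total / 15.0  # a fraction in [0, 1]

def is_matched_pair(levels_i, levels_j):
    """The paper's threshold rule: similarity above 90% counts as a match."""
    return image_similarity(levels_i, levels_j) > 0.90
```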

Before moving on to finding the best path, let us first observe the sub-alignment
score of each node.
As shown in Fig. 9, consider the processing of Node[i = 3, j = 3], marked in red.
The best alignment score for the current node is defined by selecting the highest score
among the three sub-alignments (sub-alignment1, sub-alignment2, and sub-alignment3)
of its immediate predecessors, where sub-alignment1 comes from Node[i, j − 1],
sub-alignment2 from Node[i − 1, j − 1], and sub-alignment3 from Node[i − 1, j].
Then, Score1, Score2, and Score3 can be denoted by the following equations:

Score1 = Node[i, j − 1] − 2,
  if 0 ≤ i ≤ Series1.Length and 0 ≤ j ≤ Series2.Length   (7)

Score2 = Node[i − 1, j − 1] + Similarity(Image[i], Image[j]),
  if 0 ≤ i ≤ Series1.Length and 0 ≤ j ≤ Series2.Length   (8)

Score3 = Node[i − 1, j] − 2,
  if 0 ≤ i ≤ Series1.Length and 0 ≤ j ≤ Series2.Length   (9)

After that, the best alignment score for Node[i, j] is chosen by the following equation:

Score(Alignment[i, j]) = max{Score1, Score2, Score3}   (10)
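The recurrence in Eqs. (7)-(10) can be sketched as follows. This is our own illustrative code, not the authors' Algorithm 1; the function names are ours, and treating a below-threshold pair as the paper's −1 mismatch penalty is an assumption about how the threshold rule combines with Eq. (8):

```python
def dp_align(series1, series2, similarity, threshold=0.90, gap=-2.0):
    """Fill the Needleman-Wunsch score matrix per Eqs. (7)-(10).

    `series1` and `series2` are lists of images (e.g. 15-element level
    vectors), and `similarity` scores a pair of images in [0, 1].
    """
    n, m = len(series1), len(series2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # leading gaps in series2
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):          # leading gaps in series1
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = similarity(series1[i - 1], series2[j - 1])
            match = sim if sim > threshold else -1.0   # assumed mismatch rule
            score[i][j] = max(score[i][j - 1] + gap,       # Eq. (7)
                              score[i - 1][j - 1] + match,  # Eq. (8)
                              score[i - 1][j] + gap)        # Eq. (9)
    return score  # score[n][m] is the optimal alignment score, Eq. (10)
```

Backtracking from score[n][m] through the maximizing predecessors then reconstructs the optimal path of Eq. (11).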

Now the best path for the two rainfall series can be found. The optimal path is
defined by backtracking through the nodes with the optimal sub-alignment scores, as
shown in Fig. 11. The final result is the optimal path that is most similar to a
newcomer rainfall series. Figure 10 visualizes a sample best path for a rainfall series
forecast.

Fig. 10. Sample best optimal path for rainfall series forecast.

Fig. 11. Reconstructing the optimal path by backtracking through the nodes with the best
alignment scores.

Optimal Path(Rainfall_Si, Rainfall_Sj) = Σ_{i=1}^{j−1} Score(Node_i → Node_{i+1})   (11)

Figure 11 demonstrates the best optimal path for the rainfall series Rainfall_S1 and
Rainfall_S2, obtained by backtracking through the nodes that own the highest alignment
scores until the last node is reached. The most probable predecessor is the diagonal
match. The DP algorithm performs the alignment with a time complexity of O(ij).

Algorithm 1 illustrates the detailed process of dynamic programming matching for
creating the best rainfall series path between two rainfall sequences.

7 Polynomial Regression for Rainfall Series Prediction

Polynomial regression is a nonlinear regression approach that is useful for finding
the characteristics of the nonlinear relationship between an independent variable x and
a dependent variable y. Polynomial regression is a popular approach in a variety of
application areas, for example business and economics, weather, and traffic prediction
systems [9].
The rainfall series prediction system has two properties (time and rainfall level)
for each series. Here, we have a list of n rainfall series, S = {s1, s2, s3, …, sn}.
Each rainfall series has a different length, characterized by the following equation:

s_i = (⟨t1, r1⟩, ⟨t2, r2⟩, ⟨t3, r3⟩, …, ⟨tk, rk⟩)   (12)

Where k = 1, 2, 3, …, n; tk represents the time series, and rk is the rainfall level. To
estimate different rainfall series, we generate a different regression model for each
rainfall series, forming a model bank. For each newcomer xi, the prediction process is
taken through the R_Model bank. After that, the error of each rainfall model is
estimated using the least-squares error approach. As a final step, the system decides
on the best-fit rainfall series according to the error estimates. Figure 12 illustrates the
block diagram for the detailed process of the rainfall series prediction system.

Fig. 12. Rainfall series prediction model.
The predicted value for a rainfall series using a jth-degree polynomial regression
model can be written as

f(x) = a0 + a1·x + a2·x² + … + aj·x^j   (13)

Where j represents the degree of the polynomial regression and the aj are the
regression coefficients.
The general least-squares error is given by

er = Σ_{i=1}^{n} (y_i − (a0 + a1·x_i + a2·x_i² + … + aj·x_i^j))²   (14)

Where yi is the actual value and er is the least-squares error. For the rainfall series
system, the set of least-squares errors Êr can be written as

Êr = (er1, er2, …, ern)   (15)

The best-fit line is defined by choosing the minimum error from the set of least-squares
errors Êr:

Best line = select minimum error(Êr)   (16)
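The model bank of Fig. 12 and Eqs. (13)-(16) can be sketched with NumPy's polynomial fitting. This is our own illustrative sketch, not the authors' implementation; the paper does not state the polynomial degree it used, so `degree` is an assumed parameter, and the function names are ours:

```python
import numpy as np

def fit_model_bank(series_bank, degree=3):
    """Fit one polynomial regression model per stored rainfall series, Eq. (13).

    `series_bank` maps a series name to a (times, levels) pair of arrays.
    """
    return {name: np.polyfit(t, r, degree)
            for name, (t, r) in series_bank.items()}

def best_fit_series(model_bank, t_new, r_new):
    """Pick the stored model with the smallest least-squares error, Eqs. (14)-(16)."""
    errors = {name: float(np.sum((np.asarray(r_new) - np.polyval(coef, t_new)) ** 2))
              for name, coef in model_bank.items()}
    return min(errors, key=errors.get)  # Eq. (16): minimum-error model

# A newcomer series is matched against the bank:
t = np.arange(6.0)
bank = {"quadratic": (t, t ** 2), "linear": (t, 50 - 3 * t)}
models = fit_model_bank(bank, degree=2)
print(best_fit_series(models, t, t ** 2 + 0.1))  # quadratic
```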

8 Experiment and Analysis Discussion

In this section, we discuss the experiments with the rainfall series prediction
system and the results, which demonstrate the efficiency of the new dynamic
programming approach by comparing it with the polynomial regression approach. For
these results, the prediction accuracy is computed as a measure of how close the actual
values of the observed rainfall series are to the values of the forecasted rainfall series.
As a first step, we evaluate the forecast error by applying the following equations:

Error(Rainfall_Si) = |Actual Value(Rainfall_Si) − Forecast Value(Rainfall_Si)|   (17)

Error(Rainfall_Si)% = |Actual Value(Rainfall_Si) − Forecast Value(Rainfall_Si)| / Actual Value(Rainfall_Si)   (18)

Then, the accuracy of a rainfall series is evaluated by the following equation:

Accuracy(%) = 1 − Error(%)   (19)

Table 3. Weather data size descriptions

Year | Original data size | Data size in the preprocessing stage
2011 | 3 GB     | 4 MB
2013 | 8.006 GB | 10.9 MB

In this case, if the error is larger than 100%, then the accuracy is taken as 0%. For this
experimentation, rainfall-level history data were provided by Weathernews Inc. Table 3
describes the data sizes of the years 2011 and 2013 from two aspects: the original
size, and the size after the preprocessing stage, which includes noise filtering and
converting images into a numerical format.
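Equations (17)-(19), together with the clamping rule for errors above 100%, can be expressed as a small helper (our own sketch; the function name is hypothetical):

```python
def forecast_accuracy(actual: float, forecast: float) -> float:
    """Percentage accuracy of one rainfall series per Eqs. (17)-(19).

    When the relative error exceeds 100% (as for Series 3 in Table 5),
    the accuracy is clamped to 0%, following the rule stated above.
    """
    error_pct = abs(actual - forecast) / actual * 100.0  # Eqs. (17)-(18)
    return max(0.0, 100.0 - error_pct)                   # Eq. (19), clamped

print(round(forecast_accuracy(86778, 84935), 2))  # Series 4 of Table 5: 97.88
```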

Table 4. Number of images included in each series

Rainfall series | Number of images | Prediction time (ms)
Series 1  | 632  | 11976
Series 2  | 317  | 6227
Series 3  | 1013 | 24649
Series 4  | 875  | 20567
Series 5  | 285  | 5423
Series 6  | 571  | 10648
Series 7  | 1086 | 28737
Series 8  | 292  | 5700
Series 9  | 857  | 18614
Series 10 | 128  | 4135
Series 11 | 288  | 5920
Series 12 | 244  | 5467
          | Total = 6588 | Average = 12338.58333

Table 4 describes the number of rainfall-level images included in each rainfall
series and the processing time required for each series; different series require
different processing times. As Table 4 reports, the more images a rainfall series
contains, the more processing time it needs. Over all rainfall series, the average
processing time is 12338 ms.
Table 5 illustrates the prediction accuracy for the different rainfall series patterns
using full cross-validation. This table has six columns. The first column is the name of
the rainfall series. The second column is the actual data; in more detail, the actual data
is the sum of the rainfall levels of all images in one rainfall series. The third column
gives the forecasted rainfall-level values; this value is likewise the sum of the rainfall
levels of all images in the forecasted rainfall series. All rainfall series, except Series 3,
Series 7, and Series 10, achieve acceptable accuracy. For Series 3, Series 7, and Series
10, the rainfall series stored in the data bank have different rainfall-level values, so the
algorithm is not able to retrieve a series pattern with 90% or greater similarity. The
more cases the case bank is trained with, the better the accuracy the algorithm will
attain. Nevertheless, the average forecast accuracy of 57% confirms that the system is
suitable for predicting different rainfall series.
Table 6 gives the prediction accuracy of the rainfall series without using full
cross-validation. To put it another way, each rainfall series does not exclude itself when
searching for the most similar rainfall series in the case bank; that is, the case bank
includes the series most similar to each query. Thus, in this experiment, each rainfall
series achieves high prediction accuracy, with an average of 85%.
Tables 7 and 8 demonstrate the prediction accuracy of the second approach, the
polynomial regression algorithm, which finds a nonlinear relationship between time
(tk) and rainfall level (rk). Table 7 states the prediction accuracy without using
full cross-validation; in this study, the algorithm fails to reach acceptable accuracy in
Series 2, 5, 7, and 8. Moreover, the estimation accuracy using full cross-validation can

Table 5. Accuracy of rainfall series using full-cross validation

Rainfall series name | Actual | Forecast | Error | Error (%) | Accuracy (%)
Series 1  | 67800  | 44023  | 23777  | 35.06932 | 64.93067847
Series 2  | 14899  | 13258  | 1641   | 11.01416 | 88.98583798
Series 3  | 77502  | 500829 | 423327 | 546.2143 | 0
Series 4  | 86778  | 84935  | 1843   | 2.12381  | 97.87618982
Series 5  | 67726  | 34550  | 33176  | 48.98562 | 51.01438148
Series 6  | 109866 | 158971 | 49105  | 44.69536 | 55.30464384
Series 7  | 453628 | 83260  | 370368 | 81.64575 | 18.35424621
Series 8  | 23548  | 16228  | 7320   | 31.08544 | 68.9145575
Series 9  | 81869  | 95428  | 13559  | 16.56182 | 83.43817562
Series 10 | 3081   | 586    | 2495   | 80.9802  | 19.01979877
Series 11 | 7790   | 10945  | 3155   | 28.82595 | 71.17405208
Series 12 | 5787   | 3931   | 1856   | 32.07189 | 67.92811474
Average accuracy | | | | | 57.24505637

Table 6. Accuracy of dynamic programming without using full-cross validation

Rainfall series name | Actual | Forecast | Abs (error) | Error (%) | Accuracy (%)
Series 1  | 67800  | 67326  | 474   | 0.699115 | 99
Series 2  | 14899  | 11926  | 2973  | 19.95436 | 80
Series 3  | 77502  | 86357  | 8855  | 11.42551 | 89
Series 4  | 86778  | 100456 | 13678 | 15.76206 | 84
Series 5  | 67726  | 64188  | 3538  | 5.223991 | 95
Series 6  | 109866 | 105300 | 4566  | 4.155972 | 96
Series 7  | 453628 | 528509 | 74881 | 16.50714 | 83
Series 8  | 23846  | 21845  | 2001  | 8.391344 | 92
Series 9  | 81869  | 91250  | 9381  | 11.45855 | 89
Series 10 | 3081   | 1689   | 1392  | 45.18014 | 55
Series 11 | 7790   | 6635   | 1155  | 14.8267  | 85
Series 12 | 5787   | 4161   | 1626  | 28.09746 | 72
Average accuracy | | | | | 85%

be seen in Table 8. In this case, Series 2, 5, 7, 8, and 10 did not perform well.
Therefore, the total average accuracy of the polynomial regression approach is 67%
without full-cross validation and 54% with the full-cross validation approach.
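The accuracy figures in Tables 5, 6, 7 and 8 follow a simple rule: the percentage error is the absolute difference between the actual and forecast totals relative to the actual total, and accuracy is its complement, floored at zero when the error exceeds 100% (as for Series 3 in Table 5). A minimal sketch of this bookkeeping (the function name is ours):

```python
def series_accuracy(actual, forecast):
    """Accuracy (%) as used in Tables 5-8: 100 minus the absolute
    percentage error, floored at 0 when the forecast is off by more
    than 100% of the actual value."""
    error_pct = abs(actual - forecast) / actual * 100.0
    return max(0.0, 100.0 - error_pct)
```

For example, Series 4 in Table 5 (actual 86778, forecast 84935) gives an error of 2.12% and an accuracy of 97.88%, matching the tabulated values.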
546 S. S. Aung et al.

Table 7. Accuracy of polynomial regression without using full-cross validation


Rainfall series name Actual Forecast Error Abs (error) Error (%) Accuracy (%)
Series 1 67800 86778 −18978 18978 27.99115 72%
Series 2 14899 5787 9112 9112 61.15847 39%
Series 3 77502 86778 9276 9276 11.96872 88%
Series 4 86778 86778 0 0 0 100%
Series 5 67726 5787 61939 61939 91.45528 9%
Series 6 109866 86778 23088 23088 21.01469 79%
Series 7 453628 86778 −366850 366850 80.87023 19%
Series 8 23846 5787 18059 18059 75.73178 24%
Series 9 81869 86778 4909 4909 5.996165 94%
Series 10 3081 3081 0 0 0 100%
Series 11 7790 5787 2003 2003 25.71245 74%
Series 12 5787 5787 0 0 0 100%
Average accuracy 67%

Table 8. Accuracy of polynomial regression using full-cross validation


Rainfall series name Actual Forecast Error Abs (error) Error (%) Accuracy (%)
Series 1 67800 86778 −18978 18978 27.99115 72
Series 2 14899 5787 9112 9112 61.15847 39
Series 3 77502 86778 9276 9276 11.96872 88
Series 4 86778 77502 −9276 9276 10.68935 89
Series 5 67726 5787 61939 61939 91.45528 9
Series 6 109866 86778 23088 23088 21.01469 79
Series 7 453628 86778 −366850 366850 80.87023 19
Series 8 23846 5787 18059 18059 75.73178 24
Series 9 81869 86778 4909 4909 5.996165 94
Series 10 3081 5787 −2706 2706 87.82863 12
Series 11 7790 5787 2003 2003 25.71245 74
Series 12 5787 3081 2706 2706 46.75998 53
Average accuracy 54

According to the comparative study of dynamic programming and polynomial regression
for rainfall series forecasting, the dynamic programming approach is a more suitable
prediction approach for the whole rainfall series than the polynomial regression approach.
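The polynomial regression baseline fits the rainfall level rk as a polynomial in the time index tk and forecasts from the fitted curve. A minimal sketch with NumPy; the time index and rainfall values below are synthetic illustrations, not the paper's series:

```python
import numpy as np

# Illustrative time index t_k (e.g. months) and synthetic rainfall levels r_k;
# these values are assumptions for the sketch, not the paper's data.
t = np.arange(1, 13, dtype=float)
r = 5.0 + 2.0 * t - 0.1 * t**2

coeffs = np.polyfit(t, r, deg=2)    # fit r_k as a degree-2 polynomial of t_k
forecast = np.polyval(coeffs, t)    # forecast rainfall level at each t_k
error_pct = abs(r.sum() - forecast.sum()) / r.sum() * 100.0
```

Since the synthetic data is itself quadratic, the fit recovers it exactly; on real rainfall series the residual error is what Tables 7 and 8 report.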

9 Conclusion

In this study, we proposed a new predictive approach, the dynamic programming (DP)
algorithm, aiming to forecast the different rainfall series patterns for the whole
rainfall life cycle, not for each stage of the rainfall series. The dynamic programming
algorithm is a powerful approach for solving time series problems. The approach is also
popular for matching DNA and the amino acid sequences of two proteins; indeed, DP
matching covers almost all research areas. To that end, for the rainfall series problem,
DP matching came to our attention as an approach to predict the different rainfall
cycles. Furthermore, we also applied the polynomial regression approach to rainfall
series estimation to demonstrate that dynamic programming is more efficient. In
agreement with the experimental results stated in Tables 5, 6, 7 and 8, DP matching
achieved higher prediction accuracy than the conventional approach, polynomial
regression. As this research is still in progress, aiming to forecast all different
rainfall series, predictions have so far been executed only over the 2011 and 2013
datasets that are obtainable at this moment. In our future work, we will collect more
rainfall series from different years and then apply the DP matching algorithm using
massive case-banks to prove the efficiency of the algorithm with stronger confirmation
for different rainfall level pattern prediction.

References
1. Gilbeaux, K.: Global resilience system, Typhoon Neoguri—Flooding in Nago, Okinawa,
Wed, 2014-07-09. https://resiliencesystem.org/typhoon-neoguri-flooding-nago-okinawa
2. Kusiak, A., Wei, X., Verma, A.P., Roz, E.: Modeling and prediction of rainfall using radar
reflectivity data: a data-mining approach. IEEE Trans. Geosci. Remote Sens. 51(4), 2337–
2342 (2013)
3. Ingsrisawang, L., Ingsriswang, S., Somchit, S., Aungsuratana, P., Khantiyanan, W.: Machine
learning techniques for short-term rain forecasting system in the northeastern part of Thailand.
In: World Academy of Science, Engineering and Technology, vol. 2, no. 5 (2008).
International Journal of Computer and Information Engineering
4. Aung, S.S., Senaha, Y., Ohsawa, S., Nagayama, I., Tamaki, S.: Short-term prediction of
localized heavy rain from radar imaging and machine learning. IEIE Trans. Smart Process.
Comput. 7, 107–115 (2018)
5. Inafuku, S., Tamaki, S., Hirata, T., Ohsawa, S.: Guerrilla rainstorm prediction using a state
transition. In: Proceedings of Japan Wind Energy Symposium, vol. 35, pp. 375–378 (2016)
6. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities
in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)
7. Brown, K.Q.: Dynamic Programming in Computer Science. Department of Computer
Science, Carnegie-Mellon University, Pittsburgh (1979)
8. Kusabiraki, C.: Weathernews Inc, June 11, 1986. https://global.weathernews.com/
infrastructure/with-radar/
9. Ostertagová, E.: Modelling using polynomial regression. Procedia Eng. 48, 500–506 (2012)
Non-subsampled Complex Wavelet Transform
Based Medical Image Fusion

Sanjay N. Talbar1, Satishkumar S. Chavan2(✉), and Abhijit Pawar3

1 SGGS Institute of Engineering and Technology, Nanded 431606, MS, India
sntalbar@yahoo.com
2 Don Bosco Institute of Technology, Kurla (W), Mumbai 400070, MS, India
satyachavan@yahoo.co.in
3 SKN Medical College and General Hospital, Narhe, Pune 411041, MS, India
abhijitpawar.rad@gmail.com

Abstract. The paper presents a feature based medical image fusion approach for
CT and MRI images. The directional features are extracted from co-registered
CT and MRI slices using Non-Subsampled Dual Tree Complex Wavelet Trans‐
form (NS DT-CxWT). These features are combined using average and maxima
fusion rules to create composite spectral plane. The new visually enriched image
is reconstructed from this composite spectral plane by applying the inverse
transformation. The fused images are evaluated for their visual quality using subjective
and objective performance metrics. The quality of the fused image is rated by three
radiologists in the subjective evaluation, whereas edge- and similarity-based fusion
parameters are computed to estimate the quality of the fused image objectively. The
proposed algorithm is compared with state-of-the-art wavelet transforms. It
provides visually enriched fused images retaining the soft tissue texture of MRI along
with the bone and lesion outline from CT, with better contrast for lesion visualization
and treatment planning. It is also found that the average score given by radiologists is
'3.85' for the proposed algorithm, which is much higher than the average score
for the other wavelet algorithms.

Keywords: Medical image fusion · Non-subsampled complex wavelet transform · Dual Tree Complex Wavelet Transform · Discrete Wavelet Transform · Radiotherapy · Fusion parameters

1 Introduction

Medical imaging has been extensively used in disease diagnosis and treatment over the
last two decades. Major imaging modalities are Ultrasound Guided Imaging (USG), Computed
Tomography (CT), and Magnetic Resonance Imaging (MRI), along with functional MRI
(fMRI), Positron Emission Tomography (PET), and Single-Photon Emission Computed
Tomography (SPECT). Every imaging modality has its own advantages and disadvantages:
for example, CT captures calcifications, implants, and bone structures prominently,
whereas MRI provides better visualization of soft tissues and lesions [1]. No single modality
provides all relevant clinical information together. Therefore, there is a need to develop
techniques which will bring important clinical information of two or more modalities

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 548–556, 2019.
https://doi.org/10.1007/978-3-030-02686-8_41

in a single frame. Such techniques which aid the radiologists in disease diagnosis and
treatment planning are called multimodality medical image fusion. The acquisition
process of these modalities is also completely different which makes them complemen‐
tary modalities for the fusion.
Medical image fusion has a significant role in the treatment of cancer using radiation
therapy. The treatment uses CT as the main modality, whereas MRI is preferred as a
complementary modality. The delineation of infected cells or tissues is obtained using
both CT and MRI, and planning of the radiation procedure is done using CT. Obviously,
it is of great help to the medical physicist to have both CT and MRI information together
in a single frame for delineation. This will help the radiation oncologist prepare a
precise treatment plan for treating cancer patients in the best possible way.
In a fusion system, the source modalities can vary over a large number of acquisition
processes. The source modality images have complementary structural representations.
Many techniques and algorithms have been proposed in the literature for fusion [2]. The
two major categories of fusion techniques are spatial domain and frequency domain
techniques. The fusion process is also broadly divided into point-wise fusion,
feature-based fusion, and parametric mapping of decision fusion. Point-wise fusion is
simpler and combines information point to point, feature-level fusion extracts and merges
features, and decision-level fusion selects and maps the relevant information for
creating the new image.
As per the literature, pyramid and wavelet based Multiresolution Analysis (MRA)
approaches are extensively used for medical image fusion [3]. However, wavelet based
methods show superior results, as wavelets decompose the source images into
frequency sub-bands, which gives them an edge over the pyramid transforms. The Discrete
Wavelet Transform (DWT) provides spatio-spectral localization and better directional
sensitivity with a good signal-to-noise ratio. It is the transform preferred by many
researchers for medical image fusion [4–7]. However, fused images may have distortions
and visual inconsistencies due to demerits of DWT such as limited directional
selectivity, oscillations, lack of phase information, etc. Recently, complex wavelet
transforms have also been preferred over DWT. The Dual Tree Complex Wavelet Transform
(DT-CxWT), Daubechies Complex Wavelet Transform (DCxWT), and M-Band Wavelet Transform
(MBWT) have been used for the fusion process due to their directional sensitivity and
phase information [8–10]. Edge based techniques such as the contourlet transform [11],
curvelet transform [12], shearlet transform [13], and ripplet transform [14] have also
gained much attention in medical image fusion. The Redundant Discrete Wavelet Transform
(RDWT) also performs well due to its shift invariance property [15]. Soft computing
approaches such as artificial neural networks, fuzzy logic, neuro-fuzzy systems, etc.
are also preferred for medical image fusion [16]. However, retaining visual content in
fused images is still a challenge, which requires the development of new fusion schemes.
In this paper, a new fusion scheme is proposed which uses the Non-Subsampled Dual Tree
Complex Wavelet Transform (NS DT-CxWT) to extract directional features from source
CT and MRI images. These features in spectral space are combined using fusion rules:
averaging of low frequency coefficients and selection of maximum valued high
frequency coefficients. The proposed fusion scheme is described in Sect. 2 along with
conceptual background of NS DT-CxWT and fusion rules. The experimental results and

analysis of fused images using subjective and objective evaluation metrics are presented
in Sect. 3 which is followed by conclusion and future scope in Sect. 4.

2 Proposed Fusion Scheme

Medical image fusion is the process of merging relevant and complementary clinical
information into a new, visually enriched fused image [5]. Figure 1 shows the proposed
fusion scheme, in which the directional spectral features are extracted using NS DT-
CxWT. The source images are co-registered CT and MRI slices of the same anatomical
structure of the same patient. The selection of appropriate frames from the source
modalities is done by radiologists. These selected frames of CT and MRI are registered
for pixel alignment using geometric transformations such as scaling, translation, and
rotation. The effectiveness of the fusion process depends on the registration process.

Fig. 1. Proposed medical image fusion scheme.

The directional features of CT and MRI are combined using fusion rules, resulting in a
new spectral plane. The inverse NS DT-CxWT is applied to reconstruct the fused image
from this new feature plane. The fused images are tested for their visual quality
subjectively with the help of radiologists. The fusion parameters are also calculated
to evaluate the fused images for their visual quality and the preservation of anatomical
structures from the source images. The novelty of this paper is the feature extraction
using NS DT-CxWT and the fusion rules, which are discussed in the following subsections.

2.1 Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) is a widely used technique for subband
decomposition of images. It converts an image into four subbands at the first level of
decomposition, i.e. approximate (A1), horizontal (H1), vertical (V1), and diagonal (D1)
subbands, as shown in Fig. 2(a). A1 provides textural information, and the other subbands
capture discontinuities in three orientations (0°, 90°, and ±45°), as shown in Fig. 2(b).
However, DWT represents combined features in the +45° and −45° orientations. It also
suffers from limited directionality, aliasing, oscillations at discontinuities, and
shift variance [17].
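The single-level subband split described above can be illustrated with a plain Haar decomposition; this is a minimal NumPy sketch for even-sized images, not the wavelet filters used in the paper, and the A1/H1/V1/D1 naming follows Fig. 2:

```python
import numpy as np

def haar_dwt2(img):
    """One level of 2-D Haar decomposition into an approximate (A1) and
    three detail (H1, V1, D1) subbands. Assumes even image dimensions."""
    lo_r = (img[0::2, :] + img[1::2, :]) / 2.0   # low-pass along rows
    hi_r = (img[0::2, :] - img[1::2, :]) / 2.0   # high-pass along rows
    A1 = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2.0   # low-low: textural content
    H1 = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2.0   # low-high detail
    V1 = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2.0   # high-low detail
    D1 = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2.0   # high-high: diagonal detail
    return A1, H1, V1, D1
```

On a constant image all three detail subbands vanish and A1 reproduces the constant at half the resolution, which is exactly the decimation-by-two behaviour the next subsection removes.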

Fig. 2. Discrete wavelet transform (a) First level decomposition (b) Corresponding fourier
representation provides information as A1: textural, H1: 0°, V1: 90°, D1: ±45°.

2.2 Non-subsampled Dual Tree Complex Wavelet Transform

The Dual Tree Complex Wavelet Transform (DT-CxWT) is designed using real coefficients
in two tree structures, resulting in a complex representation. The real and imaginary
parts of DT-CxWT are produced by Tree 'a' and Tree 'b', respectively, so the complex
representation of DT-CxWT is given in the form 'a + jb'. DT-CxWT is nearly shift
invariant, provides phase information, and exhibits high directional selectivity [17].
Figure 3 shows three levels of decomposition of NS DT-CxWT. Here, h0[n] & h1[n] are low
pass filter coefficients and g0[n] & g1[n] are high pass filter coefficients in tree 'a'
and 'b', respectively. After filtering using the low pass and high pass filters, the
conventional down sampling operation is eliminated at every level to make the DT-CxWT
non-subsampled (NS DT-CxWT).
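Eliminating the downsampling step is what makes the transform (nearly) shift invariant: an undecimated filter commutes with circular shifts of the input, whereas a decimated filter bank does not. A toy 1-D illustration with circular convolution (generic filter taps, not the actual h0/h1/g0/g1 coefficients):

```python
import numpy as np

def ns_filter(x, h):
    """Undecimated (non-subsampled) filtering via circular convolution:
    the output keeps the input length, so no samples are discarded."""
    y = np.zeros_like(x, dtype=float)
    for k, hk in enumerate(h):
        y += hk * np.roll(x, k)   # x delayed by k samples, weighted by h[k]
    return y
```

Because there is no decimation, filtering a shifted signal gives exactly the shifted filter output, which is the property decimated DWT subbands lack.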

Fig. 3. Three levels of decomposition by NS DTCxWT used in proposed medical image fusion
scheme.

NS DT-CxWT has six wavelets that are computed using (1) and (2). Here, $\psi_i^a(m,n)$
and $\psi_{i+3}^b(m,n)$, $i = 1, 2, 3$, are filter coefficients which provide feature
representations oriented in six directions (±15°, ±45°, ±75°) after decomposition [17].
Thus, NS DT-CxWT has an edge over the other transforms in terms of high directional
selectivity. The spectral directional representation for two levels of decomposition
with six orientations is shown in Fig. 4. The non-subsampling avoids the loss of
information.

$$\psi_i^a(m,n) = \frac{1}{\sqrt{2}}\left(\psi_{1,i}(m,n) - \psi_{2,i}(m,n)\right) \quad (1)$$

$$\psi_{i+3}^b(m,n) = \frac{1}{\sqrt{2}}\left(\psi_{1,i}(m,n) + \psi_{2,i}(m,n)\right) \quad (2)$$
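In code, Eqs. (1) and (2) amount to a normalized difference and sum of corresponding subbands from the two real trees; in the sketch below, psi1 and psi2 stand for the outputs of trees 'a' and 'b' (assumed to be NumPy arrays of equal shape):

```python
import numpy as np

def combine_trees(psi1, psi2):
    """Eqs. (1)-(2): difference and sum of the two real-tree subbands,
    scaled by 1/sqrt(2) so the total energy is preserved."""
    psi_a = (psi1 - psi2) / np.sqrt(2.0)
    psi_b = (psi1 + psi2) / np.sqrt(2.0)
    return psi_a, psi_b
```

The 1/√2 scaling makes the pair an orthonormal combination: the summed energy of the six oriented wavelets equals that of the original tree outputs.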

Fig. 4. Fourier spectrum of NS DT-CxWT representing six distinct orientations.

The merits of the proposed fusion scheme using NS DT-CxWT are its directional
selectivity, phase information, shift invariance, and redundant content, with the same
computational complexity as DT-CxWT. It also supports the selection of appropriate
features to create the composite spectral space.

2.3 Fusion Rules

The source CT and MRI images are decomposed into three levels using the separable NS
DT-CxWT. This results in two low frequency subbands and six high frequency subbands.
The low frequency subband coefficients are averaged and the maximum valued high
frequency coefficient is selected using (3) to create the composite spectral space.
Here, CP is the composite plane, t stands for tree 'a' or 'b', and K represents a
particular subband (A, V, H, D). The inverse NS DT-CxWT is applied on this composite
plane to reconstruct the fused image.

$$CP_t^K(u,v) = \begin{cases} \alpha\, CT_t^K(u,v) + (1-\alpha)\, MRI_t^K(u,v); & \alpha = 0.5 \\ CT_t^K(u,v); & CT_t^K(u,v) > MRI_t^K(u,v) \\ MRI_t^K(u,v); & MRI_t^K(u,v) \ge CT_t^K(u,v) \end{cases} \quad (3)$$

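A NumPy sketch of fusion rule (3), assuming the CT and MRI subband coefficient arrays are already co-registered and aligned (the function name and interface are ours):

```python
import numpy as np

def fuse_subband(ct, mri, low_frequency, alpha=0.5):
    """Fusion rule (3): average the low-frequency subband coefficients,
    select the larger coefficient elsewhere (ties resolved in favour of
    MRI, per the '>=' branch of the rule)."""
    if low_frequency:
        return alpha * ct + (1.0 - alpha) * mri
    return np.where(ct > mri, ct, mri)
```

Applying this per subband and per tree builds the composite plane CP, from which the inverse NS DT-CxWT reconstructs the fused image.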
3 Experimental Results and Discussion

The proposed fusion scheme is tested for its performance on a database of 29 study
sets of CT and MRI of the same patients. Eighteen sets are captured using a Siemens
Somatom Spirit CT scanner and a Siemens 1.5 T Magnetom C1 MRI machine, respectively,
and 11 study sets are taken from the website 'https://radiopaedia.org/'. The radiologists
selected slices based on anatomical markers. This is then followed by geometric
transformation to register them for pixel/voxel alignment. Sample study sets of CT and
MRI are presented in Figs. 5(a–c) and (d–f), respectively. A personal computer having
an Intel i5 processor (2.50 GHz) and 4 GB RAM is used for all the computations in
MATLAB 2013a.

Fig. 5. Fusion results: (a, b, c) CT images from Set 1, Set 2, and Set 3, (d, e, f) MRI images from
Set 1, Set 2, and Set 3. Fused images of Set 1 (First Row), Set 2 (Second Row), Set 3 (Third Row)
using (a1, a2, a3) DWT, (b1, b2, b3) SWT, (c1, c2, c3) NSCT, (d1, d2, d3) DT-CxWT, (e1, e2,
e3) proposed.

The fusion metrics, viz. Entropy (En), Fusion Factor (FusFac), mean Structural
Similarity Index Measure (mSSIM), and Edge Quality Measure (EQ) [7], are calculated
for objective quality assessment. En provides the energy (information) representation
of an image, and FusFac is a parameter based on mutual information computed using the
original images and the fused image. The effectiveness of edge preservation is
quantified by EQ, whereas mSSIM is an index of similarity between the source images and
the fused image. En & FusFac should have higher values, and EQ & mSSIM should have
values approaching 'one', for the fused image to be considered a good quality image.
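Of these metrics, En is the simplest to state precisely: the Shannon entropy of the image's grey-level histogram. A sketch for 8-bit images (the other metrics also require the source images and are not shown):

```python
import numpy as np

def image_entropy(img):
    """Shannon entropy (bits) of an 8-bit image's grey-level histogram;
    higher values indicate richer information content in the image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                     # drop empty bins (0 * log 0 := 0)
    return -np.sum(p * np.log2(p))
```

A flat image scores 0 bits; an image using all 256 grey levels equally often scores the maximum of 8 bits.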
The proposed algorithm is compared with the Discrete Wavelet Transform (DWT),
Stationary Wavelet Transform (SWT), Nonsubsampled Contourlet Transform (NSCT),
and DT-CxWT for its performance. The comparative objective evaluation of the fusion
parameters for the three study sets is presented in Table 1. It shows that En and FusFac
are higher for the proposed algorithm in all three sets. The values of EQ and mSSIM are
higher and approach 'one' for the proposed fusion scheme. Thus, the objective
evaluation reveals that the proposed fusion method outperforms the other fusion
techniques.

Table 1. Objective evaluation of proposed fusion scheme and other wavelet methods.
Study set Algorithm En FusFac EQ mSSIM
Set 1 DWT [6] 3.0887 3.8972 0.6871 0.6387
SWT [14] 3.1087 3.9213 0.7021 0.6377
NSCT [10] 3.1127 4.1252 0.7256 0.6646
DTCxWT [8] 3.1295 4.3586 0.7241 0.6574
Proposed 3.1985 5.8546 0.7883 0.7147
Set 2 DWT [6] 2.8476 4.3331 0.7164 0.5449
SWT [14] 2.5687 4.9647 0.7365 0.5598
NSCT [10] 2.9561 5.1243 0.7198 0.5836
DTCxWT [8] 2.8814 5.6574 0.7483 0.6054
Proposed 3.2149 6.0148 0.7928 0.6681
Set 3 DWT [6] 3.1125 3.6550 0.8605 0.6905
SWT [14] 3.3285 3.6805 0.8925 0.6207
NSCT [10] 3.5593 3.9871 0.8766 0.6982
DTCxWT [8] 3.7899 4.0153 0.8672 0.6879
Proposed 4.1106 5.1589 0.9056 0.7354

Three radiologists evaluated the quality of the fused images subjectively. The fused
images are compared with the source images in terms of anatomical similarity, contrast,
false content, and the usefulness of the fused images in the delineation of infected
cells or tumour. All the fused images are rated on a scale of 0 (poor) to 4 (excellent)
by the radiologists. The average score of the subjective analysis of the fused images
with the various fusion algorithms is tabulated in Table 2. The average score for the
proposed algorithm is '3.85', which is higher than that of the compared techniques.
This proves that the fused images obtained using the proposed algorithm are useful in
the delineation and contouring of tumours for radiation therapy. Figure 5 shows the
fused images of three sample study sets using the various wavelet techniques.

Table 2. Subjective evaluation of fused images by Radiologists.


S. N. Algorithm Subjective score by radiologists
#1 #2 #3 Average
1 DWT [6] 2.50 2.80 2.70 2.67
2 SWT [14] 2.70 3.00 3.20 2.97
3 NSCT [10] 2.90 3.10 3.30 3.10
4 DT-CxWT [8] 3.10 3.30 3.40 3.27
5 Proposed 3.65 3.81 4.10 3.85

4 Conclusion and Future Scope

The fusion scheme presented in this paper is a feature based approach in the spectral
domain using NS DT-CxWT. It provides a multiscale and multiresolution representation
with six-directional selectivity, shift invariance, and phase information with reduced
computational complexity. The fused images obtained using the proposed scheme are
useful for better visualization of abnormalities or lesions for treatment planning in
radiation therapy. The fusion rules take care of textural preservation and better
representation of discontinuities, which results in retaining the actual anatomical
structures in the fused images. The subjective score for the quality of the fused images
using the proposed scheme indicates excellent visual quality and proves its usefulness
in treatment planning. The objective parameters also exhibit superior fusion metrics
for the proposed algorithm when compared with the other wavelet based fusion algorithms.
The quality of the fused images can be further improved by modifying the fusion rules
with the help of iterative fusion schemes such as neural networks, fuzzy logic,
neuro-fuzzy systems, genetic algorithms, etc.

References

1. Kessler, M.L.: Image registration and data fusion in radiation therapy. Br. J. Radiol. 79(1),
S99–S108 (2006)
2. James, A.P., Dasarathy, B.V.: Medical image fusion: a survey of the state of the art. Inf.
Fusion 19, 4–19 (2014)
3. Pajares, G., Cruz, J.M.: A wavelet-based image fusion tutorial. Pattern Recognit. 37(9), 1855–
1872 (2004)
4. Qu, G.H., Zhang, D.L., Yan, P.F.: Medical image fusion by wavelet transform modulus
maxima. Opt. Express 9(4), 184–190 (2001)
5. Chavan, S.S., Talbar, S.N.: Multimodality image fusion in the frequency domain for radiation
therapy. In: International Conference on Medical Imaging, m-Health and Emerging
Communication Systems (MedCom), Noida, pp. 174–178. IEEE (2014)
6. Yang, Y., Park, D.S., Huang, S., Rao, N.: Medical image fusion via an effective wavelet based
approach. EURASIP J. Adv. Signal Process. Article ID-579341, 13 (2010)
7. Chavan, S.S., Pawar, A., Talbar, S.N.: Multimodality medical image fusion using rotated
wavelet transform. In: 2nd International Conference on Communication and Signal
Processing (ICCASP - 2016). Advances in Intelligent Systems Research, vol. 137, pp. 627–
635. Atlantis Press (2016)
8. Singh, R., Srivastava, R., Prakash, O., Khare, A.: Multimodal medical image fusion in dual
tree complex wavelet transform domain using maximum and average fusion rules. J. Med.
Imaging Health Inform. 2, 168–173 (2012)
9. Singh, R., Khare, A.: Fusion of multimodal medical images using Daubechies complex
wavelet transform - a multiresolution approach. Inf. Fusion 19, 49–60 (2014)
10. Chavan, S.S., Talbar, S.N.: Multimodality medical image fusion using M-band wavelet and
Daubechies complex wavelet transform for radiation therapy. Int. J. Rough Sets Data Anal.
2(2), 1–23 (2015)
11. Shanmugam, G.P., Bhuvanesh, K.: Multimodal medical image fusion in non-subsampled
contourlet transform domain. Circuits Syst. 7, 1598–1610 (2016)
12. Chen, M.S., Lin, S.D.: Image fusion based on curvelet transform and fuzzy logic. In: 5th
International Conference on Image and Signal Processing (CISP), pp. 1063–1067. IEEE
(2012)
13. Wang, L., Li, B., Tian, L.F.: Multimodal medical image fusion using the interscale and intra-
scale dependencies between image shift-invariant shearlet coefficients. Inf. Fusion 19, 20–28
(2014)
14. Das, S., Chowdhury, M., Kundu, M.K.: Medical image fusion based on ripplet transform
type-I. Prog. Electromagn. Res. B 30, 355–370 (2011)

15. Singh, R., Vatsa, M., Noore, A.: Multimodal medical image fusion using redundant discrete
wavelet transform. In: Advances in Pattern Recognition, pp. 232–235 (2009)
16. Das, S., Kundu, M.K.: A neuro-fuzzy approach for medical image fusion. IEEE Trans.
Biomed. Eng. 60, 3347–3353 (2013)
17. Selesnick, I.W., Baraniuk, R.G., Kingsbury, N.G.: The dual-tree complex wavelet transform.
IEEE Signal Process. Mag. 22(6), 123–151 (2005)
Predicting Concussion Symptoms Using
Computer Simulations

Milan Toma(✉)

Computational Bio-FSI Laboratory, College of Engineering and Computing Sciences,
Department of Mechanical Engineering, New York Institute of Technology,
Northern Boulevard, Old Westbury, NY 11568, USA
tomamil@tomamil.eu
http://www.tomamil.com

Abstract. The reported rate of concussion is smaller than the actual
rate; less than half of concussion cases in high school football players
are reported. The ultimate concern associated with unreported concus-
sion is an increased risk of cumulative effects from recurrent injury. This
can, partially, be attributed to the fact that the signs and symptoms
of a concussion can be subtle and may not show up immediately. Com-
mon symptoms after a concussive traumatic brain injury are headache,
amnesia and confusion. Computer simulations, based on the impact force
magnitude, location and direction, are able to predict these symptoms
and their severity. When patients are aware of what to expect in the
days following head trauma, they are more likely to report the signs
of concussion, which decreases the potential risks of unreported injury.
In this work, the first ever fluid-structure interaction analysis is used to
simulate the interaction between the cerebrospinal fluid and a comprehensive
brain model to assess concussion symptoms under head trauma conditions.

Keywords: Head injury · Concussion · Fluid-structure interaction · Simulations

1 Introduction

In 1981, Goldsmith’s letter to the editor states, “The state of knowledge con-
cerning trauma of the human head is so scant that the community cannot agree
on new and improved criteria even though it is generally admitted that present
designations are not satisfactory” [1]. Even decades later, this assessment can
still be considered reasonable to a degree.
The head model presented here is the only model currently incorporating
cerebrospinal fluid (CSF) flow. Other reported head models treat CSF as a solid
part incapable of flowing around the brain when exposed to head trauma conditions
[2–6]. The CSF flows even on its own when the head is at rest, albeit slowly.
Obviously, when the head is exposed to a sudden stop, e.g. in a car accident,
the CSF flow around the brain makes a significant contribution to the head injury
mechanism. Without the flow, the simulated cushioning effect of CSF cannot be
considered realistic.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 557–568, 2019.
https://doi.org/10.1007/978-3-030-02686-8_42
The most common reasons for a concussion not being reported include a player
not thinking the injury is serious enough to warrant medical attention (66.4% of
unreported injuries), motivation not to be withheld from competition (41.0%),
and lack of awareness of a probable concussion (36.1%) [7]. Regardless of the
reason, as McCrea et al. state, "Future prevention initiatives should focus on
education to improve athlete awareness of the signs of concussion and potential
risks of unreported injury." [7]. Needless to say, there is an unlimited number
of trauma situations that can occur, and the concussion symptoms can vary from
one case to another.
In some cases, the skull is dented inward and presses against the surface of
the brain. These types of fractures occur in 11% of severe head injuries. In impact
sports, skull dentation rarely occurs. Most sport-related brain injuries result
from a coup-contrecoup type of injury. A coup-contrecoup injury is a dual impact
of the brain against the skull: the coup injury occurs at the point of impact, and
the contrecoup injury occurs on the opposite side of the impact as the brain rebounds,
see Fig. 1. The most common causes of coup-contrecoup brain injury include
circumstances in which the head jerks violently, e.g. during motor vehicle accidents,
when baseball players collide while chasing a ball, when football players tackle,
when boxers punch, and so on.

Fig. 1. Coup-contrecoup injuries, brain shifts inside the skull resulting in injuries at
point of impact and away from point of impact, e.g. forehead injury can result in
additional injury to occipital area.

The brain is composed of three main structural divisions, namely the cerebrum,
cerebellum, and brainstem. The cerebrum is divided into two cerebral hemispheres
connected by the corpus callosum and a shared ventricular system. The CSF fills a
system of cavities at the center of the brain, known as ventricles, and the
subarachnoid space surrounding the brain and spinal cord (Fig. 2). The CSF cushions
the brain within the skull and serves as a shock absorber for the central nervous
system [8,9].

Fig. 2. The schematic of the cerebrospinal fluid in which the brain is submerged. The
3D computational model used is designed based on this schematic.

2 Methods
The methods section describes the creation of the head model, the loading conditions
used for its validation, and the numerical and computational methods used.
A. Head Model
The five anatomical structures used in this study are shown in Fig. 3. They all
have unique material properties. This patient-specific model is based on Digital
Imaging and Communications in Medicine (DICOM) images acquired from an online
database. The skin, spinal cord, meninges, and arachnoid granulations are the
anatomical features missing in this model. Compared to the very short impact impulse
time history used in these simulations, the CSF flow in the head can be neglected,
too. The CSF flow speed, 0.05–0.08 m·s−1, is relatively slow compared to the speed
of an impact leading to traumatic brain injuries, i.e. during the impact impulse
time history the CSF flows by only 0.2–0.3 mm. Based on these assumptions, the
presence of the granulations can be neglected, too.
B. Loading Conditions
The type of brain injury differs based on whether the head is stationary and struck
by a moving object, or is moving and strikes a stationary object, according to [10].
A stationary head is usually hit by objects of similar mass to the head. In this
study, the scenario in Fig. 1 is used, and it is assumed that the impacting object
does not penetrate the skull. Thus, local deformation of the skull in the frontal
area does not result in direct contact injury to the underlying brain tissue. It has
been estimated that, for a contact area of approximately 6.5 cm2, the force required
to produce a clinically significant skull fracture in the frontal area of the cadaver
skull is twice that required in the temporoparietal area [11].
Corresponding loading conditions from the cadaveric experiments in [12] are used
to perform the computational analysis of a frontal impact. The experiments
examined a blow to the head of a seated human cadaver. The impact pulse
history applied to the skull of the computational model is shown in Fig. 4.
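For readers reproducing such loading, an impact impulse of this kind is often idealized as a half-sine force pulse. The sketch below uses assumed peak and duration values purely for illustration, not the actual curve from [12]:

```python
import numpy as np

def half_sine_pulse(t, peak, duration):
    """Idealized half-sine impact pulse F(t): rises to `peak` at
    t = duration/2 and returns to zero at t = duration."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t <= duration),
                    peak * np.sin(np.pi * t / duration),
                    0.0)

# Assumed illustrative values: 6 kN peak over a 5 ms impulse.
time = np.linspace(0.0, 0.008, 81)             # s
force = half_sine_pulse(time, 6000.0, 0.005)   # N
```

Sampling this pulse over the simulation time steps yields a load curve of the same general shape as Fig. 4.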

Fig. 4. Impact impulse time history used to simulate the cadaveric experiments in [12]
and applied to the skull in the current model.

C. Computer Simulations
As stated above, the model is comprised of five parts. Rigid material properties
with a density of 1900 kg·m⁻³ [13] are assigned to the skull part. A non-linear
elastic constitutive material model with varying material properties based on the
literature [14–18] is used to simulate the cerebrum, cerebellum, pituitary gland,
and brainstem. The cerebrum is composed of 96,385 tetrahedral elements. Sim-
ilarly, the cerebellum, brainstem, and pituitary gland are composed of 40,808,
18,634 and 310 tetrahedral elements, respectively. The smoothed-particle
hydrodynamics (SPH) method is used to model the CSF. A bulk modulus of
21.9 GPa [3] and a density of 1000 kg·m⁻³ [19] are used for the CSF. The
subarachnoid space between the skull and brain, and other cavities, are filled
with 94,690 fluid particles.
The IMPETUS Afea SPH Solver® (IMPETUS Afea AS, Norway) was used to
solve the fluid motion and boundary interaction calculations. Simultaneously,
the IMPETUS Afea Solver® was used to solve the large-deformation calculations
in the solid parts. In both solvers, a commodity GPU was used for parallel
processing. To remove the possibility of hourglass modes and element inversion
that plague the classic under-integrated elements, all solid elements were
fully integrated. An explicit integration scheme was used for both the fluid and
solid domains and their interaction. A standard “under the table” workstation
was used for all simulations. A Tesla K40 GPU with 12 GB of GDDR memory
and 2880 CUDA cores was used to achieve the parallel acceleration.
H-refinement of the finite element mesh was performed to confirm that convergence
was reached. The solutions were found to yield the same results with both
the chosen mesh size and a mesh with a higher number of elements. Similarly,
a higher number of fluid particles was used to obtain results within 5% of
the values obtained with the smaller number of particles. This confirmed that
the results are converged. The SPH equations in greater detail can be found
in our prior publication [20]. This study used the SPH method rather than the
traditional FSI techniques because the latter can be computationally expensive
and challenging regarding their parallelization [21]. Geometrical simplifications
would need to be used in order to use traditional FSI methods. Consequently, the
anatomical accuracy of the model would have to be sacrificed. Besides, recently
the SPH has been increasingly used in biomedical applications by other research
groups as well [22].
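The SPH discretization underlying this approach can be illustrated with a minimal kernel-summation density estimate. The sketch below is a generic 1D illustration, not the IMPETUS solver's implementation; the cubic spline kernel and particle spacing are illustrative choices:

```python
def cubic_spline_kernel(r, h):
    """Standard 1D cubic spline smoothing kernel W(r, h) used in SPH.

    Normalized so the kernel integrates to one over the real line in 1D.
    """
    q = r / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization constant
    if q <= 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    elif q <= 2.0:
        return sigma * 0.25 * (2.0 - q)**3
    return 0.0

def sph_density(positions, mass, h):
    """Density at each particle via the SPH summation:
    rho_i = sum_j m_j * W(|x_i - x_j|, h)."""
    return [
        sum(mass * cubic_spline_kernel(abs(xi - xj), h) for xj in positions)
        for xi in positions
    ]
```

For uniformly spaced particles carrying mass rho0·dx, the interior density recovered by the summation stays within a few percent of rho0, which is the kind of particle-count convergence check described above.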

3 Results
The results section shows validation of the simulations matching coup and con-
trecoup responses in CSF with experimental results. The stress values on the
cerebrum resulting from the frontal impact are shown, and the SPH impulse
intensity is superimposed with Brodmann’s map of cytoarchitectonics.
A. Validation
The loading conditions from cadaveric experiments (Fig. 4) applied to the frontal
lobe yield corresponding coup and contrecoup pressure responses in CSF, see
Fig. 5 where both experimental [12] and computational results are shown for
comparison.
B. Second Deviatoric Principal Stress
The stress values on the cerebrum resulting from the frontal impact are shown in
Fig. 6. Stress maxima can also be found on the occipital lobe, which supports
the experimental observation that forehead injury can result in additional injury
to the occipital area. A similar conclusion, i.e. stresses and strains seen in both
the frontal and occipital lobes, is also reported in other, more simplified
computational studies, e.g. [5].
Similar results, i.e. high stress values, are also found on the parietal lobe
(Fig. 7). Moreover, here an additional observation can be made: the high values
occur only on the posterior aspects of the gyri.
C. SPH impulse intensity
In biomedical fluid mechanics, the wall shear stress is often used to describe the
effect the fluid flow has on the surrounding structure. However, that variable
is challenging to derive when using SPH methods. Instead, SPH can provide
a different variable with a similar meaning. For example, the SPH impulse
intensity, i.e. the SPH-driven mechanical impulse per unit area in pascal-seconds,
has similar properties to wall shear stress.
The SPH impulse intensity at the peak impact impulse is shown in Fig. 8 [25].
At first, the SPH impulse intensity develops slowly, eventually reaching its
maximum values around the peak. The areas most affected by the fluid particles
during their migration to the occipital/parietal bones, i.e. the acceleration phase,
are the parietal and upper temporal lobes. The higher SPH impulse intensity
values become more visible also in the occipital lobe when the fluid particles
change direction and start their migration towards the frontal bone, i.e. at the
peak.
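As a rough illustration of the quantity itself, the impulse per unit area (Pa·s) amounts to integrating the fluid pressure acting on each wetted surface element over time. The sketch below assumes a per-element pressure history is available from the solver; the names and data layout are hypothetical, not the solver's API:

```python
def impulse_intensity(pressure_history, dt):
    """Accumulate mechanical impulse per unit area (Pa·s) for each surface element.

    pressure_history: list of per-time-step lists; pressure_history[t][e] is the
    fluid pressure (Pa) acting on surface element e at step t.
    dt: time-step size in seconds.
    """
    n_elems = len(pressure_history[0])
    intensity = [0.0] * n_elems
    for frame in pressure_history:
        # Rectangle-rule time integration of pressure on each element.
        for e, p in enumerate(frame):
            intensity[e] += p * dt
    return intensity
```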

Fig. 5. Coup (a) and contrecoup (b) pressure responses in cerebrospinal fluid compared
to the experimental results of Nahum et al. [12].

Fig. 6. High values of the second deviatoric principal stress are observed in both the
frontal and occipital lobes of the brain, i.e. forehead injury can result in additional
injury to occipital area. High values are prevalent mostly in the inner areas of the two
hemispheres close to the edges where longitudinal fissure separates the two halves of
the brain (dashed rectangle).

Cerebral structures have been correlated with specific functions [23, 24].
While the structure-function relationship is still debated, Brodmann’s map is
frequently cited [23]. Figure 8 superimposes Brodmann’s map of cytoarchitectonics
and depicts the functional areas most affected at the peak. Areas ‘40’, ‘4’, ‘3,1,2’
and ‘52’ are those covered with more than 10% of the SPH impulse intensity maxima
(10.1%, 11.7%, 15.3% and 21.7%, respectively).
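The per-area percentages quoted above amount to counting what fraction of the intensity-maxima locations fall inside each Brodmann area. A minimal sketch, assuming each surface node carries a (hypothetical) area label and a flag marking whether it belongs to the maxima set:

```python
from collections import Counter

def area_coverage(node_labels, is_maximum):
    """Percentage of intensity-maxima nodes falling in each labeled cortical area.

    node_labels: one area label per surface node (e.g. Brodmann area names).
    is_maximum: parallel list of booleans flagging the maxima nodes.
    """
    maxima_areas = [a for a, m in zip(node_labels, is_maximum) if m]
    total = len(maxima_areas)
    if total == 0:
        return {}
    counts = Counter(maxima_areas)
    return {area: 100.0 * n / total for area, n in counts.items()}
```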

Fig. 7. High values of the second deviatoric principal stress are observed also in the
parietal lobe. However, in the parietal lobe the areas with high values are observed
only in the posterior aspects of the gyri (schematic and dashed ellipsoid).

Fig. 8. The SPH impulse intensity at the peak superimposed with the Brodmann’s
map of cytoarchitectonics [25].

4 Discussion
The different layers of the brain move at different times because each layer has
a different density. Simplified computational models are not able to incorporate
this important aspect. Moreover, the interaction between the CSF and the brain
gyri and sulci cannot be analyzed computationally if the methods used do not
model the CSF as a fluid. This study uses a comprehensive head/brain model
with a detailed representation of all the parts, and the computational analysis
used is an FSI method with fluid properties for the CSF. The validation of the
model and the computational method is shown by comparing the coup and
contrecoup pressure responses in CSF with the experimental results from
cadaveric experiments.

A few anatomical features are omitted in the head model; namely the skin,
arachnoid granulations, spinal cord, vasculature, and meninges. Obviously, skin
is irrelevant in this case. Due to the relatively slow CSF flow, the arachnoid gran-
ulations are negligible. The spinal cord, vasculature, and meninges are omitted
at this stage to make the simulations less computationally expensive, but they
may be considered in future studies.
In Fig. 5, where coup and contrecoup pressure responses in CSF compared
to the experimental results of [12] are shown, it can be observed that the agree-
ment with the experimental results is better in the coup response as opposed
to that in the contrecoup response. The contrecoup pressure response reaches
slightly higher values compared to the experimental data because the contrecoup
response is secondary and therefore more dependent on the patient-specific
geometry used. However, both coup and contrecoup computational pressure
responses can be considered in good agreement with the experimental measurements.
As discussed, if the interaction of CSF with the brain is to be analyzed the
CSF has to be modeled with fluid elements or particles and not just with fluid-like
solid elements. The results then have potential to show more complex responses
to the loading conditions. For example, Fig. 6 shows that the contrecoup stress
response is prevalent mostly in the inner areas of the two hemispheres close to
the edges where longitudinal fissure separates the two halves of the brain. The
brain model is comprehensive containing multiple parts each with detailed real-
istic patient-specific geometry. The complexity of the model enables the analysis
of the brain down to the exact gyrus and sulcus. Additional areas of high stress
values can be found outside the frontal and occipital lobes. However, interest-
ingly, only the posterior aspect of the gyrus seems to be affected. This can be
explained by following the wave in the CSF that occurs after the impact to
the frontal lobe [25]. During the acceleration phase, when the brain tends to
move backwards relative to the skull, the fluid particles concentrate in the
space between the skull and the occipital lobe to provide a cushioning effect
and prevent the brain from impacting the skull. At that point, the moving
particles affect mostly the anterior sides of the gyri. When the brain rebounds
and tends to move forward relative to the skull, the fluid particles move to the
space between the skull and the frontal lobe to provide the cushioning effect there.
At that point, the moving particles affect mostly the posterior sides of the gyri.
Other parts of the brain, such as the brain stem, are equally affected by the
coup-contrecoup injury.
The variables readily available in SPH methods are somewhat different
from those commonly used to post-process results in biomedical fluid
mechanics, e.g. wall shear stress, the extraction of which would be more challenging
when using SPH methods. On the other hand, the SPH impulse intensity
can be used in its stead, as it offers a similar meaning. In order to maintain as
much anatomical accuracy as possible, SPH is used in this study instead of
traditional FSI techniques, which would require more anatomical simplifications
to keep the convergence criteria satisfied.

The cortical areas affected by the SPH impulse intensity at the peak are
presented in Fig. 8 [25, 26]. It is proposed that the patterns of SPH impulse intensity
maxima may represent the cortical areas most affected by a concussion. Areas
‘40’, ‘4’, ‘3,1,2’, and ‘52’ are the Brodmann areas with at least 10% coverage
of the maximal SPH impulse intensity. The left supramarginal gyrus, i.e. Brodmann
area ‘40’, receives input from multiple sensory modalities and supports complex
linguistic processes. Lesions in that area may yield Gerstmann syndrome and
fluent aphasia, such as Wernicke’s aphasia. Motor functions are typically asso-
ciated with Brodmann area ‘4’, but it also plays a supportive role in sensory
perception. Lesions there may result in paralysis and decreased somatic sensa-
tion. Brodmann areas ‘3,1,2’ comprise the postcentral gyrus in the parietal lobe
and are primarily associated with somatosensory perception. Lesions there may
result in cortical sensory impairments, e.g. loss of fine touch and proprioception.
Brodmann area ‘52’, i.e. the parainsular area, is the smallest of the mentioned areas
and has the highest percentage of SPH impulse intensity maxima coverage. It
joins the insula and the temporal lobe.
This validated model, where an FSI method is used to analyze the interac-
tion between CSF and brain, is a step closer to understanding the mechanisms of
brain injuries. Concussions are usually diagnosed symptomatically. Patients may
exhibit a range of symptoms, such as headache, tinnitus, photophobia, sleepiness,
dizziness, behavioral changes and confusion. Different affected brain areas
would potentially result in different sets of symptoms. The model and method
presented in this study can predict the affected areas based on the loading
conditions. Therefore, the symptoms can be predicted, too. Since the signs and
symptoms of a concussion can be subtle and may not show up immediately, a
numerical analysis of this kind could serve as a predictor for physicians and
patients, who could then be warned about what symptoms to expect. Hence,
if used in practice, it has the potential to contribute to early diagnosis, which
is important in the treatment of concussion.

References
1. Goldsmith, W.: Current controversies in the stipulation of head injury criteria -
letter to the editor. J. Biomech. 14(12), 883–884 (1981)
2. Luo, Y., Li, Z., Chen, H.: Finite-element study of cerebrospinal fluid in mitigating
closed head injuries. J. Eng. Med. 226(7), 499–509 (2012)
3. Chafi, M.S., Dirisala, V., Karami, G., Ziejewski, M.: A finite element method
parametric study of the dynamic response of the human brain with different
cerebrospinal fluid constitutive properties. Proc. Inst. Mech. Eng., Part H: J. Eng.
Med. 223(8), 1003–1019 (2009)
4. Liang, Z., Luo, Y.: A QCT-based nonsegmentation finite element head model for
studying traumatic brain injury. Appl. Bionics Biomech. 2015, 1–8 (2015)
5. Gilchrist, M.D., O’Donoghue, D.: Simulation of the development of the frontal
head impact injury. J. Comp. Mech. 26, 229–235 (2000)
6. Ghajari, M., Hellyer, P.J., Sharp, D.J.: Computational modelling of traumatic
brain injury predicts the location of chronic traumatic encephalopathy pathology.
Brain 140(2), 333–343 (2017)
7. McCrea, M., Hammeke, T., Olsen, G., Leo, P., Guskiewicz, K.: Unreported con-
cussion in high school football players: implications for prevention. Clin. J. Sport
Med. 14(1), 13–17 (2004)
8. Rengachary, S.S., Ellenbogen, R.G.: Principles of Neurosurgery. Elsevier Mosby,
New York (2005)
9. Toma, M., Nguyen, P.: Fluid-structure interaction analysis of cerebral spinal fluid
with a comprehensive head model subject to a car crash-related whiplash. In: 5th
International Conference on Computational and Mathematical Biomedical Engi-
neering - CMBE2017. University of Pittsburgh, Pittsburgh (2017)
10. Yanagida, Y., Fujiwara, S., Mizoi, Y.: Differences in the intracranial pressure
caused by a blow and/or a fall - experimental study using physical models of
the head and neck. Forensic Sci. Int. 41, 135–145 (1989)
11. Nahum, A.M., Gatts, J.D., Gadd, C.W., Danforth, J.: Impact tolerance of the
skull and face. In: 12th Stapp Car Crash Conference, Warrendale, PA, pp. 302–
316. Society of Automotive Engineers (1968)
12. Nahum, A.M., Smith, R.W., Ward, C.C.: Intracranial pressure dynamics during
head impact. In: 21st Stapp Car Crash Conference (1977)
13. Fry, F.J., Barger, J.E.: Acoustical properties of the human skull. J. Acoust. Soc.
Am. 63(5), 1576–1590 (1978)
14. Barber, T.W., Brockway, J.A., Higgins, L.S.: The density of tissues in and about
the head. Acta Neurol. Scandinav. 46, 85–92 (1970)
15. Elkin, B.S., Azeloglu, E.U., Costa, K.D., Morrison, B.: Mechanical heterogene-
ity of the rat hippocampus measured by atomic force microscope indentation. J.
Neurotrauma 24, 812–822 (2007)
16. Gefen, A., Gefen, N., Zhu, Q., Raghupathi, R., Margulies, S.S.: Age-dependent
changes in material properties of the brain and braincase of the rat. J. Neurotrauma
20, 1163–1177 (2003)
17. Kruse, S.A., Rose, G.H., Glaser, K.J., Manduca, A., Felmlee, J.P., Jack Jr., C.R.,
Ehman, R.L.: Magnetic resonance elastography of the brain. Neuroimage 39, 231–
237 (2008)
18. Moore, S.W., Sheetz, M.P.: Biophysics of substrate interaction: influence on neural
motility, differentiation, and repair. Dev. Neurobiol. 71, 1090–1101 (2011)
19. Lui, A.C., Polis, T.Z., Cicutti, N.J.: Densities of cerebrospinal fluid and spinal
anaesthetic solutions in surgical patients at body temperature. Can. J. Anaesth.
45(4), 297–303 (1998)
20. Toma, M., Einstein, D.R., Bloodworth, C.H., Cochran, R.P., Yoganathan, A.P.,
Kunzelman, K.S.: Fluid-structure interaction and structural analyses using a com-
prehensive mitral valve model with 3D chordal structure. Int. J. Numer. Meth.
Biomed. Engng. 33(4), e2815 (2017). https://doi.org/10.1002/cnm.2815
21. Toma, M., Oshima, M., Takagi, S.: Decomposition and parallelization of strongly
coupled fluid-structure interaction linear subsystems based on the Q1/P0
discretization. Comput. Struct. 173, 84–94 (2016). https://doi.org/10.1016/j.
compstruc.2016.06.001
22. Toma, M.: The emerging use of SPH in biomedical applications. Significances Bio-
eng. Biosci. 1(1), 1–4 (2017). SBB.000502
23. Brodmann, K.: Vergleichende Lokalisationslehre der Grosshirnrinde (in German).
Johann Ambrosius Barth, Leipzig (1909)

24. Trans Cranial Technologies Ltd. (ed.): Cortical Functions (2012)
25. Toma, M., Nguyen, P.: Fluid-structure interaction analysis of cerebrospinal fluid
with a comprehensive head model subject to a rapid acceleration and deceleration.
Brain Inj. 1–9 (2018). https://doi.org/10.1080/02699052.2018.1502470
26. Varlotta, C., Toma, M., Neidecker, J.: Ringside physicians’ medical manual for
boxing and mixed martial arts: technology & impact sensor testing. Association of
Ringside Physicians, Chapter D10 (2018)
Integrating Markov Model, Bivariate Gaussian
Distribution and GPU Based Parallelization
for Accurate Real-Time Diagnosis
of Arrhythmia Subclasses

Purva R. Gawde1(✉), Arvind K. Bansal1, and Jeffery A. Nielson2

1 Department of Computer Science, Kent State University, Kent, OH 44240, USA
pgawde@kent.edu, arvind@cs.kent.edu
2 Department of Emergency, Northeast Ohio Medical University, Rootstown, OH, USA
jeffnielson@gmail.com

Abstract. In this paper, we present the integration of SIMT (Single Instruction
Multiple Threads), Markov model and bivariate Gaussian distribution as a
general-purpose technique for real-time accurate diagnosis of subclasses of
arrhythmia. The model improves the accuracy by integrating both morpholog-
ical and temporal features of ECG. GPU based implementation exploits con-
current execution of multiple threads at the heart-beat level to improve the
execution efficiency. The approach builds a bivariate Gaussian Markov model
(BGMM) for each subclass of arrhythmia where each state includes bivariate
distribution of temporal and morphological features of each waveform and ISO-
lines using ECG records for each subclass from standard databases, and the
edge-weights represent the transition probabilities between states. Limited 30-
second subsequences of a patient’s beats are used to develop bivariate Gaussian
transition graphs (BGTG). BGTGs are matched with each of the BGMMs to
derive the exact classification of BGTGs. Our approach exploits data-parallelism
at the beat level for ECG preprocessing, building BGTGs and matching multiple
BGTG-BGMM pairs. SIMT (Single Instruction Multiple Thread) available on
CUDA resources in GPU has been utilized to exploit data-parallelism. Algo-
rithms have been presented. The system has been implemented on a machine
with an NVIDIA CUDA based GPU. Test results on the standard MIT-BIH
database show that GPU based SIMT improves the execution time further by
78%, with an overall speedup of 4.5, while retaining the approximately 98%
accuracy achieved by the sequential execution of the approach.

Keywords: Arrhythmia · AI techniques · ECG analysis · Gaussian · GPU ·
Markov model · Medical diagnosis · Machine learning · Parallelism ·
Wearable devices

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 569–588, 2019.
https://doi.org/10.1007/978-3-030-02686-8_43
570 P. R. Gawde et al.

1 Introduction

An aging population is challenging the current healthcare system by increasing costs,
creating a lack of healthcare personnel, and contributing to more complex combinations
of chronic diseases [1]. Cardiovascular diseases like arrhythmia, ischemia, myocardial
infarction and cardiomyopathy (including hypertrophy) are some of the most common
problems in elderly leading to sudden cardiac death (SCD) [1, 2] and congestive heart
failure. Often, these symptoms go undetected due to the transient nature of symptoms
and the mobile life-style of the modern society. Transitory nature of arrhythmia
requires monitoring of ECG to diagnose and reduce the risk of SCD [1] including life-
threatening ventricular fibrillation [3].
The demand for an improved healthcare system requires development of infor-
mation technology, and one area of opportunity is wearable smart monitoring devices
[4, 5]. Advances in microelectronics have provided smaller, faster and more affordable
embedded platforms for personal monitoring systems such as the NVIDIA Jetson GPU
[4, 5]. Most of these wearable biomedical systems can detect a variety of abnormalities
such as stress, oxygen level saturation, ischemia and arrhythmias, but with limited
accuracy.
ECG signal analysis for real-time detection of abnormalities involves computation-
ally expensive modules like signal denoising, morphological and temporal feature
extractions; complex functional transforms [6], computational intelligence techniques for
classification and machine learning. The AI techniques include the use of Bayesian
network [7], neural networks [8] and Markov models [9, 10]. The computational overhead
of exploiting these techniques is significant and violates the basic requirement of
resource-limited smart wearable devices diagnosing abnormality accurately in real-time.
In recent years, several researchers have exploited GPU based SIMT (Single
Instruction Multiple Threads) parallelism to improve the computational efficiency for
automated ECG analysis [11], de-noising [12], and classification of premature beats
using neural networks [8, 13]. Different techniques for parallelization include
time-domain analysis [7] and probabilistic neural networks [13]. For arrhythmic
beat classification, Fan et al. [14] have proposed GPU based detection of seven
types of beats using thresholds and a rule-based system. These studies indicate that GPU based
parallelization significantly improves the computational efficiency of ECG analysis.
However, these studies separate only premature ventricular complex beats from normal
beats [13], and do not address the diagnosis of the subclassification of ventricular and
supraventricular arrhythmia in real time.
The finer classification of arrhythmias is important because different subclasses
require different treatment [2]. For instance, ventricular tachycardia is generally treated
with antiarrhythmic drugs [2]; while ventricular fibrillation needs immediate treatment
by a defibrillator. Subclasses of supraventricular arrhythmia, like atrial flutter, can
result in blood clots leading to cerebrovascular events [3] if not treated.
Finer subclassification requires an integrated model that can capture both mor-
phological and temporal characteristics of ECG and consider transition probabilities
within waveforms to account for waveform variations. Arrhythmic ECG also presents a
Integrating Markov Model, Bivariate Gaussian Distribution 571

challenge when some waveform features are embedded in another waveform [3], which
can lead to misclassification [1, 3].
Our earlier work focused on detecting finer subclasses of supraventricular and
ventricular arrhythmia in real time using the integration of Markov models and the
identification of embedded P-waves [9, 10]. The run-time detection of the disease
subclass requires: (1) statistical derivation of a theoretical Markov model graph for
each subclass; (2) dynamically building a real-time graph using a limited number of
beats at the run-time; and (3) matching the real-time graph from an individual patient to
the derived graphs to best classify the patient condition.
Our previous work needs further improvement in time-efficiency because
resource-limited wearable devices also need to analyze the ECG for other heart
abnormalities such as ischemia (lack of oxygen), electrolyte imbalances such as
hyperkalemia (excessive potassium), and myocardial infarction (heart failure due
to prolonged ischemia), to name a few. Additional improvement in execution
time is required to facilitate the concurrent, real-time detection of these
abnormalities in resource-limited miniaturized wearable devices [4, 5].
In this research, we propose an integrated general-purpose BGMM (Bivariate
Gaussian Markov Model) model that further improves the accuracy by associating
bivariate Gaussian distribution of amplitude and duration of the waveforms and ISO-
lines with each state of the Markov model. We improve execution efficiency by
exploiting SIMT parallelism available on GPU as shown in Fig. 1.

Fig. 1. Personal monitoring system for multiple abnormalities detection.

The major contributions in this paper are:


1. The development of a general-purpose model that integrates bivariate Gaussian
distribution of amplitude and duration of waveforms for a state with Markov model
to integrate morphological and temporal features.
2. The exploitation of SIMT based concurrency on GPUs that significantly improves
the execution efficiency of the finer subclassification of arrhythmia.
3. The development of multiple algorithms for beat-level exploitation of SIMT for
dynamic graph building and graph matching exploiting expectation maximization
for the arrhythmia subclassification.
The remainder of the paper is organized as follows: Sect. 2 describes the back-
ground concepts of Markov model and bivariate Gaussian distribution. Section 3
describes our BGMM based approach for arrhythmia subclassification. Section 4 dis-
cusses SIMT parallelization of the approach. Section 5 discusses algorithms for

Fig. 2. A subclassification of Arrhythmia.

execution of kernel functions; Sect. 6 discusses implementation and performance
results. Section 7 compares our approach and performance with other related works.
Section 8 concludes the paper and discusses future directions.

2 Background

2.1 Arrhythmia Subclassification


Arrhythmia is defined as irregular heartbeats caused by irregular and refractory
pulse patterns due to ectopic nodes arising outside the sinus node.
Arrhythmia is broadly classified into either supraventricular arrhythmias arising
above lower chambers of heart, or ventricular arrhythmia arising in the lower chambers
of heart. Supraventricular arrhythmias are further subclassified as: (1) Atrial fibrillation
(AFib); (2) Atrial flutter (AF); (3) Atrial-ventricular nodal reentry tachycardia
(AVNRT); and (4) Ectopic atrial tachycardia (EAT). Ventricular arrhythmia is clas-
sified into three major subclasses: (1) Ventricular Tachycardia (VTach), (2) Ventricular
Flutter (VFlu) and (3) Ventricular Fibrillation (VFib). Different subclasses have dif-
ferent levels of threat to health, and are treated differently [3].
Atrial fibrillation (AFib) is characterized by the absence of P-waves and a QRS
complex duration of less than 120 ms with an atrial rate of 400–600 beats per minute
(bpm). Atrial Flutter (AF) is characterized by the presence of P-waves with shorter
duration, elevated PQ baseline and 250–350 atrial bpm. Atrial-ventricular nodal
reentry tachycardia (AVNRT) is characterized by retrograde P-waves after or embedded
inside QRS-complex, with an atrial rate of 250–300 bpm. Ectopic atrial tachycardia
(EAT) is characterized by the negative P-waves, T-wave elevation and heart rate of
around 150 bpm.
Ventricular Tachycardia (VTach) is typically characterized by wide S-wave
(>100 ms), elevated R-wave, wide T-wave and heart rate greater than 100 bpm.
Ventricular Flutter (VFlu) is characterized by the absence of P-waves, T-waves, S-
waves, baselines, wide R-waves, elevated amplitude of R-waves and increased QT
duration with heart rate 180–250 bpm. Ventricular Fibrillation (VFib) is characterized
by no identifiable P-wave, T-wave or ISO lines, elevated ST baselines and heart rate of
150–500 bpm.
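The subclass characteristics listed above can be collected into a small lookup table. The sketch below transcribes the heart-rate ranges and P-wave markers from the text (EAT's "around 150 bpm" is kept as a single value; the helper function is ours, not part of the paper's pipeline):

```python
# Heart-rate ranges (beats per minute) and key P-wave markers per arrhythmia
# subclass, transcribed from the descriptions above. None means "no upper bound".
SUBCLASS_FEATURES = {
    "AFib":  {"rate_bpm": (400, 600), "p_wave": "absent"},
    "AF":    {"rate_bpm": (250, 350), "p_wave": "short, elevated PQ baseline"},
    "AVNRT": {"rate_bpm": (250, 300), "p_wave": "retrograde/embedded in QRS"},
    "EAT":   {"rate_bpm": (150, 150), "p_wave": "negative"},
    "VTach": {"rate_bpm": (100, None), "p_wave": None},
    "VFlu":  {"rate_bpm": (180, 250), "p_wave": "absent"},
    "VFib":  {"rate_bpm": (150, 500), "p_wave": "absent"},
}

def candidates_for_rate(rate):
    """Return subclasses whose documented rate range contains the observed rate."""
    out = []
    for name, feats in SUBCLASS_FEATURES.items():
        lo, hi = feats["rate_bpm"]
        if rate >= lo and (hi is None or rate <= hi):
            out.append(name)
    return sorted(out)
```

As the overlapping ranges show, heart rate alone cannot separate the subclasses, which is exactly why the paper combines morphological and temporal features.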

2.2 Markov Model


A Markov model [15] is a probabilistic finite-state nondeterministic automaton mod-
eled by a 5-tuple of the form (set of all states, set of initial states, set of final states,
transition matrix, and initial-state-probability-vector). Weighted edges are the transition
probabilities between two adjacent states. Statistical analysis based upon transition
frequency is used to build Markov models.
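Estimating the transition matrix from observed transition frequencies can be sketched as follows; the function and variable names are illustrative:

```python
def transition_matrix(state_sequence, states):
    """Estimate Markov transition probabilities from transition frequencies."""
    idx = {s: i for i, s in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    # Count each adjacent pair of states in the observed sequence.
    for a, b in zip(state_sequence, state_sequence[1:]):
        counts[idx[a]][idx[b]] += 1
    # Normalize each row so its outgoing probabilities sum to one.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs
```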

2.3 Bivariate Gaussian Distribution


The joint distribution of two variables, denoted as A and B, each having a normal
Gaussian distribution [16, 17], is calculated using the conditional variance, which
depends on the correlation between the variables [17]. Assume that $\mu_A$ and
$\sigma_A$ represent the mean and standard deviation of the variable A, and $\mu_B$
and $\sigma_B$ represent the mean and standard deviation of B. The conditional mean
of the variable B is calculated by (1):

$$E(B \mid A) = \mu_B + \rho \frac{\sigma_B}{\sigma_A} (A - \mu_A) \qquad (1)$$

where $\rho$ represents the correlation between the variables A and B. The conditional
variance of B is calculated by (2):

$$\sigma_{B \mid A}^{2} = \sigma_B^{2} \left( 1 - \rho^{2} \right) \qquad (2)$$

The conditional distribution of the variable B given A = a is calculated by (3):

$$h(b \mid a) = \frac{1}{\sigma_{B \mid A} \sqrt{2\pi}} \exp \left[ - \frac{\left( b - \mu_{B \mid A} \right)^{2}}{2 \sigma_{B \mid A}^{2}} \right] \qquad (3)$$

Using the conditional distribution of B, the joint probability distribution is
calculated by (4):

$$f(a, b) = f_A(a) \cdot h(b \mid a) \qquad (4)$$
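Equations (1)–(4) can be transcribed directly into code. A plain-Python sketch (function and parameter names are ours, not from the paper):

```python
import math

def conditional_mean(a, mu_a, mu_b, sd_a, sd_b, rho):
    """Eq. (1): E(B | A = a) = mu_B + rho * (sd_B / sd_A) * (a - mu_A)."""
    return mu_b + rho * (sd_b / sd_a) * (a - mu_a)

def conditional_var(sd_b, rho):
    """Eq. (2): var(B | A) = sd_B^2 * (1 - rho^2)."""
    return sd_b**2 * (1.0 - rho**2)

def conditional_pdf(b, a, mu_a, mu_b, sd_a, sd_b, rho):
    """Eq. (3): Gaussian density of B given A = a."""
    mu = conditional_mean(a, mu_a, mu_b, sd_a, sd_b, rho)
    var = conditional_var(sd_b, rho)
    return math.exp(-(b - mu)**2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def joint_pdf(a, b, mu_a, mu_b, sd_a, sd_b, rho):
    """Eq. (4): f(a, b) = f_A(a) * h(b | a)."""
    f_a = math.exp(-(a - mu_a)**2 / (2.0 * sd_a**2)) / (sd_a * math.sqrt(2.0 * math.pi))
    return f_a * conditional_pdf(b, a, mu_a, mu_b, sd_a, sd_b, rho)
```

A quick sanity check: with rho = 0 the joint density factors into the product of the two marginals, as expected for independent Gaussians.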

2.4 Statistical Modeling of ECG for Subclassification

Bivariate Gaussian Markov Model (BGMM). A bivariate Gaussian Markov model


(BGMM) is a special class of Markov models that integrates joint Gaussian distribution
[17] of feature vectors for the states and probabilistic transitions between the
states. It is modeled as a weighted directed graph where the transition probabilities
between two adjacent states represent the weights of the edges, and each state
carries the joint Gaussian distribution of two variables: amplitude and duration.

A BGMM has eight states and the transitions between them. The eight states are: (1) P-wave features;
(2) Q-wave features; (3) R-wave features; (4) S-wave features; (5) T-wave feature;
(6) PQ iso-segment; (7) ST iso-segment; and (8) TP iso-segment.
Bivariate Gaussian Transition Graph (BGTG). A bivariate Gaussian transition
graph (BGTG) is a weighted directed graph that shows the probability of transition
between the adjacent states of a finite state automaton like a BGMM. However, a
BGTG is built from a small sample of data-elements from a single patient’s
heart-beats, whereas a BGMM carries a large sample size from multiple patients
having a common physician-annotated abnormality. Matching a BGTG against the
BGMM graphs provides the subclassification of a patient’s ECG.
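The matching criterion is not spelled out at this point in the paper; one common choice, shown purely as an assumption below, is a log-likelihood-style score of the patient's transition probabilities against each model's, taking the best-scoring BGMM as the subclass:

```python
import math

def match_score(bgtg_edges, bgmm_edges, eps=1e-9):
    """Log-likelihood-style score of a patient transition graph against a BGMM.

    Both arguments are dicts mapping (state_from, state_to) -> transition
    probability. Higher (less negative) scores indicate a better match; eps
    guards against edges the model has never seen.
    """
    score = 0.0
    for edge, p_patient in bgtg_edges.items():
        p_model = bgmm_edges.get(edge, eps)
        score += p_patient * math.log(p_model)
    return score

def classify(bgtg, bgmms):
    """Return the subclass label of the best-matching BGMM."""
    return max(bgmms, key=lambda label: match_score(bgtg, bgmms[label]))
```

In the full method the score would also include the bivariate Gaussian state likelihoods of eqs. (1)–(4), not just the edge weights.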

2.5 ECG Signal Preprocessing

Denoising. Raw ECG signals from the MIT-BIH database [18] contain at least three
types of noise: electromyography noise from muscles’ movement, radio frequencies
and power line noise [6]. Discrete Wavelet Transforms (DWT), a multi-resolution
decomposition scheme is used to eliminate these noises [6]. The source signal is
decomposed into low and high frequency sub-bands. Low-pass and high-pass filters are
used to remove low-frequency and high-frequency sub-bands, respectively.
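The decomposition step can be illustrated with the simplest wavelet, the Haar transform, rather than the multi-level DWT filters used in [6]; thresholding the detail (high-frequency) band and reconstructing gives a toy denoiser (even-length input assumed):

```python
def haar_dwt(signal):
    """One-level Haar DWT: split into approximation and detail coefficients."""
    approx = [(signal[i] + signal[i + 1]) / 2**0.5 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2**0.5 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse one-level Haar DWT (exact reconstruction)."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / 2**0.5)
        out.append((a - d) / 2**0.5)
    return out

def denoise(signal, threshold):
    """Zero out small detail (high-frequency) coefficients and reconstruct."""
    approx, detail = haar_dwt(signal)
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_idwt(approx, detail)
```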
Feature Extraction. Amplitude and duration for waveforms (P, Q, R, S and T) and for
baselines (TP or ISO1, PQ or ISO2 and ST or ISO3): Daubechies 6 (D6) wavelet
transform is used to detect amplitude and duration of waveforms in each beat [6].
Wavelet transforms are scaled up to eight levels to obtain corresponding approximation
coefficients. Four separate algorithms [6] are used to detect R-wave, Q and S-wave, PQ
and ST segments, and P-waves. Based on zero crossings of waveforms, durations of the
waveforms and baselines are derived.
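A drastically simplified stand-in for the R-wave detection step (the actual system uses four wavelet-based algorithms [6]) is threshold-plus-local-maximum peak picking, from which R-R durations follow:

```python
def detect_r_peaks(ecg, fs, threshold, refractory_s=0.2):
    """Toy R-wave detector: local maxima above a threshold, separated by a
    physiological refractory period (default 200 ms).

    ecg: list of samples; fs: sampling rate in Hz.
    """
    refractory = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(ecg) - 1):
        if ecg[i] > threshold and ecg[i] >= ecg[i - 1] and ecg[i] > ecg[i + 1]:
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)
    return peaks

def rr_durations(peaks, fs):
    """R-R intervals in seconds, from which heart rate and beat windows follow."""
    return [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
```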
SIMT and Parallel Computations. SIMT (Single Instruction Multiple Threads)
paradigm is based upon executing the same sequence of instructions concurrently
by spawning multiple light-weight threads.

2.6 GPU and CUDA Architecture


A CUDA-based GPU has multiple multiprocessor cores and acts as a coprocessor to the main
CPU. CUDA (Compute Unified Device Architecture) supports data-parallelism using the
SIMT paradigm by spawning a high number of concurrent threads on different sets of
data elements in compute-intensive applications [19]. Streaming multiprocessors
(SMs) are assigned multiple groups of threads called blocks using a grid architecture
[19], as shown in Fig. 3.
Each SM has multiple CUDA cores comprising ALUs, floating-point units (FPUs),
load/store units and registers. These cores are assigned automatically by the SM
scheduler to balance the load. The GPU supports high-latency global
memory to share information between CPU and GPU, short latency constant memory
that cannot be altered during a thread’s execution, limited on-chip shared memory and
local memory. Global memory is also used to share information across SMs. Constant
memory is a cache memory written into before spawning the corresponding thread. It
does not allow rewriting during the thread execution.
Integrating Markov Model, Bivariate Gaussian Distribution 575

Fig. 3. A CUDA architecture. [The CPU launches a kernel on the GPU; the grid contains blocks, and each block contains multiple threads.]
A block is a group of threads that can be executed concurrently. These threads
communicate with each other using low-latency shared memory. The threads are
automatically allocated CUDA cores to exploit concurrency and balance the load.
NVIDIA GPU Based Architecture. The NVIDIA GPU exploits data parallelism by
concurrently spawning multiple threads. These threads are automatically allocated
CUDA cores, over which a programmer has no control. Distribution of data on SMs for
exploiting concurrency is also automated, and cannot be specified by the programmer
either. Spawning multiple blocks enhances the chance of concurrent utilization of
multiple SMs by mapping different blocks onto different SMs.
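The grid/block/thread mapping above can be mirrored in ordinary index arithmetic. The following Python sketch reproduces CUDA's flat thread indexing (`blockIdx.x * blockDim.x + threadIdx.x`) for assigning one beat per thread; it is an illustration of the mapping, not our CUDA code:

```python
def global_thread_id(block_idx, block_dim, thread_idx):
    """CUDA-style flat index: blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx

def partition_beats(n_beats, block_dim):
    """Assign each beat to a (block, thread) pair, as a one-beat-per-thread grid launch would."""
    mapping = {}
    for beat in range(n_beats):
        mapping[beat] = (beat // block_dim, beat % block_dim)
    return mapping
```

With, say, 70 beats and 32 threads per block, three blocks are launched and the last block is only partially occupied, which is why kernels normally guard against out-of-range indices.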

3 BGMM Based Classification of Arrhythmia

Each state of the BGMM is associated with joint distribution of two variables:
amplitude and duration. Transitions between the states represent transition probabilities
between the states.
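The joint density attached to each state can be written down directly. The sketch below evaluates a bivariate Gaussian at an observed (amplitude, duration) pair; the parameters (means, standard deviations and correlation ρ) would be estimated from the annotated beats, and the function name is ours:

```python
import math

def bivariate_gaussian_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Density of a bivariate Gaussian at (x, y); here x is amplitude, y is duration."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    # Quadratic form of the 2-D Gaussian with correlation rho (|rho| < 1)
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-q / 2.0) / norm
```

At the mean with ρ = 0 this reduces to 1 / (2π σx σy), and the density decreases as the observation moves away from the state's mean amplitude and duration.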
A value of zero has different meanings for amplitude and duration. A zero duration for
any of the baseline segments ISO1 (TP-segment), ISO2 (PQ-segment) and ISO3 (ST-segment)
implies that the corresponding state in the BGMM is bypassed (i.e. the event
never occurred). Conversely, a zero amplitude is anticipated and does not imply the
absence of transitions between the ISO-states and the corresponding P-Q-R-S-T states,
because ISO-states have no peak (i.e. zero amplitude) in regular heartbeats.
P-waves embedded in the QRS-complex are considered missing.
The overall approach for real time irregular beat subclassification is divided into
two phases (as shown in Fig. 4): (1) a training phase that uses the standard MIT-BIH
database [18], and (2) a dynamic diagnosis phase based upon real-time collection and
analysis of a sequence of multiple beats-windows.
Training Phase: A BGMM is constructed for each subclass using the annotated MIT-
BIH database [18]. The training phase has four stages: (1) denoising the beats;
(2) feature extraction (amplitude and duration of each waveform in a beat); (3) area
subtraction to identify embedded waveforms; and (4) construction of Markov model.
Dynamic Detection Phase: This phase has six stages: (1) denoising of the acquired beats;
(2) heartbeat collection for a 30 s window; (3) morphological and temporal feature
extraction; (4) embedded P-wave and R-wave detection; (5) BGTG construction; and
(6) BGTG classification.

Fig. 4. Bivariate Gaussian Markov model approach. [Pipeline diagram. Training phase: denoising → feature extraction → embedded waveform detection → BGMM construction. Dynamic phase: denoising → feature extraction → embedded waveform detection → first-window analysis → BGTG construction → graph matching.]
The second stage is executed once for the first window of the signal; subsequent windows
do not require this stage because they incrementally build the statistical information by
adding the next beat's information and removing the least recent beat's information. A
window of 30 s is chosen for beat analysis to balance the quick response time needed in
emergency conditions against the need to maintain accuracy.
Each GPU block analyzes around 20 beats based on optimal error analysis [17] using a
confidence interval of 95%. Statistical analysis showed that the error increases by 2% for
10 beats and decreases by only 0.2% for 40 beats; however, performance degrades for
a 40-beat window.

3.1 Embedded Waveforms Detection


Embedded waveform analysis is required to recover one waveform embedded in
another; this can occur in the same beat or in a preceding beat. An embedded waveform
can often be mistakenly considered missing [3], leading to misclassification of
subclasses [9, 10]. In our previous work [9, 10], we identified P-waves embedded in the
QRS-complex for the accurate diagnosis of EAT, and R-waves embedded in the T-wave of the
previous beat in VTach.
Embedded waveforms are detected by an area-subtraction technique [10, 20]. Area
subtraction finds the mean area of each type of waveform and subtracts the observed
waveform area in the current beat from the corresponding mean. The calculation uses a
threshold for identifying embedded waveforms [3, 10] with a confidence interval [17] of
95%. After area subtraction of the initial waveform, the embedded P-wave or R-wave is
assigned the mean amplitude and duration.
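The area-subtraction decision can be sketched as follows. This is an illustrative Python sketch: the threshold value is determined experimentally (Sect. 6), and the function and field names here are ours, not those of the implementation:

```python
def detect_embedded_p_wave(qrs_area, mean_qrs_area, threshold,
                           mean_p_amplitude, mean_p_duration):
    """If the observed QRS area exceeds the mean QRS area by more than a threshold,
    treat a P-wave as embedded in the QRS-complex and assign it the mean features."""
    if qrs_area - mean_qrs_area > threshold:
        return {"present": True,
                "amplitude": mean_p_amplitude,
                "duration": mean_p_duration}
    return {"present": False, "amplitude": 0.0, "duration": 0.0}
```

An enlarged QRS area relative to the patient's mean suggests that the missing P-wave is hidden inside the complex rather than truly absent.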

3.2 Bivariate Gaussian Transition Graph (BGTG) Construction


A BGTG is constructed by extracting the amplitude and duration of each of the eight
states and transitions between them. Zero durations in waveforms or ISO-states reflect
missing corresponding states. Embedded wave analysis is utilized to identify the absent
edges in the Markov model. Frequency analysis is used to derive transition probability.
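The frequency analysis for transition probabilities amounts to counting state-to-state transitions over the beat window and normalizing per source state. A minimal Python sketch (function name ours):

```python
from collections import Counter, defaultdict

def transition_probabilities(state_sequence):
    """Estimate P(state_j | state_i) by frequency analysis over a sequence of states."""
    counts = defaultdict(Counter)
    for a, b in zip(state_sequence, state_sequence[1:]):
        counts[a][b] += 1
    probs = {}
    for a, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[a] = {b: c / total for b, c in outgoing.items()}
    return probs
```

States that never occur in the window simply get no outgoing edges, which is how missing waveforms manifest as absent edges in the BGTG.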

Figure 5 shows an example of a BGTG constructed for annotated beats of the EAT
arrhythmia in the MIT-BIH dataset [18]. Table 1 shows the average amplitudes and durations
obtained for the same window. The transition probability ISO3 → T is only 0.02, meaning T-waves
are absent during EAT because the next depolarization (i.e. P-wave) begins before the
repolarization [3]. In addition, ectopic foci lead to a negative P-wave amplitude.

Fig. 5. A sample BGTG for a 20-beat window. [Weighted directed graph over the eight states; e.g. ISO1 → P with probability 0.98 and ISO3 → T with probability 0.02.]

3.3 Graph Matching


After constructing the BGTG, the diagnosis reduces to matching the BGTG with the
BGMMs for appropriate classification [9, 10]. The algorithm has three steps:
Step 1: For the constructed BGTG, the most probable path (MPP) is identified. An MPP is
the path from ISO1 back to ISO1 with the highest transition probability. For the BGTG
given in Fig. 5, the MPP is: ISO1 → P → ISO2 → Q → R → S → ISO3 → ISO1.
Step 2: Transition probabilities below 0.05 are removed from the BGTG to eliminate
noise. The derivation of this threshold is based upon statistical analysis [17] of the noise
present in the dataset [18]. A subset of the BGMMs is then selected that includes all the
transitions present in the BGTG; this step gives the list of prospective matching
BGMMs.
Step 3: For all the BGMMs obtained from Step 2, graph matching is performed by
multiplying two values: (1) the probability that the observed bivariate distribution of a
state in the BGTG is produced by the corresponding state in the BGMM, using maximum
likelihood estimation (MLE) [16]; and (2) the probability that the state in the observed
beat is generated by a given BGMM based on transition probabilities, using a standard
forward-backward algorithm [15]. The BGTG is classified based upon the BGMM with
the maximum likelihood.
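The scoring in Step 3 can be sketched as a product of per-state likelihoods and transition probabilities along a path. For simplicity this Python sketch uses independent Gaussians per state rather than the joint bivariate distribution, and replaces the full forward-backward recursion with a single-path product; all names are ours:

```python
import math

def state_likelihood(obs, model_state):
    """Likelihood of an observed (amplitude, duration) pair under one model state,
    assuming independent Gaussians (the paper uses a joint bivariate distribution)."""
    def g(x, mu, sigma):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    amp, dur = obs
    return g(amp, model_state["mu_amp"], model_state["sd_amp"]) * \
           g(dur, model_state["mu_dur"], model_state["sd_dur"])

def match_score(path, observations, model):
    """Score one BGTG path against one BGMM: product of transition probabilities
    and state likelihoods along the path."""
    score = 1.0
    for (a, b), obs in zip(zip(path, path[1:]), observations):
        p = model["trans"].get((a, b), 0.0)
        if p == 0.0 or b not in model["states"]:
            return 0.0  # missing edge or state: this BGMM is pruned
        score *= p * state_likelihood(obs, model["states"][b])
    return score
```

The BGMM yielding the highest score over the prospective models (Step 2) determines the subclass.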

Table 1. Average amplitude and duration

State   Amplitude  Duration
P-wave  −0.20 mV   0.08 s
Q-wave  −0.14 mV   0.2 s
R-wave  1.8 mV     0.6 s
S-wave  −0.2 mV    0.1 s
T-wave  0.17 mV    0.10 s
ISO1    0 mV       0.11 s
ISO2    0 mV       0.09 s
ISO3    0 mV       0.07 s

4 Concurrent Model

4.1 Dependency Analysis


Figure 6 shows the various modules and their execution times. Table 2 shows the average
processing time required for the four major modules. The ECG preprocessing module has
two submodules: a denoising module and a feature extraction module.

Fig. 6. Timing analysis of bivariate Markov model approach. [Pie chart of per-module execution-time shares, including patient-specific analysis; the individual labels and percentages are not recoverable from the extraction.]

The high-level modules cannot be executed concurrently due to the inherent
dependency between them: preprocessing → embedded wave detection →
BGTG construction → graph matching. However, the denoising, feature extraction,
embedded wave analysis and BGTG construction modules require beat-level
analysis and shared memory to merge the data from individual beat analyses. Graph
matching matches one BGTG with multiple BGMMs. While the first three modules can
exploit data-parallelism at the beat level within the same SM (streaming multiprocessor),
graph matching requires data-parallelism for concurrently matching multiple BGTG-BGMM
pairs.

Table 2. Average processing time for modules

Module            Processing time (ms)
Preprocessing     950
Embedded wave     200
Transition graph  2800
Graph matching    3200
Two major issues in exploiting GPU based parallelism are: (1) mismatch of the
latency time of different memories; and (2) mismatch between the data transfer rate
between CPU and GPU and data transfer rate between SMs within GPUs. Thus, we
have to optimize task distribution so that faster memory accesses in GPUs are exploited
without excessive data transfer between slower global memories. In addition, we have
to maintain the accuracy of the diagnosis while distributing the beats across SMs in
GPU based on statistical analysis. In our case, the CPU performs real-time ECG
collection and spawns the data analysis, while the data-parallel work is done in the
GPU.
Feature extraction has two functionalities: (1) identification of the waveforms; and
(2) extraction of the amplitude and duration of each waveform and ISO-line. The first task
begins without prior knowledge about the waveforms. It has eight subtasks: (1) R-wave
extraction; (2) Q-wave extraction; (3) S-wave extraction; (4) zero-crossing detection to
get the ISO2 baseline; (5) zero-crossing detection to get the ISO3 baseline; (6) P-wave
extraction; (7) T-wave extraction; and (8) ISO1 extraction using knowledge of the P- and
T-waves. There is a task dependency in identifying the beats: the R-wave is identified first,
followed by two task chains: (Q-wave detection → zero crossing to get ISO2 → P-wave
detection) and (S-wave detection → zero crossing to get ISO3 → T-wave detection).
After the detection of the P-wave and T-wave, ISO1 is identified.

4.2 Exploiting Concurrency on GPU


The overall approach to exploit concurrency consists of three steps: (1) block level
parallelism for noise detection and waveform extraction by dividing the data into equal
time-slots; (2) exploiting data parallelism at the beat level for the amplitude and
duration analysis, embedded wave detection and BGTG construction; and (3) concur-
rent matching of BGTG-BGMM pairs by spawning multiple threads within a block,
one for each BGTG-BGMM pair.
Before starting concurrent processing of time-windows, the initial window for the
first 30-s period is analyzed sequentially on the CPU to estimate statistical information
regarding the waveform features. The analyzed features are: (1) the number of beats
and individual waveforms in the 30 s window; and (2) the mean, median, minimum and
maximum of the amplitude and duration of each type of waveform and ISO-line. This
information is needed to spawn and terminate multiple threads during concurrent analysis of
future windows. This information is stored in the global memory and the constant
memory for use by SMs for subsequent concurrency exploiting modules.
Concurrent Denoising and Feature Extraction. The noise removal submodule
processes a 30-s window (around 120 beats) of raw ECG signal and has no knowledge
of the waveforms. It performs convolution, low-pass and high-pass filtering. Hence,
30-s windows are divided equally into multiple blocks (six or more per window in our case).
After the noise removal, the signal is input to the feature extraction module. Since the data
is already present in the GPU, there is no data transfer overhead.
Beat detection and feature vector analysis are performed in one block to exploit
shared memory (low latency). Based upon the estimate of the R-waveform counts
derived from the initial window analysis, the same number of threads are spawned to
concurrently detect individual R-waveforms using barrier-based synchronization
(synchronization in Nvidia GPU terminology). After detecting R-waveforms, two sets
of concurrent threads are spawned to detect other waveforms and features (Q-wave,
ISO2, P-wave) and (S-wave, ISO3, T-wave) respectively. Again, the number of threads
spawned in each set is equal to the number of detected R-waves. After detecting the
waveforms, one thread is spawned to sequentially identify all ISO1 lines in the sample.
After feature extraction, feature data is transferred to global memory for BGTG con-
struction. Since the data-size is quite small after feature extraction, the overhead of data
transfer is also quite small.
For each window, there are multiple BGTGs (around six for a 30-s window).
Multiple blocks are spawned, one for each BGTG construction, to exploit a maximum
number of available SMs in the GPU. For every BGTG, there are three tasks for every
state: (1) computing averages of the durations and amplitudes for each type of
waveform; (2) computing the joint probability of amplitude and duration; and
(3) computing the transition probability. Eight concurrent threads are spawned: one for
each state, exploiting data-parallelism. This gives six BGTGs for a 30-s window.
Graph matching phase exploits data-parallelism by spawning multiple blocks, one
for each BGTG. Each block spawns multiple threads, one for each BGTG-BGMM pair.
Each thread utilizes the CUDA cores through automatic allocation by the scheduler.

5 Algorithms

This section describes algorithms for the major concurrent tasks: (1) concurrent
denoising and feature-extraction; (2) concurrent embedded-wave detection; (3) con-
current BGTG construction; (4) concurrent MPP (most probable path) detection; and
(5) concurrent matching.
For describing the concurrent thread spawning, we use the constructs cobegin-
coend for modeling concurrent thread-groups that terminate together, barrier to model
waiting for a group of threads to terminate together, and forall to spawn multiple
threads concurrently. A block of activity in a single thread is enclosed within curly
brackets {…}. Blocks are used for processing multiple concurrent activities such as
thread-groups working on a finite number of beats to exploit maximum utilization of
automated thread-groups to SMs mapping in the GPU.
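The constructs named above map naturally onto standard threading primitives. The Python sketch below gives one illustrative reading of forall and cobegin-coend (joining the spawned threads acts as the barrier); it is a model of the notation, not our CUDA implementation:

```python
import threading

def forall(worker, n):
    """Spawn n threads running worker(i); joining them all acts as the barrier."""
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # barrier: wait until every thread in the group terminates

def cobegin(*tasks):
    """Run several tasks concurrently and wait for all of them (cobegin ... coend)."""
    threads = [threading.Thread(target=task) for task in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For example, the two waveform-detection thread-groups of Sect. 5.1 (left of the R-wave and right of the R-wave) would be the two tasks passed to `cobegin`.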

5.1 Concurrent Preprocessing and Embedded Wave Detection

Algorithm for Concurrent Denoising and Feature Extraction. To execute this
kernel function, 30 s of data is divided into a number of blocks, each corresponding to a set of
beats based on the average beat area calculated in the initial window analysis. Within each
block, the data is divided between multiple threads. Noise removal and R-wave
detection (with amplitude and duration) are performed by the threads concurrently.
A barrier is used to wait for all R-wave detection threads to finish. Next, the data is
divided into two chunks, left of the R-wave (R-wave – D) and right of the R-wave (R-wave + D),
which are processed by multiple threads concurrently. Each left-side thread detects and
extracts the features of one corresponding Q-wave, ISO2 and P-wave; similarly, each
right-side thread detects and extracts the features of one corresponding S-wave, ISO3 and
T-wave. Threads are terminated after they cross their respective boundaries. After
termination, their output is used to detect ISO1 and store its duration in the global memory.
Algorithm for Concurrent Embedded Wave Detection. A kernel function with a
grid of six blocks is launched, where one block is executed on one SM. To execute it,
each block gets information for average area calculation from the initial window
analysis and features calculated for each beat in the previous module. Each thread in a
block works on one beat. For the missing P-waves, the corresponding threshold area is
checked to assign average features for the missing waveform. Otherwise, unchanged
features are passed back to global memory. Detailed algorithms are given in Fig. 7.

5.2 Concurrent BGTG Construction


To exploit the maximum number of available resources and SMs, the 120 beats are divided
into 20-beat blocks. A BGTG is constructed by each block using the feature values of its
20 beats and the estimates derived by the initial window analysis. For each state of the
BGTG, two calculations are performed by each thread: (1) the bivariate probability; and
(2) the transition probabilities to other states. Thread calculations are synchronized using a
barrier to ensure a fully constructed BGTG before transferring data to the global memory.
The detailed algorithm is given in Fig. 8.

5.3 Concurrent Graph Matching


The concurrent graph matching algorithm has three kernel functions: (1) Computing
the most probable path in each BGTG; (2) pruning BGMMs that do not have an edge
present in the BGTG; (3) classifying BGTG using MLE and the forward-backward
algorithm.
Concurrent Most Probable Path. On the GPU, one grid with six blocks is deployed.
In each block, one state of the BGTG is analyzed by each thread to calculate the highest
outgoing transition probability for that state. This information is stored as a pair
(state_i, state_j) representing the maximum-probability transition from state_i to state_j.
The final thread waits at the barrier and creates the MPP by joining all state-pairs for one BGTG.
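The joining of per-state maxima into one path can be sketched sequentially as follows (illustrative Python; on the GPU each state's maximum is found by a separate thread before the final thread joins the pairs, and the function name is ours):

```python
def most_probable_path(transitions, start="ISO1"):
    """Follow the highest-probability outgoing edge from each state until the
    path returns to the start state (per-state maxima joined into one path)."""
    path = [start]
    state = start
    while True:
        outgoing = transitions[state]          # dict: next_state -> probability
        state = max(outgoing, key=outgoing.get)  # this state's max-probability edge
        path.append(state)
        # Stop on return to start, or bail out if the graph has no ISO1 cycle.
        if state == start or len(path) > len(transitions) + 1:
            break
    return path
```

For the EAT example of Fig. 5, this recovers ISO1 → P → ISO2 → Q → R → S → ISO3 → ISO1.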

Algorithm Concurrent denoising and feature extraction


Input: ECG signal, D6 wavelet
Output: denoised beats with features extracted
{ //Execute grid of blocks on GPU for window of 30 sec.
forall block1 : blockn //dispatch 5 second window to block
{ forall threads T1:Tm { //denoising and R-wave detection
spawn Ti for denoising and R-wave detection;
store derived information in memory, and wait;
end barrier;}
count number of R-waves from memory. Let it be k;}
Co-begin
forall threads T1 : TK{
spawn Ti to detect and store Q-wave ISO2 P-wave
store derived information in memory;
terminate if distance > R-wave-location
end barrier; }
forall threads TK+1 : T2*K{
spawn Ti to detect and store S-wave ISO3 T-wave
store derived information in memory;
terminate if distance > R-wave-location
end barrier; }
Co-end
calculate ISO1 based on P-wave and T-wave; store ISO1 information}}
Algorithm Concurrent embedded waveform detection
Input: Beats-and-features
Output: updated-beats-and-features
{//Execute grid of blocks on GPU
forall block1 : blockm //execute m concurrent blocks with 20 beats/block for multiple SMs
forall T1 : Tm // each thread works on one beat
if (missing (P-wave)) {
compute QRS area
if (QRS area > threshold) {
mark P-wave present with average amplitude, duration
update beats-and-features; }}
}

Fig. 7. Algorithm for concurrent waveform detection

Fig. 8. Algorithm for concurrent BGTG.



Concurrent BGMM Pruning. To find the subset of potential BGMMs for each
BGTG, one grid of six blocks is launched, and each block is executed on a different SM.
Each block takes as input one BGTG and all BGMMs. First, transitions in the BGTG with
probabilities below the threshold are pruned by the first thread in the block. Next, concurrent
threads are launched, one for each BGTG-BGMM pair, and the comparison of one BGTG
with one BGMM is performed by each thread. If the states in the BGTG and BGMM
match, the BGMM is considered a potential match for the BGTG and is stored in the
common vector SUB accessible to all the threads in the block.
Concurrent Maximum Probability. To calculate the probabilities of matching each
BGTG with the filtered BGMMs, a kernel function with a grid of six blocks is launched.
One BGTG is matched with its subset of filtered BGMMs in one block. The
probability of matching one BGTG-BGMM pair is calculated by multiplying two
values: (1) the probability of the state values (bivariate Gaussian distributions) of the BGTG
being produced by the BGMM, using MLE [16]; and (2) the probability of the transition
probabilities in the BGTG being produced by the BGMM, using a forward-backward
algorithm [15]. This probability is stored in a vector accessible to all the threads in the block.
The outputs of each block are transferred back to the global memory. The detailed algorithm
is given in Fig. 9.

6 Implementation

The software was executed on a Dell machine with an Intel(R) Xeon(R) dual-core CPU
E5-2680 @ 2.70 GHz (64-bit) with 128 GB RAM and a CUDA-enabled
GeForce GTX 1050 Ti GPU card. The GTX 1050 Ti has six SMs; each SM has four
processing blocks with 32 cores per block and 48 KB of shared memory. There are 24 blocks,
each supporting up to 1024 threads, for a total of 768 cores in the GPU for SIMT processing.
We analyzed the MIT-BIH arrhythmia dataset [18] and the Creighton University
Ventricular Database available at PhysioNet [21]. The dataset was divided into 60% for
training and 40% for testing. The threshold used for area subtraction in the algorithm for
detecting embedded waveforms was chosen experimentally after analyzing 3093
beats in MIT-BIH [18]. To derive the execution efficiency, we compared the CPU-only
implementation against the CPU + GPU combination with full CUDA resources. For the
acquisition of real-time ECG data, signal filtering and processing, and feature extraction
and analysis, we used MATLAB along with the WFDB software package [21]
provided by PhysioNet, written in C++ [21]. We also used MATLAB for statistical
analysis. The GPU algorithms were executed in C with the CUDA framework.

6.1 Performance Analysis and Discussion


We tested the overall execution efficiency and improvement using a single-core CPU
versus 768 CUDA cores at the module level, as summarized in Table 3. We also tested the
effect of utilizing different types of memory on the overall improvement, as
shown in Fig. 10. Based on the limitations and advantages of each memory type, we
analyzed two approaches to exploit data parallelism: (1) a combination of constant
memory and global memory; and (2) a combination of shared memory and constant
memory.

Fig. 9. Algorithm for concurrent graph matching.

Table 3. Concurrent execution speedup of modules

Module            Single CPU  Concurrency using GPU  Speedup
Preprocessing     950 ms      503 ms                 1.8
Embedded wave     200 ms      102 ms                 1.9
Transition graph  2800 ms     489 ms                 5.7
Graph matching    3200 ms     492 ms                 6.5
Total time        7150 ms     1586 ms                4.5

Fig. 10. Effect of memory utilization on speedup.
The execution times of the different modules are based on the analysis of 120 beats per
execution over 500 iterations. The average time taken to execute the sequential BGMM
approach on the CPU is around 7 s; the average time taken to execute the modules
concurrently using the GPU is around 1.6 s. The overall improvement is 4.5x (a 77.8%
reduction) for arrhythmia subclassification. After the GPU implementation finishes in 1.6 s,
it remains idle for the next 28.4 s while the CPU collects the next 30-s window in real time.
This idle time can be utilized to analyze other abnormalities in the same ECG data [19].
In the first memory-utilization approach, constant memory was used for multiply-accessed
read-only data; due to the read-only nature of constant memory, global memory was used for
information exchange and for storing dynamic data during concurrent preprocessing. In the
second approach, the faster shared memory within a single block was used as read/write
memory during dynamic execution; however, due to its limited size, multiply-accessed
read-only data was stored in constant memory [19].
The experiment was run with 20 beats per BGTG. The sequential execution time
increased linearly as the number of BGTGs increased. The time saved with the combination
of shared memory and constant memory was greater than the time saved using the
combination of constant memory and global memory. This difference is expected
because shared memory is a cache memory with a low latency period. One more
interesting result was observed: the concurrent execution time increased linearly up to six
BGTGs, and after six BGTGs it became constant, possibly due to additional automated
allocation of CUDA resources or SMs. Thus, additional overloading of the GPU is
automatically compensated by additional allocation of CUDA resources. This might prove a
useful feature for exploring the analysis of other aspects of ECG abnormalities without
increasing execution time.

6.2 Classification Accuracy


We calculated false positives, false negatives, true positives and true negatives to
compute sensitivity as (TP/(TP + FN) * 100) and specificity as (TN/(TN + FP) * 100).

True positives (TP) are the number of positive detections that correspond to the anno-
tations of a specialist. False positives (FP) are the number of detections that do not
correspond with the annotations of a specialist. True negatives (TN) are the beats that
were not annotated as a ventricular arrhythmia beat by a physician, and were not
identified by the algorithm. False negatives (FN) are the heartbeats that were annotated
as arrhythmia by a specialist, but were not detected by the algorithm. Table 4 shows the
accuracy of our technique when using a combination of shared memory and constant
memory. Both sensitivity and specificity are high for all the subclasses.
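The two accuracy metrics above reduce to simple ratios over the confusion-matrix counts; for clarity (function names ours):

```python
def sensitivity(tp, fn):
    """Sensitivity (recall), in percent: TP / (TP + FN) * 100."""
    return tp / (tp + fn) * 100.0

def specificity(tn, fp):
    """Specificity, in percent: TN / (TN + FP) * 100."""
    return tn / (tn + fp) * 100.0
```

For example, 97 correctly detected arrhythmia beats out of 100 annotated ones gives a sensitivity of 97.0%.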

Table 4. Accuracy of arrhythmia subclassification

                               Sequential approach       GPU-based concurrent approach
Class             Subclass     Sensitivity  Specificity  Sensitivity  Specificity
Supraventricular  AFib         97.3         93.6         97.2         96.4
Supraventricular  AFlu         92.3         94.3         92.1         94.3
Supraventricular  AVNRT        95.2         96.9         95.6         97.0
Supraventricular  EAT          98.6         94.0         98.6         94.0
Ventricular       VTach        94.0         96.3         94.0         96.4
Ventricular       VFlu         91.3         98.2         91.6         98.3
Ventricular       VFib         98.6         99.6         98.5         99.1
7 Related Works

Several researchers have exploited parallel techniques for various subtasks such as
processing of the signal using filters [12], wavelet transforms [13], and classification of
beats into supraventricular and ventricular arrhythmia [14].
Lopes et al. [22] have proposed ventricular arrhythmia diagnosis using a parallel
implementation of neural networks. Their approach focuses on parallel implementation
of back-propagation. Their technique is limited to PVC beat detection and does not
address real-time classification; it also does not detect embedded P-waves, which
reduces accuracy. The sensitivity obtained using their approach is 94.5% [17]
compared to 98.8% obtained using our approach.
Another neural network-based classification approach has been proposed by Li
[11]; it is limited to separating supraventricular from ventricular beats using GPUs.
Phaudphut et al. obtained a sensitivity of 88.0% [13] in the detection of PVC beats
compared to 99.3% by our approach. In addition, we diagnose all seven major
subclasses in real-time.
Some researchers have utilized the GPU for denoising and feature extraction [8,
12]. Domazet et al. [12] have proposed an optimization with shared and constant
memory for a DSP filter for ECG denoising. Although our goal is much broader, we
tested our approach with two memory optimization techniques. As expected, the
combination of shared memory and constant memory showed improvements due to its
lower latency compared with the combination of global and constant memory.

8 Limitations and Future Directions

The current system uses only lead II for arrhythmia analysis. The model could be
extended further to analyze three-lead signals on embedded GPUs such as NVIDIA
Jetson [5] based wearable devices to handle ischemia, heart abnormalities due to
electrolyte imbalances, and myocardial infarction in real-time. We are currently
extending our GPU-based BGMM to a GPU-based multivariate model to diagnose
ischemia, hyperkalemia and myocardial infarction using three leads in real-time.

References
1. Lerma, C., Glass, L.: Predicting the risk of sudden cardiac death. J. Physiol. 594(9), 2445–
2458 (2016)
2. Rautaharju, P.M., Surawicz, B., Gettes, L.S.: AHA/ACCF/HRS recommendations for the
standardization and interpretation of the electrocardiogram: part IV. J. Am. Coll. Cardiol. 53
(11), 982–991 (2009)
3. Garcia, T.B., Miller, G.T.: Arrhythmia Recognition: The Art of Interpretation. Jones and
Bartlett, Burlington (2004)
4. Abtahi, F., Snäll, J., Aslamy, B., Abtahi, S., Seoane, F., Lindecrantz, K.: Biosignal pi, an
affordable open-source ECG and respiration measurement system. Sensors 15(1), 93–109
(2014)
5. Page, A., Attaran, N., Shea, C., Homayoun, H., Mohsenin, T.: Low-power manycore
accelerator for personalized biomedical applications. In: ACM Proceedings of the 26th
Edition on Great Lakes Symposium on VLSI, Boston, pp. 63–68 (2016)
6. Mahmoodabadi, S.Z., Ahmadian, A., Abolhasani, M.D.: ECG feature extraction using
Daubechies wavelets. In: Proceedings of the Fifth IASTED International Conference on
Visualization, Imaging and Image Processing, Benidorm, pp. 343–348 (2005)
7. Sayadi, O., Mohammad, B., Shamsollahi, M.B., Clifford, G.D.: Robust detection of
premature ventricular contractions using a wave-based Bayesian framework. IEEE Trans.
Biomed. Eng. 57(2), 353–362 (2010)
8. Jun, T.J., Park, H.J., Yoo, H., Kim, Y.H., Kim, D.: GPU based cloud system for high-
performance arrhythmia detection with parallel k-NN algorithm. In: Proceedings of the 38th
Annual International Conference of the IEEE Engineering in Medicine and Biology Society
(EMBC), Orlando, pp. 5327–5330 (2016)
9. Gawde, P.R., Bansal, A. K., Nielson, J.A.: ECG analysis for automated diagnosis of
subclasses of supraventricular arrhythmia. In: Proceedings of International Conference on
Health Informatics and Medical Systems, Las Vegas, pp. 10–16 (2015)
10. Gawde, P.R., Bansal, A.K., Nielson, J.A.: Integrating Markov model and morphology
analysis for finer classification of ventricular arrhythmia in real-time. In: IEEE International
Conference on Biomedical & Health Informatics, Orlando, pp. 409–412 (2017)
11. Li, P., Wang, Y., He, J., Wang, L., Tian, Y., Zhou, T.: High-performance personalized
heartbeat classification model for long-term ECG signal. IEEE Trans. Biomed. Eng. 64(1),
78–86 (2017)
12. Domazet, E., Gusev, M., Ristov, S.: Optimizing high performance CUDA DSP filter for
ECG signals. In: Proceedings of the 27th DAAAM International Symposium in Intelligent
Manufacturing and Automation, Vienna, pp. 0623–0632 (2016)

13. Phaudphut, C., So-In, C., Phusomsai, W.: A parallel probabilistic neural network ECG
recognition architecture over GPU platforms. In: Proceedings of the 13th International Joint
Conference on Computer Science and Software Engineering (JCSSE), Khon Kaen, pp. 1–7
(2016)
14. Fan, X., He, C., Chen, R., Li, Y.: Toward automated analysis of electrocardiogram big data
by graphics processing unit for mobile health application. IEEE Access 5, 17136–17148
(2017)
15. Russell, S., Norvig, P.: Artificial Intelligence—A Modern Approach, 3rd edn. Prentice Hall,
Upper Saddle River (2010)
16. Psutka, J.V., Psutka J.: Sample size for maximum likelihood estimates of Gaussian model.
In: International Conference on Computer Analysis of Images and Patterns, pp. 462–469.
Springer, Cham (2015)
17. Everitt, B., Skrondal, A.: The Cambridge Dictionary of Statistics, vol. 106. Cambridge
University Press, Cambridge (2002)
18. MIT-BIH Arrhythmia dataset. https://www.physionet.org/physiobank/database/MIT-BIH/
19. Nvidia, C.: C Programming Guide PG-02829–001_v9.1, March 2018. http://docs.nvidia.
com/cuda/pdf/CUDA_C_PrograBGMMing_Guide.pdf
20. Tallarida, R.J., Murray, R.B.: Area Under a Curve: Trapezoidal and Simpson’s Rules
Manual of Pharmacologic Calculations, pp. 77–81. Springer, New York (1987)
21. Creighton University Ventricular Tachyarrhythmia Database. https://physionet.org/
physiobank/database/cudb/
22. Lopes, N., Ribeiro, B.: Fast pattern classification of ventricular arrhythmias using graphics
processing units. In: Iberoamerican Congress on Pattern Recognition. LNCS, vol. 5856,
pp. 603–610. Springer, Heidelberg (2009)
Identification of Glioma from MR Images
Using Convolutional Neural Network

Nidhi Saxena(✉), Rochan Sharma, Karishma Joshi, and Hukum Singh Rana

University of Petroleum and Energy Studies, Dehradun, India
nsaxena117@gmail.com

Abstract. This paper presents a novel approach to classifying the type
of glioma using a convolutional neural network (CNN) on 2D MR images.
Glioma, the most common type of malignant brain tumor, can be classified
according to the type of glial cells affected. The types of gliomas are,
namely, astrocytoma, oligodendroglioma and glioblastoma multiforme (GBM).
Various image processing and pattern recognition techniques may be used
for cancer identification and classification, though in recent years deep
learning has proved to be efficient in computer-aided diagnosis of diseases.
Convolutional neural networks, a type of deep neural network generally
used for classification of images, contain multiple sets of conv-pool
layers for feature extraction, followed by fully-connected (FC) layers
that make use of the extracted features for classification.

Keywords: Glioma · Astrocytoma · Oligodendroglioma
Glioblastoma multiforme (GBM) · MRI and convolutional neural network (CNN)

1 Introduction
Glioma is a major type of brain tumor that can occur in all age groups,
though it is mostly seen in adults. It originates in the glial cells of the
brain. Glial cells are of four types, namely astrocytes, oligodendrocytes,
microglia and ependymal cells. Accordingly, astrocytoma, oligodendroglioma
and glioblastoma multiforme are the types of glioma cancers, as shown in
Fig. 1. These tumors can be cured if detected at an early stage, but some of
the fast-growing gliomas can be dangerous. The most common and aggressive
type of brain tumor is glioblastoma multiforme (GBM), a malignant grade IV
glioma. Early-stage glioblastomas, as per MRI findings, are ill-defined
small lesions with little or no mass effect and no or subtle contrast
enhancement. Within several months, these lesions develop typical MRI
findings such as a heterogeneous enhanced bulky mass with central necrosis.
The average period from the initial to final scan in the diagnosis of
glioblastoma has been 4.5 months [1]. Magnetic Resonance Imaging (MRI)
is one of the modalities commonly used for diagnosing brain tumors. As
compared to other diagnostic methods, like computed tomography scan, ultrasound,
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 589–597, 2019.
https://doi.org/10.1007/978-3-030-02686-8_44
etc., MRI is safe, non-invasive and reflects the true dimensions of organs
and tissues; it is therefore widely used in imaging of the brain [2].
Convolutional neural networks (CNNs) [3] consist of conv-pool layers followed
by fully-connected (FC) layers. One conv-pool layer consists of a convolutional
layer and a pooling layer. The convolutional layer is used to detect hierarchical
features from images, whereas the pooling layer is used to forward the detected
features further in the network [4]. In the proposed model, convolution operations
are performed with 'same' padding (the size of the feature space remains the
same) and pooling is performed with 'valid' (or no) padding (the size of the
feature space is reduced). Conv-pool layers detect useful features and forward
them to the FC layers, where classification is performed. Unlike fully-connected
neural networks, each unit in a CNN layer is connected to only a limited region
of the previous layer. In a CNN, the entire network can be put into GPU memory
and the hardware cores can be used to boost network speed using deep learning
tools. CNNs have many applications in medical diagnosis involving image
segmentation independent of morphology: lesions are detected and classified,
and the type and severity of diseases can be predicted.
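The 'same' versus 'valid' padding arithmetic above determines how the feature-space size evolves through the network; a minimal sketch of the output-size calculation (an illustration, not the authors' code):

```python
def conv_output_size(n, k, stride=1, padding="same"):
    """Spatial size of the output of a convolution or pooling layer.

    With 'same' padding the input is zero-padded so the spatial size is
    preserved (ceil(n / stride) in general); with 'valid' padding nothing
    is added and the size shrinks.
    """
    if padding == "same":
        return -(-n // stride)  # ceil division
    return (n - k) // stride + 1

# One conv-pool step on a 128x128 image, as in the model described later:
# a 3x3 'same' convolution keeps 128; a 2x2 stride-2 'valid' max-pool halves it.
after_conv = conv_output_size(128, 3, stride=1, padding="same")          # 128
after_pool = conv_output_size(after_conv, 2, stride=2, padding="valid")  # 64
```

Applying the conv-pool step repeatedly halves the spatial size at each stage, which is why deep CNNs can afford to grow the channel count as they go.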

Fig. 1. MR scans of types of gliomas: a. Astrocytoma, b. Oligodendroglioma and c.
Glioblastoma multiforme (GBM).

2 Literature Review
2.1 Segmentation
In the medical domain, segmentation is the technique for detecting and
separating a part of a medical image (a lesion or an organ) that can be used
for further diagnosis. Segmentation proves very helpful for monitoring disease
progression, planning treatment strategies and predicting treatment outcomes.
It can be done in many ways, such as by thresholding or by developing a
heuristic algorithm, as shown by Rajnikanth et al. [5]. Their work focuses on developing
a heuristic algorithm to segment the tumor region from 2D brain MRI images.
Initially, preprocessing is done to enhance the tumor region in the MR scans,
followed by multi-level thresholding to segment the lesion. Accuracy is then
calculated on different slices of MR images and is above 95% for all types of
MRI slices.
Deep learning can also be applied for segmentation of lesions and detection of
cancer from modalities like Computed Tomography (CT) scans, ultrasound and
MRI. Hoseini et al. [6] trained a deep convolutional neural network (DCNN) for
segmentation of brain lesions from MR images. The proposed model, 6 layers
deep (5 convolutional layers and 1 FC layer), achieved a Dice similarity
coefficient of 0.90 for the complete, 0.85 for the core and 0.84 for the
enhancing regions on the BRATS 2016 dataset.
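The Dice similarity coefficient reported above measures the overlap between a predicted segmentation and the ground truth; a minimal sketch (assuming binary NumPy masks; not the cited authors' implementation):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # Convention: two empty masks count as a perfect match
    return 2.0 * intersection / denom if denom else 1.0

# Partial overlap: one shared pixel out of 2 + 1 foreground pixels
a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
score = dice_coefficient(a, b)  # 2*1 / (2+1) ≈ 0.667
```

A Dice score of 1.0 means perfect overlap and 0.0 means no overlap, which is why values such as 0.90 indicate strong segmentation quality.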
Segmentation may sometimes need humans to provide high-level information to
extract the segmented region from images; this type of segmentation is called
interactive segmentation [7]. Wang et al. [8] performed interactive medical
image segmentation by fine-tuning a pre-trained CNN for segmenting multiple
organs from 2D fetal MR slices (two types of organs were annotated for
training) and also for 3D segmentation of the brain tumor core and the whole
brain tumor (the brain tumor core was annotated in one MR sequence). The
image-specific fine-tuning, which can be either unsupervised or supervised,
made the CNN model adaptive to a specific test image. A weighted loss function
considering network- and interaction-based uncertainty was also proposed for
fine-tuning. Experiments show that image-specific fine-tuning improves
segmentation performance.

2.2 Classification
In medical diagnosis, the aim is to identify the presence of a disease in a
person on the basis of scans of a particular organ, along with analysis of the
patient's medical history. To detect a disease by analyzing an image,
pre-processing may prove beneficial. Sadeghi-Naini et al. [9] proposed a method
for feature extraction (a pre-processing step) and data analysis to characterize
breast lesions using texture-based features in ultrasound scans. Among 78
patients, 46 and 32 patients were confirmed with benign and malignant lesions
respectively, based on radiology and pathology reports.
Though MR is an efficient modality, applying Computer Aided Diagnosis (CAD)
sometimes requires pre-processing methods such as feature selection, extraction
or representation. Liu et al. [10] proposed an anatomical-landmark-based feature
representation which automatically extracts features from brain MR images for
the purpose of disease diagnosis. Experimental results showed that the proposed
method improves the performance of disease classification.
An approach to find the severity of a tumor is to first segment the tumor
region from the scan and then classify it as malignant or benign. Devkota
et al. [11] proposed a system which identifies cancerous nodules from lung CT
scan images using watershed segmentation for detection and a support vector
machine (SVM) for classification of the nodule as malignant or benign. The
proposed model includes 6 stages: image pre-processing, segmentation of the
pre-processed image, feature extraction, feature reduction using PCA,
classification using SVM and evaluation of the classification. The model
detects cancer with 92% accuracy, and the classifier has an accuracy of 86.6%.
In a classification problem of medical diagnosis, accuracy is generally
measured in terms of specificity and sensitivity, both of which are directly
proportional to the accuracy of the classifier. Blumenthal et al. [12] proposed
an automatic classification of tumor and nontumor cells using a support vector
machine (SVM) classifier trained on 4 components: enhancing and nonenhancing,
tumor and nontumor. Classification results were evaluated using 2-fold
cross-validation analysis of the training set and MR spectroscopy. High
sensitivity and specificity (100%) were obtained within the enhancing and
nonenhancing areas.
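Sensitivity and specificity as used above can be computed directly from confusion-matrix counts; a small illustration with hypothetical counts (not taken from [12]):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity (recall) = TP / (TP + FN): fraction of positive cases found.
    Specificity = TN / (TN + FP): fraction of negative cases correctly rejected.
    """
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts: 90 of 100 tumor samples detected,
# 95 of 100 nontumor samples correctly rejected.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=95, fp=5)
# sens = 0.90, spec = 0.95; 1.0 for both would correspond to the
# 100% sensitivity and specificity reported in [12].
```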
Zacharaki et al. [13] also proposed a scheme to classify brain tumor type and
grade using MR images. The proposed scheme consists of several steps including
ROI definition, feature extraction, feature selection and classification. The
extracted features include tumor shape and intensity characteristics as well as
rotation-invariant texture features. Feature subset selection is performed
using SVM with recursive feature elimination. The binary SVM classification
accuracy, sensitivity and specificity were respectively 85%, 87% and 79% for
discrimination of metastases from gliomas, and 88%, 85% and 96% for
discrimination of high-grade from low-grade neoplasms.
Deep learning can be used efficiently for identification of different types of
substances in organ scans, as shown by Liu et al. [14], who designed a deep
Magnetic Resonance Attenuation Correction (MRAC) for classification of air,
bone and soft tissue in CT scans of various organs. Their method provided an
accurate pseudo-CT scan with a mean Dice coefficient of 0.971 ± 0.005 for air,
0.936 ± 0.011 for soft tissue and 0.803 ± 0.021 for bone.
The most common application of deep learning is detecting whether or not a
person has a particular disease (mostly cancer). Roffman et al. [15] proposed a
skin cancer prediction model using an artificial neural network (ANN) whose
training sensitivity was 88.5% and specificity 62.2% for the prediction of
non-melanoma skin cancer (NMSC); the validation set showed a sensitivity of
86.2% and specificity of 62.7%. Makde et al. [16] used a deep neural network
architecture for detection of tumors in lung CT scans and brain MR images,
classifying the images as tumorous or non-tumorous. The accuracy of
classification was more than 97% for both CT and MR images, and the AlexNet
and ZFNet architectures were compared for the same purpose.

3 Method
3.1 Implementation Details
The dataset used is REMBRANDT [17,18], which consists of MR scans of 130
patients suffering from glioma tumors of different types and at different
stages. From this dataset, a total of 38,952 images, each of size 128 × 128,
were used. 5-fold cross-validation is applied with a batch size of 512 images.
The label 0 was astrocytoma, 1 was GBM and 2 was oligodendroglioma. The test
split was 4096 images from the total of 38,952 images. For training, 45 epochs
are used for each validation. The proposed CNN model is implemented using the
TensorFlow framework on a system with 4 CPUs, 15 GB of RAM and 2 NVIDIA K80
GPUs running Ubuntu 16.04.

Fig. 2. Architecture of the proposed CNN.
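The 5-fold cross-validation described above can be sketched as follows (an illustrative split; the paper's exact partitioning is not published):

```python
import numpy as np

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into n_folds folds; each fold
    serves once as the validation set while the rest form the training set."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k])
        yield train_idx, val_idx

# 38,952 images as in the paper; every image lands in exactly one
# validation fold, and training/validation sets never overlap.
splits = list(five_fold_indices(38952))
```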

3.2 Convolutional Neural Network


In classical neural networks, features had to be provided to the network for
classification. CNNs are a special type of neural network in which the earlier
layers extract features and the later layers perform classification using the
extracted features. In general, the initial layers of a CNN comprise multiple
conv-pool layers followed by FC layers. The last (output) layer is either a
sigmoid layer (in the case of binary classification) or a softmax layer (in
the case of multi-class classification). CNNs have proved to be very effective
for extracting features from images, eliminating the need to provide
hand-crafted features to the network. Though training a CNN is computationally
expensive, the use of a GPU can speed up the process.
The deeper the network, the greater the classification power, due to the
additional non-linearities and better quality of local optima [19]. However,
convolutions with 3D kernels are computationally expensive in comparison to
2D kernels, which hampers the addition of more layers. Thus, deeper network
variants that are implicitly regularized and more efficient can be designed by
simply replacing each layer of common architectures with more layers that use
smaller kernels [20].
However, deeper networks are more difficult to train. It has been shown that
the forward (neuron activations) and backward (gradients) propagated signals
may explode or vanish if care is not taken to retain their variance [21]. This
problem of vanishing gradients is addressed by using the Adam optimizer
described below.

Adam Optimizer: Adam, derived from adaptive moment estimation [22], is an
optimization algorithm used to address the problem of vanishing gradients and
to achieve learning-rate decay. It uses the first moment (an exponentially
decaying average of the previous gradients) and the second moment (an
exponentially decaying average of the previous squared gradients). Adam is
generally regarded as fairly robust to the choice of hyperparameters, though
the learning rate sometimes needs to be changed from the suggested default.
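A minimal NumPy sketch of the Adam update rule from [22] (illustrative; the paper relies on TensorFlow's built-in implementation):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update. m and v are the exponentially decaying averages of
    the gradient and of the squared gradient; the hat terms apply bias
    correction, which matters in the first few steps when m and v start at 0.
    """
    m = b1 * m + (1 - b1) * grad            # first moment
    v = b2 * v + (1 - b2) * grad ** 2       # second moment
    m_hat = m / (1 - b1 ** t)               # bias-corrected estimates
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimizing f(x) = x^2 (gradient 2x) from x = 5
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
```

Because the step is scaled by the square root of the second moment, the effective step size adapts per parameter, which is what makes Adam robust to poorly scaled gradients.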

Batch Normalization: In deep CNNs, each layer gets different inputs or
activations, which may result in inputs belonging to different distributions at
different layers. This problem of internal covariate shift [23] is addressed by
applying batch normalization: the inputs at each layer are normalized so that
they are all on the same scale and hence belong to the same distribution. Thus,
batch normalization increases the adaptiveness of learning in the later layers.
Batch normalization is applied in all the layers of the proposed architecture.
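The per-feature normalization described above can be sketched as follows (a simplified, training-time illustration without the running statistics used at inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) of a batch to zero mean and unit
    variance, then apply the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Two features on very different scales end up on the same scale
batch = np.array([[100.0, 0.1], [300.0, 0.3], [200.0, 0.2]])
normed = batch_norm(batch)
```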

Architecture: The CNN model (shown in Fig. 2) is developed for 128 × 128
grayscale 2D MR images and has 5 conv-pool layers and 4 fully-connected (FC)
layers. All the convolutional layers use a 3 × 3 kernel with stride 1, and the
max-pool layers use a 2 × 2 kernel with stride 2. The activation function used
in all layers is ReLU (Rectified Linear Unit). The first layer uses 32
two-dimensional kernels with same padding, followed by max-pooling. The second
layer uses 64 kernels, followed by 128 kernels in the third layer, 256 kernels
in the fourth convolutional layer and finally 512 kernels in the last
convolutional layer. Batch normalization is performed in all conv-pool and FC
layers. After the five conv-pool layers, there are 8192 features, which are
flattened in FC6 and reduced to 2048 features in FC7, followed by 512 in FC8,
64 in FC9 and finally 3, the total number of classes. In the last FC layer,
softmax, a probabilistic activation, is applied, as classification is performed
over three classes, the three different types of gliomas.
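The dimensions above can be verified with a short trace of the feature-map shapes (a reconstruction from the description, not the authors' code):

```python
# Five conv-pool blocks: each 3x3 'same' convolution keeps the spatial size
# and sets the channel count; each 2x2 stride-2 max-pool halves the size.
size, channels = 128, 1                    # 128 x 128 grayscale input
for kernels in [32, 64, 128, 256, 512]:
    channels = kernels                     # conv sets the channel count
    size //= 2                             # max-pool halves spatial size
flattened = size * size * channels         # features entering FC6
fc_sizes = [flattened, 2048, 512, 64, 3]   # FC6 ... softmax output
```

The trace confirms the paper's numbers: five halvings take 128 down to 4, and 4 × 4 × 512 = 8192 features feed the first fully-connected layer.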

3.3 Results

The cost minimization for one validation set is shown in Fig. 3. The proposed
model is executed with 5-fold cross-validation, and the overall cost
minimization is shown in Fig. 4. The fluctuations observed in Fig. 4 are due
to the cross-validation: the cost in the last epoch of one validation set is
much lower than the cost at the first epoch of training on the next validation
set. The model gives a training accuracy of 63.17%, a validation accuracy of
56.67% and a test accuracy of 65.24%.
Fig. 3. Cost plot.

Fig. 4. Cost when validation is applied.

4 Conclusion
This paper proposes a novel CNN-based model for identification of gliomas
based on their origin in the brain. To the best of our knowledge, this is the
first time deep learning has been applied for identification of glioma. The
most common brain tumor is glioblastoma multiforme, a grade IV malignant
tumor, which can be classified using the proposed model. One shortcoming of
the proposed model is that some astrocytomas and oligodendrogliomas are
misidentified as GBM. In future, with further improvement, this model may
assist radiologists in predicting the type of glioma a person is suffering
from, so that treatment can be given accordingly.
5 Future Scope
Grade determines the severity of the disease. As mentioned, glioma has four
grades; grade IV is the most malignant stage and is also called glioblastoma
multiforme or high-grade glioma. This paper detects the type of glioma from MR
images using a CNN. Further, a different CNN architecture could be used for
detection of the grade of glioma.

References
1. Ideguchi, M., Kajiwara, K., Goto, H., Sugimoto, K., Nomura, S., Ikeda, E., Suzuki,
M.: MRI findings and pathological features in early-stage glioblastoma.
J. Neurooncol. 123, 289–297 (2015)
2. El-Gamal, F., Elmogy, M., Atwan, A.: Current trends in medical image registration
and fusion. Egypt. Inform. J. 17, 99–124 (2016). https://doi.org/10.1016/j.eij.2015.09.002
3. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to
document recognition. Proc. IEEE 86, 2278–2324 (1998)
4. Zeiler, M., Fergus, R.: Visualizing and understanding convolutional networks. In:
European Conference on Computer Vision, pp. 818–833 (2014)
5. Rajnikanth, V., Fernandes, S., Bhushan, B., Sunder, N.: Segmentation and anal-
ysis of brain tumor using tsallis entropy and regularised level set. In: 2nd Inter-
national Conference on Micro-Electronics, Electromagnetics and Telecommunica-
tions. Springer, Singapore (2018)
6. Hoseini, F., Shahbahrami, A., Bayat, P.: An efficient implementation of deep con-
volutional neural networks for MRI segmentation. J. Digit. Imaging 31, 738 (2018)
7. McGuinness, K., O’Connor, N.: A comparative evaluation of interactive segmen-
tation algorithms. Pattern Recognit. 43, 434–444 (2010)
8. Wang, G., Li, W., Zuluaga, M., Pratt, R., Patel, P., Aertsen, M., Doel, T., David,
A., Deprest, J., Ourselin, S., Vercauteren, T.: Interactive medical image segmenta-
tion using deep learning with image-specific fine-tuning. IEEE Trans. Med. Imag-
ing. 37, 1562 (2018)
9. Sadeghi-Naini, A., Suraweera, H., Tran, W., Hadizad, F., Bruni, G., Rastegar, R.,
Curpen, B., Czarnota, G.: Breast-lesion characterization using textural features of
quantitative ultrasound parametric maps. Sci. Rep. 7, 13638 (2017)
10. Liu, M., Zhang, J., Nie, D., Yap, P., Shen, D.: Anatomical landmark based deep
feature representation for MR images in brain disease diagnosis. IEEE J. Biomed.
Health Inform. 22, 1476 (2018)
11. Devkota, B., Alsadoon, A., Prasad, P., Singh, A., Elchouemi, A.: Image segmen-
tation for early stage brain tumor detection using mathematical morphological
reconstruction. Procedia Comput. Sci. 125, 115–123 (2018)
12. Blumenthal, D., Artzi, M., Liberman, G., Bokstein, F., Aizenstein, O., Ben Bashat,
D.: Classification of high-grade glioma into tumor and nontumor components using
support vector machine. Am. J. Neuroradiol. 38, 908–914 (2017)
13. Zacharaki, E., Wang, S., Chawla, S., Soo Yoo, D., Wolf, R., Melhem, E., Davatzikos,
C.: Classification of brain tumor type and grade using MRI texture and shape in
a machine learning scheme. Magn. Reson. Med. 62, 1609–1618 (2009)
14. Liu, F., Jang, H., Kijowski, R., Bradshaw, T., McMillan, A.: Deep learning MR
imaging-based attenuation correction for PET/MR imaging. Radiology 286, 676–
684 (2017)
15. Roffman, D., Hart, G., Girardi, M., Ko, C., Deng, J.: Predicting non-melanoma
skin cancer via a multi-parameterized artificial neural network. Sci. Rep. 8, 1701
(2018)
16. Makde, V., Bhavsar, J., Jain, S., Sharma, P.: Deep neural network based classifica-
tion of tumourous and non-tumorous medical images. In: International Conference
on Information and Communication Technology for Intelligent Systems, pp. 199–
206 (2017)
17. Scarpace, L., Flanders, A.E., Jain, R., Mikkelsen, T., Andrews, D.W.: Data From
REMBRANDT. The Cancer Imaging Archive (2017)
18. Clark, K., Vendt, B., Smith, K., Freymann, J., Kirby, J., Koppel, P., Moore, S.,
Phillips, S., Maffitt, D., Pringle, M., Tarbox, L., Prior, F.: The cancer imaging
archive (TCIA): maintaining and operating a public information repository. J.
Digit. Imaging 26, 1045–1057 (2013)
19. Choromanska, A., Henaff, M., Mathieu, M., Arous, G., LeCun, Y.: The loss surfaces
of multilayer networks. In: Artificial Intelligence and Statistic, pp. 192–204 (2015)
20. Kamnitsas, K., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D.,
Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF
for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017).
https://doi.org/10.1016/j.media.2016.10.004
21. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward
neural networks. In: Proceedings of the Thirteenth International Conference on
Artificial Intelligence and Statistics, pp. 249–256 (2010)
22. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980
(2014)
23. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by
reducing internal covariate shift. arXiv:1502.03167 (2015)
Array of Things for Smart Health Solutions: Injury
Prevention, Performance Enhancement
and Rehabilitation

S. M. N. Arosha Senanayake1,2(✉), Siti Asmah @ Khairiyah Binti Haji Raub2,
Abdul Ghani Naim1,2, and David Chieng3

1 Institute of Applied Data Analytics, University of Brunei Darussalam, Gadong BE1410, Brunei
arosha.senanayake@ubd.edu.bn
2 Faculty of Science, University of Brunei Darussalam, Gadong BE1410, Brunei
3 Wireless Innovation, MIMOS Berhad, Technology Park Malaysia, Kuala Lumpur, Malaysia

Abstract. Data visualization on wearable devices using cloud servers can
provide solutions for personalized healthcare monitoring of the general
public, leading to a smart nation. The objective of this research is to
develop personalized healthcare IoT assistive devices/tools for injury
prevention, performance enhancement and rehabilitation using an Intelligent
User Interfacing System. It consists of an Array of Things (AoT) which
interconnects hybrid prototypes built using different wearable measurement and
instrumentation multimodal sensor systems for transient and actual health
status and classification. Android platforms have been used to prove the
success of the AoT using national athletes and soldiers, who permitted the
implementation of a knowledge base encapsulating reference/benchmarking health
pattern sets that are retrieved, retained, reused and revised via
case-based-reasoning cloud storage. Two case studies were conducted on injury
prevention, rehabilitation and performance enhancement of soldiers and
athletes using smart health algorithms. Validation and testing were carried
out using Samsung Gear S3 smart watches in real time.

Keywords: Array of Things (AoT) · Personalized healthcare
Multimodal sensor system · Transient health · Smart health

1 Introduction

The Array of Things concept was first introduced in the Smart Chicago project
[1], which designed a range of cyber-physical devices as measurement and
instrumentation systems at urban scale, based on the principle of telescope
arrays and the IoT. In [2], the authors summarize the monitoring of Parkinson's
Disease (PD) patients in the home setting using wearable and ambient sensors.
The technology includes a wireless unit strapped around the wrist,
Band-Aid-like sensors attached to the lower limbs, a wearable camera worn as a
pendant, a smart watch, and a mobile phone clipped on the belt used as a
gateway to relay the data to the cloud, to assess specific functions (using
its embedded sensors) as well as to communicate with the patient (using
customized apps). The integration of wearable technology with smart devices
enables the remote monitoring of

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 598–615, 2019.
https://doi.org/10.1007/978-3-030-02686-8_45
patients with PD and real-time feedback to clinicians, family/caregivers, and the patients
themselves.
Three Machine Learning (ML) algorithms have been proposed to generate
knee-angle patterns in the sagittal plane; the knee is one of the joints used
during walking. The Extreme Learning Machine algorithm outperformed the
Artificial Neural Network and Multi-output Support Vector algorithms and can
generate a specific reference of the normal knee pattern depending on an
individual's characteristics and walking speed. This specific reference
provides a personalized gait analysis [4].
Having done extensive research on applying virtual measurement and
instrumentation to human motion analysis during the past two decades [5–11],
this paper introduces a generalized framework for data visualization on
wearable devices for personalized healthcare using wearable sensors and their
data fusion: the Array of Things for smart health solutions, as illustrated in
Fig. 1.

Fig. 1. System overview of Array of Things for smart health solutions.

The smart health solution architecture consists of wearable devices for
personalized healthcare services and technologies, and cloud server
technologies to visualize fused smart health data and to update, repair and
remove transient health data based on the actual health status using a
personalized wrist-band data center. This paper is thus structured from the
general system architecture to the specific application domains used to prove
its services. The smart health solution architecture is articulated using the
Hybrid System Architecture Platform (HSAP), a novel platform for Array of
Things (AoT) devices/tools composed of a set of cloud-computing-based sensor,
processing, control and data services, integrating AoT and cloud computing
into a single framework.
This article describes the HSAP system architecture in detail through its core
components: smart data fusion, smart data analytics and deep learning. HSAP
allows acquiring a personalized health pattern set using wearable devices,
which requires multimodal sensory mechanisms to extract a feature set,
integrate it and transform it using data fusion techniques in such a way that
a knowledge base (KB) of an individual person is formed. The formed KB
consists of a pre-injury (healthy) pattern set, an injury pattern set and a
post-injury pattern set, which are updated using a personalized wrist-band
data center, primarily via worn IoTs. Virtual measurement and instrumentation
technology (LabVIEW) is used as the platform to interface AoTs connected to a
cloud server, by implementing an Intelligent Graphical User Interfacing System
(IGUIS) to acquire the current (actual) health data pattern set on site,
online and in real time, and to update the KB using case-based reasoning, so
that cloud computing provides the appropriate services (reactive care,
episodic-centric and clinic-centric) for performance enhancement, injury
prevention and rehabilitation. The KB, interfaced with smart health algorithms
processed using cloud computing, facilitates the classification of the current
health status, considered the actual health status, while cloud storage
maintains the transient health status of each individual using the historic
pattern sets already available there. Given the limited storage available on
worn IoTs (the Samsung Gear S3 watch provides 2 GB of free space), the
transient health status (classification) is stored in a queue to continuously
update the classification of the individual using the actual health status on
site, online and in real time.
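The transient-status queue described above can be sketched with a fixed-capacity buffer (a hypothetical illustration; the capacity and status labels are assumptions, not from the paper):

```python
from collections import deque

# Keep only the most recent classifications on the watch, since on-device
# storage is limited (the Samsung Gear S3 offers about 2 GB of free space).
transient_status = deque(maxlen=100)  # capacity chosen for illustration

def update_status(classification):
    """Append the newest actual-health classification; once the queue is
    full, the oldest transient entry is dropped automatically."""
    transient_status.append(classification)
    return transient_status[-1]  # the current (actual) health status

# Hypothetical stream of classifications arriving in real time
for label in ["healthy", "healthy", "at-risk"]:
    current = update_status(label)
```

A bounded `deque` keeps memory use constant on the device while always exposing the most recent (actual) status, matching the continuous-update behavior described above.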

2 Smart Health Solution Architecture

2.1 Rationale

For health or lifestyle monitoring, harvesting motion data and context
reasoning is often a complex task. IntelliHealth Solutions was introduced to
assess, monitor and provide feedback on active lifestyles, focusing on a
generalized solution for Brunei citizens [4]. While IntelliHealth Solutions
has already established reference standards for Brunei citizens based on
soldiers and national athletes (healthy citizens), using the intelligent
knowledge base formed (resident pattern storage in a cloud server) [5], the
aim of this research is to develop transient wearable healthcare solutions
for transient pattern storage in real time, with shared resource allocation
via cloud technology for the resident pattern storage already formed. This
allows real-time monitoring of a human test subject while walking, jogging,
running or cycling. So far, resident pattern storage for soldiers and athletes
has been established using smart data and decision fusion consisting of smart
data analytics, deep learning, case-based reasoning and virtual measurement
and instrumentation technologies [6]. The development of wearable motion
interfacing and reasoning devices for the general public, with its vision
'towards an active healthy lifestyle', facilitates the monitoring of gait and
rehabilitation, initially of the ASEAN obese community, with a pilot study
ongoing in Brunei (as the center), Malaysia and Vietnam under the ASEAN
Institutes of Virtual Organization at the National Institute of Information
and Communications Technology (NICT), Tokyo, Japan, under the title "IoT
System for Public Health and Safety Monitoring with Ubiquitous Location
Tracking".
The heavy computation required for motion data reasoning and position
estimation results in high energy consumption. Together with the need to
maintain a reliable data connection anytime, anywhere, a practical battery
design is becoming a huge challenge for such wearable devices. Certain
computations need to be offloaded to a cloud server without significantly
compromising the response time. In today's highly digitized society, cloud
technologies play a critical role in preserving the health and safety of
citizens, especially women, children and the elderly. Over the last few years,
there has been a growing need to monitor citizens' lifestyles, including their
health status.
Smart Health will have a direct impact on society, leading to a smart society.
The ultimate achievement of the AoT for smart health solutions is to work as a
service provider for the wellbeing of the public. AoT for a quality lifestyle
has not been addressed extensively in recent years: recently developed devices
were not a great success because three main critical issues were not
appropriately integrated into customized devices targeting the needs of a
particular society (here, ASEAN countries): intelligent user interfaces,
information fusion and real-time biofeedback control. Hence, the goal of Smart
Health solutions is to design, implement and build AoT devices/tools which
incorporate hybrid tools: intelligent user interfacing systems and real-time
biofeedback control systems embedded with information fusion.
Thus, AoT for smart health solutions embeds solutions for injury prevention, performance enhancement and rehabilitation using reactive care, episodic response and clinic-centric services, respectively. An Intelligent Graphical User Interfacing System (IGUIS) was built to integrate these services and was tested successfully with soldiers and national athletes, as reported in [6]. The IGUIS was built using the virtual measurement and instrumentation tools provided by LabVIEW, with Support Vector Machines (SVM) interfaced with case-based reasoning.

2.2 System Architecture


As shown in Fig. 1, the overall system architecture is divided into two subsystems, the wearable device and the (cloud) server, which are interconnected via communication protocols carrying two critical parameters: the IoT(s) currently active from the Array of Things (AoT) and their status.
Initially, the wearable device considered is an Android-based platform, but reconfiguration to other wearable platforms is supported through the integrated customization tools. The wearable device contains an on-device multimodal healthcare system, a personalized wrist-band data center and AoT platforms. The AoT is designed to accommodate all embedded platforms arising from the multimodal healthcare systems of different devices. It is implemented as a real-time embedded system interfaced with IGUISs. Hence, the AoT uses daisy-chain methods to interface with all IoT devices encapsulated under smart health
602 S.M.N. Arosha Senanayake et al.

solutions. This allows future IoTs to be connected with no additional hardware. To facilitate connectivity with cloud servers, a personalized communication protocol was built.
The communication protocol is the interface to the server, usually a cloud server configured for the IoT under consideration. It carries two important pieces of information from the currently active Android device: the IoT and its status. The IoT information contains personalized health protocol headers that allow the device to reconfigure and synchronize with the corresponding smart health data in the cloud server. The status is the actual health status of the human test subject under consideration, in real time or online.
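As a concrete illustration, the two parameters carried by the protocol (the active IoT’s header and the subject’s status) could be serialized as a small JSON payload. This is only a sketch: the field names and values below are assumptions, not the actual protocol used in the system.

```python
import json

def build_protocol_message(iot_header, status):
    """Bundle the two protocol parameters -- the active IoT's personalized
    health-protocol header and the subject's current health status -- into
    a JSON payload for the cloud server. All field names are illustrative."""
    return json.dumps({"iot": iot_header, "status": status})

# Hypothetical message from a knee-rehabilitation smart watch.
msg = build_protocol_message(
    {"device": "gear-s3", "protocol": "knee-rehab-v1", "sync_token": "abc123"},
    {"class": "A", "mode": "real-time"},
)
```

The cloud server would parse the `iot` part to select and synchronize the matching smart health data, and the `status` part to update the subject’s record.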
The cloud server contains built-in smart health algorithms, smart health data analytics and hybrid system platforms. Thus, the cloud server is the service provider, offering data visualization using virtual technologies and the services requested by the end user.
The hybrid system platforms are based on hybrid system architecture platforms (HSAP) interfaced to the wearable devices. As long as the connected wearable devices are based on HSAP, they can transfer the necessary smart health data into HSAP for processing. In this project, HSAP is restricted to wearable devices with Android platforms and their relatives, such as the Tizen OS platforms used for smart watches. HSAP is depicted in Fig. 2.

Fig. 2. Hybrid System Architecture Platforms (HSAP).

The main components of HSAP are smart data fusion, smart data analytics and deep learning. Smart data fusion is carried out using the currently active IoT interfaced with the actual health status of the current human test subject, in real time and/or online. This facilitates applying the selected smart health algorithm to transform the
active pattern set for smart data analytics. Smart data analytics applies case-based reasoning to the intelligent knowledge base already stored in the cloud server, in such a way that the transient health pattern set already in memory is the basis for retrieving the matching pattern set and/or revising and retaining it in the knowledge base. Deep learning techniques are implemented to produce output that is either visualized as personalized health data and/or delivered as client services requested by clinicians, physiotherapists, trainers and/or the subject under assessment, primarily based on the established protocols and norms for injury prevention, performance enhancement and rehabilitation monitoring. In this research, Canadian protocols have been used to implement decision fusion algorithms that make a final judgment as a wireless wearable assistive tool/device, independent of location and human anthropometry.

3 Prototypes Built, Emulation and Validation

The implementation of the AoT for Smart Health Solutions (SHS) is based on the criteria and norms (Canadian norms) currently practiced by the Performance Optimization Centre of the Ministry of Defense and the Sports Medicine and Research Centre of Brunei, utilizing the standard guidelines established for the injury prevention, performance enhancement and rehabilitation of soldiers and national athletes. Thus, the AoT is designed by setting up the different functional/service units (currently in operation) as follows: reactive care, episodic response and clinic-centric.
Thus, the smart health solutions at their current stage support the following functionalities across wearable devices and HSAP:
• Personalized Wrist Band Data Centre for Healthy Lifestyle
• Pre-clinical monitoring of movement disorders/abnormalities
• Secure Personalized Performance Analysis Data Center
• Personalized Recovery Progress Analysis and Classification
• Secure Sports/Military Personnel Performance Enhancement.
A hybrid intelligent framework was developed by combining a case-based reasoning (CBR) approach and adaptive intelligent mechanisms in order to build prototypes with different functionalities. The framework utilizes the concept of solving new problems by using/modifying similar previous experiences (problem-solution pairs). The CBR problem-solving cycle consists of four steps [7, 12]:
• Retrieve: finding similar case(s) from the knowledge base whose problem description best matches the given problem.
• Reuse: reusing the solution of the most similar case to solve the new problem.
• Revise: adapting/modifying the chosen solution according to the differences in the new problem.
• Retain: storing the new problem-solution pair as a case once it has been solved.
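The four-step cycle can be sketched in a few lines. This is a minimal illustration: the `similarity` and `adapt` functions stand in for the domain-specific matching and adaptation logic, and the numeric example is entirely hypothetical.

```python
def cbr_cycle(problem, case_base, similarity, adapt):
    """One pass of the four-step CBR cycle over a list of
    (problem, solution) cases."""
    # Retrieve: find the stored case whose problem best matches.
    best_problem, best_solution = max(
        case_base, key=lambda case: similarity(problem, case[0]))
    # Reuse the most similar case's solution, then
    # Revise: adapt it to the differences in the new problem.
    solution = adapt(problem, best_problem, best_solution)
    # Retain: store the solved problem-solution pair as a new case.
    case_base.append((problem, solution))
    return solution

# Toy example: problems are scalars, solutions scale linearly with them.
cases = [(1.0, 10.0), (5.0, 50.0)]
result = cbr_cycle(
    6.0, cases,
    similarity=lambda a, b: -abs(a - b),           # closer problems match better
    adapt=lambda p, old_p, s: s + (p - old_p) * 10.0,
)
```

Here the case (5.0, 50.0) is retrieved, its solution is revised to 60.0, and the new pair (6.0, 60.0) is retained, growing the case base.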
Thus, designing the intelligent hybrid knowledge-based system depends on establishing a knowledge base (KB) of smart health solutions using the pattern sets currently
available, while at the same time allowing the KB to evolve with new pattern sets through CBR; the KB is stored in a cloud server, as depicted in Fig. 1.

3.1 Knowledge Base (KB)


The structure of the knowledge base for smart health solutions is depicted in Fig. 3. The knowledge base contains different types of information including: raw and processed data; domain knowledge; historical data available for subjects (pre-injury, post-injury and recovery data) and session data during convalescence; the case library (problem-solution pairs); reasoning and learning models (trained intelligent methods); and other relevant data (e.g. subjects’ profiles, gender, activity type, etc.).

Fig. 3. The structure of knowledge base for smart health solutions.

To manage the knowledge base repository, a relational database was used to reduce storage redundancy and provide flexibility. The knowledge base evolves over time as new problems are presented and new cases are added to the system
using CBR. This evolution process makes it especially useful for domains where subject-specific monitoring and prognosis mechanisms are required.
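A minimal sketch of such a relational layout using SQLite follows; the table and column names are assumptions, since the paper does not publish its schema.

```python
import sqlite3

# In-memory stand-in for the KB repository: subjects, their recorded
# sessions (pre-injury / post-injury / recovery) and the CBR case library.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    id INTEGER PRIMARY KEY, gender TEXT, activity TEXT);
CREATE TABLE session (
    id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES subject(id),
    stage TEXT CHECK (stage IN ('pre_injury', 'post_injury', 'recovery')),
    raw_data BLOB);
CREATE TABLE case_library (
    id INTEGER PRIMARY KEY,
    subject_id INTEGER REFERENCES subject(id),
    problem TEXT, solution TEXT);
""")
conn.execute("INSERT INTO subject (gender, activity) VALUES ('F', 'netball')")
conn.execute("INSERT INTO session (subject_id, stage) VALUES (1, 'recovery')")
rows = conn.execute(
    "SELECT s.gender, se.stage FROM subject s "
    "JOIN session se ON se.subject_id = s.id").fetchall()
```

Normalizing subjects, sessions and cases into separate tables is what removes the storage redundancy the text mentions: a subject’s profile is stored once, however many sessions and cases reference it.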
In general, the information in KB can be represented as in (1):
KB = [ pre_inj_I^i_S, post_inj_I^j_S, post_op_I^k_S, T(pre_inj_I^i_S), T(post_inj_I^j_S), T(post_op_I^k_S), S_p, D, C, M_t ]    (1)

where
pre_inj_I^i_S: raw input data set of a group of subjects S for different activities at the pre-injury (i.e. healthy) stage for i sessions (i ≥ 1)
post_inj_I^j_S: raw input data set of a group of subjects S for different activities during post-injury for j sessions (j ≥ 1)
post_op_I^k_S: raw input data set of a group of subjects S for different activities during post-surgery (i.e. rehabilitation) for k sessions (k ≥ 1)
T(pre_inj_I^i_S): processed input data set of a group of subjects S for different activities at the pre-injury (i.e. healthy) stage for i sessions (i ≥ 1)
T(post_inj_I^j_S): processed input data set of a group of subjects S for different activities during post-injury (i.e. before surgery) for j sessions (j ≥ 1)
T(post_op_I^k_S): processed input data set of a group of subjects S for different activities during post-surgery (i.e. rehabilitation) for k sessions (k ≥ 1)
S_p: profile (e.g. gender, age, weight, height, type of injuries, activities, etc.) of p subjects
D: domain knowledge (e.g. type of protocols followed for subjects, local/standard norms for different rehabilitation testing activities, etc.)
C: case library consisting of problem-solution pairs (processed input, rehabilitation procedure followed, outcomes and possible suggestions) related to individuals or different groups of subjects
M_t: trained intelligent models for each activity t to be monitored.
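For illustration only, the representation in (1) could be mirrored by a simple container whose fields correspond term-by-term to the KB components; the field types and keys below are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """Container whose fields correspond term-by-term to the KB
    components in (1); all field types here are illustrative."""
    pre_inj: dict = field(default_factory=dict)     # pre_inj_I^i_S (raw)
    post_inj: dict = field(default_factory=dict)    # post_inj_I^j_S (raw)
    post_op: dict = field(default_factory=dict)     # post_op_I^k_S (raw)
    t_pre_inj: dict = field(default_factory=dict)   # T(pre_inj_I^i_S)
    t_post_inj: dict = field(default_factory=dict)  # T(post_inj_I^j_S)
    t_post_op: dict = field(default_factory=dict)   # T(post_op_I^k_S)
    profiles: dict = field(default_factory=dict)    # S_p: subject profiles
    domain: dict = field(default_factory=dict)      # D: protocols and norms
    cases: list = field(default_factory=list)       # C: problem-solution pairs
    models: dict = field(default_factory=dict)      # M_t: models per activity t

kb = KnowledgeBase()
kb.pre_inj[("S01", 1)] = [0.42, 0.44]   # hypothetical subject S01, session i = 1
kb.cases.append(({"gait": "abnormal"}, {"action": "revise protocol"}))
```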
The designed KB is not a static collection of information, but a dynamic resource that can learn and evolve with the passage of time as new problems are presented and new problem-solution pairs are added to the system using CBR. This evolution process makes it especially useful for domains where subject-specific monitoring and prognosis mechanisms are required. Thus, as an integral component of injury prevention, performance enhancement and rehabilitation, this KB has been used to optimize the collection, organization and retrieval of relevant information for subjects using CBR.

3.2 Smart Health Solutions Service Provider


Services defined by the smart health solutions are tightly coupled with the available AoT functional/service units and their functionalities across wearable devices and HSAP, together with the hybrid intelligent knowledge-based system formed as explained in Sect. 3.1. Hence, prototype building, emulation and validation are carried out using reactive care, episodic
response and clinic-centric services under the careful supervision of specialists: clinicians, physiotherapists, trainers, test subjects, etc.
Reactive Care. This service provides performance enhancement and injury prevention tools as proactive and preventive care services for a healthy, active lifestyle. If a person is concerned about their daily active lifestyle, reactive care services produce the required output data from up-to-date daily healthcare records in the following simple steps:
• The secure personalized data center stores and visualizes all measurements of the daily active lifestyle.
• If a person is not active during working time, the preventive care tool assists in finding and determining the causes.
• Produce and generate personalized reports using data visualization tools.
Episodic Response. These tools support a lifelong active daily lifestyle by providing periodic monitoring and biofeedback control through appropriate intervention during critical stages. Episodic response tools provide services not only for today; they are about wellbeing throughout life. Periodic monitoring of recovery stages after injury treatment helps the subject return to a healthy, active lifestyle within the shortest possible time frame. These features are integrated using the following tools:
• Pre-clinical monitoring of movement disorders/abnormalities.
• Personalized Recovery Progress Analysis and Classification by storing personalized
data into a knowledge base in which pre-injury, post-injury and recovery data are
stored and fused in the cloud server.
• Real time biofeedback control using personalized wearable devices.
Clinic Centric. The clinic-centric service guides patients through rehabilitation protocols for the recovery of injured joints/muscles and/or tiny-muscle repair. Injury recovery is crucial for returning to an active, healthy daily lifestyle. The progressive recovery percentage can be quantified and visualized using the following tools:
• Secure personalized wrist-band data center using a wearable wireless sensor suit.
• Integrated tiny-muscle detector that locates damaged areas in the relevant muscles down to mm².
• Produce and generate personalized reports using virtual technologies interfaced with data visualization tools.
In this research, the prototypes built, emulation and validation of the smart health solution services have been proven and tested using the following key and critical planned activities:
• Prototypes built for physical and mobility impairments, obesity, gait disorders, etc.
• Intelligent user interfacing tools and real-time biofeedback mechanisms incorporated in wearable devices (smart watches) and customized taking societal needs into consideration.
• Validation and testing of the smart health solution services for different types of human test subjects (ASEAN, Japan and USA) in different clinical environments: the Performance Optimization Centre and the Sports Medicine and Research Center in Brunei.
4 Case Studies Using AoT Built

The AoT is built using virtual measurement and instrumentation technologies (LabVIEW), the Tizen OS emulator and smart watches, for the physical and mobility impairments, obesity and gait disorders community and for national athletes as healthy subjects in society. To validate and test the AoT built so far, clinical and laboratory environments were set up, as illustrated in Fig. 4, at the Performance Optimization Centre of the Ministry of Defense, the Sports Medicine and Research Centre of the Ministry of Youth, Culture and Sports, and the Physiotherapy Unit under the Ministry of Health.

Fig. 4. Clinical and laboratory set up for smart health solutions.

4.1 Case Study 1 – Injury Prevention and Rehabilitation


A general framework for an intelligent and interactive biofeedback virtual measurement and instrumentation system was built for physical and mobility impairments, obesity and gait disorders as a smart health solution for soldiers and professional athletes, especially during rehabilitation monitoring. The application of machine learning techniques, along with a custom-built wireless wearable sensor suit, facilitated building a knowledge base system for the periodic rehabilitation monitoring of test subjects and providing visual/numeric biofeedback to clinicians, patients and healthcare professionals. The validated system is currently used as a decision-support tool by clinicians, physiotherapists, physiatrists and sports trainers for quantitative rehabilitation analysis of the subjects, in conjunction with the existing recovery monitoring systems [5].
To perform real-time recovery classification of gait patterns for an ambulation activity, a multi-class Support Vector Machine (SVM) is implemented using the one-vs-all method. SVM has been used extensively as a machine learning technique in many biomedical signal classification applications. The identification of the class/status from the gait patterns of a new/actual subject can provide useful complementary information for making adjustments in his/her rehabilitation process. Figure 5 illustrates the LabVIEW
data flow diagram of the SVM embedded into the Intelligent Graphical User Interfacing System (IGUIS) built [6].
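The one-vs-all decision rule itself can be sketched as follows. The per-class scorers below are hypothetical margins standing in for the trained binary SVM decision functions; the real system is implemented in LabVIEW, not Python.

```python
def one_vs_all_predict(x, scorers):
    """One-vs-all multi-class decision: one binary scorer per class
    (each returning a real-valued margin, standing in for a trained
    per-class SVM decision function); the class whose scorer responds
    most strongly wins."""
    return max(scorers, key=lambda label: scorers[label](x))

# Hypothetical margins for the four recovery classes A-D over a
# one-dimensional gait feature in [0, 1] (purely illustrative).
scorers = {
    "A": lambda x: 1.0 - abs(x - 0.2),
    "B": lambda x: 1.0 - abs(x - 0.4),
    "C": lambda x: 1.0 - abs(x - 0.6),
    "D": lambda x: 1.0 - abs(x - 0.8),
}
predicted = one_vs_all_predict(0.63, scorers)
```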

Fig. 5. Data flow diagram of SVM for recovery classification.

Thus, interactive biofeedback visualization was designed to monitor the rehabilitation and recovery status of subjects with physical and mobility impairments, obesity and gait disorders. Two conditions are accepted by the biofeedback visualization: the first is the availability of the subject’s gait pattern set in the KB (offline), while the second is the subject undergoing an actual experiment to analyze the current recovery status (real time). In offline mode, the biofeedback visualization displays previously saved and visualized signals using the IGUIS. During real-time analysis, the total time from software start until the output is produced is 20 s; in offline processing it is immediate. The visual output generated using the IGUIS facilitates adjusting the individual subject’s rehabilitation protocol according to the governing standard procedures.
Different classifiers may assign different classes to the same subject based on his/her performance during each activity, or due to misclassification. In addition to evaluating the output of an individual activity of a subject, an overall assessment can be helpful for categorizing the recovery stage of a subject after a certain rehabilitation period. The classification results of multiple activities for each subject’s data have been combined using the Choquet integral method, as illustrated in (2). The Choquet integral is a non-linear functional defined with respect to a fuzzy measure g_λ, where g_λ is completely determined by its densities (g_i, the degree of importance of classifier y_i towards the final decision). The fusion of different classifiers is computed based on (1) and (2) [8, 13].
e_k = Σ_{i=1}^{t} ( h_k(y_i) − h_k(y_{i−1}) ) · g(S_i)    (2)

where
h_k(y_i): the certainty of identifying subject S to be in stage k using classifier y_i
g(S_i): the degree of importance of classifier y_i of subject S towards the final decision
e_k: the overall recovery stage from the fuzzy integration, based on the highest value computed for e over the stages k of subject S.
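Equation (2) can be evaluated directly once the fuzzy measure values are known. The sketch below assumes a hand-chosen measure g over two hypothetical per-activity classifiers, rather than deriving g_λ from its densities as in [8, 13].

```python
def choquet(h, g):
    """Discrete Choquet integral of equation (2): sort the classifier
    confidences h ascending (with h(y_0) = 0) and weight each increment
    by the fuzzy measure of S_i = {y_i, ..., y_t}."""
    items = sorted(h.items(), key=lambda kv: kv[1])     # ascending h(y_i)
    names = [name for name, _ in items]
    total, prev = 0.0, 0.0
    for i, (_, hv) in enumerate(items):
        total += (hv - prev) * g[frozenset(names[i:])]  # (h_i - h_{i-1}) g(S_i)
        prev = hv
    return total

# Two hypothetical classifiers; measure values chosen by hand.
h = {"svm_walk": 0.6, "svm_squat": 0.9}
g = {
    frozenset({"svm_walk", "svm_squat"}): 1.0,
    frozenset({"svm_squat"}): 0.7,
    frozenset({"svm_walk"}): 0.4,
}
e = choquet(h, g)
```

Here e = 0.6·1.0 + (0.9 − 0.6)·0.7 = 0.81; computing this for every stage k and taking the largest e_k gives the overall recovery stage.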
Figure 6 shows the recovery classification classes of a knee-injured test subject, extracted from the IGUIS built. Four classes (A, B, C and D) were formed using fuzzy C-means clustering on historical data collected and stored in the KB. Classes A through D represent different stages of the subjects’ health/recovery condition based on gait patterns: Class A represents 2–6 months of recovery; Class B represents 7–12 months of recovery; Class C represents 13–24 months of recovery; Class D represents a healthy subject.
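A compact sketch of fuzzy C-means of the kind used to form the four classes (toy one-dimensional “gait scores”; not the authors’ implementation or data):

```python
import numpy as np

def fuzzy_c_means(x, c=4, m=2.0, iters=50, seed=0):
    """Minimal 1-D fuzzy C-means: alternate between weighted centre
    updates and membership updates u_ij proportional to
    1 / d_ij^(2/(m-1)). A sketch of the clustering step only."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)              # fuzzy memberships
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)      # weighted cluster centres
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)
    return centers, u

# Toy 1-D "gait scores" with four visible groups.
x = np.array([0.10, 0.12, 0.40, 0.42, 0.70, 0.72, 0.95, 0.97])
centers, u = fuzzy_c_means(x)
labels = u.argmax(axis=1)                          # hard class per sample
```

Taking the argmax of the membership matrix yields the hard class labels (A–D in the paper) that the IGUIS displays.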

Fig. 6. Knee recovery classification of the subject classified as Class A in real time.

Having implemented the hybrid intelligent framework together with CBR, collectively called the smart health algorithms and stored in the cloud server for clinic-centric/episodic response care services, data visualization can be obtained using wearable IoT devices. In this study, the Tizen OS visualization emulator was used, as illustrated in Fig. 7, and the output was subsequently visualized on a Samsung Gear S3 smart watch as the IoT device (courtesy of Samsung Asia Pte Ltd, Singapore), as illustrated in Figs. 8 and 9, using JSON tools.

Fig. 7. Tizen OS emulator for real time classification during rehabilitation.

Fig. 8. IoT devices for real time classification during knee rehabilitation.

Fig. 9. Samsung Gear S3 smart watch for real time classification during knee rehabilitation.

In this study, the Samsung Gear S3 smart watch works as an IoT injury prevention and rehabilitation tool, wirelessly connected regardless of the locations of clinicians and patients (soldiers). Whenever the IoT tool (the Samsung Gear S3 smart watch) revises the pattern set using the actual (current) pattern set identified during the rehabilitation process, case-based reasoning is used to update the intelligent KB in the cloud server. Hence, clinicians
were able to provide real-time biofeedback for patients, so that soldiers monitored during rehabilitation and injury prevention followed the protocols given by clinicians in order to improve their recovery classification. With the IoT built so far for the critical knee joint of soldiers, clinicians were able to avoid a second Anterior Cruciate Ligament (ACL) surgery for women soldiers, who previously were commonly unable to return to their soldiering careers because no real-time biofeedback monitoring was available. Therefore, the Samsung Gear S3 smart watch as the IoT for real-time knee monitoring used in this study was capable of providing the current recovery classification of a knee-injured soldier without physical presence in the clinic, and, based on the current classification, clinicians were able to provide new protocols to improve the knee rehabilitation process.
Currently, this IoT is used for soldiers, as soldiers are considered a reference/benchmarking population in a nation. Since this study has already proven the capability of real-time biofeedback monitoring using IoT via smart health data stored in and accessed from the cloud server set-up, the current study focuses on validating and testing members of the general public in the physiotherapy clinic of the government hospital and at Jerudong Park Medical Center (under the Gleneagles Hospital chain from Singapore), under the close routine monitoring of clinicians in the clinic. While patients take part voluntarily in this pilot study, smart watches sponsored by Samsung Asia Pte Ltd revise the pattern set based on the pattern set collected in the home environment by automatically updating the smart health data in the cloud server.

4.2 Case Study 2 – Performance Enhancement


A hybrid framework combining Self-Organizing Maps (SOMs) and CBR is implemented for clustering, accessing, examining and recommending training procedures for the performance enhancement of national athletes. This system is intended to assist sports professionals, coaches or clinicians in maintaining records of subject and experiment information, diagnosing improper movements based on the KB, providing recommendations for improvement and monitoring performance progress over a period of time.
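The SOM half of the framework can be illustrated with a minimal one-dimensional map; the topology, features and parameters below are assumptions, since the paper does not publish them.

```python
import numpy as np

def train_som(data, n_units=4, iters=200, lr=0.5, seed=0):
    """Minimal 1-D self-organizing map: each step moves the best
    matching unit (and, more weakly, its neighbours) toward a randomly
    drawn sample, with learning rate and radius decaying over time."""
    rng = np.random.default_rng(seed)
    w = rng.random((n_units, data.shape[1]))       # unit weight vectors
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = int(np.argmin(((w - x) ** 2).sum(axis=1)))  # best matching unit
        radius = max(1.0, (n_units / 2.0) * (1.0 - t / iters))
        for j in range(n_units):
            nbr = np.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
            w[j] += lr * (1.0 - t / iters) * nbr * (x - w[j])
    return w

# Toy athlete feature vectors (e.g. jump height, sprint time), scaled to [0, 1].
data = np.array([[0.10, 0.90], [0.15, 0.85], [0.80, 0.20], [0.90, 0.10]])
weights = train_som(data)
```

After training, each athlete maps to a best matching unit, and CBR can then retrieve and adapt the training recommendations stored for that cluster.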
The IGUIS was built to facilitate monitoring and to provide instantaneous biofeedback during training sessions. The IGUIS supports a range of features necessary in real-time applications, which are clustered into separate frames for simplicity and ease of use.
Figure 10 illustrates the IoT platforms used for real-time data visualization during the performance enhancement of national athletes, based on the hybrid framework combining SOMs and CBR implemented as smart health algorithms in the cloud server to derive personalized performance enhancement for athletes using the reactive care and episodic response services provided by the smart health solutions. In this study, the Tizen OS emulator, followed by the Samsung Gear S3 smart watch, was used to visualize data by applying database-driven neural computing interfaced with JSON tools, as illustrated in Fig. 11.

Fig. 10. IoT platforms for athletes’ performance enhancement using hybrid intelligent
computing.

Fig. 11. Samsung Gear S3 smart watch for athletes’ performance enhancement in real time.

In this study, a database-driven neural computing system was used to monitor the different activities instructed by coaches during the training regime. Different coaches use different protocols and standards to classify national athletes, but in general each athlete is expected to perform as excellent or very good in the different activities assigned during the training regime; otherwise, the athlete is automatically considered undeserving of a place in the national pool. Hence, women netball players in a training regime were considered, under the close monitoring of coaches and a physical strength and conditioning specialist who use Canadian protocols. There are pre-defined activities set by the coach during the training regime so that the coach can decide the positioning of players in forthcoming international games/tournaments. As the athletes wear smart watches during the training regime in the indoor stadium and perform the pre-defined physical exercises given by coaches and clinicians, the coach and clinicians have access, just before the subsequent training regime, to each athlete’s profile pattern set updated in the cloud server. The Samsung Gear S3 smart watch, considered an IoT worn by each athlete, automatically visualizes the transient health status of the personalized classification from cloud storage before the actual regime starts; this is fundamental for healthcare professionals, in this case coaches, to determine the performance level the athlete should undergo in the actual training regime onsite, online and in real time. Hence, coaches and clinicians are able to make a judgment and/or re-adjust the
training regime of each athlete with updated/revised protocols for the forthcoming training regimes and actual games, based on real-time biofeedback monitoring.

5 Comparative Analysis with Existing Systems

The Array of Things (AoT) using virtual measurement and instrumentation technologies for smart health solutions addressed in this research work is novel. While specific application domains exist using augmented, virtual and mixed realities, none of the existing applications has introduced a generalized architecture similar to the Hybrid System Architecture Platform (HSAP), which allows interfacing and mapping to a specific domain of interest using cloud computing. Further, this article addresses the solution space using wearable technologies, from the acquisition of the personalized health pattern set via the multimodal healthcare system using the personalized wrist-band data center, while the IoT itself, in this case the Samsung Gear S3 smart watch, performs real-time biofeedback monitoring based on the transient health status (classification) and the current/actual health status (classification or recovery status) onsite, online and in real time during injury prevention, performance enhancement and rehabilitation using cloud computing. Hence, there is no concrete evidence in the literature for a comparative analysis, because the solutions provided so far are domain-centric within digital healthcare technologies and services.

6 Conclusions

The futuristic concept of the Array of Things (AoT) for smart health solutions during injury prevention, performance enhancement and rehabilitation introduced in this research work was proven by interfacing virtual measurement and instrumentation (LabVIEW from NI) with IoT platforms (Samsung Gear S3 smart watch). An intelligent graphical user interfacing system was built to assist the formation of the intelligent knowledge base, which is an evolving smart health pattern storage using case-based reasoning via the retrieve, reuse, revise and retain mechanisms during real-time biofeedback monitoring. At its current stage, the cloud storage consists of smart health data processed according to the standard Canadian protocols established by coaches, clinicians, physiotherapists and physical strength and conditioning specialists at the Performance Optimization Center of the Ministry of Defense and at the Sports Medicine and Research Center of the Ministry of Youth, Culture and Sports, using the nation’s active healthy population: soldiers and professional athletes. Two case studies were conducted during their training regimes under the close monitoring of different specialists. The AoT for smart health solutions concept was proven using IoT platforms during real-time feedback monitoring, and at the same time references and benchmarks were set up based on the nation’s active healthy population of soldiers and athletes. This will allow norms to be established for the general public for their health and safety monitoring during real-time biofeedback monitoring, using these IoT platforms as assistive tools/devices for different health classifications and recovery statuses, regardless of whether the patient is at home and/or at a clinic under the close monitoring of different specialists.
Thus, the services provided by the AoT (reactive care, clinic-centric and episodic response) form the platform for personalizing IoT devices for healthcare using database-driven neural computing platforms.
Therefore, the futuristic goal of this ongoing research will be the utilization of different deep learning algorithms, in particular reinforcement learning mechanisms, for smart data analytics geared towards smart data visualization and services.

Acknowledgments. This publication is part of the output of the ASEAN Institutes of Virtual Organization at National Information and Communications Technology (NICT), Tokyo, Japan; ASEAN IVO project with the title “IoT system for Public Health and Safety Monitoring with Ubiquitous Location Tracking”. This research is also partially funded by the University Research Council (URC) grant scheme of Universiti Brunei Darussalam under grant No. UBD/PNC2/2/RG/1(195).

References

1. Michael, E.P.: Introduction to the array of things. http://niu.edu/azad/_pdf/3-Michael_May18_2016.pdf
2. Alberto, J.E., et al.: Technology in Parkinson’s disease: challenges and opportunities. Mov. Disord. 31(9), 1272–1282 (2016). https://doi.org/10.1002/mds.26642. Epub 29 April 2016
3. Vieira, A., Ribeiro, B., Ferreira, J.P.: Gait analysis: methods & data review. CISUC technical report TR-2017-004, December 2017 (unpublished)
4. Arosha Senanayake, S.M.N., et al.: IntelliHealth solutions: technology licensing. http://
intelli-health.org/
5. Yahya, U., Arosha Senanayake, S.M.N., Naim, A.G.: Intelligent integrated wearable sensing mechanism for vertical jump height prediction in female netball players. In: Eleventh International Conference on Sensing Technology (ICST), Sydney, Australia, pp. 94–100 (2017). https://doi.org/10.1109/icsenst.2017.8304484
6. Filzah Pg Damit, D.N., Arosha Senanayake, S.M.N., Malik, O.A., Jaidi Pg Tuah, P.H.N.: Instrumented measurement analysis system for soldiers’ load carriage movement using 3-D kinematics and spatio-temporal features. Measurement 95, 230–238 (2017)
7. Wulandari, P., Arosha Senanayake, S.M.N., Malik, O.A.: A real-time intelligent biofeedback gait patterns analysis system for knee injured subjects. In: Nguyen, N.T., et al. (eds.) Intelligent Information and Database Systems, Part II. Lecture Notes in Artificial Intelligence (LNAI), vol. 9622, pp. 703–712. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49390-8_68
8. Arosha Senanayake, S.M.N., Malik, O.A., Iskandar, P.M., Zaheer, D.: A knowledge-based
intelligent framework for anterior cruciate ligament rehabilitation monitoring. J. Appl. Soft
Comput. 20, 127–141 (2014)
9. Senanayake, C., Arosha Senanayake, S.M.N.: A computational method for reliable gait event
detection and abnormality detection for feedback in rehabilitation. Comput. Methods
Biomech. Biomed. Eng. 14(10), 863–874 (2011)
10. Alahakone, A.U., Senanayake, A.: A real-time interactive biofeedback system for sports
training and rehabilitation. Proc. IMechE J. Sports Eng. Technol. 224(Part P), 181–190 (2010)
11. Gouwanda, D., Arosha Senanayake, S.M.N.: Emerging trends of body-mounted sensors in
sports and human gait analysis. In: International Federation for Medical and Biological
Engineering Book series, Chap. 102. Springer, Heidelberg (2008). ISBN 978-3-540-69138-9
12. Aamodt, A., Plaza, E.: Case-based reasoning: foundational issues, methodological variations,
and system approaches. AI Commun. 7, 39–59 (1994)
13. Murofushi, T., Sugeno, M.: An interpretation of fuzzy measures and the Choquet integral as an integral with respect to a fuzzy measure. Fuzzy Sets Syst. 29, 201–227 (1989)
Applying Waterjet Technology
in Surgical Procedures

George Abdou(&) and Nadi Atalla

New Jersey Institute of Technology, Newark, USA


{abdou,na76}@njit.edu

Abstract. The main objective of this paper is to predict the optimal waterjet
pressure required to cut, drill or debride the skin layers without causing any
damage to the underlying organs. A relationship between the waterjet pressure
and skin thickness has been established. It also includes the modulus of
elasticity of the skin, the diameter of the nozzle orifice, the nozzle standoff
distance and the traverse speed of the waterjet, as well as the duration of
applying the waterjet pressure. Thus, a practical relationship between the
waterjet operating parameters and the physical properties of the skin has been
formulated. Data from a real Caesarean section procedure have been applied to
the formulation. Given an Ultimate Tensile Strength of the skin at the abdomen
of 20 MPa, incision parameters of 18 mm deep, 12 cm long and 0.4 mm wide, a
traverse speed of 0.5 mm/s and a stand-off distance of 5 mm, the resulting
waterjet pressure is 17.89 MPa using a 0.4 mm orifice diameter.

Keywords: Waterjet · Surgery · Skin · Incision

1 Introduction

Waterjet technology has been used in several applications such as industrial cutting,
drilling and cleaning. Furthermore, waterjet technology can also be used in the medical
field; applications include dentistry, wound cleaning and other surgical operations.
Over the years, waterjet techniques have developed into a revolutionary cutting tool in
a variety of surgical procedures [1]. It can be used for precision cutting of skin in any
type of surgery. The tool is simply moved in a line to apply the pressure and make the
cut. The main advantage of waterjet incision is its precision; it is as effective as a laser
cutter. However, a waterjet incision does not cause any thermal damage to the
separated tissue because the jet itself acts as a coolant. Additionally, the waterjet
washes away blood, eliminating the extra tools that would otherwise be required for
this in a conventional cut [2].
In vivo and in vitro experiments on patients and animals have been conducted with
continuous waterjet at different low pressures. However, few studies have focused on
the skin. Further analyses on the relationship among the operating parameters of
waterjet, structure, and mechanical properties of the skin must be conducted.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 616–625, 2019.
https://doi.org/10.1007/978-3-030-02686-8_46

2 Literature Review

Waterjet technology is currently used for cutting a wide range of materials. Its main
advantages include the absence of a thermal effect on the material being cut. While
waterjets are applied across many industries, only the medical field is highlighted here.
Table 1 summarizes some of the applications of waterjet cutting in the medical field.

Table 1. Overview of using waterjet in medicine [3]

| Type of surgery | Operation description | Benefits |
| Orthopedic | Cutting endoprosthesis and bone | Cutting stays below the critical temperature |
| Dental | Cutting and grinding of dental materials | Reduces the risk of jagged teeth and reduces the need for anesthesia |
| General | Resection of soft tissues: liver, gall bladder, brain, kidney, prostate; cleaning wounds | Blood vessels and nerve fibers remain intact when the defined pressure is maintained; minimal bleeding; intact edges and precise cuts; lack of necrotic edge; reduced duration of myocardial ischemia |
| Plastic | Cleaning skin grafts, removal of tattoos, liposuction | Separation of the layers of tissue; higher accuracy of results without edema and contour changes |
| Dermatology | Removing dead skin | Possibility of dosing medications directly in the water jet |

The performance of the waterjet machining process depends on the water pressure of
the jet and the elastic properties of the skin. The initial impact is the highest; it occurs
when the waterjet first hits the tissue. After that, the water starts flowing radially and
the impact of the jet decreases [4].

2.1 Waterjet in Surgical Wound Debridement


Waterjet technology can be used for surgical wound debridement and for surgical
interventions where selective cutting is necessary. Commercial devices for surgical
wound debridement include VersaJet and Debritom, while surgical interventions use
devices such as Jet Cutter 4, Helix HydroJet and ErbeJet2 [4].
A study in 2006 introduced the Versajet waterjet as an alternative to standard surgical
excisional techniques for burn wounds. In the study, the Versajet was able to
sufficiently debride superficial partial-thickness and mid-dermal partial-thickness
wounds for the subsequent placement of Biobrane. Additionally, the study
demonstrated the advantage of the Versajet in the surgical treatment of superficial to
mid-partial-thickness burns of the face, hand and foot [5].

Another study, conducted in 2007, reviewed the versatility of the Versajet waterjet
surgical tool in treating deep and indeterminate-depth face and neck burns. With ex
vivo histologic analysis of debridement depth on human skin, the study confirmed
that a predictable and controlled depth of debridement could be obtained by adjusting
the apparatus settings [6].

2.2 The Use of Waterjet Incision in Other Surgical Procedures


Waterjet technology in surgical procedures was first reported in 1982, for liver
resection. Over the years, the waterjet machining process has become a recognized
technique in different surgical areas. Clinically, the waterjet technique is used for
cutting soft tissues such as liver tissue. Experimentally, it is used for dissecting spleen,
kidney and brain tissue. While these tissues can be cut at low water pressures, waterjet
techniques can also cut bone and bone cement at much higher water pressures [7].
Studies have been done using waterjet technology to drill or cut bone or bone cement.
A study in 2014 showed that such cuts require water pressures ranging between
30 MPa and 50 MPa, depending on the diameter of the nozzle. The study also
summarized different materials tested in previous analyses, the waterjet pressure
required to cut them, and the nozzle diameter used (Table 2).

Table 2. Overview of required waterjet pressures to cut bone and bone cement [7]

| Material tested | D_nozzle (mm) | Required pressure (MPa) |
| Human calcanei | 0.6 | 30 |
| Human femora | 0.3 | 40 |
| Bone cement | | 40 |
| Human femora | 0.2 | 50 |
| Bone cement | | 30 |
| Human interface tissue | 0.2 | 12 |
| | 0.6 | 10 |

A comparison between the existing systems and the proposed algorithm is illustrated
in Table 3.
The methods proposed in this study will provide more flexible and robust solutions
for setting up the waterjet apparatus when used in surgical procedures.

3 Mathematical Formulation

The operating parameters of the waterjet machining process are determined by several
independent variables. Table 4 summarizes these variables based on four system
components: process, skin, nozzle and pump characteristics [8].
Figure 1 shows how each parameter controls the incision characteristics and
illustrates the incision process.
Table 3. Features of previous works and proposed methods

| Authors | Year | Type of study | Method used | Apparatus | Water purity | Pressure | Depth of incision | Width of incision | Cutting velocity | Orifice diameter | Stand-off distance | Angle | Feed rate/traverse speed |
| Arif [8] | 1997 | Skin incision | Finite element analysis | Theoretical | 100% water | Fixed | Generated | Generated | N/A | Fixed | N/A | N/A | N/A |
| Vichyavichien [9] | 1999 | Skin incision | Finite element analysis | Theoretical | 100% water | Fixed | Generated | Generated | N/A | Fixed | Fixed | Fixed | N/A |
| Wanner et al. [10] | 2002 | Fat tissue incision | Ex vivo | Commercial | 0.9% saline | Fixed | Generated | N/A | Fixed | Fixed | Fixed | Fixed | N/A |
| Rennekampff et al. [5] | 2006 | Debridement of burn wounds | Ex vivo | Commercial | Sterile saline | Fixed | N/A | N/A | Fixed | Fixed | N/A | Fixed | N/A |
| Cubison et al. [11] | 2006 | Debridement of burns | Ex vivo | Commercial | N/A | Fixed | N/A | N/A | Fixed | Fixed | N/A | N/A | N/A |
| Tenenhaus et al. [6] | 2007 | Wound debridement | Ex vivo | Commercial | N/A | Fixed | N/A | N/A | Fixed | Fixed | N/A | N/A | N/A |
| Keiner et al. [12] | 2010 | Brain tissue dissection | In vivo | Commercial | 0.9% saline | Fixed | N/A | N/A | N/A | Fixed | N/A | N/A | N/A |
| Kraaij et al. [7] | 2015 | Interface tissue incision | In vitro | Custom | 100% water | Fixed | Generated | N/A | Fixed | Fixed | Fixed | Fixed | Fixed |
| Bahls et al. [4] | 2017 | Various tissue incision or abrasion and removal | In vivo | Commercial | 10% gelatin | Fixed | N/A | N/A | Fixed | Fixed | Fixed | Fixed | N/A |
| Proposed | 2018 | Skin incision | Mathematical/Simulation | Matlab & Minitab | 100% water | Generated | Variable | Variable | Generated | Generated | Variable | Fixed | Variable |

Table 4. Waterjet incision parameters

| Process characteristics | Skin characteristics | Nozzle characteristics | Pump characteristics |
| Depth of cut | Thickness | Stand-off distance | Pressure ratio |
| Width of cut | Hardness | Orifice diameter | Flow rate |
| Traverse (feed) rate | Consistency | Nozzle structure | Pump efficiency |
| Waterjet flow rate | | | Power |

Fig. 1. Waterjet parameters and its components.

3.1 Surgical Incisions Main Components: Operation Characteristics


The three main components of a surgical incision are the width, the length and the
depth of the incision. Before performing the incision, the surgical team must have
these three factors defined. The width and length of the incision are determined by the
individual surgery and the recommended incision specifications. When performing a
skin incision, the depth of incision is determined by the skin thickness. Skin thickness
differs by age, sex, skin type, pigmentation, blood content, smoking habits, body site,
geographical location and many other variables. For these reasons, a system that can
adapt to these differences must be created.

To develop metrics for skin thickness, high-frequency ultrasound is necessary. By
applying the ultrasound apparatus to the area to be operated on, skin thickness can
instantly be measured and fed into the system, which then determines the water
pressure required for the skin incision. Other skin characteristics can also be
determined from the ultrasound results, including the elastic modulus of each of the
skin layers as well as their tensile strength.
The total energy required for the skin incision, which is converted to pressure energy,
is formulated as follows:

PE = UTS · Q_s    (1)

where UTS is the Ultimate Tensile Strength of the skin and Q_s is the flow rate at
which the waterjet removes the skin, calculated as follows.

For skin cutting and debridement:

Q_s,cut = D_s · L_s · f    (2)

For skin drilling:

Q_s,drill = D_s · w_s · v_s    (2a)

Here D_s is the depth of incision, L_s is the length of incision, f is the traverse speed
(feed rate), w_s is the width of cut and v_s is the velocity of the waterjet stream at the
skin.

3.2 Waterjet Operating Conditions: Catcher Characteristics


To minimize the process noise, a catcher is necessary. The kinetic energy of the
catcher is the remaining energy that is not absorbed by the skin incision process; it is
formulated as follows:

KE_c = (1/2) · Q_c · v_c² · ρ_w    (3)

where ρ_w is the density of water and Q_c is the flow rate at which the residue water
enters the catcher; it is the sum of the flow rate of water out of the nozzle, Q_n, and
the rate at which the waterjet removes the skin, Q_s.

The velocity at which the excess water enters the catcher, v_c, is:

v_c = sqrt(2 · g · x)    (4)

where g is the gravitational acceleration and x is the stand-off distance.
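For the stand-off distance used later in the worked example (x = 5 mm), Eq. (4) can be checked in a few lines (an illustrative sketch, not the authors' code):

```python
import math

g = 9.8      # gravitational acceleration, m/s^2
x = 0.005    # stand-off distance, m

v_c = math.sqrt(2 * g * x)     # Eq. (4): velocity of excess water into the catcher
print(f"v_c = {v_c:.2f} m/s")  # ~0.31 m/s, matching the value reported in Sect. 4
```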

3.3 Waterjet Operating Conditions: Nozzle Characteristics


The kinetic energy of the waterjet stream coming out of the nozzle is the sum of the
pressure energy required for the skin incision and the kinetic energy of the catcher:

KE_n = PE + KE_c    (5)

Looking at the nozzle characteristics of the waterjet incision, this kinetic energy (5) is
also equal to:

KE_n = (1/2) · Q_n · v_n² · ρ_w · k_e    (6)

where v_n is the velocity of the waterjet stream coming out of the nozzle and k_e is
the loss coefficient.
The waterjet nozzle converts high-pressure water into a high-velocity jet. The
performance of a waterjet incision is affected by several variables such as nozzle
orifice diameter, water pressure, incision feed rate and standoff distance. In the
medical field, waterjet incision devices usually use low to medium pressure as well as
a small nozzle design that differs from industrial waterjets. The relationship between
the velocity of the waterjet stream coming out of the nozzle, v_n, and the velocity of
the waterjet stream at the skin, v_s, can be described as follows:

v_n = v_s / e^(a·x)    (7)

where a is the taper index and x is the standoff distance of the nozzle. Assuming a
straight-taper waterjet nozzle design, the flow of the water from the nozzle to the
atmosphere is affected by the area and the shape of the orifice. Table 5 lists the
different orifice types and typical values of the contraction (C_c) and loss (k_e)
coefficients for water orifices.

Table 5. Types of orifices and their coefficient values [13]

| Orifice | Description | C_c | k_e |
| SE | Sharp-edged | 0.63 | 0.08 |
| RE | Round-edged | 1.0 | 0.10 |
| TSE | Tube with square-edged entrance | 1.0 | 0.51 |
| TRE | Short tube with rounded entrance | 0.55 | 0.15 |

From (1) through (7), Q_n and v_n are calculated.

For cutting and debridement:

Q_n = (2·PE_cut + 2·g·x·ρ_w·D_s·L_s·f) / (ρ_w·(v_n² − 2·g·x))    (8)

v_n = sqrt( (2·PE_cut + 2·g·x·ρ_w·D_s·L_s·f) / (Q_n·ρ_w) + 2·g·x )    (9)

For drilling:

Q_n = (2·PE_drill + 2·g·x·ρ_w·D_s·w_s·v_s) / (ρ_w·(v_n² − 2·g·x))    (8a)

v_n = sqrt( (2·PE_drill + 2·g·x·ρ_w·D_s·w_s·v_s) / (Q_n·ρ_w) + 2·g·x )    (9a)

The relationship between Q_n and v_n can also be represented by:

Q_n = C_c · A_n · v_n    (10)

where A_n is the area of the nozzle orifice:

A_n = π · d_n² / 4    (11)

and d_n is the orifice diameter of the nozzle.
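Once an orifice is chosen, Eqs. (8) and (10) form a closed system in v_n that can be solved numerically. The sketch below (Python, not the authors' implementation) assumes the Table 6 example values and a sharp-edged orifice with C_c = 0.63 from Table 5; the loss coefficient and the nozzle/intensifier losses of Sect. 3.4 are ignored, so the resulting velocity is illustrative only and differs from the 151.05 m/s reported in Sect. 4.

```python
import math

# Example inputs (Table 6) plus an assumed sharp-edged orifice, Cc = 0.63 (Table 5).
rho_w = 1000.0               # density of water, kg/m^3
g, x  = 9.8, 0.005           # gravity (m/s^2), stand-off distance (m)
UTS   = 20e6                 # Pa
D_s, L_s, f = 0.018, 0.12, 0.0005
d_n   = 0.0004               # orifice diameter, m
C_c   = 0.63                 # contraction coefficient, sharp-edged orifice (assumed)

Q_s = D_s * L_s * f          # Eq. (2)
PE  = UTS * Q_s              # Eq. (1)
A_n = math.pi * d_n**2 / 4   # Eq. (11)

def residual(v_n):
    """Eq. (8) with Q_n replaced via Eq. (10): zero at the consistent v_n."""
    Q_n = C_c * A_n * v_n
    return Q_n * rho_w * (v_n**2 - 2*g*x) - (2*PE + 2*g*x*rho_w*Q_s)

# Bisection: the residual is negative at small v_n and positive at large v_n.
lo, hi = 1.0, 1000.0
for _ in range(60):
    mid = (lo + hi) / 2
    if residual(mid) > 0:
        hi = mid
    else:
        lo = mid
v_n = (lo + hi) / 2
Q_n = C_c * A_n * v_n        # Eq. (10)
print(f"v_n = {v_n:.1f} m/s, Q_n = {Q_n:.3e} m^3/s")  # v_n ~ 82 m/s under these assumptions
```

Sixty bisection steps narrow the bracket far below floating-point resolution, so the residual at the returned root is effectively zero.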

3.4 Waterjet Operating Conditions: Pump and Intensifier Characteristics


The relationship between the velocity of the waterjet flow coming out of the pump
reservoir, v_r, and the velocity coming out of the nozzle is:

v_r = v_n · e^(2·b·L_n)    (12)

where L_n is the length of the nozzle and b is the exponential constant, based on an
exponential-taper waterjet nozzle design:

b = ln(d_n / d_o) / L_n    (13)

where d_o is the diameter at the top of the nozzle.


The pressure ratio (rp) between the water outlet pressure (Pw2) and the oil inlet
pressure (Po1) and as well as the oil inlet area (Ao) and the water inlet area (Aw) is
described as follows:

Pw2 Ao
rp ¼ ¼ ð14Þ
Po1 Aw

The waterjet flow rate out of the intensifier (Qi) is equal to the waterjet flow rate
coming out of the nozzle (Qn). By design, the hydraulic intensifier increases the
pressure of water. Thusly, the water pressure coming out of the intensifier (Pw2) is
determined by the Power (W), the efficiency of the intensifier (ηi) and the flow rate (Qi)
as follows:
P_w2 = W · η_i / Q_i    (15)
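Eq. (15) can be sanity-checked against the numbers reported in Sect. 4. In the sketch below (not the authors' code), W and η_i follow the worked example; the flow rate Q_i is back-derived from the reported 17.89 MPa and is therefore an illustrative assumption, not a value given in the paper:

```python
# Intensifier outlet pressure, Eq. (15).
W     = 423.52       # intensifier power, W (from the worked example)
eta_i = 0.80         # intensifier efficiency (from the worked example)
Q_i   = 1.894e-5     # flow rate out of the intensifier, m^3/s (assumed, back-derived)

P_w2 = W * eta_i / Q_i                # Eq. (15), Pa
print(f"P_w2 = {P_w2/1e6:.2f} MPa")   # ~17.89 MPa
```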

4 Application Example and Results

In this example of a Caesarean section procedure, a Pfannenstiel transverse incision is
assumed. This curved incision (length of incision L_s) is approximately 10–15 cm
long and is made about 2 cm above the pubic symphysis [9]. Using the waterjet, the
skin and rectus sheath are opened transversely. The rectus muscles are not cut, and the
fascia is dissected along the rectus muscles. The skin thickness at the abdomen for a
female is approximately 2.30 mm, while the subcutaneous adipose tissue thickness at
the abdomen is approximately 15.7 mm [10]. The UTS of the skin at the abdomen
ranges between 1 and 24 MPa [11]. The exact thickness of the skin and its
characteristics would be measured using high-frequency ultrasound. The width of cut
is 0.4 mm; in a traditional incision, a #10 (0.4 mm) blade is used [12, 13]. Table 6
summarizes the operation characteristics as follows:

Table 6. Caesarean section operation characteristics [14–19]

| Parameter | Value |
| Depth of cut (D_s) | 18.00 mm |
| Length of cut (L_s) | 12.00 cm |
| Width of cut (w_s) | 0.40 mm |
| Ultimate Tensile Strength (UTS) | 20.00 MPa |
| Density of water (ρ_w) | 1.00 g/cm³ |
| Feed rate (f) | 0.50 mm/s |
| Gravity (g) | 9.80 m/s² |
| Stand-off distance (x) | 5.00 mm |
| Taper index (a) | 0.25 |

The waterjet velocity coming out of the nozzle (vn) is 151.05 m/s while the waterjet
velocity that reaches the skin (vs) is 150.86 m/s. The velocity of the excess water that is
going to the catcher is very minimal at 0.31 m/s. The calculated power required for the
intensifier is 423.52 W. Assuming the efficiency of the intensifier (ηi) is 80%, the
calculated pressure that is required for the cesarean section operation is 17.89 MPa
with a 0.4 mm nozzle orifice diameter.
The results obtained from this study can be summarized as follows:
1. The mathematical formulation for the different incision processes has been
developed and simulated for the best results.
2. An application example using a cutting incision has been demonstrated.
3. The applied data were extracted from a real-life application.

5 Conclusion and Recommendations

Given any surgical operation's characteristics, this mathematical model can calculate
the optimal operating conditions for surgical cutting, debridement or drilling. This
will help the surgeon pick the right nozzle size as well as the right waterjet instrument
parameters such as pressure, power and velocity. The next step is to use the results of
this study to create a comprehensive simulation model of a surgical procedure, such as
a Caesarean section or any other procedure as needed.

References
1. Areeratchakul, N.: Investigation of water jet based skin surgery (2002)
2. Yildirim, G.: Using Water jet technology to perform skin surgery (2003)
3. Hreha, P., Hloch, S., Magurová, D., Valíček, J., Kozak, D., Harničárová, M., Rakin, M.:
Water jet technology used in medicine. Tech. Gaz. 17(2), 237–240 (2010)
4. Bahls, T., et al.: Extending the capability of using a waterjet in surgical interventions by the
use of robotics. IEEE Trans. Biomed. Eng. 64(2), 284–294 (2017)
5. Rennekampff, H.-O., Schaller, H.-E., Wisser, D., Tenenhaus, M.: Debridement of burn
wounds with a water jet surgical tool. Burns 32, 64–69 (2006)
6. Tenenhaus, M., Bhavsar, D., Rennekampff, H.-O.: Treatment of deep partial thickness and
indeterminate depth facial burn wounds with water—jet debridement and a biosynthetic
dressing. Inj. Int. J. Care Inj. 38, 538–544 (2007)
7. Kraaij, G., et al.: Waterjet cutting of periprosthetic interface tissue in loosened hip
prostheses: an in vitro feasibility study. Med. Eng. Phys. 37(2), 245–250 (2015)
8. Arif, S.M.: Finite element analysis of skin injuries by water jet cutting. In: Mechanical and
Industrial Engineering. New Jersey Institute of Technology, Newark (1997)
9. Vichyavichien, K.: Interventions of water jet technology on skin surgery (1999)
10. Wanner, M., Jacob, S., Schwarzl, F., Oberholzer, M., Pierer, G.: Optimizing the parameters
for hydro-jet dissection in fatty tissue - a morphological ex vivo analysis. Eur. Surg. 34(2),
137–142 (2002)
11. Cubison, T.C.S., Pape, S.A., Jeffery, S.L.A.: Dermal preservation using the Versajet®
hydrosurgery system for debridement of paediatric burns. Burns 32, 714–720 (2006)
12. Keiner, D., et al.: Water jet dissection in neurosurgery: an update after 208 procedures with
special reference to surgical technique and complications. Neurosurgery 67(2), 342–354
(2010)
13. Abdou, G.: Analysis of velocity control of waterjets for waterjet machining. In: Waterjet
Cutting West. Society of Manufacturing Engineers, Los Angeles (1989)
14. Raghavan, R., Arya, P., Arya, P., China, S.: Abdominal incisions and sutures in obstetrics
and gynaecology. Obstet. Gynaecol. 16, 13–18 (2014)
15. Akkus, O., Oguz, A., Uzunlulu, M., Kizilgul, M.: Evaluation of skin and subcutaneous
adipose tissue thickness for optimal insulin injection. Diabetes Metab. 3(8) (2012)
16. Jansen, L.H., Rottier, P.B.: Some mechanical properties of human abdominal skin measured
on excised strips. Dermatology 117(2), 65–83 (1958)
17. Ritter, J.: The Modern-day C-section. Surg. Technol. 159–167
18. FST Homepage. https://www.finescience.com/en-US/Products/Scalpels-Blades/Scalpel-
Blades-Handles/Scalpel-Blades-10. Accessed 8 Apr 2018
19. WardJet Homepage. https://wardjet.com/waterjet/university/precision-quality. Accessed 31
Mar 2018
Blockchain Revolution in the Healthcare
Industry

Sergey Avdoshin(&) and Elena Pesotskaya

National Research University Higher School of Economics,


20 Myasnitskaya ulitsa, 101000 Moscow, Russian Federation
{savdoshin,epesotskaya}@hse.ru

Abstract. This paper analyses the possibility of using blockchain technologies
in the sphere of healthcare. Modern society requires new tools, e.g. distributed
ledgers and smart contracts, for sharing data between patients, doctors and
healthcare professionals, giving them control over the data and allowing
smarter cooperation. In this situation, utilizing blockchain technology can
resolve integrity, data privacy, security and fraud issues, increase patients'
health autonomy and provide access to better services. This paper provides a
review of blockchain technology, researches its possible applications in
healthcare, and gives an overview of positive trends and outcomes.

Keywords: Blockchain · Distributed ledger · Smart contracts · Healthcare · Patient · Security

1 Introduction

Blockchain is already disrupting many industries. Initially it was intended as a
banking platform for digital currency, but blockchain now has applications that go
beyond financial transactions, and its use is becoming popular in many industries.
The idea of blockchain is to use a decentralized system that can replace banks and
other trusted third parties. A blockchain is a large structured database distributed
among independent participants of the system. This database stores an ever-growing
ordered list of records (blocks). Each block contains a timestamp and a reference to
the previous block. A block cannot be changed unilaterally: each member of the
network can see that a transaction has taken place in the blockchain, and a transaction
can be performed only with the appropriate access rights (private key). Blocks are not
stored on a single server; the distributed ledger is replicated on thousands of
computers worldwide, so users interacting on the blockchain do not need any
intermediaries. Blockchain technology can be shared by individuals, organizations
and even devices. It saves time, increases transparency, and gives the ability to make
everything a tradable asset. The World Economic Forum predicts that by 2027 it will
be possible to store nearly 10% of the global gross domestic product on blockchains [1].
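The chained-block structure described above can be illustrated in a few lines of Python (a toy sketch, not any of the production systems discussed here): each block stores a timestamp, its payload and the hash of the previous block, so altering any earlier block is detectable by every participant.

```python
import hashlib, json, time

def block_hash(block):
    """Deterministic SHA-256 over the block's contents (excluding its own hash)."""
    payload = {k: block[k] for k in ("timestamp", "data", "prev_hash")}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def make_block(data, prev_hash):
    """A toy block: timestamp + payload + reference to the previous block's hash."""
    block = {"timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    return block

def is_valid(chain):
    # Every stored hash must match the block's contents, and every block
    # must reference the hash of its predecessor.
    return (all(block_hash(b) == b["hash"] for b in chain)
            and all(chain[i]["prev_hash"] == chain[i - 1]["hash"]
                    for i in range(1, len(chain))))

chain = [make_block("genesis", "0" * 64)]
chain.append(make_block("patient record update", chain[-1]["hash"]))

assert is_valid(chain)
chain[1]["data"] = "tampered"
assert not is_valid(chain)   # any alteration breaks verification
```

This immutability-by-hashing is the property the healthcare applications below rely on; real ledgers add consensus and access control on top of it.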
The potential of blockchain has already been realized by many people: by authors
who want to protect their research and share their knowledge at the same time, and by
car owners who want to share their car or use rental cars with no third-party
commission. Even people who want to share music or space on their hard drive want to
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 626–639, 2019.
https://doi.org/10.1007/978-3-030-02686-8_47

feel secure and protected at the same time with no involvement of counterparties. Many
industries are thinking about the great potential of blockchain technology and the
strong positive effect it can have on people's health and the healthcare system. The
cost of medicine is constantly growing worldwide. According to the Global Health
Care report, health spending in the world's major regions will grow at annual rates of
2.4% to 7.5% between 2015 and 2020 and will reach $8.7 trillion by 2020 [2]. This is
influenced by many factors, including population growth and aging, economic growth
in developing countries, and others.
Let’s analyse the basic needs for healthcare services that every patient and doctor
face and the associated risks:
• Organizing visits to the best healthcare professionals, finding trusted and affordable
care providers. What we see now is that although the prices for medical services are
increasing rapidly, it is still difficult to find the appropriate specialist and treatment for
a symptom or disease, or doing so requires long waiting lists. Availability of medical
services for patients and access to the best possible treatments and innovative services
are very important in the healthcare industry. Patients need to be able to search for
care providers in a snap, even abroad if needed, with information on where a specific
treatment is performed with great care and without delay, or sometimes with
after-hours access to medical care.
• Storage, management and control of access to patients' data. Patients need instant
data access (including CT, MRI, X-rays, echocardiograms, ultrasounds, etc.) from any
place on their mobile device, iPad or PC. Such access has become possible due to the
digital revolution and the development of mobile healthcare, but there is still the
question of how a person can be assured that personal data are secure. Patients also
face potential risks of data mismanagement, access limitations to their patient records,
and decentralization of all personal healthcare data.
• Communication with one's doctor and community in real time, and access to
knowledge, training, healthcare plans and advisory services. Lack of communication
between experts in different fields and the impossibility of quickly consulting several
specialists from one area of medicine lead to lower quality and negative patient
experiences. Patients expect to consult a specialist who has a long history of treating
and healing patients with similar symptoms. What we see now is a lack of incentives
and personalized information about preventive care: visits from one specialist to
another to get a clear view of a disease, manually searching Google and hoping that
eventually someone can help.
• Easy and transparent payment for medical services. Many people would agree that it
would be convenient to use a single medical insurance policy around the world.
Today, this is hampered by difficulties with insurer verification and slow payments
through a long chain of intermediaries. Additionally, patients want to pay not for the
fact of seeing a specialist, but for the result they receive. Currently, in most cases
payment takes place before admission, or money is charged regardless of the outcome.
Patients often overpay for repeated tests in multiple medical institutions or,
alternatively, undergo unnecessary examinations.
Telemedicine, or mobile medicine, can solve some of the issues raised; it has great
potential to reduce the uncertainty of diagnoses, increase accessibility from remote
areas, and improve the quality, efficiency and cost-effectiveness of treatment. But it
still faces many challenges associated with international payments and third-party
fees, centralization, patient security, integrity and trust, factors related to different
organizational entities. Using blockchain technology, patients and society can also
eliminate the potential risks of data mismanagement, access limitations, delays in
prognosis and human manipulation.
The contribution of this paper is twofold. First, the paper explores the potential
applications of blockchain in the health industry by examining the core requirements
of healthcare stakeholders and society. Second, an analysis of existing solutions and
applications helps generalize a framework and approach for choosing the appropriate
technology. The paper aims to provide a foundation for evaluating the effects of
blockchain technology on the healthcare ecosystem.

The main research question: What are the possibilities of using blockchain in the healthcare industry?

To approach the research question, we describe applications of blockchain in the
health industry based on customer needs (Sect. 3), followed by research into
blockchain technology and solutions (Sect. 4). In the discussion, we present examples
of several ICO launches and healthcare blockchain startups in practice.

2 Technology Investment Trends

Healthcare has the most aggressive deployment plans of any industry: 35% of
respondents in that industry say their company plans to deploy blockchain into
production within the next calendar year [3]. Many people would agree that it would
be great to use their insurance all over the world, with instant access to the best
healthcare professionals. Currently there are many difficulties connected with
insurance: long procedures, slow payments involving many intermediaries, and issues
of security and trust.
The Global Health Journal [4] published research on projects that implement
blockchain technology in healthcare. Currently there are over a thousand blockchain
startups and various open-source implementations, with dozens of blockchain
companies targeting healthcare applications.
According to an IBM survey, which involved 200 healthcare executives across
sixteen countries, approximately 16% admitted to taking a proactive approach in
adopting a commercial blockchain solution in 2017 [5].
Blockchain startups seek investment through initial coin offerings (ICOs), with tokens
sold to the public: the startup exchanges "utility" tokens for cash. The issued tokens
provide utility within the network and are traded on secondary exchanges.
ICOs and token launches are a growing method of blockchain financing, and investors
are proactively participating in them. Contracts can be signed remotely, and profits
from ICOs have been growing in recent years, with investors sometimes getting their
money back even if the ICO does not work out. Investors hope to turn a profit by
buying early access to potentially foundational blockchain protocols and applications,
just as early investors in Bitcoin and Ethereum did. For reference, a $100 investment
in bitcoin on January 1, 2011 would now be worth nearly $1.5 M. Over 250
blockchain teams have completed ICOs since January 2016, more than 55% of them
during or after July 2017. Cumulatively (since January 2016), the number of ICOs
was expected to surpass the number of equity deals in October 2017, emphasizing the
hype around this financing mechanism [6].
Robomed Network (https://robomed.io/) is currently launching an ICO to attract
$30 million for network deployment in Russia and across the globe. The Robomed
Network aims to dramatically change the healthcare environment and ecosystem by
applying smart contracts and a value-oriented approach to medical services. It
connects healthcare service providers and patients through a smart contract whose
value criteria are the performance metrics of a specific medical service and patient
satisfaction.
Another international blockchain healthcare provider, UBI (http://www.globalubi.com/index.aspx),
can be used for applications that record data about customer health and automatically
change tariffs depending on the client's behavior based on a smart contract; it has
already announced an ICO date.

3 Potential Applications of Blockchain in the Health Industry

3.1 Blockchain for Electronic Medical Records


In today's digital age, technology is at the core of all business and personal activities.
The rapidly evolving Internet of Medical Things (IoMT) has made it difficult for the
existing health IT infrastructure and architecture to support it effectively. It is
estimated that by 2020 the number of connected healthcare IoT devices will be
20–30 billion, up from 4.5 billion in 2015 [7]. Many big companies see great potential
in building the interface between healthcare and the mobile industry, creating
ecosystems and using devices. There has been a noticeable increase in the amount of
data generated about the health and lifestyle of consumers as the IoT enables more
medical device activity. Currently, healthcare organizations store large amounts of
sensitive patient information with no single approach to cybersecurity, which raises
concerns about interoperability, data privacy and fraud.
The EHR (Electronic Health Record) system is believed to be of great benefit to the
mobile health sector of the future. However, in practice, EHR implementation is com-
plex and expensive, and adoption on a global scale is low. Unlike the PHR, EHRs were
never designed to support multi-institutional, lifetime medical records. The concept behind
the PHR (Personal Health Record) is that medical records are stored by a third-party
provider so that they can be accessed, in whole or in part, by healthcare professionals as
and when needed. Mobile PHR systems represent the potential for significant changes in
how medical data are stored and used. PHRs also represent a change in the "ownership"
of health information - from the medical institution, or health authority, to the indi-
vidual, who is thereby empowered. Eventually, the argument goes, the "cure" is
replaced by continuous monitoring before any cure is needed [8].
630 S. Avdoshin and E. Pesotskaya

Certain difficulties arise in the establishment of an up-to-date healthcare system in
Russia as a number of barriers need to be broken down in order to ensure proper
communication between different stakeholders – connecting providers, physicians,
patients, clinics, government, etc. Patients nowadays have personal data distributed
among clinics, hospitals, labs and insurance companies. This ecosystem does not work
very well because there is no single list of all the places data can be found or the order
in which it was entered. Many Russian doctors don't want patients to access EHRs,
concerned that a patient with access to their entire medical history
may draw wrong conclusions about the state of their health. This means that
patients take a passive role in managing and tracking their health; the lack of
control and ownership leaves them feeling disappointed in their care. Patients
who don't find proper care become discontented and lose faith in medical professionals.
This in turn erodes trust in physicians, which is why less than
half (~34%) of patients trust medical professionals, compared to a rate of over 70%
50 years ago [9].
Concerns about the integrity and cybersecurity of patient data have always plagued
the healthcare industry. In 2016 alone, around 450 data breaches were reported
according to the Protenus Breach Barometer report. This impacted over 27 million
patients. The breaches were mostly caused by insiders: human error or theft of data
accounted for 43% of the breaches, whereas the rest were due to hacks, ransomware
or malware [10].
A solution would be a record management system that handles EHRs on top of
blockchain technology. It helps to guarantee data integrity and protect patient privacy
by handling access rights to a particular pool of data and ensuring that personal data
does not fall into the wrong hands. With blockchain, personal data does not have to be
stored centrally: everything remains on the client's device, and only its confirmation
(a cryptographic hash) is stored in the blockchain system.
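The pattern just described, keeping the record on the patient's device and writing only a confirmation to the chain, can be sketched in a few lines. The SHA-256 fingerprint below stands in for the on-chain confirmation; the function names are illustrative and not part of any real platform.

```python
import hashlib
import json

def record_fingerprint(record: dict) -> str:
    """Compute a deterministic SHA-256 fingerprint of a medical record.

    Only this fingerprint would be written to the blockchain; the
    record itself stays on the patient's device.
    """
    # Canonical serialization so the same record always hashes identically.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_record(record: dict, on_chain_fingerprint: str) -> bool:
    """Check a locally held record against its on-chain confirmation."""
    return record_fingerprint(record) == on_chain_fingerprint

record = {"patient": "p-001", "visit": "2018-01-20", "note": "routine check-up"}
fp = record_fingerprint(record)          # this value goes on chain
assert verify_record(record, fp)         # untampered record verifies
tampered = dict(record, note="altered")
assert not verify_record(tampered, fp)   # any change breaks verification
```

Any party holding the on-chain fingerprint can thus confirm a record's integrity without ever storing the record itself.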
Being decentralized, blockchain technology can ensure that data is stored
securely, in chronological order, on millions of servers and devices. This chronological
chain of activity is shared—everyone participating on the network can maintain a
complete activity history. Cryptography (encoding) is used to ensure that previously
verified data modifications are safe. The permissions for data access are also stored on
the blockchain, and a patient's data is only accessible to the party to whom access
was granted, even though the data is hosted in a decentralized manner. Every modifi-
cation of data is agreed to by the participants on the network according to the established
rules, and the data can be trusted without having to rely on a central authority such as a
financial organization or government.
With blockchain technology, patients are able to securely access and move their
medical records between different healthcare organizations. Whenever required, the
data from the various connected devices can be accessed instantly using the unique key
assigned to the medical professionals. During the visit of a new patient, the doctor can
consult the system and other specialists, get all the necessary information on the state of
the patient's health, and plan appropriate treatment. Such collaboration between patient
and doctor reduces the need to rely on intermediaries, the amount of time wasted in
waiting, and inconsistent treatment plans from different healthcare professionals.

All this improves patients' trust and satisfaction. For this reason, blockchain technology
has been referred to as a “trust machine” [11].
We can see a growth of decentralized health platforms with a portable, secure, and
self-sovereign personal health record (PHR) built on blockchain technology and
designed to drive healthy patient behavior through the security token. Usually a plat-
form provides access to patient-controlled health records, including medication, diag-
nosis, care plan, complex medical imaging, patient generated behavior data, key vital
signs generated outside of the clinic including weight, blood pressure, sleep, stress
levels, glucose, and more. The platforms pull information from electronic health record
systems, as well as from all personal sources of patient-generated data including the
web, mobile applications, and connected devices. Patients grant permissions for data
access via smart contracts embedded in the blockchain, with execution performed via
the application. The mobile app then allows users to create an individual profile
through which they can review their health information, connect with care providers or
even chat to patients with similar conditions. Platforms are designed to be fully
compatible with existing EMR systems, and work like an API. Hospitals and health
care providers usually are able to use the same equipment and technology with only a
minor change to their backend. Among the most popular platforms we can distinguish
MintHealth [https://www.minthealth.io/], HealthHeart [https://www.healthheart.io/],
Patientory [http://www.patientory.com/], MedRec by Media Lab [https://medrec.
media.mit.edu/] and many others. Doctors, health systems, health coaches, case man-
agers, family, and friends can gain access to the data via social modules embedded in
the applications, which serve to build awareness around chronic health
conditions via a patient-centered community. Of course, only patients can specify who
can access their health records.
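The patient-controlled permission model described above can be sketched as a small registry in which only the record owner may grant or revoke access. This is an illustrative in-memory stand-in for an on-chain smart contract; all class and method names are assumptions, not any platform's actual API.

```python
class AccessRegistry:
    """Toy stand-in for an on-chain permission contract: only the
    patient who owns a record can grant or revoke read access."""

    def __init__(self):
        self._owners = {}   # record_id -> patient id
        self._grants = {}   # record_id -> set of parties with read access

    def register(self, record_id: str, patient_id: str) -> None:
        """Register a record; the patient always has access to their own data."""
        self._owners[record_id] = patient_id
        self._grants[record_id] = {patient_id}

    def grant(self, record_id: str, caller: str, party: str) -> None:
        if caller != self._owners[record_id]:
            raise PermissionError("only the record owner can grant access")
        self._grants[record_id].add(party)

    def revoke(self, record_id: str, caller: str, party: str) -> None:
        if caller != self._owners[record_id]:
            raise PermissionError("only the record owner can revoke access")
        self._grants[record_id].discard(party)

    def can_read(self, record_id: str, party: str) -> bool:
        return party in self._grants.get(record_id, set())

registry = AccessRegistry()
registry.register("ehr-42", patient_id="alice")
registry.grant("ehr-42", caller="alice", party="dr-bob")
assert registry.can_read("ehr-42", "dr-bob")
registry.revoke("ehr-42", caller="alice", party="dr-bob")
assert not registry.can_read("ehr-42", "dr-bob")
```

On a real platform this logic would live in contract code and the grant/revoke calls would be signed transactions, but the ownership check is the essential rule.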
The advantages of using blockchain technologies apply to many participants within
the ecosystem:
• “Medical history right in the pocket” and direct access to healthcare for patients.
Patients get instant access to health information and the medical community to learn
more about treatment and therapy, and get 24/7 advisory services, training, education
and access to care plan information. The patient community and EHR can even be
referenced in an emergency or when travelling abroad, when quick access to medical
records is needed. Patients will also be able to search for care providers in a snap -
even abroad if needed - with information on where a specific treatment is done with
great care and without a long wait.
• “A data sharing platform for providing personalised medicine” - for healthcare
professionals. Doctors, health coaches and healthcare advisors get instant access to
medical history information, including complete notes from other medical organi-
zations. They can interact with patients more efficiently by leveraging a
proven clinical tool with built-in automation - a complete view of their patients’
history, including out-of-network encounters, prescription fills, and lifestyle infor-
mation - and can eliminate the administrative burden associated with medical record
transfers. Doctors can reach relevant patients, build an online reputation, and get access
to the latest technological possibilities.

• “Cost-saving” for Healthcare Organizations and Insurance companies. They save
costs on data gaps by using improved standards of care, involving the patient in
their care plan, providing medication reminders, appointment booking and tools to
track personal health that have a positive impact and improve clinical outcomes.
Having a more complete picture of a patient’s health condition, insurers and
healthcare organizations can create individual healthcare plans based on personal-
ized information and machine intelligence, saving costs, improving outcomes and
increasing productivity of medical services.
For example, if the client visited the doctor, the system will only hold a
document stating that the medical examination took place, while the diagnosis and the
medical history remain with the user. If the customer's data were verified during
the conclusion of a contract, they can send the confirmed identification data to other
companies to conclude new contracts without needing to repeat the verifi-
cation process. In addition, transparency and fairness of tariffs and the processing of
insured events can increase the client's motivation and interest.

3.2 Blockchain for Tracking and Tracing Medical Fraud


The identification of healthcare fraud is another application of block-
chain technology. It addresses patients' concern that healthcare representatives
and organizations may falsify personal healthcare records and prescriptions.
Regardless of whether your employer provides you with health insurance, or you
have taken out a policy for yourself, you can be at risk of fraud. This happens when a
person takes advantage of a patient either by inserting false diagnoses into their EHR,
or by exaggerating the conditions that they do have,
with the intention of submitting fraudulent insurance claims for payment.
Even if a person uses free medical care (which is common in Russia), funded by
the healthcare tax imposed on all registered employers (over 3%
of each employee's income), fraud means the waste of a healthcare budget that could be
allocated to higher-quality services, higher medical staff compensation, more afford-
able care services, etc. Blockchain takes control of the customer's healthcare record,
tracks all changes, and protects against mistakes and data mismatches.
Currently the workload for pharmacies, insurance companies, and doctors in ver-
ifying the correctness of prescriptions and reducing fraud and coincidental mistakes is
very high. Insurance companies suffer from fraud more often than other financial
institutions. Sometimes claims are denied because of incomplete or incorrect information.
Blockchain allows one to check the customer and every particular case with minimal
costs. Manipulation of claim assessments causes patients to suffer huge time delays and
loss of claims due to incomplete or ‘mismanaged’ records. A blockchain that connects
hospitals, physicians, lab vendors and insurers could enable a seamless flow of health
information for improved underwriting and validating of claims.
Among the benefits, insurance companies will need to spend less time checking
data, since they can trust the data presented to them - not only the data shared by the
patient, but also the notes provided by the medical professional. The burden of patient
losses will be reduced, as will the cost of

disputes; an insurance company becomes completely transparent and is able
to suggest a more personalized care plan based on accurate medical records.
EHR fraud and operational mistakes are not the only reasons for using blockchain
technology. Some participants can see the benefits to secure drug provenance, manage
inventories and provide an auditable drug trail. Drug production and distribution
involves many participants - manufacturers, distributors, wholesalers and pharmacies
who want to know the true source of the drug and track distribution from the factory
floor to the end user. A blockchain-based solution can help build such trust in
healthcare products and their supply chain. Manufacturers can record drug batches as
blockchain transactions tagged with a QR code revealing batch details. Records on a
blockchain cannot be modified; updates are made by writing a new version of the full
record to the blockchain, with all versions of the record remaining available. The drug
batch details are immutable once confirmed on the block-
chain. A single tracking identifier is established via a QR code across the distribution
chain. All downstream participants can trust a drug batch based on the scanned QR
code and use the same data to track further distribution; they can buy or sell the drug
post-verification using the QR code returned by the blockchain.
This greatly simplifies and streamlines distribution management, preventing
drugs from falling into the wrong hands and authenticating the drug for the end
consumer, which greatly reduces the possibility of counterfeiting, price manipulation and
delivery of expired drugs [12]. Another advantage of using blockchain in this scenario
is patient safety, as spurious drugs cannot enter the distribution chain.
The true source of the drug can be irrefutably proved, as manufactured batches are
recorded on a blockchain as a single source of truth available to all participants. Each
participant in a blockchain can verify the drug before it is purchased and after it is
received [13]. Within a few seconds, blockchain technology will allow patients to
check drugs for authenticity, learn the manufacturer, and track the history of their
movement through the delivery chain.
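The provenance flow just described can be sketched as an append-only event log keyed by the batch identifier carried in the QR code. The ledger here is a plain Python list standing in for blockchain transactions; the data model is an assumption for illustration only.

```python
class DrugLedger:
    """Append-only log of drug-batch custody events.

    The batch id plays the role of the QR-code identifier; every
    hand-off is recorded, and the full trail can be replayed by
    any participant before buying or selling the batch."""

    def __init__(self):
        self._events = []  # list of (batch_id, actor, action) tuples

    def register_batch(self, batch_id: str, manufacturer: str) -> None:
        """Only a manufacturer registration can start a trail."""
        self._events.append((batch_id, manufacturer, "manufactured"))

    def transfer(self, batch_id: str, actor: str, action: str) -> None:
        """Record a hand-off; unknown batches are rejected as suspect."""
        if not self.trail(batch_id):
            raise ValueError("unknown batch - possible counterfeit")
        self._events.append((batch_id, actor, action))

    def trail(self, batch_id: str) -> list:
        """Full provenance trail for a batch, oldest event first."""
        return [e for e in self._events if e[0] == batch_id]

ledger = DrugLedger()
ledger.register_batch("batch-001", "AcmePharma")
ledger.transfer("batch-001", "WholesaleCo", "received by wholesaler")
ledger.transfer("batch-001", "CornerPharmacy", "received by pharmacy")
assert len(ledger.trail("batch-001")) == 3
```

Because unregistered batch ids have no trail, a transfer of a counterfeit batch fails immediately, which is the property the QR-code check relies on.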

3.3 Blockchain for Artificial Intelligence


Artificial Intelligence (AI) in the health sector uses algorithms and software to simulate
human abilities in the analysis of complex medical data. A huge amount of medical
data pushes the development of applications with AI, although it should be noted that
AI has not yet reached the full potential for the healthcare industry, as this requires a
large and diverse range of data to ensure accuracy and effective results.
Blockchain technology allows the creation of a platform where patients can discuss their
medical data with an advanced artificial intelligence "doctor". This functionality might
help healthcare providers and medical companies to provide services that allow
their patients to have personalized (based on health data) AI-powered conversations
about their health. It will also improve patient care and experience through an
advanced natural dialogue system able to generate insights from combined
medical data [14].
With artificial intelligence, healthcare specialists and primary care physicians are
able to quickly diagnose a patient using a given system, taking into consideration what
treatment has worked in the past for similar diseases (leveraging all of the medical data,

e.g. the blood tests, MRI results, X-Rays, echocardiograms, etc.) and how it has
worked. This principle can be applied to diagnosing illnesses as well.
Whatever can be converted into alphanumeric data will be input into the AI
neural network. This enables the system to be trained to assist medical professionals,
helping them to diagnose conditions quickly and recommend treatment plans based on
an individual's personal medical profile and their symptoms. An artificial intelligence
platform can be launched on the blockchain that is able to predict and diagnose ail-
ments based on a vast database of previous diagnostic histories and the results of
medical examinations. Patients will be able to approve their data for this use,
while doctors will be able to quickly narrow down options for diagnosis and treatment
with the help of an intelligent platform drawing on patient data from all around the
globe, as the MediBond platform [15] announces its intention of doing. The more par-
ticipants there are, the greater the value of the network.

3.4 Blockchain for Secure and Guaranteed Payments


Blockchain technology helps to create an ecosystem through smart contracts and digital
currency, so that all participants - patients, doctors, healthcare providers, researchers
and medical institutions - are financially motivated and secured. In this context "smart"
means "without intermediaries" - e.g. banks, financial organizations, insurance
companies or brokers. A smart contract is also "technically executed": without
execution there is no payment. Smart contracts are written to execute given conditions,
eliminating the risk of relying on someone else to follow through on their commitments.
This is particularly important for value-based healthcare, in which payments are tied to
outcomes. For convenience, the agreement and the patient's signature can be digital.
The patient pays for medical services - visits, consultations, tests, etc. - with tokens
(cryptocurrency).
The distributed nature of blockchain technology makes it possible to accept pay-
ments and pay healthcare providers for their contribution globally. This mechanism
avoids complicated legal and accounting procedures supported by dedicated specialists
charging fees for their services. This method of payment makes it possible for any
individual, no matter where they are in the world, to purchase services without the need
to pay additional charges related to processing credit card transactions. The protection
of patients' rights is assured without the need to involve additional third parties, such
as expensive lawyers, or entities to ensure that the correct treatment has been pre-
scribed. Once the conditions of the smart contract have been met, the payment will
automatically be taken from the patient's account and deposited into the service
provider's account.
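The pay-on-outcome mechanism can be sketched as a toy escrow: the patient deposits tokens up front, and the provider is paid only once the agreed condition is confirmed. This is plain Python illustrating the smart-contract logic, not real smart-contract code; all names and the confirmation rule are assumptions.

```python
class TreatmentContract:
    """Toy escrow modelling a value-based-care smart contract:
    payment is released to the provider only when the agreed
    outcome has been confirmed and the contract is fully funded."""

    def __init__(self, patient: str, provider: str, price_tokens: int):
        self.patient = patient
        self.provider = provider
        self.price = price_tokens
        self.escrow = 0          # tokens locked by the patient
        self.paid = False

    def deposit(self, amount: int) -> None:
        """Patient locks tokens into the contract before treatment."""
        self.escrow += amount

    def confirm_outcome(self) -> int:
        """Called when the predefined outcome condition is met;
        releases the payment to the provider automatically."""
        if self.escrow < self.price:
            raise ValueError("contract is not fully funded")
        self.escrow -= self.price
        self.paid = True
        return self.price        # tokens transferred to the provider

contract = TreatmentContract("alice", "city-clinic", price_tokens=100)
contract.deposit(100)
assert contract.confirm_outcome() == 100
assert contract.paid and contract.escrow == 0
```

On a real platform, `confirm_outcome` would be triggered by an on-chain condition (e.g. an insurer's attestation) rather than a direct call, but the funded-then-released flow is the same.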
Smart contracts offer several advantages: they are a reliable and transparent payout
mechanism for the customer that enables automation of claims handling and can be
used to enforce contract-specific terms. This means that in the case of illness or an
accident, a smart contract can ensure that the claim is only paid out if the patient
recovers and has received full treatment in the hospital predefined by the
insurer. Although such programs could also be implemented without blockchain, a
blockchain-based smart contract platform could provide substantial network effects - an
increased degree of transparency and credibility for customers due to decentralization.

Smart contracts offer a great benefit to insurance companies, as their business
depends directly on data that is available to the insurance specialist, and this data needs
to be reliable and trustworthy. Insurance contracts are usually complicated and hard to
understand for the majority of people, as they contain legal terminology. Smart con-
tracts help to make the insurance industry more transparent and friendly to both current
and potential clients.

3.5 Blockchain for Medical Research


Blockchain technology enables research and discovery. With smart contracts, it
becomes possible to reward healthcare content creators in proportion to how everyday
visitors perceive their content (e.g. “likes” that get recorded). Moreover, rewards are an
additional push for medical professionals to sign up to have a free mobile-friendly
online profile.
Healthcare companies can use the blockchain-based platform to reach potential
clinical trial participants who fit a certain medical history or care plan. The traditional
amount of time and effort required to source such participants is greatly reduced, as
well as the dependency on health systems to act as intermediaries. Additionally, the use
of such blockchain-based systems facilitates longitudinal tracking of trial participants.
Most importantly, this also reduces the risks and increases the effi-
ciency of these trials by tailoring participation to specific
health or genomic profiles.
Sometimes medical researchers mine the network as the healthcare community
(patients, doctors) release access to aggregate, anonymous medical data as transaction
“fees” that become mining rewards. In some blockchain research platforms (e.g.
MedRec) researchers can influence the metadata rewards that providers release by
selectively choosing which transactions to mine and validate. Providers are then
incentivized to match what researchers are willing to accept, within the boundaries of
proper privacy preservation. Patients and providers can limit how much of their data is
included in the available mining bounties.
This approach helps engage participants in health research, facilitates collaboration,
fosters an environment of fast-paced learning in the search for better treatment options and
cures for patients, and enables the creation of new communities of individuals who have
a desire to connect with others who share a similar condition, learn about treatment
options, share their experiences, and participate in research. For example, in the Bur-
stIQ platform, individuals can browse the marketplace and make a request to participate
in a research initiative or patient community. Additionally, individuals have the option
to donate or sell their data to a research initiative or population data repository [16].
Among the advantages we can also mention a deep learning environment that con-
tinuously expands the knowledge of an individual to improve relevance and impact.
Researchers can find and access the people and data they need to support their research,
and collaborate with other researchers to explore new ideas. They are able to connect
directly with the right participants, reducing the cost and time-scale of both academic
and commercial health research.

4 Blockchain Solutions

Blockchain is a digital platform that stores and verifies the entire history of transactions
between users across the network in a tamper-proof manner. Transactions between users
or counterparties are broadcast across the network, verified by cryptographic
algorithms, and grouped into blocks. At the moment, several competing protocols
exist, along with a handful of proprietary middleware and application development
suites for each protocol. They differ in permissions, functionality, access rights and
decision-making processes inside the network. The terminology around blockchain is still
confusing. In different sources we can find different definitions of blockchain, and its
classification. In this paper we will distinguish between public and private blockchain,
as well as between permissionless blockchain and permissioned (exclusive) blockchain.
Each public blockchain can be inspected by anyone, whereas private blockchains
can only be inspected by computers that have been granted access rights. Some of the
solutions use an approach that involves tracking data modifications on a private
blockchain and recording hashes of these changes on a public blockchain. In this
approach, the public blockchain effectively serves as a notary for data modifications by
verifying that they occurred and at what time [17].
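The notary pattern described here can be sketched in a few lines: only a hash of each private-chain change, plus a timestamp, is written to the public chain, which can later attest that the change existed without ever seeing the data. The `public_ledger` list below stands in for the public blockchain; everything else is illustrative.

```python
import hashlib
import time

public_ledger = []  # stand-in for the public blockchain acting as notary

def notarize(private_change: bytes) -> dict:
    """Record only the hash (and time) of a private-chain modification
    on the public ledger; the data itself never leaves the private network."""
    entry = {"hash": hashlib.sha256(private_change).hexdigest(),
             "timestamp": time.time()}
    public_ledger.append(entry)
    return entry

def attest(private_change: bytes) -> bool:
    """Verify that a given change was notarized at some earlier time."""
    digest = hashlib.sha256(private_change).hexdigest()
    return any(entry["hash"] == digest for entry in public_ledger)

notarize(b"record 42 updated: blood pressure 120/80")
assert attest(b"record 42 updated: blood pressure 120/80")
assert not attest(b"record 42 updated: blood pressure 140/90")
```

The design choice is that the public chain stores nothing sensitive, yet any later dispute about whether (and when) a change occurred can be settled against it.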
The majority of blockchain solutions were inspired by Bitcoin's (https://bitcoin.org/)
original protocol, launched in 2009, which aimed to provide an alternative to the formal
financial system and introduced the blockchain data structure, in which every modi-
fication of data on a network is recorded as part of a block of other data modifications that
share the same timestamp.
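This data structure, blocks of modifications sharing a timestamp and chained to their predecessors by hash, can be sketched minimally (a toy model for illustration, not Bitcoin's actual block format):

```python
import hashlib
import json
import time

def make_block(modifications: list, prev_hash: str) -> dict:
    """Bundle data modifications that share a timestamp into a block
    linked to the previous block through its hash."""
    header = {"timestamp": time.time(),
              "modifications": modifications,
              "prev_hash": prev_hash}
    payload = json.dumps(header, sort_keys=True).encode("utf-8")
    return {**header, "hash": hashlib.sha256(payload).hexdigest()}

def chain_is_valid(chain: list) -> bool:
    """Each block must reference the hash of the block before it."""
    return all(chain[i]["prev_hash"] == chain[i - 1]["hash"]
               for i in range(1, len(chain)))

genesis = make_block(["genesis"], prev_hash="0" * 64)
block1 = make_block(["prescription filled"], prev_hash=genesis["hash"])
assert chain_is_valid([genesis, block1])
# Tampering with an earlier block breaks the chain immediately.
genesis["hash"] = "f" * 64
assert not chain_is_valid([genesis, block1])
```

The hash link is what makes retroactive modification detectable: changing any earlier block invalidates every block that follows it.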
The Bitcoin blockchain is a public permissionless network where participants are able to
access the database, store a copy, and modify it by making their computing power
available. As a public network, Bitcoin offers an open, permissionless invitation for anyone to
join. If the dominant requirement is a trust mechanism between strangers who know
nothing about each other, then a public network may be the way to go. For digital or
crypto-currencies such as bitcoin, this acts as a catalyst for driving greater adoption
globally, enabling more people to make purchases with these currencies [18]. The most
notable non-Bitcoin public blockchain is Ethereum (https://www.ethereum.org/), which
was created in 2014. Like Bitcoin, Ethereum is also permissionless, runs on a public peer-
to-peer (P2P) network, utilizes a cryptocurrency ("ether"), and stores information in
blocks. Compared to Bitcoin, which was designed solely to store information about
transactions, Ethereum is a programmable blockchain that also allows users to deploy self-
executing computer scripts and has much broader functionality. It provides a built-in
programming language and an open-ended platform that allows users to create
decentralized applications of unlimited variety. While distributing computing across a
P2P network necessarily results in slower and more expensive computation than nor-
mal, it also creates a database that is agreed to by consensus, available to all partici-
pants simultaneously, and permanent, all of which are useful when trust is a primary
concern.
Bitcoin and Ethereum are both public, permissionless blockchains, which anyone
with the appropriate technology can access and contribute to. Companies use these
open-ended platforms to build their customized solutions. For instance, HealthHeart’s

platform (https://www.healthheart.io/) uses Ethereum functionality to assign
unique addresses to patients, medical care providers, organizations, etc., restricts
access to a patient's addresses, and links them to the full history of transactions for a
given identity on the blockchain, thus creating an audit trail for all events within a
medical record. It supports reviews of past transactions by consumers, providers and
third-party entities that have been granted access, and facilitates the connection between
the consumer and the care provider.
Public blockchains offer maximum transparency, and their main goal is to prevent the
concentration of power. However, many private firms are uncomfortable relying on
public blockchains as a platform for their business operations due to concerns about
privacy, governance, and performance. For instance, within the banking industry
organizations prefer to transact only with trusted peers.
For this reason, IBM (https://www.ibm.com) has invested significant resources in
helping the Linux Foundation design an open-source modular blockchain platform
called Hyperledger Fabric (https://www.hyperledger.org), which provides programmers
with a "blockchain builder's kit" and allows them to tailor all elements of a ledger
solution, including the choice of consensus algorithm, whether and how to use
smart contracts, and the level of permissions required. It is a permissioned
network that provides collectively defined membership and access rights within a
given business network. Fabric is designed for organizations that need to meet confi-
dential obligations to each other without passing everything through a central authority,
while ensuring confidentiality, scalability and security.
Also a number of startups, including Ripple (https://ripple.com/) and the R3
Consortium (https://www.r3.com/), a group of more than 70 of the world’s largest
financial institutions that focuses on developing blockchain permissioned solutions for
the industry, have developed platforms that run on private or permissioned networks on
which only verified parties can participate [19].
Consortium blockchains are usually open to the public but not all data is available
to all participants, while private blockchains provide another type of permission and
access rights to users. In private networks a central authority manages the rights to
access or modify the database. The system can be easily incorporated within infor-
mation systems and offers the added benefit of an encrypted audit trail. In private
blockchains, the network has no need to encourage miners to use their computing
power to run the validation algorithms.

5 Conclusion

Blockchain technology is gradually becoming very popular. The benefits of blockchain
are enormous, from decentralization, to security and scalability, to privacy and
affordability. Both health professionals and organizations will be able to work faster
and more efficiently as the information available becomes more accessible, safe and
trustworthy. Professionals in the industry who are given open access to this reliable
information would be able to predict future trends and keep track of pharmaceutical
inventories, amongst other things. As a result, the general population would have
improved health and a higher quality of life.

Still, there are huge barriers to blockchain adoption, such as regulatory issues
(45%), followed by concerns over data privacy (26%) [20]. Russia, in particular,
does not yet have the required regulatory base and needs to provide targeted
government-backed funding with a specific focus on remote medical services and their
integration into existing healthcare programs. A major issue with data processing lies in
the fact that patient information is stored in different places and is lost or
concealed through the fault of the patient or the doctor, while there are no personalized
analytics. The regulatory concerns are linked to a decentralized infrastructure that can't
be controlled by any person or group. Not everyone approaches blockchain
positively either - there is an opinion that blockchain technology is relatively new, that its
business advantages are unproven, and that it requires non-trivial computing infrastructure
changes, though this is not completely accurate. Many startups have
already demonstrated that blockchain technology has a positive effect on the cost of
provided services and positively influences the delivery of care and the collaboration
between different interested parties. Nevertheless, in order to maintain regulation
compliant with global health standards, it is necessary to establish a consistent
compliance framework and implementation through standardized pro-
cesses and interoperability. Not only do standards need to be in place, but there
should also be a level of confidence and motivation from people before any organization
can adopt new blockchain technology.
For future work, the authors intend to improve this review paper with innovative
research and enrich it with more quantitative data. A framework for the analysis of existing
ICOs and solutions, supported by a case study, can be initiated. This framework would help to
evaluate and predict the effects of different blockchain projects in healthcare. A set of
criteria should be developed, and KPI measurement metrics and a validation model
should be identified to choose the most trusted provider by looking at the different
perspectives in the framework.

References
1. Espinel, V., Brynjolfsson, E., Annunziata, M.: Global Agenda Council on the Future of
Software & Society. Deep Shift: Technology Tipping Points and Societal Impact. World
Economic Forum Homepage. http://www3.weforum.org/docs/WEF_GAC15_Technological_
Tipping_Points_report_2015.pdf. Accessed 20 Jan 2018
2. 2017 global health care sector outlook. Deloitte Homepage. https://www2.deloitte.com/
content/dam/Deloitte/global/Documents/Life-Sciences-Health-Care/gx-lshc-2017-health-
care-outlook-infographic.pdf. Accessed 20 Jan 2018
3. Schatsky, D., Piscini, E.: Deloitte survey: blockchain reaches beyond financial services with
some industries moving faster. Deloitte Homepage. https://www2.deloitte.com/us/en/pages/
about-deloitte/articles/press-releases/deloitte-survey-blockchain-reaches-beyond-financial-
services-with-some-industries-moving-faster.html. Accessed 20 Jan 2018
4. Till, B., Peters, A., Afshar, S., Meara, J.: From blockchain technology to global health
equity: can cryptocurrencies finance universal health coverage? BMJ Global Health
Homepage. http://gh.bmj.com/content/2/4/e000570. Accessed 20 Jan 2018
Blockchain Revolution in the Healthcare Industry 639

5. Hogan, S., Fraser, H., Korsten, P., Pureswaran, V., Gopinath R.: Healthcare rallies for
blockchain: keeping Patients at the center. IBM Corporation Homepage. https://www-01.
ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=GBE03790USEN&. Accessed 20 Jan 2018
6. Blockchain Investment Trends in Review. CBInsights Homepage. https://www.cbinsights.
com/research/report/blockchain-trends-opportunities/. Accessed 20 Jan 2018
7. Internet of Medical Things, Forecast to 2021. Reportlinker Homepage. http://www.
prnewswire.com/news-releases/internet-of-medical-things-forecast-to-2021-300474906.html.
Accessed 20 Jan 2018
8. Avdoshin, S., Pesotskaya, E.: Mobile healthcare: perspectives in Russia. Bus. Inform. 3(37),
7–13 (2016)
9. Embrace Disruptive Medical Technologies. The Medical Futurist Homepage. http://
medicalfuturist.com/grand-challenges/disruptive-medical-technology/. Accessed 20 Jan
2018
10. Protenus Releases 2016 Healthcare Data Breach Report. HIPAA Journal Homepage. https://
www.hipaajournal.com/protenus-releases-2016-healthcare-data-breach-report-8656.
Accessed 20 Jan 2018
11. Katz, D.: The Trust Machine. The Economist Homepage. https://www.economist.com/news/
leaders/21677198-technology-behind-bitcoin-could-transform-how-economy-works-trust-
machine. Accessed 20 Jan 2018
12. Gilbert, D.: Blockchain Technology Could Help Solve $75 billion Counterfeit Drug
Problem. International Business Times Homepage. http://www.ibtimes.com/blockchain-
technology-could-help-solve-75-billion-counterfeit-drug-problem-2355984. Accessed 20
Jan 2018
13. Chowdhury, C., Krishnamurthy, R., Ranganathan, V.: Blockchain: A Catalyst for the Next
Wave of Progress in Life Sciences. Cognizant Homepage. https://www.cognizant.com/
whitepapers/blockchain-a-catalyst-for-the-next-wave-of-progress-in-the-life-sciences-
industry-codex2749.pdf. Accessed 20 Jan 2018
14. Vitaris, B.: The Next Doctor You Consult Could Be a Robot: Healthcare Meets AI and the
Blockchain. Bitcoin Magazine Homepage. https://bitcoinmagazine.com/articles/next-doctor-
you-consult-could-be-robot-healthcare-meets-ai-and-blockchain/. Accessed 20 Jan 2018
15. Steffens, B., Billot, J., Marques, A., Gawas, D., Harmalkar, O.: Facilitate health care on
block chain. MediBond Homepage. https://medibond.io/doc/medibond_whitepaper.pdf.
Accessed 20 Jan 2018
16. Ricotta, F., Jackson, B., Tyson, H., et al.: Bringing Health to Life. BurstIq Homepage.
https://www.burstiq.com/wp-content/uploads/2017/09/BurstIQ-whitepaper_07Sep2017.pdf.
Accessed 20 Jan 2018
17. Pisa, M., Juden, M.: Blockchain and Economic Development: Hype vs. Reality. Center for
Global Development Homepage. https://www.cgdev.org/sites/default/files/blockchain-and-
economic-development-hype-vs-reality_0.pdf. Accessed 20 Jan 2018
18. Vaidyanathan, N.: Divided we fall, distributed we stand. The Association of Chartered
Certified Accountants (ACCA) Homepage. http://www.accaglobal.com/lk/en/technical-
activities/technical-resources-search/2017/april/divided-we-fall-distributed-we-stand.html.
Accessed 20 Jan 2018
19. Adam-Kalfon, P., El Moutaouakil, S.: Blockchain, a catalyst for new approaches in
insurance. PwC Homepage. https://www.pwc.com.au/publications/pwc-blockchain.pdf.
Accessed 20 Jan 2018
20. Strachan, J.: Pharma Backs Blockchain. The Medicine Maker Homepage. https://
themedicinemaker.com/issues/0717/pharma-backs-blockchain/. Accessed 20 Jan 2018
Effective Reversible Data Hiding
in Electrocardiogram Based
on Fast Discrete Cosine Transform

Ching-Yu Yang1,2(✉), Lian-Ta Cheng1,2, and Wen-Fong Wang1,3

1 Department of Computer Science and Information Engineering, National Penghu
University of Science and Technology, Magong, Penghu, Taiwan
chingyu@gms.npu.edu.tw
2 National Penghu University of Science and Technology, Magong, Taiwan
3 National Yunlin University of Science and Technology, Douliu, Yunlin, Taiwan

Abstract. Based on the fast discrete cosine transform (FDCT), the authors
present an effective reversible data hiding method for electrocardiogram
(ECG) signals. First, the input ECG data are transformed into a series of non-
overlapping bundles by the one-dimensional (1-D) FDCT. The FDCT bundles are
subsequently partitioned into two disjoint subsets according to a simple classi-
fication rule. Then, two data-bit segments of different lengths are embedded
separately in selected coefficients of the classified bundles via the least
significant bit (LSB) technique. Simulations confirmed that the hidden message
can be extracted without distortion while the original ECG signal is fully
recovered. In addition, the perceived quality of the proposed method is good,
while the hiding capacity is superior to existing techniques. Since the
computational complexity is low, the proposed method is suitable for real-time
applications and for deployment in healthcare (or wearable) devices.

Keywords: Data hiding · Reversible ECG steganography · Fast discrete cosine transform (FDCT) · LSB technique

1 Introduction

With the maturity of artificial intelligence algorithms, the popularization of the Internet
of Things, and the flexible use of big data, people and organizations can easily use
diverse Internet services such as the World Wide Web, e-mail, e-commerce, online news,
and social networking. However, if important (or confidential) data are not handled
properly, crucial resources may be compromised: the content of a message could be
intercepted, eavesdropped on, or forged by adversaries (or hackers) during transmission.
One economical way to protect (or secure) information assets is the use of data hiding
techniques. In general, data hiding can be divided into two categories: steganography
and digital watermarking [1, 2]. The applications of the two approaches are quite
different. The main aim of steganographic methods [3, 4] is to conceal secret bits in host media

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 640–648, 2019.
https://doi.org/10.1007/978-3-030-02686-8_48
while maintaining an acceptable perceptual quality, whereas digital watermarking [5, 6]
primarily tries to achieve robustness under a limited hiding payload.
To secure patients' diagnostic data such as blood pressure, blood glucose level and body
temperature, as well as sensitive information such as name, ID number, address and
patient history, some researchers have developed data hiding methods for biometric
signals such as the electrocardiogram (ECG) or electromyogram (EMG). However, most ECG
steganography methods [7, 8] are incapable of restoring the original ECG signal after
extraction of the hidden message. As host biometric signals are valuable to hospitals
and individuals, it is undesirable for the host data to be damaged after bit extraction.
To completely recover the original host and successfully extract the hidden
message at the receiver site, several authors have designed reversible ECG steganog-
raphy to achieve this goal [9, 10]. Yang and Wang [9] presented two types of data
hiding methods for ECG signals, namely lossy and reversible ECG steganography.
To preserve the originality of the host ECG data, a reversible version of data hiding for
ECG signals was proposed. By employing the mean-prediction technique and coefficient
alignment, data bits were embedded in predefined bundles of the host ECG.
Simulations revealed that the hidden bits were extracted successfully while the original
ECG signal was restored completely. The average payload of the method was
44.07 Kb with a signal-to-noise ratio (SNR) of 34.78 dB. Based on the Hamming
code and matrix coding techniques, Shiu et al. [10] suggested a reversible data hiding
method for ECG and EMG signals. Simulations indicated that the hiding capacity of
their method was larger than those of existing techniques, but the average SNR was
only 17.99 dB. The perceived quality of the marked ECG signal was severely distorted,
rendering it of no use for clinical diagnosis.
In this article, we propose a simple but effective reversible ECG steganography
method capable of providing high hiding capacity with good perceptual quality. The
remainder of this paper is organized as follows. Section 2 specifies the bit-embedding/
extraction procedure, plus overhead analysis and discussion. Section 3 presents the
experimental results, and Sect. 4 provides the conclusion.

2 Proposed Method

First, the ECG host is transformed into a series of non-overlapping bundles via the FDCT
[11–13]. The FDCT bundles are subsequently partitioned into two disjoint subsets
according to a simple classification rule. Then, two data-bit segments of different
lengths are embedded separately in the target coefficients of the classified bundles.
The details of bit embedding/extraction are specified in the following sections.

2.1 Bit Embedding

Let $A_j$ be the $j$th bundle of size $1 \times n$ derived from the host ECG, and let
$H_j = \{s_{ji}\}_{i=0}^{n-1}$ be the $j$th non-overlapping bundle of 1-D FDCT
coefficients, obtained by applying the FDCT to $A_j$ with $n = 8$, as shown in Fig. 1.
The FDCT bundles are represented by $I = \{H_j \mid j = 1, 2, \ldots, |I|\}$, with
$H_j = \lfloor 10 \cdot A_j X \rfloor$, where $X$ is a predetermined $8 \times 8$
matrix, as shown in (1). [Note that
Fig. 1. Bundle of size 8.

to ensure that reversible ECG steganography is achievable, the values of $s_{ji}$ in
$H_j$ are obtained by applying the floor function to the product of 10 and $A_j X$.]
$$
X = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
\frac{3}{2} & \frac{5}{4} & \frac{3}{4} & \frac{3}{8} & -\frac{3}{8} & -\frac{3}{4} & -\frac{5}{4} & -\frac{3}{2} \\
1 & \frac{1}{2} & -\frac{1}{2} & -1 & -1 & -\frac{1}{2} & \frac{1}{2} & 1 \\
\frac{5}{4} & -\frac{3}{8} & -\frac{3}{2} & -\frac{3}{4} & \frac{3}{4} & \frac{3}{2} & \frac{3}{8} & -\frac{5}{4} \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
\frac{3}{4} & -\frac{3}{2} & \frac{3}{8} & \frac{5}{4} & -\frac{5}{4} & -\frac{3}{8} & \frac{3}{2} & -\frac{3}{4} \\
\frac{1}{2} & -1 & 1 & -\frac{1}{2} & -\frac{1}{2} & 1 & -1 & \frac{1}{2} \\
\frac{3}{8} & -\frac{3}{4} & \frac{5}{4} & -\frac{3}{2} & \frac{3}{2} & -\frac{5}{4} & \frac{3}{4} & -\frac{3}{8}
\end{bmatrix} \qquad (1)
$$

The main procedure of bit embedding of the proposed method is specified in the
following algorithm.
Algorithm 1. Hiding a secret message in an ECG host.
Input: Host ECG data E, scrambled secret message W, and control parameter μ.
Output: Marked ECG data Ẽ and bitmap Ψ.
Method:
Step 0. Perform the forward FDCT on E to obtain the 1-D FDCT bundles I.
Step 1. Input a bundle H_j from I. If the end of input is encountered, proceed
to Step 5.
Step 2. Compute the average T of the absolute coefficient values of H_j. If T ≤ μ,
mark this bundle with bit "0"; otherwise, mark it with bit "1", and save the mark
in the bitmap Ψ.
Step 3. If the bundle is marked "0", take three (and two) data bits from W each time
and embed them in the coefficients {s_ji} for i = 0, …, 3 (and i = 4, …, n − 2) by
the LSB technique, respectively, and return to Step 1.
Step 4. If the bundle is marked "1", take two data bits from W each time and embed
them in the coefficients {s_ji} for i = 0, …, 3 by the LSB technique, and return
to Step 1.
Step 5. Perform the inverse FDCT on the marked bundles to form the marked ECG data Ẽ.
Step 6. Stop.
To alleviate distortion and obtain better hiding capability during the encoding
phase, two data-bit segments of different lengths are employed separately at Steps 3–4.
Namely, each time (3 × 4) + (2 × 3) = 18 bits and 2 × 4 = 8 bits are embedded in
the two classes of bundles, respectively.
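The classification and LSB embedding of Steps 2–4 can be sketched as follows. This is an illustrative reading of the algorithm, not the authors' implementation; the `take` helper and the sign handling of negative coefficients are our assumptions.

```python
import numpy as np

def replace_lsbs(coeff, bits):
    """Overwrite the k least significant bits of an integer coefficient's
    magnitude with the bit string `bits` (LSB technique); the sign is kept."""
    k = len(bits)
    sign = -1 if coeff < 0 else 1
    mag = abs(int(coeff))
    return sign * ((mag >> k << k) | int(bits, 2))

def embed_bundle(h, bitstream, mu=9):
    """Embed part of `bitstream` (an iterator of '0'/'1' characters) into one
    FDCT bundle h = [s_j0, ..., s_j7]; returns the marked bundle and the
    bitmap bit recording the bundle's class."""
    take = lambda k: ''.join(next(bitstream) for _ in range(k))
    h = list(h)
    flag = 0 if np.mean(np.abs(h)) <= mu else 1   # T <= mu -> class "0"
    if flag == 0:                 # smooth bundle: 3*4 + 2*3 = 18 bits
        for i in range(4):
            h[i] = replace_lsbs(h[i], take(3))    # s_j0..s_j3: 3 bits each
        for i in range(4, 7):
            h[i] = replace_lsbs(h[i], take(2))    # s_j4..s_j6: 2 bits each
    else:                         # steep bundle: 2*4 = 8 bits
        for i in range(4):
            h[i] = replace_lsbs(h[i], take(2))
    return h, flag
```

Note that the DC-like coefficient s_j7 is never touched, in line with the discussion in Sect. 2.3.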

2.2 Bit Extraction


The decoding part of the proposed method is summarized here.
Algorithm 2. Extracting the hidden message from the marked ECG data and restoring the
original ECG host.
Input: Marked ECG data Ẽ, the control parameter μ, and the bitmap Ψ.
Output: The secret message W and the host ECG data E.
Method:
Step 0. Perform the forward FDCT on Ẽ to obtain the 1-D FDCT bundles Î, and read in
the bitmap Ψ.
Step 1. Input a bundle Ĥ_l derived from Î. If the end of input is encountered, proceed
to Step 4.
Step 2. If the bitmap bit is "0", extract eighteen hidden bits from the coefficients
{s_ji} for i = 0, …, n − 2, restore the host bundle, and go to Step 1.
Step 3. If the bitmap bit is "1", extract eight hidden bits from the coefficients
{s_ji} for i = 0, …, 3, restore the host bundle, and go to Step 1.
Step 4. Descramble and assemble all extracted bits, and perform the inverse FDCT on
Î to restore the original ECG data. Notice that the marked ECG data Ẽ was obtained
by performing the inverse FDCT on the marked bundles (Algorithm 1, Step 5).
Step 5. Stop.
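The bit-extraction part of Algorithm 2 mirrors the embedding and can be sketched as below. This hypothetical helper only recovers the hidden bits guided by the bitmap flag; restoring the original host samples is the paper's reversibility mechanism and is not reproduced here.

```python
def read_lsbs(coeff, k):
    """Read the k least significant bits of an integer coefficient's magnitude."""
    return format(abs(int(coeff)) & ((1 << k) - 1), '0{}b'.format(k))

def extract_bundle(h, flag):
    """Mirror of the embedding: pull 18 bits from a class-'0' bundle
    (3 bits from s_j0..s_j3, 2 bits from s_j4..s_j6) or 8 bits from a
    class-'1' bundle (2 bits from s_j0..s_j3)."""
    bits = []
    if flag == 0:
        bits += [read_lsbs(h[i], 3) for i in range(4)]
        bits += [read_lsbs(h[i], 2) for i in range(4, 7)]
    else:
        bits += [read_lsbs(h[i], 2) for i in range(4)]
    return ''.join(bits)
```

Because the embedding overwrites exactly the bits that extraction reads, the hidden message round-trips without loss.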

2.3 Overhead Analysis and Discussion


From Algorithm 1 we can see that one bit is required to record the attribute of each
FDCT bundle in the bitmap Ψ. The auxiliary information (overhead) of the proposed
method is therefore $O_h = |I|$. For example, if the size of the input host ECG is
30,000 and the bundle size is 8, the overhead of the proposed method is
$O_h$ = 30,000/8 = 3,750 bits. Note also that the overflow issue is avoided during
the encoding process.
In general, the value of the coefficient $s_{j(n-1)}$ is significantly larger than
those of the remaining coefficients of $H_j$ after the FDCT operation. The role of
$s_{j(n-1)}$ is similar to that of the DC coefficient in the conventional DCT domain;
if data bits were embedded in this coefficient, severe distortion would be introduced
during encoding. Therefore, the proposed method embeds secret bits only in the
remaining coefficients $s_{j0}, \ldots, s_{j(n-2)}$ of $H_j$.
3 Experimental Results

The simulations of the proposed method were implemented in Matlab (R2015b) on a
Microsoft Windows 10 laptop with an Intel Core i5-6300U 2.4 GHz CPU and 8 GB RAM.
The host ECG signals were derived from the MIT-BIH arrhythmia database [14]. Several
host ECG data sets were utilized in our experiments; the size of each test set was
30,000. The average execution time of the proposed method was 0.125 s. The
relationship between the average SNR/PRD and the net payload of the proposed method
for various values of the mean threshold μ is drawn in Fig. 2. It can be seen that
the lower the value of μ, the larger the SNR and the smaller the hiding capacity,
and vice versa. To achieve the desired net payload, SNR, and perceived quality, the
value of μ was set to 9. Table 1 indicates the net payload, SNR, and PRD of the
proposed method using μ = 9. The average SNR/PRD of the proposed method is
40.74 dB/0.0093 with an average net payload of 45.80 Kb. In addition, the
relationship between the average SNR and the net payload of the proposed method
using five different inputs with various μ is depicted in Fig. 3. From the figure we
can see that ECG100 has the best performance among all the input data. The hiding
performance of ECG102 won second place, followed by ECG101, ECG103, and ECG104. One
of the main reasons ECG104 ranked last is that it contains more steep areas (or
drastic variations) than smooth ones, meaning that the corresponding coefficients in
the FDCT bundles are often larger than μ, so fewer data bits can be embedded in
ECG104. The SNR and PRD are defined as follows:
as follows:
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_i s_i^2}{\sum_i (s_i - \hat{s}_i)^2} \qquad (2)$$

and

$$\mathrm{PRD} = \sqrt{\frac{\sum_i (s_i - \hat{s}_i)^2}{\sum_i s_i^2}}, \qquad (3)$$

where $s_i$ and $\hat{s}_i$ are the data in the original ECG and the marked ECG
divided by 10, respectively. Generally speaking, the larger the SNR and the smaller
the PRD, the better the perceived quality.
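As a cross-check of Eqs. (2) and (3), the two metrics can be computed as follows; this is a hypothetical helper, and the inputs are assumed to be the samples already divided by 10 as the paper specifies.

```python
import numpy as np

def snr_db(s, s_hat):
    """SNR of Eq. (2): ratio of signal energy to distortion energy, in dB."""
    s, s_hat = np.asarray(s, float), np.asarray(s_hat, float)
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

def prd(s, s_hat):
    """Percentage root-mean-square difference of Eq. (3)."""
    s, s_hat = np.asarray(s, float), np.asarray(s_hat, float)
    return np.sqrt(np.sum((s - s_hat) ** 2) / np.sum(s ** 2))
```

For instance, a marked signal whose distortion energy is 1% of the signal energy yields an SNR of 20 dB and a PRD of 0.1, illustrating the inverse relationship between the two metrics.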
Close-up views of the host and marked ECGs, namely ECG100, ECG101, ECG111, and
ECG220 (the first 5 s), are shown in Fig. 4, together with the resultant SNR and net
payload. The perceived quality is good: no apparent distortion exists in the marked
ECGs. As described previously, the smaller the proportion of steep areas (or drastic
variations), the better the hiding capability of the proposed method. From Fig. 4 we
can see that ECG100 (in

Fig. 2. The relationship between the average SNR/PRD and net payload of the proposed method
with various μ: (a) average SNR vs. net payload and (b) average PRD vs. net payload.

Fig. 4a) provided the best hiding capability, whereas ECG111 provided the least
hiding storage (in Fig. 4c).
A performance comparison between our method and existing techniques [9, 10] is
listed in Table 2. The average SNR of our method is much larger than that of Yang
and Wang's technique [9] at an average net payload of around 44 Kb. Although the
hiding storage of Shiu et al.'s approach [10] is the largest among the compared
methods, its resultant SNR is poor. Because a low SNR implies poor perceived quality
of the marked ECG signals, it is not feasible for medical staff to use them in
patient diagnosis.
Table 1. Net payload, SNR, and PRD performance of the proposed method using μ = 9
ECG data Net payload SNR PRD
ecg100 54,580 41.12 0.0088
ecg101 46,220 41.31 0.0086
ecg102 50,440 41.22 0.0087
ecg103 46,560 39.77 0.0102
ecg104 44,830 39.59 0.0105
ecg111 42,220 43.15 0.0070
ecg112 45,960 42.03 0.0079
ecg113 44,240 38.98 0.0112
ecg114 47,990 42.25 0.0077
ecg115 51,130 38.63 0.0117
ecg121 51,870 42.03 0.0079
ecg220 45,170 37.60 0.0132
ecg221 42,740 41.40 0.0085
ecg222 49,380 42.15 0.0078
ecg223 46,740 40.59 0.0093
ecg230 43,370 40.08 0.0099
ecg231 43,930 40.70 0.0092
Average 46,904 40.74 0.0093

Fig. 3. The relationship between the SNR and net payload of the proposed method with various
host ECGs.
Fig. 4. Close observation of the host and the marked ECGs: (a) ECG100, (b) ECG101,
(c) ECG111, and (d) ECG220.

Table 2. Net payload/SNR comparison with existing techniques


ECG data Net payload/SNR
Yang and Wanga [9] Shiu et al.b [10] Our method
100 45,567/36.89 68,270/19.69 54,580/41.12
121 47,029/37.93 68,270/18.26 51,870/42.03
122 44,683/31.52 68,270/18.61 37,570/40.59
205 44,343/36.09 68,270/17.82 51,140/41.97
207 44,853/37.10 68,270/15.56 44,590/43.38
220 44,921/31.65 N/A 45,170/37.60
230 44,530/32.30 N/A 43,370/40.08
Average 45,132/34.78 68,270/17.99 45,497/39.83
a With reversible version using bundle size = 1.
b With (1023, 1013)-Hamming code.
4 Conclusion

In this study, based on smart processing of the FDCT coefficients, we proposed an
effective reversible data hiding method for ECG signals. First, a simple
classification rule is applied to the host bundles. Then, two data-bit segments of
different lengths are embedded separately in the target coefficients of the
classified bundles via the LSB technique. Simulations confirmed that the hidden
message can be extracted without distortion and the original ECG signal is completely
recovered at the receiver site. In addition, the hiding capacity and SNR/PRD of the
proposed method outperform those of existing techniques. Because the
encoding/decoding time is short, our method is suitable for real-time applications
and for deployment in (mobile) healthcare devices for ECG signal measurement.

References
1. Phadikar, A.: Data Hiding Techniques and Applications Specific Designs. LAP LAMBERT
Academic Publishing, Saarbrucken (2012)
2. Zielinska, E., Mazurczyk, W., Szczypiorski, K.: Trends in steganography. Commun. ACM
57, 86–95 (2014)
3. Yang, C.Y., Wang, W.F.: High-capacity steganographic method for color images using
progressive pixel-alignment. J. Inf. Hiding Multimed. Signal Process. 6, 815–823 (2015)
4. Li, B., Wang, M., Li, X., Tan, S., Huang, J.: A strategy of clustering modification directions
in spatial image steganography. IEEE Trans. Inf. Forensics Secur. 10, 1905–1917 (2015)
5. Hsiao, C.Y., Tsai, M.F., Yang, C.Y.: High-capacity robust watermarking approach for
protecting ownership right. In: The 12th International Conference on Intelligent Information
Hiding and Multimedia Signal Processing, November 21–23, Kaohsiung, Taiwan (2016)
6. Liu, S., Pan, Z., Song, H.: Digital image watermarking method based on DCT and fractal
encoding. IET Image Process. 11, 815–821 (2017)
7. Ibaida, A., Khalil, I.: Wavelet-based ECG steganography for protecting patient confidential
information in point-of-care systems. IEEE Trans. Biomed. Eng. 60, 3322–3330 (2013)
8. Chen, S.T., Guo, Y.J., Huang, H.N., Kung, W.M., Tseng, K.K., Tu, S.Y.: Hiding patients
confidential data in the ECG signal via a transform-domain quantization scheme. J. Med.
Syst. 38 (2014). doi: 10.1007/s10916-014-0054-9
9. Yang, C.Y., Wang, W.F.: Effective electrocardiogram steganography based on coefficient
alignment. J. Med. Syst. 40 (2016). doi: 10.1007/s10916-015-0426-9
10. Shiu, H.J., Lin, B.S., Huang, C.H., Chiang, P.Y., Lei, C.L.: Preserving privacy of online
digital physiological signals using blind and reversible steganography. Comput. Methods
Programs Biomed. 151, 159–170 (2017)
11. Chen, W.H., Smith, C.H., Fralick, S.C.: A fast computational algorithm for the discrete
cosine transform. IEEE Trans. Commun. COM-25, 1004–1009 (1977)
12. Feig, E., Winograd, S.: Fast algorithm for the discrete cosine transform. IEEE Trans. Signal
Process. 40, 2174–2193 (1992)
13. Liang, J., Tran, T.D.: Fast multiplierless approximations of the DCT with the lifting scheme.
IEEE Trans. Signal Process. 49, 3032–3044 (2001)
14. Moody, G.B., Mark, R.G.: The impact of the MIT-BIH arrhythmia database. IEEE Eng.
Med. Biol. Mag. 20, 45–50 (2001)
Semantic-Based Resume Screening System

Yu Hou(✉) and Lixin Tao

Pace University, New York City, NY 10038, USA
{yh50276p,ltao}@pace.edu

Abstract. At present, XML has become one of the best choices for storing semi-
structured electronic resumes. Most companies let candidates fill out their resumes
online on the company's website and store these electronic resumes uniformly. This
paper assumes that all candidates' electronic resumes are saved in the form of XML,
and proposes a Semantic-based Resume Screening System (RSS). The RSS can improve the
accuracy and efficiency of the hiring process by using an Ontology Knowledge Base
and the Pace XML Validator.

Keywords: Knowledge representation · Web Ontology Language (OWL) · XML · Integrated syntax and semantic validation

1 Introduction


Due to low coverage, poor efficiency and high cost, the traditional offline
recruitment mode has been replaced by Internet recruitment over the last few decades.
Top companies may receive a large number of electronic resumes daily, so it is
challenging for recruiters to store and screen these semi-structured resumes.
Nowadays, the most popular model is for applicants to fill out their resumes online
on the company's website, which facilitates uniform storage and management of
electronic resumes. Since XML appeared, it has become the best choice for storing
electronic resumes. At present, most companies are challenged by screening those
semi-structured resumes: it is laborious to identify the ideal candidates accurately
and efficiently from a large number of resumes. Manual screening is not only time-
consuming but also highly subjective, and it is difficult to guarantee that companies
can find the ideal candidates from large-scale resume pools objectively and
efficiently.
The traditional and most common solution is keyword search. For example, if an HR
staff member wants to find candidates who graduated from Pace University, he or she
uses 'Pace University' as the keyword to search candidates' resumes. However, this
method cannot satisfy most HR requirements, because HR staff often use concept tags
to express certain demands, such as 'candidate with work experience in a Fortune
Global 500 company' or 'candidate who graduated from the Ivy League'. Traditional
keyword search can only screen resumes that include a specific name such as
'Google', 'Facebook' or 'Pace

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 649–658, 2019.
https://doi.org/10.1007/978-3-030-02686-8_49
University’ and so on, but it cannot screen resumes by using ‘Fortune Global 500
companies’. This paper proposed a Semantic-based Resume Screening System (RSS),
which is introduced an Ontology Knowledge Base. The RSS could improve the prescre‐
ening of the hiring process by matching the Ontology Knowledge Base.
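The idea behind such a knowledge base can be sketched as a toy concept-expansion step before keyword matching; all concept names, member entries, and helper functions below are illustrative assumptions, not part of the RSS implementation.

```python
# Toy ontology: concept tag -> set of member instances (entries are illustrative only).
ONTOLOGY = {
    "Ivy League": {"Harvard University", "Yale University", "Cornell University"},
    "Fortune Global 500": {"Google", "Facebook", "Walmart"},
}

def expand(tag):
    """Expand a concept tag into the concrete names a keyword search can match;
    a plain keyword falls through unchanged."""
    return ONTOLOGY.get(tag, {tag})

def matches(resume_text, tag):
    """Semantic screening: a resume matches if it mentions any instance of the concept."""
    return any(name in resume_text for name in expand(tag))
```

With this expansion, a query for 'Ivy League' matches a resume that only mentions 'Cornell University', which plain keyword search cannot do.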
In 2013, Pace University developed the Pace XML Validator [1], which greatly improves
the efficiency of XML file verification and offers reusable, integrated syntax and
semantic validation. Because of the advantages of XML in the storage and retrieval of
electronic resumes, this paper assumes that the electronic resumes to be screened are
filled out by candidates on company websites and stored as XML documents, and that
the Pace XML Validator is used to express the screening requirements as constraints.
An XML document that passes validation therefore meets the screening requirements;
conversely, one that fails validation does not.
This paper offers an approach that helps the RSS understand the user's real intention
more accurately, so that HR staff can obtain ideal candidates from the optimized
result. The main contribution of the RSS is to enhance the accuracy and efficiency of
the electronic resume screening process. In the remainder of this paper, the related
work is introduced in Sect. 2, the approach to creating a knowledge base is discussed
in Sect. 3, and the details of the system framework and implementation are
illustrated in Sect. 4. Finally, we conclude the project.

2 Related Work

2.1 Ontology-Based Knowledge Representation

Knowledge Representation is the field of study concerned with using formal symbols
to represent a collection of propositions believed by some putative agent [2]. In a general
sense, knowledge representation is a set of conventions for describing the world and it
is the symbolization, formalization, or modeling of knowledge. From the perspective of
computer science, knowledge representation is a general method to study the feasibility
and validity of computer to represent knowledge. It is a strategy of representing human
knowledge as a data structure and a system control structure of machine processing.
More specifically, knowledge can be defined as understanding, facts, information and
descriptions of some real or imaginary entity. In other words, in the field of
computer science, Knowledge Representation means representing knowledge so that
machines can understand it. At present, research on knowledge representation and
organization methods mainly comprises frame-based, production-rule, object-oriented
and ontology-based representations, of which ontology-based knowledge representation
is receiving more and more attention. The concept of ontology originated in the field
of philosophy, where it is defined as 'a systematic description of the objective
existence in the world': a systematic explanation of objective existence that
concerns the abstract essence of objective reality [3]. With the development of
artificial intelligence, it was given a new definition in the fields of AI and
computer science. Ontology is an integration tool for application and domain
knowledge: a collection of concepts in a certain domain and the relations among them,
where each relation
reflects the constraints and connections among concepts. Ontology-based knowledge
representation can ensure the consistency and uniqueness of knowledge in the process
of sharing, and can fully express complex semantic relations between pieces of
knowledge. Ontology can therefore resolve much of the disorder in large-scale
knowledge exchange and sharing, maximizing the sharing and reuse of knowledge. Formal
ontology-based knowledge representation also makes it easy to access the semantic
information of knowledge. Specifically, ontologies emphasize and express the
relationships between entities, reflecting these relationships through a variety of
knowledge-representation elements. These elements, sometimes called meta-atoms,
include concepts, attributes, relations, functions, axioms and instances. Therefore,
ontology has been widely used in many fields.

2.2 Pace Schematron XML Validator


Extensible Markup Language (XML) is a markup language that defines a set of rules
for encoding documents in a format that is both human-readable and machine-readable.
XML can be used to mark data and define data types, and it is a meta-language that
allows users to define their own markup languages. The main features of XML are:
(1) Convenient extensibility. XML allows organizations or individuals to create
collections of tags that suit their own needs, and these tag collections can quickly
be adopted across the Internet. (2) Strong structure. The logical structure of XML
document data is a tree-like hierarchy; each element in a document can be mapped to
an object, with corresponding attributes and methods. It is therefore well suited to
object-oriented programming for developing applications that process XML documents.
(3) Good interactivity. When users interact with applications, XML makes it easy to
locally sort, filter, and perform other data operations without interacting with the
server, which relieves the burden on the server. (4) Powerful semantics. In XML
documents, tags can define the relevant semantics of data, which not only greatly
improves the readability of the document for human beings but also makes it easy for
machines to read and use, simplifying information exchange between different devices
and systems. Because XML describes the meaning of data content by tagging it and
separates the display format from the data, searching XML document data can be
performed simply and efficiently: the search engine does not need to scan the entire
document, only the contents of the specified tags. In an electronic resume, specific
markers such as name, age, graduation school and work experience are fixed across
candidates; only the content within these markers varies. Combined with these
characteristics of XML, storing electronic resumes as XML documents has become the
most effective method.
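Tag-scoped retrieval from an XML resume can be sketched with Python's standard library; the resume structure and element names below are illustrative assumptions, not a schema defined by the RSS.

```python
import xml.etree.ElementTree as ET

# A toy electronic resume stored as XML (element names are illustrative).
RESUME_XML = """
<resume>
  <name>Jane Doe</name>
  <education><school>Pace University</school><degree>MS</degree></education>
  <experience><company>Google</company><years>3</years></experience>
</resume>
"""

root = ET.fromstring(RESUME_XML)
# Only the content of the specified tag is searched, not the whole document.
school = root.findtext("education/school")
company = root.findtext("experience/company")
```

Because the school and the employer live under distinct tags, a query about graduation school never accidentally matches an employer name, which is the advantage over flat full-text search described above.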
In 2013, Pace University developed the Pace XML Validator, an integrated
syntax/semantic validator. Schematron [4] is a popular rule-based XML dialect
that allows us to specify such co-constraints for a class of XML documents and then use
a standard Schematron validator to validate the co-constraints without coding. Over the
past decade, the standard implementation of Schematron validation has been to use a standard
652 Y. Hou and L. Tao

XSLT stylesheet [5, 6] to transform a Schematron document into a new validator XSLT
stylesheet, and then use the latter to validate the XML instance documents. However,
the current industry practice of XSLT-based Schematron validation may produce invalid
results and cannot be easily integrated with other system components [1]. Thus, Pace
University designed and implemented a validator as a reusable software component
based on DOM Level 3 XPath. It supports all key features of ISO Schematron [4],
including abstract rules and abstract patterns, network integration through web services,
and event-driven loose-coupling.
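The idea behind XPath-driven Schematron validation can be sketched as follows. This is not the Pace XML Validator's actual API; the Python callable here merely stands in for a Schematron assert's XPath test expression, and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_rule(doc_text, context_path, test):
    """Evaluate a Schematron-style rule: every node matched by
    context_path must satisfy the predicate `test` (a stand-in for an
    XPath test expression), and at least one such node must exist."""
    root = ET.fromstring(doc_text)
    nodes = root.findall(context_path)
    return bool(nodes) and all(test(n) for n in nodes)

doc = "<resume><education>Pace University</education></resume>"
ok = check_rule(doc, "education",
                lambda n: "Pace University" in (n.text or ""))
```

Evaluating the rule directly against the document tree, rather than first compiling the Schematron into an XSLT stylesheet, is the key difference of the XPath-based approach described above.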

3 Create Knowledge Base

Ontologies are usually organized in taxonomies and typically contain modeling
primitives such as classes, relations, functions, axioms and instances [7]. Therefore,
the ontology design of a knowledge base is the design of its concepts, relationships
and instances. This paper illustrates how a domain knowledge base can be used to
analyze resume information, helping users find ideal candidates more accurately.
At present, the design of the ontology for semantic analysis of resume information
is mainly composed of classes and instances. The classes in an ontology have two
functions: (1) describe the meaning of the class and the knowledge it contains;
(2) define subclasses and instances of the class. The difference between an instance
and a class is that a class is a name or a set of attributes describing the members of
a collection, while an instance is a member of that collection. For example, the
smartphone is a class, and the iPhone 8 is an instance of this class. By matching the
domain knowledge base, the system can set the constraints in the Pace XML Validator
more accurately and thus deliver better results to users. This paper uses Protégé as
an ontology modeling tool to create a knowledge base. Protégé is a free, open source
ontology editor and a knowledge management system [8]. Protégé provides a set of
behavior-oriented systems based on a knowledge model structure to support ontology
construction in various representation languages (such as OWL, RDF, Dublin Core and
so on). In the Protégé editor, the ontology is shown as a hierarchical directory
structure, which makes maintenance operations on the ontology straightforward (such as
adding classes, subclasses, attributes and instances). Therefore, there is no need to be
concerned with a specific ontology language; one only needs to design a domain ontology model at the
conceptual level. The example used in this paper is that an HR officer needs to find
candidates who ‘graduated from the Ivy League’ or ‘have work experience in Fortune
Global 500 companies’. A knowledge base will be designed based on this assumption.
First, ‘Ivy League’ and ‘Fortune Global 500 companies’ are derived from the class
Thing. Universities such as ‘Harvard University’ and ‘Columbia University’ belong to
the class ‘Ivy League’, and companies such as ‘IBM’, ‘Apple’ and ‘Microsoft’ are
instances belonging to the class ‘Fortune Global 500 companies’. By establishing the
knowledge base in this field, the system will understand how to set the constraints in
the Pace XML Validator when it meets requirements such as ‘having work experience
in Fortune Global 500 companies’. Therefore, the system’s ability will be enhanced.
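The class/instance structure described above can be pictured with a plain in-memory stand-in; the real knowledge base is modeled in Protégé and saved as OWL, so the dictionary below is purely an illustration of the semantics.

```python
# Illustrative stand-in for the ontology: each class maps to its known
# instances, lower-cased as they would appear after preprocessing.
KNOWLEDGE_BASE = {
    "smartphone": ["iphone 8"],
    "ivy league": ["harvard university", "columbia university"],
    "fortune global 500 companies": ["ibm", "apple", "microsoft"],
}

def is_instance_of(candidate, class_name):
    """True when `candidate` is a known member of the class's collection."""
    return candidate.lower() in KNOWLEDGE_BASE.get(class_name.lower(), [])
```

A class names a collection; an instance is a member of it, which is exactly the distinction the smartphone/iPhone 8 example draws.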
Semantic-Based Resume Screening System 653

4 Design of Semantic-Based Resume Screening System Framework

The Semantic-based Resume Screening System (RSS) is composed of four parts:
(1) Read the requirements from the users; the RSS conducts a preprocessing step for
later operations. (2) Based on the preprocessed input, the system matches the
previously created knowledge base, then generates the contents that the RSS will
screen the resumes for. (3) Based on these contents, the RSS automatically generates
constraints for the Pace XML Validator; that is, the RSS generates a Schematron
(.sch) file. (4) The RSS invokes the Pace XML Validator to validate each resume in
the resume folder, then returns the verified documents that the users want. Figure 1
shows the design of the Semantic-based Resume Screening System framework.

Fig. 1. The design of semantic-based resume screening system framework.
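The four stages above can be sketched end to end. Every component here is a deliberately simplified stand-in (the real system uses Jena for matching and the Pace XML Validator for validation), and the resume contents are invented for illustration.

```python
import xml.etree.ElementTree as ET

KB = {"ivy league": ["harvard university", "columbia university"],
      "fortune global 500 companies": ["ibm", "apple", "microsoft"]}

def preprocess(query):                      # stage 1: normalize the input
    return query.strip().lower()

def match(query):                           # stage 2: expand via the KB
    return KB.get(query, [query])

def build_rule(field, keywords):            # stage 3: the "constraints"
    return field, keywords                  # (stands in for a .sch file)

def screen(resumes, rule):                  # stage 4: validate each resume
    field, keywords = rule
    hits = []
    for name, xml_text in resumes.items():
        node = ET.fromstring(xml_text).find(field)
        text = (node.text or "").lower() if node is not None else ""
        if any(k in text for k in keywords):
            hits.append(name)
    return hits

resumes = {"Mike.xml": "<resume><work>IBM</work></resume>",
           "Alice.xml": "<resume><work>Startup Inc.</work></resume>"}
rule = build_rule("work", match(preprocess("Fortune Global 500 Companies")))
result = screen(resumes, rule)
```

The point of the pipeline shape is that each stage only consumes the previous stage's output, which is what lets the constraint generation be fully automatic.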

4.1 Preprocessing
When a user enters a requirement, we need to preprocess the input first so that later
operations are more convenient. The main preprocessing is to normalize the case of
the letters entered, as well as the spaces. When users enter requirements, the first
letter of a university or organization name is normally capitalized. However, some
classes and instances in the knowledge base may not be stored in capitalized form.
In order to avoid errors caused by such inconsistencies, the RSS ignores letter case
in the preprocessing step. In the OWL file, spaces are often saved as a special
symbol. Figure 2 is an example of an OWL file. From this example, we can see that
the spaces in the class ‘fortune global 500 companies’ and the class ‘ivy league’
are represented by this symbol. Therefore, the RSS preprocesses the spaces in order
to avoid errors while matching the knowledge base.

Fig. 2. An example of OWL file.
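A minimal preprocessing step along these lines might look as follows; the underscore used as the space separator is an assumption made purely for illustration, not necessarily the symbol the OWL file actually uses.

```python
def preprocess(user_input):
    """Normalize a user's requirement before matching the knowledge base:
    fold case (OWL class names may not be capitalized) and map spaces to
    the separator used in the OWL file (underscore assumed here for
    illustration only)."""
    return user_input.strip().lower().replace(" ", "_")
```

Applying the same normalization to both the query and the class names removes the inconsistencies described above before any matching takes place.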

4.2 Matching the Knowledge Base

In this section, the RSS uses Jena to read and interpret the established knowledge
base, that is, to read and analyze the saved OWL file. Apache Jena (or Jena
for short) is a free and open source Java framework for building semantic web and
Linked Data applications. The framework is composed of different APIs interacting
together to process RDF data [9]. First, the RSS matches the preprocessed input with
the OWL file read by Jena; in this way, the RSS determines whether the user needs the
knowledge base’s assistance. For example, if the user wants to find candidates who
have work experience in Fortune Global 500 companies, the RSS can infer that ‘Fortune
Global 500 companies’ means that candidates should have work experience in companies
such as IBM, Apple, or Microsoft, because the instances ‘IBM’, ‘Apple’ and ‘Microsoft’
belong to the class ‘fortune global 500 companies’ in the knowledge base. If a user’s
requirement does not need the knowledge base’s assistance, for example, if a user wants
to find candidates who graduated from Pace University, the RSS may find that ‘Pace
University’ is not one of the classes in the OWL file. Then, the RSS returns the result
‘Pace University’ directly without using the knowledge base.

4.3 Generate Schematron File


Through the previous section, the RSS understands the details of the user's search
demand. Next, the RSS automatically generates a Schematron file based on the keywords
returned in the previous section. The Schematron file sets the constraints on the XML
files, and the Pace XML Validator uses it to verify whether each XML file meets the
constraints. For example, consider a candidate named Mike whose resume is saved in
XML format; Fig. 3 shows Mike's resume saved as an XML file. If an HR officer wants
to find candidates who graduated from Pace University, then the keyword generated in
Sect. 4.2 is 'Pace University', and the RSS generates the corresponding Schematron
file based on this keyword to set constraints on the XML files. Figure 4 is a
Schematron file generated from the keyword 'Pace University'. In this file, we
restrict the XML file as follows: search for the content 'Pace University' under the
'education' element; if the 'education' element contains 'Pace University', the
verification passes, otherwise it fails. When an HR officer wants to find candidates
who have work experience in the Fortune Global 500 companies, the RSS understands,
by matching the knowledge base, that 'work in Fortune Global 500 companies' means
'working in companies such as IBM, Apple, Microsoft and so on'. Thus, the keywords
are the company names 'IBM, Apple, Microsoft and so on', and the RSS generates the
corresponding Schematron file based on these keywords. Figure 5 is a Schematron file
with the constraints of 'work in Fortune Global 500 companies'. In this file, we
restrict the XML file as follows: search under the 'work' element for content that
includes any of the Fortune Global 500 company names; if the 'work' element contains
any of these names, the verification passes, otherwise it fails.

Fig. 3. Mike’s resume.



Fig. 4. The Schematron file with the keyword ‘Pace University’.

Fig. 5. The Schematron file of ‘work in Fortune Global 500 companies’.
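Schematron generation can be sketched as template filling. The rule layout below is a simplified stand-in for the .sch files of Figs. 4 and 5 (only the ISO Schematron namespace is taken as given); element names and keywords come from the running example.

```python
# Simplified Schematron skeleton: one pattern, one rule, one assert.
SCH_TEMPLATE = """<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="{context}">
      <assert test="{test}">missing required content</assert>
    </rule>
  </pattern>
</schema>"""

def generate_schematron(context, keywords):
    """Build a Schematron document whose assert passes when the context
    element's content contains any of the keywords."""
    test = " or ".join("contains(., '{}')".format(k) for k in keywords)
    return SCH_TEMPLATE.format(context=context, test=test)

sch = generate_schematron("work", ["IBM", "Apple", "Microsoft"])
```

Joining the per-keyword `contains()` tests with `or` is how a single assert can express the "any Fortune Global 500 company" constraint of Fig. 5.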

4.4 Invoke Pace XML Validator

In this paper, we assume that all candidates’ electronic resumes are saved as XML files
in a specific folder. In this step, the RSS invokes the Pace XML Validator with the
Schematron file generated in the previous step to verify the XML files individually.
Once the validation has completed, the RSS returns the verified XML files. Figure 6
shows three resumes saved in one folder. If a user wants to find a candidate who
graduated from Pace University, then after the screening the RSS returns the verified
XML files ‘Alice.xml’ and ‘Tom.xml’; Fig. 7 shows the results of this screening. If a
user wants to find a candidate who has work experience in Fortune Global 500
companies, then after the screening the RSS returns the verified XML files ‘Mike.xml’
and ‘Tom.xml’; Fig. 8 shows the results of this screening. Now, the user can find
their ideal candidates through the screening.

Fig. 6. The example of resumes.

Fig. 7. The results of the RSS screen ‘Pace University’.

Fig. 8. The results of the RSS screen ‘Fortune Global 500 Companies’.
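The final screening loop can be sketched as follows. The substring check on one element stands in for invoking the Pace XML Validator with the generated Schematron file; the file names follow Fig. 6, but the resume contents are invented for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def screen_folder(folder, field, keywords):
    """Return the names of XML files in `folder` whose <field> element
    mentions any of the keywords (case-insensitive)."""
    matches = []
    for fname in sorted(os.listdir(folder)):
        if not fname.endswith(".xml"):
            continue
        node = ET.parse(os.path.join(folder, fname)).getroot().find(field)
        text = (node.text or "").lower() if node is not None else ""
        if any(k.lower() in text for k in keywords):
            matches.append(fname)
    return matches

# Build a small illustrative resume folder (contents are hypothetical).
folder = tempfile.mkdtemp()
for fname, company in [("Mike.xml", "IBM"), ("Alice.xml", "Startup Inc."),
                       ("Tom.xml", "Apple")]:
    with open(os.path.join(folder, fname), "w") as f:
        f.write("<resume><work>{}</work></resume>".format(company))

hits = screen_folder(folder, "work", ["IBM", "Apple", "Microsoft"])
```

Under these invented contents, the Fortune Global 500 screening returns Mike's and Tom's resumes, mirroring the shape of the result described above.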

5 Conclusion

In this paper, we showed that keyword-based search cannot satisfy the current needs
of electronic resume screening. This paper proposed a Semantic-based Resume
Screening System (RSS). This system can greatly enhance the understanding of
screening requirements based on the knowledge base, and it also improves the
efficiency of XML document validation through the application of the Pace XML
Validator. The approach proposed in this paper can improve the efficiency and
accuracy of screening resumes; therefore, a company’s hiring process can be made
considerably more efficient. In future work, we will introduce knowledge graphs to
improve the capability of knowledge representation, because the ontology primarily
supports only the subclassOf (is-a, or inheritance) relation. Various other relations,
such as part-of, are essential for representing information in various fields,
including all engineering disciplines [10].

References

1. Tao, L., Golikov, S.: Integrated syntax and semantic validation for services computing. In:
2013 IEEE 10th International Conference on Services Computing (2013)
2. Brachman, R.J., Levesque, H.J.: Knowledge Representation and Reasoning. Morgan
Kaufmann, San Francisco (2004)
3. Wu, J.: The construction of ontology-based domain knowledge base. Sci. Technol. Innov.
Herald 30, 250–252 (2010)
4. ISO: Information technology – Document Schema Definition Language (DSDL) – Part
3: Rule-based validation – Schematron, March 2013. http://standards.iso.org/ittf/
PubliclyAvailableStandards
5. Dodds, L.: Schematron: validating XML using XSLT, March 2013. http://www.ldodds.com/
papers/schematron_xsltuk.html
6. Jelliffe, R.: Schematron Implementations, March 2013. http://www.schematron.com/
links.htm
7. Gruber, T.R.: A translation approach to portable ontology specifications. Knowl. Acquis. 5,
199–220 (1993)
8. Musen, M.A.: The Protégé Project: a look back and a look forward. AI Matters 1(4), 4–12
(2015)
9. Apache Jena: Getting started with Apache Jena. https://jena.apache.org/getting_started/
index.html
10. Patel, K., Dube, I., Tao, L., Jiang, N.: Extending OWL to support custom relations. In: 2015
IEEE 2nd International Conference on Cyber Security and Cloud Computing, New York,
USA, November 2015
The Next Generation of Artificial Intelligence:
Synthesizable AI

Supratik Mukhopadhyay1(✉), S. S. Iyengar2, Asad M. Madni3, and Robert Di Biano4

1 Division of Computer Science and Engineering, Louisiana State University,
Baton Rouge, LA 70803, USA
supratik@csc.lsu.edu
2 School of Computing and Information Sciences, Florida International University,
Miami, FL 33199, USA
iyengar@cs.fiu.edu
3 Department of Electrical and Computer Engineering, University of California,
Los Angeles, CA 90095, USA
ammadni@ee.ucla.edu
4 Department of Computer Science, Louisiana State University,
Baton Rouge, LA 70803, USA

Abstract. While AI is expanding to many systems and services, from search
engines to online retail, a revolution is needed to produce rapid, reliable “AI
everywhere” applications by “continuous, cross-domain learning”. We introduce
Synthesizable Artificial Intelligence (SAI), and discuss its uniqueness by its five
advanced “abilities”: (1) continuous learning after training by “connecting the
dots”; (2) measuring quality of success; (3) correcting concept drift; (4) “self-
correcting” for new paradigms; and (5) retroactively applying new learning for
development of “long-term self-learning”. SAI can retroactively apply new
concepts to old examples, “self-learning” in a new way by considering recent
experiences, similar to the human experience. We demonstrate its current and
future applications in transferring seamlessly from one domain to another, and
show its use in commercial applications, including engine sound analysis,
providing real-time indications of potential engine failure.

Keywords: Artificial intelligence · Synthesizable Artificial Intelligence ·
IBM Watson · Natural language processing · Self-learning

1 Introduction

IBM’s Watson Analytics is no longer just a Jeopardy playing genius. Watson has
embarked on a journey of knowing, going far beyond its initial capacity for Jeopardy
question answering. Watson Analytics has made great strides employing the use of the

The authors acknowledge the sponsorship of NASA Ames Research Center, US Department
of Agriculture, National Science Foundation, US Department of Defense and the US Army
Research Lab in their research.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 659–677, 2019.
https://doi.org/10.1007/978-3-030-02686-8_50
660 S. Mukhopadhyay et al.

Natural Language Processing User Interface (NLP-UI) as a novel approach to the
analysis of business problems, allowing even unseasoned businessmen an opportunity
to analyze industry and personal datasets.
The diversity of challenges in AI and their specific embedded complexities should
not obscure the fact that the heart of the subject belongs to real-time reasoning. For the
last decade, researchers in Artificial Intelligence (AI) have made exponential progress
in applications across broad industry areas. Autonomous vehicles from Google and
others have registered countless miles on American roads. AI systems are interpreting
radiology images and diagnosing diseases with the same skill level as experienced
radiologists and doctors. AI is influencing every aspect of human life from hearing aids to
stock trades. So, is AI ready for primetime, or are we already there? We think the state-
of-the-art in AI today is at the same stage that software engineering was in the early
1960s. During that time, software could only handle small problems in diverse domains
(e.g., numerical analysis, personnel management, etc.): there was no way in which
complex software systems involving millions or billions of lines of code could be created
to tackle real world problems. In the same way, today’s AI systems are limited to solving
smaller (but harder) problems like “image recognition”, and “automatic question
answering”. Scaling such systems to address large complex tasks such as automated
drug design, air traffic control, or running an entire enterprise remains a challenge.
Software engineers invented abstractions embodied in object-oriented techniques and
principles of software reuse to revolutionize productivity; today large software systems
are no longer developed from scratch: they are built by reusing existing code through
subclassing and overriding methods. A variety of software abstractions are available
today to enable code reuse, from design patterns to frameworks. Thanks to this
methodology, today software is all encompassing, influencing every walk of human life from
power systems to retail. Is AI waiting for a similar abstraction revolution?
While AI has been part of many systems and services from search engines to online
retail, to realize the vision of “AI everywhere”, a revolution similar to that which
occurred in software is needed. Despite all the recent successes of AI, many questions
remain unanswered.
In many ways, Watson represents a solution to many problems, yet still has some
limitations in moving to a new domain. Watson cannot hit the ground running in a
completely new domain, automatically deploying and reconfiguring itself online when
situations change. The machine learning system of Watson is very good, but cannot
auto-tune to a problem domain instantaneously. The concept of domain changes in many
of these applications is still a problem of interest.
Researchers throughout the AI community have been asking, “How do you improve
productivity in the creation and deployment of AI systems?” In other words, how can
we produce AI systems rapidly and reliably as the applications of AI expand from
understanding specific scenes to serving societal and business needs in critical areas?
The authors and their team have introduced an alternative approach through
Synthesizable Artificial Intelligence, or SAI technology. Previous work by Mukhopadhyay,
Iyengar et al., on Cognitive Information Processing Shell [1] served as an impetus for
this approach.
The Next Generation of Artificial Intelligence: Synthesizable AI 661

SAI is unique from any other AI system by virtue of its five technological advances
or “abilities”: (1) continuous learning after training by “connecting the dots”;
(2) measuring the quality of success; (3) correcting concept drift; (4) “self-correcting”
for new paradigms; and (5) retroactively applying new learning for development of
“long-term self-learning.”
SAI can retroactively apply new concepts to old examples, “self-learning” in a new
way by considering recent experiences similar to the human experience. In this paper,
we demonstrate how our work on SAI has overcome limitations of other AI systems,
and its current and future applications in transferring seamlessly from one domain to
another. We show its use in current commercial applications, including engine sound
analysis, where it provides real-time indications of potential engine failure, and its future
uses in “automatic drug discovery”.

2 Hierarchical Fractal Architecture of SAI Agents and Related Work

Currently, different AI systems specialize in single specific tasks, determined by the data
type on which they were trained in advance. SAI is unique in measuring the applicability
of a given agent (neural network), or cluster of neurons within a network, to a specific
task in real time. Thus, SAI can detect if the input changes to something the network is
not equipped to deal with and draw from a wide variety of related and unrelated data to
activate different neural clusters that can be used to rapidly understand the new input.
Adapting to new types of input during execution time is a difficult problem.
Obviously, there cannot be true learning to predict labels without at least a few ground
truth labels to check against. That said, unsupervised methods like self-organizing maps
or autoencoders together with clustering can work in some situations. Unfortunately,
these unsupervised methods require a lot of data, so they cannot be used to adapt rapidly
to a new circumstance in real time. By learning the data distribution without labels, or
automatically organizing the data into clusters and assigning arbitrary labels, the data
can be correctly understood with only a handful of additional ground truth examples. By
effectively utilizing neural clusters trained on other problems, we solve this problem,
enabling unsupervised learning that can also adapt to new circumstances immediately.
As SAI learns new concepts from other problems, we can retroactively apply these
concepts to old labeled data, allowing continual improvement as we gain a better
understanding of old data. This is similar to the way humans perfect their skills in
complex tasks.
SAI determines the applicability of a given neural network, network layer, or feature
map, to the analysis of a given input. When applied properly, neural networks internally
organize input data into increasingly high level abstract information. By analyzing the
response of a network segment to known and unknown information, we can develop a
relation to determine whether the abstract concepts learned by a segment generalize well
to a given piece of data. This allows us to immediately draw from a diverse ‘segment
library’ of learned concepts when analyzing a new problem. Membership in the segment
library is determined by maximum applicability to any problem, not applicability to the

current problem, so useful concepts are always retained. Humans can separate their
processing based upon natural features by filling in the open parts through learned
experience, enabling them to transfer the information to new experiences (Fig. 1).

Fig. 1. Feature response separation.
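The paper does not publish SAI's actual applicability measure; purely to illustrate the idea of comparing a segment's response statistics on known versus unknown data, one toy version is:

```python
import statistics

def applicability(train_activations, new_activations):
    """Toy applicability score for a network segment: distance between
    the mean activation on new data and the training-time mean, in units
    of the training standard deviation, mapped into (0, 1].  This is an
    illustrative stand-in, not SAI's published measure."""
    mu = statistics.mean(train_activations)
    sigma = statistics.pstdev(train_activations) or 1.0
    z = abs(statistics.mean(new_activations) - mu) / sigma
    return 1.0 / (1.0 + z)

in_domain = applicability([0.9, 1.0, 1.1], [0.95, 1.05])
off_domain = applicability([0.9, 1.0, 1.1], [5.0, 6.0])
```

A high score suggests the segment's learned abstractions generalize to the new input; a low score suggests the input is something the segment is not equipped to deal with.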

SAI agents are organized into a series of progressively more task specific network
layers, where each layer can be connected to multiple sub layers (Fig. 2).

Fig. 2. Hierarchical fractal architecture.

We refer to a layer plus all its possible sublayers as a lobe. Inputs such as raw sensory
data will flow into our network. Low level lobes will apply to most or all problems and
will start to process the data; at this point it will only be passed on to sub-lobes where
it is most applicable. As data flows through the network, only sections that are capable
of processing that type of data are activated, while non-relevant sections are bypassed.
Eventually the fully processed and understood data is routed to a final high-level lobe
and produces a result. Different lobes can be associated with different data types, but
different high-level lobes attached to the same mid-level lobe can also be associated with
different tasks for the same data. This allows us to efficiently use the same type of data

differently depending on the task, while still sharing maximum knowledge between
tasks. When we reach a point where no sub-lobe of a given lobe is applicable to a given
task, we ‘grow’ a new sub-lobe starting from that point, created from the most applicable
available network segments from previous tasks. As we train or learn about our new
task, these segments may diverge from their original values as the network improves at
the task. If this happens, they are also added to our segment library for future use. By
having a way to measure how well a network segment applies to a given input, we can
instantly transfer knowledge learned from other problems to the current situation. The
ability to transfer knowledge effectively between very different tasks allows rapid
adaptation to new and unusual conditions.
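The routing-and-growing behavior described above can be sketched with hypothetical lobes and applicability predicates; everything here (lobe names, predicates, the growth policy) is an illustrative assumption rather than the architecture's actual implementation.

```python
class Lobe:
    """A layer plus its sub-layers; data descends only into applicable
    sub-lobes, and a new sub-lobe is grown when none applies."""
    def __init__(self, name, applies):
        self.name = name
        self.applies = applies          # predicate: data -> bool
        self.sub_lobes = []

    def route(self, data):
        """Return the path of lobe names the data flows through."""
        path = [self.name]
        if not self.sub_lobes:
            return path
        applicable = [s for s in self.sub_lobes if s.applies(data)]
        if not applicable:
            grown = Lobe(self.name + "/new", lambda d: True)
            self.sub_lobes.append(grown)
            applicable = [grown]
        return path + applicable[0].route(data)

root = Lobe("sensory", lambda d: True)
root.sub_lobes = [Lobe("audio", lambda d: d["kind"] == "audio"),
                  Lobe("video", lambda d: d["kind"] == "video")]
audio_path = root.route({"kind": "audio"})
novel_path = root.route({"kind": "text"})   # no lobe applies: grow one
```

Non-relevant lobes are simply bypassed, and a novel input type triggers growth of a new sub-lobe at the point where applicability runs out.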
Conversely, when a given segment no longer applies to a given set of inputs, it can
be saved to a knowledge library rather than overwritten, allowing it to be recalled if it
becomes relevant again in the future. This gives us an effective method for lifelong self-
learning, where potentially valuable concepts that are irrelevant to the current problem
are unused, but not forgotten.
SAI’s hierarchical structure allows us to efficiently share knowledge between
different tasks, while retaining a segment library eliminating the problem of catastrophic
forgetting. Our method allows valuable learned concepts to be identified, and when they
become irrelevant due to changes in conditions or concept drift, they can be bypassed,
or phased out and saved for use in later tasks.
Because our system can measure the applicability of network weights to a specific
piece of data, we can swap out, or reroute around, currently useless sections of our
network rather than overwriting them. This eliminates the danger of catastrophic
forgetting.
We believe that biological learning systems can adapt to new conditions so rapidly
because of two mechanisms: associative memory and analogy. Associative memory is
a mechanism by which training data can be effectively transferred between tasks that
are only loosely related, by associating part of one task with another, and using the
previously learned knowledge for the new task. Analogy is more complex and likely
only humans are capable of it. Analogy involves forming an association between two
relations involving the data instead of two data items. It is very powerful in that it allows
us to infer complex relations from sparse data.
When conditions change abruptly, a biological system will first try to adapt to the
new conditions by looking for similar conditions in the past in a different context (e.g.,
if we are trying to detect cars and one went through a shadow: in past trainings other
objects have been in or gone through shadows, so we leverage the now very relevant
shadow-resistant features from those trainings to correct the problem rapidly without
needing to build a dataset of cars going through shadows), or by dropping the part of the
classification criteria that has become unreliable in favor of a subset (e.g., blue paint
was spilled on a cat, and our method now shows the standard color-based map responses as
not applicable; but the shape-based responses are still applicable, so we swap out the
now irrelevant color-based cat features from our system, archiving them in the segment
library, and quickly learn to classify cats without relying on color). SAI integrates both
methods.

SAI can be used to ‘grow’ new lobes on a nodal network agent when new useful
features are discovered and determine which lobes to branch based on inclusion of
features applicable to the current problem (Fig. 2). The issue of balance would apply to
determining the optimal feature set to assign to each lobe. If a lobe became so large it
was computationally infeasible to process data through it, it would be split into two
smaller balanced lobes. Because our network can bypass non-applicable layers and their
sub-layers, we avoid having to make such a hard tradeoff between knowledge acquisition
and memory retention.
Our SAI architecture can efficiently treat the same data differently based on context
and system goals; the same lower level lobe will be associated with different higher-
level lobes for different ways of handling its output. These can be activated selectively
based on system goals, or simultaneously to accomplish two tasks with only
incrementally more processing power than is required for one. Similarly, several high-level lobes
may be associated with different versions of a drifting concept, or different noise types.
If the system goals are not explicitly given, the route the data takes through the
network is determined primarily by lobe applicability, and output paths represent system
goals. This means the system has the ability to choose its own system goals based on
the situation if necessary.
Changing architecture based on sensory input is a fundamental property of SAI in
that data is routed only through lobes capable of processing it.
As with all cognitive architectures, memory and computation are different aspects
of the same connections and weights. Sensory inputs are first processed in general areas
of the network, and then routed through dedicated areas based on the specific data type
and target task.
Instincts can be emulated by training a network segment to emulate a hard-coded
rule and adding that rule to the segment library. That allows it to be swapped either
manually or automatically where applicable and allows the system to learn to refine or
ignore the instinctual rule where necessary.
SAI has a library of network segments to draw on, and segments are stored by
maximum value in any situation, not current value. Therefore, catastrophic forgetting
cannot happen.
SAI represents a new paradigm in machine learning, able to draw on diverse
knowledge to adapt to any new situation rapidly.

3 Self-learning

Typical AI systems start out at some initial conditions, then improve at their target task
iteratively during training time, and reach some asymptotic maximum quality, then are
frozen in that state and fielded. A human expert however can continue to gain expertise
at a task long after they are finished being trained by an expert. Even when a human is
the best in the world at a given task, and no better expert exists, they can still continue
to gain expertise on their own. How is this possible?
Well, in one class of tasks the human can easily determine a success/failure labeling
or quality measure accurately, and therefore generate their own labeled data after

deployment. They then use reinforcement learning to continuously improve at their task.
Machine learning can already do this quite well, assuming a system can be trained to
estimate the quality measure, so we will neglect this case.
In another case the task we are trying to improve is a labeling task, so the system/
human can never really be sure it is improving at the task after deployment without the
occasional ground truth. Even for human experts, something akin to concept drift is
possible. Nevertheless, a human expert will gain a better and better understanding of the
task via unlabeled training; and be able to correct any concept drift from a single
example. Existing machine learning systems generally have the capability to correct for
concept drift via unlabeled plus labeled examples, but only our SAI architecture provides
a mechanism to detect the concept drift automatically, so it knows when to ask for more
examples.
If existing lobes become inapplicable to the current tasks, the system will grow new
lobes from that level on that apply better to the current problem and use them instead.
This is analogous to a human whose old way of doing things isn’t working anymore
experiencing a paradigm shift. The system may still need some ground truth to get a
handle on the new situation, but it would realize on its own that the old learning was
failing and that the results were no longer reliable and could ask for labeled examples
to regain its bearings.
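One minimal way to picture automatic drift detection is a sliding window over applicability scores that flags when the average drops, signalling the system to ask for labeled examples; the window size and threshold below are illustrative choices, not values from the paper.

```python
from collections import deque

class DriftDetector:
    """Flag possible concept drift when the recent average applicability
    of the active lobes falls below a threshold."""
    def __init__(self, window=5, threshold=0.6):
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, applicability_score):
        """Record a score; return True when drift is suspected."""
        self.scores.append(applicability_score)
        full = len(self.scores) == self.scores.maxlen
        return full and sum(self.scores) / len(self.scores) < self.threshold

detector = DriftDetector()
steady = [detector.observe(s) for s in (0.9, 0.8, 0.9, 0.85, 0.9)]
drifted = [detector.observe(s) for s in (0.3, 0.2, 0.25, 0.3, 0.2)]
```

The detector stays silent while scores are high and fires only once sustained low applicability fills the window, which is when the system would request ground truth to regain its bearings.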
Our system also demonstrates self-learning in another way. The system can
retroactively apply new concepts to old examples, learning new ways to understand long
known tasks in light of recent experiences. This means our system could continue to get
better at a task long after labeled data on the task had stopped coming in by transferring
useful concepts from other tasks. This sort of long term self-learning is one of the ways
human experts gain the highest levels of expertise.
Let’s look at two potential applications for this advanced technology.

3.1 A Practical Example


To illustrate the usefulness of SAI’s advanced architecture, consider a video recognition
network for classifying clips from musicals. The network would be trained for several
tasks related to classifying the clips, such as filling in the sound effects, recognizing
famous actors, determining the genre, and determining whether the clip comes from the
beginning, middle, or end of the story. This scenario is significant for two reasons.
First, there are thousands of hours of such videos available, either already labeled or
easily labeled automatically. Second, there is significant interest in these types of
applications; Shazam does something similar with music.
SAI would start out using segments from one or more of the tasks, and produce an
input layer, some intermediate sub-layers, and three output lobes, one for each task. If
the same types of features were useful for all three problems, the network wouldn’t split
into these sub-branches until shortly before the final layer.
Conversely, if the actor recognition task used very different features (facial features)
from the sound prediction task (visual cues, gestures, and body movements), the network
would bifurcate somewhere in the middle. Either way, the early layers would contain
features that applied to all three problems, while the later layers would contain
problem-specific features. The feature library would contain both.
This illustrates a commercial application of SAI, but its real strength lies in its
potential use to track suspicious behavior.
Suppose we are training a system to detect pickpockets (or terrorists) from a video feed.
There are not thousands of hours of data available for this task; what data exists is not
publicly available and well labeled. We may have a few tens of examples of pickpockets
on video if we are lucky. Classically, this would make the problem infeasible; a computer
couldn’t solve it, even though a human might manage it without ever having seen a single
real example. Humans can transfer knowledge from millions of more innocent interactions
in their experience to understand what is happening. The human already knows that hands
are used to grasp objects and are of interest, that clothing has pockets, usually in the
same areas, that someone suddenly changing direction might be significant, and so on.
Similarly, SAI will look at the handful of ‘pickpocketing’ interactions and search its
feature library for anything applicable.
The sound prediction features would be attuned to small hand gestures (to predict fingers
snapping) and leg movements (to predict footstep sounds). The segment that predicts where
along the timeline a clip came from might do so by learning to estimate fatigue level
from pose and timing differences. The actor recognition features would understand the
meaning and significance of faces and would share these types of features with the
fatigue estimation portion, which could use them to look for sweat on faces. Some of
these features (hand movements, stress level from pose) would have higher than baseline
applicability to pickpocketing detection, and we could immediately identify these and use
them. So, when SAI creates the initial path leading to our new ‘pickpocketing’ output
lobe, it would already understand a great deal about the meaning and context of the
scenes before even training with the ‘pickpocketing’ samples (Fig. 3).

Fig. 3. Pickpocket scenario demonstrating breakout of video features into characteristic lobes
and storage in the main Segment Feature Library for rapid future learning (Photo Courtesy of
Ili Simhi).

The new series of lobes would share low-level features with the existing network, and
even the high-level features would be initialized from the most applicable members of the
feature library. At that point the network would proceed to learn from the ‘pickpocketing’
samples, and if any features changed significantly, the network would know it had learned
a useful new concept, thus adding the concept to the segment library.
New concepts learned this way are retroactively applied to old problems. In this case,
new concepts learned from pickpocketing detection could be checked for applicability to
music classification. This would allow our network to continue learning about a problem
long after data on that problem has stopped coming in, thereby enabling a better
understanding of “old memories” in light of new experiences.

3.2 Transfer Learning


Due to transfer learning, analogical reasoning, and automated tuning, SAI can easily
transfer from one domain to another, unlike many other AI systems, which cannot be
readily deployed to a new domain or learn from one another.
In SAI, for example, an “agent” performing the task of understanding imagery from
Synthetic Aperture Radars (SAR) can gain knowledge through transfer from an agent
performing semantic segmentation of CamVid imagery or from a VGG-16 model pre-trained
on ImageNet [2]. A “core sample” from an agent previously trained on one task can be
used to train a new agent for a different task [2]. This strategy helps avoid the need
to train an entire “network” on a large dataset and improves overall performance. For
example, training a large VGG-16 network on a reasonably large dataset takes a long
time; SAI avoids that by using a VGG-16 model pre-trained on the ImageNet dataset and
extracting a “core sample” from it to create a new “agent”. Not only does this strategy
save training time, but it also helps create a trained agent from a relatively small
training dataset. In Watson, such an agent would need to be built from scratch by
training on a large labeled (SAR) dataset.
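The “core sample” strategy can be sketched abstractly. Here a network is caricatured as a list of weight matrices; the layer sizes and copy depth are illustrative assumptions, not the actual procedure of [2]:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_layers(sizes):
    """A network is caricatured as a list of weight matrices, one per layer."""
    return [rng.normal(0, 0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]

# An agent pre-trained on a source task (e.g., VGG-16-style ImageNet features).
source = random_layers([256, 128, 64, 10])

def core_sample(source_layers, depth, head_sizes):
    """Build a new agent by copying the first `depth` layers of the source
    (the general-purpose features) and attaching a freshly initialized,
    task-specific head to be fine-tuned on the new task's small dataset."""
    copied = [w.copy() for w in source_layers[:depth]]
    head = random_layers([source_layers[depth - 1].shape[1]] + list(head_sizes))
    return copied + head

# New agent for a different task (e.g., SAR imagery with 4 classes).
target = core_sample(source, depth=2, head_sizes=[32, 4])
print([w.shape for w in target])  # [(256, 128), (128, 64), (64, 32), (32, 4)]
```

The saving comes from the copied layers: only the small head (and perhaps the top copied layer) needs gradient updates on the new task’s limited data.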
Another application is transferring knowledge obtained through recognizing objects in the
ImageNet dataset to the task of segmenting and classifying handwritten foreign characters,
even though ImageNet contains no characters. Because ImageNet is drawn from a large and
diverse dataset, its features can be assumed to be general purpose, able to represent many
types of shapes and textures equally well. While a network trained on it may not have the
capacity to directly recognize foreign characters, it should be able to recognize many
common simple structures under a wide array of image conditions, including noise. This is
the knowledge that we want to transfer out of it and combine with our own knowledge of
foreign characters. In general, assume that the SAI framework is asked to configure an AI
engine for a task “T.” We have at our disposal AI engines (neural networks) for solving
tasks T1, …, Tn. Some concepts learned in solving one or more of T1, …, Tn will be
relevant to solving task T. Assume we are provided a labeled dataset D for training a
neural network to solve task T.
SAI first starts with a randomly initialized neural network to solve T. For each network
corresponding to Ti, and for each of its layers, SAI determines the applicability of the
learned concepts towards the new task T. This is done through the evaluation of a
transferability metric that provides a measure between 0 and 1. SAI sorts the
corresponding layers of each of the networks corresponding to T1, …, Tn in order of
decreasing transferability. For each layer of the new network corresponding to T, SAI
transfers the top k “relevant” weights from T1, …, Tn. Finally, SAI partitions D into two
subsets: a small subset Dtrain that will be used to fine-tune the network and a testing
set Dtest that is used to test it. Note that both Dtrain and Dtest are also used in
computing the transferability metric. Notice that the data needed for fine-tuning is only
a small subset of D, so this scheme works even if D is small.
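The layer-ranking and data-partitioning steps can be sketched as follows. The transferability scores are taken as given (the metric itself is computed from Dtrain and Dtest as described above), and all names are illustrative:

```python
import random

def assemble_transfer_plan(candidate_layers, k):
    """candidate_layers: list of (source_task, layer_index, transferability)
    tuples with transferability in [0, 1]. Returns the top-k donors,
    sorted by decreasing transferability, as the text describes."""
    ranked = sorted(candidate_layers, key=lambda c: c[2], reverse=True)
    return ranked[:k]

def split_dataset(D, train_fraction=0.2, seed=0):
    """Partition labeled data D into a small fine-tuning subset and a test set."""
    D = list(D)
    random.Random(seed).shuffle(D)
    cut = max(1, int(len(D) * train_fraction))
    return D[:cut], D[cut:]   # D_train, D_test

candidates = [("T1", 3, 0.82), ("T2", 3, 0.41), ("T3", 3, 0.77), ("T4", 3, 0.15)]
print(assemble_transfer_plan(candidates, k=2))  # [('T1', 3, 0.82), ('T3', 3, 0.77)]

D_train, D_test = split_dataset(range(100), train_fraction=0.1)
print(len(D_train), len(D_test))  # 10 90
```

Because only the small D_train subset is used for fine-tuning, the scheme degrades gracefully as D shrinks, which is the point made above.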
Today, large AI systems are developed and fine-tuned by companies with armies of
highly paid data scientists and engineers. It takes a significant amount of time, money,
and effort, together with a deluge of training data, to build and train an AI system that
can operate at the level of humans in a new domain.1 Even with this enormous investment,
gaps in training remain (Fig. 4).

Fig. 4. Gaps in current state-of-the-art intelligent systems.

The AI community has recognized this limitation as one of the main stumbling blocks
hindering progress and preventing AI from positively influencing important areas of
human endeavor. The fast-changing nature of today’s world, where M&As happen in the
blink of an eye, new diseases appear at an alarming rate (e.g., Zika), political landscapes
change overnight, and natural disasters come out of the blue, makes this slow mobility of
AI across domains a formidable problem. For years, scientists have wrestled with a
variety of solutions to this problem, such as “transfer learning.”
Most AI systems today rely on “transfer learning” to bring the experience of an AI
system in one domain to bear upon problems in another. This technique, however,
ignores the tremendous amount of human experience already available in the new
domain. Compare this to the way a person explores a new city. The person will combine
previously acquired skills, such as map reading, with knowledge obtained from
questioning locals about the best restaurants, museums, and shops, allowing them to
navigate and enjoy the city even though the city is new and the tourist may not speak
the local language. This dynamic combination also enables a person to deal with
unforeseen events such as road closures and detours.

1 While deep learning techniques have eliminated the need for manually extracting
features, they have been shown not to work well, for example, for texture datasets where
the inherent dimensionality of the data is high [2].
Humans have an innate ability to use this combination in their daily lives to adapt to
new situations and tasks. This fundamental recipe, used by humans to survive in a rapidly
evolving world, is missing in current AI systems.2 How then is it possible to rapidly
synthesize AI systems, leveraging previous experience and existing knowledge in a new
domain, to hit the ground running? Solving this problem requires rethinking the
fundamentals of existing AI architectures through the development of loosely coupled
elastic architectures that can interact with humans and other AI systems, draw upon the
knowledge and skills gained from previous experience, and collaboratively solve
interdisciplinary problems.

3.3 Expanding the Reach of AI Through Synthesizable AI Using Peer Learning


Figure 5 depicts a loosely coupled Synthesizable AI architecture. The top layer provides
the reasoning, learning, and knowledge representation functionalities. It includes models
that represent human background knowledge. Multiple generative models exist, such as
Hidden Markov Models (HMMs). In addition, SAI includes a transfer learning and
analogical reasoning framework, a deep neural network (DNN) model, a statistical model
(like statistical region merging), hypergraph-based models for large-scale inference
together with heuristics to prune the search space, frameworks for active, semi-supervised,
and online learning, and an automatically curated belief store (based on autoencoders)
that manages the beliefs of humans and AI systems.

Fig. 5. Synthesizable AI architecture.

2 Some individual pieces of the puzzle are already developed in subfields of AI, like
active learning and transfer learning.

The middle layer allows deployment, reconfiguration, and collaboration among AI
systems solving diverse problems using an elastic peer-to-peer agent architecture that
exploits the top layer and provides agility to it through dynamic agent synthesis and
deployment based on declaratively specified knowledge in near real time. That is, the
agents will use the reasoning engines, as well as rules learned by the learning engines,
to process information, learn from other agents or human expertise through transfer
learning and analogical reasoning, provide classification, and make decisions. It is this
layer that allows meta-learning for handling dynamically available human expert
knowledge and for dealing with concept drift. It provides a single programming interface
for synthesizing agents. Furthermore, the same layer supports hot deployment of these
agents under operating conditions by leveraging the third layer described below. The
organization of the agents in this layer can be flat (peer-to-peer) or hierarchical, where
agents in upper layers are built by composing those in lower layers and can perform
higher-level tasks.
The third layer depicts a high-performance run-time execution middleware that
enables automated agent deployment and redeployment in real time through persistent
hot-swapping, provides runtime monitoring for the agents, interfaces with sensors and
actuators, and provides a distributed key-value store for publishing and subscribing to
information by agents and sensors. Agents in the second layer can tune the runtime
execution environment for optimal performance. Figure 6 shows an example flow for
the synthesizable AI architecture. The architecture creates and combines a feedback-based
meta-learning paradigm which continuously monitors the performance and relevance of
existing and emerging data sets. In case the data characteristics change drastically
(e.g., in streaming video analytics, where the background changes from light to dark as
day gives way to night), a continually evaluated metric may indicate that the performance
of an agent has fallen below a threshold (for example, in the case of video analytics, an
unacceptable number of track overlaps, jumps, and drifts). SAI would respond to this
situation by dynamically replacing this agent with another more appropriate to the
altered situation, or by adapting the former by transferring knowledge to it from agent(s)
already experienced in such situations. In the latter case, a measure of transferability is
used to determine which agent(s) the knowledge is transferred from.
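The replace-or-adapt decision in this feedback loop might be sketched as follows. The agent representation, metric, and threshold are illustrative assumptions, not the SAI middleware API:

```python
def supervise(active_agent, pool, metric, threshold, transfer_measure):
    """One step of the feedback loop described above: if the active agent's
    continually evaluated metric falls below the threshold, either swap in a
    better-suited agent from the pool or transfer knowledge from the most
    transferable donor. All names here are illustrative."""
    if metric(active_agent) >= threshold:
        return active_agent, "keep"
    # Prefer an agent already experienced in the altered situation...
    best = max(pool, key=metric)
    if metric(best) >= threshold:
        return best, "replace"
    # ...otherwise adapt the current agent via transfer from the best donor.
    donor = max(pool, key=transfer_measure)
    active_agent["weights"] = donor["weights"]  # stand-in for knowledge transfer
    return active_agent, "adapt"

day = {"name": "day-tracker", "weights": "w_day", "fitness": 0.9}
night = {"name": "night-tracker", "weights": "w_night", "fitness": 0.8}
pool = [night]

# Nightfall: the day agent's tracking quality collapses below threshold.
day["fitness"] = 0.2
agent, action = supervise(day, pool, metric=lambda a: a["fitness"],
                          threshold=0.6, transfer_measure=lambda a: a["fitness"])
print(agent["name"], action)  # night-tracker replace
```

In the real architecture this policy would run inside the middle layer, with the metric fed by the third layer’s runtime monitoring.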

Fig. 6. Synthesizable AI flow diagram.



The Synthesizable AI architecture provides a practical approach to combining a
multi-agent-based architecture with machine reasoning and learning. It leverages
distributed and dynamic multi-agent synthesis to provide the following key features:
(a) dynamically incorporating contextual knowledge from experts into the learning
system; (b) selectively using multiple learners to adapt to situation changes; (c) enabling
a never-ending learning system to deal with concept drift; and (d) enabling the transfer
of knowledge between agents solving problems in different domains. The integrated
system provides near real-time response to rapidly changing situations without quality
degradation or disruption in service commitments. The architecture allows a marketplace
of AI systems, which cooperate and learn from each other to solve interdisciplinary
problems, to be rapidly created, deployed, and adapted (Fig. 6).

4 Evaluation and Commercialization: A New Revolution for the Next Decade

While SAI is still a work in progress, it has been commercialized by AutoPredictiveCoding
LLC (http://autopredictivecoding.com) in the vertical of automated machine diagnostics.
The resulting SpotCheck application [4] provided real-time machine diagnostics from
emitted sounds, vibrations, and magnetic fields (Fig. 7). As deterioration of the machine’s
lubricants, bearings, brushes, or other components occurs, very subtle changes also occur
in the sounds and vibrations of the machine as it continues to operate. These sounds can
be analyzed to estimate the oil quality, vacuum level, belt tension, bearing condition, and
other elements, and provide real-time indications of potential internal failure. This
analysis has been used to drive systems longer, pushing them to their limits while
avoiding catastrophic failure, saving millions of dollars each year.

Fig. 7. Using automated diagnostics to prolong the life of industrial machinery.

4.1 Automated Machine Learning System Now Used by NASA

For terrain recognition (Figs. 8 and 9) [5, 6], the advanced supercomputing division
at NASA Ames has been working with Louisiana State University to blend deep
learning techniques with existing neural networks to create a robust satellite
dataset analysis system. Using a massive survey database consisting of over 330,000
scenes from across the United States, the system has been able to quickly train and
learn relevant patterns. The average image tile is 6000 pixels wide and 7000 pixels
deep, comprising approximately a 200 MB file for each image. The entire data set
consists of 65 TB covering a ground sample distance of one meter. Using the SAI
technology and synthesized AI, the networks can then be trained one layer at a time
across very large and noisy datasets to provide the necessary fidelity for automatic
terrain recognition and terrain authentication.
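A back-of-the-envelope check of the quoted tile sizes, together with a minimal patch-extraction sketch of the kind such a training pipeline would need (the 28-pixel patch size is an illustrative choice, not the system’s actual setting):

```python
import numpy as np

def iter_patches(tile, patch=28, stride=28):
    """Yield fixed-size training patches from one large image tile,
    dropping partial patches at the right and bottom edges."""
    h, w = tile.shape[:2]
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            yield tile[r:r + patch, c:c + patch]

# Sanity check: a 6000 x 7000 pixel tile at 4 bytes per pixel is ~168 MB,
# consistent with the roughly 200 MB per image quoted above.
print(6000 * 7000 * 4 / 1e6, "MB")  # 168.0 MB

# Demo on a scaled-down 600 x 700 tile to keep memory modest.
tile = np.zeros((600, 700), dtype=np.float32)
print(sum(1 for _ in iter_patches(tile)))  # 21 x 25 = 525 patches
```

At full tile size the same generator yields tens of thousands of patches per image, which is why streaming extraction rather than whole-dataset loading is the natural design for a 65 TB corpus.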

Fig. 8. Sample images from the SAT 4/SAT 6 dataset [3].

Fig. 9. Automated tree cover estimation.

The technology has most recently been used for automatic yield prediction (Fig. 10)
and automatic infrastructure tuning [7, 8]. Through a collaboration with NASA Ames
Research Center, SAI has recently been applied to determine tree cover areas and
agricultural areas in California (Fig. 9). These activities will assist in monitoring
potential plant disease areas in remote, inaccessible areas requiring USDA intervention.

Fig. 10. Automatic yield prediction.

4.2 SAI: A Potential Solution for US Department of Agriculture Use in Yield Prediction

Another application, the prediction of agricultural yields based upon evaluation of
complex datasets, provides an excellent foundation for the evaluation of these large data
sets and the establishment of automatic yield prediction, as depicted in Fig. 10. In the
figure, colors more clearly define the yield production based upon an original LANDSAT
tile that has been analyzed for the specific patterns most likely to yield higher growth.
Yet another emerging application is the use of synthesizable AI to analyze and
automatically color images. This will have enormous application in a variety of areas,
including undersea exploration and deep space exploration, as well as analysis of remote
area activities. Figure 11 depicts the application’s use in automatically coloring a
black-and-white terrain landscape through analysis of specific features and the system’s
capacity to “self-learn” based upon slight variations of terrain texture.

Fig. 11. Automatic terrain landscape coloring.

The search-based program/agent generation facility has already been used for
intelligent tutoring applications in high school math education [9, 10], automated drug
discovery [11], and automated program visualization [12]. These applications will
continue to expand into the future.

IBM’s Watson has been used commercially in IoT and the automotive industry, in
social media campaigning, in medical diagnosis, in image interpretation in radiology,
in natural language processing and speech recognition, in education, in financial
services, in supply chain management, and in commerce. Recently, there have been
applications of Watson to automated material discovery.
SAI has been used in a variety of domains, including automated diagnostics for
industrial machinery [4], satellite image understanding [5, 6], infrastructure tuning
[7, 8, 13], education [9, 10, 14], program execution visualization [12], noisy natural
language processing [15], and automated drug discovery [11, 16], some of which have
not yet been addressed by Watson.
One of the more promising future applications of synthesizable AI will be in automatic
drug discovery, an area we are only now beginning to envision. SAI technology is
currently competing for the AI XPrize with AI-based automated drug discovery as its
target.

5 The Future: Automatic Drug Discovery

Even in this age of vaccines and antibiotics, there is still a constant effort to find new
drugs to combat illnesses for which there are no known cures. There is a need to discover
replacements for existing drugs targeted at pathogens that have become resistant to
current drugs. There is also a need to develop new drug therapies for health issues
adversely affecting the lives of hundreds of millions of people every day. Indiscriminate
use of antibiotics has resulted in pathogens developing drug resistance to produce
“superbugs” (http://www.cdc.gov/drugresistance).
Although multidrug resistance in pathogens is growing fast, the number of new drugs
being developed to treat bacterial infections has reached its lowest point since the
beginning of the antibiotic era. The resistance is particularly problematic in the
Gram-positive organisms S. aureus, E. faecalis, and S. pneumoniae as well as a number
of Gram-negative organisms including K. pneumoniae, A. baumannii, and P. aeruginosa.
Hence, there is a dire need to develop new platforms and approaches to discover
antibacterial agents against novel molecular targets. Not only are new drugs not being
created, but the existing process of creating drugs is slow, inefficient, and costly.
There is a desperate need to identify new antibiotics and antimicrobials rapidly, as
opposed to the years normally taken to create a drug. The solution is to develop a
technique to construct libraries of molecules with the end goal of finding and developing
new antibiotics and antimicrobial agents in a more efficient and cost-effective manner.
Our synthesizable AI-based approach (in collaboration with Dr. Brylinski from LSU
Biochemistry) can automatically synthesize targeted drug molecules (see
http://brylinski.cct.lsu.edu/content/molecular-synthesis for the eSynth tool), filter
candidates based on chemical criteria (such as being an antibiotic) [11], analyze 3D
image models of the pathogen, automate clinical testing for side effects, and predict the
candidate or candidates most likely to succeed. Our engine, eSynth, generates
target-directed libraries using a limited set of building blocks and coupling rules
mimicking active compounds. Given a set of initial molecules, eSynth synthesizes new
compounds to populate the pharmacologically relevant space. The building blocks [16]
of eSynth are:
Rigids: inflexible fragments, often a single or fused aromatic group, and
Linkers: flexible fragments connecting rigid blocks.
The eSynth software rapidly generates a series of compounds with diverse chemical
scaffolds complying with Lipinski’s criteria for drug-likeness. Although these molecules
may have different physicochemical properties, the initial fragments are procured from
biologically active and synthetically feasible compounds. eSynth can successfully
reconstruct chemically feasible molecules from molecular fragments.
Figure 12 shows a 19-atom molecule rebuilt using eSynth. The process involves
decomposition of the original 19-atom molecule through fragmentation and subsequent
rebuilding into potentially more useful structures [16]. Furthermore, in a procedure
mimicking the real application, where one expects to discover novel compounds based
on a small set of already developed bioactives, eSynth can generate diverse collections
of molecules with the desired activity profiles.

Fig. 12. A 19-atom molecule rebuilt using eSynth [9].
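The fragment-and-recombine idea can be caricatured with strings standing in for molecular fragments. Real eSynth operates on molecular graphs with chemical coupling rules, so everything below, including the fragment names and the coupling rule, is purely illustrative:

```python
from itertools import product

# Toy stand-ins for eSynth building blocks: rigids (aromatic cores) and
# linkers (flexible connectors between rigid blocks).
rigids = ["benzene", "pyridine", "naphthalene"]
linkers = ["amide", "ether"]

def synthesize(rigids, linkers, rule=lambda r1, l, r2: True):
    """Enumerate rigid-linker-rigid candidates, keeping only those the
    coupling rule accepts (here the placeholder rule accepts everything)."""
    return [f"{r1}-{l}-{r2}"
            for r1, l, r2 in product(rigids, linkers, rigids)
            if rule(r1, l, r2)]

library = synthesize(rigids, linkers)
print(len(library))   # 3 rigids x 2 linkers x 3 rigids = 18 candidates
print(library[0])     # benzene-amide-benzene
```

Even this toy version shows why the candidate space explodes combinatorially with the fragment count, which motivates the drug-likeness filtering discussed next.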

Research activity is ongoing in several new, emerging areas, as outlined in the
following paragraphs.

5.1 Antibiotic/Drug Filter

The goal is for eSynth to synthesize new compounds to populate the pharmacologically
relevant space. We use Lipinski’s Rule of Five to ensure that the synthesized compounds
have drug-like properties; because the number of possible combinations grows
exponentially with the number of molecular fragments, the rule is applied to exclude
compounds that do not satisfy drug-like criteria.
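Lipinski’s Rule of Five reduces to four descriptor checks, sketched here on precomputed descriptors. The dict format and example values are assumptions for illustration; in practice a cheminformatics toolkit such as RDKit would compute these descriptors from molecular structures:

```python
def passes_lipinski(mol):
    """Lipinski's Rule of Five: a compound is likely drug-like when it has
    no more than 5 hydrogen-bond donors, no more than 10 hydrogen-bond
    acceptors, molecular weight under 500 daltons, and logP not over 5."""
    return (mol["h_donors"] <= 5
            and mol["h_acceptors"] <= 10
            and mol["mol_weight"] < 500
            and mol["logp"] <= 5)

candidates = [
    {"name": "aspirin-like", "h_donors": 1, "h_acceptors": 4,
     "mol_weight": 180.2, "logp": 1.2},
    {"name": "oversized",    "h_donors": 6, "h_acceptors": 12,
     "mol_weight": 812.0, "logp": 7.3},
]
druglike = [m["name"] for m in candidates if passes_lipinski(m)]
print(druglike)  # ['aspirin-like']
```

Applied inside the enumeration loop, this filter prunes the exponentially growing candidate space down to the pharmacologically relevant region.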

5.2 Side-Effect/Toxicity Filter


Even after pharmaceutical companies spend years and billions of dollars creating a new
drug, it is often the case that the drug has undesirable side-effects that render it
unusable. To detect side effects, the companies must conduct extensive clinical trials
that consume years of effort and billions of dollars. All that money and effort is
ultimately wasted if the drug has a negative side-effect, in which case it is rejected by
the FDA.

5.3 Synthetic Accessibility Analysis


Natural products are a source of ingredients for many drugs. Some of these natural
products are hard to acquire, and it is difficult to analyze the molecular structure of a
compound for negative side-effects. We use deep neural network models that, from the
molecular structure of a natural product, can predict its synthetic accessibility score.
Compounds with high scores can then be synthesized using eSynth and analyzed for
their side-effects.

5.4 Automatic Drug Repurposing

Based on features extracted from 3D image models of the pathogens and those of drugs,
learning models will be used to repurpose existing drugs for new diseases.

5.5 Other Future Applications

Another application that SAI has been focusing on is automated vulnerability analysis.
SAI has been able to automatically localize the “attack surface” of an application.
Current research is focusing on automatically patching such vulnerabilities as well as
extending the analysis to large cyber infrastructures. SAI is also currently being targeted
at the automatic lighting control domain in smart buildings.

6 Limitations

For continued expansion to synthesize new compounds in pharmacology, SAI and
eSynth must be strengthened through the use of expanded deep neural networks to
determine side effects. We are currently evaluating and using deep neural network
models to predict possible side effects from the molecular structure and the bonding in
the drug molecule.

7 Conclusion

The future for artificial intelligence remains bright. Each day, new technologies such as
Synthesizable AI can be called upon to rapidly assume even “deeper roles” in
interdisciplinary areas ranging from open street maps, cybersecurity, and power systems
to kidney stone surgery, through analysis of extreme and complex events, ever larger
data sets, and utilization of newer computing architectures [17, 18].

References

1. Iyengar, S., Mukhopadhyay, S., Steinmuller, C., Li, X.: Preventing future oil spills with
software-based event detection. IEEE Comput. 43(8), 95–97 (2010)
2. Karki, M., DiBiano, R., Basu, S., Mukhopadhyay, S.: Core sampling framework for pixel
classification. In: Proceedings of the International Conference on Artificial Neural Networks
(2017)
3. Basu, S., Karki, M., Mukhopadhyay, S., Ganguly, S., Nemani, R., DiBiano, R., Gayaka, S.:
A theoretical analysis of Deep Neural Networks for texture classification. IJCNN 2016, 992–
999 (2016)
4. DiBiano, R., Mukhopadhyay, S.: Automated diagnostics for manufacturing machinery based
on well regularized deep neural networks. Integr. VLSI J. 58, 303–310 (2017)
5. Basu, S., Ganguly, S., Nemani, R., Mukhopadhyay, S., Zhang, G., Milesi, C., et al.: A semi
automated probabilistic framework for tree cover delineation from 1-M NAIP imagery using
a high performance computing architecture. IEEE Trans. Geosci. Remote Sens. 53(10), 5690–
5708 (2015)
6. Basu, S., Ganguly, S., Mukhopadhyay, S., DiBiano, R., Karki, M., Nemani, R.: DeepSat—a
learning framework for satellite imagery. In: Proceedings of the ACM SIGSPATIAL 2015 (2015)
7. Sidhanta, S., Golab, W., Mukhopadhyay, S., Basu, S.: Adaptable SLA-aware consistency
tuning for quorum-replicated data stores. IEEE Trans. Big Data 3, 248–261 (2017)
8. Sidhanta, S., Mukhopadhyay, S.: Infra: SLO aware elastic auto scaling in the cloud for cost
reduction. In: IEEE BigData Congress, pp. 141–148 (2016)
9. Alvin, C., Gulwani, S., Majumdar, R., Mukhopadhyay, S.: Synthesis of geometry proof
problems. In: Proceedings of AAAI, pp. 245–252 (2014)
10. Alvin, C., Gulwani, S., Majumdar, R., Mukhopadhyay, S.: Synthesis of solutions for shaded
area geometry problems. In: Proceedings of FLAIRS (2017)
11. Naderi, M., Alvin, C., Ding, Y., Mukhopadhyay, S., Brylinski, M.: A graph-based approach
to construct target focused libraries for virtual screening. J. Chemoinform. 8, 14 (2016)
12. Alvin, C., Peterson, B., Mukhopadhyay, S.: StaticGen: static generation of UML sequence
diagrams. In: Proceedings of the International Conference on the Foundational Aspects of
Software Engineering (2017)
13. Mukhopadhyay, S., Iyengar, S.S.: System and architecture for robust management of
resources in a wide-area network. US Patent Number 9,240,955 issued January 2016
14. Alvin, C., Gulwani, S., Majumdar, R., Mukhopadhyay, S.: Synthesis of problems for shaded
area geometry reasoning. In: Proceedings of AIED (2017)
15. Basu, S., Karki, M., Ganguly, S., DiBiano, R., Mukhopadhyay, S., Gayaka, S., Kannan, R.,
Nemani, R.: Learning sparse feature representations using probabilistic quadtrees and deep
belief nets. Neural Process. Lett. 1–13 (2016). https://doi.org/10.1007/s11063-016-9556-4
16. Liu, T., Naderi, M., Alvin, C., Mukhopadhyay, S., Brylinski, M.: Break down in order to
build up: decomposing small molecules for fragment-based drug design with eMolFrag. J.
Chem. Inf. Model. 57, 627–631 (2017)
17. Boyda, E., Basu, S., Ganguly, S., Michaelis, A., Mukhopadhyay, S., Nemani, R.: Deploying a
quantum annealing processor to detect tree cover in aerial imagery of California. PLoS ONE (2017)
18. Ganguly, S., Basu, S., Nemani, R., Mukhopadhyay, S., Michaelis, A., Votava, P., Milesi, C.,
Kumar, U.: Deep learning for very high resolution imagery classification. In: Srivastava, A.,
Nemani, R., Steinhaeuser, K. (eds.) Large-Scale Machine Learning in the Earth Sciences.
CRC Press, Boca Raton (2017)
Cognitive Natural Language Search Using
Calibrated Quantum Mesh

Rucha Kulkarni, Harshad Kulkarni, Kalpesh Balar, and Praful Krishna(✉)

Arbot Solutions Inc., dba Coseer, San Francisco, CA 94105, USA
praful@coseer.com

Abstract. This paper describes the application of a search system for helping
users find the most relevant answers to their questions from a set of documents.
The system is developed based on a new algorithm for Natural Language
Understanding (NLU) called Calibrated Quantum Mesh (CQM). CQM finds the right
answers instead of documents. It also has the potential to resolve confusing and
ambiguous cases by mimicking the way a human brain functions. The method
has been evaluated on a set of queries provided by users. The relevant answers
given by the Coseer search system have been judged by three human judges as
well as compared to the answers given by a reliable answering system called
AskCFPB. Coseer performed better in 57.0% of cases and worse in 16.5% of cases,
while the results were comparable to AskCFPB in 26.6% of cases. The usefulness
of a cognitive computing system over a Microsoft-powered keyword-based
search system is also discussed. This is a small step toward enabling artificial
intelligence to interact with users in a natural manner, as in an intelligent chatbot.

Keywords: Chatbot · Cognitive computing · Natural Language Processing (NLP) ·
Cognitive search · Natural Language Understanding (NLU)

1 Introduction

Natural Language Search and one of its prominent applications, chatbots, are popular
topics in technology as well as in research.
Their popularity can be attributed to their tremendous potential and promise in several
fields [1–6]. There are several areas of business, for example, brand-building, customer
acquisition, product discovery, and support, that require human interaction. Human
labor carries high cost, and fatigue, bias and error introduce inaccuracy. An automation
system based on Natural Language Search can remove several of these problems by
simply replacing the human.
A well-designed chatbot, for example, can be used to facilitate the internal processes
of a business. A chatbot, if successfully developed as a subject matter expert, can be
deployed to any part of the business so that any employee or customer can retrieve
important information from it at any time.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 678–686, 2019.
https://doi.org/10.1007/978-3-030-02686-8_51

However, in the current state, a clear majority of systems based on NLU are not well
designed or accurate enough. High accuracy is necessary so that business managers can
entrust them with mission-critical roles and tasks.
Highly advanced Artificial Intelligence (AI) technologies like deep learning have
been tremendously successful in analyzing structured data [7–9]. However, when it
comes to unstructured data, especially processing natural language like English, they
seem to fail. For a technology like deep learning to be successful, it needs a considerable
amount of training data, which might not be available to enterprises. Moreover, such
data must be annotated by subject matter experts, which can be prohibitively expensive.
Most intelligent natural language systems like Chatbots fail because they are unable
to interact and process content like human beings do. Frequently, they are based on
keyword correlation which does not enable them to “understand” the relations between
words and their context.
Humans process information around certain ideas. Ideas are entities expressed by
words and phrases and the complex relationships between them, something computer
systems cannot trivially handle. As a result, such systems are unable to retrieve meaning
from information. Some essential characteristics of the human thought process are: focusing
on ideas rather than words, prioritizing ideas based on significance and credibility, and
knowing when there is not enough information available to make a decision.
Intelligent machines capable of producing high accuracy can be designed based on
the imperatives mentioned above without relying on keywords. They can extract ideas,
order them, store them in a hierarchical data structure and even derive context from live
conversations. This type of approach offers a significant advantage over traditional
chatbots in terms of capability and performance.
This unique paradigm of intelligent understanding of information is captured in one
branch of AI technology: cognitive computing [10–15]. Cognitive computing can be
used to automate tedious, repetitive and language-driven workflows that no longer require
human intelligence. This would allow humans to focus on creativity and judgment while
machines take care of the mundane jobs.
In this work, we have developed a Natural Language Search system that can help
users with their queries. It analyzes the query placed by the user and suggests relevant
answers from a list of Frequently Asked Questions (FAQ). The reported answer may be
a direct match with an existing entry in the FAQ, or a solution drawn from part of
some other entry.
To evaluate the performance of the system, we used human judges as well as a
comparison of the results with those of AskCFPB [16]. AskCFPB is a well-established
and trustworthy answer resource maintained by the Consumer Financial Protection
Bureau of the United States Government. It covers a variety of topics including bank
accounts, credit reports, credit scores, debt collection, student loans and mortgages.
There is a search box on the website where users can enter their queries and look at
related questions and answers. This system is powered by the popular Microsoft search
engine, Bing.
The rest of the paper describes the method, the evaluation criteria used, and the results
of the evaluation. We close with a discussion of future work already underway at Coseer.

2 Methods

2.1 Tactical Cognitive Computing

All Coseer systems are built using Tactical Cognitive Computing (TCC). TCC is a
programming paradigm with a focus on high accuracy, short training times and low cost.
Tactical Cognitive Computing has been developed as an answer to traditional cognitive
computing systems, which are expensive and take years to implement.
To be called tactical, a cognitive computing system must be highly accurate. While
lower accuracy has been accepted, and even lauded, in the consumer world, businesses
need highly reliable systems.
A TCC system must also be quick to train. The key factor in enabling a quick training
time is a system’s ability to train without annotated training data. Annotation of training
data typically needs subject matter experts, who are very expensive. Annotation is also
a time-intensive effort; some prominent implementations have taken years to train.
Finally, a TCC system must be configurable, at low cost, to a wide variety of situations
in an enterprise. A key component of such configurability is the ability of tactical
cognitive computing systems to be deployed over commoditized hardware in a public
cloud, private cloud or on-premise.
Coseer’s implementation of TCC for natural language uses our work with Calibrated
Quantum Mesh (CQM) and cognitive calibration, in addition to various techniques in
natural language processing, natural language understanding, and artificial intelligence.

2.2 Calibrated Quantum Mesh

Calibrated Quantum Mesh (CQM) is a novel AI algorithm that is specifically built for
understanding natural language as human beings do. It does not need annotated training
data and reduces the need for unannotated data to a fraction.
CQM works on three basic principles, as shown in Fig. 1:

Multiple Meanings. CQM recognizes that any symbol, word or text can have more
than one meaning, or quantum state, each with a different probability. It considers all
these possible states to find the most probable answer.

Interconnectedness. CQM recognizes that items are correlated with and modify one
another’s behavior. Specifically, each item can influence the probability distribution
across the quantum states of all other items it is connected to. CQM considers this
mesh of interconnections to reduce error.

Calibration. CQM sequentially adds all available information to help converge the
mesh onto a single meaning. The calibration process is fast, accurate and efficient in
detecting any lacunae. The calibrations are implemented over training data, contextual
data, reference data and other known facts about the problem. Sometimes these
calibrating systems, called Calibrating Data Layers, are handled by an independent
CQM module or another AI process.

Fig. 1. Basic tenets of Calibrated Quantum Mesh (CQM).
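The paper does not disclose CQM's implementation. The sketch below is only our illustration of the three tenets, with every class, function and weight hypothetical: each node carries a probability distribution over candidate meanings, connected nodes reweight one another through compatibility weights, and repeated calibration passes converge each node toward one dominant meaning.

```python
# Illustrative sketch only; CQM's real implementation is proprietary.
# Each node holds a probability distribution over its possible meanings
# ("multiple meanings"); edges let connected nodes reweight each other's
# distributions ("interconnectedness"); repeated passes converge the mesh
# toward a single dominant meaning per node ("calibration").

class MeshNode:
    def __init__(self, name, meanings):
        # Start from a uniform distribution over candidate meanings.
        p = 1.0 / len(meanings)
        self.name = name
        self.dist = {m: p for m in meanings}

class QuantumMesh:
    def __init__(self):
        self.nodes = {}
        self.edges = []  # (node_a, node_b, {(meaning_a, meaning_b): weight})

    def add_node(self, name, meanings):
        self.nodes[name] = MeshNode(name, meanings)

    def connect(self, a, b, compat):
        self.edges.append((a, b, compat))

    def calibrate(self, rounds=10):
        for _ in range(rounds):
            for a, b, compat in self.edges:
                self._reweight(a, b, compat)
                self._reweight(b, a, {(mb, ma): w
                                      for (ma, mb), w in compat.items()})

    def _reweight(self, src, dst, compat):
        src_d = self.nodes[src].dist
        scores = {}
        for md, pd in self.nodes[dst].dist.items():
            # A meaning of dst is supported in proportion to how compatible
            # it is with the likely meanings of src.
            support = sum(ps * compat.get((ms, md), 0.1)
                          for ms, ps in src_d.items())
            scores[md] = pd * support
        total = sum(scores.values()) or 1.0
        self.nodes[dst].dist = {m: s / total for m, s in scores.items()}

mesh = QuantumMesh()
mesh.add_node("bank", ["financial_institution", "river_edge"])
mesh.add_node("loan", ["credit_product"])
mesh.connect("loan", "bank",
             {("credit_product", "financial_institution"): 1.0,
              ("credit_product", "river_edge"): 0.05})
mesh.calibrate()
best = max(mesh.nodes["bank"].dist, key=mesh.nodes["bank"].dist.get)
print(best)  # the "loan" context pulls "bank" toward one meaning
```

In this toy example the connected word "loan" acts as calibrating context: it shifts the distribution of "bank" away from the river-edge reading without any keyword matching.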



When the training data is passed through CQM, it defines many of the mesh’s
interrelationships. Where applicable, data layer algorithms learn from such data. Often
new relations and nodes are added to the mesh, making it smarter.
When a workflow is modeled by CQM, the creation of black boxes is avoided to the
maximum extent. This ensures transparency and interpretability of the models.
We note that keywords are not important for CQM in processing natural language.
Complex ideas are represented by different parts of the mesh with varying complexity.
This enables the algorithm to handle fluid, multi-state and inter-connected knowledge,
inherent characteristics of natural language.
The algorithm can also learn from non-direct corpora. For example, while assisting
a UK tax advisory, it was executed over HMRC.com, Law.com, Investopedia and a
proprietary glossary.
The most important advantage of CQM is that it does not need annotated training
data. As a result, training a CQM model is very fast and cost-effective. It also allows
iterations over the training process leading to highly accurate results. This capability
qualifies CQM based systems to be part of TCC.

2.3 Cognitive Natural Language Search System


A cognitive search system can be applied to understand and interpret textual data in a
natural way (Fig. 2). We used an algorithm based on CQM, which is also a TCC system,
to develop a Natural Language Search system. We applied the Coseer system to assist
users of AskCFPB with their questions.

Fig. 2. Overview of the cognitive search system.

The search system has two main steps: ingestion and search. In the ingestion step,
documents are interpreted by the CQM and processed into relevant data structures. In
this case, it was the FAQs that were processed and stored in a database. Then a search
module takes the query as input and searches the database for the relevant text or a
snippet. The relevant text is then sent to the user as a possible answer to the query.
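As a rough illustration of this two-step flow (not Coseer's actual implementation), the sketch below ingests FAQ entries into a simple record store and ranks them against a query. Plain token overlap stands in for the CQM interpretation step, which is not published, and all function and field names are ours.

```python
import re

# Minimal sketch of the two-step search flow described above: ingestion
# turns each FAQ entry into a searchable record, and the search module
# scores records against the query and returns the best snippets.

def ingest(faq_entries):
    """Build the 'database': one record per (question, answer) FAQ entry."""
    db = []
    for question, answer in faq_entries:
        tokens = set(re.findall(r"[a-z0-9]+", (question + " " + answer).lower()))
        db.append({"question": question, "answer": answer, "tokens": tokens})
    return db

def search(db, query, top_k=3):
    """Return the top_k answers ranked by token overlap with the query."""
    q_tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    ranked = sorted(db,
                    key=lambda rec: len(q_tokens & rec["tokens"]),
                    reverse=True)
    return [rec["answer"] for rec in ranked[:top_k]]

faqs = [
    ("What is a credit report?",
     "A credit report is a statement about your credit activity."),
    ("How long do mortgages last?",
     "Typical mortgage terms are 15 or 30 years."),
]
db = ingest(faqs)
print(search(db, "what does a credit report show", top_k=1))
```

A CQM-backed ingestion step would replace the token sets with mesh structures over ideas, but the database-then-ranking shape of the pipeline is the same.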

3 Evaluation Criteria

The cognitive search system was evaluated in the following ways:

Accuracy. This criterion measures how accurately the system answers the queries. It
was calculated by dividing the number of queries correctly answered by the total number
of queries. The search system was tested with 158 queries. For each query, the top three
results returned by the system were evaluated by three human judges. The results were
marked as relevant if any of the top three results satisfactorily answered the question.
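This criterion can be stated compactly. In the sketch below (our own formulation, not the authors' code), a query counts as correct if any of its first k judged results is relevant; the synthetic judgment lists reproduce the paper's headline figure of 130 correct queries out of 158.

```python
def top_k_accuracy(judgments, k=3):
    """judgments: one list of booleans per query, True where a returned
    result was judged relevant. A query counts as correct if any of its
    first k results is relevant."""
    correct = sum(1 for results in judgments if any(results[:k]))
    return correct / len(judgments)

# Synthetic judgments matching the reported outcome: 130 of 158 queries
# had at least one relevant answer in the top three results.
judgments = [[True]] * 130 + [[False, False, False]] * 28
print(round(100 * top_k_accuracy(judgments), 1))  # 82.3
```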

Comparative Performance. This evaluation criterion demonstrates how well the
search system performs compared to the AskCFPB search. AskCFPB was selected for
comparison because it is the most closely related search system; it is powered by the
Bing search engine. For this evaluation criterion, the same 158 queries were tested on
both search systems. Three human judges evaluated the top snippet in the following
categories: COMPARABLE, COSEER_BETTER and ASKCFPB_BETTER,
according to which result seemed more relevant to the query. Since AskCFPB returns
documents, not answers, we considered the most relevant snippet identified by the
Bing search engine. We acknowledge that this is a very stringent evaluation criterion
for the Coseer system.

Amount of Training Data Necessary (Not Used). A third evaluation criterion, not
used in this evaluation, is the amount of training data necessary to train a tactical
cognitive system. Typically, TCC systems need a fraction of the data that other AI
systems require, and do not need it to be annotated. In this evaluation an untrained
model was used.

4 Results and Discussion

For the accuracy calculation, 130 out of the 158 queries were correctly answered by the
Coseer cognitive search system, as evaluated by the human judges. This computes to
82.3% accuracy. This seems to be reasonable considering that the system was not trained
for this subject matter.
For the comparative study, 158 new queries were considered. Figure 3 shows the
results of the comparative study.
Out of the 158 queries, 26.6% showed comparable results. In 16.5% of the cases,
AskCFPB performed better than Coseer and in 57.0% of the cases, Coseer performed
better than AskCFPB.
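The reported percentages follow directly from the judges' per-query labels. In the sketch below the per-category counts (90, 26 and 42 out of 158) are inferred from the stated percentages rather than reported directly in the paper.

```python
from collections import Counter

# One judged label per query; the counts are inferred from the reported
# percentages (57.0%, 16.5%, 26.6% of 158 queries), not stated directly.
labels = (["COSEER_BETTER"] * 90
          + ["ASKCFPB_BETTER"] * 26
          + ["COMPARABLE"] * 42)
tally = Counter(labels)
for category, count in tally.items():
    print(category, f"{100 * count / len(labels):.1f}%")
```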
To get further insight into why one system works better than the other, we report a
couple of representative cases.

Fig. 3. Results of comparative performance between AskCFPB and Coseer.

Table 1 shows two queries where Coseer performed better than AskCFPB.

Table 1. Cases where Coseer performed better than AskCFPB.


Query: How long do mortgages normally last?
  Coseer answer: How can I determine how long it will take me to pay off my mortgage loan?
  AskCFPB answer: What exactly happens when a mortgage lender checks my credit?

Query: What type of rent information is on my credit report?
  Coseer answer: At least one of the big three consumer reporting agencies, Experian, uses rental payment and collection information in its credit reports.
  AskCFPB answer: What is a credit report? - Consumer Financial Protection…

There are several reasons behind the better performance of Coseer over AskCFPB.
Unlike AskCFPB, Coseer considers the context and the meaning of the query. It
emphasizes functional phrases like ‘how long’ instead of matching keywords.
Similarly, Coseer considers all other possible meanings of the search query when
executing its search. Special attention is given to important phrases, abbreviations,
and colloquialisms.
Table 2 reports a couple of cases where AskCFPB performed better than Coseer.
The second query in Table 2 is of special interest. Although the question here is whether
paying rent on time would strengthen credit history, the information about a weakening
of the credit history due to late payment is very relevant. Even though it appears to be
the diametrically opposite answer, AskCFPB has correctly recognized it as relevant.
The Coseer algorithm can be further improved by teaching it how to handle such cases.

Table 2. Cases where AskCFPB performed better than Coseer.


Query: What info does a credit report show?
  Coseer answer: If the investigation shows the company provided wrong information about you, or the information cannot be verified, the company must notify all the credit reporting companies to which it provided the wrong information…
  AskCFPB answer: A credit report is a statement that has information about your credit activity and current credit situation such as loan paying history and…

Query: Can I build my credit history by paying my rent on time?
  Coseer answer: You have a steady source of income and a good record of paying your bills on time. Lenders will look at your ability to repay the mortgage…
  AskCFPB answer: Could late rent payments or problems with a landlord be in my credit report?

5 Limitations, Conclusions and Future Work

The most significant limitation of the study is that an untrained AI system was used.
In future work, we plan to train a system to achieve more than 90% accuracy on the
first evaluation criterion. That study will also allow us to compare the two systems on
the third evaluation criterion: how much data is necessary to train the system?
Although Natural Language Search is an exciting and popular technology with
ever-increasing areas of application, its ability to interact with people in a natural
manner remains at an early stage. We applied a tactical cognitive computing system in
conjunction with Calibrated Quantum Mesh to develop a chatbot that helps customers
with their questions. The search system demonstrated reasonable accuracy in assisting
users to find answers to their queries. Although there are several opportunities to
improve, this comparative study demonstrates the usefulness of such an approach over
typical keyword-based natural language processing systems. It recommends cognitive
computing as a key approach for solving difficult problems that require human-like
thinking and the ability to reason and extract meaning from information.
We plan to extend CQM to other basic cognitive processes such as processing
intonations in speech, translating ideas back into words, and perhaps processing and
expressing unarticulated thoughts and emotions in text.
Idea-oriented chatbots can be the key to bringing the human and computing worlds
together. Coseer’s solutions demonstrate that we are already capable of designing and
training machines to process information like humans do, talk like humans do and
provide business value as humans do.
Since the chatbots can run round the clock, at a fraction of the cost of a human
resource and with high accuracy, it is perhaps not an overstatement to say that the future
of the chatbot could be the future of all business.

Acknowledgment. We thank the larger team of Coseer for developing the system. We also thank
Obaidur Rahaman for assistance in preparing the manuscript.

References

1. Ghose, S., Barua, J.J.: Toward the implementation of a topic specific dialogue based natural
language chatbot as an undergraduate advisor. In: 2013 International Conference on
Informatics, Electronics and Vision (ICIEV), pp. 1–5 (2013)
2. Heller, B., et al.: Freudbot: an investigation of chatbot technology in distance education. In:
EdMedia: World Conference on Educational Media and Technology, pp. 3913–3918 (2005)
3. Hill, J., et al.: Real conversations with artificial intelligence: a comparison between human–
human online conversations and human–chatbot conversations. Comput. Hum. Behav. 49,
245–250 (2015)
4. Huang, J.Z., et al.: Extracting Chatbot Knowledge from Online Discussion Forums (2007)
5. Jia, J.: The study of the application of a web-based chatbot system on the teaching of foreign
languages. In: Society for Information Technology and Teacher Education International
Conference, pp. 1201–1207 (2004)
6. Jia, J.Y.: CSIEC: a computer assisted English learning chatbot based on textual knowledge
and reasoning. Knowl.-Based Syst. 22, 249–255 (2009)
7. Goodfellow, I., et al.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)
8. LeCun, Y., et al.: Deep learning. Nature 521, 436 (2015)
9. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117
(2015)
10. Ferrucci, D.A.: Introduction to “This is Watson”. IBM J. Res. Dev. 56 (2012)
11. Li, Y., et al.: Cognitive computing in action to enhance invoice processing with customized
language translation. Presented at the 2017 IEEE 1st International Conference on Cognitive
Computing (2017)
12. McCord, M.C., et al.: Deep parsing in Watson. IBM J. Res. Dev. 56 (2012)
13. Amir, A., et al.: Cognitive computing programming paradigm: a corelet language for
composing networks of neurosynaptic cores. In: The 2013 International Joint Conference on
Neural Networks (IJCNN), pp. 1–10 (2013)
14. Cassidy, A.S., et al.: Cognitive computing building block: a versatile and efficient digital
neuron model for neurosynaptic cores. In: The 2013 International Joint Conference on Neural
Networks (IJCNN), pp. 1–10 (2013)
15. Esser, S.K., et al.: Cognitive computing systems: algorithms and applications for networks
of neurosynaptic cores. In: The 2013 International Joint Conference on Neural Networks
(IJCNN), pp. 1–10 (2013)
16. Dhoat, K.K.: Cognitive Search Technique for Textual Data. College of Engineering, Pune
(2013)
Taxonomy and Resource Modeling
in Combined Fog-to-Cloud Systems

Souvik Sengupta, Jordi Garcia, and Xavi Masip-Bruin

Advanced Network Architectures Lab, CRAAX, Universitat Politècnica de Catalunya,
UPC BarcelonaTech, Vilanova i la Geltrú, 08800 Barcelona, Spain
{souvik,jordig,xmasip}@ac.upc.edu

Abstract. As technology rapidly evolves, society as a whole is gradually being
surrounded by the Internet. In such a high-connectivity scenario, the recently
coined IoT concept becomes a commodity, driving the data generation rate to
increase swiftly. To process and manage these data in an efficient way, a new
strategy, referred to as Fog-to-Cloud (F2C), has recently been proposed,
leveraging two existing technologies, fog computing and cloud computing,
where resources play a pivotal role in managing data efficiently. In these
scenarios, vast numbers of interconnected heterogeneous devices coexist,
creating a complex set of devices. Managing these devices efficiently requires
a proper resource classification and organization. In this paper, we offer a
model to classify and build a taxonomy of the whole set of resources, aimed
to best suit the Fog-to-Cloud (F2C) paradigm.

Keywords: Fog-to-Cloud (F2C) · Taxonomy · Ontology · Resources classification ·
Class diagram

1 Introduction
Technologies are rapidly evolving, driving society towards a new era of smart
services. Indeed, day by day, we are moving from the ‘smart’ to the ‘smarter’
world. As per the United Nations [1], by 2050 about 64% of the developing
world and 86% of the developed world will be urbanized. Also, as per some
statistics [2], by 2050 more than 70% of the world population will be living in
a smart environment, where most things will connect to the network. Gartner
Inc. [3] forecasts that by 2020 almost 20.4 billion connected
network. Gartner Inc. [3] forecasts that by 2020 almost 20.4 billion connected
things will be in use worldwide. Also by 2022, M2M traffic flows are expected to
constitute up to 45% of the whole Internet traffic [4]. Beyond these predictions,
the McKinsey Global Institute [5] reported in 2015 that the number of connected
machines (units) had grown 300% over the last five years. Traffic monitoring of
a cellular network in the US also showed an increase of 250% for M2M traffic
volume in 2011. Also, Cisco [6] predicted that 50 billion objects and devices
would be connected to the Internet by 2020. However, although more than 99%
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 687–704, 2019.
https://doi.org/10.1007/978-3-030-02686-8_52

of today’s available things in the world remain unconnected, several pieces of
evidence define the connectivity trend in different sectors. The following two
examples highlight this fact. According to a Navigant research report [7],
the number of installed smart meters around the world will grow to 1.1 billion
by 2022. Another report from Automotive News [8], states that the number of
cars connected to the Internet worldwide will increase from 23 million in 2013
to 152 million in 2020.
Following all the trends and scenarios above, it is clear that IoT-connected
devices are going to rule the smart environment, being a key component of the
whole system. In short, the envisioned ‘smart’ scenario consists of a massive
number of IoT devices, highly distributed over the network, along with a set of
highly demanding services, some of them not yet foreseen.
It is also widely accepted that cloud computing brings benefits in handling
demands for high processing and storage services. However, it is also recognized
that cloud data centres may fail to deal with services demanding strictly low
latency, mainly due to the distance from the cloud, where the data is to be
processed, to the edge, where the data is collected and the user is. As a
consequence, some critical undesired effects, such as network congestion, high
latency in service delivery and reduced Quality of Service (QoS), are being
experienced [9]. To address these problems, fog computing recently came up,
relying on adding processing capabilities between the cloud data centre and the
IoT devices/sensors, thus aiming to extend the cloud computing facilities to
the edge of the network [9,10]. However, interestingly, it is also recognized
that fog computing is not going to compete with cloud computing; instead, the
two will collaborate, intending to provide better facilities to the next
generation of computing and networking platforms [10]. Indeed, the whole scenario
may be seen as a stack of resources, from the edge up to the cloud, where a
smart management system may adequately allocate the resources best suiting
service demands, regardless of where the resources are, either at the cloud or the fog. The
recently coined Fog-to-Cloud (F2C) architecture [11] has been proposed to build
such a coordinated management framework. Therefore, it is clear that the
development and combination of new technologies (i.e., IoT, cloud, and fog
computing) offers a multi-faceted solution for the future smart scenario.
Unfortunately, the enormous diversity of devices makes such a management
system not easy to deploy. Indeed, efficient and proper management of such a set
of heterogeneous devices is a crucial challenge for any IoT computing platform to
succeed. However, to facilitate the design of the suggested coordinated resource
management framework, it is essential to know what the resource characteristics
and attributes are, and thus to build a resource catalogue. This paper aims to
identify a resource taxonomy and resource model, suitable for a coordinated
F2C system, as a mandatory step towards a real F2C management architecture
deployment.
The rest of the paper is organized as follows. Section 2 positions the current
state of the art. Next, Sect. 3 presents an architectural overview for the coordi-
nated Fog-to-Cloud paradigm. In Sect. 4, we show a class diagram to represent

our taxonomic view of the F2C resources, and we also discuss the various
taxonomic parameters considered in classifying an F2C resource. Following up
on the previous section, in Sect. 5 we present and define the generalized
resource model for the F2C computing platform. To support our resource model,
we present some examples of real devices participating in the F2C system.
Finally, some concluding remarks and future directions of our research work
are given in Sect. 6.

2 State of the Art: Related Work and Motivation

For any management system, proper utilization of resources undoubtedly
facilitates optimal service execution and hence helps to build an effective
management solution. Most importantly, to manage the whole set of resources,
it is essential to have them categorized and classified into a resource
catalogue. To build such a description, it is necessary to identify the
characteristics and attributes of the resources to be organized. In this paper,
we aim at determining a resource classification and taxonomy for a scenario
combining fog and cloud resources, like the one envisioned by F2C. The
underlying objective of such a classification is to describe a catalogue of
resources, where resources are formally defined, thus easing both efficient
resource utilization and optimal service execution.
In a previous work [12], we put together a comprehensive literature survey,
highlighting the resource characteristics for distinct computing paradigms and
also observed several interesting findings. We found that, in most cases,
hardware components (i.e., memory, storage, processor, etc.), software (i.e.,
APIs, OS, etc.) and network aspects (i.e., protocols, standards, etc.) of the
devices [13,14] have been considered to classify the edge resources. Even for
grid resources, hardware components have also been studied (i.e., storage
capacity, memory, etc.),
to classify them [15,16]. Recognizing the relevance of efficient network
management to building a dynamic computing platform, many references [13,17,18]
focus on identifying the networking standards, technology and bandwidth
capacity. Interestingly, after revisiting the literature, we found that most of
the fog and edge computing related work focuses on the network bandwidth as
the essential characteristic for efficient network management.
lighting the fact that, the closer to the edge resources are, the more significant
the impact on the access network is. Indeed, access networks become a criti-
cal part of the whole network infrastructure concerning the quality provision-
ing, congestion, real capacity and availability and also the part where devices’
mobility brings significant collateral effects on performance. Hence, in
summary, network bandwidth, as well as other network attributes at the edge,
must undoubtedly be considered a critical characteristic when categorizing a resource.
Also, different edge devices may use different networking standards and
technologies to communicate [13]. So consideration of the networking standards
and technologies is also mandatory when categorizing a resource.
690 S. Sengupta et al.

Differently, in the cloud arena, no such concerns are found for processing,
storage, power, or network (i.e., bandwidth) capacities of the cloud resources.
Interestingly, researchers have focused on managing the security, privacy, and
reliability aspects [18,19] in the cloud paradigm. We also found that
cost management (i.e., charges for access and utilization of resources), is one of
the crucial aspects to build an efficient Cloud platform [20]. Indeed, several works
propose a cost model for system resources and services [18,20]. After a compre-
hensive reading (see [12] for more details) we may conclude that: (i) most of
the cloud-resources have some unique features - e.g., they are centralized, fault-
tolerant [18,20–22] etc.; (ii) in IoT, edge or fog, resources are geographically
distributed [12,21,23], while much more agile than cloud resources and suitable
for supporting real-time services. In summary, we may quickly assess that there
is significant variety and diversity among system resources, which undoubtedly
makes resource categorization a challenging task.

3 An Overview of the F2C Architecture

F2C has been introduced as a framework intended both to optimize resource
utilization and to improve service execution, the latter through an optimal
mapping of services onto the resources best suiting their demands. To that end,
resource categorization becomes an essential component of a successful F2C
deployment. Consequently, an accurate description of the different attributes
and characteristics used to categorize a resource is required. As an
illustrative example, Fig. 1 depicts how an F2C deployment in a smart city may
look, mainly representing the technological integration of the Cloud, Fog/Edge
and IoT resources.

Fig. 1. Fog and cloud resources deployment in a Smart City.



It is apparent that in a smart city, as shown in Fig. 1, several distinct and
heterogeneous fog node devices may be found (i.e., smartphone, smartwatch, car,
etc.), and many IoT devices (i.e., surveillance camera, temperature sensor,
etc.) can be connected or attached to them. We also identify that several
devices may become leader fog nodes (i.e., road-side units, etc.), each serving
as the fog service provider of a particular fog area of the smart city.
Similarly, many different cloud providers may take over the provisioning of
cloud facilities to the citizens. The F2C solution, designed to be a
coordinated management platform, facilitates optimal management of this broad
set of heterogeneous resources (i.e., IoT devices, fog nodes, cloud resources,
etc.). Unquestionably, the supervision of heterogeneous resources is
a crucial characteristic of the F2C platform. Thus, before devoting efforts to
categorize the resources, it is mandatory to revisit what the main aspects of
F2C are. In [11], F2C is proposed as a combined, hierarchical and layered
architecture, where cloud resources reside at the top layer, the IoT layer at
the bottom consists of the set of IoT devices, and several intermediate fog
layers are considered, bringing together the collection of heterogeneous edge
devices. In
Fig. 2, we represent the hierarchical structure of the F2C architecture. Following
the hierarchical structure of the F2C architecture and considering the smart city
scenario, we found that the leader fog node of each fog area is responsible for
communicating with the upper layer resources in the F2C platform. Also, the
leader fog node is responsible for informing the upper layer resources about the
total resource information of its fog area.
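The aggregation role of the leader fog node described above can be sketched as a small tree of resources. The classes, fields and capacity numbers below are illustrative assumptions of ours, not part of any F2C specification.

```python
# Illustrative sketch of the F2C hierarchy: IoT/edge devices attach to a
# fog area, one leader node per area aggregates its area's resources, and
# the upper (cloud) layer sees only the per-area totals reported upward.

class Device:
    def __init__(self, name, cpu_cores, storage_gb):
        self.name = name
        self.cpu_cores = cpu_cores
        self.storage_gb = storage_gb

class FogArea:
    def __init__(self, leader_name):
        self.leader_name = leader_name  # e.g. a road-side unit
        self.devices = []

    def attach(self, device):
        self.devices.append(device)

    def aggregate(self):
        """What the leader reports to the upper-layer resources."""
        return {
            "area": self.leader_name,
            "cpu_cores": sum(d.cpu_cores for d in self.devices),
            "storage_gb": sum(d.storage_gb for d in self.devices),
        }

area = FogArea("roadside-unit-7")
area.attach(Device("smartphone", 8, 128))
area.attach(Device("smartwatch", 2, 16))
print(area.aggregate())
```

Summarizing per-area totals in this way is what lets the cloud layer reason about a fog area without tracking every heterogeneous device inside it.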
It is worth emphasizing that the concept of a fog node has not yet converged
towards a unique definition. While this paper uses the fog node concept in a
general view, to represent a device belonging to the fog (or, by extension, to
F2C), readers interested in this topic may find a more elaborate discussion of
its meaning in [24].

Fig. 2. Hierarchical architecture of F2C paradigm.


692 S. Sengupta et al.

The authors in [11] highlight the need for a comprehensive device control
and management strategy to build an efficient F2C system. As said earlier, it is
essential to correctly identify the resources' characteristics and behaviour for a
successful F2C deployment. Indeed, adequately identifying resource characteristics
and behaviours helps to build an efficient taxonomy of resources for the F2C
paradigm. Such a taxonomy would support the service-to-resource mapping
process and thus optimize service execution. In the next Sect. 4, we
present the taxonomic view of F2C resources and later, in Sect. 5, we present a
generalized resource model for the F2C paradigm.

4 Proposing a Taxonomy of F2C Resources


The enormous diversity and heterogeneity envisioned for the whole set of
resources, from the edge up to the cloud, makes resource management in Fog-
to-Cloud a challenging effort. From a broad perspective, it is evident that the
closer to the top (i.e., the cloud), the larger the capacities: computation,
processing and storage capabilities are higher in the cloud than in the fog, and
higher in the fog than at the edge. Interestingly, in the envisioned F2C scenario
this assessment is even more nuanced, leveraging the different layers foreseen
for the fog. Indeed, in F2C different layers are identified to meet the
characteristics of distinct devices. Thus, considering the current state-of-the-art
contributions, the specific layered architecture defined in F2C and the potential
set of attributes characterizing each layer, we propose a taxonomy for
characterizing resources in an F2C system, as described next.
In the collaborative model foreseen in an F2C system, devices may participate
as 'Consumer', 'Contributor', or 'Both'. When a device acts as a 'Consumer',
it joins the F2C system to execute services, thus being a pure resource consumer.
When acting as a 'Contributor', the device offers its resources to both itself
and third-party users (in a future collaborative scenario) to run services.
Finally, some resources can act as 'Both', not only accessing (i.e., consuming)
some services but also contributing their resources to support service execution.
Thus, according to the participation role, resources in an F2C system may in a
first approach be classified into three distinct types. However, although the
participation role is a key aspect, many other attributes and characteristics
must be considered as well in order to accommodate the large heterogeneity
of resources, including Device attributes (Hardware, Software, Network
specification, etc.), IoT components & Attached components (Sensors,
Actuators, RFID tags, Other attached device components), Security & Privacy
aspects (Device hardware security, Network security and Data security), Cost
information (Chargeable device, Non-Chargeable device), and History &
Behavioural information (Participation role, Mobility, Life span, Reliability,
Information of the device location, etc.).
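The three participation roles can be made concrete with a small sketch (our illustration, not code from the paper; the role and function names are our own):

```python
from enum import Enum

# Illustrative sketch of the three participation roles in the
# collaborative F2C model (names are ours, not from the paper).
class Role(Enum):
    CONSUMER = "consumer"        # joins the system only to execute services
    CONTRIBUTOR = "contributor"  # offers its resources to itself and third users
    BOTH = "both"                # consumes services and contributes resources

def shares_resources(role: Role) -> bool:
    # Only contributors and 'both' devices offer resources to run services.
    return role in (Role.CONTRIBUTOR, Role.BOTH)

assert shares_resources(Role.BOTH) and not shares_resources(Role.CONSUMER)
```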

Fig. 3. The ontology-based F2C resource classification.

4.1 Taxonomy Modeling: Based on Ontology

In this paper, we present an F2C resource taxonomy model leveraging a proposed
ontology. To that end, we adopt the classification method proposed by
Gomez-Perez et al. [25]. According to this ontological model, the modeling
elements are divided into five basic modeling primitives: classes, relations,
functions, axioms, and instances. The ontology model O is depicted in Fig. 3
and is expressed as:

O = {C, R, F, A, I} (1)

C represents the set of classes or concepts, which can be further subdivided
into basic classes Ci. R represents the collection of relations, mainly
containing four basic types: part-of, kind-of, instance-of and attribute-of. F rep-
resents the collection of functions which can be formalized as:

F = C1 × C2 × C3 × ... × Cn−1 → Cn (2)

A represents the collection of axioms, and I represents the collection of instances.


Based on the ontological model described above, this paper analyzes the basic
elements of parameters C (class) and R (relation), according to the attributes
and expected behaviour for the whole set of resources in an F2C system. This
analysis will help to both propose the resource taxonomy for F2C and build the
resource description model for F2C.
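As an illustration of how the primitives O = {C, R, F, A, I} and the four relation types might be instantiated for the F2C taxonomy, consider the following minimal Python sketch (the class and relation entries are our own illustrative choices, not a structure given in the paper):

```python
from dataclasses import dataclass, field

# Minimal sketch of the ontology primitives O = {C, R, F, A, I}.
@dataclass
class OntologyClass:
    name: str
    subclasses: list = field(default_factory=list)  # kind-of hierarchy

@dataclass
class Relation:
    kind: str      # one of: part-of, kind-of, instance-of, attribute-of
    source: str
    target: str

# C: classes -- the five top-level F2C classes from the taxonomy
classes = [OntologyClass(n) for n in (
    "Device attributes", "IoT components & Attached components",
    "Security & Privacy aspects", "Cost information",
    "History & Behavioural information")]

# R: relations between concepts (examples)
relations = [
    Relation("part-of", "Hardware components", "Device attributes"),
    Relation("instance-of", "laptop-1", "Device attributes"),
]

# F: functions, A: axioms, I: instances would complete the model
ontology = {"C": classes, "R": relations, "F": [], "A": [], "I": []}
```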

4.2 F2C Resource Taxonomy: View of the Class Diagram


Adopting the ontological model described above and following the attributes and
expected behaviour of the F2C system resources, Fig. 4 depicts, in the form of
a class diagram, the taxonomy proposed for F2C resources.

Fig. 4. Class diagram of the F2C Resource taxonomy: a completed model in Protégé.

According to the proposed class diagram, all resources in F2C can be initially
classified according to five different classes, each one further divided into several
sub-classes. Next, we present a brief description of each class and subclasses.

1. Device attributes - Devices participating in an F2C system can be classified
according to their hardware, software and networking specifications, and also
by considering their type.
– Hardware components - In an F2C system, the storage, processor, main mem-
ory, graphics processing unit, and power source information of a device
helps to classify it further.
– Software components - To participate in any service-oriented comput-
ing paradigm, devices must have an entry point, i.e., some 'software' or
'application'. We assume that devices can join an F2C system in two ways:
(i) the device has the application or software copy installed, or; (ii) the
device connects to another device running the application or software copy.
According to the F2C architecture, two types of entry point are identified
for F2C resources: (i) one for cloud resources, and; (ii) another one for
the fog resources. This characteristic must also be considered to classify
F2C resources. Finally, the operating system information, together with
information about other installed apps and APIs, will also help classify them.
– Device type - Devices participating in an F2C system can be either phys-
ical or virtual devices.

– Networking information - Given the large diversity of devices envisioned
in an F2C system, devices are expected to use several different networking
standards and technologies (e.g., Wi-Fi, Bluetooth). Hence, information
about the networking standards and supported technologies must also be
considered to classify F2C resources. Finally, being a central attribute in
the networking arena, we identify bandwidth as a key parameter to
characterize F2C resources as well.
2. IoT components & Attached components information - The resources working
in an F2C system may have sensors, actuators, RFID tags and other
attached device components (e.g., webcam, printer). Therefore, resources
can be further classified according to the information about these sensors,
actuators, RFID tags and other attached device components.
– Sensors - F2C resources may have various kinds of sensors attached (e.g.,
temperature sensors, proximity sensors). Therefore, this information
must also be considered.
– Actuators - Similarly, many different actuators may be attached to F2C
resources (e.g., mechanical, thermal or magnetic). Hence, this information
must also be considered.
– RFID tags - F2C resources may also have active or passive RFID tags
attached, which must be considered as well.
– Other attached device components - Many different external devices may
be connected to an F2C device (e.g., webcam, external audio system,
printer, scanner, Arduino kit). This information enriches the whole
system; thus it must undoubtedly be considered when classifying an F2C
resource.
3. Security & Privacy aspects - To build an efficient system, it is essential to
identify the set of system resources requiring protection and those that do
not. In an F2C system, according to the device hardware security, data
privacy and network security aspects, resources can be further classified as
protected and unprotected resources.
4. Cost information - In an F2C system, some resources are expected to offer
free access (i.e., at no cost), while others may require a fee for granting
access. Therefore, according to the access cost, F2C resources can be
classified into Chargeable and Non-Chargeable resources.
5. History & Behavioral information - Beyond considering information about
resource attributes and components, resources in an F2C system may also
be classified according to information on their present and past system
interactions, including resource reliability, life span, mobility, participation
role and location.

Based on the above analysis, this paper considers the following five classes to
categorize resources in the F2C system: device attributes, information on IoT
components and other attached devices, cost information, security and privacy
aspects, and history and behavioural information.

5 Presenting the Resource Description Model in F2C


In an F2C system, several fog areas may co-exist, as shown in Fig. 5 for an
illustrative smart city scenario. Each fog area is composed of one leader fog node,
various kinds of fog node devices, IoT elements (e.g., sensors, actuators) and other
elements (e.g., printers), putting together a heterogeneous set of resources
as well as different data sources. As stated earlier, such heterogeneity poses
challenges for global management. Thus, a correct and appropriate classification
of resources becomes a must in order to facilitate such coordinated management.
It is also necessary to have a clear, combined and generalized resource
description. In this section, we define such a generalized resource description
for devices in an F2C system.

Fig. 5. F2C scenario in a smart city.

Based on the previously described ontology and matching the F2C archi-
tecture, we conclude that designing the full set of classes and sub-classes for each
resource is a key challenge for managing the whole system's resources prop-
erly. Moving back to the smart city scenario depicted in Fig. 5, we may see, just
as an example, that the laptop contains the classes Device attributes, IoT com-
ponents & Attached components, Security & Privacy aspects, Cost information,
and History & Behaviours. Each class includes different subclasses, such
as Hardware components, Software components, Network information, etc. The
laptop also carries a device id and a user id. To build an efficient F2C system
and manage all the system resources properly, it is also essential to know the
total capacity and attributes of each fog area. Figure 5 shows that a fog area is
composed of a leader fog node, several types of fog node devices (e.g., laptop,
car, smartphone) and other attached devices (e.g., printer, light). Leveraging
such an attribute description, we first propose a generalized resource description
model for an F2C system in Subsect. 5.1, and later, in Subsect. 5.2, we focus
on identifying the aggregated resource information model for a particular fog
area.
Moreover, it is worth highlighting that for an F2C system to work properly,
the resource information must be stored efficiently. To that end, it is essential
to have a robust but light-weight database. Also, for efficient and reliable
transfer of the resource description information, it is mandatory to describe
the resource information in a standard, structured format. Considering the
characteristics of different databases and languages, and according to the
proposed model, in this paper we adopt a relational database management
system, SQLite, to store the resource information. Finally, to transfer the data
from one resource to another, we adopt JSON as the interchange format.
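A minimal sketch of this storage-and-sharing flow (our illustration; the paper does not give an implementation or a table schema, so the table and column names below are assumptions) could look as follows:

```python
import json
import sqlite3

# Sketch: each F2C device keeps its resource description in a local
# SQLite table and serializes it to JSON before sharing it with other
# F2C-enabled resources. Schema and values are illustrative.
conn = sqlite3.connect(":memory:")  # a real device would use a file
conn.execute("""CREATE TABLE resource_info (
    device_id INTEGER PRIMARY KEY,
    user_name TEXT,
    description_json TEXT)""")

description = {
    "user_name": "craax_user123",   # example values as in Listing 1.1
    "device_id": 11078934576,
    "Device_attributes": {
        "Hardware_components": {
            "Main_Memory_information_(in_MB)": {"Total": 32768,
                                                "Available": 13968}
        }
    },
}

# Store the description locally ...
conn.execute("INSERT INTO resource_info VALUES (?, ?, ?)",
             (description["device_id"], description["user_name"],
              json.dumps(description)))

# ... and later export the JSON description file for sharing.
row = conn.execute("SELECT description_json FROM resource_info").fetchone()
shared = json.loads(row[0])
```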

5.1 Generalized Resource Description Model: A Single Resource

To participate in any service-oriented computing platform, devices must have an
entry point, i.e., some 'software' or 'application', to join. In the F2C system,
devices can join in two ways: (i) they have the 'application' or 'software'
installed on their device, or; (ii) they connect to another device that has the
'application' or 'software'. So, considering the ontology-based resource
classification model proposed in Sect. 4, and for the sake of illustration aligned
to the smart city scenario depicted in Fig. 5, all devices in the smart city are
denoted as R, and all devices endowed with the F2C-enabled 'software' or
'application' copy are denoted as RF2C. Hence, according to our proposed
resource taxonomy of an F2C system, RF2C ⊆ R. The devices that do not have
the F2C-enabled 'software' or 'application' can also join the F2C system through
a connection with an F2C-enabled device. They can be known as 'Other
attached device components' of

the F2C-enabled device. We present the generalized resource description model
for the F2C-enabled device in tuple form, as follows:

RF2C =
<
  user name; device id;
  Device attributes: <
    Hardware components: <
      Storage information;
      Main memory information;
      Processor information;
      Power source information;
      GPU & FPGA information
    >;
    Software components: <
      Apps & APIs: <
        F2C app: <
          cloud resource app;
          fog resource app
        >;
        Other apps & APIs
      >;
      Operating system
    >;
    Network information: <
      Bandwidth information;
      Networking standards information
    >;
    Resource type: <
      Physical device;
      Virtual device
    >
  >;
  IoT components & Attached components: <
    Sensors;
    Actuators;
    RFID tags;
    Other attached device components
  >;
  Security & Privacy aspects: <
    Device hardware security;
    Network security;
    Data privacy
  >;
  Cost information: <
    Chargeable device; Non-Chargeable device
  >;
  History & Behaviors: <
    Participation role; Mobility; Life span;
    Reliability;
    Information of the device location;
    Resource sharing information
  >
>

Before sharing the resource information, all resources (RF2C) in the F2C sys-
tem keep a copy of their resource information according to the generalized
resource description model, using their local database (i.e., SQLite) to store
their resource and component information. To share the resource information
efficiently with other F2C-enabled resources, we adopt JSON to produce a
standard, formatted description file. In Listing 1.1, we present the resource
description file of an F2C-enabled laptop, based on JSON. The description file
contains detailed information about the hardware (e.g., total and currently
available storage, RAM), software (e.g., OS information, F2C app information),
IoT and other attached components (sensors and other connected device
information), and history & behavioural information (e.g., current location,
participation role) of the F2C-enabled laptop.

Listing 1.1. The JSON-formatted resource description file for an F2C-enabled laptop:
an example

{
    "user_name": "craax_user123",
    "device_id": 11078934576,
    "Device_attributes": {
        "Hardware_components": {
            "Storage_information_(in_MB)": {
                "Total": 122880,
                "Available": 965890
            },
            "Main_Memory_information_(in_MB)": {
                "Total": 32768,
                "Available": 13968
            },
            "Processor_information": {
                "Processor_maker": "Intel Core i7-8550U CPU @ 1.80GHz",
                "Available_percentage_of_processor": 90.7,
                "Processor_architecture": "X86_64"
            },
            ...
        },
        "Software_components": {
            "Operating_system": "Windows-10-10.0.16299-SP0",
            "Apps_&_APIs": {
                "F2C_app": "fog_resource_app",
                "Other_apps_&_APIs": [
                    "Adobe Acrobat Reader DC",
                    "AMD Software",
                    ...
                ]
            }
        },
        ...
    },
    "IoT_components_&_attached_device_components": {
        ...
    },
    ...
}
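To illustrate how another F2C resource might consume such a description file, the following sketch (ours, not the authors' implementation) parses the JSON and extracts the entry-point and available-memory fields; the key names follow Listing 1.1, with the extraction spacing in the key spellings normalized:

```python
import json

# Illustrative helper: given a JSON resource description like Listing 1.1,
# report which F2C entry point the device uses and how much main memory
# it currently has available.
def summarize(description_json: str) -> dict:
    d = json.loads(description_json)
    attrs = d["Device_attributes"]
    return {
        "device_id": d["device_id"],
        "entry_point": attrs["Software_components"]["Apps_&_APIs"]["F2C_app"],
        "available_memory_mb":
            attrs["Hardware_components"]
                 ["Main_Memory_information_(in_MB)"]["Available"],
    }

# A trimmed description file, as a fog node might receive it.
doc = json.dumps({
    "device_id": 11078934576,
    "Device_attributes": {
        "Hardware_components": {
            "Main_Memory_information_(in_MB)": {"Total": 32768,
                                                "Available": 13968}},
        "Software_components": {
            "Apps_&_APIs": {"F2C_app": "fog_resource_app"}},
    },
})
info = summarize(doc)
```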

5.2 Resource Description Presentation: Aggregated Model in Each
Hierarchy of F2C
As shown in Fig. 5, several fog areas may be included in a smart city, each of
them providing F2C services to the citizens. The policies used to define the fog
areas are out of the scope of this paper. However, it is apparent that correct
management of the whole set of resources in the fog areas is essential to make
the F2C system accurate and efficient. Unfortunately, since each fog area is
built from distinct resources, differing not only in quantity but also in typology,
the processing, storage, power and networking capacities may differ for each
individual fog area, thus endowing each particular fog area with distinct
characteristics and features. This scenario makes the management of all fog
areas notably challenging, hindering the objective of building an efficient
F2C system. To mitigate this problem, a clear description of the entire set of
capacities and characteristics of each individual fog area is mandatory.

Fig. 6. Resource information sharing: from Fog to Cloud.

We previously defined that, in the F2C system, devices that share their
resources participate in the system as 'Contributor' or 'Both'. Let us consider
Fig. 6 as an illustration of that cooperative scenario. We may see that 'Fog
Area 1' contains one leader fog node and two fog node devices (i.e., a smartphone
and a laptop), along with other connected devices (e.g., a printer and a bulb).
Let us assume that the two fog node devices and the leader fog node participate
in the system as 'Both'. In this case, the two fog node devices share their
resource information with the leader fog node. Once the leader fog node receives
the resource information from the two fog node devices, it aggregates all of it,
along with its own resource and component information, to form the resource
information for that particular fog area. Then, the leader fog node shares this
aggregated information with the higher layer in the F2C architecture. To make
this work, a strategy to aggregate the resource information must be defined. To
that end, next, we propose a generalized aggregated resource description model
for the F2C system. We identify the

aggregated resource description model as aRDF2C, and its structure is described
as follows:

aRDF2C =
<
  fog node id; fog area id;
  total number of the attached F2C enabled resources;
  main memory capacity info (in MB): <
    total available main memory;
    F2C resource with highest main memory;
    F2C resource with lowest main memory
  >;
  storage capacity info (in MB): <
    total available storage size;
    F2C resource with highest storage size;
    F2C resource with lowest storage size
  >;
  processor info: <
    processing capacity info (in percentage): <
      average of processing capacity;
      F2C resource with highest processing;
      F2C resource with lowest processing
    >;
    processor core info (number of cores): <
      average of total number of cores;
      F2C resource with highest processor core;
      F2C resource with lowest processor core
    >
  >;
  gpu capacity (in MB): <
    total available gpu capacity;
    F2C enabled resource with highest gpu;
    F2C enabled resource with lowest gpu
  >;
  power info, remaining time (in seconds): <
    average time of power remaining;
    F2C resource with highest power remaining;
    F2C resource with lowest power remaining
  >;
  IoT & other attached devices info: <
    sensors type info;
    actuators type info;
    RFID tag type info;
    other attached device info
  >;
  Security & Privacy score: <
    average score for F2C resources;
    F2C enabled resource with highest score;
    F2C enabled resource with lowest score
  >
>

From this aggregated resource information, it is easy to see that it differs
considerably from the generalized resource description model of a single F2C
resource. After obtaining all the resource information of a fog area, the leader
fog node of that area aggregates it and builds an aggregated description file
according to the model described above. The aggregated description file only
contains the leader fog node id, the fog area id, the total number of fog nodes,
the total capacity of main memory, storage, GPU, etc., information about the
highest and lowest main memory, storage, processing and GPU capacities
among the F2C-enabled fog nodes of the respective fog area, and so on. After
creating the aggregated resource information file, the leader fog node shares it
with the upper layer resources of the F2C paradigm.
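The aggregation step performed by the leader fog node can be sketched as follows (an illustrative reduction over one of the aRDF2C fields; the dictionary field names, and the choice of Python, are our assumptions rather than the paper's implementation):

```python
# Sketch of how a leader fog node might build the aRD_F2C structure from
# per-device resource descriptions. Only the main-memory field is shown;
# storage, processor, GPU, power, IoT and security fields would be
# aggregated in the same way.
def aggregate(fog_node_id, fog_area_id, devices):
    """devices: list of dicts with 'device_id' and 'main_memory_mb' keys."""
    by_mem = sorted(devices, key=lambda d: d["main_memory_mb"])
    return {
        "fog_node_id": fog_node_id,
        "fog_area_id": fog_area_id,
        "total_attached_f2c_resources": len(devices),
        "main_memory_capacity_info_mb": {
            "total_available": sum(d["main_memory_mb"] for d in devices),
            "highest": by_mem[-1]["device_id"],  # device with most memory
            "lowest": by_mem[0]["device_id"],    # device with least memory
        },
    }

# Example: the two fog node devices of 'Fog Area 1'.
area = aggregate("leader-1", "area-1", [
    {"device_id": "laptop", "main_memory_mb": 13968},
    {"device_id": "smartphone", "main_memory_mb": 2048},
])
```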

6 Conclusion
In this paper, we started by highlighting the need to define a resource model to
ease the management of an F2C system. To that end, we first presented a
taxonomy for F2C resources. Leveraging the taxonomy along with the recent
literature, we proposed an ontology-based resource description model for the F2C
system, where resources are described by the classes of device attributes, IoT
components and attached components, security and privacy aspects, cost
information, and historical and behavioural information. The proposed model was
illustrated in a smart city scenario for the sake of understanding. Finally, we
also introduced the model for a generalized aggregated resource description
file, aimed at sharing the resource information of a particular fog area. This
work is presented as a first step towards a comprehensive resource categoriza-
tion system, which is considered mandatory for an efficient F2C management
framework. Still, many challenges remain to be addressed: for example, consid-
ering active/non-active resources in the aggregated information or, even more
interestingly, defining a strategy to implement the resource sharing described in
F2C. Moreover, the classification of F2C resources will help us find the
proper resources to map to services in the F2C paradigm. Implicitly, this work
will also help us define a cost model for F2C resources, which in turn will
support optimal solutions for choosing the resources to execute tasks and
provide services. These challenges, as well as many other open issues, will
constitute the core of our future work as a follow-up to this paper.

Acknowledgment. This work was supported by the Spanish Ministry of Economy
and Competitiveness and the European Regional Development Fund, under contract
TEC2015-66220-R (MINECO/FEDER), and by the H2020 EU mF2C project, reference
730929.

References
1. Department of Economic and Social Affairs. World Urbanization Prospects The
2014 Revision - Highlights. United Nations (2014). https://esa.un.org/unpd/wup/
publications/files/wup2014-highlights.pdf. ISBN 978-92-1-151517-6
2. Ismail, N.: What will the smart city of the future look like? Information Age
Magazine, 21 September 2017. http://www.information-age.com/will-smart-city-
future-look-like-123468653/
3. van der Meulen, R.: Gartner Says 8.4 Billion Connected “Things” Will Be in Use
in 2017, Up 31 Percent From 2016. Press Release by the Gartner, Inc. (NYSE: IT),
7 February 2017. https://www.gartner.com/newsroom/id/3598917
4. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., Ayyash, M.: Internet
of Things: a survey on enabling technologies, protocols, and applications. IEEE
Commun. Surv. Tutor. 17(4), 2347–2376 (2015)
5. Manyika, J., Woetzel, J., Dobbs, R., Chui, M., Bisson, P., Bughin, J., Aharon,
D.: Unlocking the potential of the Internet of Things. McKinsey&Company,
June 2015. https://www.mckinsey.com/business-functions/digital-mckinsey/our-
insights/the-internet-of-things-the-value-of-digitizing-the-physical-world

6. Cisco Systems Inc.: New Cisco Internet of Things (IoT) System Provides
a Foundation for the Transformation of Industries. Cisco News, 29 June
2015. https://investor.cisco.com/investor-relations/news-and-events/news/news-
details/2015/New-Cisco-Internet-of-Things-IoT-System-Provides-a-Foundation-
for-the-Transformation-of-Industries/default.aspx
7. Martin, R.: The Installed Base of Smart Meters Will Surpass 1 Billion by 2022,
Posted in the Newsroom of the Navigant Research, 11 November 2013
8. Ahmed, E., Yaqoob, I., Gani, A., Imran, M., Guizani, M.: Internet-of-Things-based
smart environments: state of the art, taxonomy, and open research challenges. IEEE
Wirel. Commun. 23(5), 10–16 (2016)
9. Mahmud, R., Buyya, R.: Fog computing: a taxonomy, survey and future directions.
In: Internet of Everything, pp. 103–130. Springer (2018)
10. Bonomi, F., Milito, R., Natarajan, P., Zhu, J.: Fog computing: a platform for
internet of things and analytics. In: Big Data and Internet of Things: A Roadmap
for Smart Environments, pp. 169–186. Springer (2014)
11. Masip-Bruin, X., Marin-Tordera, E., Jukan, A., Ren, G.J., Tashakor, G.: Foggy
clouds and cloudy fogs: a real need for coordinated management of fog-to-cloud
(F2C) computing systems. IEEE Wirel. Commun. Mag. 23(5), 120–128 (2016)
12. Sengupta, S., Garcia, J., Masip-Bruin, X.: A literature survey on ontology of differ-
ent computing platforms in smart environments. arXiv preprint arXiv:1803.00087
(2018)
13. Perera, C., Qin, Y., Estrella, J.C., Reiff-Marganiec, S., Vasilakos, A.V.: Fog com-
puting for sustainable smart cities: a survey. ACM Comput. Surv. (CSUR) 50(3),
32 (2017)
14. Dorsemaine, B., Gaulier, J.-P., Wary, J.-P., Kheir, N., Urien, P.: Internet of Things:
a definition & taxonomy. In: 2015 9th International Conference on Next Generation
Mobile Applications, Services and Technologies, pp. 72–77 (2015)
15. Vaithiya, S., Bhanu, M.S.: Ontology based resource discovery mechanism for mobile
grid environment. In: 2013 2nd International Conference on Advanced Computing,
Networking and Security (ADCONS), pp. 154–159 (2013)
16. Karaoglanoglou, K., Karatza, H.: Directing requests in a large-scale grid system
based on resource categorization. In: 2011 International Symposium on Perfor-
mance Evaluation of Computer & Telecommunication Systems (SPECTS), pp.
9–15 (2011)
17. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of Things (IoT): a
vision, architectural elements, and future directions. Futur. Gener. Comput. Syst.
29(7), 1645–1660 (2013)
18. Arianyan, E., Ahmadi, M.R., Maleki, D.: A novel taxonomy and comparison
method for ranking cloud computing software products. Int. J. Grid Distrib. Com-
put. 9(3), 173–190 (2016)
19. Parikh, S.M., Patel, N.M., Prajapati, H.B.: Resource management in cloud com-
puting: classification and taxonomy. arXiv preprint arXiv:1703.00374 (2017)
20. Zhang, M., Ranjan, R., Haller, A., Georgakopoulos, D., Menzel, M., Nepal, S.:
An ontology-based system for cloud infrastructure services’ discovery. In: 2012 8th
International Conference on Collaborative Computing: Networking, Applications
and Worksharing (CollaborateCom), pp. 524–530 (2012)
21. Baccarelli, E., Naranjo, P.G.V., Scarpiniti, M., Shojafar, M., Abawajy, J.H.: Fog
of everything: energy-efficient networked computing architectures, research chal-
lenges, and a case study. IEEE Access 5, 9882–9910 (2017)

22. Moscato, F., Aversa, R., Di Martino, B., Fortiş, T.-F., Munteanu, V.: An analysis
of mOSAIC ontology for cloud resources annotation. In: 2011 Federated Conference
on Computer Science and Information Systems (FedCSIS), pp. 973–980 (2011)
23. Botta, A., de Donato, W., Persico, V., Pescapè, A.: Integration of cloud computing
and internet of things: a survey. Futur. Gener. Comput. Syst. 56, 684–700 (2016)
24. Marin-Tordera, E., Masip-Bruin, X., Garcia, J., Jukan, A., Ren, G.J., Zhu, J.: Do
we all really know what a Fog Node is? Current trends towards an open definition.
Comput. Commun. 109, 117–130 (2017)
25. Gomez-Perez, A., Fernandez-Lopez, M., Corcho, O.: Ontological engineering: with
examples from the areas of knowledge management, e-commerce and the semantic
web. Data Knowl. Eng. 46(1), 41–64 (2003)
Predicting Head-to-Head Games
with a Similarity Metric and Genetic
Algorithm

Arisoa S. Randrianasolo1(B) and Larry D. Pyeatt2


1 Lipscomb University, Nashville, TN, USA
arisoa.randrianasolo@lipscomb.edu
2 South Dakota School of Mines and Technology, Rapid City, SD, USA
larry.pyeatt@sdsmt.edu

Abstract. This paper summarizes our approach to predicting head-to-head
games using a similarity metric and a genetic algorithm. The prediction
is performed by simply calculating the distances of any two teams that
are set to play each other to an ideal team. The team nearest to the
ideal team is predicted to win. The approach uses a genetic algorithm as
an optimization tool to improve the accuracy of the predictions. The
optimization is performed by adjusting the ideal team's statistical data.
Soccer, basketball, and tennis are the sport disciplines used to test the
approach described in this paper. We compare our predictions with those
made by Microsoft's bing.com. Our findings show that this approach
appears to do well on team sports (accuracies above 65%) but is less
successful for predicting individual sports (accuracies below 65%). In our
future work, we plan to do more testing on team sports as well as studying
the effects of the different parameters involved in the genetic algorithm's
setup. We also plan to compare our approach to ranking- and point-based
predictions.

Keywords: Sports predictions · Similarity calculation ·
Genetic algorithm

1 Introduction

International sport competitions, professional sports, college sports, and even
regional and city tournaments now keep track of various data about the teams
involved in the competitions. Those data can be available right away as the
games progress, or may be extracted later by some experts after reviewing the
video of the games. The challenge is finding ways to make use of the available
data. Is there enough information in the data to predict the outcomes of future
games? What algorithm and calculations can be utilized to predict the outcomes
of future games? Those are some of the questions that teams and coaches may
have after receiving their statistical data from a tournament.
c Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 705–720, 2019.
https://doi.org/10.1007/978-3-030-02686-8_53
706 A. S. Randrianasolo and L. D. Pyeatt

In this paper, we summarize our approach to predicting the outcomes of head-
to-head games in tournaments. Our approach differs from others because it does
not utilize all the possible historical data that can be gathered about the teams
involved. It also does not take into consideration the past performance of
the teams in the same competition from previous years or previous matches. We
restrict the data used to perform the prediction to the most recent team
statistics in the tournament of interest.
This restriction of the data is based on the assumption that the performance
in the current tournament of interest is most indicative of the current strength
of the teams. Also, by using this restriction, this approach can be used in tour-
nament settings where teams do not necessarily know much about each other
beforehand. This latter reason is our main motivation for this research.
Our approach uses a similarity metric over the most recent statistical data of
the teams involved in the tournament to predict the outcomes of head to head
games. To improve the predictions, we use a genetic algorithm as an optimization
mechanism.
This paper will cover some of the previous work done in terms of head to
head game predictions. Then, it will explain our early observations in predicting
head to head games. The fourth section of the paper will cover the approach
that we are proposing. This will be followed by the testing and the results of
our experiments. The last section of this paper will contain our conclusions and
future work.

2 Related Work

The idea of predicting the outcome of a pairwise sport matchup is a research
topic for many investigators. Chen and Joachims explained the use of a general
probabilistic framework for predicting the outcome of pairwise matchups using
the blade-chest model [1,2]. A player or a team was represented by a blade
vector and a chest vector. The winning and losing probabilities were decided
based on the distance between one player’s blade to his opponent’s chest and vice
versa. The blade and chest vectors were extracted from the player’s data and the
game features. This approach trained on historical data to tweak the parameters
involved in the model by maximizing the log-likelihood of the probability of the
known winner.
Machine learning is also used widely in sport predictions. In most of these
cases, as in the approach described previously, a considerable amount of historical
data is needed to train the model. For example, Pretorius and Parry trained a
random forest on past rugby games in order to predict the 2015 Rugby World
Cup [3]. The accuracy of the predictions made by their system was no different
than the prediction made by human agents on the 2015 Rugby World Cup.
Brooks, Kerr, and Guttag trained an SVM to predict if possessions will result
in shots in soccer [5]. The approach was applied on the Spanish La Liga soccer
league using the data from the 2012–2013 season. It had an Area Under the ROC
(Receiver Operating Characteristic) curve of 0.79. Microsoft’s Bing Predicts [11]
Predicting Head-to-Head Games 707

also claims to use machine learning in its prediction. Bing Predicts claimed
a 63.5% accuracy on predicting the 2016 NCAA March Madness and a 75%
prediction accuracy on the 2015 Women’s Soccer World Cup.
Evolutionary systems are also used in sport and matchup predictions. Soares
and Gilbert used a Particle Swarm Optimizer (PSO) to predict Cross-country
results [4]. Their approach transformed the team features from historical data
into a set of rankings. The rankings were multiplied by weights to produce the
final rankings. The final rankings were then evaluated from the results of the
cross-country meets as follows: A team received 1 point for each team it beat if
it was ranked ahead of that team, and received 1 point for each team it lost to
if it was ranked behind that team. A team received 0 points for each team in
which the opposite of either case above happened [4]. The goal of this approach
was to maximize the points earned through producing the final rankings used in
the predictions, and the way to do so is to optimize the weights using a PSO.
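For concreteness, that scoring rule can be sketched as follows. This is our reading of the rule in [4], and the `beat` data structure is a hypothetical stand-in for the observed meet results:

```python
def ranking_points(final_ranking, beat):
    """Score a predicted final ranking against observed meet results:
    a team earns 1 point for each team it beat and is ranked ahead of,
    and 1 point for each team it lost to and is ranked behind; all other
    pairings earn 0 points. `beat[a][b]` is True when team a finished
    ahead of team b in the meet (hypothetical structure)."""
    pos = {team: i for i, team in enumerate(final_ranking)}
    points = 0
    for a in final_ranking:
        for b in final_ranking:
            if a == b:
                continue
            if beat[a][b] and pos[a] < pos[b]:
                points += 1  # ranked ahead of a team it beat
            elif beat[b][a] and pos[a] > pos[b]:
                points += 1  # ranked behind a team it lost to
    return points
```

The PSO then searches for the feature weights whose induced ranking maximizes this score.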
Another approach that uses ranking as a way to predict performance is to
create a complex-network based on different measures, such as clustering coeffi-
cient and node degree [6,7]. With this approach, a team sports league is viewed
as a network of players, coaches, and teams in evolution. The network was used
to predict teams’ behavior and to predict rankings. The rankings could be used
to predict the league’s winner. This approach was applied to NBA (National Bas-
ketball Association) and MLB (Major League Baseball) data and has achieved
a 14% rank prediction accuracy improvement over its best competitor [7].
The first difficulty in using many of these approaches resides in finding the
appropriate functions or transforms that can extract the needed information
from the historical data. Our approach uses a simple similarity metric and the
well known genetic algorithm to create the predictions. The second difficulty
arises from the struggle of finding enough data to train the model. In well known
competitions with well known teams, finding historical data is not a problem.
However, in less known competitions, such as regional or city or invitational or
small tournaments, finding historical data is not always possible. This is the
reason why we restrict the data that we are using to perform the predictions to
only consist of the most recent teams’ statistics in the tournament of interest.
We apply this restriction in all of the tournaments that we are predicting regardless
of whether they are well known or not.

3 Early Observation
This research started because of a soccer coach who came to us with all sorts
of data about his team, and was struggling to find a way to use it to his team’s
advantage. The data that we received had no information about the other teams
in the division, so we could not do much in predicting head to head outcomes. To
continue this research, we started exploring publicly available data from other
sports competitions.
Our early observations led us to notice that teams work to improve some
trackable features in the game. For example, in soccer, a team may try to maxi-
mize its ball possession time or possession percentage and minimize the amount
of red cards that its players receive. In basketball, for example, a team may
try to minimize its turnover rate and maximize its three-point percentage. The
teams’ statistics data can be represented in a vector format.
This observation led us to begin considering the idea of an ideal team. This
ideal team has the statistics that all teams, in a particular sport of interest, try
to reach. The values for the features in the ideal team’s vector can be hard to
reach for some teams. These values may even be impossible, but they should
represent what a perfect team should look like in the sport of interest. Now
that we have team vectors and an ideal team vector, we can start working on
predictions.

Fig. 1. Similarity calculation.

The prediction is done simply by computing the similarity of each team to
the ideal team. A simple illustration of this idea is expressed in Fig. 1. Since the
data are vectorized, a distance or similarity calculation is not hard to compute,
and there are several distance measures that could be used. Given two teams
that are due to play in a head to head game, we predict that the nearest one to
the ideal team, represented by the ideal vector, will win the game.
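As a sketch of this prediction step (not the authors' code; the feature values below are hypothetical, and Cosine distance is used here as one example of the measures we tried):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two statistics vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def predict_winner(team_a, team_b, ideal):
    """Predict the team whose statistics vector is most similar to the ideal vector."""
    (name_a, stats_a), (name_b, stats_b) = team_a, team_b
    if cosine_similarity(stats_a, ideal) >= cosine_similarity(stats_b, ideal):
        return name_a
    return name_b

# Hypothetical 3-feature vectors: (possession %, goals scored, red cards)
ideal = [100, 10, 0]
team_a = ("Team A", [62, 7, 1])
team_b = ("Team B", [48, 5, 3])
print(predict_winner(team_a, team_b, ideal))  # Team A
```

Any of the distance measures discussed later could be substituted for the similarity function here.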

3.1 Early Testing and Results


We started testing our approach on three competitions in 2016. The test competitions were the 2016 U.S. Open (tennis), the 2016 FIBA Africa Under 18
(basketball), and the 2016 UEFA European Championship (soccer) also known
as “euro 2016”. The 2016 FIBA Africa Under 18 was the ideal setup to test our
approach. The teams in that competition did not appear to have much infor-
mation about each other, and somehow had to utilize the statistics about the
other teams in order to know their winning chances and to create strategies.
The drawback of using this particular basketball competition was that it was
not a well known competition. We were not able to compare our predictions
to other live predictions. This was the reason why we tested our approach on the
2016 U.S. Open and the 2016 UEFA European Championship competitions.
In the 2016 U.S. Open, we used the data from rounds one through four to
predict the quarterfinals. Then, we utilized the data from rounds one through
four plus the quarterfinals to predict the semifinals. Finally, we employed the
data from rounds one through four plus the quarterfinals and the semifinals to
predict the finals.
In the 2016 FIBA Africa Under 18, we used the data from the group stage
to predict the quarterfinals. Then, we followed the same procedure as in the
2016 U.S. Open. In the 2016 UEFA European Championship, we also utilized
the data from the group stage to predict the round of 16 and then we followed
the same approach as in the previous two sports mentioned above. The features
used during this early testing are shown in Table 1.

Table 1. Features used in early testing

2016 U.S. Open:
  sets played, tie breaks played, total games, total aces, total double faults,
  1st serves in %, 1st serve points won %, 2nd serve points won %,
  return games won, winners, unforced errors

2016 FIBA Africa:
  points per game, field goal attempts, field goal %, 3-points attempts,
  3-points %, free throw attempts, free throw attempts %

2016 UEFA Euro:
  total corner for, total corner against, offside, fouls committed,
  fouls suffered, yellow cards, red cards, pass completed, ball possession %,
  total attempt, attempt on target, attempt off target, attempt blocked,
  attempt against woodwork, total goals, total goals against

There was no specific study done in choosing the predictors during the early
observation part of this research. We used our knowledge about these three
different sports in choosing those predictors. We also used our knowledge about
these sports in selecting the ideal vectors. An in-depth study on how to pick the
predictors was left to the next phase of this research, which is summarized in
the next section.

The ideal vector for the 2016 U.S. Open Men’s competition was:

(3, 0, 18, 20, 0, 100, 100, 100, 9, 80, 0).

The ideal vector for the 2016 U.S. Open Women’s competition was:

(2, 0, 12, 20, 0, 100, 100, 100, 6, 80, 0).

The ideal vector for the 2016 FIBA Africa Under 18 was:

(150, 150, 80, 50, 50, 50, 80).

The ideal vector for the 2016 UEFA European Championship was:

(100, 0, 0, 0, 100, 0, 0, 100, 100, 200, 200, 0, 0, 0, 60, 0).

In our early exploration, we used three different similarity or distance measures: Cosine distance, Manhattan distance (L1-norm), and Euclidean distance
(L2-norm). The prediction accuracies, from 0 to 1 (0% to 100%), of these three
distance metrics are captured in Fig. 2. We compared our predictions to the
predictions from Microsoft’s Bing Predicts. The results of this comparison are
shown in Fig. 3.

Fig. 2. Comparison of similarity measures.



Fig. 3. Comparison with Bing.com.

4 Prediction Method

4.1 Choosing a Similarity Metric

Our early exploration seems to indicate that switching similarity metrics based
on the sport is a possible way to proceed. However, we want to create
a general approach that will work for any type of sport, so we fixed our choice
on Cosine distance as our similarity metric for the rest of this research.
The reasoning for this choice is that, over the combined predictions (U.S. Open
Men + U.S. Open Women + 2016 UEFA European Championship) recorded
in Fig. 3, the accuracy for Cosine distance was 18/30, the same as the
Manhattan distance’s accuracy, while the combined accuracy for the Euclidean
distance was 17/30. We did not break the tie between Cosine and Manhattan; we
simply picked one.

4.2 Effect of the Ideal Vector

Our early observation has also pointed out that a change in the ideal vector will
affect the predictions. In the early observation, we used our personal knowledge
about the sports that we were dealing with to set up the ideal vectors. We do not
claim to be experts in these sports or the competitions that we dealt with in
the early observation, and the ideal vectors that we picked could be erroneous.
Also, we want the ideal vector to closely reflect the trends in the
tournament. In one tournament, for example, a ball possession of 60% could be
enough to win the tournament, while in another, a ball possession of 80% may
be needed to win. This prompted us to employ an optimization
strategy to improve the ideal vector.

4.3 Approach

Our approach is summarized by Fig. 4. It starts with an input file containing the
statistics from the early rounds of the tournament and a starting ideal vector
that we manually selected based on what we think the statistics of an ideal
team should look like. The approach then makes its first set of predictions for
the next set of games to be played in the tournament. The predictions
are compared to the observed outcomes to obtain the accuracy of the ideal
vector. Next, a genetic algorithm is called to optimize the ideal vector [8–10].
The genetic algorithm utilizes the same input file containing the team statistics
and the observed outcomes to calculate the fitness of each candidate ideal vector.
The best ideal vector is saved for the next set of predictions.

Fig. 4. The overall approach.

For the second set of predictions, the approach utilizes the best ideal vector
produced by the genetic algorithm in the first optimization and the team statis-
tics from the beginning of the tournament up to the most recent games. For
the third set of predictions, the approach uses the best ideal vector produced by
the genetic algorithm in the second optimization and the team statistics from
the beginning of the tournament up to the most recent games. The approach
continues in this manner until it produces the last set of predictions,
after which no further optimization is required.
As a tournament moves from one round to the next, there are usually fewer
games to predict. This means that the accuracy of any prediction method can
potentially go down from one round to the next as the tournament progresses. This
is another reason why we use a genetic algorithm optimization between rounds
so that the approach can learn the trend or the pattern from the previous rounds
to better predict the next round.
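The round-by-round loop above can be sketched as follows; every callable here is a hypothetical stand-in for a component described in the text:

```python
def run_tournament(rounds, ideal, predict_round, accuracy, optimize_ideal):
    """Predict each round with the current ideal vector, measure accuracy
    against the observed outcomes, then optimize the ideal vector (the text
    uses a genetic algorithm) before the next round. `rounds` is a list of
    (round_stats, games, outcomes) triples; all callables are stand-ins."""
    stats = []        # team statistics accumulated from the start of the tournament
    accuracies = []
    for i, (round_stats, games, outcomes) in enumerate(rounds):
        stats.extend(round_stats)              # only this tournament's statistics
        preds = predict_round(games, ideal, stats)
        accuracies.append(accuracy(preds, outcomes))
        if i < len(rounds) - 1:                # last round needs no optimization
            ideal = optimize_ideal(ideal, stats, games, outcomes)
    return accuracies
```

The key point is that the optimizer only ever sees games that have already been played, so each round's predictions are made before their outcomes are known.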

4.4 Short Introduction to Genetic Algorithms

A genetic algorithm is a search and optimization process inspired by biology.
It is based on the survival of the fittest. In a genetic algorithm, a potential
solution is called an individual. An individual is, most of the time, expressed as a
string of characters. The set of individuals is known as a population.
Each individual in the population has a fitness value. This value indicates
the individual’s quality of being a solution to the problem. Individuals in the
population are allowed to mate to produce new solutions.
The mating part of the algorithm is known as a crossover. During a crossover,
two individuals exchange characters to form a new string. Individuals that par-
ticipate in crossovers are selected by a process that is based on their fitness.
The more fit individuals have higher chances to participate in crossovers. The
eventual exchange of characters is governed by a crossover probability. This
probability determines whether the exchange is allowed to happen or not.
Individuals in the population can also mutate with a defined probability
known as the mutation probability. The mutation is usually performed by alter-
ing one or more characters from the string that represents an individual.
In each iteration, the algorithm attempts to create new individuals. The
algorithm halts when an individual with the desired fitness is generated, or when
the maximum number of allowed iterations is reached. Other halting conditions
can also be adopted.

4.5 Genetic Algorithm Setup

The individuals in the population are candidate ideal vectors. The population
size is fixed to 100 for our experiments, and the probability of crossover is set
to 60%. A roulette wheel selection approach is used to select the parents for the
crossover. Other selection approaches exist and we plan to study those more in
our future work. The crossover is performed at a fixed point which is always at
the middle of the candidate ideal vectors. The probability of mutation is 0.1%.
The mutation is performed by either adding 1, with a probability of 50%, or
subtracting 1, with a probability of 50%, to each of the values of a candidate
ideal vector that have a range greater than or equal to 5. It is performed by
adding or subtracting 0.1 with equal probability for values that have range less
than 5. Each candidate ideal vector is used to predict the set of games that
just happened, for which the observed outcomes are available. The fitness of each
candidate ideal vector is simply its accuracy on the games that just
happened. The genetic algorithm is allowed to generate 1200 new individuals
before it stops. Then survival of the fittest is used to place a new individual in
the population. Our genetic algorithm approach was modeled after the approach
described by Goldberg [9].
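This setup can be sketched as follows. This is illustrative only: the clamp-to-range step and the replace-worst survival policy are our assumptions where the text leaves details open, and the fitness function is a stand-in for prediction accuracy on already-played games:

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate (roulette wheel) parent selection."""
    total = sum(fitnesses)
    if total == 0:                       # degenerate case: pick uniformly
        return random.choice(population)
    r = random.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(population, fitnesses):
        acc += fit
        if acc >= r:
            return ind
    return population[-1]

def midpoint_crossover(a, b, p_cross=0.60):
    """Single fixed-point crossover at the middle of the vector."""
    if random.random() >= p_cross:
        return list(a)
    mid = len(a) // 2
    return list(a[:mid]) + list(b[mid:])

def mutate(vec, ranges, p_mut=0.001):
    """Per-value mutation: +/-1 for features with range >= 5, +/-0.1 otherwise."""
    out = list(vec)
    for i, (lo, hi) in enumerate(ranges):
        if random.random() < p_mut:
            step = 1.0 if (hi - lo) >= 5 else 0.1
            out[i] += step if random.random() < 0.5 else -step
            out[i] = min(max(out[i], lo), hi)   # assumption: clamp to feature range
    return out

def optimize_ideal(fitness, ranges, pop_size=100, n_children=1200):
    """Steady-state GA: each child replaces the worst individual if at least
    as fit. `fitness` scores a candidate ideal vector (here, its prediction
    accuracy on the games already played)."""
    pop = [[random.uniform(lo, hi) for lo, hi in ranges] for _ in range(pop_size)]
    fits = [fitness(ind) for ind in pop]
    for _ in range(n_children):
        child = midpoint_crossover(roulette_select(pop, fits),
                                   roulette_select(pop, fits))
        child = mutate(child, ranges)
        worst = min(range(pop_size), key=fits.__getitem__)
        cf = fitness(child)
        if cf >= fits[worst]:
            pop[worst], fits[worst] = child, cf
    best = max(range(pop_size), key=fits.__getitem__)
    return pop[best]
```

Note that roulette selection assumes non-negative fitness, which holds here because the fitness is an accuracy in [0, 1].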

5 Testing and Results


We revisited the competitions in the early observation with this new proposed
approach. The results are captured by Fig. 5. Since there is some randomness
in generating the population in the genetic algorithm, we ran the approach 51
times on each set of games that it tried to predict. We then used a majority
rule between any two teams going head to head: the team predicted to win most
often across the 51 attempts was taken as the winner. We chose 51, an odd
number, because we are interested in a win-or-lose situation and not a draw.
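The majority rule over the repeated runs is straightforward to implement (illustrative sketch):

```python
from collections import Counter

def majority_winner(predictions):
    """Given the predicted winners of one matchup across repeated runs
    (51 in our experiments, an odd number, so a tie between two teams is
    impossible), return the team predicted most often."""
    return Counter(predictions).most_common(1)[0][0]

runs = ["Team A"] * 26 + ["Team B"] * 25   # hypothetical 51-run tally
print(majority_winner(runs))               # Team A
```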
There seems to be an improvement in predicting the men’s U.S. Open tour-
nament and a slight improvement on the 2016 UEFA European Championship,
so we tested the approach with two other tournaments: the 2016–2017 UEFA
Champions League and the 2017 Australian Open. Before proceeding to use the
approach, we ran a correlation analysis on the predictor variables to help us in
choosing the features for the ideal vectors and the vectors for each team. Figure 6
has the correlation plot for the 2016–2017 UEFA Champions League competition
and Fig. 7 has the correlation plot for the 2017 Australian Open competition.
Table 2 shows the final features for the team vectors and the ideal vectors that
were used in the testing.
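A screening step of this kind can be sketched in pure Python (Pearson correlation; the feature columns and the 0.9 threshold below are hypothetical, not the actual analysis):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two feature columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.9):
    """Drop one feature from each highly correlated pair, a simple way to
    screen predictors before building the team and ideal vectors."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# hypothetical per-team columns
features = {
    "total goals":       [10, 7, 5, 3],
    "attempt on target": [40, 30, 22, 15],   # strongly correlated with goals
    "red cards":         [0, 2, 1, 3],
}
print(drop_redundant(features))  # ['total goals', 'red cards']
```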
Tables 3 and 4 show the ranges of the possible values for each feature in the
ideal vectors for the two competitions. The starting ideal vector for the
2016–2017 UEFA Champions League was:

(60, 0, 200, 0, 0, 0, 100, 100, 100, 100, 0, 100, 0, 0).

The starting ideal vector for the 2017 Australian Open Men’s competition was:

(1, 80, 1, 90, 100, 30, 1, 100, 100).

The starting ideal vector for the 2017 Australian Open Women’s competition was:

(0, 80, 0, 80, 100, 20, 0, 100, 100).
The accuracy of the predictions can be seen in Fig. 8.
Over the eleven competitions that we have predicted so far, we also tracked
how this approach performed as it moved from the first round of predictions to
later rounds. Some competitions had more rounds than others; however, they
all had at least three rounds. The accuracies of the predictions from the first
three rounds are summarized in Fig. 9.

Fig. 5. Revisit of the early observations.

Fig. 6. Correlation for the 2016–2017 UEFA Champions League.



Fig. 7. Correlation for the 2017 Australian Open.

Table 2. Features used in the testing.

2016–2017 UEFA Champions League:
  total goals, total goal against, attempt on target, attempt off target,
  attempt blocked, attempts against woodwork, pass completion percentage,
  ball possession, total corner for, cross completion, fouls committed,
  fouls suffered, yellow cards, red cards

2017 Australian Open:
  tie break, winners, unforced errors, service points won, percentage of
  first serve in, aces, double fault, percentage of 1st serve point won,
  percentage of 2nd serve point won

Table 3. Range of values in the Ideal Vector for the 2016–2017 UEFA Champions
League.

Total goals 0–40
Total goal against 0–40
Attempt on target 10–90
Attempt off target 10–90
Attempt blocked 5–60
Attempts against wood work 0–10
Pass completion percentage 50–100
Ball possession 30–70
Total corner for 0–10
Cross completion 5–90
Fouls committed 50–200
Fouls suffered 50–200
Yellow cards 5–30
Red cards 0–3

Table 4. Range of values in the Ideal Vector for the 2017 Australian Open.

Men Women
Tie break 0–2 0–1
Winners 90–100 0–50
Unforced errors 0–2 0–50
Service points won 90–100 0–70
Percentage of first serve in 90–100 50–100
Aces 20–40 0–20
Double fault 0–2 0–10
Percentage of 1st serve pt. won 90–100 50–100
Percentage of 2nd serve pt. won 90–100 50–100

Fig. 8. Performance on the 2016–2017 UEFA Champions League and the 2017 Aus-
tralian Open.

Fig. 9. Performance from one round to the next.



6 Conclusion and Future Work


In this paper, we have summarized our approach on predicting head to head
games using only the statistical data of what the teams have been doing in a
tournament of interest. Our approach is aimed at predicting local or regional
competitions where little or no historical data is available by using a simple
similarity metric and the well known genetic algorithm.
Individual sports are more difficult to predict than team sports. Injuries,
emotions, fatigue, and other factors have a greater effect on individuals than they
do on teams. For individual sports, these factors must be taken into consideration
to improve the prediction. Taking social media input (similar to what Microsoft’s
bing.com [11] claims to be doing) or using additional data about each game,
such as time of the day or weather or public support (similar to what was
done by Chen and Joachims [1,2]) can be beneficial. Even in the work by Chen
and Joachims [1,2], predictions are still only around 60% and 70% in tennis.
Team performances in collective sports appear to have more regularity, making
predictions a little bit less difficult than individual sports.
The performance of our approach on the 2016 FIBA Africa Under 18, the 2016
UEFA European Championship, and the 2016–2017 UEFA Champions League,
indicates that it has the potential to do well for predicting the outcomes of
team and collective sports head to head games. We plan to test this approach
on more team sports in the future. Our future goals also include finding a way
to automatically infer the initial ideal vectors from the initial data rather than
depending on a human agent to generate them. We also plan to engage in a
more detailed analysis of the parameters involved in the genetic algorithm. This
will involve exploring different selection approaches and experimenting with the
crossover and the mutation probability. Our aim in this endeavor is not only
to improve the accuracy but also to uncover the reason for the slight drop in
performance between the second and third rounds of predictions, as
we can see from Fig. 9. We also plan to compare our predictions to ranking based
and point based predictions.

References
1. Chen, S., Joachims, T.: Predicting matchups and preferences in context. In: Pro-
ceedings of the 22nd ACM SIGKDD International Conference on Knowledge Dis-
covery and Data Mining, KDD 2016, San Francisco, California, USA, pp. 775–784.
ACM, New York (2016)
2. Chen, S., Joachims, T.: Modeling intransitivity in matchup and comparison data.
In: Proceedings of the Ninth ACM International Conference on Web Search and
Data Mining, WSDM 2016, San Francisco, California, USA, pp. 227–236. ACM,
New York (2016)
3. Pretorius, A., Parry, D.A.: Human decision making and artificial intelligence: a
comparison in the domain of sports prediction. In: Proceedings of the Annual
Conference of the South African Institute of Computer Scientists and Information
Technologists, SAICSIT 2016, Johannesburg, South Africa, pp. 32:1–32:10. ACM,
New York (2016)
4. Soares, C., Gilbert, J.E.: Predicting cross-country results using feature selec-
tion and evolutionary computation. In: The Fifth Richard Tapia Celebration of
Diversity in Computing Conference: Intellect, Initiatives, Insight, and Innovations,
TAPIA 2009, Portland, Oregon, pp. 41–45. ACM, New York (2009)
5. Brooks, J., Kerr, M., Guttag, J.: Developing a data-driven player ranking in soccer
using predictive model weights. In: Proceedings of the 22nd ACM SIGKDD Inter-
national Conference on Knowledge Discovery and Data Mining, KDD 2016, San
Francisco, California, USA, pp. 49–55. ACM, New York (2016)
6. Vaz de Melo, P.O.S., Almeida, V.A.F., Loureiro, A.A.F.: Can complex network
metrics predict the behavior of NBA teams? In: Proceedings of the 14th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining,
KDD 2008, Las Vegas, Nevada, USA, pp. 695–703. ACM, New York (2008)
7. Vaz de Melo, P.O.S., Almeida, V.A.F., Loureiro, A.A.F., Faloutsos, C.: Forecasting
in the NBA and other team sports: network effects in action. ACM Trans. Knowl.
Discov. Data 6, 13:1–13:27 (2012)
8. Mitchell, M., Forrest, S.: Genetic algorithms and artificial life. Artif. Life 1, 267–
289 (1994)
9. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learn-
ing. Addison-Wesley Longman Publishing Co. Inc., Boston (1989)
10. Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory
Analysis with Applications to Biology, Control and Artificial Intelligence. MIT
Press, Cambridge (1992)
11. Bing Predicts. http://www.bing.com/explore/predicts. Accessed 17 July 2017
Artificial Human Swarms Outperform Vegas
Betting Markets

Louis Rosenberg and Gregg Willcox

Unanimous AI, San Luis Obispo, CA, USA
Louis@Unanimous.AI

Abstract. Swarm Intelligence (SI) is a natural phenomenon in which biological
groups amplify their collective intelligence by forming dynamic systems. It has
been studied extensively in bird flocks, fish schools, and bee swarms. In recent
years, AI technologies have enabled networked human groups to form systems
modeled on natural swarms. Referred to as Artificial Swarm Intelligence or ASI,
this approach has been shown to significantly amplify the effective intelligence
of human groups. The present study compares the predictive ability of ASI to
Vegas betting markets when forecasting sporting events. Groups of average sports
fans were required to forecast the outcome of 200 hockey games in the NHL
league (10 games per week for 20 weeks). The expected win rate for Vegas
favorites was 62% across the 200 games based on the published odds. The ASI
system achieved a win rate of 85%. The probability that the ASI system outper‐
formed Vegas by chance was very low (p = 0.006), indicating a significant result.
Researchers also compared the ROI generated from two betting models: one that
wagered weekly on the top Vegas favorite, and one that wagered weekly on the
top ASI favorite. At the end of the 20-week period, the Vegas model generated a
41% financial loss, while the ASI model generated a 170% gain.

Keywords: Swarm intelligence · Artificial intelligence · Collective intelligence

1 Background

Artificial Swarm Intelligence (ASI) is a powerful method for amplifying the predictive
accuracy of networked human groups [1, 2]. A variety of prior studies, across a wide
range of prediction tasks, have demonstrated that real-time “human swarms” can produce
more accurate forecasts than traditional “Wisdom of Crowds” methods such as votes,
polls, and surveys [3]. For example, a study in 2015 tested the ability of human swarms
to predict the outcome of college football games. The ASI system tapped the real-time
intelligence of 75 amateur sports fans to predict 10 bowl games. As individuals, the
participants averaged 50% accuracy when predicting outcomes against the spread. When
forecasting together as a real-time ASI system, those same participants achieved 70%
accuracy against the spread [2]. Similar increases have been found in other studies,
including a five-week study that tasked human participants, connected as an ASI system,
with predicting a set of 50 soccer matches in the English Premier League. Results showed
a 31% increase in accuracy when participants were connected in ASI swarms as

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 721–729, 2019.
https://doi.org/10.1007/978-3-030-02686-8_54

compared to forecasting as individuals [4]. The human swarms also outperformed the
BBC’s machine-model known as “SAM” over those same 50 games [5].
Although previous research has shown that ASI technology can empower human
groups to outperform individual forecasters as well as traditional crowd-based methods,
no formal study has been conducted to compare the predictive ability of ASI to major
betting markets [6]. To address this need, the current study was conducted to rigorously
compare “human swarms” to Vegas betting markets, assessing the accuracy rates and
the financial returns across a large set of predictions. Specifically, this large-scale study
required groups of sports fans to forecast the outcome of 200 games in the National
Hockey League (NHL), structured as 10 games per week for 20 consecutive weeks.

1.1 From Crowds to Swarms


When collecting input from human groups, the phrase “Wisdom of Crowds” is generally
used whenever the input is aggregated to generate output of higher accuracy [7–9]. The
basic premise, also referred to as Collective Intelligence, dates to the early 1900s and
generally involves collecting survey data from groups of individuals and computing a
statistical result. When comparing “swarms” and “crowds”, the primary difference is
that in crowd-based systems, the participants provide isolated input that is aggregated
in external statistical models, whereas in swarm-based systems the participants interact
in real-time, “thinking together” as a unified system. In other words, crowds are statis‐
tical constructs while swarms are closed-loop systems in which the participants act,
react, and interact in real-time, converging together on optimized solutions.
ASI systems are generally modeled on biological systems such as fish schools, bird
flocks, and bee swarms. The present study uses Swarm AI technology from the company
Unanimous AI. This technology is modeled primarily on the collective decision-making
processes employed by honeybee swarms [4]. This framework was chosen because
honeybee populations have been shown to reach optimal decisions by forming real-time
closed-loop systems [10]. In fact, at a structural level, the decision-making methods
observed in honeybee swarms are very similar to the decision-making processes
observed in neurological brains [11, 12].
When reaching decisions, swarms and brains both employ large populations of
simple excitable units (i.e., bees and neurons) that operate in parallel to (a) integrate
noisy data about the world, (b) weigh competing alternatives when a decision needs to
be made, and (c) converge on preferred decisions as a unified system. In both brains and
swarms, outcomes are arrived upon through competition among sub-populations of
simple excitable units. When one sub-population exceeds a threshold level of support,
the corresponding alternative is chosen by the system. In honeybees, this enables the
group to converge on optimal decisions across a wide range of tasks, for example when
selecting the best possible hive location from a large set of options. Researchers have
shown that honey bees converge on the best possible solution to this life-or-death deci‐
sion approximately 80% of the time [13, 14].
Artificial Human Swarms Outperform Vegas Betting Markets 723

1.2 Creating Human Swarms


Unlike birds and bees and fish, humans have not evolved the natural ability to swarm,
as we don’t possess the subtle skills that other organisms use to establish high speed
feedback-loops among their members. Fish for example, when moving in schools, detect
faint vibrations in the water around them. Birds, when flocking, detect subtle motions
propagating through the formation. Honeybees, when reaching decisions as a unified
swarm, use complex body vibrations called a “waggle dance” to encode their changing
views. To enable real-time swarming among groups of networked humans, specialized
software is required to close the loop among all members. To solve this problem, a
software platform (swarm.ai) was created to allow human groups to form real-time
systems from anywhere in the world [1, 6]. Modeled after the decision-making process
of honeybee swarms, swarm.ai enables groups of networked users to work in parallel to
(a) integrate noisy information, (b) weigh competing alternatives when making deci‐
sions, and (c) converge on decisions, together as a real-time closed-loop system.
As shown in Fig. 1 below, artificial swarms answer questions by moving a graphical
puck to select among a set of answer options. Each participant provides their input by
moving a graphical magnet with a mouse, touchpad, or touchscreen. By adjusting their
magnet in relation to the moving puck, real-time participants can express their individual
intent on the system as a whole. The input from each user is not a vote, but a continuous
stream of vectors that varies freely over time. Because all members of the networked
population can vary their intent continuously in real-time, as moderated by AI algo‐
rithms, the artificial swarm explores the decision-space, not based on the input of any
single individual, but based on the emergent dynamics of the system as a whole. This
enables complex deliberations to emerge among all participants at the same time,
empowering the group to collectively consider each of the options and converge on the
solution that best represents their combined knowledge, wisdom, and insights.

Fig. 1. Real-time ASI choosing between options.


724 L. Rosenberg and G. Willcox

It is critical to point out that participants not only vary the direction of their individual intent, but also modulate its magnitude by manipulating the distance between their
magnet and the puck. Because the puck is in fluid motion throughout the decision-space,
users need to continuously update the position and orientation of their magnet so that it
stays close to the puck’s outer rim. This is important, for it requires participants to remain
engaged throughout the decision-making process, continuously evaluating and re-eval‐
uating their individual thoughts and feelings with respect to the question at hand. If they
stop moving their magnet in relation to the changing position of the puck, the distance
grows and their applied sentiment wanes.
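The closed-loop puck-and-magnet dynamic described above can be sketched as a toy simulation. The swarm.ai internals are not published in this excerpt, so everything below (the name `step_puck`, the exponential falloff of a magnet's pull with distance, the time step) is an illustrative assumption, not the actual system: it only captures the behavior described, where engaged participants near the puck exert influence and idle participants' sentiment wanes.

```python
import math

def step_puck(puck, magnets, dt=0.05, max_pull=1.0, falloff=2.0):
    """Advance the puck one tick by summing per-participant pull vectors.

    puck: (x, y) position. magnets: list of (x, y) magnet positions.
    A magnet's influence decays with its distance from the puck,
    mimicking the engagement requirement described in the text.
    """
    fx = fy = 0.0
    for mx, my in magnets:
        dx, dy = mx - puck[0], my - puck[1]
        dist = math.hypot(dx, dy)
        if dist == 0:
            continue  # magnet sitting on the puck exerts no directed pull
        weight = max_pull * math.exp(-falloff * dist)
        fx += weight * dx / dist
        fy += weight * dy / dist
    return (puck[0] + dt * fx, puck[1] + dt * fy)
```

Iterating `step_puck` with magnets updated each tick gives the emergent, no-single-controller trajectory described in the text: opposing participants cancel, and the puck drifts toward options with net support.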

2 Forecasting Study

To quantify the forecasting ability of human swarms as compared to large Vegas betting markets, a 20-week study was conducted using randomly selected human subjects. The
participants, who were self-reported sports fans, were split into weekly groups. Each
group consisted of 25 to 35 participants, all of whom logged in remotely to the swarm.ai
system. Human subjects were paid $3.00 for their participation in each weekly session,
which required them to forecast the outcome of all ten hockey games being played that
night. All subjects were required to make their forecasts in two ways – (a) as individuals reporting on a standard online survey, and (b) as contributors to a real-time ASI system.
For each hockey game, participants were tasked with forecasting the winner and the
margin of victory, expressed as either (a) the team winning by 1 goal, or (b) the team winning by 2 or more goals. The margins were chosen to match common Vegas gambling
spreads. Figure 2 below shows a snapshot of a human swarm comprised of 31 partici‐
pants in the process of predicting a match between Toronto and Calgary.

Fig. 2. ASI in the process of forecasting an NHL game.



As shown in Fig. 2, each real-time swarm is tasked with selecting from among four
outcome options, indicating which team will win and which margin is most likely.
Again, the participants do not cast discrete votes but express their intent continuously
over time, converging together as a system. The image shown in Fig. 2 is a snapshot of
the system as it moves across the decision-space and converges upon an answer, a
process that generally required between 10 and 60 s to complete.
In addition to forecasting each individual game, participants were asked to identify
which of the weekly predictions is the most likely to be a correct assessment. In other
words, which of the teams forecast to win their games that week should be deemed the
“pick of the week” as a consequence of being the most likely team to win its game.
Figure 3, shown below, is an example of an ASI system in the process of identifying the pick
of the week. As shown, the system is selecting from among six possible teams to decide
which is most likely to win its game that week.

Fig. 3. ASI in process of identifying “Pick of the Week”.

2.1 Wagering Protocol

By collecting predictions for each of the 10 weekly games as well as a top “pick of the
week”, forecasting data was collected across all 20 weeks for accuracy comparison
against Vegas betting markets. To enable ROI comparisons against betting markets, two
standardized betting models were tracked across the 20-week period. In both models,
an initial simulated betting pool of $100 was created as the starting point for ROI
computations, and the pools were tracked over the 20-week period.

In “Wagering Model A,” a simple heuristic was defined which allocated weekly bets
equal to 15% of the current betting pool, dividing it equally across all ten weekly fore‐
casts made by the ASI system. In “Wagering Model B,” a similar heuristic was defined
which also allocated 15% of the current betting pool for use in weekly bets, but placed
the entire 15% upon one game, identified as “pick of the week”. Both pots were tracked
over the 20-week period, using actual Vegas payouts to compute returns. Vegas odds
used in this study were captured from www.sportsbook.ag, a popular online betting
market.
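The two wagering heuristics can be sketched as a small bankroll simulation. The paper does not state the odds format used for payouts, so the American moneyline settlement below (`settle`) is an assumption, and all function names are mine.

```python
def settle(stake, moneyline, won):
    """Return net profit for one bet at American moneyline odds (assumed format)."""
    if not won:
        return -stake
    return stake * (moneyline / 100.0) if moneyline > 0 else stake * (100.0 / -moneyline)

def model_a(bankroll, weekly_results):
    """Model A: bet 15% of the bankroll each week, split evenly over the games."""
    for games in weekly_results:            # games: list of (moneyline, won) pairs
        stake = 0.15 * bankroll / len(games)
        bankroll += sum(settle(stake, ml, won) for ml, won in games)
    return bankroll

def model_b(bankroll, weekly_picks):
    """Model B: bet the full 15% on the single 'pick of the week'."""
    for ml, won in weekly_picks:            # one (moneyline, won) pair per week
        bankroll += settle(0.15 * bankroll, ml, won)
    return bankroll
```

As a sanity check on the reported numbers: a 5.09% average weekly ROI compounds over 20 weeks to 1.0509^20 ≈ 2.70, consistent with the $100 → $270 outcome reported for Model B.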

3 Results

Across the set of 200 games forecast by the ASI system, an accuracy rate of 61% was
achieved. This compares favorably to the expected accuracy of 55% based on Vegas
odds (p = 0.0665). Of course, the more important skill in forecasting sporting events is
identifying which games can be predicted with high confidence as compared to those
games which are too close to call. This skill is reflected in the “pick of the week” gener‐
ated by the ASI system. Across the 20 weeks, the system achieved 85% accuracy in
correctly predicting the winner of the “pick of the week” game. This compares very
favorably to the expected accuracy of 62% based on Vegas odds.
Figure 4 below shows the distribution of Vegas Odds for the twenty selected “pick
of the week” games. As described above, the swarm-based system had a win rate of 85%
across these same games. This is a significant improvement, equivalent to reducing the
error in Vegas Odds by 61%. The probability that the swarm outperformed Vegas Odds
by chance was extremely low (p = 0.0057), indicating a highly significant result.
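One way to check a claim like this is an exact one-sided binomial tail, sketched below. The paper does not specify which statistical test it used, and the per-game Vegas win probabilities certainly varied around the 62% average, so treating the base rate as a flat 62% is a simplifying assumption of mine; the resulting p-value will therefore differ from the reported 0.0057.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more wins by luck alone."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 17 of 20 "pick of the week" wins against an assumed flat 62% base rate
p_value = binom_tail(20, 17, 0.62)
```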

Fig. 4. Results across 20 weeks of NHL predictions.



In addition, a betting simulation was run for each prediction set in which 15% of the
current bankroll was bet on each weekly prediction. The performance of this model,
when betting against Vegas is shown below in Fig. 5. Starting with $100 and investing
each week according to this strategy, the Pick of the Week strategy results in a gain of
$270.20, equivalent to a 20-week ROI of 170%, and a week-over-week average ROI of
5.09%. For comparison, betting on all of the swarm’s picks evenly (for a total of 15%
of the bankroll) results in $121.82, or a 20-week ROI of 21.8%, indicating that the swarm
is selecting better than randomly among its picks.

Fig. 5. Cumulative betting performance across 20 weeks.

While it’s impressive to achieve 170% ROI over 20 weeks, we can gain additional
insight into the significance of this outcome by comparing against additional baselines.
For example, we can compare these results to (a) randomly placed bets across all games
played as a means of assessing if the swarm bets across all games are as significant as
they appear, and (b) bets placed on the Vegas favorite each week as a means of assessing
if betting on the swarm’s top picks is as impressive as it seems.
These baselines are shown in Fig. 6 as the green line and red line, respectively.
Looking first at random betting across all games, the net outcome across 20 weeks was
$72.39, which equates to a 28% loss over the test period. This is significantly worse than
the $122 (22% gain) achieved by betting on all swarm-based forecasts. Even more
surprising, betting on the Vegas favorites each week resulted in a net outcome of $59,
which equates to a 41% loss over the 20-week test period. This is significantly worse
than the $270 (170% gain) achieved by betting on the swarm’s top picks.

Fig. 6. Swarm performance vs Baseline performance across 20 weeks.

4 Conclusions

Can real-time human swarms, comprised of average sports fans connected by swarming
algorithms, outperform the predictive abilities of largescale betting markets? The results
of this study suggest this is very much the case. As demonstrated across a set of 200
games during the 2017–2018 NHL hockey season, ASI systems comprised of approximately 30 typical sports fans were able to out-forecast Vegas betting markets. This was
most significant when the ASI system identified a “pick of the week” as the most likely
game to achieve the predicted outcome. Across the 20 weeks, the system achieved 85%
accuracy when predicting the “pick of the week” games, which compares favorably to
the expected accuracy of 62% based on Vegas odds. The probability that the system
outperformed Vegas by chance was extremely low (p = 0.006), indicating a highly
significant result.
In addition, when using the “pick of the week” within a simple automated wagering
heuristic, a simulated betting pool that started at $100, grew to $270 over the 20-week
period based on the swarm-based predictions. This was a 170% ROI. Additional work
is being conducted to optimize this wagering heuristic, as there appears to be room for
improvement when optimizing Vegas wagers based on a swarm-based predictive intel‐
ligence. Looking towards future research, additional studies are planned to better under‐
stand which types of problems are best suited for solutions using “human swarms” as
well as the impact of swarm size on output accuracy.

References

1. Rosenberg, L.: Human swarms, a real-time method for collective intelligence. In: Proceedings
of the European Conference on Artificial Life 2015, pp. 658–659
2. Rosenberg, L.: Artificial swarm intelligence vs human experts. In: 2016 International Joint
Conference on Neural Networks (IJCNN). IEEE

3. Rosenberg, L., Baltaxe, D., Pescetelli, N.: Crowds vs Swarms, a Comparison of Intelligence.
In: IEEE 2016 Swarm/Human Blended Intelligence (SHBI), Cleveland, OH (2016)
4. Baltaxe, D., Rosenberg, L., Pescetelli, N.: Amplifying prediction accuracy using human
swarms. In: Collective Intelligence 2017, New York, NY (2017)
5. McHale, I.: Sports Analytics Machine (SAM) as reported by BBC. http://blogs.salford.ac.uk/
business-school/sports-analytics-machine/
6. Rosenberg, L., Willcox, G.: Artificial Swarms find Social Optima. In: 2018 IEEE Conference
on Cognitive and Computational Aspects of Situation Management (CogSIMA 2018) –
Boston, MA (2018)
7. Bonabeau, E.: Decisions 2.0: The power of collective intelligence. MIT Sloan Manag. Rev.
50(2), 45 (2009)
8. Woolley, A.W., Chabris, C.F., Pentland, A., Hashmi, N., Malone, T.W.: Evidence for a
collective intelligence factor in the performance of human groups. Science 330(6004), 686–
688 (2010)
9. Surowiecki, J.: The Wisdom of Crowds. Anchor (2005)
10. Seeley, T.D., Buhrman, S.C.: Nest-site selection in honey bees: how well do swarms
implement the ‘best-of-N’ decision rule? Behav. Ecol. Sociobiol. 49, 416–427 (2001)
11. Marshall, J., Bogacz, R., Dornhaus, A., Planqué, R., Kovacs, T., Franks, N.: On optimal
decision-making in brains and social insect colonies. Soc. Interface (2009)
12. Seeley, T.D., et al.: Stop signals provide cross inhibition in collective decision-making by
honeybee swarms. Science 335(6064), 108–111 (2012)
13. Seeley, T.D.: Honeybee Democracy. Princeton University Press, Princeton (2010)
14. Seeley, T.D., Visscher, P.K.: Choosing a home: how the scouts in a honey bee swarm perceive
the completion of their group decision making. Behav. Ecol. Sociobiol. 54(5), 511–520
Genetic Algorithm Based on Enhanced
Selection and Log-Scaled Mutation
Technique

Neeraj Gupta1(B) , Nilesh Patel1 , Bhupendra Nath Tiwari2 ,


and Mahdi Khosravy3
1
Department of Computer Science and Engineering, Oakland University,
Rochester, MI, USA
{neerajgupta,npatel}@oakland.edu
2
INFN-Laboratori Nazionali di Frascati, Via. E. Fermi, 40 – I – 00044,
Frascati, Rome, Italy
bhupendray2.tiwari.phd@iitkalumni.org
3
Department of Electrical and Electronics Engineering,
Federal University of Juiz de Fora, Juiz de Fora, Brazil
mahdi.khosravy@ufjf.edu.br

Abstract. In this paper, we introduce selection and mutation schemes to enhance the computational power of the Genetic Algorithm (GA) for global optimization of multi-modal problems. The proposed operators make the GA an efficient optimizer in comparison with other variants of GA, with improved precision, consistency, and diversity. With the presented selection and mutation schemes, the improved GA, named the Enhanced Selection and Log-scaled Mutation GA (ESALOGA), selects the best chromosomes from a pool of parents and children after crossover. Indeed,
the proposed GA algorithm is adaptive due to the log-scaled mutation
scheme, which corresponds to the fitness of current population at each
stage of its execution. Our proposal is further supported via the sim-
ulation and comparative analysis with standard GA (SGA) and other
variants of GA for a class of multi-variable objective functions. Addi-
tionally, comparative results with other optimizers such as Probabilistic
Bee Algorithm (PBA), Invasive Weed Optimizer (IWO), and Shuffled
Frog Leap Algorithm (SFLA) are presented on a higher number of variables to show the effectiveness of ESALOGA.

Keywords: Selection operator · Mutation operator


Log-scaled mutation · Diversity preservation · Genetic algorithms
Metropolis algorithm

1 Introduction
Rapid industrial growth and utilization of the available resources efficiently are
of the prime importance nowadays, for example, route identification in traffic
systems, optimization of process allocation in maximizing production, utilization
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 730–748, 2019.
https://doi.org/10.1007/978-3-030-02686-8_55
Advances in Genetic Algorithm 731

of energy resources in power systems, optimizing VLSI circuits design, CAN


optimization in vehicles, etc. [1–12]. Most industrial problems are complex in nature and belong to combinatorial optimization, where the main focus is to optimize discrete variables for maximizing/minimizing the required objectives [1,2]. Two traditional methods are available to solve this type of problem: integer programming and dynamic programming, known as exact algorithms [3,5]. However, due to their computational complexity, such algorithms cannot be relied on when a fast solution is required for large optimization problems. Optimization in this respect may be critically important for the sustainable growth of industries competing in highly uncertain economic environments [3–5].
Hence, over the last two decades, as an alternative approach to solving combinatorial problems, a large number of researchers have focused on approximate methods that bring these problems close to their optimal state in a reasonably acceptable time. Thus, the development of heuristic algorithms in mathematics, engineering, etc. [6,7] has demonstrated successful implementations for real-life problems. As a result, a considerable number of heuristic evolutionary algorithms have been invented to work efficiently on linear/nonlinear, differentiable/non-differentiable, concave/convex problems with discrete variables [6–9]. A general description of complex functions
can be seen in [10] and their applications with discrete variables on power sys-
tem design in [11,12], and the capacity of the energy generators, the quantity of
goods produced, number of vehicles on the route, etc. in [13–15].
Since GA works on binary variables, it is hardware friendly, and many variants have been proposed to solve combinatorial problems. The literature survey shows ample scope to further improve it through an appropriate combination of mathematical modeling and heuristic concepts [9]. GA and its associated variants have been proven to give globally optimal solutions, especially for multi-modal non-differentiable/combinatorial/industrial problems [16–18]. Moreover, GA is very easy to implement and has the advantage that its operators can be developed in a simple process, inspired by genetic processes that have been rigorously investigated at large scale during the last two decades [1–19].
As developed by John Henry Holland [20], GA is inspired by the “survival of the fittest” principle, which mimics the natural process of evolution in terms of several operators: the selection, crossover, and mutation operators [20]. An adaptation of these operators has been analyzed and modeled by a large community of researchers, several of whom have improved GA by introducing novel approaches for selecting the fittest individuals, crossover variants, and mutation schemes. These improved models of GA keep the search from getting stuck in premature convergence. In the light of GA research,
this paper offers a combination of mathematical modeling and heuristic approach
together in order to find the global optimal solutions for multimodal nonlinear
functions. It is worth mentioning that over the last few decades GA has been
elected as a successful heuristic evolutionary technique for addressing various
732 N. Gupta et al.

global combinational industrial problems and it has been widely used due to its
simple structure, see for instance [9,16,21–23].
Regardless of the state of affairs, GA has powerful optimization fundamentals but a few drawbacks, which can be seen in a number of readings [8,9,16,24]. GA converges prematurely due to improper selection, crossover, and mutation probabilities and the associated criteria [25–27]. In these papers, variants of GA have been described as modifications of the GA model parameters, i.e., the selection method, crossover operator, mutation operator, and underlying probabilities. Based on [28], elitism ensures that winner chromosomes go into the next generation, which moves the search from the premature to the mature phase. This is exploited in Sect. 3. Hereby, in the light of the Adaptive GA [29], our proposal further gives motivation to evolve the mutation probability based on the present state of all candidates by using probabilistic modeling.
This paper is structured in seven sections. Firstly, in Sect. 2, we provide a brief step-by-step description of the GA algorithm, as our proposal arises as its improvement. In Sect. 3, as the most important part of this paper, a brief description of the proposed enhanced selection scheme and log-scaled mutation operator is provided. Consequently, Sect. 4 presents the binary-coded Enhanced Selection and Log-scaled Mutation Genetic Algorithm (ESALOGA), which, as an optimization package, solves combinatorial problems. Section 5 presents simulated results in comparison to other variants of GA and three real-coded optimizers on multi-modal benchmark functions. Finally, Sects. 6 and 7 respectively conclude the paper and give future research directions and improvements.

2 Binary Coded GA
A step-by-step operation of the binary-coded GA is presented [9], which first allows one to understand the concept of GA and the symbiotic integration of its different operators: selection, crossover, and mutation.
Step 1: At first, the parameters of GA are initialized: the crossover and mutation probabilities $P_c$ and $P_m$, such that $P_m \ll P_c$; the number of chromosomes in the population $s$; and the number of bits $l$ representing one variable, which decides the length of the chromosomes, $nl$ for $n$ variables in the chosen problem. The termination criterion, the maximum number of generations that the GA may run, is selected based on the problem size.
Step 2: To start the evolution process, the fitness of each chromosome in the population is calculated. In this process, the part of a binary chromosome representing a variable is decoded into the decimal value $d_n = \sum_{i=0}^{l-1} 2^i b_{ni}$, where $b_{ni} \in \{0, 1\}$ belongs to the $n$th variable. The value of the $n$th variable lies within the bounds $x_n^{(L)} \le x_n \le x_n^{(U)}$ and is calculated as $x_n = x_n^{(L)} + \frac{x_n^{(U)} - x_n^{(L)}}{2^{l_n} - 1} d_n$ based on its respective lower and upper bounds $x_n^{(L)}$ and $x_n^{(U)}$. After converting the variables into the required domain, the associated objective function $f(x)$ is calculated for all individuals represented by the chromosome strings in the population. For a minimization problem, the fitness function $F_s$ associated with chromosome $s$ is adopted as $F_s = \frac{1}{1 + f_s(x)}$, which is a function of the objective function $f_s(x)$.
Step 3: At this point, a selection operator selects the fittest chromosomes as the candidates for mating, based on roulette-wheel selection [9,30]. This is the first stage of the GA process for which multiple operators have been proposed: roulette wheel as in the standard GA, and tournament and uniform selection in variants of GA. In our proposal, we introduce an enhanced selection scheme which is utilized after Step 4 instead of in Step 3.
Step 4: In sequel, the crossover operator takes a number of strings from the mating pool using the fixed crossover probability $P_c$. For a selected pair of candidates, known as parents, a cross-site is generated randomly in the interval $(0, nl - 1)$ and the selected regions are swapped between the two strings. At this step, different crossover mechanisms have been proposed, such as single-point, multi-point, and uniform crossover. The use of different crossover techniques turns the standard GA into a variant of it.
Step 5: After the above serial processes, children chromosome strings arise; their population is known as the intermediate population, as in [9]. At this step, we have a pool of parents and their resulting offspring. Our proposal aims to answer which candidates should go to the next evolution phase as better parents.
Step 6: At this juncture, bitwise mutation is carried out, where, as a result of the mutation operator, a selected bit in the chromosome is flipped to the opposite binary value based on a relatively low fixed mutation probability $p_m$. To make the process adaptive based on the current status of the population, we propose a log-scaled mutation technique.
Step 7: Until the termination criterion is reached, return to Step 2.
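The decode-and-evaluate part of Step 2 can be sketched as follows. The function names are mine, and the bit ordering (least-significant bit first) is an assumption, since the paper does not fix one.

```python
def decode(bits, lower, upper):
    """Map a binary substring to a real value in [lower, upper] (Step 2).

    Implements d_n = sum_{i=0}^{l-1} 2^i * b_ni, assuming bits[0] is the
    least significant bit, then scales d_n into the variable's bounds.
    """
    l = len(bits)
    d = sum(2**i * b for i, b in enumerate(bits))
    return lower + (upper - lower) / (2**l - 1) * d

def fitness(objective, x):
    """F_s = 1 / (1 + f_s(x)) for a minimization problem."""
    return 1.0 / (1.0 + objective(x))
```

For example, with 3 bits and bounds [0, 7], the all-ones string decodes to the upper bound and the all-zeros string to the lower bound.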

3 Selection and Mutation Schemes


In this section, we provide a step-by-step working principle of the proposed enhanced selection and log-scaled mutation operators in order to provide an improved GA (ESALOGA) as a better optimization technique.

3.1 Proposed Selection Operator


Based on the Metropolis algorithm [31], we focus on possible improvements of the GA for finding the optimal solution in due course of the crossover while selecting chromosome strings. This keeps intact a high degree of diversity in selecting the children which are the most suitable when the chosen parents undergo a crossover. To choose appropriate candidates from the current pool of parents and offspring, a block diagram of the proposed selection strategy is given in Fig. 1. Mathematically, this is realized by introducing a selection probability as the Boltzmann probability distribution. Precisely, let $T$ be the temperature; then the selection probability $p(T)$ reads as the Boltzmann factor

$p(T) = e^{-\Delta E / kT}$, (1)



where $\Delta E$ represents the change in energy between the chosen parents and children. With the above probability $p(T)$, a set of selected strings is passed to the next stage of evolution. It is worth mentioning that the principle of elitism [9] carries forward the string with the best fitness value in a given pool of parents and children. Following (1), the strings selected next are the fittest strings in

Fig. 1. Flow diagram for selection strategy after crossover.



the previous stage of the evolution. The proposed model is realized as per the
following steps:
Step 1: Choose an initial value of the temperature $T$ as

$T = \alpha \frac{M}{I}$, (2)

where $M$ is the maximum value of the fitness function $\{F_s \mid s = 1, 2, \ldots, 20\}$, $I$ is the number of iterations, and $s$ labels the strings pertaining to the crossover of a given population. Note that the initial value of the temperature $T$ is taken as large as possible such that it decreases in subsequent iterations to its desired value. Here, the proportionality constant $\alpha$ is set as per the chosen algorithm.
Step 2: In order to find the energy difference, one chooses the $j$th string in a given pool of parents and children and subtracts its fitness value $F_j$ from that of a previously selected string, $F_f$. In other words, the energy difference that governs the probability distribution is given by

$\Delta E = F_f - F_j$ (3)

with $j = 1, 2, 3$.
Step 3: Compute $p$ as per Eq. (1), capped at unity:

$p = \min(1, e^{-\Delta E / kT})$ (4)
Step 4: Acquire a random number r ∈ (0, 1).
Step 5: If $r < p$, the candidate string is selected, joining the previously selected fittest string.
Step 6: Else, go to Step 2 and repeat the search. In the case when none of the strings is selected, one increases the value of the mutation probability $p_m$. In practical situations, we may consider the corresponding value $p_m = 0.1$.
Step 7: Finally, one selects the partner string chromosome by repeating Steps
2 to 6.
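The acceptance test of Steps 2–5 can be sketched as a Metropolis-style rule. The function names are mine, and the Boltzmann constant is folded into a parameter `k` as in Eq. (1).

```python
import math
import random

def acceptance_prob(F_f, F_j, T, k=1.0):
    """p = min(1, exp(-dE/kT)) with dE = F_f - F_j (Eqs. (3) and (4))."""
    dE = F_f - F_j
    return min(1.0, math.exp(-dE / (k * T)))

def accept(F_f, F_j, T, k=1.0):
    """Metropolis acceptance: candidates at least as fit as the reference
    (F_j >= F_f) always pass; less fit candidates pass with a
    temperature-controlled probability, preserving diversity in the pool."""
    return random.random() < acceptance_prob(F_f, F_j, T, k)
```

At high $T$ almost every candidate is accepted; as $T$ decreases per Eq. (2), the rule becomes increasingly selective.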

3.2 Proposed Mutation Operator


In this subsection, we offer the log-scaled mutation strategy given in Fig. 2, with the corresponding operations as below:
Step 1: Obtain the mutation probability for a given fitness value $F_s$ as per the transformation $y_s = \log_{10} F_s$.
Step 2: For the maximum value of the fitness $F_s^{max}$, define $y_s^{max} = \log_{10} F_s^{max}$.
Step 3: Corresponding to the minimum fitness value $F_s^{min}$, define $y_s^{min} = \log_{10} F_s^{min}$.
Step 4: $y_s^{max}$ is mapped to the minimum mutation probability $p_m^{min}$ such that the best candidates remain intact.
Step 5: $y_s^{min}$ is mapped to the maximum mutation probability $p_m^{max}$ such that the worst candidates mutate.

Step 6: Define a linear relationship between $y_s$ and $p_{m,s}$ as

$p_{m,s} = \frac{p_m^{max} - p_m^{min}}{y_s^{max} - y_s^{min}} (y_s - y_s^{min})$, (5)

where the ratio of $p_m^{max} - p_m^{min}$ and $y_s^{max} - y_s^{min}$ gives $\beta$, the slope of the line plotted between $p_{m,s}$ and $y_s$. This leads to the following linear equation:

$p_{m,s} = \beta y_s + \gamma$, (6)

where $\gamma$ is the intercept of the line in (6), given by

$\gamma = -\frac{p_m^{max} - p_m^{min}}{y_s^{max} - y_s^{min}} \, y_s^{min}$ (7)

With the above slope $\beta$ and intercept $\gamma$, the mutation probability $p_{m,s}$ is obtained by the following logarithmic relation

$p_{m,s} = \beta \log_{10} F_s + \gamma$, (8)

where $s$ labels the chromosome in question. Physically, this shows the inverse relation [9] between the fitness value $F_s$ and the mutation probability $p_{m,s}$.

Fig. 2. Log-scaled mutation strategy.



Step 7: This assigns a unique mutation probability $p_{m,s}$ to each candidate string in the range $(p_m^{min}, p_m^{max})$, viz. we have

$p_m^{min} \le p_{m,s} \le p_m^{max}$ (9)

Step 8: Finally, diversity in the selected population is realized by a bitwise mutation process.
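The fitness-to-probability mapping of Steps 1–7 can be sketched as below. Note that the printed Eqs. (5)–(7) describe a line passing through $(y_s^{min}, 0)$, whereas Steps 4–5 and the closing remark describe the inverse mapping (fittest string → $p_m^{min}$, least fit → $p_m^{max}$); this sketch follows the latter, and the degenerate uniform-population branch and all names are my additions.

```python
import math

def log_mutation_prob(F, p_min, p_max):
    """Assign each chromosome a mutation probability on a log scale.

    F: list of fitness values F_s (all > 0). The fittest string receives
    p_min, the least fit receives p_max, and the rest fall linearly in
    between on the y = log10(F) axis (the inverse relation of Steps 4-5).
    """
    y = [math.log10(f) for f in F]
    y_min, y_max = min(y), max(y)
    if y_max == y_min:                        # degenerate: uniform population
        return [p_min] * len(F)
    beta = (p_max - p_min) / (y_min - y_max)  # negative slope: fitter -> lower p
    gamma = p_min - beta * y_max              # intercept fixing p(y_max) = p_min
    return [beta * ys + gamma for ys in y]
```

Because the mapping works on $\log_{10} F_s$, sparse fitness values still yield probabilities confined to $[p_m^{min}, p_m^{max}]$, which is the adaptivity the text describes.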

Because the fitness values of the strings are usually sparse, we propose a log-scaled mutation operator. With this approach, all mutation probabilities are kept in a specified range, irrespective of variations in the fitness values. This makes our proposal adaptive and yields an evolution from the premature to the mature phase of a given population. In a nutshell, we have illustrated a non-linear relationship between mutation probability and fitness value as far as evolutionary algorithms are concerned. It follows that a higher fitness value leads to a lower mutation probability, which corresponds to a comparatively larger search space while finding the global optimal solution.

4 Proposed GA (ESALOGA)

Based on the proposed enhanced selection and log-scaled mutation strategies, we provide below the pseudo-code of the algorithm.
For the given input parameters, a binary initial population P is randomly generated; the mutation probability $p_m$ of the candidate chromosome strings is adaptively selected by the enhanced selection operation (EnSelection) and the given mutation range $(p_m^{min}, p_m^{max})$, respectively. Produce the mating pool for breeding for a given crossover probability $p_c$. Extract two parents from the mating pool using the standard roulette-wheel (RW) selection operator. Indeed, other selection schemes, such as tournament and uniform selection, could be adopted as well for better performance. Perform the single-point crossover operation to produce two children. In fact, instead of single-point, the use of two-point or uniform crossover operations may enhance the computational capability. At this junction, from the pool of the two parents and their produced children, choose two appropriate candidate strings using the enhanced selection (EnSelection) operator with probability $p(T)$ as in (1). As a result, two appropriate candidates are selected to go into the next evolution. When no chromosomes are selected from the pool, mutate all the strings with an increased mutation probability $p_m$ and repeat the EnSelection operation. After this operation we get the intermediate population, which is subjected to the mutation operator with mutation probability $p_m$. Following the log-scaled strategy (LSMut), produce the population of mutated strings ($P_m$), as in Algorithm 1. The best chromosomes are then taken from the above two populations, as shown in the algorithm in line 11. Repeat the steps until the termination criterion is reached.

Algorithm 1. Pseudo-code for the proposed ESALOGA

Require: N: the number of chromosomes, pc: crossover probability, tmax: maximum iterations, pm: mutation probability, pm_min: lower bound on pm, pm_max: upper bound on pm, b: number of bits to represent one variable, v: number of variables.
P ← round(rand(N, b*v)): initialize the binary population randomly
1: GP ← best of [P]: GP holds the best solution in the current P
2: for i ← 1 to tmax do
3:   n ← 1
4:   while n ≤ N do
5:     [Parent1, Parent2] ← Selection(P): RW or tournament selection operation
6:     [Children1, Children2] ← Xover(Parent1, Parent2): crossover operation
7:     [string1, string2] ← EnSelection(Parent1, Parent2, Children1, Children2): enhanced selection operation to select two appropriate strings
8:     P(n) ← string1
9:     P(n+1) ← string2
10:    n ← n+2
11:  end while
12:  Pm ← LSMut(P): log-scaled mutation after crossover
13:  P ← N best chromosomes of [P, Pm]
14:  GP(i) ← best of [P]
15:  if Fitness(GP(i)) < Fitness(GP(i-1)) then
16:    GP(i) ← GP(i-1)
17:  end if
18: end for
19: return GP
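To make the flow of Algorithm 1 concrete, the following Python sketch implements its skeleton for a minimisation problem. The EnSelection acceptance probability p(T) and the exact log-scaled mutation schedule (LSMut) are not reproduced in full in this excerpt, so simple stand-ins are used and labeled as such: greedy selection from the parent/child pool and a linearly decaying mutation rate. All function and parameter names are ours.

```python
import random

def esaloga_sketch(fitness, n_chrom=20, bits=16, n_vars=2, pc=0.8,
                   t_max=60, pm_min=0.0, pm_max=0.05, lo=-5.0, hi=5.0, seed=1):
    """Minimal sketch of Algorithm 1 (minimisation). Greedy pool
    selection stands in for EnSelection; a linearly decaying mutation
    rate stands in for the log-scaled LSMut schedule."""
    rng = random.Random(seed)
    length = bits * n_vars

    def decode(chrom):
        # map each group of `bits` bits to a real value in [lo, hi]
        xs = []
        for v in range(n_vars):
            word = chrom[v * bits:(v + 1) * bits]
            frac = sum(b << i for i, b in enumerate(word)) / (2 ** bits - 1)
            xs.append(lo + (hi - lo) * frac)
        return xs

    def fit(chrom):
        return fitness(decode(chrom))

    def roulette(pop):
        # rank-based roulette wheel: fitter (lower) chromosomes get
        # proportionally larger slices of the wheel
        ranked = sorted(pop, key=fit)
        weights = [len(pop) - i for i in range(len(pop))]
        return rng.choices(ranked, weights=weights, k=1)[0]

    def xover(a, b):
        # single-point crossover with probability pc
        if rng.random() < pc:
            k = rng.randint(1, length - 1)
            return a[:k] + b[k:], b[:k] + a[k:]
        return a[:], b[:]

    def mutate(chrom, pm):
        return [1 - g if rng.random() < pm else g for g in chrom]

    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n_chrom)]
    best = min(pop, key=fit)
    for t in range(t_max):
        nxt = []
        while len(nxt) < n_chrom:
            p1, p2 = roulette(pop), roulette(pop)
            c1, c2 = xover(p1, p2)
            # EnSelection stand-in: keep the two fittest of the pool
            pool = sorted([p1, p2, c1, c2], key=fit)
            nxt += [pool[0][:], pool[1][:]]
        pm = pm_max - (pm_max - pm_min) * t / t_max  # decaying stand-in for LSMut
        mutated = [mutate(c, pm) for c in nxt]
        pop = sorted(nxt + mutated, key=fit)[:n_chrom]  # line 13: N best of [P, Pm]
        best = min(pop + [best], key=fit)               # elitism, lines 14-17
    return decode(best), fit(best)
```

Run on the sphere function, the sketch converges to a point near the origin within a few dozen generations.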
5 Results and Discussion
In this section, we demonstrate the effectiveness of the proposed GA on various
benchmark functions [9,33]. We compare several variants of the GA, distinguished
by their different selection and crossover strategies; an outline is given in
Table 1. All of these variants are discussed in [32,33] and tested on the
benchmark functions, which are concisely tabulated in Table 2. We first present
the results on the Goldstein-Price, Levi, Beale, Himmelblau, Ackley, and
Rastrigin benchmark functions. Note that the Rastrigin and Himmelblau functions
are multimodal in nature, while the Ackley function possesses a large hole at
its center together with multimodality. On the other hand, the Beale function is
unimodal with four sharp peaks at the corners. Similarly, the Levi function has
a non-linear search space that may cause premature convergence during the
execution of an optimization algorithm. Equally, it is worth noticing that an
optimization algorithm may get trapped in one of the local minima of the
objective function, which our proposal overcomes by maintaining a larger
diversity, as shown in Fig. 3 for different problems. Simulation results
comparing the ESALOGA with the standard GA, VGA-1, VGA-2, VGA-3, and VGA-4 are
given in Table 3 for 100 runs on the aforementioned two-variable problems.
Advances in Genetic Algorithm 739

Table 1. Selection and crossover strategies in variants of GA (VGA)

GA variants  SGA           VGA-1      VGA-2    VGA-3    VGA-4
Selection    RW            Random     RW       Random   Tournament
Crossover    Single-point  Two-point  Uniform  Uniform  Uniform

Table 2. Benchmark functions for testing ESALOGA

Functions         Mathematical description
Himmelblau:       f1(x1, x2) = (x1^2 + x2 − 11)^2 + (x1 + x2^2 − 7)^2, with variable limits −6 ≤ x1, x2 ≤ 6
Rastrigin:        f(x) = An + sum_{i=1}^{n} (xi^2 − A cos(2π xi)), with variable limits −5.12 ≤ xi ≤ 5.12
Ackley:           f(x1, x2) = −20 exp(−0.2 sqrt(0.5(x1^2 + x2^2))) − exp(0.5(cos(2π x1) + cos(2π x2))) + e + 20,
                  a = 20, b = 0.2, c = 2π, with variable limits −35 ≤ xi ≤ 35
Beale:            f(x1, x2) = (1.5 − x1 + x1 x2)^2 + (2.25 − x1 + x1 x2^2)^2 + (2.625 − x1 + x1 x2^3)^2,
                  with variable limits −4.5 ≤ x1, x2 ≤ 4.5
Levi:             f(x1, x2) = sin^2(3π x1) + (x1 − 1)^2 (1 + sin^2(3π x2)) + (x2 − 1)^2 (1 + sin^2(2π x2)),
                  with variable limits −10 ≤ x1, x2 ≤ 10
Goldstein-Price:  f(x1, x2) = (1 + (x1 + x2 + 1)^2 (19 − 14x1 + 3x1^2 − 14x2 + 6x1x2 + 3x2^2))
                  × (30 + (2x1 − 3x2)^2 (18 − 32x1 + 12x1^2 + 48x2 − 36x1x2 + 27x2^2)), with variable limits −2 ≤ x1, x2 ≤ 2
Styblinski-Tang:  f(x) = (1/2) sum_{i=1}^{n} (xi^4 − 16xi^2 + 5xi), with variable limits −5 ≤ xi ≤ 5
Michalewicz:      f(x) = − sum_{i=1}^{n} sin(xi) sin^{2m}(i xi^2 / π), with variable limits 0 ≤ xi ≤ π
Schaffer No. 2:   f(x) = 0.5 + sum_{i=1}^{n−1} (sin^2(xi^2 − x_{i+1}^2) − 0.5) / (1 + 0.001(xi^2 + x_{i+1}^2))^2,
                  with variable limits −100 ≤ xi ≤ 100
Deceptive:        f(x) = −((1/n) sum_{i=1}^{n} gi(xi))^β, with variable limits 0 ≤ xi ≤ 1, and β = 2
Keane bump:       f(x) = −| (sum_{i=1}^{n} cos^4(xi) − 2 prod_{i=1}^{n} cos^2(xi)) / sqrt(sum_{i=1}^{n} i xi^2) |,
                  subject to: g1(x) = 0.75 − prod_{i=1}^{n} xi < 0, g2(x) = sum_{i=1}^{n} xi − 7.5n < 0
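As a concrete reference, a few of the entries in Table 2 transcribe directly into Python (the function and argument names are ours):

```python
import math

# Direct transcriptions of four of the benchmark definitions in Table 2.

def himmelblau(x1, x2):
    return (x1**2 + x2 - 11)**2 + (x1 + x2**2 - 7)**2

def rastrigin(xs, A=10):
    return A * len(xs) + sum(x**2 - A * math.cos(2 * math.pi * x) for x in xs)

def ackley(x1, x2):
    return (-20 * math.exp(-0.2 * math.sqrt(0.5 * (x1**2 + x2**2)))
            - math.exp(0.5 * (math.cos(2 * math.pi * x1)
                              + math.cos(2 * math.pi * x2)))
            + math.e + 20)

def styblinski_tang(xs):
    return 0.5 * sum(x**4 - 16 * x**2 + 5 * x for x in xs)
```

These implementations reproduce the known optima, e.g. himmelblau(3, 2) = 0, rastrigin at the origin = 0, and the Styblinski-Tang minimum of about −39.166 per dimension at xi ≈ −2.903534.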

Results are compared on six attributes: the best value achieved by the
algorithms; the mean of all solutions in 100 runs; the standard deviation (Std)
of the solutions achieved in 100 runs; the reliability of the algorithms, i.e.,
the fraction of runs whose solution is lower than the mean of the proposed GA;
the worst value achieved; and, finally, the average time taken by each algorithm
for 1000 evolution epochs. These averaged measurements give a consistent and
accurate picture of how effectively our proposed algorithm determines the
approximate global optimal point. Interestingly, while the SGA and the other
variants get trapped in one of their local optima, our proposed algorithm
successfully terminates by locating the global optimum for the various
benchmark functions.
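The six attributes above can be computed from a batch of independent runs as follows; this is our reading of the paper's definitions (in particular of "reliability"), not code from the paper:

```python
import statistics

def run_statistics(solutions, reference_mean, runtimes=None):
    """Summarise repeated runs of a minimiser with the attributes
    reported in Table 3. `reference_mean` plays the role of the
    proposed GA's mean, against which reliability is judged."""
    stats = {
        "Best": min(solutions),
        "Mean": statistics.mean(solutions),
        "Std": statistics.stdev(solutions),
        # percentage of runs whose solution beats the reference mean
        "Reliability": 100.0 * sum(s < reference_mean
                                   for s in solutions) / len(solutions),
        "Worst": max(solutions),
    }
    if runtimes is not None:
        stats["time"] = statistics.mean(runtimes)  # average runtime per run
    return stats
```

For example, four runs yielding 3.0, 3.1, 3.2 and 4.0 against a reference mean of 3.3 give a reliability of 75%.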
The corresponding comparative results for diversity preservation are
depicted in Fig. 3. In this figure, one can observe the spread of the search for
the Himmelblau, Beale, Ackley and Levi functions. As we can see, the SGA is
trapped at one point, whereas ESALOGA examines different points for the global
solution. An approximately similar effect can be seen for the other functions.
We address the issue of premature convergence through diversity preservation,
where most of the GA variants behave similarly. Thus, we have proposed an
enhanced selection scheme to overcome this condition of premature convergence.
We can equally maintain diversity preservation adequately, as shown

Fig. 3. Comparative result for the diversity preservation for the same number of gen-
erations (Left: Standard GA, Right: Proposed GA).

Table 3. Comparative simulation results of the proposed GA and other GA variants in 100 runs

            PGA          SGA          VGA-1        VGA-2        VGA-3        VGA-4
Goldsteinprice
Best 3.0010 3.0010 3.0010 3.0010 3.0010 3.0010
Mean 3.0806 11.4169 11.5692 6.1199 5.7327 12.7232
Std 0.0734 17.1370 17.9966 14.0022 8.2900 17.3742
Reliability 60% 56% 56% 60% 60% 50%
Worst 3.313 88.868 84.080 89.541 32.634 76.699
time 1.2953 2.9910 3.2558 2.9816 2.9116 3.6707
Levi
Best 7.8091e−04 5.5598e−05 5.5598e−05 5.5598e−05 5.5598e−05 5.5598e−05
Mean 0.0268 0.0555 0.0712 0.1220 0.1019 0.1432
Std 0.0270 0.1362 0.1733 0.2254 0.1790 0.3829
Reliability 60% 60% 70% 64% 54% 48%
Worst 0.110 0.725 0.725 0.725 0.725 2.600
time 1.3567 3.1313 3.7612 3.0177 3.3239 3.9555
Beale
Best 3.1186e−05 8.0472e−05 8.0472e−05 3.1186e−05 3.1186e−05 2.1385e−04
Mean 0.0024 0.2249 0.2651 0.2645 0.1736 0.2841
Std 0.0027 0.3054 0.4824 0.3083 0.2815 0.3226
Reliability 60% 14% 14% 14% 24% 60%
Worst 0.012 0.926 2.689 0.816 0.926 0.974
time 1.3624 2.9880 3.2909 2.9839 2.9306 3.6881
Himmelblau
Best 3.9863e−05 4.9682e−04 4.9682e−04 4.9682e−04 4.9682e−04 4.9682e−04
Mean 0.0633 0.2900 0.1968 0.2373 0.1486 0.6404
Std 0.2850 0.7672 0.5683 0.6091 0.4067 1.2703
Reliability 96% 76% 84% 76% 86% 64%
Worst 1.444 4.705 2.755 3.717 1.643 6.625
time 50.8745 3.6972 4.1740 3.4376 3.4056 4.5079
Ackley
Best 0.0182 0.1982 0.1982 0.1982 0.1982 0.1982
Mean 0.0372 0.6684 0.4341 0.6788 0.3161 0.9028
Std 0.0097 1.1057 0.8200 1.0503 0.5920 1.2887
Reliability 42 % 0% 0% 0% 0% 0%
Worst 0.061 3.639 3.639 3.639 3.639 3.639
time 1.4144 3.0694 3.3770 3.3646 3.2705 3.6703
Rastrigin
Best 0.0099 0.0104 0.0104 0.0104 0.0104 0.0104
Mean 0.2907 1.0351 1.2141 0.9595 1.5399 2.2707
Std 0.3054 1.0248 1.5045 1.0608 1.3773 2.1881
Reliability 66% 32% 32% 34% 20% 16%
Worst 1.0160 4.1020 7.9655 4.9817 5.0958 9.1854
time 1.3768 2.8542 3.1760 2.8413 3.0166 3.9116

in Fig. 3, which makes our algorithm relatively efficient. Further, we see from
Fig. 3 that our proposal reveals various local and global optimal points of the
aforementioned benchmark functions and offers great diversity in the search
process instead of returning the same point over different evolutions.
This yields an appropriate optimization with high diversity preservation in a
given mating pool. We find an improved reliability (in percentage), as shown in
Table 3, in contrast to the standard GA and its other variants, which get
trapped in an intermediate suboptimal state most of the time. Hereby, we find
that the average performance of ESALOGA is favorable, as reflected in its low
standard deviation. Also, the time taken by ESALOGA is reasonable. Moreover,
from the results on the Himmelblau function, one can observe that ESALOGA tries
its best to find a better solution, but at the cost of its runtime. It reveals
that ESALOGA delivers a better solution every time. As a matter of fact, our
algorithm yields an intelligent mechanism to escape suboptimal traps and local
optima for a class of benchmark functions. By tuning the selective pressure to a
higher value, we can generate the desired diversity in the population and scan
the entire search space while searching for the global optimum. This provides an
appropriate trade-off between selective pressure and diversity pressure.

5.1 Comparison of Proposed GA with Other Optimizers


In this section, we extend our algorithm to higher dimensions and compare
ESALOGA with other optimizers on certain complex functions, i.e., Rastrigin,
Ackley, Schaffer No. 2 [34], Michalewicz [35], Styblinski-Tang [36], Deceptive
[37,38] and the constrained Keane's bump [39], as shown in Table 2. These
functions can be extended to arbitrary dimensions and resist straightforward
analytical investigation. They are related to real-world problems; for example,
the Ackley function is considered a model of the free-energy hypersurface of
proteins. Most of the above test functions add the difficulty of being less
symmetric and possessing higher harmonics, which makes them difficult to solve
and keeps the environment uncertain. Notably, the Schaffer function has
concentric barriers, whereby it is capable of discriminating between different
optimizers. Hereby, we have tested our algorithm on highly nonlinear, multimodal
functions with a large number of local extrema. As mentioned above, one of them
is the Michalewicz function, a peculiar mathematical function having n! local
optima in n dimensions. Our optimization algorithm has given an improved
solution, as shown below in Table 4. The Styblinski-Tang function [36] is
considered further.
Another complex function is the deceptive function, which finds its importance
in discriminating between different optimizers; its computational difficulties
are documented in the existing literature [37,38]. Here, we have shown the
results on the above complex functions, which qualify our algorithm as an apt
global optimizer. In the sequel, we focus on a constrained complex function in
multiple dimensions, namely Keane's bump function, which is taken as the test
function. It is highly nonlinear and difficult to solve by existing optimizers
because its solution lies on a nonlinear boundary. The performance of the

ESALOGA has been analyzed in comparison to VGA-4, probabilistic bee
optimization (PBA) [40], invasive weed optimization (IWO) [41], and the
shuffled frog leaping algorithm (SFLA) [42]. The comparative results of the
above optimizers on ten-dimensional test functions are shown in Table 4. The
parameter settings of all the optimizers are as follows:

1. VGA-4 parameters:
   (a) Fifty chromosomes are taken in a population
   (b) Crossover probability is fixed at 0.8 to form the mating pool
   (c) Mutation probability is taken as 0.02
2. PBA parameters:
   (a) The number of scout bees is 50
   (b) The number of recruited bees is defined as round(0.3*50)
   (c) The neighborhood radius is set as 0.1*(maximum variable value − minimum
       variable value)
   (d) The neighborhood radius damp rate is 0.9
3. IWO parameters:
   (a) Population size is taken as 50
   (b) The minimum and maximum numbers of seeds are 0 and 5, respectively
   (c) The variance reduction exponent is set to 2
   (d) The initial and final values of the standard deviation are 0.5 and 0.01,
       respectively
4. SFLA parameters:
   (a) Memeplex size is 25
   (b) The number of memeplexes is 2
   (c) The number of parents is defined as the maximum of the rounded value of
       (0.3*25) or 2
   (d) The number of off-springs is taken as 3
   (e) The maximum number of iterations is 5
5. Proposed ESALOGA parameters:
   (a) 50 chromosomes are taken in the population
   (b) Crossover probability is 0.8 to form the mating pool
   (c) Mutation probability is adaptively defined between 0 and 0.05 by our
       proposed mutation scheme
   (d) Mutation probability during the enhanced selection procedure is 0.02.
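For reproducibility, the settings listed above can be collected in a single configuration table; the key names below are illustrative, not taken from the cited papers:

```python
# Parameter settings from the list above; key names are our own.
OPTIMIZER_PARAMS = {
    "VGA-4": {"population": 50, "crossover_prob": 0.8, "mutation_prob": 0.02},
    "PBA": {"scouts": 50, "recruited": round(0.3 * 50),
            "radius_factor": 0.1, "radius_damp": 0.9},
    "IWO": {"population": 50, "seeds": (0, 5),
            "variance_exponent": 2, "sigma": (0.5, 0.01)},
    "SFLA": {"memeplex_size": 25, "memeplexes": 2,
             "parents": max(round(0.3 * 25), 2), "offspring": 3,
             "max_iterations": 5},
    "ESALOGA": {"population": 50, "crossover_prob": 0.8,
                "mutation_prob_range": (0.0, 0.05),
                "enselection_mutation_prob": 0.02},
}
```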

In one run of the optimization, all optimizers produce their solution within
five hundred generations. We ran the proposed algorithm on all the
above-mentioned benchmark functions fifty times to obtain performance
statistics. Hereby, we compare the results on all the selected benchmark
functions. From Table 4, we can deduce the preeminence of the ESALOGA over
PBA, IWO, SFLA and GA for the above class of test functions. The comparison is
made on five indices: Best, Worst, and Mean, achieved in 50 runs of the
optimizer; Std, the standard deviation of the solutions in 50 runs; and
Consistency, defined as the percentage of runs in which the optimizer attains
the expected solution.

Table 4. Comparative results on ten variables for fifty runs

VGA-4 PBA IWO SFLA PGA


Styblinski-Tang Function
Best −389.2077 −261.6483 −377.5249 −377.5249 −391.6528
Worst −374.5288 −176.9046 −320.9780 −374.5288 −376.9688
Mean −383.0037 −218.7373 −352.0788 −354.9062 −385.3267
Std 4.2495 22.0725 16.9144 15.4860 3.4056
Consistency (Solution <−383) 55% 0% 0% 0% 90%
Michalewicz Extension function
Best −9.5033 −3.4877 −9.3631 −9.2164 −9.6575
Worst −8.1878 −2.2156 −7.9995 −8.1878 −8.2459
Mean −8.9632 −2.8983 −8.8179 −8.6147 −9.0075
Std 0.3990 0.3521 0.4090 0.4242 0.3343
Consistency (Solution <−9) 55% 0% 40% 30% 65%
Ackley function
Best 0.0016 2.3168 0.0020 0 1.335e-04
Worst 2.0225 19.7360 18.8521 1.6538 1.6538
Mean 0.1086 11.6177 12.7090 0.4520 0.0828
Std 0.4506 4.8105 8.5472 0.8391 0.3698
Consistency (Solution <0.1) 95% 0% 30% 75% 99%
Rastrigin function
Best 3.0071 9.9496 0.9955 2.9849 1.0173
Worst 21.2696 34.8234 16.9149 21.2696 14.2134
Mean 10.8882 24.4759 8.7562 14.8746 6.3935
Std 5.0115 5.7601 3.7172 8.1695 3.5663
Consistency (Solution <10) 55% 50% 75% 35% 90%
Schaffer function No. 2
Best −3.9918 −1.0227 −1.1854 −3.4150 −3.7789
Worst −2.1046 0.0065 −0.1801 −2.1046 −2.6381
Mean −3.1594 −0.0941 −0.5848 −2.6446 −3.3691
Std 0.4458 0.2305 0.2734 0.5166 0.2948
Consistency (Solution <−3) 55% 0% 0% 25% 75%
Deceptive function
Best −0.9255 −0.4140 −0.7724 −0.8464 −0.9255
Worst −0.7483 −0.2729 −0.7040 −0.7483 −0.7187
Mean −0.8196 −0.3185 −0.7259 −0.7853 −0.7955
Std 0.0394 0.0389 0.0247 0.0326 0.0399
Consistency (Solution <0.8) 40% 0% 0% 10% 100%
Keane Bump function
Best −0.7257 −0.2368 −0.7492 −0.7038 −0.7405
Worst −0.6290 −0.1238 −0.2740 −0.6014 −0.6014
Mean −0.6818 −0.1750 −0.5778 −0.5532 −0.6856
Std 0.0292 0.0278 0.1532 0.1073 0.0357
Consistency (Solution <0.6) 30% 0% 25% 15% 55%

Based on the observations in Table 4, we can extract the following comparative
results:

1. ESALOGA is more consistent than the other optimizers. In comparison with
   the other optimizers, the presented results show that ESALOGA performs well
   for multimodal functions, which are highly complex in nature according to
   the literature [37,38].
2. For the Styblinski-Tang, Ackley, Rastrigin and Deceptive functions, we find
   that no optimizer other than VGA-4 gives acceptable results, as seen in
   Table 4. Here, ESALOGA gives the optimal solution with high consistency and
   low standard deviation. Table 4 shows that the consistency of ESALOGA is
   90%, 90%, 99%, and 100% for the Styblinski-Tang, Rastrigin, Ackley and
   Deceptive functions, respectively.
3. For the Michalewicz function, the best optimization is given by ESALOGA,
   with a consistent mean of around −9.0075, the best among all the optimizers.
4. Schaffer function No. 2 is another highly complex function, which ESALOGA
   solves with better results than the other above-mentioned optimizers.
5. On the highly complex constrained test function, the Keane bump function,
   ESALOGA gives outstanding results compared to the other optimizers. Note
   that only the GA comes close to the results of the ESALOGA.
6. Overall, the statistical results of ESALOGA are far better than those of the
   other optimizers.

6 Conclusion

In this paper, we have presented an improved search technique based on
biological evolution. It is well suited to optimizing multi-variable objective
functions with and without discontinuities. As a matter of fact, the proposed
operators are flexible in finding the global minimum of a benchmark function.
Hereby, our proposition gives an improved technique for solving optimization
problems. Further, we have given simulation results for our proposal as a
variant of the standard GA. As verification, we have listed the global
solutions of various two-variable benchmark functions.
From the simulation results, it is found that our method precisely locates the
optimal points of multimodal benchmark functions. Hereby, various drawbacks of
the binary-coded GA, including imprecision and inconsistency, are taken care of
by the Metropolis scheme. This provides an enhanced selection and an adaptive
log-scale mutation scheme. Subsequently, the global optimal solution is
obtained with an acceptable value of selection pressure. In other words, our
proposal is a meta-heuristic approach to global optimization problems. Indeed,
it gives improved precision and consistency, as revealed by the simulation
results.

7 Future Scope

The proposed GA has considerable scope for further improvement, as discussed in
this section. The first stage of improvement concerns a parallel population
approach, which may give a better solution. To introduce more diversity, random
selection or tournament selection can be tested instead of roulette wheel
selection before crossover, where we propose selection after crossover. This is
taken as a complementary selection scheme for introducing more diversity after
crossover. In a future paper, we will test the above-specified selection
strategies with the proposed GA and compare them with the different variants of
the available GA. The second improvement is at the crossover stage, where
different crossover techniques such as single-point, multi-point, uniform and
mid-point techniques can be tested to examine the superiority of the proposed
GA over other variants.
Results on this limited set of functions show the superiority of the proposed
GA over the SGA and other GA variants. Thus, inserting the enhanced selection
scheme, treated as a complementary selection after crossover, and the log
mutation scheme into the structure of other GA variants may give better results.
Moreover, the performance of the proposed GA can be increased by utilizing
binary tree memory. Thus, we observe that the proposed GA has a wide scope for
improvement and may further emerge as a dominant optimization algorithm for
large-scale complex problems from sociology, engineering, topology, graphs,
biology, etc. At this juncture, we anticipate that our proposal will find
various applications in real-world industrial problems such as power systems,
transmission expansion planning, data systems and wireless technology.

References
1. Bill, N.M., David, M.R.: Total productive maintenance: a timely integration of
production and maintenance. Prod. Inven. Manag. J. 33(4), 6–10 (1992)
2. Bevilacqua, M., Braglia, M.: The analytic hierarchy process applied to maintenance
strategy selection. Reliab. Eng. Syst. Saf. 70(1), 71–83 (2000)
3. Doganay, K.: Applications of optimization methods in industrial maintenance
scheduling and software testing. Mälardalen University Press Licentiate Theses,
School of Innovation, Design and Engineering, 180 (2014)
4. Shen, M., Peng, M., Yuan, H.: Rough set attribute reduction based on genetic
algorithm. In: Advances in Information Technology and Industry Applications,
The Series Lecture Notes in Electrical Engineering, vol. 136, pp. 127–132 (2012)
5. Sobh, T., Elleithy, K., Mahmood, A., Karim, M.: Innovative algorithms and tech-
niques in automation, Industrial Electronics and Telecommunications (2007)
6. Hillier, M.S., Hillier, F.S.: Conventional optimization techniques, evolutionary opti-
mization. Int. Ser. Oper. Res. Manag. Sci. 48, 3–25 (2002)
7. Miettinen, K., Neittaanmaki, P., Makela, M.M., Periaux. J.: Evolutionary algo-
rithms in engineering and computer science: recent advances in genetic algorithms.
In: Evolution Strategies, Evolutionary Programming, Genetic Programming and
Industrial Applications, Wiley (1999)
8. Kar: Genetic algorithm application (2016). http://business-fundas.com/2011/
genetic-algorithm-applications/. Accessed 27 June 2016

9. Deb, K.: Optimization for Engineering Design: Algorithms and Examples. Prentice
Hall of India Private limited, New Delhi (2005)
10. Tiwari, B.N.: Geometric perspective of entropy function: embedding, spectrum
and convexity, LAP LAMBERT Academic Publishing, ISBN-13: 978-3845431789
(2011)
11. Gupta, N., Tiwari, B.N., Bellucci, S.: Intrinsic geometric analysis of the network
reliability and voltage stability. Int. J. Electr. Power Energy Syst. 44(1), 872–879
(2010)
12. Bellucci, S., Tiwari, B.N., Gupta, N.: Geometrical methods for power network
analysis. Springer Briefs in Electrical and Computer Engineering (2013). ISBN:
978-3-642-33343-9
13. Nelson, B.L.: Optimization via simulation over discrete decision variables. In: Tuto-
rials in Operation Research, INFORMS, pp. 193 – 207 (2010)
14. Gupta, N., Shekhar, R., Kalra, P.K.: Computationally efficient composite transmis-
sion expansion planning: a Pareto optimal approach for techno-economic solution.
Electr. Power Energy Syst. 63, 917–926 (2014)
15. Gupta, N., Shekhar, R., Kalra, P.K.: Congestion management based roulette wheel
simulation for optimal capacity selection: probabilistic transmission expansion
planning. Electr. Power Energy Syst. 43, 1259–1287 (2012)
16. Goldberg, D.E.: Genetic Algorithms in Search Optimization and Machine Learning.
Addison-Wesley, Reading (1989b)
17. Chung, H.S.H., Zhong, W., Zhang, J.: A novel set-based particle swarm optimiza-
tion method for discrete optimization problem. IEEE Trans. Evol. Comput. 14(2),
278–300 (2010)
18. Liang, Y.C., Smith, A.E.: An ant colony optimization algorithm for the redundancy
allocation problem (RAP). IEEE Trans. Reliab. 53(3), 417–423 (2004)
19. Sharapov, R.R.: Genetic algorithms: basic ideas, variants and analysis, Source:
Vision Systems: Segmentation and Pattern Recognition, ISBN 987-3-902613-05-9,
Edited by: Goro Obinata and Ashish Dutta, pp.546, I-Tech, Vienna, Austria, June
2007. Open Access Database www.i-techonline.com
20. Holland, J.H.: Adaptation in natural and artificial systems, University of Michigan
Press, Ann. Arbor, MI (1975)
21. Goldberg, D.E., Lingle, R.: Alleles, loci, and the TSP. In: Proceedings of the 1st
International Conference on Genetic Algorithms, pp. 154 – 159 (1985)
22. Malhotra, R., Singh, N., Singh, Y.: Genetic algorithms: concepts, design for opti-
mization of process controllers. Comput. Inf. Sci. 4(2), 39–54 (2011)
23. Spears W.M., De Jong, K.A.: On the virtues of parameterized uniform crossover.
In: Proceedings of the 4th International Conference on Genetic Algorithms (1994)
24. Gupta, D., Ghafir, S.: An Overview of methods maintaining diversity in genetic
algorithms. Int. J. Emerg. Technol. Adv. Eng. 2(5), 263–268 (2012)
25. Ming, L., Junhua, L.: Genetic algorithm with dual species. In: International Con-
ference on Automation and Logistics Qingdao, pp. 2572 – 2575 (2008)
26. Cantu-Paz, E.: A survey of parallel genetic algorithms. Calc. Paralleles Reseaux
Syst. Repartis 10(2), 141–171 (1998)
27. Aggarwal, S., Garg, R., Goswani, P.: A review paper on different encoding schemes
used in genetic algorithms. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4(1), 596–
600 (2014)
28. Baluja, S., Caruana, R.: Removing the genetics from the standard genetic algorithm.
In: Proceedings of the 12th International Conference on Machine Learning, pp. 38
– 46 (1995)

29. Srinivas, M., Patnaik, M.: Adaptive probabilities of crossover and mutation in
genetic algorithms. IEEE Trans. Syst. Man Cybern. 24(4), 656–667 (1994)
30. Goldberg, D.E., Sastry, K., Kendall, G.: Genetic algorithms. In: Burke, E.K.,
Kendall, G. (eds.), Search Methodologies: Introductory Tutorials in Optimization
and Decision Support Techniques. Springer, Science + Business Media, NY (2014)
31. Cipra, B.A.: The Best of the 20th Century: Editors Name Top 10 Algorithms,
SIAM News 33(4) (2016). https://www.siam.org/pdf/news/637.pdf. Accessed 27
June 2016
32. Man, K.F., Tang, K.S., Kwong, S.: Genetic algorithm: concepts and applications.
IEEE Trans. Ind. Electron. 43(5), 519–534 (1996)
33. Jamil, M., Yang, X.: A Literature survey of benchmark functions for global opti-
mization problems. Int. J. Math. Model. Numer. Optim. 4(2), 150–194 (2013)
34. https://www.sfu.ca/∼ssurjano/schaffer2.html
35. https://www.sfu.ca/∼ssurjano/michal.html
36. https://www.sfu.ca/∼ssurjano/stybtang.html
37. Iclănzan, D.: Global optimization of multimodal deceptive functions. In: Blum,
C., Ochoa, G. (eds.) Evolutionary Computation in Combinatorial Optimisation.
EvoCOP 2014. Lecture Notes in Computer Science, vol. 8600. Springer, Berlin,
Heidelberg (2014)
38. Li, Y.: The deceptive degree of the objective function. In: Wright A.H., Vose M.D.,
De Jong K.A., Schmitt L.M. (eds.) Foundations of Genetic Algorithms. FOGA
2005. Lecture Notes in Computer Science, vol. 3469. Springer, Heidelberg (2005)
39. Mishra, S.K.: Minimization of Keane’s bump function by the repulsive particle
swarm and the differential evolution methods, May 2007 (2007). SSRN:http://
ssrn.com/abstract=983836
40. Karaboga, D., Akay, B.: A comparative study of artificial bee colony algorithm.
Appl. Math. Comput. 214(1), 108–132 (2009)
41. Bozorg-Haddad, O., Solgi, M., Loáiciga, H.A.: Invasive weed optimization. Meta-
Heuristic and Evolutionary Algorithms for Engineering Optimization, pp. 163–173.
Wiley (2017)
42. Eusuff, M., Lansey, K., Pasha, F.: Shuffled frog-leaping algorithm: a memetic meta-
heuristic for discrete optimization. Eng. Optim. 38(2), 129–154 (2006). Taylor &
Francis
Second-Generation Web Interface
to Correcting ASR Output

Oldřich Krůza(B) and Vladislav Kuboň

Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics,
Charles University, Malostranské nám. 25, Prague, Czech Republic
{kruza,vk}@ufal.mff.cuni.cz

Abstract. This paper presents a next-generation web application that
enables users to contribute corrections to automatically acquired
transcriptions of long speech recordings. We describe differences from
similar settings, compare our solution with others and reflect on the
development since the now 6-year-old work we build upon, in the light of
the progress made, lessons learned and the new technologies available in
the browser.

Keywords: Speech recognition · Transcription · Community-driven · Web standards

1 Introduction

In 2012 [7], we presented a setting where a community of users contributed
corrections to automatically transcribed talks of a single speaker. Now that
browser technologies have evolved drastically and we have been able to observe
the usage patterns and discover shortcomings of the solution at hand, we have
created a next generation of the programme. We shall describe the steps taken
and discuss their motivation and impact.
The application we describe is part of a larger system that deals with
Makoň's recordings. It consists roughly of (1) the corpus itself, (2) an ASR
system trained specially for it and (3) a web interface for the users. These
three parts form a whole where the ASR gives a baseline transcription, the
users correct it, and the corrections are fed as further training data to the
acoustic and language models. In this paper, we focus on the web interface.

1.1 Motivation

Our project focuses on the collection of recordings of Karel Makoň [5]
(*1912, †1993), the author of numerous books, translations and commentaries on
works of a spiritual and religious nature, who was influenced by trances during
recurring surgery without anesthesia at the age of 6, by ecstasies in his youth
and finally by
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 749–762, 2019.
https://doi.org/10.1007/978-3-030-02686-8_56
750 O. Krůza and V. Kuboň

facing and surviving certain death in a Nazi concentration camp, after which he
experienced enlightenment. He gave talks in a narrow circle of friends, and the
recordings in our care were taken between the early 1970s and 1991, spanning
about 1000 h in total.
All of Makoň's work deals more or less directly with a single topic: entering
eternal life before physical death. He draws mainly from Christian symbolism
and builds on Christian mysticism and the ancient traditions of India and
China.
Makoň’s written works present his teachings in a systematic, comprehen-
sive fashion, while the recordings offer bonuses: talks tailored to the audience,
answers to questions, personal experiences, behind-the-scenes to the books etc.
The archive is freely accessible1 under the CC-BY license.

2 Differences to Other Settings


The spoken corpus comprises about 1000 h of a single speaker. Our aim is to
have as good a transcription as possible for the purpose of searching and
further, higher-level processing of the data. There is a pool of people
interested in the talks, who on the one hand are the force we can try to employ
and on the other hand are the consumers of our effort, our target group so to
speak.
The web application should therefore combine two purposes: 1. serve its user by
making the content available in as good a manner as possible and 2. motivate
the user to contribute as many high-quality corrections as possible.
To the best of our knowledge, there is no other project with a comparable
setting. However, we can compare single aspects found in other applications.

2.1 Transcription Apps

The best widespread match to our task is that of creating an application for
transcribing speech recordings. Let us compare the two tasks, pointing out the
main points of difference. For reference, we take (1) Transcriber2, a classical
open-source program written in Tcl, (2) oTranscribe3, a free modern web-based
transcription tool, and (3) Transcribe4, a commercial web-based transcription
tool.
The numbers in the bullet list below denote the programs our statement applies
to. For example, of the three, only Transcriber allows speaker annotation,
hence only the number (1) stands at the second list item.

1
https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-1455.
2
trans.sourceforge.net.
3
otranscribe.com.
4
transcribe.wreally.com.

• (1,2,3) Transcription applications are optimized for the case where there is
  no transcription available and it must be acquired from scratch; our
  application always assumes a prior transcription is available.
• (1) They allow annotation of speakers; our application assumes all utterances
  come from the same speaker.
• (1,2,3) They need no quality control: the user is free to enter whatever
  transcription she pleases and the ultimate measure is her satisfaction; our
  application needs the transcription to be accurate because it is used as
  training data for the acoustic model.
• (1)5 They use alignment on the level of phrases, if any; our application uses
  alignment on the level of words.
• (1,2,3) They are user-centric: the user transcribes whatever acoustic data
  they choose; our application is data-centric: the whole application with all
  its tools and persons revolves around the data set.
• (1,2,3) They assume the user wants to transcribe; our application assumes the
  user wants to listen and possibly read along, and we want to animate her to
  submit transcriptions.
• (1,2)6 They have no shared data between users; our application must count
  with collisions.

Despite these differences, we can still learn a lot from transcription
software. The ease of performing common tasks, like pausing, resuming and
rewinding, is crucial for the user experience and, in effect, for the amount of
submissions we receive. Also, the way the text is displayed synchronously with
the audio played has a big impact, and the approaches leave a lot of space for
variation.

2.2 Wiki

Where our application diverges from transcription software, it mostly resembles
a wiki: a community platform that serves its users, including the contributors,
but where the quality of the contributions is essential, while the contributor's
satisfaction alone is of less importance.
One major point of difference from a wiki is that wiki editing is creative, whereas
our task is mechanical. The user has basically no room for their own invention:
providing a transcription other than the correct one is considered an error.

5 Transcriber explicitly aligns the text with speech, while the other two merely support
the addition of timestamps into the transcription.
6 Transcribe supports team co-operation.
752 O. Krůza and V. Kuboň

Popular wikis have good measures for handling edit conflicts, from which we
could learn some lessons. However, so far there has been no need to do so because
1. if we always simply take the most recent version of a segment, the result stays
consistent even if a piece from user A comes into a larger transcription of
user B;
2. our user base is so far limited to a small community who have no problem
coordinating with each other, though we plan to open up to the broader public
soon.
With regard to the transcription as presented to the user, a submitted segment
of transcription always overwrites the present version, but we keep all the
submissions in a database, so undo operations, clustering submissions by their
author etc. are possible; we have had little need for this so far.

2.3 Corpora
Our project is not the first involving community-driven care of a corpus. We can
mention the Manually annotated sub-corpus [6], where annotations of various
kinds are gathered from volunteers, or the Wikicorpus [10], a corpus of Wikipedia
articles with some linguistic annotation. Our project may reach profound sim-
ilarities with these in the future, when we no longer focus on the transcription
itself but rather on annotation.
There is also CzEng [3], the Czech-English Parallel Corpus, where a large
part of the translation is provided by volunteers. The similarity in setting is
considerable as both projects involve a machine-produced erroneous derivative
of the original material (in our case audio transcriptions, in the case of CzEng
Czech translations of English texts), and a community of volunteers correct
these. But the specifics of the projects bring different challenges and dictate
different approaches.
Marge et al. [8] investigate using Amazon Mechanical Turk to obtain audio
transcriptions. Mihalcea and Chklovski [9] offer a web interface for word-sense
disambiguation and focus mostly on annotator conflict resolution.

3 Description of the Web Application


3.1 Usage
We make no special assumptions about the user beyond basic computer skills
and an understanding of the audio. We assume no prior training. There is a
manual for clearing up common points of confusion. Its main message is that
anything that is to be transcribed should be transcribed with phonetic
precision, even if this results in nonsensical character strings.
Anything except words spoken by the one speaker of interest is to be left
untranscribed, including noise or speech by other persons7 . Incomprehensible
7 In our data, other speakers represent a negligible fraction, but we may later add
support for speaker annotation.

words are to be left uncorrected (the ASR output kept) if the phones are unclear.
If the phones uttered are clear but it is not clear what word was meant, the word
may be transcribed phonetically.

3.2 Implementation

The application consists of several views:

1. the start page where all recordings are listed and each points to a detail view,
2. the detail view, where a recording can be played back, its transcription is
displayed and can be corrected by the user,
3. the search page, where hits to a search query are listed and point to corre-
sponding positions in the recordings,
4. static pages with general information, contact etc.

We shall only discuss the detail view, as the others are not relevant to this
paper. Figure 1 shows the interface during playback. Figure 2 shows the interface
while a segment is being edited. The interface in the figures is shown in English
for convenience, although in reality it is in Czech.
Legend to Fig. 1:

1. Header with
– app name linking to start page,
– about link,
– search field and
– username input field;
2. Identifier of the recording;
3. Automatically transcribed segments in grey;
4. Manually transcribed segments in black;
5. Currently played-back word highlighted by yellow background;
6. Marked word highlighted in regent st. blue;
7. Marked word info:
– occurrence: the word with contextual capitalization and punctuation as it
appeared in the text (currently being edited as the selected initial letter
reveals),
– form: normalized word form as it appears in the word list,
– pronunciation: Czech phonetic transcription of the word,
– position: time of the beginning of the word in seconds from the start of
the recording;
8. Tools for storing:
– direct links to the audio files,
– selecting the whole transcription for easy pasting,
– storing the decoded recording in the browser’s IndexedDB;
9. Graphical equalizer for compensating narrow-band noise;

Fig. 1. Web interface during playback

10. Audio playback controls:


– play/pause button,
– current playback position,
– playback scrollbar,
– total recording length;
11. Current position reflected in URL fragment.
Legend to Fig. 2:
1. Selecting a text range with the mouse defines the segment the user is about
to transcribe;
2. The edit tool with
– text area prefilled with the current transcription,
– playback button that plays the corresponding segment,
– save button and
– download-segment button, which initiates a file-save action for the audio
segment corresponding to the selected text. The synthesis of the downloaded
file takes place in the browser.
The commonest tasks have keyboard shortcuts: ctrl+space for play/pause
and ctrl+enter for submitting a correction.
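These two bindings can be sketched as follows; the handler names (togglePlayback, submitCorrection) are hypothetical, and the key decoding is kept in a pure function:

```javascript
// Decode a keyboard event into an application action (pure, easy to test).
function shortcutAction(e) {
  if (!e.ctrlKey) return null;
  if (e.code === 'Space') return 'toggle-playback';
  if (e.code === 'Enter') return 'submit-correction';
  return null;
}

// Browser wiring: the two handlers are assumed to exist elsewhere in the app.
function bindShortcuts(togglePlayback, submitCorrection) {
  document.addEventListener('keydown', (e) => {
    const action = shortcutAction(e);
    if (action === 'toggle-playback') { e.preventDefault(); togglePlayback(); }
    if (action === 'submit-correction') { e.preventDefault(); submitCorrection(); }
  });
}
```

Calling `preventDefault()` keeps the browser from scrolling on space or inserting a newline into the text area.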

Fig. 2. Interface in the state of editing a segment

3.3 Displaying the Transcription

Many transcription programs show the transcription as a vertical list of
utterances; see Fig. 3 for an example from Transcriber. We attribute this to the
fact that the atomic elements of the transcription are the user-entered
utterances and their boundaries are reliable. In our case, the atomic elements
are words. There are sentences, sure, but the segmentation into sentences by
the ASR is very unreliable, so we want it to be natural to transcribe a segment
that overlaps sentence boundaries.
This is one of the reasons why we display the transcription basically as a
single wrapped line.

Performance Challenge. The transcription display was designed to have these
features:

1. Currently played-back word should be highlighted;


2. Manually transcribed segments should be clearly distinct from automatically
transcribed ones;

3. Selecting one or more words with the mouse should trigger transcription mode
for the selected text; upon a successful save, this should be merged into the
display;
4. Clicking a word should bring up its context info (we call this the marked word
as the term selected word is already taken);
5. The whole transcription should be shown at once for easy searching;
6. The page should be responsive.

Fig. 3. A screenshot of transcriber

These requirements are harder to combine than it may seem. Notably,
responsiveness is hard to combine with all of the others. Why is that so?
Points 1 through 4 call for every word to be wrapped in its own element.
Point 5, combined with the median word count of a transcript of about 6,000,
yields 6,000 <span> elements just to show the text.
Although this may not seem like a big deal, it does affect the responsiveness
and memory footprint of the page.

In the original version, we solved this by sacrificing point 5: only 3 lines of
text are shown, with the current word kept on the middle line, as shown on Fig. 4.8
Thanks to developments in web standards and their support by popular
browsers, a better solution is now possible.

Fig. 4. Original web interface from 2012

Solution. We can use the fortunate fact that manually transcribed words and
automatically transcribed ones tend to form larger chunks. The average number
of words per submitted segment is 7.9. Furthermore, the absolute majority of
such segments are adjacent to other manually transcribed chunks.9 Hence, wrap-
ping each chunk of consecutive manually or automatically transcribed words in
an HTML element is no problem, which solves point 2.
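The chunking described above can be sketched as a small pure function; the word shape ({text, manual}) is an assumption about the internal data model, not the project's actual one:

```javascript
// Group consecutive words sharing the same manual/automatic flag into chunks,
// so that each chunk can be rendered as one HTML element instead of one
// element per word.
function groupChunks(words) {
  const chunks = [];
  for (const word of words) {
    const last = chunks[chunks.length - 1];
    if (last && last.manual === word.manual) {
      last.words.push(word); // extend the current chunk
    } else {
      chunks.push({ manual: word.manual, words: [word] }); // start a new chunk
    }
  }
  return chunks;
}
```

With a median of one chunk per recording, this typically collapses thousands of potential elements into a handful.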
Point 3 can be implemented using document.getSelection() and the Range
objects, which let us find out the innermost HTML element and text offset
of the start and end of the textual selection. Since we know the length of each
word, this allows us to map the selection to the corresponding words in the
transcription.
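The offset-to-word mapping can be sketched as follows; the function names are hypothetical, and words are assumed to be rendered joined by single spaces:

```javascript
// Map a character offset inside a chunk's rendered text to the index of the
// word it falls in (pure, so it can be tested without a DOM).
function wordIndexAtOffset(words, offset) {
  let pos = 0;
  for (let i = 0; i < words.length; i++) {
    pos += words[i].length;
    if (offset <= pos) return i;
    pos += 1; // account for the space between words
  }
  return words.length - 1;
}

// Browser side: the Range's startOffset/endOffset are character offsets inside
// the chunk element's text node, which we feed into wordIndexAtOffset.
function selectedWordRange(chunkWords) {
  const sel = document.getSelection();
  if (!sel || !sel.rangeCount) return null;
  const range = sel.getRangeAt(0);
  return {
    start: wordIndexAtOffset(chunkWords, range.startOffset),
    end: wordIndexAtOffset(chunkWords, range.endOffset),
  };
}
```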

8 The current word is on the top line on the screenshot because it is at the beginning
of the recording.
9 The median number of chunks is 1 (most recordings have no manually corrected
segments), the maximum is 1109. The median counting only touched recordings is 8.

Points 1 and 4 can be implemented in two ways: We could either wrap the
current and marked word in a dedicated element or we could draw a highlighting
rectangle beneath the word.
Wrapping the word would definitely be more robust and less error-prone,
but the constant changes in the DOM during playback, with possibly frequent
reflows, speak against it. Finding the exact position of each word and drawing
a rectangle precisely beneath it (beneath on the z-axis; over it in the x-y
sense), avoiding positioning issues and keeping the rectangle position synced
even after scrolling or window resizing is definitely a challenge, but we chose
this way nonetheless. The performance gain for the majority of the usage time
outweighs the possible errors in the corner cases, more so since these errors
are not critical and are mostly remedied by further playback.
The efficiency of repositioning a rectangle is supported by the fact that we can
calculate the coordinates of all rendered words once and only recalculate them in
two cases: (1) In the rare event of screen resize and (2) when a corrected segment
is merged into the transcription, in which case we only need to recalculate for
the words further in the document.10

Manual/Automatic Distinction. As shown in Fig. 1, we draw the automatic
transcription in grey and the manual one in black. Why did we choose this instead
of normal/boldface? Firstly, the normal font is optimal for reading. Boldface
is meant to highlight spots in text and becomes bulky when applied to long
continuous passages. The automatic transcription contains many errors, so there
is no sense in optimizing it for the best reading experience.
There is also another practical reason. When the two font variants only differ
in color, and a segment of automatic transcription is left intact and submitted
as correct transcription, its merge-down into the displayed text causes no reflow,
which saves us computations and raises responsiveness. It may seem like a rare
use case but we believe that identifying correctly recognized words is a legitimate
way of contribution, so why not optimize for it?
Still, the underlying HTML tags are <span> and <b> because that way the
distinction persists when copy-pasting the text from the web page to a rich text
editor.

3.4 Ergonomy

It is clear that ease of use is crucial in our case, where the user is supposed to
perform a demanding, tedious task with repeated steps, especially since it is our
interest more than hers that she performs them. We compared our setting with
that of transcription apps in Sect. 2.1, pointing out lessons to learn. Let us now
look at some specific points and their actual implementation (or lack thereof).

10 We could even stop the recalculation as soon as we find that the new horizontal
coordinate of a word is left untouched, and add the difference in the vertical coordinate
to all subsequent words, i.e. when a line stays the same, so do all below it.

Keyboard Shortcuts. One of the most important ergonomic measures is
keyboard shortcuts. The most common task is pausing and resuming playback.
Both oTranscribe and Transcribe use the esc key for that, and Transcriber uses
the tab key. We chose the ctrl+space combination. We argue that esc is not the
best of options for desktops because the distance the fingers have to travel from
the alphanumeric keys causes a noticeable delay. This can lead to missing a pause
between words. The tab key as chosen by Transcriber is a splendid choice from
the ergonomic point of view, and there is no reason not to use it in a dedicated
user interface. However, in the browser, where the tab key has a native use,
re-binding it could lead to confusion and irritation. The space bar is probably the
easiest-to-find key in all situations, and dedicating ctrl to all application-specific
commands, as opposed to single keys, lends a sense of consistency, we believe.
This is mere personal experience though, as we had no resources so far to
perform serious research to support these statements.
The only other keyboard shortcut we support is ctrl+enter for submit-
ting the correction. We chose this to stay consistent using the ctrl key and
because this shortcut is familiar to users of many instant messengers, like the
Facebook chat or the once popular official ICQ client. Also, requiring a key
combination prevents accidental submission, which is desirable as we only want
double-checked, guaranteed-precise ones. In comparison, Transcriber uses the
bare enter key to separate utterances. oTranscribe and Transcribe allow free
formatting with no explicit alignment, so using the enter key to split utterances
by lines is the user’s choice.

Missing Features. One of the features that Transcribe, the only commercial
tool in our reference list, offers is setting up keyboard shortcuts for common
words. We have not implemented this because ideally, common words should
be covered by speech recognition. However, it could be sensible to implement it
anyway. The reason is that a word can be very rare globally and thus poorly
recognized by ASR but very common in a specific passage. This particularly
regards named entities.
Another point on our ergonomy to-do list is lifting the need to select a segment
prior to correcting it. If the transcription were simply editable, it could increase
the ease of use dramatically. We would have to automate the selection of the
segment to send for forced alignment, but we could probably do a better job
than the user in the end.

3.5 Mechanics of Submitting a Corrected Segment


As stated above, when the user selects at least one character with the mouse,
the application enters the state of correcting the selected transcription padded
to whole words. In this mode, the transcription to correct is shown in a text area
and the global playback controls are replaced by those that only allow playback
of audio corresponding to the selected transcription.
Once the user believes that the content of the text area corresponds precisely
to the words uttered, she hits the save button or the ctrl+enter keyboard

shortcut. This starts an asynchronous HTTP request to the back-end, where
parametrized (MFCC) versions of the recordings are stored, along with the new
transcription and the time positions of the beginning and end of the segment.
The server then cuts the corresponding segment out of the parametrized
recording and runs forced alignment on it with the provided transcription, with
a threshold to reject bad matches. If the forced alignment fails, an error response
is sent back and the transcription is not merged into the original. In the case of
success, the correction is merged on the server side and pushed to a CDN on
the one hand, and merged into the transcription word array in the JavaScript
application on the other. This redundancy ensures that we do not have to
reload the whole transcription every time a segment is corrected.
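The client side of this submission flow might look as follows; the endpoint path and field names are assumptions for illustration, not the project's actual API:

```javascript
// Build the body of the correction request (pure, so it is easy to test).
function buildCorrectionPayload(recordingId, startSec, endSec, text) {
  return {
    recording: recordingId,
    start: startSec,           // segment start in seconds
    end: endSec,               // segment end in seconds
    transcription: text.trim() // the user's corrected transcription
  };
}

// Asynchronous submission; the server runs forced alignment and either
// confirms the merge or rejects the correction.
async function submitCorrection(payload) {
  const res = await fetch('/api/corrections', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error('forced alignment rejected the correction');
  return res.json(); // confirmed segment, merged into the local word array
}
```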
React ensures the updating of the chunks, and the coordinates of the words
further in the document are recalculated for word-highlighting purposes.
Apart from this, the version of the recording's transcription is updated.
This is because the transcription files have a long cache time, since normally
they do not change at all. At page load, the versions of all transcriptions are
loaded and used as cache busters. This enables us to use an external CDN and
cache effectively.
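The cache-busting scheme can be sketched in one line; the URL layout and parameter name are hypothetical:

```javascript
// Transcription files are served with a long cache lifetime; appending the
// per-recording version (loaded fresh at page load) as a query parameter
// forces a re-fetch only when the transcription has actually changed.
function transcriptionUrl(cdnBase, recordingId, version) {
  return `${cdnBase}/transcriptions/${encodeURIComponent(recordingId)}.json?v=${version}`;
}
```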

3.6 Implementation Details

Audio Engine. The adoption of Web Audio API [2] allowed for big improve-
ments in comparison with the original implementation. There are four major
differences between using the HTML <audio> tag and the Web Audio API.

– It is now possible to precisely replay the selected audio span.
– We could implement a graphical equalizer. Some recordings suffer from loud
noise in the low-frequency spectrum. A systematic approach to acoustic
normalisation of the material is a point of future work. Until then, the
equalizer is a huge relief for the users.
– Thanks to the OfflineAudioContext, it is possible to store the recording
in the browser’s storage and avoid downloading or decoding it again after a
reload. We use IndexedDB as the storage method because localStorage has
too low a quota (about 10 MB) and the FileSystem API is not yet widely
enough supported.
– We have also implemented saving the audio corresponding to the selected
text segment as a sound file.
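Precise segment replay can be sketched with the Web Audio API as below; the word shape (position plus an assumed duration field) and the equalizer wiring are illustrative, not the project's exact code:

```javascript
// Compute the offset and duration of the audio span covered by a word range
// (pure; words are assumed to carry { position, duration } in seconds).
function segmentBounds(words, startIdx, endIdx) {
  const offset = words[startIdx].position;
  const end = words[endIdx].position + words[endIdx].duration;
  return { offset, duration: end - offset };
}

// Play exactly that span: AudioBufferSourceNode.start takes (when, offset,
// duration), which the HTML <audio> tag cannot do precisely. The optional
// equalizerNodes would be BiquadFilterNodes for the graphical equalizer.
function playSegment(audioCtx, buffer, bounds, equalizerNodes = []) {
  const src = audioCtx.createBufferSource();
  src.buffer = buffer;
  let node = src;
  for (const eq of equalizerNodes) { node.connect(eq); node = eq; }
  node.connect(audioCtx.destination);
  src.start(0, bounds.offset, bounds.duration);
  return src;
}
```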

App State Management. We use React as the view library and Redux [1] for
state management. The good thing about Redux is that it makes it easy to keep
a minimal state as the single source of truth: everything that can be computed
is computed, while needless calculations are avoided. This is of course nothing
new – basically it is what we know from database design as the normalized
representation [4]. It is the first time this approach has reached the web
front-end with such a degree of popularity though.

Also in our case, this approach makes the program more predictable, less
error-prone and, as the modern programming jargon lovingly puts it, easier
to reason about. But some of our features complicate this a bit.
Among the states the app can enter are simple playback, inspecting a word
and transcribing a segment. The only relevant things we actually keep in the
Redux store are:

1. The array of transcription words, each of which carries a flag indicating
whether it is automatically or manually transcribed. This defines the
manual/automatic chunks of words that in turn define the HTML elements
wrapping them.
2. The beginning and end of the selection in terms of chunk number and
character offset in the chunk, which is basically what we get from the DOM
upon a mouseup event.
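The word array and the merge of an accepted correction can be sketched as a Redux-style reducer; the action type and field names are hypothetical:

```javascript
// Reducer over the transcription word array. Merging an accepted correction
// replaces a span of words and marks the new ones as manually transcribed,
// returning a fresh array (Redux state must not be mutated in place).
function wordsReducer(state = [], action) {
  if (action.type !== 'MERGE_CORRECTION') return state;
  const { startIdx, endIdx, newWords } = action;
  return [
    ...state.slice(0, startIdx),
    ...newWords.map((w) => ({ ...w, manual: true })),
    ...state.slice(endIdx + 1),
  ];
}
```

The manual/automatic chunks, and hence the wrapping HTML elements, are then derived from this array rather than stored separately.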

Whether a word is marked or a segment is being edited is determined solely
by the boundaries of the selected words. If there is a selection and the beginning
and end are identical, it means a word was simply clicked and its detail is shown
(it is marked). If the boundaries span at least one character, then all words that
intersect this span are selected for correction.
Simple as it sounds, a slight problem arises when a correction is accepted and
the corrected subtitles are merged into the view. In the time after the correction is
accepted and reflected in the Redux state, but before the new chunks are rendered
in the document, selection changes cannot be reliably mapped to logical chunks.
Simple null defaults solve this problem.

4 Future Work

We plan to focus on optimizing the app for a wider audience. Experience confirms
that Makoň's talks are of interest to some people, and our aim is to remove
as many obstacles as possible that stand between potentially interested people
and the material. The benefit from a technical point of view would be clear: a
web app for listening to recordings and correcting their transcription is nice, but
one that is really easy to use and invites people to submit corrections is nicer.
One of the aspects we want to explore is enabling people to naturally share
catchy segments of talks on social networks.
Another point of near-future endeavor is higher-level work with the contents.
By this we mean that we would like to use both automatic processing methods
and the users to perform semantic analysis of the talks: What topic is covered
where? What topics are covered at all? Which talks relate to which written
works? And similar questions.
We shall also deploy the technology on a different data set once we find a
good fit.

5 Conclusion
With our web application, a user can listen to recorded speech, see its
transcription with the currently played word highlighted, commit corrections to
the transcription, and inspect a word. The corrections are checked by a
forced-alignment mechanism on the server side. Our solution overcomes
performance challenges and is a serious improvement over the original version.
Our entire codebase is open source, accessible on GitHub11, and we are actively
looking for similar datasets with communities to employ the application on.

Acknowledgments. The research was supported by SVV project number 260 453.
This work has been using language resources stored and distributed by the LIN-
DAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech
Republic (project LM2015071).

References
1. Abramov, D.: Redux. React Community (2015)
2. Adenot, P., Wilson, C., Rogers, C.: Web Audio API. W3C, October 10 (2013)
3. Bojar, O., Janíček, M., Češka, P., Beňa, P., et al.: CzEng 0.7: parallel corpus with
community-supplied translations. In: LREC 2008 (2008)
4. Codd, E.F.: A relational model of data for large shared data banks. Commun.
ACM 13(6), 377–387 (1970)
5. Hájek, J.: Český mystik Karel Makoň. Dingir 2007(4), 142–143 (2007)
6. Ide, N., Fellbaum, C., Baker, C., Passonneau, R.: The manually annotated sub-
corpus: a community resource for and by the people. In: Proceedings of the ACL
2010 Conference Short Papers, pp. 68–73. Association for Computational Linguis-
tics (2010)
7. Krůza, O., Peterek, N.: Making community and ASR join forces in web environ-
ment. In: International Conference on Text, Speech and Dialogue, pp. 415–421.
Springer (2012)
8. Marge, M., Banerjee, S., Rudnicky, A.I.: Using the Amazon Mechanical Turk
for transcription of spoken language. In: 2010 IEEE International Conference on
Acoustics, Speech and Signal Processing, pp. 5270–5273, March 2010
9. Mihalcea, R., Chklovski, T.: Building sense tagged corpora with volunteer con-
tributions over the web. Recent Advances in Natural Language Processing III:
Selected Papers from RANLP 2003 260, p. 357 (2004)
10. Reese, S., Boleda, G., Cuadros, M., Rigau, G.: Wikicorpus: a word-sense
disambiguated multilingual Wikipedia corpus (2010)

11 https://github.com/sixtease/MakonReact.
A Collaborative Multi-agent System for Oil Palm
Pests and Diseases Global Situation Awareness

Salama A. Mostafa1(✉), Ahmed Abdulbasit Hazeem2, Shihab Hamad Khaleefah3,
Aida Mustapha1, and Rozanawati Darman1


1 Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Johor, Malaysia
{salama,aidam,zana}@uthm.edu.my
2 Anbar General Director of Education, Anbar 31001, Iraq
ahmed.a.hazeem@gmail.com
3 Al Maarif University College, Anbar 31001, Iraq
shi90hab@gmail.com

Abstract. Many researchers have been studying the biological and managerial
challenges of oil palm plantation and production. Oil Palm Pests and Diseases
(OPPD), such as Oryctes rhinoceros beetles and Ganoderma, are the most
prominent among the natural factors that deter the growth and yields of oil
palm trees. Some of these OPPD have the properties of fast expansion and
dynamic distribution, making the monitoring of the OPPD a complex problem.
Consequently, this paper proposes a risk assessment framework for Oil Palm
Pests and Diseases Global Situation Awareness (OPPD-GSA). The OPPD-GSA
framework operates through a teamwork of humans and software agents in a
Collaborative Multi-agent System (CMAS). The overall system is implemented
and experimentally tested in monitoring and controlling a sample of OPPD
observation data on Oryctes rhinoceros beetles and Ganoderma within five
areas in Malaysia. The test results confirm that the OPPD-GSA application is
able to process the OPPD monitoring tasks in real time and handle geo-located
visualization data.

Keywords: Oil palm pests and diseases · Risk assessment · Multi-agent system ·
Global situation awareness

1 Introduction

Palm oil production is vital for the economy of Malaysia and its neighboring countries
[1, 2]. Malaysia is the world’s second-largest producer of the commodity after Indonesia.
The vicious enemies of oil palm trees are a number of pests and diseases [3, 4].
These pests and diseases have the ability to spread and incur serious damage
to the trees, including to the plant physiology, tissue, and metabolism.
Ultimately, they damage the crop and curb its ability to optimize oil
production [3]. Consequently, there is an urgent and
real need for an integrated solution to Oil Palm Pests and Diseases (OPPD) detection
and surveillance. The solution supports a synergized effort between regional agencies
and leverages the expertise and resources in eliminating the OPPD.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 763–775, 2019.
https://doi.org/10.1007/978-3-030-02686-8_57
764 S. A. Mostafa et al.

Monitoring the OPPD is a distributed and dynamic problem that requires advanced
technologies. A multi-agent system provides a variety of agents’ capabilities that facil‐
itate flexibility in solving dynamic and distributed problems [5, 6]. The agents are
equipped with communication, coordination, cooperation and/or negotiation capabili‐
ties. The goal of an individual agent dictates the resolution of local and dynamic prob‐
lems, while the goal of a multi-agent system dictates the resolution of distributed prob‐
lems [7, 8]. However, the OPPD monitoring aggregates uncertainty and approximation
of multiple events [9]. It entails that the agents understand the context of the perceived
aggregated knowledge of events, which is a more challenging task.
We hypothesize that improving software agents’ situation awareness
capabilities to handle multiple events provides a suitable foundation for
distributed risk assessment measures for monitoring the OPPD. The awareness
of multi-agent systems of the prevailing conditions of the OPPD would greatly
improve their mental state and manifest accurate monitoring ability [10, 11].
Subsequently, this work proposes an Oil Palm Pests
and Diseases Global Situation Awareness (OPPD-GSA) framework. The OPPD-GSA
framework deploys a teamwork of humans and software agents in a Collaborative Multi-
agent System (CMAS). The framework is tested and validated using real OPPD data to
provide online and global situation awareness of the OPPD for specific areas in
Malaysia.
This section presents an introduction to the work that includes the research problem,
methods, objectives, and outcomes. The next section presents the literature review in
three parts, which are the oil palm pests and disease, their risk assessment methods, and
situation awareness of a CMAS. Section 3 presents the OPPD-GSA framework that
includes the formulation of situation awareness and the CMAS. Section 4 presents the
implementation of the overall OPPD-GSA application, testing, results, and discussion.
Finally, Sect. 5 concludes the paper and outlines future work.

2 The Literature Review

The research background of this work covers three parts. The first part presents the major
types of oil palm pests and diseases. The second part presents the oil palm pest and
disease assessment methods. Finally, the third part presents examples of situation
awareness research and applications in multi-agent systems.

2.1 The Oil Palm Pests and Diseases

There are many biological problems, including soil-borne organisms, affecting
oil palm trees. The OPPD have become the major threats to oil palm plantations
in large planted and/or replanted areas over a long period of time [12]. The
most common OPPD in Malaysia, among other related species, are Oryctes
rhinoceros beetles and the basidiomycete fungus Ganoderma lucidum. These
two OPPD cause great damage to oil palm plantations and yields.
The Oryctes rhinoceros beetle became a serious problem due to the no-burning
policy instituted in Malaysia in the 1990s [4]. The Ganoderma problem, on the other hand,
A Collaborative Multi-agent System for OPPD 765

is likely to become severe over the next few years as the fungus increases its geographical
range and virulence over time [13]. Figure 1 shows Ganoderma (fungus fruiting bodies)
symptoms on oil palm trees in Banting, Selangor, Malaysia.

Fig. 1. Ganoderma symptoms on oil palm trees [4].

Early detection of OPPD is an important prerequisite to their eventual control,
and perhaps eradication [1]. There is a lot of ongoing research on refining
diagnostic methods that will eventually help in identifying, monitoring and
managing these serious pathogens [13, 14]. Different strategies have been
investigated as primary options in oil palm plantations across Southeast Asia,
from Malaysia to Papua New Guinea [3]. Strategies such as integrated pest
management or biocontrol can be applied to the identified areas to intervene
against and control these harmful organisms.

2.2 The Oil Palm Pests and Diseases Risk Assessment

According to a recent study [15], there is a need for implementing new methods
that are able to detect and monitor OPPD and estimate their risk. Such methods have a
significant impact on oil palm yield and industry [2]. Monitoring risks for pests and
diseases and reducing their attacks could greatly increase and maybe even double the
production of the oil palm crop [4]. Figure 2 shows the difference between the potential

Fig. 2. The development of oil palm yield over time [17].



and actual yields of the oil palm trees, in which the actual yield is influenced by
yield-reducing factors such as pests and diseases [16].
Liaghat and Balasundram [16] present some remote sensing and GIS techniques that
help in improving oil palm crop management. The techniques are applied to identify
pest-infested and diseased plants in order to monitor the diseases and insect pests of
crops. The disease infection and insect infestation damages can be measured to provide
an estimated view of the crops and control their risk.
Shafri and Hamdan [9] point out the need for an online detection and surveillance
system for OPPD in Malaysia. They propose an airborne hyperspectral imagery
approach to detect and map the affected oil palm trees. The system uses
vegetation indices, Lagrangian interpolation and red-edge techniques in the pest
and disease detection process. The system is tested in a real-case scenario, with
the study area located in Selangor. The test results recorded 73% to 84%
detection accuracy. Figure 3 shows a sample of the healthy and diseased oil palm
trees as detected by the airborne hyperspectral sensor.

Fig. 3. A sample of the healthy and diseased oil palm [9].

Idris et al. [18] apply geostatistical techniques to quantify the growth of some
oil palm diseases. The collected disease data is plotted into a GIS. The system is
used to analyze the data and predict the possible spread of the diseases. This
prediction helps to estimate the cost of the diseases’ treatment, the revenue
losses and the expected yield after treatment.

2.3 The Situation Awareness of Agents

A more advanced, distributed and dynamic approach is needed for OPPD risk
assessment and monitoring. The software agent approach is considered a potential
solution, as it is rooted in Distributed Artificial Intelligence (DAI) computing systems [19].
Software agents significantly contribute to and facilitate solutions for many distributed
and dynamic problems [11]. Research in software agents has progressed over more than a decade due
to the demands of dynamic and open environments and the complexity of delegated
tasks since agents are capable of making autonomous decisions and performing goal-
directed actions in many applications [6].
A Collaborative Multi-agent System for OPPD 767

Subsequently, situation awareness capabilities in agents have been found to be a very
effective approach to enhancing distributed decisions in dynamic environments. The
following paragraphs outline a number of attempts to formulate situation awareness in
agent-based systems.
Wardziński [20] emphasizes the importance of a situation awareness mechanism in
improving an agent's knowledge and increasing its decision accuracy, especially in
dynamic and uncertain environments. Baader et al. [10] propose data aggregation,
semantic analysis, and alert generation layers that correspond to the perception,
comprehension and projection phases of situation awareness. The semantic analysis
layer is concerned with extracting the meaning of situations using ontologies of objects
and events. The model alerts the human actor in the aviation domain about the
occurrence of predefined situations via a GUI.
Lili et al. [21] propose a situation awareness mechanism to reason about agents'
decisions. They propose a Situation Reasoning Module (SRM) that supports the agents'
assessment capabilities. Five processes are used in the SRM: event detection,
situation cognition, task cognition, performance capacity assessment and integrated
situation reasoning. Hoogendoorn et al. [22] deploy a mechanism for agent belief
optimization in which the agent's degree of awareness of a situation is signified by the
activation value of a belief. The mechanism aims to generate complex beliefs from the
observed beliefs and enable the agent to assess future situations.
Mostafa et al. [5, 11] propose a Situation Awareness Assessment (SAA) technique
for multi-agent systems. The SAA technique is meant to measure the situation awareness
of the agents in dynamic and uncertain environments and use the outcomes to manage
the collaborative decision-making cycle of the agents. It evaluates the agents’ utilities
from the success of their actions. Subsequently, the SAA applies one of four operational
states of proceed, halt, block and terminate to the agents' run cycle based on their
performance.
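The four operational states can be sketched as a simple threshold rule on an agent's measured utility. A minimal sketch; the thresholds, and the assumption that utility lies in [0, 1], are ours and not part of the published SAA technique:

```python
# Illustrative SAA-style state assignment. The four states (proceed,
# halt, block, terminate) come from the text; the utility thresholds
# below are assumed for illustration only.

def assign_operational_state(utility: float) -> str:
    """Map an agent's utility (assumed in [0, 1], derived from the
    success of its actions) to one of four operational states."""
    if utility >= 0.75:
        return "proceed"    # performing well: keep running
    if utility >= 0.50:
        return "halt"       # pause pending reassessment
    if utility >= 0.25:
        return "block"      # withhold new tasks
    return "terminate"      # remove the agent from the run cycle

print(assign_operational_state(0.9))  # -> proceed
```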

3 The OPPD-GSA Framework

This paper proposes an Oil Palm Pests and Diseases Global Situation Awareness (OPPD-
GSA) framework, the aim of which is to assist decision-making parties in monitoring,
containing and eliminating oil palm pests and diseases. Figure 4 shows the OPPD-GSA
framework. The framework includes teamwork operators of humans and software
agents, and the teamwork cooperates in performing the OPPD-GSA functions. Figure 6
shows the collaborative multi-agent system. The framework comprises four main parts:
Surveillance, Reporting, Assessment, and Visualization.
The Surveillance part is mainly human-based, in which the observations of human
agents are its key element. The human agents include the oil palm related agencies,
officers, and farmers that work in the field. Their roles are to detect and report oil palm
pest (e.g., bagworms, nettle caterpillars, and rats) and disease (e.g., Ganoderma basal
stem rot and Marasmius bunch rot) incidences to the system. The human agents have
an online Oil Palm Pests and Diseases Surveillance (OPPDS) application. They rely on
visual inspection in the pest and disease surveillance process, and they may use
imaging systems such as drones in the observation and data collection process.

Fig. 4. The OPPD-GSA framework.

The human agents key in their field observations using the OPPDS application. The
OPPDS application transfers the observation data to the Reporting part.
The Reporting part consists of interface software agents, a Report Generation Module
(RGM), and the OPPD guides and observations databases. The aim of this part is to extract
detailed information about the reported pest or disease incidences. The interface agents
interact with the human agents through the OPPDS application and process the
observations. These agents, with the aid of the RGM, retrieve the data from the observation
database, synthesize the gathered data and generate OPPD cases and reports.
The OPPD cases can be accessed and viewed by the assessment software agents, while
the OPPD reports can be accessed and viewed by the human agents via the OPPDS
application. A case contains the information needed for the risk assessment process,
including factors of temperature, severity level, type, time, growth phase, and
season, while a report contains the detailed information about a particular observation,
including the reporting human agent's basic information, infection type, severity, date,
location and recorded evidence. Additionally, the Reporting part generates individual
reports and an overall report that shows the relations between the reports.
The Assessment part consists of risk assessment software agents, OPPD cases, and
assessment criteria. The agents apply the assessment criteria to the OPPD cases and
dynamically perform distributed risk assessment measures on them. The
dynamics entail that every assessment cycle is fed as input to the next cycle along with
the updated OPPD cases.
The Assessment part's outcomes are time-bounded OPPD risk assessments and statuses.
Finally, the Visualization part uses a web Geographic Information System (GIS)
technology to view the past, current and expected OPPD risk assessments and statuses.

3.1 The Situation Awareness


The principle of involving humans and autonomous agents to carry out some of a system's
initiatives manifests the notion of an intelligent interactive system. Progressively,
modeling improved agents to develop advanced systems has aroused great interest in
agent research and application [21]. Applying situation awareness capabilities in agents
is one attempt that might provide potential solutions. This work adopts Endsley's [23]
formulation of situation awareness, which is "the perception of the elements
in the environment within a volume of time and space, the comprehension of their
meaning, and the projection of their status in the near future". This approach suggests
the phases of sensing, perception, comprehension, and projection to improve the
decision-making and action-performing of systems. Figure 5 shows the correlation
between observation and situation awareness in an agent's run cycle.

Fig. 5. The representation of situation awareness in an agent’s run cycle.

Subsequently, an agent's awareness of an event in an environment is built upon sensing
the event, perceiving some of its situational elements in a specific period of time,
understanding the dynamics of the event's situational elements, and projecting the
understood situational elements into the near future [24]. Thus, the agent's awareness
entails knowledge interpretation (or belief) and deep analysis of the situational
elements of events [22].

3.2 The Collaborative Multi-agent System


A software agent has an active run cycle that reduces the computation time and cost of
processes and ensures fast and dynamic responses. The OPPD-GSA framework has a
Collaborative Multi-agent System (CMAS) that consists of three groups of agents:
human agents, aHi; interface agents, aIi; and risk assessment agents, aAi. These groups
communicate and cooperate to perform global risk-assessment tasks. The CMAS has a
distributed problem-solving structure that provides local and global reasoning and
aggregate decision-making capabilities. A general scheme of the CMAS is shown in Fig. 6.

Fig. 6. The collaborative multi-agent system.

The aHi group senses the environment and collects information regarding OPPD
incidences. They report the OPPD observations to the main system using the OPPDS
application, as explained above. The OPPDS is a geographic information system (GIS)
of web and Android-based applications that enables human users to capture and retain
geolocation data of OPPD, including images, texts and coordinates, in the observations
database.
The aIi group receives the OPPD observations and retrieves the related data from the
OPPD guides and observations databases. It then synthesizes the data into pi or di cases
and distributes the cases to the aAi group. Subsequently, it submits the cases to the Report
Generation Module (RGM) to generate OPPD reports.
The aAi group is equipped with statistical methods that use a case's soft data to
measure individual and distributed risks. The risks of p_i^r or d_i^r are ranked into five
levels, in which r = {x1 = {0.0, 0.1, 0.2}, x2 = {0.3, 0.4}, x3 = {0.5, 0.6},
x4 = {0.7, 0.8}, x5 = {0.9, 1.0}}. A level x1 indicates a very low risk and a level x5
indicates a very high or serious risk. Each of the OPPD has a static impact value, τ, and
the impact values also have the range (0–1). The aAi retrieve the related OPPD cases
and aggregate their risk levels to generate a current view of the p_i^r and d_i^r.

X_n^r ⇐ aggregate(x_m^r, x_n^τ, μ, δ, k)    (1)

where X_n^r is the aggregation of a distributed pest or disease type; n is the distribution
index; x is an individual pest or disease type; m is a reference to a range of
distributed cases to be aggregated; μ is the index of the risk assessment matrix; δ is the
option choice parameter that constrains the collaborative risk assessment decision of the
multi-agent system; and k is the aggregation granularity.
δ determines the number of agents needed to do the risk assessment. Each agent
is responsible for applying a particular risk assessment matrix μ. The aAi use the
aggregated risk levels to project a future view of the p_i^r or d_i^r.
X̆_n^r ⇐ project(X_n^r, £, d)    (2)

where X̆_n^r is the projection of the distributed p_i^r or d_i^r risk levels in the d duration
of time and £ is the projection metric.
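The five-level ranking of a numeric risk value can be expressed as a simple binning step. A minimal sketch using the level sets given above; since the levels are listed as discrete values, the treatment of in-between values (e.g. 0.25) is our assumption:

```python
# Bin a risk value in [0, 1] into the five levels defined above:
# x1 = {0.0, 0.1, 0.2}, x2 = {0.3, 0.4}, x3 = {0.5, 0.6},
# x4 = {0.7, 0.8}, x5 = {0.9, 1.0}. Boundary handling between the
# listed values is an assumption.

def risk_level(value: float) -> str:
    if not 0.0 <= value <= 1.0:
        raise ValueError("risk value must lie in [0, 1]")
    if value <= 0.2:
        return "x1"  # very low risk
    if value <= 0.4:
        return "x2"
    if value <= 0.6:
        return "x3"
    if value <= 0.8:
        return "x4"
    return "x5"      # very high or serious risk

print(risk_level(0.183))  # -> x1
```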

4 The Implementation and Results

The OPPD-GSA framework is implemented in a web application. The implementation
platforms are Java, JADE, HTML, JavaScript and the Google Maps JavaScript API.
Figure 7 shows a web page of the application that views an observation. The marker
indicates the observation location and its color indicates its OPPD status.

Fig. 7. The OPPD-GSA application and information layers.

The OPPD-GSA web application has an open street map with grids and coordinates.
The Google Maps API is customized with an enhanced viewer in which a 10 m grid
appears. It also has different map layers with a variety of overlays, such as polylines,
markers and polygons, along with their location data, as shown in Fig. 7.
The layers visualize the boundaries of the oil palm lands and estates and their OPPD
statuses. The risk levels are represented by five colors in which red denotes very high
risk, green denotes very low risk and the three other levels are in between the two.
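The five-color coding can be sketched as a lookup from risk level to marker color. Only red (very high) and green (very low) are fixed by the text; the three intermediate colors here are illustrative assumptions:

```python
# Map the five risk levels to marker colors for the GIS layer.
LEVEL_COLORS = {
    "x1": "green",        # very low risk (stated in the text)
    "x2": "yellowgreen",  # assumed intermediate color
    "x3": "yellow",       # assumed intermediate color
    "x4": "orange",       # assumed intermediate color
    "x5": "red",          # very high risk (stated in the text)
}

def marker_color(level: str) -> str:
    return LEVEL_COLORS[level]

print(marker_color("x5"))  # -> red
```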

The CMAS setting includes the factors of risk levels, risk impact, risk assessment
matrices and their index, the aggregation granularity and the projection duration. Some
of these factors can be customized by human users and others are automatically
configured by the agents based on their analysis of the observation data. Table 1 shows
the setting options of the system.

Table 1. The options of the system setting

Type  Description                                Low range  High range
x^r   The risk levels                            1          5
x^τ   The impact of the risk                     0          1
μ     The risk assessment matrices               1          4
δ     The index of the risk assessment matrices  1          2
k     The aggregation granularity                1          3
d     The projection duration (month)            1          6

Data collection has offline and online phases. The offline phase is conducted to gather
OPPD analysis data. The sources of this data are the literature, e.g., [2, 18], and oil palm
agencies in Malaysia. This data is used to build the OPPD guides and the assessment
criteria. The online phase is conducted during system operation to gather the data of the
OPPD observations.
The OPPD-GSA framework is experimentally tested using observation samples for
OPPD of five areas in Malaysia. The investigated OPPD in the tests are Oryctes
rhinoceros beetles, p_i^r, and Ganoderma, d_i^r. The tests evaluate the dynamics of
the data flow in the system and the accuracy of the OPPD risk assessment. Table 2
shows the corresponding preliminary results of the tests.

Table 2. The preliminary results

Area     p_1^r  d_1^r  p_1^τ  d_1^τ  X_i^r  Level  1      2      3
1        0.30   0.20   0.65   0.85   0.183  x_1^1  x_1^1  x_1^2  x_1^2
2        0.10   0.10   0.65   0.85   0.075  x_2^1  x_2^1  x_2^1  x_2^1
3        0.10   0.40   0.65   0.85   0.203  x_3^2  x_3^2  x_3^2  x_3^2
4        0.20   0.20   0.65   0.85   0.150  x_4^1  x_4^1  x_4^1  x_4^1
5        0.10   0.20   0.65   0.85   0.118  x_5^1  x_5^1  x_5^1  x_5^1
Overall  0.16   0.22   0.65   0.85   0.111  X^1    X^1    X^1    X^1

The test results confirm that the application is able to process the OPPD monitoring
in real time and handle geo-located visualization data. The assessment results show a
considerably very low risk level of Oryctes rhinoceros beetles, p_i^r, and Ganoderma,
d_i^r, in the five areas. The projection over three months shows the same results except
in area one. The overall OPPD risk is also very low.
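The per-area X_i^r column of Table 2 is consistent with taking the mean of the impact-weighted pest and disease risks; the paper does not state the rule explicitly (and the Overall row appears to be computed differently), so the formula below is an assumption checked against the per-area rows only:

```python
# Check the per-area X_i^r values of Table 2 against an assumed rule:
#   X = (p * tau_p + d * tau_d) / 2
TAU_P, TAU_D = 0.65, 0.85  # impact values from Table 2

def area_aggregate(p: float, d: float) -> float:
    """Assumed aggregation: mean of impact-weighted pest/disease risks."""
    return (p * TAU_P + d * TAU_D) / 2

# (p, d) observations and the reported X_i^r column, per area
rows = [((0.30, 0.20), 0.183), ((0.10, 0.10), 0.075),
        ((0.10, 0.40), 0.203), ((0.20, 0.20), 0.150),
        ((0.10, 0.20), 0.118)]
for (p, d), reported in rows:
    # agree to within the table's 3-decimal rounding
    assert abs(area_aggregate(p, d) - reported) <= 0.00051
```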
This research provides an integrated solution for OPPD detection, surveillance and
elimination. It supports a synergized effort between regional agencies and leverages
expertise and resources even in countries facing geographic barriers where the
information flow is not current or not readily available. Consequently, the proposed
OPPD-GSA framework is critical for directing control efforts, developing control tools,
and strategizing the decision-making process. It is meant to:
• Enhance the surveillance capabilities by means of real-time intelligent data sharing
and coordination across borders, with improved speed of access, accuracy, and
quality of information.
• Provide a comprehensive analysis and reports on the collected observations.
• Broadcast each local case’s severity and project a global risk view of the cases for
the authorities to respond in an efficient manner.

5 Conclusions and Future Work

This paper proposes a framework of a Collaborative Multi-agent System (CMAS) for Oil
Palm Pests and Diseases Global Situation Awareness (OPPD-GSA). An application of
the OPPD-GSA framework is implemented for monitoring and controlling the OPPD
via applying risk assessment measures. The OPPD-GSA is experimentally tested on
sample OPPD observation data of Oryctes rhinoceros beetles and Ganoderma within
five areas in Malaysia. The test results show a considerably very low risk level of
Oryctes rhinoceros beetles and Ganoderma in the five areas. The projection over three
months shows the same results except in area one. The overall OPPD risk assessment
is also considered very low. The novel ideas of the system formalization can be used to
serve other similar emergency management systems, such as pollution, fire and flood
monitoring and control.

Acknowledgment. This project is sponsored by the postdoctoral grant of Universiti Tun Hussein
Onn Malaysia (UTHM) under Vot D004 and partially supported by the Tier 1 research grant
scheme of UTHM under Vot U893.

References

1. Ramle, M., Wahid, M.B., Norman, K., Glare, T.R., Jackson, T.A.: The incidence and use of
Oryctes virus for control of rhinoceros beetle in oil palm plantations in Malaysia. J. Invertebr.
Pathol. 89(1), 85–90 (2005)
2. Foster, W.A., Snaddon, J.L., Turner, E.C., Fayle, T.M., Cockerill, T.D., Ellwood, M.F.,
Yusah, K.M.: Establishing the evidence base for maintaining biodiversity and ecosystem
function in the oil palm landscapes of South East Asia. Phil. Trans. R. Soc. B 366(1582),
3277–3291 (2011)
3. Murphy, D.J.: Future prospects for oil palm in the 21st century: biological and related
challenges. Eur. J. Lipid Sci. Technol. 109(4), 296–306 (2007)

4. Liaghat, S., Ehsani, R., Mansor, S., Shafri, H.Z., Meon, S., Sankaran, S., Azam, S.H.: Early
detection of basal stem rot disease (Ganoderma) in oil palms based on hyperspectral
reflectance data using pattern recognition algorithms. Int. J. Remote Sens. 35(10), 3427–3439
(2014)
5. Mostafa, S.A., Ahmad, M.S., Tang, A.Y., Ahmad, A., Annamalai, M., Mustapha, A.: Agent’s
autonomy adjustment via situation awareness. In: Intelligent Information and Database
Systems, pp. 443–453. Springer, Cham (2014)
6. Andreadis, G., Bouzakis, K.D., Klazoglou, P., Niwtaki, K.: Review of agent-based systems
in the manufacturing section. Univers. J. Mech. Eng. 2(2), 55–59 (2014)
7. Durand, B., Godary-Dejean, K., Lapierre, L., Crestani, D.: Inconsistencies evaluation
mechanisms for a hybrid control architecture with adaptive autonomy. In: CAR: Control
Architectures of Robots (2009)
8. Mostafa, S.A., Ahmad, M.S., Ahmad, A., Annamalai, M., Gunasekaran, S.S.: A flexible
human-agent interaction model for supervised autonomous systems. In: 2016 2nd
International Symposium on Agent, Multi-Agent Systems and Robotics (ISAMSR), pp. 106–
111. IEEE, Putrajaya (2016)
9. Shafri, H.Z., Hamdan, N.: Hyperspectral imagery for mapping disease infection in oil palm
plantation using vegetation indices and red edge techniques. Am. J. Appl. Sci. 6(6), 1031
(2009)
10. Baader, F., Bauer, A., Baumgartner, P., Cregan, A., Gabaldon, A., Ji, K., Schwitter, R.: A
novel architecture for situation awareness systems. In: International Conference on
Automated Reasoning with Analytic Tableaux and Related Methods, pp. 77–92. Springer,
Heidelberg (2009)
11. Mostafa, S.A., Ahmad, M.S., Annamalai, M., Ahmad, A., Gunasekaran, S.S.: Formulating
dynamic agents’ operational state via situation awareness assessment. In: Advances in
Intelligent Informatics, pp. 545–556. Springer, Cham (2015)
12. Flood, J., Bridge, P.D., Holderness, M.: Ganoderma Diseases of Perennial Crops. CABI, New
York (2000)
13. Panchal, G., Bridge, P.D.: Following basal stem rot in young oil palm plantings.
Mycopathologia 159(1), 123–127 (2005)
14. Bridge, P.D., O’Grady, E.B., Pilott, C.A., Sanderson, F.R.: Development of molecular
diagnostics for the detection of Ganoderma isolates pathogenic to oil palm. In: Flood, J.,
Bridge, P.D., Holderness, M. (eds.) Ganoderma Diseases of Perennial Crops, pp. 225–234.
CAB International, Wallingford (2000)
15. Mohammed, C.L., Rimbawanto, A., Page, D.E.: Management of basidiomycete root-and
stem-rot diseases in oil palm, rubber and tropical hardwood plantation crops. Forest Pathol.
44(6), 428–446 (2014)
16. Liaghat, S., Balasundram, S.K.: A review: the role of remote sensing in precision agriculture.
Am. J. Agric. Biol. Sci. 5(1), 50–55 (2010)
17. Woittiez, L.S., van Wijk, M.T., Slingerland, M., van Noordwijk, M., Giller, K.E.: Yield gaps
in oil palm: a quantitative review of contributing factors. Eur. J. Agron. 83, 57–77 (2017)
18. Idris, A.S., Mior, M.H.A.Z., Wahid, O., Kushairi, A.: Geostatistics for monitoring Ganoderma
outbreak in oil palm plantations. MPOBTS Information Series 74 (2010)
19. Byrski, A., Dreżewski, R., Siwik, L., Kisiel-Dorohinicki, M.: Evolutionary multi-agent
systems. Knowl. Eng. Rev. 30(2), 171–186 (2015)
20. Wardziński, A.: The role of situation awareness in assuring safety of autonomous vehicles.
In: International Conference on Computer Safety, Reliability, and Security, pp. 205–218.
Springer, Heidelberg (2006)

21. Lili, Y., Rubo, Z., Hengwen, G.: Situation reasoning for an adjustable autonomy system. Int.
J. Intell. Comput. Cybern. 5(2), 226–238 (2012)
22. Hoogendoorn, M., Van Lambalgen, R.M., Treur, J.: Modeling situation awareness in human-
like agents using mental models. IJCAI Proc. Int. Jt. Conf. Artif. Intell. 22(1), 1697–1704
(2011)
23. Endsley, M.R.: Situation awareness global assessment technique (SAGAT). In: Aerospace
and Electronics Conference NAECON National, pp. 789–795. IEEE (1988)
24. Mostafa, S.A., Ahmad, M.S., Ahmad, A., Annamalai, M.: Formulating situation awareness
for multi-agent systems. In: 2013 International Conference on Advanced Computer Science
Applications and Technologies, pp. 48–53. IEEE, Kuching (2013)
Using Mouse Dynamics for Continuous User
Authentication

Osama A. Salman and Sarab M. Hameed

Computer Science Department, College of Science,
University of Baghdad, Baghdad, Iraq
{ausama_cnc,sarab_majeed}@yahoo.com

Abstract. This paper suggests a new model for user authentication based on
mouse dynamics. The proposed model utilizes a neural network to identify user
behavior, and a Gaussian Naïve Bayes classifier is applied for classification
purposes and for assessing the ability of the proposed model to distinguish between
genuine and imposter users. The performance of the proposed model is examined
on a dataset of 48 users. The results show that the proposed model outperforms
other models in all evaluation metrics, including accuracy, false acceptance rate,
false rejection rate and equal error rate.

Keywords: Behavioral biometrics · Continuous authentication ·
Gaussian Naïve Bayes · Mouse dynamics · Neural network

1 Introduction

Interest in research on mouse dynamics based user authentication has been
growing because it needs no specific hardware to collect biometric data. Mouse
dynamics involves monitoring the user's behavior through how he/she interacts with
the mouse as a means of authentication [1].
Unfortunately, most present computer systems authenticate a user just at the
beginning of a login session. This leaves a crucial security flaw in some cases:
when the user leaves the computer unlocked for a period of time, an attacker can
access the system resources and steal confidential information. To deal with this
problem, systems require continuous user authentication, in which the user is
authenticated throughout the session.
Several attempts in the literature have been made to address the problem of
continuous user authentication. Ahmed and Traore in 2010 [2] introduced the mouse
dynamics biometric concept and presented a detector to gather and process mouse
movements. In addition, various factors adopted to form a user signature were
considered. Testing of the detector was performed on a dataset of 48 users. The
proposed detector achieves a false acceptance rate (FAR) of 2.6052% and a false
rejection rate (FRR) of 2.506%. Zheng et al. in 2011 [1] proposed an approach for user
authentication via mouse dynamics. In this approach, angle-based metrics are extracted
which are unique for each user and platform independent, and a Support Vector
Machine (SVM) was used for verification. Experiments were conducted on two
datasets: the first one involved thirty users and the second one consists of one thousand users.
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 776–787, 2019.
https://doi.org/10.1007/978-3-030-02686-8_58

The performance of the approach was evaluated in terms of the equal error rate (EER),
and the obtained result was 1.3%. Feher et al. in 2012 [3] introduced a method for
continuously verifying users based on individual mouse actions. They extracted new
features from a hierarchy of mouse actions. These new features are combined with
previous work's features. Furthermore, a multi-class classifier was utilized to verify
user identity. The evaluation was performed using a dataset collected from different
users. Results showed a significant enhancement in accuracy when applying the newly
injected features. Shen et al. in 2014 [4] presented a study of the performance of
anomaly detection algorithms based on mouse dynamics. The evaluation was
performed on a dataset containing 17,400 samples from fifty-five users, and seventeen
detectors were applied. The results show that the six top-performing detectors produce
an EER between 8.81% and 11.63%. Mondal and Bours in 2015 [5] presented a study
regarding the performance of continuous authentication using mouse dynamics. They
use a weighted fusion scheme, score boost, a static trust model and a dynamic trust
model for analysis, and SVM and ANN as classifiers. The evaluation was done on a
dataset that includes the mouse dynamics data obtained from forty-nine users. The
results showed significant improvement over the previous performance results on the
same dataset. Mondal and Bours in 2016 [6] introduced a new technique based on
pairwise user coupling for identification and continuous user authentication. They built
a dataset that contains a combination of keystroke and mouse dynamics behavior data.
The accuracy result is 62.2% and the detection rate is 58.9%. Lu et al. in 2017 [7]
proposed an authentication approach using mouse movement and eye movement
tracking. Two neural networks were used for multi-class classification and binary
classification. In addition, a regression model with fusion was used for classification
purposes. The performance of the proposed approach was evaluated on a dataset
collected from forty users. The results clarify that coupling eye tracking with mouse
dynamics is applicable for authentication.
In this paper, the problem of continuous user authentication is considered through
continuously analyzing the user's mouse movements to obtain active and continuous
authentication.
In what follows, we first briefly describe mouse dynamics. Then, in Sect. 3, we
introduce the proposed mouse dynamics user authentication model. The results of the
proposed model are evaluated in Sect. 4. Finally, Sect. 5 concludes the current work
and hints at some further ramifications.

2 Mouse Dynamics

Mouse dynamics is considered an example of behavioral biometrics. The key
strength of mouse dynamics compared with other biometric technologies is that it
enables monitoring the user dynamically and passively. Accordingly, it can be utilized
to continuously track genuine and imposter users during computing sessions [8]. The
mouse actions can be classified as silence actions, which denote no movement, and
movement activities. Movements of the mouse can involve movement type, movement
speed, traveled distance and movement direction. Movement type contains the
Mouse-Move (MM) action, the Drag-and-Drop (DD) action and the Point-and-Click
(PC) action [2].

Movement direction can be specified by an angle or by eight directions numbered
from one to eight. Each number comprises a set of mouse movements done within a
45-degree range. As an example, direction number one describes all actions done with
angles between zero degrees and 45 degrees, while direction number two represents
any actions done with angles between 45 degrees and 90 degrees [5].
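The 45-degree quantization amounts to integer division of the movement angle. A small sketch; the counter-clockwise orientation and the handling of exact sector boundaries are our assumptions:

```python
import math

# Map a movement angle (degrees) to one of the eight 45-degree
# direction sectors numbered 1..8, as described in the text.
def direction_number(angle_deg: float) -> int:
    a = angle_deg % 360.0          # normalize to [0, 360)
    return int(a // 45.0) + 1      # 0..44.9 -> 1, 45..89.9 -> 2, ...

# The angle itself can be derived from a movement's displacement:
def movement_angle(dx: float, dy: float) -> float:
    return math.degrees(math.atan2(dy, dx)) % 360.0

print(direction_number(30.0))  # -> 1 (between 0 and 45 degrees)
print(direction_number(60.0))  # -> 2 (between 45 and 90 degrees)
```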

3 The Proposed Mouse Dynamics Model

The proposed mouse dynamics model (coined MD) introduces new features
constructed from the properties of mouse movement to observe the user's behavior.
The main components of the proposed model can be summarized in two phases: a
preprocessing phase and a classification phase. A significant part of the proposed model
is the feature extraction process. The main effort in the proposed model is to observe
the behavior of the user and analyze it to represent the user by a set of features that
characterize the behavior of the user and are capable of discriminating a genuine user
from an imposter user. A neural network, histograms and average metrics are utilized
for extracting the mouse feature set that characterizes the behavior of the user.

3.1 Description of Mouse Raw Data

In this research, the dataset developed in [2] is used. This dataset includes the mouse
dynamics data of 998 sessions collected from 48 users. The collected data contains
mouse activities, where each activity holds the characteristics of an intercepted mouse
movement. Each record contains four main fields, as described in what follows:
1. Action type: takes four values 1, 2, 3 and 4 for mouse move (MM), silence,
point and click (PC), and drag and drop (DD), respectively.
2. Traveled distance in pixels (d).
3. Elapsed time in seconds (t).
4. Movement direction: takes eight values (1 to 8) according to the mouse movement.
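The four fields map naturally onto a small record type. A sketch; the field names and the `speed` helper are ours:

```python
from dataclasses import dataclass

# One intercepted mouse action, following the four fields listed above.
@dataclass
class MouseAction:
    action_type: int   # 1=MM, 2=silence, 3=PC, 4=DD
    distance: float    # traveled distance in pixels (d)
    elapsed: float     # elapsed time in seconds (t)
    direction: int     # movement direction, 1..8

    @property
    def speed(self) -> float:
        """Speed s = d / t, as used later during feature extraction."""
        return self.distance / self.elapsed

a = MouseAction(action_type=1, distance=120.0, elapsed=0.5, direction=3)
print(a.speed)  # -> 240.0
```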

3.2 Noise Reduction

The collected mouse raw data from each user has different ranges depending on the
environment setting, and the accuracy of mouse dynamics modeling can be affected by
the nature of the data. Two types of filtering are used in this work: the first filter
concerns the distance while the second one concerns the speed. In the first filter, only
mouse records whose distance is zero, or whose distance is greater than 25 and less
than 1200, are kept, while in the second filter all records with a speed greater than 800
are eliminated.
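Read literally, the two filters can be sketched as follows; exactly how zero-distance (silence) records interact with the thresholds is ambiguous in the description, so this reading is an assumption:

```python
# Sketch of the two noise filters (an assumed reading of the text):
# 1) keep records whose distance is zero or lies strictly between
#    25 and 1200 pixels;
# 2) then drop any record whose speed (d / t) exceeds 800.

def distance_filter(records):
    return [r for r in records if r["d"] == 0 or 25 < r["d"] < 1200]

def speed_filter(records):
    return [r for r in records if r["d"] / r["t"] <= 800]

raw = [{"d": 0, "t": 1.0},     # silence: kept
       {"d": 10, "t": 0.1},    # distance too short: dropped
       {"d": 100, "t": 0.05},  # speed 2000 > 800: dropped
       {"d": 100, "t": 0.5},   # kept
       {"d": 2000, "t": 1.0}]  # distance too long: dropped
clean = speed_filter(distance_filter(raw))
print(len(clean))  # -> 2
```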

3.3 Mouse Dynamics Features Extraction

The purpose of the mouse dynamics feature extraction process is to find the
distinguishing features that form the user's behavior. The raw mouse data can be
utilized to construct a mouse dynamics signature for a user. A mouse dynamics
signature is constructed by introducing new features, which are utilized in conjunction
with the features presented in [2]. The process of extracting the feature set F is carried
out through a proposed model based on a neural network and histograms. To construct
the feature set F that determines the user's behavior, the mouse dynamics raw data for
each user was divided into a number of mouse actions called sessions. To characterize
the behavior of a user in each session, a number of features are extracted. These feature
sets result from a combination of the new features introduced in this paper with the
features of [2]. The features of [2] are the movement direction histogram (MDH)
denoted by eight values, the action type histogram (ATH) denoted by three values, the
traveled distance histogram (TDH) denoted by two values, the movement elapsed time
histogram (MTH) denoted by three values, the average movement speed per action
type (ATA) denoted by three values and the average movement speed for each
direction (MDA) denoted by eight values.
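Taken together, the features of [2] form a 27-value per-session vector (8 + 3 + 2 + 3 + 3 + 8). A sketch of the assembly, with the individual histogram and average computations left as placeholders:

```python
# Assemble the 27-value per-session feature vector of [2]:
# MDH (8) + ATH (3) + TDH (2) + MTH (3) + ATA (3) + MDA (8).
def session_features(mdh, ath, tdh, mth, ata, mda):
    parts = {"MDH": (mdh, 8), "ATH": (ath, 3), "TDH": (tdh, 2),
             "MTH": (mth, 3), "ATA": (ata, 3), "MDA": (mda, 8)}
    vec = []
    for name, (values, expected) in parts.items():
        assert len(values) == expected, f"{name} needs {expected} values"
        vec.extend(values)
    return vec

vec = session_features([0.1] * 8, [0.2] * 3, [0.3] * 2,
                       [0.4] * 3, [5.0] * 3, [6.0] * 8)
print(len(vec))  # -> 27
```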
The new features are introduced via a neural network that models the user's behavior
from the mouse raw data. A backpropagation neural network is utilized to introduce a
new feature that defines the user's behavior from mouse dynamics as a curve
approximating the user-collected data. The authentication model involves neural
network training to learn the acceleration that should occur within the mouse
dynamics. This is accomplished through neural network training with the speed and
distance of mouse dynamics. First, the speed (s) is calculated for each action as the
ratio of the traveled distance to the time of that action, as in (1).

∀i = 1, ..., n_r:  s_i = d_i / t_i    (1)

where n_r is the total number of actions in the raw mouse data.


In addition, the acceleration is calculated from the time and the speed in the raw mouse data. To train the neural network, it is fed with two inputs, the time (t) and the speed (s), for each user session. The output of the neural network represents the acceleration for a particular input. The structure of the neural network consists of an input layer with two neurons that express the time and the speed, a hidden layer with five neurons, and one neuron at the output layer that represents the acceleration of the corresponding inputs. Two activation functions are used: the hyperbolic tangent sigmoid function for the hidden neurons and the linear function for the output neuron.
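This 2-5-1 architecture (time and speed in, acceleration out; tanh hidden units, linear output) can be sketched with plain NumPy backpropagation. The synthetic session data and the squared-error training loop below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one session: inputs are (time, speed) pairs,
# the target is the acceleration the network should learn to reproduce.
X = rng.uniform(0.5, 2.0, size=(200, 2))          # columns: time t, speed s
y = (X[:, 1] / X[:, 0]).reshape(-1, 1)            # illustrative target

# 2 input neurons -> 5 tanh hidden neurons -> 1 linear output neuron
W1 = rng.normal(0.0, 0.5, (2, 5)); b1 = np.zeros(5)
W2 = rng.normal(0.0, 0.5, (5, 1)); b2 = np.zeros(1)

eta = 0.01                                        # learning rate, as in Sect. 4
for epoch in range(1000):                         # 1000 epochs, as in Sect. 4
    h = np.tanh(X @ W1 + b1)                      # hidden layer (tanh sigmoid)
    out = h @ W2 + b2                             # output layer (linear)
    err = out - y
    # backpropagate the mean squared error
    g_out = 2.0 * err / len(X)
    g_W2 = h.T @ g_out; g_b2 = g_out.sum(axis=0)
    g_h = (g_out @ W2.T) * (1.0 - h ** 2)         # tanh derivative
    g_W1 = X.T @ g_h; g_b1 = g_h.sum(axis=0)
    W2 -= eta * g_W2; b2 -= eta * g_b2
    W1 -= eta * g_W1; b1 -= eta * g_b1

mse = float(np.mean(err ** 2))
```

The same 2-5-1 shape and learning rate are reused later for the experiments in Sect. 4.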
The trained backpropagation neural network is used to investigate the behavior of the user. To model the behavior of the user, first, for each testing session, the minimum and maximum values of the speed and the time, (smin), (smax), (tmin) and (tmax) respectively, are found. Then, twelve values of the speed and the time are extracted using (2) and (3).
780 O. A. Salman and S. M. Hameed

s0 = smin
t0 = tmin

∀i, 1 ≤ i ≤ 11:

si = (smax − smin) / 11 + si−1     (2)

ti = (tmax − tmin) / 11 + ti−1     (3)

where smin and smax are the minimum and maximum values s can get, and tmin and tmax are the minimum and maximum values t can get.
The extracted twelve values of the speed and the time, which represent the input values, are propagated forward through the network to calculate the net output: the weighted sum of the inputs to each neuron is calculated, the bias value is added to the sum, and finally the activation function for the neuron (i.e. the desired output) is computed. The resulting twelve acceleration values are inverted (coined AST) to obtain distinct, separable behavior across users.
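A sketch of this AST extraction, assuming a trained network is available as a prediction function `forward` (a hypothetical helper standing in for the network described above):

```python
import numpy as np

def ast_features(speeds, times, forward):
    """Twelve AST values for one test session, per Eqs. (2)-(3).
    `forward` is assumed to map an (n, 2) array of (time, speed)
    pairs to the n accelerations predicted by the trained network."""
    # np.linspace with 12 points reproduces the recurrences (2) and (3):
    # s_i = s_{i-1} + (s_max - s_min)/11, starting from s_0 = s_min.
    s = np.linspace(min(speeds), max(speeds), 12)
    t = np.linspace(min(times), max(times), 12)
    acc = forward(np.column_stack([t, s]))
    return 1.0 / acc  # inverted accelerations, coined AST in the text
```

In practice the inversion assumes the predicted accelerations are nonzero; a real implementation would likely guard against near-zero outputs.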

3.4 Normalization
After extracting from the raw mouse data the features that describe the user behavior, F can now be represented by l distinct features for each session. Therefore, the dataset of mouse dynamics F can be formally described as:

F = {F1, F2, …, Fn},

where n is the total number of sessions in F. Each session Fk ∈ F can be expressed as follows:

∀k ∈ {1, …, n}:   Fk = {fk1, fk2, …, fkl}

where fkl determines whether Fk belongs to a legitimate user or an imposter, as formulated in what follows:

fkl = 1 if Fk is a genuine user, and fkl = 0 otherwise.
Using Mouse Dynamics for Continuous User Authentication 781

The values of the extracted features have different ranges; therefore, the features are set to a uniform range to avoid some features dominating over others. The features are scaled linearly to the range [−1, 1].
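A minimal sketch of this linear scaling (mapping constant columns to zero is an added assumption; the paper does not discuss that edge case):

```python
import numpy as np

def scale_features(F):
    """Linearly rescale each feature column of F to the range [-1, 1].
    F: (n_sessions, n_features) array-like."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    scaled = 2.0 * (F - lo) / span - 1.0
    return np.where(hi > lo, scaled, 0.0)    # constant features carry no information
```

Note that the per-feature minima and maxima would be taken from the training data and reused when scaling test sessions.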

3.5 Classification Phase


The role of the classification stage in the proposed user authentication model is to categorize a user behavior as either genuine or imposter. The features extracted in the preprocessing stage are used as the input to this stage. A Gaussian Naïve Bayes (NB) classifier is utilized to show the ability of the proposed model to recognize user behavior as genuine or imposter. Gaussian NB is used because of its ability to estimate the probabilities of the features by scanning the training data only once, which makes the classification task straightforward.
The Gaussian NB classifier comprises two stages: a learning stage and a testing stage. The learning stage estimates the prior probability of the genuine and imposter classes and the probability of each predictor given the class. The testing stage, in turn, categorizes the user behavior as either genuine or imposter.
In the learning stage, Gaussian NB is trained with the features extracted in the data preprocessing phase. Given feature vectors F = {F1, F2, …, Fn} and their corresponding labels C = {0, 1}, the prior probability P(cj), cj ∈ C, ∀j ∈ {1, 2}, is calculated as the number of user behaviors belonging to cj divided by the total number of user behaviors in the training dataset, as in Eq. (4).

P(cj) = (Σ k=1..n uk) / n,   cj ∈ C, ∀j ∈ {1, 2}     (4)

where uk = 1 if ck = cj, and uk = 0 otherwise.

The distribution of each feature given the class is estimated by calculating the mean μij and variance σ²ij of feature Fi as in (5) and (6), respectively.

μij = (Σ k=1..n fki uk) / (Σ k=1..n uk)     (5)

σ²ij = (Σ k=1..n (fki − μij)² uk) / (Σ k=1..n uk)     (6)

where uk is defined as in (4).

In the testing stage, the prior probability and the mean and variance of each feature resulting from the learning phase are used as input to the classification phase. Then, for each feature vector in the testing dataset F′ = {F′1, …, F′nt}, the posterior probability of each class cj ∈ C, ∀j ∈ {1, 2}, is computed as in (7).

P(cj | F′) = P(cj) · Π i=1..l PDF(f′i)     (7)

where f′i is the i-th feature of the test vector F′ and PDF is the probability density function, computed as in (8).

PDF(f′i) = (1 / √(2π σ²ij)) · exp(−(f′i − μij)² / (2σ²ij))     (8)

Lastly, a label is assigned to the user represented by a test feature vector after computing the two posterior probabilities. The user is categorized as genuine when the posterior probability of c1 is higher than the posterior probability of c2; otherwise, the user is classified as an imposter.
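Equations (4)-(8) amount to a standard Gaussian Naïve Bayes classifier. A compact sketch follows (not the authors' code; the small variance floor is an added numerical safeguard):

```python
import numpy as np

class GaussianNBSketch:
    """Minimal Gaussian Naive Bayes following Eqs. (4)-(8)."""

    def fit(self, F, c):
        F, c = np.asarray(F, dtype=float), np.asarray(c)
        self.classes = np.unique(c)
        self.prior, self.mu, self.var = [], [], []
        for cj in self.classes:
            mask = c == cj                          # u_k in Eq. (4)
            self.prior.append(mask.mean())          # prior, Eq. (4)
            self.mu.append(F[mask].mean(axis=0))    # class mean, Eq. (5)
            # class variance, Eq. (6); small floor guards against zero variance
            self.var.append(F[mask].var(axis=0) + 1e-9)
        return self

    def predict(self, F):
        labels = []
        for x in np.asarray(F, dtype=float):
            post = []
            for p, mu, var in zip(self.prior, self.mu, self.var):
                # per-feature Gaussian density, Eq. (8)
                pdf = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
                post.append(p * pdf.prod())         # posterior, Eq. (7)
            labels.append(self.classes[int(np.argmax(post))])
        return np.array(labels)
```

Training scans the data once per class to obtain priors, means and variances, which is why the paper describes the classification task as straightforward.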

4 Experimental Results

The performance of the proposed MD model is evaluated in terms of Acc, FAR and FRR. The backpropagation algorithm is adopted for identifying the user behavior. The results are obtained by setting the parameters of the neural network as follows:
1. The number of epochs is set to 1000.
2. The learning rate η = 0.01.
3. The error rate ε = 0.001.
Furthermore, a 3-fold cross-validation approach is used for testing the proposed mouse dynamics model.
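The reported metrics follow their usual definitions for a genuine/imposter labeling; a sketch of their computation for one fold is shown below (fold splitting and averaging over the three folds are omitted):

```python
import numpy as np

def far_frr_acc(y_true, y_pred):
    """FAR: fraction of imposter sessions (label 0) accepted as genuine (1);
    FRR: fraction of genuine sessions (label 1) rejected as imposter;
    Acc: overall fraction of correctly classified sessions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    imposter, genuine = y_true == 0, y_true == 1
    far = float(np.mean(y_pred[imposter] == 1)) if imposter.any() else 0.0
    frr = float(np.mean(y_pred[genuine] == 0)) if genuine.any() else 0.0
    acc = float(np.mean(y_pred == y_true))
    return far, frr, acc
```

In the evaluation, these three values would be computed per fold of the 3-fold cross-validation and then averaged, as in the rows of Tables 1 and 2.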

4.1 Session Length Setting


The dataset of [2] consists of 48 users, each with a different number of actions. In the proposed work, five different settings for the session length, Slen = {500, 1000, 1500, 2000, 2500}, are adopted to show the impact of the session length on the ability of the proposed user authentication model to discriminate among users. The session length represents the number of actions required to complete a session. Figure 1 depicts the number of sessions per user while varying the session length.

[Bar chart: Number of Sessions (y-axis, 0-450) per User Number (x-axis, users 1-48), with one series per session length (500, 1000, 1500, 2000, 2500).]

Fig. 1. Number of sessions for each user.

4.2 Evaluation of MD Model


This section illustrates the contribution of the newly introduced features by comparing the performance of the proposed model against [2] regarding accuracy (Acc), false reject rate (FRR), false accept rate (FAR), equal error rate (EER) and area under the curve (AUC). Tables 1 and 2 quantitatively report the comparison of the proposed model against [2].

The results show that the session length affects the performance of the user authentication model. Increasing the session length provides the proposed model with more information for constructing a user signature from mouse dynamics, which enhances the performance of the user authentication model in terms of accuracy and minimizes FAR and FRR. The proposed MD model has the same number of features as in [2]. The results reported in Tables 1 and 2 clarify the high performance of the MD model in all evaluation metrics compared to [2]. This evidences the contribution of the new AST features to the accuracy of the user authentication model and the appropriateness of including AST features to characterize mouse activity.
The ROC curves in Figs. 2, 3 and 4 qualitatively compare the proposed model for each fold, with a session length of 1500, against [2]. The ROC figures demonstrate that the proposed model outperforms [2].

Table 1. Comparison of the proposed model against [2] in terms of Acc, FRR and FAR
MD model [2]
Session length Fold # Acc% FRR FAR Acc% FRR FAR
500 1 91.397 0.2 0.079 89.853 0.288 0.09
2 91.029 0.266 0.079 91.324 0.228 0.078
3 90.588 0.266 0.084 90.956 0.316 0.077
Avg. 90.931 0.248 0.081 90.809 0.274 0.081
1000 1 91.801 0.175 0.076 89.751 0.162 0.099
2 92.105 0.22 0.07 89.62 0.279 0.092
3 92.69 0.214 0.064 89.912 0.209 0.094
Avg. 92.541 0.205 0.066 89.81 0.223 0.094
1500 1 92.375 0.05 0.077 91.939 0.121 0.077
2 93.682 0.103 0.06 91.721 0.1 0.082
3 92.593 0.108 0.071 88.235 0.13 0.117
Avg. 92.884 0.079 0.07 90.924 0.127 0.088
2000 1 91.908 0.1 0.08 92.197 0.208 0.068
2 93.66 0.087 0.062 89.049 0.091 0.111
3 93.931 0.083 0.059 92.197 0.238 0.068
Avg. 92.59 0.128 0.072 91.528 0.118 0.082
2500 1 92.473 0.063 0.076 92.115 0.053 0.081
2 93.571 0.125 0.059 91.429 0.15 0.081
3 93.548 0.176 0.057 93.19 0.056 0.069
Avg. 93.563 0.822 0.009 92.241 0.087 0.077

Table 2. Comparison of the proposed model against [2] in terms of EER and AUC
MD model [2]
Session length Fold # EER AUC EER AUC
500 1 0.175 0.941 0.175 0.915
2 0.152 0.927 0.177 0.915
3 0.142 0.927 0.177 0.904
Avg. 0.156 0.932 0.176 0.911
1000 1 0.086 0.971 0.111 0.953
2 0.122 0.954 0.122 0.943
3 0.118 0.954 0.123 0.957
Avg. 0.109 0.96 0.119 0.951
1500 1 0.071 0.983 0.091 0.97
2 0.088 0.974 0.1 0.96
3 0.073 0.975 0.106 0.951
Avg. 0.077 0.977 0.099 0.96
2000 1 0.089 0.984 0.042 0.99
2 0.087 0.981 0.091 0.97
3 0.083 0.977 0.086 0.979
Avg. 0.086 0.981 0.073 0.98
2500 1 0.063 0.982 0.058 0.984
2 0.086 0.976 0.123 0.968
3 0.092 0.981 0.056 0.976
Avg. 0.08 0.98 0.079 0.976

[ROC curve plotting TPR = 1 − FAR against FRR for the MD model and [2], with the EER point marked.]

Fig. 2. ROC of the proposed models against [2] for fold number 1.

[ROC curve plotting TPR = 1 − FAR against FRR for the MD model and [2], with the EER point marked.]

Fig. 3. ROC of the proposed models against [2] for fold number 2.

[ROC curve plotting TPR = 1 − FAR against FRR for the MD model and [2], with the EER point marked.]

Fig. 4. ROC of the proposed models against [2] for fold number 3.

5 Conclusions

This paper proposes a continuous user authentication model based on mouse dynamics. The proposed model couples a neural network with statistical metrics to extract valuable features from mouse movement activities that are able to distinguish a user as genuine or an imposter. The neural network is utilized to identify the user behavior in a new manner, from the mouse acceleration, as a curve. Comparison results confirm that the proposed model outperforms the related model considered in the literature in all evaluation metrics. A direction for future work is to employ silence actions from mouse dynamics to model the user behavior.

References
1. Zheng, N., Paloski, A., Wang, H.: An efficient user verification system via mouse movements. In: Proceedings of the 18th ACM Conference on Computer and Communications Security - CCS 2011, Chicago, 17–21 October 2011
2. Ahmed, A.A.E., Traore, I.: Mouse dynamics biometric technology. In: Klinger, K., Snavely, J. (eds.) Behavioral Biometrics for Human Identification: Intelligent Applications, pp. 207–223 (2010)
3. Feher, C., Elovici, Y., Moskovitch, R., Rokach, L., Schclar, A.: User identity verification via mouse dynamics. Inf. Sci. 201, 19–36 (2012)
4. Chao, S., Zhongmin, C., Xiaohong, G., Roy, M.: Performance evaluation of anomaly-detection algorithms for mouse dynamics. Comput. Secur. 45, 156–171 (2014)
5. Mondal, S., Bours, P.: A computational approach to the continuous authentication biometric system. Inf. Sci. 304, 28–53 (2015)
6. Mondal, S., Bours, P.: Combining keystroke and mouse dynamics for continuous user authentication and identification. In: 2016 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), Sendai, Japan, 26 May 2016
7. Lu, H., Rose, J., Liu, Y., Awad, A., Hou, L.: Combining mouse and eye movement biometrics for user authentication. In: Information Security Practices, pp. 55–71. Springer, Cham (2017)
8. Shen, C., Cai, Z., Guan, X., Du, Y., Maxion, R.A.: User authentication through mouse dynamics. IEEE Trans. Inf. Forensics Secur. 8(1), 16–30 (2013)
Ten Guidelines for Intelligent Systems Futures

Daria Loi

Intel Corporation, Hillsboro, OR, USA


daria.a.loi@intel.com

Abstract. Intelligent systems – those that leverage the power of Artificial


Intelligence (AI) – are set to transform how we live, travel, learn, relate to each
other and experience the world. This paper details outcomes of a global study,
where a multi-pronged methodology was adopted to identify people’s percep-
tions, attitudes, thresholds and expectations of intelligent systems and to assess
their perspectives toward concepts focused on bringing such systems in the
home, car, and workspace. After background details grounding the study’s
rationale, the paper first outlines the research approach and then summarizes key
findings, including a discussion on how people’s knowledge of intelligent
systems impacts their understandings of (and willingness to embrace) such
systems; an overview of the domino effect of smart things; an outline of people’s
concerns with, flexibility toward and need to maintain control over intelligent
systems; and a discussion of people’s preference for helper usages, as well as
insights on how people view Affective Computing. Ten design guidelines that
were informed by the study findings are outlined in the fourth section, while the
last part of the paper offers conclusive remarks, alongside open questions and a
call for action that focuses on designers’ and developers’ moral and ethical
responsibility for how intelligent systems futures are being and will be shaped.

Keywords: Intelligent systems · Design guidelines · Ethics of AI

1 Introduction

This paper discusses outcomes of a global study, in which a multi-pronged method-


ology was adopted to identify people’s perceptions, attitudes, thresholds and expec-
tations of intelligent systems and to assess their perspectives toward concepts focused
on bringing such systems into the home, car, and workspace. The paper is divided into
five sections. Background details to ground the study’s rationale are first offered,
followed by an outline of the study approach. Key study findings are summarized in the
third section of the paper and the fourth highlights ten design guidelines that were
informed by the study findings. Conclusive remarks are finally offered, including open
questions and a call for action.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 788–805, 2019.
https://doi.org/10.1007/978-3-030-02686-8_59

2 Background

Intelligent systems – those that leverage the power of Artificial Intelligence (AI) – are
set to transform how we live, travel, learn, relate to each other and experience the
world. While these systems so far proved beneficial through scripted automation and
transactions (e.g. sensor-based factory automation or phone-based health or financial
transactions), serious challenges emerge when they are designed to have unscripted,
autonomous, active roles. Challenges increase further when these systems include
Affective Computing abilities [1] or become integral part of the environments we
inhabit daily (e.g. home, office, school, vehicle). As these systems continue to be
developed, overlapping concerns are accelerating and becoming mainstream – from
fears of jobs replacement [2] to the emergence of surveillance [3] or deeply unequal
societies [4], to name a few. At the core of such concerns is the realization that AI and
intelligent systems may challenge, if not threaten, the fundamentals of human and
social behaviour and the very foundations of our society. As Bostrom and Yudkowsky
[5] point out, “although current AI offers us few ethical issues that are not already
present in the design of cars or power plants, the approach of AI algorithms toward
more humanlike thought portends predictable complication.”
The fascinating part of the AI debate is that negative focus is often directed at the systems, as if they had the ability to come into existence autonomously. While AI will be able to independently design and develop another AI [6], as of now intelligent systems have one common trait: they are designed by people – typically data scientists, often assisted by designers, social scientists, and business experts who make decisions on what to design, how, why and what data to feed into systems, making them smart
over time, training them. A challenging aspect of such a process is that people are neither perfect nor fully predictable – in other words, not only may human biases play a key role while systems are designed; human imperfections (i.e. traits of humanity) may also have repercussions on the systems themselves once they start interacting with the world.
A good example is offered by Tay, a chatbot designed to "experiment with and conduct research on conversational understanding" [7], capable of getting smarter by engaging with people in casual conversations. In less than 24 h from
its launch, Tay was morally corrupted (and subsequently shut down) as its users did
something not accounted for: they fed the system all sorts of misogynistic, racist
remarks and Tay simply learned – “repeating these sentiments back to users, proving
correct that old programming adage: flaming garbage pile in, flaming garbage pile out”
[8]. The Tay example shows how an intelligent system may channel or enable
unwanted, unpredicted behaviors because of at least three aspects that may have been
underestimated during the design process:
• people can be unpredictable,
• unpredictability implies a potential for unpredictable outcomes, and
• unpredictable outcomes may damage initial design intentions as well as the context
surrounding the system.
Stanford researchers recently made public work that utilized Deep Neural Networks
to detect sexual orientation from facial images. While in their paper Wang and Kosinski

[9] explained their rationale, mentioning that “findings expose a threat to the privacy
and safety of gay men and women”, their work turned against them, infuriating LGBT
advocacy groups [10], attracting AI experts’ criticism and disapproving readers’ email
threats, and becoming the center of an ethical review by the American Psychological
Association (cleared, see [11]). In this example, while scientists appeared to have a
benevolent agenda, their work negatively impacted the very cohort they allegedly
intended to protect – as well as themselves. Again, it is clear that an intelligent system
may end up channeling or enabling unwanted, unpredicted behaviors, likely because of
key aspects that were underestimated during the design process.
AI and intelligent systems are in desperate need of ethical as well as design guidelines. While AI has greatly evolved from a technical point of view, it is in its
infancy as far as ethics and design process goes. The challenge is not only a technical
one; it is first and foremost a social, cultural, political, ethical one. Jake Metcalf articulates this issue when he states that "more social scientists are using AI intending
to solve society’s ills, but they don’t have clear ethical guidelines to prevent them from
accidentally harming people (…). There aren’t consistent standards or transparent
review practices” [12].
The AI ethics debate is palpable yet not novel, given the number of organizations
focused on the topic (e.g. Partnership on AI; Leverhulme Centre for the Future of
Intelligence; Data & Society) and publications [5, 13–15]. It is clear that there are a
number of unaddressed social, behavioural, decisional and moral questions and that
great responsibilities are on the shoulders of those in charge of designing and devel-
oping intelligent systems. As Bostrom [14] puts it, while we could build a “superin-
telligence that would protect human values”, the “problem of how to control what the
superintelligence would do” looks rather difficult – within this context, designers and
technologists have key roles, agencies and responsibilities.
Ackerman [16] proposed that when AI lets us down it is not due to its creators' lack of care; it is due to the social-technical gap that exists between "what we know we must support and what we can support technically". Agreeing with Ackerman's [16] view,
this paper additionally proposes that a future enriched and enabled by intelligent yet
trustworthy, ethical systems requires careful implementation of guidelines that govern
the actions of designers, technologists, social scientists, and business experts that
decide what to design, how, why and what data to feed into a given system. The study
here reported was motivated by a need to contribute to conversations on such
guidelines.
The ethics debate surrounding intelligent systems has been and still is dominated by
two forces: data science on one side, social science and humanities on the other: on
both sides sit experts. Barocas and Boyd [15] discuss such polarization well, adding that "the gaps between data scientists and critics are wide, but critique divorced from practice only increases them". Adding to Barocas and Boyd's [15] perspectives, in this
paper it is proposed that “practice” should be extended to include end users’ everyday
life expertise. In other words, end users should actively participate in this debate and
related decision making.
This perspective is at the core of why the study described in this paper focused on
identifying design guidelines that are inspired from, supported by and grounded in
everyday people’s perspective, attitudes, thresholds and expectations toward intelligent

systems. Derived from people’s everyday practices, the guidelines focus on empow-
ering designers and developers to shape human-centric AI futures. The study did not
aim at creating the ultimate guideline list – rather, it was conducted to identify practical
people-centric recommendations that will hopefully spark a healthy debate on the
processes used to develop intelligent systems and the agency that designers and
developers should have in such processes.

3 Study Approach

The study at the center of this paper adopted a multi-pronged approach that mixed four
very diverse techniques: a large scale market analysis; a multi-country survey; 18 in-
home qualitative interviews; and one participatory workshop with 8 participants.
The market analysis, conducted at the start of the project, focused on intelligent
systems from a landscape perspective, with emphasis on nine verticals (home, office,
factory, retail, entertainment, public transport, automotive, classroom, learning) and
nine vectors (players, products, academic research, investments, partnerships, associ-
ations, mergers and acquisitions, policies, events). The analysis focused on existing
secondary research (e.g. publicly available data such as academic publications, press
releases, reports, whitepapers, and databases) to ground protocols for subsequent study
phases and help isolate key focus verticals. At the end of this first phase, smart home,
autonomous vehicles and smart workspace were selected as key verticals to focus on in
subsequent phases.
Survey and in-home interviews focused from a quantitative as well as qualitative
perspective on two key areas: people’s perceptions, attitudes, thresholds and expec-
tations of intelligent systems; and people’s perspectives toward specific scenarios of
intelligent systems in home, autonomous cars, and workspace. A series of jargon-free
descriptions were created and used with participants to:
• explain what intelligent systems are and what technologies they include;
• provide a series of scenarios of what such systems may enable;
• describe what smart homes, autonomous vehicles and smart workspaces are;
• offer specific examples showing what smart homes, autonomous vehicles and smart
workspaces may enable.
In addition to gathering feedback on a wide range of themes, a series of metrics
were collected to facilitate comparative analysis for each scenario and description:
• 1 to 5 Likert ratings to identify comfort levels or assess concepts across seven
parameters (relevance, uniqueness, appeal, quality, comfort, excitement,
trustworthiness);
• Word associations exercises, where participants were asked to provide feedback to
concepts by selecting three items from a list of adjectives (e.g. exciting; creepy);
• Emotion association exercises, where participants provided feedback by selecting
three items from a list of emotions (e.g. love/desire; worried/fearful).
While in-home interviews were conducted in the US, the survey was conducted in
US, PRC (People’s Republic of China) and Germany. Participants for both survey and

interviews were recruited using a screener that focused on several criteria, including:
age; gender; smartphone, PC and intelligent systems ownership; and intelligent systems
purchase intention (refer to Fig. 1 for sample details). The screener also focused on soft
quotas, such as family composition and income, and had a natural fallout in relation to
users’ knowledge of intelligent systems.

Fig. 1. Participants sample details. Source: Loi, D. 2017

The survey, administered to 607 participants, focused on:


• Ownership and intent to purchase intelligent systems;
• Comfort levels with embracing intelligent systems in four diverse contexts (home,
car, workspace and classroom);
• Grouping intelligent systems’ scenarios into one of four clusters: must have, nice to
have, do not want, and not sure;
• High level feedback to smart home, autonomous cars and smart workspace;
• Comfort level with specific usages focused on smart home, autonomous vehicles
and smart workspace; and
• Comparative feedback to smart home, autonomous cars and smart workspace
concepts.
In-home interviews lasted a total of two hours per participant, during which
observational techniques were mixed with a semi-scripted interview approach that
mirrored the above-mentioned survey’s flow, focus and criteria. After completing
survey and in-home interviews, a subset of interviewees was invited to a participatory workshop where themes were further explored and participants co-created a manifesto

to regulate intelligent systems futures. It should be noted that a similar workshop


structure was subsequently used within a professional conference setting [17, 18].

4 Results Highlights

Given the sample size and the multiple approaches used in the study, a vast amount of data was collected and analysed. Not all of it is reviewed in this paper – the data that grounded the guidelines is discussed in the following sub-sections, while additional findings will be discussed in future publications.

4.1 What One Knows Makes a Difference


During survey and interviews, participants were asked to rate their likelihood to pur-
chase an intelligent system for smart home, autonomous vehicle and smart workspace
twice: at the start of the study and at the end, once participants had the opportunity to
enrich their knowledge through provided documentation.
Overall, participants’ likelihood to consider a smart home was rated higher when
compared with ratings for autonomous vehicles and smart workspaces and, during
interviews, participants were clearly more excited about this context of use. However,
when comparing ratings collected at the start with those collected at the end of the
session, data shows a drop in US and Germany ratings and an increase in PRC ones
(Fig. 2). Moreover, when comparing pre- and post-ratings by gender, data shows a
frequent drop in women’s ratings, while male ratings stay the same or increase.
A similar trend was noted during the in-home interviews. This data seems to indicate that the notion that knowledge equals understanding, and that understanding may equal a higher likelihood to embrace a new concept, may not always apply to intelligent systems.

Fig. 2. Response to concepts before and after exposure to details. Source: Loi, D. 2017

It is proposed that the amount and type of information provided play a role in people's perception of, and willingness to embrace, intelligent systems. Culture and gender play an even more crucial role. This data highlights that how intelligent systems are explained

as well as demonstrated to consumers will be central to their willingness to embrace, or


reject, such systems.

4.2 Once They Have One, They Want More


An interesting trend, here called the domino effect of smart things, emerged during in-home interviews. It is illustrated by the story of Catherine (pseudonym), a 40-year-old woman who shares a three-story detached house with her husband and 9-year-old daughter (Fig. 3).

Fig. 3. Catherine discusses the benefits of her smart home systems. Source: Loi, D. 2017

About a year ago Catherine purchased an Amazon Echo Dot and put it in the living room. After using the device for "music and reminders", she started enjoying more features yet realized that the device's range was confined to one room only. She therefore purchased a second system for the TV room and, soon after that, a new system for her daughter's bedroom, primarily to "listen to calming music at night". Her daughter is on the autism spectrum, and Catherine shared how pleased she is with the independence these devices are providing to her daughter. A fourth Dot was soon acquired for the home office. Then, in June 2017, the Amazon Echo Show started shipping – Catherine learned that she could buy two systems for a reduced cost and did not hesitate: one system was purchased for the kitchen and the other for the main bedroom. She loves the ability to use the two new systems as a video intercom, as they encourage her daughter to be more independent while providing the ability to visually check on her as needed. Catherine proudly showed me that she can use her Dots and Shows to operate her new smart alarm system as well as her new smart light systems. She also explained that she may soon purchase a smart lock for her main door. While showing how to inter-operate her devices, she shared that she wishes they did a better job of understanding when to listen (and not listen) or when she is talking to one system versus another. "They are not perfect", she said.

Catherine’s story was not unique during this study and survey data seems to
indicate that this domino effect may be common. For instance, when comparing data
related to ownership versus likelihood to purchase new intelligent systems, numbers
show that intent is almost invariably higher than existing ownership. Not only, PRC
participants (who owned the highest average of devices/person) expressed an higher
intent to purchase than US and Germany counterparts. There seems to be a direct
correlation between amount of owned intelligent systems and willingness to get even
more. While exciting news for those that manufacture and sell such products, this trend
could easily backfire if such multitudes of systems fail in satisfying people’s need to
have consistency and reliability in how they relate to each other.

4.3 Everyone Is Scared, yet Everyone Is Prepared to Compromise


Another clear trend, identified through the survey and then deepened through in-home interviews, relates to participants' general fears about and preoccupations with being part of an artificially intelligent world. Sonia, a 66-year-old retiree who spends time between grandkids and learning about technological innovations, shared a sense of resignation and acceptance when she stated: "I am not sure I want to live in a world where everything is artificial and intelligent, even if I can see a place for these things". More combatively and critically, 33-year-old small business owner Nathan shared concerns about technology's potential to impact relationships, "affecting intimacy, creating dependency". Most interviewees referred to intelligent systems as something useful yet deeply problematic.
Many well understood the quid pro quo of this technology: to be smart, an intelligent system needs to learn, and to learn, data – personal data – must be fed to the system. At the same time, all participants seemed open to negotiation, prepared to accept and compromise, as long as a clear Return on Investment (ROI) is provided in exchange.
In some cases accessing intelligence was worth the inconvenience of everyday
intrusions: “You say ‘Alexa’ or ‘Echo’ and it wants to start talking and you do not even
know how or why […] It feels like somebody is in our world [laughs]. We got used to
it but it is one of the things we dislike” (Catherine, 40). In other cases, people liked the
convenience but wanted to ensure they could still maintain the ability to be human: “In
life it’s good to make mistakes so you can learn from them… here feels like you would
not make mistakes anymore […] I can see the convenience but I can see that since
everyone learns by doing, here I would not get a chance to” (Sheila, 69). Some were
painfully aware of the fact that compromises will be needed: “I do not mind if my info
is being shared but it is not good when the data can be used against you… it all goes
down to what you are willing to compromise” (Stuart, 52). Others made it clear that
intelligent systems will need to provide a range of options, empowering them to choose
based on personal comfort zone: “Camera is a bit too much for me […] but I know it’s
needed for lots of these things so I guess it depends on the privacy options you have –
provided you know where the data is and what is being used for” (Amanda, 28).
As previously stated, if a clear ROI is provided, people appear rather open to
negotiating, accepting, and compromising on access to their data. The key is to provide usages that
have high ROI – these are discussed in the next section.
796 D. Loi

4.4 Safe, Efficient, Practical, Transparent


When the survey participants who declared no interest in purchasing AI-based
systems (N = 70) were asked to provide the rationale for their aversion (see Fig. 4),
39% of them listed security concerns, right after their top motivator: cost. In third
place, they listed privacy and intrusiveness issues. The trend mirrored findings from in-
home interviews: “It’s always listening… How secure is it? […] what do they do to
protect you?” (Jules, 39). Additionally, many participants had specific expectations:
“My concern is that these things can be hacked so I expect them to be designed so they
are safe” (Sonia, 66). People explained how their trust in a system is interlinked with
their trust toward those providing it: “What if a business or social change occurs and
the initial agreement behind the system changes? Where does (data) end? […] this is a
power that can be abused and I know that people always abuse power. It’s not that I do
not trust technology, I do not trust people” (Nathan, 33). Moreover, participants often
referred to brand trust as something that would make or break their willingness to
consider an intelligent system: “It’d have to be a company I trust, that has a proven
record of keeping things private, no security breaches, scams and things like that”
(Amanda, 28).

Fig. 4. Top motivators for not purchasing an AI-based system. Source: Loi, D. 2017

Possibly due to these privacy and data security concerns, many expressed greater
openness to and interest in intelligent systems focused on making them efficient. This is
well demonstrated by survey responses to a question in which participants were asked to
group a provided set of intelligent system usages into four clusters, namely must
have, nice to have, do not want, and not sure. As illustrated in Fig. 5 (usages abbre-
viated), the utilitarian usage “remind me of tasks and meetings” was rated as number
one “must have” overall and number one for the US (62%) and Germany (66%).

Fig. 5. Top 10 Must Have usages. Includes Top 3 ranking by country. Source: Loi, D. 2017

Fig. 6. High Comfort Smart Home usages (abbreviated). % indicates the share of participants who
feel “very comfortable” with a Smart Home performing the activity. Source: Loi, D. 2017

Another indication of this efficiency trend is visible in survey feedback received
in relation to Smart Home usages, where the top-ranking usages show a clear preference for
usages focused on maintenance, prevention, and efficiency (Fig. 6, usages
abbreviated).

4.5 The Secret Life of Emotions


The study also tested a number of Affective Computing [1] usages, all focused on the
ability to identify the emotional state of a person or group of people to activate a series
of context-appropriate actions (e.g. personalized recommendations or interventions).

When faced with usages that relate to such an intimate topic, people often paused and
their responses included a deep sense of skepticism, aversion, curiosity, and distrust.
Some participants felt intrigued yet did not trust the ability of an affective system to
be reliable and smart enough: “Thought provoking […] In theory is great but it’d need
to be super sophisticated. Not sure it can be THAT sophisticated [emphasis added to
mirror participant’s vocalization]” (Jim, 65) and “the issue is not with discomfort with
the action but doubts that it can do it properly and reliably” (Sheila, 69).
Others felt such usages would be intrusive: “Having a device monitoring my mood
is too personal […] This is beyond what a machine should be doing. Keeping track of
things is ok, but emotional state? I do not see this as a positive thing at all” (Esther, 34).
Some participants did not oppose affective systems but opposed the idea of a
system with conversational human agency: “I do not want the car to check up on me
but I do not think this [idea] is a bad thing” (Catherine, 40), while only a minority of
interviewees expressed excitement about the notion of human-like systems: “The more
personal technology gets… that’d be great. Not only smarter but actually human-like”
(Amanda, 28).
Finally, it should be added that during in-home interviews people often offered ideas
on how an affective system might specifically benefit them in given situations,
demonstrating the contextual nature of their willingness to embrace such systems.

4.6 Smart Versus Intelligently Independent Systems


Feedback demonstrated the clear line that people draw between smart and inde-
pendent systems. Many interviewees clarified that, to them, an intelligent system means
convenience and that, although open to some serendipity, they need control, pre-
dictability, and consistency. This need to be and feel in control is exemplified by data
related to the usage “ask before automating things”, which ranked second, with 57% of
users (aggregate numbers) selecting it as a must-have feature (Fig. 5). People’s dis-
comfort toward independent systems (and their need to keep control) was often
interlinked with discomfort toward technology with its own personality and perspec-
tives, as such traits were often seen as yet another way for a system to become
independent, overstepping beyond acceptable smartness.
It should be noted that while personality and perspectives were generally poorly
received, participants saw specific contexts where they would be not only acceptable
but desirable. For instance, the “provide companionship to elderly or people in need”
usage was very well received, ranking in the top five must-have features for the USA and
Germany and first for the PRC (Fig. 5). This indicates that in specific application contexts
(such as companionship), traits such as personality and the ability to have a per-
spective, acting with some degree of independence, are acceptable if not desirable. It
appears that the ROI of companionship usages is high, since people showed openness
to compromise on system traits that would otherwise be considered undesirable.

5 Ten Design Guidelines for Intelligent Systems Futures

The previous section focused on insights gathered over the course of the study at the
center of this paper. This section showcases 10 design guidelines that were directly
informed by those insights.

5.1 ONE: Take a Firm, Unambiguous Ethical Stand – Be a Trusted Brand


Intelligent systems have been and will continue to be exposed to high scrutiny, and
rightly so. In fact, scrutiny will increase, thanks to growing mainstream awareness,
standards, and governmental mandates. Intelligent system designers and developers
must not only promote ethical practices, they must design for them: be firm in their ethical
stand, ensuring that such a stand is present throughout the design and development
process. They must be a trusted brand and design trusted systems – this includes a
responsibility to speak up and not commit to (nor enable) new ideas, designs, and
developments if these appear to break one’s ethical stand.

5.2 TWO: Adopt the Minimize Intrusion Mantra and a Less-Is-More Approach

When in doubt, intelligent system designers and developers should use minimalism as
a compass. This means ensuring that intelligent systems strictly collect the minimal
data (in type and amount) required to successfully complete a requested trans-
action. During the study reported here, it was clear that the more one becomes familiar
with an intelligent system, the more one trusts and feels comfortable using it.
However, familiarity requires not only time; it requires careful design and
consideration.
One way to apply a minimalist approach is to set a system’s default settings at a basic
level – basic functions, mirrored by a basic level of data collection, use, storage, and
exposure. An intelligent system should be capable of dynamically changing its settings,
based on direct user requests or on feedback loops embedded in the system. Such a
system should be conceived as an organism that adapts to the user and should never
collect more data than is required to satisfactorily complete a task, unless spec-
ified otherwise by its users.
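The minimal-by-default behavior described above can be sketched in code. The following is a minimal, hypothetical illustration (the class and method names are the author of this edit’s invention, not from the study): every data category starts disabled, and a category is enabled only when the user explicitly consents to a feature that needs it.

```python
from dataclasses import dataclass, field

@dataclass
class DataCollectionSettings:
    # Minimalist default: no data category is collected at all.
    enabled: set = field(default_factory=set)

    def request_feature(self, feature: str, required_data: set, consent: bool) -> bool:
        """Enable a feature only if the user consents to the data it requires."""
        if not consent:
            return False
        # Expand collection strictly to what the requested transaction needs.
        self.enabled |= required_data
        return True

    def collects(self, category: str) -> bool:
        return category in self.enabled

settings = DataCollectionSettings()
assert not settings.collects("microphone")                       # basic defaults
settings.request_feature("voice_commands", {"microphone"}, consent=True)
assert settings.collects("microphone")                           # expanded on opt-in only
```

A real system would also need the feedback loops mentioned above (e.g. revoking categories when a feature is disabled), which this sketch omits for brevity.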

5.3 THREE: Design Socially Trusted and Trustworthy Platforms


This guideline incorporates a number of sub-guidelines, all centred on ensuring
intelligent systems are designed to be socially trusted and trustworthy. These include:
• Intelligent systems must fail safe;
• Privacy and hacking concerns must be addressed upfront – for instance, by offering
data protection services and warranties as part of the product;
• Checks-and-balances mechanisms must be embedded into the fabric of a system;
• By default, all data should be encrypted;

• Data types should be separated – only the user’s system should have the ability to
assemble them into a cohesive picture;
• Similar to online monetary transaction models, how an intelligent system does
something should be separated from what is being done and where it is done;
• Intelligent systems’ motivations and actions must be transparent;
• Users must have the ability to provide feedback to an intelligent system, and the
system must take all feedback into account as well as explain how and when the
provided feedback will be acted on; and
• An intelligent system must explain – in accessible, transparent ways – where data is
stored, where it may go, who can access it (and why), and whether it will stay
somewhere permanently.
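The “separate data types” sub-guideline above can be made concrete with a small sketch. This is a hypothetical illustration, not a design from the paper: each data type is stored server-side under an unlinkable random token, and only the user’s own device holds the mapping needed to reassemble a cohesive picture.

```python
import secrets

class SeparatedStore:
    """Server-side store holding shards that cannot be linked to each other."""

    def __init__(self):
        self._shards = {}              # token -> a single data-type value

    def put(self, value):
        token = secrets.token_hex(8)   # random, unlinkable identifier
        self._shards[token] = value
        return token

    def get(self, token):
        return self._shards[token]

store = SeparatedStore()               # e.g. a cloud-side service
# Only the user's system keeps the mapping from data type to token.
user_index = {
    "location": store.put("home"),
    "heart_rate": store.put(72),
}
# Server-side, shards are unlinkable; user-side, the picture reassembles.
profile = {k: store.get(t) for k, t in user_index.items()}
assert profile == {"location": "home", "heart_rate": 72}
```

In practice the shards would also be encrypted at rest (per the “encrypted by default” sub-guideline); the sketch shows only the separation aspect.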

5.4 FOUR: Do not Make Systems Human, but Capable of Helping Humans

During the study at the center of this paper, interviewees clearly articulated what type
of relationship they wish to have with intelligent systems: one where the system is in a
subordinate role, never a peer. People expressed a need to be in charge and, in
respecting such a need, the system should be designed to ensure that there is no
ambiguity about who is in control. Unless otherwise specified and authorized by the user,
an intelligent system should always ask before acting, with the exception of emer-
gencies, where additional behavioural rules will need to be defined and agreed on. Since
human-like attributes are typically associated with an unwelcome level of indepen-
dence, it is recommended to:
• Design helper systems, with clear power boundaries;
• Avoid designing systems that behave (or are perceived) as assuming or arrogant –
this is a particularly important point for affective systems;
• If emotion recognition is an available capability, tackle emotions by context and
embed in the system ways to educate people about them. However, never assume
on behalf of a user, and always leave full control over actions and behaviors to end
users;
• Do not underestimate people’s scepticism on affective usages’ reliability; and
• Consider using emotion understanding to help people help and connect with other
people.

5.5 FIVE: Prioritize Usages that Matter – Helper Usages


The fact that a technology could do something does not imply that it should. Designers
and developers should be mindful and reflective of this precept: in the case of intel-
ligent systems it is an extremely important point to consider. Pushing usages with
low (or no perceived) ROI, or usages that may be (or be perceived as) ethically
questionable or low in purpose, will have long-term repercussions on the product’s
success, on users’ willingness to embrace it, and potentially on society.
During this study, people clearly expressed which usages have high ROI: utilitarian
usages that make them feel efficient yet in charge; usages that help them save money,
energy and time; usages that reduce their preoccupations and remove frustrations; and
usages that allow them to remove boundaries and focus on what really matters. When
designing intelligent systems, it is recommended that everyday chores, efficiency, and
helper usages that keep a clear hierarchical distinction between helper and master are
tackled first. Systems should then be capable of identifying what type of advanced
usages may be pertinent and of interest to end users and should be designed to educate
as well as ramp up users to such advanced opportunities.

5.6 SIX: Design Systems with Consistent Behaviors, Yet Design for Serendipity

People expressed a duality throughout the study: on one side, they asked for consis-
tency and reliability; on the other, they did not want to feel predictable and asked for
technology that can enrich their understanding and even surprise them (within their
comfort zone). It is recommended to design systems that are enriching and predictable,
yet designers and developers should embrace the challenge of designing for contextual,
personalized serendipity.

5.7 SEVEN: Make People Feel Unique and Empower Their Unique
Goals
Part of human nature is the need to feel special, acknowledged as an individual. During
many interviews people described how they want to feel throughout their technological
interactions: unique. Many, for instance, resented the notion of a system so smart
that it can predict their behavior flawlessly, as feeling predictable makes them feel boring
and less unique. Additionally, most expressed a need for technology capable of
empowering them to achieve their unique goals – especially those goals that
would otherwise be out of their reach.
Intelligent systems should make people feel connected, wanted, and acknowledged
– a system should make users feel that they are cared for, and it should have the ability to
help users care for others and their surroundings. Companionship, social connected-
ness, and mediated social interactions all offer great design and development oppor-
tunities for addressing such a human need and for enriching people’s everyday lives,
especially the lives of those who may be in greater need of assistance, support, and
nurturing (for instance, senior citizens).

5.8 EIGHT: Create Multiple and Diverse Educating Tools


One-size-fits-all approaches are rarely satisfactory – in the case of intelligent systems
design, they would heavily compromise how people understand, perceive, relate to, and
embrace such systems. During the study, many participants expressed a strong sen-
timent that they have the right to a clear idea of what a system is, what it does,
where data goes, and who does what and when.
Designers and developers should focus on empowering people so they can make
informed choices about whether and how to incorporate intelligent systems in their lives.
This requires not only an appreciation that people have diverse baseline understandings
of these matters: it requires the development of multiple, diverse, contextual ways
(content, methods, tools) to educate people about what they are in for and how to choose
what is best for them, their surroundings, and their communities. These educating tools must
avoid cryptic, tech-centric, and confusing lingo – people need to understand these
systems, not be confused or feel betrayed by them. While this guideline should apply to
any product, in the case of complex systems powered by AI, where trust is a massive
sticking point, it becomes fundamental.

5.9 NINE: Design On-Boarding Mechanisms that Grow and Evolve


Borrowed from human resources [19], onboarding is a term that user experience
designers adopted to describe the process used to ramp up users, making them familiar
with a new site, app, or service and increasing the likelihood that they will continue
using such a site, app, or service. When well implemented, onboarding assists users in
learning how to use an application incrementally, avoiding cognitive overload. In the
context of intelligent systems, it is suggested to use this technique not only to ramp up
users as they start engaging with a system, but also to:
• Adjust the system’s behavior as it increases its knowledge of the user and as a
user’s understanding of, and trust toward, the system changes. For instance, a
system may ask the user: I noticed you tend to do X every day at the same time, do
you wish me to do Y instead of Z in the future?; and
• Contextually explain to the user what will be compromised or what could be gained
if a setting was changed or activated in relation to a system’s recommendation. For
instance, the system may say: Given you seem to like X, I think you may also enjoy
Y. If you wish to try Y out, note that I will have to gather Z data under W
circumstances.
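The first bullet above – a system noticing a habit and asking before automating – can be sketched in a few lines. This is a hypothetical illustration of the idea (class and method names are invented for this sketch, not taken from the study):

```python
from collections import Counter

class OnboardingAgent:
    """Evolving onboarding: notice a repeated behavior, then ask before automating."""

    def __init__(self, threshold=3):
        self.observations = Counter()
        self.threshold = threshold   # repetitions before suggesting automation

    def observe(self, action):
        """Record a user action; return a suggestion once a habit emerges."""
        self.observations[action] += 1
        if self.observations[action] == self.threshold:
            # Ask the user rather than silently automating (guideline FOUR).
            return (f"I noticed you tend to {action} every day at the same "
                    f"time. Do you wish me to automate this in the future?")
        return None

agent = OnboardingAgent()
assert agent.observe("dim the lights at 22:00") is None
assert agent.observe("dim the lights at 22:00") is None
prompt = agent.observe("dim the lights at 22:00")
assert prompt is not None and prompt.startswith("I noticed")
```

A fuller implementation would also honor the second bullet, explaining what data (here, a log of timestamped actions) the suggested automation would require.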
Intelligent systems should be able to evolve as their relationship with their users
evolves; they should have the ability to grow up and grow old with their owners, and should
transparently empower people, equipping them to choose what is best for them and to
change their choices over time.

5.10 TEN: Create Families of Products


During interviews with owners of several smart devices, a need for smart inter-
operability across devices often emerged as a key theme. Additionally, during the
research, participants showed a general tendency to use (or want to use) technology
outside its initially designed purposes – a tendency that reached new levels when they
interacted with their multiple intelligent systems: they expected such systems to be
smarter, able to dynamically adjust to their everyday practices, and capable of perfectly
collaborating with and understanding each other, regardless of who manufactured them and
regardless of their original purpose. In light of such behaviors, designers and devel-
opers should adopt a family-of-products design mindset, carefully designing for users’
expectations, potential cross-device usages, and likely mis-usages. Each intelligent
system should be designed as a node in a complex, dynamic network of systems.

6 Conclusions and Implications

The study reported in this paper aimed at identifying design guidelines that are inspired
by, supported by, and grounded in everyday people’s perspectives, attitudes, thresh-
olds, and expectations toward intelligent systems. Thanks to a multi-pronged approach
that included qualitative and quantitative tools, a number of key insights were dis-
cussed. First, the paper discussed how people’s knowledge of intelligent systems
impacts their understanding of (and willingness to embrace) such systems. After an
overview of the domino effect of smart things, the paper articulated that while people
have great concerns, they are prepared to flex their comfort zones if there is an evident
ROI. Then, it was demonstrated that people want to maintain control over intelligent
systems and that they have a preference for efficiency and helper usages. Finally, insights
on how people view Affective Computing [1] were offered, alongside a discussion
showing that while people are open to smart things, they are less enthusiastic about
intelligently independent ones. These insights were then used to articulate ten design
guidelines, namely:
• Take a firm, unambiguous ethical stand – be a trusted brand
• Adopt the minimize intrusion mantra and a less-is-more approach
• Design socially trusted & trustworthy platforms
• Do not make systems human, but capable of helping humans
• Prioritize usages that matter – helper usages
• Design systems with consistent behaviors, yet design for serendipity
• Make people feel unique and empower their unique goals
• Create multiple and diverse educating tools
• Design on-boarding mechanisms that grow and evolve
• Create families of products
The study discussed here was not intended to produce the ultimate design guide-
lines – rather, it was conducted to identify practical people-centric recommendations
that will hopefully spark a healthy debate on the processes used to develop intelligent
systems and the agency that designers and developers have, and should have, in such
processes. Within such a debate, a number of questions remain in need of deepening
and practical development, including:
• What ethical considerations should designers and developers prioritize?
• What level of autonomy and agency should intelligent systems have?
• Should autonomy and agency change contextually or by context of use? How?
• What level of transparency should be provided to end users? How?
• How should an intelligent system relate to, converse and engage with users?
• What specific design attributes may enable intelligent systems that are effective and
accurate yet unobtrusive, respectful, intuitive, and transparent?
• Can a human-centric approach to intelligent systems be effective while enabling
sustainable business models and technological progress?
• What social and behavioral contracts should underpin people’s interactions with
intelligent systems?

Designers and developers have the moral and ethical responsibility to engage with
how intelligent systems futures are being and will be shaped. A future enriched and
enabled by intelligent yet trustworthy, ethical systems requires careful implementation
of guidelines that govern the actions of those in charge of deciding what to design,
how, why and what data to feed into a given system. Designers and developers are
called on to be challenged by and contribute to the complex yet exciting task of shaping
the present and future of intelligent systems.

References
1. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
2. Global Artificial Intelligence study: sizing the prize. PwC. https://www.pwc.com/gx/en/
issues/data-and-analytics/publications/artificial-intelligence-study.html. Accessed 11 Oct
2017
3. The age of AI surveillance is here, Quartz. https://qz.com/1060606/the-age-of-ai-
surveillance-is-here/. Accessed 30 Sep 2017
4. Are we about to witness the most unequal societies in history? The Guardian. https://www.
theguardian.com/inequality/2017/may/24/are-we-about-to-witness-the-most-unequal-
societies-in-history-yuval-noah-harari. Accessed 23 June 2017
5. Bostrom, N., Yudkowsky, E.: The ethics of artificial intelligence. In: The Cambridge
Handbook of Artificial Intelligence, pp. 316–334. Cambridge University Press (2011)
6. Google’s New AI is better at creating AI than the company’s engineers, futurism. https://
futurism.com/googles-new-ai-is-better-at-creating-ai-than-the-companys-engineers/. Acces-
sed 23 June 2016
7. Trolls turned Tay, Microsoft’s fun millennial AI bot, into a genocidal maniac, The
Washington Post. https://www.washingtonpost.com/news/the-intersect/wp/2016/03/24/the-
internet-turned-tay-microsofts-fun-millennial-ai-bot-into-a-genocidal-maniac/?utm_term=.
388462f65470. Accessed 23 June 2016
8. Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day. The Verge.
https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist. Accessed 24
Mar 2016
9. Wang, Y., Kosinski, M.: Deep neural networks are more accurate than humans at detecting
sexual orientation from facial images. J. Pers. Soc. Psychol. 114, 246 (2018)
10. GLAAD and HRC call on Stanford University & responsible media to debunk dangerous &
flawed report claiming to identify LGBTQ people through facial recognition technology.
https://www.glaad.org/blog/glaad-and-hrc-call-stanford-university-responsible-media-
debunk-dangerous-flawed-report. Accessed 7 Nov 2017
11. Study claiming AI can detect sexual orientation cleared for publication. KQED. https://ww2.
kqed.org/futureofyou/2017/09/13/can-facial-recognition-detect-sexual-orientation-
controversial-stanford-study-now-under-ethical-review/. Accessed 7 Nov 2017
12. AI research is in desperate need of an ethical watchdog, Wired. https://www.wired.com/
story/ai-research-is-in-desperate-need-of-an-ethical-watchdog/. Accessed 14 Jan 2017
13. Gunkel, D.J.: The Machine Question: Critical Perspectives on AI, Robots, and Ethics. MIT
Press, Cambridge (2012)
14. Bostrom, N.: Superintelligence: Paths, Dangers, Strategies. Oxford University Press,
Oxford (2014)
15. Barocas, S., Boyd, D.: Engaging the ethics of data science in practice. Commun. ACM 60
(11), 23–25 (2017)

16. Ackerman, M.S.: The intellectual challenge of CSCW: the gap between social requirements
and technical feasibility. Hum. Comput. Interact. 15(2–3), 179–220 (2000)
17. Loi, D., Raffa, G., Esme, A.A.: Design for affective intelligence. In: HCII 2017, San Antonio
(2017)
18. Loi, D., Lodato, T., Wolf, C.T., Arar, R., Blomberg, J.: PD manifesto for AI futures. In: PDC
2018. Hasselt & Genk, Belgium (2018)
19. Bauer, T.N., Erdogan, B.: Organizational socialization: the effective onboarding of new
employees. In: Zedeck, S. (ed.) APA Handbook of Industrial and Organizational
Psychology, vol. 3, pp. 51–64 (2011)
Towards Computing Technologies on Machine
Parsing of English and Chinese Garden Path
Sentences

Jiali Du(&), Pingfang Yu, and Chengqing Zong

Guangdong University of Foreign Studies, Guangzhou 510420, China


dujiali68@126.com

Abstract. This paper discusses the syntactic effect and semantic influence of
computing technologies on machine parsing and machine translation (MT) of
English and Chinese Garden Path Sentences. An effective MT system focuses
on both accuracy and speed. Both syntactic and semantic information exerts a
considerable influence on translation. English gives head-occupied focus and
syntactic information is a key for parsing. Chinese provides end-directed focus
and semantic background is necessary for parsing. The translation of garden
path sentences in English and Chinese shows distinctive features. Different
filler-gap relations in source and target languages result in different output. The
integration of various methods of computational linguistics, e.g. CFG, RTN,
CYK, WFST and CQ analysis is helpful to explain the processing breakdown
and backtracking clearly and concisely.

Keywords: Machine translation · Computational linguistics · Garden path sentences

1 Introduction

MT (machine translation) is the direct result of combining computational skill and
linguistic knowledge. With the development of computational skills, MT has been used to
analyze natural languages and has been considered one of the first computational appli-
cations of linguistic knowledge since the 1950s. Even though the early MT pro-
grammes began with very high hopes, both accuracy and precision were lacking. MT
attempts to bridge the language gap and integrate communication between machines
and human beings; however, the technology may not always be perfect. Some scholars
call this unbalanced phenomenon ‘the spirit is willing but the flesh is weak’ or ‘the
steak is wonderful but the whisky is lousy’. With advances in software,
equipment, and linguistic involvement, the steadiness and reliability of MT have been
greatly improved over the past decade. Designing a system for MT requires capturing
both language-independent and language-specific information. An effective translation
has to recognize whole phrases and sentence structures and find their closest counterparts in
the target language. The application of large computerized corpora and statistical
techniques leads to better MT.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 806–827, 2019.
https://doi.org/10.1007/978-3-030-02686-8_60

MT systems are now ubiquitous and potent. Today’s global marketplace increases
the need for translation, and the growth of computing power satisfies the demand. The
widespread accessibility of MT makes for lively, clear, and concise presentation. For
example, the intelligent Thai text–sign translation for language learning (ITSTL)
system may benefit the deaf and hearing-impaired by translating Thai text into sign
language [1]. A parameterized interlingual approach called UNITRAN is desirable
because of its simple description of natural grammars; the MT system facilitates
modification, augmentation, and cross-linguistic variation [2]. A functional approach to
special translation takes advantage of the specified information of function and is
considered an effective method for MT [3]. Only on the basis of hybrid knowledge
databases can an MT system alleviate the garden path effect on sentence structure
and enhance its robustness [4–6].
The rapid growth of the Chinese economy requires indistinguishably good translations.
China has belonged to the BRICS group, which comprises Brazil, Russia, India, China and
South Africa, since 2010. China and the other countries, which are at a similar stage
of newly advanced economic development, try to build a better global economy. The
shift of economic power from the developed to the developing world requires China to be strong
enough to input efficiently and output widely. Thus more research on translation is
needed. MT in China is more popular than ever given the strong demand for translation, and
linguistic involvement in the system is probably the inevitable path to further
improvement. The translation of complex sentences, e.g. garden path (GP) sentences, is
hard for an MT system. It is necessary for the system to understand the deep structure of GP
sentences in order to work properly. English GP analysis has been an
attractive research programme since Bever [7, 8] published his articles discussing this
special phenomenon. However, Chinese GP discussion is infrequent and inadequate.
This paper will discuss the translation of English and Chinese GP sentences, and
compare the differences between them.

2 Machine Translation and Linguistic Involvement

MT is a sub-field of computational linguistics. Continued and heavy involvement in linguistic analysis is one of the key factors in enhancing the quality of MT. Early MT systems basically performed simple substitution of words in the source language for words in the target one, paying little attention to linguistic knowledge. As a result, the complex cognitive operations related to in-depth knowledge of the grammar, semantics, syntax, idioms, and even the culture of speakers were lost during translation, which made such MT systems unreliable and fragile. Idiosyncratic gaps between source and target sentence structure usually originate in cultural differences, and the computational treatment of these gaps is a very difficult problem for still-maturing MT [9]. Syntactic involvement makes MT treat phrases rather than single words as the translation units. The application of linguistic and statistical methods in a Spanish-to-Basque speech MT system improves its probabilistic finite-state transducers, and syntactic roles help to cluster words by means of statistical analysis [10]. Based on doctor–patient dialogues, e.g. Retry (Repeat and Rephrase) and Accept behaviors in the mediated verbal channel, an English–Persian speech MT system has been established, and the
808 J. Du et al.

dynamic Bayesian network model is proved effective in guiding the users' behaviour [11, 12]. In a cross-language information retrieval system, structured linguistic factors perform well, including morphological analyzers, word lists, electronic dictionaries, n-gramming of untranslatable words, unstructured queries, etc. [13].
Statistical techniques have pushed the development of MT. The human translation process comprises two parts, namely decoding the meaning of the source text and re-encoding this meaning in the target language; it is an intellectual process. MT, however, differs from human translation, and the complex cognitive procedures cannot be fully and explicitly encoded in a system. Statistical methods and statistics-based corpora help an MT system create a target text nearly as a bilingual person does, and to convey the meaning almost as a human being would. For example, a speech recognition system implemented with posterior probability is effective in improving speech translation, as tested in a Japanese-to-English task; the speech quality is improved by converting the recognition word lattice into a translation word graph [14]. By describing a self-organizing map (SOM), a SomAgent statistical analysis in a prototype of an automatic German–Spanish MT system can use artificial neural networks to determine the correct meaning of a word and exploit parallelism by modeling a community of conceptually autonomous agents [15]. Probabilistic inference stimulates panlingual lexical translation. By building a massive translation graph for languages with no translation dictionaries, scholars have found a novel approach to lexical translation, constructed automatically from over 630 machine-readable dictionaries. This method has proved effective and helpful for MT [16].
Both computational technologies and linguistic knowledge are necessary for MT. In static single assignment form (SSA form), a popular intermediate representation in compilers, back-translation algorithms are admissible and preferred [17]. Linguistic approaches, e.g. case grammar, may take effect. In an MT system from Arabic to English and French, Arabic lends itself quite naturally to a Fillmore-like analysis, according to which verbs are the center and the various noun phrases occupy specific peripheral nodes around the center [18]. In a user-assisted query MT system, Cross-Language Information Retrieval (CLIR) is interactive and helps the searcher and the system collaborate to find appropriate documents through effective use of the new capabilities [19]. Integration architectures are effective and efficient. Speech-input translation can properly be regarded as a pattern recognition problem. Both statistical alignment models and stochastic finite-state transducers are useful in the construction of MT systems from Spanish/Italian to English. The acoustic models (hidden Markov models) are embedded into the finite-state transducers, and the translation of a source utterance is the result of a search on the integrated network [20].
The support and integration of a rule-based module and a statistical translation module is an ongoing and stable factor in MT systems. For example, in the Spanish MT system from speech to sign language, the eSIGN 3D avatar animation module reduces the delay between the spoken utterance and the sign sequence animation, and the configuration integrating rules and statistics is helpful for the system [21]. Among the influential factors involved in the advancement of MT, the syntactic effect is noticeable and persistent.
Towards Computing Technologies on Machine Parsing 809

3 Syntactic and Semantic Effect on Machine Translation

This section comprises analyses of Raman and Reddy's system, the syntactic effect on English garden path sentences, and the semantic effect on Chinese GP sentences.

3.1 Raman and Reddy’s Machine Translation System


An effective MT system focuses on both accuracy and speed. English and the Indian languages concerned belong to the same language family, namely the Indo-European family, and they share many linguistic similarities, which makes it convenient to develop a highly accurate Indian-English Parallel MT System. Real-time application is an effective way to speed up the translation. The MT system must exploit parallelism both at word level in the morphological analysis and at phrase level in the semantic analysis stages [22].
According to Raman and Reddy, exploitation of parallelism and a good dictionary organization can bring efficiency and effectiveness to the system. The construction of the system comprises morphological analysis, phrase-level analysis and generation. A proper organization of the dictionary that allows exploitation of parallelism in its access mechanism is a reasonable strategy. The system is tested under different conditions of load sharing by the transputers, and on sentences consisting of different phrases. Figure 1 shows the flow diagram of the parallel MT clearly and cleanly.
The parallel system needs information at the lexical and phrasal levels. In Transputer 1, ten procedures are discussed: (1) Read the input sentence, (2) Split the sentence into words, (3) Control the distribution, (4) Morphological analysis, (5) Collect the analysis info, (6) Split the sentence into phrases, (7) Control the distribution, (8) Phrase-level analysis and generation, (9) Collect output structure, (10) Decode & display TL sentence. The arrows follow the procedure from Step 1 to Step 10, which means the system first processes word-level information and then begins the processing of the phrasal level. Transputer 1 is a fundamental stage in which the basic processing of natural language is required. Transputer 2 includes (1) Morphological analysis, (2) Router, (3) Phrase-level analysis & Generation, (4) Router. This stage is a transitional one between Transputer 1 and Transputer 3. Transputer 3 is involved in Morphological analysis, and Phrase-level analysis and Generation; this stage is an advanced one, and all the labeled information is processed here. The arrows from Transputer 1 to Transputer 3 are bidirectional, so the translation is dynamic and flexible.
The efficiency of the Indian-English Parallel MT System shows that linguistic knowledge and practice are necessary, and that the syntactic effect on MT is noticeable and striking. The system has to adhere to the linguistic rules governing both source and target languages. From the system created by Raman and Reddy, we can see that languages of the same family share many similarities, and this peculiarity brings efficiency and effectiveness. If the source and target languages come from different families, e.g. Chinese from the Sino-Tibetan family and English from the Indo-European family, we are not sure whether the same efficiency can still be achieved. We will compare the difference in syntactic type between head-initial English and head-final Chinese, and analyze the deep structures of the two languages based on GP sentences.

Fig. 1. Raman & Reddy’s flow diagram of the parallel machine translation system.

3.2 Syntactic Effect on Machine Translation of English GP Sentences


An endocentric phrase has a special syntactic type. In linguistics, phrases are either endocentric or exocentric. An endocentric phrase includes a head and dependents: the head determines the syntactic type of the phrase, and the dependents modify the head as complements. The head plays an active role in sentence processing because it can specify subcategorizing relations with elements within the same phrase and get integrated with elements outside the phrase. An exocentric phrase, on the contrary, has no head-dependent split: all the elements contribute to the syntactic type, and there is no clear head. Headed phrases decide the direction of branching; head-initial, head-final and head-medial phrases are the basic categories.

In head-initial phrases, dependents are placed after the head, which yields right-branching structures; head-final phrases are left-branching; and head-medial phrases combine left and right branching.
There are two accepted views of thematic role assignment involving the head. One argues that thematic roles are assigned to arguments as soon as they appear in a sentence. The other maintains that thematic roles are not assigned until the head is reached. Early assignment meets cognitive demands by processing the sentence as quickly as possible, which sometimes brings the GP effect because of the shift of thematic roles assigned by the heads. Late assignment needs more cognitive resources, since the brain has to maintain the incremental information, and multiple possibilities must be kept open until the head appears. For a head-initial language, early assignment is efficient, while late assignment is effective for a head-final language. For example, if a relative clause is temporarily analyzed as a main clause, the shift from head-initial processing to head-final processing may sometimes bring the processing breakdown of the GP model [23, 24].
Chinese is head-final while English is head-initial. According to the head directionality parameter in word order, many typologists classify Chinese syntax as head-final, since the central emphases are often found at the end of phrases and relative clauses are put before their referents. English is considered head-initial, and the head always precedes its dependents. The difference between head-final and head-initial typologies leads to distinct head-dependent relations, namely different filler-gap relationships.
Chinese provides end-directed focus. Chinese is a typical head-final language, and the gap or dependent is followed by the head or filler. The sequence of processing is “relative clause – relativizer – filler”. Both the relativizer and the filler are linearized to the right of the clause. A head-final Chinese sample is as follows.
“wuding/shang/zhire/yangguang/tuise/[__GAP] de/jimu [FILLER]” [Chinese]
“the house/on/shining/the sun/faded/[__GAP] relativizer/the building blocks [FILLER]”
“the building blocks the sun shining on the house faded” [English]
According to the Penn Treebank Set and Stanford Parser analysis, we can obtain the
hierarchical structure of the Chinese sample.
English gives head-occupied focus. English is mostly considered head-initial, and the filler appears first, with the gap following. The processing sequence is “filler – relativizer – relative clause”.

(ROOT
  (NP
    (CP
      (IP
        (LCP
          (NP (NN wuding))
          (LC shang))
        (NP
          (VP (VA zhire)
            (NP (NN yangguang)))
          (VP (VA tuise))))
      (DEC de))
    (NP (NN jimu))))

A dependency tree is a helpful diagram for showing the filler-gap relation clearly. See the analysis of a sentence from Franz Kafka (Fig. 2).

...he discovered that [...] he had been changed into a monstrous verminous bug

Fig. 2. Dependency tree of a sentence from Franz Kafka.

In Fig. 2, we can see that the filler appears first and the GAPs are hierarchically placed as dependents. This head-occupied structure is distinctly different from the Chinese end-directed one. From the analysis of the Chinese sample above, we obtain different filler-gap relations and different hierarchical structures.
Towards Computing Technologies on Machine Parsing 813

“the building blocks [FILLER] the sun shining on the house faded [__GAP]”
(ROOT
  (FRAG
    (NP
      (NP (DT The) (NN building))
      (NP (NNS blocks)))
    (SBAR
      (S
        (NP
          (NP (DT the) (NN sun))
          (VP (VBG shining)
            (PP (IN on)
              (NP (DT the) (NN house)))))
        (VP (VBD faded))))))

Different filler-gap relations in the source and target languages result in different output. Example 1 below is an English GP sentence which puts English readers through considerable cognitive ups and downs. However, the Chinese translation of this sentence is a non-GP sentence, and there is no cognitive overburden for readers. The reason lies in the distinctly different filler-gap relationships.
Example 1. The building blocks the sun shining on the house faded are red. (GP
sentence).
Example 2. Wuding shang zhire yangguang tuise de jimu shi hongse de. (Non-GP
sentence).
According to the context-free grammar, Example 1 can be processed successfully with the shift of blocks from verb to noun. Please see the processing below.
Input: The building blocks the sun shining on the house faded are red.
G = {Vn, Vt, S, P}
Vn = {S, NP, VP, IP, Det, N, SC, V, PP, P, Adj}
Vt = {the, building, blocks, sun, shining, on, house, faded, are, red}
S = S
P:
(a) S→NP VP (b) NP→NP IP (c) NP→Det NP (d) NP→Det N
(e) NP→N N (f) NP→NP SC (g) SC→V PP (h) PP→P NP
(i) IP→NP VP (j) VP→V Adj (k) VP→V (l) VP→V NP
(m) Det→{the} (n) N→{building, building blocks, sun, house}
(o) V→{blocks, shining, faded, are} (p) Adj→{red} (q) P→{on}
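The grammar G can be written down directly as data. The sketch below is our own Python rendering (the rule list, dictionary layout and identifiers are ours, not the authors'); it makes visible the lexical ambiguity that opens the garden path: blocks is listed under both N and V.

```python
# Productions (a)-(l) as (lhs, rhs) pairs; (m)-(q) as a word -> categories map.
RULES = [
    ("S", ("NP", "VP")), ("NP", ("NP", "IP")), ("NP", ("Det", "NP")),
    ("NP", ("Det", "N")), ("NP", ("N", "N")), ("NP", ("NP", "SC")),
    ("SC", ("V", "PP")), ("PP", ("P", "NP")), ("IP", ("NP", "VP")),
    ("VP", ("V", "Adj")), ("VP", ("V",)), ("VP", ("V", "NP")),
]
LEXICON = {
    "the": {"Det"}, "building": {"N"}, "blocks": {"N", "V"}, "sun": {"N"},
    "shining": {"V"}, "on": {"P"}, "house": {"N"}, "faded": {"V"},
    "are": {"V"}, "red": {"Adj"},
}

def tag(sentence):
    """Pair each word with every preterminal category the lexicon allows."""
    return [(w, sorted(LEXICON[w.lower()])) for w in sentence.split()]

print(tag("The building blocks the sun shining on the house faded are red"))
```

The two categories returned for blocks are exactly the choice point at which the processing below breaks down and backtracks.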

Processing procedures are shown below. If blocks is considered a verb, a processing breakdown will occur, and the system will have to backtrack to the position where blocks can instead be taken as a plural noun. We can see that the system returns to the original place, where the building blocks is processed as a compound rather than an N+V structure.

The building blocks ... on the house faded are red


Det building blocks ... on the house faded are red m
Det N blocks ... on the house faded are red n
NP blocks ... on the house faded are red d
NP V the sun ... on the house faded are red o
NP V Det sun ... on the house faded are red m
NP V Det N shining on the house faded are red n
NP V NP shining on the house faded are red n d
NP V NP V on the house faded are red o
NP V NP V P the house faded are red q
NP V NP V P Det house faded are red m
NP V NP V P Det N faded are red n
NP V NP V P NP faded are red d
NP V NP V PP faded are red h
NP V NP SC faded are red g
NP V NP faded are red f
NP VP faded are red l
S faded are red a
S V are red o
S V V red o
S V V Adj p
S V VP j
?

Breakdown and Backtracking



This option leads to a successful processing after the breakdown and backtracking.
The building blocks ... on the house faded are red
Det building blocks ... on the house faded are red m
Det N blocks ... on the house faded are red n
Det N N ... on the house faded are red n
Det NP ... on the house faded are red e
NP the sun shining on the house faded are red c
NP Det sun shining on the house faded are red m
NP Det N shining on the house faded are red n
NP NP shining on the house faded are red d
NP NP V on the house faded are red o
NP NP V P the house faded are red q
NP NP V P Det house faded are red m
NP NP V P Det N faded are red n
NP NP V P NP faded are red d
NP NP V PP faded are red h
NP NP SC faded are red g
NP NP faded are red f
NP NP V are red o
NP NP VP are red k
NP IP are red i
NP are red b
NP V red o
NP V Adj p
NP VP j
S a
SUCCESS

The decoding above can be shown in a clear tree diagram in which the key word blocks is treated as a plural noun rather than a verb (Fig. 3).

The building blocks the sun shining on the house faded are red

Fig. 3. Tree diagram of Example 1.



A Recursive Transition Network (RTN) is another effective method to process Example 1 clearly and concisely. According to the RTN in Fig. 4, the system comprises one main net and five subnets, and the different hierarchical frameworks make the processing efficient.

S net:     0 -NP-> 1,  1 -VP-> f
NP subnet: 0 -Det-> 1,  1 -N-> 1,  1 -N-> 2,  2 -SC-> f,  2 -IP-> f,  2 -> f
VP subnet: 0 -V-> 1,  0 -V-> f,  1 -Adj-> f,  1 -NP-> f
SC subnet: 0 -V-ing-> 1,  1 -PP-> f
PP subnet: 0 -P-> 1,  1 -NP-> f
IP subnet: 0 -NP-> 1,  1 -VP-> f
Fig. 4. RTN of Example 1.

According to the NP subnet, both the building and the building blocks can be processed as an NP, and this ambiguity leads to different directions of processing.
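The networks of Fig. 4 can be executed as a nondeterministic recognizer with backtracking. The Python sketch below is our own reading of the figure and of the traces that follow; the exact arc inventory (the N loop in the NP subnet, the epsilon arc to f, and the intransitive V arc in the VP subnet) is inferred from those traces, not taken from the authors' implementation.

```python
# Each subnet maps a state to its outgoing arcs.  Arc kinds: "cat" consumes one
# word of the given category, "net" recursively matches a subnet, and "jump" is
# an epsilon move.  "f" is the final state of every net.
NETS = {
    "S":  {0: [("net", "NP", 1)], 1: [("net", "VP", "f")]},
    "NP": {0: [("cat", "Det", 1)],
           1: [("cat", "N", 1), ("cat", "N", 2)],
           2: [("net", "SC", "f"), ("net", "IP", "f"), ("jump", None, "f")]},
    "VP": {0: [("cat", "V", 1), ("cat", "V", "f")],
           1: [("cat", "Adj", "f"), ("net", "NP", "f")]},
    "SC": {0: [("cat", "Ving", 1)], 1: [("net", "PP", "f")]},
    "PP": {0: [("cat", "P", 1)], 1: [("net", "NP", "f")]},
    "IP": {0: [("net", "NP", 1)], 1: [("net", "VP", "f")]},
}
CATS = {"the": {"Det"}, "building": {"N"}, "blocks": {"N", "V"}, "sun": {"N"},
        "shining": {"Ving"}, "on": {"P"}, "house": {"N"}, "faded": {"V"},
        "are": {"V"}, "red": {"Adj"}}

def match(net, state, pos, words):
    """Yield every input position at which `net`, entered at `state`, can finish."""
    if state == "f":
        yield pos
        return
    for kind, label, dest in NETS[net][state]:
        if kind == "jump":
            yield from match(net, dest, pos, words)
        elif kind == "cat":
            if pos < len(words) and label in CATS[words[pos].lower()]:
                yield from match(net, dest, pos + 1, words)
        else:  # "net": recursive subnet call; iterating the calls backtracks
            for end in match(label, 0, pos, words):
                yield from match(net, dest, end, words)

def accepts(sentence):
    words = sentence.split()
    return any(end == len(words) for end in match("S", 0, 0, words))

print(accepts("The building blocks the sun shining on the house faded are red"))
```

With this arc set, the path that takes only the building as the NP leaves material unconsumed and is abandoned, while the path that takes the building blocks as the NP consumes the whole input, mirroring the FAIL and SUCCESS traces in the text.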

The building blocks the sun shining on the house faded are red
<S/0, The building blocks the sun...faded are red, >
<NP/0, The building blocks the sun...faded are red, S/1: >
<NP/1, building blocks the sun...faded are red, S/1: >
<NP/2, blocks the sun...faded are red, S/1: >
<NP/f, blocks the sun...faded are red, S/1: >
<VP/0, blocks the sun...faded are red, NP/f: S/f: >
<VP/1, the sun...faded are red, NP/f: S/f: >
<NP/0, the sun...faded are red, VP/f: NP/f: S/f: >
<NP/1, sun shining...faded are red, VP/f: NP/f: S/f: >
<NP/2, shining...faded are red, VP/f: NP/f: S/f: >
<SC/0, shining...faded are red, NP/2: VP/f: NP/f: S/f: >
<SC/1, on the house...red, NP/2: VP/f: NP/f: S/f: >
<PP/0, on the house...red, SC/1: NP/2: VP/f: NP/f: S/f: >
<PP/1, the house...red, SC/1: NP/2: VP/f: NP/f: S/f: >
<NP/0, the...red, PP/f: SC/1: NP/2: VP/f: NP/f: S/f: >
<NP/1, house...red, PP/f: SC/1: NP/2: VP/f: NP/f: S/f: >
<NP/2, faded...red, PP/f: SC/1: NP/2: VP/f: NP/f: S/f: >
<NP/f, faded are red, PP/f: SC/1: NP/2: VP/f: NP/f: S/f: >
<PP/f, faded are red, SC/1: NP/2: VP/f: NP/f: S/f: >
<SC/f, faded are red, NP/2: VP/f: NP/f: S/f: >
<NP/f, faded are red, VP/f: NP/f: S/f: >
<VP/f, faded are red, NP/f: S/f: >
<NP/f, faded are red, S/f: >
<S/f, faded are red, >
<, faded are red, >
FAIL
BREAKDOWN AND BACKTRACKING

The processing above shows that the option of the building as an NP forces blocks to be a verb, and this leads to processing breakdown and backtracking. If the building blocks is accepted by the system as an NP, a successful result appears.

<NP/1, building blocks the sun...faded are red, S/1: >


<NP/1, blocks the sun...faded are red, S/1: >
<NP/2, the sun...faded are red, S/1: >
<IP/0, the sun...faded are red, NP/2: S/1: >
<NP/0, the sun...faded are red, IP/1: NP/2: S/1: >
<NP/1, sun...faded are red, IP/1: NP/2: S/1: >
<NP/2, shining on the house...red, IP/1: NP/2: S/1: >
<SC/0, shining on the...red, NP/2: IP/1: NP/2: S/1: >
<SC/1, on the house...red, NP/2: IP/1: NP/2: S/1: >
<PP/0, on the house...red, SC/1: NP/2: IP/1: NP/2: S/1: >
<PP/1, the house...red, SC/1: NP/2: IP/1: NP/2: S/1: >
<NP/0, the...red, PP/1: SC/1: NP/2: IP/1: NP/2: S/1: >
<NP/1, house...red, PP/1: SC/1: NP/2: IP/1: NP/2: S/1: >
<NP/2, faded...red, PP/1: SC/1: NP/2: IP/1: NP/2: S/1: >
<NP/f, faded...red, PP/1: SC/1: NP/2: IP/1: NP/2: S/1: >
<PP/f, faded are red, SC/1: NP/2: IP/1: NP/2: S/1: >
<SC/f, faded are red, NP/2: IP/1: NP/2: S/1: >
<NP/f, faded are red, IP/1: NP/2: S/1: >
<VP/0, faded are red, IP/f: NP/2: S/1: >
<VP/f, are red, IP/f: NP/2: S/1: >
<IP/f, are red, NP/2: S/1: >
<NP/f, are red, S/1: >
<VP/0, are red, S/f: >
<VP/1, red, S/f: >
<VP/f, , S/f: >
<S/f, , >
<, , >
SUCCESS

The unsuccessful and successful processing shown above can be expressed in a well-formed substring table. The well-formed substring table is an n*n matrix in which n refers to the length of the string. The field (i, j) of the chart comprises the set of all elements which start at position i and finish at position j, namely
Chart(i, j) = {A | A ⇒* W_(i+1) ... W_j}.
The famous CYK algorithm can process the GP sentence effectively. According to CYK, if chart(2, 3), namely blocks, is considered a verb, we obtain the non-well-formed substring table in Fig. 5, in which the last three words fail to be included in the analysis. The chart and the n*n matrix based on this processing are shown below. In the unsuccessful processing matrix of Example 1, the last words V, V, A cannot be processed, since the system has already completed the processing at chart(0, 9) in Table 1, where S has been obtained. Obviously, this processing is unacceptable. If chart(2, 3) is processed as a noun and chart(0, 3) is analyzed as an NP, the system processes the sentence perfectly, as in Fig. 6.

Fig. 5. The non-well-formed substring table of Example 1.

Table 1. The unsuccessful processing matrix of Example 1.


1 2 3 4 5 6 7 8 9 10 11 12
0 {D} {NP} {} {} {} {} {} {} {S} {} {} {?}
1 {N} {} {} {} {} {} {} {} {} {} {}
2 {V} {} {} {} {} {} {VP} {} {} {}
3 {D} {NP} {} {} {} {NP} {} {} {}
4 {N} {} {} {} {} {} {} {}
5 {V} {} {} {SC} {} {} {}
6 {P} {} {PP} {} {} {}
7 {D} {NP} {} {} {}
8 {N} {} {} {}
9 {V} {} {?}
10 {V} {VP}
11 {A}

Fig. 6. The well-formed substring table of Example 1.



0 The 1 building 2 blocks 3 the 4 sun 5 shining 6 on 7 the 8 house 9 faded 10 are 11 red 12
n := 12
for j := 1 to n
    lexical_chart_fill(j-1, j)
    for i := j-2 down to 0
        syntactic_chart_fill(i, j)

Fill the field (j−1, j) in the chart with the word j which belongs to the preterminal
category in Table 2.

Table 2. The successful processing matrix of Example 1.


1 2 3 4 5 6 7 8 9 10 11 12
0 {D} {} {NP} {} {} {} {} {} {} {NP} {} {S}
1 {N} {NP} {} {} {} {} {} {} {} {} {}
2 {N} {} {} {} {} {} {} {} {} {}
3 {D} {NP} {} {} {} {NP} {IP} {} {}
4 {N} {} {} {} {} {} {} {}
5 {V} {} {} {SC} {} {} {}
6 {P} {} {PP} {} {} {}
7 {D} {NP} {} {} {}
8 {N} {} {} {}
9 {V} {} {}
10 {V} {VP}
11 {A}

chart(j-1, j) := {X | X → word_j ∈ P}


j-1=0, j=1, chart(0, 1):={The} j-1=1, j=2, chart(1, 2):={building}
j-1=2, j=3, chart(2, 3):={blocks} j-1=3, j=4, chart(3, 4):={the}
j-1=4, j=5, chart(4, 5):={sun} j-1=5, j=6, chart(5, 6):={shining}
j-1=6, j=7, chart(6, 7):={on} j-1=7, j=8, chart(7, 8):={the}
j-1=8, j=9, chart(8, 9):={house} j-1=9, j=10, chart(9, 10):={faded}
j-1=10, j=11, chart(10, 11):={are} j-1=11, j=12, chart(11, 12):={red}

The reduction steps abide by the syntactic rules by which the reduced symbols
cover the string from i to j.

syntactic_chart_fill(i, j)
    chart(i, j) = {A | A → BC ∈ P; i < k < j; B ∈ chart(i, k); C ∈ chart(k, j)}
    chart(i, j) := {}
    for k := i+1 to j-1
        for every A → BC ∈ P
            if B ∈ chart(i, k) and C ∈ chart(k, j)
                then chart(i, j) := chart(i, j) ∪ {A}
if S ∈ chart(0, n)
    then accept
    else reject.
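The driver loop and chart-filling pseudocode above can be made runnable. The Python sketch below is our own implementation of CYK over the grammar of this section (all identifiers, and the unit-rule closure that handles rule (k) VP → V, are ours); it reproduces the successful analysis of Table 2, ending with S in chart(0, 12).

```python
# Binary productions of grammar G; rule (k) VP -> V is handled as a unit rule.
BINARY = [("S", "NP", "VP"), ("NP", "NP", "IP"), ("NP", "Det", "NP"),
          ("NP", "Det", "N"), ("NP", "N", "N"), ("NP", "NP", "SC"),
          ("SC", "V", "PP"), ("PP", "P", "NP"), ("IP", "NP", "VP"),
          ("VP", "V", "Adj"), ("VP", "V", "NP")]
UNIT = [("VP", "V")]
LEX = {"the": {"Det"}, "building": {"N"}, "blocks": {"N", "V"}, "sun": {"N"},
       "shining": {"V"}, "on": {"P"}, "house": {"N"}, "faded": {"V"},
       "are": {"V"}, "red": {"Adj"}}

def close_units(cell):
    # Add A for every unit rule A -> B whose B is already in the cell.
    changed = True
    while changed:
        changed = False
        for a, b in UNIT:
            if b in cell and a not in cell:
                cell.add(a)
                changed = True

def cyk(words):
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):                  # lexical_chart_fill(j-1, j)
        chart[j - 1][j] |= LEX[words[j - 1].lower()]
        close_units(chart[j - 1][j])
    for span in range(2, n + 1):               # syntactic_chart_fill(i, j)
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(a)
            close_units(chart[i][j])
    return chart

sentence = "The building blocks the sun shining on the house faded are red".split()
chart = cyk(sentence)
print("S" in chart[0][len(sentence)])  # the sentence is accepted
```

Because the lexicon lists blocks under both N and V, chart(1, 3) receives NP via rule (e), and the accepting path NP(0, 10) + VP(10, 12) emerges without any explicit backtracking; the dead verb reading simply contributes nothing to the final cell.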

The discussion of the English GP sentence shows that processing breakdown and backtracking is the special character of GP sentences. Generally speaking, frequency is an important factor influencing the result: when the garden path effect takes place, the preferred choice with high frequency is replaced by the unpreferred one with low frequency. Whether frequency is effective in Chinese GP sentences is discussed below.

3.3 Semantic Effect on Machine Translation of Chinese Garden Path Sentences
Chinese GP sentences are closely related to frequency. According to Du's idea in his doctoral dissertation, entitled The Asymmetric Information Compensation Hypothesis: Research on Confusion Quotient in Garden Path Model [25], the Chinese GP model focuses more on semantic selection than the English one, which intensively highlights structural selection.
Theta theory can be used to explain Chinese GP sentences. In the theta theory, the
theta roles assigned to subject and object by verbs are considerably different.
Example 3. Mary broke the windows last week.
Example 4. Mary broke her legs last week.
Example 5. Daibu de shi jingcha. It was the police that ordered the arrest (of the suspect).
Example 6. Daibu de shi yifan. It was the suspect that was arrested (by the police).
Comparing the English examples above, we can see that the same verb broke assigns the same internal theta role THEME to different internal arguments, the windows and her legs, and gives different external theta roles, AGENT and PATIENT, to the same external argument Mary. We can obtain the argument structure based on Examples 3 and 4.
Break: V; [NP[+AGENT]/NP[+PATIENT] NP[+THEME]]
The theta roles in the Chinese examples function in the same way as in the English examples. In Example 5, the verb shi assigns the internal theta role THEME to the internal argument

jingcha, and then the whole verbal phrase assigns the external theta role AGENT to the external argument daibu de. This means the police have the power to arrest the suspect. In Example 6, yifan obtains the THEME role and daibu de is assigned PATIENT. The same verb structure thus leads to different semantic selections, and structural imbalance in frequency may result in the GP model.
Shi: V; [NP[+AGENT]/NP[+PATIENT] NP[+THEME]]
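The two argument structures can be recorded as a small data structure. The following is a hypothetical Python sketch (names and layout are ours); it shows that one and the same grid licenses both the AGENT and the PATIENT reading, which is exactly what makes Examples 5 and 6 parallel.

```python
# Theta grids from the text: both English 'break' and Chinese 'shi' take an
# internal THEME, while the external argument may be AGENT or PATIENT.
THETA = {
    "break": {"external": {"AGENT", "PATIENT"}, "internal": {"THEME"}},
    "shi":   {"external": {"AGENT", "PATIENT"}, "internal": {"THEME"}},
}

def licensed(verb, external, internal):
    """Check whether an (external, internal) role pairing fits the verb's grid."""
    grid = THETA[verb]
    return external in grid["external"] and internal in grid["internal"]

# Examples 5 and 6: one structure, two licensed readings of 'daibu de shi ...'.
print(licensed("shi", "AGENT", "THEME"), licensed("shi", "PATIENT", "THEME"))
```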
The Confusion Quotient (CQ) is created to analyze Chinese GP sentences. Dr. Du introduced the idea of the Confusion Quotient for the analysis of Chinese GP sentences in his doctoral dissertation. He thought that asymmetric information is the key factor in decoding complex sentences, and that many parameters are involved in language processing. Please see the formula.

V_cq = (1/n) * Σ_(i=1..n) (2 - O_i/E_i)        (1)

V_cq is the value of the confusion quotient. O_i (observed) is the real frequency found in the corpus; E_i (expected) is the ideal frequency expected to be found in the corpus; n is the number of peculiarities involved in the natural language processing; i indexes the peculiarities. If O_i is greater than E_i, the value lies in (-∞, 1); if O_i is smaller than E_i, the value lies in (1, 2). There are three conditions involved in the discussion.
Firstly, two related syntactic structures A/B are hypothesized to exist.
(1) Oi = Ei. This is a balanced structure in which the syntactic effects of the various structures are unobvious. Structures A and B have the same frequency, namely, each occupies 50 per cent of the whole frequency. The value of the confusion quotient is 1, which means ambiguity appears and lingers. Readers obtain ambiguous meanings from the same structure and fail to distinguish them without the help of context.

Example 7. Shangke de shi xiaowu. A: It is xiaowu who attends class. / B: It is xiaowu who gives a lecture.

(2) Oi > Ei. This is an unbalanced structure in which frequency promotes the primacy of the prototype structure, whose frequency is much greater than 50%. For example, if Structure A has a much higher frequency than Structure B, A may be the prototype in readers' cognition, establishing the primacy of decoding. If A occupies nearly all the frequency while B fails to appear in the corpus, the value of the confusion quotient approaches -∞, which means the absolute primacy of the prototype structure involves no confusion.

Example 8. Zheci xingdong zhong daibu de shi jingcha. The police will be arrested in
the action.
In the Chinese corpus (http://www.cncorpus.org/ccindex.aspx), the structure of
“Shi: V; [NP[+PATIENT] NP[+THEME]]” is the prototype, occupying very high frequency
in which Jingcha is the patient and will be arrested.

From the discussion above, we can see that the prototypical structure (the A structure) has a varying confusion quotient, whose value lies in the interval (-∞, 1). If Oi of the A structure is great enough, the CQ value approaches -∞, which means the least confusion. If Oi of the A structure equals Ei, the A structure and the B structure share the frequency and are ambiguous; the CQ value of the A structure is then 1, the extreme for a prototype structure. Once the frequency of the A structure is lower than that of the B structure, A will no longer be considered the prototype. Therefore, the CQ value of a prototype structure lies in a half-open interval: the lowest frequency brings the greatest confusion value, 1, and the highest frequency leads to the least confusion value, approaching -∞.
(3) Oi < Ei. This is another unbalanced structure. If Structure A has a lower observed frequency (Oi) than the expected frequency (Ei), the GP model may gradually take effect. For example, if the frequency Oi approaches Ei, the CQ value of the A structure touches the extreme of 1, the smallest value of the GP model. If Oi is low enough, towards zero, the CQ value of the A structure nearly reaches the extreme of 2, the greatest value of the GP model. In other words, the CQ value of the GP model lies in a closed interval between 1 and 2: the greater the value, the more the confusion, and the more complex the GP cognition. Statistics helps to define the GP model's analysis. According to statistics, if the significance level is .05 and the degree of freedom is 1, then the critical value is 3.84. We hypothesize that the whole frequency is 50 and that the frequency of the B structure is X. Then the frequency of the A structure is 50-X, and the expected frequency is half of the whole one, i.e. 25.

χ² = Σ (O - E)² / E        (2)

Nonparametric statistics is useful for analyzing the primacy of models. χ² is the chi-square test value; O means observed frequency; E means expected frequency. According to the hypothesis, we can obtain the GP's extreme frequency, by which the critical ratio between the A structure and the B structure can be established. Please see Table 3.

Table 3. Chi-square test of GP model’s frequency.


Category Observed Expected Deviation D2 D2/E
A structure 50−X 25 25−X (25−X)2 (25−X)2/25
B structure X 25 X−25 (X−25)2 (X−25)2/25
Total 50 50 3.84

In Table 3, we can calculate X: since each structure contributes (X-25)²/25 = 1.92, X ≈ 18. This means the critical ratio between the A frequency and the B frequency is 32:18. If the observed frequency of Structure A is higher than 32, the A structure will be an obvious prototype structure. The higher the frequency, the easier the structure: there is no cognitive overburden for

readers. If the observed frequency is lower than 32, the frequency of the A structure approaches that of the B structure, resulting in inevitable ambiguity.
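The critical frequency can also be solved numerically. The short Python sketch below is our own; it applies Formula 2 to the symmetric two-cell layout of Table 3, where 2*(X-25)²/25 = 3.84.

```python
import math

def chi_square(observed, expected):
    """Formula 2: chi^2 = sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The two cells deviate symmetrically (50 - X vs X around E = 25), so
# chi^2 = 2 * (X - 25)^2 / 25 = 3.84, i.e. (X - 25)^2 / 25 = 1.92 per cell.
critical_x = 25 - math.sqrt(3.84 * 25 / 2)
print(round(critical_x))  # about 18, matching the 32:18 ratio in the text
```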
The critical value of the confusion quotient can be calculated. According to the discussion above, we hypothesize that the total is 50, the critical observed number of the A structure is 32, the critical observed number of the B structure is 18, and the number of peculiarities involved in the processing is 1 for each. According to Formula 1, the critical value of CQ can be obtained.
In Table 4, the CQ value of the A structure is 0.72, which belongs to (-∞, 1), and the CQ value of the B structure is 1.28, which belongs to (1, 2). Thus, if the CQ value of the A structure is lower than 0.72, the structure is considered the prototype; if the CQ value of the B structure is higher than 1.28, the B structure has more potential to trigger the GP model. The Chinese GP sentence is as follows.

Table 4. Confusion quotient value of GP model.


Category Observed Expected O−E (O−E)/E 1−(O−E)/E
A structure 32 25 +7 0.28 0.72
B structure 18 25 −7 −0.28 1.28
Total 50 50
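Formula 1 can be computed directly. The short Python sketch below is our own; it reproduces the critical values of Table 4, with n = 1 for each structure.

```python
def confusion_quotient(observed, expected):
    """Formula 1: V_cq = (1/n) * sum_i (2 - O_i / E_i)."""
    pairs = list(zip(observed, expected))
    return sum(2 - o / e for o, e in pairs) / len(pairs)

# Critical case of Table 4: whole frequency 50, expected 25 per structure,
# observed 32 (A, prototype side) vs 18 (B, GP-prone side).
print(confusion_quotient([32], [25]))  # about 0.72
print(confusion_quotient([18], [25]))  # about 1.28
```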

Example 9. Daibu de shi jingcha; shenpan de shi faguan; fuxing de shi zuifan. It was
the police that ordered the arrest (of the suspect); it was the judge that sentenced (sb to a
long term of imprisonment); it was the criminal that served a prison sentence.
Example 9 is a Chinese GP sentence in which daibu and shenpan are two potential peculiarities involved in the language processing, namely, n = 2. According to Formula 1, the statistics-based analysis helps to decode the GP sentence. All the linguistic data are from the website http://www.cncorpus.org/ccindex.aspx.
We can obtain the CQ values of daibu and shenpan with the peculiarity of “V; NP[PATIENT]NP[THEME]”, namely, V_cq = (2 + 2)/2 = 2. The CQ value of “V; NP[AGENT]NP[THEME]” is V_cq = (-0.54 + 1.67)/2 ≈ 0.57. The different values express the level of confusion: the higher the value, the more the confusion. If cognition shifts from the least-confused structure (0.57) to the most-confused structure (2), the GP model takes effect.
If there is no shift from the prototype structure to the low frequency structure, GP
effect fails to appear. The contrastive sentence to Example 9 is as follows:
Example 10. Daibu de shi yifan; shenpan de shi baotu; fuxing de shi zuifan. It was the
suspect that was arrested; it was the thugs that were sentenced (to death); it was the
criminal that served a prison sentence.
According to the frequency lists shown in Tables 5 and 6 for daibu and shenpan,
we can obtain V_cq = (−0.54 + 1.67)/2 ≈ 0.57, which is lower than the critical CQ
value of 0.72. That means Example 10 is a prototype structure and no shift happens,
bringing no breakdown or overburden to cognition, even though Examples 9 and 10
share the same structure.
Towards Computing Technologies on Machine Parsing 825

Table 5. The calculation of the CQ value of daibu.

Category                        Observed  Expected  O−E     (O−E)/E  1−(O−E)/E
Daibu: V; NP[AGENT]NP[THEME]    44        17.33     +26.67  1.54     −0.54
Daibu: V; NP[PATIENT]NP[THEME]  0         17.33     −17.33  −1       2
Daibu: V; NP[THEME][bei]        22        17.33     +4.67   0.27     0.73
Daibu: N                        18        17.33     +0.67   0.04     0.96
Daibu: V; NP[THEME]             14        17.33     −3.33   −0.19    1.19
Daibu: V; NP[THEME][ba/jiang]   6         17.33     −11.33  −0.65    1.65
Total                           104       104

Table 6. The calculation of the CQ value of shenpan.

Category                          Observed  Expected  O−E      (O−E)/E  1−(O−E)/E
Shenpan: V; NP[AGENT]NP[THEME]    11        33.33     −22.33   −0.67    1.67
Shenpan: V; NP[PATIENT]NP[THEME]  0         33.33     −33.33   −1.00    2.00
Shenpan: V; NP[THEME][bei]        2         33.33     −31.33   −0.94    1.94
Shenpan: N                        146       33.33     +112.67  3.38     −2.38
Shenpan: V; NP[THEME]             39        33.33     +5.67    0.17     0.83
Shenpan: V; NP[THEME][ba/jiang]   2         33.33     −31.33   −0.94    1.94
Total                             200       200

The discussion above shows that the Chinese GP model is closely related to semantic
information: the same syntactic structure leads to different, even opposite, meanings.
By contrast, syntactic structure considerably affects the processing of English GP
sentences.

4 Conclusion

MT is a sub-field of computational linguistics and is closely related to linguistics.
Both computational technologies and linguistic knowledge are necessary for MT. The
support and integration of a rule-based module and a statistical translation module is
important as an ongoing and stable factor for an MT system. For a head-initial
language, e.g. English, syntactic information is the key to processing GP sentences,
while for a head-final language, e.g. Chinese, semantic information is crucial for the
system to decode GP sentences. It is concluded that the integration of various
methods, e.g. CFG, RTN,

CYK, WFST and CQ analysis, is effective for parsing GP sentences. With the
development of cognitive science, work from cognitive psychology and brain science
can be introduced to analyze the processing breakdown of GP sentences, which makes
human-computer interaction intertwined with cognitive engineering, markedly
improving the performance of translation in the future.

References
1. Dangsaart, S., et al.: Intelligent Thai text–Thai sign translation for language learning.
Comput. Educ. 3(51), 1125–1141 (2008)
2. Dorr, B.: Interlingual machine translation: a parameterized approach. Artif. Intell. 1(63),
429–492 (1993)
3. Steiner, E.: Some remarks on a functional level for machine translation. Lang. Sci. 4(14),
607–621 (1992)
4. Du, J.L., Yu, P.F.: Syntax-directed machine translation of natural language: effect of garden
path phenomenon on sentence structure. In: 2010 International Conference on Intelligent
Systems Design and Engineering Applications, pp. 535–539. IEEE (2010)
5. Wang, Y. et al.: A new method to calibrate robot visual measurement system. In: Advances
in Mechanical Engineering (2013)
6. Wang, X., Wanli, Z., Wang, Y.: A novel approach to word sense disambiguation based on
topical and semantic association. Sci. World J. 2013, 8 pages (2013)
7. Bever, T.G.: The cognitive basis for linguistic structures. In: Hayes, J.R. (ed.) Cognition and
the Development of Language, pp. 279–362. Wiley, New York (1970)
8. Lin, C., Bever, T.G.: Garden path and the comprehension of head-final relative clauses. In:
Processing and Producing Head-Final Structures, pp. 277–297. Springer, Netherlands (2011)
9. Nitta, Y.: Problems of machine translation systems: effect of cultural differences on sentence
structure. Futur. Gener. Comput. Syst. 2(2), 101–115 (1986)
10. Pérez, A., Torres, M.I., Casacuberta, F.: Joining linguistic and statistical methods for
Spanish-to-Basque speech translation. Speech Commun. 11(50), 1021–1033 (2008)
11. Shin, J.H., Panayiotis, G., Shrikanth, N.: Towards modeling user behavior in interactions
mediated through an automated bidirectional speech translation system. Comput. Speech
Lang. 2(24), 232–256 (2010)
12. Khalilov, M., Fonollosa, J.A.R.: Syntax-based reordering for statistical machine translation.
Comput. Speech Lang. 4(25), 761–788 (2011)
13. Lehtokangas, R., Airio, E., Järvelin, K.: Transitive dictionary translation challenges direct
dictionary translation in CLIR. Inf. Process. Manag. 6(40), 973–988 (2004)
14. Zhang, R., Kikui, G.: Integration of speech recognition and machine translation: speech
recognition word lattice translation. Speech Commun. 48, 321–334 (2006)
15. López, V.F., et al.: A SomAgent statistical machine translation. Appl. Soft Comput. 2(11),
2925–2933 (2011)
16. Soderland, S., et al.: Panlingual lexical translation via probabilistic inference. Artif. Intell.
9(174), 619–637 (2010)
17. Sassa, M., Ito, Y., Kohama, M.: Comparison and evaluation of back-translation algorithms
for static single assignment forms. Comput. Lang. Syst. Struct. 2(35), 173–195 (2009)
18. Mankai, C., Mili, A.: Machine translation from Arabic to English and French. Inf. Sci. Appl.
3(2), 91–109 (1995)
19. Oard, D.W., He, D., Wang, J.: User-assisted query translation for interactive cross-language
information retrieval. Inf. Process. Manag. 1(44), 181–211 (2008)

20. Casacuberta, F., et al.: Some approaches to statistical and finite-state speech-to-speech
translation. Comput. Speech Lang. 1(18), 25–47 (2004)
21. San-Segundo, R., et al.: Speech to sign language translation system for Spanish. Speech
Commun. 11(50), 1009–1020 (2008)
22. Raman, S., Reddy, N.R.: A transputer-based parallel machine translation system for Indian
languages. Microprocess. Microsyst. 6(20), 373–383 (1997)
23. Patson, N.D., et al.: Lingering misinterpretations in garden-path sentences: evidence from a
paraphrasing task. J. Exp. Psychol. Learn. Mem. Cogn. 1(35), 280–285 (2009)
24. Farmer, T.A., Sarah, E., Spivey, M.J.: Gradiency and visual context in syntactic garden-
paths. J. Mem. Lang. 4(57), 570–595 (2007)
25. Du, J.L.: The Asymmetric Information Compensation Hypothesis: Research on Confusion
Quotient in Garden Path Model. The Commercial Press, Beijing, China (2015)
Music Recommender According to the User
Current Mood

Murtadha Al-Maliki(&)

School of Engineering, University of Portsmouth, Portsmouth, UK


murtadha.al-maliki@port.ac.uk

Abstract. The researcher reviews the RS human-centered design that considers
the mood of the user prior to making a recommendation, whereby the term mood
refers to the continuously changing general emotional states felt by users. Mood
management theory stipulates that people will adjust their environment,
including deciding to expose themselves to certain media, for the purpose of
dealing with their emotional state. Context awareness has become one of the
core technologies, and a critical function, for applications in ubiquitous
computing environments; the task of using context data to infer a user's
situation is known as context reasoning. In this research, we incorporated
mood awareness into a music recommendation system, taking into account mood
management theory. Our proposed system includes modules such as the Mood
Module and the Recommendation Module. The Mood Module determines the genre of
music suitable for the user's context. Finally, the Recommendation Module
recommends tracks according to the user's current mood.

Keywords: Context-awareness · Music recommendation system · Mood management theory

1 Introduction

The researcher reviews the RS human-centered design that considers the mood of the
user prior to making a recommendation, whereby the term mood refers to the con-
tinuously changing general emotional states felt by users [1]. The core consideration is
that users have the ability to dynamically update their mood throughout and following
numerous activities like listening of a light music, playing a strategy video game, or
watching a comedy [1, 5]. In other words, users can make an unconscious and con-
scious choice over the content of entertainment that assists in maintenance of positive
mood and healing or moderate pain in terms of both duration and intensity [4, 9] which
have been specified under the label of mood management (Knobloch-Westerwick
2006).
In their study, [9] acknowledged that a broad range of information consumption,
from music, news, and movies to documents, is impacted by the user's mood. The
concept has been further scrutinized in the research community of mood management
[3]. In specific, selection of music is characterized by self-indulgent motivations to

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 828–834, 2019.
https://doi.org/10.1007/978-3-030-02686-8_61
Music Recommender According to the User Current Mood 829

either mend their negative mood or preserve their positive mood in terms of both
duration and intensity. In regard to this, a user’s emotional state serves as a signifi-
cantly useful predictor of their decisions of music. While a selection of genres can
possibly drive entertainment choice, tragic or sad content is perhaps more likely to be
circumvented by a large number of users, while funny or light-hearted music is mostly
sought after [4, 6]. Similarly, [7] established three key motivations activating users'
movie-going behaviors: entertainment, self-escape, and self-development. The former
two appear to be consistent with self-indulgent considerations for users to mend or
preserve positive mood, while the latter appears the least correlated with self-
indulgent considerations, instead reflecting users' eudemonic motivations of pursuing
greater meaningfulness and insight into life for self-reflection [8]. Consequently, a
wider collection of movies has been examined to establish the connections between
users' mood states and preferences of entertainment [4] (Fig. 1).

[Figure: framework diagram — song library (.mp3), Last.FM, mood management,
and recommender engine.]

Fig. 1. The framework.

2 Framework
2.1 Last.FM
Last.fm is a music web service providing a public API, which gives researchers a
good opportunity to build their own programs using Last.fm data. The Last.fm API
allows calling methods that respond in REST-style XML or JSON. This module is
able to collect all data from Last.fm, such as artists, their songs, their albums, and
their tags. To achieve its purpose, the Song Searcher module has the following
main features:
• Artist tags retrieval
• Artist albums retrieval
• Tracks or songs retrieval
• Artist basic statistics retrieval
The task of the Last.FM module is to keep an up-to-date list of all the music files
by reading the tag fields from the .mp3 files that make up the song library. It
gathers all artists and their relevant information, mentioned above, from Last.fm
and redirects that information to the specific databases on the database server. It
also identifies and corrects missing or wrong song information by using the
Last.FM API and a crawler that locates the track's metadata.
830 M. Al-Maliki

The additional information about the songs, such as the date of release and album
details, can help in producing more accurate song suggestions for the user. Basically,
the function of this component is to provide the user with the complete information
of the song. User trust is increased, as the system can draw better recommendations,
which helps in providing quality content and suitable music choices to the user. The
system uses track.getInfo to get the information of the song, which serves as
metadata (Algorithm 1).

Algorithm 1: Last.FM
1. While there exists a track to be added to the database do
2.   Read the mp3 file tags
3.   Establish a connection to the Last.FM site
4.   If the tags are found to be null then
5.     Collect the tags from Last.FM and add them to the update
6.   End if
7. End while
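As a sketch of how the module might call the public API: Last.fm exposes its methods through a single REST endpoint, so a request is just a URL with a method name and parameters. The API key below is a placeholder, and the helper name is ours:

```python
import urllib.parse

API_ROOT = "https://ws.audioscrobbler.com/2.0/"

def method_url(method, api_key, **params):
    """Build a REST request URL for a Last.fm API method (JSON response)."""
    query = {"method": method, "api_key": api_key, "format": "json", **params}
    return API_ROOT + "?" + urllib.parse.urlencode(query)

# e.g. the track.getInfo call used by this module
method_url("track.getInfo", "YOUR_KEY", artist="Cher", track="Believe")
```

The returned URL can then be fetched with any HTTP client and the JSON response parsed for the tag and album fields described above.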

2.2 Song Library .mp3

It is the database where all songs are kept.

2.3 Mood Management

Our recommender system has 18 mood categories [2]. In [2], the author divided
mood into 18 mood tag-group categories. The mood tag groups were derived by a
method combining the strengths of social tags, linguistic resources and human
expertise, as shown in Table 1.
The songs in our library have been classified according to Algorithm 2: the tags of
each song, with their frequencies, are collected from last.fm using the method
track.getTopTags; the tags of each song are then compared with the mood-based
words belonging to each mood category (Table 1); finally, the mood factor of the
song is determined according to (1).

Mood Factor = Σ (matched tags belonging to a specific mood category × tag frequency)   (1)

Algorithm 2: Mood Manager
1: While there are songs S in the songs database do
2:   Get the song tags with tag frequencies from last.fm
3:   Compare the collected tags of each song with the standard mood table (Table 1)
     and save the matched tags for each mood category
4:   Calculate the mood factor of each song using the mood factor equation (1)
5: End while
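Equation (1) and Algorithm 2 can be sketched as follows; the two-category mood table and the tag counts are illustrative values, not data from the paper:

```python
def mood_factors(song_tags, mood_table):
    """Eq. (1): for each mood category, sum the frequencies of the song's
    tags that match that category's mood-based words (Table 1)."""
    return {mood: sum(freq for tag, freq in song_tags.items()
                      if tag.lower() in words)
            for mood, words in mood_table.items()}

# illustrative subset of Table 1 and of a track.getTopTags response
mood_words = {"Calm": {"calm", "mellow", "soothing"},
              "Sad": {"sad", "melancholy"}}
tags = {"mellow": 80, "soothing": 40, "sad": 10, "rock": 100}
mood_factors(tags, mood_words)  # {'Calm': 120, 'Sad': 10}
```

Tags that belong to no mood category (here "rock") simply contribute to no factor.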

2.4 Recommender Engine


Our system is designed to recommend music to the user according to his or her
current mood, in an indirect way based on mood management theory: for example, if the

Table 1. Mood categories [2]

Mood number          Mood-based words
Mood 1 (Calm)        calm, comfort, quiet, serene, mellow, chill out, calm down,
                     calming, chillout, comforting, content, cool down, mellow music,
                     mellow rock, peace of mind, quietness, relaxation, serenity,
                     solace, soothe, soothing, still, tranquil, tranquility
Mood 2 (Sad)         sad, sadness, unhappy, melancholic, melancholy, feeling sad
Mood 3 (Happy)       happy, happiness, happy songs, happy music, glad
Mood 4 (Romantic)    romantic, romantic music
Mood 5 (Gleeful)     upbeat, gleeful, high spirits, zest, enthusiastic, buoyancy, elation
Mood 6 (Depressed)   depressed, blue, dark, depressive, dreary, gloom, darkness,
                     depress, depression, depressing, gloomy
Mood 7 (Angry)       anger, angry, choleric, fury, outraged, rage, angry music
Mood 8 (Grief)       grief, heartbreak, mournful, sorrow, sorry, doleful, heartache,
                     heartbreaking, heartsick, lachrymose, mourning, plaintive,
                     regret, sorrowful
Mood 9 (Dreamy)      dreamy
Mood 10 (Cheerful)   cheerful, cheer up, festive, jolly, jovial, merry, cheer, cheering,
                     cheery, get happy, rejoice, sunny
Mood 11 (Brooding)   brooding, contemplative, meditative, reflective, broody, pensive,
                     pondering, wistful
Mood 12 (Aggressive) aggression, aggressive
Mood 13 (Confident)  confident, encouraging, encouragement, optimism, optimistic
Mood 14 (Anxious)    angst, anxiety, anxious, jumpy, nervous, angsty
Mood 15 (Earnest)    earnest, heartfelt
Mood 16 (Hopeful)    desire, hope, hopeful
Mood 17 (Pessimism)  pessimism, cynical, pessimistic, weltschmerz, sarcastic
Mood 18 (Excitement) excitement, exciting, exhilarating, thrill, ardor, stimulating,
                     thrilling, titillating

current user's mood is angry, the system will provide him or her with calm songs to
make him or her more comfortable. There are 18 mood categories in our system;
therefore, a questionnaire was conducted to learn users' listening tastes under these
18 mood categories, according to mood management theory. 96 users [54 female, 42 male]

were asked about their music preferences when they feel calm, sad, happy, romantic,
gleeful, etc., as shown in Fig. 2.

[Figure: chart "Mood Data Distribution for 96 people" — y axis: no. of people
(0–120); x axis: type of songs (calm, sad, happy, romantic, gleeful, earnest,
depressed, angry, grief, dreamy, cheerful, pessimism, brooding, aggressive,
anxious, confident, hopeful, excitement).]

Fig. 2. Mood data collection.

The line graph describes what types of songs the 96 people would like to listen to
when they are in different moods. For instance, 64 and 48 people would like to listen
to excitement and gleeful songs, respectively, when their mood is calm, while 68 and
64 would like to listen to excitement and romantic songs, respectively, when they are
in a happy mood.
In our system, the highest three values for each mood are used, as shown in Table 2.
The mood recommender algorithm is then as follows (Algorithm 3):

Algorithm 3: Mood recommender algorithm
1: Get the user's current mood
2: While there are songs S in the song list do
3:   Get the song mood type
4:   If the song mood type fits the normalized mood table (Table 2) then
5:     Add the song to the recommended list
6:   End if
7: End while
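Algorithm 3 amounts to a filter over the song list. A sketch, using two rows of Table 2 (lower-cased) as the normalized mood table and illustrative song titles:

```python
# Two rows of the normalized mood table (Table 2): for each user mood,
# the three song-mood types that will be recommended.
NORMALIZED_MOODS = {
    "angry": {"calm", "confident", "hopeful"},
    "sad": {"cheerful", "hopeful", "confident"},
}

def recommend(user_mood, songs):
    """Algorithm 3: keep songs whose mood type fits the normalized table."""
    wanted = NORMALIZED_MOODS.get(user_mood, set())
    return [title for title, mood_type in songs if mood_type in wanted]

songs = [("Track A", "calm"), ("Track B", "aggressive"), ("Track C", "hopeful")]
recommend("angry", songs)  # ['Track A', 'Track C']
```

An angry user is thus steered toward calm, confident, and hopeful songs, as mood management theory prescribes.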
Table 2. Normalized mood table (name/value/score of the three highest-ranked song-mood types per user mood)

Mood number  Mood name   Normalized mood 1        Normalized mood 2        Normalized mood 3
Mood 1       CALM        EXCITEMENT  64  100      ROMANTIC    48  90       GLEEFUL    48  90
Mood 2       SAD         CHEERFUL    58  100      HOPEFUL     58  100      CONFIDENT  51  90
Mood 3       HAPPY       EXCITEMENT  67  100      ROMANTIC    64  90       HAPPY      61  80
Mood 4       ROMANTIC    ROMANTIC    96  100      HAPPY       64  90       DREAMY     64  90
Mood 5       GLEEFUL     EXCITEMENT  80  100      HAPPY       61  90       GLEEFUL    61  90
Mood 6       EARNEST     EXCITEMENT  64  100      CHEERFUL    64  100      HAPPY      45  90
Mood 7       DEPRESSED   CHEERFUL    74  100      CONFIDENT   64  90       HOPEFUL    61  80
Mood 8       ANGRY       CALM        77  100      CONFIDENT   58  90       HOPEFUL    42  80
Mood 9       GRIEF       CHEERFUL    77  100      HOPEFUL     54  90       HAPPY      48  80
Mood 10      DREAMY      EXCITEMENT  86  100      ROMANTIC    58  90       GLEEFUL    48  80
Mood 11      CHEERFUL    EXCITEMENT  84  100      GLEEFUL     70  90       HAPPY      64  80
Mood 12      PESSIMISM   HOPEFUL     70  100      CHEERFUL    67  90       CONFIDENT  51  80
Mood 13      BROODING    HAPPY       64  100      CHEERFUL    64  100      HOPEFUL    45  90
Mood 14      AGGRESSIVE  CALM        83  100      EXCITEMENT  48  90       CONFIDENT  35  80
Mood 15      ANXIOUS     CONFIDENT   77  100      CALM        74  90       HOPEFUL    45  80
Mood 16      CONFIDENT   EXCITEMENT  86  100      GLEEFUL     64  90       CHEERFUL   48  80
Mood 17      HOPEFUL     EXCITEMENT  80  100      GLEEFUL     42  90       CHEERFUL   42  90
Mood 18      EXCITEMENT  EXCITEMENT  96  100      GLEEFUL     61  90       HAPPY      48  80

3 Conclusion and Future Work

In this work, songs have been classified according to mood. The classification was
done by collecting the songs' tags from last.fm and comparing them with the standard
mood classification table (Table 1). A questionnaire was conducted in which 96 users
were asked about their listening preferences when in each specific mood, and the data
were distributed according to the results of that questionnaire. In future work, the
system will be evaluated using an online evaluation method, asking real users to test
the system.

References
1. Holbrook, M.B., Gardner, M.P.: Illustrating a dynamic model of the mood-updating process
in consumer behavior. Psychol. Mark. 17(3), 165 (2000)
2. Hu, X., Downie, J.S., Ehmann, A.F.: Lyric text mining in music mood classification. Am.
Music. 183(5,049), 2–209 (2009)
3. Knobloch-Westerwick, S.: Mood management: theory, evidence, and advancements. In:
Bryant, J., Vorderer, P. (eds.) Psychology of Entertainment, pp. 239–254 (2006)
4. Oliver, M.B.: Tender affective states as predictors of entertainment preference. J. Commun. 58
(1), 40–61 (2008)
5. Ryan, R.M., Rigby, C.S., Przybylski, A.: The motivational pull of video games: a self-
determination theory approach. Motiv. Emot. 30(4), 344–360 (2006)
6. Schaefer, A., Nils, F., Sanchez, X., Philippot, P.: A Multi-criteria Assessment of Emotional
Films (unpublished manuscript) (2005)
7. Tesser, A., Millar, K., Wu, C.H.: On the perceived functions of movies. J. Psychol. 122,
441–449 (1998)
8. Waterman, A.S.: Two conceptions of happiness: contrasts of personal expressiveness
(eudaimonia) and hedonic enjoyment. J. Pers. Soc. Psychol. 64, 678–691 (1993)
9. Zillmann, D.: Mood management: using entertainment to full advantage. In: Donohew, L.,
Sypher, H.E., Higgins, E.T. (eds.) Communication, Social Cognition, and Affect, pp. 147–171
(1988)
Development of Extreme Learning Machine
Radial Basis Function Neural Network Models
to Predict Residual Aluminum for Water
Treatment Plants

C. D. Jayaweera(&) and N. Aziz

School of Chemical Engineering, Engineering Campus,


Universiti Sains Malaysia, Seri Ampangan, Seberang Perai Selatan,
14300 Nibong Tebal, Penang, Malaysia
chamanthidj@gmail.com, chnaziz@usm.my

Abstract. Two sets of input parameters were employed to develop Extreme
Learning Machine Radial Basis Function (ELM-RBF) models predicting
residual aluminum, in order to facilitate parametric analysis of reported physical
and chemical phenomena relating to the effect of alum dosage, raw water
(RW) turbidity and RW color on residual aluminum concentration. RW turbidity
was identified as the dominant variable affecting the distribution of the multi-
variate data, condensed into two principal components using principal compo-
nent analysis. Thus two sets of models were developed based on the RW
turbidity value: low turbidity models and high turbidity models. The perfor-
mance of all models was satisfactory, with test correlation coefficients exceeding
0.85. The shapes of the plots of the parametric analysis were satisfactory and
were in line with reported phenomena. However, the numerical accuracy of the
plots obtained by the parametric analysis was poor. It was noted that using data
with a wider range of values for the dominant variable (RW turbidity) helped
improve the parametric plots.

Keywords: Residual aluminum · ELM-RBF · Water treatment

1 Introduction

Soft sensors are applied in a wide range of applications such as environmental emission
monitoring, estimation of product compositions in distillation columns, etc. The use
of soft sensors enables tighter control of the most critical parameters of a production
process, implementation of early warning systems, simulation of 'what if' scenarios,
and replacement or optimized use of expensive hardware sensors. Soft sensors cannot
completely replace hardware sensors; however, a significant economic benefit can be
realized by intelligent use of cheap hardware sensors, or efficient use of expensive
ones, in combination with soft sensors [1]. Thus, soft sensors could be used in water
process upsets (early warning systems) or to replace hardware sensors. Residual alu-
minum is a major concern in water treatment which could cause health issues such as

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 835–848, 2019.
https://doi.org/10.1007/978-3-030-02686-8_62
836 C. D. Jayaweera and N. Aziz

the Alzheimer’s disease. Therefore development of a soft sensor predicting residual


aluminum would be beneficial as a preventive measure to avoid hindering the operation
of water treatment plants.
Water characteristics such as pH, natural organic matter and alum dosage are a few of
the many factors affecting residual aluminum content. While extensive studies have been
carried out to model the coagulation process for water treatment [2–12], very few
studies had been carried out in modeling residual aluminum. The author in [12]
developed a multiple input multiple output model predicting residual aluminum and pH
using 11 input parameters. In [13], the author carried out studies to determine optimum
parameters to model residual aluminum content with satisfactory performance. How-
ever, no studies have been carried out regarding modeling of residual aluminum using
the ELM-RBF model.
However, research has been done to examine chemical and physical phenomena
associated with residual aluminum, such as the effects of alum dosage, natural organic
matter, pH and temperature on the residual aluminum concentration. Al exists in sol-
uble forms at pH less than 6. At higher pH the solubility of Al declines and it is easier
to reduce the residual Al concentration [13]. The residual Al content could also be an
issue for water plants situated in cold areas, as low temperatures tend to increase the
residual Al concentration [14]. The speciation of Al depends on the competition of Al
cations for anionic groups. Monomeric forms of Al tend to form complexes with
organic matter [13]. In their study [15], the authors mention that the dose of the
coagulant is also a prominent factor affecting residual Al content. Aluminum ions
remain in solution until the binding capacity of NOM is fulfilled. On increasing the
alum dosage, once the maximum binding capacity of NOM has been exceeded, the
complex formed is bound to precipitate. The ability of the said complex to precipitate
depends on pH.
Efforts have been made by researchers to capture reported physical and chemical
phenomena in their models. Authors in [2] developed models predicting treated water
turbidity and studied the models’ ability to capture reported chemical and physical
phenomena. It was observed that multi-layer perceptron and general regression neural
network models were not able to demonstrate complex relationships such as the
response of TW turbidity to changes in alum dosage. However, it has been reported that
ELM-RBF models demonstrate excellent generalization similar to support vector
machines [16]. Therefore ELM-RBF neural network was employed for model devel-
opment in this research.
This paper analyses the model’s ability to satisfactorily predict residual aluminum
content and capture reported chemical and physical phenomena under two conditions:
at high and low turbidity. The reason for using raw turbidity as the determining factor is
discussed in Sect. 2. Two sets of input parameters were selected to facilitate carrying
out a feasible parametric analysis.
Objectives of this study are to
(1) Develop models predicting residual aluminum with satisfactory performance.
(2) Examine the model’s ability to capture reported physical and chemical phenom-
ena relating to the response of residual aluminum to changes in alum dosage,
color and RW turbidity.
Development of ELM-RBF Neural Network Models 837

It was assumed in this research that natural organic matter (NOM) was responsible
for color. There is no means to confirm this hypothesis, as no total organic carbon or
UV-254 data were available [17]. However, it was observed that the plots obtained
from the parametric analysis closely resembled the behavior expected of residual
aluminum in response to NOM.

2 Methodology

Data provided by the Segama water treatment plant in Sabah, Malaysia, was utilized for
model development. Water quality ranges of the water treated is given in Table 1.

Table 1. Water quality ranges

     Raw water                            Treated water                                             Coagulant
     pH   Turb   Color  TDS    Alkalinity pH   Turb   Color  TDS    Alkalinity Residual Al  dosage
          (NTU)  (HU)   (mg/l)                 (NTU)  (HU)   (mg/l)            (mg/l)
Min  6.6  11     36     60     50         6.5  0.19   0      90     20         0.01         20
Max  7.9  3405   656    160    170        7.5  5.24   6      170    100        0.2          170

Two ELM-RBF models were developed using the input parameters shown in
Table 2. Selection of input parameters was carried out via an exhaustive search,
testing the performance of models on all possible combinations of variables. The
number of radial basis centers used in both models is 20. The radbas function
(e^(−x²)) was used as the transfer function of the radial basis layer. All values were
normalized using (1), such that all values ranged between 0 and 1:

x_norm = (x − x_min) / (x_max − x_min)    (1)
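Equation (1) is ordinary min–max scaling; for one variable it can be sketched as:

```python
def min_max_normalize(values):
    """Eq. (1): map each value of a variable into [0, 1] using the
    variable's minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# e.g. the coagulant dosage range 20-170 mg/l from Table 1
min_max_normalize([20, 95, 170])  # [0.0, 0.5, 1.0]
```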

Table 2. Input parameters used for the development of models


Model Variables
1 RW turbidity, RW color, TW turbidity, Alum dosage, residual Al (t-1)
2 RW turbidity, RW color, TW color, Alum dosage, residual Al (t-1)

A principal component analysis was carried out to condense all variables into 2
significant variables in order to enable visualizing the data distribution in a 2D plot.
The purpose of plotting the distribution of data is to enable selection of a suitable
subset of data to improve the accuracy of the prediction model. Thus, during imple-
mentation of a neural network model in an industrial scale, the model itself will be able
to self-diagnose its validity for prevalent process conditions, which would increase the
model’s robustness and reliability. Therefore, it was considered more effective to
determine the suitable subset of data based on water quality variables which could be

Table 3. Principal components


Principal component  Eigenvalue  % variance  Cumulative variance (%)
PC1 0.0056 46 46
PC2 0.0034 28 74
PC3 0.0018 15 89
PC4 0.0011 9 98
PC5 0.0003 2 100

measured using sensors and exclude alum dosage in the analysis. Thus, a model could
be expected to warn the operators of its reliability for a particular process condition, by
conveniently reading a value from a sensor measuring a water quality variable
(preferably a raw water quality). Results of the principal component analysis are shown
in Table 3.
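The condensation of the five variables into two components can be sketched with an eigendecomposition of the covariance matrix (a generic PCA sketch under stated assumptions, not the authors' code; the function name is ours):

```python
import numpy as np

def first_two_components(X):
    """Project normalized data X (samples x variables) onto its first two
    principal components; also return the % variance each PC explains."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # sort PCs by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :2]               # PC1 and PC2 scores
    return scores, 100 * eigvals[:2] / eigvals.sum()
```

Plotting the two score columns against each other yields the kind of 2D cluster plot shown in Figs. 1 and 2.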
Thus the 5 variables are reduced to 2 components accounting for 74% of total
variance. The correlation coefficients of each variable with respect to the two com-
ponents are given in Table 4.

Table 4. Coefficients of variables


Variables PC1 PC2
RW color 0.30 −0.22
RW turbidity 0.77 −0.52
TW turbidity 0.32 0.44
TW color 0.44 0.69
Residual Al (t-1) 0.16 0.11

According to Table 4, PC1 is dominated by RW turbidity as it has the highest


coefficient, and PC2 is dominated by TW color.
The plot of PC1 vs. PC2 is given in Fig. 1.
It can be noted in Fig. 1 how the data are clustered into groups. As noted in Table 4,
the most significant variable affecting the distribution in Fig. 1 is RW turbidity.
Visual analysis of the RW turbidity data showed that the majority of the normalized
values were less than 0.1. Therefore the indexes of data with RW turbidity < 0.1
were separated and plotted on the graph in Fig. 1. The resultant plot is shown in Fig. 2.
Figure 2 demonstrates how an entire cluster of Fig. 1 had been occupied by data
with RW turbidity < 0.1. The red colored cluster constituted 83% of the total set of
data. Therefore it was decided to develop two models for the interest of maintaining
high accuracy and model performance: A model for low turbidity water (RW turbid-
ity < 0.1) and a model for high turbidity water (RW turbidity > 0.1).
Data division was carried out such that 65% was used for training (for the Hessian
matrix, which directly calculates the weights connecting the radial basis layer to the
output layer), 25% was used for validation (in order to find the fittest set of input

Fig. 1. Plot of principal components. X axis – PC2, Y axis – PC1.

Fig. 2. Plot of data with RW turbidity < 0.1.

weights/radial basis centers from 100 sets of random radial basis centers) and 10% was
used for testing.
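The training scheme described above can be sketched as follows: random radial-basis centers stand in for the candidate center sets, the radbas activation is e^(−d²), and the output weights are obtained directly by least squares (a generic ELM-RBF sketch under these assumptions, not the authors' implementation):

```python
import numpy as np

def elm_rbf_fit(X, y, n_centers=20, seed=0):
    """Pick random radial-basis centers in the normalized [0, 1] input
    space, compute radbas activations exp(-||x - c||^2), and solve the
    output-layer weights directly by least squares."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0.0, 1.0, size=(n_centers, X.shape[1]))
    H = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    beta = np.linalg.lstsq(H, y, rcond=None)[0]
    return centers, beta

def elm_rbf_predict(X, centers, beta):
    H = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    return H @ beta
```

In the paper's scheme, 100 random center sets would be fitted this way on the training split and the set with the best validation score kept.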
The performance of each model developed was measured using the mean square error
and the correlation coefficient. A parametric analysis was also carried out to investigate
the ability of the models to capture reported physical and chemical phenomena. The
parametric analysis in this study examines the variation of the residual aluminum
concentration with RW color, RW turbidity and alum dosage. The two models shown
in Table 2 were developed to facilitate the parametric analysis. Model 1 enables
analyzing the variation of residual aluminum with RW color, as the absence of TW
color makes the analysis feasible. Similarly, model 2 enables analyzing the variation of
residual aluminum with RW turbidity. The effect of alum dosage on residual aluminum
was observed in both models.
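The two evaluation tools used here, the error metrics and a one-variable-at-a-time parametric sweep, are straightforward to state in code. A compact sketch (the toy model is a made-up stand-in, not a trained network):

```python
import math

def mse(actual, predicted):
    """Mean square error between observed and model-predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def corr(actual, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)

def parametric_sweep(model, base_inputs, name, values):
    """Vary one input over `values` while every other input stays fixed."""
    curve = []
    for v in values:
        x = dict(base_inputs)
        x[name] = v
        curve.append((v, model(x)))
    return curve

def toy_model(x):
    # Hypothetical stand-in for a trained model: residual Al grows with the
    # mismatch between dosage and the (color-driven) coagulant demand.
    return 0.01 + 0.05 * abs(x["alum"] - x["rw_color"])

curve = parametric_sweep(toy_model, {"alum": 0.5, "rw_color": 0.2}, "rw_color",
                         [i / 10 for i in range(11)])
```

Plotting `curve` for each input in turn yields the parametric plots discussed below (Figs. 5 through 8).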
840 C. D. Jayaweera and N. Aziz

pH is a prominent variable affecting the residual aluminum content. The solubility
of aluminum increases at low pH values (less than 6), resulting in an increase in the
residual aluminum concentration. The residual aluminum content decreases at high pH
values (higher than 6) due to the decreasing solubility. However, the pH of the water
utilized in this process ranged between 6.5 and 7.5. Therefore, it was highly unlikely
that the effect of pH could be visualized using the available data.

Temperature is another factor affecting the residual aluminum content, though
temperature measurements were not available. However, the data used in this research
were from a water treatment plant in a tropical area where significant seasonal changes
in temperature do not occur.

3 Results and Discussion

3.1 Low Turbidity Models


The performance of the two models developed for low turbidity water is shown in
Table 5.

Table 5. Performance of models developed for low turbidity water


Models  Training              Validation            Testing
        CC      MSE           CC      MSE           CC      MSE
1       0.8999  1.92 × 10⁻⁴   0.9131  1.54 × 10⁻⁴   0.8681  4.06 × 10⁻⁴
2       0.8996  1.92 × 10⁻⁴   0.8947  2.88 × 10⁻⁴   0.8959  1.12 × 10⁻⁴

The test correlation coefficients of both models exceed 0.85; thus, both models
demonstrate satisfactory performance. The test performance of model 2 is higher than
that of model 1. It could be noted in Table 2 that model 1 is more influenced by the
behavior of turbidity, while model 2 is more influenced by color. It has been indicated
in the literature that at low turbidity values the coagulant demand is governed by NOM,
and at high turbidity values the demand is governed by turbidity [15], which may have
influenced the performance of the models. The regression plots of the predicted test data
for the two models are given in Figs. 3 and 4.
The predicted data distribution of model 2, as shown in Fig. 4, is closer to the y = x
line than the distribution of model 1 (Fig. 3). Therefore, the regression plots are in line
with the test performance given in Table 5.
The effect of color on residual aluminum was investigated while other variables
were fixed at constant values. The plot obtained is shown in Fig. 5.
Fig. 3. Regression plot of model 1 (X axis – actual data, Y axis – predicted data).

Fig. 4. Regression plot of model 2 (X axis – actual residual Al, Y axis – predicted residual Al).

Fig. 5. Variation of residual aluminum with RW color as predicted by model 1. X axis – RW
color, Y axis – residual aluminum concentration.

It could be noted in Fig. 5 that the numerical accuracy of the predicted variation is
poor. However, the shape of the curve obtained has reasonable validity. The color was
increased from a very low value; therefore, at the initial stages, color removal by the
addition of coagulant is not possible owing to the low values. As the color increases to
a removable degree, the residual aluminum begins to decrease as alum is consumed in
color removal. According to [18], natural organic matter (which is assumed to be
responsible for color in this study) forms complexes with aluminum and hinders the
aluminum ions in the turbidity removal process. However, NOM has a maximum
binding capacity for aluminum. Once the coagulant dosage has been increased to
exceed the binding capacity of NOM, the complex formed is bound to precipitate,
simultaneously enabling efficient turbidity removal. This point of binding capacity
could be related to the minimum observed in the curve in Fig. 5. As the color
(NOM) increases, the Al ions in solution increasingly form complexes with NOM
and remain in solution, possibly hindering turbidity removal, which relates to the
increasing trend in the residual aluminum content. However, as the alum dosage is kept
constant, the residual aluminum content will increase to a maximum and remain
constant despite increasing color.
The variation of residual aluminum with RW turbidity is shown in Fig. 6.

Fig. 6. Variation of residual aluminum with RW turbidity as predicted by model 2 (X – RW
turbidity, Y – residual aluminum).

The numerical accuracy of the plot in Fig. 6 is poor. However, as in the case of
Fig. 5, the shape of the curve demonstrates some validity.
The residual aluminum was expected to remain constant at very low values of
turbidity, as adding coagulant does not remove low levels of turbidity. As the turbidity
reaches a removable level, the residual aluminum was expected to decrease as the alum
is consumed in turbidity removal. However, as the turbidity increases while the alum
dosage remains constant, particles flocculate around aluminum ions to form agglomerates
which are not large enough to be filtered, causing an increase in the residual
aluminum. The residual aluminum content then reaches and remains at a maximum,
despite increasing turbidity, as the alum dosage is kept constant. The expected behavior
is fairly reflected in the shape of the plot in Fig. 6.
The plots of residual aluminum vs. alum dosage are shown in Figs. 7 and 8.

Fig. 7. Variation of residual aluminum with alum dosage as predicted by model 1 (X – alum
dosage, Y – residual aluminum).

Fig. 8. Variation of residual aluminum with alum dosage as predicted by model 2 (X – alum
dosage, Y – residual aluminum).

The numerical accuracy of Figs. 7 and 8 is poor. The expected trends also cannot
be visualized in the graphs. An ideal plot of residual aluminum versus alum dosage is
expected to show the following trends: the residual aluminum concentration should
gradually increase at low values of alum dosage, as the coagulant concentration is not
sufficient for turbidity or color removal; once sufficient alum has been added, the
residual aluminum content is expected to decrease; and the curve should reach a
minimum (at maximum turbidity and color removal) and then begin to increase with
alum dosage. The narrow range of data used for model development could be one of
the reasons for the models' inability to capture the relationship with alum dosage
efficiently.

3.2 High Turbidity Models


The performance of the two models developed for high turbidity water is shown in
Table 6.

Table 6. Performance of models developed using data with RW turbidity values > 0.1

Models  Training              Validation            Testing
        CC      MSE           CC      MSE           CC      MSE
1       0.8938  3.44 × 10⁻⁴   0.8831  2.59 × 10⁻⁴   0.8834  0.0057
2       0.8999  3.25 × 10⁻⁴   0.8705  2.96 × 10⁻⁴   0.8753  0.0079

It could be noted that the mean squared error has increased compared to Table 5
due to the wider range of RW turbidity in the data. However, the test correlation
coefficients still exceed 0.85, demonstrating satisfactory performance. The test
performance of model 1 is higher than that of model 2. As per the discussion under
Table 5, model 1, which is mainly influenced by the turbidity behavior, is better suited
to the high turbidity condition. The regression plots of the two models are shown in
Figs. 9 and 10.

Fig. 9. Regression plot of model 1 (X – actual residual Al, Y – predicted residual Al).

Fig. 10. Regression plot of model 2 (X – actual residual Al, Y – predicted residual Al).

Both models demonstrate a reasonable data distribution. The distribution in the plot of
model 1 (Fig. 9) is closer to the y = x line, agreeing with the test results in Table 6.
The plot of RW color vs. residual Al as predicted by model 1 for high turbidity
water is given in Fig. 11.

Fig. 11. Variation of residual aluminum with RW color as predicted by model 1.

The effect of color on residual aluminum was investigated while the other variables
were fixed at constant values. Figure 11 is similar to the plot in Fig. 5; however, the
numerical accuracy has not improved.
The variation of residual aluminum with RW turbidity is shown in Fig. 12.

Fig. 12. Variation of residual aluminum with RW turbidity as predicted by model 2.



Figure 12 is similar in shape to Fig. 6, with a constant residual aluminum content over
a wider range of RW turbidity at the initial stages. No improvement in numerical
accuracy could be observed.
The plots of the variation of residual aluminum with alum dosage are shown in
Figs. 13 and 14.

Fig. 13. Variation of residual aluminum with alum dosage as predicted by model 1.

Fig. 14. Variation of residual aluminum with alum dosage as predicted by model 2.

It could be noted that the expected shape of the curve for residual aluminum
versus alum dosage has improved, although the numerical accuracy remains poor. The
plots demonstrate the increase in residual aluminum with alum dosage in the initial
stage, the minimum reached on treating turbidity and color, and the subsequent
increasing trend in residual aluminum with alum dosage, none of which were noted in
Figs. 7 and 8. Therefore, use of a wider range of the dominating variable (RW
turbidity) appears to have improved the models' ability to capture reported chemical
and physical phenomena.

4 Conclusion

Extreme learning machine radial basis function (ELM-RBF) neural network models
predicting residual aluminum concentration were developed with satisfactory perfor-
mance. All models had correlation coefficients exceeding 0.85. A parametric analysis
was carried out to test the models’ ability to capture reported physical and chemical
phenomena relating to the response of residual aluminum to changes in alum dosage,
raw water (RW) turbidity and natural organic matter (NOM). The models were able to
demonstrate expected trends of the parametric plots, although the numerical accuracy
was poor. It was observed that raw water turbidity was the dominating variable
affecting the multivariate data distribution. Therefore two sets of models were devel-
oped based on ranges of values of raw water turbidity. The parametric plots improved
when data with a wider range of values for raw water turbidity was used. Models
developed in this study could be further improved by optimizing the radial basis layer
by reducing the degree of randomness using established methods such as genetic
algorithms.

Acknowledgment. The cooperation of Sabah Water Supply Department and LDWS for sup-
plying Segama Water Treatment Plant data is greatly acknowledged.

References
1. Leardi, R.: Nature-Inspired Methods in Chemometrics. Elsevier, Amsterdam (2003)
2. Kennedy, M., Gandomi, A., Miller, C.: Coagulation modeling using artificial neural
networks to predict both turbidity and DOM-PARAFAC component removal. J. Environ.
Chem. Eng. 3(4), 2829–2838 (2015)
3. Kim, C., Parnichkun, M.: MLP, ANFIS, and GRNN based real-time coagulant dosage
determination and accuracy comparison using full-scale data of a water treatment plant.
J. Water Supply: Res. Technol.-Aqua 66(1), 49–61 (2016)
4. Valentin, F.N.: An hybrid neural network based system for optimization of coagulant dosing
in a water treatment plant. Citeseerx.ist.psu.edu (1999). http://citeseerx.ist.psu.edu/viewdoc/
citations;jsessionid=81E01F677A156AF2CEDCB2C7CEB14ACE?doi=10.1.1.46.7239
5. Griffiths, K., Andrews, R.: The application of artificial neural networks for the optimization
of coagulant dosage. Water Sci. Technol. Water Supply 11(5), 605 (2011)
6. Joo, D.: The effects of data preprocessing in the determination of coagulant dosing rate.
Water Res. 34(13), 3295–3302 (2000)
7. Wu, G., Lo, S.: Effects of data normalization and inherent-factor on decision of optimal
coagulant dosage in water treatment by artificial neural network. Expert Syst. Appl. 37(7),
4974–4983 (2010)
8. Zangooei, H., Delnavaz, M., Asadollahfardi, G.: Prediction of coagulation and flocculation
processes using ANN models and fuzzy regression. Water Sci. Technol. 74(6), 1296–1311
(2016)
9. Robenson, A., Shukor, S., Aziz, N.: Development of process inverse neural network model
to determine the required alum dosage at Segama Water Treatment Plant, Sabah, Malaysia
(2009)

10. Wu, G., Lo, S.: Predicting real-time coagulant dosage in water treatment by artificial neural
networks and adaptive network-based fuzzy inference system. Eng. Appl. Artif. Intell. 21(8),
1189–1195 (2008)
11. Heddam, S., Bermad, A., Dechemi, N.: Applications of radial-basis function and generalized
regression neural networks for modeling of coagulant dosage in a drinking water-treatment
plant: comparative study. J. Environ. Eng. 137(12), 1209–1214 (2011)
12. Maier, H.: Use of artificial neural networks for predicting optimal alum doses and treated
water quality parameters. Environ. Model Softw. 19(5), 485–494 (2004)
13. Yang, Z., Gao, B., Yue, Q.: Coagulation performance and residual aluminum speciation of
Al2(SO4)3 and polyaluminum chloride (PAC) in yellow river water treatment. Chem. Eng.
J. 165(1), 122–132 (2010)
14. Tomperi, J., Pelo, M., Leiviskä, K.: Predicting the residual aluminum level in water
treatment process. Drink. Water Eng. Sci. (2013)
15. Gregor, J., Nokes, C., Fenton, E.: Optimising natural organic matter removal from low
turbidity waters by controlled pH adjustment of aluminium coagulation. Water Res. 31(12),
2949–2958 (1997)
16. Extreme learning machine: RBF network case. IEEE Conference Publication (2004).
http://ieeexplore.ieee.org/document/1468985/
17. Volk, C.: Impact of enhanced and optimized coagulation on removal of organic matter and
its biodegradable fraction in drinking water. Water Res. 34(12), 3247–3257 (2000)
18. Yan, M., Wang, D., Ni, J., Qu, J., Ni, W., Van Leeuwen, J.: Natural organic matter
(NOM) removal in a typical North-China water plant by enhanced coagulation: targets and
techniques. Sep. Purif. Technol. 68(3), 320–327 (2009)
Multi-layer Mangrove Species Identification

Fenddy Kong Mohd Aliff Kong, Mohd Azam Osman(✉),
Wan Mohd Nazmee Wan Zainon, and Abdullah Zawawi Talib

School of Computer Sciences, Universiti Sains Malaysia (USM), 11800 Pulau Pinang, Malaysia
fkong.ucom11@student.usm.my, {azam,nazmee,azht}@usm.my

Abstract. One of the challenges encountered by visitors while visiting and
exploring a mangrove park is to identify the mangrove species and instantly
retrieve related information. This paper presents a mobile-based method for
mangrove species identification called the multi-layer mangrove species
identification method (MSIA) and its application for Kilim Geo-forest Park in
Langkawi, Malaysia. This work involves formulating the identification method,
the design and development of the mobile application, and its integration with
the Mangrove Reference Data Centre. The application is designed with a
user-friendly interface to support visitors with limited knowledge of mangrove
tree species. One of the main features of MSIA is the automatic identification of
the mangrove tree species based on its leaf, which enhances the performance of
species identification. Firstly, the mangrove tree leaf is captured using the
mobile phone camera. Next, the first layer of the identification method is carried
out to identify the leaf shape. Once the species has been identified, additional
parameters such as the type of tree root, tree bark, flower or fruit are entered in
the second layer of the identification process. A specific mangrove species'
common name and its related information, such as biological information and
the possible medicinal or commercial worth of the species, are then displayed.
This paper presents the design and implementation of MSIA as well as the
testing and evaluation of the method and its application. It is believed that
MSIA will make a visit to the mangrove park more meaningful and enjoyable
for visitors.

Keywords: Artificial intelligence · Image processing · Multi-layer identification ·
Mangrove species

1 Introduction

Many types of mangrove species can be found in many areas of Malaysia, such as the
Kilim Karst Geo-forest Park in Langkawi. The park is made up of several elongated
hills and islands with narrow valleys in between, and these valleys are home to one of
the best and most unique mangrove forests in the world. In 2007, UNESCO officially
declared the Langkawi archipelago one of 94 globally recognized Geo-parks, endorsed
for its natural beauty, ecological harmony, and archeological, geological and cultural
significance [1]. One of the challenges encountered by visitors while visiting the
mangrove park is to identify the mangrove species and acquire its related information.

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 849–855, 2019.
https://doi.org/10.1007/978-3-030-02686-8_63
850 F. K. M. A. Kong et al.

This paper presents the design and implementation of a mobile application and the
multi-layer mangrove species identification method (MSIA) for visitors to the park,
and its integration with the Mangrove Reference Data Centre. Image processing and
artificial intelligence techniques are utilized to enhance the precision of mangrove tree
species identification, and a multi-layer approach is adopted. The application is
designed with a user-friendly interface to support visitors with limited knowledge of
mangroves. This work is part of the Sustainability and Productivity of Mangrove
Ecosystem project in collaboration with the Centre for Research Initiatives of
Universiti Sains Malaysia (USM), the USM School of Computer Sciences and the
USM School of Biological Sciences.

2 Background and Related Work

Image processing techniques have been widely used in plant monitoring and
identification. Their applications include the detection of plant diseases, plant species
identification and crop growth monitoring. In plant species identification, the solutions
range from technology-independent methods to technology-dependent methods. In an
effort to identify mangrove tree species, we focus on solutions involving the leaf of the
plant. Technology-independent methods are those used by farmers carrying out
physical inspection of a leaf; this type of method has been the most common practice
ever since farming started.
In technology-dependent methods, various approaches to identifying the leaf
include Leafsnap [2], WAPSI [3] and many more [4–9]. Leafsnap [2] is among the
first mobile apps for identifying plant species using automatic visual recognition. It
identifies the tree species based on photographs of the tree leaves. The key component
of the application is a set of computer vision techniques for discarding non-leaf images,
segmenting the leaf from an un-textured background, extracting features representing
the curvature of the leaf's contour over multiple scales, and identifying the species from
a dataset of 184 trees in the North-Eastern region of the United States. The application
obtains state-of-the-art performance on real-world images from the new Leafsnap
dataset, which the authors consider the largest of its kind.
The Web Application for Plant Species Identification (WAPSI) [3] incorporates
content-based image retrieval (CBIR). At the heart of this application is a shape-based
leaf image retrieval system, which uses a contour descriptor based on the curvature of
the leaf contour that reduces the number of points needed for shape representation. A
two-step algorithm for retrieving information is used. Firstly, it reduces the search in
the database by using some geometrical features. Secondly, leaf images are ranked
using a similarity measure between the contour representations. The similarity function
is applied to different images to calculate the distance between the characteristic
points' vectors by using a variant triangular membership function. This function is
crucial in producing good results in plant species identification. The effectiveness of
the method was demonstrated on their Web-based application.
On the other hand, there are several mobile apps that can help users to identify flowers
such as Audubon Wildflowers, Flower Pedia and What Flower [10]. However, these
applications only work as a reference guide with no capability to automatically recognize
Multi-layer Mangrove Species Identification 851

and identify the flower from an input image. Table 1 shows a summary of features of
the existing solutions for plant species identification.

Table 1. Comparison of existing plant species identification applications

Criteria                            Leafsnap     WAPSI    Audubon Wildflowers/Flower Pedia/What Flower
Application Category                Mobile App   Web      Mobile App
Hardware                            Smartphone   Desktop  Smartphone
Client-server Architecture          Yes          Yes      No
Own Dataset                         Yes          Yes      Yes
Image-Based Identification          Yes          Yes      No
Prompting of Additional Parameters  No           No       No
Automated Detection                 No           No       No

The distinct feature of MSIA compared with the existing solutions is the multi-layer
mangrove identification process, which takes an image of a mangrove tree leaf as the
first-layer input and then prompts for additional parameters in the second layer of the
identification process. In addition, MSIA provides an automated detection method for
plant species identification.
One of the main features of MSIA is the automatic identification of the mangrove
tree species based on its leaf by utilizing image processing and artificial intelligence
techniques. Several existing applications for plant identification and classification use
these techniques. In the method by Aitkenhead et al. [8], the image is rescaled,
compressed and then saved in 24-bit BMP format. A variety of transformations on the
RGB values are performed in an attempt to find the most effective method, so that the
system can work under different light conditions. A method of highlighting green
from red and blue under all possible light conditions is needed because the system is
required to work under different levels of lighting. Thus, the three RGB values are
added together for each pixel, and the green value is then divided by this sum. In many
cases, a large proportion of the image area is bare soil. To reduce the processing
time and optimize the system's ability to recognize different plants, only images
containing vegetation are selected. Thus, an image detection algorithm is implemented
to isolate the areas of interest within each image; this algorithm operates by performing
six steps on each image. Later, their work was extended by Sathya Bama et al. [5] to
image interpretation using morphology and a neural network.
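The green-highlighting transform described above, summing the three RGB values per pixel and dividing the green value by that sum, is a normalized green index. A minimal sketch (the 0.4 threshold and the sample pixels are assumptions for illustration):

```python
def green_index(r, g, b):
    """Normalized green: g / (r + g + b), robust to overall brightness changes."""
    total = r + g + b
    return g / total if total else 0.0

def vegetation_mask(pixels, threshold=0.4):
    # True where a pixel is 'green enough' to be treated as vegetation.
    return [green_index(*p) > threshold for p in pixels]

pixels = [(40, 120, 30), (120, 110, 100), (10, 200, 20)]  # leaf, soil, leaf
mask = vegetation_mask(pixels)
print(mask)  # [True, False, True]
```

Because the index is a ratio, uniformly brighter or darker lighting leaves it unchanged, which is exactly the light-condition invariance the transform is after.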
Another method is the leaf image retrieval using combined features by Wang et al.
[6]. This application involves two stages of image retrieval. The first stage computes
the eccentricity of an object using a shape feature method. Eccentricity, which can
roughly classify leaf images, is used because of its simplicity and usefulness as well as
its invariance to translation, scaling and rotation. The second stage of the image
retrieval process combines three feature sets, including eccentricity, to retrieve leaf
image information.
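Eccentricity can be computed from the second-order central moments of the leaf's pixel coordinates; the sketch below uses the standard moment-based formula and is not Wang et al.'s exact implementation:

```python
import math

def eccentricity(points):
    """Eccentricity of a point set via second-order central moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # Eigenvalues of the covariance matrix give the fitted ellipse's axes.
    common = math.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam_max = (mu20 + mu02) / 2 + common
    lam_min = (mu20 + mu02) / 2 - common
    if lam_max == 0:
        return 0.0
    return math.sqrt(max(0.0, 1 - lam_min / lam_max))

# A circular outline scores near 0; stretching it 3x makes it strongly eccentric.
circle = [(math.cos(2 * math.pi * t / 50), math.sin(2 * math.pi * t / 50))
          for t in range(50)]
oblong = [(3 * x, y) for x, y in circle]
```

Because the measure depends only on ratios of the moments, it is unchanged by translation, uniform scaling and rotation, which is why a single number can roughly separate round from elongated leaves.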

3 Design of Multi-layer Mangrove Species Recognition

Multi-layer Mangrove Species Identification (MSIA) is proposed as a tool to assist
visitors in instantly acquiring information about a mangrove tree. MSIA is
implemented in a mobile app designed and developed for identifying mangrove tree
species. The method utilizes image processing and artificial intelligence techniques
with multi-layer recognition to identify mangrove tree species. It operates on a
client-server architecture: an application installed on a mobile device captures an
image and sends it for processing on the server side, and the result is then displayed on
the smartphone's screen.
The application has two main modules, namely the Mobile Module and the Server
Module, as shown in Fig. 1. The Mobile Module consists of the Image Capturing
Module and the GPS Capturing Module, which run as back-end processes on the
mobile phone. Their duties include capturing the image and retrieving the GPS
coordinates of the current location of the mobile phone; this information is later sent to
the server side for image processing.

Fig. 1. Module diagram of MSIA application.

The Synchronizing Data Module is responsible for the data connection and
integration between the Mobile Module and the Server Module, sending and receiving
data to and from the server and updating data when the network is available. The
Server Module consists of three modules: the Image Processing Module, the Artificial
Intelligence Module and the Data Updating Module. The Image Processing Module
performs the first layer of the identification process, which is complex and requires
high memory usage; this module is therefore deployed on the server. The Artificial
Intelligence Module receives the result from the Image Processing Module as its input.
This module acts as the second layer of the identification process, further narrowing
down the targeted species by prompting the user for additional parameters, e.g., the
tree's bark color, fruit or flower. The module then identifies the possible specific
mangrove species based on the Mangrove Reference Data Centre, and finally the result
is sent to the Mobile Module.
Figure 2 shows the process flow of the multi-layer mangrove species identification.
Firstly, the mangrove tree leaf is captured. Next, the first layer (Phase 1), identifying
the leaf shape, is carried out in three initial steps, namely image pre-processing, leaf
shape detection and image classification. The color of mangrove leaves is usually
green; however, variations in water, nutrients and atmosphere may result in different
shades of green in the leaves. Thus, the color feature is not very reliable and is not
suitable for the leaf recognition process. Therefore, in the image pre-processing step,
grey-scaling is applied to the image, followed by an initial segmentation to exclude the
image background or non-leaf portion of the image.

Fig. 2. Simplified flow of the multi-layer mangrove species identification method.
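The pre-processing step, grey-scaling followed by an initial segmentation that excludes the background, can be sketched as follows, assuming a light, un-textured background behind a darker leaf. The luma weights are the standard ITU-R BT.601 coefficients; the threshold and sample pixels are assumptions:

```python
def to_gray(pixels):
    """Luma grayscale conversion (ITU-R BT.601 weights)."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]

def segment(gray, threshold=128):
    # 1 = leaf (darker than the light background), 0 = background.
    return [1 if v < threshold else 0 for v in gray]

pixels = [(30, 90, 40), (240, 240, 235), (25, 80, 35)]  # leaf, background, leaf
mask = segment(to_gray(pixels))
print(mask)  # [1, 0, 1]
```

Discarding color before segmentation sidesteps the unreliable shade-of-green variation noted above, since only intensity contrast between leaf and background is needed here.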

Subsequently, leaf shape detection is applied using the SURF descriptor algorithm
to find the range of points that classify the leaf shape as elliptic, oblong-elliptical or
round. After identifying the leaf shape, the classification process for possible
mangrove species is carried out by comparing it with the available dataset. Once the
possible species has been identified, additional parameters such as the type of tree root,
tree bark, flower or fruit are entered as the second layer of the identification process
(Phase 2 in Fig. 2). A specific mangrove species' common name and its related
information from the Mangrove Reference Data Centre, such as biological information
and the possible medicinal or commercial worth of the species, are then displayed. If
no single species is found, the application displays a list of possible species to the user.
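The two-layer lookup can be sketched as successive filtering over a species table. The entries below are illustrative, loosely based on Table 2; the real records live in the Mangrove Reference Data Centre:

```python
# Hypothetical mini reference table; attribute values are simplified.
SPECIES = [
    {"name": "Avicennia marina", "shape": "elliptic",
     "root": "pneumatophore", "bark": "reddish brown"},
    {"name": "Rhizophora mucronata", "shape": "oblong-elliptical",
     "root": "stilt", "bark": "dark brown"},
    {"name": "Sonneratia alba", "shape": "round",
     "root": "pneumatophore", "bark": "grey"},
]

def identify(leaf_shape, **extra):
    """Layer 1 filters by leaf shape; layer 2 filters by any extra parameters."""
    candidates = [s for s in SPECIES if s["shape"] == leaf_shape]
    for key, value in extra.items():
        candidates = [s for s in candidates if s.get(key) == value]
    return candidates  # a single match, or a list of possible species

result = identify("oblong-elliptical", root="stilt")
print([s["name"] for s in result])  # ['Rhizophora mucronata']
```

When the extra parameters fail to narrow the candidates to one, the remaining list is exactly what the application shows the user as possible species.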

4 Testing and Evaluation

Testing activities are important in the implementation and deployment of the
proposed method and the MSIA application. Testing acts as a validation process and a
process of examining a component, sub-module, module or system in order to
determine its operational characteristics and whether it has any defects. Testing is
divided into three parts: unit testing, integration testing and system testing. Unit
testing was performed on every module throughout the development before each
module was integrated with the others. After unit testing, integration testing was
conducted within closely inter-related modules. The ease of use and smoothness of the
user interface were evaluated in integration testing, as the app itself is targeted largely
at non-IT-savvy visitors. The elements of human-computer interaction are emphasized
by providing sufficient error messages and guidelines in the app.
From 7th to 9th May 2016, the system was tested on site at Kilim Geo-forest Park,
and the outcome was recorded on video. In brief, we managed to accurately identify
two specific species, Rhizophora mucronata and Avicennia marina, as shown in
Table 2. Capturing the image of a leaf and entering the correct parameters yielded
correct identification. This is very important, as the feedback from system testing
reflects how well the MSIA application has achieved its purpose.

Table 2. Test cases for system testing

Species       Avicennia marina                                   Rhizophora mucronata
Family        Avicenniaceae                                      Rhizophoraceae
Leaf          Elliptic, 8.0 cm long × 3.0 cm wide                Broadly elliptic to oblong
Root          Pneumatophore, pencil-like                         Stilt roots
Bark          Reddish brown, papery scaly                        Dark brown with cracks or horizontal fissures
Fruit         Ovoid, to 1.3 cm long, with a short pointed apex   Capsule
Common name   Api-Api jambu                                      Bakau belukap, Bakau jangkar, Bakau kurap, Belukap

5 Discussion and Conclusion

The MSIA application, developed as a tool to assist visitors at Kilim Geo-forest
Park, will make visits more meaningful and enjoyable. Based on the evaluation, the
application is able to effectively identify mangrove species from the captured image.
The application has its own uniqueness, as it involves a multi-layer identification
process.

There are certain limitations and challenges due to external factors that we could
not avoid. Image capturing is limited to healthy leaves only, and the distance between
the camera and the leaf when capturing the image must be less than one meter, so that
at least 60% of the 640 × 640 pixel image is occupied by the leaf, with no overlap of
the leaf with other leaves or objects.
A Storing and Updating Data Module could be introduced to receive data from the
Synchronizing Data Module and store them in a database. Data consisting of the GPS
coordinates, the captured image and the additional parameters entered by users could
later be utilized by interested parties or researchers for various purposes, such as
planning, sustaining and maintaining the mangrove park.

References

1. Biodiversity Informatics Research Group: Langkawi Mangrove Biodiversity Database
(2009). http://www.mangrove.my/page.php?
2. Kumar, N., Belhumeur, P.N., Biswas, A., Jacobs, D.W., Kress, W.J., Lopez, I.C., Soares,
J.V.B.: Leafsnap: a computer vision system for automatic plant species identification. In:
Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II.
LNCS, vol. 7573, pp. 502–516. Springer, Heidelberg (2012)
3. Caballero, C., Aranda, M.C.: WAPSI: web application for plant species identification using
fuzzy image retrieval. In: Greco, S., Bouchon-Meunier, B., Coletti, G., Fedrizzi, M.,
Matarazzo, B., Yager, R.R. (eds.) IPMU 2012. Communications in Computer and Information
Science, vol. 297, pp. 250–259. Springer, Berlin, Heidelberg (2012)
4. Wang, Z., Chi, Z., Feng, D.: Shape based leaf image retrieval. IEE Proc. Vis. Image Signal
Process. 150(1), 34–43 (2003)
5. Sathya Bama, B., Mohana Valli, S., Raju, S., Abhai Kumar, V.: Content based leaf image
retrieval (CBLIR) using shape, color and texture features. Indian J. Comput. Sci. Eng. 2(2),
202–211 (2011)
6. Wang, X.-F., Du, J.-X., Zhang, G.-J.: Recognition of leaf images based on shape features
using a hypersphere classifier. In: Proceedings of International Conference on Intelligent
Computing 2005, LNCS, vol. 3644, pp. 250–259. Springer, Heidelberg (2005)
7. Kaneko, T., Saitoh, T.: Automatic recognition of wild flowers. In: Proceedings of
International Conference on Pattern Recognition, vol. 02, pp. 2507. IEEE (2000)
8. Aitkenhead, M.J., Dalgetty, I.A., Mullins, C.E., McDonald, A.J.S., Strachan, N.J.C.: Weed
and crop discrimination using image analysis and artificial intelligence methods. Comput.
Electron. Agric. 39(3), 157–171 (2003)
9. Gu, X., Du, J.-X., Wang, X.-F.: Leaf recognition based on the combination of wavelet
transform and Gaussian interpolation. In: Proceedings of International Conference on
Intelligent Computing, LNCS, vol. 3644, pp. 253–262. Springer, Heidelberg (2005)
10. Fortegra homepage. http://www.protectcell.com/News/How-to-use-your-phone-to-identify-
spring-flowers.aspx. Accessed 17 May 2017
Intelligent Seating System with Haptic Feedback
for Active Health Support

Peter Gust1, Sebastian P. Kampa1(✉), Nico Feller2, Max Vom Stein1,
Ines Haase2, and Valerio Virzi2

1 Bergische Universität Wuppertal, Gaußstraße 20, 42119 Wuppertal, Germany
kampa@uni-wuppertal.de
2 Technische Hochschule Köln, Betzdorfer Straße 2, 50679 Cologne, Germany

Abstract. Due to infrequent change in posture, static sitting leads to muscular tension and even possible degeneration of the intervertebral discs, and is therefore one of the main causes of serious physical complaints of the back. This sitting behavior can be observed particularly at seated workplaces such as office work or vehicle guidance in transport or long-distance traffic. Previous ergonomic seating systems have manually or actuator-operated adjustment mechanisms and in some cases a movable seat mechanism. Postural support is adjusted once and remains unchanged in different sitting positions. This leads to a lack of – or incorrect support of – body posture and thus to rapid fatigue of the muscles and intervertebral discs. A new approach for ergonomic seating systems is the introduction of haptic feedback through automatic and prospective actuator deformation within the seat surface, dependent on the user’s individual sitting position and behavior. Haptic feedback is provided by a composite of a sensor that determines the distribution of compressive force and an actuator based on a shape memory alloy. If several units are used in different zones of the seating furniture, the sitting position can be determined and evaluated in real time and the seat can react intelligently. If the user exceeds the permissible retention time within a position, a change of sitting position is stimulated by a load-based actuation of the actuators. The discomfort, barely perceptible to the user, leads to dynamic sitting and thus actively helps to reduce muscular tension and maintain performance over a longer period of time. This paper presents a draft modular technology concept for the promotion of dynamic body posture in any seating system.

Keywords: Ergonomics of sitting · Best posture · Posture balancing · Stress-oriented posture support · Modular system · Prospective design

1 Introduction

The strain on the intervertebral discs and muscles when sitting depends on the sitting
position and the seating furniture. Even small changes in sitting position can have
considerable impact on the stresses that occur in the body [1]. The term static posture
is used in this context for a position that is maintained unchanged over a period of time
without movement and with concomitant muscular strain [6]. Static working postures
and poor sitting positions are generally associated with musculoskeletal discomfort
and disorders [5].

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 856–873, 2019.
https://doi.org/10.1007/978-3-030-02686-8_64
In order to avoid muscle strain, seated workplaces must therefore be designed in
such a way that the user moves body and limbs at regular intervals [3]. The design of
the seat should also encourage spontaneous changes of posture [6]. This is important,
because only with constant movement can fluid transport, and hence also nutrient supply,
be maintained in the intervertebral discs [4]. Frequent change in body position, with
relevant movement of the limbs or other parts of the body in relation to each other or in
relation to a fixed object, is called dynamic body posture [6].
People at seated workplaces suffer just as frequently from back pain as do those who
perform heavy physical work, although the strains on their system are different [2].
Approximately 25% of sickness reports are due to complaints of the musculoskeletal
system [7]. For this reason, measures must be taken to reduce the strain on the muscles
and intervertebral discs.
This need must also be seen in the context of New Work, with its ever-increasing
tendency towards decentralization and individualization [13]. Working time is, with
increasing frequency, decoupled from a specific workplace [13], which means there is
no longer only one fixed work chair, but several at different stationary, mobile – or at
some point in the future even autonomously driven – workplaces; all the more reason
for the promotion of user-tailored dynamic posture.
Norms and standards describing the essential design parameters of seats are defined
for selected work equipment [6, 7]. DIN EN ISO 9241-5, for example, lists four main
features that significantly promote a dynamic posture at computer workstations [6].
These features, then, describe minimum requirements for the configuration of office
chairs [7]:
• seat surface inclination
• movement of seat and backrest
• rollers
• rotatability.
When designing the workplace, it must be ensured that the user can set and change
the sitting position at all times [6].
In addition, backrests with individually adjustable lumbar supports are used to
promote natural sitting posture in the middle lumbar region [6]. This is necessary
because in an upright posture the spinal column has a double S-shaped course in the
sagittal plane, with the cervical and lumbar spine curved convexly forward (lordosis)
and the thoracic and sacral spine curved concavely backward (kyphosis) [6, 9]. Although
this form is retained in a seated position, a tendency towards kyphosis can be seen
after prolonged sitting, especially in the lumbar spine [9]. This tendency can be counteracted
by restoring natural posture by means of lumbar support and thus relieving the
intervertebral discs [9].
In addition to computer workstations, this form of support is indispensable in
automobiles, again because of the poor posture their sitting position encourages. This is
illustrated in Fig. 1. On the left is a standing person with normal curvature of the spine,
in the middle a vehicle driver without lumbar support, and on the right a driver whose
spinal posture is corrected with lumbar support.

Fig. 1. Comparison of the skeleton in standing and sitting position with and without lumbar
support [9].

Individually adjustable armrests also relieve neck and shoulder muscles during work
interruptions or static tasks such as driving a vehicle, and thus prevent muscle pain
[6, 10]. Like lumbar support, armrests are also used in many areas and workplaces.
However, some design parameters of static workstations cannot be transferred to
mobile workstations, e.g. in individual or long-distance traffic. This applies above all to
the dynamic parameters, such as the castors and rotatability of the seat. Since position
change is not intended in static work tasks like driving, other measures must be taken
to stimulate dynamic sitting.
Car manufacturers use actuators in the backrest and seat that can be activated
individually. For example, the BMW 7 series uses 18 pneumatic lifting elements that can
be activated in a massage and rotation cycle in accordance with medical-physiotherapeutic
parameters. The arrangement of the actuators within the seat is shown in Fig. 2.
While the massage function promotes blood circulation in the back muscles and relieves
tension, the rotation function slightly twists the occupant’s body, thus lightly stressing
and relieving the intervertebral discs. This promotes the supply of nutrients and thus the
regeneration of the intervertebral discs [8].
Another approach, also from BMW and used in the X5, is a two-part seat that
promotes dynamic posture by alternately lifting and sinking the two seat parts [10].
This is shown in Fig. 3 below. Other car manufacturers use similar concepts to support
dynamic posture.
Fig. 2. Rear seat of a BMW 7 series [8].
Fig. 3. Comfort/Active Seat of a BMW X5 series [11].

All in all, it can be seen that the problem of static posture and resultant complaints
is well-known, and that active countermeasures are already being taken to prevent pain
and physical damage caused by incorrect sitting position and lack of exercise.

2 Haptic Feedback System with Posture Detection

The Introduction to this article has shown that promoting dynamic posture is an
important aspect of workplace design. This is done in two ways:
On the one hand, passive elements such as individually adjustable backrests, armrests
and lumbar supports are used to relieve the body and ensure good postural support. On
the other hand, active – e.g. pneumatic lifting – elements are used in vehicle seats to
promote blood circulation to the muscles and to reduce strain on the intervertebral discs
by actively stimulating postural change.
However, these solutions have one thing in common: the actual posture of the user
and the number of position changes carried out independently of the active system are
not recorded. Instead, the system runs predefined cycles whose type and intensity
can be selected by the user. Regardless of actual body size and mass, the user is
guided into positions that are not determined individually.
Figure 4 shows this relationship. Existing systems start from an unknown posture
and guide the user into a defined position with a defined impulse. At discrete intervals
this defined position is changed to another defined position.
Fig. 4. Current functionality: a defined impulse guides the user from an undefined posture into a defined posture; at discrete intervals t = x the defined posture is changed to another defined posture [own illustration].

The new approach presented here is to combine an actuator with a sensor in order
to determine the actual posture and hence to be able to output load- and posture-dependent
impulses. The aim is to effect subconscious perception of these impulses by the user and
to initiate an independent change of posture as a result.
Different sitting postures and the length of stay within any one position will be
determined by a distribution of sensor-actuator units adapted to body stature. The
measured data will be recorded and in this way the individual posture of the user will
be detected. Based on data acquired over time, the user will be brought into a dynamic
posture by means of intelligently controlled and unconsciously perceived impulses. By
connecting the system to a smartphone, it is also possible to provide user recognition,
which means that sitting positions can be recorded and settings transferred independently
of the seat. This is shown in Fig. 5.

independent t < x
change

undefined t << x defined t=x verify


posture posture posture

unnoticed dependent t>x


impulse change

Fig. 5. New functionality [own illustration].

3 A Haptic Feedback System Approach

3.1 Force-Displacement Measurement Chair

The required force and displacement to be applied by the actuator are determined on a
modified office chair. This has a device for measuring force as a function of displacement.
The test setup is essentially a modified measuring system based on Vink and Lips
[12], who determined the sensitivity of the human body at 32 different points on the
contact surface between body and chair. The cohort of 23 test persons in that study already
shows a clear tendency to pinpoint certain zones as more sensitive than others. The
findings of Vink and Lips are presented in Fig. 6 below [12].
Fig. 6. Areas with significantly different sensitivities [12].

Table 1 shows the forces determined by Vink and Lips at which test persons felt
clear discomfort. It becomes clear that the area of the seat adjacent to the backrest has
the lowest comparative sensitivity [12]. For the subsequent force-displacement
measurement, in which maximum required values are determined, this range is therefore
regarded as the reference zone.

Table 1. Average values at which discomfort is experienced [12]

Position   Sensitivity – pressure in N (with standard deviation)
Backrest   10.92 (5.30)   12.17 (5.94)   15.17 (7.20)   12.61 (6.86)   10.72 (6.68)
           11.81 (8.30)   21.57 (16.02)  13.10 (7.99)
           14.38 (9.44)   19.24 (13.10)  15.36 (9.73)
           15.29 (6.59)   21.01 (8.67)   17.18 (8.67)
Seat pan   22.39 (9.33)   23.42 (11.09)  23.21 (7.87)   22.69 (9.40)
           23.50 (9.33)   22.39 (10.17)  21.35 (8.01)   22.64 (8.88)
           19.76 (8.63)   15.73 (6.23)   16.61 (6.45)   18.83 (7.28)

Figure 7 shows the test setup of the force-displacement measuring device. Compressive
force is applied to the test persons through the seat by a 2.5 cm² stamp connected to a
force-measuring device. Vertical measurement is ensured by a linear guide connected
to a displacement sensor. This test setup ensures that measurements and forces are
always applied in the same way. Exact positioning of the test person on the seat is not
possible due to individual anthropometric differences. However, this is not necessary,
as sitting habits differ in any case.
Fig. 7. Test configuration for force-displacement measurement: a piston presses through the cushion and seat pan; a force sensor, a displacement sensor on a linear guide (one vertical degree of freedom) and a linear actuator complete the setup [own illustration].

Twelve test subjects – 4 women and 8 men – took part in the experiments, which
confirmed the trend of Vink and Lips’s results and at the same time extended them by
indicating the displacement. Table 2 shows the measured results:

Table 2. Results of force-displacement measurement in the rear right area of the seat

                     Weight     Displacement               Force
                                Perception   Discomfort    Perception   Discomfort
Average              78.33 kg   16.92 mm     23.42 mm      24.75 N      47.75 N
Standard deviation   14.62 kg   7.54 mm      7.95 mm       16.72 N      21.19 N

What is noticeable is that the perception threshold and the discomfort threshold have
shifted upwards. This is due to a slightly different test setup in which a seat cushion is
additionally mounted on the seat. The cushion absorbs part of the force as well as part of the
displacement, resulting in higher readings than in Vink and Lips’s measurements. Furthermore,
the discomfort threshold for women is about 50% higher than for men, both in
the applied force and in the displacement distance. This is a second difference
from the results of Vink and Lips.
However, since the actuator is to be installed in upholstered seats, the measured
values appear realistic for the project. For discomfort to be generated below the
conscious perception threshold, a required force of 30 N and a stroke of 20 mm are
therefore assumed.

3.2 Actuator Design

The actuator has the task of applying a force or impulse to the human body. This impulse
must be initiated in such a way as to cause discomfort that is unconsciously perceived
by the user and leads to the voluntary adoption of a different sitting position. To create
this discomfort, the force must be applied very slowly. The adjustment speeds required
for this must therefore be determined in tests.
In addition, noise must be avoided, as the user could interpret it as an indicator
of the actuator’s response. The actuator must also be small enough to be installed
in the restricted space of the seat and backrest.
Starting from the data determined in Sect. 3.1, the actuator can now be designed. To ensure
smooth and lubricant-free guidance, the actuator is constructed on the basis of a linear
guide with sliding surfaces made of tribologically optimized polymers [14]. The drive
is a Bowden cable with an electrically operated shape-memory-alloy (SMA) wire. These
wires can be integrated into small spaces and provide smooth, stepless, silent, and very
strong linear tensile force. However, the wire must be long enough for the required
displacement.
The actual application of force on the user should then take place through a polymer
leaf spring. The tractive force of the SMA wire causes the leaf spring to bulge and the
impulse is then transmitted to the user. Figure 8 shows the schematic structure of such
an actuator.

Fig. 8. Design scheme of an actuator based on a shape-memory alloy wire [own illustration].

3.3 Sensors

Capacitive sensors are used to record posture and sitting behavior. The capacitive sensor
system used in the experiments cost only about 0.5% of conventional measuring systems
for the localization and quantification of seat force distribution – as used, for example, in
the design of new car seats. It was important to select an affordable, space- and energy-
saving system that provided sufficient accuracy in measuring the available sitting
positions and could cope with changing climatic conditions, shocks and impacts.
4 System Evaluation

4.1 Actuator Test Bench

In order to verify the concept, a prototype of the actuator using a linear guide was created
on the basis of the design from Fig. 8. This is shown in Fig. 9 below.

Fig. 9. Actuator prototype for force-measuring test [own illustration].

Different materials and material thicknesses were used for the leaf springs to verify
the general feasibility of the system. Within the test, only vertical forces were initially
determined on a test bench using a force measuring device (Fig. 10).

Fig. 10. Actuator test bench [own illustration].

The principle of the SMA actuator precluded the use of electrically conductive
materials without electrical decoupling, as these would cause a short circuit. For this
reason, different plastics and their properties were tested. The materials used were ABS,
PC, PET, PMMA and PVC in different thicknesses. The vertical force was
measured at a distance of 12 mm above the initial position, in conjunction with the
horizontal force and the movement distance necessary for the deformation of the
material. Measurement results are shown in Fig. 11. The x axis shows horizontal force
input and the y axis vertical force output.

Fig. 11. Measurement results of the tested materials (ABS 1.5 mm, PET 1.0 mm, ABS 1.0 mm, PC 0.5 mm): vertical force output [N] plotted over horizontal force input [N].

Since the tested dimensions of PMMA and PVC were too large, their graphs are
not shown in the diagram. However, the generated vertical force was in each case
proportional to yield strength and material thickness. Furthermore, with increasing stiffness,
increasing force was required to achieve the required displacement, as indicated by
the shift of the graphs along the x axis. These findings enabled the leaf spring to be
optimized.

4.2 FEM Optimization of the Actuator

FEM-based optimization is performed to determine a suitable geometry for the leaf
springs, with the aim of decreasing horizontal force and displacement and increasing
vertical actuator force. The optimization is carried out with ANSYS1. The geometry is
based on a shell element with the material thickness defined as a parameter in ANSYS. In
order to reduce computing time, a symmetrical system is assumed and only one side is
simulated. To measure the resultant vertical force, a solid, weightless cylinder is attached
with a rotational spring on top of the leaf spring. Given a constant spring rate, the contact
force can be calculated. The model structure is shown in Fig. 12.

Fig. 12. FEM model structure.

1 ANSYS, Inc.: https://www.ansys.com.
The shape of the leaf spring is based on a shifted catenary curve (1) [cf. 18], which
makes this shape particularly suitable for static loads – as can be seen from its use in
load-bearing structures such as bridges.

f(x) = −[a · cosh((x − x₀)/a) − a] + a · cosh(−x₀/a) − a    (1)

The parameters are the width of the plastic actuator, the material thickness, and the
height of the function depending on the yield point. In order to reach a deflection of
20 mm and a vertical force of 30 N it is necessary to maximize the curvature factor a
and the width of the leaf spring and to minimize the thickness of the plastic spring. This
correlation can be seen in Fig. 13 below.
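For illustration, the shifted catenary in (1) can be evaluated numerically. The following sketch is hypothetical: the values chosen for the curvature factor a and the apex position x₀ are examples, not parameters from the study.

```python
import math

def leaf_spring_profile(x, a, x0):
    """Shifted catenary from Eq. (1): height of the leaf spring at position x.

    a  -- curvature factor (a larger a gives a flatter curve)
    x0 -- horizontal position of the apex
    The shift terms ensure f(0) = 0, i.e. the spring is anchored at the origin.
    """
    return -(a * math.cosh((x - x0) / a) - a) + a * math.cosh(-x0 / a) - a

# Example values (illustrative only): apex at x0 = 40 mm, curvature factor a = 60 mm.
a, x0 = 60.0, 40.0
print(leaf_spring_profile(0.0, a, x0))  # 0.0 – anchored at the origin
print(leaf_spring_profile(x0, a, x0))   # maximum deflection, reached at the apex
```

The maximum deflection grows with the curvature factor a, which is why the optimization maximizes a and the spring width while minimizing the thickness.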

Fig. 13. Response surfaces; Left: Width (x), Curvature factor (y) and Deformation (z); Right:
Width (x), Thickness (y) and Deformation (z).

The durability of the actuator depends on the materials used. For safety reasons, materials
liable to break are avoided: only ABS, PC and PET are considered suitable. Hence
the equivalent tensile stress (von Mises) is determined with different material parameters.
The optimization results show that PC and PET have similar properties at a material
thickness of 1 mm, but the thermal dimensional stability of PET is lower. The stress in
ABS, on the other hand, is too high. Due to the significant heating of the SMA actuator,
polycarbonate is therefore preferred.

4.3 Sensor Test Chair

The positioning of the sensors in the seat and backrest follows Nicol [15]. Figure 14
shows the distribution of pressure in the backrest and seat in an upright sitting position,
followed by the iliac crest and lordosis. From Nicol’s study it can be concluded that the
distribution of forces is clearly visible across the pressure peaks in the seat and backrest
[15]. Further investigation will be carried out to confirm this.

Fig. 14. Pressure distribution in an office chair: vertical sitting position [Revised based on Nicol,
cf. 15].

Based on studies by Nicol [15], it can be assumed that the distribution of forces on
the seat surface is clearly visible across the pressure peaks of the buttocks and thighs.
This makes it possible to identify the sitting position by means of four sensors arranged
around the center of gravity of the seat surface, on the basis of moments around the x
and y axes. To integrate the sensors into the seat, they are mounted on a holder placed
between the seat and the seat surface holder. The test setup is shown in Fig. 15.

Fig. 15. Device for measuring pressure distribution in the seat surface [own illustration].
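The moment-based identification described above can be sketched as follows. The sensor coordinates and the forward/sideways interpretation are illustrative assumptions; the study itself derives its thresholds from test-person data.

```python
# Illustrative sensor positions in mm relative to the centre of the seat surface
# (x: positive towards the front edge, y: positive towards the right side).
SENSOR_POS = {
    "front_left":  (150.0, -150.0),
    "front_right": (150.0,  150.0),
    "rear_left":  (-150.0, -150.0),
    "rear_right": (-150.0,  150.0),
}

def seat_moments(forces):
    """Moments of the seat load around the y and x axes, computed from the
    four seat sensors. forces maps sensor name -> measured force."""
    mx = sum(SENSOR_POS[name][0] * f for name, f in forces.items())
    my = sum(SENSOR_POS[name][1] * f for name, f in forces.items())
    return mx, my  # mx > 0: load shifted forward, my > 0: load shifted right

# A user leaning forward and to the left loads the front-left sensor most:
mx, my = seat_moments({"front_left": 40, "front_right": 15,
                       "rear_left": 10, "rear_right": 5})
print(mx > 0, my < 0)  # True True
```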

On the basis of the optimum pressure distribution of the back (Fig. 16, right-hand
illustration) the arrangement of the sensors is transferred to the dimensions of a backrest [16].
It becomes apparent that the measurement resolution must be increased, in comparison
to the seat surface, for clear determination of the different sitting positions. This is achieved by
using six sensors, as shown in Fig. 16 (left-hand illustration). The advantage of this
arrangement is that inclination of the upper body to one side can be determined.
Fig. 16. Left: Arrangement of capacitive sensors [own illustration]; Right: Optimal pressure
distribution of the back [16].

The sensors are read out using a microcontroller and PSoC Designer2 and
programmed in C. In addition, an algorithm is developed that uses the
sensor data to determine the position of the user. The algorithm is based on 14
characteristic sitting postures, divided into six load classes. Figure 17 shows the sitting
postures and their classification.
The measured values required to detect each position are determined in a
test. The test persons take up the different sitting positions and the sensor data are
recorded. Since each position activates the sensors in a different combination, all 14
sitting positions can be clearly distinguished.
The classification is based on biomechanical observations using an EMG measurement
system and a pain scale for self-assessment by the subjects. The following illustration
(Fig. 18) shows the clear correlation between muscle stress, pressure distribution
on the seat surface and contact points on the back surface for two selected seat positions
or seat load classes.

2 Cypress Semiconductor Corp.: http://www.cypress.com/products/psoc-designer.
Fig. 17. Classification of posture in load classes from low (class 1) to high (class 6) [17].
Fig. 18. Relationship between stress and pressure distribution [own illustration].

4.4 Software

Capacitive sensors are installed on the backrest and seat to determine the user’s position.
A microcontroller processes the data received from the capacitive sensors and senses
whether or not the user is present on a specific sensor, and how high the strain on a
particular part of the body is. The presence or absence of the user on the different sensors
corresponds to a specific sitting position. Every position has unique characteristics
within its profile, and these can be identified by the algorithm. The key principles of the
algorithm are shown in the following pseudo-code (Fig. 19) for the positions
“multitasking” and “lean lateral”:

Fig. 19. Pseudo-code [own illustration].

Firstly, the sensor data are stored in respective variables. The Identification function
receives the sensor data and determines the sitting position depending on which sensors
are activated. If none of the sensors in the backrest is activated and the strain on the front
left sensor is higher than the mean of the other seat sensors, the algorithm outputs
“multitasking”.
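This rule can be sketched as a plain function. The sensor names and the one-sided-contact rule for “lean lateral” are illustrative assumptions based on the description above, not the authors’ C implementation.

```python
def identify_position(back_active, seat_strain):
    """Classify the sitting position from processed sensor data.

    back_active -- dict of backrest contact flags, e.g. keys 'upper_left',
                   'upper_right', 'lower_left', 'lower_right'
    seat_strain -- dict of seat strain values with keys 'front_left',
                   'front_right', 'rear_left', 'rear_right'
    """
    others = [v for k, v in seat_strain.items() if k != "front_left"]

    # No backrest contact and the front-left strain exceeds the mean of the
    # other seat sensors: the user is leaning forward over the desk.
    if not any(back_active.values()) and seat_strain["front_left"] > sum(others) / len(others):
        return "multitasking"

    # Backrest contact on one side only: upper body inclined sideways.
    left = any(v for k, v in back_active.items() if k.endswith("_left"))
    right = any(v for k, v in back_active.items() if k.endswith("_right"))
    if left != right:
        return "lean lateral"
    return "other"

print(identify_position(
    {"upper_left": False, "upper_right": False, "lower_left": False, "lower_right": False},
    {"front_left": 30, "front_right": 10, "rear_left": 10, "rear_right": 10},
))  # multitasking
```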
Accordingly, there is a maximum time a user should sit in a specific position. An
algorithm recording the sitting time for each position is based on the react-on-change
principle: after every change of position, the time is recorded and subtracted from
the total time passed. In this way the total time spent sitting in a specific position is
measured and can be evaluated by the user. When the maximum time for a particular
position is reached, the system enables the actuator; this provokes gentle discomfort in
the user that encourages a change to a different position.
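A minimal sketch of such a react-on-change timer, assuming a single dwell-time limit (the study distinguishes limits per load class):

```python
import time

class SittingTimer:
    """Accumulate the time spent in each sitting position and signal when
    the dwell-time limit for the current position is exceeded.

    React-on-change: time is only booked when the detected position changes,
    by subtracting the timestamp of the last change from the current time.
    """

    MAX_DWELL_S = 20 * 60  # illustrative limit, not a value from the study

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._position = None
        self._since = 0.0
        self.totals = {}  # position -> accumulated seconds

    def update(self, position):
        """Feed the currently detected position; return True when the
        actuator should be enabled."""
        now = self._clock()
        if position != self._position:
            if self._position is not None:
                self.totals[self._position] = (
                    self.totals.get(self._position, 0.0) + now - self._since)
            self._position, self._since = position, now
        return now - self._since >= self.MAX_DWELL_S

# Simulated clock for demonstration:
t = [0.0]
timer = SittingTimer(clock=lambda: t[0])
timer.update("upright")
t[0] = 100.0
timer.update("lean lateral")
print(timer.totals)  # {'upright': 100.0}
```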
The collected data is transmitted via Bluetooth or a serial port to other devices
such as cell phones or other microcontrollers. The data consists of the user’s sitting position,
the duration in this position, processed sensor data, and the command to take action. The
generated data is used by the app to create an individual sitting profile.
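The transmitted record could look as follows; the field names and the JSON encoding are assumptions for illustration, since the paper only lists the content of the message.

```python
import json

def build_status_packet(position, duration_s, sensor_data, take_action):
    """Assemble the status message sent via Bluetooth or serial to the app:
    sitting position, dwell duration, processed sensor data, action command."""
    return json.dumps({
        "position": position,
        "duration_s": duration_s,
        "sensors": sensor_data,
        "take_action": take_action,
    }).encode("utf-8")

packet = build_status_packet("multitasking", 312.4, {"front_left": 30}, False)
print(json.loads(packet)["position"])  # multitasking
```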

5 Results

It has been shown that there is a significant need for modular and active seating systems
that promote dynamic posture at both stationary and mobile – and even autonomously
mobile – workplaces. The aim is to achieve continuous load distribution within the
muscles and spine in order to avoid tension and to ensure a constant flow of nutrients
to, and within, the intervertebral discs. New Work envisions a shift in working behavior
from the classic workplace in a centralized company to a decentralized model in which
work is carried out at different locations [13]. This makes it necessary to integrate a
modular active system into almost every seat.
In addition to existing standards that define the minimum requirements for seating
systems, there are already some approaches, especially in the automotive industry, to
promote a dynamic posture. These approaches range from individual adjustment options
for the seat surfaces and backrests, through armrests and lumbar supports, to actuator
adjustment of individual areas within the contact surfaces between user and seat.
However, previous systems have one thing in common: the changed seating position is
not determined by the user, but by predetermined cycles. Rather than record the actual
sitting position of the user, conventional systems initiate user-independent impulses that
actively cause a change in the sitting position.
The aim of the present research project was, therefore, to develop a modular actuator
that determines the individual sitting posture of the user, continuously monitors it, and
emits unnoticed impulses to change that posture. Using a test bench procedure, the
project determined the necessary forces and displacements for unconsciously perceived
discomfort of test persons. The required parameters, 30 N and 20 mm stroke, were
comparable with the findings of Vink and Lips [12]. The function of the actuator, based
on a plastic leaf spring with SMA drive, was also verified in the tests.
Since the shape of the leaf spring required long adjustment paths in order to achieve
the required stroke, shape optimization was carried out using FEM analysis. It could be
shown that a catenary curve had good shape properties for this application, enabling the
displacement and force applied by the drive to be significantly reduced, thus improving
efficiency.
The determination of the seating position with capacitive sensors also provided good
information. The arrangement of the sensors in the seat and backrest in accordance with
Nicol [15] allowed the detection of 14 characteristic sitting postures, which could be
divided into six load classes.

6 Conclusion

Since the results of the research project were entirely positive, the next step is to connect
the sensors with the actuators and combine them in a single module. A possible concept
is shown in Fig. 20. The spring actuator is made of a bent plastic disc which is preformed
in accordance with the simulation. The actuator spring will be placed in two
linearly guided sledges via a fast-mounting click-fixture system. Movement heights can
be adjusted for different seats by varying the attack angle or shape of the spring.

Fig. 20. Haptic feedback system with posture detection in automotive seat [own illustration].

The possible positioning of the actuators is illustrated in Fig. 20, using the example
of an automobile seat. Here the distribution of the sensors from Figs. 15 and 16 is adopted
and extended by the actuators in the seat – illustrated in the diagram by green spots
and circles on the seat system. A possible smartphone representation of measurement-data
feedback is also illustrated.
In future, data collection in the cloud will facilitate learning from many different users
with specific profiles. Seats in different vehicles could also eventually be connected and
adjusted to any user of an ‘intelligent’ chair.

References

1. Wilke, H.-J., et al.: Intradiscal pressure together with anthropometric data – a data set for the
validation of models. Clin. Biomech. 16, 111–126 (2001)
2. Nachemson, A.: Towards a better understanding of low-back pain: a review of the mechanics
of the lumbar disc. Rheumatol. Rehabil. 14(3), 129–143 (1975)
3. DIN EN 1335-1: Office furniture – Office work chair – Part 1: Dimensions, Determination of
dimensions. Beuth Verlag, Berlin (2002)
4. Wilke, H.-J., et al.: New in vivo measurements of pressures in the intervertebral disc in daily
life. SPINE 24(8), 755–762 (1999)
5. Graf, M., Guggenbühl, U., Krueger, H.: An assessment of seated activity and postures at five
workplaces. Int. J. Ind. Ergon. 15(2), 81–90 (1995)
6. DIN EN ISO 9241-5: Ergonomic requirements for office work with visual display terminals
(VDTs) – Part 5: Workstation layout and postural requirements. Beuth Verlag, Berlin (1999)
7. Lenkeit, M.: Ergonomie. Richtiges Sitzen am Arbeitsplatz reduziert körperliche Belastung.
In: MM MaschinenMarkt, vol. 39, p. 36 (2008)
8. BMW Group: Der neue BMW 7er. Entwicklung und Technik. 1st edn. Vieweg + Teubner,
Wiesbaden (2009)
9. Grünen, R.E., Günzkofer, F., Bubb, H.: Anatomische und anthropometrische Eigenschaften
des Fahrers. In: Bubb, H. et al. (ed.) Automobilergonomie. Springer Vieweg, Wiesbaden
(2015)
10. Bubb, H., Grünen, R.E., Remlinger, W.: Anthropometrische Fahrzeuggestaltung. In: Bubb,
H., et al. (ed.) Automobilergonomie. Springer Vieweg, Wiesbaden (2015)
11. BMW AG Homepage. https://www.bmw.de/de/topics/service-zubehoer/original-bmw-
zubehoer/original-bmw-zubehoer-showroom/interieur/sitze/komfort-_aktivsitze0.html?
bmw=sea:59424387:1788890067:bmw%20komfortsitze. Accessed 7 April 2018
12. Vink, P., Lips, D.: Sensitivity of the human back and buttocks: the missing link in comfort
seat design. Appl. Ergon. 58, 287–292 (2017)
13. Hackl, B., et al.: New Work: Auf dem Weg zur neuen Arbeitswelt. Springer Gabler,
Wiesbaden (2017)
14. igus® GmbH Homepage. https://www.igus.de/drylin/linearfuehrung. Accessed 11 April
2018
15. Nicol, K.: http://nicol-biomechanik.de/doku.php?id=sitzenbei. Accessed 12 April 2018
16. Hartung, J.: Objektivierung des statischen Sitzkomforts auf Fahrzeugsitzen durch
Kontaktkräfte zwischen Mensch und Sitz, unv. Diss., Technische Universität München
(2005)
17. Feller, N., et al.: Prospective design of seating systems for digitalized working worlds. In:
Goonetilleke, R.S., Karwowski, W. (eds.) Advances in Physical Ergonomics and Human
Factors, pp. 98–105. Springer, Cham (2017)
18. Wohlhart, K.: Statik. In: Grundlagen und Beispiele. Vieweg, Wiesbaden (1998)
Intelligence in Embedded Systems:
Overview and Applications

Paul D. Rosero-Montalvo1,2(B), Vivian F. López Batista1, Edwin A. Rosero2,
Edgar D. Jaramillo2, Jorge A. Caraguay2, José Pijal-Rojas3,
and D. H. Peluffo-Ordóñez4,5
1 Departamento de Informática y Automática, Universidad de Salamanca, Salamanca, Spain
pdrosero@utn.edu.ec
2 Universidad Técnica del Norte, Ibarra, Ecuador
3 Instituto Tecnológico Superior 17 de Julio, Ibarra, Ecuador
4 Yachay Tech, Urcuquí, Ecuador
5 Corporación Universitaria Autónoma de Nariño, Pasto, Colombia

Abstract. The use of electronic systems and devices has become widespread,
reaching several fields and becoming indispensable for many daily activities.
Such systems and devices (here termed embedded systems) aim to improve
human beings’ quality of life. To do so, they typically acquire users’ data to
adjust themselves adequately to different needs and environments. Consequently,
they are connected to data networks to share this information and find elements
that allow them to make appropriate decisions. For practical usage, their
computational capabilities should be optimized to avoid issues such as resource
saturation (mainly memory and battery). In this line, machine learning offers a
wide range of techniques and tools to incorporate “intelligence” into embedded
systems, enabling them to make decisions by themselves. This paper reviews
different data storage techniques along with machine learning algorithms for
embedded systems. Its main focus is on techniques and applications (with special
interest in the Internet of Things) reported in the literature concerning data
analysis criteria for decision making.

Keywords: Decision making · Embedded systems · Internet of things · Machine learning

1 Introduction
Micro-controllers work as the central processing unit (CPU) of an electronic system.
Their main function is to acquire data from digital or analogue pins, process the
information, and generate actions through output peripherals. The availability of
open-source hardware boards has considerably increased development in electronic
applications, lowering costs and providing freely accessible platforms with a large
amount of supporting information [1,2].
c Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 874–883, 2019.
https://doi.org/10.1007/978-3-030-02686-8_65
For instance, the AVR micro-controllers incorporated in Arduino boards use an 8-bit
reduced instruction set computing (RISC) architecture with processing memory that
can be 16 bits wide; most instructions execute in a single clock cycle, and the
devices offer up to 32 working registers, which is one of their biggest benefits [3].
Such micro-controllers provide features such as hardware and software interrupts,
timers, communication ports, and sleep modes for energy saving. Originally, Arduino
boards were equipped with the ATmega168 micro-controller; more recent Arduino
distributions include the ATmega328, featuring reduced size and the improved
capabilities mentioned previously [4].
In recent years, new technologies such as Systems on a Chip (SoC) have driven the
scaling of embedded-system applications, allowing several microprocessors to be
integrated by stacking cores and improving processing capacity [3]. In addition,
SoCs use 3-D silicon integration, an advanced packaging technique that prevents
physical damage and corrosion [5]. These technologies improve machine learning
capabilities, allowing data to be acquired in real time, processed faster, and the
appropriate sampling cycles of the sensors to be determined [6]. In view of the
above, an embedded system (ES) is defined as an electronic system, usually forming
part of a larger device, that is specifically designed to perform certain functions
[7]. Its main characteristic is the use of one or several digital processors (CPUs),
which gives the system greater control and some “intelligence” in tasks such as
processing information generated by sensors, controlling certain actuators, and
communicating with other systems. The design of an embedded system usually involves
engineers and technicians specialized in both electronic hardware and software
design [6]. The core of such a module contains at least one CPU of 4, 8, 16 or
32 bits [8]. In general, an ES has limited resources: memory is scarce, and both
the computation capacity and the number of external devices are limited [9].
Regarding the software, there are specific requirements according to the application
and the size of the storage system.
Currently, the most widely used type of ES is the wireless sensor network (WSN),
thanks to its application flexibility. WSNs have been recognized as the most
promising emerging technology for the development of the Internet of Things, which
has increased their popularity in industrial and academic research [5]. New WSN
products are driving the next wave of exponential growth of such systems in
education, acting as sensor nodes in the development of a range of applications [7].
The rest of this paper is structured as follows: Sect. 2 presents the Internet
of Things and its impact on ES. Section 3 describes the lightweight protocols used
by embedded systems. Section 4 discusses WSNs in more detail. Section 5 covers
intelligence for ES in the form of machine learning algorithms. Section 6 outlines
the most important ES applications. Finally, Sect. 7 presents the conclusions and
remarks of this work.

2 Internet of Things
The Internet of Things (IoT) is a shared intelligent network connecting different
types of electronic systems by sending data to the Internet through communication
protocols. In this way, millions of connected devices, forming an integral and
reliable global infrastructure, are part of the information society; with intelligent
processing, they allow people to acquire the information needed to make decisions
[10]. Researchers such as [11] explain that the Internet of Things is about to
transform our cities into smart cities through the cooperation of different sectors
to achieve sustainable results based on data analysis. This includes, in addition to
sensors, a correct data extraction and processing methodology to improve the quality
of life of all human beings. To do so, it is necessary to achieve economies of scale
through investment in infrastructure that allows the development, management,
monitoring, performance analysis, and remote diagnosis needed to perform predictive
analysis of large amounts of data. Today, some 25 billion connected devices generate
an information flow of around 20 exabytes. It is expected that by 2020 there will be
50 billion devices connected to the Internet, that is, 6.58 devices per
person [12,13].
The standardized protocol suite for communicating over the Internet is Transmission
Control Protocol (TCP)/IP, which offers two forms of communication. On the one hand,
TCP creates a connection between a transmitter and a receiver on a specific port and
guarantees the delivery of data without errors and in order; if a failure occurs,
the protocol informs the transmitter so that the information is sent again. On the
other hand, the User Datagram Protocol (UDP) is a data transmission protocol that
does not need to establish a connection beforehand. It provides “best-effort”
communication, which means that its key goal is to put information onto the network
as soon as possible; the protocol does not acknowledge whether the data arrived
completely or not [14]. In their standard form these protocols are not suitable for
IoT, so researchers have devised ways to make them lighter [15].
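The connectionless character of UDP can be illustrated in a few lines of Python over the loopback interface (an illustrative sketch, not an IoT deployment; the payload and port choice are invented, and on loopback the datagram reliably arrives, which UDP does not guarantee in general):

```python
import socket

# Receiver: bind a UDP socket; the OS picks a free port.
recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv.bind(("127.0.0.1", 0))
port = recv.getsockname()[1]

# Sender: no handshake, the datagram is simply fired at the address.
send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send.sendto(b"temp=21.5", ("127.0.0.1", port))

data, addr = recv.recvfrom(1024)  # best effort: no delivery guarantee in general
print(data.decode())
recv.close()
send.close()
```

The absence of a connect/accept phase is precisely what makes UDP attractive for constrained devices, at the cost of reliability.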

3 Protocols

Embedded systems need transmission protocols with a light computational cost.
6LoWPAN is a standard that brings the notion of ES and wireless sensor networks into
IP networking, based on the transmission of IPv6 packets over IEEE 802.15.4 networks
[16]. The appearance of these networks makes it necessary to implement security
mechanisms [15]. The 6LoWPAN protocol stack includes the standard IEEE 802.15.4 MAC
and physical layers, while the IP layer adopts the IETF IPv6 protocol, thus allowing
interconnection between networks [16]. RPL, the network-layer routing protocol, is a
distance-vector routing protocol for low-power networks using IPv6. Network devices
running RPL connect without forming cycles. Proposed by the IETF for IPv6 routing,
RPL is designed for networks with high packet loss rates and low-power nodes [14].
The objective of RPL is to target networks that “comprise up to thousands of nodes”,
where most nodes have very limited resources and the network is directed by a
central node. Multipoint-to-point, point-to-multipoint, and point-to-point traffic
are all supported [17].
When an embedded system needs to connect to the IoT, the most commonly used
standards are Message Queue Telemetry Transport (MQTT) and the Constrained
Application Protocol (CoAP). MQTT is a machine-to-machine (M2M) communication
protocol. It is useful for connections to remote locations where a small code
footprint and/or scarce network bandwidth is required. CoAP is a specialized web
transfer protocol for use with constrained nodes and constrained networks in the
Internet of Things; it is designed for M2M applications such as smart energy and
building automation [14,15]. Both protocols are used very frequently; they differ
in their fields of application and in the latency requirements of the information.
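MQTT routes messages by hierarchical topic names, and subscriptions may use the `+` (single-level) and `#` (multi-level) wildcards defined in the MQTT specification. As a broker-independent sketch (the topic names are invented), a subscription filter matcher might look like:

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match an MQTT topic against a subscription filter.
    '+' matches exactly one level; '#' matches all remaining levels."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":             # multi-level wildcard: matches the rest
            return True
        if i >= len(t_parts):    # filter is longer than the topic
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)

print(topic_matches("sensors/+/temperature", "sensors/node7/temperature"))  # True
print(topic_matches("sensors/#", "sensors/node7/battery"))                  # True
print(topic_matches("sensors/+/temperature", "sensors/node7/humidity"))     # False
```

Topic-level matching like this is what lets a single subscription collect readings from many sensor nodes without enumerating them.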

4 Wireless Sensor Network


In the beginning, a WSN was defined as an ad-hoc network formed by a large
collection of very simple devices that combine sensing, computation, and
communication capabilities, among others [16]. By 2010, WSN operation followed a
standard scheme in which the nodes communicated with a base station where the data
were analyzed, without extending beyond this area [16].
With the arrival of the Internet of Things, problems arose because this scheme
lacked flexibility and compatibility with new applications, which required
modifications to the protocols; a widely adopted commercial solution is the 6LoWPAN
standard, a viable method to carry IPv6 over WSN [16]. Unlike other networks,
wireless sensor networks collect sensed data (temperature, pressure, movement, fire
detection, voltage/current, etc.) and forward it to the gateway through a one-way
communication protocol. Sensor nodes in a WSN can also be considered a collection of
low-cost, low-power, multi-functional wireless devices [17] that detect unprocessed
sensor events at scattered nodes. WSNs support supervision, monitoring, and
distributed control tasks, enabling quick decisions with good performance; effective
transmission rates must be high enough to avoid the excessive energy consumption
caused by failed transmissions. These networks may contain hundreds or thousands of
nodes that solve problems in a distributed way by communicating with each other. The
medium access control (MAC) protocol is simple, lightweight, and low-cost;
controlling access to the medium improves performance, guaranteeing successful
transmissions and improving battery life [18].

5 Machine Learning
The increasing use and applications of embedded systems have made it possible to
collect large amounts of data that provide the information people need to make
appropriate decisions in real time. In this sense, the veracity of the data plays an
important role in the implementation of decision support systems. Unfortunately, the
data can be influenced by uncontrolled variables that impair their validity and
usability: environmental factors, acquisition noise, voltage unbalance, and
transient or permanent sensor faults, to mention the most important considerations.
In addition, execution-time and memory restrictions, together with the use of
algorithms and parameters learned from data, introduce additional levels of
uncertainty that affect the accuracy of the decision-making algorithm. For these
reasons, machine learning (ML) provides methods to obtain knowledge from a data set
where people cannot, because of the quantity and complexity of the information. With
the appearance of Big Data, these techniques have become important for discerning
the few important data and discarding what is not useful to a mathematical model for
prediction or classification [19].
ML divides data-modeling techniques into two groups: supervised and unsupervised.
Supervised learning creates models from historical input data labelled with a known
output; these algorithms can be divided into prediction and classification [18].
Unsupervised techniques deal with data that have only inputs, or whose corresponding
output variables are unknown, in order to find previously unknown correlations; they
can be divided into data grouping (clustering) and association problems [20]. An ML
algorithm needs training data to fit the model and validation data to measure the
accuracy of the classification or prediction against the real data [21].
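As a small illustration of the supervised case (a sketch with invented sensor readings, not data from the reviewed systems), a 1-nearest-neighbour classifier is about the simplest model that fits the low-resource ES setting described here:

```python
import math

def nn_classify(train, query):
    """1-nearest-neighbour: label the query with the class of the
    closest training sample (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    label, _ = min(((lbl, dist(feat, query)) for feat, lbl in train),
                   key=lambda p: p[1])
    return label

# Hypothetical labelled readings: (temperature, humidity) -> room state
train = [((18.0, 70.0), "cold"), ((19.0, 65.0), "cold"),
         ((26.0, 40.0), "warm"), ((27.5, 35.0), "warm")]
pred = nn_classify(train, (25.0, 42.0))
print(pred)  # "warm"
```

The training set here plays the role of the labelled historical data; the query is a new, unseen example.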
Machine learning algorithms need a pre-processing step that preserves the useful
information representing all the characteristics of a high-dimensional data set
[20], that is, a matrix with so many variables that a human being cannot appreciate
their relationships [21]. Many ML techniques reduce the data to fewer variables than
the nominal version so that they become intelligible to people. This article
discusses the most useful criteria for data representation.

5.1 Data Cleaning


The amount of data used to train an algorithm can carry a high computational cost,
and much of it does not contribute to the statistical model. Appropriate selection
of the training set increases the effectiveness of the algorithm, reduces the
computational load, and fixes a suitable number of samples to be used [22]. Data
cleaning therefore tries to eliminate as much uninformative data as possible from
the training set.
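A minimal sketch of such cleaning, assuming a univariate sensor stream and a simple robust outlier rule based on the median absolute deviation (the readings and the threshold are invented; real deployments would use the selection methods of [22]):

```python
import statistics

def clean(readings, k=5.0):
    """Drop readings whose distance from the median exceeds k times
    the median absolute deviation (a robust outlier rule)."""
    med = statistics.median(readings)
    mad = statistics.median(abs(r - med) for r in readings)
    if mad == 0:
        return list(readings)  # no spread: nothing to reject
    return [r for r in readings if abs(r - med) <= k * mad]

data = [20.1, 20.3, 19.9, 20.2, 85.0, 20.0]  # 85.0 is a sensor glitch
print(clean(data))  # [20.1, 20.3, 19.9, 20.2, 20.0]
```

Median-based statistics are used instead of the mean precisely because a single glitch can drag the mean and standard deviation far enough to hide the outlier.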

5.2 Pattern Recognition


In a WSN, different variables describing the state of an event can be acquired,
yielding a matrix of high dimensionality. It is therefore likely that the data
contain redundancy or excess samples, due to the system’s reading speed. A
multivariate filter assumes that all variables have an equivalent importance or
weight, so that the data set can be represented with less weight. In model-based
methods, there are one or several dependent or independent variables [23]; the
quality of the fitted model determines which of them are most significant for the
algorithm. Dimensionality reduction shrinks the matrix used by supervised or
unsupervised models by eliminating attributes that may be irrelevant or redundant
with respect to the objective, improving the quality of the model by focusing on
appropriate correlations and expressing the algorithm with fewer variables that
humans can visualize more easily [23]. Association is a concept, and correlation is
a measure of association [24]; these terms are very useful when algorithms must
choose the correct data set.
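The correlation idea can be sketched as a redundancy filter (an illustrative example with invented variable names and values; thresholds for dropping a variable would be application-specific):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

temp  = [20.0, 21.0, 22.5, 24.0, 25.0]
heat  = [20.2, 21.1, 22.4, 23.9, 25.1]  # near-duplicate of temp
humid = [70.0, 66.0, 61.0, 55.0, 52.0]  # inversely related

r_redundant = pearson(temp, heat)   # close to +1: candidate for removal
r_inverse = pearson(temp, humid)    # close to -1: strong inverse association
print(round(r_redundant, 3), round(r_inverse, 3))
```

A variable that correlates almost perfectly with another carries little extra information and is a natural candidate to drop before training.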

5.3 Dimensionality Reduction

The volume of data increases the difficulty of detecting patterns with machine
learning techniques; one way to deal with this problem is to represent the data in a
smaller dimension that preserves the structure of the original space [24].
Initially, dimensionality reduction techniques were based on linear methods, which
are simple but rigid and do not always represent a data set well, since most
variables and their relationships behave in complex ways.
Non-linear models allow the detection of such naturally occurring patterns. The
methods of dimensionality reduction (DR) are oriented towards preserving the data
topology represented in an affinity matrix [25]. DR methods can simplify the
description of the data set, representing large volumes of information at optimal
processing times while keeping the properties of the complex high-dimensional data.
As a result, DR favours compression, eliminates redundancy, and improves subsequent
processing with machine learning algorithms.
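As a linear baseline (a from-scratch sketch on invented 2-D points, not one of the affinity-based methods of [25]), the first principal component can be found by power iteration on the covariance matrix; projecting onto it reduces the data to one dimension:

```python
import math

def first_pc(points, iters=100):
    """Leading principal component of 2-D data via power iteration
    on the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Covariance matrix entries
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

pts = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
pc = first_pc(pts)
print(pc)  # roughly (0.7, 0.7): the data vary mainly along the diagonal
```

Real DR pipelines use library implementations and non-linear methods, but the principle is the same: keep the directions that carry the variance, discard the rest.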

5.4 Prototype Selection

The original capabilities of most data mining techniques have been exceeded by the
deluge of incoming data; several techniques try to alleviate the drawbacks of using
such an overwhelming amount [21]. Prototype selection (PS) techniques are data
pre-processing methods whose objective is to reduce the training set, generating
more representative examples and improving nearest-neighbour rules.
Many selection methods with different properties exist in the literature; they can
be classified into two approaches, known as prototype selection (choosing a subset
of the original training data) and prototype generation (creating new artificial
prototypes) [19]. The goal is therefore to isolate the smallest set of instances
that allows a data mining algorithm to predict the class of a query instance with
the same quality as the initial data set. By minimizing the size of the data set, it
is possible to reduce the complexity of the space and decrease the computational
cost of the data mining algorithms applied later, improving their generalization
capabilities by eliminating noise [22].
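A classic instance of the first approach is Hart's condensed nearest-neighbour rule; a compact sketch (toy 1-D data with invented labels, 1-NN by absolute distance) might be:

```python
def condense(train):
    """Hart's CNN: keep only the prototypes needed for 1-NN to classify
    the whole training set correctly. train: list of (value, label)."""
    kept = [train[0]]
    changed = True
    while changed:
        changed = False
        for x, y in train:
            nearest = min(kept, key=lambda p: abs(p[0] - x))
            if nearest[1] != y:       # misclassified: absorb into the kept set
                kept.append((x, y))
                changed = True
    return kept

# Two well-separated 1-D classes: most interior points are redundant
train = [(1.0, "a"), (1.2, "a"), (1.4, "a"), (5.0, "b"), (5.2, "b"), (5.4, "b")]
kept = condense(train)
print(len(kept), "prototypes kept out of", len(train))  # 2 of 6
```

The condensed set classifies the original training data exactly as the full set would, at a fraction of the memory cost, which is the point of PS on an embedded device.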

5.5 Classification Algorithms

Classification is one of the most studied topics in machine learning, because many
problems in areas such as security, medicine, or finance need to classify the data
they handle. The objective of a classifier is to build a model from a set of
already-classified examples that allows new, previously unseen examples to be
classified in the future [26]. These tasks are closely related to ES: the most
popular ES applications need classification algorithms.
The supervised classification problem is divided into two main phases. On the one
hand, the classification system uses a set of examples called the training set;
this information is already labelled so the system can learn from it [25]. From the
training set, it builds a series of rules or decision methods that correctly
classify the training examples [27]. On the other hand, the classification
algorithm needs a test set, whose data are used to determine system performance.
There are many ways to evaluate the system; the most widely used is the confusion
matrix [27].
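A confusion matrix simply counts (true label, predicted label) pairs over the test set; a sketch with invented posture labels:

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; the diagonal entries are correct."""
    return Counter(zip(true_labels, predicted_labels))

y_true = ["sit", "sit", "stand", "stand", "stand", "sit"]
y_pred = ["sit", "stand", "stand", "stand", "sit", "sit"]
cm = confusion_matrix(y_true, y_pred)
for (t, p), n in sorted(cm.items()):
    print(f"true={t} predicted={p}: {n}")
accuracy = sum(n for (t, p), n in cm.items() if t == p) / len(y_true)
print("accuracy:", round(accuracy, 2))  # 4 of 6 correct -> 0.67
```

Beyond accuracy, the off-diagonal counts show which classes the system confuses, which is what makes the matrix more informative than a single score.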

6 Applications
Embedded systems have many applications thanks to their easy installation and data
acquisition. In this way, technologies like these became the beginning of the IoT.

6.1 Farming

In agriculture, the implementation of sensors in crops is proposed to determine the
environmental and soil conditions, compare weather conditions, and determine the
amount of water, fungicides, and nutrients needed, among others. Many agricultural
areas can improve their efficiency by identifying weeds harmful to plants and
animals, with programmed robots responsible for their precise elimination. Livestock
can be monitored to know where the animals are and to raise a warning when one is
lost; 3-D accelerometers can also detect physical problems, and this information can
be shared among farmers to analyse disease patterns [16].

6.2 Smart Buildings

Buildings collect information through sensors of light, heat, movement, and so on.
By interpreting these data, ES can reduce the consumption of electrical energy, and
algorithms can be generated to learn the behaviour of people in physical
spaces [17].

6.3 Education

An iCampus aims to revolutionize the practice of teaching with a knowledge ecosystem
built on ES. Projects such as Living Labs are environments that unite people with
technology to promote innovation, development, and research, with a view to putting
these advances into school and college curricula and thereby providing tools for a
changing world that improve skills for the global economy. Smart-boxes (modifiable
ES used to teach basic electronics) enable learning with advanced technology: the
student works with different feedback environments using persuasive, interactive
programming through images that reflect the behaviour of an electronic system,
linking programming code with real life and driving different electronic
devices [7].

6.4 Transport

In transport, one goal is to achieve efficiency and safety, such as warning a car to
slow down when a traffic light changes to yellow, or signalling a free parking
space. Considering that 90% of accidents involve human error, a smart environment
can improve a driver’s decisions based on traffic data or vehicle density.
In airports, sensors are being installed to measure the flow of people so that extra
personnel can be deployed to help with long queues; this flow can be observed from
an application that redirects people along different routes to reach their
destinations [8].

6.5 Health
A global concern is population growth: it is estimated that by the year 2025 there
will be around 1200 million elderly people, and people over 80 will constitute 30%
of this population in developed countries and 12% in developing countries. In
addition, problems such as obesity and mental illness increase social spending. The
Internet of Things makes it possible to address prevention, early detection,
research, and health care, since vital signs can be monitored to collect large
amounts of data and determine whether certain life patterns may alter a person’s
health; with this information, doctors can provide remote assistance and act
quickly. This requires a very reliable, interoperable infrastructure for the
acquisition and analysis of data and, above all, for maintaining the confidentiality
of the user. These ES are considered wearables [17].

7 Conclusions

This work has reviewed the basic concepts of embedded electronic systems and their
future application trends. IoT and WSN, the next stages of embedded systems, stand
out for their portability and low resource requirements. These systems could not
work without the efficient learning algorithms that were briefly presented: machine
learning algorithms play a fundamental part in the analysis of embedded-system data,
through cleaning techniques, pattern recognition, and related methods, and they
allow electronic systems to become autonomous with low computational resources.
The future of electronic systems rests on the different applications that can help
improve people’s quality of life. The major challenges that require continued work
are: battery durability, secure connection to the cloud, and device management. In
addition, the protocols must become even lighter, owing to ever-increasing data
acquisition and transmission.
Finally, machine learning algorithms and their connection to the IoT will become
part of our normal lives; these technologies form part of the industrial revolution
4.0. As a near-term need, the WSN must include management devices that control the
amount of data sent to the cloud, in order to avoid the unnecessary expense of
uploading data that provides no information. The new term that describes this
approach is the “fog” of IoT (fog computing).

References
1. Parameswaran, S., Wolf, T.: Embedded systems security – an overview. Des. Autom.
Embed. Syst. 12, 173–183 (2008). https://doi.org/10.1007/s10617-008-9027-x
2. Noergaard, T.: Embedded Systems Architecture. Newnes. https://doi.
org/10.1016/B978-0-12-382196-6.00006
3. Kadionik, P.: Introduction to Embedded Systems. Communicating Embedded Sys-
tems (2013). https://doi.org/10.1002/9781118557624.ch1
4. Levy, M., Conte, T.M.: Embedded multicore processors and systems. IEEE Micro,
7–9 (2009). https://doi.org/10.1109/MM.2009.41
5. Toulson, R., Wilmshurst, T.: Embedded Systems, Microcontrollers, and ARM
(2017). https://doi.org/10.1016/B978-0-08-100880-5.00001-3
6. Gu, C.: Building Embedded Systems. O’Reilly & Associates (2016). https://doi.
org/10.1007/978-1-4842-1919-5
7. Edwards, S., Lavagno, L., Lee, E.A., Sangiovanni-Vincentelli, A.: Design of embed-
ded systems: formal models, validation, and synthesis. Proc. IEEE (1997). https://
doi.org/10.1109/5.558710
8. Chin, J., Callaghan, V.: Educational living labs: a novel internet-of-things based
approach to teaching and research. In: 2013 9th International Conference Intelligent
Environments (IE), pp. 92–99 (2013)
9. Alippi, C.: Intelligence for embedded systems. In: Intelligence for Embedded
Systems: A Methodological Approach (2014). https://doi.org/10.1007/978-3-319-
05278-6
10. Kortuuem, G., Keynes, M., Bandara, A.: Educating the internet of things genera-
tion. Computer, 53–61 (2013)
11. Zhao, G.X., Bei, Q.: Application of the IOT technology in the intelligent manage-
ment of university multimedia classrooms. Appl. Mech. Mater., 2050–2053 (2014)
12. Thangavel, D., Ma, X., Valera, A.: Performance evaluation of MQTT and CoAP via
a common middleware. In: 2014 IEEE Ninth International Conference on Intelligent
Sensors, Sensor Networks and Information Processing (ISSNIP), pp. 4–6 (2014)
13. Alwakeel, S., Alhalabi, B., Aggoune, H., Alwakeel, M.: A machine learning based
WSN system for autism activity recognition. In: 2015 IEEE 14th International Con-
ference on Machine Learning and Applications (ICMLA), pp. 771–776 (2015)
14. Knickerbocker, J., Patel, C., Andry, P., Cornelia, T.: Through-vias: 3-D silicon
integration and silicon packaging technology using silicon. IEEE J. Solid-State Cir-
cuits, 1718–1725 (2006)
15. Singh, K.: WSN LEACH based protocols: a structural analysis. In: International
Conference and Workshop on Computing and Communication (IEMCON). Van-
couver, BC, pp. 1–7 (2015). https://doi.org/10.1109/IEMCON.2015.7344478
16. Sudheendran, S., Bouachir, O., Moussa, S., Dahmane, A.O.: Review - challenges of
mobility aware MAC protocols in WSN. In: Advances in Science and Engineering
Technology International Conferences (ASET), Dubai, Sharjah, Abu Dhabi, United
Arab Emirates, pp. 1–6 (2018). https://doi.org/10.1109/ICASET.2018.8376831
17. Arya, S., Yadav, S.S., Patra, S.K.: WSN assisted modulation detection with max-
imum likelihood approach, suitable for non-identical Rayleigh channels. In: 2017
International Conference on Recent Innovations in Signal Processing and Embed-
ded Systems (RISE), Bhopal, India, pp. 49–54 (2017). https://doi.org/10.1109/
RISE.2017.8378123
18. Khan, A.R., Rakesh, N., Bansal, A., Chaudhary, D.K.: Comparative study of WSN
protocols (LEACH, PEGASIS and TEEN). In: 2015 Third International Confer-
ence on Image Information Processing (ICIIP), Waknaghat, pp. 422–427 (2015).
https://doi.org/10.1109/ICIIP.2015.7414810
19. Rosero-Montalvo, P., et al.: Prototype reduction algorithms comparison in nearest
neighbor classification for sensor data: empirical study. In: IEEE Second Ecuador
Technical Chapters Meeting (ETCM), Salinas, pp. 1–5 (2017). https://doi.org/10.
1109/ETCM.2017.8247530
20. Restuccia, F., D’Oro, S., Melodia, T.: Securing the internet of things in the age
of machine learning and software-defined networking. IEEE Internet Things J.
https://doi.org/10.1109/JIOT.2018.2846040
21. Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neigh-
bor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach.
Intell. 34(3), 417–435 (2012)
22. Cano, J.R., Herrera, F., Lozano, M.: Using evolutionary algorithms as instance
selection for data reduction in KDD: an experimental study. IEEE Trans. Evol.
Comput. 7(6), 561–575 (2003)
23. Simes, A., Costa, E.: CHC-based algorithms for the dynamic traveling salesman
problem. In: Applications of Evolutionary Computation: EvoApplications (2011)
24. Peña-Unigarro, D.F., et al.: Interactive data visualization using dimensionality
reduction and dissimilarity-based representations. In: Intelligent Data Engineering
and Automated Learning–IDEAL 2017, pp. 461–469. https://doi.org/10.1007/978-
3-319-68935-7 50
25. Rosero-Montalvo, P.D., et al.: Data visualization using interactive dimensionality
reduction and improved color-based interaction model. In: Biomedical Applications
Based on Natural and Artificial Computing - IWINAC 2017, pp. 289–298. https://
doi.org/10.1007/978-3-319-59773-7 30
26. Nuñez-Godoy, S., et al.: Human-sitting-pose detection using data classification and
dimensionality reduction. In: IEEE Ecuador Technical Chapters Meeting (ETCM),
Guayaquil, pp. 1–5 (2016). https://doi.org/10.1109/ETCM.2016.7750822
27. Rosero-Montalvo, P.D., et al.: Elderly fall detection using data classification on a
portable embedded system. In: IEEE Second Ecuador Technical Chapters Meeting
(ETCM), Salinas, pp. 1–4 (2017). https://doi.org/10.1109/ETCM.2017.8247529
Biometric System Based on Kinect Skeletal,
Facial and Vocal Features

Yaron Lavi1, Dror Birnbaum1, Or Shabaty1, and Gaddi Blumrosen1,2(B)
1 Tel Aviv University, Tel Aviv 69978, Israel
gaddi.blumrosen@ibm.com
2 IBM Research, Yorktown Heights, NY 10598, USA

Abstract. Identification of human subjects in different environments plays a
significant role in many fields, such as security and health care. The identification
can be performed using different sensory metrics, often named “biometrics”.
Traditional biometric technologies are based mainly on fingerprint, retina, voice,
and face. In this study, the spontaneous use of skeletal, facial, and vocal metrics
is investigated. For this, a Microsoft Kinect (“Kinect”) system, which was mainly
built to estimate human subjects’ kinematic features, is deployed. Kinect is
affordable, non-wearable, and has the potential to assess joint locations, voice,
and facial properties simultaneously. A set of skeletal, facial, and vocal features
is extracted to create a “Kinect Signature” that is used to identify different
subjects in the scene. The methods were verified by a set of four experiments
simulating common realistic scenarios. The experiments indicate that the skeletal,
facial, and vocal metrics derived from the Kinect can differentiate between
different subjects. The results of this work indicate that while skeletal metrics
are usually more accessible than facial and vocal metrics, facial and vocal metrics
are more accurate. Aggregation of all data streams improves biometric system
performance and its continuity across different environments and times. Such a
system can be the basis for an affordable, accurate real-time biometric system that
can be deployed at home and in public facilities like hospitals.

Keywords: Biometric · Kinect · Posture · Voice recognition · Face recognition

1 Introduction

Reliable person recognition in different environments plays a significant role in services where there is a need to confirm or determine the identity of an individual, like security [1] or medicine [2]. A person that is authorized by the system is referred to as being on the white list, and ones that are not are on the black list [3]. The data collected in the process of person identification is called biometric data. Features based on these

Y. Lavi, D. Birnbaum and O. Shabaty—Equal contribution.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 884–903, 2019.
https://doi.org/10.1007/978-3-030-02686-8_66
metrics are used to recognize the desired person by using enhanced classification methods [4].
The identification process can utilize different biometric technologies. Traditional biometric technologies are based on the eye's retina [5], fingerprint, voice, face [6], and recently body posture and gait [7]. The sensor modalities can be optical, like video cameras, electromagnetic measurements like radar, or vocal ultrasonic signatures [8]. The sensors can be divided into two main categories: direct and indirect biometric systems [4]. In direct biometric systems, the user performs an identification procedure, like drawing a fingerprint on a screen or wearing a coded marker. In an indirect system, the biometric system tries to identify the person without any intentional action. This has the advantage of person recognition from a distance, and removes the need for a sometimes tedious procedure.
Microsoft Kinect™ (Kinect) is an active system originally developed for the gaming industry, and has recently gained popularity as a tool to assess human activity and for patient monitoring [9]. The Kinect combines an optical video camera that produces a real-time stream of color images, an infrared radar technology that produces a depth image stream, and a voice recorder [10]. The Kinect software processes the images from the infra-red radiation reflections and aggregates them with the related color video streams. It can reconstruct skeleton joint parts [11] and key facial points [12].
The Kinect's capability to assess human kinematic data was recently validated against an optical marker-based 3D motion analysis [11]. The Kinect succeeded in measuring spatial characteristics, ranging from excellent for gross movements to very poor for fine movements such as hand clasping [9]. Recently, skeletal data observed from a single camera and training session was shown to have comparable results to Kinect [13]. An algorithm to monitor activities of daily living based on skeletal data has the capacity to detect abnormal events in the home [14]. Despite its relatively high accuracy rate and its ability to provide full-body kinematic information, the Kinect (versions 1 and 2) still possesses the following deficiencies [21]: (1) limited coverage; (2) distortion of facial and skeleton estimations; and (3) when multiple people cross through the Kinect range, or when one person is closer than another or hides the other, the current Kinect application begins an automatic re-detection process, with different index assignments, which can lead to inaccurate interpretation of the data [15].
A Kinect-based biometric system can be based on each of the sensor modalities independently, or on a combination of all of them [4]. Face recognition is one of the most extensively researched problems in biometrics, and many techniques have been proposed in the literature. A biometric system based on subjects' faces was suggested in [12]. The Kinect depth images can improve the facial recognition quality based only on 2-D color images [16]. A system that combines human metrology, face recognition, and speaker identification to increase identification performance and range, based on Kinect, was suggested in [17]. A methodology for recognizing subjects based on their gait patterns was suggested in [18]. The Kinect Signature (KS), based on features like the subject's size and proportions between different BPs, was used recently to differentiate between subjects [19, 20]. It was inspired by sonar [8] and radar [21] signatures, which are patterns unique to each person. These KS attributes can be derived in a separate calibration phase, or using a priori knowledge about the Subject of Interest (SoI).
This work suggests using a biometric system based on Kinect facial, skeletal, and vocal metrics simultaneously, under varying environmental conditions. Tools to identify users by each sensing modality separately are derived, together with methods to aggregate the information from all of them. The advantages and disadvantages of using each sensing modality are further discussed. The feasibility of the new technology is demonstrated in an experimental setup with ten adult subjects (6 males, 4 females), in extreme environmental conditions, which include multiple subjects (with occlusion of subjects' body parts), change of clothing, and change in illumination.
This paper has a three-fold contribution: (1) a new set of low-dimension features based on the Kinect skeleton and facial point estimations, and the audio recording; (2) methods to identify each biometric separately and together; (3) evaluation of the tolerance of the three biometric systems to challenging environmental conditions that include change of clothing, change of light, and an environment with multiple subjects and objects.
This paper is organized as follows. Section 2 describes the methods used in this study. Section 3 describes the experimental setup for evaluation of the technology; Sect. 4, the results; and Sect. 5 summarizes the results and suggests directions for future research.

2 Methods

The three Kinect sensory streams of skeleton, face points, and audio recording are used as biometric data. First, features are extracted from each of the biometric data streams, and form the KS (Kinect Signature). Existing prior knowledge related to feasible values of these features and their statistical distribution is used to tune filter parameters, and to detect artifactual time instances, where the Kinect-based features are distorted. The significant and reliable features for subject identification are chosen and used to train a classifier. The trained system can be used for continuous subject identification, determining whether the person is in the white list, the black list, or unidentified. The data analysis stages are summarized in Fig. 1.

2.1 Kinect Biometric Data


The Kinect utilizes independent color (RGB) and depth image streams. The color and depth images can be aggregated to provide recursive estimations for the 3-D joint coordinates [22]:

$$\hat{J}^m = L_j\!\left(\hat{J}^{m-1}, C^m, D^m\right), \qquad \hat{J}^m = K j^m + \omega_j^m, \tag{1}$$

where j^m is the 3-D joint location vector of length 25 (Kinect v2), Ĵ^m and Ĵ^{m−1} are the joints' location estimations at time instances m and m − 1, respectively, C^m and D^m are the color and depth images at time instance m, L_j is a function that maximizes the joint matching probability based on a very large database of people [23], and K and ω_j^m are the skeleton joints' distortion and noise factors, respectively [24].

Fig. 1. Kinect based biometric system data flow.
The image processing is performed independently on each frame using the Kinect training data set [24]. Detected people are included in the current active set, which is restricted to a maximum of 6 people [23]. The number of joints in the skeleton varies between 20 (Kinect v1) and 25 (Kinect v2) [23]. Still, when one subject hides behind another, or moves in and out of the Kinect range, the skeleton might become invalid, and a new registration process is initiated for this subject.
Kinect supports several facial feature representations: Shape Units (SU), Animation Units (AU, equivalent to Action Units), full facial edge points (facial key points from a facial mesh) of size 1300, and partial key points, named Facial Key Points (FKS), of size 5. While AUs are used more for facial expressions, and SUs and full HD facial points require higher data storage, a longer registration process, and line-of-sight conditions, the FKS of size 5 are estimated together with the skeleton after successful skeleton assessment [23]. The FKS are tracked using an algorithm like Viola-Jones, and result in five facial points (of the eyes, the nose, and the two edges of the mouth), projected on the 2-D color image [25].
The 3-D coordinates of the facial points, p^m, can be used to estimate the 3-D facial point vector as follows:

$$\hat{p}^m = L_f\!\left(\hat{p}^{m-1}, C_f^m, D_f^m\right), \qquad \hat{p}^m = p^m + \omega_f^m, \tag{2}$$

where p̂^m and p̂^{m−1} are the 3-D facial points' location estimations of length five at time instances m and m − 1, C_f^m and D_f^m are the color and depth images of the facial area as estimated by the Kinect at time instance m, L_f is a function that maximizes the joint matching probability, and ω_f^m is the facial points' estimation error.
The Kinect has a high-quality audio recorder, which can be used for speaker recognition [17]. The recorded audio signal at time instance m is defined by:

$$\hat{v}^m = v^m + \omega_v^m, \tag{3}$$

where ω_v^m is the audio noise due to amplifier noise and analog-to-digital conversion.

2.2 Skeleton Based Biometric System


Skeleton-based features can be based on the subject's kinematics, asymmetry measures [26], or static features, which have the advantage of being invariant to time [20]. This work focuses, without loss of generality, on two main families of skeleton-based static features: length, and ratio between different Body Parts (BPs). Color features of the body are limited to a specific scene. Still, colors of specific BPs like the face or the hands, which are not covered by clothing, are more likely to be usable. Static features have the advantage that their values can be assumed constant over time, and thus can be used to identify each subject [27].
A partial sum of BPs' lengths is associated with the subject's body dimensions, like BP spread or height, and is defined as:

$$L_s^m = \sum_{i, i' \in I} D\!\left(\hat{J}_i^m - \hat{J}_{i'}^m\right), \tag{4}$$

where the operator D is the Euclidean distance metric, I is the full set of joint indexes, and D(Ĵ_i^m − Ĵ_{i'}^m) is the length of the BP between joints i' and i, which is denoted as BP_{i,i'}.
Another complementary static feature to the BPs' length is the ratio between BPs. The ratio feature at time instance m can be defined as a subset of ratios between a set of BPs. For a subset of two BPs, the ratio between BP_{i,i'} and BP_{l,l'} is defined by:

$$R_s^m = \frac{D\!\left(\hat{J}_i^m - \hat{J}_{i'}^m\right)}{D\!\left(\hat{J}_l^m - \hat{J}_{l'}^m\right)} \tag{5}$$
The set of static KS features in (4) and in (5) form the KS skeleton features:

$$F_s^m = \{L_s^m, R_s^m\} \tag{6}$$
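To make the length and ratio features of Eqs. (4)–(6) concrete, the sketch below computes a few static skeleton features from one frame of 3-D joint coordinates. The joint indices and the particular length/ratio pair are illustrative choices, not the paper's exact feature set.

```python
import numpy as np

# Hypothetical joint indices for illustration (a real Kinect v2 skeleton has 25 named joints).
SHOULDER_L, SHOULDER_R, HIP_L, HIP_R = 0, 1, 2, 3

def bp_length(joints, i, j):
    """Euclidean length D of the body part between joints i and j (cf. Eq. 4)."""
    return float(np.linalg.norm(joints[i] - joints[j]))

def static_features(joints):
    """Sketch of the skeleton KS in Eq. (6): a few BP lengths and one ratio."""
    shoulder_w = bp_length(joints, SHOULDER_L, SHOULDER_R)
    hip_w = bp_length(joints, HIP_L, HIP_R)
    return {
        "L_shoulder": shoulder_w,               # length feature (Eq. 4)
        "L_hip": hip_w,
        "R_shoulder_hip": shoulder_w / hip_w,   # ratio feature (Eq. 5)
    }

# One toy frame of 3-D joint coordinates (meters).
joints = np.array([[-0.2, 1.4, 2.0],
                   [ 0.2, 1.4, 2.0],
                   [-0.1, 1.0, 2.0],
                   [ 0.1, 1.0, 2.0]])
feats = static_features(joints)
```

Because these features are static, the same dictionary computed over many frames should cluster around a per-subject mean, which is what makes them usable as a signature.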

Kinect-based skeleton estimations suffer from distortion when the subject moves out of the Kinect effective range and the Kinect uses inaccurate interpolation, or when there is an erroneous skeleton merge with a nearby subject or object. The temporal distortion in the skeleton estimations can lead to temporal artifacts in the features. Since the KS features are static, they are distributed around their mean value over different body postures and positions, and the KS features' mean can be used as their reference value.
Prior knowledge regarding the subjects' bodies can be applied. The BP length and ratio features have typical minimal, maximal, and mean values that can be assessed by training or by applying known human physiological constraints. These values can be used to detect and remove artifacts. Information regarding invalid postures [28], subject range (whether it resides in the optimal coverage), and the times at which the joints are interpolated by the Kinect [29] can also be used for artifact removal and correction.
Artifactual time instances, when the skeleton estimations are distorted and, as a result, some of the features deviate from their expected values, can be detected and removed. A simple measure of the skeleton feature quality can be estimated from the deviation of the skeleton features using a binary quality measure. The binary quality measure of the n'th KS at time instance m is defined as:

$$Q_s^m = \begin{cases} 1 & \left\| F_s^m, \hat{F}_s \right\| < \varepsilon_{A_s} \\ 0 & \text{else} \end{cases}, \tag{7}$$

where F̂_s is the static feature vector estimated without distortion, which can be estimated under optimal conditions, ‖F_s^m, F̂_s‖ is a distance metric between F_s^m and F̂_s, and ε_{A_s} is the skeleton distortion threshold, which is tuned to maximize the artifact detection probability. The low-quality measures can be replaced by the KS's median values F̂_s, or interpolated by using physiological constraints on the skeleton as in [30], prior to feature selection.
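The quality gating of Eq. (7), followed by replacement of low-quality frames with the reference value, can be sketched as follows. The Euclidean distance metric, the threshold value, and the repair-by-reference strategy are illustrative assumptions.

```python
import numpy as np

def quality_mask(F, F_ref, eps_As=0.4):
    """Binary quality measure of Eq. (7): 1 where the feature vector at time m
    stays within eps_As of its undistorted reference, 0 otherwise.
    Euclidean distance and the 0.4 threshold are illustrative choices."""
    d = np.linalg.norm(F - F_ref, axis=1)
    return (d < eps_As).astype(int)

def repair(F, Q, F_ref):
    """Replace low-quality frames by the reference (e.g. median) value, as
    suggested in the text, prior to feature selection."""
    F = F.copy()
    F[Q == 0] = F_ref
    return F

F_ref = np.array([1.0, 2.0])          # undistorted reference signature
F = np.array([[1.0, 2.1],             # good frame
              [5.0, 9.0],             # distorted frame (e.g. skeleton merge)
              [0.9, 2.0]])            # good frame
Q = quality_mask(F, F_ref)
F_clean = repair(F, Q, F_ref)
```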
A feature selection algorithm chooses the set of features that are most robust to skeleton joint estimation errors, and conveys the most relevant information for subject identification. An efficient supervised method is Fisher's linear discriminant classifier, which has computational and statistical scalability [31]. Tree-based classification algorithms like Random Forest can be used to select the features in real-time [32].
To decide if the subject is in the white or black list, the KS skeleton features after feature selection, F_s^m, are matched to patterns stored in the database [4] using a pattern matching algorithm. The results of the matching can also include the confidence of the decision.
An identification criterion to check if the subject belongs to a dataset of length N (whitelist or blacklist) at time instance m is:

$$\hat{n}_s = \arg\max_n f_s\!\left(F_{s,n}, F_s^m\right) \quad \text{s.t.} \quad Q_{s,n}^m > \varepsilon_D, \tag{8}$$

where f_s is the pattern matching function, n is the subject index in the database, n = 1…N, F_{s,n} is the stored n'th subject skeleton KS in the database, Q_{s,n}^m is the confidence of the pattern matching, and ε_D is the detection threshold, which is usually set to minimize the false detection probability. In case the maximal similarity confidence is below the detection threshold, n̂_s would be null, indicating the subject is not likely to be part of the list.
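A minimal sketch of the identification rule in Eq. (8): the Gaussian similarity used here as the matching function f_s and the threshold value are illustrative stand-ins, since the paper does not fix a particular matching function.

```python
import numpy as np

def identify(F_m, database, eps_D=0.8):
    """Sketch of Eq. (8): match the current feature vector F_m against stored
    subject signatures F_{s,n}; return the best subject index, or None when the
    maximal similarity confidence falls below the detection threshold eps_D."""
    sims = [np.exp(-np.linalg.norm(F_m - F_n) ** 2) for F_n in database]
    n_hat = int(np.argmax(sims))
    return n_hat if sims[n_hat] > eps_D else None

# Toy whitelist of two stored skeleton signatures.
db = [np.array([1.0, 2.0]), np.array([3.0, 1.0])]
n_hat_close = identify(np.array([1.05, 2.0]), db)   # near subject 0
n_hat_far = identify(np.array([10.0, 10.0]), db)    # near nobody
```

Returning `None` mirrors the null decision in the text: the subject is declared outside the list rather than forced onto the nearest stored pattern.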

2.3 Facial Based Biometric Estimation


Similar to the skeleton-based features, facial features can be derived from the Kinect FKSs, and can be separated into static and dynamic features [9]. Dynamic features are related to the dynamics of facial expressions, have high variations, and are less suitable for subject identification. Static features can be based on: (1) face dimension (FD) features, like distances between facial points [37]; (2) face proportion (FP) features, like ratios between distances of the facial points; and (3) skin color consistency.
The facial KS, based on the FPs in (2), can then be defined similarly to the skeleton joint based KS features in (6), as:

$$F_f^m = \{L_f^m, R_f^m, C_f^m\}, \tag{9}$$

where L_f^m = Σ_{i,i'∈I} D(f̂_i^m − f̂_{i'}^m) and R_f^m = D(f̂_i^m − f̂_{i'}^m) / D(f̂_l^m − f̂_{l'}^m) are the length and ratio features, and C_f^m = C(f̂_i^m) is the pixel value at the FP f̂_i^m at time instance m.
Prior knowledge regarding the facial dimensions and proportions is used to exclude inaccurate estimations. Burst noise can be filtered out by a median filter, and a binary quality measure of the n'th KS at time instance m can be defined as:

$$Q_f^m = \begin{cases} 1 & \left\| F_f^m, \hat{F}_f \right\| < \varepsilon_{A_f} \\ 0 & \text{else} \end{cases}, \tag{10}$$

where F̂_f is the static feature vector estimated without distortion, and ε_{A_f} is the facial distortion threshold, which is tuned to maximize the artifact detection probability.
The same feature selection algorithm can be applied to the face. An identification criterion to check if the subject belongs to a dataset of length N (whitelist or blacklist) at time instance m is:

$$\hat{n}_f = \arg\max_n f_f\!\left(F_{f,n}, F_f^m\right) \quad \text{s.t.} \quad Q_{f,n}^m > \varepsilon_{D_f}, \tag{11}$$

where f_f is the facial pattern matching function, F_{f,n} is the stored n'th subject facial KS in the database, Q_{f,n}^m is the confidence of the facial pattern matching, and ε_{D_f} is the detection threshold, as in (8).

2.4 Voice Based Biometric System

A high-pass filter, h, is applied to the audio samples v̂^m in (3), in a process called pre-emphasis:

$$\hat{v}_h^m = \hat{v}^m * h \tag{12}$$

The filtered signal is used to produce spectral features, named cepstral features, by applying Mel-filter banks of size 20, in the bandwidth range of 125–3800 Hz, in a sliding window of length 10 ms. The Mel frequency cepstral coefficients are:

$$C_p^m = F\!\left(\hat{v}_h^m\right) \tag{13}$$

Delta and double delta coefficients were then calculated using a five-frame window, resulting in a 60-dimensional feature vector [33]:

$$F_v^m = \{C_p^m, C_p'^m, C_p''^m\}, \tag{14}$$

where C_p'^m and C_p''^m are the delta and double delta coefficients.
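The pre-emphasis of Eq. (12) and the delta/double-delta construction of Eq. (14) can be sketched as below. The pre-emphasis coefficient 0.97 is a common default not stated in the paper; the five-frame regression formula for the deltas (N = 2) is the standard one.

```python
import numpy as np

def pre_emphasis(v, alpha=0.97):
    """Eq. (12) as a first-order high-pass FIR: v_h[m] = v[m] - alpha * v[m-1].
    alpha = 0.97 is a common choice, not a value from the paper."""
    return np.append(v[0], v[1:] - alpha * v[:-1])

def deltas(C, N=2):
    """Delta coefficients over a five-frame window (N=2), standard regression
    formula; applied twice it yields the double deltas of Eq. (14).
    C has shape (n_frames, n_coeffs)."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(C, ((N, N), (0, 0)), mode="edge")
    return np.array([
        sum(n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)) / denom
        for t in range(len(C))
    ])

vh = pre_emphasis(np.array([1.0, 1.0, 1.0]))   # constant signal is suppressed

C = np.arange(12, dtype=float).reshape(6, 2)   # toy cepstral frames
d = deltas(C)
dd = deltas(d)
feats = np.hstack([C, d, dd])                  # 3 * n_coeffs per frame, cf. Eq. (14)
```

With 20 cepstral coefficients per frame this stacking gives the 60-dimensional vector mentioned in the text.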
The feature dimension can be reduced by using an "i-vector feature extractor" [34], scaled down using an LDA matrix, and normalized. These features are fed to a classifier according to:

$$\hat{n}_v = \arg\max_n f_v\!\left(F_{v,n}, F_v^m\right) \quad \text{s.t.} \quad Q_{v,n}^m > \varepsilon_{D_v}, \tag{15}$$
where f_v is the vocal pattern matching function, F_{v,n} is the stored n'th subject vocal KS pattern in the database, Q_{v,n}^m is the confidence of the vocal pattern matching, and ε_{D_v} is the detection threshold for the vocal recognition.

2.5 Aggregation of Skeleton, Facial, and Vocal Estimations

Recognition of the subject's identity can be cast as a classification problem using the features of skeleton, face, and voice:

$$\hat{n} = C_1\!\left(F_s^m, F_f^m, F_v^m\right), \tag{16}$$

where C_1 is the classification function, and n̂ is the estimated subject index if in the white/black list, and null if not.
The classifier can exploit correlations between the data streams. For example, facial recognition can enhance vocal recognition [35]. A solution to (16) is to deploy in a first layer a classifier for each sensor data stream, and then to feed its estimates as features into a second-layer classifier, constrained by the classification confidence from the first layer:

$$\hat{n} = F\!\left(\hat{n}_s, \hat{n}_f, \hat{n}_v\right) \quad \text{s.t.} \quad Q_s^m, Q_f^m, Q_v^m \tag{17}$$

The multiple-layer implementation enables using state-of-the-art methods for each stream, and eases the control over the diversity of the multiple sources. In this work, we focus, without loss of generality, on the two-layer classification solution. A sub-optimal solution to the problem in (16) is a linear combination of the soft estimations for each subject, and then finding the index with maximal probability, or above an uncertainty threshold:

$$\hat{n} = \arg\max_n \left(a_{s,n} + a_{f,n} + a_{v,n}\right), \tag{18}$$

where a_{s,n}, a_{f,n}, a_{v,n} are measures of the reliability of each subject n in the list (white or black).
The reliability measure can be estimated by the quality measures of the estimations, Q_s^m, Q_f^m, and Q_v^m. Additional prior knowledge can be incorporated into this measure. For example, when the subject is in non-line-of-sight conditions, its reliability is reduced [36]. Similarly, when the face is hidden, the subject turns around, or the subject is relatively far away from the microphone, the facial and audio estimation qualities are reduced [20]. In the binary case, the solution coincides with the well-known majority voting [37].
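The linear soft-score fusion of Eq. (18), weighted by the stream quality measures, can be sketched as follows; the per-subject score values, the scalar quality weights, and the uncertainty threshold are illustrative.

```python
import numpy as np

def fuse(scores_s, scores_f, scores_v, q_s=1.0, q_f=1.0, q_v=1.0, eps=0.5):
    """Sketch of Eq. (18): combine per-subject soft scores from the skeletal,
    facial, and vocal classifiers, weighted by the stream quality measures Q
    (scalars here; equal weights reduce to majority voting in the binary case).
    Returns the winning subject index, or None below the uncertainty threshold."""
    total = (q_s * np.asarray(scores_s)
             + q_f * np.asarray(scores_f)
             + q_v * np.asarray(scores_v))
    n_hat = int(np.argmax(total))
    return n_hat if total[n_hat] > eps else None

# Subject 2 is ambiguous in the skeletal stream alone, but wins after fusion:
s = [0.4, 0.35, 0.45]   # skeletal soft scores per subject
f = [0.1, 0.20, 0.70]   # facial
v = [0.2, 0.10, 0.70]   # vocal
n_hat = fuse(s, f, v)
```

Lowering a stream's `q` weight when its quality measure drops (e.g. the face is hidden) is exactly the prior-knowledge adjustment described in the text.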
When multiple Kinect sensors are available in the environment, point cloud fusion like the one in [38] can be used to obtain a more accurate single-Kinect reference, on which the suggested methods can be applied.

3 Experimental Setup

To validate the suggested methods, and to examine the effect of changes in environmental conditions on system performance, we performed four experiment sets on ten adult subjects (6 males, 4 females), in challenging environmental conditions that include multiple subjects, shadowing with partial occlusions, change of clothing, and change in lighting conditions.

3.1 Experiment Setup


The motion sensor was a Microsoft Kinect (Kinect v2), which consists of a depth sensor based on an infrared (IR) projector and camera, a color camera, and a voice recorder [38]. The Kinect SDK was used. A dedicated software and Graphical User Interface (GUI) were built in Matlab (MathWorks Inc., version 2016a). For real-time implementation, the biometric system can operate on blocks, which induce a delay equal to the number of frames in the block [36], or on a frame-by-frame basis.

Fig. 2. The system setup.


The feasibility of the new technology was demonstrated with four experiment sets in a single room (at Tel Aviv University, Israel) of size 5 × 10 m, with ten adult subjects (six adult males and four adult females; the first, third, and fourth subjects were the authors GB, DB, and YL). The subjects were inside the optimal range region, within a horizontal angular range of 70°, and from 1 to 5 m from the Kinect camera. The setup, with two subjects and one chair, is shown in Fig. 2.

3.2 Experiment Sets


The experiment sets were designed to show the feasibility of the biometric system and to investigate its performance under real-life conditions. The experiments were: (1) A subject standing still in front of the sensor, spreading his/her hands to the sides, walking in random directions in the Kinect range, turning around, sitting on a chair 3 times, and walking randomly in the Kinect range while holding a cellular phone; (2) A subject standing, tilting his head slowly in all directions, and counting from one to ten; (3) A subject standing in front of the Kinect and making 5 facial expressions: happy, sad, angry, surprised, and scared. In addition, the subject was asked to wear and then remove glasses; and (4) A subject changing his/her clothes, walking around with two other subjects, with changing light conditions (from light to dark), and with shadowing from people and from a daily-life object (a chair).
The first experiment set was designed to derive diverse Kinect skeleton and facial features for training. The second was to extract vocal features. The third was to test the facial features' tolerance to various facial expressions. The fourth was designed to evaluate the biometric system's tolerance to extreme changes in environmental conditions. Figure 3 shows snapshots from the Kinect system from the experiment sets.

3.3 Software Modules


The dedicated software and GUI were used to retrieve the skeleton, facial, and vocal data streams in real-time on top of the bridge software tool [39]; for monitoring; for playback for human expert diagnosis; for tagging the different subjects with their names; and for implementing the classification algorithm for subject recognition. Figure 4(a) describes the GUI for recording, training the classifiers, playback, and monitoring. Figure 4(b) demonstrates the process of tagging the different subjects in the white list.
Fig. 3. Different sets description. (a)–(d) show snapshots from the Kinect camera from the training sets (first, second) and the fourth extreme-condition sets.

4 Results and Discussion


4.1 Pre-processing
The data was captured and labeled for all experiment sets using the GUI described in Sect. 3.3. The features were extracted as described in Sect. 2. After extracting the features, prior knowledge of the human face and body was applied to the features. For this, non-feasible values of the facial and skeletal features were truncated, similar to [38]. The minimal feasible proportion ratio between any two body parts was set to 10%, and the minimal and maximal body part lengths were set to 5 and 100 cm for the body, and 2 and 25 cm for the face.
A median filtering was performed for each feature, with values of ε_{A_s} = 0.4, ε_{A_f} = 0.4, and ε_{A_v} = 0.1, in a similar manner to [36]. A baseline for the confidence values of the features was derived based on deviation from the mean value. Significant features were derived using the F-test criterion, and non-significant features were excluded. Then a PCA was performed on the features.
Fig. 4. Different sets description: (a) describes the main GUI for recording, playback, training, and recognition (in real-time); (b) describes the window for labeling the different subjects (considered as being in the white list).

4.2 Training
The training was performed on the first experiment set for the facial and skeletal streams, and on the second set for the voice. The representation of the training sets after pre-processing and artifact removal for the skeletal, facial, and vocal features with the first two PCs is shown in Fig. 5. The explained variance using the first 2 PCs for the skeletal, facial, and vocal streams was 74.1%, 99.8%, and 9.1635%, respectively. To reach an explained variance of over 95%, 8, 2, and 354 features were needed, respectively. This indicates a more compact representation for the skeletal and facial features compared to the vocal features. The higher number of vocal features can be explained by the dynamic nature of the voice and its variability over a wide range of spectral portions, compared with the static features of the body and face that were used in this work.
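The explained-variance criterion used above (how many principal components are needed to reach 95%) can be sketched with an SVD-based PCA. The helper name and the toy data are illustrative, not the paper's pipeline.

```python
import numpy as np

def n_components_for(X, target=0.95):
    """Number of principal components needed to explain `target` of the
    variance of X, via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    ratios = (s ** 2) / np.sum(s ** 2)          # per-component variance ratios
    return int(np.searchsorted(np.cumsum(ratios), target) + 1)

rng = np.random.default_rng(0)
# Toy data: one dominant direction plus small isotropic noise, so a single
# component should already exceed the 95% target.
X = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 5)) + 0.01 * rng.normal(size=(200, 5))
k = n_components_for(X)
```

Applying this per stream is what yields counts like 8, 2, and 354 for the skeletal, facial, and vocal features respectively.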
Fig. 5. Representation of the skeletal (a), facial (b), and vocal (c) features in the first two PCs.

4.3 Identification Under Environmental Changes


The effect of the environmental conditions relative to the training set is shown in the PC domain for each data stream in Fig. 6. For the voice, the testing was performed on the third set, and for the skeletal and facial streams, on the fourth set of extreme changes in environmental conditions. Judging by the spread of the medians of the testing set, the effect of environmental changes is most significant in the body (skeleton) identification. The change in environmental conditions increases the spread of the experiment conditions' distributions, but still, even for the skeleton, the different subjects seem to be located in relatively separable clusters.
For the identification, a Discriminant Function Analysis (DFA) classifier was used to recognize the subject of interest, according to (8), (11), and (15). The Receiver Operating Characteristic (ROC) curves, based on the soft DFA classifier outputs, are presented in Fig. 7. The curves are calculated by averaging the ROC curves over all subjects for each environmental condition. The ROC curves for the skeletal, facial, and vocal data streams are shown in Fig. 7(a), (b), and (c), respectively. The skeleton-based estimations are the most affected by the environmental conditions, compared to the facial and vocal ones, which agrees with the feature separation shown in the scatter plots in Figs. 5 and 6. The true positive rate is high, with lower variability, for the vocal and facial features, and lower, with higher variability, for the skeletal classification.
For the skeletal features, the training success rate is high, around 98%, at a false positive rate of 0.05, with a low standard deviation of 0.1163. For the control set, which was extracted from the times the subject of interest's skeleton was tracked, the accuracy was reduced to around 75% at a false positive rate of 0.1, with a higher error standard deviation of 0.2737, which might be due to occlusions with other objects in the scene while moving, and due to reaching non-optimal range. The true positive rate dropped to around 65% for a false positive rate of 0.1, with a standard deviation between subjects of around 0.3. This shows that changes of clothing, occlusion with objects like a chair or other subjects in the environment, or changes of light, all degrade the performance of the Kinect estimations. For the facial-based estimation, the success rate is over 0.96 for all experimental conditions, with a small variance of around 0.11 between subjects. For the vocal,
Fig. 6. Effect of condition on subject recognition across experiments. The subject representation in the vocal and facial streams seems to be more invariant to changes in environmental conditions, compared to the skeleton-based system.

the true positive rate for the test set was around 0.97, with a standard deviation between subjects of less than 0.1.
These experiments were mostly performed when the subject was facing the Kinect camera, and the distance from the camera was sufficient for high vocal recognition. Neither assumption always holds in real-life conditions, where the subject can be out of the range for vocal identification, or with his back to the camera, where the facial information is not available. Table 1 summarizes the tolerance to environmental conditions.

4.4 Multi-sensor Aggregation

For aggregation, we use the soft decision of each classifier, according to (17). The instantaneous weights are the confidence levels of the classifiers; without prior-knowledge assumptions, the weights of the
Fig. 7. ROC curves representing the effect of environmental conditions on the classifier for the skeletal (a), facial (b), and vocal (c) data streams.

classifiers are assumed equal. An example of sensor aggregation of the skeletal and vocal classifiers is given in Fig. 8. Figure 8(a) and (b) show the instantaneous skeletal and vocal classifier results, and their estimation quality as estimated by the classification confidence. For both estimations the highest confidence is for subject 3, but for the skeletal estimations it is not significant, as other subjects, like the second one, also have high confidence scores. Figure 8(c) shows the implementation of Eq. (17), by summing the average weights and selecting the maximal value, which results in estimating subject 3 with higher confidence, compared to the other subjects. This demonstrates that sensor aggregation can improve the overall results of the biometric system.
Table 1. Summary of tolerance to environmental changes

Data stream                               | Skeleton                           | Facial  | Vocal
Tolerance to change in light              | Low                                | High    | Maximal
Tolerance to change in clothing           | Medium                             | High    | High
Tolerance to absence of facial appearance | High                               | None    | High
Tolerance to multiple persons             | Low – risk of distortion/occlusion | High    | Low – subject's voice might be mixed
Tolerance to nearby objects               | Low – risk of distortion/occlusion | High    | –
Tolerance to audio noise                  | None                               | None    | High
Accessibility                             | Between 1–5 m, in limited angles   | Frontal | Anywhere, up to 10 m

Fig. 8. Effect of condition on recognition. (a) and (b) show the skeletal and vocal classification significance over time (marked by black dots, between 0 and 1), which can be used as weights in combining the estimations. (c) shows the aggregation of the mean estimations in the experiment time slot, where the third subject was identified.

4.5 Real-Time Implementation


The identification can be performed instantaneously or in blocks. A larger buffer
induces a delay in recognition but promises higher accuracy, as it averages the
statistics over time. In some implementations, the statistics can be calculated on-line,
so the delay occurs only initially, when a new subject enters the scene. The window
length can be optimized based on the training set.
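A minimal sketch of the windowed identification described above, assuming per-frame soft-decision scores; the class name, the window length, and the score stream are illustrative, not the paper's implementation.

```python
from collections import deque
import numpy as np

class WindowedIdentifier:
    """Average per-frame soft decisions over a sliding window.

    A larger window adds delay but smooths the statistics over time;
    the window length is a tunable parameter (optimized on the training
    set in the text above).
    """
    def __init__(self, window_len):
        self.frames = deque(maxlen=window_len)  # keeps only the last N frames

    def update(self, frame_scores):
        self.frames.append(np.asarray(frame_scores, dtype=float))
        mean = np.mean(self.frames, axis=0)     # running window average
        return int(np.argmax(mean)), mean

# Illustrative two-subject score stream; after the window fills,
# the delay is only initial and updates arrive per frame.
ident = WindowedIdentifier(window_len=3)
stream = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.1, 0.9]]
for frame in stream:
    subject, mean = ident.update(frame)
print(subject)   # the second subject (index 1) wins over the last window
```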

5 Conclusions and Future Work

In this paper, we examined the use of Kinect facial, skeletal, and vocal data streams for
forming an enhanced multi-sensing biometric system. Methods were derived to extract
features, filter artifacts, and identify each subject separately or jointly. The methods
were verified and the system performance was evaluated in challenging environmental
conditions, including changes of clothing, changes of lighting, and environments with
multiple subjects and objects. The results of this work show how to utilize the three
independent data streams so that each contributes complementary information in
challenging environments, improving on the recognition achievable from each stream
separately. In the future, the suggested technology should be verified under more
environmental conditions and with more subjects. The skeletal data could be replaced
with a single optical camera by running a training session, with no sacrifice in
performance [13]. Utilization of the suggested system and procedure could enable an
affordable, efficient biometric system with higher tolerance to extreme conditions. The
biometric system can also be aggregated with the Kinect's kinematic feature estimation
to enable continuous assessment of the activity of subjects of interest.

Acknowledgment. We would like to thank the participants in the test sets. Special thanks to
Dr. Hagai Aronowitz from IBM Research, for referring the authors to papers in the field of vocal
identification and to a related Python package that assisted in extracting the vocal features, and
to Prof. Alex Bronstein, for his help in supervising the students, for asking challenging questions
in their final examination, and for his wise comments that improved the quality of the paper.

References
1. Prabhakar, S., Pankanti, S., Jain, A.K.: Biometric recognition: security and privacy concerns.
IEEE Secur. Priv. 1(2), 33–42 (2003)
2. Mishra, D., Mukhopadhyay, S., Kumari, S., Hurram Khan, M.K., Chaturvedi, A.: Security
enhancement of a biometric based authentication scheme for telecare medicine information
systems with nonce. J. Med. Syst. 38(5), 41 (2014)
3. Gorodnichy, D.O.: Evolution and evaluation of biometric systems. In: IEEE Symposium on
Computer and Intelligence for Security and Defense Applications, CISDA 2009 (2009)
4. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans.
Circ. Syst. Video Technol. 14(1), 4–20 (2004)
5. Wildes, R.P.: Iris recognition: an emerging biometric technology. Proc. IEEE 85(9), 1348–
1363 (1997)

6. Jain, A.K., Hong, L., Kulkarni, Y.: A multimodal biometric system using fingerprint, face,
and speech. In: International Conference on Audio- and Video-Based Biometric Person
Authentication (AVBPA), pp. 182–187 (1999)
7. Iwashita, Y., Uchino, K., Kurazume, R.: Gait-based person identification robust to changes
in appearance. Sens. (Switz.) 13(6), 7884–7901 (2013)
8. Blumrosen, G., Fishman, B., Yovel, Y.: Noncontact wideband sonar for human activity
detection and classification. IEEE Sens. J. 14(11), 4043–4054 (2014)
9. Springer, S., Seligmann, G.Y.: Validity of the kinect for gait assessment: a focused review.
Sens. (Switz.) 16(2), 1–13 (2016)
10. Zhang, Z.: Microsoft kinect sensor and its effect. IEEE Multimed. 19(2), 4–10 (2012)
11. Sung, J., Ponce, C., Selman, B., Saxena, A.: Human activity detection from RGBD images.
In: IEEE International Conference on Robotics and Automation, pp. 842–849 (2011)
12. Min, R., Kose, N., Dugelay, J.L.: KinectFaceDB: a kinect face database for face recognition.
IEEE Trans. Syst. Man Cybern. Syst. 44(11), 1534–1548 (2013)
13. Mehta, D., Sridhar, S., Sotnychenko, O., Rhodin, H., Shafiei, M., Seidel, H.-P., Xu, W.,
Casas, D., Theobalt, C.: VNect: real-time 3D human pose estimation with a single RGB
camera. ACM Trans. Graph. 36(4) (2017)
14. Da Luz, L., Masek, M., Lam, C.P.: Activities of daily living classification using depth
features. In: IEEE International Conference on IEEE Region 10, TENCON 2013, pp. 1–4
(2013)
15. Galna, B., Barry, G., Jackson, D., Mhiripiri, D., Olivier, P., Rochester, L.: Accuracy of the
Microsoft Kinect sensor for measuring movement in people with Parkinson’s disease. Gait
Post. 39(4), 1062–1068 (2014)
16. Goswami, G., Vatsa, M., Singh, R.: Face recognition with RGB-D images using kinect. In:
Bourlai, T. (ed.) Face Recognition Across the Imaging Spectrum, pp. 281–303. Springer,
Cham (2016)
17. Ouellet, S., Grondin, F., Leconte, F., Michaud, F.: Multimodal biometric identification
system for mobile robots combining human metrology to face recognition and speaker
identification. In: Proceedings of IEEE International Workshop on Robot and Human
Interactive Communication, vol. 2014, pp. 323–328, October 2014
18. Sinha, A., Chakravarty, K., Bhowmick, B.: Person identification using skeleton information
from kinect. In: Sixth International Conference on Advances in Computer-Human
Interactions, ACHI 2013, pp. 101–108 (2013)
19. Blumrosen, G., Miron, Y., Plotnik, M., Intrator, N.: Towards a real time kinect signature
based human activity assessment at home. In: 2015 IEEE 12th International Conference on
Wearable and Implantable Body Sensor Networks (BSN), pp. 1–6 (2015)
20. Blumrosen, G., Miron, Y., Intrator, N., Plotnik, M.: A real-time kinect signature-based
patient home monitoring system. Sensors 16(11), 1965 (2016)
21. Blumrosen, G., Uziel, M., Rubinsky, B., Porrat, D.: Noncontact tremor characterization
using low-power wideband radar technology. IEEE Trans. Biomed. Eng. 59(c), 674–686
(2012)
22. Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M.,
Moore, R.: Real-time human pose recognition in parts from single depth images. Commun.
ACM 56(1), 116–124 (2013)
23. Microsoft Inc. (2015). http://www.microsoft.com/en-us/kinectforwindows/
24. Herrera C., D., Kannala, J., Heikkilä, J.: Joint depth and color camera calibration with
distortion correction. IEEE Trans. Pattern Anal. Mach. Intell. 34(10), 2058–2064 (2012)
25. Microsoft: Kinect for Windows SDK documentation. https://msdn.microsoft.com/en-us/library/dn785525.aspx
26. Gkalelis, N., Tefas, A., Pitas, I.: Human identification from human movements. In: 2009
16th IEEE International Conference on Image Processing (ICIP), pp. 2585–2588 (2009)

27. Sinha, A., Chakravarty, K., Bhowmick, B.: Person identification using skeleton information
from kinect. In: Proceedings of International Conference on Advances in Computer-Human
Interactions, no. c, pp. 101–108 (2013)
28. Calderita, L.V., Bandera, J.P., Bustos, P., Skiadopoulos, A.: Model-based reinforcement of
kinect depth data for human motion capture applications. Sensors 13(7), 8835–8855 (2013)
29. Donath, L., Faude, O., Lichtenstein, E., Nüesch, C., Mündermann, A.: Validity and
reliability of a portable gait analysis system for measuring spatiotemporal gait character-
istics: comparison to an instrumented treadmill. J. Neuroeng. Rehabil. 13(1), 1–9 (2016)
30. Huang, H.Y., Chang, S.H.: A skeleton-occluded repair method from kinect. In: 2014
International Symposium on Computer, Consumer and Control (IS3C), pp. 264–267 (2014)
31. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn.
Res. 3(Mar), 1157–1182 (2003)
32. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
33. Aronowitz, H.: Efficient score normalization for speaker recognition. In: Proceedings of
IEEE ICASSP, pp. 4402–4405 (2010)
34. Garcia-Romero, D., Espy-Wilson, C.Y.: Analysis of i-vector length normalization in speaker
recognition systems. In: Proceedings of Interspeech (2011)
35. Wang, J., Zhang, J., Honda, K., Wei, J., Dang, J.: Audio - visual speech recognition
integrating 3D lip information obtained from the Kinect. Multimed. Syst. 22(3), 315–323
(2016)
36. Blumrosen, G., Miron, Y., Plotnik, M., Intrator, N.: Towards a real-time kinect signature
based human activity assessment at home. In: Body Sensor Network (2015)
37. Kittler, J., Hatef, M., Duin, R.P.W., Matas, J.: On combining classifiers. IEEE Trans.
Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)
38. Córdova-Esparza, D.-M., Terven, J.R., Jiménez-Hernández, H., Herrera-Navarro, A.-M.: A
multiple camera calibration and point cloud fusion tool for Kinect V2. Sci. Comput.
Program. 143, 1–8 (2016)
39. Terven, J.R., Córdova-Esparza, D.M.: Kin2. A Kinect 2 toolbox for MATLAB. Sci.
Comput. Program. 130, 97–106 (2016)
Towards the Blockchain-Enabled Offshore Wind
Energy Supply Chain

Samira Keivanpour1(✉), Amar Ramudhin2, and Daoud Ait Kadi3

1 Department of Management, Information and Supply Chain, Thompson Rivers University,
Kamloops, BC, Canada
skeivanpour@tru.ca
2 Logistics Institute of University of Hull, Hull, UK
Ramudhin@hull.ac.uk
3 Department of Mechanical Engineering, Laval University, Quebec, Canada
Daoud.Aitkadi@gmc.ulaval.ca

Abstract. While the technology of offshore wind production is more or less
mature, there are still many issues to be solved before wind farms can be mass-
produced and deployed at reasonable cost. Hence, one of the challenging topics in
developing a supply-chain strategy for offshore wind energy is achieving efficiency
on the one hand while meeting the requirements of stability, flexibility, and
adaptability from market, technology and policy perspectives on the other. Supply-
chain management of offshore wind energy requires processing a large amount of
data with effective traceability and visibility. Blockchain, as a fresh technology,
could provide a solution for interoperability and collaboration among the many
suppliers dispersed all over the world. This paper discusses the opportunities of
blockchain technology in the upstream, midstream and downstream of the
offshore wind energy supply chain.

Keywords: Offshore wind energy supply chain · Blockchain · Transparency ·
Visibility · Industry 4.0

1 Introduction

The offshore wind industry is expanding fast around the world due to the several
advantages of this source of renewable energy. The stronger wind resources in offshore
areas, the lack of the social and geographical constraints of onshore wind power, the
technology evolution and the increasing demand for electricity in coastal regions are
some of these beneficial factors [1]. The players in the supply chain of offshore wind
energy include the developer/owner and operator; turbine, substation, foundation, array
cabling and export cable manufacturers; and consultants and other contractors for ports,
geophysical surveys, navigation, project management, maritime traffic and navigation
risk, insurance, vessel support and supply, and cable positioning. The different services
for vessel installation, accommodation, maintenance, and logistics should also be
considered. Given this complexity, handling the needs of stability, flexibility,
adaptability and cost efficiency is essential in the offshore wind supply chain. Hence,
supply-chain management of

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 904–913, 2019.
https://doi.org/10.1007/978-3-030-02686-8_67
Towards the Blockchain-Enabled Offshore Wind Energy Supply Chain 905

offshore wind energy requires processing a large amount of data with effective
traceability and visibility. Blockchain technology is a fresh concept that has attracted
attention in recent years, and its application in logistics and supply chains is at an infant
stage. In this study, first, a brief review of the applications of blockchain in the supply
chain in different industrial contexts is provided. Then, based on the configuration of
the offshore wind energy supply chain, a conceptual framework is proposed to discuss
the application perspectives of blockchain in the offshore wind industry. The
contribution of this study consists of taking a new step toward a blockchain-enabled
supply chain in offshore wind energy. The rest of the paper is organized as follows:
Sect. 2 explains the characteristics of the offshore wind energy supply chain; Sect. 3
provides a review of blockchain technology and its applications in logistics and supply
chains; Sect. 4 explains the conceptual framework; and finally, Sect. 5 concludes with
some challenges and opportunities for future research.

2 Offshore Wind Energy Supply Chain

Handling the needs of cost efficiency, stability, flexibility, and adaptability is essential
in the offshore wind supply chain. The offshore wind industry is growing and there are
many opportunities for cost efficiency due to economies of scale, technology
improvement and policy support. The supply chain plays a critical role in this cost
reduction. ORE Catapult [2] published a report analyzing cost reduction opportunities
in offshore wind farms. Three indicators, namely competition, collaboration, and
contracting, are mentioned as the key parameters for assessing cost reduction
opportunities in the supply chain. Kaiser and Snyder [3] developed a model for
estimating the costs of offshore wind energy. For the supply chain, the authors
highlighted the role of turbines, foundations, cables, and installation services. They
mentioned that the few players in turbine manufacturing and installation and the amount
of investment required for vessel construction are the main reasons for the high costs of
development and maintenance. EC Harris [4] recommended increased competition,
vertical collaboration and economies of scale as three essential drivers for cost reduction
in the offshore wind supply chain. Roeth et al. [5] studied cost reduction opportunities
for New York offshore wind energy. The authors discussed global competition,
innovation, and collaboration among the players as the key drivers of cost reduction.
The supply chain of offshore wind farms includes three sections. The upstream includes
turbine manufacturing and its three tiers of suppliers (sub-components, parts, and
materials); the offshore section (foundation, substation, and vessel) and its three tiers;
cable manufacturing (export and inter-array cables) and its three tiers; and finally
research and development. The midstream includes wind farm development and all
relevant services, such as logistics, construction and installation services, as well as
operations. The downstream includes the power companies and the end users of the
electricity generated by the offshore wind farm. The supply chain configuration is
shown in Fig. 1.
906 S. Keivanpour et al.

Fig. 1. Upstream, midstream and downstream of offshore wind energy supply chain.

Typical wind conditions, such as wind speed, direction, and intensity, or ocean
conditions, such as average wave height, period, and tide, could affect facility design,
operations planning and performance monitoring of offshore wind farms [6]. Marine
subsurface conditions, such as ocean depth, temperature, marine growth and seafloor
scour, could affect operations and facility design. In addition, extreme conditions such
as extreme wind gusts, hurricanes, lightning, and earthquakes also affect energy
projection, design, operations and performance of offshore wind farms. Water depth,
wind regime, distance to shore and the geographical location of offshore wind farms are
critical features in supply-chain performance. Another key issue in wind energy
operation is addressing the balance between supply and demand. Given the variability
of wind power, there is a need for energy storage. Large-scale storage technologies
could provide secure energy to the electrical load and help stabilize voltage.
The adaptability of suppliers in response to rapid innovation and technology trends in
the offshore wind industry plays an essential role in supply-chain management. Turbine
components, foundation structure, wind farm layout, and the electrical grid connection
are essential elements in determining the configuration of offshore wind farms and are
strongly dependent on technology development [7].
The government also plays a critical role in the renewable energy sector. The plans and
policies provided by local government could shape the business structure. The
mechanisms for tax credits and subsidies have impacts on investment in and
development of wind power zones. Another role is supporting local suppliers and
business sectors to lead growth and competition in the offshore wind energy market.
Collaboration between government and different industrial parties could facilitate and
expedite the development process and remove barriers and challenges.
Governance of the offshore wind supply chain is also challenging. The degree of
standardization in the decision-making process, performance measurement systems
across the offshore wind supply chain, the degree of control over tiers of suppliers in
the upstream, midstream and downstream [8], the number of suppliers in different tiers
of the supply chain, the number of operating locations, the interdependency of the
suppliers, supplier lead times and bullwhip effects are the sources of supply-chain
complexity.

Hence, supply-chain management of offshore wind energy requires processing a large
amount of data and effective traceability and visibility.

3 Industry 4.0 and Blockchain Technology in Logistics and Supply Chain

The Internet of Things (IoT), big data analysis and Industry 4.0 are fresh concepts in the
manufacturing context. IoT is a network of physical objects that provides interaction
and collaboration among different objects [9] and facilitates data exchange, decision-
making and efficiency. Hofmann and Rüsch [10] discussed the application of Industry
4.0 in logistics management. The authors highlighted the value of cyber-physical
systems in two logistics practices: Just-in-Time (JIT) and Kanban systems. The authors
stressed that Industry 4.0 can improve the cross-company oriented logistics model by
facilitating transparency of information sharing, traceability of information and
handling of real-time information in different tiers of the supply chain.
The blockchain is a decentralized network of actors that can handle cloud functions in
the network in a peer-to-peer framework. The value of this new technology is not
limited to economics and finance, where it brings transparency to transactions among
several users, but can also be extended to several social, humanitarian and scientific
applications [11]. Blockchain technology comprises a chain of blocks that can be
shared among users. Each block includes three essential elements: the data inside the
block (including the sender and receiver information and the data related to the
exchange between these two users), a hash (defined as the identity of the block, derived
from the data inside it) and the previous block's hash (Fig. 2). There is no authority,
middleman or centralized centre for storing and tracking the records in a blockchain
network. Instead of a central agent, each agent in the network has access to a ledger that
includes the records of the data. Hence, the distributed ledger is an essential mechanism
in the blockchain. The smart contract is a set of instructions for the users of the
blockchain network. This digital contract is a pre-written agreement between the users.
There are few studies
that focused on the application of blockchain in the supply chain or industry chain. A
brief review is provided in this section. Tian [12] studied the utilization of blockchain
technology in traceability of agriculture food supply chain. The author discussed using
Radio Frequency Identification (RFID) and blockchain technology in building a decen‐
tralized supply chain for improving monitoring and controlling the safety and quality of
food in China. Sikorski et al. [13] introduced the application of machine-to-machine
blockchain in the chemical industry. The authors developed a simulation for trading
between two producers and one consumer over a blockchain. Korpela et al. [14]
discussed the role of blockchain in the integration of supply chain in a digital supply
chain framework. Clauson et al. [15] investigated the application of blockchain in the
public healthcare supply chain. The authors highlighted that considering the role of
safety and security in the health supply chain, blockchain technology would be valuable
for functionality, integrity and data provenance in the healthcare context. Madhwal and
Panfilov [16] proposed the application of blockchain in aircraft assembly operation. The
authors discussed that traceability of the blockchain could increase the security of spare

parts market with the authenticity of the parts. Debabrata and Albert [17] discussed the
advantage of blockchain in performance management of supply chain. The authors
emphasized that blockchain improves the visibility of supply chain and the security of
information technology-based systems for monitoring and controlling of the supply
chain's players. The synthesis of the literature reveals some critical points. First, the
application of blockchain technology in the supply chain is at an infant stage of
development. The scholars have discussed some application perspectives; however,
implementation and deep analysis require empirical research and evidence from real
industrial cases. Second, applications in industrial contexts with a high need for safety,
security, and traceability are stressed the most. Third, cross-company oriented logistics
models and business-to-business (B2B) integration in the supply chain are the better
candidates for using this technology in the future.

Fig. 2. Blockchain elements (blocks, data, hash and previous block hash).
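The block structure in Fig. 2 (data inside the block, a hash derived from that data, and the previous block's hash) can be illustrated with a minimal sketch. The SHA-256 choice, the helper names, and the supply-chain payloads are assumptions for illustration only, not a real blockchain implementation.

```python
import hashlib
import json

def block_hash(content):
    """Identity of a block, derived deterministically from its contents."""
    payload = json.dumps(content, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def make_block(sender, receiver, data, prev_hash):
    """A block holds the exchange data plus the previous block's hash."""
    block = {"sender": sender, "receiver": receiver,
             "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash({k: v for k, v in block.items() if k != "hash"})
    return block

def chain_is_valid(chain):
    """Each block must reference the recomputed hash of its predecessor."""
    for prev, cur in zip(chain, chain[1:]):
        content = {k: v for k, v in prev.items() if k != "hash"}
        if cur["prev_hash"] != block_hash(content):
            return False
    return True

# A two-block chain: block 2 references block 1's hash, so tampering
# with block 1 breaks the link (illustrative payloads).
b1 = make_block("turbine-oem", "developer", {"order": "nacelle x3"}, "0" * 64)
b2 = make_block("developer", "installer", {"schedule": "week 12"}, b1["hash"])
print(chain_is_valid([b1, b2]))   # True
b1["data"]["order"] = "nacelle x4"  # tamper with an earlier block
print(chain_is_valid([b1, b2]))   # False
```

In a real system, every agent would hold a copy of this chain (the distributed ledger), so the tampering above would be detected by all peers rather than by a single validator.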

4 The Conceptual Framework of the Offshore Supply Chain with Blockchain Technology

In this section, a conceptual framework is recommended that discusses the application
of blockchain technology and Industry 4.0 in the offshore wind energy supply chain.
This framework is designed based on the configuration of the offshore supply chain
shown in Fig. 1. In this framework, the blockchain-enabled upstream, midstream and
downstream are shown (Fig. 3).
In the upstream and midstream of the offshore wind energy supply chain, blockchain-
based Enterprise Resource Planning (ERP) and blockchain-based suppliers' records and
database management could be valuable. Parikh [18] discussed the advantages of using
blockchain in ERP systems. The author emphasized the role of blockchain in tracking
with shared end-to-end provenance and in enhancing the security of information
exchange among the players. Andrews et al. [19] also discussed the benefits of a
blockchain interface

Fig. 3. The contribution of blockchain technology in different tiers of the supply chain.

on ERP systems at the strategic level. The players in the supply chain of offshore wind
energy can be classified into eight categories: developer/owner and operator; turbine,
substation, foundation, array cabling and export cable manufacturers; and consultants
and other contractors for ports, geophysical surveys, navigation, project management,
maritime traffic and navigation risk, insurance, vessel support, and cable positioning.
Another category of suppliers is vessel suppliers; the different services for vessels,
including installation vessels, accommodation, and maintenance vessels, should be
considered. Given the variety of supply-chain processes, the number of players is large.
For example, the number of suppliers for two offshore wind farms in the UK, the Robin
Rigg wind farm and Walney Phase 1, is 138 and 237 respectively (4C Offshore website
[21]). Collaboration and transparency among the suppliers are crucial in this industrial
context. An integrated material and information flow across the supply chain with a
blockchain-enabled platform can provide full transparency via a distributed ledger. For
example, a distributed ledger system for turbine and substation suppliers is shown in
Fig. 4.

Fig. 4. Distributed ledger system for turbine and substation suppliers.



The turbine manufacturer, transport logistics supplier, turbine installation provider and
turbine maintenance agent should collaborate closely for effective production planning
and an efficient ordering process. A distributed ledger could facilitate transparency and
the integration of the supply chain. If there is any change in the MRP (Material
Requirements Planning) of one supplier, the other suppliers and vendors get an
immediate notification and can adapt to this change. This level of transparency is
critical for offshore operations, as offshore weather conditions can play an important
role in project development. The records and updated information regarding suppliers
could be exchanged via a blockchain-based database management system. This
integrated framework can provide benefits for the transport performance measurement
of suppliers and enable the supply chain to react to unforeseen changes immediately.
This platform can also be used for auditing the suppliers during the different phases of
project development. A Software-as-a-Service (SaaS) platform on a private cloud can
be used for suppliers' performance management.
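The immediate-notification behaviour described above can be sketched as a toy shared ledger with subscribers. The party names and the MRP-change event are hypothetical, and a real distributed ledger would replace the in-process callback with consensus-replicated state.

```python
class SharedLedger:
    """Toy shared ledger: every appended record is visible to all parties,
    and each append triggers a notification to the other subscribers."""

    def __init__(self):
        self.entries = []       # the single shared log all parties read
        self.subscribers = []   # (party name, callback) pairs

    def subscribe(self, name, callback):
        self.subscribers.append((name, callback))

    def append(self, author, record):
        entry = {"author": author, "record": record}
        self.entries.append(entry)
        for name, callback in self.subscribers:
            if name != author:   # everyone else is notified immediately
                callback(entry)

ledger = SharedLedger()
notified = []
for party in ["transport-logistics", "installation", "maintenance"]:
    ledger.subscribe(party, lambda entry, p=party: notified.append(p))

# The turbine manufacturer posts an MRP change; the other suppliers
# receive an immediate notification and can adapt their own plans.
ledger.append("turbine-manufacturer", {"mrp": "blade delivery +2 weeks"})
print(notified)   # ['transport-logistics', 'installation', 'maintenance']
```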
Another application of blockchain in offshore wind energy is in logistics and
transportation. Marine and weather condition monitoring (real-time data monitoring)
could be used for intelligent routing systems and for tracking autonomous trucks and
vessels. Real-time optimization, control and communication are three essential
characteristics of cyber-physical systems. The distributed ledger system of blockchain
could facilitate communication and the detection of any problem in the logistics system.
The logistics and maintenance of offshore wind farms are also critical for maintaining
the power and the availability of turbines. Blockchain-enabled inventory management
systems can improve the planning, ordering, and delivery of spare-part items from
onshore locations to offshore sites. A distributed ledger enables information-sharing
exchange among vendors, inventory sites, autonomous vessels and offshore sites
(Fig. 5).

Fig. 5. Distributed ledger enables information-sharing exchange among vendors, inventory sites,
autonomous vessels and offshore sites.

Another opportunity of
using blockchain in offshore wind energy is a blockchain-based smart grid for
increasing the stability and adaptability of production, considering the volatility of the
energy market. Mengelkamp et al. [20] developed a model with a decentralized market
platform for energy trading in a local market. The authors concluded that using
blockchain in the renewable energy market reduces electricity costs.
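A decentralized market platform such as the one in [20] must match local producers and consumers. The following toy double-auction sketch illustrates the matching step only; the names, quantities and prices are invented, and settlement of the trades on a blockchain is omitted.

```python
def match_local_market(sellers, buyers):
    """Match cheapest asks to highest bids in a toy local energy market.

    sellers/buyers: lists of (name, kwh, price_per_kwh).  Each matched
    trade would become one transaction recorded on the ledger.
    """
    sellers = [list(s) for s in sorted(sellers, key=lambda s: s[2])]
    buyers = [list(b) for b in sorted(buyers, key=lambda b: b[2], reverse=True)]
    trades, si, bi = [], 0, 0
    while si < len(sellers) and bi < len(buyers):
        seller, buyer = sellers[si], buyers[bi]
        if buyer[2] < seller[2]:          # best bid below best ask: stop
            break
        qty = min(seller[1], buyer[1])
        price = (seller[2] + buyer[2]) / 2  # split the surplus evenly
        trades.append((seller[0], buyer[0], qty, price))
        seller[1] -= qty
        buyer[1] -= qty
        if seller[1] == 0:
            si += 1
        if buyer[1] == 0:
            bi += 1
    return trades

# Illustrative local market: two wind producers, two consumers.
trades = match_local_market(
    sellers=[("wind-farm-A", 10, 0.05), ("wind-farm-B", 5, 0.08)],
    buyers=[("factory", 8, 0.10), ("household", 4, 0.06)],
)
print(trades)
```

wind-farm-B's ask (0.08) exceeds the household's remaining bid (0.06), so that energy goes unmatched; in a smart-grid setting it could be stored or sold to the main grid instead.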

5 Discussion and Conclusion

The supply chain plays an important role in the cost reduction of offshore wind farms.
The complexity of the supply chain also contributes to its cost and competitiveness. The
uncertainties on the market, technology and policy sides push decision makers to handle
the adaptability, stability and cost efficiency of the supply chain at the same time. The
Industry 4.0 and cyber-physical systems revolution in manufacturing and processes aids
the integration of communication, computerization, and control of systems and
processes. Blockchain technology, as a fresh concept in the Industry 4.0 paradigm,
enables the supply chain to handle the visibility, traceability, and security of flows of
information, material, and money. In this paper, we discussed the application of this
technology in the offshore wind energy supply chain. Blockchain can provide solutions
for improving the dynamics and strategic management of complexity in the supply
chain. Table 1 summarizes the essential challenges of supply-chain development in the
offshore wind energy industry and the potential solutions that blockchain technology
offers for these challenges. This study provides a first attempt to explain the application
of decentralized ledger concepts and smart contracts in the offshore wind energy
context. However, there are many opportunities and avenues for future research.
Utilizing UML (Unified Modeling Language) for modelling the information exchange
in decentralized ledger systems could clarify the application perspective. The challenges
of applying blockchain, including its impact on switching costs, the bargaining power
of suppliers, confidentiality of data and security, should be addressed in a separate
study. A SWOT analysis (strengths, weaknesses, opportunities, and threats) is
recommended to highlight the challenges and opportunities of this new technology in
the context of offshore wind energy. Considering the rapid growth of renewable energy
in China, the challenges in this market for the application of blockchain, in comparison
to the European renewable energy market, should be addressed. For example, the level
of trust and transparency required in a blockchain-enabled supply chain should be
discussed in future research.
912 S. Keivanpour et al.

Table 1. The challenges in offshore wind energy and the opportunities provided by blockchain
technology

1. Challenge: Regulation effect (policy for development, tax credits and subsidies, supporting
the suppliers and business sectors). Blockchain solution: increase transparency of real-time
energy trades and create prospects to assess credit risk precisely (see also Stanley-Smith [22]).
2. Challenge: Technology effect (foundations, site selection, wind measurement, wind turbines,
electrical transmission, and operation). Blockchain solution: distributed ledgers of innovation
and research and development increase technology evolution and diffusion among the
manufacturers of key components (turbine, foundation, and electrical transmission).
3. Challenge: Market effect (capital and operating cost changes, discount rate, rate of
investment return, debt, power purchase agreement). Blockchain solution: a distributed market
platform increases visibility in the market and reduces the risks of investment.
4–5. Challenges: Formalization (degree of standardization in the decision-making process and
performance measurement systems through the offshore wind supply chain) and Centralization
(the degree of control over top tiers of suppliers in the upstream, midstream and downstream of
the supply chain). Blockchain solution (shared): the decentralized platform creates the
opportunity for autonomous agents and a self-controlling mechanism through smart contracts.
6–8. Challenges: Horizontal integration (number of suppliers in each tier of the offshore wind
supply chain), Vertical integration (number of tiers in upstream, midstream and downstream),
Spatial complexity (number of operating locations) and Relationship coherence (level of
connection between firms in the supply chain). Blockchain solution (shared): interoperability
can improve collaboration among suppliers via a connected platform (including IoT and cloud
computing).
9. Challenge: Supplier lead times. Blockchain solution: transparency and visibility in the supply
chain decrease the bullwhip impact.
10. Challenge: Demand variability. Blockchain solution: a blockchain-based smart grid can
improve energy trade and decrease the impact of uncertainty in demand volatility.
11. Challenge: Supply variability. Blockchain solution: IaaS and blockchain-based database
management systems can keep track of suppliers' records and improve the adaptability of the
supply chain.
12. Challenge: The complexity of process and planning. Blockchain solution: real-time tracking
of material and the integration of the supply chain via a decentralized ledger can improve
project planning.

Acknowledgment. This research project was funded in part by the GreenPort Hull.
Towards the Blockchain-Enabled Offshore Wind Energy Supply Chain 913

References

1. Keivanpour, S., Ramudhin, A., Ait Kadi, D.: The sustainable worldwide offshore wind energy
potential: a systematic review. J. Renew. Sustain. Energy 9(6), 065902 (2017)
2. ORE Catapult (2015). https://ore.catapult.org.uk/reports-and-resources/reports-publications/
ore-catapult-reports/
3. Kaiser, M.J., Snyder, B.: Offshore wind energy cost modeling: installation and
decommissioning, vol. 85. Springer, Heidelberg (2012)
4. OWCRP (2012). https://www.thecrownestate.co.uk/media/5614/ei-echarris-owcrp-supply-
chain-workstream.pdf
5. Roeth, J., McClellan, S., Ozkan, D., Kempton, W., Levitt, A., Thomson, H.: New York
Offshore Wind Cost Reduction Study. New York State Energy Research and Development
Authority: Albany, NY, USA (2015)
6. Elliot, D., Frame, C., Gill, C., Hanson, H., Moriarty, P., Powell, M., Wynne, J.: Offshore
resource assessment and design conditions: a data requirements and gaps analysis for offshore
renewable energy systems. US Department of Energy, Washington, DC, USA, Technical
report DOE/EE-0696 (2012)
7. Higgins, P., Foley, A.: The evolution of offshore wind power in the United Kingdom. Renew.
Sustain. Energy Rev. 37, 599–612 (2014)
8. Choi, T.Y., Hong, Y.: Unveiling the structure of supply networks: case studies in Honda,
Acura, and DaimlerChrysler. J. Oper. Manag. 20(5), 469–493 (2002)
9. Jeschke, S., Brecher, C., Meisen, T., Özdemir, D., Eschert, T.: Industrial internet of things and
cyber manufacturing systems. In: Industrial Internet of Things, pp. 3–19. Springer, Cham (2017)
10. Hofmann, E., Rüsch, M.: Industry 4.0 and the current status as well as future prospects on
logistics. Comput. Ind. 89, 23–34 (2017)
11. Swan, M.: Blockchain: blueprint for a new economy. O’Reilly Media, Inc. (2015)
12. Tian, F.: An agri-food supply chain traceability system for China based on RFID & blockchain
technology. In: 2016 13th International Conference on Service Systems and Service
Management (ICSSSM), pp. 1–6. IEEE (2016)
13. Sikorski, J.J., Haughton, J., Kraft, M.: Blockchain technology in the chemical industry:
machine-to-machine electricity market. Appl. Energy 195, 234–246 (2017)
14. Korpela, K., Hallikas, J., Dahlberg, T.: Digital supply chain transformation toward blockchain
integration. In: Proceedings of the 50th Hawaii International Conference on System Sciences
(2017)
15. Clauson, K.A., Breeden, E.A., Davidson, C., Mackey, T.K.: Leveraging blockchain
technology to enhance supply chain management in healthcare. Blockchain in Healthcare
Today (2018)
16. Madhwal, Y., Panfilov, P.B.: Blockchain and supply chain management: aircrafts parts
business case. In: Annals of DAAAM & Proceedings, vol. 28 (2017)
17. Debabrata, G., Albert, T.: A framework for implementing blockchain technologies to improve
supply chain performance
18. Parikh, T.: The ERP of the Future: Blockchain of Things (2018)
19. Andrews, C., Broby, D., Paul, G., Whitfield, I.: Utilising financial blockchain technologies
in advanced manufacturing (2017)
20. Mengelkamp, E., Notheisen, B., Beer, C., Dauer, D., Weinhardt, C.: A blockchain-based
smart grid: towards sustainable local energy markets. Comput. Sci. Res. Dev. 33(1–2), 207–
214 (2018)
21. 4C Offshore website. http://www.4coffshore.com/windfarms
22. Stanley-Smith, J.: Blockchain and tax: what businesses need to know? Int. Tax Rev. (2016)
Optimal Dimensionality Reduced Quantum Walk and Noise Characterization

Chen-Fu Chiang

State University of New York Polytechnic Institute, Utica, NY 13502, USA
chiangc@sunyit.edu

Abstract. In a recent work by Novo et al. (Sci. Rep. 5, 13304, 2015), the invariant subspace method was applied to the study of continuous-time quantum walk (CTQW). In this work, we adopt the aforementioned method to investigate the optimality of a perturbed quantum walk search of a marked element in a noisy environment on various graphs. We formulate the necessary condition on the noise distribution in the system such that the invariant subspace method remains effective and efficient. Based on the noise, we further formulate how to set the appropriate coupling factor to preserve the optimality of the quantum walker. Thus, a quantum walker based on an N by N Hamiltonian can be efficiently implemented using near-term quantum technology.

Keywords: Quantum walk · Dimensionality reduction · Optimization · Graph

1 Introduction

In quantum computing, quantum walks are the quantum analogue of classical random walks. Quantum walks are motivated by the use of classical random
walks and they can be further used for the design of randomized algorithms and
pseudo random number generators [12], in addition to solving search problems.
For some oracular problems, quantum walks provide an exponential speedup over
any classical algorithm [4,6]. Quantum walks also give polynomial speedups over
classical algorithms for many practical problems, such as the element distinctness
problem [2], the triangle finding problem [9], and evaluating NAND trees [7].
The well-known Grover search algorithm can also be viewed as a quantum walk
algorithm.
Quantum walks can be formulated in both discrete-time [1] and continuous-time [8] versions. The connection between discrete-time and continuous-time quantum walks has been well studied: it was shown that a continuous-time quantum walk can be obtained as an appropriate limit of discrete-time quantum walks [3]. In this work, we focus on the study of continuous-time quantum walk (CTQW), not only because it offers a simpler physical picture but also because it is less challenging to perform CTQW experiments in comparison to their discrete-time counterparts. Based on these motivations, we set out to investigate how to optimize CTQW searches on a uniform complete multi-partite graph. Although uniform complete multi-partite graphs constitute just a subset of all possible graphs, they include some of the most important examples, such as complete graphs, complete bipartite graphs and star graphs, in applications of quantum walks to computations.

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 914–929, 2019.
https://doi.org/10.1007/978-3-030-02686-8_68
In this work, we adopt the invariant subspace method from [10], which allows us to perform a dimensionality reduction to simplify the analyses of CTQW on a uniform complete multi-partite graph. In short, the key is to transform the original graph into a much simpler structure that still retains the pertinent properties we would like to investigate, such as the optimality of a quantum walk search. In this way, the analysis becomes more transparent and the dynamics of the walker can be understood more intuitively on an abstract level. Throughout the text, we also refer to a multi-partite graph as a P-partite graph, with a slight twist on the standard notation: the whole graph actually has P + 1 partitions, where the extra partition is the one that contains the solution (marked vertex).
The contribution of this work is as follows. By applying the systematic dimensionality reduction technique via the Lanczos algorithm, we extend the applicable graphs from complete graphs, complete bipartite graphs and star graphs [10] to uniform complete multi-partite graphs. We extend a reduction scheme to transform an arbitrary N by N adjacency matrix H_a of a uniform complete multi-partite graph into a 3 by 3 reduced Hamiltonian that has fast transport between its two lowest eigenenergy states. We further parameterize the coupling factor based on the configuration of a given uniform complete multi-partite graph to keep the CTQW search on uniform complete multi-partite graphs optimal. Finally, we characterize the error patterns under which systematic dimensionality reduction still takes place, such that a coupling factor based on our formula keeps the quantum walk search optimal.
The remainder of the article is organized as follows. In Sect. 2, we review the work that applies the Lanczos algorithm to perform dimensionality reduction to obtain the right form of the reduced adjacency matrix. In Sect. 2.3 we further develop theorems to show (a) how to choose the correct coupling factor based on the given parameters (configuration) of a reduced adjacency matrix and (b) that the optimality is preserved once transformed back to the original adjacency matrix. By adding additional constraints to uniform complete multi-partite graphs, we recover many useful examples such as complete graphs, star graphs and complete bipartite graphs. The reduced Hamiltonian of a uniform complete multi-partite graph is slightly different for each of these three cases because there are transitions among partitions that behave differently in each case. In Sect. 3, under three types of errors (systematic disorder, static diagonal disorder, and reducible non-diagonal disorder), we characterize the errors such that systematic dimensionality reduction to a 3 by 3 Hamiltonian is still feasible. With a successful reduction, application of our coupling factors in the experiment will keep the quantum walk search optimal. Finally, in Sect. 4, we draw our conclusion.

2 Background

In this section, we briefly explain the mechanism and the important theorems formulated for optimizing a quantum walk search on uniform complete multi-partite graphs in a perfect setting, i.e. error free. Recall from Sect. 1 that we refer to a multi-partite graph as a P-partite graph with a slight twist on the standard notation: the whole graph actually has P + 1 partitions, where the extra partition contains the solution (marked vertex).
A uniform complete P-partite graph (UCPG) can be denoted as G(V0, V1, ..., VP). Let V be the union of all partitions. A UCPG is a graph with P + 1 partitions of vertices with the following properties: (1) each vertex vi in vertex partition Vj connects to all other vertices in vertex partition Vk as long as j ≠ k; (2) except for vertex partition V0, each of the vertex partitions has the same size. Let the size of vertex partition Vj be mj, i.e. mj = |Vj|. Then we know that a UCPG G with N vertices automatically satisfies P × m1 + m0 = N since m1 = m2 = ... = mP. An example of a UCPG is given in Fig. 1.
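As an illustration of this definition, the following sketch (ours, not from the paper; `ucpg_adjacency` is a hypothetical helper name) builds the adjacency matrix of a UCPG and checks the degree structure implied by P × m1 + m0 = N:

```python
import numpy as np

def ucpg_adjacency(m0, m1, P):
    """0/1 adjacency matrix of a UCPG G(V0, V1, ..., VP): partition V0 has
    m0 vertices, each of the P remaining partitions has m1 vertices, and two
    vertices are adjacent iff they lie in different partitions."""
    sizes = [m0] + [m1] * P
    labels = np.repeat(np.arange(P + 1), sizes)   # partition index per vertex
    return (labels[:, None] != labels[None, :]).astype(float)

A = ucpg_adjacency(m0=3, m1=2, P=2)               # the graph of Fig. 1, N = 7
assert A.shape == (7, 7) and np.allclose(A, A.T)
assert A[0].sum() == 7 - 3                        # a V0 vertex has degree N - m0
assert A[3].sum() == 7 - 2                        # a V1 vertex has degree N - m1
```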

Fig. 1. A UCPG graph G(V0, V1, V2) where m0 = 3 and m1 = 2. The white element is the marked element |ω⟩ that resides in partition V0.

2.1 Dimensionality Reduction

Without loss of generality, let us assume the marked vertex |ω⟩ is in V0 and m0 ≫ 1. The original adjacency matrix is

$$H_a = \sum_{i \in V \setminus V_0} |i\rangle\langle\omega| + \sum_{\substack{j \in V_0 \\ j \neq \omega}} \sum_{i \in V \setminus V_0} |i\rangle\langle j| + \sum_{k=1}^{P} \sum_{j \in V_k} \sum_{i \in V \setminus V_k} |i\rangle\langle j|. \qquad (1)$$

Algorithm 1. Dimensionality Reduction and Coupling Factor Determination

Require: A UCPG G of N nodes with one marked element |ω⟩
Ensure: |ω⟩ can be found efficiently by CTQW
Start of process
• Dimensionality Reduction: Construct the reduced 3 by 3 Hamiltonian H_ra by use of the Lanczos algorithm on the N by N adjacency matrix H_a based on a UCPG G
• Hamiltonian Construction: Construct the CTQW Hamiltonian H_seek = −γH_ra − |ω⟩⟨ω|
• Basis Change: Express H_seek = H^(0) + H^(1) in the eigenbasis (|ω⟩, |e1⟩, |e2⟩) of H^(0) by applying perturbation theory
• CTQW Initialization: Determine the coupling factor γ to induce fast transport between the two lowest eigenenergy states |ω⟩ and |e1⟩ in H_seek
• Existence of Constant Overlap: Demonstrate that the initial starting state |s⟩ and |e1⟩ have a non-exponentially small overlap such that |s⟩ can reach |ω⟩ efficiently via |e1⟩; the optimality (quadratic speed-up) is thus preserved.
End of process

With renormalization, we can express the N vertices in the P + 1 partitions in the subspace spanned by |ω⟩, |S_{V0−ω}⟩, |S_{V1}⟩, ..., |S_{VP}⟩, where

$$|S_{V_0-\omega}\rangle = \frac{1}{\sqrt{m_0-1}} \sum_{i \in V_0,\, i \neq \omega} |i\rangle, \qquad (2)$$

$$|S_{V_i}\rangle = \frac{1}{\sqrt{m_i}} \sum_{j \in V_i} |j\rangle, \quad i \neq 0. \qquad (3)$$

Define the following state:

$$|S_{\bar{V}_0}\rangle = \frac{1}{\sqrt{N-m_0}} \sum_{i \in V \setminus V_0} |i\rangle. \qquad (4)$$

By use of the Lanczos algorithm and the fact that the partitions not containing |ω⟩ have the same size, the reduced adjacency Hamiltonian H_ra in the (|ω⟩, |S_{V0−ω}⟩, |S_{V̄0}⟩) basis is

$$H_{ra} = \begin{bmatrix} 0 & 0 & \sqrt{N-m_0} \\ 0 & 0 & \sqrt{(N-m_0)(m_0-1)} \\ \sqrt{N-m_0} & \sqrt{(N-m_0)(m_0-1)} & N-m_0-m_1 \end{bmatrix}. \qquad (5)$$

The entry H_ra(3, 3) can be easily verified because of the uniform size of the non-solution partitions and (P − 1) × m1 = N − m0 − m1. To be complete, we provide the reduction process based on the Lanczos algorithm in Appendix A.
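The reduction can also be checked numerically. The sketch below (our own illustration with arbitrary sizes) projects the N by N adjacency matrix onto the three basis states above and recovers exactly the matrix in (5); it also confirms that the spanned subspace is invariant under H_a, which is why the Lanczos iteration of Appendix A terminates once it is reached:

```python
import numpy as np

# illustrative UCPG sizes; the marked vertex is vertex 0
m0, m1, P = 5, 4, 3
N = m0 + P * m1
labels = np.repeat(np.arange(P + 1), [m0] + [m1] * P)
A = (labels[:, None] != labels[None, :]).astype(float)   # adjacency H_a

omega = np.eye(N)[0]                                     # |w>
s_v0 = np.zeros(N); s_v0[1:m0] = 1 / np.sqrt(m0 - 1)     # |S_{V0-w}>, eq. (2)
s_bar = np.zeros(N); s_bar[m0:] = 1 / np.sqrt(N - m0)    # |S_{Vbar0}>, eq. (4)
B = np.column_stack([omega, s_v0, s_bar])

H_ra = B.T @ A @ B                        # projection onto the invariant subspace
expected = np.array([
    [0, 0, np.sqrt(N - m0)],
    [0, 0, np.sqrt((N - m0) * (m0 - 1))],
    [np.sqrt(N - m0), np.sqrt((N - m0) * (m0 - 1)), N - m0 - m1],
])
assert np.allclose(H_ra, expected)        # matches (5)
assert np.allclose(A @ B, B @ H_ra)       # span(B) is invariant under H_a
```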

2.2 Hamiltonian Construction and Basis Change

For simplicity, let us define α = m0/N and α1 = m1/N. Since H_ra expressed in the (|ω⟩, |S_{V0−ω}⟩, |S_{V̄0}⟩) basis captures the same dynamics as H_a, the Hamiltonian of a CTQW can be defined as [5]

$$H_{seek} = -\gamma H_{ra} - |\omega\rangle\langle\omega| \qquad (6)$$

where γ is the coupling parameter between connected vertices. By (5), (6), we know H_seek = H^(0) + H^(1) in the (|ω⟩, |S_{V0−ω}⟩, |S_{V̄0}⟩) basis is¹,²

$$H^{(0)} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & -\gamma N\sqrt{\alpha(1-\alpha)} \\ 0 & -\gamma N\sqrt{\alpha(1-\alpha)} & -\gamma N\left((1-\alpha) - \frac{P\alpha_1^2}{1-\alpha}\right) \end{bmatrix} \qquad (7)$$

$$H^{(1)} = \begin{bmatrix} 0 & 0 & -\gamma\sqrt{(1-\alpha)N} \\ 0 & 0 & 0 \\ -\gamma\sqrt{(1-\alpha)N} & 0 & 0 \end{bmatrix}. \qquad (8)$$

Prior to proceeding further, it is worth noticing that the format of this reduced Hamiltonian differs from the format derived in [10] for a complete bipartite graph. The difference is the existence of a self-loop entry for the basis vector |S_{V̄0}⟩, which later propagates into H_seek and H^(0). Because of this entry, systematic dimensionality reduction imposes a stronger constraint of equal size on the partitions that do not contain the solution. We address this issue in order to generalize the result shown in [10] to UCPG.
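Assuming the closed form (5) and illustrative parameter values, the following check (ours) confirms that (7) and (8) reproduce H_seek of (6) up to the m0 ≫ 1 approximation: the only approximated entry is the coupling −γN√(α(1−α)), which stands in for the exact −γ√((m0−1)(N−m0)):

```python
import numpy as np

m0, m1, P = 400, 300, 3                 # large m0: the approximation regime
N = m0 + P * m1
alpha, alpha1 = m0 / N, m1 / N
gamma = 0.01                            # arbitrary; the identity holds for any gamma

H_ra = np.array([
    [0, 0, np.sqrt(N - m0)],
    [0, 0, np.sqrt((N - m0) * (m0 - 1))],
    [np.sqrt(N - m0), np.sqrt((N - m0) * (m0 - 1)), N - m0 - m1],
])
H_seek = -gamma * H_ra - np.diag([1.0, 0.0, 0.0])        # (6)

v1 = -gamma * N * np.sqrt(alpha * (1 - alpha))           # off-diagonal of (7)
v3 = -gamma * N * ((1 - alpha) - P * alpha1**2 / (1 - alpha))
v2 = -gamma * np.sqrt((1 - alpha) * N)                   # off-diagonal of (8)
H0 = np.array([[-1, 0, 0], [0, 0, v1], [0, v1, v3]])
H1 = np.array([[0, 0, v2], [0, 0, 0], [v2, 0, 0]])

assert np.isclose(v3, -gamma * (N - m0 - m1))            # footnote 2
# H0 + H1 matches H_seek up to the m0 - 1 ~ m0 approximation used in (7)
assert np.allclose(H0 + H1, H_seek, atol=1e-2)
```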
In the remainder of the section, we introduce Theorem 1, Lemma 1 and Theorem 2. The relationships among them provide the foundation for showing that the underlying CTQW preserves optimality, as explained in Subsect. 2.3. Theorem 1 provides the technique to construct the reduced Hamiltonian H_seek in the eigenbasis (|ω⟩, |e1⟩, |e2⟩) of H^(0). Lemma 1 establishes important properties of the Hamiltonian H_seek written in that eigenbasis, to be used in Theorem 2. Theorem 2 shows the necessary condition for fast transport to occur in H_seek by tuning the coupling factor γ.

We now prove Theorem 1 to show how to express a reduced Hamiltonian H_seek in the basis of its major matrix via perturbation theory. For simplicity, we simply call H_seek H in the theorem.

Theorem 1. Given a reduced Hamiltonian H = H^(0) + H^(1) in the (|ω⟩, |b1⟩, |b2⟩) basis, where

$$H^{(0)} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & v_1 \\ 0 & v_1 & v_3 \end{bmatrix}, \qquad H^{(1)} = \begin{bmatrix} 0 & 0 & v_2 \\ 0 & 0 & 0 \\ v_2 & 0 & 0 \end{bmatrix}, \qquad (9)$$

v1 and v2 are negative numbers and v3 is a non-positive number, with v1/v2 = √(Nα) ≥ 1. Let the eigenbasis of H^(0) be (|ω⟩, |e1⟩, |e2⟩). Choose κ = v3/v1 ≥ 0 and β± = (κ ± √(κ² + 4))/2; then eigenvector |e1⟩ = (|b1⟩ + β+|b2⟩)/√(1 + β+²) and eigenvector |e2⟩ = (|b1⟩ + β−|b2⟩)/√(1 + β−²), where the corresponding eigenvalues are λ± = v1β±. H can thus be written in the (|ω⟩, |e1⟩, |e2⟩) eigenbasis as

$$H = \begin{bmatrix} -1 & \dfrac{v_2\beta_+}{\sqrt{\beta_+^2+1}} & \dfrac{v_2\beta_-}{\sqrt{\beta_-^2+1}} \\[6pt] \dfrac{v_2\sqrt{\beta_+^2+1}}{\beta_+-\beta_-} & \lambda_+ & 0 \\[6pt] \dfrac{-v_2\sqrt{\beta_-^2+1}}{\beta_+-\beta_-} & 0 & \lambda_- \end{bmatrix}. \qquad (10)$$

Footnote 1: Clearly α1 = (1 − α)/P.
Footnote 2: Entry (3, 3) of H^(0) is thus −γN((1 − α) − (1 − α)/P) = −γ(N − m0 − m1).

Proof. It is clear that |e1⟩ and |e2⟩ are both linear combinations of |b1⟩ and |b2⟩. Without loss of generality, let |e′⟩ = |b1⟩ + β|b2⟩ be an eigenvector of H^(0) with eigenvalue λ. After some calculation we obtain λ = βv1 where

$$\beta = \frac{\kappa \pm \sqrt{\kappa^2+4}}{2}, \qquad \kappa = \frac{v_3}{v_1}. \qquad (11)$$

For simplicity, let β+ = (κ + √(κ² + 4))/2 and β− = (κ − √(κ² + 4))/2. By renormalizing the eigenvectors |e1′⟩ = |b1⟩ + β+|b2⟩ and |e2′⟩ = |b1⟩ + β−|b2⟩, we have

$$|e_1\rangle = \frac{|e_1'\rangle}{\sqrt{\beta_+^2+1}}, \qquad |e_2\rangle = \frac{|e_2'\rangle}{\sqrt{\beta_-^2+1}} \qquad (12)$$

such that

$$H^{(0)}|e_1\rangle = \lambda_+|e_1\rangle, \qquad H^{(0)}|e_2\rangle = \lambda_-|e_2\rangle \qquad (13)$$

where

$$\lambda_\pm = \beta_\pm v_1. \qquad (14)$$

In the (|ω⟩, |b1⟩, |b2⟩) basis, from (9) we know H^(1)|b1⟩ = 0, H^(1)|ω⟩ = v2|b2⟩ and H^(1)|b2⟩ = v2|ω⟩. To express H^(1) in the (|ω⟩, |e1⟩, |e2⟩) eigenbasis, by a simple basis change we obtain

$$H^{(1)}|e_1\rangle = \frac{v_2\beta_+}{\sqrt{\beta_+^2+1}}|\omega\rangle, \qquad H^{(1)}|e_2\rangle = \frac{v_2\beta_-}{\sqrt{\beta_-^2+1}}|\omega\rangle \qquad (15)$$

$$H^{(1)}|\omega\rangle = v_2|b_2\rangle = \frac{v_2\sqrt{\beta_+^2+1}}{\beta_+-\beta_-}|e_1\rangle + \frac{-v_2\sqrt{\beta_-^2+1}}{\beta_+-\beta_-}|e_2\rangle. \qquad (16)$$

Hence, the Hamiltonian H can be expressed as shown in (10).
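Theorem 1 can be verified mechanically. The sketch below (ours, with arbitrary admissible values of v1, v2, v3) checks the eigenpairs of H^(0) and reproduces the matrix (10) by an explicit basis change:

```python
import numpy as np

# illustrative values: v1, v2 < 0, v3 <= 0, and v1/v2 = 4 >= 1
v1, v2, v3 = -2.0, -0.5, -1.0
kappa = v3 / v1                                   # = 0.5 >= 0
bp = (kappa + np.sqrt(kappa**2 + 4)) / 2          # beta_+
bm = (kappa - np.sqrt(kappa**2 + 4)) / 2          # beta_-

H0 = np.array([[-1, 0, 0], [0, 0, v1], [0, v1, v3]])
H1 = np.array([[0, 0, v2], [0, 0, 0], [v2, 0, 0]])

e1 = np.array([0, 1, bp]) / np.sqrt(1 + bp**2)    # eigenvectors of H0, eq. (12)
e2 = np.array([0, 1, bm]) / np.sqrt(1 + bm**2)
assert np.allclose(H0 @ e1, v1 * bp * e1)         # lambda_+ = v1*beta_+, eq. (14)
assert np.allclose(H0 @ e2, v1 * bm * e2)

U = np.column_stack([np.array([1.0, 0, 0]), e1, e2])
H_eig = U.T @ (H0 + H1) @ U                       # H in the (|w>,|e1>,|e2>) basis
d1 = v2 * bp / np.sqrt(bp**2 + 1)
d2 = v2 * bm / np.sqrt(bm**2 + 1)
expected = np.array([[-1, d1, d2], [d1, v1 * bp, 0], [d2, 0, v1 * bm]])
assert np.allclose(H_eig, expected)               # matches (10); symmetric, as in Lemma 1
```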

Lemma 1. Given a derived reduced Hamiltonian H written in the (|ω⟩, |e1⟩, |e2⟩) basis as shown in Theorem 1, we then know that (a) the Hamiltonian H is symmetric and (b) β+ > 0 > β− and λ+ < 0, λ− > 0.

Proof. With the values of β± as shown in Theorem 1, we know that

$$\beta_+\beta_- = -1 \qquad (17)$$

and it immediately leads to the observation that

$$\beta_+(\beta_+-\beta_-) = \beta_+^2+1, \qquad \beta_-(\beta_+-\beta_-) = -(1+\beta_-^2). \qquad (18)$$

With this observation, we can immediately conclude that

$$\frac{v_2\beta_+}{\sqrt{\beta_+^2+1}} = \frac{v_2\sqrt{\beta_+^2+1}}{\beta_+-\beta_-}, \qquad \frac{v_2\beta_-}{\sqrt{\beta_-^2+1}} = \frac{-v_2\sqrt{\beta_-^2+1}}{\beta_+-\beta_-}. \qquad (19)$$

Therefore property (a), that H is symmetric, is proved. For property (b), since √(κ² + 4) > κ ≥ 0, we immediately have β+ > 0 > β−. And with the fact that v1 < 0 and λ± = β±v1, we can also immediately conclude that λ+ < 0 and λ− > 0.
For simplicity, let

$$\delta_1 = \frac{v_2\beta_+}{\sqrt{\beta_+^2+1}}, \qquad \delta_2 = \frac{v_2\beta_-}{\sqrt{\beta_-^2+1}}. \qquad (20)$$

By use of Lemma 1, H can be written in the (|ω⟩, |e1⟩, |e2⟩) basis as

$$H = \begin{bmatrix} -1 & \delta_1 & \delta_2 \\ \delta_1 & \lambda_+ & 0 \\ \delta_2 & 0 & \lambda_- \end{bmatrix} \qquad (21)$$

where |ω⟩ and |e1⟩ can form the basis for the two states of the lowest eigenvalue.
Theorem 2. Given a Hamiltonian H in the form shown in Lemma 1, it is desirable to have λ+ = −1 such that |ω⟩ and |e1⟩ form the basis for the two states of the lowest eigenvalue. Since v1 = −γN√(α(1−α)), the degeneracy between the site energies of |ω⟩ and |e1⟩ facilitates transport between these two low-energy states; hence γ = (N√(α(1−α))β+)⁻¹. The transport between |ω⟩ and |e2⟩ is prohibited since δ2 is much smaller than λ−.

Proof. Since we desire fast transport between the lowest eigenenergy states, we need to set

$$\lambda_+ = v_1\beta_+ = -1. \qquad (22)$$

With the fact that v1 = −γN√(α(1−α)), we need to set

$$\gamma = \left(N\sqrt{\alpha(1-\alpha)}\,\beta_+\right)^{-1}. \qquad (23)$$

From (20), (21) and λ− in (14), we know δ2 is much smaller than λ− because (since m0 ≫ 1 and α = m0/N, we have αN ≫ 1)

$$\frac{\delta_2}{\lambda_-} = \frac{v_2}{v_1\sqrt{\beta_-^2+1}} = \frac{1}{\sqrt{\alpha N}\sqrt{\beta_-^2+1}} < \frac{1}{\sqrt{\alpha N}}. \qquad (24)$$

For a given UCPG G, by use of (7)–(9), we can properly bound κ as

$$\kappa = \frac{v_3}{v_1} = \frac{(1-\alpha) - \frac{P\alpha_1^2}{1-\alpha}}{\sqrt{\alpha(1-\alpha)}} = \left(1-\frac{1}{P}\right)\sqrt{\frac{1-\alpha}{\alpha}}, \qquad 0 \le \kappa < \sqrt{\frac{1-\alpha}{\alpha}}. \qquad (25)$$
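Putting (25), (11) and (23) together, a short check (ours, with illustrative sizes) confirms that the resulting γ indeed makes the site energy λ+ of |e1⟩ degenerate with that of |ω⟩:

```python
import numpy as np

m0, m1, P = 400, 300, 3                                 # illustrative UCPG
N = m0 + P * m1
alpha = m0 / N
kappa = (1 - 1 / P) * np.sqrt((1 - alpha) / alpha)      # (25)
assert 0 <= kappa < np.sqrt((1 - alpha) / alpha)        # bound in (25)
beta_p = (kappa + np.sqrt(kappa**2 + 4)) / 2            # beta_+ from (11)
gamma = 1 / (N * np.sqrt(alpha * (1 - alpha)) * beta_p) # (23)
v1 = -gamma * N * np.sqrt(alpha * (1 - alpha))
assert np.isclose(v1 * beta_p, -1.0)                    # (22): lambda_+ = -1
```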

2.3 From Existence of Constant Overlap to Optimality Preserving

For a search space of size N, classical search has complexity O(N). Quantum walk search provides a quadratic speed-up, O(√N), in comparison to its classical counterpart. Note that the complexity counts the number of calls to a single step of a search operation. In the remainder of this subsection, we show that the quadratic speed-up (optimality) remains with the γ chosen based on Theorem 2.

For a given UCPG G, the processing flow described in Algorithm 1 is shown as a flow chart in Fig. 2.

Fig. 2. The procedure from systematic dimensionality reduction, basis change, fast
transport and finally optimality preservation.

By using the theorems and lemma from Subsect. 2.2, H_seek can be expressed as (10) in the eigenbasis (|ω⟩, |e1⟩, |e2⟩) of H^(0). By rewriting (12) using (5), (6) and Theorem 1, we know

$$|e_1\rangle = \frac{|S_{V_0-\omega}\rangle + \beta_+|S_{\bar V_0}\rangle}{\sqrt{1+\beta_+^2}}, \qquad |e_2\rangle = \frac{|S_{V_0-\omega}\rangle + \beta_-|S_{\bar V_0}\rangle}{\sqrt{1+\beta_-^2}} \qquad (26)$$

where β± = (κ ± √(κ² + 4))/2.

For a CTQW based on H_seek, we need to decide the value of the coupling parameter γ to ensure that the optimal performance of the underlying quantum walk is preserved. If γ is wrongly chosen, the underlying CTQW search might not remain optimal, i.e. its quadratic speed-up might be lost. The determination process of the correct γ is shown in Theorem 2. Theorem 3 is an extension of Theorem 2 to various cases with respect to the values of the variables P and α.

Theorem 3. Given a UCPG G = (V0, V1, ..., VP) and its adjacency matrix Hamiltonian H_a in the (|ω⟩, |b1⟩, |b2⟩) basis, where N = Σ_{i=0}^P |Vi|, we can obtain the reduced search Hamiltonian H_seek in a new eigenbasis (|ω⟩, |e1⟩, |e2⟩) by use of Theorem 1 for constructing the underlying CTQW. We can then use Theorem 2 to determine the coupling factor γ = (N√(α(1−α))β+)⁻¹. The chosen γ ensures the underlying CTQW remains optimal.

Proof. There are two aspects that we need to address to show that the optimality O(√N) is preserved. One (1) is fast search speed and low escape speed, while the other (2) is the overlap between |e1⟩ and the initial system state |s⟩ (a uniform superposition), as it determines how many times we need to repeat the experiment.

The search speed is determined by the dynamics between the fast-transport non-solution state |e1⟩ and the solution state |ω⟩, i.e. |e1⟩ → |ω⟩. The degenerate eigenspace formed by |ω⟩ and |e1⟩ captures the dynamics between those two states. The escape speed is from the solution |ω⟩ to the undesirable non-solution state |e2⟩.

From (21), we know that δ1 is responsible for the search speed and δ2 is responsible for the escape speed. We have shown in (24) that δ2 is small with respect to λ−, so the escape speed is small. By use of (20), we know that

$$|\delta_1| = \langle e_1|H_{seek}|\omega\rangle = \left|\frac{v_2\beta_+}{\sqrt{\beta_+^2+1}}\right| = \left|\frac{-1}{\sqrt{\alpha N(\beta_+^2+1)}}\right| \qquad (27)$$

because v2 = −γ√(N(1−α)) and γ = (N√(α(1−α))β+)⁻¹. Hence, we obtain the running time

$$T_{run} = \frac{\pi}{2|\delta_1|} = \frac{\pi}{2}\sqrt{\alpha N(\beta_+^2+1)}. \qquad (28)$$
Let us verify that the running time T_run remains optimal in different settings of a UCPG G when the coupling factor γ is chosen based on Theorem 2. Briefly speaking, with a fixed search space of size N, the configuration of a UCPG G is controlled by the variables P and α. We discuss the different settings based on those two variables.

Case 1: P = 1. This is a typical complete bipartite graph as seen in [10]. We immediately know that κ = 0 since α1 = 1 − α from (25). This leads to β+ = 1 from (11). Because of that, no matter what the value of α is, T_run in (28) holds its quadratic speed-up.

Case 2: 2 ≤ P ≤ N − 1 and α ∝ 1/N. By (25), we know that κ ∝ √(N−1), and by (11), β+ ∝ √(N−1). Plugging the values of α and β+ into (28), T_run still holds its quadratic speed-up.
Case 3: 2 ≤ P ≤ N − 1 and α ∝ 1 (such as (N−1)/N). By (25), we know that κ ∝ 1/√(N−1); by (11), we know

$$\beta_+ \propto \frac{\frac{1}{\sqrt{N-1}} + \sqrt{\frac{1}{N-1}+4}}{2} \approx 1 \qquad (29)$$

when N is large. Plugging the values of α and β+ into (28), T_run still holds its quadratic speed-up.

Case 4: 2 ≤ P ≤ N − 1 and α is some constant (non-extreme values). Immediately we know κ and β+ are some constants that do not affect the complexity. Hence, T_run in (28) still holds its quadratic speed-up.
However, the T_run above assumes that we start the search from eigenstate |e1⟩ to find |ω⟩, i.e. |e1⟩ → |ω⟩, which is not the case because we start from |s⟩. Hence, at T_run the success probability of observing |ω⟩ is the overlap between |e1⟩ and |s⟩. The success probability is³

$$P_O = |\langle e_1|s\rangle|^2 = \frac{\left(\sqrt{\frac{\alpha}{\beta_+^2} - \frac{1}{\beta_+^2 N}} + \sqrt{1-\alpha}\right)^2}{1 + \frac{1}{\beta_+^2}}. \qquad (30)$$

Therefore 1/P_O is the number of times we need to repeat the experiment. We need to show that P_O is some constant, so that it does not affect the total complexity under the big-O notation. By examining the four cases listed in Theorem 3 and plugging the corresponding values of α and β+ into (30), we find that P_O remains some constant that is not exponentially small.

Since the total runtime is

$$T_{run} \times \frac{1}{P_O} \qquad (31)$$

where T_run holds the quadratic speed-up and 1/P_O is some constant that is not large (not scaling with N), the complexity still holds the quadratic speed-up. Therefore, we know that the chosen γ = (N√(α(1−α))β+)⁻¹ ensures that the underlying CTQW remains optimal.
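As a numerical sanity check of this proof (our own sketch, with illustrative sizes), evolving the reduced three-level H_seek from |s⟩ shows the probability of observing |ω⟩ peaking near T_run, with a success probability close to the constant P_O of (30):

```python
import numpy as np

m0, m1, P = 400, 300, 3
N = m0 + P * m1
alpha = m0 / N
kappa = (1 - 1 / P) * np.sqrt((1 - alpha) / alpha)           # (25)
beta_p = (kappa + np.sqrt(kappa**2 + 4)) / 2                 # (11)
gamma = 1 / (N * np.sqrt(alpha * (1 - alpha)) * beta_p)      # (23)

H_ra = np.array([
    [0, 0, np.sqrt(N - m0)],
    [0, 0, np.sqrt((N - m0) * (m0 - 1))],
    [np.sqrt(N - m0), np.sqrt((N - m0) * (m0 - 1)), N - m0 - m1],
])
H = -gamma * H_ra - np.diag([1.0, 0, 0])                     # (6)
s = np.array([1.0, np.sqrt(m0 - 1), np.sqrt(N - m0)]) / np.sqrt(N)  # |s>, footnote 3

evals, evecs = np.linalg.eigh(H)
c = evecs.T @ s                                              # |s> in the eigenbasis
T_run = (np.pi / 2) * np.sqrt(alpha * N * (beta_p**2 + 1))   # (28)
ts = np.linspace(0.0, 2 * T_run, 4000)
# amplitude on |w>: sum_k <w|k> e^{-i E_k t} <k|s>
amp = (evecs[0][None, :] * np.exp(-1j * np.outer(ts, evals)) * c[None, :]).sum(1)
prob = np.abs(amp) ** 2
t_star = ts[prob.argmax()]
assert prob.max() > 0.8                      # success probability ~ P_O of (30)
assert abs(t_star - T_run) < 0.3 * T_run     # peak near T_run = O(sqrt(N))
```

With these sizes the peak probability is close to 1, matching the constant-overlap argument above.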

3 Noise Error Patterns and Optimality Preserving

Now we consider how to keep the dimensionality-reduced quantum walk search optimal by characterizing the noise patterns in the system. The noise can be introduced by precision limitations and by the noisy environment. For instance, not all numbers have a perfect binary representation, and the approximated numbers cause perturbation. Let matrix H_a′ be the closest Hamiltonian to H_a that can be prepared by an available quantum system of limited precision. In the remainder of this section, we examine the effects of (1) systematic errors, (2) static errors, and (3) non-static errors on the CTQW and the coupling factor, while the goal is to keep the search by CTQW optimal and the systematic dimensionality reduction feasible. For simplicity, let us assume we are working only on complete graphs for the noise characterization.

Footnote 3: Simply compute their inner product; we know that |s⟩ = (|ω⟩ + √(m0−1)|S_{V0−ω}⟩ + √(N−m0)|S_{V̄0}⟩)/√N.

3.1 Systematic Disorder

Suppose the error is systematic; that is, each adjacency matrix entry that connects two different sites in H_a′ suffers an error ε in comparison to the original H_a. It is clear that H_a′ = (1 − ε)H_a. This N by N matrix H_a′ can be efficiently reduced to (1 − ε)H_ra by use of the Lanczos algorithm. This applies to all UCPG graphs, including CG, CBG and SG. The new coupling factor γ′ scales by a factor of 1/(1 − ε) accordingly to keep the search optimal, with

$$\gamma' = \left(N\sqrt{\alpha(1-\alpha)}\,\beta_+(1-\epsilon)\right)^{-1}. \qquad (32)$$
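The rescaling in (32) can be checked mechanically; the sketch below (ours, with illustrative numbers) confirms that with γ′ the perturbed search generator coincides with the noise-free one:

```python
import numpy as np

m0, m1, P, eps = 400, 300, 3, 0.03
N = m0 + P * m1
alpha = m0 / N
kappa = (1 - 1 / P) * np.sqrt((1 - alpha) / alpha)
beta_p = (kappa + np.sqrt(kappa**2 + 4)) / 2
gamma = 1 / (N * np.sqrt(alpha * (1 - alpha)) * beta_p)          # (23)

H_ra = np.array([
    [0, 0, np.sqrt(N - m0)],
    [0, 0, np.sqrt((N - m0) * (m0 - 1))],
    [np.sqrt(N - m0), np.sqrt((N - m0) * (m0 - 1)), N - m0 - m1],
])
H_clean = -gamma * H_ra - np.diag([1.0, 0, 0])

# systematic disorder: every edge weight shrinks by (1 - eps), so the
# reduced matrix is (1 - eps)H_ra; gamma' of (32) undoes the shrinkage
gamma_p = 1 / (N * np.sqrt(alpha * (1 - alpha)) * beta_p * (1 - eps))
H_noisy = -gamma_p * ((1 - eps) * H_ra) - np.diag([1.0, 0, 0])
assert np.allclose(H_noisy, H_clean)
```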

In reality, the distribution of errors is seldom perfectly systematic when environmental noise is considered. When an arbitrary noise distribution is introduced into a quantum system of higher dimension, the task becomes daunting. The computational complexity grows rapidly with the dimensionality and the irregularity of the noise distribution. The irregularity makes the convergence of the Lanczos algorithm extremely slow, or makes it impossible to reduce the dimensionality at all. The reduction process becomes computationally expensive, and the reduction in dimensionality might not be significant. Furthermore, the reduced dimensionality might still exceed what is implementable with quantum walkers on current quantum technology for real-life problems at a reasonable scale. For this study, we aim to examine the noise patterns for which the Lanczos algorithm can efficiently reduce the original adjacency matrix.

3.2 Static Diagonal Disorder

Environment-assisted quantum search [11] suggests that naturally occurring open quantum system dynamics can be advantageous for a quantum algorithm based on quantum walks affected by static diagonal errors. The scenario considered is the quantum search on a complete graph with diagonal disorder due to an imperfect oracle. An imperfect oracle marks each node of the graph erroneously: a non-solution node j is marked with an energy ε_j, while the solution node ω is marked with an energy −1 + ε_ω. This is perceived as static disorder on the complete graph. With the state |s⟩ = (1/√N)Σ_{i=0}^{N−1}|i⟩, the perturbed search Hamiltonian with tuning factor γ = 1/N in the original dimension is

$$H_{seek} = -|\omega\rangle\langle\omega| - |s\rangle\langle s| + \sum_{i=0}^{N-1}\epsilon_i|i\rangle\langle i| \qquad (33)$$

where the static disorders ε_i are i.i.d. random variables with mean 0 and standard deviation σ ≪ 1. By use of degenerate perturbation theory, the reduced search Hamiltonian in the {|ω⟩, |s_ω̄⟩} basis is

$$H_{seek} = \begin{bmatrix} -1+\epsilon_\omega & -1/\sqrt{N} \\ -1/\sqrt{N} & -1 \end{bmatrix} \qquad (34)$$

where |s_ω̄⟩ is the equal superposition of all nodes other than the solution node |ω⟩. The gap between the ground state and the first excited state of the perturbed Hamiltonian is

$$\Delta = \sqrt{\epsilon_\omega^2 + 4/N}. \qquad (35)$$

With γ = 1/N, the success probability of the algorithm is

$$P_\omega(t) \approx \frac{\sin^2(\Delta t/2)}{1 + N\epsilon_\omega^2/4}. \qquad (36)$$

The maximum success probability is obtained at time t = π/Δ = O(√N). From the experimental aspect, this algorithm can always choose a fixed tuning factor and requires only the static disorder on |ω⟩; the optimality is then preserved by calculating the running time solely based on ε_ω, which is some variable with value between 0 and 1. This shows that the only influence comes from ε_ω in the static disorder situation, when the static disorder variables ε_i obey the i.i.d. distribution with small deviation σ and mean 0. Interested readers can refer to [11] for technical details.
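Assuming the two-level model (34), the following sketch (ours, with illustrative N and ε_ω) reproduces the gap (35) and the peak success probability predicted by (36) at t = π/Δ:

```python
import numpy as np

N, eps_w = 10_000, 0.01                           # illustrative values
H = np.array([[-1 + eps_w, -1 / np.sqrt(N)],
              [-1 / np.sqrt(N), -1.0]])           # reduced model (34)

delta = np.sqrt(eps_w**2 + 4 / N)                 # gap (35)
w, V = np.linalg.eigh(H)
assert np.isclose(w[1] - w[0], delta)

# evolve the non-solution superposition |s_wbar> and read out |w> at t = pi/delta
c = V.T @ np.array([0.0, 1.0])
t = np.pi / delta
amp = (V[0] * np.exp(-1j * w * t) * c).sum()      # amplitude on |w>
p_max = 1 / (1 + N * eps_w**2 / 4)                # peak value of (36)
assert np.isclose(abs(amp) ** 2, p_max, atol=1e-3)
```

Here Nε_ω²/4 = 0.25, so the peak probability drops to 0.8 rather than 1, matching (36).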

3.3 Reducible Non-diagonal Noise

In this section, we extend the errors to the non-diagonal terms of the adjacency matrix H_a. We identify the error patterns of the affected underlying system Hamiltonian for which the dimensionality reduction by the Lanczos algorithm remains feasible, and we then characterize those error patterns.

Suppose non-negative errors ε_ij, where ∀i, j ∈ [1, N] and i ≠ j, occur across the original adjacency matrix H_a. So that this scenario is not mixed with the static diagonal disorder, let ε_ii = 0, let the perturbed adjacency matrix be H_a′ with H_a′(i, j) = H_a(i, j) − ε_ij, and let the index of ω be 1 in the system. Our goal is to make sure that the Lanczos algorithm terminates after two iterations to guarantee the desired dimensionality. We are under such a constraint because we want to apply the theorems developed in the previous sections.

Without loss of generality, let |ω⟩ be the solution state and the first normal basis vector. In the first iteration of Lanczos, we know ⟨ω|ω1⟩ = 0, where |ω1⟩ = H_a′|ω⟩ = Σ_{i=2}^N (1 − ε_{i1})|i⟩. Let |v2⟩ = |ω1⟩/‖|ω1⟩‖ be the second normal basis vector. In the second iteration, let |ω2⟩ = H_a′|v2⟩. For Lanczos to terminate at this stage, the first condition is that |ω2⟩ must be a linear combination of |ω⟩ and |v2⟩:

$$|\omega_2\rangle = c_1|\omega\rangle + c_2|v_2\rangle. \qquad (37)$$

When the first condition is met, we have ⟨ω2|v2⟩ = c2 since |v2⟩ ⊥ |ω⟩. As we proceed with the purification process of Lanczos, we must make sure that

$$|\omega_2'\rangle = |\omega_2\rangle - \langle\omega_2|v_2\rangle|v_2\rangle - \||\omega_1\rangle\|\,|\omega\rangle = 0 \qquad (38)$$

such that the Lanczos algorithm stops. This is the second condition. It is thus desired to have c1 = ‖|ω1⟩‖ due to (37), (38). The corresponding reduced Hamiltonian H_ra in the (|ω⟩, |v2⟩) basis with noise is

$$H_{ra} = \begin{bmatrix} 0 & c_1 \\ c_1 & c_2 \end{bmatrix}. \qquad (39)$$

This immediately implies

$$c_1 = \sqrt{\sum_{i=2}^{N}(1-\epsilon_{i1})^2} \qquad (40)$$

for the second condition to be satisfied. Next we need to examine the constraints on c1, c2 such that the first condition is satisfied. We need to show that

$$H_a'|v_2\rangle = \frac{1}{\||\omega_1\rangle\|}\left[\left(\sum_{i=2}^{N}(1-\epsilon_{i1})(1-\epsilon_{1i})\right)|\omega\rangle + \sum_{k=2}^{N}\sum_{\substack{i=2 \\ i\neq k}}^{N}(1-\epsilon_{ki})(1-\epsilon_{i1})|k\rangle\right] \qquad (41)$$

is the same as (37). This immediately gives us the requirement on the noise pattern that, ∀k ∈ [2, N],

$$\sum_{\substack{i=2 \\ i\neq k}}^{N}(1-\epsilon_{ki})(1-\epsilon_{i1}) = c_2(1-\epsilon_{k1}). \qquad (42)$$

For c1, by use of (40), (41) and the fact that c1 = ‖|ω1⟩‖, we know c1² must satisfy

$$\sum_{i=2}^{N}(1-\epsilon_{i1})^2 = \sum_{i=2}^{N}(1-\epsilon_{i1})(1-\epsilon_{1i}). \qquad (43)$$

From (42), it is clear that we do not have constraints on the error variables, as we can simply compute the value of c2 based on the given errors. However, from (43), the error variables must obey this relation. There are many feasible scenarios, and the simplest scenario is that the errors are symmetric.
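One feasible pattern satisfying both conditions is a uniform symmetric error on every edge of a complete graph. The sketch below (ours, with illustrative values) runs the two Lanczos steps explicitly and confirms that the residual (38) vanishes, with c1 and c2 as in (39), (40):

```python
import numpy as np

N, eps = 50, 0.02
Hp = (1 - eps) * (np.ones((N, N)) - np.eye(N))    # perturbed adjacency H_a'

w = np.eye(N)[0]                                  # |w>, index 1 in the text
w1 = Hp @ w                                       # first Lanczos vector (unnormalized)
c1 = np.linalg.norm(w1)                           # (40): sqrt(sum (1-eps_i1)^2)
v2 = w1 / c1                                      # second normal basis vector
w2 = Hp @ v2                                      # second Lanczos iteration
c2 = w2 @ v2
resid = w2 - c2 * v2 - c1 * w                     # purification step (38)
assert np.allclose(resid, 0)                      # Lanczos terminates: reducible
assert np.isclose(c1, (1 - eps) * np.sqrt(N - 1))
assert np.isclose(c2, (1 - eps) * (N - 2))
H_r = np.array([[0, c1], [c1, c2]])               # the reduced matrix of (39)
```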

4 Discussion
The notion of invariant subspaces [10] of continuous-time quantum walk
(CTQW) problems is a powerful technique that simplifies the analyses of vari-
ous quantum walk related studies such as the spatial search algorithm, quantum
transport, and quantum state transfer. In essence, it maps a spatial search algo-
rithm to a transport problem on a reduced graph. The dimensional reduction is
purposely constructed to preserve the dynamical evolution of a walker. Hence,
any quantum walker optimization on a reduced graph guarantees an optimiza-
tion on the original graph. In this work, we apply this technique to deduce an
Optimal Dimensionality Reduced Quantum Walk and Noise Characterization 927

appropriate coupling factor for the underlying CTQW to run optimally (keeping
the quadratic speed-up with running time O(√N)) for a spatial search. We generalize the result in [10] from complete graphs (CG), complete bipartite graphs
(CBG) and star graphs (SG) to uniform complete P-partite graphs (UCPG). It
is clear that UCPG could be non-regular or regular based on the constraints we
impose. More specifically, we (1) derive the formula for the coupling factor γ
and (2) show that CTQW constructed based on our choice of coupling factor
will remain optimal.
The proof of the optimality is two-fold. The speed of the CTQW is based
on (1) the transport efficiency between the two lowest energy eigenstates (one
is the marked state |ω⟩ and the other is |e1⟩) and (2) the overlap between
the initial state |s⟩ and |e1⟩ in the invariant subspace. We showed that the
transport efficiency preserved the quadratic speed-up and the overlap is some
constant that does not scale with the inverse of N . Therefore, the CTQW search
based on the coupling factor determined by our approach will remain optimal.
A high-dimensional Hamiltonian could not be implemented on near-term
quantum technology if we tried to encode the given original configuration
directly. However, with our reduction scheme and coupling-factor determination
approach, we can implement the dynamics of the high-dimensional Hamiltonian
with very few quantum bits, and the quantum walker remains optimal while
searching on the reduced system. Furthermore, since quantum systems are
susceptible to noise, we characterize the noise pattern for three types of error
distribution: systematic disorder, static diagonal disorder, and reducible
non-diagonal disorder. For the first two cases, no specific pattern is required,
while for reducible non-diagonal errors the pattern must satisfy (43) so that
applying our coupling factors in an experiment keeps the quantum walker search
optimal, as the pattern guarantees systematic dimensionality reduction to a
3 × 3 Hamiltonian.

A Appendix A: Reduction Using Lanczos Algorithm

Algorithm 2. Lanczos Algorithm

Require: A Hermitian matrix A of size N × N and, optionally, a number of iterations
m. By default m = N, but in our case we desire m = 3.
Ensure: An orthonormal basis for A.
Let $|\omega\rangle = v_1$, then let $w_1' = A v_1$, $\alpha_1 = w_1'^{*} v_1$ and $w_1 = w_1' - \alpha_1 v_1$.
For $j = 2, \dots, m$, do:
1. Let $\beta_j = \|w_{j-1}\|$
2. If $\beta_j \neq 0$, then let $v_j = w_{j-1}/\beta_j$
3. Let $w_j' = A v_j$, $\alpha_j = w_j'^{*} v_j$
4. Let $w_j = w_j' - \alpha_j v_j - \beta_j v_{j-1}$.

If an N × N matrix is reduced to a 3 × 3 matrix by the Lanczos algorithm, the
reduced matrix in the $\{v_1, v_2, v_3\}$ basis is

$$\begin{bmatrix} \alpha_1 & \beta_2 & 0 \\ \beta_2 & \alpha_2 & \beta_3 \\ 0 & \beta_3 & \alpha_3 \end{bmatrix} \tag{44}$$
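The reduction above can be sketched in NumPy (an illustrative implementation under the conventions of Algorithm 2, not the paper's code):

```python
import numpy as np

def lanczos(A, m=3):
    """Reduce a Hermitian N x N matrix A to an m x m tridiagonal matrix T.

    Returns (T, V), with the orthonormal Lanczos vectors as columns of V,
    so that V.conj().T @ A @ V is numerically equal to T.
    """
    N = A.shape[0]
    V = np.zeros((N, m), dtype=A.dtype)
    T = np.zeros((m, m), dtype=A.dtype)
    V[0, 0] = 1.0                      # starting vector v1 (|omega> above)
    w = A @ V[:, 0]
    alpha = np.vdot(V[:, 0], w)
    w = w - alpha * V[:, 0]
    T[0, 0] = alpha
    for j in range(1, m):
        beta = np.linalg.norm(w)
        if beta == 0:                  # invariant subspace found early
            break
        V[:, j] = w / beta
        w = A @ V[:, j]
        alpha = np.vdot(V[:, j], w)
        w = w - alpha * V[:, j] - beta * V[:, j - 1]
        T[j, j] = alpha
        T[j - 1, j] = T[j, j - 1] = beta
    return T, V

# Example with a random real symmetric (hence Hermitian) matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
A = (A + A.T) / 2
T, V = lanczos(A, m=3)
```

For this test matrix, `V.T @ A @ V` reproduces the tridiagonal matrix of the form shown in (44).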

By using the Lanczos algorithm on a UCPG configuration given in Sect. 2, we
start with $v_1 = |\omega\rangle$ and $A = H_a$. Immediately we know that $\alpha_1 = 0$, which
leads to $\beta_2 = \sqrt{N - m_0}$. At iteration $j = 2$, we can obtain

$$v_2 = \frac{1}{\sqrt{N - m_0}} \sum_{i \in V,\, i \notin V_0} |i\rangle \tag{45}$$

$$w_2' = \frac{1}{\sqrt{N - m_0}} \Big( (N - m_0)|\omega\rangle + (N - m_0) \sum_{i \in V_0,\, i \neq \omega} |i\rangle + (N - m_0 - m_1) \sum_{i \in V,\, i \notin V_0} |i\rangle \Big) \tag{46}$$

$$w_2 = \frac{1}{\sqrt{N - m_0}} (N - m_0) \sum_{i \in V_0,\, i \neq \omega} |i\rangle \tag{47}$$

with $\alpha_2 = N - m_0 - m_1$. At iteration $j = 3$, we obtain $\beta_3 = \sqrt{(N - m_0)(m_0 - 1)}$ with

$$v_3 = \frac{1}{\sqrt{m_0 - 1}} \sum_{i \in V_0,\, i \neq \omega} |i\rangle \tag{48}$$

$$w_3' = \sqrt{m_0 - 1} \sum_{i \in V,\, i \notin V_0} |i\rangle \tag{49}$$

and then we have $\alpha_3 = 0$, $w_3 = 0$. Readers should be reminded that in (5) the
matrix is written in the $\{v_1, v_3, v_2\}$ basis, instead of the $\{v_1, v_2, v_3\}$ basis.

References
1. Aharonov, Y., Davidovich, L., Zagury, N.: Quantum random walks. Phys. Rev. A
48(2), 1687 (1993)
2. Ambainis, A.: Quantum walk algorithm for element distinctness. SIAM J. Comput.
37(1), 210–239 (2007)
3. Childs, A.M.: On the relationship between continuous- and discrete-time quantum
walks. Commun. Math. Phys. 294(2), 581–603 (2010)
4. Childs, A.M., Cleve, R., Deotto, E., Farhi, E., Gutmann, S., Spielman, D.A.: Expo-
nential algorithmic speedup by a quantum walk. In: Proceedings of the Thirty-fifth
Annual ACM Symposium on Theory of Computing, pp. 59–68. ACM (2003)
5. Childs, A.M., Goldstone, J.: Spatial search by quantum walk. Phys. Rev. A 70(2),
022314 (2004)

6. Childs, A.M., Schulman, L.J., Vazirani, U.V.: Quantum algorithms for hidden non-
linear structures. In: 48th Annual IEEE Symposium on Foundations of Computer
Science, FOCS 2007, pp. 395–404. IEEE (2007)
7. Farhi, E., Goldstone, J., Gutmann, S.: A quantum algorithm for the Hamiltonian
NAND tree. arXiv preprint quant-ph/0702144 (2007)
8. Farhi, E., Gutmann, S.: Quantum computation and decision trees. Phys. Rev. A
58(2), 915 (1998)
9. Magniez, F., Santha, M., Szegedy, M.: Quantum algorithms for the triangle prob-
lem. SIAM J. Comput. 37(2), 413–424 (2007)
10. Novo, L., Chakraborty, S., Mohseni, M., Neven, H., Omar, Y.: Systematic dimen-
sionality reduction for quantum walks: optimal spatial search and transport on
non-regular graphs. Sci. Rep. 5 (2015)
11. Novo, L., Chakraborty, S., Mohseni, M., Omar, Y.: Environment-assisted analog
quantum search. arXiv preprint arXiv:1710.02111 (2017)
12. Yang, Y.-G., Zhao, Q.-Q.: Novel pseudo-random number generator based on quan-
tum random walks. Sci. Rep. 6, 20362 (2016)
Implementing Dual Marching Square
Using Visualization Tool Kit (VTK)

Manu Garg(✉) and Sudhanshu Kumar Semwal

Department of Computer Science, University of Colorado, Colorado Springs, USA
manugarg27@gmail.com, ssemwal@uccs.edu

Abstract. In the past few decades, volume rendering has been perhaps one of the most
visited research topics in the field of scientific visualization. Since volume data‐
sets are large and require considerable computing power to process, the issue of
supporting real time interaction has received much attention. Extracting a polyg‐
onal mesh from an existing scalar field identified in the volume data has been the
focus since the 1980s. Many algorithms, like Marching Cubes and its 2D
interpretation Marching Squares, have been developed to extract the polygonal mesh from the
scalar interpretation of the volume data. Only a few of these techniques claim to
solve all known existing problems due to concave nature of the surfaces embedded
inside the volume data. Some extract meshes with too many polygons. Many such
polygons, with the same orientation, could be combined. Sharp features or small
detail in the underlying surface could be lost due to polygonal approximation.
Other techniques suffer from topological inconsistencies, self-intersections, inter-
cell dependencies, and other similar issues. Dual Marching Cubes and its
interpretation in 2D, Dual Marching Squares (DMS), produce smoother results in
comparison to the Marching Square algorithm. In this paper, we implement DMS
using the VTK pipeline. Renderings of MS and DMS are provided.

Keywords: Scientific visualization · Volume visualization · Dual marching squares · VTK pipeline

1 Introduction

The term visualization, as Ware [1] describes it, means the construction of a visual image
in the mind (Oxford English Dictionary 1973). Earliest visualizations can be found in
Chinese cartography from the year 1137 [2]. Scientific visualization uses computer
graphics and Human Computer Interaction (HCI) techniques to process numerical data
into two- and three-dimensional visual images. This visualization process includes
gathering, processing, displaying, analyzing, and interpreting data. Volume visualiza‐
tion is a set of techniques used to extract meaningful information from volumetric data
using image processing and interactive graphics techniques. Volume datasets can be
collected by sampling, simulation, or modeling techniques. For example, Computed
Tomography (CT) can be used to get a sequence of 2D slices or Magnetic Resonance
Imaging (MRI) data set [1–42]. Other applications of volume rendering are in Compu‐
tational Fluid Dynamics (CFD) [32] and Volume CAD (V-CAD) [33]. Surface

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 930–940, 2019.
https://doi.org/10.1007/978-3-030-02686-8_69

extraction methods include contour tracking [3], opaque cubes [4], marching cubes [5],
marching tetrahedra [6], and dividing cubes [7]. Sometimes these techniques can
generate false positives (spurious surfaces identified as e.g. cancer) or false negatives
(erroneous holes in surfaces or missing cancerous cells), particularly in the presence of
small or poorly defined features [34]. As geometric information of the objects (voxels)
is generally not retained, this may cause difficulties when rendering discrete
surfaces [34], especially those obtained from the discretized volume data set. In response
to the problems mentioned above, direct volume rendering techniques were developed that
attempt to capture the entire 3D data set as a 2D image projection. Volume rendering
techniques convey more information than surface rendering methods, but at the cost of
usually increased algorithm complexity, and consequently increased rendering times.
One of the most basic volume rendering algorithms is ray casting [36, 38]. There are
multiple ways to find iso-contours on a 2D scalar field. One popular way is the Marching
Squares (MS) algorithm which will be described in more detail later in this paper. The
Marching Cube (MC) has been incorporated in many ways since its introduction [18].
Many of its extensions (e.g., [19]) are analogues for 2D scalar fields. One MC extension
is Dual Marching Cubes (DMC), as proposed in [20]. DMC can produce a smoother
contour than MC in some cases. DMC [21] reduces MC's disadvantage of creating
many triangles even in flat areas where they are not needed. A 2D analogue of DMC,
called Dual Marching Squares (DMS), is explained in [16] and is also implemented
in this paper using the Visualization Tool Kit (VTK).

2 Volume Data

The medical data is in most cases a set of discrete samples in three-dimensional space [13],
which produce a volumetric dataset, as in the case of Magnetic Resonance Imaging (MRI)
scans where set of slices define the 3D volume data set. In most cases, this data is rendered
straight away with the help of a volumetric renderer, which usually produces a gray-scale
image of the rendered region. Sometimes a boundary representation of a layer (2D slice)
should be constructed. This is where an isosurface extraction algorithm can create a polyg‐
onal representation of a certain isolevel of the provided discrete scalar field. Volumes are
special cases of scalar data: regular 3D grids of scalars, typically interpreted as density
values. Each data value is contained inside a cubic cell, or voxel. Typical scalar volume data
is composed of a 3-D array of data and three coordinate arrays of the same dimensions. The
coordinate arrays specify the x, y, and z coordinates for each data point.
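As a small illustrative sketch of this layout (synthetic data, not the paper's), NumPy's `meshgrid` produces the 3-D data array and matching coordinate arrays:

```python
import numpy as np

# A tiny synthetic volume: one density value per voxel on a regular 3-D grid.
nx, ny, nz = 16, 16, 16
x, y, z = np.meshgrid(np.linspace(-1, 1, nx),
                      np.linspace(-1, 1, ny),
                      np.linspace(-1, 1, nz),
                      indexing="ij")
density = np.exp(-(x**2 + y**2 + z**2))   # a smooth scalar field

# The data array and the three coordinate arrays share the same dimensions,
# exactly as described above.
print(density.shape == x.shape == y.shape == z.shape)  # True
```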

3 Iso-Surface Extraction

The earliest examples date from 1975 by Keppel [3]. Marching Cubes (MC) by Lorensen and Cline
[8] is probably one of the most well-known algorithms in Computer Graphics and by far the
most cited resource in the field. Surface extraction methods could be grouped into three
classes based on the approach they take. (a) Cellular approaches in Allgower and Gnutz‐
mann [23]; (b) Delaunay-based or Particle approaches, as described in Szeliski and
Tonnesen [24], Witkin and Heckbert [25], Marching Triangles (MT) technique in [26] and

improvement provided in Akkouche et al. [27]; (c) morphing of the data as in [35], element-driven
approaches in [27], and Crespin et al. [28]. Shrink-wrap approaches handle arbitrary
geometry; Bottino et al. [31] provide a global surface algorithm.
Marching Square: Before proceeding it is important that some of the main concepts
of the foundational techniques are briefly described. MS is a special case of the MC
algorithm, restricted to two-dimensional space. Therefore, it is used for the extraction
of isocurves and isolines. This method gives a piecewise-linear approximation to a
two-dimensional object based on four vertices. The 16 possible configurations of four
vertices reduce to the following four unique situations (Fig. 1).

Fig. 1. The 4 unique configurations of the Marching Squares algorithm, which are necessary to
reproduce all others.
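The classification step can be sketched in a few lines of Python (an illustrative sketch, not the paper's implementation): each of the four corner tests contributes one bit, yielding the 16 cases, and crossing points are found by linear interpolation along cell edges.

```python
# Corner order: (0,0), (1,0), (1,1), (0,1); a corner is "inside" if its
# value is >= the isovalue, giving one of the 16 Marching Squares cases.

def cell_case(v00, v10, v11, v01, iso):
    case = 0
    if v00 >= iso: case |= 1
    if v10 >= iso: case |= 2
    if v11 >= iso: case |= 4
    if v01 >= iso: case |= 8
    return case

def interp(p, q, vp, vq, iso):
    """Linear interpolation of the isocontour crossing on edge p-q."""
    t = (iso - vp) / (vq - vp)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

# Example: only the (0,0) corner above the isovalue -> case 1, a corner cut.
print(cell_case(1.0, 0.0, 0.0, 0.0, iso=0.5))       # 1
print(interp((0, 0), (1, 0), 1.0, 0.0, iso=0.5))    # (0.5, 0.0)
```

A full implementation would look the case index up in a 16-entry table of edge pairs to emit the contour segments.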

Comments on the 3D Version of Marching Squares, Called Marching Cubes: In 1987,
Lorensen and Cline [8] present an algorithm that creates a triangular mesh for medical
data. Known as “marching cubes” due to the way it “marches” from one to the next, the
algorithm is considered to be the basic method for surface rendering in applications.
They use Marching Cubes (MC15) to process computer tomography slices in scan line
order, while maintaining inter-slice connectivity, containing 15 configurations of
possible surface intersections. Nielson et al. [9] found that MC15 has no topological
guarantees for consistency and produces visual hull surfaces containing small holes or
cracks due to certain voxel face ambiguities. They proposed a modification to MC15
that implements face tests to resolve the ambiguities. This modification still does not
guarantee the correct topology either. In 1995, Chernyaev [10] showed that there are 33
topologically distinct intersections, not 15; Chernyaev's algorithm is referred to as MC33.
Montani et al. also noted topological inconsistency, computational inefficiency, and
excessive data fragmentation as disadvantages of MC15. They propose a method to
minimize the number of triangular patches specified in the marching cube surface lookup
table, reducing the amount of data output and improving the computational efficiency.
In a paper by Lewiner et al. [11], an efficient and robust implementation of Chernyaev's
MC33 algorithm is described. Tarini et al. [12] developed a fast and efficient version of
the marching cubes algorithm, called marching intersections, to implement a volumetric
based visual hull extraction technique.
Dual Marching Cubes: The DMC [20] algorithm bases its structure on the MC
algorithm, but improves it in many ways. In DMC, the dual of an octree is tessellated via
the standard marching cubes method. This algorithm eliminates or reduces poorly
shaped triangles and irregular or crooked specular highlights. The DMC always gener‐
ates topological manifold surfaces. Nielson unifies MC surface fragments to polygonal
patches where the vertices of these patches are located on the lattice edges. Since each
lattice edge is adjacent to four cells, each patch vertex is touched by four patches. The

dual surface is now defined (1) by replacing each patch by a vertex; and (2) by replacing
each patch vertex by a quadrilateral face. In contrast to DC, this approach results in a
classification of 23 cell configurations that are dual to the 23 MC configurations required
for extracting topological manifolds. Each configuration may create up to 4 vertices and
the connectivity is well defined via the lattice edges. More precisely, when a lattice edge
intersects the isosurface, this edge is associated with four vertices forming a quadrilateral
surface fragment.
Dual Marching Square: DMS is the 2D analogue of DMC and is an elegant extension
of 2D MC algorithm. Dual Marching Squares can be considered as a post-processing of
the segments produced by Marching Squares. DMS appears to improve the curvedness
for at least the objects with smoothly curved boundaries as shown in Fig. 2 [42] below.
It does so by considering the dual graph of the quadtree, one of the basic hierarchical
data structures in 2D graphics. In a quadtree, a 2D region is recursively divided into
four quadrants; each quadrant is either a leaf cell or subdivided further. A quadtree
contains one type of hierarchical node and three types of terminal node (leaf, empty,
full). After building the quadtree, a dual grid can be derived and marching squares run
over that grid. In effect, DMS works at a higher resolution than MS, so we have finer
data and hence better surface results.
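A minimal recursive quadtree over a 2-D scalar field might look as follows (an illustrative sketch under assumed conventions: children stored in (x, y), (x+h, y), (x, y+h), (x+h, y+h) order, and an approximate sampling-based classification at the corners and centre):

```python
# Subdivide while the isocontour of f appears to cross the cell and the
# depth limit has not been reached. Node kinds mirror the text: one
# hierarchical "node" kind plus the "leaf", "empty", and "full" terminals.

def build_quadtree(f, x, y, size, iso, depth=0, max_depth=4):
    h = size / 2
    samples = [f(x, y), f(x + size, y), f(x + size, y + size),
               f(x, y + size), f(x + h, y + h)]   # corners + centre
    if all(v >= iso for v in samples):
        return {"kind": "full"}
    if all(v < iso for v in samples):
        return {"kind": "empty"}
    if depth == max_depth:                        # mixed cell at finest level
        return {"kind": "leaf", "x": x, "y": y, "size": size}
    return {"kind": "node",
            "children": [build_quadtree(f, x,     y,     h, iso, depth + 1, max_depth),
                         build_quadtree(f, x + h, y,     h, iso, depth + 1, max_depth),
                         build_quadtree(f, x,     y + h, h, iso, depth + 1, max_depth),
                         build_quadtree(f, x + h, y + h, h, iso, depth + 1, max_depth)]}

# Example: a disc of radius 0.4 centred in the unit square.
tree = build_quadtree(lambda px, py: 0.4 - ((px - 0.5)**2 + (py - 0.5)**2)**0.5,
                      0.0, 0.0, 1.0, iso=0.0, depth=0)
```

Note the crossing test is only sampled at five points, so it is approximate; a production implementation would use a conservative test such as a distance bound.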

Fig. 2. An interpreted curve achieved by Dual Marching Square (a) compared to Marching
Square (b) (Drawn using draw.io).

4 Implementation and Results

The DMS algorithm is implemented using the Visualization Toolkit (VTK). DMS is the 2D
analogue of Dual Marching Cubes; its contour is the dual of the contour produced by
Marching Squares. The test data "h" consist of boundaries comprised largely of smooth
curves. The first part of DMS is the generation of a quadtree (Fig. 3). This quadtree is
responsible for producing fewer triangles in flat areas and more in curved areas. The
first recursive quadtree levels generated are shown below.
After the quad trees are generated, the cells are merged whenever a cell’s distance
field is entirely determined by a single line, using the concept of Adaptively Sampled
Distance Fields (ADFs) [40]. Single flat edges are combined to create larger leaf cells
(Fig. 4). The next step is to derive the dual grid from the quadtree, where each vertex of
the quadtree is placed at the center of its square (Fig. 6). This grid is topologically dual to
the quadtree. The vertices are generated by the process defined in DMC [21]. According
to DMC, the vertices of the dual grid from which the surface will be extracted are generated
using feature isolation. For each vertex of the quadtree, a dual grid cell whose vertices
are the feature vertices inside each square of the quadtree will be created. This technique

Fig. 3. First five quad tree level for test data “h”.

helps to better interpret sharp features. Figure 7 shows the topology of the example quad
tree where each vertex is placed in the center of a square. For the test data the vertices
generated for each green square are shown in Fig. 5.

Fig. 4. Merging of cells to form a larger leaf (green color) after ADFs.

Fig. 5. Quad tree created for test shape “h” after ADFs. Leaf cells (Green Color), Empty cells
(Grey Color), and Full cells (Back Color).

Fig. 6. Dual grid (Black color) over a primary quadtree (Grey color).

Fig. 7. Vertices (Red color) of the test shape “h” for Dual Marching Square. Leaf cells (Green
Color), Empty cells (Grey Color), and Full cells (Back Color).

The feature extraction is based on the local information field and its gradients, as
given by Kobbelt [41]. This can be achieved by finding the position and normal for all
the points along the cell sides that intersect with the isocontour, then performing a
least-squares fit to find the feature position (vertex). Once the vertices are generated, the
next step is to join them together to form the contour. This can be done by marching
using three functions similar to the ones defined in DMC [21]. (a) faceProc: called on
one cell; it returns nothing if the cell is a leaf. For an internal cell, it calls itself on each
subtree, calls hEdgeProc on each horizontal pair of cells, and calls vEdgeProc on each
vertical pair of cells. (b) vEdgeProc: called on a vertical pair of cells; if the cells are
both leaves, it creates the contour between the two cells, else it calls itself again on their
adjacent children. (c) hEdgeProc: the same as vEdgeProc, but works on a horizontal
pair of cells. Here are the results generated by MS and DMS (Fig. 8).

Fig. 8. Left: "h" generated using Marching Square. Right: "h" generated using Dual Marching
Square.
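Under stated assumptions about the quadtree representation (children in (NW, NE, SW, SE) order, one precomputed dual "vertex" per leaf), the recursion of faceProc, hEdgeProc, and vEdgeProc can be sketched as:

```python
# Illustrative traversal only: a real implementation would also test whether
# the shared edge actually crosses the isocontour before emitting a segment.

segments = []

def is_leaf(cell):
    return cell["children"] is None

def face_proc(cell):
    if is_leaf(cell):
        return                       # leaves contribute nothing on their own
    nw, ne, sw, se = cell["children"]
    for child in (nw, ne, sw, se):
        face_proc(child)
    h_edge_proc(nw, ne)              # horizontal pairs
    h_edge_proc(sw, se)
    v_edge_proc(nw, sw)              # vertical pairs
    v_edge_proc(ne, se)

def h_edge_proc(left, right):
    if is_leaf(left) and is_leaf(right):
        segments.append((left["vertex"], right["vertex"]))
        return
    l_ne, l_se = (left, left) if is_leaf(left) else (left["children"][1], left["children"][3])
    r_nw, r_sw = (right, right) if is_leaf(right) else (right["children"][0], right["children"][2])
    h_edge_proc(l_ne, r_nw)          # recurse on the children touching the edge
    h_edge_proc(l_se, r_sw)

def v_edge_proc(top, bottom):
    if is_leaf(top) and is_leaf(bottom):
        segments.append((top["vertex"], bottom["vertex"]))
        return
    t_sw, t_se = (top, top) if is_leaf(top) else (top["children"][2], top["children"][3])
    b_nw, b_ne = (bottom, bottom) if is_leaf(bottom) else (bottom["children"][0], bottom["children"][1])
    v_edge_proc(t_sw, b_nw)
    v_edge_proc(t_se, b_ne)

def leaf(v):
    return {"children": None, "vertex": v}

# A one-level tree with four leaf cells yields four dual segments.
root = {"children": [leaf("nw"), leaf("ne"), leaf("sw"), leaf("se")], "vertex": None}
face_proc(root)
print(segments)
```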

5 Results and Discussions

Dual Marching Square Algorithms. Here the 3D isosurface is constructed by assembling
the set of 2D contours located on a set of parallel slices, displayed in Figs. 9,
10, 11 and 12. The Visible Human dataset is dense, i.e., the slice planes are close to each
other and do not exhibit too-sharp variations. So the 3D isosurface is constructed by
connecting points on an isoline with the closest points on the isolines of the previous and
next slices [42]. Figures 9, 10, 11 and 12 show the results obtained using the Marching
Square and Dual Marching Square interpretations [42]. Times in seconds are equivalent;
the quality of the images differs, as DMS produces more geometry than MS to capture
curvedness and corners. The implementation used VTK library functions for Dual
Marching Square. In future work, we hope to further study other aspects of DMS
behavior. One suggested optimization is adaptive sampling, similar to ADFs [37];
another is a GPU implementation. Lastly, we also need measures of generated surface
quality for the given volume data, so that we can compare the surfaces generated by
DMS and DMC and provide better quantitative analysis of our results [42]. Because
2D contours on a slice are used for 3D

Fig. 9. Left Female head_Front Marching Square (duration 0:00:00.246355) Right Female
head_Front Dual Marching Square (duration 0:00:00.247965).

rendering, side effects are created (Figs. 9, 10, 11 and 12), which need to be further
explored in future work.
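The nearest-point stitching between adjacent slices can be sketched as follows (an illustrative simplification of the assembly described above; the contour representation and pairing rule are assumptions):

```python
# Each point on one slice's isoline is linked to the closest point on the
# neighbouring slice's isoline, producing 3-D edges between the two contours.

def nearest_index(p, contour):
    return min(range(len(contour)),
               key=lambda i: (contour[i][0] - p[0])**2 + (contour[i][1] - p[1])**2)

def stitch(contour_a, contour_b, z_a, z_b):
    """Return 3-D edges linking contour_a (at height z_a) to contour_b (at z_b)."""
    edges = []
    for p in contour_a:
        q = contour_b[nearest_index(p, contour_b)]
        edges.append(((p[0], p[1], z_a), (q[0], q[1], z_b)))
    return edges

# Example: a square contour and a slightly shifted copy on the next slice.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(0.1, 0.1), (1.1, 0.1), (1.1, 1.1), (0.1, 1.1)]
edges = stitch(square, shifted, z_a=0.0, z_b=1.0)
print(len(edges))  # 4
```

Greedy nearest-point pairing like this is what can produce the side effects noted above when neighbouring contours differ sharply.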

Fig. 10. Left Female Head_side Marching Square (duration 0:00:00.246355) Right Female
Head_side Dual Marching Square (duration 0:00:00.247965).

Fig. 11. Left Female Eye_ Front Marching Square (duration 0:00:00. 213779), Right Female
Eye_Front Dual Marching Square (duration 0:00:00. 223355).

Fig. 12. Left Female_Ear_Front Marching Square (duration 0:00:00.244979), Right


Female_Ear_Front Dual Marching Square(duration 0:00:00. 253785).

Acknowledgments. The authors are indebted to VTK Community – several existing functions
from VTK toolbox were used to interpret the existing medical data to generate images and their
renderings shown in this paper (Fig. 9, 10, 11 and 12).

References

1. Ware, C.: Information Visualization: Perception for Design, 2nd edn. Morgan Kaufmann
(2004)
2. Collins, B.M.: Data visualization - has it all been seen before? In Earnshaw, R.A., Watson,
D. (eds.) Animation and Scientific Visualization – Tools and Applications, 1st edn., pp. 3–
28. Academic Press, London (1993). Chapter 1
3. Keppel, E.: Approximating complex surfaces by triangulation of contour lines. IBM J. Res.
Dev. 19(1), 2–11 (1975)
4. Herman, G.T., Liu, H.K.: Three-dimensional display of human organs from computed
tomograms. Comput. Graph. Image Process. 9(1), 1–21 (1979)
5. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction
algorithm. Comput. Graph. 21(4), 163–168 (1987)
6. Shirley, P., Tuchman, A.: A polygonal approximation to direct scalar volume rendering.
Comput. Graph. 24(5), 63–70 (1990)
7. Cline, H.E., Lorensen, W.E., Ludke, S., Crawford, C.R., Teeter, B.C.: Two algorithms for
the reconstruction of surfaces from tomograms. Med. Phys. 15(3), 320–327 (1988)
8. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface
construction algorithm. Comput. Graph. 21(4), 163–169 (1987)
9. Nielson, G.M., Hamann, B.: The asymptotic decider: resolving the ambiguity in marching
cubes. In: Proceedings of Visualization 1991, pp. 29–38, October 1991
10. Chernyaev, E.V.: Marching cubes 33: construction of topologically correct isosurfaces.
Technical report CN 95-17, CERN (1995)
11. Lewiner, T., Lopes, H., Viera, A.W., Tavares, G.: Efficient implementation of marching cubes
cases with topological guarantees. J. Graph. Tools 8(2), 1–15 (2003)
12. Tarini, M., Callieri, M., Montani, C., Rocchini, C., Olsson, K., Persson, T.: Marching
intersections: an efficient approach to shape-from-silhouette. In: 7th International Fall
Workshop on Vision Modeling, and Visualization, November 2002
13. Lichtenbelt, B., Crane, R., Naqvi, S.: Introduction to Volume Rendering. Prentice-Hall Inc,
Upper Saddle River (1998)
14. Elvins, T.T.: A survey of algorithms for volume visualization. Comput. Graph. 26(3), 194–
201 (1992)
15. Brodlie, K., Wood, J.: Recent advances in volume visualization. Comput. Graph. Forum
20(2), 125–148 (2001)
16. Gong, S., Newman, T.S.: Dual marching squares: description and analysis. In: 2016 IEEE
Southwest Symposium on Image Analysis and Interpretation (SSIAI), Santa Fe, NM, pp. 53–
56 (2016)
17. Freeman, H.: Computer processing of line-drawing images. ACM Comput. Surv. 6(1), 57–
97 (1974)
18. Newman, T., Yi, H.: A survey of the marching cubes algorithm. Comput. Graph. 30(5), 854–
879 (2006)
19. Treece, G., Prager, R., Gee, A.: Regularised marching tetrahedra: improved iso-surface
extraction. Comput. Graph. 23(4), 583–598 (1999)
20. Nielson, G.: Dual marching cubes. Proc. Vis. 04, 489–496 (2004)
21. Schaefer, S., Warren, J.: Dual marching cubes: primal contouring of dual grids. In: 12th
Pacific Conference on Computer Graphics and Applications, PG 2004. Proceedings, pp. 70–
76 (2004)
22. Bloomenthal, J., Wyvill, B. (eds.): Introduction to Implicit Surfaces. Morgan Kaufmann
Publishers Inc., San Francisco (1997)

23. Allgower, E.L., Gnutzmann, S.: Simplicial pivoting for mesh generation of implicitly defined
surfaces. Comput. Aided Geom. Des. 8(4), 305–325 (1991)
24. Szeliski, R., Tonnesen, D.: Surface modeling with oriented particle systems. In: Proceedings
of the 19th Annual Conference on Computer Graphics and Interactive Techniques,
SIGGRAPH 1992, pp. 185–194. ACM, New York (1992)
25. Witkin, A.P., Heckbert, P.S.: Using particles to sample and control implicit surfaces. In:
Proceedings of the 21st Annual Conference on Computer Graphics and Interactive
Techniques, SIGGRAPH 1994, pp. 269–277. ACM, New York (1994)
26. Hilton, A., Stoddart, A.J., Illingworth, J., Windeatt, T.: Marching triangles: range image
fusion for complex object modelling. ICIP 2, 381–384 (1996)
27. Akkouche, S., Galin, E., Centrale, E.: Adaptive implicit surface polygonization using
marching triangles. Comput. Graph. Forum 20, 67–80 (2001)
28. Desbrun, M., Tsingos, N., Gascuel, M.-P.: Adaptive sampling of implicit surfaces for
interactive modeling and animation. Comput. Graph. Forum, 171–185 (1995)
29. Crespin, B., Guitton, P., Schlick, C.: Efficient and accurate tessellation of implicit sweep
objects. In: Constructive Solid Geometry, pp. 49–63 (1998)
30. Wyvill, G., Kunii, T.L., Shirai, Y.: Space division for ray tracing in CSG. IEEE Comput.
Graph. Appl. 6(4), 28–34 (1986)
31. Bottino, A., Nuij, W., Overveld, K.V.: How to shrinkwrap through a critical point: an
algorithm for the adaptive triangulation of isosurfaces with arbitrary topology. In:
Proceedings Implicit Surfaces 1996, pp. 53–72 (1996)
32. Ebert, D.S., Yagel, R., Scott, J., Kurzion, Y.: Volume rendering methods for computational
fluid dynamics visualization. In: IEEE Conference on Visualization 1994, Proceedings,
Washington, DC, pp. 232–239, CP26 (1994)
33. Kase, K., Teshima, Y., Usami, S., Ohmori, H., Teodosiu, C., Makinouchi, A.: Volume CAD.
In: Proceedings of the 2003 Eurographics/IEEE TVCG Workshop on Volume Graphics,
Tokyo, Japan, 07–08 July 2003
34. Kaufman, A., Cohen, D., Yagel, R.: Volume graphics. IEEE Comput. 26(7), pp. 51–64, July
1993
35. Semwal, S.K., Chandrashekher, K.: 3D morphing for volume data. In: The 18th Conference
in Central Europe, on Computer Graphics, Visualization, and Computer Vision, WSCG 2005
Conference, pp. 1–7, January 2005
36. Buchanan, D.L., Semwal, S.K.: A new front to back composition technique for volume
rendering. In: Chua, T.S., Kunii, T.L. (eds.) CG International 1990. Springer, Tokyo (1990)
37. Frisken, S.F., Perry, R.N., Rockwood, A.P., Jones, T.R.: Adaptively sampled distance fields:
a general representation of shape for computer graphics. In: Proceedings of the 27th Annual
Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 2000, vol. 78,
pp. 249–254. ACM Press/Addison-Wesley Publishing Co., New York (2000)
38. Swann, P.G., Semwal, S.K.: Volume rendering of flow-visualization point data. In: Nielson,
G.M., Rosenblum, L. (eds.) Proceedings of the 2nd Conference on Visualization 1991 (VIS
1991), pp. 25–32. IEEE Computer Society Press, Los Alamitos (1991)
39. Clunie, D.A.: DICOM Structured Reporting. PixelMed Publishing (2000)
40. Frisken, S.F., Perry, R.N., Rockwood, A.P., Jones, T.R.: Adaptively sampled distance fields:
a general representation of shape for computer graphics. In: Proceedings of the 27th Annual
Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 2000), pp. 249–
254. ACM Press/Addison-Wesley Publishing Co., New York (2000)

41. Kobbelt, L.P., Botsch, M., Schwanecke, U., Seidel, H.-P.: Feature sensitive surface extraction
from volume data. In: Proceedings of the 28th Annual Conference on Computer Graphics
and Interactive Techniques, (SIGGRAPH 2001), pp. 57–66. ACM, New York (2001)
42. Garg, M.: Dual Marching Squares: implementation and analysis using VTK. MS thesis,
Supervisor: Sudhanshu Kumar Semwal, Department of Computer Science, University of
Colorado, Colorado Springs, pp. 1–63 (2017)
Procedural 3D Tile Generation for Level Design

Anthony Medendorp(✉) and Sudhanshu Kumar Semwal

Department of Computer Science, University of Colorado, Colorado Springs, CO, USA
anthonymedendorp1@gmail.com, ssemwal@uccs.edu

Abstract. Procedural level generation in game design can reduce the resources
needed in various aspects of game development while still providing a robust and
re-playable game experience for the player. Procedural level design is most
frequently seen in rogue-like adventure games but, because of the great potential
of this approach, should not be limited to just a single genre or design style.
Through research into procedural generation via programming, and practical
experience as a 3D artist, these two contrasting and historically separated sides
of game design can be united to create a more coherent and practical approach.
In this paper, we will focus on implementing and addressing technical challenges
of 3D tile generation and their movement which is the main contribution of our
work.

Keywords: Unity3d™ · Platformer · Rogue-like · Game design · Puzzle · Procedural generation

1 Introduction

Platform adventure games provide the player with a sense of wonder and excitement as
they explore new lands and overcome challenging obstacles on their quests. Banjo-
Kazooie is an example of a platformer, where the player plays as a bear and bird
exploring various lands while collecting an assortment of items. The downside of plat‐
form adventure games is that they don’t offer the player any variety upon multiple play‐
throughs. In today’s gaming world re-playability has become a major selling point for
games. Roguelike games such as Binding of Isaac, Spelunky, and Darkest Dungeon offer
the player more value for the money, as each playthrough of the game is different,
keeping the player coming back for more. Our goal was to design a procedural process that merges
the best of those genres and offers players a style of game that had yet to really be
developed or even defined: one that provides the adventure of a platformer, the challenge
of puzzle-solving, the excitement of exploration, and the continued playability of a
dungeon crawler.

2 Literature Review

Green (2016) takes a coding-driven approach to procedural content generation, which
is now the standard for most games that offer a high degree of re-playability [1]. The
major downside of this approach is that the art assets it utilizes are necessarily very

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 941–949, 2019.
https://doi.org/10.1007/978-3-030-02686-8_70

simple so that they will fit nicely into a grid in all possible combinations. These are often
one-dimensional rather than multilayered grids, and they are therefore best suited for
2D games. More recently, however, game developers have begun to adapt procedural
level generation for 3D environments, resulting in games like No Man’s Sky that can
generate an entire universe that houses over eighteen quintillion possible planets to
explore, each with its own geography and vegetation. PCG (procedural content gener‐
ation) is generally explained either from a coding perspective without much attention
to the game’s visual aesthetics or, conversely, from an artistic perspective of modular
design, overlooking any in-depth consideration of the programming methodology. For
example, Tor Frick (2011) offers a comprehensive exploration of modular texture design
within Unreal Engine, but the focus is on the artwork without any discussion of the
coding involved in procedural design [4]. There are plenty of examples of artists that
employ modular design, which then could be applied to a procedural approach to level
design, but thus far, the research does not go beyond artistic considerations to explain
the programming side. 2D tile sliding puzzles are common, but 3D platform games
employing these kinds of puzzles do not currently exist; the tile sliding
mechanic has yet to be implemented in a 3D platform game. The oldest example
of a sliding puzzle is the fifteen-puzzle invented in 1880 by Noyes Chapman (Fig. 1) [5].
From the 1950s through the 1980s, sliding puzzles evolved and began employing letters
to form words. These sorts of puzzles have several possible solutions, which add another
dimension to the gameplay. In an online resource, Andrew Chapple analyzes the possible
permutations available in the 15-puzzle, setting a solid foundation for analyzing its
solvability.
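The solvability analysis rests on the classic inversion-parity rule for sliding puzzles. A minimal sketch (in Python, as a language-agnostic illustration; the board is a flat row-major list with 0 marking the blank):

```python
def is_solvable(tiles, width=4):
    """Classic inversion-parity test for a sliding puzzle.
    `tiles` is a flat row-major list with 0 marking the blank."""
    flat = [t for t in tiles if t != 0]
    inversions = sum(
        1
        for i in range(len(flat))
        for j in range(i + 1, len(flat))
        if flat[i] > flat[j]
    )
    if width % 2 == 1:
        # Odd-width boards: solvable iff the inversion count is even.
        return inversions % 2 == 0
    # Even-width boards: the blank's row, counted from the bottom, matters.
    rows = len(tiles) // width
    blank_row_from_bottom = rows - tiles.index(0) // width
    return (inversions + blank_row_from_bottom) % 2 == 1
```

For the standard 15-puzzle, this rule classifies the famous Sam Loyd board (14 and 15 swapped) as unsolvable.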

Fig. 1. In game screenshot showing player, health gauge, pickup count, and possible trap/coin
pickup.

Platformer Games: A platformer is a game in which the central challenge is to navigate
a character across a series of disconnected platforms, usually by jumping. In this project,
we focused on two subtypes of platform games: puzzle platformers and platform-
adventure games. In puzzle platformers, some sort of puzzle mechanic is typically
added to the platform navigation. The platform-adventure genre also incorporates
Procedural 3D Tile Generation for Level Design 943

challenges from some elements of action-adventure games, such as open exploration,
inventory management, and an ability system. Super Mario 64 and Banjo-Kazooie for
the Nintendo 64 are examples of 3D platform adventure games. They dominated an
entire generation of gaming consoles but experienced a significant decline in popularity
in the new millennium with the advent of the two most recent generations of game
consoles such as the Xbox 360 and PlayStation 3. Nintendo remains one of the only
major game developers to continue pursuing the market for these games.
The demand for platform-adventure games remains strong, however, and may even
be undergoing a resurgence, as evidenced by the nearly 2.5 million dollars raised through
crowdfunding on Kickstarter for the development of Yooka-Laylee, a spiritual successor
to games like Banjo-Kazooie and Donkey Kong 64. The downside of this fluctuation in
consumer interest is that the genre has suffered, especially in terms of the stagnation of
ideas. Instead of reinventing the genre as so many supporters hoped it would, Yooka-
Laylee comes across as merely a nostalgic clone, in our opinion.
By contrast, roguelike games are a subgenre of role-playing games (RPGs) that are
often characterized as ‘dungeon crawlers’. In a dungeon crawler, players navigate a
maze-like environment, typically fighting through waves of enemies and collecting
treasure along the way. Due to the simplicity of the dungeon crawler structure, this
mechanic lends itself well to procedurally generated games.

3 Motivation

By focusing on the few key features of RPGs that are relevant to this project, we can
offer a simplified definition of the roguelike subgenre, which includes two key charac‐
teristics: (1) a dungeon crawler structure that consists of (2) procedurally generated
levels. Roguelikes are mostly 2D or 2.5D games, with 3D roguelikes being much less
common. Platformers generally have a well-defined story for the player but lack re-
playability, whereas roguelikes offer re-playability but often become repetitive because
they lack a strong story to drive them. The goal and focus of our work was to take the
key traits of both genres and merge them to set the ground work for a platform adventure
game that offers the player maximum re-playability while minimizing the amount of 3D
and 2D content to be created.

4 Methodology

Unity3D™ [2–5] was used to create the framework for a third-person roguelike puzzle
platformer game using the principles of procedural generation. The initial goal was to
create a game that randomly generates its levels based on simple parameters that the
program follows. By utilizing Unity3D™'s 3D assets and tiling textures, we created a
minimal game experience that offers a large amount of variety. For typical video
games, a level designer hand-places all the assets, whereas we envisioned a program
that could place the assets at the beginning of each stage itself automatically.

Concept/Story: The core game concept for this project involves a maze-style sewer
system that players traverse from beginning to end via a three-dimensional path of
movable Blocks while also avoiding traps, hazards, and rising sewage levels. Our story
centers around a rat (the player) on a quest to save his best friend who was kidnapped
by an unknown group of enemy rats, forcing him out of his complacent life of living in
a quiet corner of the sewer, spending his days consuming cheese. The quest to rescue a
friend is the primary motivation for the narrative, but the sense of urgency is heightened
when it is revealed that the kidnappers have also poisoned the sewer system and activated
a series of pumps, causing toxic water to steadily rise. As the player flees this threat and
searches for the kidnapped friend, they are guided by their cheese dealer, who offers aid
in exchange for assistance in gathering his stolen cheese. The game’s tone is adult-
oriented, but the story elements embrace the nostalgia of children’s games, targeting
adults that grew up with early 3d platform games. In true platformer fashion, the player
periodically encounters bosses to battle at predefined intervals as they progress through
the levels, utilizing the tile sliding mechanic for combat. After each boss fight, it is
revealed that the kidnapped friend is in another sewer, guarded by a yet more difficult
boss, and the player must continue climbing up through the sewer system, encountering
new challenges along the way to keep the gameplay fresh. Our story line maps to rules
that must be followed and fixed obstacles to overcome along the journey. The starting
and ending positions for each level are permanently situated diagonally from each other,
some platforms are immovable, the passageways to be navigated expand and shrink,
and the poisonous sewage rises faster as the game progresses. Technical challenges are
explained next.
Procedural Design Concepts: By utilizing tiling textures and modular 3D assets, the
total amount of unique assets could be minimized. A procedural design was laid out on
paper, breaking the actual game into Blocks to represent the slidable tiles. Each Block
was broken into smaller pieces that allowed it to fit together with a variety of pieces.
Block size was established as 2048 cubed units, with the interior divided into a 3 × 3 grid,
three layers high. These sections allowed the use of snapping on a grid within 3ds Max,
which improved the time it took to flesh out the layout. Procedural level design offers
the benefit of a large amount of variety with minimal assets, making our implementation
ideal for a single or small team project where time and resources might be limited. Since
platformer games have yet to take on a procedural level design and focus the gameplay
around it, this concept provided a good opportunity for study as well (Fig. 1).
The ‘Block’ Algorithm: As mentioned earlier, a Block is a 2048 cubed units section
of the puzzle board comprised of interlocking 3d assets that purposely fit together. A
Block is a combination of various prefabs that together make up a moveable section of
the sewer, and the player can slide these Blocks around the board to make traversal to
the exit easier. Each Block contains: (a) one of five (not too many, not too few) possible
floor pieces, chosen randomly from an array; (b) one of five possible scaffolding
pieces, chosen randomly from an array; (c) one of two possible ladder pieces that
are mirrors of each other and have a definable chance of spawning in a Block.
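This per-Block selection step can be sketched as follows (Python pseudocode; the prefab pool names are hypothetical stand-ins for the Unity arrays, which the paper does not list):

```python
import random

# Hypothetical prefab pools standing in for the Unity arrays described above.
FLOOR_PIECES = [f"Floor_{i}" for i in range(5)]
SCAFFOLD_PIECES = [f"Scaffold_{i}" for i in range(5)]
LADDER_PIECES = ["Ladder_Left", "Ladder_Right"]  # mirrored pair


def build_block(rng, ladder_chance=0.5):
    """Assemble one Block: a random floor piece, a random scaffold
    piece, and (with a definable chance) one of the mirrored ladders."""
    return {
        "floor": rng.choice(FLOOR_PIECES),
        "scaffold": rng.choice(SCAFFOLD_PIECES),
        "ladder": rng.choice(LADDER_PIECES) if rng.random() < ladder_chance else None,
    }
```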
Additional elements that are not part of movable Blocks are also generated as stationary
items on each level map (Fig. 2): (a) The Entrance prefab is a tube leading back down the
ladder to the previous level, acting as the player spawn point. Although the player does not
have the ability to go backwards to previous levels, the entrance serves as the starting point
for each new level. (b) The Exit prefab is a rising ladder platform that sends the player to
the next level. The height of the exit is determined by the water level, rising and falling
along with it so that the player can access the Exit no matter the height of the sewage, if
they reach it before the level has been fully submerged. (c) The Exterior Box is the outer
boundary of the level map that blocks the player from traveling outside of the predefined
play area. Since the size of the map can change from level to level, the Exterior Box also
changes to correspond to the size of each level map. (d) Border Caps are transition pieces
placed at each Block location, but they do not move with the Blocks. Instead, these
stationary pieces help hide seams between Blocks, preventing the appearance of bleeding.
(e) Sewage spawns to the size of the game board, starting at a height of 256 and rising at a
rate determined by the size of the map and current level. For instance, the water level rises
faster on smaller boards than on larger boards, and at higher levels, say level 5, sewage
rises at a faster rate than it did at level 1.
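The paper does not give the exact rise-rate formula, but the described behaviour (faster on smaller boards, faster at higher levels) suggests something monotone in both factors. One plausible sketch, with a hypothetical `base_rate` parameter:

```python
def sewage_rise_rate(board_size, level, base_rate=1.0):
    """Hypothetical rise-rate formula matching the behaviour described:
    smaller boards and higher levels both speed up the sewage.
    The exact in-game formula is not given; this is one plausible choice."""
    return base_rate * level / board_size
```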

Fig. 2. Rough Outline of 3D asset concept. Green is the lower platforms, red is the scaffolding,
purple is the starting and ending location, orange is player scale.

Gameplay Algorithm: The sewer maze is comprised of three-dimensional Blocks that
are chosen procedurally by the game for each level and placed in a two-dimensional
grid. There are five variations of base pieces that are rotated and placed in the grid of
the game board. There is also an upper scaffold piece that is added to the base pieces to
add another layer to the 3D playable area. Additional traps and hazards are randomly
added to the base pieces to add even more variation. Utilizing third-person camera
controls, a player must navigate through the sewer by climbing ladders, jumping across
small gaps, and collecting cheese along the way. Starting from the player spawn point
at the bottom left of the grid, the player heads to the exit at the top right by traversing
the board, avoiding hazards. The player can use the DPad to slide sections of the sewer
around in a similar fashion to a 2D sliding puzzle. For an N × N board, the start and exit
are fixed at tile locations (N² − 2N + 2) and (2N − 1), respectively (the tiles diagonal to
the bottom-left corner and the top-right corner), and the tile the rat is currently
standing on cannot be slid, which prevents users from simply sliding to the exit on a
Block without moving the rat itself. To add another challenge to the gameplay experi‐
ence, as time passes, the sewage level rises, forcing the player to higher ground and
eventually killing the player if they do not solve the puzzle quickly enough. Switches
can be pulled to lower the sewage level a certain amount that is based on the size of the
board. If the sewage rises too high, the player is forced to climb a ladder to the higher
scaffolding to navigate the stage.
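Numbering tiles 1..N² row-major from the top-left corner, the fixed start and exit tiles diagonal to the bottom-left and top-right corners can be computed as:

```python
def start_exit_tiles(n):
    """Tile numbers (1-based, row-major from the top-left) one step
    diagonally inward from the bottom-left and top-right corners,
    matching the fixed spawn and exit placement described above."""
    start = n * n - 2 * n + 2   # row n-1, column 2
    exit_tile = 2 * n - 1       # row 2, column n-1
    return start, exit_tile
```

On a 5 × 5 board, for instance, the bottom-left corner is tile 21 and the top-right corner is tile 5, so the diagonal-inward start and exit land on tiles 17 and 9.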
Establish How to Generate an Always Solvable Puzzle: This goal required a large
amount of time and deliberation as to how to best handle this complex task. Possible
solutions considered included generating a pre-solved board and then shuffling the
pieces to an unsolved state. This would ensure easy calculation of sewage rising speed
and difficulty, but generating a board that is random is our focus. This meant that path
finding logic or hardcoded logic would have to be added to the board generator.
An alternative was to create one or more partially built sewers that would be solvable
(Fig. 3) and then fill in the rest of the board with random pieces before shuffling the
board. This would add more randomness to the board but still ensure that the board is
always solvable, and difficulty and sewage rising rate could still be easily determined.
The downside is it wouldn’t be as random as originally intended, and there would be a
higher chance that the boards would seem less unique.

Fig. 3. Top down view of an example board without Jumble. Red lines outline the Blocks that
offer the player an always solvable solution.

The final idea was to establish what makes a board unsolvable and correct those
issues. The sewer isn’t like a 2D sliding picture puzzle where the player needs to create
the entire picture to win; it is closer to the 2D word slider puzzles, where there are multiple
solutions that are achievable. By ensuring that the player has as many paths as possible
with open spaces to slide each Block, we can reduce the chances of a player getting
stuck. Additionally, by making sure that the variety of possible pieces is spawned more
evenly on the map, we reduce the chances of unintentionally generating levels with
inevitable dead ends.
This still left the possibility that the player might get stuck at the spawn point if a
piece was generated that prevented the player from being able to move from the level
entrance point. To address this, the spawn and exit points were moved away from the
corners of the play area by one column and row, forcing each map to be a minimum of
5 by 5 Blocks so that the spawn and exit are not directly diagonal from each other.
Increasing the frequency of ladder spawns in the map also increases the number of
possible traversable pathways. The downside of this approach is that we can no longer
easily generate the sewage rising rate and difficulty of the map. Instead, the sewage rising
rate is based on the size of the board and current level. Board difficulty has not yet been
addressed, but the plan is that difficulty will be focused on the presence of enemies and
traps, the player’s health bar, and the cost of items available for purchase from the shop.
Develop the Controls for Moving the Blocks Within a Grid: Rough geometry was also
blocked out to represent the pieces once the camera and player controls were in place.
We created a tile movement script to be applied to each Block and, within the update
function, allow a Block to swap places with an adjacent EmptyTile. This approach led
to an unforeseen issue wherein multiple Blocks were able to slide during a single button
press. So we needed to modify the implementation so that Blocks are now stored within
an array and sorted accordingly, swapping only with the next Block. As a result, Blocks
now move individually, with each button press controlling movement for just one Block
at a time as originally intended.
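The array-based movement described above can be sketched as follows (Python, with `None` standing in for the EmptyTile; direction names are illustrative):

```python
def slide(board, width, direction):
    """Slide the Block adjacent to the empty slot (None) into it.
    A D-pad press names the direction the Block moves; the Block is
    therefore pulled from the opposite side of the empty slot."""
    empty = board.index(None)
    row, col = divmod(empty, width)
    # Offset (from the empty slot) of the Block to pull, per direction.
    source_offsets = {"left": (0, 1), "right": (0, -1),
                      "up": (1, 0), "down": (-1, 0)}
    dr, dc = source_offsets[direction]
    r, c = row + dr, col + dc
    if 0 <= r < len(board) // width and 0 <= c < width:
        src = r * width + c
        board[empty], board[src] = board[src], board[empty]
    return board
```

Because only the single tile adjacent to the empty slot swaps, one button press moves exactly one Block, matching the corrected behaviour.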
Create the Method by Which the Program Generates the Game Board: Generating the
game board is based on pulling together random pieces that make up a single Block and
placing them into an array that the tile movement wrapper handles. Extra checks are done
to ensure the same piece does not spawn more frequently than others in the map.
Secondary Goals: Beyond these initial tasks, the next step was to add the core game
framework. The core game framework was roughed out so that the player can navigate
through an entire game from start menu to game to pause menu, cycling through various
levels and placeholder boss stages. Placeholders were incorporated within the board
generator to represent traps, currency and switches despite not having functionality
added to these assets yet.

5 Key Features Implementation

Each board is generated using the BoardManager script. When SetupScene is called
from the GameManager, it calls AutomaticGrowth, which checks whether automatic
growth of levels is on and responds accordingly, then calls BuildLevel(). The BuildLevel()
function runs through the various functions that build a level, in this order:
BoardSetup(), AddBoardComponents(), PopulateBoardItems(), CreateOuter(), and
Jumble(). BoardSetup() creates a new GameObject called Board to hold all the Blocks.
Within the nested for loop over the rows and columns, a random floorTiles
GameObject is first assigned to toInstantiate and then replaced with specific floorTiles at
certain board locations. Between the Entrance and Exit, for instance, BoardSetup() spawns
only plus-sign floor tiles, which are assigned to floorTiles[0]. This floor tile ensures there
is always a path generated between the start and finish. floorTiles[0] is removed from the
pool of other random tiles in order to reduce its overall count on the board.
At the bottom-right corner of the board, only an EmptyTile is spawned instead of a floor
tile. We do some special checks for entrances and exits and spawn them in
the correct board locations, one location diagonal from the bottom-left corner and the
top-right corner. Finally, if the tile we are spawning is not tagged as an "EmptyTile", we
call GenerateUpperLayer() along with the Ladder and Switch spawn locations.
GenerateUpperLayer() creates another toInstantiate for Scaffold, Ladder, and Switches
and instantiates them into each Block of the board.
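A simplified, board-only reading of this pass can be sketched as follows (Python; placing the guaranteed-path tiles on the anti-diagonal joining entrance and exit is our simplification of the Unity logic, which also instantiates prefabs and the upper layer):

```python
import random


def board_setup(n, floor_tiles, rng):
    """Fill an n-by-n board row-major: the anti-diagonal joining the
    entrance and exit gets the plus-sign tile (floor_tiles[0]), the
    bottom-right corner gets the EmptyTile, everything else is random."""
    board = []
    for row in range(n):
        for col in range(n):
            if row == n - 1 and col == n - 1:
                board.append("EmptyTile")      # sliding slot
            elif row + col == n - 1:
                board.append(floor_tiles[0])   # guaranteed path tile
            else:
                board.append(rng.choice(floor_tiles[1:]))
    return board
```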
AddBoardComponents() is responsible for adding the TileMovementWrapper.cs to
the Board and initializing the size of the board for the TileMovementWrapper.cs. It also
adds the DPadButtons.cs script that makes sure each DPadButton.cs input is a single
button press at a time. PopulateBoardItems() then spawns a set number of coins and
traps at open locations on the board using SpawnItem(). SpawnItem() checks random
locations on the floor and scaffold and spawns items based on whether the space is
available. CreateOuter() is responsible for creating an outer wall surrounding the board,
which prevents the player from getting outside the bounds of the board. Jumble() then
uses the TileMovementWrapper.cs that was applied to the Board and moves the EmptyTile
X times to effectively shuffle the board and prevent the player from walking
straight from the start to the finish (Fig. 4).
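Because Jumble() shuffles by sliding the EmptyTile through legal moves only, every shuffled board remains reachable from the generated layout, and hence solvable. A sketch of this shuffle (Python, with `None` for the EmptyTile):

```python
import random


def jumble(board, width, moves, rng=None):
    """Shuffle by sliding a random neighbour into the empty slot (None)
    `moves` times. Using only legal slides keeps the board solvable."""
    rng = rng or random.Random()
    for _ in range(moves):
        empty = board.index(None)
        row, col = divmod(empty, width)
        neighbours = [(r, c)
                      for r, c in ((row - 1, col), (row + 1, col),
                                   (row, col - 1), (row, col + 1))
                      if 0 <= r < len(board) // width and 0 <= c < width]
        r, c = rng.choice(neighbours)
        src = r * width + c
        board[empty], board[src] = board[src], board[empty]
    return board
```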

Fig. 4. Top down view of an example board generated from the BoardManager. Green represents
sewage, grey is scaffolding paths and lower level paths, red dots represent traps and coins.

6 Conclusions and Future Research

Our implementation is an attempt to give this genre a fresh take on the 15-puzzle by
introducing it to a platformer and using rogue-like mechanics. Utilizing the grid and
snapping helped ensure that the various items that make up a Block fit together and that
Blocks themselves fit with each other. In this paper, we have focused on the technical
challenges of solvable board generation and tile movement. Future ideas for
the project include adding working boss stages, event switches that control in-game
items, a shop and inventory system, functional currency pickups, functional heart health
system, dialog system, and AI enemy system. These items still need to be added towards
a fully functional game.

Acknowledgments. The authors are indebted to the Unity3D™ Community and gratefully
acknowledge that several existing functions from the Unity3D™ toolbox were used to implement the
figures shown in this paper. We also used several tutorials from YouTube™ including: (a)
Tutorials - 2D Roguelike Project, (b) Options Menu in Unity 5 Tutorial - Part 1, (c) Solving Sliding
Tile N-Puzzles With Genetic Algorithms and A*, (d) Unity Controller Controlled GUI Tutorial,
(e) Flying Camera Menu Tutorial [Unity3D 4.6], (f) Creating a Start Menu in Unity 5, (g) Unity
Third Person Control Tutorials. Other online resources acknowledged are: (i) "Adventures in
Bitmasking", Angry Fish Studios; (ii) "An exercise in modular textures - Scifi lab
UDK", polycount, 2011. Available: http://polycount.com/discussion/89682/an-exercise-in-
modular-textures-scifi-lab-udk [Accessed: 29-Nov-2017]; (iii) File:15-Puzzle.jpg, Wikimedia
Commons.

References

1. Green, D.: Procedural Content Generation for C++ Game Development: Get to Know Techniques
and Approaches to Procedurally Generate Game Content in C++ Using Simple and Fast
Multimedia Library. Packt Publishing, Birmingham (2016)
2. Felicia, P.: A Quick Guide to Procedural Levels with Unity, San Bernardino (2017)
3. Norton, T.: Learning C# by Developing Games with Unity 3D. Packt Publishing,
Birmingham (2013)
4. Hocking, J.: Unity in Action. Manning Publications Co., Shelter Island (2015)
5. Murray, J.: C# Game Programming Cookbook for Unity3D. CRC Press, Boca Raton (2017)
Some Barriers Regarding the Sustainability
of Digital Technology for Long-Term Teaching

Stefan Svetsky(✉) and Oliver Moravcik

Faculty of Materials Science and Technology in Trnava, Slovak University of Technology,
Trnava, Slovakia
{stefan.svetsky,oliver.moravcik}@stuba.sk

Abstract. Computer support of teaching is linked to terms like e-Learning,
Technology-enhanced learning, and Educational technology. Despite the very
high level of global IT services, networks, and current clouds, which are also
used in education, from a personalized teacher's point of view this is mostly just
technological infrastructure that can be used for handling curriculum content.
A set of successfully used educational IT tools could also be mentioned; however,
these are often only single-purpose, static solutions. One need not conduct a
research review: simply asking colleagues reveals how many barriers are caused
by incompatibility of software formats and the short life cycles of software, hardware,
and network solutions. Paradoxically, it is automatically assumed in current
scientific papers that digital technology functions without any problems.
Additionally, many questions arise regarding the IT sustainability for long-term
teaching within engineering education. This paper describes some categories of
the barriers to embedding the digital technology into teaching and shows some
key points derived from around 12 years of the practical experience related to
solving the personalized IT support of teachers within teaching bachelors
students. As for long-term teaching, it also demonstrates that the state of the art
of IT support for university teaching is not yet suitable, a problem compounded by
the lower IT skills observed among students. In real life, to be sustainable, a teacher
should be a teacher, a programmer, and a researcher in one person. To eliminate these
barriers within long-term teaching, an all-in-one universal approach is presented
by using the in-house educational software BIKE(E), based on design of a “virtual
knowledge”. This specific default data structure enables solving knowledge
(educational data) transmission through the off-line and online environments. In
comparison with other solutions, it seems to be the most universal and sustainable
solution for personalized IT support.

Keywords: Computer support of teaching · Technology enhanced learning ·
Educational technology · Educational IT tools · Educational software ·
Digital technology · Personalized IT support

1 Introduction

In principle, each strategic policy is dedicated to the integration of digital technology into
education. However, in contrast to long-term teaching needs, the life cycle of

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 950–961, 2019.
https://doi.org/10.1007/978-3-030-02686-8_71

technological tools in real life is shorter. This fact is rarely discussed in scientific liter‐
ature, where it is automatically supposed that any technology or computer support of
teaching works perfectly. However, from the teacher's point of view, it does not work
perfectly. The teacher needs sustainable, i.e. long-term, personalized support. Such personalized
support was a key issue of some ICT calls of the EU-FP7 research program, and its
importance is very realistically emphasized in [1]: “Ideally, personalization is relevant
to all the stages of the learner’s journey”. Similarly, its importance for lifelong compe‐
tence development is discussed in [2], as is, e.g., its role in information and knowledge
processing [3]. Despite this, teachers still mostly use global solutions for their teaching
(learning management systems, web services, cloud computing, even
social networks). Around ten years ago, the global learning management system Moodle
was indicated as the most frequently used TEL - tool at European universities [4]. It is
still used, although nowadays, universities already use their own learning management
systems. One should be aware that such global solutions are static and not personalized
enough (they could be considered rather as a kind of information or knowledge
management). It is emphasized similarly in [5] that “the most of the existing e-learning
platforms offer just a single way to organize the course contents (book structure)” and
“whoever is interested in organizing the course contents in a different way must use
specific tools that are external to the e-learning environment”.
Any support of uncertain and unstructured teaching processes by digital technology
is very complex, and it is much more difficult than a common IT support of technical
processes, which are well structured. This results in a certain terminological chaos and
different approaches like e-Learning, Technology-enhanced learning (TEL), Educa‐
tional technology, Computer supported collaborative learning. Nowadays, e-Learning
is mostly considered to be a subject of distance learning, e.g. as indicated by the Univer‐
sity of Oxford, and "confusing" in general. It was replaced by the term "Learning and
technology” (the terms Technology-enhanced learning and Educational technology
were not accepted) [6]. From the real teaching practice point of view, the subject of TEL
was very realistically described in [7]. As for practice, the current situation in TEL
was discussed in the UCISA report [8]. In this report, it is argued that lack of time,
departmental/school and institutional culture, including internal funding remain “the
leading barrier to TEL development”.
It is not possible to present here a complex review because the integration of digital
technology into teaching and learning covers many fields of ICT and Computer science.
However, as could be generalized from the above mentioned, developing personalized
approaches and universal, all-in-one software is still a big challenge. To confirm the lack
of software solutions, one can quote from [9]: “a system design on the model basis has
been widely ignored by the community until now, and software engineering is missing
in TEL system development”. In light of the knowledge principles of teaching and
learning, such missing models should be developed based on the subject of knowledge.
In this context, semantic and ontological approaches should also be taken into account,
e.g. as a connection of technology and education [10], or knowledge representation and
ontologies [11]. The high level of ICT can be a basis for development of other specific
approaches, e.g. for solving ontologies dedicated to visually impaired people [12],
educational cloud computing [13], or combinations with machine learning or
evolutionary algorithms [14].
Based on the idea that computers can be the means of automation of teaching
processes (when knowledge is accepted as a regulated process parameter), the authors'
approach is based on modeling a knowledge representation using so-called virtual
knowledge as a specific default data structure. They have already published this approach
[e.g. in 15, 16]. To better understand this approach, it must be mentioned that the
authors’ personalized computer support of teaching was empirical at the beginning. Step
by step it evolved into systematic research under the umbrella of TEL (it was the official
term of the FP7-ICT calls in the period 2007–2013). Within this research, a paradigm of
batch (bulk) information and knowledge processing was designed, which is performed
by the in-house developed software BIKE(E) (Batch Information and Knowledge Editor
and Environment). Its selected part WPad is installed on classroom computers. The
software was written by the main author of this paper.
By using BIKE(E), teaching material, informatics training tools, and personalized
virtual learning environment with communication channels were produced. Together
with personal cloud and network spaces, this created a background infrastructure for
integrating digital technology into teaching. The personalized approach to teaching
processes as knowledge-based enabled the authors to research the automation of teaching
processes by using a model of virtual knowledge, including solving the knowledge
transmission between computers and networks (registered at the patent office). In addi‐
tion, this virtual knowledge as default data structure enabled teachers to research new
approach to collaborative teaching and collaborative activities of teachers, students and
researchers on shared virtual spaces or clouds. The authors' current research is focused on
(multi-lingual) human knowledge processing, modelling a new generation of educa‐
tional packages, and working on a vision of an educational robot. It should be emphasized
that this approach does not use any mathematical model. Technological design is simply
based on a simulation of mental processes running within teaching, self-study and asso‐
ciated activities (e.g. writing a paper or a research project). In principle, the virtual knowl‐
edge only "switches" between human and machine as needed for (educational)
knowledge processing and transmission, including informatics activities - to assure
compatibility and adaptation to Windows and networks.
The authors' target is that everything must be low-budget, effective, and user-friendly from
the users' point of view, using a minimum of interfaces and software. And,
of course, knowledge processing is in natural language. To the best of the authors'
knowledge, such approach (knowledge representation, data transmission, personal‐
ized educational software, batch knowledge processing paradigm) does not exist yet.
It seems to be beyond the state of the art, i.e. there are no analogical results in the
literature on modelling the automation of teaching processes. It should be also
emphasized that the authors’ complex approach covers many informatics areas in
comparison with common solutions of computer support of teaching, which are
mostly single-purpose and monothematic. In other words, many barriers of digital
technology integration into teaching must be overcome, especially from the long-
term teaching point of view. Some of them are discussed in the next section that
relates to internet browsers, changes of hardware and software, servers and
networks, including the human factor and the behavior of students. The significance of
didactics and informatics algorithms and off-line and online data transmission for
automation of teaching processes is discussed in the separate sections (it is impor‐
tant for collaborative learning activities).

2 Barriers of ICT Integration into Teaching

2.1 Internet Browsers

Internet browsers are needed because the teaching material was designed by the teacher
and produced, as a set of html files, by the continually developed in-house software
BIKE(E)/WPad. These files were browsed both in offline mode (classroom computers,
notebooks) and in online mode (the faculty's virtual learning space). At the beginning of
the research on TEL, identical learning content was used by students on classroom computers,
on the faculty's server and on the teacher's private internet domain. Throughout ten years
of the authors' research, only internet browsers and WPad were used, despite the fact that
PowerPoint is most commonly used for lectures and exercises. Each student, or group of one
to three students, always had the study material on their own computer screen (combined
with sheets of paper or the blackboard, e.g. when writing chemical formulas). From the
pedagogical point of view, the OPERA browser under Windows XP was the most suitable
browser. It enabled students to create sessions from study materials (something like a
personal library), however only until version 9.27, after which it was re-designed to be
compatible with the Google Chrome browser. In general, browser versions are continually
updated and therefore have a short life cycle. Moreover, each new browser version required
more memory (RAM). This resulted in slow performance of the classroom computers, and
consequently the teaching process was often interrupted. The same situation repeated itself
when the "more popular" Firefox and Google Chrome were used later on the Windows 7
operating system. In this case, older browser versions had to be used because the newer
versions were not supported.

Such obstacles complicated the computer support of teaching when students generated
html files, e.g. when writing a collaborative semester work or producing study material
for chemistry to compensate for gaps in their knowledge. Other specific problems occurred
as well, e.g. the html outputs generated by WPad were not always identical in every browser,
and there were problems with the diacritics of the natural language (some computers had
only English settings).

It must also be mentioned that the described research was performed by the teacher
at the faculty's detached workplace, which has no IT administrators.

2.2 Changes of Hardware and Software

One should be aware that computers and hardware have a shorter lifetime than the one
required by any long-term teaching and learning. The teacher had to move all his
documents, pictures and other files from old computers to new computers (commonly
tens to hundreds of thousands of files). It was therefore very time-consuming to migrate data for
954 S. Svetsky and O. Moravcik

10 or 15 classroom and home computers. It was even more time-consuming to install
the previously used software on the new computers, including its activation. In addition,
certain 32-bit programs, e.g. for speech recognition, worked only on 32-bit computers,
so the teacher had to upgrade to 64-bit versions of the software.

In comparison with this, the transfer of study content created by the educational
software BIKE(E)/WPad was not problematic, because this database application functions
well under any Windows operating system. However, because of newer Windows versions
and software, certain specific items of the user menu had to be modified in order to
adapt the existing programming code to the newer operating system. This affected mainly
the design of menu items related to file management in offline mode. It must also be
noted that these obstacles are rarely mentioned in the literature; in related scientific
papers it is mostly taken for granted that technology works without any problems. In the
case of the IT support of teaching, computers commonly worked reliably for three to five
years; after that period they became slower and slower.

For example, the teacher's work computer with Windows XP (2005) took ten minutes
to launch and regularly froze during the antivirus background scan. Nevertheless, this
was eliminated via remote control, i.e. by switching from the computer to the teacher's
virtual space on the faculty's cloud. After the switch, the teacher worked without
problems using Windows 7 and WPad (client-server).

In real life, a teacher must simultaneously use computers with different operating
systems, usually combining his work computers with home client computers and
notebooks. In our case:
• The home computer with Windows XP had a life cycle of about 15 years before its
break-down (currently replaced by a Windows 10 notebook).
• The work computer with Windows XP (2005) was discarded (2017) when the
teacher changed work position.
To summarize, the life cycle of the hardware and general software was less than five
years, while the educational software BIKE(E)/WPad worked without any problems for
12–15 years. Copying the educational content was therefore not problematic; only
certain menu items, related to compatibility with Windows functions or general
software, had to be rewritten.

2.3 Servers and Networks


Nowadays, digital technology is more or less embedded in every kind of teaching, so
using terms like "traditional face-to-face teaching", blended learning, WEB-based
learning, instructional design, as well as e-Learning, technology-enhanced learning or
educational technology, seems problematic. Namely, if a teacher wants to solve
the IT support of his teaching directly or indirectly, he must combine several kinds of
methods and technological tools (software, hardware).
Despite the fact that the authors have continually published their research under the
umbrella of technology-enhanced learning, they consider it basically face-to-face
teaching, supported by any off-line and online digital technology tools in order to
automate teaching processes. During a teaching hour, an IT tool must be used which enables

the teacher to deliver teaching content in the quickest way and within limited time.
Namely, one must be aware that a teaching hour lasts only 45 minutes and the teacher
must cover multiple topics within this short period.
In this context, after implementing the technological off-line and online infrastructure
with teaching content (an educational knowledge repository), it was necessary to solve the
feedback and the transmission of the knowledge flow. Therefore, in this stage, the teacher
wrote PHP/MySQL internet applications which functioned as communication channels between
the classroom and the faculty's server learning space. For example, when teaching C++
programming, this was the quickest way to transmit the programming code of students sitting
in classrooms, i.e. downloading and uploading files generated by WPad in combination with
the Bloodshed Dev-C++ software (note: a full-featured Integrated Development Environment
(IDE) for the C/C++ programming language - http://bloodshed.net/ - http://
orwelldevcpp.blogspot.sk/2015/). In this case, students wrote C++ code in WPad-IDE
and, using a specific user menu item, the code was sent and directly opened in
Dev-C++-IDE. Finally, students launched it simply by pressing F9. WPad thus also
functioned as a repository of solved code, so new bachelor students could use the base
of programming code written by students in previous semesters.
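The round trip of such a communication channel can be sketched as follows. This is an illustrative Python model only, not the authors' actual PHP/MySQL code; the class and method names are invented for the example.

```python
# Illustrative sketch of the classroom code-exchange channel described above.
# The real implementation was a PHP/MySQL web application; this model only
# mirrors the upload/download round trip (all names invented for the example).

class CodeChannel:
    """A shared repository of student C++ snippets for one teaching hour."""

    def __init__(self):
        self._repository = {}  # student id -> source code

    def upload(self, student_id, source_code):
        # A student sends the code written in WPad-IDE to the server.
        self._repository[student_id] = source_code

    def download_all(self):
        # The teacher (or any student) fetches every uploaded solution,
        # e.g. to show the one correct answer on all classroom screens.
        return dict(self._repository)


channel = CodeChannel()
channel.upload("student-01", "#include <iostream>\nint main() { return 0; }")
solutions = channel.download_all()
print(len(solutions))  # 1
```

The repository side corresponds to WPad's role as a store of solved code that later semesters can reuse.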
Such support of the teaching of programming languages seems to be an ideal pedagogical
solution. Namely, if a group of students had the task to write a piece of code
and only one student in the classroom wrote it correctly, the teacher only asked
that student to upload his code to the virtual learning space, and all students could see it
on their computer screens (and eventually download it and practice on their own computers).
One can now appreciate the amount of work the teacher had several years ago, when
the faculty's Windows server was switched to UNIX, where paths are case sensitive, and
hundreds of learning (browsing) paths within the existing PHP code had to be rewritten.
This illustrates that a teacher must also overcome such specific barriers, especially a
teacher who teaches, designs and researches his teaching using digital technology over a
long time. Even so, this was not the most critical moment. The situation became much
more complicated when, after ten years of operation of the faculty's learning support
portal, the internal server reached the end of its technical life (2017). In other words,
the study content and support tools on the faculty's virtual learning space are no longer
usable for teaching. On the other hand, a new server is just being launched, so a part of
the content will be transferred to it. This example illustrates the fact that the life cycle of
hardware, networks and servers is shorter than the lifetime required for the lifelong
learning of individuals.
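The Windows-to-UNIX migration problem comes down to path case: a link such as `Study/Lesson1.HTML` resolves on Windows but not on a case-sensitive UNIX file system. A hedged sketch of an automated repair (a hypothetical helper, not the authors' tool) could match each referenced path against the files that actually exist:

```python
def repair_path_case(link, existing_paths):
    """Return the actually existing path that matches `link` ignoring case,
    or `link` unchanged when no match exists (illustrative sketch only)."""
    lookup = {p.lower(): p for p in existing_paths}
    return lookup.get(link.lower(), link)


# Paths as they exist on the case-sensitive UNIX server:
server_files = ["study/lesson1.html", "study/Lesson2.html"]

# A link written on Windows, where case did not matter:
print(repair_path_case("Study/LESSON1.HTML", server_files))  # study/lesson1.html
```

Running such a helper over the link targets in the PHP code would avoid rewriting hundreds of browsing paths by hand.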

2.4 Students and Human Factor


The authors' approach to the computer support of bachelor teaching was continually developed
according to the reactions of students to pedagogical/didactic and informatics issues. At
the beginning, the in-house software BIKE(E)/WPad was programmed to produce
teaching and study material for some courses of study (Background of Environmental
Protection, Occupational Health and Safety, Chemistry, Industrial Management). The
generated html files were uploaded via the teacher's FTP access to the faculty's server.

Despite the fact that material of good pedagogical quality, suitable for self-study, was
prepared, students were not motivated enough to use it. In order to improve
the feedback, a specific internet PHP/MySQL application was written that functioned
as a communication channel. Each course of study was supported by such a communication
channel, simulating popular internet chats.
However, there were two challenges. Firstly, students surprisingly did not know
what to write into the communication channels, although the channels were also
dedicated to information exchange between students. Thus, the teacher had to tell them,
e.g., to write there the titles of their semester works. After several years of using the
communication channels, one can say that students used them without teacher
instructions in less than 5% of cases (mostly communication between distance
students). This means that the channels were rather used for controlled communication,
teacher instructions, embedding of study material, and even for testing suitability for
semester exams.
Secondly, if one writes WEB applications using PHP forms, there is still the problem
of text formatting. If a student or teacher is to write any text into a text-box form,
they must be aware that the form is not a text or html editor. In our case, this was solved
by pre-formatted text templates, as illustrated in Fig. 1. To be honest, students did not
fully understand this. On the teacher's side, when he prepared study or instructional
materials for the communication channels using BIKE(E)/WPad, this challenge was
solved by writing code for a user menu item which converted natural-language text to
html text. This text was then copied into the text boxes of the communication channels.

Fig. 1. Example of the text area of communication channels.

After the long-term development of personalized computer support of teaching for
several courses of study, one should be aware that hundreds of links and paths are
at disposal in the virtual learning environment, the open WEB domain, or on off-line
classroom computers. To support the browsing of teaching and study materials, a
navigation html file was prepared for each course of study. Such a navigation file
contained thematic blocks with sets of links (e.g., Study material, Communication
channels, Calculation area, Tests, Results).
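A navigation file of this kind can be generated mechanically. The sketch below is illustrative only; the block titles are taken from the example in the text, while the link labels and URLs are invented.

```python
def build_navigation(course, blocks):
    """Render a minimal navigation html file: one heading per thematic
    block, with its set of links underneath (illustrative sketch only)."""
    parts = [f"<html><body><h1>{course}</h1>"]
    for title, links in blocks.items():
        parts.append(f"<h2>{title}</h2><ul>")
        for label, url in links:
            parts.append(f'<li><a href="{url}">{label}</a></li>')
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


nav = build_navigation("Chemistry", {
    "Study material": [("Lesson 1", "lesson1.html")],
    "Communication channels": [("Channel", "channel.php")],
    "Tests": [("Semester test", "test.html")],
})
print(nav.count("<h2>"))  # 3
```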

Surprisingly, the same problem occurred at the beginning of every teaching hour:
students were not able to open the links and start working on time. This sometimes
took 10 minutes off the teaching hour. Although the links were put into the Favorites of
each browser, the problems continued. They also occurred after the decision to use only
Internet Explorer as the default browser (e.g., some students installed Firefox or Google
Chrome on their classroom computers again), and after sending the students an e-mail
with the links. There was only one way to solve this problem technically; however, it was
time-consuming for the teacher, who personally had to set the links on each classroom
computer. One might wonder where the so-called "digital natives" were.
It must be noted here that this situation is connected to a general problem of using
computers, i.e. how to switch within and between off-line and online interfaces (consider
how we all click, click and click on links). From a research point of view, it logically
led to the idea that the navigation paths must be dynamically joined with the knowledge
flow. This made it possible to work out additional BIKE(E)/WPad code and to test this
code on the problem. In other words, the virtual unit should contain both content and
active elements (off-line paths, online links). This was successfully tested within the
teaching of the programming language course, which commonly has 4–7 students. This
resulted in the authors' new vision for future research: in principle, the pedagogical/
didactic, informatics and application algorithms should be solved and designed
synchronously (compare this with the next section on the categories of algorithms).

3 Categories of Algorithms

A typical problem when solving the computer support of teaching is the mechanical use
of existing general software. Such software is suitable mostly for storing and processing
content, not for feedback and the dynamic teaching activities connected to a continual
knowledge flow. However, when anyone writes programs, e.g. for accounting, he must know
the accounting rules and algorithms (a sequence of accounting steps). Thus, if anyone
is to write programs for the automation of teaching processes, these must likewise first
be defined and described by teaching algorithms (i.e. as sequences of pedagogical/
didactic steps).
According to the authors' findings, from a programming point of view, tens of
activities (events) can potentially be connected with one item of teaching content. Such
items can be represented by information, content knowledge or schemas, which should be
used (processed) within a teaching hour in various ways, according to the teacher's needs.
Similarly, other activities can be connected with these items, such as reading texts,
constructing teaching material, internet retrieval, writing a paper, assessment,
translating, repeating, drilling etc. If one had a set of defined pedagogical/didactic
algorithms and was able to write programming code or applications in general, he would
find that this is additionally connected with many activities on his computer, e.g. file
management and conversions to various formats (texts, visualization, audio/video). For
example, when writing a paper, he must continually switch between many off-line and
online windows and folders and use many paths and links. In other words, he must be
aware which algorithms he repeatedly uses when working on his client computer,
notebook or network.

During their research, the authors found that the above-mentioned approach - "how to
computerize and automate teaching and associated educational activities" - requires one
to solve three categories of algorithms:
• Pedagogical/didactic algorithms needed for teaching (within lectures, exercises and
self-study).
• Algorithms for solving the adaptation and compatibility related to Windows, servers
and networks.
• Application algorithms for supported teaching activities (to produce outcomes by
processing the knowledge flow).
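One way to make these categories concrete is to write a teaching algorithm down as an explicit sequence of steps, each tagged with its category. The representation below is our illustrative sketch, not the authors' notation, and the step names are invented for the example.

```python
# Illustrative sketch: a teaching algorithm as an explicit, machine-readable
# sequence of steps. The category tags follow the three classes listed above;
# the concrete step names are invented for the example.
PEDAGOGICAL, ADAPTATION, APPLICATION = "pedagogical", "adaptation", "application"

lecture_algorithm = [
    (PEDAGOGICAL, "present topic overview"),
    (PEDAGOGICAL, "show worked example"),
    (APPLICATION, "convert example to html for classroom screens"),
    (ADAPTATION,  "upload html to faculty server (case-sensitive paths)"),
    (PEDAGOGICAL, "students solve exercise"),
    (APPLICATION, "collect student outputs via communication channel"),
]

# Only a known, defined sequence like this can be turned into programming code:
pedagogical_steps = [step for cat, step in lecture_algorithm if cat == PEDAGOGICAL]
print(len(pedagogical_steps))  # 3
```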
A very important finding from modeling computer-supported collaborative learning
was that the identification of the pedagogical/didactic algorithms is crucial for this
kind of IT support. It is also the key issue when solving the shared collaborative
activities of groups of teachers and researchers. From a layman's point of view: if a
sequence of teaching steps is not known and defined, it is not possible to write any
programming code for collaborative and shared activities.

4 Off-Line and Online Data Transmission

The teaching process is about the transmission of knowledge between teacher and students.
Therefore, the issue of data transmission is very important. In view of this, one should be
aware that a teacher and students must use and manage hundreds of online and off-line
sources to find and select suitable information and knowledge. Technology enables
them to perform information or knowledge management basically as so-called file
management. Nowadays, in real life, a teacher commonly uses multiple home or work
computers, i.e. client computers and notebooks, including the virtual spaces of clouds and
networks. Figure 2 illustrates such a personal infrastructure, which was developed within
the long-term research on TEL to overcome the technological barriers mentioned above.
The issue of educational data transmission (content transfer) is less described in the
scientific literature than content processing, despite the fact that it is very time-consuming.
It does not matter whether a teacher teaches ten or a hundred bachelor students,
or whether he solves the automation of personal processes. To overcome this time barrier,
it is important to identify which activities must often be repeated, in order to write the
appropriate programming code. As regards computer-to-computer file transmission when
using the ICT infrastructure in Fig. 2, one has various alternatives, i.e. using USB, a file
manager, Windows Explorer or WIFI to copy/move or download/upload the files. However, if
one often needs to transfer files between the folders C:\AA and D:\BB, this can be
automated, e.g. by a command written into a bat file:

XCOPY C:\AA\file.* D:\BB\file.*

In our case, the teacher must transmit files between home computers, the IBM BOX cloud
and the faculty's cloud. The quickest way is via the IBM BOX cloud, which enables
synchronization, so it is possible to write the following bat-file command:

XCOPY C:\AA\file.* C:\USERS\…\BOX SYNC\file.*

(i.e. to copy the file into the BOX Sync folder on the teacher's computer). In the case, as in
Fig. 2, that WPad is installed both on the faculty's cloud and on client computers/notebooks
(currently Windows 10), the bat-file command could be:

XCOPY C:\AA\file.* \\TSCLIENT\D\BB\file.*

As regards the transmission of BIKE(E)/WPad files (database tables), it is possible to
save them directly to the BOX Sync folder. As was presented at the FTC 2017
conference, the data transmission is performed according to the Utility model registered
at the Patent Office.
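The copy-to-sync-folder idea behind these XCOPY commands can also be expressed portably. The sketch below mirrors it in Python; the folder names are placeholders, and the real setup used Windows bat files with XCOPY as shown above.

```python
import shutil
import tempfile
from pathlib import Path

def sync_to_box(source_dir, box_sync_dir, pattern="file.*"):
    """Copy every matching file into the BOX Sync folder, from where the
    cloud client replicates it to the other machines (sketch only)."""
    box_sync_dir = Path(box_sync_dir)
    box_sync_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in Path(source_dir).glob(pattern):
        shutil.copy2(path, box_sync_dir / path.name)  # preserves timestamps
        copied.append(path.name)
    return sorted(copied)


# Example with temporary folders standing in for C:\AA and the BOX Sync folder:
src = Path(tempfile.mkdtemp())
dst = Path(tempfile.mkdtemp()) / "Box Sync"
(src / "file.txt").write_text("teaching material")
print(sync_to_box(src, dst))  # ['file.txt']
```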

Fig. 2. Example of the personal teacher off-line/online ICT infrastructure.



5 Conclusions

In contrast with the current scientific literature, which automatically supposes that digital
technology always works in the educational setting without problems, this contribution
discussed some challenges regarding IT sustainability (suitability) for long-term
engineering teaching. Based on around twelve years of experience with developing
personalized computer support of teaching, it was demonstrated with examples (from
teaching practice and academic research) that the life cycle of browsers, software, hardware,
servers and networks is shorter than real personal computer support of teachers and
students requires. Some categories of barriers were mentioned. Basically, the life cycle
of technological tools is too short (3–5 years), while the teacher needs the technology
for much longer periods. The authors' research has shown that the best solution to this
problem was to develop both their own educational software and a personalized ICT
infrastructure for educational data transmission (see Fig. 2). In other words, the
sustainability of the computer support of long-term teaching was achieved by using the
all-in-one software application BIKE(E)/WPad (it has been in use for more than ten years).
The application was continually developed as an all-in-one software while teaching around
two thousand bachelor students under the umbrella of TEL. In principle, it is based on the
design of a virtual knowledge data structure that enables a single user to handle
educational content in many ways, according to his personal needs. This provides teachers,
students or researchers with a user-friendly transmission of personalized data, via the
virtual knowledge structure, through off-line and online environments. In this context,
the software application BIKE(E)/WPad is additionally suitable for modelling the
collaborative activities of students or researchers running on faculty servers or global
clouds. The most effective and low-budget solution for teaching and collaborative
activities was the case when WPad was installed both on a cloud or virtual computer with
Windows 7 and on the personal computers of the teacher and researchers.
Technically, the research limitations are that this software is Windows-dependent
and that all content must be converted to plain text format. From the pedagogical point
of view, teachers must be willing to use the software and be able to think of ways to
insert knowledge content into the virtual knowledge mentioned above (e.g. the content
of a course of study). Future research on collaborative activities requires solving the
design of their pedagogical-didactic algorithms as a background for writing the
programming code and applications. The authors also presented a new vision for their
future research, proposing that the pedagogical-didactic, informatics and application
algorithms should be designed synchronously.

References

1. Laurillard, D.: Digital Technologies and Their Role in Achieving Our Ambitions for
Education. https://www.researchgate.net/publication/320194879_Digital_technologies_
and_their_role_in_achieving_our_ambitions_for_education. Accessed 16 Feb 2018
2. Kostadinov, Z.: Sharing personal knowledge over the Semantic Web. In: Proceedings of the
International Workshop: Networks for Lifelong Competence Development, Sofia (2006).
http://dspace.ou.nl/bitstream/1820/720/1/Paper09.pdf

3. Bieliková, M., et al.: Personalized conveying of information and knowledge. Studies in
informatics and information technology. In: Research Project Workshop Smolenice,
pp. 53–86. SUT Press, Bratislava (2012)
4. Matusu, R., Vojtesek, J., Dulik, T.: Technology-enhanced learning tools in European higher
education. In: Proceedings of the 8th WSEAS International Conference on Distance Learning
and Web Engineering, Santander, Cantabria, Spain (2008)
5. Alfano, M., Cuscino, N., Lenzitti, B.: Structuring didactic materials on the WEB (STRUCT).
Commun. Cognit. 41(1 & 2), 53–66 (2008)
6. Programme Specification for M.Sc. Education (Learning and Technology). Department of
Education, University of Oxford. http://www.education.ox.ac.uk
7. Goodman, P.S., et al.: Technology Enhanced Learning: Opportunities for Change.
Lawrence Erlbaum Associates (2002)
8. Walker, R., Voce, J., Swift, E., Ahmed, J., Jenkins, M., Vincent, P.: 2016 Survey of
Technology Enhanced Learning for higher education in the UK. UCISA TEL Survey report
(2016)
9. Martens, A.: Software engineering and modeling in TEL. In: Huang, R., Kinshuk, Chen,
N.-S. (eds.) The New Development of Technology Enhanced Learning: Concept, Research
and Best Practices, pp. 27–40. Springer, Heidelberg (2014)
10. Tolgyessy, M., Hubinský, P.: The kinect sensor in robotics education. In: Proceedings of RiE
2011, 2nd International Conference on Robotics in Education, Vienna, Austria, pp. 143–146
(2011)
11. Haidegger, T.: Developing and maintaining sub-domain ontologies. In: Proceedings of
Standardized Knowledge Representation and Ontologies for Robotics and Automation.
Workshop at IEEE/RSJ IROS, Chicago, IL (2014)
12. Mikulowski, D., Pilski, M.: Ontological support for teaching the blind students spatial
orientation using virtual sound reality. In: Interactive Mobile Communication
Technologies and Learning, Proceedings of the 11th IMCL Conference. Advances in
Intelligent Systems and Computing (AISC), vol. 725, pp. 309–316. Springer (2018)
13. Shyshkina, M.: The general model of the cloud-based learning and research environment of
educational personnel training. In: Auer, M., Guralnick, D., Simonics, I. (eds.) Teaching and
Learning in a Digital World. ICL 2017. Advances in Intelligent Systems and Computing, vol.
715. Springer, Cham
14. Volná, E., Kotyrba, M.: A comparative study to evolutionary algorithms. In: Proceedings
28th European Conference on Modelling and Simulation, ECMS 2014, Brescia, Italy, pp.
340–345 (2014)
15. Svetsky, S., Moravcik, O.: The implementation of digital technology for automation of
teaching processes. In: Proceedings of the Future Technologies Conference, San Francisco,
USA, pp. 340–348. IEEE (2016)
16. Svetsky, S., Moravcik, O.: The empirical research on human knowledge processing in natural
language within engineering education. In: WEEF & GEDC 2016: The World Engineering
Education Forum & The Global Engineering Deans Council, Seoul, Korea, pp. 10–12 (2016)
Digital Collaboration with a Whiteboard in Virtual Reality

Markus Petrykowski(B), Philipp Berger, Patrick Hennig, and Christoph Meinel

Hasso-Plattner Institute, Potsdam, Germany
markus.petrykowski@student.hpi.de
https://hpi.de/forschung/fachgebiete/internet-technologien-und-systeme.html

Abstract. Nowadays, virtual reality has become suitable for end
consumers and offers a whole new way of digital interaction. This tech-
nique allows users to perceive and engage with the environment as they
would interact with it in the real world. We introduce a virtual reality
whiteboard that allows users to work together and apply Design
Thinking methods as in a real work space. We conceptualize the inter-
action methods and implement a prototype to tackle typical collabo-
ration tasks, including brainstorming, prioritizing or the explanation of
concepts. The conducted user study shows that participants performed
better on specific Design Thinking tasks within virtual reality in
comparison to today's digital whiteboards, even though the latter are
supported by personal interactions like video conferencing.

Keywords: Collaboration · Design Thinking · Virtual reality · WebVR

1 Introduction
People have been using computers for a long time now, interacting with them
via a keyboard and a mouse. After touch displays became affordable and popular,
people started to use those as well. Many applications, probably including the
most common ones like text editing, are well suited to a keyboard and a mouse
because their content does not extend into a third dimension.
At this point it makes sense to introduce virtual reality. It provides a more
natural and intuitive way of interaction: the user can change the position of
objects by actually grabbing them with his hands. Virtual reality was first
introduced by Sutherland [1] in 1965, where he described his idea of an
ultimate display. From this moment on, applications for VR evolved slowly.
Apart from a more natural way of interaction, virtual reality also increases
the perception of social presence. People may physically be in different
places but can be at the same place within a virtual environment. This possibility
opens up a whole new way of bringing people together. Design Thinking is an
© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 962–981, 2019.
https://doi.org/10.1007/978-3-030-02686-8_72
innovation method that succeeds by aggregating the knowledge and experience
of multiple people with different backgrounds. Methods from Design Thinking,
such as brainstorming, are therefore particularly interesting to implement
within a virtual reality application.

Digital Collaboration in Virtual Reality 963
A look at different collaboration tools shows that their complexity is
increasing: from Google Docs1, which enables people to write texts or cre-
ate slides together, to applications like neXboard, which provides a digital
whiteboard that makes Design Thinking over distances possible. This shows
the various possibilities for collaboration. People can either work together
while being at the same place or, enabled by modern technologies, they can
collaborate while being at different places. The enhancements that can be
achieved by a virtual reality prototype are worked out in this paper.
This paper proposes an application that enables Design Thinking within
virtual reality. This raises two challenges in particular. The first is to create an
environment in which users can experience each other in a similar way as if
they met in person. The second tackles the mapping of different 2D interactions
to matching or alternative user interactions within the virtual reality space,
for example how a user selects a tool or how he uses it.
Using our prototype (see Fig. 1), we propose solutions to these challenges and
evaluate them by conducting a user study that compares digital whiteboards
and VR solutions.

Fig. 1. The VR prototype for supporting remote collaboration tasks.

Fig. 2. neXboard with video-functionality.

2 Related Work
Since the first idea of virtual reality with head-mounted displays, proposed by
Sutherland in 1968 [2], there have been different approaches to creating
virtual environments.
We first introduce collaboration systems without virtual reality, then explain
an augmented reality approach, which is a compound of 2D interfaces and a 3D
environment, and finally show successful implementations of collaboration within
virtual reality.
1 An online service provided by Google that enables people to work on documents
collaboratively: https://www.google.de/intl/en/docs/about/ (February 2018).
964 M. Petrykowski et al.

The neXboard is the successor of the Tele-Board. It has been developed by
Wenzel, Gumienny et al. [3,4] as a browser-based application. It is a digital
whiteboard that enables people to work together in real time with sticky notes
and scribbles. It also focuses on supporting users in performing Design Thinking
across distances [5]. Design Thinking is an innovation method that has been
further developed by Meinel, Plattner and Weinberg [6].
The collaboration can be supported by a video conference, as shown in Fig. 2.
NeXboard is an example of a successful approach to supporting digital collabora-
tion over distances. The video-conferencing approach has been shown to have
a positive impact on the result, as described in more detail by Fussel et al. and
Wenzel et al. [7,8].
Szalavári et al. [9] propose a system called Studierstube that enables users
within one room to collaborate with each other, while using an Augmented Real-
ity head mounted display. According to them, AR should give the possibility “[..]
to provide insight into a complicated problem by the enrichment of simulation
data.” They put an emphasis onto bringing the people together in one room.
This would allow them to communicate the way they are used to - by speaking
and gesturing. This approach shows that collaboration, even if it is supported
by digital technologies, thrives from the presence of the coworkers and the pos-
sibility to see and talk to each other.
DIVE, which was introduced by Carlsson and Hagsand [10], is a system
designed to let users see, meet and collaborate with other users and
applications in a virtual environment. All participating users are represented by
"graphical objects called body-icons".

Fig. 3. DIVE VR System: view of one user seeing a board and other users [2].
Fig. 4. Interaction tools and different user-created objects [11].

Figure 3 shows the DIVE application. The authors do not use real-world
metaphors for the interaction between the user and the virtual environment.
Actions such as grasping and selecting are performed with a 6-DOF mouse by
pointing at objects and clicking the mouse button. Although DIVE used older
technology for its virtual reality implementation, it still showed that users
within a virtual environment profit from a digital representation of each other.
Digital Collaboration in Virtual Reality 965

To dive further into related work we need to discuss interaction techniques
for a collaborative VR environment like CocoVerse. CocoVerse is a "Multi-User
Framework for Collaboration and Co-Creation in Virtual Reality" by Greenwald,
Corning and Maes from the MIT Media Lab [11]. This system, shown in Fig. 4,
is the most recent one and also makes use of modern virtual reality hardware
(HTC Vive). Regarding interaction, they stick to the principle of locally
available interaction: the user is equipped with a tool belt that he can access
regardless of his position. Regarding the interaction with virtual objects, their
system allows users to interact either by intersecting with the objects or
by reaching them from a distance. This approach shows that with modern
virtual reality hardware the user can be equipped with natural-feeling gestures
that create an immersive experience. The authors emphasize the importance
of locally available interaction, e.g. to change between available tools.

To sum up, the described approaches have different emphases but are all
collaboration tools. Regarding collaboration, they point out that the feeling of
social presence and the possibility to interact with each other play an important
role. The feeling of such a social presence can be supported by providing
digital avatars that represent each user. Furthermore, the immersiveness of a VR
system plays an important role in conveying realistic experiences. Giraldi et al.
[12] mention a few basic principles that should be covered to create an immersive
experience, such as a view based on the head's position and depth perception
through stereoscopic viewing. The introduced systems also show an evolution
of interaction techniques: whereas early VR systems like DIVE and
Studierstube used pointer-based input such as rays, more recent applications
like CocoVerse make use of modern hardware to provide more natural metaphors
for interacting with the virtual environment. What sticks out for VR systems is that
the possibility to move around within the virtual environment raises the need for
locally available interaction, which is an important principle.

3 Collaboration in Virtual Environments

The previous section presented similar work done by other researchers and
extracted the essential points. This work evaluates how far virtual reality is
able to support collaboration with whiteboard-related applications. In the
following, the main aspects of the prototype are discussed with regard
to interaction paradigms and the creation of social presence, with the goal of
providing an easy-to-use collaboration tool.

3.1 Interaction Paradigms

The neXboard, the whiteboarding tool that the application is based on, allows
different kinds of interactions, such as creating clusters, drawing
scribbles, creating sticky-notes or placing voting dots on sticky-notes.
To keep the prototype simple enough on the one hand, but complex enough to
work efficiently on a task on the other, the tools for the following discussion are
reduced to the three essential ones: a pen to draw on the virtual board,
a tool that creates sticky-notes, and a general-purpose hand for gestures and
expressions. These are called the pen tool, the sticky tool and the hand
tool in the following.

3.2 Tool Selection


Now that the tools have been defined, it is crucial to figure out how the user can
choose which one he wants to use.

Using Buttons. VR controllers usually provide the user with physical buttons
to click. To most people up to the age of around 35 those buttons are
familiar due to the popularity of game consoles such as the Xbox or the
PlayStation, so this approach feels usable to them. However, it
would not seamlessly integrate with the virtual environment.

Using a Digital Tool-Belt. Another technique could be built similar to the
one used by Greenwald, Corning and Maes [11]. They proposed a tool
belt that opens up once the user reaches down to his waist level. The open
tool belt allows the user to select between the tools. This has the advantage
that tools are always in reach and not placed at a fixed position, which sticks
to the principle of locally available interaction described in Sect. 2. Since
users can freely move around in the virtual environment, they would not
have to walk a lot in order to change their tool.

Changing Tools Through Gestures. A third method could enable the user
to use a certain set of gestures which would magically change the current tool.
This would also satisfy the principle of locality, as the user can perform those
gestures wherever he is. On the other hand, the gestures could be harder to
memorize and would definitely lack explorability, i.e. the ability of a user
to discover this feature on his own. Another point where this approach
might be difficult is the number of tools: the more tools are available, the
more gestures a user has to memorize.
Apart from all the complications this approach could have, it could also
be applied in a simple manner. The application could, for example, provide one
simple gesture to cycle through the available tools. This method would allow
an easy and fast switch between the tools if there are only a few.
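Such a single-gesture tool cycle can be sketched in a few lines. The tool names and the factory function below are illustrative assumptions, not the prototype's actual code:

```javascript
// Hypothetical tool set matching the three tools discussed in Sect. 3.1.
const TOOLS = ['pen', 'sticky', 'hand'];

// Returns a small state object; `next()` would be called whenever the
// switch gesture is recognized, cycling through the available tools.
function makeToolCycler(tools) {
  let index = 0;
  return {
    current: () => tools[index],
    next: () => {
      index = (index + 1) % tools.length;
      return tools[index];
    },
  };
}
```

With only three tools, at most two gesture repetitions reach any tool, which keeps the memorization cost low.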

Providing Dedicated Hardware for Different Interactions. A fourth
approach is to actually have dedicated hardware that the user can choose from
for certain interactions. The Oculus Touch controller could act as the
general-purpose device, but one could also imagine some kind of
virtual reality pen that behaves like a pen in the real world. This would give
the user a good feeling for the tool he currently uses. A challenging problem
would be to help the user find the devices.
There are a few problems that come with this idea. The first one is the
realistic tracking of the additional controllers. If a user has multiple devices, he
cannot hold them all in his hands at once and would therefore want to place
them somewhere. A device could either be placed on the ground or on certain
objects. Since the user has to be aware of those objects in his environment in
order not to collide with them during his VR experience, they also have to be
tracked. Of course, the additional hardware could also be placed outside the
tracked area. This would lead to the difficulty of getting it back without
leaving the immersive space. This could either be solved by an additional person
who hands in the tool, or by grabbing and searching for it, which would be
like groping in the dark.
All in all, this approach would be very challenging on the one hand, but
on the other it would open up the opportunity to provide a realistic and maybe
even more immersive experience for the user.

Comparison. After this closer look at the possibilities for selecting a certain
tool, two methods sound most promising. For an immersive experience within
the virtual environment it is most important that everything can be understood
and accessed within this world. This requirement rules out the first method, as
the buttons are not visible to the user and do not completely fit the natural
metaphor, as well as the last introduced technique of dedicated controllers in
Sect. 3.2, since it creates more problems than it actually solves for evaluating
VR for digital collaboration.
Therefore, the gesture technique from Sect. 3.2 could be considered for the
tool selection, because it provides a simple interaction, as well as the tool-belt
approach, which provides a convenient and already proven method of accessing
different kinds of tools.

3.3 Interaction Methods


Since the user is now able to choose the tool he wants, the next
important aspect is to decide how the actual interaction with the environment
takes place. In "A Survey of 3D Interaction Techniques" [13], Chris Hand
distinguishes between 2D- and 3D-based interaction techniques. The following
two approaches cover both of them.

Interaction Through Pointing. Users can, for example, use a ray-cast to
interact with the virtual environment. The interaction is accomplished through
a point or, as Eric Bier proposes, a so-called Skitter [14], a wireframe
representation of a 3D cursor that allows better 3D orientation. Taking the
three different tools into consideration, this method would work for all of them.
This is possible due to the two-dimensional nature of the whiteboard: the
neXboard is intended to be used with a mouse pointer on a desktop computer
or with touch input on touch-sensitive devices, and a ray caster provides a very
similar experience to a 2D mouse. Probably the biggest advantage of this method
is the ability to interact with the environment from a distance, at least to a
certain extent, as the user still needs to see what he is doing. On the other hand,
it could break with the real-life metaphor. CocoVerse, described in Sect. 2, also
evaluated a ray-casting interaction and found that users considered this
possibility convenient.
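The pointing interaction essentially reduces to a ray-plane intersection followed by a conversion into 2D board coordinates, analogous to a mouse position. The geometry and names below are illustrative assumptions, not the prototype's implementation; the board is assumed to lie in an axis-aligned plane:

```javascript
// Sketch of ray-cast pointing onto a flat whiteboard. Assumed geometry:
// the board lies in the plane z = board.z, with its lower-left corner at
// (board.originX, board.originY) and a size of board.width by board.height.
function intersectBoard(rayOrigin, rayDir, board) {
  if (rayDir.z === 0) return null;                 // ray parallel to the board
  const t = (board.z - rayOrigin.z) / rayDir.z;    // distance along the ray
  if (t < 0) return null;                          // board is behind the user
  const x = rayOrigin.x + t * rayDir.x;
  const y = rayOrigin.y + t * rayDir.y;
  // Normalized board coordinates in 0..1, like a 2D mouse position.
  const u = (x - board.originX) / board.width;
  const v = (y - board.originY) / board.height;
  if (u < 0 || u > 1 || v < 0 || v > 1) return null;  // ray misses the board
  return { u, v };
}
```

For example, with the 2.5 × 3.5 m board used later in the study, a controller two meters in front of the board's center pointing straight at it yields the normalized position (0.5, 0.5).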

Interaction Through Intersecting. This approach requires the user to be
right next to the object that he wants to interact with, as he needs to virtually
intersect his hand with the object. In this case, it is only the virtual whiteboard
he can engage with. The described interaction can be depicted by the real-world
scenario of standing in front of a big touch screen. The screen does not
allow any kind of three-dimensional interaction; instead, the user is only able to
execute tasks by touching and dragging. Nevertheless, the user experiences the
perception of directly manipulating the board. Compared to using a mouse, a
touch in the virtual environment, which corresponds to the intersection between
the hand and the virtual whiteboard, equals pressing the mouse button. Moving
the touching hand represents a mouse move with the button held down, and
removing the hand from the board corresponds to releasing the mouse button.
Although the actual interaction can be mapped onto the usage of a mouse, it
nevertheless provides a more immersive experience, which is the main advantage
of this approach.
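The touch-to-mouse mapping described above can be sketched as a small per-frame state machine. The event names and shapes are illustrative assumptions, not the neXboard protocol:

```javascript
// Maps hand/board intersection state to mouse-like events: entering the
// board emits "down", staying on it emits "move", leaving it emits "up".
function makeTouchMapper(emit) {
  let touching = false;
  return function update(intersecting, pos) {
    if (intersecting && !touching) emit({ type: 'down', pos });
    else if (intersecting && touching) emit({ type: 'move', pos });
    else if (!intersecting && touching) emit({ type: 'up' });
    touching = intersecting;
  };
}
```

Because the whiteboard backend already understands mouse semantics, such a mapper lets the same drawing and dragging logic serve both desktop and VR clients.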

3.4 Creating Social Presence of Others


The most cited advantage of virtual reality is the possibility to create a sense
of other people's presence although they are physically somewhere else. This is
reflected in applications like Facebook Spaces2, a communication tool for
virtual reality that evolved within the last year. Such applications cover areas
like communication or even social gaming.
Social presence is important for people in order to perform certain tasks
well. That is why companies spend a huge amount of money to bring their
employees together. The most immersive option before virtual reality has been
video conferencing, which still plays an important role, but virtual reality is
able to deliver a more realistic and immersive experience.
The most common way of creating social presence is by using so-called ghosts
or digital avatars that represent each user. Due to the accurate tracking
of hands and head provided by the Oculus Rift, it is possible to deduce the
posture of each user. This allows realistic representations of each participant
and leads to the illusion of actually being next to each other.
2 See more on https://www.oculus.com/experiences/rift/1036793313023466/ (February 2018).

3.5 How Collaboration Is Made Possible

To successfully enable users to collaborate with each other in virtual reality,
certain conditions have to be met. First, they need to be able to perform direct
interaction. This means that actions they make should instantly be delivered
to every other participant in the room; such actions include changes to the
whiteboard as well as gestures performed with their body, like pointing, and the
spoken language. Second, the users need to be familiar with the interactions and
the environment. Third, the attendees of the virtual environment need
to perceive each other's presence.

4 Prototype

This section first describes the technologies used to build the prototype and
then explains what it is capable of.

4.1 Technologies

The prototype was built with modern web technologies to show that virtual
reality can also be leveraged by today's web browsers. A particular advantage
of this approach is the broad compatibility with the many different devices
and heterogeneous systems a web browser can run on. That is especially
important for collaboration tools, as they should allow as many users as
possible to participate.
Therefore, the virtual reality web application uses a library called A-Frame3,
which abstracts from WebGL (Web Graphics Library) to render the virtual
environment. A-Frame's purpose is to easily create three-dimensional environments
using only the browser's DOM (Document Object Model). For example, it
takes care of rendering the stereoscopic images for each eye.
The system was built as an extension of the already existing
neXboard landscape4. The prototype therefore uses the existing communication
backend for exchanging the whiteboard information; real-time communication
is handled using websockets. Apart from the existing backend, another
websocket server had to be created to enable the clients to synchronize changes of
the virtual world in real time, for example when a user performs gestures or moves
within the three-dimensional world.
Figure 5 shows the prototype's connection to the neXboard. Since the
neXboard client and the VR client both use the same websocket server
to synchronize the whiteboard data, users of both clients are able to work on
the same board together.
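The role of the additional websocket server is essentially that of a relay: every update a client sends (a pose, a gesture, a whiteboard change) is forwarded to all other connected clients. The sketch below mocks the socket handling with plain objects; a real server would wrap this logic around a websocket library, and the message format is an assumption:

```javascript
// Minimal relay sketch: join/leave manage the client set, broadcast
// serializes a message and forwards it to everyone except the sender.
function makeRelay() {
  const clients = new Set();
  return {
    join: (socket) => clients.add(socket),
    leave: (socket) => clients.delete(socket),
    broadcast: (sender, message) => {
      const data = JSON.stringify(message);
      for (const client of clients) {
        if (client !== sender) client.send(data);
      }
    },
  };
}
```

Keeping the relay dumb (no interpretation of the payload) is what allows the same channel to carry both avatar poses and whiteboard updates.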

3 https://aframe.io/ (February 2018).
4 See Sect. 2.

Fig. 5. Architectural diagram of the prototype.

4.2 VR/Non-VR Collaboration

Since this prototype is part of the already existing neXboard landscape, it is
possible to collaborate in real time both in virtual reality and without VR.

Fig. 6. Two people collaborating with each other in real time.
Fig. 7. The open tool menu.

Figure 6 shows the collaboration of two persons who both use a virtual reality
device. This is the most interesting aspect of the proposed approach: each user
is able to see where the other person is, as well as how he moves his head and
hands. This feature should create a social presence for the participants.

4.3 Supported Features


The purpose of this prototype is to enable collaboration with a whiteboard
application. Based on the neXboard, it supports a small subset of functionalities
that make up the main interactions during a collaboration.
Figure 7 shows the menu that allows the user to choose his current tool. He
is able to select either the drawing tool, the interaction tool or the sticky-note
tool. Similar to CocoVerse [11], the menu is opened by a gesture: the user
has to turn his wrist by 90° and back in one fluent motion, which triggers the
menu to open.
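This wrist gesture can be sketched as a detector fed with timestamped roll angles from the tracked controller. The threshold and time window below are assumptions chosen for illustration, not the prototype's tuned values:

```javascript
// Returns a sampling function: feed it (timeMs, rollDeg) each frame.
// It reports true once when the wrist turned past `threshold` degrees
// and came back near its rest pose within `windowMs` (one fluent motion).
function makeWristGestureDetector({ threshold = 80, windowMs = 1000 } = {}) {
  let turnedAt = null;   // time at which the wrist crossed the threshold
  return function sample(timeMs, rollDeg) {
    if (Math.abs(rollDeg) >= threshold) {
      if (turnedAt === null) turnedAt = timeMs;
      return false;
    }
    // Wrist is back near its rest pose.
    if (turnedAt !== null && timeMs - turnedAt <= windowMs) {
      turnedAt = null;
      return true;       // fluent turn-and-back: open the menu
    }
    turnedAt = null;
    return false;
  };
}
```

Requiring the return within a short window distinguishes the deliberate gesture from a hand that merely rests in a rotated pose.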

Fig. 8. Participant using the drawing tool on the whiteboard.
Fig. 9. Participant using the sticky-note tool on the whiteboard.

The drawing tool, as shown in Fig. 8, allows the user to draw on the whiteboard
as he could on an actual whiteboard. Drawing is only possible by directly
intersecting, i.e. touching, the whiteboard with the digital pen.
Shown in Fig. 9 is the user's hand dragging a sticky-note by touching the
whiteboard. This tool allows the user to interact either by touching the
whiteboard or by using a laser-pointer interaction as described in Sect. 3.3. The
user can either create a sticky-note by interacting with a free whiteboard spot
or interact with existing sticky-notes by selecting them.

4.4 Mocked Functionalities

Certain features that are essential for collaboration but could be mocked
easily were left out of the prototype. This covers two features: the call
feature that allows participants to talk to each other, and the possibility to
add text to the sticky-notes.
Mocking the call feature is done by placing collaborating users within the
same physical room during the study. Making sure that their virtual placement
roughly matches their real placement allows them to talk to each other and still
get the feeling that the sound comes from the other virtual person.
The possibility to add text to sticky-notes is especially important during the
brainstorming process. To still enable the participants to do so without the
trouble of using a virtual keyboard, speech-to-text is used instead. To ensure
good quality, this speech-to-text simulation is done similarly to the concept of
the mechanical turk5.

5 User-Study

The built prototype enables users to collaborate with each other within a virtual
environment. But in order to deduce whether they actually benefit from it, a
user study has to be conducted. In general, the participants are going

5 The concept of the mechanical turk was first introduced with a fake automated chess-playing machine in the late 18th century: https://en.wikipedia.org/wiki/The_Turk (February 2018). It describes the process of faking a certain interaction or logic by the use of a human.

to interact with an instructed user using either the virtual reality prototype
or the neXboard with its video functionality. The study analyses the
collaboration of the participants with another user and focuses on how the
attendees behave differently depending on whether virtual reality is used or not.

5.1 Personas
The user group contains people who are already using the neXboard. These are
a wide variety of people, as meetings can be held by developers, designers, scrum
masters or executives. This study mostly covers participants between 20 and
30, since they are more likely to use new technologies due to their higher
technology awareness.

Creating a Similar Start Experience for Each Participant. As virtual
reality devices are not widely adopted yet, most people will not have used a
VR device before. Therefore, every participant needs to get an introduction to
the technology. This ensures that they have a basic knowledge of interfaces for
the virtual environment and know how they can interact with it.
A great tool for this introduction is the Oculus "First Contact"6 application.
Within this application, a robot instructs the user to grab, throw or pull different
objects in the environment. After this tutorial the user is able to deal with
hand poses and to interact with the virtual environment.

5.2 User-Study Setup

The goal of this study is to show to what extent and in which particular use
cases users benefit from virtual reality in the context of whiteboard applications.
It focuses on the analysis of collaboration aspects, like the participants'
communication, the usage of the whiteboard or the resulting board. This makes
it possible to conclude whether users performed worse or better on certain tasks
with or without virtual reality.
To analyze a collaboration's success, a session has to be performed by at least
two persons. But to reduce the number of variables that have to be
taken into consideration when two users are tested at the same time, one
instructed user is part of the study team, so that the evaluation can focus on
the other one. After the introduction with the Oculus "First Contact" application,
the participant solves three different tasks with either the neXboard and a video
conference or the virtual reality prototype. The usage of the two applications is
equally distributed among the participants and the tasks described in Sect. 5.3.
After the tasks are finished, the participant fills out a short questionnaire
that is further described in Sect. 5.4.

6 See more on https://www.oculus.com/experiences/rift/1217155751659625/ (February 2018).

5.3 Tasks to Be Solved by Participants


In the following, three tasks are outlined. For this study, tasks with
different focuses were chosen. The tasks represent different situations
that come up during a Design Thinking process. They are embedded into an
overall fictional problem of planning the next holiday vacation. Although the
tasks belong together and could be performed one after another, it is important
that all tasks have the same starting point in order to be comparable against
each other. This means that the outcome of one task is not used for any
of the other tasks. Each task fulfills the following constraints:

1. It can be solved in a reasonable amount of time of around 5 min.
2. It consists of elements that require communication with coworkers.
3. It has a clear definition of when the task is completed.

Task One: Brainstorming. One basic task performed within the Design
Thinking process is to generate ideas. The first activity is therefore to think
about possible places to go or activities to do. The participants are given three
minutes to do so. They start from an empty whiteboard that contains just two
sticky-notes that allocate a certain spot on the board for either the places or
the activities.

Task Two: Prioritizing. After successfully coming up with ideas, the
participants face a board that contains sticky-notes with places and activities. The
goal of this exercise is to sort those ideas into three priorities: most interesting,
interesting and not interesting. All participants get the same items to
prioritize; the board they work on contains 28 items that need to be agreed on.

Task Three: Explanation. This task addresses the communication part and
should help to understand whether it makes a difference for a user's
comprehension to get things explained in virtual reality or through a video
conference. The briefed user explains something to the other person; afterwards,
the participant has to recap the explained content in as much detail as he can. It
is important to mention that the participant knows beforehand that he has to
explain it back. He is not restricted in any way, so he can take notes on the
board if he wants to.
For this task the participants work on a board that contains rough orientation
points of what is being said. To stick to the overall problem of this user study's
setup, a certain city was chosen that is introduced to the participant.

5.4 Comparison of VR and Non-VR


For each of the tasks, quantified metrics are taken to objectively compare the
performance of both systems. Each task has one major performance indicator:

1. Number of created sticky-notes
2. Time needed to bring the ideas into order
3. Amount of information that was repeated by the participant.

Apart from that, the user feedback as well as the observations are taken
into consideration for the evaluation.

User Feedback. The feedback is gathered through a questionnaire. It
is an important part of the study, as it provides more insights about the
user. The questions cover general information, the concepts of the neXboard and
this paper's prototype, and possible experiences of sickness, as this could be a
major issue for VR applications.

6 Evaluation

This section takes the outcomes of the user study described in the previous
section and relates them to the question of whether Design Thinking can
be implemented successfully in virtual reality, in comparison to the use of a
digital collaboration tool without virtual reality, which is the neXboard in this case.

Fig. 10. Demographic values of participants.
Fig. 11. Number of created sticky notes per participant.

6.1 Participants

17 participants took part in the qualitative user study. Figure 10 shows the
demographic properties of the user group. Most of them are male and in their
mid-twenties. They come from different backgrounds, such as designers, sound
engineers, sales people, design thinkers, students and software engineers. This
was important to get heterogeneous feedback. Typical neXboard users also come
from different backgrounds; especially when groups use the neXboard for Design
Thinking, they consist of multicultural team members. The user group was
chosen to fit this pattern.

6.2 Task 1 - Brainstorming


Task one, as described in Sect. 5.3, consisted of creating sticky-notes in a
brainstorming of three minutes.
Figure 11 shows the number of created sticky-notes for each brainstorming
session, in virtual reality as well as without VR. It shows that on average more
sticky-notes were created by participants who used the virtual reality device.
That is an interesting outcome, as one crucial part of the brainstorming
process consists of creating sticky-notes with content. One could assume that, as
people have the possibility to use a keyboard while using the neXboard within
a browser, they would generate more content than in virtual reality, where
they can only use speech-to-text input.
But the complete opposite took place. The two highest outliers are 15 and 16
sticky-notes within a session and were both created in virtual reality, whereas
the highest result without virtual reality was 13 and came up only once.
"Go for quantity" is one of Design Thinking's basic concepts. It emphasizes
that during the phase of idea generation it is good to have as much content as
possible to build on in the later phases.
All users were introduced to the available tools and knew about the
possibilities to use them. However, all of them used the sticky-note tool with its
laser-pointing ability to manipulate the whiteboard from a distance. Some of
them initially tried to interact with the board through touch but changed to
pointing later on. One participant even used the pen to write on the board. The
participants using the neXboard, on the other hand, used different colors to
differentiate between certain aspects, but did not change the tool during the
whole exercise.

Fig. 12. Amount of time needed for sorting ideas.
Fig. 13. Number of points that were mentioned by the listening participant.

6.3 Task 2 - Prioritization


The second task starts from a brainstormed input and aims to bring the ideas
into a priority order. Figure 12 shows the amount of time that was needed to finish
the ordering. The overall average was 3 min 41 s. Participants who used virtual
reality were slightly above this average with 3 min 52 s; in particular, two
participants needed around 6 min or more. Without virtual reality the users finished
much faster, with an average of 3 min 20 s.
Most often the users followed the strategy of first separating out those ideas
that neither of them could imagine actually using, and then discussing the
remaining ones. The huge difference in the timespans might not only be a
matter of the technology used but also depend on how much the discussing
participants agreed.
Regarding usage, all virtual reality users made use of the ray-casting
interaction to change the sticky-notes' positions. Since the whiteboard itself had a
size of roughly 2.5 × 3.5 m, it is more convenient to use a distant raycaster than
to stand in front of the board and move around all the time. One behaviour
that all users shared, regardless of whether they used virtual reality or not, is
that they focused on the whiteboard's content and not on the other person. They
still had the social presence of the other person, but voice was
the most important communication channel within this exercise.

6.4 Task 3 - Comprehension

Task three covers the comprehension of concepts that are explained either
in virtual reality or via the neXboard with a running video conference. As
shown in Fig. 13, the participants who used a video conference instead of
the virtual reality prototype for this task were able to recall 11 points on average,
roughly 4 points more than the other participants. The VR participants were able
to remember only 7.8 points on average, around 2 points less than the overall
average of 9.5. Therefore, the participants using a video conference performed
better. The four sessions with the lowest numbers of recalled points all
involved the use of VR.
Studies [7] have shown that gestures and facial expressions play an important
role in supporting people's comprehension. That was one of the reasons for
introducing avatars in virtual reality and a video conference for the neXboard.
The explanation process of this exercise was accompanied by a board7 that
contained the main points of what was said. Interestingly, in virtual reality the
participants mainly faced the board to follow the explanation instead of the talking
person's avatar. The participant's digital representation therefore seems not to
be accurate or realistic enough to serve as a communication partner. Although
the avatar conveys gestures and movements, it is missing facial expressions. This
is the main difference to the video conference, which allows both users to see each
other's gestures and facial expressions. According to the results, the missing facial
expressions seem to have a negative influence on a person's comprehension.
This task did not require the participant to use the board while listening
and recalling the information. However, three users of the neXboard
created sticky-notes during the explanation phase to better remember the points
afterwards. Even in virtual reality, one participant tried to take notes by writing
with the pen on the board, but could not keep up with the speed of the explanation.
This use case shows a major issue with the text-input method in virtual reality.

7 See Sect. 5.3, Fig. 13.

Participants complained that they could not use the speech-to-text input
during a conversation in which the other person explains something. The text
input technique therefore seems to be inconvenient while another user is still
talking.
Overall, this task shows that comprehension currently works better without
virtual reality, due to the missing facial expressions and the speech-to-text
input, which prevents users from taking notes during a conversation.

6.5 Task Comparison

Looking at the three tasks reveals that, at least with the implemented
prototype, not every use case is a good fit for virtual reality. The biggest
differences between the two approaches show up in task one, where VR performed
better, and in task three, where video conferencing was a better fit. For task two
the results show that the neXboard worked better, although the
differences between VR and non-VR were quite small there.

Fig. 14. Which task the participants found the easiest.
Fig. 15. Which concept supported the user the most.

As Fig. 14 shows, participants found task two the easiest to work on. Across all participants there was no clear tendency towards VR or non-VR: eight participants found their easiest task to be one done in virtual reality, and nine voted for a task that they did supported only by a video conference.

6.6 Comparing neXboard and Virtual Reality Prototype

Independent of the task, the participants were also asked to what extent they liked either the neXboard or the virtual reality approach and whether they think it would support their collaboration.
Figure 15 shows that the attendees found the neXboard more supportive of collaboration than the virtual reality prototype. However, the result was quite close: the VR approach was rated less supportive by only two points.
978 M. Petrykowski et al.

That is still a good result for the VR application, as it is in a prototypical phase and not as fleshed out as the neXboard. Overall, the answers to these questions also indicate high support by both systems for digital collaboration.

6.7 Comparing Interaction Approaches

A tendency that could be observed throughout was that participants ended up using the ray-caster method for interaction with the whiteboard. A few users even started with the touch interaction but switched during the task. There are several reasons for this. On the one hand, the size of the whiteboard, around 2.5 × 3.5 m, makes touch interaction cumbersome: it forces the user to walk the whole distance to move a sticky-note from one end to the other, and depending on body size a user may not even reach every position on the board. On the other hand, touch interaction forces the user to stand close to the board, a position that does not allow an overview of the whole board and makes it hard to see the whole picture. Both of these pain points are solved by standing at a distance and using the ray-caster.
User feedback showed that 30% of the participants preferred the touch interaction. That is not surprising, as touch makes the prototype feel more natural. However, the provided touch interaction was not convenient enough to work with successfully. Another reason for the dominance of the ray-caster interaction is its similarity to mouse interaction on desktop devices. The participants are used to that, and people usually like to stick to what they have already learned, since it does not require any additional effort.
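The prototype's source code is not given in the paper; as an illustration, the core of such a ray-caster interaction is a ray-plane intersection that maps the controller pose to a point on the board. The following TypeScript sketch uses names of our own choosing (`Vec3`, `intersectBoard`), not identifiers from the prototype:

```typescript
// Minimal ray-plane intersection for a flat whiteboard.
// All identifiers here are illustrative and not taken from the prototype.
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const add = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x + b.x, y: a.y + b.y, z: a.z + b.z });
const scale = (a: Vec3, s: number): Vec3 => ({ x: a.x * s, y: a.y * s, z: a.z * s });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

// Returns the point where the controller ray hits the board plane,
// or null if the ray is parallel to the board or points away from it.
function intersectBoard(
  origin: Vec3, dir: Vec3,             // controller position and aim direction
  boardPoint: Vec3, boardNormal: Vec3  // any point on the board and its normal
): Vec3 | null {
  const denom = dot(dir, boardNormal);
  if (Math.abs(denom) < 1e-6) return null;  // ray parallel to the board
  const t = dot(sub(boardPoint, origin), boardNormal) / denom;
  if (t < 0) return null;                   // board is behind the controller
  return add(origin, scale(dir, t));
}
```

The hit point can then be projected into board-local 2D coordinates to pick up or drop a sticky-note, which is why the method works equally well from any distance.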
Apart from the interaction with the board, the participants were also able to choose between different tools through an interaction menu, as shown in Sect. 4.3. CocoVerse [11]8 proposed a method to open the menu by reaching down with one hand to the user's waist level. This paper's prototype tried another gesture to see whether it works well too. Half of the participants liked this way of opening the menu, although all of them first had to be taught how to execute the gesture, and some of them still struggled to open it. However, all of them liked that they were able to change the active tool wherever they were. This confirms once more, as Greenwald et al. [11] also stated, that the locality of tools in virtual reality makes sense for the user and seems to be important for successful VR applications.
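The paper does not spell out its alternative gesture here; as a hedged illustration of the CocoVerse-style trigger it compares against, a "reach down to the waist" check can be as simple as comparing the hand's height to a fraction of the head's height. The names and the threshold below are assumptions of ours:

```typescript
// Illustrative waist-level menu trigger; the 0.55 ratio is an assumed
// approximation of waist height relative to head height, not a value
// taken from CocoVerse or the prototype.
interface Pose { y: number }  // vertical position in meters

function menuGestureActive(head: Pose, hand: Pose, waistRatio = 0.55): boolean {
  return hand.y < head.y * waistRatio;
}
```

A real implementation would additionally debounce the state change so the menu does not flicker while the hand hovers near the threshold.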

7 Future Work

The introduced prototype raises several points that could lead to major improvements. One important aspect comes up through the comprehension task. In order to successfully integrate a virtual reality application into a Design Thinking process, people need to understand each other well. The digital avatar used in this paper is a step in that direction, since it gives the user a sense of
8 Described in Sect. 2.

social presence of other people. However, the task showed that not only gestures but also facial expressions help to support a participant's comprehension. It is therefore definitely worthwhile to find a way to derive those expressions from the spoken words, the voice level, or a camera image of the user's face.
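One common approximation of the "voice level" idea, sketched below under our own naming, is to drive a mouth blend-shape of the avatar from the RMS amplitude of the microphone signal; the thresholds are assumptions, not values from the prototype:

```typescript
// Root-mean-square amplitude of one audio frame (e.g. samples obtained
// from the Web Audio API). Names here are illustrative.
function rmsAmplitude(samples: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Map the amplitude to a 0..1 mouth-open weight, with an assumed noise
// floor so that silence keeps the mouth closed.
function mouthOpenness(rms: number, noiseFloor = 0.02, loud = 0.3): number {
  const t = (rms - noiseFloor) / (loud - noiseFloor);
  return Math.min(1, Math.max(0, t));
}
```

This only animates the mouth; deriving richer expressions, for example from prosody or a face camera, remains the open problem described above.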
Another aspect is cross virtual reality communication. The adoption of VR devices has only just begun, so there are still many people without one. In order to promote the usage of VR applications such as the one proposed in this paper, people need to be able to work together even if they do not own a device. Evaluating different ways to integrate non-VR users into the virtual environment would make such applications more usable and compatible with heterogeneous groups.

8 Conclusion

In this paper, we investigated the impact of VR technologies on collaboration tasks, especially Design Thinking. By evaluating related work and diving deeper into the interaction techniques used, we elaborated on the best paradigms to select for our prototype to support the collaboration of remote teams.
The developed prototype was successfully capable of providing an immersive
experience by combining interaction methods and collaboration aspects of tools
like DIVE [10], CocoVerse [11] and neXboard.
It showed in particular that participants preferred to interact with the whiteboard using a ray-cast, which allowed them to get a better overview of the board. This paper also confirms the approach of changing tools with a certain gesture, proposed in the CocoVerse system, by implementing a similar tool menu.
Our user study has shown that the participants appreciate the virtual reality approach and feel that it supports the Design Thinking tasks at hand. Especially brainstorming, an essential part of Design Thinking, shows better results with the proposed VR approach than with the digital whiteboard (neXboard), according to the number of generated sticky-notes.
The digital avatars for each user had a positive influence on the communication between the participants, as they allowed them to see each other's gestures. They were also able to see where the other users were positioned in the room, which supported them in tasks like comprehension and understanding.
One interesting outcome from the user study's feedback was that placing people into a virtual environment shields them from outside influences: they were able to focus on the task without being disturbed. This perception is generated best in immersive environments. The proposed application fulfills the four important points that constitute immersiveness, and the users' feedback confirms that those points actually create an immersive feeling. This increased concentration also affects the results of the tasks performed in the user study: brainstorming and prioritization of ideas work equally well as or better than with non-VR approaches.
The perceived immersiveness of the participants also confirms that mod-
ern web technologies are capable of providing a good virtual reality experience.

The users were asked about motion sickness; only two attendees felt slightly sick, which is common for people using VR for the first time. Furthermore, the precision of the interaction was also perceived as acceptable to work with.
Comparing a 2D digital collaboration tool with the proposed virtual reality system shows that VR is useful for certain kinds of tasks, like brainstorming or prioritization. Tasks that contain comprehension elements, however, still work better when done via a video conference. Both approaches can be a good surrogate for in-person meetings that would otherwise be expensive in terms of time and money. Design Thinking across distances profits from virtual reality, especially if the application is web-based and multiple users can join with any device they like.

References
1. Sutherland, I.E.: The ultimate display. Multimedia: From Wagner to virtual reality
(1965)
2. Sutherland, I.E.: A head-mounted three dimensional display. In: Proceedings of
the December, 9–11 1968, Fall Joint Computer Conference, Part I, pp. 757–764.
ACM (1968)
3. Gumienny, R., Gericke, L., Quasthoff, M., Willems, C., Meinel, C.: Tele-board:
enabling efficient collaboration in digital design spaces. In: 2011 15th International
Conference on Computer Supported Cooperative Work in Design (CSCWD), pp.
47–54. IEEE (2011)
4. Wenzel, M., Gericke, L., Gumienny, R., Meinel, C.: Towards cross-platform
collaboration-transferring real-time groupware to the browser. In: 2013 IEEE 17th
International Conference on Computer Supported Cooperative Work in Design
(CSCWD), pp. 49–54. IEEE (2013)
5. Wenzel, M., Gericke, L., Thiele, C., Meinel, C.: Globalized design thinking: bridging
the gap between analog and digital for browser-based remote collaboration. In:
Design Thinking Research, pp. 15–33. Springer (2016)
6. Plattner, H., Meinel, C., Weinberg, U.: Design thinking–innovation lernen–
ideenwelten öffnen, mi-wirtschaftsbuch (2009)
7. Fussell, S.R., Kraut, R.E., Siegel, J.: Coordination of communication: effects of
shared visual context on collaborative work. In: Proceedings of the 2000 ACM
Conference on Computer Supported Cooperative Work, pp. 21–30. ACM (2000)
8. Wenzel, M., Meinel, C.: Full-body webRTC video conferencing in a web-based
real-time collaboration system. In: 2016 IEEE 20th International Conference on
Computer Supported Cooperative Work in Design (CSCWD), pp. 334–339. IEEE
(2016)
9. Szalavári, Z., Schmalstieg, D., Fuhrmann, A., Gervautz, M.: Studierstube: an envi-
ronment for collaboration in augmented reality. Virtual Reality 3(1), 37–48 (1998)
10. Carlsson, C., Hagsand, O.: DIVE: a multi-user virtual reality system. In: Virtual
Reality Annual International Symposium, pp. 394–400. IEEE (1993)
11. Greenwald, S.W., Corning, W., Maes, P.: Multi-user framework for collaboration
and co-creation in virtual reality. In: 12th International Conference on Computer
Supported Collaborative Learning (CSCL) (2017)

12. Giraldi, G., Silva, R., Oliveira, J.: Introduction to virtual reality. LNCC Research
report, vol. 6 (2003)
13. Hand, C.: A survey of 3d interaction techniques. Comput. Graph. Forum 16(5),
269–281 (1997)
14. Bier, E.A.: Skitters and jacks: interactive 3d positioning tools. In: Proceedings of
the 1986 Workshop on Interactive 3D Graphics, pp. 183–196. ACM (1987)
Teaching Practices with Mobile in Different Contexts

Anna Helena Silveira Sonego1, Leticia Rocha Machado2, Cristina Alba Wildt Torrezzan2, and Patricia Alejandra Behar3(✉)

1 UFRGS/PPEDU, Av. Paulo Gama, 110 – Prédio 12105 – 3° Andar Sala 401, 90040-060 Porto Alegre, RS, Brasil
2 UFRGS/PPGIE, Av. Paulo Gama, 110 – Prédio 12105 – 3° Andar Sala 401, 90040-060 Porto Alegre, RS, Brasil
3 UFRGS/PPGIE/PPGEDU, Av. Paulo Gama, 110 – Prédio 12105 – 3° Andar Sala 401, 90040-060 Porto Alegre, RS, Brasil
pbehar@terra.com.br

Abstract. This article aims to outline different pedagogical strategies using applications (apps) in the classroom. Every year the use of mobile devices like tablets and smartphones increases, and applications are being developed to meet this demand. It is therefore essential that educators investigate their use as a motivational technological medium that can be used in the classroom. Apps can serve both as a source of information and as a tool for creating material. Thus, this article presents the results of a study applying teaching strategies in different contexts, thereby highlighting the importance of mobile learning as a viable alternative in the classroom. To do so, a multiple case study was conducted in an undergraduate pedagogy program and in a digital inclusion course for seniors, both offered in the first semester of 2017 at the Federal University of Rio Grande do Sul (UFRGS). Educational applications and examples of teaching strategies using apps were created in these classes. Educational applications offer the possibility of bringing innovation to teaching practices, as well as new forms of communication, interaction and authorship, thus contributing to the process of teaching and learning.

Keywords: Educational applications · Mobile learning · Teaching strategies

1 Introduction

The number of mobile devices being produced and offered to Brazilians increases every
year. In “The Brazilian Media Study” [1] the cellular phone was ranked as the second
means of accessing the Internet (66%), followed by the tablet (7%). This shows that
Brazilians use phones for different purposes, including to access digital Internet tools.
There are different reasons for this, including the quick learning curve to use these
devices (mainly due to the interactive touch screen), mobility, fast communication and
frequent updates.
With this context in mind, it is important that the development of this new mode of
communication and reasoning is also incorporated in the classroom to keep up with the

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 982–991, 2019.
https://doi.org/10.1007/978-3-030-02686-8_73

changes in society. Thus, one can prepare a subject to use mobile technology not only
for entertainment, but also for educational goals and to meet their daily life needs more
productively.
One of the most commonly used tools on mobile devices are applications or digital
resources designed to carry out certain tasks such as communicating, playing, creating
text, etc. Currently there are about 1.43 million applications (apps) available on Google
Play (https://play.google.com/store?hl=pt-BR), and 1.21 million at the Apple Store
(https://itunes.apple.com/br/genre/ios/id36?mt=8) [2]. Yet there are few studies
regarding the use and construction of applications in education. Therefore, this paper
aims to present possible pedagogical strategies that can be used in the construction and
use of apps in the classroom, involving examples of educational activities that have
already been implemented at the Federal University of Rio Grande do Sul/Brazil.
This paper is structured in five sections. The first addresses the concept of mobile learning (Sect. 2). Then Sect. 3 describes the methodology used in this study. Next, examples of the use and construction of educational applications supported by educational strategies are presented in Sect. 4. Lastly, Sect. 5 presents the conclusions.

2 Mobile Learning

Currently, mobile technology is increasingly used in different sectors of society. Education, in turn, needs to be constantly updated in order to support its students. This brings new challenges to the educational sector, such as the Mobile Learning approach (M-Learning).
M-Learning incorporates the use of mobile technologies, separately or together with other Information and Communication Technologies (ICT) [3]. Thus, this type of technology can provide students with possibilities to construct and improve knowledge at any time or place. According to [4], M-Learning can occur in situations where technologies can offer the student means to build their knowledge. However, the simple random use of a mobile device to perform an isolated activity in the classroom is not mobile learning. For it to be effectively understood as such, the teacher needs to integrate the use of technology with pedagogical planning that involves the study of content, teaching materials, implementation strategies and activities.
In addition to supporting academic activities, this type of learning can also aid the interaction and communication among those involved in the educational process. According to [5], M-Learning provides opportunities to unite people in real and virtual worlds, creating learning communities among teachers and students. This occurs with the aim of integrating the process of teaching and learning with the use of mobile technologies. Therefore, there is a need to create one or more teaching strategies to support this educational process, or a possible set of educational activities that can be applied according to the individual and/or collective needs of students [6]. One possibility is the use of applications in the classroom, which will be discussed below.
984 A. H. S. Sonego et al.

2.1 The Use of Educational Applications in the Mobile Learning Process


Applications (apps), as described above, are programs designed especially for mobile platforms such as smartphones and tablets [7]. When used in the classroom, they can become an educational resource [8], capable of providing an innovative, dynamic, interactive, collaborative and even playful knowledge-building process.
There are tools that allow teachers and students to build their own educational applications, some of which are available in free versions. For example, the Fábrica de Aplicativos1 (http://fabricadeaplicativos.com.br/) enables the creation of applications for mobile devices in different areas, offering a reasonable amount of features.
This perspective contends that the construction and use of apps can be integrated into educational objectives, challenging educators and students and also prompting innovations in teaching and learning. In addition, app-building is a way to mediate learning with the use of mobile devices in the classroom. Therefore, instead of prohibiting the use of these devices, pedagogical strategies must be created to bring the educational environment closer to the current social reality.
Hence, it is argued that teachers and students may gradually find new ways to use applications. They will no longer be solely for entertainment, but increasingly used to solve everyday problems. Autonomy, collaboration and interaction are also fostered by this strategy, since students can take an authorial stance, from the search for useful applications to their creation and the sharing of this resource with the class. It is also a way to unite theory and practice, enabling the construction of meaning for the covered content.
However, simply using applications is not sufficient to support educational goals. It is necessary to formulate pedagogical strategies that integrate the elements involved in the process of teaching and learning to promote quality education. Thus, the following sections will present some pedagogical strategies used in this study for the creation and use of applications in the classroom.

3 Methodology

This paper explores pedagogical strategies that can be adopted to create educational applications. The research is descriptive and theoretical-practical, because it is dedicated to the (re)construction of ideas and the improvement of principles related to studies of mobile learning and authorship. A qualitative case study, using different instruments, was carried out, and we believe that the qualitative approach contributes significantly to meeting the objectives proposed in this study. Based on [9, p. 21], qualitative research is a science that is attentive to studies that cannot be quantified and, at the same time, can work “with the universe of meanings, reasons, aspirations, beliefs, values, and attitudes.” A case study was chosen because, according to [10], it is a type of research that refers to phenomena, facts, and contemporary events that are part of our daily lives. According to the author, “the differentiating power of the study is its ability to handle a wide variety of evidence - documents, artifacts, interviews

1 Application Factory in English, a resource in Portuguese found at the site provided in the text.

and observations - beyond what might be available in a conventional historical study” [10, p. 27]. Thus, the case study allowed us to evaluate how educational applications can enhance teaching as well as the students' learning process. Moreover, based on M-Learning, we investigated what capacities are needed for mobile learning.
In order to meet the proposed objectives, the study was conducted in three recursive
steps:
(1) Construction of the theoretical framework of the themes: mobile learning, mobile devices, educational applications, and authorship. We studied authors who deal with M-learning in different contexts, both in conventional classrooms and in other educational spaces (such as continuing education). Among the areas surveyed, those that stand out are education, gerontology, design, and information technology [3, 7, 8, 11, 12].
(2) Planning and implementation of the classes: The intention was to plan and implement teaching strategies that include the students' authorial development of educational applications. Two groups were used to develop and implement this study: undergraduate students in the Education Department at the Federal University of Rio Grande do Sul (UFRGS/Brazil) and elderly students in a digital inclusion course at the same university. The first group was made up of 26 undergraduate students who were 18 years old or older, enrolled in a traditional classroom course in the Education Department at UFRGS/Brazil. The workload of the course was 45 h, and students had to develop an application relevant to the areas of technology and education. The class was observed during the formal class times, and in addition the students were given a survey questionnaire. It aimed to analyze opinions and expectations regarding the experience of M-learning in undergraduate education.
The questionnaire was created on a site that provides this type of instrument on the Internet and was made available to students through the virtual learning environment ROODA (in English, the acronym corresponds to Cooperative Network of Learning; http://ead.ufrgs.br/rooda). Based on the survey responses, statements were selected that best highlighted the construction of educational applications, relating the possible opportunities and challenges associated with the use of an app in the school environment. The answers made it possible to analyze information about the relationship between theory and practice in teaching and learning mediated by mobile devices. The alternatives of the questionnaire responses were organized based on the Likert scale, which, according to [13, p. 4], “requires respondents to indicate their level of agreement or disagreement with statements regarding the attitude being measured.” Thus, it was possible to conduct an analysis by triangulating the data, which is only possible when one has more than one source with different information. The author in [14, p. 1142] states that triangulation happens “when we use more than one approach to the investigation of a research question in order to increase confidence in the results.” The triangulation of data therefore allows the expansion of the research, unlike when only one research procedure is used. The second group was made up of elderly people, 60 years old or older, who participated in a digital inclusion course offered at the same university. The class had a workload of 45 h, and all of the classes were in person. The goal was to build applications that show the main sights of the city in which

they lived. These two groups were chosen in order to compare the use of pedagogical strategies that allow the construction of ideas and the production of knowledge through the authorship of materials for M-learning.
(3) Development of educational strategies for the educational use of applications: This step was based on the theoretical framework and the results obtained in the undergraduate course and the continuing education workshop.
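The paper does not publish its scoring procedure for the Likert items mentioned in step (2); as a hedged sketch of how such responses are commonly summarized (assuming a 5-point coding, 1 = strongly disagree to 5 = strongly agree, and function names of our own):

```typescript
// Summarize 5-point Likert responses: mean score and the share of
// respondents who agree or strongly agree (codes 4 and 5).
// The coding and the name likertSummary are illustrative assumptions.
function likertSummary(responses: number[]): { mean: number; agreeShare: number } {
  const mean = responses.reduce((a, b) => a + b, 0) / responses.length;
  const agreeShare = responses.filter(r => r >= 4).length / responses.length;
  return { mean, agreeShare };
}
```

Summaries of this kind can then be triangulated with the observation notes, as described in step (2).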
Two data collection instruments were used: (a) participant observation; (b) data collected through the productions made with the virtual learning environment's features. The following section presents the trajectory and results of this research.

4 Trajectory and Results

The construction of educational apps in the classroom involved students in research (they had to research applications and themes for them). Moreover, they had to read, understand texts and write for their applications. Hence, this multiple case study involved two groups of students: an undergraduate pedagogy course and an extension course for seniors. This enabled analysis of how educational applications can enhance the teaching and learning of students through M-Learning.

4.1 The Construction of Applications in a Pedagogical Undergraduate Course

Building an application in the undergraduate course began with the planning of a group task, which was to design and develop an educational app. The themes were to be related to topics studied in class or to information technology in education, an issue closely linked to the subject of the class. The apps were asked to present a theme (in the application description), a suggestion of an educational app, application tips, examples from videos, a photo album, audio, references and credits (authors).
The activity began in the week that discussed the topic “Mobile Learning”, lasting for 14 more days (including distance learning). At the end of this time, students posted the application link in the virtual learning environment ROODA (in English, the acronym corresponds to Cooperative Network of Learning). This is a virtual platform for distance learning (https://ead.ufrgs.br/rooda/), which was used to plan and organize the “Media, Digital Technologies and Education” class, offered in the daytime pedagogy course in the first semester of 2017 at UFRGS/Brazil. This platform provided support for this research.
Examples of apps produced in this undergraduate course are presented below (Figs. 1 and 2). These applications can be accessed at the online addresses provided and installed on a mobile device. They are available on the open Internet rather than through any specific mobile device app store.

Fig. 1. Example of an application made by a student in the education class. Available at: http://galeria.fabricadeaplicativos.com.br/repositorio_digital.

Fig. 2. Example of an application made by a student in the education class. Available at: http://galeria.fabricadeaplicativos.com.br/infoplan-turmab-midias#gsc.tab=0

4.2 Construction of Applications in an Elderly Digital Inclusion Course


Research about the use of applications by the elderly is still quite recent, and there are few apps geared toward this population. Those available are primarily related to the health of the elderly (medication reminders, diabetes control, etc.). It is worth asking when education will produce applications and/or investigate teaching strategies that meet the elderly's other needs (social, cultural, technological, etc.). Therefore, a demand exists to create pedagogical strategies that can assist in the elderly's critical development through, for example, authorship.
The Digital Inclusion Unit (UNIDI) of the Federal University of Rio Grande do Sul (UFRGS) offered a blended distance/classroom workshop for seniors in 2014, called “Between cultures in southern Brazil: The elderly's view of the city Porto Alegre.” The workshop lasted five months, with two-hour weekly meetings. The goal was for the elderly to create applications presenting the city where they live, the most interesting places to go, and tourist sites for other seniors to visit.
The virtual learning environment ROODA (Cooperative Learning Network) was used as a pedagogical strategy to develop these applications. In addition to communication tools such as chat and a forum, this environment also provided support materials such as tutorials and a page with detailed lessons about the workshop (http://intercultura.weebly.com/).
Each participant had the goal of creating an application about the city. Field trips
were included in the classes so that participants could collect data on the region and also
take pictures of the scenery.
A total of 15 seniors participated in the workshop, with an average age of 67. However, only 5 applications were completed in the workshop by the elderly themselves: Route of the POA tourist bus, Buildings in Porto Alegre, Landmarks of Porto Alegre - RS, Bus rides, and Gaucho legends. Figures 3 and 4, presented below, show examples of applications developed in this digital inclusion course for the elderly. All of the apps designed can be accessed through the online addresses provided in the figures.

Fig. 3. Example of an application developed by a student in the class for the elderly. Available
at: http://galeria.fabricadeaplicativos.com.br/onibusturismopa

Fig. 4. Example of an application developed by a student in the class for the elderly. Available
at: http://galeria.fabricadeaplicativos.com.br/lendasgauchas

4.3 Outline of Pedagogical Strategies


The results of these strategies point to the valid contribution of creating an educational application to the collaborative building and sharing of information, knowledge and concepts. It was taken into account that the activity was published on the ROODA Webfolio in a format visible to all, enabling anyone to go to the address (URL) of each app developed in the class and in the extension course for seniors. Moreover, it allowed all students to view their peers' work on their mobile devices. They could download the applications that interested them, in terms of theme and/or interactive content, providing a less linear reading containing video, audio, images, links and other elements.
It is possible to outline some pedagogical strategies that can assist in the production
of applications in the classroom based on this research and experience:
– Planning: In addition to outlining the objectives of the educational proposal, it is
important to decide the subject of applications with the students so that they are
involved and motivated to develop the apps.
– Materials: It is important to plan time to collect materials for the application. A class on how to collect materials (photos, images from the Internet, videos, etc.) is also necessary, as well as one on how to organize the information into specific folders on the computer so it is easier to find when it is time to create the app.
– Features to create apps: It is difficult to find tools for building applications that are easy to use and are also in Portuguese; there are few tools for laypeople. The tool used in the two examples presented in this article is the Fábrica de Aplicativos (http://fabricadeaplicativos.com.br/). Although it is relatively easy to use on a computer, it limits the features that can be included in the app.

– Copyright: It is very important to take precautions regarding copyrights on the materials produced and the applications. One must be extra careful, because these apps can be accessed and downloaded on mobile devices by anyone in the world.
– Educational goal: Without an educational goal, applications provide little student
involvement and can even be discouraging. The clarity of educational objectives in
building the app, for the teacher as well as the student, is essential for the proper
application of this technology.
These are some of the pedagogical strategies that can be adopted by teachers at
different levels and in different types of education. There is still a great deal of research
to be done and much to be proposed in this field. However, it has been shown that the
development of educational applications in the classroom is extremely compelling and
challenging for students: it motivates them to continue learning and developing other
applications of interest and can help them to acquire knowledge.

5 Final Considerations

This work has shown that the use and construction of educational applications as a
pedagogical and authorial strategy is relevant. In fact, it has the potential to generate
innovation in schools, offer new and different possibilities in the teaching and learning
process, and help students to better understand content and information.
Thus, mobile learning presents innovations as well as challenges for its implementation,
such as connectivity, portability, flexibility, student autonomy, and new forms of
communication and interaction. Mobile learning is still under development, so it remains
necessary to research and understand this tool and its possibilities in education. In future
research, the system will be developed and implemented with expert judgment or
rubric-based analysis that allows the supposed benefits of digital devices in teaching to
be observed. Hence, this article hopes to provoke reflection on mobile learning in
schools, aiming to strengthen the related concepts and to aid in the use and development
of educational applications in the classroom.

Teaching Practices with Mobile in Different Contexts 991

Accessibility and New Technology MOOC - Disability and Active Aging: Technological Support

Samuel A. Navarro Ortega¹,² and M. Pilar Munuera Gómez¹,²
¹ The University of British Columbia, Vancouver, Canada
² Universidad Complutense, Madrid, Spain
pmunuera@ucm.es

Abstract. Covered in this paper are the notions of autonomy for the disabled
and the elderly and universal access to new Information and Communications
Technology (ICT). Assistance provided to the disabled has its roots in the
human rights movement which proposes a new model to analyze the evolution
of treatment for people with disabilities. A historical-critical analysis of the ideas
and attitudes that for centuries have shaped the lives of people with disabilities
reveals a situation of discrimination and social exclusion. In the case of Spain,
for example, more than 4 million people lived with disabilities in 2009.
Recently, many obstacles have been overcome in the process of bringing dis-
abled people and the elderly into mainstream social life. In countries such as
Canada and Sweden, legislation and public policies have ensured that disabled
people and the elderly maintain quality of life, with full recognition of their
rights as citizens. The paper concludes by presenting the MOOC Disability and
Active Aging: Technological Support. This new educational initiative is being
developed at Universidad Complutense in collaboration with international
institutions. The aim is to inform the community about technologies whose
application enhances independent living for persons with a functional diversity.
A first version of the course in 2017 attracted 3,334 participants from six
continents and coming from a wide range of backgrounds and ages. Participants
appreciated above all the methodological approach of the course. They stressed
the broad perspective from which new technologies were discussed in order to
promote independence and autonomy for the disabled and the elderly.

Keywords: Accessibility · Information and communications technology · Disabilities

1 Introduction

New technological innovations are serving to enhance the learning processes and living
conditions of communities of disabled individuals and the elderly [24]. Information
and Communications Technology (hereafter ICT or IC technologies) should be
understood as technologies that process, store and communicate information to single
users and across groups of users [15]. Most importantly, these new technologies

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 992–1004, 2019.
https://doi.org/10.1007/978-3-030-02686-8_74
Accessibility and New Technology MOOC 993

decrease the number of obstacles that disabled individuals face when applying for new
jobs, enrolling in educational programs, or simply pursuing daily life activities.
The mandate of the United Nations Convention on the Rights of Persons with
Disabilities acknowledges that disability “results from the interaction between persons
with impairments and attitudinal and environmental barriers that hinder their full and
effective participation in society on an equal basis with others” [38]. Clearly, IC
technologies cannot eliminate attitudinal barriers; they can however increase effective
functionality. In other words, ICT can help ameliorate differences in effectiveness by
allowing persons with a disability (i.e., a health condition) to function on a par with the
rest of the community.
By the early 1990s, the development of commercial networks and enterprises
marked the beginning of the transition to the modern Internet [33]. In parallel with the
sustained growth of Internet connectivity, there has been an influx of new products and
services, some of which have directly benefitted people with special needs. Echoing
this developmental trend in technology, the United Nations Convention endorsed and
motivated those involved in the research and design of new technologies to continue
such efforts. More precisely, Article 4 of the Convention encourages States Parties “To
undertake or promote research and development of, and to promote the availability and
use of new technologies, including information and communications technologies,
mobility aids, devices and assistive technologies, suitable for persons with disabilities,
giving priority to technologies at an affordable cost” [8, 9, 38].
Article 9 of this same Convention advocates for the opportunity for people with
special needs to live independently, engaging fully in all aspects of life. Likewise,
States Parties are encouraged to “provide persons with disabilities access, on an equal
basis with others, to the physical environment, to transportation, to information and
communications, including information and communications technologies and sys-
tems, and to other facilities and services open or provided to the public” [3, 4, 38]. Two
concrete examples that address this mandate have to do with access to Internet services
and to websites. Drawing on the increasing reliance on the Internet to access services
(e.g., electronic mail, online banking, news groups, etc.), it is imperative that the
community of the disabled and the elderly also be able to make use of them at an
affordable cost. And because much online information is organized on websites, it is
imperative that the sites consider users who might require assistance to navigate them.
In the case of public administrations, they should mandate that their organizations
design websites following explicit criteria of accessibility such as adjustable font sizes
and color contrasts, to name a few. The idea is to make websites more user-friendly to
the visually disabled or the hard of hearing as explained below.
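One concrete, machine-checkable accessibility criterion is color contrast. The W3C Web Content Accessibility Guidelines (WCAG 2.x) publish a formula for the contrast ratio between foreground and background colors and require at least 4.5:1 for normal body text at level AA. The following is a minimal sketch of that published formula (the function names are ours, not part of any standard API):

```python
def srgb_to_linear(channel: float) -> float:
    """Linearize one sRGB channel value in [0, 1] (WCAG 2.x formula)."""
    if channel <= 0.03928:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple) -> float:
    """Relative luminance of an (R, G, B) color with channels in 0-255."""
    r, g, b = (srgb_to_linear(v / 255.0) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple, bg: tuple) -> float:
    """WCAG contrast ratio between two colors; >= 4.5 passes level AA for body text."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# Black text on a white background gives the maximum possible ratio of 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # → 21.0
```

An accessibility audit could run such a check over every text/background color pair in a site's stylesheets and flag pairs below 4.5:1; this is the kind of explicit, verifiable criterion a public administration could mandate for its websites.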
The jobsite for UK disability employment is a good example of a government
organization responding to Article 9 of the United Nations Convention. The site offers
an online interface designed to deliver “barrier-free” e-recruitment [12]. If a disabled
person were looking for a position as a manager, the site offers 50 options to choose
from. But what would happen if the jobseeker were visually impaired and could not
read the ad for the job? The site is equipped with text-to-speech capability so that the
visually impaired jobseeker can listen to the advertisement being read at a slow speech
rate and clearly enunciated. But if the jobseeker were hearing impaired, the jobsite
acknowledges that a face-to-face job interview would be more challenging than a
994 S. A. Navarro Ortega and M. Pilar Munuera Gómez

phone interview. Thus, the jobsite advises the jobseeker to follow tips for the hard of
hearing, such as using a hearing aid, sitting as close to the interviewer as possible, facing
the person if feasible, and filling in missing words from the conversation so that concepts
are understood [12]. Interestingly, an ad for a position as service manager in homebuilding
stated that the successful candidate would receive the latest IT kit (e.g., a Surface Pro
tablet and smart phone), supported by the company’s new online integrated IT system.
The disabled applicant selected for the job would therefore not have to be in the office at
all times; instead, the employer offers the possibility of completing support plans on the go.
Just as in the workforce, Information and Communications Technologies are also
accommodating academic environments so that learners who live with a disability can
feel more productive. In the case of low vision and visually impaired learners of an
additional language (e.g., Spanish, English), there are several forms of technology that
can enhance students’ learning experience. The market offers software packages that
enable visually disabled learners to work with digitalized copies of language textbooks.
Other software packages enable a blind student to dictate information to a personal
computer and afterward have the computer read the material back to the learner. And
there are even electronic devices capable of enlarging print materials so that visually
disabled students can read handouts or flashcards. Even in the case of the movies and
documentary films that often enrich students’ exposure to a new language, voices off
camera added to the audiovisual materials allow a visually disabled learner to follow the
narrative being displayed [28]. It is interesting to consider that IC technologies enhance
the possibility of accommodating learners in and outside of the classroom without
neutralizing the features that individualize each learner [14]. On the contrary, these
supporting tools facilitate the academic integration of the community of visually dis-
abled students as they continue to function in line with their own learning styles [29].
In short, technology has the potential to become the greatest ally of the disabled and
the elderly, bringing them closer to functioning independently [10].

2 A Step Forward in Assisting People with Disabilities

At present, the professionals and organizations that help persons living with a disability
no longer use old, negative terms to talk about them [6]. Most importantly, disabled
persons are no longer perceived as carriers of something harmful and who should not
be seen. The traditional view has been replaced by the notion of rehabilitation, which
in turn underscores the concept of autonomy. That is, a disabled person can achieve
degrees of autonomy thanks to a process of rehabilitation [30]. This perspective of
rehabilitation is rooted in the social movement that strives for the rights of persons with
disabilities [38].
Worth mentioning is that a disability arises from an interaction between a foreseeably
permanent impairment (i.e., a health condition) and contextual factors (e.g.,
environmental and personal) that may impede social integration. This definition
stresses the idea of societal barriers (or obstacles) which hinder full assimilation of the
disabled person. In this context, information and communications technologies are
tools that help achieve autonomy. For the disabled person, this offers greater oppor-
tunities to function on an equal footing with the rest of the society.
In line with this reconceptualization of a disabled person and their functional role in
society [6, 34], our lexicon has also been going through a process of positive change.
Gradually, pejorative and patronizing terms to label disabilities have yielded ground to
more inclusive and non-negative language. For instance, we now acknowledge that a
person who lives with a disability is someone who has a functional diversity [31, 32].
That is, the disability signals an aspect that makes a person distinct from others, but by
no means should this be considered an adverse feature (cf. people with special needs).
Underlying the semantics of functional diversity, there is the idea of lack of respect
that persons without functional diversity (i.e., “normal” or “healthy” individuals)
sometimes display toward the cohort of disabled people. Social constructions and even
environmental structures appear to disregard persons living with a disability [30]. This
is the case with poor accessibility (e.g., lack of wheelchair ramps) in workplaces,
recreational areas, or government buildings. It is not uncommon that the design and
construction of public areas favors sophisticated architectural designs over function-
ality. This ends up imposing mobility challenges for persons with physical functional
diversity or those with a visual processing functional diversity.
In short, stepping forward in assisting people with disabilities entails two major
requirements. On the one hand, there is a need to obtain and maintain equal rights for
the community of the disabled. On the other hand, we need to develop innovative
theories that place value on the intrinsic dignity of people discriminated against due to
their functional diversity. This is a topic that may lead into bioethics debates about their
autonomy.

3 Disability in the 21st Century

According to the National Institute of Statistics, more than 4 million people in Spain
lived with disabilities in 2009 [5]. In that same year, the Canadian national portrait of
disability showed that about 4.4 million people — one in seven — had a disability, an
increase from earlier that decade [15, 21]. In Sweden, about 1.5 million people have
some type of disability [35], and if families are taken into consideration, the number of
citizens who are directly involved increases exponentially. The question that comes to
mind is how we have arrived at this situation for so many disabled people in the 21st
century.
Historical-critical analysis of the ideas and attitudes that for centuries have shaped
the lives of people with disabilities reveals a largely ignored path of suffering, discrimination,
and social exclusion to which they have been subjected. This is certainly paradoxical if
we consider that many problems related to a disability are often caused more by issues
of a social nature than by the health condition per se. Oppressive reactions from
individuals without a functional disability have been in many instances more detri-
mental for disabled people than the psychomotor limitation with which they live [2]. In
the introduction to the 2009 Federal Disability Report, former Human Resources
Minister of Canada, Diane Finley, said “[t]he challenges people with disabilities face in
their day-to-day lives are numerous and often go unnoticed” [20]. This realization has
been critical for understanding how to develop appropriate public policies. Fortunately,
over the last few decades, this disquieting situation in which many persons with a
disability live has been gradually improving, as mentioned in the previous section. The
effort and commitment of a few organizations and government agencies have con-
tributed to the overcoming of numerous obstacles, which in turn has brought about an
integration of the community of disabled persons into mainstream society [11].
It is important to bear in mind that social changes concerning treatment of people in
a state of dependency have resulted from the struggles and perseverance of the disabled
people themselves and their families. Together, they have strived to achieve civil rights
and social equality [21–23]. Likewise, important emphasis has now been assigned to
the possibility that persons with a functional diversity are able to enjoy independent
living [34]. In other words, the expectation remains that people who live with a dis-
ability can in fact participate in society enjoying the fullness of their rights (e.g., make
personal decisions such as getting married).
The UNICEF World Report on Disability shows how “[p]olicy shifted towards
community and educational inclusion and medically-focused solutions which have
created more interactive approaches recognizing that people are disabled by environ-
mental factors as well as their bodies” [39]. Some societies have promptly responded to
these demands [3, 4]. This is the case with the 2005 Disability Discrimination Act in
the United Kingdom of Great Britain and Northern Ireland. This Act led public sector
organizations to further equality for persons with disabilities through initiatives such as
introducing a corporate disability equality strategy or evaluating the potential impact of
proposed policies and activities on disabled people [18]. Similarly, the Swedish gov-
ernment’s disability policy aims to offer persons with functional diversity increased
opportunities to participate in society on equal terms with others. The official site of
Sweden notes that the government has identified several priorities of importance for
disabled people, among which the justice system, transportation, and Information
Technology (IT) are top priorities [35]. In the case of IT, the goal has been “to give
people with disabilities a greater degree of independence”. Consequently, there is a
pronounced emphasis on digital inclusion in Sweden’s national IT strategy [35].
In Canada, the federal government has been consistently working toward creating
policies that ensure the well-being of an increasing number of disabled people. The
official site of the Government of Canada has a special section called Living with a
Disability, whose aim is to inform citizens about the many services and financial
benefits available to assist people with disabilities and their families [20]. Likewise,
users learn that Service Canada has a list of all the benefits (e.g., pension plan disability
benefits, benefits for children, etc.) available for Canadians who are functionally
diverse. The Canadian government, in line with the World Report, acknowledges that
an aging population now has a much longer life expectancy. And the most common
current types of disability are those caused by a natural process of psychomotor
deterioration such as pain, and by mobility and agility issues [21].
In an effort to appreciate the hegemony of disability, it is fundamental to understand
how the individualization of disability is interconnected across levels of society, in
politics, in practice, and in personal experience. Considering these interconnections is
vital in order to reformulate disability as an issue for society, and to develop a more
appropriate understanding of political responses, professional practices and personal
experience [32]. In what follows, we introduce an educational initiative that is being
developed in Spain and which directly addresses the issue of social integration for all
through new information and communications technologies.

4 MOOC-Disability and Active Aging: Technological Support

4.1 General Description


The Massive Open Online Course (MOOC) Disability and Active Aging: Technolog-
ical Support is an educational initiative developed at Universidad Complutense,
Madrid, with collaboration from other national and international institutions. The
course is available on the Miriadax platform (https://miriadax.net/web/discapacidad-y-envejecimiento-activos-soportes-tecnologicos) at no cost.
The primary objective of this course is to offer information about technologies
whose application enhances the life of persons living with a disability (e.g., indepen-
dent living, autonomy, inclusion, accessibility). With its modular structure, the course
allows students to complete one or more modules simultaneously, in any particular
order, advancing at their own pace. Students learn that IC technologies have trans-
formed production systems, creating a context of liberation and increasing competition
in a globalized world. For the community of the disabled and the elderly, these new
technologies represent a clear possibility of achieving social integration for the com-
munity at large. Computers, microelectronics, multimedia, and telecommunications are
examples of highly widespread information and communications technologies studied
in the course. These technologies are largely available in homes, workplaces, and
academic centres.
The course contents are highly useful, and they may motivate the creation of new
technological advances that continue to favor equal opportunities for the disabled and
the elderly. This course, then, expects to promote both creativity and interest in the use
of ICT, bearing in mind that its application should maintain respect for the individual
and his or her rights [8, 9, 14].
Course participants learn how a correct application of IC technologies increases
success in education; hence, institutions are able to more successfully accommodate
disabled students and senior citizens. For instance, in Module 10, students learn about
international experiences from Canada with respect to how this country deals with the
issue of learning disabilities among the adult population. In particular, students learn
that Canadian seniors with learning disabilities are far more successful at meeting their
needs for aids and devices now than they were in 2001 (e.g., more than 56% of adults’
needs were fully met, compared with just 17.4% earlier) [20]. Conversely, younger
Canadians and those with communicative disabilities experience problems in accessing
the necessary aids and devices, for the most part due to cost [20, 21].
The course invites participants to critically analyze some of these technologies.
Drawing on the notion of Individual Learner Differences, [14] it stresses the fact that
each person has different skills and abilities, meaning that we all learn at our own pace.
Bearing this idea in mind, the course emphasizes that the design and development of IC
technologies should consider individual differences [1, 14, 25]. What’s more, it
emphasizes that technological support should be experimentally tested and integrate a
diversity of users. The idea is that when these products become available in the market,
the community of disabled users or the elderly will be able to draw on previous
experience to inform them. With this idea in mind, students are expected to participate
actively (e.g., joining discussion groups online and in class), to share information, and
to discuss personal experiences, etc. The ultimate goal is to inspire new advances while
maintaining a critical and analytical position.
The team of instructors includes academics from Spanish and international uni-
versities, as well as experts in disciplines such as disability, aging, new technologies,
and social policies. A unifying characteristic of all the instructors is their interest in ICT
that is capable of facilitating the social inclusion of disabled persons and the elderly.
Students also receive first-hand experience from people who employ assistive tech-
nology to improve their quality of life.
The MOOC is particularly useful for students of ICT Engineering, as they need in
their technological designs to consider users who are disabled or face a natural aging
process [8, 9]. Likewise, students in the Social Sciences, Law, and health-related fields
benefit from learning about improvements in living conditions thanks to the use of IC
technologies [16]. Ideally, positive outcomes studied in the course will promote
interdisciplinary start-up projects.

4.2 Course Objectives


The following are general objectives envisioned for the course:
1. Raise awareness in the community about the advances that the use of IC tech-
nologies offers to facilitate social participation and accessibility for persons with a
functional diversity.
2. Inform professionals and the community about the possibilities that IC technologies
offer the elderly in order to maintain an active aging process.

4.3 Course Modules


The total structure of the course is comprised of 11 modules. Each module has its own
set of contents and objectives. Students can complete the modules in any order and, as
mentioned before, they can study more than a single module at a time. See Table 1 for
a complete description.

4.4 A Comparative View of Universidad Complutense’s MOOC and Those from Other Institutions
The MOOC Disability and Active Aging: Technological Support, developed by aca-
demics from Universidad Complutense, is certainly not the first course nor the only one
of its type. A quick search of the Web shows that several institutions are offering
courses on disability, such as MOOC: e-Learning Inclusive [25] or accessibility
through the use of ICT for customers and employees with disabilities
Table 1. Modules and descriptors of the MOOC Disability and Active Aging: Technological Support.

Module 0:  Introduction and Course Presentation
Module 1:  Social Participation of Disabled People through Access to New Technologies
Module 2:  Accessibility to the Web
Module 3:  Public Policies for Persons with a Functional Diversity and for Active Aging
Module 4:  Social Intervention for the Elderly
Module 5:  Food and Nutrition for a Healthy Aging
Module 6:  Tele-assistance and Digital Home. Technological Support Tools for the Well-being of the Elderly
Module 7:  Alzheimer Patients: Social and Family Intervention, Memory Treatment. Software and Computing Applications
Module 8:  Social Services within the Context of Disability
Module 9:  Telemedicine
Module 10: International Experiences
Module 11: Occupational Therapy for Healthy Aging

[17] (see Accessibility MOOCs and Free Online Courses for more information) [27].
Likewise, academic forums such as the annual international congress on the theme of
University and Disability [19] or the Closing the Gap 2018: 36th Annual Conference
[7] gather researchers, academics, therapists, clinicians, and experts in disability and
accessibility.
It is certainly encouraging to see these numerous initiatives, as they all, in one way
or another, bring the community of disabled people to the forefront of our discussion.
Furthermore, they trigger the creation of new courses on the topic of disability. Clearly,
educational institutions vary on the syllabus design and objectives envisioned for each
course. For example, our MOOC offers a very practical review of what technological
devices are currently available to enhance the social participation and inclusion of the
disabled and the elderly. Also, it emphasizes the view of disabled individuals who have
concrete realities and who support their lives with technology. The course does not
focus on a single theme (e.g., issues of accessibility) as most other MOOCs do; instead,
it presents eleven different but interrelated topics. The unifying thread of all the topics
is the use of information and communications technology to promote the autonomy of
the disabled and the elderly. Because the language of instruction is Spanish, Hispanic
countries can access state-of-the-art information delivered in their first language.
Table 2 summarizes relevant comparisons between Universidad Complutense’s
MOOC and a few programs with similar characteristics.
Table 2. Comparison of Universidad Complutense’s MOOC and other MOOCs on IC technologies and the disabled.

Universidad Complutense’s MOOC: Information and communications technologies are presented from a broad perspective in which they function as assistive tools to develop independent living for the disabled and the elderly.
Other MOOCs: New technologies are studied as assistive tools for specific objectives, e.g., integrating disabled university students [13] or integrating disabled customers or employees [17, 26].

Universidad Complutense’s MOOC: There is an integral view of the different applications of IC technologies to improve the living conditions of disabled people. In particular, they enhance the development of autonomy.
Other MOOCs: Less emphasis is put on viewing IC technologies vis-à-vis the social participation of the disabled and the elderly. Instead, they explain how to design ICT that might be more accessible, e.g., for education administrators [1] or e-Learning inclusive [25].

Universidad Complutense’s MOOC: In addition to new technologies, emphasis is put on topics that favor a better quality of living for the disabled (e.g., food and nutrition, occupational therapy for a healthy aging).
Other MOOCs: Course syllabi are designed around a single topic, e.g., how to adapt an online course for a person with cognitive disabilities [25], or the impact of transport systems on accessibility, safety regulations, etc. [37]. They also offer theoretical arguments to account for social or economic challenges imposed by longer life expectancy.

4.5 First Version of the Course


In January 2017, the MOOC Disability and Active Aging: Technological Support was
taught for the first time to a total enrollment of 3,334 students. Of these, 79.25%
(N = 2,471) were students under 25 years of age (minimum age), whereas 0.9%
(N = 9) were students 65 years and older (maximum age). The students’ backgrounds
also showed great diversity. For example, 2.79% (N = 93) of the course participants
were academics/researchers, 10.65% (N = 355) were university graduates, 5.55%
(N = 185) were undergraduates, 0.87% (N = 29) had not yet begun university, 0.48%
(N = 16) were university staff, and a large majority of participants, 79.66% (N = 2,656),
reported no activity.
Geographical Distribution. The course attracted students from distant geographical
locations. For example, 27.29% (N = 910) of the students were located in Spain, followed
by ample participation from Latin America (e.g., 6.64% (N = 207) from Chile,
4.38% (N = 146) from Argentina, and 3.84% (N = 128) from Mexico). There were
also students from European countries such as Portugal (N = 11), the United Kingdom
(N = 4), and France (N = 5). From North America, there were 3 students from Canada;
from Oceania, 2 from Australia; and from Eurasia, 2 from the Russian Federation.
From East Asia, there was a student from Japan; and from Northern Africa, there was
one from Morocco. As observed, the MOOC sparked interest from six of the seven
continents of the world.
Accessibility and New Technology MOOC 1001

Methodological Approach. The course disseminated information on IC technologies
largely as aids that improve the quality of life of persons with a functional diversity.
Unlike most MOOCs, ours adopted a comprehensive view of new technologies for
people’s overall well-being.
Instructional Materials. Instructors designed their own syllabi, which included
objectives, materials, activities, and assessment criteria. Moreover, each instructor
facilitated access to a broad range of materials formatted as downloadable PDF documents,
PowerPoint slides, or selected websites. Participants could also screen a series of
educational video materials explaining, among other things, procedures for caring for a
person in a situation of dependency, and applications and devices for blind or visually
impaired people (e.g., TapTapSee, which photographs and describes objects for the user [36]).
Team of Instructors. Instructors were academics, researchers and specialists working
in a variety of fields (e.g., artificial intelligence, psychology, social work). They were
affiliated with universities and institutions in Spain and abroad.
Collaborative Participation. In alignment with the general objective of the course,
students and instructors worked collaboratively. Students took advantage of their
participation in the discussion board, where they exchanged information and held
active discussions. Worth mentioning is the case of a Colombian student who prepared
a video interview on one of the module topics, and showed it in an English-language
class.
Student Satisfaction. At the end of the course, instructors collected feedback from
students. The aim was to learn about their satisfaction with the course, and receive
suggestions that could help perfect it. This is well-illustrated in the case of a
Venezuelan student who wrote the following:
Al finalizar el curso deseo dar las gracias por el apoyo y el material compartido. Ciertamente
lo aprendido me ha hecho ver de forma diferente el estilo de vida y las necesidades de los
adultos mayores y las personas con alguna discapacidad, la sensibilización es muy importante.
En Latinoamérica, específicamente en mi país Venezuela hay mucho por hacer y crear, sin
embargo, visualizar los avances de otros países lo reta a uno como profesional. Y. Y. M. M., 12
de Junio, 2017. [At the end of the course, I would like to express my thanks for the support and
the material that was shared. Clearly, what was learned has helped me see differently the living
conditions and needs of the elderly and of persons with a disability, sensitizing is very
important. In Latin America, especially in my country of Venezuela, there is a lot to do and
create; however, viewing the advances in other countries challenges one as a professional.]

5 Conclusion

The social inclusion of the elderly and persons living with a disability has not always
been a successful enterprise. The marginalization of those who, due to a natural aging
process, see their physical and cognitive abilities diminished has been far too common.
Similar situations have been observed among the community of people who live with a
disability.
1002 S. A. Navarro Ortega and M. Pilar Munuera Gómez

The development of new Information and Communications technologies is steadily
helping to reverse societal exclusion. These technological advances, together with
governmental actions and social policies, are ensuring, rightly, that the elderly and the
disabled continue to be protected and acknowledged as valuable community members.
The abovementioned academic initiative led by Universidad Complutense, in paving
the way for widespread experimentation with and dissemination of IC technologies,
plays an important part in our progress toward full social integration and independent
living for persons with a functional diversity.

References
1. Administering School ICT Infrastructure: developing your knowledge and skills, European
Schoolnet. https://www.mooc-list.com/course/administering-school-ict-infrastructure-
developing-your-knowledge-and-skills-european. Accessed 21 May 2018
2. Alemán, C., Ramos, M.M.: Políticas para la Promoción de la Autonomía Personal y
Atención a las Personas en Situación de Dependencia [Policies for the Promotion of Personal
Autonomy and Attention to People in Situations of Dependency]. In: Alemán, C. (ed.)
Políticas sociales [Social policies], pp. 100–101. Civitas, Thomson Reuters, Pamplona
(2009)
3. Alemán, C., Alonso, J.M., García, M.: Servicios Sociales Públicos [Public Social Services].
Tecnos, Madrid (2011)
4. Alemán, C., Alonso, J.A., Fernández, P.: Dependencia y Servicios Sociales [Dependency
and Social Services]. Aranzadi, Navarra (2013)
5. Casado, D. (ed.): Respuestas a la Dependencia. La Situación en España. Propuestas de
Protección Social y Prevención [Answers to Dependency. The Situation in Spain. Proposals
for Social Protection and Prevention]. CCS, Madrid (2004)
6. Casado, D.: En Busca de un Sistema Conceptual para la Discapacidad [In Search of a
Conceptual System for Disability]. In: Casado, D., García, J. (eds.) Discapacidad y
Comunicación Social [Disability and Social Communication], pp. 29–40. 4ª edn. Real
Patronato de Prevención y Atención a Personas con Minusvalía, Madrid (1998)
7. Closing the Gap 2018: 36th Annual Conference. https://www.closingthegap.com/
conference/?utm_source=Google%20Display%20Network&utm_medium=Search&utm_
campaign=CTG%20search%20ads. Accessed 21 May 2018
8. De Asis, R.: Ten guidelines for the correct interpretation of rights. Age Hum. Rights 1, 25–
33 (2013)
9. De Asis, R.: Ethics and robotics. A first approach. Age Hum. Rights 2, 1–24 (2014)
10. DeJong, G.: The Movement for Independent Living: Origins, Ideology and Implications for
Disability Research. Michigan State University, Michigan (1979)
11. De Lorenzo, R.: Discapacidad, Sistemas de Protección y Trabajo Social [Disability,
Protection Systems and Social Work]. Alianza, Madrid (2007)
12. Disability Jobsite - Supporting People with a Disability. https://www.disabilityjobsite.co.uk/
job/15074006/Service-Manager–Senior-Support-Worker. Accessed 8 May 2018
13. Disability Awareness and Support (Coursera), University of Pittsburgh. https://www.mooc-
list.com/course/disability-awareness-and-support-coursera. Accessed 21 May 2018
14. Dörnyei, Z.: The Psychology of the Language Learner: Individual Differences in Second
Language Acquisition. Routledge, New York/London (2005)
15. Federal Disability Report: Advancing the Inclusion of People with Disabilities (2009).
https://www.canada.ca/en/employment-social-development/programs/disability/arc/federal-
report2009.html. Accessed 9 May 2018
16. Iáñez, A.: De la Exclusión a la Vida Independiente: Resultados de una Investigación con
Personas con Diversidad Funcional Física en Sevilla [From Exclusion to an Independent
Life: Results of an Investigation of People with Physical Diversity in Seville]. In: Capellín,
M. J. (ed.) El Derecho a la Ciudad. Actas VIII Congreso de Escuelas, pp. 93–103. José
Capellín, Gijón (2010)
17. Information and Communication Technology (ICT) Accessibility (edX), Georgia Institute of
Technology. https://www.mooc-list.com/course/information-and-communication-technology-ict-accessibility-edx. Accessed 21 May 2018
18. Improving the Life Chances of Disabled People: Final Report. Prime Minister’s Strategy
Unit, London (2005)
19. La Universidad, Motor de Cambio para la Inclusión, IV Congreso Internacional Universidad
y Discapacidad. https://ciud.fundaciononce.es/. Accessed 21 May 2018
20. Living with a Disability, Government of Canada. https://www.canada.ca/en/employment-
social-development/services/benefits/disability/living.html. Accessed 17 May 2018
21. More Disabled People in Canada: Report. http://www.cbc.ca/news/technology/more-
disabled-people-in-canada-report-1.825521. Accessed 9 May 2018
22. Munuera, M.P.: Resolución de Conflictos. Promoción de la Autonomía desde la Mediación
[Conflict Resolution. Promotion of Autonomy from Mediation]. Editorial Académica
Española, Saarbrücken (2014a)
23. Munuera, M.P.: Nuevos Retos en Mediación Familiar, Discapacidad, Dependencia
Funcional, Salud y Entorno Social [New Challenges in Family Mediation, Disability,
Functional Dependency, Health and Social Environment]. Tirant lo Blanch, Valencia (2014)
24. Munuera Gómez, M.P., Navarro Ortega, S.A.: The Visually Disabled and the Elderly in the
Age of IC Technologies. Nova Science, New York (2018)
25. MOOC: e-Learning inclusivo [e-Learning inclusive], ASMOZ. https://asmoz.org/es/curso/
mooc-e-learning-inclusivo/. Accessed 21 May 2018
26. MOOC: La discapacidad en el Mundo Laboral [Disability in the Workplace], Plataforma de
Acción Social. http://www.plataformaong.org/noticias/1565/curso-mooc-la-discapacidad-en-
el-entorno-laboral. Accessed 21 May 2018
27. MOOC List, Accessibility MOOCs and Free Online Courses. https://www.mooc-list.com/tags/accessibility?title=ICT+for+persons+with+a+functional+diversity+&field_start_date_value_op=between&field_start_date_value[value][date]=&field_start_date_value[min][date]=&field_start_date_value[max][date]=&sort_by=field_start_date_value&sort_order=DESC. Accessed 21 May 2018
28. Navarro Ortega, S.: Technologies that help visually impaired Spanish learners. In: Munuera
Gómez, M.P., Navarro Ortega, S.A. (eds.) The Visually Disabled and the Elderly in the Age
of IC Technologies, pp. 3–29. Nova Science, New York (2018)
29. Navarro, S., Zebehazy, K.: Learn to Listen, Listen to Learn: What Can We Learn from
English-Spanish Blind Bilinguals to Improve Listening Skills in L2 Spanish? In preparation
30. Oliver, M.: ¿Una Sociología de la Discapacidad o una Sociología Discapacitada? [A
Sociology of Disability or a Disabled Sociology?]. In: Barton, L. (ed.) Discapacidad y
Sociedad [Disability and Society], pp. 35–48. Morata, Madrid (1998)
31. Palacios, A., Romañach, J.: El Modelo de la Diversidad. La Bioética y los Derechos
Humanos como Herramientas para Alcanzar la Plena Dignidad en la Diversidad Funcional
[The Model of Diversity. Bioethics and Human Rights as Tools to Achieve Complete
Dignity in Functional Diversity]. Diversitas-AIES, Valencia (2007)
32. Palacios, A., Romañach, J.: El Modelo de la Diversidad: una Nueva Visión de la Bioética
desde la Perspectiva de las Personas con Diversidad Funcional (discapacidad) [The Model of
Diversity: A New Vision of Bioethics from the Perspective of the Person with Functional
Diversity (disability)]. In: Ausín, T., Aramayo, R.R. (eds.) Interdependencia del Bienestar a
la Dignidade [Interdependence of Well-being and Dignity], pp. 37–47. Plaza & Valdés,
Madrid (2008)
33. Peter, I.: So, who really did invent the Internet? The Internet History Project. http://www.
nethistory.info/History%20of%20the%20Internet/origins.html. Accessed 8 May 2018
34. Puig de Bellacasa, R.: Concepciones, Paradigmas y Evolución de las Mentalidades sobre la
Discapacidad [Conceptions, Paradigms and Evolution of Mentalities regarding Disability].
In: Casado Pérez, D., García Viso, J.M. (eds.) Discapacidad y Comunicación Social
[Disability and Social Communication], pp. 53–66. Real Patronato de Prevención y Atención
a Personas con Minusvalía, Madrid (1998)
35. Sweden’s Disability Policy. https://sweden.se/society/swedens-disability-policy/. Accessed
16 May 2018
36. TapTapSee. https://taptapseeapp.com/. Accessed 22 May 2018
37. Transport Systems and Transport Policy: An Introduction, Hasselt University. https://www.mooc-list.com/course/transport-systems-and-transport-policy-introduction-hasselt-university. Accessed 22 May 2018
38. United Nations Convention on the Rights of Persons with Disabilities. https://www.un.org/
development/desa/disabilities/convention-on-the-rights-of-persons-with-disabilities.html.
Accessed 8 May 2018
39. World Report on Disability, World Health Organization. https://www.unicef.org/protection/
World_report_on_disability_eng.pdf. Accessed 16 May 2018
Lecturing to Your Students: Is Their Heart In It?

Aidan McGowan(✉), Philip Hanna, Des Greer, and John Busch

School of Electronics, Electrical Engineering and Computer Science,
Queen’s University, Belfast BT9 6AY, Northern Ireland
aidan.mcgowan@qub.ac.uk

Abstract. The measurement of cognitive activity using physiological means
such as heart rate activity is a well-established research practice. Most previous
studies have concluded that elevated heart rate occurs when an individual is
cognitively engaged. However, there have been very few studies focusing on the
effect in a learning environment. The recent proliferation of accurate, cheap and
unobtrusive wearable devices with biometric sensors presents a new opportunity
to perform a relatively inexpensive, natural, large scale study on the biometric
effects on students during a series of lectures. This study presents the design and
results of a unique two year study of students’ heart rate activity during a series
of university computer programming lectures. It benchmarks student heart rate
patterns during lectures and finds that there is a significant correlation between
elevated heart rates and module scores. To the best of the authors’ knowledge this
type of live, natural learning environment study has not been reported before.

Keywords: Heart rate · Programming · Cognitive learning · Wearable devices

1 Introduction

The ability to measure mental effort under the stresses of varying cognitive workloads
has been the subject of much research attention. High cognitive workload has been
associated with high mental effort, affecting an individual’s ability to perform a set task
[1]. Researchers have long been aware that there appears to be a finite limit on working
memory in the human brain, with Millar [2] being one of the first to quantify mental load
capacities. For most humans, going beyond that limit will result in a cognitive overload
which will substantially interfere with and inhibit their performance and learning ability [3].
The process of learning is complex and, despite the volume of study it has received,
there exists a large number of contrasting definitions and learning paradigms used to
describe it. Behaviorism, Cognitive Information Processing (Cognitivism) and
Constructivism are just some of the relatively recent frameworks of principles
attempting to explain how individuals acquire, retain and recall knowledge. While each
paradigm differs in detail, most practitioners are in agreement that learning is a dynamic
information processing and reasoning activity and that teaching needs to support active
engagement. In third-level education, lecturing typically involves a range of teaching
and learning activities designed to provide stimuli to facilitate learning. The information
processing model [4, 5] suggests that once a stimulus is received and perceived, the
information is passed to the working memory in the brain, where the mind finally
becomes aware of it, potentially resulting in further processing [3].

(© Springer Nature Switzerland AG 2019. K. Arai et al. (Eds.): FTC 2018, AISC 880,
pp. 1005–1016, 2019. https://doi.org/10.1007/978-3-030-02686-8_75)
Given the significance of working memory and cognitive load to the ability of an
individual to perform a task, including learning, a number of processes have been
established that attempt to measure an individual’s working memory, cognitive load and
mental effort associated with a task. These techniques may be categorized as performance,
subjective and physiological. The Stroop test is an example of a performance technique:
subjects are given a primary task and are concurrently asked to read a series of cards,
each printing the name of a color in an ink of a different color. The reading response
time and error count, in relation to the increasing level of effort required for the primary
activity, are taken as the measurements of cognitive load. Subjective measures require
subjects to self-report on task difficulty and the mental effort required; examples
include the NASA-TLX scale.
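As an illustration of the performance category, a Stroop-style measure can be sketched in a few lines (the trial data and scoring function here are hypothetical, added for illustration; they are not from the paper):

```python
# Hypothetical Stroop-style trials: (word, ink_color, response_time_ms, answer).
trials = [
    ("RED",   "red",   520, "red"),    # congruent: word matches ink
    ("GREEN", "green", 535, "green"),  # congruent
    ("RED",   "blue",  680, "blue"),   # incongruent: word names another color
    ("BLUE",  "green", 710, "red"),    # incongruent, and answered incorrectly
]

def stroop_scores(trials):
    """Mean response time (ms) and error count, split by congruency."""
    out = {}
    for label, congruent in (("congruent", True), ("incongruent", False)):
        subset = [t for t in trials if (t[0].lower() == t[1]) == congruent]
        mean_rt = sum(t[2] for t in subset) / len(subset)
        errors = sum(1 for t in subset if t[3] != t[1])
        out[label] = (mean_rt, errors)
    return out

scores = stroop_scores(trials)
# Slower responses and more errors on incongruent cards are read as a
# proxy for the extra cognitive load imposed by the interference.
```

With these made-up trials, the incongruent condition shows a slower mean response and one error, which is exactly the response-time/error gap the performance approach interprets as cognitive load.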
However, in recent years physiological measurements of mental effort have grown
in popularity. Common techniques include measurements in changes in blood glucose
levels [6], blood oxygen saturation levels [7], blink rate [8], pupil diameter [9], galvanic
skin response [10] and cardiac activity measurements including Heart Rate Variability
[11] and numerous studies involving Heart Rate (HR). These procedures aim to measure
biological changes that are thought to be caused by increased mental effort. Increases in
mental workload have been linked to a lowering of parasympathetic (“rest and digest”)
autonomic nervous system (ANS) activity and an increase in sympathetic (“fight or
flight”) activity [12]. Changes in the ANS can be measured by several physiological
measurements, including Heart Rate, Heart Rate Variability and skin conductance [13].
Veltman [14] associated elevated mental workload with increased arousal and neural
activity, which intensifies metabolic demand and is the likely cause of the increases in
heart rate commonly observed in cognitive physiological studies [12]. Scholey [7]
further reasoned that the observed increases in HR during cognitive processing reflect
the body facilitating the delivery of metabolic substrates to the brain, which are then
utilized by the neural mechanisms underpinning cognitive performance. Fairclough [6]
frames the rationale for this as the requirement to mobilize energy to the brain, which
has substantial energy demands and no mechanism to store energy. Therefore, an
increase in the workload required of the brain appears to result in the consequential
physiological changes observed, such as increases in HR.
Heart rate (HR) in relation to cognitive activity has been the subject of substantial
study over a lengthy period. HR is now the peripheral measure most used to assay affect
and cognition [15]. As far back as the 1970s, a series of experiments by Lacey [16, 17]
demonstrated that tasks requiring increased cognitive processing are associated with HR
acceleration. This was furthered by Kaiser [18], who found that anagram solving also
influenced HR, with the most difficult anagrams producing the highest increases and
the easiest producing the least. Numerous clinical experiments
have been conducted measuring HR and cognitive tasks, including a reported increase
in HR for computer gamers performing complex gaming tasks and by subjects
performing difficult mental arithmetic [19]. Abundant real world occupation studies
have also reported similar HR increases due to increased cognitive task load, such as
for air traffic controllers [20], fighter pilots [9] and university lecturers [21]. Increased
memory load (number of items) was shown to be accompanied by accelerated HR [22,
23]. Indeed, Cranford’s study [3] directly linked HR to varying degrees of cognitive
load in problem solving and concluded that HR monitoring has further significant
potential use in measuring cognitive load during the learning process. Daly [24]
concluded that there was a significant positive relationship between heart rate and exam
performance. Kohlisch [25] conducted a series of experiments measuring increasingly
higher levels of mental workload on students during computing tasks. They concluded
that at high mental load an individual will experience excessive mental task strain and
that HR was a useful indicator of mental load.
In terms of quantifying these increases, Fredericks [26] found that HR increased from
resting heart rate when subjects were attempting the Stroop Test (by 12.38%) and during
an arithmetic calculating test (by 16.78%). This is a relatively common theme throughout
the literature, with [27, 28] concurring that HR increases by 10–20% under similar
mental task loads.
As such, there is much concurring literature to support the understanding that the
cardiovascular system responds to cognitive stress [26], but HR activity is also influenced
by many other factors. Anxiety is a potential influence on heart rate measurements
during a cognitive activity: it has been linked to physiological arousal [29], with a small
amount thought to be a motivator to perform [30]; this holds up to a point, beyond which
excessive levels debilitate performance [31]. Luque-Casado [32] concludes that it is
unclear how individual physical fitness levels affect cognitive processing; however,
regular exercise has been shown to elicit beneficial changes in brain structures, and
therefore potentially in cognitive performance, as well as a lowering of resting heart
rate. Other potential influences tested for significance include age [33],
blood glucose levels [6], arterial blood oxygen saturation levels [7], emotional levels
[34], nutritional status [35] and personality types [23], with other influences such as
gender and time of day of measurement having received little or no attention. To varying
degrees all these studies report on the significance of contributing factors influencing
HR with cognitive activity; yet, all concur in design or conclusion that mental load
activities are positively correlated with HR.
As noted, there are many task-situation, clinical, simulated and real-world studies in
this area; however, only a small number of studies have used HR as a measure of student
cognitive engagement in university lectures. Bligh [36] carried out
a series of classroom lecture studies showing that student HR decreased over the course
of a 50-minute lecture. The decline in HR was interpreted as a measure of decreasing
arousal, which Bligh considered as one component of cognitive engagement. In addition,
Bligh reported a single event where a question from a student resulted in an elevation
of HR in other students. Darnell [37] expanded on this work, and concurred with Bligh
that there appears to be a decrease in average HR across a 50 min lecture class and a
temporary increase in HR in response to student questions. In addition, they concluded
that pair-share sessions resulted in elevated average HR.
The devices used in most HR cognitive studies were generally expensive and obtrusive.
Consequently, most of the studies suffered from small sample sizes and limited
sampling points. The prominent nature of the HR measuring device likely also affected
the results, with the students acutely aware throughout the experiment that their HR was
being sampled. Anttonen [34] concluded that new methods for inconspicuous heart rate
measurement were needed. The recent proliferation of accurate, cheap and unobtrusive
wearable devices with biometric sensors presents a new opportunity to perform a
relatively inexpensive, natural, large-scale study of the biometric effects on students
during a series of lectures.

2 Research Objectives

This research study was designed to use wearable devices (Microsoft Band 2) to measure
and record the heart rate activities of a large representative number of students during
a number of lectures. This was, firstly, to establish a benchmarked understanding of
student HR activities during lectures. The second part of the research was a focused
study with a smaller number of students, contrasting their resting heart rate (RHR) with
their average heart rate during a number of lectures (LHR). The analysis of the
data sought to identify any general patterns of HR activity, to potentially relate these
patterns to cognitive activities and to further check for any correlations with overall
module attainment. To the best of the authors’ knowledge this scale of measurement
and study in a live lecture environment has not been reported on before.

2.1 Research Questions

A. Is there a general decline in the average HR of students over the length of a 50 min
lecture?
The initial focus of this study was to build on and extend the understandings gained
from [36] and [37] in relation to the general pattern of student heart rates throughout
lectures. A baseline understanding of student heart rate during lectures (LHR) is
required to better analyse HR patterns in relation to individual teaching and learning
experiences and may be relatable to cognitive activities.
B. Is there an increase in student HR during lectures (LHR) in comparison with resting
HR (RHR)?
This involves an analysis and comparison of the average RHR and average LHR, where
the average LHR is the mean of HR readings sampled per second over a 50 min period.
It seeks to establish a baseline understanding of any repeatable differences that may
be relatable to cognitive activities.
C. Is there a correlation between HR and overall module attainment?
There were three separate investigations within this theme:
• Is there a correlation between the resting HR of students and final module score?
• Is there a correlation between the lecture HR of students and final module score?
• Is there a correlation between percentage variances between the resting HR and
lecture HR of students and final module score?
3 Methodology

The study was conducted over two years with two different cohorts of postgraduate
students taking a 24 week compulsory module in Java programming in semesters one
and two of a one year MSc course in Software Development. The research purpose and
the methodology to be employed were explained to all the students before the course
and volunteers were requested. A large number expressed an interest and willingness to
participate and subsequently the students were chosen at random. Similar to other studies
in the area any students that had diagnosed cardiovascular defects and smokers were not
included and the students were asked to refrain from caffeine intake one hour prior to
the lecture. Additionally all HR recordings including the students recording of resting
HR where taken at the same time of day. Each student was given a Microsoft Band 2
wearable device and encouraged to wear it regularly. This was to ensure that a baseline
RHR could be established for each student and also to lessen the potential influence on
the results that measuring the HR during the lecture may have had. The HR for students
was sampled per second. After each lecture each student in the study uploaded their HR
data to a secure central server which was accessible to the researchers.
The recording of the HRs was conducted over two stages. The initial stage was
designed to establish the average HR of students during lectures. A total of 70 individual
student HR recordings during 35 lectures of 50 min duration were recorded. There were
35 students involved in this activity (male = 20, female = 15).
The second stage focused on investigating whether differences exist between RHR and
LHR. A total of 15 students recorded their HR over several 50 min periods at rest. The
same students then recorded their HR during 10 lectures in order to establish their
average LHR. An analysis and subsequent comparison of the RHR and LHR was then conducted.
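The per-second sampling and per-lecture averaging described above can be sketched as follows (an illustrative sketch only; the function name and data layout are assumptions, as the paper does not describe its processing code):

```python
from statistics import mean

def lecture_average_hr(samples_bpm, lecture_minutes=50):
    """Collapse one student's per-second HR stream for one lecture
    into a single average (their LHR contribution).

    samples_bpm: list of per-second readings in bpm, as uploaded from
    the wearable after the lecture (hypothetical layout).
    """
    # Keep only the samples that fall within the lecture itself,
    # discarding anything recorded after it ended.
    window = samples_bpm[: lecture_minutes * 60]
    return mean(window)

# A steady 72 bpm stream over a full 50 min lecture averages to 72 bpm.
assert lecture_average_hr([72] * 3000) == 72
```

Averaging each of a student's 10 lecture streams this way, and then averaging the results, would yield the per-student LHR used in the comparison with RHR.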

4 Results and Discussions

A. Is there a general decline in the average HR of students over the length of a 50 min
lecture?
The overall pattern of HR during lectures (Fig. 1) shows three distinct phases. The
initial phase shows that the average recorded HR at the beginning of the lecture was 75 bpm.
This then decreases to around 71 bpm within the next 11 min. A mid phase follows,
lasting from minute 12 to 48, with a reasonably constant average HR of 71 bpm and a
slight dip to 70 bpm from minutes 28 to 34. The last phase shows a slight increase over
the last 3 min. The relatively higher HR at the beginning is likely due to recent physical
activity, with the students having just walked to the lecture theatre. The final phase
increase is likely due to the students readying themselves to leave at the end of the
lecture.
This research concurs with previous research [36–38] that there is an overall decrease
in HR during the 50 min of the lecture. However, the decrease is very slight and mainly
due to the elevated HR at the beginning of the lecture, caused by the mild physical
activity of the students’ walking to the lecture venue.
Fig. 1. Overall average HR profile over a 50 min lecture.

While the average general HR profile shows limited variation over time, this is not
typical when observing an individual profile. As shown in Fig. 2, there is rarely a smooth
profile, with frequent peaks and troughs observed throughout the lecture. Moreover,
there are significant variations between individual students during the same lecture
(Fig. 3).

Fig. 2. Typical individual lecture HR profile.

Fig. 3. Individual HR profile of five students during the same lecture.

This suggests that there are significant differences in the way students react to the
same teaching stimuli. If this is indeed the case, it may be possible to measure and
correlate these activities with expected responses in HR and cognitive activity.
B. Is there an increase in student HR during lectures (LHR) in comparison with resting
HR (RHR)?
Benchmarking RHR profiles enabled a comparison with the LHR. A focused
study of 15 students (m = 9, f = 6) was initially targeted. Although 15 students
were involved in this phase, the study reports on the 11 complete sets of data received
(11 students, m = 7, f = 4): one student did not complete the course and three
were unable to provide a complete set of data due to short-term illnesses during the
data recording period. The RHR and LHR measurements were found to be normally
distributed, RHR (p = 0.986, Kolmogorov-Smirnov, Z = 0.453) and LHR (p = 0.999,
Kolmogorov-Smirnov, Z = 0.382). The results shown in Fig. 4 illustrate that in every
instance each student’s average LHR was higher than their RHR.

Fig. 4. Individual Student HR at rest compared to HR during lectures.

Using a paired-samples t-test, a statistically significant difference was found between
the mean RHR (64.44, SD = 4.36) and the mean LHR (72.89, SD = 5.5), t(10) = −8.772,
p = 0.002. The effect size of this difference in means was extremely strong (r = 0.94).
This finding concurs with much of the previous HR cognitive-effect literature: there is
an increase in HR when an individual is cognitively engaged.
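This paired comparison can be reproduced from the per-student values later listed in Table 1 using only standard-library code (a sketch added for illustration; the original analysis was presumably run in a statistics package):

```python
import math
from statistics import mean, stdev

# Per-student resting (RHR) and lecture (LHR) heart rates in bpm,
# as listed in Table 1 (students s1..s11).
rhr = [66.5, 70.0, 69.2, 61.4, 62.0, 58.1, 66.2, 63.1, 70.0, 58.2, 64.1]
lhr = [72.7, 81.6, 74.5, 67.1, 67.4, 63.7, 74.1, 72.4, 79.1, 71.0, 78.2]

diffs = [l - r for l, r in zip(lhr, rhr)]        # paired differences
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # paired-samples t, df = n - 1

print(f"mean RHR = {mean(rhr):.2f}, mean LHR = {mean(lhr):.2f}")  # 64.44, 72.89
print(f"t({n - 1}) = {t:.3f}")                                    # t(10) = 8.772
```

The sign of t simply reflects the order of subtraction (LHR − RHR here); its magnitude matches the reported t = −8.772, as do the two group means.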
C. Is there a correlation between HR and overall module attainment?
• Is there a correlation between the resting HR of students and final module score?
Using a Pearson correlation test, no significant correlation was found between RHR
and module score (r = 0.146, p = 0.668). This finding partially supports a similar
previous study [32] in which baseline HR had no association with cognitive performance,
although that study did report that sustained-attention tasks were better performed by
students with lower RHR. The findings in the present study would suggest that RHR
measurements on their own could not be used as a predictor of final module attainment.
• Is there a correlation between the lecture HR of students and final module score?
Using a Pearson correlation test, no significant correlation was found between LHR
and module score (r = 0.491, p = 0.125). The findings of this study would suggest that
LHR measurements on their own could not be used as a predictor of final module
attainment. It is not possible to compare this outcome with previous studies, as this
measurement has not been made before.
1012 A. McGowan et al.
• Is there a correlation between percentage variances between the resting HR and
lecture HR of students and final module score?
Similar to most other HR and cognitive effect studies, the percentage variance in HR
at lecture time compared with RHR was established for each student (Table 1). This
helps to normalize the natural differences in baseline RHR between individual students,
allowing a more even comparison.

Table 1. Individual RHR, LHR, variances between RHR and LHR, and module scores achieved
Student ID  Resting HR (bpm)  Lecture HR (bpm)  Percentage difference in HR  Module score
s1 66.5 72.7 9.3 65
s2 70.0 81.6 16.6 80
s3 69.2 74.5 7.7 70
s4 61.4 67.1 9.3 50
s5 62.0 67.4 8.7 50
s6 58.1 63.7 9.6 72
s7 66.2 74.1 11.9 84
s8 63.1 72.4 14.7 82
s9 70.0 79.1 13.0 74
s10 58.2 71.0 22.0 85
s11 64.1 78.2 22.0 75
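The percentage-difference column follows from the two HR columns as 100 × (LHR − RHR) / RHR. A minimal sketch (Python is an assumption; the paper does not specify a tool):

```python
# Derive the "Percentage difference in HR" column of Table 1 and its average.
rhr = [66.5, 70.0, 69.2, 61.4, 62.0, 58.1, 66.2, 63.1, 70.0, 58.2, 64.1]
lhr = [72.7, 81.6, 74.5, 67.1, 67.4, 63.7, 74.1, 72.4, 79.1, 71.0, 78.2]

# Lecture HR expressed relative to each student's resting baseline.
pct = [100 * (l - r) / r for r, l in zip(rhr, lhr)]
mean_pct = sum(pct) / len(pct)          # average increase ~ 13.2 %
print([round(x, 1) for x in pct], round(mean_pct, 1))
```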

The results show an average percentage increase in HR from RHR to LHR of 13.2%
across all students. Each student's percentage increase in HR from RHR to LHR,
compared with their individual module score, is shown in Fig. 5 and demonstrates a
positive correlation between the two variables. A Pearson analysis additionally
indicates a fairly strong positive correlation (r = 0.691, p = 0.042).

Fig. 5. Module score achieved and percentage increase in HR during lectures for students.
Lecturing to Your Students: Is Their Heart In It? 1013

This finding would suggest that those students who exhibited a higher percentage
increase in HR from baseline achieved better results in the module. It concurs with
previous studies reporting an increase in HR of 10–20% on task demand. It would
also suggest that the students with the higher HR differences were more actively
cognitively engaged at lecture time. While there are many factors involved in the final
module mark, it would appear that being more actively engaged, as indicated by HR
differences, could be a potential indicator of the level of module attainment. A linear
regression was calculated to predict module score based on HR differences. A
significant regression equation was found (F(1,9) = 5.580, p = 0.042), with an R² of
0.383. Students' predicted score is equal to 52.078 + 1.478 × (percentage increase in
HR). It is also of note that students with the lower HR increases (around 6%) had a
wide range of scores spanning from 50% to 70%, whereas the students who recorded
higher HR increases (8–14%) consistently scored higher than 75%. This again agrees
with other reported studies indicating that higher HR points to higher cognitive
engagement.
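The regression reported above can be re-fitted from the Table 1 data. This is a sketch assuming Python/SciPy (not the authors' tool); since the tabulated percentages are rounded, the coefficients agree with the reported equation (52.078 + 1.478 × percentage increase) only to about two decimal places.

```python
# Re-fit the linear regression of module score on percentage HR increase.
from scipy import stats

pct   = [9.3, 16.6, 7.7, 9.3, 8.7, 9.6, 11.9, 14.7, 13.0, 22.0, 22.0]
score = [65, 80, 70, 50, 50, 72, 84, 82, 74, 85, 75]

res = stats.linregress(pct, score)
print(f"score = {res.intercept:.3f} + {res.slope:.3f} * pct, "
      f"R^2 = {res.rvalue**2:.3f}, p = {res.pvalue:.3f}")
# Example prediction for a hypothetical 16.6% HR increase (student s2's value):
predicted = res.intercept + res.slope * 16.6
```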

5 Conclusions

The present study is unique in that it utilized non-intrusive wearable devices to
measure HR in a natural learning environment. The findings concur with much of the
previous clinical and simulated studies of HR and cognitive effect showing an increase
in HR under cognitive load. The finding that elevated HR at lecture time is correlated
with higher module scores is potentially significant. The current demand for computer
science graduates has resulted in increasingly larger class sizes in universities. The
challenges of effective delivery in these mass education environments are well
documented, with one of the recurring themes being high attrition rates, especially in
programming courses [39]. Larger cohort sizes mean there are obvious increased
difficulties in identifying struggling students. In the future, the increasingly common
use of HR-measuring wearable devices could be leveraged to help in the early
identification of such students.
The research results would suggest that those students with a higher HR increase
during lectures are more cognitively engaged during these key learning contact points.
The causality of the higher HR could lie in a myriad of influences, including
motivation to learn, interest in the subject material, the teaching styles employed,
where the student sits in the lecture theatre [40], and previous in-term results achieved
by the student.
There are several limitations of the current design. Firstly, heart rate is a gross
psychophysiological measure, although it has been validated in large-scale research
designs [41]. HR is by its nature an individual measurement, and while the study
attempted to control for variables such as caffeine intake, time of day, subject age and
health, there are many other potential lifestyle and biological influences that may affect
HR recordings. While phase one of this study used large sample sizes, the relatively
small sample size for the analysis of module performance is a threat to the
generalizability of the findings; however, many seminal HR studies (e.g. [16, 17, 41]) have been

conducted with similar sizes. It is worth noting, however, that a large amount of
sampling is still required; in this case the baseline study of HR involved over 7.35
million sample points and the focused study 363,000 sample points. The study also
concentrated on two cohorts of first-year programming students, which potentially
restricts the generalizability of the findings.
While the study included some students who were borderline passes, it would also be
of interest to study the HR of students who failed the module. A stated aim of the research
was to measure as much as possible without being intrusive; this meant that the students
were responsible for fully engaging with the logistics of the experiment. They had to
attend the lectures, ensure that their wearables were fully charged, and subsequently
upload the data. This process, even with enthusiastic volunteers, proved to be difficult at
times and limited the sample sizes for the focused study. The study is restricted to the
LHR of programming students, and this also restricts the generalizability of the findings.
The authors suggest that future study in this area could readily be extended to other
disciplines. Additional experimental variable control or assessment may be possible,
such as controls for, or consideration of, age, stress, well-being, gender and previous
subject knowledge. The individual LHR responses (Figs. 2 and 3) illustrate that there are
significant differences between individuals but also allude to some commonalities in
response to teaching events. Checking whether common increases and decreases are
aligned to the learning activity would enable an analysis of the relative effectiveness of
the various interactive and non-interactive teaching methods employed during the
lectures. This has the potential to increase future student cognitive engagement and
lecturer performance, with the aim of increasing overall student attainment. Indeed, the
LHR profile (the number of significant increases or decreases in HR) is also worthy of
investigation, for example investigating whether there is a correlation between the LHR
profile during lectures and overall module attainment.

References

1. da Silva, F.: Mental workload, task demand and driving performance: what relation?
Procedia Soc. Behav. Sci. 162, 310–319 (2014)
2. Miller, G.A.: The magical number seven, plus or minus two: some limits on our capacity for
processing information. Psychol. Rev. 101(2), 343–352 (1994)
3. Cranford, K., Tiettmeyer, J., Chuprinko, B., Jordan, S., Grove, N.: Measuring load on working
memory: the use of heart rate as a means of measuring chemistry students’ cognitive load. J.
Chem. Educ. 91(5), 641–647 (2014). https://doi.org/10.1021/ed400576n
4. Axelrod, R.: Schema theory: an information processing model of perception and cognition.
Am. Polit. Sci. Rev. 67(4), 1248–1266 (1973)
5. Mayer, R.: Multimedia Learning, 2nd Ed. Cambridge University Press, New York (2009)
6. Fairclough, S., Houston, K.: A metabolic measure of mental effort. Biol. Psychol. 66(2), 177–
910 (2004)
7. Scholey, A., Moss, M., Neave, N.: Cognitive performance, hyperoxia, and heart rate
following oxygen administration in healthy young adults. Physiol. Behav. 67(5), 783–789
(1999)
8. Beatty, J., Lucero-Wagoner, B.: The pupillary system. In: Cacioppo, J.T., Tassinary, L.G.,
Berntson, G.G. (Eds.) Handbook of Psychophysiology, pp. 142–162 (2000)

9. Wilson, G.: An analysis of mental workload in pilots during flight using multiple
psychophysiological measures. Int. J. Aviat. Psychol. 12(1), 3–18 (2002)
10. Mirza-Babaei, P., Long, S., Foley, E., McAllister, G.: Understanding the contribution of
biometrics to games user research. In: Proceedings of the 2011 DiGRA International
Conference: Think Design Play, DiGRA/Utrecht School of the Arts, DiGRA 2011, vol. 6,
January 2011. ISSN 2342-9666
11. Thayer, J., Hansen, A., Saus-Rose, E., Johnsen, B.: Heart rate variability, prefrontal neural
function, and cognitive performance: the neurovisceral integration perspective on self-
regulation, adaptation, and health. Ann. Behav. Med. 37, 141–153 (2009)
12. Brouwer, A., Zander, T., van Erp, J., Korteling, J., Bronkhorst, A.: Using neurophysiological
signals that reflect cognitive or affective state: six recommendations to avoid common pitfalls.
Front. Neurosci. 9, 136 (2015)
13. Berntson, G., Bigger, J., Eckberg, D., Grossman, P., Kaufmann, P., Malik, M.: Heart rate
variability: origins, methods, and interpretive caveats. Psychophysiology 34(6), 623–648 (1997)
14. Veltman, J., Gaillard, A.: Physiological workload reactions to increasing levels of task
difficulty. Ergonomics 41(5), 656–669 (1998)
15. Guerra, P., Sánchez-Adam, A., Miccoli, L., Polich, J., Vila, J.: Heart rate and P300: integrating
peripheral and central indices of cognitive processing. Int J Psychophysiol. 100, 1–11 (2015).
https://doi.org/10.1016/j.ijpsycho.2015.12.008
16. Lacey, J., Obrist, B., Black, P, Brener, A., DiCara, L.: Studies of heartrate and other bodily
processes in sensorimotor behaviour. In: Cardiovascular Psychophysiology, Aldine, Chicago
(1974)
17. Lacey, J., Lacey, B., Black, P.: Some Autonomic-Central Nervous System Interrelationships
Physiological Correlates of Emotion. Academic Press, New York (1970)
18. Kaiser, D., Sandman, C.: Physiological patterns accompanying complex problem solving
during warning and non-warning conditions. J. Comp. Physiol. Psychol. 89, 357–363 (1975)
19. Turner, J., Carroll, D.: Heart rate and oxygen consumption during mental arithmetic, a video
game, and graded exercise: further evidence of metabolically-exaggerated cardiac
adjustments. Psychophysiology 22, 261–267 (1985)
20. Wilson, G., Eggemeier, F.: Physiological measures of workload in multi-task environments.
In: Damos, D. (Ed.) Multiple-task Performance, pp. 329–360. Taylor and Francis, London
(1991)
21. Filaire, E., Portier, H., Massart, A., Ramat, L., Teixeira, A.: Effect of lecturing to 200 students
on heart rate variability and alpha-amylase activity. Eur. J. Appl. Physiol. 108(5), 1035–1043
(2010)
22. Backs, R., Selijos, K.: Metabolic and cardiorespiratory measures of mental effort: the effects
of level of difficulty in a working memory task. Int. J. Psychophysiol. 16, 57–68 (1994)
23. Pearson, G., Freeman, F.: Effects of extraversion and mental arithmetic on heart-rate
reactivity. Percept. Motor Skills 72, 1239–1248 (1991). https://doi.org/10.2466/pms.
1991.72.3c.1239
24. Daly, A., Chamberlain, S., Spalding, V.: Test anxiety, heart rate and performance in A-level
French speaking mock exams: an exploratory study. Educ. Res. 53(3), 321–330 (2011)
25. Kohlisch, O., Schaefer, F.: Physiological changes during computer tasks: responses to mental
load or to motor demands? Ergonomics 39(2), 213–224 (1996)
26. Fredericks, T.K., Choi, S.D., Hart, J., Butt, S.E., Mital, A.: An investigation of myocardial
aerobic capacity as a measure of both physical and cognitive workloads. Int. J. Ind. Ergon.
35(12), 1097–1107 (2005). https://doi.org/10.1016/j.ergon.2005.06.002
27. Ettema, J.H., Zielhuis, R.L.: Physiological parameters of mental load. Ergonomics 14(1),
137–144 (1971)

28. Hitchen, M., Brodie, D.A., Harness, J.B.: Cardiac responses to demanding mental load.
Ergonomics 23(4), 379–382 (1980)
29. Daly, A., Chamberlain, S., Spalding, V.: Test anxiety, heart rate, and performance in A-level
French speaking mock exams: an exploratory study. Educ. Res. 53, 321–330 (2011)
30. Hardy, L., Beattie, S., Woodman, T.: Anxiety-induced performance catastrophes:
investigating effort required as an asymmetry factor. Br. J. Psychol. 98, 15–31 (2007)
31. Hopko, D., McNeil, D., Lejuez, C.W., Ashcraft, M., Eifert, G., Riel, J.: The effects of anxious
responding on mental arithmetic and lexical decision task performance. J. Anx. Disorders
17, 647–655 (2003)
32. Luque-Casado, A., Zabala, M., Morales, E., Mateo-March, M., Sanabria, D.: Cognitive
performance and heart rate variability: the influence of fitness level. PLoS ONE 8(2), e56935
(2013)
33. Mukherjee, S., Yadav, R., Yung, I., Zajdel, D., Oken, B.: Sensitivity to mental effort and test-
retest reliability of heart rate variability measures in healthy seniors. Clin. Neurophysiol.
122(10), 2059–2066 (2011). https://doi.org/10.1016/j.clinph.2011.02.032
34. Anttonen, J., Surakka, V.: Emotions and heart rate while sitting on a chair. In: Proceedings
of the SIGCHI Conference on Human Factors in Computing Systems CHI 2005, pp. 491–
499. ACM, New York (2005)
35. Lieberman, H., Farina, E., Caldwell, J., Williams, K., Thompson, L., Niro, P., Grohmann, K.:
Cognitive function, stress hormones, heart rate and nutritional status during simulated
captivity in military survival training. Physiol. Behav. 165, pp. 86–97 (2016). https://doi.org/
10.1016/j.physbeh.2016.06.037. ISSN 0031-9384
36. Bligh, D.A.: What's the Use of Lectures? Jossey-Bass, San Francisco (2000); Intellect Books
(1998); originally published in 1972. Boucsein, W.: Electrodermal Activity. Plenum Press,
New York (1992)
37. Darnell, D., Krieg, P.: Use of heart rate monitors to assess student engagement in lecture.
FASEB J. 28(1) Supplement 721.25 (2014)
38. Stern, R.M., Ray, W.J., Quigley, K.S.: Psychophysiological Recording. Oxford University
Press, New York (2001)
39. McGowan, A., Hanna, P., Anderson, N.: Computing gender wars — A new hope. In: 2017
IEEE Frontiers in Education Conference (FIE), Indianapolis, IN, USA, pp. 1–8 (2017). https://
doi.org/10.1109/fie.2017.8190480
40. McGowan, A., Hanna, P., Greer, D.: Learning to program: choose your lecture seat carefully!
In: Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer
Science Education, ITiCSE 2017. ACM, New York, 03–05 July 2017
41. Raine, A., Venables, P., Mednick, S.: Low resting heart rate at age 3 years predisposes to
aggression at age 11 years: evidence from the Mauritius child health project. J. Am. Acad.
Child. Adolesc. Psychiatry 36(10), 1457–1464 (1997)
Development of Collaborative Virtual
Learning Environments for Enhancing
Deaf People’s Learning in Jordan

Ahmad A. Al-Jarrah(B)

Applied Science Department, Ajloun University College,
Al-Balqa Applied University, Ajloun 26816, Jordan
aljarrah@bau.edu.jo

Abstract. In this research, we aim to address the problem of combining
the benefits of using eLearning environments with letting students work
together in groups in a collaborative virtual learning environment. In
this paper, we explain the development of a collaborative model for deaf
people to support collaborative learning. The created model is an
extension of traditional collaborative learning models in classrooms; it
allows multiple deaf students to work together in a virtual environment
to discuss the learning materials, solve problems, transfer knowledge,
etc. We start by extending the collaborative learning model, introducing
explicit roles for each member of the group. The model provides a social
space for deaf people to communicate using sign language. Moreover, it is
implemented in a collaborative virtual environment that provides
different types of features to support learning over distance, allowing
deaf students to communicate and share knowledge easily. Bilingual
chatting (text and sign language), video conferencing, and an avatar
feature are the three main features of the proposed model.

Keywords: e-Learning · Collaborative learning ·
Collaborative virtual environments · Deaf people's learning

1 Introduction

In the last few decades, computer technology has flourished rapidly and is now used
in most fields of life. These technologies aim to enhance, facilitate, and ease human
life. Deaf people's learning, like other aspects, is deeply involved in scientific
research. However, building a good, healthy educational environment for deaf people,
who have specific requirements according to their situation, is not easy. Moreover, the
large number of disabled individuals requires greater efforts to support them. They
have the right to receive services as others do, and they have the full right to a good
education with exactly the same level of opportunity as their hearing
peers [4,9,13].
c Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1017–1028, 2019.
https://doi.org/10.1007/978-3-030-02686-8_76
1018 A. A. Al-Jarrah

1.1 Motivation

Community education faces a number of issues related to enhancing deaf people's
education using technology [19]. These issues stem from a number of factors that
cause a lack of participation of deaf people in learning environments. The factors
differ from country to country, for example the available resources, culture, funds,
etc.
According to the World Federation of the Deaf, the number of deaf people
around the world is around 70 million [22]. The number of deaf people in Jordan
is over 8000, according to Higher Council for Affairs of Persons with Disabilities
(HCD) statistics up to 2015 [14]. The HCD takes care of people with disabilities,
providing different types of services (e.g. education, medical care, training, etc.).
Also, a number of special schools have been established to provide a good education
for disabled people. These schools were prepared to be good environments in which
to teach deaf people. Unfortunately, despite the increase in the number of schools in
the last few years, they still cannot meet the demand for the education required by
deaf people in Jordan.
Unfortunately, the number of disabled people attending these schools is still
limited. The percentage of people with disabilities who attended schools in 2011 was
only 13% of all people with disabilities. Although the number of schools increased
after 2012, the percentage of people attending schools decreased over the next three
years; the percentages in 2012, 2013, and 2014 were 12.2%, 8.7% and 6.0%
respectively [15]. The somber numbers of disabled people attending schools can be
attributed to a variety of factors, e.g., there are no schools close to disabled people,
there is no transportation to schools, there are not enough funds to support schools in
increasing their ability to accept more students, etc.
Collaborative learning is a methodology in which two or more students interact
and collaborate during the learning process [8]. Collaborative learning activities
have been recognized as increasing technical abilities; the collaborative aspect
allows students to explore learning materials with the backing of a social
infrastructure, which creates a 'safer' and less threatening learning environment. At
the same time, the collaborative aspect introduces a level of accountability and
supports exploration of the learning materials from different perspectives and
interpretations. Collaborative learning has been shown to be particularly important
for engaging traditionally underrepresented groups. Creating a collaborative
virtual learning environment for deaf people will let students interact with each
other more easily, transfer knowledge better, and enhance learning outcomes.

1.2 Brief Literature Review

In this research, we found a number of studies that examine the use of technology
in learning environments for deaf students. These studies focus on different aspects of
learning for deaf students. Their common goals are to engage more people in
educational environments, to develop the learning environments, to enhance the
learning outcomes of the engaged deaf students, etc. One learning environment which
is widely used is the e-learning environment. It has a wide range of uses and lets
different types of students–including deaf or hearing-impaired students–engage in
learning [13] and therefore complete their studies and pursue advanced degrees in
different majors.
Different types of studies focus on e-learning enhancements that support deaf
students with the required services, encouraging them and raising their learning
level [5]. Drigas et al. [9] present a study of an e-learning system which aims to
provide videos in Greek sign language side by side with each text block in the
learning environment. The proposed system provides different types of services for
deaf learners to support the learning environment: bilingual information (text and
sign language), a high level of visualization, and video conferencing. Kyun et al. [18]
proposed an e-learning system to support blind and deaf students in studying together
side by side with normal people.
The first such research in Jordan was done by Khwaldeh et al. [17], who
addressed the interactivity issue in a deaf classroom. The study aims to facilitate
and enhance learning for deaf people using a centralized learning system.
The system provides the required videos for learners at a quality good enough to be
transferred over the Internet while delivering all the details of the sign language
movements, so that the movements are clear enough to be recognized by deaf
students.
Chowdhuri et al. [6] proposed a virtual classroom for deaf people which aims
to provide learning facilities for them. The proposed system has a number
of functional and non-functional components, which together aim to provide the
deaf students with the required material for the course (chapters, presentations,
assignments, etc.) in sign language.
Bouzid et al. [4] studied the effectiveness of using 3D human avatars to provide
educational services for deaf students. The study investigates the effect of
signing-avatar technology on SignWriting vocabulary acquisition and the
comprehension of deaf students. The results show an actual difference between the
scores of learners without the avatar and their scores after using a signing avatar.
The Real Time Arabic Sign Language Translation System (RTASLTS) is a real-
time system that works as a translator between deaf and normal people [10].
RTASLTS consists of three steps: video conference, pattern construction and
discrimination, and text and audio transformation. The system was tested on a
database of 700 gestures. The evaluation results show that the system was able
to translate from Arabic sign language into Arabic text and sound with a recognition
rate of 97.4%. According to the research conclusion, the system can be used to
support communication between deaf and normal people.

1.3 Summary of Contribution


Collaborative learning is a social act in which participants talk among themselves,
listen to different perspectives, articulate and defend their ideas, hold conversations
with other learners, etc. Students work in pairs or small groups to achieve

shared learning goals in collaborative learning. Cooperative learning, team learning,
or group learning are other names for collaborative learning [3,20].
Learning is an approach to processing and synthesizing information, not simply
memorizing and repeating it [12,20]. In a collaborative environment, the learner
actively engages with his/her peers. The diversity of the group members (e.g. in
background and viewpoints) causes the learners to gain a number of benefits
while working with others on common tasks [11]. Moreover, learning in such
environments can flourish in a social setting as the team members hold
conversations on different topics. Also, learners in the collaborative learning
environment are challenged both socially and emotionally as they listen to
different perspectives [11,20]. Collaborative work requires special experience and,
in addition, lets the learners gain more skills and experience, such as defending their
ideas while listening to different perspectives. As a result, learners begin to create
their own unique conceptual frameworks. Based on this discussion, collaborative
learning can be defined as two or more students working together and sharing the
workload equitably as they progress toward intended learning outcomes.
The collaborative learning model consists of five basic elements [7,11,12,20]:
face-to-face interaction, positive interdependence, individual accountability,
professional skills, and group processing. These five elements determine the
interaction between members, where face-to-face does not necessarily mean the
group meets in one place. The group can work together over distance using a
purpose-built collaborative environment or any general communication technology
(e.g. phone, Skype, email, Google Hangouts, etc.). The main purpose of this element
is that the collaborative environment should have an interactive channel between the
learners in the same group. The second element focuses on the success of the whole
group as one unit that cannot be divided among members: the success of one member
means the success of all group members. This goes hand-in-hand with individual
accountability, where each member is responsible for his/her individual task.
The fourth element concerns the professional skills that each member should
have, or gain, while working within the group. Working within a group encourages
the members and helps them to develop and practice trust-building. Moreover, it
helps students to practice a number of important skills, such as leadership,
decision-making, communication, and conflict management. Finally, the group
should have a methodology for monitoring itself to ensure that the whole group is
working together effectively.
Collaborative learning strategies and structures can be used to determine
the group shape, member roles, or the collaboration steps. Think-Pair-Share,
Three-Step Interview, and Pairs Check are examples of collaborative learning
structures [3,16,20]. The discussion in Think-Pair-Share goes through four steps:
listen, think, pair and share. The instructor posts a question; everyone listens
carefully before making any response; each team member takes time to think
about the answer; team members pair with neighbors to discuss their responses;
and finally they share the answer with the whole class. The Three-Step Interview

technique is usually used as an ice-breaker or as a team-building exercise.
It starts by pairing students and letting one student interview the other;
then they switch roles. After that, a group of four members is built by joining
two pairs. Each member of the group then introduces his/her partner, highlighting
the most interesting points.
In Pairs Check, students are grouped in teams of four. Each team is organized
into two subgroups of two students each. Each pair works on solving a problem on a
worksheet. One student works on solving the first problem, while the partner takes
a coach role. The coach encourages his partner and offers exaggerated praise while
s/he is solving the problem. After solving the first problem, they switch roles. After
two problems are solved, the four students in the team check each other's work. At
the end, if the team agrees on the solution, they announce it.

Fig. 1. The core components of the CVLED model.

2 CVLED Model

The Collaborative Virtual Learning Environment for Deaf people (CVLED)
model is created to encourage deaf people to attend classes and continue their
studies wherever they are. Figure 1 shows the core components of the CVLED
model. These components are combined to create the new model, merging their
benefits for teaching deaf students. Some students in Jordan face issues that prevent
them from attending school. The model can help solve this issue by letting them
attend class over distance, so they do not need to be at school physically. Moreover,
the model allows the students to collaborate to solve the questions posted by the
instructor. The instructor generates random groups of four students each. Students
follow specific steps to solve the problems and announce their solutions to the
whole class.

2.1 Overview

The proposed collaborative virtual learning environment is designed to extend
two collaborative learning models (Three-Step Interview and Pairs Check). The
original models were used in classes to let students cooperate in solving the
problems posted in class by the instructor. The new model merges these two models
and extends them for teaching students over distance. Figure 2 shows the four main
steps in the CVLED model and how the two merged models are used in the new
model. Schools for the deaf can use the model to connect with deaf students and
allow them to attend classes over distance. Each student receives a user name and
password after registration, allowing him/her to log in to the system and attend
classes. In each class, the instructor and students follow the model's steps and work
together to enrich the learning environment.

Fig. 2. CVLED model steps with the two merged models.

The Four CVLED Steps. The system allows the instructor to publish the
course materials and questions for students, and allows the instructor and students to
create a study group to discuss the materials. In this discussion, students use a
number of provided features (e.g. a free-hand-writing whiteboard, video
conferencing, an avatar interpreter, etc.) to communicate and solve the posted
problems. Moreover, the CVLED model supports the instructor in making sure
that the learning process is implemented completely and that all students achieve
the learning tasks they have to do. S/he works as a monitor for the whole
class, evaluates each student's contribution to the group, and ensures that the
learning outcomes are met. In effect, s/he has to make sure that the five elements
of collaborative learning are implemented.

The instructor and students follow these four steps to learn and solve the
posted questions (Fig. 2):

1. Creating groups and assigning roles: Students work in groups to solve the
problems posted by the instructor, where each group has four students.
The instructor generates groups using any popular technique (e.g. randomly,
balanced-background groups, balanced-gender groups, etc.). The group
members work together in the next steps to solve the posted problem. In
general, the groups are dynamic: the instructor regenerates them for each
class, problem, or set of course materials.
2. Three-Step Interview: As mentioned before, it is used as an ice-breaker
among group members, and it is repeated each time the instructor regenerates
the groups. The group applies the model as follows:
• The group is divided into two pairs.
• Each pair uses the model in two rounds. In the first round, the first partner
in each pair interviews his/her partner. This step is important because it
lets the group members get to know each other and feel comfortable while
they work on solving problems.
• In the second round, the roles in each pair are switched, and the second
member interviews his/her partner.
• After the two pairs finish the two rounds, the two pairs join together to
build the group with four members.
• Finally, each member of the group introduces his/her partner, highlighting
the most interesting points.
In this step, the group uses video conference or avatar interpreter to commu-
nicate, in addition to text chatting.
3. Pairs Check: In this step, the team of four works on solving problems,
organized again into two pairs. Each pair works on the problem using the
provided features (e.g. whiteboard, video conference, text chatting, etc.).
The problem is divided into two sub-problems. In each pair, one student solves
the first sub-problem while the partner takes a coach role; the coach
encourages his/her partner and offers exaggerated praise while s/he solves
the problem. After the first sub-problem, the two switch roles. Once both
sub-problems are solved, the four students check each other's work. If the
team agrees on the solution, they announce it.
4. Announce results: The whole class is built again by joining all groups
together. The instructor recognizes the students' accomplishments and gives
each group time to announce its final results. The class discusses each
group's results in the meantime; after all groups finish, the instructor
announces the final results. The whole class celebrates its accomplishments,
and the instructor asks each student to thank his/her partner for his/her
contribution.
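The group formation of step 1 and the role rotation of step 3 can be sketched in code. The sketch below is illustrative only: the function names, the student-record fields, and the two strategies shown are our assumptions, not the CVLED implementation.

```python
import random

def make_groups(students, size=4, strategy="random", seed=None):
    """Partition students into groups of `size` (illustrative sketch of the
    instructor's group-generation step; field names are assumptions)."""
    rng = random.Random(seed)
    pool = list(students)
    if strategy == "balanced_gender":
        # Interleave genders so each group mixes both as evenly as possible.
        males = [s for s in pool if s["gender"] == "M"]
        females = [s for s in pool if s["gender"] == "F"]
        rng.shuffle(males)
        rng.shuffle(females)
        pool = []
        while males or females:
            if males:
                pool.append(males.pop())
            if females:
                pool.append(females.pop())
    else:  # "random"
        rng.shuffle(pool)
    return [pool[i:i + size] for i in range(0, len(pool), size)]

def pairs_check_schedule(group):
    """Solver/coach rotation for the Pairs Check step: within each pair the
    roles switch between the two sub-problems."""
    rounds = []
    for pair in (group[:2], group[2:]):
        for i, solver in enumerate(pair):
            # (sub-problem number, solver, coach)
            rounds.append((i + 1, solver, pair[1 - i]))
    return rounds

students = [{"name": f"s{i}", "gender": "M" if i % 2 else "F"} for i in range(8)]
groups = make_groups(students, strategy="balanced_gender", seed=1)
schedule = pairs_check_schedule([s["name"] for s in groups[0]])
```

With eight students split evenly by gender, the balanced strategy yields two groups of four, each with two of each gender, and the schedule alternates solver and coach across the two sub-problems.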
1024 A. A. Al-Jarrah

2.2 Implementation
In the previous section, we presented the CVLED model, which enhances deaf
students' learning and encourages more students to attend school. In this section,
we present the main functionality and features of the CVLED system and
how it interacts with different types of users.

Fig. 3. The send (a) and video conference (b) buttons with sign language and Arabic
language captions.

The graphical user interface (GUI) of CVLED. The GUI of the CVLED system contains
the main features that provide users with all required functionality. It
was designed to be as simple as possible so that students can use it easily. At the
beginning, each user is asked to enter his/her username and password to log into
the system. We added a capability that allows the student to write his/her name
or password in sign language to log in. After login, the main window appears; it
is designed with three main components:

1. Menu bar: It has the main menu items that facilitate the use of the system
by students and the instructor. The menu options change according to the
user type (student, instructor, or administrator) and the current working
phase on the problem. The instructor can add/remove users, post course
materials or questions, monitor work progress via a log file, etc. Students
use the options to start video conferences, text chats, and other features
that enrich the learning environment and facilitate group communication.
2. Toolbar: It is a core component that accelerates access to features that are
also reachable through the menu bar. The toolbar components are designed as a
friendly graphical user interface for deaf students; for example, buttons carry
sign language captions (Fig. 3 shows the send and video conference buttons with
Arabic and sign language captions). One of the main toolbar buttons starts a
video conference call among the group members. The call has two options: open
the webcam, or use an avatar instead. The avatar option was added based on the
requirements of deaf students in Jordan, who prefer not to appear in webcam
conversations. The second important
button opens a whiteboard, a freehand space to write or draw any shape in
support of the group discussion. The whiteboard is shared among the group
members, which supports collaborative work: students can hold a discussion over
distance as if they sat around a table in a classroom. The instructor can also
use the whiteboard to explain any of the course material.

3. Workspace: It is the largest space of the main window. The workspace is
divided into three main components: (1) The collaborative tools space is
designed to hold one collaborative tool at a time (e.g. the whiteboard); it
supports collaborative learning for the whole class or within each group.
(2) The video conference space shows the list of all class members' webcams;
at any time, one user's webcam is activated and enlarged. If a user chooses
an avatar instead of his/her captured video, the system maps the user's hand
movements, or translates typed text into sign language, onto the avatar's
hands. (3) The text or sign language chat area supports the discussion of the
whole class or of a group's members; it can be used in one of four modes:
text to text, text to sign, sign to text, or sign to sign. In general, which
components appear depends on the current step (e.g. whole-class work,
individual group work, etc.).
In the workspace area, the instructor can show the learning materials by
publishing text and figures side by side with a video in sign language that
explains the material contents.
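The four chat modes can be organized as a small dispatch table. The sketch below is illustrative: the two translator functions are placeholders standing in for the system's real text-to-sign and sign-to-text engines, and all names are our assumptions.

```python
# Placeholder translators; in the real system these would drive the avatar
# renderer and the sign-recognition engine.
def text_to_sign(msg):
    return f"<sign:{msg}>"          # stand-in for rendering an avatar animation

def sign_to_text(msg):
    return msg[len("<sign:"):-1]    # stand-in for recognizing signed input

# One entry per chat mode: (sender's format, receiver's format) -> converter.
MODES = {
    ("text", "text"): lambda m: m,
    ("text", "sign"): text_to_sign,
    ("sign", "text"): sign_to_text,
    ("sign", "sign"): lambda m: m,
}

def deliver(message, sender_mode, receiver_mode):
    """Convert a chat message between the sender's and receiver's modes."""
    return MODES[(sender_mode, receiver_mode)](message)
```

A dispatch table like this keeps the mode logic in one place, so adding a new mode is a one-line change rather than a branch in every send path.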

User Information. The required information about users is kept in an XML
database. The system asks the user to enter his/her username and password to
log into the system; this information, along with other details such as first
name, last name, phone number, and address, is saved in the database. The
system also keeps the main data about groups and each member's role in a
group, and records each user's transactions in a log file. The log file
supports student assessment, which can be done by evaluating each student's
contribution [1,2].
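A minimal sketch of such an XML user store, written with Python's ElementTree. The element and attribute names here are assumptions for illustration, not the system's actual schema.

```python
import xml.etree.ElementTree as ET

def add_user(root, username, first, last, role):
    """Append a user record with a per-user transaction log (hypothetical schema)."""
    user = ET.SubElement(root, "user", username=username, role=role)
    ET.SubElement(user, "first_name").text = first
    ET.SubElement(user, "last_name").text = last
    ET.SubElement(user, "log")  # holds the user's transactions for assessment
    return user

def log_transaction(user, action):
    """Record one transaction in the user's log."""
    ET.SubElement(user.find("log"), "event").text = action

root = ET.Element("users")
u = add_user(root, "student1", "Ali", "Hasan", "student")
log_transaction(u, "joined group 2")
log_transaction(u, "posted whiteboard stroke")
# ET.ElementTree(root).write("users.xml") would persist the database to disk.
```

Keeping the log nested under each user makes the per-student assessment query a simple lookup of that user's `log` element.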

Application-Level Communication Mechanisms and System Architecture. The
system is designed to provide a synchronous view of the shared collaborative
learning tools (e.g. whiteboard) on the instructor's and students' workstations,
which are connected through the Internet. A communication layer is required
to support the exchange of events among the students in the class or the group
members. RabbitMQ [21] is used to realize the transfer of events between
students. RabbitMQ is open source message broker software that provides a
reliable method to send and receive messages, and the Advanced Message Queuing
Protocol (AMQP) implemented by RabbitMQ suits the needs of our project.
The group members' sessions are connected by message queues: each team member
uses one queue to receive messages from the other group members, or s/he can
broadcast a message to a queue defined for the whole group.
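The queue topology can be pictured with a broker-free, in-memory stand-in. The real system uses RabbitMQ over AMQP; the class below only mimics per-member queues plus a group broadcast using Python's standard `queue.Queue`, and all names are illustrative.

```python
from queue import Queue

class GroupBus:
    """In-memory stand-in for the described topology: one receive queue per
    member, plus a broadcast that reaches the whole group."""

    def __init__(self, members):
        self.queues = {m: Queue() for m in members}

    def send(self, to, msg):
        """Direct message into one member's queue."""
        self.queues[to].put(msg)

    def broadcast(self, sender, msg):
        """Group-wide delivery: every member except the sender receives it."""
        for member, q in self.queues.items():
            if member != sender:
                q.put((sender, msg))

bus = GroupBus(["a", "b", "c", "d"])
bus.broadcast("a", "draw:circle")
received = [bus.queues[m].get_nowait() for m in ("b", "c", "d")]
```

With a real broker, the broadcast would be a fanout exchange bound to each member's queue; the stand-in just loops over the queues to show the same delivery pattern.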

Fig. 4. Collaborative Virtual Environment for CVLED system.

The overall architecture of the system is a standard client-server architecture
(see Fig. 4). Each user in the system executes a local client. The local client
presents to the user the same CVLED interface seen by all other connected users.
The system server enables the interactions between the different CVLED sessions
and maintains synchronization between the local views of the virtual workspace
being manipulated.
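The server's synchronization role can be illustrated with a minimal sketch: shared events are applied in one global arrival order and relayed to every client, so all local views of the workspace converge. The class and method names are ours, not the system's.

```python
class SyncServer:
    """Minimal sketch of the server's role: serialize shared-workspace events
    into one order and relay them to every connected client."""

    def __init__(self, clients):
        self.clients = {c: [] for c in clients}  # each client's local view
        self.history = []                        # the single global event order

    def submit(self, sender, event):
        """A client submits an event; the server orders and rebroadcasts it."""
        self.history.append(event)
        for view in self.clients.values():
            view.append(event)  # every local view receives the same sequence

server = SyncServer(["instructor", "s1", "s2"])
server.submit("s1", "stroke(0,0,5,5)")
server.submit("instructor", "clear")
```

Because every client replays the identical event sequence, the whiteboard looks the same on every workstation without clients ever exchanging state directly.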

3 Conclusion and Future Work


We are working to finish the first prototype of CVLED; it is scheduled to be
ready in May 2018. Accordingly, the evaluation of the system is scheduled for
the academic year 2018/2019.
The evaluation will be done in two rounds; the first round will start at the
beginning of the fall 2018 semester. In the first round, students will be
divided into two subgroups: the first group will continue to use the old
learning method, attending school or receiving the learning materials as
before; the second group will use the system, which allows them to attend
classes and contribute to the learning environment over distance (from home,
work, etc.). The second round is scheduled for the second semester of 2018/2019.
In this round, the two previous groups will be switched. In both rounds, the
students will be asked to answer a pre-survey before the beginning and a
post-survey after the end of each round. In addition to the four surveys, the
students' contributions and learning outcomes will be used to assess the CVLED
system.
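The two-round crossover design can be stated compactly. This sketch is only a restatement of the plan above; the condition labels and function name are invented.

```python
def crossover_plan(groups=("A", "B")):
    """Two-round crossover: each subgroup uses one condition per round,
    then the conditions switch (condition labels are illustrative)."""
    conditions = {"A": "traditional", "B": "CVLED"}
    rounds = []
    for _ in (1, 2):
        rounds.append({g: conditions[g] for g in groups})
        # Swap conditions between rounds so each subgroup experiences both.
        conditions = {g: ("CVLED" if c == "traditional" else "traditional")
                      for g, c in conditions.items()}
    return rounds

plan = crossover_plan()
```

A crossover of this kind lets every student act as their own control, which strengthens the comparison between the two learning methods.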
The other modification that we are working on is to extend the model to have
more collaborative strategies (e.g. Round Table, Jigsaw, etc.). This modification
allows the instructor to form groups according to the chosen strategy,
which makes collaboration between students more flexible. The group size in
the first prototype depends on the two implemented strategies (Three-Step Inter-
view and Pairs Check), where in both the group size is four. The modification
allows the instructor to shape groups with different sizes.

References
1. Al-Jarrah, A., Pontelli, E.: “Alice-Village”: Alice as a collaborative virtual learning
environment. In: Frontiers in Education Conference (FIE), pp. 1–9. IEEE (2014)
2. Al-Jarrah, A., Pontelli, E.: On the effectiveness of a collaborative virtual pair-
programming environment. In: International Conference on Learning and Collab-
oration Technologies, pp. 583–595. Springer (2016)
3. Barkley, E.F., Cross, K.P., Major, C.H.: Collaborative Learning Techniques: A
Handbook for College Faculty. Wiley (2014)
4. Bouzid, Y., Khenissi, M.A., Jemni, M.: The effect of avatar technology on sign writ-
ing vocabularies acquisition for deaf learners. In: 2016 IEEE 16th International
Conference on Advanced Learning Technologies (ICALT), pp. 441–445 (2016).
https://doi.org/10.1109/ICALT.2016.127
5. Canal, M.C., Garcı́a, L.S.: Research on accessibility of question modalities used in
computer-based assessment (CBA) for deaf education. In: International Conference
on Universal Access in Human-Computer Interaction, pp. 265–276. Springer (2014)
6. Chowdhuri, D., Parel, N., Maity, A.: Virtual classroom for deaf people. In: 2012
IEEE International Conference on Engineering Education: Innovative Practices
and Future Trends (AICERA), pp. 1–3. IEEE (2012)
7. Department of Staff Development at Prince George’s Country Public Schools, in
collaboration with the Division of Instruction: A Guide to Cooperative Learning.
http://www.pgcps.pg.k12.md.us/∼elc/learning1.html
8. Dillenbourg, P.: Collaborative Learning: Cognitive and Computational Approaches.
Advances in Learning and Instruction Series. ERIC (1999)
9. Drigas, A.S., Vrettaros, J., Kouremenos, D.: An e-learning management sys-
tem for the deaf people. In: Proceedings of the 4th WSEAS International Con-
ference on Artificial Intelligence, Knowledge Engineering Data Bases, AIKED
2005, pp. 28:1–28:5. World Scientific and Engineering Academy and Society
(WSEAS), Stevens Point, Wisconsin, USA (2005). http://dl.acm.org/citation.cfm?
id=1363642.1363670
10. El-Alfi, A., El-Gamal, A., El-Adly, R.: Real time Arabic sign language to Arabic
text & sound translation system. Int. J. Eng. 3(5) (2014)
11. Felder, R.M., Brent, R.: Cooperative learning. In: Active Learning: Models from
the Analytical Sciences, ACS Symposium Series, vol. 970, pp. 34–D–53 (2007)
12. Srinivas, H.: Knowledge Management (2014). http://www.gdrc.org/kmgmt/index.
html
13. Hashim, H., Tasir, Z., Mohamad, S.K.: E-learning environment for hearing
impaired students. TOJET: Turkish Online J. Educ. Technol. 12(4) (2013)
14. HCD: Higher council for the rights of persons with disabilities (2018). http://hcd.
gov.jo/en/events
15. HCD: Higher council for the rights of persons with disabilities reports and docu-
ments (2018). http://www.hcd.gov.jo/ar/library-downloads
16. Kagan, S.: The structural approach to cooperative learning. Educ. Leadership
47(4), 12–15 (1989)
17. Khwaldeh, S., Matar, N., Hunaiti, Z.: Interactivity in deaf classroom using cen-
tralised e-learning system in Jordan. PGNet, ISBN, pp. 1–9025 (2007)
18. Kyun, N.C., Tat, L.Y., Saripan, M.I., Abas, A.F.: Education for all: disabled
friendly flexi e-learning system. In: Proceedings of AEESEAP Regional Sympo-
sium on Engineering Education, pp. 120–124 (2007)
19. Lago, E.F., Acedo, S.O.: Factors affecting the participation of the deaf and hard
of hearing in e-learning and their satisfaction: a quantitative study. Int. Rev. Res.
Open Distrib. Learn. 18(7) (2017)
20. Li, M., Lam, B.: Cooperative learning (2005)
21. Pivotal, Inc.: RabbitMQ (2014). https://www.rabbitmq.com
22. WFD: World federation of the deaf (2016). https://wfdeaf.org/our-work/
Game Framework to Improve English Language
Learners’ Motivation and Performance

Monther M. Elaish1,2(✉), Norjihan Abdul Ghani1(✉), Liyana Shuib1,
and Abdulmonem I. Shennat3

1 Department of Information Systems, Faculty of Computer Science and Information
Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia
{norjihan,liyanashuib}@um.edu.my
2 Department of Computer Science, Faculty of Information Technology,
University of Benghazi, Benghazi 1803, Libya
m_el81@yahoo.com
3 Department of Computer Science, Faculty of Engineering,
University of Victoria, Victoria, Canada
ashennat@uvic.ca

Abstract. The dominance of English as the global language of entertainment,


education and business creates a strong need to learn and teach it. Learning a
second language is often difficult and educators continue to seek innovative ways
to improve language learning and increase learners’ motivation especially for
second language learners. A number of technologies exist to assist in language
learning, ranging from basic tools of distance, electronic, and mobile learning, to
the use of games on mobile platforms to teach language skills. Developing games
for educational purposes, however, is not a straightforward task even for profes‐
sional developers. As this technology is a recent trend, the available frameworks
and guidelines to help developers as well as educators are still not adequate. In
this paper, we explore an educational mobile game framework that is designed to
improve students’ motivation and enhance English language learning. Through
a review of existing mobile learning frameworks, plus refining and validation
process with experts, the paper identifies necessary components and their holistic
structure. The key components include inputs from the domain of persuasive
technology, Bloom’s taxonomy and educational content. At the core of the
framework is a set of guidelines for developing mobile language learning games.
We present an evaluation of the proposed framework using expert input, which
returned positive and supportive feedback. In a subsequent phase, the proposed
framework will be used to build a mobile game application to enhance language
learning, and an experimental evaluation will be set on a target sample of Arabic-
speaking primary school students.

Keywords: Mobile learning · Mobile game framework · English language learning

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1029–1040, 2019.
https://doi.org/10.1007/978-3-030-02686-8_77
1030 M. M. Elaish et al.

1 Introduction

English is spoken globally and is the main language used in the international academic
and business communities, but learning the language is an added burden for non-native
speakers [1]. Inducing enough motivation is both challenging and important [2] as it
requires intricate balancing of the students’ skills and the learning task at hand [3].
Lack of motivation is a major problem in learning anything. The difficulties of
learning a second language require extended motivation and persistence [4]. The author in [5] defines
motivation as “reason why people decide to do something, how long they are willing to
sustain the activity, and how hard they are going to pursue it”. Digital technologies have
been proposed to improve students’ motivation and increase their engagement in the
learning process. For instance, mobile learning (m-learning) is rising in popularity,
even among people studying the English language [6]. Indeed, mobile learning has
been seen as a progression from face-to-face learning, to distance and web-based
learning, then mobile-assisted language learning (MALL) [7]. To put mobile learning
to effective use, many mobile games have been developed for teaching languages. This
construct of mobile games for language learning (MGLL) is the context of this paper.
Mobile educational games are now being used as an educational teaching strategy [8];
a link has been found between mobile game learning and grade attainment: students who
use mobile games for learning attain higher test scores than those who were regularly
involved in project-based lessons [9].
tool in English learning has been recommended [2, 10] as an intelligent adaptation of
gaming can assist non-native speakers in expanding their vocabulary and other relevant
aspects of learning a language [3]. The role of this adaptation is to keep the learner in a state
of balance, neither bored nor overwhelmed by balancing the material offered with the abil‐
ities of the learner. This is done by embedding the learning within a more elaborate game,
with an engaging narrative and a quest that keeps the learner interested in playing [3].
However, learning games should not be just a collection of game elements built together
to form an application that is then labeled as a language learning tool because it tests
vocabulary lists compared, for example, to testing flags of countries [43]. Games for
learning languages, similar to the general case of other gamified applications, should be
built using a systematic process and should refer to models and frameworks of building
apps. A systematic process of game development for language learning is lacking in the
current landscape of MGLL research. One of the barriers to success has been that previous
mobile learning design frameworks did not include theoretical instructional design guide‐
lines to support mobile learning [11, 12]. The instructional design strategies require a high
level of persuasion and interactivity, so this study uses persuasive technology to overcome
that barrier. Indeed, questions on the effectiveness of existing instructional design strategies
remain which suggests that new approaches are needed [13, 14].
According to [15], persuasive technology is “any interactive computing system
designed to change people’s attitudes or behavior”. Moreover, it can change their
response depending on user inputs, requirements, and states. In this work, persuasive
technology principles in addition to other educational components are combined to build
a framework, with the main objective of providing a framework for developing language
learning mobile games that can enhance learner’s motivation.
Game Framework to Improve English Language Learners’ Motivation 1031

2 Background

The hallmark of a successful learning game is for the users to enjoy learning. In [16],
the author identifies the main attributes of a game as: providing the player with an active
experience, encouraging the active participant to learn by doing, providing a social
medium that affords “the player with human-to-human interactions and emotional
responses”, being “participatory by providing the player with customized rapid feed‐
back”. This makes learning engaging by promoting behavioral learning, while at the
same time offering rewards and providing role models for the player.
The author in [17] notes that language learning competence is enhanced by the
learner’s awareness, sense of autonomy and authenticity. Awareness involves the under‐
standing of what the learner is doing, while autonomy is the ability of the learner to
make personal decisions about the process. Authenticity refers to the relevance, mean‐
ingfulness and practical application of the learning materials and process. That is, the
learner must be psychologically present while learning and must have the desire to put
in the required effort towards learning a new language.
The author in [18] highlighted the importance of “comprehensible input” in language
teaching. This means that the instructor must create a learning environment that is not
too difficult for the student. The learner must also be active in the language learning,
inculcate problem solving techniques within the language, and create personal ideas in
self-made sentences that convey meaning in the language. In other words, critical
engagement as opposed to passive or mechanical learning will not lead to competence
in learning a language. Interaction in the language is needed for the learner to commu‐
nicate and pass meaning in the target language to others [19].
Persuasive technology research has focused on interactive and computational
technologies such as Internet services, computers, mobile devices, and video games
[21]. It builds on theories of human-computer interaction, results in rhetoric [22],
and experimental psychology. According to [23], the introduction
of these new technologies in the classroom introduces new opportunities for education.
As indicated, if teachers encountering persuasive technology as a new tool in the class‐
room are to effectively assimilate it, they must consider how it relates to existing educa‐
tional paradigms and how it may offer something innovative. The authors in [20] consider
persuasion in relation to socio-cultural or Vygotskian theories of learning. Whilst not
identifying any actual points of conflict, they do identify a difference in emphasis. They
posit that socio-cultural approaches, with their emphasis on the social process of
learning, do not give enough weight to the quality (i.e. credibility) of the texts, evidence
or tools that are employed.
They further argue that the emphasis on group social processes in learning in Vygot‐
skian theory underplays the role of individual emotional and cognitive preferences in
determining the outcome of the learning process. Credibility is crucial when technology
has the role of being the persuader, and the development of credibility should be
integrated into the overall pedagogical strategy. Credibility here means the power to
influence, a property that technology does not possess by nature.
The author in [7] defines a mobile learning framework “in terms of the emerging
procedures and processes that can be used to define the mobile language learning”.
Furthermore, [7] provides a framework that describes critical requirements for creating
time- and context-appropriate learning content. Having a framework that incorporates
the appropriate learning theory and the capabilities of the technologies into the chosen
instructional design strategies is essential to attaining desired outcomes for mobile
learning initiatives [40]. While some design frameworks exist for use in implementing
educational technology, there are concerns about their appropriateness for the design of
mobile learning in all cases [13].
According to [24], there are three domains that teachers should understand and use
to design lessons. These three domains are cognitive (thinking), affective (feeling or
emotion) and psychomotor (kinaesthetic or physical). There are different taxonomies
(classifications) for each domain with psychomotor being the simplest, and the cognitive
being the most complex in that hierarchy.
These domains of learning were developed and described from 1956 to 1972. It is a
common misconception to attribute all three domains to Bloom, who was the first
author on the cognitive domain and whose name also appears on the affective domain.
The domains are:
• Cognitive Domain [25]
• Affective Domain [26]
• Psychomotor Domain [27]
This study is concerned with the affective domain, which focuses on emotions and
feelings and can be hierarchically classified into five parts [26]:
• Receiving. The learner’s willingness to receive, sensitivity to the existence of
stimuli (awareness), or selected attention.
• Responding. The learners’ motivation to learn, “active attention to stimuli”, feelings
of satisfaction, or willing responses.
• Valuing. The learners’ beliefs and attitudes of worth and recognition, commitment,
or preference.
• Organization. The learners’ “internalization of values and beliefs involving values
conceptualization and value organization system”. As beliefs or values become inter‐
nalized, the learners organize them in order of priority.
• Characterization. The learner’s “highest internalization and behavior that reflects a
generalized set of values and a” philosophy or characterization of life, so that
learners are capable of acting on and practicing their beliefs or values.
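The five levels above form an ordered hierarchy, which a game designer could encode directly. The sketch below is our own illustration (the helper function and its use are not from the cited taxonomy).

```python
# The affective domain's five levels, ordered from lowest to highest
# internalization, per Krathwohl et al.'s classification.
AFFECTIVE_LEVELS = ["receiving", "responding", "valuing",
                    "organization", "characterization"]

def reaches(level, minimum):
    """True if `level` is at or above `minimum` in the hierarchy
    (illustrative helper for checking which level a game feature targets)."""
    return AFFECTIVE_LEVELS.index(level) >= AFFECTIVE_LEVELS.index(minimum)
```

For example, a design review could require that a proposed game mechanic targets at least the "responding" level before it is accepted as motivation-relevant.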

3 Research Approach and Methods

To propose a mobile game framework that can motivate and enhance English learning,
we first reviewed three main aspects of existing m-learning frameworks: gaming,
motivation, and the learner and learning environment. We suggest including
application components that support each of these aspects. While each application
case needs a different framework, this study:
• Reviewed the current mobile frameworks to find out the main attributes of the frame‐
works that were used to develop a mobile game for language learning.
• Developed a framework based on persuasive guidelines, affective learning domain
taxonomy and the learners/learning environment. This was based on the assumption
that mobile technologies can broaden the landscape of learning experiences. Models
and frameworks are important in structuring the design, development, implementa‐
tion, and evaluation of these mobile learning experiences [28] and the frameworks
must take into account both the learner and the learning environment [29].
• Had the framework validated by experts, who evaluated each guideline principle
using an online questionnaire to ascertain its validity.

3.1 Framework Development


In the first design cycle, the framework was developed from the literature. First, two
studies provided the background of the mobile learning framework: [28, 30]. The authors
in [30] used the snowballing technique to select 17 mobile educational frameworks for
analysis that identified their main components as the “learner, device, context, time,
content, social interactions, usability, pedagogy, and surrounding culture”. In [28], the
author reviewed 17 frameworks that they classified into five categories: “pedagogies
and learning environment design, platform/system design, technology acceptance, eval‐
uation, and psychological construct”. This [28] analysis is applicable to this study
because of its arrangement, classification, relevance and purpose. Subsequently, for this
study, three criteria were identified as critical for an m-learning framework for English
language learning:
• The first criterion was to include only frameworks where language is the main
coverage area or one of the covered areas. Four frameworks fit this criterion [31–34].
• The second criterion was based on the framework’s focus on mobile games to
increase motivation for language learning. The assumption is that language learning
requires a different mobile framework [13, 29]. Accordingly, the frameworks used
in designing mobile learning must take into account both the learner and the learning
environment [29]. Table 1 shows whether each framework has the above components
or not based on the author’s review.

Table 1. Mapping of framework components to the most relevant frameworks in literature

Reference  Mobile L.  Learning  Game focus  Motivation  Learner and learning environment
[32]       Yes        Yes       Yes         Yes         No
[31]       Yes        Yes       No          Yes         Yes
[33]       Yes        Yes       No          Yes         Yes
[34]       Yes        Yes       Partly      No          Yes

• Using Crompton’s [35] characterization, each framework was analyzed on how it
deals with the aspects of “context, learner, device, social interactions, and
pedagogical approaches”. However, not all frameworks showed this, nor were they
all clear on how they applied or achieved each of these aspects. Some of them focused
on location-based learning, which is not our aim.
The author in [32] showed the strongest resemblance to the aspects we identified as
essential for a mobile learning framework for learning English by leveraging the power
of mobile games to motivate learners. However, this framework does not include the

Table 2. Comparison of persuasive technology and game patterns

                                          Persuasive technology            Game patterns
                                          principles [15]  principles [21] [36]
No.                                       43               28              73
Conflict among the mobile game patterns   Less             Less            High
Clear description                         Less             Less            High
Motivation focus                          Yes              Yes             Some of them

Table 3. Guideline principles based on persuasive technology [37]


Guideline Description on the principle
Reduction Makes the system simpler
Tunneling A method to guide the user through a predetermined sequence of
actions to encourage or dampen behavior
Self-monitoring Allows users to track their performance and status
Tailoring Design depends on the needs, interests, personality, context of use,
or any other aspect of the user group
Convenience Easy to access
Mobile simplicity Mobile applications that are uncomplicated to use will have a greater
potential to persuade
Mobile loyalty Serves its own user needs and wishes
Information quality Delivers current, pertinent, and well-arranged information
Kairos or JiTT (Just in Time Teaching) Gives suggestions at the right moment
Social facilitation Shows user others performing the same behavior
Social comparison Allows comparison
Social learning Allows users to observe others performance
Competition Technology can motivate users to adopt a target attitude or behavior by
leveraging human beings’ natural drive to compete
Cooperation Technology can motivate the user to adopt a target attitude or behavior
by leveraging human beings’ natural drive to cooperate
Recognition By offering public awareness (individual or group), computing
technology can raise the likelihood that a person or group will adopt a
behavior or attitude
Conditional rewarding Rewards depend on target behavior

learner and learner environment, which is also important. Therefore, we modified the
framework to include the learner and educator. Moreover, [32]’s framework was devel‐
oped to select the game patterns that were used to solve the motivation problem. The
framework components are based on the game design patterns for mobile games
established by [36], and Bloom’s taxonomy of learning outcomes [25].
To deal with the pattern limitation described by [32], additional aspects of persuasive
technology and game patterns were considered. For persuasive technology, there are
two main studies [15, 21]. As illustrated in Table 2, the reasons to replace game
patterns with a persuasive-principles guideline are that there are fewer principles than
patterns, that patterns conflict with each other, and that the principles are
classified more clearly.
A persuasive guideline was developed previously by [37] based on three aspects:
mobile, game, and language learning; and a set of persuasive principles shown in Table 3.

4 The Proposed Framework

Research on the introduction of information and communications technology (ICT) in education [38, 39] has shown that, for it to be effective, there needs to be an understanding of both the strengths and weaknesses of the technology, while at the same time being cognizant of the pedagogical practices required in implementing technology-enhanced learning. To do this, a framework is proposed for developing educational mobile game applications.

4.1 Initial Framework

Figure 1 depicts a framework that integrates the learning ideas from persuasive guidelines and the learner and learning environments into the application requirements for a
[Fig. 1. Initial framework. Diagram: the Educator and Learner connect to the Educational Mobile Game Application, which draws on Instructional Content and a Guideline informed by Persuasive Technology and Bloom's Taxonomy of Learning (Affective Domain).]


1036 M. M. Elaish et al.

mobile game. As stated above, this includes integrating a set of pedagogical approaches to support learning English. Thus, the framework extends learning in a mobile environment by using pedagogical approaches that also include persuasive principles.

4.2 Evaluation of the Framework


After the framework was developed, it was iteratively evaluated by experts. The selected evaluators were experts in mobile learning, game usability, game design and development, human-computer interaction, software development on Web and mobile platforms, computer and communications engineering, mobile security, graphic design, visual communication, multimedia studies, and instructional design. Thirteen experts were identified through their research and publications in Google Scholar; of these, five volunteered to participate in the review. The review iteratively asked the experts to review and comment on the following eight items:
• Correctness of the framework (5/5)
• Suitability of the framework for a study into mobile games in language education (4/5)
• Suitability of the framework in capturing elements of motivation in primary school language learning (3/5)
• Suitability of the framework in the actual design of the gaming application (3/5)
• Ease of use of the framework for primary school learners (4/5)
• An opinion on the use of the application in helping learners to improve their vocabulary (4/5)
• Possible revisions or enhancements to the framework (2 comments)
• Missing elements in the framework (no comments)
In the final iteration, agreement on each item was sought; the numbers in brackets above show how many of the five experts agreed with each item. Two experts raised queries about educational theory, because they believe this kind of framework should have an educational theory included. For language learning, socio-cultural theory has been added to the framework (see Fig. 2). This is to reinforce the fact that language is best learnt in a social setup. Therefore, the tenets of socio-cultural theory are applied by using persuasive technology as a social tool for language learning. In this framework, persuasion thus reflects socio-cultural theory, which views learning, development, and motivation as social in nature [41], and which emphasizes the interdependence of social and cultural interactions in the construction of knowledge [42]. Most of the experts reviewed the framework favourably, and they strongly believe it could be used to improve primary students' motivation and performance in English language learning.

[Fig. 2. Proposed framework that integrates learning ideas, learner and learning environments to form requirements for a mobile game. Diagram: same as Fig. 1, with Socio-Culture Theory added alongside Persuasive Technology and Bloom's Taxonomy of Learning (Affective Domain).]

5 Conclusions

Designing mobile games for language learning is not an easy task, especially for young students who do not fully understand the purpose of education or their educational materials. However, persuasive technology can provide a good way to design an interface that guides learners through application steps. This technology can offer theoretical grounding to support a mobile learning framework that is needed for proper mobile application design; given its importance, persuasive technology should be considered for each case. In this study, guidelines were developed based on three factors (mobile, game, and language learning) to optimize an interface design that covers all the tools used in the application. Expert opinion and feedback are important elements to ensure that the design of the framework follows the guidelines and can be applied and used to design applications. In this case, experts evaluated the framework and gave positive comments and feedback.
The ultimate goal of the framework proposed in this paper is to provide a guideline and systematic reference for professional developers to build mobile game applications that can motivate students. Consequently, the next step in our research is to develop an actual game application based on the proposed framework. The intended English Vocabulary Game (EVG) prototype will be developed specifically to support beginning students with individual learning practices based on the studied course goals. The application can be seen as a learning support resource that complements other existing tools for students. The main purpose of the prototype is to showcase the implementation of the framework proposed in this paper, including the principles of persuasive technology, and to allow for experimental evaluation using a sample of primary school students.

References

1. Cheng, C.-M.: Reflections of college English majors’ cultural perceptions on learning English
in Taiwan. Engl. Lang. Teach. 6(1), 79–91 (2013)
2. Ma, Z.-H., Hwang, W.-Y., Chen, S.-Y., Ding, W.-J.: Digital game-based after-school-assisted
learning system in English. In: 2012 International Symposium on Intelligent Signal
Processing and Communications Systems, pp. 130–135. IEEE (2012)
3. Sandberg, J., Maris, M., Hoogendoorn, P.: The added value of a gaming context and intelligent
adaptation for a mobile learning application for vocabulary learning. Comput. Educ. 76, 119–
130 (2014)
4. Kondo, M., Ishikawa, Y., Smith, C., Sakamoto, K., Shimomura, H., Wada, N.: Mobile assisted
language learning in university EFL courses in Japan: developing attitudes and skills for self-
regulated learning. ReCALL 24(2), 169–187 (2012)
5. Dornyei, Z., Ushioda, E.: Teaching and Researching Motivation. Longman, London (2001)
6. Elaish, M.-M., Shuib, L., Ghani, N.-A., Yadegaridehkordi, E., Alaa, M.: Mobile learning for
English language acquisition: taxonomy, challenges, and recommendations. IEEE Access 5,
19033–19047 (2017)
7. Kukulska-Hulme, A.: Language learning defined by time and place: a framework for next
generation designs. In: Díaz-Vera, E.-J. (eds.) Left to My Own Devices: Learner Autonomy
and Mobile Assisted Language Learning, Innovation and Leadership in English Language
Teaching, 6th edn. Emerald Group Publishing Limited, Bingley (2012)
8. Chen, H., Lin, K., Wang, Y.: The comparison of solitary and collaborative modes of game-
based learning on students’ science learning and motivation. J. Educ. Technol. Soc. 18(2),
237–248 (2015)
9. Huizenga, J., Admiraal, W., Akkerman, S., Dam, G.T.: Mobile game-based learning in
secondary education: engagement, motivation and learning in a mobile city game. J. Comput.
Assist. Learn. 25(4), 332–344 (2009)
10. Elaish, M.-M., Shuib, L., Ghani, N.A., Yadegaridehkordi, E.: Mobile English Language
Learning (MELL): a literature review. Educ. Rev., 1–20 (2017)

11. Herrington, A., Herrington, J.: Authentic mobile learning in higher education. In: Jeffrey, P.
(ed.) Proceedings of the Australian Association for Research in Education (AARE)
International Educational Research Conference, pp. 1–9. AARE, Australia (2007)
12. Park, Y.: A Pedagogical framework for mobile learning: categorizing educational
applications of mobile technologies into four types. Int. Rev. Res. Open Distrib. Learn. 12(2),
78–102 (2011)
13. Berking, P., Archibald, T., Haag, J., Birtwhistle, M.: Mobile learning: not just another delivery
method, interservice/industry training, simulation. In: Interservice/Industry Training,
Simulation, and Education Conference (I/ITSEC), pp. 1–10 (2012)
14. Koszalka, T.-A., Ntloedibe-Kuswani, G.-S.: Literature on the safe and disruptive learning
potential of mobile technologies. Dist. Educ. 31, 139–157 (2010)
15. Fogg, B.-J.: Persuasive Technology. Elsevier, Amsterdam (2003)
16. Winn, B.-M.: The design, play, and experience framework. In: Handbook of Research on
Effective Electronic Gaming in Education, pp. 1010–1024. IGI Global, Hershey (2008)
17. Van Lier, L.: Interaction in the Language Curriculum: Awareness, Autonomy and
Authenticity. Longman, London (1996)
18. Krashen, S.: We acquire vocabulary and spelling by reading: additional evidence for the input
hypothesis. Mod. Lang. J. 73(4), 440–464 (1989)
19. Cummins, J., Swain, M.: Linguistic interdependence: a central principle of bilingual
education. In: Bilingualism in Education: Aspects of Theory, Research and Practice, pp. 80–
95 (1986)
20. Alexander, P.-A., Fives, H., Buehl, M.-M., Mulhern, J.: Teaching as persuasion. Teach.
Teach. Educ. 18(7), 795–813 (2002)
21. Oinas-Kukkonen, H., Harjumaa, M.: Persuasive systems design: key issues, process model,
and system features. Commun. Assoc. Inf. Syst. 24(1), 28 (2009)
22. Bogost, I.: Persuasive Games: The Expressive Power of Videogames. MIT Press, Cambridge
(2007)
23. Mintz, J., Aagaard, M.: The application of persuasive technology to educational settings:
some theoretical from the HANDS project. In: Proceedings of Poster Papers for the Fifth
International Conference on Persuasive Technology, Persuasive 2010, pp. 101–104. Oulu
University Press (2010)
24. Wilson, L.-O.: Making instructional decisions. https://thesecondprinciple.com/teaching-
essentials/instructional-decisions/. Accessed 28 Mar 2018
25. Bloom, B.-S.: Taxonomy of educational objectives. In: Handbook 1: Cognitive Domain. David
McKay, New York (1956)
26. Krathwohl, D.-R., Bloom, B.-S., Masia, B.-B.: Taxonomy of Educational Objectives, Book
II. Affective Domain. David Mackay, New York (1964)
27. Harrow, A.: A taxonomy of the psychomotor domain: a guide for developing behavioral
objectives. Addison-Wesley Longman Publishing Co. Inc., New York (1972)
28. Hsu, Y., Ching, Y.-H.: A review of models and frameworks for designing mobile learning
experiences and environments. Can. J. Learn. Technol. 41, 1–22 (2015)
29. Teall, E., Wang, M., Callaghan, V.: A synthesis of current mobile learning guidelines and
frameworks. In: E-Learn: World Conference on E-Learning in Corporate, Government,
Healthcare, and Higher Education, Association for the Advancement of Computing in
Education (AACE), pp. 443–451 (2011)
30. Rikala, J.: Designing a mobile learning framework for a formal educational context. Jyväskylä
Stud. Comput. (2015)

31. Wei, Y., So, H.-J.: A three-level evaluation framework for a systematic review of contextual
mobile learning. In: 11th International Conference on Mobile and Contextual Learning, pp.
164–171. Helsinki (2012)
32. Schmitz, B., Klemke, R., Specht, M.: Effects of mobile gaming patterns on learning outcomes:
a literature review. Int. J. Technol. Enhanced Learn. 4(5–6), 345–358 (2012)
33. Abdullah, M.-R.-T.-L., Hussin, Z., Asra, B., Zakaria, A.-R.: MLearning scaffolding model
for undergraduate English language learning: bridging formal and informal learning. TOJET:
Turk. Online J. Educ. Technol. 12(2), 217–233 (2013)
34. Scanlon, E., Gaved, M., Jones, A., Kukulska-Hulme, A., Paletta, L., Dunwell, I.:
Representations of an incidental learning framework to support mobile learning. In:
Proceedings of the 10th International Conference on Mobile Learning, pp. 238–242 (2014)
35. Crompton, H.: A historical overview of mobile learning: toward learner-centered education.
In: Handbook of Mobile Learning, pp. 3–14 (2013)
36. Davidsson, O., Peitz, J., Björk, S.: Game Design Patterns for Mobile Games. Project Report
to Nokia Research Center, Finland (2004)
37. Elaish, M.-M., Shuib, L., Ghani, N.-A.: Mobile game applications (MGAs) for English
language learning: a guideline for development. In: 94th IASTEM International Conference,
pp. 11–16. Kuala Lumpur (2017)
38. Salmon, W.-C.: Scientific explanation: causation and unification. Critica: Revista
Hispanoamericana de Filosofia 22, 3–23 (1990)
39. Motiwalla, L.F.: Mobile learning: a framework and evaluation. Comput. Educ. 49(3), 581–
596 (2007)
40. Zarei, A., Mohd-Yusof, K., Daud, M.-F.: Mobile multimedia instruction for engineering
education: Why and how. ASEAN J. Eng. Educ. 2, 21–29 (2015)
41. McInerney, D.-M., Walker, R.-A., Liem, G.: Sociocultural Theories of Learning and
Motivation: Looking Back, Looking Forward. Information Age Publishing Inc., Charlotte,
North Carolina (2011)
42. John-Steiner, V., Mahn, H.: Sociocultural approaches to learning and development: a
Vygotskian framework. Educ. Psychol. 31(3–4), 191–206 (1996)
43. Elaish, M.-M., Ghani, N.-A., Shuib, L., Al-Haiqi, M.-A.: Mobile games for language learning.
In: Paiva, S. (ed.) Mobile Applications and Solutions for Social Inclusion, pp. 137–156. IGI
Global (2018)
Insights into Design of Educational Games:
Comparative Analysis of Design Models

Rabail Tahir(&) and Alf Inge Wang

Norwegian University of Science and Technology, Trondheim, Norway


rabail.tahir@ntnu.no

Abstract. The study reports on ongoing research that intends to identify and
validate the core dimensions for Game-Based Learning (GBL) and further
explore the shift in dimensional focus between different phases of the educational
game development life cycle: pre-production (design), production (development)
and post-production (testing and maintenance). Hence, this paper presents
the initial work, focusing on the design phase, by presenting a comparative analysis
of educational game design models using GBL attributes, validity and framework
attributes as analytical lenses. The main objective is to analyze the fundamental
GBL attributes in existing design models to identify the common
attributes which demonstrate their importance for the design phase, and to highlight
any need for further research in terms of attribute validation and framework
improvement. This study also highlights the strengths and weaknesses of existing
design frameworks. The results of the analysis underline learning/pedagogical
aspects and game factors as the most essential attributes for the design phase of
educational games. The comparative analysis also guides researchers and practitioners to
better understand GBL through various properties of different existing design
models, and highlights open problems (such as the lack of tool support, empirical
validation, independent evaluations, adaptability, and concrete guidance
for application) so that more informed judgments can be made.

Keywords: Educational games · Game-based learning · Serious games · Design models · Frameworks · Comparative analysis · Design attributes

1 Introduction

Over the past decade, educational games or game-based learning systems have greatly
impacted the learning industry. However, it has been a constant challenge for educational
game designers to understand the different aspects embedded in game-based
learning [1]. Lately, several researchers have proposed design frameworks, models and
guidelines to guide educational game design [2–16]. According to Neil [17], all
proposed design models usually tend to communicate some core foundational elements, yet
they differ in their approach and results. There is a lack of dialogue between
researchers and practitioners, and also among researchers themselves; therefore, even at
a purely theoretical level, there is a lack of work providing comprehensive comparative
analysis in the field [17]. To the best of our knowledge, we found only two

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1041–1061, 2019.
https://doi.org/10.1007/978-3-030-02686-8_78

such attempts at comparison studies for learning game design frameworks. Dos Santos
et al. [18] presented a comparison of five digital learning game design methodological
frameworks and highlighted their differences and similarities, to identify selection criteria
for guiding framework choice and to promote methodological frameworks as a way
to encourage principled educational game design. However, the framework selection
is not explicitly stated. Likewise, Ahmad et al. [19] presented a survey of different
educational design frameworks against criteria such as well-designed games, effective
video games, four learning theories, and key elements of games, and analyzed them
from a software engineering perspective for the development of effective educational
games. However, the criteria are not specifically focused on educational games.
Malliarakis et al. [20], by contrast, did not present a comparative analysis but studied
existing frameworks for educational game design to document the features supported
by current educational games that teach computer programming, in order to establish a
framework for the design of their own computer-programming-specific educational game.
Often the underlying purpose of comparison entails valuing one model over
another. However, that is not the sole focus of this study. Rather, the approach here is to
analyze the existing design models/frameworks against core GBL dimensions to pinpoint
the elements specifically relevant to the design phase, based on similarities across the
analyzed frameworks. The GBL dimensions selected as the analytical lens come from our
previous research results [33]. Although all core dimensions are considered important
for an effective educational game product, dividing them across different phases might
help educational game designers and developers to focus their efforts within each phase and
ease the process. Further, the design frameworks are also compared in terms of validation
of the dimensions used, and their framework attributes are explored to highlight strengths
and weaknesses, which would aid researchers and designers in better understanding the
issues in educational game design. The objectives of this study are the following:
1. Exploring game-based learning attributes used in existing design models.
2. Validation of game-based learning attributes by existing models and frameworks:
support for being theoretically grounded and empirically sound.
3. Comparison of existing GBL frameworks using the analytical lens to identify open
issues and highlight their strengths and weaknesses.
The paper is organized into the following sections. Section 2 provides background by
presenting an overview of educational game design frameworks/models, Sect. 3
describes the method, Sect. 4 illustrates the comparative analysis, Sect. 5 presents the
discussion, and finally, Sect. 6 concludes the study with conclusions and future work.

2 An Overview of Educational Game Design Frameworks/Models

Our previous research study examined the state of the art in game-based learning by
conducting a systematic literature review. The work reported in [21] highlighted the
existing design-focused approaches for educational games, and these frameworks/models
were selected for the comparative analysis described in this paper. In this
section, the existing educational game design models/frameworks are presented, and
their objectives are briefly described.

2.1 Level Up
The goal of Level Up [6] is to build new modes to design and evaluate future
game-based learning systems. The author hypothesized that the framework will
increase the production speed of educational games, increase their quality, and offer
scientific evaluation of the games' educational content. According to the author, the Level
Up framework will make use of a collection of empirical experiments as well as
log-data-driven analyses using empirical learning curves for understanding learning in
educational games. The aim is to model students' learning and identify gaps to
improve game development by applying educational data mining to students' game-log
data. The learning models can serve a dual purpose: assessing the quality of learning in an
educational game, and identifying the exact spots for applying in-game feedback (e.g.
hints on more difficult problems). The author makes use of game-log data for evaluating
learning in an educational game. The evaluations and logging system together are
considered to provide the foundation for developing design principles for an effective
educational game.
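The log-data-driven analysis described above can be made concrete with a small example. The sketch below is our own illustration, using a hypothetical log format that does not come from the Level Up paper: it derives an empirical learning curve as the error rate at each successive practice opportunity, aggregated over players.

```python
# Sketch: empirical learning curve from game-log data.
# The (player, skill, correct) log format is hypothetical.
from collections import defaultdict

def learning_curve(log):
    """Error rate per practice opportunity, aggregated over players.

    `log` is a chronological list of (player, skill, correct) events;
    the opportunity index counts each player's attempts at a skill.
    """
    attempts = defaultdict(int)   # (player, skill) -> tries so far
    errors = defaultdict(int)
    totals = defaultdict(int)
    for player, skill, correct in log:
        i = attempts[(player, skill)]
        attempts[(player, skill)] += 1
        totals[i] += 1
        if not correct:
            errors[i] += 1
    return [errors[i] / totals[i] for i in sorted(totals)]

log = [
    ("p1", "plural-s", False), ("p1", "plural-s", False), ("p1", "plural-s", True),
    ("p2", "plural-s", False), ("p2", "plural-s", True), ("p2", "plural-s", True),
]
print(learning_curve(log))   # [1.0, 0.5, 0.0]: error rate falls with practice
```

A curve that falls with practice indicates learning; plateaus or spikes mark spots where in-game feedback such as hints could be applied.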

2.2 Experiential Gaming Model


The experiential gaming model [8] is developed on the idea of integrating
experiential learning theory, flow theory, and game design. The experiential gaming model
emphasizes the importance of clear goals, immediate feedback, and matching
challenges to the skill level of players. The model comprises an experience loop, an
ideation loop, and a challenge depository. The model uses the operational principle of the
human blood-vascular system as a metaphor: the heart of the model is formed by
challenges based on educational objectives. Flow theory is applied, and factors
contributing to the flow experience are discussed in the model to enhance positive user
experience and maximize the educational game's impact.
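The flow-theory requirement of matching challenge to player skill can be sketched as a simple difficulty-adjustment rule. The update rule and constants below are our own illustration; the experiential gaming model itself prescribes no particular algorithm.

```python
# Sketch: keep challenge close to skill so the player stays in the
# "flow channel" (neither anxious nor bored). Constants are illustrative.

def next_challenge(skill, challenge, solved, step=0.1, band=0.2):
    """Nudge challenge toward the player's estimated skill level.

    Success raises difficulty slightly, failure lowers it, and the
    result is clamped to a band around the skill estimate.
    """
    challenge += step if solved else -step
    return max(skill - band, min(skill + band, challenge))

c = 0.5
for solved in (True, True, False, True):
    c = next_challenge(skill=0.6, challenge=c, solved=solved)
print(round(c, 2))   # 0.7: challenge has settled near the skill level
```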

2.3 Framework for the Analysis and Design of Educational Games


This framework for the design of educational games [2] is built from existing
components, including a method for specifying educational objectives, principles for
instructional design supported by empirical research in the learning sciences, and a
framework for linking game dynamics, mechanics, and aesthetics. The framework
identifies the levels that are essential for an educational game to be effective. It
discusses three components (learning objectives, MDA, and instructional principles),
highlighting the support they can provide to the game designer from an analytical
angle. The author notes that an educational game is more likely to succeed when its
learning objectives are clearly established early in the development process, and when
designers carefully think about realizing the desired game aesthetic through game
mechanics, via proper game dynamics, while observing proven instructional design
principles.

2.4 RETAIN Model


Zhang et al. [16] presented the RETAIN model, consisting of six elements (relevance,
embedding, transfer, adaptation, immersion and naturalization). The model is
built on instructional design principles and relates the key concepts shared between
instructional design and games, providing a common framework that helps educators
and game designers understand how to integrate game and learning content effectively
and balance the two.

2.5 Adaptive Digital Game-Based Learning Framework


The author of [13] identified essential components and best-practice features to be
considered in the design of games-based learning environments, based on existing
models and frameworks. The author discusses four frameworks/models:
the Design Framework for Edutainment Environment, the Adopted Interaction Cycle for
Games, the Engaging Multimedia Design Model for Children, and the Game Object
Model. Based on this analysis, the developed framework focuses on the learners and the
game design. The author also highlights some important features, such as challenge,
goals, story, and objectives, that are not included as part of the framework.

2.6 A Theoretical Framework for Serious Game Design


Rooney [10] investigated a triadic theoretical framework consisting of the elements of
pedagogy, play, and fidelity for the design of serious games. The author points out that
the inherent inconsistencies between pedagogy, game design, and fidelity make it difficult
to balance these elements during the serious game design process and to integrate
them into one coherent framework. Another challenge is the multidisciplinary nature of
serious games, which requires collaboration between members from different disciplines;
their conflicting interests, priorities, and diverse backgrounds can complicate the
process of "balancing".

2.7 The “I’s” Have It (A Framework for Serious Educational Game Design)
The framework “I’s have it” for the design of serious educational games is a nested
model of six elements: identity, immersion, interaction, increased complexity, informed
teaching, and instructional [4]. The elements of the framework are derived from studies
on the design and development of games from Grade 5 to graduate level, and are
grounded in theory and research within education, instructional technology, psychology,
and the learning sciences. According to the framework, educational games contain
these six elements, which come into view in order of magnitude, starting from the
element identity and ending at instructional. The author states that the backbone of
this work is research from the constructivist viewpoint, which shows that people
learn by drawing on prior schemas and eventually build new knowledge by
connecting new experiences with prior ones.

2.8 e-VITA Framework for SGs


The framework for serious games developed as part of the e-VITA project [9] focuses on
three key dimensions: technical verification, user experience, and pedagogical
aspects (learning outcome). The project regards a serious game as simultaneously a game, an IT
product, and a learning instrument. It argues that, with respect to development and
evaluation, an educational game should meet three critical dimensions to be effective:
(1) it should be easy to use and technically sound; (2) it should be an engaging and fun
game; and (3) it should be an effective learning instrument providing the desired learning
outcomes. To improve motivation and learning, all three dimensions should be
targeted; failure to meet any one dimension could compromise the effectiveness of a
serious game.

2.9 Educational Games (EG) Design Framework


The focus of Ibrahim et al. [7] was to develop an educational game design framework
for higher education. The authors compared a few available frameworks and recommended
the required criteria based on their analysis, from both a pedagogy and a game design
viewpoint. The idea behind this framework is to combine three factors (pedagogy,
game design, and learning content modelling) in educational game design. The game
design factor focuses on multimodality and usability, since usability studies in
educational games have received little attention from researchers. Similarly, the
pedagogical factor focuses on learning outcomes and motivation theory. The factors of fun,
problem solving, and syllabus matching are also highlighted.

2.10 Game Factors and Game-Based Learning Design Model


Shi et al. [11] underlined the fact that prior models are designed around specific game
genres, making them difficult to use when the target game genre differs from the default
game genres applied in the research. Therefore, the authors present macro-level design
concepts comprising 11 key factors for game design: game goals, game fantasy, game
mechanism, game value, narrative, interaction, challenges, freedom, sociality, sensation,
and mystery. The authors verify the usability of the model and the performance of the
identified factors for designing educational games by analyzing two applications.

3 Method

The methodology used in this paper is a comparative analysis of educational game
design models/frameworks using appropriate analytical tools. The quasi-formal
comparison technique proposed by [22], and used by many researchers [23–25] for
comparative reviews, is employed in this study.
Comparing existing frameworks and models with one another is useful for gaining
insight into a specific area and identifying gaps for future research. It is a difficult task,
however, and the result is often considered to carry some researcher bias, as it is based
on the subjective judgment of the researcher. Two alternative

approaches have been proposed for comparative analysis: informal and quasi-formal
comparison. Informal comparison lacks a systematic framework to direct the
analysis and is therefore more likely to have a subjective bias. Quasi-formal comparison,
on the other hand, attempts to overcome these subjective limitations by presenting a
strategy and creating a baseline for comparison in the form of an analytical tool. Quasi-formal
comparisons can be conducted using different techniques. One technique is to
select a set of critical perspectives or attributes and then compare the objects against
them; this is considered closest to a traditional scientific method [22], and it is the
approach adopted for the quasi-formal comparison in this study. For this purpose,
appropriate analytical tools are needed for the analysis and comparison. Although many
researchers have proposed and used analytical tools for comparative analysis [26–29],
not all fit the purpose and specific area of this research. The analytical lenses
seen as appropriate for the research objective of this study are classified as:
GBL/educational game attributes, validity, and framework attributes. The GBL attributes
were selected based on our earlier research study, which categorized game-based
learning into six fundamental dimensions using directed content analysis [33] of GBL
literature selected through a systematic literature review [21]. The analytical lenses of
validity and framework attributes are taken from [23, 26, 27]. These analytical lenses
are described, along with references, in Table 1. The research study outlines three
research questions, which are as follows:
RQ1. Which GBL attributes are essential for the design phase of the educational game
development life cycle? (Comparison of attributes covered in each
model/framework.)

Table 1. Analytical lenses for comparative analysis of existing educational game design models/frameworks

GBL attributes [21, 33]: How many and which GBL attributes are covered by the educational game design model/framework?
- Learning/pedagogical: Does the model/framework consider the learning/pedagogical attribute, or any elements related to it?
- Game factor: Does the model/framework consider the game factor attribute, or any elements related to it?
- Affective reactions: Does the model/framework consider the affective reaction attribute, or any elements related to it?
- Usability: Does the model/framework consider the usability attribute, or any elements related to it?
- User: Does the model/framework consider the user attribute, or any elements related to it?
- Environment: Does the model/framework consider the environment attribute, or any elements related to it?

Validity [18, 23, 26]: Does the model/framework have support for its claims?
- Theoretical evidence (development basis): Is the model/framework grounded in appropriate theory? (Does the author provide a development basis for the model/framework?)
- Empirical evidence (validation/application): Does the model/framework have empirical support for its claims? (Details of application/validation of the framework/model: game name, sample size, validated elements.)

Framework attributes [18, 23, 27, 28]: What type of attributes are provided by the model/framework?
- Tool/instrument support: Does the model/framework offer tool/instrument support for its artefacts?
- Assessment and stakeholders: What types of assessment approaches are used for the model/framework? Which groups of stakeholders are required to participate in assessment?
- Applicable stage: What is the most appropriate educational game development life cycle phase(s) in which to apply the model/framework?
- Application domain: In which application domain(s) is the model mostly applied?
- Guidance for application (abstract principles vs concrete guidance): Does the model/framework rely only on abstract principles, or does it provide concrete guidance? (Does it offer guidelines on how to practically use it for educational game design?)
- Target/adaptability: Is the model/framework fit for all educational games (universal/generic) or is it situation-specific? Does it offer adaptability in actual use?
- Strength/weakness: What are the strengths and weaknesses of the model/framework?

RQ2. To what extent are the attributes used in existing models validated? Are
they theoretically grounded? Is empirical evidence available?
RQ3. What types of characteristics are provided by existing design models to
operationalize and use them, and what are their strengths and weaknesses?

4 Comparative Analysis

The frameworks described above aim at establishing guidelines and patterns for
designing effective educational games. A comparison of these models highlights not
only the fundamental common characteristics to be considered during the GBL design
phase but also the distinct aspects and approaches of each framework, while bringing
forward the open issues that still need to be addressed in GBL design research. In this
section, 15 existing educational design models/guidelines (including 10 models/frameworks
and 5 design guidelines/principles) are compared and analyzed using the three categories
of analytical lenses (GBL attributes, validity and framework attributes) described in Table 1.

4.1 Key GBL Attributes


Among the most significant comparison features is the number of key attributes a
model/framework deals with [26]. Six fundamental GBL elements were selected for the
comparative analysis of design frameworks (see Table 2): learning/pedagogy, game
factors, affective reactions, usability, user and environment. These six attributes were
chosen as analytical lenses because they were identified as core dimensions of GBL in
our earlier research study [33]. Therefore, the aim here is to identify whether any of
these six attributes are more focal or particularly essential for the design phase of
effective educational games.

Table 2. Comparative analysis of educational game design models/frameworks based on key GBL attributes

- Game-Based Learning Guidelines [3]: learning/pedagogy X (learning objectives); game factor X (game req.); usability X (user interface); users X (child req.). Total: 4
- Level Up [6]: learning/pedagogy X (learning). Total: 1
- Experiential gaming model [8]: learning/pedagogy X (experiential learning); game factor X (game design); affective reactions X (flow). Total: 3
- Usability guidelines for mobile educational games [14]: usability X (usability); environment X (context). Total: 2
- Framework for analysis and design of educational games [2]: learning/pedagogy X (learning objectives, instructional design); game factor X (MDA). Total: 2
- RETAIN Model [16]: learning/pedagogy X (relevance, embedding, transfer, adaptation, naturalization); affective reactions X (immersion). Total: 2
- Adaptive Digital Game-Based Learning Framework [13]: game factor X (game design); users X (learner). Total: 2
- A Theoretical Framework for Serious Game Design [10]: learning/pedagogy X (pedagogy); game factor X (fidelity); affective reactions X (play). Total: 3
- "I's" have it [4]: learning/pedagogy X (instructional); game factor X (identity); affective reactions X (immersion). Total: 3
- User Experience for Mobile Game-Based Learning [12]: learning/pedagogy X (learning content); game factor X (game play); usability X (usability); environment X (mobility). Total: 4
- EGameDesign [15]: learning/pedagogy X (knowledge enhancement); affective reactions X (enjoyment). Total: 2
- e-VITA framework for SGs [9]: learning/pedagogy X (pedagogical aspects); affective reactions X (affective aspects); usability X (usability); environment X (technical verification). Total: 4
- Educational Games (EG) Design Framework [7]: learning/pedagogy X (pedagogy, learning content); game factor X (game design). Total: 2
- Design principles for serious games [5]: game factor X (design principles). Total: 1
- Game Factors and Game-Based Learning Design Model [11]: game factor X (game factors). Total: 1

Total models: 15. Learning/pedagogy: 11; game factor: 10; affective reactions: 6; usability: 4; users: 2; environment: 3.
Bold X is used when all factors of that attribute are covered by a framework and X when only some are covered.

Learning/pedagogical entails the elements related to pedagogy and learning, such as
learning objectives, strategy, content and outcomes. Game factors include the features of
a game world that encompass every perspective of the game environment (game definition,
mechanics, narrative, aesthetics, resources). Affective reactions depict the emotions
and feelings stimulated during interaction with an educational game (flow,
engagement, motivation, enjoyment). Usability signifies how usable the educational
game is for its users in achieving its goals (learnability, satisfaction, interface). User
refers to the learner/player playing the educational game and their characteristics, such
as profile and cognitive and psychological needs. Lastly, environment describes the
technical and context-related aspects of the educational game. Table 2 presents the
comparative analysis based on these GBL attributes.

4.2 Validity: Theoretical and Empirical Evidence


This section analyzes the design frameworks in terms of their validity, examining the
theoretical and empirical support available for each framework. Theoretical validity
is examined to explore the development basis and foundations of these design
frameworks/models. Empirical support is examined to see whether the existing design
models are grounded in empirical evidence or have been applied to any educational
game. It is important to see whether the existing educational game design models have
a strong practical orientation in real-life educational game design and development,
backed by empirical studies, or are merely presented in research work. Table 3 details
the models/frameworks with their development basis, empirical validation or application,
the educational games to which the model has been applied, the sample size of the
empirical study, and the elements of the model/framework validated in the study.

4.3 Framework Attributes


The existing design frameworks are also analyzed with the analytical lens of framework
attributes mentioned in Table 1. The comparative analysis of educational game design
frameworks in terms of tool support, assessment and stakeholders, application stage,
domain, guidance for application and target/adaptability is presented in Table 4.
Table 5 highlights the strengths and weaknesses of each mentioned framework. For
this part of the analysis, we have only included the design frameworks/models and not
the design principles/guidelines. Therefore, a total of 10 frameworks are compared here.
The framework attributes are briefly described here. Tool support facilitates capturing
the design artefacts together with evaluation outcomes, decision rationales and
measurements, which are invaluable assets [23]. A stakeholder is any representative or
person having an interest in the system [23]. The perspective of abstract versus concrete
guidance allows assessing the guidance for application: whether a framework offers
concrete guidance for its application in designing educational games or relies only on
abstract rules; for example, "respect people" without any guidelines on how to do so is
an abstract principle [23]. The target of the analyzed design models can be categorized
as general or specific, based on whether a model can be used for the design of any kind
of educational game and for any target audience, or whether it focuses on a specific
platform, audience or game genre, providing specific guidelines for its target. Design
models are used for the design process of educational games; therefore, the application
stage is the design phase. However, some of these models claim to be equally
applicable to other stages of the development lifecycle.

Table 3. Comparative analysis of educational game design models/frameworks based on
validity

- [3] Development basis: reviewed literature* (not specified). Empirical validation/application: no validation.
- [6] Development basis: intelligent tutoring system literature. Validation: yes (empirical study); game: Wu's Castle video game; sample size: 61; validated elements: learning curve.
- [8] Development basis: experiential learning theory, flow theory and game design. Validation: yes [31, 32]; games: IT-Emperor game, Day Off; sample size: 221; validated elements: flow antecedents.
- [14] Development basis: interviews with educational game developers, game design theory, and game analyses. Validation: no validation.
- [2] Development basis: existing components: a method for specifying educational objectives, a framework for relating a game's mechanics, dynamics, and aesthetics, and principles for instructional design. Validation: yes* (case study); the framework was applied to analyze a game; game: Zombie Division; sample size: NI.
- [16] Development basis: game and instructional design principles (Keller's ARCS Model, Gagne's events of instruction, Bloom's principles of scaffolding). Validation: yes* (case study); applied for the evaluation of an educational game; game: Knowledge Discovery; sample size: NI; validated elements: relevance, embedding, transfer, adaptation, immersion, naturalization.
- [13] Development basis: four models: Design Framework for Edutainment Environment, Adopted Interaction Cycle for Games, Engaging Multimedia Design Model for Children, and GOM. Validation: no validation.
- [10] Development basis: NI. Validation: no validation.
- [4] Development basis: experience of developing and testing educational games and research from commercial video games. Validation: no validation (example only); game: The Great Entomologist Escape; sample size: NI.
- [12] Development basis: NI. Validation: case study; game: 1Malaysia; sample size: 64.
- [15] Development basis: four-dimensional game-design evaluation framework and Bloom's six levels of knowledge. Validation: yes* (case study); applied to design a learning game; game: VIEW.
- [9] Development basis: NI. Validation: yes* (preliminary validation of the game; results not provided); game: e-VITA (European life experience); sample size: NI.
- [7] Development basis: compares a few frameworks: Adaptive Digital Game-Based Learning Framework, Three Layered Thinking Model, Experiential Gaming Model and Model for Educational Game Design. Validation: no validation.
- [5] Development basis: literature review of related work* (not specifically stated). Validation: yes* (case study); applied in 2 math video games but no evaluation performed; games: Gem Game, Grandma's Garden Game; sample size: NI.
- [11] Development basis: literature search of studies whose primary concerns were game factors. Validation: yes; games: Slice it, Xiao-Mao; sample size: 31; validated elements: all 11 factors.

NI = not identified. * is used when it is stated but not explained, is not an empirical
validation, or when results are not provided.

5 Discussion

A comparison among existing models/frameworks clarifies their underlying common
features and distinctive aspects. Such a comparison mainly provides two benefits: first,
it helps educational game designers/researchers understand and contrast the available
alternative approaches in order to select an appropriate one; second, it highlights the
open problems for future research. However, this study has a third key benefit of
guiding educational game designers in the design phase by highlighting the attributes
essential for the design of educational games. This study performs the comparative
analysis of educational game design models/frameworks through the perspective of
important GBL features that, in our viewpoint, could be considered the core dimensions
and are fundamental for an effective GBL product. Although all of these attributes are
important for the educational game development life cycle, the view or focus may
change in the different phases of design, development and evaluation, making some
attributes more important in one phase than in another. Therefore, the idea is to explore
this shift in focus.
RQ1: The comparison among existing models/frameworks in terms of GBL attributes
clarifies the underlying common features for the design phase. Eleven design models
included the learning attribute, mostly focusing on learning objectives, learning content,
instructional design, knowledge enhancement/transfer and pedagogical aspects.
Ten frameworks focused on game factors, with emphasis on game design aspects such
as goals, mechanics, dynamics, aesthetics, narrative and fidelity. However, only six
design frameworks addressed affective reactions: the experiential gaming model
emphasized the flow experience, RETAIN and "I's" focused on immersion, and
EGameDesign focused on enjoyment. Although affect is a common feature of digital
games and is considered equally important in educational games, in design models it
comes after learning and game factors. Usability is approached by four frameworks/
guidelines: e-VITA, user experience for mobile game-based learning, usability
guidelines for mobile educational games and game-based learning guidelines.

Table 4. Comparative analysis of educational game design models/frameworks based on framework attributes

- [6] Tool support: no. Assessment/stakeholders: mixed (user & model)/students, users. Assessment method: qualitative. Guidance for application: partial guidance. Target/adaptability: specific/NI. Applicable stage: design and evaluation. Domain: computer science.
- [8] Tool support: no. Assessment/stakeholders: NI. Assessment method: NI. Guidance: abstract. Target/adaptability: general/NI. Applicable stage: design and analysis. Domain: IT.
- [2] Tool support: no. Assessment/stakeholders: expert assessment/designers. Assessment method: qualitative. Guidance: concrete (application and use of components). Target/adaptability: general/NI. Applicable stage: design. Domain: math.
- [16] Tool support: yes (specified design and evaluation criteria). Assessment/stakeholders: expert assessment/teachers and instructional designers. Assessment method: quantitative. Guidance: concrete (criteria and a case study to apply it). Target/adaptability: general/NI. Applicable stage: analysis, design, development and evaluation. Domain: Chinese, math, foreign languages.
- [13] Tool support: no. Assessment/stakeholders: NI. Assessment method: NI. Guidance: abstract. Target/adaptability: general/NI. Applicable stage: design. Domain: NI.
- [10] Tool support: no. Assessment/stakeholders: NI. Assessment method: NI. Guidance: abstract. Target/adaptability: general/NI. Applicable stage: design. Domain: NI.
- [4] Tool support: no. Assessment/stakeholders: NI. Assessment method: NI. Guidance: abstract. Target/adaptability: general/NI. Applicable stage: design. Domain: NI.
- [9] Tool support: no. Assessment/stakeholders: mixed approach/experts and users. Assessment method: quantitative and qualitative. Guidance: abstract. Target/adaptability: specific/yes (used based on game scope & characteristics). Applicable stage: evaluation and design. Domain: intergenerational and intercultural learning.
- [7] Tool support: no. Assessment/stakeholders: NI. Assessment method: NI. Guidance: abstract. Target/adaptability: specific/NI. Applicable stage: design. Domain: higher education.
- [11] Tool support: yes. Assessment/stakeholders: user-based/players. Assessment method: qualitative. Guidance: concrete. Target/adaptability: general/yes (macro elements for different genres). Applicable stage: design. Domain: geometry/history, geography, culture.
Insights into Design of Educational Games 1055

Table 5. Strengths and weaknesses of existing educational game design models/frameworks

- [6] Strength: uses data-driven analysis of learning experiences through visualizations, educational data mining, and statistical techniques applied to game logs; game-log data are used to model learning and identify places for improvement. Weakness: the steps in the process of designing educational games are not clearly defined.
- [8] Strength: the model links gameplay with experiential learning to facilitate the flow experience. Weakness: it only provides a link between game design and educational theory rather than guiding the whole game design project; several issues such as an engaging storyline, appropriate graphics and sounds, and game balance are not included, and good gameplay alone cannot save a learning game.
- [2] Strength: a useful analytical tool that also helps improve the creativity of educational game designers by guiding the brainstorming of game ideas from both game design and educational angles; it encourages thinking across components rather than an individual approach. Weakness: the framework is descriptive and difficult to apply, and it does not offer any tool or instrument support.
- [16] Strength: offers a common framework for educators and game designers for comprehending the effective integration of curriculum and game; the model also aids in evaluating the effectiveness of games used in educational settings and in selecting valuable games for classroom use. Weakness: the model provides guidance for assessing already developed games for classroom use but does not provide practical guidelines to structure the design process for educational game development; the criteria for design and evaluation need further refinement to suit educational game design in practice.
- [13] Strength: emphasizes the pedagogical aspects in designing educational games. Weakness: the key features presented for designing educational games are based on four frameworks, not all of which are specific to educational games; no guidance is provided on the practical application of the framework.
- [10] Strength: the triadic theoretical framework provides a rich theoretical basis and presents serious game design elements by outlining underpinning theories and associated challenges. Weakness: it does not provide any concrete guidance on the steps to integrate these elements in the design process or on how to operationalize them in serious game design.
- [4] Strength: provides a hierarchy with identity as the core foundational element; includes the informed learning concept as an important element in the hierarchy; exhibits a game concept to demonstrate the learning game design process. Weakness: the model does not provide design steps or the practical application of these concepts in the design process with reference to their magnitude.
- [9] Strength: the framework emphasizes the threefold nature of educational games, including technical verification and user experience along with the pedagogical dimension, and highlights critical aspects of each. Weakness: the framework does not focus on game-specific dimensions and does not provide practical guidelines for educational game design.
- [7] Strength: the model emphasizes higher education, with game design, pedagogy and learning content modelling as the main factors, and is designed specifically for student self-learning with incorporated self-assessment modules. Weakness: the model does not provide concrete guidance for application; although it focuses on higher education, the compared frameworks used as its development basis are not specific to higher education.
- [11] Strength: presents macro game design concepts that can be adapted to different game genres; to build a GBL design model it defines all factors and also analyzes the relationships among them. Weakness: GBL combines game and education, but the model only discusses the game factors.

Environment is covered by three frameworks [9, 12, 14], focusing on context, mobility
and technical verification. The user attribute is only addressed by the adaptive digital
game-based learning framework and the game-based learning guidelines, which
included learner and children requirements respectively. The majority of the analyzed
frameworks focus on two attributes (learning and game design), highlighting their
importance in the design phase. None of the design frameworks or even the guidelines
covered all six attributes.
RQ2: The analytical lens of validity highlighted that all analyzed frameworks to
some extent cited theory or literature to justify their development. The selection of a
theoretical basis for the development of a framework depends on the specific objectives
and approach of each framework towards game-based learning. Knowledge of the
underlying development basis is also important for educational game designers when
selecting the framework appropriate to their objectives. Most frameworks are
theoretically grounded in the literature for a pedagogical base and game design
principles. The pedagogical theories used include Bloom's taxonomy, Piaget's schemes,
Gagne's events of instruction, Vygotsky's zones of proximal development, experiential
learning theory and instructional design principles [4, 15, 16, 31].

Some frameworks (the Adaptive digital game-based learning framework and the
Educational Games (EG) design framework) compared existing models as the
development basis of their framework. Moreover, "I's" combined practical experience
from the field with research from commercial games as its development basis. When it
comes to the empirical validation or application of the design frameworks, only two
frameworks, Level up and the experiential gaming model, had empirical evidence of
their validity, with sample sizes of 61 and 221 respectively. The learning curve, flow
antecedents and the game factors in [11] were the only elements validated by empirical
study. However, these frameworks were validated by the authors who proposed them,
and no other educational game has so far been reported to use these frameworks in its
design. All the other mentioned frameworks were not empirically validated, mentioning
validation only as future work. However, four frameworks (the Framework for analysis
and design of educational games, RETAIN, EGameDesign and the design principles
for serious games) illustrated the application of the framework to an educational game
as a case study without actual implementation.
RQ3: The comparison on the basis of framework attributes highlighted some open
problems. Surprisingly, no tool support is provided by the existing educational game
design frameworks except the Game Factors and Game-Based Learning Design Model,
which provided an instrument called the "Game factor questionnaire", and the RETAIN
model, which provided design and evaluation criteria in terms of level points (the
higher the points, the better the designed educational game). The studied models also
differ in terms of assessment and the stakeholders involved. The Framework for
analysis and design of educational games and the RETAIN model focused on
expert-based assessment with teachers and designers as stakeholders, the e-VITA
framework for SGs adopted a mixed approach of both expert and user assessment, and
the Game Factors and Game-Based Learning Design Model emphasized user-based
assessment. The authors of the remaining frameworks and models did not provide any
information.
Based on the comparative analysis, six frameworks (the Experiential gaming model,
the Adaptive digital game-based learning framework, A theoretical framework for
serious game design, "I's", the e-VITA framework for SGs and the Educational Games
(EG) design framework) emphasized abstract principles rather than concrete guidance
and are limited to high-level concepts without providing any procedural guidance to
structure the design process of educational games. Three other frameworks provided
some form of concrete guidance to support educational game design. The Framework
for analysis and design of educational games provided guidance on each of its three
components by illustrating their application to a zombie game and also guided how to
think across components during brainstorming. RETAIN provided criteria with level
points to assess an already developed educational game and a case study to illustrate
them; however, it did not provide guidance for designing a new educational game. The
Game Factors and Game-Based Learning Design Model suggested macro elements and
represented a thinking process with a model to help educational game designers
incorporate it in their game, along with an instrument (the game factor questionnaire)
for assessment.
The comparative analysis also illustrated that most of the models are general for
any educational game design and audience. However, there were three specific models:
two focused on a specific domain (computer science games in Level up,
intergenerational learning in the e-VITA framework) and one focused on a specific
audience (higher education students in the Educational Games (EG) design framework).
The framework attribute of "adaptability in use" is addressed by only two models: the
e-VITA framework, which emphasized that the framework should be employed
depending on the characteristics and scope of the game, and the Game Factors and
Game-Based Learning Design Model, which not only emphasized adaptation but also
provided the opportunity for it by offering macro elements that can be adapted to
different genres. According to the comparative analysis, most of the analyzed
frameworks focused only on the design stage, but three models (Level up, the
Experiential gaming model and e-VITA) can be used for evaluation or analysis as well
as design. Moreover, the RETAIN model claims to be applicable to all stages (analysis,
design, development and evaluation) of the educational game development life cycle;
however, no practical usage is available. The educational game design models are
applied in various educational domains; computer science, math, geography, culture,
language and history are particularly mentioned among the compared models.

6 Conclusion and Future Work

This paper particularly focuses on the design of educational games and reports on a
comparative analysis of design models/frameworks for game-based learning. The study
analyzes the use of GBL dimensions and validation in existing frameworks to identify
the essential elements for the design stage. Secondly, it highlights the differences and
similarities between GBL design frameworks/models by exploring framework
attributes, both to guide educational game designers/researchers in making more
informed decisions and to underline the open research issues in this area. The results of
the comparative analysis conclude that learning/pedagogy (learning objectives,
instructional design, learning content and knowledge enhancement/outcome) and game
factors (mechanics, dynamics, narrative, aesthetics, goals) are the most essential
attributes for the design of educational games. The attributes of affective reactions
(flow, enjoyment, immersion) come after learning and game factors, whereas usability
(user interface), user (learner requirements) and environment (including technical and
context-related aspects) are less emphasized by the analyzed educational game design
models. Therefore, the design phase of an educational game should place more
emphasis on linking learning objectives with game objectives in an efficient way that
facilitates affective reactions such as flow, in order to engage and immerse the player
[8, 10]. The importance of these three attributes in the design of educational games is
also evident from the development basis of these models, most of which are
theoretically grounded in learning and game design theories with a focus on the ARCS
model and flow theory. However, there is a scarcity of evidence for the empirical
validity and practical application of educational game design models in educational
game development. The few empirical studies and developed educational games that
exist for framework validation were conducted by the same researchers who developed
the frameworks, and only a few elements, such as the learning curve, flow antecedents
and some game design factors, have been empirically validated. A bigger community
of educational game designers and researchers willing to apply these models for
designing educational games is needed, to bring useful insights from industry and to go
beyond the researchers who developed these frameworks.

Therefore, the analysis brings forward two important issues, in line with the results
of [18]: a lack of independent evaluation, and the absence of practical application of
these design models in the educational game industry for designing effective
educational games. This lack of usage and assessment can also be seen as a result of
the absence of tool support and the lack of adaptability and concrete guidance for the
practical application of framework concepts in the design process of educational game
development. Another aspect could be that most industry work is not published in the
research community, and a collaboration between industry and research is important
for thorough insights. Also, most of the frameworks do not provide any information on
the assessment approach, method or stakeholder(s) required to participate in
assessment.
To overcome these issues, future research should focus on providing concrete
guidelines and steps for using a framework's principles for educational game design in
practice. For example, if a framework focuses on linking gameplay and learning,
researchers should provide practical insights into how a certain learning objective, such
as problem solving, can be seamlessly embedded in game mechanics; if the focus is on
challenges, they should show how to increase learning complexity along with
increasing game challenges and how to map learning content to game tasks and
narrative. Future research should also guide game designers on the assessment of the
design principles (that the models provide) embedded in their educational game as part
of the design phase. Finally, there is an extreme lack of tool support for the available
educational game design models, which needs to be addressed to enable
framework-based educational game design by supporting practical application. Future
work will focus on development and evaluation models for educational games, to
investigate and compare the shift in dimensional focus between the different stages of
the educational game development life cycle.

References
1. Ahmad, M., Rahim, L.A., Arshad, N.I.: Towards an Effective Modelling and Development
of Educational Games with Subject-Matter: A Multi-Domain Framework. IEEE (2015)
2. Aleven, V., Myers, E., Easterday, M., Ogan, A.: Toward a Framework for the Analysis and
Design of Educational Games. IEEE (2010)
3. Alfadhli, S., Alsumait, A.: Game-Based Learning Guidelines: Designing for Learning and
Fun. IEEE (2015)
4. Annetta, L.A.: The, “I’s” have it: a framework for serious educational game design. Rev.
General Psychol. 14(2), 105 (2010)
5. Chorianopoulos, K., Giannakos, M.N.: Design principles for serious video games in
mathematics education: from theory to practice. Int. J. Serious Game 1(3), 51–59 (2014)
6. Eagle, M.: Level Up: A Frame Work for the Design and Evaluation of Educational Games.
ACM (2009)
7. Ibrahim, R., Jaafar, A.: Educational Games (EG) Design Framework: Combination of Game
Design, Pedagogy and Content Modeling. IEEE (2009)

8. Kiili, K.: Digital game-based learning: towards an experiential gaming model. Internet
High. Educ. 8(1), 13–24 (2005)
9. Pappa, D., Pannese, L.: Effective design and evaluation of serious games: the case of the
e-VITA project. In: Knowledge Management, Information Systems, e-Learning, and
Sustainability Research, pp. 225–237 (2010)
10. Rooney, P.: A theoretical framework for serious game design: exploring pedagogy, play and
fidelity and their implications for the design process (2012)
11. Shi, Y.-R., Shih, J.-L.: Game factors and game-based learning design model. Int. J. Comput.
Games Technol. 2015, 11 (2015)
12. Shiratuddin, N., Zaibon, S.B.: Designing User Experience for Mobile Game-Based
Learning. IEEE (2011)
13. Tan, P.-H., Ling, S.-W., Ting, C.-Y.: Adaptive Digital Game-Based Learning Framework.
ACM (2007)
14. Thomas, S., Schott, G., Kambouri, M.: Designing for learning or designing for fun? Setting
Usability Guidelines for Mobile Educational Games. Learning with Mobile Devices: A Book
of Papers, pp. 173–181 (2004)
15. Yu, S.-C., Fu, F.-L., Su, C.: EGameDesign: guidelines for enjoyment and knowledge
enhancement. In: Hybrid Learning and Education, pp. 35–44 (2009)
16. Zhang, H., Fan, X., Xing, H.: Research on the Design and Evaluation of Educational Games
Based on the RETAIN Model. IEEE (2010)
17. Neil, K.: Game Design Tools: Time to Evaluate (2012)
18. dos Santos, A.D., Fraternali, P.: A Comparison of Methodological Frameworks for Digital
Learning Game Design. Springer, Heidelberg (2015)
19. Ahmad, M., Rahim, L.A., Arshad, N.I.: An analysis of educational game design frameworks
from software engineering perspective. J. Inf. Commun. Technol. 14 (2015)
20. Malliarakis, C., Satratzemi, M., Xinogalos, S.: Designing Educational Games for Computer
Programming: A Holistic Framework. Electr. J. e-Learning 12(3), 281–298 (2014)
21. Tahir, R., Wang, A.I.: State of the art in game based learning: dimensions for evaluating
educational games. In: Academic Conferences International Limited (2017)
22. Song, X., Osterweil, L.J.: Toward objective, systematic design-method comparisons. IEEE
Softw. 9(3), 43–53 (1992)
23. Abrahamsson, P., Oza, N., Siponen, M.T.: Agile Software Development Methods: A
Comparative Review1. Springer, Heidelberg (2010)
24. Chowdhury, A.F., Huda, M.N.: Comparison between Adaptive Software Development and
Feature Driven Development. IEEE (2011)
25. Katayama, E.T., Goldman, A.: From manufacture to software development: a comparative
review. In: Agile Processes in Software Engineering and Extreme Programming, pp. 88–101
(2011)
26. Tripathi, P., Kumar, M., Shrivastava, N.: Theoretical Validation of Quality Metrics of Indian
e-Commerce Domain. IEEE (2009)
27. Babar, M.A., Gorton, I.: Comparison of Scenario-Based Software Architecture Evaluation
Methods. IEEE (2004)
28. Yusof, N., Rias, R.M.: Serious games in psychotherapy: a comparative analysis of game
design models (2006)
29. Abrahamsson, P., Warsta, J., Siponen, M.T., Ronkainen, J.: New Directions on Agile
Methods: A Comparative Analysis. IEEE (2003)
Insights into Design of Educational Games 1061

30. Ahmed, F.F.: Comparative Analysis for Cloud Based e-learning. Procedia Comput. Sci. 65,
368–376 (2015)
31. Kiili, K.: Evaluations of an experiential gaming model. Hum. Technol. Interdisc. J. Hum.
ICT Environ. (2006)
32. Kiili, K.: Content creation challenges and flow experience in educational games: the IT-
Emperor case. Int. Higher Educ. 8(3), 183–198 (2005)
33. Tahir, R., Wang, A.I.: Codifying game-based learning: LEAGUE for evaluation. In:
Proceedings of the 12th European Conference on Games Based Learning (2018)
Immersive and Collaborative Classroom
Experiences in Virtual Reality

Derek Jacoby(B), Rachel Ralph, Nicholas Preston, and Yvonne Coady

University of Victoria, Victoria, Canada
derekja@gmail.com

Abstract. In these early days of educational Virtual Reality (VR) applications,
it is critical to establish best practices for exploring the subtle relationship
between experiences in VR and learning. In contrast to typical user studies, the
evaluation of a VR experience offered by a prototype can be subject to the
intermittent breaking of an illusion; something users tend not to recover from.
Our work proposes a set of metrics related to presence, immersion and flow, and
considers them in the context of two case studies. First, the results of a 60-user
exploratory study reveal the need to not only modify the proposed metrics, but
to innovate in terms of collaborative experiences. Second, key ways to introduce
cost-effective collaboration mechanisms into educational VR experiences are
introduced. Both of these studies are the result of ongoing work with the Royal
BC Museum.

Keywords: Virtual Reality · Education · Disaster preparedness ·
Interactive education · Collaborative education

1 Introduction
A Virtual Reality (VR) experience can provide a profound means of knowledge
transfer. Essentially, VR allows a user to take their own path through a con-
textualized knowledge-base in a realistic (or non-realistic), natural (or virtual),
interactive way. Additionally, virtual surroundings evoke psychological and physical
reactions that potentially have a deeper impact than other forms of traditional
media. Evaluating the illusive impact of an experience is difficult—in particular
if it is an experience offered by a VR prototype that suffers from possible tran-
sient glitches that could break the illusion. As a result, user studies have to try
to tease out the phenomenon of a “broken illusion” due to a possible technical
glitch from other factors.
In the course of several design iterations, we have experimented with differ-
ent styles of interaction and are developing a methodology for evaluating the
effectiveness of VR-based educational software. Part of the challenge in an edu-
cational setting is also cost, and in the final work reported here we offer a collab-
orative classroom environment that uses only two headsets, along with 6 tablet
computers, designed to provide an immersive experience for an entire class. The
two educational experiences reported here are: (1) a historical exploration of the
c Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1062–1078, 2019.
https://doi.org/10.1007/978-3-030-02686-8_79
VR for Classrooms 1063

gold rush designed in collaboration with the Royal BC Museum (RBCM), and
(2) the interactive experience of a tsunami in Port Alberni designed with Ocean
Networks Canada (ONC).

2 Related Work
Twenty-first century learning has been described as learning that encourages
high-level thinking skills and the development of technological literacies. It also
includes: problem solving, critical thinking, self-directed learning, and collabo-
ration by solving real-world scenarios and involves the use of technology; and
moreover, these skills are transferable among subjects, grade levels, and life [1,2].
Twenty-first century learning shifts towards higher-order thinking skills as well as
interpersonal and self-directed skills, or the ability to work in a team or individually
and become a leader while being accountable and adaptable; in other
words, a form of social responsibility.
The use of VR and digital media seems to have created a “new realm of interaction”
[2] (p. 132) in the 21st century. Even though there are a variety of concerns,
from too much screen time to logistical problems such as a lack of devices or non-functioning
issues [3–5], a deeper understanding of why and how to integrate VR into ped-
agogical practices is needed. A growing number of educators are interested in
the “interaction age” in which students and teachers shift their expectations to
adapt to the changing job market [2]. These concerns drive the need to explore
21st century learning in education, especially as it links to VR.
Today, many digital technologies are more closely linked to this experience of
interaction. Technology, and in particular some VR developments, allow, encourage,
and force interdisciplinary applications [1]. VR allows 21st century learners to
dream, explore, and collaborate. Rather than watching television and simply
increasing students’ screen time, a practice the American Academy of Pediatrics
(AAP) warns against, we should aim to create digitally literate learners who use
technology to interact and find information.
Prior research in VR describes how we can use VR to promote skills and knowl-
edge through its immersive and interactive qualities [6]. Some researchers have
begun to explore quasi-experimental ways of measuring successful VR expe-
riences through various knowledge pre- and post-tests, focusing on measuring
content knowledge [7,8]. Other researchers have used surveys or questionnaires
to measure the VR experience in general [9]. Still other researchers have mea-
sured presence, immersion, and flow as a way of understanding immersion and
interaction, which can lead to learning [10,11]. There are several survey ques-
tionnaires that have been developed and validated that would be appropriate
for measuring learning, such as the Presence Questionnaire and the Immersion
Tendency Questionnaire [11,12].
Presence is described as a “psychological state of being there mediated by
an environment that engages our senses, captures our attention, and fosters our
active involvement” [11] (p. 298). Immersion is also a psychological state and
can be characterized as “perceiving oneself to be enveloped by, included in, and
1064 D. Jacoby et al.

interacting with an environment that provides a continuous stream of stimuli
and experiences” [11] (p. 299).
Another questionnaire that has potential to capture the subtle consequences
of VR focuses on flow. Flow is a state where “people feel involved in meaningful
actions, maintain a sense of control and stay focused on a goal” [10] (p. 506).
The flow experience “seems to occur only when a person is actively engaged in
some form of clearly specified interaction with the environment” [13] (p. 43).
Similar to presence and immersion, flow focuses on active engagement within
an environment. For example, Bressler and Bodzin [10] used a short flow state
scale to measure flow in a post-survey with students. These survey questionnaires
could be the grounding for measuring some 21st century experiences and could
be combined with knowledge tests to measure VR experiences.

3 Case Study 1: The Gold Rush in British Columbia


In this case study, we evaluate a prototype educational VR experience designed
for a museum installation involving the “gold rush” (Figs. 1 and 2). Specifically,
we address the following case study questions:

– What metrics should be considered in the user survey?
– How does each metric weigh into the experience, and how are they related?
– How can we quickly explore and navigate between quantitative and qualitative
results?

This case study focuses on our evaluation of a prototype built for an exhibit
for the Royal British Columbia Museum (RBCM) for children and young adults
to experience British Columbia in the mid 1800s. The experience includes infor-
mation about the era and the location including a “fly-over” experience (Fig. 2),
along with an opportunity to “pan for gold”, meaning the user must, in VR,
scoop sand and water into a pan and gently agitate it to allow the gold to sink
to the bottom of the pan. Metrics included in the user survey were composed
from 14 previously validated metrics for presence, immersion, and flow. Relevance
was determined using principal component analysis, which revealed that
our survey results appeared to be sound. Correlation coefficients between fac-
tors were derived from quantitative results, and organized in a hierarchy. Metrics
from the hierarchy were explored using a brush technique in a parallel coordi-
nate graph, supporting exploration and navigation from the quantitative to the
qualitative results (Fig. 3).
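The analysis pipeline described above (pairwise Pearson correlations between the 14 metrics, a principal component check of the survey's soundness, and a ranking of the strongest relationships to seed the hierarchy) can be sketched roughly as follows. The data here is synthetic and the variable names are illustrative; this is not the study's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the survey data: 60 participants x 14 Likert items (1-5)
responses = rng.integers(1, 6, size=(60, 14)).astype(float)

# Correlation matrix of the 14 metrics
z = (responses - responses.mean(axis=0)) / responses.std(axis=0)
corr = (z.T @ z) / len(z)

# Principal component analysis via eigendecomposition of the correlation matrix;
# the spread of explained variance gives a rough soundness check of the survey
eigvals = np.linalg.eigvalsh(corr)[::-1]
explained = eigvals / eigvals.sum()

# Rank metric pairs by |r| to pick the strongest relationships for the hierarchy
pairs = sorted(
    ((i, j, corr[i, j]) for i in range(14) for j in range(i + 1, 14)),
    key=lambda t: abs(t[2]), reverse=True,
)
top_pairs = pairs[:5]
```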
The remainder of this case study is organized as follows. We first identify
validated metrics from related work to create our own survey, and evaluate the
soundness of the results. The case study concludes with a list of possible improve-
ments and a discussion on generalizability.
What factors should be considered in the survey?
This project focused on an adapted presence, immersion, and flow questionnaire
consisting of 14 questions chosen from previously validated studies, based on
the design of this project and the VR prototype (Fig. 4). Also, four

Fig. 1. Participant wearing the HTC Vive and headphones in first flyover.

open ended questions were asked. Three were yes/maybe/no questions (Q1, Q2,
Q3), each with room to add comments in free text (Q1q, Q2q, Q3q).
Q1: Did this increase your interest in the gold rush? Q1q: Why or why not?
Q2: Do you want to try this experience again? Q2q: Why or why not?
Q3: Would you come to a museum to try other experiences like this? Q3q: Why
or why not?

Participants also had a final opportunity to comment on anything in free form.
Qualitative data from the questionnaire was analyzed through open-thematic coding
and compared with the quantitative results.

3.1 Methodology
The participants included a sampling of graduate students (n = 60) chosen for
their interest in VR. Each participant was asked to sign a consent form and
video and image release form before doing the VR experience. In the prototype
experience, users put on an HTC Vive headset and headphones. The first part
of the experience had users stand on a platform as it travelled over a canyon
(a “fly-over” experience), lasting about one minute. During this time, the users
listened to the sounds of the canyon as well as a narrative describing the gold
rush experience.
Once they had travelled from one side of the bridge to the other, the user was
teleported down to the water’s edge. Here they were guided by the game handler

Fig. 2. Overview of gold panning canyon seen by participant in the flyover.

Fig. 3. Manual categorization of qualitative results from comments.

to find the pan on the ground. The participants then used the trigger button on
the handheld controllers to activate different gestures to remove different debris
from the pan, which could take one to three minutes depending on the users’
abilities. After four different gestures, the participants were successful in finding
gold. Once the participant had been in the experience for a maximum of 5 min,
they were asked to fill out an online questionnaire (Table 1).

3.2 Findings
Open Thematic Coding: Table 2 shows how the results Q1–Q3 aligned with
the open-thematic coding of the qualitative results (Q1q, Q2q and Q3q). One
interesting pattern that emerged was that neutral (“maybe”) results in Q1–Q3
tend to be coded as negative in the qualitative comments.
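In practice this alignment can be checked with a simple cross-tabulation of each participant's yes/maybe/no answer against the polarity assigned to their free-text comment during coding. The paired data below is invented for illustration; it is not the study's dataset.

```python
from collections import Counter

# Hypothetical paired codings: (Likert answer, coded comment polarity)
paired = [
    ("yes", "positive"), ("maybe", "negative"), ("maybe", "negative"),
    ("no", "negative"), ("yes", "positive"), ("maybe", "positive"),
]

# Cross-tabulate to surface how often a neutral ("maybe") answer
# hides a negative free-text comment
crosstab = Counter(paired)
maybe_negative = crosstab[("maybe", "negative")]
```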
These findings would be consistent with the phenomenon of the illusion being
broken for the user, and not being able to recover. In some cases this was a tech-
nical problem, while in others it was more closely aligned with not meeting user

Table 1. Questions in our survey, assessing immersion, presence, and flow

Number Question
Question 1 How completely were you able to actively survey or search the
environment using vision?
Question 2 How involved were you in the virtual environment experience?
Question 3 How quickly did you adjust to the virtual environment
experience?
Question 4 How much did the visual display quality interfere or distract you
from performing assigned tasks or required activities?
Question 5 How much did the control devices (handheld) interfere with the
performance of assigned tasks or with other activities?
Question 6 How well could you concentrate on the assigned tasks or required
activities rather than the actual VR mechanisms
(headset/handles) used to perform those tasks or activities?
Question 7 How much did the auditory (sound) aspects of the environment
involve you?
Question 8 How well could you move or manipulate objects in the virtual
environment?
Question 9 I was challenged and I felt I could meet the challenge.
Question 10 How much did you lose track of normal time outside of the
virtual experience?
Question 11 Did you enjoy what you were doing?
Question 12 Were you ‘in the zone’?
Question 13 How mentally alert do you feel at the present time?
Question 14 How good are you at blocking out external distractions when you
are involved in something?

Table 2. Qualitative results comparing positive, neutral, and negative statements

Result Q1 Q1q Q2 Q2q Q3 Q3q
Positive 27 29 37 37 47 48
Neutral 19 3 16 3 10 2
Negative 14 28 7 20 3 10
Q1: Did this increase your interest in the
gold rush? Q1q: Why or why not?
Q2: Do you want to try this experience
again? Q2q: Why or why not?
Q3: Would you come to a museum to try
other experiences like this? Q3q: Why or
why not?

expectations. For example, some participants described that using the handheld
controllers and understanding how to pan for gold was complicated. Some users
also described that they felt they were “doing it wrong” and could not figure it
out without assistance from the game handler.
Also, some users felt that the “real” experience was impeded by some tech-
nical issues. In particular, one user said that sometimes the hands would go
through the pan instead of staying on the edge. Another user also said that they
tried to tilt the pan and there was no reaction in the game, which made the
“reality” of the activity compromised. The majority of users who felt confused
or challenged by the experience said that the experience was too short. Many
users identified that they wished the experience was much longer. Though the
prototype had problems, the level of interest and engagement of the participants
was quite high. Overall, over 78% of users wanted to go to a museum to try an
experience like this. The majority of the participants described the general expe-
rience very positively saying “it makes learning more interesting” and “super fun
and interactive”. Other comments of note included, “I think VR helps us get a
better experience of touring around the museum”, “I could see this being a very
effective at teaching”, and “it gives you an opportunity to get more involved”.

How Does Each Metric Weigh into the Experience, and How Are
They Related? The relationship between immersive tendencies of individuals
and presence experienced in VR was investigated using Pearson product-moment
correlation coefficient (Fig. 4). Preliminary analyses were performed to ensure
no violation of the assumptions of normality, linearity and homoscedasticity.
There was a small, positive correlation between the two variables, r = .29, n = 60,
p < .05, with high levels of presence associated with high levels of immersion.
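As a sketch of this computation (the correlation coefficient itself, plus the t statistic conventionally used to obtain the p-value for n − 2 degrees of freedom), with synthetic scores standing in for the survey data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic scores for n = 60 participants (illustrative only)
rng = np.random.default_rng(1)
immersion = rng.normal(size=60)
presence = 0.3 * immersion + rng.normal(size=60)  # weakly related, as in r = .29

r = pearson_r(immersion, presence)
# Significance is conventionally assessed via the t statistic with n - 2 df
t = r * np.sqrt((60 - 2) / (1 - r ** 2))
```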

Fig. 4. Pearson product-moment correlations between measures of presence and immersion.

The relationship between immersive tendencies of individuals and flow
experienced in VR was investigated using Pearson product-moment correlation
coefficient (Fig. 5). There was a medium, positive correlation between the two
variables, r = .31, n = 60, p < .05, with high levels of flow associated with high
levels of immersion.

Fig. 5. Pearson product-moment correlations between measures of flow and immersion.

Based on the correlation coefficient analysis, we organized the top values
as a hierarchy (see Fig. 6). This showed the importance of factors
such as feeling “enjoyment”, “in the zone”, “challenged”, “involved”, along with
the ability to “survey”, “adjust” and “concentrate”. We explored these factors
using brushing (or selecting) in a Parallel Coordinate Graph (PCG). The dataset
allows us to visualize and explore individual results (ids 1–60) in the hierarchy
(see Fig. 6). We additionally splayed the values slightly, so that the individual
results can be drilled into and explored.

Fig. 6. Correlations organized as a hierarchy of factors.



How Can We Quickly Explore and Navigate Between Quantitative
Metrics and Qualitative Results? In this graph, values of 4 and above for
each of the “enjoy” and “in the zone” factors have been coloured green, values
of 3 are blue, and below 3 are red.
The PCG (see Fig. 7) allows for exploration of further relationships within
this data set, for example brushing or selecting values of (a) “ability to
adjust” >= 4, with (b) “ability to survey” >= 4.
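The brushing interaction reduces, in essence, to a threshold filter over selected axes combined with the colour rule described above. A minimal sketch with invented rows (the tuple layout and thresholds are illustrative assumptions):

```python
# Each row: (participant id, adjust, survey, enjoy, zone) on a 1-5 scale
rows = [
    (1, 5, 4, 5, 4),
    (2, 3, 2, 4, 3),
    (3, 4, 5, 2, 2),
]

def brush(data, adjust_min=4, survey_min=4):
    """Select rows whose 'adjust' and 'survey' scores meet the thresholds."""
    return [r for r in data if r[1] >= adjust_min and r[2] >= survey_min]

def colour(score):
    """Colour rule used in the parallel coordinate graph."""
    if score >= 4:
        return "green"
    if score == 3:
        return "blue"
    return "red"

selected = brush(rows)  # the highlighted participants; ids 1 and 3 here
```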

Fig. 7. Parallel coordinate graph of factors from the hierarchy.

Furthermore, brushing allowed for broader exploration of relationships
between all factors in the quantitative results, while providing a mapping to
the qualitative as well. This technique was useful to navigate to the correspond-
ing qualitative results, in order to better understand the subtle issue of a broken
illusion versus unmet expectations.
Overall, our results of this first case study are challenging to categorize largely
due to the broken illusion problem. Clearly, participants identified a number of
areas for prototype improvement, while still indicating high levels of presence,
flow, and immersion. Despite the shortcomings of the prototype being evaluated,
they identified the potential for VR experiences to increase engagement, interest,
and learning in informal learning museum spaces. However, in order to increase
engagement and resilience to the broken illusion, we were clearly going to have to
innovate with collaboration in the next prototype. The following section discusses
the generalizability of these early results, before introducing the next Case Study.

3.3 Case Study 1: Generalizability


Detailed qualitative feedback has allowed us to identify several technical issues
with the handheld controllers, the height of the lighthouse boxes of the HTC
Vive for gold panning, and the need to expand recognized gestures. All of these
glitches broke the illusion for the users that experienced them, though some of
them were not related to any one factor in our questionnaire.
Our users were a homogeneous group with high interest in VR and its potential
in a museum. One of the most frequent comments in the feedback was that the
experience was too short and that the participants want to spend longer in
this “gold rush” era. This could have impact on several of the factors in the
questionnaire.
This process was valuable for us in terms of allowing us to evolve not only
our prototype but also our questionnaires. Additionally, we needed to consider

how to make the VR experience less isolating and more collaborative. We did
this by further customizing validated metrics, focusing on key factors such as
collaboration. Each of these elements were informed by exploring the general
relationship between quantitative and qualitative results, which allowed us to
examine subtle relationships and tradeoffs in our first Case Study.

4 Case Study 2: Tsunami Preparedness


The Royal BC Museum installation work is designed for single user interactions.
The entire experience must be relatively short to allow other museum guests to
access the experience, but in order to be a self-contained experience it must be
at least 4 or 5 min long. Our next case study targets a much different environ-
ment; a classroom. In partnership with Ocean Networks Canada (ONC), we are
working on tsunami early warning and emergency preparedness (see Fig. 8). The
experience we are creating here is aimed at engaging a full class of students for
a longer period of time. Due to cost constraints, only two headsets are available
for the class, so the experience is built for headsets and tablet computers, and
even Android cell phones.
Because this classroom experience has a broader set of educational goals, our
approach has been to develop a set of mini-games that elucidate various aspects
of those goals. No individual mini-game lasts more than 2 to 3 min, and they
are designed to cause the students to pass devices around between mini-games
so that the greatest number of students are exposed to the VR headsets. The
mini-games are generally collaborative, but there are some competitive aspects
to the design as well so as to keep attention and engagement focused on the
designed goals.

Fig. 8. The VR Tsunami classroom game.



4.1 VR Tsunami Game Design

Our approach to a collaborative classroom learning environment involves physical
props and devices, tablet computers, and two VR headsets. In order to keep
a consistent experience among the devices we have introduced a guide character,
Allison, that is consistent among all of the interactions (Figs. 9 and 10). This is
both to provide consistency, and also because part of the user interaction is via
speech recognition and the use of an obviously non-human guide character helps
to set expectations of less-than-human levels of speech understanding during the
interactions.

Fig. 9. Physical model of our guide, Allison.

Fig. 10. The guide, Allison, also appears in the virtual environments.

The mini-games on the devices are structured such that the students must
pass devices around frequently. This is to make sure that all students have an
opportunity to use all the devices, but also because these opportunities to inter-
act with each other during the device changes are an important part of collabo-
ration. All of the student mini-games contribute to a team score, so the class is
motivated to all work together. Although most of our testing so far has been in
groups of 6–10 students (using only two tablets, Fig. 11) the system is designed
to accommodate 30 students on 6 tablets, 2 headsets, and one physical model of
Allison.
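One plausible design for rolling per-device mini-game results up into the single shared class score is sketched below. This is our assumption of how such aggregation could work, not the project's actual code; the game names are invented.

```python
class TeamScore:
    """Aggregates mini-game results from all devices into one class score."""

    def __init__(self):
        self.by_game = {}

    def report(self, game, points):
        # Each device reports its mini-game result; repeats accumulate,
        # so every student's turn contributes to the shared total
        self.by_game[game] = self.by_game.get(game, 0) + points

    def total(self):
        return sum(self.by_game.values())

score = TeamScore()
score.report("emergency_kit", 30)  # tablet mini-game
score.report("bpr_repair", 45)     # VR headset mini-game
score.report("emergency_kit", 10)  # another student's turn, same game
```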

Fig. 11. A class of students using multiple devices.

The game is divided into three phases, each of which incorporates multi-
ple mini-games contributing to the final score for the class. In the first phase,
the preparation phase, the students learn about tsunami early warning and the
importance of planning. Some of the games include preparing emergency kits
(Fig. 9) and flying underwater vehicles to install and repair bottom pressure
recorders to give notice of an incoming tsunami (Fig. 12).
The second phase, the earthquake phase, is triggered by an earthquake. We
are using the town of Port Alberni as the model town for this experience because,
due to its geography, it is particularly susceptible to tsunami activity caused by
near-field earthquakes. There is a fault line off of the west coast of Vancouver
Island, and Port Alberni is at the head of a long, narrow inlet which serves to
amplify tsunami waves that would be generated from an earthquake. This initial
phase of the disaster is all about the approximately one hour that the town
would have to prepare between the earthquake and the subsequent tsunami.
Time is sped up in the game, and during this phase students must perform
mini-games such as controlling firetrucks to fight fires caused by broken gas

Fig. 12. A mini-game to repair a bottom pressure recorder (BPR) to detect tsunamis.

Fig. 13. A mini-game to direct firetrucks dispatched to resolve fires caused by the
earthquake.

lines in the earthquake (Fig. 13), clearing evacuation routes, making emergency
announcements, and driving boats in the harbour to control oil spills and rescue
individuals that are at greatest risk from the tsunami.
The final phase, the tsunami phase, occurs when the tsunami reaches Port
Alberni. During this phase (and the earthquake phase), the VR-based mini-
games are mostly centered around a map of Port Alberni that has been create
from digital elevation models from Ocean Networks Canada. These elevation

models allow reliable predictions to be developed as to the speed of the incoming
tsunami and what areas of the town are most at risk. The mini-games during
this phase consist of rescuing individuals and providing guidance towards escape
routes. The views of the town are both from a map-view perspective (including
getting closer using the magnifying glass, Fig. 14) and a drone-based overhead
view of the emerging disaster.
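The three-phase progression (preparation, then earthquake, then tsunami) can be modelled as a small state machine driven by triggering events. The event names below are our own illustrative choices, not identifiers from the project's code.

```python
from enum import Enum

class Phase(Enum):
    PREPARATION = 1  # early-warning and planning mini-games
    EARTHQUAKE = 2   # the (accelerated) hour between quake and wave
    TSUNAMI = 3      # rescue and evacuation mini-games

def next_phase(phase, event):
    """Advance the game when a triggering event occurs."""
    if phase is Phase.PREPARATION and event == "earthquake":
        return Phase.EARTHQUAKE
    if phase is Phase.EARTHQUAKE and event == "tsunami_arrival":
        return Phase.TSUNAMI
    return phase  # unrelated events leave the phase unchanged
```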

Fig. 14. Incoming tsunami!

After all of the tsunami-phase mini-games are complete, the class reviews
their score and has the opportunity to compare against the performance of other
teams (Fig. 15). It is at this point that the teacher will also conclude with some of
the lessons experienced interactively during the game - after seeing the disaster as
personally relevant and witnessing it “first-hand” it is hypothesized that students
will be much more motivated to retain information conveyed during the lesson.
Although our studies confirming this are not yet complete, the next section will
discuss some of the surveys and analyses that we will be using to make this
case.

4.2 Evolution from Case Study 1 to Case Study 2, and Beyond


As highlighted above, the new prototype leverages collaboration to enhance all
three qualities of presence, immersion and flow. Additionally, the questionnaires
have been modified and the audience broadened. The new survey focuses on
direct measures from the students as to how enjoyable and useful they found
the VR immersion, how the sound and graphics aided their understanding of
tsunamis, and how the collaborative system allowed them to work together.
As important as the direct measures are the teacher surveys. In the actual

Fig. 15. Students reviewing their class score.

deployment of this system there will be an Ocean Networks Canada educational
outreach worker in each classroom to set up the system and engage the students
with some activities to support the collaborative game. The regular classroom
teachers will have an opportunity to comment on the effectiveness of the collab-
orative game developed and described here, and the ONC education outreach
worker will be a constant across classrooms to help provide feedback.
Not only does broadening the audience assist with our own evaluation, but
also the development of future prototypes. One of the precarious elements of this
project is that most of the development is being done by computer science
and software engineering undergraduate students on co-op work placements and
internships. This means that they are both generally inexperienced and looking
for high impact projects with real users to learn on. Thus having a project with
direct feedback from the students, teachers, and our Ocean Networks Canada
client is a good environment for our undergraduate students.
This classroom project is in its very early stages and has been deployed only
in test environments so far, but we plan to do another user evaluation much like
Case Study 1.

5 Conclusions and Future Work

More research is needed to directly explore how to precisely measure presence,
immersion, and flow, but it is encouraging that all three appear to be positively
correlated. Our work represents just the first few steps down the long road to
developing quantitative and qualitative metrics for 21st century skills. Measure-
ment of knowledge retention is another possible way of assessing VR educational
environments, and we will investigate this as our applications are deployed in

real classrooms. Our early results suggest that self-reporting and automated
collection of datasets may be the best way to start to assess new metrics, but
still require careful analysis to uncover subtle relationships between factors that
contribute to the experience in VR.
As we roll out these experiences to the Royal BC Museum and classrooms
around British Columbia, our study results will be augmented by real-world
usage data and qualitative feedback from teachers and students. Our approach
of mixed device use is currently cost-driven, but the theoretical basis for consid-
eration as a better pedagogical approach requires further study and validation
of our hypotheses of greater collaboration and engagement with peers. Short
experiences in VR to make the situation seem personal and engaging, combined
with longer collaborative sessions with classmates on tablets, may in the end
prove more effective than protracted isolated experiences.

Acknowledgments. We thank all the participants who were available for this study,
and the students involved in creating the prototypes.

The Internet of Toys, Connectedness
and Character-Based Play in Early Education

Pirita Ihamäki1 and Katriina Heljakka2

1 Prizztech Ltd., Pori, Finland
Pirita.ihamaki@prizz.fi
2 University of Turku, Turku, Finland

Abstract. The concept of the Internet of Things describes the idea of the
Internet – a global, interconnected network of computers – extended to everyday
objects, products, and other objects in the surrounding environment. In turn, at
the heart of the concept of the Internet of Toys lies the idea of playthings that are
capable of processing information, of communicating with children, with other
connected toys and with their environment, and even of making decisions
autonomously. This study aims to understand the potential of smart and connected
toys in the context of toy-based learning. We conducted a study with 20 preschool-
aged children, aged 5 to 6 years, using group interviews and playtests
with three Internet of Toys playthings. Our main conclusion is that although
these toys as ‘edutainment’ offer opportunities for toy-based learning, one of
the key factors for preschoolers is the creative play patterns that they develop
with these character toys. This imaginative form of play may even overshadow
the toys’ educational potential unless they are used in the context of guided play.

Keywords: Internet of Things · Internet of Toys · Toyification · Toy-based learning

1 Introduction

The novelty of our contribution to Internet of Toys research is that we contextualize
our experiences with these toys to understand their educational values,
which actualize in a play situation in an early toy-based learning environment. Lev
Vygotsky saw play as a key mechanism for cognitive development, as children
learn and develop in the context of play. Play seems to be a natural and universal
learning tool for children and adults. Through play, children can acquire skills without
knowing it, in the most natural way. Play can be a lifelong and enjoyable activity,
and we should think of it as a lifelong learning tool, because play represents
recreational activity that is easy and fun to do. Lindon (2002) pointed out that
“children use play to promote their own learning, they do not have to be persuaded into
playing” [1]. Educators can use play to develop basic skills: to explore, construct,
imitate, discuss, plan, manipulate, problem-solve, dramatize, create and experiment [2].
In our study, the core issue is the Internet of Toys, which plays an
emerging role in supporting and encouraging contemporary play. For children as well
as adults, play offers physical, social and cognitive benefits and integrates all three

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1079–1096, 2019.
https://doi.org/10.1007/978-3-030-02686-8_80
areas of development. The Internet of Toys encourages three subtypes of play:
(1) pretend play, (2) object play, and (3) physical play. We see that at the core of
pretend play is imagining. Children deconstruct reality, implicitly asking “what if?” For
example, a group of children may perform a detailed story using toy figures as if they
were real. Object play refers to the use of toys as well as everyday materials (e.g.
phones, tablets) and objects during play. Object play may involve instances of make-
believe. This type of play entails handling, exploring, and focusing on an object and its
features as opposed to using the object only as a story prop, as in dramatic play.
Moreover, physical play often requires props (e.g. football, basketball). All physical
play involves moderate to vigorous physical activity and fosters interactions where
children talk to one another and build cooperation and collaboration [3].
In our case study, we found that a fourth issue, which integrates children’s play
values with the educational purposes of the Internet of Toys, is guided play. Guided
play refers to learning experiences that combine the child-directed nature of free play
with a focus on learning outcomes and adult mentorship [4]. Playing with toys in this
way, by creating make-believe, dramatic scenarios, fuels children’s imagination and
creativity and invites them to alter the rules of reality. That entails the special features
of the Internet of Toys, for which adults want to see educational benefits. Based on our
research, we will explain why guided play is important for the use of the Internet of
Toys in an educational context. One important facet of our findings is that
edutainment is not the only reason why children want to have playful experiences
with the Internet of Toys. As playful experiences are also related to affordances
beyond education, our goal was to explore children’s own responses to these new types
of toys, which most often represent character toys.
Our research questions are the following:
RQ1 What playful experiences does the Internet of Toys offer for preschool-aged
children?
RQ2 What kind of educational values can the Internet of Toys give preschool-aged
children in a toy-based learning environment?
In our study, playtests with preschool-aged children have shown that the Internet of
Toys has various play affordances as well as educational, social and recreational value. In
this case study, we have seen how preschool-aged children explore and engage with
the Internet of Toys’ playful affordances. Although this play has elements of similarity
with traditional toy play, it is distinguished from it because the Internet of
Toys provides added social and motivational benefits. In other words, with the Internet
of Toys one may play socially (share one toy) and become motivated by the ‘tasks’
given by the toy (through its technologically enabled features). Our study presents the
Playful Experience Framework for the IoToys, which describes fifteen different
experiences the preschool-aged children had during our playtests.
The paper first introduces the Internet of Toys (IoToys) as a sub-phenomenon of the
Internet of Things (IoT). We then move on to discuss earlier examples of IoToys and
related research. The remainder of the article is organized as follows: We explain the
concept of toy-based companionship and go on to describe how current
technologically-enhanced edutainment is undergoing a toyification. What follows is an
introduction to toy-based learning and the role of play in these processes. We then
present our case study on IoToys played in a preschool context. In the next section, we
discuss the results and limitations of our study and propose further ideas on what should be
considered when using IoToys in future preschool education and how the phenomenon
could be studied in subsequent phases of research.

2 The Internet of Things Phenomenon


2.1 The Internet of Things (IoT)
The Internet of Things is bringing fundamental changes to economic, environmental,
healthcare, social and political realms [5]. The domestic Internet of Things (IoT) is the
currently favored term, but it draws on an extensive lineage of technological visions for the
future of the home. According to Gartner (2017), 8.4 billion networked things were in
use by 2017, an increase of 31% over the year before [6]. The European Commission
(2016, 2) estimates that more than 26 billion things will be connected by 2020 [7]. We
need to understand that once connected, anything can become a part of further networks
and be used to circulate information [8]. In the future, we will see that connected things
can be designed to sense their environment and document information about what is
happening there. Information can be given in real time, or it can be documented and
inspected later.
The field of application for Internet of Things solutions is increasingly
extending to virtually all areas of everyday life. The most prominent areas of application
include smart industry, where the development of intelligent production systems
and connected production sites is often discussed under the heading of Industry 4.0. In
the smart home, smart building is receiving a lot of attention. In turn, smart toy
applications focus on the possibilities of the Internet of Toys, quite a new sub-area
of the Internet of Things. At its core is the combination of physical and digital
components to create new products and enable novel business models for toy and game
businesses. Consequently, a range of opportunities is unfolding for toy and game
companies to generate incremental value through the Internet of Toys, also considered
‘smart toys’.

2.2 The Internet of Toys (IoToys)


Satyanarayanan’s (2001) vision of ‘pervasive computing’ considers invisibility in use:
a pervasive computing environment is one saturated with computing and
communication capability, yet so gracefully integrated with users that it becomes a
“technology that disappears” [9]. The Internet of Toys represents such a technology.
IoToys have an Internet connection inside the toy or connect
online through mobile phones, which means that the technology has become pervasive.
Hung et al. (2017) defined a smart toy “as a mobile device consisting of a physical toy
component that connects to one or more toy computing services to facilitate gameplay
in the Cloud through networking and sensory technology to enhance the functionality
of a traditional toy” [10]. The goal of our study is to understand the ‘edutaining’
potential of the Internet of Toys (or IoToys), which in the context of children’s
everyday play often comes to mean play experiences related especially to toys connected
to the Internet. In this study, we use the concept of the IoToys (Wang et al. 2010) in
reference to early education to map the potentialities these smart and connected toys
hold when considering toy-based learning opportunities [11]. Holloway & Green
(2016) have defined smart toys as Internet Toys, which (1) are connected to online
platforms through Wi-Fi and Bluetooth but can also be connected to other toys, (2) are
equipped with sensors, and (3) relate one-on-one to children [12]. According to our
literature review, earlier studies with smart toys introduce how the concept of IoToys
has evolved and how these toys are being played with. For example, Frei et al.
(2000) studied the Curlybot smart toy in an informal user study with 81
children. Curlybot is a two-wheeled vehicle that measures, records and plays back its
exact movement on any flat surface. Curlybot is a self-contained smart toy
requiring no external computer; its microprocessor controls not only forward and
backward movement but also free rotation, and the toy includes a memory chip. The
child records the movements of Curlybot by pressing a button that lights up a red or
green indicator. The study showed that children aged four and above playing with
Curlybot engaged with computational and mathematical concepts in a more creative way
[13]. Piper and Ishii (2002) have researched elementary school students playing with
Pegblocks, an educational toy demonstrating basic physics principles. Children
manipulate wooden blocks connected to each other via electrical cables to observe kinetic
energy changes. Based on the smart toy categorization, Pegblocks is a self-contained
smart toy that initiates cognitive tasks such as understanding kinetic energy changes.
Pegblocks is a set of five wooden blocks; each block consists of nine pegs combined
with electric motors, converting the kinetic energy of the child’s hand into electrical
energy [14]. Vaucelle and Jehan’s study (2002) explores Dolltalk, a computational toy
that records children’s gestures and speech and plays back their voices. Dolltalk is a
self-contained smart toy that initiates cognitive tasks, specifically, linguistic expres-
sions and storytelling. Dolltalk includes a platform with tag sensors, two speakers, one
microphone and two stuffed animals with sensors. When the child removes the two
stuffed animals from the platform, recording begins. Vaucelle and Jehan conducted a
user study with 12 children at an elementary school and concluded that children
generally enjoyed their interaction with Dolltalk by frequently repeating the playback
[15]. Fontijn and Mendl’s (2005) StoryToy is an environment featuring stuffed farm
animals that tell stories and react to each other. StoryToy is a self-contained smart toy,
which initiates the cognitive task of storytelling. The StoryToy plush character has a
motion sensor connected to a wireless transmitter that advances play. StoryToy offers
three modes – free play, reactive play and story play – based on the location
of the duck character. All sensor events are uploaded to a computer via a receiver and
translated into audio responses by Java software; responses are then played through a wireless
speaker. Researchers from the Philips Research Company and Eindhoven
University of Technology conducted their study with children between 2 and 6 years of
age. The results show that older children (4–6 years) considered the more
complex dialogues enjoyable, but the dialogues were hard for younger children
(2–3 years) to follow [16]. Merrill et al.’s (2007) research included Sifteo, a self-contained smart
toy which allows children to interact with electronic blocks to produce different
knowledge combinations. Children select electronic blocks in accordance with their
desires and create their own patterns; the toy initiates cognitive tasks through thinking,
imagination and knowledge creation. Sifteo has five main components: a color
LCD screen, an accelerometer, infrared transceivers, a rechargeable battery, and an RF radio.
A user’s physical manipulations are sensed and treated as input to the system.
Visual feedback is displayed on the LCD screen [17]. What is lacking from these
explorations is consideration of the ‘characters’ of the toys, or their personalities
as playthings. In our study we have chosen to inspect three IoToys based
on their gender-neutral character design (toys with faces) and their availability
on Amazon US (in August 2017): (1) CogniToy Dino, (2) Wonder Workshop’s
Dash Robot and (3) Fisher-Price’s Smart Toy Bear. These toys fulfill the criteria of the
IoToys. They are “smart” and their connectivity usually occurs through mobile devices
(smartphones and tablets). In some cases, smart toys also contain their own computers
(e.g., the CogniToy Dino and Fisher-Price’s Smart Toy Bear). Cagiltay et al.
(2013) argue that the characteristics of smart toys need to be analyzed in order to
develop effective smart-toy learning environments, that is, environments in which
smart toys are expected to have educational value [18]. For example, Fisher-Price’s Smart
Toy Bear includes nine different smart cards, which depict both educational and
entertaining activities. IoToys allow children to learn through everyday experiences.
In our case study, we add the concept of guided play to discuss the use of IoToys
in early education. In reference to our case study we also present the preschool-aged
children’s Playful Experiences Framework for playing with the IoToys.

3 Toy-Based Companionship

The nature of the concept of “toys” has changed considerably over the last decades.
The toy industry has for a long time attempted to combine traditional toys and physical
games with digital devices in ‘smart’ ways, but only a few successful examples of this hybridity
have been seen on the market. A branch of this development, the IoToys, are still in their
infancy.
The IoToys bring with them the question of their possible advantages of use in
education as they integrate multimedia material in a way impossible for ‘traditional’
toys. Recently, many children have used computer-mediated toys [19]. Early childhood
curricula provide opportunities for children to play and interact with toys and at the same
time build companionship with them [20]. Children learn best when they are active
participants, when they are engaged, when the information is meaningful and when an
activity is socially interactive [21]. IoToys support guided play, which refers to
learning experiences that combine the child-directed nature of free play
with a focus on learning outcomes and adult mentorship. This keeps children engaged
while focusing them on the dimensions of interest for a learning
objective. Companionship with new technologies such as IoToys, when applied
in learning environments, brings benefits for children [22]. For example, it
enhances the educational value of children’s play [23] and enables physical objects to
be seamlessly connected to digital content [20]. Combining physical and digital worlds
such as the physicality of traditional toys and interactive connected smart toys is
potentially beneficial for children. In particular, because of the emergence of IoToys as
a category of playthings, early childhood educators should be aware of these toys in the
context of early learning and their educational potential with a focus on the social
aspect of play.

3.1 Toyification of Technologically-Enhanced Edutainment


Noxon (2006) states that “toyification” describes “how everyday adult stuff is getting
less utilitarian and more toy-like” [24]. We believe that toyification has taken a
strong hold in current product development and marketing. Following this trend, some
companies have been working to make their products more toy-like to appeal to people
who might be feeling overwhelmed otherwise [25]. More specifically, toyification
communicates the idea of an entity (either physical, digital or hybrid) being inten-
tionally reinforced with ‘toyish’ elements or dimensions; an object, a structure, an
application, a character or a technology acquiring a toyish appearance, form or func-
tion. In parallel to the gamification of everyday life, it is in this way possible to trace
simultaneously occurring patterns of toyification taking place in different realms of
culture [26, 27]. Whereas games (both physical and digital) have, for a long time, been
considered a suitable educational medium, and recent developments have demonstrated
a gamification tendency in the realm of education, we believe that, paralleling this
development, it is possible to see an emerging toyification of education, especially
when considering the IoToys as a part of the global phenomenon of the IoT. We see
that the toyification of technologically-enhanced play is becoming more prominent in
education and edutainment, as education is turning more informal through, for example,
gamification. Toyification brings with it possibilities for formal education in connection
with guided play. We describe in our case study how IoToys playthings are, in other
words, shedding hard technology; in the Fisher-Price Smart Toy Bear, for example,
the ‘brains’ are a computer without a screen. We
understand that in this case, toyification of technology has improved smart toys in
combination with the connectivity of the IoToys, which are now providing new ways to
entertain and ‘edutain’ the children playing. Moreover, they also bring playful
experiences into children’s use of cutting-edge technologies, as discussed
further below. Edutainment represents an informal form of education that
has been successfully used by many education systems around the world. By purpose
and content, edutainment consists of informal education, which aims to improve learners’ life
control, and skills education, which, for example, offers experiences such as playing
with connected toys [28]. The term edutainment is defined in several ways. The
American Heritage Dictionary defines edutainment as “the act of learning through a
medium that both educates and entertains”. According to Buckingham and Scanlon
[29], edutainment is a “hybrid genre that relies heavily on visual material, on narrative
or game-like formats, and
on a more informal, less didactic style of address”. Computer edutainment includes game
types: adventure, quiz, role-play, simulations and experimental drama; edutainment on
the Internet includes tele-learning systems, web-based educational systems and interactive
smart connected toys (IoToys). This type of edutainment uses interactivity
via software and hardware and connects with other telecommunications systems [28].

3.2 Toy-Based Learning in a Connected World


Toy-based learning in the digitalizing age presents us with questions concerning
connectivity. Connected toys incorporate Internet technologies that respond to and
interact with children. A toy-based learning environment can provide physical inter-
action between the toy and the playing children. Lampe and Hinske (2007) pointed out
that the ideal learning experience comes from the combination of physical experience,
digital content, and the imagination of the child [30]. In addition, learners can use the toys’
abilities according to educational aims [19]. Some researchers have seen potential in
using toys for education. For example, Demir and Sahin (2014) studied the
scientific toys used to teach physics, chemistry and biology concepts. They evaluated
the toys according to scientific creativity [31]. Kara et al. developed smart toys for
a storytelling activity and examined storytelling skills, creativity, and narrative activities.
Both studies showed positive effects of the toys on the dependent variables [32, 33].
There are few studies on how children play with these smart toys [34, 35]. A previous
study shows how these smart toys facilitate a child’s social skills [23].
Technology-based toys are increasingly popular with today’s children [36]. In
earlier research on smart toys, authors identified some of the unique features that
connect with different developmental stages. Connected toys can contribute to
blurring the boundaries between formal and informal learning [37]. Children’s input
(data) can be analyzed and responded to in increasingly individualized ways. This
individualization, therefore, has the potential to offer significant educational benefits
and is at the center of major changes in existing learning technologies. These tech-
nologies can give children “choice in the pace, place, and mode of their learning” [38].
Today’s game-playing children learn to take in information from many
sources and make decisions quickly, to deduce a game’s rules from playing rather than
by being told, to create strategies for overcoming obstacles, and to understand complex
systems through experimentation. Above all, they increasingly learn to collaborate
with others, as digital games are about playing with networks [39].
Early studies suggest that the use of technologies in early childhood education could
be supported by developing new ideas about children’s digital play that help educators
to recognize children’s activity with technologies in a play-based way [40, 41]. This is
because early childhood education is traditionally play-based, and educators are used to
observing and assessing young children’s play. Toy-based learning, contrary to the
often structured, rule-bound, and competitive (and potentially more acknowledged)
game-based learning, seems to build more on an open-ended, imaginative but still
educational realm, especially fit for young learners such as children of preschool age.
Children play with their IoToys and potentially build an imagined world with them.
In this theoretical frame, a socio-constructivist view is adopted, according to which
learning is not an individual but a fundamentally social and societal activity, which
means that learning always takes place in a social context. Under such a framework of
toy-based learning the use of the educational features of the IoToys contributes to the
realization of: (1) meaningful learning, based on preschool-aged children’s own group
work with educational materials (in our case, drawing a picture of their chosen
IoToys plaything); (2) authentic learning, using learning resources from real life or
simulations of everyday phenomena (in our case study the Fisher-Price Smart Toy Bear,
which for example has smart cards that remind the player to “brush his/her teeth”);
(3) social learning: technology supports the process of joint knowledge development,
connected with toys, IoToys can support collaboration between fellow preschool-aged
children, who can be based at different schools or abroad; (4) active-reflective learning:
preschool-age children’s playing may result in problem-solving using available
resources selectively according to their interest, search and learning strategies;
(5) problem-based learning: a method that challenges preschool-aged children to
“learn by doing”; groups of preschool-aged children seek solutions to real-world
problems. These learning types are presented in the toy-based learning diagram (Fig. 1)
used to engage children’s curiosity and initiate motivation to learn.

4 Study

In our case study, we examine play value, which can be used to describe the overall enjoyment of
a child with a certain toy. It consists of factors such as complexity and challenge,
appropriateness for the context (in this case study, a kindergarten with preschool-aged
children), and correspondence to the character of the child. We have developed a toy-based
learning diagram (Fig. 1) based on The Experiential Learning Cycle developed by
Kolb [42]. We have used our three IoToys playthings: Fisher-Price’s Smart Toy
Bear, CogniToys Dino, and Wonder Workshop’s Dash. The four corners of the
diagram represent the four experiential learning styles that come into play when a child
takes up a toy to interact and play with. The diagram can be used as a toy-based
learning tool by placing an idea for a toy, the children’s interaction style, and the
envisioned uses of the IoToys within the diagram and then thinking of how the toy
could be adapted to facilitate a different experiential learning cycle. The adapted toy
is placed in the diagram and can function as a starting point for the next adaptation,
the initial idea being changed again and again to cater for various play types and the
learning of different subjects [43]. Our
study employs toys that according to their marketers cater to enjoyment and oppor-
tunities for learning. In this way, the toys under scrutiny represent “edutainment”,
although their educational promises are often accentuated over the play value of their
traditional play patterns [35].

4.1 Research Data


The research design of our study consists of children’s playtests, marketers’
presentations of the IoToys’ features and educational potentialities, and a preschool teacher
survey. We have investigated children’s own responses to these technologically
enhanced toys. We have conducted two group interviews and interactive playtests with
20 preschool-aged children.
All of the toys chosen for this study represent different character toys, meaning that
they carry a resemblance to known animals (CogniToys Dino, Fisher-Price’s Smart Toy
Bear) or familiar robot forms (as in the case of Wonder Workshop’s Dash). All of the
IoToys use the English language and respond to light, sound and/or movement. One of the
reasons for this selection of IoToys was our interest in mapping out their
capacity to invite their users to playful experiences. What guided our interest,
in particular, are the toy industry’s promises related to smart toys’ activities [35].

Fig. 1. Toy-based learning diagram.

Both empirical inquiries include questions concerning the toys’ educational
potential. Our methodology includes participatory observation, playtests, and written
and visual types of documentation through photographing and videotaping the test
groups playing, learning, and interacting with our IoToys, including the children
drawing their chosen IoToys after the playtests. The multimethod approach allows us to
carry out both a narrative and visual analysis of data [35].
Group Interviews and Play Tests
We have conducted two group interviews and interactive play tests with 20 preschool-
aged children (5–6 years of age) in a Finnish group and a Finnish/English speaking
bilingual group in a West-coast Finnish town in October 2017. Finnish children are
introduced early to mobile technologies and many even have their own mobile phones
and tablets before starting school (typically at age 7). We were informed that the
children in the Finnish group each have their personal tablet at preschool, which they
are allowed to use under supervision for a limited time per day. In order to understand the
children’s exposure to mobile technologies, we also asked their kindergarten teachers
how many of them have a mobile phone of their own. Of the children that participated
in our study, 10 reported owning a mobile phone. This question was
relevant in developing an understanding of whether or not it is possible for the children
to, for example, use the mobile phone to operate an app, photograph, or video-record
their toy play by themselves [35].

4.2 Educational Promises of the Internet of Toys in Our Study: The
Marketers’ Perspective
The IoToys included in our study [35] are briefly described in the following from the
perspective of their marketers:
CogniToys Dino, A “Personalized Learning Buddy”
Amazon.com describes the CogniToys (by Elemental Path) as an educational toy that
includes stories, games, jokes, and fun facts, encompassing subjects including
vocabulary, math, geography, science, and more to engage “your child in educational
play based on their academic needs”. The age recommendation given for CogniToys by
the manufacturer is five years and older. The CogniToys Dino constantly evolves,
its cloud-connected, Wi-Fi-enabled character allowing the play experience to
improve and update automatically as new content becomes available. The
toy is said to engage kids with a wide variety of content by encouraging learning and
play using interactive dialogue. In practice, the CogniToys Dino grows with the
children by listening to their questions and adapting to their personal preferences and
unique educational skill set. The toy explores favorite colors, animals, and more to
customize engagement as well. The educational promises of the CogniToys are that
learning is a “FUNdamental Part of the CogniToys Experience” and that each Dino
comes with “a variety of custom modules to engage kids in educational play including
problem-solving challenges, geography games, historical fun facts and more”. Once the
Dino is configured using the CogniToys App, it presents age-appropriate content from
the first “Hello!” [44]. CogniToys represents a new wave of toy design, having its
origins in a Kickstarter campaign [45]. According to a marketing text published by
Toys “R” Us, the CogniToys Dino represents the “next generation of internet-
connected smart toys” and “can engage your child in conversations, play games, sto-
rytelling and more” [46].
Wonder Workshop’s Dash
The product description given by the company behind Wonder Workshop’s Dash
claims the toy “is a real robot that makes learning to code fun for kids”. “Responding to
voice, navigating objects, dancing, and singing, Dash is the robot your child always
dreamed of having. Use the free Apple, Android, and Kindle Fire apps to create new
behaviors for Dash—doing more with robotics than ever before possible. Dash presents
your kids with hundreds of projects, challenges, and puzzles as well as endless pos-
sibilities for freeform play. Along with Dash, you can use our five free mobile apps.
The Wonder and Blockly apps are designed for every child to have fun on their own
while learning how to program robots” [47]. The manufacturer recommends this toy for
children ages six and up. The Wonder Workshop Dash robot has multiple apps, one of which is the Blockly app, which is in standard use in elementary schools and recommended for kids by Code.org. With Blockly, "your child or student can take on coding challenges
and make their own programs for Dash. […] you can create your own dance, record your voice, and have Dash play it back, or even program Dash to follow you around.
With the new tutorial section, it is possible to program with no previous experience”.
While the company sells its programmable robots directly to families, it has also seen
Dash and Dot becoming part of schools’ curricula and coding clubs over the years.
The Internet of Toys, Connectedness and Character-Based Play 1089

According to Kolodny, some 8,500 schools are using Dash and Dot around the world
today [48]. According to a marketing text published by Toys “R” Us, Wonder
Workshop’s Dash “is a real robot, responsive to the world, on the go and at the ready.
Kids imagine the sidekick, pet, or pal they've always wanted and bring it to life with
Dash and their own code. […] Dash is a faithful explorer in the world your child
creates. Dash can greet kids as they come home from school, help them deliver a
message to a friend, follow them on journeys, become a true partner in fun” [49].
Fisher-Price Smart Toy Bear
According to a product description given by Fisher-Price, the “Smart Toy is the next
generation of play". The manufacturer recommends the Smart Toy for ages three to eight
years. Smart Toy Bear is an interactive learning friend with all the brains of a computer,
without the screen. “The more your child plays with Smart Toy, the more this
remarkable furry friend adapts to create personalized adventures. The Fisher-Price Smart Toy Bear can start a true friendship with a child that will help your child grow socially and emotionally, too" [50].
“Smart Toy is an interactive learning friend with all the brains of a computer,
without the screen. When children talk, their furry friend listens and adapts to future
conversations. Smart Toy actually recognizes their voice. The toy also recognizes his
Smart Cards (each Smart Toy comes with nine Smart Cards and a cute little backpack
to store them in). Smart Toy knows what your child wants to do: make up a story, play
a game, go on an adventure and more. The Smart Toy encourages social-emotional
development, imagination, and creativity” [50].
According to a marketing text published by Toys "R" Us, the Fisher-Price Smart
Toy Bear’s features encompass the following: “The toy includes Voice Recognition:
Talks and listens and remembers what your child says—the two of them can have
actual conversations! Image Recognition: Visually recognizes the nine Smart Cards
included so your child can choose activities like stories, games, and adventures! Smart
Card expansion packs are available “to expand the play”. The toy “Learns your child’s
favorite things and activities!” Knows when you toss him in the air (with a little help
from his accelerometer). Knows the time of day, weather, and world events. Plays
games with the whole family! Makes up stories where your child can choose what happens next! Takes your child on imaginative adventures. Tells jokes. By downloading a free app at smarttoy.com/app, unlimited Wi-Fi content updates may be
unlocked and help the Smart Toy “learn your child’s name”. Also, “Parents can unlock
bonus activities with the app, such as bedtime, clean-up, break time, and party time!”
The marketing text attached to the toy promises that “No personally identifiable data is
transmitted by Smart Toy” [51].

5 Results and Discussion

Toys have been a constant presence in children’s lives for centuries. For children,
however, toys are tools for encouraging different kinds of play. Our case study shows
that children use play to disentangle ambiguities they find in the world and to test their emerging hypotheses about how things work. For example, when preschoolers are offered IoToys with an ambiguous causal mechanism to play with, the first thing they do, without being told, is figure out how the toys work through exploratory play. However, the IoToys are multimedia toys and first need adult mentorship. Guided play crucially incorporates an element of adult structuring of the play environment (the toy-based learning environment), but the child maintains control within that environment.
In the two group interview sessions, the researchers introduced all three IoToys to
the children one by one, first by showing the toy and then letting each child interact
with it. Finally, we showed the children a short video of the toys’ functions based on
non-commercial material (review videos) found on YouTube. During the child-toy
interaction, the group was asked the following three questions: (1) what the toy could
teach them, (2) how the child would play with the toy alone, and (3) how the child
would play with the toy in the company of other children. The children’s answers from
the group interviews to these questions are collected in Table 1. After the playtest, all
children had to figure out how the IoToys work. Acting on the IoToys to discover how they work thus led to better learning than playing with these toys merely to confirm what had been shown.
In our research, we have found that the IoToys need to be explored through various play forms: pretend play, object play, physical play, and guided play. In this case study, the toys' capacity to invite their players into pretend and creative play, and in this way their potential play value in terms of open-ended, intrinsically motivated play, appears to be in balance with their educational value (instrumentally motivated play), as all of these toys afford all of these forms of play. Pretend play proved to be a relevant form of play with the preschool children, as the toys included in our study all represent characters whose personalities may be developed further in, for example, imaginative play. The investigation of object play revealed that everyday materials like tablets are integrated into play, for example when a tablet acts as a remote control for the Dash robot. On the other hand, mundane materials (like paper and pencils) can be used to make 'rails' for the Dash robot to follow. Physical play occurred, for example, with the CogniToys Dino: when it played music, the children in our study started dancing in the middle of the playtest. The same happened when we introduced the Dash robot; because of its movement and sound, the children started to follow it. The children's interaction with each other resulted in collaborative learning, as learning with others directs how to act and what to do next.
Finally, we also analyzed marketing materials in connection with the IoToys, and
children’s own playful experiences. Following the Playful Experience (PLEX) model
introduced by Lucero et al. [52], we built on the suggested framework to understand the
playful dimensions related to play with the IoToys. Our validation efforts for the Playful Experience of the IoToys Framework included a study of everyday gadget use, such as playing with the IoToys and with mobile phones and tablets, to see what experiences those devices or the IoToys prompted in use (in this case, by preschool-aged children). As a result, 15 categories were included in the Playful Experience of the
IoToys Framework (see Table 2).
Based on the findings presented in the previous section, it is of the greatest importance in the educational context that children are guided in their play with the IoToys. Also, kindergarten teachers need to have some earlier
Table 1. Children's responses to three Internet of Toys' character toys in our case study

What the toy teaches the child (educational play patterns)
* CogniToys Dino: how to make different sounds; how to sing; music
* Wonder Workshop's Dash: how to make different sounds (e.g., farm animals)
* Smart Toy Bear: English language; tells stories; how to play tag

How the child would play with the toy alone (solitary play patterns)
* CogniToys Dino: dance; sing with the toy; play disco with it; use it in play in which you need music; use it as a lamp; take videos with it; nurse it
* Wonder Workshop's Dash: play tag; play hide and seek; play house
* Smart Toy Bear: nurse the toy; play hide and seek

How the child would play with the toy with other children (social play patterns)
* CogniToys Dino: N/A
* Wonder Workshop's Dash: play disco dancing; play football; make arts & crafts
* Smart Toy Bear: play school with the toy; share the toy; play house

experiences with IoToys in the context of education so that they understand the toys' potential and educational value. Kindergarten teachers need to have goals for what they want to teach the children with the help of these toys and for which of the toys' affordances are considered entertaining. Moreover, what is needed is a discussion of how important the subtypes of play are for children. For example, pretend play supports children's own creativity. All play involves an element of pretense when children create their own stories to play out with character-type toys, which they want to identify with and befriend. Object play patterns entail that children use everyday materials such as tablets and phones in their play. In fact, children see these digital devices as toys and do not necessarily make a distinction between digital and physical things. We believe that the technology in toys will become even more invisible in the future. Physical play with the IoToys also makes it possible to get children to move and be active. As we have seen in our case study, dancing is one of the physical play patterns encouraging movement, as is the game of hide and seek, in which the children followed the Dash robot.
Table 2. The Playful Experience of the Internet of Toys Framework, with 15 categories

* Challenge: Children's abilities are tested by the IoToys' demanding tasks (Wonder Workshop Dash)
* Competition: Children can contest their earlier experiences with IoToys (Wonder Workshop Dash)
* Completion: Finishing a major task, like listening to the IoToys' story (Fisher-Price Smart Toy, CogniToys Dino)
* Control: Commanding IoToys with an iPad (Wonder Workshop Dash, Fisher-Price Smart Toy, CogniToys Dino)
* Discovery: Children's imaginative play with IoToys reveals uses the designer may not even have thought of, e.g. using the IoToy as a lamp (Wonder Workshop Dash, CogniToys Dino)
* Exploration: Investigating an object or situation with the IoToys (Wonder Workshop Dash, Fisher-Price Smart Toy Bear, CogniToys Dino)
* Expression: Children play creatively, e.g. by coding (Wonder Workshop Dash)
* Fantasy: An imagined experience, e.g. "the IoToys can teach me to fly" (Fisher-Price Smart Toy, CogniToys Dino)
* Fellowship: IoToys like the Dash robot have their own community for sharing experiences of one's own toy with others (Wonder Workshop Dash)
* Humor: IoToys give children fun and joyous experiences, e.g. by telling children stories and jokes (Fisher-Price Smart Toy, CogniToys Dino)
* Nurture: Children want to take care of their IoToys (Fisher-Price Smart Toy)
* Relaxation: Children comment that the IoToy can read them a bedtime story (Fisher-Price Smart Toy, CogniToys Dino)
* Sensation: Children find the IoToys exciting for stimulating the senses and giving them feedback (Fisher-Price Smart Toy, CogniToys Dino)
* Sympathy: Children can share emotional states with their IoToys (Fisher-Price Smart Toy, CogniToys Dino)
* Thrill: Children's excitement derives from taking risks with the IoToys, e.g. by listening to a ghost story told by the toy or by risk-taking in learning coding with the IoToys (Fisher-Price Smart Toy, CogniToys Dino)
6 Conclusions

Today’s Digital Revolution has, alongside the Internet of Things, introduced the
IoToys. One thing is almost certain: toys will always play a role in facilitating children's play, and so will technologically enhanced toys, as demonstrated by our study. These IoToys, when used in combination with guided play activities, can give children rich learning opportunities. In our case study, we presented the Toy-based Learning diagram, which, with the help of the IoToys, identified some of the unique features connected to an experiential learning style. The learning environment can be a kindergarten, a school, or the home, and learning may happen when children are active, engaged, working with meaningful material, and in a social context (which can also mean toys connected with other toys, or with other children globally). On the other hand, the character-type IoToys employed in our research show that imaginative play patterns, such as treating the toy character as a companion that may be nurtured and played with even without light, sound, or movement, often overshadow the toys' educational potential grounded in the pre-programmed content that guides the child in learning how to carry out, for example, language-based or mathematical activities.
recognizing children’s actual play activities with the IoToys in play-based situations
would provide educators with useful knowledge on the toys’ capacity to invite ‘hybrid’
play patterns beyond digital play.
Finally, we reflect on the limitations of this study. The limitations that must be considered are (a) the scarcity of earlier literature on the IoToys used in education, and (b) the study environment, which in our study was a Finnish preschool (n = 20, 5–6-year-old children), in combination with our use of social group interviews and playtests rather than individual interviews. Our study was conducted with a relatively small sample size (20 children in a Finnish kindergarten) and with limited demographic diversity. Despite these limitations, we believe that this work represents
an important first inquiry on children’s interactions with the chosen IoToys, particularly
in an early-education context as a new tool for observing and assessing young chil-
dren’s toy-based learning. We hope that it will inspire ongoing and future work in
IoToys, especially with an interest in children’s experience in early education.
In upcoming work we will continue our research with Finnish kindergartens by
collecting long-term data. Early childhood education is traditionally play-based:
educators are used to observing and assessing young children's play. Valuable ideas about how to incorporate IoToys into early education curricula could be gained by including questions regarding companionship, besides connectivity, in the inquiries on the toys' capacity to function as edutainment, and by listening to the children themselves in order to understand their play with these technologically enhanced toys even better. Furthermore, in the following stages of research we will explore more thoroughly the experiences and ideas of kindergarten teachers to determine how
technologically-enhanced, character-based play with the IoToys could best be facili-
tated to support learning in early education.
References
1. Lindon, J.: What is Play. National Children’s Bureau, London (2002)
2. Wasserman, S.: Serious Players in the Primary Classroom. Teacher College Press, New York
(1990)
3. Hassinger-Das, B., Zosh, J., Hirsh-Pasek, K., Golinkoff, R.: Toys. In: Peppler, K. (ed.)
The SAGE Encyclopedia of Out-of-School Learning. Sage Publications, Inc., Thousand Oaks
(2017)
4. Weisberg, D., Hirsh-Pasek, K., Golinkoff, R., Kittredge, A., Klahr, D.: Guided play:
principles and practices. Curr. Dir. Psychol. Sci. 25(3), 177–182 (2016)
5. Kshetri, N.: The economics of the Internet of Things in the Global South. Third World Q. 38
(2), 311–339 (2017)
6. Gartner Inc., Gartner says 8.4. billion connected “things” will be in use in 2017, up 31
percent from 2016. http://www.gartner.com/newsroom/id/359891. Accessed 27
Feb 2018
7. European Commission, Advancing the Internet of Things in Europe. http://eur-lex.europa.
eu/legal-content/EN/TXT/?uri=CELEX:52016SC0II0. Accessed 27 Feb 2018
8. Bunz, M., Meikle, G.: The Internet of Things. Polity Press, Cambridge (2018)
9. Satyanarayanan, M.: Pervasive computing: visions and challenges. IEEE Pers. Commun. 8
(4), 10–17 (2001)
10. Hung, P.C.K., Fantinato, M., Rafferty, F., Iqbal, S.-Y., Huang, S.-C.: Towards a privacy rule
model for smart toys. In: The IEEE 50th Hawaii International Conference on System Sciences
(HICSS-50), 4–7 January, Big Island, Hawaii, USA (2017)
11. Wang, X.C., Berson, I., Jaruszewicz, C., Hartle, L., Rosen, D.: Young children’s technology
experiences in multiple contexts: Bronfenbrenner’s ecological theory reconsidered. In:
Berson, I., Berson, M. (eds.) High-tech Toys: Childhood in a Digital World, Denver, USA,
pp. 23–47. Information Age Publishing Inc. (2010)
12. Holloway, D., Green, L.: The Internet of Toys. Commun. Res. Pract. 2(4), 506–519 (2016)
13. Frei, P., Su, V., Mikhak, B., Ishii, H.: Curlybot: designing a new class of computing
systems. In: Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, pp. 129–136. ACM Press, New York (2000)
14. Piper, B., Ishii, H.: PegBlocks: a learning aid for the elementary classroom. In: Terveen, L.,
Wixon, D. (eds.) CHI ’02 Extended Abstract on Human Factors in Computing Systems,
pp. 686–687. ACM Press, Minneapolis (2002)
15. Vaucelle, C., Jehan, T.: Dolltalk: a computational toy to enhance children’s creativity. In:
CHI 2002, Extended Abstracts on Human factors in Computing Systems, pp. 776–777.
ACM Press, New York (2002)
16. Fontijn, W., Mendel, P.: StoryToy: the interactive storytelling toy. In: Gellersen, H.W.,
Want, R., Schmidt, A. (eds.) Third International Conference, Pervasive 2005. Proceedings
Series: Lecture Notes in Computer Science, vol. 3468, pp. 37–42, Munich, Germany (2005)
17. Merrill, D., Kalanithi, J., Maes, P.: Siftables: towards sensor network user interfaces. In:
Proceedings of First International Conference on Tangible and Embedded Interaction, 15–17
February, Louisiana, USA, pp. 75–78 (2007)
18. Cagiltay, K., Kara, N., Aydin, C.C.: Smart toy based learning. In: Spector, J.M., Merrill, M.,
Elen, J., Bishop, M. (eds.) Handbook of Research on Educational Communication and
Technology, pp. 703–711. Springer, New York (2013)
19. Kara, N., Aydin, C.C., Cagiltay, K.: Design and development of a smart storytelling toy.
Interact. Learn. Environ. 22(3), 288–297 (2012)
20. Yelland, N.: Technology as play. Early Childhood Educ. J. 26(4), 217–267 (1999)
21. Hirsh-Pasek, K., Adamson, L.B., Bakeman, R., Owen, M.T., Golinkoff, R.M., Pace, A.,
Suma, K.: The contribution of early communication quality to low-income children’s
language success. Psychol. Sci. 26, 1071–1083 (2015)
22. Plowman, L., Stephen, C.: Children, play and computers in pre-school education. Br.
J. Educ. Technol. 36(2), 145–157 (2005)
23. Hinske, S., Langheinrich, M., Lampe, M.: Towards guidelines for designing augmented toy
environments. In: Proceedings of the 7th ACM Conference on Designing Interactive
Systems, pp. 78–87. ACM, New York (2008)
24. Noxon, C.: Fisher-Price Fonts. Rejuvenile Consumer Goods: Blog Archive. http://www.
rejuvenile.com/blog/c/rejuvenile_consumer_goods/. Accessed 11 Feb 2018
25. Fisher, S.: Forget Gamification. Think Toyification. Simplicity 2.0, IT Trends and Thought
Leadership. https://www.laserfiche.com/simplicity/forget-gamification-think-toyification/.
Accessed 11 Feb 2018
26. Heljakka, K.: Playing with words and toying with vocabulary: seizing new meanings related
to the things for play. In: 7th ITRA World Congress: Toys as Language and Communication,
23–25 July, Braga, Book of Abstracts. Faculty of Philosophy, Catholic University of
Portugal, Braga (2014)
27. Heljakka, K.: Toys as tools for skill-building and creativity in adult life. Int. J. Media,
Technol. Lifelong Learn. 11(2), 134–148 (2015). Seminar.net
28. Rapeepisarn, K., Wong, K.W., Fung, C.C., Depickere, A.: Similarities and differences
between ‘learn through play’ and ‘edutainment’. In: Proceeding of the 3rd Australian
Conference on Interactive Entertainment, 4–6 December, Perth, Australia, pp. 28–32 (2006)
29. Buckingham, D., Scanlon, M.: Selling learning: towards a political economy of edutainment.
Media Cult. Soc. 27(1), 41–58 (2005)
30. Lampe, M., Hinske, S.: Integrating interactive learning experiences into augmented toy
environment. In: Proceedings of the Workshop on Pervasive Learning, Toronto, pp. 1–9
(2007)
31. Demir, S., Sahin, F.: Assessment of prospective science teachers’ metacognition and
creativity perceptions and scientific toys in terms of scientific creativity. Procedia Soc.
Behav. Sci. 152, 686–691 (2014)
32. Kara, N., Aydin, C.C., Cagiltay, K.: User study of a new smart toy for children's storytelling.
Interact. Learn. Environ. 22(5), 551–563 (2012)
33. Kara, N., Aydin, C.C., Cagiltay, K.: Investigating the activities of children toward a smart
storytelling toy. J. Educ. Technol. Soc. 16(1), 28–43 (2013)
34. Johnson, J.E., Christie, J.F.: Play and digital media. Comput. Sch.: Interdiscip. J. Pract.
Theory Appl. Res. 26(4), 284–289 (2009)
35. Ihamäki, P., Heljakka, K.: Smart, skilled and connected in the 21st century: educational
promises of Internet of Toys (IoToys). In: Hawaii University International Conferences, Art,
Humanities, Social, Sciences & Education, 3–6 January 2018, Prince Waikiki, Hotel,
Honolulu, Hawaii (2018)
36. Cagiltay, K., Kara, N., Aydin, C.C.: Smart toy based learning. In: Spector, J., Merril, M.,
Elen, J., Bishop, M. (eds.) Handbook of Research on Educational Communication and
Technology, pp. 703–711. Springer, New York (2014)
37. Montgomery, K.: Children’s media culture in a big data world. J. Child. Media 9(2), 266–
271 (2015), p. 268
38. Gordon, H.R.D.: The history and growth of career and technical education in America, p. 3.
Waveland Press, Las Vegas (2014)
39. Prensky, M.: Digital game-based learning. Comput. Entertain. (CIE) 1(1), 1–4 (2003)
40. Edward, S.: Digital play in the early years: a contextual response to the problem of
integrating digital technologies and play-based learning in the early childhood curriculum.
Eur. Early Child. Educ. Res. J. 20(2), 199–212 (2013)
41. Yelland, N.: Reconceptualising play and learning in the lives of young children. Aust.
J. Early Child. 36(2), 4–12 (2011)
42. Kolb, D.A.: Experiential Learning: Experience as the Source of Learning and Development.
Prentice Hall, New Jersey (1984)
43. Gielen, M.: Essential concepts in toy design education: aimlessness, empathy and play value.
Int. J. Art Technol. 3(1), 4–16 (2010)
44. CogniToys Dino, Powered by IBM Watson, Kids Cognitive Electronic Learning Toys,
Amazon. https://www.kickstarter.com/projects/cognitoys/cognitoys-internet-connected-
smart-toys-that-learn. Accessed 8 Aug 2017
45. CogniToys: Internet-connected Smart Toys that Learn and Grow, Kickstarter. https://www.
kickstarter.com/projects/cognitoys/cognitoys-internet-connected-smart-toys-that-learn.
Accessed 8 Aug 2017
46. Cognitoys Dino Educational Smart Toy Powered by IBM Watson – Green, Toys “R” US.
https://www.toysrus.com/buy/robotics/cognitoys-7-inch-dinosaur-green-88262-95833696.
Accessed 8 Aug 2017
47. Wonder Workshop Dash Robot, IPhone Accessories. https://www.apple.com/shop/product/
HJYC2VC/A/wonder-workshop-dash-robot. Accessed 8 Aug 2017
48. Kolodny, L.: Kids can now program Dash and Dot robots through Swift Playgrounds,
TechCrunch.com. https://techcrunch.com/2016/10/18/kids-can-now-program-dash-and-dot-
robots-through-swift-playgrounds/. Accessed 8 Aug 2017
49. Wonder Workshop Dash Robot, Toys “R” US. https://www.toysrus.com/buy/robotics/
wonder-workshop-dash-robot-da01-96039966. Accessed 8 Aug 2017
50. Smart Toy Bear, Fisher-Price. http://fisher-price.mattel.com/shop/en-us/fp/smart-toy/smart-
toy-bear-dnv31. Accessed 8 Aug 2017
51. Fisher-Price Smart Interactive Bear Toy, Toys “R” US. https://www.toysrus.com/product?
productId=65244526. Accessed 8 Aug 2017
52. Lucero, A., Holopainen, J., Ollila, E., Suomela, R., Karapanos, E.: The playful experiences
(PLEX) framework as a guide for expert evaluation. In: DPPI 2013, Praxis and Poetics, 3–5
September 2013, Newcastle upon Tyne, UK (2013)
Learning Analytics Research: Using
Meta-Review to Inform Meta-Synthesis

Xu Du1, Juan Yang1, Mingyan Zhang1, Jui-Long Hung2, and Brett E. Shelton2

1 National Engineering Research Center for E-Learning, Central China Normal University, Wuhan 430079, China
2 Boise State University, Boise, ID 83725, USA
andyhung@boisestate.edu

Abstract. Research in learning analytics is proliferating as scholars continue to find better and more engaging ways to consider how data can help inform evidence-based decisions for learning and learning environments. With well over a thousand articles published in journals and conferences with respect to learning analytics, only a handful of articles exist that attempt to synthesize the
research. Further, a meta-review of those articles reveals a lack of consistency in
the scope of included studies, the confluence of educational data mining
activities and “big data” as a parameter for inclusion, and the reporting of actual
strategies and analytic methods used by the included studies. To fill these gaps
within existing reviews of learning analytics research, this metasynthesis follows
procedures outlined by Cooper to reveal developments of learning analytics
research. The results include a number of metrics showing trends and types of
learning analytic studies through 2017 that include which fields are publishing
and to what extent, what methods and strategies are employed by these studies,
and what domains remain largely yet unexplored.

Keywords: Learning analytics · Metasynthesis · Educational data mining

1 Introduction

Data-driven decision making, supported by the techniques of data analytics [1], has
been widely applied in many fields such as government management [2], economics
[3], health care [4] as well as education [5]. Domain names plus “analytics” (such as
Business Analytics or Health Analytics) have become popular research topics in the era
of big data. Learning analytics (LA hereafter), involving "the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs" [6], is a term representing the research area of "education" plus "analytics". It includes the development of technology-enriched formats of instructional
delivery, such as various categories of blended and online learning. These online
learning environments can be leveraged to track a student’s online behaviors and store
them in accompanying database systems. When analyzing these stored data streams
inside existing institutional database systems or data warehouses, the outcomes can be

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1097–1108, 2019.
https://doi.org/10.1007/978-3-030-02686-8_81
1098 X. Du et al.

utilized to support all levels of educational decisions. With that in mind, EDUCAUSE’s
Next Generation learning initiative defined LA as “the use of data and models to
predict student progress and performance and the ability to act on that information” [7].
LA research, then, may address the practical use of analytical results for guiding
institution management, teaching and learning practices and policy making.
LA and Educational Data Mining (EDM, hereafter) are highly related subjects that
overlap in definition and scope [8, 9]. Although both communities of researchers within
LA and EDM have similarities where learning science and analytic techniques inter-
sect, there are some significant differences between them in terms of origins, tech-
niques, fields of emphasis and types of discovery [10–12]. EDM refers to computerized
methods and tools for automatically detecting and extracting meaningful patterns and
information from large collections of data from educational settings [13]. LA is focused
on understanding and optimizing learning and learning environments by measuring,
gathering, analyzing and reporting of data about learners and learning contexts [9].
Therefore, the former focuses on how to extract valuable information to automate learning or intervention processes, while the latter is concerned with how to optimize the processes of teaching and learning through data analysis.
Driven by the desire to take advantage of data-driven decision making, the field of LA has been rapidly growing in popularity. More than 1,000 articles were published in journals and conference proceedings from 2011 to 2017, each year returning more than the year before: the annual number of publications rose from only 28 in 2011 to more than 200 per year since 2015. This growth trend over the past 7 years is reflected by the number of review
articles in LA [14–21]. However, although these review articles conducted systematic
data collections and article analysis, there are still questions regarding LA research that
remain unanswered. For example, most of the reviews combined articles relating to both EDM and LA, which is problematic given the substantial differences between their outcomes and goals. In addition, only three of the reviews focused exclusively on LA [14, 18, 21].
The review sample size and scope of the reviews are relatively small (the largest
one only encompassed 135 articles) when compared with the total number of available
articles on LA. One reason is that most of the reviews included journal articles only in
their analysis. As an emerging research area, detailed and comprehensive research
found within conference proceedings may be an important source to reveal the state-of-
the-art progress. In addition, there is no review article which compared the trends
between or within journal articles and proceedings.
Existing review articles focus on providing bibliometric analysis. The bibliometric
results provide an overview of the number of publications, source of journals, prolific
researchers, and topical sub-categories within the target domain. Since LA aims to
support educational decision making, it stands to reason that the research findings
found within LA literature are just as valuable as bibliometric analysis. Yet, how many
LA studies reveal unique findings when comparing new analysis against traditional
methods? How many employ terms like “big data” and what does that term mean
within the LA literature? Both academics and practitioners would benefit from a comprehensive review that addresses the gaps left by these reviews. Therefore, this article aims to provide a
basis for identifying gaps within existing review articles covering LA, then, use a more
comprehensive review technique to address (1) which fields are publishing and to what
Learning Analytics Research: Using Meta-Review to Inform Meta-Synthesis 1099

extent, (2) what methods and strategies are employed by these studies, and (3) which
domains have a large following and which remain largely unexplored.

2 Literature Review

Based on our search (major data sources are listed later), 1,051 articles and pro-
ceedings were published from 2011 to 2017. Given the intensive research efforts
focused on LA, it is unsurprising that eight review articles attempting to synthesize LA
research were published from 2012 to 2017 (see Table 1). The following sections
provide a summary of these review articles and their key findings.

Table 1. Related review papers regarding learning analytics


Ref. | Database | Number of papers analyzed | Years
[14] | LAK conference proceedings | 70 conference papers | 2012
[15] | Web of Science and conference proceedings | 40 journal and conference papers in total; the authors did not include detailed numbers | 2008–2013
[16] | Google Scholar | 90 journal papers | 2010–2015
[17] | ACM and IEEE Digital Libraries, Scopus, Springer Link and Google Scholar | 76 journal papers | 2005–2015
[18] | Google Scholar, Educational Resources Information Center, ProQuest, and EBSCO HOST | 112 journal papers | 2000–2015
[19] | ACM Digital Library, IEEE Xplore, Springer Link, Science Direct, Wiley and Google Scholar | 55 journal papers | 2010–2015
[20] | ACM Digital Library, AISEL, IEEE Xplore, SpringerLink, Science Direct, Wiley, Google Scholar and proceedings of the Workshop on ARTEL | 40 journal and conference papers in total; the authors did not include detailed numbers | 2010–2015
[21] | LAK conference proceedings, SpringerLink and Web of Science | 135 (LAK: 65, SpringerLink: 37, Web of Science: 33) | 2011–02/2016

In the first review article [14], the authors identified three driving factors
(technological, educational, and political) that had greatly driven the develop-
ment of LA in educational settings. They concluded that LA was primarily focused on
the higher educational challenge of optimizing “opportunities for online learning” and
EDM was primarily focused on the technical challenge of extracting “value from big
sets of learning-related data”. Finally, they suggested that LA should explore the usage
1100 X. Du et al.

of new data sources, such as contextual data, and broaden its focus beyond formal
settings to include informal learning and lifelong learning.
There are some interesting and unique findings revealed by the remaining seven
review articles. The authors found that the number of publications experienced sig-
nificant growth across those years [16, 17, 21]. The majority of articles were
exploratory or experimental studies [15–17, 19, 20]. Virtual Learning Environments or
Learning Management Systems were the most popular learning settings [15, 20, 21].
Two review articles reported that visual data analysis, social network analysis, pre-
diction and outlier detection were the most commonly found analytic methods [18, 21].
The most popular research topics included student behavior modeling and performance
prediction, the support of students’ and teachers’ reflections, and the awareness and the
improvement of feedback and assessment services [15, 17, 21]. Another two articles
examined educational dashboards and found that action-related and content-related
variables gathered from a single LMS platform were the primary indicators for mon-
itoring, awareness, and reflection, and that the visualization types did not differ from
those of traditional dashboards [19, 20]. Only one article summarized the sample sizes
and found that 84% of the studies had fewer than 500 participants [17].
These reviews taken together revealed that most of the researchers were interested
in comparing the fields of LA and EDM research, in addition to providing basic
bibliometric analysis. The differing conclusions drawn by these articles also indicate
that their scope is too small to yield consistent results. Since LA is an applied science
(i.e., it uses analysis to extract useful knowledge in support of decision making), the
knowledge discovered by the studies is perhaps more important than the type of
analytic method reported. Therefore, to produce a potentially valuable metric for
reviews of LA research, effort should be made to address research gaps concerning
the results of these studies.

3 Method

The review analysis followed the review procedures proposed by Cooper [22], which
consist of five steps: (a) formulation of research problems, (b) searching relevant
resources, (c) evaluation of the appositeness of the data, (d) analysis and synthesis of
relevant data, and (e) presentation of the results. “Learning Analytics” was applied as a
key term to search for related articles on the Web of Science database. Two additional
data sources, the Journal of Learning Analytics and Proceedings of the Learning
Analytics and Knowledge conference, were combined with Web of Science LA articles
in forming the raw dataset. The search period was from 2000 to 2017, but the first LA
article mined was published in 2011, so the resulting time period is from 2011 to 2017.
After the search, 560 journal articles (412 from Web of Science and 148 from Journal
of Learning Analytics) and 491 conference papers (1,051 in total) were collected. Two
articles were removed due to duplication. Therefore, 1,049 articles moved forward to
the data exploration phase. Limitations of this method exist, as there is likely LA
research published that was not captured through these portals. An additional limitation
is that papers in languages other than English were excluded.
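The corpus assembly described above (merging three sources, dropping duplicates, and restricting to 2011–2017) can be sketched as follows; the function name and the record lists are hypothetical illustrations, not part of the study's tooling.

```python
# Sketch of the corpus-assembly step described above. The record lists
# are hypothetical placeholders; the real data came from Web of Science,
# the Journal of Learning Analytics (JLA), and the LAK proceedings.

def assemble_corpus(wos, jla, lak):
    """Merge three source lists and drop records with duplicate titles."""
    seen, corpus = set(), []
    for source, records in (("WoS", wos), ("JLA", jla), ("LAK", lak)):
        for title, year in records:
            key = title.strip().lower()   # naive duplicate key
            if key in seen:
                continue                  # duplicate across sources
            seen.add(key)
            corpus.append({"source": source, "title": title, "year": year})
    # Restrict to the observed publication window.
    return [r for r in corpus if 2011 <= r["year"] <= 2017]

# Tiny illustrative run: one title appears in two sources.
wos = [("Predicting dropout with LMS logs", 2015)]
jla = [("Predicting dropout with LMS logs", 2015),
       ("Dashboards for reflection", 2016)]
lak = [("Early alert systems", 2012)]
print(len(assemble_corpus(wos, jla, lak)))  # prints 3 after removing 1 duplicate
```

In the study, the same bookkeeping reduced 1,051 collected papers to 1,049 after two duplicates were removed.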

To address the research objectives, a coding scheme was defined to generate
derived variables (see Table 2). All 1,049 articles were carefully reviewed and coded
by two experienced researchers within the field. To ensure inter-rater reliability, if there
were inconsistent coding values or unclear concepts, the researchers discussed until
consensus was reached [23].
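The authors report resolving disagreements by discussion rather than with a reliability coefficient; a common complementary check is Cohen's kappa, sketched here with purely hypothetical coder labels.

```python
# Cohen's kappa as a complementary inter-rater check. The paper resolved
# disagreements by discussion, so the coder labels below are hypothetical.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two coders, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

coder1 = ["descriptive", "predictive", "descriptive", "prescriptive"]
coder2 = ["descriptive", "predictive", "predictive", "prescriptive"]
print(round(cohens_kappa(coder1, coder2), 2))  # prints 0.64
```

Values near 1 indicate near-perfect agreement; values near 0 indicate agreement no better than chance.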

Table 2. The coding scheme


Dimension | Purpose
Bibliometrics | Gain an overview of research trends via data exploration
Research approach | Code research approaches for later comparison. Possible values include concept or framework only, proof of concept with small-scale data analysis, and data analysis
Research strategies | Code research strategies for later comparison. Possible values include descriptive, predictive and prescriptive
Analytic methods | Code analytic methods for later comparison. Possible values include descriptive statistics, visualization, social network analysis, unsupervised learning methods, and supervised learning methods
Sample | Code sample characteristics for later comparison. Possible characteristics include target population, sample size and educational level

4 Results

The results start with bibliometrics (i.e., data exploration). Because cross comparisons
involve too many possible combinations, only descriptive results are reported in this
section.

4.1 Bibliometrics
Trends of Publication Numbers. Figure 1 shows the number of publications across
years. The rising trend results from combining conference papers and journal articles.
Based on the theory of innovation diffusion, we infer that 2011 to 2013 was the stage
of innovators, who wanted to be the first to try the innovation. Higher percentages of
proceedings can also be observed in 2011–2013. Starting in 2014, the Journal of
Learning Analytics released its first issue and published approximately 35 articles
annually. In addition, the number of journal publications began to exceed the number
of proceedings. We can consider 2014–2016 the stage of early adopters: the growth
rate at this stage is the highest of all stages of innovation diffusion, and these
researchers can be considered opinion leaders within their fields who are comfortable
adopting new ideas. Starting in 2017, we infer that LA research is at the stage of the
early majority, meaning that more researchers and journals, beyond the innovators
and early adopters, worked on or published LA research. If our inference is correct, we
can expect the growth rate to slow down and publication numbers to settle at a stable
level in forthcoming years.

[Bar chart: annual counts of journal articles and proceedings, 2011–2017.]

Fig. 1. The distribution of papers from 2011 to 2017.

Nationalities and Authors. The first authors' countries are shown in Fig. 2. The bar
charts show the numbers of publications by year and by the first authors' nationalities.
The lines in the individual charts mark the average annual publications of the top eight
countries. Scholars in the USA, Europe, and Australia have published LA research
since the outset, whereas Asian scholars, such as those from China and Taiwan, did
not publish LA articles until 2015. In proceedings publications, the USA significantly
exceeded the average of the top eight countries; England, Australia, and Canada were
roughly equal to the average, and the remaining countries were below it. Although
China is below average, its trend is rising.

Fig. 2. The distribution of the top 8 countries and districts.


Learning Analytics Research: Using Meta-Review to Inform Meta-Synthesis 1103

In the number of journal publications, USA authors again significantly exceeded the
average, Spain and Australia were equal to the average, and the remaining countries
were below it. The charts also indicate that scholars in England were more focused on
producing proceedings, while those in Spain and Taiwan were more focused on
journal publications. One notable trend is that the USA, Spain, Australia, and England
reached their highest numbers of journal publications in either 2015 or 2016 and then
show a decline in 2017. Overall, the USA had significantly higher numbers of both
proceedings and journal publications, which also raised the averages.

First Authors’ Departments and Journals’ Research Areas. A total of 558 articles,
including 410 Web of Science articles and 148 Journal of LA articles, were examined
to identify the first authors' departments and the journals' research areas. Around 67.2%
(375/558) of LA articles were published in educational journals. Journals in
computer science and psychology also published large shares of LA articles
(17.2% (96/558) and 8.6% (48/558), respectively). Journals in these three areas
published 93.0% (519/558) of all LA articles. In 53.0% of cases (296/558), the first
author's department was in the field of education, while 19.2% (107/558) were in
computer science and 10.9% (61/558) in information systems. The results indicate that LA has
attracted research effort from non-education fields, but especially in computer science
and information science. Because analytics and algorithm development are important
topics in those fields, the outcomes are somewhat unsurprising. However, 42.1%
(45/107) of first authors in computer science and 62.3% (38/61) of first authors in
information systems chose to publish LA articles in educational journals. However,
first authors in the Education field tended to publish LA articles in psychology journals
rather than computer science. The results might be an indication that the education
researchers are from a sub-field of educational psychology. However, the results may
also indicate that computer science journals have higher entrance barriers for
researchers from education.

Most Prolific Journals. Figure 3 shows the top 10 journals with the highest numbers of
LA publications. The bar charts show publications of individual journals by years, and
the line in each chart denotes the annual average of the top 10 journals. First, the results
indicate that the top 10 are all educational journals. Second, the Journal of Learning
Analytics far exceeded the average of the top 10 journals in number of publications;
Computers in Human Behavior was equal to the average, and the remaining journals
were below it. Only two journals (IEEE Transactions on Learning Technologies and
Interactive Learning Environments) show increasing numbers of LA studies. The
remaining eight journals reached their highest numbers of publications in either 2015
or 2016 and then show a decline in 2016–2017.

4.2 Research Trends


Research Approach. Before coding research approaches, the coders removed 148
proceedings papers collected from the Learning Analytics and Knowledge confer-
ences; these papers were considered too short (two pages or fewer) to provide robust
insights. Figure 4 shows the research approaches of the remaining 901 articles. The four codes,

Fig. 3. Journals that published more than 10 papers on learning analytics.

(1) review, (2) concept or framework only, (3) proof of concept with small-scale data
analysis, and (4) data analysis, reflect the level of analysis involved in the study.
Review articles aimed to summarize findings via literature review. Concept-or-
framework-only articles focused on introducing perspectives, concepts, or frameworks.
Most proof-of-concept articles introduced proposed methods or frameworks, followed
by a small-scale data analysis as proof of concept. The data analysis articles describe
detailed steps of data collection, analysis methods, results, and interpretations.
The results for research approach are as follows. Nineteen papers were coded as
review articles. Roughly 62.7% (565/901) of the proceedings and articles were either
concept or framework only (300 articles, 33.3%) or proof of concept with small-scale
data analysis (265 articles, 29.4%). Only 317 articles (35.2%) fell into the data
analysis category. The coders found that the distributions were similar between Web
of Science and the Journal of Learning Analytics, so journal articles were not
separated for further comparison.
In this section, 317 data analysis articles were identified. The following sections
will further analyze these articles from the aspects of research topics, research strate-
gies, methods, sample, and major findings.

Research Topics. We further analyzed the purposes of the 317 data analysis articles.
The topics of performance prediction, decision support for teachers and learners,
detection of behaviors and learner modeling, descriptive and predictive analysis of
retention/dropout, and descriptive and predictive analysis of cognitive states account
for 82.6% (262/317) of LA publications. In summary, there are three major directions
in LA research: (1) predict students' performance or the likelihood of dropout,
(2) detect students' learning progress via analysis, and (3) provide feedback or
modeling based on analysis results. Predicting students' performance or the likelihood
of dropout attracted the most research effort. In number of publications, proceedings
exceeded journal articles in the analysis of cognitive states, learner interactions, and
other topics. Since proceedings might represent new research trends in the field, we
might expect to see more articles on these topics in the future.

Fig. 4. The research approach of the 901 papers.

Research Strategies, Analytic Methods, and Sample. The coding results for the 317
data analysis articles are presented here. Of these, 171 articles (53.9%) were coded as
descriptive studies, which used descriptive statistics, data visualization, social network
analysis, or unsupervised learning techniques. Statistics (45.0%), data visualization
(24.0%), and clustering (15.2%) are the most popular analytic methods in descriptive
analysis. Of these articles, 93% had sample sizes either smaller than 500 (57.3%) or
larger than 1,000 (35.7%). Comparing method trends across years, statistics, data
visualization, and clustering show rising trends, while the other methods remain
steady.
A further 141 articles (44.5%) of the data analysis studies were coded as predictive
studies, which adopted regression (58.3%), decision trees (10.4%), or other supervised
learning algorithms. Similar to the distribution for descriptive studies, 92.2% of these
studies had sample sizes either smaller than 500 (51.1%) or larger than 1,000 (41.1%).
Checking method trends across years, Naïve Bayes, support vector machines, and
ensemble methods show rising trends, while the remaining methods show steady
numbers. Only five articles were coded as prescriptive analysis. These prescriptive
studies, which aim to discover hidden issues and propose corresponding solutions,
relied only on descriptive methods such as statistics and data visualization. Because
there were only five such articles, no trend can be extracted from this group.

Sample Sizes and Learning Environments. The sample sizes and learning environ-
ments were also examined. The distributions for MOOCs differed from those of other
higher education studies and are discussed separately. Overall, 82.6% (262/317) of
studies targeted higher education (higher education: 208; MOOCs: 54). However,
61.8% (128/208) of the higher education studies had fewer than 500 participants,
while 75.9% (41/54) of MOOC studies had more than 1,000. Only 17.4% (55/317) of
studies focused on the K-12 environment, where small datasets were even more
common: 69.1% (38/55) of these studies had fewer than 500 participants. Among all
studies with over 10,000 participants, only 9% (6/67) were conducted in K-12
environments.
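The environment-by-sample-size breakdown above is a cross-tabulation; a minimal sketch, using hypothetical stand-ins for the coded records and assumed band boundaries, might look like:

```python
# Cross-tabulating learning environment against sample-size band, as in
# the breakdown above. The records are hypothetical stand-ins for the
# 317 coded data-analysis articles.

from collections import Counter

def size_band(n):
    """Assign a sample size to one of three illustrative bands."""
    if n < 500:
        return "<500"
    if n <= 1000:
        return "500-1000"
    return ">1000"

records = [
    {"env": "higher_ed", "sample": 120},
    {"env": "higher_ed", "sample": 2400},
    {"env": "mooc", "sample": 15000},
    {"env": "k12", "sample": 80},
    {"env": "k12", "sample": 430},
]

# Count articles per (environment, band) cell.
table = Counter((r["env"], size_band(r["sample"])) for r in records)
for (env, band), count in sorted(table.items()):
    print(f"{env:10s} {band:9s} {count}")
```

The same counting, applied to the real coded records, yields the percentages reported in this section.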

5 Discussion

The purpose of this review is to reveal the development trends of LA by ana-
lyzing related papers from 2011 to 2017. The development of LA showed dramatic
growth beginning in 2014, reflecting its significant role in improving data-driven
decision making in education settings. Scholars from the USA contributed the most
to the LA domain, followed by researchers from Australia and Europe. Scholars from
Asia have shown strong interest in the topic since 2015. The number of journal
publications showed a declining trend in 2017; however, proceedings publications
did not show the same decline, so the trend in journal publications could be an
aberration. It is also worth noting that 289
(51.6%) of 560 articles have been published by the top 10 producing educational
journals. The remaining papers were published in more than 100 different journals,
which may indicate that many interdisciplinary or non-educational journals tend not to
embrace LA research. Many scholars from non-education fields, especially computer
science and information science, have published research related to LA; however,
educational scholars may have faced higher entrance barriers in journals based in
other fields.
It was also found that roughly two-thirds of the papers did not present full numerical
data analysis; the majority of research studies in LA focus on proposing frameworks
or conducting proof-of-concept research, which is consistent with previous reviews
[16, 19, 20]. Future research could therefore pay more attention to data analysis and
work to provide unique discoveries that guide how to improve and optimize the
processes of teaching and learning. Although prediction of student performance or
likelihood of dropout attracted the most research effort, we might expect to see more
articles forthcoming on other related topics.
Considering research strategies and analytic methods, 53.9% of the articles were
coded as descriptive studies, which generally adopted descriptive statistics, data
visualization and clustering to conduct their studies. Further, 44.5% of articles were
coded as predictive studies, which often adopted regression or other supervised
learning classification algorithms. More than 50% of the articles, in both the
descriptive and the predictive groups, conducted their studies on relatively small
samples (fewer than 500). A reasonable explanation is that educational data are
challenging to gather, for reasons such as data retention policies and individual
privacy.
The most popular research environment was higher education (including MOOCs),
which is consistent with previous reviews [14]. A possible reason is that academic
analytics [9] mainly focuses on issues of student success in higher education, and
learning analytics draws from that tradition. It is therefore not hard to understand why
most scholars devote themselves to researching educational issues in higher educa-
tion. LA has been identified as “specifically correlative to the K-12 education arena” [24],
and the study of K-12 education has not been ignored, which is in line with the future
research suggestions of Ferguson [14].

6 Conclusion

This review traces the development trends of the research field of learning analytics
by systematically analyzing related papers gathered from Web of Science, the Journal
of Learning Analytics, and the LAK conference proceedings, spanning 2011 to 2017.
Based largely on a meta-review of the existing literature, the results provide insight
into LA research, the topics and domains that make up the field, and the analytic
results of the studies. The development of learning analytics is trending toward more
publications overall, with scholars from the United States contributing the most.
Many scholars outside the area of education are also contributing to the LA literature.
The majority of papers are still in the early stages of research development, proposing
concepts or frameworks and conducting proof-of-concept analysis, confirming earlier
reports [16]. Although some
emerging machine learning algorithms within the educational realm are promising,
traditional statistical methods are still preferred by many scholars. The most prolific
research area of LA focuses on higher education, also found in 2012 by Ferguson [14],
but more recently research has broadened to include K-12 educational settings.

References
1. Picciano, A.G.: The evolution of big data and learning analytics in American higher
education. J. Asynchronous Learn. Netw. 16(4), 9–20 (2012)
2. Yiu, C.: The big data opportunity: making government faster, smarter and more personal. In:
Policy Exchange, pp. 1–36 (2012)
3. Wang, G., Gunasekaran, A., Ngai, E.W.T., Papadopoulos, T.: Big data analytics in logistics
and supply chain management: certain investigations for research and applications. Int.
J. Prod. Econ. 176, 98–110 (2016)
4. Raghupathi, W., Raghupathi, V.: Big data analytics in healthcare: promise and potential.
Health Inf. Sci. Syst. 2(1), 3 (2014)
5. Daniel, B.: Big data and analytics in higher education: opportunities and challenges. Br.
J. Edu. Technol. 46(5), 904–920 (2015)
6. 1st International Conference on Learning Analytics and Knowledge. https://tekri.athabascau.
ca/analytics/. Accessed 18 Apr 2018

7. Learning Analytics: The Future is Now. https://edtechdigest.com/2012/05/10/learning-analytics-the-future-is-now/. Accessed 18 Apr 2018
8. Bienkowski, M., Feng, M., Means, B.: Enhancing teaching and learning through educational
data mining and learning analytics: an issue brief. https://tech.ed.gov/learning-analytics/.
Accessed 18 Apr 2018
9. Elias, T.: Learning analytics: definitions, processes and potential (2011). http://
learninganalytics.net/LearningAnalyticsDefinitionsProcessesPotential.pdf. Accessed 18 Apr
2018
10. Chatti, M.A., Dyckhoff, A.L., Schroeder, U., Thüs, H.: A reference model for learning
analytics. Int. J. Technol. Enhanc. Learn. 4(5/6), 318–331 (2012)
11. Romero, C., Ventura, S.: Data mining in education. Wiley Interdiscip. Rev. Data Min.
Knowl. Discov. 3(1), 12–27 (2013)
12. Siemens, G., Baker, R.S.J.D.: Learning analytics and educational data mining: towards
communication and collaboration. In: International Conference on Learning Analytics and
Knowledge, pp. 252–254. ACM (2012)
13. Kumar, R., Sharma, A.: Data mining in education: a review. Int. J. Mach. Eng. Inf. Technol.
5(1), 1843–1845 (2017)
14. Ferguson, R.: The state of learning analytics in 2012: a review and future challenges.
Technical report KMI-12-01, Knowledge Media Institute, The Open University, UK (2012).
http://kmi.open.ac.uk/publications/techreport/kmi-12-01. Accessed 18 Apr 2018
15. Papamitsiou, Z., Economides, A.A.: Learning analytics and educational data mining in
practice: a systematic literature review of empirical evidence. J. Educ. Technol. Soc. 17(4),
49–64 (2014)
16. Sin, K., Muthu, L.: Application of big data in education data mining and learning analytics –
a literature review. ICTACT J. Soft Comput. 5(4), 1035–1049 (2015)
17. Vihavainen, A., Ahadi, A., Butler, M., Börstler, J., Edwards, S.H., Isohanni, E., Korhonen,
A., Petersen, A., Rivers, K., Rubio, M.A., Sheard, J., Skupas, B., Spacco, J., Szabo, C., Toll,
D.: Educational data mining and learning analytics in programming: literature review and
case studies. In: ITiCSE on Working Group Reports, pp. 41–63. ACM (2015)
18. Avella, J.T., Kebritchi, M., Nunn, S.G., Kanai, T.: Learning analytics methods, benefits, and
challenges in higher education: a systematic literature review. J. Interact. Online Learn. 20
(2), 1–17 (2016)
19. Schwendimann, B.A., Rodrigueztriana, M.J., Vozniuk, A., Prieto, L.P., Boroujeni, M.S.,
Holzer, A., Gillet, D., Dillenbourg, P.: Perceiving learning at a glance: a systematic literature
review of learning dashboard research. IEEE Trans. Learn. Technol. 99, 30–41 (2017)
20. Rodríguez-Triana, M.J., Prieto, L.P., Vozniuk, A., Boroujeni, M.S., Schwendimann, B.A.,
Holzer, A., Gillet, D.: Monitoring, awareness and reflection in blended technology enhanced
learning: a systematic review. Int. J. Technol. Enhanc. Learn 9(2/3), 1–26 (2017)
21. Leitner, P., Khalil, M., Ebner, M.: Learning analytics in higher education—a literature
review. In: Learning Analytics: Fundaments, Applications, and Trends, pp. 1–23. Springer
(2017)
22. Cooper, H.M.: Organizing knowledge syntheses: a taxonomy of literature reviews. Knowl.
Soc. 1(1), 104 (1988)
23. Chang, C.Y., Lai, C.L., Hwang, G.J.: Trends and research issues of mobile learning studies
in nursing education: a review of academic publications from 1971 to 2016. Comput. Educ.
116, 28–48 (2018)
24. Johnson, L., Adams, S., Cummins, M.: NMC Horizon Report: 2012 K-12 Edition. The New
Media Consortium (2012)
Students’ Evidential Increase in Learning
Using Gamified Learning Environment

V. Z. Vanduhe1(✉), H. F. Hassan2, Dokun Oluwajana1, M. Nat1, A. Idowu1, J. J. Agbo1,
and L. Okunlola1

1 Cyprus International University, Nicosia, Cyprus
vanyeb4u@gmail.com, dklewa@gmail.com, mnat@ciu.edu.tr,
richarddw6@gmail.com, nurse_johnson@yahoo.com,
ayooluwa85@yahoo.com
2 Cihan University, Erbil, Iraq
eng.hasan.f.hasan@gmail.com

Abstract. Gamification has grown remarkably in the past few years, extending
to cover education and training as technological innovation continues. There is
evidence that gamification increases participation, motivation, and engagement.
However, gamification designs and implementations often fail to achieve the
desired outcomes in education, owing to poor design and, above all, to the
environment being gamified. This paper aims to map game elements onto a
well-known, widely accepted Learning Management System, providing a
gamification environment that addresses the limitations of gamification in
education. A gamified course in a learning management system (GCLMS) was
developed to study the increase in student learning it produces. The steps and
levels of the gamification environment are shown, together with evidence from
an initial evaluation of how the GCLMS increases student learning. Feedback
results indicate students' increased learning and their confidence in applying
what they learned from the GCLMS in real-life scenarios. The research was
carried out with 47 second-year undergraduate nursing students.

Keywords: Gamified course in learning management system (GCLMS) ·
Gamification in learning environment (GLE) · Gamification ·
Gamification in education · Gamified learning

1 Introduction

Generally, the use of Learning Management Systems (LMS) has come to embrace
terms such as virtual learning, games, tele-learning, blended learning, flipped learning,
gamification, and mobile learning, in order to help students interact with their peers
and instructor. These interactivities have increased learners' ability to build their own
knowledge, especially when learners interact with their instructor and other learners,
because technology facilitates learning, supports equal accessibility, and increases
knowledge sharing among students. The LMS serves as a potential link between
students and extensive shared resources, helping them accomplish their educational
responsibilities, and supports enhanced learning outcomes through different educational
© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1109–1122, 2019.
https://doi.org/10.1007/978-3-030-02686-8_82
1110 V. Z. Vanduhe et al.

technological methods, with the aim of increasing learning. According to [5, 22], to
achieve enhanced learning and performance for the current generation of students,
referred to as digital natives because they were born into technology, technology
needs to be involved in education, since we are in a digital age. Students' willingness
to use technology in education is paramount, and it creates room for universities to
adopt e-learning systems that provide learning through technological tools. Moodle is
a well-known LMS for both digital immigrants and digital natives; however, there are
limited adaptations of, and studies on, mapping gamification into Learning
Management Systems. Numerous studies on applying gamification in education have
recorded limitations in increasing student learning and performance, as well as the
lack of an acceptable, easy-to-use gamification environment for education. To address
these limitations, we developed a system that maps gamification into education using
Moodle plugins. The remainder of this paper is organized as follows: (2) related work
on gamification, (3) gamification assessment in Moodle, (4) description of the current
GCLMS, called the gamified course environment (GCE), (5) an initial evaluation of
the use of the GCLMS, and (6) conclusion and future work.

2 Related Work

Gamification comprises the use of game elements, mechanics, or experiences in non-game design settings to achieve a specific goal in learning, through digital motivation and engagement. The authors in [5, 14, 16] define gamification as the use of game elements and mechanisms in non-game design settings. Gamification is a personal experience that a user gains through virtual and digital game elements, which creates intrinsic motivation to learn through personal experience [3, 15, 30]. The use of games creates positive emotions, which motivate learners not to reach the burnout stage in learning. Learning with emotions gives learners a disposition that triggers behavioral motivation and engagement. Behavioral motivation and engagement are established where someone's full participation and involvement in activities in an online environment can be determined [9]. Engagement measurement methodologies have been studied on online platforms through self-reporting, teacher rating, logs, viewing of course content online, and observation techniques [7]. The authors in [5] found in their study that online behavioral engagement measurement conducted at a macro level includes total counts of logs and the time that individuals spent on each online activity during a semester in a learning management system. Assessment in gamification is important, and the assessment needs to be designed to work towards the goal of the gamification design; excessive assessment measures in gamification demotivate users. Thus, any platform that contains motivation, progressiveness and instant feedback is known as a gamification platform, because these are the pillars that gamification stands on [5, 6, 14].
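The macro-level engagement measurement described above — total log counts and the time individuals spend on each online activity — can be sketched as follows (an illustrative example; the log format, field names and values are assumptions, not taken from the cited study):

```python
from collections import defaultdict

def engagement_summary(log_events):
    """Aggregate per-student activity counts and time-on-activity
    from a list of (student, activity, duration_minutes) log events."""
    counts = defaultdict(int)
    minutes = defaultdict(float)
    for student, activity, duration in log_events:
        counts[(student, activity)] += 1
        minutes[(student, activity)] += duration
    return counts, minutes

# Example: three log events for one student over a semester.
events = [
    ("s1", "quiz", 10.0),
    ("s1", "quiz", 5.0),
    ("s1", "forum", 12.5),
]
counts, minutes = engagement_summary(events)
print(counts[("s1", "quiz")])   # 2 accesses
print(minutes[("s1", "quiz")])  # 15.0 minutes in total
```

Such aggregates are exactly the kind of macro-level indicators a semester-long LMS study can compute from its activity logs.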
Students’ Evidential Increase in Learning 1111

2.1 The Adoption of Gamified Learning Environments in Learning Management Systems (LMS)
Gamifying a course in an LMS is driven by current studies on gamification in education, together with intensive study of gamification platforms. A gamified learning environment is the application of game mechanisms or elements in an online education environment to improve and motivate student learning and behavior [1, 2, 6, 13, 15, 18]. Studies on the use of educational LMSs have conducted different analyses to examine student motivation towards gamification [2, 17].
The Horizon Report, 2017 Higher Education Edition, states that gamification is among the new technological innovations that educational institutions need to adopt in order to improve student success. It concludes that this approach would assist educational stakeholders in providing flexible and user-friendly ways of learning in educational environments that concurrently meet educational needs and avert current challenges in student learning [2–4, 13, 21].
In an empirical study exploring the impact of intrinsic and extrinsic motivation [1, 4] on undergraduate students' participation and performance in an online gamified learning intervention, [12, 22] found that gamification practices have a positive impact on students' intrinsic learning needs. Their results show that the positive impact of a gamified intervention on student participation varies depending on the type of motivation that occurs during the process. A lack of practical experience in learning in a simulated or real-life scenario has a negative impact on students' confidence in the knowledge acquired and on online discussion [3, 4, 21]. Therefore, gamification gives students the chance to see professionalism in practice. Other studies on student motivation in the use of gamification in online learning environments (GOLE) have shown that it supports different educational methodologies, such as gamified pedagogy, classroom live, collaboration, self-regulated learning, diversification of ideas, scaffolding of concepts, simulation, and self-reflection among students in a GCLMS [2, 12, 19, 20].
The authors in [6, 12] state that the use of gamified pedagogy helps students improve self-motivated learning, interaction with course content, and collaboration among themselves. In addition, [3, 5, 6, 19] state that a good implementation of e-learning with gamification increases students' satisfaction, engagement, effectiveness and efficiency in learning. Studies on using a game-based learning approach in an LMS [7, 11, 17, 28] report that information systems enhance students' deep learning in convenient and attractive ways to relate, collaborate and share ideas within a Problem-Based Learning (PBL) approach [6, 20]. However, despite numerous persistent studies on the use of information system tools in PBL [11], only a few studies have examined student motivation in the use of gamification in an LMS online learning environment [4, 23, 24].
The adoption or mapping of game elements in education has been effective in student learning. Therefore, current research on gamification gives this study a foundation to map gamification onto Learning Management Systems using a common gamification design process.
1112 V. Z. Vanduhe et al.

2.2 Gamification Design


Game design refers to the technological and artistic aesthetics designed to create interaction between two or more players for exercise, edutainment, or education. Game design consists of defined rules, goals and challenges for role-playing games, video games, casino games, tabletop games, sports or logic games [9, 15, 27, 28].
The definition of gamification is based on the use of game elements in a non-game context — that is, the use of game elements in education, training, or organizations through technological game engines. This leads us to studies by [12, 22, 29], which state that when designing gamification, the following need to be considered:
• The first consideration is the innovation process of gamification design. This process involves the transfer of game elements to a learning or training platform that is enticing, friendly and easy to use.
• A profound understanding of users' demotivation and of the problems students face in learning.
• The gamification platform must have the ability to affect the behavior of users through interaction with peers, unveiling of tasks, and competition. The ability of users to replay after activity completion should not be undermined, because gamification lets users replay even when they fail a level.
• The psycho-behavioral effect of gamification has opened another layer of game design, adding group and individual behavioral effects.

2.2.1 Framework of Gamification Design
Mechanics, Dynamics and Aesthetics/Sensation (MDA) are used to define the foundation of gamified design elements [8, 29].
Game mechanics are defined as tools that describe specific compositions of game elements such as badges, collections, or achievements. Examples of game mechanics are rewards, chance, resource acquisition, transactions, cooperation, challenge, feedback, and win states.
Dynamics is the systematic connection of a player with the gamification system; the connection includes teamwork, collaboration [1], choice making, or competition with other participants. This usually aims at creating a remarkable, gameful user experience [1, 6, 9, 15, 27]; examples are expression, challenges, time pressure or tension.
Aesthetics/Sensation is the extraction of user emotions towards the game. This is a sensational motivation influenced by the player's desire. Typical examples are fantasy, collaboration, fellowship, expression, discovery or submission [22, 30].
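For illustration, the MDA decomposition above can be captured as a simple data structure; the grouping below only restates the examples given in the text (a sketch, not a normative taxonomy, and the `layer_of` helper is hypothetical):

```python
# MDA layers with the example elements listed in the text.
MDA = {
    "mechanics": ["rewards", "chance", "resource acquisition",
                  "transactions", "cooperation", "challenge",
                  "feedback", "win state"],
    "dynamics": ["expression", "challenges", "time pressure", "tension"],
    "aesthetics": ["fantasy", "collaboration", "fellowship",
                   "expression", "discovery", "submission"],
}

def layer_of(element):
    """Return the MDA layer(s) in which an example element appears."""
    return [layer for layer, items in MDA.items() if element in items]

print(layer_of("feedback"))    # ['mechanics']
print(layer_of("expression"))  # appears in both dynamics and aesthetics
```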
Due to the diver’s use of game element and its application, the need to determine the
effect of game element on individual intrinsic and extrinsic behavioral needs. Extrinsic
rewards such as badges or money does not increase motivation and engagement in a
long run [16, 17, 22] whereas, intrinsic factors, such as self-enjoyment, and engagement
keeps player’s loyalty and at the long run increase motivation [12, 30].
Students’ Evidential Increase in Learning 1113

Intrinsic and Extrinsic Gamification Effects. Gamification has a confirmed direct effect on the intrinsic stance of a player or user. Intrinsic motivation is seen as a self-determined mind to achieve a goal, while extrinsic motivation depends on physical rewards such as money or verbal gifts. In [7, 29, 30], the authors define both intrinsic and extrinsic motivation as what drives game users to continue playing; this increases the quality of effort people invest in a given task. The authors in [20, 30] shed more light on intrinsic motivation: work experience in a controlled environment has a proportional negative impact on performance, but a self-determined drive to work increases work performance [16, 20, 21, 30].

2.3 GCLMS Adoption


Various methodologies exist for the adoption of a GLE, as well as for studying users' behavioral intentions in order to investigate GLE adoption. Studies show that the most common methodologies are quantitative or qualitative data collection, or a mixed method.
The author in [16] examined incorporating an intrinsic motivator into the Technology Acceptance Model. The study attempts to explain students' behavioral intention to use an e-learning system from a motivational perspective through a quantitative method, and the results show that both perceived usefulness and enjoyment have a significant impact on students' intention to use the system.
In another study on GLE adoption and acceptance, [10, 20, 29] developed a policy adoption framework for implementing gamification through a scientific investigation that combines principles of the Technology Acceptance Model and the Technology-Organization-Environment (TOE) framework. The methodology comprises seven stages that adopt the interpretive paradigm and a mixed-methods research design, and proposes the integration of academic users' acceptance with macro-level factors [8, 28].
Based on the existing literature, there are limited studies that use quantitative data to assess students' motivational attitude toward the use of an LMS in a GLE, and this study attempts to address this gap in the literature [5, 8, 15, 22].

3 Gamification Assessment in Moodle

In gamifying a course in Moodle, the major algorithm that provides the gamified experience is based on a pointing system. To earn points, certain rules need to be put in place whereby points determine the level, the badge, the shift in the progress bar, and the other game elements in Moodle. When a player earns points, the accumulated points unlock a badge; this is set by the course administrator, and the same procedure applies to the other game elements. Therefore, the contextual activity of gamification in Moodle is assessed through points, which are assigned by the pointing rule. Level-ups are earned based on a leveling rule derived from the pointing rule. A badge, which represents an achievement, is earned based on a badge rule derived from the pointing rule set by the administrator, and movement along the progress bar is earned based on the activities completed. The leaderboard works on a rule set by the pointing rule, whereby the more points are earned, the further the proportional move up the leaderboard. Figure 1 below illustrates the pointing rule [21–23, 25].
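The rule structure just described — a pointing rule from which leveling and badge rules are derived — can be sketched as follows (a hypothetical illustration; the thresholds are administrator-chosen values assumed for the example, not Moodle plugin code):

```python
# Assumed administrator-configured thresholds.
LEVEL_THRESHOLDS = [0, 100, 250, 500]  # points needed for levels 1-4
BADGE_THRESHOLDS = {"bronze": 100, "silver": 250, "gold": 500}

def level_for(points):
    """Leveling rule derived from the pointing rule."""
    level = 1
    for i, needed in enumerate(LEVEL_THRESHOLDS, start=1):
        if points >= needed:
            level = i
    return level

def badges_for(points):
    """Badge rule: accumulated points unlock badges."""
    return sorted(b for b, needed in BADGE_THRESHOLDS.items()
                  if points >= needed)

print(level_for(260))   # level 3
print(badges_for(260))  # ['bronze', 'silver']
```

The leaderboard and progress bar follow the same pattern: both are monotonic functions of the points accumulated under the pointing rule.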

Fig. 1. Pointing rule in GCLMS.


Students’ Evidential Increase in Learning 1115

As shown in Fig. 1, gamification in action is formed from content and from participants' behavioral responses to that content. Gamified content in Moodle tends to increase users' interaction with the content. Point allocation to content is described as point assignment, and the pointing rule assigns rules to specific cumulative actions. Levels create an impressive representation of completed activity in Moodle gamified content; this is based on assigning rules to upward movement in leveling. Assigning a badge to a set of completed activities increases users' aspiration, and is in turn based on assigning points to the badge: an accumulated number of earned points gives birth to a badge. A bar in the progress bar represents an activity; therefore, when an activity is completed, there is an automatic shift to the next bar. Progress bar rules are apportioned to sets of activities. In Moodle, the leaderboard is referred to as the ladder board; they serve the same purpose, and the system that runs the ladder board is the same as that of the progress bar.
In adopting gamification in an LMS, as discussed above, pointing is the fuel of gamification in this research, driving all the other game elements. Figure 2 below illustrates the game element mechanism for this study.

[Fig. 2 diagram: Moodle game elements (points, level-ups, badges, progress bar, leaderboard) and Moodle game mechanisms (completion tracking, achievement, reward status, activity restriction, time limits on activities, measurement, Moodle forum) feed into the experience points earned; the output is engagement, motivation, participation, collaboration, recognition, status and change in behavior.]

Fig. 2. GCLMS design.

4 Description of Current GCLMS

Moodle LMS create a course in form of GCLMS (see Fig. 1). This application was demon‐
strated and presented to the participants with instructions on interactive video quiz gamifi‐
cation, for over a period of 5 weeks. The purpose is to help the students understand how
gamification works, improve their knowledge, to create interest and make students engage
in Moodle gamified learning environment [9, 14, 16]; also, to learn about real-life issues,
critical - thinking, participation and collaborate with other members.
Upon the completion of this activities, students earn rewards for completion such as;
point, level ups, progress, leader board and badge. This is to introduce competitiveness
among students, also to be ranked accordingly based on leader board and it is determined
1116 V. Z. Vanduhe et al.

by the number of points they received after completing their tasks. Participants were allowed
to use the application anywhere as long as they participant completed the task [17].
Gamification in Moodle is designed around gamification plugins installed to gamify courses within the LMS [18–20]. The algorithm that gamifies a course in Moodle is based on CRUD, which stands for Create, Read, Update and Delete within the LMS; points are awarded for CRUD events. CRUD monitoring covers the course [1, 3, 4, 10, 22] as well as activity views and activity completion. Three levels were created in the Moodle GCLMS (see Fig. 3); the resources used are Quizventure (see Fig. 4), H5P interactive video (see Fig. 5) and a millionaire game used for quiz assessment (see Fig. 6). Moodle blocks such as the Level Up ladder are used to display ranks, badge-denoted levels, the list of participants, total experience points and progress (see Fig. 7); the progress bar shows the activities that need to be completed and the progress within the GCLMS (see Fig. 8).

Fig. 3. Current version of our Moodle gamified course showing the course dashboard.

Fig. 4. Illustrating a gamified quiz in Moodle using Quizventure.
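The CRUD-based point allocation described above can be sketched as a simple event-to-points mapping (the event names and point values are assumptions for illustration; in practice they are configured by the administrator in the gamification plugin):

```python
# Assumed points per CRUD event type (Create, Read, Update, Delete).
CRUD_POINTS = {"create": 10, "read": 2, "update": 5, "delete": 1}

def award_points(events):
    """Sum points over a student's CRUD event stream, e.g. viewing an
    activity ('read') or completing/submitting one ('create')."""
    return sum(CRUD_POINTS.get(kind, 0) for kind in events)

semester_events = ["read", "read", "create", "update"]
print(award_points(semester_events))  # 2 + 2 + 10 + 5 = 19
```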

Fig. 5. A gamified interactive video explaining steps in carrying out vital sign test.

Fig. 6. Shows the Who Wants to Be a Millionaire gamified quiz.



Fig. 7. Demonstrating game elements in Moodle such as rank, levels, total individual experience points, badge and progress within the GCE.

Fig. 8. Showing activities that are to be completed within the GCLMS.

This is a normal Moodle page, but with some additional features that depict game elements. Figure 3 is a screenshot of the user interface of the gamified learning environment used in this research. The top-right icon shows the experience points earned as well as the avatar; below it are the ladder board and the progress bar. Levels one to three are the Moodle gamified content using Quizventure, interactive video and the Who Wants to Be a Millionaire game. Figures 4, 5, 6 and 7 describe the gamified course content in detail. The next level shows as restricted until the activity requirement is marked complete, at which point it is unveiled; this applies to all the levels. After clicking on “Level 1”, called Gamify Me, what happens next is described in Fig. 4.
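The restriction behavior just described — each level stays locked until the preceding level's activities are complete — can be modeled as follows (a hypothetical sketch of the gating logic, not Moodle's actual implementation):

```python
def unlocked_levels(completed, levels):
    """Return the levels visible to a student: level N is unlocked
    only when every activity of level N-1 has been completed."""
    unlocked = []
    for i, activities in enumerate(levels):
        if i == 0 or all(a in completed for a in levels[i - 1]):
            unlocked.append(i + 1)
        else:
            break  # later levels remain restricted
    return unlocked

# The three GCLMS levels described in the text.
levels = [["quizventure"], ["interactive_video"], ["millionaire_quiz"]]
print(unlocked_levels({"quizventure"}, levels))  # [1, 2]
```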
Students’ Evidential Increase in Learning 1119

Quizventure is a Moodle plugin that gamifies a quiz in such a way that the question is displayed at the top of the screen while the possible answers drop from the top. The lives are displayed in the top left-hand corner of the screen. As the answers drop, the player aims and shoots at the correct answer, though the possible options can shoot back at the player below. The arrow keys let the player navigate left, right, up and down, while the space bar is used to shoot. The bottom left of the screen offers convenience features such as sound and full-screen controls. This plugin can take as many questions and answers as the instructor designs.
To make an educational video interactive, at certain points the video automatically pauses and a question pops up, in multiple-choice or another question format. After an answer is chosen from the options, the video continues. The small purple circles represent questions embedded in the video. This gives students the ability to re-watch the video in order to understand every part of it.
Who Wants to Be a Millionaire is a well-known game worldwide, here introduced into education. It presents learning in a friendly way that motivates students to learn and become engaged. Gamifying a quiz in this way removes the burnout stage and the tension of a quiz. At the top of Fig. 6 the lifelines are shown, such as 50/50, call a friend, vote, or cancel the question. The game works as in Who Wants to Be a Millionaire: when players fail a question, they are taken back to the beginning and lose all the points gained.
The ladder board in the LMS serves as a leaderboard. It presents the rank of all the participants, their current level, total experience points and progress. Both students and the course administrator are able to view the ladder board, though it can be hidden from students.
The progress bar block in the Moodle LMS shows activity-completion progress in the gamified course environment.

5 Initial Evaluation

An initial evaluation was carried out in order to get feedback from students with regard to the gamified platform; it was conducted with second-year nursing students at Cyprus International University. The aim of this initial evaluation is to determine the positive and negative effects of our GCLMS before moving to the crucial aspect of our research: the structural empirical part, which involves intensive research and scientific analysis of which game elements have the highest, or a negative, effect on students' performance. This is a pilot program that will later cover all students in the Faculty of Health Science and the English faculty. Forty-seven students took part in this exercise, and their feedback reports that the GCLMS fit well and helped them understand the procedures for carrying out vital signs. Moreover, the experience of the GCLMS made vital signs seem like a real-life scenario, giving them confidence in carrying out vital signs on patients during practical classes in the hospital.
There are some reports from the students:
Student 1: “As a nursing student real life scenarios in learning increase learning, the
gamified interactive video environment make it seems real”.
Student 2: When I receive points and see my leaderboard position higher than that of my mates, it encourages me and makes learning competitive and collaborative as well. I managed to move up some levels with the help of my mates; I sent them messages using the forum to ask how they did it and they helped me.
Student 3: I had no issue with the GCLMS because it is easy to play; this is on point.
Student 4: This GCLMS in my point of view is really cool; the idea of involving fun
in learning makes me to see nursing as fun.
Student 5: I feel that the leaderboard and experience point notification should appear
in all the pages of the levels. I find it not convenient to always go to the dashboard to
view my points.
Student 6: The interface where this GCLMS is running seems to be slow compared
to the normal university Moodle.
Student 7: I love playing game to see game in classroom is a nice experience. Playing
the game again and again made me not to need my books again. When I see new thing
in the GCLMS I ask my colleagues using the forum.
Student 8: When a teacher gives me some assessment and I pass, I feel more accom‐
plished than when my colleagues assess me. GCLMS made me feel like my teacher
teaches me and assess me and this assessment does not make me feel as a failure because
it gives me chance to go all over again and again.
Student 9: I suggest that at the end of the GCLMS a certificate should be issued so
that I can post on Facebook.
Student 10: I wish all my courses use this system.

6 Conclusion and Future Work

This study examines evidence of student learning using gamification by gamifying a course. This is a pilot study, with the aim that the GCLMS will later be adopted in courses that involve practicals. Numerous studies applying gamification to increase student learning have already been conducted in many areas. Although results of students' feedback on their personal experience with an evidential increase in learning have not yet been acquired, a major limitation of gamification in previous studies was the lack of an acceptable gamified environment for student learning.
This study therefore focuses on providing a solution to the limitations of adopting gamification in education: by gamifying a generally accepted learning management system with which students and educators are familiar, it addresses these limitations. The results from students' feedback on using the GCLMS turned out to be a great achievement for gamification in education.
Our future work involves implementing the suggestions given by the students in the initial evaluation above. An experience block will be made available on all pages to provide students with information on their current experience point achievements. In the next version of the GCLMS, a certificate will be issued so that students gain a self-completion achievement and can use it to attract other students, which opens the way to more game elements.
A critical study of which game elements trigger students' learning behavior needs to be addressed in future work so that more emphasis can be placed there. The adoption of this GCLMS needs to be carried out in other courses and universities to confirm the results received from our study. Finally, though the GCLMS is a prototype, it is accessible as a guest on our PhD project website at http://sengagement.org/ [26].

References

1. Barna, B., Fodor, S.: An empirical study on the use of gamification on IT courses at higher education. In: International Conference on Interactive Collaborative Learning, Cham (2017)
2. Basten, D.: Gamification in software engineering. IEEE Softw. 34(5), 76–81 (2017).
https://doi.org/10.1109/ms.2017.3571581
3. Becker, A.: NMC Horizon Report: 2017 Library Edition, The New Media Consortium (2017).
https://www.learntechlib.org/p/182005/
4. Buckley, P., Doyle, E.: Gamification and student motivation. Interact. Learn. Environ. 24(6),
1162–1175 (2016). https://doi.org/10.1080/10494820.2014.964263
5. Çakıroğlu, U., Başıbüyük, B., Güler, M., Atabay, M., Memiş, B.Y.: Gamifying an ICT course:
influences on engagement and academic performance. Comput. Hum. Behav. 69, 98–107
(2017). https://doi.org/10.1016/j.chb.2016.12.018
6. Challco, G.C., Mizoguchi, R., Bittencourt, I., Isotani, S.: Personalization of gamification in
collaborative learning contexts using ontologies. IEEE Lat. Am. Trans. 12(6), 1995–2002
(2015)
7. Costa, C.J.: Gamification: software usage ecology. Online J. Sci. Technol. 8(1), 91–100
(2018)
8. De-Troyer, O.V.: Linking serious game narratives with pedagogical theories and pedagogical design strategies. J. Comput. High. Educ. 29(3), 549–573 (2017). https://doi.org/10.1007/s12528-017-9142-4
9. Dias, J.: Teaching operations research to undergraduate management students: The role of
gamification. Int. J. Manag. Educ. 15(1), 98–111 (2017). https://doi.org/10.1016/j.ijme.
2017.01.002
10. Ding, L., Er, E., Michael, O.: An exploratory study of student engagement in gamified online
discussions. Comput. Educ. 120, 213–226 (2018). https://doi.org/10.1016/j.compedu.
2018.02.007
11. Freitas, A.A., Michelle, F.M.: Classroom Live: a software-assisted gamification tool.
Comput. Sci. Educ. 23(2), 186–206 (2013). https://doi.org/10.1080/08993408.2013.780449
12. Hamari, J., Koivisto, J., Sarsa, H.: Does gamification work? A literature review of empirical studies on gamification. In: Hawaii International Conference on System Sciences, pp. 3025–3034. IEEE (2014). https://doi.org/10.1109/hicss.2014.377
13. Hsu, C.-C., Wang, T.-I.: Applying game mechanics and student-generated questions to an
online puzzle-based game learning system to promote algorithmic thinking skills. Comput.
Educ. 1–37 (2018, in Press). https://doi-org.cmich.idm.oclc.org/10.1016/j.compedu.
2018.02.002
14. Johanna, P., Maria, R.-S., Christian, G.: Motivational active learning: engaging university
students in computer science education. In: Proceedings of the 2014 Conference On
Innovation & Technology in Computer Science Education, Uppsala, Sweden (2014)
15. Kapp, K.M.: Choose your level: using games and gamification to create personalized
instruction. In: Murphy, M., Redding, S., Twyman, J. (eds.). Handbook on Personalized
Learning for States, Districts, and Schools. Center on innovation and learning, pp. 131–143
(2015)

16. Landers, R., Armstrong, M.: Enhancing instructional outcomes with gamification: an
empirical test of the technology-enhanced training effectiveness model. Comput. Hum.
Behav. 71, 499–507 (2017). https://doi.org/10.1016/j.chb.2015.07.031
17. Marko, U., Vukovic, G., Jereb, E., Pintar, R.: The model for introduction of gamification into
e-learning in higher education. In: Procedia - Social and Behavioral Sciences, 7th World
Conference on Educational Sciences, vol. 197, pp. 388–397 (2015). Greece: Science Direct.
https://doi.org/10.1016/j.sbspro.2015.07.154
18. Martin, B., Isabel, M., Markus, B., Jasminko, N.: A design framework for adaptive
gamification applications. In: Proceedings of the 51st Hawaii International Conference on
System Sciences (2018)
19. Michael, H., Rowan, T.: A gamification design for the classroom. Interact. Technol. Smart
Educ. 15(1), 28–45 (2018). https://doi.org/10.1016/j.compedu.2018.02.007
20. Sebastian, D.: Gamification: designing for motivation. Interactions 19(4), 14–17 (2014).
doi:https://dl.acm.org/citation.cfm?doid=2212877.2212883
21. Tobias Wolf, W.H.: Gamified digital services: how gameful experiences drive continued service usage. In: Proceedings of the 51st Hawaii International Conference on System Sciences, pp. 1187–1196 (2018). http://hdl.handle.net/10125/50034
22. Tugce, A., Berkan, C., Goknur, K.: A qualitative investigation of student perceptions of game
elements in a gamified course. Comput. Hum. Behav. 78, 235–254 (2018). https://doi.org/
10.1016/j.chb.2017.10.001
23. Pérez‐Berenguer, D., García‐Molina, J.: A standard‐based architecture to support learning
interoperability: a practical experience in gamification. Softw. Pract. Exp. (2018). doi:https://
doi.org/10.1002/spe.2572
24. Pirker, J., Schiefer, M.R., Güt, C.: Motivational active learning - engaging university students
in computer science education. In: Proceedings of the 19th Annual Conference on Innovation
and Technology in Computer Science Education, pp. 297–302. ACM, Sweden (2014). https://
doi.org/10.1145/2591708.2591750
25. Vitkauskaitė, E.: Points for posts and badges to brand advocates: the role of gamification in consumer brand engagement. In: Proceedings of the 51st Hawaii International Conference on System Sciences, pp. 1148–1157 (2018)
26. Vanduhe, V.Z., Nat, M., Oluwajana, D.I., Hasan, H.F., Idowu, A.: Sengagement.org. Students engagement. http://sengagement.org/Moodle/course/view.php?id=10. Accessed 2018
27. Werbach, K., Hunter, D.: The Gamification Toolkit: Dynamics, Mechanics, and Components
for the Win. Wharton Digital Press (2015). https://books.google.com.cy/books?
id=RDAMCAAAQBAJ
28. Zichermann, G., Cunningham, C.: Gamification by Design: Implementing Game Mechanics
in Web and Mobile Apps. O’Reilly Media, Canada (2011)
29. Seaborn, K., Fels, D.I.: Gamification in theory and action: a survey. Int. J. Hum Comput Stud.
74, 14–31 (2015). https://doi.org/10.1016/j.ijhcs.2014.09.006
30. Mekler, E., Brühlmann, F., Tuch, A., Opwis, K.: Towards understanding the effects of
individual gamification elements on intrinsic motivation and performance. Comput. Hum.
Behav, 71, 525–534 (2017). https://doi.org/10.1016/j.chb.2015.08.048
Improving the Use of Virtual Worlds in Education
Through Learning Analytics: A State of Art

Fredy Gavilanes-Sagnay1 ✉ , Edison Loza-Aguirre1,2, Diego Riofrío-Luzcando3,



and Marco Segura-Morales1


1
Departamento en Informática y Ciencias de la Computación, Escuela Politécnica Nacional,
Ladrón de Guevara, E11-253, P.O. Box 17-01-2759 Quito, Ecuador
{fredy.gavilanes,edison.loza,marco.segura}@epn.edu.ec
2
CERAG FRE 3748 CNRS/UGA, 150, rue de la Chimie, BP 47, 38040 Grenoble Cedex 9, France
lozaedison@univ-grenoble-alpes.fr
3
Facultad de Arquitectura e Ingenierías, Campus Miguel de Cervantes,
International University SEK, Calle Alberto Einstein, Quito, Ecuador
diego.riofrio@uisek.edu.ec

Abstract. The use of Virtual Worlds in Education is becoming an innovative


alternative to traditional education. However, these solutions are confronted to
several issues such as: lack of indicators to follow up the students’ progress, lack
of well-defined evaluation parameters, difficulties for evaluating collective and
individual contributions, difficulties for keeping students engaged and motivated,
a very time-consuming teachers’ supervision, and the absence of tutors for
guiding the learning process, among others. In this review, we explore and
describe academic contributions focused on the application of Learning Analytics
to improve Virtual Worlds in Education from three perspectives: Personalized
Learning, Adaptive Learning and Educational Intervention. Our results highlight
that most of the research focuses on supporting decisions concerning operational, non-real-time issues. Additionally, almost all the contributions focus on solving only a few issues, but none of them offer a holistic framework that could be used by teachers or pedagogical personnel for decision making.

Keywords: Virtual environments · Virtual worlds · Learning analytics · Data mining · Educational platform

1 Introduction

Virtual Worlds, the most common form of Virtual Habitats, are a type of Virtual Environment [1] that has become an innovative alternative to traditional education methods [2]. Over the last decade, their role has grown to the point where most universities in the world are reforming their programs to gradually bring in these approaches as a lifelong learning instrument [3]. Monitoring the population inside Virtual Worlds or evaluating activity and task designs based on actual user behaviour can provide new insights on large-scale implementations [4]. Also, the unique features of Virtual Worlds in sensorial learning have promoted the idea of learning anywhere and anytime in immersive and interactive contexts [5, 6].

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1123–1132, 2019.
https://doi.org/10.1007/978-3-030-02686-8_83
1124 F. Gavilanes-Sagnay et al.

Even though the use of Virtual Worlds in education has become almost ubiquitous, in practice it is confronted with several issues, such as: problems with knowing what is happening within the virtual world in order to identify conflictive user behaviours [7–9] or to track students' interactions with elements of the virtual world [10, 11]; a lack of indicators to follow up the progress of students in courses [12]; a lack of well-defined evaluation parameters [12]; difficulties in evaluating collective and individual contributions while students handle tasks [13]; difficulties in keeping students engaged and motivated [14]; very time-consuming teacher supervision in the search for signs of doubt, frustration, stress or fatigue in students [15]; pedagogical issues that are inherent to conventional learning [16, 17]; and the absence of tutors with experience to guide the learning process [17, 18]. These problems raise the need to pursue mechanisms to improve the use of Virtual Worlds in education and guarantee the effective fulfilment of learning objectives [19, 20].
In this context, Learning Analytics would contribute to solving some of the issues
cited above. Learning Analytics refers to the measurement, collection, analysis and
reporting of data about learners, teachers and their contexts, for purposes of under‐
standing and optimizing learning and the environments in which it occurs [21]. Applied
to learning environments, Learning Analytics enables the analysis of data about teachers
and learners who use the environment in order to identify behaviour patterns, assess the
learning process, improve the overall learning experience and reflect on the learning
activity of the users [22, 23]. Learning
Analytics seeks to exploit educational data to deliver feedback to learners and teachers
in the system [24]. In the case of Virtual Worlds used in education, the analysed data
can come either from interactions of avatars with other users, the 3D objects of the virtual
world, or with the Virtual World itself (e.g. frequency of use, task accomplishment,
movement patterns, preferred locations) [7, 25].
Since decisions in education – or in any field – should be informed and based on the
choice of the best available option [26], Learning Analytics would contribute
useful indicators that help pedagogical managers see things from new viewpoints, reduce
blind spots, assimilate complex data structures and address issues from ‘in-production’
courses. Thus, the aim of this study is to explore how Learning Analytics has been used,
to date, for decision-making intended to address the issues that would impact the
fulfilment of learning objectives using Virtual Worlds.
To meet our research objective, we performed an extensive review of literature [27]
to study the contributions that link the use of Virtual Worlds in education with Learning
Analytics. We performed our review by collecting articles from the last 10 years
indexed in Science Direct, the IEEE Xplore library, the ACM Digital Library and the
Springer Digital Library.
The rest of the paper is organized as follows. The next section describes the contri‐
butions found in our review. In Sect. 3, we present how the contributions deal with the
issues cited above in this introduction. Finally, Sect. 4 offers a discussion of the
literature found and our conclusions.
Improving the Use of Virtual Worlds in Education Through Learning Analytics 1125

2 Using Learning Analytics on Virtual Worlds Used in Education

In this section, we present all the articles found during our review grouped according to
three perspectives: Personalized Learning, Adaptive Learning and Educational Inter‐
vention [28].

2.1 Personalized Learning

Personalized learning refers to instruction where the Virtual World can be set up to meet
each learner’s needs. The improvement of the learning process is obtained from the analysis
of the data of each learner to customize the environment. This customization increases
the learners’ personal motivation and facilitates the design of strategies for educative
coaching [29]. Personalized learning also allows the development of learning schemes
in which individual research and experimentation are promoted. It provides a unique,
highly focused learning path for each student. Contributions that meet these objectives
are presented below.
In their research, [7] propose a framework for the recovery and analysis of data
related to educational settings of virtual worlds. For this, the authors implemented a
pharmaceutical industrial laboratory named Usalpharma Lab, which is a virtual labora‐
tory in Second Life. The virtual laboratory represents all the installations, equipment
and the documentation needed for teaching ‘Good Laboratory Practices’. Both students
and teachers are represented as avatars. Teachers guide and evaluate the activities
proposed to students during the course, which means that they must be present when
the activities are in progress. Every action that occurs in the Virtual World, whether
originated by the user or by any event, is saved into a database. The data is later exploited through
a framework, which includes the following layers: (1) the ‘evidence description layer’
that collects the evidence of interactions between the learner and the Virtual World, (2)
the ‘collector layer’, which is responsible for processing the data sent by the description
layer, (3) the ‘storage layer’ that is where the data processed is stored, (4) the ‘analysis
layer’, which analyses data and also maps the information inside a database (several
statistical procedures and data mining methods are executed in this layer), and (5) the
‘presentation layer’, which is responsible for the presentation of information to final
users or other applications integrated with this architecture. The main particularity of
their approach is that the learner is at the centre of the architecture, since the learner’s
interactions are analysed through the five-layer framework, which, in turn, leads to
actions that improve the Virtual World.
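The five-layer flow described above can be sketched as a small processing pipeline. The following is a hypothetical illustration, not code from [7]: the layer names follow the paper's description, but every function, field, and event is invented for the example.

```python
from collections import Counter

# Hypothetical sketch of a five-layer analytics pipeline in the spirit of [7]:
# evidence -> collector -> storage -> analysis -> presentation.
# All identifiers and event fields are illustrative, not the original framework's API.

def evidence_layer(raw_event):
    """Describe an interaction between a learner and the Virtual World."""
    return {"avatar": raw_event["avatar"], "action": raw_event["action"]}

def collector_layer(evidence):
    """Process/normalize the evidence sent by the description layer."""
    evidence["action"] = evidence["action"].lower()
    return evidence

class StorageLayer:
    """Persist processed events (an in-memory list stands in for a database)."""
    def __init__(self):
        self.events = []
    def save(self, event):
        self.events.append(event)

def analysis_layer(storage):
    """Run a simple statistic: how often each action occurs per avatar."""
    return Counter((e["avatar"], e["action"]) for e in storage.events)

def presentation_layer(stats):
    """Format results for final users (e.g. teachers) or other applications."""
    return [f"{avatar} did '{action}' {n} time(s)"
            for (avatar, action), n in stats.items()]

storage = StorageLayer()
for raw in [{"avatar": "student1", "action": "Open-Door"},
            {"avatar": "student1", "action": "open-door"},
            {"avatar": "student2", "action": "Read-SOP"}]:
    storage.save(collector_layer(evidence_layer(raw)))

report = presentation_layer(analysis_layer(storage))
```

Chaining the layers this way keeps each concern separate, which is the point of the architecture: any layer (e.g. the analysis) can be swapped without touching event capture or storage.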
In their work, [25] identified and validated learners’ behaviour and patterns with the
intention of avoiding or reducing student defections in virtual courses. They offered insights
about the advantages of the structures, contents and interactions of Virtual Worlds when
compared with other types of Virtual Environments. After a class, both teachers and
students evaluate aspects such as acceptance and relevance through surveys. Their
responses then lead to actions that avoid students’ defections and improve the
adoption of the Virtual World. The researchers pointed out the importance of two aspects
of Virtual Worlds used in education for gaining learners’ acceptance: (1) the versatility of
interaction with other users offered by Virtual Worlds (e.g. gestures, text chat, voice
chat), and (2) the freedom of movement across an open world (i.e. displacements
between virtual islands or virtual lands), which makes it easier for learners to find places
adapted to their preferences.
In their article, [30] report the development of a methodology for studying the
behaviour of users with autism through a Virtual World. For collecting data, the authors
defined a three-level scheme to analyse reciprocal interaction, which consists of: (1) a
first ‘interaction mode level’ that describes reciprocal interactions (i.e. initiations,
responses and continuation of activities and tasks) with focus on the social interactions
among participants, (2) a second ‘interaction mode level’ that considers aspects such as
the duration of the activities, or learners’ patterns in social activities (i.e. verbalization,
text messages or avatar gestures), and (3) a ‘context level’ that describes learners’
engagement and technological supports. The authors personalize the Virtual World
based on the data collected from the platform and on the facial reactions and
gestures captured by a camera.
In [31], the authors attempted to apply Learning Analytics methods for studying
students with social behaviour disorders. They used a collaborative Virtual World named
iSocial. The authors focused on exploring tools that would allow them to make sense of
the data collected from the Virtual World. Then, they focused on answering questions
about how participants with social behaviour disorders use their avatars while following an
instruction. Data was collected in two forms: (1) by recording the movements and posi‐
tions of the avatars, and (2) by filming the movements and gestures of students in the
real world, synchronizing them with the actions captured from the Virtual World. The
main contribution of this research resides in how the authors used data visualization
techniques to understand individual students’ behaviour in the Virtual World, since each
student was considered a special case.
In their work, [32] report their experiences studying a virtual office conceived for
teaching aspects of information security. The Virtual World was implemented in
Second Life. The aim of the Virtual World was to study the impacts on the achievement of
learning outcomes through constructivist learning. The authors customized the learning
process for two groups of students: a control group and an experimental group. The authors
used the experimental group for introducing and testing improvements to the Virtual
World and evaluating the results. Later, they analysed which of the changes led to situa‐
tions where the students of the experimental group showed better perceived learning
achievements than the students of the control group. This trial-and-error process allows
testing learning strategies and retaining only those that prove effective for a student or a
group of them.
In [33], the authors present a predictive student action model for Virtual Worlds used in
education. Using this model, it is possible to predict common behaviours of students
by analysing sequences of common mistakes. The authors took data from error logs and
clustered it while observing the times at which errors occur until students complete the
entire practice. Each resulting cluster is then represented by an automaton that is
used for generating typologies of students. The authors implement their methods in
what they call the Student Behaviour Predictor, which has mainly been used to predict
the most probable future action based on the last action. This kind of analysis would
allow personalizing the learning process based on the actions of each student. The model
proposed by these authors helps students to execute actions and fulfil learning
objectives using predictive methods.
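Predicting "the most probable future action based on the last action" corresponds to a first-order transition model. The sketch below is a hypothetical illustration of that idea, not the authors' Student Behaviour Predictor; the action names and training sequences are invented.

```python
from collections import defaultdict, Counter

# Illustrative first-order predictor in the spirit of [33]: count observed
# action -> next-action transitions, then return the most frequent successor.

def train(action_sequences):
    """Count action -> next-action transitions over observed student sessions."""
    transitions = defaultdict(Counter)
    for seq in action_sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions

def predict_next(transitions, last_action):
    """Return the most frequent action observed after `last_action`, if any."""
    if last_action not in transitions:
        return None
    return transitions[last_action].most_common(1)[0][0]

# Invented practice sessions, including a recurring mistake sequence.
sessions = [
    ["wear_gloves", "open_valve", "spill_reagent"],
    ["wear_gloves", "open_valve", "close_valve"],
    ["wear_gloves", "open_valve", "spill_reagent"],
]
model = train(sessions)
```

With this toy data, `predict_next(model, "open_valve")` flags `spill_reagent` as the most likely next step, which is the kind of signal that could trigger a preventive hint before the mistake happens.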
In [34], the author describes the evolution of computer tools in the transition from
e-learning to v-learning, reporting the opportunities that the latter provides,
especially in public higher education. In this study, the researcher analyses some factors
(e.g. motivation) in younger students while they visit a 3D virtual library in Second
Life. A description of the main tools aimed at supporting the transition from e-learning to
v-learning is also offered. The author highlights the psychological implications of learners’
experience in Virtual Worlds for future studies.

2.2 Adaptive Learning


This approach focuses on automatically adapting the learning design, learning process, and
methodologies according to the cognitive schemes of students or to the identification
of areas where they have difficulties [35]. The customizations come as the result of
analysing the data that is captured while students follow a course, just like in Personalized
Learning. However, even though Personalized Learning and Adaptive Learning look
similar, they are not the same. While Personalized Learning refers to customizations made by
an instructor, Adaptive Learning refers to techniques that allow monitoring a student’s
progress and modifying instruction in real time. In our review, we only found
a single contribution that analyses and uses data in real time.
In [36], the authors propose a framework for the use of Virtual Worlds in education
focused on the identification of learning flows and the verification of students’ satisfac‐
tion through process mining techniques. Their framework has a core based on a Virtual
World platform known as OPENET4EVE. The authors propose a feature to model
learning processes in Virtual Worlds that can monitor and register the events generated
by students and teachers. Then, they use a Process Miner System to study the real flow of
information in a course. The resulting adaptations can generate a new structure of the learning
process or even a new learning strategy that can be exploited in other case studies.
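As a toy illustration of the kind of event-log analysis process mining performs, the sketch below derives a directly-follows relation from a course log, the starting point of many process-discovery algorithms. It is not OPENET4EVE's miner; the log format and all activity names are invented.

```python
# Toy process-mining step: derive a directly-follows graph from a course event
# log. Each edge (a, b) means activity b was observed immediately after a
# within at least one student's trace.

def directly_follows(event_log):
    """event_log: list of (case_id, activity) pairs, ordered in time per case."""
    traces = {}
    for case_id, activity in event_log:
        traces.setdefault(case_id, []).append(activity)
    edges = set()
    for trace in traces.values():
        edges.update(zip(trace, trace[1:]))
    return edges

# Invented log: student s1 follows the full flow, s2 skips the task.
log = [("s1", "read_brief"), ("s1", "do_task"), ("s1", "quiz"),
       ("s2", "read_brief"), ("s2", "quiz")]
graph = directly_follows(log)
```

Comparing the discovered edges against the intended learning flow (here, the `("read_brief", "quiz")` shortcut reveals that s2 skipped the task) is the kind of check such a framework can use to verify whether students actually follow the designed learning process.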

2.3 Educational Intervention

This approach is a useful instrument to reduce student failure and promote compe‐
tency-based learning. The aim is to influence the skills development of a learner to ensure
his/her successful training and education [37]. It allows obtaining predictions about the
attitude and behaviour that a student would adopt when confronted with specific
content, an evaluation or group work. Once again, we were able to find only one contri‐
bution that fits in this category.
In their work, [38] explore the scope of Virtual Worlds and adopt a typology for
virtual communities based on the work of Porter [39]. For each community,
they described five elements: purpose, place, platform, population, and profit model.
The authors selected Second Life as a representative case study for applying two surveys
and analysing the results. Finally, they provide guidelines for the implementation of future
Virtual Worlds centred on Education, Social Sciences and Humanities. The authors used
the five elements of Porter’s typology in order to propose adaptations to Virtual Worlds that provide
learners with the skills needed to succeed in their courses.

3 Solving Issues Concerning the Use of Virtual Worlds Through
Learning Analytics

In Table 1, we summarize how each of the contributions described above brings solutions
for the most common issues in the use of Virtual Worlds in education.

Table 1. Problems related to the use of Virtual Worlds in education

No. Problem                                           Personalized       Adaptive  Educational
                                                      learning           learning  intervention
1   Identifying conflictive user behaviours           [7, 25, 30–34]     [36]
2   Tracking the students’ interactions with          [7, 25, 30–34]     [36]      [38]
    elements of the Virtual World
3   Lack of indicators for following up the           [7, 30]            [36]
    progress of the students in the courses
4   Lack of implementation of well-defined                                         [38]
    evaluation parameters
5   Difficulties in evaluating the individual and     [7, 30–34]
    collective contributions while the students
    handle tasks
6   Difficulty of keeping students engaged and        [25, 30, 31, 34]
    motivated
7   Very time-consuming supervision by teachers       [7, 32]
    in the search for signs of doubt, frustration,
    stress or fatigue from students
8   Pedagogical issues that are inherent to           [7, 34]
    conventional learning
9   Absence of virtual tutors for guiding the         [7, 25, 30–32, 34] [36]
    learning process

Concerning Personalized Learning, we can see in Table 1 that most of the
contributions offer solutions for identifying conflictive user behaviours, tracking
students’ interactions with elements of the Virtual World, evaluating individual and
collective contributions while the students handle tasks, keeping students engaged and
motivated, and compensating for the absence of virtual tutors for guiding the learning
process. These results are not surprising, since Learning Analytics has proven to be very
useful for dealing with these problems in Virtual Environments. Additionally, the problems
listed above are operational in nature and refer to situations where technological contri‐
butions are easier to implement and evaluate. Conversely, more complex problems (i.e.
lack of implementation of well-defined evaluation parameters, teachers’ very time-
consuming supervision, and pedagogical issues that are inherent to conventional
learning) have received less attention. Dealing with such issues requires a higher
abstraction level and the construction of the right indicators for supporting the peda‐
gogical decision process. Nonetheless, some customizations for dealing with these
problems have been implemented based on the analysis of data.
Contributions bringing solutions for Adaptive Learning were, by far, fewer than
those for Personalized Learning. The single contribution that uses Learning Analytics
deals with several issues: identifying conflictive user behaviours, tracking students’
interactions with elements of the Virtual World, following up the progress of the students
in the courses, and the absence of virtual tutors for guiding the learning process.
Conversely, the problems that remain unaddressed are: lacking well-defined evaluation
parameters, difficulties in evaluating the individual and collective contributions while
the students handle tasks, difficulties in keeping students engaged and motivated,
teachers’ very time-consuming supervision, and pedagogical issues that are inherent to
conventional learning. Once again, the attention rests in the field of operational deci‐
sions. However, in this case, it is not surprising, since more complex decisions cannot
be taken in a real-time fashion.
Regarding Educational Intervention, contributions are also scarce. As can be seen
in Table 1, the contribution identified in this category deals with only two issues:
tracking students’ interactions with elements of the Virtual World and the implementation
of well-defined evaluation parameters. This is not surprising, since competence-based
learning demands complex analysis of data that should respond to the information needs
of pedagogical experts.
Regarding the platforms used for implementing the Virtual Worlds, which were later
supported by Learning Analytics mechanisms, most of the studies used well-established
platforms for hosting virtual worlds (Table 2). Second Life was the most used platform
by the contributions retained in our review. Open Wonderland [40, 41] and Open Simu‐
lator [36], both distributed under Open Source licences, were also preferred by
researchers. The latter two platforms also offer flexibility for implementing monitors
that collect data. The remaining contributions developed their own virtual worlds using
game development platforms such as Unity.

Table 2. Platforms used on retained studies


3DVLE Contributions using the platform
Second Life [7, 32, 34, 38]
Open Wonderland [30, 31]
Open Simulator [25, 33]
OPENET4EVE [36]

4 Discussion and Conclusions

Learning Analytics is a powerful tool for improving the use of Virtual Worlds in educa‐
tion. Our review shows that most of the contributions in this field centre on support for
Personalized Learning. That means most of the research was centred on supporting
decisions whose nature falls on operational, non-real-time tasks (e.g. identifying conflic‐
tive user behaviours, tracking the students’ interactions with virtual elements, following up
the progress of the students, compensating for the absence of virtual tutors for guiding
the learning process). On the other hand, the problems with total or relative absence of
treatment were more complex, strategic issues: implementation of well-defined evaluation
parameters, evaluation of individual and collective contributions, keeping engagement and
motivation, reducing supervision time, and pedagogical issues that are inherent to conven‐
tional learning. Therefore, research opportunities are open in the field of Learning
Analytics for supporting the decision-making of teachers and pedagogical authorities
concerning ‘strategic’ decisions about contents, pedagogical design, linearity of the
learning process, Virtual World design, interfaces, evaluation mechanisms, teamwork,
interactions among users, etc.
Surprisingly, few of the contributions can be classified in the camp of Adaptive
Learning. A research opportunity arises in the development of models for automatic
decisions based on real-time data recovered from Virtual Worlds used in education. An
opportunity is also offered for contributions in the field of Educational Intervention,
where the identification of relevant indicators for developing competences and reducing
student defection is needed.
None of the research reviewed has contributed to the development and application
of a framework for dealing with the decision-making needs of the decision makers of
these courses, at either the operational or the strategic level. Instead, the cited contributions
focus on a few aspects of decision-making without following a holistic approach. Even
worse, none have reported the results of asking teachers or pedagogical authorities about
their information needs.

References

1. Saracevic, M.: Concept and types of virtual environments: research about positive impact on
teaching and learning. UNITE: University Journal of Information Technology and Economics
1(1), 51–57 (2014)
2. Letouze, P., Prata, D., Barcelos, A., Barbosa, G., Franc, G., Rocha, M.: Is technology
management education a requirement for a virtual learning environment? In: Technology &
Engineering Management Conference, pp. 404–408 (2017)
3. Milkova, E., Slaby, A.: E-learning as a powerful support of education at universities. In: 28th
International Conference on Information Technology Interfaces, pp. 83–88 (2006)
4. Drachen, A., Sifa, R., Thurau, C.: The name in the game: patterns in character names and
gamer tags. Entertain. Comput. 5(1), 21–32 (2014)
5. Lan, Y., Hsu, T.: Guest editors’ introduction: special issue “ICT in language learning”. Res.
Pract. Technol. Enhanc. Learn. 10(1), 21 (2015)
6. Kumar, S., Daniel, B.: Integration of learning technologies into teaching within Fijian
Polytechnic Institutions. Int. J. Educ. Technol. High. Educ. 13(1), 36 (2016)
7. Cruz-Benito, J., Therón, R., García-Peñalvo, F., Maderuelo, C., Pérez-Blanco, J., Zazo, H.,
et al.: Monitoring and feedback of learning processes in virtual worlds through analytics
architectures: a real case. In: 9th Iberian Conference on Information Systems and
Technologies (CISTI), pp. 1–6 (2014)
8. Virvou, M., Katsionis, G., Manos, K.: Combining software games with education: evaluation
of its educational effectiveness. J. Educ. Technol. Soc. 8, 54–65 (2005). International Forum
of Educational Technology & Society
9. Bremer, P., Weber, G., Tierny, J., Pascucci, V., Day, M., Bell, J.: Interactive exploration and
analysis of large-scale simulations using topology-based data segmentation. In: IEEE
Transactions on Visualization and Computer Graphics, pp. 1307–1324 (2011)
10. Wojciechowski, R., Cellary, W.: Evaluation of learners’ attitude toward learning in ARIES
augmented reality environments. Comput. Educ. 68, 570–585 (2013)
11. Williams, D.: The mapping principle, and a research framework for virtual worlds. Commun.
Theory 20(4), 451–470 (2010)
12. Oliveira, F., Santos, S.: PBLMaestro: a virtual learning environment for the implementation
of problem-based learning approach in computer education. In: 2016 IEEE Frontiers in
Education Conference, pp. 1–9 (2016)
13. Bandura, A.: Perceived self-efficacy in cognitive development and functioning. Educ.
Psychol. 28(2), 117–148 (1993)
14. Hmelo-Silver, C.: Problem-based learning: what and how do students learn? Educ. Psychol.
Rev. 15(3), 22–30 (2004)
15. Goncalves, S., Carneiro, D., Alfonso, J., Fdez-Riverola, F., Novais, P.: Analysis of student’s
context in e-Learning. In: 2014 International Symposium on Computers in Education, pp.
179–182 (2014)
16. Panchoo, S.: Learning space: assessment of prescribed activities of online learners. In: 2017
International Conference on Platform Technology and Service, pp. 1–4 (2017)
17. Boojihawon, D., Gatsha, G.: Using ODL and ICT to develop the skills of the unreached: a
contribution to the ADEA triennial of the Working Group on Distance Education and Open
Learning, pp. 12–17 (2012)
18. Abrami, P., Bernard, R., Wade, A., Schmid, R., Borokhovski, E., Tamin, R., Newman, S.: A
review of e-learning in Canada: a rough sketch of the evidence, gaps and promising directions.
Can. J. Learn. Technol./La revue canadienne de l’apprentissage et de la technologie 32(3)
(2008)
19. Carmody, K., Zane, B.: Existential elements of the online learning experience. Int. J. Educ.
Dev. Using ICT 1(3), 108–119 (2005)
20. Panayides, M.: The impact of organizational learning on relationship orientation, logistics
service effectiveness and performance. Ind. Mark. Manag. 36(1), 68–80 (2007)
21. Sungkur, R., Santally, M., Peerun, S., Foo, R., Wu, Y., Wah, T., et al.: True sight learning-
an innovative tool for learning analytics. In: IEEE International Conference, Emerging
Technologies and Innovative Business Practices for the Transformation of Societies, pp. 235–
240 (2016)
22. Einhardt, L., Tavares, T., Cechinel, C.: Moodle analytics dashboard: a learning analytics tool
to visualize users interactions in moodle. In: Proceedings - 2016 11th Latin American
Conference on Learning Objects and Technology, pp. 1–6 (2016)
23. Gros, B.: The design of smart educational environments. Smart Learn. Environ. 3(15), 1–11
(2016)
24. Johnson, J., Shum, S., Willis, A., Bishop, S., Zamenopoulos, T., Swithenby, S., Bourgine, P.:
The FuturICT education accelerator. Eur. Phys. J. Spec. Top. 214(1), 215–243 (2012)
25. Cruz-Benito, J., Therón, R., García-Peñalvo, F., Lucas, E.: Discovering usage behaviors and
engagement in an Educational Virtual World. Comput. Hum. Behav. 47(1), 18–25 (2015)
26. Simon, H.: A mechanism for social selection and successful altruism. Science 250(4988),
1665–1668 (1990)
27. Budgen, D., Brereton, P.: Performing systematic literature reviews in software engineering.
In: Proceeding of the 28th International Conference on Software Engineering, p. 1051 (2006)
28. Sclater, N.: Learning Analytics Explained, 1st edn. Taylor & Francis, London (2017)
29. Hwang, G.: Definition, framework and research issues of smart learning environments - a
context-aware ubiquitous learning perspective. Smart Learn. Environ. 1(1), 4 (2014)
30. Schmidt, M., Laffey, J., Schmidt, C., Wang, X., Stichter, J.: Developing methods for
understanding social behavior in a 3D virtual learning environment. Comput. Hum. Behav.
28(2), 405–413 (2012)
31. Schmidt, M., Laffey, J.: Visualizing behavioral data from a 3D virtual learning environment:
a preliminary study. In: 45th Hawaii International Conference on System Sciences, pp. 3387–
3394 (2012)
32. Chau, M., Wong, A., Wang, M., Lai, S., Chan, K., Li, T., et al.: Using 3D virtual environments
to facilitate students in constructivist learning. Decis. Support Syst. 56(1), 115–121 (2013)
33. Riofrio-Luzcando, D., Ramírez, J.: Predictive student action model for procedural training in
3D virtual environments. Intell. Tutoring Syst. Struct. Appl. Chall. 1(1), 1–2 (2016)
34. Tick, A.: A new direction in the learning processes, the road from eLearning to vLearning.
In: 6th IEEE International Symposium on Applied Computational Intelligence and
Informatics, pp. 359–362 (2011)
35. Ro, T., Bari, B.: Adaptive e-learning environments: research dimensions and technological
approaches. Int. J. Distance Educ. Technol. 11(3), 1–11 (2013)
36. Fernández-Gallego, B., Lama, M., Vidal, J., Mucientes, M.: Learning analytics framework
for educational virtual worlds. Procedia Comput. Sci. 25(1), 443–447 (2013)
37. Atkisson, M., Wiley, D.: Learning analytics as interpretive practice. In: Proceedings of the
1st International Conference on Learning Analytics and Knowledge vol. 1, no. (1), p. 117
(2011)
38. Messinger, P., Stroulia, E., Lyons, K., Bone, M., Niu, R., Smirnov, K., et al.: Virtual worlds
- past, present, and future: New directions in social computing. Decis. Support Syst. 47(3),
204–228 (2009)
39. Porter, C.: A typology of virtual communities: a multi‐disciplinary foundation for future
research. J. Comput. Mediat. Commun. 10(1) (2004)
40. Kaplan, J., Yankelovich, N.: Open wonderland: an extensible virtual world architecture. IEEE
Internet Comput. 15(5), 38–45 (2011)
41. Allison, C., Campbell, A., Davies, C., Dow, L., Kennedy, S., McCaffery, J., et al.: Growing
the use of Virtual Worlds in education: an OpenSim perspective. In: Proceedings of the 2nd
European Immersive Education Summit (2012)
Design and Evaluation of an Online Digital
Storytelling Course for Seniors

David Kaufman (✉), Diogo Silva, Robyn Schell, and Simone Hausknecht

Simon Fraser University, Burnaby, BC V5A1S6, Canada
dkaufman@sfu.ca

Abstract. Purpose. The purpose of this proposed project was to develop and
evaluate an online version of a digital storytelling course delivered through the
university’s Canvas learning platform. Background. In digital storytelling, partic‐
ipants write their personal stories in a clear and linear structure, and then create
short movies using relatively simple video editing software. This provides an
opportunity to share life lessons, leave a legacy, and engage socially with their
peers. Method. We adapted the content and activities from the earlier face-to-face
course into weekly online modules. The target audience comprised 15 older adults
between 60 and 75 years old. A Research Assistant (RA) provided online assis‐
tance via Skype when requested. A qualitative approach was employed to
collect data, including a demographic questionnaire, module questionnaires, a
course evaluation survey near the end, and individual interviews. Results. The
findings of our evaluation showed that 9 of the 15 participants were able to
complete the online course in varying timeframes. Participants’ feedback was
very positive and all participants who completed the course reported that they
would recommend it to a friend. Conclusion. Two key suggestions emerged for
improving the course. First, make the time and workload requirements clear
during the recruitment process. Second, investigate ways for reducing the time
required to complete the course in future offerings. Despite these suggestions, the
results appear to provide support for offering the digital storytelling online course
to a wider audience of older adults.

Keywords: Digital storytelling · Online course · Seniors

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1133–1141, 2019.
https://doi.org/10.1007/978-3-030-02686-8_84

1 Background

Digital storytelling is a multimedia experience that layers narrative, music, sound,
visuals, and sometimes film segments into a short video artifact. To create a digital story,
storytellers reflect upon their past, write a script, and then add the media pieces. These
pieces are tied together through the use of digital technology. This requires that the
learner reflects on a personal story and incorporates media to portray mood and meaning
[1]. The process involved in creating a digital story requires learners to engage with
multimedia technology that may increase visual and digital literacy, while also requiring
consideration of the meaning of the various media elements being used [2]. Furthermore,
the finished product can be shared with a large or small audience, including family and
friends. With the creation of the digital story, the learner is not only learning about
technology as a consumer, but also as a producer [3]. According to the founder of the
StoryCenter, digital storytelling is defined as a process for “gathering of personal stories
into short little nuggets of media called digital stories” (p. 1) [4]. Digital stories are
meaningful and powerful artifacts of modern expression because they integrate narra‐
tive, images, and music, which provides depth to characters, situations, experiences, and
insights [5]. Digital stories are generally three- to five-minute films that combine images,
text, narration, and music. The images are often personal photos from meaningful events
in the person’s life. The script is created by the storyteller to be incorporated into the
film using a recorded voiceover [6].
Digital storytelling is used in a variety of fields, including education, where teachers
use it as a powerful tool in 21st-century classrooms [7]. It is also
used in cultural studies as a method for creative practice and serves as an additional
resource for oral and historical archives [8]. Digital storytelling is also implemented
globally as a resource for marginalized populations to share their stories [9]. Benefits
of digital storytelling in education include improved literacy skills,
promotion of 21st-century skills, and engagement of students and teachers [7]. Other bene‐
fits include improved self-efficacy and adoption of educational
technologies [10].
Digital storytelling has also been used to increase adoption of new technologies since
once the project is complete, participants may have an increased sense of self-efficacy
[10]. In a study by Heo [10] with 98 pre-service teachers using a quasi-experimental design, it was found that after completing a digital storytelling assignment, students improved both their self-efficacy and their attitudes towards using technology with other learners in the classroom. Regarding older adult learners, increasing technology efficacy
can also be important to adoption. The current study aimed to examine whether older
adults reported an increase in digital literacy skills after working through a digital story‐
telling project in a 10-week course.
Several benefits of this digital storytelling course curriculum were found previously for older adults [11, 12]. These benefits include empowering participants; supporting social connections with course participants, friends, and family; providing a means for legacy creation; increasing digital storytelling, technology, and internet skills; and offering an opportunity to share stories with others and to learn something new [11, 12].
Storytelling, both digital and traditional, is a social activity that involves a high level
of communication. Previous studies have shown that a lack of communication and social
connection to others can contribute to isolation and loneliness [13, 14], which in turn
can result in problems such as depression and cognitive decline for older adults. It is
important to provide opportunities where older adults can share their experiences, make
connections, and build relationships with others in a positive and supportive social
environment.
The creative process used in digital storytelling can provide older adults with the
means to capture and reflect on memories and lived experiences. The ‘wisdom’ accu‐
mulated through their lives is often valuable to them and also may be to others.
Digital stories can be shared publicly by uploading them to the Internet, saving them
on a digital media device such as a flash drive, or showing them to others in public
Design and Evaluation of an Online Digital Storytelling Course for Seniors 1135

events. With the use of new media, digital stories may allow the everyday voices of elders to be heard more widely [8]. A digital storytelling course can also be viewed
as a lifelong learning experience. Many of the positive aspects attributed to lifelong
learning have been found to lead to an increase in the well-being of older adults [15].
Previous research on storytelling and older adults has examined the effects of recalling lived experiences and sharing them through autobiographical narratives and reminiscence. Studies [16, 17] suggest that sharing autobiographical narratives can have several positive effects for older adults, such as increased self-esteem, a stronger identity, and finding increased meaning in their lives. A review by Bohlmeijer et al. [16] on reminiscence research, a process of recalling events in a person’s life, found that reminiscing had a
moderate effect on life-satisfaction and well-being. In another example, Meléndez Moral
et al. [18] conducted research on integrative reminiscence. In this style of reminiscence,
participants recall events and try to integrate past and present to form meaning. Results
suggested that reminiscence led to positive outcomes of increased self-esteem, life inte‐
gration, life satisfaction, psychological well-being and reduced depression.
Older adults often express an interest in sharing their life experiences with their
family and society. Sharing their stories can be considered as an act of leaving a life
legacy, so that older adults’ family and other individuals may learn through their life
experiences [17]. Older adults may feel that leaving a legacy is a way to keep their
“presence” alive, even after death [19]. Older adults may also wish to share their many
life experiences with a wider audience as these stories may resonate with others outside
their respective families. Digital media provides a way to store, preserve, and share the
digital legacies of older adults [20].
The events within a person’s life are often examined and re-examined for meaning
at different times across the life course. Various researchers have suggested that this
meaning making process contributes to a person’s identity [21–23]. A person will
examine their past experiences to better understand how they became their current self
[21]. New experiences and themes must be evaluated against previous life stories; therefore, “the life story itself develops in terms of its content and themes” (p. 86) [21]. McLean,
Pasupathi and Pals [23] suggest that situated stories, those created and told within a
specific situation that have a specific purpose and audience, may affect the development
of self throughout the lifespan. Life stories, similar to ideas of cognitive dissonance, are
most impactful on self-identity when they are about a challenge or disruption [23].

1.1 Outline of the Digital Storytelling Course


The course is intended to give participants an opportunity to explore their life stories and create a digital artefact, so that they can easily share a piece of wisdom or a legacy story from their lives with course participants and others. Storytelling is not done in
isolation, as stories are created to be shared with others. Thus, this course was not
intended to be simply an isolated activity, but the stories were shared at the end in a
“sharing our stories” event within the community. Informed by StoryCenter (previously
The Center for Digital Storytelling), the Digital Storytelling Cookbook [24], and creative
writing and film techniques, the digital storytelling course for older adults was designed
with two separate, yet integrated, phases: story creation and digital production. This

enabled sufficient time for deciding on and then writing a solid story before incorporating
the technology. Similarly, Ohler [25] suggested that the first task in digital storytelling
is to teach learners how to be storytellers, and the second is to use multimedia to enhance
their story. Additionally, the course was designed to create as many collaborative expe‐
riences as possible to enhance community, social connection, sharing, and knowledge
construction; however, each participant worked on their own digital story.
The authors designed the course outline, unit plan, weekly lesson plans, and weekly
handouts which were used by all facilitators to ensure the content and delivery were
standardized. During the course, participants learned about story creation and were
provided with numerous opportunities to share ideas and drafts of their stories.
Following story creation, participants digitized their work by combining voice, images,
music, and sounds to illustrate their narrative (Table 1). Although there were many
opportunities to exchange stories and share understanding of each other’s life histories,
social opportunities became more limited as participants spent increased time at their
computers focused on their individual stories. The course consists of an outline module,
nine activity modules (Weeks 1 to 9) and a final module for participants to comment on
each other’s digital video productions (Week 10).

Table 1. Outline of the online digital storytelling course


Week 1 Introducing the course (and evaluation study)
Week 2 Introducing WeVideo, and practice creating a verbal story
Week 3 Writing a script (draft)
Week 4 Sharing the story with peers and revising the script
Week 5 Finding/preparing images and creating a storyboard
Week 6 Recording the narrative in own voice and adding sound/music
Week 7 Editing images and narrative
Week 8/9 Editing and adding final touches
Week 10 Publishing and sharing final digital story & feedback to peers

WeVideo, a browser-based digital storytelling application, was chosen because it runs in a web browser and thus allows access from both Windows and Apple computers on the
Internet. Thus, it was expected that participants would spend some time working on their
stories outside of the course.
After offering this course face-to-face several dozen times, we created an online
version to be delivered through the university Learning Management System (LMS)
called Canvas. The length of time to complete the course and create a digital story
generally depends on the storyteller’s commitment to the project, their technology skills
and the complexity of the story. We estimated that it would take between 25–30 h, with
some materials given prior to the workshop to assist participants in preparation (e.g.,
suggestions for script writing, image selection). We expected that by the end of the
course, participants would be able to create a three to six-minute digital story. This
timeline needed to be somewhat flexible to provide a successful and enjoyable experi‐
ence for the participants.

2 Evaluation Method

2.1 Evaluation Questions

1. What are the participants’ perceptions and opinions of the learning design?
2. What are the participants’ perceptions and opinions of their learning experience?

2.2 Evaluation Approach

The target audience comprised 15 older adults between 60 and 75 years old recruited
through the university’s seniors program and several long-term care facilities. Nine
completed the course and participated in the evaluation study. A qualitative approach
was used for this research and development project. The method employed self-report questionnaires at the end of each module. Each participant provided data regarding the quality, clarity, and efficacy of the course design and of the material provided to attain each module’s goals and complete the proposed tasks. To capture the participants’ perceptions of specific details, the same questionnaire was presented at the end of each module. In addition, there was a second set of questions for the sole purpose of
evaluating the instructional videos. At the end of the course, we asked for an overall
course evaluation, and followed up with individual interviews to understand the written
evaluations.

2.3 Evaluation Procedures


The following procedures describe the major elements of our research study:
• Designed participant surveys, interview guide, and rubrics to analyze stories.
• Actively recruited participants through current contacts as well as new ones.
• Delivered the online course and collected data.
• Conducted surveys online after each module.
• Administered online survey at conclusion of course.
• Interviewed the nine participants who finished the course, via Skype and telephone.
• Analyzed data continuously and adapted the interventions, if required.
• Uploaded participants’ stories on the university vault to ensure security.

2.4 Evaluation Instruments

The course evaluation relied on a pre-questionnaire, module questionnaires, a course evaluation, and an interview guide. The evaluation instruments were applied as follows:
• Pre-questionnaire (end of the course outline Week 1)
This questionnaire was used to collect information on participants’ age, gender,
computer literacy and usage frequency, video editing software skills, and if they had
taken online courses before.

• Course evaluation (end of Week 10)


The course evaluation comprised a Likert scale assessing the helpfulness of the
course facilitator, the process used to guide participants in writing their personal story,
the quality of the video editing software, the course’s level of difficulty, participants’
level of satisfaction with the course, and whether or not participants would recommend
the course to a friend. It also contained open-ended questions about what participants
liked the most and the least on the module, what could be changed, and if they would
like to add any comments.
• Interview questionnaire guide (end of the course)
The interview guide comprised eight open-ended questions used for an audio-recorded interview via Skype. Participants were asked to talk about the following issues: (1) their experience in taking the online course, (2) what they liked
most and least about it, (3) if the written instructions and instructional videos clearly
explained the expected outcomes of the activities, (4) whether this material had been
enough to instruct them throughout the modules, and (5) whether the absence of a person
explaining face-to-face made them feel insecure. It also asked participants what could
be changed, and if they would like to add any comments.

3 Data Analysis

We coded and analyzed the data for the questions that included quantitative responses using Excel, since the number of participants was small. Using a content analysis framework, we developed codes and categories to help us identify broad themes in the data.
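As an illustration of this workflow, the sketch below shows how Likert tallies and code-frequency counts might be computed in Python rather than Excel. The ratings and coded excerpts are invented for the example, not actual study data:

```python
from collections import Counter

# Hypothetical Likert responses (1 = very poor ... 5 = very good) for one item
ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4]

# Simple descriptive tallies, as one might compute in a spreadsheet
counts = Counter(ratings)                      # e.g. {5: 3, 4: 5, 3: 1}
mean_rating = sum(ratings) / len(ratings)
print(dict(counts), round(mean_rating, 2))

# Content analysis: map coded interview excerpts to categories, then count
# how often each category occurs to surface broad themes
coded_excerpts = [
    ("made a video for my grandchildren", "legacy"),
    ("learned to edit on my own", "agency"),
    ("met new people in the course", "social"),
    ("proud to pass on family values", "legacy"),
]
theme_counts = Counter(category for _, category in coded_excerpts)
print(theme_counts.most_common())
```

With a sample this small, such tallies are descriptive only; the interpretive work remains in the qualitative coding itself.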

4 Preliminary Evaluation Results

The preliminary results showed consistency in both the profile of participants and their
questionnaire and interview responses. The results reported here are from the nine
participants who concluded the course, consisting of three in the pilot test phase and six
in the field test phase.
The pilot test phase was designed to find possible flaws in the design and the three
participants were personal acquaintances of the researchers. The field test phase
recruited a total of 13 participants, and seven dropped out in the first module. Of the
seven dropouts, four provided feedback as to why they did not move past the activities
of the Week 1 module, all claiming that the volume of work proposed in each module
did not match their time schedules. This showed that, even after retirement, these partic‐
ipants were busy with either part-time jobs or social activities.

4.1 Pre-questionnaire
Most participants were women with ages ranging from 65 to 74 years. The only exception was one participant who checked the 75 to 79 age box. In terms of
computer usage and skills, all nine participants claimed to use a computer daily, with
six of them having very good skills. Only two had taken an online course before.

4.2 Course Evaluation and Interviews

Of the nine participants, eight rated the process used to guide them in writing their own
story as good or very good, with only one rating it as fair. All nine participants rated
WeVideo as good or very good, and all of them were satisfied or very satisfied with the
course and would recommend it to a friend. When rating the course’s level of difficulty,
six considered it to be easy or just right, while three considered it to be difficult.
The two recurring points that participants liked least about the course were its length, which conflicted with their personal schedules, and the lack of consistent forum participation. The participants who raised the latter suggested providing guidelines for how the forum discussions should take place in order to create more interaction, including interaction with the facilitator rather than through individual emails.
During the interviews, the three main factors brought up by participants were the
strong notion of having created a legacy that reflects their family values, the feeling of
agency for possessing the knowledge to produce a product that defines their own image,
and the chance to discover a new group of people who shared the positive experience.
It is interesting to note that the completion rate for the course was similar to regular
university-based online courses, even though participants were seniors and this was a
non-credit course.

5 Conclusion

Two key suggestions emerged for improving the course. First, make the time and work‐
load requirements clear during the recruitment process. Second, investigate ways for
reducing the time required to complete the course in future offerings. Despite these
suggestions, all participants who completed the course reported that they would recom‐
mend it to a friend. This pilot and field test appears to provide support for offering the
online course to a wider audience in order to make digital storytelling accessible to more
older adults.

Acknowledgments. We wish to thank the Social Sciences and Humanities Research Council of
Canada (SSHRC) and the AGE-WELL National Centre of Excellence Network for financial
support of this project.

References

1. Hausknecht, S., Vanchu-Orosco, M., Kaufman, D.: Digitising the wisdom of our elders:
connectedness through digital storytelling. Published online on 17 July, pp. 1–21 (2018)
2. Jakes, D.S., Brennan, J.: Digital Storytelling, visual literacy and 21st century skills. In: Online
Proceedings of the Tech Forum New York (2005)
3. Miller, P.J., Cho, G.E., Bracey, J.R.: Working-class children’s experience through the prism
of personal storytelling. Hum. Dev. 48(3), 115–135 (2005)
4. Lambert, J.: Digital Storytelling: Capturing Lives, Creating Community. Digital Diner Press,
Berkeley (2006)
5. Rule, L.: Digital storytelling: never has storytelling been so easy or so powerful. Knowl.
Quest 38(4), 56 (2010)
6. Stenhouse, R., Tait, J., Hardy, P., Sumner, T.: Dangling conversations: reflections on the
process of creating digital stories during a workshop with people with early-stage dementia.
J. Psychiatr. Ment. Health Nurs. 20(2), 134–141 (2013)
7. Robin, B.R.: Digital storytelling: a powerful technology tool for the 21st century classroom.
Theory Pract. 47(3), 220–228 (2008)
8. Burgess, J.: Hearing ordinary voices: cultural studies, vernacular creativity and digital
storytelling. Continuum 20(2), 201–214 (2006)
9. Sawhney, N.: Voices beyond walls: the role of digital storytelling for empowering
marginalized youth in refugee camps. In: Proceedings of the 8th International Conference on
Interaction Design and Children, pp. 302–305. ACM (2009)
10. Heo, M.: Digital storytelling: an empirical study of the impact of digital storytelling on pre-
service teachers’ self-efficacy and dispositions towards educational technology. J. Educ.
Multimed. Hypermedia 18(4), 405 (2009)
11. Hausknecht, S., Schell, R., Zhang, F., Kaufman, D.: Older adults’ digital gameplay: a follow-
up study of social benefits. In: Information and Communication Technologies for Ageing
Well and e-Health, pp. 198–216. Springer International Publishing (2015)
12. Hausknecht, S., Vanchu-Orosco, M., Kaufman, D.: Sharing life stories: design and evaluation
of a digital storytelling workshop for older adults. In: Computer Supported Education: 8th
International Conference, CSEDU 2016, Rome, Italy. Revised Selected Paper. Springer
International Publishing (2016)
13. Rook, K.S.: Social relationships as a source of companionship: Implications for older adults’
psychological well-being. In: Sarason, B.R., Sarason, I.G., Gregory, R.P. (eds.) Social
Support: An Interactional View, pp. 219–250. Wiley, New York (1990)
14. Boulton-Lewis, G.M., Buys, L., Lovie-Kitchin, J.: Learning and active aging. Educ. Gerontol.
32(4), 271–282 (2006)
15. Weinstein, L.B.: Lifelong learning benefits older adults. Act. Adapt. Aging 28(4), 1–12 (2004)
16. Bohlmeijer, E., Roemer, M., Cuijpers, P., Smit, F.: The effects of reminiscence on
psychological well-being in older adults: a meta-analysis. Aging Ment. Health 11(3), 291–
300 (2007)
17. Birren, J.E., Deutchman, D.E.: Guiding Autobiography Groups for Older Adults: Exploring
the Fabric of Life. Johns Hopkins University Press (JHU Press), Baltimore (1991)
18. Meléndez Moral, J.C., Fortuna Terrero, F.B., Sales Galán, A., Mayordomo Rodríguez, T.:
Effect of integrative reminiscence therapy on depression, well-being, integrity, self-esteem,
and life satisfaction in older adults. J. Posit. Psychol. 10(3), 240–247 (2015)
19. Wallace, J., Wright, P.C., McCarthy, J., Green, D.P., Thomas, J., Oliver, P.: A design-led
inquiry into personhood in dementia. In: Conference Proceedings CHI 2013: Changing
Perspectives, Paris, France (2013)

20. Sherlock, A.: Larger than life: digital resurrection and the re-enchantment of society. Inf. Soc.
29(3), 164–176 (2013)
21. Pasupathi, M., Mansour, E., Brubaker, J.R.: Developing a life story: Constructing relations
between self and experience in autobiographical narratives. Hum. Dev. 50(2–3), 85–110
(2007)
22. McAdams, D.P., McLean, K.C.: Narrative identity. Curr. Dir. Psychol. Sci. 22(3), 233–238
(2013)
23. McLean, K.C., Pasupathi, M., Pals, J.L.: Selves creating stories creating selves: a process
model of self-development. Pers. Soc. Psychol. Rev. 11(3), 262–278 (2007)
24. Lambert, J.: Digital Storytelling Cookbook. Berkeley, CA (2010)
25. Ohler, J.: The world of digital storytelling. Educ. Lead. 63(4), 44–47 (2006)
The Role of Self-efficacy in Technology Acceptance

Saleh Alharbi1,2(✉) and Steve Drew3

1 Griffith University, Gold Coast, Australia
saleh@su.edu.au
2 Shaqra University, Riyadh, Saudi Arabia
3 Tasmanian Institute of Learning and Teaching, University of Tasmania, Hobart, Australia

Abstract. In this paper, we propose a model that can be used to evaluate the role
of individual differences in technology acceptance. More specifically, this study
reviews the impact of self-efficacy on technology acceptance by proposing a
model that uses Technological Pedagogical and Content Knowledge (TPACK)
to underpin self-efficacy. The model explains the influence that self-efficacy may
have on perceived ease of use and perceived usefulness in the Technology
Acceptance Model (TAM). This model can be integrated into future research
concerning e-learning readiness with particular focus on educational settings. The
adaptability of the model will enable researchers to introduce additional factors
specific to their study context and the research design being employed.

Keywords: Individual differences · Technology acceptance · Higher education

1 Introduction

Individual differences is a term used to describe the variations between individuals that determine their ability to successfully achieve desired results [1]. Traits, personal circumstances and characteristics, perceptions, and behavior are major determinants of individual differences. Individual differences are considered a vital factor in the context
of technology acceptance. Agarwal and Prasad [3] state that while it is not clearly known
how strong the effect of individual differences on technology acceptance is, the impor‐
tance of individual differences as a vital construct in technology acceptance is indisput‐
able. Investigating individual differences can assist organizations in creating a profile
for individual users within the organization. Therefore, based on the user’s profile, technology acceptance can be facilitated by introducing various interventions, which may improve individuals’ beliefs about a certain technology.
According to Hong et al. [4], individual differences are present in many studies
concerning information system success [5, 6] and human/computer interactions [7].
Previous research has identified different variations of individual beliefs that affect
technology acceptance, such as self-efficacy [8], computer self-efficacy [4, 9–11], ease
of use and usefulness [12], and
experience with educational tools, LMS in particular [13]. This study attempts to provide
researchers with a guide for investigating the influence of self-efficacy on ease of use
and usefulness in educational settings. More specifically, this study will focus on the

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1142–1150, 2019.
https://doi.org/10.1007/978-3-030-02686-8_85

influence of individual differences in self-efficacy on ease of use and usefulness, and potentially on behavioral intention to use a certain technology. The outcome of this paper
will be a model that can be integrated in any study concerning technology acceptance
with particular focus on the use of technologies to mediate learning and teaching.
The rest of the paper is structured as follows. First, related literature is explored to
provide a theoretical background for the proposed role of self-efficacy in technology
acceptance. Then, the study investigates the possible interrelationships between factors
explored in the literature review section. Finally, the proposed model is presented
followed by the conclusion which discusses future research plans and how the proposed
model can be validated.

2 Theoretical Background

2.1 Perceived Ease of Use and Usefulness

Perceived ease of use (PEOU) and perceived usefulness (PU) are two internal beliefs
within the technology acceptance model (TAM) [12]. According to Davis [12],
perceived usefulness is “the prospective user’s subjective probability that using a
specific application system will increase his or her job performance within an organi‐
zational context” (p. 985) and perceived ease of use is “the degree to which the prospec‐
tive user expects the target system to be free of effort” (p. 985). Ease of use and usefulness
are factors that can be affected by external variables and mediate the influence of these
external variables on users’ attitudes towards a certain technology, their behavioral
intention to use, and the actual technology use. TAM is a well-tested technology accept‐
ance theory and has been adapted in several fields of technology and system use (Fig. 1)
[14–17].

Fig. 1. The technology acceptance model [12].
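To make the mediation structure concrete, the path relationships in TAM can be sketched as a toy linear model. This is only an illustration: the function name and the weights below are invented for the sketch, not estimates from TAM studies, and real TAM analyses estimate such paths with structural equation modeling:

```python
def tam_path(external: float, w_peou: float = 0.5, w_pu: float = 0.6) -> dict:
    """Toy linear TAM path sketch with made-up weights.

    An external variable (e.g. self-efficacy, scaled 0..1) influences
    perceived ease of use (PEOU) and perceived usefulness (PU); PEOU also
    feeds PU; both beliefs shape attitude, which drives behavioral intention.
    """
    peou = w_peou * external
    pu = w_pu * external + 0.3 * peou          # PEOU also influences PU
    attitude = 0.4 * peou + 0.6 * pu           # both beliefs shape attitude
    intention = 0.7 * attitude + 0.2 * pu      # PU also has a direct path
    return {"peou": peou, "pu": pu, "attitude": attitude, "intention": intention}

# In this sketch, a higher external variable yields higher predicted intention,
# mirroring the mediating role of PEOU and PU described above
low, high = tam_path(0.2), tam_path(0.9)
assert high["intention"] > low["intention"]
```

The point of the sketch is only the shape of the mediation: the external variable never reaches intention except through the two belief constructs (plus PU’s direct path), which is what makes PEOU and PU the levers that interventions can target.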

2.2 Technological Pedagogical and Content Knowledge

Teaching is argued to be ill-structured in nature and to involve complex processes. This complexity has attracted a large volume of research to investigate the
teachers’ thought processes and the different types of knowledge teachers need. Recent
research identified three primary types of teacher knowledge: content knowledge,
teaching knowledge, and technological knowledge. These three unitary components of
knowledge are the core elements of the technological pedagogical and content

knowledge framework (TPACK) proposed by Mishra and Koehler [18]. TPACK extends the pedagogical content knowledge theory (PCK) introduced by Shulman [19]
which is defined in Shulman [20] as “the special amalgam of content and pedagogy that
is uniquely the province of teachers, their own special form of professional under‐
standing” (p. 8). Generally, Shulman [19] aimed to build a coherent framework that
explains the type of knowledge teachers should have and the relationship among content-
related knowledge and pedagogy knowledge. PCK involves an understanding of general
pedagogical knowledge that goes beyond subject matters such as classroom organization
and management. Further, PCK advocates the importance of knowledge related to
students and their personal traits. Another aspect of knowledge included within PCK is
the knowledge of educational context, like the understanding of the community cultures,
educational goals and purposes, content knowledge, curriculum knowledge, and pedagogical content knowledge. Despite this broad range of included knowledge, PCK does not address the role of ICT in the teaching process.
In this digital age, the introduction of technology into classrooms is inevitable. This
change has motivated educational researchers to understand the effects of technology
on teaching and teachers’ beliefs [21].

Fig. 2. The TPACK framework (source [22]).

This is the main idea behind Mishra and Koehler’s [18] TPACK framework (Fig. 2), which adds technological knowledge to the PCK
theory. Technological knowledge discusses the knowledge required by teachers to
effectively integrate technology into the teaching process. Therefore, the TPACK frame‐
work combines the complex interplay of pedagogical knowledge, content knowledge,
and technological knowledge. According to Mishra and Koehler [18]:
TPACK is the basis of good teaching with technology and requires an understanding of the
representation of concepts using technologies, pedagogical techniques that use technologies in
constructive ways to teach content; knowledge of what makes concepts difficult or easy to learn
and how technology can help redress some of the problems that students face; knowledge of
students’ prior knowledge and theories of epistemology; and knowledge of how technologies
can be used to build on existing knowledge and to develop new epistemologies or strengthen
old ones (p. 1029).

Technology Knowledge (TK): Technology knowledge refers to the knowledge required by teachers to embrace a new technology for teaching. Teachers continually develop their level of technological knowledge to get the most out of recent technology, whether for teaching or in daily life. This includes knowledge about basic digital technologies, communication technology, information processing and systems, and the ability
to solve technological issues when they occur.
Content Knowledge (CK): According to Mishra and Koehler [18], content knowledge
is the “knowledge about actual subject matter that is to be learned or taught” (p. 1026).
Different subjects (e.g. earth science, mathematics, language, arts) have different
content, and therefore teachers must be knowledgeable about their subjects’ content.
Pedagogical Knowledge (PK): This type of knowledge refers to the strategies that may
be used for teaching and the understanding of learning and teaching practices and theo‐
ries. For example, classroom management, plan development, and assessment are all
pedagogical knowledge teachers must have.
Pedagogical Content Knowledge (PCK): This is knowledge of subject matters, with
reference to knowledge of teaching methods [19]. The combination of content and
pedagogy knowledge aims to improve teaching strategies in the content areas.
Technological Content Knowledge (TCK): Technological content knowledge repre‐
sents the knowledge required by teachers to present subject matter effectively, using a
specific technology. This type of knowledge enables changing learning practices, as
specific technology could be used for specific content.
Technological Pedagogical Knowledge (TPK): Technological pedagogical knowl‐
edge implies how technological knowledge can be used to implement various teaching
methods. Thus, the way teachers teach may change with the introduction of technology
in classrooms.
Technological Pedagogical Content Knowledge (TPACK): Technological pedagog‐
ical content knowledge is the knowledge required to effectively integrate technology to
implement different types of teaching methods with different types of subject content.
TPACK is a complex intersection between three domains of knowledge (content knowl‐
edge, pedagogy knowledge, and technological knowledge), where teachers intuitively

understand how to teach specific content using suitable teaching strategies and specific
technology.

2.3 Self-efficacy
According to Angeli and Valanides [23], individuals’ beliefs and experiences are significant constructs that may mediate individuals’ use of ICT in education. Self-efficacy describes an individual’s belief in his or her ability to perform a certain task in a given context. Bandura [24]
defines self-efficacy as “beliefs in one’s capabilities to organize and execute the courses
of action required to produce given attainments” (p. 3), which can influence various
aspects of one’s behavior.
Computer self-efficacy has been repeatedly highlighted in educational research. In various studies, computer self-efficacy describes teachers’ perceptions of their level of confidence in using technology-enhanced learning to facilitate teaching and the student learning process. These studies also propose a relationship between teacher anxiety and computer self-efficacy, in which one’s anxiety is due to a low level of efficacy in using ICT in teaching [25]. This anxiety may hinder the introduction of technology to enhance the teaching experience and improve student knowledge. Teaching with technology is also linked to computer self-efficacy beliefs in many studies. For instance, based on Bandura’s theory, Wong et al. [26] introduced computer teaching efficacy, defined as one’s perception of one’s level of competence and ability to adopt computers in teaching [26], as a factor that may affect technology acceptance in an educational setting.
Previous studies suggest that higher self-efficacy beliefs may enhance technology acceptance, while lower self-efficacy may negatively affect one’s decision to accept new technology. In a similar study, Park et al. [27] found that self-efficacy, among other psychological traits, is a significant determinant of technology acceptance: the higher the self-efficacy, the higher the technology acceptance. Similarly, Bandura [24] advocated that one’s confidence in performing a task successfully and the outcome expectation have a direct impact on the motivation to perform that task.

2.4 The Relationship Between Self-efficacy and TAM

Various individual characteristics have been examined in technology acceptance studies. For instance, many studies have examined the impact of computer self-efficacy on technology acceptance through its effect on perceived ease of use, perceived usefulness, and behavioral intention to use a given technology. In line with the present research, we explore studies that investigated the effect of external variables, such as individual differences, particularly computer self-efficacy, on the core constructs of TAM.
There is a consensus among social scientists that a relationship exists between individual differences and both perceived ease of use and behavioral intention to use a certain technology. A study conducted by Darsono [28] revealed that computer self-efficacy indirectly impacts both perceived ease of use and perceived usefulness. Behavioral intention to use also seems to be directly affected by individual characteristics such as
computer self-efficacy.

The Role of Self-efficacy in Technology Acceptance 1147

Similarly, Gong et al. [29] examined different determinants in relation to technology acceptance in an educational setting. The study showed a strong direct impact of self-efficacy on perceived ease of use, and a weaker relationship between
self-efficacy and behavioral intention. Sharp [30] carried out a study using TAM and reported that computer self-efficacy significantly affected perceived ease of use, with comparable findings in other similar studies [1, 29]. Yi and Hwang [31] investigated the application of TAM to a web-based IS and found that self-efficacy is a strong determinant of ease of use and, in combination with behavioral intention, significantly affects actual use. In summary, it appears that self-efficacy is a significant determinant of perceived ease of use, but not of perceived usefulness.
In contrast, several researchers have challenged these studies on the grounds that self-efficacy may affect perceived usefulness. In their study, Stylianou and Jackson [2] found that self-efficacy influenced perceived usefulness. Teo [32] applied TAM to investigate pre-service teachers’ technology acceptance and reported that the impact of computer self-efficacy on perceived usefulness is higher than its impact on perceived ease of use.
The findings on this point are thus inconsistent. To the authors’ best knowledge, there is a lack of clarity on the impact of individual characteristics on one’s decision to engage in using technology. Furthermore, previous studies have not clearly shown the impact of self-efficacy on perceived usefulness, a main construct within TAM that indirectly affects behavioral intention and, therefore, the actual use of the system.
Bandura’s theory states that there is a significant relationship between self-efficacy and teachers’ knowledge. The theory suggests that improving teachers’ knowledge would improve their self-efficacy beliefs, which would lead to increased technology use as a medium of instruction. As discussed previously, the types of knowledge represented in the TPACK domain and self-efficacy beliefs are considered significant factors that may influence teachers’ decisions to incorporate technology to facilitate teaching and improve information delivery methods. Many educational studies that incorporate TPACK discuss the importance of self-efficacy beliefs in the involvement of ICT in education. Senemoğlu [33], as cited in Kazu and Erten [34], states that self-efficacy is an important factor in the development of TPACK. In their study, Yi and Hwang [31] investigated self-efficacy in terms of teachers’ technological pedagogical content knowledge in relation to web instruction. The study advocates that assessing self-efficacy is essential for providing information on teacher education and professional development. Understanding the relationship between self-efficacy beliefs and the different types of knowledge in TPACK could potentially assist in the successful integration of technology in teaching.

3 Proposed Model

In accordance with the study aims, academics’ technological self-efficacies are determined by their TPACK scores. For the sake of simplicity, only the types of knowledge related to technology are assessed. Based on the discussion above, the relationships between the TPACK constructs and TAM are hypothesized as seen in H1–H4 in Table 1.

Table 1. Summary of proposed hypotheses

Hypothesis  Statement
H1    TK, TCK, TPK and TPACK will significantly influence PU
H2    TK, TCK, TPK and TPACK will significantly influence PEOU
H3    TK will significantly influence PEOU positively
H4    TK will significantly influence PU positively
H5    TCK will significantly influence TPACK positively
H6    TPK will significantly influence TPACK positively
H7    TK will significantly influence TPACK positively
H8    TK will significantly influence TPK positively
H9    TK will significantly influence TCK positively
H10   TCK will significantly influence TPK positively

Hypotheses H1 and H2 investigate the joint effect of the TPACK constructs on the TAM constructs. The relationships between academics’ technological knowledge and both PEOU and PU are presented in hypotheses H3 and H4.
This research also investigates the interrelationships between TPACK constructs, as seen in H5–H10 in Table 1. The suggested model is depicted in Fig. 3.

Fig. 3. The proposed model for self-efficacy assessment.
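As an illustration of how such hypotheses could later be tested, the sketch below regresses a synthetic PEOU score on the four knowledge scores with ordinary least squares. The construct scores, path weights, and noise levels are invented assumptions for demonstration only, not the authors’ instrument or data.

```python
import numpy as np

# Synthetic construct scores wired roughly along H5-H10; the path
# weights and noise levels are invented for illustration only.
rng = np.random.default_rng(0)
n = 500
TK = rng.normal(size=n)
TCK = 0.5 * TK + rng.normal(scale=0.8, size=n)                  # H9
TPK = 0.4 * TK + 0.3 * TCK + rng.normal(scale=0.8, size=n)      # H8, H10
TPACK = 0.3 * (TK + TCK + TPK) + rng.normal(scale=0.7, size=n)  # H5-H7
PEOU = 0.6 * TK + 0.2 * TPACK + rng.normal(scale=0.5, size=n)   # H2, H3

# Ordinary least squares for H2/H3: regress PEOU on the knowledge scores.
X = np.column_stack([np.ones(n), TK, TCK, TPK, TPACK])
beta, *_ = np.linalg.lstsq(X, PEOU, rcond=None)
print(dict(zip(["const", "TK", "TCK", "TPK", "TPACK"], beta.round(2))))
```

A full validation would use measured survey items and a structural equation model rather than single-equation OLS; the sketch only shows the direction of the hypothesized paths.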

4 Discussion

The aim of this paper was to explore the influence of individual differences on technology acceptance. It is evident from the literature discussed above that the significance of self-efficacy as a predictor of technology acceptance has received little attention. In response to this gap, this study proposed a model that can be used as a base for future research investigating the influence of self-efficacy on technology acceptance. The model is derived from well-known theories, namely TAM and TPACK, and is suitable for assessing technology acceptance, especially in educational settings. We recommend the model be validated through a quantitative empirical study to examine the proposed constructs and test the hypotheses. The scale used for data collection will be context-specific; the model therefore acts as a guide, and measuring items are left for future researchers to contextualize.

5 Concluding Remarks

This paper is limited to providing a theoretical base for larger studies to investigate the role of individual differences in technology acceptance. We therefore recommend the model be validated through a quantitative empirical study to examine the proposed constructs and test the hypotheses. The scale used for data collection will be context-specific; the model therefore acts as a guide, and measuring items are left for future researchers to contextualize.

References

1. Lewis, W., Agarwal, R., Sambamurthy, V.: Sources of influence on beliefs about information
technology use: an empirical study of knowledge workers. MIS Q. 27, 657–678 (2003)
2. Stylianou, A.C., Jackson, P.J.: A comparative examination of individual differences and
beliefs on technology usage: Gauging the role of IT. J. Comput. Inf. Syst. 47(4), 11–18 (2007)
3. Agarwal, R., Prasad, J.: Are individual differences germane to the acceptance of new
information technologies? Decis. Sci. 30(2), 361–391 (1999)
4. Hong, W., Thong, J.Y., Wong, W.-M., Tam, K.Y.: Determinants of user acceptance of digital
libraries: an empirical examination of individual differences and system characteristics. J.
Manag. Inf. Syst. 18(3), 97–124 (2002)
5. Harrison, A.W., Rainer, R.K.: The influence of individual differences on skill in end-user
computing. J. Manag. Inf. Syst. 9(1), 93–112 (1992)
6. Zmud, R.W.: Individual differences and MIS success: a review of the empirical literature.
Manage. Sci. 25(10), 966–979 (1979)
7. Dillon, A., Watson, C.: User analysis in HCI—the historical lessons from individual
differences research. Int. J. Hum. Comput. Stud. 45(6), 619–637 (1996)
8. Igbaria, M., Iivari, J.: The effects of self-efficacy on computer usage. Omega 23(6), 587–605
(1995)
9. Chau, P.Y.: Influence of computer attitude and self-efficacy on IT usage behavior. J. Organ.
End User Comput. (JOEUC) 13(1), 26–33 (2001)
10. Hasan, B.: Delineating the effects of general and system-specific computer self-efficacy
beliefs on IS acceptance. Inf. Manag. 43(5), 565–571 (2006)
11. Ariff, M.S.M., Yeow, S., Zakuan, N., Jusoh, A., Bahari, A.Z.: The effects of computer self-efficacy and technology acceptance model on behavioral intention in Internet banking systems. Procedia Soc. Behav. Sci. 57, 448–452 (2012)
12. Davis, F.D.: Perceived usefulness, perceived ease of use, and user acceptance of information
technology. MIS Q. 13, 319–340 (1989)
13. Taylor, S., Todd, P.: Assessing IT usage: the role of prior experience. MIS Q. 19(4), 561–
570 (1995)
14. Al-Busaidi, K.A., Al-Shihi, H.: Instructors’ acceptance of learning management systems: a
theoretical framework. Commun. IBIMA 2010, 1–10 (2010)

15. Ma, Q., Liu, L.: The technology acceptance model: a meta-analysis of empirical findings. J.
Organ. End User Comput. (JOEUC) 16(1), 59–72 (2004)
16. Kim, D., Chang, H.: Key functional characteristics in designing and operating health
information websites for user satisfaction: an application of the extended technology
acceptance model. Int. J. Med. Inform. 76(11), 790–800 (2007)
17. Moon, J.-W., Kim, Y.-G.: Extending the TAM for a World-Wide-Web context. Inf. Manag.
38(4), 217–230 (2001)
18. Mishra, P., Koehler, M.: Technological pedagogical content knowledge: a framework for
teacher knowledge. Teach. Coll. Rec. 108(6), 1017–1054 (2006)
19. Shulman, L.S.: Those who understand: knowledge growth in teaching. Educ. Res. 15(2), 4–
14 (1986)
20. Shulman, L.S.: Knowledge and teaching: foundations of the new reform. Harv. Educ. Rev.
57(1), 1–23 (1987)
21. Margerum-Leys, J., Marx, R.W.: Teacher knowledge of educational technology: a case study
of student/mentor teacher pairs. J. Educ. Comput. Res. 26(4), 427–462 (2002)
22. tpack.org: The TPACK framework (2012). http://tpack.org. Reproduced by permission of the publisher © 2012 by tpack.org
23. Angeli, C., Valanides, N.: Epistemological and methodological issues for the conceptualization,
development, and assessment of ICT–TPCK: advances in technological pedagogical content
knowledge (TPCK). Comput. Educ. 52(1), 154–168 (2009)
24. Bandura, A.: Self-efficacy: The Exercise of Control. W.H. Freeman and Company, New York
(1997)
25. Brown, I.T.: Individual and technological factors affecting perceived ease of use of web-based
learning technologies in a developing country. Electron. J. Inf. Syst. Dev. Ctries. 9(5), 1–15
(2002)
26. Wong, K.-T., Teo, T., Russo, S.: Influence of gender and computer teaching efficacy on
computer acceptance among Malaysian student teachers: an extended technology acceptance
model. Australas. J. Educ. Technol. 28(7), 1190–1207 (2012)
27. Park, S., et al.: Acceptance of computer technology: understanding the user and the
organizational characteristics. In: Proceedings of the Human Factors and Ergonomics Society
Annual Meeting, vol. 50, pp. 1478–1482. SAGE Publications (2006)
28. Darsono, L.I.: Examining information technology acceptance by individual professionals.
Gadjah Mada Int. J. Bus. 7(2), 155–178 (2005)
29. Gong, M., Xu, Y., Yu, Y.: An enhanced technology acceptance model for web-based learning.
J. Inf. Syst. Educ. 15(4), 365–373 (2004)
30. Sharp, J.H.: Development, extension, and application: a review of the technology acceptance
model. Inf. Syst. Educ. J. 5(9), 1–10 (2006)
31. Yi, M.Y., Hwang, Y.: Predicting the use of web-based information systems: self-efficacy,
enjoyment, learning goal orientation, and the technology acceptance model. Int. J. Hum
Comput Stud. 59(4), 431–449 (2003)
32. Teo, T.: Modelling technology acceptance in education: a study of pre-service teachers.
Comput. Educ. 52(2), 302–312 (2009)
33. Senemoğlu, N.: Gelişim Öğrenme ve Öğretim Kuramdan Uygulamaya (16. Baskı). Ankara:
Pegem Akademi Yay. Eğt. Dan. Hiz. Tic. Ltd., Şti (2010)
34. Kazu, I.Y., Erten, P.: Teachers’ technological pedagogical content knowledge self-efficacies.
J. Educ. Train. Stud. 2(2), 126–144 (2014)
An Affective Sensitive Tutoring System for Improving Student’s Engagement in CS

Ruth Agada¹, Jie Yan¹, and Weifeng Xu²

¹ Bowie State University, Bowie, MD 20715, USA
jyan@bowiestate.edu
² University of Baltimore, Baltimore, MD 21201, USA

Abstract. With the growing popularity of online teaching and tutoring, there are many attempts to enhance students’ learning experience during lectures. This paper presents an animated tutoring system for improving student engagement using nonverbal cues, including students’ facial expressions. The system can (1) capture students’ facial expressions in the scenario; (2) identify various facial expressions, including anger, disgust, fear, sadness, happiness, and surprise; and (3) provide feedback to students based on their facial expressions. To evaluate the tutoring system, we predict student engagement using a support vector machine on the captured information, and measure engagement using students’ academic performance, i.e., in-system exercises, quizzes, and exams. Our empirical study shows that student performance using the level 2 animation is 10% and 20% higher than with levels 1 and 0, respectively.

Keywords: Tutoring system · Virtual character animation · Conversational agent

1 Introduction

In communication, a speaker often uses a composite communication model to better interact with the audience. In other words, the speaker may not know the knowledge state of the participants; however, the speaker may adapt his or her behavior to best suit the situation by looking at the facial expressions, body gestures, and other nonverbal cues of those participants. In a virtual environment, virtual agents similarly express their states using similar modalities. In this context, automatic synthesis of hand gestures in synchrony with the face, as well as speech, is expected to incorporate nonverbal communication components into virtual character animation and improve the believability of animations. Such an approach can be found in a wide range of applications in the human-centered video gaming and film industries.
As students turn to mobile and online models to supplement learning, there are many attempts to attune different virtual teaching strategies using the sensory modalities available to their human counterparts. Early studies of nonverbal behavior in tutoring relied on manual observations of affect and nonverbal behavior [1]. Most studies have examined individual modalities in detail, such as facial expression [2–4], posture [5, 6], or gesture [6, 7].

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1151–1163, 2019.
https://doi.org/10.1007/978-3-030-02686-8_86

1152 R. Agada et al.

While prevailing research focuses on the type of interaction and the types of impactful agent behaviors [8–13], the current work builds on this by examining multimodal data streams, which can provide rich evidence of students’ cognitive and
affective states, in addition to evidence captured from student logs. It is likely that a
multimodal combination of automatically tracked affective data streams would need to
be considered to best adapt to learner affect during tutoring [1, 14].
In this paper, we present an affect-sensitive framework for an affect-capable tutoring agent for CS students. The affect-sensitive tutor for C++ (AST4CPP) has an embedded tutoring agent that can supplement teachers and assist and motivate learners in distributed learning environments [15]. The study aims to establish more effective communication with students by recognizing their gestures and emotional states, beyond the heavily studied frustration and boredom emotions, and by using a virtual character capable of expressing its own emotional state. In particular, the system automatically detects students’ gestures and frontal faces in the video input stream and recognizes the emotion with respect to the six basic facial expressions (anger, disgust, fear, joy, sadness, and surprise) suggested by Ekman [16], and further attempts to correlate those expressions with affective body gestures.

2 Related Works

Several systems incorporate game-based learning environments, which are adequately researched in [1, 8]. These systems employ different means of engaging the student, ranging from narrative-centered learning to simple didactic learning strategies. Results from these studies have shown that they deliver experiences in which learning and engagement are synergistic, as also outlined in [8]. Student interaction data has provided a rich source of information from which students’ development of competencies and progress toward learning goals are diagnosed.
Most notable in the development of affect-sensitive tutoring systems is AutoTutor [17], which features multimodal systems to predict affect based on an emotion set defined by experts. Focusing on the extended emotion set defined by [18–20], the system produced its best levels of agreement, with Cohen’s kappa of 0.33 for fixed emotion judgments and 0.39 for spontaneous ones [1].
As stated in the previous section, among the modalities studied for affect information, the face is a large source [2–4, 21]; any system in development must combine facial expression data with the other modalities. In the spontaneous datasets created to date, the stimuli used to elicit spontaneous facial actions have been highly controlled, and camera orientation has been frontal with little or no variation in head pose. Rapid head movement may also be difficult to track automatically through a video sequence. Head motion and orientation to the camera are important if action unit (AU) detection is to be accomplished in social settings, where facial expressions often occur with head motion [22]. Moreover, the intensity of the expressions plays a big role in accurately recognizing them [23, 24].
Additionally, body gestures also communicate different affect states, and those motions, in concert with facial affect, add more weight to the full affect state of the student. As a result, studies [25–27] have worked to build databases of body gestures for affect recognition.
An Affective Sensitive Tutoring System for Improving Student’s Engagement 1153

To that end, we study the effects of an affect-sensitive system that observes features extracted from the face. These features highlight subtle expressions and match them to the known expression sets defined by Ekman [16].

3 AST4CPP Architecture Framework

The AST4CPP system combines a conversational dialogue subsystem [9, 28–30], an observational guidance subsystem [10, 31–33], and an auditing subsystem. The conversational dialogue and observational guidance subsystems help students form a deeper cognitive connection to the material. The auditing subsystem identifies incorrect inputs from students and produces visual and textual explanations to help learners identify and correct their mistakes.

3.1 Observational Guidance System


Being able to “see and hear” the student is key for the system to diagnose the student’s engagement/frustration level in an unobtrusive manner. Physiological measures have been used to measure engagement and alertness; however, they can be cumbersome and detract from the learning engagement method employed [34]. To that end, we develop a system that works in two phases: first, developing an emotion classification system that can be applied to the engagement/frustration recognition problem; second, aligning affect capabilities to an embodied virtual agent for CS students. The first phase, an emotion recognition system based on specific feature sets, is broken into three stages:
• Face registration: the face and facial landmark (eyes, nose, and mouth) positions are localized automatically in the image; the face box coordinates are computed; and the face patch is cropped from the image [24]. We experimented with the window size.
• Classification: the cropped face patch is classified by four binary classifiers, one per engagement level [34].
• Decision: the outputs of the binary classifiers are fed to a decision system to estimate the image’s emotion (boredom/frustration).
Stage (1) is standard for automatic face analysis, and our particular approach is described in [24]. Stage (2) is discussed in the next subsection, and stage (3) below.
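The three stages above can be sketched as follows. The face box, the stub linear classifiers, and the four-level scale are placeholder assumptions; in practice a real face detector (stage 1) and trained classifiers (stage 2) would replace the stubs.

```python
import numpy as np

def crop_face(frame, box):
    # Stage 1: cut the detected face box out of the frame.
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

def binary_scores(patch, weights):
    # Stage 2: one linear score per engagement level (stub classifiers).
    feat = patch.ravel() / 255.0
    return weights @ feat

def decide(scores):
    # Stage 3: pick the engagement level with the strongest response.
    return int(np.argmax(scores))

frame = np.zeros((48, 48), dtype=np.uint8)
frame[8:40, 8:40] = 200                      # fake "face" region
patch = crop_face(frame, (8, 8, 32, 32))
rng = np.random.default_rng(1)
weights = rng.normal(size=(4, patch.size))   # four binary classifiers
level = decide(binary_scores(patch, weights))
print("estimated engagement level:", level)
```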

Boost (BF). For face detection and tracking, Haar-like features have proven an extremely reliable means of face detection; their use is an established method for detecting faces and identifying facial features. The experiment replicated in [34] uses a variant of Haar-like features referred to as box filters. The key advantage of Haar features is their calculation speed: because the Haar wavelet is computed over integral images, features of any size can be calculated in constant time. However, this method is prone to overfitting, so the BF is usually boosted to improve recognition. In [34], a Gentle Boost is run on BF features, as seen in Fig. 1, for 100 rounds to extract the features needed to represent the differing levels of engagement.

Fig. 1. Box filter features, sometimes known as haar-like wavelet filters.
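The constant-time property mentioned above comes from the integral image. A minimal sketch, using an arbitrary 6 × 6 test image: any box sum takes four lookups, and a two-rectangle Haar feature is just a difference of two box sums.

```python
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[0:r+1, 0:c+1]
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, h, w):
    # Inclusion-exclusion on the four corners of the rectangle.
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)

img = np.arange(36).reshape(6, 6)
ii = integral_image(img)
assert box_sum(ii, 1, 2, 3, 2) == int(img[1:4, 2:4].sum())
# A two-rectangle Haar feature = left box sum minus right box sum.
haar = box_sum(ii, 0, 0, 6, 3) - box_sum(ii, 0, 3, 6, 3)
print(haar)
```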

Gabor Filters. Gabor filters are bandpass filters with tunable spatial orientation and frequency. When a Gabor filter is applied to an image, it gives the highest response at edges and at points where texture changes, as seen in Fig. 2. Gabor energy filters have a proven record in a wide variety of face-processing applications, including face recognition and facial expression recognition [35].

Fig. 2. Gabor filters applied to an image in the CK+ dataset.
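For concreteness, a Gabor kernel can be built directly from the standard formula: a cosine carrier windowed by a Gaussian envelope. The size, orientation, wavelength, and bandwidth values below are arbitrary choices, not the parameters used in [35] or [39].

```python
import numpy as np

def gabor_kernel(size=21, theta=0.0, lam=8.0, sigma=4.0, gamma=0.5):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam)
    return envelope * carrier

k = gabor_kernel()
# Filtering = 2-D correlation of the kernel with an image patch;
# edges aligned with theta give the strongest response.
patch = np.ones((21, 21))
response = float((k * patch).sum())
print(k.shape, round(response, 3))
```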

Gaussian Local Binary Pattern (LBP). Agada [24] uses a method for valence facial expression classification. In that implementation, the whole image is partitioned into 4 × 4 subregions, and the Gaussian local binary pattern operator is applied to each region. The local binary operator, thresholded by the sample mean of the neighborhood pixels in a 3 × 3 sliding window, extracts the expressional features of the face. Using the sample mean, the output image is a uniformly weighted average, which may result in some loss of key features. Jin et al. [36] noted that LBP can miss local structure information under some circumstances, and most implementations of the LBP descriptor ignore the statistical relevance of features. Hence, a Gaussian-weighted LBP operator is employed, because the weighted average is biased toward the central pixel and its frequency response.
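The mean-thresholded operator described above can be sketched as follows. The neighbor ordering and the tiny test image are illustrative assumptions, and the Gaussian weighting of [24] is omitted for brevity.

```python
import numpy as np

def mean_lbp(img):
    # Each 3x3 neighborhood is thresholded by its sample mean and the
    # eight resulting bits form one 8-bit code per interior pixel.
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbors in clockwise order starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            block = img[i - 1:i + 2, j - 1:j + 2].astype(float)
            t = block.mean()              # threshold = sample mean
            bits = [int(img[i + di, j + dj] >= t) for di, dj in offsets]
            codes[i - 1, j - 1] = sum(b << k for k, b in enumerate(bits))
    return codes

img = np.array([[9, 9, 9, 0],
                [9, 9, 9, 0],
                [0, 0, 0, 0]], dtype=np.uint8)
print(mean_lbp(img))
```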

3.2 Conversational Dialogue System


The conversational dialogue system combines the mapping method in [37] with a decision-tree-based clustering method to determine viseme pattern generation for smooth animation sequences. While this method has been applied to the synthesis of animation sequences for foreign languages [37], Fig. 3 illustrates some of the visemes that have been mapped to one of the AST4CPP virtual agents.

Fig. 3. Sample viseme morph targets.

For smooth animation between each identified phoneme, Eq. (1) is applied to each phoneme instance p_i in a leaf node, where d(p_i, p_j) is the Mahalanobis distance between points p_i and p_j with respect to a distribution D, and N is the number of phonemes in the node. The smallest value μ_best and its variance σ_best are then selected. Equation (2) is then used to determine the subset impurity I_Z, in which k is a scaling factor [37]:
    μ_i = ( Σ_{j=1}^{N} d(p_i, p_j) ) / (N − 1)    (1)

    I_Z = N × (μ_best + k × σ_best)    (2)

The system generates its behavior pattern from a series of scripts with predetermined tags. From data collected from human tutors/teachers during instruction, we create a set of gestures that mimic natural gestures during interaction, for the virtual agent to use as its own behavior set. As with the viseme animation, a similar method is applied to the gesture animation to create a smooth transition between the triggered animation tags in the sequence.
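Equations (1) and (2) can be sketched as follows, assuming the covariance is estimated from the points in the leaf and taking σ_best as the spread of the mean distances; both choices, and the toy feature vectors, are assumptions rather than details from [37].

```python
import numpy as np

def mahalanobis(a, b, cov_inv):
    d = a - b
    return float(np.sqrt(d @ cov_inv @ d))

def node_impurity(points, k=1.0):
    n = len(points)
    cov_inv = np.linalg.inv(np.cov(points, rowvar=False))
    # Eq. (1): mu_i = sum_j d(p_i, p_j) / (N - 1)
    mu = np.array([
        sum(mahalanobis(p, q, cov_inv) for q in points) / (n - 1)
        for p in points
    ])
    best = int(np.argmin(mu))   # instance with smallest mean distance
    sigma_best = mu.std()       # spread of the distances (assumption)
    # Eq. (2): I_Z = N * (mu_best + k * sigma_best)
    return n * (mu[best] + k * sigma_best)

pts = np.array([[0.0, 0.0], [1.0, 0.2], [0.2, 1.0], [5.0, 5.0]])
print(round(node_impurity(pts), 3))
```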

3.3 Auditing System


During immediate-feedback training, the system evaluates each student action against the current problem state, generates a tree representing all correct variants of the next steps, including the best possible action for the student to take, and provides feedback on every intermediate step. Correct student actions modify the graph to produce the next problem state for that case and student. Incorrect actions are matched against a set of specific errors and produce visual and textual explanations that help the learner identify and correct their mistakes, by sending the appropriate tags to the conversational dialogue system. The system creates the problem space from the content of the lesson and the test questions provided to it by the instructor. In this problem space, the system tracks student actions using Dijkstra’s algorithm to find the appropriate path to improve understanding, given any misconceptions that may occur. As stated earlier, the path followed not only triggers certain animation tags for the conversational dialogue system but also attempts to move students toward the solution.
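A minimal sketch of the path search: states and step costs below form an invented toy problem space, not the paper’s actual graph, with Dijkstra’s algorithm selecting the cheapest remediation path to the solution state.

```python
import heapq

def dijkstra(graph, start, goal):
    # graph: {state: [(neighbor_state, step_cost), ...]}
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path, node = [goal], goal
    while node != start:                   # walk predecessors back
        node = prev[node]
        path.append(node)
    return path[::-1], dist[goal]

steps = {
    "misconception": [("hint", 1), ("worked_example", 3)],
    "hint": [("partial_fix", 1)],
    "worked_example": [("solution", 1)],
    "partial_fix": [("solution", 1)],
}
path, cost = dijkstra(steps, "misconception", "solution")
print(path, cost)
```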

4 Empirical Study

Our empirical study aims to answer the following two questions:

• (Research Question One) What accuracy of feedback can the AST4CPP framework achieve in terms of students’ facial expressions? Furthermore, can we use these specific features to predict a student’s engagement or frustration level for a given lecture clip?
• (Research Question Two) To what extent do students exposed to virtual tutors at the various animation levels perform better than students who interact with no virtual agent?

4.1 Accuracy of Detecting Students’ Facial Expressions


The goal of this empirical study is to test whether our framework and algorithms can recognize various facial expressions. Specifically, (1) we use the same dataset as [38] for the accuracy analysis. The dataset contains a sequence of images per subject; we looked exclusively at three distinct expressions per subject (the initial neutral pose, the midpoint of expression generation, and the apex of the expression) to account for varying degrees of the same expression. (2) We evaluate facial expression recognition accuracy using the feature descriptors listed in the previous section.
Table 1 shows the prototypical facial expression recognition accuracy using a support vector machine with the Gabor and Gaussian LBP feature descriptors. For example, for the sadness expression, the recognition rate using Gabor energy filters is 75%, while the accuracy using Gaussian LBP is 85.93%. On average, the Gabor filters [39] achieve at least 79.77% accuracy over all expressions, while the Gaussian LBP [24] performs about 10% higher. Overall, Gaussian LBP outperforms Gabor energy filters.

Table 1. Facial expression recognition accuracy of a support vector machine using Gabor and Gaussian LBP features

Expression   Gabor energy filters (%) [39]   Gaussian LBP (%) [24]
Anger        83                              87.16
Disgust      75.6                            87.97
Fear         79                              94.90
Sadness      75                              85.93
Happy        89                              94.29
Surprise     77                              83.08
Mean         79.77                           88.89

A future avenue is to apply the adaptive measures to the LBP operator, as it is an excellent texture identifier.
A student’s engagement or frustration level can be measured from a collection of facial expressions over a period of time. Table 2 shows the classification results for cropped faces over a fixed window of one second. Each cell reports the accuracy (2AFC) averaged over four cross-validation folds, along with the standard deviation in parentheses. Accuracies at pixel resolution were very slightly lower. All results are for subject-independent classification.
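The 2AFC metric reported in Table 2 equals the probability that a randomly chosen positive example outscores a randomly chosen negative one, with ties counting one half. A minimal sketch with invented scores and labels:

```python
import numpy as np

def two_afc(scores, labels):
    # Probability a random positive outscores a random negative
    # (ties count 0.5); equivalent to the area under the ROC curve.
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

scores = np.array([0.9, 0.8, 0.25, 0.3, 0.2])
labels = np.array([1, 1, 1, 0, 0])
print(two_afc(scores, labels))
```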

4.2 How Do Virtual Tutors Influence Students’ Performance?


Table 2. Subject-independent, within-dataset engagement recognition accuracy (2AFC metric) for each engagement level [34]

Engagement/frustration level   Boost (BF)
1   96.5
2   70.9
3   60.7
4   63.2

To study how virtual tutors affect students’ performance in programming classes, we cloned three virtual agents. These agents are identical in appearance and vocal quality; moreover, they all exhibit the same non-instructional behavior and let the user know the objective of the lesson. The only difference is that each virtual agent behaves at a different animation level:
• Level 0: No movement – no emotion (NG-NE): this agent has its head and voice
expression completely muted; a static version of the agent.
• Level 1: Idle – no emotion (G-NE): this agent is limited to only audio expression;
the body and head movements are muted.
• Level 2: Gesture – emotion (G-E): this agent is fully animated and realistically
expressive.
Figure 4 illustrates some examples of the gestures and expressions of one of the AST4CPP virtual agents.
Test subjects were randomly selected students, forming three groups, from our COSC 112 Introduction to Programming course, which has an enrollment of 59 students. Each group was assigned to a virtual agent at a different animation level. Each session lasted approximately forty minutes, over multiple days, and students had the option to select any available lecture from the curriculum. Recordings of each session included user logs, post-session quizzes, and webcam footage. Figure 5 shows the AST4CPP interface, in which the student’s facial expressions are captured and displayed at the right corner of the screen while the virtual tutor occupies the main screen.
Our empirical study was conducted using the environmental setup of the Compu-
tational Perception and Animation Lab (CPAL), where each participant interfaced with a
version of the virtual tutor randomly assigned by the experimenter. CPAL
has an Alienware X51 gaming PC with at least 1 GB of graphics memory, a Logitech
C270 webcam running at 720p, a three-button mouse, dual color monitors, and
Sony MDR-XB400 headphones.
AST4CPP can display a wide range of emotions and gestures but remains within
domains that have a positive impact on learning. It reads from pre-generated scripts
with embedded commands for a range of body gestures and affective behavior. The
agent queries its behavioral database for the desired actions and performs them based
on user behavior. These actions include giving positive or negative feedback, dis-
cussing problems or solutions, and giving hints when triggered. While waiting for a
response from a student, a looping ‘idle’ action seamlessly gives the impression that
the agent is waiting.
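The script-driven behavior loop described above could be sketched as follows; the `<<action>>` markup, the `BEHAVIOR_DB` table, and all names here are hypothetical illustrations of the idea, not the actual AST4CPP implementation:

```python
import re

# Hypothetical behavioral database: action name -> animation clip (placeholder values).
BEHAVIOR_DB = {
    "positive_feedback": "nod_smile.anim",
    "give_hint": "point_screen.anim",
    "idle": "idle_loop.anim",
}

def parse_script(script):
    """Split a pre-generated script into (speech, embedded commands) pairs.
    Commands use a made-up <<action>> markup inside each line."""
    parts = []
    for line in script.strip().splitlines():
        commands = re.findall(r"<<(\w+)>>", line)
        speech = re.sub(r"<<\w+>>", "", line).strip()
        parts.append((speech, commands))
    return parts

def perform(commands):
    """Look up each embedded command in the behavioral database,
    falling back to the looping idle action while awaiting a response."""
    return [BEHAVIOR_DB.get(c, BEHAVIOR_DB["idle"]) for c in (commands or ["idle"])]

script = """
Well done, that loop is correct. <<positive_feedback>>
Stuck? Try printing the index each iteration. <<give_hint>>
"""
for speech, commands in parse_script(script):
    print(speech, perform(commands))
```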
An Affective Sensitive Tutoring System for Improving Student’s Engagement 1159

Table 3 shows the student performance for each group. Student performance is
measured by the mean of all quiz and exam scores. Overall, the means for each level are
9.72, 8.77, and 7.68, respectively. Student performance with the level 2 clone is roughly
10% and 20% higher than with levels 1 and 0, respectively. We attribute this success to the
affective nature of the agent, which allows for a more immersive interaction with the
agent.

Fig. 4. Various gestures and facial expressions of AST4CPP agent.



Fig. 5. AST4CPP interface

Table 3. Descriptive statistics of user performance for test clones

  LEVEL  N   Mean  Std.       Std.   95% CI for mean   Minimum  Maximum
                   deviation  error  Lower    Upper
  2      18  9.72  3.97       0.94   7.75     11.70    1        15
  1      22  8.77  4.59       0.98   6.74     10.81    0        14
  0      19  7.68  3.93       0.90   5.79     9.58     0        14
  Total  59  8.71  4.21       0.55   7.62     9.81     0        15
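As a sanity check, the derived columns of Table 3 (standard error and 95% confidence interval) can be recomputed from N, the mean, and the standard deviation alone. The t-quantile helper below is a stdlib-only approximation (no SciPy), so the results agree with the table up to rounding:

```python
import math

def t_quantile_975(df):
    """Approximate 97.5% Student-t quantile via the standard asymptotic
    expansion around the normal quantile z = 1.959964."""
    z = 1.959964
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2

def ci95(n, mean, sd):
    """Standard error and 95% confidence interval for a group mean."""
    se = sd / math.sqrt(n)
    t = t_quantile_975(n - 1)
    return se, mean - t * se, mean + t * se

# (level, N, mean, SD) taken from Table 3
results = {}
for level, n, mean, sd in [(2, 18, 9.72, 3.97), (1, 22, 8.77, 4.59), (0, 19, 7.68, 3.93)]:
    se, lo, hi = ci95(n, mean, sd)
    results[level] = (round(se, 2), round(lo, 2), round(hi, 2))
    print(level, *results[level])
```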

5 Conclusion and Future Work

Considering that in-class teaching time is very limited, it can be very challenging for an
educator to instruct students while keeping their interest in the course high
throughout the semester. This requires innovation in teaching, especially when the class
is large. The affect-sensitive tutoring system is designed to simulate the process of one-
to-one tutoring by helping educators further supplement the contents to suit the
needs of the students. In this research, we put forth an affective sensitive tutoring
system for the introductory computer programming course, providing the student the
opportunity to learn about certain aspects of the course at Bowie State University.
Several instruction modules were created in which the student is presented with a
lecture specified by the instructor, as well as the topics to be covered in that lecture. Based on
the subtopic selected by the student, only information pertinent to that subtopic is
displayed. Two issues were evaluated in the analyses: the comprehension level of the
user after interacting with the system and the user’s affective states during their learning
experiences. Overall, the results of the condition in which the agent is fully
expressive show a marked increase in the level of comprehension; as we speculated,
the user is more invested in the software when the agent fully articulates
emotion through head movement and facial expression. In addition, the condition in
which the agent is partially animated yields a comprehension level that is roughly the
average of the fully animated and non-animated agents. We hypothesized that the
magnitude of the post-test scores was due to the students’ interaction with the agent.
Since there were marked differences in the magnitude of the test scores between each
clone, the partially expressive and muted clones were, as predicted, not as effective.
This system’s current iteration mainly focuses on information gathered from the
face. Future iterations will investigate student body gestures as they pertain to affective
states of the student, as well as their correlations to facial expression.

Acknowledgment. This work is supported in part by the National Science Foundation under
Grant Number 1714261.

References
1. Grafsgaard, J.F., Wiggins, J.B., Boyer, K.E., Wiebe, E.N., Lester, J.C.: Predicting learning
and affect from multimodal data streams in task-oriented tutorial dialogue. In: Proceedings
7th International Conference on Educational Data Mining (EDM), pp. 122–129 (2014)
2. Grafsgaard, J.F., Wiggins, J.B., Boyer, K.E., Wiebe, E.N., Lester, J.C.: Automatically
recognizing facial indicators of frustration: a learning-centric analysis. In: Proceedings -
2013 Humaine Association Conference on Affective Computing and Intelligent Interaction,
ACII 2013, pp. 159–165 (2013)
3. Whitehill, J., Serpell, Z., Foster, A., Lin, Y.-C., Pearson, B., Bartlett, M., Movellan, J.:
Towards an optimal affect-sensitive instructional system of cognitive skills. In: Computer
Vision and Pattern Recognition Workshop 2011, pp. 20–25 (2011)
4. Bellegarda, J.R.: A data-driven affective analysis framework toward naturally expressive
speech synthesis. IEEE Trans. Audio Speech Lang. Process. 19(5), 1113–1122 (2010)
5. D’Mello, S., Dale, R., Graesser, A.: Disequilibrium in the mind, disharmony in the body.
Cogn. Emot. 26(2), 362–374 (2012)
6. Grafsgaard, J.F., Wiggins, J.B., Boyer, K.E., Wiebe, E.N., Lester, J.C.: Embodied affect in
tutorial dialogue: student gesture and posture. In: Lecture Notes in Computer Science,
including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics, vol. 7926 LNAI, pp. 1–10 (2013)
7. Mahmoud, M., Robinson, P.: Interpreting hand-over-face gestures. In: Lecture Notes in
Computer Science, including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics, vol. 6975 LNCS, no. PART 2, pp. 248–255 (2011)
8. Min, W., Wiggins, J.B., Pezzullo, L.G., Boyer, K.E., Mott, B.W., Frankosky, M.H., Wiebe,
E.N., Lester, J.C.: Predicting dialogue acts of virtual learning companion utilizing student
multimodal interaction data. In: Proceedings 9th International Conference on Educational
Data Mining, pp. 454–459 (2016)
9. Baldassarri, S., Cerezo, E.: Maxine: embodied conversational agents for multimodal
affective communication. In: Mukai, N. (eds.) Computer Graphics. Tech Open-Access
Publishing (2012)
10. Roll, I., Aleven, V., McLaren, B.M., Koedinger, K.R.: Improving students’ help-seeking
skills using metacognitive feedback in an intelligent tutoring system. Learn. Instr. 21(2),
267–280 (2011)
11. Baker, R.S.J.D., D’Mello, S.K., Rodrigo, M.M.T., Graesser, A.C.: Better to be frustrated
than bored: the incidence, persistence, and impact of learners’ cognitive-affective states
during interactions with three different computer-based learning environments. Int. J. Hum
Comput Stud. 68(4), 223–241 (2010)
12. Chi, M., Vanlehn, K., Litman, D., Jordan, P.: An evaluation of pedagogical tutorial tactics
for a natural language tutoring system: A reinforcement learning approach. Int. J. Artif.
Intell. Educ. 21(1–2), 83–113 (2011)
13. D’Mello, S.K., Olney, A., Person, N.K.: Mining collaborative patterns in tutorial dialogues.
J. Educ. Data Min. 2(1), 1–37 (2010)
14. D’Mello, S.K., Calvo, R.A.: Significant accomplishments, new challenges, and new
perspectives. In: Calvo, R.A., D’Mello, S.K. (eds.) New Perspectives on Affect and Learning
Technologies, pp. 255–271. Springer, New York (2011)
15. Ben Ammar, M., Neji, M., Alimi, A.M., Gouardères, G.: The affective tutoring system.
Expert Syst. Appl. 37(4), 3013–3023 (2010)
16. Ekman, P., Friesen, W.: Facial Action Coding System: A Technique for the Measurement of
Facial Movement. Consulting Psychologists Press, Palo Alto (1978)
17. Nye, B.D., Graesser, A.C., Hu, X.: AutoTutor and family: a review of 17 years of natural
language tutoring. Int. J. Artif. Intell. Educ. 24(4), 427–469 (2014)
18. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The
extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-
specified expression. In: Computer Vision and Pattern Recognition Workshops, pp. 94–101,
July 2010
19. Kort, B., Reilly, R.: Analytical models of emotions, learning and relationships: towards an
affect-sensitive cognitive machine. In: Conference on Virtual Worlds and Simulation
(VWSim 2002), pp. 1–15 (2002)
20. Bartneck, C.: Integrating the OCC model of emotions in embodied characters. In:
Proceedings of the Workshop on Virtual Conversational Characters: Applications, Methods,
and Research Challenges, Melbourne (2002)
21. Butko, N.J., Theocharous, G., Philipose, M., Movellan, J.R.: Automated facial affect
analysis for one-on-one tutoring applications. Face Gesture 2011(2), 382–387 (2011)
22. Girard, J.M., Cohn, J.F., Jeni, L.A., Sayette, M.A., De la Torre, F.: Spontaneous facial
expression in unscripted social interactions can be measured automatically. Behav. Res.
Methods, 1–32 (2014)
23. Yang, P.: Facial Expression Recognition and Expression Intensity Estimation. The State
University of New Jersey, Rutgers (2011)
24. Agada, R., Yan, J.: A model of local binary pattern feature descriptor for valence facial
expression classification. In: 2015 IEEE 14th International Conference on Machine Learning
and Applications, vol. 2, no. 2, pp. 634–639 (2015)
25. Malatesta, L., Asteriadis, S., Caridakis, G., Vasalou, A., Karpouzis, K.: Associating gesture
expressivity with affective representations. Eng. Appl. Artif. Intell. 51, 124–135 (2016)
26. Gunes, H., Piccardi, M.: Bimodal face and body gesture database for automatic analysis of
human nonverbal affective behavior. In: Proceedings - International Conference on Pattern
Recognition, vol. 1, pp. 1148–1153 (2006)
27. Weerasinghe, P., Rajapakse, R.P.C.J., Marasinghe, A.: An empirical analysis on emotional
body gesture for affective virtual communication, vol. 12, no. 1, pp. 101–107 (2015)
28. Ammar, M.B., Neji, M.: Conversational embodied peer agents in affective e-learning. In: 8th
International Conference on ITS, pp. 29–37 (2006)
29. Graesser, A.C., Vanlehn, K., Rosé, C.P., Jordan, P.W., Harter, D.: Intelligent tutoring
systems with conversational dialogue. AI Mag. 22(4), 39–52 (2001)
30. Latham, A., Crockett, K., McLean, D., Edmonds, B.: A conversational intelligent tutoring
system to automatically predict learning styles. Comput. Educ. 59(1), 95–109 (2012)
31. Arroyo, I., Beck, J.E., Beal, C.R., Wing, R., Woolf, B.P.: Analyzing students’ response to
help provision in an elementary mathematics intelligent tutoring system. In: Papers of the
AIED-2001 Workshop on Help Provision and Help Seeking in Interactive Learning
Environments, pp. 34–46 (2001)
32. Shah, F.: Recognizing and responding to student plans in an intelligent tutoring system:
CIRCSIM-tutor. Illinois Institute of Technology (2000)
33. Shih, B., Koedinger, K., Scheines, R.: A response time model for bottom-out hints as
worked examples. In: Proceedings of the 1st International Conference on Educational Data
Mining, pp. 117–126 (2008)
34. Whitehill, J., Serpell, Z., Lin, Y.C., Foster, A., Movellan, J.R.: The faces of engagement:
automatic recognition of student engagement from facial expressions. IEEE Trans. Affect.
Comput. 5(1), 86–98 (2014)
35. Cament, L.A., Galdames, F.J., Bowyer, K.W., Perez, C.A.: Face recognition under pose
variation with local gabor features enhanced by active shape and statistical models. Pattern
Recognit. 48(11), 3371–3384 (2015)
36. Jin, H., Liu, Q., Lu, H., Tong, X.: Face detection using improved LBP under bayesian
framework. In: Third International Conference Image Graph, no. 2, pp. 306–309 (2004)
37. Whipple, J., Agada, R., Yan, J.: Foreign language visemes for use in lip-synching with
computer-generated audio. J. Comput. Sci. Inf. Technol. 5(2), 1–14 (2017)
38. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The
extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-
specified expression. In: Computer Vision and Pattern Recognition Workshops, pp. 94–101,
July 2010
39. Ou, J., Bai, X.-B., Pei, Y., Ma, L., Liu, W.: Automatic facial expression recognition using
gabor filter and expression analysis. In: Second International Conference on Computer
Modeling and Simulation 2010, ICCMS 2010, vol. 2, pp. 215–218 (2010)
Multimedia Interactive Boards as a Teaching
and Learning Tool in Environmental Education:
A Case-Study with Portuguese Students

Cecília M. Antão (✉)

Centro de Química Estrutural, Instituto Superior Técnico, Universidade de Lisboa,


Lisbon, Portugal
cmantao@gmail.com

Abstract. The multimedia interactive whiteboard (IWB) is a teaching and learning tool
which, despite its recent adoption, has yielded good results in the instruction and
learning process with students of elementary schools in different countries. In this
study, a multimedia interactive whiteboard was used as an e-learning tool in envi-
ronmental education under an STSE perspective. Two classes of high-school
students, aged 13–16 years old, participated in the study, which aimed to improve envi-
ronmental awareness and Natural Sciences scores in an urban public school in
northern Portugal. The goal was to understand how effective the multimedia inter-
active whiteboard and the ‘Society, Technology, Science-Environment’ (STSE)
perspective were in conveying the concepts related to habitat destruction, and how
both would contribute to the students’ behavior when taking part in a classroom
debate to find the best solutions for an environmental problem. A questionnaire
(attitude scale) and a post-test were the research tools used in the educational meth-
odology. Using the IWB in class brought various benefits: promotion of envi-
ronmental awareness and collaborative work, increased test achievement and
increased willingness to study Natural Sciences.

Keywords: Interactive whiteboard · Multimedia · Natural sciences ·
Society, Technology, Science-Environment (STSE)

1 Introduction

Over the last decade, novel interactive technologies have become popular in classrooms
across the world. According to the 2016 final report of the European Commission DG
Communications Networks, Content & Technology [1], a connectivity package of
measures was taken to ensure that everyone in the EU would have the best possible
internet connection to participate in the digital society and economy, based on the wide-
spread deployment and take-up of very high capacity networks in rural and urban areas.
Earlier, in 2013, it had been reported that interactive whiteboards (IWB) were deployed in
European schools at different levels [2]. Several studies have been conducted focusing
on the advantages, problems and impact of IWB in public education. The current case
study aims at evaluating the benefits of IWB, under an STSE approach, for environmental
awareness and Natural Sciences knowledge, with 8th-level students in a public high

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1164–1169, 2019.
https://doi.org/10.1007/978-3-030-02686-8_87
Multimedia Interactive Boards as a Teaching and Learning Tool 1165

school, in Portugal. For this purpose, the IWB technology was used to present a real
environmental problem in their geographical region; the students were asked to assume
a specific character in a role-play activity, discuss the problem within the classroom and,
eventually, reach a solution.

1.1 Portuguese Paradigm


Portugal is a peculiar case in education: its evolution during the 20th century differed
greatly from that of other European countries, with higher rates of school drop-out
and failure. As a consequence, the scientific literacy of the Portuguese population was
quite poor compared to other European countries early this century. The educational
situation has been gradually improving since then, along with the introduction of ICT
in junior and high schools and the growing social interest in web technologies.
In 2008, a governmental plan installed IWBs in public schools in two phases, first
1600 units and then 6000 units, aiming at one IWB per 3 rooms [3]. A national
plan to train teachers in ICT competences was carried out at the same time and, by 2010,
31,230 teachers had received specific ICT training with interactive whiteboards.
In 2012, the PISA report showed that Portugal was already above the equity line of
OECD countries regarding the allocation of educational resources and mathematics perform-
ance [4]. The 2015 PISA assessment focused on science literacy, and for the first
time in 15 years the Portuguese students scored above the OECD average in sciences [5].
IWB-based education is, however, far from popular in science teaching and
learning. This study attempts to show the benefits of IWB as an e-learning tool for
Natural Sciences with 8th-level students in a Portuguese public school.

2 Methodology

In Portugal, basic education lasts 9 years divided into 3 cycles, the 3rd cycle comprising
the 7th to 9th levels. The Natural Sciences curriculum for the 8th level is usually taught in
two 60-minute lessons per week.
This research took place in an urban public high school in Porto, Portugal. It was
conducted over 4 lessons in 2–3 weeks: the students interacted directly with the IWB
in the first three lessons, and in the last one they completed a questionnaire
and were evaluated in a post-test. More precisely, the IWB technology was first introduced
to the students, who were encouraged to play an online game with their class’s support.
Second, the IWB was used to introduce the environmental problem to the class – water
and soil contamination resulting from industrial waste kept in a deactivated coal mine
– followed by an explanation of the role-play activity, letting the students think
about the different characters they would assume in the classroom debate: the president
of the community, the engineer, the farmer, the scientist, the teacher, the fireman, the
priest, the doctor, and the retired man/woman. In the third lesson, after forming small
groups to prepare the debate, they discussed the problem in the classroom, as in a parish
council, until they reached a consensual solution [6].
1166 C. M. Antão

A questionnaire survey with 14 questions, adapted from [7], was delivered to identify
individual responses to IWB-based lessons. A post-test assessed the students’
knowledge and ability to deal with an environmental problem similar to the one
discussed in class, as defined in the Natural Sciences curriculum.
The software used was ActivInspire Studio; the resources were ActivBoard,
ActivPen and flipcharts.

3 Participants

The research sample included 41 students from an urban context in the city of
Porto, divided into class A (20 students) and class B (21 students) of the 8th grade, aged
13–16 years. It comprised 46% girls and 54% boys. All of them participated in the survey
and 98% took the post-test.

4 Findings

4.1 The Results of the Analysis of the Respondents’ Attitude Towards the Benefits
of Using IWB

Questionnaire/Attitude Scale. The response of each student followed a five-level
attitude scale – strongly agree, agree, partly agree, disagree and strongly disagree
(Table 1).

Table 1. The opinion of the students about the benefits of IWB learning (%, Class A–Class B)

  Question                                     Strongly  Agree  Partly  Disagree  Strongly
                                               agree            agree             disagree
  I concentrate better when the IWB is used    50–33     40–57  10–10   0–0       0–0
  Since IWB is used, I am eager to come to     10–10     40–19  15–29   10–24     25–19
  school
  There is no need to use IWB in the           0–0       0–0    5–10    35–45     60–48
  classroom
  To learn, IWB is no different from other     5–5       10–10  20–24   20–43     45–19
  boards
  I enjoy learning with the IWB                75–43     15–43  10–10   0–5       0–0

The students of Class A showed a more positive opinion about the IWB than those of
class B: higher percentages of Class A students declared that they ‘strongly agree’ with the
sentence ‘I enjoy learning with the IWB’ and ‘agree’ that ‘since the IWB is used, I
am eager to come to school’.
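One hypothetical way to condense Table 1’s Likert distributions into a single attitude score per class is to weight the five levels from 5 (strongly agree) down to 1 (strongly disagree) and take the weighted mean; the helper below is our own illustration, applied to the ‘I enjoy learning with the IWB’ row:

```python
WEIGHTS = [5, 4, 3, 2, 1]  # strongly agree .. strongly disagree

def likert_mean(percentages):
    """Weighted mean of a Likert distribution given as percentages;
    dividing by the percentage sum absorbs rounding (totals near 100)."""
    return sum(w * p for w, p in zip(WEIGHTS, percentages)) / sum(percentages)

# 'I enjoy learning with the IWB', Class A vs Class B (Table 1)
class_a = [75, 15, 10, 0, 0]
class_b = [43, 43, 10, 5, 0]
print(round(likert_mean(class_a), 2), round(likert_mean(class_b), 2))  # 4.65 4.23
```

On this crude score Class A’s reported enjoyment is indeed higher, matching the qualitative comparison above.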
When the results were aggregated by gender, a larger percentage of boys (38%) than
girls (20%) strongly agreed that they study more because of the IWB. On the other
hand, girls showed a slightly more positive attitude in class participation – 65% of the
girls agreed that they put their finger in the air more frequently in IWB-based lessons,
compared to 62% of the boys (data not shown).

STSE Perspective. The teaching strategy of problem-solving associated with role-play
gave better results in lessons/groups where most students were girls. In
these cases, a solution for the environmental issue was reached. During the debate, the
girls were more mature when standing up for their ideas, whilst some boys tended
to make out-of-context comments in order to get laughs. On the other hand, certain
students engaged deeply in the role-play and were surprisingly able to understand
all aspects of the problem and look for a solution. Like [8], we believe that the STSE
approach associated with problem-solving allowed the development of competences, in
particular searching for information to better understand certain phenomena, science
communication and cooperation with others.

Post-test. In the fourth lesson, the students’ knowledge was evaluated with a post-test
presenting an environmental problem similar to the role-play one – the environmental
recovery of a deactivated landfill in the suburbs of Porto.
Class A students obtained an average score of 63.4% and class B students, 64.4%.
Most students understood the concepts related to habitat destruction and to problem-solving
with environmental and social implications.
Only 14.3% of the scores were negative, meaning below 50%, which was much better than
the usual test results of both classes in Natural Sciences.
Again, the analysis by gender showed significant differences: the boys obtained an
average score of 68% while the girls had 59.5%, i.e. an 8.5-point difference, considered to
be significant in a small sample like this one (Fig. 1).

Fig. 1. Dispersion of the scores, in percentage, obtained in the post-test by the students of class
A and B, according to the gender. The standard deviation was 24.8% for the girls and 14.3% for
the boys.
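Readers who want to gauge this gap quantitatively can compute a Welch t statistic from the summary figures reported here (means 68 and 59.5, standard deviations 14.3 and 24.8); the group sizes below are approximated from the 54%/46% split of the 41 students and are therefore our assumption:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent groups with unequal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# boys: mean 68, SD 14.3, n ~ 22; girls: mean 59.5, SD 24.8, n ~ 19
t, df = welch_t(68.0, 14.3, 22, 59.5, 24.8, 19)
print(round(t, 2), round(df, 1))  # 1.32 27.9
```

With the larger standard deviation among the girls, the statistic stays modest, which is worth keeping in mind when interpreting differences in a sample this small.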

5 Discussion

Concerning the attitudinal behavior of 8th-level students towards the IWB, no great differ-
ences were detected between Class A and Class B. Most students had a positive response
to IWB technology and engaged in the problem-solving strategy.
Nevertheless, the majority of Class A students showed more enthusiasm for certain
aspects of IWB-based technology – Class A was the one whose students enjoyed learning
with IWB-based tools the most, even though the post-test results showed little difference
from Class B in knowledge acquisition and environmental awareness.
The most significant differences relate to gender: not only did the boys show a more
positive attitude to the IWB, but they also got higher scores in the post-test. It is likely that
the boys from both classes were more confident about their computer knowledge than
the girls and, therefore, the IWB was more attractive to them. This may have led the
boys to study more than the girls, and to get higher scores than the girls in the post-test.
In spite of the economic crisis, the majority of these students owned a PC or a smart-
phone, an aspect also mentioned in a 2015 Portuguese case study with IWB in the
first cycle [9]. Therefore, both boys and girls in our research groups had similar
opportunities to interact with ICT at home, and the differences cannot be explained
by differentiated access to ICT tools.
This gender difference could be explained by the role-play used in the problem-
solving methodology: the boys chose the characters with decision-making responsibilities
(president or engineer) while the girls preferred more neutral characters like the scien-
tist or the teacher. Being in a decision-making position, the boys may have engaged with
the environmental problem more seriously and, as a consequence, spent more time
thinking about a solution.
The use of IWB was undoubtedly beneficial as a teaching-and-learning tool in this
study: most students were in favor of using IWB in the classroom, and their
motivation may have contributed to their score increase in the post-test. It was also a
profitable experience for the trainee teacher to act as debate mediator and find the right
moment to leave the spotlight, in order to empower the students in their search for a
solution to the environmental problem.
The role-play associated with the problem-solving methodology developed the
students’ collaborative and cognitive skills: individual and collaborative abilities
such as sharing and negotiating ideas, regulating problem solving, maintaining commu-
nication, and applying knowledge and problem-solving skills were part of the learning
process. These proved to be key assessments in a recent study by ETS researchers in
the USA to develop a Collaborative Science Assessment prototype [10].
Overall, the IWB technology was successful in supporting a role-play approach to
environmental problem-solving. The way the IWB was used increased the students’
motivation for learning, also because it focused on something real – a local environ-
mental problem – which is rarely mentioned in science education research. The IWB
contributed to increasing the 8th-level students’ interest in and knowledge of Natural
Sciences and made them aware of some environmental issues. Similarly, a 2017 research
study in Hungary found that a specific ICT tool, Edmodo, enhanced 10th-level students’
academic achievement and motivation in Biology learning [11].

In future studies, more classes of the same size should be used in order to obtain broader
statistical data. The research period would also benefit from an extra lesson to help
the students build their characters and improve their performance in the debate. ICT
technology, and in particular the IWB, has already proved to be an advantage in science
teaching and learning in public schools, but it still lacks the necessary engagement of
teachers to try new interdisciplinary approaches.

References

1. European Commission DG Communications Networks: Content & Technology 2016. https://


ec.europa.eu/info/publications/annual-activity-report-2016-communications-networks-
content-and-technology_en. Accessed 25 Apr 2018
2. Vainoryte, B., Zygaitiene, B.: Peculiarities of interactive whiteboard application during
lessons in Lithuanian general education schools. Procedia–Soc. Behav. Sci. 197, 1672–1678
(2015)
3. Min-Edu.pt Homepage. http://erte.dgidc.min-edu.pt/publico/conteudos/BrochuraQIM.pdf.
Accessed 25 Apr 2018
4. PISA 2012: Programme for International Student Assessment. http://www.oecd.org/pisa/
keyfindings/pisa-2012-results.htm. Accessed 25 Apr 2018
5. PISA 2015: Programme for International Student Assessment. http://www.compareyour
country.org/pisa/country/prt?lg=en. Accessed 25 Apr 2018
6. Krüger, V., Nunes, S.L.P.: Um projeto educativo referenciado pelo MIE e um enfoque CTS.
In: IV Encontro Ibero-Americano de colectivos escolares e redes de professores que fazem
investigação na sua escola, Lajeado, RS, Brazil, pp. 1–8 (2005)
7. Sad, S.N.: An attitude scale for smart board use in education: validity and reliability studies.
Comput. Educ. 58, 900–907 (2012)
8. Carrasquinho, S., Vasconcelos, C., Costa, N.: Resolución de problemas en la enseñanza de
la geologia: contribuciones de un estudio exploratório. Revista Eureka sobre Enseñanza y
Divulgación de las Ciencias 4(1), 67–86 (2007)
9. Dias, V., Gil, H., Costa, N., Gonçalves, T.: O quadro interativo multimédia (QIM) num
contexto de prática de ensino supervisionada em 1º CEB. In: Atas do XVII Simpósio
Internacional de Informática Educativa, Escola Superior de Educação do Instituto Politécnico
de Setúbal, pp. 7–12, Setúbal, Portugal, 25–27 novembro 2015
10. von Davier, A.A., Hao, J., Lei, L., Kyllonen, P.: Interdisciplinary research agenda in support
of assessment of collaborative problem solving: lessons learned from developing a
collaborative science assessment prototype. Comput. Hum. Behav. 76, 631–640 (2017)
11. Végh, V., Nagy, Z.B., Zsigmond, C., Elbert, G.: The effects of using Edmodo in biology
education on students’ attitudes towards biology and ICT. Probl. Educ. 21st Century 75(5),
483–495 (2017). http://oaji.net/articles/2017/457-1509895649.pdf. Accessed 01 June 2018
Author Index

A Beheshti, Babak D., 196


Abdel-Salam, T. S., 313 Benítez, Diego S., 171
Abdou, George, 616 BenMessaoud, Fawzi, 235
Abuhussein, Abdullah, 205 Berger, Philipp, 962
Afshari, Hamed H., 298 Birnbaum, Dror, 884
Agada, Ruth, 1151 Biswas, Subir, 79
Agbo, J. J., 1109 Blumrosen, Gaddi, 884
Ahmed, Bestoun S., 241 Bures, Miroslav, 241
Ait Kadi, Daoud, 904 Busch, John, 1005
Al Shebli, Hessa Mohammed Zaher, 196
Alférez, Germán H., 257 C
Alharbi, Saleh, 1142 Cachipuendo, Rolando, 171
Al-Jarrah, Ahmad A., 1017 Cao, Houwei, 55
Al-Maitah, Mohammed, 359 Cao, Xixi, 55
Al-Maliki, Murtadha, 828 Caraguay, Jorge A., 874
Alsubaei, Faisal, 205 Carpenter, Vanessa Julia, 104
Amiruzzaman, Md, 283 Chan, Liliang, 55
Anderl, Reiner, 458 Chavan, Satishkumar S., 548
Antão, Cecília M., 1164 Cheng, Lian-Ta, 640
Arévalo, Andrés, 444 Cheng, Qijin, 385
Arosha Senanayake, S. M. N., 598 Cherif, Arab Ali, 481
Atalla, Nadi, 616 Chiang, Chen-Fu, 914
Aung, Swe Swe, 530 Chieng, David, 598
Avdoshin, Sergey, 626 Chowdhury, Wahida, 152
Aylon, Linnyer Beatryz Ruiz, 1 Claeser, Daniel, 418
Aziz, N., 835 Coady, Yvonne, 1062
Cooper, Rachel, 134
B Coulton, Paul, 134
Bakken, David, 517
Balar, Kalpesh, 678 D
Bansal, Arvind K., 569 Darman, Rozanawati, 763
Barber, K. Suzanne, 369 Dasigi, Venu G., 505
Barreto, Gilmar, 1 Deng, Zhiqun Daniel, 517
Batista, Vivian F. López, 874 Di Biano, Robert, 659
Behar, Patricia Alejandra, 982 Djerroud, Halim, 481

© Springer Nature Switzerland AG 2019


K. Arai et al. (Eds.): FTC 2018, AISC 880, pp. 1171–1174, 2019.
https://doi.org/10.1007/978-3-030-02686-8
Drew, Steve, 1142
Du, Jiali, 806
Du, Xu, 1097
Du, Youchen, 46
Duron-Arellano, David, 19

E
Elaish, Monther M., 1029
Encalada, Patricio, 63

F
Feller, Nico, 856
Feng, Lin, 46
Fernando, Terrence, 333
Foroughi, Farhad, 185
Fu, Tao, 517
Fuertes, Walter, 171

G
Gao, Jiahui, 385
Garcia, Jordi, 687
Garg, Manu, 930
Gavilanes-Sagnay, Fredy, 1123
Gawde, Purva R., 569
Ghani, Norjihan Abdul, 1029
Ghods, Amir H., 298
Giorno, Fernando, 159
Golpayegani, S. Alireza Hashemi, 343
Gómez-Cárdenas, Alejandro, 122
Gordón, Carlos, 63
Greer, Des, 1005
Griffith, Henry, 79
Guanochanga, Byron, 171
Gupta, Neeraj, 730
Gust, Peter, 856

H
Haase, Ines, 856
Haddara, Moutaz, 92
Hamdi, Mohamed Salah, 490
Hameed, Sarab M., 776
Hanna, Philip, 1005
Hassan, H. F., 1109
Hausknecht, Simone, 1133
Hazeem, Ahmed Abdulbasit, 763
Helgesen, Tim, 92
Heljakka, Katriina, 1079
Hennig, Patrick, 962
Hernandez, German, 444
Hill, Richard, 333
Hormizi, Elham, 343
Hou, Hongfei, 517
Hou, Yu, 649
Hübner, Rodrigo, 1
Hung, Jui-Long, 1097

I
Idowu, A., 1109
Ihamäki, Pirita, 1079
Iram, Shamaila, 333
Islam, Md. Manirul, 225
Iyengar, S. S., 659

J
Jacoby, Derek, 1062
Jalali, Shahrzad, 298
Jaramillo, Edgar D., 874
Jayaweera, C. D., 835
Joshi, Karishma, 589

K
Kahvazadeh, Sarang, 122
Kampa, Sebastian P., 856
Karam, Orlando, 505
Kaufman, David, 1133
Keivanpour, Samira, 904
Kent, Samantha, 418
Khairiyah Binti Haji Raub, Siti Asmah @, 598
Khaleefahand, Shihab Hamad, 763
Khaleq, Abeer Abdel, 401
Khosravy, Mahdi, 730
Komogortsev, Oleg, 79
Kong, Fenddy Kong Mohd Aliff, 849
Krishna, Praful, 678
Krůza, Oldřich, 749
Kuboň, Vladislav, 749
Kulkarni, Harshad, 678
Kulkarni, Rucha, 678

L
Lavi, Yaron, 884
Lema, Henry, 63
León, Diego, 63, 444
Liau, David, 369
Lindley, Joseph, 134
Liu, Shenglan, 46
Liu, Xiangyu, 55
Loi, Daria, 788
Loza-Aguirre, Edison, 1123
Lu, Jun, 517
Luksch, Peter, 185

M
Machado, Leticia Rocha, 982
Madni, Asad M., 659
Mahamud, Md. Sadad, 225
Marín-Tordera, Eva, 122

Martinez, Jayson J., 517
Masip-Bruin, Xavi, 122, 687
McGowan, Aidan, 1005
McKenna, H. Patricia, 269
Medendorp, Anthony, 941
Mehrandezh, Mehran, 19
Meinel, Christoph, 962
Meneses, Fausto, 171
Miller, John, 517
Møbius, Nikolaj “Dzl”, 104
Mohamed, Abduljalil, 490
Moravcik, Oliver, 950
Mostafa, M. A. M., 313
Mostafa, Salama A., 763
Muhammad, Iqra, 435
Mukhopadhyay, Supratik, 659
Mustapha, Aida, 763

N
Nagayama, Itaru, 530
Naim, Abdul Ghani, 598
Nat, M., 1109
Navarro Ortega, Samuel A., 992
Nielson, Jeffery A., 569
Niño, Jaime, 444
Noureen, Rabia, 435

O
Ohsawa, Shin, 530
Okunlola, L., 1109
Oluwajana, Dokun, 1109
Osman, Mohd Azam, 849
Overholt, Dan, 104

P
Patel, Nilesh, 730
Pawar, Abhijit, 548
Peluffo-Ordóñez, D. H., 874
Peñaherrera, Cristian, 63
Peng, Pai, 55
Pesotskaya, Elena, 626
Petrykowski, Markus, 962
Pijal-Rojas, José, 874
Pilar Munuera Gómez, M., 992
Pitz, Katrin, 458
Preston, Nicholas, 1062
Pydimarri, Sailaja, 505
Pyeatt, Larry D., 705

Q
Qamar, Usman, 435
Qiao, Hong, 46

R
Ra, Ilkyeun, 401
Raahemi, Bijan, 298
Rahman, Md. Saniat, 225
Ralph, Rachel, 1062
Ramudhin, Amar, 904
Rana, Hukum Singh, 589
Randrianasolo, Arisoa S., 705
Riofrío-Luzcando, Diego, 1123
Rivas, Mario, 159
Rosenberg, Louis, 721
Rosero, Edwin A., 874
Rosero-Montalvo, Paul D., 874
Ryan, Sarah, 235

S
Salman, Osama A., 776
Salvador, Santiago, 171
Sandoval, Javier, 444
Saxena, Nidhi, 589
Schell, Robyn, 1133
Schnörr, Claudius, 30
Segura-Morales, Marco, 1123
Semwal, Sudhanshu Kumar, 930, 941
Sengupta, Souvik, 687
Serebryanyk, Alla, 30
Sewell III, Thomas, 235
Shabaty, Or, 884
Sharma, Rochan, 589
Shelton, Brett E., 1097
Shennat, Abdulmonem I., 1029
Shiva, Sajjan, 205
Shuib, Liyana, 1029
Silva, Diogo, 1133
Soltani, Neda, 343
Sonego, Anna Helena Silveira, 982
Soto-Lopez, Daniel, 19
Stein, Max Vom, 856
Suman, Samiul Haque, 225
Svetsky, Stefan, 950
Swief, R. A., 313
Sylnice, Joe R., 257

T
Tahar, Sofiene, 490
Tahir, Rabail, 1041
Talbar, Sanjay N., 548
Talib, Abdullah Zawawi, 849
Tamaki, Shiro, 530
Tan, Li, 517
Tao, Lixin, 649
Tapia, Freddy, 171

Tiwari, Bhupendra Nath, 730
Toma, Milan, 557
Torres, Jenny, 171
Torrezzan, Cristina Alba Wildt, 982
Toulkeridis, Theofilos, 171

V
Vanduhe, V. Z., 1109
Villacís, César, 171
Virzi, Valerio, 856

W
Wang, Alf Inge, 1041
Wang, Wen-Fong, 640
Willcox, Gregg, 721
Willis, Amanda, 104
Wu, Jie, 46

X
Xu, Li, 46
Xu, Weifeng, 1151

Y
Yan, Jie, 1151
Yang, Ching-Yu, 640
Yang, Juan, 1097
Yu, Philip L. H., 385
Yu, Pingfang, 806

Z
Zaeem, Razieh Nokhbeh, 369
Zainon, Wan Mohd Nazmee Wan, 849
Zaki, Hatem, 313
Zhang, Mingyan, 1097
Zisler, Matthias, 30
Zong, Chengqing, 806
