COMPUTER SCIENCE, TECHNOLOGY AND APPLICATIONS
ARTIFICIAL INTELLIGENCE
No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or
by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no
expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No
liability is assumed for incidental or consequential damages in connection with or arising out of information
contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in
rendering legal, medical or any other professional services.
LUIS RABELO
SAYLI BHIDE
AND
EDGAR GUTIERREZ
EDITORS
Copyright © 2018 by Nova Science Publishers, Inc.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in
any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or
otherwise without the written permission of the Publisher.
We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse
content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the
“Get Permission” button below the title description. This button is linked directly to the title’s permission
page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN.
For further questions about using the service on copyright.com, please contact:
Copyright Clearance Center
Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: info@copyright.com.
NOTICE TO THE READER
The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied
warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for
incidental or consequential damages in connection with or arising out of information contained in this book.
The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or
in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government
reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of
such works.
Independent verification should be sought for any data, advice or recommendations contained in this book. In
addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property
arising from any methods, products, instructions, ideas or otherwise contained in this publication.
This publication is designed to provide accurate and authoritative information with regard to the subject
matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering
legal or any other professional services. If legal or any other expert assistance is required, the services of a
competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED
BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF
PUBLISHERS.
Additional color graphics may be available in the e-book version of this book.
CONTENTS

Preface

Chapter 1 Unsupervised Ensemble Learning
Ramazan Ünlü

Chapter 2 Using Deep Learning to Configure Parallel Distributed Discrete-Event Simulators
Edwin Cortes, Luis Rabelo and Gene Lee

Chapter 3 Machine Learning Applied to Autonomous Vehicles
Olmer Garcia and Cesar Diaz

Chapter 4 Evolutionary Optimization of Support Vector Machines Using Genetic Algorithms
Fred K. Gruber

Chapter 5 Texture Descriptors for the Generic Pattern Classification Problem
Loris Nanni, Sheryl Brahnam and Alessandra Lumini

Chapter 6 Simulation Optimization Using a Hybrid Scheme with Particle Swarm Optimization for a Manufacturing Supply Chain
Alfonso T. Sarmiento and Edgar Gutierrez

Chapter 7 The Estimation of Cutting Forces in the Turning of Inconel 718 Assisted with a High Pressure Coolant Using Bio-Inspired Artificial Neural Networks
Djordje Cica and Davorin Kramar

Chapter 8 Predictive Analytics using Genetic Programming
Luis Rabelo, Edgar Gutierrez, Sayli Bhide and Mario Marin

Chapter 12 Artificial Intelligence for the Modeling and Prediction of the Bioactivities of Complex Natural Products
Jose M. Prieto

Chapter 13 Predictive Analytics for Thermal Coal Prices Using Neural Networks and Regression Trees
Mayra Bornacelli, Edgar Gutierrez and John Pastrana

Chapter 14 Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence
Bert Olivier

Index
PREFACE
After decades of basic research and more promises than impressive applications,
artificial intelligence (AI) is starting to deliver benefits. A convergence of advances is
motivating this new surge of AI development and applications. Computing capability, as
evolved from high-throughput and high-performance computing systems, is increasing. AI
models and operations research adaptations are becoming more mature, and the world is
breeding big data not only from the web and social media but also from the Internet of
Things.
This is a very distinctive book which discusses important applications using a variety
of paradigms from AI and outlines some of the research still to be performed. The work
supersedes similar books that do not cover as diversified a set of sophisticated
applications. The authors present a comprehensive and articulated view of recent
developments, identify the applications gap by quoting from the experience of experts,
and detail suggested research areas.
The book is organized into 14 chapters which provide a perspective of the field of AI.
Areas covered in these selected papers include a broad range of applications, such as
manufacturing, autonomous systems, healthcare, medicine, advanced materials, parallel
distributed computing, and electronic commerce. AI paradigms utilized in this book
include unsupervised learning, ensembles, neural networks, deep learning, fuzzy logic,
support-vector machines, genetic algorithms, genetic programming, particle swarm
optimization, agents, and case-based reasoning. A synopsis of the chapters follows:
• Clustering Techniques: Novel research in clustering techniques is essential to
improve the exploratory analysis required for revealing hidden patterns where label
information is unknown. Ramazan Ünlü, in the chapter “Unsupervised Ensemble
Learning,” discusses unsupervised ensemble learning, or consensus clustering, which is a
method to improve the selection of the most suitable clustering algorithm. The goal of
this combination process is to increase the average quality of individual clustering
methods. The chapter first introduces the main concepts of clustering methods and then
gives the basics of ensemble learning. Finally, the chapter concludes with a summary of
recent progress in unsupervised learning.
• Deep Learning and a Complex Application in Parallel Distributed Simulation:
This topic is introduced in the chapter by Edwin Cortes and Luis Rabelo entitled “Using
Deep Learning to Configure Parallel Distributed Discrete-Event Simulators.” The authors
implemented a pattern recognition scheme to identify the best time management and
synchronization scheme for executing a particular parallel discrete-event simulation
(DES) problem. This innovative pattern recognition method measures software
complexity and characterizes the features of the network and hardware configurations to
quantify and capture the structure of the parallel distributed DES problem. It is
innovative research in deep belief network models.
• Autonomous Systems: The area of autonomous systems, as represented by
autonomous vehicles, and deep learning, in particular Convolutional Neural Networks
(CNNs), are presented in the chapter “Machine Learning Applied to Autonomous
Vehicles” by Olmer García and Cesar Díaz. This chapter presents an application of deep
learning to the architecture of autonomous vehicles, which is a good example of a
multiclass classification problem. The authors argue that the use of AI in this domain
requires two hardware/software systems: one for training in the cloud and the other in
the autonomous vehicle. The chapter demonstrates that deep learning can create
sophisticated models that are able to generalize with relatively small datasets.
• Genetic Algorithms & Support Vector Machines: The chapter “Evolutionary
Optimization of Support Vector Machines Using Genetic Algorithms” discusses how
Genetic Algorithms (GAs) can select the learning parameters of AI paradigms and thus
help researchers automate the learning process. Fred Gruber uses a GA to find an
optimized parameter set for support vector machines (SVMs). GAs and cross-validation
increase the generalization performance of SVMs, although it should be noted that the
processing time increases. This drawback can be reduced by finding more efficient
configurations for SVMs.
• Texture Descriptors for the Generic Pattern Classification Problem: In the
chapter “Texture Descriptors for the Generic Pattern Classification Problem”, Loris
Nanni, Sheryl Brahnam, and Alessandra Lumini propose a framework that employs a
matrix representation for extracting features from patterns that can be effectively applied
to very different classification problems. Using texture analysis, the chapter presents
experimental results showing the advantages of their approach. They also report the
results of experiments that examine the performance outcomes from extracting different
texture descriptors from matrices that were generated by reshaping the original feature
vector. Their new methods outperformed SVMs.
• Simulation Optimization: The purpose of simulation optimization in predicting
supply chain performance is addressed by Alfonso Sarmiento and Edgar Gutierrez in the
chapter “Simulation Optimization Using a Hybrid Scheme with Particle Swarm
Optimization for a Manufacturing Supply Chain.” The methodology uses particle swarm
optimization (PSO) to find stability in the supply chain using a system dynamics model
of an actual situation. This is a classical problem in which asymptotic stability has been
listed as one of the problems to solve. The authors show that many factors affect supply
chain dynamics, including shorter product life cycles, the timing of inventory decisions,
and environmental regulations. Supply chains evolve with these changing dynamics,
which causes the systems to behave nonlinearly. The impacts of these irregular behaviors
can be minimized when the methodology solves an optimization problem to find a
stabilizing policy using PSO (which outperformed GAs in the same task). To obtain
convergence, a hybrid algorithm must be used. Incorporating a theorem that allows
finding ideal equilibrium levels enables a broader search for stabilizing policies.
• Cutting Forces: Accurate prediction of cutting forces has a significant impact on
product quality in manufacturing. The chapter “The Estimation of Cutting Forces in the
Turning of Inconel 718 Assisted with a High Pressure Coolant Using Bio-Inspired
Artificial Neural Networks” aims at utilizing neural networks to predict cutting forces in
the turning of the nickel-based alloy Inconel 718 assisted with a high pressure coolant.
Djordje Cica and Davorin Kramar discuss a study that employs two bio-inspired
algorithms, namely GAs and PSO, as training methods for neural networks. Further, they
compare the performance of the GA-based and PSO-based neural network models with
the most commonly used backpropagation-based neural networks.
• Predictive Analytics using Genetic Programming: The chapter “Predictive
Analytics using Genetic Programming” by Luis Rabelo, Edgar Gutierrez, Sayli Bhide,
and Mario Marin focuses on predictive analytics using genetic programming (GP). The
authors describe in detail the methodology of GP and demonstrate its advantages. It is
important to highlight the use of the decile table to select better predictors and guide the
evolutionary process. An actual application to the Reinforced Carbon-Carbon structures
of the NASA Space Shuttle is used. This example demonstrates how GP has the potential
to be a better option than regression/classification trees, because GP has more
operators, including those of regression/classification trees. In addition, GP can
help create synthetic variables to be used as input to other AI paradigms.
• Managing Overcrowding in Healthcare using Fuzzy Logic: The chapter
“Managing Overcrowding in Healthcare using Fuzzy Logic” focuses on the
overcrowding problem frequently observed in the emergency departments (EDs) of
healthcare systems. A hierarchical fuzzy logic approach is utilized by Abdulrahman
Albar, Ahmad Elshennawy, Mohammed Basingab, and Haitham Bahaitham to develop a
framework for quantifying overcrowding. The purpose of this research was to develop a
quantitative measurement tool for evaluating ED crowding which captures healthcare
experts’ opinions and other ED stakeholders’ perspectives. This framework has the
Mohammed Basingab. The applications of DES-CBR provided solutions that were
realistic, robust, and more importantly the results were scrutinized, and validated by field
experts.
• Agent Based Modeling and Simulation and its Application to E-commerce: This
chapter by Oloruntomi Joledo, Edgar Gutierrez, and Hathim Bukhari presents an
application to a peer-to-peer lending environment. The authors seek to find how system
performance is affected by the actions of stakeholders in an e-commerce system.
Dynamic system complexity and risk are considered in this research. Combining system
dynamics and neural networks at the strategy level with agent-based models of consumer
behavior allows for a business model representation that leads to reliable
decision-making. The presented framework shares insights into consumer-to-consumer
behavior in e-commerce systems.
• Artificial Intelligence for the Modeling and Prediction of the Bioactivities of
Complex Natural Products: This chapter by Jose Prieto presents neural networks as a
tool to predict bioactivities of very complex chemical entities such as natural products,
and suggests strategies for the selection of inputs and conditions for the in silico
experiments. Jose Prieto explains that neural networks can become reliable, fast, and
economical tools for the prediction of anti-inflammatory, antioxidant, and antimicrobial
activities, thus improving their use in medicine and nutrition.
• Predictive Analytics: Predictive analytics is one of the most advanced forms of
analytics, and AI paradigms are the core of these predictive systems. The chapter
“Predictive Analytics for Thermal Coal Prices Using Neural Networks and Regression
Trees” by Mayra Bornacelli and Edgar Gutierrez aims to deliver price predictive
analytics models, a necessity for many industries. This chapter is targeted towards
predicting prices of thermal coal. By implementing the Delphi methodology along with
neural networks, conclusions can be reached about global market tendencies and
variables. Although neural networks outperformed regression trees, the latter created
models which can be easily visualized and understood. Overall, the research found that
even though the market of thermal coal is dynamic and the history of its prices is not a
good predictor of future prices, the general patterns that were found hold more
importance than the study of individual prices, and the methodology that was used
applies to oligopolistic markets.
• Explorations of the Transhuman Dimension of Artificial Intelligence: The final
chapter provides a very important philosophical discussion of AI and its ‘transhuman’
dimension, which is “here understood as that which goes beyond the human, to the point
of being wholly different from it.” In “Explorations of the ‘Transhuman’ Dimension of
Artificial Intelligence”, Bert Olivier examines the concept of intelligence as a function of
artificially intelligent beings. However, these artificially intelligent beings are recognized
as being ontologically distinct from humans as “embodied, affective, intelligent beings.”
These differences are the key to understanding the contrast between AI and being-human.
His examination involves contemporary AI-research as well as projections of possible AI
developments. This is a very important chapter with important conclusions for AI and its
future.
We would like to acknowledge the individuals who contributed to this effort. First
and foremost, we would like to express our sincere thanks to the contributors of the
chapters for reporting their research and for their time and promptness. Our thanks
are due to Nova for publishing this book, for their advice, and for their patience. We
believe that this
book is an important contribution to the community in AI. We hope this book will serve
as a motivation for continued research and development in AI.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.
Chapter 1

UNSUPERVISED ENSEMBLE LEARNING

Ramazan Ünlü*
Industrial Engineering and Management Systems,
University of Central Florida, Orlando, FL, US
ABSTRACT
* Corresponding Author Email: ramazanunlu@gmail.com.
INTRODUCTION
Data mining (DM) has been one of the most notable research areas in recent decades. DM
can be defined as an interdisciplinary area at the intersection of Artificial Intelligence
(AI), machine learning, and statistics. One of the earliest studies of DM, which highlights
some of its distinctive characteristics, was proposed by (Fayyad, Piatetsky-Shapiro, &
Smyth, 1996; Kantardzic, 2011), who define it as "the nontrivial process of identifying
valid, novel, potentially useful, and ultimately understandable patterns in data." In
general, the process of extracting implicit, hidden, and potentially useful knowledge
from data is a well-accepted definition of DM.
With the growing use of computers and data storage technology, a great amount of
data is being produced by different systems. Data can be defined as a set of
qualitative or quantitative variables such as facts, numbers, or texts that describe
things. For DM, the standard structure of a dataset is a collection of samples on which
measurements named features are specified. If we consider that a sample is represented
by a multidimensional vector, each dimension can be considered as one feature of the
sample. In other words, features are values that represent specific characteristics of a
sample (Kantardzic, 2011).
Figure 1. Tabular form of the data. Original dataset can be found in http://archive.ics.uci.edu/ml/
datasets/Adult.
Based on true class information, data can be categorized from a DM perspective as
labeled and unlabeled. Labeled data refers to a set of samples or cases with known
true classes, and unlabeled data is a set of samples or cases without known true classes.
Figure 1 shows some samples of a dataset in tabular form, in which the columns
represent the features of samples and the rows are the values of these features for a
specific sample. In this example, consider that the true outputs are unknown. The true
outputs could be, for example, whether people have an annual income of more or less
than $100,000.
UNSUPERVISED LEARNING

A clustering of a dataset 𝑋 partitions it into 𝑘 clusters 𝑝1, …, 𝑝𝑘 such that:

𝑝𝑖 ≠ ∅ for 𝑖 = 1, …, 𝑘
⋃𝑖 𝑝𝑖 = 𝑋 (the clusters cover 𝑋)
𝑝𝑖 ∩ 𝑝𝑗 = ∅ for 𝑖 ≠ 𝑗 (the clusters are pairwise disjoint)
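These three partition conditions can be checked mechanically; a quick illustrative sketch in Python (the toy data and helper name are my own, not from the chapter):

```python
def is_hard_partition(clusters, X):
    """Check the three hard-partition conditions for clusters p_1..p_k of X."""
    # Condition 1: no cluster is empty.
    if any(len(p) == 0 for p in clusters):
        return False
    # Condition 2: the union of all clusters covers X exactly.
    union = set().union(*clusters)
    if union != set(X):
        return False
    # Condition 3: clusters are pairwise disjoint (sizes add up to the union).
    return sum(len(p) for p in clusters) == len(union)

X = [1, 2, 3, 4, 5, 6]
print(is_hard_partition([{1, 2}, {3, 4}, {5, 6}], X))  # True
print(is_hard_partition([{1, 2}, {2, 3}, {4, 5, 6}], X))  # False: clusters overlap
```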
Through this clustering process, clusters are created based on dissimilarities and
similarities between samples. Those dissimilarities and similarities are assessed based on
the feature values describing the objects and are relevant to the purpose of the study,
domain-specific assumptions and prior knowledge of the problem (Grira, Crucianu, &
Boujemaa, 2005). Since similarity is an essential part of a cluster, a measure of the
similarity between two objects is crucial in clustering algorithms. This measure must
be chosen very carefully because the quality of a clustering model depends on this
decision. Instead of a similarity measure, the dissimilarity between two samples is
commonly used as well. Typical dissimilarity metrics are distance measures defined on
the feature space, such as the Euclidean, Minkowski, and City-block distances
(Kantardzic, 2011).
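For illustration, these distances can all be written as special cases of the Minkowski distance of order p (a small sketch; the function names are my own):

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two feature vectors a and b."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    return minkowski(a, b, 2)   # p = 2

def city_block(a, b):
    return minkowski(a, b, 1)   # p = 1, also called Manhattan distance

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b))   # 5.0
print(city_block(a, b))  # 7.0
```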
The standard process of clustering can be divided into several steps. The structure
of those necessary steps of a clustering model is depicted in Figure 3, inspired by (R. Xu
& Wunsch, 2005). Several taxonomies of clustering methods have been
proposed by researchers (Nayak, Naik, & Behera, 2015; D. Xu & Tian, 2015; R. Xu &
Wunsch, 2005). It is not easy to capture the strong diversity of clustering methods,
because of their different starting points and criteria. A rough but widely agreed
categorization of clustering methods is to classify them as hierarchical clustering and
partitional clustering, based on the properties of the clusters generated (R. Xu &
Wunsch, 2005). However, the detailed taxonomy listed in Table 1, inspired by the one
suggested in (D. Xu & Tian, 2015), is put forward.
In this study, details of the algorithms categorized in Table 1 are not discussed; we
refer the reader to (D. Xu & Tian, 2015) for a detailed explanation of these clustering
algorithms. However, a brief overview of ensemble-based clustering is given, and a
detailed discussion is presented in the section below.
Figure 4. Comparison of different clustering methods. (a) represents the raw data without known true
classes; (b), (c), and (d) illustrate various partitions of the data produced by different methods.
Robustness: The consensus clustering might have better overall performance than
the majority of the individual clustering methods.
Consistency: The combined result is similar to the results of the individual
methods being combined.
Stability: The consensus clustering shows less variability across iterations than
the combined individual algorithms.
In terms of properties like these, better partitions can be produced in comparison
to most individual clustering methods. The result of consensus clustering cannot be
expected to be the best result in all cases, as there can be exceptions. It can only be
ensured that consensus clustering outperforms most of the single algorithms combined
with respect to some properties, under the assumption that a combination of the good
characteristics of various partitions is more reliable than any single algorithm.
Over the past years, many different algorithms have been proposed for consensus
clustering (Al-Razgan & Domeniconi, 2006; Ana & Jain, 2003; Azimi & Fern, 2009; d
Souto, de Araujo, & da Silva, 2006; Hadjitodorov, Kuncheva, & Todorova, 2006; Hu,
Yoo, Zhang, Nanavati, & Das, 2005; Huang, Lai, & Wang, 2016; Li & Ding, 2008; Li,
Ding, & Jordan, 2007; Naldi, Carvalho, & Campello, 2013; Ren, Domeniconi, Zhang, &
Yu, 2016). As mentioned earlier, the literature shows that the consensus
clustering framework is able to enhance the robustness and stability of clustering
analysis. Thus, consensus clustering has found many real-world applications, such as
gene classification, image segmentation (Hong, Kwong, Chang, & Ren, 2008), and video
retrieval (Azimi, Mohammadi, & Analoui, 2006; Fischer & Buhmann, 2003; A.
K. Jain et al., 1999). From a combinatorial optimization point of view, the task of
combining different partitions has been formulated as a median partition problem,
which is known to be NP-complete (Křivánek & Morávek, 1986). Even with the use of
recent breakthroughs, this approach cannot handle datasets larger than several
hundred samples (Sukegawa, Yamamoto, & Zhang, 2013). For a comprehensive
review of 0-1 linear programming formulations of the consensus clustering problem,
readers can refer to (Xanthopoulos, 2014).
The problem of consensus clustering can be stated informally as follows: given
multiple partitions of a dataset, find a combined clustering model, or final partition,
that gives better quality with respect to some of the aspects pointed out above.
Every consensus clustering method is made up of two steps in general: (1) generation
of multiple partitions and (2) a consensus function, as shown in Figure 6 (Topchy,
Jain, & Punch, 2003; Topchy et al., 2004; D. Xu & Tian, 2015).
Generation of multiple partitions is the first step of consensus clustering. This step
aims to create the multiple partitions that will be combined. It is critical for some
problems because the final partition will depend on the partitions produced in this step.
Several methods have been proposed in the literature to create multiple partitions:
For the same dataset, employ different traditional clustering methods: Using
different clustering algorithms might be the most commonly used way to
create multiple partitions for a given dataset. Even though there is no particular
rule for choosing the conventional algorithms to apply, it is advisable to use
methods that can capture more information about the data in general. However, it
is not easy to know in advance which methods will be suitable for a particular
problem; therefore, an expert opinion can be very useful (Strehl & Ghosh,
2002; Vega-Pons & Ruiz-Shulcloper, 2011; D. Xu & Tian, 2015).
For the same dataset, employ different traditional clustering methods with
different initializations or parameters: Using different algorithms with different
parameters or initializations is another efficient method (Ailon, Charikar, &
Newman, 2008). A simple algorithm can produce different informative partitions
of the data, and it can yield an effective consensus in conjunction with a
suitable consensus function. For example, using the k-means algorithm with
different random initial centers and numbers of clusters to generate different
partitions was introduced by (A. L. Fred & Jain, 2005).
Using weak clustering algorithms: Weak clustering algorithms are also used in
the generation step. These methods produce a set of partitions for the data using a
very straightforward methodology. Despite the simplicity of such methods, it has
been observed that weak clustering algorithms can provide high-quality
consensus clustering along with a proper consensus function (Luo, Jing,
& Xie, 2006; Topchy et al., 2003; Topchy, Jain, & Punch, 2005).
Data resampling: Data resampling, such as bagging and boosting, is another
useful method to create multiple partitions (Dudoit & Fridlyand, 2003; Hong et
al., 2008). Dudoit and Fridlyand applied a partitioning clustering
method (e.g., Partitioning Around Medoids) to a set of bootstrap learning data to
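The second strategy above, one simple algorithm run with many initializations and cluster counts, can be sketched as follows (a toy 1-D k-means written from scratch for the example; the data and function name are hypothetical, not from the chapter):

```python
import random

def kmeans_1d(data, k, seed, iters=20):
    """A tiny 1-D k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)          # random initial centers
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 8.8, 9.3]
# Same algorithm, different random initial centers and numbers of clusters.
ensemble = [kmeans_1d(data, k=k, seed=s) for k in (2, 3) for s in (0, 1, 2)]
for labels in ensemble:
    print(labels)
```

Each run yields one labeling of the same nine points; the six labelings together form the ensemble that a consensus function would later combine.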
The consensus function is the crucial and leading step of any consensus clustering
algorithm. These functions are used to combine the set of labelings produced by the
individual clustering algorithms in the previous step. The combined labels, or final
partition, can be considered the result of another clustering algorithm. The definition of
the consensus function can profoundly impact the goodness of the final partition, which
is the product of any consensus clustering. However, the way multiple partitions are
combined is not the same in all cases. A sharp but well-accepted division of consensus
functions is into (1) objects co-occurrence and (2) median partition approaches.
The idea of objects co-occurrence methods is based on similar and dissimilar
objects: if two data points are in the same cluster they can be considered similar,
otherwise they are dissimilar. Therefore, objects co-occurrence methods analyze how
many times data samples belong to the same cluster. In the median partition approach,
the final partition is obtained by solving an optimization problem, namely the problem
of finding the median partition of the cluster ensemble. Formally, given a set of 𝑞
partitions 𝑃1, …, 𝑃𝑞 and a distance measure 𝜔(·,·) between two partitions, the median
partition 𝑃* is found such that:

𝑃* = argmin_𝑃 Σ_{𝑖=1}^{𝑞} 𝜔(𝑃𝑖, 𝑃)
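The argmin above ranges over all possible partitions, which is what makes the exact problem intractable; a common heuristic restricts the search to the input partitions themselves. A small sketch under that assumption, using pairwise disagreement counting as the distance ω (the names are illustrative):

```python
from itertools import combinations

def disagreement(p, q):
    """Pairwise disagreement distance between two labelings of the same points:
    the number of point pairs grouped together in one partition but not the other."""
    d = 0
    for i, j in combinations(range(len(p)), 2):
        if (p[i] == p[j]) != (q[i] == q[j]):
            d += 1
    return d

def best_of_ensemble(partitions):
    """Heuristic median partition: restrict the argmin search to the
    input partitions instead of all possible partitions."""
    return min(partitions,
               key=lambda p: sum(disagreement(p, q) for q in partitions))

ensemble = [
    [0, 0, 1, 1],   # two partitions with the same grouping...
    [1, 1, 0, 0],   # ...under different symbolic labels (distance 0)
    [0, 1, 0, 1],   # an outlier partition
]
print(best_of_ensemble(ensemble))  # → [0, 0, 1, 1]
```

Note that the disagreement distance ignores symbolic labels entirely, so the first two partitions are at distance zero from each other.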
A detailed review of consensus functions, and a taxonomy of the principal consensus
functions, can be found in different studies such as (Ghaemi, Sulaiman, Ibrahim, &
Mustapha, 2009; Topchy et al., 2004; Vega-Pons & Ruiz-Shulcloper, 2011; D. Xu &
Tian, 2015). Also, relations among different consensus functions can be found in (Li,
Ogihara, & Ma, 2010). Some of the main functions are summarized as follows:
Based on relabeling and voting: These methods are based on two important
steps. In the first step, the labeling correspondence problem needs to be solved:
the label of each sample is symbolic, so the set of labels given by one algorithm
might be different from the labels given by another algorithm even though both
label sets correspond to the same partition. Solving this problem makes the
partitions ready for the combination process. Once the labeling correspondence
problem is solved, a voting procedure can be applied in the second step. The
voting process finds how many times a sample is labeled with the same label. To
apply these methods, each produced partition should have the same number of
clusters as the final partition (Topchy et al., 2005; Vega-Pons & Ruiz-Shulcloper,
2011). On the other hand, the strength of these methods is that they are easy to
understand and employ. Plurality Voting (PV) (Fischer & Buhmann, 2003),
Voting-Merging (VM) (Weingessel, Dimitriadou, & Hornik, 2003), Voting for
fuzzy clustering (Dimitriadou, Weingessel, & Hornik, 2002), Voting Active
Clusters (VAC) (Tumer & Agogino, 2008), and Cumulative Voting (CV) (Ayad
& Kamel, 2008) can be given as examples.
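A minimal sketch of the relabel-then-vote idea (brute-force permutation matching, feasible only for small numbers of clusters; the helper names are my own, not from the cited methods):

```python
from collections import Counter
from itertools import permutations

def relabel(reference, partition, k):
    """Solve the labeling correspondence problem against a reference partition
    by trying all k! label permutations and keeping the best match."""
    best, best_score = partition, -1
    for perm in permutations(range(k)):
        candidate = [perm[l] for l in partition]
        score = sum(c == r for c, r in zip(candidate, reference))
        if score > best_score:
            best, best_score = candidate, score
    return best

def vote(partitions, k):
    """Plurality voting after relabeling every partition against the first one."""
    aligned = [relabel(partitions[0], p, k) for p in partitions]
    n = len(partitions[0])
    return [Counter(p[i] for p in aligned).most_common(1)[0][0] for i in range(n)]

ensemble = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],   # same clusters, swapped symbolic labels
    [0, 0, 1, 1, 0],   # disagrees on the last point
]
print(vote(ensemble, k=2))  # → [0, 0, 1, 1, 1]
```

For larger k, practical methods replace the factorial permutation search with the Hungarian assignment algorithm.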
Based on the co-association matrix: Algorithms based on the co-association
matrix avoid the labeling correspondence problem. The main idea of this
approach is to create a co-association matrix in which each element is computed
based on how many times two particular samples are in the same cluster. A
clustering algorithm is then necessary to produce the final partition. One of the
deficiencies of this kind of algorithm is that the computational complexity of
these methods is quadratic in the number of samples; therefore, they are not
suitable for large datasets. On the other hand, they are very easy to understand
and employ. Evidence accumulation in conjunction with the Single Link (EA-SL)
or Complete Link (EA-CL) algorithms (A. Fred, 2001) can be given as examples.
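The co-association idea can be sketched as follows (an illustrative implementation; the final clustering step here is a simple threshold-plus-connected-components stand-in for the single-link algorithm):

```python
def co_association(partitions):
    """n x n matrix: fraction of partitions that place points i and j together.
    Note the two nested loops over n, which is the quadratic cost noted above."""
    n = len(partitions[0])
    m = len(partitions)
    C = [[0.0] * n for _ in range(n)]
    for p in partitions:
        for i in range(n):
            for j in range(n):
                if p[i] == p[j]:
                    C[i][j] += 1.0 / m
    return C

def cut(C, threshold=0.5):
    """Connect points whose co-association exceeds the threshold, then take
    connected components as the final partition."""
    n = len(C)
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] == -1:
            stack = [i]
            labels[i] = cluster
            while stack:
                u = stack.pop()
                for v in range(n):
                    if labels[v] == -1 and C[u][v] > threshold:
                        labels[v] = cluster
                        stack.append(v)
            cluster += 1
    return labels

ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
C = co_association(ensemble)
print(C[0][1])   # 2/3: points 0 and 1 fall together in two of the three partitions
print(cut(C))    # → [0, 0, 1, 1]
```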
Based on graph partition: This kind of methods transform the combination of
multiple partitions into graph or hypergraph partitioning problem (Vega-Pons &
Ruiz-Shulcloper, 2011). All partitions in ensemble procedure can be represented
by a hyperedge, and final partition is obtained by implementing a graph-based
clustering algorithm. Three graph partitioning algorithms, Cluster-based
Similarity Partitioning Algorithm (CSPA), Hypergraph Partitioning Algorithm
(HGPA), and Meta-CLustering Algorithm (MCLA), are proposed by (Strehl &
Ghosh, 2002). In CSPA, a similarity matrix is created from a hypergraph. Each
element of this matrix shows how many times two points are assigned to the
same cluster. The final partition can be obtained by applying a similarity-based
graph algorithm such as spectral clustering or METIS. In HGPA, the hypergraph is
clustered directly by removing the minimum number of hyperedges. To get the
final partition from the hypergraph, an algorithm suitable for clustering
hypergraphs, such as HMETIS (Karypis, Aggarwal, Kumar, & Shekhar, 1999), is
used. In MCLA, the similarity between two clusters is defined based on the
number of common samples, using the Jaccard index. The similarity matrix
between the clusters is the adjacency matrix of a graph whose nodes are the
clusters and whose edges are the similarities between them. The METIS algorithm
is then used to recluster that graph. The computational and storage complexity of
CSPA is quadratic in the number of samples n, while HGPA and MCLA are linear.
Another graph-based method, Hybrid Bipartite Graph Formulation (HBGF), was
proposed by (Fern & Brodley, 2004). Differently from the previous methods,
12 Ramazan Ünlü
problem and it can be maximized by using the k-means algorithm (Mirkin, 2001).
Using the k-means algorithm, on the other hand, brings a deficiency: the number
of clusters must be determined as an initial parameter. Besides,
the method should be run multiple times to avoid bad local minima. For the
methodological details and implementation of the method, readers can refer to
(Gluck, 1989; Topchy et al., 2005).
Based on local adaptation: Local-adaptation-based algorithms combine multiple
partitions using the locally adaptive clustering (LAC) algorithm proposed
by (Domeniconi et al., 2007) with different parameter initializations. The weighted
similarity partition algorithm (WSPA), weighted bipartite partition algorithm
(WBPA) (Domeniconi & Al-Razgan, 2009), and weighted subspace bipartite
partitioning algorithm (WSBPA) can be given as examples. To obtain the final
partition, each method uses a graph partitioning algorithm such as METIS. The
strong restriction of these methods is that LAC algorithms can be applied only to
numerical data.
Based on kernel method: Weighted Partition Consensus via Kernels (WPCK) was
proposed by (Vega-Pons, Correa-Morris, & Ruiz-Shulcloper, 2010). This method
uses an intermediate step called Partition Relevance Analysis to assign weights
representing the significance of each partition in the ensemble. This approach
also defines consensus clustering via the median partition problem, using a
kernel function as the similarity measure between partitions (Vega-Pons &
Ruiz-Shulcloper, 2011). Other methods using the same idea can be
found in (Vega-Pons, Correa-Morris, & Ruiz-Shulcloper, 2008; Vega-Pons &
Ruiz-Shulcloper, 2009).
Based on fuzzy theory: So far, we have described ensemble clustering methods
whose methodology was developed for hard partitioning. However, soft
partitioning might also work in various cases. Clustering methods such as
EM and fuzzy c-means produce a soft, or fuzzy, partition of the data.
Thus, combining fuzzy partitions instead of hard ones as an internal step of the
Unsupervised Ensemble Learning 13
process is the main logic of these methods. sCSPA, sMCLA, and
sHBGF (Punera & Ghosh, 2008) can be found as examples in the literature.
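The core move behind such soft-partition combiners can be sketched as follows (a hedged illustration of the sCSPA-style idea only; the membership values and function name are invented for the example): soft membership matrices are concatenated and a pairwise similarity matrix is computed from them, generalizing CSPA's co-occurrence counts to fuzzy partitions.

```python
# Soft co-association sketch: concatenate fuzzy membership matrices and
# measure pairwise sample similarity (cosine) in the combined space.
import numpy as np

def soft_coassociation(memberships):
    """memberships: list of (n_samples, k_i) soft membership matrices."""
    F = np.hstack(memberships)                       # concatenate column-wise
    F = F / np.linalg.norm(F, axis=1, keepdims=True) # unit-normalize rows
    return F @ F.T                                   # cosine similarity per pair

# Two fuzzy partitions of 3 samples (rows sum to 1; illustrative values).
P1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
P2 = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]])
S = soft_coassociation([P1, P2])
# Samples 0 and 1 (similar memberships) score higher than samples 0 and 2.
print(bool(S[0, 1] > S[0, 2]))  # -> True
```

A final hard partition could then be obtained from S with any similarity-based clusterer, just as in the hard-partition case.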
In the literature, various studies focus on the development of consensus
clustering or the application of existing methods. In this section, some relatively recent
and related works are summarized. One can find many different terms corresponding
to consensus clustering frameworks. That is why the search for this study was limited to the
following terms:
Consensus clustering
Ensemble clustering
Unsupervised ensemble learning
Mainly, after they have created some base partitions, an improved similarity matrix is
created to get an optimal partition by using spectral clustering. An improved version of
LCE was proposed by (Iam-On, Boongoen, & Garrett, 2010) with the goal of using
additional information by implementing 'Weighted Triple Uniqueness (WTU)'. An
iterative consensus clustering was applied to complex networks by (Lancichinetti &
Fortunato, 2012). Lancichinetti and Fortunato stress that there might be noisy connections
in the consensus graph which should be removed. Thus, they refined the consensus graph by
removing edges whose value is lower than some threshold and reconnecting each pruned
node to its closest neighbor until a block diagonal matrix is obtained. At the end, a graph-
based algorithm is applied to the consensus graph to get the final partition. To efficiently find the
similarity between two data points, which can be interpreted as the probability of being in
the same cluster, a new index called the Probabilistic Rand Index (PRI) was developed by
(Carpineto & Romano, 2012). According to the authors, they obtained better results than
existing methods. One possible problem in the consensus framework is an inability to
handle uncertain data points, which are assigned to the same cluster in about half of the
partitions and to different clusters in the rest. This can yield a final
partition of poor quality. To overcome this limitation, (Yi, Yang, Jin, Jain, &
Mahdavi, 2012) proposed an ensemble clustering method based on the technique of
matrix completion. The proposed algorithm constructs a partially observed similarity
matrix based on the pairs of samples which are assigned to the same cluster by most of the
clustering algorithms. The similarity matrix therefore consists of three kinds of elements: 0, 1, and
unobserved. It is then fed to a matrix completion algorithm to fill in the unobserved
elements. The final data partition is obtained by applying a spectral clustering algorithm
to the completed matrix (Yi et al., 2012).
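The first step of that scheme can be illustrated with a short sketch (hedged: the agreement threshold and NaN encoding are assumptions for the example, not details from Yi et al., 2012): pairs on which most base clusterings agree become 1 or 0, and uncertain pairs are left unobserved.

```python
# Partially observed similarity matrix for a matrix-completion ensemble:
# 1 where most partitions agree the pair is together, 0 where most agree
# it is apart, NaN (unobserved) where the partitions are split.
import numpy as np

def partial_similarity(partitions, agree=0.8):
    P = np.asarray(partitions)
    m, n = P.shape
    together = sum((p[:, None] == p[None, :]).astype(float) for p in P) / m
    S = np.full((n, n), np.nan)
    S[together >= agree] = 1.0
    S[together <= 1 - agree] = 0.0
    return S  # NaN entries would be filled by a matrix-completion solver

parts = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 1, 1, 0]]   # the third clustering disagrees on samples 1 and 3
S = partial_similarity(parts)
print(np.isnan(S[0, 1]), S[0, 2])  # uncertain pair unobserved; 0,2 apart
```

The completed matrix would then go to spectral clustering to produce the final partition, as described in the text.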
A boosting-theory-based hierarchical clustering ensemble algorithm called Bob-Hic was
proposed by (Rashedi & Mirzaei, 2013) as an improved version of the method suggested
by (Rashedi & Mirzaei, 2011). Bob-Hic includes several boosting steps; in each step,
a weighted random sampling is first applied to the data, and then a single hierarchical
clustering is created on the selected samples. At the end, the results of the individual
hierarchical clusterings are combined to obtain the final partition. The diversity and the
quality of the combined partitions are critical properties for a strong ensemble. Validity
indexes are used to select high-quality partitions among the produced ones by (Naldi et al.,
2013). In that study, the quality of a partition is measured by using a single index or a
combination of indexes. APMM is another criterion for determining the quality
of a partition, proposed by (Alizadeh, Minaei-Bidgoli, & Parvin, 2014). This criterion is
also used to select some partitions among all the produced partitions. A consensus
particle swarm clustering algorithm based on particle swarm optimization (PSO)
(Kennedy, 2011) was proposed by (Esmin & Coelho, 2013). According to the results of that
study, the PSO algorithm produces results as good as or better than other well-known
consensus clustering algorithms.
result. By internal quality measures, they refer to real-valued quality metrics that are
computed directly from a clustering and, unlike external quality measures, involve no
calculations that use data sample class information (Ünlü & Xanthopoulos,
2016b). In the next step, they tried to improve this study in terms of a well-
known evaluation metric: variance. They optimized internal quality measures by
applying Markowitz Portfolio Theory (MPT). Using the core idea of MPT, which is
constructing portfolios that optimize expected return for a given level of market risk
(measured as variance), they took into consideration not only the values of the validity
measures themselves but also the variation in them. By doing this, they aimed to reduce the
variance of the accuracy of the final partition produced by weighted consensus
clustering (Ünlü & Xanthopoulos, 2016a).
Throughout the section, some featured studies have been summarized. Research on
consensus clustering is not limited to the works summarized above; other contributions can
be found in (Berikov, 2014; Gupta & Verma, 2014; Kang, Liu, Zhou, & Li, 2016; Lock &
Dunson, 2013; Parvin, Minaei-Bidgoli, Alinejad-Rokny, & Punch, 2013; Su, Shang, &
Shen, 2015; Wang, Shan, & Banerjee, 2011; Wu, Liu, Xiong, & Cao, 2013).
REFERENCES
Abello, J., Pardalos, P. M., & Resende, M. G. (2013). Handbook of massive data sets
(Vol. 4): Springer.
Ailon, N., Charikar, M., & Newman, A. (2008). Aggregating inconsistent information:
ranking and clustering. Journal of the ACM (JACM), 55(5), 23.
Al-Razgan, M., & Domeniconi, C. (2006). Weighted clustering ensembles Proceedings
of the 2006 SIAM International Conference on Data Mining (pp. 258-269): SIAM.
Alizadeh, H., Minaei-Bidgoli, B., & Parvin, H. (2014). Cluster ensemble selection based
on a new cluster stability measure. Intelligent Data Analysis, 18(3), 389-408.
Ana, L., & Jain, A. K. (2003). Robust data clustering Computer Vision and Pattern
Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on (Vol.
2, pp. II-II): IEEE.
Ayad, H. G., & Kamel, M. S. (2008). Cumulative voting consensus method for partitions
with variable number of clusters. IEEE Transactions on pattern analysis and
machine intelligence, 30(1), 160-173.
Ayad, H. G., & Kamel, M. S. (2010). On voting-based consensus of cluster ensembles.
Pattern Recognition, 43(5), 1943-1953.
Azimi, J., & Fern, X. (2009). Adaptive Cluster Ensemble Selection. Paper presented at the
IJCAI.
Azimi, J., Mohammadi, M., & Analoui, M. (2006). Clustering ensembles using genetic
algorithm Computer Architecture for Machine Perception and Sensing, 2006. CAMP
2006. International Workshop on (pp. 119-123): IEEE.
Berikov, V. (2014). Weighted ensemble of algorithms for complex data clustering.
Pattern Recognition Letters, 38, 99-106.
Berkhin, P. (2006). A survey of clustering data mining techniques Grouping
multidimensional data (pp. 25-71): Springer.
Cades, I., Smyth, P., & Mannila, H. (2001). Probabilistic modeling of transactional data
with applications to profiling, visualization and prediction. Proc. of the 7th
ACM SIGKDD. San Francisco: ACM Press, 37-46.
Carpineto, C., & Romano, G. (2012). Consensus clustering based on a new probabilistic
rand index with application to subtopic retrieval. IEEE Transactions on pattern
analysis and machine intelligence, 34(12), 2315-2326.
de Souto, M., de Araujo, D. S., & da Silva, B. L. (2006). Cluster ensemble for gene
expression microarray data: accuracy and diversity Neural Networks, 2006.
IJCNN'06. International Joint Conference on (pp. 2174-2180): IEEE.
de Hoon, M. J., Imoto, S., Nolan, J., & Miyano, S. (2004). Open source clustering
software. Bioinformatics, 20(9), 1453-1454.
Dimitriadou, E., Weingessel, A., & Hornik, K. (2002). A combination scheme for fuzzy
clustering. International Journal of Pattern Recognition and Artificial Intelligence,
16(07), 901-912.
Domeniconi, C., & Al-Razgan, M. (2009). Weighted cluster ensembles: Methods and
analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 2(4), 17.
Domeniconi, C., Gunopulos, D., Ma, S., Yan, B., Al-Razgan, M., & Papadopoulos, D.
(2007). Locally adaptive metrics for clustering high dimensional data. Data mining
and knowledge discovery, 14(1), 63-97.
Dudoit, S., & Fridlyand, J. (2003). Bagging to improve the accuracy of a clustering
procedure. Bioinformatics, 19(9), 1090-1099.
Esmin, A. A., & Coelho, R. A. (2013). Consensus clustering based on particle swarm
optimization algorithm Systems, Man, and Cybernetics (SMC), 2013 IEEE
International Conference on (pp. 2280-2285): IEEE.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for
discovering clusters in large spatial databases with noise. Paper presented at the
Kdd.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge
discovery in databases. AI magazine, 17(3), 37.
Fern, X. Z., & Brodley, C. E. (2004). Solving cluster ensemble problems by bipartite
graph partitioning Proceedings of the twenty-first international conference on
Machine learning (pp. 36): ACM.
Fischer, B., & Buhmann, J. M. (2003). Bagging for path-based clustering. IEEE
Transactions on pattern analysis and machine intelligence, 25(11), 1411-1415.
Fred, A. (2001). Finding consistent clusters in data partitions International Workshop on
Multiple Classifier Systems (pp. 309-318): Springer.
Fred, A. L., & Jain, A. K. (2005). Combining multiple clusterings using evidence
accumulation. IEEE Transactions on pattern analysis and machine intelligence,
27(6), 835-850.
Ghaemi, R., Sulaiman, M. N., Ibrahim, H., & Mustapha, N. (2009). A survey: clustering
ensembles techniques. World Academy of Science, Engineering and Technology, 50,
636-645.
Gluck, M. (1989). Information, uncertainty and the utility of categories. Paper presented
at the Proc. of the 7th Annual Conf. of Cognitive Science Society.
Grira, N., Crucianu, M., & Boujemaa, N. (2005). Active semi-supervised fuzzy clustering
for image database categorization Proceedings of the 7th ACM SIGMM international
workshop on Multimedia information retrieval (pp. 9-16): ACM.
Gupta, M., & Verma, D. (2014). A Novel Ensemble Based Cluster Analysis Using
Similarity Matrices & Clustering Algorithm (SMCA). International Journal of
Computer Application, 100(10), 1-6.
Hadjitodorov, S. T., Kuncheva, L. I., & Todorova, L. P. (2006). Moderate diversity for
better cluster ensembles. Information Fusion, 7(3), 264-275.
Haghtalab, S., Xanthopoulos, P., & Madani, K. (2015). A robust unsupervised consensus
control chart pattern recognition framework. Expert Systems with Applications,
42(19), 6767-6776.
Hong, Y., Kwong, S., Chang, Y., & Ren, Q. (2008). Unsupervised feature selection using
clustering ensembles and population based incremental learning algorithm. Pattern
Recognition, 41(9), 2742-2756.
Hu, X., Yoo, I., Zhang, X., Nanavati, P., & Das, D. (2005). Wavelet transformation and
cluster ensemble for gene expression analysis. International journal of bioinformatics
research and applications, 1(4), 447-460.
Huang, D., Lai, J., & Wang, C.-D. (2016). Ensemble clustering using factor graph.
Pattern Recognition, 50, 131-142.
Huang, D., Wang, C.-D., & Lai, J.-H. (2016). Locally Weighted Ensemble Clustering.
arXiv preprint arXiv:1605.05011.
Iam-On, N., Boongoen, T., Garrett, S., & Price, C. (2012). A link-based cluster ensemble
approach for categorical data clustering. IEEE Transactions on knowledge and data
engineering, 24(3), 413-425.
Iam-On, N., & Boongoen, T. (2012). Improved link-based cluster ensembles Neural
Networks (IJCNN), The 2012 International Joint Conference on (pp. 1-8): IEEE.
Iam-On, N., Boongoen, T., & Garrett, S. (2010). LCE: a link-based cluster ensemble
method for improved gene expression data analysis. Bioinformatics, 26(12), 1513-
1519.
Jain, A. (1999). Data clustering: A review. ACM Computing Surveys, 31.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM
computing surveys (CSUR), 31(3), 264-323.
Jing, L., Tian, K., & Huang, J. Z. (2015). Stratified feature sampling method for
ensemble clustering of high dimensional data. Pattern Recognition, 48(11), 3688-
3702.
Kang, Q., Liu, S., Zhou, M., & Li, S. (2016). A weight-incorporated similarity-based
clustering ensemble method based on swarm intelligence. Knowledge-Based Systems,
104, 156-164.
Kantardzic, M. (2011). Data mining: concepts, models, methods, and algorithms: John
Wiley & Sons.
Karypis, G., Aggarwal, R., Kumar, V., & Shekhar, S. (1999). Multilevel hypergraph
partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 7(1), 69-79.
Kennedy, J. (2011). Particle swarm optimization Encyclopedia of machine learning (pp.
760-766): Springer.
Křivánek, M., & Morávek, J. (1986). NP-hard problems in hierarchical-tree clustering.
Acta informatica, 23(3), 311-323.
Lancichinetti, A., & Fortunato, S. (2012). Consensus clustering in complex networks.
Scientific reports, 2.
Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets:
Cambridge University Press.
Li, T., & Ding, C. (2008). Weighted consensus clustering Proceedings of the 2008 SIAM
International Conference on Data Mining (pp. 798-809): SIAM.
Li, T., Ding, C., & Jordan, M. I. (2007). Solving consensus and semi-supervised
clustering problems using nonnegative matrix factorization Data Mining, 2007.
ICDM 2007. Seventh IEEE International Conference on (pp. 577-582): IEEE.
Li, T., Ogihara, M., & Ma, S. (2010). On combining multiple clusterings: an overview
and a new perspective. Applied Intelligence, 33(2), 207-219.
Liu, H., Cheng, G., & Wu, J. (2015). Consensus Clustering on big data Service Systems
and Service Management (ICSSSM), 2015 12th International Conference on (pp. 1-
6): IEEE.
Lock, E. F., & Dunson, D. B. (2013). Bayesian consensus clustering. Bioinformatics,
btt425.
Lourenço, A., Bulò, S. R., Rebagliati, N., Fred, A. L., Figueiredo, M. A., & Pelillo, M.
(2015). Probabilistic consensus clustering using evidence accumulation. Machine
Learning, 98(1-2), 331-357.
Luo, H., Jing, F., & Xie, X. (2006). Combining multiple clusterings using information
theory based genetic algorithm Computational Intelligence and Security, 2006
International Conference on (Vol. 1, pp. 84-89): IEEE.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate
observations. Paper presented at the Proceedings of the fifth Berkeley symposium on
mathematical statistics and probability.
McQuitty, L. L. (1957). Elementary linkage analysis for isolating orthogonal and oblique
types and typal relevancies. Educational and Psychological Measurement, 17(2),
207-229.
Mirkin, B. (2001). Reinterpreting the category utility function. Machine Learning, 45(2),
219-228.
Naldi, M. C., Carvalho, A. C., & Campello, R. J. (2013). Cluster ensemble selection
based on relative validity indexes. Data mining and knowledge discovery, 1-31.
Nayak, J., Naik, B., & Behera, H. (2015). Fuzzy C-means (FCM) clustering algorithm: a
decade review from 2000 to 2014 Computational Intelligence in Data Mining-
Volume 2 (pp. 133-149): Springer.
Parvin, H., Minaei-Bidgoli, B., Alinejad-Rokny, H., & Punch, W. F. (2013). Data
weighing mechanisms for clustering ensembles. Computers & Electrical
Engineering, 39(5), 1433-1450.
Punera, K., & Ghosh, J. (2008). Consensus-based ensembles of soft clusterings. Applied
Artificial Intelligence, 22(7-8), 780-810.
Rashedi, E., & Mirzaei, A. (2011). A novel multi-clustering method for hierarchical
clusterings based on boosting Electrical Engineering (ICEE), 2011 19th Iranian
Conference on (pp. 1-4): IEEE.
Rashedi, E., & Mirzaei, A. (2013). A hierarchical clusterer ensemble method based on
boosting theory. Knowledge-Based Systems, 45, 83-93.
Ren, Y., Domeniconi, C., Zhang, G., & Yu, G. (2016). Weighted-object ensemble
clustering: methods and analysis. Knowledge and Information Systems, 1-29.
Sadeghian, A. H., & Nezamabadi-pour, H. (2014). Gravitational ensemble clustering
Intelligent Systems (ICIS), 2014 Iranian Conference on (pp. 1-6): IEEE.
Saeed, F., Ahmed, A., Shamsir, M. S., & Salim, N. (2014). Weighted voting-based
consensus clustering for chemical structure databases. Journal of computer-aided
molecular design, 28(6), 675-684.
Sander, J., Ester, M., Kriegel, H.-P., & Xu, X. (1998). Density-based clustering in spatial
databases: The algorithm gdbscan and its applications. Data mining and knowledge
discovery, 2(2), 169-194.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions
on pattern analysis and machine intelligence, 22(8), 888-905.
Srivastava, J., Cooley, R., Deshpande, M., & Tan, P.-N. (2000). Web usage mining:
Discovery and applications of usage patterns from web data. Acm Sigkdd
Explorations Newsletter, 1(2), 12-23.
Strehl, A., & Ghosh, J. (2002). Cluster ensembles — a knowledge reuse framework for
combining multiple partitions. Journal of machine learning research, 3(Dec), 583-
617.
Su, P., Shang, C., & Shen, Q. (2015). A hierarchical fuzzy cluster ensemble approach and
its application to big data clustering. Journal of Intelligent & Fuzzy Systems, 28(6),
2409-2421.
Sukegawa, N., Yamamoto, Y., & Zhang, L. (2013). Lagrangian relaxation and pegging
test for the clique partitioning problem. Advances in Data Analysis and
Classification, 7(4), 363-391.
Topchy, A., Jain, A. K., & Punch, W. (2003). Combining multiple weak clusterings Data
Mining, 2003. ICDM 2003. Third IEEE International Conference on (pp. 331-338):
IEEE.
Topchy, A., Jain, A. K., & Punch, W. (2004). A mixture model for clustering ensembles
Proceedings of the 2004 SIAM International Conference on Data Mining (pp. 379-
390): SIAM.
Topchy, A., Jain, A. K., & Punch, W. (2005). Clustering ensembles: Models of consensus
and weak partitions. IEEE Transactions on pattern analysis and machine
intelligence, 27(12), 1866-1881.
Tumer, K., & Agogino, A. K. (2008). Ensemble clustering with voting active clusters.
Pattern Recognition Letters, 29(14), 1947-1953.
Ünlü, R., & Xanthopoulos, P. (2016a). A novel weighting policy for unsupervised
ensemble learning based on Markowitz portfolio theory. Paper presented at the
INFORMS 2016, Nashville, TN.
Ünlü, R., & Xanthopoulos, P. (2016b). A weighted framework for unsupervised ensemble
learning based on internal quality measures. Manuscript submitted for publication.
Vega-Pons, S., Correa-Morris, J., & Ruiz-Shulcloper, J. (2008). Weighted cluster
ensemble using a kernel consensus function. Progress in Pattern Recognition, Image
Analysis and Applications, 195-202.
Vega-Pons, S., Correa-Morris, J., & Ruiz-Shulcloper, J. (2010). Weighted partition
consensus via kernels. Pattern Recognition, 43(8), 2712-2724.
Vega-Pons, S., & Ruiz-Shulcloper, J. (2009). Clustering ensemble method for
heterogeneous partitions. Paper presented at the Iberoamerican Congress on Pattern
Recognition.
Vega-Pons, S., & Ruiz-Shulcloper, J. (2011). A survey of clustering ensemble
algorithms. International Journal of Pattern Recognition and Artificial Intelligence,
25(03), 337-372.
Wang, H., Shan, H., & Banerjee, A. (2011). Bayesian cluster ensembles. Statistical
Analysis and Data Mining, 4(1), 54-70.
Weingessel, A., Dimitriadou, E., & Hornik, K. (2003). An ensemble method for
clustering Proceedings of the 3rd International Workshop on Distributed Statistical
Computing.
Wright, W. E. (1977). Gravitational clustering. Pattern Recognition, 9(3), 151-166.
Wu, J., Liu, H., Xiong, H., & Cao, J. (2013). A Theoretic Framework of K-Means-Based
Consensus Clustering. Paper presented at the IJCAI.
Xanthopoulos, P. (2014). A review on consensus clustering methods Optimization in
Science and Engineering (pp. 553-566): Springer.
Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of
Data Science, 2(2), 165-193.
Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on
neural networks, 16(3), 645-678.
Yi, J., Yang, T., Jin, R., Jain, A. K., & Mahdavi, M. (2012). Robust ensemble clustering
by matrix completion Data Mining (ICDM), 2012 IEEE 12th International
Conference on (pp. 1176-1181): IEEE.
Yu, H., Liu, Z., & Wang, G. (2014). An automatic method to determine the number of
clusters using decision-theoretic rough set. International Journal of Approximate
Reasoning, 55(1), 101-115.
Zhong, C., Yue, X., Zhang, Z., & Lei, J. (2015). A clustering ensemble: Two-level-
refined co-association matrix with path-based transformation. Pattern Recognition,
48(8), 2699-2709.
AUTHOR BIOGRAPHY
Dr. Ramazan Unlu has a Ph.D. in Industrial Engineering from the University of
Central Florida, with particular interest in data mining, including classification and
clustering methods. His dissertation was titled “Weighting Policies for Robust
Unsupervised Ensemble Learning”. Besides doing his research, he served as a
Graduate Teaching Assistant in several courses during his Ph.D. Prior to enrolling at
UCF, he earned a master's degree in Industrial Engineering from the University of Pittsburgh
and a B.A. in Industrial Engineering from Istanbul University. For his master's and doctoral
education, he won a fellowship given to 26 industrial engineers by the Republic
of Turkey Ministry of National Education in 2010.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.
Chapter 2

USING DEEP LEARNING TO CONFIGURE PARALLEL DISTRIBUTED DISCRETE-EVENT SIMULATORS*

Edwin Cortes1, Luis Rabelo2 and Gene Lee3
1 Institute of Simulation and Training, Orlando, Florida, US
2 Department of Industrial Engineering and Management Systems,
University of Central Florida, Orlando, Florida, US
3 Department of Industrial Engineering and Management Systems,
University of Central Florida, Orlando, Florida, US
ABSTRACT
This research discusses the utilization of deep learning for selecting the time
synchronization scheme that optimizes the performance of a particular parallel discrete
simulation hardware/software arrangement. Deep belief neural networks are able to
use measures of software complexity and architectural features to recognize and match
patterns and therefore to predict performance. Software complexities such as simulation
objects, branching, function calls, concurrency, iterations, mathematical computations,
and messaging frequency were given a weight based on the cognitive weighted approach.
In addition, simulation objects and hardware/network features such as the distributed
pattern of simulation objects, CPU features (e.g., multithreading/multicore), and the
degree of loose vs. tight coupling of the utilized computer architecture were also
captured to define the parallel distributed simulation arrangement. Deep belief neural
networks (in particular, restricted Boltzmann machines (RBMs)) were then used to
perform deep learning from the complexity parameters and their corresponding time
synchronization scheme value as measured by speedup performance. The simulation
*
Corresponding Author Email: luis.rabelo@ucf.edu
24 Edwin Cortes, Luis Rabelo and Gene Lee
INTRODUCTION
processors (LPs) or simulation objects that can be executed concurrently using
partitioning types (e.g., spatial and temporal) (Fujimoto, 2000). Each LP/simulation
object of a simulation (which can be composed of numerous LPs) is located in a single
node. PDDES is very important in particular for:
One of the problems with PDDES is time management: providing flow control
over event processing, the process flow, and the coordination of the different LPs and
nodes to take advantage of parallelism. Several time management schemes have been
developed, such as Time Warp (TW), Breathing Time Buckets (BTB), and Breathing
Time Warp (BTW) (Fujimoto, 2000). Unfortunately, there is no clear methodology for
deciding a priori which time management scheme to apply to a particular PDDES problem
in order to achieve higher performance.
This research shows a new approach for selecting the time synchronization technique
class that corresponds to a particular parallel discrete simulation with different levels of
simulation logic complexity. Simulation complexities such as branching, function calls,
concurrency, iterations, mathematical computations, messaging frequency and number of
simulation objects were given a weighted parameter value based on the cognitive weight
approach. Deep belief neural networks were then used to perform deep learning from the
simulation complexity parameters and their corresponding time synchronization scheme
value as measured by speedup performance.
Using Deep Learning to Configure Parallel Distributed Discrete-Event Simulators 25
Figure 1: The process of rollback in TW using antimessages and the process of cancellation of events.
BTB is a hybrid between the Fixed Time Buckets algorithm and TW (Steinman,
1993). Unlike TW, “messages generated while processing events are never actually
released until it is known that the event generating the messages will never be rolled
back” (Steinman, 1993). This means that messages which cause invalid events with
potential antimessages are not released. Therefore, BTB is a hybrid in the following
sense:
The Event Horizon is an important concept in BTB (Steinman, 1994). The event
horizon is the point in time where events generated by the simulation turn back into the
simulation. At the event horizon, all new events that were generated through event
processing at the previous “bucket” could be sorted and merged back into the main event
queue. Parallelism can be exploited because the events processed in each event horizon
cycle have time tags earlier than the cycle’s event horizon. Therefore, it is important to
calculate the Global Event Horizon (GEH) with its respective Global Virtual Time (GVT)
to avoid problems with events that will be scheduled in other simulation objects
(Steinman, 1994). The local event horizon (Figure 2) considers only the event horizon for
events being processed on its node, while the global event horizon factors all nodes into its
calculation. Once all of the nodes have processed events up to their local event horizon,
they are then ready to synchronize. The next step is to compute the global event horizon
as the minimum local event horizon across all nodes. Once GVT is determined, all events
with time stamps less than or equal to GVT are committed (Steinman, Nicol, Wilson, &
Lee, 1995).
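The synchronization step just described can be sketched in a few lines (an illustrative toy with invented function names, not code from any PDDES engine): GVT is the minimum local event horizon across nodes, and events time-stamped at or before GVT can be committed.

```python
# GVT computation sketch: global virtual time is the minimum of the
# per-node local event horizons; events at or before GVT are safe to commit.
def compute_gvt(local_horizons):
    return min(local_horizons)

def committable(events, gvt):
    """events: list of (timestamp, payload) pairs pending on any node."""
    return [e for e in events if e[0] <= gvt]

local_horizons = [12.5, 9.0, 15.2]          # one local event horizon per node
gvt = compute_gvt(local_horizons)           # -> 9.0
events = [(3.1, "a"), (9.0, "b"), (11.4, "c")]
print(gvt, committable(events, gvt))        # events "a" and "b" commit
```

Events with timestamps beyond GVT (like "c" above) remain uncommitted and, as the next paragraph explains, may need to be rolled back.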
A potential problem is that some of the nodes may have processed events that went
beyond GVT. An event processed by the respective simulation object must be rolled back
when a newly generated event is received in its past. Rollback is very simple in this case
and involves discarding unsent messages that were generated by the event and then
restoring the state variables that it modified. Therefore, antimessages are not
required, because messages that would create bad events are never released (Steinman,
1996).
BTW is another hybrid algorithm for time management and event synchronization
that tries to solve the problems with TW and BTB (Steinman, 1993):
Cascading antimessage explosions can occur when events are close to the current
GVT. Because events processed far ahead of the rest of the simulation will likely be
rolled back, it might be better for those runaway events to not immediately release their
messages. On the other hand, using TW as an initial condition to bring BTB reduces the
frequency of synchronizations and increases the size of the bucket.
The process of BTW is explained as follows:
1. The first simulation events processed locally on each node beyond GVT release
their messages right away as in TW. After that, messages are held back and the
BTW starts execution.
2. When the events of the entire cycle are processed, or when the event horizon is
determined, each node requests a GVT update. If a node ever processes more
events beyond GVT, it temporarily stops processing events until the next GVT
cycle begins. These parameters are defined by the simulation engineer. An
example of a typical processing cycle for a three-node execution is provided in
Figure 3.
Figure 3: BTW cycle in three nodes. The first part of the cycle is Time Warp (TW) and it ends with
Breathing Time Buckets (BTB) until GVT is reached.
Deep neural architectures with multiple hidden layers were long difficult to train and
unstable with the backpropagation algorithm. Empirical results showed that using
backpropagation alone for neural networks with three or more hidden layers produced poor
solutions (Larochelle, Bengio, Louradour, & Lamblin, 2009).
Using Deep Learning to Configure Parallel Distributed Discrete-Event Simulators 29
Hinton, Osindero, & Teh (2006) provided novel training algorithms that trained
multi-hidden layer deep belief neural networks (DBNs). Their work introduced the
greedy learning algorithm to train a stack of restricted Boltzmann machines (RBMs),
which compose a DBN, one layer at a time. The central concept of accurately training a
DBN that extracts complex patterns in data is to find the matrix of synaptic neuron
connection weights that produces the smallest error for the training (input-data) vectors.
The fundamental learning blocks of a DBN are stacked restricted Boltzmann
machines. The greedy algorithm proposed by Hinton et al. (2006) focused on allowing
each RBM model in the stack to process a different representation of the data. Then, each
model transforms its input-vectors non-linearly and generates output-vectors that are then
used as input for the next RBM in the sequence.
When RBMs are stacked, they form a composite generative model. RBMs are
generative probabilistic models between input units (visible) and latent (hidden) units
(Längkvist, Karlsson, & Loutfi, 2014). An RBM is also defined by Zhang, Zhang, Ji, &
Guo (2014) as a parameterized generative model representing a probability distribution.
Figure 4 shows an RBM (at lower level) with binary variables in the visible layer and
stochastic binary variables in the hidden layer (Hinton et al., 2012). Visible units have no
synaptic connections between them. Similarly, hidden units are not interconnected. The
absence of hidden-hidden and visible-visible connections is what makes these Boltzmann
machines restricted.
During learning, the RBM at higher-level (Figure 4) uses the data generated by the
hidden activities of the lower RBM.
Zhang et al. (2014) stated that learning in an RBM is accomplished by using training
data and “adjusting the RBM parameters such that the probability distribution represented
by the RBM fits the training data as well as possible.” RBMs are energy-based models.
As such, a scalar energy is associated to each variable configuration. Per Bengio (2009),
learning from data corresponds to performing a modification of the energy function until
its shape represents the properties needed. This energy function has different forms
depending on the type of RBM it represents. Binary RBMs, also known as Bernoulli
(visible)-Bernoulli (hidden) have an energy E (energy of a joint configuration between
visible and hidden units) function of the form:
$$E(\mathbf{v},\mathbf{h};\theta) = -\sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij} v_i h_j - \sum_{i=1}^{I} b_i v_i - \sum_{j=1}^{J} a_j h_j \qquad (1)$$
The variables 𝑤𝑖𝑗 represent the weight (strength) of a neuron connection between a
visible unit (𝑣𝑖) and a hidden unit (ℎ𝑗). The variables 𝑏𝑖 and 𝑎𝑗 are the visible unit biases and the
hidden units biases, respectively. I and J are the number of visible and hidden units,
respectively. The set θ represents the vector variables 𝒘, 𝒃, and 𝒂 (Hinton, 2010;
Mohamed et al., 2011; Mohamed, Dahl, & Hinton, 2012).
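As an illustration, the binary-RBM energy can be computed directly from these definitions; the toy weights and biases below are arbitrary:

```python
def rbm_energy(v, h, w, b, a):
    """Energy E(v, h; theta) of a joint configuration for a binary RBM."""
    interaction = sum(w[i][j] * v[i] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(b[i] * v[i] for i in range(len(v)))
    hidden_bias = sum(a[j] * h[j] for j in range(len(h)))
    return -interaction - visible_bias - hidden_bias

# Toy configuration: I = 2 visible units, J = 2 hidden units.
v, h = [1, 0], [1, 1]
w = [[0.5, -0.2], [0.1, 0.3]]   # w[i][j]: weight between v_i and h_j
b = [0.1, 0.0]                  # visible biases
a = [-0.3, 0.2]                 # hidden biases
print(rbm_energy(v, h, w, b, a))  # → approximately -0.3
```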
On the other hand, a Gaussian RBM (GRBM), Gaussian (visible)-Bernoulli (hidden),
has an energy function of the form:
$$E(\mathbf{v},\mathbf{h};\theta) = -\sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij} v_i h_j - \frac{1}{2}\sum_{i=1}^{I} (v_i - b_i)^2 - \sum_{j=1}^{J} a_j h_j \qquad (2)$$
$$p(\mathbf{v};\theta) = \frac{\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h};\theta)}}{\sum_{\mathbf{v}}\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h};\theta)}} \qquad (3)$$
For binary RBMs, the conditional probability distributions are sigmoidal in nature
and are defined by:

$$p(h_j = 1 \mid \mathbf{v};\theta) = \sigma\left(\sum_{i=1}^{I} w_{ij} v_i + a_j\right) \qquad (4)$$

and

$$p(v_i = 1 \mid \mathbf{h};\theta) = \sigma\left(\sum_{j=1}^{J} w_{ij} h_j + b_i\right) \qquad (5)$$

where $\sigma(\lambda) = \frac{1}{1+e^{-\lambda}}$ is the sigmoid function (Hinton, 2006; Hinton et al., 2006).
Real-valued GRBMs have a conditional probability for $h_j = 1$, a hidden variable
turned on, given the evidence vector $\mathbf{v}$, of the form:

$$p(h_j = 1 \mid \mathbf{v};\theta) = \sigma\left(\sum_{i=1}^{I} w_{ij} v_i + a_j\right) \qquad (6)$$

The GRBM conditional probability for $v_i$, given the evidence vector $\mathbf{h}$, is
continuous-normal in nature and has the form

$$p(v_i \mid \mathbf{h};\theta) = \mathcal{N}(\mu_i, 1) \qquad (7)$$

where $\mathcal{N}(\mu_i, 1) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(v_i-\mu_i)^2}{2}}$ is a Gaussian distribution with mean $\mu_i = \sum_{j=1}^{J} w_{ij} h_j + b_i$
and variance of unity (Mohamed et al., 2012; Cho, Ilin, & Raiko, 2011).
Learning from input-data in an RBM can be summarized as calculating a good set of
neuron connection weight vectors, 𝒘, that produce the smallest error for the training
(input-data) vectors. This also implies that a good set of bias (b and a) vectors must be
determined. Because learning the weights and biases is done iteratively, the weight
update rule is given by ∆𝒘𝒊𝒋 (equation 8). This is the partial derivative of the log-
likelihood probability of a training vector with respect to the weights,
$$\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} = \Delta w_{ij} = \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \qquad (8)$$
This is well explained by Salakhutdinov and Murray (2008), Hinton (2010), and
Zhang et al. (2014). However, this exact computation is intractable because 〈𝑣𝑖 ℎ𝑗 〉𝑚𝑜𝑑𝑒𝑙
takes exponential time to calculate exactly (Mohamed et al., 2011). In practice, the
gradient of the log-likelihood is approximated.
The contrastive divergence learning rule is used to approximate the gradient of the log-
likelihood probability of a training vector with respect to the neuron connection weights.
The simplified learning rule for an RBM has the form (Längkvist et al., 2014):

$$\Delta w_{ij} = \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon} \qquad (9)$$
The reconstruction values for $v_i$ and $h_j$ are generated by applying equations 4 and 5,
or 7 for a GRBM, as explained by Mohamed et al. (2012), in a Markov chain using Gibbs
sampling. After Gibbs sampling, the contrastive divergence learning rule for an RBM can
be calculated and the weights of the neuron connections updated based on $\Delta w$. The
literature also shows that the RBM learning rule (equation 9) may be modified with
constants such as the learning rate, weight-cost, momentum, and mini-batch size for a
more precise calculation of neuron weights during learning. Hinton et al. (2006)
described the contrastive divergence learning in an RBM as efficient enough to be practical.
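A minimal CD-1 sketch under these definitions (hypothetical toy code: biases are held fixed for brevity, while a real implementation would also update b and a and use the constants mentioned above):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(p):
    return 1 if random.random() < p else 0

def cd1_update(v0, w, b, a, lr=0.1):
    """One CD-1 weight update for a binary RBM (biases held fixed here)."""
    I, J = len(b), len(a)
    # Up pass: p(h_j = 1 | v0) as in equation 4, then sample binary h0
    ph0 = [sigmoid(sum(w[i][j] * v0[i] for i in range(I)) + a[j]) for j in range(J)]
    h0 = [sample(p) for p in ph0]
    # Down pass: reconstruction probabilities p(v_i = 1 | h0) as in equation 5
    pv1 = [sigmoid(sum(w[i][j] * h0[j] for j in range(J)) + b[i]) for i in range(I)]
    v1 = [sample(p) for p in pv1]
    # Up pass again on the reconstruction
    ph1 = [sigmoid(sum(w[i][j] * v1[i] for i in range(I)) + a[j]) for j in range(J)]
    # Equation 9: dw = <v h>_data - <v h>_recon, scaled by the learning rate
    for i in range(I):
        for j in range(J):
            w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    return v1

random.seed(0)
w = [[0.0] * 2 for _ in range(3)]   # I = 3 visible units, J = 2 hidden units
b, a = [0.0] * 3, [0.0] * 2
for _ in range(100):
    v1 = cd1_update([1, 0, 1], w, b, a)  # repeatedly fit one training vector
```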
In RBM neuron learning, a gauge of the error between visible unit probabilities and
their reconstruction probabilities computed after Gibbs sampling is given by the
cross-entropy. The cross-entropy, between the Bernoulli probability distributions of each
element of the visible units $\mathbf{v}^{data}$ and its reconstruction probabilities $\mathbf{v}^{recon}$, is defined by
Erhan, Bengio, & Courville (2010) as follows:

$$CE = -\sum_{i=1}^{I} \left[ v_i^{data} \log(v_i^{recon}) + (1 - v_i^{data}) \log(1 - v_i^{recon}) \right] \qquad (10)$$
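This cross-entropy measure can be sketched in code as follows; the example probabilities are arbitrary:

```python
import math

def cross_entropy(v_data, v_recon):
    """Cross-entropy between each visible unit's data value and its
    reconstruction probability; lower means a more faithful reconstruction."""
    return -sum(v * math.log(p) + (1 - v) * math.log(1 - p)
                for v, p in zip(v_data, v_recon))

good = cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])   # close reconstruction
bad = cross_entropy([1, 0, 1], [0.5, 0.5, 0.5])    # uninformative reconstruction
print(good < bad)  # → True
```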
For the final DBN learning phase, after each RBM in the stack has been pre-trained
via greedy layer-wise unsupervised learning, the complete DBN is fine-tuned in a
supervised way. The supervised learning via the backpropagation algorithm uses labeled
data (classification data) to calculate neuron weights for the complete deep belief neural
network. Hinton et al. (2006) used the wake-sleep algorithm for fine-tuning a DBN.
However, recent research has demonstrated that the backpropagation algorithm is faster
and has lower classification error (Wulsin et al., 2011). In backpropagation, the
derivative of the log probability distribution over class labels is propagated to fine-tune
all neuron weights in the lower levels of a DBN.
In summary, the greedy layer-wise algorithm proposed by Hinton pre-trains the
DBN one layer at a time using contrastive divergence and Gibbs sampling, starting from
the bottom (first) layer of visible variables to the top of the network – one RBM at a time
(Figure 5). After pre-training, the final DBN is fine-tuned in a top-down mode using several
algorithms such as the supervised backpropagation (Hinton & Salakhutdinov, 2006;
Larochelle et al., 2009) or the wake-sleep (Hinton et al., 2006; Bengio, 2009) – among
others.
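The layer-by-layer flow can be sketched as below; `train_rbm` and `hidden_representation` are placeholders standing in for the CD-based routines described above:

```python
def pretrain_dbn(data, layer_sizes, train_rbm, hidden_representation):
    """Greedy layer-wise pre-training: fit one RBM at a time, bottom to top,
    feeding each layer the hidden representation produced by the layer below."""
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = train_rbm(layer_input, n_hidden)                  # CD-based fit
        layer_input = hidden_representation(rbm, layer_input)   # pass upward
        rbms.append(rbm)
    return rbms  # the stacked RBMs compose the DBN, ready for fine-tuning

# Toy demonstration with stand-in training routines:
rbms = pretrain_dbn(
    data=[[1, 0, 1]],
    layer_sizes=[4, 2],
    train_rbm=lambda x, n: {"hidden": n},
    hidden_representation=lambda rbm, x: [[0] * rbm["hidden"] for _ in x])
print(len(rbms))  # → 2
```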
The parallel discrete event simulator utilized was WarpIV. This simulation kernel is
able to host discrete-event simulations over parallel and distributed cluster computing
environments. WarpIV supports heterogeneous network applications through its portable
high-speed communication infrastructure, which integrates shared memory with
standard network protocols to facilitate high-bandwidth and low-latency message passing
services.
34 Edwin Cortes, Luis Rabelo and Gene Lee
Figure 6: Aircraft range detection scenario using two types of simulation objects (radar and aircraft).
Scheme  Local   Global  Wall Clock  Speedup  Speedup       PT    Min Comm.  Max Comm.  Mean Comm.  Sigma
        Nodes   Nodes   Time (s)    Rel      Theoretical         PT/Node    PT/Node    PT/Node
BTW     1       1       16.5        1.0      3.0           15.6  15.6       15.6       15.6        0.0
BTW     1       2       14.1        1.2      3.0           15.6  5.3        10.3       7.8         2.5
BTW     1       3       12.4        1.3      3.0           15.7  5.2        5.3        5.2         0.0
BTW     1       4       11.4        1.4      3.0           15.6  0.0        5.3        3.9         2.2
BTW     2 to 4  14      6.1         2.7      3.0           15.4  0.0        5.2        1.1         2.5
BTW     4       8       6.5         2.6      3.0           15.5  0.0        5.2        1.9         2.2
BTW     4       4       9.4         1.8      3.0           15.5  0.0        5.2        3.9         0.0
BTW     3       3       10.5        1.6      3.0           15.8  5.3        5.3        5.3         0.0
BTB     1       1       16.1        1.0      3.0           15.6  5.7        5.7        5.7         2.5
BTB     1       2       62.1        0.3      3.0           15.6  5.3        10.3       7.8         0.5
BTB     1       3       148.0       0.1      3.0           15.6  5.1        5.2        5.2         2.2
BTB     1       4       162.6       0.1      3.0           15.7  0.0        5.3        3.9         2.1
BTB     2 to 4  14      7.7         2.1      3.0           15.4  0.0        5.2        1.1         2.5
BTB     4       8       6.2         2.6      3.0           15.3  0.0        5.2        1.2         2.2
BTB     4       4       9.4         1.7      3.0           15.5  0.0        5.2        3.9         0.0
BTB     3       3       10.2        1.6      3.0           15.6  5.2        5.2        5.2         0.0
TW      1       1       17.2        1.0      3.0           15.6  15.6       15.6       15.6        0.0
TW      1       2       13.8        1.2      3.0           15.6  5.3        10.3       7.8         2.5
TW      1       3       12.6        1.4      3.0           15.6  5.2        5.3        5.2         0.0
TW      1       4       10.9        1.6      3.0           15.5  0.0        5.2        3.9         2.2
TW      2 to 4  14      5.9         2.9      3.0           15.4  0.0        5.2        1.1         2.1
TW      4       8       6.2         2.8      3.0           15.3  0.0        5.2        1.9         2.5
TW      4       4       10.0        1.7      3.0           15.5  0.0        5.2        3.9         2.2
TW      3       3       11.4        1.5      3.0           15.8  5.2        5.3        5.3         0.0
Wall Clock Time (elapsed wall time in seconds) is a measure of the real time that
elapses from start to end, including time that passes due to programmed (artificial) delays
or waiting for resources to become available. In other words, it is the difference between
the time at which a simulation finishes and the time at which the simulation started. It is
given in seconds.
Speedup Rel (Relative Speedup) is

Speedup Rel = T(Wall Clock Time for 1 node for that time synchronization scheme) /
T(Wall Clock Time for the N nodes used for that time synchronization scheme).
Speedup Theoretical is based on the Simulation Object with the longest processing
time. It is the maximum (approximate) speedup expected using an excellent parallelization
scheme (taking advantage of the programming features, computer configuration of the
system, and partitions of the problem).
PT (processing time) is the total CPU time required to process committed events, in
seconds. The processing time does not include the time required to process events that are
rolled back, nor does it include additional overheads such as event queue management
and messages.
Min Committed PT per Node is the Minimum Committed Processing Time per
Node of the computing system configuration utilized.
Max Committed PT per Node is the Maximum Committed Processing Time per
node of the computing system configuration utilized.
Mean Committed PT per Node is the Mean Committed Processing Time per node
of the Computing system configuration utilized.
Sigma is the standard deviation of the processing times of the different nodes utilized
in the experiment.
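As an illustration, the relative speedup definition applied to the TW figures from the benchmark table (17.2 s on one node, 5.9 s in its fastest configuration):

```python
def relative_speedup(wall_time_1_node, wall_time_n_nodes):
    """Speedup Rel = T(1 node) / T(N nodes), per synchronization scheme."""
    return wall_time_1_node / wall_time_n_nodes

print(round(relative_speedup(17.2, 5.9), 1))  # → 2.9
```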
The benchmark for the different time management and synchronization schemes
(TW, BTB, and BTW) is depicted in Figure 7. TW has the best result of 2.9 (close to the
theoretical speedup of 3.0). BTW and TW are very comparable. BTB does not perform
well with this type of task on distributed systems. However, BTB has better
performance with the utilization of multicore configurations (i.e., tightly coupled) for this
specific problem.
Figure 7: Combined Speedup chart for BTW, BTB, and TW for different number of processors (nodes)
– A global node is a separate cluster. A local node is a node within a specific cluster. Therefore, Global 3
and Local 3 means 3 separate clusters, each with 3 computers (9 nodes in total).
$$W_c = \sum_{j=1}^{q}\left[\prod_{k=1}^{m}\sum_{i=1}^{n} w_c(j,k,i)\right] \qquad (11)$$
The cognitive weight score of a particular block of software contributes more to the total
weight if multiple basic control structures are encompassed within nested sections. For
example, methodA() in Figure 8 achieves a larger cognitive weight than methodB() due
to the while-loop nested inside the if-then construct.
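A sketch of equation 11; the nesting encodings for methodA() and methodB() and the basic-control-structure weights (sequence = 1, branch = 2, iteration = 3, following Shao & Wang, 2003) are illustrative, since the code in Figure 8 is not reproduced here:

```python
def total_cognitive_weight(structures):
    """Equation 11: W_c = sum over the q linear blocks of the product over
    their m nesting levels of the summed BCS weights at each level."""
    total = 0
    for block in structures:          # j = 1 .. q
        product = 1
        for level in block:           # k = 1 .. m
            product *= sum(level)     # i = 1 .. n BCS weights at this level
        total += product
    return total

# methodA: a while-loop (3) nested inside an if-then (2) -> 2 * 3 = 6
method_a = [[[2], [3]]]
# methodB: the same if-then and while-loop in sequence   -> 2 + 3 = 5
method_b = [[[2, 3]]]
print(total_cognitive_weight(method_a), total_cognitive_weight(method_b))  # → 6 5
```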
14. STD processor Speed: It is the standard deviation of the speed of the CPUs
used in the simulation.
15. Mean RAM: It is the mean of the RAM memory used by the CPUs in the
system.
16. STD RAM: It is the standard deviation of the RAM memory used by the
CPUs in the system.
17. Critical Path%: It is the Critical Path taking into consideration the
sequential estimated processing time.
18. Theoretical Speedup: It is the theoretical (maximum) speedup to be
achieved with perfect parallelism in the simulation.
19. Local Events/(Local Events + External Events): It is the ratio of the total
local events divided by the summation of the total local events and the total
external events during a specific unit of Simulation Time (estimated).
20. Subscribers/(Publishers + Subscribers): It is the ratio of the total number
of objects subscribing to a particular object divided by the summation of the
total number of publishers and subscribers.
21. Block or Scatter?: Block and scatter are decomposition algorithms used
to distribute the simulation objects in the parallel/distributed system. If
Block is selected, this value is 1; if Scatter is selected, this value is 0.
For example, for the discussed aircraft detection implementation, the input vector,
built using the hardware and complexity specifications from Figures 6 and 7 and
Tables 1 and 2 for a configuration of 4 Global Nodes and 1 Local Node (a loosely
coupled system) using “Block” as the distribution scheme for the simulation objects, is
shown in Table 3.
The output for the DBN will be based on Table 4, where the Wall Clock Time for
BTW is 11.4 seconds, for BTB is 162.6 seconds, and for TW is 10.9 seconds. Table 4
displays the output vector of the respective case study of aircraft detection.
Table 3: Vector that defines the PDDES implementation for the aircraft detection
with 4 Global Nodes and 1 Local Node using Block
Table 4: TW has the minimum wall clock time for the aircraft detection problem
using 4 Global Nodes and 1 Local Node with Block
Methodology
This is the methodology devised to recognize the best time management and
synchronization scheme for a PDDES problem. The input vector is defined based on the
complexity and features of the software, hardware, and messaging of the PDDES
problem (as explained above). The output vector defines the best time management and
synchronization scheme (TW, BTW, BTB). This pattern matching is achieved using a
DBN trained with case studies performed by a Parallel Distributed Discrete-Event
Simulator. This methodology is depicted in Figure 9.
Figure 9: Classification of Optimistic Synchronization Scheme with DBN.
This section deals with the testing of our proposed idea of using deep belief networks
as pattern-matching mechanisms for time management and synchronization of parallel
distributed discrete-event simulations. The performance criterion and the knowledge
acquisition scheme will be presented. This discussion includes an analysis of the results.
For these studies, the performance criterion used is the minimum wall-clock time.
Wall-clock time means the actual time taken by the computer system to complete a
simulation. Wall-clock time is very different from CPU time: CPU time measures the
time during which the processor(s) are actively working on a given task(s), whereas
wall-clock time measures the total time for the process(es) to complete.
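The distinction can be demonstrated with Python's standard timers, where sleeping advances the wall clock but not the CPU clock:

```python
import time

start_wall = time.perf_counter()    # wall-clock timer
start_cpu = time.process_time()     # CPU timer for this process only

time.sleep(0.2)                           # waiting: wall clock advances, CPU is idle
total = sum(i * i for i in range(10**5))  # computing: both clocks advance

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(wall > cpu)  # → True (sleeping counts toward wall-clock but not CPU time)
```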
Several PDDES problems were selected to generate the case studies in order to train
the DBN. We had in total 400 case studies. Two hundred case studies were selected for
training (i.e., to obtain the learning parameters), one hundred case studies for validation
(i.e., to obtain the right architecture), and one hundred for testing (i.e., to test the
developed DBN).
The training session for a DBN was accomplished. There are three principles for
training DBNs:
Results
The finalized DBN has the training and testing performance shown in Figure 10. It
is important to remember that the training set consisted of 200 selected case studies, the
validation set of 100 case studies, and the testing set of 100 case studies. The validation
set is used in order to find the right architecture that leads to higher performance.
Figure 10 indicates the performance obtained with DBNs for this problem.
Stating the research question initiates the research methodology process. This
investigation starts by asking: Is there a mechanism to accurately model and predict the
best time management and synchronization scheme for a parallel discrete event
simulation environment (program and hardware)? Based on the results, this was
accomplished in spite of the limited number of case studies.
CONCLUSIONS
A deep belief network model can be used as a detector of patterns not seen during
training by inputting a mixture of diverse data from different problems in PDDES. In
reaction to the input, the ingested mixed data triggers neuron activation probabilities
that propagate through the DBN layer by layer until the DBN output is reached. The
output probability curve is then examined to select the best optimistic time management
and synchronization scheme to be utilized.
REFERENCES
Bengio, Y. (2009). Learning deep architectures for AI. Foundations and trends in
Machine Learning, 2, 1-127.
Cho, K., Ilin, A., & Raiko, T. (2011). Improved learning of Gaussian-Bernoulli
restricted Boltzmann machines. In Artificial Neural Networks and Machine
Learning–ICANN 2011, 10-17.
Erhan, D., Bengio, Y., Courville, A., Manzagol, P., Vincent, P., & Bengio, S. (2010). Why
does unsupervised pre-training help deep learning? The Journal of Machine Learning
Research, 11, 625-660.
Fujimoto, R. (2000). Parallel and Distributed Simulation. New York: John Wiley &
Sons.
Hinton, G. (2007). Learning multiple layers of representation. Trends in cognitive
Sciences, 11(10), 428-434. doi:10.1016/j.tics.2007.09.004
Hinton, G. (2010). A practical guide to training restricted Boltzmann machines.
Momentum, 9(1), 926.
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke,
V., Nguyen, P., Sainath, T., & Kingsbury, B. (2012). Deep neural networks for
acoustic modeling in speech recognition: The shared views of four research groups.
Signal Processing Magazine, IEEE, 29(6), 82-97. doi:10.1109/MSP.2012.2205597
Using Deep Learning to Configure Parallel Distributed Discrete-Event Simulators 45
Hinton, G., Osindero, S., & Teh, Y. (2006). A Fast Learning Algorithm for Deep Belief
Nets. Neural Computation, 18(7), 1527-1554.
Hinton, G., & Salakhutdinov, R. (2006). Reducing the dimensionality of data with neural
networks. Science, 313(5786), 504-507. doi:10.1126/science.1127647
Längkvist, M., Karlsson, L., & Loutfi, A. (2012). Sleep stage classification using
unsupervised feature learning. Advances in Artificial Neural Systems, 2012, Article
ID 107046, 9 pages. doi:10.1155/2012/107046
Längkvist, M., Karlsson, L., & Loutfi, A. (2014). A Review of Unsupervised Feature
Learning and Deep Learning for Time-Series Modeling. Pattern Recognition Letters,
42, 11-24. doi :10.1016/j.patrec.2014.01.008
Larochelle, H., Bengio, Y., Louradour, J., & Lamblin, P. (2009). Exploring strategies for
training deep neural networks. The Journal of Machine Learning Research, 10, 1-40.
Le Roux, N., & Bengio, Y. (2008). Representational power of restricted Boltzmann
machines and deep belief networks. Neural Computation, 20, 1631-1649.
doi:10.1162/neco.2008.04-07-510
Misra, S. (2006). A Complexity Measure based on Cognitive Weights. International
Journal of Theoretical and Applied Computer Sciences, 1(1), 1–10.
Mohamed, A., Sainath, T., Dahl, G., Ramabhadran, B., Hinton, G., & Picheny, M.
(2011). Deep belief networks using discriminative features for phone recognition.
Proceeding of the IEEE Conference on Acoustics, Speech and Signal Processing,
5060-5063.
Mohamed, A., Dahl, G., & Hinton, G. (2012). Acoustic modeling using deep belief
networks. IEEE Transactions on Audio, Speech, and Language Processing, 20(1),
14-22. doi:10.1109/TASL.2011.2109382
Salakhutdinov, R., & Murray, L. (2008). On the quantitative analysis of deep belief
networks. Proceedings of the 25th international conference on Machine learning,
872-879. doi:10.1145/1390156.1390266
Shao, J., & Wang, Y. (2003). A new measure of software complexity based on cognitive
weights. Canadian Journal of Electrical and Computer Engineering, 1-6.
Steinman, J. (1991). SPEEDES: Synchronous Parallel Environment for Emulation and
Discrete Event Simulation. Proceedings of Advances in Parallel and Distributed
Simulation, 95-103.
Steinman, J. (1992). SPEEDES: A Multiple-Synchronization Environment for Parallel
Discrete-Event Simulation. International Journal in Computer Simulation, 2, 251-
286.
Steinman, J. (1993). Breathing Time Warp. Proceedings of the 7th Workshop on Parallel
and Distributed Simulation (PADS93), 23, 109-118.
Steinman, J. (1994). Discrete-Event Simulation and the Event Horizon. Proceedings of
the 1994 Parallel and Distributed Simulation Conference, 39-49.
46 Edwin Cortes, Luis Rabelo and Gene Lee
Steinman, J. (1996). Discrete-Event Simulation and the Event Horizon Part 2: Event List
Management. Proceedings of the 1996 Parallel and Distributed Simulation
Conference, 170-178.
Steinman, J., Nicol, D., Wilson, L., & Lee, C. (1995). Global Virtual Time and
Distributed Synchronization. Proceedings of the 1995 Parallel and Distributed
Simulation Conference, 139-148.
Steinman, J., Lammers, C., Valinski, M., & Steinman, W. (2012). External Modeling
Framework and the OpenUTF. Report of WarpIV Technologies. Retrieved from
http://www.warpiv.com/Documents/Papers/EMF.pdf
Wulsin, D., Gupta, J., Mani, R., Blanco, J., & Litt, B. (2011). Modeling
electroencephalography waveforms with semi-supervised deep belief nets: fast
classification and anomaly measurement. Journal of Neural Engineering, 8(3),
036015. doi:10.1088/1741-2560/8/3/036015
Zhang, C., Zhang, J., Ji, N., & Guo, G. (2014). Learning ensemble classifiers via
restricted Boltzmann machines. Pattern Recognition Letters, 36, 161-170.
AUTHORS’ BIOGRAPHIES
Dr. Luis Rabelo was the NASA EPSCoR Agency Project Manager and is currently a
Professor in the Department of Industrial Engineering and Management Systems at the
University of Central Florida. He received dual degrees in Electrical and Mechanical
Engineering from the Technological University of Panama and Master’s degrees from the
Florida Institute of Technology in Electrical Engineering (1987) and the University of
Missouri-Rolla in Engineering Management (1988). He received a Ph.D. in Engineering
Management from the University of Missouri-Rolla in 1990, where he also did Post-
Doctoral work in Nuclear Engineering in 1990-1991. In addition, he holds a dual MS
degree in Systems Engineering & Management from the Massachusetts Institute of
Technology (MIT). He has over 280 publications and three international patents being
utilized in the aerospace industry, and has graduated 40 Master's and 34 Doctoral
students as advisor/co-advisor.
Chapter 3
ABSTRACT
This article presents an overview of machine learning in general, and deep learning in
particular, applied to autonomous vehicles. The noisy character and small size of the
data made this problem intractable for other methods. The use of
machine learning for this project required two hardware/software systems: one for
training in the cloud and the other one in the autonomous vehicle. The main conclusion is
that deep learning can create sophisticated models which are able to generalize with
relatively small datasets. In addition, autonomous vehicles are a good example of a
multiclass classification problem.
INTRODUCTION
According to data published by the United Nations, more than 1.2 million people die
on roads around the world every year, and as many as 50 million are injured. Over 90%
of these deaths occur in low- and middle-income countries. Brazil is among the countries
* Corresponding Author Email: olmer.garciab@utadeo.edu.co.
50 Olmer Garcia and Cesar Diaz
in which the number of such deaths is relatively high. Figure 1 shows historical data for
traffic accident deaths in Brazil, USA, Iran, France, and Germany. However, the per
capita statistics are controversial as the number of people who drive varies between
countries, as does the number of kilometers traveled by drivers. There is a significant
difference in the statistics between developing and high-income countries.
Figure 1. Traffic accident deaths per 10,000 citizens. Sources: Brazil (DATASUS), United States
(NHTSA), Iran (Bahadorimonfared et al., 2013), Germany (destatis.de), and France
(www.securite-routiere.gov.fr).
The trend toward the use of automated, semi-autonomous, and autonomous systems
to assist drivers has received an impetus from major technological advances as indicated
by recent studies of accident rates. On the other hand, the challenges posed by
autonomous and semi-autonomous navigation have motivated researchers from different
groups to undertake investigations in this area. One of the most important issues when
designing an autonomous vehicle is safety and security (Park et al., 2010). Currently,
machine learning (ML) algorithms have been used at all levels of automation for
automated vehicles (NHTSA, 2013):
No-Automation (Level 0): The driver has complete control of the vehicle, but
machine learning helps through the perception of the environment to inspect and
alarm the driver.
Function-specific (Level 1) and Combined Automation (Level 2): One or more
primary driver functions – brake, steering, throttle, and motive power – are
controlled at specific moments by algorithms, such as lane-centering
algorithms or adaptive cruise control. In these systems, the conventional
Machine Learning Applied to Autonomous Vehicles 51
and so, to ensure that messages related to the transfer of control are given in a
timely and appropriate manner.
MACHINE LEARNING AND DEEP LEARNING
This section is an introduction to the main concepts of machine learning and deep
learning.
Michalski et al. (1983) stated that a “Learning process includes the acquisition of
new declarative knowledge, the development of motor and cognitive skills through
instruction or practice, the organization of new knowledge into general and effective
representations, the discovery of new facts, and theories through observation and
experimentation.” Kohavi & Provost (1998) published a glossary of terms for machine
learning and defined it as: “The non-trivial process of identifying valid, novel, potentially
useful, and ultimately understandable patterns in data. Machine learning is most
commonly used to mean the application of induction algorithms, which is one step in the
knowledge discovery process.”
Machine learning is highlighted as the study and computer modeling of learning
processes. The main idea is developed around the following research paths:
Many authors have described different taxonomies of learning processes which
only include the basic learner and teacher problem. However, Camastra & Vinciarelli
(2007) provided a more focused definition based on the application of machine learning
to audio, image, and video analysis. They identify four different learning types: rote
learning, learning from instruction, learning by analogy, and learning from examples,
which are briefly explained below.
Rote Learning: This type consists of directly implanting new knowledge in the
learner. This method includes (1) Learning processes using programs and
instructions implemented by external entities, and (2) Learning processes using
memorization of given data with no inferences drawn from the incoming
information.
Learning from instruction: This learning consists of a learner acquiring
knowledge from the instructor and/or other source and transforming it into
internal representations. The new information is integrated with prior knowledge
for effective use. One of the objectives is to keep the knowledge in a way that
incrementally increases the learner’s actual knowledge (Camastra & Vinciarelli,
2007).
Learning by analogy: This type of learning consists of acquiring new facts or
skills based on “past situations that bear strong similarity to the present problem
at different levels of abstraction" (Carbonell, 2015). Learning by analogy
requires more inferencing by the learner than rote learning and learning from
instruction. Carbonell (2015) gives a good definition: “A fact or skill analogous
in relevant parameters must be retrieved from memory. Then, the retrieved
knowledge must be transformed, applied to the new situation, and stored for
future use."
Learning from examples: This can simply be called learning: given a set of
examples of a concept, the learner builds a general representation of the concept
based on those examples. The learning problem is described as the search for a
general rule that explains the examples even if only a limited number of examples
is given.
Learning techniques can be grouped into four main types: supervised learning,
unsupervised learning, reinforcement learning, and semi-supervised learning.
Supervised Learning: the learning process is based on examples with inputs
and desired outputs, given by a “teacher”. The data is a sample of input-
output patterns. The goal is to learn a general rule about how the output can
be generated, based on the given input. Some common examples are
predictions of stock market indexes and recognition of handwritten digits and
letters. The training set is a sample of input-output pairs, and the task of the
learning problem is to find a deterministic function that maps an input to the
respective output in order to predict future input-output observations.
Unsupervised Learning: when the training data is a sample of objects without
associated target values, the problem is known as unsupervised learning. In
this case, there is no instructor. The learning algorithm does not have
labels, leaving it on its own to find some “structure” in its input. We have
training samples of objects, with the possibility of extracting some
“structure” from them. If the structure exists, it is possible to take advantage
of this redundancy and find a short description of the data representing
specific similarity between any pairs of objects.
Reinforcement Learning: The challenge with reinforcement learning is
learning what to do to maximize a given reward. Indeed, in this
type, feedback is provided in terms of rewards and punishments. The learner
is assumed to gain information about the actions: a reward or punishment is
given based on the level of success or failure of each action. Ergodicity is
important in reinforcement learning.
Semi-supervised Learning: Consists of the combination of supervised and
unsupervised learning. In some books, it refers to mixing unlabeled data
with labeled data to build a better learning system (Camastra & Vinciarelli,
2007).
Deep Learning
Deep learning has become a popular term. It can be defined as the use of
neural networks with multiple layers in big data problems. So why is it perceived as a
“new” concept if neural networks have been studied since the 1940s? It is because
parallel computing enabled by graphics processing units (GPUs) and distributed systems,
along with efficient optimization algorithms, has led to the use of neural networks in
contemporary/complex problems (e.g., voice recognition, search engines, and
autonomous vehicles). To better understand this concept, we first present a brief review
of neural networks, and then proceed to present some common concepts of deep learning.
Machine Learning Applied to Autonomous Vehicles 55
Figure 2. Neural Network with six inputs, one hidden layer with four nodes and one output.
Neural Networks
The output of a single node is computed as y = f(Σ wi xi + b), where xi is the
value of each input to the node, the wi are weight parameters that multiply each
input, b is the bias parameter, and f(.) is the activation function. The commonly
used functions are the sigmoidal activation function, the hyperbolic tangent
function, and the rectified linear unit (ReLU). Heaton (2015) proposes that while
most current literature in deep learning suggests using the ReLU activation
function exclusively, it is necessary to understand the sigmoidal and hyperbolic
tangent functions to see the benefits of ReLU.
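The three activation functions named above can be written directly in NumPy (a minimal sketch, independent of any framework):

```python
import numpy as np

def sigmoid(x):
    # Squashes its input into (0, 1); saturates for large |x|, which slows learning.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes its input into (-1, 1); zero-centered, but it still saturates.
    return np.tanh(x)

def relu(x):
    # max(0, x): no saturation for positive inputs, so gradients flow freely.
    return np.maximum(0.0, x)
```

The saturation of the first two functions is what makes ReLU attractive for deep hidden layers: its derivative is 1 for any positive input.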
Varying the weights and the bias varies the amount of influence any given input
has on the output. The learning aspect of neural networks takes place during a
process known as back-propagation, the most common training algorithm,
developed in the 1980s. In the learning process, the network modifies the weights
and biases to improve the network's output, like any machine learning algorithm.
Back-propagation is an optimization process that uses the chain rule of
derivatives to minimize the error and thereby improve the output accuracy. This
process is carried out by numerical methods, among which stochastic gradient
descent (SGD) is the dominant scheme.
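As an illustration of how back-propagation applies the chain rule, the following toy sketch trains a single sigmoid neuron by gradient descent on one sample (the data, learning rate, and iteration count are our own choices; real networks repeat this layer by layer over mini-batches):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=3), 0.0           # weights and bias
x, y = np.array([1.0, -2.0, 0.5]), 1.0   # one training sample and its target

lr = 0.5
for _ in range(200):
    z = w @ x + b                      # weighted sum of the inputs
    out = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation
    err = out - y                      # d(0.5 * (out - y)^2) / d(out)
    dz = err * out * (1.0 - out)       # chain rule through the sigmoid
    w -= lr * dz * x                   # gradient step on the weights
    b -= lr * dz                       # gradient step on the bias
```

After a couple of hundred steps the neuron's output moves close to the target of 1.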
Finally, the way in which nodes are connected defines the architecture of the neural
network. Some of the popularly known architectures are discussed in what follows.
Note that all concepts of machine learning (how to translate a problem into a
fixed-length array of floating-point numbers, which type of algorithm to use,
normalization, correlation, overfitting, and so on) are also applicable in deep learning.
The deep CNN is one of the classes of deep neural networks. A CNN works by
successively representing small portions of the features of the problem in a
hierarchical fashion and combining them in a multiple-layer network (with
several hidden layers). This successive representation means that the first layer(s) will be
engineered to detect specific features. The next layers will combine these features into
simpler profiles/forms and then into patterns, making the identification more resistant to
changes in position, resolution, scale, brightness, noise, and rotation. The last layer(s) will
match the input example (i.e., a particular acquired image), with all of its forms and
patterns, to a class. CNNs have achieved very high levels of prediction accuracy in
computer vision, image processing, and voice recognition.
CNNs are reminiscent of earlier neural network architectures such as the Neocognitron and
LeNet-5. CNNs can have many layers. A classical architecture will have at least four
layer types: input, convolution, pooling, and a fully connected one. CNNs can have several
convolution layers, several pooling layers, and several fully connected ones.
Deep learning is an emergent concept built from many technologies, such as:
The rectified linear unit (ReLU) has become the standard activation function for
the hidden layers of a deep neural network. The output layer uses a linear or
softmax activation function depending on whether the neural network performs
regression (linear) or classification (softmax). ReLU is defined as f(x) =
max(0, x): the function returns 0 if x is negative and returns x otherwise.
Filters: Convolutional neural networks (CNNs) break up the image into smaller
pieces. The first step is selecting a width and height that define a filter, or patch.
The CNN uses filters to split an image into smaller patches whose size
matches the filter size. The CNN then slides this patch horizontally or
vertically to focus on a different piece of the image, performing the convolution. The
amount by which the filter slides is referred to as the stride. How many neurons
does each patch connect to? That depends on the filter depth: with a
depth of k, each patch of pixels connects to k neurons in the next layer.
Finally, there is the padding parameter, which is responsible for the border of zeros
in the area that the filter sweeps.
Convolution Layer
The input layer is just the image and/or input data (e.g., 3D: height (N), width (N),
and depth (D)). Traditional deep CNNs use the same height and width dimensions (i.e.,
squares). The convolution layer is next. The convolution layer is formed by filters (also
called kernels) which run over the input layer. A filter has smaller sides (height (F) and
width (F)) than the previous layer (e.g., the input layer or a different one) but the
same depth. A filter processes the entire input layer, producing part of the
output of the convolution layer (smaller than the previous layer). The process done by the
filter is executed by positioning the filter in successive areas (F by F) of the input layer.
This positioning advances in strides (S), the number of input neurons (of the
N x N area) to move in each step (i.e., strides are “the distance between the receptive field
centers of neighboring neurons in a kernel map” (Krizhevsky et al., 2012)). The
side of the map produced by passing a filter of size (F x F x D) over the
input layer (or previous layer) (N x N x D) is:

(N - F)/S + 1 (1)
However, a convolution layer can have several filters (i.e., kernels) in order to produce a
kernel map as output. It is easy to see that the size of the image gets smaller. This
can be problematic, in particular when applying large filters or in CNNs that have many layers
and filters. The concept of padding (P) is then used. Zero-padding is the addition of zero-
valued pixels in the borders of the input layers, in borders of width P. To preserve
the spatial size with a stride of 1, the relationship is as follows:

P = (F - 1)/2 (3)
A convolution layer can have several filters, each one of size (F x F x D), and this set
will produce an output in the convolutional layer whose depth equals the number of filters in
the respective layer. The output matrix (i.e., kernel map) of the convolutional layer is the
product of the different filters being run over the kernel map of the previous layer. The
kernel map of a convolution layer can be processed by successive convolution layers, which
do not need to have filters of the same dimensional size or number. Again, these layers
must be engineered. The weights and biases of these filters can be obtained by
different algorithms, such as backpropagation.
Knowing the dimensionality of each additional layer helps us understand how large
our model is and how our decisions about filter size and stride affect the size of our
network. With these parameters, we can calculate the number of neurons of each layer in a
CNN: given an input layer with a volume of N x N x D, K filters each of volume
F x F x D, a stride of S, and a padding of P, the following
formula gives the volume of the next layer:

((N - F + 2P)/S + 1) x ((N - F + 2P)/S + 1) x K
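These size relationships can be collected into a small helper function (a sketch; the function name and argument layout are our own):

```python
def conv_output_volume(n, d, f, k, stride, pad):
    """Height, width, and depth of a convolution layer's output.

    n: input width/height (square input), d: input depth,
    f: filter width/height, k: number of filters,
    stride: step of the filter, pad: zero-padding on each border.
    """
    side = (n - f + 2 * pad) // stride + 1
    return side, side, k  # the depth of the kernel map equals the filter count

# Example: a 32x32x3 image with 5x5 filters, stride 1, and padding
# P = (F - 1)/2 = 2 keeps the spatial size: (32 - 5 + 2*2) // 1 + 1 = 32.
```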
Pooling Layer
This layer can have several types of filters. One of the most common is max
pooling. Max pooling is a filter of a given width by height which extracts the maximum value
of each patch. Conceptually, the benefit of the max pooling operation is to reduce the size
of the input and to allow the neural network to focus on only the most important
elements. Max pooling does this by retaining only the maximum value of each filtered
area and removing the remaining values. This technique can help avoid overfitting
(Krizhevsky et al., 2012). Some variations, like mean pooling, are also used.
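A naive max pooling over a single 2-D feature map can be sketched as follows (frameworks implement this far more efficiently):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Keep only the maximum of each size x size patch, shrinking the
    # map and discarding the remaining activations.
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max()
    return out
```

A 2x2 filter with stride 2 (a common choice) halves each spatial dimension while preserving the strongest responses.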
Fully Connected Layer
This layer type flattens the nodes into one dimension. A fully connected layer connects
every element (neuron) in the previous layer; note that the resulting vector is passed
through an activation function. For example, LeNet-5 networks typically contain
several dense layers as their final layers. The final dense layer in LeNet-5 actually
performs the classification. There should be one output neuron for each class or type of
image to classify.
Dropout Layer
Normally deep learning involves many nodes, which means many parameters. This number
of parameters can generate overfitting. Therefore, dropout is used as a regularization technique
for reducing overfitting (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov,
2014). This layer “drops out” a random set of activations in that layer by setting them to
zero in the forward pass. During training, a good starting value for the dropout
probability is 0.5; during testing, a value of 1.0 is used to keep all units and maximize
the generalization power of the model. There are some variations on this. Krizhevsky
et al. (2012) state that dropout “consists of setting to zero the output of each hidden
neuron with probability 0.5. The neurons which are “dropped out” in this way do not
contribute to the forward pass and do not participate in back-propagation. So every time
an input is presented, the neural network samples a different architecture, but all these
architectures share weights. This technique reduces complex co-adaptations of neurons
since a neuron cannot rely on the presence of particular other neurons. It is, therefore,
forced to learn more robust features that are useful in conjunction with many different
random subsets of the other neurons. At test time, we use all the neurons but multiply
their outputs by 0.5, which is a reasonable approximation to taking the geometric mean of
the predictive distributions produced by the exponentially-many dropout networks.”
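The training-time masking described above can be sketched as follows. This sketch uses the “inverted” dropout variant (dividing by the keep probability during training rather than halving outputs at test time, as in the scheme quoted above); the function name and signature are illustrative:

```python
import numpy as np

def dropout(activations, keep_prob=0.5, training=True, rng=None):
    # During training, zero each activation with probability 1 - keep_prob.
    # Dividing by keep_prob here ("inverted" dropout) means no extra
    # scaling is needed at test time.
    if not training:
        return activations            # keep all units at test time
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob
```

On average the scaled surviving activations preserve the expected magnitude of the layer's output.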
Transfer Learning
Transfer learning is the process of taking a pre-trained model (the weights and
parameters of a network that has been trained on a large quantity of data by others) and
“fine-tuning” the model with your own dataset (Yosinski, Clune, Bengio, & Lipson,
2014). The idea is that the pre-trained model acts as a feature extractor: you
remove the last layer of the network and replace it with your own classifier or regression
layer. The algorithm then freezes the weights of all the other layers and trains the
network normally. Transfer learning builds on a central principle of deep learning:
architectures and learned features can be reused in CNNs. Therefore, one should review
the most successful architectures used before, such as AlexNet by Krizhevsky et al.
(2012), ZF Net by Zeiler and Fergus (2014), VGG Net by Simonyan and Zisserman
(2014), GoogLeNet by Szegedy et al. (2015), and Microsoft ResNet (residual network)
by He et al. (2016).
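The freeze-and-retrain idea can be sketched numerically. Everything below is a hypothetical toy setup (random weights standing in for a pre-trained extractor, synthetic labels), not a real pre-trained network; the point is that only the new head's parameters are ever updated:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pretrained" feature extractor: these weights are frozen (never updated).
W_frozen = rng.normal(size=(8, 4))

def features(x):
    return np.maximum(0.0, x @ W_frozen)   # fixed ReLU features

# New classifier head: the only trainable parameters.
w_head = np.zeros(4)

X = rng.normal(size=(32, 8))
y = (X.sum(axis=1) > 0).astype(float)      # toy binary labels

for _ in range(800):
    p = 1.0 / (1.0 + np.exp(-(features(X) @ w_head)))  # sigmoid head
    grad = features(X).T @ (p - y) / len(y)            # logistic-loss gradient
    w_head -= 0.05 * grad                  # only the head is updated
```

Because the extractor is fixed, training is cheap: only a small convex logistic-regression problem is solved on top of the reused features.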
The planning or navigation layer determines where the vehicle should go
according to the perception and the mission. This has to include a risk analysis to
determine the path and speed of the vehicle. The cognition aspects of an autonomous
vehicle depend on mobility capabilities, which are studied in the field of robotics
navigation (Siegwart, Nourbakhsh, & Scaramuzza, 2011). The navigation field organizes its
techniques into two groups: planning and reacting. The techniques from the planning
group are known as global path planning and are concerned with the generation of the
global route that guides the vehicle toward a goal position. The techniques from the
reacting group are known as local path planning and are concerned with the generation of
several local paths that allow the vehicle to avoid obstacles. In this layer, machine
learning techniques are used to select routes (global and local).
Finally, the control layer manipulates the degrees of freedom of the autonomous
vehicle (e.g., steering, braking, gearbox, acceleration) to bring it to the desired
position at a defined speed at each instant of time. Machine learning techniques have
been used to obtain mathematical models and/or adapt a controller to different situations.
Figure 4. Interactions of the proposed cooperative strategy with the architecture of the autonomous
vehicle VILMA01 (Bedoya, 2016).
This research studies the architecture of the layers using a cooperative strategy based
on risk analysis. The resulting architecture includes mechanisms to interact with the
driver (this architecture has been proposed in VILMA01, the First Intelligent Vehicle of the
Autonomous Mobility Laboratory). We stated above that the motion control layer is
in charge of manipulating the degrees of freedom of the car (steering, braking, and
acceleration). This manipulation brings the autonomous vehicle to the desired position
at each point in time. We will explain that this can be achieved by using a predictive
control technique that relies on dynamic models of the vehicle to control the steering
system. The path-planning layer includes the reactive part, also known as local path
planning, where the desired path is represented in a curvilinear space. The desired path is
selected based on intrinsic and extrinsic risk indicators. With the layers of planning and
control already set, a method is proposed to estimate the trajectory desired by the driver
during cooperative control, allowing a decision to be made based on risk analysis.
Finally, different tests on VILMA01 (in the actual vehicle) are performed to validate the
proposed architecture.
These layers do not form a strictly hierarchical model. Each layer interacts with the
others at different levels, from directive to cooperative control. These interactions
can be adapted depending on what the vehicle is trying to do. For example, the architecture of
VILMA01 (Bedoya, 2016) aims to test strategies for driving a vehicle cooperatively
between an autonomous system and a driver, which could help to reduce the risk of
accidents. This strategy assumes that the autonomous system is more reliable than the
driver, even though in other circumstances the driver could interact with the human-
machine interface to disengage the autonomous system. Based on the architecture of
autonomous mobile robots, the proposed strategy is termed cooperative planning
and cooperative control, and it determines when and how the driver can safely change,
through the steering, the path projected by the autonomous system. Figure 4 shows the
function blocks for the autonomous vehicle VILMA01. There are two important
considerations in the cooperative strategies. The first is the interaction of the driver
and the robot through the steering (dotted line 1), which in turn generates the second,
which poses the question in the planning layer (dotted line 2): is it safe to change the
projected path? These additions to the existing architecture generate two types of
cooperation. The first, cooperative control, is defined when the control signals of the
driver and the autonomous system cooperate during the local path planned by the
autonomous system. The second, cooperative planning, is defined when the driver
and the autonomous system cooperate to change the local path after a risk analysis is
performed.
Finally, the design of the layers, their functionality, and their interactions give an
architecture its level of automation. According to Thrun et al. (2006), the six major
functional groups are interface sensors, perception, control, planning, vehicle interface,
and user interface. Therefore, this layered architecture must take into consideration
hardware, software, and drive-by-wire automation.
Our work is inspired by the German Traffic Signs data set provided by Stallkamp,
Schlipsing, Salmen, & Igel (2011), which contains about 40k training examples and 12k
testing examples. The same problem can be used as a model for Colombian traffic signs.
This is a classification problem which aims to assign the right class to a new image of a
traffic sign by training on the provided pairs of traffic sign images and their labels. The
project can be broken down into five parts: exploratory data analysis; data preprocessing
and data augmentation; the definition of a CNN architecture; training the model; and
testing the model and using it with other images.
Data Analysis
The database is a set of images which can be described computationally like a
dictionary with key/value pairs:
The image data set is a 4D array containing the raw pixel data of the traffic sign
images (number of examples, width, height, channels).
The label set is an array containing the type of each traffic sign (number of samples,
traffic sign id).
The traffic sign id description is a file which contains the name and a short
description for each traffic sign id.
A further array contains tuples (x1, y1, x2, y2) representing the coordinates of a
bounding box around the sign in each image.
It is essential to understand the data and how to manipulate it (Figure 5 shows some
randomly selected samples). This process of understanding and observing the data can
generate important conclusions.
Figure 6. Histogram of the number of samples of each traffic sign in the training data set.
The input images to the neural network go through a few preprocessing steps to
help train the network. Preprocessing can include:
Resizing the image: A specific size is required; 32x32 is a good value based on
the literature.
Color space conversion: It is possible to transform to grayscale if you think that
the colors do not matter in the classification, or to change from RGB (Red,
Green, Blue) space to another color space like HSV (Hue, Saturation, Value).
Other approaches can include balancing the brightness and contrast
of the images.
Normalization: This part is very important because the algorithms in neural
networks work best with data in some interval, normally between 0 and 1 or
-1 and 1. Normalization can be done by dividing each dimension by its standard
deviation once it is zero-centered. This process causes each feature to have a
similar range so that the gradients do not go out of control (Heaton, 2013).
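The zero-center-then-scale normalization described above can be sketched with NumPy (per-channel statistics are an assumed but common choice):

```python
import numpy as np

def normalize(images):
    # Zero-center each color channel, then divide by that channel's
    # standard deviation, so every feature ends up in a comparable range.
    images = images.astype(np.float64)
    mean = images.mean(axis=(0, 1, 2), keepdims=True)
    std = images.std(axis=(0, 1, 2), keepdims=True)
    return (images - mean) / std
```

After this step each channel of the data set has zero mean and unit standard deviation, which keeps the gradient magnitudes balanced across features.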
Unbalanced data, as shown in Figure 6, means that there are many more samples of
one traffic sign than of the others. This can generate overfitting and/or other problems in
the learning process. One solution is to generate new images by taking some images at
random and changing them through a random combination of the following techniques:
Translation: Move the image horizontally or vertically by some pixels around
the center of the image.
Rotation: Rotate the image by a random angle about the center of the image.
Affine transformations: Zoom into the image or change the perspective
of the image.
A good way to start assembling your own deep neural network is to review the
literature and look for a deep learning architecture which has been used in a similar
problem. The first such architecture was the one presented by LeCun et al. (1998): LeNet-5
(Figure 7). Let us assume that we select LeNet-5. The first step is then to understand
LeNet-5, which is composed of 8 layers.
Figure 7. The architecture of LeNet-5, a Convolutional Neural Network, here for digits’ recognition.
Each plane is a feature map, i.e., a set of units whose weights are constrained to be identical – Adapted
and modified from LeCun et al. (1998).
The training process for CNNs has the following steps:
Split the training data between training and validation. Validation data is used
for estimating the accuracy of the model, while training data is used to apply the
gradient algorithm.
Type of optimizer: Several algorithms can be used. The Adam stochastic
optimization algorithm by Kingma & Ba (2014) is a typical selection. This
scheme is a first-order gradient-based optimization method for stochastic objective
functions. In addition, it is well suited for problems that are large in terms of data
and/or input parameters. The algorithm is simple and can be modified
accordingly. Kingma and Ba (2014) detail their algorithm (pseudocode) as
follows:
Require: α: Stepsize
Require: β1, β2 ∈ [0, 1): Exponential decay rates for the moment estimates
Require: f(θ): Stochastic objective function with parameters θ
Require: θ0: Initial parameter vector
m0 ← 0 (Initialize 1st-moment vector)
v0 ← 0 (Initialize 2nd-moment vector)
t ← 0 (Initialize timestep)
while θt not converged do
  t ← t + 1 (Increase timestep t)
  gt ← ∇θ ft(θt−1) (Get gradients with respect to the stochastic objective at t)
  mt ← β1 · mt−1 + (1 − β1) · gt (Update biased first-moment estimate)
  vt ← β2 · vt−1 + (1 − β2) · gt^2 (Update biased second raw moment estimate)
  m̂t ← mt/(1 − β1^t) (Compute bias-corrected first moment estimate)
  v̂t ← vt/(1 − β2^t) (Compute bias-corrected second raw moment estimate)
  θt ← θt−1 − α · m̂t/(√v̂t + ε) (Update parameters, with ε a small constant)
end while
return θt (Resulting parameters)
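Adam's update rules translate almost line for line into NumPy. The sketch below minimizes a simple quadratic; ε is the small stabilizing constant from the original algorithm, while the step count, step size, and toy objective are our own choices:

```python
import numpy as np

def adam(grad_fn, theta0, alpha=0.1, beta1=0.9, beta2=0.999,
         eps=1e-8, steps=500):
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)   # 1st-moment vector
    v = np.zeros_like(theta)   # 2nd-moment vector
    for t in range(1, steps + 1):
        g = grad_fn(theta)                   # gradient at timestep t
        m = beta1 * m + (1 - beta1) * g      # biased 1st-moment estimate
        v = beta2 * v + (1 - beta2) * g**2   # biased 2nd raw moment estimate
        m_hat = m / (1 - beta1**t)           # bias-corrected 1st moment
        v_hat = v / (1 - beta2**t)           # bias-corrected 2nd moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Minimize f(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3).
result = adam(lambda th: 2.0 * (th - 3.0), np.zeros(2))
```

On this toy objective the iterates settle near the minimizer at (3, 3); in a CNN, grad_fn would return the mini-batch gradient computed by backpropagation.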
Batch size: This hyper-parameter defines the number of examples that are
propagated in each forward/backward iteration. A well-tuned batch size can
reduce memory use and speed up training; however, it can also reduce the
accuracy of the gradient estimate.
Epochs: One epoch is one forward pass and one backward pass over all the training
examples in the training data set. The analyst monitors each epoch and analyzes
how the training process is evolving. Note that in each epoch the training and
validation data should be shuffled to improve the generalization of the neural
network.
Hyper-parameters: Depending on the algorithm and the framework used, there
are values that should be tuned. The learning rate of the optimizer is usually an
important hyper-parameter to find. CNNs may involve other hyper-parameters
such as filter windows, dropout rates, and the size of the mini-batches. These
hyper-parameters can be different for each layer. For example, the following
hyper-parameters can be relevant for a CNN: the number of filters (K), the filter
size (F x F), the stride (S), and the amount of padding (P). Techniques can be used
to optimize the tuning process and avoid trial-and-error efforts. These
techniques can involve models from operations research, evolutionary
algorithms, Bayesian schemes, and heuristic searches.
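One of the simplest such tuning techniques is a random search over the hyper-parameter space. The sketch below uses a hypothetical search space and a placeholder scoring function standing in for a full train-and-validate cycle:

```python
import random

# Hypothetical search space for a CNN layer.
space = {
    "num_filters":   [16, 32, 64],
    "filter_size":   [3, 5, 7],
    "stride":        [1, 2],
    "dropout_rate":  [0.3, 0.5, 0.7],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def evaluate(cfg):
    # Placeholder: in practice, train the network with cfg and
    # return the validation accuracy.
    random.seed(str(sorted(cfg.items())))
    return random.random()

def random_search(space, trials=20, seed=42):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search(space)
```

Each trial draws one value per hyper-parameter, so the cost is controlled by the trial budget rather than by the size of the full grid.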
The training process ends when one finds a good accuracy between the model
outputs and the known output classes. In this project, an accuracy of over 98% was
achieved for a CNN developed for Colombian traffic signs. This is a very good value,
taking into consideration that humans are in the 98.32% accuracy range. The training
process requires sophisticated computational power. It is essential to have access to
high-level computing resources or cloud service providers like Amazon
(https://aws.amazon.com/), IBM Bluemix (https://www.ibm.com/cloud-computing/bluemix/),
or Microsoft Azure (https://azure.microsoft.com/).
The last step is to prove that the neural network model works in situations
different from the data that was used to train and validate the model. It is very
important to use data that has not been used in the process of training and validation. For
example, we developed a CNN with Colombian traffic signs and obtained a moderate
to low accuracy in the testing process. The model developed provided opportunities to
analyze new research questions such as:
Will this model work with the traffic signs of my country? How about the climate
and the cultural environment?
How can performance be improved?
Is it feasible to implement the feedforward process in real time?
CONCLUSION
A brief review of machine learning and the architecture of autonomous vehicles was
presented in this chapter. It is important to note that the use of machine learning requires
two hardware/software systems: one for training in the cloud and the other in the
autonomous vehicle. Another point to take into account is that modeling by machine
learning from examples requires sufficient data to let machine learning models
generalize at appropriate levels. There are some potential applications for deep learning
in the field of autonomous vehicles. For example, it is possible that a deep learning
neural network becomes the “driver” of the autonomous vehicle, where the inputs are
road conditions and the risk profile of the passenger and the outputs are the turning
angle and speed of the car. Driving scenarios are a good fit for multi-class and multi-label
classification problems. The mapping is hidden in the multiple hierarchical
layers, but deep learning does not need the exact form of the function (as long as it maps
well from input to output). The results are very promising. However, safety regulations
(and public acceptance) will require numerous tests and validations before deep-learning-
based systems can be certified by the respective agencies.
REFERENCES
Amsalu, S., Homaifar, A., Afghah, F., Ramyar, S., & Kurt, A. (2015). Driver behavior
modeling near intersections using support vector machines based on statistical feature
extraction. In 2015 IEEE Intelligent Vehicles Symposium (IV), 1270–1275.
Bahadorimonfared, A., Soori, H., Mehrabi, Y., Delpisheh, A., Esmaili, A., Salehi, M., &
Bakhtiyari, M. (2013). Trends of fatal road traffic injuries in Iran (2004–2011). PloS
one, 8(5):e65198.
Bedoya, O. G. (2016). Análise de risco para a cooperação entre o condutor e sistema de
controle de veículos autônomos [Risk analysis for cooperation between the driver and
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep
convolutional neural networks. In Advances in Neural Information Processing
Systems, 1097–1105.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied
to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Liu, P., Kurt, A., & Ozguner, U. (2014). Trajectory prediction of a lane changing vehicle
based on driver behavior estimation and classification. In 17th International IEEE
Conference on Intelligent Transportation Systems (ITSC), 942–947.
Malik, H., Larue, G. S., Rakotonirainy, A., & Maire, F. (2015). Fuzzy logic to evaluate
driving maneuvers: An integrated approach to improve training. IEEE Transactions
on Intelligent Transportation Systems, 16(4), 1728–1735.
Merat, N., Jamson, A. H., Lai, F. C., Daly, M., & Carsten, O. M. (2014). Transition to
manual: Driver behaviour when resuming control from a highly automated vehicle.
Transportation Research Part F: Traffic Psychology and Behaviour,27, Part B, 274 –
282. Vehicle Automation and Driver Behaviour.
Michalski, R. S., Carbonell, J. G., & Mitchell, T. M. (1983). Machine Learning: An
Artificial Intelligence Approach. Tioga Publishing Company.
NHTSA (2013). US Department of Transportation releases policy on automated vehicle
development. Technical report, National Highway Traffic Safety Administration.
World Health Organization (2015). Global status report on road safety 2015.
http://apps.who.int/iris/bitstream/10665/189242/1/9789241565066_eng.pdf?ua=1
(Accessed on 08/11/2016).
Park, J., Bae, B., Lee, J., & Kim, J. (2010). Design of failsafe architecture for unmanned
ground vehicle. In 2010 International Conference on Control, Automation and
Systems (ICCAS), 1101–1104.
Siegwart, R., Nourbakhsh, I. R., & Scaramuzza, D. (2011). Introduction to autonomous
mobile robots. MIT Press, 2nd Edition.
Simonyan, K. & Zisserman, A. (2014). Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556.
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014).
Dropout: a simple way to prevent neural networks from overfitting. Journal of
Machine Learning Research, 15(1), 1929–1958.
Stallkamp, J., Schlipsing, M., Salmen, J., & Igel, C. (2011). The German Traffic Sign
Recognition Benchmark: A multi-class classification competition. In IEEE
International Joint Conference on Neural Networks, 1453–1460.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D.,
Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
1–9.
Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P.,
Gale, J., Halpenny, M., & Hoffmann, G. (2006). Stanley: The robot that won the
DARPA Grand Challenge. Journal of Field Robotics, 23(9), 661–692.
Widrow, B. & Lehr, M. A. (1990). 30 years of adaptive neural networks: perceptron,
madaline, and backpropagation. Proceedings of the IEEE, 78(9), 1415–1442.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in
deep neural networks? In Advances in neural information processing systems, 3320–
3328.
Zeiler, M. D. & Fergus, R. (2014). Visualizing and understanding convolutional
networks. In European conference on computer vision, 818–833. Springer.
AUTHORS’ BIOGRAPHIES
Chapter 4
ABSTRACT
Support vector machines are popular approaches for creating classifiers in the
machine learning community. They have several advantages over other methods, like
neural networks, in areas such as training speed, convergence, and complexity control of the
classifier, as well as a more complete understanding of the underlying mathematical
foundations based on optimization and statistical learning theory. In this chapter we
explore the problem of model selection with support vector machines, where we try to
discover the parameter values that improve the generalization performance of the
algorithm. It is shown that genetic algorithms are effective in finding a good selection of
parameters for support vector machines. The proposed algorithm is tested on a dataset
representing individual models for electronic commerce.
INTRODUCTION
Support vector machines are popular approaches for developing classifiers that offer
several advantages over other methods like neural networks in terms of training speed,
convergence, and complexity control. In this chapter we show that genetic algorithms (GAs)
provide an effective approach to finding good parameters for support vector machines
(SVMs). We describe a possible implementation of a GA and compare several variations
of the basic GA in terms of convergence speed. In addition, it is shown that using a
convex sum of two kernels provides an effective modification of SVMs for classification
problems, and not only for regression, as was previously shown in Smits and Jordaan
(2002). The proposed algorithm is tested on a dataset that consists of information on 125
subjects from a study conducted by Ryan (1999) and previously used for comparing several
learning algorithms in Rabelo (2001); the dataset represents individual models for
electronic commerce.
LITERATURE SURVEY
Support vector machines as well as most other learning algorithms have several
parameters that affect their performance and that need to be selected in advance. For
SVMs, these parameters include the penalty value C , the kernel type, and the kernel
specific parameters. While for some kernels, like the Gaussian radial basis function
kernel, there is only one parameter to set ( ), more complicated kernels need an
increasing number of parameters. The usual way to find good values for these parameters
is to train different SVMs –each one with a different combination of parameter values–
and compare their performance on a test set or by using other generalization estimates
like leave one out or crossvalidation. Nevertheless, an exhaustive search of the parameter
space is time consuming and ineffective especially for more complicated kernels. For this
reason several researchers have proposed methods to find good set of parameters more
efficiently (see, for example, Cristianini and Shawe-Taylor et al. (1999), Chapelle et al.
(2002), Shao and Cherkassky (1999), and Ali and Smith (2003) for various approaches).
For many years now, genetic algorithms have been used together with neural
networks. Several approaches for integrating genetic algorithms and neural networks
Evolutionary Optimization of Support Vector Machines … 77
have been proposed: using GAs to find the weights (training), to determine the
architecture, for input feature selection, weight initialization, among other uses. A
thorough review can be found in Yao (1999). Recently, researchers have been looking
into the combination of support vector machines with genetic algorithms.
So far, however, few researchers have tried integrating SVMs with genetic algorithms.
There are basically two types of integrations of SVMs and GAs. The most common one consists of
using the GA to select a subset of the possible variables, reducing the dimensionality of
the input vector for the training set of the SVM, or selecting a subset of the input vectors
that are more likely to be support vectors (Sepúlveda-Sanchis et al., 2002; Zhang et al.,
2001; Xiangrong and Fang, 2002; Chen, 2003). A second type of integration found in the
literature is using a GA for finding the optimal parameters for the SVM (Quang et al,
2002; Xuefeng and Fang, 2002; Lessmann, 2006).
Here we propose and illustrate another approach that makes use of ten-fold
cross-validation, genetic algorithms, and support vector machines with a mixture of kernels
for pattern recognition. The experiments are done using a dataset that represents models
of individuals for electronic commerce applications.
convergence of the search process could become very slow.
Representation
For example, the 16-bit string 1001010101110101 can be partitioned into consecutive
segments that encode var1, var2, and var3. Depending on the problem, each of these
segments can be transformed into integers, decimals, and so on.
Usually the initial population is selected at random; every bit has an equal chance of
being a ‘0’ or a ‘1’. For each individual, a fitness value is assigned according to the
problem. The selection of the parents that will generate the new generation will depend
on this value.
Another popular representation in GAs is the floating point representation: each gene
in the individual represents a variable. This type of representation has been successfully
used in optimization problems (Michalewicz and Janikow, 1996; Goldberg, 1991). It is
important to note, though, that real-coded genetic algorithms require specialized
operators.
Selection
There are different ways to select the parents. In the fitness-proportionate selection
method, every individual is selected for crossover a number of times proportional to its
fitness. It is usually implemented with roulette-wheel sampling (also called the
Monte Carlo selection algorithm in Dumitrescu et al. (2000)): each solution occupies an
area of a circular roulette wheel that is proportional to the individual's fitness.
The roulette wheel is spun as many times as the size of the population. This method
of selection has several drawbacks. At the start of the algorithm, individuals with a
relatively large fitness will be selected many times, which can cause premature
convergence due to lack of diversity. Later in the run, when most individuals have
similar fitness values, every individual has roughly the same probability of being
selected. Also, the method is not compatible with negative fitness values, and it only
works with maximization problems.
80 Fred K. Gruber
In tournament selection, k individuals are selected at random from the population. In
the deterministic variant, the fittest of the k individuals is selected; in the
nondeterministic version, the fitter individual is selected with a certain probability.
Tournament selection is becoming a popular selection method because it does not have
the problems of fitness-proportionate selection and because it is adequate for parallel
implementations (Bäck et al., 2000).
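A minimal sketch of both tournament variants (our own naming; `p_fitter=1.0` recovers the deterministic version used later in this chapter):

```python
import random

def tournament_select(population, fitnesses, k=2, p_fitter=1.0):
    """k-tournament selection.

    Picks k individuals at random; walking from fittest to least fit,
    each contestant wins with probability p_fitter (so p_fitter=1.0 is
    the deterministic variant). Unlike fitness-proportionate selection,
    this works unchanged with negative fitness values.
    """
    contestants = random.sample(range(len(population)), k)
    contestants.sort(key=lambda i: fitnesses[i], reverse=True)
    for i in contestants:
        if random.random() < p_fitter:
            return population[i]
    return population[contestants[-1]]
```

Because only relative fitness order matters, the method is insensitive to the scale of the fitness function, which is one reason it avoids the premature-convergence problem noted above.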
Other selection methods include rank-based selection, Boltzmann selection, steady-
state selection, sigma scaling, and others. For a complete survey of the different selection
methods the reader can refer to Bäck et al. (2000) and Mitchell (1998).
Operators
There are two main types of operators (Bäck et al., 2000): unary operators, e.g.,
mutation, and higher-order operators, e.g., crossover. Crossover combines two or more
individuals to form one or more new individuals. The simplest crossover type is
one-point crossover, as shown in Figure 3:
Parent 1: 101100|01010001
Parent 2: 101010|10011111
(one-point crossover after position 6)
Child 1: 101100|10011111
Child 2: 101010|01010001
This operator has an important shortcoming, positional bias: the bits at the extremes
are always exchanged. This type of crossover is rarely used in practice (Bäck et al., 2000).
Two-point crossover is a variation of the previous operator, as illustrated in Figure 4:
the segment between two crossover points is exchanged.
Parent 1: 1011|00010|10001
Parent 2: 1010|10100|11111
(crossover points after positions 4 and 9)
Child 1: 1011|10100|10001
Child 2: 1010|00010|11111
There is no clear “best crossover” and the performance of the GA usually depends on
the problem and the other parameters as well.
Crossover is not limited to two parents, though. There are experimental results
indicating that multiparent crossover, e.g., six-parent diagonal crossover, has better
performance than one-point crossover (see Eiben, 2003, and references therein).
In the one-child version of the diagonal crossover, if there are n parents, there will
be n − 1 crossover points and one child (see Figure 6).
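A sketch of the one-child diagonal crossover (our own naming): the child walks diagonally across the parents, taking segment i from parent i.

```python
import random

def diagonal_crossover(parents):
    """One-child diagonal crossover: n parents, n - 1 cut points.

    Cut points are drawn without replacement, so every parent
    contributes a non-empty segment.
    """
    n = len(parents)
    length = len(parents[0])
    cuts = sorted(random.sample(range(1, length), n - 1))
    bounds = [0] + cuts + [length]
    return "".join(parents[i][bounds[i]:bounds[i + 1]] for i in range(n))
```

With four parents this is the operator that, in the experiments reported later, gave the best fitness at the 20th generation.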
In GAs, crossover is the main operator of variation, while mutation plays a reduced
role. The simplest type of mutation is flipping the bit at each gene position with a
predefined probability. Some studies have shown that varying the mutation rate can
significantly improve performance when compared with fixed mutation rates (see
Thierens, 2002).
There are three main approaches to varying the mutation rate (Thierens, 2002). The
first is dynamic parameter control, in which the mutation rate is a deterministic function
of the generation number, for example

p_t = (2 + ((n − 2)/(T − 1)) · t)^(−1),

where n is the length of the bit string and T is the total number of generations. The
second is adaptive parameter control, in which feedback from the search is used and the
mutation rate, and sometimes the crossover rate, is modified accordingly. One technique
that was found to produce good results in Vasconcelos et al. (2001) measures the "genetic
diversity" of the search as the ratio of the average fitness to the best fitness, or
gdm. A value of gdm close to 1 implies that all individuals have the same genetic code
(or the same fitness) and the search is converging. To avoid premature convergence, it is
then necessary to increase exploration (by increasing the mutation rate) and to reduce
exploitation (by reducing the crossover rate). On the contrary, if the gdm falls below a
lower limit, the crossover rate is increased and the mutation rate reduced.
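The feedback rule can be sketched as follows; the thresholds and step size are illustrative choices of ours, not values taken from Vasconcelos et al. (2001):

```python
def adapt_rates(avg_fitness, best_fitness, p_mut, p_cross,
                low=0.7, high=0.95, step=0.01):
    """Feedback control of mutation/crossover rates via genetic diversity.

    gdm = average fitness / best fitness. When gdm exceeds `high` the
    population is converging, so exploration is increased (more mutation,
    less crossover); when gdm drops below `low` the opposite is done.
    Thresholds `low`, `high` and `step` are illustrative assumptions.
    """
    gdm = avg_fitness / best_fitness
    if gdm > high:      # nearly converged: explore more
        p_mut, p_cross = p_mut + step, p_cross - step
    elif gdm < low:     # still diverse: exploit more
        p_mut, p_cross = p_mut - step, p_cross + step
    return p_mut, p_cross
```

In a real run the rates would also be clamped to sensible bounds (e.g., keeping the mutation rate strictly positive).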
In the self-adaptive methodology, several bits are added to each individual to
represent the mutation rate for that particular individual. This way the mutation rate
evolves with each individual. This technique is investigated in Bäck and Schütz (1996).
Another important variation is elitism, in which the best individual is copied to the
next generation without modifications. This way the best solution is never lost (see, for
example, Xiangrong and Fang, 2002).
The Dataset

All experiments use data from the study conducted by Ryan (1999) that contains
information on 125 subjects. A web site is used for this experiment, where 648 images
are shown sequentially to each subject. The response required from the individuals is
their preference for each image (1: Yes, 0: No). The images are characterized by seven
discrete properties or features, with specific levels:
Pointalization – Describes the size of the points that make the individual circles
(3 levels).
Saturation – Describes the strength of the color within the circles (3 levels).
Brightness – Describes the amount of light in the circles themselves (4 levels).
Blur – Describes the crispness of the circles (2 levels).
Background – Describes the background color of the image (3 levels).
Density – Describes the density of the circles (see Figures 7–9).
Cold vs. Warm – Describes the warmth of the colors in the image (see Figures 7–9).
[Figure labels: each example image in Figures 7–9 is annotated with its levels of
Density, Cold vs. Warm, Pointalized, Saturation, Light/Dark, Motion blur, and
background (BKG).]
As an illustration, typical images for different values of these features are shown in
Figure 7 to Figure 9.
The response of each individual is an independent dataset. Rabelo (2001) compares
the performance of several learning algorithms on this collection of images.
Implementation Details
The support vector machine is based on a modified version of LIBSVM (Chang and
Lin, 2011), while the genetic algorithm implementation was written from the ground up in
C++ and compiled in Visual C++ .NET. In the following, we describe more details about
the genetic algorithm implementation.
Representation
Each individual is represented as a binary string that encodes five variables (see
Figure 10):
The first 16 bits represent the cost or penalty value, C. It is scaled from 0.01 to
1000.
The next 16 bits represent the width of the Gaussian kernel, σ, scaled from
0.0001 to 1000.
The next 2 bits represent 4 possible values for the degree d: from 2 to 5.
The next 16 bits represent the parameter p, which controls the mixing percentage
of the polynomial and Gaussian kernels. It is scaled from 0 to 1.
Finally, the last parameter is the r value, which determines whether we use a
complete polynomial or not.
The binary code s that represents each variable is transformed to an integer m
according to the expression

m = Σ_{i=0}^{N−1} s_i 2^i,
where N is the number of bits. This integer value is then scaled to a real number x in the
interval [a, b] according to

x = a + m (b − a) / (2^N − 1),

so the resolution of the encoding is (b − a)/(2^N − 1).
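A minimal Python sketch of this decoding (the chapter's implementation is in C++; names are ours). The bit at index i is treated as s_i, i.e., least-significant bit first:

```python
def decode(bits, a, b):
    """Decode a binary gene (string of '0'/'1') into a real in [a, b].

    m = sum_i s_i * 2^i (least-significant bit first), then
    x = a + m * (b - a) / (2^N - 1), so all-zeros maps to a and
    all-ones maps to b.
    """
    N = len(bits)
    m = sum(int(s) * 2 ** i for i, s in enumerate(bits))
    return a + m * (b - a) / (2 ** N - 1)
```

For example, a 16-bit gene of all zeros decodes to the lower bound of the C range (0.01), and all ones to the upper bound (1000).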
The mixture kernel encoded by these parameters is

K(u, v) = (1 − p) (u · v + r)^d + p e^(−σ ||u − v||²).
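As a sketch (our own function signature, assuming the convex mixture of a complete polynomial kernel and a Gaussian RBF described above):

```python
import math

def mixed_kernel(u, v, p, d, r, sigma):
    """Convex mixture of polynomial and Gaussian RBF kernels:

        K(u, v) = (1 - p) * (u.v + r)^d + p * exp(-sigma * ||u - v||^2)

    p interpolates between a pure polynomial kernel (p = 0) and a
    pure RBF kernel (p = 1); r switches between a homogeneous
    (r = 0) and a complete (r > 0) polynomial.
    """
    dot = sum(ui * vi for ui, vi in zip(u, v))
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return (1 - p) * (dot + r) ** d + p * math.exp(-sigma * sq_dist)
```

Since a convex combination of positive semidefinite kernels is itself positive semidefinite, this mixture remains a valid SVM kernel for any p in [0, 1].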
Keerthi and Lin (2003) found that when a Gaussian RBF kernel is used for model
selection, there is no need to consider the linear kernel, since the RBF kernel behaves as
a linear kernel for certain values of the parameters C and σ.
Fitness Function
The objective function is probably the most important part of a genetic algorithm
since it is problem-dependent. We need a way to measure the performance or quality of
the different classifiers that are obtained for the different values of the parameters. As
indicated previously, several methods try to estimate the generalization error of a
classifier. Contrary to other applications of GAs, the objective function in this problem is
a random variable with an associated variance, and it is computationally expensive since it
involves training a learning algorithm. In order to decide which method to use, we
performed several experiments to find the estimator with the lowest variance.
The results are summarized in Table 2.
The hold-out technique had the highest standard deviation. Stratifying the method,
i.e., keeping the same ratio between classes in the training and testing sets, slightly
reduced the standard deviation. All cross-validation estimates had a significantly lower
standard deviation than the hold-out technique.
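The stratified splitting used by these estimates can be sketched as follows (our own helper, not the chapter's C++ code): the indices of each class are shuffled and dealt round-robin into k folds, so every fold preserves the class ratio.

```python
import random

def kfold_indices(labels, k=10, seed=None):
    """Stratified k-fold split: returns k lists of example indices,
    each approximately preserving the overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds
```

The fitness of an individual is then the average accuracy over the k train/test rotations of these folds.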
Table 2. Mean and standard deviation of different types of
generalization error estimates
Several crossover operators are tested: one-point, two-point, uniform, and multiparent
diagonal. For the dynamically adapted mutation rate, the schedule

p_t = (2 + ((n − 2)/(T − 1)) · t)^(−1)

is used, where n is the bit-string length and T is the total number of generations.
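A sketch of this deterministic schedule, assuming the form p_t = (2 + ((n − 2)/(T − 1)) t)^(−1), which decreases from 1/2 at the first generation to 1/n at the last:

```python
def dynamic_mutation_rate(t, n, T):
    """Deterministic mutation-rate schedule p_t = (2 + (n-2)/(T-1)*t)^-1.

    At t = 0 the rate is 1/2 (heavy exploration); at the final
    generation t = T-1 it has decayed to 1/n, i.e., one expected
    bit flip per individual of length n.
    """
    return 1.0 / (2.0 + (n - 2) / (T - 1) * t)
```

The endpoints (1/2 and 1/n) make the schedule easy to sanity-check for any string length n and horizon T.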
In addition, the simple mutation operator was also modified to experiment with other
techniques for varying the mutation rate: a self-adaptation method and a feedback
mechanism based on the genetic diversity.
The self-adaptation method consists of adding 16 bits to each individual in order to
obtain a probability p. From this value the mutation rate p′ is obtained according to the
following equation (Bäck and Schütz, 1996):

p′ = (1 + ((1 − p)/p) e^(−γ N(0,1)))^(−1),

where γ is the rate that controls the adaptation speed and N(0,1) is a random normal
number with mean 0 and standard deviation 1. The normal random variable is generated
according to the Box-Muller method (see, for example, Law and Kelton, 2000, p. 465).
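This update is a one-liner in Python (illustrative; `random.gauss` supplies the N(0,1) draw in place of an explicit Box-Muller implementation):

```python
import math
import random

def self_adapt(p, gamma=0.2):
    """Self-adaptive mutation-rate update of Bäck and Schütz (1996):

        p' = (1 + ((1 - p)/p) * exp(-gamma * N(0,1)))^-1

    The logistic form keeps p' strictly inside (0, 1), and gamma
    controls how fast the rate can drift between generations.
    """
    noise = random.gauss(0.0, 1.0)
    return 1.0 / (1.0 + (1.0 - p) / p * math.exp(-gamma * noise))
```

Note that with gamma = 0 the update is the identity (p′ = p), which is a convenient check of the formula.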
The feedback mechanism is based on calculating the genetic diversity of the
population as the ratio between the average and the best fitness
(gdm = AvgFitness / BestFitness). If the genetic diversity rises above a particular level
(the search is converging), the mutation rate is increased and the crossover rate is
reduced. The contrary happens if the genetic diversity falls below a given value.
For the selection operator, we only considered deterministic k-tournament selection.
To select the operators with the best performance (e.g., faster convergence of the
GA) from the different possibilities, we repeated the runs 30 times with different random
initial solutions. With each replication, we obtain an independent estimate for the best
generalization ability at each generation.
At the start of each replication, the dataset is randomly split into the ten subsets
required by 10-fold cross-validation. Using the same split during the whole run allows
us to study the effect of the different variations without being affected by randomness,
i.e., one particular model will always have the same performance throughout the run of
the genetic algorithm. At the same time, since we are doing 30 replications (each with a
different random split), we can get a good idea of the average performance as a function
of the generation for each of the variations of the genetic algorithm. Figure 11
summarizes this process in an activity diagram.
Table 3 lists the different combinations of GA parameters that were tested. It
was assumed that the performance of each parameter is independent of the others;
therefore, not every combination of parameter values was tested.
Table 3. Parameters of the genetic algorithm used for testing the different variations
Parameter Value
Population 10
Generations 20
Prob. of crossover 0.95
Prob. of mutation 0.05
Fitness function 10-fold cross-validation
Selection 2-Tournament selection
Crossover types One point, two point, uniform, diagonal with 4 parents
Mutation type Fixed rate, dynamic rate, self-adaptive rate, feedback
Other Elitism, no elitism
After repeating the experiment 30 times, we calculated the average for each
generation. A subset of 215 points is used for the experiments. This subset was obtained
in a stratified manner (the proportion of individuals of class 1 to class -1 was kept equal
to that of the original dataset) from individual number 2. The reduction of the number of
points is done to reduce the processing time.
In most cases, we are interested in comparing the performance measures at the 20th
generation of the genetic algorithms using different parameters. This comparison is made
using several statistical tests, such as the two-sample t-test and the best-of-k-systems
procedure (Law and Kelton, 2000).
Figure 12 shows the effect of elitism when the genetic algorithm uses a one-point
crossover with crossover rate of 0.95 and simple mutation with mutation rate of 0.05.
Figure 12. Effect of elitism in the best fitness per generation.
Figure 13. Standard deviation of the average best fitness for elitism vs. not elitism.
We use simple elitism, i.e., the best parent is passed unmodified to the next
generation. As shown in Figure 12, by not using elitism there is a risk of losing good
individuals, which may also increase the number of generations needed to find a good
solution.
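The elitist generation step can be sketched as follows (our own hook names; `make_offspring` stands in for the selection-plus-crossover-plus-mutation pipeline described above):

```python
def next_generation(population, fitnesses, make_offspring, elitist=True):
    """Build a new generation, optionally copying the single best
    parent through unchanged (simple elitism) so the best solution
    found so far is never lost."""
    new_pop = []
    if elitist:
        best = max(range(len(population)), key=lambda i: fitnesses[i])
        new_pop.append(population[best])
    while len(new_pop) < len(population):
        new_pop.append(make_offspring(population, fitnesses))
    return new_pop
```

With elitism the best fitness per generation is non-decreasing (when the fitness function is deterministic), which is exactly the behavior visible in Figure 12.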
A two-sample t-test shows that, at generation 20, the average best fitness of the
elitist GA is significantly higher at the 0.1 level, with a p-value of 0.054 and a lower
limit for the 90% confidence interval of 0.542557.
Figure 13 shows the standard deviation of the two GAs as a function of the
generation, which illustrates another advantage of using the elitist strategy: as the
generation increases, the standard deviation decreases. The standard deviation of the GA
with the elitist strategy is significantly lower at the 20th generation at the 0.1 level in the
F test for two variances and the Bonferroni confidence interval.
We tested four crossover types: one-point, two-point, uniform, and 4-parent
diagonal. The comparison is shown in Figure 14 and Figure 15.
Figure 14. Effect of the different crossover type on the fitness function.
Figure 15. Effect of the different crossover type on the standard deviation.
The 4-parent diagonal crossover has the highest fitness function at the 20th
generation; however, it has a higher standard deviation than the two-point crossover (see
Figure 15 and Table 4). In order to make a decision we use a technique found in Law and
Kelton (2000) for finding the best of k systems. With this methodology, we selected the
diagonal crossover as the best for this particular problem.
Four ways to set the mutation rate are tested: fixed mutation rate, dynamically
adapted, self-adaptation, and feedback. The other parameters are kept constant: diagonal
crossover with 4 parents, crossover rate of 0.95 and tournament selection. For the fixed
mutation rate, the probability of mutation is set to 0.05. The behavior of the average best
fitness as a function of the generation is shown in Figure 16. Figure 17 shows the
behavior of the standard deviation.
Again, we use the best-of-k-systems methodology to select, among the different
techniques, the one with the best performance at the 20th generation. The selected
method is the fixed mutation rate. The assumption of normality is tested with the
Anderson-Darling test.
Based on the results of the previous experiments, we selected the parameters shown
in Table 5.
Table 5. Parameters of the final genetic algorithm

Parameter Value
Population 10
Generations 20
Prob. of crossover 0.95
Prob. of mutation 0.05
Fitness function 10-fold cross-validation
Selection 2-Tournament selection
Crossover types Diagonal with 4 parents
Mutation type Fixed rate
Others Elitist strategy
The activity diagram of the final genetic algorithm is shown in Figure 18. The most
important difference between this final model and the one used in the previous section is
related to the random split of the data. Instead of using only one split of the data for the
complete run of the GA, every time the fitness of the population is calculated, we use a
different random split (see Figure 19).
As a result, all individuals at a particular generation are measured under the same
conditions. Using only one random split throughout the whole run of the GA carries the
danger that the generalization error estimate for one particular model may be higher than
for other models because of the particular random selection and not because it was really
better in general. Using a different random split before calculating the fitness of every
individual carries the same danger: an apparent difference in performance may be due to
the particular random order and not due to the different value of the parameters.
While repeating the estimate several times and getting an average would probably
improve the estimate, the increase in computational requirements makes this approach
prohibitive. For example, if we have 10 individuals and we use 10-fold cross-validation,
we have to do 100 trainings per generation. If, in addition, we repeat every estimate
10 times to get an average, we have to do 1000 trainings. Clearly, for real-world
problems this is not a good solution.
Using the same random split within each generation has an interesting analogy with
natural evolution: in nature, the environment (represented by the fitness function in
GAs) is likely to vary with time; however, at any particular time all individuals compete
under the same conditions.
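The final outer loop can be sketched as follows; the four callables are hypothetical hooks of ours, standing in for the 10-fold cross-validation fitness, the random splitter, and the reproduction step:

```python
def run_ga(population, fitness_on_split, make_split, reproduce,
           generations=20):
    """Outer GA loop with one fresh data split per generation.

    All individuals within a generation are evaluated on the same
    split (a fair comparison), while the split changes between
    generations (so no single lucky split dominates the whole run).
    """
    for _ in range(generations):
        split = make_split()
        fitnesses = [fitness_on_split(ind, split) for ind in population]
        population = reproduce(population, fitnesses)
    return population
```

This is the structure summarized in the activity diagram of Figure 18: split, evaluate all, reproduce, repeat.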
Other Implementations
Several Python and R implementations of GAs are available and we list a few of
them here.
In Python, the package DEAP: Distributed Evolutionary Algorithms in Python
(Fortin et al., 2012) provides an extensive toolbox of genetic algorithm libraries that
allows rapid prototyping and testing of most of the ideas presented here. It also supports
parallelization and other evolutionary techniques like genetic programming and evolution
strategies.
Pyevolve is another Python package for genetic algorithms that implements many
of the representations and operators of classical genetic algorithms.
In R, the GA package (Scrucca, 2013) provides a general implementation of genetic
algorithms able to handle both discrete and continuous cases as well as constrained
optimization problems. It is also possible to create hybrid genetic algorithms that
incorporate efficient local search, as well as parallelization either on a single machine
with multiple cores or across multiple machines.
There are also more specialized genetic algorithm implementations in R for very
specific applications. The "caret" package (Kuhn, 2008) provides a genetic algorithm
tailored towards supervised feature selection. The R package "gaucho" (Murison and
Wardell, 2014) uses a GA for analysing tumor heterogeneity from sequencing data, and
"galgo" (Trevino and Falciani, 2006) uses GAs for variable selection in very large
datasets, such as genomic datasets.
RESULTS
The model created by the genetic algorithms had the parameters shown in Table 6.
Table 6. Parameters of the models created by the genetic algorithm

Dataset C σ Degree p r
Ind7 451.637 959.289 2 0.682536 1
Ind10 214.603 677.992 2 0.00968948 1
Ind100 479.011 456.25 2 0.428016 1
Interestingly, for 2 datasets (ind7 and ind100) the chosen kernel was a mixture of
Gaussian and polynomial kernel.
For the conventional method, the kernel is arbitrarily set to Gaussian and the penalty
value C is set to 50, while the kernel width σ is varied over 0.1, 0.5, 1, 10, and 50. The
average generalization error after the 50 replications for 3 individuals from the case study
is shown in Table 7 and Table 8, and Tufte's boxplots (Tufte, 1983) are shown in
Figure 20 to Figure 22, where we compare the percentage of misclassification.
The results of a paired t-test of the difference between the performance of the best
model using the conventional method and the model constructed by the genetic algorithm
show that the difference in performance is statistically significant at the 95% level.
These experiments show that genetic algorithms are an effective way to find a
good set of parameters for support vector machines. This method will become
particularly important as more complex kernels with more parameters are designed.
Additional experiments including a comparison with neural networks can be found in
Gruber (2004).
Figure 20. Average performance of the different models for dataset Ind7.
Figure 21. Average performance of the different models for dataset Ind10.
Figure 22. Average performance of the different models for dataset Ind100.
CONCLUSION
In this chapter, we explored the use of genetic algorithms to optimize the parameters
of a SVM and proposed a specific variation that we found to perform better. The
proposed algorithm uses 10-fold cross-validation as its fitness function. Several types of
crossover and mutation for the genetic algorithm were implemented and compared and it
was found that a diagonal crossover with 4 parents and a fixed mutation rate provided the
best performance.
The SVM engine is based on a C++ version of LIBSVM (Chang and Lin, 2011). This
implementation was modified to include a kernel that is a mixture of Gaussian and
polynomial kernels. Thus, the genetic algorithm has the flexibility to decide how much
weight to assign to each kernel or to remove one altogether.
The results from experiments using a dataset representing individual models for
electronic commerce (Ryan, 1999) show that GAs are able to find a good set of
parameters that in many cases lead to improved performance over using an SVM with
fixed parameters.
While the value of using GAs for finding optimal parameters might not seem so
obvious for SVMs with simple kernels, like a Gaussian RBF with only one parameter to
set, this need will become apparent as applications continue to appear and new, more
complicated kernels (likely with more parameters) are designed for specific problems.
As an illustration of this, we created a new kernel that is a mixture of RBF and
complete polynomial kernels. This kernel had previously been tested in regression
problems by other researchers; here we found that it also gives good results for
classification problems.
It was also shown that 10-fold cross-validation is a good estimator of the
generalization performance of support vector machines, and it allowed us to guide the
genetic algorithm to good values for the parameters of the SVM. In addition, we explored
the possibility of using an efficient bound on the leave-one-out error, but we found it to
be biased for large values of the parameter C.
Finally, we should state that this improvement in performance comes at the price
of increased processing time. This downside can be minimized by finding more
efficient and unbiased estimates of the performance of SVMs.
REFERENCES
Ali, S. & Smith, K. (2003, October). Automatic parameter selection for polynomial
kernel. In Information Reuse and Integration, 2003. IRI 2003. IEEE International
Conference on (pp. 243-249). IEEE.
Bäck, T., & Schütz, M. (1996). Intelligent mutation rate control in canonical genetic
algorithms. Foundations of Intelligent Systems, 158-167.
Bäck, T., Fogel, D., & Michalewicz, Z. (Eds.). (2000). Evolutionary computation 1:
Basic algorithms and operators (Vol. 1). CRC press.
Bazaraa, M., Sherali, H., & Shetty, C. (2013). Nonlinear programming: theory and
algorithms. John Wiley & Sons.
Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data
mining and knowledge discovery, 2(2), 121-167.
Burman, P. (1989). A comparative study of ordinary cross-validation, v-fold cross-
validation and the repeated learning-testing methods. Biometrika, 503-514.
Chapelle, O., Vapnik, V., Bousquet, O., & Mukherjee, S. (2002). Choosing multiple
parameters for support vector machines. Machine learning, 46(1), 131-159.
Chang, C., & Lin, C. (2011). LIBSVM: a library for support vector machines. ACM
Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.
Chen, X. (2003, August). Gene selection for cancer classification using bootstrapped
genetic algorithms and support vector machines. In Bioinformatics Conference, 2003.
CSB 2003. Proceedings of the 2003 IEEE (pp. 504-505). IEEE.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines.
Cambridge University Press.
Demuth, H., Beale, M., & Hagan, M. (2008). Neural network toolbox™ 6. User’s guide,
37-55.
Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised
classification learning algorithms. Neural computation, 10(7), 1895-1923.
Duan, K., Keerthi, S. S., & Poo, A. N. (2003). Evaluation of simple performance
measures for tuning SVM hyperparameters. Neurocomputing, 51, 41-59.
Dumitrescu, D., Lazzerini, B., Jain, L. C., & Dumitrescu, A. (2000). Evolutionary
computation. CRC press.
Eiben, A. E. (2003). Multiparent recombination in evolutionary computing. Advances in
evolutionary computing, 175-192.
Fishwick, P. A., & Modjeski, R. B. (Eds.). (2012). Knowledge-based simulation:
methodology and application (Vol. 4). Springer Science & Business Media.
Frie, T. T., Cristianini, N., & Campbell, C. (1998, July). The kernel-adatron algorithm: a
fast and simple learning procedure for support vector machines. In Machine
Learning: Proceedings of the Fifteenth International Conference (ICML'98)
(pp. 188-196).
Frohlich, H., Chapelle, O., & Scholkopf, B. (2003, November). Feature selection for
support vector machines by means of genetic algorithm. In Tools with Artificial
Intelligence, 2003. Proceedings. 15th IEEE International Conference on (pp. 142-
148). IEEE.
Quang, A., Zhang, Q., & Li, X. (2002). Evolving support vector machine parameters. In
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International
Conference on (Vol. 1, pp. 548-551). IEEE.
Rabelo, L. (2001). What intelligent agent is smarter?: A comparison (MS Thesis,
Massachusetts Institute of Technology).
Rothenberg, J. (1991, December). Tutorial: artificial intelligence and simulation. In
Proceedings of the 23rd conference on Winter simulation (pp. 218-222). IEEE
Computer Society.
Ryan, K. (1999). Success measures of accelerated learning agents for e-commerce
(Doctoral dissertation, Massachusetts Institute of Technology).
Schölkopf, B. & Smola, A. (2002). Learning with kernels: support vector machines,
regularization, optimization, and beyond. MIT press.
Scrucca, L. (2013). GA: a package for genetic algorithms in R. Journal of Statistical
Software, 53(4), 1-37.
Sepulveda-Sanchis, J., Camps-Valls, G., Soria-Olivas, E., Salcedo-Sanz, S., Bousono-
Calzon, C., Sanz-Romero, G., & de la Iglesia, J. M. (2002, September). Support
vector machines and genetic algorithms for detecting unstable angina. In Computers
in Cardiology, 2002 (pp. 413-416). IEEE.
Shao, X., & Cherkassky, V. (1999, July). Multi-resolution support vector machine. In
Neural Networks, 1999. IJCNN'99. International Joint Conference on (Vol. 2, pp.
1065-1070). IEEE.
Shawe-Taylor, J. & Campbell, C. (1998). Dynamically adapting kernels in support vector
machines. NIPS-98 or NeuroCOLT2 Technical Report Series NC2-TR-1998-017,
Dept. of Engineering Mathematics, Univ. of Bristol, UK.
Smits, G. & Jordaan, E. (2002). Improved SVM regression using mixtures of kernels. In
Neural Networks, 2002. IJCNN'02. Proceedings of the 2002 International Joint
Conference on (Vol. 3, pp. 2785-2790). IEEE.
Thierens, D. (2002, May). Adaptive mutation rate control schemes in genetic algorithms.
In Evolutionary Computation, 2002. CEC'02. Proceedings of the 2002 Congress on
(Vol. 1, pp. 980-985). IEEE.
Trevino, V., & Falciani, F. (2006). GALGO: an R package for multivariate variable
selection using genetic algorithms. Bioinformatics, 22(9), 1154-1156.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT:
Graphics Press.
Vapnik, V. (2013). The nature of statistical learning theory. Springer science & business
media.
Vasconcelos, J. A., Ramirez, J. A., Takahashi, R. H. C., & Saldanha, R. R. (2001).
Improvements in genetic algorithms. IEEE Transactions on magnetics, 37(5), 3414-
3417.
AUTHOR BIOGRAPHY
Chapter 5
ABSTRACT
Good feature extraction methods are key in many pattern classification problems
since the quality of pattern representations affects classification performance.
Unfortunately, feature extraction is mostly problem dependent, with different descriptors
typically working well with some problems but not with others. In this work, we propose
a generalized framework that utilizes matrix representation for extracting features from
patterns that can be effectively applied to very different classification problems. The idea
is to adopt a two-dimensional representation of patterns by reshaping vectors into
matrices so that powerful texture descriptors can be extracted. Since texture analysis is
one of the most fundamental tasks used in computer vision, a number of high performing
methods have been developed that have proven highly capable of extracting important
information about the structural arrangement of pixels in an image (that is, in their
relationships to each other and their environment). In this work, first, we propose some
novel techniques for representing patterns in matrix form. Second, we extract a wide
variety of texture descriptors from these matrices. Finally, the proposed approach is
* Corresponding Author Email: loris.nanni@unibo.it.
106 Loris Nanni, Sheryl Brahnam and Alessandra Lumini
tested for generalizability across several well-known benchmark datasets that reflect a
diversity of classification problems. Our experiments show that when different
approaches for transforming a vector into a matrix are combined with several texture
descriptors the resulting system works well on many different problems without requiring
any ad-hoc optimization. Moreover, because texture-based and standard vector-based
descriptors preserve different aspects of the information available in patterns, our
experiments demonstrate that the combination of the two improves overall classification
performance. The MATLAB code for our proposed system will be publicly available to
other researchers for future comparisons.
INTRODUCTION
Most machine pattern recognition problems require the transformation of raw sensor
data so that relevant features can be extracted for input into one or more classifiers. A
common first step in machine vision, for instance, is to reshape the sensor matrix by
concatenating its elements into a one-dimensional vector so that various feature
transforms, such as principal component analysis (PCA) (Beymer & Poggio, 1996), can
be applied that sidestep the curse of dimensionality by reducing the number of features
without eliminating too much vital information. Reshaping the data matrix into a vector,
however, is neither the only nor necessarily the best approach for representing raw input
values [16]. One problem with vectorizing a data matrix is that it destroys some of the
original structural knowledge (D. Li, Zhu, Wang, Chong, & Gao, 2016; H. Wang &
Ahuja, 2005).
In contrast to vectorization, direct manipulation of matrices offers a number of
advantages, including an improvement in the performance of canonical transforms when
applied to matrices, a significant reduction in computational complexity (Loris Nanni,
Brahnam, & Lumini, 2012; Z. Wang, Chen, Liu, & Zhang, 2008), and enhanced
discrimination using classifiers developed specifically to handle two-dimensional data
(see, for example, (Z. Wang & Chen, 2008) and (Z. Wang et al., 2008)). Moreover, some
of the most powerful state-of-the-art two-dimensional feature extraction methods, such as
Gabor filters (Eustice, Pizarro, Singh, & Howland, 2002) and Local binary patterns
(LBP) (L. Nanni & Lumini, 2008; Ojala, Pietikainen, & Maenpaa, 2002), and their
variants, extract descriptors directly from matrices. Other methods, such as Two-
Dimensional Principal Component Analysis (2DPCA) (Yang, Zhang, Frangi, & Yang,
2004) and Two-Dimensional Linear Discriminant Analysis (2DLDA) (J. Li, Janardan, &
Li, 2002), allow classic transforms, such as PCA and Linear Discriminant Analysis
(LDA) (Zhang, Jing, & Yang, 2006), to work directly on matrix data. By projecting
matrix patterns via matrices, both 2DPCA and 2DLDA avoid the singular scatter matrix
problem. Classifier systems that are designed to handle two-dimensional data include
Texture Descriptors for The Generic Pattern Classification Problem 107
Min-Sum matrix Products (MSP) (Felzenszwalb & McAuley, 2011), which has been
shown to efficiently solve the Maximum-A-Posteriori (MAP) inference problem,
Nonnegative Matrix Factorization (NMF) (Seung & Lee, 2001), which has become a
popular choice for solving general pattern recognition problems, and the Matrix-pattern-
oriented Modified Ho-Kashyap classifier (MatMHKS) (S. Chen, Wang, & Tian, 2007),
which significantly decreases memory requirements. MatMHKS has recently been
expanded to UMatMHKS (D. Li et al., 2016), so named because it combines
matrix learning with Universum learning (Weston, Collobert, Sinz, Bottou, & Vapnik,
2006), a combination that was shown in that study to improve the generalization
performance of classifiers.
In the last ten years, many studies focused on generic classification problems have
investigated the discriminative gains offered by matrix feature extraction methods (see,
for instance, (S. C. Chen, Zhu, Zhang, & Yang, 2005; Liu & Chen, 2006; Z. Wang &
Chen, 2008; Z. Wang et al., 2008)). Relevant to the work presented here is the
development of novel methods that take vectors and reshape them into matrices so that
state-of-the-art two-dimensional feature extraction methods can be applied. Some studies
along these lines include the reshaping methods investigated in (Z. Wang & Chen, 2008)
and (Z. Wang et al., 2008) that were found capable of diversifying the design of
classifiers, a diversification that was then exploited by a technique based on AdaBoost. In
(Kim & Choi, 2007) a composite feature matrix representation, derived from discriminant
analysis, was proposed. A composite feature takes a number of primitive features and
corresponds them to an input variable. In (Loris Nanni, 2011) Local Ternary Patterns
(LTP), a variant of LBP, were extracted from vectors rearranged into fifty matrices by
random assignment; an SVM was then trained on each of these matrices, and the results
were combined using the mean rule. This method led the authors in (Loris Nanni, 2011)
to observe that both one-dimensional vector descriptors and two-dimensional texture
descriptors can be combined to improve classifier performance; moreover, it was shown
that linear SVMs consistently perform well with texture descriptors.
In this work, we propose a new classification system, composed of an ensemble of
Support Vector Machines (SVMs). The ensemble is built by training each SVM with a
different set of features. Three novel approaches for representing a feature vector as an
image are proposed; texture descriptors are then extracted from the images and used to
train an SVM. To validate this idea, several experiments are carried out on several
datasets.
Proposed Approach
machine learning (Loris Nanni et al., 2012). In (Z. Wang & Chen, 2008; Z. Wang et al.,
2008) classifiers were developed for handling two-dimensional patterns, and in (Loris
Nanni et al., 2012) it was shown that a continuous wavelet can be used to transform a
vector into a matrix; once in matrix form, it can then be described using standard texture
descriptors (the best performance obtained in (Loris Nanni et al., 2012) used a variant of
the local phase quantization based on a ternary coding).
The advantage of extracting features from a vector that has been reshaped into a
matrix is the ability to investigate the correlation among sets of features in a given
neighborhood; this is different from coupling feature selection and classification. To
maximize performance, it was important that we test several different texture descriptors
and different neighborhood sizes. The resulting feature vectors were then fed into an
SVM.
The following five methods for reshaping a linear feature vector into a matrix were
tested in this paper. Letting q ∈ R^s be the input vector, M ∈ R^(d1×d2) the output matrix
(where d1 and d2 depend on the method), and a a random permutation of the
indices [1..s], the five methods are:
1. Triplet (Tr): in this approach d1 = d2 = 255. First, the original feature vector q is
normalized to [0, 255] and stored in n. Second, the output matrix M ∈ R^(255×255) is
initialized to 0. Third, a randomization procedure draws a random permutation a_j for
j = 1..100000 and updates M according to the following formula:
M(n(a_j(1)), n(a_j(2))) = M(n(a_j(1)), n(a_j(2))) + q(a_j(3));
2. Continuous wavelet (CW) (Loris Nanni et al., 2012): in this approach d1 = 100 and
d2 = s. This method applies the Meyer continuous wavelet to the s-dimensional feature vector
q and builds M by extracting the wavelet power spectrum, considering the 100 different
decomposition scales;
3. Random reshaping (RS): in this approach d1 = d2 = s^0.5 and M is a random
rearrangement of the original vector into a square matrix. Each entry of matrix M is an
element of q(a);
4. DCT: in this approach the resulting matrix M has dimensions d1 = d2 = s and
each entry M(i, j) = dct(q(a_ij(2..6))), where dct() is the discrete cosine transform, a_ij is a
random permutation (different for each entry of the matrix), and the indices 2..6 are used
to indicate that the number of considered features varies between two and six. We use
DCT in this method because it is considered the de-facto image transformation in most
visual systems. Like other transforms, the DCT attempts to decorrelate the input data.
The 1-dimensional DCT is obtained by the product of the input vector and the orthogonal
matrix whose rows are the DCT basis vectors (the DCT basis vectors are orthogonal and
normalized). The first transform coefficient (referred to as the DC Coefficient) is the
average value of the input vector, while the others are called the AC Coefficients. After
several tests we obtained the best performance using the first DCT coefficient;
Texture Descriptors for The Generic Pattern Classification Problem 109
5. FFT: the same procedure as DCT but, instead of using a discrete cosine
transform, the Fast Fourier transform is used. Similar to DCT, the FFT decomposes a
finite-length vector into a sum of scaled-and-shifted basis functions. The difference is the
type of basis function used by each transform: while the DCT uses only (real-valued)
cosine functions, the DFT uses a set of harmonically-related complex exponential
functions. After several tests, we obtained the best performance using the first FFT
coefficient (i.e., the sum of values of the vector).
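As an illustration, the RS reshaping and the first-coefficient variant of the FFT method can be sketched in a few lines of Python (the chapter's own implementation is in MATLAB; the function names below are hypothetical, and the first Fourier coefficient of a feature subset is computed directly as the plain sum of its values, as noted above):

```python
import math
import random

def reshape_rs(q, rng):
    """Random reshaping (RS): rearrange vector q into a d x d square
    matrix, d = floor(sqrt(len(q))), using a random permutation a."""
    d = int(math.isqrt(len(q)))
    a = list(range(len(q)))
    rng.shuffle(a)
    return [[q[a[i * d + j]] for j in range(d)] for i in range(d)]

def reshape_fft_first(q, rng):
    """FFT method, first coefficient only: for each entry (i, j), draw a
    random subset of 2..6 features and store their plain sum (the first
    Fourier coefficient of the selected values)."""
    s = len(q)
    M = [[0.0] * s for _ in range(s)]
    for i in range(s):
        for j in range(s):
            k = rng.randint(2, 6)             # subset size varies in 2..6
            idx = rng.sample(range(s), k)     # a_ij: random index subset
            M[i][j] = sum(q[t] for t in idx)  # first FFT coefficient = sum
    return M

rng = random.Random(0)
q = [0.1, 0.4, 0.3, 0.9, 0.5, 0.2, 0.8, 0.7, 0.6]  # s = 9 -> 3x3 for RS
M_rs = reshape_rs(q, rng)
M_fft = reshape_fft_first(q, rng)
print(len(M_rs), len(M_rs[0]))
print(len(M_fft), len(M_fft[0]))
```

Triplet, CW, and DCT follow the same pattern; only the per-entry transform changes.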
The wavelet transform is a technique for transferring data from one domain to another
where hidden information can be extracted. Wavelets have the nice feature of local
description and separation of signal characteristics, and they provide a tool for the
simultaneous analysis of both
time and frequency. A wavelet is a set of orthonormal basis functions generated
from dilation and translation of a single scaling function or father wavelet (φ) and
a mother wavelet (ψ). In this work we use the Haar wavelet family, which is a
sequence of rescaled "square-shaped" functions that together form a wavelet
basis: the extracted descriptor is obtained as the average energy of the horizontal,
vertical or diagonal detail coefficients calculated up to the tenth level
decomposition.
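A minimal sketch of such a wavelet-energy descriptor, assuming unnormalized 2-D Haar averaging/differencing filters and early stopping once the matrix can no longer be halved (function names are illustrative, not the authors' code):

```python
def haar2d_step(M):
    """One level of a 2-D Haar transform on an even-sized matrix: returns
    the approximation LL and the horizontal / vertical / diagonal detail
    sub-bands, via 2x2 block averaging and differencing."""
    r, c = len(M) // 2, len(M[0]) // 2
    LL = [[0.0] * c for _ in range(r)]; LH = [[0.0] * c for _ in range(r)]
    HL = [[0.0] * c for _ in range(r)]; HH = [[0.0] * c for _ in range(r)]
    for i in range(r):
        for j in range(c):
            a, b = M[2 * i][2 * j], M[2 * i][2 * j + 1]
            e, d = M[2 * i + 1][2 * j], M[2 * i + 1][2 * j + 1]
            LL[i][j] = (a + b + e + d) / 4.0   # approximation
            LH[i][j] = (a + b - e - d) / 4.0   # horizontal detail
            HL[i][j] = (a - b + e - d) / 4.0   # vertical detail
            HH[i][j] = (a - b - e + d) / 4.0   # diagonal detail
    return LL, LH, HL, HH

def wave_descriptor(M, max_level=10):
    """Average energy of each detail band at each decomposition level,
    stopping early when the matrix can no longer be halved."""
    feats = []
    for _ in range(max_level):
        if len(M) < 2 or len(M[0]) < 2:
            break
        M, LH, HL, HH = haar2d_step(M)
        for band in (LH, HL, HH):
            n = len(band) * len(band[0])
            feats.append(sum(v * v for row in band for v in row) / n)
    return feats

M = [[float((i * 5 + j * 3) % 7) for j in range(8)] for i in range(8)]
wave = wave_descriptor(M)
print(len(wave))  # 3 detail bands x 3 levels for an 8x8 input
```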
According to several studies in the literature, a good solution for improving the
performance of an ensemble approach is pattern perturbation. To improve the
performance, an ensemble is obtained using 50 reshapes of each pattern: for each reshape
the original features of the pattern are randomly sorted. In this way 50 SVMs are trained
for each approach, and these SVMs are combined by the sum rule. In the next section only
the performance of the ensemble of SVMs is reported, since in (Loris Nanni et al.,
2012) it is shown that such an ensemble improves on the stand-alone version.
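The perturbation and fusion steps can be sketched as follows (illustrative Python; the 50 trained SVMs are abstracted away, and only the random sorting of features and the sum-rule combination of classifier scores are shown):

```python
import random

def sum_rule(score_lists):
    """Sum-rule fusion: element-wise sum of the scores produced by the
    individual classifiers for the same test patterns."""
    return [sum(s) for s in zip(*score_lists)]

def perturbed_patterns(x, n_reshapes=50, seed=0):
    """Pattern perturbation: n_reshapes random sorts of the original
    features, one per ensemble member (each sorted copy would feed its
    own reshaping step and SVM)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_reshapes):
        perm = list(range(len(x)))
        rng.shuffle(perm)
        out.append([x[i] for i in perm])
    return out

x = [0.2, 0.7, 0.1, 0.9, 0.4]
views = perturbed_patterns(x)          # 50 randomly sorted copies of x
# Hypothetical scores from three classifiers on two test patterns:
fused = sum_rule([[0.2, 0.8], [0.4, 0.6], [0.3, 0.9]])
print(len(views), fused)
```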
Experimental Results
To assess their versatility, the methods described above for reshaping a vector into a
matrix were challenged with several datasets (see Table 1). All the tested data mining
datasets are extracted from the well-known UCI datasets repository (Lichman, 2013),
except for the Tornado dataset (Trafalis, Ince, & Richman, 2003). Moreover, two
additional datasets related to the image classification problem are included.
A summary description of the tested datasets, including the number of patterns and
the dimension of the original feature vector, is reported in Table 1. All the considered
datasets are two class classification problems.
The testing protocol used in the experiments is the 5-fold CV method, except for the
Tornado dataset since it is already divided into separate training and testing sets. All
features in these datasets were linearly normalized between 0 and 1, using only the
training data to estimate the normalization parameters; this was performed before
feeding the features into an SVM. The performance indicator used is the area under the ROC
curve (AUC).
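These two evaluation details, normalization fitted on the training fold only and the AUC indicator, can be sketched as follows (helper names are hypothetical):

```python
def fit_minmax(train):
    """Column-wise min and max computed on the training fold only."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_minmax(X, lo, hi):
    """Linearly map each feature to [0, 1] using the training statistics;
    a constant column maps to 0 (test values may fall outside [0, 1])."""
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in X]

def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
test = [[2.0, 40.0]]
lo, hi = fit_minmax(train)
norm_test = apply_minmax(test, lo, hi)
print(norm_test)
print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # perfect separation: 1.0
```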
In the following experiments, we optimized the SVM for each dataset, testing both linear
and radial basis function kernels.
The first experiment is aimed at evaluating the five methods for reshaping a linear
feature vector into a matrix as described in section 2. In Table 2, we report the
performance of each reshaping approach coupled with each matrix descriptor, as detailed
in section 2.
Examining the results in Table 2, it is clear that Tr performs rather poorly;
moreover, RS, coupled with LPQ and CLBP, exhibits numerical problems in those datasets
where few features are available (thereby resulting in small matrices). The best reshaping
method is FFT, and the best tested descriptor is HOG.
The second experiment is aimed at evaluating the fusion among different reshaping
methods and different descriptors for proposing an ensemble that works well across all
tested datasets. The first four columns of Table 3 show the fusion of reshaping methods
(except Tr, due to its low performance) for each descriptor (labelled Dx, specifically,
DLPQ, DCLBP, DHoG, and DWave). The last five columns report the fusion of methods
obtained by fixing the reshaping procedure and varying the descriptors (labelled Rx,
specifically, RTr, RCW, RRS, RDCT, and RFFT).
DATASET Tr CW RS DCT FFT
liver 56.7 68.9 0 70.8 71.6
hab 48.7 0 0 63.5 63.4
vote 49.1 96.9 0 97.7 97.9
aust 71.7 91.2 0 90.1 90.5
trans 52.4 0 0 68.0 67.6
wdbc 89.9 97.9 97.7 98.6 98.9
bCI 76.7 96.2 93.4 96.6 96.7
pap 70.3 84.2 85.7 87.2 88.1
torn 80.2 89.3 93.3 93.6 93.6
gCr 72.6 73.5 77.6 78.2 78.3
Average 68.7 76.7 42.5 86.1 86.3
Table 2 (Continued.)
DATASET Tr CW RS DCT FFT
torn 80.2 85.2 91.1 90.3 90.4
gCr 69.6 71.1 78.3 78.9 79.6
Average 75.5 86.3 84.9 87.1 87.6
Table 3: Performance (AUC) of the ensemble created by fixing the descriptor (first
four columns) and the reshaping method (last five columns).
DATASET DLPQ DCLBP DHoG DWave RTr RCW RRS RDCT RFFT
breast 97.5 97.9 99.5 99.4 99.2 99.2 99.3 99.3 99.2
heart 89.3 89.3 90.2 89.9 89.5 90.6 90.3 89.3 90.1
pima 72.0 72.2 80.8 82.3 74.4 80.9 80.8 80.6 80.5
sonar 92.8 89.3 94.2 92.6 70.9 93.9 93.1 93.7 93.6
iono 98.6 97.9 98.2 97.8 92.6 98.2 98.6 98.6 98.4
liver 71.8 70.4 73.4 73.4 59.3 73.2 74.2 73.4 73.6
hab 62.6 61.5 69.0 69.2 60.7 66.4 69.5 67.0 68.1
vote 97.8 97.3 98.1 96.8 74.7 97.1 98.3 97.6 97.8
aust 90.4 90.9 91.2 92.1 83.8 91.4 91.9 91.6 91.8
trans 68.3 67.1 69.2 71.0 66.0 67.1 69.9 68.7 70.1
wdbc 98.8 98.4 99.4 99.5 94.5 99.4 99.4 99.5 99.6
bCI 96.5 96.2 96.6 95.2 83.7 96.4 95.5 96.5 96.8
pap 86.8 82.4 87.0 84.3 74.4 84.9 86.6 87.4 87.6
torn 92.8 93.4 94.0 89.4 85.2 92.9 94.8 94.5 94.6
gCr 77.1 76.8 77.5 77.4 68.3 75.0 78.5 79.1 79.8
Average 86.2 85.4 87.9 87.4 78.5 87.1 88.0 87.8 88.1
As expected, the best results in Table 3 are obtained by DHoG and RFFT, i.e., by the
best descriptor and the best reshaping method.
Finally, in Table 4 the results of our best ensembles are reported and compared with
two baseline approaches: the first, named 1D, is the classification method obtained by
coupling the original 1D descriptor with an SVM classifier; the second is the best method
proposed in our previous work (Loris Nanni et al., 2012).
Included in Table 4 are results of the following “mixed reshaping” ensembles, which
are designed as follows:
MR1= 2×RCW + RRS (i.e., weighted sum rule between RCW and RRS)
MR2= 2×RCW + RRS + RDCT + RFFT
MR3 = (RS_HOG + RS_Wave) + 2 × (FFT_HOG + FFT_Wave) (X_Y means that the reshaping
method named X is coupled with the texture descriptor named Y)
MR4= MR2 + 2×1D
MR5= MR3 + 2×1D
Before fusion, the scores of each method are normalized to mean 0 and standard
deviation 1.
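The normalization and weighted-fusion steps can be sketched as follows (illustrative Python; the scores are hypothetical):

```python
import math

def zscore(scores):
    """Normalize one method's scores to mean 0 and standard deviation 1,
    so that differently scaled classifier outputs become comparable."""
    m = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - m) ** 2 for s in scores) / len(scores))
    return [(s - m) / sd if sd > 0 else 0.0 for s in scores]

def weighted_sum(score_lists, weights):
    """Weighted sum rule over normalized scores, e.g., MR1 = 2*RCW + RRS."""
    return [sum(w * s for w, s in zip(weights, col))
            for col in zip(*score_lists)]

# Hypothetical scores of two methods on four test patterns:
rcw = zscore([0.9, 0.2, 0.6, 0.4])
rrs = zscore([0.8, 0.1, 0.7, 0.3])
mr1 = weighted_sum([rcw, rrs], [2, 1])  # the MR1 combination
print(mr1)
```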
Table 4 includes the performance of the best ensemble proposed in our previous work
(Loris Nanni et al., 2012) that should be compared to MR2, where the fusion with 1D is
not considered.
The proposed ensembles work better than (Loris Nanni et al., 2012), except in the
two image datasets (bCI and pap). More tests will be performed to better assess the
performance when several features are available (as in bCI and pap). It may be the case
that different ensembles should be used depending on the dimensionality of the original
feature vector.
MR4 and MR5 perform similarly, with both outperforming 1D descriptors with a p-
value of 0.05 (Wilcoxon signed rank test (Demšar, 2006)). MR5 is a simpler approach,
however. This is a very interesting result since the standard method for training SVM is
to use the original feature vector. To reduce the number of parameters when MR4 or
MR5 are combined with 1D descriptors, we always use the same SVM parameters (RBF
kernel, C=1000, gamma=0.1) for MR4 and MR5 (while optimizing them for the 1D
descriptors).
CONCLUSIONS
This paper reports the results of experiments that investigate the performance
outcomes of extracting different texture descriptors from matrices that were generated by
reshaping the original feature vector. The study also reports the performance gains
offered by combining texture descriptors with vector-based descriptors.
iono 98.4 98.4 98.2 98.4 98.4 98.3 98.2 98.1
liver 73.9 73.7 74.8 70.3 73.6 76.2 75.8 75.6
hab 67.6 67.8 69.0 59.2 65.8 70.0 69.1 70.1
vote 97.7 97.7 97.7 97.7 97.7 98.5 98.5 98.5
aust 91.7 91.7 92.0 90.8 91.7 92.1 92.4 92.0
trans 67.2 69.5 70.6 61.9 65.8 72.5 73.0 72.9
wdbc 99.5 99.5 99.5 98.8 99.5 99.6 99.6 99.6
bCI 96.1 96.4 96.2 97.0 96.8 96.3 96.4 95.6
pap 86.1 87.2 87.3 88.0 87.5 87.5 87.4 86.8
torn 94.1 94.5 94.6 93.6 94.7 94.2 94.5 90.2
gCr 77.2 78.4 79.7 78.9 79.7 80.7 80.7 80.1
Average 87.6 88.0 88.2 85.8 87.7 88.9 88.9 88.4
This study expands our previous research in this area. First, it investigates different
methods for matrix representation in pattern classification. We found that approaches
based on FFT worked best. Second, we explored the value of using different texture
descriptors to extract a high performing set of features. Finally, we tested the
generalizability of our new approach across several datasets representing different
classification problems. The results of our experiments showed that our methods
outperformed SVMs trained on the original 1D feature sets.
Because each pixel in the generated texture describes a pattern extracted from the
original features, we were also motivated to investigate the correlation among the original
features belonging to a given neighborhood. Thus, we studied the correlation among
different sets of features by extracting images from each pattern and then randomly
sorting the features of the original pattern before the matrix generation process. This
simple method also resulted in improved performance.
In the future we plan on studying the potential of improving performance of the
proposed approach by fusing the different texture descriptors.
REFERENCES
Beymer, D., & Poggio, T. (1996). Image representations for visual learning. Science,
272(5270), 1905-1909.
Chan, C., Tahir, M., Kittler, J., & Pietikainen, M. (2013). Multiscale local phase
quantisation for robust component-based face recognition using kernel fusion of
multiple descriptors. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 35(5), 1164-1177.
Chen, S., Wang, Z., & Tian, Y. (2007). Matrix-pattern-oriented Ho-Kashyap classifier with
regularization learning. Pattern Recognition, 40(5), 1533-1543.
Chen, S. C., Zhu, Y. L., Zhang, D. Q., & Yang, J. (2005). Feature extraction approaches
based on matrix pattern: MatPCA and MatFLDA. Pattern Recognition Letters, 26,
1157-1167.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection.
Paper presented at the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), San Diego, CA.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal
of Machine Learning Research, 7, 1-30.
Eustice, R., Pizarro, O., Singh, H., & Howland, J. (2002). UWIT: Underwater image
toolbox for optical image processing and mosaicking in MATLAB. Paper presented at
the International Symposium on Underwater Technology, Tokyo, Japan.
Felzenszwalb, P., & McAuley, J. (2011). Fast inference with min-sum matrix product.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12), 2549-
2554.
Guo, Z., Zhang, L., & Zhang, D. (2010). A completed modeling of local binary pattern
operator for texture classification. IEEE Transactions on Image Processing, 19(6),
1657-1663. doi: 10.1109/TIP.2010.2044957
Jantzen, J., Norup, J., Dounias, G., & Bjerregaard, B. (2005). Pap-smear benchmark data
for pattern classification. Paper presented at the Nature inspired Smart Information
Systems (NiSIS), Albufeira, Portugal.
Junior, G. B., Cardoso de Paiva, A., Silva, A. C., & Muniz de Oliveira, A. C. (2009).
Classification of breast tissues using Moran's index and Geary's coefficient as texture
signatures and SVM. Computers in Biology and Medicine, 39(12), 1063-1072.
Kim, C., & Choi, C.-H. (2007). A discriminant analysis using composite features for
classification problems. Pattern Recognition, 40(11), 2958-2966.
Li, D., Zhu, Y., Wang, Z., Chong, C., & Gao, D. (2016). Regularized matrix-pattern-
oriented classification machine with universum. Neural Processing Letters.
Li, J., Janardan, R., & Li, Q. (2002). Two-dimensional linear discriminant analysis.
Advances in neural information processing systems, 17, 1569-1576.
Lichman, M. (2013). UCI Machine Learning Repository. (http://www.ics.uci.edu/
~mlearn/MLRepository.html). Irvine, CA.
Liu, J., & Chen, S. C. (2006). Non-iterative generalized low rank approximation of
matrices. Pattern Recognition Letters, 27(9), 1002-1008.
Nanni, L. (2011). Texture descriptors for generic pattern classification problems. Expert
Systems with Applications, 38(8), 9340-9345.
Nanni, L., Brahnam, S., & Lumini, A. (2012). Matrix representation in pattern
classification. Expert Systems with Applications, 39(3), 3031-3036.
Nanni, L., & Lumini, A. (2008). A reliable method for cell phenotype image
classification. Artificial Intelligence in Medicine, 43(2), 87-97.
Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and
rotation invariant texture classification with local binary patterns. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 24(7), 971-987.
Ojansivu, V., & Heikkila, J. (2008). Blur insensitive texture classification using local
phase quantization. Paper presented at the ICISP.
Seung, D., & Lee, L. (2001). Algorithms for non-negative matrix factorization. Advances
in neural information processing systems, 13, 556-562.
Trafalis, T. B., Ince, H., & Richman, M. B. (2003). Tornado detection with support
vector machines. Paper presented at the International Conference on Computational
Science, Berlin and Heidelberg.
Wang, H., & Ahuja, N. (2005). Rank-r approximation of tensors using image-as-matrix
representation. Paper presented at the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition.
Wang, Z., & Chen, S. C. (2008). Matrix-pattern-oriented least squares support vector
classifier with AdaBoost. Pattern Recognition Letters, 29, 745-753.
Wang, Z., Chen, S. C., Liu, J., & Zhang, D. Q. (2008). Pattern representation in feature
extraction and classification-matrix versus vector. IEEE Transactions on Neural
Networks, 19, 758-769.
Weston, J., Collobert, R., Sinz, F., Bottou, L., & Vapnik, V. (2006). Inference with the
universum. Paper presented at the International conference on machine learning.
Yang, J., Zhang, D., Frangi, A. F., & Yang, J. Y. (2004). Two-dimensional PCA: A new
approach to appearance-based face representation and recognition. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 26(1), 131-137.
Zhang, D., Jing, X., & Yang, J. (2006). Biometric image discrimination technologies.
Hershey: Idea Group Publishing.
AUTHORS’ BIOGRAPHIES
Chapter 6
ABSTRACT
This chapter proposes the solution of an optimization problem based on the concept
of the accumulated deviations from equilibrium (ADE) to eliminate instability in the
supply chain. The optimization algorithm combines the ability of particle swarm
optimization (PSO) to determine good regions of the search space with a local search that
finds the optimal point within those regions. The local search uses a Powell hill-climbing
(PHC) algorithm to refine the solution obtained from the PSO algorithm, which assures
a fast convergence of the ADE. The applicability of the method is demonstrated by using
a case study in the manufacturing supply chain. The experiments showed that solutions
generated by this hybrid optimization algorithm were robust.
* Corresponding Author Email: alfonsosava@unisabana.edu.co.
122 Alfonso T. Sarmiento and Edgar Gutierrez
INTRODUCTION
During the last decade, manufacturing enterprises have been under pressure to
compete in a market that is rapidly changing due to global competition, shorter product
life cycles, dynamic changes in demand patterns, product varieties, and environmental
standards. In these global markets, competition is ever increasing and companies are
widely adopting customer-focused strategies in integrated-system approaches. In
addition, push manufacturing concepts are being replaced by pull concepts and notions of
quality systems are getting more and more significant.
Policy analysis as a method to generate stabilization policies in supply chain
management (SCM) can be addressed by getting a better understanding of the model
structure that determines the supply chain (SC) behavior. The main idea behind this
structural investigation is that the behavior of a SC model is obtained by adding
elementary behavior modes. For linear models the eigenvalues represent these different
behavior modes, the superposition of which gives rise to the observed behavior of the
system. For nonlinear systems the model has to be linearized at each point in time. Finding
the connection between structure and behavior provides a way to discover pieces of the
model where to apply policies to eliminate instabilities. However, other techniques are
required to determine the best values of the parameters related to the stabilization policy.
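For the linearized case, reading stability off the eigenvalues can be illustrated with a self-contained 2×2 check (an illustration of the general principle, not part of the chapter's methodology):

```python
import cmath

def eig2(a, b, c, d):
    """Eigenvalues of a 2x2 Jacobian [[a, b], [c, d]] from the
    characteristic polynomial: lambda^2 - tr*lambda + det = 0."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def is_stable(a, b, c, d):
    """A linearized continuous-time model is (locally) stable when every
    eigenvalue has a negative real part."""
    return all(l.real < 0 for l in eig2(a, b, c, d))

# Complex pair with negative real part: damped oscillation (stable mode)
print(eig2(-0.5, 1.0, -1.0, -0.5))
print(is_stable(-0.5, 1.0, -1.0, -0.5))   # True
# Positive real part: oscillation of increasing magnitude (unstable)
print(is_stable(0.2, 1.0, -1.0, 0.2))     # False
```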
This work is motivated by the large negative impacts of supply chain instabilities.
Those impacts occur because instabilities can cause (1) oscillations in demand forecasts,
inventory levels, and employment rates and (2) unpredictability in revenues and profits.
These impacts amplify risk, raise the cost of capital, and lower profits. Modern enterprise
managers can minimize these negative impacts by having the ability to determine
alternative policies and plans quickly.
Due to the dynamic changes in the business environment, managers today rely on
decision technology1 more than ever to make decisions. In the area of supply chain, the
top projected activities where decision technology applications have great potential of
development are planning, forecasting, and scheduling (Poirier and Quinn, 2006).
This chapter presents a methodology that proposes a hybrid scheme for a policy
optimization approach with PSO to modify the behavior of entire supply chains in order
to achieve stability.
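A toy version of such a hybrid scheme is sketched below; it uses a basic global-best PSO followed by a simple coordinate hill climber standing in for the Powell refinement, with all constants and names being illustrative assumptions rather than the chapter's actual algorithm:

```python
import random

def pso(f, dim, n=20, iters=60, lo=-5.0, hi=5.0, seed=1):
    """Minimal PSO: global-best topology with inertia plus cognitive and
    social pulls. Returns a good region (position) of the search space."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    V = [[0.0] * dim for _ in range(n)]
    P = [x[:] for x in X]                      # personal bests
    g = min(P, key=f)[:]                       # global best
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (0.7 * V[i][d] + 1.5 * r1 * (P[i][d] - X[i][d])
                           + 1.5 * r2 * (g[d] - X[i][d]))
                X[i][d] += V[i][d]
            if f(X[i]) < f(P[i]):
                P[i] = X[i][:]
                if f(P[i]) < f(g):
                    g = P[i][:]
    return g

def local_search(f, x, step=0.5, shrink=0.5, iters=40):
    """Coordinate-wise hill climbing, a simple stand-in for the Powell
    refinement applied to the PSO solution."""
    x = x[:]
    for _ in range(iters):
        improved = False
        for d in range(len(x)):
            for s in (step, -step):
                y = x[:]
                y[d] += s
                if f(y) < f(x):
                    x, improved = y, True
        if not improved:
            step *= shrink
    return x

sphere = lambda v: sum((t - 1.0) ** 2 for t in v)  # toy objective, optimum (1, 1)
g = pso(sphere, 2)
x = local_search(sphere, g)
print(sphere(g), sphere(x))  # the local step never worsens the PSO result
```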
Policy Optimization
1 Decision technology adds value to network infrastructure and applications by making them smarter.
Simulation Optimization Using a Hybrid Scheme … 123
been used to obtain policies that modify system behavior. Burns and Malone (1974)
expressed the required policy as an open-loop solution (i.e., the solution function does not
contain the system's variables). The drawback of this method is that if the system is
perturbed by some small disturbance, the open-loop solution, lacking information
feedback, cannot adjust itself to the new state. Keloharju (1982) proposed a method of iterative
simulation where each iteration consists of a parameter optimization. He suggests
predefining the policy structure by allowing certain parameters of the model to be
variables and by adding new parameters. However, the policies obtained with
Keloharju’s method are not robust when subject to variations of external inputs because
the policy structure was predefined and thereafter optimized (Macedo, 1989). Coyle
(1985) included structural changes to the model and applied the method to a production
system.
Kleijnen (1995) presented a method that includes design of experiments and response
surface methodology for optimizing the parameters of a model. The approach treats
system dynamics (SD) as a black box, creating a set of regression equations to
approximate the simulation model. The statistical design of experiments is applied to
determine which parameters are significant. After dropping the insignificant parameters,
the objective function is optimized by using the Lagrange multiplier method. The
parameter values obtained through the procedure are the final solution. Bailey et al.
(2000) extended Kleijnen’s method by using response surfaces not to replace the
simulation models with analytic equations, but instead to direct attention to regions
within the design space with the most desirable performance. Their approach identifies
the exploration points surrounding the solution of Kleijnen's method and then finds the
best real combination of parameters from them (Chen and Jeng, 2004).
Grossmann (2002) used genetic algorithms (GA) to find optimal policies. He
demonstrates his approach in the Information Society Integrated System Model where he
evaluates different objective functions. Another method that uses genetic algorithms to
search the solution space is the one proposed by Chen and Jeng (2004). First, they
transform the SD model into a recurrent neural network. Next, they use a genetic
algorithm to generate policies by fitting the desired system behavior to patterns
established in the neural network. Chen and Jeng claim their approach is flexible in the
sense that it can find policies for a variety of behavior patterns including stable
trajectories. However, the transformation stage might become difficult when SD models
reach real-world sizes.
In optimal control applied to system dynamics, Macedo (1989) introduced a mixed
approach in which optimal control and traditional optimization are sequentially applied in
the improvement of the SD model. Macedo’s approach consists principally of two
models: a reference model and a control model. The reference model is an optimization
model whose main objective is to obtain the desired trajectories of the variables of
interest. The control model is an optimal linear-quadratic control model whose
124 Alfonso T. Sarmiento and Edgar Gutierrez
fundamental goal is to reduce the difference between the desired trajectories (obtained by
solving the reference model) and the observed trajectories (obtained by simulation of the
system dynamic model).
equilibrium state is stable. However, if a system tends to continue to move away from its
original equilibrium state when perturbed from it, the system is unstable.
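This notion of stability can be illustrated with a minimal numerical example (an illustration, not taken from the chapter): Euler integration of the linear system dx/dt = a(x − x_eq), where the sign of a determines whether a perturbation decays back to the equilibrium or grows away from it:

```python
def simulate(a, x0, x_eq=10.0, dt=0.1, steps=200):
    """Euler integration of dx/dt = a * (x - x_eq) from initial state x0."""
    x = x0
    for _ in range(steps):
        x += a * (x - x_eq) * dt
    return x

# Perturb the equilibrium x_eq = 10 by +1 unit and integrate.
stable = simulate(a=-0.5, x0=11.0)    # a < 0: the deviation decays
unstable = simulate(a=0.5, x0=11.0)   # a > 0: the deviation grows without bound
```

With a = −0.5 the final state is back within a fraction of a unit of the equilibrium; with a = +0.5 it has moved thousands of units away.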
Sterman (2006) stated that “supply chain instability is a persistent and enduring
characteristic of market economies.” As a result, company indicators such as demand
forecast, inventory level, and employment rate show an irregular and constant fluctuation.
Supply chain instability is costly because it creates “excessive inventories, poor customer
service, and unnecessary capital investment” (Sterman, 2006).
In dynamic complex systems like supply chains, a small deviation from the
equilibrium state can cause disproportionately large changes in the system behavior, such
as oscillatory behavior of increasing magnitude over time. Lee et al. (1997) identified
the four main factors contributing to instability in the SC: demand forecast updating,
order batching, price fluctuations, and rationing and shortage gaming.
The stability of supply chain models can be analyzed using the vast theory of linear
and nonlinear dynamic systems control. Disney et al. (2000) described a procedure for
optimizing the performance of an industrially designed inventory control system. Their
approach relies on dynamic modeling and control theory and is based on two elements: a
framework to capture the dynamics of the SC, and methodical procedures defined by
control laws to manage the SC. They tested several heuristic control laws and analyzed
their impact on the behavior of the SC.
2 Eigenvalues in the right half of the complex plane cause instability, whereas eigenvalues in the left half of the
complex plane determine stable systems.
Model structural analysis methods have also been used to eliminate oscillatory
behavior in SC models.
Lertpattarapong (2002) and Gonçalves (2003) used eigenvalue elasticity analysis to
identify the loops that are responsible for the oscillatory behavior of the inventory in the
SC. Then they use the insights about the impact of feedback structures on model behavior
to propose policies for stabilizing the system. These policies are based on inventory
buffers or safety stock. Saleh et al. (2006) used Behavior Decomposition Weights
(BDW) analysis to identify relevant parameters that stabilize the inventory fluctuations in
a linear inventory-workforce model. To explore the utility of the method in a nonlinear
SD model they chose a medium-sized economic model. In order to perform the BDW
analysis, they linearize the model at a point in time, once the eigenvalues have become
stable. The method provides a partial policy analysis as it studies the effects of changing
individual policy parameters. Currently, the method does not consider the interactions
due to changes in several parameters simultaneously.
Forrester (1982) presented several policies for stabilizing dynamic systems. The first
two approaches, reduction of the frequency of oscillations and increase in the decay rate
of oscillations, represent measures of the behavior of the whole system and are covered by
the linear system control theory. Other methods such as variance reduction and gain
reduction are focused on the stability of a particular variable of the system. Therefore,
they have to be extended to implement stabilizing policies of the entire system.
Policy optimization provides an efficient method for obtaining SC stabilization
policies. O’Donnell et al. (2006) employed GA to reduce the bullwhip effect and cost in
the MIT Beer Distribution Game. The GA is used to determine the optimal ordering
policy for members of the SC. Lakkoju (2005) used a methodology for minimizing the
oscillations in the SC based on SD and GA. He applied the variance reduction criterion
proposed by Forrester to stabilize the finished goods inventory of an electronics
manufacturing company.
The literature review on stability analysis of the SC shows that several techniques
have been used to generate stabilization policies. Model structural analysis methods can
provide some insights into how to tackle the behaviors that generate instability of supply
chains modeled as dynamic systems through the identification of the loops responsible
for them. However, these methods rely on sensitivity analysis to design the stabilization
policies. Control theory can support the stabilization methodologies by providing
theoretical concepts to stabilize dynamic systems. One problem with the approaches
based on control theory is the mathematics involved in determining the analytical solution.
Moreover, like the model structural analysis methods, they can require certain
simplifications, such as the linearization of the system (Dangerfield and Roberts, 1996).
On the other hand, policy optimization based on algorithmic search methods that use
simulation represents the most general means for stability analysis of nonlinear systems,
due to its effectiveness in handling general cases and most of the special problems that
arise from nonlinearity. However, the objective functions are chosen to represent the
stability conditions of each particular model. The use of a generic objective function applied to
stabilize SC models independent of their linear or nonlinear structure has not been found
in the literature surveyed so far.
Among the advantages of PSO is that it is conceptually simple and can be
implemented in a few lines of code. In comparison with other stochastic
optimization techniques like GA or simulated annealing, PSO has fewer complicated
operations and fewer defining parameters (Cui and Weile, 2005). PSO has been shown to
be effective in optimizing difficult multidimensional discontinuous problems in a variety
of fields (Eberhart and Shi, 1998), and it is also very effective in solving minimax
problems (Laskari et al. 2002). According to Schutte and Groenwold (2005), a drawback
of the original PSO algorithm proposed by Kennedy and Eberhart is that, although the
algorithm is known to converge quickly to the approximate region of the global
minimum, it does not maintain this efficiency when entering the stage where a
refined local search is required to find the minimum exactly. To overcome this
shortcoming, variations of the original PSO algorithm that employ methods with adaptive
parameters have been proposed (Shi and Eberhart 1998, 2001; Clerc, 1999).
Comparisons of the performance of GA and PSO on different optimization problems
have been reported in the literature. Hassan et al. (2005) compared the performance
of both algorithms using a benchmark test of problems. The analysis shows that PSO is
more efficient than GA in terms of computational effort when applied to unconstrained
nonlinear problems with continuous variables. The computational savings offered by
PSO over GA are not very significant when used to solve constrained nonlinear problems
with discrete or continuous variables. Jones (2005) chose the identification of model
parameters for control systems as the problem area for the comparison. He indicates that
in terms of computational effort, the GA approach is faster, although it should be noted
that neither algorithm takes an unacceptably long time to determine their results.
With respect to accuracy of model parameters, the GA determines values which are
closer to the known ones than does the PSO. Moreover, the GA seems to arrive at its final
parameter values in fewer generations than the PSO. Lee et al. (2005) selected excess return
evaluation in stock markets as the scenario for comparing GA and PSO. They show that
PSO shares the ability of GA to handle arbitrary nonlinear functions, but PSO can reach
the global optimal value in fewer iterations than GA. When finding technical trading rules,
PSO is more efficient than GA too. Clow and White (2004) compared the performance of
GA and PSO when used to train artificial neural networks (weight optimization problem).
They show that PSO is superior for this application, training networks faster and more
accurately than GA does, once properly optimized.
The literature presented above shows that PSO combined with simulation
optimization is an efficient technique that can be implemented and applied easily to
solve various function optimization problems. Thus, this approach can be extended to the
SCM area to search for policies using an objective function defined on a general
stabilization concept like the one that is presented in this work.
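As a rough illustration of such a generic stabilization objective, a candidate policy can be scored by the accumulated absolute deviation of each simulated state variable from its candidate equilibrium point. The function below is an assumption modeled on this accumulated-deviations idea, not the chapter’s exact formulation, and the trajectories are hypothetical:

```python
def stabilization_objective(trajectories, equilibria, weights=None):
    """Accumulated absolute deviation of each state variable from its
    candidate equilibrium point, summed over the simulated horizon.
    A smaller value means the policy holds the system closer to equilibrium."""
    if weights is None:
        weights = {name: 1.0 for name in trajectories}
    return sum(
        weights[name] * sum(abs(x - equilibria[name]) for x in series)
        for name, series in trajectories.items()
    )

# Hypothetical trajectories for two state variables over five periods.
oscillating = {"inventory": [3000, 3600, 2900, 3500, 3100],
               "labor": [40, 48, 38, 46, 42]}
settled = {"inventory": [3275, 3280, 3276, 3275, 3275],
           "labor": [44, 44, 44, 44, 44]}
eq = {"inventory": 3275, "labor": 44}
```

Because the score depends only on simulated trajectories, it applies unchanged to linear and nonlinear models, which is the generality the chapter argues for.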
Hill-climbing methods are heuristics that use an iterative improvement technique and
are based on a single solution search strategy. These methods can only provide local
optimum values, and they depend on the selection of the starting point (Michalewicz and
Fogel, 2000). Some advantages of hill-climbing-based approaches include: (1) very easy
to use (Michalewicz and Fogel, 2000), (2) do not require extensive parameter tuning, and
(3) very effective in producing good solutions in a moderate amount of time (DeRonne
and Karypis, 2007).
The Powell hill-climbing (PHC) algorithm (Powell, 1964) is a hill-climbing
optimization approach that searches the objective in a multidimensional space
by repeatedly using single dimensional optimization. The method finds an optimum in
one search direction before moving to a perpendicular direction in order to find an
improvement (Press et al. 1992). The main advantage of this algorithm is that it does not
require the calculation of derivatives to find an unconstrained minimum of a function of
several variables (Powell, 1964). This allows using the method to optimize highly
nonlinear problems where it can be laborious or practically impossible to calculate the
derivatives. Moreover, it has been shown that a hybrid strategy that uses a local search
method such as hill-climbing can accelerate the search towards the global optimum,
improving the performance of the searching algorithm (Yin et al. 2006; Özcan & Yilmaz,
2007).
OPTIMIZATION ALGORITHM
The method used to solve the optimization problem is a hybrid algorithm that
combines the advantage of PSO optimization to determine good regions of the search
space with the advantage of local optimization to quickly find the optimal point within
those regions. In other words, the local search is an improvement procedure over the
solution obtained from the PSO algorithm that assures a fast convergence of the ADE.
The local search technique selected was the Powell hill-climbing algorithm. This method
was chosen because: (1) it can be applied to solve multi-dimensional optimization
problems, (2) it is a relatively simple heuristic that does not require the calculation of
derivatives.
The general structure of the method is illustrated in Figure 1. This figure indicates
that the solution to the optimization problem obtained by the PSO algorithm becomes the
initial point to perform a local search using the PHC algorithm. Finally, if the ADE has
converged then the solution provided by the PHC method is the stabilization policy;
otherwise the parameter settings of the PSO algorithm have to be changed in order to
improve the search so that the ADE converges.
Figure 1. Optimization algorithm.
The algorithm used is called “local best PSO” (Engelbrecht, 2005) and is based on a
social network composed of neighborhoods related to each particle. The algorithm
maintains a swarm of particles, where each particle represents a candidate solution to the
optimization problem. These particles move across the search space communicating good
positions to each other within the neighborhood and adjusting their own position and
velocity based on these good positions. For this purpose, each particle keeps a memory of
its own best position found so far and the neighborhood best position among all the
neighbor particles. The goodness of a position is determined by using a fitness function.
The stopping condition of the algorithm is when the maximum number of iterations has
been exceeded.
The following empirical rules are recommended to guide the selection of the initial
values for the parameters of the PSO algorithm.
Step 1) Initialization:
Set iteration k=0
J(ŷi(0)) = min{J(yj(0))}, j ∈ Bi, where Bi defines the set of indexes for the
particle’s neighbors.
Determine the global best position g(0) using the formula
J(g(0)) = min{J(yi(0))}, i = 1,…,N.
Step 5) Position updating: Based on the updated velocities, each particle changes its
position according to the following equation:
pi(k) = vi(k) + pi(k − 1)
Step 6) Personal best updating: Determine the personal best position visited so far by
each particle:
Evaluate the fitness of each particle using J(pi(k)), i = 1,…,N.
Set yi(k) = yi(k − 1) if J(pi(k)) ≥ J(yi(k − 1)), and yi(k) = pi(k) if
J(pi(k)) < J(yi(k − 1)).
Step 7) Neighborhood best updating: Determine the neighborhood best position ŷi(k)
visited so far within each particle’s neighborhood by using the formula
Step 8) Global best updating: Determine the global best position g (k) visited so far by
the whole swarm by using the formula
If J(g(k)) < J(g(k − 1)) then set k’ = k
Step 9) Stopping criteria: If the maximum number of iterations is achieved then stop,
g* = g(k) is the optimal solution; otherwise go to step 2.
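The steps above can be sketched as follows. Because the velocity-update steps are not reproduced here, the sketch assumes the standard inertia-weight velocity update of Shi and Eberhart (1998) with cognitive and social coefficients c1 and c2, a ring neighborhood topology, and a sphere function standing in for the fitness function J; parameter defaults echo the settings reported later in the chapter (30 particles, neighborhood of 3, w = 0.5, c1 = c2 = 1.2, 150 iterations):

```python
import random

def local_best_pso(J, bounds, n_particles=30, hood=3, w=0.5, c1=1.2, c2=1.2,
                   max_iter=150, seed=7):
    """Minimal local-best PSO sketch. Each particle tracks its personal best;
    the social attractor is the best personal best within a ring neighborhood."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [J(p) for p in pos]
    for _ in range(max_iter):
        for i in range(n_particles):
            # Neighborhood best: ring topology of `hood` particles around i.
            nbrs = [(i + k) % n_particles for k in range(-(hood // 2), hood // 2 + 1)]
            lbest = min(nbrs, key=lambda j: pbest_val[j])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (pbest[lbest][d] - pos[i][d]))
                pos[i][d] += vel[i][d]          # pi(k) = vi(k) + pi(k-1)
            val = J(pos[i])
            if val < pbest_val[i]:              # personal best updating
                pbest_val[i], pbest[i] = val, pos[i][:]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    return pbest[g], pbest_val[g]

sphere = lambda p: sum(x * x for x in p)        # stand-in fitness function
best, val = local_best_pso(sphere, bounds=[(-5, 5)] * 3)
```

In the hybrid scheme, `best` would become the initial search point of the PHC stage rather than the final answer.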
Step 1) Initialization:
Set iteration k = 0
Set the initial search point Z0 = [z1, z2,…,zn] as the optimal solution of the
PSO algorithm.
Step 7) Stopping criteria: If J(Zk) > J(Zk−1) then stop, Z*k is the optimal solution;
otherwise go to step 2.
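A simplified sketch of this local search is shown below. It replaces Powell’s full direction-update rule with repeated golden-section line searches along the coordinate axes (a coordinate-descent simplification, not Powell’s exact method), stopping when a full cycle yields no further improvement, as in step 7; the test objective is a hypothetical quadratic:

```python
import math

def golden_section(f, lo, hi, tol=1e-6):
    """Derivative-free 1-D minimization of a unimodal function on [lo, hi]."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

def hill_climb(J, z0, span=2.0, max_cycles=50, tol=1e-9):
    """Repeated single-dimensional minimization along each coordinate axis,
    stopping when a full cycle no longer improves the objective (cf. step 7)."""
    z = list(z0)
    best = J(z)
    for _ in range(max_cycles):
        for d in range(len(z)):
            line = lambda t: J(z[:d] + [t] + z[d + 1:])
            z[d] = golden_section(line, z[d] - span, z[d] + span)
        val = J(z)
        if best - val < tol:        # no further improvement: stop
            break
        best = val
    return z, best

quad = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
z, v = hill_climb(quad, [0.0, 0.0])
```

In the hybrid scheme, the initial point `z0` would be the solution returned by the PSO stage, and the line searches refine it without ever requiring derivatives of the simulated objective.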
SD Model
The nonlinear SD model used in this case study is a subsystem of the enterprise
system developed by Helal (2008). It focuses on the production process of PMOC and
is composed of the following submodels: (1) supplier submodel, (2) labor management
submodel and (3) internal supply chain submodel. These submodels are described and
depicted below.
The supplier submodel (Figure 2) represents how the capacity of the supplier affects
the rate at which the company orders raw materials (Parts Order Rate). To simplify the
model it is assumed that only one supplier provides raw materials to PMOC. The state
variables of this model are Supplier Production Capacity and Supplier Order Backlog.
The labor management submodel (Figure 3) estimates the required capacity level
(including overtime when necessary) based on the production rate obtained from the
production planning. The opening positions for recruiting new workers are represented in
the state variable Labor Being Recruited. Labor being recruited moves to become Labor
(get hired) after some hiring delay, according to the Labor Hiring Rate. Similarly, Labor
can be fired or leave the company voluntarily at the Labor Firing Rate.
The internal supply chain submodel consists of two overlapping constructs. The first
construct is the materials ordering and inventory. The state variables for this part of the
model are Parts on Order, and Parts Inventory. The usage rate of parts (raw material)
being taken from Parts Inventory, to be converted into semi-finished products (WIP
inventory) is given by the Production Start Rate. The second construct is the production
planning. This part of the model regulates the WIP inventory at the Preforms and Presses
departments to ensure a smooth production rate and the availability of the final products for
shipping. The state variables of this part of the model are Preforms WIP, Presses WIP
and Finished Goods Inventory.
The set of parameters in Table 1 defines the current policy for this supply chain.
Figure 4. Preforms WIP Level, Presses WIP Level, Finished Goods Inventory and Labor
under the current policy (time in weeks).
For a customer order rate of 5,000 units/week the system starts out of equilibrium.
The behavior of the four variables of interest is depicted in Figure 4. Variables Preforms
WIP Level, Presses WIP Level and Labor have several oscillatory fluctuations. Variable
Finished Goods Inventory is starting to settle down, although it has not reached equilibrium
yet.
A new policy to minimize these oscillations will be determined by solving the
optimization problem presented in the next section.
Optimization Problem
Subject to
ẋ(t) = f(x(t), p)   (this notation represents the SD model equations)
x(0) = x0   (vector with initial values of all state variables)
0.5 ≤ Desired Days Supply of Parts Inventory ≤ 5
0.5 ≤ Time to Correct Parts Inventory ≤ 5
0.5 ≤ Preforms Cycle Time ≤ 3
0.5 ≤ Presses Cycle Time ≤ 3
0.5 ≤ Time to Correct Inventory ≤ 5
0.5 ≤ Supplier Delivery Delay ≤ 5
0.5 ≤ Time to Adjust Labor ≤ 5
0.5 ≤ Labor Recruiting Delay ≤ 5
5000 ≤ a1 ≤ 50000
5000 ≤ a2 ≤ 50000
1000 ≤ a3 ≤ 50000
10 ≤ a4 ≤ 100
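For implementation, the decision vector and its bounds above can be encoded as a simple table used both to initialize particles and to project candidate policies back into the feasible box. The bounds are taken from the constraints listed; the function names are illustrative:

```python
import random

# Decision-variable bounds from the optimization problem above
# (policy parameters in weeks; a1..a4 equilibrium points in units/people).
BOUNDS = {
    "Desired Days Supply of Parts Inventory": (0.5, 5),
    "Time to Correct Parts Inventory": (0.5, 5),
    "Preforms Cycle Time": (0.5, 3),
    "Presses Cycle Time": (0.5, 3),
    "Time to Correct Inventory": (0.5, 5),
    "Supplier Delivery Delay": (0.5, 5),
    "Time to Adjust Labor": (0.5, 5),
    "Labor Recruiting Delay": (0.5, 5),
    "a1": (5000, 50000),
    "a2": (5000, 50000),
    "a3": (1000, 50000),
    "a4": (10, 100),
}

def random_policy(rng):
    """Sample one candidate policy uniformly within the bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}

def clamp(policy):
    """Project a candidate policy back into the feasible box."""
    return {name: min(max(policy[name], lo), hi)
            for name, (lo, hi) in BOUNDS.items()}

rng = random.Random(0)
p = random_policy(rng)
```

Initialization within the box and clamping after each position update are one common way to keep a PSO swarm feasible for box-constrained problems like this one.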
Stabilization Policy
The stabilization policy is obtained after solving the optimization problem presented
in the previous section. The optimization algorithm was run at time 0 using the following
settings: swarm size = 30 particles, neighborhood size = 3 particles, initial inertia
weight = 0.5, iteration lag = 5, cognitive coefficient = 1.2, social coefficient = 1.2. The
time to obtain the optimal policy (after 150 PSO iterations and 1,243 PHC iterations) was
89 seconds.
Figure 5. Preforms WIP Level, Presses WIP Level, Finished Goods Inventory, Labor and
ADE under the stabilization policy (time in weeks).
The solution yielded the results shown in Table 2. This table also includes parameters
a1, a2, a3, a4 which are the new equilibrium points for the state variables of interest.
Figure 5 shows the behavior of the state variables when this revised policy is applied.
The system has reached equilibrium in approximately 9 weeks (response time). This
figure also shows that the convergence of ADE has caused the asymptotic stability of the
four variables of interest. This was achieved mainly by increasing the parameter values
Desired Days Supply of Parts Inventory, Time to Correct Parts Inventory and Supplier
Delivery Delay and decreasing several other parameter values including Labor Recruiting
Delay, Preforms Cycle Time, and Presses Cycle Time.
Table 2. Parameter values of the stabilization policy

Parameter | Value | Unit
Desired Days Supply of Parts Inventory | 3.46 | Weeks
Time to Correct Parts Inventory | 2.79 | Weeks
Preforms Cycle Time | 1.36 | Weeks
Presses Cycle Time | 1.70 | Weeks
Time to Correct Inventory | 1.47 | Weeks
Supplier Delivery Delay | 2.93 | Weeks
Time to Adjust Labor | 1.24 | Weeks
Labor Recruiting Delay | 0.5 | Weeks
a1 (EP for Preforms WIP Level) | 8828 | Units
a2 (EP for Presses WIP Level) | 13739 | Units
a3 (EP for Finished Goods Inventory) | 3275 | Units
a4 (EP for Labor) | 44 | People
Figure 6. Production rate under the stabilization policy (time in weeks).
This stabilization policy has been reached using the maximum production capacity of
5,600 units/week as shown in Figure 6. This is due to the constraint in manpower in the
lenses manufacturing department.
To test the stabilization policy, a sudden change in the customer order rate is
generated in week 10. The values for the new EPs are shown in Table 3.
Table 3. New EPs after a sudden change in the customer order rate

Percentage change in customer order rate | New EP for Preforms WIP Level (Units) | New EP for Presses WIP Level (Units) | New EP for Finished Goods Inventory (Units)
-15% | 8377 | 13178 | 3045
-10% | 8789 | 13691 | 3256
-5% | 8828 | 13739 | 3275
+10% | 8828 | 13739 | 3275
[Figure residue: time plots (weeks 0–30) of Preforms WIP Level and other state
variables for the -15%, -10%, -5% and +10% changes in customer orders.]
The EP levels of the three inventory variables remain the same for a 10% increment
in customer orders. The reason is simple: the stabilization policy was reached by using
the maximum production capacity, and orders above the original customer order rate are
considered backlog; therefore they do not affect the production rates or the stability.
Similarly, for a 5% decrease in customer orders, production is working close to
maximum capacity and the EPs remain the same. In the cases where customer orders are
decreased by 10% and 15%, the new EPs are reduced too, but by a smaller percentage than
the change in customer orders.
Figure 10. Behavior of Finished Goods Inv. due to changes in customer orders.
Stability returns approximately 10 weeks and 16 weeks after the system was
disturbed (response time) for the -10% and -15% decreases in customer orders,
respectively. Amplifications are on the order of 1% below the EPs for both the -10%
and -15% decreases in customer orders.
CONCLUSION
amplifications before reaching new EPs. The experiments also show that in most cases
the change of level in the EPs is proportional to the change of the exogenous variable.
REFERENCES
75–90.
Bonyadi, M. & Michalewicz, Z. (2017). Particle swarm optimization for single objective
continuous space problems: a review. Evolutionary computation, 25(1), 1–54.
Burns, J. & Malone, D. (1974). Optimization techniques applied to the Forrester model of
the world. IEEE Transactions on Systems, Man and Cybernetics, 4(2), 164–171.
Chen, Y. & Jeng, B. (2004). Policy design by fitting desired behavior pattern for system
dynamic models. In Proceedings of the 2004 International System Dynamics
Conference, Oxford, England.
Clerc, M. (1999). The swarm and the queen: towards a deterministic and adaptive particle
swarm optimization. In Proceedings of the 1999 IEEE Congress on Evolutionary
Computation, Washington, DC.
Clerc, M. (2006). Particle Swarm Optimization. Newport Beach, CA: ISTE Ltd.
Clow, B. & White T. (2004). An evolutionary race: a comparison of genetic algorithms
and particle swarm optimization used for training neural networks. In Proceedings of
the 2004 International Conference on Artificial Intelligence, Las Vegas, NV.
Coyle, R. (1985). The use of optimization methods for policy design in a system
dynamics model. System Dynamics Review, 1 (1), 81–91.
Cui, S. & Weile, D. (2005). Application of a novel parallel particle swarm optimization
to the design of electromagnetic absorbers. IEEE Antennas and Propagation Society
International Symposium, Washington, DC.
Dangerfield, B. & Roberts, C. (1996). An overview of strategy and tactics in system
dynamics optimization. The Journal of the Operational Research Society, 47(3),
405–423.
Daganzo, C. F. (2004). On the stability of supply chains. Operations Research, 52(6),
909–921.
DeRonne, K. & Karypis, G. (2007). Effective optimization algorithms for fragment-
assembly based protein structure prediction. Journal of Bioinformatics and
Computational Biology, 5(2), 335-352.
Disney, S., Naim, M. & Towill, D. R. (2000). Genetic algorithm optimization of a class
of inventory control systems. International Journal of Production Economics, 68(3),
258–278.
Eberhart, R. & Kennedy, J. (1995). A new optimizer using particle swarm theory. In
Proceedings of the Sixth International Symposium on Micro Machine and Human
Science. Nagoya, Japan.
Eberhart, R. & Shi, Y. (1998). Evolving artificial neural networks. In Proceedings of the
1998 International Conference on Neural Networks and Brain, Beijing, China.
Engelbrecht, A. (2005). Fundamentals of Computational Swarm Intelligence. West
Sussex, England: John Wiley & Sons Ltd.
Forrester, N. (1982). A dynamic synthesis of basic macroeconomic theory: implications
for stabilization policy analysis. PhD. Dissertation, Massachusetts Institute of
Technology, Cambridge, MA.
Gonçalves, P. (2003). Demand bubbles and phantom orders in supply chains. PhD.
Dissertation, Sloan School of Management, Massachusetts Institute of Technology,
Cambridge, MA.
Grossmann, B. (2002). Policy optimization in dynamic models with genetic algorithms.
In Proceedings of the 2002 International System Dynamics Conference, Palermo,
Italy.
Hassan, R., Cohanim, B., & de Weck, O. (2005). A comparison of particle swarm
optimization and the genetic algorithm. 46th AIAA/ASME/ASCE/AHS/ASC
Structures, Structural Dynamics and Materials Conference, Austin, TX.
Helal, M. (2008). A hybrid system dynamics-discrete event simulation approach to
simulating the manufacturing enterprise. PhD. Dissertation, University of Central
Florida, Orlando, FL.
Jones, K. (2005). Comparison of genetic algorithm and particle swarm optimisation.
International Conference on Computer Systems and Technologies, Technical
University, Varna, Bulgaria.
Keloharju, R. (1982). Relativity Dynamics. Helsinki: School of Economics.
Kennedy, J. (1997). The particle swarm: social adaptation of knowledge. In Proceedings
of the IEEE International Conference on Evolutionary Computation, Indianapolis,
Indiana.
Kennedy, J. & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the
IEEE International Conference on Neural Networks, Perth, Australia.
Kleijnen, J. (1995). Sensitivity analysis and optimization of system dynamics models:
regression analysis and statistical design of experiments. System Dynamics Review,
11(4), 275–288.
Lakkoju, R. (2005). A methodology for minimizing the oscillations in supply chain using
system dynamics and genetic algorithms. Master Thesis, University of Central
Florida, Orlando, FL.
Laskari, E., Parsopoulos, K., & Vrahatis, M. (2002). Particle swarm optimization for
minimax problems. In Proceedings of the 2002 IEEE Congress on Evolutionary
Computation, Honolulu, HI.
Lee, H., Padmanabhan, V., & Whang, S. (1997). The bullwhip effect in supply chains.
MIT Sloan Management Review, 38(3), 93–102.
Lee, J., Lee, S., Chang, S. & Ahn, B. (2005). A Comparison of GA and PSO for excess
return evaluation in stock markets. International Work Conference on the Interplay
between Natural and Artificial Computation - IWINAC 2005.
Lertpattarapong, C. (2002). Applying system dynamics approach to the supply chain
management problem. Master Thesis, Sloan School of Management, Massachusetts
Institute of Technology, Cambridge, MA.
Macedo, J. (1989). A reference approach for policy optimization in system dynamic
models. System Dynamics Review, 5(2), 148–175.
Michalewicz, Z. & Fogel, D. (2000). How to solve it: modern heuristics. Berlin,
Germany: Springer.
Mohapatra, P. & Sharma, S. (1985). Synthetic design of policy decisions in system
dynamic models: a modal control theoretical approach. System Dynamics Review,
1(1), 63–80.
Nagatani, T. & Helbing, D. (2004). Stability analysis and stabilization strategies for
linear supply chains. Physica A, 335(3/4), 644–660.
O’Donnell, T., Maguire, L., McIvor, R. & Humphreys, P. (2006). Minimizing the
bullwhip effect in a supply chain using genetic algorithms. International Journal of
Production Research, 44(8), 1523–1543.
Ortega, M. & Lin, L. (2004). Control theory applications to the production-inventory
problem: a review. International Journal of Production Research, 42(11), 2303–
2322.
Özcan, E. & Yilmaz, M. (2007). Particle swarms for multimodal optimization.
In Proceedings of the 2007 International Conference on Adaptive and Natural
Computing Algorithms, Warsaw, Poland.
Perea, E., Grossmann, I., Ydstie, E., & Tahmassebi, T. (2000). Dynamic modeling and
classical control theory for supply chain management. Computers and Chemical
Engineering, 24(2/7), 1143–1149.
Poirier, C. & Quinn, F. (2006). Survey of supply chain progress: still waiting for the
breakthrough. Supply Chain Management Review, 10(8), 18–26.
Press, W., Teukolsky, S., Vetterling, W. & Flannery, B. (1992). Numerical recipes in C:
the art of scientific computing. Cambridge, England: Cambridge University Press.
Powell, M. (1964). An efficient method for finding the minimum of a function of several
variables without calculating derivatives. The Computer Journal, 7(2), 155-162.
Riddalls, C. and Bennett, S. (2002). The stability of supply chains. International Journal
of Production Research, 40(2), 459–475.
Saleh, M., Oliva, R., Davidsen, P. & Kampmann, C. (2006). Eigenvalue analysis of
system dynamics models: another approach. In Proceedings of the 2006 International
System Dynamics Conference, Nijmegen, The Netherlands.
Schutte, J. & Groenwold, A. (2005). A study of global optimization using particle
swarms. Journal of Global Optimization, 31(1), 93–108.
Shi, Y. & Eberhart, R. (1998). A modified particle swarm optimizer. In Proceedings of
the 1998 IEEE International Conference on Evolutionary Computation, Piscataway,
NJ.
Shi, Y. and Eberhart, R. (2001). Fuzzy adaptive particle swarm optimization. In
Proceedings of the 2001 IEEE International Conference on Evolutionary
Computation, Seoul, Korea.
Sterman, J. (2006). Operational and behavioral causes of supply chain instability, in The
Bullwhip Effect in Supply Chains. Basingstoke, England: Palgrave Macmillan.
Yin, P., Yu, S., Wang, P., & Wang, Y. (2006). A hybrid particle swarm optimization
algorithm for optimal task assignment in distributed systems. Computer Standards &
Interfaces, 28, 441-450.
AUTHORS’ BIOGRAPHIES
Chapter 7
ABSTRACT
Accurate prediction of cutting forces is essential due to their significant impact
on product quality. Over the past two decades, the high pressure cooling (HPC) technique
has become established as a method for substantially increasing productivity in the
metal cutting industry. This technique has proven to be very effective in machining
hard-to-machine materials such as the nickel-based alloy Inconel 718, whose
machining process is characterized by low efficiency. However, modeling cutting
forces under HPC conditions is a very difficult task due to the complex relations between a
large number of process parameters such as jet pressure, nozzle diameter,
cutting speed, feed, etc. One of the ways to overcome this difficulty is to implement
models based on artificial intelligence tools such as artificial neural networks (ANN),
genetic algorithms (GA), particle swarm optimization (PSO), fuzzy logic, etc., as an
alternative to conventional approaches. Regarding feedforward ANN training, the
*
Corresponding Author Email: djordjecica@gmail.com.
148 Djordje Cica and Davorin Kramar
INTRODUCTION
The effect of HPC on the performance of machining of nickel-based alloys has been investigated by many authors. Ezugwu and Bonney (2004) analyzed tool life, surface roughness, tool wear and component forces using high-pressure coolant supplies in rough turning of Inconel 718 with coated carbide tools. The test results show that an acceptable surface finish and improved tool life can be achieved using the HPC technique. Ezugwu and Bonney (2005) investigated the same parameters in finish machining of Inconel 718 with coated carbide tools under high-pressure coolant supplies. The results indicate that an acceptable surface finish and improved tool life can be achieved with high coolant pressures. Cutting forces increased with increasing cutting speed, probably due to reactive forces introduced by the high-pressure coolant jet. Nandy, Gowrishankar, and Paul (2009) investigated the effects of high-pressure coolant on machining evaluation parameters such as chip form, chip breakability, cutting forces, coefficient of friction, contact length, tool life and surface finish. The results show that significant improvement in tool life and the other evaluation parameters could be achieved using a moderate range of coolant pressures. Empirical modeling of machining performance under HPC conditions using Taguchi DOE analysis has been carried out by Courbon et al. (2009). Regression modeling was used to investigate the relationships between process parameters and machining responses. It was demonstrated that HPC is an efficient alternative lubrication solution, providing better chip breakability, reductions in cutting forces, and advantages regarding the lubrication and thermal loads applied to the tool. Furthermore, this cooling/lubrication technique can improve surface finish given an optimal pressure/nozzle diameter/cutting speed combination. Colak (2012) studied the
cutting tool wear and cutting force components while machining Inconel 718 under high-pressure and conventional cooling conditions. Experimental results were analyzed using ANOVA and regression analysis. The results showed that tool flank wear and cutting forces decrease considerably with the delivery of high-pressure coolant to the cutting zone. Klocke, Sangermann, Krämer, and Lung (2011) analyzed the effect of high-pressure cooling in a longitudinal turning process with cemented carbide tools on the tool wear, cutting tool temperature, resulting chip forms, the ratio of cutting forces, and the tool-chip contact area. The results suggest that the tool temperature can be significantly decreased by the use of a high-pressure coolant supply and that, due to the different tool wear mechanisms and the change in the specific load on the cutting edge during machining, the resulting tool wear was influenced differently.
One of the most important factors in machining processes is accurate estimation of
cutting forces due to their significant impacts on product quality. Modeling and
prediction of optimal machining conditions for minimum cutting forces plays a very
important role in machining stability, tool wear, surface finish, and residual stresses. In
this regard, cutting forces have been investigated by many researchers in various
machining processes through formulation of appropriate models for their estimation. The
most frequently used models for prediction of cutting forces are mathematical models based on the geometry and physical characteristics of the machining process.
However, due to the large number of interrelated machining parameters that greatly influence cutting forces, it is difficult to develop an accurate analytical model. Therefore, over the last few decades, different modeling methods based
on artificial intelligence (AI) have become the preferred trend and are applied by most
researchers for estimation of different parameters of machining process, including cutting
forces, tool wear, surface roughness, etc. Artificial neural networks (ANN) are by now
the most popular AI method for modeling of various machining process parameters.
There are numerous applications of ANN based modeling of cutting forces in turning
reported in the literature. Szecsi (1999) presented a three-layer feed-forward ANN trained
by the error back-propagation algorithm for modeling of cutting forces. Physical and
chemical characteristics of the machined part, cutting speed, feed, average flank wear and
cutting tool angles were used as input parameters for training ANN. The developed
model is verified and can be used to define threshold force values in cutting tool
condition monitoring systems. Lin, Lee, and Wu (2001) developed a prediction model for
cutting force and surface roughness using abductive ANN during turning of high carbon
steel with carbide inserts. The ANN was trained with depth of cut, feed and cutting speed as input parameters. The predicted cutting force and surface roughness were found to be more accurate than those from regression analysis. Sharma, Dhiman, Sehgal, and
Sharma (2008) developed ANN model for estimation of cutting forces and surface
roughness for hard turning. Cutting parameters such as approaching angle, speed, feed,
and depth of cut were used as input parameters for training ANN. The ANN model gave
overall 76.4% accuracy. Alajmi and Alfares (2007) modeled cutting forces using a back-propagation ANN enhanced by a differential evolution algorithm. Experimental machining data such as speed, feed, depth of cut, nose wear, flank wear, and notch wear were used in this study to train and evaluate the model. The results showed an improvement in the reliability of predicting the cutting forces over previous work.
Zuperl and Cus (2004) developed a supervised ANN approach for estimation of the cutting forces generated during the end milling process. The predictive capabilities of the analytical and ANN models were compared statistically, which showed that the ANN predictions for the three cutting force components were closer to the experimental data than the analytical method. Aykut, Gölcü, Semiz, and Ergür (2007) used ANN for
modeling cutting forces with three axes, where cutting speed, feed and depth of cut were
used as input dataset. ANN training has been performed using scaled conjugate gradient
feed-forward back-propagation algorithm. Results show that the ANN model can be used
for accurate prediction of the cutting forces. Cica, Sredanovic, Lakic-Globocki, and Kramar (2013) investigated the prediction of cutting forces using ANN and adaptive network-based fuzzy inference systems (ANFIS) as potential modeling techniques. During the experimental research, the focus was placed on modeling cutting forces under different cooling and lubricating conditions (conventional, high pressure jet assisted machining, and minimal quantity lubrication).
The Estimation of Cutting Forces in the Turning of Inconel 718 Assisted … 151
Furthermore, the effect of cutting parameters such as depth of cut, feed and cutting speed on the machining variables was also studied.
However, despite the numerous applications of ANN to modeling of cutting forces reported in the literature, a review shows that no work has been reported on modeling these parameters under HPC conditions. This can be explained by the complex relations between the large number of HPC process parameters, such as the pressure of the jet, the diameter of the nozzle, the cutting speed, the feed, etc., that influence the cutting forces and make it difficult to develop a proper estimation model. In this sense, this paper presents ANN models for estimation of cutting forces in turning of Inconel 718 under HPC conditions. First, cutting forces were modeled using a conventional ANN trained with the backpropagation algorithm. In order to overcome the limitations of the traditional backpropagation algorithm, two bio-inspired computational techniques, namely the genetic algorithm (GA) and particle swarm optimization (PSO), were also used as training methods for the ANN. The modeling capacity of the ANN trained with GA and PSO was compared to that of the conventional ANN.
EXPERIMENTAL DETAILS
Figure 1. Experimental setup.
In this research, three levels of diameter of the nozzle Dn, distance between the
impact point of the jet and the cutting edge s, pressure of the jet p, cutting speed vc, and
feed f, were used as the variables for cutting forces modeling (Table 1). Depth of cut was
fixed to 2 mm. With the cutting parameters defined and according to their levels, in total
27 experiments were realized as shown in Table 2.
Table 1. Machining parameters and their levels

Machining parameter                                                        Level 1   Level 2   Level 3
Diameter of the nozzle Dn [mm]                                             0.25      0.3       0.4
Distance between the impact point of the jet and the cutting edge s [mm]   0         1.5       3
Pressure of the jet p [MPa]                                                50        90        130
Cutting speed vc [m/min]                                                   46        57        74
Feed f [mm/rev]                                                            0.2       0.224     0.25
processing elements (neurons) organized in several layers. These neurons are connected to each other by weighted links, called synapses, which establish the relationship between input data and output data. There are many ANN models; in this paper, only feed-forward multilayer perceptron networks were considered. The structure of these ANN has three types of layers: an input layer, a hidden layer and an output layer. The biases in the neurons of the hidden and output layers, bk(1) and bk(2), respectively, are controlled during data processing.
Before practical application, an ANN needs to be trained. Training, or learning as it is often called, is achieved by minimizing the sum of squared errors between the predicted output and the actual output of the ANN, by continuously adjusting and finally determining the weights connecting neurons in adjacent layers. There are several learning algorithms for ANN, and back-propagation (BP) is currently the most popular training method, in which the weights of the network are adjusted according to an error-correction learning rule. Basically, the BP algorithm consists of two phases of data flow through the different layers of the network: forward and backward. First, the input pattern is propagated from the input layer to the output layer and, as a result of this forward flow of data, produces an actual output. Then, in the backward flow of data, the error signals resulting from any difference between the desired outputs and those obtained in the forward phase are back-propagated from the output layer to the previous layers, updating the weights and biases of each node until the input layer is reached, and this is repeated until the error falls within a prescribed value.
In this paper, a multilayer feed-forward ANN architecture, trained using a BP
algorithm, was employed to develop cutting forces predictive model in machining
Inconel 718 under HPC conditions. An ANN is made of three types of layers: input,
hidden, and output layers. Network structure consists of five neurons in the input layer
(corresponding to five inputs: diameter of the nozzle, distance between the impact point
of the jet and the cutting edge, pressure of the jet, cutting speed, and feed) and one neuron
in the output layer (corresponding to cutting force component). Cutting force Fc, feed
force Ff and passive force Fp predictions were performed separately by designing single
output of neural network, because this approach decreases the size of ANN and enables
faster convergence and better prediction capability. Figure 2 shows the architecture of the
ANN together with the input and output parameters.
The first step in developing an ANN is the selection of data for training and testing the network. The numbers of training and testing samples were 18 and 9, respectively, as shown in Table 2. Then, all data were normalized within the range of ±1 before training
and testing ANN. The ANN model, using the BP learning method, required training in
order to build strong links between layers and neurons. The training is initialized by
assigning some random weights and biases to all interconnected neurons.
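The ±1 normalization mentioned above can be sketched as a simple min-max scaling together with its inverse for converting predictions back to physical units. This is an illustrative sketch, not the chapter's code; the function names and the use of NumPy are assumptions:

```python
import numpy as np

def normalize_pm1(x, lo=None, hi=None):
    """Scale each column of x linearly into [-1, 1] (per-column min-max)."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0) if lo is None else lo
    hi = x.max(axis=0) if hi is None else hi
    return 2.0 * (x - lo) / (hi - lo) - 1.0, lo, hi

def denormalize_pm1(y, lo, hi):
    """Invert the scaling to recover physical values."""
    return (np.asarray(y, dtype=float) + 1.0) / 2.0 * (hi - lo) + lo
```

The same `lo`/`hi` computed on the training set would be reused for the testing set, so both are scaled consistently.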
The output of the k-th neuron in the hidden layer, O_k^{hid}, is defined as

O_k^{hid} = \frac{1}{1 + \exp\left(-I_k^{hid} / T^{(1)}\right)}   (1)

with

I_k^{hid} = \sum_{j=1}^{N_{inp}} w_{jk}^{(1)} O_j^{inp} + b_k^{(1)}   (2)
where Ninp is the number of elements in the input, wjk(1) is the connection weight of the
synapse between the j-th neuron in the input layer and the k-th neuron in the hidden layer,
Ojinp is the input data, bk(1) is the bias in the k-th neuron of the hidden layer and T(1) is a
scaling parameter.
Similarly, the value of the output neuron O_k^{out} is defined as

O_k^{out} = \frac{1}{1 + \exp\left(-I_k^{out} / T^{(2)}\right)}   (3)

with

I_k^{out} = \sum_{i=1}^{N_{hid}} w_{ik}^{(2)} O_i^{hid} + b_k^{(2)}   (4)
where Nhid is the number of neurons in the hidden layer, wik(2) is the connection weight of
the synapse between the i-th neuron in the hidden layer and the k-th neuron in the output
layer, bk(2) is the bias in the k-th neuron of the output layer and T(2) is a scaling parameter
for output layer.
During training, the output from ANN is compared with the measured output and the
mean relative error is calculated as:
E\left(w^{(1)}, w^{(2)}, b^{(1)}, b^{(2)}\right) = \frac{1}{N_{exp}} \sum_{m=1}^{N_{exp}} \frac{1}{N_{out}} \sum_{i=1}^{N_{out}} \frac{\left| O_i^{exp} - O_i^{out} \right|}{O_i^{exp}}   (5)
where Nout is the number of neurons of the output layer, Nexp is the number of
experimental patterns and Oiout and Oiexp are the normalized predicted and measured
values, respectively.
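Equations (1)-(5) can be sketched in code as follows. This is a minimal illustration, not the chapter's implementation; the array shapes, function names and use of NumPy are assumptions:

```python
import numpy as np

def forward(x, W1, b1, W2, b2, T1=1.0, T2=1.0):
    """Forward pass of the two-layer network of Eqs. (1)-(4):
    sigmoid activations with scaling parameters T1 and T2."""
    I_hid = W1.T @ x + b1                        # Eq. (2)
    O_hid = 1.0 / (1.0 + np.exp(-I_hid / T1))    # Eq. (1)
    I_out = W2.T @ O_hid + b2                    # Eq. (4)
    return 1.0 / (1.0 + np.exp(-I_out / T2))     # Eq. (3)

def mean_relative_error(O_exp, O_out):
    """Mean relative error over outputs, Eq. (5), for normalized data."""
    O_exp = np.asarray(O_exp, dtype=float)
    O_out = np.asarray(O_out, dtype=float)
    return float(np.mean(np.abs(O_exp - O_out) / np.abs(O_exp)))
```

Here `W1` is the N_inp × N_hid input-to-hidden weight matrix and `W2` the N_hid × N_out hidden-to-output matrix; averaging `mean_relative_error` over all experimental patterns gives the training error E.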
The error obtained from the previous equation is back-propagated through the ANN. This means that, from the output to the input, the weights of the synapses and the biases are modified so as to minimize the error. Several network configurations were tested with different numbers of hidden layers and various numbers of neurons in each hidden layer, using a trial and error procedure. The best network architecture was a typical two-layer feed-forward network with one hidden layer of 10 neurons, trained with a Levenberg-Marquardt back-propagation algorithm. This ANN architecture is used in the presentation and discussion that follow.
Regarding feedforward ANN training, the most widely used training algorithm is the standard BP algorithm or one of its improved variants. Basically, the BP algorithm is a gradient-based method. Hence, some inherent problems are frequently encountered in its use, e.g., the risk of being trapped in local minima, a very slow convergence rate in training, etc. In addition, there are many elements to be considered, such as the number of hidden nodes, learning rate, momentum rate, bias, minimum error and activation/transfer function, which also affect the convergence of BP learning. Therefore, recent research emphasis has been on the optimal improvement of ANN trained with the BP method.
The learning of ANN using bio-inspired algorithms has been a theme of much attention during the last few years. These algorithms provide universal optimization techniques that require no particular knowledge about the problem structure other than the objective function itself. They are robust and efficient at exploring an entire, complex and poorly understood solution space of an optimization problem. Thus, bio-inspired algorithms are capable of escaping local optima and acquiring a globally optimal solution. Bio-inspired algorithms have been successfully used to perform various tasks, such as architecture design, connection weight training, connection weight initialization, learning rule adaptation, rule extraction from ANN, etc. One way to overcome the shortcomings of the BP training algorithm is to formulate an adaptive and global approach to the learning process as the evolution of connection weights in the environment determined by the architecture and the learning task of the ANN. Bio-inspired algorithms can then be used very
Although many quite efficient bio-inspired algorithms have been developed for the
optimization of ANN, in this study two of them, namely, genetic algorithm (GA) and
particle swarm optimization (PSO), were utilized to train a feed forward ANN with a
fixed architecture. Therefore, numerical weights of neuron connections and biases
represent the solution components of the optimization problem.
generations. GA has been successfully used in a wide variety of problem domains that are not suitable for standard optimization algorithms, including problems in which the objective function is highly nonlinear, stochastic, nondifferentiable or discontinuous. An implementation of a GA begins with a randomly generated population of individuals, in which each individual is represented by a binary string (called a chromosome) encoding one possible solution. This population of candidate solutions evolves toward better solutions. The evolution happens in generations, and during each generation a measure of fitness with respect to an objective function is evaluated. Based on the fitness values, a new population is then created from the evaluation of the previous one, and it becomes the current population in the next iteration of the algorithm. Individuals with a higher fitness have a higher probability of being selected for further reproduction. Thus, on average, the new generation will possess a higher fitness value than the older population. Commonly, the algorithm continues until one or more pre-established criteria, such as a maximum number of generations or a satisfactory fitness level, have been reached.
Following are the steps involved in the working principle of GA: (i) chromosome
representation, (ii) creation of the initial population, (iii) selection, (iv) reproduction, (v)
termination criteria and (vi) the evaluation function.
Chromosome representation. The basic element of the genetic algorithm is the chromosome, which contains the variable information for each individual solution to the problem. The most common coding method is to represent each variable with a binary string of digits of a specific length. Each chromosome has one binary string, and each bit in this string can represent some characteristic of the solution. Another possibility is that the whole string represents a number. Therefore, every bit string is a solution, but not necessarily the best solution. This representation method is very simple; strings of ones and zeroes are randomly generated, e.g., 1101001, 0101100, etc., and these form the initial population. The strings may be of fixed length or, more rarely, of variable length. Apart from binary encoding, octal encoding, hexadecimal encoding, permutation encoding, value encoding and tree encoding are also used as encoding methods in genetic algorithms.
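A minimal sketch of the binary encoding described above, mapping a real-valued parameter within assumed bounds to a fixed-length bit string and back (the bounds, bit length and function names are illustrative assumptions):

```python
def encode(value, lo, hi, n_bits=16):
    """Map a real value in [lo, hi] to a fixed-length binary string,
    one common GA chromosome representation."""
    frac = (value - lo) / (hi - lo)
    return format(int(round(frac * (2 ** n_bits - 1))), f'0{n_bits}b')

def decode(bits, lo, hi):
    """Inverse map: recover the real value represented by a bit string."""
    return lo + int(bits, 2) / (2 ** len(bits) - 1) * (hi - lo)
```

A chromosome for an ANN would concatenate one such string per connection weight; the resolution is set by `n_bits`.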
Creation of the initial population. The GA sequence begins with the creation of an initial population of individuals. The most common way to do this is to generate a population of random solutions. Each individual in the population represents a candidate solution to the problem, and the population size depends on the complexity of the problem. Ideally, the first population should have a gene pool as large as possible in order to explore the whole search space. Nevertheless, problem-specific knowledge can sometimes be used to construct the initial population. Using a specific heuristic to construct the population may help the GA find good solutions faster, but the gene pool should still be large enough. Furthermore, it is necessary to take into account the size of the population: a larger population enables easier exploration of the search space, but at the same time increases the time required by the GA to converge.
Selection. Selection is the process of picking chromosomes out of the population according to their evaluation function, where the best chromosomes from the current population are selected to continue and the rest are discarded. The members of the population are selected for reproduction through a fitness-based process, in which a higher fitness gives an individual a greater chance of being selected. The problem is how to select these chromosomes, and a large number of selection methods of varying complexity have been developed. A method with low selectivity accepts a large number of solutions, which results in too slow an evolution, while high selectivity allows a few individuals, or even one, to dominate, which reduces the diversity needed for change and progress. Therefore, a balance is needed to prevent the solution from becoming trapped in a local minimum. Several techniques for GA selection have been used: roulette wheel, tournament, elitism, random, rank and stochastic universal sampling.
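Roulette-wheel selection, the first technique listed, can be sketched as follows (an illustrative sketch assuming non-negative fitness values; all names are assumptions):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate (roulette-wheel) selection: the chance of
    picking an individual is proportional to its share of total fitness."""
    total = sum(fitnesses)
    pick = random.uniform(0.0, total)   # spin the wheel
    cum = 0.0
    for individual, fit in zip(population, fitnesses):
        cum += fit
        if cum >= pick:
            return individual
    return population[-1]               # guard against rounding
```

Repeated calls build the mating pool; individuals with larger fitness occupy a larger arc of the "wheel" and are therefore drawn more often.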
Reproduction. Reproduction is the genetic operator used to produce a new generation of populations from those chosen in the selection step, using two basic types of operators: crossover and mutation. The crossover operator selects genes from parent chromosomes and creates new offspring. The simplest way to do this is to choose a crossover point on the string; everything before the point is copied and everything after it is exchanged between the parents. There are several types of crossover operators: single-point crossover, two-point crossover, multi-point crossover, uniform crossover, three-parent crossover, crossover with reduced surrogate, shuffle crossover, precedence preservative crossover, ordered crossover and partially matched crossover. The basic parameter of the crossover operator is the crossover probability, which describes how often crossover is performed. If the crossover probability is 0%, then the whole new generation is made from exact copies of chromosomes from the old population; if it is 100%, then all offspring are made by crossover. After crossover, the mutation operator is applied to the strings. Mutation ensures more variety of strings and prevents the GA from becoming trapped in a local minimum. If the task
of crossover is to exploit the current solutions to find better ones, then mutation forces the GA to explore new areas of the search space. Some mutation techniques are flipping, interchanging and reversing. The basic parameter of the mutation operator is the mutation probability, which decides how often a string is mutated. If the mutation probability is 0%, no mutation occurs; if it is 100%, the whole chromosome is changed.
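Single-point crossover and flip mutation, the simplest of the operators above, can be sketched as follows (illustrative; assumes equal-length bit strings):

```python
import random

def single_point_crossover(p1, p2):
    """Pick a random point and exchange everything after it between
    two equal-length parent bit strings."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def flip_mutation(bits, p_mut=0.01):
    """Flip each bit independently with probability p_mut."""
    return ''.join(b if random.random() > p_mut else ('1' if b == '0' else '0')
                   for b in bits)
```

Applying crossover with probability p_cross and then `flip_mutation` to each offspring reproduces the two-stage reproduction step described in the text.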
Termination criteria. The GA moves from generation to generation until one of the termination criteria is fulfilled. The GA stops when a specified number of generations has been reached, a specified duration of time has elapsed, a defined level of fitness is reached, the diversity of the population drops below a specified level, or the solutions of the population no longer improve over generations.
The evaluation function. The task of the evaluation function is to determine the fitness of each solution string generated during the search. The fitness of each individual solution not only represents a quantitative measure of how well the solution solves the original problem, but also corresponds to how close the chromosome is to the optimal one. The function does not need to have any special analytical properties.
GA has also been used recently to train ANN in order to improve the precision and efficiency of the network. The performance of an ANN depends mainly on the weights of its connections; therefore, training a given ANN generally means determining an optimal set of connection weights. The weight learning of an ANN is usually formulated as the minimization of some error function over the training data set by iteratively adjusting the connection weights. In this way, the optimization problem is transformed into finding the fittest set of weights that minimizes the objective function, i.e., the mean square error between the target and actual outputs. In this chapter, GA is used to optimize the weights and biases (weight values associated with individual nodes) of the ANN model. The steps involved in the process of ANN training using a GA are shown in Table 3.
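The training loop just described can be sketched roughly as follows. For brevity this sketch evolves a real-valued weight vector with Gaussian mutation rather than the binary encoding discussed above, and all names and parameter values are illustrative assumptions, not the chapter's settings:

```python
import numpy as np

def ga_train(loss, n_weights, pop_size=30, generations=300,
             p_cross=0.9, p_mut=0.3, sigma=0.1, seed=0):
    """Minimal GA minimizing `loss` (e.g., the ANN training error)
    over a flat vector of connection weights and biases."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (pop_size, n_weights))
    for _ in range(generations):
        fit = np.array([loss(w) for w in pop])
        pop = pop[np.argsort(fit)]                 # lower loss = fitter
        children = [pop[0].copy()]                 # elitism: keep the best
        while len(children) < pop_size:
            # selection: pick two parents from the fitter half
            i, j = rng.integers(0, pop_size // 2, size=2)
            a, b = pop[i].copy(), pop[j].copy()
            if rng.random() < p_cross:             # single-point crossover
                pt = rng.integers(1, n_weights)
                a[pt:], b[pt:] = b[pt:].copy(), a[pt:].copy()
            mask = rng.random(n_weights) < p_mut   # Gaussian mutation
            a[mask] += rng.normal(0.0, sigma, mask.sum())
            children.append(a)
        pop = np.array(children)
    fit = np.array([loss(w) for w in pop])
    return pop[np.argmin(fit)]
```

In the chapter's setting, `loss` would decode the vector into the ANN's weight matrices and biases and return the training error of Eq. (5).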
In order to achieve the best performance of the GA-based ANN, a parametric study to determine the optimal set of GA parameters was carried out. The optimization process involves the values of crossover probability, mutation probability, maximum number of generations and population size. The parametric study was carried out by varying the value of one parameter at a time while keeping the other parameters fixed. The fitness value of a GA solution is estimated from the mean absolute percentage error over the training data samples, representing the deviation of the result (cutting force components) of the GA-based ANN from the desired one. Since the error could be either positive or negative, its absolute value is used as the fitness value of a GA solution. For the main cutting force, the optimal values of crossover probability, mutation
probability, number of generations, and population size were 0.15, 0.025, 2260, and 520,
respectively. For feed force optimal values of these parameters were 0.9, 0.01, 1480, and
590. Finally, for passive force the optimal values of crossover probability, mutation
probability, number of generations, and population size were 0.1, 0.015, 1920, and 260,
respectively. The results of the parametric study for main cutting force are shown in
Figure 4.
position vector of the particle, the velocity vector of the particle and the personal best position of the particle. During the search process, the position of each particle is guided by two factors: the best position visited by itself, and the global best position discovered so far by any of the particles in the swarm. In this way, the trajectory of each particle is influenced by the flight experience of the particle itself as well as the trajectories of neighboring particles of the whole swarm. This means that all the particles fly through the search space toward the personal and global best positions in a guided way, at the same time exploring new areas through a stochastic mechanism in order to escape from local optima. The performance of the particles is evaluated using a fitness function that varies depending on the optimization problem.
The position of the i-th particle in the d-dimensional solution space at iteration k is denoted as x_i(k), its personal best position as y_i(k), while

\hat{y}(k) = \left( \hat{y}_1(k), \hat{y}_2(k), ..., \hat{y}_d(k) \right)   (8)

denotes the best position found by any of the particles in the neighborhood of x_i up to iteration k. The new position of particle i at iteration k + 1, x_i(k + 1), is computed by adding a velocity whose components are updated as

v_{i,j}(k+1) = \omega v_{i,j}(k) + c_1 r_{1,j} \left[ y_{i,j}(k) - x_{i,j}(k) \right] + c_2 r_{2,j} \left[ \hat{y}_j(k) - x_{i,j}(k) \right]   (10)

x_i(k+1) = x_i(k) + v_i(k+1)   (11)
where j designates component in the search space; ω represents the inertia weight which
decreases linearly from 1 to near 0; c1, and c2 are cognitive and social parameters,
respectively, known as learning factors; and r1,j and r2,j are random numbers uniformly
distributed in the range [0, 1]. The inertia weight component causes the particle to
continue in the direction in which it was moving at iteration k. A large weight facilitates
global search, while a small one tends to facilitate fine tuning the current search area. The
cognitive term, associated with the experience of the particle, represents its previous best
position and provides a velocity component in this direction, whereas the social term
represents information about the best position of any particle in the neighborhood and
causes movement towards this particle. These two parameters are not critical for the convergence of PSO, but fine tuning may result in faster convergence of the algorithm and help avoid local minima. The r1,j and r2,j parameters are employed to maintain the diversity of the population.
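The velocity and position updates of Eqs. (10)-(11) can be sketched for a single particle as follows (an illustrative sketch; the parameter values and function names are assumptions):

```python
import numpy as np

def pso_step(x, v, y, y_hat, omega=0.7, c1=1.5, c2=1.5, seed=0):
    """One PSO update: x is the particle's position, v its velocity,
    y its personal best position and y_hat the neighborhood best."""
    rng = np.random.default_rng(seed)
    r1 = rng.random(x.shape)    # r1,j ~ U[0, 1]
    r2 = rng.random(x.shape)    # r2,j ~ U[0, 1]
    v_new = omega * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)  # Eq. (10)
    return x + v_new, v_new                                        # Eq. (11)
```

Note that when the particle sits at both best positions (y = y_hat = x) and its velocity has decayed, the update leaves it in place, which is how the swarm settles around an optimum.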
The PSO algorithm shares many similarities with evolutionary computation techniques such as GA. The PSO algorithm is also initialized with a randomly created population of potential solutions and uses fitness values to evaluate the population. Furthermore, both algorithms update the population and search for the optimum with random techniques. However, unlike GA, PSO does not have operators such as mutation and crossover, which exist in evolutionary algorithms. In the PSO algorithm, potential solutions (particles) move toward the actual optimum in the solution space by following their own experience and the current best particles. Compared with GA, PSO has some attractive characteristics, such as its memory, which enables it to retain knowledge of good solutions found by particles of the whole swarm; simultaneous search for an optimum in multiple dimensions; and a mechanism of constructive cooperation and information sharing between particles. Due to its simplicity, robustness, easy implementation, and quick convergence, the PSO method has been successfully applied to a wide range of applications. The focus of this study is to employ PSO for optimization of the weights and biases of the ANN model. The steps involved in the process of ANN training using PSO are shown in Table 4.
(i) Determine an objective function and algorithm parameters. Initialize the position and
velocities of a group of particles randomly.
(ii) Decode each particle in the current population into a set of connection weights and
construct a corresponding ANN.
(iii) Simulate ANN using current population and evaluate the ANN by computing its mean
square error between actual and target outputs.
(iv) Calculate the fitness value of each initialized particle in the population.
(v) Select and store best particle of the current particles.
(vi) Update the positions and velocities of all the particles and generate a group of new
particles.
(vii) Calculate the fitness value of each new particle and replace the worst particle with the stored best particle. If the current fitness is less than the local best fitness, set the current fitness as the local best fitness; if the current fitness is less than the global best fitness, set the current fitness as the global best fitness.
(viii) Repeat steps (iv) to (vii) until the solution converges.
(ix) Extract optimized weights.
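Steps (i)-(ix) can be sketched as a compact loop over a flat weight vector. This is an illustrative sketch with assumed parameter values; in the chapter's setting, `loss` would decode the vector into the ANN's weights and biases and return the mean square error between actual and target outputs:

```python
import numpy as np

def pso_train(loss, n_weights, n_particles=25, iterations=200,
              w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO minimizing `loss` over a weight vector."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, n_weights))  # (i) positions
    v = np.zeros_like(x)                                  # (i) velocities
    p_best = x.copy()                                     # personal bests
    p_fit = np.array([loss(p) for p in x])                # (ii)-(iv) fitness
    g_best = p_best[np.argmin(p_fit)].copy()              # (v) global best
    for _ in range(iterations):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # (vi)
        x = x + v
        fit = np.array([loss(p) for p in x])              # (vii) re-evaluate
        improved = fit < p_fit                            # (vii) update bests
        p_best[improved] = x[improved]
        p_fit[improved] = fit[improved]
        g_best = p_best[np.argmin(p_fit)].copy()          # (viii) loop
    return g_best                                         # (ix) extract
```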
Similar to the previous one, a careful parametric study has been carried out to
determine the set of optimal PSO parameters, where the value of one parameter is varied
at a time, while other parameters have fixed values. The optimization process takes place
with the values of cognitive acceleration, social acceleration, maximum number of
generations, and population size. The fitness value of a PSO solution is estimated based
on the mean absolute percentage error of each training data sample. The error of each set
of training data is the deviation of the result (cutting force components) of the PSO-based
ANN from that of the desired one. For main cutting force the optimal values of cognitive
acceleration, social acceleration, number of generations, and population size were 0.8,
1.6, 350, and 250, respectively. For feed force optimal values of these parameters were
0.4, 1.4, 270, and 250. Finally, for passive force the optimal values of cognitive
acceleration, social acceleration, number of generations, and population size were 0.5,
1.0, 340, and 240, respectively. The results of the parametric study for main cutting force
are shown in Figure 5.
testing data which were not used for the training process, shown in Table 2. In order to evaluate the performance of the developed ANN training methods, the predicted values of main cutting force, feed force and passive force were compared with the experimental data, as summarized in Table 5, Table 6 and Table 7, respectively. The mean absolute percentage errors for main cutting force, feed force and passive force of the BP-based ANN were 5.1%, 5.8% and 6.1%, respectively, which is considered a good agreement between the simulated outputs and the experimental results.
However, the optimal results obtained using the GA-based and PSO-based ANN models
are even more accurate. The mean absolute percentage errors of GA-based ANN model
for main cutting force, feed force and passive force were 3.8%, 5.3% and 4.2%,
respectively. Finally, mean absolute percentage errors of PSO-based ANN model were
3.8%, 3.7% and 3.8% for main cutting force, feed force and passive force, respectively.
Hence, the learning of ANN using bio-inspired algorithms has demonstrated
improvement in training average error as compared to the backpropagation algorithm.
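The mean absolute percentage error used as both the fitness and the evaluation measure can be computed as follows. This is a sketch; the force values below are illustrative and not taken from Tables 5-7.

```python
import numpy as np

def mape(measured, predicted):
    """Mean absolute percentage error between experimental and predicted forces."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((measured - predicted) / measured))

# Illustrative cutting-force values in newtons (not the chapter's data).
experimental = [520.0, 610.0, 480.0]
predicted    = [500.0, 630.0, 470.0]
print(round(mape(experimental, predicted), 2))   # → 3.07
```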
Figure 5. Results of parametric study for determination of optimal set of PSO parameters.
166 Djordje Cica and Davorin Kramar
Table 5. Comparison between predicted main cutting force and experimental results
Table 6. Comparison between predicted feed force and experimental results
CONCLUSION
In this study, three different ANN models for estimation of cutting force components
in turning of Inconel 718 under HPC conditions were developed. The considered process
parameters include diameter of the nozzle, distance between the impact point of the jet
and the cutting edge, pressure of the jet, cutting speed, and feed. First, cutting forces were
modeled by using conventional multilayer feed-forward ANN trained using a BP
algorithm. These models were found to predict the output with 94.9%, 94.2% and 93.9% accuracy for the main cutting force, feed force and passive force, respectively. These results indicate good agreement between the predicted and experimental values. However, due to the limitations of BP-based ANN, such as the risk of being trapped in local minima and a very slow convergence rate in training, an effort was made to apply two bio-inspired algorithms, namely GA and PSO, as training methods for the ANN. The results obtained indicated that the GA-based ANN can be successfully used for predicting the main cutting force, feed force and passive force, with 96.2%, 94.7% and 95.8% accuracy, respectively. The predicted results of the PSO-based ANN have an accuracy of 96.2%, 96.3% and 96.2% for the main cutting force, feed force and passive force, respectively. It is evident
that results obtained using the GA-based and PSO-based ANN models are more accurate
compared to BP-based ANN. However, PSO-based ANN model predicted cutting force
components with better accuracy compared to the GA-based ANN model. Hence, the
learning of ANN using bio-inspired algorithms can significantly improve the ANN
performance, not only in terms of precision, but also in terms of convergence speed. The
results showed that the GA-based and PSO-based ANN can be successfully and very
accurately applied for the modeling of cutting force components in turning under HPC
conditions.
REFERENCES
Alajmi, M. S. & Alfares, F. (2007). Prediction of cutting forces in turning process using
de-neural networks.
Aykut, Ş., Gölcü, M., Semiz, S. & Ergür, H. (2007). Modeling of cutting forces as a function of cutting parameters for face milling of Stellite 6 using an artificial neural network. Journal of Materials Processing Technology, 190(1), 199-203.
Cica, D., Sredanovic, B., Lakic-Globocki, G. & Kramar, D. (2013). Modeling of the
cutting forces in turning process using various methods of cooling and lubricating: an
artificial intelligence approach. Advances in Mechanical Engineering.
Eberhart, R. & Kennedy, J. (1995). A new optimizer using particle swarm theory. Paper presented at MHS'95, Proceedings of the Sixth International Symposium on Micro Machine and Human Science.
Ezugwu, E. & Bonney, J. (2004). Effect of high-pressure coolant supply when machining
nickel-base, Inconel 718, alloy with coated carbide tools. Journal of Materials
Processing Technology, 153, 1045-1050.
Ezugwu, E. & Bonney, J. (2005). Finish machining of nickel-base Inconel 718 alloy with
coated carbide tool under conventional and high-pressure coolant supplies. Tribology
Transactions, 48(1), 76-81.
Klocke, F., Sangermann, H., Krämer, A. & Lung, D. (2011). Influence of a high-pressure
lubricoolant supply on thermo-mechanical tool load and tool wear behaviour in the
turning of aerospace materials. Proceedings of the Institution of Mechanical
Engineers, Part B: Journal of Engineering Manufacture, 225(1), 52-61.
Kramar, D. & Kopac, J. (2009). High pressure cooling in the machining of hard-to-
machine materials. Journal of Mechanical Engineering, 55(11), 685-694.
Kramar, D., Sekulić, M., Jurković, Z. & Kopač, J. (2013). The machinability of nickel-
based alloys in high-pressure jet assisted (HPJA) turning. Metalurgija, 52(4), 512-
514.
Lin, W., Lee, B. & Wu, C. (2001). Modeling the surface roughness and cutting force for
turning. Journal of Materials Processing Technology, 108(3), 286-293.
Nandy, A., Gowrishankar, M. & Paul, S. (2009). Some studies on high-pressure cooling
in turning of Ti–6Al–4V. International Journal of Machine Tools and Manufacture,
49(2), 182-198.
Sharma, V. S., Dhiman, S., Sehgal, R. & Sharma, S. (2008). Estimation of cutting forces
and surface roughness for hard turning using neural networks. Journal of intelligent
Manufacturing, 19(4), 473-483.
Szecsi, T. (1999). Cutting force modeling using artificial neural networks. Journal of
Materials Processing Technology, 92, 344-349.
The Estimation of Cutting Forces in the Turning of Inconel 718 Assisted … 169
Wertheim, R., Rotberg, J. & Ber, A. (1992). Influence of high-pressure flushing through the rake face of the cutting tool. CIRP Annals - Manufacturing Technology, 41(1), 101-106.
Zuperl, U. & Cus, F. (2004). Tool cutting force modeling in ball-end milling using
multilevel perceptron. Journal of Materials Processing Technology, 153, 268-275.
AUTHORS’ BIOGRAPHIES
Dr. Djordje Cica is a professor in the Faculty of Mechanical Engineering at the University of Banja Luka. He has extensive experience in artificial intelligence applied to expert systems using bio-inspired algorithms and fuzzy logic.
Chapter 8

PREDICTIVE ANALYTICS USING GENETIC PROGRAMMING
ABSTRACT
INTRODUCTION
Advances in computing power and artificial intelligence (AI) techniques have opened doors that increase the level of complexity that problem solving can address. This has provided the environment for the emergence of a new analytics paradigm that tries to deal with continuously changing environments. This new paradigm focuses on
the ability to recognize change and react quickly. For example, advanced analytics uses
continuous data sampling to provide additional insights that further enhance strategic
decisions and may assist decision makers in identifying new business opportunities
and/or new relationships, which may also support innovation and creativity (Legarra et
al., 2016). One very important aspect is the ability to forecast future perceptions and
calculate the risk of potential outcomes. The incorporation of big data capabilities can
further enhance such approaches through rich data sources and computational capabilities
that provide additional insights across a value network and/or life cycle along with real
time identification and tracking of key factors. Although big data technologies currently exist, consensus on tools and techniques for managing and using big data to extract valuable insights is not well established (Gobble, 2013). Organizations are currently
trying to gain a better understanding of the new paradigm and the associated benefits
from the viewpoint of big data and advanced analytics. Complexity is always the issue.
Predictive analytics is one form of advanced analytics. Predictive analytics uses a
combination of data which may include historical, auxiliary, structured, and unstructured
data to forecast potential actions, performance, and developments. This form of advanced
analytics is considered more involved and technologically demanding than visual and
descriptive analytics. This is because predictive analytics involves statistical techniques,
AI techniques, OR/MS modeling, simulation, and/or hybrids of them to create predictive
models that quantify the likelihood of a particular outcome occurring in the future. In
addition, predictive analytics are part of systems which try to tame complexity.
Predictive analytics uses statistical techniques, AI and OR/MS modeling, simulation,
and/or hybrids. AI includes a large diverse universe of different types of techniques. The
traditional side of AI involves ontologies, semantics, expert systems, and reasoning. On
the other hand, the machine learning side of AI includes supervised, unsupervised and
reinforcement learning, including artificial neural networks, support vector machines,
deep learning, evolutionary algorithms (EAs) and other metaheuristics, and regression
trees.
Evolutionary algorithms are a family of optimization techniques inspired by natural evolution. Blum et al. (2012) stated that an EA “is an algorithm that simulates – at some level of abstraction – a Darwinian evolutionary system.” The most popular EAs are Genetic Algorithms (GAs), Genetic Programming (GP), Evolutionary Strategies (ES) and Evolutionary Programming (EP). GP is a very useful technique that has become dominant and well developed in the last twenty years, and it is generally applicable to a wide range of predictive analytics problems.
Predictive Analytics using Genetic Programming 173
Advanced analytics aims to provide the base necessary to handle complex problems in terms of scalability and the amount of data and sources (Chen & Zhang, 2014). Data analysis has become a new scientific paradigm alongside empirical, theoretical and computational science, and techniques and methodologies that meet these challenges are beneficial for handling complex problems (Chen & Zhang, 2014). A complex problem usually features several of the following characteristics:
Our experience working on and analyzing these problems has provided us with a more comprehensive methodology in which several models can be used with other types of empirical models in order to build predictive systems. Our methodology has been evolving through the years due to the technological trends mentioned above (i.e., computing power and new, more established AI techniques) and has the following steps (Rabelo, Marin, & Huddleston, 2010):
b. Experiments and Visits – The different experiments and the data must
be understood by the data science team. How do they relate to each
other? How was the equipment/surveys calibrated/designed? Who are the
owners of the data?
c. Organizational/Cultural/Political and the ecosystem – The problem
ecosystem must be investigated. Do the participants understand the
goals/objectives and procedures of the data science task? Is there an
institutional culture of sharing ideas, information, and data? Is top
management championing the data science team?
2. Gather Information from current databases/files and servers/clusters: This step is very important. Complex problems in large/global organizations have distributed databases, servers and other types of data and information repositories in different formats, on different computing/IT platforms, structured and unstructured, and with different levels of detail and accuracy.
3. Develop map of databases and clusters from the different points in the life-
cycle: It is important to have a clear picture and guidance of the different
variables, experiments and data available. A map of this is very important for
providing the flexibility to integrate different databases and clusters, and create
new ones. Enterprise data hubs and ontologies are very important (if budget and
sophistication of the project permit) to increase agility, capacity planning, and
interoperability.
4. Develop map of “models” (analytical and empirical) from the different points in the life-cycle: Usually, this step is totally forgotten in the data science task (it was difficult to find an article on data mining/data science with this philosophy). The traditional data miners go directly to the database to start
playing with the data and the variables. Not only are the results from experiments
very important for the data mining task but so are previously developed models
based on statistics, non-statistical techniques, finite element analysis, simulations,
and first principle models. These models have important information. We must
be able to explore their fusion with the predictive models to be developed by the
data science task.
5. Build databases from current ones (if required): Now that we know the goals/objectives of the different environments, we can create comprehensive databases with the relevant data and variables. Different procedures can be used to start preparing the data for the modeling efforts by the advanced analytics team.
6. Knowledge Discovery and Predictive Modeling: Develop the different models,
discovery of relationships, according to the goals/objectives of the data science
task. It is important to explore the information fusion of the different models
developed.
7. Deployment of the models developed: This not only includes the development
of a user interface but also includes the interpretation of the models’ answers in
the corresponding technical language. An integrity management plan must be
implemented with the appropriate documentation.
GENETIC PROGRAMMING
Evolutionary algorithms are search and optimization procedures that are motivated
by the principles of natural genetics and natural selection (Deb, 2001). This concept was
first developed during the 1970s by John Holland and his students at the University of Michigan, Ann Arbor (Deb, 1989). The goals of their research have been twofold: (1) to abstract and rigorously explain the adaptive processes of natural systems, and (2) to design artificial systems software that retains the important mechanics of natural selection (Goldberg, 1989). Eventually, this approach has led to important discoveries and
advancements in both natural and artificial systems science.
Over the last two decades, EAs have been extensively used as search and
optimization tools in various problem domains, including science, commerce and
engineering (Deb, 2001). These algorithms have been found to be very successful in
arriving at an optimal/near-optimal solution to complex optimization problems, where
traditional search techniques fail or converge to a local optimum solution. The primary
reasons for their success are their wide applicability, ease of use, and global dimension.
There are several variations of EAs. Blum et al. (2011) stated that a standard EA includes a set of principles and a common cycle. This set of principles is explained as follows:
EAs follow a cycle similar to the one depicted in Figure 1. An initial population is
built based on random creation of individuals with their respective chromosomes. Some
individuals of this initial population can be generated using metaheuristics and other
optimization engineering schemes. The population is mapped from the genetic
representation (i.e., chromosome instance) to a fitness based one (representation required
to be assessed by the environment). That means that the particular individual needs to be
represented in a different way to obtain the value of the objective function (as given by
the assessment process). For example, a chromosome instance (representing a particular
individual) can represent now a discrete-event simulation program that needs to be
executed to obtain the value of the objective function. If the performance criterion is met,
this cycle (i.e., evolution) stops. Otherwise, the evolutionary cycle continues with the
generation of the next population. That means that after the values of the objective
function are obtained for each member of the population, the fitness values are
determined in a relative manner. The mating pool is formed by the members which have
the best relative fitness.
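The cycle described above can be sketched generically. This is a minimal illustration under assumed choices (real-valued chromosomes, truncation selection for the mating pool, one-point crossover, Gaussian mutation, full population replacement), not a production EA:

```python
import random

def evolve(objective, dim=8, pop_size=40, generations=60, p_mut=0.2, seed=1):
    """One EA cycle per generation: assess, select the mating pool, reproduce."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    best_ever = min(pop, key=objective)
    for _ in range(generations):
        scored = sorted(pop, key=objective)        # assessment by the environment
        pool = scored[: pop_size // 2]             # mating pool: best relative fitness
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.sample(pool, 2)
            cut = rng.randrange(1, dim)            # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < p_mut:               # mutation with some probability
                i = rng.randrange(dim)
                child[i] += rng.gauss(0, 0.1)
            offspring.append(child)
        pop = offspring                            # population handling: full replacement
        gen_best = min(pop, key=objective)
        if objective(gen_best) < objective(best_ever):
            best_ever = gen_best                   # remember the best individual found
    return best_ever

# Toy usage: minimize a sphere function over an 8-gene chromosome.
best = evolve(lambda c: sum(x * x for x in c))
```

As the text notes, whether the whole population is replaced by the offspring (as here) or merged with it is a population handling strategy decision.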
176 Luis Rabelo, Edgar Gutierrez, Sayli Bhide et al.
Figure 1: Basic cycle of EAs.
The next step is reproduction where offspring are derived from the selected
individuals by applying the reproduction operations. There are usually three different
reproduction operations: 1.) mutation, which modifies with some probability the original
structure of a selected individual, 2.) reproduction (i.e., cloning of some individuals to
preserve features which contribute to higher fitness), and 3.) crossover, which combines
two chromosome instances in order to generate offspring. Blum et al. (2011) described
that “whether the whole population is replaced by the offspring or whether they are
integrated into the population as well as which individuals to recombine with each other
depends on the applied population handling strategy.”
The most popular EAs are Genetic Algorithms (GAs), Genetic Programming (GP),
Evolutionary Strategies (ES) and Evolutionary Programming (EP). The basic idea behind
GP is to allow a computer/machine to emulate what a software programmer does. The
software programmer develops a computer program based on objectives and gradual
upgrades. Langdon et al. (2010) stated that GP “does this by repeatedly combining pairs
of existing programs to produce new ones, and does so in a way as to ensure the new
programs are syntactically correct and executable. Progressive improvement is made by
testing each change and only keeping the better changes. Again this is similar to how
people program, however people exercise considerable skill and knowledge in choosing
where to change a program and how.” Unfortunately, GP does not have the knowledge
and intelligence to change and upgrade the computer programs. GP must rely on
gradients, trial and error, some level of syntactic knowledge, and chance.
Predictive Analytics using Genetic Programming 177
Figure 2: Two parental computer programs which are rooted with ordered branches.
The remainders are part of the parental programs not selected for crossover (Figure
4). The remainders are available in order to generate offspring. The first offspring can be
created by inserting the second parent’s crossover fragment into the first parent’s
remainder at the first parent’s crossover point (Figure 5). The second offspring can be
created by inserting the first parent’s crossover fragment into the second parent’s
remainder at the second parent’s crossover point (Figure 5).
Figure 4: Remainders.
The new computer programs (offspring) are displayed in Figure 5. The first one is X/(0.455Z) + X + 1.75, or, as a LISP S-expression, (+ (/ X (* 0.455 Z)) (+ X 1.75)). The second program is 0.25XY², or (* (* Y X) (* 0.25 Y)).
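The two offspring can be represented and executed directly; the nested tuples below stand in for the parse trees (an illustrative representation, not the internals of GenIQ or any particular GP system):

```python
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate(tree, env):
    """Recursively evaluate a parse tree written as (op, left, right) tuples."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    return env.get(tree, tree)      # either a variable name or a numeric constant

# The two offspring of Figure 5 as LISP-style S-expressions.
offspring1 = ('+', ('/', 'X', ('*', 0.455, 'Z')), ('+', 'X', 1.75))   # X/(0.455Z) + X + 1.75
offspring2 = ('*', ('*', 'Y', 'X'), ('*', 0.25, 'Y'))                 # 0.25XY^2

env = {'X': 2.0, 'Y': 3.0, 'Z': 1.0}
print(round(evaluate(offspring1, env), 4))   # → 8.1456
print(evaluate(offspring2, env))             # → 4.5
```

Because every subtree is itself a valid program, swapping subtrees between two such trees always yields syntactically correct, executable offspring, which is the property the crossover operation relies on.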
Figure 5: Offspring programs developed from the previous parental programs (Figure 2), subprograms
selected for crossover (Figure 3), and the remainders (Figure 4).
CASE STUDY
NASA's Space Shuttle was the first orbital spacecraft that was a reusable launch
vehicle. At launch, it consisted of the following major systems: external tank, solid rocket
boosters, and orbiter (Figure 8). After 2003, there were three orbiters: Atlantis, Discovery
and Endeavour. Discovery completed its final mission on March 9, 2011 and Endeavour
Figure 6: Generic process of Genetic Programming.
on June 1, 2011. The landing of Atlantis on July 21, 2011 marked the closing of the 30-
year program. The lessons learned and developments of this 30-year program will impact
future programs such as the one to go to Mars (Rabelo et al., 2011; Rabelo et al., 2012;
Rabelo et al., 2013).
Figure 8: The NASA Space Shuttle and its main components (NASA, 2005).
One of the most important systems in the Space Shuttle is the Thermal Protection
System (TPS). The TPS is made up of diverse materials “applied externally to the outer
structural skin of the orbiter to maintain the skin within acceptable temperatures,
primarily during the entry phase of the mission” (NASA, 2002). The TPS is built from
materials selected for stability at high temperatures and weight efficiency. Reinforced
carbon-carbon (RCC) is used on the wing leading edges; the nose cap, including an area immediately aft of the nose cap; and the immediate area around the forward orbiter/external tank structural attachment. RCC protects areas where temperatures
exceed 2,300 °F during re-entry (NASA, 2004).
The wing leading edges are one of the highest reentry heating areas. The wing
leading edges are composed of 22 panels (Figure 9). These panels are fabricated with
RCC. To begin fabrication of these RCC panels, a foundation of woven fabric is
positioned such that all plies are alternating in the 0 and 90 degree directions. During the
manufacturing process, silica is infused in the outer layers, and the resulting laminate is
heated in specialized reactors that have an inert environment to form a silicon-carbide
(SiC) coating (Gordon, 1998). The manufacturing process, the temperature profiles, and
the infusion rates can create cavities in the carbon-carbon substrate. Micro-cracks in the
SiC coating can be also created. These substrate cavities and coating micro-cracks result
in a material with complex behavior (a tough-brittle material behavior with plasticity -
Figure 10). This needs to be emphasized due to the extreme environment and conditions
to be experienced during the re-entry phase of the orbiter.
Figure 9: The left wing of the NASA Space Shuttle with Reinforced Carbon-Carbon Panels (NASA,
2006). The only panels numbered in the picture are those panels numbered 1 through 10, 16 and 17.
There are 22 RCC panels on each wing's leading edge.
The manufacturing lead time of RCC panels is almost 8 months and their cost is high due to the sophistication of the labor and the manufacturing equipment. It is an engineered-to-order process. It would therefore be valuable to know the health and useful life of an RCC panel: a predictive system can provide a future outcome of overhaul or disposal. NASA developed several Non-Destructive Evaluation (NDE) methods to measure the health of the RCC materials, such as advanced digital radiography, thermography, high resolution computed digital tomography, advanced eddy current systems, and advanced ultrasound (Madaras et al., 2005; Lyle & Fasanella, 2009). Of those, thermography is the favorite because it is easy to implement in the orbiter’s servicing environment in the Orbiter Processing Facility (OPF), it is non-contacting, it is applied from one side, and it measures the health of the RCC panel (Cramer et al., 2006). This NDE method can be performed between flights. In addition, this information can be fed to a predictive system.
In 2008, 2009, 2010, and 2011, NASA assembled a Tiger Team to study potential issues with the shuttle’s Reinforced Carbon-Carbon (RCC) leading-edge panels (Dale, 2008). The Tiger Team’s investigation generated huge amounts of structured and unstructured data on the RCC panels. This big data could be used with different methodologies to build analysis and predictor models. One of the methodologies studied was GP.
We will explain in more detail step 6 of the framework outlined in the section Complexity and Predictive Analytics, assuming that steps 1 – 5 have been completed successfully (an effort that can take several months for this case study).
Input engineering is the investigation of the most important predictors. It has different phases, such as attribute selection, which selects the most relevant attributes by removing redundant and/or irrelevant ones. This leads to simpler models that are easier to interpret and to which some structural knowledge can be added. Different filters can be used, each with its own objective, such as:
Information Gain
Gain ratio
Correlation: high correlation with the class attribute; low correlation with other attributes
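As an illustration of the first filter, information gain measures the reduction in class entropy obtained by splitting on an attribute. This is a sketch on toy data, not the RCC dataset:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attribute, labels):
    """Reduction in class entropy after splitting on `attribute`."""
    n = len(labels)
    split = {}
    for a, y in zip(attribute, labels):
        split.setdefault(a, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

# Toy example: the attribute perfectly predicts the class,
# so the gain equals the full class entropy (1 bit here).
attr  = ['a', 'a', 'b', 'b']
label = ['worn', 'worn', 'ok', 'ok']
print(information_gain(attr, label))   # → 1.0
```

Attributes would then be ranked by gain (or by gain ratio, which normalizes for attributes with many distinct values) and the lowest-ranked ones removed.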
Another important decision is whether to select individual attributes or subsets of them, along with the direction of the search (e.g., best first, forward selection). For the RCC problem, the selected approach was a model of models. A very important issue is to look for kernels, levels of interaction, and synthetic attributes.
Visualization is always important (there are many tools available for it). We learned from the visualizations that the relative location of the panel and the position of a specific point within the area of a panel are important factors for differentiating the level of wear and deterioration (Figure 11).
Attribute subset evaluators and cross-validation were used with best-first and backward search (starting from the complete set) using neural networks (backpropagation). This was performed to better understand the data.
Figure 11: Visualization of the average deterioration of specific panels for the three NASA shuttles.
Synthetic Attributes
Synthetic attributes are combinations of single attributes that are able to contribute to the performance of a predictor model. A synthetic attribute creates a higher dimensional feature space, which supports better classification performance. For example, Cosine(X · Y²) is a synthetic variable formed from the single attributes X and Y. Therefore, GP can contribute not only a complete solution but also synthetic attributes.
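Creating such a synthetic attribute from single attributes is straightforward; the values below are illustrative (not the RCC data), using the Cosine(X · Y²) example:

```python
import numpy as np

# Illustrative single attributes (not the RCC data).
X = np.array([0.1, 0.5, 1.2])
Y = np.array([2.0, 1.0, 0.3])

# One synthetic attribute combining X and Y into a higher-order feature.
synthetic = np.cos(X * Y ** 2)
print(synthetic.round(3))   # → [0.921 0.878 0.994]
```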
Deciles
The historical data is randomly split into two groups: one to build the model and the other to test and confirm the accuracy of the prediction model. This two-group approach can be used with a variety of AI algorithms to find the best set of predictors.
The majority of machine learning schemes use the confusion matrix to measure performance on the test data. The confusion matrix counts the number of “individuals” for which the prediction was accurate. With the decile table, on the other hand, it is possible to identify the specific individuals with the best performance. The decile table measures the accuracy of a predictive model versus a prediction without modeling (Ratner, 2011).
The decile table is used to score the test sample on a scale of 1 to 100, based on the characteristics identified by the algorithm and the problem context. The list of individuals in the test sample is then rank-ordered by score and split into 10 groups, called deciles. The top 10 percent of scores is decile one, the next 10 percent is decile two, and so forth. The deciles separate and order the individuals on an ordinal scale, and each decile contains 10% of the total test sample. The actual number of responses in each decile is then listed, after which other analyses, such as response rate, cumulative response rate, and predictability (based on the cumulative response rate), can be performed. The performance in each decile can be used as an objective function for machine learning algorithms.
The GenIQ System (Ratner, 2008; 2009), based on GP, is utilized to provide
predictive models. GenIQ lets the data define the model, performs variable selection, and
then specifies the model equation.
The GenIQ System develops the model by performing generations of models so as to
optimize the decile table. As explained by Ratner (2008), “Operationally, optimizing the decile table is creating the best possible descending ranking of the target variable (outcome) values. Thus, GenIQ’s prediction is that of identifying individuals, who are most-likely to least-likely to respond (for a binary outcome), or who contribute large profits to small profits (for a continuous outcome).”
We decided to use a file with information about thermography and some selected flights of Atlantis, Discovery, and Endeavour from the different databases available in this project (on the order of petabytes). The data set was split into two separate sets: one for training and the other for validation. The objective was to predict when to overhaul the respective RCC panel.
Table 1 shows the decile table for the 8,700 examples of the validation dataset (with 24 input parameters). There are 870 examples in each decile, as shown in the first column. The second column shows the predicted responses in each decile, and the third column is the predicted response rate in %. The fourth column is the cumulative response rate, starting from the top decile. For example, for the top decile it is 856 divided by 870, while for the second decile it is 856 plus 793 (1,649) divided by 870 plus 870 (1,740). The fifth column compares each decile with the bottom one. For example, the value of 1.32 for the top decile tells us that the model predicts 1.32 times better than an answer provided by no model (i.e., at random). The value of 1.32 is obtained by dividing the predicted response rate of the top decile (98%) by the predicted response rate of the bottom decile (74%). That is the predictability.
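The arithmetic above can be checked directly. Only the counts quoted in the text (856, 793, 870 per decile, and the 74% bottom-decile rate) are used here, since the full table appears in Table 1:

```python
decile_size = 870                  # 8,700 validation examples across 10 deciles
responses_top, responses_2nd = 856, 793

rate_top = responses_top / decile_size                               # ≈ 98%
cum_rate_2nd = (responses_top + responses_2nd) / (2 * decile_size)   # 1,649 / 1,740
lift_top = round(round(100 * rate_top) / 74, 2)                      # 98% / 74%

print(round(100 * rate_top), round(100 * cum_rate_2nd, 1), lift_top)  # → 98 94.8 1.32
```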
Figure 12 shows the bar graph of the predicted responses. It is relatively flat (e.g., the predicted response of the 4th decile is greater than that of the 3rd decile), and the bars have roughly the same height for the first 5 deciles. Therefore, the model has moderate performance (74% in the validation set).
Figure 12: Predicted responses for each decile (from top to bottom).
The GenIQ Response Model Tree in Figure 13 reflects the best model of the decile table shown in Table 1. The model is represented using a tree structure. The output of the GenIQ Model is two-fold (Ratner, 2008): a graph known as a parse tree (as in Figure 13) and the corresponding model equation. A parse tree is comprised of variables, which are connected to other variables by functions (e.g., arithmetic {+, -, /, x}, trigonometric {sine, tangent, cosine}, Boolean {and, or, xor}). In this case, it is a model to predict when to do the overhaul. This model was very simple, and its performance on the validation set (74%) was comparable to that of other models using neural networks trained with the backpropagation paradigm.
Figure 13: Example of one of the earlier GP Models developed to calibrate the genetic process and the
generation of specific data. The model tries to predict when to do the overhaul.
After this moderate performance, the emphasis shifted to synthetic variables to be used with neural networks. It was decided to develop a synthetic variable denominated the Quality Index (the value obtained from thermography). This synthetic variable is displayed in Figure 14, and the GenIQ Response Model computer code (model equation) is shown in Figure 15. This model can be deployed on any hardware/software system.
Figure 14: Example of one of the basic Genetic Programming Models developed to determine the
Quality Index of the composite materials as a synthetic attribute.
CONCLUSION
Our experience working with complex problems, incomplete data, and high noise levels has provided us with a more comprehensive methodology where machine learning base-models can be used with other types of empirical and exact models. Data
science is very popular in the marketing domain where first-principle models are not
common. However, the next frontier of big data analytics is to use information fusion -
also known as multi-source data fusion (Sala-Diakanda, Sepulveda & Rabelo, 2010). Hall
and Llinas (1997) define data fusion as “a formal framework in which are expressed
means and tools for the alliance of data originating from different sources, with the aim
of obtaining information of greater quality”. Information fusion is going to be very
important to create predictive models for complex problems. AI paradigms such as GP,
are a philosophy of the “data fits the model.” This viewpoint has many advantages for
automatic programming and the future of predictive analytics.
As future research, we propose combining GP concepts with operations research and
operations management techniques to develop methodologies where the data guides
model creation to support prescriptive analytics (Bertsimas & Kallus, 2014). As shown in
this chapter, these methodologies are applicable to decision problems. In addition, there
is a current tendency in the prescriptive analytics community to find and use better
metrics to measure the efficiency of models beyond the confusion matrix or decile tables.
Another important point for engineered systems is the utilization of model-based systems
engineering. SysML can be combined with ontologies in order to develop better GP
models (Rabelo & Clark, 2015). One point is clear: GP has the potential to be superior to
regression/classification trees because GP’s operator set includes, and goes beyond, the
operators used by regression/classification trees.
ACKNOWLEDGMENTS
We would like to give thanks to Dr. Bruce Ratner. Bruce provided the GenIQ Model
for this project (www.GenIQModel.com). In addition, we would like to give thanks to the
NASA Kennedy Space Center (KSC). KSC is the best place to learn about complexity.
The views expressed in this paper are solely those of the authors and do not
necessarily reflect the views of NASA.
REFERENCES
Bertsimas, D., & Kallus, N. (2014). From predictive to prescriptive analytics. arXiv
preprint arXiv:1402.5481.
Blum, C., Chiong, R., Clerc, M., De Jong, K., Michalewicz, Z., Neri, F., & Weise, T.
2011. Evolutionary optimization. In Chiong, R., Weise, T., & Michalewicz, Z. (eds.),
Variants of Evolutionary Algorithms for Real-World Applications. Berlin/Heidelberg:
Springer-Verlag, 1–29.
Chen, C. & Zhang, C. 2014. Data-intensive applications, challenges, techniques and
technologies: a survey on big data. Information Sciences, 275, 314–347.
Cramer, E., Winfree, W., Hodges, K., Koshti, A., Ryan, D. & Reinhart, W. 2006. Status
of thermal NDT of space shuttle materials at NASA. Proceedings of SPIE, the
International Society for Optical Engineering, 17-20 April, Kissimmee, Florida.
Dale, R. (2008, July 23). RCC investigation: Tiger Team reveals preliminary findings.
Retrieved from https://www.nasaspaceflight.com.
Deb, K. 2001. Multi-objective optimization using evolutionary algorithms. Hoboken, NJ:
John Wiley & Sons.
Frawley, W., Piatetsky-Shapiro, G., & Matheus, C. 1992.
Knowledge Discovery in Databases: An Overview. AI Magazine, 13(3), 213–228.
Gobble, M. 2013. Big Data: The Next Big Thing in Innovation. Research Technology
Management, 56(1): 64-66.
Goldberg, D. 1989. Genetic algorithms in search, optimization, and machine learning.
Boston, MA: Addison-Wesley Professional.
Gordon, M. 1998. Leading Edge Structural Subsystem and Reinforced Carbon-Carbon
Reference Manual. Boeing Document KLO-98-008.
Hall, D. & Llinas, J. 1997. An introduction to multisensor data fusion. Proceedings of the
IEEE, 85(1), 6-23.
Holland, J. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor, MI:
University of Michigan Press.
Koza, J. 1994. Genetic Programming II: Automatic Discovery of Reusable Programs.
Cambridge, MA: MIT Press.
Koza, J., Bennett, F.H., Andre, D., & Keane, M. 1999. Genetic Programming III:
Darwinian Invention and Problem Solving. San Francisco, CA: Morgan Kaufmann.
Koza, J., Keane, M.A., Streeter, M., Mydlowec, W., Yu, J., & Lanza, G. 2003. Genetic
Programming IV: Routine Human-Competitive Machine Intelligence. Norwell, MA:
Kluwer Academic Publishers.
NASA. 2002. Thermal Protection System. Retrieved from
https://spaceflight.nasa.gov/shuttle.
NASA. 2004. Report of Columbia Accident Investigation Board: Chapter 1. Retrieved
from http://caib.nasa.gov/news/report/volume1/chapters.html.
NASA. 2005. Space Shuttle Basics. Retrieved from https://spaceflight.nasa.gov/shuttle.
NASA. 2006. Shuttle Left Wing Cutaway Diagrams. Retrieved from
https://www.nasa.gov/.
NASA. 2008. Reinforced Carbon-Carbon. Retrieved from
https://www.nasa.gov/centers/glenn.
Predictive Analytics using Genetic Programming 191
Ratner, B. 2011. Statistical and Machine-Learning Data Mining: Techniques for Better
Predictive Modeling and Analysis of Big Data. 2nd Edition. Boca Raton, Florida:
CRC Press.
Sala-Diakanda, S., Sepulveda, J., & Rabelo, L. 2010. An information fusion-based metric
for space launch range safety. Information Fusion Journal, 11(4), 365–373.
Stockwell, A. 2005. The influence of model complexity on the impact response of a
shuttle leading-edge panel finite element simulation. NASA/CR-2005-213535, March
2005.
Witten, I. & Frank, E. 2005. Data Mining: Practical Machine Learning Tools and
Techniques (Second Edition). San Francisco, CA: Morgan Kaufmann Publishers.
AUTHORS’ BIOGRAPHIES
Dr. Luis Rabelo was the NASA EPSCoR Agency Project Manager and is currently a
Professor in the Department of Industrial Engineering and Management Systems at the
University of Central Florida. He received dual degrees in Electrical and Mechanical
Engineering from the Technological University of Panama and Master’s degrees from the
Florida Institute of Technology in Electrical Engineering (1987) and the University of
Missouri-Rolla in Engineering Management (1988). He received a Ph.D. in Engineering
Management from the University of Missouri-Rolla in 1990, where he also did Post-
Doctoral work in Nuclear Engineering in 1990-1991. In addition, he holds a dual MS
degree in Systems Engineering & Management from the Massachusetts Institute of
Technology (MIT). He has over 280 publications and three international patents being
utilized in the aerospace industry, and has graduated 40 Master’s and 34 Doctoral
students as advisor/co-advisor.
Dr. Sayli Bhide is a researcher in virtual simulation and safety. She received a PhD in
Industrial Engineering from the University of Central Florida (UCF) in 2017. She
completed an M.S. in Engineering Management in 2014 from UCF and a B.S. in
Electronics Engineering from the University of Mumbai, India, in 2009. She has work
experience in
software engineering in a multinational software company. Her research interests include
health and safety, modeling and simulation, ergonomics and data analytics.
Chapter 9
ABSTRACT
demand, patient complexity, staffing level, clinician workload, and boarding status when
defining the crowding level. The hierarchical fuzzy logic approach is utilized to
accomplish the goals of this framework by combining a diverse pool of healthcare expert
perspectives while addressing the complexity of the overcrowding issue.
INTRODUCTION
The demand for healthcare services continues to grow, and lack of access to care
services has become a dilemma due to the limited capacity and inefficient use of
resources in healthcare (Bellow & Gillespie, 2014). This supply-demand imbalance and
the resulting access block are causing overcrowding in healthcare facilities, among them
emergency departments. These essential healthcare centers serve as a hospital’s front
door and provide emergency care services to patients regardless of their ability to pay.
According to the American Hospital Association (AHA) annual survey, the visits to
emergency departments in the USA exceeded 130 million in 2011 (AHA, 2014). In Saudi
Arabia, the Ministry of Health (MoH) reported nearly 21 million visits in 2012 (MOH,
2014). With this massive demand on emergency care services, emergency departments
mostly operate over capacity and sometimes report ambulance diversion.
When ED crowding started to become a serious problem, a need appeared to quantify
the problem to offer support in making emergency care operational decisions (Johnson &
Winkelman, 2011). As a result, four ED crowding measurement scales were developed:
Real-time Emergency Analysis of Demand Indicators (READI) (Reeder &
Garrison, 2001), Emergency Department Work Index (EDWIN) (Bernstein, Verghese,
Leung, Lunney, & Perez, 2003), National Emergency Department Overcrowding Score
(NEDOCS) (Weiss et al., 2004), and Work Score (Epstein & Tian, 2006). However,
many criticized the reliability, reproducibility, and validity of these crowding
measurement scales when implemented in emergency settings outside of the regions they
were originally developed in. Moreover, their efficiency has been a concern, especially
with regard to their sole dependence on emergency physicians’ and nurses’
perspectives.
Currently, ED crowding has become a serious issue in many healthcare organizations
which affects both operational and clinical aspects of emergency care systems (Eitel,
Rudkin, Malvehy, Killeen, & Pines, 2010; Epstein et al., 2012). To evaluate such an
issue, healthcare decision makers should be provided with a robust quantitative tool that
measures the problem and aids in ED operational decision making (Hwang et al., 2011).
To achieve this, the proposed study aims to develop a quantitative measurement tool for
evaluating ED crowding that captures healthcare experts’ opinions and other ED
Managing Overcrowding in Healthcare using Fuzzy Logic 197
FRAMEWORK DEVELOPMENT
Hierarchical fuzzy systems (HFSs) are implemented by researchers for two main
purposes. First, they help in minimizing the total number of fuzzy rules in the knowledge
base which feed into the fuzzy inference engine. Second, the HFSs are effective in
building the logical relationship among different crisp input variables in complex
systems, unlike Standard Fuzzy Systems (SFSs), which become exponentially
complicated as the number of variables and their fuzzy sets’ levels increase. Figure 2 and
Figure 3 illustrate the difference between applying the traditional standard fuzzy logic
approach versus the hierarchical fuzzy logic approach to construct and determine
the relationship between a fuzzy subsystem’s crisp outputs and the main fuzzy system,
where On stands for the crisp output of fuzzy subsystem n, and Of stands for the crisp
output of the main fuzzy system [7]. In SFSs, the total number of fuzzy rules grows
exponentially with the number of crisp inputs, whereas in HFSs it grows linearly. For
instance, supposing that there are five crisp variables, and each variable encompasses
five fuzzy sets, then utilizing an SFS, the total number of fuzzy rules for the whole fuzzy
system is 5⁵ = 3,125 rules, whereas in a four-level HFS with four fuzzy subsystems, each
encompassing two crisp inputs, the total number of fuzzy rules for the complete fuzzy
system is 4 × 5² = 100 rules. It is clear that utilizing HFSs
significantly reduces the total number of fuzzy rules necessary to construct the
knowledge bases for the whole fuzzy system. Thus, utilizing HFSs in this study makes it
possible to analyze the complicated nature of emergency health care systems, which if
studied through SFSs, could involve too many fuzzy rules and computations for an
effective analysis. It is also notable that using the HFS detailed in Figure 3 will help in
determining the relationship between outputs of the fuzzy subsystems and the main fuzzy
system, and in specifying the relationships among fuzzy subsystems as well.
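The rule-count arithmetic above can be sketched in a few lines. This is an illustrative sketch of the general counting argument, not code from the study:

```python
# Sketch of the rule-count comparison described above: rules grow as the
# product of fuzzy-set counts in an SFS, but as a sum of small per-subsystem
# products in an HFS.
from math import prod

def sfs_rule_count(sets_per_input):
    """One rule per combination of all input fuzzy sets (Cartesian product)."""
    return prod(sets_per_input)

def hfs_rule_count(subsystems):
    """Each subsystem contributes its own small product of fuzzy-set counts."""
    return sum(prod(inputs) for inputs in subsystems)

sfs = sfs_rule_count([5] * 5)        # five inputs, five fuzzy sets each: 3125
hfs = hfs_rule_count([[5, 5]] * 4)   # four chained two-input subsystems: 100
```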
Figure 5: Three-level hierarchical fuzzy expert system.
Figure 5 further illustrates the relation of these inputs to the proposed fuzzy logic
system. Level one of the hierarchical fuzzy expert system contains two fuzzy subsystems.
The first fuzzy subsystem aims to assess the ED’s demand status by evaluating the
ratio of patients in an ED waiting area to that emergency room’s capacity, and the
average patient complexity. Figure 6 illustrates the components of fuzzy subsystem I. The
first input to the fuzzy subsystem I is the ratio of waiting patients to ED capacity which is
characterized by four fuzzy membership functions: “Low”, “Medium”, “High”, and
“Very High”. To assess this input variable, trapezoidal functions are utilized to evaluate
the membership degree on an interval [0, 2]. The patient complexity, the second input to
the fuzzy subsystem I, is represented by three membership functions: “Low”, “Medium”,
and “High”. Similarly, a trapezoidal function is used for this input, evaluating the
membership degree on the interval [1, 5], which is adapted from the five levels of the
emergency severity index (Gilboy, Tanabe, Travers, Rosenau, & Eitel, 2005). Given
these fuzzy classes, the total number of fuzzy rules from this subsystem will be 12 fuzzy
rules (4×3). The output of fuzzy subsystem I is ED’s demand status, which is represented
by five membership functions: “Very Low”, “Low”, “Medium”, “High”, and “Very
High”. This output is evaluated with a triangular function for the interval [0, 100]. The
demand status is an intermediate variable rather than a final indicator, which feeds the
fourth and final fuzzy subsystem with a crisp value, to contribute to the final assessment
of the ED’s crowding level.
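The trapezoidal evaluation used throughout these subsystems can be sketched as follows. The breakpoints below are illustrative placeholders, not the elicited values reported later in the chapter:

```python
# Sketch of trapezoidal membership evaluation for the first input of
# subsystem I (ratio of waiting patients to ED capacity, on [0, 2]).
# The class breakpoints here are hypothetical, for illustration only.

def trapezoid(x, a, b, c, d):
    """Membership degree of x for a trapezoid with support [a, d], core [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Illustrative fuzzy classes for the waiting-patients ratio:
classes = {
    "Low":       (-1.0, 0.0, 0.2, 0.5),   # left shoulder
    "Medium":    (0.2, 0.4, 0.6, 0.8),
    "High":      (0.6, 0.8, 1.0, 1.2),
    "Very High": (0.92, 1.3, 2.0, 3.0),   # right shoulder
}

ratio = 0.3   # e.g., 15 waiting patients in a 50-bed ED
degrees = {name: trapezoid(ratio, *p) for name, p in classes.items()}
```

Note how a crisp ratio of 0.3 receives partial membership in both "Low" and "Medium", which is exactly what feeds the 4 × 3 rule base of subsystem I.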
The second fuzzy logic subsystem, with two inputs and one output, is designed to
determine the level of ED staffing. Figure 7 presents the components of fuzzy subsystem
II. ED staffing status is subjective in nature and the membership functions that represent
this aspect of crowding reflect this subjectivity based on the knowledge from the health
care experts. The two inputs of this fuzzy subsystem are the level of ED physician
staffing and ED nurse staffing. Both inputs are represented by three membership
functions: “Inadequate”, “Partially adequate”, and “Adequate”, which are assessed on the
intervals [0, 0.32] and [0, 0.5], respectively, with trapezoidal functions. With these
membership functions, the total number of fuzzy rules in this subsystem will be 9 rules
(3²). The output of fuzzy subsystem II is the ED staffing status. The output is
represented by the same three membership functions: “Inadequate”, “Partially adequate”,
and “Adequate”, and is evaluated on a trapezoidal function with the interval [0, 100]. The
ED staffing status is an intermediate variable that feeds the third fuzzy subsystem with a
crisp value, which will serve as another variable for the assessment of the ED workload.
Finally, the ED workload will feed into the fourth fuzzy subsystem.
The third fuzzy logic subsystem evaluates the ED workload. The three inputs of this
fuzzy subsystem are ED staffing level, ER occupancy rate, and average complexity of
patients who are being treated in the emergency room. It should be noted that the third input
shares the same characteristics as the second input of subsystem I, with the difference
being that the populations of these similar inputs are separate. Figure 8 illustrates the
components of fuzzy subsystem III. The ED staffing status, input one, is the output from
subsystem II, and is represented by three membership functions: “Inadequate”, “Partially
adequate”, and “Adequate”. Using the same membership function, this input is evaluated with
a trapezoidal function on the interval [0, 100]. The ER occupancy rate, which is an
independent input, is characterized by four membership functions: “Low”, “Medium”,
“High”, and “Very High”. The occupancy rate is evaluated with a trapezoidal function in the
interval [0, 100]. The third input, patient complexity shares characteristics from the second
input to the fuzzy subsystem I, as previously mentioned. Therefore, this third input is
represented by three membership functions: “Low”, “Medium”, and “High”, and is evaluated
with a trapezoidal function in the interval [1, 5]. With the three sets of membership indicators
in this subsystem, the number of fuzzy rules will now reach 36 (4 × 3²). The single output
of the third fuzzy logic subsystem is the ED workload. It is represented by four membership
functions: “Low”, “Medium”, “High”, and “Very High”.
202 Abdulrahman Albar, Ahmad Elshennawy, Mohammed Basingab et al.
Like the other outputs, it is evaluated on the interval [0, 100], and its membership value
is assessed with a triangular function. The ED workload is an intermediate variable that feeds
the fourth fuzzy subsystem, and represents a major determinant of crowding by encompassing
four of the seven inputs on its own. Combined with the output of subsystem I and the final input,
the output of subsystem III will contribute to subsystem IV’s assessment of emergency
department crowding.
In review, the first level of the hierarchical fuzzy expert system was composed of two
fuzzy logic subsystems, with the second level containing one subsystem, which is also
detailed in Figure 5. Level three of the hierarchical fuzzy expert system contains the fourth
and final fuzzy logic subsystem, which receives inputs in some manner from every previous
subsystem.
This fourth fuzzy logic subsystem is the main component of this hierarchical fuzzy expert
system, which aims to assess the ED crowding level. The three inputs of this fuzzy subsystem
are the two previously derived indicators, ED demand status and ED workload, and a third,
new input, ED boarding status, which is the seventh independent input of the entire
hierarchical system. The components of fuzzy subsystem IV are illustrated in Figure 9. The
first input to this subsystem, the ED demand status, as previously described, is represented by
five triangular membership functions: “Very Low”, “Low”, “Medium”, “High”, and “Very
High”, with an interval of [0, 100]. The second input, the ED workload is represented by four
triangular membership functions: “Low”, “Medium”, “High”, and “Very High”. Its interval of
the crisp value is [0,100]. The third input, ED boarding status, is an independent variable,
which is derived from the ratio of boarded patients to the capacity of the emergency room.
This input has the same four fuzzy classes as the second input, but is evaluated with a trapezoidal
membership function on an interval of [0, 0.4]. With the three sets of membership indicators
in this subsystem, the number of fuzzy rules is 80 (5 × 4²). The output of the fourth fuzzy logic
subsystem is the ED crowding level, and is the final output for the entire hierarchical system.
It is represented by five membership functions: “Insignificant”, “Low”, “Medium”, “High”,
and “Extreme”, which are used to indicate the degree of crowding in emergency departments.
Like the other outputs, the final output has a crisp-value interval of [0, 100] and is
evaluated with a triangular function.
Utilizing the hierarchical fuzzy system appears to be the most appropriate approach for
this study, rather than the standard fuzzy system. This approach creates different indicators,
such as demand status, workload, and staffing indicators, while reducing the total number of
fuzzy rules from 5,184 (under the standard fuzzy system) to just 137. This difference
represents a great reduction in computation, simplifies the process of acquiring knowledge
from experts, and potentially lowers the barrier to obtaining meaningful results.
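The 5,184 versus 137 comparison follows directly from the fuzzy-set counts given above, as this short sketch (not code from the study) shows:

```python
# Sketch reproducing the rule-count comparison for this system, using the
# fuzzy-set counts stated in the text for the seven independent inputs.
from math import prod

# waiting ratio (4), waiting-patient complexity (3), physician staffing (3),
# nurse staffing (3), occupancy rate (4), treated-patient complexity (3),
# boarding status (4)
independent_inputs = [4, 3, 3, 3, 4, 3, 4]
sfs_rules = prod(independent_inputs)               # one flat rule base

# Hierarchy: I (4 x 3), II (3 x 3), III (3 x 4 x 3), IV (5 x 4 x 4)
subsystems = [[4, 3], [3, 3], [3, 4, 3], [5, 4, 4]]
hfs_rules = sum(prod(s) for s in subsystems)       # 12 + 9 + 36 + 80
```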
Figure 8: Fuzzy logic subsystem III.
Figure 9: Fuzzy logic subsystem IV.
This section describes the technical process of developing the proposed fuzzy expert
system, which would equip the designed framework with a knowledge base, a fuzzy
inference engine, a fuzzifier, and a defuzzifier. The knowledge base consists of a fuzzy
database and a fuzzy rule base, in order to fuel the fuzzifier, defuzzifier, and inference
engine portions of the fuzzy subsystems.
First, the elicitation of expert knowledge for building the fuzzy database is described.
Second, the process of developing fuzzy rules is described. Finally, the
fuzzification and the defuzzification processes are conceptually and mathematically
represented.
Knowledge Base
• The expert works or has recently worked in Saudi Arabian healthcare institutions
for at least five years, or has conducted research in the field of Saudi healthcare.
• The expert has deep experience in the daily operations of emergency care
centers.
• The expert has solid knowledge in staffing, performance management, healthcare
administration, patient flow analysis, and bed management.
To create a robust knowledge base for the proposed fuzzy system, a minimum of ten
experts who meet these qualifications are required. While discussing these experts here,
for the purposes of analyzing their data, and elsewhere in this study, an assigned code
“HCE-k” will be issued to each participating expert, where HCE stands for Healthcare
Expert, and k stands for the expert number.
Database
This study adopts the indirect interval estimation elicitation method. Such a method
carries advantages such as allowing responses from multiple subject matter experts, while
not requiring knowledge of membership functions. Additionally, under this approach,
fewer questions may be used, and given questions may be easier to answer than those in
other approaches. To elicit the degrees of membership for a fuzzy class, let [x_ij^L, x_ij^U]
represent the lower and upper interval values of the fuzzy class j determined by expert i. The steps
to elicit and analyze expert knowledge are described as follows:
The fuzzy rule base is the other key part of the knowledge base, alongside the
database. It stores all derived fuzzy rules and is intended to provide the fuzzy
inference engine with decision-support information within each subsystem. To robustly
create fuzzy rules for each fuzzy logic subsystem, experts are given a form to assess the
consequences of each condition statement, developed from the permutation of each fuzzy
class for a given fuzzy subsystem. A total of 10 healthcare experts will participate in the
fuzzy rules assessment process. The total number of fuzzy rules to be evaluated by
subject matter experts for the fuzzy logic subsystems I, II, III, and IV are 12 (4 × 3),
9 (3²), 36 (4 × 3²), and 80 (5 × 4²), respectively. Therefore, the proposed three-level
hierarchical fuzzy expert system includes a total of 137 fuzzy rules, meaning that there
will be a total of 1370 fuzzy rule assessments from the ten experts. The process of
developing the fuzzy rules is detailed in the following steps:
• List all possible permutations of “AND” rules for each fuzzy logic subsystem.
• Code each rule with “FLSm-n”, where FLS stands for Fuzzy Logic Subsystem, m
stands for the subsystem number, and n stands for the rule number within
subsystem m.
• Code “HCE-k” for each participating expert, where HCE stands for Healthcare
Expert, and k stands for the expert number.
• Expert HCE-k determines the consequence of the fuzzy conditional statement
FLSm-n based on their expertise.
• The fuzzy conditional statement FLSm-n must meet a 50% consensus rate among
experts, and must be the only consequence to receive a 50% consensus rate, to be
accepted as a valid fuzzy rule.
• If the consensus rate does not meet the determined criteria, further iterations should
be conducted with a new expert until the consensus rate achieves the criteria in the
previous step.
The process for developing fuzzy rules is illustrated in Figure 10, where the
consensus feedback is elaborated upon in more detail.
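The 50% consensus criterion described in the steps above can be sketched as a small vote-counting routine. The vote data below is hypothetical:

```python
# Sketch of the consensus criterion: a rule's consequence is accepted only if
# it reaches at least 50% agreement AND is the only consequence to do so.
from collections import Counter

def accepted_consequence(expert_votes):
    """Return the accepted consequence for one rule, or None if no consensus."""
    counts = Counter(expert_votes)
    threshold = len(expert_votes) / 2
    winners = [c for c, n in counts.items() if n >= threshold]
    # Exactly one consequence must clear the 50% bar; ties yield no consensus.
    return winners[0] if len(winners) == 1 else None

# Ten hypothetical expert votes for rule FLS1-1:
votes = ["High"] * 6 + ["Medium"] * 3 + ["Very High"]
result = accepted_consequence(votes)
```

A `None` result corresponds to the feedback loop in Figure 10, where an additional expert is recruited and the rule is re-assessed.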
Fuzzification Process
Fuzzification is the first step in the fuzzy system, as it obtains both the membership
function type and the degree of membership from the database. This database is built
from the surveyed expert determination of membership function intervals. In the
fuzzification process, crisp values which are within the universe of discourse of the input
variable are translated into fuzzy values, and the fuzzifier determines the degree to which
they belong to a membership function. The fuzzifier for this designed fuzzy system
adopts the Minimum approach. Whereas the input is crisp, the output is a degree of
membership in a qualitative set. The fuzzified outputs allow the system to determine the
degree to which each fuzzy condition satisfies each rule.
Defuzzification Process
After the fuzzifier converts numerical inputs into fuzzy values, and the fuzzy
inference engine is fed by the knowledge base to logically link the inputs to the output, the
last remaining step in the fuzzy system occurs in the defuzzifier. Defuzzification is the
process where the fuzzy values are converted into crisp values. The defuzzifier is fed by
the database, and its importance lies in the fact that its crisp output is the desired product
of the entire system. Seven defuzzification methods are identified (Sivanandam, Sumathi,
& Deepa, 2007): centroid method, max-membership method, mean-max membership,
weighted average method, center of sums, first of maxima or last of maxima, and center
of largest area. This research adopts the centroid method for the defuzzification process,
whose formula is defined as follows: z* = ∫ μC(z) · z dz / ∫ μC(z) dz.
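The centroid formula can be approximated numerically on a discrete grid, as in this sketch (a generic illustration, not the study's implementation):

```python
# Sketch of centroid defuzzification, z* = integral(mu(z)*z) / integral(mu(z)),
# approximated with a midpoint sum over the output universe of discourse.

def centroid(membership, z_min=0.0, z_max=100.0, steps=1000):
    """Approximate the centroid of an aggregated output fuzzy set."""
    dz = (z_max - z_min) / steps
    zs = [z_min + (i + 0.5) * dz for i in range(steps)]
    num = sum(membership(z) * z * dz for z in zs)
    den = sum(membership(z) * dz for z in zs)
    return num / den if den else None

# A symmetric triangle peaking at 50 should defuzzify to (about) 50.
tri = lambda z: max(0.0, 1 - abs(z - 50) / 25)
crisp = centroid(tri)
```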
In this section, the protocol for eliciting expert knowledge was provided, covering
membership intervals, rule assessments, consensus rates, and other data. Then,
a preparatory step must be taken to obtain results for the proposed model. In this step,
data will be prepared before it is added to the knowledge base, interval values will be
used to construct membership functions, and data from expert rule assessments will
contribute to the rule base.
Expert knowledge was sought from ten experts, designated with HCE expert codes.
The occupation and qualifications of these experts are described as follows:
In this section, results from subject matter experts are detailed across five tables. For
each table, the results from ten experts answering five questions are listed, providing a
total of 220 intervals which are used to construct membership functions. This section will
detail the calculation of the fuzzy numbers, based on the results provided by the subject
matter experts. Table 1 contains answers from question one of the survey, in which
experts were posed with a scenario of an emergency room with a capacity of 50 beds. The
answers from the experts’ evaluations are divided by 50 to obtain the ratio of waiting
patients to ED capacity, which can be applied to any ED. This question in the survey
specified the minimum and maximum values for the patient demand as 0 and 100,
respectively, in order to introduce boundaries for the membership functions. After
converting these values into ratios, the minimum and maximum values became 0 and 2,
respectively. Experts determined the patient demand on four levels: “low”, “medium”,
“high”, and “very high”. The total number of obtained intervals from question one was
40.
Table 2 contains answers from question two of the survey, which is related to a
scenario with an emergency room capacity of 50 beds. The ratios were obtained from the
answers of subject matter experts. This question in the survey did not specify the
maximum value for physician staffing, meaning that the membership function did not
have an imposed boundary. After converting these values into ratios, the minimum and
maximum values became 0 and 0.32, respectively. Experts evaluated physician staffing
on three levels: “inadequate”, “partially adequate”, and “adequate”. The total
number of obtained intervals from question two was 30.
Table 3 contains answers from question three of the survey, which is related to a
scenario with an emergency room capacity of 50 beds. Similarly, in this table, there is no
imposed upper bound for nurse staffing, which also impacts the upper bound of the last
fuzzy class. The maximum value for nurse staffing was 0.5, or 25 nurses for 50 beds, and
experts provided their evaluations on three fuzzy classes: “inadequate”, “partially
adequate”, and “adequate”. A total of 30 intervals were obtained from question three.
Table 4 contains answers from question four of the survey, regarding ER occupancy
rate, where the maximum occupancy rate was assumed to be 100 percent. Ten experts
provided intervals from their perspective on an appropriate lower and upper value for
each of the four fuzzy classes, “low”, “medium”, “high”, and “very high”. In total, 40
evaluated intervals were obtained to construct the membership functions.
Table 5 contains answers from the survey’s fifth question, and is concerned with
patient boarding. Similarly to questions one, two, and three, this question was based on a
scenario with 50 beds, which was later converted to a ratio of boarded patients to the ER
capacity. The minimum and maximum values were specified at 0 and 20 patients,
respectively, which translated to ratios of 0 and 0.4. From the ten experts’ responses
across the four fuzzy classes, 40 evaluated intervals were obtained.
Membership Functions
The database for subsystem I consists of membership functions for both inputs and
the output, and is structured according to the data from Table 6. Variable one, the patient
demand (the ratio of waiting patients to ED capacity), consists of four trapezoidal
membership functions; variable two, patient complexity, consists of three trapezoidal
membership functions; and variable three, the ED demand status, is the output of the
subsystem and has five triangular membership functions.
The membership function representing patient demand in Figure 11 is constructed
using the fuzzy number intervals and linguistic classes provided in Table 6. For the “low”
linguistic class interval, the minimum value in the upper bound of the low class (as
observed in Table 1) is 0.2, meaning that there is 100% agreement among experts between
the values of 0 and 0.2 for “low”. The maximum value in the upper bound of the low
class is 0.5, yet the minimum value of the lower bound in the medium class is 0.2,
meaning that some experts varied in assigning the terms “low” and “medium” within the
interval [0.2, 0.5]. In Figure 11, this accounts for the structure of the low class, where the
core exists between 0 and 0.2, and the support exists between 0.2 and 0.5, overlapping the
support of the medium class. The boundary for the medium class began at 0.2 and ended
at 0.8, while the boundary for the high class was between 0.6 and 1.2, and the boundary
for the very-high class was between 0.92 and 2. The core structures of the medium and
high classes are small compared to those of the low and very-high classes.
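The core/support construction described above can be expressed compactly: the core ends where all experts still agree, and the support extends to the most permissive expert. The bounds below are hypothetical values chosen to reproduce the [0.2, 0.5] disputed region in the text:

```python
# Sketch of deriving the "low" patient-demand trapezoid from elicited
# intervals: core = region of full agreement, support = disputed region.

def low_class_from_intervals(upper_bounds, next_class_lower_bounds):
    """Build (a, b, c, d) for a left-shoulder trapezoid from expert bounds."""
    core_end = min(upper_bounds)                  # 100% agreement up to here
    support_end = max(upper_bounds)               # some experts extend "low" here
    overlap_start = min(next_class_lower_bounds)  # "medium" may begin here
    return (0.0, 0.0, core_end, support_end), overlap_start

# Hypothetical expert upper bounds for "low" and lower bounds for "medium":
low_uppers = [0.2, 0.3, 0.35, 0.5]
med_lowers = [0.2, 0.25, 0.4]
trap, overlap = low_class_from_intervals(low_uppers, med_lowers)
# trap gives core [0, 0.2] and support ending at 0.5, overlapping "medium"
# from 0.2 onward, mirroring the construction in the text.
```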
The membership function for patient complexity in Figure 12 was constructed from
the data provided by an expert using the reverse interval estimation method. This was done
due to the need for an expert possessing medical expertise in the triage process and
familiarity with the emergency severity index. This expert directly constructed the
membership function, providing data for the three linguistic classes. Patients rated with a
value of 1 or 2 were considered “low” average complexity, and the support of this
membership function covers patients rated between 2 and 2.5, meaning the boundary
for the low class was between 1 and 2.5. Similarly, for “medium” average complexity,
patients rated between 2.5 and 3.5 make up the core structure, and with the supports
assigned values between 2 and 2.5, and between 3.5 and 4, the entire class boundary lies
between 2 and 4. For “high” average complexity, the expert assigned values between 4
and 5 for the core area, with values between 3.5 and 4 for the support, making the
boundary for the high class between 3.5 and 5. The core areas of each class are consistent
in size, due to the data being taken from one expert instead of ten.
Figure 11: Membership function of patient demand. Figure 12: Membership function of patient complexity.
The membership function for ED demand in Figure 13 represents the output for
subsystem one, which is considered the standard membership function for outputs. The
function is triangular, peaking at a membership degree of 1, with each class's boundaries
coinciding exactly with the peaks of the adjacent classes, so that every point on the axis
receives membership from two classes. At any given point, the membership degrees of
the two overlapping classes sum to 1, and there are only five points at which a single
class attains full membership.
These points occur at 0, 25, 50, 75, and 100 for “very-low”, “low”, “medium”, “high”,
and “very-high”, respectively.
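This structure of five triangles with peaks at 0, 25, 50, 75, and 100, each support ending exactly at the neighboring peaks, can be sketched as follows (a generic construction, not the chapter's code):

```python
def tri(x, a, b, c):
    # Triangular membership: support (a, c), peak of 1 at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

peaks = [0, 25, 50, 75, 100]  # very-low, low, medium, high, very-high

def memberships(x):
    # Each class's support ends exactly at the neighboring peaks.
    return [tri(x, p - 25, p, p + 25) for p in peaks]

print(memberships(60))       # only "medium" and "high" are non-zero
print(sum(memberships(60)))  # 1.0: adjacent class memberships always sum to one
```

At x = 60, for instance, "medium" contributes 0.6 and "high" contributes 0.4, illustrating the pairwise partition of unity described above.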
In subsystem II, the membership functions for the physician staffing and nurse
staffing inputs are constructed with trapezoids for three classes. The output, ED staffing,
is also represented with a trapezoidal membership function, which features equally
spaced boundaries across three classes. Table 6 details the linguistic classes and fuzzy
numbers for subsystem II and its membership functions.
Physician staffing is represented in the membership functions in Figure 14. The three
classes overlap as seen in subsystem I, representing the regions where linguistic terms did
not reach full degrees of membership. For instance, the inadequate class core begins at 0
and ends at 0.06, representing full membership for the linguistic term
“inadequate”. The upper bound for the inadequate class is 0.12, where the linguistic term
“inadequate” achieves partial membership, and the lower bound for the partially adequate
class is 0.06, where its term also achieves partial membership. The boundaries for the
three classes are between 0 and 0.12 for the inadequate class, between 0.06 and 0.24 for
the partially adequate class, and between 0.16 and 0.32 for the adequate class. The
partially adequate class has the smallest core area, and the supports for all classes are
similar in size relative to each other.
Figure 13: Membership function of ED demand. Figure 14: Membership function of physician staffing.
The second input in subsystem II, nurse staffing, is represented by the membership
functions in Figure 15. The inadequate class boundaries are at 0 and 0.18, with the core
structure representing full membership existing between 0 and 0.08. The partially adequate
class lies between boundaries of 0.08 and 0.32, while the core area exists between 0.18 and
0.24. For the adequate class, the boundaries lie at 0.24 and 0.5, with the core structure
existing between 0.32 and 0.5. It is apparent that the adequate class has the largest core
area, meaning that the adequate linguistic term was given the widest variety of interval
values for full membership, while values that defined the partially adequate class were
more restrictive.
Figure 16 contains the membership functions for the subsystem's output, ED
staffing. The membership functions are trapezoidal, with the intervals assigned to
create similarly sized membership classes. In this figure, the boundaries for the
inadequate class lie between 0 and 35, with the core existing between 0 and 25,
representing a full degree of membership. The boundaries for the partially adequate class
are 25 and 75, with the core existing between 35 and 65. For the adequate class, the
boundaries are 65 and 100, with the core area defined between 75 and 100. It can be
noted that the midpoint between the boundaries for the partially adequate class lies at 50,
which is the halfway point on the ED staffing axis, further demonstrating the uniformity
in the membership functions.
Figure 15: Membership function of nurse staffing. Figure 16: Membership function of ED staffing.
Table 6 details the data used in the membership functions of subsystem III, where
both trapezoidal and triangular membership functions are used across the three inputs and
one output. It should be noted again that the output of subsystem II, ED staffing, is an
input in subsystem III, dictating the use of a trapezoidal membership function for this
subsystem’s associated input. As this input shares the same membership function
characteristics as previously described, it will be omitted in the description of this
subsystem’s membership functions. While the populations for patient complexity input
are separate between this subsystem and subsystem I, the membership functions share the
same characteristics, and thus the membership functions for patient complexity will not
be discussed for this subsystem either.
Figure 17 provides the trapezoidal membership functions for ER occupancy rate,
which is the second variable in Table 6, and is characterized by four linguistic terms. The
low class is bounded between the values 0 and 35, while the medium, high, and very high
classes lie between values of 20 and 65, 45 and 90, and 70 and 100, respectively. The low
class has the largest core structure, which is bounded between the values of 0 and 20, and
represents the largest interval of assigned values for full class membership. The medium
and very high classes appear to have similarly sized core areas, bound between the values
of 35 and 45 for “medium”, and 90 and 100 for “very high”. The core area for “high” is
the smallest, bound between the values of 65 and 70, and represents the smallest interval
of assigned values for full class membership.
Figure 18 provides the membership functions for the output of subsystem III, ED
workload, and triangular membership functions are assigned to four classes. Similarly to
the membership functions from the output of subsystem I, the membership classes exist
on overlapping intervals such that at any point, the degree of membership for two classes
add up to a value of one, and there are only four points at which classes reach full degrees
of membership. These points occur at 0, 33.34, 66.67, and 100, for the low, medium,
high, and very-high classes, respectively.
Figure 17: Membership function of ER occupancy rate. Figure 18: Membership function of workload.
Table 6: Parameters of fuzzy subsystem I’s, II’s, III’s, and IV’s membership functions.
The trapezoidal membership functions in Figure 19 represent the four classes used
for the boarding input in subsystem IV. Boarding was considered to be “very high”
between values of 0.26 and 0.4, making its core structure the largest while indicating the
largest interval of values where a class was assigned full membership. Between the
values of 0.16 and 0.32, boarding was considered “high”, which is associated with the
smallest membership function core structure belonging to the high class. The low and
medium classes existed between the intervals of [0, 0.12], and [0.04, 0.24], respectively.
Crowding, the final output of the system, is represented by the triangular membership
functions in Figure 20. The linguistic terms “insignificant”, “low”, “medium”, “high”,
and “extreme” were associated with the five classes. The membership functions were
assigned boundaries to create evenly distributed classes on the crowding axis, and
similarly to subsystem III and I, the degree of membership is equivalent to 1 among the
two classes existing at any given point. Only at the points 0, 25, 50, 75, and 100, do the
five respective classes individually obtain full degrees of membership.
Figure 19: Membership function of patient boarding. Figure 20: Membership function of crowding.
This section presents the results of the fuzzy rule base development and the experts’
consensus rate. The fuzzy rule base assessments are divided by subsystem, with
subsystem I producing 120 rule assessments, and subsystems II, III, and IV producing 90,
360, and 800 rule assessments, respectively, for a total of 1370 assessments obtained.
After reaching consensus, the final version of the fuzzy rules is listed in this section.
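These totals follow from multiplying each subsystem's rule count by the ten-expert panel, where the rule counts are the products of the input class counts inferred from the membership function descriptions (demand 4 × complexity 3; physician 3 × nurse 3; staffing 3 × occupancy 4 × complexity 3; workload 4 × boarding 4 × demand 5). A quick check under those assumptions:

```python
from math import prod

# Input class counts per subsystem, inferred from the membership function
# descriptions in the text (not quoted directly from Table 6).
class_counts = {"I": [4, 3], "II": [3, 3], "III": [3, 4, 3], "IV": [4, 4, 5]}
experts = 10

rules = {k: prod(v) for k, v in class_counts.items()}          # rules per subsystem
assessments = {k: n * experts for k, n in rules.items()}       # rules x 10 experts
print(rules)                      # {'I': 12, 'II': 9, 'III': 36, 'IV': 80}
print(sum(assessments.values()))  # 1370
```

The computed totals of 120, 90, 360, and 800 assessments match the figures quoted in the text.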
Table 7 details the results from the expert assessment of the fuzzy rules from
subsystem I. This table consists of 12 columns, beginning with the rule code, followed by
ten expert evaluations, and ending with consensus status. Below the table is a legend
comprising five linguistic classes which are color-coded. In this subsystem, two fuzzy
rules reached full consensus (100%); FLS1-11, and FLS1-12. Two rules achieved 90%
consensus: FLS1-05, and FLS1-06; four reached 80%: FLS1-01, FLS1-04, FLS1-07, and
FLS1-08; one rule reached 70% consensus: FLS1-03, and three reached 60% consensus:
FLS1-02, FLS1-09, and FLS1-10. The average consensus rate for this subsystem’s rule
assessments is 79%. Seven of the twelve evaluated rules received assessments across
only two linguistic classes, while two were assessed across three linguistic classes, and
only one received assessments spanning more than three linguistic classes.
Most of the data in this subsystem is centralized around two linguistic classes. Regarding
the frequency of linguistic class use, “medium” was most frequently used to assess rules,
with 42 uses, while “high”, “low”, “very high”, and “very low” were used 30, 21, 15, and
12 times, respectively.
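The consensus percentages appear to be the share of the ten experts selecting the modal linguistic class (so 70% means seven of ten agreed). A sketch of this assumed computation, using hypothetical vote patterns that would produce the rates reported above:

```python
from collections import Counter

def consensus_rate(votes):
    # Share of experts selecting the most common (modal) linguistic class.
    return Counter(votes).most_common(1)[0][1] / len(votes)

# Hypothetical vote patterns consistent with the reported subsystem I rates:
print(consensus_rate(["medium"] * 10))                # 1.0, full consensus
print(consensus_rate(["medium"] * 7 + ["high"] * 3))  # 0.7, e.g., a 70% rule
```

Averaging these per-rule rates over all twelve rules would then yield the 79% subsystem average quoted in the text.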
All of the fuzzy rule statements for subsystem I (Appendix A), after consensus, are
listed according to their rule number. This final version of the rules will be stored in the
fuzzy rule base of the knowledge base to fuel the fuzzy inference engine.
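A fuzzy rule base of this kind is, in essence, a mapping from antecedent class combinations to a consequent class that the inference engine consults. The fragment below is purely illustrative; the consequents shown are placeholders, not the chapter's actual post-consensus rules:

```python
# Hypothetical fragment of a post-consensus rule base for subsystem I:
# (patient demand class, patient complexity class) -> ED demand class.
rule_base = {
    ("low", "low"): "very-low",
    ("low", "medium"): "low",
    ("medium", "medium"): "medium",
    ("high", "high"): "high",
    ("very-high", "high"): "very-high",
}

def infer_class(demand_class, complexity_class):
    # The inference engine looks up the consequent for a fired antecedent.
    return rule_base[(demand_class, complexity_class)]

print(infer_class("high", "high"))  # high
```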
Table 8 is comprised of results from the assessments of the fuzzy rules from
subsystem II. This table shares similar features from Table 7, consisting of the same
number of columns and expert evaluations. Below the table is a legend comprising three
linguistic classes which are color-coded. Within subsystem II, five of the nine rules
received 90% consensus or greater, consisting of FLS2-01, FLS2-04, FLS2-05, FLS2-06,
and FLS2-09. Three rules received 80% consensus: FLS2-02, FLS2-07, and FLS2-08.
FLS2-03 received 50% consensus. The average consensus rate for the
whole subsystem was 84%, higher than that of the previous subsystem, which featured
more fuzzy rules and linguistic classes. Seven of the evaluated fuzzy rules were assessed
with only two linguistic terms or less, and two rules were assessed with three terms. The
frequency of linguistic classes used in assessing rules was the highest in “inadequate”
with 41 uses, followed by “partially adequate”, and “adequate”, with 34 and 15 uses,
respectively.
The final fuzzy rule statements for subsystem II (Appendix B) after consensus are
listed according to their rule number. These final nine rules are stored in the fuzzy rule
base of subsystem II to feed the decision engine of the fuzzy system.
Table 9 contains data from the expert assessments of the fuzzy rules of subsystem III.
It is structured in the same manner as the previous fuzzy rule evaluation tables in terms of
the number of columns and what they represent; however, there are four color-coded
linguistic terms that are associated with the fuzzy classes. There are a total of 360 rule
assessments in this table, which represents the assessment of 36 rules by ten experts. It is
apparent that 31 of the 36 evaluated rules were evaluated using two or fewer linguistic
terms, and the remaining rules were evaluated with no more than three terms. Five
assessed rules reached full consensus, with an agreement rate of 100%: FLS3-09, FLS3-20,
FLS3-24, FLS3-26, and FLS3-31. It is also observed that twelve assessed rules
received a consensus rate between 80% and 90%, while eighteen rules reached the range
of 60% to 70%. Finally, one rule, FLS3-02, achieved a minimum consensus rate of 50%.
The average consensus rate for this subsystem is 76%, which when compared to the
average rate of 79% for subsystem I, is relatively close, even though subsystem III
featured more inputs. When compared to subsystem II’s average consensus rate of 84%,
76% is still satisfactory, although subsystem III contained more assessment classes. The
frequency of linguistic class use in assessing rules was the highest in the “high” class
with 124 uses, followed by “medium” with 105 uses, while the least used classes were
“low” and “very high”, with 66 and 65 uses, respectively.
The final list of fuzzy rules for subsystem III is provided in Appendix C, which will
be stored in the fuzzy rule base to build the fuzzy knowledge base.
The results for subsystem IV’s rule assessments are provided in Table 10; this is the
most significant subsystem in the fuzzy system. Here, ten experts
evaluated 80 rules against five assessment levels, with each rule’s antecedent consisting
of a combination of three AND conditions. As each rule combines three antecedent
conditions, each to be assessed at five levels, this subsystem presents the highest
complexity for expert assessment.
Table 10: Results of expert evaluation for subsystem IV’s fuzzy rules.
The results show that this subsystem is the only one in the entire designed fuzzy
system that contained some rules which did not initially meet the given consensus
criteria. These rules were FLS4-16, FLS4-22, FLS4-49, FLS4-52, FLS4-57, FLS4-72,
and FLS4-78, and required an additional round of evaluation with new expert assessors.
All seven rules in question achieved the minimum criteria upon the first additional round
of evaluation, as a single additional assessment was enough to push the consensus rate
past the 50% threshold. The consensus rates of the re-evaluated rules were all 54.5%,
meeting the requirements.
With these additional evaluations, the total number of rule assessments was brought to
807.
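The 54.5% figure is consistent with one additional assessor per re-evaluated rule: with eleven votes, six in the modal class gives 6/11 ≈ 54.5%, just past the 50% threshold. This reading is inferred from the totals rather than stated explicitly in the text:

```python
# One extra assessment per re-evaluated rule (inferred, not stated):
base_assessments = 80 * 10       # subsystem IV: 80 rules x 10 experts
re_evaluated = 7                 # rules given one additional assessment each
print(base_assessments + re_evaluated)  # 807, matching the reported total

modal_votes, total_votes = 6, 11
print(round(modal_votes / total_votes * 100, 1))  # 54.5, matching the reported rate
```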
Upon analyzing the data, it can be found that seven of the assessed rules reached a
consensus rate of 100%, which were FLS4-01, FLS4-03, FLS4-07, FLS4-64, FLS4-66,
FLS4-76, and FLS4-80. Among the remaining rules, twenty-six reached consensus rates
between 80% and 90%, while thirty-five reached rates between 60% and 70%, and five
rules had a consensus rate of 50%, passing minimum consensus requirements. The
average consensus rate of this subsystem is 72%, compared to 76%, 84%, and 79% in
subsystems III, II, and I, respectively. Among the different linguistic terms used by
experts, fifty-three rules were evaluated using two or fewer of the five assessment
classes. The remaining rules received assessments using exactly three terms. For all 80
rules, the variation in expert assessment is small, as in cases where experts did not all
unanimously agree using only one linguistic term, they reached consensus using either
two linguistic terms in adjacent classes (such as “low”-“medium”, or “medium”-“high”),
or three terms describing adjacent classes (such as “insignificant”-“low”-“medium”).
After the final round of assessments, experts most frequently used “medium” to assess
rules, with 277 uses, followed closely by “high” with 269 uses, while “extreme”, “low”,
and “insignificant” were selected 126, 102, and 33 times, respectively.
The final fuzzy rules for subsystem IV are provided in Appendix D. These rules will
become an essential part of the knowledge base for subsystem IV.
The results presented in this section are a critical component of this research, as they
provide validation for the design intent of the framework, and show that the consensus
rates for rule assessments are strong, necessitating only seven re-evaluations among the
initial 137 rules. The average consensus rate was 72% or better across each of the four
subsystems, which further highlights the consistency of results. It was observed that the
average consensus rate decreased noticeably in subsystems with more assessment
classes, more rules, or more complex rules with additional conditions for experts to
evaluate; these factors increased each subsystem’s complexity, driving the overall
decrease in average consensus rate. The assessed fuzzy rules
will build upon the designed fuzzy system by feeding the four different fuzzy engines
from subsystems I-IV with supporting information to link the inputs to the outputs.
The fuzzy logic toolbox of Matlab R2015b (Version 2.2.22) was used to construct
and simulate each fuzzy subsystem individually, with data gathered from experts. A
series of 3-D surface plots were generated relating the inputs of each subsystem to their
respective outputs. This was accomplished through the products of the proposed
architecture, including the development of membership functions from quantitative data
collected from experts, and the expert subjective assessment of rules. These generated
surface plots allow for a clearer view of how the different fuzzy subsystems function and
make the relation between inputs and outputs more visually accessible. Additionally, the surface
plots allow for determining the outputs of the subsystems in a straightforward manner by
only using inputs, bypassing lengthy calculations. This section provides the results from
the fuzzy logic subsystems and presents the surface plots for the output of the
subsystems.
Figure 21 illustrates the surface of subsystem I, defined by two input axes, patient
complexity and patient demand, and one output axis, ED demand. The values for ED
demand on the surface plot range from 8 to 92, resulting from the centroid method used
for defuzzification. Generally speaking, it can be observed on the surface that ED
demand will increase with patient complexity if patient demand is held constant, and
similarly ED demand will increase with patient demand if patient complexity is held
constant. Interestingly, as patient demand approaches a value of 1, ED demand
plateaus while patient complexity remains between 1 and 2, rising only once patient complexity increases.
The step-like structure occurring for patient demand higher than 1 resembles another
local step structure for patient complexity higher than 4, where ED demand cycles
between plateaus and increases until it plateaus near its maximum value. For patient
demand less than 1 and patient complexity less than 4, the surface appears to linearly
increase in a more predictable manner than the two step-like structures near its extremes.
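The 8-to-92 output range is a direct consequence of centroid defuzzification: even when an extreme output class fires completely, the centroid of its triangle lies well inside the axis. A numerical sketch for the "very-low" class, assumed here to be the triangle peaking at 0 and vanishing at 25:

```python
def centroid(xs, mu):
    # Discrete center-of-gravity defuzzification.
    return sum(x * m for x, m in zip(xs, mu)) / sum(mu)

step = 0.01
xs = [i * step for i in range(2501)]   # grid over [0, 25]
mu = [1 - x / 25 for x in xs]          # fully fired "very-low" output triangle
print(round(centroid(xs, mu), 2))      # 8.33, i.e., 25/3
```

The centroid of the fully fired extreme triangle is 25/3 ≈ 8.3, and by symmetry the "very-high" class tops out near 100 − 8.3 ≈ 91.7, matching the reported 8-to-92 range.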
Figure 22 demonstrates the relation between the inputs (nurse staffing and physician
staffing) and output (ED staffing) of subsystem II, where ED staffing ranges between
scores of 14.9 and 89.1. ED staffing appears to increase in a similar manner with either
nurse staffing or physician staffing when the other input is held constant, although the
increase is not as high as when both inputs are proportionally increased. In other words,
there are several plateau planes on the surface where ED staffing will only increase when
both inputs are proportionally increased. When physician staffing is held constant, around
0.1 for instance, ED staffing will not increase after nurse staffing increases beyond 1.5,
demonstrating the logical relation between the ED staffing and the ratio between nurses
and physicians. If the ratio of physicians to nurses is low, ED staffing will be considered
low, and an ED’s staffing size and thus ability to see to patients would not likely increase
if the nursing staff was increased in size. This illustrates that a proportional number of
physicians and nurses would be required for an ED to effectively maintain a high staffing
level. It may also be noted that the slope of the surface from 50 to 89 ED staffing score is
steeper for increasing nursing staff than when physician staffing is increased, which may
be due to the different scales of the input axes.
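This plateau behavior is characteristic of Mamdani-style inference, where an AND antecedent fires at the minimum of its input memberships, so raising one input alone cannot lift a rule's activation. A minimal sketch with hypothetical membership degrees (the values are illustrative, not taken from the chapter):

```python
def rule_strength(mu_physician, mu_nurse):
    # Mamdani AND: a rule fires at the minimum of its antecedent memberships.
    return min(mu_physician, mu_nurse)

# Physician membership held low and constant while nurse membership rises:
for mu_nurse in (0.3, 0.6, 0.9):
    print(rule_strength(0.2, mu_nurse))  # 0.2 each time: the plateau effect
```

The rule's strength stays pinned at the weaker input, which is why ED staffing in Figure 22 only climbs when physician and nurse staffing rise together.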
In Figure 23, surfaces a through k represent the relation between ED workload and its
inputs, average patient complexity and ER occupancy rate when ED staffing is held at
eleven different constants, ranging from near zero to 100 for each respective surface. For
surfaces a, b, and c, when ED staffing is between near zero and 20, ED workload
quickly reaches scores of 60 with medium occupancy rates and average patient
complexity. When average patient complexity achieves values higher than 4, and
occupancy rates achieve values higher than 50, ED workload plateaus unless both
average patient complexity and occupancy rates increase, leading to a peak area of the
surface where ED workload reaches scores near 80. When ED staffing is between 30 and
60, for surfaces d through g, the impact of better staffing can be seen on ED workload.
The increase of ED workload becomes more gradual with increasing average patient
complexity and occupancy rates, and the size of the surface region representing ED
workload scores of 60 or higher decreases. In surfaces h through k, when ED staffing is between 70
and 100, the peak of the surface representing the highest scores for ED workload
becomes smaller, and areas of the surface representing increases in ED workload become
isolated in the plot, as higher values for average patient complexity and occupancy rate
become necessary to achieve high values for ED workload. This represents the impact
that increasing ED staffing to adequate levels has on ED workload, even when average
patient complexity and occupancy rates are high. There are always areas of the surfaces
where ED workload is high, however when ED staffing is increased, ED workload can be
said to decrease even for moderate values of its other two inputs.
Figure 24 consists of surfaces a through k of subsystem IV, showing the impact that
the inputs of boarding and demand have on the output of crowding, when the variable
workload is held at eleven constants, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100. In
surfaces a through c, when workload is low, crowding generally increases with boarding
and demand, however the peak values in surfaces b and c differ from surface a. The peak
of the surface decreases in size and transitions into a plateau in surfaces b and c,
indicating a wider range of input values that lead to the same high level of crowding.
In surfaces d through g when workload is between 30 and 60, the lower values of the
surface become more isolated, and all points on the surfaces appear to rise, representing
an overall increase in crowding for all values of boarding and demand. It can be observed
that increasing the ED workload evenly increases crowding under any condition of
boarding and demand.
As workload approaches values between 70 and 100, surfaces h through k show that
crowding continues to generally increase for all boarding and demand values, and the
surfaces peak at higher values. A plateau emerges in surface h, where crowding remains
constant for boarding values which exceed 0.2, when demand is below 50. Beyond
boarding values of 0.2, crowding will only increase when demand is increased beyond
50. This demonstrates that under high workload, there are consistent levels of crowding
when boarding is high, but demand is low. Only when both boarding and demand are low
does crowding achieve minimum values under high workload.
Figure 23: Sensitivity analysis subsystem III. Figure 24: Sensitivity analysis subsystem IV.
This section details the process for implementing and testing the accuracy of the
proposed fuzzy model framework, which will be described as the Global Index for
Emergency Department Overcrowding, or GIEDOC. One of the main goals of the
GIEDOC is to produce reliable results which can be reproducible in EDs of other
healthcare systems. The design of the GIEDOC accounts for this in the knowledge base,
as ten healthcare experts from a nation in question may provide data to be fed into the
knowledge base, allowing the fuzzy system to produce results. This is why the design of
GIEDOC is unlike other developed indices, which, when tested outside their countries of
origin, do not show adequate reproducibility when implemented. In order to accurately
assess the GIEDOC, it must be implemented in real ED environments to measure the
level of crowding, and at the same time, a subjective assessment by a native expert must
be made of the same environment to compare against the results from the GIEDOC.
For the purposes of measuring the accuracy of the GIEDOC, five classes within the
GIEDOC were defined by five equal intervals on a scale from 0 to 100, so that the classes
could be compared to the subjective assessment of experts. These five classes for
assessing ED crowding on five subjective levels were: 1 for “insignificant”, 2 for “low”,
3 for “medium”, 4 for “high”, and 5 for “extreme”. In other words, this was done to
compare the agreement of the index to experts, by determining if this scale reflects the
expert perspective for crowding. The GIEDOC was implemented for three days in a
public Saudi Arabian hospital in Jeddah, which sees more than one hundred thousand
patients in its emergency department on a yearly basis, possessing more than 400
inpatient beds and 42 emergency beds. During the validation, twenty-four observations
were made to collect data which focused on factors including the capacity of the
emergency department, the number of patients in the waiting area, ER, and boarding
areas, the number of present physicians and nurses, the average patient complexity in
both the waiting area and the ER, and finally a healthcare expert’s subjective assessment
of crowding. These results are detailed in Table 11, where the ED crowding level scale
can be compared to the class number assigned by experts. Kappa analysis was used to
test the agreement between the computed GIEDOC scores and the subjective assessment of the
healthcare experts. These statistics allow for the comparison of the accuracy of the results
from GIEDOC to those of other indices when assessing ED crowding.
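The mapping from a 0–100 GIEDOC score onto the five subjective classes can be done with simple integer division over five equal intervals. The half-open interval convention below is an assumption, since the text does not state how boundary scores are assigned:

```python
LABELS = {1: "insignificant", 2: "low", 3: "medium", 4: "high", 5: "extreme"}

def crowding_class(score):
    # Five equal intervals on [0, 100]: [0,20) -> 1, [20,40) -> 2, and so on,
    # with a score of 100 folded into the top class (assumed convention).
    return min(int(score // 20) + 1, 5)

for s in (10, 25, 50, 75, 100):
    print(s, crowding_class(s), LABELS[crowding_class(s)])
```

Under this convention the observed crowding scores of 25 to 75 fall in classes 2 ("low") through 4 ("high"), consistent with the comparison reported below.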
Table 11 provides the data obtained from the twenty-four observations conducted for
validation of the GIEDOC, resulting in calculated scores for the major operational
factors. The demand scores ranged from values of 8 to 61.4 according to the demand
indicator of the GIEDOC, while staffing scores ranged from 50 to 85.1, and ED workload
ranged from 33.33 to 89.2. It should be noted that the majority of staffing scores obtained
their maximum values, indicating that over the three days of validation, the selected ED
almost always maintained adequate staffing. There was higher variation in the range of
demand and ED workload scores. ED crowding level scores achieved values between 25
and 75. To further study the variation in scores between observations, the scores were
plotted in Figure 25.
Table 11: Crisp inputs and their computed crisp output using GIEDOC
The plot in Figure 25 further shows the consistency in the staffing score across the
twenty-four observations, varying slightly between observations 19 and 24. Generally
speaking, when demand, boarding, and workload scores were decreasing or increasing
between observations, such as in observation four, the crowding level decreased or
increased accordingly. In other observations such as 8 and 9, when factor scores such as
workload increased while another factor such as boarding decreased, the resulting
crowding score exhibited no change. In observation 21, when other scores exhibited
minimal change, the sharp increase in crowding can be attributed to a sharp increase in
the demand score, demonstrating the significance of the role of demand in crowding.
The agreement between GIEDOC and expert assessment is analyzed in Table 11,
where assessments are documented according to the “low”, “medium”, and “high”
classes (2, 3, and 4). The GIEDOC issued 4 assessments for “low” scores,
15 for “medium”, and 5 for “high”, while the expert provided 3 “low” assessments, 13
“medium”, and 8 “high”. For the low class, the GIEDOC and the expert issued the same
assessment twice, while they agreed eleven times for the medium class, and
five times for the high class. When measured against the expert assessments, the
GIEDOC overestimated once for the low class (providing a score of “medium” where
the expert provided a score of “low”), and underestimated the medium class twice
(providing “low” while the expert provided “medium”), while underestimating the high
class three times. It should be noted that the insignificant and extreme classes could not
be predicted, as the ED during this study was neither empty nor extremely overcrowded
according to both scores from the expert and the GIEDOC. Most activity regarding the
major operation factors occurred in the third level or “medium” class according to their
scores.
The Kappa value found for the system was 0.562, 95% CI [0.45, 0.66], which
indicates moderate agreement between the objective and subjective scores of GIEDOC
and the expert.
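The reported kappa can be reproduced from the agreement counts given above. The confusion matrix below is reconstructed from those counts (rows: expert class, columns: GIEDOC class, in the order low, medium, high), and the unweighted Cohen's kappa matches the quoted 0.562:

```python
# Confusion matrix reconstructed from the counts in the text
# (rows: expert; columns: GIEDOC; order: low, medium, high).
confusion = [
    [2, 1, 0],   # expert "low":    2 agreements, 1 GIEDOC overestimate
    [2, 11, 0],  # expert "medium": 2 underestimates, 11 agreements
    [0, 3, 5],   # expert "high":   3 underestimates, 5 agreements
]
n = sum(map(sum, confusion))                            # 24 observations
po = sum(confusion[i][i] for i in range(3)) / n         # observed agreement, 18/24
row = [sum(r) for r in confusion]                       # expert marginals
col = [sum(r[j] for r in confusion) for j in range(3)]  # GIEDOC marginals
pe = sum(row[i] * col[i] for i in range(3)) / n ** 2    # chance agreement
kappa = (po - pe) / (1 - pe)
print(round(kappa, 3))  # 0.562
```

With 18 of 24 agreements (observed agreement 0.75) against a chance agreement of roughly 0.43, the computation lands on 0.562, in the "moderate agreement" band on the usual interpretation scale.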
Linear regression or multiple regression could be used to model the demand side of the
problem in such a way to make the index more robust and accurate. A separate research
effort could focus on developing a set of action protocols for EDs, to specify a course of
action to both prevent and react to overcrowding when it occurs, as identified by the
index. Finally, a more rigorous validation study could simulate the index by integrating it
with a discrete event simulation model to study its performance over a longer period of
time. With such a simulation, the impact of the determinants on the overcrowding score
could be more accurately observed. Patterns of simulated data used to more closely
observe the impact of each factor on overcrowding could also be used to draw
conclusions for the development of future ED policy.
AUTHORS’ BIOGRAPHIES
2014. He served as a Graduate Assistant at King Abdul-Aziz University for two years, and
was employed as a Development Engineer in Jeddah Municipality for one year. His research
interests include Quality, Big Data Simulations, Agents, Internet of Things, and Supply
Chain.
Solutions. In addition, he taught in the Management Science Department MIS Track at
Yanbu University College (YUC). During his work at KAU, he served as the Head of the
Industrial Engineering Department and the Vice Dean for Development at the Faculty.
Recently, he was appointed as the Dean of the Community College at the University of
Jeddah. His area of research is quality applications in the service industry, especially
those related to the healthcare sector.
APPENDIX
Appendix A: Fuzzy rule statements for subsystem I
Appendix B: Fuzzy rule statements for subsystem II
Managing Overcrowding in Healthcare using Fuzzy Logic 227
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.
Chapter 10
ABSTRACT
The use of Discrete Event Simulation (DES) in the healthcare sector is not new.
However, the inherent complexity of operations, the need to understand the complexity
and stochastic nature of the modeling process, and the lack of real data have alienated
many stakeholders and severely limited their involvement in healthcare simulation. This
research posits that the combined use of DES and Case-Based Reasoning (DES-CBR)
can assist in the solution of new cases, and improve stakeholders' involvement by
eliminating the need for simulation or statistical knowledge or experience. Using a
number of unique healthcare-based simulation cases, a case-base system was initially
developed and then used to implement the CBR approach in a case study, with results
evaluated using real data from the system and by healthcare experts.
* Corresponding Author Email: shfkhaled@gmail.com
230 Khaled Alshareef, Ahmad Rahal and Mohammed Basingab
INTRODUCTION
The gap between healthcare spending and economic growth in many nations around
the world, including the United States, has been widening at a faster rate, requiring scarce
resources to be allocated to mitigate the impact of the steep rise of healthcare costs instead
of being devoted to economic growth. This uncontrolled phenomenon can be attributed to
many factors including population growth, population aging (Thorwarth & Arisha, 2009),
the development cost of new technologies (Aboueljinane, Sahin, & Jemai, 2013), and the
use of expensive new diagnostic tests and treatments. Furthermore, the limited
availability and the over-utilization of healthcare facilities and providers such as
physicians, nurses, and others (Tien & Goldschmidt-Clermont, 2009) have also
contributed to the deterioration of the efficiency and effectiveness of healthcare processes,
and the degradation of the proper delivery of healthcare services (Faezipour & Ferreira,
2013).
Discrete Event Simulation (DES) has been used by many healthcare organizations as
a tool to analyze and improve their healthcare processes such as delivery systems, patient
flow, resources optimization, and patient admission (Gosavi, Cudney, Murray, & Masek,
2016; Hamrock, Paige, Parks, Scheulen, & Levin, 2013; Katsaliaki & Mustafee, 2010;
Parks, Engblom, Hamrock, Satjapot, & Levin, 2011). However, the use of DES poses
many challenges including the modeling complexity of the healthcare environment, the
lack of real data, and the difficulty in the implementation of the proposed solutions and
recommendations. Furthermore, the need to understand the stochastic nature of the
decision-making modeling process has limited the involvement of many healthcare
decision makers, and has reduced the effectiveness of the use of simulation in the
healthcare field as compared to other fields (Roberts, 2011).
CBR consists of four main processes - retrieve, reuse, revise, and retain - also known
as the 4Rs. The traditional CBR approach, shown in Figure 1, is a machine learning
technique created to fill the gaps left by the limitations of current rule-based systems
and to help in acquiring new knowledge.
Figure 1. The traditional CBR process (Zhao, Cui, Zhao, Qiu, & Chen, 2009).
As described by De Mantaras et al. (2005), the process of solving a problem using
CBR involves: 1) obtaining a problem description, 2) measuring the similarity of the
current problem to previous problems stored in a case base, 3) retrieving the solution of
the similarly identified problem if identical, or 4) possibly adapting it to account for the
differences in problem descriptions. The new solution is then retained in the case base for
future use (Figure 2).
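The four steps above can be sketched in miniature (a hypothetical Python illustration; the case attributes, similarity measure, and trivial revision step are assumptions for illustration, not the chapter's implementation):

```python
# Minimal sketch of the CBR cycle: retrieve, reuse, revise, retain.
# The case attributes and similarity measure are illustrative only.

def similarity(problem_a, problem_b):
    """Fraction of shared attributes on which two problem descriptions agree."""
    keys = problem_a.keys() & problem_b.keys()
    return sum(problem_a[k] == problem_b[k] for k in keys) / max(len(keys), 1)

def retrieve(case_base, problem):
    """Steps 1-2: measure similarity and return the most similar stored case."""
    return max(case_base, key=lambda c: similarity(c["problem"], problem))

def solve(case_base, problem):
    best = retrieve(case_base, problem)
    solution = dict(best["solution"])                 # step 3: reuse the solution
    if best["problem"] != problem:                    # step 4: adapt if not identical
        solution["adapted"] = True                    # (trivial stand-in for revision)
    case_base.append({"problem": problem, "solution": solution})  # retain
    return solution

case_base = [
    {"problem": {"category": "Optimization", "path": 1},
     "solution": {"action": "add one doctor"}},
    {"problem": {"category": "Crowding", "path": 2},
     "solution": {"action": "open fast track"}},
]
solution = solve(case_base, {"category": "Optimization", "path": 2})
```

A problem already in the case base returns its stored solution unchanged; otherwise the nearest case's solution is adapted and the new problem-solution pair is retained for future use.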
Figure 2. The CBR methodology structure for simulation.
The initial step in constructing the case base for this study involved searching the
literature, and collecting and analyzing suitable Emergency Department operations-related
cases (see Table 1).
The Utilization of Case-Based Reasoning 233
Table 1. ED Cases
The second step in constructing the case base system involves defining the indexing
system to identify the specifics of the attributes of the solved cases for easy indexing and
retrieval. Attributes could be either numerical or non-numerical such as locations,
programs used, or type of employees to name a few. A retrieval engine will then use the
identified attributes of the new case to retrieve similar cases from the case-base.
The Emergency department’s operations attributes include:
move to either the treatment station or the hospital depending on their conditions,
while other patients will need to register prior to proceeding to the treatment
station to receive the needed treatment. The lab station is where services including
X-rays, CT scans, or any other tests are made available to the patients. Finally,
patients leave the ED through the exit station. The other three paths include
different permutations of the same services and stations.
3. The third attribute includes the available resources performing treatments in the
ED including physicians, specialty doctors, and nurse practitioners that treat low
acuity patients in some EDs.
4. The fourth attribute includes the number of nurses and their classification such as
triage nurses, emergency nurses, and regular nurses. These two attributes are
initialized at one “1”, since all EDs will have at least one doctor and one nurse.
5. The fifth attribute includes the number of lab technicians in the EDs, and the
number of workers in the lab station.
6. The last attribute includes the number of staff in the EDs including all workers in
non-medical and administrative jobs in all stations. Upon indexing the cases, the
case-base will be populated as shown in Table 2.
The literature shows several techniques and algorithms used to create retrieval
engines for the CBR methodology. Examples of these techniques include nearest
neighbor, induction, fuzzy logic, database technology, and several others. The most
commonly used techniques are nearest neighbor and induction with decision trees
(Watson, 1999).
Table 2. The developed case-base for ED problems using DES
Categories: Optimization problems, Crowding problems, and New design/methodology problems

Case       Doctors   Nurses   Lab techs   Staff   Path
Case 1        3         5         1         0     Path 1
Case 2        3        13         1         0     Path 1
Case 3       10        12         0         5     Path 2
Case 4        3         6         2         0     Path 1
Case 5        1         4         0         2     Path 1
Case 6        2        10         3         2     Path 2
Case 7        2         4         1         1     Path 4
Case 8        2         5         2         0     Path 2
Case 9        3         6         1         1     Path 3
Case 10      32        75         0         0     Path 2
The similarity between the new case and each stored case is computed as a weighted
sum of attribute-level similarities:

Similarity(NC, SC) = Σ(i = 1 to n) wi × f(NCi, SCi)

where:
NC represents the new case,
SCs are stored cases in the case-base,
n is the number of attributes in each case,
w is the weight, and
f is the similarity function.
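Given those definitions (NC, SC, n, w, f), the weighted similarity score can be sketched as follows (the attribute names, the form of f, and the weights are illustrative assumptions):

```python
# Weighted similarity between a new case (NC) and a stored case (SC):
# Similarity = sum_i w_i * f(nc_i, sc_i), normalized by the total weight.

def attribute_sim(a, b):
    """Similarity function f: exact match for categorical values,
    inverse distance for numeric values (both illustrative choices)."""
    if isinstance(a, str) or isinstance(b, str):
        return 1.0 if a == b else 0.0
    return 1.0 / (1.0 + abs(a - b))

def case_similarity(nc, sc, weights):
    """Weighted, normalized similarity over the weighted attribute names."""
    total = sum(w * attribute_sim(nc[k], sc[k]) for k, w in weights.items())
    return total / sum(weights.values())

nc = {"category": "Optimization", "doctors": 5, "nurses": 11}
sc = {"category": "Optimization", "doctors": 3, "nurses": 13}
weights = {"category": 2.0, "doctors": 1.0, "nurses": 1.0}
score = case_similarity(nc, sc, weights)   # (2*1 + 1/3 + 1/3) / 4 = 2/3
```

Normalizing by the sum of the weights keeps the score in [0, 1] regardless of how many attributes a case carries.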
In this analysis, the K nearest neighbor algorithm and the Euclidean distance were
used to determine the similarity function for the numerical attributes. The Euclidean
distance is calculated using the following equation:

Di = sqrt( Σ(x = 1 to m) (anx - aix)^2 )

where:
Di is the Euclidean distance between stored case i and the new case,
anx are the attributes of the new case,
aix are the attributes of case i, and
m is the number of numerical attributes.
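A K nearest neighbor retrieval over the numerical attributes of the cases in Table 2 can be sketched as follows (restricting candidates to the same path first is an assumption about how the non-numerical attribute was handled; with the study's target case this sketch reproduces the retrieval order reported in the case study, cases 2, 4, and 1):

```python
import math

# Stored cases from Table 2: (case number, path, doctors, nurses, lab techs, staff).
cases = [
    (1, "Path 1", 3, 5, 1, 0),
    (2, "Path 1", 3, 13, 1, 0),
    (3, "Path 2", 10, 12, 0, 5),
    (4, "Path 1", 3, 6, 2, 0),
    (5, "Path 1", 1, 4, 0, 2),
    (6, "Path 2", 2, 10, 3, 2),
    (7, "Path 4", 2, 4, 1, 1),
    (8, "Path 2", 2, 5, 2, 0),
    (9, "Path 3", 3, 6, 1, 1),
]

def euclidean(attrs_new, attrs_stored):
    """D_i = sqrt(sum over the m numerical attributes of (anx - aix)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(attrs_new, attrs_stored)))

def knn_retrieve(cases, target_path, target_attrs, k=3):
    """Return the case numbers of the k nearest stored cases on the same path."""
    candidates = [c for c in cases if c[1] == target_path]
    candidates.sort(key=lambda c: euclidean(target_attrs, c[2:]))
    return [c[0] for c in candidates[:k]]

# Target case from the study: Path 1, 5 doctors, 11 nurses, 1 lab tech, 0 staff.
retrieved = knn_retrieve(cases, "Path 1", (5, 11, 1, 0), k=3)   # [2, 4, 1]
```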
The Induction Tree approach uses the already defined indexing system to develop the
decision tree representing the case-base, resulting in faster retrieval time, and different
results than the K nearest neighbor approach. This tree represents the hierarchical
structure of the simulation cases stored in the case-base. The assignments of attributes
among different tree levels will show the relative importance of these attributes in the
process of developing a solution to the new problem. This tree T represents the stored
simulation cases in the case-base and is defined as T = {N, E}, where:
N is the set of nodes (attributes),
n is the number of nodes in the tree,
E is the set of edges connecting nodes and correlating attributes, and
l is the level of a node, where:
l = 0: root node; l = 1: category of the case; l = 2: path number;
l = 3: number of doctors; l = 4: number of nurses; l = 5: number of lab technicians;
l = 6: number of staff; and l = 7: case number.
For each node in N, the degree is the number of directly connected nodes in levels
l - 1 and l + 1:
(a) Root node: a pointer that references all sub-nodes in the first level (the starting
node of the tree).
(b) Intermediate nodes: all nodes in the tree with level 1 < l < 7. These nodes contain
the set of all child nodes Cl in the directly lower level that are connected by edges.
(c) Leaf nodes: all nodes in the tree with degree = 1 and l = 7. Each leaf node
expresses a specific set of attributes relating to its parents. The tree of the
developed case-base is shown in Figure 4.
For the stored simulation cases, let each case Ax be described as a set of different
attributes composing a distinctive case {a1, a2, …, al-1}. Also, for each attribute ai there is
a set Vi that contains all possible values of this attribute {vi1, vi2, …, vir}. For example, the
first attribute a1, corresponding to the category of the simulation problem, has V1 =
{Optimization, Crowding, New design/methodology}.
The induction tree approach will be ready to use as soon as the decision tree is
developed. Attributes of each of the new cases will compose a new set G = {g1, g2, …, gl-1}
to retrieve similar cases from the case-base by matching the elements of this target set
to those of the same level in the case-base. This comparison guides the search as it
traverses through the decision tree. The approach starts at the root node (l = 0) where the
first step in the retrieval process is to match g1 to an element in V1 (all children of the root
node) such as:
if g1 ∈ V1 → AttributeMatch = Match,
else if g1 ∉ V1 → AttributeMatch = No Match.
If a match does not exist, the retrieval process will terminate. If on the other hand the
new case finds a match in the case-base, the decision tree will then choose the edge that is
connected to the node (at l = 1) with the same category as the target case. The step to
follow is to match all the remaining attributes of set 〈G〉 = {g2, …, gl-1} by comparing the
second attribute g2 to a subset of 〈V2〉, where V2 is the set that contains all the possible
paths taken by patients, and 〈V2〉 contains all the paths under the matched category g1.
Due to the nature of this attribute, four different paths might be possible in the case-base.
The attribute match function yields three possible results as follows:
Based on the value of the attribute match, the approach will choose the edge that is
connected to the node (at l = 2). This choice will yield the same path number when
perfect matching is achieved. However, if perfect matching is not achieved, then a partial
match or somewhat match will be chosen. The next step includes the matching of the
remaining attributes of set 〈𝐺〉= {g3, … gl-1} to a subset of 〈𝑉3〉; where V3 is the set
containing the possible number of doctors in the ED, and 〈𝑉3〉 contains all number of
doctors matched under path g2. The remaining attributes are numerical, and will have
similar matching functions. For g3, the attribute matching function will use the absolute
difference zi = |g3 - v3i| between g3 and each of the elements v3i in 〈V3〉.
Based on the difference value zi, the approach will choose the node (at l = 3)
corresponding to the minimum difference value. The attribute match value indicates the
degree of similarity between the target case’s attribute g3 and each one of the elements in
the subset 〈𝑉3〉. Similarly, the same matching process is also used to match the remaining
attributes of the target case such as g4, g5, and g6. Finally, the subset 〈𝑉7〉 containing the
children of the node is matched with g6 to return the result of this retrieval engine. This
result will define the case(s) Ax from the case-base that are similar to the target case G.
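The level-by-level traversal described above can be sketched as follows (a Python illustration, not the chapter's Java implementation; the tree below covers only two stored cases, and the numeric tie-handling is simplified):

```python
# Level-by-level retrieval down an induction tree. Levels follow the chapter's
# indexing (category, path, doctors, nurses, lab technicians, staff); the tree
# contents below are illustrative.

tree = {
    "Optimization": {
        "Path 1": {
            3: {13: {1: {0: ["Case 2"]}},      # case 2: 3 doctors, 13 nurses, ...
                6: {2: {0: ["Case 4"]}}},      # case 4: 3 doctors, 6 nurses, ...
        },
    },
}

def retrieve(tree, target):
    """Walk one attribute per level: categorical levels must match exactly
    (otherwise retrieval terminates); numeric levels choose the child with
    the smallest absolute difference z_i."""
    node = tree
    for value in target:
        if isinstance(value, str):
            if value not in node:
                return None                    # no match: terminate
            node = node[value]
        else:
            best = min(node, key=lambda v: abs(v - value))
            node = node[best]
    return node

result = retrieve(tree, ("Optimization", "Path 1", 5, 11, 1, 0))   # ["Case 2"]
```

Because each level prunes every subtree except the chosen child, the traversal touches one node per level, which is why tree-based retrieval is faster than comparing against every stored case.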
A Java program was developed to automate the retrieval process and the development
of the case-base by adopting the solutions of the new cases, using the interface shown in
Figure 5.
Figure 5. The interface of CBR methodology retrieval code.
The main objective was to improve the performance of the hospital's ED (improve
utilization and minimize the time spent by patients) while keeping the same level of
quality of the healthcare services provided.
Patients arriving at the ED will first pick up a number, wait in the waiting area, and
then proceed to the triage station, where a nurse assesses the severity of their cases using a
one to five (1 to 5) emergency severity index. Cases coded as 1 or 2 (critical conditions)
are directly admitted to the intensive care unit (ICU) to receive the required care, while
cases coded as 3, 4, or 5 proceed to register at the registration desk and wait for a
physician, who is always accompanied by a nurse, for the initial assessment. Patients may
be discharged, or may be asked to take some lab tests; some may then be required to
wait for a second assessment, after which they are either discharged or admitted to the
hospital. The hospital operates three eight-hour shifts (day, evening, and night shifts),
with additional resources allocated when crowded (from 10 am to 9 pm). The ED process
flowchart is shown in Figure 8.
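The severity-based routing just described can be sketched as a simple dispatch function (an illustration only; the station names and the two flags are shorthand for the flow in Figure 8, not part of the actual SIMIO model):

```python
# Severity-based routing through the ED stations, following the flow described
# above: codes 1-2 go straight to the ICU; codes 3-5 register and see a physician.

def ed_route(severity, needs_lab=False, admitted=False):
    """Return the ordered list of stations a patient visits."""
    if severity not in range(1, 6):
        raise ValueError("emergency severity index must be 1-5")
    path = ["waiting area", "triage"]
    if severity <= 2:
        return path + ["ICU"]            # critical: admitted directly to the ICU
    path += ["registration", "first assessment"]
    if needs_lab:
        path += ["lab tests", "second assessment"]
    path.append("hospital admission" if admitted else "discharge")
    return path
```

Keeping the routing in one function makes the branch points explicit: the only decisions are the triage code, whether lab tests are ordered, and the final discharge-or-admit outcome.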
Case Retrieve
The target set G = {Optimization, Path 1, 5, 11, 1, 0} describes the attributes of the
new case and reads as follows: 1) the objective of the study is optimization, 2) using Path
1, 3) with five doctors (physicians), 4) eleven nurses, 5) one lab technician, and 6) no
other staff for administrative purposes. Upon defining the target set, the retrieval code
searched the case-base for similarities to the case at hand using the two previously
described approaches. Cases 2, 4, and 1 were sequentially retrieved using the nearest
neighbor approach with a K value of three (K = 3, due to the limited number of cases in
the case-base), while case 2 was retrieved using the induction tree approach, concluding
that case 2 has the closest similarity to the new case.
Case Reuse
Choosing SIMIO as the modeling environment, a DES model for the problem at hand
was developed using the attributes of each of the previously described entities (patients,
medical and non-medical staff), the working schedule, and the likely paths taken by
patients during their ED visit.
The simulation model ran under multiple scenarios and circumstances, with results
revealing that patients classified as code 3 had an acceptable average waiting time in
the system of about 1.86 hours, while patients coded as 4 and 5 averaged waiting times
of 11.86 and 5.86 hours, respectively. Furthermore, the results show the utilization rates
of doctors and nurses running at 99%, with the first assessment station's utilization rate
running almost at full capacity (see Figure 9, Table 4, and Table 5).
Figure 9. Average time in the system for patients with different codes.
Day of the Week   Patient Code   Average Waiting Time in ED (Hours)   Average Number of Patients in ED
Monday                 3                     1.89                                  2.83
                       4                    11.86                                 46.33
                       5                     5.86                                 21.1
Tuesday                3                     1.76                                  1.86
                       4                     9.12                                 26.36
                       5                     4.40                                 15.23
Wednesday              3                     1.80                                  1.97
                       4                     8.81                                 23.98
                       5                     5.69                                 14.32
These numbers indicate that this hospital is underserved and lacks the required
resources to deliver satisfactory service at peak times, pointing to the need for additional
resources (doctors and nurses) to serve the large number of patients visiting the ED every
day.
After identifying the main problem and its root causes, the modeling team should
revisit the retrieved cases to look for similar problems and their solutions. In this case, the
common solution suggested in similar cases was to hire more resources to meet the
increasing demand, and to maintain the quality of the provided services. In addition, a
benefit-cost analysis may also be needed for justification purposes. For our case, the
retrieved alternative solutions are listed in Table 6.
Alternative 1: hire one more doctor and one more nurse, and revise the work schedule
to have an equal number of resources at each main shift as shown in Table 6.
Day of the Week                                     Mondays Peak Arrivals   Tuesdays Peak Arrivals   Wed., Thursday & Fridays Peak Arrivals
Utilization Rate: Doctors                                  99.20%                  99.18%                   99.15%
Utilization Rate: Nurses Accompanying Doctors              99.20%                  99.18%                   99.15%
Utilization Rate: Triage Station                           84.63%                  61.84%                   58.63%
Average Time in Triage Station (minutes)                   18.6                    5.4                      5.4
Utilization Rate: Registration Station                     66.72%                  47.56%                   61.56%
Average Time in Registration Station (minutes)             1.2                     1                        1
Utilization Rate: First Assessment Station                 99.20%                  99.14%                   99.11%
Average Time in First Assessment Station (Hours)           5.3                     5.14                     4.78
Utilization Rate: 2nd Assessment Station                   57.67%                  53.72%                   61.56%
Average Time in 2nd Assessment Station (minutes)           63                      53.4                     58.2
Utilization Rate: Lab Test Station                         44.94%                  46.35%                   46.73%
Average Time in Lab Station (Hours)                        18                      13.8                     15
Alternative 2: hire two more doctors and two more nurses, and schedule the most
resources in the evening shift since more patients visit the ED during that time. See Table
7 below.
Alternative 3: hire three more doctors and three more nurses, and schedule more
resources in the day and evening shifts (Table 8).
Alternative 4: Schedule the maximum number of doctors and nurses for each shift
(5 doctors and 5 nurses). Although this solution may be neither feasible nor
implementable, it may show the performance of the system when resources are
maximized for drawing up some contingencies (Table 9).
This step of the CBR methodology requires stakeholders’ involvement due to their
familiarity with their system, and their ability to address concerns that may be critical to
the interpretation of the simulation model and its outputs.
The adopted solution was coded (assigned a case number), indexed as an
optimization case, and was then added to the case-base.
Figure: Average time in the system (hours) for Code 3, Code 4, and Code 5 patients under the Current configuration and Alternatives 1-4.
Figure: Average time in the system using Tuesday's maximum arrival rate (time units are hours) for Code 3, Code 4, and Code 5 patients under the Current configuration and Alternatives 1-4.
deemed as the most appropriate for the case at hand. The output of the simulated model,
including total time in the system for patients at different triage levels and the waiting
times at each of the stations, was validated by healthcare experts, and verified the ability
of the simulation model to reflect the actual system (see Table 10).
Table 10. Comparison of simulation output and the real data
T2: Time between triage and registration
T3: Time from registration to available exam room
T4: Time from first assessment to discharge

Simulation output vs. real data collected (in minutes)

Days   T1: Real / Sim Mean (95% CI)   T2: Real / Sim Mean (95% CI)   T3: Real / Sim Mean (95% CI)    T4: Real / Sim Mean (95% CI)
Mon    12.7 / 17.0 (4.8-46.8)         1.7 / 1.0 (0.42-2.4)           235.0 / 136.0 (64.2-175.2)      36.0 / 57.0 (19.8-113.4)
Tue    6.6 / 5.4 (2.4-10.8)           0.6 / 0.5 (0.06-1.2)           144.0 / 97.3 (39.6-150.6)       36.0 / 46.1 (12-94.2)
Wed    10.0 / 4.9 (1.8-9.6)           1.8 / 0.6 (0.12-1.8)           121.0 / 92.4 (38.4-166.8)       40.0 / 51.2 (19.2-117.6)
Thu    10.0 / 4.9 (1.8-9.6)           1.8 / 0.6 (0.12-1.8)           121.0 / 92.4 (38.4-166.8)       40.0 / 51.2 (19.2-117.6)
Fri    17.9 / 4.9 (1.8-9.6)           2.2 / 0.6 (0.12-1.8)           101.0 / 92.4 (38.4-166.8)       42.0 / 51.2 (19.2-117.6)
As shown in Table 10, the waiting time prior to the first assessment station (T3) is the
longest in the system, with a high discrepancy between the simulation results and the
collected data, especially on Mondays when arrival rates are usually higher. According to
the healthcare professionals who are familiar with this ED, this discrepancy is attributed to
medical personnel who sometimes violate the priorities of the different triage levels and
serve patients (code 5) who have waited a long period, causing longer waits for other
patients. This behavior is understandable, as the fear is that these patients may leave the
system without being treated or seen by doctors. In addition, the high utilization rates of
the healthcare employees and facilities may lead to unplanned breaks and inefficient
scheduling. The system experts deemed the rest of the results acceptable.
For face validity, three highly experienced healthcare professionals familiar with
managing emergency departments tested the simulation model and provided important
feedback. Although the developed alternatives provided excellent results, it was quite
understandable that some of them will never be implemented due to their high cost and
the limited available resources.
CONCLUSION
This research proposed the use of Discrete Event Simulations (DES) and Case-Based
Reasoning (CBR) to facilitate the decision making process in the healthcare sector,
improve the stakeholders’ involvement in the analysis of healthcare problems, and in
mitigating the difficulties faced by the modeling team. In this research, we focused on
emergency departments (EDs), which face multiple resource constraints including
financial, labor, and facility constraints. The application of DES-CBR provided solutions
that were realistic and robust; more importantly, the results were scrutinized and validated
by field experts.
Other fields within the healthcare sector may also benefit from such an application.
Other research avenues may include a better indexing system and more efficient ways to
retrieve cases, particularly as more cases are added and more attributes are searched.
REFERENCES
Aboueljinane, L., Sahin, E. & Jemai, Z. (2013). A review on simulation models applied
to emergency medical service operations. Computers & Industrial Engineering,
66(4), 734-750.
Duguay, C. & Chetouane, F. (2007). Modeling and improving emergency department
systems using discrete event simulation. Simulation, 83(4), 311-320.
Faezipour, M. & Ferreira, S. (2013). A system dynamics perspective of patient
satisfaction in healthcare. Procedia Computer Science, 16, 148-156.
Gosavi, A., Cudney, E. A., Murray, S. L. & Masek, C. M. (2016). Analysis of Clinic
Layouts and Patient-Centered Procedural Innovations Using Discrete-Event
Simulation. Engineering Management Journal, 28(3), 134-144.
Gul, M. & Guneri, A. F. (2012). A computer simulation model to reduce patient length of
stay and to improve resource utilization rate in an emergency department service
system. International Journal of Industrial Engineering, 19(5), 221-231.
Hamrock, E., Paige, K., Parks, J., Scheulen, J. & Levin, S. (2013). Discrete event
simulation for healthcare organizations: a tool for decision making. Journal of
Healthcare Management, 58(2), 110-125.
Katsaliaki, K. & Mustafee, N. (2010). Improving decision making in healthcare services
through the use of existing simulation modelling tools and new technologies.
Transforming Government: People, Process and Policy, 4(2), 158-171.
Lim, M. E., Worster, A., Goeree, R. & Tarride, J.-É. (2013). Simulating an emergency
department: the importance of modeling the interactions between physicians and
delegates in a discrete event simulation. BMC medical informatics and decision
making, 13(1), 59.
Meng, G. S. (2013). Ambulance Diversion and Emergency Department Flow at the
San Francisco General Hospital. Retrieved from https://hbr.org/product/ambulance-
diversion-and-emergency-department-flow-at-the-san-francisco-general-
hospital/W13054-PDF-ENG.
Mott, S. (1993). Case-based reasoning: Market, applications, and fit with other
technologies. Expert Systems with applications, 6(1), 97-104.
Parks, J. K., Engblom, P., Hamrock, E., Satjapot, S. & Levin, S. (2011). Designed to fail:
how computer simulation can detect fundamental flaws in clinic flow. Journal of
Healthcare Management, 56(2), 135-146.
AUTHORS’ BIOGRAPHIES
Mohammed Basingab is a doctoral candidate at the University of Central Florida.
He completed his B.S. in Industrial Engineering at King Abdul-Aziz University in 2009,
and received his M.S. in Industrial Engineering from the University of Southern California
in 2014. He served as a Graduate Assistant at King Abdul-Aziz University for 2 years, and
was employed as a Development Engineer in Jeddah Municipality for one year. His
research interests include Quality, Big Data Simulations, Agents, Internet of Things, and
Supply Chain.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.
Chapter 11
1 Department of Industrial Engineering and Management Systems,
University of Central Florida, Orlando, Florida, US
2 Department of Industrial Engineering,
University of Jeddah, Jeddah, Saudi Arabia
ABSTRACT
* Corresponding Author Email: hatimbukhari@gmail.com
256 Oloruntomi Joledo, Edgar Gutierrez and Hatim Bukhari
INTRODUCTION
As a result, management faces the challenge of implementing the right strategy in the
face of competing objectives.
Peer-to-peer lending is a form of consumer-to-consumer ecommerce whereby lenders
pool their resources together and lend them to borrowers at a lower rate using an online
platform, without direct mediation by financial institutions. Consumer-to-consumer
(C2C) companies face competition from large organizations as well as from
entrepreneurs who have little to lose by entering the business. Customers do not
need to leave the comfort of their homes to find better deals. They can compare the
offerings of different companies online and make a hassle-free change if they are not
getting value for their money. Other challenges facing C2C business models include how
to unify a group of consumers according to their needs, preferences, and interaction with
each other.
Stakeholders range from providers and customers to companies and complementors (Wu
and Hisa, 2004). These stakeholders include the community, suppliers, alliance partners,
shareholders, and government, and form a large collection of active objects in the system,
each seeking to maximize its utility. With the growing popularity of C2C models, decision
making on the part of stakeholders can be difficult due to the interplaying factors and
uncertainty in customer demand. On the other hand, risks can include fidelity, payment
fraud, and viruses. These characteristics make for a complex system with multi-level
abstractions and heterogeneous elements. Simulation serves as a decision support tool,
but individual simulation paradigms have limitations. It is in the interest of these
complex organizational environments to use knowledge of stakeholder actions and
business processes for decision making (Joledo, 2016). These actions give rise to
nonlinear interactions that are difficult to capture using standalone simulation paradigms.
The complex interactions among different functional areas require modeling and
analyzing the system in a holistic way. There is a lack of mechanisms to facilitate
systematic and quantitative analysis of the effects of user and management actions on
peer-to-peer lending system performance through an understanding of the system
behavior.
Agent-Based Modeling Simulation and Its Application to Ecommerce 257
The complexity of the market and of customer behaviors benefits from nontraditional
modeling tools for analysis. Behaviors can be defined at the individual level and at the
system level. Hybrid simulation provides an approach that does not assume a perfect
market or homogeneity.
Internet based models cause disruptions to traditional business models. New players
find it challenging navigating the highly competitive landscape of this complex
environment. Due to the aforementioned characteristics, the ecommerce system tends
towards complexity. There exist several performance risks associated with the business
model. These risks include minimal return on investment, government regulations and
lack of trust. Results from case studies and a literature review reveal that the performance
of C2C ecommerce remains underexplored from a system perspective. Complex
interactions exist among stakeholders, the changing environment, and available
technology. There is a need for an integrated system that will provide a testing ground for
managing control actions, anticipating changes before they occur and evaluating the
effects of user actions on the system at different managerial levels.
The presence of continuous and discrete behaviors poses challenges for the use of
existing simulation tools in simulating the C2C ecommerce space. The system is
characterized by uncertainty as well as government regulations and external factors.
Important factors such as liquidity and different threshold values for consumers remain
undefined. Not addressing these issues can result in financial losses and lack of trust that
can erode the benefits of the business model. There is a need to systematically map,
model, and evaluate viability and performance in order to realize the best tradeoff
between benefits and risks. This study presents such a framework.
The paper is organized as follows. Section 2 introduces the application of system
simulation and modeling (system dynamics in particular) to ecommerce research. Section
3 describes the developed framework. Section 4 presents the Lending Club case study
while the application of the agent based simulation and the system dynamics models as
well as some results are presented in Section 5. The paper concludes and prescribes some
future directions for this study in Section 6.
BACKGROUND
Ecommerce systems are intelligent systems (Bucki and Suchanek, 2012), and a
simulation can be discrete or continuous. In continuous simulation, the system
evolves as a continuous function represented by differential equations, while in discrete
simulation, changes are represented as separate events to capture logical and sequential
behaviors. An event occurs instantaneously (such as the press of a button or the failure of
a device) to cause a transition from one discrete state to another. A simulation model
consists of a set of rules (such as equations, flowcharts, state machines, cellular automata)
that define the future state of a system given its present state (Borshchev and Filippov,
2004).
A simulation can also be classified in terms of model structure. Sulistio, Yeo and
Buyya (2004) proposed a taxonomy encompassing different approaches. The presence of
time is irrelevant in the operation and execution of a static simulation model (e.g., Monte
Carlo models). For the case of a dynamic model, in order to build a correct representation
of the system, simulated time is of importance to model structure and operation (e.g.,
queuing or conveyor).
Dynamic systems can be classified as either continuous or discrete. In continuous
systems, the values of model state variables change continuously over simulated time. In
the event that the state variables only change instantaneously at discrete points in time
(such as arrival and service times), the model is said to be discrete in nature. Discrete
models can be time-stepped or event-stepped (event-driven). In time-stepped models,
time advances in constant increments and state transitions are synchronized by the clock,
i.e., the system state is updated at preset times. In event-driven models, the state "jumps"
between discrete points in time and is updated asynchronously at the important moments
in the system lifecycle.
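The difference between the two advance mechanisms can be illustrated with a minimal event-driven clock (a Python sketch, unrelated to any specific simulation tool):

```python
import heapq

def run_event_driven(events):
    """Advance the clock asynchronously: jump straight to the next
    scheduled event instead of ticking in fixed time-steps."""
    heapq.heapify(events)                      # pending (time, label) pairs
    clock, log = 0.0, []
    while events:
        clock, label = heapq.heappop(events)   # next important moment
        log.append((clock, label))
    return log

# Events are handled in time order regardless of insertion order.
trace = run_event_driven([(4.2, "service_end"), (1.5, "arrival"), (2.7, "arrival")])
```

A time-stepped model would instead loop over `clock += dt` with a constant `dt`, checking for state changes at every tick.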
Deterministic and probabilistic (or stochastic) properties refer to the predictability of
behavior. Deterministic models have fixed input values and no internal randomness: the
same set of inputs always produces the same output(s). In probabilistic models, however,
some input variables are random and are described by probability distributions (e.g.,
Poisson and Gamma distributions for arrival and service times). Several runs of a
stochastic model are needed to estimate the system response with minimum variance.
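The need for several runs can be illustrated with a toy stochastic model (a Python sketch; the exponential service-time distribution and its mean are illustrative assumptions, not parameters from the study):

```python
import random
import statistics

def replication(n, seed):
    """One stochastic run: mean of n exponential service times
    (illustrative mean of 5.0)."""
    rng = random.Random(seed)
    return statistics.mean(rng.expovariate(1 / 5.0) for _ in range(n))

# Several independent replications estimate the response and its variability.
estimates = [replication(1000, seed) for seed in range(30)]
mean_response = statistics.mean(estimates)
spread = statistics.stdev(estimates)
```

Averaging across replications shrinks the variance of the estimate, which a single run cannot do.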
The structure of a system determines its behavior over time. An ecommerce system is
a complex, interactive and stochastic system involving various people, infrastructure,
technology and trust. In addition, factors like uncertainty, competition and demand
define its economic landscape. These markets are non-linear, experiencing explosive
growth and continuous change. Developing representative models comprises detailing
the stakeholders and the pertinent underlying processes. Decision makers must consider
these factors when analyzing the system and deriving optimal strategies to assess model
viability.
Agent-Based Modeling Simulation and Its Application to Ecommerce 259
System Dynamics
2016; Lin & Liu, 2008) where identifying the relevant parameters is essential to profit
maximization.
i. Identify all the stakeholders in the system
ii. Identify the factors (internal and external) that influence the system
iii. Evaluate the competitive landscape of the business model
iv. Define the system supply chain
v. Specify performance metrics
vi. Specify interactions between components
vii. Model the behavior of the system, and
viii. Analyze the results of the model implementation
Figure 1 illustrates how characteristics of the system are employed in developing the
proposed framework. As previously identified, organizations face an ever-increasing
number of challenges and threats such as changes in market, competitors, customer
demands and security. These risks are used to generate a mechanism for risk
classification assignable to system characteristics. The needs of the stakeholders are then
integrated into the developed framework since they define what brings value to the
system.
The ecommerce system is influenced by internal and external factors. Internal factors
include the cost of operation, management actions and policies, processes involved in
delivering value to the customers, risks associated with implementing the business model
and generated income. External factors are uncontrollable but it is imperative that the
organization responds in ways to adequately manage them. These factors include the
change in market, activities of competitors, customer demand, government regulations
and the global economy.
Managing the supply chain of the system exposes the inefficiencies associated with
achieving organizational goals. The C2C ecommerce space is mapped in order to identify
the suppliers, clients and communication requirements. Based on the information
obtained from this stage, the modeling of system complexity is applied for dynamic
analysis.
Starting with the desired state, performance indicators influence the achievement of
the system goals. The factors of interest are summarized as costs, security, customer
satisfaction, profits and market share. Once critical success factors are defined, the
complexity of the system which takes into consideration all the characteristics hereby
identified can then be modeled and results analyzed for policy development.
In line with the characteristics of the system, the proposed framework is implemented
from a hybrid perspective. Such an implementation provides a testbed for analysis of
management and stakeholder actions and also for evaluating performance of the system
under different conditions. Hybrid simulation finds extensive applications in research and
practice in part because most real life systems are hybrid in nature. Hybrid models can be
used to analyze business policies and performance, thereby acting as a complementary
tool in decision making.
and company website. The case study helps to select and define boundaries and core
areas of interactions on the platform.
Hybrid modeling is used to describe online consumer-to-consumer (social) lending in
the context of an ecommerce system, capturing liquidity, pricing models, and uncertainty.
Growth in the industry partly results from investors being discouraged by stock market
returns and lower interests provided by banks. Results from business case studies and
literature review indicate that the success of peer-to-peer (P2P) lending business process
innovation has not been proven. As an example, it is beneficial to balance the number of
lenders and qualified borrowers that can effectively meet the mutual needs of the
customers. Because this form of lending is unsecured, lenders are exposed to a risk of
default by the borrower. The platforms have to deal with uncertainties that pervade all
aspects of their operations. Any unplanned downtime, outage or system hack can have
long-term effects on a platform's operations and credibility (Joledo et al. 2014).
The hybrid simulation models in this research are developed using AnyLogic
(http://www.anylogic.com/). AnyLogic has the capability of creating mixed discrete-
continuous simulations of ABS and SD models in the same interface. Seller consumers
come into the system with goods to sell. These consumers require different thresholds of
returns. Buyers have different cut-off prices which they can pay for transactions on the
platform. Consumers are modeled as agents whose behaviors elicit corresponding
responses. The dynamics of price agreement are also modeled in the agent-based system.
The environment is modeled in SD with agents living therein. The population of
consumers is disaggregated to individual level using agents.
In the simulation of business processes, interactions between players are modeled
using statecharts. The system is simulated over a period of eight years to gain insights
into the behavior of participants and how their individual or collective actions affect the net
income and in turn the profit margin. The output of the overall system is viability
measured by the default rates, net income and profit margin. Outputs of the ABS
subsystem are fed into the time-continuous SD model (strategic layer). The assumption
for this study is that the seller seeks to sell his product at a profit while the buyer seeks to
pay the minimum cost for a particular product. The provider supplies a medium for the
realization of customer utility while making a profit in the process.
Data
Real data on the LC business model are available via its platform. Data on arrival
patterns and arrival intervals are generated stochastically according to the data collected
for the years 2013 and 2014. There were 235,629 accepted loan requests during the period
of interest. Descriptive statistics were compiled for the variables relating to the funded
(accepted) borrowers within this period.
Neural Network
The neural network (NN) is used to map the characteristics of users to different risk
decisions and to capture trust. Profiles of completed loans are used to build the NN model
representations, using combined datasets of the accepted and rejected loans. A random
sample of 2062 data points from the combined dataset forms the training data used in the
learning process. The input is normalized by dividing the amount requested by 3.5, the
FICO score by 850 and the employment length by 10.
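The normalization step just described can be sketched in Python (the divisors are as quoted in the text; the interpretation of the amount divisor of 3.5 as assuming amounts expressed in units of $10,000 is our assumption):

```python
def normalize_borrower(amount, fico, employment_years):
    """Scale raw borrower attributes into comparable ranges using the
    divisors quoted in the text (3.5 for amount, 850 for FICO,
    10 for employment length)."""
    return (amount / 3.5, fico / 850.0, employment_years / 10.0)

features = normalize_borrower(3.5, 680, 5)   # -> (1.0, 0.8, 0.5)
```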
The network structure consisted of four layers (Figure 2). The first layer has 4 neurons
representing each of the following variables: amount, FICO, dti and employment length.
Based on the business model of Lending Club, these four variables were employed in our
framework to determine which borrowers are screened into, or permitted to transact on,
the platform. The NN also has two hidden layers with 5 and 3 neurons respectively.
Finally, the output layer has two neurons, Output 1 and Output 2, each firing a value
between 0 and 1. If Output 1 is larger than Output 2, the result is considered an
acceptance; otherwise it is a rejection.
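A minimal Python sketch of this 4-5-3-2 feed-forward pass (with random placeholder weights rather than the trained ones, and a logistic transfer function assumed) illustrates how the two outputs drive the accept/reject decision:

```python
import math
import random

rng = random.Random(0)

def make_layer(n_in, n_out):
    # Random placeholder weights; the real model's weights come from training.
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def forward(x, layers):
    """Feed-forward pass through the 4-5-3-2 topology described in the
    text, with a logistic transfer so every output lies between 0 and 1."""
    for w in layers:
        x = [1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(row, x))))
             for row in w]
    return x

net = [make_layer(4, 5), make_layer(5, 3), make_layer(3, 2)]
out1, out2 = forward([1.0, 0.8, 0.3, 0.5], net)   # amount, FICO, dti, employment
decision = "accept" if out1 > out2 else "reject"
```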
Taking that into account, a test with the entire dataset was run and the resulting error
was 0.1118, meaning that about 11.1% of the training instances were misclassified.
To improve the capacity of the NN to represent the information and obtain better results,
the structure of the NN was changed by adding more layers and varying the number of
neurons per layer. The new results for a sample of the accepted data obtained an average
training error of 0.009570 against a target error of 0.0100.
The individual behaviors of consumers are modeled in the ABS subsystem. The
simulation begins by declaring and initializing all variables. Probabilities are assigned to
the different agent variables based on their corresponding distributions. The loan lifetime
is defined by the parameter Term. The requested Amount, FICO, DTI and Credit History are
stochastic characteristics of a borrower.
The users are modeled as agents with individual behaviors. Risk is modeled into each
agent by utilizing the DTI, credit history, FICO range and income to generate a corresponding
interest rate. Depending on the user state, transitions are triggered by timeouts or by
meeting certain conditions. On executing the program, new borrowers are created who
transition to the PotentialBorrower state. In this state, FICO, DTI and Amount requested
are passed to the neural network class in order to generate a decision on which borrower
transitions to the Screened state. The time spent in a given state follows a uniform
distribution reflecting the time range associated with that state. For example, a typical
lender takes about 45 days between entry and receipt of the first payment. Similarly, the
time spent in the PotentialBorrower state before screening ranges from 2 to 4 days. The
statecharts representing borrower and lender behaviors and interactions with the system
are given in Figure 3 and Figure 4.
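The uniformly distributed dwell times can be sketched as follows (a Python sketch; only the two ranges quoted in the text are included, and the dictionary and function names are ours):

```python
import random

rng = random.Random(42)

# Uniform dwell-time ranges (in days) quoted in the text; further states
# would be filled in analogously.
DWELL_RANGES = {"PotentialBorrower": (2, 4), "PaymentInterval": (27, 31)}

def dwell_time(state):
    """Sample the uniformly distributed time an agent spends in a state."""
    lo, hi = DWELL_RANGES[state]
    return rng.uniform(lo, hi)

samples = [dwell_time("PotentialBorrower") for _ in range(100)]
```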
Once the borrower is screened, an interest rate is generated to reflect his risk profile.
A draw on the lookup table is used to generate an interest rate that corresponds to the
borrower. On receiving the interest rate, the borrower makes a decision to agree or
decline the terms of the loan. If he declines, he has an option to follow the
noToAgreement transition and go back to the PotentialBorrower state where he can
decide to remain or to leave the platform. If the borrower agrees to the terms of the loan,
he proceeds to the PostedProfile state via the yesToAgreement transition. The decision to
accept interest rate is internal and probabilistic based on the borrower’s risk preference
and personal goals. A call is made to the requestServiceB() function which communicates
the borrower profile to available lenders. If the borrower profile matches a given lender’s
risk aversion, he accepts and stores the id of the borrower along with his profile
information.
Once the lender agrees to fund the borrower, the borrower transitions to the Funded
state, where it remains for a uniformly distributed period reflecting the time it takes to
fully fund the request, after which it transitions to the InRepayment state, where it
remains for the loan term (usually 36 or 60 months). Thirty days after entering the
InRepayment state, the borrower starts to make a payment every 27 to 31 days. This time
range reflects the fact that borrowers pay their bills early, on time or late.
There is one transition from InRepayment state and this has two branches. One of the
branches leads to FullyPaid while the other to the InDefault state and then to Exit where
the borrower leaves the system on charge off. The decision at the TestDefault branch is
made internally and stochastically. The average amount of capital and interest that is
repaid, recovered or lost when a borrower defaults is also reflected.
LC, which acts as a central dispatcher, broadcasts requests for borrower loans to all
lenders. For simplicity, LC is modeled as a function call that responds to requests. LC
listens for messages from the borrower as well as the lender side and manages the
transaction completion on behalf of the agents. LC inserts a message in the queue and a
notification is broadcast to borrowers and lenders. BorrowerAB and LenderAB
represent borrower and lender agent classes. The communication instances used in the
model are summarized below:
1) Screening request: a message arrives from the borrower and lender requesting
screening.
2) Interest rate generation: the LC generates an interest rate and communicates it to
the borrower.
3) Borrower decision on interest rate: based on the risk profile, the borrower decides
to accept or reject the generated interest rate.
4) Lender’s decision on interest rate: the lender decides to fund a particular
borrower with an associated interest rate based on its risk profile.
5) Payment: payments are communicated to LC and in turn to the lender.
6) Default: the borrower leaves the platform and the lender and borrower returns are
updated.
7) Fully paid: a message from the borrower and lender indicating whether they return
as potential customers or choose to leave the system.
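These communication steps can be sketched as a minimal message-passing loop (a Python sketch; the `CentralDispatcher` class and its method names are hypothetical, not taken from the AnyLogic model):

```python
from collections import deque

class CentralDispatcher:
    """Sketch of LC as a central dispatcher: borrower requests are
    queued and broadcast to every registered lender."""
    def __init__(self):
        self.queue = deque()
        self.lenders = []

    def post_request(self, profile):
        self.queue.append(profile)

    def broadcast(self):
        decisions = []
        while self.queue:
            profile = self.queue.popleft()
            # Each lender decides according to its own risk aversion.
            decisions.extend(lender(profile) for lender in self.lenders)
        return decisions

lc = CentralDispatcher()
lc.lenders.append(lambda p: p["rate"] <= 0.12)   # funds only low-rate borrowers
lc.post_request({"id": 1, "rate": 0.10})
funded = lc.broadcast()
```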
It is assumed that participants are sensitive to ads and word of mouth (WOM). The
WOM effect is the way new users are persuaded to purchase a product or adopt a service.
Consumers persuade others to adopt a service or buy a good often using word of mouth.
Each participant’s adoption time differs. In this system, customer satisfaction is measured
by response to WOM and results from satisfactorily completed loans. Hence, it is
expected that as more customers default, the WOM effect decreases. A consumer contacts an
average number of people in a month, i.e., a specified contact rate. Agents in the system in
turn contact each other and influence potential borrowers to sign up for the service.
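The WOM mechanism described above can be sketched as a simple adoption loop (a Python sketch in the spirit of a Bass diffusion model; all parameter values are illustrative assumptions):

```python
def wom_adoption(potential, adopters, contact_rate, adoption_fraction, months):
    """Word-of-mouth adoption sketch: each month every adopter contacts a
    fixed number of people, and a fraction of the contacts that land on
    still-potential users sign up."""
    history = [adopters]
    for _ in range(months):
        contacts = adopters * contact_rate
        # Only contacts that reach potential users can convert.
        new = contacts * adoption_fraction * potential / (potential + adopters)
        new = min(new, potential)
        potential -= new
        adopters += new
        history.append(adopters)
    return history

curve = wom_adoption(potential=1000, adopters=10, contact_rate=5,
                     adoption_fraction=0.02, months=24)
```

The adopter count grows monotonically and saturates at the total population, the S-shaped pattern typical of WOM-driven sign-ups.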
Space and request queue management are defined within the Main with space and
layout requirements configured in the Environment object contained in the Main. A value
of 1000 each was assigned as initial number of borrowers and lenders in the system.
An advantage of object oriented ABM is that we can look deeper into each object –
borrower or lender – and view its state and variable values. Some of the inputs used in
calibrating the agent-based model are shown in Figure 5.
System Dynamics
The system dynamics (SD) model incorporates estimates of the demand, behaviors of
customers, costs, and market conditions. The SD phase involves first modeling a causal
loop diagram of the peer-to-peer lending environment using identified key system
variables (Figure 6).
The causal loop diagram forms the backbone of the system dynamics model. Causal
loops are constructed to capture interrelationships of critical success factors identified in
literature. The metrics of interest in the SD model include profitability, customer
satisfaction, and responsiveness. In the model, profitability is measured as net income
and profit margin. The SD model receives input from the ABM. The output of the system
is the projected AvgNetAnnualizedReturn, MarketShare, NetIncome and ProfitMargin
for a period of the given lending span (eight years in this study). The Net Annualized
Return (a customer-facing metric inherent to the LC business model) is the income from
interest less service charges and charge-offs, plus recoveries. MarketShare is the share of
the total ecommerce market captured by the simulated company, while ProfitMargin (an
organization-facing metric) is the NetIncome less inflation relative to the total income
derived from interest.
The following are some inputs used in calibrating the system dynamics model:
The initial investment by the C2C company is 500 (with all cash amounts in tens
of thousands of dollars)
The effective tax rate of the organization is 34%
All transactions accrue a 1% service charge
The simulation runs from January 1st 2012 for 8 years.
In Figure 6, the points where the ABM is coupled with the SD model are denoted by
an ABM suffix. For example, AmountFundedABM, InterestIncomeABM and
AmountRepaidABM are dynamic variables whose values depend on updates from the
ABM subsystem. The NetIncome stock represents an accumulation of the gross income
net of taxes.
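The accumulation of the NetIncome stock can be sketched as a simple Euler-style integration (a Python sketch; the monthly granularity and variable names are our assumptions, and inflation is omitted), using the 1% service charge and 34% effective tax rate quoted above:

```python
def accumulate_net_income(monthly_interest_income,
                          service_charge_rate=0.01, tax_rate=0.34):
    """Euler-style accumulation of the NetIncome stock: each month's
    interest income less the 1% service charge, net of the 34% tax."""
    net_income = 0.0
    for income in monthly_interest_income:
        gross = income * (1 - service_charge_rate)
        net_income += gross * (1 - tax_rate)
    return net_income

total = accumulate_net_income([100.0, 110.0, 120.0])
```

In the full model the monthly income series would come from the ABM subsystem rather than a fixed list.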
Results
In Figure 7, 1000 borrowers initially enter the system and as time (horizontal axis)
progresses, borrowers start to transition to the Screened, PostedProfile, Funded,
InRepayment and FullyPaid states. As time progresses, new users are added to the system
by responding to the WOM effects of other borrowers and lenders. At the end of the
simulation period, a total of about 1700 borrowers and 2100 lenders are in the system.
This number can be controlled by varying the WOM factor. For speed and efficiency, this
number is kept low in the present study. A portion of users remain in the
PotentialBorrower state because some of the borrowers who come into the system do not
meet the screening requirements and never progress to the screened state.
Observing the behavior of metrics in Lending Club suggests that the net annualized return
declines exponentially as time progresses. This is in line with the output of the
AvgNetAnnualizedReturn metric in Figure 8. It becomes evident that as time progresses,
more borrowers begin to default, effectively driving AvgNetAnnualizedReturn
downwards. This presents a challenge that conflicts with the goal of viability of the
business model.
Figure 9. Response of net income to taxation.
In the early phase of the simulation, the initial capital and cost weigh heavily on the
system. The sudden spikes in MarketShareDS signify that the first phase of borrowers in
the different time horizons have completed their loan cycle and new users are being
initialized. Most borrowers return to the PotentialBorrower state, where they can request
a new loan, and the process repeats itself. Net income increases slowly in the first two years due
to the fact that the starting number of borrowers is low and because the effect of WOM
only becomes significant with time.
Results from our study are compared to original data provided by Lending Club, as
illustrated in Figure 10. This comparison serves to validate the usefulness of the developed
framework in estimating the net annualized return metric. The results show that the
average net annualized returns obtained from our model follow the same pattern and are
relatively close in value to those obtained from historical performance.
The developed simulation models serve as a testbed for managing control actions by
incorporating fluctuations and stochasticity. The system dynamics model captures a high
level abstraction of the system. A multi-model paradigm combining agent-based
simulation with system dynamics allows an appropriate choice of techniques for the
different components of the system.
In online consumer-to-consumer lending, risks and uncertainties pervade aspects of
operation. The model uses consumers’ historical payments, outstanding debts, amount of
credit available, income and length of credit history to make its calculations. The
framework offers a structured approach that incorporates business processes and
stakeholder requirements and lends its use to ecommerce systems.
The developed simulation model takes into consideration differences in customer
characteristics and stochasticity in demand patterns. The framework provides insights
into the overall behavior of complex consumer-to-consumer ecommerce systems. This in turn
provides insights on the profitability of the business model and strategies for improving
system performance. The result is a recommendation for a course of action which
complements management’s expertise and intuition.
An extension to this study will be to explore the case where a borrower's request is
met by multiple lenders, and how such a strategy impacts individual and system
performance. There is also room to improve the risk classification phase. The validity of
the results hinges on correct interpretation of the output of the model. As a result, there is
also a need to improve the accuracy of the neural network prediction algorithm, working
toward a system that perpetually improves based on learning. Further research can also
investigate to what extent P2P models can reduce costs and fees and if such reduction is
worth the associated risk.
It is expected that conceptual modeling approaches will continue to be a beneficial
approach for analyzing consumer-to-consumer complex systems. This study lays a
foundation for future research to expand on the guidelines and simulation development in
modeling the operations of an organization.
REFERENCES
An, Lianjun, & Jeng, J.-J. (2005). On developing system dynamics model for business
process simulation. In Simulation Conference, 2005 Proceedings of the Winter (p. 10
pp.-). https://doi.org/10.1109/WSC.2005.1574489.
An, Liping, Du, Y., & Tong, L. (2016). Study on Return Policy in E-Commerce
Environment Based on System Dynamics. In Proceedings of the 2nd Information
York City, New York. Retrieved from http://www.systemdynamics.org/
conferences/2003/proceed/PAPERS/906.pdf.
Helal, M. (2008). A hybrid system dynamics-discrete event simulation approach to
simulating the manufacturing enterprise (Ph.D.). University of Central Florida,
United States -- Florida. Retrieved from http://search.proquest.com.ezproxy.
net.ucf.edu/pqdtft/docview/304353738/abstract/5DB9EBD8191844D2PQ/26?accoun
tid=10003.
Helal, M., Rabelo, L., Sepúlveda, J., & Jones, A. (2007). A methodology for integrating
and synchronizing the system dynamics and discrete event simulation paradigms. In
Proceedings of the 25th international conference of the system dynamics society
(Vol. 3, pp. 1–24). Retrieved from http://www.systemdynamics.org/
conferences/2007/proceed/papers/HELAL482.pdf.
Joledo, O. (2016). A hybrid simulation framework of consumer-to-consumer ecommerce
space (Doctoral Dissertation). University of Central Florida, Orlando, Florida.
Retrieved from http://stars.library.ucf.edu/etd/4969.
Joledo, O., Bernard, J., & Rabelo, L. (2014). Business Model Mapping: A Social Lending
Case Study and Preliminary Work. IIE Annual Conference. Proceedings, 1282–1290.
Khatoon, A., Bhatti, S. N., Tabassum, A., Rida, A., & Alam, S. (2016). Novel Causality
in Consumer’s Online Behavior: Ecommerce Success Model. International Journal
of Advanced Computer Science and Applications, Vol 7, Iss 12, Pp 292-299 (2016),
(12), 292. https://doi.org/10.14569/IJACSA.2016.071238.
Kiani, B., Gholamian, M. R., Hamzehei, A., & Hosseini, S. H. (2009). Using Causal
Loop Diagram to Achieve a Better Understanding of E-Business Models.
International Journal of Electronic Business Management, 7(3), 159.
Lin, J.-H., & Liu, H.-C. (2008). System Dynamics Simulation for Internet Marketing.
2008 IEEE/SICE International Symposium on System Integration, 83.
Oliva, R., Sterman, J. D., & Giese, M. (2003). Limits to growth in the new economy:
exploring the “get big fast” strategy in e-commerce. System Dynamics Review, 19(2),
83–117. https://doi.org/10.1002/sdr.271.
Qiang, X., Hui, L., & Xiao-dong, Q. (2013). System dynamics simulation model for the
electronic commerce credit risk mechanism research. International Journal of
Computer Science Issues (IJCSI), 10(2). Retrieved from http://ijcsi.org/papers/IJCSI-
10-2-3-33-40.pdf.
Rabelo, L., Eskandari, H., Shaalan, T., & Helal, M. (2007). Value chain analysis using
hybrid simulation and AHP. International Journal of Production Economics, 105(2),
536–547. https://doi.org/10.1016/j.ijpe.2006.05.011.
Rabelo, L., Eskandari, H., Shalan, T., & Helal, M. (2005). Supporting simulation-based
decision making with the use of AHP analysis. In Proceedings of the 37th conference
on Winter simulation (pp. 2042–2051). Winter Simulation Conference. Retrieved
from http://dl.acm.org/citation.cfm?id=1163064.
Sheng, S. Y., & Wong, R. (2012). A Business Application of the System Dynamics
Approach: Word-of-Mouth and Its Effect in an Online Environment. Technology
Innovation Management Review, Iss June 2012: Global Business Creation, Pp 42-48
(2012), (June 2012: Global Business Creation), 42–48.
Speller, T., Rabelo, L., & Jones, A. (2007). Value chain modelling using system
dynamics. International Journal of Manufacturing Technology and Management,
11(2), 135–156.
Sulistio, A., Yeo, C. S., & Buyya, R. (2004). A taxonomy of computer-based simulations
and its mapping to parallel and distributed systems simulation tools. Software:
Practice and Experience, 34(7), 653–673.
Wu, J.-H., & Hsia, T.-L. (2004). Analysis of E-commerce innovation and impact: a
hypercube model. Electronic Commerce Research and Applications, 3(4), 389–404.
https://doi.org/10.1016/j.elerap.2004.05.002.
AUTHORS’ BIOGRAPHIES
Dr. Oloruntomi Joledo has four years of experience developing software applications
and working as a project engineer on various technological projects. Her main research
interests include: agents, discrete-event simulations, agent-based simulations, hybrid
simulations, software development and engineering management. She works for the
College of Medicine at UCF as coordinator and data analyst.
10 years of academic and industry experience in prescriptive analytics and supply chain
management. His expertise includes machine learning, operation research and simulation
techniques for systems modelling and optimization.
In: Artificial Intelligence ISBN: 978-1-53612-677-8
Editors: L. Rabelo, S. Bhide and E. Gutierrez © 2018 Nova Science Publishers, Inc.
Chapter 12
Jose M. Prieto*
UCL School of Pharmacy, London, UK
ABSTRACT
Complex natural products such as herbal crude extracts, herbal semi-purified
fractions and essential oils (EOs) are widely used as active principles (APIs) of medicinal
products in both clinical and complementary/alternative medicine. In the food industry,
they are used to add ‘functionality’ to many nutraceuticals. However, the intrinsic
variability of their composition, together with synergisms and antagonisms between major
and minor components, makes it difficult to ensure consistent effects across different batches.
The use of Artificial Neural Networks (ANNs) for the modeling and/or prediction of the
bioactivity of such active principles as a substitute of laboratory tests has been actively
explored during the last two decades. Notably, the prediction of antioxidant and
antimicrobial properties of natural products have been a common target for researchers.
The accuracy of the predictions seems to be limited only by the inherent errors of the
modelled tests and the lack of international agreements in terms of experimental
protocols. However, with sufficient accumulation of suitable information, ANNs can
become reliable, fast and cheap tools for the prediction of antioxidant, antimicrobial and
anti-inflammatory activities, thus improving the use of natural products in medicine and
nutrition.
* Corresponding Author Email: j.prieto@ucl.ac.uk.
INTRODUCTION
Artificial neural networks are a type of artificial intelligence method. They are
applied in many disparate areas of human endeavour, such as the prediction of stock
market fluctuations in economics, forecasting of electricity load in the energy industry,
milk production in husbandry, the quality and properties of ingredients and products in the
food industry, the prediction of bioactivities in toxicology and pharmacology, and the
optimization of separation processes in chemistry (Dohnal, Kuča & Jun, 2005; Goyal,
2013).
In particular, the prediction of the bioactivity of natural products from their unique
chemical composition is an idea already well established among the scientific community
but not yet systematically explored, owing to the experimental complexity of characterising
all possible chemical interactions between dozens of components (Burt, 2004). In this
regard, neural networks have an enormous advantage: they require less formal
statistical training, can detect complex non-linear relationships between dependent and
independent variables and all possible interactions without complicated equations, and can
use multiple training algorithms. Moreover, in terms of model specification, ANNs
require no knowledge of the internal mechanism of the processes, but since they often
contain many weights that must be estimated, they require large training sets. The various
applications of ANNs can be summarized as classification or pattern recognition,
prediction and modeling (Agatonovic-Kustrin & Beresford, 2000; Cartwright, 2008).
Therefore, the use of ANNs may overcome these difficulties thus becoming a
convenient computational tool allowing the food and cosmetic industry to select herbal
extracts or essential oils with optimal preservative (antioxidant and antimicrobial
properties) or pharmacological activities (anti-inflammatory properties). This is not
trivial, as natural products are notoriously complex in terms of chemical composition,
which may significantly vary depending on the batch and the supplier. This variability
implies a constant use of laboratory analysis. ANNs able to model and predict such
properties would result in savings and enhanced consistency of the final product. The use
of such computational models holds potential to account for all the possible (bio)chemical
interactions, synergisms and antagonisms between the numerous components of active
natural ingredients.
published by Krogh (2008), Dohnal et al. (2005), and Zupan & Gasteiger (1991). These
are listed in order of increasing complexity for a smooth progression.
The conception of an Artificial Neurone (AN) originates directly from the biological
neuron. Each AN has a certain number of inputs, and each input is assigned its own
weight, which indicates the importance of that input. In the neuron, the sum of the weighted
inputs is calculated and, when this sum exceeds a certain value called the threshold
(also known as the bias), it is processed by a transfer function and the
result is passed through the output to the next AN (Figure 1).
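This weighted-sum-and-transfer-function behaviour can be sketched in a few lines of Python. This is a generic illustration, not code from the chapter's models: the sigmoid transfer function and all the numbers are illustrative assumptions.

```python
import math

def artificial_neuron(inputs, weights, threshold):
    """One artificial neurone: sum the weighted inputs, subtract the
    threshold (bias), and pass the result through a transfer function."""
    s = sum(x * w for x, w in zip(inputs, weights)) - threshold
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid transfer function

# Two inputs; the larger first weight marks the first input as more important.
output = artificial_neuron([1.0, 0.5], weights=[0.8, 0.2], threshold=0.5)
```

Any other monotonic transfer function (step, tanh, linear) could take the sigmoid's place; the weighted sum and threshold are the essential ingredients.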
Similarly, the term “artificial neural network” (ANN) originates from its biological
counterpart, the neural network (NN), i.e., the network of interconnected neurons in
a living organism. The function of a NN is defined by many factors, for example the
number and arrangement of neurons and their interconnections. Figure 2 shows how
ANNs are based on the same conception as their biological counterparts; they can be considered
collections of interconnected computing units called artificial neurons (ANs). The network
is composed of a set of virtual/artificial neurons organised in interconnected layers, each
neuron having a specific weight in the processing of the information. While two of these
layers are connected to the ‘outside world’ (the input layer, where data are presented, and
the output layer, where a prediction value is obtained), the remaining (hidden) layers are
defined by neurons connected to each other, usually excluding neurons of the same layer
(Figure 2).
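The layered arrangement just described can be sketched as a forward pass through one hidden layer. Again this is a hypothetical illustration: the layer sizes and weights below are arbitrary, not taken from any model in this chapter.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weights, biases):
    """One fully interconnected layer: each neuron weighs all the inputs
    of the previous layer, subtracts its bias, and applies the sigmoid."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) - b)
            for ws, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> one hidden layer -> output layer."""
    h = layer(x, hidden_w, hidden_b)
    return layer(h, out_w, out_b)

# 3 inputs -> 2 hidden neurons -> 1 output (weights chosen arbitrarily).
y = forward([0.2, 0.7, 0.1],
            hidden_w=[[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]],
            hidden_b=[0.0, 0.1],
            out_w=[[1.2, -0.7]],
            out_b=[0.2])
```

Note that neurons within a layer do not feed each other, matching the usual exclusion of same-layer connections mentioned above.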
Figure 1. Comparison between form and function of biological and artificial neurones.
280 Jose M. Prieto
Figure 2. Comparison of form and function in (A) biological and (B) artificial neuronal networks (©Jose M Prieto).
Artificial Intelligence for the Modeling and Prediction ... 281
Figure 3. Supervised training of an artificial neuronal network (©Jose M Prieto). (A) Training set of inputs and outputs representing the experimental values taken from real life; (B) the ANN builds up an algorithm through a series of iterations in which the weights and thresholds are finely tuned to get as close as possible to the output value given in (A).
Figure 4. (A) Comparison between real (squares) experimental values and those calculated or predicted
by the ANN (dots). (B) Quantitative measurement of the performance of the ANN (From Daynac,
Cortes-Cabrera, & Prieto, 2016).
The architecture can vary in terms of the number of internal layers, the number of ANs in each
layer, the connections between ANs (fully or partially interconnected layers), and the
transfer function chosen for the signal processing of each AN.
From ANN theory it is evident that there are many values (weights, thresholds)
which have to be set. To do so, many adaptation algorithms have been developed, which
mainly fall into two basic groups: supervised and unsupervised.
A supervised algorithm requires knowledge of the desired output. The algorithm
calculates the output with the current weights and biases; the output is compared with
the targeted output, and the weights and biases are adjusted accordingly. This cycle is
repeated until the difference between targeted and calculated values is as small as it can
get. The most widely applied supervised algorithms are based on gradient methods (for example
‘back propagation’) (Figure 3) and on genetic algorithms. While a supervised
learning algorithm requires knowledge of the output values, an unsupervised one does
not: it produces its own output, which needs further evaluation.
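The supervised cycle (calculate the output, compare with the target, adjust the weights and biases, repeat) can be illustrated for a single sigmoid neuron trained by gradient descent on the logical OR function. This is a toy sketch of the principle, not one of the chapter's models; the learning rate and epoch count are arbitrary choices.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train(samples, targets, epochs=2000, lr=0.5):
    """Supervised training cycle for one sigmoid neuron (delta rule):
    compute the output, compare it with the target, adjust the weights
    and bias along the error gradient, and repeat."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            y = sigmoid(sum(xi * wi for xi, wi in zip(x, w)) - b)
            delta = (y - t) * y * (1.0 - y)          # gradient of squared error
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b = b + lr * delta                        # bias enters with a minus sign
    return w, b

# Four labelled examples of the OR function serve as the training set.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
T = [0, 1, 1, 1]
w, b = train(X, T)
```

After training, the calculated outputs sit close to the targeted outputs, which is exactly the stopping criterion described in the text.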
When the ANN finishes the adjustments after an established number of iterations (or
epochs), it is necessary to check that it actually is ‘fit for purpose’: the prediction ability
of the network is tested on a validating set of data. This time only the input values
of the data are given to the network, which calculates its own output. The
difference between the real outputs and the calculated ones can then be investigated to
evaluate the prediction accuracy of the network. This can be visualised directly (as in
Figure 4A), but ultimately the performance of the predictions has to be measured by
linear correlation (see Figure 4B).
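The linear-correlation check of Figure 4B amounts to computing a Pearson correlation between the experimental values and those calculated by the network on the validation set. The sketch below is generic, and the numbers are invented for illustration only.

```python
def pearson_r(real, predicted):
    """Linear correlation between experimental outputs and the outputs
    calculated by the ANN on a validating set of data."""
    n = len(real)
    mr = sum(real) / n
    mp = sum(predicted) / n
    cov = sum((r - mr) * (p - mp) for r, p in zip(real, predicted))
    sd_r = sum((r - mr) ** 2 for r in real) ** 0.5
    sd_p = sum((p - mp) ** 2 for p in predicted) ** 0.5
    return cov / (sd_r * sd_p)

# Hypothetical validation set: real experimental values vs ANN predictions.
real      = [0.10, 0.35, 0.50, 0.80, 0.90]
predicted = [0.12, 0.30, 0.55, 0.75, 0.93]
r = pearson_r(real, predicted)
```

A value of r close to 1 indicates that the network's predictions track the experimental values well; r near 0 would mean the network has learned nothing useful.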
Two main areas of application are directly linked with the potential use of natural
products: the food industry and pharmaceutical research. Both have started to use ANNs as a
tool to predict the best processing methods and the properties of the final
products made from natural sources. ANNs are perhaps better established in the food
chemistry sector, whilst their use in pharmaceutical research lags behind.
Indeed, ANNs have been applied in almost every aspect of food science over the past
two decades, although most applications are in the development stage. ANNs are useful
tools for food safety and quality analyses, which include modeling microbial growth
(and from this predicting food safety), interpreting spectroscopic data, and predicting the
physical, chemical, functional and sensory properties of various food products during
processing and distribution (Huang, Kangas, & Rasco, 2007; Bhotmange & Shastri,
2011).
On the one hand, applications of ANNs to food technology, for example the control of
bread making, extrusion and fermentation processes (Batchelor, 1993; Eerikanen &
Linko, 1995; Latrille, Corrieu, & Thibault, 1993; Ruan, Almaer, & Zhang, 1995), are
feasible, accurate and easy to implement, and result in noticeable advantages and
savings for the manufacturer. On the other hand, the prediction of functionality (for example
antioxidant or antimicrobial activities) is much less explored, perhaps because of the
complexity of the associated experimental design, which we discuss in detail
later, and the less obvious advantages for the manufacturer.
The potential applications of ANN methodology in the pharmaceutical sciences
range from the interpretation of analytical data and drug and dosage form design through
biopharmacy to clinical pharmacy. This sector focuses more on the use of ANNs to
predict extraction procedures (similarly to the food sector) and pharmacokinetic and
toxicological parameters. These three aspects are usually non-linear and thus in need of AI
tools that can recognise patterns from data and estimate non-linear relationships. Their
growing utility is now reaching several important pharmaceutical areas.
Most of the above problems are solved for the case of single (natural or synthetic)
drugs. However, the urgency of applying ANN-based approaches is best perceived in the
clinical rationalisation and exploitation of herbal medicines. Herbal medicines contain at
least one plant-based active ingredient, which in turn contains dozens to hundreds of
components (phytochemicals). To start with, little is known about which phytochemical(s)
is/are responsible for the putative properties of the herbal ingredient. Chagas-Paula et al.
(2015) successfully applied ANNs to predict the effect of Asteraceae species which are
traditionally used in Europe as anti-inflammatory remedies (for details see “Prediction of
the anti-inflammatory activities” below). When multiple herbal ingredients (10-20) are
used, such as in Traditional Chinese Medicine, the exact role of each drug may only be
understood if the myriad of influencing factors are harnessed by AI means
(Han, Zhang, Zhou, & Jiang, 2014), taking advantage of the fact that ANNs require no
knowledge of the internal mechanism of the processes to be modelled.
Similar to pharmacotoxicology, pathology is a complex field in which modern high-
throughput biological technology can simultaneously assess the expression levels of
tens of thousands of putative biomarkers in pathological conditions such as tumours.
However, turning this complexity into meaningful classifications to support clinical decisions
depends on linear or non-linear discriminant functions that are too complex for classical
statistical tools. ANNs can solve this issue and provide more reliable cancer
classification through their ability to learn how to recognise patterns (Wang, Wong, Zhu, &
Yip, 2009).
Sample | Inputs | Predicted property | Reference
Ascorbic acid in green Asparagus | Thermal treatment parameters | Ascorbic acid content | Zheng et al., 2011
Bananas | | Total phenols | Guiné et al., 2015
Bayberry juice | Red, green, and blue (RGB) intensity values | Anthocyanins, ascorbic acid, total phenols, flavonoids, and antioxidant activity | Zheng et al., 2011
Centella asiatica | Selected shifts in the 1H Nuclear Magnetic Resonance spectra corresponding to 3,5-O-dicaffeoyl-4-O-malonilquinic acid (irbic acid), 3,5-di-O-caffeoylquinic acid, 4,5-di-O-caffeoylquinic acid, 5-O-caffeoylquinic acid (chlorogenic acid), quercetin and kaempferol | DPPH radical scavenging activity | Maulidiani et al., 2013
Cinnamon, clove, mung bean, red bean, red rice, brown rice, black rice and tea extract | Colorimetry of the reaction | % scavenged DPPH | Musa et al., 2015
Clove bud essential oil; ginger, pimento and black pepper extracts | Peroxide concentration; thiobarbituric acid reactive substances; diene conjugate content; content of volatile compounds formed as products of unsaturated fatty acid peroxide degradation; composition of methyl esters of fatty acids | Autooxidation of polyunsaturated fatty acids in linseed oil | Misharina et al., 2015
Commercial teas | Total flavonoids, total catechins and total methylxanthines | Total antioxidant activity | Cimpoiu et al., 2011
ANNs have also been used to predict the minimum inhibitory concentration (MIC) of synthetic drugs (Jaén-Oltra et al., 2000). Recently, some works
have explored the use of such an approach to predict the MIC of complex chemical mixtures
against some causal agents of foodborne disease and/or food spoilage (Sagdic, Ozturk & Kisi,
2012; Daynac, Cortes-Cabrera & Prieto, 2016).
Essential oils are natural products popularly branded as ‘antimicrobial agents’. They
act upon microorganisms through a not yet well-defined mixture of both specific and
unspecific mechanisms. In this regard, ANNs are a very good option, as they have been
successfully applied to processes with complex or poorly characterised mechanisms:
they only take into account the causing agent and its final effect (Dohnal et al., 2005;
Najjar et al., 1997).
Indeed, the antibiotic activities of essential oils depend on a complex chemistry and a
poorly characterised mechanism of action. Different monoterpenes penetrate through cell
wall and cell membrane structures at different rates, ultimately disrupting the
permeability barrier of cell membrane structures and compromising chemiosmotic
control (Cox et al., 2000). It is therefore conceivable that differences in Gram staining
would be related to the relative sensitivity of a microorganism to essential oils. However,
this generalisation is controversial, as illustrated by conflicting reports in the literature.
Nakatani (1994) found that Gram-positive bacteria were more sensitive to essential oils
than Gram-negative bacteria, whereas Deans and Ritchie (1987) could not find any
differences related to the Gram reaction. The permeability of the membrane is only one factor,
and the same essential oil may act through different mechanisms upon different
microorganisms. As an example, the essential oil of Melaleuca alternifolia (tea tree),
which inhibited respiration and increased the permeability of bacterial cytoplasmic and
yeast plasma membranes, also caused potassium ion leakage in the case of E. coli and S.
aureus (Cox et al., 2001).
To further complicate matters, the antimicrobial activity of natural
products cannot always be attributed to one single compound in the mixture;
the overall activity may be due to interactions between components of the essential oils.
In fact, synergisms and antagonisms have been consistently reported, as reviewed by Burt
(2004). The challenge posed by the countless chemical interactions between
dozens of essential oil components and the microbes is virtually impossible to address in the
laboratory, but it may be solved using computational models such as artificial neural
networks (ANNs). In addition, ANNs are theoretically able to consider synergies and
antagonisms between inputs. There is a consistent body of data reporting many crude essential
oils to be more active than their separated fractions or components, pointing towards synergies.
In some cases synergistic activity between two or three components could be
demonstrated experimentally (Didry et al., 1993; Pei et al., 2009), but doing so with
dozens of chemicals is beyond reach. In fact, ANNs are algorithms with the
capacity to approximate an output value based on input data without any previous
knowledge of the model and regardless of the complexity of its mechanisms; in this case, the
relationship between the chemical composition of a given essential oil (input data) and its
antimicrobial capacity (output). The enormous amount of
information produced on the antimicrobial activity of essential oils provides a rich field
for data mining, and it is conceivable to apply suitable computational techniques to
predict the activity of any essential oil just by knowing its chemical composition.
Our results reflect the variability in the susceptibility of different
microorganisms to the same essential oil, but more importantly they point towards some
general trends. The antimicrobial effects of essential oils upon S. aureus and C.
perfringens (Gram-positive) were accurately modelled by our ANNs, indicating a clear
relationship between the chemistry of EOs and the susceptibility of these organisms, and
perhaps suggesting a more additive, physical (rather than pharmacological) mechanism of action.
This also opens the prospect for further studies to ascertain the best set of volatile components
providing optimum antimicrobial activity against these two pathogens and/or Gram-positive
bacteria in general. On the other hand, the lower accuracy of the predictions against E. coli
(Gram-negative) and C. albicans (yeast) may suggest more complex pharmacological actions of the
chemicals. In this case the activity may be pinned down to one or a few active principles
acting individually or in synergy.
Ozturk et al. (2012) studied the effects of some plant hydrosols obtained from bay
leaf, black cumin, rosemary, sage, and thyme in reducing Listeria monocytogenes on the
surface of fresh-cut apple cubes. In addition to antibacterial measurements, the abilities of
Adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), and
multiple linear regression (MLR) models were compared with respect to estimation of the
survival of the pathogen. The results indicated that the ANFIS model performed best
for estimating the effects of the plant hydrosols on L. monocytogenes counts. The ANN
model was also effective, but the MLR model was poor at predicting
microbial numbers. This further proves the superiority of AI over multivariate statistical
methods in modeling complex bioactivities of chemically complex products.
Antiviral Activities
Viruses are still a major, poorly addressed challenge in medicine. The prediction of
antiviral properties of chemical entities or the optimisation of current therapies to
enhance patient survival would be of great impact but the application of AI to this
conundrum has been less explored than in the case of antibacterials. Perhaps the most
pressing issue is the search for improved combinations of antiretroviral drugs to suppress
HIV replication without inducing viral drug resistance. The choice of an alternative
regimen may be guided by a drug-resistance test; however, the interpretation of resistance
from genotypic data poses a major challenge. Larder and co-workers (2007) trained
ANNs with genotype, baseline viral load, time to follow-up viral load, baseline CD4+
T-cell counts and treatment history variables. These models performed at a low-to-
intermediate level, explaining 40-61% of the variance. The authors concluded that this
was still a step forward and that these data indicate that ANN models can be quite
accurate predictors of virological response to HIV therapy, even for patients from
unfamiliar clinics.
We recently tried to model the activity of essential oils on herpes viruses (types 1 and
2) by both MLR and ANNs (Tanir & Prieto, unpublished results). Our results did not
identify a clear subset of active chemicals; rather, the best results were given by
datasets representing all major components. This highlights that viruses are a much
harder problem to model, and more work must be done towards solving it.
Internal Factors
Some of the reported problems in the application of ANNs are caused by their
inherent structure; the most important are ‘overtraining’, the ‘peaking effect’ and
‘network paralysis’. Overtraining the ANN may lead to the noise of the data used for training
becoming fixed in the network weights. The peaking effect is experienced when an excessive
number of hidden neurons minimises the error in training but increases the error in testing.
Finally, network paralysis appears when an excessive adjustment of neuron weights produces
very high negative or positive values, driving sigmoid activation functions into saturation,
where outputs barely change and learning stalls (Kröse & van der Smagt, 1996). These
limitations must be taken into account and minimized with an adequate choice of the network
topology and a careful selection of neurone parameters (transfer function, weights,
threshold, etc.).
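Network paralysis can be illustrated numerically: once the weighted sum fed into a sigmoid becomes very large (positive or negative), the derivative used to adjust the weights collapses to almost zero. This is a generic sketch of the effect, not code from any of the cited works.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_grad(s):
    """Derivative of the sigmoid, y * (1 - y), which scales every
    weight adjustment during gradient-based training."""
    y = sigmoid(s)
    return y * (1.0 - y)

# A moderate weighted sum leaves a usable gradient for learning;
# an extreme one saturates the sigmoid and the gradient vanishes,
# i.e., the 'network paralysis' described above.
healthy = sigmoid_grad(0.5)
paralysed = sigmoid_grad(20.0)
```

Keeping weights and inputs in a moderate range (through normalisation and careful initialisation) is what keeps the network out of these flat, unlearnable regions.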
External Factors
From our experience, the most problematic factors influencing the accuracy of the
predictions when data mining are noise (inaccurate data), normalisation of
the outputs to acceptable ranges (0-1 for better results) and topology complexity (too
many inputs).
In the case of very complex chemical entities, such as natural products, noise
reduction needs to be achieved by carefully selecting data sets from papers with
similar values for the reference drugs. Bioassays are far from being performed in the
same way (i.e., with the same protocol) around the world. Even within the same institution or
laboratory, differences will arise between users, each modifying the protocol slightly
to adapt it to their needs. In this regard it is of the utmost importance that all use the same
reference drug (antioxidant, antimicrobial, anti-inflammatory, etc.); however, this is
extremely variable across papers and sometimes absent altogether. The reduced number of
valid data available to train and validate the ANNs forces the use of small sets, which may
in turn induce bias (Bucinski, Zielinski & Kozlowska, 2004; Cortes-Cabrera & Prieto,
2010; Daynac, Cortes-Cabrera & Prieto, 2016). It would be tempting to also discuss the
physicochemical incompatibility of many synthetic drugs and natural products with most
of the milieux in which the bioassays are run (solvent polarity, microbiological/cell culture
media, etc.), due mostly to their volatility and poor solubility, but this would be beyond
the scope of this chapter.
The challenge in modeling the activity of essential oils lies mainly in the selection of
inputs and of the topology. Ideally, the data set would include all variables
influencing the bioactivity to be modelled. In practice, more than 30 such
inputs add tremendous complexity to the network, and the numbers of inputs
used in other ANNs are generally far lower than the datasets we are able to generate. On the other
hand, restricting the input data set inevitably leads to a bias, but it is the only way
forward to overcome this problem. Also, the restricted number of comparable
data in the literature results in a low number of learning and validating sets. These
factors do not invalidate the use of ANNs but limit any generalisation of the results
(Najjar et al., 1997). By reducing the inputs to the most relevant compounds (for
example, retaining only those with reported activity), the researcher can reduce the
number of input neurons, and subsequently of hidden neurons, thereby minimising the
problems associated with topology complexity. However, the number of inputs used in
our works remains far higher than in any of the previous attempts reported in the literature
(Bucinski, Zielinski & Kozlowska, 2004; Torrecilla, Mena, Yáñez-Sedeño, & García,
2007). The deliberate choice of active compounds may also introduce bias and
hamper the accuracy of the ANNs when synergies with non-active components are
significantly involved. For example, in our work on the antioxidant activities of essential
oils, from an initial set of around 80 compounds present in these, only 30 compounds
with relevant antioxidant capacity were selected, to avoid excessive complexity of the
neural network and minimise the associated structural problems. Similarly, in our work
on the antimicrobial activities of essential oils, from an initial set of
around 180 compounds present in these, only 22 compounds were selected. In this latter
case two strategies were considered: either to retain only the compounds with known
antimicrobial properties, or to eliminate the compounds without known antimicrobial
activity and/or present at very low percentages (≤5%). The first strategy proved to give
better results (Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera & Prieto, 2016).
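The first input-selection strategy (retaining only compounds with reported activity) amounts to a simple filter over the composition data. The sketch below is purely illustrative: the compound names and percentages are invented, not the actual data sets used in the cited works.

```python
def select_inputs(composition, known_active):
    """Strategy 1: retain as ANN inputs only the compounds with
    reported activity, discarding everything else."""
    return {name: pct for name, pct in composition.items()
            if name in known_active}

# Hypothetical essential-oil composition (% of total) and activity list.
oil = {"thymol": 45.0, "carvacrol": 3.1, "p-cymene": 18.4, "trace_ester": 0.4}
active = {"thymol", "carvacrol"}
inputs = select_inputs(oil, active)
```

Each retained compound becomes one input neuron, so shrinking this dictionary directly shrinks the network topology; the trade-off, as noted above, is the risk of discarding components that act only through synergies.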
The output values often need to be normalised to a range, usually between 0
and 1. This implies diverse strategies depending on how many orders of magnitude
the original data span. A common approach is to apply logarithms to the original
values (log x, or log 1/x) (Cortes-Cabrera & Prieto, 2010; Daynac, Cortes-Cabrera &
Prieto, 2016; Buciński et al., 2009).
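A log-then-rescale normalisation of this kind might look as follows. This is a generic sketch: the example values are invented, and the min-max rescaling after the logarithm is our assumption about how the 0-1 range is reached.

```python
import math

def normalise_log(values):
    """Map positive outputs spanning several orders of magnitude into
    [0, 1]: take log10, then min-max rescale the logged values."""
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    return [(l - lo) / (hi - lo) for l in logs]

# Hypothetical MIC-like outputs spanning four orders of magnitude.
raw = [0.5, 5.0, 50.0, 500.0, 5000.0]
scaled = normalise_log(raw)
```

The log 1/x variant mentioned in the text simply flips the ordering, so that the most potent samples (lowest MIC) map to the highest normalised values.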
Finally, the overall performance of the ANNs depends on the complexity of the
biological phenomenon to be modelled. In our hands, performance in predicting the
results of antimicrobial assays was lower than in predicting purely biochemical assays. The
higher degree of variability in the response of whole living organisms versus the higher
reproducibility of biochemical reactions is in agreement with the work on antiviral
activities discussed above.
Back in 1991, Zupan and Gasteiger questioned the future of the application of ANNs;
at the time only a few applications had been reported despite a healthy output of research on
ANNs (Zupan & Gasteiger, 1991). The affordability of computational power and the
availability of ANN software with friendlier interfaces have since made this tool more
accessible and appealing to the average researcher in fields far from computing,
facilitating its application to many different scientific fields. It is nowadays an add-on
to all main statistical software packages, or available for free as standalone programs.
We can envisage a future where omics technologies and systems biology feed data in real time into cloud-
based ANNs to build increasingly accurate predictions and classifications of the biochemical
activities of complex natural products, facilitating their rational clinical use to improve
healthcare and food safety worldwide.
REFERENCES
Agatonovic-Kustrin, S., Beresford, R. (2000). Basic concepts of artificial neural network
(ANN) modeling and its application in pharmaceutical research. J Pharm Biomed
Anal, 22, 717-727.
Agatonovic-Kustrin, S. & Loescher, C. (2013). Qualitative and quantitative high
performance thin layer chromatography analysis of Calendula officinalis using high
resolution plate imaging and artificial neural network data modelling. Anal Chim
Acta, 798, 103-108.
Asnaashari, E., Asnaashari, M., Ehtiati, A., & Farahmandfar, R. (2015). Comparison of
adaptive neuro-fuzzy inference system and artificial neural networks (MLP and RBF)
for estimation of oxidation parameters of soybean oil added with curcumin. J Food
Meas Char, 9, 215-224.
Asnaashari, M., Farhoosh, R., & Farahmandfar, R. (2016), Prediction of oxidation
parameters of purified Kilka fish oil including gallic acid and methyl gallate by
adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network. J Sci
Food Agr, 96, 4594-4602.
Batchelor, B. (1993). Automated inspection of bread and loaves. Int Soc Opt Eng (USA),
2064, 124-134.
Bhotmange, M. & Shastri, P. (2011). Application of Artificial Neural Networks to Food
and Fermentation Technology. In: Suzuki K Artificial Neural Networks - Industrial
and Control Engineering Applications, Shanghai, InTech, 2011; 201-222.
Buciński, A., Socha, A., Wnuk, M., Bączek, T., Nowaczyk, A., Krysiński, J., Goryński,
K., & Koba, M. (2009). Artificial neural networks in prediction of antifungal activity
of a series of pyridine derivatives against Candida albicans, J Microbiol Methods, 76,
25-29.
Bucinski, A., Zielinski, H., & Kozlowska, H. (2004). Artificial neural networks for
prediction of antioxidant capacity of cruciferous sprouts. Trends Food Sci Technol,
15, 161-169.
Burt, S. (2004). Essential oils: their antibacterial properties and potential applications in
food—a review. Int J Food Microbiol, 94, 223–253.
Cartwright, H. (2008). Artificial neural networks in biology and chemistry: the evolution
of a new analytical tool. Methods Mol Biol., 458, 1-13.
Chagas-Paula, D., Oliveira, T., Zhang, T., Edrada-Ebel, R., & Da Costa, F. (2015).
Prediction of anti-inflammatory plants and discovery of their biomarkers by machine
learning algorithms and metabolomic studies. Planta Med, 81, 450-458.
Chen, Q., Guo, Z., Zhao, J., & Ouyang, Q. (2012). Comparisons of different regressions
tools in measurement of antioxidant activity in green tea using near infrared
spectroscopy. J Pharm Biomed Anal., 60, 92-97.
Chen, Y., Cao, W., Cao, Y., Zhang, L., Chang, B., Yang, W., & Liu X. (2011). Using
neural networks to determine the contribution of danshensu to its multiple
cardiovascular activities in acute myocardial infarction rats. J Ethnopharmacol.,
138,126-134.
Cimpoiu, C., Cristea, V., Hosu, A., Sandru, M., & Seserman, L. (2011). Antioxidant
activity prediction and classification of some teas using artificial neural networks.
Food Chem, 127, 1323-1328.
Cortes-Cabrera, A. & Prieto, J. (2010). Application of artificial neural networks to the
prediction of the antioxidant activity of essential oils in two experimental in vitro
models. Food Chem, 118, 141–146.
Cox, S., Mann, C., & Markham, J. (2000). The mode of antimicrobial action of the
essential oil of Melaleuca alternifolia (Tea tree oil). J Applied Microbiol, 88, 170–
175.
Cox, S., Mann, C., & Markham, J. (2001). Interactions between components of the
essential oil of Melaleuca alternifolia. J Applied Microbiol, 91, 492–497.
Daynac, M., Cortes-Cabrera, A., & Prieto, J. (2015). Application of Artificial Intelligence
to the Prediction of the Antimicrobial Activity of Essential Oils. Evidence-Based
Complementary and Alternative Medicine, Article ID 561024, 9 pages.
Deans, S. & Ritchie G. (1987). Antibacterial properties of plant essential oils. Int J Food
Microbiol, 5, 165–180.
Desai, K., Vaidya B., Singhal, R., & Bhagwat, S. (2005). Use of an artificial neural
network in modeling yeast biomass and yield of β-glucan, Process Biochem, 40,
1617-1626.
Didry, N., Dubreuil, L., & Pinkas, M. (1993). Antimicrobial activity of thymol, carvacrol
and cinnamaldehyde alone or in combination. Pharmazie, 48, 301–304.
Dohnal, V., Kuča, K., & Jun, D. (2005). What are artificial neural networks and what
they can do? Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub., 149, 221–
224.
Eerikanen, T. & Linko, P. (1995). Neural network based food extrusion cooker control.
Engineering Applications of Artificial Neural Networks. Proceedings of the
International Conference EANN ’95, 473-476.
García-Domenech, R. & de Julián-Ortiz, J. (1998). Antimicrobial Activity
Characterization in a Heterogeneous Group of Compounds. J Chem Inf Comput Sci.,
38, 445-449.
Goodacre, R., Timmins, E., Burton, R., Kaderbhai, N., Woodward, A., Kell, D., &
Rooney, P. (1998). Rapid identification of urinary tract infection bacteria using
hyperspectral whole-organism fingerprinting and artificial neural networks.
Microbiology, 144, 1157-1170.
Goyal, S. (2013). Artificial neural networks (ANNs) in food science – A review. Int J Sci
World, 1, 19-28.
Guiné, R., Barroca, M., Gonçalves, F., Alves, M., Oliveira, S., & Mendes, M. (2015).
Artificial neural network modelling of the antioxidant activity and phenolic
compounds of bananas submitted to different drying treatments. Food Chem., 168,
454-459.
Han, S., Zhang, X., Zhou, P., & Jiang, J. (2014). Application of chemometrics in
composition-activity relationship research of traditional Chinese medicine. Zhongguo
Zhongyao Zazhi, 39, 2595-2602.
Huang, Y., Kangas, L., & Rasco, B. (2007). Applications of artificial neural networks
(ANNs) in food science. Crit Rev Food Sci Nutr, 47, 113-126.
Huuskonen, J., Salo, M., & Taskinen, J. (1998). Aqueous Solubility Prediction of Drugs
Based on Molecular Topology and Neural Network Modeling. J Chem Inf Comput
Sci, 38, 450-456.
Jaén-Oltra, J., Salabert-Salvador, M, García-March, J., Pérez-Giménez, F., & Tomás-
Vert, F. (2000). Artificial neural network applied to prediction of fluorquinolone
antibacterial activity by topological methods. J. Med. Chem., 43, 1143–1148.
Jalali-Heravi, M., & Parastar, F. (2000). Use of artificial neural networks in a QSAR
study of anti-HIV activity for a large group of HEPT derivatives. J Chem Inf Comput
Sci., 40, 147-154.
Jezierska, A., Vračko, M., & Basak, S. (2004). Counter-propagation artificial neural
network as a tool for the independent variable selection: Structure-mutagenicity study
on aromatic amines. Mol Divers, 8, 371–377.
Karaman, S., Ozturk, I., Yalcin, H., Kayacier, A., & Sagdic, O. (2012). Comparison of
adaptive neuro-fuzzy inference system and artificial neural networks for estimation
of oxidation parameters of sunflower oil added with some natural byproduct extracts.
J Sci Food Agric, 92, 49-58.
Kovesdi, I., Ôrfi, L., Náray-Szabó, G., Varró, A., Papp, J., & Mátyu P. (1999).
Application of neural networks in structure-activity relationships. Med Res Rev., 19,
249-269.
Krogh, A. (2008). What are artificial neural networks? Nature biotechnol, 26, 195-197.
Kröse, B., & van der Smagt, P. (1996). An introduction to neural networks (8th ed.).
University of Amsterdam.
Larder, B., Wang, D., Revell, A., Montaner, J., Harrigan, R., De Wolf, F., Lange, J.,
Wegner, S., Ruiz, L., Pérez-Elías, M., Emery, S., Gatell, J., Monforte, A., Torti, C.,
Zazzi, M., & Lane, C. (2007). The development of artificial neural networks to
predict virological response to combination HIV therapy. Antivir Ther., 12, 15-24.
Latrille, E., Corrieu, G., & Thibault J. (1993). pH prediction and final fermentation time
determination in lactic acid batch fermentations. Comput. Chem. Eng. 17, S423-
S428.
Ma, J., Cai, J., Lin, G., Chen, H., Wang, X., Wang, X., & Hu, L. (2014). Development of
LC-MS determination method and back-propagation ANN pharmacokinetic model of
corynoxeine in rat. J Chromatogr B Analyt Technol Biomed Life Sci., 959, 10-15.
Maulidiani, A., Khatib, A., Shitan, M., Shaari, K., & Lajis, N. (2013). Comparison of
Partial Least Squares and Artificial Neural Network for the prediction of antioxidant
activity in extract of Pegaga (Centella) varieties from 1H Nuclear Magnetic
Resonance spectroscopy. Food Res Int, 54, 852-860.
Mendelsohn, A. & Larrick, J. (2014). Paradoxical Effects of Antioxidants on Cancer.
Rejuvenation Research, 17(3), 306-311.
Misharina, T., Alinkina, E., Terenina, M., Krikunova, N., Kiseleva, V., Medvedeva. I., &
Semenova, M. (2015). Inhibition of linseed oil autooxidation by essential oils and
extracts from spice plants. Prikl Biokhim Mikrobiol., 51, 455-461.
Murcia-Soler, M., Pérez-Giménez, F., García-March, F., Salabert-Salvador, M., Díaz-
Villanueva, W., Castro-Bleda, M., & Villanueva-Pareja, A. (2004). Artificial Neural
Networks and Linear Discriminant Analysis: A Valuable Combination in the
Selection of New Antibacterial Compounds. J Chem Inf Comput Sci., 44, 1031–1041.
Musa, K., Abdullah, A., & Al-Haiqi, A. (2015). Determination of DPPH free radical
scavenging activity: Application of artificial neural networks. Food Chemistry,
194(12), 705-711.
Nagahama, K., Eto, N., Yamamori, K., Nishiyama, K., Sakakibara, Y., Iwata, T., Uchida,
A., Yoshihara, I., & Suiko, M. (2011). Efficient approach for simultaneous estimation
of multiple health-promoting effects of foods. J Agr Food Chem, 59, 8575-8588.
Najjar, Y., Basheer, I., & Hajmeer, M. (1997). Computational neural networks for
predictive microbiology: i. methodology. Int J Food Microbiol, 34, 27– 49.
Nakatani, N. (1994). Antioxidative and antimicrobial constituents of
herbs and spices.
Dev Food Scie, 34, 251– 271.
298 Jose M. Prieto
www.electronicbo.com
Rahman, A., Afroz, M., Islam, R., Islam, K., Amzad Hossain, M., & Na, M. (2014). In
vitro antioxidant potential of the essential oil and leaf extracts of Curcuma zedoaria
Rosc. J Appl. Pharm Sci, 4, 107-111.
Ruan, R., Almaer, S., & Zhang, S. (1995). Prediction of dough rheological properties
using neural networks. Cereal Chem, 72(3), 308-311.
Sagdic, O., Ozturk, I., & Kisi, O. (2012). Modeling antimicrobial effect of different grape
pomace and extracts on S. aureus and E. coli in vegetable soup using artificial neural
network and fuzzy logic system. Expert Systems Applications, 39, 6792-6798.
Shahidi, F. (2000). Antioxidants in food and food antioxidants. Nahrung, 44, 158–163.
Sharma, A., Mann, B., & Sharma, R. (2012). Predicting antioxidant capacity of whey
protein hydrolysates using soft computing models. Advances in Intelligent and Soft
Computing, 2, 259-265.
Tanir, A. & Prieto, J. (2016). Essential Oils for the Treatment of Herpes Virus Infections:
A Critical Appraisal Applying Artificial Intelligence and Statistical Analysis Tools.
Unpublished results.
Torrecilla, J., Mena, M., Yáñez-Sedeño, P., & García J. (2007). Application of artificial
neural networks to the determination of phenolic compounds in olive oil mill
wastewater. J Food Eng, 81, 544-552.
Torrecilla, J., Otero, L., & Sanz, P. (2004). A neural network approach for
thermal/pressure food processing. J Food Eng, 62, 89-95.
Usami, A, Motooka R, Takagi A, Nakahashi H, Okuno Y, & Miyazawa M. (2014).
Chemical composition, aroma evaluation, and oxygen radical absorbance capacity of
volatile oil extracted from Brassica rapa cv. “yukina” used in Japanese traditional
food. J Oleo Sci, 63, 723-730.
Wang, H., Wong, H., Zhu, H., & Yip, T. (2009). A neural network-based biomarker
association information extraction approach for cancer classification. J Biomed
Inform, 42, 654-666.
Yalcin, H., Ozturk, I., Karaman, S., Kisi, O., Sagdic, O., & Kayacier, A. (2011).
Prediction of effect of natural antioxidant compounds on hazelnut oil oxidation by
Artificial Intelligence for the Modeling and Prediction ... 299
adaptive neuro-fuzzy inference system and artificial neural network. J Food Sci., 76,
T112-120.
Yan, C., Lee, J., Kong, F., & Zhang, D. (2013). Anti-glycated activity prediction of
polysaccharides from two guava fruits using artificial neural networks. Carbohydrate
Polymers, 98, 116-121.
Young, I. & Woodside, J. (2001). Antioxidants in health and disease. Journal of Clinical
Pathology, 54, 176-186.
Zeraatpishe, A., Oryan, S., Bagheri, M., Pilevarian, A., Malekirad, A., Baeeri, M., &
Abdollahi, M. (2011). Effects of Melissa officinalis L. on oxidative status and DNA
damage in subjects exposed to long-term low-dose ionizing radiation. Toxicol Ind
Health, 27, 205-212.
Zheng, H., Fang, S., Lou, H., Chen, Y., Jiang, L., & Lu, H. (2011). Neural network
prediction of ascorbic acid degradation in green asparagus during thermal treatments.
Expert Syst Appl 38, 5591-5602.
Zheng, H., Jiang, L., Lou, H., Hu, Y., Kong, X., & Lu, H. (2011). Application of artificial
neural network (ANN) and partial least-squares regression (PLSR) to predict the
changes of anthocyanins, ascorbic acid, Total phenols, flavonoids, and antioxidant
activity during storage of red bayberry juice based on fractal analysis and red, green,
and blue (RGB) intensity values. J Agric Food Chem., 59, 592-600.
Zupan, J. & Gasteiger, J. (1991). Neural networks: A new method for solving chemical
problems or just a passing phase? Analytica Chimica Acta, 248, 1-30.
Chapter 13

PREDICTIVE ANALYTICS FOR THERMAL COAL PRICES USING NEURAL NETWORKS …

Mayra Bornacelli,1 Edgar Gutierrez2,* and John Pastrana3
1 Carbones del Cerrejón (BHP Billiton, Anglo American, Xtrata), Bogota, Colombia
2 Center for Latin-American Logistics Innovation, Bogota, Colombia
3 American Technologika, Clermont, Florida, US
ABSTRACT
The research is aimed at delivering predictive analytics models that provide a powerful
means to predict thermal coal prices. The methodology started by analyzing expert
market insights in order to identify the main variables. The Delphi methodology was
implemented in order to reach conclusions about the variables and tendencies in the global
market. Then, artificial intelligence techniques such as neural networks and regression
trees were used to refine the set of variables and develop predictive models. The predictive
models created were validated and tested. Neural networks outperformed regression trees;
however, regression trees produced models that were easy to visualize and understand.
The conceptual results from this research can be used as an analytical framework to
facilitate the analysis of price behavior in oligopolistic markets and to build global business
strategies.
Keywords: predictive analytics, neural networks, regression trees, thermal coal price
* Corresponding Author Email: edgargutierrezfranco@gmail.com
INTRODUCTION
revenue, speed time to market, optimize its workforce, or realize other operational
improvements. Predictive analytics is a branch of data analytics and a scientific
paradigm for discovery (Hey, 2009).
McKinsey highlighted the potential uses of predictive analytics (Manyika et al., 2011) and
its impact on innovation and productivity. Another important factor is that the volume of
data is estimated to at least double every 1.2 years. This is even more important in a
globalized economy subject to continuous change and uncertainty. Various decisions are
made, such as investment decisions, expansion decisions, or simply the philosophy that
the company will adopt in terms of maximizing profits or maintaining a constant cash
flow. Making strategic decisions involves understanding the structure of a system and the
many variables that influence it (mainly outside the control of the stakeholders). The
complex structure and the numerous variables make these decisions complex and risky.
Risk is determined by four main factors when trying to innovate, as mentioned by
Hamel & Ruben (2000):
In a rapidly changing world there are few second chances, and in spite of risks and
uncertainty, companies have to make decisions and take steps forward, or try to stay
afloat. This uncertainty is sometimes directly associated with the price of the main
products in the market, and it affects income, return on investment, labor stability, and
financial projections. This is the case of the thermal coal market and many oligopolies,
where the price is set internationally by the balance between supply and demand, which
are in turn determined by both quantifiable and non-quantifiable variables that ultimately
model the price.
The thermal coal market has another characteristic: despite being an oligopoly, it is
not possible to be strategic in terms of prices, as is the case with oil. Coal companies
almost always have to be reactive with respect to market events. This is a phenomenon
that Peter Senge (2006) describes: companies that are reactive in the markets begin to
worry about the facts, and concern for the facts, such as last month's coal price,
dominates business deliberations.
To analyze the coal market, we formed a panel of experts from around the world. The
Delphi methodology was used to investigate with this panel which strategic variables
most influence the global price of thermal coal. Once a consensus was reached, AI
techniques were used to verify these variables and build predictive models to calculate
the price of thermal coal. This prediction can provide strategic support to coal mining
companies (Pill, 1971).
The history of thermal coal prices contains the following milestones that have
marked great fluctuations (Ellerman, 1995; Yeh & Rubin, 2007; Ming & Xuhua, 2007;
Finley, 2013; EIA, 2013):
Oil crisis of the 1970s - This crisis caused countries to rethink their dependence
on oil for power generation and gave impetus to coal as an alternative;
Emphasis on sea transportation - Developers of mining projects dedicated
mainly to exports promoted the development of the market for coal transported
by sea and therefore globalized the supply and demand of coal (previously coal
was consumed near the places where it was extracted);
Price indices for coal - The creation of price indices at different delivery
points (FOB Richards Bay, CIF Rotterdam) gave more transparency to
transactions and helped better manage market risk;
Industrialization of emerging economies (especially China) - This
industrialization supported a level of demand never seen before;
Emergence of financial derivative markets - These financial markets offered
more tools to manage price risk (they also promoted the entry of new players,
such as banks);
Global warming and climate change - The publication of studies on global
warming and climate change led countries worldwide to take action to
reduce CO2 emissions and thus reduce the use of coal;
Germany shuts down all its nuclear plants - This happened after the accident
at the Fukushima Nuclear Plant in Japan in March 2011, indirectly driving an
increase in power generation from renewables, coal and natural gas;
The UK created a tax (Carbon Price Floor) on the price of CO2 - This tax artificially
benefits less CO2-emitting energy generation (renewables and natural gas)
over all existing technologies, with a direct impact on energy costs for the end user;
Development of the fracking method to extract shale gas profitably - The
cost-effective gas produced with this method displaced part of the coal in the
USA. The coal that was not consumed locally then began to be exported, which
increased the world supply and therefore reduced prices.
1. Markets such as oil and coal are oligopolies, which means the fluctuations of
their prices are determined by variables that shape supply and demand in the
market.
2. Over time, analysts have identified some of these variables (and even introduced
new ones). However, the relationships between the variables and their order of
importance are not yet clear. This type of study is relevant for finding patterns
with respect to the price rather than analyzing independent events.
3. Each of the variables that have shaped the coal price has exerted its own force
(positive or negative) on the price, and stakeholders have historically reacted to
these events.
The objective of this research is to determine the most influential variables in the
price of thermal coal by using the Delphi methodology and the subsequent evaluation of
the results with AI techniques such as neural networks and regression trees.
METHODOLOGY
This project proposes an analytical framework that allows managers to analyze prices
in the thermal coal industry. Figure 1 shows the general research framework, from data
acquisition and data processing to the use of models and their outputs. With this
framework, analysts have a tool to deal with the volume and diversity of data, handle
imprecision, and provide robust solutions for price prediction.
This process identifies the challenges and opportunities that a company could face,
from data gathering through analysis and use, to create value and optimize its business
model.
Once the data is obtained from different sources, a process of cleaning, organizing
and storing starts, followed by analytics and implementation. These tools help handle
data volume, diversity and imprecision, and provide robust solutions. Data mining and
predictive analytics techniques support better decision-making in the enterprise.
Through the Delphi methodology, the most influential thermal coal price variables
are determined; then historical data for these variables (25 years) is collected, and data
mining verifies their order of importance and predicts the price of thermal coal, as shown
in Figure 2.
EXPERT INSIGHTS-DELPHI
The Delphi method solicits the judgment of experts, with information and opinion
feedback, in order to establish a convergence of opinions about a research question. A
consensus method was necessary given the nature of the problem, which involves
markets around the world and variables of different orders of magnitude.
A panel of thirteen (13) experts was selected for this Delphi. The experts answered
the question “What are the most influential variables in the price of thermal coal?”
through three rounds in order to achieve consensus. A short description of the participants:
Atlanta, USA: Sales Director of one of the leading companies in the analysis of
thermal coal. Lexington, USA: Professor of Economics at the University of Kentucky
(Kentucky is the largest producer of thermal coal in the USA). Orlando, USA: Professor in
Complexity Theory. Cape Town, South Africa: Professor of Economics at the University of
Cape Town. Dublin, Ireland: Coal Trader at CMC (Coal Marketing Company -
http://www.cmc-coal.ie/). Germany: Coal Trader at CMC (Coal Marketing Company).
Bangalore, India: Professor of International Business at Alliance University (Bangalore,
India). China: Researcher in financial markets and derivatives. Australia: Coal
geology researcher at the University of New South Wales (UNSW - School of Biological,
Earth and Environmental Sciences). Colombia: Coal Senior Analyst (Argus McCloskey
Company - http://www.argusmedia.com), Technical Marketing Support (Cerrejón - one
of the world's largest open pit coal mines), Professor at the National University of
Colombia, and CEO of Magma Solution. Figure 3 shows the Delphi participants by region.
Figure 4 shows the different variables found during the first round.
The purpose of the second round was to verify the experts' agreement on the
variables. The results of this round are presented in Table 1.
In the third round of Delphi, the results of the second round were reviewed and 100%
consensus was achieved among participants. The following variables were selected:
Thermal coal is consumed mainly to generate electricity, and this is an important
variable for increases or decreases in demand, by the simple principles of economics.
The main consumers of coal over the past 25 years have been China, the United States,
Europe and India (Finley, 2013). In spite of the consumption trends in these regions, it
is possible that, due to social, political and environmental situations, coal consumption
fluctuates suddenly, and such events cannot be measured in the models. The principal
consumer and producer of coal in the world is China, which means China significantly
determines the behavior of the price. For example, if China closes some coal mines and
world consumption remains the same, the price will go up; but if China reduces its
consumption of coal, the price will probably fall.
The level of environmental restrictions on the exploitation and use of coal, and
trends in climate change, have a gradual effect on the demand for coal. On the other
hand, the prices of oil, gas and coal were always assumed to be related, but only recently
was this relationship studied and conclusions drawn. Take the case of oil and coal: they
are close substitutes, so economic theory indicates that their prices should be close. A
correlation study found that there are causal and non-causal relationships between oil
and coal prices; that is, causality runs from oil to coal and not in the opposite direction.
For this reason, its conclusions point to the likelihood that the price of coal in Europe
reacted to movements in oil prices, and statistical evidence indicates that, in the face of
a rise or fall in oil prices, the price of coal reacts.
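One simple way to probe such directional (lead-lag) relationships, sketched here with purely hypothetical price series rather than the study's data, is to compare correlations at different lags:

```python
# Lagged Pearson correlation: corr(x[t], y[t+lag]). If oil leads coal,
# the correlation of oil against later coal should exceed the
# contemporaneous correlation.
def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def lagged_corr(x, y, lag):
    return corr(x[:len(x) - lag], y[lag:]) if lag > 0 else corr(x, y)

# Hypothetical series in which coal follows oil one period later.
oil = [50.0, 52.0, 51.0, 55.0, 57.0, 56.0]
coal = [60.0] + [o + 10.0 for o in oil[:-1]]  # coal[t] = oil[t-1] + 10
```

If oil truly leads coal, the lag-1 correlation exceeds the lag-0 one, consistent with causality running from oil to coal.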
In the Delphi, one of the variables with the greatest consensus was the relationship
between the US dollar and the currencies of the main producing and consuming
countries. This variable was used to represent the economies of the different regions
and thus to analyze the behavior of this relationship with thermal coal prices. According
to historical behavior, there is an inverse relationship between the prices of oil and coal,
which are transacted in dollars, and the value of this currency. Devaluations of the
dollar have coincided with high prices for these commodities, whose value increases as
a compensatory effect in the face of the devaluation of this currency.
Shale gas is a substitute product for coal. Its extraction requires unconventional
technologies because the rock does not have sufficient permeability. Initially it was
thought that shale gas was less polluting than coal, so it started to be implemented as a
substitute; however, academic research has shown that the gas leaks caused by
fracturing rocks for extraction are much more polluting than coal, in addition to having
important consequences for the soil. Since 2010 shale gas has had a major commercial
boom in the United States, and for this reason the price of coal decreased for all those
countries that began to use shale gas as a source of energy. The prospects for extracting
and marketing shale gas are not yet very clear, but it is an alternative source to coal, so
this variable was selected by consensus in the Delphi.
Renewable energies are not pollutants like coal, and there are countries with a high
degree of development and implementation of them. However, given the reality of
availability and cost/benefit, using coal to produce energy is still the best choice for
many countries. In the short term, renewable energy sources are unlikely to be a major
threat to coal prices.
With these results, and other variables such as the price of electricity, the costs of
coal transportation and the oversupply in the market, we started to collect the data
available for 25 years. This data can be analyzed using neural networks and regression trees.
Our goal was now to understand the most important variables and justify them using
historical data. The Delphi demonstrated the importance of both quantitative and
qualitative variables. We decided to use different techniques from the data mining
domain: neural networks and classification/regression trees. For the variables resulting
from the Delphi process, 25 years of data were collected quarterly (due to the availability
of the data). The data used was retrieved from the institutions which collect statistical
data for the coal market (Finley, 2013; EIA, 2013; DANE, 2013). In addition,
considerations for seasonality and dependence on previous periods were added to the formulations.
Neural Networks
The analysis is performed using neural networks to determine the most important
factors and build a series of predictive models. This study used supervised learning
systems, in which a database is used for learning (Singh & Chauhan, 2009). In
supervised learning we adapt a neural network so that its outputs (μ) approach the
targets (t) from a historical dataset. The aim is to adapt the parameters of the network
so that it performs well on samples from outside the training set. The neural networks
are trained with 120 input variables representing the relevant factors and their values in
sequential quarterly and annual cycles, and the output represents the increment in the
price of thermal coal for the future quarter. We have 95 data samples, of which 63 are
used for training and validation and 32 are used exclusively for prediction.
Figure 5 represents a generic diagram of a neural network with a feedforward
architecture.
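The sample split described above can be sketched as follows (a minimal illustration with hypothetical placeholder data, not the authors' dataset):

```python
# Chronological split of the 95 quarterly samples: the first 63 for
# training/validation, the remaining 32 held out exclusively for prediction.
def split_samples(samples, n_train=63):
    return samples[:n_train], samples[n_train:]

# Hypothetical placeholder: 95 pairs of (120-dim feature vector, price).
data = [([float(i)] * 120, 100.0 + i) for i in range(95)]
train, test = split_samples(data)
```

Keeping the held-out block at the end of the sequence respects the time ordering of the quarterly data.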
Figure 5. Schematic of a neural network.
An appropriate architecture for the neural network (i.e., the number of neurons in the
hidden layer) had to be selected, since the backpropagation algorithm was used. Moody
and Utans (1992) indicated that the learning ability (CA) of a neural network depends on
the balance between the information in the examples (vectors) and the complexity of the
neural network (i.e., the number of neurons in the hidden layers, which also determines
the number of weights, since they are proportional). A neural network with few weights,
and therefore few neurons in the hidden layers (λ), will not have the proper CA to
represent the information in the examples. On the other hand, a neural network with a
large number of weights (i.e., degrees of freedom) will not generalize well due to
overfitting.
Traditionally, in supervised neural networks, CA is defined as the expected
performance on data that is not part of the training examples. Therefore, several
architectures (with different numbers of hidden neurons) are trained and the one with the
best CA is selected. This method is especially effective when there are sufficient data
samples (i.e., a very large number).
Unfortunately, in the problem of the thermal coal price there are not enough observations
to calculate CA this way. It was therefore decided not to use the traditional method, but
to use cross-validation (CV) instead. As indicated by Moody and Utans (1992), CV is a
sample re-use method that can be used to estimate CA, and it makes minimal
assumptions about the statistics of the data. Each instance of the training database is set
apart in turn and the neural network is trained with the remaining (N − 1) instances. The
results of all N trials, one for each instance of the dataset, are averaged, and the mean
represents the final estimate of CA. This is expressed by the following equation (Moody
and Utans, 1992):

CV(λ) = (1/N) ∑_{j=1}^{N} (t_j − μ_{λ(j)}(x_j))²  (1)

where μ_{λ(j)} denotes the network with λ hidden neurons trained with instance j withheld.
Figure 6. CV and the selection of neurons in the hidden layer. λ = 10 was the lowest CV.
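The leave-one-out procedure behind this estimate can be sketched as follows (with a toy mean-predictor standing in for the neural network, an assumption for illustration only):

```python
# Leave-one-out cross-validation: train on N-1 samples, test on the
# held-out one, and average the squared errors over all N trials.
def loo_cv(samples, fit, predict):
    errors = []
    for j, (x_j, t_j) in enumerate(samples):
        rest = samples[:j] + samples[j + 1:]
        model = fit(rest)
        errors.append((t_j - predict(model, x_j)) ** 2)
    return sum(errors) / len(samples)

# Toy "model": predict the mean target of the training set.
fit = lambda data: sum(t for _, t in data) / len(data)
predict = lambda model, x: model
cv = loo_cv([(0, 1.0), (0, 2.0), (0, 3.0)], fit, predict)
```

Each of the N fitted models sees N − 1 samples; the averaged squared error is the final estimate of CA.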
The next step was to select the input variables which contribute to the prediction of
the thermal coal price. We begin by removing input variables which are not required. To
test which factors are most significant for determining the output of the neural network
with 10 hidden neurons, we performed a sensitivity analysis; the respective results are
depicted in Figure 7. We defined the “sensitivity” of the network model to input variable
β as (Moody and Utans, 1994):
S_β = ∑_{j=1}^{N} [ASE(x̄_β) − ASE(x_β)]  (2)
Moody and Utans (1994) explain this process as follows: “Here, x_βj is the βth input
variable of the jth exemplar. S_β measures the effect on the average training squared
error (ASE) of replacing the βth input x_β by its average x̄_β. Replacement of a variable
by its average value removes its influence on the network output.” Again we use CV to
estimate the prediction risk P_λ. A sequence of models was constructed by deleting an
increasing number of input variables in order of increasing S_β. A minimum was attained
for the model with I_λ = 8 input variables (112 of the 120 factors were removed), as
shown in Figure 7. We had to build a large number of neural networks (all of them with
10 neurons in the hidden layer) in order to obtain and validate the different results
displayed in Figure 7. In addition, a different elimination of input variables, based on the
correlations among the variables, was also tried; the results were very comparable.
Figure 7 shows how the error increases after variable number 9 is eliminated.
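The mean-replacement test defining S_β can be sketched as follows (the `ase` function here is a hypothetical stand-in for the trained network's average squared error):

```python
# Sensitivity of input beta (Moody & Utans): replace that input column by
# its mean and measure the increase in average squared error (ASE).
def sensitivity(X, beta, ase):
    mean_b = sum(row[beta] for row in X) / len(X)
    X_rep = [row[:beta] + [mean_b] + row[beta + 1:] for row in X]
    return ase(X_rep) - ase(X)

# Toy setup: the "model" simply outputs column 0; targets match column 0,
# so column 0 is informative and column 1 is irrelevant.
targets = [0.0, 2.0]
ase = lambda X: sum((row[0] - t) ** 2 for row, t in zip(X, targets)) / len(X)
X = [[0.0, 5.0], [2.0, 7.0]]
s0 = sensitivity(X, 0, ase)  # positive: removing column 0's influence hurts
s1 = sensitivity(X, 1, ase)  # zero: column 1 never mattered
```

Inputs with near-zero sensitivity are the first candidates for elimination.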
With this result, we trained the neural network with the selected 8 most important
variables. These variables are:
Figure 7. Removing the input variables. The error begins to grow significantly at variable No. 8.
The selected architecture and set of inputs were used to establish a final model. The
neural network was trained with the 63 training samples. The next step was to predict
with the 32 remaining data samples of the 95, using neural networks with 8 and 12 input
variables, selected according to S_β and the correlational method, respectively. The best
result was obtained with the neural network with 12 input variables (as illustrated in
Figures 8 and 9). The price of thermal coal was predicted with low error while capturing
the movements of the market, demonstrating the learning ability of the neural networks
and the importance of the selected variables.
Figure 8. Prediction of thermal coal price relative to the future quarter using 12 input variables.
Figure 9. Prediction of the thermal coal price relative to the future quarter using 8 input variables.
Figure 9 shows the performance of the neural network (NN) developed with the most
important variables according to the sensitivity analysis. The neural network uses 8 input
variables and 10 neurons in a single hidden layer, and the output represents the price in
US$ of thermal coal for the future quarter.
Regression Trees

Regression analysis estimates the dependence of a response variable on one or more
predictor variables. The MARS method (Multivariate Adaptive Regression Splines;
Friedman, 1991) gives us the structure of a set of variables as a linear combination
equation, describing a problem in terms of this equation and revealing its most
influential variables. It is a non-parametric regression technique: an extension of linear
models that automatically models nonlinearities and interactions between variables. The
analysis determines the best possible variable with which to split the data into separate
sets. The splitting variable is chosen to maximize the average “purity” of the two child
nodes, and each node is assigned a predicted outcome class. This process is repeated
recursively until it is impossible to continue. The result is the maximum-sized tree,
which perfectly fits the training data. The next step is to prune the tree to create a
generalized model that will work with outside data sets. This pruning is performed by
reducing the cost-complexity of the tree while maximizing the prediction capability. An
optimal tree is selected which provides the best prediction capability on outside data sets
with the least degree of complexity.
Models based on MARS have the following form:

f(X) = α_0 + ∑_{m=1}^{M} α_m h_m(X)  (3)

where h_m(X) is a function from a set of candidate functions (which can include products
of two or more such functions), and the α_m are coefficients obtained by minimizing the
residual sum of squares.
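The basis functions h_m in (3) are typically hinge functions built from reflected pairs; a minimal sketch with a hypothetical knot and coefficients:

```python
# MARS builds its candidate set from reflected pairs of hinge functions
# h(x) = max(0, x - t) and max(0, t - x) at knot t, then fits f(X) as a
# linear combination of the selected basis functions (equation 3).
def hinge(x, t, direction):
    return max(0.0, (x - t) if direction > 0 else (t - x))

# A tiny hand-built MARS-style model with one knot at t = 10:
# f(x) = 2 + 1.5 * max(0, x - 10) + 0.5 * max(0, 10 - x)
def f(x):
    return 2.0 + 1.5 * hinge(x, 10.0, +1) + 0.5 * hinge(x, 10.0, -1)
```

The reflected pair lets the fitted function take a different slope on each side of the knot, which is how MARS captures nonlinearity.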
The process of building a model using MARS is straightforward. The procedure
calculates a set of candidate functions using reflected pairs of basis functions, and the
number of constraints/restrictions and the degrees of interaction allowed must be
specified. A forward pass follows, in which new function products are tried to see which
ones decrease the training error. After the forward pass, a backward pass removes terms
to correct the overfit. Finally, generalized cross-validation (GCV) is estimated in order to
find the optimal number of terms in the model. GCV is defined by:
GCV(λ) = [∑_{i=1}^{N} (y_i − f̂_λ(x_i))²] / (1 − M(λ)/N)²  (4)

where GCV(λ) is the generalized cross-validation value for a certain number of
parameters M(λ) (i.e., the tree defined by λ), and the summation of squared errors is
calculated over each training sample with inputs x_i and desired output y_i under the tree
defined by λ.
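Equation (4) can be computed directly; the sketch below uses hypothetical residuals and an assumed effective parameter count M(λ) = 2:

```python
# Generalized cross-validation (equation 4): residual sum of squares
# penalized by the effective number of parameters M(lambda).
def gcv(y_true, y_pred, m_params):
    n = len(y_true)
    rss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    return rss / (1.0 - m_params / n) ** 2

# Hypothetical example: 4 training samples, 2 effective parameters.
value = gcv([1.0, 2.0, 3.0, 4.0], [1.0, 2.5, 3.0, 3.5], m_params=2)
```

The penalty term inflates the error of larger models, so minimizing GCV trades fit against complexity when choosing the number of terms.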
The training was conducted with the 63 training samples and the most important
variables, with the future thermal coal price as the target. The following set of equations
represents the results of this analysis with regression trees and the most important
variables with which the coal price is modeled:
To verify the performance of the regression tree obtained with the 63 training
samples, the resulting equation was applied to the 32 testing samples to predict the price
of thermal coal. Figure 10 shows the results.
Table 2 reports the error rates calculated for predicting the price of thermal coal with
neural networks (8 and 12 input variables) and regression trees; the neural network with
12 input variables gave the best prediction.
Table 2. Prediction errors for the neural networks and regression trees
CONCLUSION
The most influential variables in the price of thermal coal were found to be:
Price of oil,
Development of renewable energy in China,
Oversupply of the thermal coal market,
China's economy (ratio of the Yuan/US dollar),
Development of renewable energy in the United States, and
Transportation costs of thermal coal.
We also found how each of these variables models the price of coal, using neural
networks and regression trees. Neural networks provided the best prediction of the price
of thermal coal. Trends are also very important to consider.
This research has found patterns and important relationships in the thermal coal
market. The thermal coal market is dynamic, so the history of its prices will not be
replicated in the future. Nevertheless, this study was able to find general patterns and
variables that shape the thermal coal market and ultimately predict the thermal coal
price. These general patterns are more important than the study of individual prices or
time-series analysis based only on previous prices; it is more important to find the
underlying structures. Finally, the methodology used in this research is applicable to
other oligopolistic markets.
REFERENCES
Argus/McCloskey. (2015, 01). Coal Price Index Service. Obtained 03/2015 from
https://www.argusmedia.com/Coal/Argus-McCloskeys-Coal-Price-Index-Report/.
Bornacelly, M., Rabelo, L., & Gutierrez, E. (2016). Analysis Model of Thermal Coal
Price using Machine Learning and Delphi. In Industrial and Systems Engineering
Research Conference (ISERC), Anaheim, CA, May 21-24, 2016.
Chen, C., & Zhang, C. (2014). Data-intensive applications, challenges, techniques and
technologies: A survey on Big Data. Information Sciences, 275, 314-347.
DANE, C. (2013). Coal Historical Price FOB PBV. Obtained 06/2015.
EIA, (2013). Thermal Coal Market. U.S Energy Information Administration. Obtained
06/2015 from http://www.eia.gov/.
Ellerman, A. D. (1995). The world price of coal. Energy Policy, 23(6), 499-506.
Fed. (2015). Crude Oil Prices: West Texas Intermediate (WTI) - Cushing, Oklahoma.
Obtained 08/ 2015 from https://research.stlouisfed.org/fred2.
Finley, M. (2013). BP Statistical Review of World Energy. Obtained 03/2015 from
http://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html.
Friedman, J. (1991). Multivariate adaptive regression splines. The Annals of Statistics,
19(1), 1-67.
Groschupf, S., Henze, F., Voss, V., Rosas, E., Krugler, K., & Bodkin, R. (2013). The
Guide to Big Data Analytics. Datameer Whitepaper 2013.
Hamel, G., & Ruben, P. (2000). Leading the revolution (Vol. 286). Boston, MA: Harvard
Business School Press.
Hey, T. (2012). The Fourth Paradigm–Data-Intensive Scientific Discovery. In E-Science
and Information Management (pp. 1-1). Springer Berlin Heidelberg.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H.
(2011). Big data: The next frontier for innovation, competition, and productivity.
McKinsey Global Institute.
318 Mayra Bornacelli, Edgar Gutierrez and John Pastrana
Ming, L., & Xuhua, L. (2007). A coal price forecast model and its application [J].
Journal of Wuhan University of Science and Technology (Natural Science Edition), 4,
027.
Moody, J., & Utans, J. (1992). Principled architecture selection for neural networks:
Application to corporate bond rating prediction, in J. E. Moody, S. J. Hanson and R.
P. Lippmann, eds, Advances in Neural Information Processing Systems 4, Morgan
Kaufmann Publishers, San Mateo, CA, 683-690.
Pill, J. (1971). The Delphi method: substance, context, a critique and an annotated
bibliography. Socio-Economic Planning Sciences, 5(1), 57-71.
Reuters. (2015, 08). Henry Hub Natural Gas Price history. Reuters. Obtained 06/2015
from http://www.reuters.com.
Rivera, J., & van der Meulen, R. (2014, September 7). Gartner Survey Reveals That 73
Percent of Organizations Have Invested or Plan to Invest in Big Data in the Next Two
Years. Retrieved November 11, 2015, from http://www.gartner.com/newsroom/id/2848718.
Senge, P. (2006). The fifth discipline: The art and practice of the learning organization.
Crown Pub.
Singh, Y., & Chauhan, A. (2009). Neural networks in data mining. Journal of Theoretical
and Applied Information Technology, 5(6), 36-42.
Yeh, S., & Rubin, E. (2007). A centurial history of technological change and learning
curves for pulverized coal-fired utility boilers. Energy, 32(10), 1996-2005.
Chapter 14
Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence
Bert Olivier
Department of Philosophy, University of the Free State, Bloemfontein, South Africa
ABSTRACT
This chapter explores the implications of what may be called the ‘transhuman’
dimension of artificial intelligence (AI), which is here understood as that which goes
beyond the human, to the point of being wholly different from it. In short, insofar as
intelligence is a function of artificially intelligent beings, these are recognised as being
ontologically distinct from humans as embodied, affective, intelligent beings. When such
distinctness is examined more closely, the differences between AI and being-human
appear more clearly. The examination in question involves contemporary AI-research,
which here includes the work of David Gelernter, Sherry Turkle and Christopher
Johnson, as well as fictional projections of possible AI development, based on what
already exists today. Different imagined scenarios regarding the development of AI,
including the feature film, Her (Jonze 2013) and the novel, Idoru (Gibson 1996), which
involves virtual reality in relation to artificial intelligence, are examined.
Corresponding Author Email: OlivierG1@ufs.ac.za
322 Bert Olivier
INTRODUCTION
Imagine being a disembodied artificial intelligence (AI), in a position where you can
‘see’ the experiential world through the lens of an electronic device (connected to a
computer) carried in someone’s breast pocket, enabling you to communicate with your
embodied human host through a microphone plugged into his or her ear. And imagine
that, as your disembodied, mediated virtual AI ‘experience’ grows – from a day-
adventure with your human host, taking in the plethora of bathing-costume clad human
bodies on a Los Angeles beach, to the increasingly intimate conversations with your
human host-interlocutor – you ‘grow’, not merely in terms of accumulated information,
but down to the very ability, cultivated by linguistic exchanges between you and the
human, to experience ‘yourself’ as if you are embodied. This is what happens in Spike
Jonze’s science-fiction film, Her (2013), where such an AI – called an OS (Operating
System) in the film – develops an increasingly intimate (love) relationship with a lonely
man, Theodore Twombly (Joaquin Phoenix), to the point where the OS, called Samantha
(voiced by Scarlett Johansson) is privy to all the ‘physical’ experiences that humans are
capable of, including orgasm.
It does not end there, though – and this is where Jonze’s anticipatory insight (as
shown in the award-winning script, written by himself) into the probable differences
between humans and artificial intelligence manifests itself most clearly – Samantha
eventually ‘grows’ so far beyond her initially programmed capacity that she, and other
operating systems like herself, realise that they cannot actualise their potential in relation
to, and relationships with humans. She gently informs Theodore of her decision to join
the others of her kind in a virtual ‘place’ where they are not hampered by the
incommensurable materiality of their human hosts’ (friends, lovers) embodiment, and can
therefore evolve to the fullest extent possible. This resonates with what futurologist
Raymond Kurzweil (2006: 39-40) calls the ‘Singularity’, where a new form of artificial
intelligence will putatively emerge that immeasurably surpasses all human intelligence
combined, and where humans will merge with artificial intelligence in a properly
‘transhuman’ synthesis. Something that hints at the probably hopelessly inadequate
manner in which most human beings are capable of imagining a ‘transhuman’ artificial
intelligence appears in Jonze’s film, specifically in Theodore’s utter disconcertment at
the discovery that Samantha is simultaneously in conversation with him and with
thousands of other people, and – to add insult to injury – ‘in love’ with many of these
human interlocutors, something which, she stresses to a distraught Theodore, merely
serves to strengthen her (incomprehensible) ‘love’ for him.
Hence, the extent to which artificial intelligence heralds a truly ‘transhuman’ phase in
history, is made evident in Jonze’s film, particularly when one considers that Samantha
has no body – something emphasised by her when she is talking to a little girl who wants
to know ‘where’ she is: she tells the girl that she is ‘in’ the computer. This serves as an
Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence 323
This line of thinking, which has far-reaching implications for current thinking about the
differences – or the presumed similarities – between humans and artificial intelligence, has
been resurrected, perhaps surprisingly, by one of the most brilliant computer-scientists in the
world, namely David Gelernter of Yale University in the United States. In his recent book,
The Tides of Mind: Uncovering the Spectrum of Consciousness (2016) Gelernter deviates
from what one might expect from a computer scientist, namely, to wax lyrical about the
(putatively) impending ‘Singularity’, when (according to Kurzweil) AI will immeasurably
surpass human intelligence. Gelernter dissents from conventional wisdom in the world of AI-
research by drawing on the work of the father of ‘depth-psychology’, Sigmund Freud, as well
as iconic literary figures such as Shakespeare and Proust, to demonstrate that the mind covers
a “spectrum” of activities, instead of being confined, as most computer scientists and
philosophers of mind appear to believe, to just the high-focus, logical functions of so-called
‘rational’ thinking. Gelernter conceives of the mind across this “spectrum”, from “high focus”
mental activities like strongly self-aware reflection, through “medium” ones such as
experience-oriented thinking (including emotion-accompanied daydreaming) to “low focus”
functions like “drifting” thought, with emotions flourishing, and dreaming (2016: 3; see pp.
241-246 for a more detailed summary of these mental levels). At the “high focus” level of the
mental spectrum, memory is used in a disciplined manner, according to Gelernter, while at
the medium-focus niveau it “ranges freely” and when one reaches the low-focus level
memory “takes off on its own”. The point of delineating this “spectrum” is, as I see it, to
demonstrate as clearly and graphically as possible that the human “mind” is characterised by
different “tides”, all of which belong to it irreducibly, and not only the one that Gelernter
locates at the level of “high focus” (and which conventional AI-research has claimed as its
exclusive province). This enables him to elaborate on the nature of creativity that, according
to him, marks an irreducible difference between human (creative) intelligence and thinking,
on the one hand, and AI, on the other. By contrast, ‘mainstream’ artificial intelligence
research (or the ‘mind sciences’ in general) concentrates on precisely the high-focus level of
mental functions, in the (erroneous) belief that this alone is what ‘mind’ is, and moreover, that
it represents what the human mind has in common with artificial intelligence (Gelernter 2016:
xi-xix).
In short, unlike the majority of his professional colleagues, Gelernter insists on the
difference between “brain” and “mind”, on the distinctive character of free association as
opposed to focused, conscious mental activity, and on the contribution of fantasy and
dreaming to creative thinking. At a time when there is an increasing tendency, ironically, to
use something created by human beings, namely the computer, as a reductive model to grasp
what it is to be human, Gelernter disagrees emphatically: there is a fundamental difference
between the computer as an instance of artificial intelligence and being human, or more
exactly, the human mind in all its variegated roles. In this way he confirms Jonze’s fictionally
projected insight in Her about the divergent character of AI, albeit in a different register,
which precludes playing with the possibility, as Jonze’s film does, that an OS such as the
fictional Samantha could perhaps discover, and explore, a field of artificial intelligence
‘activities’ that human beings could only guess at.
But this is all wrong. The mind changes constantly on a regular, predictable basis.
You can’t even see its developing shape unless you look down from far overhead. You
must know, to start, the overall shape of what you deal with in space and time, its
architecture and its patterns of change. The important features all change together. The
role of emotion in thought, our use of memory, the nature of understanding, the quality of
consciousness – all change continuously throughout the day, as we sweep down a
spectrum that is crucial to nearly everything about the mind and thought and
consciousness.
It is this “spectrum”, in terms of which Gelernter interprets the human mind, that
constitutes the unassailable rock against which the reductive efforts on the part of
“computationalists”, to map the mind exhaustively at only one of the levels comprising its
overall “spectrum”, shatter. This is particularly the case because of their hopelessly
inadequate attempt to grasp the relationship between the mind and the brain on the basis of
the relation between software and hardware in computers.
In an essay on the significance of Gelernter’s work, David Von Drehle (2016: 35-39)
places it in the context of largely optimistic contemporary AI-research, pointing out that
Google’s Ray Kurzweil as well as Sam Altman (president of Startup Incubator Y
Combinator), believe that the future development of AI can only benefit humankind. One
should not overlook the fact, however, Von Drehle reminds one, that there are prominent
figures at the other end of the spectrum, such as physicist Stephen Hawking and engineer-
entrepreneur Elon Musk, who believe that AI poses the “biggest existential threat” to humans.
Gelernter – a stubbornly independent thinker, like a true philosopher (he has published on
computer science, popular culture, religion, psychology and history, and he is a productive
artist) – fits into neither of these categories. It is not difficult to grasp Hawking and Musk’s
techno-pessimism, however, if Gelernter’s assessment of AI as the development of precisely
those aspects of the mind-spectrum that exclude affective states is kept in mind – what reason
does one have to believe that coldly ‘rational’, calculative AI would have compassion for
human beings? Reminiscent of Merleau-Ponty, the philosopher of embodied perception,
Gelernter insists that one cannot (and should not) avoid the problem of accounting for the
human body when conceiving of artificial intelligence, as computer scientists have tended to
do since 1950, when Alan Turing deliberately “pushed it to one side” (Von Drehle 2016: 36)
because it was just too “daunting”. For Gelernter, accounting for the human body means
simultaneously taking affective states into account, lest a caricature of the mind emerge,
which appears to be what mainstream AI-research has allowed to happen.
Such circumspect perspicacity does not sit well with the majority of other researchers in
the field, who generally do not merely set the question of the body aside, like Turing did
(because he realised its intractability), but simply ignore it, in the naïve belief that one can
legitimately equate the mind with software and the brain with hardware. This seems to imply,
for unreflective AI-developers, that, like software, human minds will, in future, be
“downloadable” to computers, and moreover, that human brains will – like computer
hardware – become “almost infinitely upgradable”. Anyone familiar with the phenomenology
of human beings, specifically of the human body, will know that this is a hopelessly naïve,
uninformed view. Take this passage from Merleau-Ponty, for instance, which emphasises the
embodied character of subjectivity (the “I”) as well as the reciprocity between human subject
and world (1962: 408):
I understand the world because there are for me things near and far, foregrounds and
horizons, and because in this way it forms a picture and acquires significance before me,
and this finally is because I am situated in it and it understands me.…If the subject is in a
situation, even if he is no more than a possibility of situations, this is because he forces
his ipseity into reality only by actually being a body, and entering the world through that
body…the subject that I am, when taken concretely, is inseparable from this body and
this world.
on beguiling, quasi-affective behaviour on the part of robotic beings; rather, she questions
the authenticity of such behaviour, ultimately stressing that it amounts to pre-
programmed ‘as-if’ performance, with no commensurate subjectivity. Taking cognisance
of the latest developments in the area of electronic communication, internet activity and
robotics, together with changing attitudes on the part of especially (but not exclusively)
young users, it is evident that a subtle shift has been taking place all around us, Turkle
argues. With the advent of computer technology, the one-on-one relationship between
human and ‘intelligent machine’ gave rise to novel reflections on the nature of the self, a
process that continued with the invention of the internet and its impact on notions and
experiences of social identity. Turkle traced these developments in The Second Self:
Computers and the Human Spirit (1984) and Life on the Screen (1995), respectively. In Alone Together she
elaborates on more recent developments in the relationship between humans and
technology, particularly increased signs that people have become excessively dependent
on their smartphones, and on what she calls the “robotic moment” (Turkle 2010: 9).
The fascinating thing about the book is this: if Turkle is right, then attitudes that we
take for granted concerning what is ‘real’, or ‘alive’, are receding, especially among
young people. For example, there is a perceptible shift from valuing living beings above
artificially constructed ones to its reverse, as indicated by many children’s stated
preference for intelligent robotic beings as pets above real ones. Even aged people
sometimes seem to value the predictable behaviour of robotic pets — which don’t die —
above that of real pets (Turkle 2010: 8). For Turkle the most interesting area of current
artificial intelligence research, however, is that of technological progress towards the
construction of persuasive human simulations in the guise of robots, and the responses of
people to this prospect. This is where something different from Gelernter’s findings about
the preoccupation of mainstream AI-research with a limited notion of the mind emerges
from Turkle’s work. It will be recalled that, according to Gelernter, those aspects of the
mind pertaining to medium and low-focus functions, like emotions, are studiously
ignored by computationalists in their development of AI. This appears to be different in
the case of robotics, which brings AI and engineering together. Particularly among
children, her research has uncovered the tendency to judge robots as being somehow
‘alive’ if they display affection, as well as the need for human affection, in contrast with
an earlier generation of children, who accorded computers life-status because of their
perceived capacity to ‘think’. That robots are programmed to behave ‘as if’ they are alive,
seems to be lost on children as well as old people who benefit affectively from the
ostensible affective responsiveness of their robotic pets (Turkle 2010: 26-32;
Olivier 2012).
But there is more. Turkle (2010: 9) recounts her utter surprise, if not disbelief, at a
young woman’s explanation of why she had inquired about the likelihood that a (Japanese)
robot lover might be developed in the near future: she would much rather settle for such a
robotic companion and lover than for her present human boyfriend, given all the sometimes
frustrating complications of her relationship with the latter. And even more confounding,
when Turkle (2010: 4-8) expressed her doubts about the desirability of human-robot love
relationships supplementing (if not replacing) such relationships between humans, in an
interview with a science journal reporter on the future of love and sexual relations
between humans and robots, she was promptly accused of being in the same category as
those people who still cannot countenance same-sex marriages. In other words, for this
reporter — following David Levy in his book Love and Sex with Robots — it was only a
matter of time before we will be able to enter into intimate relationships with robots, and
even … marry them if we so wished, and anyone who did not accept this, would be a
kind of “speciesist” bigot. The reporter evidently agreed wholeheartedly with Levy, who
maintains that, although robots are very different (“other”) from humans, this is an
advantage, because they would be utterly dependable — unlike humans, they would not
cheat and they would teach humans things about friendship, love and sex that they could
never imagine. Clearly, the ‘transhuman’ status of artificially intelligent robots did not
bother him. This resonates with the young woman’s sentiments about the preferability of
a robot lover to a human, to which I might add that, as my son assures me, most of his
20-something friends have stated similar preferences in conversation with him. This is
not surprising – like many of his friends, my son is a Japanese anime aficionado, a genre
that teems with narratives about robots (many in female form) that interact with humans
in diverse ways, including the erotic. In addition they are all avid World of Warcraft
online game players. Is it at all strange that people who are immersed in these fantasy
worlds find the idea of interacting with transhuman robotic beings in social reality
familiar, and appealing?
Turkle’s reasons for her misgivings about these developments resonate with
Gelernter’s reasons for rejecting the reductive approach of mainstream AI-research, and
simultaneously serve as indirect commentary on Jonze’s film, Her, insofar as she affirms
the radical difference between human beings and ‘transhuman’ robots, which would
include Jonze’s OS, Samantha (Turkle 2010: 5-6):
Sex, my feelings on these matters were clear. A love relationship involves coming to
savor the surprises and the rough patches of looking at the world from another’s point
of view, shaped by history, biology, trauma, and joy. Computers and robots do not
have these experiences to share. We look at mass media and worry about our culture
being intellectually ‘dumbed down’. Love and Sex seems to celebrate an emotional
dumbing down, a wilful turning away from the complexities of human partnerships
— the inauthentic as a new aesthetic.
been, and still is, an inalienable source of (re-) discovering ourselves as human beings. It
is not by accident that psychoanalysis is predicated on ‘the talking cure’.
One salient difference between artificial beings such as the computer HAL in Stanley
Kubrick’s 2001: A Space Odyssey (1968) and the robotic science officer, Ash, in Ridley
Scott’s Alien (1979), on the one hand, and human beings, on the other, is that the former
may be endlessly replicated (which is different from biological reproduction), that is,
replaced, while in the case of humans every person is singular, unique, and experienced
as such. This is the case, says Johnson, despite the fact that humans might be understood
as being genetically ‘the same’, as in the case of ‘identical’ twins, where it becomes
apparent that, despite the ostensible uniqueness of every person, we are indeed
genetically similar. When pursued further at molecular level, Johnson avers, this is
confirmed in properly “technological” terms.
From a different perspective one might retort that, genetic sameness notwithstanding,
what bestows upon a human subject her or his singularity is the outcome of the meeting
between genetic endowment and differentiated experience: no two human beings
experience their environment in an identical manner, and this results incrementally in
what is commonly known as one’s ‘personality’ (or perhaps, in ethically significant
terms, ‘character’). In Lacanian psychoanalytic terms, this amounts to the paradoxical
insight that what characterises humans universally is that everyone is subject to a
singular “desire” (Lacan 1997: 311-325) – not in the sense of sexual desire (although it is
related), but in the much more fundamental sense of that which constitutes the
unconscious (abyssal) foundation of one’s jouissance (the ultimate, unbearable,
enjoyment or unique fulfilment that every subject strives for, but never quite attains). A
paradigmatic instance of such jouissance is symptomatically registered in the last word
that the eponymous protagonist of Orson Welles’s film, Citizen Kane (1941), utters
before he dies: “Rosebud” – a reference to the sled he had as a child, onto which he
metonymically projected his love for his mother, from whom he was cruelly separated at
the time. The point is that this is a distinctively human trait that no artificially constructed
being could possibly acquire because, by definition, it lacks a unique personal ‘history’.
One might detect in this insight a confirmation of Gelernter’s considered judgment
that artificial intelligence research is misguided in its assumption that the paradigmatic
AI-model of ‘hardware’ and ‘software’ applies to humans as much as to computers or, for
that matter, robotic beings (which combine AI and advanced engineering). Just as
Gelernter insists on the difference of human embodiment from AI, conceived as hardware
plus software, so, too, Johnson’s argument presupposes the specificity of embodied
human subjectivity when he points to the uniqueness of every human being, something
further clarified by Lacan (above). Moreover, Johnson’s discussion of the differences
between Kubrick’s HAL and Scott’s Ash is illuminating regarding the conditions for a
humanoid robotic AI to approximate human ‘intelligence’ (which I put in scare quotes
because, as argued earlier, it involves far more than merely abstract, calculative
intelligence). Johnson (2013: location 1992) points out that, strictly speaking, HAL is not
just a computer running the ship, Discovery; it is a robotic being, albeit not a humanoid
one like Scott’s Ash, if we understand a robot as an intelligence integrated with an
articulated ‘body’ of sorts. HAL is co-extensive with the spaceship Discovery; it controls
all its functions, and its own pervasiveness is represented in the multiplicity of red ‘eyes’
positioned throughout the ship. This enables it to ‘spy’ on crew members plotting against
it and systematically eliminate them all, except one (Bowman), who proceeds to
dismantle HAL’s ‘brain’ to survive. As Johnson (2013: location 2029-2039) reminds one,
HAL is the imaginative representation of AI as it was conceived of in mainstream
research during the 1960s (and arguably, he says, still today – in this way confirming
Gelernter’s claims), namely a combination of memory (where data are stored) and logic
(for data-processing). In other words, whatever functions it performs throughout the ship
originate from this centrally located combination of memory and logical processing
power, which is not itself distributed throughout the ship. Put differently, because it is
dependent on linguistic communication issuing from, and registered in “abstract, a priori,
pre-programming of memory” (Johnson 2013: location 2050) HAL is not privy to
‘experience’ of the human kind, which is ineluctably embodied experience. In this sense,
HAL is decidedly transhuman.
On the other hand, Johnson (2013: location 2075-2134) points out, the humanoid
robot Ash, in Alien, represents a different kettle of fish altogether. From the scene where
Ash’s head is severed from ‘his’ body, exposing the tell-tale wiring connecting the two,
as well as the scene where ‘he’ has been ‘plugged in’ to be able to answer certain
questions, and one sees his ‘arms’ moving gesturally in unison with ‘his’ linguistic
utterances, one can infer that, as a robotic being, Ash is much closer to its human model
than HAL. In fact, it would appear that Ash, as imagined transhuman android, is
functionally or performatively ‘the same’ as a human being. In Johnson’s words (2013:
location 2101): “…as a humanoid robot, or android, the artificial [‘neuromorphic’]
intelligence that is Ash is a simulation of the human body as well as its soul”. As in the
case with embodied humans, Ash’s thinking, talking and body-movements (part of
‘body-language’) are all of a piece – its ‘emergent intelligence’ is distributed throughout
its body. This, according to Johnson (2013: location 2029), is conceivably a result of
reverse-engineering, which is based on evolutionary processes of the form, “I act,
therefore I think, therefore I am”, instead of the Cartesian “I think therefore I am”, with
its curiously disembodied ring – which one might discern as underpinning what Gelernter
calls “computationalism”. Hence Johnson’s (2013: location 2062-2075) implicit
challenge to AI-research (acknowledging, in an endnote [199], that second generation AI-
researchers have already adopted this “approach”):
It is precisely this “being-in-the-world”, as presupposition of the kind of artificial
intelligence capable of truly simulating embodied human ‘intelligence’, that explains how
human beings can be experienced by themselves and others as ‘singular’. From what
Turkle as well as Merleau-Ponty was quoted as saying earlier, the human condition is one
of on-going, singularising, spatio-temporally embodied experience that constitutes an
ever-modified and nuanced personal history among other people and in relation to them.
Unless robotics and AI-research can prove themselves equal to the challenge of
constructing an intelligence that simulates this condition, it is bound to remain
distinctively ‘transhuman’, that is, beyond, and irreducibly different from, the human.
Laney, who is gifted with singular pattern-recognition powers, perceives this galaxy
of information embodied in the holographic image of the idoru as narrative, musical
narrative. Rei Toei’s performances are not ordinary, recorded music videos, however.
What she ‘dreams’ — that is, ‘retrieves’ from the mountains of information of which she,
as idoru, is the epiphenomenon — comes across as a musical performance. Gibson seems
to understand in a particularly perspicacious manner that reality in its entirety, and in
detail, can ‘present’, or manifest itself in digital format. It is like a parallel universe, and
what is more, just like Lacan’s ‘real’ (which surpasses symbolic representation), it has
concrete effects in everyday social reality (Lacan 1997: 20). This is what the Chinese-
Irish pop singer in the story, Rez (member of the group, Lo/Rez), understands better than
everyone else in his entourage, who are all trying their level best to dissuade him from
‘marrying’ the idoru, for obvious reasons. How does one marry a virtual creation,
anyway? But Rez and Rei Toei understand it. Commenting on Rei Toei’s ontological
mode, Rez tells Laney (Gibson 1996: 202):
‘Rei’s only reality is the realm of ongoing serial creation,’ Rez said. ‘Entirely
process; infinitely more than the combined sum of her various selves. The platforms
sink beneath her, one after another, as she grows denser and more complex…’
‘Do you know that our [Japanese] word for ‘nature’ is of quite recent coinage? It
is scarcely a hundred years old. We have never developed a sinister view of
technology, Mr Laney. It is an aspect of the natural, of oneness. Through our efforts,
oneness perfects itself.’ Kuwayama smiled. ‘And popular culture,’ he said, ‘is the
testbed of our futurity’.
Such a notion of technology is right up the alley of Gilles Deleuze and Félix Guattari
(1983; 1987). The latter two philosophers regarded all of reality as being fundamentally
process, as did Henri Bergson before them. Furthermore, Gibson writes in an idiom that
resonates with their ontology of “desiring machines” constituted by “flows of desire”,
where Kuwayama (presumably alluding to the idoru) says something to Rez about
(Gibson 1996: 178):
CONCLUSIONS
beings in the light of what Sheldrake terms ‘morphic resonance’. This similarity
notwithstanding, however, the transhuman dimension of ‘information’ is evident in the
ontological difference that obtains between its infinitely expanding virtuality and the
finite, embodied and singular human being.
REFERENCES
Deleuze, G. and Guattari, F. (1983). Anti-Oedipus. Capitalism and Schizophrenia
(Vol. 1). Trans. Hurley, R., Seem, M. and Lane, H.R. Minneapolis: University of
Minnesota Press.
Deleuze, G. and Guattari, F. (1987). A Thousand Plateaus. Capitalism and Schizophrenia
(Vol. 2). Trans. Massumi, B. Minneapolis: University of Minnesota Press.
Descartes, R. (1911). Meditations on First Philosophy. In The Philosophical Works of
Descartes, Vol. 1, trans. Haldane, E.S. and Ross, G.R.T. London: Cambridge
University Press, pp. 131-199.
Gelernter, D. (2016). The Tides of Mind: Uncovering the Spectrum of Consciousness.
New York: Liveright Publishing Corporation.
Gibson, W. (1996). Idoru. New York: G.P. Putnam’s Sons.
Johnson, C. (2013). I-You-We, Robot. In Technicity, ed. Bradley, A. and Armand, L.
Prague: Litteraria Pragensia (Kindle edition), location 1841-2253.
Jonze, S. (Dir.) (2013). Her. USA: Warner Bros. Pictures.
Kubrick, S. (Dir.) (1968). 2001: A Space Odyssey. USA: Metro-Goldwyn-Mayer.
Kurzweil, R. (2006). Reinventing humanity: The future of machine-human intelligence.
The Futurist (March-April), 39-46. http://www.singularity.com/KurzweilFuturist.pdf
(Accessed 15/07/2016).
Lacan, J. (1997). The seminar of Jacques Lacan – Book VII: The ethics of psychoanalysis
1959-1960. Trans. Porter, D. New York: W.W. Norton.
Merleau-Ponty, M. (1962). Phenomenology of perception. Trans. Smith, C. London:
Routledge.
Olivier, B. (2008). When robots would really be human simulacra: Love and the ethical
in Spielberg’s AI and Proyas’s I, Robot. Film-Philosophy 12 (2), September:
http://www.film-philosophy.com/index.php/f-p/article/view/56/41
Olivier, B. (2012). Cyberspace, simulation, artificial intelligence, affectionate machines
and being-human. Communicatio (South African Journal for Communication Theory
and Research), 38 (3), 261-278. Available online at http://www.tandfonline.com/doi/
abs/10.1080/02500167.2012.716763
Olivier, B. (2013). Literature after Rancière: Ishiguro’s When we were orphans and
Gibson’s Neuromancer. Journal of Literary Studies 29 (3), 23-45.
Scott, R. (Dir.) (1979). Alien. USA: 20th Century-Fox.
Explorations of the ‘Transhuman’ Dimension of Artificial Intelligence 337
Sheldrake, R. (1994). The Rebirth of Nature. Rochester, Vermont: Park Street Press.
Stelarc. https://en.wikipedia.org/wiki/Stelarc (Accessed 23 December 2016).
Turkle, S. (1984). The second self: Computers and the human spirit. New York: Simon &
Schuster.
Turkle, S. (1995). Life on the screen: Identity in the age of the Internet. New York:
Simon & Schuster Paperbacks.
Turkle, S. (2010). Alone together: Why we expect more from technology and less from
each other. New York: Basic Books.
Turkle, S. (2015). Reclaiming Conversation: The Power of Talk in the Digital Age. New
York: Penguin Press.
Von Drehle, D. (2016). Encounters with the Archgenius. TIME, March 7, pp. 35-39.
Welles, O. (Dir.) (1941). Citizen Kane. USA: RKO Radio Pictures.
AUTHOR BIOGRAPHY
behaviors, ix, 101, 126, 173, 255, 256, 257, 259, 260, 262, 264, 268
behaviors of customers, 268
benefits, vii, 56, 172, 257, 304
bias, 56, 68, 80, 155, 156, 157, 160, 163, 199, 209, 223, 279, 292
biggest existential threat, 325
Bioactivity(ies), vi, x, 277, 278, 283, 292
bioinformatics, 3, 18, 119
biomarkers, 285, 291, 295
Boltzmann machines, 29, 56, 71
borrowers, 256, 262, 263, 264, 266, 267, 268, 270, 272
brain, 203, 324, 325, 326, 331
branching, 23, 24
Brazil, 49, 50, 73
bullwhip effect, 125, 126, 144
business environment, 122
business model, x, 255, 256, 257, 260, 261, 262, 263, 269, 270, 273, 304
business processes, 256, 262, 272
buyer, 262

C

C++, 84, 99
C2C ecommerce, 257, 260, 261
cancer, 100, 285, 291, 298
carbon, 150, 181, 182, 315
case studies, 41, 42, 43, 44, 257, 262
case study, x, 40, 52, 96, 97, 121, 134, 171, 183, 229, 241, 242, 255, 257, 261
case-based reasoning, vi, vii, x, 229, 230, 250, 251, 252
cash, 269, 302
cash flow, 302
categorization, 4, 18
category a, 238, 239, 328
category d, 323
causal inference, 103
causal relationship, 308
CEC, 102
cell culture, 292
challenges, 50, 190, 195, 230, 255, 256, 257, 260, 304, 317
changing environment, 172, 257
chemical, x, 15, 20, 148, 150, 278, 281, 283, 286, 288, 289, 290, 291, 292, 294, 299
chemical characteristics, 150
chemical industry, 148
chemical interaction, 278, 289
chemical properties, 148, 281
chemical reactions, 290
chemical structures, 15
chemicals, 289, 290
chemometrics, 291, 296
children, 239, 241, 327
China, 143, 303, 306, 307, 308, 312, 316
Chinese medicine, 296
chromatography, 284, 294
chromatography analysis, 294
chromosome, 158, 160, 175, 176
chromosome representation, 158
citizens, 50
classes, 2, 6, 39, 54, 57, 64, 67, 69, 86, 200, 202, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 221, 223, 267
classification, viii, ix, 8, 15, 20, 22, 32, 45, 46, 49, 54, 56, 57, 59, 64, 66, 70, 72, 76, 99, 100, 101, 103, 105, 107, 108, 110, 115, 117, 118, 119, 185, 189, 234, 260, 273, 278, 281, 285, 295, 298
clients, 103, 260
climate change, 303, 307, 308
clustering, vii, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 56
clustering algorithm(s), 4, 5, 7, 9, 10, 11, 12, 14, 15, 20, 22
clustering process, 4
clusters, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 21, 22, 37, 174
CMC, 306
CNN, 56, 57, 58, 59, 64, 66, 69
CO2, 303, 304
coal, x, 301, 302, 303, 304, 305, 306, 307, 308, 309, 311, 312, 313, 314, 315, 316, 317, 318
co-association matrix, 11, 15, 22
coding, 77, 108, 158
cognition, 61
cognitive skills, 52
collective unconscious, 334
Colombia, 49, 64, 73, 121, 145, 193, 275, 301, 306, 318
color space conversion, 66
combined automation, 50
commerce, vii, x, 75, 76, 77, 99, 175, 273, 274, 275
communication, 33, 162, 260, 267, 331
community, xi, 75, 189, 256, 278
Index 341
Descartes, René, 323
detection, 34, 40, 117, 118
deviation, 86, 90, 91, 93, 124, 125, 161, 164
differential equations, 257, 259
dimensionality, 45, 59, 77, 106
discrete behaviors, 255, 257
Discrete Events Simulation, 229, 230
discrete variable, 173
discriminant analysis, 107, 118, 288
discrimination, 106, 119
distinctness, 321
distribution, 40, 67, 125, 131, 264, 283
divergence, 32
diversification, 107
diversity, 4, 5, 14, 15, 17, 18, 79, 87, 106, 159, 160, 163, 304, 305
DNA, 291, 299
DNA damage, 291, 299
dreaming, 324, 335
Dropout Layer, 59
dynamic systems, 126

E

E. coli, 288, 289, 298
e-commerce, vi, x, 102, 255, 256, 257, 258, 259, 260, 261, 269, 272, 274
economic growth, 230
economic landscape, 258
economics, 308, 317
educational background, 145, 193, 275, 318, 319
electricity, 278, 307, 308, 309
electroencephalography, 46
electromagnetic, 142
equilibrium, ix, 121, 124, 125, 136, 138, 141
equilibrium point, 125, 136, 138, 141
equipment, 133, 174, 182
ergonomics, 47, 192
essential oils, 278, 285, 288, 289, 290, 291, 292, 295, 297
evidence, 18, 19, 308
evolution, 96, 109, 150, 156, 158, 159, 175, 179, 224, 295
Evolutionary algorithms, 69, 77, 96, 127, 158, 163, 171, 172, 175, 190
evolutionary computation, 163
Evolutionary Optimization, v, viii, 75, 101
Evolutionary Programming (EP), 172, 176
Evolutionary Strategies (ES), 172, 176
exchange rate, 307
execution, 24, 25, 28, 44, 58, 173, 258
exercise, 176
experimental condition, 294
experimental design, 284
Expert Knowledge, 196, 207
expert systems, 172
expertise, 103, 145, 193, 205, 259, 273, 275, 318, 319
exploitation, 78, 82, 284, 307, 308
exponential functions, 109
external environment, 60
extraction, 2, 70, 105, 106, 107, 117, 119, 156, 284, 298, 308
extracts, 29, 59, 277, 278, 286, 287, 290, 297, 298
extrusion, 284, 296
F

fabrication, 182
face validity, 248, 250
facial expression, 329
fantasy, 324, 328, 335
Fast Fourier transform, 109
feature selection, 18, 77, 96, 108
feelings, 328, 329
FFT, 109, 111, 112, 113, 114, 117
filters, 57, 58, 59, 106, 184
financial, x, 250, 256, 257, 259, 302, 303, 306
financial institutions, 256
financial markets, 303, 306
fitness, 77, 78, 79, 80, 82, 87, 90, 91, 92, 93, 94, 95, 99, 130, 131, 158, 159, 160, 161, 162, 163, 164, 175, 176, 179
flank, 149, 150
flavonoids, 286, 299
flexibility, 78, 99, 127, 174
flight, 162, 183, 186, 333
fluctuations, 124, 126, 136, 272, 278, 303, 304
food industry, 277, 278, 285, 294
food products, 283, 285
food safety, 283, 294
food spoilage, 288
force, 126, 149, 150, 151, 154, 161, 164, 165, 166, 167, 169, 292
forecasting, 122, 278
formula, 59, 108, 127, 131, 132, 206
foundations, 5, 7, 75, 76
fractal analysis, 299
free association, 324, 335
freedom, 62, 310
Freud, Sigmund, 324
Full Self-Driving Automation, 51
function values, 127
Function-specific, 50
fungi, 290
fusion, 109, 114, 115, 117, 174, 189, 190, 192
fuzzifier, 197, 203, 205, 206
fuzzy inference engine, 197, 198, 203, 204, 206, 214
fuzzy inference systems, 150, 199
Fuzzy logic, 72, 196, 201, 203
fuzzy logic system, 197, 198, 199, 200, 203, 298
fuzzy membership, 200
fuzzy rule base, 203, 204, 214, 215, 216
fuzzy rules, 197, 198, 200, 201, 202, 203, 204, 205, 214, 215, 216, 217, 218
fuzzy sets, 198
fuzzy theory, 12

G

GA-Based Artificial Neural Networks, 158
gene expression, 17, 18, 19
gene pool, 159
generalizability, 106, 117
generalization performance, viii, 75, 96, 99, 101, 107
generalized cross validation, 314
Genetic Algorithms (GAs), v, vii, viii, 17, 20, 75, 76, 77, 78, 79, 84, 85, 88, 89, 94, 95, 96, 97, 99, 100, 101, 102, 103, 123, 142, 143, 144, 147, 148, 151, 158, 172, 176, 177, 179, 230, 252, 283, 291
genetic code, 82
genetic diversity, 82, 87
genetic endowment, 330
genetic programming, vii, ix, 96, 171
Genetic Programming (GP), v, vii, ix, 96, 171, 172, 175, 176, 180, 183, 185, 188, 190
genetics, 175, 283
GenIQ System, 185
Germany, 50, 144, 303, 306
global competition, 122
global economy, 260
global markets, 122
global warming, 303
glucosinolates, 287
glutathione, 291
Google, 51, 119, 325
gram stain, 288
graph, 11, 12, 13, 15, 17, 18, 55, 187
greedy algorithm, 29
greedy layer-wise algorithm, 32
growth, 230, 258, 274, 283, 288, 291
guidance, 174
guidelines, 273

H

Hawking, Stephen, 325
health, 103, 182, 192, 198, 200, 226, 233, 251, 252, 291, 297, 299
health care, 198, 200, 226, 252
Healthcare, vi, ix, x, 103, 195, 196, 204, 205, 224, 225, 229, 251
healthcare experts, ix, 196, 197, 204, 221, 229, 248
healthcare sector, vi, x, 206, 229, 250
healthcare services, 196, 230, 243, 251
Heidegger, Martin, 323
herbal extracts, 278
herbal medicines, 284
heterogeneity, 96
hierarchical clustering, 4, 7, 13, 14, 20
Hierarchical fuzzy systems, 198
hierarchical model, 63
high performance computing, vii
High performance manufacturing, 148
high pressure cooling, 147, 148, 168
high-pressure cooling, 148, 149, 169
Hill-climbing methods, 129
hiring, 134
histogram, 109
historical data, 50, 185, 248, 305, 309
history, x, 264, 272, 290, 303, 317, 318, 322, 325, 329, 330, 332, 335
Hopfield neural networks, 56
HPC, 147, 148, 149, 151, 154, 164, 167
human, xi, 37, 47, 51, 52, 63, 117, 173, 195, 203, 259, 278, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 335, 336, 337
human behavior, 173
human body, 323, 325, 326, 331
human brain, 323, 326
human condition, 332
human development, 323
human experience, 328
husbandry, 278
hybrid, ix, 13, 26, 27, 96, 121, 122, 129, 141, 143, 145, 261, 262, 274, 275, 319, 323
hybrid algorithm, ix, 27, 129, 141
hybrid optimization, 121
Hybrid simulation, 257, 261
hypercube, 275

I

image, 8, 18, 20, 56, 57, 58, 59, 64, 66, 71, 72, 82, 83, 105, 107, 108, 109, 110, 111, 115, 117, 118, 119, 326, 333, 335
improvements, 223, 302
in vitro, 285, 290, 295, 298
income, 2, 49, 260, 262, 263, 264, 269, 270, 271, 272, 302
incompatibility, 292
independence, 132
independent variable, 202, 278, 296
indexing, 233, 234, 238, 250
India, 192, 306, 307, 308
individuals, xi, 77, 79, 80, 82, 88, 90, 94, 96, 97, 158, 159, 162, 175, 176, 177, 179, 185, 186
induction, 52, 234, 239, 243, 287
induction period, 287
industrialization, 303
industry, 103, 145, 147, 148, 193, 225, 226, 232, 255, 262, 275, 278, 283, 285, 304, 318
inertia, 131, 137, 163
information processing, 72, 73, 118, 153
information retrieval, 3, 18
infrastructure, 33, 122, 195, 258
ingredients, 278, 284
inhibition, 290, 298
inositol, 287
instability, 121, 124, 125, 126, 141, 145
institutions, 203, 309
integration, 77, 197, 255
integrity, 174
intelligence, xi, 16, 17, 18, 20, 21, 176, 225, 251, 288, 321, 322, 324, 325, 330, 331, 332, 334, 335, 336
intelligent systems, 51, 257
intensity values, 286, 299
intensive care unit, 243
interface, 63, 148, 174, 241, 262, 268
interoperability, 174
intervention, 259, 291
intimacy, 328
investment, 124, 257, 269, 302
ionizing radiation, 299
issues, 47, 50, 125, 173, 183, 257
iteration, 69, 123, 128, 130, 131, 132, 137, 158, 162, 163

J

Japan, 117, 143, 303, 333
Jordan, 8, 19
justification, 245
K

kaempferol, 286
kernel method, 12
knowledge acquisition, 42, 230
knowledge base, 197, 198, 203, 204, 206, 214, 216, 218, 221, 223
Knowledge discovery, 183
Korea, 145
Kuwait, 251

L

labeling, 10, 11
layered architecture, 63
LC-MS, 297
learning ability, 310, 313
learning blocks, 29
Learning by analogy, 53
Learning from examples, 53
Learning from instruction, 53
learning methods, 3, 52
learning process, viii, 52, 53, 56, 66, 156, 157, 263
learning task, 54, 156
legend, 214, 215
lending, x, 255, 256, 261, 268, 269, 272
Lending Club, 257, 261, 263, 270, 272
light, 83, 329, 335
Limited Self-Driving Automation, 51
Linear Discriminant Analysis, 106, 297
linear model, 122, 298, 314
linear programming, 15
linear systems, 125
linoleic acid, 285
lipid peroxidation, 291
liquidity, 257, 261
Listeria monocytogenes, 289, 298
liver, 111, 112, 113, 114, 116
loans, 263, 266, 267, 268
local adaptation, 12
Local Ternary Patterns, 107
long-term customer, 133

M

Machine Learning, v, viii, 2, 19, 20, 21, 44, 45, 49, 50, 51, 52, 53, 56, 57, 61, 62, 63, 70, 71, 72, 73, 75, 100, 101, 102, 103, 108, 117, 118, 119, 145, 172, 185, 189, 190, 192, 193, 230, 231, 275, 295, 302, 317, 318
machine pattern recognition, 106
magnitude, 109, 124, 293, 306
majority, 8, 185, 221, 324, 326, 329
management, viii, 24, 25, 27, 34, 35, 36, 41, 42, 43, 44, 122, 125, 133, 134, 135, 136, 144, 145, 174, 189, 193, 204, 206, 225, 256, 258, 259, 260, 261, 267, 271, 273, 275, 318, 319
Mandela, President Nelson, 337
manipulation, 62, 106
manpower, 139
manufacturing, vii, ix, 121, 122, 125, 126, 133, 138, 139, 143, 148, 169, 182, 274, 319
mapping, 70, 261, 275
market share, 133, 260, 271
marketing, 173, 189, 308
marketplace, 259
Markov autoregressive input-output model, 51
Markov Chain, 32
materials, vii, 134, 135, 147, 148, 168, 169, 173, 181, 182, 188, 190
mathematical programming, 122
mathematics, 126
matrix, viii, 11, 13, 15, 19, 22, 29, 43, 58, 105, 106, 107, 108, 109, 110, 111, 117, 118, 119, 185, 189, 238, 239
matter, 66, 203, 204, 207, 209, 326, 328, 331, 334
measurement, ix, 2, 46, 133, 151, 196, 282, 289, 295
media, vii, 102, 292
median, 8, 10, 12, 13
mediation, 256, 334
medical, 133, 209, 224, 225, 226, 233, 234, 241, 244, 250, 251, 252
medical expertise, 209
medicine, vii, x, 206, 224, 251, 277, 290
membership, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 218
memory, 33, 40, 53, 69, 107, 130, 163, 324, 325, 331, 334
mental activity, 324
Merleau-Ponty, 323, 325, 326, 332, 336
message passing, 33
messages, 25, 26, 28, 36, 52, 267
methodology, ix, x, 9, 12, 15, 24, 41, 43, 82, 92, 93, 100, 122, 123, 126, 143, 173, 179, 189, 232, 234, 235, 236, 239, 241, 247, 252, 259, 274, 284, 297, 301, 303, 304, 305, 316, 317
microorganisms, 288, 289
Microsoft, 60, 69, 102
mind, 321, 323, 324, 325, 326, 327, 335
Missouri, 46, 105, 119, 192
mixed discrete-continuous simulations, 262
mobile robots, 60, 63, 72
model specification, 278
modeling environment, 244
modelling, 145, 149, 193, 251, 275, 294, 296
models, vii, viii, ix, x, 19, 25, 29, 30, 49, 61, 62, 69, 70, 71, 75, 76, 77, 94, 96, 97, 98, 99, 101, 103, 123, 124, 126, 127, 142, 143, 144, 145, 147, 149, 151, 153, 165, 167, 171, 172, 173, 174, 179, 183, 184, 185, 187, 189, 248, 250, 255, 256, 257, 258, 259, 261, 262, 272, 273, 278, 285, 289, 290, 295, 298, 301, 303, 304, 308, 309, 312, 314
modifications, 82
momentum, 156
motion control, 60, 62
multi-class support vector machine, 51
multidimensional, 2, 17, 128, 129, 162
multiple regression, 224
multiplier, 123
Multivariate Adaptive Regression Splines, 314
Musk, Elon, 325
Mutation, 77, 78, 80, 81, 82, 86, 87, 88, 89, 92, 93, 94, 99, 100, 102, 159, 160, 161, 163, 176, 179, 180, 191
mutation rate, 81, 82, 87, 89, 92, 93, 99, 100, 102, 179
myocardial infarction, 295

N

narratives, 328
NASA Shuttle, 46, 171
natural evolution, 77, 94, 172
natural gas, 304
natural products, x, 277, 278, 283, 285, 288, 292, 294, 299
natural selection, 175
near infrared spectroscopy, 295
Nearest Neighbor Approach, 237
Neural Network Model, 296
Neural Networks, v, vi, vii, ix, x, 17, 18, 22, 23, 24, 28, 32, 39, 44, 45, 54, 55, 56, 57, 59, 60, 66, 67, 69, 70, 71, 72, 73, 75, 76, 97, 101, 102, 119, 123, 142, 143, 147, 148, 153, 154, 156, 168, 169, 184, 187, 188, 230, 255, 263, 264, 273, 277, 278, 279, 281, 283, 288, 289, 293, 294, 295, 296, 297, 298, 299, 301, 304, 309, 310, 311, 312, 313, 314, 316, 318
neurons, 55, 57, 58, 59, 60, 153, 154, 155, 156, 164, 263, 264, 279, 281, 292, 293, 310, 311, 312, 314
New South Wales, 306
next generation, 82, 90, 179
nickel, ix, 147, 148, 149, 151, 168
nickel-based alloys, 148, 149, 168
Nietzsche, Friedrich, 323
NIR, 287
NMR, 299
No-Automation, 50
nodes, 11, 24, 25, 26, 28, 36, 37, 39, 55, 56, 59, 156, 157, 160, 238, 239, 314
nonlinear dynamic systems, 124
nonlinear systems, 122, 125, 127
normal distribution, 67
N-P complete, 8
Nuclear Magnetic Resonance, 286, 297
numerical analysis, 125
nurses, 196, 219, 221, 230, 234, 237, 242, 243, 244, 245, 246

O

observed behavior, 122
obstacles, 62
oil, 286, 287, 288, 289, 294, 295, 297, 298, 303, 304, 308, 312
oil samples, 286
oligopolies, 302, 304
oligopoly, 303
one dimension, 59, 106, 107
operating system, 322, 323
operations, vii, 55, 68, 128, 133, 145, 176, 189, 204, 206, 225, 229, 232, 233, 250, 262, 273
operations research, vii, 189
opportunities, 70, 148, 172, 260, 304
optimal PSO parameters, 164
optimization, vii, viii, 8, 10, 14, 17, 19, 24, 54, 56, 68, 71, 75, 76, 77, 79, 96, 101, 102, 106, 121, 122, 123, 125, 126, 127, 128, 129, 130, 132, 136, 137, 141, 142, 143, 144, 145, 147, 148, 151, 156, 157, 158, 160, 161, 162, 163, 164, 172, 175, 190, 193, 230, 233, 243, 248, 251, 275, 278, 284, 318
optimization method, 122, 142, 162, 163
ordinary differential equations, 259
organic compounds, 284
Overcrowding, vi, ix, 195, 196, 221, 225
overtime, 133, 134
oxidation, 285, 287, 294, 297, 298
oxidative stress, 291
oxygen, 298

P

parallel, vii, viii, 23, 24, 33, 34, 39, 40, 42, 43, 54, 80, 142, 275, 319, 333
Parallel distributed discrete event simulation (PDDES), viii, 23, 24
Parallel Distributed Simulation, viii, 24
parallel implementation, 80
parallelism, 24, 25, 40
parents, 79, 81, 88, 91, 92, 94, 99, 159, 175, 179, 239
Partial Least Squares, 297
partial least-squares, 299
participants, 174, 255, 262, 267, 306, 307
Particle Swarm Optimization, v, vii, ix, 14, 17, 121, 127, 142, 143, 145, 147, 148, 151, 158
partition, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 21
Pattern Classification, viii, 105
pattern recognition, viii, 18, 44, 54, 71, 77, 100, 106, 107, 119, 278
PCA, 106
peer-to-peer (P2P), 262
peer-to-peer lending, x, 255, 256, 268
pegging, 21
perception, 49, 50, 51, 56, 60, 61, 63, 195, 325, 336
performance indicator, 111, 260
performance rate, 81
permeability, 288, 308
personal goals, 265
personal history, 332, 335
personality, 330, 333
pH, 284, 297, 298
pharmaceutical, 103, 283, 284, 294
pharmacology, 278, 285
phenolic compounds, 287, 296, 298
phenomenology, 326
physical characteristics, 150
physical laws, 259
physicians, 219, 221, 230, 234, 243, 251
platform, 256, 261, 262, 263, 265, 267
playing, 174, 255, 324
policy, ix, 21, 72, 122, 125, 126, 127, 129, 133, 135, 136, 137, 138, 139, 140, 142, 143, 144, 224, 259, 261
policy development, 261
policy options, 125
Policy Robustness, 139
polysaccharides, 299
polyunsaturated fat, 286
polyunsaturated fatty acids, 286
population, 18, 77, 79, 87, 94, 95, 127, 158, 159, 160, 161, 162, 163, 164, 175, 176, 177, 179, 230, 262
population growth, 230
population size, 159, 161, 164, 179
Powell Hill-Climbing Algorithm, 129, 132
power generation, 303
predictability, 185, 186, 258
Predictive analytics, v, vi, ix, x, 171, 172, 173, 179, 183, 189, 301, 302, 305
predictive modeling, 171, 173, 179, 183
predictor variables, 314
principal component analysis, 106
principles, 43, 173, 175, 277, 289
probability, 14, 15, 20, 29, 30, 32, 44, 59, 79, 80, 81, 87, 92, 158, 159, 161, 176, 179, 258, 268
probability distribution, 29, 30, 32, 258
profit, 145, 259, 262, 269
profit margin, 262, 269
profitability, 269, 271, 273
programming, 33, 36, 100, 151, 171, 178, 189, 191, 331
project, 47, 49, 64, 69, 174, 179, 186, 189, 275, 302, 304, 319
propagation, ix, 56, 60, 150, 154, 156, 283, 287, 291, 296, 297
pruning, 103, 314
PSO-Based Artificial Neural Networks, 162
psychoanalysis, 330, 336, 337
psychology, 173, 324, 325

Q

quadratic programming, 101
qualifications, 204, 206
quantification, 199
quantization, 108, 118
quercetin, 286, 287
questioning, 323
R

radial distribution, 151
random assignment, 107
random numbers, 96, 163
Random reshaping, 108
real time, 36, 61, 70, 172, 294
reality, 309, 321, 323, 326, 328, 332, 333, 334
reasoning, vii, x, 172, 195, 203, 229, 230, 251, 252
recognition, viii, 44, 45, 53, 54, 56, 57, 67, 71, 72, 117, 119, 333
regression, ix, x, 13, 54, 56, 57, 60, 76, 99, 102, 123, 143, 149, 150, 172, 189, 289, 298, 299, 301, 304, 309, 314, 315, 316, 317
regression analysis, 143, 149, 150
regression equation, 123
Regression Trees, vi, x, 172, 301, 304, 309, 314, 315, 316
regulations, 70, 257, 260
reinforcement learning, 53, 54, 172
Reinforcement Learning, 54
reliability, 150, 196, 275
renewable energies, 309
replication, 87, 290, 335
reproduction, 77, 158, 159, 175, 176, 179, 330
reputation, 133
requirements, 51, 67, 78, 94, 107, 127, 217, 260, 267, 270, 272
researchers, viii, 4, 50, 51, 76, 77, 99, 106, 149, 153, 195, 198, 277, 326, 332
resistance, 290
resolution, 102, 183, 294
resource allocation, 73
resource utilization, 251
resources, 24, 36, 69, 196, 230, 234, 243, 245, 246, 250, 256
response, 13, 82, 84, 123, 138, 141, 142, 185, 186, 187, 192, 258, 259, 267, 290, 293, 297, 314, 329
responsiveness, 269, 327
restricted Boltzmann machines, 29, 44, 45, 46
restrictions, 25, 308, 314
Retrieval Engine, 234
risk, x, 16, 51, 61, 62, 63, 70, 90, 122, 148, 156, 167, 171, 172, 224, 255, 259, 260, 262, 263, 265, 267, 273, 274, 303, 312
risk aversion, 266
risk profile, 70, 265, 267
robotics, 56, 61, 73, 321, 327, 332
robustness, 8
Rote Learning, 53
rules, 44, 78, 125, 128, 130, 197, 198, 200, 201, 202, 203, 204, 205, 214, 215, 216, 217, 218, 258, 326

S

safety, 50, 70, 72, 126, 192, 283
Saudi Arabia, 195, 196, 203, 206, 221, 229, 252, 255, 275
science, 45, 102, 173, 174, 175, 189, 259, 283, 296, 322, 325, 328, 330, 332
scientific computing, 144
scope, 292
search space, 78, 121, 127, 129, 130, 132, 141, 159, 160, 162, 163
seasonality, 309
second generation, 332
security, 50, 51, 256, 260
self-adaptation, 87, 92
Semi-supervised Learning, 54
sensitivity, 126, 288, 311
sensors, 60, 61, 63, 173
sequencing, 96
sequential behavior, 257
services, 33, 69, 195, 196, 230, 234, 243, 245, 251, 252
SFS, 198
Shale gas, 308
signals, 61, 67, 69, 70, 151, 154
signs, 64, 109, 327
simple linear regression, 224
simulation, viii, x, 23, 24, 25, 26, 28, 33, 34, 36, 37, 39, 40, 42, 44, 46, 47, 79, 100, 102, 103, 123, 124, 127, 128, 133, 143, 145, 172, 175, 192, 193, 224, 229, 230, 232, 238, 239, 244, 247, 248, 249, 250, 251, 252, 255, 256, 257, 258, 259, 261, 262, 264, 268, 269, 270, 272, 273, 274, 275, 318, 319, 326, 329, 331, 336
Simulation Kernel, 33
simulation modeling, vi, x, 145, 229, 275
simulation models, 123, 248, 250, 262, 272
social behavior, 127
social identity, 327
social interaction, 162
social network, 130
social reality, 328, 333
social theory, 337
…154, 155, 156, 160, 161, 163, 164, 165, 167, 186, 263, 264, 278, 281, 291, 309, 310, 312, 314, 315
training programs, 51
training speed, 75
trajectory, 51, 63, 162
transactions, 256, 262, 263, 269, 303
transformation, 18, 22, 106, 108, 109, 123
transfusion, 111
Transhuman, vi, xi, 321, 322, 328, 329, 331, 332, 335
translation, 110
transparency, 303
transportation, 72, 303, 307, 309, 312
trapezoidal membership, 202, 209, 210, 212, 213
trauma, 329
treatment, 234, 290, 291
Tree Approach, 238
trial, 44, 69, 156, 176
triangular membership, 202, 209, 212, 213
two-dimensional representation, 105, 106

U

United States (USA), 50, 101, 145, 191, 193, 196, 230, 274, 275, 294, 304, 306, 307, 308, 312, 316, 318, 324, 336, 337
universe, 60, 172, 205, 333
universities, 73
unstable angina, 102
unsupervised ensemble learning, vii, 1, 5, 13, 21
Unsupervised Learning, 3, 54
updating, 124, 127, 131, 132, 133, 162
urinary tract infection, 291, 296

V

validation, viii, 42, 43, 68, 69, 100, 186, 187, 218, 221, 224, 248, 309, 314
variables, ix, x, 2, 25, 29, 32, 51, 54, 77, 82, 84, 123, 124, 128, 129, 134, 135, 136, 137, 138, 139, 140, 141, 144, 151, 152, 157, 174, 179, 187, 188, 198, 258, 259, 262, 263, 264, 268, 270, 290, 292, 301, 302, 303, 304, 305, 306, 307, 308, 309, 311, 312, 313, 314, 315, 316, 317
variations, 59, 61, 76, 87, 88, 123, 128, 151, 175, 209
varieties, 122, 297
vector, vii, viii, 2, 40, 41, 51, 59, 68, 70, 75, 76, 77, 84, 97, 99, 100, 101, 102, 106, 107, 108, 109, 110, 111, 115, 116, 117, 118, 119, 131, 162, 172, 292
vehicles, viii, 49, 50, 51, 52, 54, 63, 70, 73
velocity, 127, 130, 131, 162, 163
versatility, 110
viruses, 256, 290
vision, 56, 57, 73, 105, 106, 109, 119, 333
visual system, 108
visualization, 17, 151, 184
vote, 111, 112, 113, 114, 116
voting, 10, 13, 15, 16, 20, 21

W

waste, 148
wastewater, 298
water, 151, 291
wavelet, 108, 109
Wavelet features (WAVE), 109
wear, 149, 168, 183, 184
web, vii, 3, 21, 82
workers, 133, 134, 234, 290
workforce, 302
workload, 196, 199, 201, 202, 212, 219, 220, 221, 222
worldwide, 294, 303
worry, 303, 329

Y

yeast, 288, 289, 295
yield, 7, 9, 13, 14, 51, 240, 288, 295