
Studies in Fuzziness and Soft Computing

Rafael Bello
Rafael Falcon
José Luis Verdegay Editors

Uncertainty
Management
with Fuzzy and
Rough Sets
Recent Advances and Applications
Studies in Fuzziness and Soft Computing

Volume 377

Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
The series “Studies in Fuzziness and Soft Computing” contains publications on
various topics in the area of soft computing, which include fuzzy sets, rough sets,
neural networks, evolutionary computation, probabilistic and evidential reasoning,
multi-valued logic, and related fields. The publications within “Studies in Fuzziness
and Soft Computing” are primarily monographs and edited volumes. They cover
significant recent developments in the field, both of a foundational and applicable
character. An important feature of the series is its short publication time and
world-wide distribution. This permits a rapid and broad dissemination of research
results.

More information about this series at http://www.springer.com/series/2941


Rafael Bello · Rafael Falcon · José Luis Verdegay
Editors

Uncertainty Management
with Fuzzy and Rough Sets

Recent Advances and Applications

Editors

Rafael Bello
Department of Computer Science
Universidad Central “Marta Abreu” de Las Villas
Santa Clara, Villa Clara, Cuba

Rafael Falcon
School of Electrical Engineering and Computer Science
University of Ottawa
Ottawa, ON, Canada
and
Research & Engineering Division
Larus Technologies Corporation
Ottawa, ON, Canada

José Luis Verdegay
Department of Computer Science and Artificial Intelligence
School of Informatics and Telecommunications Engineering
University of Granada
Granada, Spain

ISSN 1434-9922 ISSN 1860-0808 (electronic)


Studies in Fuzziness and Soft Computing
ISBN 978-3-030-10462-7 ISBN 978-3-030-10463-4 (eBook)
https://doi.org/10.1007/978-3-030-10463-4

Library of Congress Control Number: 2018964931

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To our families

To our friends and colleagues at ISFUROS 2017
Preface

Granular computing (GrC) has been gaining momentum as a suitable computational paradigm to solve different kinds of problems. GrC allows analyzing information from different perspectives by generating different granulations of the universe of discourse. The information granules in each granulation of the universe bring together objects that are related according to an underlying property, such as inseparability, similarity, or functionality. Then, we operate at the level of information granules instead of at the level of the original objects. Different inseparability relationships give rise to different granulations, with a varying number of information granules, hence yielding different levels of data abstraction. Fuzzy set theory (FST) and rough set theory (RST) are two landmark methodologies under the GrC umbrella.
These two theories can also be employed to handle uncertainty in a wide variety of computational models. Uncertainty can manifest itself both in the data used to solve a problem and in the knowledge of the application domain fed to the problem-solving method. There are several types of uncertainty, such as inaccuracy, vagueness, inconsistency, and missing data. The term soft computing brings together different computational techniques that actively consider uncertainty as an essential part of problem-solving. FST and RST are two remarkable members of the soft computing family, which allow modeling vagueness and inconsistency, respectively. Given the increasing complexity of the problems to be tractably solved, it is often necessary to combine two or more techniques to generate new problem-solving approaches; these are the so-called hybrid systems.
In FST, a universe of (possibly continuous) values for a system variable is
reduced to a discrete set of values, i.e., the set of linguistic terms. These terms are
defined as fuzzy sets via a membership function. Linguistic terms represent
information granules. Hence, the set of linguistic terms constitutes a granulation
of the universe for the linguistic variable under consideration. Linguistic variables
constructed in this way are used to represent knowledge of the application domain
in a more human-centric manner.


In RST, the objects in the universe are brought together in an information granule by using an inseparability (indiscernibility) relation. This leads to a granulation of the universe according to that relation. In the classical RST formulation, the underlying relation is an equivalence relation, which induces a partition of the universe into a set of equivalence classes. In many cases, however, it is necessary to replace the equivalence relation with a more flexible one (e.g., a tolerance relation). In this case, the set of obtained information granules constitutes a covering of the universe of discourse.
A case that illustrates the need to combine both theories is when objects are described through one or more numerical attributes. In that case, the granulation of these continuous values could be performed via fuzzy sets (to account for the vagueness and imprecision) and then the granulation of the objects themselves could be conducted by using an RST indiscernibility relation (in order to detect inconsistent information). As fuzzy and rough sets are combined, the so-called fuzzy rough sets and rough fuzzy sets have been developed and successfully applied to a plethora of use cases. Additionally, other soft computing techniques can be hybridized with FST and/or RST. For instance, fuzzy sets and genetic algorithms (GAs) allow the generation of various computational methods, such as genetic fuzzy systems. In the same way, fuzzy sets and artificial neural networks (ANNs) come together in different ways to breed more powerful techniques, such as neuro-fuzzy systems. ANNs have also been coupled with rough sets. For instance, RST-based feature selection methods are used in the preprocessing stage of many ANN models, and new neuron models (such as rough neurons) have spawned from this profitable synergy.
The 2nd International Symposium on Fuzzy and Rough Sets (ISFUROS 2017)
was held from October 24–26, 2017, at the Meliá Marina Varadero hotel in
Varadero, Cuba, as a forum to present and discuss scientific results that contribute
toward theory and applications of fuzzy and rough set theories as well as their
hybridizations. ISFUROS 2017 took place under the umbrella of the First
International Scientific Convention organized by the Universidad Central de Las
Villas (UCLV), with over 20 concurrent events spread across five very intense and
fruitful days.
ISFUROS 2017 featured three keynote talks, two tutorial sessions, one panel
discussion, and 30 oral presentations out of the 55 submissions received. Out
of these, 20 accepted submissions were invited to prepare extended versions as
contributed book chapters to this Springer volume in the prestigious Studies in
Fuzziness and Soft Computing series. These 20 submissions encompass 62 authors
whose geographical distribution is as follows: Cuba (23), Spain (8), Canada (7),
Colombia (7), Finland (4), Peru (4), Belgium (3), Germany (2), Brazil (1), Italy (1),
Japan (1), and Poland (1).
This volume has been structured in three different parts. The first one is devoted
to theoretical advances and applications of fuzzy sets. The second one highlights
rough set theory and its applications, and the third one is dedicated to hybrid
systems.

In Part I, the reader will find new methods based on fuzzy sets to solve machine learning problems, such as clustering, as well as optimization problems that borrow FST elements into their formulation. Other contributions put forth new approaches for decision making, including those featuring fuzzy cognitive maps. Part I comprises nine chapters.
Part II includes six chapters that enrich the state of the art in RST. Several papers
propose new algorithms for knowledge discovery and decision making using rough
sets.
In Part III, five hybrid methods are introduced. Fuzzy and rough sets are com-
bined in two of the chapters. In the rest, fuzzy sets are coupled with neural and Petri
nets, as well as with GAs.
The editors hope that the methods and applications presented in this volume will
help broaden the knowledge about granular computing, soft computing and two of
its most important building blocks: fuzzy and rough set theories.
The rest of this preface briefly expands on the content of each chapter so that readers may dive straight into those that capture their interest.

Part I: Fuzzy Sets: Theory and Applications

Chapter “A Proposal of Hybrid Fuzzy Clustering Algorithm with Application in Condition Monitoring of Industrial Processes” introduces a fuzzy clustering algorithm inspired by the Weighted Fuzzy C-Means (W-FCM) method that leans on maximum entropy principles and kernel functions to better separate the clusters. The proposed technique first aims at identifying and removing outlier points prior to the clustering process. Its parameters are learned through the popular differential evolution metaheuristic optimizer. The algorithm was applied to a fault diagnosis scenario and enabled the online detection of new system faults.
Chapter “Solving a Fuzzy Tourist Trip Design Problem with Clustered Points of Interest” introduces a route planning problem with applications in tourism. The goal of the Tourist Trip Design Problem is to maximize the number of points of interest to visit. The authors propose a new, more realistic formulation where (i) the points of interest are clustered in various categories and (ii) the scores and travel time constraints are modeled through fuzzy logic. A fuzzy optimization approach and an efficient greedy randomized adaptive search procedure (GRASP) implementation were considered. The computational experiments indicate that the proposed technique is able to find high-quality solutions.
The Optimal Bucket Order Problem (OBOP) is a rank aggregation problem where the resulting ranking may be partial, i.e., ties are allowed. Several algorithms have been proposed to solve the OBOP; however, their performance with respect to the characteristics of the problem instances has not been properly studied. Chapter “Characterization of the Optimal Bucket Order Problem Instances and Algorithms by Using Fuzzy Logic” describes different aspects of the OBOP instances (such as the number of items to be ranked, the distribution of the precedence values, and the utopicity), as well as the performance of several OBOP algorithms, from a fuzzy logic standpoint. Based on this fuzzy characterization, several fuzzy relations between instance characteristics and algorithmic performance have been discovered.
Chapter “Uncertain Production Planning Using Fuzzy Simulation” applies fuzzy
logic to a production planning scenario with successful results. The goal is to
characterize the mean flow time of the system, namely the time by which a product
is finished and released to the customer. Other performance measures such as
production time and waiting time were modeled as fuzzy sets following a recently
proposed fuzzy random variable generation method.
Chapter “Fully Fuzzy Linear Programming Model for the Berth Allocation
Problem with Two Quays” investigates the berth allocation problem (BAP) for two
quays, where vessels can berth at any position within the limits of the quay and may
arrive at different times during the planning horizon. It is assumed that the arrival
time of the vessels is imprecise, meaning that vessels can actually be late or early up
to a certain threshold. Triangular fuzzy numbers represent the imprecision of the
vessel arrivals. Two models for this BAP scenario are unveiled. The first one is a fuzzy mixed integer linear programming (MILP) model, which allows obtaining berthing
plans with different degrees of precision. The second one is a fully fuzzy linear
programming (FFLP) model that yields a fuzzy berthing plan that can adapt to
possible contingencies related to the vessels’ arrivals. The proposed models have
been implemented in CPLEX and evaluated in a synthetic scenario with a varying
number of vessels. The chapter concludes by suggesting the steps to be taken so as
to implement the FFLP BAP model in a maritime container terminal.
Chapter “Ideal Reference Method with Linguistic Labels: A Comparison with
LTOPSIS” is concerned with multicriteria decision making (MCDM). The building
blocks of an MCDM model are described, followed by a brief tour of the most
popular compensatory MCDM methods. In particular, the chapter points out the
limitations of the reference ideal method (RIM) to operate with linguistic labels.
Next, RIM’s basic concepts are described, and another variant is proposed to
determine the minimum distance to the reference ideal, as well as the normalization
function. The proposed scheme is illustrated by means of an example and compared
against the LTOPSIS method.
Fuzzy cognitive maps (FCMs) can be defined as recurrent neural networks that
allow modeling complex systems using concepts and causal relations. While this soft
computing technique has proven to be a valuable knowledge-based tool for building
decision support systems, further improvements related to its transparency are still
required. In Chapter “Comparative Analysis of Symbolic Reasoning Models for
Fuzzy Cognitive Maps,” the authors design an FCM-based model where both the causal weights and the concepts’ activation values are described through linguistic terms like low, medium, or high. Augmenting FCMs with the computing with words (CWW) paradigm leads to cognitive models that are closer to human reasoning, thus facilitating the understanding of the model’s output for decision makers. Simulations on a well-known case study illustrate the soundness and potential application of the proposed model.

Another success story showcasing FCMs is reported in Chapter “Fuzzy Cognitive Maps for Evaluating Software Usability.” Software usability evaluation is a highly complex process given the variety of criteria to consider and the lack of consensus on the values to be used. The usability evaluation method proposed in this chapter incorporates soft computing elements such as fuzzy logic and fuzzy linguistic modeling. Furthermore, the use of FCMs allows adding the interrelations between usability criteria and therefore obtaining a truly global usability index. A mobile application was developed to evaluate the usability of other mobile applications based on the approach described here. The results obtained in a real-world environment show that the proposed technique is a feasible, reliable, and easy-to-interpret solution for use in industry.
Chapter “Fuzzy Simulation of Human Behaviour in the Health-e-Living System” elaborates on an application of fuzzy set theory to preventive health support systems, where adherence to medical treatment is an important measure to promote health and reduce healthcare costs. The design of preventive healthcare information technology systems includes ensuring adherence to treatment through just-in-time adaptive interventions (JITAI). Determining the timing of the intervention and the appropriate intervention strategy are two of the main difficulties current systems face. In this chapter, a JITAI system called health-e-living (Heli) was developed for a group of patients with type-2 diabetes. During Heli’s development stages, it was verified that the state of each user is fuzzy and that it is difficult to identify the right moment to send a motivational message to the user without being annoying. A fuzzy formula is proposed to measure the patients’ adherence to their goals. As the adherence measurement needed more data, the Disco software toolset was introduced to model human behavior, and the health action process approach (HAPA) was used to simulate the interactions between users of the Heli system. The effectiveness of interventions is essential in any JITAI system, and the proposed formula allows Heli to send motivational messages in correspondence with the status of each user so as to evaluate the efficiency of any intervention strategy.

Part II: Rough Sets: Theory and Applications

Covering-based RST is an extension of Pawlak’s RST, and it was proposed to expand the applications of the latter to more general contexts. In this extension, a covering is used instead of a partition obtained through an equivalence relation. Recently, many authors have studied the relationships between covering-based rough sets, matroids, and submodular functions. In Chapter “Matroids and Submodular Functions for Covering-Based Rough Sets,” the authors introduce the matroidal structures obtained from different partitions and coverings of a specific set. An extension of a matroidal structure for covering-based rough sets is also unveiled. Finally, a partial order relation among the matroidal structures is formulated via submodular functions, coverings, and their approximation operators.

Chapter “Similar Prototype Methods for Class Imbalanced Data Classification” puts forward four new methods for solving imbalanced classification problems based on nearest prototypes. Using similarity relations for the granulation of the universe, similarity classes are generated and a prototype is selected for each similarity class. The novelty of the proposal lies in the marriage between RST, specifically the use of the similarity quality measure, and classification concepts based on nearest prototypes, to classify objects under these conditions. The implementation of this RST metric allows creating a prototype that covers the objects whose decision value is the majority class of the similarity class. Experimental results showed that the performance of the proposed techniques is statistically superior to other imbalanced classification methods.
For any educational project, it is important and challenging to know, at the time
of enrollment, whether a given student is likely to successfully pass the academic
year or not. This task is not simple at all because many factors contribute to failure
in an academic setting. Inferring how likely it is that an enrolled student struggles to
meet the program requirements is undoubtedly an interesting challenge for the areas
of data mining and education. In Chapter “Early Detection of Possible
Undergraduate Drop Out Using a New Method Based on Probabilistic Rough Set
Theory,” the authors proposed the use of data mining techniques in order to predict
how likely a student is to succeed in the academic year. Normally, there are more students who succeed than students who fail, hence resulting in an imbalanced data representation. To cope with imbalanced data, a new algorithm based on
probabilistic RST is introduced. This algorithm has two main drivers: (1) the use of
two different threshold values for the similarity between objects when dealing with
minority or majority class examples and (2) the combination of the original data
distribution with the probabilities predicted by the RST method. The experimental
analysis confirmed that better results are obtained in comparison to a number of
state-of-the-art algorithms.
Community detection is one of the most important problems in social network
analysis. This problem has been successfully addressed through multiobjective
evolutionary algorithms (MOEAs); however, most of the proposed MOEA-based
solutions only detect disjoint communities, although it has been shown that in most
real-world networks, nodes may belong to multiple communities. In Chapter
“Multiobjective Overlapping Community Detection Algorithms Using Granular
Computing,” three algorithms that build a set of overlapping communities from
different perspectives are introduced. These algorithms employ granular computing
principles and are rooted in a multiobjective optimization approach. The proposed methods make use of highly cohesive information granules as initial expansion seeds and employ the local properties of the network vertices in order to obtain highly accurate overlapping community structures.
Relational database systems are the predominant repositories to store mission-critical information collected from industrial sensor devices, business transactions, and sourcing activities, among others. However, conventional knowledge discovery processes require data to be transported to external mining tools, which is a very challenging exercise in practice. To get over this dilemma, equipping databases with predictive capabilities is a promising direction. Using rough set theory is particularly interesting for this subject, because it has the ability to discover hidden patterns while being founded on a well-defined set of operations. Unfortunately, existing implementations consider data to be static, which is a prohibitive assumption in situations where data evolve over time and concepts tend to drift. Therefore, Chapter “In-Database Rule Learning Under Uncertainty: A Variable Precision Rough Set Approach” proposes an in-database rule learner for non-stationary environments. The assessment under different scenarios against other state-of-the-art rule inducers demonstrates that the proposed technique is comparable to existing methods, yet superior in critical applications that require further confidence from the decision-making process.
Chapter “Facial Similarity Analysis: A Three-Way Decision Perspective” describes a three-way classification of human judgments of similarity. In other words, a pair of photographs is classified as similar, dissimilar, or undecidable. The agreement of a set of participants leads to both a set of similar pairs and a set of dissimilar pairs; their disagreement leads to undecidable pairs. Probabilistic rough sets are used as the vehicle to induce three-way decisions. The authors put forth a simple model and then a more refined model. Findings from this study may benefit practical applications. For example, the selected photograph pairs in the similar, dissimilar, and undecidable regions may provide a firm foundation for the development of an understanding of the processes or strategies different people use to judge facial similarity. The authors anticipate that it might be possible to use the correct identification of strategy so as to create presentations of photographs that would allow eyewitness identification to have improved accuracy and utility.

Part III: Hybrid Approaches

Rough cognitive ensembles (RCEs) can be defined as a multiclassifier system composed of a set of rough cognitive networks (RCNs), each operating at a different granularity level. While this model is capable of outperforming several traditional classifiers reported in the literature, there is still room for enhancing its performance. In Chapter “Fuzzy Activation of Rough Cognitive Ensembles Using OWA Operators,” the authors introduce a fuzzy strategy to activate the RCN input neurons before performing the inference process. This fuzzy activation mechanism essentially quantifies the extent to which an object belongs to the intersection between its similarity class and each granular region in the RCN topology. To do that, it is necessary to conduct an information aggregation process. An aggregation technique based on ordered weighted averaging (OWA) operators is developed in this chapter. The numerical simulations show that the improved ensemble classifier significantly outperforms the original RCE model for the datasets under consideration. After comparing the proposed model to 14 well-known classifiers, the experimental evidence confirms that the proposed scheme yields very promising classification rates.

In Chapter “Prediction by k-NN and MLP a New Approach Based on Fuzzy Similarity Quality Measure. A Case Study,” the k-nearest neighbors (k-NN) and multilayer perceptron (MLP) algorithms are applied to a classical task in the realm of Civil Engineering: predicting the behavior of the anchorage of the railway’s fixations in the face of stud corrosion. The use of a fuzzy similarity quality measure for calculating the weights of the features, combined with the univariate marginal distribution algorithm (UMDA), enables both k-NN and MLP to operate on mixed data (i.e., nominal and numerical attributes). Experimental results verified that the UMDA + RST + FUZZY approach presented in this chapter is better than other methods utilized to calculate the feature weights.
Chapter “Scheduling in Queueing Systems and Networks Using ANFIS” is concerned with a scheduling problem that appears in many real-world systems where customers must wait for a service, known as queueing systems. Classical queueing systems are handled using probabilistic theories, mostly based on asymptotic theory and/or sample analysis. The authors address a situation where neither enough statistical data exists nor asymptotic behavior can be applied. Thus, they propose an adaptive neuro-fuzzy inference system (ANFIS) method to derive scheduling rules for a queueing problem based on uncertain data. They employ the utilization ratio and the work in process (WIP) of a queue to train an ANFIS network and finally obtain the estimated cycle time of all tasks. Multiple tasks and rework are considered in the problem, so it cannot be easily modeled using classical probability theory. The experimental results from simulation analysis demonstrate an improvement of the proposed ANFIS implementation across several performance measures compared to traditional scheduling policies.
Chapter “Genetic Fuzzy System for Automating Maritime Risk Assessment” employs genetic fuzzy systems (GFSs) to assess the risk level of maritime vessels transmitting automatic identification system (AIS) data. Previous risk assessment approaches based on fuzzy inference systems (FIS) relied on domain experts to specify the FIS membership functions as well as the fuzzy rule base (FRB), a burdensome and time-consuming process. This chapter aims to alleviate this load by learning the membership functions and FRB for the FIS of an existing risk management framework (RMF) directly from data. The proposed methodology is tested with four different case studies in maritime risk analysis. Each case study concerns a unique scenario involving a particular region: the Gulf of Guinea, the Strait of Malacca, the Northern Atlantic during a storm, and the Northern Atlantic during a period of calm seas. The experiments compare 14 GFS algorithms from the KEEL software package and evaluate the resulting FRBs according to their accuracy and interpretability. The results indicate that IVTURS, LogitBoost, and NSLV generate the most accurate rule bases, while SGERD, GCCL, NSLV, and GBML each generate interpretable rule bases. Finally, the IVTURS, NSLV, and GBML algorithms offer a reasonable compromise between accuracy and interpretability.
Generalized fuzzy Petri nets (GFP-nets) were recently proposed. Chapter “Fuzzy Petri Nets and Interval Analysis Working Together” describes an extended class of GFP-nets called type-2 generalized fuzzy Petri nets (T2GFP-nets). The new model extends the existing generalized fuzzy Petri nets by introducing a triple of operators (In, Out1, Out2) in a T2GFP-net in the form of interval triangular norms, which are supposed to function as substitutes for the triangular norms in GFP-nets. Trying to make GFP-nets more realistic with regard to the perception of physical reality, the chapter establishes a connection between GFP-nets and interval analysis. The link is methodological, demonstrating the possible use of the interval analysis methodology (to deal with incomplete information) to transform GFP-nets into a more realistic model. The proposed approach can be used both for knowledge representation and reasoning in knowledge-based systems.

Santa Clara, Cuba Rafael Bello
Ottawa, Canada Rafael Falcon
Granada, Spain José Luis Verdegay
July 2018
Acknowledgements

We want to express our sincere gratitude and appreciation to all those who made
ISFUROS 2017 and this Springer volume possible. In particular, we acknowledge
the support and direction provided by the ISFUROS 2017 Steering Committee and
the technical reviews and scientific insights contributed by all technical program
committee members, who generously devoted their time and efforts to provide
constructive and sound referee reports to evaluate the quality of all received
submissions.
Our gratitude also goes to the UCLV Convention organizers and the Meliá
Marina Varadero staff, who helped run the conference quite smoothly despite the
short notice to move the Convention to Varadero from its original venue in Santa
Maria Key after the catastrophic impact of hurricane Irma on the northern central
region of Cuba in September 2017. The editors are also indebted to the help received
from the project TIN2017-86647-P (funded by the Fondo Europeo de Desarrollo
Regional, FEDER) and the Asociación Universitaria Iberoamericana de Postgrado
(AUIP) research network iMODA. Special thanks go to Prof. Janusz Kacprzyk,
Gowrishankar Ayyasamy, and Leontina Di Cecco for their priceless support with
the publication of this Springer volume.

Contents

Part I Fuzzy Sets: Theory and Applications


A Proposal of Hybrid Fuzzy Clustering Algorithm with Application
in Condition Monitoring of Industrial Processes . . . . . . . . . . . . . . . . . . 3
Adrián Rodríguez-Ramos, Antônio José da Silva Neto
and Orestes Llanes-Santiago
Solving a Fuzzy Tourist Trip Design Problem with Clustered Points
of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Airam Expósito, Simona Mancini, Julio Brito and José A. Moreno
Characterization of the Optimal Bucket Order Problem Instances
and Algorithms by Using Fuzzy Logic . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Juan A. Aledo, José A. Gámez, Orenia Lapeira and Alejandro Rosete
Uncertain Production Planning Using Fuzzy Simulation . . . . . . . . . . . . 71
Juan Carlos Figueroa-García, Eduyn-Ramiro López-Santana
and Germán-Jairo Hernández-Pérez
Fully Fuzzy Linear Programming Model for the Berth Allocation
Problem with Two Quays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Flabio Gutierrez, Edwar Lujan, Rafael Asmat and Edmundo Vergara
Ideal Reference Method with Linguistic Labels: A Comparison
with LTOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Elio H. Cables, María Teresa Lamata and José Luis Verdegay
Comparative Analysis of Symbolic Reasoning Models for Fuzzy
Cognitive Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Mabel Frias, Yaima Filiberto, Gonzalo Nápoles, Rafael Falcon,
Rafael Bello and Koen Vanhoof
Fuzzy Cognitive Maps for Evaluating Software Usability . . . . . . . . . . . 141
Yamilis Fernández Pérez, Carlos Cruz Corona and Ailyn Febles Estrada


Fuzzy Simulation of Human Behaviour in the Health-e-Living


System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Remberto Martinez, Marcos Tong, Luis Diago, Timo Nummenmaa
and Jyrki Nummenmaa

Part II Rough Sets: Theory and Applications


Matroids and Submodular Functions for Covering-Based
Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
Mauricio Restrepo and John Fabio Aguilar
Similar Prototype Methods for Class Imbalanced Data
Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Yanela Rodríguez Alvarez, Yailé Caballero Mota,
Yaima Filiberto Cabrera, Isabel García Hilarión,
Yumilka Fernández Hernández and Mabel Frias Dominguez
Early Detection of Possible Undergraduate Drop Out Using
a New Method Based on Probabilistic Rough Set Theory . . . . . . . . . . . 211
Enislay Ramentol, Julio Madera and Abdel Rodríguez
Multiobjective Overlapping Community Detection Algorithms
Using Granular Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Darian H. Grass-Boada, Airel Pérez-Suárez, Rafael Bello
and Alejandro Rosete
In-Database Rule Learning Under Uncertainty:
A Variable Precision Rough Set Approach . . . . . . . . . . . . . . . . . . . . . . 257
Frank Beer and Ulrich Bühler
Facial Similarity Analysis: A Three-Way Decision Perspective . . . . . . . 289
Daryl H. Hepting, Hadeel Hatim Bin Amer and Yiyu Yao

Part III Hybrid Approaches


Fuzzy Activation of Rough Cognitive Ensembles Using OWA
Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
Marilyn Bello, Gonzalo Nápoles, Ivett Fuentes, Isel Grau, Rafael Falcon,
Rafael Bello and Koen Vanhoof
Prediction by k-NN and MLP a New Approach Based on Fuzzy
Similarity Quality Measure. A Case Study . . . . . . . . . . . . . . . . . . . . . . . 337
Yaima Filiberto, Rafael Bello, Wilfredo Martinez, Dianne Arias,
Ileana Cadenas and Mabel Frias

Scheduling in Queueing Systems and Networks Using ANFIS . . . . . . . . 349


Eduyn López-Santana, Germán Méndez-Giraldo
and Juan Carlos Figueroa-García
Genetic Fuzzy System for Automating Maritime Risk Assessment . . . . . 373
Alexander Teske, Rafael Falcon, Rami Abielmona and Emil Petriu
Fuzzy Petri Nets and Interval Analysis Working Together . . . . . . . . . . 395
Zbigniew Suraj and Aboul Ella Hassanien
Contributors

Rami Abielmona School of Electrical Engineering and Computer Science,
University of Ottawa, Ottawa, Canada;
Research & Engineering Division, Larus Technologies Corporation, Ottawa, Canada
John Fabio Aguilar Universidad Militar Nueva Granada, Bogotá, Colombia
Juan A. Aledo Universidad de Castilla-La Mancha, Albacete, Spain
Yanela Rodríguez Alvarez Departamento de Computación, Universidad de
Camagüey, Camagüey, Cuba
Dianne Arias Department of Computer Science, University of Camagüey,
Camagüey, Cuba
Rafael Asmat Department of Mathematics, National University of Trujillo,
Trujillo, Peru
Frank Beer University of Applied Sciences Fulda, Fulda, Germany
Marilyn Bello Department of Computer Science, Universidad Central “Marta
Abreu”, de Las Villas, Santa Clara, Cuba;
Faculty of Business Economics, Hasselt University, Hasselt, Belgium
Rafael Bello Department of Computer Science, Universidad Central “Marta
Abreu”, de Las Villas, Santa Clara, Cuba
Hadeel Hatim Bin Amer Department of Computer Science, University of Regina,
Regina, SK, Canada
Julio Brito Departamento de Ingeniería Informática y de Sistemas, Instituto
Universitario de Desarrollo Regional, Universidad de La Laguna, San Cristóbal de
La Laguna, Canary Islands, Spain
Ulrich Bühler University of Applied Sciences Fulda, Fulda, Germany


Elio H. Cables Universidad Antonio Nariño, Bogotá, Colombia


Yaima Filiberto Cabrera Departamento de Computación, Universidad de
Camagüey, Camagüey, Cuba
Ileana Cadenas Department of Civil Engineer, University of Camagüey,
Camagüey, Cuba
Carlos Cruz Corona University of Granada, Granada, Spain
Antônio José da Silva Neto Instituto Politécnico da Universidade do Estado do
Rio de Janeiro (IPRJ/UERJ), Nova Friburgo, Brazil
Luis Diago Interlocus Inc., Yokohama, Japan;
Meiji Institute for Advanced Study of Mathematical Sciences, Meiji University,
Tokyo, Japan
Mabel Frias Dominguez Departamento de Computación, Universidad de
Camagüey, Camagüey, Cuba
Ailyn Febles Estrada Cuban Information Technology Union, Havana, Cuba
Airam Expósito Departamento de Ingeniería Informática y de Sistemas, Instituto
Universitario de Desarrollo Regional, Universidad de La Laguna, San Cristóbal de
La Laguna, Canary Islands, Spain
Rafael Falcon Research & Engineering Division, Larus Technologies
Corporation, Ottawa, Canada;
School of Electrical Engineering and Computer Science, University of Ottawa,
Ottawa, Canada
Juan Carlos Figueroa-García Universidad Distrital Francisco José de Caldas,
Bogotá, Colombia
Yaima Filiberto Department of Computer Science, University of Camagüey,
Camagüey, Cuba
Mabel Frias Department of Computer Science, University of Camagüey,
Camagüey, Cuba
Ivett Fuentes Department of Computer Science, Universidad Central “Marta
Abreu”, de Las Villas, Santa Clara, Cuba;
Faculty of Business Economics, Hasselt University, Hasselt, Belgium
José A. Gámez Universidad de Castilla-La Mancha, Albacete, Spain
Darian H. Grass-Boada Advanced Technologies Application Center
(CENATAV), Havana, Cuba
Isel Grau Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium

Flabio Gutierrez Department of Mathematics, National University of Piura, Piura, Peru
Aboul Ella Hassanien Faculty of Computers and Information, Cairo University,
Giza, Egypt
Daryl H. Hepting Department of Computer Science, University of Regina,
Regina, SK, Canada
Yumilka Fernández Hernández Departamento de Computación, Universidad de
Camagüey, Camagüey, Cuba
Germán-Jairo Hernández-Pérez Universidad Nacional de Colombia, Bogotá,
Colombia
Isabel García Hilarión Departamento de Computación, Universidad de
Camagüey, Camagüey, Cuba
María Teresa Lamata Universidad de Granada, Granada, Spain
Orenia Lapeira Universidad Tecnológica de la Habana José Antonio Echeverría,
CUJAE, Havana, Cuba
Orestes Llanes-Santiago Departamento de Automática y Computación, Universidad
Tecnológica de la Habana José Antonio Echeverría, CUJAE, Havana, Cuba
Eduyn-Ramiro López-Santana Universidad Distrital Francisco José de Caldas,
Bogotá, Colombia
Edwar Lujan Department of Informatics, National University of Trujillo, Trujillo,
Peru
Julio Madera Research Institute of Sweden RISE SICS Västerås AB, Västerås,
Sweden
Simona Mancini Università di Cagliari, Cagliari, Italy
Remberto Martinez ExtensiveLife Oy, Tampere, Finland
Wilfredo Martinez Department of Civil Engineer, University of Camagüey,
Camagüey, Cuba
Germán Méndez-Giraldo Universidad Distrital Francisco José de Caldas,
Bogotá, Colombia
José A. Moreno Departamento de Ingeniería Informática y de Sistemas, Instituto
Universitario de Desarrollo Regional, Universidad de La Laguna, San Cristóbal de
La Laguna, Canary Islands, Spain
Yailé Caballero Mota Departamento de Computación, Universidad de
Camagüey, Camagüey, Cuba

Jyrki Nummenmaa University of Tampere, Tampere, Finland


Timo Nummenmaa University of Tampere, Tampere, Finland
Gonzalo Nápoles Faculty of Business Economics, Hasselt University, Hasselt,
Belgium;
Hasselt Universiteit, Diepenbeek, Belgium
Yamilis Fernández Pérez University of Informatics Sciences, Havana, Cuba
Airel Pérez-Suárez Advanced Technologies Application Center (CENATAV),
Havana, Cuba
Emil Petriu School of Electrical Engineering and Computer Science, University
of Ottawa, Ottawa, Canada
Enislay Ramentol Research Institute of Sweden RISE SICS Västerås AB,
Västerås, Sweden
Mauricio Restrepo Universidad Militar Nueva Granada, Bogotá, Colombia
Abdel Rodríguez Research Institute of Sweden RISE SICS Västerås AB,
Västerås, Sweden
Adrián Rodríguez-Ramos Departamento de Automática y Computación,
Universidad Tecnológica de la Habana José Antonio Echeverría, CUJAE, Havana,
Cuba
Alejandro Rosete Universidad Tecnológica de La Habana “José Antonio
Echeverría” (Cujae), Havana, Cuba
Zbigniew Suraj Faculty of Mathematics and Natural Sciences, University of
Rzeszów, Rzeszów, Poland
Alexander Teske School of Electrical Engineering and Computer Science,
University of Ottawa, Ottawa, Canada
Marcos Tong ExtensiveLife Oy, Tampere, Finland
Koen Vanhoof Faculty of Business Economics, Hasselt University, Hasselt,
Belgium;
Hasselt Universiteit, Diepenbeek, Belgium
José Luis Verdegay Universidad de Granada, Granada, Spain
Edmundo Vergara Department of Mathematics, National University of Trujillo,
Trujillo, Peru
Yiyu Yao Department of Computer Science, University of Regina, Regina, SK,
Canada
Acronyms

ACO Ant Colony Optimization


AIS Automatic Identification System
ANFIS Adaptive Neuro-Fuzzy Inference System
ANN Artificial Neural Network
ANOVA Analysis of Variance
AOI Area of Interest
AUC Area Under the Curve
BAP Berth Allocation Problem
BPA Bucket Pivot Algorithm
CI Computational Intelligence
CWW Computing With Words
DB Database
DE Differential Evolution
DOKEWFCM Density Oriented Kernel-Based Entropy regularized Weighted
Fuzzy C-Means
EA Evolutionary Algorithm
FAR False Alarm Rate
FCM Fuzzy Cognitive Map/Fuzzy C-Means
FDR False Detection Rate
FFLP Fully Fuzzy Linear Programming
FIS Fuzzy Inference System
FLP Fuzzy Linear Programming
FN Fuzzy Number
FPN Fuzzy Petri Net
FRB Fuzzy Rule Base
FRV Fuzzy Random Variable
FST Fuzzy Set Theory
GA Genetic Algorithm
GFP-nets Generalized Fuzzy Petri Nets
GFS Genetic Fuzzy System


GRASP Greedy Randomized Adaptive Search Procedure


GrC Granular Computing
HAPA Health Action Process Approach
IA Interval Analysis
InDBR In-Database Rule Inducer
IR Imbalance Ratio
ISFUROS International Symposium on Fuzzy and Rough Sets
JITAI Just-In-Time Adaptive Interventions
K-NN K-Nearest Neighbors
LRIM Linguistic Reference Ideal Method
LTOPSIS Linguistic Technique for Order of Preference by Similarity to
Ideal Solution
MCDM Multicriteria Decision Making
MFT Mean Flow Time
MILP Mixed Integer Linear Programming
ML Machine Learning
MLP Multilayer Perceptron
MOEA Multiobjective Evolutionary Algorithm
MOO Multi-Objective Optimization
NN Neural Network/Nearest Neighbor
NSGA-II Non-Dominated Sorting Genetic Algorithm II
OBOP Optimal Bucket Order Problem
OWA Ordered Weighted Averaging
POI Point of Interest
PSO Particle Swarm Optimization
PT Production Time
QCAP Quay Crane Assignment Problem
QN Queuing Networks
QS Queuing Systems
RBF Radial Basis Function
RCE Rough Cognitive Ensemble
RCN Rough Cognitive Network
RIM Reference Ideal Method
RMF Risk Management Framework
RST Rough Set Theory
SC Soft Computing
SCADA Supervisory Control and Data Acquisition
SMOTE Synthetic Minority Over-Sampling Technique
SVM Support Vector Machine
TEU Twenty-Foot Equivalent Unit
TOPSIS Technique for Order of Preference by Similarity to Ideal Solution
TTDP Tourist Trip Design Problem
TTDPC Tourist Trip Design Problem Clustered
UCI University of California Irvine
UCLV Universidad Central de Las Villas

UMDA Univariate Marginal Distribution Algorithm


VPRS Variable Precision Rough Sets
WIP Work In Progress
WT Waiting Time
Part I
Fuzzy Sets: Theory and Applications
A Proposal of Hybrid Fuzzy Clustering
Algorithm with Application in Condition
Monitoring of Industrial Processes

Adrián Rodríguez-Ramos, Antônio José da Silva Neto and Orestes Llanes-Santiago

Abstract In this chapter a hybrid algorithm using fuzzy clustering techniques is presented. The algorithm is applied in a condition monitoring scheme with online detection of novel faults and automatic learning. The proposal initially identifies the outliers based on data density. Later, the outliers are removed and the clustering process is performed. To extract the important features and improve the clustering, the maximum-entropy-regularized weighted fuzzy c-means is used. Then, kernel functions are used for clustering data where there is a non-linear relationship between the variables. Thus, the classification accuracy can be improved because better class separability is achieved. Next, the regulation factor of the resulting partition fuzziness (parameter m) and the Gaussian kernel bandwidth (parameter σ) are optimized. The feasibility of the proposal is demonstrated by using the DAMADICS benchmark.

1 Introduction

Fuzzy clustering methods are unsupervised classification tools [1] which can be employed to define groups of observations by considering the similarities among them. In particular, fuzzy clustering tools make it possible to handle data uncertainty, which is common across different disciplines such as image processing, machine learning, modeling and identification [2–8].

A. Rodríguez-Ramos · O. Llanes-Santiago (B)


Departamento de Automática y Computación, Universidad Tecnológica de la Habana José
Antonio Echeverría, CUJAE, Calle 114, No. 11901, 10390 La Habana, Cuba
e-mail: orestes@tesla.cujae.edu.cu
A. Rodríguez-Ramos
e-mail: adrian.rr@automatica.cujae.edu.cu
A. J. da Silva Neto
Instituto Politécnico da Universidade do Estado do Rio de Janeiro (IPRJ/UERJ),
Rua Bonfim, 25 - Parte - Campus UERJ, Nova Friburgo, RJ 28625-570, Brazil
e-mail: ajsneto@iprj.uerj.br

© Springer Nature Switzerland AG 2019
R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_1
An important advantage of this type of method is that it can remove the influence of noise and outliers from the data clustering [50, 51].
The Fuzzy C-Means (FCM) algorithm [9] is one of the most widely used clustering algorithms due to its robust results for overlapped data. Unlike the k-means algorithm, data points in the FCM algorithm may belong to more than one cluster center. The FCM algorithm obtains very good results with noise-free data but is highly sensitive to noisy data and outliers [1].
Other similar techniques, such as Possibilistic C-Means (PCM) [10] and Possibilistic Fuzzy C-Means (PFCM) [11], interpret clustering as a possibilistic partition and work better in the presence of noise than FCM. However, PCM fails to find optimal clusters in the presence of noise [1], and PFCM does not yield satisfactory results when the dataset consists of two clusters that are highly unequal in size and outliers are present [1, 10]. The Noise Clustering (NC) [12], Credibility Fuzzy C-Means (CFCM) [13], and Density Oriented Fuzzy C-Means (DOFCM) [10] algorithms were proposed specifically to work efficiently with noisy data.
The clustering output depends upon various parameters such as the distribution of data points inside and outside the cluster, the shape of the cluster, and linear or non-linear separability. The effectiveness of the clustering method relies highly on the choice of the distance metric adopted. FCM uses the Euclidean distance as the distance measure and, therefore, is only able to detect hyper-spherical clusters. Researchers have proposed other distance measures, such as the Mahalanobis distance and kernel-based distance measures in data space and in high-dimensional feature space, so that non-hyper-spherical/non-linear clusters can be detected [14, 15]. However, one drawback of these clustering algorithms is that they treat all features equally in the decision of the cluster memberships of objects. A solution to this problem is to introduce proper attribute weights into the clustering process [16, 17].
Many attribute-weighted fuzzy clustering methods have been proposed in recent times. In [18], the weighted Euclidean distance is used to replace the general Euclidean distance in FCM. In [19], the grouping is carried out by clustering on a selected subspace instead of the full data space, directly assigning zero weights to features which carry little information. Recently, [20] presented an enhanced soft subspace clustering (ESSC) algorithm by employing both within-cluster and between-cluster information. In [21], a novel subspace clustering technique has been proposed by introducing feature interaction using the concepts of fuzzy measures and the Choquet integral. [22] gives a survey of weighted clustering technologies. Finally, in [23], a maximum-entropy-regularized weighted fuzzy c-means (EWFCM) algorithm is proposed to extract the important features and improve the clustering. In the EWFCM algorithm, the attribute-weight entropy regularization is defined in the new objective function to achieve the optimal distribution of attribute weights. In this way, the dispersion within clusters is minimized while the entropy of attribute weights is maximized, stimulating important attributes to contribute to the identification of clusters. Thus, a good clustering result can be yielded and the important attributes can be extracted for cluster identification. Moreover, a kernel-based EWFCM (KEWFCM) clustering algorithm is realized for clustering data with non-spherical shaped clusters.

Another problem usually present in fuzzy clustering methods is the significant dependency of their performance on the adequate selection of their parameters [24, 25].
In order to overcome the aforementioned problems, in this chapter a hybrid algorithm using fuzzy clustering techniques is proposed, constituting the principal contribution of this chapter. The algorithm is applied in a condition monitoring scheme with online detection of novel faults and automatic learning, with the ability to analyze the set of data classified as noise and evaluate whether they form an unknown class or not, which constitutes another contribution of this chapter.
In the first place, the proposal enables the identification of outliers before the clustering process. Subsequently, the outliers are removed and the clustering process is performed. To extract the important features and improve the clustering, the maximum-entropy-regularized weighted fuzzy c-means is used. Then, kernel functions are used for clustering data with non-spherical shaped clusters. Thus, the classification accuracy can be improved because better class separability is achieved. Next, the regulation factor of the resulting partition fuzziness (parameter m) and the Gaussian kernel bandwidth (parameter σ) are optimized by using the Differential Evolution algorithm.
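As a brief note on the kernel step: for a Gaussian kernel, the feature-space distance used in kernel fuzzy clustering admits a simple closed form, shown below. This is the standard kernel-trick identity rather than a formula taken from this chapter, and some authors write 2σ² instead of σ² in the denominator:

K(x_k, v_i) = \exp\left( -\frac{\|x_k - v_i\|^2}{\sigma^2} \right), \qquad \|\phi(x_k) - \phi(v_i)\|^2 = K(x_k, x_k) - 2K(x_k, v_i) + K(v_i, v_i) = 2\,(1 - K(x_k, v_i)),

since K(x, x) = 1 for the Gaussian kernel. Hence the bandwidth σ, together with the fuzziness exponent m, fully parameterizes the distance computations that Differential Evolution must tune.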
In summary, the new algorithm developed in this chapter presents the following
characteristics:
• Eliminates or reduces the presence of noise and outliers in a dataset.
• Allows dealing with the uncertainties and non-linearities of the data due to the high complexity of modern industrial systems.
• It is a useful tool to extract the important features and improve the clustering process.
• It has the ability to analyze the set of data classified as noise and determine whether they form a new class or not.
The organization of the chapter is as follows: in Sect. 2 a description of the FCM algorithm is given, and we then comment on recently proposed fuzzy clustering algorithms, which will be used in the comparison with the proposed algorithm. In Sect. 3, a description of the novel hybrid fuzzy clustering algorithm is presented. Next, in Sect. 4, the new condition monitoring scheme is described, remarking on its capacity to detect new faults; the latter is achieved with an automatic learning approach based on fuzzy clustering techniques. The benchmark and the design of experiments developed to demonstrate the performance of the proposal, together with the obtained results, are shown in Sect. 5. Conclusions are drawn at the end.

2 Related Works

Many algorithms have been developed for fuzzy clustering. Among the most used techniques are relational fuzzy algorithms such as the fuzzy non-metric model [26], the relational fuzzy C-Means [27], the non-Euclidean relational fuzzy C-Means [28], the fuzzy C-medoids [29], and the fuzzy relational data clustering algorithm [30]. On the other hand, dynamic algorithms are found, such as the adaptive fuzzy clustering (AFC) algorithm [31], the Matryoshka method [32], and the dynamic neuro-fuzzy inference system (DENFIS), which has been used for time series prediction [33]. There is also the LAMDA (Learning Algorithm for Multivariate Data Analysis) algorithm, a fuzzy classification technique based on the evaluation of the suitability of individuals for each class [34].
Among the fuzzy clustering methods, the distance-based ones represent the majority.
Fuzzy C-Means (FCM) is the most popular one. The optimization criterion (1) defined
by FCM is used to cluster the data by considering the similitude among observations.


J(X; U, v) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m (d_{ik})^2 \qquad (1)

The exponent m > 1 in (1) is an important factor which regulates the fuzziness of the resulting partition. If m → ∞, all patterns will have the same membership degree in every group (fuzzy partition). However, if m → 1, each pattern will belong to only one group (hard partition). The fuzzy clustering allows obtaining the membership degree matrix U = [μ_{ik}]_{c×N}, where μ_{ik} represents the degree of fuzzy membership of sample k to the i-th class, which satisfies the following relationship:


\sum_{i=1}^{c} \mu_{ik} = 1, \quad k = 1, 2, \ldots, N \qquad (2)

where c is the number of classes and N is the number of samples. In this algorithm, the similitude is evaluated by means of the distance function d_{ik}, given by Eq. (3). This function provides a measure of the distance between the data and the center of each class v = {v_1, v_2, …, v_c}, with A ∈ R^{n×n} being the norm-inducing matrix, where n is the number of measured variables.
d_{ik}^2 = (x_k - v_i)^T A (x_k - v_i) \qquad (3)

The measure of dissimilarity is the squared distance between each data point x_k and the cluster center v_i. This distance is weighted by a power of the membership degree, (μ_{ik})^m. The value of the cost function J is a measure of the total weighted quadratic error and, statistically, it can be seen as a measure of the total variance of x_k with respect to v_i. The conditions for a local extremum of Eqs. (1) and (2) are derived using Lagrange multipliers:
\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik,A} / d_{jk,A} \right)^{2/(m-1)}} \qquad (4)
v_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m x_k}{\sum_{k=1}^{N} (\mu_{ik})^m} \qquad (5)

From Eq. (5) it should be noted that v_i is the weighted average of the data elements that belong to a cluster, i.e., it is the center of cluster i. The FCM algorithm is an iterative procedure where N data points are grouped into c classes. Initially, the user should establish the number of classes (c). The centers of the c classes are initialized at random, and they are modified during the iterative process. In a similar way, the membership degree matrix U is modified until it stabilizes, i.e., \|U_t - U_{t-1}\| < ε, where ε is a tolerance limit prescribed a priori and t is an iteration counter.
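A minimal sketch of this iterative procedure is given below, assuming X is an N × n NumPy array and the Euclidean norm (A equal to the identity matrix in Eq. (3)); the function and parameter names are illustrative, not taken from the chapter:

import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    # Fuzzy C-Means: iterate the update rules of Eqs. (4) and (5)
    # until the membership matrix stabilizes, i.e. ||U_t - U_{t-1}|| < eps.
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0)                    # each column sums to 1 (Eq. (2))
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)        # Eq. (5)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)           # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                       # Eq. (4)
        if np.linalg.norm(U_new - U) < eps:
            return U_new, V
        U = U_new
    return U, V

For instance, U, V = fcm(X, c=3) groups the data into three fuzzy classes; each column of U sums to 1, as required by Eq. (2).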
New fuzzy clustering methods have been proposed in recent years to deal with the classification problem in different applications.
Ding [14] recently presented GAKFCM for clustering the data in two steps. First, the initial cluster centers are adjusted by using an improved adaptive genetic algorithm. Second, classification is accomplished through the KFCM method. A picture fuzzy clustering method (FC-PFS) is presented in [4] based on the theory of picture fuzzy sets (PFS). It is demonstrated that FC-PFS can achieve better clustering quality than other important methods. The essence of this method is that it modifies the objective function based on PFS theory. The idea behind the new function considers two aspects. First, it inherits FCM’s objective function, where the membership degree μ in Eq. (1) is replaced by μ(2 − ξ), which means that a data element belonging to a cluster has both a high value of positive degree and a low value of refusal degree [4]. Second, entropy information is added to the objective function to help the method decrease the neutral and refusal degrees of an element that becomes a member of a cluster. The clustering quality is improved because the entropy information is relevant [4].
For many datasets it is hard to define a proper cluster structure that covers the feature set. Thus, Zhou [23] presents a maximum-entropy-regularized weighted fuzzy c-means method (EWFCM) to determine the important features for enhancing the clustering results. The optimal distribution of attribute weights is found by defining an objective function based on attribute-weight entropy regularization. This approach simultaneously minimizes the dispersion within clusters and maximizes the entropy of the weights of those attributes that promote the identification of clusters, so that the attributes relevant to a successful clustering are identified. In addition, a kernel version of the EWFCM method (KEWFCM) is implemented to deal with data possibly containing non-spherically shaped clusters; the Gaussian kernel has been used [23].

Fig. 1 Procedure performed by the DOKEWFCM algorithm

3 The Proposed Algorithm

3.1 Kernel-Based DOEWFCM (DOKEWFCM)

The DOKEWFCM algorithm is intended as a hybrid algorithm that uses the potential of DOFCM [13] to detect and eliminate the outliers in a dataset, and the potential of KEWFCM [23] to extract the important features and improve the clustering process. Kernel functions allow clustering data with non-spherically shaped clusters; thus, classification errors can decrease because better separability among classes is achieved. Figure 1 shows the procedure performed by the DOKEWFCM algorithm.
A cluster of noise observations is created together with the c clusters (total: c + 1 clusters). The final clustering is performed after the outliers are identified by considering the data density. The neighborhood of a point, defined by a certain radius, must include a minimum number of observations. DOKEWFCM defines a neighborhood membership, or density factor, that assesses the density of an observation with respect to its neighborhood. This measure of the neighborhood membership of a point i in X is defined as:

M_{neighborhood}^{i} = \frac{\eta_{neighborhood}^{i}}{\eta_{max}} \qquad (6)

where \eta_{neighborhood}^{i} represents the number of points in the neighborhood of point i, and \eta_{max} is the maximum number of points found in the most populated neighborhood of the dataset.
If a point q is in the neighborhood of point i, then q satisfies:

q \in X \mid dist(i, q) \leq r_{neighborhood} \qquad (7)

where r_{neighborhood} is the radius of the neighborhood and dist(i, q) is the distance between points i and q. The neighborhood membership of each point in the dataset X is calculated using Eq. (6). The threshold value \alpha is selected from the complete range of neighborhood membership values, depending on the density of points in the dataset. A point is considered an outlier if its neighborhood membership is less than \alpha. Let i be a point in the dataset X; then
M_{neighborhood}^{i} < \alpha \;\Rightarrow\; i \text{ is an outlier}; \qquad M_{neighborhood}^{i} \geq \alpha \;\Rightarrow\; i \text{ is a non-outlier} \qquad (8)

\alpha can be selected from the range of M_{neighborhood}^{i} values after observing the density of points in the dataset, and it should be close to zero. Ideally, a point would be classified as an outlier only if no other point is present in its neighborhood, i.e., when its neighborhood membership is zero (threshold value \alpha = 0). However, in this scheme a point is considered an outlier when its neighborhood membership is less than \alpha, where \alpha is a critical parameter for identifying the outlier points. Its value depends on the nature of the dataset, i.e., on its density, and therefore varies for different datasets. After the outliers are identified, the clustering process is performed. In this case, the objective function is defined as:


J = \sum_{i=1}^{c+1} \sum_{k=1}^{N} \sum_{l=1}^{M} (\mu_{ik})^m w_{il} \|\Phi(x_{kl}) - \tilde{v}_{il}\|^2 + \gamma^{-1} \sum_{i=1}^{c+1} \sum_{l=1}^{M} w_{il} \log(w_{il}) \qquad (9)

subject to 0 \leq \sum_{i=1}^{c} \mu_{ik} \leq 1 and \sum_{l=1}^{M} w_{il} = 1, 0 \leq w_{il} \leq 1, where U = [\mu_{ik}]_{c \times N} is the membership degree matrix in the original space, W = [w_{il}]_{c \times M} is the attribute weight matrix in the original space, and \tilde{V} = [\tilde{v}_{il}]_{c \times M} is the cluster center matrix in the kernel space. \Phi is the non-linear mapping from the original feature space to the kernel space. In this case, the Gaussian kernel is used: K(x_{kl}, v_{il}) = e^{-\|x_{kl} - v_{il}\|^2 / \sigma^2}.

The matrices \tilde{V} and W are updated according to Eqs. (10) and (11), respectively. Note that in Eq. (10), i = 1, \ldots, c.
\tilde{v}_{il} = \frac{\sum_{k=1}^{N} (\mu_{ik})^m K(x_{kl}, \tilde{v}_{il}) x_{kl}}{\sum_{k=1}^{N} (\mu_{ik})^m K(x_{kl}, \tilde{v}_{il})} \qquad (10)

w_{il} = \frac{\exp\left(-\gamma \sum_{k=1}^{N} (\mu_{ik})^m \|\Phi(x_{kl}) - \tilde{v}_{il}\|^2\right)}{\sum_{s=1}^{M} \exp\left(-\gamma \sum_{k=1}^{N} (\mu_{ik})^m \|\Phi(x_{ks}) - \tilde{v}_{is}\|^2\right)} \qquad (11)

The membership function \mu_{ik} is modified as:

\mu_{ik} = \begin{cases} \dfrac{1}{\sum_{j=1}^{c} \left( \dfrac{\sum_{l=1}^{M} w_{il} \|\Phi(x_{kl}) - \tilde{v}_{il}\|^2}{\sum_{l=1}^{M} w_{jl} \|\Phi(x_{kl}) - \tilde{v}_{jl}\|^2} \right)^{1/(m-1)}} & \text{if non-outlier} \\[2ex] 0 & \text{if outlier} \end{cases} \qquad (12)
The DOKEWFCM algorithm is presented in Algorithm 1.


Algorithm 1 DOKEWFCM
Input: data, c, ε > 0, m > 1, γ > 0, Itr_max.
Output: data without outliers Xp, U, V, W
Identification of the outliers (Step 1):
  Compute the neighborhood radius.
  Compute η^i_neighborhood with Eq. (7).
  Select η_max.
  Compute M^i_neighborhood with Eq. (6).
  With the value of α, identify the outliers according to (8).
Clustering process (Step 2):
  Initialize U to a random fuzzy partition.
  Initialize W for each attribute.
  for t = 1 to Itr_max do
    Update V using Eq. (10).
    Update U using Eq. (12).
    Update W using Eq. (11).
    Verify the stopping criterion: ||U_t − U_{t−1}|| < ε
  end for

The stopping criteria implemented in this algorithm are:

1. Criterion 1: maximum number of iterations (Itr_max).
2. Criterion 2: ||U_t − U_{t−1}|| < ε (ε is a tolerance limit prescribed a priori, and t is an iteration counter).
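As a complement to the pseudocode, the following Python sketch (assuming Euclidean distances; the function name identify_outliers is ours) illustrates how Step 1 can be implemented from Eqs. (6)-(8):

import numpy as np

def identify_outliers(X, r_neighborhood, alpha):
    # Pairwise distances between all points in X (N, n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Number of points in each neighborhood, Eq. (7); the point itself is excluded
    eta = (D <= r_neighborhood).sum(axis=1) - 1
    M = eta / eta.max()          # neighborhood membership, Eq. (6)
    return M < alpha             # boolean mask of outliers, Eq. (8)

# Usage: remove the outliers before Step 2 (clustering); the parameter
# values here are illustrative only
# mask = identify_outliers(X, r_neighborhood=0.5, alpha=0.05)
# Xp = X[~mask]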

3.2 An Illustrative Example: UCI Machine Learning Datasets

Various datasets from the UCI Machine Learning Repository [35] are used to evaluate the performance of the proposal: Iris, Glass, Ionosphere, Haberman and Heart. These datasets are contaminated with outliers distributed evenly among the classes. Table 1 gives an overview of the modified datasets.
To evaluate the performance of the proposed algorithm (DOKEWFCM), the KEWFCM algorithm [23] was selected for a comparative analysis. In addition, other recent algorithms (GAKFCM [14], FC-PFS [4]) with excellent results

Table 1 Description of the modified datasets

Dataset      No. of elements     No. of variables   No. of classes   Elements in each class
Iris         198 (48 outliers)   4                  3                (66, 66, 66)
Glass        279 (65 outliers)   9                  6                (90, 96, 22, 18, 14, 39)
Ionosphere   421 (70 outliers)   34                 2                (156, 265)
Haberman     366 (66 outliers)   3                  2                (265, 101)
Heart        320 (50 outliers)   13                 2                (180, 140)

Table 2 Results of the comparison


Dataset GAKFCM FC-PFS KEWFCM DOKEWFCM
Iris 72.73 85.02 89.81 97.33a
Glass 44.58 47.61 40.97 57.75a
Ionosphere 60.25 62.87 67.33 79.20a
Haberman 64.02 63.38 66.46 77.45a
Heart 55.58 60.22 59.89 73.78a
a Best classification

were also selected to make this comparison. The values of the common parameters for these algorithms are: Itr_max = 100, ε = 10⁻⁵, m = 2. The specific parameters are:
• KEWFCM: γ = 0.05 and σ = 10.
• GAKFCM: σ = 10, crossover rate p_co = 0.6 and mutation rate p_mo = 0.001.
• FC-PFS: α = 0.6 (where α ∈ (0, 1] is an exponent coefficient used to control the refusal degree in picture fuzzy sets).
Each algorithm was executed ten times on each dataset. For the comparative analysis, the classification rate was used as the performance metric. The classification rate measures how well clustering algorithms perform on a given dataset with a known cluster structure [23]. It is computed using Eq. (13) and expressed as a percentage in this chapter:

CR = \frac{\sum_{i=1}^{c} d_i}{N} \qquad (13)

where d_i is the number of objects correctly identified in the i-th cluster, and N is the total number of objects in the dataset. Table 2 shows the results of the comparison. It can be observed that the proposed algorithm obtains the best average classification rate (ACR) for all analyzed datasets.
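For reference, Eq. (13) amounts to the following one-line computation (a sketch with illustrative names and example numbers):

def classification_rate(d, N):
    # Eq. (13): CR = sum_i d_i / N, expressed as a percentage
    return 100.0 * sum(d) / N

# e.g. three clusters with 60, 58 and 62 correctly identified objects out of 198:
# classification_rate([60, 58, 62], 198)  # -> 90.9...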
Figure 2 shows, for the Iris dataset, that the DOKEWFCM algorithm is able to identify the outliers (shown in black). The algorithm then classifies the observations after the outliers have been eliminated (Fig. 3).
Table 3 shows the attribute weight assignment of the DOKEWFCM algorithm on the Iris dataset. It is evident that attributes 3 and 4 contribute much more to the clustering than the other two attributes, since the algorithm assigns higher weights to these two attributes.

Fig. 2 Identification of the outliers with the DOKEWFCM algorithm (Step 1); the panels plot the Iris attributes (sepal length/width, petal length/width)

Fig. 3 Classification process with the DOKEWFCM algorithm (Step 2); the panels plot the same Iris attribute pairs after outlier removal

3.2.1 Statistical Tests

Statistical tests are applied to determine whether significant differences exist among the results presented in Table 2 [36–38]. The non-parametric Friedman test is first used to evaluate whether significant differences among the methods are present. If the test is positive, pairwise comparisons are performed with the non-parametric Wilcoxon test.

Table 3 Attribute weight assignment of DOKEWFCM algorithm on Iris dataset


Attribute 1 Attribute 2 Attribute 3 Attribute 4
Cluster 1 0.0006 0.0018 0.8557 0.1419
Cluster 2 0.0002 0.0004 0.5849 0.4145
Cluster 3 0.0012 0.0004 0.6766 0.3218

Table 4 Results of the Wilcoxon test for the Iris dataset

            1 versus 2   1 versus 3   1 versus 4   2 versus 3   2 versus 4   3 versus 4
R+          0            0            0            0            0            0
R−          55           55           55           55           55           55
T           0            0            0            0            0            0
T(α=0.05)   8            8            8            8            8            8
Winner      2            3            4            3            4            4

3.2.2 Friedman Test

The results for the Iris dataset are shown below. In this case, for four algorithms (k = 4) and ten runs (N = 10), the value of the Friedman statistic F_F = 270/0 → ∞ was obtained. With k = 4 and N = 10, F_F is distributed according to the F distribution with 4 − 1 = 3 and (4 − 1) × (10 − 1) = 27 degrees of freedom. The critical value of F(3,27) for α = 0.05 is 2.9604, so we reject the null hypothesis (F(3,27) < F_F), which means that the average performance of at least one algorithm is significantly different from the average performance of the other algorithms. For the remaining datasets (Glass, Ionosphere, Haberman and Heart) the same result was obtained when applying the Friedman test.

3.2.3 Wilcoxon Test

The comparison results for the Iris dataset can be observed in Table 4 (1: GAKFCM, 2: FC-PFS, 3: KEWFCM, 4: DOKEWFCM). The first two rows contain the sums of the positive (R+) and negative (R−) ranks for each comparison. The next two rows show the statistic T and the critical value of T for a significance level of α = 0.05. The last row indicates which algorithm was the winner in each comparison. The summary in Table 5 shows the number of times each algorithm was the winner across all datasets. These results validate that the new fuzzy clustering algorithm proposed in this chapter obtains the best performance.
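Both tests are available in SciPy; the sketch below (with placeholder score arrays, not the chapter's actual per-run results) shows the two-step procedure, Friedman first and then pairwise Wilcoxon tests when the null hypothesis is rejected:

from scipy.stats import friedmanchisquare, wilcoxon

# One classification rate per run for each algorithm (illustrative values only)
scores = {
    "GAKFCM":   [72, 73, 71, 74, 72, 73, 72, 73, 72, 73],
    "FC-PFS":   [85, 84, 86, 85, 85, 84, 86, 85, 85, 84],
    "KEWFCM":   [89, 90, 90, 89, 90, 89, 90, 90, 89, 90],
    "DOKEWFCM": [97, 97, 98, 97, 97, 98, 97, 97, 98, 97],
}

stat, p = friedmanchisquare(*scores.values())   # step 1: Friedman test
if p < 0.05:                                    # step 2: pairwise Wilcoxon tests
    names = list(scores)
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            w, pw = wilcoxon(scores[names[a]], scores[names[b]])
            print(names[a], "vs", names[b], "p =", pw)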

Table 5 Final result of the comparison between algorithms

            Iris           Glass          Ionosphere     Haberman       Heart          Final result
Algorithm   Wins  Ranking  Wins  Ranking  Wins  Ranking  Wins  Ranking  Wins  Ranking  Total wins  Final ranking
GAKFCM      0     4        1     3        0     4        1     3        0     4        2           4
FC-PFS      1     3        2     2        1     3        0     4        2     2        6           3
KEWFCM      2     2        0     4        2     2        2     2        1     3        7           2
DOKEWFCM    3     1        3     1        3     1        3     1        3     1        15          1

3.2.4 Cluster Analysis Using Validity Indices

The classification rate (see Eq. (13)) is a measure used to determine how well clustering algorithms perform on a given dataset with a known cluster structure, but in practice the cluster structure is unknown. Therefore, the Davies-Bouldin and Silhouette validity indices were also analyzed [48, 49].
Let X_T = \{X_1, \ldots, X_N\} be the dataset and let D = (D_1, \ldots, D_c) be its clustering into c clusters. Let D_j = \{X_1^j, \ldots, X_{m_j}^j\} be the j-th cluster, j = 1, \ldots, c, where m_j = |D_j|.
The Davies-Bouldin index (DB) is defined in the following way:

DB = \frac{1}{c} \sum_{i=1}^{c} \max_{j \neq i} \left( \frac{\Delta(D_i) + \Delta(D_j)}{\delta(D_i, D_j)} \right) \qquad (14)

where Δ(Di ), Δ(D j ) is the intra-cluster distance and δ(Di , D j ) is the inter-cluster
distance.
Small values for the DB index indicate compact clusters, and whose centers are
well separated between them. Consequently, the number of clusters that the DB index
minimizes is taken as the optimum.
The Silhouette width of the i-th vector in the cluster D_j is defined in the following way:

s_i^j = \frac{b_i^j - a_i^j}{\max\left(a_i^j, b_i^j\right)} \qquad (15)

where a_i^j is the average distance between the i-th vector in cluster D_j and the other vectors in the same cluster, and b_i^j is the minimum average distance between the i-th vector in cluster D_j and the vectors in the other clusters [48, 49].
From Eq. (15) it follows that -1 \leq s_i^j \leq 1. We can now define the Silhouette of the cluster D_j:

S_j = \frac{1}{m_j} \sum_{i=1}^{m_j} s_i^j \qquad (16)

Finally, the global Silhouette index of the clustering is given by Eq. (17). Values of the Silhouette index close to 1 indicate a better clustering. Therefore, the number of clusters that maximizes the S index is taken as the optimum.

S = \frac{1}{c} \sum_{j=1}^{c} S_j \qquad (17)

Fig. 4 Values of the Davies-Bouldin and Silhouette indices versus the number of clusters (2-10): (a) Iris, (b) Glass, (c) Ionosphere, (d) Haberman, (e) Heart

Figure 4a–e shows the values of the validity indices when the DOKEWFCM algorithm is used. The analysis was performed for the Iris, Glass, Ionosphere, Haberman and Heart datasets. The number of classes was varied from 2 to 10 in order to check whether the best validity index value was obtained for the correct number of classes. The results shown in Fig. 4 corroborate the good performance of the algorithm proposed in this chapter.

4 Novel Condition Monitoring Scheme with Capacity to Detect New Faults and Automatic Learning

Supervisory Control and Data Acquisition (SCADA) systems are used for acquiring data in industrial processes. Based on a measure of similitude, the acquired data are grouped into classes using clustering methods, and these classes can be related to functional states. To determine the class to which an observation belongs, classical statistical classifiers compare it with the center of each class using a measure of similitude, whereas fuzzy classifiers use the comparison to determine the membership degree of the observation to each class. In general, the observation is assigned to the class for which its membership degree is highest, as shown in (18).

C_i = \{i : \max \{\mu_{ik}\}, \forall i, k\} \qquad (18)

Figure 5 shows the condition monitoring scheme with online detection of new faults and automatic learning using the proposed hybrid algorithm. The hybrid fuzzy clustering algorithm has two stages: a training stage and an online stage. In the first, the algorithm is trained using a historical dataset, and the classes that identify the functional states of the process are formed. In the online stage, the hybrid algorithm classifies every new observation obtained from the process. After obtaining a continuous number of observations that make up a time window, the observations not classified into the known functional states (classes) are analyzed to determine whether they constitute a new class. Whenever a new class is detected, it is

Fig. 5 Classification scheme proposed using fuzzy clustering



characterized by the experts and added to the training database. After that, the classifier should be trained again.
Next, each stage is described in detail.

4.1 Off-Line Training

Firstly, the centers corresponding to the known functional states (classes), v = v_1, v_2, \ldots, v_c, are located by using the historical database for training.
In the proposed technique, a set of N observations X = [x_1, x_2, \ldots, x_N] is classified into c + 1 classes using the DOKEWFCM algorithm. The normal operation condition (NOC) of the system together with the faults represent the c classes. Next, the parameters m and σ of the proposed method are adjusted by using an optimization algorithm with a validity index as the objective function. Therefore, a better position of the class center of each operation state can be estimated and an improved partition matrix U is obtained. Afterwards, the estimated values of m and σ in Eq. (9) are used during the online recognition, and they contribute to improving the classification of the samples obtained from the process by the data acquisition system [50].
To evaluate the performance of clustering methods under variations of their parameters, several validity measures or indices are used. In this chapter, the partition coefficient (PC) [39–41], which determines the fuzziness degree of the partition U, is the validity measure used. Equation (19) displays the formula to obtain it.

1 
c N
PC = (μik )2 (19)
N i=1 k=1

The clustering result is better the less fuzzy the partition U is, because this permits a better measure of the overlapping degree among the classes. Hence, the best result is obtained by maximizing the value of PC, since that is equivalent to each pattern belonging to only one group.
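In code, Eq. (19) is a one-liner (a sketch; U is the c × N membership matrix):

import numpy as np

def partition_coefficient(U):
    # Eq. (19): PC = (1/N) * sum_i sum_k mu_ik^2
    # PC -> 1 for a crisp partition; PC -> 1/c for a maximally fuzzy one
    return np.sum(U ** 2) / U.shape[1]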
Then, the optimization problem is defined as:

\max \; PC = \frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^2

subject to:

m_{min} < m \leq m_{max}
\sigma_{min} \leq \sigma \leq \sigma_{max}

A range of values for m and σ is defined by considering the last definition. Although 1 < m < ∞, it is widely known that from a practical perspective m is not greater than 2 [4–7]; thus, 1 < m ≤ 2 is considered in this chapter. The smoothness degree of the kernel function is indicated by the parameter σ. If this parameter is overestimated, the function exhibits an almost linear behavior, so the projection to the high-dimensional space becomes useless for separating a non-linear data space; meanwhile, if the value of σ is underestimated, the result will be highly sensitive to the noise present in the data. Therefore, so that both small and large values are considered during the exploration process, a large search space should be used. In this chapter, after several experiments, the interval 0.25 ≤ σ ≤ 20 was determined to be satisfactory.
In the condition monitoring field, bio-inspired algorithms have been used with excellent results [42–44] to solve optimization problems. There are several bio-inspired algorithms, in original and improved versions; some examples are the Genetic Algorithm (GA), Artificial Bee Colony (ABC), Differential Evolution (DE), and Particle Swarm Optimization (PSO), to mention just a few. In this chapter, the best values of m and σ are estimated by using the DE algorithm because of its easy implementation and good outcomes.
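A possible realization of this tuning step uses SciPy's differential_evolution; the sketch below is ours, and run_dokewfcm is a hypothetical wrapper that clusters the training data with the given (m, σ) and returns the membership matrix U (partition_coefficient is the PC helper sketched above). SciPy's popsize, mutation and recombination play the roles of Z, FS and CR, respectively:

from scipy.optimize import differential_evolution

def tune_parameters(X, c):
    def neg_pc(params):
        m, sigma = params
        U = run_dokewfcm(X, c, m=m, sigma=sigma)   # hypothetical wrapper
        return -partition_coefficient(U)           # DE minimizes, so negate PC

    bounds = [(1.0001, 2.0), (0.25, 20.0)]          # 1 < m <= 2, 0.25 <= sigma <= 20
    result = differential_evolution(neg_pc, bounds, popsize=10,
                                    mutation=0.1, recombination=0.5,
                                    maxiter=100)
    return result.x                                 # best (m, sigma) found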

4.2 Online Recognition

To avoid an unwanted displacement of the class centers after the training stage, produced by an unknown small-magnitude fault with a high latency time, the hybrid algorithm is modified in this stage: the centers of the classes are not updated [51].
The experts select how many observations (k) form the time window and set the parameter Th. The parameter k should be selected according to the process features, because it represents the number of sampling times that the experts should consider to investigate whether a fault is occurring. A group of observations is classified as noise if it does not represent at least Th percent of the k observations that form the time window; otherwise, the group is considered as probably representing a fault. Th is also determined by the experts. When an observation x_k arrives, the DOKEWFCM algorithm (Step 1: identification of the outliers) classifies it as noise or as a good sample, taking into account the results of the training. If the observation is classified as a good sample, the DOKEWFCM algorithm (Step 2: clustering process) identifies to which of the known classes C_i it belongs. A counter of noise observations (NO) is incremented when an observation is classified as noise; this strategy is repeated for up to k observations, until the time window is completed.
The percentage of observations classified as noise is calculated once k observations are acquired (NOP = NO * 100/k). The existence of a new class is analyzed if NOP > Th; otherwise, the NO counter is re-initialized. The noise observations could then represent either a new fault class or outliers. The occurrence of a new normal operating condition is not considered here, because it is assumed that the process operators are aware of such situations, so that the diagnosis system can be updated with new data and re-started. DOKEWFCM is employed to inspect the noise observations. Outliers generally form dispersed data with low density and do not form a cluster; conversely, once a new fault impacts the process, the observations form a high-density region that constitutes a class.
If a new class is confirmed, the experts can analyze the pattern to determine whether a single or multiple fault is occurring. Once the pattern is identified and characterized, it is stored, if appropriate, in the historical database used in the training stage. Later on, the classifier should be trained again, and the procedure of online recognition is repeated systematically.
The scheme described for the online step is a mechanism for the detection of novel faults with automatic learning. Algorithm 2 describes this proposal.

Algorithm 2 Recognition
Input: data X_k, class centers V, r_neighborhood, η_max, α, m, σ.
Output: Current State.
Select k and Th
Initialize O counter = 0 and NO counter = 0
for j = 1 to j = k do
  O counter = O counter + 1
  Compute η^i_neighborhood with Eq. (7).
  Compute M^i_neighborhood with Eq. (6).
  With the value of α, identify outliers with Eq. (8).
  if k ∉ C_outlier then
    Compute the distances from observation k to the class centers, ||Φ(x_kl) − ṽ_il||².
    Compute the membership degree of observation k to the c good classes with Eq. (12).
    Determine to which class observation k belongs using (18).
  else
    Store observation k in C_noise
    NO counter = NO counter + 1
  end if
end for
Compute NOP = (NO counter / k) * 100
if NOP > Th then
  Apply the DOKEWFCM algorithm to C_noise considering only the classes C_NF and C_outlier
  if C_noise ∉ C_outlier then
    Create a new fault class C_NF
    Store it in the historical database for training.
  else
    Delete C_noise
    NO counter = 0
    O counter = 0
  end if
else
  Delete C_noise
  NO counter = 0
  O counter = 0
end if
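The window logic of Algorithm 2 can be summarized in a few lines of Python (a sketch with hypothetical callbacks: is_outlier wraps Eq. (8) and classify wraps Eqs. (12) and (18)):

def recognize_window(observations, k, Th, classify, is_outlier):
    # Process one time window of k observations
    noise, states = [], []
    for x in observations[:k]:
        if is_outlier(x):
            noise.append(x)                  # increment the NO counter
            states.append("noise")
        else:
            states.append(classify(x))       # assign to a known class, Eq. (18)
    NOP = 100.0 * len(noise) / k             # percentage classified as noise
    if NOP > Th:
        return states, noise                 # candidate new fault: inspect noise set
    return states, []                        # otherwise discard noise, reset NO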

Fig. 6 Structure of benchmark actuator system [45]

5 Benchmark Case Study: DAMADICS

5.1 Process Description

In order to apply the proposed condition monitoring methodology, the DAMADICS benchmark was selected. This benchmark actuator [45, 46] belongs to the class of intelligent electro-pneumatic devices widespread in industrial plants. The data for the experiments performed in this chapter can be found at http://diag.mchtr.pw.edu.pl/damadics/. The DAMADICS benchmark has been used in a water control loop of a tank with gravitational flow as an example of a possible application.
The structure of the benchmark actuator system can be observed in Fig. 6. The
actuator is formed by the following devices:
• Control valve
• Spring and diaphragm pneumatic servomotor
• Positioner
The pipeline flow is manipulated by the control valve. A servomotor modifies the position of the valve rod; this servomotor is of pneumatic type, with spring and diaphragm: the fluid acts upon the flexible diaphragm so that a linear motion of the servomotor stem is produced. Finally, the positioner is used to reduce undesired control-valve-stem mispositions caused by external forces (for instance, friction and clearance). The operating modes considered are presented in Table 6. Representative faults of each device were chosen to illustrate the results easily. Moreover, the

Table 6 Operation modes simulated in the DAMADICS

Operation mode   Description
0                Normal operation condition (NOC)
1                Valve clogging (Fault 1)
2                Critical flow (Fault 7)
3                Electro-pneumatic transducer fault (Fault 12)
4                Positioner spring fault (Fault 15)
5                Unexpected pressure change across the valve (Fault 17)
6                Fully or partly opened bypass valves (Fault 18)
7                Flow rate sensor fault (Fault 19)

Table 7 Measured process variables

Description                       Symbol
Process control external signal   CV
Stem displacement                 X
Liquid flow rate                  F
Process value                     PV

faults have a variety of behaviors to demonstrate the robustness and sensitivity of the proposed approach.
In the off-line training stage the diagnostic system was not trained to recognize faults 17, 18 and 19, with the aim of using them to test the algorithm for online detection of new faults; these faults were only simulated in the online recognition stage. A sampling time of 1 s is used to simulate the 4 variables shown in Table 7. The simulations were performed by using the Matlab-Simulink DABLIB library. The actuator block inputs and outputs were contaminated with white noise to assess the robustness of the proposal; such noise can be caused by the electromagnetic susceptibility of physical sensors.
A total of 80 observations were acquired from each process state. Then, 160 observations representing outliers were evenly distributed among the classes. Outliers are simulated as values outside the variables' measurement range.

5.2 Analysis and Discussion of Results

The verification of the performance quality is an important stage in the experiment design. The confusion matrix is by far the most popular indicator of classification results, allowing the performance to be visualized. The classification mistakes between a state r and a state s over the set of L experiments are indicated by the CM_rs element of the CM.
The cross validation strategy was used to obtain the confusion matrices. Thus, the dataset is partitioned into d complementary subsets: d − 1 subsets are used for training and the remaining one is employed for validation/testing. The final results are obtained as the average of multiple rounds of cross validation, which are performed by modifying the observations of the subsets to reduce variability. Ten partitions were used to perform cross validation in the DAMADICS experiments.
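As an illustration, the averaged confusion matrix can be obtained with scikit-learn utilities as sketched below (train_and_predict is a hypothetical callback that trains the classifier on the training split and returns predicted labels for the test split):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix

def cross_validated_cm(X, y, train_and_predict, d=10, seed=0):
    labels = np.unique(y)
    cms = []
    for tr, te in KFold(n_splits=d, shuffle=True, random_state=seed).split(X):
        y_pred = train_and_predict(X[tr], y[tr], X[te])
        cms.append(confusion_matrix(y[te], y_pred, labels=labels))
    return np.mean(cms, axis=0)              # averaged confusion matrix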

5.2.1 Off-Line Training Stage

In the off-line training stage the DOKEWFCM algorithm was applied. The parameter values used in the simulations were: number of iterations = 100, ε = 10⁻⁵, and initial values m = 2 and σ = 1. In this stage the diagnostic system was not trained to recognize faults 17–19, with the objective of using them to test the algorithm in the online detection of new faults; these faults were only simulated in the online recognition stage.
To estimate the m and σ parameters, the DE algorithm was used due to its advantages, specifically its simple structure, higher speed and robustness [42]. The control parameters in DE are the size of the population Z, the crossover constant CR and the scaling factor FS. The values of the parameters for the DE algorithm, considering a search space 1 < m ≤ 2 and 0.25 ≤ σ ≤ 20, were: CR = 0.5, FS = 0.1, Z = 10, Eval_max = 100 and PC > 0.9999.
The DE algorithm was executed 10 times and the arithmetic means of the parameters m and σ and of the number of evaluations of the objective function (Eval_Fobj) were calculated. The behavior of the objective function (PC), presented in Fig. 7, shows how rapidly the DE algorithm converges. From iteration 7 the best parameters were obtained: m = 1.0527 and σ = 15.6503.
These experiments were performed on a computer with the following characteristics: Intel Core i7-6500U 2.5–3.1 GHz, 8 GB DDR3L memory. The average execution time was approximately 3 min, equivalent to 89 evaluations of the objective function.
Table 8 shows the results obtained in the training stage. The second column presents the classification results for the operating states considered (NOC and faults F1, F7, F12 and F15). The last column reflects the variables or attributes with the greatest contribution (highest weight values) to the clustering of the analyzed classes (operating states). To obtain these attributes, a parameter called the weight threshold (Tw) must be selected based on expert criteria: if the weight of an attribute is greater than Tw, the attribute is selected. Figure 8 shows an example of attribute selection considering the faults F1, F7, F12 and F15 (Tw = 0.25).

Fig. 7 Value of the objective function (PC) versus the DE iterations

Table 8 Results of the training stage in the DAMADICS process

Operation mode   Classification (%)   Variables with greater contribution
NOC              100                  CV, X, F, PV
F1               97.67                CV, X, F
F7               99.23                CV, F
F12              98.05                CV
F15              87.13                X, F

Fig. 8 Attribute weight assignment of the DOKEWFCM algorithm (weight per attribute for faults F1, F7, F12 and F15)



Table 9 Results of the recognition stage in the DAMADICS process (Experiment 1)

Operation mode   Classification (%)
                 Case 1    Case 2
NOC              100       100
F1               96.91     98.83
F7               98.13     99.90
F12              97.75     99.33
F15              86.87     91.19
AVG              95.93     97.85

Table 10 Results of the Wilcoxon test for Case 1 and Case 2

            Case 1 versus Case 2
R+          0
R−          55
T           0
T(α=0.05)   8
Winner      Case 2

5.2.2 Recognition Stage

In this stage, Algorithm 2 was applied to perform online recognition. In a first experiment, we considered the operating states used in the training stage (NOC and faults F1, F7, F12, F15). In the second experiment, faults 17, 18 and 19 were used to test the algorithm for online detection of new faults.
In order to detect a new fault early, it was decided to evaluate 100 samples. This implies a time window of size k = 100, equivalent to 100 s. For the decision threshold, a value of Th = 60% was chosen to establish an adequate majority level of samples classified as noise. It must be remarked that these parameters must be adjusted according to the type of process and expert opinion.
Table 9 shows a comparison between the classification results with all variables (Case 1) and with the attributes of greatest contribution (Case 2) determined in the training stage. The results show that by using the variables with the greatest contribution to the clustering of the classes during the training stage, a better classification (%) of the different operating states is obtained.
However, to know whether there are significant differences between Case 1 and Case 2, it is necessary to apply statistical tests, making a pairwise comparison to determine which case is best. For this, the non-parametric Wilcoxon test is applied.
Table 10 shows the results of the pairwise comparison of Case 1 and Case 2 using the Wilcoxon test. These results validate that the best results are obtained when the variables with the greatest contribution to the clustering during the training stage are used.

Table 11 Results of the recognition stage in the DAMADICS process (Experiment 2)

Operation mode   Classification (%)   Variables with greater contribution
F17              84.67                CV, X, F
F18              90.33                X, F
F19              95.85                F

In the second experiment the unknown faults F17–F19 were analyzed. First, fault 17 was considered, and it was identified as a new class. Once a new fault is detected, the experts should determine the features of the unusual behavior and re-train the fault diagnosis system by considering a dataset formed by the new observations together with the old dataset. Similar experiments were performed for faults 18 and 19, respectively. Table 11 shows the results obtained for the unknown faults F17–F19; the last column reflects the variables that most contributed to the identification of these faults.

5.3 Analysis of the Number of False and Missing Alarms

False Alarm Rate (FAR) and Fault Detection Rate (FDR) are performance measures that can be determined, according to [47], by using the following equations:

FAR = \frac{\text{No. of samples}\,(J > J_{lim} \mid f = 0)}{\text{total samples}\,(f = 0)} \qquad (20)

FDR = \frac{\text{No. of samples}\,(J > J_{lim} \mid f \neq 0)}{\text{total samples}\,(f \neq 0)} \qquad (21)

where J is the output of the discriminative algorithm, considering the fault detection stage as a binary classification process, and J_{lim} is the threshold that determines whether a sample is classified as a fault or as normal operation. Figures 9 and 10 present the results obtained in the classification of the faults F1, F7, F12 and F15. In both cases, the best results are obtained with the variables of greater contribution to the clustering.
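Given the detector output J per sample and the ground-truth fault labels, Eqs. (20) and (21) reduce to simple ratio computations (a sketch with illustrative names):

import numpy as np

def far_fdr(J, J_lim, fault):
    # fault: boolean array, True where a fault is actually present (f != 0)
    alarms = J > J_lim
    far = 100.0 * alarms[~fault].sum() / (~fault).sum()   # Eq. (20)
    fdr = 100.0 * alarms[fault].sum() / fault.sum()       # Eq. (21)
    return far, fdr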
Figure 11 illustrates the FAR and FDR performance indicators for the unknown
faults.

Fig. 9 False Alarm Rate (%) obtained for the faults F1, F7, F12 and F15 (Case 1 vs. Case 2)

Fig. 10 Fault Detection Rate (%) obtained for the faults F1, F7, F12 and F15 (Case 1 vs. Case 2)

Fig. 11 Performance indicators FAR and FDR (%) obtained for the unknown faults F17, F18 and F19

6 Conclusions

In the present chapter a hybrid fuzzy clustering algorithm is proposed. The algorithm is applied in a condition monitoring scheme with online detection of novel faults and automatic learning. The scheme first identifies the outliers before the clustering process, with the aim of minimizing classification errors; the outliers are then removed and the clustering process is performed. To extract the important features

and improve the clustering, the maximum-entropy-regularized weighted fuzzy c-means is used. Kernel functions are then employed for clustering the data when there is a non-linear relationship between the variables; this achieves greater separability among the classes and reduces the classification errors. Afterwards, a step is used to optimize the parameters m and σ of the algorithm by applying the Differential Evolution (DE) algorithm. These parameters are used in the online recognition stage, where the classifier incorporates a novel fault detection algorithm.
In the online recognition stage, the proposed algorithm analyzes the observations, within a given time window, which do not belong to the known classes, and determines whether they form a new class (either a single or a multiple fault) or are outliers. Once the new pattern is identified and characterized, a strategy is presented to incorporate it into the knowledge base of the classifier. The excellent results obtained show the feasibility of the proposal.

Acknowledgements The authors acknowledge the financial support provided by FAPERJ, Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro; CNPq, Conselho Nacional de Desenvolvimento Científico e Tecnológico; and CAPES, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, research supporting agencies from Brazil; as well as UERJ, Universidade do Estado do Rio de Janeiro, and CUJAE, Universidad Tecnológica de La Habana José Antonio Echeverría; and the help of Dr. Marcos Quiñones Grueiro (Universidad Tecnológica de La Habana José Antonio Echeverría).

References

1. Gosain, A., Dahika, S.: Performance analysis of various fuzzy clustering algorithms: a review.
In: 7th International Conference on Communication. Comput. Virtualiz. 79, 100–111 (2016)
2. Vong, C.M., Wong, K.I., Wong, P.K.: Simultaneous-fault detection based on qualitative symptom descriptions for automotive engine diagnosis. Appl. Soft Comput. 22, 238–248 (2014)
3. Jiang, X.L., Wang, Q., He, B., Chen, S.J., Li, B.L.: Robust level set image segmentation
algorithm using local correntropy-based fuzzy c-means clustering with spatial constraints.
Neurocomputing 207, 22–35 (2016)
4. Thong, P.H., Son, L.H.: Picture fuzzy clustering: a new computational intelligence method.
Soft Comput. 20, 3549–3562 (2016)
5. Kesemen, O., Tezel, O., Ozkul, E.: Fuzzy c-means clustering algorithm for directional data
( f cm4dd). Expert Syst. Appl. 58, 76–82 (2016)
6. Zhang, L., Lu, W., Liu, X., Pedrycz, W., Zhong, C.: Fuzzy c-means clustering of incomplete
data based on probabilistic information granules of missing values. Knowl. Based Syst. 99,
51–70 (2016)
7. Leski, J.M.: Fuzzy c-ordered-means clustering. Fuzzy Sets Syst. 286, 114–133 (2016)
8. Saltos, R., Weber, R.: A rough-fuzzy approach for support vector clustering. Inf. Sci. 339,
353–368 (2016)
9. Aghajari, E., Chandrashekhar, G.D.: Self-Organizing Map based Extended Fuzzy C-Means
(SEEFC) algorithm for image segmentation. Appl. Soft Comput. 54, 347–363 (2017)
10. Kaur, P., Soni, A., Gosain, A.: Robust kernelized approach to clustering by incorporating new
distance measure. Eng. Appl. Artif. Intell. 26, 833–847 (2013)
11. Askari, S., Montazerin, N., Zarandi, M.H.: Generalized possibilistic fuzzy C-Means with novel
cluster validity indices for clustering noisy data. Appl. Soft Comput. 53, 262–283 (2017)
A Proposal of Hybrid Fuzzy Clustering Algorithm with Application … 29

12. Chatzis, S.P.: A fuzzy c-means-type algorithm for clustering of data with mixed numeric and
categorical attributes employing a probabilistic dissimilarity functional. Expert Syst. Appl. 38,
8684–8689 (2011)
13. Kaur, P.: A density oriented fuzzy c-means clustering algorithm for recognising original cluster
shapes from noisy data. Int. J. Innov. Comput. Appl. 3, 77–87 (2011)
14. Ding, Y., Fu, X.: Kernel-based fuzzy c-means clustering algorithm based on genetic algorithm.
Neurocomputing 188, 233–238 (2016)
15. Akbulut, Y., Sengur, A., Guo, Y., Polat, K.: KNCM: kernel neutrosophic C-Means clustering.
Appl. Soft Comput. 52, 714–724 (2017)
16. Modha, D.S., Spangler, W.S.: Feature weighting in k-means clustering. Mach. Learn. 52, 217–
237 (2003)
17. Parsons, L., Haque, E., Liu, H.: Subspace clustering for high dimensional data: a review.
SIGKDD Explor. 6, 90–105 (2004)
18. Wang, X.Z., Wang, Y.D., Wang, L.J.: Improving fuzzy c-means clustering based on feature-
weight learning. Pattern Recognit. Lett. 25, 1123–1132 (2004)
19. Borgelt, C.: Feature weighting and feature selection in fuzzy clustering. Proc. IEEE Conf.
Fuzzy Syst. 1, 838–844 (2008)
20. Deng, Z., Choi, K.S., Chung, F.L., Wang, S.: Enhanced soft subspace clustering integrating
within-cluster and between-cluster information. Pattern Recognit. 43, 767–781 (2010)
21. Ng, T.F., Pham, T.D., Jia, X.: Feature interaction in subspace clustering using the Choquet
integral. Pattern Recognit. 45, 2645–2660 (2012)
22. Tang, C.L., Wang, S.G., Xu, W.: New fuzzy c-means clustering model based on the data
weighted approach. Data Knowl. Eng. 69, 881–900 (2010)
23. Zhou, J., Chen, L., Philip Chen, C.L., Zhang, Y., Li, H.L.: Fuzzy clustering with the entropy
of attribute weights. Neurocomputing 198, 125–134 (2016)
24. Silva Filho, T.M., Pimentel, B.A., Souza, R.M., Oliveira, A.L.I.: Hybrid methods for fuzzy
clustering based on fuzzy c-means and improved particle swarm optimization. Expert Syst.
Appl. 42, 6315–6328 (2015)
25. Bernal de Lázaro, J.M., Llanes-Santiago, O., Prieto Moreno, A., Knupp, D.C., Silva-Neto, A.J.:
Enhanced dynamic approach to improve the detection of small-magnitude faults. Chemi. Eng.
Sci. 146, 166–179 (2016)
26. Roubens, M.: Pattern classification problems and fuzzy sets. Fuzzy Sets Syst. 1, 239–253
(1978)
27. Hathaway, R.J., Davenport, J.W., Bezdek, J.C.: Relational duals of the c-means clustering
algorithms. Pattern Recognit. 22, 205–212 (1989)
28. Hathaway, R.J., Bezdek, J.C.: NERF C-means: non-Euclidean relational fuzzy clustering. Pat-
tern Recognit. 27, 429–437 (1994)
29. Krishnapuram, R., Joshi A., Nasraoui, O., Yi, L.: Low-complexity fuzzy relational clustering
algorithms for web mining. IEEE Trans. Fuzzy Syst. 9, 595–607 (2001)
30. Dave, R., Sen, S.: Robust fuzzy clustering of relational data. IEEE Trans. Fuzzy Syst. 10,
713–727 (2002)
31. Krishnapuram, R., Kim, J.: A note on the Gustafson-Kessel and adaptive fuzzy clustering algorithms. IEEE Trans. Fuzzy Syst. 7, 453–461 (1999)
32. Li, C., Biswas, G., Dale, M., Dale, P.: Matryoshka: a HMM based temporal data clustering methodology for modeling system dynamics. Intell. Data Anal. 6, 281–308 (2002)
33. Kasabov, N.K., Song, Q.: DENFIS: dynamic evolving neural-fuzzy inference system and its
application for time-series prediction. IEEE Trans. Fuzzy Syst. 10, 144–154 (2002)
34. Aguilar, J., Lopez De Mantaras R.: The process of classification and learning the meaning of
linguistic descriptors of concepts. Approx. Reason. Decis. Anal. 165–175 (1982)
35. Asuncion, A., Newman, D.: UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. [Online] Accessed: http://archive.ics.uci.edu/beta
36. García, S., Herrera, F.: An extension on statistical comparisons of classifiers over multiple data
sets for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)
30 A. Rodríguez-Ramos et al.

37. García, S., Molina, D., Lozano, M., Herrera, F.: A study on the use of non-parametric tests for
analyzing the evolutionary algorithms behaviour: a case study on the cec 2005 special session
on real parameter optimization. J. Heur. 15, 617–644 (2009)
38. Luengo, J., García, S., Herrera, F.: A study on the use of statistical tests for experimentation
with neural networks: analysis of parametric test conditions and non-parametric tests. Expert
Syste. Appl. 36, 7798–7808 (2009)
39. Li, C., Zhou, J., Kou, P., Xiao, J.: A novel chaotic particle swarm optimization based fuzzy
clustering algorithm. Neurocomputing 83, 98–109 (2012)
40. Pakhira, M., Bandyopadhyay, S., Maulik, S.: Validity index for crisp and fuzzy clusters. Pattern
Recognit. 37, 487–501 (2004)
41. Wu, K., Yang, M.: A cluster validity index for fuzzy clustering. Pattern Recognit. 26, 1275–1291
(2005)
42. Camps Echevarría, L., Llanes-Santiago, O., Silva Neto, A.J.: An approach for fault diagnosis
based on bio-inspired strategies. Stud. Comput. Intell. 284, 53–63 (2010)
43. Liu, Q., Lv, W.: The study of fault diagnosis based on particle swarm optimization algorithm.
Comput. Inf. Sci. 2, 87–91 (2009)
44. Lobato, F., Steffen Jr., F., Silva Neto, A. J.: Solution of inverse radiative transfer problems in
two-layer participating media with Differential Evolution. Inverse Probl. Sci. Eng. 18, 183–195
(2009)
45. Bartys, M., Patton, R., Syfert, M., de las Heras, S., Quevedo, J.: Introduction to the DAMADICS actuator FDI benchmark study. Control Eng. Pract. 14, 577–596 (2006)
46. Kourd, Y., Lefebvre, D., Guersi, N.: FDI with neural network models of faulty behaviours and
fault probability evaluation: application to DAMADICS. In: 8th IFAC Symposium on Fault
Detection, Supervision and Safety of Technical Processes (SAFEPROCESS), pp. 744–7495
(2012)
47. Yin, S., Ding, S.X., Haghani, A., Hao, H., Zhang, P.: A comparison study of basic data-driven
fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process.
J. Process Control 22, 1567–1581 (2012)
48. Bolshakova, N., Azuaje, F.: Cluster validation techniques for genome expression data. Signal
Process. 83, 825–833 (2003)
49. Günter, S., Bunke, H.: Validation indices for graph clustering. In: Jolion, J., Kropatsch, W., Vento, M. (eds.) Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, CUEN Ed., pp. 229–238. Italy (2001)
50. Rodríguez Ramos, A., Llanes-Santiago, O., Bernal de Lázaro, J.M., Cruz Corona, C., Silva
Neto, A.J., Verdegay Galdeano, J.L.: A novel fault diagnosis scheme applying fuzzy clustering
algorithms. Appl. Soft Comput. 58, 605–619 (2017)
51. Rodríguez Ramos, A., Silva Neto, A.J., Llanes-Santiago, O.: An approach to fault diagnosis
with online detection of novel faults using fuzzy clustering tools. Expert Syst. Appl. 113,
200–212 (2018)
Solving a Fuzzy Tourist Trip Design
Problem with Clustered Points of Interest

Airam Expósito, Simona Mancini, Julio Brito and José A. Moreno

Abstract This paper introduces a route-planning problem with applications in tourism. The goal of the Tourist Trip Design Problem is to maximize the number of points of interest to visit. We propose a new variant, in our view more realistic, where, on the one hand, the points of interest are clustered into various categories and, on the other, the scores and travel time constraints are fuzzy. A fuzzy optimization approach and an efficient greedy randomized adaptive search procedure are applied to solve the problem. The computational experiments indicate that this soft computing approach is able to find significant solutions.

Keywords Tourist trip design problem · Team orienteering problem with time windows · Clustered points of interest · Fuzzy constraints · Fuzzy optimization · Greedy randomized adaptive search procedure

1 Introduction

The selection of the attractions to visit arises as a problem whenever a tourist decides to visit a destination. Most destinations have multiple points of interest (POIs), most of which are tourist attractions. POIs are the main

A. Expósito (B) · J. Brito · J. A. Moreno


Departamento de Ingeniería Informática y de Sistemas, Instituto Universitario
de Desarrollo Regional, Universidad de La Laguna, 38271 San Cristóbal de La Laguna,
Canary Islands, Spain
e-mail: aexposim@ull.edu.es
J. Brito
e-mail: jbrito@ull.edu.es
J. A. Moreno
e-mail: jamoreno@ull.edu.es
S. Mancini
Universit di Cagliari, 09124 Cagliari, Italy
e-mail: simona.mancini@unica.it


reason why tourists visit the destination, and their decision is motivated by historical, scenic or cultural values. Tourists typically have a limited time to visit POIs at the destination and have to select which of them are most interesting. The selection takes into account their preferences, associated with the degree of satisfaction that could be perceived by visiting each POI, and the cost of the activities within the visit.
The design of tourist routes at a destination has been addressed as an optimization problem associated with route generation, known in the literature [17] as the Tourist Trip Design Problem (TTDP). The corresponding optimization problems have received increasing interest in tourism management and services, in order to be incorporated into recommender systems, tourism planning tools and electronic guides. The design and development of tourist trip planning applications is an area of growing research interest in computer engineering. The TTDP model usually
considers several basic parameters: the set of POIs that the tourist may visit, the number of routes to be designed taking into account the days of the tourist's stay at the destination, the travel distance or time between POIs obtained from the available routing information, the scores of the POIs corresponding to their degree of interest, the maximum time available for sightseeing each day, and the time windows for visiting the POIs. The solution of the optimization problem must maximize the total score of the selected POIs and identify the optimal scheduling of the routes.
The problem may be made more realistic, and more complicated, by considering additional features and constraints: a maximum budget, either per day or for the whole stay at the destination; specific requirements on the minimum and/or maximum number of days on which the tourist visits POIs of a certain category (restaurants, beaches, historic sites, nature facilities, etc.), or on the number of visits to POIs of a category on some days; or travel times that depend on traffic congestion, weather conditions, or the time of day at which the tourist travels. Other realistic variants arise when some of the POIs have time window constraints and the time used to visit them has to be taken into account in the cost or profit of the visit [7]. In this paper, we present the Tourist Trip Design Problem with Clustered POIs, in which we consider that the set of POIs is grouped into different categories. Categories represent different types of visiting sites (museum, amusement park, beach, restaurant, ...). The aim is to define a set of feasible routes, one for each day of the stay, that maximizes the total score. The tours must start and end at a given starting point, and the duration of each tour (computed considering travel, visit and waiting times) cannot exceed a maximum time. The problem also includes POIs which are accessible only in certain time windows. In addition, for each category, the number of visited POIs can be bounded or even fixed: for instance, for the lunch restaurant category, the number of visits in each trip must be exactly one, while other categories may have only one-sided limits.
Available information in real-world route planning problems is often imprecise, vague or uncertain. Specifically, travel times depend on the surrounding conditions, such as traffic, roads or weather, and the available information on these conditions is often sparse, imprecise and not easily accessible to tourists. Moreover, tourists usually have a high degree of flexibility in optimizing their time and setting their itineraries. Thus, it is necessary to propose models and methodologies that support new applications in the organization and planning of tourist experiences, incorporating imprecision and flexibility. Soft Computing includes an appropriate family of models and methods that provides useful answers to problems with these types of information features. In particular, fuzzy sets and systems provide a suitable methodological approach for dealing with uncertainty arising from the imprecise nature of the information and decisions. Metaheuristics are a pertinent procedure to deal with this kind of model, since they offer efficient solutions and strategies that can be integrated with other Soft Computing tools to provide approximate solutions to more complex real-world problems [22].
This work considers a version of the TTDP where the scores or profits obtained at the locations are imprecise amounts, and the time constraints are considered flexible and soft (not strict), i.e., a certain tolerance in their fulfillment is assumed. Consequently, we incorporate them into the model in fuzzy terms, as fuzzy numbers and as fuzzy constraints, respectively. To solve this variant, we propose a specific Soft Computing methodology, integrating a fuzzy optimization approach and a metaheuristic procedure. We use the ideas introduced by Bellman and Zadeh [1] for fuzzy optimization problems and the methods developed by Verdegay et al. [5, 20, 21]. This approach provides Fuzzy Linear Programming (FLP) formulations of the problems together with a number of methods for solving them in a direct and easy way, obtaining solutions that are coherent with their fuzzy nature.
The proposed approach applies an algorithm based on the Greedy Randomized Adaptive Search Procedure (GRASP) to provide high-quality solutions. This metaheuristic is an iterative process consisting of two phases: a construction phase and a local search phase. GRASP [6] has been successfully applied to a wide range of optimization problems, including several route planning problems [15]. As far as we know, GRASP is one of the few approaches in the literature that solves the TOP [16].
The rest of the paper is organized as follows: Sect. 2 reviews related works. Section 3 describes the problem and the fuzzy model formulation. Section 4 explains the fuzzy optimization approach used to solve the TTDP with clustered POIs. Next, Sect. 5 describes the GRASP used to find solutions. Section 6 describes our computational experiments and the corresponding results. Finally, the last section includes some concluding remarks and future works.

2 Related Works

Most of the operations research literature dealing with TTDP modeling uses the Team Orienteering Problem (TOP) [3] or TOP models with time windows (TOPTW) [18]. The Team Orienteering Problem has been extensively studied in the literature [19]. The Team Orienteering Problem with Time Windows (TOPTW) is an extension of the TOP where nodes can be visited only within a specific time window; typically, POIs are characterized by a time window. Several TOPTW variants are described in the literature and solved with metaheuristics, among others: iterated local search [18], ant colony optimization [14], hybridized evolutionary local search [10], LP-granular variable neighborhood search [11], a genetic algorithm [9], an artificial bee colony algorithm [4], and an iterative three-component heuristic [8].
In the literature there are some works using a fuzzy optimization approach for the TTDP. The earliest contribution [12] considers a fuzzy routing problem for sightseeing. More recently, Verma and Shukla applied fuzzy optimization to the orienteering problem [23], Mendez in his Ph.D. thesis [13] used fuzzy number comparisons to deal with the VRPTW with fuzzy scores, and Brito et al. [2] applied a GRASP for solving the TOP with fuzzy scores and constraints.

3 Fuzzy Model Formulation

The Tourist Trip Design Problem with Clustered POIs (TTDPC) addressed in this research is modeled as a multiple-route planning problem. The problem is aimed at designing a set of routes in a given tourist destination, where the number of routes corresponds to the number of days of the stay. Each route visits a certain number of POIs in a limited time. Each POI has an associated score or profit, a visit time, a time window and a category to which it belongs. The objective is to maximize the sum of the scores of all the visited POIs. In the proposed fuzzy model, the scores of the POIs, the time limit for the routes and the time windows are expressed in fuzzy terms, as fuzzy numbers and fuzzy constraints, respectively.
Table 1 describes the sets of indices, the parameters, and the decision variables of the problem.
The mathematical model can be written as follows.

Maximize:

\sum_{k \in K} \sum_{i \in I} \tilde{p}_i Y_{ik} \qquad (1)

Subject to:

\sum_{j \in I} X_{0jk} = 1 \quad \forall k \in K \qquad (2)

\sum_{j \in I} X_{j0k} = 1 \quad \forall k \in K \qquad (3)

\sum_{j \in I} X_{ijk} = \sum_{j \in I} X_{jik} \quad \forall i \in I^0, \; \forall k \in K \qquad (4)

\sum_{j \in I^0} X_{ijk} \leq Y_{ik} \quad \forall i \in I, \; \forall k \in K \qquad (5)

T_j \geq T_i + v_i + t_{ij} - M \left(1 - \sum_{k \in K} X_{ijk}\right) \quad \forall i \in I^0, \; \forall j \in I \qquad (6)

Table 1 Indices, parameters and decision variables

Indices and parameters
K: set of routes k
C: set of categories c
I: set of POIs i
I^0 = I ∪ {0}: set of vertices, where 0 indicates where the tour starts and ends
I_c: set of POIs belonging to category c
p̃_i: score or profit of node i
t_ij: travel time between nodes i and j
v_i: visit time for node i
T_max: maximum tour duration
[e_i, l_i]: opening time window for node i
N_c^min, N_c^max: minimum and maximum number of nodes belonging to cluster c to be included in each tour
T_0 is arbitrarily set equal to 0, as is v_0
Decision variables
X_ijk: binary variable taking value 1 if POI j is visited just after POI i in tour k, and 0 otherwise
Y_ik: binary variable taking value 1 if POI i is included in tour k
T_i: variable representing the arrival time at POI i

$$T_i + v_i + t_{i0} \le_f T_{max} \quad \forall i \in I \qquad (7)$$

$$T_i \ge e_i \quad \forall i \in I \qquad (8)$$

$$T_i + v_i \le_f l_i \quad \forall i \in I \qquad (9)$$

$$\sum_{k \in K} Y_{ik} \le 1 \quad \forall i \in I \qquad (10)$$

$$N_c^{min} \le \sum_{i \in I_c} Y_{ik} \le N_c^{max} \quad \forall c \in C, \forall k \in K \qquad (11)$$

$$X_{ijk} \in \{0, 1\} \quad \forall i \ne j \in I^0, \forall k \in K; \qquad Y_{ik} \in \{0, 1\} \quad \forall i \in I, \forall k \in K \qquad (12)$$

The objective function (1) concerns the maximization of the collected scores or profits
(fuzzy numbers). Constraint (2) imposes that each tour must start from the hotel,
while constraint (3), combined with constraint (4), implies that each tour must end at
the hotel. Constraint (4) guarantees flow balance at the POIs. Constraint (5) imposes
that a POI can be visited by a tour only if it has been assigned to it. Constraint (6)
guarantees tour connectivity, while constraint (7) ensures that the maximum tour
duration is respected by all tours. M is a large constant used to make constraint (6)
not binding when POI j is not visited just after POI i. Constraints (8)–(9) guarantee
that time windows are respected. Each POI can be assigned to at most one tour, as
stated by constraint (10). Constraint (11) guarantees that, for each cluster c, at least
N_c^min and at most N_c^max POIs are visited in each tour. Finally, constraint (12)
specifies the domains of the variables. Note that the symbol ≤_f in (7) and (9) denotes
that these constraints are fuzzy.

4 The Proposed Fuzzy Optimization Approach

In the previous section, we formulated the TTDPC as a linear programming problem
with fuzzy coefficients in the objective function and fuzzy inequalities in some
constraints. Fuzzy Linear Programming (FLP) constitutes the basis for solving fuzzy
optimization problems, and its solution methods have been the subject of many
studies in the fuzzy context. Different FLP models can be considered according to
which elements contain imprecise information, which is the basis for the classification
proposed in [21]. These models are: models with fuzzy constraints, models
with fuzzy goals, models with fuzzy costs, and models with fuzzy coefficients in
the technological matrix and resources. In addition, a fifth model, the general fuzzy
problem, in which all of the parameters are subject to fuzzy considerations, can be
studied. The corresponding methodological approaches that provide solutions to FLP
[5] provide methods for solving the TTDPC with fuzzy terms. Therefore, this problem
can be solved in a direct and simple way, obtaining solutions that are coherent with
its fuzzy nature.
To solve the optimization problem with fuzzy constraints, Verdegay [21], using the
representation theorem for fuzzy sets, proved that the solutions for the case of linear
functions can be obtained from the auxiliary model:

$$\text{Maximize } z = cx$$
$$\text{subject to } Ax \le b + \tau(1 - \alpha) \qquad (13)$$
$$x \ge 0, \ \alpha \in [0, 1]$$

where τ = (τ_1, τ_2, . . . , τ_m) ∈ ℝ^m is the tolerance level vector. Thus, we use this
approach to obtain an equivalent model for the fuzzy constraints of the TTDPC,
which results from replacing (7) and (9) by the following constraints:

$$T_i + v_i + t_{i0} \le T_{max} + \tau_1(1 - \alpha) \quad \forall i \in I \qquad (14)$$

$$T_i + v_i \le l_i + \tau_2(1 - \alpha) \quad \forall i \in I \qquad (15)$$

where τ_1, τ_2 ∈ ℝ are the tolerance levels, i.e. the maximum allowed violations of
the route time limit and of the time window constraints, as provided by the decision
maker, and α ∈ [0, 1]. Applying this model, for each value of α we obtain a new
optimal solution, so the end result is a range of optimal solutions varying with α.
This result is consistent with the fuzzy nature of the problem.
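To make the parametric resolution concrete, the following minimal sketch (assuming SciPy is available; the data c, A, b and τ are invented for illustration and are not taken from the TTDPC model) solves the auxiliary model (13) for a grid of α values:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])              # objective coefficients (to be maximized)
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])            # technological matrix
b = np.array([4.0, 6.0])              # crisp right-hand sides
tau = np.array([1.0, 0.5])            # tolerance levels set by the decision maker

for alpha in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    # For each alpha, the fuzzy constraints become crisp constraints
    # relaxed by tau * (1 - alpha), as in model (13).
    res = linprog(-c, A_ub=A, b_ub=b + tau * (1.0 - alpha),
                  bounds=[(0, None), (0, None)])
    print(f"alpha={alpha:.1f}  z={-res.fun:.3f}  x={res.x}")
```

Note that α = 1 recovers the original crisp constraints, while α = 0 applies the full tolerance τ; sweeping α produces the range of solutions mentioned above.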
The next step is to deal with the fuzzy coefficients in the objective function, transforming
the fuzzy model into a simpler auxiliary one. The method proposes the use
of an ordering function g that allows the comparison of fuzzy numbers, which makes
the maximization of the objective function possible. Therefore, the objective function
(1) is replaced by:

$$\sum_{k \in K} \sum_{i \in I} g(\tilde{p}_i) Y_{ik} \qquad (16)$$

More specifically, in this paper we use triangular fuzzy numbers to represent the fuzzy
scores, and we use Yager's third index for comparison purposes. The following
objective function is obtained:

$$\sum_{k \in K} \sum_{i \in I} (p_{i1} + 2p_{i2} + p_{i3}) Y_{ik} \qquad (17)$$

where p̃_i is a triangular fuzzy number Tr(p_{i1}, p_{i2}, p_{i3}).
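As a small illustration (not the authors' code), the defuzzification behind (16)–(17) can be written as follows; Yager's third index of Tr(p1, p2, p3) is (p1 + 2p2 + p3)/4, and the constant factor 1/4, dropped in (17), does not affect the maximization. The fuzzy scores below are invented:

```python
def yager_third_index(p1: float, p2: float, p3: float) -> float:
    """Yager's third index of a triangular fuzzy number Tr(p1, p2, p3):
    a centroid-like ranking value used to compare fuzzy scores."""
    return (p1 + 2 * p2 + p3) / 4.0

# Defuzzified scores g(p~_i) used in the crisp objective (16)-(17);
# the fuzzy POI scores below are purely illustrative.
fuzzy_scores = {"poi_1": (6, 8, 10), "poi_2": (7, 7, 9)}
crisp_scores = {i: yager_third_index(*p) for i, p in fuzzy_scores.items()}
print(crisp_scores)   # {'poi_1': 8.0, 'poi_2': 7.5}
```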

5 GRASP Solutions

Since the TTDPC is an NP-hard problem and difficult to solve in practice, metaheuristic
methods are appropriate to optimize our model. We propose a standard GRASP
metaheuristic to solve it. The standard GRASP is a multistart two-phase
metaheuristic for combinatorial optimization proposed by Feo and Resende [6],
basically consisting of a construction phase and a local search improvement phase.
It is executed maxIterations times in a multistart strategy and the best
solution found is kept. A feasible solution is obtained in the construction phase;
subsequently, the neighborhood of this solution is explored in the local search phase
until a local optimum is found. The pseudocode in Algorithm 1 illustrates the
main phases of a GRASP procedure, where maxIterations is the maximum number
of iterations of the procedure.

Algorithm 1 Pseudocode of the standard GRASP


1: function GRASP(maxIterations, RCLsize)
2: readInput()
3: for i:=1 to maxIterations do
4: solution = GRASPConstructionPhase(RCLsize)
5: solution = localSearch(solution)
6: updateSolution(solution, bestSolution)
7: end for
8: return bestSolution
9: end GRASP

Fig. 1 Flexible and tight instances comparison for best solutions

The construction phase of the standard GRASP procedure is shown in Algorithm 2.
The solution construction mechanism builds a solution step by step by adding a new
POI from the Restricted Candidate List (RCL) to the current partial solution under
construction, without destroying feasibility.

Algorithm 2 Pseudocode of the Construction Phase of the standard GRASP


1: function GRASPConstructionPhase(RCLsize)
2: Initialize the partialSolution with m empty routes
3: while (it is possible to visit new POIs) do
4: Set the Candidate List CL = ∅
5: for all POI i ∈ I do
6: Find the best feasible triplet (i, j, k) to insert this new POI i in partialSolution according
to greedy time function f (i, j, k)
7: Add the feasible triplet (i, j, k) to CL
8: end for
9: Create the Restricted Candidate List RCL with the best RCLsize triplets (i, j, k) from CL
according to f
10: Select a random triplet (i, j, k) from RCL
11: Update the variables of route k by inserting the POI i at position j
12: end while
13: return partialSolution
14: end GRASPConstructionPhase

Fig. 2 Instances comparison for flexible instances

The candidate list of POIs to be inserted in the solution is constructed by the
standard GRASP using a greedy function f. The RCL is built by selecting the RCLsize
feasible insertion triplets (i, j, k) with the best values of the greedy function f, which
represents the incremental increase in the cost function due to the incorporation of
the element into the partial solution. The evaluation function used in this paper
locates the best position in which to insert a candidate over all routes, minimizing
the insertion travel time. Through this greedy function, the candidate list is formed
from the best elements, in this case those whose incorporation into the current partial
solution results in the smallest incremental time. The list of candidates is sorted in
descending order of score or in ascending order of travel time, so that the candidates
with the highest score or lowest travel time are placed at the top of the list. When a
candidate is randomly selected, it is incorporated into the partial solution; the
candidate is then excluded from the candidate list and the incremental costs are
re-evaluated. The construction phase ends with a feasible current solution.
Subsequently, a local search phase is applied with the aim of improving this solution.
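A minimal sketch of the randomized greedy choice just described is given below; it is an illustration rather than the authors' implementation, and the triplets, the cost values and the function names are hypothetical:

```python
import random

def rcl_select(candidates, f, rcl_size):
    """Build the Restricted Candidate List with the rcl_size best feasible
    insertion triplets (i, j, k) according to the greedy function f (smaller
    values mean a smaller incremental travel time), then pick one at random."""
    ranked = sorted(candidates, key=f)     # best candidates first
    rcl = ranked[:rcl_size]                # restrict to the rcl_size best
    return random.choice(rcl)              # randomized greedy choice

# Hypothetical triplets (POI i, position j, route k) with assumed insertion costs:
cost = {(1, 0, 0): 12.0, (2, 1, 0): 7.5, (3, 0, 1): 9.0, (4, 2, 1): 15.0}
triplet = rcl_select(list(cost), f=cost.get, rcl_size=3)
print("inserted triplet:", triplet)        # one of the three cheapest insertions
```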
Usually a local search algorithm works iteratively, replacing the current solution
with a better solution found in its neighborhood; the procedure ends when no better
solution exists in the neighborhood. Algorithm 3 shows a basic local search procedure.
Our local search uses exchange movements between locations of different routes in
order to reduce the route travel times. This neighborhood search uses a best-improvement
strategy: all neighbors are explored and the current solution is replaced by the best
neighbor. If the first steps of the local search are able to reduce the route travel time,
then the local search tries to insert new locations into the solution in order to maximize
the total score.
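The inter-route exchange neighborhood with a best-improvement strategy could be sketched as follows; this is an assumption-laden illustration in which total_time is a hypothetical route evaluator and feasibility checks (time windows, categories) are omitted for brevity:

```python
import copy

def best_exchange_neighbor(routes, total_time):
    """Best-improvement exploration of the exchange neighborhood: swap one POI
    between every pair of distinct routes and return the neighbor with the
    smallest total travel time (the incumbent if no swap improves it)."""
    best, best_cost = routes, total_time(routes)
    for a in range(len(routes)):
        for b in range(a + 1, len(routes)):
            for i in range(len(routes[a])):
                for j in range(len(routes[b])):
                    neighbor = copy.deepcopy(routes)
                    neighbor[a][i], neighbor[b][j] = neighbor[b][j], neighbor[a][i]
                    cost = total_time(neighbor)
                    if cost < best_cost:   # strict improvement only
                        best, best_cost = neighbor, cost
    return best, best_cost

# Illustrative call with a stand-in evaluator (a real one would sum travel times):
toy_time = lambda r: sum(abs(x - y) for route in r for x, y in zip(route, route[1:]))
print(best_exchange_neighbor([[1, 3], [2, 4]], toy_time))   # ([[4, 3], [2, 1]], 2)
```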

Fig. 3 Instances comparison for tight instances

Algorithm 3 GRASP improvement phase

1: function localSearch(solution)
2: s = solution
3: repeat
4: Find the best neighbor n of the current solution s according to the total time
5: if (TotalTime(n) < TotalTime(s)) then
6: s = n
7: end if
8: until TotalTime(n) ≥ TotalTime(s) for every neighbor n
9: return s
10: end localSearch

In general terms, both the construction phase and the local search phase try to maximize
the total score of the solution. This two-phase process is repeated until the
imposed termination criterion is reached.

6 Experiments and Results

This section describes the computational experiments carried out in our study and
the corresponding results. The aim of the experiments is to evaluate the accuracy of
the proposed approach and its behavior when it is used to solve the TTDPC with
fuzzy coefficients and constraints.
Thirty instances were used in the experiments for comparative purposes. The set
of instances includes data from 30 real POIs related to tourist attractions on the
island of Tenerife, Spain. Travel times are computed on a real road network. The

Table 2 30 POIs instances


Instances POIs K Tmax Type
1 30 1 300 Tight and flexible
2 30 1 450 Tight and flexible
3 30 1 600 Tight and flexible
4 30 2 300 Tight and flexible
5 30 2 450 Tight and flexible
6 30 2 600 Tight and flexible
7 30 3 300 Tight and flexible
8 30 3 450 Tight and flexible
9 30 3 600 Tight and flexible
10 30 4 300 Tight and flexible
11 30 4 450 Tight and flexible
12 30 4 600 Tight and flexible
13 30 5 300 Tight and flexible
14 30 5 450 Tight and flexible
15 30 5 600 Tight and flexible

data provides the positions of a set of 30 locations with a given score which can be
visited on a specific day. The maximum number of routes of the solution is also
included. The number of clusters is 4, kept fixed for all the instances.
For each POI, the visiting time and the opening time windows are taken from real
data and are fixed for all the instances. The maximum number of routes varies from
1 to 5 (K ∈ {1, 2, 3, 4, 5}) according to the specific instance. The maximum time by
route is 5, 7.5 or 10 h (Tmax ∈ {300, 450, 600}) according to the specific instance. For
each combination of K and Tmax we generate two instances: one, named flexible, in
which the minimum/maximum numbers of POIs to be selected for each category are
not strictly binding, and one, named tight, in which the values of N_c^min and N_c^max are
tighter and, in at least one case, N_c^min = N_c^max, for a total of 30 small instances. For
more details concerning the instances used, see Table 2.
The tolerance level applied to the maximum time constraint is 20% of the maximum
time; for the time window constraints, it is 20% of the latest time of each window.
The values of α are 0, 0.2, 0.4, 0.6, 0.8, and 1.0. Regarding the GRASP parameters,
several RCL sizes are used: 3, 4, 5 and 6. The experimentation is divided
according to how the candidate list is sorted in the GRASP procedure, by time or by
score. The results presented in Tables 3 and 4 correspond to the best solutions over
the tested RCL sizes when ordering the candidate list by time, while the results
presented in Tables 5 and 6 correspond to the best solutions when ordering the
candidate list by score. These tables have the following structure. The first column
includes the name of the instance used. The second column shows, for each instance,
the best score, the average score, and the average execution time in microseconds.
Finally, the following columns show the values of the second

Table 3 Results for flexible instances and RCL ordered by time


Instances/Alpha α = 0.0 α = 0.2 α = 0.4 α = 0.6 α = 0.8 α = 1.0
Instance 1 Best Sc. 29.0 29.0 29.0 29.0 29.0 29.0
Avg. Sc. 20.83 21.56 21.75 21.57 20.86 21.19
Avg. Time 0.17 0.09 0.12 0.12 0.03 0.03
Instance 2 Best Sc. 46.0 46.0 46.0 46.0 46.0 46.0
Avg. Sc. 30.57 30.38 30.78 30.8 30.79 30.39
Avg. Time 0.14 0.12 0.1 0.13 0.1 0.04
Instance 3 Best Sc. 62.0 66.0 66.0 66.0 66.0 66.0
Avg. Sc. 40.35 41.26 40.21 40.88 41.12 41.05
Avg. Time 0.05 0.04 0.06 0.07 0.05 0.06
Instance 4 Best Sc. 57.0 57.0 57.0 57.0 57.0 57.0
Avg. Sc. 35.14 35.26 34.49 35.4 34.51 35.15
Avg. Time 0.21 0.19 0.04 0.05 0.05 0.05
Instance 5 Best Sc. 90.0 90.0 90.0 92.0 92.0 92.0
Avg. Sc. 62.74 63.37 63 62.52 62.6 62.63
Avg. Time 0.07 0.07 0.09 0.08 0.06 0.06
Instance 6 Best Sc. 114.0 119.0 119.0 119.0 119.0 119.0
Avg. Sc. 83.69 83.22 83.55 83.27 83.75 83.13
Avg. Time 0.08 0.1 0.08 0.1 0.08 0.08
Instance 7 Best Sc. 82.0 82.0 82.0 82.0 82.0 82.0
Avg. Sc. 50.95 50.33 51.33 50.01 51.28 49.38
Avg. Time 0.04 0.05 0.04 0.03 0.03 0.03
Instance 8 Best Sc. 125.0 125.0 127.0 128.0 128.0 128.0
Avg. Sc. 98.66 98.76 97.7 98.66 98.13 98.16
Avg. Time 0.1 0.1 0.09 0.09 0.09 0.09
Instance 9 Best Sc. 165.0 167.0 167.0 167.0 170.0 170.0
Avg. Sc. 129.19 129.08 129.11 129.05 129.58 129.15
Avg. Time 0.12 0.12 0.12 0.13 0.12 0.12
Instance 10 Best Sc. 99.0 99.0 99.0 99.0 99.0 99.0
Avg. Sc. 68.14 68.94 67.58 68.18 68.84 68.98
Avg. Time 0.05 0.05 0.05 0.05 0.05 0.06
Instance 11 Best Sc. 153.0 153.0 153.0 159.0 159.0 159.0
Avg. Sc. 132.25 132.4 132.78 133.39 132.8 132.86
Avg. Time 0.13 0.13 0.13 0.13 0.13 0.13
Instance 12 Best Sc. 207.0 207.0 207.0 207.0 207.0 207.0
Avg. Sc. 173.13 173.42 172.78 173.14 173.64 173.44
Avg. Time 0.18 0.17 0.19 0.17 0.17 0.17
Instance 13 Best Sc. 113.0 113.0 113.0 113.0 113.0 113.0
Avg. Sc. 86.35 86.18 86.25 86.42 87.08 85.8
Avg. Time 0.06 0.06 0.06 0.06 0.06 0.07
Instance 14 Best Sc. 180.0 184.0 184.0 184.0 184.0 186.0
Avg. Sc. 161.97 161.31 161.55 161.51 161.51 161.28
Avg. Time 0.17 0.18 0.2 0.16 0.16 0.16
Instance 15 Best Sc. 243.0 248.0 248.0 248.0 248.0 248.0
Avg. Sc. 214.76 215.32 213.86 214.5 213.86 213.94
Avg. Time 0.22 0.22 0.22 0.26 0.22 0.22

Table 4 Results for tight instances and RCL ordered by time


Instances/Alpha α = 0.0 α = 0.2 α = 0.4 α = 0.6 α = 0.8 α = 1.0
Instance 1 Best Sc. 14.0 14.0 14.0 14.0 14.0 14.0
Avg. Sc. 10.51 10.29 10.41 10.73 10.59 10.09
Avg. Time 0.11 0.04 0.04 0.05 0.05 0.05
Instance 2 Best Sc. 32.0 32.0 32.0 32.0 32.0 32.0
Avg. Sc. 21.42 21.09 21.25 21.26 21.74 20.86
Avg. Time 0.06 0.14 0.07 0.07 0.07 0.07
Instance 3 Best Sc. 60.0 60.0 60.0 60.0 60.0 60.0
Avg. Sc. 37.25 38.11 37.7 37.07 38.28 36.97
Avg. Time 0.12 0.09 0.07 0.03 0.03 0.03
Instance 4 Best Sc. 28.0 28.0 28.0 28.0 28.0 28.0
Avg. Sc. 20.07 20.1 19.94 20.23 20.64 20.3
Avg. Time 0.1 0.06 0.08 0.08 0.06 0.07
Instance 5 Best Sc. 61.0 61.0 64.0 64.0 64.0 64.0
Avg. Sc. 40.7 41.13 40.91 40.3 40.77 40.17
Avg. Time 0.13 0.05 0.02 0.02 0.04 0.04
Instance 6 Best Sc. 112.0 112.0 116.0 116.0 116.0 116.0
Avg. Sc. 73.65 72.65 73.05 74.07 72.33 73.31
Avg. Time 0.05 0.06 0.06 0.04 0.04 0.05
Instance 7 Best Sc. 43.0 45.0 45.0 45.0 45.0 45.0
Avg. Sc. 29.22 29.29 29.05 30.04 29.83 29.95
Avg. Time 0.01 0.01 0.01 0.02 0.01 0.01
Instance 8 Best Sc. 92.0 92.0 92.0 92.0 92.0 92.0
Avg. Sc. 61.19 61.54 61.07 62.46 61.25 61.49
Avg. Time 0.03 0.03 0.03 0.03 0.03 0.03
Instance 9 Best Sc. 145.0 146.0 146.0 148.0 148.0 148.0
Avg. Sc. 108.96 108.51 108.81 108.61 107.96 107.64
Avg. Time 0.06 0.06 0.06 0.06 0.06 0.06
Instance 10 Best Sc. 62.0 62.0 62.0 62.0 62.0 62.0
Avg. Sc. 39.98 40.31 40.58 40.42 40.49 40.37
Avg. Time 0.02 0.02 0.02 0.02 0.03 0.02
Instance 11 Best Sc. 113.0 113.0 113.0 117.0 117.0 122.0
Avg. Sc. 82.51 83.08 82.72 83.25 82.87 81.82
Avg. Time 0.05 0.05 0.05 0.08 0.06 0.05
Instance 12 Best Sc. 187.0 187.0 189.0 189.0 190.0 190.0
Avg. Sc. 143.78 143.16 144.61 143.43 143.27 142.63
Avg. Time 0.08 0.08 0.08 0.08 0.07 0.07
Instance 13 Best Sc. 74.0 74.0 74.0 74.0 74.0 74.0
Avg. Sc. 51.99 51.64 51.05 52.3 51.2 52.23
Avg. Time 0.02 0.02 0.02 0.02 0.02 0.02
Instance 14 Best Sc. 139.0 139.0 141.0 143.0 148.0 148.0
Avg. Sc. 103.98 104.11 105.57 105.06 104.76 104.8
Avg. Time 0.06 0.06 0.06 0.06 0.06 0.09
Instance 15 Best Sc. 217.0 217.0 217.0 217.0 217.0 217.0
Avg. Sc. 177.35 178.19 178.95 177.93 177.21 179.01
Avg. Time 0.14 0.09 0.1 0.1 0.1 0.11

Table 5 Results for flexible instances and RCL ordered by score


Instances/Alpha α = 0.0 α = 0.2 α = 0.4 α = 0.6 α = 0.8 α = 1.0
Instance 1 Best Sc. 29.0 29.0 29.0 29.0 29.0 29.0
Avg. Sc. 21.56 21.3 21.81 21.77 21.74 21.3
Avg. Time 0.21 0.12 0.1 0.03 0.03 0.03
Instance 2 Best Sc. 49.0 49.0 49.0 49.0 49.0 49.0
Avg. Sc. 35.5 34.9 35.35 35.97 35.22 35.6
Avg. Time 0.14 0.15 0.15 0.1 0.12 0.14
Instance 3 Best Sc. 68.0 68.0 68.0 68.0 68.0 68.0
Avg. Sc. 53.65 53.47 53.27 54.09 54.47 53.88
Avg. Time 0.14 0.07 0.06 0.06 0.08 0.06
Instance 4 Best Sc. 54.0 54.0 54.0 54.0 57.0 57.0
Avg. Sc. 34.81 34.92 34.88 35 35.72 35.59
Avg. Time 0.19 0.16 0.19 0.06 0.06 0.06
Instance 5 Best Sc. 92.0 98.0 98.0 98.0 98.0 98.0
Avg. Sc. 71.52 70.61 71.27 70.41 71.16 71.92
Avg. Time 0.1 0.09 0.1 0.1 0.1 0.08
Instance 6 Best Sc. 130.0 130.0 130.0 130.0 131.0 131.0
Avg. Sc. 107.27 107.45 107.01 107.19 107.83 107.76
Avg. Time 0.12 0.13 0.11 0.11 0.11 0.11
Instance 7 Best Sc. 82.0 82.0 82.0 82.0 82.0 82.0
Avg. Sc. 51.7 51.21 49.87 50.49 51.06 51.21
Avg. Time 0.03 0.04 0.05 0.04 0.04 0.04
Instance 8 Best Sc. 135.0 141.0 141.0 141.0 141.0 141.0
Avg. Sc. 107.12 106.28 107.21 106.72 106.94 105.61
Avg. Time 0.11 0.11 0.16 0.16 0.16 0.13
Instance 9 Best Sc. 184.0 184.0 184.0 184.0 184.0 184.0
Avg. Sc. 156.88 156.41 156.28 156.3 156.06 156.45
Avg. Time 0.19 0.18 0.18 0.17 0.18 0.19
Instance 10 Best Sc. 99.0 99.0 99.0 99.0 99.0 99.0
Avg. Sc. 68.59 70 69.38 69.03 69.62 69.4
Avg. Time 0.05 0.05 0.05 0.05 0.05 0.05
Instance 11 Best Sc. 166.0 171.0 171.0 171.0 171.0 171.0
Avg. Sc. 141.16 142.03 140.58 141.02 140.39 140.95
Avg. Time 0.15 0.15 0.17 0.18 0.16 0.15
Instance 12 Best Sc. 228.0 229.0 229.0 229.0 229.0 229.0
Avg. Sc. 201.36 200.96 201.26 201.36 201.36 201.41
Avg. Time 0.23 0.25 0.24 0.25 0.23 0.23
Instance 13 Best Sc. 113.0 113.0 113.0 113.0 113.0 113.0
Avg. Sc. 86.19 85.13 87.19 86.81 86.12 86.56
Avg. Time 0.06 0.06 0.06 0.06 0.06 0.06
Instance 14 Best Sc. 198.0 198.0 198.0 202.0 202.0 202.0
Avg. Sc. 172.13 172.68 172.06 172.38 173.03 171.81
Avg. Time 0.23 0.21 0.21 0.19 0.19 0.19
Instance 15 Best Sc. 262.0 262.0 263.0 263.0 264.0 264.0
Avg. Sc. 239.06 239.63 239.91 239.22 238.78 239.11
Avg. Time 0.28 0.3 0.28 0.28 0.27 0.29

Table 6 Results for tight instances and RCL ordered by score


Instances/Alpha α = 0.0 α = 0.2 α = 0.4 α = 0.6 α = 0.8 α = 1.0
Instance 1 Best Sc. 14.0 14.0 14.0 14.0 14.0 14.0
Avg. Sc. 10.46 10.31 10.34 9.85 10.92 10.37
Avg. Time 0.1 0.05 0.05 0.04 0.05 0.05
Instance 2 Best Sc. 34.0 34.0 34.0 34.0 34.0 34.0
Avg. Sc. 28.1 27.65 27.81 27.91 27.6 27.7
Avg. Time 0.09 0.1 0.05 0.03 0.02 0.03
Instance 3 Best Sc. 60.0 60.0 60.0 60.0 60.0 60.0
Avg. Sc. 38.71 39.35 38.77 39.37 39.31 38.75
Avg. Time 0.16 0.08 0.09 0.11 0.1 0.08
Instance 4 Best Sc. 28.0 28.0 28.0 28.0 28.0 28.0
Avg. Sc. 19.57 19.89 19.73 19.82 20 20.2
Avg. Time 0.08 0.11 0.06 0.05 0.06 0.06
Instance 5 Best Sc. 67.0 67.0 67.0 67.0 67.0 67.0
Avg. Sc. 53.52 53.49 53.51 53.55 53.08 53.46
Avg. Time 0.09 0.05 0.03 0.05 0.05 0.05
Instance 6 Best Sc. 116.0 116.0 116.0 116.0 116.0 116.0
Avg. Sc. 77.77 78.77 79.66 78.12 78.38 78.43
Avg. Time 0.07 0.08 0.05 0.05 0.07 0.08
Instance 7 Best Sc. 43.0 45.0 45.0 45.0 45.0 45.0
Avg. Sc. 29.68 29.87 29.67 29.86 29.87 29.09
Avg. Time 0.02 0.01 0.01 0.01 0.01 0.01
Instance 8 Best Sc. 97.0 97.0 97.0 97.0 97.0 97.0
Avg. Sc. 78.68 78.55 78.17 78.47 79.38 78.57
Avg. Time 0.04 0.05 0.04 0.04 0.04 0.04
Instance 9 Best Sc. 150.0 151.0 154.0 154.0 157.0 158.0
Avg. Sc. 117.55 117.55 118.68 118.03 118.4 118.44
Avg. Time 0.08 0.07 0.07 0.07 0.07 0.08
Instance 10 Best Sc. 59.0 59.0 59.0 59.0 62.0 62.0
Avg. Sc. 40.63 39.78 40.58 40.72 40.16 40.22
Avg. Time 0.04 0.04 0.02 0.02 0.02 0.02
Instance 11 Best Sc. 128.0 128.0 128.0 128.0 128.0 128.0
Avg. Sc. 103.2 103.75 103.65 103.89 104.02 103.12
Avg. Time 0.06 0.06 0.06 0.06 0.06 0.06
Instance 12 Best Sc. 193.0 193.0 193.0 193.0 197.0 197.0
Avg. Sc. 158 157.95 157.81 158.17 157.23 156.57
Avg. Time 0.09 0.1 0.1 0.1 0.1 0.1
Instance 13 Best Sc. 71.0 73.0 74.0 74.0 74.0 74.0
Avg. Sc. 51.93 51.35 51.62 51.67 51.46 51.73
Avg. Time 0.02 0.02 0.02 0.02 0.02 0.02
Instance 14 Best Sc. 153.0 153.0 153.0 153.0 153.0 153.0
Avg. Sc. 128.66 128.35 128.52 128 128.68 128.13
Avg. Time 0.08 0.08 0.08 0.08 0.08 0.08
Instance 15 Best Sc. 237.0 237.0 237.0 237.0 237.0 237.0
Avg. Sc. 195.06 194.88 195.34 194.44 194.63 193.58
Avg. Time 0.11 0.11 0.11 0.12 0.11 0.12

column for each α value. The GRASP procedure was run 1000 times for each
combination of instance and parameters used in the experimentation; one thousand
executions of the GRASP for each parameter combination are carried out in less than
one second. All computations were carried out on an Intel Dual Core processor at
2.5 GHz with 4 GB of RAM.
As we can see, different solutions are obtained by varying α, and an increase in the
tolerance levels allows better solutions to be found; both results are consistent with
the proposed fuzzy approach. As one would expect, the computational results show
a differentiation between the flexible and the tight instances. Specifically, as shown
in Fig. 1, the flexible instances achieve a higher best-solution score in all cases with
respect to the tight instances.
Following the goal of maximizing the total score of the solution, the results shown
in Tables 3, 4, 5 and 6 reveal that ordering the candidate list by score in GRASP is
more effective than ordering it by time. The difference in score between solutions
according to the ordering used can be appreciated more clearly in Figs. 2 and 3.
In these figures, the best average scores over all α values are compared, taking into
account the two ordering types of the candidate list mentioned above.

7 Conclusion

In this study, we present a Soft Computing approach applied to the fuzzy TTDPC,
specifically with fuzzy scores, fuzzy time constraints and fuzzy time window
constraints. In order to obtain high-quality solutions in reasonable time, a GRASP
metaheuristic has been used. The computational experiments confirm that the
proposed approach is a feasible way to solve this model, and its application generates
a set of different solutions consistent with the fuzzy nature of the problem.
Future work will extend the experimentation to other instances with a greater
number of POIs and clusters. We would also like to evaluate the behavior and
efficiency of other metaheuristics. The multiobjective problem will be one of the first
lines of research to be studied; this multiobjective version will consider both the score
obtained at the locations and the route time in the objective function.

Acknowledgements This work has been partially funded by the Spanish Ministry of Economy
and Competitiveness with FEDER funds (TIN2015-70226-R) and supported by Fundación Cajacanarias
research funds (project 2016TUR19) and the iMODA Network of the AUIP. The contribution
of Airam Expósito-Márquez is supported by la Agencia Canaria de Investigación, Innovación y
Sociedad de la Información de la Consejería de Economía, Industria, Comercio y Conocimiento
and by the Fondo Social Europeo (FSE).

References

1. Bellman, R., Zadeh, L.: Decision making in a fuzzy environment. Manag. Sci. 17(4), 141–164
(1970)
2. Brito, J., Expósito, A., Moreno, J.A.: Solving the team orienteering problem with fuzzy scores
and constraints. In: 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp.
1614–1620. IEEE (2016)
3. Chao, I.M., Golden, B.L., Wasil, E.A.: The team orienteering problem. European J. Oper. Res.
88(3), 464–474 (1996)
4. Cura, T.: An artificial bee colony algorithm approach for the team orienteering problem with
time windows. Comput. Ind. Eng. 74, 270–290 (2014)
5. Delgado, M., Verdegay, J., Vila, M.: A general model for fuzzy linear programming. Fuzzy
Sets Syst. 29, 21–29 (1989)
6. Feo, T.A., Resende, M.G.C.: Greedy randomized adaptive search procedures. J. Glob. Optim.
6, 109–133 (1995)
7. Gavalas, D., Konstantopoulos, C., Mastakas, K., Pantziou, G., Tasoulas, Y.: Cluster-based
heuristics for the team orienteering problem with time windows. In: International Symposium
on Experimental Algorithms, pp. 390–401. Springer (2013)
8. Hu, Q., Lim, A.: An iterative three-component heuristic for the team orienteering problem with
time windows. European J. Oper. Res. 232(2), 276–286 (2014)
9. Karbowska-Chilinska, J., Zabielski, P.: Genetic algorithm solving the orienteering problem
with time windows. In: Advances in Systems Science, pp. 609–619. Springer (2014)
10. Labadie, N., Melechovský, J., Wolfler Calvo, R.: Hybridized evolutionary local search algo-
rithm for the team orienteering problem with time windows. J. Heur. 17(6), 729–753 (2011)
11. Labadie, N., Mansini, R., Melechovský, J., Calvo, R.W.: The team orienteering problem with
time windows: an LP-based granular variable neighborhood search. European J. Oper. Res.
220(1), 15–27 (2012)
12. Matsuda, Y., Nakamura, M., Kang, D., Miyagi, H.: A fuzzy optimal routing problem for
sightseeing. IEEJ Trans. Electron. Inf. Syst. 125, 1350–1357 (2005)
13. Mendez, C.E.C.: Team Orienteering Problem with Time Windows and Fuzzy Scores. Ph.D.
thesis, National Taiwan University of Science and Technology (2016)
14. Montemanni, R., Gambardella, L.: An ant colony system for team orienteering problems with
time windows. Found. Comput. Decis. Sci. 34(4), 287–306 (2009)
15. Resende, M.G., Ribeiro, C.C.: Greedy randomized adaptive search procedures: advances,
hybridizations, and applications. In: Gendreau, M., Potvin, J.Y. (eds.) Handbook of Meta-
heuristics, International Series in Operations Research and Management Science, vol. 146, pp.
283–319. Springer, US (2010)
16. Souffriau, W., Vansteenwegen, P., Berghe, G.V., Oudheusden, D.: A greedy randomised adaptive
search procedure for the team orienteering problem. In: Proceedings of the EU/MEeting (2008)
17. Vansteenwegen, P., Oudheusden, D.V.: The mobile tourist guide: an OR opportunity. OR Insight
20(3), 21–27 (2007)
18. Vansteenwegen, P., Souffriau, W., Berghe, G.V., Oudheusden, D.V.: Iterated local search for
the team orienteering problem with time windows. Comput. Oper. Res. 36(12), 3281–3290
(2009)
19. Vansteenwegen, P., Souffriau, W., Berghe, G.V., Oudheusden, D.V.: The city trip planner: an
expert system for tourists. Expert Syst. Appl. 38(6), 6540–6546 (2011)
20. Verdegay, J.: Fuzzy Information and Decision Processes, Chap. Fuzzy Mathematical Program-
ming. North-Holland (1982)
21. Verdegay, J.L.: Fuzzy optimization: models, methods and perspectives. In: Proceedings of the
6th IFSA World Congress, pp. 39–71 (1995)
22. Verdegay, J.L., Yager, R.R., Bonissone, P.P.: On heuristics as a fundamental constituent of soft
computing. Fuzzy Sets Syst. 159, 846–855 (2008)
23. Verma, M., Shukla, K.K.: Application of fuzzy optimization to the orienteering problem. Adv.
Fuzzy Syst. 2015, 8 (2015)
Characterization of the Optimal Bucket Order Problem Instances and Algorithms by Using Fuzzy Logic

Juan A. Aledo, José A. Gámez, Orenia Lapeira and Alejandro Rosete

Abstract The problem of aggregating several rankings in order to obtain a consensus
ranking that generalizes them is an active field of research with several applications.
The Optimal Bucket Order Problem (OBOP) is a rank aggregation problem where the
resulting ranking may be partial, i.e. ties are allowed. Several algorithms have been
proposed for the OBOP. However, their performance with respect to the characteristics
of the instances has not been studied properly. This paper uses fuzzy logic to
describe different aspects of OBOP instances (such as the number of items to be
ranked, the distribution of the precedence values, and the utopicity) and the performance
of several OBOP algorithms. Based on this fuzzy characterization, several fuzzy
relations between instance characteristics and the performance of the algorithms
have been discovered.

1 Introduction

The problem of aggregating preferences or rankings about a set of N items is a very


active field of research [1, 17], with several applications [3, 22, 23]. In general, the
objective of rank aggregation problems is to obtain a consensus ranking that gener-
alizes a set of input rankings. In this paper we deal with the Optimal Bucket Order
Problem (OBOP), which is a distance-based rank aggregation problem [18, 24]. The
input in the OBOP is a matrix M where each cell M(i, j) represents the probability

J. A. Aledo (B) · J. A. Gámez


Universidad de Castilla-La Mancha, Albacete, Spain
e-mail: juanangel.aledo@uclm.es
J. A. Gámez
e-mail: jose.gamez@uclm.es
O. Lapeira · A. Rosete
Universidad Tecnológica de La Habana “José Antonio Echeverría” (Cujae), Havana, Cuba
e-mail: olapeira@ceis.cujae.edu.cu
A. Rosete
e-mail: rosete@ceis.cujae.edu.cu


that the item i precedes the item j in the set of preferences to be aggregated. The
objective of OBOP is to find an ordered partition of the set of items [24] (called
bucket order) that minimizes the L_1 distance with respect to the precedence matrix M.
Several algorithms have been proposed for the OBOP, e.g. the Bucket Pivot Algorithm
(BPA) [18, 24], the SortCC algorithm [21] and LIA_G^MP2 [6]. There is no unique
winner algorithm for all OBOP instances. For example, LIA_G^MP2 outperforms BPA
in general, but not always [6]. In addition, the influence of the characteristics
of OBOP instances on the performance of OBOP algorithms has not been studied
properly.
This paper focuses on analyzing the performance of several OBOP algorithms with
respect to the instance characteristics. Our aim is to derive interesting knowledge that
serves to characterize and to predict the performance of OBOP algorithms, not only
to help select the best algorithm for each instance but also to devise new future
versions of them. In order to do that, fuzzy logic [25] is used to characterize
each instance in terms of fuzzy labels; then, the relations among these fuzzy labels
are studied. The idea is to use the interpretability and flexibility of fuzzy logic as a
valuable tool to analyze the comparative results of several OBOP algorithms. This
is in line with the call for using fuzzy concepts as "a methodological basis in many
application domains" [19].
The main contributions of this work are:
• Several fuzzy measures are proposed to characterize OBOP instances.
• Several fuzzy measures are proposed to characterize the performance of OBOP algorithms.
• We obtain interesting knowledge that serves to characterize OBOP instances and to predict the performance of OBOP algorithms.
The paper is organized as follows: Section 2 introduces several concepts related
to the OBOP. Section 3 presents the fuzzy methodology used to characterize OBOP
instances and algorithms. Section 4 explains the knowledge discovered through the
experiments conducted by using several data mining techniques.

2 Technical Background

A simple example of ranking may be 1|2, 3|4 which represents the preferences about
the items 1, 2, 3 and 4, meaning that the item 1 is the preferred one, followed by the
items 2 and 3 which are tied, and finally the item 4. This implies that in this ranking,
item 1 precedes 2, 3 and 4 (denoted 1 ≺ 2, 1 ≺ 3, 1 ≺ 4), and items 2 and 3 precede
item 4 (denoted 2 ≺ 4, 3 ≺ 4). Note that in this case, there is no precedence relation
between items 2 and 3, i.e. they are tied.
When we have to deal with different rankings (i.e. opinions about the order of
several items), the problem of aggregating all of them in a consensus ranking arises.
The rank aggregation problem is a family of problems that tries to obtain a ranking

which represents a consensus over a set of input rankings. There are many types of
rank aggregation problems depending on the characteristics of the rankings to be
aggregated, the expected resulting ranking, the conceptual meaning of the precedences,
and the measure used to indicate that a ranking is better or worse than another
[1, 5, 7].
The simplest problem is when the rankings to be aggregated are complete rankings
without ties (i.e. the Kemeny problem) [7, 20]. There are several variations of this
case, for example, by allowing partial or incomplete rankings [6, 13] as inputs. The
Optimal Bucket Order Problem (OBOP) is a singular rank aggregation problem that
receives as input a precedence matrix that describes the precedences in a set of
rankings and produces as output a complete ranking (possibly with ties) [16, 18].
For example, suppose that the rankings to be aggregated are:
• 1|2|3|4
• 2|1|3|4
• 1|2|4|3
• 2|1|4|3
For these four rankings, the precedence matrix M is shown in Table 1.
In Table 1, the cell M(1, 2) is equal to 0.5 because the item 1 precedes the item
2 in 2 out of 4 cases. On the other hand, M(1, 4) = 1 because the item 1 precedes
the item 4 in all the cases, while M(4, 1) = 0 because the item 4 never precedes
the item 1. It should be noted that M(i, i) = 0.5 for i = 1..N (main diagonal) and
M(i, j) + M(j, i) = 1 for i ≠ j, i, j = 1..N. The objective of the OBOP is to find a
ranking whose matrix representation R minimizes the distance with respect to the
input precedence matrix M, i.e. minimizes $D(R, M) = \sum_{i,j} |R(i, j) - M(i, j)|$.
For example, the matrix representation R of the ranking 1|2|3|4 is presented in
Table 2.

Table 1 The precedence matrix M for the set of rankings { 1|2|3|4, 2|1|3|4, 1|2|4|3, 2|1|4|3 }
M 1 2 3 4
1 0.5 0.5 1 1
2 0.5 0.5 1 1
3 0 0 0.5 0.5
4 0 0 0.5 0.5

Table 2 The precedence matrix R that represents the ranking 1|2|3|4


R 1 2 3 4
1 0.5 1 1 1
2 0 0.5 1 1
3 0 0 0.5 1
4 0 0 0 0.5

The distance D(R, M) between the matrix M in Table 1 and the matrix R in
Table 2 is 2, derived from the differences in the four cells of Table 2 that depart from
the corresponding ones in Table 1. In this case, the best complete rankings (without
ties) that may be returned as a solution are any of the four mentioned input rankings.
If ties are allowed in the output (as is possible in the OBOP) the situation changes.
It is clear that 1 and 2 must be placed before 3 and 4, but there is no clear precedence
relation between 1 and 2, nor between 3 and 4. This suggests that the expected result
should be a bucket with the items 1 and 2, placed before another bucket containing
the items 3 and 4. The ranking that meets this requirement is 1, 2|3, 4, that is, a
solution with two buckets. Indeed, the matrix representation R1 of this ranking is
exactly the matrix in Table 1, i.e. R1 = M. Consequently, this ranking is the optimal
solution, with distance D(R1, M) = 0. This example illustrates the advantage of
allowing ties in the output, as occurs in the OBOP.
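The example above can be reproduced with the following minimal sketch (an illustration, not the chapter's code); counting 0.5 for tied items is an assumed convention, consistent with M(i, j) + M(j, i) = 1:

```python
import numpy as np

def precedence_matrix(rankings, n):
    """M[i-1][j-1]: fraction of input rankings in which item i precedes item j.
    Items in the same bucket count 0.5 (assumed tie convention, so that
    M(i, j) + M(j, i) = 1); the main diagonal is 0.5 by definition."""
    M = np.zeros((n, n))
    for buckets in rankings:
        pos = {item: b for b, bucket in enumerate(buckets) for item in bucket}
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if pos[i] < pos[j]:
                    M[i - 1, j - 1] += 1.0
                elif i != j and pos[i] == pos[j]:
                    M[i - 1, j - 1] += 0.5
    M /= len(rankings)
    np.fill_diagonal(M, 0.5)
    return M

def distance(R, M):
    """L1 distance D(R, M) = sum_{i,j} |R(i,j) - M(i,j)|."""
    return np.abs(R - M).sum()

# The four input rankings of the example, each as an ordered list of buckets:
rankings = [[[1], [2], [3], [4]], [[2], [1], [3], [4]],
            [[1], [2], [4], [3]], [[2], [1], [4], [3]]]
M = precedence_matrix(rankings, 4)                 # the matrix of Table 1
R = precedence_matrix([[[1], [2], [3], [4]]], 4)   # ranking 1|2|3|4 (Table 2)
R1 = precedence_matrix([[[1, 2], [3, 4]]], 4)      # ranking 1,2|3,4
print(distance(R, M), distance(R1, M))             # 2.0 0.0
```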
More formally, suppose a set of items [[N]] = {1, . . . , N} to be ranked. A bucket
order is a total or linear order with ties [15, 18], i.e. a partial order [24]. This implies
that each item belongs to a bucket. In Fagin [15] a bucket order is defined as a
transitive binary relation between the buckets, i.e. B_1 ≺ B_2 ≺ · · · ≺ B_k. In general,
given two items u ∈ B_i, v ∈ B_j, if i < j then u precedes v. All the items that belong
to the same bucket are considered tied.
The OBOP is NP-hard [18]. Observe that, given N items, there are N! rankings which
order all the items without ties, i.e. they are permutations of the N items. If ties are
allowed, the number of possible rankings is much larger [6, 8].
The most popular algorithm for solving the OBOP is the Bucket Pivot Algorithm
(BPA) [18, 24]. However, BPA suffers from some drawbacks because of the random
selection of the pivot used to decide the positions of the other elements. Kenkre [21]
proposed to face this problem by first constructing the buckets (clustering step) and
then ordering them, resulting in the SortCC algorithm. Recently [6], a new version
of BPA called LIA_G^MP2 has been presented (it will be called simply LIA in the rest
of the paper). LIA is based on a heuristic selection of the pivot and the inclusion
of several elements as pivots. These algorithms may also be used to produce initial
solutions for metaheuristic-based approaches to the OBOP [4].
Based on the results presented in [6], LIA outperforms BPA in general, but not
in all instances. In spite of the fact that some recommendations were included in
[6] about which is the best algorithm for each OBOP instance, the influence of the
characteristics of the instances on the performance of the algorithms was not studied.

3 Methodology

As the main objective is to find interesting knowledge that serves to characterize
and to predict the performance of OBOP algorithms, this paper takes as input
experimental data about OBOP instances and the performance of some OBOP
algorithms on these instances. To the best of our knowledge, the largest experimental
comparison of OBOP algorithms was presented in [6], with 50 OBOP instances.
Thus, the experimental results presented in [6] are taken as the starting point for the
rest of the paper.

3.1 Instances

To characterize each instance, we compute several characteristics [6] of the 50 OBOP
instances (precedence matrices) obtained from real ranking sets available in PrefLib
[22]. Among the characteristics of each instance (matrix M) are the following (a
small sketch computing the Near proportions is given after the list):

• N: number of items or elements to be ranked
• U: utopicity, i.e. the distance from M to its utopian matrix [6]
• Near_P: proportion of precedences in M that are near to 0 or 1, i.e. a clear precedence
• Near_T: proportion of precedences in M that are near to 0.5, i.e. a clear tie
• Near_I: proportion of precedences in M that are far from 0, 0.5 and 1
• u_V: utopian value as was defined in [6], a possibly super-optimal minimum value
• a_V: anti-utopian value as was defined in [6], a possibly super-optimal maximum value
• P: precedences, i.e. number of cells with values greater than 0
A general description of the instances used in the experiments is presented in
Table 3 in terms of the minimum (Min), median, average (Ave), maximum (Max) and
standard deviation (StdDev) of the values of the previous characteristics over the 50
instances. As can be noted, each characteristic varies on a different scale.
It is possible to visualize several characteristics in a unique graph if these
characteristics vary in similar intervals. For example, Fig. 1 shows how the values of
Near_P, Near_T and Near_I are distributed over the different instances (which are
sorted on the x-axis according to the values of N in ascending order). Figure 2 also
allows us to visualize the values of these three characteristics along with the values
of U.

Table 3 General description of the 50 instances used in [6]

        Min    Median   Average   Max      StdDev
N       10     70       88.66     242      69.72
U       0.48   0.64     0.67      0.95     0.12
Near_P  0.04   0.49     0.51      0.97     0.2
Near_T  0      0.16     0.14      0.49     0.1
Near_I  0.03   0.37     0.34      0.62     0.14
u_V     2.33   521.75   1198.29   6693.8   1816.16
a_V     59.16  3883.5   10333.16  51217.6  14961.84
P       73     3900     9841.4    51077    14511.64


Fig. 1 Distribution of Near_P, Near_T and Near_I as N grows


Fig. 2 Distribution of U, Near_P, Near_T and Near_I as N grows

The same can be done with the utopian and anti-utopian values, as presented in
Fig. 3.
However, it is hard to compare characteristics that vary on different scales. For
example, if all the previous characteristics were included in the same figure, the
values of the series in Fig. 2 would not be observable because they are smaller than
the corresponding values in the series of Fig. 3.
In order to ease the comprehension and comparison of these characteristics, we
define fuzzy labels which correspond to adjectives related to the values of these
characteristics. Indeed, we associate a fuzzy label meaning "the value is great"
to each of the previous characteristics. The goal is to use "meaningful
linguistic labels", as suggested in [19]. Based on the numerical data, fuzzy labels
of type "Great Value" (GV) are created by using the same fuzzification function
(Eq. 1) for all the characteristics: the maximum value gets 1, the minimum value gets
0, and the others get the proportion with respect to the minimum and maximum.

Fig. 3 Distribution of u_V and a_V as N grows

Fig. 4 General description of the values of GV labels

$$GV(x) = \frac{x - Minimum}{Maximum - Minimum} \qquad (1)$$

This fuzzification by normalization (Eq. 1) provides a unified view of all these
characteristics, allowing the median, average and standard deviation to be visualized
in the common graph presented in Fig. 4 (the minimum is always 0, while the maximum
is always 1).
Figure 4 is a graphical version of the information presented in Table 3. For example,
this graph allows us to note that P, u_V and a_V are similar in terms of the proportional
values of the median, average and standard deviation. These three characteristics vary a
lot; indeed, it may be observed that their standard deviations are greater than their
averages and medians. This does not happen for U, Near_P, Near_T, and Near_I, i.e.
the values of their medians and averages are greater than their standard deviations.
N shows a different pattern, because its standard deviation is greater than its median
but smaller than its average.
It is possible to put all the values of the GV labels in the same graph (see Fig. 5),
with the series sorted by the value of N in ascending order. We can see that


Fig. 5 Dispersion of the values of GV labels as N grows

in the smallest instances GV(P), GV(u V ) and GV(aV ) are smaller while the other
characteristics varies a lot. It is also worth noting that as N grows the values of
P, u V and aV also grows similarly, while GV(U ), GV(N ear P ), GV(N ear T ) and
GV(N ear I ) tend to be much closer around medium values.
Other comparison based on data-mining techniques allowed by the fuzzification
will be presented in Sect. 4.1.

3.2 Algorithms

A similar methodology may be applied to the performance of the algorithms. We take
from [6] the average performance over 30 independent runs of BPA (the most popular
algorithm for the OBOP) and LIA (the best algorithm in the experiments conducted in
[6] and the current state of the art for the OBOP) on the 50 instances. In addition,
we include in the comparison two other algorithms that were not used in [6]: the Borda
algorithm [10, 14] (without tie-breaking) and SortCC [21]. In the case of SortCC,
two different values of the parameter Beta (0.10 and 0.25) are used, based on the
results reported in [21]. It is worth mentioning that the results of Borda and SortCC on
these 50 instances are not available in the literature, so we executed them and present
their results in Table 4 (columns Borda, CC10 and CC25). In Table 4, the column ID
shows the identifier of each instance, i.e. the name of each "Election-Data" database
in PrefLib [22].
In spite of the fact that LIA achieves the best performance in 46 of the 50 instances,
it should be noted that CC25 is the best one in four instances, and CC10, Borda
and BPA are each the best one in one instance (in some instances there are ties in the
best position). In addition, as LIA is slower than the other algorithms [6], it is also
interesting to compare the others among themselves. If LIA is not taken into account,
BPA is the best in 23 instances, CC25 in 17, CC10 in 8, and Borda in 4.

Table 4 Results of the algorithms in the 50 instances used in [6]


ID Borda BPA LIA CC10 CC25
14–01 30.94 15.01 13.09 19.43 15.54
15–48 21.67 16.27 13 21.91 16.42
06–03 7.33 6.05 5.77 6.33 6.07
06–04 2.67 3.39 3 2.67 3.44
15–74 56.67 53.91 45.26 57.84 55.31
06–11 19.11 14.53 14.22 16.44 14.6
06–12 9 5.67 5.67 6.67 5.67
06–28 43.33 35.82 39.56 37.04 35.64
06–48 18.56 12.86 12.49 15.8 12.76
06–18 12.44 8 7.93 9.76 8.04
15–50 160.5 137.1 116.5 157.47 156.23
15–67 159 145.37 112.2 149.53 146.07
06–46 30.29 22.39 20.29 21.72 21.98
15–73 261.33 244.23 194.67 262.4 247.27
15–65 358.5 347.67 272.9 346.77 349.8
15–46 261 267.07 209.13 243.97 253.03
15–44 357 345.23 293.2 338.23 332.53
15–55 423 470.33 446.37 397.8 396.73
15–66 398.5 448 327.23 394.97 389.87
15–34 631.5 678.27 548.43 626.93 625.77
15–59 538 590.2 414.87 531.17 522.57
15–77 699.33 672.31 607.56 702.02 656.48
15–54 632.5 594.83 433.43 611.8 620.03
15–30 835.5 784.03 635.9 832.83 835.83
15–41 1153 1085.4 938.83 1195.03 1176.67
15–16 931 1002.5 749 931.93 931.73
15–57 1231.5 1063.4 884.7 1218.83 1216.87
15–69 1109 1244.23 894.4 1091 1097.03
15–19 1383.5 1489.27 1156.43 1388.7 1374.5
15–24 1497.5 1588.3 1236.27 1524.67 1517.37
15–27 1785.5 1805.67 1303.17 1789.47 1764.97
15–42 2081 1882.43 1503.43 1921.07 1948.77
15–12 1809 1921.17 1455 1789.47 1752.23
15–29 2041 2127.8 1565 1982.33 2009.6
15–07 2063 2216.5 1605.33 2047.83 2035.73
15–18 2508.5 2733.03 1902.5 2483.17 2466
15–25 2679.5 2694.8 2068.87 2640.8 2606.07
15–09 2448 2502.1 2042.4 2382.67 2417.97
15–20 3436.5 3136.8 2540.87 3484.67 3480.17
15–17 3246 3192.33 2544 3282.13 3311.73
15–40 3756 3534.9 2730 3715.97 3775.53
15–23 3973 3988.27 3156 3967.37 4010.57
15–32 4320.5 4694.6 3559.93 4337.07 4315.77
15–14 4887.5 5219.37 3822.27 4973.57 4903.13
15–01 8488.5 11890.47 8787.03 7825.23 7705.23
11–01 7489 6397.59 6058.5 7492.99 6669.47
15–02 7489 6511.65 6058.46 7492.96 6696.22
11–02 14845 13909.29 12545 14848.87 14396.77
15–04 14845 14209.03 12545 14848.8 14535.67
15–03 16219 13696.37 12233.87 15469.87 15364.07

Table 5 General description of the performance of the algorithms in the 50 instances used in [6]
Borda BPA LIA CC10 CC25
Min 2.67 3.39 3 2.67 3.44
Median 1020 1032.95 816.85 1011.47 1014.38
Average 2473.67 2433.12 2013.58 2438.76 2384.15
Max 16219 14209.03 12545 15469.87 15364.07
StdDev 3853.73 3678.34 3197.64 3784.17 3681.41

Observe that the inner complexity of each problem makes it harder to see the
differences among the performances of the algorithms. Indeed, as shown in Fig. 3,
the minimum value (u_V is a lower bound) and the maximum value (a_V is an upper
bound) for each instance increase as N grows in these 50 instances. Because of this
increasing complexity, the results seem very similar if we compute the average
performance (minimum distance) of each algorithm (see Table 5) or if we plot the
results of Table 4 (see Fig. 6, where the x-axis corresponds to the values of N in
ascending order).
As explained in the previous section, in order to ease the analysis we may
define fuzzy labels. For example, it is possible to define a label characterizing the
fulfillment of the adjective "the algorithm X performs well", which may be applied to
the previous algorithms. For each value corresponding to the performance of each
algorithm, a "Good Performance" (GP) label is defined by taking into account that
good performance corresponds to small values in the OBOP, because it is a
minimization problem.
As the maximum and minimum values are unknown for all the instances, the
utopian value (u_V) and the anti-utopian value (a_V) are used as extreme (super-optimal)
values. Then, the values are fuzzified as shown in Eq. 2.


Fig. 6 Performance of the algorithms as N grows

Table 6 Overall performance of the algorithms in terms of GP labels in the 50 instances


Borda BPA LIA CC10 CC25
Min 0.59 0.8 0.84 0.76 0.76
Median 0.87 0.86 0.92 0.87 0.88
Average 0.87 0.88 0.93 0.88 0.89
Max 1 1 1 1 1
StdDev 0.07 0.06 0.04 0.06 0.06

$$GP(x) = \frac{a_V - x}{a_V - u_V} \qquad (2)$$
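Both fuzzification functions (1) and (2) are simple normalizations, as the following sketch illustrates (the numeric values in the usage example are invented, not taken from Table 4):

```python
def great_value(x, minimum, maximum):
    """GV label of Eq. (1): min-max normalization of an instance characteristic."""
    return (x - minimum) / (maximum - minimum)

def good_performance(x, u_v, a_v):
    """GP label of Eq. (2): 1 when the algorithm reaches the utopian value u_v,
    0 at the anti-utopian value a_v (smaller distances are better in the OBOP)."""
    return (a_v - x) / (a_v - u_v)

# Illustrative values: an algorithm at distance 13.09 on an instance with
# assumed bounds u_v = 10 and a_v = 100 gets a high GP label.
print(good_performance(13.09, u_v=10.0, a_v=100.0))   # ≈ 0.966
```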

Just by using the GP labels of Eq. 2, the fuzzy version of Table 5 becomes more
meaningful (see Table 6), clarifying the overall advantage of LIA over the other
algorithms (see also Fig. 7). It may be observed that LIA is the only algorithm whose
median and average are greater than 0.9. It also has the minimum standard deviation,
i.e. it is the most stable algorithm.
The stability of LIA can also be observed if we plot the GP labels (see Fig. 8).
Moreover, comparing the performance of the algorithms in terms of the GP labels
(Fig. 8) makes the superior performance of LIA with respect to the other algorithms
more noticeable.
It is worth noting that the fuzzification method used to characterize the performance
of each algorithm does not depend on the set of algorithms considered; thus,
any future result of OBOP algorithms on these problems may be analyzed within
the same framework used here.
The previous fuzzification allows us to obtain the values of the fuzzy adjectives
(labels) GV(N), GV(U), GV(Near_P), GV(Near_T), GV(Near_I), GV(u_V), GV(a_V),
GV(P), GP(BPA), GP(Borda), GP(LIA), GP(CC10) and GP(CC25) for each OBOP
instance.


Fig. 7 General description of GP labels of each algorithm


Fig. 8 GP labels of each algorithm as N grows

In Sect. 4 these fuzzy labels will be used to obtain general knowledge
describing the problems, the performance of the algorithms and the relations among
them.

4 Results and Discussion

This section describes several patterns regarding the instances, the performance of
the algorithms and the relations among them. In order to study these three dimensions
(instance characteristics, algorithm performance, instance–algorithm relations),
we analyze the database composed of 13 columns (representing the values
of the previous 8 GV labels and the 5 GP labels) and 50 rows (representing the
instances), in order to obtain statistical measures (correlations), clusters (by using
Fuzzy C-Means [9]) and fuzzy predicates (by using FuzzyPred [11]).

4.1 Instances

By using Fuzzy C-Means [9], the 50 instances may be grouped (in terms of the
instance characteristics) into the following 5 clusters; a minimal sketch of the
clustering step is given after the list. Figure 9 shows the centers of each cluster. They
are called P clusters because they are obtained only by taking into account the
similarity in terms of the problem characteristics.

• Cluster P0 (10 instances): 14–01, 15–16, 15–34, 15–41, 15–48, 15–50, 15–57,
15–65, 15–73, 15–77
• Cluster P1 (14 instances): 15–07, 15–09, 15–12, 15–14, 15–17, 15–18, 15–20,
15–23, 15–25, 15–27, 15–29, 15–32, 15–40, 15–42
• Cluster P2 (8 instances): 06–03, 06–04, 06–11, 06–12, 06–18, 06–28, 06–46, 06–
48
• Cluster P3 (6 instances): 11–01, 11–02, 15–01, 15–02, 15–03, 15–04
• Cluster P4 (12 instances): 15–19, 15–24, 15–30, 15–44, 15–46, 15–54, 15–55,
15–59, 15–66, 15–67, 15–69, 15–74
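The following is a minimal, textbook sketch of the Fuzzy C-Means step (not the exact implementation of [9]); the data matrix X stands in for the 50 instances described by their GV labels:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain Fuzzy C-Means: X has shape (n_samples, n_features); returns the
    cluster centers and the fuzzy membership matrix U of shape (n_samples, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships of each sample sum to 1
    for _ in range(iters):
        W = U ** m                           # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))        # standard FCM membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Illustrative usage: cluster 50 instances described by 8 GV labels
# (random stand-in data; the real values would come from Sect. 3.1).
X = np.random.default_rng(1).random((50, 8))
centers, U = fuzzy_c_means(X, c=5)
print(U.argmax(axis=1))                      # crisp cluster of each instance
```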

In Fig. 9 it can be observed that cluster P2 includes the instances with the smallest
values of N, u_V, a_V, Near_T, Near_I and P, and with the maximum values of U and
Near_P (Fig. 9 presents the complements of U and Near_P to ease the visualization).
This implies that cluster P2 groups the easiest instances (smallest, with clear
precedences). Cluster P3 is the opposite one, containing the biggest instances with low
utopicity. Cluster P1 has intermediate values in terms of the number of items (N).
Finally, clusters P0 and P4 are similar in terms of N (rather small). However, P0
includes the instances with the lowest utopicity U and the biggest values of Near_I,
while P4 has the greatest values of utopicity and Near_P (the second greatest in these
aspects, only dominated by P2).
Another way to observe the relations among the characteristics of the instances is
by using the Pearson correlation coefficients between N and the other aspects (see
Table 7). It is interesting to note that the correlation between N and the other labels
is only significant with respect to u_V, a_V and P (also observable in Fig. 5). It is also


Fig. 9 Centers of the problem-based clusters obtained by Fuzzy C-Means



Table 7 Pearson correlations between the GV labels used to characterize the instances

        N      u_V    a_V    U      Near_P  Near_T  Near_I  P
N       1      0.94   0.96   −0.4   −0.38   0.39    0.27    0.95
u_V     0.94   1      0.97   −0.34  −0.34   0.4     0.21    1
a_V     0.96   0.97   1      −0.28  −0.27   0.29    0.18    0.98
U       −0.4   −0.34  −0.28  1      0.98    −0.7    −0.92   −0.32
Near_P  −0.38  −0.34  −0.27  0.98   1       −0.79   −0.89   −0.33
Near_T  0.39   0.4    0.29   −0.7   −0.79   1       0.41    0.38
Near_I  0.27   0.21   0.18   −0.92  −0.89   0.41    1       0.2
P       0.95   1      0.98   −0.32  −0.33   0.38    0.2     1

interesting to note the strong direct relation between U and N ear P , and the negative
relation between U and N ear I . In general, greatest values of N are not aligned with
extreme values of the others labels (U , N ear T , N ear I , N ear P ). In spite of that, there
is a slight tendency to increase N ear T and N ear I and to decrease U and N ear P
when N is large.
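A Table-7-style correlation matrix can be obtained directly with NumPy, as sketched below on stand-in data (the real columns would be the GV labels of the 50 instances):

```python
import numpy as np

labels = ["N", "u_V", "a_V", "U", "Near_P", "Near_T", "Near_I", "P"]
X = np.random.default_rng(2).random((50, len(labels)))  # stand-in GV columns
corr = np.corrcoef(X, rowvar=False)                     # 8 x 8 Pearson matrix
print(dict(zip(labels, np.round(corr[0], 2))))          # correlations of N
```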

4.2 Algorithms

By using Fuzzy C-Means [9], the 50 instances may also be grouped (in terms of the
similarity of the performance of the algorithms) into the following 3 clusters. Figure 10
shows the centers of each cluster. They are called A clusters because they are based
on the similarity in terms of the algorithm performance.

• Cluster A0 (10 instances): 06–03, 06–04, 06–11, 06–12, 06–18, 06–28, 06–46, 06–48, 11–01, 15–02.
• Cluster A1 (14 instances): 11–02, 14–01, 15–03, 15–04, 15–20, 15–25, 15–30, 15–40, 15–41, 15–48, 15–50, 15–57, 15–65, 15–77.
• Cluster A2 (26 instances): 15–01, 15–07, 15–09, 15–12, 15–14, 15–16, 15–17, 15–18, 15–19, 15–23, 15–24, 15–27, 15–29, 15–32, 15–34, 15–42, 15–44, 15–46, 15–54, 15–55, 15–59, 15–66, 15–67, 15–69, 15–73, 15–74.

Despite the fact that LIA seems to be the best algorithm, it is worth analyzing
the performance of the other algorithms. Based on the centers shown in Fig. 10,
cluster A0 is composed of the instances where all the algorithms perform best,
almost reaching the utopian value. On the contrary, cluster A2 is composed of the
instances where the performances of the algorithms are the furthest from the utopian
value; in this cluster the performance of LIA is comparatively better than the others,
followed by BPA (outperforming both CC versions and Borda). Finally, cluster A1 is
composed of the instances where both CC versions are the second best algorithms,
followed by Borda (BPA is the worst). This knowledge is very useful for the application case


Fig. 10 Centers of the algorithms-based clusters obtained by Fuzzy C-Means

Table 8 Coincidences of each P cluster with each A cluster

        A0           A1           A2           Total
P0      0 (0, 0)     7 (70, 50)   3 (30, 12)   10
P1      0 (0, 0)     3 (21, 21)   11 (79, 42)  14
P2      8 (100, 80)  0 (0, 0)     0 (0, 0)     8
P3      2 (33, 20)   3 (50, 21)   1 (17, 4)    6
P4      0 (0, 0)     1 (8, 7)     11 (92, 42)  12
Total   10           14           26           50

where the execution time is an important constraint. As LIA is slower than the others,
it is interesting to know when the other algorithms are preferable.

4.3 Instances Versus Algorithms

In order to obtain relations between the characteristics of the instances and the performance
of the algorithms, we first show the coincidences between the P clusters
(obtained in Sect. 4.1) and the A clusters (obtained in Sect. 4.2). Table 8 shows the
relationships between both sets of clusters. Each cell shows the number of cases
in which a problem-based cluster P and an algorithms-based cluster A coincide.
In parentheses, each value is shown as a percentage of the row total and of the
column total, respectively.
It is worth noting that in 100% of the instances of cluster P2 the performance of
the algorithms corresponds to cluster A0, while in 80% of the instances where the
performance of the algorithms corresponds to cluster A0 the instances belong
to cluster P2. This implies that the instances of type P2 are closely related with the

Table 9 Pearson correlations between the measures and the performance of the algorithms

          Borda   BPA     LIA     CC10    CC25    Ave
N         −0.2    −0.42   −0.47   −0.33   −0.37   −0.36
u_V       −0.23   −0.34   −0.45   −0.31   −0.32   −0.33
a_V       −0.12   −0.24   −0.33   −0.19   −0.2    −0.22
U         0.85    0.86    0.79    0.89    0.84    0.84
Near_P    0.93    0.83    0.78    0.92    0.85    0.86
Near_T    −0.8    −0.57   −0.55   −0.66   −0.65   −0.65
Near_I    −0.78   −0.79   −0.74   −0.86   −0.78   −0.79
P         −0.2    −0.31   −0.42   −0.28   −0.29   −0.3

performance of type A0, i.e. for the easiest instances (P2) all the algorithms behave
similarly (A0).
Also note the strong relation between other clusters:
• P4 and A2: small problems with great values of utopicity and Near_P, where the
advantage of LIA is very clear, followed by BPA.
• P1 and A2: problems of intermediate size (N), where the advantage of LIA is
very clear, followed by BPA.
• P0 and A1: small problems with lower utopicity and the largest values of Near_I, where
the advantage of LIA is clear, followed (in order) by both CC versions and Borda.
For problems of type P3 (the biggest problems, with low utopicity) there is no clear
tendency to belong to any cluster of algorithm performance.
Finally, Table 9 shows the correlation between each GV value (describing instance
characteristics) and the GP values (associated with algorithm performance). The
strong influence of U and Near_P on the performance of the algorithms is remarkable.
That is, the greater the values of GV(U) and GV(Near_P), the greater the GP labels
associated to all the algorithms (best performance). LIA is the algorithm with the least
dependence on Near_P, Near_T, Near_I and U, i.e., it is more stable with
respect to the precedences in the input matrices. Similarly, Near_T affects Borda
more negatively than the rest of the algorithms.
Based on Table 9 we can conclude that:
• U and Near_P are the characteristics that most positively influence the performance
of the algorithms (i.e., the greater the value of GV(U) or GV(Near_P), the
greater the GP labels).
• Near_T and Near_I are the aspects that most negatively influence the performance
of the algorithms. That is, the smaller the value of GV(Near_T) or GV(Near_I), the
greater the GP labels.
The following predicate generalizes this knowledge. The symbol “−” is used to
indicate the complement (negation).

IF GV(U) ∨ GV(Near_P) ∨ −GV(Near_T) ∨ −GV(Near_I)

THEN

GP(Borda) ∧ GP(BPA) ∧ GP(LIA) ∧ GP(CC10) ∧ GP(CC25)

As each instance has a membership value for each GV and GP label, it is
possible to evaluate the degree of membership of this predicate in each instance, and
its value in the whole database. This is the truth value of this predicate, i.e., the Fuzzy
Predicate Truth Value (FPTV). The FPTV [11] (in this case computed by using Zadeh
min/max functions [25] for conjunction/disjunction) in the 50 instances is 0.74, which
corresponds to the minimum value over the 50 instances (universal generalization of
this predicate in the set of instances). This pessimistic characteristic of the FPTV has
been studied in [11], and other quality measures for fuzzy predicates were introduced.
For example, the Fuzzy Predicate Support (FPS) is computed as the average of the
truth values over the instances. For the previous predicate, the FPS is 0.87. Both values
confirm the validity of this predicate, because it has truth values of at least 0.74 in
all the instances and its average truth value is 0.87.
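As a sketch of how these two measures can be computed, the snippet below evaluates the predicate per instance with Zadeh max/min [25]. Since the text only fixes the conjunction/disjunction operators, reading the IF-THEN as the disjunction (not antecedent) ∨ consequent is our assumption, and the membership arrays are hypothetical placeholders.

```python
# Hedged sketch of FPTV and FPS; each array holds, per instance, the
# membership to one label. The implication operator (1 - A) or B is an
# assumption: the chapter fixes Zadeh min/max only for AND/OR.
import numpy as np

def fptv_fps(gv_u, gv_nearp, gv_neart, gv_neari, gp_labels):
    antecedent = np.maximum.reduce([gv_u, gv_nearp, 1 - gv_neart, 1 - gv_neari])
    consequent = np.minimum.reduce(gp_labels)       # Zadeh conjunction of GP labels
    truth = np.maximum(1 - antecedent, consequent)  # assumed implication
    return truth.min(), truth.mean()                # FPTV (minimum), FPS (average)

rng = np.random.default_rng(0)                      # hypothetical memberships
labels = [rng.random(50) for _ in range(9)]
print(fptv_fps(*labels[:4], labels[4:]))
```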
As was previously stated, LIA is the algorithm with the best overall performance.
This is confirmed by the Friedman non-parametric test available in KEEL [2] (Friedman
statistic: 87.68, p-value: 0), which results in the following mean ranks: LIA (1.26),
CC25 (2.89), BPA (3.35), CC10 (3.53), and Borda (3.97). The advantage of LIA
with respect to the other algorithms was confirmed (p-value ≤ 0.05) after applying
the 1×N Holm post-hoc test [12]. By using the N×N Holm post-hoc test [12], the advantage
of LIA with respect to the other algorithms was also confirmed, as well as the advantage
of CC25 with respect to Borda, with p-values smaller than 0.05. The advantages in
the other pairwise comparisons of algorithms (CC25 with respect to BPA and CC10;
BPA with respect to CC10 and Borda; and CC10 with respect to Borda) were not
confirmed (p-values greater than 0.2).
Now we analyze the relation between this order (LIA, CC25, BPA, CC10 and
Borda) and the instance characteristics. In order to do so, the O(X,Y)
labels are introduced (Eq. 3) to describe the degree of advantage of algorithm X
with respect to algorithm Y, i.e., each O(X,Y) label means "algorithm X outperforms
algorithm Y". To obtain the value of O(X,Y), the difference between the performance
of both algorithms is computed (i.e., GP(X) − GP(Y)), and the value of O(X,Y) is
obtained by fuzzification of this difference according to Eq. 3. We consider that a
difference of 0.1 between the GP labels is enough to state the superior performance
of an algorithm.

O(X, Y) = 1                              if GP(X) − GP(Y) > 0.1
           0                              if GP(X) − GP(Y) < −0.1      (3)
           (GP(X) − GP(Y) + 0.1) / 0.2    otherwise
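A direct sketch of Eq. (3) follows, together with the CO and LW labels introduced below (Eqs. 4 and 5), taking min as the Zadeh conjunction; the GP values passed in at the end are illustrative.

```python
# Fuzzification of the pairwise advantage O(X, Y) of Eq. (3).
def o_label(gp_x, gp_y):
    d = gp_x - gp_y
    if d > 0.1:
        return 1.0
    if d < -0.1:
        return 0.0
    return (d + 0.1) / 0.2                          # linear ramp on [-0.1, 0.1]

def co(gp):   # "correct order" label, Eq. (4); gp maps algorithm -> GP value
    chain = [("LIA", "CC25"), ("CC25", "BPA"), ("BPA", "CC10"), ("CC10", "Borda")]
    return min(o_label(gp[x], gp[y]) for x, y in chain)

def lw(gp):   # "LIA wins" label, Eq. (5)
    return min(o_label(gp["LIA"], gp[y]) for y in ("CC25", "BPA", "CC10", "Borda"))

gp = {"LIA": 0.98, "CC25": 0.92, "BPA": 0.88, "CC10": 0.85, "Borda": 0.80}
print(co(gp), lw(gp))                               # one illustrative instance
```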

Table 10 FPTV and FPS of the O(X,Y), CO and LW labels

        O(LIA,CC25)  O(CC25,BPA)  O(BPA,CC10)  O(CC10,Borda)  CO
FPTV    0.38         0.27         0.05         0.43           0.05
FPS     0.71         0.52         0.52         0.53           0.42

        O(LIA,CC25)  O(LIA,BPA)   O(LIA,CC10)  O(LIA,Borda)   LW
FPTV    0.38         0.46         0.38         0.44           0.38
FPS     0.71         0.72         0.73         0.75           0.68

Table 11 Pearson correlations of the "Correct Order" (CO) and "LIA Wins" (LW) labels with the GV labels

      N      u_V    a_V    U      Near_P  Near_T  Near_I  P
CO    −0.34  −0.31  −0.3   0.42   0.33    −0.23   −0.31   −0.29
LW    0.09   −0.03  −0.1   −0.65  −0.68   0.52    0.61    −0.04

Based on the previous definition of the O(X,Y) labels, it is possible to compute
the value of the fuzzy labels O(LIA,CC25), O(CC25,BPA), O(BPA,CC10) and O(CC10,
Borda), which express the consecutive advantages based on the order LIA, CC25, BPA,
CC10 and Borda. With these four labels, it is possible to introduce the label "correct
order" (CO) to describe the accomplishment of the previous order in each instance,
as presented in the predicate of Eq. 4.

CO = O(LIA, CC25) ∧ O(CC25, BPA) ∧ O(BPA, CC10) ∧ O(CC10, Borda)   (4)
In a similar way, the "LIA Wins" (LW) predicate is introduced to describe the
accomplishment of the advantage of LIA over the other algorithms. The LW
label is defined in Eq. 5.

LW = O(LIA, CC25) ∧ O(LIA, BPA) ∧ O(LIA, CC10) ∧ O(LIA, Borda)   (5)
The truth value (FPTV) and fuzzy support (FPS) of CO and LW (and the inner
components of both) over the 50 instances are presented in Table 10. It should be
noted that LW values are greater than CO values. This means that the superiority
of LIA over the other algorithms is clearer than the order (LIA, CC25 , BPA, CC10 ,
Borda).
The Pearson correlations of the truth values of CO and LW with respect to the GV
labels are presented in Table 11.
Based on these correlations, it may be observed that the order defined by CO is
positively influenced by Near_P and U, but negatively affected by the other aspects.
In general, this means that this order is clearer when the instances are smaller (N),
almost utopian (U), and with a majority of precedences near to 0 or 1. These are the
simplest instances.

Table 12 Examples of predicates describing the database with GP and GV labels

Predicate                                            FPTV   FPS
−GV(Near_P) ∨ GP(CC25) ∨ GP(LIA)                     0.84   0.93
−GV(Near_I) ∨ GP(CC25) ∨ GP(LIA)                     0.84   0.93
GP(CC10) ∨ −GP(CC25) ∨ GP(LIA)                       0.84   0.93
(−GV(N) ∧ GP(CC10)) ∨ GP(CC25) ∨ GP(LIA)             0.84   0.93
(GV(Near_P) ∧ GP(Borda)) ∨ GP(CC25) ∨ GP(CC10)       0.76   0.89

On the other hand, the advantage of LIA (based on the LW label) is reinforced
when the utopicity U and Near_P decrease (negative correlations), and when Near_T
and Near_I increase, i.e., in the most difficult instances. The influence of N, u_V, a_V
and P on the advantage of LIA is almost zero.
Dependencies between instance characteristics and the performance of the algorithms
may also be obtained by applying other data-mining methods over the 50 instances.
By using FuzzyPref [11], several predicates with high values of FPTV and FPS were
obtained. They are presented in Table 12.
For example, the first two predicates state that (in the instances) it is true that LIA
or CC25 achieve a good performance, or the values of Near_P or Near_I are small.
The third predicate states that LIA or CC10 achieve a good performance or CC25
does not achieve a good performance. The fourth predicate is similar to the first
two, but the alternative to the good performance of LIA or CC25 is that
CC10 achieves a good performance and the number of items N is small. In general,
the meaning of these predicates is that LIA, CC25 and CC10 are complementary,
guaranteeing a good performance of at least one of them in most instances. In the instances
where this does not happen, the values of Near_P, Near_I or N must be small. The last
predicate states that the performance of CC25 or CC10 is good, or Borda performs
well and Near_P is large. This implies that one of these three algorithms (CC25, CC10
and Borda) performs well in each instance. In general, it is worth noting that most of
the predicates include GP(LIA), which confirms the good overall performance of
LIA.
It is also possible to obtain (by using Fuzzy C-Means) a clustering that describes
all the GV and GP values, which results in three clusters with the centers that are
shown in Fig. 11.
The first cluster PA0 contains 8 instances (06–03, 06–04, 06–11, 06–12, 06–28,
14–01, 15–48 and 15–74) with the lowest values of N, u_V, a_V, Near_T, Near_I and
P, and the largest values of U and Near_P; these are the simplest instances, where
all the algorithms obtain a very good performance.


Fig. 11 Centers of the clusters obtained by Fuzzy C-Means with all the GV and GP labels

The second cluster PA1 contains 6 instances (06–18, 06–46, 06–48, 15–65, 15–67
and 15–73) with the highest values of N, u_V, a_V, Near_T, Near_I and P, and the
smallest values of U and Near_P; these are the most complex instances, where the
performance of all the algorithms is not as good as in PA0, thus enlarging the advantage
of LIA over the other algorithms.
The cluster PA2 contains the remaining 36 instances, with intermediate values of
all the instance characteristics between those of PA0 and PA1 (closer to PA0 in terms
of N, u_V, a_V and P; closer to PA1 in terms of U and the distribution of values in the
matrices, i.e., Near_P, Near_T and Near_I). In PA2 the performance of the algorithms is
similar to the performance in PA1, with a slightly better performance of LIA.
In general, it may be stated that in the simplest instances all the algorithms behave
similarly well, but in the most complex instances, with larger dimensions (in terms
of N, u_V, a_V and P) and where the distribution of the matrix values is biased toward
more uncertain values (the greatest values of Near_I and Near_T, and smaller values of Near_P
and U), the performance of LIA is clearly superior to that of the other
algorithms.

5 Conclusions

In this work, fuzzy logic concepts are used to analyze the performance of several
OBOP algorithms and to derive relations between these results and the characteristics
of a given instance.
In particular, we introduce several fuzzy labels to describe the characteristics
of the instances and the performance of the algorithms. Then, these fuzzy labels
are used as input to several data-mining methods. Based on the data-mining
models obtained, we can state that:

• The utopicity makes the OBOP instances easier.
• The percentage of precedences that are near to 0 or 1 makes the OBOP instances
easier.
• When the utopicity is smaller, the problem becomes harder.
• When the precedences are far from 0 or 1, the problem becomes harder.
• When the utopicity is smaller and the precedences are far from 0 or 1, the advantage
of LIA (the state-of-the-art algorithm for the OBOP) is greater.

Based on these results, several recommendations for future work may be derived:
• It would be interesting to provide algorithms to deal with OBOP instances with
small values of utopicity and with a majority of precedences far from 0 and 1.
• Based on the characteristics of each instance, a meta-algorithm may be designed
that first identifies the characteristics of the instance and then recommends and
uses the most appropriate algorithm for each particular case.
• The same fuzzy-logic-based methodology used in this work may be applied to
derive conclusions about the characteristics of the instances and the performance
of the algorithms in other optimization problems.

References

1. Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clus-
tering. J. ACM 55:5, 23:1–23:27 (2008)
2. Alcalá, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: KEEL data-
mining software tool: data set repository, integration of algorithms and experimental analysis
framework. J. Mult. Valued Logic Soft Comput. 17, 255–287 (2010)
3. Aledo, J. A., Gámez, J. A., Molina, D., Rosete, A.: Consensus-based journal rankings: a com-
plementary tool for bibliometric evaluation. J. Assoc. Inf. Sci. Technol. (2018). http://dx.doi.
org/10.1002/asi.24040
4. Aledo, J.A., Gámez, J.A., Rosete, A.: Approaching rank aggregation problems by using evo-
lution strategies: the case of the optimal bucket order problem. European J. Oper. Res. (2018).
http://dx.doi.org/10.1016/j.ejor.2018.04.031
5. Aledo, J.A., Gámez, J.A., Molina, D.: Using extension sets to aggregate partial rankings in a
flexible setting. Appl. Math. Comput. 290, 208–223 (2016)
6. Aledo, J.A., Gámez, J.A., Rosete, A.: Utopia in the solution of the Bucket Order Problem.
Decis. Support Syst. 97, 69–80 (2017)
7. Ali, A., Meila, M.: Experiments with kemeny ranking: what works when? Math. Social Sci.
64, 28–40 (2012)
8. Bailey, R.W.: The number of weak orderings of a finite set. Soc. Choice Welf. 15(4), 559–562
(1998)
9. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New
York, NY (1981)
10. Borda, J.: Memoire sur les Elections au Scrutin. Histoire de l’Academie Royal des Sciences
(1781)
11. Ceruto, T., Lapeira, O., Rosete, A.: Quality measures for fuzzy predicates in conjunctive and
disjunctive normal form. Ingeniería e Investigación 3(4), 63–69 (2014)

12. Derrac, J., García, S., Molina, D., Herrera, F.: A practical tutorial on the use of nonpara-
metric statistical tests as a methodology for comparing evolutionary and swarm intelligence
algorithms. Swarm Evol. Comput. 1(1), 3–18 (2011)
13. Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In:
Proceedings of the 10th International Conference on World Wide Web, WWW ’01, pp. 13–22.
ACM (2001)
14. Emerson, P.: The original Borda count and partial voting. Soc. Choice Welf. 40(2), 353–358
(2013)
15. Fagin, R., Kumar, R., Mahdian, M., Sivakumar, D., Vee, E.: Comparing and Aggregating
Rankings with Ties. In: PODS 2004, pp. 47–58. ACM (2004)
16. Feng, J., Fang, Q., Ng, W.: Discovering bucket orders from full rankings. In: Proceedings of
the 2008 ACM SIGMOD International Conference on Management of Data, pp. 55–66. ACM
(2008)
17. Fürnkranz, J., Hüllermeier, E.: Preference learning: an introduction. In: Fürnkranz, J., Hüller-
meier, E. (eds.), Preference Learning, pp. 1–17. Springer (2011)
18. Gionis, A., Mannila, H., Puolamäki, K., Ukkonen, A.: Algorithms for discovering bucket orders
from data. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’06, pp. 561–566. ACM (2006)
19. Hüllermeier, E.: Does machine learning need fuzzy logic? Fuzzy Sets Syst. 281, 292–299
(2015)
20. Kemeny, J.G., Snell, J.L.: Mathematical Models in the Social Sciences. Blaisdell, New York
(1962)
21. Kenkre, S., Khan, A., Pandit, V.: On Discovering bucket orders from preference data. In:
Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 872–883. SIAM
(2011)
22. Mattei, N., Walsh, T.: PrefLib: a library for preferences. http://www.preflib.org. In: Perny,
P., Pirlot, M., Tsoukiàs, A. (eds.) Proceedings of the Third International Conference on
Algorithmic Decision Theory, ADT 2013, pp. 259–270. Springer (2013)
23. Nápoles, G., Dikopoulou, Z., Papageorgiou, E., Bello, R., Vanhoof, K.: Prototypes construction
from partial rankings to characterize the attractiveness of companies in Belgium. Appl. Soft
Comput. 42, 276–289 (2016)
24. Ukkonen, A., Puolamäki, K., Gionis, A., Mannila, H.: A randomized approximation algorithm
for computing bucket orders. Inf. Process. Lett. 109(7), 356–359 (2009)
25. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965)
Uncertain Production Planning
Using Fuzzy Simulation

Juan Carlos Figueroa-García, Eduyn-Ramiro López-Santana


and Germán-Jairo Hernández-Pérez

Abstract Some industrial problems lack statistical information, so third-party
information (experts, surveys, etc.) is often used for planning. This chapter presents
a method for simulating a production planning scenario where tasks have no probabilistic
execution times, using expert opinions. We use fuzzy execution times to
simulate the mean flow time of the system under non-probabilistic uncertainty.

1 Introduction and Motivation

Production planning is a critical activity in manufacturing since it provides vital
information for logistic planning. Most available planning techniques are based
on deterministic/statistical methods, so the lack of available/reliable statistical data
leads to the use of information coming from experts as a reliable source.
The lack of statistical information is a frequent problem in simulation models, and
sometimes information coming from experts is used instead. Expert opinions/perceptions
can be handled using fuzzy sets in order to deal with the uncertainty coming from human-
like information regarding words or concepts. Fuzzy sets are widely used to compute
functions and in optimization, differential equations, etc., so we attempt to use
them in simulation systems.
The chapter is organized as follows: Sect. 1 shows the introduction. Section 2
introduces some fuzzy random variable generation concepts; in Sect. 3, the simulation
model of a production planning scenario and its results are presented, and Sect. 4
presents the concluding remarks of the study.

J. Carlos Figueroa-García (B) · E.-R. López-Santana


Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
e-mail: jcfigueroag@udistrital.edu.co
E.-R. López-Santana
e-mail: erlopezs@udistrital.edu.co
G.-J. Hernández-Pérez
Universidad Nacional de Colombia, Bogotá Campus, Colombia
e-mail: gjhernandezp@gmail.com

© Springer Nature Switzerland AG 2019


R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_4

2 Basics on Fuzzy Random Variable (FRV) Generation

Firstly, we establish basic notation. P(X) is the class of all crisp sets, and F(X)
is the class of all fuzzy sets. A fuzzy set A : X → [0, 1] is defined on a universe of
discourse X and is characterized by a membership function μ_A(x) ∈ [0, 1]. A fuzzy
set A can be represented as the set of ordered pairs (x, μ_A(x)), i.e.,

A = {(x, μ_A(x)) | x ∈ X}.   (1)

A fuzzy number (see Bede [1], Diamond and Kloeden [3]) is defined as follows:

Definition 1 Consider a fuzzy subset of the real line A : R → [0, 1]. Then A is a
fuzzy number (FN) if it satisfies the following properties:
(i) A is normal, i.e., ∃x′ ∈ R such that A(x′) = 1;
(ii) A is α-convex, i.e., A(αx + (1 − α)y) ≥ min{A(x), A(y)}, ∀α ∈ [0, 1];
(iii) A is upper semicontinuous on R, i.e., ∀ε > 0 ∃δ > 0 such that

A(x) − A(x′) < ε whenever |x − x′| < δ.

Let us denote by G(R) ⊂ F(X) the class of all FNs, which includes Gaussian,
triangular, exponential, etc. The α-cut of a set A ∈ G(R), namely αA, is the set of
values with a membership degree greater than or equal to α, i.e.,

αA = {x | μ_A(x) ≥ α} ∀x ∈ X,   (2)

αA = [inf_x αμ_A(x), sup_x αμ_A(x)] = [Ǎα, Âα].   (3)

Varón-Gaviria et al. [9] and Pulido-López et al. [8] proposed a method for generating
random variables using μ_A and its α-cuts. First, the area of a fuzzy number is
defined next:

Definition 2 Let A ∈ G(R) be a fuzzy number with left branch l(x) and right branch r(x); then its area Λ is as follows:

Λ = Λ1 + Λ2 = ∫_{x∈R} l(x) dx + ∫_{x∈R} r(x) dx   (4)

Definition 3 Let Λ1, Λ2 be the partial areas of A ∈ G(R). Then the normalized
areas λ1, λ2 of A ∈ G(R) are defined as follows:

λ1 = Λ1 / Λ,   (5)
λ2 = Λ2 / Λ,   (6)
λ1 + λ2 = 1.   (7)

Fig. 1 Gaussian fuzzy set A

Definition 4 Let A ∈ G (R) be symmetric, then the following properties hold:

Λ1 = Λ2 = 0.5Λ, (8)
λ1 = λ2 = 0.5, (9)
| Ǎα − c| = | Âα − c| → Âα = Ǎα + 2(c − Ǎα ), (10)

where c is the core value of A i.e. μ A (c) = 1.

The proposed method is summarized in Algorithm 1 and Fig. 1.

Algorithm 1 α-cut based method

Require: μ_A ∈ G(R) (see Eq. (3))
  Compute λ1 and λ2 using Definitions 2 and 3
  Generate two uniform random numbers U1[0, 1] and U2[0, 1]
  Set α = U1
  Compute αA = [Ǎα, Âα]
  If U2 ≤ λ1 then x = Ǎα, otherwise set x = Âα
  return (α, x) as the realization of X(ω) with membership α

Probabilistic random variable generation uses the cumulative probability function
F(x) to return X(ω) from a random number U[0, 1] (see Devroye [2], and Law and
Kelton [6]). Varón-Gaviria et al. [9] and Pulido-López et al. [8] instead used μ_A to compute
the areas λ1, λ2 and then two random numbers U1, U2: U1 to compute αA and
U2 to select either Ǎα or Âα as the random realization X(ω), where ω ranges over the set of all
possible realizations of X, often known as the universal set of X.
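The following Python sketch mirrors Algorithm 1 for an arbitrary fuzzy number; the caller supplies λ1 and the α-cut bounds, so the function itself is shape-agnostic (the triangular example at the end is our own illustration).

```python
# Shape-agnostic sketch of the alpha-cut based method (Algorithm 1).
import random

def alpha_cut_frv(lambda1, alpha_cut):
    """Return one realization (alpha, x).

    lambda1   : normalized left area of the fuzzy number (Definition 3).
    alpha_cut : callable alpha -> (A_check, A_hat), the alpha-cut bounds.
    """
    u1 = 1.0 - random.random()                 # U1 in (0, 1]
    u2 = random.random()                       # U2 in [0, 1)
    lo, hi = alpha_cut(u1)                     # set alpha = U1
    x = lo if u2 <= lambda1 else hi            # pick the left or right branch
    return u1, x

# Example with a triangular FN T(a=1, c=4, b=5): lambda1 = (c-a)/(b-a)
tri_cut = lambda a: (1 + a * (4 - 1), 5 - a * (5 - 4))
print(alpha_cut_frv(lambda1=(4 - 1) / (5 - 1), alpha_cut=tri_cut))
```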

3 A Production Planning Scenario

We simulate a production planning scenario where task execution times are defined
by experts (uncertain production planning has been covered by Mula et al. [7], and
Lan and Zhao [5]). This way, we generate fuzzy random variables (see Varón-Gaviria
et al. [9] and Pulido-López et al. [8]) to simulate execution times, the Mean Flow
Time (MFT), its membership degree and its overall performance.
A company processes five products Pi in five stages Sj using a path Ri; all products
start by being coded in a warehouse W. All paths are as follows:
• R1 :W → S2 → S1 → S4 → S5
• R2 :W → S3 → S4 → S2 → S5
• R3 :W → S2 → S3 → S4 → S5
• R4 :W → S2 → S1 → S3 → S4 → S5
• R5 :W → S1 → S3 → S2 → S5 → S4
For the sake of understanding, every path Ri consists of a set of stages (i, j), the
ordering relation given above, a processing time p_ij, and a starting instant t_ij. The
main problem is the lack of reliable statistical data, so we only have expert-based
information (a.k.a. third-party sources).
The goal is to characterize the mean flow time of the system, namely MFT_i,
defined as the time at which a product is finished and released to the customer:

MFT_i = t_{ij′} + p_{ij′}   (11)

where j′ is the last processing stage of the path i.


Another performance measure is the production time PT per product i, PT_i:

PT_i = Σ_{j∈Nn} p_{ij}   (12)

and the waiting time WT per product i, WT_i, is:

WT_i = MFT_i − PT_i   (13)
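A small sketch of Eqs. (11)-(13) for one simulated product follows; the starting instants and processing times below are made-up values along route R1.

```python
# Flow metrics of Eqs. (11)-(13) for one product, given the starting
# instants t_ij and processing times p_ij along its route (in order).
def flow_metrics(start, proc):
    mft = start[-1] + proc[-1]     # Eq. (11): finish time of the last stage j'
    pt = sum(proc)                 # Eq. (12): total production time
    wt = mft - pt                  # Eq. (13): waiting time
    return mft, pt, wt

# Hypothetical times for the five stages visited by route R1
print(flow_metrics(start=[0.0, 0.6, 4.0, 9.5, 12.0],
                   proc=[0.5, 3.0, 5.0, 2.0, 4.0]))
```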

Now, the experts at every stage (workers, engineers, etc.) were asked about their
opinions on the processing times per product/stage and their shapes. Every expert has a
different perception of the processing times at every station, so they use different
membership functions to represent their knowledge about every processing time.
The shapes and fuzzy random variable generators X(ω) of every fuzzy processing
time p_ij were proposed by Varón-Gaviria et al. [9] and Pulido-López et al. [8].
Gaussian fuzzy random variables G(c, δ) are shown next:

μ_A(x) = exp(−0.5((x − c)/δ)²), ∀x ∈ (−∞, ∞),
Ǎα = c − √(−2 · ln(α)) · δ,
Âα = c + √(−2 · ln(α)) · δ,

and the generation procedure for a Gaussian fuzzy number, given U1, U2, is:

X(ω) = c − √(−2 · ln(U1)) · δ, for U2 ≤ 0.5,
       c + √(−2 · ln(U1)) · δ, for U2 > 0.5.   (14)

The equations to generate triangular fuzzy random variables T(a, c, b) are:

μ_A(x) = max( min( (x − a)/(c − a), (b − x)/(b − c) ), 0 ), ∀x ∈ [a, b],
Ǎα = α(c − a) + a,
Âα = b − α(b − c),

and the generator for triangular fuzzy numbers, given U1, U2, is:

X(ω) = U1(c − a) + a,   if U2 ≤ (c − a)/(b − a),
       b − U1(b − c),   if U2 > (c − a)/(b − a).   (15)

The equations to generate exponential fuzzy random variables e(θ) are:

μ_A(x) = exp(−x/θ),
Âα = −θ · ln(α),

and the generator for exponential fuzzy numbers, given U1, is:

X(ω) = −θ · ln(U1).   (16)
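A minimal sketch of the three generators of Eqs. (14)-(16) follows; drawing U1 from (0, 1] to keep the logarithms finite is a small implementation detail not spelled out in the equations.

```python
# Fuzzy random variable generators of Eqs. (14)-(16).
import math, random

def gaussian_frv(c, delta):                      # Eq. (14), G(c, delta)
    u1, u2 = 1.0 - random.random(), random.random()
    step = math.sqrt(-2.0 * math.log(u1)) * delta
    return c - step if u2 <= 0.5 else c + step   # lambda1 = 0.5 by symmetry

def triangular_frv(a, c, b):                     # Eq. (15), T(a, c, b)
    u1, u2 = random.random(), random.random()
    if u2 <= (c - a) / (b - a):                  # lambda1 = (c - a)/(b - a)
        return u1 * (c - a) + a                  # left branch, alpha = U1
    return b - u1 * (b - c)                      # right branch

def exponential_frv(theta):                      # Eq. (16), e(theta)
    return -theta * math.log(1.0 - random.random())

# One draw of the P1 processing times of Table 1 (stages W, S1, S2, S4, S5)
print(exponential_frv(0.5), gaussian_frv(3, 1), exponential_frv(5),
      exponential_frv(2), triangular_frv(1, 4, 5))
```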

All information is summarized in Table 1.


The MFT_i is a function of FNs, namely M(MFT_i); then it is computed via the fuzzy
extension principle, as follows:

M(MFT_i) = sup_{MFT_i} min_j { p_{ij′} },   (17)

where j′ is the last processing stage of the path i, and the membership degree for
PT_i, namely P(PT_i), is as follows:

P(PT_i) = sup_{PT_i} min_j { p_{ij} }.   (18)
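In a simulation, each run yields an MFT realization together with a membership degree (the minimum α over the fuzzy draws along the path); one hedged way to approximate the sup-min of Eq. (17) from such output is to bin the MFT values and keep the maximum α per bin, as sketched below.

```python
# Hedged sketch: empirical membership of M(MFT) from simulation output.
# `runs` is assumed to be a list of (alpha, mft) pairs, where alpha is the
# minimum membership over the fuzzy draws of the run (the min of Eq. 17).
def estimate_membership(runs, n_bins=40):
    values = [v for _, v in runs]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    mu = {}                                       # bin center -> membership
    for alpha, v in runs:
        k = min(int((v - lo) / width), n_bins - 1)
        center = lo + (k + 0.5) * width
        mu[center] = max(mu.get(center, 0.0), alpha)   # sup over realizations
    return dict(sorted(mu.items()))
```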

Table 1 Parameters and shapes of each fuzzy production time p_ij (a dash means the product does not visit that stage)

(i, j)  W       S1           S2      S3            S4      S5
P1      E(0.5)  G(3, 1)      E(5)    —             E(2)    T(1, 4, 5)
P2      E(0.5)  —            E(5)    T(2, 4, 6)    E(2)    T(1, 2.5, 5)
P3      E(0.5)  —            E(4.5)  T(2, 3, 5)    E(2.5)  T(2, 3, 6)
P4      E(0.5)  G(3.5, 1.5)  E(6)    T(1, 4, 6)    E(4)    T(3, 7, 8)
P5      E(0.5)  G(6, 1.5)    E(5)    T(8, 11, 15)  E(11)   T(3, 6, 8)

Fig. 2 MFT and PT for P1

The simulation was run in ProModel (see Harrell [4]), and we performed 12
runs of 196 h each, which corresponds to a full operation year. The resultant fuzzy
sets M(MFT_i) and P(PT_i) for product 1 are shown in Fig. 2.
Note that PT1 seems closer to a convex fuzzy set while MFT1 does not. The
reason lies in the processing itself: every product Pi has to wait until its predecessors
are processed at every stage j. Those waiting times are uncertain and add
uncertainty to MFT1, which is reflected in its behavior. This actually means that even if

Fig. 3 MFT, PT and WT as time series for P1



Table 2 Descriptive measures per product


Product P1 P2 P3 P4 P5
Mean(MFT) 29.836 31.948 31.903 44.213 48.159
Var(MFT) 165.113 191.672 172.440 178.514 158.676
min MFT 8.4 8.32 10.65 20 10.92
K(MFT) 2.268 1.957 2.108 0.722 3.121
Skw(MFT) 1.324 1.278 1.295 0.880 1.377
max MFT 83.05 89.34 92.79 94 109.31
Mean(PT) 13.671 14.041 14.331 23.449 30.911
Var(PT) 28.677 24.997 25.428 50.353 34.954
min PT 4.44 5.22 5.79 10.56 9.08
K(PT) 3.927 1.784 2.501 2.295 1.384
Skw(PT) 1.521 1.210 1.299 1.264 0.768
max PT 44.52 35.86 42.34 58.68 59.25
Mean(WT) 16.165 17.907 17.573 20.763 17.248
Var(WT) 150.651 178.087 162.663 165.587 149.554
min WT 0.12 0.01 0.33 0.04 0.07
K(WT) 2.776 2.741 2.444 1.302 3.801
Skw(WT) 1.508 1.510 1.445 1.098 1.709
max WT 66.32 77.79 78.1 75.63 77.31

processing times p_ij are known, the waiting times at every stage add uncertainty
to the total time in the system, which is a natural consequence of multiple products
being processed at common stages. A similar behavior can be seen for all products
(see Figs. 4 and 5 in the Appendix).
Figure 3 shows time series of MFT1, PT1 and WT1 for the first 1000 runs of
the simulated experiments. It is interesting to see that MFT1 and WT1 seem to be
close to each other, so we can infer that MFT1 is sensitive to WT1 (the same holds for the
remaining products, as shown in Figs. 6, 7 and 8 in the Appendix).
On the other hand, all MFT, PT and WT series actually show a random behavior according
to the ARCH test, for which no significance was found in any series. Runs and
turning-point tests were not performed since MFT, PT and WT strongly depend on their
predecessors, so they would reject the test.
Table 2 shows the average (Mean), variance (Var), min, max, kurtosis (K), and
skewness (Skw) of the MFT, PT and WT of every product. Note that every product
shows a mixed performance, which is a clear sign of the goodness of the proposed
method since it does not produce uniform results but non-uniform values; this is
highly desirable in simulation systems in order to cover unexpected events and see
their effect on the system.

We performed a Friedman test to compare all runs, and we found no significant differences
among them (p-value = 0.617). To compare the performance of all products
we performed the ANOVA and Levene tests, and we found significant evidence
that not all means (p-value ≈ 0) and variances (p-value ≈ 0) are equal. This
means that the results of all runs are statistically similar while the processing times PT_i
of the products are different. Finally, we performed an ARCH test for every product (MFT_i, PT_i and
WT_i) using 5 lags, and we found no heteroscedasticity effect.
It is clear that the system has a performance conditioned on every product, its
path, and the fuzzy uncertainty, but the proposed simulation methodology produces a
non-uniform performance, which is expected from mixed fuzzy random generation.

4 Concluding Remarks

We have applied the fuzzy random variable generation method proposed by Varón-
Gaviria et al. [9] and Pulido-López et al. [8] to a production planning scenario with
successful results. All MFTi , PTi and WTi were simulated and modeled as fuzzy
sets, and some convex/nonlinear behaviors were seen.
When analyzing MFT_i, PT_i and WT_i as time series, we can see that they show
a random behavior (no ARCH effect is present), which is one of our goals: to involve
fuzzy uncertainty in simulation systems.
The interaction among different products in all stations causes differences between
MFT_i and PT_i, which adds uncertainty to MFT_i as a consequence of WT_i at
every stage. While PT_i is a pure fuzzy function, MFT_i involves t_{ij′} and WT_i,
which are complex to characterize individually.
The perception of experts can be used in discrete event simulation problems
where no statistical information is available/reliable, with satisfactory results, using
our proposal, which is able to deal with any shape of a fuzzy number.
Finally, some interesting topics to be covered in the future are: (i) simulation of
fuzzy logic systems, (ii) complexity analysis of our proposal, (iii) comparison to
statistical approaches, and (iv) extensions to Type-2 fuzzy sets.

Acknowledgements The authors would like to thank to Prof. Miguel Melgarejo and Prof. José
Jairo Soriano for their invaluable discussion around all topics treated in this chapter, and a special
thanks is given to all members of the LAMIC Research Group.

Appendix

In this appendix we present the results of MFTi and PTi and the results of 1000 runs
of the simulation model (Figs. 4, 5, 6, 7 and 8).

Fig. 4 MFT for P2 , P3 , P4 , P5



Fig. 5 PT for P2 , P3 , P4 , P5

Fig. 6 MFT as time series for P2 , P3 , P4 , P5



Fig. 7 PT as time series for P2 , P3 , P4 , P5



Fig. 8 WT as time series for P2 , P3 , P4 , P5



References

1. Bede, B.: Mathematics of Fuzzy Sets and Fuzzy Logic. Springer (2013)
2. Devroye, L.: Non-uniform Random Variate Generation. Springer, New York (1986)
3. Diamond, P., Kloeden, P.: Metric topology of fuzzy numbers and fuzzy analysis. Fundamentals
of Fuzzy Sets. 7 (2000)
4. Harrell, C.: Simulation using ProModel, 3rd ed. McGraw-Hill (2012)
5. Lan, Y., Zhao, R.: Minimum risk criterion for uncertain production planning problems. Int. J.
Prod. Econ. 61(3), 591–599 (2011)
6. Law, A., Kelton, D.: Simulation Modeling and Analysis. Mc Graw Hill (2000)
7. Mula, J., Poler, R., García-Sabater, J., Lario, F.: Models for production planning under uncer-
tainty: a review. Int. J. Prod. Econ. 103(1), 271–285 (2006)
8. Pulido-López, D.G., García, M., Figueroa-García, J.C.: Fuzzy uncertainty in random variable
generation: a cumulative membership function approach. Commun. Comput. Inf. Sci. 742(1),
398–407 (2017)
9. Varón-Gaviria, C.A., Barbosa-Fontecha, J.L., Figueroa-García, J.C.: Fuzzy uncertainty in ran-
dom variable generation: an α-cut approach. LNCS 10363(1), 1–10 (2017)
Fully Fuzzy Linear Programming Model
for the Berth Allocation Problem with
Two Quays

Flabio Gutierrez, Edwar Lujan, Rafael Asmat and Edmundo Vergara

Abstract In this work, we study the berth allocation problem (BAP), considering
the continuous and dynamic cases for two quays; also, we assume that the arrival
times of vessels are imprecise, meaning that vessels can be late or early up to an allowed
threshold. Triangular fuzzy numbers represent the imprecision of the arrivals. We
present two models for this problem: the first model is a fuzzy MILP (Mixed Integer
Linear Programming) model and allows us to obtain berthing plans with different degrees of
precision; the second one is a Fully Fuzzy Linear Programming (FFLP) model and
allows us to obtain a fuzzy berthing plan adaptable to possible incidences in the vessel
arrivals. The proposed models have been implemented in CPLEX and evaluated on a
benchmark developed to this end. For both models, with a timeout of 60 min, CPLEX
finds the optimum solution for instances of up to 10 vessels; for instances between 10
and 65 vessels it finds a non-optimum solution, and for bigger instances no solution
is found. Finally, we suggest the steps to be taken to implement the FFLP model for
the BAP in a maritime container terminal.

1 Introduction

The maritime transport of containers continues to increase, mainly because of the
ease of carrying the goods as well as the large quantity of containers that vessels can
transport. During the year 2016, for instance, around 701 420 047 TEUs (Twenty-foot

F. Gutierrez (B)
Department of Mathematics, National University of Piura, Piura, Peru
e-mail: flabio@unp.edu.pe
E. Lujan (B)
Department of Informatics, National University of Trujillo, Trujillo, Peru
e-mail: elujans@unitru.edu.pe
R. Asmat (B) · E. Vergara (B)
Department of Mathematics, National University of Trujillo, Trujillo, Peru
e-mail: rasmat@unitru.edu.pe
E. Vergara
e-mail: evergara@unitru.edu.pe
© Springer Nature Switzerland AG 2019
R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_5

Equivalent Unit) have moved all over the world; at present, China leads this type of
transport with 199 565 501 TEUs, followed by the United States with 48 381 723
TEUs, according to UNCTAD [18].
Port terminals that handle containers are usually known as Maritime Container
Terminals (MCTs); they have different shapes and dimensions, and some of them have
several quays. Since an MCT is an open system with three distinguishable areas (berth,
container yard and landside areas), different complex optimization problems
arise [17]. In this work, we focus on the Berth Allocation Problem (BAP).
The BAP is an NP-hard problem [12], consisting of allocating a berthing
position and time to each vessel arriving at the terminal. When a vessel
arrives at the quay, it may need to wait before it can be attended. The goal of the present
work is to minimize such waiting times.
Due to multiple factors such as weather conditions (rain, storms, etc.), technical
problems, or stops at other terminals, among others, vessels can arrive earlier or later
than their scheduled arrival time, which makes the actual time of arrival of each
vessel highly uncertain [2, 11]. This situation affects the loading and discharging
operations, other activities at the terminal, and the services required by customers.
There are many types of uncertainty, such as randomness, imprecision (ambiguity,
vagueness) and confusion, which can be categorized as either stochastic or fuzzy
[24]. Since fuzzy sets are specially designed to deal with imprecision, they were
selected for the present work.
MCT administrators continuously review and change the plans, but a
frequent review of the berthing plan is not desirable from a resource-planning
point of view. Therefore, the adaptability of the berthing plan is
important for a good performance of the system an MCT manages. As a result, a robust
model providing a berthing plan that supports possible imprecision (earliness or
lateness) in the arrival times of vessels and is easily adaptable is desirable.
Among the many attributes commonly used to classify BAP models [1], the spatial
and temporal attributes are the most important ones. The spatial
attribute can be discrete or continuous. In the discrete case, the quay is considered
as a finite set of berths, where segments of finite length describe every berth and
usually a berth works for just one vessel at a time, whereas in the continuous
case, the vessels can berth at any position within the limits of the quay. On the
other hand, the temporal attribute can be static or dynamic. In the static case, all
the vessels are assumed to be at the port before performing the berthing plan, while
in the dynamic case, the vessels can arrive at the port at different times during
the planning horizon. In [1], the authors make an exhaustive review of the existing
literature about the BAP. To our knowledge, there are very few studies dealing
with the BAP and with imprecise (fuzzy) data.
A cooperative search is developed in [10] to deal with the discrete
and dynamic BAP. In such work, the problem is assumed to be deterministic.
In [23], the uncertainty is dealt with through probabilities, that is, it is supposed that
historical data are available to obtain the probability distribution of the arrival of each
vessel. In many ports there is not enough data available to obtain these distributions.

A fuzzy MILP (Mixed Integer Linear Programming) model for the discrete and
dynamic BAP was proposed in [4]; triangular fuzzy numbers represent the arrival
times of vessels, but the continuous BAP is not addressed. According to Bierwirth
[1], planning berthing with a continuous model is more complicated than with a
discrete one, but the advantage is a better use of the space available at the quay.
The continuous and dynamic BAP, with imprecision in the arrival of vessels
represented by triangular fuzzy numbers, was studied in [5, 6]. In the former, a fuzzy MILP
model is proposed and an α-cut method is used to obtain the solution. In the
latter, a Fully Fuzzy Linear Programming (FFLP) model is proposed and solved
by the Nasseri method.
The models cited in the previous works do not deal with the BAP with two quays.
In [7], a MILP model for the BAP with multiple quays was developed. In this model,
the imprecision in the arrival of vessels is not taken into account.
In this work, we study the dynamic and continuous BAP with two quays and
imprecision in the arrival of vessels. We suppose that the probability distributions
for the advances and delays of vessels are unknown, that is, the problem cannot be
treated with stochastic optimization. We assume that the arrival times of vessels are
imprecise, and triangular fuzzy numbers represent this imprecision.
This paper is structured as follows. In Sect. 2, we describe the basic concepts of
fuzzy sets. Section 3 presents the formulation of a Fully Fuzzy Linear Programming
problem and describes a method of solution. Section 4 describes the BAP, the
notation used in the models, the assumptions, and the benchmarks for the
BAP used to evaluate the models. Section 5 shows the fuzzy MILP model for the
BAP with two quays. Section 6 shows the FFLP model for the BAP with two quays.
Finally, conclusions and future lines of research are presented in Sect. 7.

2 Fuzzy Set Theory

Fuzzy sets offer a flexible environment to optimize complex systems. The concepts
about fuzzy sets are taken from [21].

2.1 Fuzzy Sets

Definition 1 Let X be the universe of discourse. A fuzzy set Ã in X is a set of pairs:

Ã = {(x, μ_Ã(x)), x ∈ X}

where μ_Ã : X → [0, 1] is called the membership function and μ_Ã(x) represents the
degree to which x belongs to the set Ã.

In this work, we use the fuzzy sets defined on real numbers R.


Definition 2 The fuzzy set Ã in R is normal if max_x μ_Ã(x) = 1.

Definition 3 The fuzzy set Ã in R is convex if and only if the membership function
of Ã satisfies the inequality

μ_Ã[βx1 + (1 − β)x2] ≥ min[μ_Ã(x1), μ_Ã(x2)], ∀x1, x2 ∈ R, β ∈ [0, 1]

Definition 4 A fuzzy number is a normal and convex fuzzy set in R.

Definition 5 A triangular fuzzy number (see Fig. 1) is represented by Ã = (a1, a2, a3).

Definition 6 The triangular fuzzy number Ã = (a1, a2, a3) is called a nonnegative
triangular fuzzy number ⇔ a1 ≥ 0.
Definition 7 Let Ã be a fuzzy set and α ∈ [0, 1] a real number. The crisp set

Aα = {x : μ_Ã(x) ≥ α, x ∈ R}

is called the α-cut of Ã (Fig. 1).

This concept provides a very interesting approach in fuzzy set theory, since the
family of α-cuts contains all the information about the fuzzy set. By adjusting the α
value we can get the range or set of values that satisfy a given degree of membership.
In other words, the α value ensures a certain level of satisfaction, precision of the
result or robustness of the model.
For a fuzzy set with a triangular membership function, Ã = (a1, a2, a3) (see
Fig. 1), the α-cut is given by:

Aα = [a1 + α(a2 − a1), a3 − α(a3 − a2)].   (1)

Fig. 1 Interval corresponding to an α-cut level, for a triangular fuzzy number

2.2 Fuzzy Arithmetic

If we have the nonnegative triangular fuzzy numbers ã = (a1, a2, a3) and b̃ =
(b1, b2, b3), the operations of sum and difference are defined as follows:
Sum: ã + b̃ = (a1 + b1, a2 + b2, a3 + b3).
Difference: ã − b̃ = (a1 − b3, a2 − b2, a3 − b1).

2.3 Comparison of Fuzzy Numbers

Comparison of fuzzy numbers allows us to decide, between two fuzzy numbers
ã and b̃, which is the greatest one. However, fuzzy numbers do not always provide
an ordered set like the real numbers do. All methods for ordering fuzzy numbers
have advantages and disadvantages. Different properties have been applied to
justify the comparison of fuzzy numbers, such as preference, rationality, and robustness
[8, 19].
In this work, we use the method called the First Index of Yager [20]. This method
uses the ordering function

R(Ã) = (a1 + a2 + a3) / 3   (2)

As a result, Ã ≤ B̃ when R(Ã) ≤ R(B̃), that is,

a1 + a2 + a3 ≤ b1 + b2 + b3.
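The operations of Sects. 2.2 and 2.3, together with the α-cut of Eq. (1), fit in a few lines of Python; the arrival times of V2 and V3 used in the example are taken from Table 3 in Sect. 4.2.

```python
# Minimal sketch of triangular-FN arithmetic (Sect. 2.2), the alpha-cut
# of Eq. (1), and the Yager first-index ordering of Eq. (2).
from dataclasses import dataclass

@dataclass
class TFN:
    a1: float
    a2: float
    a3: float                                   # (a1, a2, a3), a1 <= a2 <= a3

    def __add__(self, o):                       # sum of nonnegative TFNs
        return TFN(self.a1 + o.a1, self.a2 + o.a2, self.a3 + o.a3)

    def __sub__(self, o):                       # difference of nonnegative TFNs
        return TFN(self.a1 - o.a3, self.a2 - o.a2, self.a3 - o.a1)

    def alpha_cut(self, alpha):                 # Eq. (1)
        return (self.a1 + alpha * (self.a2 - self.a1),
                self.a3 - alpha * (self.a3 - self.a2))

    def yager(self):                            # ordering function R, Eq. (2)
        return (self.a1 + self.a2 + self.a3) / 3.0

v2, v3 = TFN(77, 86, 103), TFN(30, 43, 55)      # arrivals of V2 and V3 (Table 3)
print(v2.alpha_cut(0.5))                        # -> (81.5, 94.5)
print(v2.yager() > v3.yager())                  # V2 arrives after V3 under R
```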

2.4 Distributions of Possibility

Imprecision can be represented by possibility distributions [22]. These distributions
allow us to formalize, in a reliable way, a very large number of situations when
estimating magnitudes located in the future. The measure of possibility of an event
can be interpreted as the degree of possibility of its occurrence. Among the various
types of distributions, triangular and trapezoidal ones are the most common. Formally,
possibility distributions are fuzzy numbers; in this work, we use triangular possibility
distributions ã = (a1, a2, a3), which are determined by three quantities:
a2 is the value with the highest possibility of occurrence, and a1 and a3 are the lower
and upper limit values allowed, respectively (Fig. 1). These bound values can be interpreted,
e.g., as the most pessimistic and the most optimistic values depending on the
context.

3 Fully Fuzzy Linear Programming

Fuzzy mathematical programming is useful to handle optimization problems
that include imprecise parameters [13]. There are different approaches to
fuzzy mathematical programming. When the parameters and
decision variables of the problem are fuzzy and linear, it can be formulated as an FFLP. There are
many solution methodologies for an FFLP [3]. Most of them convert the original
fuzzy model into a classical satisfactory model.
In this work, we use the method of Nasseri et al. [14].
Given the FFLP
max Σ_{j=1}^{n} c̃_j x̃_j

Subject to

Σ_{j=1}^{n} ã_ij x̃_j ≤ b̃_i, ∀i = 1 . . . m   (3)

where the parameters ã_ij, c̃_j, b̃_i and the decision variables x̃_j are nonnegative fuzzy
numbers ∀j = 1 . . . n, ∀i = 1 . . . m.
If all parameters and decision variables are represented by triangular fuzzy
numbers, then c̃_j = (c1_j, c2_j, c3_j), ã_ij = (a1_ij, a2_ij, a3_ij), b̃_i = (b1_i, b2_i, b3_i), x̃_j =
(x1_j, x2_j, x3_j).
(x1 j , x2 j , x3 j ).
Nasseri’s method transforms (3) into a classic problem of mathematical program-
ming.
⎛ ⎞
n
max R ⎝ (c1 j , c2 j , c3 j )(x1 j , x2 j , x3 j )⎠
j=1


n
a1i j x1 j ≤ b1i , ∀i = 1 . . . m (4)
j=1


n
a2i j x2 j ≤ b2i , ∀i = 1 . . . m (5)
j=1


n
a3i j x3 j ≤ b3i , ∀i = 1 . . . m (6)
j=1

x2 j − x1 j ≥ 0, x3 j − x2 j ≥ 0 (7)

where R is an ordering function (see Sect. 2.3).
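As a hedged illustration, the toy below applies the transformation (4)-(7) to a one-constraint FFLP with the Yager index as R, using the open-source PuLP/CBC solver rather than CPLEX; the numeric data are invented, and the linear objective uses the usual component-wise product approximation for nonnegative triangular FNs.

```python
# Toy crisp equivalent of an FFLP via Nasseri's transformation (4)-(7).
import pulp

c = (2, 3, 4)        # fuzzy cost c~ = (c1, c2, c3), invented data
a = (1, 2, 3)        # fuzzy coefficient a~
b = (10, 20, 30)     # fuzzy right-hand side b~

prob = pulp.LpProblem("crisp_FFLP", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{k}", lowBound=0) for k in (1, 2, 3)]  # x~ components

# Objective: R(c~ x~) = (c1*x1 + c2*x2 + c3*x3)/3 (Yager first index, with
# the component-wise product approximation for nonnegative triangular FNs)
prob += (c[0] * x[0] + c[1] * x[1] + c[2] * x[2]) * (1.0 / 3.0)

prob += a[0] * x[0] <= b[0]        # (4) lower components
prob += a[1] * x[1] <= b[1]        # (5) modal components
prob += a[2] * x[2] <= b[2]        # (6) upper components
prob += x[1] - x[0] >= 0           # (7) keep x~ a valid triangular FN
prob += x[2] - x[1] >= 0

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("x~ =", tuple(v.value() for v in x))   # optimal fuzzy decision
```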



4 Problem Description

The BAP with two quays consists of deciding the quay, the moment and the position
at which each vessel arriving at the terminal must moor, so that the waiting time is
minimized. The BAP can be represented in a two-dimensional way, as shown in Fig. 2:
the horizontal axis (Time) represents the time horizon and the vertical axis (Quay),
the length of the quay.
The notation used in the formulation of the problem is shown in Fig. 2 and
Table 1.

Fig. 2 Representation of a vessel according to the time and position

Table 1 Notation of variables and parameters of the problem


Variables and parameters Description
V The set of incoming vessels
Q The set of quays
L Total length of the quay at the MCT
H Planning horizon
ai Arrival time at port, i ∈ V
li Vessel length, i ∈ V
hi Handling time of the vessel in the berth (service
time), i ∈ V
m iq Berthing time of vessel, i ∈ V, q ∈ Q
piq Berthing position, where the vessel will moor,
i ∈ V, q ∈ Q
wiq = m iq − ai Waiting time of vessel since the arrival to the
berthing, i ∈ V, q ∈ Q
diq = m iq + h i Departure time, i ∈ V, q ∈ Q

The decision variables are m_iq and p_iq.
Depending on the model, the arrival times, berthing times, handling times and
departure times of the vessels can be considered to be of a fuzzy nature (imprecise) and
are denoted by ã, m̃, h̃ and d̃, respectively.
We consider the following assumptions: all the information related to the waiting
vessels is known in advance, the arrival time is imprecise (fuzzy), every vessel has
a draft that is lower than or equal to the draft of the quay, berthings and departures
are not time consuming, simultaneous berthing is allowed, and no safety distance between
vessels is considered.
4.1 Benchmark BAPQCAP

The research group "Inteligencia Artificial - Sistemas de Optimización" of the
Universidad Politécnica de Valencia (Spain) has developed a benchmark for the BAP
and the Quay Crane Assignment Problem (QCAP). The benchmark is formed by
groups of 5, 10, 15, . . . , 100 vessels; each group consists of 100 instances.
In Table 2 we can see an instance of 10 vessels.
This benchmark has been used to evaluate different meta-heuristics for the BAP
and the QCAP [15, 16]. Since imprecision is not considered in any of its parameters,
this benchmark is deterministic.

4.2 Benchmark BAPQCAP-Imprecise

With the aim of evaluating the models presented in [5, 6], we developed the benchmark
BAPQCAP-Imprecise, which is an extended version of the BAPQCAP. In this extension,

Table 2 Example of one instance of the benchmark BAPQCAP

V     a    h    l
V1    34   60   260
V2    86   100  232
V3    43   120  139
V4    165  110  193
V5    52   80   287
V6    67   90   318
V7    38   100  366
V8    15   80   166
V9    110  90   109
V10   95   120  251

Table 3 Example of one instance with imprecision in the arrival time of vessels

V     a1   a2   a3   h    l
V1    15   34   42   60   260
V2    77   86   103  100  232
V3    30   43   55   120  139
V4    150  165  184  110  193
V5    33   52   69   80   287
V6    50   67   82   90   318
V7    22   38   50   100  366
V8    2    15   29   80   166
V9    95   110  118  90   109
V10   81   95   115  120  251

the arrival times of vessels are considered imprecise. To simulate this imprecision, in
every instance of the benchmark BAPQCAP, the possibility of delay and advance was
added to the arrival time up to an allowed tolerance. This possibility is represented
by a triangular fuzzy number (a1, a2, a3) (see Fig. 1), where:
a1: minimum allowed advance in the arrival of the vessel. This value is random and
is generated within the range [a−20, a].
a2: arrival time of the vessel with the highest possibility (taken from the original
benchmark).
a3: maximum allowed delay in the arrival of the vessel. This value is also random
and is generated within the range [a, a+20].
Table 3 shows the modification made to the instance of Table 2. The third column
is the arrival time with the highest possibility, while the second and fourth columns
represent the allowed advance and delay, respectively.
The triangular fuzzy number used to represent the imprecision in the arrival is
obtained from an expert present on every vessel. This expert has to indicate the time
interval of the possible arrival, as well as the most possible time at which the arrival occurs.
This information could also be obtained from historical data regarding the arrival of
each vessel.

4.3 Case Study

With the aim of showing the advantages and disadvantages of the models presented in
this work, we use one instance consisting of 10 vessels (Table 3) as a case study.
In Fig. 3, we show the imprecise arrival of each vessel as a triangular fuzzy number.
For example, for vessel V2, the most possible arrival is at 86 units of time, but it

Fig. 3 Imprecise arrival of the vessels shown in Table 3

could arrive early or late, up to 77 and 103 units of time, respectively; the handling time
is 100 and the length of the vessel is 232.

5 A MILP Fuzzy Model for the BAP with Two Quays

In this section we propose a fuzzy MILP model for the continuous and dynamic
BAP, able to allocate a quay to an arriving vessel. This model is an extension of the
model presented in [5], developed for a single quay.
We assume imprecision in the arrival times of vessels, meaning that the vessels
can be late or early up to a given allowed tolerance.
Formally, we consider that the imprecision in the arrival time of vessels is a fuzzy
number ã. The goal is to allocate a certain time and a place at the quay q ∈ Q to
every vessel, according to certain constraints, with the aim of minimizing the total waiting
time of the vessels.

min Σ_{q∈Q} Σ_{i∈V} (m_iq − ã_i)   (8)

Subject to:

Σ_{q∈Q} BM_iq = 1, ∀i ∈ V   (9)

m_iq ≥ ã_i, ∀i ∈ V, ∀q ∈ Q   (10)

p_iq + l_i ≤ L_q, ∀i ∈ V, ∀q ∈ Q   (11)

p_iq + l_i ≤ p_jq + M(1 − z^x_ijq), ∀i, j ∈ V, i ≠ j, ∀q ∈ Q   (12)

m_iq + h_i ≤ H, ∀i ∈ V, ∀q ∈ Q   (13)

m_jq − (m_iq + h_i) + M(1 − z^y_ijq) ≥ S(ã_i), ∀i, j ∈ V, i ≠ j, ∀q ∈ Q   (14)

z^x_ijq + z^x_jiq + z^y_ijq + z^y_jiq ≥ BM_iq + BM_jq − 1, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q   (15)

z^x_ijq, z^y_ijq ∈ {0, 1}, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q   (16)

If the deterministic and fuzzy parameters are of linear type, we are dealing with a
fuzzy MILP model. The constraints are explained below:

• Constraint (9): each vessel is assigned to exactly one quay.
• Constraint (10): the berthing time must be at least the fuzzy arrival time.
• Constraint (11): there must be enough space at the quay for the berthing.
• Constraint (12): at the quay, a vessel must be to the left or right side of another
one.
• Constraint (13): the berthing plan must fit within the planning horizon.
• Constraint (14): for a vessel j berthing after vessel i at quay q, its berthing time
m_j must take into account the time of advance and delay S(ã_i) allowed to vessel i.
• Constraint (15): constraints (12) and (14) must be accomplished when both vessels
moor at the same quay.

Here z^x_ijq is a decision variable indicating whether vessel i is located to the left of vessel j at
the berthing (z^x_ijq = 1), and z^y_ijq = 1 indicates that the berthing time of vessel i precedes
the berthing time of vessel j at quay q. M is a big integer constant.

5.1 Solution of the Model

The imprecise arrival of every vessel is represented by a triangular possibility distribution
ã = (a1, a2, a3) (see Fig. 1). We consider that arrivals will not occur
before a1 nor after a3. The arrival with the maximum possibility is a2.
For a triangular fuzzy number ã = (a1, a2, a3), according to (1), its α-cut is
given by:

Aα = [a1 + α(a2 − a1), a3 − α(a3 − a2)]

The α-cut represents the time interval allowed for the arrival of a vessel,
given a degree of precision α. The size of the interval, S(α) = (1 − α)(a3 − a1), must
be taken into account for the berthing time of the next vessel to berth. It can be observed
that, for the value α, the allowed earliness is E(α) = (1 − α)(a2 − a1), the allowed delay
is D(α) = (1 − α)(a3 − a2), and S(α) = E(α) + D(α).
In Fig. 4, the α-cuts B1^0.5, B6^0.5 and B3^0.5 for the arrival of three vessels, with
a cut level α = 0.5, are shown.
By using the α-cuts as a defuzzification method for the fuzzy arrivals of vessels,
a solution to the fuzzy BAP model is obtained with the following auxiliary parametric
MILP model.

Fig. 4 α-cut for α = 0.5 for the fuzzy arrival of three vessels

Input: set of incoming vessels V.
Output: berthing plans for V with different degrees of precision.
For each α = {0, 0.1, . . . , 1}:

earliness allowed to vessel i:

E_i(α) = (1 − α) · (a2_i − a1_i).

delay allowed to vessel i:

D_i(α) = (1 − α) · (a3_i − a2_i).

tolerance time allowed to the arrival of vessel i:

S_i(α) = E_i(α) + D_i(α), ∀i ∈ V

min Σ_{q∈Q} Σ_{i∈V} (m_iq − (a1_i + α · (a2_i − a1_i)))   (17)

subject to:

Σ_{q∈Q} BM_iq = 1, ∀i ∈ V   (18)

m_iq ≥ a1_i + α · (a2_i − a1_i), ∀i ∈ V, ∀q ∈ Q   (19)

p_iq + l_i ≤ L_q, ∀i ∈ V, ∀q ∈ Q   (20)

p_iq + l_i ≤ p_jq + M(1 − z^x_ijq), ∀i, j ∈ V, i ≠ j, ∀q ∈ Q   (21)

m_jq − (m_iq + h_i) + M(1 − z^y_ijq) ≥ S_i(α), ∀i, j ∈ V, i ≠ j, ∀q ∈ Q   (22)

z^x_ijq + z^x_jiq + z^y_ijq + z^y_jiq ≥ 1, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q   (23)

z^x_ijq, z^y_ijq ∈ {0, 1}, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q.   (24)

The planning horizon is given by:

H = Σ_{i∈V} h_i + max{a3_i, i ∈ V}.

In the parametric MILP model, the value of $\alpha$ is the degree of precision allowed in the arrival time of vessels. For every $\alpha \in [0, 1]$ and for every vessel i, the allowed tolerance times $S_i$ are computed. The lower the value of $\alpha$, the lower the precision, i.e., the longer the time allowed for the arrival of every vessel.
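As an illustration of how this auxiliary model could be assembled, the sketch below encodes (17)-(23), plus the planning horizon, with the open-source PuLP modeller. The two-vessel data, the big constant M and the solver choice are assumptions for the example, not values from the study case:

```python
import pulp

# Illustrative data only: i -> (a1, a2, a3, handling time h_i, length l_i)
vessels = {1: (15, 34, 42, 60, 260), 2: (77, 86, 103, 100, 232)}
quays = {0: 700, 1: 700}   # q -> quay length L_q
alpha = 0.5                # degree of precision
M = 10_000                 # big integer constant
H = sum(v[3] for v in vessels.values()) + max(v[2] for v in vessels.values())

V, Q = list(vessels), list(quays)
prob = pulp.LpProblem("parametric_BAP", pulp.LpMinimize)
m = pulp.LpVariable.dicts("m", (V, Q), lowBound=0)    # berthing times
p = pulp.LpVariable.dicts("p", (V, Q), lowBound=0)    # berthing positions
BM = pulp.LpVariable.dicts("BM", (V, Q), cat="Binary")
zx = pulp.LpVariable.dicts("zx", (V, V, Q), cat="Binary")
zy = pulp.LpVariable.dicts("zy", (V, V, Q), cat="Binary")

def S(i):  # tolerance S_i(alpha) = E_i(alpha) + D_i(alpha)
    a1, _, a3 = vessels[i][:3]
    return (1 - alpha) * (a3 - a1)

# (17): total waiting time w.r.t. the lower bound of the alpha-cut
prob += pulp.lpSum(m[i][q] - (vessels[i][0] + alpha * (vessels[i][1] - vessels[i][0]))
                   for i in V for q in Q)
for i in V:
    prob += pulp.lpSum(BM[i][q] for q in Q) == 1                                    # (18)
    for q in Q:
        prob += m[i][q] >= vessels[i][0] + alpha * (vessels[i][1] - vessels[i][0])  # (19)
        prob += p[i][q] + vessels[i][4] <= quays[q]                                 # (20)
        prob += m[i][q] + vessels[i][3] <= H                                        # horizon
        for j in V:
            if i != j:
                prob += p[i][q] + vessels[i][4] <= p[j][q] + M * (1 - zx[i][j][q])           # (21)
                prob += m[j][q] - (m[i][q] + vessels[i][3]) + M * (1 - zy[i][j][q]) >= S(i)  # (22)
                prob += zx[i][j][q] + zx[j][i][q] + zy[i][j][q] + zy[j][i][q] >= 1           # (23)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for i in V:
    for q in Q:
        if BM[i][q].value() > 0.5:
            print(f"vessel {i}: quay {q}, berthing time {m[i][q].value():.1f}")
```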

5.2 Evaluation

For the evaluation we used a personal computer equipped with an Intel Core(TM) i3 CPU M370 @ 2.4 GHz and 4.00 GB of RAM. The experiments were performed with a timeout of 60 min.

5.2.1 Evaluation of the Case Study

For each instance, the eleven degrees of precision ($\alpha \in \{0, 0.1, \ldots, 1\}$) generated eleven berthing plans.
As an illustrative example, three different berthing plans are shown in Tables 4, 5 and 6 for the vessels of Table 3.

Table 4 Berthing plan with α = 1 of precision in the arrival time of vessels


V a1 a2 a3 E D m1 m2 m3 h d1 d2 d3 l p Q
V1 15 34 42 0 0 34 34 34 60 94 94 94 260 440 0
V2 77 86 103 0 0 94 94 94 100 194 194 194 232 468 0
V3 30 43 55 0 0 43 43 43 120 163 163 163 139 301 0
V4 150 165 184 0 0 165 165 165 110 275 275 275 193 251 0
V5 33 52 69 0 0 52 52 52 80 132 132 132 287 366 1
V6 50 67 82 0 0 138 138 138 90 228 228 228 318 273 1
V7 22 38 50 0 0 38 38 38 100 138 138 138 366 0 1
V8 2 15 29 0 0 15 15 15 80 95 95 95 166 135 0
V9 95 110 118 0 0 132 132 132 90 222 222 222 109 591 1
V10 81 95 115 0 0 95 95 95 120 215 215 215 251 0 0

Table 5 Berthing plan with α = 0.5 of precision in the arrival time of vessels
V a1 a2 a3 E D m1 m2 m3 h d1 d2 d3 l p Q
V1 15 34 42 9.5 4.0 24.5 34.0 38.0 60 84.5 94.0 98.0 260 440 1
V2 77 86 103 4.5 8.5 102.0 106.5 115.0 100 202.0 206.5 215.0 232 302 0
V3 30 43 55 6.5 6.0 36.5 43.0 49.0 120 156.5 163.0 169.0 139 561 0
V4 150 165 184 7.5 9.5 158.0 165.0 174.5 110 267.5 275.0 284.5 193 0 1
V5 33 52 69 9.5 8.5 42.5 52.0 60.5 80 122.5 132.0 140.5 287 0 0
V6 50 67 82 8.5 7.5 98.0 106.5 114.0 90 188.0 196.5 204.0 318 382 1
V7 22 38 50 8.0 6.0 30.0 38.0 44.0 100 130.0 138.0 144.0 366 0 1
V8 2 15 29 6.5 7.0 8.5 15.0 22.0 80 88.5 95.0 102.0 166 395 0
V9 95 110 118 7.5 4.0 144.0 151.5 155.5 90 234.0 241.5 245.5 109 193 1
V10 81 95 115 7.0 10.0 140.5 147.5 157.5 120 260.5 267.5 277.5 251 0 0

Table 6 Berthing plan with α = 0 of precision in the arrival time of vessels


V a1 a2 a3 E D m1 m2 m3 h d1 d2 d3 l p Q
V1 15 34 42 19 8 15 34 42 60 75 94 102 260 414 1
V2 77 86 103 9 17 149 158 175 100 249 258 275 232 468 0
V3 30 43 55 13 12 30 43 55 120 150 163 175 139 0 1
V4 150 165 184 15 19 150 165 184 110 260 275 294 193 256 1
V5 33 52 69 19 17 33 52 69 80 113 132 149 287 413 0
V6 50 67 82 17 15 150 167 182 90 240 257 272 318 150 0
V7 22 38 50 16 12 22 38 50 100 122 138 150 366 0 0
V8 2 15 29 13 14 2 15 29 80 82 95 109 166 248 1
V9 95 110 118 15 8 95 110 118 90 185 200 208 109 139 1
V10 81 95 115 14 20 102 116 136 120 222 236 256 251 449 1

The column Q in Tables 4, 5 and 6 indicates the quay where the vessel has to moor: the value 1 means that the vessel must moor at quay one, and the value 0 at quay two.
For $\alpha = 1$, the maximum precision in the arrival of vessels (see Table 4), the earliness and delays are E = 0 and D = 0, respectively; that is, earliness and delays are not allowed in the arrival of any vessel. In most cases, if a vessel has a delay with respect to its precise arrival time, this plan ceases to be valid. For example, vessel V5 berths at quay one with a berthing time m2 = 52 and a departure time d2 = 132; if this vessel has a delay, then vessel V9 cannot berth at its allocated time m2 = 132. Vessel V1 berths at quay two; if V1 is late, V2 cannot berth at its allocated time. This can be observed in Fig. 5. For a greater number of vessels, the delays of vessels complicate the berthing plans even more.
The case of precision degree $\alpha = 0.5$ is shown in Table 5. If vessel V5 is, for instance, assigned to quay two, the optimum berthing time is m2 = 52, the earliness allowed is E = 9.5 and the delay allowed is D = 8.5; that is, the vessel can berth in the time interval [42.5, 60.5], and the departure can occur in the time interval [122.5, 140.5], the optimum departure time being d2 = 132. After vessel V5, vessel V10 can berth; its optimum berthing time is m2 = 147.5, with an allowed earliness of E = 7 and an allowed delay of D = 10; that is, the vessel can berth in the time interval [140.5, 157.5], and the departure can occur in the time interval [260.5, 277.5], the optimum departure time being d2 = 267.5 (see Fig. 6).
For $\alpha = 0$, the minimum allowed precision in the arrival time of vessels, the earliness and delays are increased (see Table 6). For instance, if vessel V5 is assigned to quay two, the optimum berthing time is m2 = 52 and the allowed earliness and delay are E = 19 and D = 17, respectively. Therefore, the time interval where the vessel can berth is [33, 69]. After vessel V5, vessel V2 can berth; its optimum berthing time is m2 = 158, but it can berth in the time interval [149, 175] (see Fig. 7).
Considering the structure of the model created, for every value of $\alpha$ the allowed earliness and delays are proportional to the maximum earliness and delay times. For example, for $\alpha = 0.5$, vessel V1 can be early or delayed up to a maximum of 9.5 and 4 units of time, respectively (see Table 5). If $\alpha = 0.0$, the earliness and delay allowed to vessel V1 are E = 19 and D = 8, respectively (see Table 6).
For all values of $\alpha$, the model was solved optimally. Table 7 shows the objective function T and the computation time used by CPLEX to obtain the solution for the different degrees of precision $\alpha$. The lowest value, T = 202, is obtained within a time of 3.27 s, corresponding to a degree of precision $\alpha = 1$; the greatest, T = 386, is obtained in a time of 5.34 s, corresponding to a degree of precision $\alpha = 0$.
There is a clear relationship between $\alpha$ and T: decreasing $\alpha$ increases the value of T; e.g., for a degree of precision $\alpha = 0.5$ the value of T is 308, and for $\alpha = 0$ it is 386.
The decision-makers of the MTC can choose a plan according to a pair $(\alpha, T)$ that constitutes a satisfactory solution. For example, if a plan with the lowest waiting time of vessels is desirable, though earliness and delays in the vessel arrivals are not permitted, the person in charge can choose the pair (1, 202); if a plan with 0.5 precision in the arrival of vessels is desirable, though the value of the waiting time increases, the person can choose the pair (0.5, 308).
This model assigns slacks to absorb possible delays or earliness of vessels. This represents a big waste of time during which the quay is not used, and the vessel has to stay longer than necessary at the port.

5.2.2 Evaluation of the Benchmark BAPQCAP-Imprecise

Table 8 shows the average results obtained by CPLEX for the benchmark BAPQCAP-Imprecise (see Sect. 4.2) with a precision of $\alpha = 0.0$.

Fig. 5 Graphical representation of berthing plan of Table 4



Fig. 6 Graphical representation of berthing plan of Table 5

Fig. 7 Graphical representation of berthing plan of Table 6

The values shown are the average objective function value of the solutions found (Avg T), the number of instances solved to optimality (#Opt) and the number of instances solved without certified optimality (#NOpt).
In our results, it can be observed that in all the solved cases T increases as the number of vessels increases. Within the given timeout, CPLEX found the optimum solution in 30% of the instances with 10 vessels, a non-optimum solution in 100% of the instances from 15 to 65 vessels, and no solution for instances with 70 or more vessels.
The growth of T for the values $\alpha \in \{0, 0.5, 1\}$ is shown in Fig. 8. Within the given timeout, CPLEX found solutions for instances of up to 65 vessels.

Table 7 Value of the objective function for every degree of precision

α     T      Time (s)
1.0   202.0  3.27
0.9   223.2  4.84
0.8   244.4  6.96
0.7   265.6  5.37
0.6   286.8  7.38
0.5   308.0  4.31
0.4   324.4  5.36
0.3   339.8  5.84
0.2   355.2  4.25
0.1   370.6  5.89
0.0   386.0  5.34

Table 8 Evaluation of the benchmark BAPQCAP-Imprecise for α = 0.0

Vessels  Avg T      #Opt  #NOpt
5        99.24      100   0
10       2430.78    30    70
15       8738.00    0     100
20       20016.00   0     100
25       31776.00   0     100
30       46348.00   0     100
35       50766.00   0     100
40       74872.00   0     100
45       98822.00   0     100
50       128826.00  0     100
55       161084.00  0     100
60       203712.00  0     100
65       239288.00  0     100
70       —          —     —
75       —          —     —
80       —          —     —

6 An FFLP Model for the BAP with Multiple Quays

We propose an FFLP model for the continuous and dynamic BAP able to allocate a quay to an incoming vessel, which is an extension of the model presented in [5], developed for a single quay. This model overcomes the drawback of the fuzzy MILP model (see Sect. 5), namely the great waste of time during which the quays are not used.

Fig. 8 Evaluation of the imprecise benchmark for different values of α

The arrival times ($\tilde{a}$), berthing times ($\tilde{m}$) and departure times ($\tilde{d}$) of the vessels are considered to be of a fuzzy nature (imprecise).
In a similar way to the model of Sect. 5, the objective is to allocate all vessels to different quays, according to several constraints, minimizing the total waiting time for all vessels.

$$\min \sum_{q \in Q} \sum_{i \in V} (\tilde{m}_{iq} - \tilde{a}_i) \qquad (25)$$

Subject to:

$$\sum_{q \in Q} BM_{iq} = 1 \quad \forall i \in V \qquad (26)$$

$$\tilde{m}_{iq} \ge \tilde{a}_i \quad \forall i \in V, \forall q \in Q \qquad (27)$$

$$p_{iq} + l_i \le L_q \quad \forall i \in V, \forall q \in Q \qquad (28)$$

$$p_{iq} + l_i \le p_{jq} + M(1 - z^x_{ijq}) \quad \forall i, j \in V, i \ne j, \forall q \in Q \qquad (29)$$

$$\tilde{m}_{iq} + \tilde{h}_i \le H \quad \forall i \in V, \forall q \in Q \qquad (30)$$

$$\tilde{m}_{iq} + \tilde{h}_i \le \tilde{m}_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \ne j, \forall q \in Q \qquad (31)$$

$$z^x_{ijq} + z^x_{jiq} + z^y_{ijq} + z^y_{jiq} \ge BM_{iq} + BM_{jq} - 1 \quad \forall i, j \in V, i \ne j, \forall q \in Q \qquad (32)$$

$$z^x_{ijq},\ z^y_{ijq} \in \{0, 1\} \quad \forall i, j \in V, i \ne j, \forall q \in Q \qquad (33)$$

The interpretation of the constraints is similar to that of the model in Sect. 5, with the exception of constraint (31). This constraint concerns time and indicates whether a vessel berths before or after another one.

6.1 Solution of the Model

We assume that all parameters and decision variables are linear and some of them are fuzzy. Thus, we have a fully fuzzy linear programming (FFLP) problem.
The arrival of every vessel is represented by a triangular possibility distribution $\tilde{a} = (a_1, a_2, a_3)$; in a similar way, the berthing time is represented by $\tilde{m} = (m_1, m_2, m_3)$, and $\tilde{h} = (h_1, h_2, h_3)$ is considered a singleton.
When representing parameters and variables by triangular fuzzy numbers, we obtain a solution to the proposed fuzzy model by applying the methodology proposed by Nasseri (see Sect. 3).
To apply this methodology, we use the operation of fuzzy difference on the objective function and the fuzzy sum on the constraints (see Sect. 2.2), as well as the First Index of Yager as an ordering function on the objective function (see Sect. 2.3), obtaining the following auxiliary MILP model.
$$\min \sum_{q \in Q} \sum_{i \in V} \frac{1}{3} \left( (m1_{iq} - a3_i) + (m2_{iq} - a2_i) + (m3_{iq} - a1_i) \right) \qquad (34)$$

Subject to:

$$\sum_{q \in Q} BM_{iq} = 1 \quad \forall i \in V \qquad (35)$$

$$m1_{iq} \ge a1_i \quad \forall i \in V, \forall q \in Q \qquad (36)$$

$$m2_{iq} \ge a2_i \quad \forall i \in V, \forall q \in Q \qquad (37)$$

$$m3_{iq} \ge a3_i \quad \forall i \in V \qquad (38)$$

$$p_{iq} + l_i \le L_q \quad \forall i \in V \qquad (39)$$

$$m3_{iq} + h_i \le H \quad \forall i \in V \qquad (40)$$

$$p_{iq} + l_i \le p_{jq} + M(1 - z^x_{ijq}) \quad \forall i, j \in V, i \ne j \qquad (41)$$

$$m1_{iq} + h_i \le m1_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \ne j \qquad (42)$$

$$m2_{iq} + h_i \le m2_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \ne j \qquad (43)$$

$$m3_{iq} + h_i \le m3_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \ne j \qquad (44)$$

$$m2_{iq} > m1_{iq} \quad \forall i \in V \qquad (45)$$

$$m3_{iq} > m2_{iq} \quad \forall i \in V \qquad (46)$$

$$z^x_{ijq} + z^x_{jiq} + z^y_{ijq} + z^y_{jiq} \ge BM_{iq} + BM_{jq} - 1 \quad \forall i, j \in V, i \ne j \qquad (47)$$

The planning horizon is the same as that of the model from Sect. 5.
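For readers who want to experiment with this auxiliary model, the following hedged sketch encodes (34)-(47) in PuLP, handling each fuzzy berthing time through its three components; the strict inequalities (45)-(46) are approximated with a small epsilon, and the toy data is illustrative only:

```python
import pulp

# Illustrative data only: i -> (a1, a2, a3, handling time h_i, length l_i)
vessels = {1: (15, 34, 42, 60, 260), 2: (77, 86, 103, 100, 232)}
quays = {0: 700}                  # a single quay keeps the toy model small
M, EPS = 10_000, 0.001            # EPS approximates the strict inequalities (45)-(46)
H = sum(v[3] for v in vessels.values()) + max(v[2] for v in vessels.values())

V, Q = list(vessels), list(quays)
prob = pulp.LpProblem("FFLP_BAP", pulp.LpMinimize)
m = {k: pulp.LpVariable.dicts(f"m{k}", (V, Q), lowBound=0) for k in (1, 2, 3)}
p = pulp.LpVariable.dicts("p", (V, Q), lowBound=0)
BM = pulp.LpVariable.dicts("BM", (V, Q), cat="Binary")
zx = pulp.LpVariable.dicts("zx", (V, V, Q), cat="Binary")
zy = pulp.LpVariable.dicts("zy", (V, V, Q), cat="Binary")

# (34): First Index of Yager applied to the fuzzy difference m - a
prob += pulp.lpSum((m[1][i][q] - vessels[i][2]) + (m[2][i][q] - vessels[i][1])
                   + (m[3][i][q] - vessels[i][0]) for i in V for q in Q) * (1 / 3)

for i in V:
    prob += pulp.lpSum(BM[i][q] for q in Q) == 1                 # (35)
    for q in Q:
        for k, a in zip((1, 2, 3), vessels[i][:3]):
            prob += m[k][i][q] >= a                              # (36)-(38)
        prob += p[i][q] + vessels[i][4] <= quays[q]              # (39)
        prob += m[3][i][q] + vessels[i][3] <= H                  # (40)
        prob += m[2][i][q] >= m[1][i][q] + EPS                   # (45)
        prob += m[3][i][q] >= m[2][i][q] + EPS                   # (46)
        for j in V:
            if i != j:
                prob += p[i][q] + vessels[i][4] <= p[j][q] + M * (1 - zx[i][j][q])  # (41)
                for k in (1, 2, 3):                              # (42)-(44)
                    prob += m[k][i][q] + vessels[i][3] <= m[k][j][q] + M * (1 - zy[i][j][q])
                prob += (zx[i][j][q] + zx[j][i][q] + zy[i][j][q] + zy[j][i][q]
                         >= BM[i][q] + BM[j][q] - 1)             # (47)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for i in V:
    print(f"vessel {i}: fuzzy berthing", tuple(m[k][i][0].value() for k in (1, 2, 3)))
```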

6.2 Evaluation

For the evaluation, a personal computer equipped with an Intel Core(TM) i3 CPU M370 @ 2.4 GHz and 4.00 GB of RAM was used. The experiments were performed with a timeout of 60 min.

6.2.1 Evaluation of the Case Study

For the vessels of the case study (see Table 3), the berthing plan obtained with the model is shown in Table 9, and its polygonal-shape representation is shown in Fig. 9.
The berthing plan shown in Table 9 is a fuzzy one; e.g., for vessel V4, the most possible berthing time is at 165 units of time, but it could berth between

Table 9 Fuzzy berthing plan obtained to the study case


V a1 a2 a3 m1 m2 m3 h d1 d2 d3 l p Q
V1 15 34 42 15 34 42 60 75 94 102 260 0 0
V2 77 86 103 82 95 109 100 182 195 209 232 329 1
V3 30 43 55 30 43 55 120 150 163 175 139 561 1
V4 150 165 184 150 165 184 110 260 275 294 193 0 1
V5 33 52 69 33 52 69 80 113 132 149 287 0 1
V6 50 67 82 75 94 102 90 165 184 192 318 0 0
V7 22 38 50 22 38 50 100 122 138 150 366 318 0
V8 2 15 29 2 15 29 80 82 95 109 166 287 1
V9 95 110 118 113 132 149 90 203 222 239 109 193 1
V10 81 95 115 122 138 150 120 242 258 270 251 449 0

Fig. 9 Fuzzy berthing plan in polygonal shape

150 and 184 units of time; the most possible departure time is at 275 units of time, but it could depart between 260 and 294 units of time.
An appropriate way to observe the robustness of the fuzzy berthing plan is the polygonal-shape representation (see Fig. 9). The line below the small triangle represents the possible early berthing times; the line above the small triangle, the possible late berthing times; the small triangle represents the optimum berthing time (with the greatest possibility of occurrence); and the length of the polygon represents the time that the vessel will stay at the quay.
In the circles of Fig. 9, we observe an apparent conflict between the departure times of some vessels and the berthing times of others: at quay one, vessels V8 with V2 and V5 with V9; at quay two, vessels V7 with V10. The conflicts are not real; for example, if vessel V8 is late, vessel V2 has slack times supporting delays. Assume that vessel V8 is 10 units of time late; according to Table 9, its berthing occurs at m = 15 + 10 = 25 units of time and its departure at d = 25 + 80 = 105 units of time. Vessel V2 can moor after this time since, according to Table 9, its berthing can occur between 82 and 109 units of time. A similar situation occurs for vessels V5 and V9 at quay one and for V7 and V10 at quay two, as observed in Fig. 10.
To analyze the robustness of the fuzzy berthing plan, we simulate the incidences shown in Table 10.
With the incidences of Table 10, a feasible berthing plan can be obtained, as shown in Table 11. In Fig. 11, we observe that the berthing plan obtained is a part of the fuzzy plan obtained initially.

Fig. 10 Delayed berthing of the vessels V8, V5 and V7

Table 10 Incidences in vessel arrival times

Vessel  Time  Incidence
V1      10    Earliness
V2      12    Delay
V3      0     On time
V4      15    Delay
V5      12    Earliness
V6      8     Earliness
V7      8     Delay
V8      11    Delay
V9      10    Earliness
V10     9     Delay

Table 11 Final berthing plan including incidences

Vessels  m    h    d    l    p    Q
V1       24   60   84   260  0    0
V2       107  100  207  232  329  1
V3       43   120  163  139  561  1
V4       180  110  290  193  0    1
V5       40   80   120  287  0    1
V6       86   90   176  318  0    0
V7       46   100  146  366  318  0
V8       26   80   106  166  287  1
V9       122  90   212  109  193  1
V10      147  120  267  251  449  0

Fig. 11 Final berthing plan included in the fuzzy plan

6.2.2 Evaluation of the Benchmark BAPQCAP-Imprecise

Table 12 shows the average results obtained by CPLEX for the benchmark BAPQCAP-Imprecise (see Sect. 4.2).

Table 12 Evaluation of the imprecise benchmark

Vessels  Avg T      #Opt  #NOpt
5        91.6       100   0
10       2811.9     8     92
15       8492.0     0     100
20       18760.0    0     100
25       30063.3    0     100
30       50444.0    0     100
35       63898.0    0     100
40       75880.0    0     100
45       101766.7   0     100
50       144804.0   0     100
55       226946.0   0     100
60       226254.0   0     100
65       263254.0   0     100
70       —          —     —
75       —          —     —
80       —          —     —

From the results, we can observe that in all cases solved by CPLEX the objective function T increases as the number of vessels increases. Within the given timeout, CPLEX found the optimum solution in 8% of the instances with 10 vessels, a non-optimum solution in 100% of the instances from 15 to 65 vessels, and no solution for instances with 70 or more vessels.

6.3 Application of the FFLP Model

The FFLP model for the BAP could be applied in MTCs with two or more quays; to this end, the following steps are suggested:
• Step 1: Set a planning horizon and the length of the quays.
• Step 2: For every vessel, an expert has to indicate the time interval of its possible arrival, as well as the most possible time at which the arrival occurs (approximations to these data could also be obtained from historical data on the arrival of each vessel).
• Step 3: With the data of Step 2, form the fuzzy triangle representing the imprecise arrival of each vessel.
• Step 4: The parameters of each vessel known in advance (fuzzy triangle of arrivals, the service time and the length of the vessel) must be entered into the model.
• Step 5: Solve the auxiliary model with a linear programming solver. The decision variables obtained are the mooring time and the position at the quay. For bigger instances (greater than 65 vessels), given the high complexity of the BAP, the auxiliary model must be solved by a heuristic or meta-heuristic approach (previously evaluated as the most efficient), giving good solutions in reasonable times.
• Step 6: With the parameters and decision variables obtained, form the fuzzy berthing plan.
• Step 7: With the incidences occurring for every vessel (earliness or delay) within an allowed threshold, carry out the final berthing plan, as sketched below.
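A minimal sketch of Step 7 follows, assuming the final times are derived by shifting the most possible berthing time m2 by the observed incidence, as in Tables 9-11 (function and variable names are ours):

```python
def final_times(m1, m2, m3, h, incidence):
    """incidence < 0 means earliness, > 0 means delay (in time units)."""
    m = m2 + incidence                 # shift the most possible berthing time
    assert m1 <= m <= m3, "incidence outside the fuzzy interval: replan needed"
    return m, m + h                    # final berthing and departure times

# Vessel V2 of Table 9: fuzzy berthing (82, 95, 109), h = 100, delayed 12 units
# (Table 10) -> (107, 207), matching the final plan of Table 11.
print(final_times(82, 95, 109, 100, +12))
```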

7 Conclusion

Both models presented in this work solve the continuous and dynamic BAP for two quays with imprecision in the arrival of vessels.
The results obtained show that the fuzzy MILP model for the BAP provides berthing plans with different degrees of precision, but it also has a drawback: after the berthing time of a vessel, the next vessel has to wait for all the time reserved for the possible earliness and delay. This represents a big waste of time during which the quay is not used, and the vessel has to stay longer than necessary at the port.
The FFLP model for the BAP overcomes the drawback of the fuzzy MILP model: the fuzzy berthing plan obtained can be adapted to possible incidences in the vessel arrivals.

The models were evaluated with a timeout of 60 min. In that time, both models were able to find the optimum solution for a small number of vessels; for instances from 15 up to 65 vessels they found non-optimum solutions, and for greater numbers of vessels they found no solutions.
To implement the model in an MTC, we suggest the steps to follow.
Finally, as a result of this research, we have open problems for future work: to extend the model to consider the quay cranes to be assigned to every vessel, and to use meta-heuristics to solve the fuzzy BAP model more efficiently when the number of vessels is greater.

Acknowledgements This work was supported by INNOVATE-PERU, Project No. PIBA-2-P-069-14.

References

1. Bierwirth, C., Meisel, F.: A survey of berth allocation and quay crane scheduling problems in
container terminals. Eur. J. Oper. Res. 202(3), 615–627 (2010)
2. Bruggeling, M., Verbraeck, A., Honig, H.: Decision support for container terminal berth plan-
ning: integration and visualization of terminal information. In: Proceedings of Van de Vervoers
logistieke Werkdagen (VLW2011), pp. 263–283. University Press, Zelzate (2011)
3. Das, S.K., Mandal, T., Edalatpanah, S.A.: A mathematical model for solving fully fuzzy linear
programming problem with trapezoidal fuzzy numbers. Appl. Intell. 46(3), 509–519 (2017)
4. Exposito-Izquiero, C., Lalla-Ruiz, E., Lamata, T., Melian-Batista, B., Moreno-Vega, J.: Fuzzy
optimization models for seaside port logistics: berthing and quay crane scheduling. Computa-
tional Intelligence, pp. 323–343. Springer International Publishing, Cham (2016)
5. Gutierrez, F., Vergara, E., Rodríguez, M., Barber, F.: Un modelo de optimización difuso para el
problema de atraque de barcos. Investig. Oper. 38(2), 160–169 (2017)
6. Gutierrez, F., Lujan, E., Vergara, E., Asmat, R.: A fully fuzzy linear programming model to
the berth allocation problem. Ann. Comput. Sci. Inf. Syst. 11, 453–458 (2017)
7. Frojan, P., Correcher, J., Alvarez-Valdez, R., Kouloris, G., Tamarit, J.: The continuous Berth
Allocation Problem in a container terminal with multiple quays. Exp. Syst. Appl. 42(21),
7356–7366 (2015)
8. Jimenez, M., Arenas, M., Bilbao, A., Rodríguez, M.V.: Linear programming with fuzzy param-
eters: an interactive method resolution. Eur. J. Oper. Res. 177(3), 1599–1609 (2007)
9. Kim, K., Moon, K.C.: Berth scheduling by simulated annealing. Transp. Res. Part B Methodol.
37(6), 541–560 (2003)
10. Lalla-Ruiz, E., Melin-Batista, B., Moreno-Vega, J.: Cooperative search for berth scheduling.
Knowl. Eng. Rev. 31(5), 498–507 (2016)
11. Laumanns, M., et al.: Robust adaptive resource allocation in container terminals. In: Proceed-
ings of 25th Mini-EURO Conference Uncertainty and Robustness in Planning and Decision
Making, Coimbra, Portugal, pp. 501–517 (2010)
12. Lim, A.: The berth planning problem. Oper. Res. Lett. 22(2), 105–110 (1998)
13. Luhandjula, M.K.: Fuzzy mathematical programming: theory, applications and extension. J.
Uncertain Syst. 1(2), 124–136 (2007)
14. Nasseri, S.H., Behmanesh, E., Taleshian, F., Abdolalipoor, M., Taghi-Nezhad, N.A.: Fully
fuzzy linear programming with inequality constraints. Int. J. Ind. Math. 5(4), 309–316 (2013)
15. Rodriguez-Molins, M., Ingolotti, L., Barber, F., Salido, M.A., Sierra, M.R., Puente, J.: A
genetic algorithm for robust berth allocation and quay crane assignment. Prog. Artif. Intell.
2(4), 177–192 (2014)

16. Rodriguez-Molins, M., Salido, M.A., Barber, F.: A GRASP-based metaheuristic for the Berth
allocation problem and the quay crane assignment problem by managing vessel cargo holds.
Appl. Intell. 40(2), 273–290 (2014)
17. Steenken, D., Vo, S., Stahlbock, R.: Container terminal operation and operations research-a
classification and literature review. OR Spectr. 26(1), 3–49 (2004)
18. UNCTAD: Container port throughput, annual, 2010–2016. http://unctadstat.unctad.org/wds/
TableViewer/tableView.aspx?ReportId=13321. Accessed 02 March 2018
19. Wang, X., Kerre, E.: Reasonable properties for the ordering of fuzzy quantities (I). Fuzzy Sets
Syst. 118(3), 375–385 (2001)
20. Yager, R.R.: A procedure for ordering fuzzy subsets of the unit interval. Inf. Sci. 24(2), 143–161
(1981)
21. Young-Jou, L., Hwang, C.: Fuzzy Mathematical Programming: Methods and Applications, vol.
394. Springer Science & Business Media, Berlin (2012)
22. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 100, 9–34 (1999)
23. Zhen, L., Lee, L.H., Chew, E.P.: A decision model for berth allocation under uncertainty. Eur.
J. Oper. Res. 212(3), 54–68 (2011)
24. Zimmermann, H.: Fuzzy Set Theory and its Applications, Fourth Revised edn. Springer, Dor-
drecht (2001)
Ideal Reference Method with Linguistic
Labels: A Comparison with LTOPSIS

Elio H. Cables, María Teresa Lamata and José Luis Verdegay

Abstract In many life situations we face decision making problems; therefore, it becomes necessary to study different theories, methods and tools to solve these kinds of problems as efficiently as possible. In this paper, we describe the elements that integrate a decision making model, and show some of the most used compensatory multicriteria decision making methods, such as TOPSIS, VIKOR or RIM. In particular, we identify the limitations of the RIM method to operate with linguistic labels. Next, the basic concepts of the Reference Ideal Method are described, and another variant is proposed to determine the minimum distance to the Reference Ideal, as well as the normalization function. We illustrate our method by means of an example and compare the results with those obtained by the LTOPSIS method. Finally, the conclusions are presented.

Keywords Multicriteria decision making · Reference ideal method · RIM

1 Introduction

There are different situations where it is necessary to solve a decision making problem. To facilitate the work of the decision maker, different methods have been developed, among which the Multicriteria Decision Making (MCDM) methods can be mentioned. Particularly, we will refer to the methods with a compensatory conception [1]. The purpose of this kind of problem is the selection of the best alternative $A_i$, $i = 1, 2, \ldots, m$, from the evaluation of each alternative for a criteria set

E. H. Cables (B)
Universidad Antonio Nariño, Bogotá, Colombia
e-mail: ehcables@uan.edu.co
M. T. Lamata · J. L. Verdegay
Universidad de Granada, 18071 Granada, Spain
e-mail: mtl@decsai.ugr.es
J. L. Verdegay
e-mail: verdegay@decsai.ugr.es


$C_j$, $j = 1, 2, \ldots, n$, such that the valuation or judgment matrix M is obtained. Also, the relative importance of each criterion $C_j$ is established by a weight $w_j$. Then, the decision making model is organized as follows:

$$M = \begin{array}{c|cccc}
 & w_1 & w_2 & \cdots & w_n \\
 & C_1 & C_2 & \cdots & C_n \\
\hline
A_1 & x_{11} & x_{12} & \cdots & x_{1n} \\
A_2 & x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
A_m & x_{m1} & x_{m2} & \cdots & x_{mn}
\end{array}$$

There are several compensatory Multicriteria Decision Making methods, among which are: the Analytic Hierarchy Process (AHP) [2], the Analytic Network Process (ANP) [3], the SMART method [4], the ELECTRE methods [5], the PROMETHEE method [6], the TOPSIS method [7], the VIKOR method [8] and the RIM method [9], among others.
It is important to highlight that these methods have been modified based on the needs of the social environment, for example, when one is in the presence of information having a high degree of imprecision or vagueness. In this case, one can mention several methods that operate with fuzzy numbers to solve this problem, such as:
• The AHP method [10–17].
• The ANP method [18–21].
• The ELECTRE method, with its respective versions and applications, ELECTRE
III [22, 23], ELECTRE IS [24], ELECTRE TRI [25–30].
• The PROMETHEE method [31–33].
• The VIKOR method [34–42].
• The TOPSIS method [43–49].
Also, the TOPSIS method has been modified to operate with linguistic labels, as
is the case with the LTOPSIS method [50].
Each of the above methods uses different conceptions to obtain the final aggregation value for each alternative. In particular, the TOPSIS, VIKOR and RIM methods are based on determining the separation of each alternative from the Ideal Solution. However, the TOPSIS and VIKOR methods identify the best alternative from the Positive Ideal Solution (PIS) and the Negative Ideal Solution (NIS), associated with the maximum and minimum values, respectively, while the RIM method's ideal solution can be any value or set of values lying between the maximum and minimum values.
The RIM method was proposed for crisp values; however, in a decision-making problem we can face data represented in different forms, for example: natural numbers, real numbers, linguistic labels and fuzzy numbers, among others. Taking into account the characteristics of the RIM method, the objective of this paper is to propose a variant of the RIM method to operate with linguistic labels. After formulating the problem to be solved, in addition to mentioning the main compensatory multicriteria decision making methods, the RIM method is described and a new formulation is proposed to determine the minimum distance to the Reference Ideal and to perform the normalization of the values of the decision matrix, so that it allows operating with linguistic labels. Then, through an example solved with the LTOPSIS method, the working of the L-RIM method is illustrated.

2 Background: RIM Method

In general, the conception of the TOPSIS, VIKOR and RIM methods is to determine the best alternative from the separation to the ideal solution; however, they use different metrics. On the other hand, the RIM method extends this working conception, because it allows the ideal solution to be a value or a set of values lying between the minimum value and the maximum value.

2.1 Basic Concepts

To work with the RIM method, essential concepts associated with each criterion $C_j$, $j = 1, 2, \ldots, n$, should be considered; they are described below:
• The Range $R_j$, which represents a set of values belonging to a universal set, which can be an interval, a set of labels, a set of numbers or simple values. It is associated with each of the criteria.
• The Reference Ideal $RI_j$, which represents the maximum importance of the criterion $C_j$ for the associated Range; furthermore, $RI_j \subset R_j$.
Then, based on these concepts, the distance to the Reference Ideal is determined. In this case, the distance from a value $x_{ij}$ to its corresponding Reference Ideal is obtained by expression (1).

$$d_{\min}(x_{ij}, RI_j) = \min\left\{ \left| x_{ij} - C_j \right|,\ \left| x_{ij} - D_j \right| \right\} \qquad (1)$$

In this case, it is considered that $RI_j = [C_j, D_j]$ and $x_{ij}$ is the valuation or judgment of each alternative i for each criterion $C_j$.
The RIM method, like the TOPSIS and VIKOR methods, requires the normalization of the valuation or judgment matrix M in order to transform the values $x_{ij}$ to the same scale. It is necessary to note that these methods use different metrics to carry out the normalization of the matrix M.
In the particular case of RIM, the normalization of the valuation matrix M is done through expression (2) [9].

$$f : x_{ij} \oplus R_j \oplus RI_j \to [0, 1] \qquad (2)$$

where:

• $R_j = [A_j, B_j]$ is the Range.
• $RI_j = [C_j, D_j]$ is the Reference Ideal.
• $d_{\min}(x_{ij}, RI_j)$ is obtained through expression (1).
• $x_{ij} \in [A_j, B_j]$, $dist(A_j, C_j) = |A_j - C_j|$ and $dist(D_j, B_j) = |D_j - B_j|$.

The concepts referred to above are essential for working with the RIM method, which is designed to operate only with numerical arguments. However, in everyday practice there are several decision-making problems where the valuation of the different alternatives $A_i$ for each criterion $C_j$ is done through linguistic terms, which implies modifying the calculation method to determine the minimum distance to the Reference Ideal and the normalization of the values set.

2.2 Normalization with Linguistic Labels

To adapt the RIM method to operate with linguistic labels, it is first necessary to associate with each label the numerical value to be used, which in this case will be a triangular fuzzy number $\tilde{x} = (x_1, x_2, x_3)$. Then, the distance between two linguistic labels can be obtained by expression (3).

$$dist_L : L_X \times L_Y \to \mathbb{R}$$

$$dist_L(L_X, L_Y) = dist\left(\tilde{X}, \tilde{Y}\right) = \sqrt{\frac{1}{3}\left[ (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 \right]} \qquad (3)$$

As is known, the RIM method in its working conception uses the minimum distance; therefore, the following formulation for linguistic labels is necessary.

Definition 1 Let $L_X, L_C, L_D$ be linguistic labels; then the minimum distance from the label $L_X$ to the interval $[L_C, L_D]$ is given by the function $d^L_{\min}$, such that:

$$d^L_{\min} : L_X \otimes [L_C, L_D] \to \mathbb{R}$$

$$d^L_{\min}(L_X, [L_C, L_D]) = \min\left\{ dist_L(L_X, L_C),\ dist_L(L_X, L_D) \right\} \qquad (4)$$

where the functions $dist_L(L_X, L_C)$ and $dist_L(L_X, L_D)$ are obtained by expression (3).

Then, from the definition above, we have the conditions to define the normalization function.

Definition 2 Let $L_{k_{ij}}, L_{A_j}, L_{B_j}, L_{C_j}, L_{D_j}$ be linguistic labels, such that $R_{L_j} = [L_{A_j}, L_{B_j}]$ represents the Range, $RI_{L_j} = [L_{C_j}, L_{D_j}]$ represents the Reference Ideal and $RI_{L_j} \subseteq R_{L_j}$ for each criterion $C_j$; then the normalization function $f^L$ is given by:

$$f^L : L_{k_{ij}} \oplus R_{L_j} \oplus RI_{L_j} \to [0, 1]$$

$$f^L(L_{k_{ij}}, R_{L_j}, RI_{L_j}) =
\begin{cases}
1 & \text{if } L_{k_{ij}} \in RI_{L_j} \\[4pt]
1 - \dfrac{d^L_{\min}(L_{k_{ij}}, RI_{L_j})}{dist_L(L_{A_j}, L_{C_j})} & \text{if } L_{k_{ij}} \in [L_{A_j}, L_{C_j}] \wedge L_{k_{ij}} \notin RI_{L_j} \wedge dist_L(L_{A_j}, L_{C_j}) \ne 0 \\[4pt]
1 - \dfrac{d^L_{\min}(L_{k_{ij}}, RI_{L_j})}{dist_L(L_{D_j}, L_{B_j})} & \text{if } L_{k_{ij}} \in [L_{D_j}, L_{B_j}] \wedge L_{k_{ij}} \notin RI_{L_j} \wedge dist_L(L_{D_j}, L_{B_j}) \ne 0 \\[4pt]
0 & \text{otherwise}
\end{cases} \qquad (5)$$

where:

• $d^L_{\min}(L_{k_{ij}}, RI_{L_j}) = d^L_{\min}(L_{k_{ij}}, [L_{C_j}, L_{D_j}])$, which is obtained through expression (4).
• $dist_L(L_{A_j}, L_{C_j})$ and $dist_L(L_{D_j}, L_{B_j})$, which are obtained through expression (3).
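A small Python sketch of expressions (3)-(5) follows, assuming each label is encoded by the triangular fuzzy numbers of Table 1 and that interval membership is checked through the linear order of the term set (this membership test is our assumption):

```python
from math import sqrt

TERMS = {  # label -> triangular fuzzy number (Table 1), listed in increasing order
    "Very poor": (0, 1, 2), "Poor": (1.5, 2.5, 3.5), "Medium poor": (3, 4, 5),
    "Fair": (4, 5, 6), "Medium good": (5, 6, 7), "Good": (6.5, 7.5, 8.5),
    "Very good": (8, 9, 10),
}
ORDER = {label: k for k, label in enumerate(TERMS)}

def dist_L(lx, ly):                                    # expression (3)
    X, Y = TERMS[lx], TERMS[ly]
    return sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)) / 3.0)

def dmin_L(lx, lc, ld):                                # expression (4)
    return min(dist_L(lx, lc), dist_L(lx, ld))

def f_L(lx, la, lb, lc, ld):                           # expression (5)
    i, a, b, c, d = (ORDER[t] for t in (lx, la, lb, lc, ld))
    if c <= i <= d:                                    # label inside the Reference Ideal
        return 1.0
    if a <= i <= c and dist_L(la, lc) != 0:
        return 1.0 - dmin_L(lx, lc, ld) / dist_L(la, lc)
    if d <= i <= b and dist_L(ld, lb) != 0:
        return 1.0 - dmin_L(lx, lc, ld) / dist_L(ld, lb)
    return 0.0

# Criterion C1 of the example in Sect. 4: Range [Very poor, Very good],
# Reference Ideal {Very good}; "Medium good" normalizes to 0.625 (cf. Table 4).
print(f_L("Medium good", "Very poor", "Very good", "Very good", "Very good"))
```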

3 L-RIM: Reference Ideal Method with Linguistic Labels

Starting from the definition of the minimum distance to the reference ideal (expression 4) for a working domain with linguistic labels, the normalization function (expression 5) was established for the new context, which allows modifying some steps of the RIM algorithm [9], as shown below:
Step 1. Define the context.
In this case, the information associated with each criterion $C_j$ consists of linguistic terms; therefore, the following are defined:
• The Range $R_{L_j} = [L_{A_j}, L_{B_j}]$, which is a set of linguistic labels.
• The Reference Ideal $RI_{L_j} = [L_{C_j}, L_{D_j}]$, which is a set of linguistic labels, where $RI_{L_j} \subseteq R_{L_j}$.
• The weight $w_j$ associated with the criterion.
Step 2. Obtain the decision matrix, where the valuations issued $l_{k_{ij}}$ are linguistic terms, such that $l_{k_{ij}} \in R_{L_j}$.
$$V = \begin{pmatrix}
l_{k_{11}} & l_{k_{12}} & \cdots & l_{k_{1n}} \\
l_{k_{21}} & l_{k_{22}} & \cdots & l_{k_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
l_{k_{m1}} & l_{k_{m2}} & \cdots & l_{k_{mn}}
\end{pmatrix}$$

Step 3. Normalize the decision matrix V.

$$N = \begin{pmatrix}
f^L(l_{k_{11}}, R_{L_1}, RI_{L_1}) & f^L(l_{k_{12}}, R_{L_2}, RI_{L_2}) & \cdots & f^L(l_{k_{1n}}, R_{L_n}, RI_{L_n}) \\
f^L(l_{k_{21}}, R_{L_1}, RI_{L_1}) & f^L(l_{k_{22}}, R_{L_2}, RI_{L_2}) & \cdots & f^L(l_{k_{2n}}, R_{L_n}, RI_{L_n}) \\
\vdots & \vdots & \ddots & \vdots \\
f^L(l_{k_{m1}}, R_{L_1}, RI_{L_1}) & f^L(l_{k_{m2}}, R_{L_2}, RI_{L_2}) & \cdots & f^L(l_{k_{mn}}, R_{L_n}, RI_{L_n})
\end{pmatrix}
= \begin{pmatrix}
n_{11} & n_{12} & \cdots & n_{1n} \\
n_{21} & n_{22} & \cdots & n_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
n_{m1} & n_{m2} & \cdots & n_{mn}
\end{pmatrix}$$

where the function $f^L$ is given by expression (5).

Step 4. Calculate the weighted normalized matrix P.

$$P = N \otimes W = \begin{pmatrix}
n_{11} \cdot w_1 & n_{12} \cdot w_2 & \cdots & n_{1n} \cdot w_n \\
n_{21} \cdot w_1 & n_{22} \cdot w_2 & \cdots & n_{2n} \cdot w_n \\
\vdots & \vdots & \ddots & \vdots \\
n_{m1} \cdot w_1 & n_{m2} \cdot w_2 & \cdots & n_{mn} \cdot w_n
\end{pmatrix}$$

Step 5. Calculate the distance to the ideal and non-ideal solution of each alternative $A_i$.

$$A_i^+ = \sqrt{\sum_{j=1}^{n} (p_{ij} - w_j)^2} \quad \text{and} \quad A_i^- = \sqrt{\sum_{j=1}^{n} p_{ij}^2},$$

where $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$.

Step 6. Calculate the relative index to the reference ideal of each alternative $A_i$.

$$R_i = \frac{A_i^-}{A_i^+ + A_i^-}, \quad \text{where } 0 \le R_i \le 1,\ i = 1, 2, \ldots, m.$$
Step 7. Rank the alternatives $A_i$ in descending order of the relative index $R_i$.

If the relative index $R_i$ is close to 1, it indicates that the alternative is very good. However, if this value is close to 0, it is interpreted that the alternative must be rejected.
As can be observed, the RIM algorithm is modified in the following aspects:
• In Step 1, the definitions of the Range and the Reference Ideal use linguistic data.
• The judgment or valuation matrix V is formed by linguistic labels.
• The normalization function uses linguistic labels and sets of linguistic labels as arguments.

4 Illustrative Example

To show the use of the proposed method, we apply the example used in [50] about a decision making maintenance problem in an engine factory.
The decision-making problem consists of deciding which is the best system for cleaning the pieces in the maintenance of four-stroke engines. In this problem, we have the following alternatives:
• A1: Conventional cleaning
• A2: Chemical cleaning
• A3: Ultrasonic cleaning
To evaluate the different alternatives, the following criteria were used:
• C1: Total annual operation cost
• C2: System productivity
• C3: System load capacity
• C4: Cleaning efficiency
• C5: Harmful effects
In this case, the different criteria were evaluated for each alternative through the set of linguistic labels defined in Table 1, whose graphic representation is shown in Fig. 1.

Table 1 Definition of the linguistic labels and their corresponding fuzzy numbers

Linguistic labels  Fuzzy numbers
Very poor          (0, 1, 2)
Poor               (1.5, 2.5, 3.5)
Medium poor        (3, 4, 5)
Fair               (4, 5, 6)
Medium good        (5, 6, 7)
Good               (6.5, 7.5, 8.5)
Very good          (8, 9, 10)

Fig. 1 Graphic representation of the fuzzy numbers with their respective linguistic label

C1 C2 C3 C4 C5
A1 Medium good Medium good Fair Fair Medium good
A2 Medium good Fair Very good Medium good Very poor
A3 Fair Very good Medium good Medium good Medium good

Fig. 2 Decision matrix

Table 2 Definition of the working context


Criteria Weights Range R L j Reference ideal
R IL j
C1 0.3461 [V er y poor, V er y good] {V er y good}
C2 0.2975 [V er y poor, V er y good] {V er y good}
C3 0.0686 [V er y poor, V er y good] {V er y good}
C4 0.1812 [V er y poor, V er y good] {V er y good}
C5 0.1066 [V er y poor, V er y good] {V er y good}
Note The weights associated with each criterion were found by means of the Analytic Hierarchy
Process [51]

The expert evaluated the previously defined alternatives to solve the established problem, as shown in Fig. 2 [50].
When RIM is applied, it is necessary to consider the context (see Table 2). In this case, the Range and the Reference Ideal are the same for all criteria.
When the fuzzy number corresponding to each linguistic label is substituted into the decision matrix (Fig. 2) and the working context (Table 2), the new decision matrix is obtained, as well as the Range and the Reference Ideal (Table 3).
Tables 4, 5 and 6 show the different steps of the algorithm.

Table 3 Representation with fuzzy number of the Decision Matrix, the Range and the Reference
Ideal
Alternatives C1 C2 C3 C4 C5
A1 (5, 6, 7) (5, 6, 7) (4, 5, 6) (4, 5, 6) (5, 6, 7)
A2 (5, 6, 7) (4, 5, 6) (8, 9, 10) (5, 6, 7) (0, 1, 2)
A3 (4, 5, 6) (8, 9, 10) (5, 6, 7) (5, 6, 7) (5, 6, 7)
Range RL1 RL2 RL3 RL4 RL5
LA (0, 1, 2) (0, 1, 2) (0, 1, 2) (0, 1, 2) (0, 1, 2)
LB (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10)
Reference ideal  R IL1      R IL2      R IL3      R IL4      R IL5
                 (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10)

Table 4 Normalized valuation matrix


Alternatives C1 C2 C3 C4 C5
A1 0.625 0.625 0.5 0.5 0.625
A2 0.625 0.5 1 0.625 0
A3 0.5 1 0.625 0.625 0.625

Table 5 Weighted normalized matrix


Alternatives C1 C2 C3 C4 C5
A1 0.216313 0.185938 0.0343 0.0906 0.066625
A2 0. 216313 0.14875 0.0686 0.11325 0
A3 0.17305 0.2975 0.042875 0.11325 0.066625

Table 6 Indexes calculation


Alternatives Ai+ Ai− Ri
A1 0.200683 0.308525 0.606
A2 0.234419 0.294022 0.556
A3 0.191894 0.370884 0.659
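The sketch below reproduces Steps 4-6 from the normalized matrix of Table 4 and the weights of Table 2; it is a verification aid written by us, not part of the original implementation:

```python
from math import sqrt

w = [0.3461, 0.2975, 0.0686, 0.1812, 0.1066]   # weights of Table 2
N = [[0.625, 0.625, 0.500, 0.500, 0.625],      # A1 (normalized matrix, Table 4)
     [0.625, 0.500, 1.000, 0.625, 0.000],      # A2
     [0.500, 1.000, 0.625, 0.625, 0.625]]      # A3

for k, row in enumerate(N, start=1):
    p = [n_ij * w_j for n_ij, w_j in zip(row, w)]                   # Step 4: P = N (x) W
    A_plus = sqrt(sum((p_j - w_j) ** 2 for p_j, w_j in zip(p, w)))  # Step 5
    A_minus = sqrt(sum(p_j ** 2 for p_j in p))
    print(f"A{k}: R = {A_minus / (A_plus + A_minus):.3f}")          # Step 6
# Output: A1: R = 0.606, A2: R = 0.556, A3: R = 0.659 (cf. Table 6)
```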

Finally, it can be said that the order of the alternatives is $A_3 > A_1 > A_2$, which is equal to the result obtained with LTOPSIS, although the $R_i$ values are different, albeit very close.
When applying the LTOPSIS method to this decision problem, the following relative index is obtained for each alternative (see Table 7).
As observed, when applying the LTOPSIS method and the L-RIM method, the same order is obtained for the alternatives ($A_3 > A_1 > A_2$), and the values of the relative indexes are very close.
On the other hand, it is necessary to specify that the use of the L-RIM method offers advantages with respect to LTOPSIS, because the L-RIM method uses the

Table 7 Relative index through LTOPSIS

Alternatives  Ri
A1            0.5938
A2            0.5469
A3            0.6933

same working principle as RIM [9], and the RIM method does not present Rank Reversal. The L-RIM method only modifies the distance function to a set (4) and the normalization function (5), which in this case operate with linguistic labels.

5 Conclusions

There are many multicriteria decision methods that can be applied to a decision making problem to obtain the best alternative. Among them, in this paper we have focused on TOPSIS and RIM because of their algorithmic resemblance. Hence, we have worked with linguistic variables, and for that reason RIM had to be adapted for the management of these data, while TOPSIS had already been extended. Therefore, in this paper a study of RIM was carried out and a modification was proposed to operate with linguistic labels, arriving at the following main conclusions:
• It was only necessary to modify the working method to determine the minimum distance to the Reference Ideal and the normalization function.
• Through the example used to show the work with L-RIM, it was observed that the values obtained for the relative index were very close to the values obtained with LTOPSIS for examples well known in the literature. However, this will not always happen, since TOPSIS cannot operate when the best value for a certain criterion is not an extreme value (maximum or minimum) but a value included between them.

Acknowledgements This work has been partially funded by projects TIN2014-55024-P and
TIN2017-86647-P from the Spanish Ministry of Economy and Competitiveness, P11-TIC-8001
from the Andalusian Government, and FEDER funds. Also, the support provided by the Antonio
Nariño University, Colombia.

References

1. Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs.
Wiley, New York (1976)
2. Saaty, T.L.: The analytic hierarchy process. McGraw-Hill, New York (1980)
3. Saaty, T.L.: Fundamentals of the Analytic Network Process. ISAHP, Kobe, Japan (1999)
4. Edwards, W., Barron, F.H.: SMARTS and SMARTER: improves simple methods for multiat-
tibute utility measurement. Organ. Behav. Hum. Decis. Process. 60, 306–325 (1994)
Ideal Reference Method with Linguistic Labels … 125

5. Roy, B.: Classement et choix en présence de points de vue multiples (la méthode ELECTRE).
Revue Francaise d’Informatique et de Recherche Opérationnelle 8, 57–75 (1968)
6. Brans, J.P., Vincke, P., Mareschal, B.: How to select and how to rank projects: the PROMETHEE
method. Eur. J. Oper. Res. 24, 228–238 (1986)
7. Hwang, C.L., Yoon, K.: Multi-attribute Decision Making: Methods and Applications. Springer-
Verlag, Berlin (1981)
8. Opricovic, S.: Multi-criteria optimization of civil engineering systems. Faculty of Civil Engi-
neering. Belgrade (1998)
9. Cables, E., Lamata, M.T., Verdegay, J.L.: RIM-reference ideal method in multicriteria decision
making. Inf. Sci. 337, 1–10 (2016)
10. Bozbura, F.T., Beskese, A., Kahraman, C.: Prioritization of human capital measurement indi-
cators using fuzzy AHP. Expert Syst. Appl. 32, 1100–1112 (2007)
11. Wang, Y.M., Luo, Y., Hua, Z.: On the extent analysis method for fuzzy AHP and its applications.
Eur. J. Oper. Res. 186, 735–747 (2008)
12. Dagdeviren, M., Yuksel, I.: Developing a fuzzy analytic hierarchy process (AHP) model for
behavior-based safety management. Inf. Sci. 178, 1717–1733 (2008)
13. Buyukozkan, G., Cifci, G., Guleryuz, S.: Strategic analysis of healthcare service quality using
fuzzy AHP methodology. Expert Syst. Appl. 38, 9407–9424 (2011)
14. Chou, C.H., Liang, G.S., Chang, H.C.: A fuzzy AHP approach based on the concept of possi-
bility extent. Qual. Quant. 47, 1–14 (2013)
15. Dabbaghian, M., Hewage, K., Reza, B., et al.: Sustainability performance assessment of green
roof systems using fuzzy-analytical hierarchy process (FAHP). Int. J. Sustain. Build. Technol.
Urban Dev. 5, 1–17 (2014)
16. Kubler, S., Voisin, A., Derigent, W., et al.: Group fuzzy AHP approach to embed relevant data
on communicating material. Comput. Ind. 65, 675–692 (2014)
17. Sánchez-Lozano, M., García-Cascales, M.S., Lamata, M.T.: Evaluation of optimal sites to
implant solar thermoelectric power plants: case study of the coast of the Region of Murcia,
Spain. Comput. Ind. Eng. 87, 343–355 (2015)
18. Ayag, Z., Ozdemir, R.: An intelligent approach to ERP software selection through fuzzy ANP.
Int. J. Prod. Res. 45, 2169–2194 (2007)
19. Onut, S., Tuzkaya, U.R., Torun, E.: Selecting container port via a fuzzy ANP-based approach:
a case study in the Marmara Region, Turkey. Trans. Policy 18, 182–193 (2011)
20. Kang, H.Y., Lee, A.H., Yang, C.Y.: A fuzzy ANP model for supplier selection as applied to IC
packaging. J. Intell. Manuf. 23, 1477–1488 (2012)
21. Vahdani, B., Hadipour, H., Tavakkoli-Moghaddam, R.: Soft computing based on interval valued
fuzzy ANP-A novel methodology. J. Intell. Manuf. 23, 1529–1544 (2012)
22. Roy, B.: ELECTRE III: Un algorithme de rangement fondé sur une représentation floue des
préférences en présence de critéres multiples. Cahiers du Centre d´Etudes de recherche oper-
ationnelle, 20, 3–24 (1978)
23. Montazer, G.A., Saremi, H.Q., Ramezani, M.: Design a new mixed expert decision aid-
ing system using fuzzy ELECTRE III method for vendor selection. Expert Syst. Appl. 36,
10837–10847 (2009)
24. Roy, B., Skalka, J.: ELECTRE IS, aspects méthodologiques et guide d´utilisation. Université
Paris-Dauphine, Paris, Cahier du LAMSADE (1985)
25. Yu, W.: ELECTRE TRI: Aspects methodologiques et manuel d´utilisation. Universite Paris-
Dauphine, Document du LAMSADE (1992)
26. Sevkli, M.: An application of the fuzzy ELECTRE method for supplier selection. Int. J. Prod.
Res. 48, 3393–3405 (2010)
27. Wu, M.-C., Chen, T.-Y.: The ELECTRE multicriteria analysis approach based on Atanassov’s
intuitionistic fuzzy sets. Expert Syst. Appl. 38, 12318–12327 (2011)
28. Hatami-Marbini, A., Tavana, M., Moradi, M., et al.: A fuzzy group Electre method for safety
and health assessment in hazardous waste recycling facilities. Saf. Sci. 51, 414–426 (2013)
29. Devi, K., Yadav, S.P.: A multicriteria intuitionistic fuzzy group decision making for plant
location selection with ELECTRE method. Int. J. Adv. Manuf. Technol. 66, 1219–1229 (2013)

30. Sánchez-Lozano, J.M., García-Cascales, M.S., Lamata, M.T.: Comparative TOPSIS-


ELECTRE TRI methods for optimal sites for photovoltaic solar farms: case study in Spain. J.
Clean. Prod. 127, 387–398 (2016)
31. Behzadian, M., Kazemzadeh, R.B., Albadvi, A., et al.: PROMETHEE: a comprehensive liter-
ature review on methodologies and applications. Eur. J. Oper. Res. 200, 198–215 (2010)
32. Chen, Y.T., Wang, T.-C., Wu, C.-Y.: Strategic decisions using the fuzzy PROMETHEE for IS
outsourcing. Expert Syst. Appl. 38, 13216–13222 (2011)
33. Gupta, R., Sachdeva, A., Bhardwaj, A.: Selection of logistic service provider using fuzzy
PROMETHEE for a cement industry. J. Manuf. Technol. Manag. 23, 899–921 (2012)
34. Sanayei, A., Mousavi, S.F., Yazdankhah, A.: Group decision making process for supplier
selection with VIKOR under fuzzy environment. Expert Syst. Appl. 37, 24–30 (2010)
35. Opricovic, S.: Fuzzy VIKOR with an application to water resources planning. Expert Syst.
Appl. 38, 12983–12990 (2011)
36. Park, J.H., Cho, H.J., Kwun, Y.C.: Extension of the VIKOR method for group decision making
with interval-valued intuitionistic fuzzy information. Fuzzy Optim. Decis. Making 10, 233–253
(2011)
37. Jeya, R., Vinodh, S.: Application of fuzzy VIKOR and environmental impact analysis for
material selection of an automotive component. Mater. Des. 37, 478–486 (2012)
38. Yucenur, G.N., Demirel, N.C.: Group decision making process for insurance company selec-
tion problem with extended VIKOR method under fuzzy environment. Expert Syst. Appl. 39,
3702–3707 (2012)
39. Kim, Y., Chung, E.S.: Fuzzy VIKOR approach for assessing the vulnerability of the water
supply to climate change and variability in South Korea. Appl. Math. Model. 37, 9419–9430
(2013)
40. Wan, S.P., Wang, O.Y., Dong, J.-Y.: The extended VIKOR method for multi-attribute group
decision making with triangular intuitionistic fuzzy numbers. Knowl.-Based Syst. 52, 65–77
(2013)
41. Mokhtarian, M.N., Sadi-Nezhad, S., Makui, A.: A new flexible and reliable interval valued
fuzzy VIKOR method based on uncertainty risk reduction in decision making process: an
application for determining a suitable location for digging some pits for municipal wet waste
landfill. Comput. Ind. Eng. 78, 213–233 (2014)
42. Chang, T.H.: Fuzzy VIKOR method: a case study of the hospital service evaluation in Taiwan.
Inf. Sci. 271, 196–212 (2014)
43. Antucheviciene, J.: Evaluation of alternatives applying TOPSIS method in a fuzzy environment.
Technol. Econ. Dev. Econ. 11, 242–247 (2005)
44. Mahdavi, I., Mahdavi-Amiri, N., Heidarzade, A., et al.: Designing a model of fuzzy TOPSIS
in multiple criteria decision making. Appl. Math. Comput. 206, 607–617 (2008)
45. Ashtiani, B., Haghighirad, F., Makui, A.: Extension of fuzzy TOPSIS method based on interval-
valued fuzzy sets. Appl. Soft Comput. 9, 457–461 (2009)
46. Afshar, A., Marino, M.A., Saadatpour, M.: Fuzzy TOPSIS multicriteria decision analysis
applied to Karun reservoirs system. Water Resour. Manag. 25, 545–563 (2011)
47. García-Cascales, M.S., Lamata, M.T.: Multi-criteria analysis for a maintenance management
problem in an engine factory: rational choice. J. Intell. Manuf. 22, 779–788 (2011)
48. Arslan, M., Cunkas, M.: Performance evaluation of sugar plants by fuzzy technique for order
performance by similarity to ideal solution (TOPSIS). Cybern. Syst. 43, 529–548 (2012)
49. Ceballos, B., Lamata, M.T., Pelta, D.A.: Fuzzy multicriteria decision-making methods: a com-
parative analysis. Int. J. Intell. Syst. 32(7), 722–738 (2017)
50. Cables, E., Garcia-Cascales, M.S., Lamata, M.T.: The LTOPSIS: an alternative to TOPSIS
decision-making approach for linguistic variables. Expert Syst. Appl. 39, 2119–2126 (2012)
51. Garcia-Cascales, M.S., Lamata, M.T.: Selection of a cleaning system for engine maintenance
based on the analytic hierarchy process. Comput. Ind. Eng. 56(4), 1442–1451 (2009)
Comparative Analysis of Symbolic
Reasoning Models for Fuzzy Cognitive
Maps

Mabel Frias, Yaima Filiberto, Gonzalo Nápoles, Rafael Falcon, Rafael Bello
and Koen Vanhoof

Abstract Fuzzy Cognitive Maps (FCMs) can be defined as recurrent neural networks that allow modeling complex systems using concepts and causal relations. While this Soft Computing technique has proven to be a valuable knowledge-based tool for building Decision Support Systems, further improvements related to its transparency are still required. In this paper, we focus on designing an FCM-based model where both the causal weights and the concepts' activation values are described by words like low, medium or high. Hybridizing FCMs and the Computing with Words paradigm leads to cognitive models closer to human reasoning, making them more comprehensible for decision makers. The simulations using a well-known case study related to scenario simulation illustrate the soundness and potential application of the proposed model.

M. Frias (B) · Y. Filiberto


Department of Computer Science, University of Camaguey, Camaguey, Cuba
e-mail: mabel.frias@reduc.edu.cu
Y. Filiberto
e-mail: yaima.filiberto@reduc.edu.cu
G. Nápoles · K. Vanhoof
Hasselt Universiteit Agoralaan gebouw D, Diepenbeek, Belgium
e-mail: gonzalo.napoles@uhasselt.be
K. Vanhoof
e-mail: koen.vanhoof@uhasselt.be
R. Falcon
Larus Technologies Corporation, School of Electrical Engineering and Computer Science,
University of Ottawa, Ottawa, Canada
e-mail: rfalcon@ieee.org
R. Bello
Department of Computer Science, University of Las Villas, Santa Clara, Cuba
e-mail: rbellop@uclv.edu.cu

© Springer Nature Switzerland AG 2019 127


R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_7

1 Introduction

Fuzzy Cognitive Maps (FCMs) [1] can be seen as neural networks that allow modeling the dynamics of complex systems using concepts and causal relations between them. They continue growing in popularity within the scientific community as a decision-making method, where the transparency attached to the network becomes one of their most relevant features.
Actually, the transparency of these knowledge-based networks has motivated
researchers to develop interpretable classifiers. As an example, Nápoles [2] pro-
posed an FCM using a single-output architecture to predict the resistance of HIV
mutations to existing drugs. While this model was able to notably outperform the tra-
ditional classifiers reported in the literature, such results could not easily be extended
to other application domains.
In scenario analysis, the problem shifts from obtaining high prediction rates to
exploiting the model by performing WHAT-IF simulations. More explicitly, due to
the fact that FCMs are comprised of cause-effect relations, the experts can explore
the impact of activating a subset of concepts over the whole system, where both
the activation of concepts and causal weights are described by numerical values.
However, this can be a challenge for experts since human beings usually think in a
more qualitative, symbolic way.
Besides, if we analyze the way day-to-day problems are solved, we realize that, depending on the aspects presented by each problem, we can deal with different numerical values; in other cases, however, the problem presents qualitative aspects that are complex to evaluate by means of exact values [3].
Combining the graphical nature of FCMs with natural language techniques to
describe the concepts’ activation values and the causal relationships between them
has recently emerged as a very attractive research direction.
The use of linguistic terms or words to describe the whole cognitive network actually moves beyond knowledge representation; preserving these features during the neural inference rule is pivotal towards developing an accurate linguistic model. In this paper, we further explore the hybridization between FCMs and the Computing with Words (CWW) [4] paradigm, where the activation vectors and the weight matrix are described using words, which allows removing the need for membership functions. With this goal in mind, we adopt the symbolic model of CWW based on ordinal scales, since it is a very intuitive approach providing high interpretability. The simulations using a case study evidence the theoretical soundness and broad potential attached to our proposals.
The paper is organized as follows. In the next section, a short introduction to FCMs is presented. Section 3 describes the basic principles behind the CWW paradigm. In Sect. 4, some works related to combinations of FCMs and CWW are described, while in Sect. 5 the proposed models are presented. Section 6 presents the simulations, whereas Sect. 7 summarizes the concluding remarks and the research directions to be pursued in the near future.

2 Fuzzy Cognitive Maps

As already mentioned, FCMs are recurrent artificial neural networks introduced by Kosko in 1986 [1]. From the structural perspective, these knowledge-based networks can be described by a collection of concepts and causal connections between them. Such concepts denote either entities, objects, variables or states related to the physical system under investigation.
The causal relation between two concepts Ci and Cj is characterized by a numerical
weight wij ∈ [−1, 1] that denotes the direction and intensity to which the concept
Ci causes Cj . The sign of wij indicates whether the relationship between concepts Ci
and Cj is direct or inverse. These relationships have three possible states [5] that are
briefly summarized as follows:

• wij > 0 indicates a direct causality between the concept Ci and the concept Cj ,
that is, an increase (decrease) in the value of Ci leads to an increase (decrease) in
the value of Cj .
• wij < 0 indicates inverse (negative) causality between the concept Ci and the con-
cept Cj , that is, an increase (decrease) in the value of Ci leads to a decrease
(increase) in the value of Cj .
• wij = 0 indicates no relationship between Ci and Cj .

Equation 1 shows how to propagate an initial stimulus across the cognitive network comprised of N neural processing entities, where $A_j^{(t)}$ denotes the activation value of concept $C_j$ at the t-th iteration, whereas $w_{ji}$ is the causal weight connecting concepts $C_j$ and $C_i$. Likewise, the function $f(\cdot)$ is a transfer function that keeps the inner product within the allowed activation interval, e.g., $f(x) = 1/(1 + e^{-\lambda x})$. Other alternatives for the transfer function are the bivalent, the trivalent or the hyperbolic tangent function.

$$A_i^{(t+1)} = f\left( \sum_{j=1}^{N} A_j^{(t)} w_{ji} \right), \quad i \ne j \qquad (1)$$

The above reasoning rule is repeated until either the network converges to a fixed-
point attractor or a maximal number of cycles is reached. The former scenario implies
that a hidden pattern was discovered [6] whereas the latter suggests that the system
outputs are cyclic or chaotic.
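A minimal sketch of the reasoning rule (1) with a sigmoid transfer function follows; the 3-concept weight matrix and the parameters are illustrative, not taken from any case study:

```python
import numpy as np

# Illustrative 3-concept weight matrix; W[i, j] is the influence of C_i on C_j
W = np.array([[0.0,  0.6, -0.4],
              [0.3,  0.0,  0.5],
              [-0.2, 0.7,  0.0]])

def infer(A0, W, lam=1.0, max_iters=100, tol=1e-5):
    """Iterate rule (1) until a fixed-point attractor or the cycle limit."""
    A = np.asarray(A0, dtype=float)
    for _ in range(max_iters):
        A_next = 1.0 / (1.0 + np.exp(-lam * (A @ W)))  # f(x) = 1 / (1 + e^(-lambda x))
        if np.max(np.abs(A_next - A)) < tol:
            return A_next                              # hidden pattern discovered
        A = A_next
    return A                                           # possibly cyclic or chaotic

print(infer([1.0, 0.0, 0.0], W))
```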

3 Computing with Words

In 1973, Zadeh introduced the notion of a linguistic variable, which allows computing with words instead of numbers [4]. This symbolic processing paradigm allows handling linguistic variables (e.g., values in the form of words or sentences of natural language). The notion of a linguistic variable is adopted to describe situations that cannot clearly be defined in quantitative terms. The linguistic variables allow translating the

natural language into logical or numerical statements. The relevance of the CWW
paradigm in decision-making has allowed the emergence of different linguistic com-
putational models such as:

• Linguistic Computational Model based on the Extension Principle [7, 8]. In this
model, the semantics of linguistic terms are given by fuzzy numbers defined in
the [0, 1] interval, which are usually described by membership functions. The
following expression formalizes the linguistic aggregation operator attached to
this model, where $S^n$ symbolizes the n-Cartesian product, $\tilde{F}$ is an aggregation operator based on the extension principle, $F(\mathbb{R})$ is the set of fuzzy sets over the real numbers and $app_1(\cdot)$ is an approximation function that returns a label from the linguistic term set S.

$$S^n \xrightarrow{\;\tilde{F}\;} F(\mathbb{R}) \xrightarrow{\;app_1(\cdot)\;} S$$

• Linguistic Computational Symbolic Model based on ordinal scale [9]. This model
performs the computation on the indexes of the linguistic terms. Usually, it imposes a linear order on the set of linguistic terms $S = \{S_0, \ldots, S_g\}$, where $S_i < S_j$ if and only if $i < j$. Formally, it can be expressed as:

$$S^n \xrightarrow{\;R\;} [0, g] \xrightarrow{\;app_2(\cdot)\;} \{0, \ldots, g\} \to S$$

where R is a symbolic linguistic aggregation operator and $app_2(\cdot)$ is an approximation function used to obtain an index in $\{0, \ldots, g\}$, associated to a term in $S = \{S_0, \ldots, S_g\}$, from a value in the $[0, g]$ interval.
• The 2-tuple Fuzzy Linguistic Representation Model [3]. The above models have simple computational processes and high interpretability, but a common drawback: the loss of information caused by the need to express results in a discrete domain. The 2-tuple model is based on the notion of symbolic translation, which allows handling a domain of linguistic expressions as a continuous universe. This can be formalized as follows:

$$S \xrightarrow{\;\Delta\;} (S_i, \alpha_i) \xrightarrow{\;app_3(\cdot)\;} (S_i, \alpha_i) \xrightarrow{\;\Delta^{-1}\;} S$$

where $S_i \in S$ and $\alpha_i \in [-0.5, 0.5)$, $app_3(\cdot)$ is the aggregation operator for 2-tuples, whereas the functions $\Delta$ and $\Delta^{-1}$ transform numerical values into 2-tuples and vice-versa without losing information.
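As an illustration of the second (ordinal) model above, the sketch below aggregates term indexes with a weighted mean playing the role of the operator R and rounds the result back to a label, playing the role of $app_2(\cdot)$. The seven-term set and the weights are illustrative assumptions, not values from the paper.

```python
S = ["S0", "S1", "S2", "S3", "S4", "S5", "S6"]   # hypothetical linear term set, g = 6

def ordinal_weighted_mean(terms, weights):
    """Symbolic aggregation on term indexes, then app2 maps [0, g] to {0,...,g}."""
    idx = [S.index(t) for t in terms]                               # words -> indexes
    beta = sum(w * i for w, i in zip(weights, idx)) / sum(weights)  # R: value in [0, g]
    return S[round(beta)]                                           # app2: nearest label

print(ordinal_weighted_mean(["S5", "S2", "S4"], [0.5, 0.3, 0.2]))   # -> 'S4'
```

Note how the final rounding step is exactly the source of the information loss mentioned for this family of models: the exact value 3.9 is collapsed to the index 4.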

4 Related Work

Recent work on FCMs has combined their graphical nature with natural language techniques to describe both the concepts' activation values and the causal relations between them. In this way, a qualitative reasoning model is obtained.

For example, in 2014 a model was proposed for decision making with an FCM where the causal relations are initially represented by linguistic 2-tuples. However, to perform the FCM's inference process, these values are transformed into numeric values [10]. An FCM for modeling consensus processes is proposed in [11], where linguistic 2-tuples are used as a form of causal knowledge representation. But again, the inference process is performed over numerical values.
Rickard et al. [12] introduced, in 2015, a symbolic model based on interval type-2 (IT2) fuzzy membership functions and the weighted power mean operator [13–16]. The membership functions are calculated from multiple-user interval inputs corresponding to vocabulary words, as described by Hao and Mendel in [17]. The aggregation functions used are based upon the fuzzy neuronal model described in [18], which allows for the separate aggregation of positively and negatively causal inputs to each node, followed by a final aggregation of these two aggregates. Rather than using a distance function to map the IT2 node outputs at each iteration into one of the IT2 vocabulary words, they use the Jaccard similarity measure for this purpose. This method was applied, for the first time, to a real medical dataset to categorize celiac disease (CD). This work showed the good results of the CWW FCM method in a classification task [19].
In 2016, Gónzalez et al. [20] used the CWW paradigm to model project portfolio risk interdependencies in an FCM. In that article, the weight matrix is represented using the 2-tuple linguistic model. This proposal allows visualizing and making more understandable the relationships between the risks, but it is not clear whether the activation of concepts (risks) is expressed with numerical values or using the 2-tuple model as well. That same year, Salah Hasan Saleh and his colleagues [21] proposed an FCM model where the weight matrix is expressed with hesitant fuzzy sets [22]. This model was used to improve the interpretability of diagnostic models of cardiovascular diseases. Although this proposal achieves more flexibility to express causal relations between concepts, the map was just used to show the relations between the symptoms and there is no inference process.
More recently, in [23] the authors presented a model to perform the neural reasoning process of FCM-based systems using the Linguistic Computational Symbolic Model based on ordinal scales [9] to represent the concepts' activation values and the weight matrix. This proposal inherits the drawbacks of the symbolic CWW model employed: loss of information, lack of accuracy and no parameters to adjust. Aiming at solving these drawbacks, in [24] the authors introduced a model that replaces the numerical components of the FCM reasoning with linguistic terms represented by Triangular Fuzzy Numbers (TFN) [25]. This model was applied to analyze the effects of different variables (i.e., concepts) leading to the presence of chondromalacia in a patient.
As can be observed, interest in restoring to FCMs the fuzzy character of Kosko's initial proposal has been growing. However, not all proposals achieve a completely fuzzy inference process; most of them only represent the causal relationships through linguistic terms that are transformed into numerical values before performing the inference. That is why we carry out a comparative study between Rickard's proposal and the methods proposed at ISFUROS 2017 and MICAI 2017, since in these proposals the entire inference process is executed with linguistic terms.

5 Fuzzy Cognitive Reasoning with Words

In this section, we describe a model where the concepts' activation values and the weights defining the semantics of FCM-based systems are described using words instead of numerical values. The goal of this model is to improve the transparency of FCM-based models, but the reasoning process is not trivial; we have to solve two key problems: (i) how to multiply two linguistic terms or words, and (ii) how to add the results of these products.
Problem 1. What does $A_j^{(t)} w_{ji}$ mean? Does it represent the product of two linguistic terms defined in the CWW paradigm?
Problem 2. How to define a transfer function f(.) that takes a set of words as an argument? Is this function really needed?
In order to answer the above questions, let us assume a basic model comprising a set of linguistic terms S = {NA (Not Activated), $VL/-VL$ (Very Low), $L/-L$ (Low), $ML/-ML$ (Moderate Low), $M/-M$ (Moderate), $MH/-MH$ (Moderate High), $H/-H$ (High), $VH/-VH$ (Very High)}. The negative linguistic terms in S will only be used to describe negative causal weights $w_{ij}$ between two concepts, since we assume that the concepts' activation values $C = \{C_1, C_2, \ldots, C_N\}$ are always positive.
Aiming at mapping the product $A_j^{(t)} w_{ji}$, we consider the operator described in Eq. 2, where $\varsigma(w_{ji})$ and $\varsigma(A_j^{(t)})$ are the Gaussian fuzzy numbers (GFN) [26] for $w_{ji}$ and $A_j^{(t)}$, respectively.

$$I(w_{ji}, A_j^{(t)}) = \varsigma(w_{ji})\,\varsigma(A_j^{(t)}) \qquad (2)$$

A Gaussian fuzzy number can be described by a triplet $(m, \sigma_1, \sigma_2)$, where m is the crisp magnitude of the GFN and $\sigma_1, \sigma_2$ are fuzziness parameters. Figure 1 illustrates the Gaussian membership functions associated with the set of linguistic terms S.
There are many papers related to fuzzy number arithmetic (e.g., [27–30]). In this paper, we adopted the multiplication defined in [28] between two GFNs $\hat{a} = (m_a, \sigma_{1a}, \sigma_{2a})$ and $\hat{b} = (m_b, \sigma_{1b}, \sigma_{2b})$ as follows: $\hat{a} \times \hat{b} \approx (m_a \times m_b,\; \sigma_{1a} \times |m_a| + \sigma_{1b} \times |m_b|,\; \sigma_{2a} \times |m_a| + \sigma_{2b} \times |m_b|)$. Equation 3 displays the aggregation of the $N_i$ linguistic terms impacting the i-th concept, which produces a fuzzy number.

$$\varsigma(C_i^{(t+1)}) = \sum_{j=1}^{N_i} I_j(w_{ji}, A_j^{(t)}) \qquad (3)$$

Usually, the fuzzy number obtained from Eq. 3 does not match any linguistic term in the initial linguistic term set, so a linguistic approximation process is needed. The next step of the proposed symbolic reasoning model is devoted to recovering the linguistic term attached to $\varsigma(C_i^{(t+1)})$. With this goal in mind, we use the deviation between two GFNs as a distance function [31], which can be defined as follows:

Fig. 1 Gaussian membership function

$$\delta(\hat{a}, \hat{b}) = \sqrt{\frac{1}{3}\left[(a^m - b^m)^2 + (a^{\sigma_1} - b^{\sigma_1})^2 + (a^{\sigma_2} - b^{\sigma_2})^2\right]} \qquad (4)$$

Equation 5 displays the reasoning rule for this configuration, which computes the corresponding linguistic term for the i-th linguistic concept. This function determines the linguistic term reporting the minimal distance between its GFN and the one resulting from Eq. 3. However, the linguistic term computed in this step could be defined by a GFN comprising negative values, which is not allowed in our activation model. Aiming at overcoming this issue, we rely on a transfer function for symbolic domains shown in Fig. 2.

$$A_i^{(t+1)} = \operatorname*{argmin}_{S_k \in S}\left\{\delta\!\left(\varsigma(C_i^{(t+1)}), \varsigma(S_k)\right)\right\} \qquad (5)$$

It should be stated that the linguistic FCM-based model presented in this section
preserves its recurrent nature. This implies that it will produce a state vector com-
prised of linguistic terms at each iteration until either a fixed-point is discovered or
a maximal number of iterations is reached.
Operating with words leads to other advantages, which are related to the system
convergence. After a certain number of iterations, a linguistic FCM will converge to
either a fixed-point attractor or a limit cycle (see [32] for further details) but chaos
is not possible. This happens because a linguistic FCM is a closed system that will

Fig. 2 Transfer function for symbolic domains

Fig. 3 Example of a linguistic FCM-based model

produce $|S|^N$ different responses at most. Therefore, after $|S|^N$ iterations, the map will produce a previously visited state.
To illustrate how our model operates, let us consider the FCM displayed in Fig. 3,
which comprises 5 concepts and 7 causal relations.
The goal of this example is to compute the linguistic activation term for the $C_5$ concept given the following activation sequence: $C_1 \leftarrow High(H)$, $C_2 \leftarrow High(H)$, $C_3 \leftarrow Medium(M)$, $C_4 \leftarrow Low(L)$. Once the concepts have been
activated, we can perform the reasoning process as explained above. This implies
computing the linguistic activation value A5 as the result of aggregating the linguis-
tic activation terms attached to concepts C1 − C4 and their corresponding linguistic
weights. Next we illustrate the operations related to one iteration in the symbolic
reasoning process:

$I_1 = \varsigma(H)\,\varsigma(-H) = [0.82, 0.11, 0.11] * [-0.83, 0.11, 0.11] = [-0.6806, 0.1815, 0.1815]$

$I_2 = \varsigma(H)\,\varsigma(M) = [0.82, 0.11, 0.11] * [0.50, 0.11, 0.11] = [0.41, 0.1452, 0.1452]$

$I_3 = \varsigma(M)\,\varsigma(-M) = [0.50, 0.11, 0.11] * [-0.50, 0.11, 0.11] = [-0.25, 0.11, 0.11]$

$I_4 = \varsigma(L)\,\varsigma(H) = [0.12, 0.10, 0.10] * [0.82, 0.11, 0.11] = [0.0984, 0.1022, 0.1022]$

then,

$$\varsigma(C_5) = I_1 + I_2 + I_3 + I_4 = (-0.5062, 0.4607, 0.4607)$$

$$\delta(\varsigma(C_5), S_1) = \sqrt{\frac{1}{3}\left[(-0.5062 + 1)^2 + (0.4607 - 0.06)^2 + (0.4607 + 0.06)^2\right]} = 0.43$$

$$\vdots$$

$$\delta(\varsigma(C_5), S_4) = \sqrt{\frac{1}{3}\left[(-0.5062 + 0.50)^2 + (0.4607 - 0.11)^2 + (0.4607 - 0.11)^2\right]} = 0.28$$

$$\vdots$$

$$\delta(\varsigma(C_5), S_{15}) = \sqrt{\frac{1}{3}\left[(-0.5062 - 1)^2 + (0.4607 - 0.08)^2 + (0.4607 - 0.08)^2\right]} = 0.92$$

$$A_5 = \min\{0.43, 0.34, 0.30, 0.28, 0.30, 0.35, 0.42, 0.47, 0.46, 0.46, 0.55, 0.64, 0.73, 0.82, 0.92\} = 0.28$$

$$A_5 = \operatorname*{argmin}_{S_k \in S}\{\delta(\varsigma(C_5^{(t+1)}), \varsigma(S_k))\} = S_4 = f(-M) = L.$$
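The sketch below replays this reasoning step in Python: the GFN product adopted from [28], the additive aggregation of Eq. 3, the deviation of Eq. 4 and the argmin recovery of Eq. 5. The vocabulary is restricted to the GFN parameters read from the example above, and the symbolic transfer function of Fig. 2 is omitted; because the figures printed above are rounded, the script's intermediate output may differ slightly from them.

```python
import math

# GFNs as (m, sigma1, sigma2); parameters read from the worked example above
VOCAB = {"-H": (-0.83, 0.11, 0.11), "-M": (-0.50, 0.11, 0.11),
         "L": (0.12, 0.10, 0.10), "M": (0.50, 0.11, 0.11), "H": (0.82, 0.11, 0.11)}

def gfn_mul(a, b):
    """Approximate GFN product adopted from [28]."""
    return (a[0] * b[0],
            a[1] * abs(a[0]) + b[1] * abs(b[0]),
            a[2] * abs(a[0]) + b[2] * abs(b[0]))

def gfn_add(a, b):
    """Componentwise sum used by the aggregation of Eq. 3."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def gfn_dist(a, b):
    """Deviation between two GFNs (Eq. 4)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / 3.0)

# Aggregate the four incoming stimuli of C5 (Eq. 3)
pairs = [("H", "-H"), ("H", "M"), ("M", "-M"), ("L", "H")]  # (activation, weight)
c5 = (0.0, 0.0, 0.0)
for act, w in pairs:
    c5 = gfn_add(c5, gfn_mul(VOCAB[act], VOCAB[w]))

# Recover the closest vocabulary word (Eq. 5); Fig. 2's transfer is not applied here
best = min(VOCAB, key=lambda k: gfn_dist(c5, VOCAB[k]))
print(c5, best)
```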

This new model is included in the comparative analysis of the next section.

6 Comparative Analysis of Symbolic Reasoning Models

In this section, we present a case study in order to assess the reliability of the proposed models for FCM-based systems.
The Mobile Payment System (MPS) was a project idea related to the fast evolving
world of mobile telecommunications. It was conceived as a prototype project to test
the validity and applicability of the FCM methodology developed. The idea behind
the MPS project is to allow mobile phone users to make small and medium payments
using their mobile phones [33], see Fig. 4.

Fig. 4 FCM for the MPS project

Fig. 5 Linguistic terms and their membership functions

In this subsection, we study the behavior of our proposal and of three FCM models combined with Computing with Words.
The experiment is oriented to calculating the linguistic activation values of each concept. This case study requires a fuzzification process, so the first step is to fuzzify the numerical weights describing the causality relations between concepts. Figure 5 displays the triangular membership functions used before applying the FCM-Ordinal (proposed in [23]) and FCM-TFN (proposed in [24]) models in the simulation scenario. To apply the CWW FCM model (proposed in [12]), the numerical values were fuzzified with type-2 fuzzy sets.

Table 1 Simulation results: converged node activations

FCM nodes   Initial activations   FCM-Ordinal   FCM-TFN   FCM-GFN   CWW FCM
1           H                     H             H         H         H
2           VH                    H             MH        MH        VH
3           ML                    MH            VH        VH        VH
4           VH                    H             MH        MH        VH
5           NA                    M             M         MH        VH
6           VH                    H             VH        VL        VH
7           VL                    M             M         M         VL
8           VH                    VH            VH        VH        VH
9           L                     MH            MH        MH        VH
10          VL                    MH            MH        MH        VH
11          H                     H             H         H         H
12          VL                    MH            MH        MH        VL
13          MH                    MH            MH        H         H
14          H                     MH            M         MH        VH
15          MH                    MH            M         M         H
16          H                     H             VH        VH        H
17          VH                    VH            VH        VH        VH
18          L                     M             VH        VH        H
19          H                     H             H         H         H
20          ML                    H             VH        VH        VH
21          H                     H             H         H         H
22          H                     H             H         H         H
23          H                     H             H         H         H
24          H                     M             H         H         H

For the FCM-GFN model (proposed in this paper), the fuzzification process was done using the membership functions shown in Fig. 1.
The initial activations for externality nodes, i.e., those nodes with no in-links (1, 11, 19, 21, 22, 23 and 24), were fixed to "High", and the remaining nodes were initialized randomly. The simulation results are shown in Table 1.
As seen, the four models converge to similar results because the factors with the greatest activation values in the FCM-Ordinal, FCM-TFN, FCM-GFN and CWW FCM models agree with those reported as most important for the success of the Mobile Payment System in [33].
If we compare these output vectors with the opinions of several interviewees (Table A2 of [33]), no differences are observed between these results and the interviewees' opinions. This similarity was calculated by applying the Euclidean distance between the mean of the opinions and each model's output vector.

With this case study, we have illustrated the practical advantages of using symbolic expressions to describe the FCM components and their reasoning mechanism. The results achieved are logically coherent and in line with common sense. The interpretability of these symbolic inference models is appreciated by users with no background in Mathematics or Computer Science.

7 Conclusions

In this paper, we have presented a model to perform the neural reasoning process of FCM-based systems using linguistic terms. This implies that both the concepts' activation values and the weight matrix are qualitatively described by linguistic terms instead of numerical values. The proposed model is particularly attractive in decision-making scenarios since experts feel more comfortable describing the problem domain using symbolic terms.
The simulations using a case study reveal that our model is capable of producing similar qualitative values in both symbolic and numerical settings. This outcome is surely encouraging and opens interesting research avenues, which are being explored by the authors. For example, whether the ordinal model is the best choice for operating on the linguistic terms is questionable. Moreover, employing the same aggregation operator for representing the sum and the multiplication could be considered unrealistic. In spite of these facts, we regard the proposed model as a baseline for future studies in this field.

Acknowledgements The authors would like to thank John T. Rickard from Distributed Infinity, Inc., Larkspur, CO, USA for his support with the simulations.

References

1. Kosko, B.: Fuzzy cognitive maps. Int. J. Man-Mach. Stud. 24, 65–75 (1986)
2. Nápoles, G., Grau, I., Bello, R., Grau, R.: Two-steps learning of fuzzy cognitive maps for
prediction and knowledge discovery on the HIV-1 drug resistance. Exp. Syst. Appl. 41(3),
821–830 (2014)
3. Herrera, F., Martínez, L.: A 2-tuple fuzzy linguistic representation model for computing with
words. IEEE Trans. Fuzzy Syst. 8(6), 746–752 (2000)
4. Zadeh, L.A.: Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans. Syst. Man Cybern. SMC-3(1), 28–44 (1973)
5. Kosko, B.: Neural Networks and Fuzzy Systems: A Dynamic System Approach to Machine
Intelligence. Englewood Cliffs (1992)
6. Kosko, B.: Hidden patterns in combined and adaptive knowledge networks. Int. J. Approx.
Reason. 2(4), 377–393 (1988)
7. Bonissone, P.P., Decker, K.S.: Selecting uncertainty calculi and granularity: an experiment in
trading-off precision and complexity, pp. 217–247. Amsterdam, The Netherlands (1986)
8. Degani, R., Bortolan, G.: The problem of linguistic approximation in clinical decision making.
Int. J. Approx. Reason. 2, 143–162 (1988)

9. Delgado, M., Verdegay, J.L., Vila, M.A.: On aggregation operations of linguistic labels. Int. J.
Intell. Syst 8, 351–370 (1993)
10. Pérez-Teruel, K., Leyva-Vázquez, M., Espinilla, M.: Computación con palabras en la toma de
decisiones mediante mapas cognitivos difusos. Revista Cubana de Ciencias Informáticas 8(2),
19–34 (2014)
11. Pérez-Teruel, K., Leyva-Vázquez, M., Estrada-Sentí, V.: Mental models consensus process
using fuzzy cognitive maps and computing with words. Ing. Univ. 19(1), 173–188 (2015)
12. Rickard, J.T., Aisbett, J., Yager, R.R.: Computing with words in fuzzy cognitive maps. In:
Proceedings of World Conference on Soft Computing, pp. 1–6 (2015)
13. Dujmovic, J.: Continuous preference logic for system evaluation. IEEE Trans. Fuzzy Syst
15(6), 1082–1099 (2007)
14. Dujmovic, J., Larsen, H.L.: Generalized conjunction/disjunction. Int. J. Approx. Reason. 46,
423–446 (2007)
15. Rickard, J.T., Aisbett, J., Yager, R.R., Gibbon, G.: Fuzzy weighted power means in evaluation
decisions. In: 1st World Symposium on Soft Computing (2010)
16. Rickard, J.T., Aisbett, J., Yager, R.R., Gibbon, G.: Linguistic weighted power means: compari-
son with the linguistic weighted average. In: IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE 2011), pp. 2185–2192 (2011)
17. Hao, M., Mendel, J.M.: Encoding words into normal interval type-2 fuzzy sets: HM approach.
IEEE Trans. Fuzzy Syst. 24, 865–879 (2016)
18. Rickard, J.T., Aisbett, J., Yager, R.R.: A new fuzzy cognitive map structure based on the
weighted power mean. IEEE Trans. Fuzzy Syst. 23, 2188–2202 (2015)
19. Najafi, A., Amirkhani, A., Papageorgiou, E.I., Mosavi, M.R.: Medical decision making based
on fuzzy cognitive map and a generalization linguistic weighted power mean for computing
with words (2017)
20. Gónzalez, M.P., De La Rosa, C.G.B., Cedeña Moran, F.J.: Fuzzy cognitive maps and computing with words for modeling project portfolio risks interdependencies. Int. J. Innov. Appl. Stud. 15(4), 737–742 (2016)
21. Saleh, S.H., Rivas, S.D.L., Gomez, A.M.M., Mohsen, F.S., Vzquez, M.L.: Representación del
conocimiento mediante mapas cognitivos difusos y conjuntos de términos lingüisticos difusos
dudosos en la biomedicina. Int. J. Innov. Appl. Stud. 17(1), 312–319 (2016)
22. Torra, V., Narukawa, Y.: On hesitant fuzzy sets and decision. In: IEEE International Conference,
pp. 1378–1382 (2009)
23. Frias, M., Filiberto, Y., Nápoles, G., Vanhoof, K., Bello, R.: Fuzzy cognitive maps reasoning with words: an ordinal approach. In: ISFUROS 2017 (2017)
24. Frias, M., Filiberto, Y., Nápoles, G., Garcia-Socarras, Y., Vanhoof, K., Bello, R.: Fuzzy cognitive maps reasoning with words based on triangular fuzzy numbers. In: MICAI 2017 (2017)
25. Van Laarhoven, P.J.M., Pedrycz, W.: A fuzzy extension of saaty’s priority theory. Fuzzy Sets
Syst 11, 229–241 (1983)
26. Pacheco, M.A.C., Vellasco, M.M.B.R.: Intelligent Systems in Oil Field Development Under Uncertainty. Springer, Berlin, Heidelberg (2009)
27. Akther, S.U., Ahmad, T.: A computational method for fuzzy arithmetic operations. Daffodil
Int. Univ. J. Sci. Technol. 4(1), 18–22 (2009)
28. Reznik, L.: Fuzzy Controller Handbook. Newnes (1997)
29. Weihua, S., Peng, W., Zeng, S., Pen, B., Pand, T.: A method for fuzzy group decision making
based on induced aggregation operators and euclidean distance. Int. Trans. Oper. Res. 20,
579–594 (2013)
30. Xu, Z.S.: Fuzzy harmonic mean operators. Int. J. Intell. Syst. 24, 152–172 (2009)
31. Chen, C.T.: Extension of the topsis for group decision-making under fuzzy environment. Fuzzy
Sets Syst. 114, 1–9 (2000)
32. Nápoles, G., Papageorgiou, E., Bello, R., Vanhoof, K.: On the convergence of sigmoid fuzzy
cognitive maps. Inf. Sci. 349–350, 154–171 (2016)
33. Carvalho, J.P.: On the semantics and the use of fuzzy cognitive maps and dynamic cognitive
maps in social sciences. Fuzzy Sets Syst. 214, 6–19 (2013)
Fuzzy Cognitive Maps for Evaluating
Software Usability

Yamilis Fernández Pérez, Carlos Cruz Corona and Ailyn Febles Estrada

Abstract Usability assessment is a highly complex process given the variety of criteria to consider, and it manifests imprecision, understood as the lack of concreteness about the values to be used, synonymous with ambiguity. The usability evaluation method proposed in this work incorporates elements of Soft Computing such as fuzzy logic and fuzzy linguistic modeling. Furthermore, the use of fuzzy cognitive maps allows adding the interrelations between criteria and, therefore, obtaining a realistic global index of usability. A mobile app was developed to evaluate the usability of mobile applications based on this proposal. The application of this proposal in a real-world environment shows that it is an operative, reliable and precise solution that is easy to interpret for its use in industry.

Keywords Software quality · Soft computing · Fuzzy cognitive map · Fuzzy logic

1 Introduction

Usability is one of the most important attributes of software quality. It is very common to define usability as a software's ease of use, but this definition is ambiguous. For this reason, there are several definitions according to the different approaches used to measure it. The best-known definitions appear in ISO 9126, ISO 9241 and ISO 25010 [1].
The definition most used for the evaluation of usability is that of the ISO 25010 standard. It defines usability as "the extent to which a product can be used by speci-

Y. F. Pérez (B)
University of Informatics Sciences, Havana, Cuba
e-mail: yamilisf@uci.cu
C. C. Corona (B)
University of Granada, Granada, Spain
e-mail: carloscruz@decsai.ugr.es
A. F. Estrada (B)
Cuban Information Technology Union, Havana, Cuba
e-mail: ailyn.febles@uniondeinformaticos.cu


fied users to achieve specified goals with effectiveness, efficiency and satisfaction in
a specified context of use” [2]. ISO describes usability as a combination of appropri-
ateness recognisability, learnability, operability, user error protection, user interface
aesthetics and accessibility.
The software usability assessment process is very expensive because it requires material resources and a team of well-trained specialists. It is a highly complex process given the variety of criteria to consider. For that reason, it is necessary to achieve a correlation between the software assessment results and the actual usability of the product.
The usability criteria to be taken into account in the assessment of a software product are grouped in different ways for a better understanding. The most widely used groupings are those known as hierarchical models, which decompose usability into criteria organized in the form of an n-ary tree. Such a hierarchical decomposition is a strategy widely used in different scientific disciplines. The most significant usability models are McCall, Nielsen, ISO 9241, ISO 9126 and ISO 25010. These models largely overlap; the attributes in different models are superimposed. Different names are used for the same attribute and, in some cases, the same name is used for different attributes, which becomes apparent when what is actually measured for the attribute at a low level is examined. Attributes are also mixed in different ways and located in different places in the hierarchy.
The usability criteria are usually interdependent, because the result of the preference of one criterion over another is influenced by the others. With the increasing level of understanding of usability, which transcends a simple taxonomy, the models have evolved towards the overlap and interrelation of these criteria. This causes a group of criteria to influence quality in a contradictory way. For example, greater appropriateness recognisability means better learnability.
In spite of this, the proposed solutions are purely hierarchical [3, 4]. There are many methods which combine conventional Multicriteria Decision Making (MCDM) methods with fuzzy concepts. Some use fuzzy TOPSIS [5] or AHP [4, 6, 7], and others use a fuzzy multi-criteria approach [4, 8, 9]. Nevertheless, usability assessment from independent criteria introduces some bias in favor of or against the product being assessed.
As a result of the above problem, an evaluation method of usability that assesses the interrelationships between criteria is the main contribution made by this book chapter. For this, the proposal uses elements of Soft Computing such as fuzzy logic, fuzzy linguistic modeling and Fuzzy Cognitive Maps (FCM) as solution methods. In addition, the proposal incorporates the restrictions of essential criteria, which is considered another contribution.
The paper has the following structure: Sect. 2 analyzes and compares several methods existing in the literature for the same purpose; Sect. 3 describes the methodology and methods used to obtain the solution; Sect. 4 defines a generic usability model, presents the new method based on Soft Computing, and describes an app for the usability assessment of mobile applications and a case study in a Cuban company. Finally, Sect. 5 is devoted to the conclusions and future works.

Table 1 Analysis of related works

Solutions   Interdependence among criteria   Essential criteria   Data independence
[11]        Yes, fuzzy ANP                   No                   No
[13]        No                               No                   Yes
[4]         No                               No                   No
[6]         No                               No                   No
[8]         No                               No                   Yes
[15]        No                               No                   No

2 Related Works

Different methods have been developed for usability evaluation based on MCDM and fuzzy theory [4–6, 8, 10–15]. Among the most used techniques is the fuzzy multi-criteria approach, notably the Fuzzy TOPSIS and Fuzzy AHP methods or their derivatives. But it has the limitation that it does not take into account the relations between the criteria. This motivates an analysis with ANP, which incorporates feedback and interdependence relationships among decision attributes. This provides a more accurate approach for modeling complex decision environments. ANP has two disadvantages: first, it is difficult to provide a correct network structure among criteria even for experts, and different structures lead to different results. Second, to form a supermatrix, all criteria have to be pair-wise compared with regard to all other criteria, which is difficult and also unnatural [16].
In [1], an extensive review of several usability assessment models was presented. The authors remarked that there is great similarity among the models, which allows modeling a general structure for their representation. In addition, to judge usability, for cognitive and practical reasons the number of usability sub-characteristics must be seven or fewer. A single level is insufficient, which is why sub-characteristics are defined. Another element presented in that paper is that the values of the criteria are heterogeneous because they come from objective and subjective criteria.
Most of the bibliography does not reflect the interdependency among the criteria. There are models that reflect this relation because they repeat measures for the related attributes. The use of Soft Computing techniques allows obtaining better results, although concluding that one specific Soft Computing technique is better than another is not appropriate. In the case of aggregation to obtain a global value of usability, both complete aggregation and partial aggregation must be permitted.
If the most outstanding works are selected, it can be seen that no solution solves all the problems encountered (see Table 1).

None of the analyzed methods allows partial aggregation of the criteria, and only one solution incorporates interdependence relationships among criteria. This can be seen in Table 1.
The usability evaluation method proposed in this paper allows adding the interre-
lation between criteria, the essential criteria and data independence to obtain a real
global index of usability.

3 Methodology

Precision in a model assumes that its parameters represent exactly the perception of the phenomenon or the characteristics of the actual modeled system [17]. This does not happen in the modeling of interdependence, which manifests imprecision, understood as the lack of concreteness about the values to be used, synonymous with ambiguity. Soft Computing is a methodology widely used in situations where the data to be considered are not accurate, but imprecise.
Quite often, the description of the state of objects or phenomena is done through words or sentences instead of numbers, for it to be useful and appropriate. This is the case of linguistic variables, whose values establish the description. These variables are useful because they constitute a way of compressing information [18]. In addition, they help to characterize a phenomenon that may be ill-defined, complex to define, or both. They are a means of translating concepts or linguistic descriptions into numerical ones and treating them automatically. Linguistic modeling is based on fuzzy sets and has proved its efficacy for the representation of information of a qualitative nature.
Fuzzy Cognitive Maps are a technique developed by Kosko [19] for quantitative modeling, based on the knowledge and experience of experts; an FCM is a fuzzy directed graph. The nodes represent concepts and the arcs the relationships between concepts. In an FCM, there are three possible types of relations between concepts: positive relation, negative relation or non-existence of a relation. The degree of the relationship is described through a fuzzy number or linguistic value defined in the interval [−1, 1]. An FCM consisting of n concepts is represented by an n × n matrix, known as the adjacency matrix. This matrix is obtained from the values assigned to the arcs. In this contribution, the interdependence between criteria is treated using Fuzzy Cognitive Maps, with the definition of the linguistic variable Influence (I).
Also, there is a subset of criteria classified as essential (EC), determined from the usability requirements. These criteria have associated restrictions and, in turn, an interval linked to them. The essential criteria are treated on the basis of restrictions and the definition of a penalization vector.

Fig. 1 Usability model

4 Proposed Model

In this section, the usability model is presented, assessing the interdependence between criteria through the use of the tools detailed in Sect. 3.

4.1 Usability Model

The usability model (UM) is represented as a graph defined as:

$$UM = (V, E_v, E_h, EC) \qquad (1)$$

where:
• V is the set of evaluation criteria.
• $E_v$ is the set of vertical links.
• $E_h$ is the set of horizontal links (influence).
• EC is the set of essential criteria.
UM (see Fig. 1) is constructed by levels: level 0 represents the usability index; level 1, the sub-characteristics; level 2, the metrics obtained from the software testing process and from expert assessments. A criterion is only found at one level, and the union of all levels corresponds to the whole to be valued.

$$Level_0 = \{U\}, \qquad \bigcap_{0 \le j \le l} Level_j = \emptyset, \qquad \bigcup_{0 \le j \le l} Level_j = V \qquad (2)$$

Each criterion has a weight (W) associated. At each level, there is a set of weight
vectors. The sum of the weights of the sibling criteria is equal to 1.

Fig. 2 Linguistic variable influence

The vertical links ($E_v$) represent the relationships between criteria (vertices) of consecutive levels, i.e. the relationship between parent and children.

$$E_v \subset V \times V, \qquad E_v = \{(y, x) \mid x, y \in V,\; x \in Level_j,\; y \in Level_{j-1}\} \qquad (3)$$

The criteria at all levels have a parent at the previous consecutive level, except level 0 (see Eq. 4).

$$\forall_{0 < j \le l}\; Level_j = \{x \mid (y, x) \in E_v,\; x \in V,\; y \in Level_{j-1}\} \qquad (4)$$

Horizontal links ($E_h$) represent the interdependence between criteria. Each link is represented by a triplet $\langle x, y, I_{xy} \rangle$, which means the influence of the criterion x on the criterion y, with a weight $I_{xy}$.

$$E_h \subset V \times V \times \{hn, n, winf, p, hp\}$$
$$E_h = \{\langle x, y, I_{xy} \rangle \mid x, y \in V,\; \exists z\, [(z, x) \in E_v, (z, y) \in E_v],\; I_{xy} \in \{hn, n, winf, p, hp\}\}$$
$$\forall_{x,y}\; \langle x, y, I_{xy} \rangle \in E_h \Rightarrow \langle y, x, I_{xy} \rangle \notin E_h \qquad (5)$$
$I_{xy}$ is a linguistic variable, which establishes five labels: highly negative (hn), negative (n), without influence (winf), positive (p) and highly positive (hp), defined in the interval [−1, 1] (see Fig. 2). The number of linguistic terms and the membership function parameters of the linguistic variable were modeled by human experts in software quality in the Cuban context.
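To fix ideas, the usability model UM = (V, Ev, Eh, EC) of Eq. 1 can be represented with plain Python containers, as in the minimal sketch below; all criterion names, links, thresholds and weights are illustrative assumptions, not taken from the case study.

```python
# Illustrative UM = (V, Ev, Eh, EC) instance with three levels
V = {"U", "Rec", "Lea", "ED", "IntDoc"}                    # criteria at all levels
Ev = {("U", "Rec"), ("U", "Lea"),                          # vertical parent -> child links
      ("Rec", "ED"), ("Lea", "IntDoc")}
Eh = {("Rec", "Lea", "p")}                                 # horizontal triplets <x, y, Ixy>
EC = {("IntDoc", 0.5, 1.0)}                                # essential criterion <x, Bx, upper Bx>
levels = [{"U"}, {"Rec", "Lea"}, {"ED", "IntDoc"}]         # levels 0, 1 and 2 partition V
weights = {"Rec": 0.5, "Lea": 0.5, "ED": 1.0, "IntDoc": 1.0}  # sibling weights sum to 1
```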
The horizontal links are modeled through Fuzzy Cognitive Maps, discussed in Sect. 3.
To determine the interdependence between criteria of the same level, the FCM is constructed from the information provided by the experts. The relationships between the sibling criteria are first analyzed, and an FCM is formed for each group of siblings. Nodes are sibling criteria and edges represent the influence of one criterion on another.

The resulting FCM is reviewed by each expert, and each edge is associated with a value of the variable I. The FCMs obtained by the experts should then be aggregated, using a technique that allows for consensus. It is advisable to use a consensus-building algorithm such as the one proposed in [20].
The consensus Fuzzy Cognitive Map (FCMc) is obtained for each group of siblings. From each FCMc, the adjacency matrix (AMc) is found; the different AMc of each level are combined and the matrix of interdependence between criteria is determined.
The unification of the different AMc is simple because there are no common criteria between the different maps. The combination is performed according to Eq. 6.

$$\begin{bmatrix} AMc_1 & winf & winf \\ winf & AMc_2 & winf \\ winf & winf & AMc_k \end{bmatrix} \qquad (6)$$

This matrix is called the criteria interdependence matrix (MI).

$$MI^l = \begin{bmatrix} y_{1,1} & \cdots & y_{1,j} & \cdots & y_{1,n} \\ \vdots & & \vdots & & \vdots \\ y_{i,1} & \cdots & y_{i,j} & \cdots & y_{i,n} \\ \vdots & & \vdots & & \vdots \\ y_{n,1} & \cdots & y_{n,j} & \cdots & y_{n,n} \end{bmatrix} \qquad (7)$$

$MI^l$ represents the interdependence matrix between the n criteria of level l.
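A compact way to realize the combination of Eq. 6 in code is a block-diagonal assembly. The sketch below assumes that the winf label defuzzifies to 0 and uses two small hypothetical consensus matrices for two sibling groups.

```python
import numpy as np
from scipy.linalg import block_diag

# Hypothetical consensus adjacency matrices (AMc) for two sibling groups,
# with the "without influence" label winf encoded as 0
AMc1 = np.array([[0.0,  0.7],
                 [-0.4, 0.0]])
AMc2 = np.array([[0.0, -0.8],
                 [0.5,  0.0]])

# Eq. 6: off-diagonal blocks stay at winf (= 0) since the maps share no criteria
MI = block_diag(AMc1, AMc2)
print(MI)
```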


On the other hand, there is a subset of criteria in the model classified as essential (EC) and determined from the usability requirements. The essential criteria are the attributes that the product must satisfy and whose absence must be penalized so that the entire sub-tree is evaluated to zero.

$$EC = \{\langle x, B_x, \bar{B}_x \rangle \mid B_x, \bar{B}_x \in \mathbb{R},\; B_x < \bar{B}_x\} \qquad (8)$$

where,
• x is an essential criterion.
• $B_x$: lower threshold value of criterion x.
• $\bar{B}_x$: upper threshold value of criterion x.
With this formal and generic definition of the usability model, a better structuring
of the problem of usability assessment is achieved.

4.2 Penalization Vector

The penalization vector is calculated on based of the essential criteria and their
restrictions. The objective of this vector is to control that the usability measures
considered essential act in accordance with the defined restrictions; otherwise, the
associated sub-characteristics are assigned the value 0.
For the calculation of the penalization vector P, Eq. 9 is used.
 
P  min z ik (9)
1≤k≤r

where,
• $z_{ik}$ is an element of the matrix Z.
• r represents the number of sibling criteria.

$$Z = \begin{bmatrix} z_{11} & \ldots & z_{1l_n} \\ \ldots & \ldots & \ldots \\ z_{m1} & \ldots & z_{ml_n} \end{bmatrix} \qquad (10)$$

Z is a matrix of values 0 and 1, which is calculated with Eq. 11.

$$z_{ij} = \begin{cases} 1 & \text{if } C_j \notin EC \\ g(x, B_x, \bar{B}_x) & \text{if } C_j \in EC \end{cases} \qquad (11)$$

$$g(x, B_x, \bar{B}_x) = \begin{cases} 1 & \text{if } B_x \le x \le \bar{B}_x \\ 0 & \text{otherwise} \end{cases}$$
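A minimal sketch of this penalization machinery (Eqs. 9–11), assuming crisp measure values and one hypothetical essential criterion with its thresholds:

```python
# Hypothetical essential-criteria set: criterion -> (lower, upper) thresholds
EC = {"Sat": (0.6, 1.0)}

def g(x, lo, hi):
    """Eq. 11 helper: 1 inside the threshold interval, 0 otherwise."""
    return 1 if lo <= x <= hi else 0

def penalization(measures):
    """Eq. 9 for one group of sibling criteria: P = min over the z_ik values."""
    z = [g(x, *EC[c]) if c in EC else 1 for c, x in measures.items()]
    return min(z)

# An essential criterion below its lower threshold forces P = 0,
# which in turn evaluates the whole sub-tree to zero
print(penalization({"Sat": 0.4, "App": 0.9}))  # -> 0
print(penalization({"Sat": 0.8, "App": 0.2}))  # -> 1 (App is not essential)
```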

4.3 Usability Assessment Method

The method consists of the following steps:

1. Obtaining the usability model: The usability model, the measures and the essential criteria to be used are defined, all based on the evaluation requirements.
2. Determination of the weights of the criteria and the interdependence: The first task is to assess the importance of each criterion, based on the opinion of the experts. The weights of the criteria can be established using different methods, either by direct determination by experts or by pairwise comparison, obtaining the eigenvector. It must be taken into account that the sum of the weights of siblings must be equal to 1. Then, the relationships between sibling criteria at each level are determined with the use of Fuzzy Cognitive Maps, and the interdependency matrix for each level is derived ($MI^l$).

3. Usability evaluation: Usability testing and experts' evaluations are performed for the different software products, and the value of each of the selected measures is obtained; the information is normalized and unified, and the evaluation matrix ($Me^l$) is established.
4. Aggregation of information: First, the influence matrix (G) is calculated using Eq. 12.

$$G = f\left(Me^l + Me^l \times MI^l\right) \qquad (12)$$

Next, the previous matrix is weighted and the information of sibling criteria is added (Eq. 13).

$$Gp = G \otimes W, \qquad gp_{ij} = g_{ij} \cdot w_j \qquad (13)$$

Finally, the products are penalized taking into account the essential criteria, using the penalization vector method described above.

$$Me^{l-1} \times P^{l-1} = (Me_{i1} \cdot P_{i1}, Me_{i2} \cdot P_{i2}, \ldots, Me_{in} \cdot P_{in}) \qquad (14)$$

Step 4 is repeated until the usability index is obtained.

5. Recommendation: A ranking is obtained according to the usability index.
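The aggregation pass of step 4 can be sketched numerically as follows. The fuzzy evaluation values are simplified here to crisp numbers, and f is assumed to be a clipping function keeping G in [0, 1] (the chapter does not fix a particular f, so this is an illustrative choice); the matrices and weights are hypothetical.

```python
import numpy as np

def aggregate_level(Me, MI, w):
    """One pass of step 4 on a (products x criteria) level with crisp values."""
    f = lambda x: np.clip(x, 0.0, 1.0)   # assumed transfer keeping values in [0, 1]
    G = f(Me + Me @ MI)                  # Eq. 12: add the influence received via MI
    Gp = G * w                           # Eq. 13: gp_ij = g_ij * w_j
    return Gp.sum(axis=1)                # aggregated value of the parent criterion

# Hypothetical level with 2 products and 3 sibling criteria (weights sum to 1)
Me = np.array([[0.8, 0.6, 0.7],
               [0.5, 0.9, 0.4]])
MI = np.array([[0.0, 0.3,  0.0],
               [0.0, 0.0, -0.2],
               [0.1, 0.0,  0.0]])
print(aggregate_level(Me, MI, w=np.array([0.5, 0.3, 0.2])))
```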

4.4 App to Evaluate the Usability of Mobile Applications

An app was developed to evaluate the usability of mobile applications. The proposed method is the basis of the implementation of this application. The app is developed on the Android operating system and through web services. The interfaces have been designed in a pleasant, understandable and easy-to-operate manner, so that the user, at all times, knows the actions that can and should be performed. The user goes through the application, from the registration of users to the ranking of the apps, intuitively and with comfortable navigation, following a sequential process. Figures 3, 4, 5 and 6 show the user interfaces of each step to be performed. Figures 3 and 4 correspond to step 2, determination of the weights of the criteria and the interdependence. On the other hand, Fig. 5 shows the entry of the values of the measures according to the third step. The results obtained from the application of the method are shown in Fig. 6.

Fig. 3 Interface of interdependence matrix entry

Fig. 4 Interface of entry of weight of criteria

Fig. 5 Interface of usability measure entry

4.5 Study Case

The previous method and app were applied in a controlled environment at CALISOFT, a Cuban software quality evaluation company, for the usability assessment of three products (S1, S2, S3).
Based on the software requirements, it was determined to evaluate usability and its sub-characteristics, according to ISO 25010: appropriateness recognisability, learnability, operability, and user interface aesthetics (see Table 2). Exhaustive description and Integrity of the documentation are numeric measures, while Satisfaction and Appearance are linguistic variables. Satisfaction is a linguistic variable which establishes five labels: Very Low (VL), Low (L), Medium (M), High (H), Very High (VH), while Appearance also establishes five labels: Not Pleasant (NP), Low Pleasant (LP), Pleasant (P), High Pleasant (HP), Very High Pleasant (VHP). All measures are benefit measures. Here ends the first step.
The second step is then executed. The weight vector for usability is determined through pairwise comparison (see Table 3).

Fig. 6 Interface of the recommendation

Table 2 Data domain of the measures

Criteria       Appropriateness recognisability   Learnability                              Operability         Usser interface aesthetics
Measure        Exhaustive description (ED)       Integrity of the documentation (IntDoc)   Satisfaction (Sat)  Appearance (App)
Domain         [0, 1]                            [0, 1]                                    (VL, L, M, H, VH)   (NP, LP, P, HP, VHP)
Cost-benefit   B                                 B                                         B                   B

A study on the sensitivity of the resulting rankings to slight modifications of the weights is outside the objectives of this paper.
Subsequently, the relationships between criteria of level 1 are defined with the use of an FCM (see Fig. 7). To establish the interdependence between criteria of the same level, the FCMc is formed for the group of siblings. From the FCMc, the adjacency matrix (AMc) of the level is derived, yielding the matrix of interdependence between criteria $MI^1$.

Table 3 Weight vector for usability

Usability sub-characteristics           Weight
Appropriateness recognisability (Rec)   0.152
Learnability (Lea)                      0.211
Operability (Op)                        0.400
User interface aesthetics (Aest)        0.237

Fig. 7 Fuzzy cognitive map of usability and the adjacency matrix

Table 4 The value of each measure for each software product

      Exhaustive description (ED)   Integrity of the documentation (IntDoc)   Satisfaction (Sat)   Appearance (App)
S1    0.9                           0.7                                       H                    HP
S2    0.8                           0.7                                       M                    VHP
S3    0.5                           0.6                                       L                    P

Table 5 The values of the evaluation matrix

      ED                  IntDoc              Sat                 App
S1    (0.75, 1, 1)        (0.5, 0.75, 1)      (0.5, 0.75, 1)      (0.5, 0.75, 1)
S2    (0.5, 0.75, 1)      (0.5, 0.75, 1)      (0.25, 0.5, 0.75)   (0.75, 1, 1)
S3    (0.25, 0.5, 0.75)   (0.25, 0.5, 0.75)   (0, 0.25, 0.5)      (0.25, 0.5, 0.75)

Software tests were performed according to the third step, and the resulting data and metrics were collected. The value of each measure for each software product is shown in Table 4. After normalizing and unifying the data into triangular fuzzy numbers, the evaluation matrix (Me) was obtained, as shown in Table 5.
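The unification step maps every measure onto a common scale of triangular fuzzy numbers (TFNs). The sketch below is one possible reading of that step that reproduces the correspondence between Tables 4 and 5; the TFN partition and the binning of numeric scores are assumptions inferred from those tables, not the authors' stated procedure.

```python
# TFN scale consistent with Tables 4 and 5 (an inferred reading of the unification)
TFN = {0: (0.0, 0.0, 0.25), 1: (0.0, 0.25, 0.5), 2: (0.25, 0.5, 0.75),
       3: (0.5, 0.75, 1.0), 4: (0.75, 1.0, 1.0)}
LABELS = {"VL": 0, "L": 1, "M": 2, "H": 3, "VH": 4,      # Satisfaction labels
          "NP": 0, "LP": 1, "P": 2, "HP": 3, "VHP": 4}   # Appearance labels

def unify(value):
    """Map a linguistic label or a [0, 1] numeric score onto the common TFN scale."""
    if isinstance(value, str):
        return TFN[LABELS[value]]
    return TFN[min(4, round(value * 4))]   # bin numeric scores to the 5-term scale

print(unify("H"), unify(0.9))   # -> (0.5, 0.75, 1.0) (0.75, 1.0, 1.0), as in Table 5
```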

Table 6 Usability index for each software product

      UI                   Defuzzified UI   Ranking
S1    (0.67, 0.75, 0.80)   0.75             1
S2    (0.66, 0.75, 0.80)   0.74             2
S3    (0.64, 0.73, 0.78)   0.72             3

The value of each usability index and the ranking are shown in Table 6, as a result of the last steps. The best usability index corresponds to product S1.

5 Conclusions and Future Works

Through the analysis of the usability models used in industry, it was possible to obtain a solution to the problem of modeling a generic structure, through a graph. The proposed method values the interdependence between criteria and the essential criteria. It also integrates the manipulation of ambiguous, imprecise information from different sources. The proposal is based on elements of Soft Computing, such as fuzzy logic, fuzzy linguistic modeling and the use of fuzzy cognitive maps. It is inspired by real practical experiences provided by a Cuban company.
In this paper, the efficacy of Fuzzy Cognitive Maps was demonstrated for modeling decision-making problems, oriented fundamentally to the structuring and analysis of the interdependence between criteria.
The method facilitates and reduces the time for decision making by creating a logical, rational and transparent basis for analysis. It also achieves a better structuring of the problem and, therefore, greater participation and influence of all involved. Besides, it increases the depth of analysis, which leads to an increase in the quality of the decision.
The application of the proposal in a controlled environment shows that it is an operative, reliable and precise solution, which is easily interpreted for its application in industry.
Given the relevance of the topic addressed, the increasing complexity of software and the need to move towards excellence in products, the continuity of the research is justified along the following lines: to extend the proposed method by incorporating the modeling of the dynamic nature of the evaluation, since the parameters change over time and produce an impact on the final evaluation of the product. In addition, it is necessary not only to evaluate but also to predict the usability of intermediate products in the development process using machine learning algorithms. From the stored data of various evaluations, machine learning techniques or algorithms could be incorporated into the proposed model. The weights of the aggregation mechanisms can be modified according to the context, learning the weights of the aggregation function from historical behavior.

Acknowledgements This work has been partially funded by the Spanish Ministry of Economy and
Competitiveness with the support of the project TIN2014-55024-P, and by the Regional Government
of Andalusia—Spain with the support of the project P11-TIC-8001 (both including funds from the
European Regional Development Fund, ERDF).

References

1. Fernández-Pérez, Y., Febles-Estrada, A., Cruz, C., Verdegay, J.L.: Complex Systems: Solutions
and Challenges in Economics, Management and Engineering (2017)
2. ISO/IEC, ISO/IEC 25010:2011 Systems and software engineering—Systems and software
Quality Requirements and Evaluation (SQuaRE)—System and Software Quality Models
(2011)
3. Basto Cordero, L.J., Ribeiro Parente Filho, L.F., Costa dos Santos, R., Gassenferth, W., Soares
Machado, M.A.: Ipod system’s usability: an application of the fuzzy logic. Glob. J. Comput.
Sci. Technol. 13 (2013)
4. Bhatnagar, S., Dubey, S.K., Rana, A.: Quantifying website usability using fuzzy approach. Int.
J. Soft Comput. Eng. 2, 424–428 (2012). ISSN: 2231-2307
5. Montazer, GhA, Saremi, H.Q.: An application of type-2 fuzzy notions in website structures
selection: utilizing extended TOPSIS method. WSEAS Trans. Comput. 7, 8–15 (2008)
6. Dubey, S.K., Mittal, A., Rana, A.: Measurement of object oriented software usability using
fuzzy AHP. Int. J. Comput. Sci. Telecommun. 3, 98–104 (2012)
7. Kurosu, M.: Human-Computer Interaction Users and Contexts: 17th International Conference,
HCI International 2015 Los Angeles, CA, USA, 2–7 August 2015 Proceedings, Part III. Lecture
Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), vol. 9171, pp. 35–42 (2015)
8. Singh, A., Dubey, S.K.: Evaluation of usability using soft computing technique. Int. J. Sci.
Eng. Res. 4, 162–166 (2013)
9. Cables, E., García-cascales, M.S., Lamata, M.T.: The LTOPSIS: an alternative to TOPSIS
decision-making approach for linguistic variables. Expert Syst. Appl. 39, 2119–2126 (2012)
10. Lamichhane, R., Meesad, P.: A usability evaluation for government websites of Nepal using
fuzzy AHP. In: 7th International Conference on Computing and Information Technology
IC2IT2011, pp. 99–104 (2011)
11. Etaati, M.L., Sadi-Nezhad, S.: Using fuzzy analytical network process and ISO 9126 quality model in software selection: a case study in e-learning systems. J. Appl. Sci. 11, 96–103 (2011)
12. Challa, J.S., Paul, A., Dada, Y., Nerella, V., Srivastava, P.R.: Quantification of software quality
parameters using fuzzy multi criteria approach. In: 2011 International Conference on Process
Automation, Control and Computing (PACC), pp. 1–6 (2011)
13. Challa, J.S., Paul, A., Dada, Y., Nerella, V.: Integrated software quality evaluation: a fuzzy
multi-criteria approach. J. Inf. Process. Syst. 7, 473–518 (2011)
14. Dubey, S.K., Gulati, A., Rana, A.: Usability evaluation of software systems using fuzzy multi-
criteria approach. IJCSI Int. J. Comput. Sci. 9, 404–409 (2012). ISSN 1694-0814
15. Li, Q., Zhao, X., Lin, R., Chen, B.: Relative entropy method for fuzzy multiple attribute decision
making and its application to software quality evaluation. J. Intell. Fuzzy Syst. 26, 1687–1693
(2014)
16. Kiszová, Z., Mazurek, J.: Modeling dependence and feedback in ANP with fuzzy cognitive
maps. In: Proceedings of 30th International Conference on Mathematical Methods in Eco-
nomics, pp. 558–563 (2012)
17. Zimmermann, H.J.: Fuzzy set theory. Wiley Interdiscip. Rev. Comput. Stat. 2, 317–332 (2010)
18. Zadeh, L.A.: Soft computing and fuzzy logic. IEEE Softw. 11, 48–56 (1994)
19. Kosko, B.: Fuzzy cognitive maps. Int. J. Man Mach. Stud. 24, 65–75 (1986)
20. Groumpos, P.P.: Fuzzy cognitive maps: basic theories and their application to complex systems.
Fuzzy Cogn. Maps 247, 1–22 (2010)
Fuzzy Simulation of Human Behaviour
in the Health-e-Living System

Remberto Martinez, Marcos Tong, Luis Diago, Timo Nummenmaa and Jyrki Nummenmaa

Abstract This chapter shows an application of fuzzy set theory to preventive health support systems, where adherence to medical treatment is an important measure to promote health and reduce health care costs. The design of preventive health care information technology systems includes ensuring adherence to treatment through Just-In-Time Adaptive Interventions (JITAI). Determining the timing of an intervention and the appropriate intervention strategy are two of the main difficulties facing current systems. In this work, a JITAI system called Health-e-Living (Heli) was developed for a group of patients with type-2 diabetes. During the development stages of Heli, it was verified that the state of each user is fuzzy and that it is difficult to find the right moment to send a motivational message without being annoying. A fuzzy formula is proposed to measure the adherence of patients to their goals. As the adherence measurement needed more data, the DisCo software toolset for formal specifications was introduced, together with the modelling of human behaviour and the health action process approach (HAPA), to simulate the interactions between users of the Heli system. The effectiveness of interventions is essential in any JITAI system, and the proposed formula allows Heli to send motivational messages in correspondence with the status of each user, as well as to evaluate the efficiency of any intervention strategy.

R. Martinez · M. Tong
ExtensiveLife Oy, Lohkaretie 2 B 9, 33470 Tampere, Finland
e-mail: remberto@health-e-living.com
M. Tong
e-mail: marcos@health-e-living.com
L. Diago (B)
Interlocus Inc., Yokohama 226-8510, Japan
e-mail: ldiago@i-locus.com
L. Diago
Meiji Institute for Advanced Study of Mathematical Sciences, Meiji University,
4-21-1 Nakano, Tokyo 164-8525, Japan
T. Nummenmaa · J. Nummenmaa
University of Tampere, Tampere, Finland
e-mail: timo.nummenmaa@staff.uta.fi
J. Nummenmaa
e-mail: jyrki.nummenmaa@staff.uta.fi

1 Introduction

Developing better systems to capture and track patient-specific receipt of preventive health services delivered anywhere and over time will be critical to optimising performance measurement and reducing unnecessary duplication of care [1]. In theory, Just-in-Time Adaptive Interventions (JiTAIs) [2] are a persuasive technology which promises to empower personal behavioural goals by optimising treatments to situational context and user behaviours [3]. The aim of JiTAI design is to provide the right type/amount of support, at the right time, by adapting to an individual's changing internal and contextual state [2]. However, people's health determinants are difficult to model because of their inherent uncertainty, the complex interactions among them, as well as the considerable number of variables and the lack of precise mathematical models [4].
Fuzzy set theory provides the necessary tools when someone intends to work with
vague, ambiguous, imprecise, noisy or missing information [5–8]. In [6], the authors
present a general view of the current applications of fuzzy logic in medicine and bioin-
formatics. Several fuzzy-logic based context models [5], and a related context-aware
reasoning middleware that provides a personalized, flexible and extensible reason-
ing framework have been developed to infer how personal behaviour is expected to
change under a given intervention [7].
In our previous work presented at ISFUROS 2017 [9], we applied fuzzy modelling to calculate progress and send motivational emails to users depending on their type of adherence to the system (high, medium or low). The acquired data are related mainly to nutrition, mood and physical activity. However, the results were only preliminary because the sample size was not large and there were missing data. In this chapter, we add the DisCo software toolset [10] to simulate users' missing data in order to validate our previous modelling, extracting fuzzy rules from time series and using these rules to reproduce real models and extract new knowledge from the simulations. The simulated data include the HAPA model [11], the adherence formula specifications developed in our previous work and the human behaviour model reported in [12], to demonstrate the advantages of using fuzzy rules extracted from real system data, compare the validity of these rules on simulated data, and discover a new user behaviour or a new user type that is not available in the real system or in the scientific literature.
The rest of the chapter is organised as follows. First, Sect. 2 introduces the Heli system, our previous fuzzy formula for measuring a user's adherence to the system and the modelling of human behaviour with a simplified HAPA model. Section 3 briefly describes the DisCo simulation and the Heli adherence formula specification. Section 4 presents the proposed method to extract rules from the simulations, and Sect. 5 compares the results of the proposed approach with the approach reported in [12] to demonstrate the advantages of modelling human behaviour during fuzzy adherence calculation. Finally, conclusions and future works are presented in Sect. 6.

2 Health-e-Living (Heli) System

Health-e-living (Heli) [13] is a mobile solution to deliver preventive, educational and promotional health to all citizens comfortable with IT technologies, independently
of their age or geographical location. Heli uses the metaphor of social networks
to improve health habits of persons connected to their support network in real life
(family, friends, and colleagues). Figure 1 shows the logical solution provided by the
Heli system. The EMA (Ecological Momentary Assessment) data collected about
user activities related to biometrics, mental, nutrition or physical activity are gathered
and transmitted using a mobile device with communication capabilities. Users can annotate the data in their history or discuss it with experts or their own support communities. The data is stored in a private cloud after the user's consent and is used to generate automatic
reports on goal progress and trends for both users and coaches. The Heli system
provides its users with periodical interactions, educational material and motivational
messages to help them achieve their selected goals. At the same time, it provides coaches with tools for easier daily management, survey preparation and guidance material creation.

2.1 Fuzzy Adherence Measurement

People’s health determinants are difficult to model because of their inherent uncer-
tainty, the complex interactions among them, as well as the considerable number of
variables and lack of precise mathematical models. This was the main motivation
to use a fuzzy approach as a practical option to model the adherence to a treatment
and healthy lifestyle of a patient in the Heli system. In Heli, two variables are com-
bined for the evaluation of patients' status: the progress of the proximal outcomes
Δx = ‖x − g_i‖ and the patient adherence to the system y = F(x, z). Note that the
value of y depends on the inputs x (i.e. proximal outcomes), which are controlled by
the patients, and the contextual inputs z (e.g. environment), which are not controlled
by the patients. Progress indicates how close a patient is to completing the outcomes
g_i (1 ≤ i ≤ n), and the adherence measures how effective the system is in its intervention.

Fig. 1 Logical solution provided by the Heli system [13]
The adherence is modelled as a fuzzy weighted average involving type-1 (T1) fuzzy
sets as follows [14]:

    y = (∑_{i=1}^{n} w_i x_i) / (∑_{i=1}^{n} w_i)                        (1)

In (1), w_i are weights that act upon the proximal outcomes x_i. While the sum of the
normalized weights acting upon each x_i is always one, the unnormalized weights are
not required to add up to one. The adherence is calculated as an average over several
goals and gives an idea of how well the system is doing with respect to the patients'
goals. Every goal g_i is defined on an interval g_i ∈ [g−, g+], and the values of y− and
y+ are computed accordingly as follows:
    y−(x) = 1,              if x ≥ g−
            x/g−,           if 0 < x < g−
            1,              otherwise

    y+(x) = 1,              if x ≤ g+
            (2g+ − x)/g+,   if g+ < x < 2g+
            0,              if x ≥ 2g+
            1,              otherwise                                    (2)
An example of a positive goal would be to increase fruit consumption to a
minimum of 5 fruit portions a week, or to walk a minimum of 10,000 steps a day.
For this type of goal, g−, it is enough to achieve the minimum in order to have 100%
completion. Similarly, an example of a negative goal could be to decrease sugary
beverage consumption to a maximum of 1 glass of soda a week. This type of
goal, g+, achieves 100% completion with no data entry or a zero amount of
beverage portions consumed.
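To make the computation concrete, the following is a minimal Python sketch of the
goal-completion functions in Eq. (2) and the fuzzy weighted average in Eq. (1); the
goal bounds, weights and entry values are illustrative, not data from the Heli system.

def y_minus(x, g_minus):
    # Positive goal: reach at least g_minus (e.g. 5 fruit portions a week)
    if x >= g_minus:
        return 1.0
    if 0 < x < g_minus:
        return x / g_minus
    return 1.0              # "otherwise" branch as printed in Eq. (2)

def y_plus(x, g_plus):
    # Negative goal: stay at or below g_plus (e.g. 1 glass of soda a week)
    if x >= 2 * g_plus:
        return 0.0
    if g_plus < x < 2 * g_plus:
        return (2 * g_plus - x) / g_plus
    return 1.0              # x <= g_plus, or no data entered

def adherence(weights, progress):
    # Fuzzy weighted average of Eq. (1)
    return sum(w * x for w, x in zip(weights, progress)) / sum(weights)

# 4 of 5 fruit portions eaten (0.8) and 2 sodas against a maximum of 1 (0.0)
print(adherence([0.6, 0.4], [y_minus(4, 5), y_plus(2, 1)]))   # approx. 0.48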
As mentioned before, the choice of the time interval between decision
points can have a dramatic impact on the ability of a user to achieve their goals. Patient
progress is calculated daily and entries are added to the system weekly, but adherence is
calculated every two weeks, because it cannot be computed without data. The time
interval between decision points is set to two weeks to see how many entries there
are in that period. The closer the adherence is to 1, the more effective the system is.

2.2 Modelling Human Behaviour with a Simplified HAPA Model

HAPA is designed as a sequence of two continuous self-regulatory processes, a goal-


setting phase (motivation) and a goal-pursuit phase (volition). The second phase is
subdivided into a pre-action phase (volition) and an action phase (maintenance)
(see Fig. 2).
Fig. 2 Health action process approach [13]

In this work we use the stage model; the stage approach assumes that change is
non-linear and consists of several qualitative steps that reflect different mindsets of
people. We could model the efficacy of Heli as the probability of compliance with
the health goal set at the evaluation time (2 weeks), or simply the probability of an
intended Health Behaviour Change Compliance (HBCC):

    HBCC(t) = CH(t) · MA(t) · P(t)                                       (3)

where CH(t) is the adherence history over the time the system is used and n(t) is the
number of previous inputs:

    CH(t) = 1 − (0.1)^{n(t)}                                             (4)

MA(t) is the motivation to comply with the selected health goal, according to the user's
personal motivations (M_i) and beliefs (B_i) at any time,

    MA(t) = ∑_{i=1}^{n} M_i · B_i                                        (5)

P(t) is the perceived self-efficacy over time, including outcome expectations (O_k)
and risk perception (R_k) during intention formation.

    P(t) = ∑_{k=1}^{n} O_k · R_k                                         (6)

Brailsford used a probability of participating in treatment of 0.85 [12] for patients
diagnosed with cancer. In Heli the real data is collected from patients at risk of
type-2 diabetes, so the probability of any user entering data after selecting
a personal goal over time is equivalent to the probability of a user participating in
a health behaviour change during the period when the system is used. The Heli
system, as a preventative health process, tries to persuade users to change towards a
healthy behaviour by assessing the data collected over a period of two weeks. For
any lifestyle change to be considered a lasting habit, the period of assessment should
be greater than two weeks, and so is the period used for evaluating the efficacy of the
system.
Assuming that the motivation to provide personal data is constant for all users, it is
possible to model three main user types influenced by goal self-awareness and the
normative belief that entering data in the system will contribute to achieving the
set goal, as low (0.5), moderately high (0.8) and high (1.0); stronger beliefs lead to
more entries during the two-week time span. It is possible to assume in a first
modelling phase that P is always 1.
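The following is a minimal Python sketch of this simplified compliance model,
Eqs. (3)–(6), under the stated assumptions: P(t) = 1 in the first modelling phase,
motivation M fixed to 1, and a single motivation/belief pair per user. The entry
count n = 5 is illustrative.

def CH(n):                   # compliance history, Eq. (4)
    return 1 - 0.1 ** n

def MA(pairs):               # motivation, Eq. (5): pairs of (M_i, B_i)
    return sum(m * b for m, b in pairs)

def HBCC(n, pairs, P=1.0):   # Eq. (3), with P(t) = 1 in this first phase
    return CH(n) * MA(pairs) * P

# the three user types from the text: belief that data entry helps the goal
for label, B in (("low", 0.5), ("moderately high", 0.8), ("high", 1.0)):
    print(label, HBCC(n=5, pairs=[(1.0, B)]))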
With this simplistic model, it is possible to generate simulation data that is
representative of the results obtained with real data. In this model the simulator
can be configured to generate an input event based on a certain probability. In this
first iteration the user behaviour is able to form an intention, plan and take basic

actions. The next step would be to model a user behaviour that is influenced by the
environment and that changes the perceived self-efficacy over time.
The main contribution of the HAPA model is allowing perceived self-efficacy to
change over time in situations where the user needs to cope with setbacks or recover
from life challenges. In this iteration it is possible to add two more properties to the
user model: emotional commitment (1, can cope; 0.1, cannot) and failure learning
(1, can recover; 0.2, cannot). Now the simulated data allows a user to react to the
event of receiving a motivational message or not. In this phase the value of P is
calculated over the assessment period. Figure 2 shows at the top the basic
HAPA model with its three user states: Motivation, Volition and Maintenance. The
bottom shows how the model is used in the Heli system, where Volition and
Maintenance are combined into one state, HAPA_Volition.

3 Heli Adherence Specification for DisCo Simulation

As the amount of data available from real users in the Heli system was not
substantial enough, in this chapter we describe a fuzzy adherence simulation using
a HAPA model for users' behaviour in the DisCo formal specification environment,
as a method to generate more data resembling the data observed in the real system.
There were 126 users registered in Heli from 2013/07 to 2017/03 (including
8 system administrators, 20 coaches and 98 patients, mainly related to type-2
diabetes). Figure 3 shows the patients' weight (44–125 kg) and the distribution of the
number of goals selected by the participants.

3.1 Computing Adherence and State of the Patients

Several authors [2, 3, 11] have emphasized the importance of having computational
models of human behaviour to monitor the dynamics of an individual’s internal
state and context in real time. The adaptation requires monitoring the individual to
decide (a) whether the individual is in a state that requires support; (b) what type (or
amount) of support is needed given the individual’s state; and (c) whether providing
this support has the potential to disrupt the desired process. In our previous work we
focused on the design and evaluation of effective interventions, exploring patients'
self-reported states and sending motivational messages based on a dimensional approach.
Table 1 shows 7 dimensions, 13 states and some examples of motivational messages
used in the intervention. Note that the messages are associated with the dimensions and
not with the states of the patients, since the states vary over time and in some cases
the states were not reported during the system test stage (e.g. states marked with
“-” in the table). Current probabilities of the states are included within parentheses.

Fig. 3 Statistics of the data collected for 98 patients registered in Heli from 2013/07 to 2017/03:
(a) weight; (b) number of goals selected by participants

Motivational messages are sent to the patients based on the computed adherence to
their proximal outcomes and their reported states (i.e. feedback).
The Waikato Environment for Knowledge Analysis (WEKA) [15] software was
used to predict the state of one patient (id = 19). The patient provided 46 feed-
backs to the system, including 8 states: tired (14), stressed (5), busy (14), sick/ill (2),
energetic (1), confident (3), socially pressured (3) and happy (4). The number in
parentheses represents the times the state was provided. NaiveBayes, MLPClassifier,
AdaBoostM1 and RBFNetwork classifiers were tested with one feature, computed
by (1), and the 8 states provided by the patient. Using a 10-fold cross-validation
method, the accuracy of the classifiers was 23.9130%, 26.0870%, 32.6087% and
34.7826%, respectively. The accuracy of the classifiers is still very low due to class
overlapping (e.g. tired, stressed, busy and socially pressured are very similar) and
missing values in the computation of the fuzzy adherence for the patients. In Heli,
the number of users with fuzzy adherence was very small (25/98 ≈ 25.5%) because
most users (73/98 ≈ 74.5%) prefer to use the system to store daily data without a
specific goal. As the emotional dimensions used in the research may not be the most
adequate, later on we use machine learning tools to enhance the effectiveness of Heli
based on computational models of human behaviour like the health action process
approach (HAPA) [11].
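As a hedged sketch of this evaluation protocol, the snippet below reproduces the
10-fold cross-validation setup with the single adherence feature, using scikit-learn
as a stand-in for WEKA (GaussianNB in place of NaiveBayes). The data here is
synthetic; the real experiment used the 46 feedbacks of patient id = 19.

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.random((46, 1))                       # one feature: the fuzzy adherence
states = ["tired", "stressed", "busy", "sick/ill",
          "energetic", "confident", "socially pressured", "happy"]
y = rng.choice(states, size=46)               # synthetic state labels

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(GaussianNB(), X, y, cv=cv)
print(f"10-fold CV accuracy: {100 * scores.mean():.4f}%")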

Table 1 Health-e-living intervention approach

Dimension         Patient states                Examples of motivational messages
1. Physiological  Sick/ill (0.0285)             Think of the week. What has caused your
                  Stressed (0.1285)             physiological response? What could you
                  Tired (0.3857)                change in your behaviour to next time avoid
                                                these (e.g. go to bed earlier, eat more
                                                regularly, eating proper meals etc.)
2. Optimism       Energetic (0.0285)            Sometimes things that are out of our control
                  Confident (0.0714)            prevent us from fulfilling our good intentions.
                  Content (-)                   Try again next week!
3. Time           Busy (0.2142)                 You can learn from every experience. What
                                                will you do similarly/differently next time?
4. Drive          Hungry/thirsty (-)            It's hard work to change an old habit, and in
                                                the long term a warm and caring attitude
                                                helps more than criticism
5. Social         Socially pressured (0.0428)   Think of the situation at hand. Does it matter
                  Unsupported (-)               that you deviated from your plan?
6. Emotional      Happy (0.0857)                Keep going and see how the changes affect
                                                your well-being!
7. Discourage     Disappointed (0.0142)         What has helped you succeed before? Could
                  Unmotivated (-)               you apply those skills to this situation?

3.2 DisCo Simulation

A formal specification should state precisely what a completed piece of software
is supposed to do, but not how the task should be achieved [16]. Formal speci-
fications are a powerful method for modelling system behaviour, usually used in
software development. DisCo is primarily intended for the specification of reactive
systems, and its semantics have been defined with the Temporal Logic of Actions [17].
The DisCo software toolset [18], originally developed at the Tampere University of
Technology, includes a compiler for compiling specifications created in the DisCo
language, a graphical animation tool for animation and simulation of those specifi-
cations, and a scenario tool for representing execution traces as Message Sequence
Charts. The human interaction with the Heli system was specified using the DisCo
language, and the simulation was executed in the DisCo Animator version DisCo2000
presented in [19].
The main purpose of using DisCo specifications was to generate a large amount of
data as close as possible to the real-world data collected, while modelling a human
behaviour that includes goal setting and the intention of acting upon the achieve-
ment of those goals. Running the simulation on the DisCo animator supported the prob-
ability of a user providing input to the system, the adherence formula computation
based on those entries, and the probability of a user reacting after receiving a feedback
message from Heli [9].

Fig. 4 DisCo animator simulation in Heli world

In Fig. 4 the Heli simulation world consists of four classes (patient, coach, Heli
system and external world) that can interact with each other through actions.
Actions are enabled in the simulation according to their guard (simulation sys-
tem state and relationships between classes). Enabled actions are selected for execu-
tion nondeterministically (with weighted probability) at any specific execution time.
When a participant user (patient) registers in the Heli system and selects a goal (i.e.
monitoring own weight), a relation isPatientOfHeli becomes active and indicates that the
participant is already in the HAPA_Motivational state. After running a simulation
for a period equivalent to 367 days, the adherence to the system is observed and com-
puted as the number of data inputs during a week-long period. All modelled users
were registered and defined a goal (on targets related to weight management, better
nutrition and increased physical activity levels); when no EMA entries are available in
an evaluation period, the simulation assigns the user to the HAPA_Motivational
state. Later on, when EMA entries are available during the week, the user is con-
sidered to be in the HAPA_Volitional or HAPA_Maintenance state, and the contents of
each entry can be used to compute the progress over time towards the selected goal (see
Fig. 2).
There are more than 1200 records per patient on average in the simulation. On
each recorded entry, the simulation computes the probability of entering the next input,
the compliance history and the new value of adherence. Every two weeks of elapsed
time in the simulated world, it is possible to assess the value of adher-
ence, and based on that the system sends personalised messages according to the user
attribute compliance: not_very_active, active and very_active. Since the simulation

purpose was to generate data and not to represent message personalisation, the
model increased the probability of generating more user activity for those users whose
value of adherence was closer to 1 (the maximum). This feature represents a participant's
resilience to cope with external environment setbacks. Resilience can be simplified
to include the user's emotional commitment and the ability to learn from a failure and
to keep their own goal, with users being of two types: responsive and not_responsive. A
responsive user will have a high correlation with a user in the HAPA_Volitional or
HAPA_Maintenance states, having moderately high or high adherence.
Listing 9.1 Functions used in the DisCo specification of the adherence in Eq. (1)

function weightProgress(p: Patient): real is
  return 1.0 - abs(p.goal.target.v1 - p.act.weight) / (p.goal.target.v1);
end;

function stepsProgress(p: Patient): real is
  return 1.0 - (p.goal.target.v2 - p.act.steps) / (p.goal.target.v2);
end;

function fruitsProgress(p: Patient): real is
  return 1.0 - (p.goal.target.v3 - p.act.fruits) / (p.goal.target.v3);
end;

function computeAdherance(w: World; p: Patient): real is
  return (p.goal.w.v1 * weightProgress(p) +
          p.goal.w.v2 * stepsProgress(p) +
          p.goal.w.v3 * fruitsProgress(p))
         / (p.goal.w.v1 + p.goal.w.v2 + p.goal.w.v3);
end;

4 Fuzzy Rules Extraction from Time Series

As in our previous research, the Waikato Environment for Knowledge Analysis
(WEKA) [15] software was used to predict the state of the three patients above. Instead
of using the states reported by the patients, we used the three states included in the
HAPA model (motivational, volitional and maintenance) to find a correspondence
between the HAPA states and the previous states reported by the real participants. Using
a 10-fold cross-validation method to predict the HAPA states with the J48 classifier, the
accuracy was 99.76, 98.63 and 99.20% for each patient, respectively. As the states
of the patients are fixed by the simulator in the HAPA model, we find a mapping
between the values of adherence in the antecedents of the rules and the values of
adherence in the previous states reported by the patients in the real Heli system. In
the current simulation of the system from the specifications, the HAPA states only

depend on the registration process and the number of messages sent to Heli by the
participants. There is no detailed specification for the changes in the HAPA states
after the patient enters the maintenance state. The factors that modify the states are
currently under investigation and modelling.
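The following is a minimal sketch of this rule-extraction step: a decision tree
(scikit-learn's CART as a stand-in for WEKA's J48) is trained to predict the HAPA
states from the computed adherence and then printed as readable rules. The data and
the state thresholds are illustrative assumptions, not the simulator's specification.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
adherence = rng.random((300, 1))
# simulator-style labelling with illustrative thresholds
labels = np.where(adherence[:, 0] < 0.2, "motivational",
                  np.where(adherence[:, 0] < 0.7, "volitional", "maintenance"))

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(adherence, labels)
print(export_text(tree, feature_names=["adherence"]))   # the extracted rules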

5 Preliminary Results

As the results in our previous research were only preliminary due to the small size of
the sample, in this chapter our experiments focus on reviewing the conditions
of the real patients who participated in the Heli system. Using the simulation we
intend to mimic real conditions with better models and extract new knowledge from
the DisCo simulations. While the generated data does not reflect the real data exactly,
it can be used to quickly validate the assumptions in the participants' goal achieve-
ment progress and the system adherence calculation, and then compare the results of
the proposed approach with the approach reported in [12]. Figure 5 shows the prob-
ability distribution function for the experimental adherence data and the values of
adherence computed by both simulation models. The main difference between the two
simulation models is that the proposed model includes motivational messages sent by the
coaches to the patients based on the values of adherence, while the Brailsford model does
not include motivational messages. The graphs were computed by using the Matlab
fitdist function, which creates a probability distribution object by fitting the Epanech-
nikov kernel function to the data. The values of the bandwidth were 0.1029, 0.0497
and 0.0744 for the real, Brailsford and proposed simulated data, respectively. Both
models used in the simulation were simplified to not include users' personal motivation
to achieve the intended goal; however, the inclusion of the HAPA model in the proposed
approach allowed us to predict the user state based on the number of inputs over time.
The proposed model matches the increase in adherence of participants after receiving
motivational messages, as observed in the real data [9]. Any user instance was
allowed to have several goals at the same time; however, due to time limitations, the
executed simulation assumed only one type of goal at a time.
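As a hedged Python analogue of the Matlab fitdist call described above, the snippet
below fits an Epanechnikov kernel density to adherence values with scikit-learn's
KernelDensity. The sample is a stand-in; only the bandwidth 0.1029 is taken from
the reported results.

import numpy as np
from sklearn.neighbors import KernelDensity

adherence = np.random.default_rng(2).random((98, 1))   # stand-in sample
kde = KernelDensity(kernel="epanechnikov", bandwidth=0.1029).fit(adherence)

grid = np.linspace(0, 1, 101).reshape(-1, 1)
pdf = np.exp(kde.score_samples(grid))                  # estimated PDF on [0, 1]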
As shown in Fig. 5, the values of adherence are higher for the proposed
approach (0.54 ± 0.32 vs. 0.51 ± 0.32), and its graph is also closer to the experi-
mental/real graph. Although we did not compare the two distributions directly, we
can see in the figure that at the extremes (adherence close to zero and adherence
close to one) the distributions are very similar (they differ only by a scale factor).
However, when the adherence values are between 0.4 and 0.8 the distributions do
not look alike. This may be due to the lack of data acquired during the real
experiments and/or because the simulation is still far from including the aspects of
reality that govern those cases where adherence takes average values. The probability
with which a patient sends data after receiving a message from a coach has not been
modelled either, nor has the variation of the motivation over time (the value m = 1
was fixed during the simulation).

Fig. 5 Comparison of the histograms (upper part) and probability distribution functions (PDF, lower
part) for simulated and real values of adherence

The authors consider that the main objective of this work has been fulfilled, since
we were able to increase the amount of data by means of simulation and at the same
time predict the states the patient is in (according to HAPA) from the values of
adherence computed by the proposed fuzzy formula in (1). Finally, Fig. 6 shows the
results of the simulations with DisCo for 367 days with the 3 patients, the coach and
the Heli system shown in Fig. 4. The average number of randomly generated entries
was 2.4 times larger than in the real data, mainly because the number of real messages
sent by the automatic Heli coach is smaller than in the simulation. In the real
Heli, the automatic coach generates a motivational message every 2 weeks for each
patient, so it would be around 25 a year at most, or 50 if the patient provides more
feedback (see the patient states in Sect. 3.1). A human coach could generate a few
more messages when supporting a real participant. In the simulation, the messages
are generated daily in a random way, so the number of records is larger. However, the
simulation shows that some pattern of adherence exists according to the type of goal
set by the patient. For example, Patient 1, who tried to keep the weight below
75 kg, never reached the goal during the year. On the contrary, Patient 2 and Patient 3,
who set goals related to physical activity (e.g. walk more than 1000 steps
a week) and nutrition (e.g. eat more than 7 fruits a week), could achieve their goals

Fig. 6 Results of the simulations with DisCo for 367 days with the 3 patients

several times in the year. Figure 6 also shows a cyclic behaviour for Patient 2
and Patient 3: after the patient reaches the goal, the adherence is reduced. Although
this behaviour was included in the simulated specification, it can be considered in some
agreement with real life.

6 Conclusions and Future Work

JITAI systems like Heli appear to be a promising framework for developing mHealth
interventions. In Heli, the number of users with fuzzy adherence was very small
(25/98 ≈ 25.5%) because most users (73/98 ≈ 74.5%) prefer to use the system to
store daily data without a specific goal. However, the proposed interventions showed
that even after several stress inputs patients do not leave the system. Although this
research is still in its infancy, fuzzy measures like the proposed adherence formula
constitute a practical option for measuring the way a patient approaches a certain
goal by successive approximations in time. The chapter showed, by means of
simulation, that there is a close correspondence between the real-world adherence
of the patients and its computational model.
The simplified model used in the simulation did not include the reactiveness of users
when receiving motivational messages from the Heli automatic coach. The way to
adjust the number of registered records during simulation to agree with those of
real life is currently under investigation. Future work should expand the model
to improve user personal motivation and perceived self-efficacy, for example by
using the data available after user profiling or the data collected from system usage.
Another interesting approach would be to measure adherence when the model allows
a goal to be changed after several weeks of simulation execution.
The introduction of HAPA and human behaviour factors in the model required a
better understanding of the user. The data collected from the real system was used to
decide which relations and actions were the most important for state transitions and
adherence computation.
While DisCo specifications were enough to describe the real system implemen-
tation, the toolset could be expanded to give more freedom in modelling. The relationships
between classes were enough to represent the Heli world in the simulator. The main
limitations of the current DisCo specification toolset are the difficulty of specifying
complex mathematical formulas and the reduced semantics set during simulation prepa-
ration. The Animator tool was pleasant to use; however, some improvements are
needed to implement the variability of external factors for the system under modelling
during execution time. For the processing of the output logs it is desirable to add an
export functionality to common standard formats like CSV or database connectors. After
several iterations of modelling, it was possible to generate a large amount of data to
discover new knowledge about the real system. The generated data is also useful for
other phases of the software testing cycle in stage-level systems.
More research is required to understand the impact of behavioural interventions on
real-life user lifestyle achievements and which aspects of user motivation trigger
the intention of resilience improvements.

References

1. Bowen, M.E., Bhat, D., Fish, J., Moran, B., Howell-Stampley, T., Kirk, L., Persell, S.D., Halm,
E.A.: Improving Performance on Preventive Health Quality Measures Using Clinical Decision
Support to Capture Care Done Elsewhere and Patient Exceptions
2. Nahum-Shani, I., Smith, S.N., Spring, B.J., Collins, L.M., Witkiewitz, K., Tewari, A., Murphy,
S.A.: Just-in-time adaptive interventions (JITAIs) in mobile health: key components and design
principles for ongoing health behavior support. Ann. Behav. Med. (2016). https://doi.org/10.
1007/s12160-016-9830-8

3. Murray, T., Hekler, E., Spruijt-Metz, D., Rivera, D.E., Raij, A.: Formalization of computational
human behavior models for contextual persuasive technology. In: PERSUASIVE 2016, LNCS
9638, pp. 150–161 (2016). https://doi.org/10.1007/978-3-319-31510-2_13
4. Hekler, E.B., Michie, S., Pavel, M., Rivera, D.E., Collins, L.M., Jimison, H.B., Garnett, C.,
Parral, S., Spruijt-Metz, D.: Advancing models and theories for digital behavior change inter-
ventions. Am. J. Prev. Med. 51(5), 825–832 (2016). https://doi.org/10.1016/j.amepre.2016.06.
013
5. Yuan, B., Herbert, J.: Fuzzy CARA - a fuzzy-based context reasoning system for pervasive
healthcare. Procedia Comput. Sci. 10, 357–365 (2012)
6. Torres, A., Nieto, J.J.: Fuzzy logic in medicine and bioinformatics. J. Biomed. Biotechnol.
2006, Article ID 91908, 1–7. https://doi.org/10.1155/JBB/2006/91908
7. Giabbanelli, P.J., Crutzen, R.: Creating groups with similar expected behavioural response in
randomized controlled trials: a fuzzy cognitive map approach. BMC Med. Res. Methodol. 14,
130 (2014)
8. Gursel, G.: Healthcare, uncertainty, and fuzzy logic. Digit. Med. 2, 101–112 (2016)
9. Martinez, R., Tong, M., Diago, L.: Fuzzy adherence formula for the evaluation of just-in-time
adaptive interventions in the health-e-living system. In: Proceedings of ISFUROS Symposium
(2017)
10. The DisCo project WWW page. http://disco.cs.tut.fi. Accessed 16 April 2018
11. MacPhail, M., Mullan, B., Sharpe, L., MacCann, C., Todd, J.: Using the health action process
approach to predict and improve health outcomes in individuals with type 2 diabetes mellitus.
Diabetes Metab. Syndr. Obes. Targets Ther. 7, 469–479 (2014)
12. Brailsford, S.C.: Healthcare: human behavior in simulation models. In: Kunc, M., Malpass, J.,
White, L. (eds.) Behavioral Operational Research. Palgrave Macmillan, London (2016)
13. Martinez, R., Tong, M.: Can mobile health deliver participatory medicine to all citizens in
modern society? In: 4th International Conference on Well-Being in the Information Society,
WIS 2012, Turku, 22 August 2012–24 August 2012, pp. 83–90 (2012)
14. Liu, F., Mendel, J.M.: Aggregation using the fuzzy weighted average as computed by the
Karnik-Mendel algorithms. IEEE Trans. Fuzzy Syst. 16(1), 1–12 (2008)
15. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data
mining software: an update. SIGKDD Explor Newsl. 11, 10–18 (2009). https://doi.org/10.
1145/1656274.1656278
16. Diller, A.: Z: An Introduction to Formal Methods. Wiley, New York (1990)
17. Lamport, L.: The temporal logic of actions. ACM Trans. Program. Lang. Syst. 16(3), 872–923
(1994)
18. Aaltonen, T., Katara, M., Pitkanen, R.: DisCo toolset - the new generation. J. Univers. Comput.
Sci. 7(1), 3–18 (2001)
19. Nummenmaa, T.: Executable Formal Specifications in Game Development: Design, Validation
and Evolution. Ph.D. thesis, Tampere University Press, Tampere (2013)
Part II
Rough Sets: Theory and Applications
Matroids and Submodular Functions
for Covering-Based Rough Sets

Mauricio Restrepo and John Fabio Aguilar

Abstract Covering-based rough set theory is an extension of Pawlak’s rough set
theory, and it was proposed to expand the applications of the latter to more general
contexts. In this case a covering is used instead of the partition obtained from an
equivalence relation. Recently many authors have studied the relationships between
covering-based rough sets, matroids and submodular functions. In this paper, we
present the matroidal structures obtained from different partitions and coverings of
a specific set. We also propose an extension of a matroidal structure for covering-
based rough sets. Finally, we establish a partial order relation among the matroidal
structures via submodular functions, coverings, and their approximation operators.

1 Introduction

The classical rough set theory was extended to covering-based rough set theory by
many authors: W. Żakowski [17], J. A. Pomykala [7], E. Tsang et al. [10], W. Zhu
and F. Wang [22–24], and Xu and Zhang [15] presented different approximation
operators for covering approximation spaces. In 2012, Y. Y. Yao and B. Yao proposed a general
framework for the study of covering-based rough sets in [16].
Matroids are important tools for describing some concepts in graph theory and lin-
ear independence in matrix theory [4, 5]. S. Wang et al. present a matroidal approach
to rough set theory, defining a matroidal structure from the partition obtained from
an equivalence relation [9]. X. Li et al. present a matroidal approach to rough sets
via closure operators [6].
Matroidal structures of covering-based rough sets are generally induced by a family
of subsets of a universe, defined through lower and upper approximations. In [11],
two matroidal structures of covering-based rough sets are built, using transversal


theory and upper approximation number. A matroidal structure built from the lower
approximation operator in rough set theory was presented in [25].
The idea of independent sets in matroid theory can be useful for the attribute reduction
problem. Some rough set-based methods in feature selection have been used for
solving attribute reduction problems [12, 26]. Some recent papers have established
interesting properties of matroids and some connections with other mathematical
structures [4, 6, 13].
Different order and preorder relations on coverings are defined in [1]. Order
relations on approximation operators are presented in [3, 8].
In this paper, we use the upper approximation number function of a covering C as
a submodular function to build a matroidal structure. We use some basic examples
to compare the matroidal structures of different partitions of a set U . Also, we
obtain the respective matroidal structure for different coverings and we establish
a preorder relation on induced matroids. We study the matroidal structures obtained
from different lower approximation operators and different coverings. Additionally,
we extend the lower approximation matroidal structure to covering-based rough sets.
Finally, we compare the order relation of these structures, with the order defined in
upper approximation operators, as was established in [8].
The results of the comparison regarding order among matroids are helpful
for selecting appropriate structures in typical rough set applications, such as attribute
selection and classification.
The remainder of this paper is organized as follows: Sect. 2 presents preliminary
concepts about covering-based rough sets, as well as matroids and submodular func-
tions. Section 3 presents the main matroids obtained by different methods. In Sect. 4,
we present some preorder relations between coverings, and we establish an order
relation between different matroidal structures and submodular functions. Finally,
Sect. 5 presents some conclusions and outlines our future work.

2 Preliminaries

2.1 Rough Sets

Throughout this paper, we assume that U is a finite and non-empty set. P(U)
represents the collection of subsets of U, and |A| the cardinality of the set A for any
A ⊆ U.

2.1.1 Pawlak’s Rough Set Approximations

In Pawlak’s rough set theory, an approximation space is an ordered pair apr =
(U, E), where E is an equivalence relation defined on U. There are at least three
different, yet equivalent, ways to define lower and upper approximation operators:
Matroids and Submodular Functions for Covering-Based Rough Sets 177

the element-based definition, the granule-based definition, and the subsystem-based


definition [16]. If E is an equivalence relation and [x] E is the equivalence class of
x ∈ U , for each A ⊆ U , the lower and upper approximations are defined by:

    apr(A) = ∪{[x]_E ∈ U/E : [x]_E ⊆ A}                  (1)

    apr̄(A) = ∪{[x]_E ∈ U/E : [x]_E ∩ A ≠ ∅}             (2)

A subset A is said to be exact if apr(A) = apr̄(A); otherwise it is called a rough
set. These approximations are called granule-based, according to [16].
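A minimal Python sketch of these granule-based definitions, on an illustrative
partition, may help fix the idea:

U_over_E = [{1, 2}, {3}, {4}]     # the partition U/E of U = {1, 2, 3, 4}

def apr_lower(partition, A):
    # union of the equivalence classes contained in A, Eq. (1)
    return set().union(*[b for b in partition if b <= A])

def apr_upper(partition, A):
    # union of the equivalence classes meeting A, Eq. (2)
    return set().union(*[b for b in partition if b & A])

A = {2, 3}
print(apr_lower(U_over_E, A))     # {3}
print(apr_upper(U_over_E, A))     # {1, 2, 3}; the two differ, so A is rough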

2.1.2 Covering-Based Rough Sets

Many authors have investigated generalized rough set models obtained by changing
the condition that E is an equivalence relation, or equivalently, that U/E is a partition
of U . Changing of the partition for a collection of non-empty subsets K ⊆ U , with
∪K = U , gives rise to covering-based rough sets [14, 18–21].
Definition 1 Let C = {K i } be a family of non-empty subsets of U . C is called a cover-
ing of U if ∪K i = U . The ordered pair (U, C) is called a covering approximation
space [19].

It is clear that a partition generated by an equivalence relation is a special case


of a covering of U , so the concept of covering is a generalization of the concept of
partition.
Similar definitions of the approximation operators apr(A) and apr̄(A) shown in
Eqs. 1 and 2 can be applied to covering approximation spaces with elements K ∈ C.
If A ⊆ U, then:

    apr_C(A) = ∪{K ∈ C : K ⊆ A}                          (3)

    apr̄_C(A) = ∪{K ∈ C : K ∩ A ≠ ∅}                      (4)

In a covering approximation space (U, C), the minimal and maximal sets that
contain an element x ∈ U are particularly important. The collection C(C, x) = {K ∈
C : x ∈ K} can be used to define a neighborhood system of x ∈ U.

Definition 2 Let (U, C) be a covering approximation space, and x in U . The set

md(C, x) = {K ∈ C (C, x) : (∀S ∈ C (C, x), S ⊆ K ) ⇒ K = S} (5)

is called the minimal description of x, i.e. md(C, x) contains the minimal elements
of C (C, x) [2]. On the other hand, the set

M D(C, x) = {K ∈ C (C, x) : (∀S ∈ C (C, x), S ⊇ K ) ⇒ K = S} (6)

is called the maximal description of x [24].


178 M. Restrepo and J. F. Aguilar

From the collections md(C, x) and M D(C, x) Yao and Yao introduced four new
coverings derived from the covering C [16].
1. C1 = ∪{md(C, x) : x ∈ U }
2. C2 = ∪{M D(C, x) : x ∈ U }
3. C3 = {∩(md(C, x)) : x ∈ U } = {∩(C (C, x)) : x ∈ U }
4. C4 = {∪(M D(C, x)) : x ∈ U } = {∪(C (C, x)) : x ∈ U }.
For example, the covering C1 is the collection of all sets in the minimal descrip-
tion of each x ∈ U , while C3 is the collection of the intersections of the minimal
descriptions for each x ∈ U . Additionally, they considered the so-called intersection
reduct C∩ and union reduct C∪ of a covering C:

    C∩ = C \ {K ∈ C : (∃ 𝒦 ⊆ C \ {K})(K = ∩𝒦)}          (7)

    C∪ = C \ {K ∈ C : (∃ 𝒦 ⊆ C \ {K})(K = ∪𝒦)}          (8)

These reducts eliminate the intersection (respectively, union) reducible elements


from the covering, and clearly they are also coverings of U. The equality C1 = C∪
among coverings was established in [8].

Example 1 For the set U = {1, 2, 3, 4}, and the covering of U ,

C = {{1, 2}, {2, 3}, {4}, {1, 2, 3}, {2, 3, 4}},

the minimal description for each element is: md(C, 1) = {{1, 2}}, md(C, 2) =
{{1, 2}, {2, 3}}, md(C, 3) = {{2, 3}}, md(C, 4) = {{4}}. On the other hand, the maxi-
mal descriptions are: M D(C, 1) = {{1, 2, 3}}, M D(C, 2) = {{1, 2, 3}, {2, 3, 4}},
M D(C, 3) = {{1, 2, 3}, {2, 3, 4}}, M D(C, 4) = {{2, 3, 4}}. Therefore, the six cover-
ings obtained from the covering C are:
1. C1 = {{1, 2}, {2, 3}, {4}}
2. C2 = {{1, 2, 3}, {2, 3, 4}}
3. C3 = {{1, 2}, {2}, {2, 3}, {4}}
4. C4 = {{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}}
5. C∩ = {{4}, {1, 2}, {1, 2, 3}, {2, 3, 4}}
6. C∪ = {{1, 2}, {2, 3}, {4}}.
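The following minimal Python sketch reproduces the computations of Example 1:
the neighborhood system C(C, x), the minimal and maximal descriptions of
Eqs. (5)–(6), and the derived coverings C1–C4.

from itertools import chain

U = {1, 2, 3, 4}
C = [frozenset(s) for s in ({1, 2}, {2, 3}, {4}, {1, 2, 3}, {2, 3, 4})]

def system(x):                        # C(C, x): the sets of C containing x
    return [K for K in C if x in K]

def md(x):                            # minimal description, Eq. (5)
    return [K for K in system(x) if not any(S < K for S in system(x))]

def MD(x):                            # maximal description, Eq. (6)
    return [K for K in system(x) if not any(S > K for S in system(x))]

C1 = set(chain.from_iterable(md(x) for x in U))
C2 = set(chain.from_iterable(MD(x) for x in U))
C3 = {frozenset.intersection(*md(x)) for x in U}
C4 = {frozenset.union(*MD(x)) for x in U}
print(sorted(map(set, C1), key=sorted))   # [{1, 2}, {2, 3}, {4}], as above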

2.1.3 Neighborhood Operators

Definition 3 ([16]) A neighborhood operator is a mapping N: U → P(U). If
N(x) ≠ ∅ for all x ∈ U, N is called a serial neighborhood operator. If x ∈ N(x)
for all x ∈ U, N is called a reflexive neighborhood operator.

Each neighborhood operator defines an ordered pair (apr_N, apr̄_N) of dual approxi-
mation operators, in the sense that apr_N(∼A) = ∼apr̄_N(A), where ∼A is the
complement of A:

    apr_N(A) = {x ∈ U : N(x) ⊆ A}                        (9)

    apr̄_N(A) = {x ∈ U : N(x) ∩ A ≠ ∅}                   (10)

Different neighborhood operators, and hence different element-based definitions


of covering-based rough sets, can be obtained from a covering C. In general, we are
interested in the sets K in C such that x ∈ K .

Definition 4 ([16]) If C is a covering of U and x ∈ U , a neighborhood system


C (C, x) is defined by:
C (C, x) = {K ∈ C : x ∈ K } (11)

From the neighborhood system C (C, x), the minimal and maximal sets that con-
tain an element x ∈ U can also be used for defining the following neighborhood
operators, introduced by Y. Y. Yao and B. Yao [16]:
1. N1 (x) = ∩{K : K ∈ md(C, x)}
2. N2 (x) = ∪{K : K ∈ md(C, x)}
3. N3 (x) = ∩{K : K ∈ M D(C, x)}
4. N4 (x) = ∪{K : K ∈ M D(C, x)}.
According to Eqs. 9 and 10, each neighborhood operator Ni, for i ∈ {1, 2, 3, 4},
defines a pair of approximation operators apr_Ni and apr̄_Ni. A systematic study of
neighborhood operators in covering-based rough sets can be found in [3].
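A minimal Python sketch of the four neighborhood operators and the element-based
approximations of Eqs. (9)–(10), on the covering of Example 1:

U = {1, 2, 3, 4}
C = [frozenset(s) for s in ({1, 2}, {2, 3}, {4}, {1, 2, 3}, {2, 3, 4})]

def md(x):                            # minimal description of x
    Cx = [K for K in C if x in K]
    return [K for K in Cx if not any(S < K for S in Cx)]

def MD(x):                            # maximal description of x
    Cx = [K for K in C if x in K]
    return [K for K in Cx if not any(S > K for S in Cx)]

N = {1: lambda x: frozenset.intersection(*md(x)),
     2: lambda x: frozenset.union(*md(x)),
     3: lambda x: frozenset.intersection(*MD(x)),
     4: lambda x: frozenset.union(*MD(x))}

def apr_low(i, A):                    # Eq. (9)
    return {x for x in U if N[i](x) <= A}

def apr_upp(i, A):                    # Eq. (10)
    return {x for x in U if N[i](x) & A}

print(apr_low(1, frozenset({2, 3})))  # {2, 3}, matching Table 2 below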

2.2 Matroids

One of the meanings of matroid is related to the notion of linear independence. For
example, let us consider the column vectors of the matrix A and the reduced row
echelon form: E A .
         ⎛ 1  0  2  1 ⎞            ⎛ 1  0  2  0 ⎞
    A =  ⎜ 1  0  2  2 ⎟  →  E_A =  ⎜ 0  1  4  0 ⎟        (12)
         ⎝ 2 −1  0  0 ⎠            ⎝ 0  0  0  1 ⎠

If {a1, a2, a3, a4} represents the column vectors of A, then {a1, a2, a4} is a set
of linearly independent vectors. Additionally, we know that any subset of a set of
linearly independent vectors is also linearly independent. In this case, the collection of
independent sets is: I = {∅, {a1}, {a2}, {a4}, {a1, a2}, {a1, a4}, {a2, a4}, {a1, a2, a4}}.

Definition 5 ([11]) Let U be a finite set. A matroid on U is a pair M = (U, I), where
I is a collection of subsets of U with the following properties:
1. ∅ ∈ I.
2. If I ∈ I and I′ ⊆ I, then I′ ∈ I.
3. If I1 , I2 ∈ I and |I1 | < |I2 |, then there exists x ∈ I2 − I1 such that I1 ∪ {x} ∈ I,
where |I | denotes the cardinality of the set I .

The members of I are called independent sets of U . A base for the matroid M is
any maximal set in I. The sets not contained in I are called dependent. A minimal
dependent subset of U is called a circuit of M.
The rank function of a matroid is a function r : P(U ) → N given by

r (A) = max{|X | : X ⊆ A, X ∈ I}. (13)

Proposition 1 ([11]) If A and B are subsets of U , then:


1. r(A) ≤ |A|
2. A ⊆ B ⇒ r(A) ≤ r(B)

Proposition 2 ([11]) For the function defined in Eq. 13, the following property holds:

    r(A ∪ B) + r(A ∩ B) ≤ r(A) + r(B)                    (14)

for all A, B ⊆ U .

If r(A) = r(A ∪ {a}), the element a is said to be dependent on A, and we denote it
a ∼ A. The dependent elements of A can be used to define the closure of A.

2.3 Submodular Functions

Submodular functions are a generalization of rank functions and are used in graph
theory, game theory, and some optimization problems.

Definition 6 ([5]) Let U be a non-empty set and f: P(U) → R. The function f
is called submodular if for all A, B ⊆ U, f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B).

In a covering approximation space it is possible to define some submodular func-


tions as follows:

Definition 7 ([5]) Let C be a covering of U . For all A ⊆ U ,

    f_C(A) = |{K ∈ C : K ∩ A ≠ ∅}|                       (15)

is called the upper approximation number of A with respect to C.

Proposition 3 ([11]) For a covering C of U the following properties hold for f C :


1. f_C(∅) = 0
2. For all X ⊆ Y ⊆ U, f_C(X) ≤ f_C(Y)
3. For all X, Y ⊆ U, f_C(X ∪ Y) + f_C(X ∩ Y) ≤ f_C(X) + f_C(Y)
Matroids and Submodular Functions for Covering-Based Rough Sets 181

Table 1 Values of submodular functions for the coverings of U


A fC f C1 f C2 f C3 f C4 f C∩ f C∪
∅ 0 0 0 0 0 0 0
{1} 2 1 1 1 2 2 1
{2} 4 2 2 3 3 3 2
{3} 3 1 2 1 3 2 1
{4} 2 1 1 1 2 2 1
{1, 2} 4 2 2 3 3 3 2
{1, 3} 4 2 2 2 3 3 2
{1, 4} 4 2 2 2 3 4 2
{2, 3} 4 2 2 3 3 3 2
{2, 4} 5 3 2 4 3 4 3
{3, 4} 4 2 2 2 3 3 2
{1, 2, 3} 4 2 2 3 3 3 2
{1, 2, 4} 5 3 2 4 3 4 3
{1, 3, 4} 5 3 2 3 3 4 3
{2, 3, 4} 5 3 2 4 3 4 3
{1, 2, 3, 4} 5 3 2 4 3 4 3

Each submodular function defines a matroid in the sense of Proposition 4.

Example 2 Let U = {1, 2, 3, 4} be a set, and its coverings the ones defined in Exam-
ple 1. The values of f Ci (A) for all subsets A of U are shown in Table 1.
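A minimal Python sketch of the upper approximation number in Eq. (15), which
reproduces the corresponding columns of Table 1 for the covering C of Example 1
and two of its associated coverings:

from itertools import chain, combinations

coverings = {
    "C":  [{1, 2}, {2, 3}, {4}, {1, 2, 3}, {2, 3, 4}],
    "C1": [{1, 2}, {2, 3}, {4}],
    "C2": [{1, 2, 3}, {2, 3, 4}],
}

def f(cov, A):                        # upper approximation number, Eq. (15)
    return sum(1 for K in cov if K & set(A))

U = (1, 2, 3, 4)
for A in chain.from_iterable(combinations(U, r) for r in range(len(U) + 1)):
    print(set(A), [f(cov, A) for cov in coverings.values()])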

3 Matroidal Structures

This section considers the matroidal structures obtained via submodular functions
f Ci (A) for different partitions and coverings.

Proposition 4 Let C be a covering of U, and f a submodular function. Then
M_f(C) = (U, I_f(C)) is a matroid, where

    I_f(C) = {I ⊆ U : for all I′ ⊆ I, f_C(I′) ≥ |I′|} [6]    (16)
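Since U is small in the examples below, the matroid of Proposition 4 can be built by
brute force. The following sketch checks the defining condition of Eq. (16) over all
subsets, for the covering C of Example 1:

from itertools import chain, combinations

U = (1, 2, 3, 4)
C = [{1, 2}, {2, 3}, {4}, {1, 2, 3}, {2, 3, 4}]

def f(A):                             # upper approximation number, Eq. (15)
    return sum(1 for K in C if K & set(A))

def subsets(S):
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

I_f = [set(A) for A in subsets(U) if all(f(B) >= len(B) for B in subsets(A))]
print(len(I_f))                       # 16: every subset is independent here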

3.1 Matroids from Partitions

Consider some partitions of U = {1, 2, 3, 4} and the respective matroidal structures.


According to Definition 7 and Proposition 4, each partition (covering) generates a

Fig. 1 Matroidal structures for five different partitions of U = {1, 2, 3, 4}

matroidal structure. The following example presents the collection of sets belonging
to the matroid.

Example 3 Let U = {1, 2, 3, 4} be a set with the partitions P1 = {{1}, {2}, {3}, {4}},
P2 = {{1, 2}, {3}, {4}}, P3 = {{1, 2}, {3, 4}}, P4 = {{1, 2, 3}, {4}} and P5 = {{1, 2,
3, 4}}. The matroidal structure I f (Pi ) for each partition Pi , is shown in Fig. 1.
For example, for partition P2 = {{1, 2}, {3}, {4}} we have that A = {2, 3, 4} ∈
I f (P2 ) because f P2 (A) = 3, and for each subset X ⊆ A, we have: f P2 (X ) ≥ |X |.

As we can see, a finer partition has a greater number of independent sets in the
matroid.

3.2 Matroids from Coverings

This section shows the matroidal structure of a covering C and the associated cover-
ings C1 , C2 , C3 , C4 , C∪ and C∩ , according to the upper approximation number given
in Eq. 15 and the Proposition 4.
Fig. 2 Matroidal structures for different coverings of U = {1, 2, 3, 4}

Example 4 Let U = {1, 2, 3, 4} be a set. For the covering C and the associated
coverings C1, C2, C3, C4, C∪ and C∩ of Example 1, we have the matroidal structures
shown in Fig. 2. The dark-circled sets belong to the matroid I_f(Ci).

Proposition 5 If C = P(U), then I_f(C) = P(U).

Proof It is easy to see that f_C({k}) = 2^{n−1} for each k ∈ U, where n = |U|. By the
monotonicity of f_C, for each non-empty A ⊆ U and any a ∈ A we have f_C(A) ≥
f_C({a}) = 2^{n−1} ≥ n ≥ |A|. So, for each A ⊆ U we have f_C(A) ≥ |A|, hence each
A ⊆ U belongs to I_f(C) and therefore I_f(C) = P(U). 

3.3 Matroids from Neighborhood Operators

From the neighborhood operators defined above, we have the following coverings:

    C_N = {N(x) : x ∈ U}                                 (17)

Therefore, by using Definition 7 it is possible to obtain new approximation opera-
tors. In the same way, it is possible to show that they are different from apr̄_Ni.

Example 5 For the covering C in Example 1, we have the coverings:


1. C N1 = {{1}, {2}, {1, 3}, {2, 4}}
2. C N2 = {{1, 2, 3}, {2}, {1, 3}, {2, 4}}

3. C N3 = {{1, 2, 3}, {2}, {2, 4}}


4. C_N4 = {{2, 4}, {1, 2, 3}, {1, 2, 3, 4}}
Each covering C Ni for i = 1, 2, 3, 4 defines a matroid according to Proposition 4.

3.4 Matroids from Approximation Operators

By using different concepts in covering-based rough sets it is possible to define other
submodular functions. As an alternative to Eq. 15, we use the element-based approach of
neighborhoods to define another type of submodular function.

Definition 8 If (U, C) is a covering space and Ni are the neighborhood operators


defined in Sect. 2.1.3, it is possible to define the functions:

    g_Ni(A) = |{x ∈ U : Ni(x) ∩ A ≠ ∅}| = |apr̄_Ni(A)|    (18)

Clearly, g_Ni(∅) = 0, since apr̄_Ni(∅) = ∅ for N1, N2, N3 and N4.

Proposition 6 The functions g Ni are non-decreasing.

Proof This follows from the monotonicity of the upper approxima-
tion operators apr̄_Ni: if A ⊆ B, then apr̄_Ni(A) ⊆ apr̄_Ni(B). Hence, if A ⊆ B,
then |apr̄_Ni(A)| ≤ |apr̄_Ni(B)| and therefore g_Ni(A) ≤ g_Ni(B). 

Proposition 7 The functions g Ni are submodulars.

Proof The upper approximation operators apr̄_Ni are join morphisms, i.e.
apr̄_Ni(A ∪ B) = apr̄_Ni(A) ∪ apr̄_Ni(B); therefore:

    |apr̄_Ni(A ∪ B)| = |apr̄_Ni(A)| + |apr̄_Ni(B)| − |apr̄_Ni(A) ∩ apr̄_Ni(B)|

Since apr̄_Ni(A ∩ B) ⊆ apr̄_Ni(A) ∩ apr̄_Ni(B), it follows that |apr̄_Ni(A ∪ B)| +
|apr̄_Ni(A ∩ B)| ≤ |apr̄_Ni(A)| + |apr̄_Ni(B)|, which is the submodular property. 

Proposition 8 The matroidal structure for each submodular function g_Ni is trivial,
i.e. I_gi = P(U).

Proof If A is a subset of U, we know that for all I ⊆ A we have I ⊆ apr̄_Ni(I),
and therefore g_Ni(I) ≥ |I|. So, A ∈ I_gi. 

A matroidal structure defined from a lower approximation operator in classical


rough set theory was presented in [11]. We propose the following generalization to
covering-based rough sets, using order-preserving lower approximation operators,
i.e. operators that satisfy the property: if A ⊆ B, then apr (A) ⊆ apr (B).
Matroids and Submodular Functions for Covering-Based Rough Sets 185

Table 2 Illustration of submodular functions and lower approximations of neighborhood operators


A          g_N1(A)  g_N2(A)  g_N3(A)  g_N4(A)  apr_N1(A)  apr_N2(A)  apr_N3(A)  apr_N4(A)
{1} 1 2 1 3 {} {} {} {}
{2} 3 3 4 4 {2} {} {} {}
{3} 1 2 4 4 {} {} {} {}
{4} 1 1 1 3 {4} {4} {} {}
{1, 2} 3 3 4 4 {1, 2} {1} {} {}
{1, 3} 2 3 4 4 {} {} {} {}
{1, 4} 2 3 2 4 {4} {4} {} {}
{2, 3} 3 3 4 4 {2, 3} {3} {2, 3} {1, 2, 3, 4}
{2, 4} 4 4 4 4 {2, 4} {4} {} {1, 2, 3, 4}
{3, 4} 2 3 4 4 {4} {4} {} {1, 2, 3, 4}
{1, 2, 3} 3 3 4 4 {1, 2, 3} {1, 2, 3} {1, 2, 3} {1, 2, 3, 4}
{1, 2, 4} 4 4 4 4 {1, 2, 4} {1, 4} {} {1, 2, 3, 4}
{1, 3, 4} 3 4 4 4 {4} {4} {} {1, 2, 3, 4}
{2, 3, 4} 4 4 4 4 {2, 3, 4} {3, 4} {1, 2, 3, 4} {1, 2, 3, 4}

Proposition 9 If (U, C) is a covering space and apr is a lower approximation


operator which preserves order, then

Iapr = {A ⊆ U : apr (A) = ∅} (19)

is a matroid in U .

Proof We will prove each property in Definition 5.


1. Since apr (∅) = ∅, we have that ∅ ∈ Iapr .
2. If I ∈ Iapr and I  ⊂ I , then apr (I  ) ⊆ I = ∅. Therefore, apr (I  ) = ∅ and I  ∈
Iapr .
3. If I1 , I2 ∈ Iapr with |I1 | < |I2 | for each x ∈ I2 − I1 , we have that I1 ⊆ I1 ∪
{x} ⊆ I2 . Using the order-preserving property we have: apr (I1 ) ⊆ apr (I1 ∪
{x}) ⊆ apr (I2 ). Then ∅ ⊆ apr (I1 ∪ {x}) ⊆ ∅. Therefore apr (I1 ∪ {x}) = ∅ and
so I1 ∪ {x} ∈ Iapr .

Example 6 For the covering C from Example 1, the values of g Ni (A) are shown
in the first four columns of Table 2, and the lower approximations in the last four
columns.
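A minimal Python sketch of Proposition 9 collects the sets whose lower
approximation is empty; with apr_N1 on the covering of Example 1 it reproduces
the independent sets that can be read off from Table 2:

from itertools import chain, combinations

U = {1, 2, 3, 4}
C = [frozenset(s) for s in ({1, 2}, {2, 3}, {4}, {1, 2, 3}, {2, 3, 4})]

def N1(x):
    # intersection of the minimal description of x
    Cx = [K for K in C if x in K]
    return frozenset.intersection(*[K for K in Cx if not any(S < K for S in Cx)])

def apr_low(A):
    # apr_N1 of Eq. (9), an order-preserving lower approximation
    return {x for x in U if N1(x) <= A}

def subsets(S):
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

I_apr = [set(A) for A in subsets(U) if not apr_low(frozenset(A))]
print(I_apr)          # expected from Table 2: set(), {1}, {3}, {1, 3}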

The matroidal structures obtained from the lower approximation operators apr_Ni can be
seen in Fig. 3.

Fig. 3 Matroidal structures for the lower approximation operators apr_Ni

4 Order and Pre-order Relations

4.1 Pre-order Relations on Coverings

Different pre-order relations among coverings can be defined. For example, following
the idea of general topology, we can say that C is finer than D, if D ⊆ C. For the
coverings defined before, we have that: C1 ⊆ C, C2 ⊆ C, C∩ ⊆ C and C∪ ⊆ C.
Other pre-order relations for coverings can be seen in [1].

Definition 9 ([1]) If C and D are coverings of U, we say that C precedes D, denoted
C ⪯ D, if for each K ∈ C there exists L ∈ D such that K ⊆ L.

The relation ⪯ is reflexive and transitive, but in general it is not anti-symmetric.
For example, for C = {{1}, {1, 2, 3}, {2, 4}} and D = {{1, 2}, {1, 2, 3}, {2, 4}}, we
have that C ⪯ D and that D ⪯ C. Clearly, C ≠ D.

Proposition 10 If C and D are coverings of U such that C ⊆ D, then C ⪯ D.

Proof If K ∈ C, then K ∈ D with K ⊆ K. So, C ⪯ D. 

Definition 10 ([1]) If C and D are coverings of U, we define C ⊑ D if for all
L ∈ D there exists {K1, K2, . . . , Kp} ⊆ C such that L = K1 ∪ K2 ∪ · · · ∪ Kp.

Definition 11 If C and D are coverings of U, and N is a neighborhood operator, we
define C ≤_N D if for all x ∈ U, N^C(x) ⊆ N^D(x).

The relation ≤_N is reflexive and transitive, but in general it is not anti-symmetric.
For the neighborhood operator N1, from [3] we know that N1^C = N1^{C1}, while
generally C ≠ C1.
Proposition 11 The order relation among the seven coverings is:

a. C3 ⪯ C1 ⪯ C2 ⪯ C4
b. C ⪯ C∩ and C ⪯ C∪

Proof By definition it is easy to show that the coverings C, C1, C2, C3 and C4 satisfy
the conditions of Proposition 10. 
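A minimal Python sketch of the pre-order of Definition 9 checks the chain of
Proposition 11(a) on the coverings of Example 1:

def precedes(C, D):
    # C precedes D: every block of C is inside some block of D (Definition 9)
    return all(any(set(K) <= set(L) for L in D) for K in C)

C1 = [{1, 2}, {2, 3}, {4}]
C2 = [{1, 2, 3}, {2, 3, 4}]
C3 = [{1, 2}, {2}, {2, 3}, {4}]
C4 = [{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}]

chain_ = [C3, C1, C2, C4]
print(all(precedes(chain_[i], chain_[i + 1]) for i in range(3)))   # True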

4.2 Order Relation on Submodular Functions

A pre-order relation among submodular functions is induced, via the upper approxima-
tion number, by the relations among coverings defined above.
Proposition 12 If C and D are coverings of U such that C ⊆ D, then f_C ≤ f_D.

Proof We will show that f_C(X) ≤ f_D(X) for all X ⊆ U. If K ∈ C satisfies K ∩ X ≠
∅, then K ∩ X ≠ ∅ with K ∈ D, and so f_C ≤ f_D. 
From Proposition 12 and the relations C1 ⊆ C, C2 ⊆ C and C∩ ⊆ C, we have:
• f_C1 ≤ f_C
• f_C2 ≤ f_C
• f_C∩ ≤ f_C
Proposition 13 If C ⪯ D, then f_C ≤ f_D.

Proof We will show that f_C(X) ≤ f_D(X) for all X ⊆ U. We know that f_C(X) =
|{K ∈ C : K ∩ X ≠ ∅}|. If K ∈ C with K ∩ X ≠ ∅, then there exists L ∈ D such
that K ⊆ L, so ∅ ≠ K ∩ X ⊆ L ∩ X and L ∩ X ≠ ∅; therefore f_C ≤ f_D. 
Proposition 14 If C ≤_N D, then g_N^C ≤ g_N^D.

Proof According to the definition of the relation ≤_N, we have that N^C(x) ⊆ N^D(x) for
all x ∈ U. By Proposition 7 in [8], we have apr̄_N^C(A) ⊆ apr̄_N^D(A), and therefore
|apr̄_N^C(A)| ≤ |apr̄_N^D(A)|. 

4.3 Order Relation on Matroidal Structures

This section aims to establish an order relation between the matroidal structures
M_f(C) = (U, I_f(C)) for the covering C and its associated coverings C1, C2, C3,
C4, C∪ and C∩, and an order relation among the matroids I_apr_i.
In this case, we can use Propositions 12 and 13 to establish an order relation
among matroidal structures.

Fig. 4 Order relation for matroidal structures

Fig. 5 Order relation for matroidal structures derived from approximation operators

Proposition 15 If f_C ≤ f_D, then I_f(C) ⊆ I_f(D).

Proof If I ∈ I_f(C), then for all I′ ⊆ I we have f_C(I′) ≥ |I′|. Since f_C ≤ f_D, we
have f_D(I′) ≥ f_C(I′) ≥ |I′|; therefore I ∈ I_f(D). 
Figure 4 shows the partial order relations among the matroidal structures, based
on Proposition 15.
In this case, we can see the relation among matroids as the chain I_f(C3) ⊆
I_f(C1) = I_f(C∪), while I_f(C2), I_f(C4) and I_f(C∩) are not comparable.
For the lower approximation operators we have the following proposition:
Proposition 16 If apr_i ≤ apr_j, then I_apr_j ⊆ I_apr_i.

Proof Let us suppose that apr_i ≤ apr_j. If X ∈ I_apr_j, then apr_j(X) = ∅ and apr_i(X)
⊆ apr_j(X) = ∅. So, apr_i(X) = ∅ and X ∈ I_apr_i. 

The order relation among lower approximation operators and the matroids can be
seen in Fig. 5.

Fig. 6 Order relation for matroidal structures defined through lower approximation operators

This order relation can be extended to other order-preserving lower approximation
operators, for example the operators apr_C for the coverings C, C1, C2, C3, C4 and C∩
defined in [16], using the order relation established in [8].
In this case, the matroid I_apr_i in Fig. 6 represents the respective group i of lower
approximation operators considered in [8], although it is to be noted that groups
13 and 15 were deleted because they do not satisfy the monotonicity property.

5 Conclusions

This paper presents different matroidal structures obtained from partitions and cov-
erings of some sets, other matroidal structures defined from the upper approximation
number, and structures derived from the lower approximation operators in rough sets.
These structures are generalized to covering-based rough sets through order-preserving
lower approximation operators.
We used preorder relations among coverings, presented in [1], and the order relation
among sixteen lower approximation operators, presented in [8], to define a partial
order relation on matroidal structures.
It is important to note that finer coverings generate matroidal structures with a
greater number of sets. Results about order among matroids are helpful for selecting
appropriate structures in typical rough set applications, such as attribute selection

and classification. Our future studies will consider these structures and their relation
with the attribute reduction problem via approximation operators in covering-based
rough sets.

Acknowledgements This work was supported by the Universidad Militar Nueva Granada Special
Research Fund, under the project CIAS 2549-2018.

References

1. Bianucci, D., Cattaneo, G.: Information entropy and granular co-entropy of partition and cov-
erings: a summary. Trans. Rough Sets 10, 15–66 (2009)
2. Bonikowski, Z., Brynarski, E.: Extensions and intensions in rough set theory. Inf. Sci. 107,
149–167 (1998)
3. D’eer, L., Restrepo, M., Cornelis, C., Gómez, J.: Neighborhood operators for covering-based
rough sets. Inf. Sci. 336, 21–44 (2016)
4. Huang, A., Zhu, W.: Geometric lattice structure of covering based rough sets through matroids.
J. Appl. Math. 53, 1–25 (2012)
5. Lai, W.: Matroid Theory. Higher Education Press, Beijing (2001)
6. Li, X., Liu, S.: Matroidal approaches to rough sets via closure operators. Int. J. Approx. Reason.
53, 513–527 (2012)
7. Pomykala, J.A.: Approximation operations in approximation space. Bulletin de la Académie
Polonaise des Sciences 35, 653–662 (1987)
8. Restrepo, M., Cornelis, C., Gómez, J.: Partial order relation for approximation operators in
covering-based rough sets. Inf. Sci. 284, 44–59 (2014)
9. Tang, J., She, K., Min, F., Zhu, W.: A matroidal approach to rough set theory. Theor. Comput.
Sci. 47, 1–11 (2013)
10. Tsang, E., Chen, D., Lee, J., Yeung, D.S.: On the upper approximations of covering generalized
rough sets. In: Proceedings of the 3rd International Conference on Machine Learning and
Cybernetics, pp. 4200–4203 (2004)
11. Wang, S., Zhu, W., Min, F.: Transversal and function matroidal structures of covering-based
rough sets. Lect. Notes Comput. Sci. RSKT 2011(6954), 146–155 (2011)
12. Wang, S., Zhu, Q., Zhu, W., Min, F.: Matroidal structure of rough sets and its characterization
to attribute reduction. Knowl.-Based Syst. 54, 155–161 (2012)
13. Wang, S., Zhu, W., Zhu, Q., Min, F.: Four matroidal structures of covering and their relationships
with rough sets. Int. J. Approx. Reason. 54, 1361–1372 (2013)
14. Wu, M., Wu, X., Shen, T.: A new type of covering approximation operators. IEEE Int. Conf.
Electron. Comput. Technol. xx, 334–338 (2009)
15. Xu, W., Zhang, W.: Measuring roughness of generalized rough sets induced by a covering.
Fuzzy Sets Syst. 158, 2443–2455 (2007)
16. Yao, Y.Y., Yao, B.: Covering based rough sets approximations. Inf. Sci. 200, 91–107 (2012)
17. Zakowski, W.: Approximations in the space (u, π ). Demonstratio Mathematica 16, 761–769
(1983)
18. Zhang, Y., Li, J., Wu, W.: On axiomatic characterizations of three pairs of covering based
approximation operators. Inf. Sci. 180, 274–287 (2010)
19. Zhu, W.: Properties of the first type of covering-based rough sets. In: Proceedings of Sixth
IEEE International Conference on Data Mining - Workshops, pp. 407–411 (2006)
20. Zhu, W.: Properties of the second type of covering-based rough sets. In: Proceedings of the
IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Tech-
nology, pp. 494–497 (2006)
21. Zhu, W.: Basic concepts in covering-based rough sets. In: Proceedings of Third International
Conference on Natural Computation, pp. 283–286 (2007)

22. Zhu, W.: Relationship between generalized rough sets based on binary relation and covering.
Inf. Sci. 179, 210–225 (2009)
23. Zhu, W., Wang, F.: A new type of covering rough set. In: Proceedings of Third International
IEEE Conference on Intelligence Systems, pp. 444–449 (2006)
24. Zhu, W., Wang, F.: On three types of covering based rough sets. IEEE Trans. Knowl. Data Eng.
8, 528–540 (2007)
25. Zhu, W., Wang, J.: Contraction to matroidal structure of rough sets. LNAI 8171, 75–86 (2013)
26. Zhu, X., Zhu, W., Fan, X.: Rough set methods in feature selection via submodular function.
Soft Comput. 21(13), 3699–3711 (2017)
Similar Prototype Methods for Class
Imbalanced Data Classification

Yanela Rodríguez Alvarez, Yailé Caballero Mota, Yaima Filiberto Cabrera, Isabel García Hilarión, Yumilka Fernández Hernández and Mabel Frias Dominguez

Abstract In this paper, new methods for solving imbalanced classification problems
based on prototypes are proposed. Using similarity relations for the granulation of
the universe, similarity classes are generated and a prototype is selected for each
similarity class. Experimental results show that the performance of our methods is
statistically superior to other imbalanced methods.

Keywords Imbalanced classification · Prototype selection · Prototype generation · Classification · Similarity relations

1 Introduction

In Machine Learning, class imbalance problems (in which the examples of one class disproportionately outnumber those of the other) continue to emerge in the industrial and academic sectors alike. Many classification algorithms used in real-world systems and applications fail to meet the performance requirements when faced with severe

Y. R. Alvarez (B) · Y. C. Mota · Y. F. Cabrera · I. G. Hilarión · Y. F. Hernández · M. F. Dominguez


Departamento de Computación, Universidad de Camagüey, Circunvalación Norte Km 5 ½,
Camagüey, Cuba
e-mail: yanela.rodriguez@reduc.edu.cu
Y. C. Mota
e-mail: yaile.caballero@reduc.edu.cu
Y. F. Cabrera
e-mail: yaima.filiberto@reduc.edu.cu
I. G. Hilarión
e-mail: isabel.garcia@reduc.edu.cu
Y. F. Hernández
e-mail: yumilka.fernandez@reduc.edu.cu
M. F. Dominguez
e-mail: mabel.frias@reduc.edu.cu


class distribution skews [1, 2]. Various approaches have been developed in order to
deal with this issue, including some forms of class under-sampling or over-sampling
[3], synthetic data generation [4], misclassification cost sensitive techniques [5],
decision trees [6], rough sets [7], kernel methods [8], ensembles [9–11] or active
learning [12]. Novel classifier designs are still being proposed [13].
An alternative for mitigating this problem is classification based on the Nearest Prototype (NP) approach [14]. This is a method to determine the value of the decision attribute of a new object by analyzing its similarity with respect to a set of prototypes which are selected or generated from the initial set of instances. The set of prototypes is obtained either by selecting examples from the original labeled set or by replacing the original set with a different, smaller one [15].
Also, using the Rough Set Theory (RST) [16] it is possible to solve problems
related to: data reduction, discovery of dependencies between data, estimation of
data significance, generation of decision or control algorithms from data, approximate
classification of data, discovery of similarities or differences in data, discovery of
patterns, discovery of cause-effect relationships, etc. In particular, rough sets have had
an interesting application in medicine, business, engineering design, meteorology,
vibration analysis, conflict analysis, image processing, voice recognition, character
recognition, decision analysis, etc. [17].
On the other hand, the algorithms NPBASIR-CLASS [18] and NPBASIR SEL-CLASS [19] have been recognized for their good results with respect to classification accuracy. These methods are based on the NP approach combined with RST. These methods (NPBASIR-CLASS and NPBASIR SEL-CLASS) are designed to construct and select prototypes, respectively, using the concepts of Granular Computing [20], and they are based on the NPBASIR algorithm [21]. Granulation of the universe is performed using a similarity relation which generates similarity classes of objects of the universe, and for each similarity class one prototype is built. The similarity relation is constructed using the method proposed in [22].
The goal of this work is to extend the capabilities of prototype-based methods
and similarity relationships so that they are sensitive to class imbalanced data clas-
sification.

2 Methodology

The method proposed in [18] is an iterative procedure in which prototypes are constructed from similarity classes of objects of the universe: a similarity class, denoted by [O_i]_R, is constructed using the similarity relation R, and a prototype is constructed for this similarity class. Whenever an object is included in a similarity class, it is marked as used and is not taken into account when another similarity class is constructed; however, used objects can belong to similarity classes that will be constructed for other non-used objects.
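The following minimal Python sketch illustrates this iterative granulation scheme. The similarity relation is passed in as a predicate playing the role of R, and the prototype construction rule (here, the mean vector of the similarity class, labeled by its majority decision value) is a simplification of the actual NPBASIR-CLASS construction step; the data are illustrative only.

```python
import numpy as np

def build_prototypes(X, y, similar, seed=0):
    """Sketch: for each not-yet-used object, build its similarity class
    and construct one prototype from that class."""
    rng = np.random.default_rng(seed)
    used = np.zeros(len(X), dtype=bool)
    prototypes = []
    for i in rng.permutation(len(X)):
        if used[i]:
            continue
        # similarity class [O_i]_R; already-used objects may still join it
        members = [j for j in range(len(X)) if similar(X[i], X[j])]
        used[members] = True
        # construct a prototype (here: the mean vector), labeled with the
        # majority decision value of the similarity class
        proto = X[members].mean(axis=0)
        labels, counts = np.unique(y[members], return_counts=True)
        prototypes.append((proto, labels[np.argmax(counts)]))
    return prototypes

# toy usage with a simple distance-based similarity predicate
X = np.array([[0.1, 0.2], [0.15, 0.22], [0.9, 0.8], [0.88, 0.85]])
y = np.array([0, 0, 1, 1])
print(build_prototypes(X, y, lambda a, b: np.linalg.norm(a - b) < 0.3))
```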
This method uses a similarity relation R and a set of instances X = {X_1, X_2, . . . , X_n}, each of which is described by a vector of m descriptive features

and belongs to one of k classes C = {c_1, c_2, . . . , c_k}. The similarity relation R is constructed according to the method proposed in [22]; this is based on finding the relation that maximizes the quality of the similarity measure. In this case, the relation R is sought that generates a granulation, considering the m descriptive features, as similar as possible to the granulation according to the classes.
Moreover, the method proposed in [19] is an iterative procedure in which prototypes are constructed from similarity classes of objects in the universe: a similarity class [O_i]_R is constructed using the similarity relation R, and a prototype is selected for this similarity class. The main difference between this method and the previous one is that this proposal selects, from the original training set, the object of each similarity class that is most similar to the remaining objects, instead of creating new objects from the similarity class.
A data set is balanced if it has approximately equal percentages of examples in the concepts to be classified, that is, if the distribution of examples by classes is uniform. In reality, however, events almost never occur with the same frequency, not to mention “rare” occurrences that happen only sporadically [23], so that when capturing the data many examples of one class appear (which would be the normal state) and few of the others (which would be the rare anomaly or event). Given that more and more applications are facing this problem [24], it has grown to become one of the learning problems that have focused the attention of researchers in the Machine Learning area in recent years.
The decision tree C4.5, proposed by Quinlan in 1993 [25], is often referred to as a statistical classifier for unbalanced training sets [26] and for this reason is used for the experimentation in this investigation. Its parameters have been selected according to the author's recommendation: confidence level = 0.25, minimum number of instances per leaf = 2 and pruning = true.
Furthermore, the Cost-sensitive C4.5 decision tree (CS-C4.5) [27] builds decision
trees that try to minimize the number of high cost errors and, as a consequence, leads
to the minimization of the total misclassification costs of most cases. The method
changes the class distribution such that the induced tree is in favor of the class of
high weight/cost and is less likely to commit errors with high cost.
Among the methods described in the literature with better results in the preprocessing of these types of data are: SMOTE [28], SMOTE-ENN [26], SMOTE-RSB* [30] and SMOTE-TL [26]. An important advantage of the data level approaches is that their use is independent from the classifier selected [30]. Next, we review these high-quality proposals in more detail because they will be used in our experimental study.
A. Synthetic minority oversampling technique (SMOTE) [28]: an oversampling technique for the minority class. It works by taking each minority class sample and introducing synthetic examples along the line segments joining any/all of its k nearest minority class neighbours.
B. SMOTE-ENN [26]: this method consists of the application of the Edited Nearest Neighbor rule (ENN) as a cleaning method over the data set obtained by the application of SMOTE.

C. SMOTE-RSB* [30]: this is another hybrid data level method. It first applies SMOTE to introduce new synthetic minority class instances to the training set and then removes synthetic instances that do not belong to the lower approximation of their class, computed using rough set theory. This process is repeated until the training set is balanced.
D. SMOTE-TL [26]: this method consists of the application of Tomek Links as a cleaning method over the data set obtained by the application of SMOTE.
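For reference, the following minimal sketch shows the interpolation step that SMOTE performs; it is a simplification of the full algorithm (the amount of oversampling per sample and the neighbour search are handled more carefully in [28]), and the data are illustrative only.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Sketch of SMOTE: each synthetic point interpolates between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself)
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        z = minority[rng.choice(neighbours)]
        gap = rng.random()                 # random position on the segment
        synthetic.append(x + gap * (z - x))
    return np.array(synthetic)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(smote(minority, n_new=3, k=2))
```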

3 Imbalanced Methods Based on Prototypes

The NPBASIR SEL-CLASS and NPBASIR-CLASS algorithms treat the positive and negative classes symmetrically, so they are not prepared to deal with class imbalance.
One of the alternatives to face learning from unbalanced training sets is to create new algorithms or modify existing ones according to the problem of imbalance between classes. This is the case of this work, in which a modification of the NPBASIR-CLASS and NPBASIR SEL-CLASS algorithms is proposed to adapt them to the new situation. The variant analyzed consists of modifying the measure of similarity quality, which serves as the basis for calculating the weights used in the construction/selection of the prototype sets. In addition, we modified the relation R that is used to construct the similarity class of the objects.
The proposed modifications to study the performance of NPBASIR-CLASS and
NPBASIR SEL-CLASS algorithms in the case of imbalanced mixed data consider
modifying the measure of similarity quality. The Similarity Quality Measure of the
decision system is defined by expression (1) in [22].

ϕ(x)
θ (S D)  ∀x∈U (1)
|U |

The measure θ(DS) [22] represents the degree to which the similarity between objects, using the conditional features in A, is equivalent to the similarity obtained according to the decision feature d. The problem is to find the relations R1 and R2 that maximize the Similarity Quality Measure, according to expression (2).
max (∑_{∀x∈U} ϕ(x)) / |U|    (2)

In the case of decision systems in which the domain of the decision feature is discrete, as in classification problems, the relation R2 is defined as x R2 y ⇔ x(d) = y(d), where x(d) is the value of the decision feature d for the object x.
This measure has been successfully applied as a method of calculating weights in the k-NN function estimator, to calculate the initial weights of the links between the input layer and the hidden layer in a multi-layer Perceptron network [31], in the

IRBASIR rule generation algorithm [32] and, recently, in the construction of prototype sets to solve function approximation [21] and classification [19, 33] problems.
Next we present the modification alternatives to the NPBASIR-CLASS and
NPBASIR SEL-CLASS algorithms for imbalanced datasets with two classes. These
variants consider modifying the quality measure of the similarity defined by (1).
IMBNPBASIR-CLASS v1 and IMBNPBASIR SEL-CLASS v1: Modification
of the measure of similarity quality (3) as follows:
θ(DS) = (∑_{∀x∈U} ϕ*(x)) / |U|    (3)

where ϕ ∗ (x) is defined by expression (4):
ϕ*(x) = ϕ(x)    if x ∈ C+
ϕ*(x) = ϕ(x)²   if x ∈ C−        (4)

This modification reduces the contribution of an object whenever ϕ(x) is squared, unless it has value 1; here C+ is the set of objects belonging to the majority class and C− that of the minority class.
IMBNPBASIR-CLASS v2 and IMBNPBASIR SEL-CLASS v2: Modification
of the measure of similarity quality (5) as the other alternative:

θ(DS) = (α · θ−(DS) + (1 − α) · θ+(DS)) / 2,    for 0.5 < α < 1    (5)

where θ − (DS) and θ + (DS) are defined by expressions (6) and (7):
θ−(DS) = (∑_{∀x∈C−} ϕ(x)) / |C−|    (6)

θ+(DS) = (∑_{∀x∈C+} ϕ(x)) / |C+|    (7)

With this modification, we can clearly give more weight to the objects of the
minority class when calculating the quality of the similarity.
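A compact sketch of the two modified similarity quality measures may help to fix ideas. The ϕ(x) values and class memberships below are illustrative; in the actual algorithms they come from the granulation induced by the weighted relation being optimized.

```python
import numpy as np

def theta_v1(phi, in_minority):
    """Variant 1 (Eqs. 3-4): square phi(x) for one class, keep phi(x)
    for the other, then average over the whole universe."""
    phi_star = np.where(in_minority, phi ** 2, phi)
    return phi_star.mean()

def theta_v2(phi, in_minority, alpha=0.7):
    """Variant 2 (Eqs. 5-7): weighted combination of the per-class
    average qualities, with 0.5 < alpha < 1 favouring the minority class."""
    theta_minus = phi[in_minority].mean()     # theta^- over C-
    theta_plus = phi[~in_minority].mean()     # theta^+ over C+
    return (alpha * theta_minus + (1 - alpha) * theta_plus) / 2

phi = np.array([0.9, 0.8, 1.0, 0.6, 0.7])                 # illustrative
in_minority = np.array([True, True, False, False, False])
print(theta_v1(phi, in_minority), theta_v2(phi, in_minority))
```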
Also, in both variants, step 3 of IMBNPBASIR-C and IMBNPBASIR SEL-C uses the relation R′ to build the similarity class of an object x_i; this means that two objects are similar if their similarity according to the descriptive features is greater than a threshold ε1 and they belong to the same class:

x_i R′ x_j ⇔ F1(x_i, x_j) ≥ ε1 and F2(x_i, x_j) = 1

where functions F1 and F2 are defined by expressions (8) and (9):
F1(x, y) = ∑_{i=1}^{n} w_i · ∂(x_i, y_i)    (8)

F2(x, y) = ∂(x_i, y_i)    (9)

The weights in expression (8) are calculated according to the method proposed in [31, 34], and the feature comparison function ∂_i(x_i, y_i), which calculates the similarity between the values of objects x and y with respect to feature i, is defined by expression (10), where D_i is the domain of feature i:
∂_i(x_i, y_i) = 1 − |x_i − y_i| / (Max(D_i) − Min(D_i))   if i is continuous
∂_i(x_i, y_i) = 1                                         if i is discrete and x_i = y_i
∂_i(x_i, y_i) = 0                                         if i is discrete and x_i ≠ y_i    (10)

Using expressions (8) and (10) allows working with mixed data, i.e., application domains where the descriptive features can take either numeric or symbolic values.
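The following sketch shows how expressions (8) and (10) combine for mixed data; the feature weights, domains and objects are illustrative only.

```python
def delta(xi, yi, lo=None, hi=None):
    """Feature comparison of Eq. 10: normalised distance for continuous
    features, exact match for discrete ones."""
    if lo is not None:                      # continuous feature
        return 1 - abs(xi - yi) / (hi - lo)
    return 1.0 if xi == yi else 0.0         # discrete feature

def F1(x, y, weights, domains):
    """Weighted similarity of Eq. 8 over mixed features; domains[i] is
    (min, max) for continuous features and None for discrete ones."""
    total = 0.0
    for i, w in enumerate(weights):
        lo, hi = domains[i] if domains[i] else (None, None)
        total += w * delta(x[i], y[i], lo, hi)
    return total

x = (70.0, 'single')
y = (85.0, 'married')
print(F1(x, y, weights=[0.6, 0.4], domains=[(60, 100), None]))
# 0.6 * (1 - 15/40) + 0.4 * 0 = 0.375
```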

4 Experimental Results

This section is divided into two parts. First, we compare both variants of IMBNPBASIR-CLASS and IMBNPBASIR SEL-CLASS with the state-of-the-art methods for imbalanced classification over the entire collection of 89 datasets, and we then provide a detailed analysis for different IR levels (low IR, high IR, and very high IR). Furthermore, we compare the proposed algorithms combined with the four preprocessing methods SMOTE, SMOTE-ENN, SMOTE-RSB* and SMOTE-TL against the state-of-the-art methods for imbalanced classification.
We consider 89 datasets with different imbalance ratios (IR) (between 1.82 and 129.44) to evaluate our proposal. All the datasets can be found and downloaded from the KEEL-dataset repository [35] (http://keel.es/datasets.php). The characteristics of these datasets can be found in Table 1, showing the IR, the number of instances (Inst), and the number of attributes (Attr) for each of them.
Apart from considering the dataset collection as a whole, in our experimental
study we have also considered three subsets of the collection based on their IR. The
purpose of this division is to evaluate the behavior of the algorithms at different
imbalance levels.
(1) IR < 9 (low imbalance): This group contains 22 datasets, all with IR lower than
9.
(2) IR ≥ 9 (high imbalance): This group contains 49 datasets, all with IR at least 9.
(3) IR ≥ 33 (very high imbalance): This group contains 18 datasets, all with IR at
least 33.
This section presents the results of the experimental analysis. In particular, the
proposed algorithms are compared with the state-of-the-art algorithms selected for
the comparative study with the objective of determining which is the most competitive
Table 1 Description of the datasets used in the experimental evaluation
Datasets IR Inst Attr Datasets IR Inst Attr
glass1 1.82 214 9 ecoli-0-1-4-7_vs_5-6 12.28 332 7
ecoli-0_vs_1 1.86 220 9 cleveland-0_vs_4 12.31 173 13
wisconsinImb 1.86 683 7 ecoli-0-1-4-6_vs_5 13 280 6
pimaImb 1.87 768 9 ecoli4 13.84 336 7
iris0 2 150 4 shuttle-c0-_vs_-c4 13.87 1829 9
glass0 2.06 214 9 yeast-1_vs_7 14.3 459 7
yeast1 2.46 1484 8 glass4 15.46 214 9
vehicle1 2.52 846 18 page-blocks-1-3_vs_4 15.85 472 10
vehicle3 2.52 846 18 abalone9-18 16.68 731 8
haberman 2.68 306 3 dermatology-6 16.9
vehicle2 2.88 846 18 glass-0-1-6_vs_5 19.44 184 9
glass-0-1-2-3_vs_4-5-6 3.2 214 9 shuttle-c2-_vs_-c4 20.5 129 9
vehicle0 3.23 846 18 shuttle-6_vs_2-3 22 230 9
ecoli1 3.36 336 7 yeast-1-4-5-8_vs_7 22.1 693 8
new-thyroid1 5.14 215 5 glass5 22.78 214 9
new-thyroid2 5.14 215 5 yeast2_vs_8 23.1 482 8
ecoli2 5.46 336 7 lymphography-normal-fibrosis 23.6 148 19
segment0 6.01 2308 20 yeast4 28.1 1484 8
glass6 6.38 214 9 winequalityred-4 29.17 1599 11
yeast3 8.11 1484 8 poker-9_vs_7 29.5
ecoli3 8.19 336 7 kddcup-guess-passwd_vs_satan 29.98 1642 41
page-blocks0 8.77 5472 10 yeast-1-2-8-9_vs_7 30.57 947 8
ecoli-0-3-4_vs_5 9 200 7 abalone-3_vs_11 32.47 502 8
yeast2_vs_4 9.08 514 8 winequality-white-9_vs_4 32.6 168 77
ecoli-0-6-7_vs_3-5 9.09 222 7 yeast5 32.73 1484 8
ecoli-0-2-3-4_vs_5 9.1 202 7 winequalityred-8_vs_6 35.44 656 11
glass-0-1-5_vs_2 9.12 172 9 ecoli-0-1-3-7_vs_2-6 39.14 281 7
yeast-0-3-5-9_vs_7-8 9.12 506 8 abalone-17_vs_7-8-9-10 39.3 2338 8
yeast-0-2-5-6_vs_3-7-8-9 9.14 1004 8 abalone-21_vs_8 40.5 581 8
yeast-0-2-5-7-9_vs_3-6-8 9.14 1004 8 yeast6 41.4 1484 8
ecoli-0-4-6_vs_5 9.15 203 6 winequality-white-3_vs_7 44 900 11
ecoli-0-1_vs_2-3-5 9.17 244 7 winequality-red-8_vs_6-7 46.5 855 11
ecoli-0-2-6-7_vs_3-5 9.18 224 7 kddcup-land_vs_portsweep 49.5 1061 41
glass-0-4_vs_5 9.22 92 9 abalone-19_vs_10-11-12-13 49.69 1622 8
ecoli-0-3-4-6_vs_5 9.25 205 7 winequality-white-3-9_vs_5 58.28 1482 11
ecoli-0-3-4-7_vs_5-6 9.28 257 7 poker-8-9_vs_6 58.4 1485 11
yeast-0-5-6-7-9_vs_4 9.35 528 8 shuttle-2_vs_5 66.6 3316 9
vowel0 9.98 988 13 winequality-red-3_vs_5 68.1 691 11
ecoli-0-6-7_vs_5 10 220 6 abalone-20_vs_8-9-10 72.69 1916 8
glass-0-1-6_vs_2 10.29 192 9 kddcup-buffer-overflow_vs_back 73.43 2233 41
glass2 10.39 214 9 kddcup-land_vs_satan 75.6 1610 41
ecoli-0-1-4-7_vs_2-3-5-6 10.59 336 7 poker-8-9_vs_5 82 2075 11
led7digit-0-2-4-5-6-7-8-9_vs_1 10.97 443 7 poker-8_vs_6 85.8 1477 11
ecoli-0-1_vs_5 11 240 6 kddcup-rootkit-imap_vs_back 100.1 2225 41
glass-0-6_vs_5 11 108 9 abalone19 129.44 4174 8
glass-0-1-4-6_vs_2 11.06 205 9

Table 2 Mean AUC for state-of-the-art methods and the proposed methods for different IR levels
Algorithm | All | IR < 9 | IR ≥ 9 | IR ≥ 33
S-C4.5 0.83 0.86 0.85 0.71
B-C4.5 0.82 0.87 0.83 0.70
E-C4.5 0.83 0.87 0.85 0.72
TL-C4.5 0.82 0.86 0.84 0.71
CS-C4.5 0.82 0.87 0.83 0.71
IMBNP-C-v1 0.90 0.86 0.90 0.94
IMBNP-SC-v1 0.92 0.86 0.93 0.95
IMBNP-C-v2 0.90 0.89 0.88 0.97
IMBNP-SC-v2 0.95 0.90 0.96 0.99

proposal in each of the four blocks of experiments considered (all datasets, low IR, high IR, and very high IR).
SMOTE, SMOTE-ENN, SMOTE-RSB* and SMOTE-TL are four preprocessing methods that need to be combined with a base classifier; for this purpose we chose the well-known classifier C4.5 [25]. Similarly, we consider the cost-sensitive C4.5 decision tree (CS-C4.5) as an imbalanced learning method to compare with our methods, as discussed in previous sections.
Table 2 shows the mean AUC of the selected preprocessing algorithms using C4.5 as classifier and of all the IMBNPBASIR variants, for all datasets and for the low IR, high IR and very high IR groups. The columns in the tables correspond to: SMOTE-C4.5 (S-C4.5), SMOTE-RSB*-C4.5 (B-C4.5), SMOTE-ENN-C4.5 (E-C4.5), SMOTE-TL-C4.5 (TL-C4.5), IMBNPBASIR-CLASS v1 (IMBNP-C-v1), IMBNPBASIR SEL-CLASS v1 (IMBNP-SC-v1), IMBNPBASIR-CLASS v2 (IMBNP-C-v2) and IMBNPBASIR SEL-CLASS v2 (IMBNP-SC-v2). We can see that IMBNPBASIR SEL-CLASS v2 obtains the highest average AUC.
In order to compare the different algorithms appropriately, we will conduct a
statistical analysis using nonparametric tests as suggested in the literature [36].
We first use Friedman’s aligned-ranks test [37] to detect statistical differences
between a set of algorithms. The Friedman test computes the average aligned-ranks
of each algorithm, obtained by computing the difference between the performance of
the algorithm and the mean performance of all algorithms for each dataset [11]. The
lower the average rank, the better the corresponding algorithm. Then, if significant
differences are found by the Friedman test, we check if the control algorithm (the
one obtaining the smallest rank) is significantly better than the others using Holm’s
posthoc test [11, 38] (Tables 3, 4, 5, 6, 7, 8, 9 and 10).
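The following sketch shows how such an analysis can be assembled with standard SciPy/statsmodels routines. Note that scipy.stats.friedmanchisquare implements the classical Friedman test rather than the aligned-ranks variant used in the chapter, and the post-hoc z statistic below uses the usual average-rank standard error; the AUC values are synthetic.

```python
import numpy as np
from scipy.stats import friedmanchisquare, norm
from statsmodels.stats.multitest import multipletests

# AUC of 4 hypothetical algorithms on 10 hypothetical datasets
rng = np.random.default_rng(1)
auc = rng.uniform(0.6, 1.0, size=(10, 4))

# Friedman test over the per-dataset rankings
stat, p = friedmanchisquare(*auc.T)
print(f"Friedman: chi2={stat:.3f}, p={p:.4f}")

# Average ranks (rank 1 = best, i.e. highest AUC on each dataset)
ranks = np.argsort(np.argsort(-auc, axis=1), axis=1) + 1
avg = ranks.mean(axis=0)
control = avg.argmin()                    # control = smallest average rank

# Holm post-hoc: control algorithm vs every other algorithm
n, k = auc.shape
se = np.sqrt(k * (k + 1) / (6.0 * n))
others = [i for i in range(k) if i != control]
z = (avg[others] - avg[control]) / se
pvals = 2 * (1 - norm.cdf(np.abs(z)))
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(control, list(zip(others, p_adj.round(4), reject)))
```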
After this experimental study, it can be observed that the method that offers the best results for the imbalanced case is IMBNPBASIR SEL-CLASS v2 in all the settings studied (considering all datasets, low IR, high IR and very high IR). In the low IR case, IMBNPBASIR SEL-CLASS v2 obtained results comparable to IMBNPBASIR SEL-CLASS v1. On the other hand, in the very high IR case all the IMBNPBASIR variants obtain comparable results, and they achieve results significantly higher than the state-of-

Table 3 Average Friedman rankings for all imbalance datasets

Algorithm     Ranking
S-C4.5        5.9101
B-C4.5        6.3596
E-C4.5        5.9888
TL-C4.5       6.4438
CS-C4.5       6.5112
IMBNP-C-v1    4.0112
IMBNP-C-v2    3.6573
IMBNP-SC-v1   3.9045
IMBNP-SC-v2   2.2135

Table 4 Holm’s posthoc procedure for all Imbalance datasets, using IMBNP-SC-v2 as the control
algorithm
i   Algorithm   z = (R0 − Ri)/SE   p   Holm   Hypothesis
8 CS-C4.5 10.468653 0 0.00625 Rejected
7 TL-C4.5 10.304439 0 0.007143 Rejected
6 B-C4.5 10.099171 0 0.008333 Rejected
5 E-C4.5 9.195993 0 0.01 Rejected
4 S-C4.5 9.00441 0 0.0125 Rejected
3 IMBNP-C-v1 4.379044 0.000012 0.016667 Rejected
2 IMBNP-SC-v1 4.119039 0.000038 0.025 Rejected
1 IMBNP-C-v2 3.51692 0.000437 0.05 Rejected

Table 5 Average Friedman rankings for low imbalance datasets

Algorithm     Ranking
S-C4.5        5.6364
B-C4.5        5.3864
E-C4.5        5.4773
TL-C4.5       6.6818
CS-C4.5       5.3409
IMBNP-C-v1    4.75
IMBNP-C-v2    5.2273
IMBNP-SC-v1   3.6591
IMBNP-SC-v2   2.8409

Table 6 Holm’s posthoc procedure for low imbalance datasets, using IMBNP-SC-v2 as the control
algorithm
i   Algorithm   z = (R0 − Ri)/SE   p   Holm   Hypothesis
8 TL-C4.5 4.651572 0.000003 0.00625 Rejected
7 S-C4.5 3.385464 0.000711 0.007143 Rejected
6 E-C4.5 3.192795 0.001409 0.008333 Rejected
5 B-C4.5 3.082699 0.002051 0.01 Rejected
4 CS-C4.5 3.02765 0.002465 0.0125 Rejected
3 IMBNP-C-v2 2.89003 0.003852 0.016667 Rejected
2 IMBNP-C-v1 2.312024 0.020776 0.025 Rejected
1 IMBNP-SC-v1 0.990867 0.32175 0.05 Not rejected

Table 7 Average Friedman rankings for high imbalance datasets

Algorithm     Ranking
S-C4.5        5.8367
B-C4.5        6.5612
E-C4.5        5.8673
TL-C4.5       6.4388
CS-C4.5       6.8163
IMBNP-C-v1    3.8469
IMBNP-C-v2    3.0816
IMBNP-SC-v1   4.5714
IMBNP-SC-v2   1.9796

Table 8 Holm’s posthoc procedure for high imbalance datasets, using IMBNP-SC-v2 as the control
algorithm
i   Algorithm   z = (R0 − Ri)/SE   p   Holm   Hypothesis
8 CS-C4.5 8.741877 0 0.00625 Rejected
7 B-C4.5 8.280807 0 0.007143 Rejected
6 TL-C4.5 8.059494 0 0.008333 Rejected
5 E-C4.5 7.026698 0 0.01 Rejected
4 S-C4.5 6.97137 0 0.0125 Rejected
3 IMBNP-SC-v1 4.684466 0.000003 0.016667 Rejected
2 IMBNP-C-v1 3.375028 0.000738 0.025 Rejected
1 IMBNP-C-v2 1.99182 0.046391 0.05 Rejected

Table 9 Average Friedman rankings for very high imbalance datasets

Algorithm     Ranking
S-C4.5        6.4444
B-C4.5        7
E-C4.5        6.9444
TL-C4.5       6.1667
CS-C4.5       7.1111
IMBNP-C-v1    3.5556
IMBNP-C-v2    3.3056
IMBNP-SC-v1   2.3889
IMBNP-SC-v2   2.0833

Table 10 Holm’s posthoc procedure for very high imbalance datasets, using IMBNP-SC-v2 as the
control algorithm
i   Algorithm   z = (R0 − Ri)/SE   p   Holm   Hypothesis
8 CS-C4.5 5.507655 0 0.00625 Rejected
7 B-C4.5 5.385938 0 0.007143 Rejected
6 E-C4.5 5.32508 0 0.008333 Rejected
5 S-C4.5 4.777358 0.000002 0.01 Rejected
4 TL-C4.5 4.473068 0.000008 0.0125 Rejected
3 IMBNP-C-v1 1.612739 0.106801 0.016667 Not rejected
2 IMBNP-C-v2 1.338877 0.180611 0.025 Not rejected
1 IMBNP-SC-v1 0.334719 0.737837 0.05 Not rejected

the-art algorithms. In the other cases it surpasses both the latter and the rest of the state-of-the-art algorithms.
Table 11 shows the mean AUC of the selected preprocessing algorithms using the IMBNPBASIR variants as classifiers, compared with the state-of-the-art algorithms. The columns in the tables correspond to:
• SMOTE + IMBNPBASIR-CLASS v1: S-IMBNP-C-v1
• SMOTE + IMBNPBASIR-CLASS v2: S-IMBNP-C-v2
• SMOTE + IMBNPBASIR SEL-CLASS v1: S-IMBNP-SC-v1
• SMOTE + IMBNPBASIR SEL-CLASS v2: S-IMBNP-SC-v2
• SMOTE-ENN + IMBNPBASIR-CLASS v1: E-IMBNP-C-v1
• SMOTE-ENN + IMBNPBASIR-CLASS v2: E-IMBNP-C-v2
• SMOTE-ENN + IMBNPBASIR SEL-CLASS v1: E-IMBNP-SC-v1
• SMOTE-ENN + IMBNPBASIR SEL-CLASS v2: E-IMBNP-SC-v2
• SMOTE-TL + IMBNPBASIR-CLASS v1: TL-IMBNP-C-v1
• SMOTE-TL + IMBNPBASIR-CLASS v2: TL-IMBNP-C-v2
• SMOTE-TL + IMBNPBASIR SEL-CLASS v1: TL-IMBNP-SC-v1
• SMOTE-TL + IMBNPBASIR SEL-CLASS v2: TL-IMBNP-SC-v2

Table 11 Mean AUC for state-of-the-art methods and the proposed methods for different IR levels
combined with preprocessed datasets
Algorithm | All | IR < 9 | IR ≥ 9 | IR ≥ 33
S-IMBNP-C-v1 0.91 0.86 0.92 0.93
S-IMBNP-C-v2 0.90 0.86 0.92 0.92
S-IMBNP-SC-v1 0.94 0.89 0.95 0.97
S-IMBNP-SC-v2 0.94 0.89 0.95 0.97
E-IMBNP-C-v1 0.88 0.86 0.90 0.83
E-IMBNP-C-v2 0.87 0.86 0.90 0.83
E-IMBNP-SC-v1 0.90 0.89 0.93 0.86
E-IMBNP-SC-v2 0.90 0.89 0.93 0.86
TL-IMBNP-C-v1 0.90 0.86 0.92 0.88
TL-IMBNP-C-v2 0.89 0.85 0.92 0.88
TL-IMBNP-SC-v1 0.92 0.89 0.95 0.90
TL-IMBNP-SC-v2 0.91 0.88 0.93 0.91

We can see that SMOTE + IMBNPBASIR SEL-CLASS v1 and SMOTE + IMBNPBASIR SEL-CLASS v2 obtain the highest average AUC (Tables 12 and 13).
IMBNPBASIR SEL-CLASS v1 and IMBNPBASIR SEL-CLASS v2, used as base classifiers combined with SMOTE, SMOTE-ENN and SMOTE-TL as preprocessing methods, obtain results significantly better than the state-of-the-art algorithms and the other proposed variants.

Table 12 Average Friedman rankings for all imbalance preprocessed datasets


Algorithm Ranking Algorithm Ranking
S-C4.5 12.5281 E-IMBNP-C-v1 8.8202
B-C4.5 12.9157 E-IMBNP-C-v2 9.0337
E-C4.5 12.6517 E-IMBNP-SC-v1 6.8708
TL-C4.5 13.0056 E-IMBNP-SC-v2 6.1685
CS-C4.5 12.882 TL-IMBNP-C-v1 8.9831
S-IMBNP-C-v1 8.1404 TL-IMBNP-C-v2 9.3539
S-IMBNP-C-v2 8.3764 TL-IMBNP-SC-v1 6.5787
S-IMBNP-SC-v1 5.2247 TL-IMBNP-SC-v2 6.4607
S-IMBNP-SC-v2 5.0056

Table 13 Holm’s posthoc procedure for all imbalance preprocessed datasets, using S-IMBNP-SC-
v2 as the control algorithm
i   Algorithm   z = (R0 − Ri)/SE   p   Holm   Hypothesis
16 TL-C4.5 10.568173 0 0.003125 Rejected
15 B-C4.5 10.449429 0 0.003333 Rejected
14 CS-C4.5 10.4049 0 0.003571 Rejected
13 E-C4.5 10.10062 0 0.003846 Rejected
12 S-C4.5 9.937348 0 0.004167 Rejected
11 TL-GEN-V2 5.744217 0 0.004545 Rejected
10 E-GEN-V2 5.321194 0 0.005 Rejected
9 TL-GEN-V1 5.2544 0 0.005556 Rejected
8 E-GEN-V1 5.039178 0 0.00625 Rejected
7 S-GEN-V2 4.452882 0.000008 0.007143 Rejected
6 S-GEN-V1 4.14118 0.000035 0.008333 Rejected
5 E-SEL-V1 2.463928 0.013742 0.01 Not rejected
4 TL-SEL-V1 2.078011 0.037708 0.0125 Not rejected
3 TL-SEL-V2 1.922161 0.054586 0.016667 Not rejected
2 E-SEL-V2 1.536244 0.124478 0.025 Not rejected
1 S-SEL-V1 0.289437 0.772247 0.05 Not rejected

5 Conclusions

Four new proposals for imbalanced data classification were presented in this paper. The novelty of the proposal lies in hybridizing Rough Set Theory, specifically the similarity quality measure, with concepts of prototype-based classification in order to classify objects under these conditions. This measure allows creating a prototype that covers the objects whose decision value is the majority class of the similarity class.
Finally, after the experimental study and the statistical analysis carried out, it can be concluded that the proposed methods are very competitive in imbalanced domains, since they obtain results significantly better than the state-of-the-art algorithms.

References

1. Kuang, D., Ling, C.X., Du, J.: Foundation of mining class-imbalanced data. In: Pacific-Asia
Conference on Knowledge Discovery and Data Mining. Springer (2012)
2. García-Pedrajas, N., et al.: Class imbalance methods for translation initiation site recognition
in DNA sequences. Knowl.-Based Syst. 25(1), 22–34 (2012)

3. Garcia-Pedrajas, N., Perez-Rodriguez, J., de Haro-Garcia, A.: OligoIS: scalable instance selec-
tion for class-imbalanced data sets. IEEE Trans. Cybern. 43(1), 332–346 (2013)
4. Thanathamathee, P., Lursinsap, C.: Handling imbalanced data sets with synthetic boundary
data generation using bootstrap re-sampling and AdaBoost techniques. Pattern Recogn. Lett.
34(12), 1339–1347 (2013)
5. McCarthy, K., Zabar, B., Weiss, G.: Does cost-sensitive learning beat sampling for classifying
rare classes? In: Proceedings of the 1st International Workshop on Utility-Based Data Mining.
ACM (2005)
6. Liu, W., et al.: A robust decision tree algorithm for imbalanced data sets. In: Proceedings of
the 2010 SIAM International Conference on Data Mining. SIAM (2010)
7. Liu, J., Hu, Q., Yu, D.: A comparative study on rough set based class imbalance learning.
Knowl.-Based Syst. 21(8), 753–763 (2008)
8. Hong, X., Chen, S., Harris, C.J.: A kernel-based two-class classifier for imbalanced data sets.
IEEE Trans. Neural Netw. 18(1), 28–41 (2007)
9. Galar, M., et al.: A review on ensembles for the class imbalance problem: bagging-, boosting-
, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(4),
463–484 (2012)
10. García-Pedrajas, N., García-Osorio, C.: Boosting for class-imbalanced datasets using geneti-
cally evolved supervised non-linear projections. Prog. Artif. Intell. 2(1), 29–44 (2013)
11. Galar, M., et al.: EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolu-
tionary undersampling. Pattern Recogn. 46(12), 3460–3471 (2013)
12. Ertekin, S., Huang, J., Giles, C.L.: Active learning for class imbalance problem. In: Proceedings
of the 30th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval. ACM (2007)
13. Di Martino, M., et al.: Novel classifier scheme for imbalanced problems. Pattern Recogn. Lett.
34(10), 1146–1151 (2013)
14. Bezdek, J.C., Kuncheva, L.I.: Nearest prototype classifier designs: an experimental study. Int.
J. Intell. Syst. 16(12), 1445–1473 (2001)
15. Triguero, I., et al.: A taxonomy and experimental study on prototype generation for nearest
neighbor classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(1), 86–100 (2012)
16. Pawlak, Z., et al.: Rough sets. Commun. ACM 38(11), 88–95 (1995)
17. Bello, R., Luis Verdegay, J.: Los conjuntos aproximados en el contexto de la Soft Computing.
Revista Cubana de Ciencias Inf. 4 (2010)
18. Fernández Hernández, Y.B., et al.: An approach for prototype generation based on similarity
relations for problems of classification. Comput. Syst. 19(1), 109–118 (2015)
19. Frias, M., et al.: Prototypes selection based on similarity relations for classification problems.
In: Engineering Applications—International Congress on Engineering (WEA), Bogota. IEEE
(2015)
20. Yao, Y.: Granular computing: basic issues and possible solutions. In: Proceedings of the 5th
Joint Conference on Information Sciences. Citeseer (2000)
21. Bello-García, M., García-Lorenzo, M.M., Bello, R.: A method for building prototypes in the
nearest prototype approach based on similarity relations for problems of function approxima-
tion. In: Advances in Artificial Intelligence, pp. 39–50. Springer (2012)
22. Filiberto, Y., et al.: A method to build similarity relations into extended rough set theory. In:
2010 10th International Conference on Intelligent Systems Design and Applications (ISDA).
IEEE (2010)
23. Zhao, J.H., Li, X., Dong, Z.Y.: Online rare events detection. In: Pacific-Asia Conference on
Knowledge Discovery and Data Mining. Springer (2007)
24. Lee, Y.-H., et al.: A preclustering-based ensemble learning technique for acute appendicitis
diagnoses. Artif. Intell. Med. 58(2), 115–124 (2013)
25. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
26. Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for
balancing machine learning training data. ACM SIGKDD Explor. Newsl 6(1), 20–29 (2004)

27. Ting, K.M.: An instance-weighting method to induce cost-sensitive trees. IEEE Trans. Knowl.
Data Eng. 14(3), 659–665 (2002)
28. Chawla, N.V., et al.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res.
16, 321–357 (2002)
29. Ramentol, E., et al.: SMOTE-FRST: a new resampling method using fuzzy rough set theory. In:
10th International FLINS Conference on Uncertainty Modelling in Knowledge Engineering
and Decision Making (to appear) (2012)
30. Ramentol, E., et al.: SMOTE-RSB*: a hybrid preprocessing approach based on oversampling
and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Knowl.
Inf. Syst. 33(2), 245–265 (2012)
31. Filiberto, Y., et al.: An analysis about the measure quality of similarity and its applications
in machine learning. In: Fourth International Workshop on Knowledge Discovery, Knowledge
Management and Decision Support. Atlantis Press (2013)
32. Filiberto, Y., et al.: Algoritmo para el aprendizaje de reglas de clasificación basado en la teoría
de los conjuntos aproximados extendida. Dyna 78(169), 62–70 (2011)
33. Fernandez, Y.B., et al.: Effects of using reducts in the performance of the irbasir algorithm.
Dyna 80(182), 182–190 (2013)
34. Filiberto, Y., et al.: Using PSO and RST to predict the resistant capacity of connections in
composite structures. In: Nature Inspired Cooperative Strategies for Optimization (NICSO
2010) (2010)
35. Alcalá-Fdez, J., et al.: Keel data-mining software tool: data set repository, integration of algo-
rithms and experimental analysis framework. J. Multiple-Valued Logic Soft Comput. 17 (2011)
36. García, S., et al.: Advanced nonparametric tests for multiple comparisons in the design of
experiments in computational intelligence and data mining: experimental analysis of power.
Inf. Sci. 180(10), 2044–2064 (2010)
37. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
38. Holm, S.: A simple sequentially rejective multiple test procedure. Scandinavian J. Stat. 65–70
(1979)
Early Detection of Possible
Undergraduate Drop Out Using a New
Method Based on Probabilistic Rough
Set Theory

Enislay Ramentol, Julio Madera and Abdel Rodríguez

Abstract For any educational project, it is important and challenging to know, at the moment of enrollment, whether a given student is likely to successfully pass the academic year. This task is not simple at all, because many factors contribute to college failure. Being able to infer how likely an enrolled student is to present promotion problems is undoubtedly an interesting challenge for the areas of data mining and education. In this paper, we propose the use of data mining techniques in order to predict how likely a student is to succeed in the academic year. Normally, there are more students that succeed than fail, resulting in an imbalanced data representation. To cope with imbalanced data, we introduce a new algorithm based on probabilistic Rough Set Theory (RST). Two ideas are introduced. The first one is the use of two different threshold values for the similarity between objects when dealing with minority or majority examples. The second idea combines the original distribution of the data with the probabilities predicted by the RST method. Our experimental analysis shows that we obtain better results than a range of state-of-the-art algorithms.
Keywords Educational data mining · Drop out · Imbalanced classification · Probabilistic rough set

1 Introduction

Two essential elements in any educational project are retention and completion of
studies by students. Drop out is one of the most complex problems that educational
institutions are facing nowadays. Drop out means the fact that a number of students

E. Ramentol · J. Madera (B) · A. Rodríguez
Research Institute of Sweden RISE SICS Västerås AB, Stora Gatan 36, SE-722 12, Västerås, Sweden
e-mail: julio.madera@reduc.edu.cu
e-mail: julio.madera@reduc.edu.cu
E. Ramentol
e-mail: enislay.ramentol@ri.se
A. Rodríguez
e-mail: abdel.rodriguez@reduc.edu.cu


enrolled do not follow the normal path of the academic program, either by repeating courses or by withdrawing from it, permanently or temporarily [37]. There may exist several causes for students' drop out [36]. Determining how likely a student is to successfully complete the academic year is both important and challenging.
Our main objective in this paper is therefore to create a reliable tool to predict the likelihood of each student successfully passing the academic year. Such a prediction will be carried out on the description of the students available at the moment of enrollment. This information will be very useful to create categories of students so that the attention to them can be personalized in terms of their expected results, allowing us to reduce drop out. The study will be carried out on the Informatics Engineering department of the University of Camagüey, Cuba, where the drop out rate of freshmen is about 25%.
A straightforward way to perform such a prediction is to use data mining techniques [24, 32, 33]. A problem affecting the accuracy of standard algorithms is the fact that, normally, many more students smoothly pass the academic year than have problems with specific topics. This means that any data mining technique will have to cope with many more examples of successful students than the opposite, biasing the technique to predict a success more often than a failure. In fact, for the academic process, the most important predictions are the possible failures.
In machine learning this phenomenon is known as the class imbalance problem, and it has been identified as a current challenge in Data Mining [42]. Imbalanced problems can be tackled from different perspectives; four categories of techniques have been established: data level, algorithm level, cost sensitive and ensembles. In this paper we focus on solutions at the algorithm level. We use the Rough Set Theory (RST) [28, 45] to create a probabilistic approach in order to predict drop out.
This paper introduces two novel ideas for classifying highly imbalanced data sets
using a probabilistic RST. The first idea is to use two different similarity thresholds
for deciding on the membership of the concepts to the classes. The threshold used to
decide on the inseparability of the objects that belong to the positive class is set to a
very low value. When deciding on the negative class we use a higher threshold. This
way helps the less represented class. The second idea presented in this paper is to
combine a posteriori probabilities (for a given observation) with a priori probabilities
(original distribution of concepts) into the classification algorithm.
We formally introduce the problem in Sect. 2. The background information is presented in the following sections. Section 3 introduces imbalanced data set techniques; we also describe some preprocessing techniques for imbalanced data sets and discuss the evaluation metric used in this work. Section 4 discusses the standard and probabilistic Rough Set Theory (RST), respectively. In Sect. 5 we present our proposal. In Sect. 6 we introduce the experimental study, i.e., the benchmark data sets, the statistical tests for performance comparison and the experimental analysis that validates the goodness of our proposal. In Sect. 7 we draw the conclusions.

2 Student Drop Out

Student drop out is a higher education issue that, given its importance, attracts several researchers nowadays [19, 22]. Whether the cause is failing courses or socio-economic factors, the fact is that today drop out rates are higher than ever before [10]. We can also add all those students that do not pass the exams of the ongoing academic year and decide to repeat the courses in the following year.
This problem also affects the economy of countries, for it makes the professional formation process even more expensive. The causes of student drop out have been widely studied. Reasons may vary from family-related conditions, parents' educational background, or how old the students are when enrolling into the system, to other social factors affecting the motivation of the students. It is for sure a tough task to know beforehand whether a given student will succeed or not in the University.
There are many applications using data mining techniques to improve the edu-
cational system, such as the tool for auto-regulated intelligent tutoring systems pre-
sented in [11] and the decision support system presented in [21]. In this paper we
introduce a data mining approach for the students drop out prediction.

2.1 Construction of the Dataset

We consider data collected over the period 2008–2012 containing information about the students of the Informatics Engineering program. We selected a target dataset of 292 students who were in their first academic year at the department. The selected variables are founded on studies by psychologists and educators. All such variables aim to describe each student in detail in order to identify the major causes of school failure. The variables are shown in Table 1. Once the students are characterized, we proceed to label them into two single classes: students that smoothly promoted and those that did not pass all subjects of the semester.
The main goal is to know, at the moment of enrollment in college, how likely the student is to smoothly pass the current year. This will make it possible to group the students according to their probability of success or failure and give them special treatment in order to help them promote.
In order to perform such a prediction the task is divided in the following steps:
1. Making-up the data-set: (a) Determining the variables, (b) Measuring each vari-
able and (c) Labeling each observation as a student that promoted or not
2. Determining the data-set characteristics given the amount of instances on each
class
3. Choosing the classifier for the application
4. Choosing (if necessary) the preprocessing techniques for the application
5. Supporting the study experimentally
Table 2 shows the description of the data set. There exist two class values, the
first for the students that smoothly pass the academic year and a second one for those

Table 1 Descriptive variables

Variable names | Type | Values
Municipality | nominal | {1-13}
Age at enrollment | nominal | {17-30}
High school | nominal | {1-36}
Academic origin | nominal | {polytechnic school, regular high school}
Marital status | nominal | {married, single, divorced}
Gender | nominal | {male, female}
Colorcast | nominal | {white, black, brown}
Scholarship | nominal | {extern, intern}
Mother's academic background | nominal | {primary, secondary, high, college}
Priority (student's career-priority) | nominal | {1-9}
Coming from | nominal | {athlete, MININT, contest, 18th order, polytechnic courses, polytechnic institute, regular high school, active worker}
Type of military service | nominal | {deferred, 18th order, none}
Mean score (high school and admission tests) | real | [60, 100]
Math admission test | real | [60, 100]
History admission test | real | [60, 100]
Mean score at high school | real | [60, 100]
Academic status (class) | nominal | {without problem, with problem}

Table 2 Class description

Class value | Description | # students
Without problem | Students promoting without problems | 217
With problem | Students who do not pass all subjects and ask for readmission | 75

that do not pass all subjects. Note the big difference in the number of students belonging to each class. This poses a problem for classifiers known as the imbalanced data set problem, which is introduced next.

3 A Short Introduction to Imbalanced Data Sets

The learning task in data mining, when the data present a disproportionate representation of the number of examples per class, is a challenge for the researchers of this area. This phenomenon is known as the class imbalance problem and is very common in many real-world applications [14, 25].

Classical machine learning algorithms often obtain high accuracy on the majority class, while on the minority class quite the opposite occurs. This happens because the classifier focuses only on global measures that do not take into account the data distribution by classes [35]. Nevertheless, the most interesting knowledge often concerns the minority class [18].

3.1 State of the Art Methods

The imbalanced classification problem can be tackled using four main types of
solutions:
1. Sampling (solutions at the data level) [2, 7]: this kind of solution consists of
balancing the class distribution by means of a preprocessing strategy.
2. Design of specific algorithms (solutions at the algorithmic level) [20]: in this
case we need to adapt our method to deal directly with the imbalance between
the classes, for example, modifying the cost per class or adjusting the probability
estimation in the leaves of a decision tree to favor the positive class [41].
3. Cost sensitive: this kind of methods incorporate solutions at data level, at algo-
rithmic level, or at both levels together, considering higher misclassification
costs for the examples of the positive class with respect to the negative class,
and therefore, trying to minimize higher cost errors [48].
4. Ensemble solutions: [15] Ensemble techniques for imbalanced classification
usually consist of a combination between an ensemble learning algorithm and
one of the techniques above, specifically, data level and cost-sensitive ones.
In the following, we describe some high-quality proposals that will be used in our experimental study.
• Synthetic Minority Oversampling Technique (SMOTE) [7]: an oversampling method.
• SMOTE-Tomek links [2]: applies Tomek links to the oversampled training set as a data cleaning method.
• SMOTE-ENN [2]: ENN tends to remove more examples than the Tomek links do.
• Borderline-SMOTE1 and Borderline-SMOTE2: these methods only oversample or strengthen the borderline minority examples [17].
• Safe-Level-SMOTE: this method assigns each positive instance its safe level before generating synthetic instances [6].
• SPIDER2 [26]: this method consists of two phases corresponding to preprocessing the majority and minority classes, respectively.
• SMOTE-RSB* [31]: this hybrid method constructs new samples using the Synthetic Minority Oversampling Technique together with the application of an editing technique based on the Rough Set Theory and the lower approximation of a subset.

• Cost-sensitive C4.5 decision tree (C4.5-CS): [38]. This method builds decision
trees that try to minimize the number of high cost errors and, as a consequence of
that, leads to the minimization of the total misclassification costs in most cases.
• Cost-sensitive Support Vector Machine (SVM-CS): [40]. This method is a mod-
ification of the soft-margin support vector machine [39].
• EUSBOOST: [16]. It is an ensemble method that uses Evolutionary UnderSam-
pling guided boosting.

3.2 Evaluation in Imbalanced Domains

When facing an imbalance problem, the traditional predictive accuracy is not appropriate, because the costs of different errors vary markedly from one class to another [8, 31].
In imbalanced domains, one of the most appropriate measures is the Receiver Operating Characteristic (ROC) graphic [5]. In these graphics, the tradeoff between the benefits (True Positive rate) and costs (False Positive rate) can be visualized, and they acknowledge the fact that no classifier can increase the number of true positives without also increasing the false positives. The area under the ROC curve (AUC) corresponds to the probability of correctly identifying which of two stimuli is noise and which is signal plus noise.
In this paper, we use the definition given by Fawcett [13], who proposed an algorithm that, instead of collecting ROC points, adds successive areas of trapezoids to the computed AUC value. Fawcett's proposal calculates the AUC by approximating the continuous ROC curve by a finite number of points. The coordinates of these points in ROC space are taken as the false positive and true positive rates obtained by varying the threshold θ of the probability above which an instance is classified as positive. The curve itself is approximated by linear interpolation between the calculated points. The AUC can therefore be determined as the sum of the areas of the subsequent trapezoids. This method is referred to as the trapezoid rule.
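A minimal implementation of this trapezoid rule is sketched below; for brevity it ignores tied scores, which a production implementation (and Fawcett's algorithm) must handle, and the scores and labels are illustrative.

```python
def auc_trapezoid(scores, labels):
    """AUC by the trapezoid rule: sweep the decision threshold over the
    sorted scores, collect (FPR, TPR) points and sum trapezoid areas."""
    pairs = sorted(zip(scores, labels), reverse=True)
    P = sum(labels)
    N = len(labels) - P
    tp = fp = 0
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        fpr, tpr = fp / N, tp / P
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # one trapezoid
        prev_fpr, prev_tpr = fpr, tpr
    return area

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(auc_trapezoid(scores, labels))  # 0.888...
```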

4 Rough Set Theory

Rough set theory was presented in 1982 [27]. This theory has evolved into a methodology for dealing with different types of problems, such as the uncertainty produced by inconsistencies in data [3].
RST is a mathematical tool to express uncertainty when it appears as inconsistency. RST can deal with quantitative and qualitative data, and it is not necessary to eliminate missing values. RST has become a powerful tool for data mining tasks such as feature selection, instance selection, rule extraction and so on [30].

RST provides three concepts: the lower and upper approximations of a subset X ⊆ U and the boundary region. These concepts were originally introduced in reference to an indiscernibility relation R.
Using the concept of similarity, the classical RST has been extended. This extension has been possible by considering that objects that are not indiscernible but sufficiently close or similar can be grouped into the same class [34]. The main objective of the similarity relation is to create a more flexible model. There are many similarity functions, which depend on the type of compared attribute. A similarity relation R′ must satisfy some minimal requirements: R being an indiscernibility relation (equivalence relation) defined on U, R′ is a similarity relation extending R if ∀x ∈ U, R(x) ⊆ R′(x) and ∀x ∈ U, ∀y ∈ R′(x), R(y) ⊆ R′(x), where R′(x) is the similarity class of x, i.e. R′(x) = {y ∈ U : y R′ x}.
The approximation of the set X ⊂ U, using the inseparability relation R′, is induced as a pair of sets called the R′-lower approximation of X and the R′-upper approximation of X. The lower approximation B∗(X) and upper approximation B*(X) of X are defined respectively as shown in Eqs. 1 and 2.

B∗(X) = {x ∈ X : R′(x) ⊆ X}    (1)

B*(X) = ∪_{x∈X} R′(x)    (2)

Taking into account Eqs. 1 and 2, the boundary region of X is defined for the relation R′ as:

BN_B(X) = B*(X) − B∗(X)    (3)

If the set BN_B(X) is empty, then the set X is exact with respect to the relation R′. If, on the contrary, BN_B(X) ≠ ∅, the set X is inexact or approximated with respect to R′.

4.1 Probabilistic Rough Set

In recent years, many researchers have put considerable effort into creating approaches for the construction of probabilistic rough set models. These approaches have been proposed based on the concept of the rough membership function.
In [43] the authors form two classes of rough set models: algebraic and probabilistic rough sets. The first class focuses on the algebraic and qualitative properties of the theory. The second class is more practical and captures quantitative properties of the theory [4, 45].
Using rough membership functions and rough inclusion, the classical rough set approximations are reformulated, defining larger positive and negative regions and

providing probabilities that define region boundaries. In boundary region we find the
objects that induce uncertainty, try to reduce this region is challenging task that face
the researcher in this area. Probabilistic rough set provides a possible solution by
re-defining more flexible Positive (POS) and Negative (NEG) regions, that is to say,
including in POS and NEG objects that was previously in boundary region [4].
Pawlak et al. introduced in [28] a proposal that defined probabilistic approximations. This proposal puts an element x into the lower approximation of A if the majority of its equivalent elements [x] are in A. The lower and upper 0.5 probabilistic approximation operators are dual to each other. The boundary region consists of those elements whose conditional probabilities are exactly 0.5, which represents maximal uncertainty.
The requirement of this approach is too loose for real decisions. To overcome these difficulties, probabilistic rough set models have been proposed to generalize the 0.5 probabilistic rough set model, introducing a pair of threshold parameters. By considering two separate cases, Yao and Wong [47] introduced more general probabilistic approximations in the decision-theoretic rough set model [4].

B∗_α(A) = {x ∈ U | P(A | [x]) ≥ α}    (4)

B*_β(A) = {x ∈ U | P(A | [x]) > β}    (5)

where 0 ≤ β < α ≤ 1. If α = 1 and β = 0, the classical lower and upper approximations are obtained. Based on the Bayesian decision procedure, the decision-theoretic rough set model provides systematic methods for deriving the required thresholds on the probabilities defining the three regions: the positive region, the boundary region and the negative region. A review of decision-theoretic rough sets is presented in [12].
How to choose the proper thresholds thus becomes an important task. Unfortunately, in most probabilistic rough set models the thresholds are given by experts' experience.

4.2 Rough Sets Based on Rough Membership Function

The objects in the same equivalence class have the same degree of membership. This membership may be interpreted as the probability of x belonging to X given that x belongs to an equivalence class; this interpretation leads to probabilistic rough sets [43].

The rough membership function is defined by Eq. 6; this measure takes values in the interval [0, 1]:

μ_X^B(x) = |X ∩ B(x)| / |B(x)|    (6)

B(x) denotes the equivalence class of object x according to the relation B. By definition, elements in the same equivalence class have the same degree of membership. This value may be interpreted analogously to a conditional probability (as a frequency-based estimate of conditional probability): the probability that an arbitrary element belongs to X provided that the element belongs to B(x). It may thus be thought of as the certainty degree of membership of x in X, Pr(x ∈ X | x ∈ B(x)). This interpretation leads to probabilistic rough sets [28].
The lower and upper approximations are then defined by Eqs. 7 and 8:

B_*(X) = {x ∈ U : μ_X^B(x) = 1}    (7)

B^*(X) = {x ∈ U : μ_X^B(x) > 0}    (8)

A more general definition of the lower and upper approximations can be given by using an arbitrary precision threshold τ, as in Eqs. 9 and 10:

B_*^τ(X) = {x ∈ U : μ_X^B(x) ≥ τ}    (9)

B^{*τ}(X) = {x ∈ U : μ_X^B(x) > 1 − τ}    (10)
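To make these definitions concrete, the following minimal Python sketch (our own illustration with hypothetical helper names, not code from this chapter) computes the rough membership function of Eq. 6 and the τ-approximations of Eqs. 9 and 10 over a toy universe partitioned into equivalence classes:

def rough_membership(x, X, classes):
    # Eq. 6: |X ∩ B(x)| / |B(x)|, where B(x) is the equivalence class of x.
    Bx = next(block for block in classes if x in block)
    return len(X & Bx) / len(Bx)

def tau_approximations(X, universe, classes, tau):
    # Eqs. 9-10: lower/upper approximations with precision threshold tau.
    lower = {x for x in universe if rough_membership(x, X, classes) >= tau}
    upper = {x for x in universe if rough_membership(x, X, classes) > 1 - tau}
    return lower, upper

# Toy example: six objects split into three equivalence classes.
U = {1, 2, 3, 4, 5, 6}
blocks = [{1, 2}, {3, 4, 5}, {6}]
X = {1, 2, 3, 4}
lower, upper = tau_approximations(X, U, blocks, tau=0.7)
print(lower)  # {1, 2}: membership 1.0 >= 0.7
print(upper)  # {1, 2, 3, 4, 5}: membership 2/3 > 0.3

With τ = 1 this reduces exactly to the classical approximations of Eqs. 7 and 8.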

A study about rough membership functions is presented in [44], where the author examines the relationships between rough sets and fuzzy sets based on the concept of rough membership functions; the definitions of the lower and upper approximations given by Eqs. 9 and 10 are related to the notion of α-cuts of fuzzy sets, which are crisp set approximations of a fuzzy set at different levels. This was the first alternative for making the definition of rough sets more flexible.

5 Proposal for Imbalanced Domains

In this section we introduce a new approach for soft classification over imbalanced domains based on probabilistic rough sets. The membership probability of an instance to a class is given as follows:

Pr(X | [x]) = |[x]_X| / |[x]|    (11)

where Pr(X | [x]) is the membership probability of x to the class X, [x]_X is the set of objects similar to x that belong to the class X, and [x] is the set of all objects similar to x in the universe.
Using probabilistic RST has shown very good results [46]; over imbalanced data sets, however, this approach attains a poor performance. Based on this, we propose two novel ideas to be integrated into a classification mechanism for imbalanced data sets:
1. The first consists in the use of two different threshold values for determining the
similarity between objects.
2. The second proposal consists in measuring the probability of belonging to each
class based on the combination of the a posteriori and a priori probabilities.

5.1 Proposal 1: About the Use of Two Different Threshold Values for Similarity

When classifying an instance, the method needs to find all similar instances in the training set. Such a similarity is determined using Eq. 12 by fixing a threshold value. It is quite common to use 0.9 for such a threshold, making the set of retrieved objects really similar to the query one:

SimilarityMatrix(i, j) = (Σ_{k=1}^{n} w_k · δ_k(x_{ik}, x_{jk})) / M    (12)

where n is the number of features, w_k is the weight of feature k, x_{ik} and x_{jk} are the values of feature k for instances i and j respectively, δ_k is the comparison function for feature k, M is the number of features considered in the equivalence relation, and B is the set of features considered in the equivalence relation.
The weight of a feature is defined as:

w_k = 1 if k ∈ B; 0 otherwise    (13)

For discrete attributes, δ_k is calculated as:

δ_k(x_{ik}, x_{jk}) = 1 if x_{ik} = x_{jk}; 0 otherwise    (14)

and for continuous attributes:

δ_k(x_{ik}, x_{jk}) = 1 − |x_{ik} − x_{jk}| / (max A_k − min A_k)    (15)

where max A_k and min A_k are the extremes of the domain interval of feature k.
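A minimal Python sketch of this similarity computation follows (illustrative only; the function and variable names are ours). It assumes instances are given as lists of attribute values, together with the index set B of features used in the equivalence relation and, for continuous features, their domain extremes:

def delta(k, xi, xj, continuous, dom_min, dom_max):
    # Per-feature comparison: Eq. 14 for discrete, Eq. 15 for continuous features.
    if continuous[k]:
        return 1 - abs(xi[k] - xj[k]) / (dom_max[k] - dom_min[k])
    return 1.0 if xi[k] == xj[k] else 0.0

def similarity(xi, xj, B, continuous, dom_min, dom_max):
    # Eq. 12: average of per-feature similarities over the features in B
    # (M = |B|, since w_k of Eq. 13 is 1 only for k in B).
    total = sum(delta(k, xi, xj, continuous, dom_min, dom_max) for k in B)
    return total / len(B)

# Two instances: one discrete feature (index 0), one continuous (index 1).
a, b = ["red", 2.0], ["red", 3.5]
print(similarity(a, b, B=[0, 1],
                 continuous={0: False, 1: True},
                 dom_min={1: 0.0}, dom_max={1: 5.0}))  # (1 + 0.7) / 2 = 0.85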
Nevertheless, it is demonstrated in [31] that it is necessary to reduce this threshold for imbalanced data sets. Reducing such a value means softening the restrictions of the search. In the imbalanced context, lowering the threshold for the minority class means using a less restrictive search for that class in order to cope with its poor representation with respect to the other class. As a consequence we may expect a fairer classification.

We propose the use of a different threshold value for each class. By doing so, the classification method helps the less represented instances to be better classified. Remember that, due to the high overlapping between the classes, almost all instances are similar to the most represented instances, and almost none (or even none at all) are similar to the less represented samples.

5.2 Proposal 2: About the Measure of the Probability of Belonging to Each Class

Standard classification methods ignore the original distribution of the data. This is normally a valid procedure when the classes are balanced; for the imbalanced learning problem, however, ignoring such a distribution can cause a poor classification. We propose to incorporate the original distribution into the probability calculation.

Let C_X be the total number of samples belonging to class X and C = Σ_X C_X the total number of samples in the dataset. The a priori probability of any new sample belonging to a given class X can be expressed as Pr(X) = C_X / C. For a new observation we may calculate the probability of the sample belonging to each class as proposed in Eq. 11. In a balanced dataset this expression may be sufficient, since the a priori probabilities are homogeneous; in an imbalanced dataset the original distribution of the samples is not. We propose to measure the probability of belonging to each class based on the ratio of the a posteriori and a priori probabilities:

R(X | [x]) = Pr(X | [x]) / Pr(X)

The probability for each class can be expressed as its own ratio normalized by the total aggregation over all classes:

P(X | [x]) = R(X | [x]) / Σ_Y R(Y | [x])    (16)

Finally, the membership function to the positive class can be expressed as the average of the probability of the pattern belonging to the positive class and the probability of the pattern not belonging to the negative class (denoting the negative class by X̄):

μ_X(x) = (P(X | [x]) + 1 − P(X̄ | [x])) / 2    (17)
The membership function to the negative class can be obtained analogously.
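As an illustration, the sketch below (our own hypothetical helper, not the authors' implementation) combines Eqs. 11, 16 and 17 for a binary problem, given the counts of objects similar to x in each class and the class sizes:

def class_probabilities(similar_counts, class_sizes):
    # similar_counts[c]: |[x]_c|, objects similar to x belonging to class c (Eq. 11).
    # class_sizes[c]: C_c, used for the a priori probability Pr(c) = C_c / C.
    total_similar = sum(similar_counts.values())
    total = sum(class_sizes.values())
    # Ratio of the a posteriori to the a priori probability, per class.
    ratios = {c: (similar_counts[c] / total_similar) / (class_sizes[c] / total)
              for c in class_sizes}
    norm = sum(ratios.values())
    P = {c: r / norm for c, r in ratios.items()}   # Eq. 16
    mu_pos = (P["pos"] + 1 - P["neg"]) / 2         # Eq. 17, binary case
    return P, mu_pos

# 10 objects similar to x: 2 from the minority ("pos") class, 8 from the majority.
P, mu_pos = class_probabilities({"pos": 2, "neg": 8}, {"pos": 50, "neg": 450})
print(P, mu_pos)  # the minority class is boosted by its low a priori probability

In this example Pr(pos | [x]) = 0.2 but Pr(pos) = 0.1, so the normalized ratio assigns the minority class a probability of about 0.69 despite its poor representation among the similar objects.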

5.3 RST-2Simil: The Algorithm

Based on the previous proposals we formulate the following algorithm:

Step 1: Calculate the probability of belonging to the positive region of the majority class with Eq. 16, using a very high threshold for determining the similarity of the objects.
Step 2: Calculate the probability of belonging to the positive region of the minority class with Eq. 16, using a very low threshold for determining the similarity of the objects.
Step 3: Label the object with the most likely class.

Algorithm 1 RST-2Simil
Require: Tst, the set of test examples;
  Tra, the set of training examples;
  threshold1, the threshold to determine similarity to minority instances;
  threshold2, the threshold to determine similarity to majority instances;
Ensure: Pmin, the probability of belonging to the minority class,
  Pmay, the probability of belonging to the majority class.
1: for each x ∈ Tst do
2:   Pmin = ComputeProb(Tra, threshold1)
3:   Pmay = ComputeProb(Tra, threshold2)
4:   if Pmin ≥ Pmay then
5:     x ∈ MinClass
6:   else
7:     x ∈ MajClass
8:   end if
9: end for
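A compact Python sketch of the RST-2Simil classification loop follows; it is a schematic, self-contained reading of Algorithm 1 (with our own helper names and a toy similarity function), not the authors' implementation:

def rst2simil_predict(x, train, labels, sim, thr_min=0.5, thr_max=0.6,
                      minority="pos", majority="neg"):
    # Schematic RST-2Simil: class-specific thresholds + prior-corrected probability.
    sizes = {c: labels.count(c) for c in (minority, majority)}

    def prob_of(target, thr):
        # [x]: training objects whose similarity to x reaches the threshold.
        bag = [labels[i] for i, t in enumerate(train) if sim(x, t) >= thr]
        if not bag:
            return 0.0
        # A posteriori / a priori ratio, normalized over both classes (Eq. 16).
        ratios = {c: (bag.count(c) / len(bag)) / (sizes[c] / len(labels))
                  for c in sizes}
        return ratios[target] / sum(ratios.values())

    p_min = prob_of(minority, thr_min)   # loose threshold for the minority class
    p_may = prob_of(majority, thr_max)   # strict threshold for the majority class
    return minority if p_min >= p_may else majority

# Toy 1-D data: similarity = 1 - |a - b|; the single minority instance sits near 0.9.
train = [0.1, 0.15, 0.2, 0.25, 0.3, 0.9]
labels = ["neg", "neg", "neg", "neg", "neg", "pos"]
print(rst2simil_predict(0.85, train, labels, sim=lambda a, b: 1 - abs(a - b)))  # "pos"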

6 Experimental Study

In this section, we experimentally evaluate the proposed algorithm on the students data set described in Sect. 2. In Sect. 6.1 we describe the setup of our experiments. In Sect. 6.2 we compare the proposed method with state-of-the-art algorithms using data-sets from the UCI machine learning repository. In Sect. 6.3 the parameters of the algorithm are adjusted, while in Sect. 6.4 we compare the results of our approach with those obtained by the state-of-the-art methods. We close this section with the discussion of the test case in Sect. 6.5.

6.1 Experimental Setup

We compare our method with the following imbalanced learning methods:

• SMOTE: in combination with kNN, C4.5 and SVM


• SMOTE-Tomek links: in combination with kNN, C4.5 and SVM
• SMOTE-ENN: in combination with kNN, C4.5 and SVM
• Borderline-SMOTE1: in combination with kNN, C4.5 and SVM
• Borderline-SMOTE2: in combination with kNN, C4.5 and SVM
• SMOTE-RSB*: in combination with kNN, C4.5 and SVM
• Safe-Level-SMOTE: in combination with kNN, C4.5 and SVM
• SPIDER2: in combination with kNN, C4.5 and SVM
• CS-C4.5
• CS-SVM
• EUSBoost
The first eight methods are preprocessing methods based on SMOTE, in combination with three well-known classifiers: kNN [9], C4.5 [29] and SVM [39], representing lazy learners, decision tree-based methods and support vector machines, respectively. We select two methods based on cost-sensitive learning: CS-C4.5 and CS-SVM. Finally, we select an ensemble method called EUSBoost. All selected methods are described in Sect. 3.

6.2 Experimental Study Using Data-Sets from the UCI Machine Learning Repository

To analyze our proposal, we have considered 18 data-sets from the UCI repository [1] with high imbalance ratios (higher than 9). The description of these data-sets appears in Table 3 (column IR indicates the imbalance ratio).

The results of the experimental study over the test partitions are shown in Table 4. The first five columns contain the results for 1-NN, Cost-Sensitive-C4.5, C4.5, Cost-Sensitive-MLP and our proposal; the best method is highlighted in bold for each data-set. The remaining columns show the results of 5 resampling techniques (based on SMOTE) in combination with C4.5. The goodness of our approach can be observed, since it obtains the highest performance value against almost all the compared methodologies.

We support the comparison with a statistical analysis in order to demonstrate the superiority of our proposal. The average ranks of the algorithms are shown in Table 5. The p-value computed by the Friedman test is approximately 0, which indicates that the hypothesis of equivalence can be rejected with high confidence.
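For reference, a Friedman test of this kind can be reproduced with SciPy, as in the minimal sketch below (the values are dummies, not the chapter's actual AUC columns); each argument holds one algorithm's results over the same collection of data-sets:

from scipy.stats import friedmanchisquare

alg_a = [0.85, 0.70, 0.99, 0.69, 0.62, 0.99]
alg_b = [0.86, 0.72, 0.94, 0.61, 0.64, 0.86]
alg_c = [0.83, 0.68, 0.97, 0.59, 0.71, 0.81]

stat, p = friedmanchisquare(alg_a, alg_b, alg_c)
print(f"Friedman chi-square = {stat:.3f}, p-value = {p:.4f}")
# A small p-value rejects the hypothesis that all algorithms perform equally.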

6.3 Adjusting the Parameters of RST-2SIMIL

As mentioned in Sect. 5.3, RST-2SIMIL needs two parameters: threshold1, used to evaluate similarity when finding the membership probability to the minority class, and threshold2, used to do the same for the majority class.

Table 3 UCI data-sets used in the experiments


Dataset IR Inst Attr
yeast-2_vs_4 9.1 515 7
yeast-0-5-6-7-9_vs_4 9.35 528 8
vowel0 10.10 988 13
glass-0-1-6_vs_2 10.29 192 9
glass2 11.59 214 9
ecoli4 15.8 336 7
yeast-1_vs_7 14.3 459 7
abalone9-18 16.4 731 8
yeast-1-4-5-8_vs_7 22.1 693 8
yeast4 28.1 1484 8
yeast-1-2-8-9_vs_7 30.57 947 8
yeast6 41.4 1484 8
abalone19 129.44 4174 8
ecoli-0-1_vs_5 11 240 6
ecoli-0-1-4-7_vs_2-3-5-6 10.59 336 7
led7digit-0-2-4-5-6-7-8-9_vs_1 10.97 443 7
yeast-0-2-5-6_vs_3-7-8-9 9.14 1004 8
yeast-0-3-5-9_vs_7-8 9.12 506 8

Figure 1 illustrates the procedure carried out to tune the parameters. Figure 1a shows the average AUC value on the Y axis and the variation of threshold1 on the X axis; the best result is obtained when using 0.5. Figure 1b shows a similar analysis for threshold2: the best result is obtained using 0.6, although the differences are much smaller than when varying threshold1.

6.4 Comparison to State-of-the-Art Methods Using the Drop Out Data-Set

In this section, we show the results of the selected state-of-the-art methods and a comparison with our proposal. In order to get a better idea of the contribution of each proposal, we first compare the probabilistic RST with and without each upgrade, that is:

(A) Classic PRS: threshold = 0.6
(B) PRS using threshold1 = 0.5 and threshold2 = 0.6
(C) PRS + a priori probabilities
(D) PRS using threshold1 = 0.5 and threshold2 = 0.6 + a priori probabilities
Table 4 Results of the experiments using the UCI repository (first five columns: classifiers; last five: resampling techniques based on SMOTE, combined with C4.5)
Datasets knn-1 CS-C4.5 C4.5 CS-MLP RST-PROB C4.5+SM C4.5+SM+ENN C4.5+BL-SM C4.5+SL-SM C4.5+SP2
yeast-2_vs_4 0.8521 0.8866 0.8307 0.7330 0.9041 0.8620 0.8777 0.8612 0.8852 0.8667
yeast-0-5-6-7-9_vs_4 0.7023 0.7243 0.6802 0.6894 0.7901 0.7682 0.7725 0.7776 0.7705 0.6938
vowel0 1.0000 0.9422 0.9706 0.6817 0.9928 0.9433 0.9344 0.9178 0.9522 0.9706
glass-0-1-6_vs_2 0.5767 0.6155 0.5938 0.4038 0.6963 0.6367 0.6667 0.6243 0.6450 0.5462
glass2 0.6008 0.6416 0.7194 0.4396 0.6258 0.5424 0.6819 0.6535 0.5884 0.6635
ecoli4 0.8702 0.8636 0.8437 0.8702 0.9909 0.8044 0.8592 0.8124 0.8699 0.8639
yeast-1_vs_7 0.6457 0.6139 0.6275 0.5771 0.6969 0.7064 0.6968 0.6615 0.6969 0.6814
abalone9-18 0.6037 0.6655 0.5859 0.6490 0.6802 0.6201 0.7332 0.7275 0.6783 0.5948
yeast-1-4-5-8_vs_7 0.5735 0.5540 0.5000 0.4893 0.6737 0.5230 0.5192 0.5039 0.6013 0.5380
yeast4 0.6671 0.7222 0.6135 0.6554 0.8821 0.7004 0.7157 0.6826 0.7973 0.6693
yeast-1-2-8-9_vs_7 0.5530 0.6769 0.6156 0.4307 0.7222 0.7051 0.6397 0.6137 0.5682 0.6260
yeast6 0.7482 0.8082 0.7115 0.5891 0.9282 0.8280 0.8273 0.7931 0.8156 0.8161
abalone19 0.4963 0.5701 0.5000 0.4949 0.7058 0.5203 0.5185 0.5172 0.5343 0.5284
ecoli-0-1_vs_5 0.8705 0.8182 0.8159 0.7409 0.9449 0.8227 0.8477 0.8614 0.8295 0.9159
ecoli-0-1-4-7_vs_2-3-5-6 0.8154 0.8772 0.8051 0.7622 0.8110 0.8461 0.8529 0.7937 0.8665 0.8353
led7digit-0-2-4-5-6-7-8-9_vs_1 0.5000 0.8436 0.8788 0.5624 0.9363 0.8832 0.8379 0.8943 0.9035 0.8635
yeast-0-2-5-6_vs_3-7-8-9 0.7814 0.7846 0.6606 0.6221 0.7469 0.7543 0.7649 0.7376 0.8140 0.7112
yeast-0-3-5-9_vs_7-8 0.6949 0.6765 0.5868 0.5797 0.7059 0.7222 0.7078 0.6682 0.7075 0.6328
Mean 0.6973 0.7380 0.6966 0.6095 0.8019 0.7327 0.7474 0.7278 0.7513 0.7232

Table 5 Average Rankings of the algorithms (Friedman)


Algorithm Ranking
knn-1 6.25
Cost-sensitive-C45 4.8889
C45 7.6944
Cost-sensitive-MLP 9.1944
RST-PROB 2.3056
C4.5 + SMOTE 4.9444
C4.5 + SMOTE + ENN 4.2778
C4.5 + Borderline-SMOTE 6.0556
C4.5 + SafeLevel-SMOTE 3.75
C4.5 + SPIDER2 5.6389
Friedman statistic (distributed according to chi-square with 9 degrees of freedom): 68.330303
P-value computed by Friedman Test: 0

Table 6 shows the comparison. We can see that using two different thresholds helps to increase the mean AUC over the classic method. Furthermore, the combination of both proposals improves the mean AUC over each individual proposal. Future comparisons will only consider the full proposal (D).

Table 7 shows the AUC results with their associated standard deviations for different preprocessing methods combined with different classifiers, using a 5×5 cross-validation. The best results are shown in bold. The best preprocessing techniques when using the C4.5 classifier are SMOTE-TL and SMOTE-ENN, whereas the best preprocessing techniques to combine with SVM are SMOTE and Borderline-SMOTE2.

Fig. 1 Fixing the final value of each parameter while keeping the rest constant: a varying threshold1; b varying threshold2

Table 6 Result for the variants A, B, C and D


A B C D
0.7547 ± 0.01291 0.7731 ± 0.00591 0.7518 ± 0.01373 0.7821 ± 0.00424

Table 7 AUC mean ± standard deviation results in test for the preprocessing methods in combination with C4.5, 1-NN and SVM
Preprocessing method C4.5 1-NN SVM
none 0.6127 ± 0.03804 0.5922 ± 0.00649 0.7082 ± 0.01374
SMOTE 0.7160 ± 0.01504 0.6175 ± 0.01225 0.7352 ± 0.01681
SMOTE-TL 0.6963 ± 0.02898 0.6373 ± 0.01057 0.7240 ± 0.01022
SMOTE-ENN 0.6971 ± 0.03334 0.6315 ± 0.01643 0.7285 ± 0.02501
SMOTE-RSB* 0.7309 ± 0.01700 0.6267 ± 0.01304 0.7085 ± 0.01353
Borderline-SMOTE1 0.6937 ± 0.01496 0.6197 ± 0.02666 0.7245 ± 0.01766
Borderline-SMOTE2 0.7093 ± 0.01854 0.6217 ± 0.00209 0.7404 ± 0.01360
Safe-Level-SMOTE 0.6914 ± 0.02523 0.5910 ± 0.00414 0.7097 ± 0.01349
SPIDER2 0.6962 ± 0.01081 0.6313 ± 0.01743 0.6811 ± 0.01843

Table 8 AUC mean ± standard deviation results in test for the cost-sensitive methods, EUSBoost and our proposal
CS-C4.5 CS-SVM EUSBoost RST-2Simil
0.7220 ± 0.02480 0.7599 ± 0.02613 0.7616 ± 0.01144 0.7821 ± 0.00424

No good result is obtained with the 1-NN classifier combined with any preprocessing technique.

Table 8 shows the AUC results with their associated standard deviations for the remaining methods, using a 5×5 cross-validation. The first two columns correspond to the cost-sensitive methods. The best competitors are the ensemble method and our proposal (results shown in bold).

Figure 2 summarizes the above comparison. The difference between the results of our proposal (last bar) and those of the state-of-the-art methods can be observed.

Fig. 2 Average AUC for the best methods used in the comparison (SVM+S-RSB*, C4.5+SMOTE, SVM+SMOTE, SVM+SMOTE-B2, the cost-sensitive methods, EUSBoost and RST-2Simil)

Table 9 Probability of causing drop out for each student (true* marks students who actually failed the year)
Groups # Student Prob Drop out Real state
High risk (students 1–15):
1 0.91418 true*
2 0.69278 true*
3 0.66222 true*
4 0.65363 true*
5 0.61841 false
6 0.60063 false
7 0.60023 false
8 0.59510 false
9 0.59508 false
10 0.59123 false
11 0.58837 false
12 0.54210 false
13 0.53756 true*
14 0.53504 false
15 0.51347 false
Medium risk (students 16–23):
16 0.49878 false
17 0.48397 false
18 0.47344 true*
19 0.45540 false
20 0.44552 false
21 0.44155 false
22 0.42767 false
23 0.41748 true*
Low risk (students 24–35):
24 0.39997 false
25 0.38101 false
26 0.36550 false
27 0.36410 false
28 0.35583 false
29 0.34891 false
30 0.34750 false
31 0.34718 false
32 0.33599 false
33 0.32216 false
34 0.29970 false
35 0.28657 false

6.5 A Real Scholar Case to Validate the Proposed Method

The experimental study compares the methods in terms of AUC. For the real application we need to group the students according to their probability of smoothly passing the academic year. For this reason we tested our proposal on the group of students enrolled in their first academic year in 2013–2014. They have already finished that year, so we could check the results. The group is composed of 35 students; from the total, 28 students smoothly passed the year and 7 had problems (IR = 4). Table 9 shows the details of the results over the group.

We created three groups of students. The first group (shaded in red) corresponds to the high risk students and is composed of the students that our method predicted to have a high probability of failing. We advise the professors of future academic years to take special care of this group. There are 7 students that had trouble passing the year, and 5 of them are in this group.

The second group (yellow) corresponds to the medium risk students. We recommend the professors to watch over these students; they might be less likely to fail but, still, they might need extra help. Two of the 7 students who failed the year are found in this group.

Finally, the third group (green) corresponds to the low risk students. There are no students who failed the year in this group.

All in all, 7 out of 7 students that did not pass the academic year are grouped into the high or medium risk groups. The main goal of our proposal is not to give a hard classification of the students into two classes (pass or fail) but to provide the professors with a probability of each student succeeding, so that resources can be better allocated to avoid drop out.

7 Conclusions

In this paper, we have presented a new proposal for classification over highly imbalanced data sets. The proposal belongs to the category of techniques at the algorithm level. Our main contributions from the machine learning point of view can be summarized as follows:

• We introduce a new measure, based on probabilistic rough sets, to obtain the lower approximation for highly imbalanced data sets. This measure is used to obtain a classification model.
• The second novelty is the use of two different threshold values for determining the similarity between objects.

From the results of our experimental analysis, we have observed good average results obtained by our proposal. An important conclusion is that with this proposal a preprocessing step is not necessary, since we obtain results similar or superior to those of 8 well-known preprocessing methods.

From the point of view of the application to drop out prediction, our main contributions are as follows:

• We manage to assign a realistic probability of success (or failure) to every incoming student, given some characteristics determined by specialists at the moment of enrollment in the Informatics Engineering program.
• From these probabilities we create risk groups, so that the attention to students in higher risk groups can be personalized.

References

1. Asuncion, A., Newman, D.: UCI Machine Learning Repository (2007)


2. Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behaviour of several methods for
balancing machine learning training data. SIGKDD Explor. 6(1), 20–29 (2004)
3. Bello, R., Falcon, R., Pedrycz, W., Kacprzyk, J.: Granular Computing: at the Junction of Rough
Sets and Fuzzy Sets. Springer, Berlin (2008)
4. Bello, R., Garcia, M.M.: Probabilistic approaches to the rough set theory and their applications
in decision-making. In: Soft Computing for Business Intelligence, pp. 67–80. Springer, Berlin
(2014)
5. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning
algorithms. Pattern Recognit. 30, 1145–1159 (1997)
6. Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-level-smote: safe-level-
synthetic minority over-sampling technique for handling the class imbalanced problem. Pacific-
Asia Conf. Knowl. Discov. Data Min. 3644, 475–482 (2009)
7. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.: SMOTE: synthetic minority over-
sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
8. Chawla, N., Japkowicz, N., Kolcz, A.: Editorial: special issue on learning from imbalanced
data sets. SIGKDD Explor. 6(1), 1–6 (2004)
9. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13, 21–27
(1967)
10. Dekker, G.W., Pechenizkiy, M., Vleeshouwers, J.M.: Predicting students drop out: a case study.
Educational Data Mining, pp. 41–50 (2009)
11. Domingo, P.A., Garcia-Crespo, B.R., Iglesias, A.: Edu-ex: A tool for auto-regulated intelligent
tutoring systems development based on models. Artif. Intell. Rev. 18, 15–32 (2002)
12. Dun, L., Huaxiong, L., Xianzhong, Z.: Two decades research on decision-theoretic rough sets.
In: Proceedings of the 9th IEEE International Conference on Cognitive Informatics, ICCI 2010,
pp. 968–973 (2010)
13. Fawcett, T.: An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006)
14. Fawcett, T.E., Provost, F.: Adaptive fraud detection. Data Mining Knowl. Discov. 3, 291–316
(1997)
15. Galar, M., Fernández, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for
the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans.
Syst. Man Cybern.-Part C: Appl. Rev. 42(4), 463–484 (2012)
16. Galar, M., Fernández, A., Barrenechea, E., Herrera, F.: EUSBoost: enhancing ensembles for
highly imbalanced data-sets by evolutionary undersampling. Pattern Recognit. 46, 3460–3471
(2013)
17. Han, H., Wang, W.Y., Mao, B.H.: Borderline-smote: a new over-sampling method in imbalanced
data sets learning, pp. 878–887. Springer (2005)
18. He, H., García, E.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9),
1263–1284 (2009)
19. Herzog, S.: Measuring determinants of student return vs. dropout/stopout vs. transfer: a first-to-
second year analysis of new freshmen. In: Proceedings of 44th Annual Forum of the Association
for Institutional Research (AIR) (2004)

20. Huang, Y.M., Hung, C., Jiau, H.C.: Evaluation of neural networks and data mining methods
on a credit assessment task for class imbalance problem. Nonlinear Anal.: Real World Appl.
7(4), 720–747 (2006)
21. Kotsiantis, S.B.: Use of machine learning techniques for educational proposes: a decision
support system for forecasting students grades. Artif. Intell. Rev. 37, 331–344 (2012)
22. Lassibille, G., Gomez, L.: Why do higher education students drop out? Evidence from Spain. Educ. Econ. 16(1), 89–105 (2007)
23. Liu, D., Li, T., Ruan, D.: Probabilistic model criteria with decision-theoretic rough sets. Inf.
Sci. 181, 3709–3722 (2011)
24. Luan, J.: Data mining and its applications in higher education. New Directions For Institutional
Research, pp. 17–36 (2002)
25. Mazurowski, M., Habas, P., Zurada, J., Lo, J., Baker, J., Tourassi, G.: Training neural network
classifiers for medical decision making: the effects of imbalanced datasets on classification
performance. Neural Netw. 21(2–3), 427–436 (2008)
26. Napierala, K., Stefanowski, J., Wilk, S.: Learning from imbalanced data in presence of noisy
and borderline examples. Rough Sets Curr. Trends Comput. Lect. Notes Comput. Sci. 6086,
158–167 (2010)
27. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11, 145–172 (1982)
28. Pawlak, Z., Wong, S., Ziarko, W.: Rough sets: probabilistic versus deterministic approach. Int.
J. Man-Mach. Stud. 29, 81–95 (1988)
29. Quinlan, J.: C4.5 Programs for Machine Learning. Morgan Kaufmann, Burlington (1993)
30. Rahman Ali, M.H.S., Lee, S.: Rough set-based approaches for discretization: a compact review.
Artif. Intell. Rev. (2015). https://doi.org/10.1007/s10462-014-9426-2
31. Ramentol, E., Caballero, Y., Bello, R., Herrera, F.: SMOTE-RSB∗ : a hybrid preprocessing
approach based on oversampling and undersampling for high imbalanced data-sets using smote
and rough sets theory. Int. J. Knowl. Inf. Syst. 33, 245–265 (2012)
32. Romero, C., Ventura, S.: Educational data mining: a survey from 1995 to 2005. Expert Syst.
Appl. 33, 135–146 (2007)
33. Romero, C., Ventura, S., Espejo, P.G., Hervas, C.: Data mining algorithms to classify students.
In: Proceedings of the 1st International Conference on Educational Data Mining (EDM 08)
(2008)
34. Slowinski, R., Vanderpooten, D.: Similarity relation as a basis for rough approximations. Adv.
Mach. Intell. Soft-Comput. 4, 17–33 (1997)
35. Sun, Y., Wong, A.K.C., Kamel, M.S.: Classification of imbalanced data: a review. Int. J. Pattern
Recognit. Artif. Intell. 23(4), 687–719 (2009)
36. Superby, J., Vandamme, J.P., Meskens, N.: Determination of factors influencing the achieve-
ment of the first-year university students using data mining methods. In: Proceedings of the
Workshop on Educational Data Mining at ITS 06 (2006)
37. Terenzini, P.T., Lorang, W.G., Pascarella, E.: Predicting freshman persistence and voluntary
dropout decisions: a replication. Res. Higher Educ. 15(2), 109–127 (1981)
38. Ting, K.M.: An instance-weighting method to induce cost-sensitive trees. IEEE Trans. Knowl.
Data Eng. 14(3), 659–665 (2002)
39. Vapnik, V.: The Nature of Statistical Learning. Springer, Berlin (1995)
40. Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector
machines. In: Proceedings of the International Joint Conference on AI, pp. 55–60 (1999)
41. Weiss, G., Provost, F.: Learning when training data are costly: the effect of class distribution
on tree induction. J. Artif. Intell. Res. 19, 315–354 (2003)
42. Yang, Q., Wu, X.: 10 challenging problems in data mining research. Int. J. Inf. Technol. Decis.
Mak. 5(4), 597–604 (2006)
43. Yao, Y., Wong, S., Lin, T.: A review of rough set models. In: Lin, T.Y., Cercone, N. (eds.) Rough
Sets and Data Mining: Analysis for Imprecise Data, pp. 47–75. Kluwer Academic Publishers,
Boston (1997)
44. Yao, Y.Y.: Generalized rough set models. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in
Knowledge Discovery, pp. 286–318. Physica, Heidelberg (1998)

45. Yao, Y.Y.: Probabilistic approaches to rough sets. Expert Syst. 20, 287–297 (2003)
46. Yao, Y.Y.: Three-way decisions with probabilistic rough sets. Inf. Sci. 180, 341–353 (2010)
47. Yao, Y.Y., Wong, S.K.M.: A decision theoretic framework for approximating concepts. Int. J.
Man-mach. Stud. 37, 793–809 (1992)
48. Zhou, Z.H., Liu, X.Y.: On multi-class cost-sensitive learning. Comput. Intell. 26(3), 232–257
(2010)
Multiobjective Overlapping Community Detection Algorithms Using Granular Computing

Darian H. Grass-Boada, Airel Pérez-Suárez, Rafael Bello and Alejandro Rosete

Abstract Community detection is one of the most important problems in Social Network Analysis. This problem has been successfully addressed through Multi-objective Evolutionary Algorithms (MOEAs); however, most of the MOEAs proposed only detect disjoint communities, although it has been shown that in most real-world networks nodes may belong to multiple communities. In this chapter, we introduce three algorithms which build, from different perspectives, a set of overlapping communities using Granular Computing theory and a multi-objective optimization approach. The proposed algorithms use highly cohesive granules as initial expansion seeds and employ the local properties of the vertices in order to obtain accurate overlapping community structures.

1 Introduction

The detection of communities in a social network is a problem that has been widely addressed in the context of Social Network Analysis (SNA) [24]. Taking into account the NP-hard nature of the community detection problem [21], several approaches have been reported in the literature [7, 8, 15, 17].

D. H. Grass-Boada (B) · A. Pérez-Suárez


Advanced Technologies Application Center (CENATAV), Havana, Cuba
e-mail: dgrass@cenatav.co.cu
A. Pérez-Suárez
e-mail: asuarez@cenatav.co.cu
R. Bello
Department of Computer Science, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Cuba
e-mail: rbello@uclv.edu.cu
A. Rosete
Facultad de Ingeniería Informática, Universidad Tecnológica de la Habana “José Antonio
Echeverría” (Cujae), Havana, Cuba
e-mail: rosete@ceis.cujae.edu.cu


Most reported approaches define an objective function that captures the notion of community and then use heuristics in order to search for a set of communities optimizing this function. Although there is no consensus regarding which properties a group of nodes must satisfy to be considered a community, intuitively it is desirable for a community to have more inner edges than outer edges [19].
Single-objective optimization approaches have two main drawbacks: (a) the optimization of only one function confines the solution to a particular community structure, and (b) returning one single partition may not be suitable when the network has many potential structures. Taking into account these limitations, many community detection algorithms model the problem as a Multi-objective Optimization Problem.

Despite the good results attained by the reported community detection algorithms following a multi-objective optimization approach, most of them constrain communities to be disjoint [5, 18, 21, 28]; however, it is known that most real-world networks have overlapping community structures [16]. Note that vertices belonging to more than one community represent individuals that share characteristics or interests with several groups. It is worth noting that the space of feasible solutions in the overlapping community detection problem is more complicated than that of the disjoint case; thus, discovering overlapping community structures in social networks is challenging.
To the best of our knowledge, only the algorithms proposed in [10–13, 25] address the overlapping community detection problem from a multi-objective optimization point of view. These algorithms use MOEAs for solving the multi-objective community detection problem and for looking for the set of Pareto optimal solutions. Nevertheless, they make little use of the local properties of the nodes in the network, and they do not define which properties a node must satisfy in order to belong to more than one community.
Our work makes use of Granular Computing [26] for addressing overlapping community detection from a multi-objective optimization point of view. Granular Computing is a term describing theories, tools and techniques that employ information granules (subsets of objects of the problem at hand) for problem solving purposes; objects belonging to the same granule are viewed as inseparable, similar or near to each other [1].
The hypothesis of our work is that, by using highly cohesive granules as community seeds and an algorithm following a multi-objective approach that makes use of the local properties of vertices, we can obtain accurate overlapping communities. With this aim, in this work we propose three multi-objective optimization algorithms which build, from different perspectives, a set of overlapping communities. These algorithms start by building a set of community seeds using Granular Computing and then iteratively process each seed using three new steps we introduce into the multi-objective optimization framework, named expansion, improving and merging. Starting from the seeds, these steps aim to detect overlapping zones in the network, to improve the overlapping quality of these zones, and to merge communities having high overlap, respectively.
Multiobjective Overlapping Community Detection Algorithms … 235

Our main contributions are summarized as follows:

1. We propose the cohesive-granules-based representation in order to represent the overlapping community structure of the network.
2. We include three new steps in the multi-objective optimization framework, named expansion, improving and merging, in order to build accurate overlapping communities.
3. We introduce three multi-objective optimization algorithms which accurately detect overlapping community structures in complex networks.
Our experimental evaluation over real-life and synthetic social networks compares our proposals against the related state-of-the-art algorithms in terms of the accuracy they attain, according to the NMI external evaluation measure [8]. The experimental results show that our proposals are promising and effective for overlapping community detection in social networks.

The remainder of this chapter is organised as follows: Sect. 2 briefly describes the related work. In Sect. 3 we introduce our proposals, whilst Sect. 4 presents an experimental evaluation, over synthetic and real-life networks, in which the performance of our proposals is tested and compared against other related state-of-the-art algorithms, in terms of the accuracy in the detection of the communities, measured using the NMI [8]. Finally, Sect. 5 gives the conclusions and future work directions.

2 Related Work

Let G = ⟨V, E⟩ be a given network, where V is the set of vertices and E the set of edges among the vertices. A multi-objective community detection problem aims to search for a partition P* of G such that:

F(P*) = min_{P ∈ Ω} (f_1(P), f_2(P), ..., f_r(P)),    (1)

where P is a partition of G, Ω is the set of feasible partitions, r is the number of objective functions, f_i is the i-th objective function and min(·) is the minimum value obtained by a partition P taking into account all the objective functions. With the introduction of multiple objective functions there is usually no single optimal solution; thus, the goal is to find a set of Pareto optimal solutions [21]. A solution (i.e., a set of communities) is said to be Pareto optimal iff there is no other solution dominating it. Let S_1, S_2 ∈ Ω be two solutions. S_1 is said to dominate S_2 iff it fulfils the following two conditions: (i) ∀i = 1...r, f_i(S_1) ≤ f_i(S_2), and (ii) ∃j = 1...r such that f_j(S_1) < f_j(S_2). The multi-objective algorithms reported in the literature for addressing the problem of overlapping community detection use MOEAs.
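As a concrete reading of this dominance relation, the short Python sketch below (our own illustration, assuming all objectives are minimized) checks whether one solution dominates another and filters a set of objective vectors down to its Pareto front:

def dominates(f1, f2):
    # True iff the solution with objectives f1 Pareto-dominates the one with f2.
    no_worse = all(a <= b for a, b in zip(f1, f2))        # condition (i)
    strictly_better = any(a < b for a, b in zip(f1, f2))  # condition (ii)
    return no_worse and strictly_better

def pareto_front(solutions):
    # Keep only the non-dominated objective vectors.
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

sols = [(0.2, 0.9), (0.3, 0.4), (0.25, 0.4), (0.5, 0.5)]
print(pareto_front(sols))  # [(0.2, 0.9), (0.25, 0.4)]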
The first algorithm using MOEAs for detecting overlapping communities was MEA_CDPs [13], which uses an undirected representation of the solution and the classical NSGA-II optimization framework with the reverse operator, in order to search for the solutions optimising the average community fitness, the average community separation and the overlapping degree between communities. On the other hand, iMEA_CDPs [11] uses the same representation and optimization framework as MEA_CDPs but employs the PMX crossover operator and the simple mutation operator as evolutionary operators. iMEA_CDPs employs the Modularity function [20] and a combination of average community separation and overlapping degree as its objective functions.
Another related algorithm is IMOQPSO [10], which uses a center-based representation of the solution, built from the eigenvectors extracted from the line graph associated to the network. The line graph is obtained by interpreting each edge of the network as a vertex and by adding an edge in the line graph for each pair of edges having one vertex in common. The optimization framework used by IMOQPSO combines QPSO and HSA, and it uses two objective functions which measure how strong the connection is inside and outside communities.

OMO [12] and MOEA-OCD [27] use the classical NSGA-II optimization framework and a representation based on adjacencies between edges of the network. OMO uses two objective functions which measure the average connection density inside and outside the communities, whereas MOEA-OCD uses the negative fitness sum and the unfitness sum as objective functions. Unlike the previously mentioned algorithms, MOEA-OCD introduces a local expansion strategy into the initialization process in order to improve the quality of the initial solutions.
MCMOEA [25] first detects the set of maximal cliques of the network and then builds the maximal-clique graph. Starting from this transformation, MCMOEA uses a representation based on labels and the MOEA/D optimization framework in order to detect the communities optimising the RC and KKM objective functions; see [4] for a description of these functions. Most existing multi-objective algorithms for detecting overlapping communities use a traditional random initialization method, which takes no account of the topological properties of the network, resulting in many redundant and undesirable initial solutions. In contrast, our algorithms use the local properties of the nodes in the network to define a cohesive-granules-based representation.

Unlike the algorithms discussed above, our algorithms do not build overlapping communities directly; rather, they use the cohesive-granules-based representation in order to produce a set of seed clusters which are then used for building the final overlapping communities through a greedy-randomized local expansion procedure. The local expansion procedure iteratively adds neighbors of a cohesive granule as long as a community fitness function is optimized. This step also allows discovering overlapping nodes, since it is possible to include nodes that have already been assigned to other communities.

3 Overlapping Community Detection Based on Cohesive Granules

The main idea of our work is to use Granular Computing in order to detect a set of community seeds, which are used for representing the solution (i.e., the communities), and then to process these seeds, through the three introduced steps named expansion, improving and merging, for building the final set of overlapping communities.

We propose two alternatives for building the community seeds, both based on a similarity relation among the vertices of the network. We will say that a vertex v_j ∈ V is related with a vertex v_i ∈ V, denoted as v_i R v_j, iff |N(v_i) ∩ N(v_j)| > (1/2) · |N(v_j)|. The set of all the vertices related to a vertex v_i forms the so-called similarity class of v_i, denoted as [v_i]_R. This relation R constitutes our granularity criterion [26].
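The sketch below (our own illustrative helper) computes the similarity class [v_i]_R of every vertex from an adjacency-set representation of the network, applying the shared-neighbors criterion just stated:

def similarity_classes(adj):
    # adj: dict mapping each vertex to its neighbor set N(v). Returns, for each
    # vertex v_i, the set of vertices v_j with |N(v_i) ∩ N(v_j)| > |N(v_j)| / 2.
    return {vi: {vj for vj in adj
                 if vj != vi and len(adj[vi] & adj[vj]) > len(adj[vj]) / 2}
            for vi in adj}

# Tiny network: a triangle {a, b, c} plus a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(similarity_classes(adj))  # e.g. d ∈ [a]_R, since a covers d's only neighbor

Note that the condition is not symmetric as stated, which is why MOCD-OV (Sect. 3.1) adds an edge to the transformed network when it holds in either direction.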
Taking into account what has been described above, in this section we introduce three multi-objective optimization algorithms which build, from different perspectives, a set of overlapping communities. These algorithms, named MOCD-OV, MOGR-OV and MOGR-PAES-OV, use the three introduced steps in order to obtain a set of overlapping communities from a set of seeds; however, they differ in the alternative they use for building these seeds and/or in the metaheuristic each of them employs.

In the following, we describe the general steps and some particularities of each proposed algorithm; then, the expansion, improving and merging steps are described in detail. Finally, Sect. 3.7 discusses the computational complexity of the proposed algorithms.

3.1 The MOCD-OV Algorithm

The MOCD-OV algorithm extends the well-known disjoint community detection algorithm MOCD [21] for building overlapping communities. With this aim, MOCD-OV uses the disjoint communities detected by MOCD as seed clusters that are then processed through the expansion, improving and merging steps, in order to discover the overlapping communities existing in the network. The pseudo-code of MOCD-OV is shown in Algorithm 1.

MOCD-OV starts by building a transformed network G′ = ⟨V, E′⟩ containing only edges that represent strong connections. For building G′ we compute [v_i]_R for each v_i ∈ V (step 1 of Algorithm 1) and then, for each pair of vertices v_i, v_j ∈ V, we add an undirected edge (v_i, v_j) to E′ if v_j ∈ [v_i]_R or v_i ∈ [v_j]_R. Taking G′ into account, MOCD-OV generates an initial population of chromosomes P, using the locus-based adjacency graph encoding for representing each chromosome. The decoding of a chromosome requires the identification of all connected components; each connected component is interpreted by MOCD-OV as a granule (i.e., a community seed), as sketched below. This initial population is evaluated, in step 3, using the objective functions described by MOCD in [21] and then it is processed, in step 8, through the selection operator proposed by the PESA-II metaheuristic [3] for building the mating population M.
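To illustrate the locus-based adjacency encoding, the sketch below (our own simplified decoder) treats a chromosome as an array where gene i points to one adjacent vertex of vertex i; the connected components of the resulting graph are the decoded granules:

def decode_locus_based(chromosome):
    # chromosome[i] = j links vertices i and j in the decoded graph; the
    # connected components (community seeds) are recovered with union-find.
    parent = list(range(len(chromosome)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(chromosome):
        parent[find(i)] = find(j)          # union i with its pointed neighbor

    groups = {}
    for v in range(len(chromosome)):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())

# Six vertices: genes link 0-1-2 into one component and 3-4-5 into another.
print(decode_locus_based([1, 2, 0, 4, 5, 3]))  # [[0, 1, 2], [3, 4, 5]]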
In step 9, M is used for creating the current population CP through the crossover and mutation operators. The uniform two-point crossover operator is used for crossover; for mutation, some genes are randomly selected and substituted by other randomly selected adjacent nodes.
Afterwards, in step 12, each chromosome (i.e., each set of seed clusters) is processed using the expansion step. This step processes one seed at a time, iteratively adding neighbor vertices to the seed as long as a predefined function improves. It has been shown in the literature that this kind of local building process attains good results in single-objective optimization approaches [23]; thus, we decided to employ it in order to detect overlapping zones in the network. As a result of the previous step we obtain a set of overlapping communities, which is then processed by the improving step. This step focuses on locally improving each overlapping zone previously detected. For fulfilling this purpose, we define two properties that state, from two different points of view, what a vertex must satisfy in order to belong to more than one community. Thus, in this step we iteratively analyze which vertices should be added to or removed from each overlapping zone in order to improve its quality. Finally, in the merging step the overlap among the detected communities is revised, and those communities having a high similarity, according to a proposed measure, are merged; this way, the redundancy in the solution is reduced. The resulting sets of overlapping communities obtained from each chromosome conform the current overlapping population (COP). Once these three steps have finished (steps 11–16), the fitness of both COP and CP is computed.
For evaluating each chromosome in CP we employ the objective functions described by MOCD in [21]. On the other hand, for evaluating each solution S_i ∈ COP we employ as objective functions the intra and inter factors of the overlapping Modularity proposed in [20]. The intra and inter factors measure the intra-link and inter-link strength of S_i, respectively. These functions are defined as follows:

Intra(S_i) = 1 − Σ_{j=1}^{|S_i|} Σ_{v,w ∈ C_j} A_{v,w} / (2 · m · O_v · O_w)    (2)

Inter(S_i) = Σ_{j=1}^{|S_i|} Σ_{v,w ∈ C_j} (|N(v)| · |N(w)|) / (4 · m² · O_v · O_w),    (3)

where v and w are two vertices belonging to community C_j ∈ S_i; A_{v,w} is 1 if there is an edge between v and w in the original network and 0 otherwise; O_v and O_w are the numbers of communities to which vertices v and w belong, respectively; m is the total number of edges in the network; N(v) is the set of adjacent vertices of vertex v; and |·| denotes the cardinality of a set.
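A direct Python transcription of Eqs. (2) and (3) follows (our own sketch; it assumes the double sums run over ordered pairs v ≠ w inside each community, which is one plausible reading of the notation):

from itertools import permutations

def intra_inter(communities, adj, memberships, m):
    # communities: list of vertex sets; adj: dict v -> N(v);
    # memberships[v]: O_v, number of communities containing v; m: |E|.
    intra_sum, inter_sum = 0.0, 0.0
    for C in communities:
        for v, w in permutations(C, 2):
            denom = memberships[v] * memberships[w]
            intra_sum += (1 if w in adj[v] else 0) / (2 * m * denom)      # Eq. (2)
            inter_sum += len(adj[v]) * len(adj[w]) / (4 * m * m * denom)  # Eq. (3)
    return 1 - intra_sum, inter_sum

# Toy graph: two triangles sharing vertex 2, which belongs to both communities.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
comms = [{0, 1, 2}, {2, 3, 4}]
Ov = {0: 1, 1: 1, 2: 2, 3: 1, 4: 1}
print(intra_inter(comms, adj, Ov, m=6))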

Once CP and COP have been evaluated, the nondominated individuals of both CP and COP are stored. To accomplish this task we maintain two Pareto sets: one for the sets of seeds and the other for the sets of overlapping communities. Finally, from the Pareto set containing the sets of seeds, the region-based selection defined in PESA-II is used to select the next population. In region-based selection, the unit of selection is a hyperbox rather than an individual; a selective fitness is derived for a hyperbox [3]. Therefore, solutions located in less crowded niches are selected and assigned to P. Steps 8–21 are repeated a predefined number of iterations.

Algorithm 1: MOCD-OV Algorithm
Input: G = ⟨V, E⟩
Output: Pareto Set with overlapping community structures (PSetOC)
1 Computing [vi]R for each vi ∈ V;
2 Building the transformed network G′ = ⟨V, E′⟩;
3 population P ← Create initial population from G′ = ⟨V, E′⟩;
4 Evaluating (P);
5 Pareto Set of overlapping communities PSetOC ← { };
6 Pareto Set of disjoint communities PSetDC ← { };
7 while stop condition is not satisfied do
8   matingPopulation M ← Selection (P);
9   current population CP ← Apply crossover and mutation operators (M);
10  current overlapping population OCP ← { };
11  foreach Si ∈ CP do
12    Oi ← Expansion (Si);
13    Oi ← Improving (Oi);
14    Oi ← Merging (Oi);
15    OCP ← OCP ∪ {Oi};
16  end
17  Evaluating (OCP);
18  PSetOC ← PSetOC ∪ Nondominated individuals in OCP;
19  Evaluating (CP);
20  PSetDC ← PSetDC ∪ Nondominated individuals in CP;
21  P ← Less-crowded niches of PSetOC;
22 end
23 return PSetOC

3.2 The MOGR-OV Algorithm

MOGR-OV is a single-solution-based algorithm [22] that, unlike the MOCD-OV algorithm, obtains in each iteration only one solution (i.e., a set of overlapping communities). This algorithm starts by building the set Gr = {g_1, g_2, ..., g_n} containing the subgraphs induced by the similarity class [v_i]_R of each vertex v_i ∈ V; each of these subgraphs is interpreted as a granule (i.e., a community seed) that MOGR-OV can use for building the final communities. The pseudo-code of MOGR-OV is shown in Algorithm 2.

In steps 5–9 of Algorithm 2, MOGR-OV builds a solution C. To accomplish this task, MOGR-OV iteratively applies the roulette wheel selection method over Gr, where the probability of a granule g_j ∈ Gr being selected is computed from the number of unclustered vertices (i.e., vertices that do not belong to any previously built community of C) belonging to g_j. Once a granule g_j has been selected, it is processed using the expansion step in order to build the community associated with g_j.
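This roulette wheel step can be sketched as follows (a minimal version of ours; each granule's selection probability is proportional to its number of still-unclustered vertices):

import random

def roulette_wheel(granules, clustered):
    # granules: list of vertex sets; clustered: vertices already assigned.
    weights = [len(g - clustered) for g in granules]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for g, w in zip(granules, weights):
        acc += w
        if w > 0 and r <= acc:
            return g
    return granules[-1]  # numerical safety

random.seed(7)
gr = [{1, 2, 3}, {3, 4}, {5, 6, 7, 8}]
print(roulette_wheel(gr, clustered={3, 4}))  # {5, 6, 7, 8} most likely (weight 4 of 6)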
In steps 10–11, MOGR-OV processes the current solution C using the improving and merging methods, in order to optimize the quality of the overlapping zones and to reduce the redundancy in the overlapping communities. The resulting set of overlapping communities is evaluated using Eqs. (2) and (3) and, as a result of this evaluation, it is added to the Pareto set iff it is a nondominated solution. Steps 5–15 are repeated a predefined number of iterations.

Algorithm 2: MOGR-OV Algorithm
Input: G = ⟨V, E⟩
Output: Pareto Set with overlapping community structures (PSetOC)
1 Computing [vi]R for each vi ∈ V;
2 Building the set Gr = {g1, g2, ..., gn}, subgraphs induced by each [vi]R, vi ∈ V;
3 Pareto Set of overlapping communities PSetOC ← { };
4 while stop condition is not satisfied do
5   while community structure C not built do
6     subgraph seed gj ← Apply the roulette wheel selection method over Gr;
7     community ci ← Expansion (gj);
8     C ← C ∪ {ci};
9   end
10  C ← Improving (C);
11  C ← Merging (C);
12  Evaluating (C);
13  if C is Nondominated then
14    PSetOC ← PSetOC ∪ {C}
15  end
16 end
17 return PSetOC

3.3 The MOGR-PAES-OV Algorithm

The MOGR-PAES-OV algorithm is also a single-solution-based algorithm; however, unlike MOGR-OV, MOGR-PAES-OV uses the optimization framework proposed by the MOEA-based metaheuristic PAES. This is a simple MOEA using a single-parent single-offspring EA, similar to a (1+1)-evolution strategy [6]. The pseudo-code of the MOGR-PAES-OV algorithm is shown in Algorithm 3.

MOGR-PAES-OV starts by using MOGR-OV, with only one iteration, in order to build an initial solution C. This initial solution is evaluated using Eqs. (2) and (3), and it is added to the Pareto set. Afterwards, the solution C is processed through the mutation operator for obtaining a new solution C′. The mutation operator removes a random community from C and assigns to C′ the remaining communities of C. Afterwards, for completing solution C′, the roulette wheel selection method is applied over Gr, where the probability of a granule g_j ∈ Gr being selected is computed from the number of unclustered vertices (i.e., vertices that do not belong to any previously built community of C′) belonging to g_j.
In steps 14–15 of Algorithm 3, the overlapping communities represented by the current solution C′ are processed using the improving and merging methods. The resulting set of overlapping communities is evaluated using Eqs. (2) and (3) and, as a result of this evaluation, the solution C′ is added to the Pareto set iff it is a nondominated solution. In the case where the offspring (solution C′) and the parent (solution C) do not dominate each other, the choice between the offspring and the parent is made by comparing them with the archive of best solutions found so far.

In step 19 of Algorithm 3, the individual-based selection defined in PAES is used: among the members of the archive, the solution located in the least crowded region of the objective space is accepted as the new parent and assigned to C. Steps 8–19 are repeated a predefined number of iterations.

3.4 Expansion Step

Overlapping vertices are supposed to be those vertices that belong to more than one community, and in order to be correctly located inside a community they need to have edges with vertices inside those communities. For detecting overlapping zones, each seed S_i is processed for determining which vertices outside S_i share a significant number of their adjacent vertices with vertices inside S_i, considering G = ⟨V, E⟩.

Let S_i be a seed cluster and ∂S_i ⊆ S_i the set of vertices of S_i having neighbors outside S_i. The strength of ∂S_i, denoted as Str(∂S_i), is computed as the ratio between the number of edges of ∂S_i with vertices inside S_i and the number of edges of ∂S_i with vertices inside and outside S_i.

The strategy for expanding a seed S_i is as follows: (1) determining the set L of vertices v ∉ S_i having at least one adjacent vertex in ∂S_i, such that Str(∂S_i′) − Str(∂S_i) > 0, where S_i′ = S_i ∪ {v}; (2) applying the roulette wheel selection method over L,

Algorithm 3: MOGR-PAES-OV Algorithm
Input: G = ⟨V, E⟩
Output: Pareto Set with overlapping community structures (PSetOC)
1 Computing [vi]R for each vi ∈ V;
2 Building the set Gr = {g1, g2, ..., gn}, subgraphs induced by each [vi]R, vi ∈ V;
3 Pareto Set of overlapping communities PSetOC ← { };
4 initial solution C ← Create by using MOGR-OV, with only one iteration;
5 Evaluating (C);
6 PSetOC ← PSetOC ∪ {C};
7 while stop condition is not satisfied do
8   C′ ← Apply mutation operator (C);
9   while community structure C′ not built do
10    subgraph seed gj ← Apply the roulette wheel selection method over Gr;
11    community ci ← Expansion (gj);
12    C′ ← C′ ∪ {ci};
13  end
14  C′ ← Improving (C′);
15  C′ ← Merging (C′);
16  Evaluating (C′);
17  if C′ is Nondominated then
18    PSetOC ← PSetOC ∪ {C′};
19    C ← solution with least crowded region of PSetOC;
20  end
21 end
22 return PSetOC

Fig. 1 Strategy for expanding a seed cluster S_i, taking into account the neighbor vertices in L and the border vertices ∂S_i

where the probability of selecting a vertex v ∈ L is computed using the increase that v produces in Str(∂S_i); and (3) repeating steps 1–2 while L ≠ ∅. Figure 1 illustrates the strategy for expanding a seed.
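A small Python sketch of the border strength Str(∂S_i) and of the expansion test follows (our own helper names; the graph is given as adjacency sets):

def border_strength(S, adj):
    # Str(∂S): edges from border vertices into S over all edges of border vertices.
    border = {v for v in S if adj[v] - S}   # vertices of S with outside neighbors
    inside = sum(len(adj[v] & S) for v in border)
    total = sum(len(adj[v]) for v in border)
    return inside / total if total else 1.0

def expansion_gain(S, v, adj):
    # Increase in Str(∂S) if v were added to S; a positive gain puts v into L.
    return border_strength(S | {v}, adj) - border_strength(S, adj)

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
S = {0, 1, 2}
print(expansion_gain(S, 3, adj))  # negative here, so vertex 3 would not enter L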

3.5 Improving Step

Let Z be an overlapping zone detected by the expansion step and C_Z = {C_1, C_2, ..., C_m} the set of communities that make up Z. Let v ∈ Z be an overlapping vertex, N(v|C_Z) the set of adjacent vertices of v belonging to at least one community in C_Z, and G_v = {G_v^1, G_v^2, ..., G_v^l} the set of communities or overlapping zones containing the vertices in N(v|C_Z). Let N′(v|C_Z) be the set of adjacent vertices of v that belong to at most one community in C_Z.

A property we expect an overlapping vertex like v to satisfy is to have the vertices in N(v|C_Z) equally distributed over the groups of G_v. The uniformity of v, denoted as U(v), measures how much the distribution of the vertices in N(v|C_Z) deviates from their expected distribution, and it is computed as follows:

U(v) = 1 − Σ_{G_v^i ∈ G_v} abs( |N(v|C_Z) ∩ G_v^i| / |N(v|C_Z)| − 1/|G_v| ),    (4)

where abs(·) is the absolute value. U(v) takes values in [0, 1], and the higher its value, the better balanced v is.
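The following sketch (ours) evaluates Eq. (4) for a vertex, given the neighbor set N(v|C_Z) and the groups of G_v as plain Python sets:

def uniformity(neighbors, groups):
    # Eq. (4): 1 minus the total deviation of the neighbor distribution over
    # the groups from a perfectly uniform (1/|G_v| per group) distribution.
    deviation = sum(abs(len(neighbors & g) / len(neighbors) - 1 / len(groups))
                    for g in groups)
    return 1 - deviation

neighbors = {1, 2, 3, 4}
print(uniformity(neighbors, [{1, 2, 9}, {3, 4, 8}]))   # 1.0, perfectly balanced
print(uniformity(neighbors, [{1, 2, 3}, {4}]))         # 0.5, unbalanced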
Another property we expect an overlapping vertex v ∈ Z to fulfill is to be a connector between any pair of its adjacent vertices in N′(v|C_Z); that is, we expect that the shortest path connecting any pair of vertices u, w ∈ N′(v|C_Z) is the path made by the edges (u, v) and (v, w). The simple betweenness of v, denoted as SB(v), measures how much of a connector v is, and it is computed as follows:

SB(v) = ( 2 · Σ_{i=1}^{|C_Z|−1} Σ_{j>i} ( 1 − |E(C_i, C_j)| / (|N′(v|C_Z) ∩ C_i| · |N′(v|C_Z) ∩ C_j|) ) ) / ( |C_Z| · (|C_Z| − 1) )    (5)

where E(C_i, C_j) is the set of edges with one vertex in C_i and the other in C_j. SB(v) takes values in [0, 1], and the higher its value, the better connector v is.
Let U_ave(Z) be the initial average uniformity of the vertices belonging to an overlapping zone Z. In order to improve the quality of Z, we analyze the addition or removal of one or more vertices from Z. Thus, any vertex v ∈ Z having U(v) < U_ave(Z) is a candidate to be removed from Z, whilst any vertex u ∈ N(v|C_Z), v ∈ Z, such that U(u) > U_ave(Z) is a candidate to be added to Z. Taking into account that both the uniformity and simple betweenness concepts can be straightforwardly generalized in order to be applied to Z, we employ these properties to measure which changes in Z increase its quality as an overlapping zone and which do not. Let T be an addition or removal which turns Z into Z′. T is considered viable iff (U(Z′) + SB(Z′)) − (U(Z) + SB(Z)) > 0. The heuristic proposed for improving the set O = {Z_1, Z_2, ..., Z_j} of overlapping zones detected by the expansion step is as follows (see the sketch below): (1) compute U_ave(Z_i) for each Z_i ∈ O, (2) detect the set T of viable transformations to apply over O, (3) perform the transformation t ∈ T which produces the highest improvement in its zone, and (4) repeat steps 2 and 3 while T ≠ ∅.
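A minimal sketch of this greedy loop follows; zone_quality (the zone-level generalization U(Z) + SB(Z), which the text does not spell out) and candidate_transformations (enumerating the additions and removals described above) are hypothetical helpers supplied by the caller.

```python
def improve_zones(zones, zone_quality, candidate_transformations):
    """Greedy improving step: repeatedly apply the viable transformation
    with the largest quality gain until none remains (steps 1-4 above)."""
    while True:
        best_gain, best = 0.0, None
        for z_idx, transform in candidate_transformations(zones):
            gain = (zone_quality(transform(zones[z_idx]))
                    - zone_quality(zones[z_idx]))   # viable iff gain > 0
            if gain > best_gain:
                best_gain, best = gain, (z_idx, transform)
        if best is None:                            # T = emptyset: stop
            return zones
        z_idx, transform = best
        zones[z_idx] = transform(zones[z_idx])
```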

3.6 Merging Step

Let OC = {C_1, C_2, ..., C_k} be the set of overlapping communities detected after the improving step. Although communities are allowed to overlap, each should have a subset of vertices which makes it different from any other one.
The distinctiveness of a community C, denoted as D_C, is computed as the difference between the number of edges composed of vertices belonging only to C and the number of edges containing at least one vertex that community C shares with another community. Two communities C_i and C_j which overlap each other are candidates to be merged iff D_{C_i} ≤ 0 or D_{C_j} ≤ 0.
The strategy followed in this step in order to reduce the redundancy existing in the detected communities is as follows (see the sketch below): (1) detect the set PC of pairs of communities that are candidates to be merged, (2) apply the roulette wheel selection method over the set PC, where the selection probability of each pair is computed by using the highest absolute value of the distinctiveness of the two communities forming the pair, and (3) repeat steps 1 and 2 while PC ≠ ∅.

3.7 Computational Complexity Issues

Our proposals need to compute the similarity class of each vertex, and they also perform the expansion, improving and merging steps.
The computation of [v_i]_R for each vertex v_i of the network is one of the most computationally expensive steps. This step is O(n³) because it needs to compute the shared neighbors between each pair of vertices in the graph. Fortunately, it is performed just once, so it does not affect the overall performance of the algorithms too much.
As was shown in [2], the expansion step is O(q · d · |L|), where q is the size of the biggest seed analyzed, d is the average vertex degree, and |L| is the number of vertices outside a community seed having edges with the seed. On the other hand, the improving step has a computational complexity of O(t_i · n³) and, finally, the merging step has a computational complexity of O(t_m · k · m · n²), where k is the number of communities discovered, m is the average number of communities with which a community overlaps, and t_m is the number of iterations performed by the merging step.
Based on the above analysis, in the case of the MOCD-OV algorithm the computational complexity is O(g · s · (T · n³ + e + n)), where T = max(t_i, t_m), e is the number of edges (i.e., at most n²), g is the number of iterations and s the population size. Thus, by the rule of sum, the MOCD-OV algorithm is O(g · s · T · n³). Starting from this point and taking into account that both MOGR-OV and MOGR-PAES-OV are single-solution based algorithms, we can assert that their complexity is O(T · n³).

4 Experimental Results

In this section, we conduct several experiments for evaluating the effectiveness of our proposals.
The experiments focused on: (1) evaluating the accuracy attained by our proposals on real networks and comparing their performance against that attained by the MEA_CDP [13], IMOQPSO [10], iMEA_CDP [11], OMO [12] and MOEA-OCD [27] algorithms; and (2) evaluating the accuracy attained by our proposals on synthetic networks and comparing their performance against that attained by the MOEA-OCD [27] algorithm, which has reported the best results over this kind of network.
The real-life networks used in our experiments were the American College Foot-
ball network, the Zachary’s Karate Club network, the Bottlenose Dolphins network,
and the Krebs’ books on American politics network; these networks can be down-
loaded from http://konect.uni-koblenz.de/networks. Table 1 shows the characteristics
of these networks.
Since the Newman benchmark networks [14] have some limitations on node
degrees and community sizes, we decided to use the Lancichinetti–Fortunato–
Radicchi (LFR) benchmark [9] for generating the synthetic networks; this benchmark
is suitable for both separated and overlapping situations.

4.1 Experiments on Real-Life Networks

In this section we evaluate the performance of our proposals over the networks shown
in Table 1, and we compare their results against that attained by MEA_CDP, IMO-
QPSO, iMEA_CDP, OMO and MOEA-OCD, over the same networks. For evaluating
the accuracy of each algorithm we used the NMI external evaluation measure, pro-
posed by Lancichinetti et al. in [8]. NMI takes values in [0,1] and it evaluates a set
of communities based on how much these communities resemble a set of communi-
ties manually labeled by experts, where 1 means identical results and 0 completely
different results.
For computing the accuracy attained by each of our proposals over each network, we employed the experimental framework proposed in [12]. For example, in case we want to compute the accuracy attained by MOCD-OV, we executed it over each

Table 1 Overview of the real-life networks used in our experiments

Networks                  # of Nodes   # of Edges   Ave. degree   # Communities
American Coll. Football   115          613          10.66         12
Zachary's Karate Club     34           78           4.58          2
Bottlenose Dolphins       62           159          5.129         2
Krebs' books              105          441          8.4           3

Table 2 Comparison of our proposals against multi-objective algorithms, regarding the NMI value. Best values appear bold-faced

Algorithms     Football   Zachary's   Dolphins   Krebs'   Ave. rank. pos.
MEA_CDP        0.495      0.52        0.549      0.469    6.25
IMOQPSO        0.462      0.818       0.886      X        5.5
iMEA_CDP       0.593      0.629      0.595      0.549    4.25
OMO            0.33       0.375       0.41       0.39     7.75
MOEA-OCD       0.77       0.487       0.648      0.484    5
MOCD-OV        0.793      0.88        0.95       0.502    1.5
MOGR-OV        0.789      0.908       0.944      0.495    2
MOGR-PAES-OV   0.781      0.856       0.675      0.479    3.75

network and selected the highest NMI value attained by a solution of each resulting Pareto set. This experiment is repeated twenty times and, for each network, the average of the highest NMI values attained is computed (sketched below). The same heuristic is followed for computing the accuracy of both the MOGR-OV and MOGR-PAES-OV algorithms. Since MOCD-OV extends the MOCD algorithm, we used the parameter configuration defined in [21].
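The protocol amounts to the following sketch, where run_algorithm (returning a Pareto set of community structures for a network) and nmi (the overlapping NMI of [8]) are hypothetical stand-ins for the actual implementations:

```python
def average_best_nmi(run_algorithm, nmi, network, ground_truth, runs=20):
    """Accuracy protocol of [12]: per run, keep the best NMI reached by
    any solution in the resulting Pareto set; average over the runs."""
    best = []
    for _ in range(runs):
        pareto_set = run_algorithm(network)
        best.append(max(nmi(solution, ground_truth)
                        for solution in pareto_set))
    return sum(best) / len(best)
```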
Table 2 shows the average NMI attained by each algorithm over the real-life networks used in this experiment; the average values for the MEA_CDP, IMOQPSO, iMEA_CDP and OMO algorithms were taken from their original articles. The average values for MOEA-OCD were computed following the above-mentioned heuristic. The "X" in Table 2 means that IMOQPSO does not report any results on the Krebs' books network.
As can be seen from Table 2, both MOCD-OV and MOGR-OV outperform the other algorithms in all the networks, except in Krebs', in which they attain the second and third best results, respectively. On the other hand, the MOGR-PAES-OV algorithm we proposed attains results similar to those of MOCD-OV and MOGR-OV in the Football and Zachary's networks, while its performance slightly decays in bigger networks like Dolphins and Krebs'. In the last column of Table 2 we also show the average ranking position attained by each algorithm and, as can be observed, our proposals clearly outperform the other methods. From the above experiments on real-life networks, we can see that our proposals are promising and effective for overlapping community detection in complex networks, with MOCD-OV being the one performing the best.

4.2 Experiments on LFR Benchmark

In this section we evaluate the performance of our proposals over several synthetic networks generated from the LFR benchmark [9], in terms of the NMI value they attain, and we compare the results against those attained by the MOEA-OCD algorithm, which has reported the best results over this kind of network among the algorithms described in Sect. 2.
In LFR benchmark networks, both node degrees and community sizes follow a power-law distribution, regulated by the parameters τ1 and τ2. Besides, the significance of the community structure is controlled by a mixing parameter μ, which denotes the average fraction of edges each vertex in the network has with other communities. The smaller the value of μ, the more significant the community structure of the LFR benchmark network is. In the first part of this experiment, we set the network size to 1000, τ1 = 2, τ2 = 1, the node degree in [0, 50] with an average value of 20, whilst the community sizes vary from 10 to 50 elements. Using these parameter values, we vary μ from 0.1 to 0.5 with an increment of 0.05, as sketched below.
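For reference, a parameter sweep of this kind can be reproduced with networkx's LFR generator; note that the networkx implementation only produces non-overlapping communities (the original LFR code additionally supports overlapping nodes) and requires τ2 > 1, so 1.1 is used below instead of the τ2 = 1 stated above.

```python
import networkx as nx

# mu sweep 0.1 .. 0.5 in steps of 0.05 over LFR benchmark networks
for mu in [round(0.1 + 0.05 * i, 2) for i in range(9)]:
    g = nx.LFR_benchmark_graph(
        n=1000, tau1=2.0, tau2=1.1, mu=mu,
        average_degree=20, max_degree=50,
        min_community=10, max_community=50, seed=42)
    # ground-truth communities are stored as a node attribute
    communities = {frozenset(g.nodes[v]["community"]) for v in g}
```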
For computing the accuracy attained by each of our proposals and by MOEA-OCD, we follow the same method used in the experiments of Sect. 4.1. We show in Fig. 2 the average NMI value attained by each algorithm over the LFR benchmark when μ varies from 0.1 to 0.5 with an increment of 0.05.
As can be seen from Fig. 2, as the value of μ increases the performance of each algorithm deteriorates, with MOGR-OV and MOGR-PAES-OV being those performing the best. As the mixing parameter μ exceeds 0.1, the MOEA-OCD algorithm begins to decline in performance and is outperformed by MOGR-OV and MOGR-PAES-OV. Finally, when the value of μ is greater than 0.4, all our proposals outperform the MOEA-OCD algorithm.
For summarizing the above results, we evaluated the statistical significance of the NMI values attained by MOGR-OV and MOGR-PAES-OV with respect to those attained by MOEA-OCD over each network; we exclude MOCD-OV from this analysis, taking into account that its performance was worse than that of the MOGR-OV and MOGR-PAES-OV algorithms. For testing the statistical significance we used the software STATISTICA v8.0, and we performed a factorial ANOVA in order to analyze the higher-order interactive effects of multiple categorical independent factors. With this aim, we first evaluated the statistical significance of the results of each algorithm over each network (see Figs. 3 and 4).

Fig. 2 Average NMI value attained by each algorithm (MOGR-OV, MOGR-PAES-OV, MOCD-OV, MOEA-OCD) on LFR benchmark networks when the mixing parameter μ varies from 0.1 to 0.5 with an increment of 0.05

Fig. 3 Statistical significance of the results attained by MOGR-OV and MOEA-OCD over each network (Algorithm*Networks LS means; current effect: F(8, 79) = 28.134, p = 0.0000; vertical bars denote 0.95 confidence intervals)
As can be seen from Figs. 3 and 4, the results attained by both MOGR-OV and MOGR-PAES-OV over each network are statistically superior to those of the MOEA-OCD algorithm. This can also be observed from Figs. 5 and 6, in which we show the statistical significance of the overall performance of our two proposals with respect to that of MOEA-OCD.
In the second part of this experiment, we set μ = 0.1 and μ = 0.4, and we vary the percentage of overlapping nodes existing in the network from 0.1 to 0.45 with an increment of 0.05; the other parameters remain the same as in the first experiment. Figures 7 and 8 show the average NMI value attained by each algorithm over each of these networks.
As can be seen from Fig. 7, when the structure of the networks is well defined, MOGR-OV, MOGR-PAES-OV and MOEA-OCD exhibit an almost stable performance, independently of the number of overlapping nodes in the network, with MOGR-OV being the one performing the best. It is also worth mentioning that the performance of the MOCD-OV algorithm is highly affected by the increase in the fraction of overlapping vertices. On the other hand, as can be seen from Fig. 8, when the structure of the communities is uncertain, the performance of the MOEA-OCD algorithm drops quickly as the overlap in the network increases, with our proposals, and specifically MOGR-OV and MOGR-PAES-OV, being those that perform the best.

Fig. 4 Statistical significance of the results attained by MOGR-PAES-OV and MOEA-OCD over each network (Algorithm*Networks LS means; current effect: F(8, 84) = 16.676, p = 0.00000; vertical bars denote 0.95 confidence intervals)

Fig. 5 Statistical significance of the overall results attained by MOGR-OV wrt. MOEA-OCD (Algorithm LS means; current effect: F(1, 79) = 369.21, p = 0.0000; vertical bars denote 0.95 confidence intervals)

Similar to the previous experiments, we evaluated the statistical significance of the NMI values attained by MOGR-OV and MOGR-PAES-OV with respect to those attained by MOEA-OCD over each network. The statistical significance of the results of each algorithm over each network is shown in Figs. 9 and 10.
As can be seen from Figs. 9 and 10, the results attained by both MOGR-OV and MOGR-PAES-OV over each network are statistically superior to those of the MOEA-OCD algorithm. This can also be observed from Figs. 11 and 12, in which we show the statistical significance of the overall performance of our two proposals with respect to that of MOEA-OCD.

Fig. 6 Statistical significance of the overall results attained by MOGR-PAES-OV wrt. MOEA-OCD (Algorithm LS means; current effect: F(1, 84) = 207.64, p = 0.0000; vertical bars denote 0.95 confidence intervals)

Fig. 7 Networks with significant community structure (μ = 0.1). Average NMI value attained by each algorithm when the fraction of overlapping nodes varies from 0.1 to 0.5 with an increment of 0.05

Finally, we evaluated the statistical significance of the NMI values attained by our proposals. With this aim, we reproduced the experiments described above. For the networks created with the LFR benchmark when μ varies from 0.1 to 0.5 with an increment of 0.05, the statistical significance of the results attained by our algorithms over these networks is shown in Fig. 13. We also show the statistical significance of the overall performance of our two proposals in Fig. 14.
As can be seen from Figs. 13 and 14, the results attained by MOGR-OV are statistically better than those of the MOGR-PAES-OV algorithm.

Fig. 8 Networks with indistinct community structure (μ = 0.4). Average NMI value attained by each algorithm when the fraction of overlapping nodes varies from 0.1 to 0.5 with an increment of 0.05

Fig. 9 Statistical significance of the results attained by MOGR-OV and MOEA-OCD over each network (Algorithm*Networks LS means; current effect: F(7, 91) = 2.3308, p = 0.03108; vertical bars denote 0.95 confidence intervals)

Fig. 10 Statistical significance of the results attained by MOGR-PAES-OV and MOEA-OCD over each network (Algorithm*Networks LS means; current effect: F(7, 85) = 2.5328, p = 0.02042; vertical bars denote 0.95 confidence intervals)

Fig. 11 Statistical significance of the overall results attained by MOGR-OV wrt. MOEA-OCD (Algorithm LS means; current effect: F(1, 91) = 217.69, p = 0.0000; vertical bars denote 0.95 confidence intervals)

4.3 Multi-resolution Structures on Real-World Networks

We further illustrate the advantages of our algorithms for identifying multi-resolution structures on real-world networks. To show these advantages, we selected MOGR-OV and present several examples of the different granularities MOGR-OV is able to detect. Three of the solutions found by MOGR-OV over Zachary's network are shown in Fig. 15. Figure 15a shows the Pareto front on the Karate network. Figure 15b–d correspond to three solutions labeled as s4, s3 and s5 in the

Fig. 12 Statistical significance of the overall results attained by MOGR-PAES-OV wrt. MOEA-OCD (Algorithm LS means; current effect: F(1, 85) = 149.97, p = 0.0000; vertical bars denote 0.95 confidence intervals)

Fig. 13 Statistical significance of the results attained by MOGR-OV and MOGR-PAES-OV over each network (Algorithm*Networks LS means; current effect: F(8, 81) = 2.4055, p = 0.02209; vertical bars denote 0.95 confidence intervals)

Pareto front, respectively. Figure 15b, c show two overlapping community structures in which vertices 3, 9 and 31 are overlapping vertices.
Functions (2) and (3) have the potential to balance each other's tendency to increase or decrease the number of communities. This is crucially important in order to obtain different numbers of communities, thus avoiding convergence to trivial solutions [21]. For example, from the community structure in Fig. 15c, it is apparent that the community on the right further divides into two smaller ones in Fig. 15b; therefore, the Intra value increases and the Inter value decreases. On the other hand,

Fig. 14 Statistical significance of the overall results attained by MOGR-OV and MOGR-PAES-OV (Algorithm LS means; current effect: F(1, 74) = 18.111, p = 0.00006; vertical bars denote 0.95 confidence intervals)

Fig. 15 Examples of the overlapping communities detected over the Zachary's network. a Nondominated front; b–d correspond to three solutions labeled as s4, s3 and s5 in the nondominated front, respectively

the minimum Intra value found by MOGR-OV is 0.051, whose corresponding community structure is shown in Fig. 15b. In this case, one community covers many vertices, thereby the Intra value decreases and the Inter value increases.

5 Conclusions

In this paper, we introduced three algorithms that combine Granular Computing and a multi-objective optimization approach for discovering overlapping communities in social networks. These algorithms start by building a set of seeds that is afterwards processed for building overlapping communities, using three introduced steps, named expansion, improving and merging.
The proposed algorithms, named MOGR-OV, MOGR-PAES-OV and MOCD-OV, were evaluated on four real-life networks in terms of their accuracy and were compared against five multi-objective algorithms from the related work. This experiment showed that our proposals and, specifically, the MOCD-OV algorithm outperform the other algorithms in terms of the NMI external measure on almost all of the real collection. Moreover, our proposals were also evaluated over several synthetic networks in terms of the NMI value. These other experiments showed that, when the structure of the network is not well defined, our proposals perform the best. Additionally, when the quality of the structure of the network is fixed and the overlap in the network begins to increase, one of our proposals, the MOGR-OV algorithm, is the one with the highest accuracy in almost all cases.
We can conclude from our experimental evaluation that, among our proposals, the MOGR-OV algorithm is the one that offers the best trade-off in terms of accuracy on real and synthetic networks.
As future work, we would like to explore the use of another mutation operator in the MOGR-PAES-OV algorithm, specifically one that employs the local properties of vertices to define seeds that belong to the Pareto set. We hypothesize that this is the key to boosting the accuracy of our algorithms.


References

1. Bargiela, A., Pedrycz, W.: Granular computing. Handbook on Computational Intelligence:


Volume 1: Fuzzy Logic, Systems, Artificial Neural Networks, and Learning Systems, pp. 43–
66. World Scientific, New Jersey (2016)
2. Chen, J., Zaiane, O.R., Goebel, R.: Detecting communities in large networks by iterative
local expansion. In: International Conference on Computational Aspects of Social Networks,
CASON’09, pp. 105–112. IEEE (2009)
3. Corne, D.W., Jerram, N.R., Knowles, J.D., Oates, M.J.: PESA-II: region-based selection in evolutionary multiobjective optimization. In: Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, pp. 283–290. Morgan Kaufmann Publishers Inc. (2001)
4. Gong, M., Cai, Q., Chen, X., Ma, L.: Complex network clustering by multiobjective discrete
particle swarm optimization based on decomposition. IEEE Trans. Evol. Comput. 18(1), 82–97
(2014)

5. Gong, M., Ma, L., Zhang, Q., Jiao, L.: Community detection in networks by using multiobjective
evolutionary algorithm with decomposition. Phys. A: Stat. Mech. Appl. 391(15), 4050–4060
(2012)
6. Knowles, J.D., Corne, D.W.: Approximating the nondominated front using the pareto archived
evolution strategy. Evol. Comput. 8(2), 149–172 (2000)
7. Lancichinetti, A., Fortunato, S.: Consensus clustering in complex networks. Sci. Rep. 2, 336
(2012)
8. Lancichinetti, A., Fortunato, S., Kertész, J.: Detecting the overlapping and hierarchical com-
munity structure in complex networks. New J. Phys. 11(3), 033015 (2009)
9. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detec-
tion algorithms. Phys. Rev. E 78, 046110 (2008)
10. Li, Y., Wang, Y., Chen, J., Jiao, L., Shang, R.: Overlapping community detection through an
improved multi-objective quantum-behaved particle swarm optimization. J. Heuristics 21(4),
549–575 (2015)
11. Liu, C., Liu, J., Jiang, Z.: An improved multi-objective evolutionary algorithm for simultane-
ously detecting separated and overlapping communities. Nat. Comput. 15(4), 635–651 (2016)
12. Liu, B., Wang, C., Wang, C., Yuan, Y.: A new algorithm for overlapping community detection.
In: 2015 IEEE International Conference on Information and Automation, pp. 813–816. IEEE
(2015)
13. Liu, J., Zhong, W., Abbass, H.A., Green, D.G.: Separated and overlapping community detection
in complex networks using multiobjective evolutionary algorithms. In: 2010 IEEE Congress
on Evolutionary Computation (CEC), pp. 1–7. IEEE (2010)
14. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E
69, 066133 (2004)
15. Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys.
Rev. E 69(2), 026113 (2004)
16. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure
of complex networks in nature and society. Nature 435(7043), 814 (2005)
17. Pizzuti, C.: GA-Net: a genetic algorithm for community detection in social networks. In: International Conference on Parallel Problem Solving from Nature, pp. 1081–1090. Springer (2008)
18. Pizzuti, C.: A multiobjective genetic algorithm to find communities in complex networks. IEEE
Trans. Evol. Comput. 16(3), 418–430 (2012)
19. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying com-
munities in networks. Proc. Natl. Acad. Sci. 101(9), 2658–2663 (2004)
20. Shen, H., Cheng, X., Cai, K., Hu, M.B.: Detect overlapping and hierarchical community struc-
ture in networks. Phys. A: Stat. Mech. Appl. 388(8), 1706–1712 (2009)
21. Shi, C., Yan, Z., Cai, Y., Wu, B.: Multi-objective community detection in complex networks.
Appl. Soft Comput. 12(2), 850–859 (2012)
22. Talbi, E.G.: Metaheuristics: from Design to Implementation, vol. 74. Wiley, New York (2009)
23. Wang, X., Liu, G., Li, J.: Overlapping community detection based on structural centrality in
complex networks. IEEE Access 5, 25258–25269 (2017)
24. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cam-
bridge University Press, Cambridge (1994)
25. Wen, X., Chen, W.N., Lin, Y., Gu, T., Zhang, H., Li, Y., Yin, Y., Zhang, J.: A maximal clique
based multiobjective evolutionary algorithm for overlapping community detection. IEEE Trans.
Evol. Comput. 21(3), 363–377 (2017)
26. Yao, Y., et al.: Granular computing: basic issues and possible solutions. In: Proceedings of the
5th Joint Conference on Information Sciences, vol. 1, pp. 186–189 (2000)
27. Yuxin, Z., Shenghong, L., Feng, J.: Overlapping community detection in complex networks
using multi-objective evolutionary algorithm. Comput. Appl. Math. 36(1), 749–768 (2017)
28. Zhou, Y., Wang, J., Luo, N., Zhang, Z.: Multiobjective local search for community detection
in networks. Soft Comput. 20(8), 3273–3282 (2016)
In-Database Rule Learning Under Uncertainty: A Variable Precision Rough Set Approach

Frank Beer and Ulrich Bühler

University of Applied Sciences Fulda, Leipziger Straße 123, 36037 Fulda, Germany
e-mail: frank.beer@informatik.hs.fulda.de; u.buehler@informatik.hs.fulda.de

Abstract Relational Database Systems are the predominant repositories to store


mission-critical information collected from industrial sensor devices, business trans-
actions and sourcing activities, among others. As such, they provide an exceptional
gateway for data science. However, conventional knowledge discovery processes
require data to be transported to external mining tools, which is a very challenging
exercise in practice. To get over this dilemma, equipping databases with predic-
tive capabilities is a promising direction. Using Rough Set Theory is particularly
interesting for this subject, because it has the ability to discover hidden patterns
while founded on well-defined set operations. Unfortunately, existing implemen-
tations consider data to be static, which is a prohibitive assumption in situations
where data evolve over time and concepts tend to drift. Therefore, we propose an in-
database rule learner for nonstationary environments in this chapter. The assessment under different scenarios against other state-of-the-art rule inducers demonstrates that the algorithm is comparable with existing methods, but superior when applied to critical applications that expect additional confidence from the decision-making process.

1 Introduction

Data analysis became more pronounced with Machine Learning (ML) and the broad
availability of related frameworks in the 1990s and early 2000s. Over time, these
software systems constantly have been refined supplying a huge arsenal of mining
algorithms turning conventional workstations into analytical platforms. One of the
reasons for their lasting success in practice has certainly been their simple and intuitive design, which still makes them a central workhorse for data science nowadays.

As these ML workbenches are usually isolated from the problem domain, the typical mining process involves load procedures to import data of interest, either given through flat files or external data repositories, right before knowledge extraction can commence. While these import mechanisms work properly for data sets of moderate size, they perform rather poorly for large quantities of data due to inefficient file operations or enduring data transmissions. Regarding the challenge of ever-growing data volumes to analyze, these traditional loading techniques thus become a huge concern for mining tasks in the long run. To mitigate these downsides of classic ML software frameworks, a decisive paradigm termed "in-database analytics" evolved in data science and related disciplines (e.g. [1–4]). It essentially brings analytics to the data, taking advantage of native SQL and other built-in functionality such as efficient data structures or parallel processing. Hence, in-database processing has the potential to largely reduce data transports by fusing ML components and data repository into a single scalable mining system. This is favorable for many real-world scenarios, as hidden knowledge predominantly resides in relational Database Systems (DBs), provided either through transactional data or data warehouses.
Employing Rough Set Theory (RST) is of particular interest for in-database analytics because it is based on pure set operations, which are efficiently implemented by most relational engines, and, in fact, research in that direction is promising given the current progress (e.g. [5–8]). However, most existing approaches exhibit two fundamental drawbacks: (i) they are either impractical due to their poor implementation or unable to cope with vagueness, as opposed to the virtue of RST; (ii) furthermore, they consistently consider data to be drawn from the same distribution. Both points have practical relevance and constitute strong assumptions in uncertain and highly dynamic environments. This is particularly true when analyzing data that evolve over time, which are ultimately stored in a DB. Under such circumstances, noise can be apparent, or the concepts to be learned may change suddenly or gradually in an unforeseeable fashion, drastically degrading the classification accuracy of an initially trained predictive model. This phenomenon is commonly referred to as "concept drift" (e.g. [9, 10]) and requires learning algorithms to provide adequate mechanisms to adapt to these changes. Various disciplines suffer from drifts due to their nonstationary nature. These include marketing applications, where customer purchasing habits might be influenced by advertisements or fashion trends [11]. Another example are long-term studies of medical data that are collected over years or even decades [12]; here, it is very likely the data generating process has changed over time, making the mining task a difficult endeavor. A final example of a drift scenario is adversarial behavior frequently penetrating cyber security applications. In such settings, an attacker intends to manipulate the outcome of the learning algorithm to exploit vulnerabilities or to simply evade detection [13].
To address the lack of uncertainty management in recent RST literature for in-database applications, we propose a new bottom-up rule-based classifier for nonstationary environments and class imbalance problems as an extension of an earlier work [14]. It is termed Incremental In-Database Rule Inducer (InDBR), and it leverages Variable Precision Rough Sets (VPRS) and efficient database algebra in order to produce certain and uncertain decision rules as new data samples become available. The motivation for combining rule learning and VPRS to undertake mining tasks under changing conditions has several reasons. In general, rules are intelligible, and thus an exceptional means to describe expressive patterns towards transparent decision-making. Furthermore, each rule can be updated easily without retraining the entire model in case parts of it are subject to drifts. Ultimately, an intrinsic concern of nonstationary environments is data noise, which is natively handled by VPRS, and an in-database implementation has recently been compiled [15]. With these benefits as a baseline, InDBR features a novel bottom-up generalization strategy reacting fast to drifts. Additionally, InDBR has the ability to abstain from classification in situations where it is uncertain, which increases confidence especially for critical applications that require quality predictions and traceability for domain experts rather than unexplainable prospects.
The remainder of this chapter is structured as follows: First, we introduce fundamentals of VPRS (Sect. 2) and review related approaches of other authors (Sect. 3). In Sect. 4, VPRS is formally transported to the domain of DBs, which is exploited in Sect. 5 proposing InDBR. Section 6 evaluates InDBR and two other state-of-the-art rule inducers towards both predictive and descriptive capabilities. Based on the obtained results, we recap and conclude this chapter (Sect. 7).

2 Rough Set Preliminaries

This section outlines rudiments of RST and VPRS as originally introduced by Pawlak [16, 17] and Ziarko [18]. We describe the basic data structures and the indiscernibility relation (Sect. 2.1) as well as the concept approximation (Sect. 2.2).

2.1 Information Systems and Indiscernibility Relation

Information in RST is represented in a two-dimensional data structure called Information System (IS), which consists of objects U = {x_1, ..., x_n} and attributes A = {a_1, ..., a_m}, n, m ∈ ℕ. Thus, it can be expressed within the tuple ⟨U, A⟩, where each a ∈ A constitutes a formal mapping from U to a's value range V_a, i.e. a : U → V_a. An extension to an IS is the Decision System (DS), which in addition holds some context-specific decision made by an expert or teacher. This information is represented by the decision features d ∈ D with d : U → V_d. It is denoted by ⟨U, A, D⟩ with A ∩ D = ∅. If we have any a ∈ A ∪ D : a(x) = ⊥, i.e. a missing or null value, the underlying structure is called incomplete; otherwise we call it complete. Objects inside an IS or DS can be discerned using the indiscernibility relation w.r.t. the feature set B ⊆ A. Formally, it is an equivalence relation denoted by

$$IND(B) = \{(x, y) \in U \times U \mid \forall a \in B : a(x) = a(y)\}, \qquad (1)$$

which induces a partition U/IND(B) consisting of pair-wise disjoint non-empty equivalence classes K_j over U w.r.t. B. For short, we write U/B = {K_1, ..., K_q}, 1 ≤ j ≤ q ∈ ℕ. Consequently, partitions induced by decision features E ⊆ D are denoted in a similar fashion, i.e. U/E = {C_1, ..., C_k}, k ∈ ℕ.

2.2 Variable Precision Rough Sets

In order to approximate a target concept X ⊆ U using B ⊆ A, RST makes use of the standard subset inclusion to determine whether X can be classified with certainty (i.e. K ⊆ X) or vaguely (i.e. K ∩ X ≠ ∅ and K ⊄ X) for K ∈ U/B. This formal approach is relaxed in VPRS towards a majority inclusion, allowing to address minor irregularities in the data that would be considered uncertain using classic RST. Therefore, VPRS introduces the relative inclusion

$$c(X, Y) = \begin{cases} 1 - \frac{|X \cap Y|}{|X|}, & \text{if } X \neq \emptyset \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where X and Y are two ordinary sets. Using the bound c(X, Y) ≤ β with 0 ≤ β < 0.5, X is said to be included in Y w.r.t. the permitted error β, and we write X ⊆_β Y. Combining this relaxation and the indiscernibility relation, a given target concept can be classified in terms of VPRS using the following two definitions.
Definition 1 Let ⟨U, A⟩, B ⊆ A, β ∈ [0, 0.5) and the concept X ⊆ U. For any chosen β, the β-lower approximation of X can be expressed by

$$\underline{X}_{B,\beta} = \bigcup \{K \in U/B \mid c(K, X) \le \beta\}. \qquad (3)$$

Definition 2 Let ⟨U, A⟩, B ⊆ A, β ∈ [0, 0.5) and the concept X ⊆ U. For any chosen β, the β-upper approximation of X can be expressed by

$$\overline{X}_{B,\beta} = \bigcup \{K \in U/B \mid c(K, X) < 1 - \beta\}. \qquad (4)$$

With both Definitions 1 and 2, we retrieve the β-approximation of X w.r.t. B and precision β, i.e. a variable precision rough set $\langle \underline{X}_{B,\beta}, \overline{X}_{B,\beta} \rangle$. One can verify that the information in B is insufficient if we have $\underline{X}_{B,\beta} \neq \overline{X}_{B,\beta}$. Objects causing this uncertainty are consolidated within the β-boundary approximation as given in Definition 3.

Definition 3 Let ⟨U, A⟩, B ⊆ A, β ∈ [0, 0.5) and the concept X ⊆ U. For any chosen β, the β-boundary approximation of X can be expressed by

$$\widetilde{X}_{B,\beta} = \overline{X}_{B,\beta} \setminus \underline{X}_{B,\beta}. \qquad (5)$$

For binary or multiclass classification problems, VPRS provide further notions. Utilizing a DS, they are determined by the following Definitions 4 and 5.

Definition 4 Given ⟨U, A, D⟩, B ⊆ A, E ⊆ D and β ∈ [0, 0.5), all concepts induced by the partition U/E can be evaluated using the β-positive region

$$POS_{B,E,\beta} = \bigcup_{X \in U/E} \underline{X}_{B,\beta}. \qquad (6)$$

Definition 5 Given ⟨U, A, D⟩, B ⊆ A, E ⊆ D and β ∈ [0, 0.5), all concepts induced by the partition U/E can be evaluated using the β-boundary region

$$BND_{B,E,\beta} = \bigcup_{X \in U/E} \widetilde{X}_{B,\beta}. \qquad (7)$$

Since $POS_{B,E,\beta}$ is the union of all available β-lower approximations with respective X ∈ U/E, it covers those x ∈ U which can be classified with certainty using B and precision β. Conversely, $BND_{B,E,\beta}$ holds all inconsistent objects. Employing both β-regions, a comprehensive outline of the quality of B w.r.t. E is supplied. Note, VPRS is a generalization of classic RST; one can verify that in the case of β = 0, both models are equivalent.
Note, the algorithms presented in this chapter only rely on the β-approximation to induce decision rules as part of the knowledge extraction process. On these grounds, we omit the introduction of core and reduct as key features of RST. Instead, the interested reader is referred to [17, 18] for further details on this subject.
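As a minimal, self-contained illustration of Definitions 1–3 (deliberately ignoring the in-database formulation developed in Sect. 4), the β-approximations can be computed over plain Python sets:

```python
from collections import defaultdict

def partition(universe, objects, attrs):
    """U/B: group objects by their values on the attributes in B."""
    classes = defaultdict(set)
    for x in universe:
        classes[tuple(objects[x][a] for a in attrs)].add(x)
    return list(classes.values())

def c(x, y):
    """Relative inclusion, Eq. (2)."""
    return 1 - len(x & y) / len(x) if x else 0.0

def beta_approximations(universe, objects, attrs, concept, beta):
    """beta-lower/upper/boundary approximation of concept X (Defs. 1-3)."""
    lower, upper = set(), set()
    for k in partition(universe, objects, attrs):
        if c(k, concept) <= beta:
            lower |= k
        if c(k, concept) < 1 - beta:
            upper |= k
    return lower, upper, upper - lower

# toy decision system over two condition attributes a1, a2
objects = {1: {"a1": 0, "a2": 1}, 2: {"a1": 0, "a2": 1},
           3: {"a1": 0, "a2": 1}, 4: {"a1": 1, "a2": 0}}
U = set(objects)
X = {1, 2, 4}                            # target concept
print(beta_approximations(U, objects, ["a1", "a2"], X, beta=0.34))
# class {1,2,3} has error 1/3 <= 0.34: certain under VPRS, not under RST
```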

3 Related Work

In this section, a brief review of closely related approaches incorporating RST and relational DBs is provided (Sect. 3.1). Furthermore, state-of-the-art rule-based classifiers coping with nonstationary environments are presented (Sect. 3.2).

3.1 Combining Rough Sets and Databases

Research on combining DBs and RST dates back to the mid 1990s in order to lever-
age the efficient infrastructure provided (parallelism, algorithms, data structures and
statistics). In this context, one of the first systems using database algorithms is the
data mining toolkit RSDM [5]. It incorporates SQL commands to pretreat and fetch
relevant data from a DB, which are finally processed on a row-by-row basis to com-
pute VPRS. This conventional client-server architecture provides solid performance
as long as data can be compressed adequately at the DB end as fewer rows need

to be processed. At this point, it is unclear whether the authors used DB cursors or


an external procedure computing VPRS. In both cases, however, poor performance
can be assumed as cursor implementations can be considered rather inefficient and
external processing suffers from enduring network input/output [15]. Therefore, [19]
suggests to push more aggregation operations to the DB in order to avoid transmitting
huge volumes of data over the wire. This can decrease latency but still a client-server
communication is implied. This drawback is addressed in [20] incorporating the
concept approximation with DBs by modifying relational operations. This, in turn, entails adjustments to the internals of a DB, which prohibits a general employment of RST with other DBs. More elaborate methods are introduced by [6, 7, 21] for fea-
ture selection tasks. These approaches run completely in-database exploiting existing
relational operations, but they are not fully compliant with the concept approxima-
tion of RST requiring further processing steps to handle inconsistency in the data. It
is worth noting that due to the clean conditions presumed, different core and reduct
attributes are obtained as opposed to the classic definitions. A procedure capable to
extract rules exploiting DB technology is proposed in [8]. Since it is based on the
ideas in [7, 21], removing data inconsistency is still an obligatory step. Driven by
the drawbacks of previous methods, the work in [22] introduces a new rough set
model for DBs, which is fully compliant with the original definitions of RST using
extended relational algebra. Hence, this model can be ported to any conventional DBs
supporting SQL. The results of this work are further extended in [15] to compute
VPRS with an emphasis on dimensionality reduction.
To our best knowledge, no approach exists to date operating in the presence of
concept drift despite the mentioned effort. Therefore, we build on the latest results
in [15] and propose InDBR, which produces decision rules in nonstationary envi-
ronments for in-database applications.

3.2 Rule Learning Under Drifting Conditions

Decision rules have been used to represent knowledge for decades. One of the first approaches handling concept drifts is the family of algorithms called FLORA, consisting of FLORA2, FLORA3 and FLORA4 [23]. The main idea behind FLORA2 is a partial memory storing examples used to induce new rules. The memory is implemented as a sliding window and contracts as drifts occur. FLORA3 expands
FLORA2 by dealing with reappearing concepts. After each learning cycle, it deter-
mines whether to reconsider useful rules of the past. FLORA4 distinguishes between
concept drift and data noise by tracking a rule’s accuracy through confidence inter-
vals. Another method derived from the classic sequential covering algorithm AQ [24]
is AQ11-PM+WAH [25]. It incorporates the adaptive window of [23] to handle drift-
ing conditions. As such, AQ11-PM+WAH is comparable to FLORA2 performance-
wise, but maintains fewer examples during learning. However, both mentioned rule
learners are not designed to process data arriving in a stream-like fashion. FACIL is
the first algorithm explicitly built to mine numeric data streams [26]. It is a bottom-up

rule inducer that is able to store inconsistent rules with corresponding examples. To
maintain a specific purity within the rule set, a user-defined threshold needs to be pro-
vided. Rules violating the minimum purity are replaced by new rules generated from
the associated examples. A complete different approach is applied in VFDR, which
produces either ordered or unordered sets of rules following a top-down approach
by stepwise specializing existing rules [27]. The rule induction is guided by the
Hoeffding bound [28] as an adaption from the decision tree VFDT [29]. In order
to improve its performance under drift, VFDR is extended with a drift detector in
[30]. Often a demand to stream-based learners is the any-time property, i.e. always
being able to classify incoming examples (e.g. [31, 32]). Thus, [27, 30] incorporate
Naive Bayes (NB), which takes over classification in scenarios, where no appropri-
ate rule exist in the rule set. An algorithm explicitly relaxing the any-time capability
is eRules [33], which enhances the well-known batch learner PRISM [34]. During
training, it buffers incoming examples that are unclassifiable by the existing rule set
and triggers PRISM once a user-defined threshold is reached. As such, classification
is abstained when no appropriate rule exists or eRules is uncertain. This approach is
further improved through its successor G-eRules [35] since eRules performs poorly
when confronted with continuous data. The most recent rule classifier compiled by
the authors of [33, 35] is called Hoeffding Rules [36], which incorporates the Hoeffd-
ing bound as a statistical measure to determine the number of examples required to
stimulate the production of new decision rules. The latest bottom-up rule approach
coping with drifts is the any-time algorithm RILL [37], whose induction strategy is
based on distance measures to find nearest rules. Furthermore, it utilizes intensive
pruning only keeping most essential information.
In contrast to the presented approaches, our incremental rule-based learner
exploits VPRS and is designed for in-database applications. Additionally, it adopts
the idea of [33] relaxing the any-time requirement. Particularly, the latter point is
very beneficial for real-world scenarios that require reliable classification capabilities
supporting decision-makers.

4 In-Database Variable Precision Rough Set Model

This section is based on [15] and formally brings VPRS to the domain of rela-
tional DBs. First, we discuss how to express an IS and DS in DB terminology
(Sect. 4.1) and introduce required relational operations to port the indiscernibility
relation (Sect. 4.2). Ultimately, a redefinition of the β-approximation is provided
permitting to compute VPRS inside DBs (Sect. 4.3).

4.1 Information Systems and Database Tables

An IS is a data structure that naturally corresponds to a data table in relational terms. However, essential differences can be identified when focusing on their different scopes [38]. While an IS is used to discover patterns in a snapshot fashion, the philosophy of a table is to serve as a repository for long-term data storage and retrieval respectively [39]. We try to overcome these contextual deviations by simply assembling an IS or DS to the relational domain considering the following: Let ⟨U, A, D⟩ be given with U = {x_1, ..., x_n}, the features A = {a_1, ..., a_m} and the decisions D = {d_1, ..., d_p}, n, m, p ∈ ℕ; then we use the traditional notation of an (m + p)-ary DB relation

$$T \subseteq V_{a_1} \times \cdots \times V_{a_m} \times V_{d_1} \times \cdots \times V_{d_p}, \qquad (8)$$

where $V_{a_i}$ and $V_{d_j}$ are the attribute domains of $a_i$, 1 ≤ i ≤ m, and $d_j$, 1 ≤ j ≤ p, which conforms with the definition of a DS intuitively (see Sect. 2.1). Additionally, we permit T to hold duplicated tuples, i.e. fulfilling multiset semantics, and write T⟨a_1,...,a_m,d_1,...,d_p⟩ or T⟨A,D⟩ to indicate T with its underlying attribute schema. In terms of a conventional ⟨U, A⟩, we indicate the corresponding data table by T⟨a_1,...,a_m⟩ or T⟨A⟩ respectively.

4.2 Indiscernibility and Relational Operations

Given the definition of a data table T⟨A⟩ from the previous section, such a relation can also be the result of any of the following algebraic operations: projection π, selection σ, grouping G and joining ⋈. In more detail, π_B(T⟨A⟩) projects tuples t ∈ T⟨A⟩ to a specified feature subset B ⊆ A while removing duplicates. A projection without duplicate elimination is indicated by π⁺_B(T⟨A⟩). Note, we further permit attribute modifications during the projection through simple assignments or arithmetic operations. An illustrative example is given by π⁺_{3→x, x→y, x+y→z}(T⟨x,y⟩), where x is assigned the value 3, y is allocated with x, and the new attribute z holds the sum of x and y respectively. Filtering tuples is performed via σ_φ(T⟨A⟩). It essentially removes those t ∈ T⟨A⟩ not fulfilling condition φ and keeps the original schema A. The grouping operator G_{F,G,B}(T⟨A⟩) groups tuples of T⟨A⟩ according to the attributes G and applies the aggregation functions F = {f_1, ..., f_r}, r ∈ ℕ₀, while the output schema of G corresponds to ⟨F, B⟩ with B ⊆ G ⊆ A. In this respect, we have for F = ∅ and G = B: G_{F,G,B}(T⟨A⟩) ≡ π_B(T⟨A⟩). That given, we are able to define the indiscernibility relation based on extended relational algebra. For our purpose, we simply count the number of members in each elementary class of a given table T⟨A⟩, i.e. the cardinality expressed by the aggregate count, and include it as a new feature c. Consolidated, we make use of the following notation

$$\tilde{G}^{G}_{c,B}(T_{\langle A \rangle}) := \rho_{\langle c, b_1, \ldots, b_m \rangle}(G_{\{count\},G,B}(T_{\langle A \rangle})), \qquad (9)$$

with the ρ-operator renaming the attribute count to c and B = {b_1, ..., b_m} ⊆ G ⊆ A, resulting in the respective output schema ⟨c, b_1, ..., b_m⟩. Furthermore, our model is based upon the fusion of relations, which is sufficiently provided by the natural join operator ⋈. It assembles two tables S⟨W⟩ and T⟨H⟩, indicated by S⟨W⟩ ⋈ T⟨H⟩. The result of this expression is a new relation R such that b_S = b_T, ∀b ∈ W ∩ H. Note, R's schema consists of all features in W and H, where equal attributes are shown only once.

4.3 Computing Variable Precision Rough Sets

Having discussed the mapping of an IS and DS respectively, alongside the indiscernibility relation, from a DB perspective, this section transfers the β-approximation to the domain of DBs in two phases. First, we restructure Definitions 1–5 into rewritten set-oriented expressions and show that these are no extensions to Ziarko's model but equivalent terms, given through Propositions 1 and 2. These propositions can be ported to relational algebra intuitively, and hence Theorems 1 and 2 can be obtained in the second stage, representing a compliant in-database VPRS model. To point out the practical efficiency of the resulting model, Theorem 3 is presented and briefly discussed.

Proposition 1 Let ⟨U, A⟩ and B ⊆ A. For any X ⊆ U and a fixed β ∈ [0, 0.5), the β-approximation of X can be described by

$$\bigcup \{K \in U/B \mid \exists H \in X/B : \phi\}, \qquad (10)$$

where the condition φ is defined as

$$\phi : \begin{cases} c(K, H) \le \beta, & \text{for } \underline{X}_{B,\beta} \\ c(K, H) < 1 - \beta, & \text{for } \overline{X}_{B,\beta} \\ \beta < c(K, H) < 1 - \beta, & \text{for } \widetilde{X}_{B,\beta}. \end{cases} \qquad (11)$$

Proof We have to compare classes K ∈ U/B which have elements in X ⊆ U with classes H ∈ X/B. Because of X ⊆ U, we obtain for K ∩ X ≠ ∅: K ∩ X = H and thus c(K, X) = 1 − |K ∩ X|/|K| = 1 − |H|/|K| = 1 − |K ∩ H|/|K| = c(K, H). It follows: c(K, X) = c(K, H) ≤ β, which is proposed by $\underline{X}_{B,\beta}$. Likewise, we can show that c(K, X) < 1 − β is equivalent to c(K, H) < 1 − β, holding for $\overline{X}_{B,\beta}$. From those two justifications, we can deduce $\widetilde{X}_{B,\beta}$ immediately. □

Proposition 2 Let ⟨U, A, D⟩, B ⊆ A and E ⊆ D. For any fixed β ∈ [0, 0.5), the β-regions $POS_{B,E,\beta}$ and $BND_{B,E,\beta}$ can be described by

$$\bigcup \{K \in U/B \mid \exists H \in U/(B \cup E) : \phi\}, \qquad (12)$$

where the condition φ is defined as

$$\phi : \begin{cases} c(K, H) \le \beta, & \text{for } POS_{B,E,\beta} \\ \beta < c(K, H) < 1 - \beta, & \text{for } BND_{B,E,\beta}. \end{cases} \qquad (13)$$

Proof Exploiting the equality {H ∈ X/B | X ∈ U/E} = U/(B ∪ E), we conclude Proposition 2 directly from Proposition 1. □

Theorem 1 Let T⟨A⟩, B ⊆ A, β ∈ [0, 0.5) and let the target concept C⟨A⟩ be a subset of T. We can compute the β-lower ($L_{B,\beta}(T, C)$), β-upper ($U_{B,\beta}(T, C)$) and β-boundary approximation ($B_{B,\beta}(T, C)$) of C using the relational operations

$$\pi^+_{c_t, b_1, \ldots, b_m}\big(\sigma_\phi\big(\tilde{G}^B_{c_t,B}(T) \bowtie \tilde{G}^B_{c_p,B}(C)\big)\big), \qquad (14)$$

where the condition φ is defined as

$$\phi : \begin{cases} 1 - \frac{c_p}{c_t} \le \beta, & \text{for } \underline{X}_{B,\beta} \\ 1 - \frac{c_p}{c_t} < 1 - \beta, & \text{for } \overline{X}_{B,\beta} \\ \beta < 1 - \frac{c_p}{c_t} < 1 - \beta, & \text{for } \widetilde{X}_{B,\beta}. \end{cases} \qquad (15)$$

Theorem 2 Let T⟨A,D⟩, B ⊆ A, E ⊆ D and β ∈ [0, 0.5). The β-positive region ($L_{B,E,\beta}(T)$) and β-boundary region ($B_{B,E,\beta}(T)$) can be computed by

$$\pi_{c_t, b_1, \ldots, b_m}\big(\sigma_\phi\big(\tilde{G}^B_{c_t,B}(T) \bowtie \tilde{G}^{B \cup E}_{c_p,B}(T)\big)\big), \qquad (16)$$

where the condition φ is defined as

$$\phi : \begin{cases} 1 - \frac{c_p}{c_t} \le \beta, & \text{for } POS_{B,E,\beta} \\ \beta < 1 - \frac{c_p}{c_t} < 1 - \beta, & \text{for } BND_{B,E,\beta}. \end{cases} \qquad (17)$$

Theorem 3 The in-database VPRS model based on extended relational algebra given through Theorems 1 and 2 can be computed in O(nm), where n is the number of tuples and m the number of attributes.

Proof The grouping (G) and projection (π) can be implemented using hash aggregations, which require nm time for either operation. Therefore, the comparison (⋈) of both partitions utilizing the hash join algorithm results in 4nm. At most, the selection (σ) requires a sequential scan, followed by the final projection (π). Thus, six subsequent scans need to be performed overall, which is O(nm). □

Theorem 3 relies on adequate hash algorithms as provided by most conventional DB engines such as Oracle (https://www.oracle.com/database/enterprise-edition/), PostgreSQL (https://www.postgresql.org/) or SQL Server (https://www.microsoft.com/sql-server/). Additionally, it assumes that a collision-resistant hash function and sufficient main memory are available to accomplish the computation. We should further state that using Theorem 3 also enables a high degree of parallelism, either given through a single-node or a distributed DB.
Note, respective corollaries can be derived from Theorems 1 and 2 for the task of feature selection in particular, i.e. seeking core and reducts in relational environments. For the sake of completeness, the reader is referred to [15], which treats this subject in more detail.
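As a minimal sketch of how (16) with the POS condition of (17) translates to plain SQL, the following Python snippet runs the grouping-and-join query on an ad-hoc SQLite table; the table and column names are ours:

```python
import sqlite3

# Ad-hoc decision table with condition attributes B = {b1, b2} and
# decision d; the query below mirrors Theorem 2 for the POS region.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (b1 INT, b2 INT, d INT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(0, 1, 0), (0, 1, 0), (0, 1, 1), (1, 0, 1)])

beta = 0.34
pos = con.execute("""
    SELECT g.ct, g.b1, g.b2
    FROM  (SELECT b1, b2, COUNT(*) AS ct
           FROM t GROUP BY b1, b2) AS g                 -- U/B with counts
    JOIN  (SELECT b1, b2, d, COUNT(*) AS cp
           FROM t GROUP BY b1, b2, d) AS p              -- U/(B u E)
          ON g.b1 = p.b1 AND g.b2 = p.b2
    WHERE 1.0 - CAST(p.cp AS REAL) / g.ct <= ?          -- phi for POS
""", (beta,)).fetchall()
print(pos)   # equivalence classes classifiable with certainty w.r.t. beta
```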

5 In-Database Rule Learning

Based on the theoretical considerations from the previous section describing VPRS in terms of DB terminology, in this section we introduce InDBR as a new in-database rule learner. Therefore, we discuss important notations for rules (Sect. 5.1) alongside an approach to handle data imbalance (Sect. 5.2). These aspects are finally incorporated into InDBR as an essential part of its learning strategy (Sect. 5.3).

5.1 Rule Representation and Properties

Unlike other ML algorithms, the ultimate goal of rule-based learning is to induce a predictive model consisting of expressive rules, which provide transparency to decision-makers. The current version of our approach operates on nominal data exclusively and produces a set of decision rules in propositional form, represented through a DB relation. Such a rule is illustrated by

$$a_1 = v_{a_1} \wedge \cdots \wedge a_n = v_{a_n} \rightarrow d = v_d. \qquad (18)$$

The left part of the rule is the descriptor or condition, and the right part poses the conclusion or consequent. The descriptor comprises a conjunction of literals a = v_a, denoting a logical test whether attribute a has the value v_a ∈ V_a. In case the entire conjunction holds true, the rule is applicable and returns the corresponding conclusion. (We defined the rule conclusion in (18) over a single decision attribute out of simplicity; for this reason, we consequently restrict further related formalism to one decision attribute only, w.l.o.g.) This way, a rule can be understood intuitively as follows: if condition then consequent. Further important characteristics of a rule are concerned with its

length and coverage. Given an arbitrary rule r, we define len(r) to be the number of literals constituting the descriptor, while cov(r) denotes the set of examples covered by r. In this context, a rule r is said to cover an example e if all literals in the descriptor hold true on e. To compare rules, we simply use set-theoretic operations. As such, a rule r is said to be more general than another rule r′ if its coverage is equal to or beyond the coverage of r′, i.e. cov(r) ⊇ cov(r′). In order to assess the coverage in terms of r's classification ability, cov_p(r) and cov_n(r) are essential, indicating the positive and negative coverage respectively. Thus, cov_p contains the set of covered examples satisfying r's consequent and cov_n those where the conclusion fails. Combined, we are able to introduce the error δ of a rule r given through

$$\delta(r) = 1 - \frac{|cov_p(r)|}{|cov_p(r) \cup cov_n(r)|} = 1 - \frac{|cov_p(r)|}{|cov(r)|}. \qquad (19)$$

Note, since induced rules are stored in a relation with fixed schema, compact rules with a length smaller than the schema require special treatment. For this purpose, and to be able to perform relational operations on rules in a unified fashion, we allow the rule set to be incomplete, i.e. permitting null values (see Sect. 2.1). In this regard, the length of a rule is determined by all properly set literals a = v except those where a = ⊥.
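The fixed-schema representation with null literals can be mimicked outside the DB as follows; the dict layout and helper names are our illustration, not InDBR's actual relational schema:

```python
def rule_len(descriptor):
    """len(r): literals actually set (None plays the role of the null)."""
    return sum(v is not None for v in descriptor.values())

def covers(descriptor, example):
    """r covers e iff every non-null literal a = v holds on e."""
    return all(example[a] == v
               for a, v in descriptor.items() if v is not None)

def error(rule, examples):
    """delta(r), Eq. (19): fraction of covered examples failing the conclusion."""
    cov = [e for e in examples if covers(rule["if"], e)]
    cov_p = [e for e in cov if e[rule["d"]] == rule["then"]]
    return 1 - len(cov_p) / len(cov) if cov else 0.0

# hypothetical rule over schema (a1, a2, d): "a1 = 0 -> d = 0"
rule = {"if": {"a1": 0, "a2": None}, "d": "d", "then": 0}
data = [{"a1": 0, "a2": 1, "d": 0}, {"a1": 0, "a2": 1, "d": 1}]
print(rule_len(rule["if"]), error(rule, data))   # 1 0.5
```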
The induction of our method is guided by the efficient relational representations of the β-regions L and B introduced in Sect. 4.3. Both queries, however, suppress the decision attribute, which requires additional steps to expose final rules on a given input relation T_{A,{d}}. These steps are computed as follows:

$\omega(Q_{B,\{d\},\beta}(T)) := \pi_{c_p^+,\; c - c_p \to c_n,\; b_1,\ldots,b_m,\; h_1,\ldots,h_q,\; d}\big(\tilde{G}^{\{c,\,b_1,\ldots,b_m,\,h_1,\ldots,h_q,\,d\}}_{c_p^+,\;\{c,\,b_1,\ldots,b_m,\,h_1,\ldots,h_q,\,d\}}\big(\pi_{c,\; b_1,\ldots,b_m,\; \bot \to h_1,\ldots,\bot \to h_q,\; d}\big(Q_{B,\{d\},\beta}(T_{A,\{d\}}) \bowtie T_{A,\{d\}}\big)\big)\big)$ ,  (20)
where Q stands either for L or B, B = {b_1, . . . , b_m}, A \ B = {h_1, . . . , h_q} and m, q ∈ ℕ. ω extracts rules controlled by B ⊆ A and precision β. In addition to descriptor and conclusion, the final schema of ω also contains statistics of each particular rule r ∈ ω(Q_{B,{d},β}(T)) through the attributes c_p and c_n, which correspond to cov_p(r) and cov_n(r) respectively. Considering the extraction of certain rules via ω(L_{B,{d},β}(T)), several rules may exist with different decisions but the same descriptor, as a result of the admissible error β per equivalence class. This is critical, as it causes inconsistency even when one of the conflicting rules positively covers the majority of examples. Therefore, we are only interested in those rules r ∈ ω(L_{B,{d},β}(T)) maximizing cov_p(r) per conflict, and in those without conflicts. Conversely, rules produced by ω(B_{B,{d},β}(T)) are quite uncertain from the current processing point of view, because none of them provides sufficient evidence w.r.t. β on the input T. However, such rules may be of interest in future decision-making. Thus, our approach induces and maintains both types of rules, which is discussed further in Sect. 5.3.
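The relational machinery of (20) essentially groups the input by the condition attributes and counts positive and negative coverage per equivalence class. As a rough, non-relational approximation of what ω(L) and ω(B) deliver (the function below is a hypothetical sketch for intuition, not the actual DB query), one may write:

```python
from collections import Counter

def extract_rules(data, B, d="d", beta=0.2):
    """Group the input by the condition attributes B and count decisions per
    equivalence class. The majority rule of a class is emitted as 'certain' if
    its error is <= beta (roughly the beta-positive region); otherwise every
    decision of that class is emitted as an 'uncertain' (boundary) rule."""
    classes = {}
    for e in data:
        key = tuple(e[a] for a in B)
        classes.setdefault(key, Counter())[e[d]] += 1
    certain, uncertain = [], []
    for key, counts in classes.items():
        n = sum(counts.values())
        vd, c_p = counts.most_common(1)[0]
        if 1 - c_p / n <= beta:
            # keep only the conflict-maximizing (majority) rule per class
            certain.append((dict(zip(B, key)), vd, c_p, n - c_p))
        else:
            # no decision reaches sufficient evidence w.r.t. beta
            for vd, c_p in counts.items():
                uncertain.append((dict(zip(B, key)), vd, c_p, n - c_p))
    return certain, uncertain
```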

5.2 Partial Memory and Data Imbalance

In conventional settings where the data distribution is invariant w.r.t. changes, learning predictive models from a static source of data is state-of-the-art. In nonstationary environments, however, existing models become outdated, as the assumed conditions they were trained on are no longer valid. Thus, more dynamics in terms of the learner's visibility are essential. In this context, it is common practice in incremental learning to utilize a sliding window or micro batches as partial memory to serve the underlying induction process. While these approaches are straightforward and ensure training on the most recent information representing the current trends in the data, they are incapable of handling situations where the class distribution appears to be skewed. Besides concept drifts, this phenomenon frequently occurs in a number of critical applications, including intrusion detection, fraud or customer churn discovery, and poses a crucial concern for many learning algorithms that typically bias towards the majority class. Generally, learning in such a setting is known as the "class imbalance problem".
To counteract this issue in nonstationary environments, we propose a new approach that relies not on one but on k ∈ ℕ sliding windows, where k is the number of expected concepts to learn, each with a predefined window size w ∈ ℕ. Consequently, the partial memory maintains at most kw examples and keeps instances of the minority classes much longer compared to those from the majority classes. This, in turn, constitutes a natural under-sampling technique for majority examples and provides a balanced representation for the induction process once all windows are filled accordingly. The concept of this approach is illustrated in Fig. 1, where W = {W_1, . . . , W_k} are the sliding windows of the proposed data structure, each of size w.
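A compact sketch of this partial memory is given below, here keyed by class label as a proxy for the expected concepts (an assumption on our side for illustration; InDBR itself realizes the windows as DB tables):

```python
from collections import defaultdict, deque

class PartialMemory:
    """k bounded FIFO windows, one per observed class label. With k labels and
    window size w, at most k*w examples are retained; minority-class examples
    survive longer than majority ones, which acts as a natural under-sampling
    of the majority classes."""

    def __init__(self, w: int):
        self.w = w
        self.windows = defaultdict(lambda: deque(maxlen=w))

    def append(self, example: dict, d: str = "d") -> None:
        # deque(maxlen=w) silently evicts the oldest entry of that window.
        self.windows[example[d]].append(example)

    def contents(self) -> list:
        """Flatten all windows into the training view used for rule induction."""
        return [e for win in self.windows.values() for e in win]
```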

5.3 Incremental Rule Inducer

Taking the previous considerations into account, we present the incremental in-database rule inducer InDBR from a conceptual perspective in this section, concentrating on its training and generalization procedures, respectively.

Fig. 1 Proposed partial memory consisting of k sliding windows, each of size w



Algorithm 1 InDBR (Training)

Predefined settings: A (condition attributes), d (decision attribute), α_d (max. age of rules per class), β (error tolerance), v (batch size), m (hop coverage constant), g (function determining the percentage of rules to generalize), R (rule set), W (sliding window)
Input: V (incoming batch of maximum size v)
Output: R
1: BEGIN
2:   append V to W                                /*if the buffer exceeds its capacity, remove the oldest entries*/
3:   update statistics of R w.r.t. W
4:   R ← generalize rule set w.r.t. β, m, g, R, W
5:   P ← {r ∈ R | cov_p(r) ∩ V ≠ ∅ ∧ δ(r) ≤ β}    /*active rules positively covering the batch*/
6:   U ← V \ ⋃{cov(r) | δ(r) ≤ β, r ∈ R}          /*determine uncovered examples*/
7:   L ← ω(L_{A,{d},β}(U))                        /*all new active rules*/
8:   B ← ω(B_{A,{d},β}(U))                        /*all new inactive rules*/
9:   R ← R ∪ L ∪ B                                /*append new rules to R*/
10:  reset age α(r) for r ∈ P according to α_d
11:  increment α(r) for r ∈ R \ (P ∪ L ∪ B)
12:  R ← R \ {r ∈ R | α(r) exceeds α_d}           /*retire antiquated rules*/
13: END

In a nutshell, InDBR utilizes incoming training examples to further generalize the existing rule set based on a novel bottom-up generalization strategy exploiting VPRS. Remaining examples still not covered after generalization are turned into most specific rules, ensuring complete coverage from the current point of view. Finally, InDBR keeps track of its model quality by pruning old or unused rules. Due to this training cycle, the expressiveness of InDBR's predictive model evolves over time as new training data arrive, while its complexity is kept low by focusing on the most recent input. These are important characteristics for quickly reacting to abrupt or gradual changes in the underlying concepts to learn. The internals of InDBR are presented in Algorithm 1, which is shown as pseudocode to facilitate readability rather than providing complex DB statements. However, one can verify that its complete translation to the domain of DBs can be carried out straightforwardly. To further detail the functional operations, we categorize the training procedure into four main steps:

(i) consolidate incoming data (lines 2–3)
(ii) generalize the existing rule set (line 4)
(iii) extract new rules (lines 5–9)
(iv) maintain rule aging (lines 10–12).

Step (i) refers to the handling of incoming examples. These are provided by a relation V serving as interface. Thereby, InDBR supports two types of input processing, i.e. example-by-example or a batch of training data. On the one hand, this permits comparability with other approaches, as most related work operates on data streams, processing each arriving training example sequentially. On the other hand, relational DBs generally show better performance when confronted with a batch of data, taking advantage of parallel DB operations. Thus, the input to InDBR can be adjusted according to different scenarios, which is controlled by the parameter v.

Furthermore, InDBR utilizes a buffer for incoming data acting as partial memory (see Sect. 5.2), which is implemented using a conventional table W. Since W is fundamental for inferring new rules and generalizing existing ones, InDBR's current rule set R needs to be refreshed as new data arrive, due to potentially outdated statistics.
The next stage (ii) is concerned with the generalization of existing rules, depicted in Algorithm 2. It partitions the entire rule set according to rule length j into disjoint subsets R_j. These are iteratively processed in ascending order to retrieve attribute sets of more general rule candidates. In order to obtain such rules in the current iteration 2 ≤ j ≤ t, we define a function g : ℕ → [0, 1] that determines the percentage of rules to generalize according to the length j. Having selected such a proportion K_R ⊆ R_j of size g(j) · |R_j| at random, "dropping conditions" is carried out as a well-established generalization strategy to seek new rules [40]. In essence, it stepwise drops literals from existing rules to retrieve more general ones. In our case, the heuristic is guided by two measures, i.e. cov and δ, such that cov(r̂) ⊃ cov(r) and δ(r̂) ≤ β hold for an arbitrary parent rule r and its potential successor rule r̂ with len(r̂) = len(r) − 1. Utilizing this approach, truly more general rules are retrieved, which on the one hand may produce a higher error in comparison to their predecessors. On the other hand, such rules can also be seen as more tolerant, increasing their range for unseen examples. As a consequence, we not only obtain new rules, but may also obtain interesting attribute sets from these rules that can be valuable to extract further generalizations for rules r ∈ R_l, l > j, not considered yet. These are stored within AC ⊆ P(A) and further processed using VPRS. In particular, InDBR leverages the partition induced by B using ω(L) and W for all B ∈ AC, efficiently exposing new rules while disregarding examples already covered in previous iterations. By definition, those rules are certain, as they are based on the β-positive region, and could directly replace their more specific predecessors. However, there is a high risk of overgeneralization. To get over this dilemma, InDBR introduces the parameter m that only permits such abrupt generalizations if a new rule r̃ provides sufficient evidence w.r.t. W and its predecessor rule r, i.e. |cov(r̃)| ≥ |cov(r)| + m. This way, the generalization routine exploiting VPRS continues with the next iteration j + 1 until all granularity levels have been explored systematically. As a parent r can have multiple child rules r̂, a final step is to determine the best descendant among those r̂. The query covering this task seeks rules with maximum purity 1 − δ(r̂) and highest coverage cov(r̂), where the age α of that particular rule is set to the smallest among the r̂. Ultimately, these best rules are appended to the rule set R, while their parents are dropped.
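One step of the dropping-conditions heuristic can be sketched as follows, reusing the hypothetical Rule and error helpers from the Sect. 5.1 sketch; a candidate survives only if it is truly more general and still satisfies the error tolerance β:

```python
def drop_conditions(rule: Rule, data: list, beta: float, d: str = "d") -> list:
    """Remove a single literal at a time and keep each candidate r_hat that
    covers strictly more examples than its parent while its error stays
    within the tolerance beta."""
    base = sum(1 for e in data if rule.covers(e))
    candidates = []
    for attr in [a for a, v in rule.conditions.items() if v is not None]:
        conds = dict(rule.conditions)
        conds[attr] = None                    # drop one literal: len shrinks by 1
        r_hat = Rule(conds, rule.decision)
        cov = sum(1 for e in data if r_hat.covers(e))
        # Relaxing a literal can only enlarge the cover, so a strictly larger
        # count already implies cov(r_hat) ⊃ cov(r).
        if cov > base and error(r_hat, data, d) <= beta:
            candidates.append(r_hat)
    return candidates
```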
As briefly mentioned in the previous sections, InDBR also features the extraction of uncertain rules, i.e. those exhibiting an error β < δ < 1 − β. Obviously, such rules cannot be used for the classification of incoming examples. However, they might become valuable in future situations as the data evolve. Thus, we refer to active and inactive rules in this specific context. Both types of rules are extracted in stage (iii) based on all examples U still not covered by existing rules. Therefore, InDBR makes use of ω(L) and ω(B) w.r.t. U and the entire feature set, inducing most specific active and inactive decision rules.

Algorithm 2 InDBR (Generalizing)

Input: β (error tolerance), m (hop coverage constant), g (function determining the percentage of rules to generalize), R (rule set), W (sliding window)
Output: R (generalized rule set)
1: BEGIN
2:   PR ← {R_j | j = 2, . . . , t} with R_j = {r ∈ R | len(r) = j}   /*partition the rule set by length*/
3:   Ĉ_r ← ∅, ∀r ∈ R       /*initialize new rule groups for rules more general than r*/
4:   Ĉ_r ← {r}, ∀r ∈ R_1   /*initialize the most general rule group*/
5:   Č ← ∅                 /*initialize the set of best rules per rule group*/
6:   FOR j = 2, . . . , t LOOP
7:     K_R ← randomly select g(j) percent of the rules in R_j ∈ PR
8:     Ĉ_r ← {r̂}, ∀r ∈ K_R, where r̂ is a new rule resulting from the dropping-conditions heuristic such that cov(r̂) ⊃ cov(r) and δ(r̂) ≤ β are fulfilled
9:     AC ← collection of attribute sets from the rules r̂ compiled in the previous step
10:    FOR B ∈ AC LOOP
11:      R̃ ← ω(L_{B,{d},β}(W \ ⋃_{Ĉ_r} ⋃_{r̂∈Ĉ_r} cov(r̂)))   /*induce new consistent rules with length j − 1 from W, excluding examples already covered by some r̂ ∈ Ĉ_r*/
12:      Ĉ_r ← Ĉ_r ∪ {r̃}, ∀r̃ ∈ R̃, r ∈ R_l with l > j, where cov(r̃) ⊇ cov(r) and |cov(r̃)| ≥ |cov(r)| + m
13:      R_l ← R_l \ {r}, ∀r ∈ R_l with l > j, where r are the ancestors of the r̃ ∈ R̃ from the previous step
14:    END FOR
15:  END FOR
16:  Č ← Č ∪ {ř}, ∀Ĉ_r, where ř is the best rule in Ĉ_r w.r.t. 1 − δ(r̂) and cov(r̂) among all r̂ ∈ Ĉ_r, with α(ř) = min(α(r̂))
17:  R ← R \ {r ∈ R | Ĉ_r ≠ ∅}   /*remove all obsolete rules from the rule set*/
18:  R ← R ∪ Č                   /*append the new, more general rules to the rule set*/
19: END

The final step (iv) during incremental learning takes care of rule aging. It refreshes the age of those rules that correctly hit at least one of the examples in V. The age of all other rules is incremented. Once a rule exceeds its corresponding maximum age defined per class in α_d, it is removed from R. On the one hand, this ensures that certain rules that were not hit over a longer period are dropped, which indicates outdated knowledge due to a potential shift in the underlying concept. On the other hand, antiquated uncertain rules may be the result of data noise or an ongoing gradual drift.
When it comes to the classification behavior of an incremental rule learner, one of its frequent drawbacks is the inability to cover the entire data space, as opposed to other learners such as most decision trees. This conflicts with strict demands where algorithms should be able to predict at any time (see Sect. 3.2). Rule learners satisfying such requirements frequently accept a poorer accuracy or use a specific strategy to compensate for this issue. Two common techniques are the introduction of default rules or the orchestration of an additional predictor with any-time properties, which is trained in parallel. Both points are crucial for many real-world applications requiring quality predictions, fully reproducible by decision-makers, over the any-time property. Two examples are medical diagnostics and network security. Emphasizing the latter, two practical issues can be identified: (i) Considering the huge amounts of network traffic to monitor, producing false alarms as a result of a weak predictive model can easily overwhelm security analysts from an operational point of view, resulting in a loss of trustworthiness. (ii) Once a prediction is made, it should ideally be transparent, represented through a meaningful pattern highlighting the case. Neither of these points can be addressed by incorporating default rules, which typically reflect nothing more than the class distribution, nor by embedding an additional stable ML model, which generally must be assumed to be a black box. In turn, this results in unexplainable alarms not supporting the necessary follow-up activities required to safeguard the integrity of the network landscape. Thus, we argue that the rule engine should only fire when it is certain, i.e. when quality and unambiguous rules exist. InDBR addresses these concerns by abstaining from classification explicitly in cases where no adequate rule is present or where it is uncertain about an upcoming decision, which is in line with the opinion of other authors (e.g. [33, 35, 36]). Therefore, classification relies only on the most certain rules, given through the active rule set of InDBR, i.e.

$A_R = \{r \in R \mid \delta(r) \leq \beta\}$ .  (21)
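A minimal sketch of this abstaining classification is given below, assuming for illustration that each rule object carries an up-to-date error estimate delta, maintained against the partial memory after every batch:

```python
def classify(example: dict, rules: list, beta: float):
    """Abstaining prediction in the spirit of (21): consult only active rules,
    i.e. those with rule.delta <= beta, and return None (an explicit
    abstention) when no active rule fires or when firing rules disagree."""
    firing = {r.decision for r in rules
              if r.delta <= beta and r.covers(example)}   # active rule set A_R
    return firing.pop() if len(firing) == 1 else None
```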

6 Comparative Study

This section comprises experimental results evaluating the proposed rule inducer InDBR against other rule learners from different perspectives. First, we introduce the setup of the experiments and the employed data sets (Sect. 6.1). Then, the predictive capabilities of each rule learner are evaluated (Sect. 6.2), followed by an assessment of discovery-oriented aspects (Sect. 6.3).

6.1 Experimental Setup and Employed Data Sets

In order to provide a qualitative analysis of InDBR in nonstationary environments from various angles, a comparison with state-of-the-art rule-inducing algorithms is indispensable. Therefore, we use G-eRules and VFDR as two state-of-the-art top-down rule-based classifiers. We also tried to obtain the code of FACIL and RILL, representing closely related bottom-up approaches, as well as the implementations of Hoeffding Rules, AQ11-PM+WAH and FLORA. Unfortunately, these sources were unavailable at the time of writing despite all of our efforts. Yet, it is worth mentioning that G-eRules seems to be a pre-version of the recent classifier Hoeffding Rules, as stated by the authors.5 Examining these two learners alongside InDBR, the experiments were conducted on four synthetic data sets with sudden and gradual drift characteristics as well as on five data sets reflecting various real-world classification problems. Seven of these data sets could be obtained using the well-known

data stream mining framework MOA,6 given through built-in data generators and its designated repository. Two additional real-world tasks were downloaded separately from the hosting service GitHub.7 In what follows, we highlight the main characteristics of the employed data sets; a summary is given in Table 1.

5 https://github.com/thienle2401/G-eRules/ (commit: 45ff73a87a008c36730563cccb233055d6f54678)
6 http://moa.cms.waikato.ac.nz/
7 https://github.com/vlosing/driftDatasets/tree/master/realWorld/ (commit: 89f1665ed89af78caecabec62c680a57a4f16646)
Airline: 539.383 flight records with seven features are given in this particular data set, covering a nonstationary real-world problem [41]. Its task deals with flights that are delayed or on schedule, and it is often used to evaluate algorithms under drifting circumstances (e.g. [14, 30, 35]). In our experiments, we use the MOA version of the data set available at its corresponding website.
Electricity: This data set comprises data from the Australian New South Wales Electricity Market and is also frequently used as a benchmark for drifting environments (e.g. [37, 42]), as it expresses price dynamics of demand and supply. Each of the 45.312 records contains eight attributes, such as timestamp or market demand, and refers to a 30-minute period, whereas its problem is concerned with the relative price change within the last 24 hours. The majority class holds 58%, thus showing a tendency towards skewed classes. This data source was downloaded from the MOA website.
Rotating Hyperplane (RHP): This data generator was established in [43] and can be formalized as follows: given a d-dimensional space of uniformly distributed data points x, the hyperplane $\sum_{i=1}^{d} w_i x_i = w_0$ divides the points into the positive class if $\sum_{i=1}^{d} w_i x_i \geq w_0$ and into the negative class otherwise, where $x_i$ is the ith coordinate of x and $w_i$ is its corresponding weight. By altering $w_i$ with the probability of changing direction τ and magnitude c per x, i.e. $w_i = w_i + c\tau$, the orientation and position of the hyperplane can be manipulated, introducing drifting circumstances. We utilized MOA to generate two data sets with the parameters τ = 0.03, c = 0.1 and τ = 0.01, c = 0.1 to represent a long-lasting and a shorter gradual drift over 200.000 points with ten features, omitting noise. Note, the former also contains notions of local abrupt drifts.
Outdoor-Stream: This data set contains a collection of image sets recorded by an autonomous system in a garden environment and was first used in [44]. Each of the 4.000 records consists of 21 attributes representing ten images that were collected on obstacles from different perspectives and lighting conditions in temporal order. The task is to separate the records into 40 different categories, while the classes are evenly distributed. This real-world problem is available at the mentioned GitHub repository.
Poker-Hand: The challenge of this data set is to predict the poker hand out of five playing cards encoded by suit and rank, resulting in ten condition attributes per hand out of a standard 52-card deck. The problem contains 829.201 hands. A normalized version of it was downloaded from the MOA website without major modifications from our side. The class distribution is highly imbalanced, such that the eight smallest classes carry not more than 7.62% of the data. Note, even though


it is uncertain whether a drift is contained, this data set is commonly utilized to assess approaches in nonstationary environments (e.g. [14, 37, 42]).
Radial Basis Function (RBF): This data set generator is based on centroids randomly positioned in the data space, each with a class label and weight. New examples are generated and are more likely to be associated with the labels of centroids of higher weight. Their specific placement around the centroids happens in a normally distributed manner. Controlled by the velocity parameter v, the centroids move in the data space, shifting the decision boundary progressively. MOA was used to build a data set with these characteristics, consisting of 100.000 instances, ten features, v = 0.001 and two centroids.
SEA-Concepts: This data set was introduced in [45] and contains three features, of which only two are relevant for the classification problem, i.e. a_1 and a_2. It is characterized by an abrupt drift behavior given through the decision boundary, which is computed via different arithmetic functions applied to a_1 and a_2 over time. We modified this data set through MOA with three classification functions, removed noise and used the first 100.000 records of the data stream. In this binary decision problem, the majority class comprises 76% of the data.
Weather: Proposed in [46], this data set consists of 18.159 instances, each carrying eight meteorological features measured between 1949 and 1999 at an Air Force base in North America. Its classification task aims at predicting whether it is raining or not. The authors emphasize that, due to the long-term capturing process of 50 years, a realistic precipitation drift problem is comprised within the data set. Note, the class distribution is imbalanced, where roughly 69% of the ground truth exhibits no raining conditions. This data set was downloaded from the same location as Outdoor-Stream.
These brief details given, most benchmark data contained a mixture of numeric and nominal data types. Since InDBR's current version is incapable of handling numeric attributes with a continuous domain, we carefully discretized all relevant data sets. Note, this additional preprocessing step had no influence on their inherent drift characteristics, thus resulting in a fair comparison among the selected rule inducers. From an operational perspective, we should further note that InDBR is not directly comparable to VFDR and G-eRules, as it is designed for in-database applications, which generally prefer set processing over example-by-example treatment. This has consequences in terms of runtime and accuracy. Due to this mismatch, we omit deeper insights into the runtime behavior of the algorithms and focus on their effectiveness in terms of predictive and descriptive capability in this work. Creating equal conditions over the course of the experiments, InDBR was adjusted with a batch size of v = 1.

6.2 Predictive Capabilities

Table 1 Employed data sets to analyze concept drifts and class imbalance

Data set        #Records  #Attributes  #Classes  Imbalance  Type of drift
Airline         539.383   7            2         no         unknown
Electricity     45.312    8            2         (yes)      unknown
Outdoor-Stream  4.000     21           40        no         unknown
Poker-Hand      829.201   10           10        yes        unknown
RBF             100.000   10           2         yes        gradual
RHP (long)      200.000   10           2         no         abrupt/gradual
RHP (short)     200.000   10           2         no         gradual
SEA-Concepts    100.000   3            2         yes        abrupt
Weather         18.159    8            2         yes        unknown

Reviewing Sect. 5.2, class imbalance can be a huge concern for an ML algorithm. At the same time, this phenomenon causes trouble not only during training but also when assessing a learner's predictive capabilities. In such a setting, several popular

performance measures such as "accuracy", which showcases the ratio between correctly classified examples and all instances seen during evaluation, can be misleading. To highlight their inherent problem, let us assume a binary classification task where the positive class consists of roughly 3% of the data and the negative class holds 97%. In this and similar situations with skewed class distributions, a naive learner deeming all examples to fall into the negative class indeed features 97% accuracy. Obviously, this distorts the classification result, failing to indicate that 100% of the positive samples are predicted incorrectly. Thus, it is imperative to utilize a more sophisticated performance measure, given our experimental setup consisting of both balanced and imbalanced data (cf. Table 1). Countering this challenge, we make use of the "F1-score" in conjunction with two established scaling methods, i.e. "micro-averaging" and "macro-averaging". In this context, the F1-score refers to the harmonic mean of the two measures "precision" and "recall" that emerged from information retrieval, an area highly subject to class imbalance (e.g. [47, 48]). On that note, the micro-averaged F1-score (μF1) weights each classification equally during evaluation, and thus reflects conventional accuracy in a multiclass setting. In contrast, the macro-averaged F1-score (mF1) weights all classes evenly, permitting insights into the effectiveness of a classifier across classes. As a result, we obtain two indicators, rating both the overall classification performance in terms of correct predictions and illuminating a classifier's deficits on imbalanced data. Note, out of consistency we refer to these measures in their multiclass form, being aware that several data sets in our test environment in fact target binary classification tasks.
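The difference between the two scaling methods can be illustrated on a toy example, assuming scikit-learn is available; the labels below are made up for demonstration only:

```python
from sklearn.metrics import f1_score

y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]

micro = f1_score(y_true, y_pred, average="micro")  # equals multiclass accuracy
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
print(f"muF1 = {micro:.3f}, mF1 = {macro:.3f}")
```

Here the rare class "c" is always missed: μF1 barely notices, while mF1 is pulled down considerably.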
Unlike conventional evaluation methods for batch learning that rely either on hold-out test sets or cross-validation, estimating the predictive tendencies of an incremental decision model is a further challenge, because the model evolves over time and no explicit test data are available due to the continuous nature of the learning process. Common practice to determine the performance in such a setting is the predictive sequential (prequential) or interleaved-test-then-train technique, which first utilizes an incoming training example for testing, right before the sample is exerted to update the model (see [49, 50]). We adopt this idea and use a sliding window to incorporate an additional forgetting mechanism, favoring neither periods with high error due to drift nor long-lasting segments where the model was stable. This way, we measure the discussed F1-scores on micro- and macro-scale, providing pessimistic error estimates.
Both evaluation method and considered measures are applicable to all three classifiers in our experiments, but additional attention must be paid to two distinguishing concepts limiting a direct comparison, i.e. the abstaining characteristic and the any-time property. Recalling that InDBR and G-eRules explicitly disclose a level of abstaining, the any-time learner VFDR does not, at least from a superficial perspective. Yet, with the focus on rule learning in this chapter, a closer look at VFDR reveals that it indeed abstains when its rule engine is isolated from the built-in NB component. Following this perception, we meter VFDR's any-time capabilities together with its abstaining behavior. While the latter is directly comparable to the other rule inducers by definition, we enhance InDBR with an additional NB model turning it into an any-time learner, which finally yields comparable F1-scores among VFDR and InDBR. In all other cases (i.e. the isolated comparison of InDBR and G-eRules), we present the corresponding abstain rates as well as the tentative F1-scores (t-μF1 and t-mF1). In this context, t-μF1 and t-mF1 refer to G-eRules' and InDBR's predictive capabilities in cases where the rule inducers are confident about a decision.
The comparison of VFDR and InDBR in terms of μF1 revealed fairly similar results, without a notable difference except for two data sets, where InDBR's performance was better by 6.82% on Poker-Hand and 27.90% on Outdoor-Stream. Considering mF1, the observed results were inconspicuous. VFDR only outran our approach by 11.98% on RBF, but at the same time InDBR produced fewer misclassifications across the classes on Outdoor-Stream, surpassing VFDR by more than 24%. These findings were confirmed by a low average difference between the two learners (<4.5%) on all nine data sets for both F1-scores, indicating comparable performance in an any-time setting. When it comes to the abstaining behavior of both rule engines, however, different insights could be obtained. In eight cases, our proposed algorithm outperformed VFDR by more than 10%. Only on Poker-Hand could VFDR match up to InDBR. Overall, the abstaining behavior of InDBR averaged out at 16.30%, which is in stark contrast to the rates of VFDR, i.e. a mean abstain rate of 60.46% across all nine benchmarks. On Outdoor-Stream, VFDR delivered its weakest result, not supplying a single rule at all, showcasing its general drawback, i.e. a lazy rule induction particularly in the initial learning phase with few training examples available. This key observation is further stressed by Fig. 2a–c, where details about the abstaining behavior are illustrated over the complete incremental learning cycle on the data sets Electricity, Poker-Hand and SEA-Concepts.
Fig. 2 Abstaining behavior and predictive performance on selected data sets over the course of the incremental learning process: a–c provide insights into the individual abstain rates of VFDR, G-eRules and InDBR for the data sets Electricity, Poker-Hand and SEA-Concepts, while d–f showcase t-mF1 of G-eRules and InDBR on these benchmarks, all in consecutive order

Concerning the assessment of InDBR against its direct competitor G-eRules, results are twofold. While both pure rule engines share a nearly similar outcome for t-μF1 on five data sets with a mean difference of 3.13%, G-eRules produced poorer scores on the other four benchmarks with a discrepancy of 18.20%. These results are even more apparent reviewing t-mF1, where InDBR outperformed G-eRules on five

data sets by 18.48% on average. It is worth noting that out of these five classification tasks, four comprised class imbalance problems. Considering all imbalanced data sets, G-eRules produced an averaged t-mF1 of 64.98%, which was roughly 10% weaker than the outcome of InDBR, providing solid results given the degree of difficulty on these data sets. Details w.r.t. the prequential performance on three imbalanced data sets are depicted in Fig. 2d–f. The potential of InDBR becomes even more convincing when reviewing its performance on the multiclass problems, represented by Poker-Hand and Outdoor-Stream in our series of experiments. It achieved a mean t-mF1 of 82.83%, which was more than 33% better than the numbers produced by G-eRules. Combined, these points indicate that our approach in combination with the sliding windows is promising and provides more visibility over the course of the learning cycle. When it comes to the comparison of the abstain rates, both algorithms showed no compelling differences. The only notable disparity could be found on two data
sets, where InDBR showed a weaker performance on Poker-Hand, while G-eRules struggled on Outdoor-Stream, amounting to a tie overall between both rule learners.

Table 2 Prequential evaluation of the rule learners in percent using μF1, mF1, t-μF1, t-mF1 and abstain rate (abs) under concept drift. Bold numbers indicate the overall winner per row and performance measure

                 VFDR                    G-eRules                InDBR
Data set         μF1    mF1    abs      t-μF1  t-mF1  abs      μF1    mF1    t-μF1  t-mF1  abs
Airline          66.40  61.70  94.63    62.97  59.00  13.70    66.19  65.46  67.57  66.85  13.59
Electricity      78.88  78.62  62.55    76.24  75.36  16.76    80.27  79.61  84.59  83.61  19.39
Outdoor-Stream   55.43  58.13  100.00   57.01  54.13  42.20    83.33  82.33  95.33  94.67  27.00
Poker-Hand       78.66  63.85  32.37    77.32  44.48  21.84    85.48  63.45  94.88  70.99  29.74
RBF              90.04  85.26  76.85    83.92  63.98  8.02     86.86  73.28  86.00  64.95  11.98
RHP (long)       83.41  83.51  26.69    83.71  83.73  6.61     86.84  86.85  87.97  87.97  4.83
RHP (short)      81.61  81.62  22.80    79.64  79.66  4.30     80.52  80.56  81.29  81.32  3.26
SEA-Concepts     86.06  81.58  87.35    81.76  72.22  14.09    86.51  80.42  90.31  81.47  18.68
Weather          71.48  67.68  40.88    76.63  68.85  16.28    75.17  70.28  79.67  72.39  18.22
To formally verify the described tendencies towards statistical significance, we assessed the distribution of the F1-scores and the abstaining behavior utilizing the Friedman test [51] and the Wilcoxon test [52]. They entirely confirmed the described characteristics. Comparing VFDR and InDBR, neither result exposed a significant difference on μF1 or mF1. However, a distinction could be identified in their abstain rates, with a level of significance of α = 1% on both tests. In addition, both tests outlined a significant discrepancy contrasting the F1-scores of G-eRules and InDBR (α = 1%), while rating their abstaining to be fairly similar. On the one hand, these results manifest that InDBR is comparable to VFDR in terms of any-time classification, but provides more explainable predictions. On the other hand, both tests showcase InDBR's superiority over G-eRules regarding the tentative F1-scores, while maintaining a similar abstaining behavior in the given experimental setup. It is arguable that, due to the weak performance of VFDR and G-eRules on the unrepresentative Outdoor-Stream benchmark, results appear to be skewed; yet even ignoring this data set yields a significant difference with α = 1% on the Friedman test and α = 2% on the Wilcoxon test. The summary of all obtained results is presented in Table 2.
Motivated by the recent advances of cybercrime, we further extended the evaluation to study the classification capabilities in an adversarial environment, where an attacker is in the position to manipulate portions of the training data with the ultimate goal of disrupting the learning process. To simulate such a plausible situation, we adapted the nine data sets from the previous experiments, poisoning the class labels by distorting 15% of the values at random. This way, class noise is introduced, complicating the rule learning process. In the remainder of this section, we share initial results on the performance of the three rule learners towards their resilience to such manipulations.
Table 3 Prequential evaluation of the rule learners in percent with 15% class noise and concept drift using μF1, mF1, t-μF1, t-mF1 and abstain rate (abs). Bold numbers indicate the overall winner per row and performance measure

                     VFDR                    G-eRules                InDBR
Data set             μF1    mF1    abs      t-μF1  t-mF1  abs      μF1    mF1    t-μF1  t-mF1  abs
Airline-15           60.01  56.98  96.03    55.70  53.64  16.15    61.45  55.15  63.16  55.85  21.15
Electricity-15       69.41  68.77  60.78    63.83  63.54  22.60    71.00  70.43  74.52  73.27  25.59
Outdoor-Stream-15    45.45  42.51  100.00   42.97  38.41  47.51    63.00  58.25  71.75  65.25  27.75
Poker-Hand           68.30  21.43  27.62    58.29  16.61  20.80    69.19  19.60  76.46  41.69  31.84
RBF-15               74.98  68.86  70.91    68.95  57.19  16.58    75.48  64.23  74.98  59.96  18.27
RHP-15 (long)        73.37  73.44  28.49    73.16  73.20  14.51    75.48  75.45  76.37  76.38  7.29
RHP-15 (short)       71.92  71.96  29.04    72.43  72.51  15.04    72.38  72.38  73.79  73.79  9.89
SEA-Concepts-15      71.42  65.58  87.22    69.65  62.97  16.53    74.61  68.63  80.56  71.20  23.49
Weather-15           64.73  61.24  64.15    65.40  60.74  24.63    67.28  62.72  69.83  63.89  14.39

The intra-comparison among the biased data sets revealed very few changes in the performance tendencies of the three classifiers compared to the original test results. In more detail, the pairwise comparison of VFDR and InDBR exposed a mean difference of 2.58% considering the F1-scores and 42.73% examining the abstain rates, indicating no major discrepancies w.r.t. the previous results. However, the obtained numbers on μF1 demonstrate a win on all nine data sets for InDBR, underpinning a significant difference with agreement across the Friedman test (α = 1%) and the Wilcoxon test (α = 5%). Associating G-eRules and InDBR uncovered a combined deviation of 9.64% on both F1-scores, which is in line with the original assessment. Yet, a distinction could be observed in the abstaining behavior, as in three out of nine measurements G-eRules provided a better performance. Despite these results, no statistical significance could be determined, indicating no substantial differences. Comparing the outcomes of both series of experiments in an inter-assessment disclosed slightly weaker results for InDBR. On average, the F1-scores of VFDR decreased by 12.42%, while InDBR's performance dropped by 13.15%. At the same time, the abstain rates increased by 2.24% for VFDR and 3.36% for InDBR. Contrasting G-eRules and InDBR revealed similar results on the tentative F1-scores. While G-eRules collapsed by 11.74% on average, InDBR's t-μF1 and t-mF1 decreased by 12.73%. Turning to their abstaining behavior, the corresponding rates fell by 5.64% and 3.66% respectively. Based on that, we can deduce that even in the given adversarial setup, InDBR outperforms G-eRules on the F1-scores with no significant difference in the abstain rates. Furthermore, its any-time capabilities even increased w.r.t. VFDR, but also indicate slightly poorer results in an inter-comparison. However, this deficit is always below 2% per performance measure, constituting a rather marginal gap. The results of this setup are highlighted in Table 3.

6.3 Descriptive Capabilities

In contrast to conventional criteria assessing the predictive capabilities of a rule learner, as discussed in the previous section, further aspects concern the ability to characterize the quality of extracted patterns in the data from a descriptive perspective (see [53–55]). This includes the evaluation of patterns w.r.t. their interestingness or usefulness. However, metering these attributes can turn out to be highly subjective, and thus there is a lack of established and appropriate metrics [53]. To contend with these issues, we make a pragmatic attempt to gain insights into the quality of the rule sets produced by VFDR, G-eRules and InDBR and propose four measures addressing discovery-oriented aspects in drifting environments (a sketch computing them follows the list):

• Average rule set size: Similar to the depth of a decision tree, the rule set size provides an indicator of the overall model complexity. Commonly, a smaller rule set is preferred w.r.t. both computational demands and monitoring aspects.
• Average rule length: The length of a decision rule characterizes the simplicity of a pattern. Shorter rules refer to a more expressive pattern identified by the rule engine, thus permitting to assess the level of generality to cover potentially unseen examples.
• Average coverage: In an incremental setting, the coverage of a rule set is an indicator determining how well rules reflect arriving examples. This measure can be critical for decision-makers, as a low coverage signals poor adaptation and representation of the most recent data. In our experiments, the rule set coverage is metered w.r.t. a reference sliding window comprising a fixed size of the last 1000 examples.
• Average rule purity: Not only the complete coverage is of interest, but also the quality of an individual decision rule, which can be discovered via the rule purity, providing an intuition of its consistency and confidence. Therefore, the same concept as in the previous measure is applied, i.e. a sliding window holding the latest 1000 arriving examples.

The results of applying these criteria to the rule learners G-eRules, VFDR and InDBR are discussed in the remainder of this section. We used the nine benchmark data sets from Sect. 6.2 without class noise. Results disclosed that the any-time learner VFDR outperforms InDBR in terms of rule set size by a big margin: on average, it carries 94.81 fewer rules than InDBR. Yet, this outcome is not surprising given the high abstain rates uncovered earlier. Another finding is the weak result of G-eRules on this measure, which becomes even more evident when reviewing Fig. 3a, where its rule set size constantly grows over the course of the learning cycle on the Electricity data set. This behavior is in contrast to VFDR and InDBR, which produce a rather constant growth. The same characteristics could be observed on three other benchmark data sets, such that G-eRules holds on average more than 979.36 rules beyond InDBR. Considering the average rule length, G-eRules
performed much better, but was still behind VFDR by a mean difference of 3.06.

Fig. 3 Discovery-oriented quality on selected data sets over the course of the incremental learning process: a average rule set size on Electricity, b average rule length on Poker-Hand and c average rule purity on RHP (short)

Hence, our method was outrun, which we relate to the different induction strategies

implying a disadvantage for InDBR due to its bottom-up approach. In particular, the effect of the different concepts can be examined in Fig. 3b. Moreover, the experiments showed that rules created by InDBR possess a higher purity, well ahead of its competitors, in eight out of nine benchmark tests, such that the gap between VFDR and G-eRules compared to InDBR amounts to more than 25%. An example is depicted in Fig. 3c, where InDBR's purity is rather constant in comparison to the oscillating numbers provided by VFDR and G-eRules. Regarding the coverage, InDBR provides a solid outcome on five out of nine data sets, resulting in a mean coverage of 92.88%, while being outnumbered on the remaining benchmark tests. Combined, however, it covers 43.82% more incoming examples than VFDR and 2.20% fewer examples than G-eRules. Examining the win-loss analysis, InDBR won once and was eight times in second place w.r.t. the average rule set size. On the rule length, our method reached the first position once, was three times second and five times last, constituting its weakest result on the descriptive measures. By means of average rule purity, InDBR catered for eight wins and was once in second position, while it won five times, was twice second and twice third considering the average coverage (Table 4).

Table 4 Discovery-oriented aspects of the rule learners w.r.t. average rule set size (size), average rule length (len), average rule purity (pur) and average coverage (cov). Bold numbers indicate the overall winner per row and performance measure; pur and cov are expressed in percent

                 VFDR                          G-eRules                        InDBR
Data set         size   len   pur    cov      size     len   pur    cov      size    len    pur    cov
Airline          92.67  1.01  19.64  5.37     3100.85  2.03  17.13  83.39    83.84   2.64   74.11  96.37
Electricity      8.92   1.06  76.17  36.89    904.05   2.81  28.66  82.08    114.97  5.72   93.23  83.00
Outdoor-Stream   0.00   0.00  0.00   0.00     95.75    4.26  48.01  74.10    24.82   18.52  80.00  71.00
Poker-Hand       82.06  2.12  22.93  67.21    2052.18  8.84  4.34   78.00    194.75  7.20   96.33  89.95
RBF              13.62  1.45  62.54  22.47    451.70   1.88  39.37  92.61    121.75  6.26   73.36  97.25
RHP (long)       16.77  3.64  87.69  73.31    242.74   8.03  62.40  93.06    52.74   5.15   88.10  66.36
RHP (short)      15.73  3.33  80.82  77.21    133.97   8.15  59.53  95.52    52.14   3.19   82.01  68.79
SEA-Concepts     15.83  1.01  96.55  12.63    2571.57  1.90  11.90  85.99    264.61  1.40   84.63  97.79
Weather          8.34   1.29  75.22  58.06    368.70   4.47  48.79  82.58    197.63  5.54   83.28  77.06

7 Closing Remarks

Conventional mining attempts often face performance problems due to long-lasting data loads and insufficient support for parallel algorithms. One promising remedy is in-database processing, an emerging paradigm in data science. By fusing mining components and the data repository, it essentially brings predictive analytics to the domain of relational databases, carrying several benefits including the reduction

of unnecessary data loads as well as supplying a scalable computing platform through the efficient algorithms and data structures provided. As a profound framework for analyzing data under vagueness and uncertainty, Rough Set Theory is of increasing interest for this subject, because it is built on well-defined set operations facilitating its integration. However, existing implementations exhibit drawbacks, particularly in nonstationary environments, where data evolve over time and concepts tend to drift. Addressing the lack of efficient uncertainty management in recent rough set literature for in-database applications, we proposed in this chapter a new incremental in-database rule inducer based on Variable Precision Rough Sets that copes with concept drifts and class imbalance. The evaluation of our approach under different scenarios revealed that it is able to compete with state-of-the-art rule learners in terms of classification abilities and discovery-oriented aspects. In particular, our experiments underlined that its predictive accuracy is comparable with the rule inducer VFDR on any-time classification tasks, while it is superior regarding transparent decision-making, producing far more explainable predictions. Moreover, our method outruns its direct competitor G-eRules in drifting environments and on multiclass and class imbalance problems with statistical significance, while both algorithms maintained a similar abstain behavior. In terms of descriptive characteristics, our approach showed superior results w.r.t. coverage and purity, but at the same time it has a tendency to produce larger rule sets with longer rule descriptors, which can be related to its bottom-up induction strategy. Despite these promising results, several points remain open and are part of our ongoing research. These include the reduction of the abstain rate and extending the rule engine to support continuous data. Additionally, we are studying opportunities to reduce the number of required parameters.

Acknowledgements The authors would like to thank the German Federal Ministry of Education
and Research (BMBF) for support within the project IntErA under grant number 03FH023PX3.

References

1. Tileston, T.: Have your cake & eat it too! accelerate data mining combining SAS & teradata.
In: Teradata Partners 2005 Experience the Possibilities (2005)
2. Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J.M., Welton, C.: MAD skills: new analysis
practices for big data. In: Proceedings of the VLDB Endowment, vol. 2, no. 2, pp. 1481–1492.
VLDB Endowment (2009)
3. Shreya, P., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., Roy, I.: Large-
scale predictive analytics in vertica: fast data transfer, distributed model creation, and In-
database prediction. In: Proceedings of the 2015 ACM SIGMOD International Conference on
Management of Data, pp. 1657–1668. ACM (2015)
4. Luo, S., Gao, Z.J., Gubanov, M., Perez, L.L., Jermaine, C.: Scalable linear algebra on a rela-
tional database system. In: Proceedings of the IEEE 33rd International Conference on Data
Engineering (ICDE 2017), pp. 523–534. IEEE (2017)
5. Fernandez-Baizán, M.C., Menasalvas Ruiz, E., Peña Sánchez, J.M.: Integrating RDMS and
data mining capabilities using rough sets. In: Proceedings of the 6th International Conference
on Information Processing and Management of Uncertainty (IPMU’96), pp. 1439–1445 (1996)
In-Database Rule Learning Under Uncertainty: A Variable Precision … 285

6. Kumar, A.: New techniques for data reduction in a database system for knowledge discovery
applications. J. Intell. Inf. Syst. 10(1), 31–48 (1998)
7. Hu, X., Lin, T.Y., Han, J.: A new rough set model based on database systems (RSFDGrC 2003).
In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, vol. 2639, pp. 114–121.
Springer, LNCS (2003)
8. Vaithyanathan, K., Lin, T.Y.: High frequency rough set model based on database systems. In:
Proceedings of the 2008 Annual Meeting of the North American Fuzzy Information Processing
Society (NAFIPS 2008), pp. 1–6. IEEE (2008)
9. Z̆liobaitė, I.: Learning under concept drift: an overview. Technical report, Faculty of Mathe-
matics and Informatics, Vilnius University (2010)
10. Gama, J., Z̆liobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift
adaptation. In: ACM Computing Surveys, vol. 46, no. 4, pp. 1–37. ACM (2014)
11. Rozsypal, A., Kubat, M.: Association mining in time-varying domains. Intell. Data Anal. 9(3),
273–288 (2005)
12. Kukar, M.: Drifting concepts as hidden factors in clinical studies. In: Artificial Intelligence in
Medicine, vol. 2780, pp. 355–364. Springer, LNCS (2003)
13. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. In: ACM Computing
Surveys, vol. 41, no. 3, 15, pp. 1–58. ACM (2009)
14. Beer, F., Bühler, U.: Learning adaptive decision rules inside relational database systems. In:
Proceedings of the 2nd International Symposium of Fuzzy and Rough Sets (ISFUROS), pp.
1–12 (2017)
15. Beer, F., Bühler, U.: In-database feature selection using rough set theory. In: Information
Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2016), CCIS,
vol. 611, pp. 393–407. Springer (2016)
16. Pawlak, Z.: Rough sets. In: International Journal of Computer and Information Science, vol.
11, no. 5, pp. 341–356. Kluwer (1982)
17. Pawlak, Z.: Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht
(1991)
18. Ziarko, W.: Variable precision rough set model. In: Journal of Computer and System Sciences,
vol. 46, no. 1, pp. 39–59. Elsevier (1993)
19. Nguyen, H.S.: Approximate Boolean reasoning: foundations and applications in data mining.
In: Transactions on Rough Sets V, vol. 4100, pp. 334–506. Springer, LNCS (2006)
20. Machuca, F., Millán, M.: Enhancing query processing in extended relational database systems
via rough set theory to exploit data mining potentials. In: Knowledge Management in Fuzzy
Databases. Studies in Fuzziness and Soft Computing, vol. 39, pp. 349–370. Physica (2000)
21. Han, J., Hu, X., Lin, T.Y.: A new computation model for rough set theory based on database
systems. In: Data Warehousing and Knowledge Discovery (DaWaK 2003), vol. 2737, pp. 381–
390. Springer, LNCS (2003)
22. Beer, F., Bühler, U.: An In-database rough set Toolkit. In: Proceedings of the LWA 2015
Workshops: KDML, FGWM, IR and FGDB (LWA’15), pp. 146–157. CEUR-WS (2015)
23. Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach.
Learn. 23(1), 69–101 (1996)
24. Michalski, R.S.: On the Quasi-minimal solution of the general covering problem. In: Proceed-
ings of the 5th International Symposium on Information Processing, pp. 125–128 (1969)
25. Maloof, M.A.: Incremental rule learning with partial instance memory for changing concepts.
In: Proceedings of the International Joint Conference on Neural Networks (IJCNN’03), pp.
2764–2769 (2003)
26. Ferrer-Troyano, F.J., Aguilar-Ruiz, J.S., Riquelme, J.C.: Incremental rule learning and border
examples selection from numerical data streams. J. Univers. Comput. Sci. 11(8), 1426–1439
(2005)
27. Gama, J., Kosina, P.: Learning decision rules from data streams. In: Proceedings of the 22nd
International Joint Conference on Artificial Intelligence (IJCAI’11), pp. 1255–1260. AAAI
Press (2011)
286 F. Beer and U. Bühler

28. Hoeffding, W.: Probability inequalities for sums of bounded random variables. In: Journal of
the American Statistical Association, vol. 58, no. 301, pp. 13–30. Taylor & Francis (1963)
29. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’00), pp.
71-80. ACM (2000)
30. Kosina, P., Gama, J.: Handling time changing data with adaptive very fast decision rules. In:
Machine Learning and Knowledge Discovery in Databases, vol. 7523, pp. 827–842. Springer,
LNCS (2012)
31. Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams.
In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining (KDD’03), pp. 523–528. ACM (2003)
32. Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: Massive Online Analysis. J. Mach.
Learn. Res. 11, 1601–1604 (2010)
33. Stahl, F., Gaber, M.M., Salvador, M.M.: eRules: a modular adaptive classification rule learning
algorithm for data streams. In: Research and Development in Intelligent Systems XXIX (SGAI
2012), pp. 65–78. Springer (2012)
34. Cendrowska, J.: PRISM: an algorithm for inducing modular rules. In: International Journal of
Man-Machine Studies, vol. 27, no. 4, pp. 349–370. Academic Press (1987)
35. Le, T., Stahl, F., Gomes, J.B., Gaber, M.M., Di Fatta, G.: Computationally efficient rule-
based classification for continuous streaming data. In: Research and Development in Intelligent
Facial Similarity Analysis: A Three-Way Decision Perspective

Daryl H. Hepting, Hadeel Hatim Bin Amer and Yiyu Yao

1 Introduction

A fundamental task in the sorting of facial photographs is the modelling of pairwise facial similarity. The three-way analysis of facial similarity described in this work uses data obtained from card sorting of a set of facial photographs, performed by a group of participants. Participants were asked to sort the photographs into an unrestricted number of piles, using their own judgements of similarity to place similar photos into the same pile. Photos placed into different piles are considered to be dissimilar. In particular, each participant compared the photo to be sorted with the last photo placed on top of each pile. The decision faced by each participant was to add the photo to an existing pile or to create a new pile. Given the lack of an objective standard for judging similarity, different participants may use different strategies in judging the similarity of photos. It could be very useful to identify and study these strategies.
An overall evaluation of similarity can be obtained by synthesizing judgments
from the set of participants. A two-way analysis classifies a pair of photos as either
similar or dissimilar. This may be too restrictive. Motivated by the three regions
in rough set theory, in this work we present a framework for three-way analysis of
judgments of facial similarity. Based on judgments by the set of participants, we
divide all pairs of photos into three classes: a set of similar pairs that are judged
primarily as similar; a set of dissimilar pairs that are judged primarily as dissimi-
lar; and a set of undecidable pairs that have conflicting judgments. A more refined

D. H. Hepting (B) · H. H. Bin Amer · Y. Yao


Department of Computer Science, University of Regina, Regina, SK S4S 0A2, Canada
e-mail: hepting@cs.uregina.ca
H. H. Bin Amer
e-mail: binamerh@cs.uregina.ca
Y. Yao
e-mail: yyao@cs.uregina.ca


three-way classification method is also suggested based on a quantitative description of the quality of similarity judgments. The classification in terms of three classes provides an effective method to examine the notions of similarity, dissimilarity, and disagreement. Within the framework, we examine the following two issues:
• Connections between rough sets, shadowed sets, and three-way decisions. We will
argue that three-way decisions provide a generalization of rough sets and shadowed
sets and offer a new perspective for facial similarity analysis.
• Three-way quantitative analysis of human similarity judgments, beyond a focus
on the entire group of participants. We also consider a three-way analysis of indi-
viduals, as well as groupings of individuals.
The results shed new light on our understanding of similarity in both a subjective
and an objective setting.
There are at least two possible classes of methods for studying similarity. One
class is based on similarity measures on facial photographs. The other class relies on
human judgments. Although the former class can be easily automated, the selection of a semantically meaningful similarity measure is a challenge. On the other hand, the latter class explores human perception of similarity. A semantically meaningful similarity measure must be informed by the human perception of similarity; therefore, studies of human judgments of similarity are critical in formulating such a measure.
Unlike tasks of sorting based on shapes, colours of shapes, or numbers of shapes
(as in the Wisconsin Card Sorting Test [2]) which have objectively correct answers,
the judgment of facial similarity is more subjective. With the sorting of facial pho-
tographs, a desired outcome is information about how similarity between faces was
judged. Because different participants in the card sorting task may apply different
strategies, it may be difficult to identify the information about the similarity judg-
ments. Based on a recently proposed theory of three-way decisions [10, 11], the
main objective of this work is to suggest a framework of three-way analysis of
human judgments of similarity.
Suppose a group of participants provides similarity judgments of a set of facial
photos. Based on their judgments, we divide all pairs of photos into three classes: a set
of similar pairs that are judged by more than 60% of the participants as similar, a set of dissimilar pairs that are judged by more than 60% of the participants as dissimilar, and a set of undecidable pairs about which the participants disagree (no more than 60% of the participants judged the pair as similar and no more than 60% judged the pair as
dissimilar). Based on this three-way classification, we can identify several research
problems, such as: human perception of similarity versus dissimilarity, comparative
studies of similar and dissimilar photos, and many more. In this work, we report
our preliminary results from applying the theory and performing experiments with
three-way analysis of similarity based on human judgments.
The remainder of the chapter is organized as follows. Section 2 presents a
trisecting-and-acting model of three-way decisions as a general framework. Section 3
explores the connections between rough sets and three-way decisions and interprets
probabilistic rough sets and shadowed sets in terms of three-way decisions. Section 4

presents the analysis of facial similarity judgments as an application of the trisecting-and-acting model of three-way decisions. Section 5 presents results from analyzing
a dataset obtained from card sorting. Conclusions and opportunities for future work
are identified in Sect. 6.

2 A Trisecting-and-Acting Model of Three-Way Decisions

In a nutshell, the basic idea of three-way decisions is thinking and problem solving in threes [11], a commonly and widely used human practice for perceiving and dealing with a complex world. In other words, we typically divide
a whole or a complex problem into three relatively independent parts and design
strategies to process each of the three parts. Yao [11] proposed a trisecting-and-
acting (T&A) model of three-way decisions. By following this basic idea of three-
way decisions, we put forward here a model of three-way analysis of facial photos.
To grasp the idea of thinking in threes, let us consider three examples. Marr [6]
suggested that any information-processing system can be understood at three levels: the computational theory level, the representation and algorithm
level, and the hardware implementation level. Each level focuses on a different aspect
of information processing. Kelly [5] presented a three-era framework for studying
the past, present, and future of computing: the tabulating era (1900s–1950s), the
programming era (1950s–present), and the cognitive era (2011–). This framework
helps us to identify the main challenges and objectives of today’s computing. Many
taxation systems categorically classify citizens as low, middle, or high income, with
different taxation methods applied to each.
An evaluation-based trisecting-and-acting model of three-way decisions consists
of two components [10, 11]. According to the values of an evaluation function, we
first trisect a universal set into three pair-wise disjoint regions. The result is a weak
tri-partition or a trisection of the set. With respect to a trisection, we design strategies
to process the three regions individually or jointly. The two components of trisecting
and acting are both relatively independent and mutually supportive. Effectiveness
of the strategies of action depends on the appropriateness of the trisection; a good
trisecting method relies on knowledge about how the resulting trisection is to be
used. It is important to search for the right combination of trisection and strategies
of action.
Let OB denote a finite set of objects. Suppose v : OB → ℝ is an evaluation function over OB, where ℝ is the set of real numbers. For an object x ∈ OB, v(x) is called the evaluation status value (ESV) of x. Intuitively, the ESV of an object quantifies the object with respect to some criteria or objectives. To obtain a trisection of OB, we require a pair of thresholds (α, β), α, β ∈ ℝ, with β < α. Formally, three regions of OB are defined by:

Rl(v) = {x ∈ OB | v(x) ≤ β},
Rm(v) = {x ∈ OB | β < v(x) < α},
Rh(v) = {x ∈ OB | v(x) ≥ α}. (1)

They correspond to subsets of objects with low, middle, and high ESVs, respectively.
The three regions satisfy the following properties:

(i) Rl(v) ∩ Rm(v) = Rl(v) ∩ Rh(v) = Rm(v) ∩ Rh(v) = ∅,
(ii) Rl(v) ∪ Rm(v) ∪ Rh(v) = OB.

It should be noted that one or two of the three regions may in fact be the empty set.
Thus, the family of three regions is not necessarily a partition of OB. We call the
triplet (Rl (v), Rm (v), Rh (v)) a weak tri-partition or a trisection of OB.
A trisection is determined by a pair of thresholds (α, β). Based on the physical
meaning of the evaluation function v, we can formulate the problem of finding a pair
of thresholds as an optimization problem [10]. In other words, we search for a pair of thresholds that produces an optimal trisection according to an objective function.
Once we obtain a trisection, we can devise strategies to act on the three regions.
We can study properties of objects in the same region. We can compare objects
in different regions. We can form strategies to facilitate the movement of objects
between regions [1]. There are many opportunities when working with a trisection of a universal set of objects.
Let us use the example of a taxation system again to illustrate the main ideas of
the trisecting-and-acting model of three-way decisions. In this case, OB is the set of
all citizens who pay tax. The evaluation function is the income of a citizen in dollars. Suppose that a pair of thresholds (α, β) is given in terms of dollars, say β = $35k and α = $120k. The three regions Rl(v), Rm(v), and Rh(v) represent the low income (i.e., income ≤ $35k), middle income (i.e., $35k < income < $120k), and high income (i.e., income ≥ $120k) groups, respectively. For the three levels of income, one typically devises different formulas or rates to compute tax.
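To make the trisecting step concrete, the following Python sketch (a minimal illustration under assumed names; the trisect helper and the toy income data are ours, not part of the model) implements Eq. (1) and applies it to the taxation example:

def trisect(objects, v, alpha, beta):
    # Eq. (1): split OB into low, middle and high regions by ESV
    low = [x for x in objects if v(x) <= beta]
    middle = [x for x in objects if beta < v(x) < alpha]
    high = [x for x in objects if v(x) >= alpha]
    return low, middle, high

# usage with the taxation example: beta = $35k, alpha = $120k
incomes = {'ann': 28000, 'bob': 64000, 'eve': 150000}
low, middle, high = trisect(incomes, incomes.get, alpha=120000, beta=35000)
# low -> ['ann'], middle -> ['bob'], high -> ['eve']

Any real-valued evaluation function can be plugged in for v; the same trisecting pattern reappears in the three-way models discussed below.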

3 Rough Sets, Fuzzy Sets, Shadowed Sets, and Three-Way Decisions

Three-way decisions provide a general framework to unify ideas from several theories
for modelling uncertainty. Although the introduction of the concept of three-way
decisions was motivated by the notion of the three regions of rough sets, its recent
developments are far beyond rough sets. To gain insights into three-way analysis of
facial similarity, we interpret probabilistic rough sets and shadowed sets in terms of
three-way decisions.

3.1 Probabilistic Rough Sets as Three-Way Decisions

In formulating probabilistic rough sets [13, 14], we start with an equivalence rela-
tion E on the set of objects OB, namely, E is reflexive (i.e., ∀x ∈ OB, xEx), symmetric (i.e., ∀x, y ∈ OB, xEy ⟹ yEx), and transitive (i.e., ∀x, y, z ∈ OB, xEy ∧ yEz ⟹ xEz). The equivalence relation divides the set of objects into a family of
pair-wise disjoint equivalence classes. Let [x] E , or simply [x] when E is understood,
denote the equivalence class containing x:

[x] E = {y ∈ OB | x E y}. (2)

It can be seen that our card sorting problem can, in fact, be modelled by an equivalence
relation for each individual participant. That is, piles made by a participant can be
viewed as a family of pair-wise disjoint equivalence classes. Given a subset of objects
X ⊆ OB, we define a conditional probability by [7],

Pr(X | [x]) = |X ∩ [x]| / |[x]|,   (3)

where | · | denotes the cardinality of a set. In the framework of three-way decisions,


the conditional probability may be viewed as an evaluation function expressing the probability that an object belongs to X given that the object belongs to [x]. For a pair of
thresholds (α, β) with 0 ≤ β < α ≤ 1, according to Eq. (1), we have probabilistic
three regions of rough sets:

POS(α,β)(X) = {x ∈ OB | Pr(X | [x]) ≥ α},
BND(α,β)(X) = {x ∈ OB | β < Pr(X | [x]) < α},
NEG(α,β)(X) = {x ∈ OB | Pr(X | [x]) ≤ β}. (4)

That is, the three regions of rough sets are interpreted within the framework of three-
way decisions.
The three regions of rough sets deal with the classification of objects based on information about the equivalence of objects. Equivalence is a special type of similarity. Our study of facial similarity uses a notion that may be considered as a generalization of equivalence relations. More specifically, we consider three levels: similar, undecidable, and dissimilar. It will be interesting to study approximations under a three-level notion of similarity rather than a two-level equivalence.
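As an illustration of Eqs. (3) and (4), the following sketch (hypothetical function and data names; equivalence classes are assumed to be given explicitly as lists) computes the conditional probability for each class and assigns the class to a probabilistic region:

def probabilistic_regions(classes, X, alpha, beta):
    # classes: equivalence classes (lists of objects); X: target concept (a set)
    pos, bnd, neg = set(), set(), set()
    for eq in classes:
        pr = len(X & set(eq)) / len(eq)   # Eq. (3): Pr(X | [x])
        if pr >= alpha:
            pos.update(eq)                # Eq. (4): positive region
        elif pr <= beta:
            neg.update(eq)                # negative region
        else:
            bnd.update(eq)                # boundary region
    return pos, bnd, neg

# usage: two piles (equivalence classes) and a concept X
pos, bnd, neg = probabilistic_regions([['p1', 'p2', 'p3'], ['p4', 'p5']],
                                      {'p1', 'p2', 'p5'}, alpha=0.6, beta=0.4)
# first pile: Pr = 2/3 >= 0.6, so it joins POS; second pile: Pr = 1/2, so BND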

3.2 Shadowed Sets as Three-Way Approximations of Fuzzy Sets

A fuzzy set, proposed by Zadeh [15], models a concept with an unsharp boundary.
A fuzzy set is defined by a membership function μA : OB −→ [0, 1]. The value
μA (x) is called the membership grade of x. Pedrycz [8, 9] argues that humans are
typically insensitive to detailed membership grades. While we can easily comprehend
membership grades close to the two end points of 0 and 1, we cannot easily distinguish among membership grades in the middle. For this reason, he introduces the notion of
shadowed sets as three-way approximations of fuzzy sets.
In the framework of three-way decisions, the membership function μA is an
evaluation function. Given a pair of thresholds (α, β) with 0 ≤ β < α ≤ 1, according
to Eq. (1), we have a three-way approximation as follows:

CORE(α,β)(μA) = {x ∈ OB | μA(x) ≥ α},
SHADOW(α,β)(μA) = {x ∈ OB | β < μA(x) < α},
NULL(α,β)(μA) = {x ∈ OB | μA(x) ≤ β}. (5)

It might be pointed out that this formulation is slightly different from Pedrycz's, who uses the three values 0, [0, 1], and 1 as membership grades for the three regions.
For modelling similarity, we need to consider a fuzzy relation μR : OB × OB → [0, 1]. A fuzzy similarity relation [16], as a generalization of an equivalence relation, is a fuzzy relation that is reflexive (i.e., ∀x ∈ OB, μR(x, x) = 1), symmetric (i.e., ∀x, y ∈ OB, μR(x, y) = μR(y, x)), and max-min transitive (i.e., ∀x, z ∈ OB, μR(x, z) ≥ max_{y∈OB} min(μR(x, y), μR(y, z))). The membership grade μR(x, y) may be interpreted as the degree to which x is similar to y.
In the framework of three-way decisions, the fuzzy similarity relation μR is an
evaluation function on OB × OB. Given a pair of thresholds (α, β) with 0 ≤ β <
α ≤ 1, according to Eq. (1), we have a three-way approximation of the similarity
relation:

SIM(α,β)(μR) = {(x, y) ∈ OB × OB | μR(x, y) ≥ α},
UND(α,β)(μR) = {(x, y) ∈ OB × OB | β < μR(x, y) < α},
DIS(α,β)(μR) = {(x, y) ∈ OB × OB | μR(x, y) ≤ β}, (6)

where SIM, UND, and DIS denote, respectively, the sets of similar, undecidable, and
dissimilar pairs of objects. To be consistent with later discussions, we rename the
three regions to better reflect their semantics.
When applying three-way decisions, it is necessary to use semantically sound and meaningful notions and terms for naming and interpreting an evaluation function and the resulting three regions. In the rest of this paper, we give details about how to obtain a similarity evaluation by pooling together information from a group of raters. While each of the raters uses an equivalence relation, a synthesis of a family of equivalence relations is a similarity relation.
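A minimal sketch of the trisection in Eqs. (5) and (6) follows (hypothetical names; the membership or similarity grades are assumed to be stored in a dictionary). The same function serves both the shadowed-set approximation and the three-way splitting of a fuzzy similarity relation, since both trisect a set of grades with the thresholds (α, β):

def threeway_grades(mu, alpha, beta):
    # mu: dictionary mapping elements (or pairs) to grades in [0, 1]
    high = {k for k, g in mu.items() if g >= alpha}          # CORE / SIM
    middle = {k for k, g in mu.items() if beta < g < alpha}  # SHADOW / UND
    low = {k for k, g in mu.items() if g <= beta}            # NULL / DIS
    return high, middle, low

# usage with pairwise similarity grades mu_R(x, y)
grades = {('a', 'b'): 0.9, ('a', 'c'): 0.5, ('b', 'c'): 0.1}
sim, und, dis = threeway_grades(grades, alpha=0.7, beta=0.3)
# sim -> {('a','b')}, und -> {('a','c')}, dis -> {('b','c')}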

4 Three-Way Classification of Similarity Judgments of Facial Photographs

This section describes an application of the trisecting-and-acting model of three-way decisions in analyzing facial similarity judgments.

4.1 A Simple Three-Way Classification

Let P denote the set of unordered pairs of photos from a set of photos. Let N denote the number of participants. Based on the results of sorting, we can easily establish an evaluation function v : P → {0, 1, . . . , N} regarding the similarity of a pair of photographs, that is, for p ∈ P,

v( p) = the number of participants who put the pair in the same pile. (7)

Given a pair of thresholds (l, u) with 1 ≤ l < u ≤ N , according to Eq. (1), we can
divide the set of pairs P into three pair-wise disjoint regions:

SIM(v) = {p ∈ P | v(p) > u},
UND(v) = {p ∈ P | l ≤ v(p) ≤ u},
DIS(v) = {p ∈ P | v(p) < l}. (8)

They are called, respectively, the sets of similar, undecidable, and dissimilar pairs.
Alternatively, we can consider a normalized evaluation function vn(p) = v(p)/N, which gives the percentage of participants who consider the pair p to be similar. This
provides a probabilistic interpretation of the normalized evaluation function. With
such a transformation, we can apply a probabilistic approach, suggested by Yao and
Gao [12], to determine the pair of thresholds (α, β) with 0 < β < α ≤ 1.
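A sketch of this simple classification (hypothetical data layout: one sorting per participant, each sorting a list of piles, each pile a list of photo labels) is given below; it evaluates Eq. (7) by counting votes and then applies Eq. (8):

from itertools import combinations

def classify_pairs(sortings, photos, l, u):
    # Eq. (7): v(p) = number of participants who put the pair in the same pile
    votes = {pair: 0 for pair in combinations(sorted(photos), 2)}
    for piles in sortings:
        for pile in piles:
            for pair in combinations(sorted(pile), 2):
                votes[pair] += 1
    # Eq. (8): trisect the set of pairs with the thresholds (l, u)
    sim = {p for p, v in votes.items() if v > u}
    dis = {p for p, v in votes.items() if v < l}
    und = set(votes) - sim - dis
    return sim, und, dis

For the study reported below, photos would contain the 356 labels, sortings the 25 participants' pilings, and (l, u) = (10, 15).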

4.2 A Refined Interpretation Based on Quality of Judgments

The simple three-way classification discussed in the previous subsection is based on the analyses reported by Hepting et al. [4], in which they did not consider the quality of judgments made by each participant. In this subsection, we look at ways to quantify
the quality of the judgments by different participants. Intuitively speaking, both the
number of piles and the sizes of individual piles provide hints on the quality and
confidence of a participant. If a participant used more piles and, in turn, smaller sizes
of individual piles, we consider the judgments to be more meaningful. Consequently,
we may assign a higher weight to the participant.
Consider a pile of n photos. According to the assumption that a pair of photos in the same pile is similar, it produces n(n − 1)/2 pairs of similar photos. Suppose a participant provided M piles with sizes n1, . . . , nM, respectively. The total number of similar pairs of photos is given by:

N_S = Σ_{i=1}^{M} n_i(n_i − 1)/2.   (9)

Since the total number of all possible pairs is 356 · 355/2, the probability of judging a random pair of photos to be similar by the participant is given by:

P_S = Σ_{i=1}^{M} n_i(n_i − 1) / (356 · 355),   (10)

and the probability of judging a random pair of photos to be dissimilar is given by:

P_D = 1 − P_S.   (11)

Thus, we have a probability distribution (P_S, 1 − P_S) to model the similarity judgment of the participant. Looking ahead to Fig. 2, having fewer than 16 dissimilar votes for a pair (placing it in either the similar or undecidable groups) is highly unlikely. The intuition is that the smaller the probability, the greater the confidence of the participant in that judgment. For most pairs of photos, some participants have rated them similar and some have rated them dissimilar. There are no photo pairs that were rated as similar by all participants, but there are some photo pairs that were rated as dissimilar by all participants. Based on the calculated probabilities, 0 all-similar pairs and 232 all-dissimilar pairs are expected for the real data; for the simulated data, likewise, 0 all-similar pairs and 11,490 all-dissimilar pairs are expected.
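The probabilities of Eqs. (9)-(11) reduce to a few lines of Python; the sketch below (an assumed helper name; the pile sizes are those of participant 21, used later in Table 3) computes N_S, P_S and P_D for one participant:

def similarity_probability(pile_sizes, n_photos=356):
    # Eq. (9): number of similar pairs implied by the piles
    n_s = sum(n * (n - 1) // 2 for n in pile_sizes)
    p_s = n_s / (n_photos * (n_photos - 1) / 2)   # Eq. (10)
    return n_s, p_s, 1.0 - p_s                    # Eq. (11): P_D = 1 - P_S

n_s, p_s, p_d = similarity_probability([2, 19, 36, 36, 56, 86, 120])
# n_s = 13767, p_s ~ 0.218, p_d ~ 0.782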

5 Three-Way Analysis of Human Similarity

Based on the proposed model, we report results from analyzing a dataset obtained
from card sorting.

5.1 Facial Similarity Judgments Through Card Sorting

We briefly review a procedure used to obtain similarity judgments on a set of facial photographs through a technique known as card sorting. The details have been reported elsewhere [4].
There were 25 participants who judged the similarity of a set of facial photographs.
Each photograph was presented on a separate card. Each participant was given a
randomly-ordered stack of 356 facial photographs and asked to sort the facial pho-
tographs into an unrestricted number of piles based on perceived similarity. It was
explained to the participants that photographs in the same pile are considered to be
similar and photographs in different piles are considered to be dissimilar. Figure 1a
shows the participant behaviours in the card sorting study.
The total of 63,190 pairs from the 356 cards is a very large number. It was impos-
sible to ask a participant to exhaustively consider all pairs. Instead, the following
procedure was used so that a participant made direct judgments on a small fraction
of all possible pairs. Each participant drew one photo at a time from the stack of photos. Once a photo was placed in a pile, it could not be moved. When a new
photo was drawn from the stack, a participant only compared the newly-drawn photo
with the very top photo on each existing pile. The new photo could be placed on an
existing pile, or a new pile could be created.
To show the possible utility of the judgments from the described procedure, we observe the diversity of behaviours from the 25 participants by comparing them with randomly-generated judgments. For this purpose, a set of randomly-generated data
for 25 hypothetical participants was created, which was generated according to the
code in Table 1. Figure 1b presents the randomly-simulated participants.
In terms of the number of piles, the 25 participants produced between 3 and 38 piles,
which indicates a large variation. It can be observed that the participant judgments in
terms of sizes of different piles are significantly different from those in the randomly-
generated data. This suggests that the restricted procedure does generate useful
human similarity judgments. We hypothesize that the variability in the number of piles (between 3 and 38) and in the pile sizes (between 1 and 199) reflects some variability in the confidence of the participants' judgments. The interpretation that some participants
judge similarity “correctly” and others judge it “incorrectly” cannot be applied here
because there is no objective standard against which each participant’s ratings can
be judged.

5.2 Three-Way Analysis Based on the Simple Model

For the dataset used in this work, we have N = 25. We set l = 10 and u = 15. Specif-
ically, we consider a pair of photographs to be similar if more than 15 participants out
of 25 put them in the same pile, or equivalently, more than 15/25 = 60% of the participants
put them in the same pile. We consider a pair of photographs to be dissimilar if less

[Figure: boxplots of pile sizes (0–200) per participant (1–25), panel (a) "Summary of Pile Sizes by Participant" for real data and panel (b) "Randomly Generated".]
Fig. 1 A summary of pile sizes by participant: a real data from card sorting study and b randomly-simulated data
Facial Similarity Analysis: A Three-Way Decision Perspective 299

Table 1 Code, written in the Python language, to generate piles of photos to simulate participants behaving randomly

# assign photos randomly to piles
import random
# dictionary of photos
from photos_dict import photos
# seed random number generator
random.seed()
# get the list of photo labels
photonames = list(photos.keys())

# for each participant (p)
for p in range(25):
    # initialize dictionary
    randpiles = {}
    # start with 0 piles
    pilecount = 0
    # randomly shuffle the photo names
    random.shuffle(photonames)
    # for each photo (ph)
    for ph in range(356):
        # choose a pile for the photo, at random
        cp = int(round(random.random() * pilecount))
        # append photo to the chosen pile (initialize a new pile if needed)
        if cp not in randpiles:
            randpiles[cp] = []
            randpiles[cp].append(photonames[ph])
            pilecount += 1
        else:
            randpiles[cp].append(photonames[ph])
    # write out the simulated data into a separate file
    with open('rand/' + str(p + 1).zfill(2) + '.txt', 'w') as outf:
        for rk in sorted(randpiles.keys()):
            # concatenate same-pile photo names for output
            ostr = " ".join(str(name) for name in randpiles[rk])
            outf.write(ostr + "\n")

[Figure: histogram "Frequency of Dissimilar Votes, Real vs. Random"; x-axis: number of dissimilar votes per pair (0–25), y-axis: pair count (0–20,000); the Similar, Undecidable, and Dissimilar regions are marked.]
Fig. 2 A summary of ratings by participant from real data from card sorting study and randomly-simulated data

Table 2 Number of pairs in each region

Region             Real    Random
Similar (SIM)      125     0
Undecidable (UND)  6416    0
Dissimilar (DIS)   56,649  63,190

than 10 participants out of 25 put them in the same pile, or equivalently, less than
10/25 = 40% of the participants put them in the same pile. Otherwise, we regard the judgments of the 25 participants as inconclusive for declaring the pair of photos similar or dissimilar.
Figure 2 shows the effects of these thresholds on the real and random data. Based
on the pair of thresholds l = 10 and u = 15, we have similar pairs, undecidable
pairs, and dissimilar pairs. Table 2 summarizes the numbers of pairs in each region,
for both the observed and the randomly-simulated data.
Figure 3 shows two samples of Similar pairs (S1 and S2 refer to the left and right
pairs, respectively). For both S1 and S2, 19 participants put the pair into the same
pile. Figure 4 shows two samples of Undecidable pairs (U1 and U2 refer to the left
and right pairs, respectively). For both U1 and U2, 13 participants put the pair into
the same pile. Figure 5 shows two samples of Dissimilar pairs (D1 and D2 refer to
the left and right pairs, respectively). For D1, 4 participants put the pair into the same
pile and for D2, only 2 participants put the pair into the same pile.

Fig. 3 The 2 pairs of photos shown here (S1 left, S2 right) represent samples from the similar
(SIM) region

Fig. 4 The 2 pairs of photos shown here (U1 left, U2 right) represent samples from the undecidable
(UND) region. Pairs U1 and U2 were highlighted in the study by Hepting and Almestadi [3]

Fig. 5 The 2 pairs of photos shown here (D1 left, D2 right) represent samples from the dissimilar
(DIS) region

An inspection of the final three-way classification confirms that pairs in the similar
set are indeed similar, pairs in the dissimilar set are very different, and pairs in the
undecidable set share some common features while differing in some other aspects.

5.3 Three-Way Analysis Based on the Refined Model

A more refined approach is possible by looking at the number of photos that are considered along with the photos in any particular pair. If a participant made M piles, the number of possible pile configurations for the participant is M + M(M − 1)/2 (the M single piles plus the M(M − 1)/2 unordered pairs of piles). Figure 6 compares the variability in the observed participant data (min = 6, max = 741) with that of the simulated participants (min = 105, max = 276). These plots summarize the number of possible pile configurations that may contain a particular photo pair, by participant. Higher numbers of possible configurations correspond to more piles of smaller size.
Figure 7 summarizes the number of photos in the piles that contain each of the
photo pairs in Figs. 3, 4 and 5. When the pair is judged to be dissimilar (N) by a
participant, the number of photos associated with the pair is the sum of the sizes of
the 2 different piles that each contain one of the photos in the pair. When the pair is
judged to be similar (Y) by a participant, the number of photos associated with the pair is the size of the single pile that contains both photos.
Figure 8 summarizes by relative rank the number of photos associated with each
of the photo pairs in Figs. 3, 4 and 5. Regardless of the number of possible pile con-
figurations that may contain the pair of interest, the smallest of these configurations
has a relative rank approaching 0 and the largest of these configurations has a relative
rank of 1. The relative rank can be transformed into a similarity score according to
Eq. 12.

Sr(A, B) = (2 − relative rank(P_AB)) / 2,   if A and B are in the same pile,
Sr(A, B) = relative rank(P_A + P_B) / 2,    if A and B are in two different piles.   (12)

Here P_AB denotes the single pile containing both photos, and P_A + P_B the configuration formed by the two piles that contain A and B separately.

This score is computed for each rating of each pair of photos. From the card sorting
study, 63,190 scores can be computed for each of the 25 participants. As an example,
Participant 21 made 7 piles of photos with sizes: 2, 19, 36, 36, 56, 86, and 120 (355
photos rated). This leads to 28 configurations of piles, some with the same size.
Please see Table 3 for details of the calculations and Fig. 9 for a plot of the results. In
order to create a single similarity score for a pair of photos, we sum the score from
each rating and divide by the number of raters (N ), according to Eq. 13.

S(A, B) = (1/N) Σ_{r=1}^{N} Sr(A, B).   (13)
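The score computation of Eqs. (12) and (13) can be sketched as follows (hypothetical helper names; the pile sizes are those of participant 21, and ties are assigned the smallest rank, matching Table 3):

from itertools import combinations

def config_sizes(pile_sizes):
    # all M + M(M-1)/2 configurations that may contain a pair
    singles = list(pile_sizes)
    pairs = [a + b for a, b in combinations(pile_sizes, 2)]
    return sorted(singles + pairs)

def rating_score(pile_sizes, size, same_pile):
    sizes = config_sizes(pile_sizes)
    rr = (sizes.index(size) + 1) / len(sizes)      # relative rank (ties: smallest)
    return (2 - rr) / 2 if same_pile else rr / 2   # Eq. (12)

p21 = [2, 19, 36, 36, 56, 86, 120]
rating_score(p21, 2, same_pile=True)         # 0.9821, first row of Table 3
rating_score(p21, 2 + 19, same_pile=False)   # 0.0536, third row of Table 3

The overall score of Eq. (13) is then simply the mean of the per-rater scores for the pair.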

Figure 10 summarizes the similarity scores, sorted into increasing order, for each
rating of each sample pair. The scores are determined by the relative rank of the
configuration that contains the pair. Similarity scores for pairs rated as dissimilar

[Figure: two boxplot panels, (a) real participants and (b) simulated participants, of the count (0–700) of possible pile configurations containing a photo pair.]
Fig. 6 Number of possible pile configurations that may contain a particular photo pair, by participant. Real participants on the left and simulated participants on the right

[Figure: six boxplot panels, one per sample pair (S1, S2, U1, U2, D1, D2), of the number of photos with the pair (0–350) for dissimilar (N) and similar (Y) ratings.]
Fig. 7 Summary of pile configuration sizes for the sample pairs (see Figs. 3, 4 and 5). The bold lines indicate the median sizes

[Figure: six boxplot panels, one per sample pair (S1, S2, U1, U2, D1, D2), of relative ranks (0.0–1.0) for dissimilar (N) and similar (Y) ratings.]
Fig. 8 Summary of relative ranks for the sample pairs (see Figs. 3, 4 and 5). The bold lines indicate the median relative ranks

Table 3 Calculations from Eq. 12 carried out for participant 21, who made 7 piles of photos with sizes: 2, 19, 36, 36, 56, 86, and 120

Size             Rank  Relative rank  Similarity score
2                1     0.0357         0.9821
19               2     0.0714         0.9643
2 + 19 = 21      3     0.1071         0.0536
36               4     0.1429         0.9286
2 + 36 = 38      6     0.2143         0.1071
19 + 36 = 55     8     0.2857         0.1429
56               10    0.3571         0.8214
2 + 56 = 58      11    0.3929         0.1964
36 + 36 = 72     12    0.4286         0.2143
19 + 56 = 75     13    0.4642         0.2321
86               14    0.5000         0.7500
2 + 86 = 88      15    0.5357         0.2679
36 + 56 = 92     16    0.5714         0.2857
19 + 86 = 105    18    0.6429         0.3214
120              19    0.6786         0.6607
2 + 120 = 122    20    0.7143         0.3571
36 + 86 = 122    20    0.7143         0.3571
19 + 120 = 139   23    0.8214         0.4107
56 + 86 = 142    24    0.8571         0.4286
36 + 120 = 156   25    0.8929         0.4464
56 + 120 = 176   27    0.9643         0.4821
86 + 120 = 206   28    1.0000         0.5000

Similar ratings are those of the single-pile configurations (indicated by bold type in the original); configurations of the same size share the smallest rank

(not placed in the same pile) will be in the range (0, 0.5] and scores for pairs rated as similar (placed in the same pile) will be in the range (0.5, 1.0). A score near 0 occurs when the photo pair is rated as dissimilar but the combined size of the piles containing the photos is very small. A score near 1 occurs when the photo pair is rated as similar and the size of that pile is very small. The similarity scores of the sample pairs are, for S1: 0.7377; for S2: 0.7230; for U1: 0.5607; for U2: 0.5742; for D1: 0.4015; and for D2: 0.4421. In Sect. 5.2, we began with α0 = 0.6 and β0 = 0.4. We notice that S1, S2, U1, and U2 remain in their original regions. However, D1 and D2 are now both in region UND. Let us examine the selection of α and β more closely.

[Figure: "Plot of Similarity Scores from Ranked Pile Configurations"; x-axis: rank (0–28), y-axis: score (0.2–1.0).]
Fig. 9 Plot of similarity scores from rank of pile configurations for participant 21. See Table 3 for the calculations

Figure 11 considers all similarity scores from all ratings of photo pairs. The boxplot summarizes 1,267,785 dissimilar (N) ratings and 304,186 similar (Y) ratings. From this analysis, we chose two sets of thresholds.
• α1 = 0.7000 (median score for pairs in same pile), β1 = 0.4367 (median score
for pairs in different piles). The application of this threshold set is illustrated in
Fig. 13.
• α2 = 0.6389 (25th percentile of scores for pairs in same pile), β2 = 0.4824 (75th
percentile of scores for pairs in different piles). The application of this threshold
set is illustrated in Fig. 14.
In Fig. 12, the trilinear plot summarizes an exploration of values for α and β. Each plotted point represents the fraction of pairs in the DIS, UND, and SIM regions for a particular choice of α and β. Points at a vertex indicate that 100% of the pairs are assigned to the region indicated by the vertex label. In this figure, each point represents the assignment of all 63,190 pairs to the 3 regions. It is also possible to consider the assignment of a pair's individual ratings to those regions and obtain more finely-grained information about the pair's similarity. Figures 13 and 14 illustrate the assignment of individual ratings amongst the DIS, UND, and SIM regions.

[Figure: six panels, one per sample pair (S1, S2, U1, U2, D1, D2), of similarity scores (0.0–1.0) against sorted participants (1–25).]
Fig. 10 Summary of similarity scores, sorted into ascending order, for each rating of the sample pairs (see Figs. 3, 4 and 5)

[Figure: boxplots "Summary of Scores for All Pairs" of scores (0.0–1.0) for dissimilar (N) and similar (Y) ratings, with the thresholds α1 = 0.7000, β1 = 0.4367, α2 = 0.6389, and β2 = 0.4824 marked.]
Fig. 11 Summary of similarity scores for dissimilar (N) and similar (Y) ratings for all 63,190 pairs, computed according to Eq. 13. Two pairs of thresholds, (α1, β1) and (α2, β2), are also indicated

[Figure: trilinear plot with vertices DIS, SIM, and UND; the legend marks the threshold sets (u = 15, l = 10), (α0 = 0.6000, β0 = 0.4000), (α1 = 0.7000, β1 = 0.4367), and (α2 = 0.6389, β2 = 0.4824).]
Fig. 12 This trilinear plot summarizes an exploration of values of α and β taken from [0, 1] at increments of 0.01 such that α > β. Each point plotted in grey represents a choice of α and β. Plotted in black are the points corresponding to Table 4

[Figure: six bar-chart panels, one per sample pair (S1, S2, U1, U2, D1, D2), of rating counts (0–25) per region (DIS, UND, SIM) under (α1, β1).]
Fig. 13 Classification as one of dissimilar, undecidable, or similar. These decisions are based on thresholds α1 = 0.7000 and β1 = 0.4367

[Figure: six bar-chart panels, one per sample pair (S1, S2, U1, U2, D1, D2), of rating counts (0–25) per region (DIS, UND, SIM) under (α2, β2).]
Fig. 14 Classification as one of dissimilar, undecidable, or similar. These decisions are based on thresholds α2 = 0.6389 and β2 = 0.4824

Table 4 Number of pairs classified for different threshold pairs. The first line of data is repeated from Table 2

Thresholds                   Dissimilar (DIS)  Undecidable (UND)  Similar (SIM)
(u = 15, l = 10)             56,649            6416               125
(α0 = 0.6000, β0 = 0.4000)   2782              60,018             390
(α1 = 0.7000, β1 = 0.4367)   16,472            46,714             4
(α2 = 0.6389, β2 = 0.4824)   43,469            19,649             72

6 Conclusions and Future Work

This work presents a three-way classification of human judgments of similarity. The agreement of a set of participants leads to both a set of similar pairs and a set of dissimilar pairs. Their disagreement leads to undecidable pairs. Findings from this study may have practical applications. For example, the selected photo pairs (Figs. 3, 4 and 5) may provide a firm foundation for developing an understanding of the processes or strategies that different people use to judge facial similarity. We anticipate that it may be possible to use the correct identification of strategy to create presentations of photos that would allow eyewitness identification to have improved accuracy and utility.
As future work, a three-way classification suggests two types of investigation. By
studying each class of pairs, we try to identify features that are useful in arriving at a
judgment of similarity or dissimilarity. By comparing pairs of classes, for example,
the class of similar pairs and the class of dissimilar pairs, we try to identify features
that enable the participants to differentiate the two classes. It will also be of interest
to define quantitative measures to precisely describe our initial observations.

Acknowledgements The authors thank the editors, Rafael Bello, Rafael Falcon, and José Luis
Verdegay, for their encouragement and the anonymous reviewers for their constructive comments.
This work has been supported, in part, by two NSERC Discovery Grants.

References

1. Gao, C., Yao, Y.: Actionable strategies in three-way decisions. In: Knowledge-Based Systems
(2017). https://doi.org/10.1016/j.knosys.2017.07.001
2. Grant, D.A., Berg, E.: A behavioral analysis of degree of reinforcement and ease of shifting
to new responses in a Weigl-type card-sorting problem. J. Exp. Psychol. 38(4), 404–411 (1948).
https://doi.org/10.1037/h0059831
3. Hepting, D.H., Almestadi, E.H.: Discernibility in the analysis of binary card sort data. In:
Ciucci, D., Inuiguchi, M., Yao, Y., Śle˛zak, D., Wang, G. (eds.) Rough Sets, Fuzzy Sets, Data
Mining, and Granular Computing. RSFDGrC 2013, Lecture Notes in Computer Science, vol.
8170, pp. 380–387. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-41218-9_41
4. Hepting, D.H., Spring, R., Śle˛zak, D.: A rough set exploration of facial similarity judgements.
In: Peters, J.F., Skowron, A., Hiroshi, S., Chakraborty, M.K., Śle˛zak, D., Hassanien, A.E., Zhu,
W. (eds.) Transactions on Rough Sets XIV. Lecture Notes in Computer Science, vol. 6600, pp.
81–99. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-21563-6_5
5. Kelly, J.: Computing, Cognition and the Future of Knowing. Whitepaper, IBM Research (2015)
6. Marr, D.: Vision: A Computational Investigation into the Human Representation and Processing
of Visual Information. W.H. Freeman and Company, New York (1982)
7. Pawlak, Z., Wong, S.K.M., Ziarko, W.: Rough sets: probabilistic versus deterministic approach.
Int. J. Man-Mach. Stud. 29(1), 81–95 (1988). https://doi.org/10.1016/S0020-7373(88)80032-4
8. Pedrycz, W.: Shadowed sets: representing and processing fuzzy sets. Trans. Sys. Man Cyber.
Part B 28(1), 103–109 (1998). https://doi.org/10.1109/3477.658584
9. Pedrycz, W.: Shadowed sets: bridging fuzzy and rough sets. In: Rough-Fuzzy Hybridization:
A New Trend in Decision Making, 1st edn., pp. 179–199. Springer, New York (1999)
10. Yao, Y.: An Outline of a Theory of Three-Way Decisions, pp. 1–17. Springer, Berlin (2012).
https://doi.org/10.1007/978-3-642-32115-3_1
11. Yao, Y.: Three-way decisions and cognitive computing. Cogn. Comput. 8(4), 543–554 (2016).
https://doi.org/10.1007/s12559-016-9397-5
12. Yao, Y., Gao, C.: Statistical Interpretations of Three-Way Decisions, pp. 309–320. Springer
International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-25754-9_28
13. Yao, Y., Greco, S., Słowiński, R.: Probabilistic Rough Sets, pp. 387–411. Springer, Berlin
(2015). https://doi.org/10.1007/978-3-662-43505-2_24
14. Yao, Y.Y.: Probabilistic approaches to rough sets. Expert Syst. 20(5), 287–297 (2003). https://
doi.org/10.1111/1468-0394.00253
15. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965). https://doi.org/10.1016/S0019-
9958(65)90241-X
16. Zadeh, L.A.: Similarity relations and fuzzy orderings. Inf. Sci. 3(2), 177–200 (1971). https://
doi.org/10.1016/S0020-0255(71)80005-1
Part III
Hybrid Approaches
Fuzzy Activation of Rough Cognitive
Ensembles Using OWA Operators

Marilyn Bello, Gonzalo Nápoles, Ivett Fuentes, Isel Grau, Rafael Falcon,
Rafael Bello and Koen Vanhoof

Abstract Rough Cognitive Ensembles (RCEs) can be defined as a multiclassifier system composed of a set of Rough Cognitive Networks (RCNs), each operating at
a different granularity degree. While this model is capable of outperforming several
traditional classifiers reported in the literature, there is still room for enhancing its
performance. In this paper, we propose a fuzzy strategy to activate the RCN input
neurons before performing the inference process. This fuzzy activation mechanism
essentially quantifies the extent to which an object belongs to the intersection between
its similarity class and each granular region in the RCN topology. The numerical sim-
ulations have shown that the improved ensemble classifier significantly outperforms
the original RCE model for the adopted datasets. After comparing the proposed model
to 14 well-known classifiers, the experimental evidence confirms that our scheme
yields very promising classification rates.

M. Bello (B) · I. Fuentes · R. Bello


Computer Science Department, Central University of Las Villas, Santa Clara, Cuba
e-mail: mbgarcia@uclv.cu
I. Fuentes
e-mail: ivett@uclv.cu
R. Bello
e-mail: rbellop@uclv.edu.cu
M. Bello · G. Nápoles · I. Fuentes · K. Vanhoof
Faculty of Business Economics, Hasselt University, Hasselt, Belgium
e-mail: gonzalo.napoles@uhasselt.be
K. Vanhoof
e-mail: koen.vanhoof@uhasselt.be
I. Grau
Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium
R. Falcon
Research & Engineering Division, Larus Technologies Corporation, Ottawa, Canada
e-mail: rfalcon@ieee.org
R. Falcon
School of Electrical Engineering & Computer Science,
University of Ottawa, Ottawa, Canada


Keywords Pattern classification · Granular Computing · Ensemble learning · Rough cognitive maps · Fuzzy activation mechanism

1 Introduction

The advent of Big Data [8] has underscored the need to shift how automated sys-
tems ingest, represent and process real-world or simulated data. Given the volume,
velocity, veracity and variability challenges posed by the Big Data phenomenon, it
is no longer realistic to expect that traditional pattern classification algorithms [10]
could sift through these sizable datasets and yield actionable insights in a reason-
able amount of time. The focus has then moved to the development of algorithms
that perceive and treat data at a higher, more symbolic level instead of dealing with
the underlying, often numerical representation. Granular Computing (GrC) [4] has
proved an excellent paradigm for this kind of processing that suits our data-prolific
world quite well.
One of the manifestations of applying GrC to automated systems is that of granular
classifiers [3]. In particular, Fuzzy Cognitive Maps (FCMs) [17] have been hybridized
with information granules stemming from fuzzy sets [25] or rough sets [22, 23].
Rough cognitive networks (RCNs) [23] are a type of granular classifier in which
a sigmoid FCM’s topology (i.e., the set of concepts and weights) is automatically
learned from data. An RCN node denotes either a decision class or one of the three
approximation regions (positive, negative or boundary) originated from a granulation
of the input space according to Rough Set Theory (RST) principles.
While RCNs' classification performance was deemed competitive with respect to state-of-the-art classifiers [23], these networks were still sensitive to an input parameter denoting the similarity threshold upon which the rough information granules are
built. To overcome that limitation, Rough Cognitive Ensembles (RCEs) were recently
put forth by Nápoles et al. [22]. An RCE is an ensemble method with a collection
of RCNs as base classifiers, each operating at a different granularity level. After
comparing RCEs to 15 state-of-the-art classifiers, it was concluded that the proposed
technique produced highly competitive prediction rates.
In this paper we bring forth a new activation mechanism for RCE that boosts its
performance in classification problems. This new mechanism essentially quantifies
the extent to which an object belongs to the intersection between its similarity class
and each granular region. For that, it is necessary to apply an information aggregation process. In this research, we use an aggregation technique based on the ordered weighted averaging (OWA) operators [34]. After comparing the improved ensemble
classifier to the original RCE model and 14 other state-of-the-art classifiers, the
experimental evidence suggests that our scheme yields very promising classification
rates.
The remainder of this paper is structured as follows. Section 2 elaborates on the two building blocks behind rough cognitive mapping. Section 3 unveils the
fundamentals of the RCNs and RCEs. The new activation rule is described in Sect. 4

while the empirical analysis is found in Sect. 5. Conclusions and pointers to future
work are given in Sect. 6.

2 Theoretical Background

2.1 Rough Set Theory

Rough Set Theory is a methodology proposed in the early 1980s for handling uncertainty arising in the form of inconsistency [24]. Let DS = (U, Ψ ∪ {d}) denote a decision system where U is a non-empty, finite set of objects called the universe of discourse, Ψ is a non-empty, finite set of attributes describing any object in U and d ∉ Ψ represents the decision attribute. Any subset X ⊆ U can be approximated by two crisp sets, which are referred to as its lower and upper approximations and denoted by Φ̲X = {x ∈ U | [x]Φ ⊆ X} and Φ̄X = {x ∈ U | [x]Φ ∩ X ≠ ∅}, respectively. In this classic formulation, the equivalence class [x]Φ comprises the set of objects in U that are deemed inseparable from x according to the information contained in the attribute subset Φ ⊆ Ψ.

The lower and upper approximations are the basis for computing the positive, negative and boundary regions of any set X. The positive region POS(X) = Φ̲X includes those objects that are certainly contained in X; the negative region NEG(X) = U − Φ̄X denotes those objects that are certainly not related to X, while the boundary region BND(X) = Φ̄X − Φ̲X captures the objects whose membership to the set X is uncertain, i.e., they might be members of X.
In the original RST formulation, two objects are deemed indiscernible if they have
identical values for the selected attributes. This binary equivalence relation leads to a
partition of the universe into multiple equivalence classes. While this definition works
well with nominal attributes, it is not applicable to numerical attributes. To relax
this stringent requirement, we can replace the equivalence relation with a similarity
relation.
Equation (1) shows the indiscernibility relation adopted in this paper, where 0 ≤
ϕ(x, y) ≤ 1 is a similarity function. This weaker binary relation claims that two
objects x and y are inseparable as long as their similarity degree ϕ(x, y) goes above
a similarity threshold 0 ≤ ξ ≤ 1. This user-specified parameter establishes the degree
of granularity upon which the similarity classes are built. Determining the precise
granularity degree becomes a central issue when designing high-performing rough
classifiers.
R : x Ry ⇐⇒ ϕ(x, y) ≥ ξ (1)

The similarity function could be formulated in a variety of ways. In this paper, we assume that ϕ(x, y) = 1 − δ(x, y), where 0 ≤ δ(x, y) ≤ 1 is a normalized distance function.
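A minimal sketch of this similarity-based granulation (assumed names; any normalized distance can play the role of δ) is:

def similarity_class(x, universe, xi, distance):
    # Eq. (1): y is inseparable from x when phi(x, y) = 1 - delta(x, y) >= xi
    return {y for y in universe if 1.0 - distance(x, y) >= xi}

def rough_regions(universe, X, xi, distance):
    pos, neg, bnd = set(), set(), set()
    for x in universe:
        sc = similarity_class(x, universe, xi, distance)
        if sc <= X:            # similarity class fully inside the concept
            pos.add(x)
        elif not (sc & X):     # similarity class disjoint from the concept
            neg.add(x)
        else:                  # partial overlap
            bnd.add(x)
    return pos, neg, bnd

Varying the threshold ξ changes the granularity: larger values of ξ yield smaller similarity classes and, typically, a thinner boundary region.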

2.2 Fuzzy Cognitive Maps

Fuzzy Cognitive Maps can be defined as interpretable recurrent neural networks widely used for modeling and simulation purposes [17]. Their topology describes a
set of concepts (i.e., objects, variables or entities in a particular problem) and their
causal relations. The activation value of such concepts (also called neurons) regularly
takes values in the [0, 1] interval. On the other hand, the strength of the causal
relation between two concepts Ci and C j is quantified by a weight wi j ∈ [−1, 1] and
denoted via a directed edge from Ci to C j . There are three possible types of causal
relationships among neural processing entities that express the type of influence from
one neuron to the other.
• If wi j > 0 then an increment (decrement) in the cause concept Ci produces an
increment (decrement) of the effect concept C j with intensity |wi j |.
• If wi j < 0 then an increment (decrement) in the cause concept Ci produces a
decrement (increment) of the effect concept C j with intensity |wi j |.
• If wi j = 0 then there is no causal relation between Ci and C j .
Equation (2) shows Kosko’s inference rule [17], which is based on the standard
McCulloch-Pitts scheme [21]. In this inference rule, Ai(t) is the activation value of
the Ci neuron at the tth iteration, w ji is the causal weight connecting the neurons C j
and Ci while f (.) is a monotonically non-decreasing transfer function (e.g., binary,
trivalent, sigmoid). This updating mechanism is repeated until a stopping condition
is satisfied, thus producing a state vector A(t) at each iteration. The activation degree
of each neuron is given by the value of the transformed weighted sum that this
processing unit receives from connected neurons on the causal network.
Ai(t+1) = f( Σ_{j=1, j≠i}^{M} wji Aj(t) ), i ≠ j   (2)

After a fixed number of iterations, the system will arrive at one of the following
states: (i) equilibrium point, (ii) limited cycle or (iii) chaotic behavior [18]. The map
is said to have converged if it reaches a fixed-point attractor. Otherwise, the process terminates after a maximum number of iterations T is reached and the output corresponds to the activation vector A(T) in the last iteration T.
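A sketch of the reasoning rule in Eq. (2) with a sigmoid transfer function might look as follows (assumed names; W[j][i] stores the weight wji and A holds the current state vector):

import math

def fcm_inference(A, W, T=100, eps=1e-5):
    # iterate Eq. (2) until a fixed-point attractor or T iterations
    f = lambda x: 1.0 / (1.0 + math.exp(-x))   # sigmoid transfer function
    for _ in range(T):
        B = [f(sum(W[j][i] * A[j] for j in range(len(A)) if j != i))
             for i in range(len(A))]
        if max(abs(a - b) for a, b in zip(A, B)) < eps:
            return B   # converged to a fixed point
        A = B
    return A           # stopped after T iterations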

3 Rough Cognitive Mapping

In this section, we introduce the main principles behind the Rough Cognitive Net-
works and the Rough Cognitive Ensembles.

3.1 Rough Cognitive Networks

Recently, Nápoles and his collaborators [23] introduced the RCNs in an attempt to
develop an accurate, transparent classification model that hybridizes RST and FCMs.
Basically, an RCN is a granular sigmoid FCM whose topology is defined by the
abstract semantics of the three-way decision rules [35, 36]. The set of input neurons
in an RCN represents the positive, boundary and negative regions of the decision
classes in the problem under consideration. tput neurons describe the set of decision
classes. The RCN topology (both concepts and weights) is entirely computed from
historical data, thus removing the need for expert intervention at this stage.
The first step in the RCN learning process is related to the input data granula-
tion using RST. The positive, boundary and negative regions of each decision class
according to a predefined attribute subset are computed using the training data set
and a predefined similarity relation R (see Sect. 2.1).
The second step is concerned with automated topology design. A sigmoid FCM
is automatically created from the previously computed RST-based information gran-
ules. In this scheme, each rough region is mapped to an input neuron whereas each
decision class is represented by an output neuron. Rules R1 − R4 formalize the direc-
tion and intensity of the causal weights in the proposed topology; these weights are
estimated by using the abstract semantics of three-way decision rules.
• (R1) IF Ci is Pk AND Cj is Dk THEN wij = 1.0
• (R2) IF Ci is Pk AND Cj is Dv≠k THEN wij = −1.0
• (R3) IF Ci is Pk AND Cj is Pv≠k THEN wij = −1.0
• (R4) IF Ci is Nk AND Cj is Dk THEN wij = −1.0
In such rules, Ci and C j represent two map neurons, Pk and Nk are the positive
and negative regions related to the kth decision respectively while −1 ≤ wi j ≤ 1 is
the causal weight between the cause Ci and the effect C j .
Although the boundary regions are concerned with an abstaining decision, an
instance x ∈ B N D(X k ) could be positively related to the kth decision alternative.
Therefore, an additional rule considering the knowledge about boundary regions is
introduced.
• (R5) IF Ci is Bk AND Cj is Dv AND BND(Xk) ∩ BND(Xv) ≠ ∅ THEN wij = 0.5
Figure 1 displays an RCN for solving binary classification problems. Notice that
we added a self-reinforcement positive causal connection to each input neuron with
the goal of preserving its initial excitation level when performing the neural updating
rule.
The last step refers to the network exploitation, which simply means computing the response vector Ax(D) = (Ax(D1), . . . , Ax(Dk), . . . , Ax(DK)). The input object x is presented to the RCN as an input vector A(0) that activates the causal network. Rules R6–R8 formalize the method used to activate the input neurons, which is based on the inclusion degree of the object in each rough granular region.

Fig. 1 RCN for pattern recognition problems with two decision classes

• (R6) IF Ci is Pk THEN Ai(0) = |R̄(x) ∩ POS(Xk)| / |POS(Xk)|
• (R7) IF Ci is Nk THEN Ai(0) = |R̄(x) ∩ NEG(Xk)| / |NEG(Xk)|
• (R8) IF Ci is Bk THEN Ai(0) = |R̄(x) ∩ BND(Xk)| / |BND(Xk)|
Once the excitation vector A(0) has been computed, the reasoning rule depicted in
Eq. (2) is performed until either the network converges to a fixed-point attractor or a
maximal number of iterations is reached. Then, the class with the highest activation
value is assigned to the object.
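The activation rules R6-R8 amount to a set-inclusion degree, as the following sketch shows (assumed names; sets of object identifiers stand for the similarity class and the rough regions):

def crisp_activation(sim_class, region):
    # R6-R8: proportion of the region covered by the similarity class
    return len(sim_class & region) / len(region) if region else 0.0

crisp_activation({'x', 'y1', 'y2'}, {'y2', 'y3', 'y4'})   # 1/3 ~ 0.33

The second line previews Example 1 in Sect. 4: only y2 lies in both sets, so the neuron associated with that positive region is activated with degree 1/3.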

3.2 Rough Cognitive Ensembles

RCEs were recently introduced in [22] so as to eliminate the RCN parameter learning
stage. An RCE is an ensemble of several RCNs where each base classifier operates
at a different granularity degree.
Figure 2 displays an RCE comprised of N base classifiers with K decision classes,
where Dk(i) denotes the kth decision class for the ith granular network R(ξi ) and Dk
is the aggregated-type concept associated with the kth decision class.
In order to activate the ensemble, N excitation vectors {A(0)_[x|ξi]}, i = 1, . . . , N, are computed, where A(0)_[x|ξi] is used to perform the neural reasoning process in the ith RCN. The ith activation vector denotes the inclusion degree of the similarity class R̄(ξi)(x) into each information granule induced by the corresponding similarity threshold ξi.

Fig. 2 Rough Cognitive Ensemble of N networks for problems with M classes

The reader may notice that if $\xi_i \le \xi_j$ then $\bar{R}^{(\xi_i)}(x) \subseteq \bar{R}^{(\xi_j)}(x)$, which could produce correlated base classifiers [31]. Hence, the authors resorted to instance bagging [5] in order to counter the correlation effects coming from this rule. By doing so, a reasonable trade-off between ensemble diversity and accuracy was attained.
Another important aspect of RCEs is related to the aggregation of multiple outputs
once the neural reasoning step is completed. Combining the decisions of different
models means amalgamating the various outputs into a single prediction. The sim-
plest way to do this in classification models is adopting a standard (or weighted)
voting scheme [7]; in this way, the predicted class is derived from the aggregated
output vector.

4 A Fuzzy Activation Mechanism

In RCN-based classifiers, once the networks have been constructed we can determine the decision class for a new observation by performing the neural reasoning process. Rules R6–R8 compute the initial activation vector $A^{(0)}$, as mentioned in Sect. 2.2. This mechanism is simply the proportion of the objects in a particular rough region $REG(X)$ that also belong to the new object's similarity class $\bar{R}(x)$. It does not take into account the similarity of these objects (located at the intersection of both concepts, $y \in \bar{R}(x) \cap REG(X)$) with respect to the objects in $\bar{R}(x)$ or those in $REG(X)$.

Example 1 Let us suppose that $\bar{R}(x) = \{x, y_1, y_2\}$ and $POS(X_1) = \{y_2, y_3, y_4\}$. This implies that $A^{(0)}_{POS(X_1)}(x) = |\{y_2\}|/|\{y_2, y_3, y_4\}| = 1/3 = 0.33$. This activation mechanism does not explicitly consider the membership degree of $y_2$ to either concept $\bar{R}(x)$ or $POS(X_1)$ when activating the corresponding neuron.

To overcome this drawback, we propose a fuzzy activation mechanism that is based on the inclusion degree of each object to the intersection $\bar{R}(x) \cap REG(X_k)$, where $REG(X_k)$ stands for any of the three rough regions associated with the concept $X_k$, i.e., $POS(X_k)$, $NEG(X_k)$ or $BND(X_k)$.
Equation (3) shows how to compute the activation value of a neuron denoting a granular region, where $T$ denotes a t-norm. A t-norm is a conjunction function $T: [0,1] \times [0,1] \to [0,1]$ that fulfills three conditions: (i) $\forall a \in [0,1],\ T(a,1) = T(1,a) = a$; (ii) $\forall a,b \in [0,1],\ T(a,b) = T(b,a)$; and (iii) $\forall a,b,c \in [0,1],\ T(a, T(b,c)) = T(T(a,b), c)$.
The inclusion degree is modeled as a fuzzy set. If the object does not belong to the intersection set $\bar{R}(x) \cap REG(X_k)$, then its membership will be zero. Notice that this fuzzy activation strategy allows introducing further flexibility when defining the similarity relation attached to each granular base classifier in numerical domains.
$$A_{REG(X_k)}(x) = \frac{\displaystyle\sum_{y \in \bar{R}(x) \cap REG(X_k)} T\left(\mu_{\bar{R}(x)}(y),\ \mu_{REG(X_k)}(y)\right)}{\displaystyle\sum_{y \in REG(X_k)} \mu_{REG(X_k)}(y)} \quad (3)$$

Rules $R_6^*$–$R_8^*$ comprise the new rules proposed in this paper.
• ($R_6^*$) IF $C_i$ is $P_k$ THEN $A_i^{(0)} = A_{POS(X_k)}(x)$
• ($R_7^*$) IF $C_i$ is $N_k$ THEN $A_i^{(0)} = A_{NEG(X_k)}(x)$
• ($R_8^*$) IF $C_i$ is $B_k$ THEN $A_i^{(0)} = A_{BND(X_k)}(x)$
The terms $\mu_{\bar{R}(x)}(y)$ and $\mu_{REG(X_k)}(y)$ are the membership degrees of $y$ to the test object's similarity class and to the rough region of the concept $X_k$, respectively. To compute both, it is necessary to apply an information aggregation process.
In this research, we use an aggregation technique based on the ordered weighted averaging (OWA) operators [34], which provide an aggregation lying between two extreme cases. At one extreme is the situation in which we desire that all the criteria be satisfied. At the other extreme is the case in which the satisfaction of any of the criteria is all we desire. These two extreme cases lead to the use of "and" and "or" operators to combine both criteria (i.e., $\mu_{\bar{R}(x)}(y)$ and $\mu_{REG(X_k)}(y)$).
Equations (4) and (5) show how the terms $\mu_{\bar{R}(x)}(y)$ and $\mu_{REG(X_k)}(y)$ are calculated using OWA operators.

$$\mu^{OWA}_{\bar{R}(x)}(y) = OWA_W(\varphi(y, x_1), \ldots, \varphi(y, x_n)), \quad x_{i=1,\ldots,n} \in \bar{R}(x) \quad (4)$$

$$\mu^{OWA}_{REG(X_k)}(y) = OWA_W(\varphi(y, x_1), \ldots, \varphi(y, x_n)), \quad x_{i=1,\ldots,n} \in REG(X_k) \quad (5)$$

Example 2 Let us assume that $\bar{R}(x)$ and $POS(X_1)$ are given as displayed in Example 1 and $W = W_{Ave} = (\frac{1}{n}, \ldots, \frac{1}{n})$. Following on from this, the memberships are computed as the average similarity of $y$ to all objects in each set. Additionally, let us suppose that the similarity among all the objects is given in Table 1. From the above assumptions we can compute:

Table 1 Similarity matrix

       x      y1     y2     y3     y4
x      1      0.98   0.95   0.75   0.85
y1     0.98   1      0.35   0.8    0.9
y2     0.95   0.35   1      0.98   0.95
y3     0.75   0.8    0.98   1      0.91
y4     0.85   0.9    0.95   0.91   1
• $\mu_{\bar{R}(x)}(y_2) = (\varphi(y_2, x) + \varphi(y_2, y_1) + \varphi(y_2, y_2))/|\{x, y_1, y_2\}| = (0.95 + 0.35 + 1)/3 = 0.77$
• $\mu_{POS(X_1)}(y_2) = (\varphi(y_2, y_2) + \varphi(y_2, y_3) + \varphi(y_2, y_4))/|\{y_2, y_3, y_4\}| = (1 + 0.98 + 0.95)/3 = 0.98$
In this example, the activation degree $A^{(0)}_{POS(X_1)}(x) = (0.98 \cdot 0.77)/2.89 = 0.26$, where $2.89$ is the sum of the memberships $\mu_{POS(X_1)}(y)$ over the three objects in $POS(X_1)$. The reader can notice that this value is slightly lower than the 0.33 obtained in Example 1, and presumably more realistic. In the next section, we explore the prediction capability of the RCE algorithm using this new activation mechanism. The resulting algorithm is named Fuzzy Rough Cognitive Ensembles (FRCE).
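The following sketch reproduces Example 2 end to end with the averaging OWA weights and the algebraic product t-norm; object names and the data layout are ours.

```python
import numpy as np

# Table 1, rows/columns ordered as [x, y1, y2, y3, y4]
sim = np.array([[1.00, 0.98, 0.95, 0.75, 0.85],
                [0.98, 1.00, 0.35, 0.80, 0.90],
                [0.95, 0.35, 1.00, 0.98, 0.95],
                [0.75, 0.80, 0.98, 1.00, 0.91],
                [0.85, 0.90, 0.95, 0.91, 1.00]])
idx = {'x': 0, 'y1': 1, 'y2': 2, 'y3': 3, 'y4': 4}

R_x = ['x', 'y1', 'y2']       # similarity class of the test object
POS = ['y2', 'y3', 'y4']      # positive region POS(X_1)

def mu(y, group):             # OWA with W_Ave = (1/n, ..., 1/n) is a plain average
    return np.mean([sim[idx[y], idx[z]] for z in group])

t_norm = lambda a, b: a * b   # algebraic product t-norm

inter = [y for y in R_x if y in POS]                      # here only y2
num = sum(t_norm(mu(y, R_x), mu(y, POS)) for y in inter)  # numerator of Eq. (3)
den = sum(mu(y, POS) for y in POS)                        # denominator of Eq. (3)
print(round(num / den, 2))                                # 0.26, as in Example 2
```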

5 Results and Discussion

We first describe the experimental settings and then compare RCE’s performance in
both crisp and fuzzy environments. To conclude, we compare the best-performing
ensemble algorithm against state-of-the-art classifiers.

5.1 Experimental Design

Aiming at exploring whether the improved method leads to higher prediction rates
or not, we leaned upon 100 classification datasets taken from the UCI Machine
Learning [20] repository. Table 2 outlines the number of instances, attributes and
decision classes for each dataset. In the adopted datasets, the number of attributes
ranges from 2 to 240, the number of decision classes from 2 to 38, and the number of
instances from 14 to 5300. These ML problems involve 9 noisy and 29 imbalanced
datasets, where the imbalance ratio ranges from 5:1 to 439:1.
The presence of noise and the imbalance ratio (calculated as the ratio of the size
of the majority class to that of the minority class) are also given. In this paper, we
say that a dataset is imbalanced if the number of instances belonging to the majority

decision class is at least five times the number of instances belonging to the minority
class. On the other hand, we replaced missing values with the mean or the mode
depending on whether the attribute was numerical or nominal, respectively.
Moreover, we evaluate the algorithms’ performance for three heterogeneous
distance functions taken from [33]: the Heterogeneous Euclidean-Overlap Metric
(HEOM), the Heterogeneous Manhattan-Overlap Metric (HMOM) and the Hetero-
geneous Value Difference Metric (HVDM).
• The Heterogeneous Euclidean-Overlap Metric (HEOM). This heterogeneous
distance function computes the normalized Euclidean distance between numerical
attributes and an overlap metric for nominal attributes.
• The Heterogeneous Manhattan-Overlap Metric (HMOM). This heterogeneous
variant is similar to the HEOM function since it replaces the Euclidean distance
with the Manhattan distance when computing the dissimilarity between two numer-
ical values.
• The Heterogeneous Value Difference Metric (HVDM). This function involves
a stronger strategy for quantifying the dissimilarity between two discrete attribute
values. Instead of computing the matching between attribute values, it measures
the correlation between such attributes and decision classes.
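As a hedged sketch of the first two metrics (following the general form in [33]; missing-value handling is omitted and all names are ours):

```python
import numpy as np

def heom(a, b, numeric_mask, ranges):
    """Heterogeneous Euclidean-Overlap Metric (sketch): range-normalized
    difference on numeric attributes, 0/1 overlap on nominal ones."""
    d = np.empty(len(a))
    for i, (ai, bi) in enumerate(zip(a, b)):
        if numeric_mask[i]:
            d[i] = abs(ai - bi) / ranges[i]     # normalized numeric difference
        else:
            d[i] = 0.0 if ai == bi else 1.0     # overlap metric for nominal values
    return np.sqrt(np.sum(d ** 2))              # Euclidean combination (HEOM)

# Two mixed instances: (age, colour, height); ranges apply to numeric slots only
x1, x2 = (25, 'red', 1.70), (40, 'blue', 1.80)
print(heom(x1, x2, [True, False, True], [50.0, None, 0.5]))
```

Summing $|d_i|$ instead of taking the Euclidean combination yields the HMOM variant; HVDM additionally needs class-conditional value statistics and is not sketched here.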
The similarity threshold associated with each base classifier is uniformly distributed in the $[0.96, 1)$ interval. In all ensemble models, the number of RCN base classifiers is set to $N = 10$ in order to keep the computational complexity manageable.
Each dataset has been partitioned using a 10-fold cross-validation procedure, i.e.,
the dataset has been split into ten folds, each containing 10% of the instances. For
each fold, an ML algorithm is trained with the instances contained in the training
partition (all other folds) and then tested with the current fold, so no object is used
for training and testing purposes at the same time.
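A minimal sketch of this protocol with scikit-learn (the chapter does not name an implementation, and the data below are synthetic placeholders):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(100, 4)                     # placeholder data, 100 instances
y = np.random.randint(0, 2, size=100)          # placeholder binary labels
model = DecisionTreeClassifier()

kf = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    model.fit(X[train_idx], y[train_idx])      # train on the other nine folds
    preds = model.predict(X[test_idx])         # test on the held-out fold
```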

5.2 Discussion of the Results

To measure the classifiers’ prediction capability, we computed the Kappa coefficient.


Cohen’s kappa coefficient [28] measures the inter-rater agreement for categorical
items. It is usually deemed a more robust measure than the standard accuracy since
this coefficient takes into account the agreement occurring by chance.
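For reference, a sketch of the computation; `cohen_kappa_score` from scikit-learn is one readily available implementation, though the chapter does not state which was used:

```python
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # hypothetical true labels
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]   # hypothetical predictions
# kappa = (p_o - p_e) / (1 - p_e): the observed agreement p_o corrected by
# the agreement p_e expected by chance alone
print(cohen_kappa_score(y_true, y_pred))
```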
The first experiment is oriented to determining the t-norm leading to the best prediction rates. Table 3 displays the t-norms included in this first simulation. Figure 3 shows the average Kappa coefficient achieved by the proposed model performing instance bagging (FRCE) across three heterogeneous distance functions using different t-norms.
From the above simulations we can notice that the proposed model computes the
best prediction rates with the t-norms: Standard, Algebraic Product and Lukasiewicz.

Table 2 Characterization of the ML datasets adopted for the simulations


Dataset Instances Attributes Classes Noisy Imbalance
Acute-inflammation 120 6 2 No No
Acute-nephritis 120 6 2 No No
Anneal 898 38 6 No 85:1
Anneal.orig 898 38 6 No 85:1
Appendicitis 106 7 2 No No
Audiology 226 69 24 No 57:1
Australian 690 14 2 No No
Autos 205 25 7 No 22:1
Balance-noise 625 4 3 Yes 5:1
Ballons 16 4 2 No No
Banana 5300 2 2 No No
Bank 4521 16 2 No 7:1
Blood 748 4 2 No No
Breast 277 9 2 No No
BC-wisconsin-diag 569 31 2 No No
BC-wisconsin-prog 198 34 2 No No
Bridges-version1 107 12 6 No No
Car 1728 6 4 No 17:1
Cardiotocography-10 2126 35 10 No 10:1
Cardiotocography-3 2126 35 3 No 9:1
Chess 3196 36 2 No No
Cleveland 297 13 5 No 12:1
Colic 368 22 2 No No
Colic.orig 368 27 2 No No
Collins 500 23 15 No 13:1
Contact-lenses 24 4 3 No No
Contraceptive 1473 9 3 No No
Credit-a 690 15 2 No No
Credit-g 1000 20 2 No No
crx 653 15 2 No No
csj 653 34 6 No No
Cylinder-bands 540 39 2 No No
Dermatology 358 34 6 No 5:1
Echocardiogram 131 11 2 No 5:1
Ecoli 336 7 8 No 71:1
Ecoli0 220 7 2 No No
Ecoli-0vs1 220 7 2 No No
Ecoli1 336 7 2 No No
Ecoli2 336 7 2 No 5:1
Ecoli3 336 7 2 No 8:1
Energy-y1 768 8 38 No No
Eucalyptus 736 19 5 No No
Glass0 214 9 2 No No
Glass-0123 versus 456 214 9 2 No No
Glass1 214 9 2 No No
Glass2 214 9 2 No No
Glass-20an-nn 214 9 6 Yes 8:1
Glass3 214 9 2 No 6:1
Glass-5an-nn 214 9 6 Yes 8:1
Glass6 214 9 2 No 6:1
Hayes-roth 160 4 3 No No
Heart-statlog 270 13 2 No No
Ionosphere 351 34 2 No No
Iris 150 4 3 No No
Iris0 150 4 2 No No
Iris-20an-nn 150 4 3 Yes No
Iris-5an-nn 150 4 3 Yes No
Labor 57 16 2 No No
LED7digit 500 7 10 No No
Lung-cancer 32 56 3 No No
Mammographic 830 5 2 No No
mfeat-fourier 2000 76 10 No No
mfeat-morpho 2000 6 10 No No
mfeat-pixel 2000 240 10 No No
mfeat-zernike 2000 47 10 No No
Molecular-biology 106 57 2 No No
monk-2 432 6 2 No No
New-thyroid 215 5 2 No 5:1
Parkinsons 195 22 2 No No
pima 768 8 2 No No
pima-10an-nn 768 8 2 Yes No
pima-20an-nn 768 8 2 Yes No
pima-5an-nn 768 8 2 Yes No
Planning 182 12 2 No No
Postoperative 90 8 3 No 32:1
Primary-tumor 339 17 22 No 84:1
saheart 462 9 2 No No
Solar-flare-1 323 5 6 No 11:1
Solar-flare-2 1066 12 6 No 7:1
Sonar 208 60 2 No No
Soybean 683 35 19 No 11:1
Spectfheart 267 44 2 No No
Sponge 76 44 3 No 23:1
Tae 151 5 3 No No
Tic-tac-toe 958 9 2 No No
Vehicle 846 18 4 No No
Vehicle0 846 18 2 No No
Vehicle1 846 18 2 No No
Vehicle2 846 18 2 No No
Vehicle3 846 18 2 No No
Vertebral2 310 6 2 No No
Vertebral3 310 6 3 No No
Vowel 990 13 11 No No
Weather 14 4 2 No No
Wine 178 13 3 No No
Wine-5an-nn 178 13 3 Yes No
Winequality-white 4898 11 7 No 439:1
Wisconsin 683 9 2 No No
Yeast1 1484 8 2 No No
Zoo 101 16 7 No 10:1

Table 3 T-norms explored in this paper

T-norm                  Formulation
Standard intersection   $T(x, y) = \min\{x, y\}$
Algebraic product       $T(x, y) = xy$
Lukasiewicz             $T(x, y) = \max\{0, x + y - 1\}$
Drastic product         $T(x, y) = \begin{cases} x & \text{if } y = 1 \\ y & \text{if } x = 1 \\ 0 & \text{otherwise} \end{cases}$

Following on from this, we adopt the Lukasiewicz t-norm in the rest of the simulations
conducted in this paper.
The second experiment explores different OWA operators pointed out in [34]. Equations (6)–(8) show three important special cases of these OWA operators.

$$OWA_{W^*}, \text{ where } W^* = (1, 0, \ldots, 0)^T \quad (6)$$

$$OWA_{W_*}, \text{ where } W_* = (0, 0, \ldots, 1)^T \quad (7)$$

$$OWA_{W_{Ave}}, \text{ where } W_{Ave} = \left(\tfrac{1}{n}, \tfrac{1}{n}, \ldots, \tfrac{1}{n}\right)^T \quad (8)$$

Fig. 3 Average Kappa measure computed for the proposed model using three heterogeneous distance functions with different t-norms

Fig. 4 Average Kappa measure computed for the proposed model using three heterogeneous distance functions with different OWA operators
Figure 4 shows the average Kappa coefficient achieved by FRCE across three heterogeneous distance functions using different OWA operators. From the above simulations we can notice that the proposed model computes the best prediction rates with the $OWA_{W_{Ave}}$ operator.

Fig. 5 Average Kappa measure according to different criteria

Table 4 Results of the Wilcoxon signed rank test

Comparison                 p-value     Negative ranks   Positive ranks   Null hypothesis
FRCE versus RCE (HEOM)     0.003368    11               24               Rejected
FRCE versus RCE (HVDM)     0.031599    10               20               Rejected
FRCE versus RCE (HMOM)     0.000009    2                25               Rejected

The following experiment focused on determining the best-performing granular ensemble model. For this, we evaluated the prediction capability of standard RCEs and FRCE across three heterogeneous distance functions.
Figure 5 displays the average Cohen's kappa coefficient achieved by each algorithm. It shows that, regardless of the underlying distance function, the FRCE classifier outperforms its competitor. Moreover, the winning FRCE variant is obtained with the HMOM distance function.
The next step is to determine whether the superiority of the FRCE classifier for each configuration is statistically significant or not. To do so, we resorted to the Wilcoxon signed rank test [32]. Table 4 reports the p-value, the number of negative ranks (i.e., ranks for which RCE was better than FRCE) and the number of positive ranks (i.e., ranks for which FRCE was better than RCE) computed by the Wilcoxon signed rank test for each pairwise comparison between FRCE and RCE with a given distance function. The statistical analysis supports the superiority of the FRCE algorithm, as all the null hypotheses (equal performance) were rejected.
The above simulations confirm FRCE’s superiority over RCE independently of
the distance function, thus demonstrating that the proposed activation mechanism
indeed strengthens the RCE classifier.
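A sketch of one such pairwise comparison with SciPy; the Kappa vectors below are hypothetical, whereas in the chapter each entry would be one dataset's Kappa score:

```python
from scipy.stats import wilcoxon

# Kappa per dataset for two paired classifiers (hypothetical values)
kappa_frce = [0.81, 0.74, 0.66, 0.90, 0.58, 0.73]
kappa_rce  = [0.78, 0.70, 0.67, 0.85, 0.55, 0.71]

stat, p_value = wilcoxon(kappa_frce, kappa_rce)  # paired, non-parametric test
print(p_value)  # reject equal performance when p < 0.05
```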

Fig. 6 Average Kappa measure reported by the adopted classifiers

As a further simulation, we compare the prediction ability of the improved granular classifier against 14 traditional classifiers available in the Weka software tool [12].
The classifiers used for comparison are: Decision Table (DT) [16], Naive Bayes (NB)
[14], Naive Bayes Updateable (NBU) [14], Support Vector Machines (SMO) with
sequential minimal optimization algorithm [15], Multilayer Perceptron (MLP) [13],
Simple Logistic (SL) [30], Decision Tree (J48) [26], Fast Decision Trees (FDT) [29],
Best-first Decision Trees (BFT) [27], Logistic Model Trees (LMT) [19], Random
Trees (RT) [2], Random Forests (RF) [6], k-nearest neighbors learner (kNN) [1] and
the K* instance-based classifier (K*) [9].
Similarly to the previous experiments, we used Cohen’s Kappa coefficient to
quantify the algorithms’ performance. Figure 6 shows the average Kappa measure
attained by each method across the 100 datasets. For this experiment, the Friedman
test [11] suggests rejecting the null hypothesis ( p-value = 1.8266E − 29 < 0.05)
using a confidence level of 95%, hence confirming that there are significant differ-
ences between at least two algorithms across the selected datasets. From these results,
it is clear that LMT is the best-ranked algorithm, FRCE classifier is the second-best
ranked, whereas NBU is the worst one.
Table 5 shows the p-values reported by the Wilcoxon test and the corrected ones
according to the post-hoc procedures using FRCEs as the control method. We assume
that a null hypothesis H0 can be rejected if at least one of the adopted post-hoc
procedures supports the rejection.
The results point to the fact that, in spite of LMT standing as the most competitive classifier in terms of Kappa measure, no significant differences were found between LMT and FRCE. In addition, the null hypothesis was accepted for RF, MLP, SMO, SL and J48; note however that these classifiers report slightly lower Kappa measures. More importantly, FRCE is capable of outperforming the remaining classifiers.

Table 5 Adjusted p-values using FRCE as the control method

Algorithm   p-value     Bonferroni   Holm   Holland   Null hypothesis
RT 1.882E−09 2.63E−08 2.63E−08 2.63E−08 Rejected
DT 1.372E−07 1.92E−06 1.78E−06 1.78E−06 Rejected
FDT 0.000005 7.00E−05 6.00E−05 6.00E−05 Rejected
NBU 0.000042 5.88E−04 4.62E−04 4.62E−04 Rejected
kNN 0.000052 7.28E−04 5.20E−04 5.20E−04 Rejected
K* 0.000744 0.010416 0.006696 0.00667611 Rejected
BFT 0.001144 0.016016 0.009152 0.00911544 Rejected
NB 0.001563 0.021882 0.010941 0.01088983 Rejected
LMT 0.009738 0.136332 0.058428 0.0570239 Accepted
J48 0.041909 0.586726 0.209545 0.19270214 Accepted
SL 0.553821 1 1 0.96036887 Accepted
SMO 0.574572 1 1 0.96036887 Accepted
MLP 0.854787 1 1 0.97891318 Accepted
RF 0.949685 1 1 0.97891318 Accepted

6 Conclusions

In this paper, we presented a fuzzy activation mechanism for RCNs. This mechanism
is based on the assumption that objects may belong to the intersection set between
the similarity class and each non-empty granular region with different membership
degree. The numerical results have shown that the proposed modification leads to
improved prediction rates, while remaining comparable with selected state-of-the-art classifiers. The fuzzy approach focuses only on the activation mechanism; the information granules are still crisp. Future research will focus on replacing the crisp constructs with fuzzy ones, so that further flexibility may be achieved.

References

1. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1),
37–66 (1991)
2. Amit, Y., Geman, D.: Shape quantization and recognition with randomized trees. Neural Com-
put. 9(7), 1545–1588 (1997)
3. Balamash, A., Pedrycz, W., Al-Hmouz, R., Morfeq, A.: Granular classifiers and their design
through refinement of information granules. Soft Comput. 1–15 (2015)
4. Bargiela, A., Pedrycz, W.: Granular Computing: An Introduction, vol. 717. Springer Science
& Business Media (2012)
5. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
6. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

7. Bryll, R., Gutierrez-Osuna, R., Quek, F.: Attribute bagging: improving accuracy of classifier
ensembles by using random feature subsets. Pattern Recogn. 36(6), 1291–1302 (2003)
8. Chen, C.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies:
a survey on big data. Inf. Sci. 275, 314–347 (2014)
9. Cleary, J.G., Trigg, L.E., et al.: K*: an instance-based learner using an entropic distance mea-
sure. In: Proceedings of the 12th International Conference on Machine Learning, vol. 5, pp.
108–114 (1995)
10. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2012)
11. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
12. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data
mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
13. Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: International Joint Con-
ference on Neural Networks, 1989. IJCNN, pp. 593–605. IEEE (1989)
14. John, G.H., Langley, P.: Estimating continuous distributions in bayesian classifiers. In: Pro-
ceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345.
Morgan Kaufmann Publishers Inc. (1995)
15. Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO
algorithm for SVM classifier design. Neural Comput. 13(3), 637–649 (2001)
16. Kohavi, R.: The power of decision tables. In: Machine Learning: ECML-95, pp. 174–189.
Springer, Berlin (1995)
17. Kosko, B.: Fuzzy cognitive maps. Int. J. Man-Mach. Stud. 24(1), 65–75 (1986)
18. Kosko, B.: Hidden patterns in combined and adaptive knowledge networks. Int. J. Approx.
Reason. 2(4), 377–393 (1988)
19. Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Mach. Learn. 59(1–2), 161–205
(2005)
20. Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2013)
21. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. In:
Anderson, J.A., Rosenfeld, E. (eds.) Neurocomputing: Foundations of Research, pp. 15–27.
MIT Press, Cambridge (1988)
22. Nápoles, G., Falcon, R., Papageorgiou, E., Bello, R., Vanhoof, K.: Rough cognitive ensembles.
Int. J. Approx. Reason. 85, 79–96 (2017)
23. Nápoles, G., Grau, I., Papageorgiou, E., Bello, R., Vanhoof, K.: Rough cognitive networks.
Knowl. Based Syst. 91, 46–61 (2016)
24. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982)
25. Pedrycz, W., Homenda, W.: From fuzzy cognitive maps to granular cognitive maps. IEEE
Trans. Fuzzy Syst. 22(4), 859–869 (2014)
26. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kauffman Publishers (1993)
27. Shi, H.: Best-first decision tree learning. Ph.D. thesis, Citeseer (2007)
28. Smeeton, N.C.: Early history of the kappa statistic. Biometrics 41, 795 (1985)
29. Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National
Conference on Artificial Intelligence, vol. 1, pp. 500–505. AAAI’06, AAAI Press (2006)
30. Sumner, M., Frank, E., Hall, M.: Speeding up logistic model tree induction. In: Knowledge
Discovery in Databases: PKDD 2005, pp. 675–683. Springer (2005)
31. Tumer, K., Oza, N.C.: Decimated input ensembles for improved generalization. In: Interna-
tional Joint Conference on Neural Networks, 1999. IJCNN’99, vol. 5, pp. 3069–3074. IEEE
(1999)
32. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics 1, 80–93 (1945)
33. Wilson, D.R., Martinez, T.R.: Improved heterogeneous distance functions. J. Artif. Intell. Res.
6(1), 1–34 (1997)
34. Yager, R.R.: On ordered weighted averaging aggregation operators in multicriteria decision-
making. In: Readings in Fuzzy Sets for Intelligent Systems, pp. 80–87. Elsevier (1993)
35. Yao, Y.: Three way decision: an interpretation of rules in rough set theory. In: Wen, P., Li, Y.,
Polkowski, L., Yao, Y., Tsumoto, S., Wang, G. (eds.) Rough Sets and Knowledge Technology,
pp. 642–649. Springer, Berlin (2009)

36. Yao, Y.: The superiority of three-way decisions in probabilistic rough set models. Inf. Sci.
181(1), 1080–1096 (2011)
Prediction by k-NN and MLP a New
Approach Based on Fuzzy Similarity
Quality Measure. A Case Study

Yaima Filiberto, Rafael Bello, Wilfredo Martinez, Dianne Arias,


Ileana Cadenas and Mabel Frias

Abstract In this paper, the k-Nearest Neighbors (k-NN) and Multilayer Perceptron (MLP) algorithms are applied to a classical task in the branch of Civil Engineering: predicting the corrosion behavior of the anchorage studs of railway fixations. A fuzzy similarity quality measure, combined with the Univariate Marginal Distribution Algorithm (UMDA), is used to calculate the weights of the features, which improves the performance of k-NN and MLP in the case of mixed data (features with discrete or real domains). Experimental results show that this approach is better than other methods used to calculate the weights of the features.

1 Introduction

Within the field of Artificial Intelligence, the Rough Set Theory (RST), proposed by Pawlak in 1982, offers measures for the analysis of data. The measure called classification quality allows calculating the consistency of a decision system. Its

Y. Filiberto (B) · D. Arias · M. Frias


Department of Computer Science, University of Camaguey, Camaguey, Cuba
e-mail: yaima.filiberto@reduc.edu.cu
D. Arias
e-mail: dianne.arias@reduc.edu.cu
M. Frias
e-mail: mabel.frias@reduc.edu.cu
R. Bello
Department of Computer Science, University of Las Villas, Santa Clara, Cuba
e-mail: rbellop@uclv.edu.cu
W. Martinez · I. Cadenas
Department of Civil Engineer, University of Camaguey, Camaguey, Cuba
e-mail: wilfredo.martinez@reduc.edu.cu
I. Cadenas
e-mail: ileana.cadenas@reduc.edu.cu

© Springer Nature Switzerland AG 2019 337


R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_17

main limitation is that it can only be used for decision systems whose feature domains are discrete. A new measure (named the Similarity Quality Measure), for the case of decision systems in which the feature domains, including that of the decision feature, do not have to be discrete, was proposed in [1]. This measure has the limitation of using thresholds when constructing the similarity relations among the objects of the decision system. These thresholds are parameters of the method that must be adjusted, and such parameters are a recognized aggravating factor when analyzing any algorithm. The accuracy of the method is very sensitive to small variations in the thresholds, and the threshold values are also application dependent, so a careful adjustment process is needed to maximize the performance of the knowledge discovery process. Therefore, it is necessary to incorporate a technique that allows us to handle imprecision. The Fuzzy Sets Theory [2], as one of the main elements of soft computing, uses fuzzy relations to make computational methods more tolerant and flexible to imprecision, especially in the case of mixed data. Since the Similarity Quality Measure is quite sensitive to the similarity thresholds, this limitation was tackled by using fuzzy sets to categorize its domains through fuzzy binary relations. This new measure, named the Fuzzy Similarity Quality Measure, facilitates the definition of similarity relations (since there are fewer parameters to consider) without degrading, from a statistical perspective, the efficiency of the subsequent data mining tasks. The Fuzzy Similarity Quality Measure computes the relation between the similarity according to the conditional features and the similarity according to the decision feature $d$. The method proposed here as a feature weighting method is based on a heuristic search in which the fuzzy similarity quality of a decision system is used as the heuristic value. We use UMDA [3] to find the best set of weights; this method has shown good performance in solving optimization problems [1]. In this problem, each individual represents a set of weights $W$ and its quality is calculated by the fuzzy similarity measure. The impact of the new method, called UMDA+RST+FUZZY, on the k-Nearest Neighbors (k-NN) [4] and MLP [5] algorithms is studied in this paper.

2 The Similarity Quality Measure with Fuzzy Sets

In [6], a fuzzy (binary) relation $R$ was defined as a fuzzy collection of ordered pairs; a fuzzy relation from $X$ to $Y$ or, equivalently, a fuzzy relation in $X \times Y$, is a fuzzy subset of $X \times Y$ characterized by a membership (characteristic) function $\mu_R$ which associates with each pair $(x, y)$ its "grade of membership" $\mu_R(x, y)$ in $R$. We shall assume for simplicity that the range of $\mu_R$ is the interval $[0, 1]$ and will refer to the number $\mu_R(x, y)$ as the strength of the relation between $x$ and $y$. In the Fuzzy Similarity Quality Measure, membership (characteristic) functions are used to build the similarity relations between objects with respect to the predictive and decision features. These functions include the weights for each feature and local functions that calculate how similar the values of a given feature are. Given a decision system DS and these fuzzy relations, two granulations are built using the binary relations $R_1$

and $R_2$ defined in Eqs. (1) and (2):

$$x R_1 y = F_1(X, Y) \quad (1)$$

$$x R_2 y = F_2(X, Y) \quad (2)$$

where $R_1$ and $R_2$ are fuzzy relations defined to describe the similarity between objects $x$ and $y$ regarding the condition features and the decision feature, respectively. The binary relations $R_1$ and $R_2$ are defined by the functions $F_1$ and $F_2$ given in Eqs. (3) and (4):

$$F_1(X, Y) = \sum_{i=1}^{k} w_i \cdot \partial_i(X_i, Y_i) \quad (3)$$

$$F_2(X, Y) = \partial(X_d, Y_d) \quad (4)$$

where

$$\partial(X_i, Y_i) = \begin{cases} 1 - \dfrac{|X_i - Y_i|}{Max(\alpha_i) - Min(\alpha_i)} & \text{if } i \text{ is continuous} \\ 1 & \text{if } i \text{ is discrete and } X_i = Y_i \\ 0 & \text{if } i \text{ is discrete and } X_i \neq Y_i \end{cases} \quad (5)$$

These functions establish a similarity relationship between two objects $(x, y)$ considering their similarity with respect to the features in $A$ (calculated by the function $F_1$ in relation $R_1$) and with respect to the target feature (calculated by the function $F_2$ in relation $R_2$). The purpose is to find relations $R_1$ and $R_2$ such that $R_1(x)$ and $R_2(x)$ are as similar as possible for any element of the universe. From the fuzzy relations $R_1$ and $R_2$, the fuzzy sets $N_1(x)$ and $N_2(x)$ can be constructed. Based on this approach, the sets are built as:

$$N_1(x) = \{(y, \mu_{R_1}(x, y)) : \forall y \in U\} \quad (6)$$

$$N_2(x) = \{(y, \mu_{R_2}(x, y)) : \forall y \in U\} \quad (7)$$

The problem is to find the functions $F_1$ and $F_2$ such that $N_1(x) = N_2(x)$, where the symbol "=" denotes the greatest possible similarity between the $N_1(x)$ and $N_2(x)$ sets for every object in the universe. The degree of similarity between the two sets for an object $x$, calculated as the similarity between the fuzzy sets $N_1(x)$ and $N_2(x)$, can be computed by expression (8), which was presented in [7].


$$\varphi(x) = \frac{\displaystyle\sum_{y \in U} \left[1 - |\mu_{R_1}(x, y) - \mu_{R_2}(x, y)|\right]}{n} \quad (8)$$

Using expression (8), the similarity quality of a decision system (DS) with a universe of $n$ objects is defined by Eq. (9):

$$\theta(DS) = \frac{\displaystyle\sum_{x \in U} \varphi(x)}{n} \quad (9)$$
This measure represents the degree of similarity of a decision system.
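A minimal sketch of the measure under the reconstruction of Eqs. (3)–(9) above; the purely numeric attributes, the data layout and the variable names are our assumptions.

```python
import numpy as np

def theta(X, y, w, ranges, y_range):
    """Similarity quality of a decision system (Eqs. (3)-(9)): mean over all
    objects of phi(x), assuming purely numeric attributes and decision."""
    n = len(X)
    F1 = np.zeros((n, n))
    F2 = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            F1[a, b] = np.sum(w * (1 - np.abs(X[a] - X[b]) / ranges))  # Eq. (3)
            F2[a, b] = 1 - abs(y[a] - y[b]) / y_range                  # Eq. (4)
    phi = np.mean(1 - np.abs(F1 - F2), axis=1)   # expression (8), one value per x
    return phi.mean()                            # expression (9)

X = np.array([[0.2, 5.0], [0.4, 3.0], [0.9, 1.0]])
y = np.array([1.0, 1.2, 3.0])
w = np.array([0.5, 0.5])                         # candidate feature weights (sum to 1)
print(theta(X, y, w, X.max(0) - X.min(0), y.max() - y.min()))
```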

3 The Regression Problem with k-NN and MLP Methods

3.1 K-Nearest Neighbours

The key idea in the k-NN method is that similar input data vectors have similar output values [1, 8]. This algorithm assumes all instances correspond to points in the $n$-dimensional space $\mathbb{R}^n$. The target function value for a new query is estimated from the known values of the $k$ nearest training examples. One obvious refinement to the k-NN algorithm is to weight the contribution of each of the $k$ neighbors according to their distance to the query point $X_q$, giving greater weight to closer neighbors. The k-NN algorithm for approximating a discrete-valued target function is given in Eq. (10), following [9]:
$$\hat{f}(X_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \cdot \delta(v, f(x_i)) \quad (10)$$

The k-NN method is a simple, intuitive and efficient way to estimate the value of an unknown function. Finding the $k$ nearest neighbors requires the use of distance functions (nominal, numerical or mixed). Similarity functions are often employed in mixed problems, i.e. those with both nominal and numerical attributes [10]. The results presented in [11] show that an important aspect of methods based on similarity degrees, such as the k-NN method, is the set of weights assigned to the features, since appropriate weights significantly improve the performance of the method [12]. In this paper we propose a new alternative, based on the Fuzzy Similarity Quality Measure, for calculating the weights associated with the predictive features that appear in the weighted similarity function, as sketched below.
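As an illustration of how the learned weights enter the similarity function, here is a 1-NN regression sketch in which the weighted similarity follows Eq. (3); the data and weights are hypothetical.

```python
import numpy as np

def predict_1nn(X_train, y_train, x_query, w, ranges):
    """1-NN regression with the weighted similarity of Eq. (3): the query
    inherits the output value of its most similar training object."""
    sims = (w * (1 - np.abs(X_train - x_query) / ranges)).sum(axis=1)
    return y_train[np.argmax(sims)]

X = np.array([[0.1, 2.0], [0.5, 4.0], [0.9, 1.0]])
y = np.array([10.0, 20.0, 30.0])
w = np.array([0.7, 0.3])                   # weights found by the search (hypothetical)
print(predict_1nn(X, y, np.array([0.45, 3.5]), w, X.max(0) - X.min(0)))  # 20.0
```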

3.2 Multilayer Perceptron Neural Network

The most popular neural network model is the Multilayer Perceptron (MLP), and the most popular learning algorithm is Back-propagation (BP) [13], which is based on correcting the error. The essential character of the BP algorithm is gradient descent, and gradient descent is strictly dependent on the shape of the error surface. The error surface may have local minima and be multimodal, which can result in falling into a local minimum and premature convergence [14]. BP training is very sensitive to initial conditions [13]. In general terms, the choice of the initial weight vector $W_0$ may speed up convergence of the learning process towards a global or a local minimum if it happens to be located within the basin of attraction of that minimum. Conversely, if $W_0$ starts the search in a relatively flat region of the error surface, it will slow down the adaptation of the connection weights [15]. An MLP is composed of an input layer, an output layer and one or more hidden layers, but it has been shown that for most problems a single hidden layer is sufficient. The number of hidden units is directly related to the capabilities of the network; in our case it is set to $(i + j)/2$, where $i$ is the number of input neurons and $j$ the number of outputs. Each link between neurons has an associated weight $W$, which is modified in the so-called learning process. The information is passed from the input layer to the hidden layer, and then transmitted to the output layer, which is responsible for producing the network response [16]. In general, MLPs can have several hidden layers; however, we consider the initialization of MLPs with only one hidden layer. We assume a three-layer neural network with $n$ inputs (features), $q$ outputs (categories), and one hidden layer with $(n + q)/2$ nodes; see Fig. 1.
The method presented in this paper (the UMDA+RST+FUZZY method) is used to assign weights to the links between the input layer and the hidden layer, as sketched below.
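The chapter does not specify exactly how the learned feature weights are mapped onto the input-to-hidden links, so the per-feature scaling used in this sketch is our assumption.

```python
import numpy as np

def init_mlp(n_inputs, n_outputs, feature_weights, seed=0):
    """Three-layer MLP with (n_inputs + n_outputs) // 2 hidden units; the
    input-to-hidden connections are scaled by the learned feature weights."""
    rng = np.random.default_rng(seed)
    n_hidden = (n_inputs + n_outputs) // 2
    W1 = rng.standard_normal((n_inputs, n_hidden)) * feature_weights[:, None]
    W2 = rng.standard_normal((n_hidden, n_outputs))
    return W1, W2

w = np.array([0.4, 0.1, 0.3, 0.2])   # hypothetical weights from UMDA+RST+FUZZY
W1, W2 = init_mlp(n_inputs=4, n_outputs=1, feature_weights=w)
```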

Fig. 1 The topology of the


MLP

Algorithm 1 Pseudocode for the UMDAc algorithm

Set $t \leftarrow 1$;
Generate $L \gg 0$ individuals randomly;
while termination condition is not met do
    Select $I \le L$ individuals according to a selection method;
    Estimate the distribution $p^s(x, t) = \prod_{i=1}^{n} p(X_i, t)$ of the selected $I$ individuals;
    Generate $L$ new individuals according to the distribution $p^s(x, t)$;
    Set $t \leftarrow t + 1$;
end while

3.3 Algorithm UMDA+RST+FUZZY

In order to calculate the weights, a heuristic search is performed. We selected the Univariate Marginal Distribution Algorithm (UMDA) [3] for assigning weights, taking into account its relative ease of implementation, its speed in locating the optimal solution, its powerful scanning capabilities and its relatively lower computational cost in terms of memory and time. The implementation of an optimization algorithm to calculate a different weight for each attribute frees the civil engineering researcher from defining them using other qualitative or quantitative criteria. The UMDAc algorithm, proposed by Larrañaga et al. [3, 17], is a modified version of UMDA for continuous domains. The algorithm assumes that the variables are independent of each other, and some statistical tests are carried out for every variable in each generation in order to find the density function that best fits that variable (several different density functions are considered). The two parameters that are estimated, in the case that all the distributions are normal, are the average $\mu_i(t)$ and the standard deviation $\sigma_i(t)$.
In Algorithm 1 we show a brief pseudocode for the UMDAc algorithm. As can be seen, UMDA starts by randomly generating the initial population of potential solutions (individuals, also called points), and then the algorithm iteratively evolves the current population until a termination condition is met. This termination condition is usually either to find a solution or to reach a maximum number of function evaluations. The new generation of individuals is computed as follows. From the whole population, only $I$ individuals (with $I \le L$, $L$ being the size of the population) are selected. Then, UMDA explicitly extracts global statistical information from this set of $I$ parent solutions and builds a posterior probability distribution model of promising solutions, $p^s(x, t)$, based on the extracted information ($s$, $x$, and $t$ represent the selected set of parents, the set of variables composing the individuals, and the generation number, respectively). After estimating the univariate probability distribution, $L$ new solutions are sampled from the model thus built and fully or partly replace the current population (at generation $t$) to form the new one (generation $t + 1$). A compact sketch of this procedure is given below.
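In this sketch, the fitness function is a placeholder for the similarity quality $\theta(DS)$ of Eq. (9), and the population sizes are arbitrary.

```python
import numpy as np

def umda_c(fitness, dim, pop=50, parents=25, gens=100, seed=1):
    """Continuous UMDA: fit one normal distribution per variable to the
    selected parents and sample the next population from that model."""
    rng = np.random.default_rng(seed)
    X = rng.random((pop, dim))                        # initial random weights
    for _ in range(gens):
        order = np.argsort([-fitness(x) for x in X])  # best individuals first
        elite = X[order[:parents]]                    # I <= L selected parents
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-9
        X = np.clip(rng.normal(mu, sigma, size=(pop, dim)), 0.0, 1.0)
    return X[np.argmax([fitness(x) for x in X])]

best_w = umda_c(lambda w: -np.sum((w - 0.3) ** 2), dim=5)  # toy fitness stand-in
```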

For the problem addressed here, the optimization function to maximize is the value of expression (9); we name the resulting algorithm UMDA+RST+FUZZY.

4 Experimental Setup

We apply the proposed methods on real data sets from the UCI Machine Learning repository (baskball, detroit, diabetes-numeric, elusage, fishcatch, pollution, pwLinear, vineyard, bolts, cloud, gascons, veteran, longley, pyrim, bodyfat). The variants for calculating the weights for k-NN with $k = 1$ are: the method proposed in [1] (called PSO+RST, although in this case we use UMDA instead of PSO), the weights obtained by the Conjugated Gradient method (KNNVSM) [18], assigning the same weight to each feature (called Standard), and Relief [19]. To prove the effectiveness of the UMDA+RST+FUZZY method, the errors of the MLP were compared when the different weight calculation methods (Random (MLP-AL), Standard (1/Quantity-Features), KNNVSM, UMDA+RST and UMDA+RST+FUZZY) are used. The results achieved by k-NN and MLP in terms of the standard error, where the weights are initialized using the mentioned variants, are shown in Tables 1 and 2.
In order to compare the results, a multiple comparison test is used to find the best algorithm. Tables 3 and 4 show the results of the Friedman statistical test, where it can be observed that the best ranking is obtained by our proposal. This indicates that the accuracy of UMDA+RST+FUZZY is significantly better. The Iman-Davenport test was also used [20]. The resulting p-value = 0.004666159801 < α (with 3 and 33 degrees of freedom) for k-NN and MLP, respectively, indicates that there are indeed significant performance differences in the group for both methods. A sketch of the Friedman test is shown below.

Table 1 Results of the error for regression with k-NN method


Dataset KNNVSM Standard UMDA+RST UMDA+RST+FUZZY
baskball 0.092 0.092 0.099 0.091
detroit 32.18 32.18 28.92 26.39
diabetes-numeric 0.613 0.613 0.604 0.644
elusage 12.38 12.38 10.15 10.53
fishcatch 56.1 56.1 48.51 46.96
pollution 41.29 41 43.78 39.09
pwLinear 2.55 2.60 2.41 2.67
vineyard 2.56 2.52 2.14 2.40
bolts 11.8 11.74 9.37 8.18
cloud 0.55 0.55 0.42 0.40
gascons 8.39 8.39 8.14 8.10
veteran 120.6 120.6 116.8 91.79

Table 2 Results of the error for regression with MLP method


Dataset KNNVSM Standard UMDA+RST UMDA+RST+FUZZY
baskball 0.09 0.09 0.09 0.07
detroit 44.58 44.58 44.08 34.66
diabetes-numeric 0.6 0.6 0.6 0.49
elusage 10.71 11 10.93 9.65
pollution 58.35 62.94 60.1 36.55
vineyard 2.53 2.53 2.56 2.28
veteran 172.1 172.2 196.7 99.61
longley 368 368.4 393.9 264
pyrim 0.09 0.09 0.09 0.09
bodyfat 0.63 0.63 0.6 0.6

Table 3 Average ranks obtained by each method in the Friedman test for k-NN

Algorithm           Ranking
KNNVSM              3.2083
Standard            3.0417
UMDA+RST            2
UMDA+RST+FUZZY      1.75

Table 4 Average ranks obtained by each method in the Friedman test for MLP

Algorithm           Ranking
KNNVSM              2.6
Standard            3.2
UMDA+RST            3
UMDA+RST+FUZZY      1.2

There is a set of methods to increase the power of multiple tests; they are called sequential methods, or post-hoc tests. In this case we use the Holm test [21] to find algorithms that are significantly better. Taking UMDA+RST+FUZZY as the control method, pairwise comparisons are conducted between the control method and all others to determine the degree of rejection of each null hypothesis. The results reported in Table 5 reject all null hypotheses whose p-value is lower than 0.025, hence confirming the superiority of the control method [10]. Since the UMDA+RST versus UMDA+RST+FUZZY null hypothesis was not rejected, there are no significant differences in the performance of both algorithms when they are combined with the 1-NN method, and therefore they can be deemed equally effective. The results reported in Table 6 reject all null hypotheses whose p-value is lower than 0.05; as can be observed, the test rejects all cases in favor of the best-ranked algorithm. It can be noticed that UMDA+RST+FUZZY is statistically superior to all compared methods when combined with the MLP method.

Table 5 Holm’s table with α = 0.025 for 1-NN, UMDA+RST+FUZZY is the control method
i Algorithm z = (R0-Ri)/SE p Holm Hypothesis
3 KNNVSM 2.766993 0.005658 0.016667 Reject
2 Standard 2.450765 0.014255 0.025 Reject
1 UMDA+RST 0.474342 0.635256 0.05 Not rejected

Table 6 Holm’s table with α = 0.05 for MLP, UMDA+RST+FUZZY is the control method
i Algorithm z = (R0-Ri)/SE p Holm Hypothesis
3 Standard 3.464102 0.000532 0.016667 Reject
2 UMDA+RST 3.117691 0.001823 0.025 Reject
1 KNNVSM 2.424871 0.015314 0.05 Reject

5 Applications of the Method in the Solution of a Real


Problem

In this section a real problem from the branch of Civil Engineering is solved. In Cuba, the early determination of corrosion damage to the anchorage studs of railway fixations contributes to improved maintenance planning. To determine the causes of this behavior, an extensive field study was developed, from which the data set was prepared. The data set has 96 instances and 5 features, including the class feature. The description of the data set is shown in Table 7. The problem is to predict the corrosion behavior of the anchorage studs of the railway fixations.
The data used for the study come from experiments carried out on different railways in Cuba, in particular the central railway of the city of Camagüey. A sample of these data is shown in Table 8.
An experimental study on the corrosion data set is performed (Tables 9 and 10).
Predicting the corrosion behavior of the anchorage studs of the railway fixations for any orientation of the railway makes it possible to plan the anticorrosive maintenance of these elements appropriately and thereby to rationalize the material

Table 7 Description of the data-set used in the experiment


Attributes Description
Time of exhibition (time-exh) 3, 6 and 12 months
Area (area) Urban, Rural, Industrial and Marine-coastal
Relative position in the railway (pos) External, Interior
Azimuth of the railway (azimuth) 0, 10, 30, 125, 140, 225
Rail (rail) North, South
Lost mass of the stud (lost-mass) Numeric value between 0 and 1

Table 8 Example of data-set used in the experiment


time-exh area pos azimuth rail lost-mass
3 Urban Interior 0 North 0.19
3 Rural External 10 South 0.06
3 Marine-coastal External 225 South 0.42
6 Industrial Interior 30 North 0.29
12 Urban External 140 North 0.38
12 Industrial Interior 30 South 0.3

Table 9 Results of regression with 1-NN

Dataset      UMDA+RST    UMDA+RST+FUZZY
corrosion    0.07        0.06

Table 10 Results of regression with MLP

Dataset      UMDA+RST    UMDA+RST+FUZZY
corrosion    0.08        0.08

and human resources required for this task, and to increase the safety of the movement of the trains.

6 Conclusion

In this paper, the combination of the Fuzzy Similarity Quality Measure with the UMDA method, and the use of the feature weights computed by this method in the k-NN and MLP algorithms, have been studied. The main contribution is the combination of the Fuzzy Similarity Quality Measure with the UMDA method. This measure computes the degree of similarity of a decision system in which the features can have discrete or continuous values. The paper includes the calculation of the feature weights by means of the optimization of this measure. The experimental study shows a superior performance of the k-NN and MLP algorithms when the weights are initialized using the method proposed in this work, compared to other previously reported methods to calculate the weights of features. Its application to solve a prediction problem from the branch of Civil Engineering has shown satisfactory results.

References

1. Filiberto, Y., Bello, R., Caballero, Y., Larrua, R.: In: Proceedings of the 10th International
Conference on Intelligent Systems Design and Applications ISDA 2010 IEEE, pp. 1314–1319.
IEEE Press (2010)

2. Zadeh, L.A.: Inf. Control 8, 338 (1965)


3. Larrañaga, P., Etxeberria, R., Lozano, J.A., Peña, J.M.: Optimization by learning and simulation
of bayesian and gaussian networks. Kzza-ik-4-99, Dept. of Computer Science and Artificial
Intelligence, University of the Basque Country (1999)
4. Cover, T.M., Hart, P.E.: IEEE Trans. Inf. Theory, pp. 21–27 (1967)
5. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Cambridge
(1995)
6. Zadeh, L.A.: Inf. Sci. 3, 177 (1971)
7. Wang, W.: Fuzzy Sets Syst. 85, 305 (1997)
8. Filiberto, Y., Bello, R., Caballero, Y., Larrua, R.: In: International Workshop on Nature Inspired
Cooperative Strategies for Optimization, pp. 359–370. Springer, Berlin (2010)
9. Mitchell, T.: Machine Learning, p. 414. McGraw Hill (1997)
10. Fernandez, Y., Filiberto, Y., Bello, R.: In: 11th International Conference on Electrical Engi-
neering, Computing Science and Automatic Control, 1-6, pp. 296–301. IEEE Press, Mexico
(2014)
11. Duch, W., Grudzinski, K.: Intelligent Information Systems, pp. 32–36 (1999)
12. Filiberto, Y., Bello, R., Caballero, Y., Frias, M.: In: 4th International Workshop on Knowledge
Discovery. Knowledge Management and Decision Support, pp. 130–139 (2013)
13. Rumelhart, D., Hilton, G., Williams, R.: Nature 323, 533 (1986)
14. Fu, X., Zhang, S., Pang, Z.: A resource limited immune approach for evolving architecture
and weights of multilayer neural network, part I. ICSI 2010, vol. 6145, pp. 328–337. Springer,
Heidelberg (2010)
15. Adam, S., Alexios, D., Vrahatis, M.: Revisiting the problem of weight initialization for multi-
layer perceptrons trained with back propagation. ICONIP 2008, vol. 5507, pp. 308–315.
Springer, Heidelberg (2009)
16. Coello, L., Fernandez, Y., Filiberto, Y., Bello, R.: Computación y Sistemas 19(2), 309 (2015)
17. Etxeberria, R., Lozano, J.A., Peña, J.M., Larrañaga, P.: In: Wu, A.S. (ed.) Proceeding of the
Genetic and Evolutionary Computation Workshop Program. Morgan Kaufmann, Las Vegas,
Nevada, USA, pp. 201–204 (2000)
18. Wettschereck, D.: A description of the mutual information approach and the variable similarity
metric. Technical Report, Artificial Intelligence Research Division, German National Research
Center for Computer Science, Sankt Augustin, Germany (1995)
19. Kononenko, I.: In: European Conference on Machine Learning (1994)
20. Iman, R.L., Davenport, J.: Commun. Stat. 18, 571 (1980)
21. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70
(1979)
Scheduling in Queueing Systems
and Networks Using ANFIS

Eduyn López-Santana, Germán Méndez-Giraldo


and Juan Carlos Figueroa-García

Abstract This paper is concerned with a scheduling problem present in many real-world systems where customers must wait for a service, known as queueing systems. Classical queueing systems are handled using probabilistic theories, mostly based on asymptotic theory and/or sample analysis. We address a situation where neither enough statistical data exists nor asymptotic behavior can be assumed. Thus, we propose to use an Adaptive Neuro-Fuzzy Inference System (ANFIS) method to infer scheduling rules for a queueing problem based on uncertain data. We use the utilization ratio and the work in process (WIP) of a queue to train an ANFIS network and finally obtain the estimated cycle time of all tasks. Multiple tasks and rework are considered in the problem, so it cannot be easily modeled using classical probability theory. The experimental results, obtained through simulation analysis, show an improvement of our ANFIS method in the performance measures compared with traditional scheduling policies.

Keywords ANFIS · Scheduling · Queueing systems · Queueing networks · Utilization · WIP

1 Introduction

In many systems, particularly services, customers must often wait to be processed; for instance, the customers in a bank, or the people in a metro or subway station, among others. These systems are called queueing systems (QS). From the service supplier's view, there are several decision-making problems, such as:

E. López-Santana (B) · G. Méndez-Giraldo · J. C. Figueroa-García


Universidad Distrital Francisco José de Caldas, Bogotá, Colombia
e-mail: erlopezs@udistrital.edu.co
G. Méndez-Giraldo
e-mail: gmendez@udistrital.edu.co
J. C. Figueroa-García
e-mail: jcfigueroag@udistrital.edu.co
© Springer Nature Switzerland AG 2019 349
R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_18

the number of servers to attend the customers, the kind of technology to improve the service times, the queue policy (e.g. a serial or parallel setting), and the capacity of the system (servers plus queue size), among others. Some of these problems are frequently solved using classical queueing theory; however, when this is not possible because the data required for probabilistic analysis are not available, the problems are solved using the experience or perception of the people involved. This feature is a first source of uncertainty of the QS in the decision process.
Additional features in QSs, such as feedback loops in the system, non-linearity, variability, product mixes, routing, random equipment failures and stochastic arrival times, add more complexity to the problem [1]. Cases where the customer must follow several steps to be processed are called queueing networks (QN). In this setting, several queues arise, so the decision process is more complex, since numerous decisions must be taken simultaneously to ensure the flow through the system.
Decision-making tools usually compare the efficiency of different configurations in terms of equipment, operators, storage areas, waiting areas, etc., and determine long-term decisions, for instance in capacity expansion [1, 2]. There are several methods, such as queueing theory, Jackson networks, Mean Value Analysis and Equilibrium Point Analysis, among others [3]. However, these methods do not consider uncertain information.
Recently, López-Santana, Franco and Figueroa-García [1] studied the problem of scheduling tasks in a QS considering the condition of the system in terms of the queue's length, utilization and work in process, involving the imprecision in their measurement. They propose a Fuzzy Inference System (FIS) to determine the server to which a specific task is allocated according to the condition of the system, measured in terms of the queue's length and the server's utilization. In another related work [4], the authors propose an ANFIS to determine the status of the system using input variables such as the queue length and utilization in order to schedule tasks in a QS.
The purpose of this paper is to apply the ANFIS-based approach proposed in [4] to QSs and QNs to determine the server to which a specific task is allocated according to the queue's length and the server's utilization. In addition, we consider rework and multi-tasking features. To our knowledge, according to the literature review, this is the first time that ANFIS is applied to scheduling decisions in QSs and QNs.
The remainder of this paper is organized as follows: Sect. 2 presents a background and literature review of task scheduling in queueing systems. Section 3 describes the proposed method. Section 4 shows two example applications of our method in a QS and a QN. Finally, Sect. 5 concludes this work and provides possible research directions.

2 Background and Literature Review

In this section we give an overview of queueing systems (QS) and queueing networks (QN), and present a short review of works related to the scheduling process in QSs and QNs.

2.1 Queueing Systems (QS) and Queueing Networks (QN)

A QS is a setting in which customers (humans, finished goods, messages) arrive at a service facility, get served according to a given service discipline, and then depart [5]. A QN is configured when the service is completed step by step in different stages (or stations), where the customer is served in a sequential way. The complexity increases in QNs because scheduling several tasks over several stations considers more variables, such as capacity, routing probabilities, variability, blocking and reprocessing, among others [3]. In a QS, the customers requiring service are generated over time by an input source [6].
Figure 1 shows this process in a QS, where the customers enter the QS and join a queue. At certain times, a member of the queue is selected for service by some rule known as the queue discipline. The required service is then performed for the customer by the service mechanism, after which the customer leaves the QS. A QS can be characterized in terms of Kendall's notation [7], which encodes a system with the following structure:

1/2/3/4 (1)

where 1 refers to the arrival process, which can be Poisson (M), deterministic (D) or a general distribution different from Poisson (G); 2 is the service process, which can also be M, D or G; 3 represents the number of servers per stage of the process in the network, which can be single (represented by 1) or multiple (represented by s); and 4 states the system's capacity, infinite when it is empty, or a K to indicate the queue's length.
According to [1], the standard terminology and notation in QSs take the number of customers in the system as the state of the system. The queue length (Ql) is the

Fig. 1 The basic queueing system. Source [6]



number of customers waiting for service to begin or state of system minus number
of customers being served. Pn (t) denotes the probability of exactly n customers in
QS at time t, given number at time 0. s is the number of servers (parallel service
channels) in the QS. λn is the mean arrival rate (expected number of arrivals per unit
time) of new customers when n customers are in system and μn is the mean service
rate for overall system (expected number of customers completing service per unit
time) when n customers are in system.
When λn is a constant for all n, this constant is denoted by λ. When the mean
service rate per busy server is a constant for all n ≥ 1, this constant is denoted
by μ. (In this case, μn  sμ when n ≥ s, that is, when all s servers are busy.)
Also,ρ  λ/sμ is the utilization factor for the service facility, i.e., the expected
fraction of time the individual servers are busy, because λ/sμ represents the fraction
of the system’s service capacity (sμ) that is being utilized on the average by arriving
customers (λ).
When a QS has recently begun operation, the state of the system (number of
customers in the system) will be greatly affected by the initial state and by the time
that has since elapsed. The system is said to be in a transient condition. However, after
sufficient time has elapsed, the state of the system becomes essentially independent
of the initial state and the elapsed time (except under unusual circumstances). The
system has now essentially reached a steady-state condition, where the probability
distribution of the state of the system remains the same (the steady-state or stationary
distribution) over time. Queueing theory has tended to focus largely on the steady-
state condition, partially because the transient case is more difficult analytically.
We assume that Pn is the probability of exactly n customers in the QS. Then L, the expected number of customers in the QS, is computed as L = Σ_{n=0}^{∞} n Pn, and Lq, the expected queue length (which excludes customers being served), is computed as Lq = Σ_{n=s}^{∞} (n − s) Pn. In addition, W is the expected waiting time in the system (including service time) for each individual customer, and Wq is the expected waiting time in queue (excluding service time) for each individual customer. It has been proved that in a steady-state queueing process,

L = λW. (2)

It is known as Little’s Law [3, 8, 9]. Furthermore, the same proof also shows that

Lq = λWq. (3)

Equations (2) and (3) are extremely important because they enable all four of the fundamental quantities L, W, Lq, and Wq to be determined immediately as soon as one of them is found analytically.
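As a hedged illustration (our own example, not from the chapter), the following Python snippet checks Little's law numerically for an M/M/1 queue, whose steady-state distribution Pn = (1 − ρ)ρ^n is known in closed form.

```python
lam, mu, s = 3.0, 4.0, 1          # arrival rate, service rate, servers
rho = lam / (s * mu)              # utilization factor

# L and Lq from the definitions above, truncating the infinite sums.
L  = sum(n * (1 - rho) * rho**n for n in range(10_000))
Lq = sum((n - s) * (1 - rho) * rho**n for n in range(s, 10_000))

W, Wq = L / lam, Lq / lam         # Little's law: L = lam*W, Lq = lam*Wq
print(f"L={L:.4f}  Lq={Lq:.4f}  W={W:.4f}  Wq={Wq:.4f}")
# M/M/1 theory gives L = rho/(1-rho) = 3 and W = 1/(mu-lam) = 1, matching.
```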
Figure 2 presents an example of a QN. There are three stations, each with a single queue. External arrivals occur at stations 1 and 2, and there are two customer classes. In a single-class network, customers being processed or waiting for processing at any given station are assumed to be indistinguishable, whereas in a multi-class system several classes of customers are served at each station. In our example, stations 1 and 2 process classes 1 and 2, and station 3 processes only class 1. In addition, there are several routes for each class, defined by the stations at which a customer must be processed. Kendall's notation can be applied to single stations in a QN separately, but when the QN is analyzed globally the notation cannot be used.

Fig. 2 Example of the queueing network setting
In QSs and QNs, arrivals and service times are described in terms of probability distributions. In addition, characteristics of the service stations, such as their configuration and routing protocols, determine the flow of customers from one station to another, including the number of servers in each stage. Another feature is the size of the waiting area at each station: when it is limited, customers cause congestion at the previous station, and blockage arises at the following stations. In a general sense, a QN must be defined in terms of arrival and service rates and of the routing probabilities, or proportions, in which classes of customers are transferred sequentially from one service stage to another. In particular, the routing probabilities induce feedback cycles that make this type of system harder to understand (see Fig. 2). Since a QN is a system of interacting nodes, the operation of each node and the routing depend on what is happening along the whole network. Given this dependence, several phenomena can occur, alone or in combination: synchronous or parallel processing of transactions at multiple nodes; rerouting of transactions to avoid congestion (or interference); speeding up or slowing down of the processing rate at downstream nodes that may be idle or congested; and blocking of customers from entering a specific phase of the network when that phase cannot process more customers.

2.2 Review of Scheduling Techniques for QS and QN

Scheduling is a decision-making process related to the allocation of resources to perform a set of tasks over a specific planning horizon, subject to several operational constraints such as capacity or unavailability of resources, due dates, priorities, and cancellations, in order to optimize one or more objectives [10]. This problem has many applications in manufacturing and service environments; scheduling problems are particularly difficult to solve in services because of the increased complexity of these systems. QSs and QNs are the main examples of service systems, so their scheduling decisions are a rich area in which to develop new methods that help decision makers.
Terekhov et al. [11] provide an overview of queueing-theoretic models and methods that are relevant to scheduling in dynamic settings. They found that queueing theory aims to achieve optimal performance in some probabilistic sense (e.g., in expectation) over a long time horizon, since it is impossible to create an optimal schedule for every single sample path in the evolution of the system. Moreover, queueing theory generally studies systems with simple combinatorics, as such systems are more amenable to rigorous analysis of their stochastic properties, and usually assumes distributional, rather than exact, knowledge about the characteristics of jobs or job types. However, in cases where we lack the data to build such stochastic models, scheduling decisions are made with traditional rules [12, 13] such as FIFO (first in, first out), LIFO (last in, first out), SPT (shortest processing time), LPT (longest processing time), and EDD (earliest due date), among others. Likewise, it is possible to apply multi-attribute priority rules such as [13]: CR + SPT (critical ratio + shortest process time), S/OPN (minimum slack time per remaining operation), S/RPT + SPT (slack per remaining process time + shortest process time), PT + WINQ (process time + work in the next queue), and PT + PW (process time + wait time), among others. However, these rules take into account neither the uncertainty nor the condition of the system, so it is necessary to include these features in the solution techniques.
According to the purpose of QN modeling, the solution technique is selected considering how accurately the expected result matches the assumptions about the system's behavior. Baldwin et al. [14] classify these techniques into two types, namely exact and approximate. Among the analytical techniques, Jackson networks and BCMP networks are classified as exact; on the other hand, there are approximate techniques such as Mean Value Analysis (MVA) and Equilibrium Point Analysis. Table 1 presents the scope of the analysis techniques described above, specifying the types of customer and network that each can model with precision (with respect to the assumptions of each method). We include the technique called "Kingman's parametric decomposition" [3] in view of its contribution to the modeling of flow times in QNs.
Recently, several applications have been developed in QS. Jain et al. [15] develop an iterative approach using MVA for performance prediction in flexible manufacturing systems with multiple material-handling devices. They demonstrate improvements in throughput, average service time, and average waiting time with respect to the previous configuration of the material-handling devices, and they use a neuro-fuzzy controller to compare the performance measures obtained with MVA, demonstrating the consistency between the results of both techniques and providing a basis for automating the system using soft computing. Cruz [16] examines the problem of maximizing throughput in QNs with general service times, reducing the total number of waiting areas and the service rate through multi-objective genetic algorithms that find feasible solutions balancing the natural conflict between cost and throughput.

Table 1 Classification and scope of QN analysis techniques

Analytical technique                 Network topology      Customer type
Jackson networks                     Open, closed          Single-class
BCMP (Type I, II, III, IV)           Open, closed, mixed   Multi-class
Kingman's parametric decomposition   Open, closed, mixed   Multi-class, multi-class with retry
Mean value analysis (MVA)            Open, closed          Single-class
Equilibrium point analysis (EPA)     Open, closed, mixed   Multi-class, multi-class with retry

Yang and Liu [17] develop a hybrid transfer function model that combines statistical analysis, simulation, and queueing analysis, taking the system's work rate as the input value and throughput and work in process (WIP) as the performance variables.
Applications of fuzzy logic to scheduling in QS and QN are scarce. Suganthi and Meenakshi [18] developed a FIS combined with a round-robin priority rule to schedule tasks in a cognitive radio network. Chude-Olisah et al. [19] address queue scheduling for packet-switched systems, a vital aspect of congestion control. They propose a fuzzy logic-based decision method for queue scheduling that enforces some level of control over traffic with different quality-of-service requirements using predetermined values. The results of simulation experiments show that their proposed method reduces packet drops, provides good link utilization, and minimizes queue delay compared with priority queueing (PQ), first-in-first-out (FIFO), and weighted fair queueing (WFQ). Cho et al. [20, 21] present a method that uses a FIS to dynamically and efficiently schedule priority queues in internet routers. The fuzzy rules obtained minimize the Lyapunov function presented in [21]. Their simulation-based results outperform the popular weighted round robin (WRR) queue scheduling mechanism. López-Santana et al. [1] present a FIS to schedule tasks in QS; it is also applied to QN in [22]. They show that the proposed FIS obtains better results than traditional scheduling rules such as round robin and equiprobable.
However, the use of artificial intelligence to model QSs is also scarce. Azadeh et al. [23] demonstrate how to optimize the modeling and simulation of QSs and QNs: under this scheme, system constraints and desired performance objectives may be included, gaining flexibility and the ability to deal with the complexity and nonlinearity associated with modeling QSs and QNs.

3 Proposed ANFIS for Scheduling in QS and QN

In this section, first we describe the architecture of ANFIS. Second, we show our
ANFIS-based approach to task scheduling in a QS.

3.1 ANFIS Architecture

ANFIS is a flexible approach based on fuzzy logic and artificial neural networks [24]. We present an architecture with two inputs, x and y, and one output, z, based on [24–26]. Suppose the rule base contains two fuzzy if–then rules:
Rule 1: If x is A1 and y is B1, then z1 = p1x + q1y + r1.
Rule 2: If x is A2 and y is B2, then z2 = p2x + q2y + r2.
The membership functions and fuzzy reasoning are illustrated in Fig. 3, and the corresponding equivalent ANFIS architecture is shown in Fig. 4. The node functions within the same layer belong to the same function family, as described below:
Layer 1. Every node i in this layer is a square node that computes

Oi^1 = μAi(x), (4)

where x is the input to node i and Ai is the linguistic label (e.g., small, large) associated with this node function. μAi(x) is a membership function (MF) with a maximum value of 1 and a minimum value of 0. Any continuous and piecewise-differentiable function can be used as a node function in this layer. Parameters in this layer are referred to as premise parameters.

Fig. 3 Membership functions and fuzzy reasoning (based on [24])



Fig. 4 Architecture of ANFIS (based on [24])

Layer 2. Every node in this layer is a circle node labeled Π, which multiplies the incoming signals and sends the product out:

wi = μAi(x) μBi(y), i = 1, 2, . . . (5)

Layer 3. Every node in this layer is a circle node labeled N. The ith node computes the ratio of the ith rule's firing strength to the sum of all rules' firing strengths:

w̄i = wi / (w1 + w2). (6)

For convenience, the outputs of this layer are called normalized firing strengths.
Layer 4. Every node i in this layer is a square node that computes

Oi^4 = w̄i zi = w̄i (pi x + qi y + ri), (7)

where w̄i is the output of layer 3 and {pi, qi, ri} is the parameter set. Parameters in this layer are referred to as consequent parameters.
Layer 5. The single node in this layer is a circle node labeled Σ that computes the overall output as the summation of all incoming signals:

O^5 = Σi w̄i zi = (Σi wi zi) / (Σi wi). (8)

Thus, the ANFIS is an adaptive network that is functionally equivalent to a Sugeno fuzzy model [25]. We use a hybrid learning algorithm similar to [24]: in the forward pass, the functional signals go forward up to layer 4 and the consequent parameters are identified by least-squares estimation; in the backward pass, the error rates propagate backward and the premise parameters are updated by gradient descent. The consequent parameters thus identified are optimal (in the consequent parameter space) under the condition that the premise parameters are fixed.
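The forward pass of Eqs. (4)–(8) can be sketched in a few lines of Python. This is our own minimal illustration of the two-rule architecture of Fig. 4; the Gaussian MF and all parameter values are assumptions, not taken from the chapter.

```python
import math

def gaussmf(x, c, sigma):
    # One admissible continuous, piecewise-differentiable MF (Layer 1, Eq. 4).
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    # Layer 1: membership degrees; premise parameters are (center, width).
    muA = [gaussmf(x, c, s) for (c, s) in premise["A"]]
    muB = [gaussmf(y, c, s) for (c, s) in premise["B"]]
    # Layer 2: firing strengths w_i = muA_i(x) * muB_i(y)        (Eq. 5)
    w = [muA[i] * muB[i] for i in range(2)]
    # Layer 3: normalized firing strengths w_i / (w_1 + w_2)     (Eq. 6)
    wbar = [wi / sum(w) for wi in w]
    # Layers 4-5: weighted rule outputs and their summation      (Eqs. 7-8)
    return sum(wbar[i] * (p * x + q * y + r)
               for i, (p, q, r) in enumerate(consequent))

premise = {"A": [(0.2, 0.3), (0.8, 0.3)], "B": [(0.2, 0.3), (0.8, 0.3)]}
consequent = [(1.0, 0.5, 0.1), (2.0, 1.5, 0.3)]  # {p_i, q_i, r_i}, made up
print(anfis_forward(0.6, 0.4, premise, consequent))
```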

3.2 Proposed ANFIS

López-Santana et al. [1] present the task scheduling problem in QS and propose a fuzzy inference system whose output is the cycle time (W) and whose inputs are the queue's length (Ql) and the utilization (u), based on Kingman's equation:

W = V U T, (9)

where V refers to the variability in the system, U is the utilization, and T is the time. Likewise, Little's law in Eq. (2) states that W depends on L. Thus, we have two equations with which to determine the cycle time W. The authors used a Mamdani FIS for the fuzzification and defuzzification interfaces. As input, their approach uses membership functions defined by experts or users of the system.
In this paper, we present an alternative process based on the ANFIS approach. Our solution does not need the MFs of the output to be set by hand; instead, it uses a set of training data to build them. ANFIS integrates neural network and fuzzy logic principles to train a Sugeno system using neuro-adaptive learning, as described in the previous section. Figure 5 presents the architecture of the proposed ANFIS approach. In what follows, we describe the inputs, the output, the method, and the performance measure.
Inputs. The inputs of the ANFIS are: u, the average utilization over all servers; Ql, the average queue length over all servers; MFi, the number of membership functions of each input i ∈ {u, Ql}; μi, the type of membership function of each input i ∈ {u, Ql}; and N, the number of epochs.
Output. The output of the ANFIS is W, the estimated cycle time of the station or step.

Fig. 5 Architecture of the ANFIS approach

Method. The ANFIS is given by

W = ANFIS(u, Ql). (10)

Similarly to the FIS proposed in [1], our ANFIS is evaluated in a simulation whenever a customer arrives at the process, and the server l* that attends a specific customer is determined as the server with the minimum value of Wl over all l ∈ {1, 2, . . . , s}. Equation (11) states this rule:

l* = argmin_{l ∈ {1,2,...,s}} {Wl}. (11)

Finally, the proposed ANFIS is applied to each station in a QN, but the training data consider the effect of the whole network, i.e., they account for the different flows arriving as inputs at every station.
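A minimal sketch of this dispatching rule follows; the code and names are ours, and the stub estimator merely stands in for the trained ANFIS of Eq. (10), using any monotone surrogate consistent with the response surfaces shown later in Figs. 8 and 16–18.

```python
def estimate_cycle_time(u, ql):
    # Stand-in for the trained ANFIS of Eq. (10): cycle time grows with
    # both the queue length and the utilization (hypothetical surrogate).
    return (1.0 + ql) / max(1e-6, 1.0 - min(u, 0.999))

def select_server(utilizations, queue_lengths):
    # Eq. (11): dispatch to the server with the minimum estimated W_l.
    estimates = [estimate_cycle_time(u, ql)
                 for u, ql in zip(utilizations, queue_lengths)]
    return min(range(len(estimates)), key=estimates.__getitem__)

u  = [0.82, 0.75, 0.91, 0.70]   # hypothetical current server utilizations
ql = [3, 2, 5, 2]               # hypothetical current queue lengths
print("dispatch to server", select_server(u, ql))  # 0-indexed
```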

4 Results

We develop a prototype of a QS and a QN in Matlab 2017 using the SimEvents toolbox. In addition, we use the ANFIS toolbox to develop our proposed method. In this section, we present two simulations. The first consists of a simulation of a QS in which we apply and compare four scheduling techniques: the round robin policy, the equiprobable policy, the FIS proposed in [1], and our proposed ANFIS method. The second consists of a simulation of a QN in which we apply and compare the same scheduling techniques.

4.1 Example 1: Simulation of a QS

Figure 6 presents the prototype for a system with 4 servers, each with a queue, a single customer class, infinite capacity, and a rework probability of 20%. The queue discipline is FIFO (first in, first out).

4.1.1 Setting ANFIS Parameters

To set the parameters of the ANFIS model, we use as training data the results of a single simulation of 1000 time units using the round-robin scheduling policy reported in [1]. The QS is set up as G/G/4, where the inter-arrival time follows a uniform distribution between 0.5 and 1 min, the service time follows a uniform distribution between 1.5 and 3.0 min, and the rework probability is 20%. Each ANFIS input, utilization (u) and queue's length (Ql), has 3 membership functions (MFu = 3) of type μu = gbellmf (generalized bell-shaped membership function). The number of epochs is N = 50.
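For reference, the generalized bell MF has the standard form 1/(1 + |(x − c)/a|^(2b)). The following Python re-implementation and the parameter values in it are our own sketch, not the values learned by the toolbox.

```python
def gbellmf(x, a, b, c):
    # Generalized bell-shaped MF: a = width, b = slope, c = center.
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

# Three bells spread over [0, 1], e.g. for the utilization input u.
params = [(0.2, 2.0, 0.0), (0.2, 2.0, 0.5), (0.2, 2.0, 1.0)]  # (a, b, c), made up
for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(u, [round(gbellmf(u, *p), 3) for p in params])
```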

Fig. 6 Example 1 of prototype for a QS with 4 servers with rework

Fig. 7 Results of training data of proposed ANFIS for Example 1 a training error, b training data
versus FIS output

Figure 7 shows the training error (a) and the training data versus the FIS output (b). The results indicate a small error that decreases as the epochs increase, and the fit of the ANFIS output is good. In this case, no set of rules is defined by the user. Figure 8 shows the results of the ANFIS approach based on the training data: the rule-based system in graph (a) and its response surface in graph (b). The response surface indicates that as the utilization and the queue's length increase, the cycle time also increases, which agrees with the results of [1].
In order to compare the performance of our ANFIS-based approach, we consider a round robin scheduling policy, which allocates servers in a sequential way; an equiprobable policy, which allocates any server with the same probability; and the FIS approach proposed in [1].

Fig. 8 Training results of proposed ANFIS for Example 1 a rule base system, b response surface

Figures 9 and 10 show the utilizations and queue lengths for all servers, respectively. Assuming the mean values of the inter-arrival (ta) and service (ts) times and exponential distributions, the theoretical utilization without rework is given by ts/(m ta), where m is the number of servers. For the example, the theoretical utilization of the system is 0.75. In our example's results for a single simulation run, the utilizations converge to 0.80 on average for all servers; the increase is due to the rework. However, all scheduling policies converge to the same value in steady state, and the FIS and ANFIS approaches converge faster than the round robin and equiprobable policies. With respect to the queue length, the round robin policy yields the shortest queue, followed by our ANFIS approach and then the FIS approach; the equiprobable policy has the lowest performance. The utilization results show that ANFIS converges to the minimum value, approximately 0.79, which is lower than the FIS and round robin policies; the equiprobable policy again performs worst.
These results confirm the rapid response of our ANFIS approach compared with the traditional policies: it is better than or equal to the FIS approach proposed in [1]. What distinguishes the FIS and ANFIS approaches is that they permanently check the system's status, i.e., they are condition-based scheduling policies.

Fig. 9 Example 1's results of utilizations, case G/G/4 with rework, for a round robin, b equiprobable, c FIS ([1]), and d ANFIS

Fig. 10 Example 1's results of queue's length, case G/G/4 with rework, for a round robin, b equiprobable, c FIS ([1]), and d ANFIS

4.2 Example 2: Simulation of a QN

Figure 11 presents the structure of the QN for Example 2, with three stations, based on [22]. The first station has 4 servers, an external input, and a rework probability of 0.2; its outputs go to the second and third stations with probabilities 0.3 and 0.5, respectively. The second station has 3 servers, an additional external input, and a rework probability of 0.15; its outputs go to the first and third stations with probabilities 0.4 and 0.45, respectively. The third station has 4 servers and rework, and its outputs exit the QN. Each station is a G/G/s system. For the first input (at the first station), the inter-arrival times follow a uniform probability density function U(0.4, 1.5); for the second station, the density is U(1.5, 3.5). The service-time probability densities are U(1.5, 2.5), U(1.7, 2.7), and U(2, 2.8) for the first, second, and third stations, respectively. Figure 12 shows the Matlab prototype for Example 2.

Fig. 11 Summary of the QN for Example 2. Source [22]

Fig. 12 Example 2 prototype of the QN. Source [22]

4.2.1 Setting ANFIS Parameters

To set the parameters of the ANFIS model for the QN, we use as training data the results of a single simulation of the QN over 500 time units using the round-robin scheduling policy reported in [22]. In addition, each ANFIS input, utilization (u) and queue's length (Ql), has 3 membership functions (MFu = 3) of type μu = gbellmf (generalized bell-shaped membership function). The number of epochs is N = 150 for stations 1 and 2, and N = 100 for station 3.
Figures 13, 14 and 15 show the training error (a) and the training data versus the ANFIS output (b) for stations 1 to 3, respectively. The results show a small error that is reduced as the epochs increase, and the fit of the ANFIS output to the simulation data is good. These results are congruent with those obtained in Example 1.

Fig. 13 Example 2’s results of training data of proposed ANFIS for station 1 a training error, b
training data versus FIS output

Fig. 14 Example 2’s results of training data of proposed ANFIS for station 2 a training error, b
training data versus FIS output

Fig. 15 Example 2’s results of training data of proposed ANFIS for station 3 a training error, b
training data versus FIS output

Fig. 16 Training results of proposed ANFIS for station 1 a rule base system, b response surface

Fig. 17 Training results of proposed ANFIS for station 2 a rule base system, b response surface

Figures 16, 17 and 18 illustrate the FISs obtained by the ANFIS method from the training data for stations 1 to 3, respectively: the rule-based system in graph (a) and its response surface in graph (b). The response surfaces indicate that as the utilization and the queue's length increase, the cycle time also increases, which is coherent with the results of Example 1 and with the results presented in [1, 22].
As in Example 1, we compare the performance of our ANFIS-based approach with the round robin scheduling policy, which allocates servers sequentially; the equiprobable policy, which allocates any server with the same probability; and the FIS approach proposed in [1]. We run a simulation of 500 time units with a warm-up time of 100 min to pass the transient condition.
Figures 19 and 20 present the utilizations and queue lengths, respectively, for the equiprobable scheduling policy. Subfigures (a), (b) and (c) show the results for stations 1, 2 and 3, respectively. The results exhibit the evolution of the utilizations over time, where the values tend to converge to a similar value; however, some servers at each station have a high utilization because the scheduling policy does not observe the queue's length, which is also high. This scheduling policy does not observe the condition of the servers and always assigns the same work to all of them.

Fig. 18 Training results of proposed ANFIS for station 3 a rule base system, b response surface

Fig. 19 Results of utilizations of the equiprobable scheduling policy of Example 2: a utilization for station 1, b utilization for station 2, c utilization for station 3

The results of the round robin scheduling policy are shown in Figs. 21 and 22 for the utilizations and queue lengths, respectively. Subfigures (a), (b) and (c) present the results for stations 1, 2 and 3, respectively. As with the equiprobable scheduling policy, the results are shown for each station and all servers. In this case, the utilization for all servers at each station converges to a similar value, and the queue's length is low compared with the equiprobable results. The queue's length is shorter than under the equiprobable policy at all times; however, this policy does not take into account the condition of the station, and if a breakdown occurred the allocation of jobs to be processed would remain the same.

Fig. 20 Results of queue's length of the equiprobable scheduling policy of Example 2: a queue's length for station 1, b queue's length for station 2, c queue's length for station 3

Fig. 21 Results of utilizations for the round robin scheduling policy of Example 2: a utilization for station 1, b utilization for station 2, c utilization for station 3

Fig. 22 Results of queue's length for the round robin scheduling policy of Example 2: a queue's length for station 1, b queue's length for station 2, c queue's length for station 3

Figures 23 and 24 illustrate the utilizations and queue lengths, respectively, for the FIS scheduling policy. Subfigures (a), (b) and (c) present the results for stations 1, 2 and 3, respectively. The utilizations converge over time to a similar value, and the queue's length remains low at all times. This scheduling policy considers the condition of the servers and always assigns the work according to the minimum cycle-time value computed with the proposed FIS.

Fig. 23 Results of utilizations for the proposed FIS scheduling policy of Example 2: a utilization for station 1, b utilization for station 2, c utilization for station 3

Fig. 24 Results of queue's length for the proposed FIS scheduling policy of Example 2: a queue's length for station 1, b queue's length for station 2, c queue's length for station 3

Figures 25 and 26 present the utilizations and queue lengths, respectively, for the proposed ANFIS scheduling policy. Subfigures (a), (b) and (c) show the results for stations 1, 2 and 3, respectively. We can observe that the utilizations are similar to the FIS results over time and converge to a similar value. Regarding the queue's length, the results show a low value at all times, lower than the FIS results. This scheduling policy also considers the condition of each server and always allocates a customer according to the minimum cycle-time value computed with the proposed ANFIS. The results obtained with FIS and ANFIS are better than those of the round robin and equiprobable policies.
Finally, the results are consistent with those reported in [1, 22]. Moreover, the FIS and ANFIS approaches converge faster than the traditional policies for all stations. In addition, under the equiprobable policy the utilization and queue's length exhibit high variability across all stations, whereas under the round robin, FIS, and ANFIS policies the results settle at the same value for all stations.

5 Concluding Remarks

This paper studies the problem of scheduling customers or tasks in queueing systems and queueing networks, which consists in allocating the servers that process each customer. We propose a method to schedule the customers through the stations based on an ANFIS approach, which consists in selecting the server according to a cycle time estimated with a FIS that uses the utilization and the queue's length as inputs. Traditional scheduling policies work with different rules such as round robin, equiprobable, and shortest queue, among others.

Fig. 25 Results of utilizations for the proposed ANFIS scheduling policy of Example 2: a utilization for station 1, b utilization for station 2, c utilization for station 3

Fig. 26 Results of queue's length for the proposed ANFIS scheduling policy of Example 2: a queue's length for station 1, b queue's length for station 2, c queue's length for station 3

Our simulation results evidence a better performance of the ANFIS approach than classical scheduling policies such as round robin and equiprobable. In addition, the results are similar to those of the FIS approach; however, the ANFIS allows the FIS to be built from historical data and could consider more information, such as breakdowns, variability, and blocking. Thus, our approach provides a condition-based framework for developing scheduling rules for queueing systems and queueing networks without constraints.
This work opens several possible lines of future development. Other variables, such as breakdowns, variability, and blocking, could be considered as inputs to the ANFIS. It is also possible to design a multi-agent system that allows load balancing of tasks in queueing networks. Moreover, validation in real-world cases, for example healthcare services or call-center services, is possible.

References

1. López-Santana, E.R., Franco, C., Figueroa-Garcia, J.C.: A Fuzzy inference system to schedul-
ing tasks in queueing systems. In: Huang, D.-S., Hussain, A., Han, K., Gromiha, M.M. (eds.)
Intelligent Computing Methodologies, pp. 286–297. Springer International Publishing AG
(2017)
2. Yang, F.: Neural network metamodeling for cycle time-throughput profiles in manufacturing.
Eur. J. Oper. Res. 205, 172–185 (2010). https://doi.org/10.1016/j.ejor.2009.12.026
3. Hopp, W.J., Spearman, M.L.: Factory Physics—Foundations of Manufacturing Management.
Irwin/McGraw-Hill (2011)
4. Lopez-Santana, E., Mendez-Giraldo, G., Figueroa-García, J.C.: An ANFIS-based approach to
scheduling in queueing systems. In: 2nd International Symposium on Fuzzy and Rough Sets
(ISFUROS 2017), pp. 1–12. Santa Clara, Cuba (2017)
5. Ross, S.: Introduction to Probability Models. Academic Press (2006)
6. Hillier, F.S., Lieberman, G.J.: Introduction to Operations Research. McGraw-Hill Higher Edu-
cation (2010)
7. Kendall, D.G.: Stochastic processes occurring in the theory of queues and their analysis by the
method of the imbedded Markov Chain. Ann. Math. Stat. 24, 338–354 (1953). https://doi.org/
10.1214/aoms/1177728975
8. Little, J.D.C.: A proof for the queuing formula: L = λ W. Oper. Res. 9, 383–387 (1961). https://
doi.org/10.1287/opre.9.3.383
9. Little, J.D.C., Graves, S.C.: Little’s law. In: Chhajed, D., Lowe, T.J. (eds.) Building Intuition:
Insights From Basic Operations Management Models and Principles, pp. 81–100. Springer,
Boston, MA (2008)
10. López-Santana, E.R., Méndez-Giraldo, G.A.: A knowledge-based expert system for scheduling
in services systems. In: Figueroa-García, J.C., López-Santana, E.R., Ferro-Escobar, R. (eds.)
Applied Computer Sciences in Engineering WEA 2016, pp. 212–224. Springer International
Publishing AG (2016)
11. Terekhov, D., Down, D.G., Beck, J.C.: Queueing-theoretic approaches for dynamic scheduling:
a survey. Surv. Oper. Res. Manag. Sci. 19, 105–129 (2014). https://doi.org/10.1016/j.sorms.
2014.09.001
12. Pinedo, M.L.: Planning and Scheduling in Manufacturing and Services. Springer (2009)
13. López-Santana, E.: Review of scheduling problems in service systems (2018)
14. Baldwin, R.O., Davis IV, N.J., Midkiff, S.F., Kobza, J.E.: Queueing network analysis: concepts,
terminology, and methods. J. Syst. Softw. 66, 99–117 (2003). https://doi.org/10.1016/S0164-
1212(02)00068-7

15. Jain, M., Maheshwari, S., Baghel, K.P.S.: Queueing network modelling of flexible manufac-
turing system using mean value analysis. Appl. Math. Model. 32, 700–711 (2008). https://doi.
org/10.1016/j.apm.2007.02.031
16. Cruz, F.R.B.: Optimizing the throughput, service rate, and buffer allocation in finite queueing
networks. Electron. Notes Discret. Math. 35, 163–168 (2009). https://doi.org/10.1016/j.endm.
2009.11.028
17. Yang, F., Liu, J.: Simulation-based transfer function modeling for transient analysis of general
queueing systems. Eur. J. Oper. Res. 223, 150–166 (2012). https://doi.org/10.1016/j.ejor.2012.
05.040
18. Suganthi, N., Meenakshi, S.: An efficient scheduling algorithm using queuing system to min-
imize starvation of non-real-time secondary users in cognitive radio network. Clust. Comput.
1–11 (2018). https://doi.org/10.1007/s10586-017-1595-8
19. Chude-Olisah, C.C., Chude-Okonkwo, U.A.K., Bakar, K.A., Sulong, G.: Fuzzy-based dynamic
distributed queue scheduling for packet switched networks. J. Comput. Sci. Technol. 28,
357–365 (2013). https://doi.org/10.1007/s11390-013-1336-2
20. Cho, H.C., Fadali, M.S., Hyunjeong L.: Dynamic queue scheduling using fuzzy systems for
internet routers. In: The 14th IEEE International Conference on Fuzzy Systems, FUZZ’05,
pp. 471–476. IEEE (2005)
21. Cho, H.C., Fadali, M.S., Lee, J.W., Lee, Y.J., Lee, K.S.: Lyapunov-based fuzzy queue schedul-
ing for internet routers TT. Int. J. Control Autom. Syst. 5, 317–323 (2007)
22. López-Santana, E.R., Franco-Franco, C., Figueroa-García, J.C.: Simulation of fuzzy infer-
ence system to task scheduling in queueing networks. In: Communications in Computer and
Information Science, pp. 263–274 (2017)
23. Azadeh, A., Faiz, Z.S., Asadzadeh, S.M., Tavakkoli-Moghaddam, R.: An integrated artificial
neural network-computer simulation for optimization of complex tandem queue systems. Math.
Comput. Simul. 82, 666–678 (2011). https://doi.org/10.1016/j.matcom.2011.06.009
24. Geethanjali, M., Raja Slochanal, S.M.: A combined adaptive network and fuzzy inference
system (ANFIS) approach for overcurrent relay system. Neurocomputing 71, 895–903 (2008).
https://doi.org/10.1016/j.neucom.2007.02.015
25. Jang, J.-S.R.: ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man
Cybern. 23, 665–685 (1993). https://doi.org/10.1109/21.256541
26. López-Santana, E.R., Méndez-Giraldo, G.A.: A non-linear optimization model and ANFIS-
based approach to knowledge acquisition to classify service systems. In: Huang, D.-S., Bevilac-
qua, V., Premaratne, P. (eds.) Intelligent Computing Theories and Application, pp. 789–801.
Springer International Publishing (2016)
Genetic Fuzzy System for Automating
Maritime Risk Assessment

Alexander Teske, Rafael Falcon, Rami Abielmona and Emil Petriu

Abstract This chapter uses genetic fuzzy systems (GFS) to assess the risk level of
maritime vessels transmitting Automatic Identification System (AIS) data. Previous
risk assessment approaches based on fuzzy inference systems (FIS) relied on domain
experts to specify the FIS membership functions as well as the fuzzy rule base
(FRB), a burdensome and time-consuming process. This chapter aims to alleviate
this burden by learning the membership functions and FRB for the FIS of an existing
Risk Management Framework (RMF) directly from data. The proposed methodology
is tested with four different case studies in maritime risk analysis. Each case study
concerns a unique scenario involving a particular region: the Gulf of Guinea, the Strait
of Malacca, the Northern Atlantic during a storm, and the Northern Atlantic during
a period of calm seas. The experiments compare 14 GFS algorithms from the KEEL
software package and evaluate the resulting FRBs according to their accuracy and
interpretability. The results indicate that IVTURS, LogitBoost, and NSLV generate
the most accurate rule bases while SGERD, GCCL, NSLV, and GBML each generate
interpretable rule bases. Finally, IVTURS, NSLV, and GBML algorithms offer a
reasonable compromise between accuracy and interpretability.

Keywords Maritime domain awareness · Risk management · Genetic algorithms · Fuzzy systems · Multi-objective optimization

A. Teske (B) · R. Falcon · R. Abielmona · E. Petriu


School of Electrical Engineering and Computer Science, University of Ottawa,
Ottawa, Canada
e-mail: atesk062@uottawa.ca
R. Falcon
e-mail: rfalcon@ieee.org
R. Abielmona
e-mail: rami.abielmona@larus.com
E. Petriu
e-mail: petriu@uottawa.ca
R. Falcon · R. Abielmona
Research & Engineering Division, Larus Technologies Corporation, Ottawa, Canada


1 Introduction

Maritime Domain Awareness (MDA) can be understood as the situational knowledge of physical and environmental conditions which can affect the safety and timeliness
of maritime operations [1]. Assessing the risk level of maritime entities in real time
is a vital aspect of MDA if risks are to be mitigated. Maritime Risk Assessment
(MRA) is the process of quantifying the risk level of vessels based on the fusion of
multiple data sources. Traditionally, this has required domain experts to identify the
risk factors in the environment, to define their mathematical underpinnings, and to
determine how the risk values should be combined to produce an overall value. This
process is not only time consuming, but it is also error-prone and uncertainty-plagued
given the potential for disagreement among a set of domain experts.
The Risk Management Framework (RMF) put forth in [2, 3] makes use of a fuzzy
inference system (FIS) to combine the value of several risk factors into a single
overall risk level. The fuzzy rule base powering the FIS is directly acquired from
domain experts.
In our previous work submitted to the 2017 International Symposium on Fuzzy
and Rough Sets,1 we attempted to reduce the reliance on domain experts by learning
the rule base directly from data. The proposed methodology was illustrated with
two case studies in maritime risk analysis. We employed five genetic fuzzy systems
(GFSs) available in KEEL [4] to obtain the membership functions and FRB used
for risk assessment purposes and compared the accuracy and interpretability of each
resulting FIS.
In this chapter, we extend that work by adding additional case studies and by
testing several additional GFS algorithms. The new case studies include one set in the
Strait of Malacca (which we expect to have similar Regional Hostility as the Gulf of
Guinea but with a higher Collision Factor), as well as an Atlantic No-Storm scenario
(set in the same AOI as the previous Atlantic scenario but in a period of interest that
does not include harsh weather). In terms of new algorithms, we have tested all of
the algorithms available in KEEL that were applicable to our dataset and that did not
crash (i.e. some of the KEEL algorithms have bugs that cause them to crash). In total,
this work considers 4 case studies and 14 GFS algorithms. The experimental results
(Sect. 5.4) indicate that IVTURS, LogitBoost, and NSLV generate the most accurate
rule bases while SGERD, GCCL, NSLV, and GBML each generate interpretable
rule bases. Meanwhile, IVTURS, NSLV, and GBML algorithms offer a reasonable
compromise between accuracy and interpretability.
The rest of this paper is structured as follows: Sect. 2 briefly goes over relevant
works while Sect. 3 unveils the proposed methodology to automate MRA. Section 4
describes the case studies and data sources considered for the experiments. Section
5 outlines the empirical results and discussion before Sect. 6 wraps up the study.

1 http://www.site.uottawa.ca/~rfalc032/isfuros2017/.

2 Related Work

This section briefly reviews some relevant works along maritime risk analysis and
genetic fuzzy systems.

2.1 Maritime Risk Analysis

The purpose of risk assessment is to refine the situational picture for an operator
and/or decision maker. Following this, the goal is to recommend courses of action to
mitigate the identified and assessed risks. ISO 8402:1995/BS 4778 defines risk as:
“A combination of the probability, or frequency, of occurrence of a defined hazard
and the magnitude of the consequences of the occurrence” which closely follows the
International Maritime Organization (IMO) definition [5]. An effective risk manage-
ment strategy must involve the following actions: identify (to be aware of the present
hazards), review (to assess the risk associated with those hazards), control (to reduce
the risks that are not supportable), and review (to monitor the effectiveness of the
controls) [6].
Other projects that deal with risk detection, risk analysis, and risk management
within maritime settings include:
• A Risk Management Framework (RMF) for the risk-driven multi-criteria decision
analysis (MCDA) of various maritime situations, including the automatic genera-
tion of responses to incidents such as a Vessel in Distress (VID) [3, 7]
• Raytheon’s ATHENA Integrated Defense System (IDS) [8], which is designed to
search for suspicious behaviours in search-and-rescue situations
• The Predictive Analysis for Naval Deployment Activities (PANDA) [9] case-based
reasoning system that uses contextual-based risk assessment that relies on a human-
generated risk ontology
• The Maritime Automated Super Track Enhanced Reporting (MASTER) integra-
tive reporting project based on the Joint Capability Technology Demonstration
(JCTD) and the Comprehensive Maritime Awareness (CMA) [10].

2.2 Genetic Fuzzy Systems

Fuzzy Inference Systems (FISs) use fuzzy logic to map input features to class outputs. Typically, FISs rely on fuzzy membership functions, which map numerical inputs to degrees of membership in linguistic variables modelled as fuzzy sets, along with fuzzy rule bases. The two most common types of FISs are Mamdani [11] and Sugeno [12]. The main difference between them is that the consequents of Mamdani FIS rules are fuzzy sets, whereas in Sugeno FISs the rule consequents are polynomial expressions. Both types of FISs provide a numerical output back to the user, which reflects the decision variable of interest in the problem under consideration.

Introduced in 1992 with the publication of [13], GFSs are computational models
for automatically learning the FIS membership functions’ parameters directly from
data. In this work, a genetic algorithm (GA) is used to optimize the parameters of
the FIS, with the objective of finding membership function parameters that emulate
a known fuzzy logic controller. This first version of GFS is technically considered
an example of reinforcement learning.
The same year (i.e. 1992) saw the introduction of the Michigan approach for
GFSs [14]. The Michigan approach typically optimizes the FIS rule base. Each
individual in the genetic population represents a single rule, and the entire population
represents the rule base. This introduces a fascinating contradiction. In GA terms,
the individuals in the population are competing with each other to survive based on
the natural selection principles that GAs are built upon. Yet from the FIS perspective,
the individuals in the population are cooperating together to collectively form a good
rule base. Therefore the individuals are both competing with and cooperating with
each other, a contradiction that is referred to as the “cooperation versus competition”
problem [15].
The Pittsburgh approach of GFSs was introduced with [16]. This approach is suit-
able for optimizing the FIS rule base and/or membership functions. Each individual
in the population encodes the entire set of rules and/or membership functions, and
the population is a set of candidate rule base/membership functions. This scheme
implies that the individuals in the population are competing against each other and
not cooperating with each other, which resolves the “cooperation vs competition
problem” seen with the Michigan approach. The drawback of this method is that the
individuals contain much more information, which drastically increases the size of
the search space. This can make it difficult to obtain optimal solutions.
The third common family of GFSs are known as Iterative Rule Learning (IRL)
approaches. As with the Michigan approach, the IRL approach models each individ-
ual as a single rule. However, only the best rule from each iteration is added to the
population, with subsequent iterations generating rules to complement the already-
established ones. The IRL approach addresses the “cooperation versus competition
problem” by dividing the cooperation and competition into two different phases: the
individuals compete within each iteration, and cooperation occurs as rules are added
to the final rule base.
Since their inception, GFSs have been extensively studied [17] and applied to a wide variety of domains including medicine [18, 19], finance [20, 21], industrial/manufacturing [22, 23], and many others. Figure 1 shows the architecture of a typical GFS. For further reading, [24] is a recent survey on the state of the art of GFSs.

Fig. 1 Architecture of a typical GFS

3 Automating Maritime Risk Assessment

In order to apply genetic fuzzy systems to maritime risk assessment (MRA), we model the latter as a classification problem. The input features describing each AIS-reporting vessel are a set of risk features, i.e., numeric attributes in the range [0, 1] that quantify the extent of a particular risk for the vessel. The decision classes represent the overall risk assessment, which can take a value from the set {LOW-RISK, MEDIUM-RISK, HIGH-RISK}.

3.1 Risk Features

The RMF’s Risk Feature Extraction module [2] is used to calculate the following
four risk features for each AIS contact:

3.1.1 Weather Risk

Vessels navigating over open oceans may encounter weather conditions that threaten
the safety of the vessel’s crew, passengers, and cargo. Several aspects of weather could
potentially pose a threat including visibility, ice conditions, currents, etc. However,
the single most important weather factor that impacts risk is wave height originating
from wind and swell [25]. Therefore, we model weather risk by mapping the wave
height to the “high weather risk” linguistic term with a trapezoidal membership
function with a = 1.25 m, b = 14 m, c = d = INF. This configuration is inspired by
the World Meteorological Organization sea state code,2 according to which waves
are “moderate” at 1.25 m, “rough” by 2.5 m, etc.

2 https://www.nodc.noaa.gov/woce/woce_v3/wocedata_1/woce-uot/document/wmocode.htm.
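A minimal Python sketch of this mapping follows (the code is our own; only the parameter values a = 1.25 m, b = 14 m, c = d = INF come from the text above).

```python
def trapmf(x, a, b, c, d):
    # Standard trapezoidal MF: rises from a to b, plateaus to c, falls to d.
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

INF = float("inf")
weather_risk = lambda wave_height_m: trapmf(wave_height_m, 1.25, 14.0, INF, INF)
for h in (0.5, 1.25, 2.5, 7.0, 14.0):
    print(f"wave {h:5.2f} m -> risk {weather_risk(h):.3f}")
```

The same trapmf function, instantiated with the Collision Factor parameters given in Sect. 3.1.2 (a = b = 0 m, c = 150 m, d = 926 m), yields that feature's distance-to-risk mapping as well.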

Table 1 Incident severities. Source [26]

Incident type      Severity
Bomb threat        1.0
Terrorism          0.9
Hostage scenario   0.8
Crew damage        0.7
Theft              0.6
Invasion           0.5
Near invasion      0.4
Threatened         0.3
Approach           0.2
Crew error         0.1

3.1.2 Collision Factor

Vessels navigating near one another run the risk of colliding. We calculate this risk
feature as a function of each vessel’s distance to the nearest ship. This is mapped to
a trapezoidal membership function with a = b = 0 m, c = 150 m, d = 926 m. This
configuration is inspired by Transport Canada’s guidelines on avoiding collisions.3

3.1.3 Regional Hostility

Certain regions of the world tend to see hostile activity by bad actors such as pirates.
We refer to these activities as maritime incidents, which serve as the basis for calcu-
lating a regional hostility risk factor. It is defined on the basis of three indicators as
follows:
• Mean Incident Proximity (MIP): As in [26], MIP is the mean distance to the n
nearest incidents within max_distance km of the vessel. The distance is mapped
to risk values via a trapezoidal membership function with a = b = 0 km, c = 370.4
km, d = 740.8 km. These parameter values make the MIP metric fairly sensitive
to the presence of maritime incidents.
• Mean Incident Severity (MIS): Following [26], MIS is calculated as the mean
severity of the n nearest incidents within max_distance of the vessel. The incident
severities are given in Table 1.
• Vessel Pertinence Index (VPI): As in [26], VPI is the maximum relevance of
the n nearest incidents within max_distance of the vessel. The similarities of the
vessel categories is given in Table 2.
Then the overall regional hostility is calculated as the weighted sum αMIP + βMIS + γVPI, with the previously suggested values α = 0.4, β = 0.3, γ = 0.3 [26]; a sketch of this aggregation is given below, after Table 2.

3 https://www.tc.gc.ca/eng/marinesafety/tp-tp14070-3587.htm.

Table 2 Vessel similarities. Source [26]

                           Cargo       Tanker/      Warship   Small military   Small transport/
                           transport   industrial             vessel           utility
Cargo transport            1.0         0.5          0         0                0.5
Tanker/industrial          1.0         0.5          0         0                0.5
Warship                    1.0         0.5          0         0                0.5
Small military vessel      1.0         0.5          0         0                0.5
Small transport/utility    1.0         0.5          0         0                0.5

Since the MIP trapezoidal membership function parameters suggested in [26] rarely lead to risk values above 0, we use the values mentioned above. Finally, we use max_distance = 1000 km.
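As a sketch of this aggregation (our own code; the indicator values in the example are hypothetical, and each indicator is assumed to be already normalized to [0, 1]):

```python
def regional_hostility(mip, mis, vpi, alpha=0.4, beta=0.3, gamma=0.3):
    # Weighted sum alpha*MIP + beta*MIS + gamma*VPI from the text above.
    return alpha * mip + beta * mis + gamma * vpi

# Hypothetical vessel: nearby severe incidents of a highly relevant category.
print(regional_hostility(mip=0.9, mis=0.6, vpi=1.0))  # 0.36 + 0.18 + 0.30 = 0.84
```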

3.1.4 Degree of Distress

This feature measures the potential impact of a disaster involving the vessel. For example, vessels which carry hazardous material or many passengers would have a high degree of distress. Based on the data available to us, we calculate the degree of distress as a combination of the following indicators:
• Environment Risk: The potential impact to the environment as a result of this
vessel capsizing. The mapping from vessel type to environment risk is given in
Table 3.
• Risk of Attack: The Risk of Attack accounts for the probability of the vessel being
attacked based on its category (e.g. if most of the reported maritime incidents
correspond to cargo vessels, then this ship type has high Risk of Attack). Unlike
the VPI, this probability is based on all reported incidents in a given time period,
not just the n closest ones. It is calculated with the formula:

Risk of Attack(X) = P(X.Category | I),

where P(X.Category | I) is the fraction of the total number of incidents in which the vessel's category is involved [26].

Table 3 Risk environment mapping. Source [3]

Type of vessel                          Risk environment
Speedboat                               0.1
Coast guard, Tugboat, Medical vessel    0.2
Cruise ship                             0.5
Oil tanker                              1.0

In [26], the Degree of Distress risk feature also included a “number of people
on board” and a “fuel level” component. However, to the best of our knowledge
there is no readily available data source for these data. Therefore we exclude these
components of Degree of Distress from this work.

3.2 Ground Truth

For each set of risk values, a ground truth overall risk level is assigned to train the
GFS. In this work we use a simple heuristic to generate the ground truth, but in
practice the ground truth could be determined by consulting a domain expert. Our
simple heuristic first discretizes each risk value according to the following scheme:


Risk Value = LOW-RISK       if the risk value is in [0, a)
             MEDIUM-RISK    if the risk value is in [a, b)
             HIGH-RISK      if the risk value is in [b, 1]

We use a = 0.4 and b = 0.7. Each risk feature in the RMF’s Risk Assessment Module
(see Sect. 3.1) will be modelled by the three aforementioned linguistic terms. The
calculation of the overall risk level proceeds as follows:



Overall Risk = HIGH-RISK      if at least one risk value is HIGH-RISK
                              OR at least two risk values are MEDIUM-RISK
               LOW-RISK       if all risk features are LOW-RISK
               MEDIUM-RISK    otherwise
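A direct Python transcription of this heuristic (the function names are ours) is:

```python
def discretize(risk_value, a=0.4, b=0.7):
    # Discretization scheme above, with a = 0.4 and b = 0.7 as in the text.
    if risk_value < a:
        return "LOW-RISK"
    return "MEDIUM-RISK" if risk_value < b else "HIGH-RISK"

def overall_risk(risk_values):
    labels = [discretize(v) for v in risk_values]
    if "HIGH-RISK" in labels or labels.count("MEDIUM-RISK") >= 2:
        return "HIGH-RISK"
    if all(l == "LOW-RISK" for l in labels):
        return "LOW-RISK"
    return "MEDIUM-RISK"

# Four risk features: weather, collision, regional hostility, distress.
print(overall_risk([0.1, 0.5, 0.2, 0.3]))  # one MEDIUM  -> MEDIUM-RISK
print(overall_risk([0.5, 0.5, 0.2, 0.3]))  # two MEDIUMs -> HIGH-RISK
```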

The overall architecture of the proposed methodology is shown in Fig. 2.

4 Maritime Risk Analysis Case Studies

We conduct a total of four experiments, each concerning a specific Area of Interest (AOI) and Period of Interest (POI). We expect the GFSs to produce distinctive FRBs for each scenario, corresponding to the unique risk landscape of each AOI.
The first AOI is the Gulf of Guinea (min latitude = −20, max latitude = 7, min longitude = −7, max longitude = 15) with a POI of January 1 2018 00:00:00–January 1 2018 23:59:59. Thirty-eight maritime incidents were reported in the AOI in 2017 (Fig. 3). Amongst the victims whose ship type could be determined, 47.3% were cargo ships, 44.7% were tankers, and 7.8% were utility vessels. Weather conditions in the AOI/POI were mild.

Fig. 2 Architecture of the proposed MRA methodology

Fig. 3 Maritime incidents in the Gulf of Guinea 2017
The second AOI concerns the Strait of Malacca (min latitude = −4, max latitude = 9, min longitude = 92, max longitude = 110) with a POI of January 1 2018 00:00:00–January 1 2018 23:59:59. Not only is the Strait of Malacca one of the world's busiest maritime traffic lanes, it is also one of the narrowest: 1.5 nautical miles at its narrowest point. This, combined with the steady growth of traffic within the strait, makes it a potentially dangerous area to navigate. Indeed, 60 ship accidents were reported to the Maritime and Port Authority of Singapore in 2015 [27]. Additionally, 37 maritime incidents occurred in the AOI in 2017 (Fig. 4). Finally, weather conditions were mild in the AOI/POI.

Fig. 4 Maritime incidents around the Strait of Malacca 2017

The third and fourth scenarios each concern the same AOI: a northern stretch
of the Atlantic ocean (min latitude = 35, max latitude = 60, min longitude = −50,
max longitude = 0) with two different POIs: January 1 2018 00:00:00–January 1
2018 23:59:59 (“Atlantic Storm” scenario) and January 13 2018 00:00:00–January
13 2018 23:59:59 (“Atlantic No-Storm” scenario). The Atlantic Storm scenario takes
place during a harsh weather event in the Atlantic (Fig. 5). In the Atlantic No-Storm
scenario the weather is much milder (Fig. 6). No piracy activity was recorded in this
region in 2017.

4.1 Data Sources

The data for our experiments originate from the following sources:
AIS Data from Orbcomm.4 We make use of two full days of AIS data from Orbcomm
(i.e. January 1 2018 and January 13 2018), sampling these datasets as specified in
Sect. 5.1. Among the fields available in the AIS messages, we make use of latitude,
longitude, and ship type.

4 https://www.orbcomm.com/.

Fig. 5 Wave height, North Atlantic Ocean, January 1 2018 0:00:00GMT

Fig. 6 Wave height, North Atlantic Ocean, January 13 2018 0:00:00GMT

Weather Data from the National Oceanic and Atmospheric Administration's (NOAA) WaveWatch III archive.5 NOAA provides various weather forecasts in the GRIdded Binary (GRIB) file format [28], a file format for reporting meteorological data on a grid. We make use of NOAA's global wave height GRIB files.

5 ftp://polar.ncep.noaa.gov/pub/history/waves.

Fig. 7 Sample maritime incident report from IMB

Maritime Incident Reports from the ICC International Maritime Bureau’s (IMB)
2017 Piracy and Armed Robbery Against Ships Report.6 This report lists maritime
incidents that occur throughout the world in a semi-structured format (see Fig. 7).
For each of the incidents in the 2017 report, we extract the date/time, location, type
of vessel attacked, and type of incident.

5 Experimental Analysis

5.1 Experimental Setup

For each case study mentioned in Sect. 4, we arbitrarily select one AIS message
from 1000 randomly selected vessels. For each of these messages, we keep only the latitude, longitude, and ship type fields. Each contact is fed to the Risk Management Framework to determine the local risk values of the four risk features described in Sect. 3.1; then the ground truth is assigned using the scheme described in Sect. 3.2.
The datasets are fed to the following fourteen KEEL algorithms: AdaBoost, COACH,
GBML, GCCL, GP, GPG, IVTURS, LogitBoost, MaxLogitBoost, NSLV, SGERD,
Slave2, SlaveV0, SP. Table 4 compares the algorithms under consideration and shows
the parameters that we employ for this study.
All experiments were performed on the Windows 10 platform with an i7-3520M
processor and 8 GB of RAM. We downloaded the KEEL master branch from source control7 to perform the experiments. Each experiment was repeated 30 times using
a different random seed to account for the stochastic nature of the algorithms, and
the average values are reported.

5.2 Performance Metrics

Each resulting FIS was evaluated according to two metrics: accuracy, via the well-known F-measure, and interpretability, via the "total rule length" metric [29].
6 https://www.icc-ccs.org/.
7 https://github.com/SCI2SUGR/KEEL checked out on 01/05/2018.

Table 4 Comparison of algorithms from experiments

Algorithm       Interpretable   Membership         Number of linguistic   Parameterization
                rule base       function type(s)   terms per variable
AdaBoost        No              Triangular         Parameter              Default
COACH           Yes             Triangular         Parameter              Max iterations = 1000
GBML            Yes             Triangular         Learned                Default
GCCL            Yes             Triangular         Parameter              Max evaluations = 1000
GP              Yes             Triangular         Parameter              Max iterations = 1000
GPG             No              Triangular         Parameter              Max iterations = 1000
IVTURS          Yes             Triangular         Learned                Max iterations = 1000
LogitBoost      No              Triangular         Parameter              Default
MaxLogitBoost   No              Triangular         Parameter              Default
NSLV            Yes             Triangular         Fixed                  Default
SGERD           Yes             Triangular         Learned                Default
Slave2          Yes             Triangular         Fixed                  Default
SlaveV0         Yes             Triangular         Fixed                  Default
SP              No              Triangular         Parameter              Max iterations = 1000

ideal FRB should have high F-measure and low total rule length. Note that although
we evaluate two objectives, the GFSs we tested are not dual-objective optimization
algorithms; they have the sole objective of maximizing accuracy.
F-Measure is a well-known metric for evaluating the accuracy of classification algo-
rithms. The key advantage of F-Measure over standard accuracy is that F-Measure
takes false positives and false negatives into account, making it especially suitable
for unbalanced datasets. For a two-class problem, it is calculated as

F = 2 · (precision · recall) / (precision + recall)

where precision = tp/(tp + fp), recall = tp/(tp + fn), and tp, fp and fn denote the
numbers of true positives, false positives and false negatives, respectively.
For a multi-class problem, the F-Measure is defined as the average of the per-class
F-Measures.
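As an illustration, the following Python sketch computes this macro-averaged F-measure exactly as defined above; the function name and the toy label lists are ours and purely illustrative, not taken from the experimental code.

def f_measure(y_true, y_pred):
    # One-vs-rest precision/recall per class, then the average over classes.
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f)
    return sum(scores) / len(scores)

# Toy example with the three risk levels used in this chapter:
print(f_measure(["low", "high", "medium", "low"], ["low", "high", "low", "low"]))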
Total Rule Length is a useful tool for measuring the complexity of a rule base (RB).
It is defined as the sum of the number of conditions in each rule [29]. This implicitly
takes into account both the number of rules in the RB and the number of conditions
in the rules.
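A one-line computation suffices; the sketch below assumes a rule base encoded as a list of condition lists, an illustrative encoding rather than KEEL's internal format.

# Total rule length: sum of antecedent conditions over all rules [29].
rule_base = [
    ["collisionFactor IS LOW", "regionalHostility IS LOW-MEDIUM"],
    ["collisionFactor IS LOW", "degreeOfDistress IS LOW-MEDIUM"],
    ["collisionFactor IS MEDIUM-HIGH", "regionalHostility IS LOW-MEDIUM"],
]
total_rule_length = sum(len(conditions) for conditions in rule_base)
print(total_rule_length)  # 6: three rules with two conditions each

The three rules above mirror the SGERD rule base shown later in Sect. 5.6.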

5.3 Statistical Analysis

The Friedman test was employed to rank the performance of the algorithms. Fol-
lowing this, the Nemenyi post-hoc test was used to test the statistical significance
between the rankings [30].
The Nemenyi test allows us to arrange the algorithms into tiered groups, i.e. group
“A”, group “B”, group “C”, etc. All of the algorithms in group “A” are statistically
better than the algorithms in group “B” and so on. However, an algorithm can be
placed in more than one group. For example, a group of “AB” indicates that the
statistical test could not confirm that the algorithm is inferior to any of the algorithms
in group “A”, nor could the test confirm that the algorithm is statistically superior to
all of the algorithms in group “B”. Therefore, it may belong to group “A” or to group
“B”.
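This ranking procedure can be sketched as follows, assuming SciPy and the third-party scikit-posthocs package are available; the score matrix below (rows are runs, columns are algorithms) is a toy example, not our experimental data.

import numpy as np
import scipy.stats as ss
import scikit_posthocs as sp

scores = np.array([           # F-measure per run for three hypothetical algorithms
    [0.98, 0.95, 0.72],
    [0.99, 0.93, 0.70],
    [0.97, 0.94, 0.75],
    [0.98, 0.92, 0.71],
])
stat, p = ss.friedmanchisquare(*scores.T)   # omnibus Friedman test over columns
print(stat, p)
# Pairwise Nemenyi p-values; algorithms whose comparisons are not
# significant end up sharing a tier (e.g. group "AB").
print(sp.posthoc_nemenyi_friedman(scores))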

5.4 Results and Discussion

The results for accuracy are given in Table 5 for the Guinea scenario, Table 6 for
the Malacca scenario, Table 7 for the Atlantic Storm scenario, and Table 8 for the
Atlantic No-Storm scenario. The results for interpretability are given in Table 9 for
the Guinea scenario, Table 10 for the Malacca scenario, Table 11 for the Atlantic
Storm scenario, and Table 12 for the Atlantic No-Storm scenario.
In terms of accuracy, the top performers include IVTURS (A), LogitBoost (AB),
and NSLV (ABC) in the Guinea and Malacca scenarios. For the Atlantic Storm sce-
nario, the top performers are LogitBoost (A), MaxLogitBoost (AB), and IVTURS
(ABC). Finally, in the Atlantic No-Storm scenario the top performers are Logit-
Boost (A), GBML (AB), and IVTURS (ABC). In all of the scenarios, IVTURS and
LogitBoost are each top performers.
In terms of interpretability, SGERD is the top performer in all of the scenarios (A).
GCCL is also a strong contender in the Guinea scenario (B), the Malacca scenario
(AB), the Atlantic Storm scenario (BC), and the Atlantic No-Storm scenario (BC).

Table 5 Accuracy results for the Guinea scenario

| Algorithm | Accuracy (F-measure) | Friedman rank | Nemenyi group |
| IVTURS | 0.98 | 1.27 | A |
| LogitBoost | 0.98 | 2.4 | AB |
| NSLV | 0.98 | 3.03 | ABC |
| GBML | 0.98 | 3.7 | ABCD |
| SLAVEv0 | 0.97 | 5.42 | BCDE |
| SLAVE2 | 0.97 | 5.52 | BCDEF |
| MaxLogitBoost | 0.97 | 6.77 | DEFG |
| AdaBoost | 0.96 | 7.9 | EFGH |
| COACH | 0.92 | 9.0 | EFGHI |
| SGERD | 0.82 | 10.6 | HIJ |
| GP | 0.81 | 10.8 | HIJK |
| SP | 0.76 | 12.13 | IJKL |
| GPG | 0.75 | 12.47 | IJKLM |
| GCCL | 0.6 | 14.0 | JKLMN |

Table 6 Accuracy results for the Malacca scenario

| Algorithm | Accuracy (F-measure) | Friedman rank | Nemenyi group |
| IVTURS | 0.99 | 1.2 | A |
| LogitBoost | 1.0 | 1.8 | AB |
| NSLV | 0.95 | 3.73 | ABC |
| SLAVE2 | 0.95 | 4.43 | ABCD |
| SLAVEv0 | 0.95 | 4.57 | ABCDE |
| MaxLogitBoost | 0.91 | 6.0 | CDEF |
| GBML | 0.9 | 6.27 | CDEFG |
| GP | 0.75 | 9.03 | FGH |
| SGERD | 0.72 | 9.7 | FGHI |
| COACH | 0.71 | 9.97 | GHIJ |
| GPG | 0.7 | 10.27 | HIJK |
| SP | 0.65 | 11.67 | HIJKL |
| AdaBoost | 0.64 | 12.37 | HIJKLM |
| GCCL | 0.29 | 14.0 | LMN |

The NSLV algorithm performs well in the Guinea (C) and Malacca (C) scenarios but
performs slightly worse in the Atlantic Storm (DE) and Atlantic No-Storm (F) sce-
narios. Finally, GBML has good performance in the Atlantic Storm (B) and Atlantic
No-Storm (B) scenarios although its performance is less impressive in the Guinea
(CD) and Malacca (EF) scenarios.

Table 7 Accuracy results for the Atlantic Storm scenario

| Algorithm | Accuracy (F-measure) | Friedman rank | Nemenyi group |
| LogitBoost | 0.99 | 1.0 | A |
| MaxLogitBoost | 0.94 | 2.1 | AB |
| IVTURS | 0.95 | 3.03 | ABC |
| NSLV | 0.93 | 4.73 | BCD |
| SLAVEv0 | 0.93 | 5.08 | BCDE |
| SLAVE2 | 0.93 | 5.12 | BCDEF |
| COACH | 0.8 | 7.2 | DEFG |
| GBML | 0.92 | 7.73 | DEFGH |
| AdaBoost | 0.85 | 9.03 | GHI |
| SP | 0.73 | 10.07 | GHIJ |
| GP | 0.76 | 11.63 | IJK |
| GPG | 0.74 | 11.73 | IJKL |
| SGERD | 0.74 | 12.53 | IJKLM |
| GCCL | 0.61 | 14.0 | KLMN |

Table 8 Accuracy results for the Atlantic No-Storm scenario

| Algorithm | Accuracy (F-measure) | Friedman rank | Nemenyi group |
| LogitBoost | 0.99 | 1.0 | A |
| GBML | 0.95 | 2.7 | AB |
| IVTURS | 0.93 | 4.0 | ABC |
| SLAVE2 | 0.92 | 4.3 | ABCD |
| SLAVEv0 | 0.92 | 4.53 | ABCDE |
| NSLV | 0.92 | 5.53 | BCDEF |
| MaxLogitBoost | 0.89 | 6.8 | CDEFG |
| AdaBoost | 0.91 | 7.3 | CDEFGH |
| GP | 0.84 | 9.3 | GHI |
| SGERD | 0.82 | 10.3 | GHIJ |
| GPG | 0.81 | 10.47 | GHIJK |
| SP | 0.71 | 12.33 | IJKL |
| COACH | 0.7 | 12.43 | IJKLM |
| GCCL | 0.62 | 14.0 | JKLMN |

In terms of algorithms that achieve good accuracy and interpretability, there is no
one clear answer. Although LogitBoost and MaxLogitBoost provide top-tier accuracy,
their rule bases are not at all interpretable. On the other hand, SGERD consistently
generates simple rule bases at the cost of low accuracy. The algorithms which offer
a reasonable compromise between the two objectives include IVTURS, NSLV, and
GBML.

Table 9 Interpretability results for the Guinea scenario

| Algorithm | Interpretability (rule length) | Friedman rank | Nemenyi group |
| SGERD | 7.22 | 1.08 | A |
| GCCL | 14.11 | 2.25 | B |
| NSLV | 21.21 | 3.91 | C |
| GBML | 22.88 | 4.33 | CD |
| IVTURS | 24.1 | 4.71 | CDE |
| COACH | 38.9 | 6.69 | F |
| GP | 49.1 | 6.87 | FG |
| SLAVE2 | 49.05 | 8.2 | H |
| SLAVEv0 | 50.9 | 8.47 | HIJ |

Table 10 Interpretability results for the Malacca scenario

| Algorithm | Interpretability (rule length) | Friedman rank | Nemenyi group |
| SGERD | 8.2 | 1.34 | A |
| GCCL | 10.12 | 1.93 | AB |
| NSLV | 16.07 | 3.18 | C |
| IVTURS | 21.23 | 4.33 | D |
| COACH | 26.8 | 6.03 | E |
| GBML | 27.11 | 6.04 | EF |
| GP | 51.29 | 7.74 | G |
| SLAVEv0 | 34.18 | 8.06 | GHI |
| SLAVE2 | 35.16 | 8.3 | GHIJ |

Table 11 Interpretability results for the Atlantic Storm scenario

| Algorithm | Interpretability (rule length) | Friedman rank | Nemenyi group |
| SGERD | 5.8 | 1.08 | A |
| GBML | 9.98 | 2.45 | B |
| GCCL | 12.13 | 3.11 | BC |
| IVTURS | 17.79 | 4.66 | D |
| NSLV | 21.82 | 4.97 | DE |
| COACH | 22.42 | 6.32 | F |
| GP | 46.88 | 7.04 | FG |
| SLAVE2 | 36.63 | 8.21 | H |
| SLAVEv0 | 36.55 | 8.57 | HIJ |

Table 12 Interpretability results for the Atlantic No-Storm scenario

| Algorithm | Interpretability (rule length) | Friedman rank | Nemenyi group |
| SGERD | 5.76 | 1.23 | A |
| GBML | 8.85 | 2.49 | B |
| GCCL | 10.14 | 2.98 | BC |
| COACH | 14.2 | 4.52 | D |
| IVTURS | 14.76 | 4.9 | DE |
| NSLV | 21.03 | 6.26 | F |
| GP | 44.16 | 7.89 | G |
| SLAVEv0 | 28.55 | 8.05 | GHI |
| SLAVE2 | 30.79 | 8.64 | GHIJ |

5.5 Characterization of Fuzzy Rule Bases Per AOI

In Sect. 4, we anticipated that the FRBs generated for each scenario would differ
significantly, corresponding to the unique risk landscape of each case study. To test
this, we measured how frequently each risk feature appeared as an antecedent of a
fuzzy rule.
Table 13 shows the average probability that an antecedent will correspond to a
particular risk feature. Across all of the case studies, the Degree of Distress risk factor
consistently appears in roughly 30% of all conditions. In the Gulf of Guinea scenario,
Regional Hostility (29%) and Collision Factor (21%) are both important risk features,
whereas Weather Factor (17%) plays a slightly lesser role. The risk landscape in the
Strait of Malacca is revealed to be similar to that of the Gulf of Guinea, although
Weather Factor (16%) is slightly less important while Collision Factor (22%) and
Regional Hostility (32%) are slightly more important. It is surprising that Collision
Factor is not much more important in the Strait of Malacca given the vessel congestion in the
AOI. For the two Atlantic scenarios, Regional Hostility (9%) almost never appears in
the rule base. As we would expect, Weather Factor is more important in the Atlantic
Storm (27%) than in the Atlantic No-Storm (21%) and Collision Factor is more
important in the Atlantic No-Storm (40%) than in the Atlantic Storm (34%).

Table 13 Average distribution of risk features in rule conditions

| AOI | Weather risk | Collision factor | Regional hostility | Degree of distress |
| North Atlantic No-Storm | 0.21 | 0.4 | 0.09 | 0.30 |
| North Atlantic Storm | 0.27 | 0.34 | 0.09 | 0.30 |
| Gulf of Guinea | 0.17 | 0.21 | 0.29 | 0.33 |
| Strait of Malacca | 0.16 | 0.22 | 0.32 | 0.31 |

5.6 Accuracy Versus Interpretability

In order to illustrate the difference between a highly accurate and a highly inter-
pretable rule base, we compare a rule base generated by IVTURS to one generated
by SGERD. SGERD generated the following RB:

1. IF collisionFactor IS LOW AND regionalHostility IS LOW-MEDIUM THEN OVERALL RISK IS LOW
2. IF collisionFactor IS LOW AND degreeOfDistress IS LOW-MEDIUM THEN OVERALL RISK IS MEDIUM
3. IF collisionFactor IS MEDIUM-HIGH AND regionalHostility IS LOW-MEDIUM THEN OVERALL RISK IS HIGH

IVTURS generated the following RB:

1. IF weatherRisk IS LOW THEN OVERALL RISK IS LOW
2. IF weatherRisk IS VERY LOW THEN OVERALL RISK IS LOW
3. IF collisionFactor IS VERY LOW THEN OVERALL RISK IS LOW
4. IF collisionFactor IS LOW AND degreeOfDistress IS LOW THEN OVERALL RISK IS MEDIUM
5. IF collisionFactor IS MEDIUM AND degreeOfDistress IS LOW AND weatherRisk IS VERY LOW THEN OVERALL RISK IS MEDIUM
6. IF degreeOfDistress IS HIGH THEN OVERALL RISK IS HIGH
7. IF degreeOfDistress IS MEDIUM THEN OVERALL RISK IS HIGH
8. IF collisionFactor IS HIGH THEN OVERALL RISK IS HIGH

Clearly the SGERD rule base is far simpler: it has fewer rules and fewer con-
ditions. Indeed, in our experiments SGERD’s rule bases contained an average of
6.75 conditions while IVTURS's rule bases contained an average of 19.47 conditions.
However, this comes at the cost of accuracy: SGERD managed an average accuracy
of 77.5% yet IVTURS achieved 96.2%.

6 Conclusions

In this chapter, GFSs have been applied to the problem of assessing the overall risk
level of AIS-reporting maritime vessels. The GFSs automatically learn the rule base
and membership functions for a FIS which assigns each AIS message emitted by a
vessel one of three risk levels (Sect. 3) according to four individual risk values. The
data sources include AIS records, weather reports, and maritime incident reports
from three regions of the world: the North Atlantic, the Gulf of Guinea, and the
Strait of Malacca (Sect. 4).
The datasets were fed to fourteen GFS algorithms via the KEEL framework and
the resulting FRBs were evaluated according to their accuracy (F-measure) and inter-
pretability (total rule length) (Sect. 5.1). The experimental results (Sect. 5.4) indicate

that IVTURS, LogitBoost, and NSLV generate the most accurate rule bases while
SGERD, GCCL, NSLV, and GBML each generate interpretable rule bases. Finally,
IVTURS, NSLV, and GBML algorithms offer a reasonable compromise between
accuracy and interpretability.
We also investigated the structure of the rule bases produced by each algorithm,
noting the prevalence of each risk factor within the rule bases. We saw that the
frequency with which each risk factor appears in the rules characterizes the unique
risk landscape of each AOI (Sect. 5.5).
As future work, we would like to design a more sophisticated scheme for assigning
the ground truth for the AIS messages, to consider additional risk features, and to
investigate the feasibility of producing a global rule base that does not depend
on a specific AOI.

Acknowledgements The authors acknowledge the financial support of the Ontario Centres of
Excellence (OCE) and the National Sciences and Engineering Research Council of Canada
(NSERC) for the project entitled “Big Data Analytics for the Maritime Internet of Things”.

References

1. Abielmona, R.: Tackling big data in maritime domain awareness. Vanguard, 42–43 (2013)
2. Falcon, R., Abielmona, R., Nayak, A.: An evolving risk management framework for wireless
sensor networks. In: Proceedings of the 2011 IEEE International Conference on Computational
Intelligence for Measurement Systems and Applications (CIMSA), pp. 1–6, Ottawa, Canada
(2011)
3. Falcon, R., Abielmona, R.: A response-aware risk management framework for search-and-
rescue operations. In: 2012 IEEE Congress on Evolutionary Computation (CEC), pp. 1540–
1547, Brisbane, Australia (2012)
4. Alcalá-Fdez, J., Sánchez, L., García, S., del Jesus, M.J., Ventura, S., Garrell, J.M., Otero, J.,
Romero, C., Bacardit, J., Rivas, V.M., Fernández, J.C., Herrera, F.: Keel: a software tool to
assess evolutionary algorithms for data mining problems. Soft Comput. 13(3), 307–318 (2009)
5. International Maritime Organization: Guidelines for Formal Safety Assessment (FSA) for use
in the IMO Rule-Making Process (2002)
6. International Association of Classification Societies: A guide to risk assessment in ship oper-
ations (2012)
7. Falcon, R., Desjardins, B., Abielmona, R., Petriu, E.: Context-driven dynamic risk manage-
ment for maritime domain awareness. In: 2016 IEEE Symposium Series on Computational
Intelligence (SSCI), pp. 1–8. IEEE (2016)
8. Friedman, N.: The Naval Institute Guide to World Naval Weapon Systems. Naval Institute
Press (2006)
9. Moore, K.E.: Predictive analysis for naval deployment activities. PANDA BAA, 05-44 (2005)
10. Lim, I., Jau, F.: Comprehensive maritime domain awareness: an idea whose time has come?
In: Defence, Terrorism and Security, Globalisation and International Trade (2007)
11. Mamdani, E.H.: Application of Fuzzy Logic to Approximate Reasoning Using Linguistic
Synthesis
12. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and
control. IEEE Trans. Syst. Man Cybern. SMC-15(1), 116–132 (1985)
13. Karr, C.: Genetic algorithms for fuzzy controllers. AI Expert 6(2), 26–33 (1991)
14. Valenzuela-Rendón, M.: The Fuzzy Classifier System: a Classifier System for Continuously
Varying Variables (1991)

15. Herrera, F., Magdalena, L.: Genetic Fuzzy Systems: A Tutorial, vol. 13, pp. 93–121. Tatra
Mountains Mathematical Publications (1997)
16. Thrift, P.R.: Fuzzy Logic Synthesis with Genetic Algorithms (1991)
17. Herrera, F.: Genetic fuzzy systems: taxonomy, current research trends and prospects. Evol.
Intell. 1(1), 27–46 (2008)
18. Dong, W., Huang, Z., Ji, L., Duan, H.: A genetic fuzzy system for unstable angina risk assess-
ment. BMC Med. Inform. Decis. Mak. 14, 12 (2014)
19. Nouei, M.T., Kamyad, A.V., Sarzaeem, M.R., Ghazalbash, S.: Developing a genetic fuzzy
system for risk assessment of mortality after cardiac surgery. J. Med. Syst. 38(10), 102 (2014)
20. Aznarte, J.L., Alcalá-Fdez, J., Arauzo-Azofra, A., Benítez, J.M.: Financial time series fore-
casting with a bio-inspired fuzzy model. Expert Syst. Appl. 39(16), 12302–12309 (2012)
21. Liu, C.-F., Yeh, C.-Y., Lee, S.-J.: Application of type-2 neuro-fuzzy modeling in stock price
prediction. Appl. Soft Comput. 12(4), 1348–1358 (2012)
22. Serdio, F., Lughofer, E., Pichler, K., Buchegger, T., Efendic, H.: Residual-based fault detection
using soft computing techniques for condition monitoring at rolling mills. Inf. Sci. 259, 304–
320 (2014)
23. Ramli, A.A., Watada, J., Pedrycz, W.: A combination of genetic algorithm-based fuzzy c-
means with a convex hull-based regression for real-time fuzzy switching regression analysis:
application to industrial intelligent data analysis. IEEJ Trans. Electr. Electron. Eng. 9(1), 71–82
(2014)
24. Fernández, A., López, V., Del Jesus, M.J., Herrera, F.: Revisiting Evolutionary Fuzzy Systems:
taxonomy, applications, new trends and challenges. Knowl. Based Syst. 80, 109–121 (2015)
25. Bowditch, N.: Weather routing. In: The American Practical Navigator: An Epitome of Navi-
gation, p. 896 (2002)
26. Falcon, R., Abielmona, R., Billings, S., Plachkov, A., Abbass, H.: Risk management with hard-
soft data fusion in maritime domain awareness. In: The 2014 Seventh IEEE Symposium on
Computational Intelligence for Security and Defense Applications (CISDA), pp. 1–8 (2014)
27. Calamur, K.: High traffic, high risk in the strait of Malacca. In: The Atlantic (2017)
28. World Meteorological Organization: Guide to GRIB (2003)
29. Gacto, M.J., Alcalá, R., Herrera, F.: Interpretability of linguistic fuzzy rule-based systems: an
overview of interpretability measures. Inf. Sci. 181(20), 4340–4360 (2011)
30. Derrac, J., García, S., Molina, D., Herrera, F.: A practical tutorial on the use of nonpara-
metric statistical tests as a methodology for comparing evolutionary and swarm intelligence
algorithms. Swarm Evol. Comput. 1(1), 3–18 (2011)
Fuzzy Petri Nets and Interval Analysis
Working Together

Zbigniew Suraj and Aboul Ella Hassanien

Abstract Fuzzy Petri nets are a potential modeling technique for knowledge rep-
resentation and reasoning in knowledge-based systems. Over the last few decades,
many studies have focused on improving the fuzzy Petri net model. Various new
models have been proposed in the literature on the subject, which increase both
modeling strength and usability of fuzzy Petri nets. Recently, generalised fuzzy Petri
nets have been proposed. They are a natural extension of the classic fuzzy Petri nets.
The t-norms and s-norms are entered into the model as substitutes for operators min,
max and · (the algebraic product). This paper, however, describes how the extended
class of generalised fuzzy Petri nets, called type-2 generalised fuzzy Petri nets, can
be used to represent knowledge and model reasoning in knowledge-based systems.
The type-2 generalised fuzzy Petri nets expand existing generalised fuzzy Petri nets
by introducing the triple of operators (In, Out1 , Out2 ) in the net model in the form of
interval triangular norms that are supposed to act as a substitute for triangular norms
in generalised fuzzy Petri nets. Thanks to this relatively simple modification, a more
realistic model than the previous one is obtained. The new model allows approximate
information to be used both in the representation of knowledge and in modeling
reasoning in knowledge-based systems.

Keywords Fuzzy Petri net · Decision making · Classification · Approximate


reasoning · Knowledge-based system

Z. Suraj (B)
Faculty of Mathematics and Natural Sciences, University of Rzeszów, Rzeszów, Poland
e-mail: zbigniew.suraj@ur.edu.pl
A. E. Hassanien
Faculty of Computers and Information, Cairo University, Giza, Egypt
e-mail: aboitcairo@gmail.com


1 Introduction

Petri nets (PNs) [30] are widely used in various areas of science and practice, in par-
ticular in robotics and artificial intelligence. They are particularly useful in modeling
and analysis of discrete event systems [5, 40, 41]. The extraordinary advantages of
PNs, such as a simple formalism and an intuitive graphical representation, have made
them an interesting research object for many years. Over the past few decades, many
different types of PNs have been proposed for different applications. There are many
books, articles and conference materials devoted to the theory and applications of
PNs in the world, see e.g. [5, 10, 23, 29, 31, 40, 41]. Although studies on the theory
and application of PNs have brought many benefits, a number of shortcomings still
remain; namely, traditional PNs [5, 29] are not able to satisfactorily represent
so-called knowledge-based systems (KBSs). To deal with this shortcoming, in 1984
a new PN model called the fuzzy Petri net (FPN) was proposed by Lipp [15].
FPNs are a convenient tool facilitating the structurisation of knowledge, providing
intuitive visualization of knowledge-based reasoning, and facilitating the design of
effective fuzzy inference algorithms using imprecise, unclear or incomplete infor-
mation. All this makes the FPNs find their permanent place in the design of KBSs [3,
16, 25]. From the very beginning of the introduction of FPNs to support approximate
reasoning in KBS [17], scientists and practitioners in the field of artificial intelligence
have paid close attention to these net models. However, the first FPNs, according to
the literature on this topic [16], have many shortcomings and are not sufficient for
increasingly complex KBSs. Therefore, many authors have proposed alternative net
models to increase their power for both knowledge representation and smarter
implementation of rule-based reasoning [2, 3, 9, 14, 16,
26–28, 32–36].
This paper describes how an extended class of generalised fuzzy Petri nets (GFP-
nets) [32], called type-2 generalised fuzzy Petri nets (T2GFP-nets), can be used
for both knowledge representation and reasoning in knowledge-based systems. The
T2GFP-net expands the existing GFP-nets by introducing a triple of operators
(In, Out1 , Out2 ) in the net model in the form of interval triangular norms
that act as substitutes for the triangular norms in GFP-nets. In addition, this
extension allows a system modeled by a T2GFP-net to be presented at a much more
convenient level of abstraction than the classic FPN or even the GFP-net. The
selection of appropriate operators for a system modeled in a more generalized form
is very important, especially in situations where modeled systems are described by
incomplete, imprecise and/or unclear information. In the classic case, a fuzzy set,
called a type 1 fuzzy set, is defined in terms of the function from the universe to the
interval [0,1] (including 0 and 1). This means that the membership of each element
belonging to a fuzzy set is characterized by a single value from the unit interval [0,1],
rather than by a subinterval, as in the case of the T2GFP-net.
In practical applications, it is often more convenient to express the membership of
an element in a fuzzy set as a subinterval of the unit interval instead of a single
value from that interval. A fuzzy set defined in this way is known as a type 2 fuzzy set. Any desired

operations on the type 2 fuzzy sets can be defined by extending the definition of
appropriate operations to the type 1 fuzzy sets, i.e. based on the membership function
of an element with individual values from the interval [0,1]. Research concerning
type 2 fuzzy sets is mainly focused on the so-called min-max system [6,
20]. A somewhat weaker side of inference based on type 2 fuzzy set theory is the
relatively higher computational cost compared to the approach using the type 1 fuzzy
set. To overcome this difficulty, it was proposed in the literature to consider special
cases of the type 2 fuzzy sets [7, 28, 38], which can be basically reduced to fuzzy
sets, in which the membership function of an element to the set takes only values
that are subintervals of the interval [0,1]. In the case of representing the membership
function value of an element belonging to a type 2 fuzzy set as a subinterval of
[0,1], so-called Φ-fuzzy sets [28] are obtained. In the Φ-fuzzy sets the subinterval is
simply considered to be the range in which the true membership [11, 28] is located.
With this assumption, a number of calculations related to performing operations
on Φ-fuzzy sets can be simplified. In addition, the definitions of extended triangular
norms (also called interval t-norms) for interval fuzzy operations are also significantly
simplified. In such a situation, the calculation of interval t-norm values is basically
limited to calculating their values only for the two extreme points of the intervals.
Fuzzy production rules used in this work as rules of inference are based precisely on
interval t-norms. The approach based on the use of type 2 fuzzy sets assumes that the
exact value of the membership function cannot be determined in the form of a single
real value. The corresponding range of values therefore determines the scope of the
exact value under consideration. The use of interval t-norms in T2GFP-net makes
the model more general and practical. In addition, it can be more credible when
handling uncertain information, which in turn makes the reasoning process carried
out in KBSs based on uncertain knowledge more realistic. The
new FPN model presented in this paper uses all the possibilities described above.
The natural consequence of this fact is that the approach proposed in this work can
be used to represent knowledge and modeling reasoning in e.g. KBSs [16, 39], fault
diagnosis of systems [13], as well as fuzzy regulation of quality [24].
The organization of this paper is as follows. Section 2 is devoted to basic notions
concerning triangular norms, interval computations and interval triangular norms. In
Sect. 3 a brief introduction to GFP-nets is provided. Section 4 presents the T2GFP-
nets formalism. In Sect. 5, we describe three structural forms of fuzzy production
rules. Section 6 presents two algorithms: the first constructs a T2GFP-net on the
basis of a given set of fuzzy production rules, while the second describes an
approximate reasoning process realized by executing a T2GFP-net
representing a given KBS. A simple example coming from the domain of air traffic
control illustrating the proposed methodology is given in Sect. 7. In Sect. 8 a discus-
sion on comparison with the existing literature has been made. Section 9 includes
remarks on directions for further research related to the presented methodology.

2 Preliminaries

In this section, we recall basic concepts and notation regarding triangular norms,
interval computations and interval triangular norms.

2.1 Triangular Norms

A triangular norm (t-norm for short) is a function t : [0, 1]2 → [0, 1], such that for
all a, b, c ∈ [0, 1] the following conditions are satisfied: (1) it has 1 as the unit
element, i.e., t(a, 1) = a; (2) it is monotone, i.e., if a ≤ b then t(a, c) ≤ t(b, c);
(3) it is commutative, i.e., t(a, b) = t(b, a); (4) it is associative, i.e., t(t(a, b), c) =
t(a, t(b, c)).
The most relevant examples of t-norms are ZtN (a, b) = min(a, b) (minimum, Zadeh
t-Norm), GtN (a, b) = a · b (algebraic product, Goguen t-Norm), and LtN (a, b) =
max(0, a + b − 1) (Lukasiewicz t-Norm).
Since t-norms are just functions from the unit square into the unit interval, the com-
parison of t-norms is done in the usual way, i.e., pointwise. For the three basic t-norms
and for each (a, b) ∈ [0, 1]2 we have the following order LtN (a, b) ≤ GtN (a, b) ≤
ZtN (a, b).
An s-norm is a function s : [0, 1]2 → [0, 1] such that for all a, b, c ∈ [0, 1] the
following conditions are satisfied: (1) it has 0 as the unit element, i.e., s(a, 0) = a,
(2) it is monotone, i.e., if a ≤ b then s(a, c) ≤ s(b, c), (3) it is commutative, i.e.,
s(a, b) = s(b, a), and (4) it is associative, i.e., s(s(a, b), c) = s(a, s(b, c)).
The s-norms corresponding respectively to the three basic
t-norms presented above are ZsN (a, b) = max(a, b) (maximum, Zadeh s-Norm),
GsN (a, b) = a + b − a · b (probabilistic sum, Goguen s-Norm), and LsN (a, b) =
min(1, a + b) (bounded sum, Lukasiewicz s-Norm).
As in the case of t-norms, we also have for the three basic s-norms and for each
(a, b) ∈ [0, 1]2 the following order: ZsN (a, b) ≤ GsN (a, b) ≤ LsN (a, b).
For further details, the reader is referred to [12].
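The three basic t-norms and s-norms and the stated pointwise orders can be checked with a short Python sketch; the function names simply mirror the notation above and are purely illustrative.

def ZtN(a, b): return min(a, b)              # Zadeh t-norm (minimum)
def GtN(a, b): return a * b                  # Goguen t-norm (algebraic product)
def LtN(a, b): return max(0.0, a + b - 1.0)  # Lukasiewicz t-norm

def ZsN(a, b): return max(a, b)              # Zadeh s-norm (maximum)
def GsN(a, b): return a + b - a * b          # Goguen s-norm (probabilistic sum)
def LsN(a, b): return min(1.0, a + b)        # Lukasiewicz s-norm (bounded sum)

grid = [i / 10 for i in range(11)]
assert all(LtN(a, b) <= GtN(a, b) <= ZtN(a, b) for a in grid for b in grid)
assert all(ZsN(a, b) <= GsN(a, b) <= LsN(a, b) for a in grid for b in grid)
print(ZtN(0.5, 0.7), GtN(0.5, 0.7), LtN(0.5, 0.7))  # 0.5, 0.35, 0.2 (approx.)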

2.2 Interval Computation

An interval number [a, a′] with a ≤ a′ is the set of real numbers defined by
[a, a′] = {x : a ≤ x ≤ a′}. Degenerate intervals of the form [a, a] are equivalent
to real numbers.
One can perform arithmetic operations with interval numbers through the arith-
metic operations on their members.
Let A = [a, a′] and B = [b, b′] be two interval numbers, and let +, −, ·, /
and = denote the arithmetic operations (addition, subtraction, multiplication, division,
respectively) and the arithmetic equality relation on pairs of real numbers. The arith-
metic operations on real numbers may be easily extended to pairs of interval
numbers in the following way: A + B = [a + b, a′ + b′], A − B = [a − b′, a′ − b],
A · B = [min(a · b, a · b′, a′ · b, a′ · b′), max(a · b, a · b′, a′ · b, a′ · b′)], A/B = [a,
a′] · [1/b′, 1/b] for 0 ∉ [b, b′]. We shall write A = B if and only if a = b and a′ = b′.
In the special case where both A and B are non-negative intervals, the multiplica-
tion simplifies to A · B = [a · b, a′ · b′], 0 ≤ a ≤ a′, 0 ≤ b ≤ b′.
For further details, the reader is referred to [1, 21].
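A minimal Python sketch of these interval operations, representing an interval number as a (lo, hi) pair (an illustrative encoding of our own):

def iadd(A, B): return (A[0] + B[0], A[1] + B[1])
def isub(A, B): return (A[0] - B[1], A[1] - B[0])
def imul(A, B):
    products = [A[0] * B[0], A[0] * B[1], A[1] * B[0], A[1] * B[1]]
    return (min(products), max(products))
def idiv(A, B):
    assert not (B[0] <= 0 <= B[1]), "0 must not lie in the divisor interval"
    return imul(A, (1.0 / B[1], 1.0 / B[0]))

A, B = (0.5, 0.6), (0.7, 0.8)
print(iadd(A, B))  # (1.2, 1.4) approx.
print(imul(A, B))  # (0.35, 0.48) approx.; non-negative case: [a*b, a'*b']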

2.3 Interval Triangular Norms

The notion of t-norms on single values in [0,1] can be extended to subintervals of
[0,1]. Moreover, the basic properties of interval t-norms can be obtained from t-norms.
Let A = [a, a′] and B = [b, b′] be two interval real numbers such that 0 ≤
a ≤ a′, 0 ≤ b ≤ b′. Then for a given t-norm t, an extended t-norm is defined by
T(A, B) = {t(x, y) : x ∈ A, y ∈ B}. Similarly, an extended s-norm is defined by
S(A, B) = {s(x, y) : x ∈ A, y ∈ B}. Moreover, the following facts are true for any
continuous t-norm t or s-norm s: (1) the interval t-norm T of a continuous t-norm t
produces the interval T(A, B) = [t(a, b), t(a′, b′)]; (2) the interval s-norm S of a
continuous s-norm s produces the interval S(A, B) = [s(a, b), s(a′, b′)].
Interval t-norms corresponding to ZtN, GtN and LtN can be computed by the fol-
lowing formulas: iZtN(A, B) = [min(a, b), min(a′, b′)] (interval minimum, interval
Zadeh t-Norm), iGtN(A, B) = [a · b, a′ · b′] (interval algebraic product, interval
Goguen t-Norm), iLtN(A, B) = [max(0, a + b − 1), max(0, a′ + b′ − 1)] (interval
Lukasiewicz t-Norm).
The corresponding interval s-norms are: iZsN(A, B) = [max(a, b), max(a′, b′)]
(interval maximum, interval Zadeh s-Norm), iGsN(A, B) = [a + b − a · b, a′ +
b′ − a′ · b′] (interval probabilistic sum, interval Goguen s-Norm), iLsN(A, B) =
[min(1, a + b), min(1, a′ + b′)] (interval bounded sum, interval Lukasiewicz
s-Norm).
Consider the following relation defined on intervals: A ⪯ B if and only if a ≤
b and a′ ≤ b′. With this relation, the counterpart of the order for the three basic
t-norms presented above can be expressed as iLtN ⪯ iGtN ⪯ iZtN. Similarly, the
counterpart of the order for the three basic s-norms presented above can be
expressed as iZsN ⪯ iGsN ⪯ iLsN.
In the sequel we shall write A ≺ B if and only if A ⪯ B and A ≠ B.
For further details, the reader is referred to [22, 28].
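Because a continuous norm need only be evaluated at the interval endpoints, the interval norms reduce to two calls of the underlying norm, as the following sketch (reusing the illustrative encodings from the previous sketches) shows:

def interval_norm(norm, A, B):
    # Endpoint formula for a continuous t-norm or s-norm.
    return (norm(A[0], B[0]), norm(A[1], B[1]))

iZtN = lambda A, B: interval_norm(min, A, B)
iGtN = lambda A, B: interval_norm(lambda a, b: a * b, A, B)
iLtN = lambda A, B: interval_norm(lambda a, b: max(0.0, a + b - 1.0), A, B)
iZsN = lambda A, B: interval_norm(max, A, B)
iGsN = lambda A, B: interval_norm(lambda a, b: a + b - a * b, A, B)
iLsN = lambda A, B: interval_norm(lambda a, b: min(1.0, a + b), A, B)

A, B = (0.5, 0.6), (0.7, 0.8)
print(iZtN(A, B))  # (0.5, 0.6)
print(iGtN(A, B))  # (0.35, 0.48) approx.
print(iLtN(A, B))  # (0.2, 0.4) approx., since 0.5+0.7-1=0.2 and 0.6+0.8-1=0.4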

3 Generalised Fuzzy Petri Nets

(Classic) Petri nets (PNs) are a simple and convenient tool for modeling systems. They
have an intuitive graphic representation. PNs were proposed in the 1960s by Petri
[30]. Analysis of the PNs enables obtaining important information about the structure
and dynamic behavior of the modeled system. This information can be used in the
evaluation of the modeled system, its improvement or change. Therefore, they are
helpful at the system design stage. In this paper, we assume that the reader knows the
basic concepts of the PN theory. Readers interested in deeper knowledge about PNs
and their applications are referred to the book [5]. GFP-nets are a modification of the
PNs. They allow modeling of knowledge-based systems in which both knowledge and
inference using this knowledge are generally imprecise, unclear or incomplete. GFP-
nets are used to graphically present production rules and to model approximate
reasoning based on such rules [32].

Definition 1 A generalised fuzzy Petri net (GFP-net) is said to be a tuple N =
(P, T , S, I , O, α, β, γ , Op, δ, M0 ), where: (1) P = {p1 , p2 , . . . , pn } is a finite set of
places; (2) T = {t1 , t2 , . . . , tm } is a finite set of transitions; (3) S = {s1 , s2 , . . . , sn } is a
finite set of statements; (4) the sets P, T , S are pairwise disjoint; (5) I : T → 2P is the
input function; (6) O : T → 2P is the output function; (7) α : P → S is the statement
binding function; (8) β : T → [0, 1] is the truth degree function; (9) γ : T → [0, 1]
is the threshold function; (10) Op is a union of t-norms and s-norms called the set
of operators; (11) δ : T → Op × Op × Op is the operator binding function; (12)
M0 : P → [0, 1] is the initial marking, and 2P denotes a family of all subsets of the
set P.

In the drawing, places are presented as circles and transitions as rectangles. The
function I represents the directed arcs joining places with transitions, and the function
O represents the directed arcs joining transitions with places. A place p is called an
input place of a transition t if p ∈ I (t), and a place p′ with p′ ∈ O(t) is called
an output place of t. The initial marking M0 is an initial distribution of tokens in the
places. It can be represented by a vector of dimension n of tokens (real numbers) from
[0, 1]. For p ∈ P, M0 (p) can be interpreted as a truth value of the statement s bound
with a given place p by means of the statement binding function α. Graphically, the
tokens are represented by means of suitable real numbers placed over the circles
corresponding to appropriate places.
We accept that if M0 (p) = 0 then the token does not exist in the place p. The
numbers β(t) and γ (t) are placed in a net drawing under the transition t. The first
number is interpreted as the truth degree of an implication corresponding to a given
transition t. The role of the second one is to limit the possibility of transition firings,
i.e., if the input operator In value for all values corresponding to input places of
the transition t is less than a threshold value γ (t) then this transition cannot be
fired (activated). The operator binding function δ connects transitions with triples of
operators (In, Out1 , Out2 ). The first operator in the triple is called the input operator,
and two remaining ones are the output operators. The input operator In concerns the
Fuzzy Petri Nets and Interval Analysis Working Together 401

way in which all input places are connected with a given transition t (more precisely,
statements corresponding to those places). However, the output operators Out1 and
Out2 concern the way in which the next marking is computed after firing the transition
t. In the case of the input operator we assume that it can belong to one of two classes,
i.e., t- or s-norm, whereas the second one belongs to the class of t-norms and the
third to the class of s-norms.
Let N be a GFP-net. A marking of N is a function M : P → [0, 1].
By the dynamics of GFP-net, we understand the way in which new net marking
is calculated based on the current marking after firing the transition enabled in this
marking.
Let N = (P, T , S, I , O, α, β, γ , Op, δ, M0 ) be a GFP-net, M be a marking of
N , t ∈ T , I (t) = {pi1 , pi2 , . . . , pik } be a set of input places for a transition t and
β(t) ∈ (0, 1] (i.e., the truth degree is positive). A transition t ∈ T is enabled
for marking M , if the value of input operator In for all input places of the transition
t by M is positive and greater than, or equal to, the value of threshold function
γ corresponding to the transition t. Formally, In(M (pi1 ), M (pi2 ), . . . , M (pik )) ≥
γ (t) > 0.
We assume that one can only fire enabled transitions. Firing the enabled transition
t consists of removing the tokens from its input places I (t) and adding the tokens to
all its output places O(t) without any alteration of the tokens in other places. If M is
a marking of N enabling transition t and M′ is the marking derived from M by firing
transition t, then for each p ∈ P the next marking M′ is computed as follows:
(1) tokens from all input places of the fired transition t are removed; (2) tokens in
all output places of t are modified in the following way: first, the value of the input
operator In for all input places of t is computed; next, the value of the output operator
Out1 for the value of In and the value of the truth degree function β(t) is determined;
finally, the value corresponding to M′(p) for each p ∈ O(t) is obtained as a result
of the output operator Out2 applied to the value of Out1 and the current marking M(p);
(3) the numbers in the remaining places of net N are not changed. Formally, for p ∈ P:
M′(p) = 0 if p ∈ I(t), Out2(Out1(In(M(pi1), M(pi2), . . . , M(pik)), β(t)), M(p)) if
p ∈ O(t), and M(p) otherwise.

Example 1 Let us consider a GFP-net in Fig. 1a. For this net we have: the set of
places P = {p1 , p2 , p3 }, the set of transitions T = {t1 }, the input function I and the

Fig. 1 A GFP-net with: a the initial marking, b the marking after firing t1

output function O in the form: I (t1 ) = {p1 , p2 }, O(t1 ) = {p3 }, the set of statements
S = {s1 , s2 , s3 }, the statement binding function α : α(p1 ) = s1 , α(p2 ) = s2 , α(p3 ) =
s3 , the truth degree function β : β(t1 ) = 0.8, the threshold function γ : γ (t1 ) =
0.3, and the initial marking M0 = (0.5, 0.7, 0). In addition, there are: the set of
operators Op = {ZtN , ZsN , GtN } and the operator binding function δ defined as
follows: δ(t1 ) = (ZtN , GtN , ZsN ). The transition t1 is enabled by the initial marking
M0 , because min(M0 (p1 ), M0 (p2 )) = 0.5 ≥ γ (t1 ). After firing the transition t1 by the
marking M0 we receive a new marking M′ = (0, 0, 0.4) (Fig. 1b), at which t1 is no
longer enabled.

For further details, the reader is referred to [32].
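The marking update of Example 1 can be reproduced with a few lines of Python; the fire function below is an illustrative single-transition simplification of our own, not a full GFP-net implementation.

def fire(marking, inputs, outputs, beta):
    v_in = min(marking[p] for p in inputs)   # In = ZtN (minimum)
    v_out1 = v_in * beta                     # Out1 = GtN (algebraic product)
    new = dict(marking)
    for p in inputs:
        new[p] = 0.0                         # tokens removed from input places
    for p in outputs:
        new[p] = max(v_out1, marking[p])     # Out2 = ZsN (maximum)
    return new

M0 = {"p1": 0.5, "p2": 0.7, "p3": 0.0}
assert min(M0["p1"], M0["p2"]) >= 0.3        # gamma(t1) = 0.3, so t1 is enabled
print(fire(M0, ["p1", "p2"], ["p3"], beta=0.8))  # p3 -> 0.4, as in Fig. 1b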

4 Type-2 Generalised Fuzzy Petri Nets

In Sect. 2 we recalled basic notions of interval analysis and related areas; now
we will describe how to modify the GFP-net model (see Sect. 3), so as to make it
closer to the physical reality.

Definition 2 A type-2 generalised fuzzy Petri net (T2GFP-net) is a tuple N′ =
(P, T , S, I , O, α, β, γ , Op, δ, M0 ), where: (1) P, T , S, I , O, α have the same
meaning as in Definition 1; (2) β : T → L([0, 1]) is the truth degree function; (3)
γ : T → L([0, 1]) is the threshold function; (4) Op is a union of interval t-norms
and interval s-norms called the set of operators; (5) δ : T → Op × Op × Op is the
operator binding function; (6) M0 : P → L([0, 1]) is the initial marking, and L([0, 1])
denotes the set of all closed subintervals of the unit interval.

In the T2GFP-net, the functions defined in positions (2), (3) and (6) are more general
compared to the corresponding functions in the GFP-net: their values are interval
numbers from [0,1] instead of individual values from this interval. Moreover,
in this model we assume that the input operator can belong to one of two classes,
i.e., interval t-norms or interval s-norms, whereas the second operator belongs to
the class of interval t-norms and the third to the class of interval s-norms. This
extension allows the modeled system to be presented and analyzed using a
T2GFP-net at a more general level of abstraction.
Let N′ be a T2GFP-net. A marking of N′ is a function M : P → L([0, 1]). We
assume that if M(p) = [0, 0] then the token does not exist in the place p.
A transition t ∈ T is enabled for marking M if the interval produced by the input
operator In for all input places of the transition t by M is (strictly) greater than
[0,0] and greater than, or equal to, the interval being the value of the threshold function
γ corresponding to the transition t, i.e., In(M(pi1), M(pi2), . . . , M(pik)) ⪰ γ(t) ≻ [0, 0].
Let N′ = (P, T , S, I , O, α, β, γ , Op, δ, M0 ) be a T2GFP-net, t ∈ T , I(t) = {pi1 ,
pi2 , . . . , pik } be the set of input places for a transition t, and β(t) ∈ L((0, 1]) (i.e., the
truth degree interval does not contain 0). Moreover, let In be the input operator and
Out1 , Out2 be the output operators for the transition t.

Fig. 2 A T2GFP-net with: a the initial marking, b the marking after firing t1

If M is a marking of N′ enabling transition t and M′ is the marking derived from M
by firing transition t, then for each p ∈ P: M′(p) =
[0, 0] if p ∈ I(t), Out2(Out1(In(M(pi1), M(pi2), . . . , M(pik)), β(t)), M(p)) if p ∈
O(t), and M(p) otherwise.

For a T2GFP-net, the procedure for computing the next marking M′ is similar to the
corresponding procedure for a GFP-net presented above.
Example 2 Let us consider a T2GFP-net in Fig. 2a. For this net we have: the set
of places P, the set of transitions T , the set of statements S, the input function
I , the output function O, the statement binding function α which are described
analogously to Example 1. In addition, there are: the truth degree function β :
β(t1 ) = [0.7, 0.8], the threshold function γ : γ (t1 ) = [0.2, 0.3], and the initial mark-
ing M0 = ([0.5, 0.6], [0.7, 0.8], [0, 0]). And there are also: the set of operators
Op = {iZtN , iGtN , iZsN } and the operator binding function δ defined as follows:
δ(t1 ) = (iZtN , iGtN , iZsN ). The transition t1 is enabled by the initial marking
M0 , since iZtN(M0 (p1 ), M0 (p2 )) = [min(0.5, 0.7), min(0.6, 0.8)] = [0.5, 0.6] ⪰
γ(t1 ) = [0.2, 0.3]. Firing transition t1 by the marking M0 transforms M0 to the mark-
ing M′ = ([0, 0], [0, 0], [0.35, 0.48]) (Fig. 2b), at which t1 is no longer enabled.
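The interval counterpart of the previous sketch reproduces Example 2; ifire is again an illustrative single-transition update of our own over (lo, hi) interval tokens, not a complete T2GFP-net engine.

def ifire(marking, inputs, outputs, beta):
    lo = min(marking[p][0] for p in inputs)
    hi = min(marking[p][1] for p in inputs)   # In = iZtN
    out1 = (lo * beta[0], hi * beta[1])       # Out1 = iGtN
    new = dict(marking)
    for p in inputs:
        new[p] = (0.0, 0.0)                   # tokens removed from input places
    for p in outputs:
        a, b = marking[p]
        new[p] = (max(out1[0], a), max(out1[1], b))  # Out2 = iZsN
    return new

M0 = {"p1": (0.5, 0.6), "p2": (0.7, 0.8), "p3": (0.0, 0.0)}
print(ifire(M0, ["p1", "p2"], ["p3"], beta=(0.7, 0.8)))
# p3 -> (0.35, 0.48) approx., matching the marking in Fig. 2b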

5 Structural Forms of Fuzzy Production Rules

A fuzzy production rule (a rule for short) is an important and fruitful approach
to knowledge representation, and an FPN is a very useful way to represent the rule
graphically [32]. In this paper, we assume that a KBS is described by rules of the
form: IF premise THEN conclusion (CF) for which the premise is consumed and
the conclusion is produced each time the rule is used, where CF means a certainty
factor. Moreover, the system modeling is realized by transforming these rules into a
T 2GFP-net depending on the form of a transformed rule. In the paper, we consider
three structural forms of rules.
Type 0: IF s THEN s′ (CF = [c, c′]), where s and s′ denote statements, [a, a′] and [b, b′]
are the interval numbers corresponding to their values, and CF is a certainty factor.
The truth values of s, s′, and CF belong to L([0, 1]).

Fig. 3 A T2GFP-net representation of rule type 0

The degree of reliability of the rule is expressed by the value of the parameter CF:
the higher the value of [c, c′], the more reliable the corresponding rule. The value
[d, d′] ∈ L([0, 1]) is interpreted in a similar way. It represents the threshold value
assigned to each rule: the higher the value [d, d′], the higher the required truth
degree of the rule premise s. The operator In and the operators Out1 , Out2 represent
the input operator and the output operators, respectively. These operators play an
important role in optimizing the rule firing; this aspect will be discussed in more
detail in Sect. 7. According to Fig. 3, the token value at the output place p of the
transition t corresponding to the production rule is calculated by the formula
[b, b′] = Out1([a, a′], [c, c′]).
A T2GFP-net structure of this rule is shown in Fig. 3.
If the antecedence or the consequence of a rule contains And or Or (classical
propositional connectives), it is called a composite rule. Below, two types of com-
posite rules are presented together with their T2GFP-net representation (see Fig. 4).
Type 1: IF s1 And/Or s2 . . . And/Or sk THEN s (CF = [c, c′]), where s1 , s2 ,
…, sk , s denote statements, and [a1 , a1′], [a2 , a2′], ..., [ak , ak′], [b, b′] their values,
respectively. The token value [b, b′] is calculated in the output place as follows
(Fig. 4a): [b, b′] = Out1(In([a1 , a1′], [a2 , a2′], . . . , [ak , ak′]), [c, c′]).
It is easy to see that a rule of type 0 is a particular case of a rule of type 1, as in
the case of the rule of type 0, there is only one statement in the antecedence.
Type 2: IF s′ THEN s1 And s2 . . . And sn (CF = [c, c′]), where s′, s1 , s2 , ..., sn
denote statements, and [b, b′], [a1 , a1′], [a2 , a2′], ..., [an , an′] denote their values,

Fig. 4 A T2GFP-net representation of the rule: a type 1, b type 2

respectively. The token value is calculated in each output place as follows (Fig. 4b):
[ak , ak′] = Out1([b, b′], [c, c′]).
Remarks:
1. Taking into account the fact that the rules of type 0 and 2 have single-statement
premises, the input operator In could be omitted in Figs. 3 and 4b. Nevertheless, in
order to maintain the adopted pattern of the triples of operators in these figures,
we leave the operator where it is.
2. In the three graphical representations of the rule types considered above, we
assume that the initial markings of the output places are equal to [0,0]. In this
situation, the output operator Out2 can be omitted from the formulas describing the
markings of the output places, because it does not change the marking value
in these places. Otherwise, i.e. for a non-zero marking of the output places, the
output operator Out2 must be taken into account. This means that in each
formula presented above, the final marking value [a, a′] should be calculated as
[a, a′] = Out2([â, â′], M(p)), where [â, â′] denotes the token value
calculated for the appropriate rule type using the formulas above, and M(p) is
the marking of the output place p. Intuitively, the final token value corresponding
to M′(p) for each p ∈ O(t) is obtained as a result of the operation Out2 applied to the
calculated value of the operation Out1 and the current marking M(p).
3. In this paper, we do not consider rules of the form: IF s THEN s1 Or s2 . . .
Or sn . Rules of this type do not represent a single implication, but a set of n
implications with the same premise s and n conclusions si , i = 1,2,…, n.
4. For technical reasons, the names of the functions β and γ in Figs. 3 and 4 are
rendered as b and g, respectively, rather than in their original form.

6 Algorithms

To model and analyze a system with uncertainty, we usually perform the following
three steps (cf. [39]):
Step 1. Generate the corresponding FPN model for the KBS.
Step 2. Design a reasoning algorithm based on some application backgrounds.
Step 3. Implement the reasoning algorithm with the appropriate parameters.

In this section, we present two algorithms that correspond to the realization of the
first two steps mentioned above. An example of the realization of the third step will
be presented in Sect. 7.
The first algorithm constructs a T2GFP-net on the basis of a given set of rules;
the transformation of rules into a T2GFP-net is realized depending on the form of
the transformed rule (see the previous section). The second algorithm describes a
reasoning process realized by executing a T2GFP-net representing a given KBS.
The efficiency of the first algorithm is straightforward to assess: it depends mainly
on the number of rules in the set R [4].

Algorithm 1: Construction of a T2GFP-net using a set of rules

Input : A finite set R of rules
Output: A T2GFP-net N
F ← ∅; (* the empty set *)
for each r ∈ R do
    if r is a rule of type 0 then construct a subnet Nr as shown in Fig. 3;
    if r is a rule of type 1 then construct a subnet Nr as shown in Fig. 4a;
    if r is a rule of type 2 then construct a subnet Nr as shown in Fig. 4b;
    F ← F ∪ {Nr };
integrate all subnets from the family F on joint places and create the result net N;
return N;

Before presenting Algorithm 2, we first introduce two auxiliary concepts
regarding the two types of statements identified in this algorithm. They are so-called
starting statements and goal statements. The first of these occur in the premises
of rules that initiate the inference process described by rules derived from a given
rule-based knowledge base. The second concerns the statements included in the
conclusions of the rules that generate the final decisions proposed by the analyzed
inference process.
In the PN representation of rules, the places associated with the first group of
statements are called starting places, while places associated with the second type of
statements are called goal places. What is more, when the degrees of truth relating to
individual starting statements are given, then by analyzing the process of inference
described by the set of given rules step by step, we can try to find out what the
degrees of truth of the goal statements are. The purpose of Algorithm 2 is to
determine the degrees of truth of the goal statements based on the degrees of truth of
the starting statements.
We assume in the paper that the truth degrees of the starting statements are given
by the domain expert.

Algorithm 2: Reasoning algorithm using a T2GFP-net

Input : A set of the markings of starting places
Output: A set of the markings of goal places
repeat
    Determine the transitions ready for firing;
    while there are transitions ready for firing do
        Fire a transition ready for firing;
        Compute the new markings of places after firing the transition;
        Determine the transitions ready for firing;
    Read the markings of goal places;
    Reset the markings of all places;
until the end of simulation;

Algorithm 2 is based on the idea of the reachability tree [29, 31]. The main benefits
of this approach are the ease of understanding the algorithm and the ease of finding
the path of inference. On the other hand, its weaker side is the more complex data
structure and the relatively slow speed of inference (cf. [39]).
The following section shows an example of using these two algorithms together
with the appropriate parameters.

7 An Example

This section shows a simplified version of a real problem [8]. It concerns the
following situation: a plane B waits at a certain airport for a plane A so that some
passengers can transfer from plane A to plane B. A conflict arises when plane A is
late. In this situation, the following alternatives can be considered:
• Plane B waits for the arrival of plane A. In this case, B will depart late.
• Plane B departs according to schedule. In this case, passengers leaving plane A
must wait for a later plane.
• Plane B departs according to schedule, and another plan is proposed for passengers
of plane A.
In order to make the most accurate decision, one should also take into account
several other factors, such as time of delay, number of passengers changing plane,
etc. Finding an optimal solution to the problem with mutually exclusive goals, such
as minimizing delays in the entire flight network, guaranteeing connections to the
satisfaction of passengers, the efficient use of expensive resources, etc., is completely
omitted in this example.
To describe the aforementioned conflict in air travel, we propose to consider the
following three rules:
• IF s2 Or s3 THEN s6
• IF s1 And s4 And s6 THEN s7
• IF s4 And s5 THEN s8 ,
where the statements’ labels have the meanings presented in Table 1.
Using Algorithm 1 (Sect. 6) for constructing a T2GFP-net on the basis of a given
set of rules, we obtain the T2GFP-net model corresponding to these rules. This
net model is shown in Fig. 5, where the logical operators Or and And are interpreted as
iZsN (interval maximum) and iZtN (interval minimum), respectively. Note that the
places p1 , p2 , p3 and p4 hold the interval values [0.5,0.6], [0.4,0.5], [0.7,0.8]
and [0.5,0.7] corresponding to the statements s1 , s2 , s3 and s4 , respectively. In this
example, the statement s5 attached to the place p5 is the only crisp one and its value is
equal to [1,1]. Moreover, there are: the truth degree function β : β(t1 ) = [0.8, 0.9],
β(t2 ) = [0.6, 0.7] and β(t3 ) = [0.9, 1], the threshold function γ : γ (t1 ) = [0.3, 0.4],
γ (t2 ) = [0.4, 0.5], γ (t3 ) = [0.5, 0.6], the set of operators Op = {iZtN , iGtN , iZsN }

Table 1 Interpretation of the statements' labels

| Label | Interpretation |
| s1 | Plane B is the last plane in this direction today |
| s2 | The delay of plane A is huge |
| s3 | There is an urgent need for the parking space of plane B |
| s4 | Many passengers would like to change for plane B |
| s5 | The delay of plane A is short |
| s6 | (Let) plane B depart according to schedule |
| s7 | Substitute an additional plane C (in the same direction of flight as plane B) |
| s8 | Let plane B wait for plane A |

Fig. 5 An example of T2GFP-net model of air traffic control: a the initial marking, b the marking
after firing a sequence of transitions t1 t2

and the operator binding function δ defined as follows: δ(t1 ) = (iZsN , iGtN , iZsN ),
δ(t2 ) = (iZtN , iGtN , iZsN ), δ(t3 ) = (iZtN , iGtN , iZsN ).
Assessing the statements attached to the places from p1 up to p5 , we observe that
the transitions t1 and t3 can be fired. Firing these transitions according to the firing
rules for the T2GFP-net model allows for computation of the support for the alter-
natives in question. In this way, the possible alternatives are ordered with regard to
the preference they achieve from the knowledge base. This order forms the basis for
further examinations and simulations and, ultimately, for the dispatching proposal. If
one chooses the sequence of transitions t1 t2, the final value corresponding to the
statement s7 equals the interval [0.3, 0.42]. The detailed computation in

Fig. 6 A graph representing all reachable markings of the T2GFP-net from Fig. 5

this case proceeds as follows. We can see that the transition t1 is enabled by the
initial marking M0 , because iZsN(M0 (p2 ), M0 (p3 )) = iZsN([0.4, 0.5], [0.7, 0.8]) =
[max(0.4, 0.7), max(0.5, 0.8)] = [0.7, 0.8] ⪰ γ(t1 ) = [0.3, 0.4]. Firing transition t1
by the marking M0 transforms M0 to the marking M1 = ([0.5, 0.6], [0, 0], [0, 0],
[0.5, 0.7], [1, 1], [0.56, 0.72], [0, 0], [0, 0]), because iGtN([0.7, 0.8], [0.8, 0.9]) =
[0.7 · 0.8, 0.8 · 0.9] = [0.56, 0.72]; at M1 the transition t2 is still enabled. Firing
transition t2 by the marking M1 transforms M1 to the marking M2 = ([0, 0], [0, 0],
[0, 0], [0, 0], [1, 1], [0, 0], [0.3, 0.42], [0, 0]), since iZtN([0.5, 0.6], [0.56, 0.72],
[0.5, 0.7]) = [min(0.5, 0.56, 0.5), min(0.6, 0.72, 0.7)] = [0.5, 0.6] and
iGtN([0.5, 0.6], [0.6, 0.7]) = [0.5 · 0.6, 0.6 · 0.7] = [0.3, 0.42], at which all
transitions are disabled. In the other case (i.e., for the transition t3 only), the final
value, this time corresponding to the statement s8 , equals the interval [0.45, 0.7],
and again all transitions are disabled. We omit the detailed calculation in the second
case, because it runs similarly as above.
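The two reachable paths worked out above can be checked with a short sketch reusing the interval norms; token removal and threshold tests are omitted for brevity, and the variable names are ours.

imin = lambda A, B: (min(A[0], B[0]), min(A[1], B[1]))   # iZtN
imax = lambda A, B: (max(A[0], B[0]), max(A[1], B[1]))   # iZsN
iprod = lambda A, B: (A[0] * B[0], A[1] * B[1])          # iGtN

p1, p2, p3, p4, p5 = (0.5, 0.6), (0.4, 0.5), (0.7, 0.8), (0.5, 0.7), (1.0, 1.0)

p6 = iprod(imax(p2, p3), (0.8, 0.9))            # fire t1: s2 Or s3
p7 = iprod(imin(imin(p1, p6), p4), (0.6, 0.7))  # fire t2: s1 And s4 And s6
p8 = iprod(imin(p4, p5), (0.9, 1.0))            # fire t3: s4 And s5
print(p6, p7, p8)
# approx. (0.56, 0.72), (0.3, 0.42) and (0.45, 0.7), up to float rounding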
The graphical representation of Algorithm 2 execution is illustrated in Fig. 6.
We can easily see in this graph three sequences of firing transitions (the reach-
able paths): (t1 , t2 ), (t1 , t3 ), and (t3 , t1 ). The first reachable path goes from the ini-
tial marking M0 represented in the graph by the node N0 to the final marking M2
represented in the graph by the node N2 . However, the next two reachable paths
transform marking M0 into the final marking M4 = ([0.5, 0.6], [0, 0], [0, 0], [0, 0],
[0, 0], [0.56, 0.72], [0, 0], [0.45, 0.7]) represented in the graph by the node N4 (see
the table in Fig. 7). Since the markings of places p7 and p8 are the truth degrees of the
statements attached to these places, the values [0.3, 0.42] and [0.45, 0.7] are,
respectively, the belief degrees of the final decisions in the KBS considered in this
example.
If we interpret the logical operators Or and And as the interval probabilistic sum iGsN
and the interval algebraic product iGtN, respectively, and if we choose the sequence of
transitions t1 t2, then no final value can be obtained, because after firing the
transition t1 at the initial marking M0 we reach a marking at which the
transition t2 is not able to fire. In the other case, i.e., for the transition t3 , we obtain the
final value for the statement s8 also equal to [0.45, 0.7]. A similar situation occurs
as before, if we accept the interval Lukasiewicz s-norm and interval Lukasiewicz
t-norm for the logical operators Or, And, respectively.

Fig. 7 A table of all nodes in the graph from Fig. 6

This example clearly shows that different interpretations of the logical operators
Or and And may lead to quite different decision results. Therefore, we propose in
this paper a new fuzzy net model that is more flexible than the classical one, since
in the new model the user has the opportunity to define the input/output operators.
When choosing a suitable interpretation for the logical operators Or and And, we may
apply the mathematical relationships between interval t-norms and interval s-norms
presented in Sect. 2.3. Beyond that, the outcome depends to a significant degree on
the experience of the model designer.
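As one small illustration of such relationships, the following sketch (our own, assuming the standard interval negation N([a, b]) = [1 − b, 1 − a]) derives an interval s-norm as the De Morgan dual of a given interval t-norm; for the interval product t-norm, the dual coincides with the interval probabilistic sum.

```python
def iGtN2(x, y):
    """Two-argument interval product t-norm (as in the earlier sketch)."""
    return (x[0] * y[0], x[1] * y[1])

def i_neg(x):
    """Interval negation: N([a, b]) = [1 - b, 1 - a]."""
    return (1.0 - x[1], 1.0 - x[0])

def dual_s_norm(t_norm):
    """De Morgan dual of an interval t-norm: S(x, y) = N(T(N(x), N(y)))."""
    return lambda x, y: i_neg(t_norm(i_neg(x), i_neg(y)))

# Sanity check: the dual of the interval product t-norm is the interval
# probabilistic sum (values agree up to floating-point rounding).
s = dual_s_norm(iGtN2)
print(s((0.4, 0.5), (0.7, 0.8)))  # -> approx. (0.82, 0.9)
```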

8 Comparison with Existing Literature

In this section, we present brief information about new FPN models, as well as a
comparison of our approach with the existing literature in this area.
Following the review article [16], the new FPN models can be divided into five
thematic groups:
1. FPNs combining PNs and fuzzy logic.
2. FPNs considering the time factor.
3. FPNs based on possibility logic.
4. FPNs using neural networks.
5. FPNs based on matrix operations.
The approach presented in this paper differs from all of the above. It opens a new,
sixth and, it seems, equally promising direction of research, which can be described
as FPNs combined with interval analysis. In particular, the paper proposes a
T2GFP-net model for an uncertain environment with interval numbers, which has
some advantages over the models proposed in the literature. These can be summarized
as follows:
• This paper uses interval t-norms [22] instead of the classic t-norms [12], as well as
interval parameters that characterize FPRs, and therefore the proposed approach
opens the possibility of optimizing the degree of truth at the output places, cf. [37].

• The T2GFP-net model makes the system more general in comparison with [16,
39], because all the markings in the input and output places, as well as the transition
characteristics, are linked to parameters that are also interval numbers. This feature
also relates to the reliability of the system.
• Because interval fuzzy sets are used in this paper, one can specify an interval
number instead of an exact membership or truth value. The interval indicates the
range within which the exact value lies, which makes the model proposed in this
paper more realistic.

9 Concluding Remarks

Trying to make GFP-nets more realistic with regard to the perception of physical
reality, in this paper we have established a link between GFP-nets and interval analysis.
The link is methodological and demonstrates the possible use of the methodology
of interval analysis (to deal with incomplete information) to transform GFP-nets
into a more realistic T2GFP-net model. The model uses interval triangular norms
instead of classical ones. In the approach based on interval fuzzy sets, it is assumed
that one is not able to specify the exact membership or truth value; an interval is
adopted to indicate the range of the exact value. This makes the model proposed in
this paper more flexible, general, and practical. Moreover, the model takes into
account the reliability of the information provided, leading to greater generality in
the approximate reasoning process in KBSs. The suitability and usefulness of the
proposed approach have been demonstrated for decision-making by means of a simple
real-life example. The elaborated approach also looks promising for similar
application problems that could be solved in an analogous manner.
In this paper, we have only considered the extension of t-norms to interval t-norms
in a numeric framework. It would be useful to study FPNs in the context of t-norms
and their interval extensions over more general mathematical structures (i.e.,
L-values for some lattice L; see, e.g., [18, 19]). These are examples of issues that we
would like to investigate by applying the approach presented in this paper.

Acknowledgements The authors are grateful to the anonymous reviewers for their helpful comments.

References

1. Alefeld, G., Mayer, G.: Interval analysis: theory and applications. J. Comput. Appl. Math. 121,
421–464 (2000)
2. Bandyopadhyay, S., Suraj, Z., Grochowalski, P.: Modified generalized weighted fuzzy Petri net
in intuitionistic fuzzy environment. In: Proceedings of the International Joint Conference on
Rough Sets, Santiago, 2016, Chile. Lecture Notes in Artificial Intelligence 9920, pp. 342–351.
Springer (2016)

3. Cardoso, J., Camargo, H. (eds.): Fuzziness in Petri nets. Springer, Heidelberg (1999)
4. Chen, S.M., Ke, J.S., Chang, J.F.: Knowledge representation using fuzzy Petri nets. IEEE Trans.
Knowl. Data Eng. 2(3), 311–319 (1990)
5. David, R., Alla, H.: Petri Nets and Grafcet: Tools for Modelling Discrete Event Systems.
Prentice-Hall, London (1992)
6. Dubois, D., Prade, H.: Operations in a fuzzy-valued logic. Inf. Control 43, 224–240 (1979)
7. Dubois, D., Prade, H.: Possibility Theory: An Approach to Computerized Processing of Uncer-
tainty. Plenum Press, New York (1988)
8. Fay, A., Schnieder, E.: Fuzzy Petri nets for knowledge modelling in expert systems. In: Cardoso,
J., Camargo, H. (eds.) Fuzziness in Petri Nets, pp. 300–318. Springer, Berlin (1999)
9. Hassanien, A.E., Tolba, M.F., Shaalan, K.F., Azar, A.T. (eds.): Proceedings of the International
Conference on Advanced Intelligent Systems and Informatics, AISI 2018, Cairo, Egypt, 3–5
Sept 2018. Advances in Intelligent Systems and Computing, vol. 845. Springer (2019)
10. Jensen, K., Rozenberg, G. (eds.): High-level Petri Nets. Theory and Application. Springer,
Berlin (1991)
11. Kenevan, J.R., Neapolitan, R.E.: A model theoretic approach to propositional fuzzy logic
using Beth tableaux. In: Zadeh, L.A., Kacprzyk, J. (eds.) Fuzzy Logic for the Management of
Uncertainty, pp. 141–157. Wiley, New York (1993)
12. Klement, E.P., Mesiar, R., Pap, E.: Triangular Norms. Kluwer, Dordrecht (2000)
13. Lajmi, F., Talmoudi, A.J., Dhouibi, H.: Fault diagnosis of uncertain systems based on interval
fuzzy Petri net. Stud. Inf. Control 26(2), 239–248 (2017)
14. Li, X., Lara-Rosano, F.: Adaptive fuzzy Petri nets for dynamic knowledge representation and
inference. Expert Syst. Appl. 19, 235–241 (2000)
15. Lipp, H.P.: Application of a fuzzy Petri net for controlling complex industrial processes. In:
Proceedings of IFAC Conference on Fuzzy Information Control, pp. 471–477 (1984)
16. Liu, H.-C., You, J.-X., Li, Z.W., Tian, G.: Fuzzy Petri nets for knowledge representation and
reasoning: a literature review. Eng. Appl. Artif. Intell. 60, 45–56 (2017) (Elsevier)
17. Looney, C.G.: Fuzzy Petri nets for rule-based decision-making. IEEE Trans. Syst. Man Cybern.
18(1), 178–183 (1988)
18. Ma, Z., Wu, W.: Logical operators on complete lattices. Inf. Sci. 55, 77–97 (1991)
19. Mayor, G., Torrens, J.: On a class of operators for expert systems. Int. J. Intell. Syst. 8, 771–778
(1993)
20. Mizumoto, M., Tanaka, K.: Some properties of fuzzy sets of type 2. Inf. Control 31, 312–340
(1976)
21. Moore, R.E.: Interval Analysis. Prentice-Hall, New Jersey (1966)
22. Moore, R.E.: Methods and Applications of Interval Analysis. SIAM Studies in Applied and
Numerical Mathematics, vol. 2 (1979)
23. Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4), 541–580 (1989)
24. Nabli, L., Dhouibi, H., Collart Dutilleul, S., Craye, E.: Using interval constrained Petri nets for
the fuzzy regulation of quality: case of assembly process mechanics. Int. J. Comput. Inf. Eng.
2(5), 1478–1483 (2008)
25. Omran, L.N., Ezzat, K.A., Hassanien, A.E.: Decision support system for determination of
forces applied in orthodontic based on fuzzy logic. In: Proceedings of the 3rd International
Conference on Advanced Machine Learning Technologies and Applications, Cairo, Egypt,
22–24 Feb 2018, Advances in Intelligent Systems and Computing, pp. 158–168. Springer
(2018)
26. Pedrycz, W., Gomide, F.: A generalized fuzzy Petri net model. IEEE Trans. Fuzzy Syst. 2(4),
295–301 (1994)
27. Pedrycz, W.: Generalized fuzzy Petri nets as pattern classifiers. Pattern Recog. Lett. 20(14),
1489–1498 (1999)
28. Pedrycz, W.: Fuzzy Control and Fuzzy Systems, second extended edition. Wiley, Hoboken
(1993)
29. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall Inc, Englewood
Cliffs (1981)

30. Petri, C.A.: Kommunikation mit Automaten. Schriften des IIM Nr. 2, Institut for Instrumentelle
Mathematik, Bonn (1962)
31. Reisig, W.: Petri Nets. EATCS Monographs on Theoretical Computer Science, vol. 4. Springer,
Berlin (1985)
32. Suraj, Z.: A new class of fuzzy Petri nets for knowledge representation and reasoning. Fundam.
Inf. 128(1–2), 193–207 (2013)
33. Suraj, Z.: Knowledge representation and reasoning based on generalized fuzzy Petri nets. In:
Proceedings of the 12th International Conference on Intelligent Systems Design and Applica-
tions, Kochi, 2012, India, pp. 101–106. IEEE Press (2012)
34. Suraj, Z.: Modified generalized fuzzy Petri nets for rule-based systems. In: Proceedings of
the 15th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular
Computing, Tianjin, 2015, China. Lecture Notes in Artificial Intelligence 9437, pp. 196–206.
Springer (2015)
35. Suraj, Z., Bandyopadhyay, S.: Generalized weighted fuzzy Petri net in intuitionistic fuzzy
environment. In: Proceedings of the IEEE World Congress on Computational Intelligence,
Vancouver, 2016, Canada, pp. 2385–2392. IEEE Press (2016)
36. Suraj, Z., Grochowalski, P., Bandyopadhyay, S.: Flexible generalized fuzzy Petri nets for rule-
based systems. In: Proceedings of the 5th International Conference on the Theory and Practice
of Natural Computing, Sendai, 2016, Japan. Lecture Notes in Computer Science 10071, pp.
196–207. Springer (2016)
37. Suraj, Z.: Toward optimization of reasoning using generalized fuzzy Petri nets. In: Pro-
ceedings of the International Joint Conference on Rough Sets, Quy Nhon, Vietnam, 20–24
Aug 2018. Lecture Notes in Artificial Intelligence 11104, pp. 294–308. Springer (2018)
38. Yao, Y.Y.: Interval based uncertain reasoning. In: Proceedings of the 19th International Con-
ference of the North American Fuzzy Information Processing Society-NAFIPS, 13–15 July
2000, Atlanta, USA.
39. Zhou, K.-O., Zain, A.M.: Fuzzy Petri nets and industrial applications: a review. Artif. Intell.
Rev. 45, 405–446 (2016)
40. Zhou, M.C., DiCesare, F.: Petri Net Synthesis for Discrete Event Control of Manufacturing
Systems. Kluwer (1993)
41. Zurawski, R., Zhou, M.C.: Petri nets and industrial applications: a tutorial. IEEE Trans. Ind.
Electr. 41(6), 567–583 (1994)
