Rafael Bello · Rafael Falcon · José Luis Verdegay (Editors)

Uncertainty Management with Fuzzy and Rough Sets: Recent Advances and Applications
Studies in Fuzziness and Soft Computing
Volume 377
Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
The series “Studies in Fuzziness and Soft Computing” contains publications on
various topics in the area of soft computing, which include fuzzy sets, rough sets,
neural networks, evolutionary computation, probabilistic and evidential reasoning,
multi-valued logic, and related fields. The publications within “Studies in Fuzziness
and Soft Computing” are primarily monographs and edited volumes. They cover
significant recent developments in the field, both of a foundational and applicable
character. An important feature of the series is its short publication time and
world-wide distribution. This permits a rapid and broad dissemination of research
results.
Editors
Rafael Bello, Department of Computer Science, Universidad Central “Marta Abreu” de Las Villas, Santa Clara, Villa Clara, Cuba
Rafael Falcon, School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To our families
Preface
In Part I, the reader will find new methods based on fuzzy sets for solving machine learning problems, such as clustering, as well as optimization problems that incorporate FST elements into their formulation. Other contributions put forth new approaches to decision making, including some featuring fuzzy cognitive maps. Nine chapters comprise Part I.
Part II includes six chapters that enrich the state of the art in RST. Several papers
propose new algorithms for knowledge discovery and decision making using rough
sets.
In Part III, five hybrid methods are introduced. Fuzzy and rough sets are com-
bined in two of the chapters. In the rest, fuzzy sets are coupled with neural and Petri
nets, as well as with GAs.
The editors hope that the methods and applications presented in this volume will help broaden knowledge about granular computing, soft computing, and two of their most important building blocks: fuzzy and rough set theories.
The rest of this preface briefly expands on the content of each chapter so that readers may dive straight into those that capture their interest.
extends the existing generalized fuzzy Petri nets by introducing a triple of operators (In, Out1, Out2) in a T2GFP-net, in the form of interval triangular norms, which are meant to substitute for the triangular norms in GFP-nets. Trying to make GFP-nets more realistic with regard to the perception of physical reality, the chapter establishes a connection between GFP-nets and interval analysis. The link is methodological, demonstrating the possible use of interval analysis methodology (to deal with incomplete information) to transform GFP-nets into a more realistic model. The proposed approach can be used both for knowledge representation and for reasoning in knowledge-based systems.
We want to express our sincere gratitude and appreciation to all those who made
ISFUROS 2017 and this Springer volume possible. In particular, we acknowledge
the support and direction provided by the ISFUROS 2017 Steering Committee and
the technical reviews and scientific insights contributed by all technical program
committee members, who generously devoted their time and efforts to provide
constructive and sound referee reports to evaluate the quality of all received
submissions.
Our gratitude also goes to the UCLV Convention organizers and the Meliá Marina Varadero staff, who helped run the conference quite smoothly despite the short notice to move the Convention to Varadero from its original venue in Santa Maria Key after the catastrophic impact of Hurricane Irma on the north-central region of Cuba in September 2017. The editors are also indebted to the support received from project TIN2017-86647-P (funded by the Fondo Europeo de Desarrollo Regional, FEDER) and the Asociación Universitaria Iberoamericana de Postgrado (AUIP) research network iMODA. Special thanks go to Prof. Janusz Kacprzyk, Gowrishankar Ayyasamy, and Leontina Di Cecco for their priceless support with the publication of this Springer volume.
1 Introduction
Fuzzy clustering methods are unsupervised classification tools [1] that can be employed to define groups of observations by considering the similarities among them. In particular, fuzzy clustering tools make it possible to handle data uncertainty, which is common across different disciplines such as image processing, machine learning, and modeling and identification [2–8]. An important advantage of this type of method is that it can remove the influence of noise and outliers from the data clustering [50, 51].
The Fuzzy C-Means (FCM) algorithm [9] is one of the most widely used clustering algorithms due to its robust results for overlapped data. Unlike the k-means algorithm, data points in FCM may belong to more than one cluster center. The FCM algorithm obtains very good results with noise-free data, but it is highly sensitive to noisy data and outliers [1].
Other similar techniques, such as Possibilistic C-Means (PCM) [10] and Possibilistic Fuzzy C-Means (PFCM) [11], interpret clustering as a possibilistic partition and work better in the presence of noise than FCM. However, PCM fails to find optimal clusters in the presence of noise [1], and PFCM does not yield satisfactory results when the dataset consists of two clusters that are highly unequal in size and outliers are present [1, 10]. The Noise Clustering (NC) [12], Credibility Fuzzy C-Means (CFCM) [13], and Density Oriented Fuzzy C-Means (DOFCM) [10] algorithms were proposed specifically to work efficiently with noisy data.
The clustering output depends on various parameters, such as the distribution of data points inside and outside the cluster, the shape of the cluster, and linear or non-linear separability. The effectiveness of a clustering method relies heavily on the choice of the distance metric adopted. FCM uses the Euclidean distance as its distance measure and, therefore, can only detect hyper-spherical clusters. Researchers have proposed other distance measures, such as the Mahalanobis distance and kernel-based distances in the data space and in a high-dimensional feature space, so that non-hyper-spherical/non-linear clusters can be detected [14, 15]. However, one drawback of these clustering algorithms is that they treat all features equally when deciding the cluster memberships of objects. A solution to this problem is to introduce proper attribute weights into the clustering process [16, 17].
Many attribute-weighted fuzzy clustering methods have been proposed in recent years. In [18], a weighted Euclidean distance is used to replace the general Euclidean distance in FCM. In [19], the grouping is carried out on a selected subspace instead of the full data space by directly assigning zero weights to features that carry little information. More recently, [20] presented an enhanced soft subspace clustering (ESSC) algorithm that employs both within-cluster and between-cluster information. In [21], a novel subspace clustering technique was proposed that introduces feature interaction using the concepts of fuzzy measures and the Choquet integral, and [22] gives a survey of weighted clustering technologies. Finally, in [23], a maximum-entropy-regularized weighted fuzzy c-means (EWFCM) algorithm is proposed to extract the important features and improve the clustering. In the EWFCM algorithm, an attribute-weight entropy regularization term is included in the objective function to achieve the optimal distribution of attribute weights. In this way, the dispersion within clusters is minimized while the entropy of the attribute weights is maximized, stimulating the important attributes to contribute to the identification of clusters. A good clustering result can thus be obtained, and the important attributes can be extracted for cluster identification. Moreover, a kernel-based EWFCM (KEWFCM) clustering algorithm is derived for clustering data with non-spherically shaped clusters.
A Proposal of Hybrid Fuzzy Clustering Algorithm with Application … 5
2 Related Works
Many algorithms have been developed for fuzzy clustering. Among the most used techniques are relational fuzzy algorithms such as the fuzzy non-metric model [26], relational fuzzy C-Means [27], non-Euclidean relational C-Means [28], fuzzy C-medoids [29], and the fuzzy relational data clustering
6 A. Rodríguez-Ramos et al.
algorithm [30]. There are also dynamic algorithms, such as the adaptive fuzzy clustering (AFC) algorithm [31], the Matryoshka method [32], and the dynamic neuro-fuzzy inference system (DENFIS), which has been used for time-series prediction [33]. Another technique is the LAMDA algorithm (Learning Algorithm for Multivariate Data Analysis), a fuzzy classification technique based on evaluating the suitability of individuals for each class [34].
Among the fuzzy clustering methods, distance-based methods represent the majority, and Fuzzy C-Means (FCM) is the most popular one. The optimization criterion (1) defined by FCM is used to cluster the data by considering the similitude among observations:

J(X; U, v) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m (d_{ik})^2    (1)
The exponent m > 1 in (1) is an important factor that regulates the fuzziness of the resulting partition. If m → ∞, all patterns will have the same membership degree in each group (fuzzy partition), whereas if m → 1, each pattern will belong to only one group (hard partition). Fuzzy clustering yields the membership degree matrix U = [\mu_{ik}]_{c \times N}, where \mu_{ik} represents the degree of fuzzy membership of sample k in the i-th class and satisfies the following relationship:

\sum_{i=1}^{c} \mu_{ik} = 1, \quad k = 1, 2, \ldots, N    (2)

where c is the number of classes and N is the number of samples. In this algorithm, the similitude is evaluated by means of the distance function d_{ik}, represented by Eq. (3). This function provides a measure of the distance between the data and the center of each class v = (v_1, v_2, \ldots, v_c), where A \in \mathbb{R}^{n \times n} is the norm-inducing matrix and n is the number of measured variables.
The measure of dissimilarity is the squared distance between each data point and the cluster center v_i, weighted by a power of the membership degree, (\mu_{ik})^m. The value of the cost function J is a measure of the total weighted quadratic error and, statistically, it can be seen as a measure of the total variance of x_k with respect to v_i.
The conditions for a local extremum of Eqs. (1) and (2) are derived using Lagrange multipliers:

\mu_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d_{ik,A} / d_{jk,A} \right)^{2/(m-1)}}    (4)
v_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m x_k}{\sum_{k=1}^{N} (\mu_{ik})^m}    (5)
From Eq. (5), it should be noted that v_i is the weighted average of the data elements that belong to a cluster, i.e., it is the center of cluster i. The FCM algorithm is an iterative procedure in which N data points are grouped into c classes. Initially, the user must set the number of classes (c). The centers of the c classes are initialized randomly and are modified during the iterative process. In a similar way, the membership degree matrix U is modified until it stabilizes, i.e. \|U_t - U_{t-1}\| < \varepsilon, where \varepsilon is a tolerance limit prescribed a priori and t is an iteration counter.
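As a rough illustration of the iterative procedure just described, the following sketch implements the FCM updates of Eqs. (4) and (5) with the plain Euclidean distance (i.e., taking A as the identity); the function and variable names are ours, not the chapter's.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Sketch of Fuzzy C-Means following Eqs. (1), (4) and (5).

    X: (N, n) data matrix; c: number of classes; m: fuzziness exponent.
    Returns the membership matrix U (c x N) and the class centers V (c x n).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random initialization of U subject to the constraint of Eq. (2).
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)
    V = X[:c].copy()
    for _ in range(max_iter):
        Um = U ** m
        # Eq. (5): centers as membership-weighted means of the data.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distances d_ik between each center i and each sample k.
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # Eq. (4): membership update mu_ik = 1 / sum_j (d_ik/d_jk)^(2/(m-1)).
        U_new = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1))).sum(axis=1)
        if np.linalg.norm(U_new - U) < eps:  # stop when ||U_t - U_{t-1}|| < eps
            U = U_new
            break
        U = U_new
    return U, V
```

On well-separated data the memberships converge to a nearly hard partition, while overlapping regions keep intermediate membership degrees.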
New fuzzy clustering methods have been proposed in recent years to deal with the classification problem in different applications.
Ding [14] recently presented GAKFCM, which clusters the data in two steps. First, the initial cluster centers are adjusted using an improved adaptive genetic algorithm; second, classification is accomplished through the KFCM method. A picture fuzzy clustering method (FC-PFS) is presented in [4] based on the theory of picture fuzzy sets (PFS), and it is demonstrated that FC-PFS achieves better clustering quality than other important methods. The essence of this method is that it modifies the objective function based on PFS theory. The idea behind the new function considers two aspects. First, it inherits FCM's objective function, where the membership degree μ in Eq. (1) is replaced by μ(2 − ξ), which means that a data element belonging to a cluster has both a high positive degree and a low refusal degree [4]. Second, entropy information is added to the objective function to help the method decrease the neutral and refusal degrees of an element that becomes a member of a cluster. The clustering quality is improved because the entropy information is relevant [4].
A proper cluster structure covering the feature set is hard to define for many datasets. Thus, Zhou [23] presents a maximum-entropy-regularized weighted fuzzy c-means method to determine the important features and enhance the data clustering results. The optimal distribution of attribute weights is determined by defining an objective function based on attribute-weight entropy regularization. This approach makes it possible, at the same time, to minimize the dispersion within clusters and to maximize the entropy of the weights of those attributes that promote the identification of clusters. Thus, the attributes relevant to a successful clustering are identified. In addition, a kernel version of the EWFCM method (KEWFCM) is implemented to deal with data possibly containing non-spherically shaped clusters; the Gaussian kernel has been used [23].
The DOKEWFCM algorithm is intended as a hybrid algorithm that uses the potential of DOFCM [13] to detect and eliminate the outliers in a dataset, and the potential of KEWFCM [23] to extract the important features and improve the clustering process. Kernel functions make it possible to cluster data with non-spherically shaped clusters; thus, classification errors can decrease because a better separability among classes is achieved. Fig. 1 shows the procedure performed by the DOKEWFCM algorithm.
A cluster of noise observations is created together with the c clusters (total: c + 1 clusters). The final clustering is performed after the outliers have been identified by considering the data density. The neighborhood of a point, defined by a certain radius, must include a minimum number of observations. The neighborhood membership, or density factor, is defined by DOKEWFCM, and it assesses the density of an observation with respect to its neighborhood. This measure of the neighborhood membership of a point i in X is defined as:
M_i^{neighborhood} = \frac{\eta_i^{neighborhood}}{\eta_{max}}    (6)
where \eta_i^{neighborhood} represents the number of points in the neighborhood of i and \eta_{max} is the maximum number of points found in the most populated neighborhood of the dataset.
If a point q is in the neighborhood of point i, then q satisfies:

dist(i, q) \le r_{neighborhood}    (7)

where r_{neighborhood} is the radius of the neighborhood and dist(i, q) is the distance between points i and q. The neighborhood membership of each point in the dataset X is calculated using Eq. (6). The threshold value α is selected from the complete range of neighborhood membership values, depending on the density of points in the dataset. A point is considered an outlier if its neighborhood membership is less than α. Let i be a point in the dataset X; then
M_i^{neighborhood} < \alpha \;\Rightarrow\; i \text{ is an outlier}
M_i^{neighborhood} \ge \alpha \;\Rightarrow\; i \text{ is a non-outlier}    (8)
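The density test of Eqs. (6)-(8) can be sketched as follows; the function name and the brute-force pairwise distance computation are illustrative choices on our part, not the chapter's implementation.

```python
import numpy as np

def neighborhood_membership(X, r, alpha):
    """Sketch of the density-based outlier test of Eqs. (6)-(8).

    eta_i: number of points within radius r of point i (the count of points
    satisfying Eq. (7)); M_i = eta_i / eta_max (Eq. (6)); point i is flagged
    as an outlier when M_i < alpha (Eq. (8)).
    """
    # Pairwise Euclidean distances dist(i, q).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    eta = (d <= r).sum(axis=1) - 1   # exclude the point itself
    M = eta / eta.max()              # Eq. (6)
    return M, M < alpha              # Eq. (8): True -> outlier
```

An isolated point far from any populated neighborhood gets M close to 0 and is flagged, while points inside dense regions keep M close to 1.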
J = \sum_{i=1}^{c+1} \sum_{k=1}^{N} \sum_{l=1}^{M} (\mu_{ik})^m w_{il} \|\Phi(x_{kl}) - \tilde{v}_{il}\|^2 + \gamma^{-1} \sum_{i=1}^{c+1} \sum_{l=1}^{M} w_{il} \log(w_{il})    (9)
subject to 0 \le \sum_{i=1}^{c} \mu_{ik} \le 1 and \sum_{l=1}^{M} w_{il} = 1, 0 \le w_{il} \le 1, where U = [\mu_{ik}]_{c \times N} is the membership degree matrix in the original space, W = [w_{il}]_{c \times M} is the attribute weight matrix in the original space, and \tilde{V} = [\tilde{v}_{il}]_{c \times M} is the cluster center matrix in the kernel space. \Phi is the non-linear mapping from the original feature space to the kernel space. In this case, the Gaussian kernel is used: K(x_{kl}, v_{il}) = e^{-\|x_{kl} - v_{il}\|^2 / \sigma^2}.
The matrices V and W are updated according to Eqs. (10) and (11), respectively. In Eq. (10), note that i = 1, \ldots, c.
\tilde{v}_{il} = \frac{\sum_{k=1}^{N} (\mu_{ik})^m K(x_{kl}, \tilde{v}_{il}) \, x_{kl}}{\sum_{k=1}^{N} (\mu_{ik})^m K(x_{kl}, \tilde{v}_{il})}    (10)
w_{il} = \frac{\exp\left(-\gamma \sum_{k=1}^{N} (\mu_{ik})^m \|\Phi(x_{kl}) - \tilde{v}_{il}\|^2\right)}{\sum_{s=1}^{M} \exp\left(-\gamma \sum_{k=1}^{N} (\mu_{ik})^m \|\Phi(x_{ks}) - \tilde{v}_{is}\|^2\right)}    (11)

with \mu_{ik} = 0 if x_k is an outlier (cf. the membership update in Eq. (12)).
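One update step of Eqs. (10) and (11) can be sketched as below. Two assumptions are ours, not the chapter's: Eq. (10) is implicit in \tilde{v}_{il}, so the sketch performs a single fixed-point iteration using the current centers, and the kernel-space distance \|\Phi(x) - \tilde{v}\|^2 is approximated by 2(1 − K(x, v)), which holds for the Gaussian kernel when \tilde{v} is the image of a point v.

```python
import numpy as np

def kewfcm_updates(X, U, V, m=2.0, gamma=0.05, sigma=10.0):
    """One fixed-point update of the kernel-space centers (Eq. (10)) and the
    entropy-regularized attribute weights (Eq. (11)), per-attribute Gaussian
    kernel K(x, v) = exp(-(x - v)^2 / sigma^2).

    X: (N, M) data, U: (c, N) memberships, V: (c, M) current centers.
    """
    Um = U ** m                                                        # (c, N)
    K = np.exp(-((X[None, :, :] - V[:, None, :]) ** 2) / sigma ** 2)   # (c, N, M)
    # Eq. (10): kernel-weighted center update, one attribute l at a time.
    num = (Um[:, :, None] * K * X[None, :, :]).sum(axis=1)
    den = (Um[:, :, None] * K).sum(axis=1)
    V_new = num / den
    # Kernel-space squared distance approximated as 2 * (1 - K)  [assumption].
    D = (Um[:, :, None] * 2.0 * (1.0 - K)).sum(axis=1)                 # (c, M)
    # Eq. (11): entropy-regularized weights, normalized over the attributes.
    E = np.exp(-gamma * D)
    W = E / E.sum(axis=1, keepdims=True)
    return V_new, W
```

Each row of W sums to 1, matching the constraint on the attribute weights in Eq. (9).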
Various datasets from the UCI Machine Learning Repository [35] are used to evaluate the performance of the proposal: Iris, Glass, Ionosphere, Haberman and Heart. These datasets were contaminated with outliers distributed evenly among the classes. Table 1 gives an overview of the modified datasets.
To evaluate the performance of the proposed algorithm (DOKEWFCM), the KEWFCM algorithm [23] was selected for a comparative analysis. In addition, other recent algorithms with excellent results (GAKFCM [14], FC-PFS [4]) were also selected for this comparison. The values of the parameters common to these algorithms are: Itr_max = 100, ε = 10⁻⁵, m = 2. The specific parameters are:
• KEWFCM: γ = 0.05 and σ = 10.
• GAKFCM: σ = 10, crossover rate pco = 0.6 and mutation rate pmo = 0.001.
• FC-PFS: α = 0.6 (where α ∈ (0, 1] is an exponent coefficient used to control the refusal degree in picture fuzzy sets).
Each algorithm was executed ten times on each dataset. For the comparative analysis, the classification rate was used as a performance metric. The classification rate is a measure used to determine how well clustering algorithms perform on a given dataset with a known cluster structure [23]. It can be computed using Eq. (13), expressed as a percentage in this chapter:

CR = \frac{\sum_{i=1}^{c} d_i}{N}    (13)

where d_i is the number of objects correctly identified in the i-th cluster and N is the number of objects in the dataset. Table 2 shows the results of the comparison. It can be observed that the proposed algorithm obtains the best ACR for all analyzed datasets.
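Eq. (13) is straightforward to compute once the predicted cluster indices have been aligned with the reference labels; a minimal sketch (names are ours):

```python
import numpy as np

def classification_rate(true_labels, pred_labels, n_clusters):
    """Classification rate of Eq. (13): CR = (sum_i d_i) / N, where d_i is the
    number of objects correctly identified in cluster i. Cluster indices are
    assumed to be already aligned between the two labelings."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    d = [np.sum((true_labels == i) & (pred_labels == i)) for i in range(n_clusters)]
    return sum(d) / len(true_labels)
```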
Fig. 2 shows, for the Iris dataset, that the DOKEWFCM algorithm is able to identify the outliers (shown in black). The algorithm then classifies the observations after the outliers have been eliminated (Fig. 3).
Table 3 shows the attribute weights assigned by the DOKEWFCM algorithm on the Iris dataset. It is evident that attributes 3 and 4 contribute much more to the clustering than the other two attributes, since the algorithm assigns them higher weights.
[Figs. 2 and 3: pairwise scatter plots of the Iris dataset over the attributes sepal length, sepal width, petal length and petal width; Fig. 2 highlights the detected outliers (in black), and Fig. 3 shows the clustering after the outliers were eliminated.]
Statistical tests are applied to determine whether significant differences exist among the results presented in Table 2 [36–38]. The non-parametric Friedman test is first used to evaluate whether there are significant differences among the methods. If the test is positive, pairwise comparisons are performed using the non-parametric Wilcoxon test.
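This Friedman-then-Wilcoxon procedure can be reproduced with SciPy, as sketched below on illustrative classification rates; the numbers are randomly generated for the example and are not the chapter's results.

```python
import numpy as np
from scipy import stats

# Hypothetical classification rates (%) of four algorithms over ten runs.
rng = np.random.default_rng(0)
scores = rng.normal(loc=[88, 90, 91, 95], scale=1.0, size=(10, 4))

# Friedman test over the four related samples (one column per algorithm).
stat, p = stats.friedmanchisquare(*scores.T)
if p < 0.05:
    # Pairwise Wilcoxon signed-rank tests against the last algorithm.
    for j in range(3):
        w, pw = stats.wilcoxon(scores[:, j], scores[:, 3])
        print(f"alg {j + 1} vs alg 4: W={w:.1f}, p={pw:.4f}")
```

With clearly separated means, the Friedman test rejects the null hypothesis and the pairwise Wilcoxon tests then indicate which algorithm dominates.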
The results using the Iris dataset are shown below. In this case, for four experiments (k = 4) and 10 datasets (N = 10), the value of the Friedman statistic F_F = 270/0 → ∞ was obtained. With k = 4 and N = 10, F_F is distributed according to the F distribution with 4 − 1 = 3 and (4 − 1) × (10 − 1) = 27 degrees of freedom. The critical value of F(3, 27) for α = 0.05 is 2.9604, so we reject the null hypothesis (F(3, 27) < F_F), which means that the average performance of at least one algorithm is significantly different from that of the others. For the remaining datasets (Glass, Ionosphere, Haberman and Heart), the same results were obtained when applying the Friedman test.
The comparison results for the Iris dataset can be observed in Table 4 (1: GAKFCM, 2: FC-PFS, 3: KEWKFCM, 4: DOKEWFCM). The first two rows contain the values of the sums of the positive (R+) and negative (R−) ranks for each comparison established. The next two rows show the statistic T and the critical value of T for a significance level α = 0.05. The last row indicates which algorithm was the winner in each comparison. The summary in Table 5 shows the number of times each algorithm was the winner across all datasets. These results validate that the new fuzzy clustering algorithm proposed in this chapter obtains the best performance.
The classification rate (see Eq. (13)) is a measure used to determine how well clustering algorithms perform on a given dataset with a known cluster structure, but in practice the cluster structure is not known. Therefore, the Davies-Bouldin and Silhouette validity indices were also analyzed [48, 49].
Let X_T = \{X_1, \ldots, X_N\} be the dataset and let D = (D_1, \ldots, D_c) be its clustering into c clusters. Let D_j = \{X_1^j, \ldots, X_{m_j}^j\} be the j-th cluster, j = 1, \ldots, c, where m_j = |D_j|.
The Davies-Bouldin index (DB) is defined in the following way:

DB = \frac{1}{c} \sum_{i=1}^{c} \max_{j \ne i} \left( \frac{\Delta(D_i) + \Delta(D_j)}{\delta(D_i, D_j)} \right)    (14)

where \Delta(D_i) and \Delta(D_j) are the intra-cluster distances and \delta(D_i, D_j) is the inter-cluster distance.
Small values of the DB index indicate compact clusters whose centers are well separated from one another. Consequently, the number of clusters that minimizes the DB index is taken as the optimum.
The Silhouette width of the i-th vector in cluster D_j is defined in the following way:

s_i^j = \frac{b_i^j - a_i^j}{\max(a_i^j, b_i^j)}    (15)

where a_i^j is the average distance between the i-th vector in cluster D_j and the other vectors in the same cluster, and b_i^j is the minimum average distance between the i-th vector in cluster D_j and the vectors clustered in each of the other clusters [48, 49].
From Eq. (15), it follows that -1 \le s_i^j \le 1. We can now define the Silhouette of cluster D_j:
S_j = \frac{1}{m_j} \sum_{i=1}^{m_j} s_i^j    (16)
Finally, the global Silhouette index of the clustering is given by Eq. (17). Values of the Silhouette index close to 1 indicate a better clustering; therefore, the number of clusters that maximizes the S index is taken as the optimum.

S = \frac{1}{c} \sum_{j=1}^{c} S_j    (17)
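Both indices can be computed directly from Eqs. (14)-(17). In the DB sketch below, taking Δ(D_i) as the mean distance to the cluster centroid and δ(D_i, D_j) as the distance between centroids is one common interpretation, an assumption on our part; all names are illustrative.

```python
import numpy as np

def silhouette_global(X, labels):
    """Global Silhouette index, a direct sketch of Eqs. (15)-(17)."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    S = []
    for cj in clusters:
        idx = np.flatnonzero(labels == cj)
        s_vals = []
        for i in idx:
            same = idx[idx != i]
            a = d[i, same].mean() if same.size else 0.0   # a_i^j
            b = min(d[i, labels == ck].mean()             # b_i^j
                    for ck in clusters if ck != cj)
            s_vals.append((b - a) / max(a, b))            # Eq. (15)
        S.append(np.mean(s_vals))                         # Eq. (16)
    return float(np.mean(S))                              # Eq. (17)

def davies_bouldin(X, labels):
    """Davies-Bouldin index, a sketch of Eq. (14) with centroid-based scatter."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in clusters])
    scat = np.array([np.linalg.norm(X[labels == c] - cents[k], axis=1).mean()
                     for k, c in enumerate(clusters)])
    c = len(clusters)
    db = 0.0
    for i in range(c):
        db += max((scat[i] + scat[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(c) if j != i)
    return db / c
```

For two compact, well-separated clusters, the Silhouette index is close to 1 and the DB index is close to 0, in line with the optimality criteria stated above.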
[Fig. 4: Davies-Bouldin (DB) and Silhouette (S) validity indices obtained with DOKEWFCM as the number of clusters varies from 2 to 10: (a) Iris, (b) Glass, (c) Ionosphere, (d) Haberman, (e) Heart.]
Fig. 4a–e shows the values of the validity indices when the DOKEWFCM algorithm is used. The analysis was performed for the Iris, Glass, Ionosphere, Haberman and Heart datasets. The number of classes was varied from 2 to 10 in order to verify whether the best validity index was obtained for the correct number of classes. The results shown in Fig. 4 corroborate the good performance of the algorithm proposed in this chapter.
Supervisory Control and Data Acquisition (SCADA) systems are used for acquiring data in industrial processes. Based on a measure of similitude, the acquired data are grouped into classes using clustering methods, and these classes can be related to functional states. To determine the class to which an observation belongs, classical statistical classifiers compare it with the center of each class using a measure of similitude, whereas fuzzy classifiers use this comparison to determine the membership degree of the observation to each class. In general, the observation is assigned to the class for which its membership degree is highest, as shown in (18).
Figure 5 shows the condition monitoring scheme, with the possibility of online detection of new faults and automatic learning, using the proposed hybrid algorithm. The hybrid fuzzy clustering algorithm has two stages: a training stage and an online stage. In the first, the algorithm is trained using a historical dataset, and the classes that identify the functional states of the process are formed. In the online stage, the hybrid algorithm classifies every new observation obtained from the process. After a consecutive number of observations that make up a time window has been obtained, the observations not classified into the known functional states (classes) are analyzed to determine whether they constitute a new class. Whenever a new class is detected, it is characterized by the experts and added to the training database. After that, the classifier should be trained again.
Next, each stage is described in detail.
PC = \frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^2    (19)
The clustering is better the less fuzzy the partition U is, since a crisper partition reflects a lower overlapping degree among the classes. Hence, the best result is obtained by maximizing the value of PC, which is equivalent to each pattern belonging to only one group.
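Eq. (19) reduces to a few lines; a minimal sketch (the function name is ours), which also illustrates the two extremes just described:

```python
import numpy as np

def partition_coefficient(U):
    """Partition coefficient of Eq. (19): PC = (1/N) * sum_i sum_k mu_ik^2.
    PC = 1 for a hard partition; PC = 1/c when all memberships equal 1/c."""
    U = np.asarray(U)
    return float((U ** 2).sum() / U.shape[1])
```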
Then, the optimization problem is defined as:

\max \{PC\} = \frac{1}{N} \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^2

subject to:
m_{min} < m \le m_{max}
\sigma_{min} \le \sigma \le \sigma_{max}
A range of values for m and σ is defined by considering the definition above. Although 1 < m < ∞, it is widely known that, from a practical perspective, m is not
greater than 2 [4–7]. Thus, 1 < m ≤ 2 is considered in this chapter. The smoothness degree of the function is indicated by the parameter σ. If this parameter is overestimated, the function exhibits an almost linear behavior, so the projection to the high-dimensional space becomes useless for separating a non-linear data space. If, on the other hand, the value of σ is underestimated, the result will be highly sensitive to the noise present in the data. Therefore, so that both small and large values are considered during the exploration process, a large search space should be used. In this chapter, after several experiments, the interval 0.25 ≤ σ ≤ 20 was determined to be satisfactory.
In the condition monitoring field, bio-inspired algorithms have been used with excellent results [42–44] to solve optimization problems. There are several bio-inspired algorithms, in original and improved versions; some examples are the Genetic Algorithm (GA), Artificial Bee Colony (ABC), Differential Evolution (DE), and Particle Swarm Optimization (PSO), to mention only a few. In this chapter, the best values of m and σ are estimated by using the DE algorithm because of its easy implementation and good outcomes.
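A minimal DE/rand/1/bin loop for tuning (m, σ) can be sketched as follows. The parameter names follow the chapter (population size Z, crossover constant CR, scaling factor FS); the function itself and the simplified crossover scheme are illustrative assumptions, not the chapter's implementation.

```python
import numpy as np

def differential_evolution(obj, bounds, Z=10, CR=0.5, FS=0.1, max_eval=100, seed=0):
    """DE/rand/1/bin sketch that MAXIMIZES obj (e.g. the partition
    coefficient PC as a function of (m, sigma)) within box bounds."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pop = lo + rng.random((Z, len(bounds))) * (hi - lo)
    fit = np.array([obj(p) for p in pop])
    evals = Z
    while evals < max_eval:
        for i in range(Z):
            # Mutation: three distinct individuals other than i.
            a, b, c = pop[rng.choice([j for j in range(Z) if j != i], 3, replace=False)]
            mutant = a + FS * (b - c)
            # Binomial crossover between mutant and target, then clip to bounds.
            trial = np.where(rng.random(len(bounds)) < CR, mutant, pop[i])
            trial = np.clip(trial, lo, hi)
            f = obj(trial)
            evals += 1
            if f > fit[i]:  # greedy selection (maximization)
                pop[i], fit[i] = trial, f
            if evals >= max_eval:
                break
    best = fit.argmax()
    return pop[best], fit[best]
```

In practice, `obj` would run the clustering for a candidate (m, σ) and return the resulting PC; here any objective over the search space 1 < m ≤ 2, 0.25 ≤ σ ≤ 20 can be plugged in.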
To avoid an unwanted displacement of the center of each class after the training stage, produced by an unknown small fault with a high latency time, the hybrid algorithm is modified in this stage and the centers of the classes are not updated [51].
The experts select how many observations (k) form the time window and set the parameter Th. The parameter k should be selected according to the process features because it represents the number of sampling times that the experts should consider to investigate whether a fault is occurring. A group of observations is classified as noise if it does not represent at least Th percent of the k observations that form the time window; otherwise, the group is considered to probably represent a fault. Th is also determined by the experts. When an observation x_k arrives, the DOKEWFCM algorithm (Step 1: identification of the outliers) classifies it as noise or as a good sample, taking into account the results of the training. If the observation is classified as a good sample, the DOKEWFCM algorithm (Step 2: clustering process) identifies to which of the known classes C_i it belongs. A counter of noise observations (NO) is incremented whenever an observation is classified as noise; this strategy is repeated for up to k observations, until the time window is completed.
The percentage of observations classified as noise is calculated once k observations have been acquired (NOP = NO × 100/k). The existence of a new class is analyzed if NOP > Th; otherwise, the NO parameter is reset. The noise observations could then represent either a new fault class or outliers. The occurrence of a new normal operating condition is not considered here because it is assumed that the process operators are aware of such situations, so that the diagnosis system can be updated with new data and restarted. DOKEWFCM is employed to inspect the noise observations. Outliers generally form dispersed, low-density data and do not form a cluster. Conversely, once a new fault impacts the process, the observations form a high-density region that constitutes a class.
If a new class is confirmed, the experts can analyze the pattern to determine whether a single or a multiple fault is occurring. Once the pattern is identified and characterized, it is stored, if appropriate, in the historical database used in the training stage. Later on, the classifier should be trained again, and the online recognition procedure is repeated systematically.
The scheme described for the online step is a mechanism for the detection of novel
faults with automatic learning. Algorithm 2 describes this proposal.
Algorithm 2 Recognition
Input: data x_k, class centers V, r_{neighborhood}, η_{max}, α, m, σ.
Output: current state.
Select k and Th
Initialize O_counter = 0 and NO_counter = 0
for j = 1 to k do
    O_counter = O_counter + 1
    Compute η_i^{neighborhood} with Eq. (7).
    Compute M_i^{neighborhood} with Eq. (6).
    With the value of α, identify outliers with Eq. (8).
    if k ∉ C_outlier then
        Compute the distances from observation k to the class centers, ||Φ(x_kl) − ṽ_il||².
        Compute the membership degrees of observation k to the c good classes with Eq. (12).
        Determine to which class observation k belongs using (18).
    else
        Store observation k in C_noise
        NO_counter = NO_counter + 1
    end if
end for
Compute NOP = (NO_counter × 100)/k
if NOP > Th then
    Apply the DOKEWFCM algorithm to C_noise, considering only the classes C_NF and C_outlier
    if C_noise ∉ C_outlier then
        Create a new fault class C_NF
        Store it in the historical database for training.
    else
        Delete C_noise
        NO_counter = 0
        O_counter = 0
    end if
else
    Delete C_noise
    NO_counter = 0
    O_counter = 0
end if
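The time-window bookkeeping of Algorithm 2 can be sketched as follows. The callables `is_outlier` and `classify` stand in for Steps 1 and 2 of DOKEWFCM (Eqs. (6)-(8) and Eqs. (12)/(18)); both, and all names, are assumptions of this sketch, not the chapter's code.

```python
def recognition_window(observations, is_outlier, classify, k, Th):
    """Process one time window of k observations: route each observation to a
    known class or to the noise cluster, then decide whether the noise group
    is large enough (NOP > Th) to be analyzed as a candidate new fault class.
    """
    no_counter = 0
    states, noise = [], []
    for x in observations[:k]:
        if is_outlier(x):          # Step 1: density-based outlier test
            noise.append(x)
            no_counter += 1
            states.append("noise")
        else:                      # Step 2: assign to the closest known class
            states.append(classify(x))
    nop = no_counter * 100.0 / k   # percentage of the window flagged as noise
    return states, nop > Th, noise
```

For example, with six observations of which three fall outside the known classes and Th = 30, the window triggers the new-class analysis, after which DOKEWFCM would decide whether the noise group is a dense new fault class or scattered outliers.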
faults have a variety of behaviors to demonstrate the robustness and sensitivity of the
proposed approach.
In the off-line training stage, the diagnostic system was not trained to recognize faults 17, 18 and 19, with the aim of using them to test the online detection of new faults. These faults were only simulated in the online recognition stage. A sampling time of 1 s is used to simulate the 4 variables shown in Table 7. The simulations were performed using the Matlab-Simulink DABLib library. The actuator block inputs and outputs were contaminated with white noise to assess the robustness of the proposal. Such noise can be caused by the electromagnetic susceptibility of physical sensors.
A total of 80 observations were acquired from each process state. Then, 160 observations representing outliers were evenly distributed among the classes. Outliers are simulated as values outside the measurement range of the variables.
In the off-line training stage, the DOKEWFCM algorithm was applied. The parameter values used in the simulations were: number of iterations = 100, ε = 10⁻⁵, and initial values m = 2 and σ = 1.
To estimate the m and σ parameters, the DE algorithm was used due to its advantages, specifically its simple structure, higher speed and robustness [42]. The control parameters in DE are the population size Z, the crossover constant CR and the scaling factor FS. The parameter values for the DE algorithm, considering a search space 1 < m ≤ 2 and 0.25 ≤ σ ≤ 20, were: CR = 0.5, FS = 0.1, Z = 10, Eval_max = 100 and PC > 0.9999.
The DE algorithm was executed 10 times, and the arithmetic means of the parameters m, σ and the number of evaluations of the objective function (Eval_Fobj) were calculated. The behavior of the objective function (PC), presented in Fig. 7, shows how quickly the DE algorithm converges. From iteration 7 the best parameters were obtained: m = 1.0527 and σ = 15.6503.
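The DE scheme used here (DE/rand/1/bin with Z = 10, FS = 0.1, CR = 0.5) can be sketched as a minimal loop. The quadratic surrogate below is an assumption that merely stands in for the real partition-coefficient objective PC:

```python
import numpy as np

def differential_evolution(fobj, bounds, Z=10, F=0.1, CR=0.5, max_eval=100, seed=0):
    """Minimal DE/rand/1/bin with the control parameters used in the chapter:
    population size Z, scaling factor F (FS), crossover constant CR."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    pop = lo + rng.random((Z, len(bounds))) * (hi - lo)
    fit = np.array([fobj(p) for p in pop])
    evals = Z
    while evals < max_eval:
        for i in range(Z):
            # mutation: three distinct individuals different from i
            a, b, c = pop[rng.choice([j for j in range(Z) if j != i], 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # binomial crossover, forcing at least one gene from the mutant
            cross = rng.random(len(bounds)) < CR
            cross[rng.integers(len(bounds))] = True
            trial = np.where(cross, mutant, pop[i])
            f = fobj(trial); evals += 1
            if f < fit[i]:                 # greedy selection
                pop[i], fit[i] = trial, f
    best = np.argmin(fit)
    return pop[best], fit[best]

# toy surrogate for -PC with optimum near m = 1.05, sigma = 15.65 (the chapter's values);
# the real objective would evaluate the clustering quality for each (m, sigma)
surrogate = lambda p: (p[0] - 1.0527) ** 2 + ((p[1] - 15.6503) / 20) ** 2
(m, sigma), _ = differential_evolution(surrogate, [(1.0, 2.0), (0.25, 20.0)])
```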
These experiments were performed on a computer with the following characteristics: Intel Core i7-6500U 2.5–3.1 GHz, 8 GB DDR3L memory. The average execution time was approximately 3 min, equivalent to 89 evaluations of the objective function.
Table 8 shows the results obtained in the training stage. The second column shows the classification results for the operating states considered (NOC, faults F1, F7, F12 and F15). The last column reflects the variables or attributes with the greatest contribution (highest weight values) to the clustering of the analyzed classes (operating states). To obtain these attributes, a parameter called the weight threshold (Tw) must be selected based on expert criteria. If the weight of an attribute is greater than Tw, the attribute is selected. Figure 8 shows an example of attribute selection considering the faults F1, F7, F12 and F15 (Tw = 0.25).
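The weight-threshold selection just described reduces to a one-line filter; the weight values below are illustrative, not the chapter's actual values:

```python
def select_attributes(weights, tw=0.25):
    """Keep the attributes whose clustering weight exceeds the threshold Tw.

    weights : dict mapping attribute name -> weight for one class
    tw      : weight threshold selected from expert criteria
    """
    return [a for a, w in weights.items() if w > tw]

# illustrative weights for one fault class
weights_f1 = {"x1": 0.35, "x2": 0.10, "x3": 0.30, "x4": 0.25}
selected = select_attributes(weights_f1, tw=0.25)
```

With Tw = 0.25 and these weights, only x1 and x3 are selected (a weight equal to Tw does not exceed the threshold).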
[Fig. 7: behavior of the objective function (PC) over the DE iterations]
[Fig. 8: attribute weights (Weight vs. Attribute, attributes 1–4) for faults F1, F7, F12 and F15]
In this stage, Algorithm 2 was applied to perform online recognition. In a first experiment, we considered the operating states used in the training stage (NOC, faults F1, F7, F12, F15). In the second experiment, faults 17, 18 and 19 were used to test the online detection of new faults.
In order to detect a new fault early, 100 samples were evaluated, implying a time window of size k = 100, equivalent to 100 s. For the decision threshold, a value of Th = 60% was chosen to establish an adequate majority of samples classified as noise. It must be remarked that these parameters must be adjusted according to the type of process and expert opinion.
Table 9 shows a comparison between the classification results with all variables (Case 1) and with the attributes of greatest contribution (Case 2) determined in the training stage. The results show that using the variables with the greatest contribution to the clustering of the classes during the training stage yields a better classification rate (%) for the different operating states.
However, to know whether there are significant differences between Case 1 and Case 2, statistical tests must be applied, making a pairwise comparison to determine which is the best algorithm. For this, the non-parametric Wilcoxon test is applied.
Table 10 shows the results of the pairwise comparison of Case 1 and Case 2 using the Wilcoxon test. These results confirm that the best results are obtained when the variables with the greatest contribution to the clustering during the training stage are used.
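A minimal sketch of the paired Wilcoxon signed-rank statistic used for this comparison; the classification percentages below are illustrative, not the chapter's measured values:

```python
def wilcoxon_signed_rank(x, y):
    """Minimal Wilcoxon signed-rank statistic W for paired samples:
    zero differences are dropped and tied magnitudes get average ranks."""
    d = [a - b for a, b in zip(x, y) if a != b]
    ranks = _avg_ranks([abs(v) for v in d])
    w_pos = sum(r for v, r in zip(d, ranks) if v > 0)
    w_neg = sum(r for v, r in zip(d, ranks) if v < 0)
    return min(w_pos, w_neg)

def _avg_ranks(vals):
    """Ranks of vals (1-based), averaging over ties."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(vals):
        j = i
        while j + 1 < len(vals) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1           # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# classification (%) for Case 1 vs Case 2 over five operating states (illustrative)
case1 = [92.0, 90.5, 91.0, 89.0, 93.0]
case2 = [95.0, 93.5, 94.0, 92.5, 95.5]
W = wilcoxon_signed_rank(case1, case2)
```

Here every pair favors Case 2, so W = 0, the most extreme value the statistic can take.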
In the second experiment the unknown faults F17–F19 were analyzed. First, fault 17 was considered, which was identified as a new class. Once a new fault is detected, the experts should determine the features of the unusual behavior and re-train the fault diagnosis system by considering a dataset formed by the new observations together with the old dataset. Similar experiments were performed for faults 18 and 19, respectively. Table 11 shows the results obtained for the unknown faults F17–F19. The last column reflects the variables that contributed most to the identification of these faults.
False Alarm Rate (FAR) and Fault Detection Rate (FDR) are performance measures
that can be determined, according to [47], by using the following equations:
FAR = No. of samples (J > Jlim | f = 0) / total samples (f = 0)          (20)
where J is the output of the discriminative algorithm, considering the fault detection stage as a binary classification process, and Jlim is the threshold that determines whether a sample is classified as a fault or as normal operation. Figures 9 and 10 present the results obtained in the classification of faults F1, F7, F12 and F15. In both cases, the best results are obtained with the variables of greatest contribution to the clustering.
Figure 11 illustrates the FAR and FDR performance indicators for the unknown
faults.
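Under the binary-classification view of Eq. (20), FAR and FDR can be computed directly from labeled samples, as in this sketch (the label vectors are illustrative):

```python
def far_fdr(y_true, y_pred):
    """False Alarm Rate and Fault Detection Rate from binary labels
    (0 = normal operation, 1 = fault), following Eq. (20): FAR is the
    percentage of fault-free samples flagged as faulty, and FDR the
    percentage of faulty samples correctly detected."""
    normal = [p for t, p in zip(y_true, y_pred) if t == 0]
    faulty = [p for t, p in zip(y_true, y_pred) if t == 1]
    far = 100.0 * sum(normal) / len(normal)
    fdr = 100.0 * sum(faulty) / len(faulty)
    return far, fdr

# 4 normal samples with 1 false alarm, 5 faulty samples with 4 detections
far, fdr = far_fdr([0, 0, 0, 0, 1, 1, 1, 1, 1],
                   [0, 1, 0, 0, 1, 1, 1, 1, 0])
```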
[Figs. 9–11: FAR (%) and FDR (%) bar charts for faults F1, F7, F12 and F15, and for the unknown faults F17–F19]
6 Conclusions
In the present chapter a hybrid fuzzy clustering algorithm is proposed. The algorithm is applied in a condition monitoring scheme with online detection of novel faults and automatic learning. This allows the outliers to be identified before the clustering process, with the aim of minimizing classification errors. Later on, the outliers are removed and the clustering process is performed. To extract the important features
Acknowledgements The authors acknowledge the financial support provided by FAPERJ, Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro; CNPq, Conselho Nacional de Desenvolvimento Científico e Tecnológico; CAPES, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, research supporting agencies from Brazil; UERJ, Universidade do Estado do Rio de Janeiro; and CUJAE, Universidad Tecnológica de La Habana José Antonio Echeverría; as well as the help of Dr. Marcos Quiñones Grueiro (Universidad Tecnológica de La Habana José Antonio Echeverría).
References
1. Gosain, A., Dahika, S.: Performance analysis of various fuzzy clustering algorithms: a review.
In: 7th International Conference on Communication. Comput. Virtualiz. 79, 100–111 (2016)
2. Vong, C.M., Wong, P.K., Ip, W.F.: Simultaneous-fault detection based on qualitative symptom descriptions for automotive engine diagnosis. Appl. Soft Comput. 22, 238–248 (2014)
3. Jiang, X.L., Wang, Q., He, B., Chen, S.J., Li, B.L.: Robust level set image segmentation
algorithm using local correntropy-based fuzzy c-means clustering with spatial constraints.
Neurocomputing 207, 22–35 (2016)
4. Thong, P.H., Son, L.H.: Picture fuzzy clustering: a new computational intelligence method.
Soft Comput. 20, 3549–3562 (2016)
5. Kesemen, O., Tezel, O., Ozkul, E.: Fuzzy c-means clustering algorithm for directional data
( f cm4dd). Expert Syst. Appl. 58, 76–82 (2016)
6. Zhang, L., Lu, W., Liu, X., Pedrycz, W., Zhong, C.: Fuzzy c-means clustering of incomplete
data based on probabilistic information granules of missing values. Knowl. Based Syst. 99,
51–70 (2016)
7. Leski, J.M.: Fuzzy c-ordered-means clustering. Fuzzy Sets Syst. 286, 114–133 (2016)
8. Saltos, R., Weber, R.: A rough-fuzzy approach for support vector clustering. Inf. Sci. 339,
353–368 (2016)
9. Aghajari, E., Chandrashekhar, G.D.: Self-Organizing Map based Extended Fuzzy C-Means
(SEEFC) algorithm for image segmentation. Appl. Soft Comput. 54, 347–363 (2017)
10. Kaur, P., Soni, A., Gosain, A.: Robust kernelized approach to clustering by incorporating new
distance measure. Eng. Appl. Artif. Intell. 26, 833–847 (2013)
11. Askari, S., Montazerin, N., Zarandi, M.H.: Generalized possibilistic fuzzy C-Means with novel
cluster validity indices for clustering noisy data. Appl. Soft Comput. 53, 262–283 (2017)
12. Chatzis, S.P.: A fuzzy c-means-type algorithm for clustering of data with mixed numeric and
categorical attributes employing a probabilistic dissimilarity functional. Expert Syst. Appl. 38,
8684–8689 (2011)
13. Kaur, P.: A density oriented fuzzy c-means clustering algorithm for recognising original cluster
shapes from noisy data. Int. J. Innov. Comput. Appl. 3, 77–87 (2011)
14. Ding, Y., Fu, X.: Kernel-based fuzzy c-means clustering algorithm based on genetic algorithm.
Neurocomputing 188, 233–238 (2016)
15. Akbulut, Y., Sengur, A., Guo, Y., Polat, K.: KNCM: kernel neutrosophic C-Means clustering.
Appl. Soft Comput. 52, 714–724 (2017)
16. Modha, D.S., Spangler, W.S.: Feature weighting in k-means clustering. Mach. Learn. 52, 217–
237 (2003)
17. Parsons, L., Haque, E., Liu, H.: Subspace clustering for high dimensional data: a review.
SIGKDD Explor. 6, 90–105 (2004)
18. Wang, X.Z., Wang, Y.D., Wang, L.J.: Improving fuzzy c-means clustering based on feature-
weight learning. Pattern Recognit. Lett. 25, 1123–1132 (2004)
19. Borgelt, C.: Feature weighting and feature selection in fuzzy clustering. Proc. IEEE Conf.
Fuzzy Syst. 1, 838–844 (2008)
20. Deng, Z., Choi, K.S., Chung, F.L., Wang, S.: Enhanced soft subspace clustering integrating
within-cluster and between-cluster information. Pattern Recognit. 43, 767–781 (2010)
21. Ng, T.F., Pham, T.D., Jia, X.: Feature interaction in subspace clustering using the Choquet
integral. Pattern Recognit. 45, 2645–2660 (2012)
22. Tang, C.L., Wang, S.G., Xu, W.: New fuzzy c-means clustering model based on the data
weighted approach. Data Knowl. Eng. 69, 881–900 (2010)
23. Zhou, J., Chen, L., Philip Chen, C.L., Zhang, Y., Li, H.L.: Fuzzy clustering with the entropy
of attribute weights. Neurocomputing 198, 125–134 (2016)
24. Silva Filho, T.M., Pimentel, B.A., Souza, R.M., Oliveira, A.L.I.: Hybrid methods for fuzzy
clustering based on fuzzy c-means and improved particle swarm optimization. Expert Syst.
Appl. 42, 6315–6328 (2015)
25. Bernal de Lázaro, J.M., Llanes-Santiago, O., Prieto Moreno, A., Knupp, D.C., Silva-Neto, A.J.:
Enhanced dynamic approach to improve the detection of small-magnitude faults. Chemi. Eng.
Sci. 146, 166–179 (2016)
26. Roubens, M.: Pattern classification problems and fuzzy sets. Fuzzy Sets Syst. 1, 239–253
(1978)
27. Hathaway, R.J., Davenport, J.W., Bezdek, J.C.: Relational duals of the c-means clustering
algorithms. Pattern Recognit. 22, 205–212 (1989)
28. Hathaway, R.J., Bezdek, J.C.: NERF C-means: non-Euclidean relational fuzzy clustering. Pat-
tern Recognit. 27, 429–437 (1994)
29. Krishnapuram, R., Joshi A., Nasraoui, O., Yi, L.: Low-complexity fuzzy relational clustering
algorithms for web mining. IEEE Trans. Fuzzy Syst. 9, 595–607 (2001)
30. Dave, R., Sen, S.: Robust fuzzy clustering of relational data. IEEE Trans. Fuzzy Syst. 10,
713–727 (2002)
31. Krishnapuram, R., Kim, J.: A note on the Gustafson–Kessel and adaptive fuzzy clustering algorithms. IEEE Trans. Fuzzy Syst. 7, 453–461 (1999)
32. Li, C., Biswas, G., Dale, M., Dale, P.: Matryoshka: a HMM based temporal data clustering methodology for modeling system dynamics. Intell. Data Anal. 6, 281–308 (2002)
33. Kasabov, N.K., Song, Q.: DENFIS: dynamic evolving neural-fuzzy inference system and its
application for time-series prediction. IEEE Trans. Fuzzy Syst. 10, 144–154 (2002)
34. Aguilar, J., Lopez De Mantaras R.: The process of classification and learning the meaning of
linguistic descriptors of concepts. Approx. Reason. Decis. Anal. 165–175 (1982)
35. Asuncion, A., Newman, D.: UCI machine learning repository, University of California, School
of Information and Computer Science, Irvine, CA. [Online] Accessed http://archive.ics.uci.
edu/beta
36. García, S., Herrera, F.: An extension on statistical comparisons of classifiers over multiple data
sets for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)
37. García, S., Molina, D., Lozano, M., Herrera, F.: A study on the use of non-parametric tests for
analyzing the evolutionary algorithms behaviour: a case study on the cec 2005 special session
on real parameter optimization. J. Heur. 15, 617–644 (2009)
38. Luengo, J., García, S., Herrera, F.: A study on the use of statistical tests for experimentation
with neural networks: analysis of parametric test conditions and non-parametric tests. Expert
Syst. Appl. 36, 7798–7808 (2009)
39. Li, C., Zhou, J., Kou, P., Xiao, J.: A novel chaotic particle swarm optimization based fuzzy
clustering algorithm. Neurocomputing 83, 98–109 (2012)
40. Pakhira, M., Bandyopadhyay, S., Maulik, S.: Validity index for crisp and fuzzy clusters. Pattern
Recognit. 37, 487–501 (2004)
41. Wu, K., Yang, M.: A cluster validity index for fuzzy clustering. Pattern Recognit. Lett. 26, 1275–1291 (2005)
42. Camps Echevarría, L., Llanes-Santiago, O., Silva Neto, A.J.: An approach for fault diagnosis
based on bio-inspired strategies. Stud. Comput. Intell. 284, 53–63 (2010)
43. Liu, Q., Lv, W.: The study of fault diagnosis based on particle swarm optimization algorithm.
Comput. Inf. Sci. 2, 87–91 (2009)
44. Lobato, F., Steffen Jr., F., Silva Neto, A. J.: Solution of inverse radiative transfer problems in
two-layer participating media with Differential Evolution. Inverse Probl. Sci. Eng. 18, 183–195
(2009)
45. Bartys, M., Patton, R., Syfert, M., de las Heras, S., Quevedo, J.: Introduction to the DAMADICS actuator FDI benchmark study. Control Eng. Pract. 14, 577–596 (2006)
46. Kourd, Y., Lefebvre, D., Guersi, N.: FDI with neural network models of faulty behaviours and fault probability evaluation: application to DAMADICS. In: 8th IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes (SAFEPROCESS), pp. 744–749 (2012)
47. Yin, S., Ding, S.X., Haghani, A., Hao, H., Zhang, P.: A comparison study of basic data-driven
fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process.
J. Process Control 22, 1567–1581 (2012)
48. Bolshakova, N., Azuaje, F.: Cluster validation techniques for genome expression data. Signal
Process. 83, 825–833 (2003)
49. Günter, S., Bunke, H.: Validation indices for graph clustering. In: Jolion, J., Kropatsch, W., Vento, M. (eds.) Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, CUEN Ed., pp. 229–238. Italy (2001)
50. Rodríguez Ramos, A., Llanes-Santiago, O., Bernal de Lázaro, J.M., Cruz Corona, C., Silva
Neto, A.J., Verdegay Galdeano, J.L.: A novel fault diagnosis scheme applying fuzzy clustering
algorithms. Appl. Soft Comput. 58, 605–619 (2017)
51. Rodríguez Ramos, A., Silva Neto, A.J., Llanes-Santiago, O.: An approach to fault diagnosis
with online detection of novel faults using fuzzy clustering tools. Expert Syst. Appl. 113,
200–212 (2018)
Solving a Fuzzy Tourist Trip Design
Problem with Clustered Points of Interest
Keywords Tourist trip design problem · Team orienteering problem with time windows · Clustered points of interest · Fuzzy constraints · Fuzzy optimization · Greedy randomized adaptive search procedure
1 Introduction
The selection of the attractions to visit at a tourist destination is a problem that arises when tourists decide to visit a destination. Most destinations have multiple points of interest (POIs), most of them tourist attractions. POIs are the main reason why tourists visit the destination, and their decision is motivated by historical, scenic or cultural value. Typically, tourists have limited time to visit POIs at the destination and have to select which of them are most interesting. The selection takes into account their preferences, associated with the degree of satisfaction that could be perceived by visiting each POI, and the cost of the activities within the visit.
The design of tourist routes at a destination has been addressed as an optimization problem associated with route generation, known in the literature [17] as the Tourist Trip Design Problem (TTDP). The corresponding optimization problems have received increasing interest in tourism management and services, in order to be incorporated into recommenders, tourism planning tools and electronic guides. The design and development of tourist trip planning applications is an area of research in computer engineering with increasing interest. The TTDP model usually considers several basic parameters. Generally, these are the set of POIs the tourist may visit, the number of routes to be designed, taking into account the days of the tourist's stay at the destination, the travel distance or time between POIs using the available routing information, the scores of the POIs corresponding to the degree of interest, the maximum time available for sightseeing each day, and the time windows for visiting the POIs. The solution to the optimization problem must maximize the total score of the selected POIs and identify the optimal route schedules.
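The basic TTDP parameters listed above might be collected in a small data structure such as this sketch (all names and values are made up for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class POI:
    """One point of interest with the basic TTDP attributes."""
    name: str
    score: float          # degree of interest
    visit_time: int       # minutes spent at the POI
    window: tuple         # (opening, closing) time window, minutes from midnight
    category: str

@dataclass
class TTDPInstance:
    pois: list            # candidate POIs at the destination
    n_routes: int         # one route per day of the stay
    t_max: int            # maximum time available for sightseeing per day (minutes)
    travel: dict = field(default_factory=dict)  # (i, j) -> travel time in minutes

# tiny illustrative instance
inst = TTDPInstance(
    pois=[POI("museum", 8.0, 90, (600, 1080), "culture"),
          POI("beach", 6.5, 120, (540, 1200), "nature")],
    n_routes=2, t_max=600,
    travel={("museum", "beach"): 25, ("beach", "museum"): 25})
```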
The problem may be complicated and made more realistic by considering additional features and constraints. Some of them are a maximum budget, either per day or for the whole stay at the destination; specific requirements on the minimum and/or maximum number of days on which the tourist visits POIs within a certain category (restaurants, beaches, historic sites, nature facilities, etc.), or on the number of visits to POIs of a category on some days; and travel times that depend on traffic congestion, weather conditions, or the time of day when the tourist travels. Other realistic variants arise when some of the POIs have time window constraints and the time used to visit them has to be taken into account in the cost or profit of the visit [7]. In this
paper, we present the Tourist Trip Design Problem with Clustered POIs, in which we consider that the set of POIs is grouped into different categories. Categories represent different types of visiting sites (museum, amusement park, beach, restaurant, ...). The aim is to define a set of feasible routes, one for each day of the stay, that maximizes the total score. The tours must start and end at a given starting point, and the duration of each tour (computed considering travel, visit and waiting times) cannot exceed a maximum time. The problem also includes POIs which are accessible only in certain time windows. In addition, for each category, the number of visited POIs can be bounded or even fixed. For instance, for the lunch restaurant category, the number of visits in each trip must be exactly one, while other categories may have only one-sided limits.
Available information in real-world route planning problems is often imprecise, vague or uncertain. Specifically, travel times depend on the surrounding conditions: the traffic, the roads or the weather. The available information on these conditions is often sparse, imprecise and not easily accessible to tourists. Moreover, tourists usually have a high degree of flexibility, optimizing their time and setting their
2 Related Works
Most of the operational research literature dealing with TTDP modeling uses the Team Orienteering Problem (TOP) [3] or TOP models with time windows (TOPTW) [18]. The Team Orienteering Problem has been extensively studied in the literature [19]. The Team Orienteering Problem with Time Windows (TOPTW) is an extension of the TOP where nodes can be visited only within a specific time window. Typically, POIs are characterized by a time window. Several TOPTW variants are described in the literature and solved with metaheuristics, among others, iterated local search [18],
34 A. Expósito et al.
ant colony optimization [14], hybridized evolutionary local search [10], LP-granular variable neighborhood search [11], genetic algorithms [9], the artificial bee colony algorithm [4], and an iterative three-component heuristic [8].
In the literature there are some works applying a fuzzy optimization approach to the TTDP. The earliest contribution [12] considers a fuzzy routing problem for sightseeing. Recently, M. Verma and K. K. Shukla applied fuzzy optimization to the orienteering problem [23], Mendez in his Ph.D. thesis [13] used fuzzy number comparisons to deal with the VRPTW with fuzzy scores, and Brito et al. [2] applied a GRASP to solve the TOP with fuzzy scores and constraints.
The Tourist Trip Design Problem with Clustered POIs (TTDPC) addressed in this research is modelled as a multiple-route planning problem. The problem is aimed at designing a set of routes in a given tourist destination. The number of routes corresponds to the number of days of the stay at the destination. Each route visits a certain number of POIs in a limited time. Each POI has an associated score or profit, a visit time, a time window and a category to which it belongs. The objective is to maximize the sum of the scores of all the visited POIs. In the proposed fuzzy model, the POI scores, the time limit for the routes and the time windows are expressed in fuzzy terms, as fuzzy numbers and fuzzy constraints, respectively.
Table 1 describes the sets of indices, parameters, and decision variables of the problem.
The mathematical model can be written as follows:
Maximize:

    Σ_{k∈K} Σ_{i∈I} p̃i Yik                                        (1)

Subject to:

    Σ_{j∈I} X0jk = 1                  ∀k ∈ K                       (2)
    Σ_{j∈I} Xj0k = 1                  ∀k ∈ K                       (3)
    Σ_{j∈I} Xijk = Σ_{j∈I} Xjik       ∀i ∈ I⁰, ∀k ∈ K              (4)
    Σ_{j∈I⁰} Xijk ≤ Yik               ∀i ∈ I, ∀k ∈ K               (5)
    Tj ≥ Ti + vi + tij − M(1 − Σ_{k∈K} Xijk)   ∀i ∈ I⁰, ∀j ∈ I     (6)
    Ti ≥ ei                           ∀i ∈ I                       (8)
    Ti + vi ≤f li                     ∀i ∈ I                       (9)
    Σ_{k∈K} Yik ≤ 1                   ∀i ∈ I                       (10)
    Ncmin ≤ Σ_{i∈Ic} Yik ≤ Ncmax      ∀c ∈ C, ∀k ∈ K               (11)
The objective function (1) concerns the maximization of the collected scores or profits (fuzzy numbers). Constraint (2) imposes that each tour must start from the hotel, while constraint (3) combined with constraint (4) implies that each tour must end at the hotel. Constraint (4) guarantees flow balance at the POIs. Constraint (5) imposes that a POI can be visited by a tour only if it has been assigned to it. Constraint (6) guarantees tour connectivity, while constraint (7) ensures that the maximum tour duration is respected by all tours. M is a large constant used to make constraint (6) non-binding when POI j is not visited just after POI i. Constraints (8)–(9) guarantee that the time windows are respected. Each POI can be assigned to at most one tour, as stated by constraint (10). Constraint (11) guarantees that, for each cluster c, at least Ncmin and at most Ncmax POIs are visited in each tour. Finally, constraint (12) specifies the variable domains. Note that the symbol ≤f in (7) and (9) denotes that these constraints are fuzzy.
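A crisp feasibility check for a single tour, mirroring constraints (6)–(11) before the fuzzy relaxation is introduced, might look like this sketch (the toy travel times, windows and category bounds are assumptions):

```python
def route_feasible(route, visit, window, travel, t_max, cat, bounds, start="hotel"):
    """Check one tour against crisp versions of the constraints: start and end
    at the hotel, respect time windows (waiting allowed), the maximum tour
    duration, and the per-category visit bounds."""
    t, prev, counts = 0, start, {}
    for poi in route:
        t += travel[(prev, poi)]
        e, l = window[poi]
        t = max(t, e)                       # wait for the window to open
        if t + visit[poi] > l:              # crisp form of constraint (9)
            return False
        t += visit[poi]
        counts[cat[poi]] = counts.get(cat[poi], 0) + 1
        prev = poi
    t += travel[(prev, start)]              # return to the hotel
    if t > t_max:                           # crisp form of the duration limit (7)
        return False
    # per-category bounds, constraint (11)
    return all(lo <= counts.get(c, 0) <= hi for c, (lo, hi) in bounds.items())

# toy data: one restaurant visit required per tour, generous windows
travel = {("hotel", "a"): 10, ("a", "b"): 10, ("b", "hotel"): 10}
ok = route_feasible(["a", "b"], {"a": 30, "b": 45},
                    {"a": (0, 600), "b": (0, 600)}, travel, 300,
                    {"a": "museum", "b": "restaurant"},
                    {"restaurant": (1, 1), "museum": (0, 2)})
```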
Maximize z = cx
subject to Ax ≤ b + τ (1 − α) (13)
x ≥ 0, α ∈ [0, 1]
Ti + vi ≤ li + τ2 (1 − α), ∀i ∈ I (15)
where τ1 and τ2 are the tolerance level vectors, that is, the maximum violations allowed in the fulfillment of the route time limit and time window constraints, provided by the decision maker, and α ∈ [0, 1]. Applying this model, for each value of α we obtain a
new optimal solution. The end result is a range of optimal solutions varying with α, a result consistent with the fuzzy nature of the problem.
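The parametric relaxation of model (13) amounts to sweeping the constraint right-hand sides over α; a minimal sketch using the 20% tolerance applied later in the experiments:

```python
def relaxed_bound(b, tau, alpha):
    """Right-hand side of a fuzzy constraint under model (13): the bound b
    is relaxed by up to tau as the satisfaction degree alpha decreases."""
    return b + tau * (1 - alpha)

# 20% tolerance on a 600-minute route time limit, as in the experiments
t_max, tau = 600, 0.2 * 600
limits = {a: relaxed_bound(t_max, tau, a) for a in (0, 0.2, 0.4, 0.6, 0.8, 1.0)}
```

At α = 1 the constraint is crisp (600 minutes); at α = 0 the full tolerance is granted (720 minutes), and solving the model at each α yields the solution range described above.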
The next step is to deal with the fuzzy coefficients in the objective function. The fuzzy model is transformed into a simpler auxiliary model. The method proposes the use of an ordering function g that allows comparison between fuzzy numbers, which facilitates maximization of the objective function. Therefore the objective function (1) is replaced by:
    Σ_{k∈K} Σ_{i∈I} g(p̃i) Yik                                     (16)
More specifically, in this paper we use triangular fuzzy numbers to represent the fuzzy scores, and the third index of Yager for comparative purposes. The following objective function is obtained:
    Σ_{k∈K} Σ_{i∈I} (pi1 + 2pi2 + pi3) Yik                         (17)
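A sketch of the comparison via Yager's third index for triangular fuzzy scores (the two scores below are illustrative):

```python
def yager_third_index(tfn):
    """Yager's third ranking index for a triangular fuzzy number (a, b, c):
    the centroid-like value (a + 2b + c) / 4. In (17) the constant 1/4 is
    dropped, since it does not change the ranking."""
    a, b, c = tfn
    return (a + 2 * b + c) / 4

# two fuzzy POI scores as (low, mode, high)
p1, p2 = (6, 8, 9), (5, 9, 10)
better = p2 if yager_third_index(p2) > yager_third_index(p1) else p1
```

Here p2 ranks higher (index 8.25 vs. 7.75), even though its support is wider, because its mode dominates.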
5 GRASP Solutions
The candidate list of POIs to be inserted in the solution is constructed by the standard GRASP using a greedy function f. The RCL is built by selecting the sizeRCL feasible insertion triplets (i, j, k) with the best values of the greedy function f. This greedy function represents the incremental increase in the cost function due to the incorporation of an element into the partial solution. The evaluation function used in this paper locates the best position in which to insert a candidate across all routes, minimizing the travel time of the insertion. Through this greedy function the candidate list is formed from the best elements, in this case those whose incorporation into the current partial solution results in the smallest incremental time. The list of candidates is sorted in descending order of score or ascending order of travel time, so that the candidates with the highest score or lowest travel time are placed at the top of the candidate list. When a candidate is randomly selected, it is incorporated into the partial solution, excluded from the candidate list, and the incremental costs are re-evaluated. The construction phase ends with a feasible current solution. Subsequently, a local search phase is applied with the aim of improving the solution.
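The construction phase can be sketched as follows; feasibility checks and the incremental-time evaluation are abstracted away, and the candidate names and scores are made up:

```python
import random

def grasp_construct(candidates, score, size_rcl=3, rng=random.Random(42)):
    """GRASP construction phase sketch: repeatedly build an RCL with the
    size_rcl best candidates by score and pick one of them at random;
    in the full algorithm, infeasible insertions would be discarded and
    incremental costs re-evaluated after each insertion."""
    remaining, solution = list(candidates), []
    while remaining:
        remaining.sort(key=score, reverse=True)   # order by score (descending)
        rcl = remaining[:size_rcl]                # restricted candidate list
        chosen = rng.choice(rcl)                  # randomized greedy choice
        solution.append(chosen)
        remaining.remove(chosen)
    return solution

scores = {"a": 9, "b": 7, "c": 5, "d": 3}
tour = grasp_construct(scores, scores.get, size_rcl=3)
```

With size_rcl = 1 this degenerates to a pure greedy construction; larger RCLs trade greediness for diversification across the 1000 runs used in the experiments.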
Usually a local search algorithm works iteratively, replacing the current solution with a better solution found in its neighborhood. The procedure ends when no better solution is found in the neighborhood. Figure 3 shows a basic local search algorithm. Our local search uses exchange moves between locations of different routes in order to reduce the route times. This neighborhood search uses a best-improving strategy: all neighbors are explored and the current solution is replaced by the best neighbor. If the first steps of the local search are able to reduce the route travel time, the local search then tries to insert new locations into the solution in order to maximize the total score.
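The cross-route exchange with a best-improving strategy can be sketched as follows; the quadratic toy route-time function is an assumption, chosen only so that a swap actually changes (and here improves) the total:

```python
def best_improvement_swap(routes, route_time):
    """One pass of the best-improving exchange between locations of two
    different routes: try every cross-route swap and apply the single one
    that most reduces the total route time."""
    total = lambda rs: sum(route_time(r) for r in rs)
    best, best_cost = None, total(routes)
    for i in range(len(routes)):
        for j in range(i + 1, len(routes)):
            for a in range(len(routes[i])):
                for b in range(len(routes[j])):
                    cand = [list(r) for r in routes]
                    cand[i][a], cand[j][b] = cand[j][b], cand[i][a]
                    c = total(cand)
                    if c < best_cost:
                        best, best_cost = cand, c
    return best if best is not None else routes

# toy per-location costs; squaring the route sum penalizes unbalanced routes
cost = {"a": 5, "b": 1, "c": 2, "d": 9}
rt = lambda r: sum(cost[x] for x in r) ** 2
improved = best_improvement_swap([["a", "d"], ["b", "c"]], rt)
```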
In general terms, the construction phase and the local search phase try to maximize the total score of the solution. This two-phase process is iterated until the imposed termination criterion is reached.
This section describes the computational experiments that were carried out in our study and the corresponding results. The aim of the experiments is to evaluate the accuracy of the proposed approach and its behavior when used to solve the TTDPC with fuzzy coefficients and constraints.
Thirty instances were used in the experiments for comparative purposes. The set of instances includes data from 30 real POIs related to tourist attractions on the island of Tenerife, Spain. Travel times are computed on a real road network. The
data provide the positions of a set of 30 locations with given scores which can be visited on a specific day. The maximum number of routes of the solution is also included. The number of clusters is 4, which is kept fixed for all the instances. For each POI, the visiting time and the opening time windows are taken from real data and are fixed for all the instances. The maximum number of routes varies from 1 to 5 (K ∈ {1, 2, 3, 4, 5}) according to the specific instance. The maximum time per route is 5, 7.5 or 10 h (Tmax ∈ {300, 450, 600}) according to the specific instance. For each combination of K and Tmax we generate two instances: one, named flexible, in which the minimum/maximum number of POIs to be selected for each category is not strictly binding, and one, named tight, in which the values of Nmin and Nmax are tighter and, in at least one case, Nmin = Nmax, for a total of 30 small instances. For more details concerning the instances, see Table 2.
The tolerance level applied to the maximum time constraint is 20% of the maximum time, and 20% of the latest time of the time windows for the time window constraints. The values of α are 0, 0.2, 0.4, 0.6, 0.8, and 1.0. Regarding GRASP parameters, several RCL sizes are used: 3, 4, 5 and 6. The experimentation is divided according to how the candidate list is sorted in the GRASP procedure, by time or by score. The results presented in Tables 3 and 4 correspond to the best solutions over the RCL sizes when ordering the candidate list by time, while the results presented in Tables 5 and 6 correspond to the best solutions when ordering the candidate list by score. These tables have the following structure. The first column includes the name of the instance. The second column shows, for each instance, the best score, the average score, and the average execution time in microseconds. Finally, the following columns show the values of the second
column for each α value. The GRASP procedure was run 1000 times for each of the instances and parameter settings used in the experimentation. One thousand executions of the GRASP for each parameter combination are carried out in less than one second. All computations were carried out on an Intel Dual Core with a 2.5 GHz processor and 4 GB of RAM.
As we can see, different solutions are obtained by varying α, and an increase in tolerance levels allows better solutions to be found. Both results are consistent with the proposed fuzzy approach. As one would expect, a differentiation between the results of the flexible and tight instances is observed in the computational results. Specifically, and as shown in Table 1, the flexible instances have a higher best-solution score in all cases with respect to the tight instances.
Following the goal of maximizing the total score of the solution, the results shown in Tables 3, 4, 5 and 6 reveal that ordering the candidate list in GRASP by score is more effective than ordering it by time. The difference in score between solutions according to the ordering used can be appreciated more clearly in Figs. 2 and 3. In these figures, the best average scores for all α values are compared, taking into account the two ordering types of the candidate list mentioned above.
7 Conclusion
In this study, we present a Soft Computing approach applied to the Fuzzy TTDPC, specifically with fuzzy scores, fuzzy time constraints and fuzzy time window constraints. In order to solve the problem and obtain high-quality solutions in reasonable time, the GRASP metaheuristic has been used. The computational experiments confirm that the proposed approach is feasible for solving this model. The application of this methodology generates a set of different solutions consistent with its fuzzy nature.
Future work will extend the experimentation to other instances with a greater number of POIs and clusters. We would also like to evaluate the behavior and efficiency of other metaheuristics. The multiobjective problem will be one of the first lines of research to be studied. This multiobjective version will consider both the score obtained at the locations and the route time in the objective function.
Acknowledgements This work has been partially funded by the Spanish Ministry of Economy
and Competitiveness with FEDER funds (TIN2015-70226-R) and supported by Fundación Cajaca-
narias research funds (project 2016TUR19) and the iMODA Network of the AUIP. The contribution of Airam Expósito-Márquez is supported by the Agencia Canaria de Investigación, Innovación y
Sociedad de la Información de la Consejería de Economía, Industria, Comercio y Conocimiento
and by the Fondo Social Europeo (FSE).
Solving a Fuzzy Tourist Trip Design Problem with Clustered Points of Interest 47
References
1. Bellman, R., Zadeh, L.: Decision making in a fuzzy environment. Manag. Sci. 17(4), 141–164
(1970)
2. Brito, J., Expósito, A., Moreno, J.A.: Solving the team orienteering problem with fuzzy scores
and constraints. In: 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp.
1614–1620. IEEE (2016)
3. Chao, I.M., Golden, B.L., Wasil, E.A.: The team orienteering problem. European J. Oper. Res.
88(3), 464–474 (1996)
4. Cura, T.: An artificial bee colony algorithm approach for the team orienteering problem with
time windows. Comput. Ind. Eng. 74, 270–290 (2014)
5. Delgado, M., Verdegay, J., Vila, M.: A general model for fuzzy linear programming. Fuzzy
Sets Syst. 29, 21–29 (1989)
6. Feo, T.A., Resende, M.G.C.: Greedy randomized adaptive search procedures. J. Glob. Optim.
6, 109–133 (1995)
7. Gavalas, D., Konstantopoulos, C., Mastakas, K., Pantziou, G., Tasoulas, Y.: Cluster-based
heuristics for the team orienteering problem with time windows. In: International Symposium
on Experimental Algorithms, pp. 390–401. Springer (2013)
8. Hu, Q., Lim, A.: An iterative three-component heuristic for the team orienteering problem with
time windows. European J. Oper. Res. 232(2), 276–286 (2014)
9. Karbowska-Chilinska, J., Zabielski, P.: Genetic algorithm solving the orienteering problem
with time windows. In: Advances in Systems Science, pp. 609–619. Springer (2014)
10. Labadie, N., Melechovský, J., Wolfler Calvo, R.: Hybridized evolutionary local search algo-
rithm for the team orienteering problem with time windows. J. Heur. 17(6), 729–753 (2011)
11. Labadie, N., Mansini, R., Melechovsk, J., Calvo, R.W.: The team orienteering problem with
time windows: An lp-based granular variable neighborhood search. European J. Oper. Res.
220(1), 15–27 (2012)
12. Matsuda, Y., Nakamura, M., Kang, D., Miyagi, H.: A fuzzy optimal routing problem for
sightseeing. IEEJ Trans. Electron. Inf. Syst. 125, 1350–1357 (2005)
13. Mendez, C.E.C.: Team Orienteering Problem with Time Windows and Fuzzy Scores. Ph.D.
thesis, National Taiwan University of Science and Technology (2016)
14. Montemanni, R., Gambardella, L.: An ant colony system for team orienteering problems with
time windows. Found. Comput. Decis. Sci. 34(4), 287–306 (2009)
15. Resende, M.G., Ribeiro, C.C.: Greedy randomized adaptive search procedures: advances,
hybridizations, and applications. In: Gendreau, M., Potvin, J.Y. (eds.) Handbook of Meta-
heuristics, International Series in Operations Research and Management Science, vol. 146, pp.
283–319. Springer, US (2010)
16. Souffriau, W., Vansteenwegen, P., Berghe, G.V., Oudheusden, D.: A greedy randomised adap-
tive search procedure for the team orienteering problem. In: Proceedings of the EU/MEeting (2008)
17. Vansteenwegen, P., Oudheusden, D.V.: The mobile tourist guide: an OR opportunity. OR Insight
20(3), 21–27 (2007)
18. Vansteenwegen, P., Souffriau, W., Berghe, G.V., Oudheusden, D.V.: Iterated local search for
the team orienteering problem with time windows. Comput. Oper. Res. 36(12), 3281–3290
(2009)
19. Vansteenwegen, P., Souffriau, W., Berghe, G.V., Oudheusden, D.V.: The city trip planner: an
expert system for tourists. Expert Syst. Appl. 38(6), 6540–6546 (2011)
20. Verdegay, J.L.: Fuzzy mathematical programming. In: Fuzzy Information and Decision Processes. North-Holland (1982)
21. Verdegay, J.L.: Fuzzy optimization: models, methods and perspectives. In: Proceedings of the 6th IFSA World Congress (IFSA-95), pp. 39–71 (1995)
22. Verdegay, J.L., Yager, R.R., Bonissone, P.P.: On heuristics as a fundamental constituent of soft
computing. Fuzzy Sets Syst. 159, 846–855 (2008)
23. Verma, M., Shukla, K.K.: Application of fuzzy optimization to the orienteering problem. Adv.
Fuzzy Syst. 2015, 8 (2015)
Characterization of the Optimal Bucket
Order Problem Instances and
Algorithms by Using Fuzzy Logic
1 Introduction
that the item i precedes the item j in the set of preferences to be aggregated. The objective of OBOP is to find an ordered partition of the set of items [24] (called bucket order) that minimizes the L1 distance with respect to the precedence matrix M.
Several algorithms have been proposed for OBOP, e.g. the Bucket Pivot Algorithm (BPA) [18, 24], the SortCC algorithm [21] and LIA^G_MP2 [6]. There is no unique winner algorithm for all OBOP instances; for example, LIA^G_MP2 outperforms BPA in general, but not always [6]. In addition, the influence of the characteristics of OBOP instances on the performance of OBOP algorithms has not been studied properly.
This paper focuses on analyzing the performance of several OBOP algorithms with respect to the instance characteristics. Our aim is to derive interesting knowledge that serves to characterize and predict the performance of OBOP algorithms, not only to help select the best algorithm for each instance but also to devise new future versions of them. To do so, fuzzy logic [25] is used to characterize each instance in terms of fuzzy labels, and then the relations among these fuzzy labels are studied. The idea is to exploit the interpretability and flexibility of fuzzy logic as a valuable tool to analyze the comparative results of several OBOP algorithms. This is in line with the call for using fuzzy concepts as “a methodological basis in many application domains” [19].
The main contributions of this work are:
• Several fuzzy measures are proposed to characterize OBOP instances.
• Several fuzzy measures are proposed to characterize the performance of OBOP algorithms.
• We obtain interesting knowledge that serves to characterize OBOP instances and to predict the performance of OBOP algorithms.
The paper is organized as follows: Section 2 introduces several concepts related
to OBOP. Section 3 presents the fuzzy methodology used to characterize OBOP
instances and algorithms. Section 4 explains the knowledge discovered through the experiments conducted by using several data-mining techniques. Finally, Sect. 5 presents the conclusions.
2 Technical Background
A simple example of ranking may be 1|2, 3|4 which represents the preferences about
the items 1, 2, 3 and 4, meaning that the item 1 is the preferred one, followed by the
items 2 and 3 which are tied, and finally the item 4. This implies that in this ranking,
item 1 precedes 2, 3 and 4 (denoted 1 ≺ 2, 1 ≺ 3, 1 ≺ 4), and items 2 and 3 precede
item 4 (denoted 2 ≺ 4, 3 ≺ 4). Note that in this case, there is no precedence relation
between items 2 and 3, i.e. they are tied.
When we have to deal with different rankings (i.e. opinions about the order of several items), the problem of aggregating all of them into a consensus ranking arises. The rank aggregation problem is a family of problems that try to obtain a ranking which represents a consensus over a set of input rankings. There are many types of rank aggregation problems depending on the characteristics of the rankings to be aggregated, the expected resulting ranking, the conceptual meaning of the precedences, and the measure used to indicate whether a ranking is better or worse than another [1, 5, 7].
The simplest problem is when the rankings to be aggregated are complete rankings
without ties (i.e. the Kemeny problem) [7, 20]. There are several variations of this
case, for example, by allowing partial or incomplete rankings [6, 13] as inputs. The
Optimal Bucket Order Problem (OBOP) is a singular rank aggregation problem that
receives as input a precedence matrix that describes the precedences in a set of
rankings and produces as output a complete ranking (possibly with ties) [16, 18].
For example, suppose that the rankings to be aggregated are:
• 1|2|3|4
• 2|1|3|4
• 1|2|4|3
• 2|1|4|3
For these four rankings, the precedence matrix M is shown in Table 1.
In Table 1, the cell M(1, 2) is equal to 0.5 because the item 1 precedes the item 2 in 2 out of 4 cases. On the other hand, M(1, 4) = 1 because the item 1 precedes the item 4 in all the cases, while M(4, 1) = 0 because the item 4 never precedes the item 1. It should be noted that M(i, i) = 0.5 for i = 1..N (main diagonal), and M(i, j) + M(j, i) = 1 for i ≠ j, i, j = 1..N. The objective of OBOP is to find a ranking whose matrix representation R minimizes the distance with respect to the input precedence matrix M, i.e. minimizes D(R, M) = Σ_{i,j} |R(i, j) − M(i, j)|. For example, the matrix representation R of the ranking 1|2|3|4 is presented in Table 2.
Table 1 The precedence matrix M for the set of rankings { 1|2|3|4, 2|1|3|4, 1|2|4|3, 2|1|4|3 }
M 1 2 3 4
1 0.5 0.5 1 1
2 0.5 0.5 1 1
3 0 0 0.5 0.5
4 0 0 0.5 0.5
The distance D(R, M) between the matrix M in Table 1 and the matrix R in
Table 2 is 2, derived from the difference in the bolded cells in Table 2 with respect to
the corresponding ones in Table 1. In this case, the best complete rankings (without
ties) that may be returned as solution are any of the four mentioned input rankings.
If ties are allowed in the output (as is possible in OBOP) the situation changes. It
is clear that 1 and 2 must be placed before 3 and 4, but there is not a clear precedence
relation between 1 and 2, and between 3 and 4. This suggests that the expected result
should be a bucket with the items 1 and 2, placed before another bucket containing
the items 3 and 4. The ranking that meets this requirement is 1, 2|3, 4, that is, a
solution with two buckets. Indeed, the matrix representation R1 of this ranking is the
matrix in Table 1, i.e. R1 =M. Consequently, this ranking is the optimal solution with
distance D(R1 , M) = 0. This example illustrates the advantage of allowing ties in
the output, as occurs in OBOP.
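The construction of M and the distance D(R, M) can be sketched as follows; this is a minimal illustration (not code from the paper) with 0-indexed items and each ranking given as a list of buckets:

```python
def precedence_matrix(rankings, n):
    """M[i][j] = fraction of rankings in which item i precedes item j.
    Items in the same bucket are tied and contribute 0.5 each way."""
    M = [[0.0] * n for _ in range(n)]
    for ranking in rankings:
        pos = {item: b for b, bucket in enumerate(ranking) for item in bucket}
        for i in range(n):
            for j in range(n):
                if pos[i] < pos[j]:
                    M[i][j] += 1.0
                elif pos[i] == pos[j]:
                    M[i][j] += 0.5
    return [[v / len(rankings) for v in row] for row in M]

def distance(R, M):
    """L1 distance D(R, M) = sum over i, j of |R(i,j) - M(i,j)|."""
    n = len(M)
    return sum(abs(R[i][j] - M[i][j]) for i in range(n) for j in range(n))
```

On the four rankings of the example, the matrix representation of 1, 2 | 3, 4 equals M, so its distance is 0, while the ranking 1|2|3|4 is at distance 2, as stated above.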
More formally, suppose a set of items [[N ]] = {1, . . . , N } to be ranked. A bucket
order is a total or linear order with ties [15, 18], i.e. a partial order [24]. This implies
that each item belongs to a bucket. In Fagin [15] a bucket order is defined as a
transitive binary relation between the buckets, i.e. B1 ≺ B2 ≺ · · · ≺ Bk . In general,
given two items u ∈ Bi , v ∈ B j , if i < j then u precedes v. All the items that belong
to the same bucket are considered tied.
OBOP is NP-hard [18]. Observe that, given N items, there are N! rankings which order all the items without ties, i.e. the permutations of the N items. If ties are allowed, the number of possible rankings is much larger [6, 8].
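Concretely, the number of complete rankings with ties of N items is the ordered Bell (Fubini) number [8], which can be computed with a simple recurrence over the size of the first bucket; a quick sketch:

```python
from math import comb

def rankings_with_ties(n):
    """Ordered Bell (Fubini) number: complete rankings of n items when ties
    are allowed, via a(m) = sum over k = 1..m of C(m, k) * a(m - k)."""
    a = [1] * (n + 1)  # a[0] = 1 (empty ranking)
    for m in range(1, n + 1):
        a[m] = sum(comb(m, k) * a[m - k] for k in range(1, m + 1))
    return a[n]
```

For N = 4 there are 4! = 24 permutations but 75 rankings with ties, and the gap widens rapidly as N grows.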
The most popular algorithm for solving the OBOP is the Bucket Pivot Algorithm (BPA) [18, 24]. However, BPA suffers from some drawbacks because of the random selection of the pivot used to decide the positions of the other elements. Kenkre [21] proposed to face this problem by first constructing the buckets (clustering step) and then ordering them, resulting in the SortCC algorithm. Recently [6], a new version of BPA called LIA^G_MP2 has been presented (it will be called simply LIA in the rest of the paper). LIA is based on a heuristic selection of the pivot and the inclusion of several elements as pivots. These algorithms may also be used to produce initial solutions for metaheuristic-based approaches to OBOP [4].
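The pivot-based recursion underlying BPA can be sketched as follows. This is a simplified illustration rather than the published algorithm: the pivot is taken deterministically as the first item (BPA chooses it at random), and the tie threshold β is an assumed parameter:

```python
def bucket_pivot(items, M, beta=0.25):
    """Recursively split items around a pivot: items that clearly precede the
    pivot go left, items it clearly precedes go right, the rest share its bucket."""
    if not items:
        return []
    pivot = items[0]  # simplified: BPA picks the pivot at random
    left, bucket, right = [], [pivot], []
    for i in items[1:]:
        if M[i][pivot] > 0.5 + beta:
            left.append(i)    # i precedes the pivot
        elif M[i][pivot] < 0.5 - beta:
            right.append(i)   # the pivot precedes i
        else:
            bucket.append(i)  # no clear precedence: tie with the pivot
    return bucket_pivot(left, M, beta) + [sorted(bucket)] + bucket_pivot(right, M, beta)
```

On the precedence matrix of the example above (items 0-indexed), this sketch already returns the optimal bucket order 1, 2 | 3, 4.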
Based on the results presented in [6], LIA outperforms BPA in general, but not in all instances. Although some recommendations about the best algorithm for each OBOP instance were included in [6], the influence of the characteristics of the instances on the performance of the algorithms was not studied.
3 Methodology
The experimental results presented in [6] are taken as the starting point for the rest of the paper.
3.1 Instances
(Figure omitted: per-instance values of Near_T, Near_I and Near_P, on a 0–1 scale.)
(Figure omitted: per-instance values of U, Near_P, Near_T and Near_I, on a 0–1 scale.)
The same can be done with the utopian and anti-utopian values, as presented in Fig. 3. However, it is hard to compare characteristics that vary on different scales. For example, if all the previous characteristics were included in the same figure, the values of the series in Fig. 2 would not be observable because they are much smaller than the corresponding values in the series of Fig. 3.
In order to ease the comprehension and comparison of these characteristics, we define fuzzy labels which correspond to adjectives related to the values of these characteristics. Specifically, we associate with each of the previous characteristics a fuzzy label that means “the value is great”. The goal is to use “meaningful linguistic labels”, as suggested in [19]. Based on the numerical data, fuzzy labels of type “Great Value” (GV) are created by using the same fuzzification function (Eq. 1) for all the characteristics: the maximum value gets 1, the minimum value gets 0, and the others get the proportion with respect to the minimum and maximum.
(Figure omitted: utopian value u_v and anti-utopian value a_v per instance.)
GV(x) = (x − Minimum) / (Maximum − Minimum)    (1)
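As a quick sketch, the GV fuzzification of Eq. 1 is a plain min–max normalization over the values of one characteristic across the instances (it assumes the maximum and minimum differ; a constant characteristic would need a separate convention):

```python
def GV(values):
    """Min-max fuzzification (Eq. 1): the minimum maps to 0, the maximum
    to 1, and the rest to their proportion between the two extremes."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```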
(Figure omitted: GV values of N, u_v, a_v, U, Near_P, Near_T, Near_I and P per instance, on a 0–1 scale.)
In the smallest instances GV(P), GV(u_v) and GV(a_v) are smaller, while the other characteristics vary a lot. It is also worth noting that as N grows the values of P, u_v and a_v also grow similarly, while GV(U), GV(Near_P), GV(Near_T) and GV(Near_I) tend to cluster around medium values.
Other comparisons based on the data-mining techniques enabled by the fuzzification will be presented in Sect. 4.1.
3.2 Algorithms
Table 4 (continued)
ID Borda BPA LIA CC10 CC25
15–17 3246 3192.33 2544 3282.13 3311.73
15–40 3756 3534.9 2730 3715.97 3775.53
15–23 3973 3988.27 3156 3967.37 4010.57
15–32 4320.5 4694.6 3559.93 4337.07 4315.77
15–14 4887.5 5219.37 3822.27 4973.57 4903.13
15–01 8488.5 11890.47 8787.03 7825.23 7705.23
11–01 7489 6397.59 6058.5 7492.99 6669.47
15–02 7489 6511.65 6058.46 7492.96 6696.22
11–02 14845 13909.29 12545 14848.87 14396.77
15–04 14845 14209.03 12545 14848.8 14535.67
15–03 16219 13696.37 12233.87 15469.87 15364.07
Table 5 General description of the performance of the algorithms in the 50 instances used in [6]
Borda BPA LIA CC10 CC25
Min 2.67 3.39 3 2.67 3.44
Median 1020 1032.95 816.85 1011.47 1014.38
Average 2473.67 2433.12 2013.58 2438.76 2384.15
Max 16219 14209.03 12545 15469.87 15364.07
StdDev 3853.73 3678.34 3197.64 3784.17 3681.41
Observe that the inner complexity of each problem makes it harder to see the differences among the performance of the algorithms. Indeed, as shown in Fig. 3, the minimum (u_v is a lower bound) and the maximum value (a_v is an upper bound) for each instance increase as N grows in these 50 instances. Because of this increased complexity, the results seem very similar whether we compute the average performance (minimum distance, see Table 5) or plot the results of Table 4 (see Fig. 6, where the x-axis corresponds to the values of N in ascending order).
As explained in the previous section, in order to ease the analysis we may define fuzzy labels. For example, it is possible to define a label to characterize the fulfillment of the adjective “the algorithm X performs well”, which may be applied to the previous algorithms. For each value corresponding to the performance of each algorithm, a “Good Performance” (GP) label is defined by taking into account that “good performance” corresponds to small values in OBOP, because it is a minimization problem.
As the true maximum and minimum values are unknown for the instances, the utopian value (u_v) and the anti-utopian value (a_v) are used as extreme (super-optimal) values. Then, the values are fuzzified as shown in Eq. 2.
(Fig. 6 omitted: results of Table 4 for Borda, BPA, LIA, CC10 and CC25, with the x-axis in ascending order of N.)
GP(x) = (a_v − x) / (a_v − u_v)    (2)
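A sketch of the GP fuzzification of Eq. 2, which inverts the scale so that distances near the utopian value u_v map to 1 and distances near the anti-utopian value a_v map to 0:

```python
def GP(x, u_v, a_v):
    """'Good Performance' label (Eq. 2): smaller distances x (OBOP is a
    minimization problem) yield membership values closer to 1."""
    return (a_v - x) / (a_v - u_v)
```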
Just by using the GP labels of Eq. 2, the fuzzy version of Table 5 becomes more meaningful (see Table 6), clarifying the overall advantage of LIA over the other algorithms (see also Fig. 7). It may be observed that LIA is the only algorithm whose median and average are greater than 0.9. It also has the minimum standard deviation, i.e. it is the most stable algorithm.
The stability of LIA can also be observed if we plot the GP labels (see Fig. 8). Also, when the performance of the algorithms is compared in terms of the GP labels (Fig. 8), the superior performance of LIA with respect to the other algorithms is more noticeable.
It is worth noting that the fuzzification method used to characterize the performance of each algorithm does not depend on the set of algorithms considered, so any future result of OBOP algorithms on these problems may be analyzed within the same framework used here.
The previous fuzzification allows us to obtain the values of the fuzzy adjectives (labels) GV(N), GV(U), GV(Near_P), GV(Near_T), GV(Near_I), GV(u_v), GV(a_v), GP(BPA), GP(Borda), GP(LIA), GP(CC10) and GP(CC25) for each OBOP instance.
60 J. A. Aledo et al.
(Fig. 7 omitted: Min, Median, Average, Max and StdDev of the GP labels, on a 0–1 scale.)
(Fig. 8 omitted: GP(Borda), GP(BPA), GP(LIA), GP(CC10) and GP(CC25) per instance, on a 0.5–1 scale.)
In Sect. 4 these fuzzy labels will be used to obtain general knowledge
describing the problems, the performance of the algorithms and the relation among
them.
This section describes several patterns regarding the instances, the performance of the algorithms and the relation among them. In order to study these three dimensions (instance characteristics, algorithm performance, instance–algorithm relations), we analyze the database composed of 13 columns (the values of the previous 8 GV labels and 5 GP labels) and 50 rows (the instances) in order to obtain statistical measures (correlations), clusters (by using Fuzzy C-Means [9]), and fuzzy predicates (by using FuzzyPred [11]).
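Fuzzy C-Means [9], used below to group the instances, can be sketched in a few lines. This is a generic NumPy implementation of the standard alternating updates, not the paper's exact setup; the fuzzifier m, the cluster count c and the data are illustrative:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means: alternate between updating the cluster centers
    and the fuzzy membership matrix U until U stabilizes."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)        # memberships of each point sum to 1
    p = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]       # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)                # avoid division by zero
        U_new = (d ** -p) / (d ** -p).sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U
```

Each row of U holds the membership degrees of one instance in the c clusters (rows sum to 1); the centers play the role of the cluster profiles plotted in the figures of this section.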
4.1 Instances
By using Fuzzy C-Means [9], the 50 instances may be grouped (in terms of the
instance characteristics) into the following 5 clusters. Figure 9 shows the centers of
each cluster. They are called P clusters because they are obtained only by taking into
account the similarity in terms of the problem characteristics.
• Cluster P0 (10 instances): 14–01, 15–16, 15–34, 15–41, 15–48, 15–50, 15–57,
15–65, 15–73, 15–77
• Cluster P1 (14 instances): 15–07, 15–09, 15–12, 15–14, 15–17, 15–18, 15–20,
15–23, 15–25, 15–27, 15–29, 15–32, 15–40, 15–42
• Cluster P2 (8 instances): 06–03, 06–04, 06–11, 06–12, 06–18, 06–28, 06–46, 06–
48
• Cluster P3 (6 instances): 11–01, 11–02, 15–01, 15–02, 15–03, 15–04
• Cluster P4 (12 instances): 15–19, 15–24, 15–30, 15–44, 15–46, 15–54, 15–55,
15–59, 15–66, 15–67, 15–69, 15–74
In Fig. 9 it can be observed that cluster P2 includes the instances with the smallest values of N, u_v, a_v, Near_T, Near_I and P, and with the maximum values of U and Near_P (Fig. 9 presents the complement of U and Near_P to ease the visualization). This implies that cluster P2 groups the easiest instances (smallest, with clear precedences). Cluster P3 is the opposite one, with the biggest instances and low utopicity. Cluster P1 has intermediate values in terms of the number of items (N). Finally, clusters P0 and P4 are similar in terms of N (fairly small). However, P0 includes the instances with the least utopicity U and the biggest values of Near_I, while P4 has the greatest values of utopicity and Near_P (the second in these aspects, only dominated by P2).
Another way to observe the relations among the characteristics of the instances is by using the Pearson correlation coefficients between N and the other aspects (see Table 7). It is interesting to note that the correlation between N and the other labels is only significant with respect to u_v, a_v and P (also observable in Fig. 5). It is also
(Fig. 9 omitted: centers of the clusters P0–P4 obtained by Fuzzy C-Means, on a 0–1 scale.)
Table 7 Pearson correlations between the GV labels used to characterize the instances
N u_v a_v U Near_P Near_T Near_I P
N 1 0.94 0.96 −0.4 −0.38 0.39 0.27 0.95
u_v 0.94 1 0.97 −0.34 −0.34 0.4 0.21 1
a_v 0.96 0.97 1 −0.28 −0.27 0.29 0.18 0.98
U −0.4 −0.34 −0.28 1 0.98 −0.7 −0.92 −0.32
Near_P −0.38 −0.34 −0.27 0.98 1 −0.79 −0.89 −0.33
Near_T 0.39 0.4 0.29 −0.7 −0.79 1 0.41 0.38
Near_I 0.27 0.21 0.18 −0.92 −0.89 0.41 1 0.2
P 0.95 1 0.98 −0.32 −0.33 0.38 0.2 1
interesting to note the strong direct relation between U and Near_P, and the negative relation between U and Near_I. In general, the largest values of N are not aligned with extreme values of the other labels (U, Near_T, Near_I, Near_P). In spite of that, there is a slight tendency for Near_T and Near_I to increase and for U and Near_P to decrease when N is large.
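Correlation tables like Table 7 come directly from arranging the label values as columns, one row per instance; a sketch with hypothetical label values:

```python
import numpy as np

# Rows = instances, columns = three hypothetical fuzzy-label values.
labels = np.array([
    [0.1, 0.2, 0.9],
    [0.4, 0.5, 0.6],
    [0.8, 0.9, 0.2],
    [0.9, 1.0, 0.1],
])
corr = np.corrcoef(labels, rowvar=False)  # pairwise Pearson correlations
print(corr.round(2))
```

The diagonal is 1 by construction; in this toy data the first two columns move together (correlation 1) while the third moves in the opposite direction (correlation −1), the kind of direct and inverse relations reported between U, Near_P and Near_I.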
4.2 Algorithms
By using Fuzzy C-Means [9], the 50 instances may be grouped (in terms of the sim-
ilarity of the performance of the algorithms) into the following 3 clusters. Figure 10
shows the centers of each cluster. They are called A clusters because they are based
on the similarity in terms of the algorithm performance.
• Cluster A0 (10 instances): 06–03, 06–04, 06–11, 06–12, 06–18, 06–28, 06–46, 06–48, 11–01, 15–02.
• Cluster A1 (14 instances): 11–02, 14–01, 15–03, 15–04, 15–20, 15–25, 15–30, 15–40, 15–41, 15–48, 15–50, 15–57, 15–65, 15–77.
• Cluster A2 (26 instances): 15–01, 15–07, 15–09, 15–12, 15–14, 15–16, 15–17, 15–18, 15–19, 15–23, 15–24, 15–27, 15–29, 15–32, 15–34, 15–42, 15–44, 15–46, 15–54, 15–55, 15–59, 15–66, 15–67, 15–69, 15–73, 15–74.
Although LIA seems to be the best algorithm, it is worth analyzing the performance of the other algorithms. Based on the centers shown in Fig. 10, cluster A0 is composed of the instances where all the algorithms perform best, almost reaching the utopian value. On the contrary, cluster A2 is composed of the instances where the performance of the algorithms is furthest from the utopian value; in this cluster the performance of LIA is comparatively better than that of the others, followed by BPA (which outperforms both CC versions and Borda). Finally, cluster A1 is composed of the instances where both CC versions are the second-best algorithms, followed by Borda (BPA is the worst). This knowledge is very useful for application cases
(Fig. 10 omitted: centers of the A clusters A0, A1 and A2, on a 0.8–1 scale.)
where the execution time is an important constraint. As LIA is slower than the others,
it is interesting to know when the other algorithms are preferable.
In order to obtain relations between the characteristics of the instances and the performance of the algorithms, we first show the coincidences between the P clusters (obtained in Sect. 4.1) and the A clusters (obtained in Sect. 4.2). Table 8 shows the relationships between the two sets of clusters. Each cell contains the number of cases in which a problem-based cluster P and an algorithm-based cluster A coincide; in parentheses, the percentage that each value represents with respect to the row and column totals, respectively, is shown.
It is worth noting that in 100% of the instances of cluster P2 the performance of the algorithms corresponds to cluster A0, while 80% of the instances whose algorithm performance corresponds to cluster A0 belong to cluster P2. This implies that the instances of type P2 are very related with the
Table 9 Pearson correlations between the measures and the performance of the algorithms
Borda BPA LIA CC10 CC25 Ave
N −0.2 −0.42 −0.47 −0.33 −0.37 −0.36
u_v −0.23 −0.34 −0.45 −0.31 −0.32 −0.33
a_v −0.12 −0.24 −0.33 −0.19 −0.2 −0.22
U 0.85 0.86 0.79 0.89 0.84 0.84
Near_P 0.93 0.83 0.78 0.92 0.85 0.86
Near_T −0.8 −0.57 −0.55 −0.66 −0.65 −0.65
Near_I −0.78 −0.79 −0.74 −0.86 −0.78 −0.79
P −0.2 −0.31 −0.42 −0.28 −0.29 −0.3
performance of type A0, i.e. for the easiest instances (P2) all the algorithms behave
similarly (A0).
Also note the strong relation between other clusters:
• P4 and A2: small problems with great values of utopicity and Near_P, where the advantage of LIA is very clear, followed by BPA.
• P1 and A2: problems of intermediate size (N), where the advantage of LIA is very clear, followed by BPA.
• P0 and A1: small problems with less utopicity and the biggest values of Near_I, where the advantage of LIA is clear, followed (in order) by both CC versions and Borda.
For problems of type P3 (the biggest problems, with low utopicity) there is no clear tendency to belong to a particular cluster of algorithm performance.
Finally, Table 9 shows the correlation between each GV value (describing instance characteristics) and the GP values (associated with algorithm performance). The strong influence of U and Near_P on the performance of the algorithms is remarkable: the greater the values of GV(U) and GV(Near_P), the greater the GP labels associated with all the algorithms (better performance). LIA is the algorithm with the least dependence on Near_P, Near_T, Near_I and U, i.e. it is the most stable with respect to the precedences in the input matrices. Similarly, Near_T affects Borda more negatively than the rest of the algorithms.
Based on Table 9 we can conclude that:
• U and Near_P are the characteristics that most positively influence the performance of the algorithms (i.e. the greater the value of GV(U) or GV(Near_P), the greater the GP labels).
• Near_T and Near_I are the aspects that most negatively influence the performance of the algorithms (i.e. the smaller the value of GV(Near_T) or GV(Near_I), the greater the GP labels).
The following predicate generalizes this knowledge. The symbol “−” is used to indicate the complement (negation).
THEN
Table 11 Pearson correlations between the “Correct Order” (CO) and LW labels and the GV labels
N u_v a_v U Near_P Near_T Near_I P
CO −0.34 −0.31 −0.3 0.42 0.33 −0.23 −0.31 −0.29
LW 0.09 −0.03 −0.1 −0.65 −0.68 0.52 0.61 −0.04
On the other side, the advantage of LIA (based on the LW label) is reinforced when the utopicity U and Near_P decrease (negative correlations) and when Near_T and Near_I increase, i.e. on the most difficult instances. The influence of N, u_v, a_v and P on the advantage of LIA is almost zero.
Dependencies between instance characteristics and the performance of the algorithms may also be obtained by using other data-mining methods over the 50 instances. By using FuzzyPred [11], several predicates with high values of FPTV and FPS were obtained; they are presented in Table 12.
For example, the first two predicates state that (in the instances) it is true that LIA or CC25 achieve a good performance, or the values of Near_P or Near_I are small. The third predicate states that LIA or CC10 achieve a good performance or CC25 does not achieve a good performance. The fourth predicate is similar to the first two, but the alternative to the good performance of LIA or CC25 is that CC10 achieves a good performance and the number of items N is small. In general, the meaning of these predicates is that LIA, CC25 and CC10 are complementary, guaranteeing a good performance of at least one of them in most instances; in the instances where this does not happen, the values of Near_P, Near_I or N must be small. The last predicate states that the performance of CC25 or CC10 is good, or Borda performs well and Near_P is great. This implies that one of these three algorithms (CC25, CC10 and Borda) performs well in each instance. In general, it is worth noting that most of the predicates include GP(LIA), which confirms the good overall performance of LIA.
It is also possible to obtain (by using Fuzzy C-Means) a clustering that describes all the GV and GP values together, which results in three clusters whose centers are shown in Fig. 11.
The first cluster PA0 contains 8 instances (06–03, 06–04, 06–11, 06–12, 06–28, 14–01, 15–48 and 15–74), with the lowest values of N, u_v, a_v, Near_T, Near_I and P and the largest values of U and Near_P; these are the simplest instances, where all the algorithms obtain a very good performance.
Fig. 11 Centers of the clusters obtained by Fuzzy C-Means with all the GV and GP labels
The second cluster PA1 contains 6 instances (06–18, 06–46, 06–48, 15–65, 15–67 and 15–73), with the highest values of N, u_v, a_v, Near_T, Near_I and P and the smallest values of U and Near_P; these are the most complex instances, where the performance of all the algorithms is not as good as in PA0, which enlarges the advantage of LIA over the other algorithms.
The cluster PA2 contains the remaining 36 instances, with values of all the instance characteristics intermediate between those of PA0 and PA1 (closer to PA0 in terms of N, u_v, a_v and P; closer to PA1 in terms of U and the distribution of values in the matrices, i.e. Near_P, Near_T and Near_I). In PA2 the performance of the algorithms is similar to that in PA1, with a slightly better performance of LIA.
In general, it may be stated that in the simplest instances all the algorithms behave similarly well, but in the most complex instances, with larger dimensions (in terms of N, u_v, a_v and P) and a distribution of matrix values biased toward more uncertain values (the greatest values of Near_I and Near_T and smaller values of Near_P and U), the performance of LIA is clearly superior to that of the other algorithms.
5 Conclusions
In this work, fuzzy logic concepts are used to analyze the performance of several OBOP algorithms and to derive relations between these results and the characteristics of a given instance.
In particular, we introduce several fuzzy labels to describe the characteristics of the instances and the performance of the algorithms. Then, these fuzzy labels are used as input to several data-mining methods. Based on the data-mining models obtained, we can state that:
Based on these results, several recommendations for future work may be derived:
• It would be interesting to provide algorithms to deal with OBOP instances with
small values of utopicity and with a majority of precedences far from 0 and 1.
• Based on the characteristics of each instance, a meta-algorithm may be designed
that first identifies the characteristics of the instance and then recommends and
uses the most appropriate algorithm for each particular case.
• The same methodology based on fuzzy logic used in this work may be applied to
derive conclusions about the characteristics of the instances and the performance
of the algorithms in other optimization problems.
References
1. Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clus-
tering. J. ACM 55(5), 23:1–23:27 (2008)
2. Alcalá, J., Fernández, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: KEEL data-
mining software tool: data set repository, integration of algorithms and experimental analysis
framework. J. Mult. Valued Logic Soft Comput. 17, 255–287 (2010)
3. Aledo, J. A., Gámez, J. A., Molina, D., Rosete, A.: Consensus-based journal rankings: a com-
plementary tool for bibliometric evaluation. J. Assoc. Inf. Sci. Technol. (2018). http://dx.doi.
org/10.1002/asi.24040
4. Aledo, J.A., Gámez, J.A., Rosete, A.: Approaching rank aggregation problems by using evo-
lution strategies: the case of the optimal bucket order problem. European J. Oper. Res. (2018).
http://dx.doi.org/10.1016/j.ejor.2018.04.031
5. Aledo, J.A., Gámez, J.A., Molina, D.: Using extension sets to aggregate partial rankings in a
flexible setting. Appl. Math. Comput. 290, 208–223 (2016)
6. Aledo, J.A., Gámez, J.A., Rosete, A.: Utopia in the solution of the Bucket Order Problem.
Decis. Support Syst. 97, 69–80 (2017)
7. Ali, A., Meila, M.: Experiments with Kemeny ranking: what works when? Math. Social Sci.
64, 28–40 (2012)
8. Bailey, R.W.: The number of weak orderings of a finite set. Soc. Choice Welf. 15(4), 559–562
(1998)
9. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New
York, NY (1981)
10. Borda, J.: Memoire sur les Elections au Scrutin. Histoire de l’Academie Royal des Sciences
(1781)
11. Ceruto, T., Lapeira, O., Rosete, A.: Quality measures for fuzzy predicates in conjunctive and
disjunctive normal form. Ingeniería e Investigación 3(4), 63–69 (2014)
70 J. A. Aledo et al.
12. Derrac, J., García, S., Molina, D., Herrera, F.: A practical tutorial on the use of nonpara-
metric statistical tests as a methodology for comparing evolutionary and swarm intelligence
algorithms. Swarm Evol. Comput. 1(1), 3–18 (2011)
13. Dwork, C., Kumar, R., Naor, M., Sivakumar, D.: Rank aggregation methods for the web. In:
Proceedings of the 10th International Conference on World Wide Web, WWW ’01, pp. 13–22.
ACM (2001)
14. Emerson, P.: The original Borda count and partial voting. Soc. Choice Welf. 40(2), 353–358
(2013)
15. Fagin, R., Kumar, R., Mahdian, M., Sivakumar, D., Vee, E.: Comparing and Aggregating
Rankings with Ties. In: PODS 2004, pp. 47–58. ACM (2004)
16. Feng, J., Fang, Q., Ng, W.: Discovering bucket orders from full rankings. In: Proceedings of
the 2008 ACM SIGMOD International Conference on Management of Data, pp. 55–66. ACM
(2008)
17. Fürnkranz, J., Hüllermeier, E.: Preference learning: an introduction. In: Fürnkranz, J., Hüller-
meier, E. (eds.), Preference Learning, pp. 1–17. Springer (2011)
18. Gionis, A., Mannila, H., Puolamäki, K., Ukkonen, A.: Algorithms for discovering bucket orders
from data. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’06, pp. 561–566. ACM (2006)
19. Hüllermeier, E.: Does machine learning need fuzzy logic? Fuzzy Sets Syst. 281, 292–299
(2015)
20. Kemeny, J.G., Snell, J.L.: Mathematical Models in the Social Sciences. Blaisdell, New York
(1962)
21. Kenkre, S., Khan, A., Pandit, V.: On Discovering bucket orders from preference data. In:
Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 872–883. SIAM
(2011)
22. Mattei, N., Walsh, T.: PrefLib: a library for preferences, http://www.preflib.org. In: Perny,
P., Pirlot, M., Tsoukiàs, A. (eds.) Proceedings of the Third International Conference on
Algorithmic Decision Theory, ADT 2013, pp. 259–270. Springer (2013)
23. Nápoles, G., Dikopoulou, Z., Papageorgiou, E., Bello, R., Vanhoof, K.: Prototypes construction
from partial rankings to characterize the attractiveness of companies in Belgium. Appl. Soft
Comput. 42, 276–289 (2016)
24. Ukkonen, A., Puolamäki, K., Gionis, A., Mannila, H.: A randomized approximation algorithm
for computing bucket orders. Inf. Process. Lett. 109(7), 356–359 (2009)
25. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965)
Uncertain Production Planning
Using Fuzzy Simulation
Firstly, we establish basic notation. P(X) is the class of all crisp sets, and F(X)
is the class of all fuzzy sets. A fuzzy set A : X → [0, 1] is defined on a universe of
discourse X and is characterized by a membership function μ_A(x) ∈ [0, 1]. A fuzzy
set A can be represented as the set of ordered pairs (x, μ_A(x)), i.e.,

A = {(x, μ_A(x)) : x ∈ X}.    (1)
A fuzzy number (see Bede [1], Diamond and Kloeden [3]) is defined as follows:
Definition 1 Consider a fuzzy subset of the real line A : R → [0, 1]. Then A is a
fuzzy number (FN) if it satisfies the following properties:
(i) A is normal, i.e. ∃x ∈ R such that A(x) = 1;
(ii) A is α-convex, i.e. A(αx + (1 − α)y) ≥ min{A(x), A(y)}, ∀α ∈ [0, 1];
(iii) A is upper semicontinuous on R, i.e. ∀ε > 0 ∃δ > 0 such that
A(x) − A(x₀) < ε whenever |x − x₀| < δ.
Let us denote by G(R) the class of all FNs, which includes Gaussian,
triangular, exponential, etc. The α-cut of a set A ∈ G(R), namely αA, is the set of
values with a membership degree greater than or equal to α, i.e.

αA = {x ∈ X | μ_A(x) ≥ α},    (2)

αA = [inf{x : μ_A(x) ≥ α}, sup{x : μ_A(x) ≥ α}] = [Ǎα, Âα].    (3)
Varón-Gaviria et al. [9] and Pulido-López et al. [8] proposed a method for gener-
ating random variables using μ_A and its α-cuts. First, the area of a fuzzy number is
defined:
Definition 2 Let A ∈ G(R) be a fuzzy number; then its area Λ is as follows:

Λ = Λ1 + Λ2 = ∫_{x∈R} l(x) dx + ∫_{x∈R} r(x) dx    (4)

where l and r denote the left and right sides of A. The normalized areas are

λ1 = Λ1 / Λ,    (5)
λ2 = Λ2 / Λ,    (6)
λ1 + λ2 = 1.    (7)
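For the triangular case, the areas of Definition 2 have closed forms: (a2 − a1)/2 on the left and (a3 − a2)/2 on the right. The following minimal Python sketch (the function name is ours) computes Λ, λ1 and λ2 under that assumption:

```python
def triangular_area_split(a1, a2, a3):
    """Area Λ = Λ1 + Λ2 of a triangular fuzzy number A = (a1, a2, a3),
    with Λ1 the area under the left side l(x) and Λ2 under the right
    side r(x), per Definition 2. Returns (Λ, λ1, λ2)."""
    area_left = (a2 - a1) / 2.0   # Λ1: triangle under l(x)
    area_right = (a3 - a2) / 2.0  # Λ2: triangle under r(x)
    total = area_left + area_right
    return total, area_left / total, area_right / total
```

For a symmetric number such as (0, 1, 2) this returns λ1 = λ2 = 0.5, consistent with Eq. (9) below.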
For a symmetric fuzzy number with center c, we have

Λ1 = Λ2 = 0.5Λ,    (8)
λ1 = λ2 = 0.5,    (9)
|Ǎα − c| = |Âα − c| → Âα = Ǎα + 2(c − Ǎα).    (10)
We simulate a production planning scenario where task execution times are defined
by experts (uncertain production planning has been covered by Mula et al. [7], and
Lan and Zhao [5]). In this way, we generate fuzzy random variables (see Varón-Gaviria
et al. [9] and Pulido-López et al. [8]) to simulate execution times, the Mean Flow
Time (MFT), its membership degree and its overall performance.
A company processes five products Pi in five stages Sj using a path Ri; all products
start by being coded in a warehouse W. All paths are as follows:
• R1: W → S2 → S1 → S4 → S5
• R2: W → S3 → S4 → S2 → S5
• R3: W → S2 → S3 → S4 → S5
• R4: W → S2 → S1 → S3 → S4 → S5
• R5: W → S1 → S3 → S2 → S5 → S4
For the sake of understanding, every path Ri consists of a set of stages (i, j), the
ordering relation given above, a processing time pij, and a starting instant tij. The
main problem is the lack of reliable statistical data, so we only have expert-based
information (a.k.a. third-party sources).
The goal is to characterize the mean flow time of the system, namely MFTi,
defined as the time at which a product is finished and released to the customer:

MFTi = tij + pij    (11)
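The routing logic above can be sketched with a toy list-scheduling simulation. This is a simplification of the ProModel setup used in the chapter (FIFO-like dispatch of whichever pending operation can start earliest, one product per stage at a time, no fuzzy sampling); all names are ours:

```python
# Routes R1..R5 through stages S1..S5 (the warehouse W is taken as time 0).
routes = {
    1: [2, 1, 4, 5],
    2: [3, 4, 2, 5],
    3: [2, 3, 4, 5],
    4: [2, 1, 3, 4, 5],
    5: [1, 3, 2, 5, 4],
}

def mean_flow_times(proc_time):
    """Greedy list scheduling: repeatedly dispatch the pending operation
    that can start earliest. Returns MFT_i = t_ij + p_ij at the last
    stage j of every route. proc_time maps (product, stage) -> p_ij."""
    stage_free = {s: 0.0 for s in range(1, 6)}  # when each stage goes idle
    ready = {i: 0.0 for i in routes}            # product leaves W at t = 0
    pos = {i: 0 for i in routes}                # next stage index per route
    mft = {}
    while len(mft) < len(routes):
        start, i = min((max(ready[i], stage_free[routes[i][pos[i]]]), i)
                       for i in routes if i not in mft)
        s = routes[i][pos[i]]
        finish = start + proc_time[(i, s)]
        stage_free[s], ready[i] = finish, finish
        pos[i] += 1
        if pos[i] == len(routes[i]):
            mft[i] = finish  # MFT_i: product released to the customer
    return mft
```

With unit processing times the contention on shared stages (e.g. S2, used early by four routes) already produces the waiting times WTi discussed below.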
Now, the experts at every stage (workers, engineers, etc.) were asked for their
opinions about the processing times per product/stage and their shapes. Every expert has a
different perception of the processing times in every station, so they use different
membership functions to represent their knowledge about every processing time.
The shapes and fuzzy random variable generators X(ω) of every fuzzy processing
time pij were proposed by Varón-Gaviria et al. [9] and Pulido-López et al. [8].
Gaussian fuzzy random variables G(c, δ) are shown next:
Ǎα = c − √(−2 · ln(α)) · δ,
Âα = c + √(−2 · ln(α)) · δ,

and the generation procedure for a Gaussian fuzzy number, given U1, U2, is:

X(ω) = c − √(−2 · ln(U1)) · δ, for U2 ≤ 0.5,
X(ω) = c + √(−2 · ln(U1)) · δ, for U2 > 0.5.    (14)
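Eq. (14) translates directly into code. The following Python sketch (function name ours) draws one realization of a Gaussian fuzzy random variable G(c, δ): U1 picks the α-cut radius and U2 picks the left or right endpoint:

```python
import math
import random

def gaussian_fuzzy_variate(c, delta, rng=random):
    """One realization X(ω) of a Gaussian fuzzy number G(c, δ),
    following Eq. (14)."""
    u1 = max(rng.random(), 1e-300)  # guard against log(0)
    u2 = rng.random()
    spread = math.sqrt(-2.0 * math.log(u1)) * delta
    return c - spread if u2 <= 0.5 else c + spread
```

By symmetry the sample mean converges to the center c, while the spread is controlled by δ.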
Exponential fuzzy numbers are handled analogously, with

μ_A(x) = exp(−x/θ),
Âα = −θ · ln(α),
where j is the last processing stage of path i, and the membership degree for
PTi, namely P(PTi), is shown as follows:
The simulation was run in ProModel (see Harrell [4]), and we performed 12
runs of 196 h each, which corresponds to a full operation year. The resultant fuzzy
sets M(MFTi) and P(PTi) for product 1 are shown in Fig. 2.
Note that PT1 seems close to a convex fuzzy set while MFT1 does not. The
reason lies in the processing itself: every product Pi has to wait until its predecessors
are processed at every stage j. Those waiting times are uncertain and consequently
add uncertainty to MFT1, which is reflected in its behavior. This means that even if
the processing times pij are known, the waiting times at every stage add uncertainty
to the total time in the system, which is a natural consequence of multiple products
being processed in common stages. A similar behavior can be seen for all products
(see Figs. 4 and 5 in the Appendix).
Figure 3 shows a time series of MFT1, PT1 and WT1 for the first 1000 runs of
the simulated experiments. It is interesting to see that MFT1 and WT1 seem to be
close to each other, so we can infer that MFT1 is sensitive to WT1 (the same holds
for the remaining products, as shown in Figs. 6, 7 and 8 in the Appendix).
On the other hand, all MFT, PT and WT series actually show a random behavior
according to the ARCH test, for which no significance was found in any series. Runs
and turning-point tests were not performed since MFT, PT and WT strongly depend
on their predecessors, so they would fail those tests.
Table 2 shows the average (Mean), variance (Var), min, max, kurtosis (K), and
skewness (Skw) of the MFT, PT and WT of every product. Note that every product
shows a mixed performance, which is a clear sign of the goodness of the proposed
method: it does not produce uniform results but non-uniform values, which is
highly desirable in simulation systems in order to cover unexpected events and see
their effect on the system.
We performed a Friedman test to compare all runs, and we found no significant dif-
ferences among them (p-value = 0.617). To compare the performance of all products
we performed the ANOVA and Levene tests, and we found significant evidence that
the means (p-value ≈ 0) and variances (p-value ≈ 0) are not all equal. This
means that the results of all runs are statistically similar while the processing times PTi
differ across products. Finally, we performed an ARCH test for every product (MFTi, PTi and
WTi) using 5 lags and found no heteroscedasticity effect.
It is clear that the performance of the system is conditioned on each product, its
path, and fuzzy uncertainty, but the proposed simulation methodology produces a
non-uniform performance, which is expected from mixed fuzzy random generation.
4 Concluding Remarks
We have applied the fuzzy random variable generation method proposed by Varón-
Gaviria et al. [9] and Pulido-López et al. [8] to a production planning scenario with
successful results. All MFTi, PTi and WTi were simulated and modeled as fuzzy
sets, and some convex/nonlinear behaviors were observed.
When analyzing MFTi, PTi and WTi as time series, we can see that they show
a random behavior (no ARCH effect is present), which is one of our goals: to involve
fuzzy uncertainty in simulation systems.
The interaction among different products in all stations causes differences between
MFTi and PTi, which adds uncertainty to MFTi as a consequence of WTi at
every stage. While PTi is a pure fuzzy function, MFTi involves tij and WTi,
which are complex to characterize individually.
The perception of experts can be used, with satisfactory results, in discrete-event
simulation problems where no statistical information is available or reliable; our
proposal is able to deal with any shape of fuzzy number.
Finally, some interesting topics to be covered in the future are: (i) simulation of
fuzzy logic systems, (ii) complexity analysis of our proposal, (iii) comparison to
statistical approaches, and (iv) extensions to Type-2 fuzzy sets.
Acknowledgements The authors would like to thank Prof. Miguel Melgarejo and Prof. José
Jairo Soriano for their invaluable discussion of all topics treated in this chapter; special
thanks are given to all members of the LAMIC Research Group.
Appendix
In this appendix we present the results of MFTi and PTi over 1000 runs
of the simulation model (Figs. 4, 5, 6, 7 and 8).
80 J. Carlos Figueroa-García et al.
Fig. 5 PT for P2, P3, P4, P5
References
1. Bede, B.: Mathematics of Fuzzy Sets and Fuzzy Logic. Springer (2013)
2. Devroye, L.: Non-uniform Random Variate Generation. Springer, New York (1986)
3. Diamond, P., Kloeden, P.: Metric topology of fuzzy numbers and fuzzy analysis. Fundamentals
of Fuzzy Sets. 7 (2000)
4. Harrell, C.: Simulation using ProModel, 3rd ed. McGraw-Hill (2012)
5. Lan, Y., Zhao, R.: Minimum risk criterion for uncertain production planning problems. Int. J.
Prod. Econ. 61(3), 591–599 (2011)
6. Law, A., Kelton, D.: Simulation Modeling and Analysis. McGraw-Hill (2000)
7. Mula, J., Poler, R., García-Sabater, J., Lario, F.: Models for production planning under uncer-
tainty: a review. Int. J. Prod. Econ. 103(1), 271–285 (2006)
8. Pulido-López, D.G., García, M., Figueroa-García, J.C.: Fuzzy uncertainty in random variable
generation: a cumulative membership function approach. Commun. Comput. Inf. Sci. 742(1),
398–407 (2017)
9. Varón-Gaviria, C.A., Barbosa-Fontecha, J.L., Figueroa-García, J.C.: Fuzzy uncertainty in ran-
dom variable generation: an α-cut approach. LNCS 10363(1), 1–10 (2017)
Fully Fuzzy Linear Programming Model
for the Berth Allocation Problem with
Two Quays
Abstract In this work, we study the berth allocation problem (BAP), considering
the continuous and dynamic cases for two quays; we also assume that the arrival
times of vessels are imprecise, meaning that vessels can be late or early up to an allowed
threshold. Triangular fuzzy numbers represent the imprecision of the arrivals. We
present two models for this problem: the first model is a fuzzy MILP (Mixed Integer
Linear Programming) model and allows us to obtain berthing plans with different degrees of
precision; the second one is a Fully Fuzzy Linear Programming (FFLP) model and
allows us to obtain a fuzzy berthing plan adaptable to possible incidents in the vessel
arrivals. The proposed models have been implemented in CPLEX and evaluated on a
benchmark developed to this end. For both models, with a timeout of 60 min, CPLEX
finds the optimum solution for instances of up to 10 vessels; for instances between 10
and 65 vessels it finds a non-optimum solution, and for bigger instances no solution
is found. Finally, we suggest the steps to be taken to implement the FFLP BAP
model in a maritime container terminal.
1 Introduction
F. Gutierrez (B)
Department of Mathematics, National University of Piura, Piura, Peru
e-mail: flabio@unp.edu.pe
E. Lujan (B)
Department of Informatics, National University of Trujillo, Trujillo, Peru
e-mail: elujans@unitru.edu.pe
R. Asmat (B) · E. Vergara (B)
Department of Mathematics, National University of Trujillo, Trujillo, Peru
e-mail: rasmat@unitru.edu.pe
E. Vergara
e-mail: evergara@unitru.edu.pe
© Springer Nature Switzerland AG 2019
R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_5
Equivalent Unit) have moved all over the world; at present, China leads this type of
transport with 199 565 501 TEUs, followed by the United States with 48 381 723
TEUs, according to UNCTAD [18].
Port terminals that handle containers are usually known as Maritime Container
Terminals (MCT); they have different shapes and dimensions, and some of them have
several quays. Since an MCT is an open system with three distinguishable areas (berth,
container yard and landside areas), different complex optimization problems
arise [17]. In this work we focus on the Berth Allocation Problem (BAP).
The BAP is an NP-hard problem [12], consisting in allocating a berthing
position and time to each vessel arriving at the terminal. When a vessel
arrives at the quay, it may need to wait before it can be attended. The goal of the present
work is to minimize such waiting time.
Due to multiple factors such as weather conditions (rain, storms, etc.), technical
problems, and stops at other terminals, among others, vessels can arrive earlier or later
than their scheduled arrival time, which makes the actual arrival time of each
vessel highly uncertain [2, 11]. This situation affects loading and
discharge operations, other activities at the terminal, and the services required by customers.
There are many types of uncertainty, such as randomness, imprecision (ambi-
guity, vagueness), and confusion, which can be categorized as either stochastic or fuzzy
[24]. Since fuzzy sets are specifically designed to deal with imprecision, they were
selected for the present work.
The administrators of an MCT continuously review and change the plans, but a
frequent review of the berthing plan is not desirable from a resource-planning
point of view. Therefore, the capacity of the berthing plan to adapt is
important for a good performance of the system that an MCT manages. As a result, a robust
model is desirable: one providing a berthing plan that supports possible imprecision
(earliness or lateness) in the arrival times of vessels and that is easily adaptable.
Among the many attributes commonly used to classify models related to the
BAP [1], the spatial and temporal attributes are the most important ones. The spatial
attribute can be discrete or continuous. In the discrete case, the quay is considered
as a finite set of berths, where segments of finite length describe every berth and
usually a berth serves one vessel at a time, whereas in the continuous
case, the vessels can berth at any position within the limits of the quay. On the
other hand, the temporal attribute can be static or dynamic. In the static case, all
the vessels are assumed to be at the port before the berthing plan is made, while
in the dynamic case, the vessels can arrive at the port at different times during
the planning horizon. In [1], the authors make an exhaustive review of the
existing literature on the BAP. To our knowledge, there are very few studies dealing
with the BAP with imprecise (fuzzy) data.
A cooperative search is developed in [10] to deal with the discrete
and dynamic BAP. In that work, the problem is assumed to be deterministic.
In [23], the uncertainty is dealt with through probabilities, that is, it is assumed that
historical data are available to obtain the arrival probability distribution of each
vessel. In many ports there is not enough data available to obtain these distributions.
A fuzzy MILP (Mixed Integer Linear Programming) model for the discrete and
dynamic BAP was proposed in [4]; triangular fuzzy numbers represent the arrival
times of vessels, but the continuous BAP is not addressed. According to Bierwirth
[1], designing a continuous model makes the planning of berthing more complicated than
for a discrete one, but the advantage is a better use of the space available at the quay.
The continuous and dynamic BAP, with imprecision in the arrival of vessels
represented by triangular fuzzy numbers, was studied in [5, 6]. In the former, a fuzzy
MILP model is proposed and an α-cut method is used to obtain the solution. In the
latter, a Fully Fuzzy Linear Programming (FFLP) model is proposed and solved
by Nasseri's method.
The models cited in the previous works do not deal with the BAP with two quays.
In [7], a MILP model for the BAP with multiple quays was developed; in that model,
the imprecision in the arrival of vessels is not taken into account.
In this work, we study the dynamic and continuous BAP with two quays and
imprecision in the arrival of vessels. We suppose that the probability distributions
for the advances and delays of vessels are unknown, that is, the problem cannot be
treated with stochastic optimization. We assume that the arrival times of vessels are
imprecise, and triangular fuzzy numbers represent this imprecision.
This paper is structured as follows: In Sect. 2, we describe the basic concepts of
fuzzy sets. Section 3 presents the formulation of a Fully Fuzzy Linear Program-
ming problem and describes a solution method. Section 4 describes the BAP, the
notation used in the models, the assumptions, and the benchmarks for the
BAP used to evaluate the models. Section 5 shows the fuzzy MILP model for the
BAP with two quays. Section 6 shows the FFLP model for the BAP with two quays.
Finally, conclusions and future lines of research are presented in Sect. 7.
Fuzzy sets offer a flexible environment to optimize complex systems. The con-
cepts about fuzzy sets below are taken from [21].
Definition 1 Let X be the universe of discourse. A fuzzy set Ã in X is a set of pairs:

Ã = {(x, μ_Ã(x)), x ∈ X}

where μ_Ã : X → [0, 1] is called the membership function and μ_Ã(x) represents the
degree to which x belongs to the set Ã.
Definition 3 The fuzzy set Ã in R is convex if and only if its membership function
satisfies the inequality

μ_Ã(λx1 + (1 − λ)x2) ≥ min(μ_Ã(x1), μ_Ã(x2)), ∀x1, x2 ∈ R, λ ∈ [0, 1].

The set Ãα = {x : μ_Ã(x) ≥ α, x ∈ R} is called the α-cut of Ã (Fig. 1).
This concept provides a very interesting approach in fuzzy set theory, since the
family of α-cuts contains all the information about the fuzzy set. By adjusting the α
value we can get the range or set of values that satisfy a given degree of membership.
In other words, the α value ensures a certain level of satisfaction, precision of the
result, or robustness of the model.
For a fuzzy set with a triangular membership function, Ã = (a1, a2, a3) (see
Fig. 1), the α-cut is given by:

Ãα = [a1 + α(a2 − a1), a3 − α(a3 − a2)].
Fig. 1 Interval corresponding to an α-cut level, for a triangular fuzzy number
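The α-cut of a triangular fuzzy number is a one-liner; this Python sketch (name ours) will also be convenient when discussing arrival-time tolerances later:

```python
def alpha_cut(a1, a2, a3, alpha):
    """α-cut [Ǎα, Âα] of a triangular fuzzy number Ã = (a1, a2, a3):
    [a1 + α(a2 - a1), a3 - α(a3 - a2)]."""
    return a1 + alpha * (a2 - a1), a3 - alpha * (a3 - a2)
```

At α = 0 the cut is the whole support [a1, a3]; at α = 1 it collapses to the modal value a2.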
For nonnegative triangular fuzzy numbers ã = (a1, a2, a3) and b̃ = (b1, b2, b3),
the operations of sum and difference are defined as follows:

Sum: ã + b̃ = (a1 + b1, a2 + b2, a3 + b3).
Difference: ã − b̃ = (a1 − b3, a2 − b2, a3 − b1).

To compare fuzzy numbers we use the ranking function

R(Ã) = (a1 + a2 + a3) / 3.    (2)

As a result, Ã ≤ B̃ when R(Ã) ≤ R(B̃), that is,
a1 + a2 + a3 ≤ b1 + b2 + b3.
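These operations and the ranking function can be sketched in a few lines of Python (function names ours):

```python
def tri_sum(a, b):
    """Sum of triangular fuzzy numbers: component-wise addition."""
    return tuple(x + y for x, y in zip(a, b))

def tri_diff(a, b):
    """Difference of triangular fuzzy numbers: (a1 - b3, a2 - b2, a3 - b1)."""
    return (a[0] - b[2], a[1] - b[1], a[2] - b[0])

def rank(a):
    """Ranking function R(Ã) = (a1 + a2 + a3) / 3 of Eq. (2),
    used to decide Ã ≤ B̃."""
    return sum(a) / 3.0
```

Note that the difference widens the support, which is why fuzzy waiting times in the models below inherit the imprecision of both operands.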
A Fully Fuzzy Linear Programming problem has the form

max Σ_{j=1}^{n} c̃_j x̃_j

subject to

Σ_{j=1}^{n} ã_ij x̃_j ≤ b̃_i, ∀i = 1 … m    (3)

where the parameters ã_ij, c̃_j, b̃_i and the decision variables x̃_j are nonnegative fuzzy
numbers ∀j = 1 … n, ∀i = 1 … m.
If all parameters and decision variables are represented by triangular fuzzy
numbers, then c̃_j = (c1_j, c2_j, c3_j), ã_ij = (a1_ij, a2_ij, a3_ij), b̃_i = (b1_i, b2_i, b3_i), and
x̃_j = (x1_j, x2_j, x3_j).
Nasseri's method transforms (3) into a classic problem of mathematical programming:

max R( Σ_{j=1}^{n} (c1_j, c2_j, c3_j)(x1_j, x2_j, x3_j) )

subject to

Σ_{j=1}^{n} a1_ij x1_j ≤ b1_i, ∀i = 1 … m    (4)
Σ_{j=1}^{n} a2_ij x2_j ≤ b2_i, ∀i = 1 … m    (5)
Σ_{j=1}^{n} a3_ij x3_j ≤ b3_i, ∀i = 1 … m    (6)
x2_j − x1_j ≥ 0, x3_j − x2_j ≥ 0.    (7)
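To illustrate the transformation, the one-variable case can be solved directly without an LP solver. This is a sketch under the assumptions that there is a single fuzzy variable, all data are nonnegative triangular numbers, and the objective coefficients are nonnegative (so each component should be as large as the bounds and the ordering constraints of Eq. (7) permit); the function name is ours:

```python
def nasseri_single_variable(c, a, b):
    """Crisp transformation (4)-(7) of a one-variable FFLP:
    max R(c̃ x̃) s.t. ã x̃ ≤ b̃, x̃ = (x1, x2, x3) triangular.
    Component-wise, a_k x_k <= b_k, plus x1 <= x2 <= x3."""
    ub = [bk / ak for ak, bk in zip(a, b)]  # bounds from (4), (5), (6)
    x3 = ub[2]
    x2 = min(ub[1], x3)                     # enforce x2 <= x3, Eq. (7)
    x1 = min(ub[0], x2)                     # enforce x1 <= x2, Eq. (7)
    x = (x1, x2, x3)
    obj = sum(ck * xk for ck, xk in zip(c, x)) / 3.0  # R(c̃ x̃)
    return x, obj
```

In the general multi-variable case the three crisp constraint blocks (4)-(6) plus the ordering constraints (7) would be passed to any LP/MILP solver.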
4 Problem Description
The BAP with two quays consists in deciding the quay, the moment, and the position
at which each vessel arriving at the terminal must moor, so that the waiting time is
minimized. The BAP can be represented in a two-dimensional way, as shown in Fig. 2:
the horizontal axis (Time) represents the time horizon and the vertical axis (Quay)
the length of the quay.
The notation used in the formulation of the problem is shown in Fig. 2 and
Table 1.
With the aim of evaluating the models presented in [5, 6], we developed the benchmark
BAPQCAP-Imprecise, which is an extended version of BAPQCAP. In this extension,
the arrival times of vessels are considered imprecise. To simulate this imprecision, in
every instance of the BAPQCAP benchmark, the possibility of delay and advance was
added to the arrival time up to an allowed tolerance. This possibility is represented
by a triangular fuzzy number (a1, a2, a3) (see Fig. 1).
Where:
a1: Minimum allowed advance in the arrival of the vessel. This value is random and
it is generated within the range [a−20, a].
a2: Arrival time with the highest possibility of a vessel (taken from original bench-
mark).
a3: Maximum allowed delay in the arrival of the vessel. This value is also random
and it is generated within the range [a, a+20].
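The construction above can be sketched as follows (Python, names ours; the tolerance of 20 time units comes from the ranges just described):

```python
import random

def fuzzify_arrival(a, tol=20, rng=random):
    """Triangular fuzzy arrival (a1, a2, a3) for a crisp arrival time a:
    a1 drawn uniformly from [a - tol, a], a2 = a (highest possibility),
    a3 drawn uniformly from [a, a + tol]."""
    return (rng.uniform(a - tol, a), a, rng.uniform(a, a + tol))
```

Applying this to every arrival time of a BAPQCAP instance yields the corresponding BAPQCAP-Imprecise instance.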
Table 3 shows the modification made to the instance of Table 2. The third column
is the arrival time of the vessel with the highest possibility, the second one represents
the advance, and the fourth one the delay.
The triangular fuzzy number used to represent the imprecision in the arrival is
obtained from an expert present on every vessel. This expert has to indicate the time
interval of possible arrival, as well as the most possible time at which the arrival occurs.
This data could also be obtained from historical data regarding the arrival of
each vessel.
With the aim of showing the advantages and disadvantages of the models presented in
this work, we use one instance consisting of 10 vessels (Table 3) as a case study.
In Fig. 3, we show the imprecise arrival of each vessel as a triangular fuzzy number.
For example, for vessel V2 the most possible arrival is at 86 units of time, but it
could be as early as 77 or as late as 103 units of time; its handling time
is 100 and its length is 232.
In this section we propose a fuzzy MILP model for the continuous and dynamic
BAP able to allocate a quay to an arriving vessel. This model is an extension of the
model presented in [5], developed for a single quay.
We assume imprecision in the arrival times of vessels, meaning that the vessels
can be late or early up to a given allowed tolerance.
Formally, we consider that the imprecise arrival time of each vessel is a fuzzy
number ã. The goal is to allocate a berthing time and a position at a quay q ∈ Q to
every vessel, according to certain constraints, with the aim of minimizing the total
waiting time of the vessels:
min Σ_{q∈Q} Σ_{i∈V} (m_iq − ã_i)    (8)

Subject to:

Σ_{q∈Q} BM_iq = 1, ∀i ∈ V    (9)
m_iq ≥ ã_i, ∀i ∈ V, ∀q ∈ Q    (10)
p_iq + l_i ≤ L_q, ∀i ∈ V, ∀q ∈ Q    (11)
m_iq + h_i ≤ H, ∀i ∈ V, ∀q ∈ Q    (13)
m_jq − (m_iq + h_i) + M(1 − z^y_ijq) ≥ S(ã_i), ∀i, j ∈ V, i ≠ j, ∀q ∈ Q    (14)
z^x_ijq + z^x_jiq + z^y_ijq + z^y_jiq ≥ BM_iq + BM_jq − 1, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q    (15)
z^x_ijq, z^y_ijq ∈ {0, 1}, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q    (16)
Since the deterministic and fuzzy parameters are of linear type, we are dealing with a
fuzzy MILP model. In the constraints, z^x_ijq is a decision variable indicating whether
vessel i is located to the left of vessel j at berthing (z^x_ijq = 1), and z^y_ijq = 1
indicates that the berthing time of vessel i is before the berthing time of vessel j at
quay q. M is a big integer constant.
The α-cut represents the time interval allowed for the arrival of a vessel,
given a degree of precision α. The size of the interval, S(α) = (1 − α)(a3 − a1), must
be taken into account for the berthing time of the next vessel to berth. It can be observed
that, for a value α, the allowed earliness is E(α) = (1 − α)(a2 − a1), the allowed delay
is D(α) = (1 − α)(a3 − a2), and S(α) = E(α) + D(α).
In Fig. 4, the α-cuts for the arrival of three vessels (V1, V6 and V3), with a cut
level α = 0.5, are shown.
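The tolerance formulas are straightforward to compute (a Python sketch, name ours):

```python
def tolerances(a1, a2, a3, alpha):
    """Allowed earliness E(α), delay D(α) and interval size
    S(α) = E(α) + D(α) for a vessel with fuzzy arrival (a1, a2, a3)
    at precision level α."""
    e = (1 - alpha) * (a2 - a1)
    d = (1 - alpha) * (a3 - a2)
    return e, d, e + d
```

For vessel V1 of Table 5, (a1, a2, a3) = (15, 34, 42) and α = 0.5 give E = 9.5 and D = 4.0, matching the table, while α = 1 gives E = D = 0 (no tolerance).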
By using the α-cuts as a defuzzification method for the fuzzy arrival of ves-
sels, a solution to the fuzzy BAP model is obtained with the following auxiliary
parametric MILP model.
subject to:

Σ_{q∈Q} BM_iq = 1, ∀i ∈ V    (18)
p_iq + l_i ≤ L_q, ∀i ∈ V, ∀q ∈ Q    (20)
m_jq − (m_iq + h_i) + M(1 − z^y_ijq) ≥ S_i(α), ∀i, j ∈ V, i ≠ j, ∀q ∈ Q    (22)
z^x_ijq + z^x_jiq + z^y_ijq + z^y_jiq ≥ 1, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q    (23)
z^x_ijq, z^y_ijq ∈ {0, 1}, ∀i, j ∈ V, i ≠ j, ∀q ∈ Q.    (24)
In the parametric MILP model, the value of α is the degree of precision allowed in
the arrival times of vessels. For every α ∈ [0, 1] and every vessel i, the allowed
tolerance time S_i is computed. The lower the value of α, the lower the precision,
i.e., the longer the time allowed for the arrival of every vessel.
5.2 Evaluation
For the evaluation we used a personal computer equipped with an Intel Core(TM)
i3 CPU M370 @ 2.4 GHz and 4.00 GB RAM. The experiments were performed with
a timeout of 60 min.
For each instance, eleven degrees of precision (α ∈ {0, 0.1, …, 1}) generated eleven
berthing plans.
As an illustrative example, three different berthing plans are shown in Tables 4,
5 and 6 for the vessels of Table 3.
Table 5 Berthing plan with α = 0.5 of precision in the arrival time of vessels

V    a1   a2   a3   E    D     m1     m2     m3     h    d1     d2     d3     l    p    Q
V1   15   34   42   9.5  4.0   24.5   34.0   38.0   60   84.5   94.0   98.0   260  440  1
V2   77   86   103  4.5  8.5   102.0  106.5  115.0  100  202.0  206.5  215.0  232  302  0
V3   30   43   55   6.5  6.0   36.5   43.0   49.0   120  156.5  163.0  169.0  139  561  0
V4   150  165  184  7.5  9.5   158.0  165.0  174.5  110  267.5  275.0  284.5  193  0    1
V5   33   52   69   9.5  8.5   42.5   52.0   60.5   80   122.5  132.0  140.5  287  0    0
V6   50   67   82   8.5  7.5   98.0   106.5  114.0  90   188.0  196.5  204.0  318  382  1
V7   22   38   50   8.0  6.0   30.0   38.0   44.0   100  130.0  138.0  144.0  366  0    1
V8   2    15   29   6.5  7.0   8.5    15.0   22.0   80   88.5   95.0   102.0  166  395  0
V9   95   110  118  7.5  4.0   144.0  151.5  155.5  90   234.0  241.5  245.5  109  193  1
V10  81   95   115  7.0  10.0  140.5  147.5  157.5  120  260.5  267.5  277.5  251  0    0
The column Q in Tables 4, 5 and 6 indicates the quay where the vessel has to
moor: the value 1 means the vessel must moor at quay one, and the value 0 at
quay two.
For α = 1, maximum precision in the arrival of vessels (see Table 4), the earliness
and delays are E = 0 and D = 0, respectively; that is, neither earliness nor delays are
allowed in the arrival of any vessel. In most cases, if a vessel is delayed with respect
to its precise arrival time, the plan ceases to be valid. For example, vessel V5 berths
at quay one with a berthing time m2 = 52 and a departure time d2 = 132; if
this vessel is delayed, then vessel V9 cannot berth at its allocated time m2 = 94.
Vessel V1 berths at quay two; if V1 is late, V2 cannot berth at its allocated time.
This can be observed in Fig. 5. For a greater number of vessels, delays
complicate the berthing plans even more.
The case with precision degree α = 0.5 is shown in Table 5. If vessel V5 is, for
instance, assigned to quay two, the optimum berthing time is m2 = 52, the allowed
earliness is E = 9.5 and the allowed delay is D = 8.5; that is, the vessel can berth in the
time interval [42.5, 60.5] and depart in the time interval [122.5, 140.5],
the optimum departure time being d2 = 132. After vessel V5, vessel V10 can berth; its
optimum berthing time is m2 = 147.5, with an allowed earliness of E = 7 and an allowed
delay of D = 10, that is, the vessel can berth in the time interval [140.5, 157.5]
and depart in the time interval [260.5, 277.5], the optimum departure
time being d2 = 267.5 (see Fig. 6).
For α = 0, the minimum allowed precision in the arrival times of vessels, the earliness
and delays increase (see Table 6). For instance, if vessel V5 is assigned to quay
two, the optimum berthing time is m2 = 52 and the allowed earliness and delay
are E = 19 and D = 17, respectively; therefore, the time interval in which the vessel can
berth is [33, 69]. After vessel V5, vessel V2 can berth; its optimum berthing time
is m2 = 158, but it can berth in the time interval [149, 175] (see Fig. 7).
Considering the structure of the model, for every value of α the allowed
earliness and delays are proportional to the maximum earliness and delay times.
The values shown are the average of the objective function of the solutions found
(Avg T), the number of instances solved to optimality (#Opt) and the number of
instances solved without certified optimality (#NOpt).
In our results, it can be observed that in all the solved cases T increases as the
number of vessels increases. Within the given timeout, CPLEX found the optimum
solution in 30% of the instances with 10 vessels; a non-optimum solution in 100%
of the instances from 15 to 65 vessels; and for 70 or more vessels no solution was
found.
The growth of T for the values α = {0, 0.5, 1} is shown in Fig. 8. Within the
given timeout, CPLEX found a solution for instances of up to 65 vessels.
We propose an FFLP model for the continuous and dynamic BAP able to allocate
a quay to each incoming vessel; it is an extension of the model presented in [5],
which was developed for a single quay. This model overcomes the drawback of the
fuzzy MILP model, namely the great waste of time during which the quays are not
used (see Sect. 5).
Fully Fuzzy Linear Programming Model for the Berth Allocation … 105
Subject to:

\sum_{q \in Q} BM_{iq} = 1 \quad \forall i \in V  (26)

\tilde{m}_{iq} \geq \tilde{a}_i \quad \forall i \in V, \forall q \in Q  (27)

p_{iq} + l_i \leq L_q \quad \forall i \in V, \forall q \in Q  (28)

\tilde{m}_{iq} + h_i \leq H \quad \forall i \in V, \forall q \in Q  (30)

\tilde{m}_{iq} + h_i \leq \tilde{m}_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \neq j, \forall q \in Q  (31)

z^x_{ijq} + z^x_{jiq} + z^y_{ijq} + z^y_{jiq} \geq BM_{iq} + BM_{jq} - 1 \quad \forall i, j \in V, i \neq j, \forall q \in Q  (32)

z^x_{ijq}, z^y_{ijq} \in \{0, 1\} \quad \forall i, j \in V, i \neq j, \forall q \in Q  (33)
The interpretation of the constraints is similar to that of the model in Sect. 5, with
the exception of constraint (31). This constraint concerns time and indicates whether
one vessel berths before or after another.
We assume that all parameters and decision variables are linear and that some of
them are fuzzy. Thus, we have a fully fuzzy linear programming (FFLP) problem.
The arrival of every vessel is represented by a triangular possibility distribution
\tilde{a} = (a_1, a_2, a_3); in a similar way, the berthing time is represented by
\tilde{m} = (m_1, m_2, m_3), and the handling time h = (h_1, h_2, h_3) is considered a singleton.
When representing parameters and variables by triangular fuzzy numbers, we
obtain a solution to the proposed fuzzy model by applying the methodology proposed
by Nasseri (see Sect. 3).
To apply this methodology, we use the operation of fuzzy difference on the objective
function and the fuzzy sum on the constraints (see Sect. 2.2), as well as the First
Index of Yager as an ordering function on the objective function (see Sect. 2.3),
obtaining the following auxiliary MILP model.
\min \sum_{q \in Q} \sum_{i \in V} \frac{1}{3}\left((m1_{iq} - a3_i) + (m2_{iq} - a2_i) + (m3_{iq} - a1_i)\right)  (34)

Subject to:

\sum_{q \in Q} BM_{iq} = 1 \quad \forall i \in V  (35)

p_{iq} + l_i \leq L_q \quad \forall i \in V  (39)

m3_{iq} + h_i \leq H \quad \forall i \in V  (40)

m1_{iq} + h_i \leq m1_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \neq j  (42)

m2_{iq} + h_i \leq m2_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \neq j  (43)

m3_{iq} + h_i \leq m3_{jq} + M(1 - z^y_{ijq}) \quad \forall i, j \in V, i \neq j  (44)
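The defuzzification that produces the crisp objective (34) can be sketched as follows. Under the triangular representation, the fuzzy difference is \tilde{m} − \tilde{a} = (m_1 − a_3, m_2 − a_2, m_3 − a_1) (Sect. 2.2), and Yager's First Index maps a triangular number to the mean of its three components (Sect. 2.3). The code below is a hedged illustration of this step, not part of the chapter; the triangles used are hypothetical.

```python
def tri_sub(m, a):
    """Fuzzy difference of triangular numbers: (m1 - a3, m2 - a2, m3 - a1)."""
    return (m[0] - a[2], m[1] - a[1], m[2] - a[0])

def yager_first_index(t):
    """Yager's First Index: the mean of the three components."""
    return sum(t) / 3.0

# Waiting-time term of one vessel: fuzzy berthing time minus fuzzy arrival.
m = (42.5, 52.0, 60.5)   # hypothetical berthing triangle
a = (33.0, 40.0, 49.0)   # hypothetical arrival triangle
wait = tri_sub(m, a)
print(wait)                      # (-6.5, 12.0, 27.5)
print(yager_first_index(wait))   # 11.0
```

Summing such defuzzified terms over all vessels and quays yields the objective (34).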
6.2 Evaluation
For the evaluation, a personal computer equipped with a Core(TM) i3 M370 CPU
@ 2.4 GHz and 4.00 GB of RAM was used. The experiments were performed with
a timeout of 60 min.
For the vessels of the case study (see Table 3), the berthing plan obtained with the model
is shown in Table 9, and the polygonal shapes are shown in Fig. 9.
The berthing plan shown in Table 9 is a fuzzy one: e.g., for vessel V4,
the most possible berthing time is at 165 units of time, but it could berth between
150 and 184 units of time; the most possible departure time is at 275 units of time, but
it could depart between 260 and 294 units of time.
An appropriate way to observe the robustness of the fuzzy berthing plan is the
polygonal-shape representation (see Fig. 9). The line below the small triangle represents
the possible early berthing time; the line above the small triangle, the
possible late berthing time; the small triangle represents the optimum berthing time
(the one with the greatest possibility of occurrence); and the length of the polygon
represents the time the vessel will stay at the quay.
In the circles of Fig. 9, we observe an apparent conflict between the departure
times of some vessels and the berthing times of others: at quay one, vessels V8 with V2,
and vessel V5 with V9; at quay two, vessels V7 with V10. The conflicts are not real; for example,
if vessel V8 is late, vessel V2 has slack times supporting delays. Assume that vessel
V8 is 10 units of time late; according to Table 9, its berthing occurs at m = 15 +
10 = 25 units of time and its departure at d = 25 + 80 = 105 units of time.
Vessel V2 can moor after this time, since, according to Table 9, its berthing can occur
between 82 and 109 units of time. A similar situation occurs for vessels V5 and V9
at quay one and for V7 and V10 at quay two, as observed in Fig. 10.
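The slack argument can be checked with simple arithmetic. The values below are those quoted in the text from Table 9 (V8 berthing at 15 with 80 units of service, V2's berthing window [82, 109]); the check itself is an illustration, not chapter code.

```python
# Values quoted in the text from Table 9.
v8_berth, v8_service, v8_delay = 15, 80, 10
v2_window = (82, 109)  # interval in which V2 can moor

# If V8 is 10 units late, its departure shifts accordingly.
v8_departure = (v8_berth + v8_delay) + v8_service
print(v8_departure)  # 105

# V2 remains feasible as long as V8's departure precedes
# the latest time at which V2 may still moor.
feasible = v8_departure <= v2_window[1]
print(feasible)  # True: V2 can still moor between 105 and 109
```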
To analyze the robustness of the fuzzy berthing plan, we simulate the incidences
shown in Table 10.
With the incidences of Table 10, a feasible berthing plan can be obtained, as shown
in Table 11. In Fig. 11, we observe that the berthing plan obtained is part of the
fuzzy plan obtained initially.
Table 12 shows the average results obtained by CPLEX for the benchmark
BAPQCAP-Imprecise (see Sect. 4.2).
From the results, we can observe that in all cases solved by CPLEX the objective
function T increases as the number of vessels increases. Within the given timeout, CPLEX
found the optimum solution in 8% of the instances with 10 vessels; a non-optimum
solution in 100% of the instances from 15 to 65 vessels; and for 70 or more vessels
no solution was found.
The FFLP model for the BAP can be applied in an MTC with two or more quays; to this
end, we suggest following these steps:
• Step 1: Set a planning horizon and the length of each quay.
• Step 2: For every vessel, an expert indicates the interval of possible arrival times,
as well as the most possible arrival time (approximations to these data can also
be obtained from historical arrival records for each vessel).
• Step 3: With the data from Step 2, form the fuzzy triangle representing the
imprecise arrival of each vessel.
• Step 4: Enter the parameters of each vessel known in advance (fuzzy arrival
triangle, service time, and vessel length) into the model.
• Step 5: Solve the auxiliary model with a linear programming solver. The decision
variables obtained are the mooring time and the position at the quay. For larger
instances (more than 65 vessels), given the high complexity of the BAP, the auxiliary
model must be solved by a heuristic or metaheuristic approach (previously evaluated
as the most efficient) that gives good solutions in reasonable times.
• Step 6: With the parameters and decision variables obtained, form the
fuzzy berthing plan.
• Step 7: With the incidences occurring for every vessel (earliness or delay) within
the allowed threshold, carry out the final berthing plan.
7 Conclusion
Both models presented in this work solve the continuous and dynamic BAP for
two quays with imprecision in the arrival of vessels.
The results obtained show that the fuzzy MILP model for the BAP provides different
berthing plans with different degrees of precision, but it also has a drawback:
after the berthing time of a vessel, the next vessel has to wait the entire time considered
for the possible earliness and delay. This represents a big waste of time during which
the quay is not used, and the vessel has to stay longer than necessary at the port.
The FFLP model for the BAP overcomes this drawback of the fuzzy MILP
model: the fuzzy berthing plan obtained can be adapted to possible incidences in the
vessel arrivals.
The models were evaluated with a timeout of 60 min. Within that time, both models
were able to find the optimum solution for a small number of vessels; for instances
from 15 to 65 vessels they found non-optimum solutions, and for a greater number of
vessels they found no solutions.
To implement the model in an MTC, we suggest the steps given above.
Finally, as a result of this research, open problems remain for future work:
extending the model to consider the quay cranes to be assigned to every vessel, and
using metaheuristics to solve the fuzzy BAP model more efficiently when the number
of vessels is larger.
References
1. Bierwirth, C., Meisel, F.: A survey of berth allocation and quay crane scheduling problems in
container terminals. Eur. J. Oper. Res. 202(3), 615–627 (2010)
2. Bruggeling, M., Verbraeck, A., Honig, H.: Decision support for container terminal berth plan-
ning: integration and visualization of terminal information. In: Proceedings of Van de Vervoers
logistieke Werkdagen (VLW2011), pp. 263–283. University Press, Zelzate (2011)
3. Das, S.K., Mandal, T., Edalatpanah, S.A.: A mathematical model for solving fully fuzzy linear
programming problem with trapezoidal fuzzy numbers. Appl. Intell. 46(3), 509–519 (2017)
4. Expósito-Izquierdo, C., Lalla-Ruiz, E., Lamata, T., Melián-Batista, B., Moreno-Vega, J.: Fuzzy
optimization models for seaside port logistics: berthing and quay crane scheduling. Computa-
tional Intelligence, pp. 323–343. Springer International Publishing, Cham (2016)
5. Gutierrez, F., Vergara, E., Rodríguez, M., Barber, F.: Un modelo de optimización difuso para el
problema de atraque de barcos. Investig. Oper. 38(2), 160–169 (2017)
6. Gutierrez, F., Lujan, E., Vergara, E., Asmat, R.: A fully fuzzy linear programming model to
the berth allocation problem. Ann. Comput. Sci. Inf. Syst. 11, 453–458 (2017)
7. Frojan, P., Correcher, J., Alvarez-Valdes, R., Koulouris, G., Tamarit, J.: The continuous Berth
Allocation Problem in a container terminal with multiple quays. Exp. Syst. Appl. 42(21),
7356–7366 (2015)
8. Jimenez, M., Arenas, M., Bilbao, A., Rodríguez, M.V.: Linear programming with fuzzy param-
eters: an interactive method resolution. Eur. J. Oper. Res. 177(3), 1599–1609 (2007)
9. Kim, K., Moon, K.C.: Berth scheduling by simulated annealing. Transp. Res. Part B Methodol.
37(6), 541–560 (2003)
10. Lalla-Ruiz, E., Melián-Batista, B., Moreno-Vega, J.: A cooperative search for berth scheduling.
Knowl. Eng. Rev. 31(5), 498–507 (2016)
11. Laumanns, M., et al.: Robust adaptive resource allocation in container terminals. In: Proceed-
ings of 25th Mini-EURO Conference Uncertainty and Robustness in Planning and Decision
Making, Coimbra, Portugal, pp. 501–517 (2010)
12. Lim, A.: The berth planning problem. Oper. Res. Lett. 22(2), 105–110 (1998)
13. Luhandjula, M.K.: Fuzzy mathematical programming: theory, applications and extension. J.
Uncertain Syst. 1(2), 124–136 (2007)
14. Nasseri, S.H., Behmanesh, E., Taleshian, F., Abdolalipoor, M., Taghi-Nezhad, N.A.: Fully
fuzzy linear programming with inequality constraints. Int. J. Ind. Math. 5(4), 309–316 (2013)
15. Rodriguez-Molins, M., Ingolotti, L., Barber, F., Salido, M.A., Sierra, M.R., Puente, J.: A
genetic algorithm for robust berth allocation and quay crane assignment. Prog. Artif. Intell.
2(4), 177–192 (2014)
Fully Fuzzy Linear Programming Model for the Berth Allocation … 113
16. Rodriguez-Molins, M., Salido, M.A., Barber, F.: A GRASP-based metaheuristic for the Berth
allocation problem and the quay crane assignment problem by managing vessel cargo holds.
Appl. Intell. 40(2), 273–290 (2014)
17. Steenken, D., Voß, S., Stahlbock, R.: Container terminal operation and operations research: a
classification and literature review. OR Spectr. 26(1), 3–49 (2004)
18. UNCTAD: Container port throughput, annual, 2010–2016. http://unctadstat.unctad.org/wds/
TableViewer/tableView.aspx?ReportId=13321. Accessed 02 March 2018
19. Wang, X., Kerre, E.: Reasonable properties for the ordering of fuzzy quantities (I). Fuzzy Sets
Syst. 118(3), 375–385 (2001)
20. Yager, R.R.: A procedure for ordering fuzzy subsets of the unit interval. Inf. Sci. 24(2), 143–161
(1981)
21. Young-Jou, L., Hwang, C.: Fuzzy Mathematical Programming: Methods and Applications, vol.
394. Springer Science & Business Media, Berlin (2012)
22. Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. 100, 9–34 (1999)
23. Zhen, L., Lee, L.H., Chew, E.P.: A decision model for berth allocation under uncertainty. Eur.
J. Oper. Res. 212(3), 54–68 (2011)
24. Zimmermann, H.: Fuzzy Set Theory and its Applications, Fourth Revised edn. Springer, Dor-
drecht (2001)
Ideal Reference Method with Linguistic
Labels: A Comparison with LTOPSIS
Abstract In many real-life situations we face decision-making problems;
therefore, it becomes necessary to study different theories, methods, and tools to
solve these kinds of problems as efficiently as possible. In this paper, we describe
the elements that make up a decision-making model and review some of the most
widely used compensatory multicriteria decision-making methods, such as TOPSIS,
VIKOR and RIM. In particular, we identify the limitations of the RIM method
for operating with linguistic labels. Next, the basic concepts of the Reference Ideal
Method are described, and a variant is proposed to determine the minimum
distance to the Reference Ideal, as well as the normalization function. We illustrate
our method by means of an example and compare the results with those obtained by
the LTOPSIS method. Finally, the conclusions are presented.
1 Introduction
There are different situations where it is necessary to solve a decision-making
problem. To facilitate the work of the decision maker, different methods have been
developed, among which the Multicriteria Decision Making (MCDM) methods can be
mentioned. In particular, we will refer to the methods with a compensatory conception
[1]. The purpose of this kind of problem is the selection of the best alternative
A_i, i = 1, 2, . . . , m, from the evaluation of each alternative over a criteria set
E. H. Cables (B)
Universidad Antonio Nariño, Bogotá, Colombia
e-mail: ehcables@uan.edu.co
M. T. Lamata · J. L. Verdegay
Universidad de Granada, 18071 Granada, Spain
e-mail: mtl@decsai.ugr.es
J. L. Verdegay
e-mail: verdegay@decsai.ugr.es
         w_1    w_2   ···   w_n
         C_1    C_2   ···   C_n
     A_1 ( x_11  x_12  ···  x_1n )
M =  A_2 ( x_21  x_22  ···  x_2n )
     ··· (  ···   ···  ···   ··· )
     A_m ( x_m1  x_m2  ···  x_mn )
this paper is to propose a variant of the RIM method that operates with linguistic labels.
After formulating the problem to be solved, and in addition to mentioning the main
compensatory multicriteria decision-making methods, the RIM method is described and
a new formulation is proposed to determine the minimum distance to the Reference
Ideal and to perform the normalization of the values of the decision matrix, so that
it allows operating with linguistic labels. Then, through an example solved with the
LTOPSIS method, the working of the L-RIM method is illustrated.
In general, the conception of the TOPSIS, VIKOR and RIM methods is to determine the
best alternative from its separation from the ideal solution; however, they use different
metrics. On the other hand, the RIM method extends this working conception, because
it allows the ideal solution to be a value, or a set of values, lying anywhere between the
minimum value and the maximum value.
To work with the RIM method, essential concepts associated with each criterion
C_j, j = 1, 2, . . . , n, should be considered; they are described below:
• The Range R_j, which represents a set of values belonging to a universal set; it
can be an interval, a set of labels, a set of numbers, or simple values, and one is
associated with each criterion.
• The Reference Ideal RI_j, which represents the maximum importance of the criterion
C_j within the associated Range; furthermore, RI_j ⊂ R_j.
Then, based on these concepts, the distance to the Reference Ideal is determined.
In this case, the distance from a value x_ij to its corresponding Reference Ideal is
obtained by expression (1):

d_min(x_ij, RI_j) = min(|x_ij − C_j|, |x_ij − D_j|)   (1)

In this case, it is considered that RI_j = [C_j, D_j], and x_ij is the valuation or
judgment of each alternative i for each criterion C_j.
The RIM method, like the TOPSIS and VIKOR methods, requires the normalization
of the valuation (judgment) matrix M in order to transform the values x_ij to the
same scale. It must be noted that these methods use different metrics to
carry out the normalization of the matrix M.
In the particular case of RIM, the normalization of the valuation matrix M is done
through expression (2) [9].
118 E. H. Cables et al.
f(x_ij, R_j, RI_j) → [0, 1]   (2)

where:
• R_j = [A_j, B_j] is the Range.
• RI_j = [C_j, D_j] is the Reference Ideal.
• d_min(x_ij, RI_j) is obtained through expression (1).
• x_ij ∈ [A_j, B_j], dist(A_j, C_j) = |A_j − C_j|, and dist(D_j, B_j) = |D_j − B_j|.
The concepts referred to above are essential for working with the RIM method,
which was designed to operate only with numerical arguments. However, in everyday
practice there are many decision-making problems where the valuation of the
different alternatives A_i for each criterion C_j is expressed through linguistic terms,
which implies modifying the calculation of the minimum distance to the Reference
Ideal and the normalization of the set of values.
To adapt the RIM method to operate with linguistic labels, it is first necessary to
associate with each label the numerical value to be used, which in this case will be a
triangular fuzzy number x̃ = (x_1, x_2, x_3). Then, the distance between two linguistic
labels can be obtained by expression (3):

dist_L : L_X × L_Y → R

dist_L(L_X, L_Y) = dist(X̃, Ỹ) = \sqrt{\frac{1}{3}\left((x_1 − y_1)^2 + (x_2 − y_2)^2 + (x_3 − y_3)^2\right)}   (3)
As is known, the RIM method uses the minimum distance in its working conception;
therefore, the following formulation for linguistic labels is necessary.
Then, from this definition, we have the conditions to define the normalization
function.
Ideal Reference Method with Linguistic Labels … 119
f_L(L_kij, R_Lj, RI_Lj) → [0, 1]

f_L(L_kij, R_Lj, RI_Lj) =
  1                                                    if L_kij ∈ RI_Lj
  1 − d^L_min(L_kij, RI_Lj) / dist_L(L_Aj, L_Cj)       if L_kij ∈ [L_Aj, L_Cj] ∧ L_kij ∉ RI_Lj ∧ dist_L(L_Aj, L_Cj) ≠ 0
  1 − d^L_min(L_kij, RI_Lj) / dist_L(L_Dj, L_Bj)       if L_kij ∈ [L_Dj, L_Bj] ∧ L_kij ∉ RI_Lj ∧ dist_L(L_Dj, L_Bj) ≠ 0
  0                                                    otherwise
(5)

where:
• d^L_min(L_kij, RI_Lj) = d^L_min(L_kij, [L_Cj, L_Dj]), which is obtained through expression (4).
• dist_L(L_Aj, L_Cj) and dist_L(L_Dj, L_Bj) are obtained through expression (3).
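A hedged sketch of expressions (4) and (5): we take the minimum distance from a label to the Reference Ideal set [L_C, L_D] as 0 when the label belongs to it and as the minimum of its distances to the endpoints otherwise. Deciding on which side of the Reference Ideal a label lies uses the modal (second) component of the triangles as an ordering key, which is an assumption of this illustration, not something the chapter states.

```python
from math import sqrt

def dist_l(x, y):
    """Expression (3): distance between triangular fuzzy numbers."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / 3.0)

def d_min(x, ri):
    """Expression (4), sketched: minimum distance from label x to the
    Reference Ideal ri = (L_C, L_D); 0 if x belongs to it."""
    lc, ld = ri
    if lc[1] <= x[1] <= ld[1]:          # x lies inside the Reference Ideal
        return 0.0
    return min(dist_l(x, lc), dist_l(x, ld))

def f_l(x, rng, ri):
    """Expression (5): normalization of a linguistic valuation x, with
    Range rng = (L_A, L_B) and Reference Ideal ri = (L_C, L_D)."""
    la, lb = rng
    lc, ld = ri
    if lc[1] <= x[1] <= ld[1]:
        return 1.0
    if la[1] <= x[1] <= lc[1] and dist_l(la, lc) != 0:
        return 1.0 - d_min(x, ri) / dist_l(la, lc)
    if ld[1] <= x[1] <= lb[1] and dist_l(ld, lb) != 0:
        return 1.0 - d_min(x, ri) / dist_l(ld, lb)
    return 0.0

# Context of the worked example: Range [(0,1,2), (8,9,10)],
# Reference Ideal collapsed to the single label (8, 9, 10).
rng = ((0, 1, 2), (8, 9, 10))
ri = ((8, 9, 10), (8, 9, 10))
print(f_l((8, 9, 10), rng, ri))  # 1.0
print(f_l((5, 6, 7), rng, ri))   # 0.625, i.e. 1 - 3/8
```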
Starting from the definition of the minimum distance to the Reference Ideal (expression 4)
for a working domain with linguistic labels, the normalization function (expression 5)
was established for the new context, which allows modifying some steps of the RIM
algorithm [9], as shown below:
Step 1. Define the context.
In this case, the information associated with each criterion C_j consists of linguistic terms;
therefore, we define:
• The Range R_Lj = [L_Aj, L_Bj], which is a set of linguistic labels.
• The Reference Ideal RI_Lj = [L_Cj, L_Dj], which is a set of linguistic labels such that
RI_Lj ⊆ R_Lj.
• The weight w_j associated with the criterion.
Step 2. Obtain the decision matrix, where the valuations issued l_kij are linguistic
terms such that l_kij ∈ R_Lj.
    ( l_k11  l_k12  ···  l_k1n )
V = ( l_k21  l_k22  ···  l_k2n )
    (  ···    ···   ···   ···  )
    ( l_km1  l_km2  ···  l_kmn )
Step 5. Calculate the distance to the ideal and non-ideal solutions of each alternative
A_i:

A_i^+ = \sqrt{\sum_{j=1}^{n} (p_{ij} − w_j)^2}   and   A_i^− = \sqrt{\sum_{j=1}^{n} (p_{ij})^2},

where i = 1, 2, . . . , m and j = 1, 2, . . . , n.
Step 6. Calculate the relative index to the reference ideal of each alternative A_i:

R_i = A_i^− / (A_i^+ + A_i^−), where 0 ≤ R_i ≤ 1, i = 1, 2, . . . , m.
Step 7. Rank the alternatives Ai in descending order from the relative index Ri .
If the relative index R_i is close to 1, it indicates that the alternative is very good.
However, if this value is close to 0, it is interpreted that the alternative must
be rejected.
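Steps 5 through 7 can be sketched numerically. Here p_ij is taken as the weighted normalized valuation (the weight w_j applied to the normalized matrix, as in RIM [9]); the weights and values below are hypothetical, and the sketch is an illustration rather than the chapter's implementation.

```python
from math import sqrt

def rank_alternatives(p, w):
    """Steps 5-7: distances to the ideal/non-ideal solutions, relative
    indexes, and descending ranking of the alternatives.

    p : weighted normalized matrix, with p[i][j] in [0, w[j]]
    w : criteria weights
    """
    scores = []
    for i, row in enumerate(p):
        a_plus = sqrt(sum((pij - wj) ** 2 for pij, wj in zip(row, w)))
        a_minus = sqrt(sum(pij ** 2 for pij in row))
        scores.append((a_minus / (a_plus + a_minus), i))
    return sorted(scores, reverse=True)  # best alternative first

# Toy 2-alternative, 2-criteria example (hypothetical values).
w = [0.6, 0.4]
p = [[0.6, 0.1],   # A1: ideal on C1, poor on C2
     [0.3, 0.4]]   # A2: ideal on C2, middling on C1
for r, i in rank_alternatives(p, w):
    print(f"A{i + 1}: R = {r:.3f}")
```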
As can be observed, the RIM algorithm is modified in the following aspects:
• In step 1, the Range definition and Reference Ideal use linguistic data.
• The judgment or valuations matrix V is formed by linguistic labels.
• The normalization function uses the linguistic labels and the linguistic labels sets
as arguments.
4 Illustrative Example
To show the use of the proposed method, we apply the example used in [50] about a
decision-making maintenance problem in an engine factory.
The decision-making problem consists of deciding which is the best system for
cleaning the pieces in the maintenance of four-stroke engines. In this problem, we
have the following alternatives:
• A1: Conventional cleaning
• A2: Chemical cleaning
• A3: Ultrasonic cleaning
To evaluate the different alternatives, the following criteria were used:
• C1: Total annual operation cost
• C2: System productivity
• C3: System load capacity
• C4: Cleaning efficiency
• C5: Harmful effects
In this case, the different criteria were evaluated for each alternative through the
set of linguistic labels defined in Table 1 and its graphic representation is observed
in Fig. 1.
Fig. 1 Graphic representation of the fuzzy numbers with their respective linguistic label
C1 C2 C3 C4 C5
A1 Medium good Medium good Fair Fair Medium good
A2 Medium good Fair Very good Medium good Very poor
A3 Fair Very good Medium good Medium good Medium good
The expert evaluated the previously defined alternatives to solve the established
problem, as shown in Fig. 2 [50].
When RIM is applied, it is necessary to consider the context (see Table 2). In this
case, the Range and the Reference Ideal are the same for all criteria.
When the fuzzy number corresponding to each linguistic label is substituted in the
decision matrix (Fig. 1) and in the working context (Table 2), we obtain the new
decision matrix, as well as the Range and the Reference Ideal (Table 3).
Tables 4, 5 and 6 show the different steps of the algorithm.
Table 3 Representation with fuzzy number of the Decision Matrix, the Range and the Reference
Ideal
Alternatives C1 C2 C3 C4 C5
A1 (5, 6, 7) (5, 6, 7) (4, 5, 6) (4, 5, 6) (5, 6, 7)
A2 (5, 6, 7) (4, 5, 6) (8, 9, 10) (5, 6, 7) (0, 1, 2)
A3 (4, 5, 6) (8, 9, 10) (5, 6, 7) (5, 6, 7) (5, 6, 7)
Range RL1 RL2 RL3 RL4 RL5
LA (0, 1, 2) (0, 1, 2) (0, 1, 2) (0, 1, 2) (0, 1, 2)
LB (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10)
Reference ideal  RI_L1  RI_L2  RI_L3  RI_L4  RI_L5
                 (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10) (8, 9, 10)
Finally, it can be said that the order of the alternatives is A3 > A1 > A2, which is
equal to the result obtained with LTOPSIS; the R_i values are different but
very close.
When applying the LTOPSIS method to this decision problem, the relative index
obtained for each alternative is shown in Table 7.
As observed, applying the LTOPSIS method and the LRIM method yields
the same order of the alternatives (A3 > A1 > A2), and the values of the
relative indexes are very close.
On the other hand, it should be noted that the LRIM method offers advantages
with respect to LTOPSIS, because the LRIM method uses the same working principle
as RIM [9], and the RIM method does not present rank reversal. The LRIM method
only modifies the distance function to a set (expression 4) and the normalization
function (expression 5), which in this case operate with linguistic labels.
5 Conclusions
There are many multicriteria decision methods that can be applied to a decision-making
problem to obtain the best alternative. Among them, in this paper we have
focused on TOPSIS and RIM because of their algorithmic resemblance. Since we
worked with linguistic variables, RIM had to be adapted to manage
these data, whereas such an adaptation of TOPSIS had already been developed. Therefore, in this
paper a study of RIM was carried out and a modification was proposed to operate
with linguistic labels, arriving at the following main conclusions:
• It was only necessary to modify the working method to determine the minimum
distance to the Reference Ideal and the normalization function.
• Through the example used to show the work with L-RIM, it was observed that
the values obtained for the relative index were very close to those obtained
with LTOPSIS for examples well known in the literature. However, this
will not always happen, since TOPSIS cannot operate when the best value for a
certain criterion is not an extreme value (maximum or minimum) but a value
included between them.
Acknowledgements This work has been partially funded by projects TIN2014-55024-P and
TIN2017-86647-P from the Spanish Ministry of Economy and Competitiveness, P11-TIC-8001
from the Andalusian Government, and FEDER funds. The support provided by Antonio
Nariño University, Colombia, is also acknowledged.
References
1. Keeney, R.L., Raiffa, H.: Decisions with Multiple Objectives: Preferences and Value Tradeoffs.
Wiley, New York (1976)
2. Saaty, T.L.: The analytic hierarchy process. McGraw-Hill, New York (1980)
3. Saaty, T.L.: Fundamentals of the Analytic Network Process. ISAHP, Kobe, Japan (1999)
4. Edwards, W., Barron, F.H.: SMARTS and SMARTER: improved simple methods for multiat-
tribute utility measurement. Organ. Behav. Hum. Decis. Process. 60, 306–325 (1994)
5. Roy, B.: Classement et choix en présence de points de vue multiples (la méthode ELECTRE).
Revue Francaise d’Informatique et de Recherche Opérationnelle 8, 57–75 (1968)
6. Brans, J.P., Vincke, P., Mareschal, B.: How to select and how to rank projects: the PROMETHEE
method. Eur. J. Oper. Res. 24, 228–238 (1986)
7. Hwang, C.L., Yoon, K.: Multi-attribute Decision Making: Methods and Applications. Springer-
Verlag, Berlin (1981)
8. Opricovic, S.: Multi-criteria optimization of civil engineering systems. Faculty of Civil Engi-
neering. Belgrade (1998)
9. Cables, E., Lamata, M.T., Verdegay, J.L.: RIM-reference ideal method in multicriteria decision
making. Inf. Sci. 337, 1–10 (2016)
10. Bozbura, F.T., Beskese, A., Kahraman, C.: Prioritization of human capital measurement indi-
cators using fuzzy AHP. Expert Syst. Appl. 32, 1100–1112 (2007)
11. Wang, Y.M., Luo, Y., Hua, Z.: On the extent analysis method for fuzzy AHP and its applications.
Eur. J. Oper. Res. 186, 735–747 (2008)
12. Dagdeviren, M., Yuksel, I.: Developing a fuzzy analytic hierarchy process (AHP) model for
behavior-based safety management. Inf. Sci. 178, 1717–1733 (2008)
13. Buyukozkan, G., Cifci, G., Guleryuz, S.: Strategic analysis of healthcare service quality using
fuzzy AHP methodology. Expert Syst. Appl. 38, 9407–9424 (2011)
14. Chou, C.H., Liang, G.S., Chang, H.C.: A fuzzy AHP approach based on the concept of possi-
bility extent. Qual. Quant. 47, 1–14 (2013)
15. Dabbaghian, M., Hewage, K., Reza, B., et al.: Sustainability performance assessment of green
roof systems using fuzzy-analytical hierarchy process (FAHP). Int. J. Sustain. Build. Technol.
Urban Dev. 5, 1–17 (2014)
16. Kubler, S., Voisin, A., Derigent, W., et al.: Group fuzzy AHP approach to embed relevant data
on communicating material. Comput. Ind. 65, 675–692 (2014)
17. Sánchez-Lozano, M., García-Cascales, M.S., Lamata, M.T.: Evaluation of optimal sites to
implant solar thermoelectric power plants: case study of the coast of the Region of Murcia,
Spain. Comput. Ind. Eng. 87, 343–355 (2015)
18. Ayag, Z., Ozdemir, R.: An intelligent approach to ERP software selection through fuzzy ANP.
Int. J. Prod. Res. 45, 2169–2194 (2007)
19. Onut, S., Tuzkaya, U.R., Torun, E.: Selecting container port via a fuzzy ANP-based approach:
a case study in the Marmara Region, Turkey. Trans. Policy 18, 182–193 (2011)
20. Kang, H.Y., Lee, A.H., Yang, C.Y.: A fuzzy ANP model for supplier selection as applied to IC
packaging. J. Intell. Manuf. 23, 1477–1488 (2012)
21. Vahdani, B., Hadipour, H., Tavakkoli-Moghaddam, R.: Soft computing based on interval valued
fuzzy ANP-A novel methodology. J. Intell. Manuf. 23, 1529–1544 (2012)
22. Roy, B.: ELECTRE III: Un algorithme de rangement fondé sur une représentation floue des
préférences en présence de critéres multiples. Cahiers du Centre d´Etudes de recherche oper-
ationnelle, 20, 3–24 (1978)
23. Montazer, G.A., Saremi, H.Q., Ramezani, M.: Design a new mixed expert decision aid-
ing system using fuzzy ELECTRE III method for vendor selection. Expert Syst. Appl. 36,
10837–10847 (2009)
24. Roy, B., Skalka, J.: ELECTRE IS, aspects méthodologiques et guide d´utilisation. Université
Paris-Dauphine, Paris, Cahier du LAMSADE (1985)
25. Yu, W.: ELECTRE TRI: Aspects methodologiques et manuel d´utilisation. Universite Paris-
Dauphine, Document du LAMSADE (1992)
26. Sevkli, M.: An application of the fuzzy ELECTRE method for supplier selection. Int. J. Prod.
Res. 48, 3393–3405 (2010)
27. Wu, M.-C., Chen, T.-Y.: The ELECTRE multicriteria analysis approach based on Atanassov’s
intuitionistic fuzzy sets. Expert Syst. Appl. 38, 12318–12327 (2011)
28. Hatami-Marbini, A., Tavana, M., Moradi, M., et al.: A fuzzy group Electre method for safety
and health assessment in hazardous waste recycling facilities. Saf. Sci. 51, 414–426 (2013)
29. Devi, K., Yadav, S.P.: A multicriteria intuitionistic fuzzy group decision making for plant
location selection with ELECTRE method. Int. J. Adv. Manuf. Technol. 66, 1219–1229 (2013)
Mabel Frias, Yaima Filiberto, Gonzalo Nápoles, Rafael Falcon, Rafael Bello
and Koen Vanhoof
Abstract Fuzzy Cognitive Maps (FCMs) can be defined as recurrent neural net-
works that allow modeling complex systems using concepts and causal relations.
While this Soft Computing technique has proven to be a valuable knowledge-based
tool for building Decision Support Systems, further improvements related to its
transparency are still required. In this paper, we focus on designing an FCM-based
model where both the causal weights and concepts’ activation values are described by
words like low, medium or high. Hybridizing FCMs and the Computing with Words
paradigm leads to cognitive models closer to human reasoning, making them more
comprehensible for decision makers. The simulations using a well-known case study
related to simulation scenarios illustrate the soundness and potential application of
the proposed model.
1 Introduction
Fuzzy Cognitive Maps (FCMs) [1] can be seen as neural networks that allow modeling
the dynamics of complex systems using concepts and the causal relations between them.
They continue to grow in popularity within the scientific community as a decision-making
method, where the transparency attached to the network is one of
their most relevant features.
Actually, the transparency of these knowledge-based networks has motivated
researchers to develop interpretable classifiers. As an example, Nápoles [2] pro-
posed an FCM using a single-output architecture to predict the resistance of HIV
mutations to existing drugs. While this model was able to notably outperform the tra-
ditional classifiers reported in the literature, such results could not easily be extended
to other application domains.
In scenario analysis, the problem shifts from obtaining high prediction rates to
exploiting the model by performing WHAT-IF simulations. More explicitly, due to
the fact that FCMs are comprised of cause-effect relations, the experts can explore
the impact of activating a subset of concepts over the whole system, where both
the activation of concepts and causal weights are described by numerical values.
However, this can be a challenge for experts since human beings usually think in a
more qualitative, symbolic way.
Besides, if we analyze how we solve day-to-day activities, we realize that,
depending on the aspects presented by each problem, we can deal with different
numerical values; in other cases, however, the problem presents qualitative aspects
that are complex to evaluate by means of exact values [3].
Combining the graphical nature of FCMs with natural language techniques to
describe the concepts’ activation values and the causal relationships between them
has recently emerged as a very attractive research direction.
The use of linguistic terms or words to describe the whole cognitive network
actually goes beyond knowledge representation: preserving these features during
the neural inference rule is pivotal for developing an accurate linguistic model. In
this paper, we further explore the hybridization between FCMs and the Computing
with Words (CWW) [4] paradigm, in which the activation vectors and the weight
matrix are described using words, thus removing the need for membership
functions. With this goal in mind, we adopt the symbolic CWW model based on
ordinal scales, since it is a very intuitive approach that provides high interpretability.
Simulations on a case study evidence the theoretical soundness and broad
potential of our proposals.
The paper is organized as follows. The next section gives a short introduction to FCMs.
Section 3 describes the basic principles behind the CWW paradigm. Section 4 reviews
works that combine FCMs and CWW, while Sect. 5 presents the proposed models.
Section 6 presents the simulations, whereas Sect. 7 summarizes the concluding
remarks and the research directions to be pursued in the near future.
Comparative Analysis of Symbolic Reasoning Models … 129
• wij > 0 indicates direct (positive) causality between the concepts Ci and Cj;
that is, an increase (decrease) in the value of Ci leads to an increase (decrease) in
the value of Cj.
• wij < 0 indicates inverse (negative) causality between the concepts Ci and Cj;
that is, an increase (decrease) in the value of Ci leads to a decrease (increase) in
the value of Cj.
• wij = 0 indicates no relationship between Ci and Cj.
Equation 1 shows how to propagate an initial stimulus across the cognitive network
comprised of N neural processing entities, where A_j^(t) denotes the activation value
of concept Cj at the t-th iteration, whereas wij is the causal weight connecting
concepts Ci and Cj. Likewise, the function f(.) is a transfer function that keeps the
inner product within the allowed activation interval, e.g. f(x) = 1/(1 + e^(−λx)). Other
alternatives for the transfer function are the bivalent, the trivalent and the hyperbolic
tangent functions.
A_i^(t+1) = f( Σ_{j=1}^{N} A_j^(t) · w_ji ),  i ≠ j    (1)
The above reasoning rule is repeated until either the network converges to a fixed-
point attractor or a maximal number of cycles is reached. The former scenario implies
that a hidden pattern was discovered [6] whereas the latter suggests that the system
outputs are cyclic or chaotic.
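As a numerical reference point, the reasoning rule of Eq. 1 can be sketched in a few lines. The toy weight matrix and initial activations below are illustrative, not taken from the paper.

```python
import math

def fcm_step(W, A, lam=1.0):
    """One application of Eq. (1): A_i' = f(sum_{j != i} A_j * w_ji),
    using the sigmoid transfer function f(x) = 1 / (1 + e^(-lam*x))."""
    n = len(A)
    out = []
    for i in range(n):
        s = sum(A[j] * W[j][i] for j in range(n) if j != i)
        out.append(1.0 / (1.0 + math.exp(-lam * s)))
    return out

def fcm_inference(W, A0, lam=1.0, max_iters=100, tol=1e-6):
    """Repeat the reasoning rule until a fixed point or max_iters is reached."""
    A = list(A0)
    for _ in range(max_iters):
        A_next = fcm_step(W, A, lam)
        if max(abs(a - b) for a, b in zip(A_next, A)) < tol:
            return A_next            # fixed-point attractor discovered
        A = A_next
    return A                         # possibly cyclic or chaotic behavior

# Toy 3-concept map; W[j][i] is the weight from concept j to concept i.
W = [[0.0, 0.6, -0.4],
     [0.5, 0.0,  0.3],
     [0.0, -0.7, 0.0]]
A = fcm_inference(W, [0.8, 0.3, 0.5], lam=2.0)
```

With these weights the sigmoid update is a contraction, so the map settles into a fixed-point attractor.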
In 1973, Zadeh introduced the notion of a linguistic variable, which allows computing
with words instead of numbers [4]. This symbolic processing paradigm handles
linguistic variables (i.e., variables whose values are words or sentences of natural
language). The notion of a linguistic variable is adopted to describe situations that
cannot be clearly defined in quantitative terms. Linguistic variables allow translating the
130 M. Frias et al.
natural language into logical or numerical statements. The relevance of the CWW
paradigm in decision making has led to the emergence of different linguistic com-
putational models, such as:
• Linguistic Computational Model based on the Extension Principle [7, 8]. In this
model, the semantics of the linguistic terms are given by fuzzy numbers defined in
the [0, 1] interval, usually described by membership functions. The following
expression formalizes the linguistic aggregation operator attached to this model,
where S^n symbolizes the n-Cartesian product, F̃ is an aggregation operator based
on the extension principle, F(R) is the set of fuzzy sets over the real numbers, and
app1(·) is an approximation function that returns a label from the linguistic
term set S.
S^n  --F̃-->  F(R)  --app1(·)-->  S
• Linguistic Computational Symbolic Model based on an ordinal scale [9]. This model
performs the computations on the indexes of the linguistic terms. Usually, it imposes
a linear order on the set of linguistic terms S = {S0, …, Sg}, where Si < Sj if and
only if i < j. Formally, it can be expressed as:
S^n  --R-->  [0, g]  --app2(·)-->  {0, …, g}  -->  S
• Linguistic Computational Model based on linguistic 2-tuples [3]. This model
represents linguistic information by a pair (Si, αi), where Si ∈ S and αi ∈ [−0.5, 0.5);
app3(·) is the aggregation operator for 2-tuples, whereas the functions Δ and Δ^−1
transform numerical values into 2-tuples and vice-versa without losing information.
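The mechanics of the ordinal and 2-tuple models can be sketched as follows. Neither the paper's operators nor its term set are reproduced here; the eight-term scale and the sample inputs are illustrative.

```python
# Symbolic aggregation sketches for the ordinal and 2-tuple models.
# The term set S and the sample inputs are illustrative, not from the paper.
S = ["NA", "VL", "L", "ML", "M", "MH", "H", "VH"]    # g = len(S) - 1 = 7

def ordinal_aggregate(indices):
    """Ordinal model: operate on term indexes, then round back to a label."""
    beta = sum(indices) / len(indices)                # aggregated value in [0, g]
    return S[min(len(S) - 1, max(0, round(beta)))]    # app2: nearest index

def delta(beta):
    """2-tuple model: Delta(beta) = (S_i, alpha) with alpha in [-0.5, 0.5)."""
    i = int(round(beta))
    return (S[i], beta - i)

def delta_inv(two_tuple):
    """Inverse transformation: (S_i, alpha) -> i + alpha, no information loss."""
    label, alpha = two_tuple
    return S.index(label) + alpha

avg = ordinal_aggregate([6, 4, 2])    # indexes of H, M, L
t = delta(4.3)                        # a 2-tuple close to "M"
```

Note how `delta_inv(delta(x))` recovers `x` exactly, which is the "without losing information" property of the 2-tuple representation.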
4 Related Work
Recent work on FCMs has combined their graphical nature with natural language tech-
niques to describe both the concepts' activation values and the causal relations
between them, thus obtaining a qualitative reasoning model.
For example, in 2014 a model was proposed for decision making with an FCM where
the causal relations are initially represented by linguistic 2-tuples; however, to perform
the FCM's inference process, these values are transformed into numeric values [10].
An FCM for modeling consensus processes is proposed in [11], where linguistic
2-tuples are used as a form of causal knowledge representation; but again, the inference
process is performed over numerical values.
Rickard et al. [12] introduced, in 2015, a symbolic model based on interval type-2
(IT2) fuzzy membership functions and the weighted power mean operator [13–16].
The membership functions are calculated from multiple-user interval inputs corre-
sponding to vocabulary words, as described by Hao and Mendel in [17]. The
aggregation functions used are based upon the fuzzy neuron model described in
[18], which allows for the separate aggregation of positively and negatively
causal inputs to each node, followed by a final aggregation of these two aggregates.
Rather than using a distance function to map the IT2 node outputs at each iteration
into one of the IT2 vocabulary words, they use the Jaccard similarity measure for
this purpose. This method was applied for the first time to a real medical dataset to
categorize celiac disease (CD); this work shows the good performance of the CWW
FCM method in a classification task [19].
In 2016, Gónzalez et al. [20] used the CWW paradigm to model project portfo-
lio risk interdependencies with an FCM. In this article, the weight matrix is represented
using the 2-tuple linguistic model. This proposal makes the relationships between the
risks easier to visualize and understand, but it is not clear whether the activation of the
concepts (risks) is expressed with numerical values or with the 2-tuple model as well.
That same year, Salah Hasan Saleh and his colleagues [21] proposed an FCM model
in which the weight matrix is expressed with hesitant fuzzy sets [22]. This model was
used to improve the interpretability of diagnostic models of cardiovascular diseases.
Although this proposal achieves more flexibility in expressing the causal relations
between concepts, the map was only used to show the relations between the symptoms,
and there is no inference process.
More recently, in [23] the authors presented a model that performs the neural reason-
ing process of FCM-based systems using the Linguistic Computational Symbolic Model
based on an ordinal scale [9] to represent the concepts' activation values and the weight
matrix. This proposal inherits the drawbacks of the symbolic CWW model it uses:
loss of information, lack of accuracy, and no parameters to adjust. Aiming to overcome
these drawbacks, in [24] the authors introduced a model that replaces the numerical
components of the FCM reasoning with linguistic terms represented by Triangular
Fuzzy Numbers (TFNs) [25]. This model was applied to analyze the effects
of different variables (i.e., concepts) leading to the presence of chondromalacia in
a patient.
As can be observed, interest in restoring to FCMs the fuzzy character of
Kosko's initial proposal has been growing. However, not all proposals achieve a com-
pletely fuzzy inference process; most of them only represent the causal relationships
through linguistic terms that are transformed into numerical values before performing
the inference. That is why we carry out a comparative study between
Rickard's proposal and the methods proposed at ISFUROS 2017 and MICAI 2017,
since in these proposals the entire inference process is executed with linguistic terms.
In this section, we describe a model where the concepts' activation values and the weights
defining the semantics of FCM-based systems are described using words instead
of numerical values. The goal of this model is to improve the
transparency of FCM-based models, but the reasoning process is not trivial; we have
to solve two key problems: (i) how to multiply two linguistic terms or words, and
(ii) how to add the results of these products.
Problem 1. What does A_j^(t) w_ji mean? Does it represent the product of two linguistic
terms defined in the CWW paradigm?
Problem 2. How can we define a transfer function f(.) that takes a set of words as its
argument? Is this function really needed?
In order to answer the above questions, let us assume a basic model comprising
a set of linguistic terms S = {NA (Not Activated), VL/−VL (Very Low), L/−L
(Low), ML/−ML (Moderate Low), M/−M (Moderate), MH/−MH (Moderate
High), H/−H (High), VH/−VH (Very High)}. The negative linguistic terms in
S will only be used to describe negative causal weights wij between two concepts,
since we assume that the concepts' activation values C = {C1, C2, …, CN} are
always positive.
Aiming at mapping the product A_j^(t) w_ji, we consider the operator described in Eq. 2,
where ς(w_ji) and ς(A_j^(t)) are the Gaussian fuzzy numbers (GFNs) [26] for w_ji and
A_j^(t), respectively. The results of these products are then aggregated as follows:

ς(C_i^(t+1)) = Σ_{j=1}^{N_i} I_j(w_ji, A_j^(t))    (3)
Usually, the fuzzy number obtained from Eq. 3 does not match any linguistic
term in the initial linguistic term set, so a linguistic approximation process is needed.
The next step of the proposed symbolic reasoning model is devoted to recovering
the linguistic term attached to ς(C_i^(t+1)). With this goal in mind, we use the deviation
between two GFNs as a distance function [31], which can be defined as follows:
δ(â, b̂) = √( [ (a^m − b^m)² + (a^σ1 − b^σ1)² + (a^σ2 − b^σ2)² ] / 3 )    (4)
Equation 5 displays the reasoning rule for this configuration, which computes the
corresponding linguistic term for the i-th linguistic concept. This function determines
the linguistic term reporting the minimal distance between its GFN and the one
resulting from Eq. 3. However, the linguistic term computed in this step could be
defined by a GFN comprising negative values, which is not allowed in our activation
model. Aiming at overcoming this issue, we rely on a transfer function for symbolic
domains, shown in Fig. 2.
A_i^(t+1) = argmin_{S_k ∈ S} δ(ς(C_i^(t+1)), ς(S_k))    (5)
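The retrieval step of Eqs. 4 and 5 can be sketched as follows. The GFN parameters attached to each label here are illustrative placeholders, not the actual membership functions of Fig. 1.

```python
import math

def gfn_distance(a, b):
    """Deviation between two GFNs (m, s1, s2), following Eq. (4)."""
    return math.sqrt(((a[0] - b[0]) ** 2 +
                      (a[1] - b[1]) ** 2 +
                      (a[2] - b[2]) ** 2) / 3.0)

def nearest_term(c, vocabulary):
    """Eq. (5): the label whose GFN is closest to the aggregated GFN c."""
    return min(vocabulary, key=lambda label: gfn_distance(c, vocabulary[label]))

# Illustrative vocabulary of GFNs (mean, left spread, right spread).
vocabulary = {"L": (0.12, 0.10, 0.10),
              "M": (0.50, 0.11, 0.11),
              "H": (0.82, 0.11, 0.11)}
term = nearest_term((0.47, 0.12, 0.12), vocabulary)
```

Here the aggregated GFN (0.47, 0.12, 0.12) is closest to the GFN of "M", so that label is returned as the linguistic activation value.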
It should be stated that the linguistic FCM-based model presented in this section
preserves its recurrent nature. This implies that it will produce a state vector com-
prised of linguistic terms at each iteration until either a fixed-point is discovered or
a maximal number of iterations is reached.
Operating with words leads to other advantages, which are related to the system
convergence. After a certain number of iterations, a linguistic FCM will converge to
either a fixed-point attractor or a limit cycle (see [32] for further details) but chaos
is not possible. This happens because a linguistic FCM is a closed system that will
produce at most |S|^N different responses. Therefore, after |S|^N iterations, the map
will produce a previously visited state.

Fig. 3 Example of a linguistic FCM-based model
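This revisit argument can be illustrated with a toy deterministic update over term indexes; the update rule below is arbitrary, not the paper's inference rule.

```python
# A linguistic FCM state is one of |S|**N vectors of terms, so any
# deterministic iteration must eventually revisit a state. This toy
# update on tuples of term indexes illustrates the revisit detection.
S = ["NA", "VL", "L", "ML", "M", "MH", "H", "VH"]

def toy_update(state):
    """Arbitrary deterministic update on a tuple of term indexes."""
    return tuple((2 * s + 1) % len(S) for s in state)

def run_until_revisit(state):
    """Iterate until a state repeats; return steps taken and cycle length."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state, t = toy_update(state), t + 1
    return t, t - seen[state]

steps, cycle = run_until_revisit((0, 3, 6))   # a 3-concept linguistic state
```

For this toy map the trajectory reaches a fixed point (a cycle of length 1) well before the worst-case bound of |S|^3 = 512 states.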
To illustrate how our model operates, let us consider the FCM displayed in Fig. 3,
which comprises 5 concepts and 7 causal relations.
The goal of this example is to compute the linguistic activation term for the
concept C5 given the following activation sequence: C1 ← High(H),
C2 ← High(H), C3 ← Medium(M), C4 ← Low(L). Once the concepts have been
activated, we can perform the reasoning process as explained above. This implies
computing the linguistic activation value A5 as the result of aggregating the linguis-
tic activation terms attached to concepts C1–C4 and their corresponding linguistic
weights. Next, we illustrate the operations related to one iteration of the symbolic
reasoning process:
I1 = ς(H)ς(−H) = [0.82, 0.11, 0.11] ∗ [−0.83, 0.11, 0.11] = [−0.6806, 0.1815, 0.1815]
I2 = ς(H)ς(M) = [0.82, 0.11, 0.11] ∗ [0.50, 0.11, 0.11] = [0.41, 0.1452, 0.1452]
I3 = ς(M)ς(−M) = [0.50, 0.11, 0.11] ∗ [−0.50, 0.11, 0.11] = [−0.25, 0.11, 0.11]
I4 = ς(L)ς(H) = [0.12, 0.10, 0.10] ∗ [0.82, 0.11, 0.11] = [0.0984, 0.1022, 0.1022]
then,

⋮

δ(ς(C5), S4) = √( [ (−0.5062 + 0.50)² + (0.4607 − 0.11)² + (0.4607 − 0.11)² ] / 3 ) = 0.28

⋮

δ(ς(C5), S15) = √( [ (−0.5062 − 1)² + (0.4607 − 0.08)² + (0.4607 − 0.08)² ] / 3 ) = 0.92
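These intermediate results can be reproduced programmatically. The product of the means is m_a·m_b; the rule for the spreads, |m_a|·σ_a + |m_b|·σ_b, is inferred from the printed values of I1–I4 (the operator of Eq. 2 is not reproduced here), so it should be read as an assumption rather than the paper's definition.

```python
import math

# GFNs (mean, left spread, right spread) for the labels of the worked example.
H, negH = (0.82, 0.11, 0.11), (-0.83, 0.11, 0.11)
L = (0.12, 0.10, 0.10)

def gfn_product(a, b):
    """Product rule inferred from the printed intermediates (an assumption):
    the means multiply; each spread becomes |m_a|*s_a + |m_b|*s_b."""
    return (a[0] * b[0],
            abs(a[0]) * a[1] + abs(b[0]) * b[1],
            abs(a[0]) * a[2] + abs(b[0]) * b[2])

def gfn_distance(a, b):
    """Deviation between two GFNs, following Eq. (4)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)

I1 = gfn_product(H, negH)   # ~ (-0.6806, 0.1815, 0.1815)
I4 = gfn_product(L, H)      # ~ ( 0.0984, 0.1022, 0.1022)
d15 = gfn_distance((-0.5062, 0.4607, 0.4607), (1.0, 0.08, 0.08))  # ~0.92
```

Under this assumed rule, the sketch matches the printed products and the distance to S15 up to rounding.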
This new model is included in the comparative analysis of the next section.
In this section, we present a case study in order to assess the reliability of the proposed
models for FCM-based systems.
The Mobile Payment System (MPS) was a project idea related to the fast evolving
world of mobile telecommunications. It was conceived as a prototype project to test
the validity and applicability of the FCM methodology developed. The idea behind
the MPS project is to allow mobile phone users to make small and medium payments
using their mobile phones [33], see Fig. 4.
In this subsection, we study the behavior of our proposal and of three FCM models
combined with Computing with Words.
The experiment is oriented to calculating the linguistic activation value of each
concept. This case study requires a fuzzification process, so the first step is to fuzzify
the numerical weights describing the causal relations between the concepts.
Figure 5 displays the triangular membership functions used before applying the
FCM-Ordinal (proposed in [23]) and FCM-TFN (proposed in [24]) models in the
simulation scenario. To apply the CWW FCM model (proposed in [12]), the numerical
values were fuzzified with type-2 fuzzy sets, and to apply the FCM-GFN model
(proposed in this paper), the fuzzification was performed using the membership
functions shown in Fig. 1.
The initial activation values of the externality nodes, i.e., those nodes with no in-links
(1, 11, 19, 21, 22, 23 and 24), were fixed to "High", and the remaining nodes were
initialized randomly. The simulation results are shown in Table 1.
As can be seen, the four models converge to similar results, since the factors with the
greatest activation values in the FCM-Ordinal, FCM-TFN, FCM-GFN and CWW FCM
models agree with those reported in [33] as most important for the success of the
Mobile Payment System.
If we compare these output vectors with the opinions of several interviewees (Table
A2 of [33]), no differences are observed between these results and the interviewees'
opinions. This similarity was calculated by applying the Euclidean distance
between the mean of the opinions and each model's output vector.
With this case study, we have illustrated the practical advantages of using sym-
bolic expressions to describe the FCM components and their reasoning mechanism.
The results achieved are logically coherent and in accordance with common sense. The
interpretability of these symbolic inference models is appreciated by users with no
background in Mathematics or Computer Science.
7 Conclusions
In this paper, we have presented a model to perform the neural reasoning process
of FCM-based systems using linguistic terms. This implies that both the concepts'
activation values and the weight matrix are qualitatively quantified by linguistic
terms instead of numerical values. The proposed model is particularly attrac-
tive in decision-making scenarios, since experts feel more comfortable describing the
problem domain using symbolic terms.
The simulations using a case study reveal that our model is capable of producing
similar qualitative values in both symbolic and numerical settings. This outcome
is encouraging and opens interesting research avenues, which are being explored
by the authors. For example, whether the ordinal model is the best choice for
operating on linguistic terms is questionable. Moreover, employing the same aggre-
gation operator to represent both the sum and the multiplication could be considered
unrealistic. In spite of these facts, we regard the proposed model as a baseline for
future studies in this field.
Acknowledgements The authors would like to thank John T. Rickard from Distributed Infinity,
Inc., Larkspur, CO, USA for his support with the simulations.
References
1. Kosko, B.: Fuzzy cognitive maps. Int. J. Man-Mach. Stud. 24, 65–75 (1986)
2. Nápoles, G., Grau, I., Bello, R., Grau, R.: Two-steps learning of fuzzy cognitive maps for
prediction and knowledge discovery on the HIV-1 drug resistance. Exp. Syst. Appl. 41(3),
821–830 (2014)
3. Herrera, F., Martínez, L.: A 2-tuple fuzzy linguistic representation model for computing with
words. IEEE Trans. Fuzzy Syst. 8(6), 746–752 (2000)
4. Zadeh, L.A.: Outline of a new approach to the analysis of complex systems and decision pro-
cesses. IEEE Trans. Syst. Man Cybern. SMC-3(1), 28–44 (1973)
5. Kosko, B.: Neural Networks and Fuzzy Systems: A Dynamic System Approach to Machine
Intelligence. Englewood Cliffs (1992)
6. Kosko, B.: Hidden patterns in combined and adaptive knowledge networks. Int. J. Approx.
Reason. 2(4), 377–393 (1988)
7. Bonissone, P.P., Decker, K.S.: Selecting uncertainty calculi and granularity: an experiment in
trading-off precision and complexity, pp. 217–247. Amsterdam, The Netherlands (1986)
8. Degani, R., Bortolan, G.: The problem of linguistic approximation in clinical decision making.
Int. J. Approx. Reason. 2, 143–162 (1988)
9. Delgado, M., Verdegay, J.L., Vila, M.A.: On aggregation operations of linguistic labels. Int. J.
Intell. Syst 8, 351–370 (1993)
10. Pérez-Teruel, K., Leyva-Vázquez, M., Espinilla, M.: Computación con palabras en la toma de
decisiones mediante mapas cognitivos difusos. Revista Cubana de Ciencias Informáticas 8(2),
19–34 (2014)
11. Pérez-Teruel, K., Leyva-Vázquez, M., Estrada-Sentí, V.: Mental models consensus process
using fuzzy cognitive maps and computing with words. Ing. Univ. 19(1), 173–188 (2015)
12. Rickard, J.T., Aisbett, J., Yager, R.R.: Computing with words in fuzzy cognitive maps. In:
Proceedings of World Conference on Soft Computing, pp. 1–6 (2015)
13. Dujmovic, J.: Continuous preference logic for system evaluation. IEEE Trans. Fuzzy Syst
15(6), 1082–1099 (2007)
14. Dujmovic, J., Larsen, H.L.: Generalized conjunction/disjunction. Int. J. Approx. Reason. 46,
423–446 (2007)
15. Rickard, J.T., Aisbett, J., Yager, R.R., Gibbon, G.: Fuzzy weighted power means in evaluation
decisions. In: 1st World Symposium on Soft Computing (2010)
16. Rickard, J.T., Aisbett, J., Yager, R.R., Gibbon, G.: Linguistic weighted power means: compari-
son with the linguistic weighted average. In: IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE 2011), pp. 2185–2192 (2011)
17. Hao, M., Mendel, J.M.: Encoding words into normal interval type-2 fuzzy sets: HM approach.
IEEE Trans. Fuzzy Syst. 24, 865–879 (2016)
18. Rickard, J.T., Aisbett, J., Yager, R.R.: A new fuzzy cognitive map structure based on the
weighted power mean. IEEE Trans. Fuzzy Syst. 23, 2188–2202 (2015)
19. Najafi, A., Amirkhani, A., Papageorgiou, E.I., Mosavi, M.R.: Medical decision making based
on fuzzy cognitive map and a generalization linguistic weighted power mean for computing
with words (2017)
20. Gónzalez, M.P., De La Rosa, C.G.B., Cedeña Moran, F.J.: Fuzzy cognitive maps and
computing with words for modeling project portfolio risks interdependencies. Int. J. Innov.
Appl. Stud. 15(4), 737–742 (2016)
21. Saleh, S.H., Rivas, S.D.L., Gomez, A.M.M., Mohsen, F.S., Vzquez, M.L.: Representación del
conocimiento mediante mapas cognitivos difusos y conjuntos de términos lingüisticos difusos
dudosos en la biomedicina. Int. J. Innov. Appl. Stud. 17(1), 312–319 (2016)
22. Torra, V., Narukawa, Y.: On hesitant fuzzy sets and decision. In: IEEE International Conference,
pp. 1378–1382 (2009)
23. Frias, M., Filiberto, Y., Nápoles, G., Vanhoof, K., Bello, R.: Fuzzy cognitive maps reasoning
with words: an ordinal approach. In: ISFUROS 2017 (2017)
24. Frias, M., Filiberto, Y., Nápoles, G., Garcia-Socarras, Y., Vanhoof, K., Bello, R.: Fuzzy cognitive
maps reasoning with words based on triangular fuzzy numbers. In: MICAI 2017 (2017)
25. Van Laarhoven, P.J.M., Pedrycz, W.: A fuzzy extension of Saaty's priority theory. Fuzzy Sets
Syst. 11, 229–241 (1983)
26. Pacheco, M.A.C., Vellasco, M.M.B.R.: Intelligent Systems in Oil Field Development Under
Uncertainty. Springer, Berlin, Heidelberg (2009)
27. Akther, S.U., Ahmad, T.: A computational method for fuzzy arithmetic operations. Daffodil
Int. Univ. J. Sci. Technol. 4(1), 18–22 (2009)
28. Reznik, L.: Fuzzy Controller Handbook. Newnes (1997)
29. Weihua, S., Peng, W., Zeng, S., Pen, B., Pand, T.: A method for fuzzy group decision making
based on induced aggregation operators and euclidean distance. Int. Trans. Oper. Res. 20,
579–594 (2013)
30. Xu, Z.S.: Fuzzy harmonic mean operators. Int. J. Intell. Syst. 24, 152–172 (2009)
31. Chen, C.T.: Extensions of the TOPSIS for group decision-making under fuzzy environment.
Fuzzy Sets Syst. 114, 1–9 (2000)
32. Nápoles, G., Papageorgiou, E., Bello, R., Vanhoof, K.: On the convergence of sigmoid fuzzy
cognitive maps. Inf. Sci. 349–350, 154–171 (2016)
33. Carvalho, J.P.: On the semantics and the use of fuzzy cognitive maps and dynamic cognitive
maps in social sciences. Fuzzy Sets Syst. 214, 6–19 (2013)
Fuzzy Cognitive Maps for Evaluating
Software Usability
Yamilis Fernández Pérez, Carlos Cruz Corona and Ailyn Febles Estrada
Abstract Usability assessment is a highly complex process given the variety of
criteria to consider, and it manifests imprecision, understood as the lack of concretion
about the values to be used, a synonym of ambiguity. The usability evaluation
method proposed in this work incorporates elements of Soft Computing, such as
fuzzy logic and fuzzy linguistic modeling. Furthermore, the use of fuzzy cognitive
maps allows incorporating the interrelations between criteria and therefore obtaining a
realistic global usability index. A mobile app was developed to evaluate the usability of
mobile applications based on this proposal. The application of this proposal in a
real-world environment shows that it is an operational solution: reliable, precise and
easy to interpret for use in industry.
Keywords Software quality · Soft computing · Fuzzy cognitive map · Fuzzy logic
1 Introduction
Usability is one of the most important attributes of software quality. It is very common
to define usability as a software product's ease of use, but this definition is ambiguous.
For this reason, there are several definitions according to the different approaches used
to measure it. The best-known definitions appear in ISO 9126, ISO 9241 and ISO 25010 [1].
The definition most used for the evaluation of usability is the one given by the ISO 25010
standard. It defines usability as "the extent to which a product can be used by specified
Y. F. Pérez (B)
University of Informatics Sciences, Havana, Cuba
e-mail: yamilisf@uci.cu
C. C. Corona (B)
University of Granada, Granada, Spain
e-mail: carloscruz@decsai.ugr.es
A. F. Estrada (B)
Cuban Information Technology Union, Havana, Cuba
e-mail: ailyn.febles@uniondeinformaticos.cu
users to achieve specified goals with effectiveness, efficiency and satisfaction in
a specified context of use" [2]. ISO describes usability as a combination of appropriateness
recognisability, learnability, operability, user error protection, user interface
aesthetics and accessibility.
The software usability assessment process is expensive because it requires
material resources and a team of well-trained specialists. It is a highly complex
process given the variety of criteria to consider. For that reason, it is necessary to
achieve a correlation between the software assessment results and the actual usability
of the product.
The usability criteria to be taken into account in the assessment of a software
product are grouped in different ways for a better understanding. The most widely
used groupings are the so-called hierarchical models, which decompose usability into
criteria organized in the form of an n-ary tree. Such a hierarchical decomposition is a
strategy widely used in different scientific disciplines. The most significant usability
models are McCall, Nielsen, ISO 9241, ISO 9126 and ISO 25010. These models
largely overlap: the same attribute may appear under different names, and in some
cases identical names denote different attributes, which only becomes apparent when
what is actually measured for the attribute at a low level is examined. The models
also mix attributes in different ways and place them at different points of the hierarchy.
The usability criteria are usually interdependent, because the preference of one
criterion over another is influenced by the others. With the increasing level of
understanding of usability, which transcends a simple taxonomy, the models have
evolved towards overlap and interrelation between these criteria. This causes a
group of criteria to influence quality in a contradictory way; for example, greater
appropriateness recognisability means better learnability.
In spite of this, the proposed solutions are purely hierarchical [3, 4]. There are
many methods that combine conventional Multicriteria Decision Making (MCDM)
methods with fuzzy concepts: some use fuzzy TOPSIS [5] or AHP [4, 6, 7], and
others use a fuzzy multi-criteria approach [4, 8, 9]. Nevertheless, assessing usability
from independent criteria introduces a bias in favor of or against the product.
As a result of the above problem, an evaluation method of usability that assesses the
interrelationship between criteria is the main contribution of this book chapter.
For this, the proposal uses elements of Soft Computing, such as fuzzy logic, fuzzy
linguistic modeling and Fuzzy Cognitive Maps (FCMs), as solution methods.
In addition, the proposal incorporates the restrictions of essential criteria, which is
considered another contribution.
The paper has the following structure: Sect. 2 analyzes and compares several existing
methods with the same purpose, Sect. 3 describes the methodology and methods
used to obtain the solution, and Sect. 4 defines a generic usability model, presents
the new method based on Soft Computing, and describes an app for assessing the
usability of mobile applications together with a case study in a Cuban company.
Finally, Sect. 5 is devoted to the conclusions and future work.
2 Related Works
Different methods have been developed for usability evaluation based on MCDM
and fuzzy theory [4–6, 8, 10–15]. Among the most used techniques is the fuzzy
multi-criteria approach, notably the Fuzzy TOPSIS and Fuzzy AHP methods or their
derivatives; however, it has the limitation of not taking the relations between the
criteria into account. This motivates an analysis with ANP, which incorporates
feedback and interdependence relationships among decision attributes and thus provides
a more accurate approach for modeling complex decision environments. ANP has
two disadvantages: first, it is difficult even for experts to provide a correct network
structure among the criteria, and different structures lead to different results. Second, to
form a supermatrix, all criteria have to be pair-wise compared with regard to all other
criteria, which is difficult and also unnatural [16].
In [1], an extensive review of several usability assessment models was pre-
sented. The authors remarked that there is great similarity among the models, which
allows modeling a general structure for their representation. In addition, to judge
usability, for cognitive and practical reasons the number of usability sub-characteristics
must be seven or fewer. A single level is insufficient, which is why sub-characteristics
are defined. Another element presented in that paper is that the values
of the criteria are heterogeneous because they come from both objective and subjective
criteria.
Most of the literature does not reflect the interdependency among the criteria.
Some models reflect this relation implicitly because they repeat measures
for the related attributes. The use of Soft Computing techniques allows obtaining
better results, although concluding that one specific Soft Computing technique is
better than another is not appropriate. In the case of the aggregation to obtain a global
usability value, both complete aggregation and partial aggregation must
be permitted.
If the most outstanding works are selected, it can be seen that no solution solves
all the problems encountered (see Table 1).
None of the analyzed methods allows partial aggregation of the criteria, and only
one solution incorporates interdependence relationships among criteria, as can be seen
in Table 1.
The usability evaluation method proposed in this paper incorporates the interre-
lation between criteria, the essential criteria and data independence to obtain a
realistic global usability index.
3 Methodology
Precision in a model assumes that its parameters exactly represent the perception
of the phenomenon or the characteristics of the actual modeled system [17]. This
does not happen in the modeling of interdependence, which manifests imprecision,
understood as the lack of concretion about the values to be used, a synonym of
ambiguity. Soft Computing is a methodology widely used in situations where the
data to be considered are not accurate but imprecise.
Quite often, the state of objects or phenomena is described through words or
sentences instead of numbers, so that the description is useful and appropriate. This is
the case of the linguistic variable, whose value establishes the description. These
variables are useful because they constitute a way of compressing information [18].
In addition, they help to characterize a phenomenon that may be ill-defined, complex
to define, or both. They are a means of translating concepts or linguistic descriptions
into numerical ones and treating them automatically. Linguistic modeling is based on
fuzzy sets and has proved its efficacy for representing information of a
qualitative nature.
Fuzzy Cognitive Maps are a technique developed by Kosko [19] for quantitative
modeling, based on the knowledge and experience of experts; an FCM is a fuzzy directed
graph in which the nodes represent concepts and the arcs represent the relationships
between concepts. In an FCM, there are three possible types of relation between
concepts: positive relation, negative relation, or absence of a relation. The degree of the
relationship is described through a fuzzy number or linguistic value defined in the
interval [−1, 1]. An FCM consisting of n concepts is represented by an n × n matrix,
known as the adjacency matrix, which is obtained from the values assigned to the arcs.
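Building the adjacency matrix from linguistic values of the influence variable can be sketched as follows. The numeric degrees assigned to the terms and the sample criteria are illustrative assumptions, since the chapter only fixes the interval [−1, 1].

```python
# Building the n x n adjacency matrix of an FCM from linguistic values
# of the Influence variable. The term-to-number mapping is illustrative.
INFLUENCE = {"negative high": -0.75, "negative low": -0.25,
             "none": 0.0, "positive low": 0.25, "positive high": 0.75}

def adjacency_matrix(concepts, edges):
    """edges maps (source, target) concept pairs to linguistic influence terms;
    absent pairs mean no relation (weight 0.0)."""
    idx = {c: i for i, c in enumerate(concepts)}
    n = len(concepts)
    A = [[0.0] * n for _ in range(n)]
    for (src, dst), term in edges.items():
        A[idx[src]][idx[dst]] = INFLUENCE[term]
    return A

# Hypothetical usability criteria and influences, for illustration only.
concepts = ["learnability", "operability", "accessibility"]
edges = {("learnability", "operability"): "positive high",
         ("accessibility", "learnability"): "positive low",
         ("operability", "accessibility"): "negative low"}
A = adjacency_matrix(concepts, edges)
```

Missing arcs default to 0.0, matching the "non-existence of relations" case.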
In this contribution, we decided to treat the interdependence between criteria
using Fuzzy Cognitive Maps, with the definition of the linguistic variable Influence
(I).
Also, there is a subset of criteria classified as essential (EC), determined from the
usability requirements. These criteria have associated restrictions, each linked to an
interval. The essential criteria are handled through these restrictions and the
definition of a penalization vector.
Fuzzy Cognitive Maps for Evaluating Software Usability 145
4 Proposed Model
The usability model UM is defined by the tuple (V, E_v, E_h, EC), where:
• V is the set of evaluation criteria.
• E_v is the set of vertical links.
• E_h is the set of horizontal links (influence).
• EC is the set of essential criteria.
UM (see Fig. 1) is constructed by levels: level 0 represents the usability index;
level 1, the sub-characteristics; level 2, the metrics obtained from the software testing
process and from expert assessments. A criterion is found in only one level, and the
union of all levels corresponds to the whole set to be assessed.
\[
Level_0 = \{U\}, \qquad \bigcap_{0 \le j \le l} Level_j = \emptyset, \qquad \bigcup_{0 \le j \le l} Level_j = V \tag{2}
\]
Each criterion has a weight (W) associated. At each level, there is a set of weight
vectors. The sum of the weights of the sibling criteria is equal to 1.
146 Y. F. Pérez et al.
\[
E_v \subset V \times V, \qquad E_v = \{(y, x) \,/\, x, y \in V,\ x \in Level_j,\ y \in Level_{j-1}\} \tag{3}
\]
The criteria at every level have a parent at the immediately previous level,
except for level 0 (see Eq. 4).
\[
\forall_{0 < j \le l}\ Level_j = \{x \,/\, (y, x) \in E_v,\ x \in V,\ y \in Level_{j-1}\} \tag{4}
\]
The resulting FCM is reviewed by each expert, and a value of the variable I is
associated with each edge. The FCMs obtained from the individual experts must then be
aggregated using a technique that allows for consensus; it is advisable to use a
consensus-building algorithm such as the one proposed in [20].
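Reference [20] describes a dedicated consensus-building algorithm; as a simple illustrative stand-in (not the authors' procedure), the expert maps can be aggregated element-wise, e.g. by averaging the individual adjacency matrices:

```python
import numpy as np

# Three hypothetical experts rating the same pair of sibling criteria;
# each provides a 2x2 adjacency matrix with values in [-1, 1].
expert_maps = [
    np.array([[0.0, 0.6], [0.2, 0.0]]),
    np.array([[0.0, 0.8], [0.4, 0.0]]),
    np.array([[0.0, 0.7], [0.3, 0.0]]),
]

# Element-wise mean as a naive aggregation; a real consensus process
# would iterate with expert feedback, as proposed in [20].
consensus = np.mean(expert_maps, axis=0)
print(consensus)
```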
The consensus Fuzzy Cognitive Map (FCMc) is obtained for each group of siblings.
From each FCMc, the adjacency matrix (AMc) is found; the different AMc
of each level are combined, and the matrix of interdependence between criteria is
determined.
The unification of the different AMc is simple because there are no common criteria
between the different maps. The combination is performed according to Eq. 6.
\[
\begin{bmatrix}
AMc_1 & w_{inf} & \cdots & w_{inf} \\
w_{inf} & AMc_2 & \cdots & w_{inf} \\
\vdots & \vdots & \ddots & \vdots \\
w_{inf} & w_{inf} & \cdots & AMc_k
\end{bmatrix}
\tag{6}
\]
The result is the matrix of interdependence between criteria (Eq. 7):
\[
M_I = \begin{bmatrix}
y_{1,1} & \cdots & y_{1,j} & \cdots & y_{1,n} \\
\vdots & & \ddots & & \vdots \\
y_{n,1} & \cdots & y_{n,j} & \cdots & y_{n,n}
\end{bmatrix}
\tag{7}
\]
where,
• x is an essential criterion.
• B_x : lower threshold value of criterion x.
• B^x : upper threshold value of criterion x.
With this formal and generic definition of the usability model, a better structuring
of the problem of usability assessment is achieved.
The penalization vector is calculated on the basis of the essential criteria and their
restrictions. The objective of this vector is to ensure that the usability measures
considered essential comply with the defined restrictions; otherwise, the
associated sub-characteristics are assigned the value 0.
For the calculation of the penalization vector P, Eq. 9 is used:
\[
P_i = \min_{1 \le k \le r} z_{ik} \tag{9}
\]
where,
• z_{ik} is an element of the matrix Z.
• r represents the number of sibling criteria.
\[
Z = \begin{bmatrix}
z_{11} & \cdots & z_{1l_n} \\
\vdots & \ddots & \vdots \\
z_{m1} & \cdots & z_{ml_n}
\end{bmatrix}
\tag{10}
\]
\[
g(x, B_x, B^x) = \begin{cases} 1 & \text{if } B_x \le x \le B^x \\ 0 & \text{otherwise} \end{cases}
\]
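The penalization machinery can be sketched directly (the threshold values below are invented; the structure follows Eq. 9, Eq. 10 and the indicator function g):

```python
def g(x, lower, upper):
    """Indicator function: 1 when the essential criterion x lies
    within its admissible interval [B_x, B^x], 0 otherwise."""
    return 1 if lower <= x <= upper else 0

# Hypothetical essential criteria for one product:
# (measured value, lower threshold, upper threshold).
essential = [(0.8, 0.5, 1.0), (0.3, 0.4, 1.0), (0.9, 0.6, 1.0)]

# One row of the matrix Z of Eq. (10).
z_row = [g(x, lo, hi) for x, lo, hi in essential]

# Eq. (9): the minimum over the sibling essential criteria; a single
# violated restriction drives the penalization component to 0.
p = min(z_row)
print(z_row, p)  # [1, 0, 1] 0
```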
3. Usability evaluation: Usability testing and expert evaluation are performed for
the different software products; the value of each of the selected measures is obtained,
the information is normalized and unified, and the evaluation matrix
\(\widetilde{Me}\) is established.
4. Aggregation of information: First, the influence matrix (G) is calculated using
Eq. 12:
\[
G = f\left(\widetilde{Me} + \widetilde{Me} \times M_I\right) \tag{12}
\]
Next, the previous matrix is weighted and the information of the sibling criteria is
aggregated (Eq. 13):
\[
G_p = G \otimes W, \qquad gp_{ij} = g_{ij} \cdot w_j \tag{13}
\]
Finally, the products are penalized taking into account the essential criteria,
using the penalization vector method described above.
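A numeric sketch of the aggregation step (Eq. 12 and Eq. 13), assuming for illustration that f simply clips the result to [0, 1] — the excerpt does not spell out f — and that penalization is applied by zeroing the penalized columns; all values are invented:

```python
import numpy as np

# Two products (rows) x three criteria (columns), hypothetical values.
Me = np.array([[0.8, 0.6, 0.9],
               [0.5, 0.7, 0.4]])
# Hypothetical interdependence matrix M_I between the three criteria.
MI = np.array([[0.0, 0.2, 0.0],
               [0.1, 0.0, 0.0],
               [0.0, 0.0, 0.0]])
W = np.array([0.5, 0.3, 0.2])   # sibling weights summing to 1
P = np.array([1.0, 1.0, 0.0])   # penalization: third criterion fails

# Eq. (12): influence matrix, with f taken here as clipping to [0, 1].
G = np.clip(Me + Me @ MI, 0.0, 1.0)

# Eq. (13): weight each column, then aggregate with penalization.
Gp = G * W
usability = (Gp * P).sum(axis=1)
print(usability)
```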
An app was developed to evaluate the usability of mobile applications; the proposed
method is the basis of its implementation. The app runs on the Android operating
system and uses web services. The interfaces have been designed in a pleasant,
understandable and easy-to-operate manner, so that users know at all times which
actions can and should be performed. The application guides the user through a
sequential process, from user registration to the ranking of the apps, intuitively
and with comfortable navigation.
Figures 3, 4, 5 and 6 show the user interfaces for each step to be performed. Figures 3
and 4 correspond to step 2, the determination of the weights of the criteria and the
interdependence. Fig. 5 shows the values of the measures according to the third step.
The results obtained from the application of the method are shown in Fig. 6.
The previous method and app were applied in a controlled environment at CALISOFT,
a Cuban software quality evaluation company, for the usability assessment of
three products (S1, S2, S3).
Based on the software requirements, it was determined to evaluate Usability and
the sub-characteristics defined in ISO 25010: appropriateness recognisability, learnability,
operability, and user interface aesthetics (see Table 2). Exhaustive description and
Integrity of the documentation are numeric measures, while Satisfaction and Appearance
are linguistic variables. Satisfaction is a linguistic variable with
five labels: Very Low (VL), Low (L), Medium (M), High (H), Very High (VH).
Appearance also has five labels: Not Pleasant (NP), Low Pleasant (LP), Pleasant (P),
High Pleasant (HP), Very High Pleasant (VHP). All measures are benefit measures.
This concludes the first step.
The second step is then executed. The weight vector for usability is determined
through pairwise comparison (see Table 3). A study on the sensitivity of the resulting
Software tests were performed according to the third step; the resulting data were
collected and the metrics obtained. The value of each measure for each software product
is shown in Table 4. After normalizing and unifying the data into triangular fuzzy
numbers, the evaluation matrix (Me) was obtained, as shown in Table 5.
The value of each usability index and the ranking are shown in Table 6, resulting
from the last steps. The best usability index corresponds to product S1.
Through the analysis of the usability models used in industry, it was possible
to solve the problem of modeling a generic structure through a graph.
The proposed method makes it possible to assess the interdependence between criteria
and to handle the essential criteria. It also integrates the manipulation of ambiguous,
imprecise information from different sources. The proposal is based on elements of Soft
Computing, such as fuzzy logic, fuzzy linguistic modeling and fuzzy cognitive maps,
and it is inspired by real practical experiences provided by a Cuban company.
In this paper, the efficacy of Fuzzy Cognitive Maps for modeling decision-making
problems was demonstrated, oriented fundamentally to the structuring and analysis
of the interdependence between criteria.
The method facilitates and reduces the time for decision making by creating a
logical, rational and transparent basis for analysis. It also achieves a better structuring
of the problem and, therefore, greater participation and influence of all stakeholders.
Besides, it increases the depth of analysis, which leads to an increase in the quality
of the decision.
The application of the proposal in a controlled environment shows that it is an
operative, reliable and precise solution, easily interpretable for its application
in industry.
Given the relevance of the topic addressed, the increasing complexity of software
and the need to move towards excellence in the products, the continuation of the
research is justified along the following lines: extending the proposed method by
incorporating the modeling of the dynamic nature of the evaluation, since the
parameters change over time and affect the final evaluation of the product. In
addition, it is necessary not only to evaluate but also to predict the usability of
intermediate products in the development process using machine learning algorithms.
From the stored data of various evaluations, machine learning techniques or
algorithms could be incorporated into the proposed model. The weights of the
aggregation mechanisms can be modified according to the context, learning the
weights of the aggregation function from the historical behavior.
Acknowledgements This work has been partially funded by the Spanish Ministry of Economy and
Competitiveness with the support of the project TIN2014-55024-P, and by the Regional Government
of Andalusia—Spain with the support of the project P11-TIC-8001 (both including funds from the
European Regional Development Fund, ERDF).
References
1. Fernández-Pérez, Y., Febles-Estrada, A., Cruz, C., Verdegay, J.L.: Complex Systems: Solutions
and Challenges in Economics, Management and Engineering (2017)
2. ISO/IEC, ISO/IEC 25010:2011 Systems and software engineering—Systems and software
Quality Requirements and Evaluation (SQuaRE)—System and Software Quality Models
(2011)
3. Basto Cordero, L.J., Ribeiro Parente Filho, L.F., Costa dos Santos, R., Gassenferth, W., Soares
Machado, M.A.: iPod system's usability: an application of fuzzy logic. Glob. J. Comput.
Sci. Technol. 13 (2013)
4. Bhatnagar, S., Dubey, S.K., Rana, A.: Quantifying website usability using fuzzy approach. Int.
J. Soft Comput. Eng. 2, 424–428 (2012). ISSN: 2231-2307
5. Montazer, Gh.A., Saremi, H.Q.: An application of type-2 fuzzy notions in website structures
selection: utilizing extended TOPSIS method. WSEAS Trans. Comput. 7, 8–15 (2008)
6. Dubey, S.K., Mittal, A., Rana, A.: Measurement of object oriented software usability using
fuzzy AHP. Int. J. Comput. Sci. Telecommun. 3, 98–104 (2012)
7. Kurosu, M.: Human-Computer Interaction Users and Contexts: 17th International Conference,
HCI International 2015 Los Angeles, CA, USA, 2–7 August 2015 Proceedings, Part III. Lecture
Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), vol. 9171, pp. 35–42 (2015)
8. Singh, A., Dubey, S.K.: Evaluation of usability using soft computing technique. Int. J. Sci.
Eng. Res. 4, 162–166 (2013)
9. Cables, E., García-cascales, M.S., Lamata, M.T.: The LTOPSIS: an alternative to TOPSIS
decision-making approach for linguistic variables. Expert Syst. Appl. 39, 2119–2126 (2012)
10. Lamichhane, R., Meesad, P.: A usability evaluation for government websites of Nepal using
fuzzy AHP. In: 7th International Conference on Computing and Information Technology
IC2IT2011, pp. 99–104 (2011)
11. Etaati, M.L., Sadi-Nezhad, S.: Using fuzzy analytical network process and ISO 9126 quality
model in software selection: a case study in e-learning systems. J. Appl. Sci. 11, 96–103 (2011)
12. Challa, J.S., Paul, A., Dada, Y., Nerella, V., Srivastava, P.R.: Quantification of software quality
parameters using fuzzy multi criteria approach. In: 2011 International Conference on Process
Automation, Control and Computing (PACC), pp. 1–6 (2011)
13. Challa, J.S., Paul, A., Dada, Y., Nerella, V.: Integrated software quality evaluation: a fuzzy
multi-criteria approach. J. Inf. Process. Syst. 7, 473–518 (2011)
14. Dubey, S.K., Gulati, A., Rana, A.: Usability evaluation of software systems using fuzzy multi-
criteria approach. IJCSI Int. J. Comput. Sci. 9, 404–409 (2012). ISSN 1694-0814
15. Li, Q., Zhao, X., Lin, R., Chen, B.: Relative entropy method for fuzzy multiple attribute decision
making and its application to software quality evaluation. J. Intell. Fuzzy Syst. 26, 1687–1693
(2014)
16. Kiszová, Z., Mazurek, J.: Modeling dependence and feedback in ANP with fuzzy cognitive
maps. In: Proceedings of 30th International Conference on Mathematical Methods in Eco-
nomics, pp. 558–563 (2012)
17. Zimmermann, H.J.: Fuzzy set theory. Wiley Interdiscip. Rev. Comput. Stat. 2, 317–332 (2010)
18. Zadeh, L.A.: Soft computing and fuzzy logic. IEEE Softw. 11, 48–56 (1994)
19. Kosko, B.: Fuzzy cognitive maps. Int. J. Man Mach. Stud. 24, 65–75 (1986)
20. Groumpos, P.P.: Fuzzy cognitive maps: basic theories and their application to complex systems.
Fuzzy Cogn. Maps 247, 1–22 (2010)
Fuzzy Simulation of Human Behaviour
in the Health-e-Living System
Abstract This chapter shows an application of fuzzy set theory to preventive health
support systems, where adherence to medical treatment is an important measure to
promote health and reduce health care costs. The design of preventive health care
information technology systems includes ensuring adherence to treatment through
Just-In-Time Adaptive Interventions (JITAI). Determining the timing of the intervention
and the appropriate intervention strategy are two of the main difficulties facing current
systems. In this work, a JITAI system called Health-e-Living (Heli) was developed
for a group of patients with type-2 diabetes. During the development stages of Heli,
it was verified that the state of each user is fuzzy and that it is difficult to find the right
moment to send a motivational message without being annoying. A fuzzy formula
is proposed to measure the adherence of patients to their goals. As the adherence
measurement needed more data, the DisCo software toolset for formal specifications
was introduced, together with the modelling of human behaviour through the health
action process approach (HAPA), to simulate the interactions between users of the
Heli system. The effectiveness of interventions is essential in any JITAI system, and
the proposed formula allows Heli to send motivational messages in correspondence
with the status of each user and to evaluate the efficiency of any intervention strategy.
R. Martinez · M. Tong
ExtensiveLife Oy, Lohkaretie 2 B 9, 33470 Tampere, Finland
e-mail: remberto@health-e-living.com
M. Tong
e-mail: marcos@health-e-living.com
L. Diago (B)
Interlocus Inc., Yokohama 226-8510, Japan
e-mail: ldiago@i-locus.com
L. Diago
Meiji Institute for Advanced Study of Mathematical Sciences, Meiji University,
4-21-1 Nakano, Tokyo 164-8525, Japan
T. Nummenmaa · J. Nummenmaa
University of Tampere, Tampere, Finland
e-mail: timo.nummenmaa@staff.uta.fi
J. Nummenmaa
e-mail: jyrki.nummenmaa@staff.uta.fi
© Springer Nature Switzerland AG 2019 157
R. Bello et al. (eds.), Uncertainty Management with Fuzzy and Rough Sets,
Studies in Fuzziness and Soft Computing 377,
https://doi.org/10.1007/978-3-030-10463-4_9
158 R. Martinez et al.
1 Introduction
People's health determinants are difficult to model because of their inherent uncertainty,
the complex interactions among them, as well as the considerable number of
variables and the lack of precise mathematical models. This was the main motivation
to use a fuzzy approach as a practical option to model a patient's adherence to treatment
and to a healthy lifestyle in the Heli system. In Heli, two variables are combined
for the evaluation of a patient's status: the progress of the proximal outcomes
Δx = x − g_i and the patient's adherence to the system y = F(x, z). Note that the
value of y depends on the inputs x (i.e. proximal outcomes), which are controlled by
the patients, and the contextual inputs z (e.g. environment), which are not controlled
by the patients. Progress indicates how close a patient is to completing the outcomes
g_i (1 ≤ i ≤ n), and adherence measures how effective the system is in its intervention.
The adherence is modelled as a fuzzy weighted average involving type-1 (T1) fuzzy
sets as follows [14]:
\[
y = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \tag{1}
\]
In (1), w_i are weights that act upon the proximal outcomes x_i. While it is always true
that the normalized weights acting upon each x_i sum to one, it is not a
requirement that the unnormalized weights sum to one. The adherence
is calculated as an average over several goals and gives an idea of how well the
system is doing with respect to the patients' goals. Every goal g_i is defined on an interval
g_i ∈ [g⁻, g⁺], and the values of y⁻ and y⁺ are computed accordingly as follows:
\[
y^-(x) = \begin{cases}
1, & \text{if } x \ge g^- \\
x/g^-, & \text{if } 0 < x < g^- \\
1, & \text{otherwise}
\end{cases}
\qquad
y^+(x) = \begin{cases}
1, & \text{if } x \le g^+ \\
\dfrac{2g^+ - x}{g^+}, & \text{if } g^+ < x < 2g^+ \\
0, & \text{if } x \ge 2g^+ \\
1, & \text{otherwise}
\end{cases}
\tag{2}
\]
An example of a positive goal would be to increase fruit consumption to a
minimum of 5 fruit portions a week, or to walk a minimum of 10,000 steps a day.
For this type of goal, g⁻, it is enough to reach the minimum in order to have 100%
completion. Similarly, for a negative goal, an example could be to decrease sugary
beverage consumption to a maximum of 1 glass of soda a week. This type of
goal, g⁺, achieves 100% completion with no data entry or zero beverage portions
consumed.
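The goal-completion functions of Eq. (2) and the weighted average of Eq. (1) can be sketched directly (the weekly numbers below are invented; the "1 otherwise" branches reproduce Eq. (2) as printed, covering e.g. missing data):

```python
def y_minus(x, g_min):
    """Positive goal (Eq. 2): reach at least g_min."""
    if x >= g_min:
        return 1.0
    if 0 < x < g_min:
        return x / g_min
    return 1.0  # otherwise, as printed in Eq. (2)

def y_plus(x, g_max):
    """Negative goal (Eq. 2): stay at or below g_max."""
    if x <= g_max:
        return 1.0
    if g_max < x < 2 * g_max:
        return (2 * g_max - x) / g_max
    return 0.0  # x >= 2 * g_max

def adherence(values, weights):
    """Fuzzy weighted average of Eq. (1)."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Example week: 3 of the 5 targeted fruit portions; 1 glass of soda
# against a maximum of 1.
fruit = y_minus(3, 5)   # 0.6
soda = y_plus(1, 1)     # 1.0
print(adherence([fruit, soda], [1.0, 1.0]))  # 0.8
```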
As mentioned before, the choice of the time interval between decision
points can have a dramatic impact on the ability of a user to achieve their goals. Patient
progress is calculated daily, and adherence data are added to the system weekly, but
adherence is calculated every two weeks, since it cannot be calculated without data.
The time interval between decision points is thus set to two weeks, to see how many
entries there are in that period. The closer the adherence is to 1, the more effective
the system is.
In this work we use the stage model; the stage approach assumes that change is
non-linear and consists of several qualitative steps that reflect different mindsets of
people. We could model the efficacy of Heli as the probability of compliance with
the Health Goal set at the evaluation time (2 weeks), or simply as the probability of
an intended Health Behaviour Change Compliance.
where CH(t) is the adherence history over the time the system has been used and n is
the number of previous inputs:
\[
CH(t) = 1 - (0.1)^{n(t)} \tag{4}
\]
MA(t) is the motivation to comply with the selected Health Goal, according to the
user's personal motivations (\(M_i\)) and beliefs (\(B_i\)) at any time:
\[
MA(t) = \sum_{i=1}^{n} M_i \cdot B_i \tag{5}
\]
P(t) is the perceived self-efficacy over time, including outcome expectations (\(O_k\))
and risk perception (\(R_k\)) during intention formation:
\[
P(t) = \sum_{k=1}^{n} O_k \cdot R_k \tag{6}
\]
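These quantities are straightforward to compute; a sketch with invented motivations, beliefs, outcome expectations and risk perceptions:

```python
def compliance_history(n_inputs):
    """Eq. (4): CH(t) = 1 - 0.1**n(t); approaches 1 as inputs accumulate."""
    return 1 - 0.1 ** n_inputs

def motivation(m, b):
    """Eq. (5): MA(t) = sum_i M_i * B_i."""
    return sum(mi * bi for mi, bi in zip(m, b))

def self_efficacy(o, r):
    """Eq. (6): P(t) = sum_k O_k * R_k."""
    return sum(ok * rk for ok, rk in zip(o, r))

ch = compliance_history(3)                # ~0.999
ma = motivation([0.8, 0.6], [1.0, 0.5])   # 0.8 + 0.3 = 1.1
pse = self_efficacy([0.9], [0.7])         # 0.63
print(ch, ma, pse)
```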
actions. The next step would be to model a user behaviour that is influenced by the
environment and will change the perceived self-efficacy over time.
The main contribution of the HAPA model is allowing perceived self-efficacy to
change over time in situations where the user needs to cope with setbacks or recover
from life challenges. In this iteration it is possible to add two more properties to the
user model: emotional commitment (1, can cope, or 0.1, cannot) and failure learning
(1, can recover, or 0.2, cannot). The simulated data now allow a user to react to the
event of receiving a motivational message or not. In this phase, the value of P is
calculated over the assessment period. Figure 2 shows, on top, the basic
HAPA model with its three user states: Motivation, Volition and Maintenance. On
the bottom, it shows how the model is used in the Heli system, where Volition and
Maintenance are combined into one state, HAPA_Volition.
As the amount of real data available in the Heli system was limited and the amount of
data from real users was not substantial enough, in this chapter we describe
a fuzzy adherence simulation, using a HAPA model of user behaviour in a DisCo
formal specification environment, as a method to generate more data resembling the
data observed in the real system.
There were 126 users registered in Heli from 2013/07 to 2017/03 (including
8 system administrators, 20 coaches and 98 patients, mainly related to type-2
diabetes). Figure 3 shows the patients' weight (44–125 kg) and the distribution of the
number of goals selected by the participants.
Several authors [2, 3, 11] have emphasized the importance of having computational
models of human behaviour to monitor the dynamics of an individual's internal
state and context in real time. The adaptation requires monitoring the individual to
decide (a) whether the individual is in a state that requires support; (b) what type (or
amount) of support is needed given the individual's state; and (c) whether providing
this support has the potential to disrupt the desired process. In our previous work we
focused on the design and evaluation of effective interventions, exploring patients'
self-reported states and sending motivational messages based on a dimensional approach.
Table 1 shows 7 dimensions, 13 states and some examples of the motivational messages
used in the intervention. Note that the messages are associated with the dimensions and
not with the states of the patients, since the states vary over time and in some cases
the states were not reported during the system test stage (e.g. states marked with
“-” in the table). Current probabilities of the states are included within parentheses.
Fig. 3 Statistics of the data collected for the 98 patients registered in Heli from 2013/07 to 2017/03 (panel (a): weight)
Motivational messages are sent to the patients based on the computed adherence to
their proximal outcomes and their reported states (i.e. feedbacks).
The Waikato Environment for Knowledge Analysis (WEKA) [15] software was
used to predict the state of one patient (id = 19). The patient provided 46 feedbacks
to the system, including 8 states: tired (14), stressed (5), busy (14), sick/ill (2),
energetic (1), confident (3), socially pressured (3) and happy (4). The number in
parentheses represents the number of times the state was reported. The NaiveBayes,
MLPClassifier, AdaBoostM1 and RBFNetwork classifiers were tested with one feature
computed by (1) and the 8 states provided by the patient. Using 10-fold cross-validation,
the accuracy of the classifiers was 23.9130, 26.0870, 32.6087 and 34.7826%,
respectively. The accuracy of the classifiers is still very low due to class overlapping
(e.g. tired, stressed, busy and socially pressured are very similar) and to missing values
in the computation of the fuzzy adherence for the patients. In Heli, the number of
users with fuzzy adherence was very small (25/98 ≈ 25.5%) because most users
(73/98 ≈ 74.5%) prefer to use the system to store daily data without a specific goal.
As the emotional dimensions used in the research may not be the most adequate,
later on we use machine learning tools to enhance the effectiveness of Heli based on
computational models of human behaviour like the health action process approach
(HAPA) [11].
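The experiments above use WEKA; purely to illustrate the 10-fold cross-validation protocol, here is a self-contained sketch with a trivial majority-class baseline on synthetic labels (not the authors' data or classifiers):

```python
import random

random.seed(0)
states = ["tired", "stressed", "busy", "happy"]
# 46 synthetic feedbacks: (adherence feature, reported state).
data = [(random.random(), random.choice(states)) for _ in range(46)]

def cross_val_accuracy(data, k=10):
    """Plain k-fold cross-validation of a majority-class baseline."""
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        labels = [label for _, label in train]
        majority = max(set(labels), key=labels.count)  # "training"
        accuracies.append(sum(label == majority for _, label in test) / len(test))
    return sum(accuracies) / k

print(round(cross_val_accuracy(data), 4))
```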
Fuzzy Simulation of Human Behaviour in the Health-e-Living System 165
In Fig. 4, the Heli simulation world consists of four classes (patient, coach, Heli
system and external world) that can interact with each other through actions.
Actions are enabled in the simulation according to their guards (simulation system
state and relationships between classes). Enabled actions are selected for execution
nondeterministically (with weighted probability) at any specific execution time.
When a participant user (patient) registers with the Heli system and selects a goal
(i.e. monitor own weight), a relation isPatientOfHeli becomes active and indicates that
the participant is already in the HAPA_Motivational state. After running a simulation
for a period equivalent to 367 days, the adherence to the system is observed and
computed as the number of data inputs during a week-long period. All modelled users
were registered and defined a goal (on targets related to weight management, better
nutrition and increased physical activity level). When no EMA entries are available in
an evaluation period, the simulation assigns the user to the HAPA_Motivational
state. Later, when EMA entries are available during the week, the user is considered
to be in the HAPA_Volitional or HAPA_Maintenance state, and the contents of each
entry can be used to compute the progress over time towards the selected goal (see
Fig. 2).
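The state-assignment rule described above can be sketched as a toy stochastic simulation (the entry probability and weekly granularity are our simplifications; the state names follow the chapter, with Volition and Maintenance merged):

```python
import random

random.seed(1)

def simulate_user(weeks=52, p_entry=0.6):
    """Assign a HAPA state per week: no EMA entries -> Motivational,
    at least one entry -> Volitional."""
    weekly_states = []
    for _ in range(weeks):
        # Each of the 7 days yields an entry with probability p_entry.
        entries = sum(random.random() < p_entry for _ in range(7))
        weekly_states.append(
            "HAPA_Motivational" if entries == 0 else "HAPA_Volitional")
    return weekly_states

states = simulate_user()
print(states[:4])
```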
There are more than 1200 records per patient on average in the simulation. For
each recorded entry, the simulation computes the probability of entering the next input,
the compliance history and the new value of adherence. Every two weeks of elapsed
time in the simulated world, it is possible to assess the value of adherence, and based
on it the system sends personalised messages according to the user's compliance
attribute: not_very_active, active and very_active. Since the simulation
purpose was to generate data and not to represent the personalisation of messages, the
model increased the probability of generating more user activity for those users whose
value of adherence was closer to 1 (the maximum). This feature represents a participant's
resilience in coping with external environment setbacks. Resilience can be simplified
to include the user's emotional commitment and the ability to learn from a failure and
to keep one's own goal, with a user being of two types: responsive and not_responsive. A
responsive user will have a high correlation with being in the HAPA_Volitional or
HAPA_Maintenance states, having moderately high or high adherence.
Listing 9.1 Functions used in the DisCo specification of adherence in Eq. (1)
As in our previous research, the Waikato Environment for Knowledge Analysis
(WEKA) [15] software was used to predict the state of the above three patients. Instead
of using the states reported by the patients, we used the three states included in the
HAPA model (motivational, volitional and maintenance) to find a correspondence
between the HAPA states and the states previously reported by the real participants. Using
10-fold cross-validation to predict the HAPA states with the J48 classifier, the
accuracy was 99.76, 98.63 and 99.20% for each patient, respectively. As the states
of the patients are fixed by the simulator in the HAPA model, we find a mapping
between the values of adherence in the antecedents of the rules and the values of
adherence in the states previously reported by the patients in the real Heli system. In
the current simulation of the system from the specifications, the HAPA states only
depend on the registration process and the number of messages sent to Heli by the
participants. There is no detailed specification of the changes in the HAPA states
after a patient has entered the maintenance state. The factors that modify the states
are currently under investigation and modelling.
5 Preliminary Results
As the results of our previous research were only preliminary due to the small size of
the sample, in this chapter our experiments focus on reviewing the conditions
of the real patients who participated in the Heli system. Using the simulation, we
intend to mimic real conditions with better models and extract new knowledge from
the DisCo simulations. While the generated data do not reflect the real data exactly,
they can be used to quickly validate the assumptions about the participants' goal-achievement
progress and the system adherence calculation, and then to compare the results of
the proposed approach with the approach reported in [12]. Figure 5 shows the probability
distribution function for the experimental adherence data and the values of
adherence computed by both simulation models. The main difference between the two
simulation models is that the proposed model includes motivational messages sent by the
coaches to the patients based on the values of adherence, while the Brailsford model does
not include motivational messages. The graphs were computed using the MATLAB
fitdist function, which creates a probability distribution object by fitting the Epanechnikov
kernel function to the data. The values of the bandwidth were 0.1029, 0.0497
and 0.0744 for the real, Brailsford and proposed simulated data, respectively. Both
models used in the simulation were simplified so as not to include the users' personal
motivation to achieve the intended goal; however, the inclusion of the HAPA model in
the proposed approach allowed the user state to be predicted based on the number of
inputs over time. The proposed model matches the increase in the adherence of
participants after receiving motivational messages, as observed in the real data [9].
Any user instance was allowed to have several goals at the same time; however, due
to time limitations, the executed simulation assumed only one type of goal at a time.
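As a stand-in for MATLAB's fitdist, the Epanechnikov kernel density estimate can be sketched in a few lines (the adherence sample and bandwidth below are invented):

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on (-1, 1), else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kde(x, sample, bandwidth):
    """Kernel density estimate at point x."""
    n = len(sample)
    return sum(epanechnikov((x - s) / bandwidth) for s in sample) / (n * bandwidth)

# Hypothetical adherence values observed for a group of users.
sample = [0.1, 0.2, 0.2, 0.5, 0.8, 0.9, 1.0]
density = kde(0.2, sample, bandwidth=0.1)
print(round(density, 4))
```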
As shown in Fig. 5, the values of adherence are higher for the proposed
approach (0.54 ± 0.32 vs. 0.51 ± 0.32), and its graph is also closer to the
experimental/real graph. Although we did not compare the two distributions directly, the
figure shows that at the extremes (adherence close to zero and adherence
close to one) the distributions are very similar (they differ only by a scale factor).
However, when the adherence values are between 0.4 and 0.8, the distributions do
not look alike. This can be attributed to the lack of data acquired during the real
experiments and/or to the fact that the simulation is still far from including the aspects of
reality that govern the cases where adherence takes average values. The probability
with which a patient sends data after receiving a message from a coach has not been
modelled either, nor has the variation of motivation over time (the value m = 1 was
fixed during the simulation).
Fig. 5 Comparison of the histograms (upper part) and probability distribution function (PDF, lower
part) for simulated and real values of adh
The authors consider that the main objective of this work has been fulfilled, since
we were able to increase the amount of data by means of simulation and, at the same
time, to predict the states in which a patient is (according to HAPA) from the values of
adherence computed by the proposed fuzzy formula in (1). Finally, Fig. 6 shows the
results of the simulations with DisCo over 367 days with the 3 patients, the coach and
the Heli system shown in Fig. 4. The average number of randomly generated entries
was 2.4 times larger than in the real data, mainly because the number of real messages
sent by the automatic Heli coach is smaller than that of the simulated one. In the real
Heli, the automatic coach generates a motivational message every 2 weeks for each
patient, so there would be around 25 a year at most, or 50 if the patient provides more
feedback (see the patient states in Sect. 3.1). A human coach could generate a few
more messages when supporting a real participant. In the simulation, the messages
are generated daily in a random way, so the number of records is larger. However, the
simulation shows that some pattern of adherence exists according to the type of goal
set by the patient. For example, Patient 1, who tried to keep the weight below
75 kg, never reached the goal during the year. On the contrary, Patient 2 and Patient 3,
who set goals related to physical activity (e.g. walk more than 1000 steps
a week) and nutrition (e.g. eat more than 7 fruits a week), could achieve their goals
Fig. 6 Results of the simulations with Disco for 367 days with the 3 patients
several times in the year. As shown in Fig. 6, there is a cyclic behaviour for Patient 2
and Patient 3: after a patient reaches the goal, the adherence is reduced. Although
this behaviour was included in the simulated specification, it can be considered to be
in some agreement with real life.
JITAI systems like Heli appear to be a promising framework for developing mHealth
interventions. In Heli, the number of users with fuzzy adherence was very small
(25/98 ≈ 25.5%) because most users (73/98 ≈ 74.5%) prefer to use the system to
store daily data without a specific goal. However, the proposed interventions showed
that even after several stress inputs patients do not leave the system. Although this
research is still in its infancy, fuzzy measures like the proposed adherence formula
constitute a practical option for measuring the way a patient approaches a certain goal
by successive approximations over time. The chapter showed, by means of simulation,
that there is a close correspondence between the real-world adherence of the patients
and its computational model.
The simplified model used in the simulation did not include the reactiveness of users when receiving motivational messages from the Heli automatic coach. Ways to adjust the number of registered records during simulation so that they agree with those of real life are currently under investigation. Future work should expand the model to improve user personal motivation and perceived self-efficacy, for example by using the data available after user profiling or the data collected from system usage. Another interesting approach would be to measure adherence when the model allows a goal to be changed after several weeks of simulation execution.
The introduction of HAPA and human behaviour factors in the model required a
better understanding of the user. The data collected from the real system was used to
decide what relations and actions were the most important for state transitions and
adherence computation.
While the DisCo specifications were enough to describe the real system implementation, they could be expanded to give more freedom in modelling. The relationships between classes were enough to represent the Heli world in the simulator. The main limitations of the current DisCo specification toolset are the difficulty of specifying complex mathematical formulas and the reduced semantics set available during simulation preparation. The Animator tool was pleasant to use; however, some improvements are needed to let external factors of the system under modelling vary at execution time. For the processing of the log output, it would be desirable to add export functionality to common standard formats like CSV, or database connectors. After several iterations of modelling, it was possible to generate a large amount of data and discover new knowledge about the real system. The generated data is also useful for other phases of the software testing cycle in stage-level systems.
More research is required to understand the impact of behavioural interventions on the lifestyle achievements of real-life users, and which aspects of user motivation trigger the intention to improve resilience.
References
1. Bowen, M.E., Bhat, D., Fish, J., Moran, B., Howell-Stampley, T., Kirk, L., Persell, S.D., Halm,
E.A.: Improving Performance on Preventive Health Quality Measures Using Clinical Decision
Support to Capture Care Done Elsewhere and Patient Exceptions
2. Nahum-Shani, I., Smith, S.N., Spring, B.J., Collins, L.M., Witkiewitz, K., Tewari, A., Murphy, S.A.: Just-in-time adaptive interventions (JITAIs) in mobile health: key components and design principles for ongoing health behavior support. Ann. Behav. Med. (2016). https://doi.org/10.1007/s12160-016-9830-8
3. Murray, T., Hekler, E., Spruijt-Metz, D., Rivera, D.E., Raij, A.: Formalization of computational
human behavior models for contextual persuasive technology. In: PERSUASIVE 2016, LNCS
9638, pp. 150–161 (2016). https://doi.org/10.1007/978-3-319-31510-2_13
4. Hekler, E.B., Michie, S., Pavel, M., Rivera, D.E., Collins, L.M., Jimison, H.B., Garnett, C., Parral, S., Spruijt-Metz, D.: Advancing models and theories for digital behavior change interventions. Am. J. Prev. Med. 51(5), 825–832 (2016). https://doi.org/10.1016/j.amepre.2016.06.013
5. Yuan, B., Herbert, J.: Fuzzy CARA - a fuzzy-based context reasoning system for pervasive
healthcare. Procedia Comput. Sci. 10, 357–365 (2012)
6. Torres, A., Nieto, J.J.: Fuzzy logic in medicine and bioinformatics. J. Biomed. Biotechnol.
2006, Article ID 91908, 1–7. https://doi.org/10.1155/JBB/2006/91908
7. Giabbanelli, P.J., Crutzen, R.: Creating groups with similar expected behavioural response in
randomized controlled trials: a fuzzy cognitive map approach. BMC Med. Res. Methodol. 14,
130 (2014)
8. Gursel, G.: Healthcare, uncertainty, and fuzzy logic. Digit. Med. 2, 101–112 (2016)
9. Martinez, R., Tong, M., Diago, L.: Fuzzy adherence formula for the evaluation of just-in-time
adaptive interventions in the health-e-living system. In: Proceedings of ISFUROS Symposium
(2017)
10. The DisCo project WWW page. http://disco.cs.tut.fi. Accessed 16 Apr 2018
11. MacPhail, M., Mullan, B., Sharpe, L., MacCann, C., Todd, J.: Using the health action process
approach to predict and improve health outcomes in individuals with type 2 diabetes mellitus.
Diabetes Metab. Syndr. Obes. Targets Ther. 7, 469–479 (2014)
12. Brailsford, S.C.: Healthcare: human behavior in simulation models. In: Kunc, M., Malpass, J.,
White, L. (eds.) Behavioral Operational Research. Palgrave Macmillan, London (2016)
13. Martinez, R., Tong, M.: Can mobile health deliver participatory medicine to all citizens in
modern society? In: 4th International Conference on Well-Being in the Information Society,
WIS 2012, Turku, 22 August 2012–24 August 2012, pp. 83–90 (2012)
14. Liu, F., Mendel, J.M.: Aggregation using the fuzzy weighted average as computed by the Karnik-Mendel algorithms. IEEE Trans. Fuzzy Syst. 16(1), 1–12 (2008)
15. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009). https://doi.org/10.1145/1656274.1656278
16. Diller, A.: Z: An Introduction to Formal Methods. Wiley, New York (1990)
17. Lamport, L.: The temporal logic of actions. ACM Trans. Program. Lang. Syst. 16(3), 872–923
(1994)
18. Aaltonen, T., Katara, M., Pitkanen, R.: DisCo toolset - the new generation. J. Univers. Comput.
Sci. 7(1), 3–18 (2001)
19. Nummenmaa, T.: Executable Formal Specifications in Game Development: Design, Validation
and Evolution. Ph.D. thesis, Tampere University Press, Tampere (2013)
Part II
Rough Sets: Theory and Applications
Matroids and Submodular Functions
for Covering-Based Rough Sets
1 Introduction
The classical rough set theory was extended to covering-based rough set theory by
many authors. W. Żakowski [17], J. A. Pomykala [7], E. Tsang et al. [10], W. Zhu
and F. Wang [22–24], Xu and Zhang [15] present different approximation operators
for covering approximation spaces. In 2012, Y. Y. Yao and B. Yao proposed a general
framework for the study of covering-based rough sets in [16].
Matroids are important tools for describing some concepts in graph theory and linear independence in matrix theory [4, 5]. S. Wang et al. present a matroidal approach
to rough set theory, defining a matroidal structure from the partition obtained from
an equivalence relation [9]. X. Li et al. present a matroidal approach to rough sets
via closure operators [6].
Matroidal structures of covering-based rough sets are generally induced by a family of subsets of a universe, defined through lower and upper approximation. In [11], two matroidal structures of covering-based rough sets are built, using transversal theory and the upper approximation number. A matroidal structure from the lower approximation operator in rough set theory was presented in [25].
The idea of independent sets in matroid theory can be useful for the attribute reduction problem. Some rough set-based methods in feature selection have been used for solving attribute reduction problems [12, 26]. Some recent papers have established
interesting properties of the matroids and some connections with other mathematical
structures [4, 6, 13].
Different order and preorder relations on coverings are defined in [1]. Order
relations on approximation operators are presented in [3, 8].
In this paper, we use the upper approximation number function of a covering C as
a submodular function to build a matroidal structure. We use some basic examples
to compare the matroidal structures of different partitions of a set U . Also, we
obtain the respective matroidal structure for different coverings and we establish
a preorder relation on induced matroids. We study the matroidal structures obtained
from different lower approximation operators and different coverings. Additionally,
we extend the lower approximation matroidal structure to covering-based rough sets.
Finally, we compare the order relation of these structures, with the order defined in
upper approximation operators, as was established in [8].
The results of the comparison regarding the order among matroids are helpful for selecting appropriate structures in typical rough set applications, such as attribute selection and classification.
The remainder of this paper is organized as follows: Sect. 2 presents preliminary
concepts about covering-based rough sets, as well as matroids and submodular functions. Section 3 presents the main matroids obtained by different methods. In Sect. 4,
we present some preorder relations between coverings, and we establish an order
relation between different matroidal structures and submodular functions. Finally,
Sect. 5 presents some conclusions and outlines our future work.
2 Preliminaries
Throughout this paper, we assume that U is a finite and non-empty set. P(U) represents the collection of subsets of U, and |A| denotes the cardinality of the set A for any A ⊆ U.
A subset A is said to be exact if $\underline{apr}(A) = \overline{apr}(A)$; otherwise it is called a rough set. These approximations are called granule-based, according to [16].
Many authors have investigated generalized rough set models obtained by relaxing the condition that E is an equivalence relation or, equivalently, that U/E is a partition of U. Replacing the partition with a collection of non-empty subsets K ⊆ P(U), with ∪K = U, gives rise to covering-based rough sets [14, 18–21].
Definition 1 Let C = {K_i} be a family of non-empty subsets of U. C is called a covering of U if ∪K_i = U. The ordered pair (U, C) is called a covering approximation space [19].
In a covering approximation space (U, C), the minimal and maximal sets that contain an element x ∈ U are particularly important. The collection C(C, x) = {K ∈ C : x ∈ K} can be used to define a neighborhood system of x ∈ U. The set

md(C, x) = {K ∈ C(C, x) : ∀S ∈ C(C, x), S ⊆ K ⇒ S = K}

is called the minimal description of x, i.e. md(C, x) contains the minimal elements of C(C, x) [2]. On the other hand, the set

MD(C, x) = {K ∈ C(C, x) : ∀S ∈ C(C, x), K ⊆ S ⇒ S = K}

is called the maximal description of x; it contains the maximal elements of C(C, x).
From the collections md(C, x) and MD(C, x), Yao and Yao introduced four new coverings derived from the covering C [16]:
1. C1 = ∪{md(C, x) : x ∈ U }
2. C2 = ∪{M D(C, x) : x ∈ U }
3. C3 = {∩(md(C, x)) : x ∈ U } = {∩(C (C, x)) : x ∈ U }
4. C4 = {∪(M D(C, x)) : x ∈ U } = {∪(C (C, x)) : x ∈ U }.
For example, the covering C1 is the collection of all sets in the minimal description of each x ∈ U, while C3 is the collection of the intersections of the minimal descriptions for each x ∈ U. Additionally, they considered the so-called intersection reduct C∩ and union reduct C∪ of a covering C, obtained by removing from C every set that is an intersection (respectively, a union) of other sets in C.
Example 1 Let U = {1, 2, 3, 4} and let C = {{1, 2}, {1, 2, 3}, {2, 3}, {2, 3, 4}, {4}}. Then the minimal description for each element is: md(C, 1) = {{1, 2}}, md(C, 2) =
{{1, 2}, {2, 3}}, md(C, 3) = {{2, 3}}, md(C, 4) = {{4}}. On the other hand, the maximal descriptions are: MD(C, 1) = {{1, 2, 3}}, MD(C, 2) = {{1, 2, 3}, {2, 3, 4}}, MD(C, 3) = {{1, 2, 3}, {2, 3, 4}}, MD(C, 4) = {{2, 3, 4}}. Therefore, the six coverings obtained from the covering C are:
1. C1 = {{1, 2}, {2, 3}, {4}}
2. C2 = {{1, 2, 3}, {2, 3, 4}}
3. C3 = {{1, 2}, {2}, {2, 3}, {4}}
4. C4 = {{1, 2, 3}, {2, 3, 4}, {1, 2, 3, 4}}
5. C∩ = {{4}, {1, 2}, {1, 2, 3}, {2, 3, 4}}
6. C∪ = {{1, 2}, {2, 3}, {4}}.
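As a quick check, the six derived coverings can be recomputed mechanically. The sketch below (the helper names `C_x`, `md`, `MD` and `dedup` are ours, not from the chapter) builds the minimal and maximal descriptions and the coverings C1–C4 for the covering C of Example 1:

```python
U = [1, 2, 3, 4]
# Covering C of Example 1, inferred from the descriptions listed above.
C = [{1, 2}, {1, 2, 3}, {2, 3}, {2, 3, 4}, {4}]

def C_x(C, x):
    """Neighborhood system C(C, x): the sets of C containing x."""
    return [K for K in C if x in K]

def md(C, x):
    """Minimal description: minimal elements of C(C, x)."""
    Kx = C_x(C, x)
    return [K for K in Kx if not any(S < K for S in Kx)]

def MD(C, x):
    """Maximal description: maximal elements of C(C, x)."""
    Kx = C_x(C, x)
    return [K for K in Kx if not any(K < S for S in Kx)]

def dedup(family):
    """Drop duplicate sets while keeping order."""
    out = []
    for s in family:
        if s not in out:
            out.append(s)
    return out

C1 = dedup([K for x in U for K in md(C, x)])
C2 = dedup([K for x in U for K in MD(C, x)])
C3 = dedup([set.intersection(*md(C, x)) for x in U])
C4 = dedup([set.union(*MD(C, x)) for x in U])

print(C1)   # [{1, 2}, {2, 3}, {4}]
print(C3)   # [{1, 2}, {2}, {2, 3}, {4}]
```

The output reproduces the six coverings listed above (C∪ coincides with C1 here, as in the example).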
Each neighborhood operator N defines an ordered pair $(\underline{apr}_N, \overline{apr}_N)$ of dual approximation operators, in the sense that $\underline{apr}_N(\sim A) = \; \sim \overline{apr}_N(A)$, where ∼A is the complement of A:

$$\underline{apr}_N(A) = \{x \in U : N(x) \subseteq A\} \qquad (9)$$

$$\overline{apr}_N(A) = \{x \in U : N(x) \cap A \neq \emptyset\} \qquad (10)$$
From the neighborhood system C(C, x), the minimal and maximal sets that contain an element x ∈ U can also be used for defining the following neighborhood operators, introduced by Y. Y. Yao and B. Yao [16]:
1. N1 (x) = ∩{K : K ∈ md(C, x)}
2. N2 (x) = ∪{K : K ∈ md(C, x)}
3. N3 (x) = ∩{K : K ∈ M D(C, x)}
4. N4 (x) = ∪{K : K ∈ M D(C, x)}.
According to Eqs. 9 and 10, each neighborhood operator $N_i$, for i ∈ {1, 2, 3, 4}, defines a pair of approximation operators $\underline{apr}_{N_i}$ and $\overline{apr}_{N_i}$. A systematic study of neighborhood operators in covering-based rough sets can be found in [3].
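The four operators and the element-based pair of approximations can be sketched as follows, assuming the standard definitions $\underline{apr}_N(A) = \{x : N(x) \subseteq A\}$ and $\overline{apr}_N(A) = \{x : N(x) \cap A \neq \emptyset\}$, and the covering of Example 1 (both reconstructed here, not quoted from the chapter):

```python
U = [1, 2, 3, 4]
C = [{1, 2}, {1, 2, 3}, {2, 3}, {2, 3, 4}, {4}]  # covering of Example 1 (inferred)

def md(C, x):
    Kx = [K for K in C if x in K]
    return [K for K in Kx if not any(S < K for S in Kx)]

def MD(C, x):
    Kx = [K for K in C if x in K]
    return [K for K in Kx if not any(K < S for S in Kx)]

def N(i, x):
    """Neighborhood operators N1..N4 from the list above."""
    if i == 1: return set.intersection(*md(C, x))
    if i == 2: return set.union(*md(C, x))
    if i == 3: return set.intersection(*MD(C, x))
    return set.union(*MD(C, x))

def lower(i, A):
    """Element-based lower approximation (assumed Eq. 9)."""
    return {x for x in U if N(i, x) <= A}

def upper(i, A):
    """Element-based upper approximation (assumed Eq. 10)."""
    return {x for x in U if N(i, x) & A}

A = {1, 2}
print(lower(1, A))   # {1, 2}
print(upper(1, A))   # {1, 2, 3}
# Duality: the lower approximation of the complement of A is the
# complement of the upper approximation of A.
assert lower(1, set(U) - A) == set(U) - upper(1, A)
```

The final assertion illustrates the duality stated above; it holds for any neighborhood operator, not just N1.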
2.2 Matroids
One of the meanings of matroid is related to the notion of linear independence. For example, let us consider the column vectors of the matrix A and its reduced row echelon form E_A:

$$A = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 1 & 0 & 2 & 2 \\ 2 & -1 & 0 & 0 \end{pmatrix}, \qquad E_A = \begin{pmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (12)$$
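The independent column subsets of A (the vector matroid of the matrix) can be listed by computing ranks, here over the rationals with a small Gaussian elimination; this is a sketch of ours, not code from the chapter:

```python
from fractions import Fraction
from itertools import combinations

# Column vectors of the matrix A from Eq. 12.
cols = [[1, 1, 2], [0, 0, -1], [2, 2, 0], [1, 2, 0]]

def rank(vectors):
    """Row-reduce over the rationals and count the pivots."""
    m = [[Fraction(v) for v in vec] for vec in vectors]
    r, c = 0, 0
    n, width = len(m), (len(m[0]) if m else 0)
    while r < n and c < width:
        pivot = next((i for i in range(r, n) if m[i][c] != 0), None)
        if pivot is None:
            c += 1
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(n):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        c += 1
    return r

# Independent sets of the vector matroid: column subsets of full rank.
independent = [S for k in range(len(cols) + 1)
               for S in combinations(range(len(cols)), k)
               if rank([cols[i] for i in S]) == len(S)]

print(rank(cols))            # 3, as E_A shows (three pivot rows)
print((0, 1, 2) in independent)   # False: col 2 = 2*col 0 + 4*col 1
```

As E_A makes visible, the third column is a combination of the first two, so {c0, c1, c2} is the only dependent 3-subset.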
Definition 5 ([11]) Let U be a finite set. A matroid on U is a pair M = (U, I), where
I is a collection of subsets of U with the following properties:
1. ∅ ∈ I.
2. If I ∈ I and I′ ⊆ I, then I′ ∈ I.
180 M. Restrepo and J. F. Aguilar
3. If I1 , I2 ∈ I and |I1 | < |I2 |, then there exists x ∈ I2 − I1 such that I1 ∪ {x} ∈ I,
where |I | denotes the cardinality of the set I .
The members of I are called independent sets of U . A base for the matroid M is
any maximal set in I. The sets not contained in I are called dependent. A minimal
dependent subset of U is called a circuit of M.
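Definition 5 translates directly into a brute-force check, practical for small ground sets (the function name `is_matroid` is ours):

```python
from itertools import combinations

def is_matroid(U, I):
    """Brute-force check of the three independence axioms of Definition 5.
    I is given as a set of frozensets over the ground set U."""
    if frozenset() not in I:                       # axiom 1: empty set
        return False
    for A in I:                                    # axiom 2: hereditary
        for k in range(len(A)):
            if any(frozenset(B) not in I for B in combinations(A, k)):
                return False
    for A in I:                                    # axiom 3: exchange
        for B in I:
            if len(A) < len(B) and not any(A | {x} in I for x in B - A):
                return False
    return True

# Uniform matroid U_{2,4}: every subset of size <= 2 is independent.
U = {1, 2, 3, 4}
I = {frozenset(S) for k in range(3) for S in combinations(U, k)}
print(is_matroid(U, I))                        # True
print(is_matroid(U, I - {frozenset({1})}))     # False: not hereditary
```

Here the bases are the six 2-subsets and the circuits are the 3-subsets.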
The rank function of a matroid is a function r : P(U) → N given by

$$r(A) = \max\{|X| : X \subseteq A, \; X \in \mathcal{I}\} \qquad (13)$$
Proposition 2 ([11]) For the function defined in Eq. 13, the following property holds:

$$r(A \cup B) + r(A \cap B) \leq r(A) + r(B) \qquad (14)$$

for all A, B ⊆ U.
Submodular functions are a generalization of rank functions and are used in graph
theory, game theory, and some optimization problems.
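The rank function of Eq. 13 and the submodular inequality of Proposition 2 can be checked numerically on a small matroid (our sketch; the inequality is what makes r a submodular function):

```python
from itertools import combinations

def rank(A, I):
    """Rank of Eq. 13: size of a largest independent subset of A."""
    return max(len(X) for k in range(len(A) + 1)
               for X in combinations(sorted(A), k)
               if frozenset(X) in I)

# Uniform matroid U_{2,4} on {1, 2, 3, 4}.
I = {frozenset(S) for k in range(3) for S in combinations(range(1, 5), k)}

A, B = {1, 2, 3}, {3, 4}
lhs = rank(A | B, I) + rank(A & B, I)   # r(A u B) + r(A n B) = 2 + 1
rhs = rank(A, I) + rank(B, I)           # r(A) + r(B) = 2 + 2
print(lhs <= rhs)   # True: the submodular inequality of Proposition 2
```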
Example 2 Let U = {1, 2, 3, 4} be a set, with the coverings defined in Example 1. The values of $f_{C_i}(A)$ for all subsets A of U are shown in Table 1.
3 Matroidal Structures
This section considers the matroidal structures obtained via submodular functions
f Ci (A) for different partitions and coverings.
[Fig. 1: the Boolean lattice of subsets of {1, 2, 3, 4}, drawn once for each of the five partitions of Example 3; the independent sets of each matroid I_f(P_i) are highlighted.]
The following example presents the collection of sets belonging to the matroid.
Example 3 Let U = {1, 2, 3, 4} be a set with the partitions P1 = {{1}, {2}, {3}, {4}}, P2 = {{1, 2}, {3}, {4}}, P3 = {{1, 2}, {3, 4}}, P4 = {{1, 2, 3}, {4}} and P5 = {{1, 2, 3, 4}}. The matroidal structure $I_f(P_i)$ for each partition $P_i$ is shown in Fig. 1. For example, for the partition P2 = {{1, 2}, {3}, {4}} we have that A = {2, 3, 4} ∈ $I_f(P_2)$, because $f_{P_2}(A) = 3$ and, for each subset X ⊆ A, $f_{P_2}(X) \geq |X|$. As we can see, a finer partition has a greater number of independent sets in the matroid.
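Example 3 can be enumerated directly, assuming f is the upper approximation number $f_C(A) = |\{K \in C : K \cap A \neq \emptyset\}|$ (the usual choice in [11]; its formal definition is not reproduced in this excerpt):

```python
from itertools import combinations

def f(C, A):
    """Upper approximation number (assumed form): blocks of C meeting A."""
    return sum(1 for K in C if K & A)

def subsets(S):
    return [set(X) for k in range(len(S) + 1) for X in combinations(S, k)]

def I_f(U, C):
    """Independent sets: A such that f(X) >= |X| for every X subset of A."""
    return [A for A in subsets(U)
            if all(f(C, X) >= len(X) for X in subsets(A))]

U = {1, 2, 3, 4}
P1 = [{1}, {2}, {3}, {4}]           # finest partition
P2 = [{1, 2}, {3}, {4}]
P5 = [{1, 2, 3, 4}]                 # coarsest partition

print(len(I_f(U, P1)), len(I_f(U, P2)), len(I_f(U, P5)))   # 16 12 5
print({2, 3, 4} in I_f(U, P2))                             # True
```

The counts 16 > 12 > 5 show concretely that finer partitions yield more independent sets; for P2, a set fails to be independent exactly when it contains the whole block {1, 2}.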
This section shows the matroidal structure of a covering C and the associated coverings C1, C2, C3, C4, C∪ and C∩, according to the upper approximation number given in Eq. 15 and Proposition 4.
[Fig. 2: Boolean lattices of P({1, 2, 3, 4}) for the covering C of Example 1 and its derived coverings; the dark-circled sets belong to each matroid I_f(C_i). The panel labels indicate that several of the derived coverings induce the same structure.]
Example 4 Let U = {1, 2, 3, 4} be a set. For the covering C and the associated coverings C1, C2, C3, C4, C∪ and C∩ of Example 1, we have the matroidal structures shown in Fig. 2. The dark-circled sets belong to the matroid $I_f(C_i)$.
Proof It is easy to see that $f_C(\{k\}) = 2^{n-1}$ for each k ∈ U. Let $A_k = \{1, 2, \ldots, k\}$; by monotonicity, $f_C(A_k) \geq f_C(\{k\}) = 2^{n-1} \geq n \geq k = |A_k|$. So, for each A ⊆ U we have that $f_C(A) \geq |A|$, so each A ⊆ U belongs to $I_f(C)$ and therefore $I_f(C) = P(U)$.
From the neighborhood operators defined above, we have the following coverings:

$$C_N = \{N(x) : x \in U\} \qquad (17)$$
Proof It is simple to prove this by using the monotonicity property of the approximation operators $apr_{N_i}$: if A ⊆ B, then $apr_{N_i}(A) \subseteq apr_{N_i}(B)$. Obviously, if A ⊆ B, then $|apr_{N_i}(A)| \leq |apr_{N_i}(B)|$ and therefore $g_{N_i}(A) \leq g_{N_i}(B)$.
Proof The approximation operators $apr_{N_i}$ are join morphisms, i.e. $apr_{N_i}(A \cup B) = apr_{N_i}(A) \cup apr_{N_i}(B)$; therefore:
is a matroid in U .
Example 6 For the covering C from Example 1, the values of $g_{N_i}(A)$ are shown in the first four columns of Table 2, and the lower approximations in the last four columns.
Different pre-order relations among coverings can be defined. For example, following
the idea of general topology, we can say that C is finer than D, if D ⊆ C. For the
coverings defined before, we have that: C1 ⊆ C, C2 ⊆ C, C∩ ⊆ C and C∪ ⊆ C.
Other pre-order relations for coverings can be seen in [1].
This section aims to establish an order relation between the matroidal structures $M_f(C) = (U, I_f(C))$ for the covering C and its associated coverings C1, C2, C3, C4, C∪ and C∩, and an order relation among the matroids $I_{apr_i}$.
In this case, we can use Propositions 12 and 13 to establish an order relation
among matroidal structures.
Fig. 5 Order relation for matroidal structures derived from approximation operators
Proof Let us suppose that $apr_i \leq apr_j$. If $X \in I_{apr_j}$, then $apr_j(X) = \emptyset$ and $apr_i(X) \subseteq apr_j(X) = \emptyset$. So, $apr_i(X) = \emptyset$ and $X \in I_{apr_i}$.
The order relation among lower approximation operators and the matroids can be
seen in Fig. 5.
Fig. 6 Order relation for matroidal structures defined through lower approximation operators
5 Conclusions
This paper presents different matroidal structures obtained from partitions and coverings of some sets, other matroidal structures defined from the upper approximation number, and structures of the lower approximation operators in rough sets. These structures are generalized to covering-based rough sets, through order-preserving lower approximation operators.
We use preorder relations among coverings, presented in [1], and the order relation
among sixteen lower approximation operators, presented in [8], to define a partial
order relation on matroidal structures.
It is important to note that finer coverings generate matroidal structures with a
greater number of sets. Results about order among matroids are helpful to select
appropriate structures in typical rough set applications, such as attribute selection
and classification. Our future studies will consider these structures and their relation
with the attribute reduction problem via approximation operators in covering-based
rough sets.
Acknowledgements This work was supported by the Universidad Militar Nueva Granada Special
Research Fund, under the project CIAS 2549-2018.
References
1. Bianucci, D., Cattaneo, G.: Information entropy and granular co-entropy of partition and coverings: a summary. Trans. Rough Sets 10, 15–66 (2009)
2. Bonikowski, Z., Brynarski, E.: Extensions and intensions in rough set theory. Inf. Sci. 107,
149–167 (1998)
3. D’eer, L., Restrepo, M., Cornelis, C., Gómez, J.: Neighborhood operators for covering-based
rough sets. Inf. Sci. 336, 21–44 (2016)
4. Huang, A., Zhu, W.: Geometric lattice structure of covering based rough sets through matroids.
J. Appl. Math. 53, 1–25 (2012)
5. Lai, W.: Matroid Theory. Higher Education Press, Beijing (2001)
6. Li, X., Liu, S.: Matroidal approaches to rough sets via closure operators. Int. J. Approx. Reason.
53, 513–527 (2012)
7. Pomykala, J.A.: Approximation operations in approximation space. Bulletin de l'Académie Polonaise des Sciences 35, 653–662 (1987)
8. Restrepo, M., Cornelis, C., Gómez, J.: Partial order relation for approximation operators in
covering-based rough sets. Inf. Sci. 284, 44–59 (2014)
9. Tang, J., She, K., Min, F., Zhu, W.: A matroidal approach to rough set theory. Theor. Comput.
Sci. 47, 1–11 (2013)
10. Tsang, E., Chen, D., Lee, J., Yeung, D.S.: On the upper approximations of covering generalized
rough sets. In: Proceedings of the 3rd International Conference on Machine Learning and
Cybernetics, pp. 4200–4203 (2004)
11. Wang, S., Zhu, W., Min, F.: Transversal and function matroidal structures of covering-based
rough sets. Lect. Notes Comput. Sci. RSKT 2011(6954), 146–155 (2011)
12. Wang, S., Zhu, Q., Zhu, W., Min, F.: Matroidal structure of rough sets and its characterization
to attribute reduction. Knowl.-Based Syst. 54, 155–161 (2012)
13. Wang, S., Zhu, W., Zhu, Q., Min, F.: Four matroidal structures of covering and their relationships
with rough sets. Int. J. Approx. Reason. 54, 1361–1372 (2013)
14. Wu, M., Wu, X., Shen, T.: A new type of covering approximation operators. In: IEEE International Conference on Electronic Computer Technology, pp. 334–338 (2009)
15. Xu, W., Zhang, W.: Measuring roughness of generalized rough sets induced by a covering.
Fuzzy Sets Syst. 158, 2443–2455 (2007)
16. Yao, Y.Y., Yao, B.: Covering based rough sets approximations. Inf. Sci. 200, 91–107 (2012)
17. Zakowski, W.: Approximations in the space (U, Π). Demonstratio Mathematica 16, 761–769 (1983)
18. Zhang, Y., Li, J., Wu, W.: On axiomatic characterizations of three pairs of covering based
approximation operators. Inf. Sci. 180, 274–287 (2010)
19. Zhu, W.: Properties of the first type of covering-based rough sets. In: Proceedings of Sixth
IEEE International Conference on Data Mining - Workshops, pp. 407–411 (2006)
20. Zhu, W.: Properties of the second type of covering-based rough sets. In: Proceedings of the
IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Tech-
nology, pp. 494–497 (2006)
21. Zhu, W.: Basic concepts in covering-based rough sets. In: Proceedings of Third International
Conference on Natural Computation, pp. 283–286 (2007)
Matroids and Submodular Functions for Covering-Based Rough Sets 191
22. Zhu, W.: Relationship between generalized rough sets based on binary relation and covering.
Inf. Sci. 179, 210–225 (2009)
23. Zhu, W., Wang, F.: A new type of covering rough set. In: Proceedings of Third International
IEEE Conference on Intelligence Systems, pp. 444–449 (2006)
24. Zhu, W., Wang, F.: On three types of covering based rough sets. IEEE Trans. Knowl. Data Eng.
8, 528–540 (2007)
25. Zhu, W., Wang, J.: Contraction to matroidal structure of rough sets. LNAI 8171, 75–86 (2013)
26. Zhu, X., Zhu, W., Fan, X.: Rough set methods in feature selection via submodular function.
Soft Comput. 21(13), 3699–3711 (2017)
Similar Prototype Methods for Class
Imbalanced Data Classification
Abstract In this paper, new methods for solving imbalanced classification problems
based on prototypes are proposed. Using similarity relations for the granulation of
the universe, similarity classes are generated and a prototype is selected for each
similarity class. Experimental results show that the performance of our methods is
statistically superior to other imbalanced methods.
1 Introduction
In Machine Learning, class imbalance problems (in which the examples of one class disproportionately outnumber those of the other) continue to emerge in the industrial and academic sectors alike. Many classification algorithms used in real-world systems and applications fail to meet the performance requirements when faced with severe class distribution skews [1, 2]. Various approaches have been developed in order to
deal with this issue, including some forms of class under-sampling or over-sampling
[3], synthetic data generation [4], misclassification cost sensitive techniques [5],
decision trees [6], rough sets [7], kernel methods [8], ensembles [9–11] or active
learning [12]. Novel classifier designs are still being proposed [13].
An alternative for mitigating this problem is classification based on the Nearest Prototype (NP) [14]. This method determines the value of the decision attribute of a new object by analyzing its similarity with respect to a set of prototypes, which are selected or generated from the initial set of instances. The prototype set is obtained either by selecting from the original set of labeled examples or by replacing the original set with a different, reduced one [15].
Also, using Rough Set Theory (RST) [16] it is possible to solve problems related to data reduction, discovery of dependencies between data, estimation of
data significance, generation of decision or control algorithms from data, approximate
classification of data, discovery of similarities or differences in data, discovery of
patterns, discovery of cause-effect relationships, etc. In particular, rough sets have had
an interesting application in medicine, business, engineering design, meteorology,
vibration analysis, conflict analysis, image processing, voice recognition, character
recognition, decision analysis, etc. [17].
On the other hand, the algorithms NPBASIR-CLASS [18] and NPBASIR SEL-CLASS [19] have been recognized for their good results with respect to classification accuracy. These methods combine the NP approach with RST and are designed to construct and select prototypes, respectively, using concepts from Granular Computing [20]; both are based on the NPBASIR algorithm [21]. Granulation of the universe is performed using a similarity relation, which generates similarity classes of objects of the universe, and for each similarity class one prototype is built. The similarity relation is constructed using the method proposed in [22].
The goal of this work is to extend the capabilities of prototype-based methods and similarity relations so that they are sensitive to class imbalance in data classification.
2 Methodology
The method proposed in [18] is an iterative procedure in which prototypes are constructed from similarity classes of objects of the universe: a similarity class, denoted by $[O_i]_R$, is constructed using the similarity relation R, and a prototype is constructed for this similarity class. Whenever an object is included in a similarity class, it is marked as used and is not taken into account when another similarity class is constructed; however, used objects can belong to similarity classes that will be constructed for other non-used objects.
This method uses a similarity relation R and a set of instances X = {X_1, X_2, . . . , X_n}, each of which is described by a vector of m descriptive features
C. SMOTE-RSB* [28]: This is another hybrid data-level method. It first applies SMOTE to introduce new synthetic minority class instances into the training set and then removes the synthetic instances that do not belong to the lower approximation of their class, computed using rough set theory. This process is repeated until the training set is balanced.
D. SMOTE-TL [25]: This method consists of the application of Tomek links as a cleaning method over the data set obtained by the application of SMOTE.
The measure θ(DS) [22] represents the degree to which the similarity between objects, using the conditional features in A, is equivalent to the similarity obtained according to the decision feature d. The problem is to find the relations R1 and R2 that maximize the similarity quality measure, according to expression (2):

$$\theta(DS) = \frac{\sum_{\forall x \in U} \varphi(x)}{|U|} \rightarrow \max \qquad (2)$$
In the case of decision systems in which the domain of the decision feature is discrete, as in classification problems, the relation R2 is defined as x R2 y ⇔ x(d) = y(d), where x(d) is the value of the decision feature d for the object x.
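A small numeric sketch of θ(DS) on a toy decision system follows. The exact form of ϕ(x) is defined in [22] and is not reproduced in this excerpt; here, as an illustration only, we take ϕ(x) to be the overlap ratio between the R1-similarity class of x and its R2-decision class:

```python
def phi(x, U, R1, R2):
    """Assumed agreement degree between the similarity class of x under R1
    and its class under R2 (the exact phi comes from [22]); here it is
    the overlap ratio |[x]_R1 & [x]_R2| / |[x]_R1 | [x]_R2|."""
    c1 = {y for y in U if R1(x, y)}
    c2 = {y for y in U if R2(x, y)}
    return len(c1 & c2) / len(c1 | c2)

def theta(U, R1, R2):
    """Similarity quality measure: the average of phi over the universe."""
    return sum(phi(x, U, R1, R2) for x in U) / len(U)

# Toy decision system: object -> (feature value, decision class).
data = {1: (0.1, 'a'), 2: (0.2, 'a'), 3: (0.8, 'b'), 4: (0.9, 'b')}
R1 = lambda x, y: abs(data[x][0] - data[y][0]) <= 0.3   # feature similarity
R2 = lambda x, y: data[x][1] == data[y][1]              # same decision

print(theta(set(data), R1, R2))   # 1.0: R1-classes coincide with the classes
```

The value 1.0 reflects a perfect match between the feature-based similarity classes and the decision classes; a looser threshold in R1 would mix the classes and lower θ.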
This measure has been successfully applied as a method for calculating weights in the k-NN function estimator, for calculating the initial weights of the links between the input layer and the hidden layer in a multi-layer perceptron network [31], in the rule generation method IRBASIR [32], and recently in the construction of the prototype sets used to solve function approximation [21] and classification [19, 33] problems.
Next, we present the modification alternatives to the NPBASIR-CLASS and NPBASIR SEL-CLASS algorithms for imbalanced datasets with two classes. These variants modify the similarity quality measure defined by (1).
IMBNPBASIR-CLASS v1 and IMBNPBASIR SEL-CLASS v1: modification of the similarity quality measure as in (3):

$$\theta(DS) = \frac{\sum_{\forall x \in U} \varphi^{*}(x)}{|U|} \qquad (3)$$
This modification reduces the contribution of the objects of the majority class unless it has value 1, where C+ is the set of objects belonging to the majority class and C− the set of objects of the minority class.
IMBNPBASIR-CLASS v2 and IMBNPBASIR SEL-CLASS v2: modification of the similarity quality measure as in (5), the other alternative:

$$\theta(DS) = \frac{\alpha \cdot \theta^{-}(DS) + (1-\alpha) \cdot \theta^{+}(DS)}{2}, \quad \text{for } 0.5 < \alpha < 1 \qquad (5)$$
where θ−(DS) and θ+(DS) are defined by expressions (6) and (7):

$$\theta^{-}(DS) = \frac{\sum_{\forall x \in C^{-}} \varphi(x)}{|C^{-}|} \qquad (6)$$

$$\theta^{+}(DS) = \frac{\sum_{\forall x \in C^{+}} \varphi(x)}{|C^{+}|} \qquad (7)$$
With this modification, we can clearly give more weight to the objects of the
minority class when calculating the quality of the similarity.
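Expressions (5)–(7) can be sketched numerically with hypothetical ϕ values (the function names and the example numbers below are ours):

```python
def theta_minus(phi_vals, C_minus):
    """Expression (6): average phi over the minority class."""
    return sum(phi_vals[x] for x in C_minus) / len(C_minus)

def theta_plus(phi_vals, C_plus):
    """Expression (7): average phi over the majority class."""
    return sum(phi_vals[x] for x in C_plus) / len(C_plus)

def theta_v2(phi_vals, C_minus, C_plus, alpha=0.75):
    """Expression (5): weighted combination with 0.5 < alpha < 1, so the
    minority class weighs more in the quality of the similarity."""
    assert 0.5 < alpha < 1
    return (alpha * theta_minus(phi_vals, C_minus)
            + (1 - alpha) * theta_plus(phi_vals, C_plus)) / 2

# Hypothetical phi values: object 4 is the only minority-class object.
phi_vals = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.5}
print(theta_v2(phi_vals, C_minus={4}, C_plus={1, 2, 3}))   # 0.3125
```

With α = 0.75, the single poorly-approximated minority object pulls θ down much more than a majority object with the same ϕ would.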
Also, in both variants, step 3 of IMBNPBASIR-CLASS and IMBNPBASIR SEL-CLASS uses the relation R to build the similarity class of an object $x_i$; this means that two objects are similar if their similarity according to the descriptive features is greater than a threshold ε1 and they belong to the same class:

$$x_i R x_j \Leftrightarrow F_1(x_i, x_j) \geq \varepsilon_1 \; \text{and} \; F_2(x_i, x_j) = 1 \qquad (8)$$
The weights in expression (5) are calculated according to the method proposed in [31, 34], and the feature comparison function $\partial_i(x_i, y_i)$, which calculates the similarity between the values of objects x and y with respect to feature i, is defined by expression (10), where $D_i$ is the domain of feature i:

$$\partial_i(x_i, y_i) = \begin{cases} 1 - \dfrac{|x_i - y_i|}{Max(D_i) - Min(D_i)} & \text{if } i \text{ is continuous} \\ 1 & \text{if } i \text{ is discrete and } x_i = y_i \\ 0 & \text{if } i \text{ is discrete and } x_i \neq y_i \end{cases} \qquad (10)$$
Using expressions (8) and (10) makes it possible to work with mixed data, i.e., application domains where the descriptive features can take either numeric or symbolic values.
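The mixed-data similarity can be sketched as follows. Expression (10) is implemented directly; the weighted-sum aggregation F1 and the example weights are assumptions of ours (the actual weights come from the method of [31, 34]):

```python
def delta(xv, yv, domain, continuous):
    """Feature comparison function of expression (10)."""
    if continuous:
        return 1 - abs(xv - yv) / (max(domain) - min(domain))
    return 1.0 if xv == yv else 0.0

def F1(x, y, domains, continuous, weights):
    """Assumed weighted-sum aggregation of the per-feature similarities."""
    return sum(w * delta(xv, yv, d, c)
               for xv, yv, d, c, w in zip(x, y, domains, continuous, weights))

# Mixed objects: (age, colour); ages range over [18, 90].
domains    = [(18, 90), ('red', 'blue', 'green')]
continuous = [True, False]
weights    = [0.5, 0.5]
x, y = (30, 'red'), (48, 'red')

sim = F1(x, y, domains, continuous, weights)
print(sim)           # 0.875
print(sim >= 0.8)    # True: with epsilon_1 = 0.8 and F2(x, y) = 1,
                     # x and y are R-similar by expression (8)
```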
4 Experimental Results
This section is divided into two parts. First, we compare both variants of IMBNPBASIR-CLASS and IMBNPBASIR SEL-CLASS with the state-of-the-art methods for imbalanced classification over the entire collection of 89 datasets. Next, we provide a detailed analysis for different IR levels (low IR, high IR, and very high IR). Furthermore, we compare the proposed algorithms combined with the four methods SMOTE, SMOTE-ENN, SMOTE-RSB* and SMOTE-TL with the state-of-the-art methods for imbalanced classification.
We consider 89 datasets with different imbalance ratios (IR), between 1.82 and 129.44, to evaluate our proposal. All the datasets can be found and downloaded from the KEEL-dataset repository [35] (http://keel.es/datasets.php). The characteristics of these datasets can be found in Table 1, showing the IR, the number of instances (Inst), and the number of attributes (Attr) for each of them.
Apart from considering the dataset collection as a whole, in our experimental
study we have also considered three subsets of the collection based on their IR. The
purpose of this division is to evaluate the behavior of the algorithms at different
imbalance levels.
(1) IR < 9 (low imbalance): This group contains 22 datasets, all with IR lower than
9.
(2) IR ≥ 9 (high imbalance): This group contains 49 datasets, all with IR at least 9.
(3) IR ≥ 33 (very high imbalance): This group contains 18 datasets, all with IR at
least 33.
This section presents the results of the experimental analysis. In particular, the
proposed algorithms are compared with the state-of-the-art algorithms selected for
the comparative study, with the objective of determining the most competitive
Table 1 Description of the datasets used in the experimental evaluation
Datasets IR Inst Attr Datasets IR Inst Attr
glass1 1.82 214 9 ecoli-0-1-4-7_vs_5-6 12.28 332 7
ecoli-0_vs_1 1.86 220 9 cleveland-0_vs_4 12.31 173 13
wisconsinImb 1.86 683 7 ecoli-0-1-4-6_vs_5 13 280 6
pimaImb 1.87 768 9 ecoli4 13.84 336 7
iris0 2 150 4 shuttle-c0-_vs_-c4 13.87 1829 9
glass0 2.06 214 9 yeast-1_vs_7 14.3 459 7
Similar Prototype Methods …
Table 2 Mean AUC for state-of-the-art methods and the proposed methods for different IR levels
Algorithm All IR < 9 IR ≥ 9 IR ≥ 33
S-C4.5 0.83 0.86 0.85 0.71
B-C4.5 0.82 0.87 0.83 0.70
E-C4.5 0.83 0.87 0.85 0.72
TL-C4.5 0.82 0.86 0.84 0.71
CS-C4.5 0.82 0.87 0.83 0.71
IMBNP-C-v1 0.90 0.86 0.90 0.94
IMBNP-SC-v1 0.92 0.86 0.93 0.95
IMBNP-C-v2 0.90 0.89 0.88 0.97
IMBNP-SC-v2 0.95 0.90 0.96 0.99
proposal in each of the four blocks of experiments considered (all datasets, low IR,
high IR, and very high IR).
SMOTE, SMOTE-ENN, SMOTE-RSB* and SMOTE-TL are four preprocessing
methods that need to be combined with a base classifier; for this purpose we chose
C4.5 [25], a well-known classifier. Similarly, we consider the cost-sensitive C4.5
decision tree (CS-C4.5) as an imbalanced learning method to compare with our
methods, as discussed in previous sections.
Table 2 shows the mean AUC of the selected preprocessing algorithms using C4.5
as the base classifier, together with all the IMBNPBASIR variants, for all datasets
and for the low, high, and very high IR groups. The columns in the table correspond
to: SMOTE-C4.5 (S-C4.5), SMOTE-RSB*-C4.5 (B-C4.5), SMOTE-ENN-C4.5
(E-C4.5), SMOTE-TL-C4.5 (TL-C4.5), IMBNPBASIR-CLASS v1 (IMBNP-C-v1),
IMBNPBASIR SEL-CLASS v1 (IMBNP-SC-v1), IMBNPBASIR-CLASS v2
(IMBNP-C-v2), and IMBNPBASIR SEL-CLASS v2 (IMBNP-SC-v2). We can see
that IMBNPBASIR SEL-CLASS v2 obtains the highest average AUC.
In order to compare the different algorithms appropriately, we will conduct a
statistical analysis using nonparametric tests as suggested in the literature [36].
We first use Friedman's aligned-ranks test [37] to detect statistical differences
among a set of algorithms. The test computes the average aligned rank of each
algorithm, obtained from the difference between the performance of the algorithm
and the mean performance of all algorithms on each dataset [11]. The lower the
average rank, the better the corresponding algorithm. Then, if significant differences
are found, we check whether the control algorithm (the one obtaining the smallest
rank) is significantly better than the others using Holm's post hoc test [11, 38]
(Tables 3, 4, 5, 6, 7, 8, 9 and 10).
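The rank-then-post-hoc workflow can be sketched in plain Python (the AUC values below are illustrative, not the paper's; for brevity we compute plain average ranks, ignoring ties, rather than the aligned-ranks variant):

```python
def average_ranks(scores):
    """Average rank of each algorithm across datasets (rank 1 = best AUC;
    ties are ignored for simplicity)."""
    algos = list(scores)
    n = len(next(iter(scores.values())))
    ranks = {a: 0.0 for a in algos}
    for d in range(n):
        for r, a in enumerate(sorted(algos, key=lambda a: -scores[a][d]), 1):
            ranks[a] += r
    return {a: ranks[a] / n for a in algos}

def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: walking through the sorted p-values,
    reject H_(i) while p_(i) <= alpha / (m - i)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            rejected[i] = True
        else:
            break  # once one hypothesis survives, all larger p-values do too
    return rejected

auc = {  # made-up AUCs over 5 datasets
    "S-C4.5": [0.83, 0.80, 0.78, 0.85, 0.76],
    "CS-C4.5": [0.82, 0.79, 0.77, 0.84, 0.75],
    "IMBNP-SC-v2": [0.95, 0.93, 0.96, 0.97, 0.92],
}
print(average_ranks(auc))               # IMBNP-SC-v2 gets the lowest rank (1.0)
print(holm([0.001, 0.02, 0.04, 0.30]))  # [True, False, False, False]
```

Note how Holm compares the smallest p-value against the strictest threshold (α/m) and relaxes the threshold step by step, which is exactly the "Holm" column shown in the tables below.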
From this experimental study, it can be observed that the method offering the
best results for the imbalanced case is IMBNPBASIR SEL-CLASS v2 in all scenarios
studied (all datasets, low IR, high IR, and very high IR). In the low IR case,
IMBNPBASIR SEL-CLASS v2 obtained results comparable to IMBNPBASIR
SEL-CLASS v1. On the other hand, in the very high IR case all IMBNPBASIR
variants obtain comparable results, significantly higher than those of the state-of-
Table 4 Holm's post hoc procedure for all imbalanced datasets, using IMBNP-SC-v2 as the control
algorithm
i Algorithm z = (R0 − Ri)/SE p Holm Hypothesis
8 CS-C4.5 10.468653 0 0.00625 Rejected
7 TL-C4.5 10.304439 0 0.007143 Rejected
6 B-C4.5 10.099171 0 0.008333 Rejected
5 E-C4.5 9.195993 0 0.01 Rejected
4 S-C4.5 9.00441 0 0.0125 Rejected
3 IMBNP-C-v1 4.379044 0.000012 0.016667 Rejected
2 IMBNP-SC-v1 4.119039 0.000038 0.025 Rejected
1 IMBNP-C-v2 3.51692 0.000437 0.05 Rejected
Table 6 Holm's post hoc procedure for low imbalance datasets, using IMBNP-SC-v2 as the control
algorithm
i Algorithm z = (R0 − Ri)/SE p Holm Hypothesis
8 TL-C4.5 4.651572 0.000003 0.00625 Rejected
7 S-C4.5 3.385464 0.000711 0.007143 Rejected
6 E-C4.5 3.192795 0.001409 0.008333 Rejected
5 B-C4.5 3.082699 0.002051 0.01 Rejected
4 CS-C4.5 3.02765 0.002465 0.0125 Rejected
3 IMBNP-C-v2 2.89003 0.003852 0.016667 Rejected
2 IMBNP-C-v1 2.312024 0.020776 0.025 Rejected
1 IMBNP-SC-v1 0.990867 0.32175 0.05 Not rejected
Table 8 Holm's post hoc procedure for high imbalance datasets, using IMBNP-SC-v2 as the control
algorithm
i Algorithm z = (R0 − Ri)/SE p Holm Hypothesis
8 CS-C4.5 8.741877 0 0.00625 Rejected
7 B-C4.5 8.280807 0 0.007143 Rejected
6 TL-C4.5 8.059494 0 0.008333 Rejected
5 E-C4.5 7.026698 0 0.01 Rejected
4 S-C4.5 6.97137 0 0.0125 Rejected
3 IMBNP-SC-v1 4.684466 0.000003 0.016667 Rejected
2 IMBNP-C-v1 3.375028 0.000738 0.025 Rejected
1 IMBNP-C-v2 1.99182 0.046391 0.05 Rejected
Table 10 Holm's post hoc procedure for very high imbalance datasets, using IMBNP-SC-v2 as the
control algorithm
i Algorithm z = (R0 − Ri)/SE p Holm Hypothesis
8 CS-C4.5 5.507655 0 0.00625 Rejected
7 B-C4.5 5.385938 0 0.007143 Rejected
6 E-C4.5 5.32508 0 0.008333 Rejected
5 S-C4.5 4.777358 0.000002 0.01 Rejected
4 TL-C4.5 4.473068 0.000008 0.0125 Rejected
3 IMBNP-C-v1 1.612739 0.106801 0.016667 Not rejected
2 IMBNP-C-v2 1.338877 0.180611 0.025 Not rejected
1 IMBNP-SC-v1 0.334719 0.737837 0.05 Not rejected
the-art algorithms. In the other cases, IMBNPBASIR SEL-CLASS v2 surpasses both
the remaining IMBNPBASIR variants and the other state-of-the-art algorithms.
Table 11 shows the mean AUC of the selected preprocessing algorithms using the
IMBNPBASIR variants as classifiers. The columns in the table correspond to:
• SMOTE + IMBNPBASIR-CLASS v1: S-IMBNP-C-v1
• SMOTE + IMBNPBASIR-CLASS v2: S-IMBNP-C-v2
• SMOTE + IMBNPBASIR SEL-CLASS v1: S-IMBNP-SC-v1
• SMOTE + IMBNPBASIR SEL-CLASS v2: S-IMBNP-SC-v2
• SMOTE-ENN + IMBNPBASIR-CLASS v1: E-IMBNP-C-v1
• SMOTE-ENN + IMBNPBASIR-CLASS v2: E-IMBNP-C-v2
• SMOTE-ENN + IMBNPBASIR SEL-CLASS v1: E-IMBNP-SC-v1
• SMOTE-ENN + IMBNPBASIR SEL-CLASS v2: E-IMBNP-SC-v2
• SMOTE-TL + IMBNPBASIR-CLASS v1: TL-IMBNP-C-v1
• SMOTE-TL + IMBNPBASIR-CLASS v2: TL-IMBNP-C-v2
• SMOTE-TL + IMBNPBASIR SEL-CLASS v1: TL-IMBNP-SC-v1
• SMOTE-TL + IMBNPBASIR SEL-CLASS v2: TL-IMBNP-SC-v2
206 Y. R. Alvarez et al.
Table 11 Mean AUC for state-of-the-art methods and the proposed methods for different IR levels
combined with preprocessed datasets
Algorithm All IR < 9 IR ≥ 9 IR ≥ 33
S-IMBNP-C-v1 0.91 0.86 0.92 0.93
S-IMBNP-C-v2 0.90 0.86 0.92 0.92
S-IMBNP-SC-v1 0.94 0.89 0.95 0.97
S-IMBNP-SC-v2 0.94 0.89 0.95 0.97
E-IMBNP-C-v1 0.88 0.86 0.90 0.83
E-IMBNP-C-v2 0.87 0.86 0.90 0.83
E-IMBNP-SC-v1 0.90 0.89 0.93 0.86
E-IMBNP-SC-v2 0.90 0.89 0.93 0.86
TL-IMBNP-C-v1 0.90 0.86 0.92 0.88
TL-IMBNP-C-v2 0.89 0.85 0.92 0.88
TL-IMBNP-SC-v1 0.92 0.89 0.95 0.90
TL-IMBNP-SC-v2 0.91 0.88 0.93 0.91
Table 13 Holm's post hoc procedure for all imbalanced preprocessed datasets, using S-IMBNP-SC-
v2 as the control algorithm
i Algorithm z = (R0 − Ri)/SE p Holm Hypothesis
16 TL-C4.5 10.568173 0 0.003125 Rejected
15 B-C4.5 10.449429 0 0.003333 Rejected
14 CS-C4.5 10.4049 0 0.003571 Rejected
13 E-C4.5 10.10062 0 0.003846 Rejected
12 S-C4.5 9.937348 0 0.004167 Rejected
11 TL-GEN-V2 5.744217 0 0.004545 Rejected
10 E-GEN-V2 5.321194 0 0.005 Rejected
9 TL-GEN-V1 5.2544 0 0.005556 Rejected
8 E-GEN-V1 5.039178 0 0.00625 Rejected
7 S-GEN-V2 4.452882 0.000008 0.007143 Rejected
6 S-GEN-V1 4.14118 0.000035 0.008333 Rejected
5 E-SEL-V1 2.463928 0.013742 0.01 Not rejected
4 TL-SEL-V1 2.078011 0.037708 0.0125 Not rejected
3 TL-SEL-V2 1.922161 0.054586 0.016667 Not rejected
2 E-SEL-V2 1.536244 0.124478 0.025 Not rejected
1 S-SEL-V1 0.289437 0.772247 0.05 Not rejected
5 Conclusions
Four new proposals for imbalanced data classification were presented in this paper.
Their novelty lies in hybridizing Rough Set Theory, specifically the similarity quality
measure, with prototype-based classification concepts in order to classify objects
under these conditions. This measure allows building a prototype that covers the
objects whose decision value is the majority class of the similarity class.
Finally, after the experimental study and the statistical analysis carried out, it can
be concluded that the proposed methods are very competitive in imbalanced domains,
since they obtain results significantly better than those of the state-of-the-art
algorithms.
References
1. Kuang, D., Ling, C.X., Du, J.: Foundation of mining class-imbalanced data. In: Pacific-Asia
Conference on Knowledge Discovery and Data Mining. Springer (2012)
2. García-Pedrajas, N., et al.: Class imbalance methods for translation initiation site recognition
in DNA sequences. Knowl.-Based Syst. 25(1), 22–34 (2012)
3. Garcia-Pedrajas, N., Perez-Rodriguez, J., de Haro-Garcia, A.: OligoIS: scalable instance selec-
tion for class-imbalanced data sets. IEEE Trans. Cybern. 43(1), 332–346 (2013)
4. Thanathamathee, P., Lursinsap, C.: Handling imbalanced data sets with synthetic boundary
data generation using bootstrap re-sampling and AdaBoost techniques. Pattern Recogn. Lett.
34(12), 1339–1347 (2013)
5. McCarthy, K., Zabar, B., Weiss, G.: Does cost-sensitive learning beat sampling for classifying
rare classes? In: Proceedings of the 1st International Workshop on Utility-Based Data Mining.
ACM (2005)
6. Liu, W., et al.: A robust decision tree algorithm for imbalanced data sets. In: Proceedings of
the 2010 SIAM International Conference on Data Mining. SIAM (2010)
7. Liu, J., Hu, Q., Yu, D.: A comparative study on rough set based class imbalance learning.
Knowl.-Based Syst. 21(8), 753–763 (2008)
8. Hong, X., Chen, S., Harris, C.J.: A kernel-based two-class classifier for imbalanced data sets.
IEEE Trans. Neural Netw. 18(1), 28–41 (2007)
9. Galar, M., et al.: A review on ensembles for the class imbalance problem: bagging-, boosting-
, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(4),
463–484 (2012)
10. García-Pedrajas, N., García-Osorio, C.: Boosting for class-imbalanced datasets using geneti-
cally evolved supervised non-linear projections. Prog. Artif. Intell. 2(1), 29–44 (2013)
11. Galar, M., et al.: EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolu-
tionary undersampling. Pattern Recogn. 46(12), 3460–3471 (2013)
12. Ertekin, S., Huang, J., Giles, C.L.: Active learning for class imbalance problem. In: Proceedings
of the 30th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval. ACM (2007)
13. Di Martino, M., et al.: Novel classifier scheme for imbalanced problems. Pattern Recogn. Lett.
34(10), 1146–1151 (2013)
14. Bezdek, J.C., Kuncheva, L.I.: Nearest prototype classifier designs: an experimental study. Int.
J. Intell. Syst. 16(12), 1445–1473 (2001)
15. Triguero, I., et al.: A taxonomy and experimental study on prototype generation for nearest
neighbor classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(1), 86–100 (2012)
16. Pawlak, Z., et al.: Rough sets. Commun. ACM 38(11), 88–95 (1995)
17. Bello, R., Luis Verdegay, J.: Los conjuntos aproximados en el contexto de la Soft Computing.
Revista Cubana de Ciencias Inf. 4 (2010)
18. Fernández Hernández, Y.B., et al.: An approach for prototype generation based on similarity
relations for problems of classification. Comput. Syst. 19(1), 109–118 (2015)
19. Frias, M., et al.: Prototypes selection based on similarity relations for classification problems.
In: Engineering Applications—International Congress on Engineering (WEA), Bogota. IEEE
(2015)
20. Yao, Y.: Granular computing: basic issues and possible solutions. In: Proceedings of the 5th
Joint Conference on Information Sciences. Citeseer (2000)
21. Bello-García, M., García-Lorenzo, M.M., Bello, R.: A method for building prototypes in the
nearest prototype approach based on similarity relations for problems of function approxima-
tion. In: Advances in Artificial Intelligence, pp. 39–50. Springer (2012)
22. Filiberto, Y., et al.: A method to build similarity relations into extended rough set theory. In:
2010 10th International Conference on Intelligent Systems Design and Applications (ISDA).
IEEE (2010)
23. Zhao, J.H., Li, X., Dong, Z.Y.: Online rare events detection. In: Pacific-Asia Conference on
Knowledge Discovery and Data Mining. Springer (2007)
24. Lee, Y.-H., et al.: A preclustering-based ensemble learning technique for acute appendicitis
diagnoses. Artif. Intell. Med. 58(2), 115–124 (2013)
25. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
26. Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for
balancing machine learning training data. ACM SIGKDD Explor. Newsl 6(1), 20–29 (2004)
27. Ting, K.M.: An instance-weighting method to induce cost-sensitive trees. IEEE Trans. Knowl.
Data Eng. 14(3), 659–665 (2002)
28. Chawla, N.V., et al.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res.
16, 321–357 (2002)
29. Ramentol, E., et al.: SMOTE-FRST: a new resampling method using fuzzy rough set theory. In:
10th International FLINS Conference on Uncertainty Modelling in Knowledge Engineering
and Decision Making (to appear) (2012)
30. Ramentol, E., et al.: SMOTE-RSB*: a hybrid preprocessing approach based on oversampling
and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Knowl.
Inf. Syst. 33(2), 245–265 (2012)
31. Filiberto, Y., et al.: An analysis about the measure quality of similarity and its applications
in machine learning. In: Fourth International Workshop on Knowledge Discovery, Knowledge
Management and Decision Support. Atlantis Press (2013)
32. Filiberto, Y., et al.: Algoritmo para el aprendizaje de reglas de clasificación basado en la teoría
de los conjuntos aproximados extendida. Dyna 78(169), 62–70 (2011)
33. Fernandez, Y.B., et al.: Effects of using reducts in the performance of the irbasir algorithm.
Dyna 80(182), 182–190 (2013)
34. Filiberto, Y., et al.: Using PSO and RST to predict the resistant capacity of connections in
composite structures. In: Nature Inspired Cooperative Strategies for Optimization (NICSO
2010) (2010)
35. Alcalá-Fdez, J., et al.: Keel data-mining software tool: data set repository, integration of algo-
rithms and experimental analysis framework. J. Multiple-Valued Logic Soft Comput. 17 (2011)
36. García, S., et al.: Advanced nonparametric tests for multiple comparisons in the design of
experiments in computational intelligence and data mining: experimental analysis of power.
Inf. Sci. 180(10), 2044–2064 (2010)
37. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
38. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6(2), 65–70
(1979)
Early Detection of Possible
Undergraduate Drop Out Using a New
Method Based on Probabilistic Rough
Set Theory
1 Introduction
Two essential elements in any educational project are retention and completion of
studies by students. Drop out is one of the most complex problems that educational
institutions face nowadays. Drop out means that a number of enrolled students do
not follow the normal path of the academic program, either by repeating courses or
withdrawing from it, permanently or temporarily [37]. There may be several causes
for student drop out [36]. Determining how likely a student is to successfully complete
the academic year is both important and challenging.
Our main objective in this paper is therefore to create a reliable tool to predict the
likelihood of each student successfully passing the academic year. Such a prediction
will be carried out on the description of the students available at the moment of
enrollment. This information will be very useful for creating categories of students,
so that the attention they receive can be personalized in terms of their expected
results, allowing us to reduce drop out. The study is carried out at the Informatics
Engineering department of the University of Camagüey, Cuba, where the drop out
rate among freshmen is about 25%.
A straightforward way to perform such a prediction is to use data mining tech-
niques [24, 32, 33]. A problem affecting the accuracy of standard algorithms is the
fact that, normally, many more students smoothly pass the academic year than have
problems with specific topics. This means that any data mining technique will have
to cope with many more examples of successful students than of unsuccessful ones,
biasing the technique to predict a success more often than a failure. In fact, for the
academic process, the most important predictions are the possible failures.
In machine learning this phenomenon is known as the class imbalance problem,
and it has been identified as a current challenge in data mining [42]. Imbalanced
problems can be tackled from different perspectives; four categories of techniques
have been established: data level, algorithm level, cost-sensitive, and ensembles. In
this paper we focus on solutions at the algorithm level. We use Rough Set Theory
(RST) [28, 45] to create a probabilistic approach in order to predict drop out.
This paper introduces two novel ideas for classifying highly imbalanced data sets
using probabilistic RST. The first is to use two different similarity thresholds when
deciding on the membership of concepts in the classes. The threshold used to decide
on the inseparability of the objects belonging to the positive class is set to a very low
value, while for the negative class we use a higher threshold. This helps the less
represented class. The second idea is to combine a posteriori probabilities (for a
given observation) with a priori probabilities (the original distribution of concepts)
in the classification algorithm.
We formally introduce the problem in Sect. 2, and the background information is
presented in the following sections. Section 3 introduces imbalanced data set
techniques; we also describe some preprocessing techniques for imbalanced data sets
and discuss the evaluation metric used in this work. Section 4 discusses standard
and probabilistic Rough Set Theory. In Sect. 5 we present our proposal. In Sect. 6
we introduce the experimental study, i.e., the benchmark data sets, the statistical
tests for performance comparison, and the experimental analysis that validates our
proposal. In Sect. 7 we draw the conclusions.
Early Detection of Possible Undergraduate Drop Out … 213
Student drop out is a higher education issue that, given its importance, attracts
several researchers nowadays [19, 22]. Whether the cause is failing courses or
socio-economic factors, the fact is that today drop out rates are higher than ever before
[10]. We can also add all those students who do not pass the exams of the ongoing
academic year and decide to repeat the courses the following year.
This problem also affects the economy of countries, for it makes the pro-
fessional formation process even more expensive. The causes of student drop out have
been widely studied. Reasons may vary from family conditions, parents'
educational background, and the age of students when enrolling in the system,
to other social factors affecting the motivation of the students. It is certainly a tough
task to know beforehand whether a given student will succeed in the university.
There are many applications of data mining techniques to improve the edu-
cational system, such as the tool for auto-regulated intelligent tutoring systems pre-
sented in [11] and the decision support system presented in [21]. In this paper we
introduce a data mining approach for student drop out prediction.
We consider data collected over the period 2008–2012 containing information about
the students of the Informatics Engineering program. We selected a target dataset of
292 students who were in their first academic year at the department. The selected
variables are grounded in studies by psychologists and educators. All these
variables aim to describe each student in detail in order to identify the
major causes of school failure. The variables are shown in Table 1. Once characterized,
the students are labeled into two single classes: students that smoothly promoted,
and those that did not pass all subjects of the semester.
The main goal is to know, at the moment of enrollment in college, how likely
the student is to smoothly pass the current year. This makes it possible to group the
students according to their probability of success or failure and give them special
treatment in order to help them pass.
In order to perform such a prediction, the task is divided into the following steps:
1. Making up the data set: (a) determining the variables, (b) measuring each vari-
able, and (c) labeling each observation as a student that promoted or not
2. Determining the data set characteristics given the number of instances in each
class
3. Choosing the classifier for the application
4. Choosing (if necessary) the preprocessing techniques for the application
5. Supporting the study experimentally
Table 2 shows the description of the data set. There are two class values: the
first for the students that smoothly pass the academic year, and a second one for those
214 E. Ramentol et al.
that do not pass all subjects. Note the big difference in the number of students
belonging to each class. This poses a problem for classifiers known as imbalanced
data sets. The imbalanced data set problem is introduced next.
The learning task in data mining becomes a challenge when the data present a
disproportionate representation of examples across classes. This phenomenon is
known as the class imbalance problem and is very common in many real-world
applications [14, 25].
Classical machine learning algorithms often obtain high accuracy on the major-
ity class, while quite the opposite occurs with the minority class. This happens because
the classifier focuses only on global measures that do not take into account the data dis-
tribution by classes [35]. Nevertheless, the most interesting knowledge often concerns
the minority class [18].
The imbalanced classification problem can be tackled using four main types of
solutions:
1. Sampling (solutions at the data level) [2, 7]: this kind of solution consists of
balancing the class distribution by means of a preprocessing strategy.
2. Design of specific algorithms (solutions at the algorithmic level) [20]: in this
case we need to adapt our method to deal directly with the imbalance between
the classes, for example, modifying the cost per class or adjusting the probability
estimation in the leaves of a decision tree to favor the positive class [41].
3. Cost sensitive: this kind of methods incorporate solutions at data level, at algo-
rithmic level, or at both levels together, considering higher misclassification
costs for the examples of the positive class with respect to the negative class,
and therefore, trying to minimize higher cost errors [48].
4. Ensemble solutions [15]: ensemble techniques for imbalanced classification
usually combine an ensemble learning algorithm with one of the techniques
above, specifically data level and cost-sensitive ones.
In the following, we describe some high-quality proposals that will be used in our exper-
imental study.
• Synthetic Minority Oversampling Technique (SMOTE) [7]: an oversampling
method.
• SMOTE-Tomek links [2]: applies Tomek links to the oversampled training set as a
data cleaning method.
• SMOTE-ENN [2]: ENN tends to remove more examples than the Tomek links
do.
• Borderline-SMOTE1 and Borderline-SMOTE2: these methods only oversam-
ple or strengthen the borderline minority examples [17].
• Safe-Level-SMOTE: this method assigns each positive instance its safe level
before generating synthetic instances [6].
• SPIDER2 [26]: this method consists of two phases, preprocessing the majority
and minority classes respectively.
• SMOTE-RSB* [31]: this hybrid method constructs new samples using the Syn-
thetic Minority Oversampling Technique together with an editing technique based
on Rough Set Theory and the lower approximation of a subset.
• Cost-sensitive C4.5 decision tree (C4.5-CS) [38]: builds decision trees that try
to minimize the number of high-cost errors and, as a consequence, lead in most
cases to the minimization of the total misclassification cost.
• Cost-sensitive Support Vector Machine (SVM-CS) [40]: a modification of the
soft-margin support vector machine [39].
• EUSBOOST [16]: an ensemble method that uses Evolutionary UnderSam-
pling guided boosting.
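The core idea of SMOTE, interpolating between a minority instance and one of its k nearest minority neighbours, can be sketched as follows (a simplification assuming numeric features and Euclidean distance, not the reference implementation):

```python
import random

def smote_sample(minority, k=5, rng=random):
    """Generate one synthetic minority example (simplified SMOTE sketch)."""
    x = rng.choice(minority)
    # k nearest minority neighbours of x (squared Euclidean distance)
    neighbours = sorted(
        (p for p in minority if p is not x),
        key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
    )[:k]
    nb = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(x, nb))

minority = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 2.0)]
# The new point lies on the segment between two existing minority points.
print(smote_sample(minority, k=2))
```

The cleaning variants above (Tomek links, ENN, the RSB* lower-approximation filter) then remove some of the generated or original instances after this oversampling step.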
When facing an imbalanced problem, the traditional predictive accuracy is not appro-
priate, because the costs of different errors vary markedly from one class to another
[8, 31].
In imbalanced domains, one of the most appropriate measures is the Receiver Oper-
ating Characteristic (ROC) graphic [5]. In these graphics, the tradeoff between
benefits (true positive rate) and costs (false positive rate) can be visualized; they
acknowledge the fact that no classifier can increase the number of true positives
without also increasing the false positives. The area under the ROC curve (AUC)
corresponds to the probability of correctly identifying which of two stimuli is noise
and which is signal plus noise.
In this paper, we use the definition given by Fawcett [13], who proposed an
algorithm that adds successive areas of trapezoids to the computed AUC value
instead of collecting ROC points. Fawcett's proposal calculates the AUC by
approximating the continuous ROC curve by a finite number of points. The
coordinates of these points in ROC space are the false positive and true positive
rates obtained by varying the threshold θ of the probability above which an instance
is classified as positive. The curve itself is approximated by linear interpolation
between the calculated points, and the AUC is then the sum of the areas of the
subsequent trapezoids. This method is referred to as the trapezoid rule.
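A direct implementation of the trapezoid rule described above might look like this (our own sketch of Fawcett's procedure, assuming binary labels with 1 = positive):

```python
def auc_trapezoid(labels, scores):
    """AUC by summing trapezoid areas over ROC points obtained by sweeping
    the decision threshold down the sorted scores."""
    pairs = sorted(zip(scores, labels), reverse=True)
    P = sum(1 for l in labels if l == 1)
    N = len(labels) - P
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    prev_score = None
    area = 0.0
    for score, label in pairs:
        if prev_score is not None and score != prev_score:
            fpr, tpr = fp / N, tp / P
            area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
            prev_fpr, prev_tpr = fpr, tpr
        prev_score = score
        if label == 1:
            tp += 1
        else:
            fp += 1
    area += (1.0 - prev_fpr) * (1.0 + prev_tpr) / 2.0  # close at (1, 1)
    return area

print(auc_trapezoid([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # 1.0 (perfect ranking)
print(auc_trapezoid([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6]))  # 0.75
```

Handling tied scores in a single ROC point (the `score != prev_score` check) is what keeps the estimate from being optimistic when many instances receive the same probability.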
Rough set theory was presented in 1982 [27] and has evolved into a method-
ology for dealing with different types of problems, such as the uncertainty produced by
inconsistencies in data [3].
RST is a mathematical tool to express uncertainty when it appears as inconsistency.
RST can deal with quantitative and qualitative data, and it is not necessary to eliminate
missing values. RST has become a powerful tool for data mining tasks such as feature
selection, instance selection, rule extraction, and so on [30].
RST provides three concepts: the lower and upper approximations of a subset X ⊆
U, and the boundary region. These concepts were originally introduced in reference
to an indiscernibility relation R.
Using the concept of similarity, classical RST has been extended. This exten-
sion is possible by considering that objects that are not indiscernible but
sufficiently close or similar can be grouped into the same class [34]. The main objec-
tive of the similarity relation is to create a more flexible model. There are many
similarity functions, which depend on the type of compared attribute. A similarity
relation R′ must satisfy some minimal requirements:
R being an indiscernibility relation (equivalence relation) defined on U, R′ is a
similarity relation extending R if ∀x ∈ U, R(x) ⊆ R′(x) and ∀x ∈ U, ∀y ∈ R′(x),
R(y) ⊆ R′(x), where R′(x) is the similarity class of x, i.e., R′(x) = {y ∈ U : y R′ x}.
The approximation of a set X ⊆ U, using the similarity relation R′, is defined as a
pair of sets called the R′-lower approximation of X and the R′-upper approx-
imation of X. The lower approximation B_*(X) and upper approximation B^*(X) of
X are defined respectively as shown in Eqs. 1 and 2.

B_*(X) = {x ∈ X : R′(x) ⊆ X} (1)

B^*(X) = ⋃_{x ∈ X} R′(x) (2)

Taking into account Eqs. 1 and 2, the boundary region of X is defined for the
relation R′ as:

BNB(X) = B^*(X) − B_*(X) (3)

If the set BNB(X) is empty, then the set X is exact with respect to the relation R′.
If, on the contrary, BNB(X) ≠ ∅, the set X is inexact or approximated with respect
to R′.
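Equations 1-3 translate almost literally into code (a sketch; `similar(x)` stands for the similarity class R′(x) and is supplied by the caller):

```python
def approximations(X, similar):
    """Lower/upper approximations and boundary region (Eqs. 1-3)."""
    lower = {x for x in X if similar(x) <= X}       # R'(x) ⊆ X
    upper = set().union(*(similar(x) for x in X))   # union of R'(x) over x ∈ X
    return lower, upper, upper - lower

# Toy universe {1..5} with similarity classes {1,2}, {3,4}, {5}:
sim = {1: {1, 2}, 2: {1, 2}, 3: {3, 4}, 4: {3, 4}, 5: {5}}
lower, upper, boundary = approximations({1, 2, 3}, lambda x: sim[x])
# lower = {1, 2}, upper = {1, 2, 3, 4}, boundary = {3, 4}
print(lower, upper, boundary)
```

Here the set {1, 2, 3} is inexact: object 3 drags its whole similarity class {3, 4} into the upper approximation, so the boundary region is non-empty.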
In recent years, many researchers have put effort into creating approaches
for the construction of probabilistic rough set models. These approaches have been
proposed based on the concept of the rough membership function.
In [43] the authors form two classes of rough set models: algebraic and proba-
bilistic rough sets. The former focuses on the algebraic and qualitative properties of
the theory; the latter is more practical and captures its quantitative properties
[4, 45].
Using rough membership functions and rough inclusion, the classical rough set
approximations are reformulated, defining larger positive and negative regions and
providing probabilities that define the region boundaries. The boundary region
contains the objects that induce uncertainty; reducing this region is a challenging
task faced by researchers in this area. Probabilistic rough sets provide a possible
solution by redefining more flexible POSitive (POS) and NEGative (NEG) regions,
that is to say, including in POS and NEG objects that were previously in the
boundary region [4].
Pawlak et al. introduced in [28] a proposal that defined probabilistic approxima-
tions. This proposal puts an element x into the lower approximation of A if the
majority of its equivalent elements [x] are in A. The lower and upper 0.5-proba-
bilistic approximation operators are dual to each other. The boundary region consists
of those elements whose conditional probabilities are exactly 0.5, which represents
maximal uncertainty.
The requirements of this approach are too loose for real decisions. To overcome
these difficulties, probabilistic rough set models were proposed to generalize the 0.5-
probabilistic rough set model, introducing a pair of threshold parameters.
By considering two separate cases, Yao and Wong [47] introduced more general
probabilistic approximations in the decision-theoretic rough set model [4].
The objects in the same equivalence class have the same degree of membership.
This membership may be interpreted as the probability of x belonging to X given
that x belongs to an equivalence class; this interpretation leads to probabilistic rough
sets [43].
The rough membership function is defined by Eq. 6; this measure takes values in
the interval [0, 1].

μ_X^B(x) = |X ∩ B(x)| / |B(x)| (6)
B^*(X) = {x ∈ U : μ_X^B(x) > 0} (8)

B^*_τ(X) = {x ∈ U : μ_X^B(x) > 1 − τ} (10)
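Equations (6), (8), and (10) can be prototyped directly (a sketch; `block(x)` returns the equivalence class B(x)):

```python
def rough_membership(x, X, block):
    """Eq. (6): mu_X^B(x) = |X ∩ B(x)| / |B(x)|, a value in [0, 1]."""
    b = block(x)
    return len(X & b) / len(b)

def upper_tau(universe, X, block, tau):
    """Eq. (10): keep every object whose rough membership exceeds 1 - tau;
    tau close to 1 recovers the classical upper approximation of Eq. (8)."""
    return {x for x in universe if rough_membership(x, X, block) > 1 - tau}

blocks = {1: {1, 2}, 2: {1, 2}, 3: {3, 4}, 4: {3, 4}}
X = {1, 2, 3}
print(rough_membership(3, X, blocks.get))     # 0.5
print(upper_tau(blocks, X, blocks.get, 0.4))  # {1, 2}: only memberships > 0.6
print(upper_tau(blocks, X, blocks.get, 0.6))  # all four objects (memberships > 0.4)
```

Raising τ relaxes the membership requirement, which is exactly how the probabilistic model moves objects out of the boundary region and into the flexible POS/NEG regions discussed above.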
In this section we introduce a new approach for soft classification over imbalanced
domains based on probabilistic rough sets. The membership probability of an
instance in a class is given as follows:

Pr(X | [x]) = |[x]_X| / |[x]| (11)

where Pr(X | [x]) is the membership probability of x in the class X, [x]_X is the set
of objects belonging to class X that are similar to x, and [x] is the set of all objects
similar to x in the universe.
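A classification rule built on Eq. (11) might look as follows (an illustrative sketch: the fallback to the a priori distribution when no similar object exists is our simplification of the a priori/a posteriori combination, not the authors' exact rule):

```python
def membership_probability(x, class_members, similar):
    """Eq. (11): Pr(X | [x]) = |[x]_X| / |[x]|."""
    sim = similar(x)
    return len(sim & class_members) / len(sim)

def predict(x, classes, similar, priors):
    """Pick the class with the highest membership probability; with no
    similar training objects, fall back to the a priori distribution."""
    if not similar(x):
        return max(priors, key=priors.get)
    return max(classes, key=lambda c: membership_probability(x, classes[c], similar))

classes = {"pass": {1, 2, 3}, "fail": {4, 5}}
similar = lambda x: {1, 2, 4}  # toy similarity class of any query
priors = {"pass": 0.75, "fail": 0.25}
print(predict("new", classes, similar, priors))  # pass: Pr = 2/3 vs 1/3
```

In the toy run, two of the three similar objects belong to "pass", so Pr(pass | [x]) = 2/3 wins; the priors only matter when the similarity class is empty.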
Using probabilistic RST has shown very good results [46]. However, using this
approach over imbalanced data sets yields poor performance. Based on this, we
propose the modifications described next.
220 E. Ramentol et al.
When classifying an instance, the method needs to find all similar instances in the training set. Such similarity is determined using Eq. 12 by fixing a threshold value. It is quite common to use 0.9 as this threshold, making the set of retrieved objects really similar to the original one.
SimilarityMatrix(i, j) = ( Σ_{k=1}^{n} w_k · δ_k(x_ik, x_jk) ) / M    (12)
where n is the number of features, w_k is the weight of feature k, x_ik and x_jk are the values of feature k for objects i and j, respectively, δ_k is the comparison function for feature k, M is the number of features considered in the equivalence relation, and B is the set of features considered in the equivalence relation.
The weight of a feature is defined as:

w_k = 1 if k ∈ B;  0 otherwise    (13)
where max_Ak and min_Ak are the extremes of the domain interval of feature k (they appear in the comparison function δ_k).
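Eqs. 12 and 13 can be sketched as follows; the helper names are illustrative, and the normalization of a numeric feature by its domain width (max_Ak − min_Ak) is an assumption following the convention the text alludes to:

```python
def similarity(xi, xj, B, delta):
    """Eq. 12: average agreement over the features in B.
    delta[k] compares two values of feature k and returns a value in [0, 1];
    per Eq. 13, w_k is 1 for k in B and 0 otherwise, so the sum effectively
    runs over B and M = |B|."""
    total = sum(delta[k](xi[k], xj[k]) for k in B)
    return total / len(B)

# a typical delta for a numeric feature normalizes the absolute difference
# by the feature's domain width (max_Ak - min_Ak)
def numeric_delta(min_ak, max_ak):
    return lambda a, b: 1.0 - abs(a - b) / (max_ak - min_ak)

delta = {0: numeric_delta(0.0, 10.0), 1: lambda a, b: 1.0 if a == b else 0.0}
s = similarity([2.0, "red"], [4.0, "red"], B=[0, 1], delta=delta)
# s == (0.8 + 1.0) / 2 == 0.9, i.e. the pair passes a 0.9 threshold
```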
Nevertheless, it is demonstrated in [31] that this threshold must be reduced for imbalanced data sets. Reducing this value means softening the restrictions of the search. In the imbalanced context, lowering the threshold for the minority class means using a less restrictive search for that class, in order to cope with its poor representation with respect to the other class. As a consequence, we might expect a fairer classification.
We propose using a different threshold value for each class. By doing so, the classification method helps the less represented instances to be better classified. Remember that, due to the high overlap between the classes, almost all instances are similar to the most represented instances and almost none (or even none at all) are similar to the less represented samples.
Standard classification methods ignore the original distribution of the data. This is normally a valid procedure when the classes are balanced. For the imbalanced learning problem, ignoring this distribution can cause a poor classification. We propose to incorporate the original distribution into the probability calculation.
Let C_X be the number of samples belonging to class X and C = Σ_X C_X the total number of samples in the dataset. The a priori probability that a new sample belongs to a given class X can be expressed as Pr(X) = C_X / C. For a new observation we may calculate the probability of belonging to each class as proposed in Eq. 11. In a balanced dataset this expression might be sufficient, since the a priori probabilities are homogeneous; in an imbalanced dataset the original distribution of the samples is not so. We propose to measure the probability of belonging to each class based on the ratio of the a posteriori and a priori probabilities:
R(X | [x]) = Pr(X | [x]) / Pr(X)
The probability for each class can be expressed as its own ratio normalized by the
total aggregation of all of them:
P(X | [x]) = R(X | [x]) / Σ_Y R(Y | [x])    (16)
Finally, the membership function to the positive class can be expressed as the
average of the probability of the pattern to belong to the positive class and the
probability of the pattern not to belong to the negative class:
μ_X(x) = ( P(X | [x]) + 1 − P(X̄ | [x]) ) / 2    (17)

where X̄ denotes the opposite (negative) class.
The membership function to the negative class can be obtained analogously.
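A hedged sketch of the ratio correction and the final membership (Eqs. 16 and 17) for a two-class problem; the class labels "pos" and "neg" are illustrative:

```python
def class_probabilities(posterior, prior):
    """Eq. 16: P(X | [x]) = R(X | [x]) / sum_Y R(Y | [x]),
    where R(X | [x]) = Pr(X | [x]) / Pr(X) is the posterior/prior ratio."""
    ratios = {c: posterior[c] / prior[c] for c in posterior}
    z = sum(ratios.values())
    return {c: r / z for c, r in ratios.items()}

def membership_positive(p):
    """Eq. 17, two-class case: average of the probability of belonging to
    the positive class and of not belonging to the negative class."""
    return (p["pos"] + (1.0 - p["neg"])) / 2.0

# imbalanced toy case: priors 0.1 / 0.9, raw posteriors favor the majority
p = class_probabilities({"pos": 0.3, "neg": 0.7}, {"pos": 0.1, "neg": 0.9})
mu = membership_positive(p)
# the ratio correction turns a 0.3 raw posterior into mu of roughly 0.79
```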
Algorithm 1 RST-2Simil
Require: Tst, the set of test examples;
Tra, the set of training examples;
threshold1, the threshold to determine similarity between minority instances;
threshold2, the threshold to determine similarity between majority instances;
Ensure: Pmin, the probability of belonging to the minority (positive) class;
Pmay, the probability of belonging to the majority class.
1: for each x ∈ Tst do
2:   Pmin = ComputeProb(x, Tra, threshold1)
3:   Pmay = ComputeProb(x, Tra, threshold2)
4:   if Pmin ≥ Pmay then
5:     x ∈ MinClass
6:   else
7:     x ∈ MajClass
8:   end if
9: end for
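A runnable sketch of the per-instance decision in Algorithm 1; `sim`, the toy data and the threshold values are illustrative assumptions, not the chapter's actual similarity function:

```python
def classify(x, training, labels, sim, t_min, t_may):
    """One decision of RST-2Simil (sketch): build a similarity
    neighborhood per class, each with its own threshold (a relaxed one
    for the minority class), then compare membership probabilities.
    `sim(a, b)` is assumed to return a similarity in [0, 1]."""
    def prob(threshold, cls):
        neigh = [t for t in training if sim(x, t) >= threshold]
        if not neigh:
            return 0.0
        return sum(1 for t in neigh if labels[t] == cls) / len(neigh)

    p_min = prob(t_min, "min")  # minority class, relaxed threshold
    p_may = prob(t_may, "may")  # majority class, stricter threshold
    return "min" if p_min >= p_may else "may"

# toy 1-D data: two majority points near 0, two minority points near 1
training = [0.0, 0.1, 0.9, 1.0]
labels = {0.0: "may", 0.1: "may", 0.9: "min", 1.0: "min"}
sim = lambda a, b: max(0.0, 1.0 - abs(a - b))
pred = classify(0.95, training, labels, sim, t_min=0.5, t_may=0.6)
# pred == "min"
```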
6 Experimental Study
To analyze our proposal, we considered 18 data sets from the UCI repository [1] with high imbalance ratios (higher than 9). The description of these data sets appears in Table 3 (column IR indicates the imbalance ratio).
The results of the experimental study on the test partitions are shown in Table 4, whose first columns include the results for 1-NN, Cost-Sensitive-C4.5, C4.5, Cost-Sensitive-MLP and our proposal; the best method is highlighted in bold for each data set. The remaining columns show the results of 5 resampling techniques (based on SMOTE) combined with C4.5. The strength of our approach can be observed, since it obtains the highest performance value against almost all the compared methodologies.
We support the comparison with a statistical analysis in order to demonstrate
the superiority of our proposal. The average ranks of the algorithms are shown in
Table 5. The p-value computed by the Friedman test is approximately 0, which
indicates that the hypothesis of equivalence can be rejected with high confidence.
Figure 1 illustrates the procedure carried out to tune the parameters. Figure 1a shows the average AUC value on the Y axis while threshold1 varies along the X axis. We can observe that the best result is obtained when using 0.5. Figure 1b shows a similar result for threshold2; the best result is obtained using 0.6, although the difference is much smaller than when varying threshold1.
In this section, we show the results of the selected state-of-the-art methods and a comparison with our proposal. In order to get a better idea of the contribution of each proposal, we first compare the probabilistic RST with and without each upgrade.
Table 4 (continued)
yeast-1-2-8-9_vs_7 0.5530 0.6769 0.6156 0.4307 0.7222 0.7051 0.6397 0.6137 0.5682 0.6260
yeast6 0.7482 0.8082 0.7115 0.5891 0.9282 0.8280 0.8273 0.7931 0.8156 0.8161
abalone19 0.4963 0.5701 0.5000 0.4949 0.7058 0.5203 0.5185 0.5172 0.5343 0.5284
ecoli-0-1_vs_5 0.8705 0.8182 0.8159 0.7409 0.9449 0.8227 0.8477 0.8614 0.8295 0.9159
ecoli-0-1-4-7_vs_2-3-5-6 0.8154 0.8772 0.8051 0.7622 0.8110 0.8461 0.8529 0.7937 0.8665 0.8353
led7digit-0-2-4-5-6-7-8-9_vs_1 0.5000 0.8436 0.8788 0.5624 0.9363 0.8832 0.8379 0.8943 0.9035 0.8635
yeast-0-2-5-6_vs_3-7-8-9 0.7814 0.7846 0.6606 0.6221 0.7469 0.7543 0.7649 0.7376 0.8140 0.7112
yeast-0-3-5-9_vs_7-8 0.6949 0.6765 0.5868 0.5797 0.7059 0.7222 0.7078 0.6682 0.7075 0.6328
Mean 0.6973 0.7380 0.6966 0.6095 0.8019 0.7327 0.7474 0.7278 0.7513 0.7232
Table 6 shows the comparison. We can see that using two different thresholds helps to increase the mean AUC over the classic method. Furthermore, combining both proposals improves the mean AUC over each individual proposal. Future comparisons will only consider the full proposal (D).
Table 7 shows the AUC results with their associated standard deviations for different preprocessing methods combined with different classifiers, using a 5 × 5 cross-validation. The best results are shown in bold. The best preprocessing techniques when using the C4.5 classifier are SMOTE-TL and SMOTE-ENN. The best preprocessing techniques to combine with SVM are SMOTE and Borderline-SMOTE2.
Fig. 1 Parameter tuning: fixing the final value of each parameter while keeping the rest constant. (a) Varying threshold1; (b) varying threshold2
Table 7 AUC mean ± standard deviation results in test for the preprocessing methods in combination with C4.5, 1-NN and SVM
Preprocessing method C4.5 1-NN SVM
none 0.6127 ± 0.03804 0.5922 ± 0.00649 0.7082 ± 0.01374
SMOTE 0.7160 ± 0.01504 0.6175±0.01225 0.7352 ± 0.01681
SMOTE-TL 0.6963 ± 0.02898 0.6373 ± 0.01057 0.7240 ± 0.01022
SMOTE-ENN 0.6971 ± 0.03334 0.6315 ± 0.01643 0.7285 ± 0.02501
SMOTE-RSB* 0.7309 ± 0.01700 0.6267 ± 0.01304 0.7085 ± 0.01353
Borderline-SMOTE1 0.6937 ± 0.01496 0.6197 ± 0.02666 0.7245 ± 0.01766
Borderline-SMOTE2 0.7093 ± 0.01854 0.6217 ± 0.00209 0.7404 ± 0.01360
Safe-level-SMOTE 0.6914 ± 0.02523 0.5910 ± 0.00414 0.7097 ± 0.01349
SPIDER2 0.6962 ± 0.01081 0.6313 ± 0.01743 0.6811 ± 0.01843
Table 8 AUC mean ± standard deviation results in test for the cost-sensitive methods, EUSBOOST and our proposal
CS-C4.5 CS-SVM EUSBOOST RST-2Simil
0.7220 ± 0.02480 0.7599 ± 0.02613 0.7616 ± 0.01144 0.7821 ± 0.00424
No good results are obtained with the 1-NN classifier combined with any preprocessing technique.
Table 8 shows the AUC results with their associated standard deviations for the remaining methods, using a 5 × 5 cross-validation. The first two columns correspond to the cost-sensitive methods. The best competitors are the ensemble and our proposal (results shown in bold).
Figure 2 summarizes the above comparison. Note the difference between the results of our proposal (last bar) and those of the state-of-the-art methods.
Fig. 2 Average AUC for the best methods used in the comparison (C4.5 + SMOTE, SVM + SMOTE-B2, SVM + S-RSB*, cost-sensitive methods, EUSBOOST and RST-2Simil)
The experimental study compares the methods in terms of AUC. For the real application we need to group the students according to their probability of smoothly passing the academic year. For this reason we tested our proposal on the group of students enrolled in their first academic year in 2013–2014. Since they had already finished that year, we could check the results. The group is composed of 35 students; 28 of them smoothly passed the year and 7 had problems (IR = 4). Table 9 shows the details of the results for this group.
We created three groups of students. The first group (shaded in red) corresponds to the high-risk students and is composed of the students our method predicted to have a high probability of failing. We advise professors in future academic years to take special care of this group. Of the 7 students who had trouble passing the year, 5 are in this group.
The second group (yellow) corresponds to the medium-risk students. We recommend professors to watch over these students. They may be less likely to fail, but they might still need extra help. Two of the 7 students who failed the year are in this group.
Finally, the third group (green) corresponds to the low-risk students. No student who failed the year is in this group.
All in all, 7 out of 7 students who did not pass the academic year fall into the high- or medium-risk groups. The main goal of our proposal is not to give a hard classification of the students into two classes (pass or fail), but to provide professors with each student's probability of success, so resources can be better allocated to prevent dropout.
7 Conclusions
In this paper, we have presented a new proposal for classification over highly imbalanced data sets. The proposal belongs to the category of algorithm-level techniques. Our main contributions from the machine learning point of view can be summarized as follows:
• We introduce a new measure based on probabilistic rough sets to obtain the lower approximation for highly imbalanced data sets. This measure is used to obtain a classification model.
• The second novelty is the use of two different threshold values for determining the similarity between objects.
From our experimental analysis, we have observed good average results for our proposal. An important conclusion is that with this proposal a preprocessing step is not necessary, since we obtain results similar or superior to those of 8 well-known preprocessing methods.
From the point of view of the dropout-prediction application, our main contributions are as follows:
References
20. Huang, Y.M., Hung, C., Jiau, H.C.: Evaluation of neural networks and data mining methods
on a credit assessment task for class imbalance problem. Nonlinear Anal.: Real World Appl.
7(4), 720–747 (2006)
21. Kotsiantis, S.B.: Use of machine learning techniques for educational proposes: a decision
support system for forecasting students grades. Artif. Intell. Rev. 37, 331–344 (2012)
22. Lassibille, G., Gomez, L.: Why do higher education students drop out? Evidence from Spain. Educ. Econ. 16(1), 89–105 (2007)
23. Liu, D., Li, T., Ruan, D.: Probabilistic model criteria with decision-theoretic rough sets. Inf.
Sci. 181, 3709–3722 (2011)
24. Luan, J.: Data mining and its applications in higher education. New Directions for Institutional Research, pp. 17–36 (2002)
25. Mazurowski, M., Habas, P., Zurada, J., Lo, J., Baker, J., Tourassi, G.: Training neural network
classifiers for medical decision making: the effects of imbalanced datasets on classification
performance. Neural Netw. 21(2–3), 427–436 (2008)
26. Napierala, K., Stefanowski, J., Wilk, S.: Learning from imbalanced data in presence of noisy
and borderline examples. Rough Sets Curr. Trends Comput. Lect. Notes Comput. Sci. 6086,
158–167 (2010)
27. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11, 145–172 (1982)
28. Pawlak, Z., Wong, S., Ziarko, W.: Rough sets: probabilistic versus deterministic approach. Int.
J. Man-Mach. Stud. 29, 81–95 (1988)
29. Quinlan, J.: C4.5 Programs for Machine Learning. Morgan Kaufmann, Burlington (1993)
30. Ali, R., Siddiqi, M.H., Lee, S.: Rough set-based approaches for discretization: a compact review.
Artif. Intell. Rev. (2015). https://doi.org/10.1007/s10462-014-9426-2
31. Ramentol, E., Caballero, Y., Bello, R., Herrera, F.: SMOTE-RSB∗ : a hybrid preprocessing
approach based on oversampling and undersampling for high imbalanced data-sets using smote
and rough sets theory. Int. J. Knowl. Inf. Syst. 33, 245–265 (2012)
32. Romero, C., Ventura, S.: Educational data mining: a survey from 1995 to 2005. Expert Syst.
Appl. 33, 135–146 (2007)
33. Romero, C., Ventura, S., Espejo, P.G., Hervas, C.: Data mining algorithms to classify students.
In: Proceedings of the 1st International Conference on Educational Data Mining (EDM 08)
(2008)
34. Slowinski, R., Vanderpooten, D.: Similarity relation as a basis for rough approximations. Adv.
Mach. Intell. Soft-Comput. 4, 17–33 (1997)
35. Sun, Y., Wong, A.K.C., Kamel, M.S.: Classification of imbalanced data: a review. Int. J. Pattern
Recognit. Artif. Intell. 23(4), 687–719 (2009)
36. Superby, J., Vandamme, J.P., Meskens, N.: Determination of factors influencing the achieve-
ment of the first-year university students using data mining methods. In: Proceedings of the
Workshop on Educational Data Mining at ITS 06 (2006)
37. Terenzini, P.T., Lorang, W.G., Pascarella, E.: Predicting freshman persistence and voluntary
dropout decisions: a replication. Res. Higher Educ. 15(2), 109–127 (1981)
38. Ting, K.M.: An instance-weighting method to induce cost-sensitive trees. IEEE Trans. Knowl.
Data Eng. 14(3), 659–665 (2002)
39. Vapnik, V.: The Nature of Statistical Learning. Springer, Berlin (1995)
40. Veropoulos, K., Campbell, C., Cristianini, N.: Controlling the sensitivity of support vector
machines. In: Proceedings of the International Joint Conference on AI, pp. 55–60 (1999)
41. Weiss, G., Provost, F.: Learning when training data are costly: the effect of class distribution
on tree induction. J. Artif. Intell. Res. 19, 315–354 (2003)
42. Yang, Q., Wu, X.: 10 challenging problems in data mining research. Int. J. Inf. Technol. Decis.
Mak. 5(4), 597–604 (2006)
43. Yao, Y., Wong, S., Lin, T.: A review of rough set models. In: Lin, T.Y., Cercone, N. (eds.) Rough
Sets and Data Mining: Analysis for Imprecise Data, pp. 47–75. Kluwer Academic Publishers,
Boston (1997)
44. Yao, Y.Y.: Generalized rough set models. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in
Knowledge Discovery, pp. 286–318. Physica, Heidelberg (1998)
45. Yao, Y.Y.: Probabilistic approaches to rough sets. Expert Syst. 20, 287–297 (2003)
46. Yao, Y.Y.: Three-way decisions with probabilistic rough sets. Inf. Sci. 180, 341–353 (2010)
47. Yao, Y.Y., Wong, S.K.M.: A decision theoretic framework for approximating concepts. Int. J. Man-Mach. Stud. 37, 793–809 (1992)
48. Zhou, Z.H., Liu, X.Y.: On multi-class cost-sensitive learning. Comput. Intell. 26(3), 232–257
(2010)
Multiobjective Overlapping Community
Detection Algorithms Using Granular
Computing
1 Introduction
The detection of communities in a social network is a problem that has been widely addressed in the context of Social Network Analysis (SNA) [24]. Taking into account the NP-hard nature of the community detection problem [21], several approaches have been reported in the literature [7, 8, 15, 17].
Most reported approaches define an objective function that captures the notion of community and then use heuristics to search for a set of communities optimizing this function. Although there is no consensus regarding which properties a group of nodes must satisfy to be considered a community, intuitively it is desirable for a community to have more inner edges than outer edges [19].
Single-objective optimization approaches have two main drawbacks: (a) the opti-
mization of only one function confines the solution to a particular community struc-
ture, and (b) returning one single partition may not be suitable when the network has
many potential structures. Taking into account these limitations, many community
detection algorithms model the problem as a Multi-objective Optimization Problem.
Despite the good results attained by the reported community detection algorithms following a Multi-objective Optimization approach, most of them constrain communities to be disjoint [5, 18, 21, 28]; however, it is known that most real-world networks have overlapping community structures [16]. Note that vertices belonging to more than one community represent individuals that share characteristics or interests. It is worth noting that the space of feasible solutions of the overlapping community detection problem is more complicated than that of the disjoint case; thus, it is challenging to discover overlapping community structures in social networks.
To the best of our knowledge, only the algorithms proposed in [10–13, 25] address the overlapping community detection problem from a Multi-objective Optimization point of view. These algorithms use MOEAs for solving the multi-objective community detection problem and for searching for the set of Pareto-optimal solutions. Nevertheless, they make little use of the local properties of the nodes in the network, nor do they define which properties a node must satisfy in order to belong to more than one community.
Our work makes use of Granular Computing [26] for addressing overlapping community detection from a Multi-objective Optimization point of view. Granular Computing is a term describing theories, tools and techniques that employ information granules (subsets of objects of the problem at hand) for problem-solving purposes; objects belonging to the same granule are viewed as inseparable, similar or near to each other [1].
The hypothesis of our work is that, by using highly cohesive granules as community seeds and an algorithm following a multi-objective approach that makes use of the local properties of vertices, we can obtain highly accurate overlapping communities. With this aim, in this work we propose three multi-objective optimization algorithms which build, from different perspectives, a set of overlapping communities. These algorithms start by building a set of community seeds using Granular Computing and then iteratively process each seed using three new steps we introduce into the multi-objective optimization framework, named expansion, improving and merging. Starting from the seeds, these steps aim to detect overlapping zones in the network, to improve the overlapping quality of these zones, and to merge communities having high overlap, respectively.
2 Related Work
Let G = ⟨V, E⟩ be a given network, where V is the set of vertices and E the set of edges among the vertices. A multi-objective community detection problem aims to search for a partition P* of G such that:
search for the solutions optimising the average community fitness, the average community separation and the overlapping degree between communities. On the other hand, iMEA_CDPs [11] uses the same representation and optimization framework as MEA_CDPs, but it proposes to employ the PMX crossover operator and the simple mutation operator as evolutionary operators. iMEA_CDPs employs the Modularity function [20] and a combination of average community separation and overlapping degree as its objective functions.
Another related algorithm is IMOQPSO [10], which uses a center-based representation of the solution, built from the eigenvectors extracted from the line graph associated with the network. The line graph is obtained by interpreting each edge of the network as a vertex, and by adding an edge in the line graph for each pair of edges having one vertex in common. The optimization framework used by IMOQPSO combines QPSO and HSA, and it uses two objective functions which measure how strong the connections inside and outside communities are.
OMO [12] and MOEA-OCD [27] use the classical NSGA-II optimization framework and a representation based on adjacencies between edges of the network. OMO uses two objective functions which measure the average connection density inside and outside the communities. On the other hand, MOEA-OCD uses the negative fitness sum and the unfitness sum as objective functions. Unlike the previously mentioned algorithms, MOEA-OCD introduces a local expansion strategy into the initialization process to improve the quality of the initial solutions.
MCMOEA [25] first detects the set of maximal cliques of the network and then builds the maximal-clique graph. Starting from this transformation, MCMOEA uses a representation based on labels and the MOEA/D optimization framework in order to detect the communities optimizing the RC and KKM objective functions; see [4] for a description of these functions. Most existing multi-objective algorithms for detecting overlapping communities use a traditional random initialization method and thus take no account of the topological properties of the network, resulting in many redundant and undesirable initial solutions. In contrast, our algorithms use the local properties of the nodes in the network to define a cohesive-granule-based representation.
Unlike the algorithms discussed above, our algorithms do not build overlapping communities directly; rather, they use the cohesive-granule-based representation in order to produce a set of seed clusters, which are then used for building the final overlapping communities by means of a greedy-randomized local expansion procedure. The local expansion procedure iteratively adds neighbors of a cohesive granule as long as a community fitness function is optimized. This step also allows discovering overlapping nodes, since it is possible to include nodes that have already been assigned to other communities.
The main idea of our work is to use Granular Computing to detect a set of community seeds, which are used for representing the solution (i.e., the communities), and then to process these seeds through three introduced steps, named expansion, improving and merging, for building the final set of overlapping communities.
We propose two alternatives for building the community seeds, both based on a similarity relation among the vertices of the network. We will say that a vertex v_j ∈ V is related to a vertex v_i ∈ V, denoted as v_i R v_j, iff |N(v_i) ∩ N(v_j)| > (1/2) · |N(v_j)|. The set of all the vertices related to a vertex v_i forms the so-called similarity class of v_i, denoted as [v_i]_R. This relation R constitutes our granularity criterion [26].
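A minimal sketch of this granularity criterion, assuming the network is given as a dict mapping each vertex to its neighbor set:

```python
def similarity_classes(adj):
    """Granularity criterion R: v_j is related to v_i iff
    |N(v_i) ∩ N(v_j)| > |N(v_j)| / 2, where N(v) is the neighbor set.
    Returns, for every vertex v_i, its similarity class [v_i]_R."""
    return {vi: {vj for vj, nj in adj.items() if len(ni & nj) > len(nj) / 2}
            for vi, ni in adj.items()}

# tiny graph: a triangle 1-2-3 plus a pendant vertex 4 attached to 3
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {3}}
classes = similarity_classes(adj)
# classes[1] == {1, 4}: vertex 4's only neighbor (3) is shared with 1
```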
Taking into account what has been previously described, in this section we introduce three multi-objective optimization algorithms which build, from different perspectives, a set of overlapping communities. These algorithms, named MOCD-OV, MOGR-OV and MOGR-PAES-OV, use the three introduced steps in order to obtain a set of overlapping communities from a set of seeds; however, they differ in the alternative they use for building these seeds and/or in the metaheuristic each of them employs.
In the following, we describe the general steps and some particularities of each proposed algorithm; then, the expansion, improving and merging steps are described in detail. Finally, Sect. 3.7 discusses the computational complexity of the proposed algorithms.
the selection operator proposed by the PESA-II metaheuristic [3] for building the mating population M.
In step 9, M is used for creating the current population CP through the crossover and mutation operators. The uniform two-point crossover operator is selected for crossover; for mutation, some genes are randomly selected and substituted by other randomly selected adjacent nodes.
Afterwards, in step 12, each chromosome (i.e., set of seed clusters) is processed using the expansion step. This step processes one seed at a time, iteratively adding neighbor vertices to the seed as long as a predefined function improves. It has been shown in the literature that this kind of local building process attains good results in single-objective optimization approaches [23], so we employ it to detect overlapping zones in the network. As a result of the previous step, we obtain a set of overlapping communities, which is then processed by the improving step. This step focuses on locally improving each previously detected overlapping zone. For this purpose, we define two properties that state, from two different points of view, what a vertex must satisfy in order to belong to more than one community. Thus, in this step we iteratively analyze which vertices should be added to or removed from each overlapping zone in order to improve its quality. Finally, in the merging step the overlap among the detected communities is revised and those communities having a high similarity, according to a proposed measure, are merged; this way, the redundancy in the solution is reduced. The resulting sets of overlapping communities obtained from each chromosome constitute the current overlapping population (COP). Once these three steps have finished (steps 11–16), the fitness of both COP and CP is computed.
For evaluating each chromosome in CP we employ the objective functions described by MOCD in [21]. On the other hand, for evaluating each solution S_i ∈ COP we employ as objective functions the intra and inter factors of the overlapping Modularity proposed in [20]. The intra and inter factors measure the intra-link and inter-link strength of S_i, respectively. These functions are defined as follows:
Intra(S_i) = 1 − Σ_{j=1}^{|S_i|} Σ_{v,w ∈ C_j} A_{v,w} / (2 · m · O_v · O_w)    (2)

Inter(S_i) = Σ_{j=1}^{|S_i|} Σ_{v,w ∈ C_j} ( |N(v)| · |N(w)| ) / (4 · m² · O_v · O_w)    (3)
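The two objective functions can be sketched as follows, under the interpretation (an assumption of this sketch) that m is the number of edges, O_v the number of communities containing v, and the inner sums run over ordered vertex pairs inside each community:

```python
def intra_inter(communities, adj, m):
    """Eqs. (2)-(3): intra- and inter-link factors of the overlapping
    Modularity. O_v counts the communities containing v, A_{v,w} is the
    adjacency indicator, m the number of edges. The inner sums are taken
    here over all ordered pairs v, w inside each community."""
    O = {}
    for c in communities:
        for v in c:
            O[v] = O.get(v, 0) + 1
    intra_sum = inter_sum = 0.0
    for c in communities:
        for v in c:
            for w in c:
                a = 1.0 if w in adj[v] else 0.0
                intra_sum += a / (2 * m * O[v] * O[w])
                inter_sum += len(adj[v]) * len(adj[w]) / (4 * m ** 2 * O[v] * O[w])
    return 1.0 - intra_sum, inter_sum

# one community covering a triangle: Intra reaches its minimum
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
intra, inter = intra_inter([{1, 2, 3}], adj, m=3)
# intra is approximately 0, inter approximately 1
```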
Once CP and COP have been evaluated, the nondominated individuals of both CP and COP are stored. To accomplish this task we maintain two Pareto sets: one for the sets of seeds and the other for the sets of overlapping communities. Finally, from the Pareto set containing the sets of seeds, the region-based selection defined in PESA-II is used to select the next population P. In region-based selection, the unit of selection is a hyperbox rather than an individual; a selective fitness is derived for a hyperbox [3]. Therefore, solutions located in less crowded niches are selected and assigned to P. Steps 8–21 are repeated for a predefined number of iterations.
the subgraphs induced by each similarity class [v_i]_R, v_i ∈ V; each of these subgraphs is interpreted as a granule (i.e., a seed community) that MOGR-OV can use for building the final communities. The pseudo-code of MOGR-OV is shown in Algorithm 2.
In steps 5–9 of Algorithm 2, MOGR-OV builds a solution C. To accomplish this task, MOGR-OV iteratively applies the roulette wheel selection method over Gr, where the probability of a granule g_j ∈ Gr being selected is computed from the number of unclustered vertices (i.e., vertices that do not belong to any previously built community of C) belonging to g_j. Once a granule g_j has been selected, it is processed using the expansion step in order to build the community associated with g_j.
In steps 10–11, MOGR-OV processes the current solution C using the improving and merging methods, in order to optimize the quality of the overlapping zones and to reduce the redundancy in the overlapping communities. The resulting set of overlapping communities is evaluated using Eqs. (2) and (3) and, as a result of this evaluation, it is added to the Pareto set iff it is a nondominated solution. Steps 5–15 are repeated for a predefined number of iterations.
Overlapping vertices are supposed to be those vertices that belong to more than one community; in order to be correctly located inside a community, they need to have edges with vertices inside that community. For detecting overlapping zones, each seed S_i is processed to determine which vertices outside S_i share a significant number of their adjacent vertices with vertices inside S_i, considering G = ⟨V, E⟩.
Let S_i be a seed cluster and ∂S_i ⊆ S_i the set of vertices of S_i having neighbors outside S_i. The strength of ∂S_i, denoted as Str(∂S_i), is computed as the ratio between the number of edges of ∂S_i with vertices inside S_i, and the number of edges of ∂S_i with vertices both inside and outside S_i.
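A sketch of the boundary-strength computation, assuming the seed is a vertex set and the graph a neighbor-set dict:

```python
def boundary_strength(S, adj):
    """Str(boundary of S): the boundary holds the vertices of S with
    neighbors outside S; the strength is the ratio between the boundary's
    edges going into S and all of the boundary's edges."""
    boundary = [v for v in S if adj[v] - S]
    inside = sum(len(adj[v] & S) for v in boundary)
    total = sum(len(adj[v]) for v in boundary)
    return inside / total if total else 1.0

# seed {1, 2}: vertex 1 is the only boundary vertex (its neighbor 3 is
# outside); one of its two edges stays inside the seed
adj = {1: {2, 3}, 2: {1}, 3: {1}}
s = boundary_strength({1, 2}, adj)
# s == 0.5
```

During expansion, a candidate vertex v is worth adding when the strength of the enlarged seed exceeds that of the current one.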
The strategy for expanding a seed S_i is as follows: (1) determine the set L of vertices v ∉ S_i having at least one adjacent vertex in ∂S_i such that Str(∂S_i′) − Str(∂S_i) > 0, where S_i′ = S_i ∪ {v}; (2) apply the roulette wheel selection method over L,
242 D. H. Grass-Boada et al.
where abs(·) denotes the absolute value. U(v) takes values in [0, 1], and the higher its value, the more well-balanced v is.
Another property we would expect an overlapping vertex v ∈ Z to fulfill is to be a connector between any pair of its adjacent vertices in N(v|C_Z); that is, we would expect the shortest path connecting any pair of vertices u, w ∈ N(v|C_Z) to be the path made by the edges (u, v) and (v, w). The simple betweenness of v, denoted as SB(v), measures how much of a connector v is; it is computed as follows:
SB(v) = [ 2 · Σ_{i=1}^{|C_Z|−1} Σ_{j>i} ( 1 − |E(C_i, C_j)| / ( |N(v|C_Z) ∩ C_i| · |N(v|C_Z) ∩ C_j| ) ) ] / ( |C_Z| · (|C_Z| − 1) )    (5)
where E(C_i, C_j) is the set of edges with one vertex in C_i and the other in C_j. SB(v) takes values in [0, 1], and the higher its value, the better connector v is.
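A sketch of Eq. (5); how N(v|C_Z) and E(C_i, C_j) are materialized here (v's neighbors inside each community, and the edges between two communities) is an interpretation of this sketch, since the chapter defines them only in prose:

```python
def simple_betweenness(v, C_Z, adj):
    """Eq. (5), sketch. C_Z is the list of communities sharing the
    overlapping zone; N(v|C_Z) ∩ C_i is taken as v's neighbors inside
    C_i, and E(C_i, C_j) as the set of edges between the two
    communities."""
    k = len(C_Z)
    if k < 2:
        return 0.0
    total = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            ni = adj[v] & C_Z[i]
            nj = adj[v] & C_Z[j]
            if not ni or not nj:
                continue
            e_ij = sum(1 for u in C_Z[i] for w in C_Z[j] if w in adj[u])
            total += 1.0 - e_ij / (len(ni) * len(nj))
    return 2.0 * total / (k * (k - 1))

# v = 0 bridges two one-vertex communities with no edge between them:
# a perfect connector
adj = {0: {1, 2}, 1: {0}, 2: {0}}
sb = simple_betweenness(0, [{1}, {2}], adj)
# sb == 1.0
```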
Let Uave (Z ) be the initial average uniformity of the vertices belonging to an
overlapping zone Z . In order to improve the quality of Z we will analyze the addition
or removal of one or mores vertices from Z . Thus, any vertex v ∈ Z having U (v) <
Uave (Z ) is a candidate to be removed from Z , whilst any vertex u ∈ N (v|C Z ), v ∈ Z ,
such that U (u) > Uave (Z ) is a candidate to be added to Z . Taking into account
that both the uniformity and simple betweenness concepts can be straightforward
generalized in order to be applied to Z , we employ such properties for measuring
which changes in Z increase its quality as an overlapping zone and which do not.
Let T be an addition or removal which turns Z into Z
. T is considered as viable iff
(U (Z
) + S B(Z
)) − (U (Z ) + S B(Z )) > 0. The heuristic proposed for improving
the set O = {Z 1 , Z 2 , . . . , Z j } of overlapping zones detected by the expansion step
is as follows: (1) computing Uave (Z i ) for each Z i ∈ O, (2) detecting the set T of
viable transformations to apply over O, (3) performing the transformation t ∈ T
which produces the highest improvement in its zone, and (4) repeating steps 2 and 3
while T ≠ ∅.
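The four steps of this improvement heuristic can be sketched as a generic greedy loop. The callbacks `candidate_moves` and `quality` are hypothetical stand-ins for the chapter's machinery, with quality(Z) playing the role of U(Z) + SB(Z):

```python
def improve_zones(zones, candidate_moves, quality):
    """Greedy improvement of overlapping zones (steps 1-4, sketched).

    `zones` is the set O = {Z1, ..., Zj} as a list of vertex sets;
    `candidate_moves(Z)` yields candidate additions/removals as modified
    copies Z'; `quality(Z)` stands in for U(Z) + SB(Z)."""
    while True:
        # step 2: collect the viable transformations T over all zones
        best_gain, best = 0.0, None
        for idx, Z in enumerate(zones):
            q = quality(Z)
            for Zp in candidate_moves(Z):
                gain = quality(Zp) - q      # viable iff gain > 0
                if gain > best_gain:
                    best_gain, best = gain, (idx, Zp)
        if best is None:                    # step 4: stop when T is empty
            return zones
        idx, Zp = best                      # step 3: apply the best move
        zones[idx] = Zp
```

The loop terminates because every applied move strictly increases the zone's quality.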
Our proposals need to compute the similarity class of each vertex, as well as to
perform the expansion, improving and merging steps.
The computation of [vi ] R for each vertex vi of the network is one of the most com-
putationally expensive steps. It is O(n³) because it needs to compute the shared
neighbors between each pair of vertices in the graph. Fortunately, it is performed just
once, so it does not affect the overall performance of the algorithms too much.
As shown in [2], the expansion step is O(q · d · |L|), where q is the size
of the biggest seed analyzed, d is the average vertex degree, and |L| is the number of
vertices outside a community seed having edges with the seed. On the other hand, the
improving step has a computational complexity of O(ti · n³) and, finally, the merging
step has a computational complexity of O(tm · k · m · n²), where k is the number
of communities discovered, m is the average number of communities with which a
community overlaps, and tm is the number of iterations performed by the merging step.
Based on the above-mentioned analysis, in the case of the MOCD-OV algorithm
the computational complexity is O(g · s · (T · n³ + e + n)), where T = max(ti, tm),
e is the number of edges (at most n²), g is the number of iterations and s is the population
size. Finally, by the rule of sums, the MOCD-OV algorithm is O(g · s · T · n³).
Starting from this point and taking into account that both MOGR-OV and MOGR-
PAES-OV are single-solution-based algorithms, we can assert that their complexity
is O(T · n³).
Multiobjective Overlapping Community Detection Algorithms … 245
4 Experimental Results
In this section we evaluate the performance of our proposals over the networks shown
in Table 1, and we compare their results against those attained by MEA_CDP,
IMOQPSO, iMEA_CDP, OMO and MOEA-OCD over the same networks. For evaluating
the accuracy of each algorithm we used the NMI external evaluation measure
proposed by Lancichinetti et al. in [8]. NMI takes values in [0, 1] and evaluates a set
of communities based on how much these communities resemble a set of communities
manually labeled by experts, where 1 means identical results and 0 completely
different results.
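For orientation, the classical NMI between two disjoint partitions can be computed as below. Note that the measure actually used in the chapter is the generalization of [8] to overlapping covers, which this stdlib sketch does not implement:

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Classical NMI between two disjoint partitions, given as
    equal-length label sequences; returns a value in [0, 1]."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # mutual information of the joint label distribution
    mi = sum(c / n * log(n * c / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    # entropies of the two partitions
    h_a = -sum(c / n * log(c / n) for c in ca.values())
    h_b = -sum(c / n * log(c / n) for c in cb.values())
    if h_a + h_b == 0.0:
        return 1.0                      # both partitions trivial: identical
    return 2.0 * mi / (h_a + h_b)       # normalise by the entropy sum
```

Identical partitions yield 1, independent ones yield 0, matching the interpretation given above.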
For computing the accuracy attained by one of our proposals over each network
we employed the experimental framework proposed in [12]. For example, in case
we want to compute the accuracy attained by MOCD-OV, we executed it over each
network and selected the highest NMI value attained by a solution of each resulting
Pareto set. This experiment is repeated twenty times and, for each network, the average
of the highest NMI values attained is computed. The same heuristic is followed
for computing the accuracy of both the MOGR-OV and MOGR-PAES-OV algorithms.
Since MOCD-OV extends the MOCD algorithm, we used the parameter configuration
defined in [21].

Table 2 Comparison of our proposals against multi-objective algorithms, regarding the NMI value.
Best values appear bold-faced

Algorithm      Football  Zachary's  Dolphins  Krebs'  Ave. rank. pos.
MEA_CDP        0.495     0.52       0.549     0.469   6.25
IMOQPSO        0.462     0.818      0.886     X       5.5
iMEA_CDP       0.593     0.629      0.595     0.549   4.25
OMO            0.33      0.375      0.41      0.39    7.75
MOEA-OCD       0.77      0.487      0.648     0.484   5
MOCD-OV        0.793     0.88       0.95      0.502   1.5
MOGR-OV        0.789     0.908      0.944     0.495   2
MOGR-PAES-OV   0.781     0.856      0.675     0.479   3.75
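The evaluation protocol above (twenty repetitions, best NMI per Pareto set, averaged) can be sketched as a small harness; `run_algorithm` and `nmi` are assumed callables standing in for a concrete detector and the NMI of [8], not APIs from the chapter:

```python
def average_best_nmi(run_algorithm, nmi, ground_truth, repetitions=20):
    """Run the multi-objective detector `repetitions` times, take for
    each run the highest NMI over the solutions of the returned Pareto
    set, and average these maxima."""
    best_scores = []
    for _ in range(repetitions):
        pareto_set = run_algorithm()        # list of community structures
        best_scores.append(max(nmi(s, ground_truth) for s in pareto_set))
    return sum(best_scores) / len(best_scores)
```

The same harness serves for MOCD-OV, MOGR-OV and MOGR-PAES-OV by swapping `run_algorithm`.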
Table 2 shows the average NMI attained by each algorithm over the real-life
networks used in this experiment; the average values for MEA_CDP, IMOQPSO,
iMEA_CDP and OMO algorithms were taken from their original articles. The average
values for MOEA-OCD were computed following the above-mentioned heuristic.
The “X” in Table 2 means that IMOQPSO does not report any results on the Krebs’
books network.
As can be seen from Table 2, both MOCD-OV and MOGR-OV outperform the
other algorithms on all the networks, except on Krebs', where they attain the
second and third best results, respectively. On the other hand, the MOGR-PAES-OV
algorithm we proposed attains results similar to those of MOCD-OV and MOGR-OV on
the Football and Zachary's networks, while its performance slightly decays on bigger
networks like Dolphins and Krebs'. In the last column of Table 2 we also show
the average ranking position attained by each algorithm and, as can be observed,
our proposals clearly outperform the other methods. From the above experiments
on real-life networks, we can see that our proposals are promising and effective
for overlapping community detection in complex networks, with MOCD-OV being the
one performing best.
In this section we evaluate the performance of our proposals over several synthetic
networks generated from the LFR benchmark [9], in terms of the NMI value they
attain, and we compare it against the results attained by the MOEA-OCD algorithm,
which has reported the best results over this kind of network among the algorithms
described in Sect. 2.
In LFR benchmark networks, both node degrees and community sizes follow
power-law distributions and they are regulated using the parameters τ1 and τ2. Besides,
the significance of the community structure is controlled by a mixing parameter μ,
which denotes the average fraction of edges each vertex in the network has with other
communities. The smaller the value of μ, the more significant the community structure
of the LFR benchmark network. In the first part of this experiment, we set the network
size to 1000, τ1 = 2, τ2 = 1, the node degrees are in [0, 50] with an average value
of 20, whilst the community sizes vary from 10 to 50 elements. Using the previous
parameter values, we vary μ from 0.1 to 0.5 with an increment of 0.05.
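The parameter grid of this first synthetic experiment can be written down explicitly. The keyword names below merely mirror common LFR generator arguments (e.g. those of networkx's `LFR_benchmark_graph`) and are an assumption of this sketch:

```python
def lfr_settings():
    """Parameter grid of the first synthetic experiment: n = 1000,
    tau1 = 2, tau2 = 1, degrees in [0, 50] with average 20, community
    sizes in [10, 50], and mu swept from 0.1 to 0.5 in steps of 0.05."""
    base = dict(n=1000, tau1=2, tau2=1,
                min_degree=0, max_degree=50, average_degree=20,
                min_community=10, max_community=50)
    # nine settings: mu = 0.10, 0.15, ..., 0.50
    return [dict(base, mu=round(0.1 + 0.05 * i, 2)) for i in range(9)]
```

Each dictionary can then be handed to an LFR generator to produce one benchmark network per μ value.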
For computing the accuracy attained by each of our proposals and by MOEA-OCD,
we followed the same method used in the experiments of Sect. 4.1. We show in
Fig. 2 the average NMI value attained by each algorithm over the LFR benchmark
when μ varies from 0.1 to 0.5 with an increment of 0.05.
As can be seen from Fig. 2, as the value of μ increases the performance of
each algorithm deteriorates, with MOGR-OV and MOGR-PAES-OV being those
performing best. As the mixing parameter μ exceeds 0.1, the MOEA-OCD algorithm
begins to decline in performance and is outperformed by MOGR-OV and
MOGR-PAES-OV. Finally, when the value of μ is greater than 0.4, all our proposals
outperform the MOEA-OCD algorithm.
For summarizing the above results, we evaluated the statistical significance of
the NMI values attained by MOGR-OV and MOGR-PAES-OV with respect to those
attained by MOEA-OCD over each network; we excluded MOCD-OV from this
[Fig. 2 plot: Ave. NMI (y-axis, 0.8–0.95) vs. mixing parameter in LFR benchmark (x-axis, 0.1–0.5); curves for MOGR-OV, MOGR-PAES-OV, MOCD-OV and MOEA-OCD]
Fig. 2 Average NMI value attained by each algorithm on LFR benchmark networks when μ varies
from 0.1 to 0.5 with an increment of 0.05
[Fig. 3 plot: Algorithm*Networks LS means for MOGR-OV vs. MOEA-OCD; current effect F(8, 79) = 28.134, p = 0.0000; effective hypothesis decomposition; vertical bars denote 0.95 confidence intervals; x-axis: networks Mu 0.1–0.5, y-axis: NMI value]
Fig. 3 Statistical significance of the results attained by MOGR-OV and MOEA-OCD over each
network
analysis taking into account that its performance was worse than that of the MOGR-OV and
MOGR-PAES-OV algorithms. For testing the statistical significance we used the
software STATISTICA v8.0 and performed a factorial ANOVA in order to analyze
the higher-order interactive effects of multiple categorical independent factors. With
this aim, we first evaluated the statistical significance of the results of each algorithm
over each network (see Figs. 3 and 4).
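Readers without STATISTICA can reproduce at least the overall (one-factor) effect with a few lines of stdlib Python. This is a plain one-way ANOVA F statistic, not the full factorial Algorithm×Networks decomposition reported in the figures:

```python
def one_way_anova_F(groups):
    """One-way ANOVA F statistic across groups of NMI values; a stdlib
    stand-in for the overall-effect analyses (e.g. the F(1, 84) contrast
    between MOGR-OV and MOEA-OCD).  Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # between-group and within-group sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

Feeding it the per-network NMI samples of two algorithms gives the F value whose tail probability decides significance.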
As can be seen from Figs. 3 and 4, the results attained by both MOGR-OV
and MOGR-PAES-OV over each network are statistically superior with respect to
those of the MOEA-OCD algorithm. This can also be observed from Figs. 5 and 6, in
which we show the statistical significance of the overall performance of our two
proposals with respect to that of MOEA-OCD.
In the second part of this experiment, we set μ = 0.1 and μ = 0.4, and we vary
the percentage of overlapping nodes existing in the network from 0.1 to 0.45 with an
increment of 0.05; the other parameters remain the same as in the first experiment.
Figures 7 and 8 show the average NMI value attained by each algorithm over each
of these networks.
As can be seen from Fig. 7, when the structure of the networks is well defined,
MOGR-OV, MOGR-PAES-OV and MOEA-OCD have an almost stable performance,
independently of the number of overlapping nodes in the network, with MOGR-OV
being the one performing best. It is also worth mentioning that the performance of the
MOCD-OV algorithm is highly affected by the increase in the fraction of overlapping
vertices. On the other hand, as can be seen from Fig. 8, when the structure of
the communities is uncertain, the performance of the MOEA-OCD algorithm drops
quickly as the overlapping in the network increases, with our proposals, specifically
MOGR-OV and MOGR-PAES-OV, being those that perform best.
[Fig. 4 plot: Algorithm*Networks LS means for MOGR-PAES-OV vs. MOEA-OCD; current effect F(8, 84) = 16.676, p = 0.00000; effective hypothesis decomposition; vertical bars denote 0.95 confidence intervals; x-axis: networks Mu 0.1–0.5, y-axis: NMI value]
Fig. 4 Statistical significance of the results attained by MOGR-PAES-OV and MOEA-OCD over
each network
[Fig. 5 plot: overall NMI value (0.90–0.94, LS means) of MOGR-OV vs. MOEA-OCD]
[Fig. 6 plot: Algorithm LS means; current effect F(1, 84) = 207.64, p = 0.0000; vertical bars denote 0.95 confidence intervals; NMI value (0.90–0.95) for MOGR-PAES-OV vs. MOEA-OCD]
Fig. 6 Statistical significance of the overall results attained by MOGR-PAES-OV wrt. MOEA-OCD
[Fig. 7 plot: Ave. NMI (0.65–0.95) vs. fraction of overlapping nodes (0.1–0.45)]
Fig. 7 Networks with significant community structure (μ = 0.1). Average NMI value attained by
each algorithm when the fraction of overlapping nodes varies from 0.1 to 0.5 with an increment of
0.05
[Fig. 8 plot: Ave. NMI (0.5–0.95) vs. fraction of overlapping nodes (0.1–0.45)]
Fig. 8 Networks with indistinct community structure (μ = 0.4). Average NMI value attained by
each algorithm when the fraction of overlapping nodes varies from 0.1 to 0.5 with an increment of
0.05
[Fig. 9 plot: Algorithm*Networks LS means for MOEA-OCD vs. MOGR-OV; current effect F(7, 91) = 2.3308, p = 0.03108; vertical bars denote 0.95 confidence intervals; x-axis: networks Mu 0.1–0.45, y-axis: NMI value]
Fig. 9 Statistical significance of the results attained by MOGR-OV and MOEA-OCD over each
network
[Fig. 10 plot: Algorithm*Networks LS means for MOEA-OCD vs. MOGR-PAES-OV; current effect F(7, 85) = 2.5328, p = 0.02042; vertical bars denote 0.95 confidence intervals; x-axis: networks Mu 0.1–0.45, y-axis: NMI value]
Fig. 10 Statistical significance of the results attained by MOGR-PAES-OV and MOEA-OCD over
each network
[Fig. 11 plot: overall NMI value (0.55–0.80, LS means) of MOEA-OCD vs. MOGR-OV]
[Fig. 12 plot: Algorithm LS means; current effect F(1, 85) = 149.97, p = 0.0000; vertical bars denote 0.95 confidence intervals; NMI value (0.55–0.85) for MOEA-OCD vs. MOGR-PAES-OV]
Fig. 12 Statistical significance of the overall results attained by MOGR-PAES-OV wrt. MOEA-
OCD
[Fig. 13 plot: Algorithm*Networks LS means for MOGR-OV vs. MOGR-PAES-OV; current effect F(8, 81) = 2.4055, p = 0.02209; vertical bars denote 0.95 confidence intervals; x-axis: networks Mu 0.1–0.5, y-axis: NMI value]
Fig. 13 Statistical significance of the results attained by MOGR-OV and MOGR-PAES-OV over
each network
Pareto front, respectively. Figure 15b, c show two overlapping community structures
in which vertices 3, 9, and 31 are overlapping vertices.
Functions (2) and (3) have the potential to balance each other's tendency to increase
or decrease the number of communities. This is crucially important in order to obtain
different numbers of communities, thus avoiding convergence to trivial solutions
[21]. For example, from the community structure in Fig. 15c, it is apparent
that the community on the right further divides into two smaller ones in Fig. 15b;
therefore, the Intra value increases and the Inter value decreases. On the other hand,
[Fig. 14 plot: Algorithm LS means; current effect F(1, 74) = 18.111, p = 0.00006; vertical bars denote 0.95 confidence intervals; NMI value (0.785–0.825) for MOGR-OV vs. MOGR-PAES-OV]
Fig. 14 Statistical significance of the overall results attained by MOGR-OV and MOGR-PAES-OV
[Fig. 15 plots: a Inter vs. Intra scatter of the nondominated front with solutions s1–s5 (e.g. Intra values 0.051, 0.128, 0.192; Inter values 0.816, 0.5, 0.415); b–d node-link diagrams of Zachary's network (vertices 1–34)]
Fig. 15 Examples of the overlapping communities detected over the Zachary's network. a Nondominated
front; b–d correspond to the three solutions labeled s4, s3 and s5 in the nondominated front,
respectively
the minimum Intra value found by MOGR-OV is 0.051, whose corresponding community
structure is shown in Fig. 15b. In this case, one community covers many
vertices, thereby the Intra value decreases and the Inter value increases.
5 Conclusions
In this paper, we introduced three algorithms that combine Granular Computing and
a multi-objective optimization approach for discovering overlapping communities in
social networks. These algorithms start by building a set of seeds that is afterwards
processed for building overlapping communities, using three steps, namely
expansion, improving and merging.
The proposed algorithms, named MOGR-OV, MOGR-PAES-OV and MOCD-OV,
were evaluated on four real-life networks in terms of their accuracy, and they
were compared against five multi-objective algorithms from the related work. This
experiment showed that our proposals and, specifically, the MOCD-OV algorithm
outperform the other algorithms in terms of the NMI external measure on almost
all of the real collection. Moreover, our proposals were also evaluated over several
synthetic networks in terms of the NMI value. These other experiments showed
that, when the structure of the network is not well defined, our proposals perform
best. Additionally, when the quality of the structure of the network is fixed and the
overlapping of the network begins to increase, one of our proposals, the MOGR-OV
algorithm, is the one with the highest accuracy in almost all cases.
We can conclude from our experimental evaluation that, among our proposals, the
MOGR-OV algorithm is the one that offers the best trade-off in terms of accuracy
on both real and synthetic networks.
As future work, we would like to explore the use of another mutation operator in
the MOGR-PAES-OV algorithm, specifically one that employs the local properties
of vertices for defining seeds that belong to the Pareto set. We hypothesize that
this is the key to boosting the accuracy of our algorithms.
References
5. Gong, M., Ma, L., Zhang, Q., Jiao, L.: Community detection in networks by using multiobjective
evolutionary algorithm with decomposition. Phys. A: Stat. Mech. Appl. 391(15), 4050–4060
(2012)
6. Knowles, J.D., Corne, D.W.: Approximating the nondominated front using the Pareto archived
evolution strategy. Evol. Comput. 8(2), 149–172 (2000)
7. Lancichinetti, A., Fortunato, S.: Consensus clustering in complex networks. Sci. Rep. 2, 336
(2012)
8. Lancichinetti, A., Fortunato, S., Kertész, J.: Detecting the overlapping and hierarchical com-
munity structure in complex networks. New J. Phys. 11(3), 033015 (2009)
9. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detec-
tion algorithms. Phys. Rev. E 78, 046110 (2008)
10. Li, Y., Wang, Y., Chen, J., Jiao, L., Shang, R.: Overlapping community detection through an
improved multi-objective quantum-behaved particle swarm optimization. J. Heuristics 21(4),
549–575 (2015)
11. Liu, C., Liu, J., Jiang, Z.: An improved multi-objective evolutionary algorithm for simultane-
ously detecting separated and overlapping communities. Nat. Comput. 15(4), 635–651 (2016)
12. Liu, B., Wang, C., Wang, C., Yuan, Y.: A new algorithm for overlapping community detection.
In: 2015 IEEE International Conference on Information and Automation, pp. 813–816. IEEE
(2015)
13. Liu, J., Zhong, W., Abbass, H.A., Green, D.G.: Separated and overlapping community detection
in complex networks using multiobjective evolutionary algorithms. In: 2010 IEEE Congress
on Evolutionary Computation (CEC), pp. 1–7. IEEE (2010)
14. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E
69, 066133 (2004)
15. Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys.
Rev. E 69(2), 026113 (2004)
16. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure
of complex networks in nature and society. Nature 435(7043), 814 (2005)
17. Pizzuti, C.: Ga-net: a genetic algorithm for community detection in social networks. In: Inter-
national Conference on Parallel Problem Solving from Nature, pp. 1081–1090. Springer (2008)
18. Pizzuti, C.: A multiobjective genetic algorithm to find communities in complex networks. IEEE
Trans. Evol. Comput. 16(3), 418–430 (2012)
19. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying com-
munities in networks. Proc. Natl. Acad. Sci. 101(9), 2658–2663 (2004)
20. Shen, H., Cheng, X., Cai, K., Hu, M.B.: Detect overlapping and hierarchical community struc-
ture in networks. Phys. A: Stat. Mech. Appl. 388(8), 1706–1712 (2009)
21. Shi, C., Yan, Z., Cai, Y., Wu, B.: Multi-objective community detection in complex networks.
Appl. Soft Comput. 12(2), 850–859 (2012)
22. Talbi, E.G.: Metaheuristics: from Design to Implementation, vol. 74. Wiley, New York (2009)
23. Wang, X., Liu, G., Li, J.: Overlapping community detection based on structural centrality in
complex networks. IEEE Access 5, 25258–25269 (2017)
24. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cam-
bridge University Press, Cambridge (1994)
25. Wen, X., Chen, W.N., Lin, Y., Gu, T., Zhang, H., Li, Y., Yin, Y., Zhang, J.: A maximal clique
based multiobjective evolutionary algorithm for overlapping community detection. IEEE Trans.
Evol. Comput. 21(3), 363–377 (2017)
26. Yao, Y., et al.: Granular computing: basic issues and possible solutions. In: Proceedings of the
5th Joint Conference on Information Sciences, vol. 1, pp. 186–189 (2000)
27. Yuxin, Z., Shenghong, L., Feng, J.: Overlapping community detection in complex networks
using multi-objective evolutionary algorithm. Comput. Appl. Math. 36(1), 749–768 (2017)
28. Zhou, Y., Wang, J., Luo, N., Zhang, Z.: Multiobjective local search for community detection
in networks. Soft Comput. 20(8), 3273–3282 (2016)
In-Database Rule Learning Under
Uncertainty: A Variable Precision Rough
Set Approach
1 Introduction
Data analysis became more pronounced with Machine Learning (ML) and the broad
availability of related frameworks in the 1990s and early 2000s. Over time, these
software systems have constantly been refined, supplying a huge arsenal of mining
algorithms and turning conventional workstations into analytical platforms. One of the
reasons for their lasting success in practice has certainly been their simple and
intuitive design, which still makes them a central workhorse for data science nowadays.
As these ML workbenches are usually isolated from the problem domain, the
typical mining process involves load procedures to import data of interest, given
either through flat files or external data repositories, right before knowledge extraction
can commence. While these import mechanisms work properly for data sets of mod-
erate size, they perform rather poorly for large quantities of data due to inefficient file
operations or enduring data transmissions. Regarding the challenge of ever-growing
data volumes to analyze, these traditional loading techniques, thus, become a huge
concern for mining tasks in the long run. To mitigate these downsides of classic ML
software frameworks, a decisive paradigm termed “in-database analytics” evolved in
data science and related disciplines (e.g. [1–4]). It essentially brings analytics to the
data taking advantage of native SQL and other built-in functionality such as efficient
data structures or parallel processing. Hence, in-database processing has the potential
to largely reduce data transports by fusing ML components and the data repository
into a single scalable mining system. This is favorable for many real-world scenarios
as hidden knowledge is stored in relational Database Systems (DBs) predominantly
either provided through transactional data or warehouses.
Employing Rough Set Theory (RST) is of particular interest for in-database ana-
lytics, because it is based on pure set operations, that are efficiently implemented by
most relational engines and, in fact, research in that direction is promising given the
current progress (e.g. [5–8]). However, most existing approaches exhibit two fundamental
drawbacks: (i) they are either impractical due to their poor implementation
or unable to cope with vagueness, as opposed to the virtue of RST; (ii) furthermore,
they consistently consider data to be drawn from the same distribution. To their
full extent, both points have practical relevance and constitute strong assumptions in
uncertain and highly dynamic environments. This is particularly true when analyzing
data that evolve over time, which are ultimately stored at a DB. Under such circum-
stances, noise can be apparent or the concepts to be learned may change suddenly or
gradually in an unforeseeable fashion drastically degrading classification accuracy
of an initially trained predictive model. This phenomenon is commonly referred to
as “concept drift” (e.g. [9, 10]) and requires learning algorithms to provide adequate
mechanisms to adapt to these changes. Various disciplines suffer from drifts
due to their nonstationary nature. These include marketing applications, where cus-
tomer purchasing habits might be influenced due to advertisements or fashion trends
[11]. Other examples are long-term studies of medical data that are collected over
years or even decades [12]. Thus, it is very likely that the data-generating process may
have changed over time, making the mining task a difficult endeavor. A final example
of a drift scenario is adversarial behavior, which frequently penetrates cyber security
applications. In such settings, an attacker intends to manipulate the outcome of the
learning algorithm to exploit vulnerabilities or to simply evade detection [13].
To address the lack of uncertainty management in recent RST literature for in-
database applications, we propose a new bottom-up rule-based classifier for non-
stationary environments and class imbalance problems as an extension of an earlier
work [14]. It is termed Incremental In-Database Rule Inducer (InDBR), which lever-
ages Variable Precision Rough Sets (VPRS) and efficient database algebra in order to
produce certain and uncertain decision rules as new data samples become available.
The motivation for combining rule learning and VPRS to undertake mining tasks under
changing conditions has several reasons. In general, rules are intelligible, and thus
an exceptional means to describe expressive patterns towards transparent decision-making.
Furthermore, each rule can be updated easily without retraining the entire
model in case parts of it are subject to drifts. Ultimately, an intrinsic concern of
nonstationary environments is data noise, which is natively handled by VPRS and an
in-database implementation has been recently compiled [15]. These benefits given
as baseline, InDBR features a novel bottom-up generalization strategy reacting fast
to drifts. Additionally, InDBR has the ability to abstain from classification in situations
where it is uncertain, which increases confidence especially for critical applications
that require quality predictions and traceability for domain experts rather than
unexplainable prospects.
The remainder of this chapter is structured as follows: First, we introduce fundamentals
of VPRS (Sect. 2) and review related approaches of other authors (Sect. 3).
In Sect. 4, VPRS is formally transported to the domain of DBs, which is exploited
in Sect. 5 when proposing InDBR. Section 6 evaluates InDBR and two other
state-of-the-art rule inducers regarding both predictive and descriptive capabilities. Based on the
obtained results, we recap and conclude this chapter (Sect. 7).
This section outlines rudiments of RST and VPRS as originally introduced by Pawlak
[16, 17] and Ziarko [18]. We describe basic data structures, indiscernibility relation
(Sect. 2.1) and the concept approximation (Sect. 2.2).
where X and Y are two ordinary sets. Using the bound c(X, Y ) ≤ β with 0 ≤ β <
0.5, X is said to be included in Y w.r.t. the permitted error β and we write X ⊆β Y .
Combining this relaxation and the indiscernibility relation, a given target concept
can be classified in terms of VPRS using the following two definitions.
Definition 1 Let U, A, B ⊆ A, β ∈ [0, 0.5) and the concept X ⊆ U . For any cho-
sen β, the β-lower approximation of X can be expressed by
X B,β = {K ∈ U/B | c(K , X ) ≤ β} . (3)
Definition 2 Let U, A, B ⊆ A, β ∈ [0, 0.5) and the concept X ⊆ U . For any cho-
sen β, the β-upper approximation of X can be expressed by
X B,β = {K ∈ U/B | c(K , X ) < 1 − β} . (4)
Definition 3 Let U, A, B ⊆ A, β ∈ [0, 0.5) and the concept X ⊆ U . For any chosen
β, the β-boundary approximation of X can be expressed by

B N D B,β (X ) = {K ∈ U/B | β < c(K , X ) < 1 − β} . (5)

For binary or multiclass classification problems, VPRS provide further notions. Utilizing
a DS, they are determined by the following Definitions 4 and 5.
Definition 4 Given U, A, D, B ⊆ A, E ⊆ D and β ∈ [0, 0.5), all concepts
induced by the partition U/E can be evaluated using the β-positive region

P O S B,E,β = ⋃_{X ∈ U/E} X B,β . (6)
Definition 5 Given U, A, D, B ⊆ A, E ⊆ D and β ∈ [0, 0.5), the β-boundary
region over all concepts induced by the partition U/E is given by

B N D B,E,β = ⋃_{X ∈ U/E} B N D B,β (X ) . (7)
Since P O S B,E,β is the union of all available β-lower approximations with respective
X ∈ U/E, it covers those x ∈ U which can be classified with certainty using B and
precision β. Conversely, B N D B,E,β holds all inconsistent objects. Employing both
β-regions, a comprehensive outline on the quality of B w.r.t. E is supplied. Note,
VPRS are a generalization of classic RST. One can verify that in case of β = 0, both
models are equivalent.
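Definitions 1 and 2 translate directly into code. The sketch below assumes a decision table stored as a dict from object to attribute tuple and uses the standard VPRS misclassification error c(K, X) = 1 − |K ∩ X|/|K|; the names are illustrative, not the chapter's API:

```python
def beta_approximations(universe, B, X, beta):
    """β-lower and β-upper approximations of Definitions 1-2 (sketch).

    `universe` maps each object to its attribute-value tuple, `B` is a
    list of conditional attribute indices, `X` is the target concept
    (a set of objects) and 0 <= beta < 0.5 the admissible error."""
    # partition U/B: group objects by their values on B (indiscernibility)
    blocks = {}
    for obj, values in universe.items():
        key = tuple(values[a] for a in B)
        blocks.setdefault(key, set()).add(obj)
    lower, upper = set(), set()
    for K in blocks.values():
        c = 1.0 - len(K & X) / len(K)   # misclassification error c(K, X)
        if c <= beta:
            lower |= K                  # Eq. (3): β-lower approximation
        if c < 1.0 - beta:
            upper |= K                  # Eq. (4): β-upper approximation
    return lower, upper
```

Setting `beta = 0` recovers the classic Pawlak approximations, in line with the equivalence noted above.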
Note, the algorithms presented in this chapter only rely on the β-approximations to
induce decision rules as part of the knowledge extraction process. On these grounds,
we omit the introduction of core and reduct as key features of RST. Instead, the
interested reader is referred to [17, 18] for further details on this subject.
3 Related Work
Research on combining DBs and RST dates back to the mid-1990s, in order to leverage
the efficient infrastructure provided (parallelism, algorithms, data structures and
statistics). In this context, one of the first systems using database algorithms is the
data mining toolkit RSDM [5]. It incorporates SQL commands to pretreat and fetch
relevant data from a DB, which are finally processed on a row-by-row basis to compute
VPRS. This conventional client-server architecture provides solid performance
as long as data can be compressed adequately at the DB end, as fewer rows need to be transmitted.
262 F. Beer and U. Bühler
Decision rules have been used to represent knowledge for decades. One of the first
approaches handling concept drift is the family of algorithms called FLORA, consisting
of FLORA2, FLORA3 and FLORA4 [23]. The main idea behind FLORA2
is a partial memory storing examples used to induce new rules. The memory is
implemented as a sliding window and contracts as drifts occur. FLORA3 expands
FLORA2 by dealing with reappearing concepts. After each learning cycle, it determines
whether to reconsider useful rules of the past. FLORA4 distinguishes between
concept drift and data noise by tracking a rule's accuracy through confidence intervals.
Another method, derived from the classic sequential covering algorithm AQ [24],
is AQ11-PM+WAH [25]. It incorporates the adaptive window of [23] to handle drifting
conditions. As such, AQ11-PM+WAH is comparable to FLORA2 performance-wise,
but maintains fewer examples during learning. However, neither of these rule
learners is designed to process data arriving in a stream-like fashion. FACIL is
the first algorithm explicitly built to mine numeric data streams [26]. It is a bottom-up
rule inducer that is able to store inconsistent rules with corresponding examples. To
maintain a specific purity within the rule set, a user-defined threshold needs to be pro-
vided. Rules violating the minimum purity are replaced by new rules generated from
the associated examples. A completely different approach is applied in VFDR, which
produces either ordered or unordered sets of rules following a top-down approach
by stepwise specializing existing rules [27]. The rule induction is guided by the
Hoeffding bound [28] as an adaptation of the decision tree VFDT [29]. In order
to improve its performance under drift, VFDR is extended with a drift detector in
[30]. A frequent demand on stream-based learners is the any-time property, i.e. always
being able to classify incoming examples (e.g. [31, 32]). Thus, [27, 30] incorporate
Naive Bayes (NB), which takes over classification in scenarios where no appropriate
rule exists in the rule set. An algorithm explicitly relaxing the any-time capability
is eRules [33], which enhances the well-known batch learner PRISM [34]. During
training, it buffers incoming examples that are unclassifiable by the existing rule set
and triggers PRISM once a user-defined threshold is reached. As such, classification
is withheld when no appropriate rule exists or eRules is uncertain. This approach is
further improved through its successor G-eRules [35], since eRules performs poorly
when confronted with continuous data. The most recent rule classifier compiled by
the authors of [33, 35] is called Hoeffding Rules [36], which incorporates the Hoeffd-
ing bound as a statistical measure to determine the number of examples required to
stimulate the production of new decision rules. The latest bottom-up rule approach
coping with drifts is the any-time algorithm RILL [37], whose induction strategy is
based on distance measures to find the nearest rules. Furthermore, it utilizes intensive
pruning, keeping only the most essential information.
In contrast to the presented approaches, our incremental rule-based learner
exploits VPRS and is designed for in-database applications. Additionally, it adopts
the idea of [33], relaxing the any-time requirement. In particular, the latter point is
very beneficial for real-world scenarios that require reliable classification capabilities
supporting decision-makers.
This section is based on [15] and formally brings VPRS to the domain of relational
DBs. First, we discuss how to express an IS and a DS in DB terminology
(Sect. 4.1) and introduce the relational operations required to port the indiscernibility
relation (Sect. 4.2). Ultimately, a redefinition of the β-approximation is provided,
permitting the computation of VPRS inside DBs (Sect. 4.3).
264 F. Beer and U. Bühler
Given the definition of a data table TA from the previous section, such a relation
can also be the result of any of the following algebraic operations: projection π, selection
σ, grouping G and joining ⋈. In more detail, π B (TA ) projects tuples
t ∈ TA onto a specified feature subset B ⊆ A while removing duplicates. A projection
without duplicate elimination is indicated by π B+ (TA ). Note that we further permit
attribute modifications during the projection through simple assignments or arithmetic
operations. An illustrative example is given by π⁺_{3→x, x→y, x+y→z} (Tx,y ), where
the value 3 is assigned to x, y is allocated with x, and the new attribute z holds the
sum of x and y, respectively. Filtering tuples is performed via σφ (TA ). It essentially
removes those t ∈ TA not fulfilling condition φ and keeps the original schema A.
The grouping operator G F,G,B (TA ) groups tuples of TA according to the attributes
G and applies the aggregation functions F = { f 1 , . . . , fr }, r ∈ N0 , while the output
schema of G corresponds to F, B with B ⊆ G ⊆ A. In this respect, we have for
F = ∅ and G = B : G F,G,B (TA ) ≡ π B (TA ). Given that, we are able to define the
indiscernibility relation based on extended relational algebra. For our purpose, we
simply count the number of members in each elementary class of a given table TA ,
i.e. the cardinality expressed by the aggregate count, and include it as new feature c.
Consolidated, we make use of the following notation
G̃c,B (TA ) := ρc,b1 ,...,bm (G{count},G,B (TA )) , (9)
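Outside a DB engine, the operator G̃c,B amounts to a plain group-and-count. The following minimal Python sketch (the toy table and attribute names are hypothetical, not from the chapter) computes the elementary classes of an attribute subset B together with their cardinality c, mirroring Eq. (9):

```python
from collections import Counter

# Hypothetical toy information system T_A as a list of tuples over attributes A.
A = ("a1", "a2")
T = [("x", 1), ("x", 1), ("y", 1), ("y", 2), ("y", 2)]

def elementary_classes(table, attrs, B):
    """Mimic G~_{c,B}(T_A): project every tuple to the attribute subset B
    and count the members of each elementary class, exposing the
    cardinality as the extra feature c."""
    idx = [attrs.index(b) for b in B]
    return Counter(tuple(row[i] for i in idx) for row in table)

classes = elementary_classes(T, A, ("a1", "a2"))
# e.g. the elementary class ("x", 1) has cardinality 2
```

In a real deployment this is a single `GROUP BY … COUNT(*)` query, which is precisely the point of the relational formulation.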
Having discussed the mapping of an IS and a DS alongside the indiscernibility
relation from a DB perspective, this section transfers the β-approximation
to the domain of DBs in two phases. First, we restructure Definitions 1–5 into
set-oriented expressions and show that these are no extensions to Ziarko's model
but equivalent terms, given through Propositions 1 and 2. These propositions can be
ported to relational algebra intuitively, and hence Theorems 1 and 2 can be obtained
in the second stage, representing a compliant in-database VPRS model. To point out
the practical efficiency of the resulting model, Theorem 3 is presented and briefly
discussed.
Proposition 1 Let U, A and B ⊆ A. For any X ⊆ U and a fixed β ∈ [0, 0.5), the
β-approximation of X can be described by
{K ∈ U/B | ∃H ∈ X/B : φ} , (10)
Proposition 2 Let U, A, D, B ⊆ A and E ⊆ D. For any fixed β ∈ [0, 0.5), the
β-regions P O S B,E,β and B N D B,E,β can be described by
{K ∈ U/B | ∃H ∈ U/(B ∪ E) : φ} , (12)
Theorem 1 Let TA , B ⊆ A, β ∈ [0, 0.5) and let the target concept CA be a subset
of T . We can compute the β-lower (L B,β (T, C)), β-upper (U B,β (T, C)) and β-
boundary approximation (B B,β (T, C)) of C using the relational operations
Proof The grouping (G ) and projection (π ) can be implemented using hash aggregations,
which require nm time for either operation. Therefore, the comparison (⋈)
of both partitions utilizing the hash join algorithm results in 4nm. At most, the selection
(σ ) requires a sequential scan followed by the final projection (π ). Thus, six
subsequent scans need to be performed overall, which is O(nm).
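Under the assumption that the two partitions are realized as hash aggregations, the β-regions of Theorem 1 can be sketched in pure Python as follows; the table layout, names and toy data are illustrative, not the chapter's SQL:

```python
from collections import Counter

def beta_regions(table, attrs, B, d, concept, beta):
    """Sketch of the two hash aggregations behind Theorem 1: one grouping
    on B (class sizes |K|) and one on B restricted to the concept
    (|K ∩ X|), joined on the B-values. A class K enters the β-lower
    approximation if its error 1 - |K ∩ X|/|K| <= beta, and the
    β-boundary region if beta < error < 1 - beta."""
    bi = [attrs.index(a) for a in B]
    di = attrs.index(d)
    n = Counter(tuple(r[i] for i in bi) for r in table)                      # |K|
    p = Counter(tuple(r[i] for i in bi) for r in table if r[di] == concept)  # |K ∩ X|
    lower, boundary = [], []
    for key, total in n.items():
        err = 1 - p.get(key, 0) / total
        if err <= beta:
            lower.append(key)
        elif err < 1 - beta:
            boundary.append(key)
    return lower, boundary

# Hypothetical decision system with condition attribute a and decision d.
rows = ([("x", "+")] * 9 + [("x", "-")]        # class "x": error 0.1
        + [("y", "+")] + [("y", "-")]          # class "y": error 0.5
        + [("z", "-")] * 5)                    # class "z": error 1.0
lo, bnd = beta_regions(rows, ("a", "d"), ("a",), "d", "+", beta=0.2)
# "x" falls into the β-lower approximation, "y" into the β-boundary
```

Each `Counter` stands in for one hash aggregation, and the dictionary lookup replaces the hash join, reflecting the O(nm) argument of the proof.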
Theorem 3 relies on adequate hash algorithms, which are provided by most conventional
DB engines such as Oracle,1 PostgreSQL2 or SQL-Server.3 Additionally, it
assumes that a collision-resistant hash function and sufficient main memory are
available to accomplish the computation. We should further state that using Theorem 3
also enables a high degree of parallelism, whether on a single node or in a
distributed DB.
Note that respective corollaries can be derived from Theorems 1 and 2, in particular for
the task of feature selection, i.e. seeking cores and reducts in relational environments.
For the sake of completeness, the reader is referred to [15], which treats this subject in
more detail.
Based on the theoretical considerations from the previous section describing VPRS
in DB terminology, in this section we introduce InDBR as a new in-database
rule learner. To this end, we discuss important notations for rules (Sect. 5.1) alongside
an approach to handle data imbalance (Sect. 5.2). These aspects are finally
incorporated into InDBR as an essential part of its learning strategy (Sect. 5.3).
The left part of the rule is the descriptor or condition and the right part is the
conclusion or consequent. The descriptor comprises a conjunction of literals a =
va denoting a logical test of whether attribute a has the value va ∈ Va . In case the
entire conjunction holds true, the rule is applicable and returns the corresponding
conclusion.4 This way, a rule can be understood intuitively as follows: if condition
then consequent. Further important characteristics of a rule are concerned with its
1 https://www.oracle.com/database/enterprise-edition/.
2 https://www.postgresql.org/.
3 https://www.microsoft.com/sql-server/.
4 We defined the rule conclusion in (18) over a single decision attribute out of simplicity. For this
reason, we restrict further related formalism consequently to one decision attribute only w.l.o.g.
length and coverage. Given an arbitrary rule r , we define len(r ) to be the number of
literals constituting the descriptor, while cov(r ) exhibits the set of covered examples
by r . In this context, a rule r is said to cover an example e if all literals in the descriptor
hold true on e. To be able to compare rules, we simply use set-theoretic operations.
As such, a rule r is said to be more general than another rule r′ if its coverage is equal
to or beyond the coverage of r′, i.e. cov(r ) ⊇ cov(r′). In order to assess the coverage
in terms of r 's classification ability, cov p (r ) and covn (r ) are essential, indicating the
positive and negative coverage respectively. Thus, cov p contains the set of examples
covered by r where its consequent holds and covn those where the conclusion fails. Combined,
we are able to introduce the error δ of a rule r given through
δ(r ) = 1 − |cov p (r )| / |cov p (r ) ∪ covn (r )| = 1 − |cov p (r )| / |cov(r )| . (19)
Note that since induced rules are stored in a relation with a fixed schema, compact rules
with a length smaller than the schema require special treatment. For this purpose, and
to be able to perform relational operations on rules in a unified fashion, we allow the
rule set to be incomplete, i.e. we permit null values (see Sect. 2.1). In this regard, the
length of a rule is determined by all properly set literals a = v except those where
a = ⊥.
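As a minimal sketch of these notions (rules as dicts of literals with None standing in for the null value ⊥, and a single decision attribute d as in footnote 4; the representation is our assumption, not the chapter's):

```python
def covers(rule, example):
    """A rule covers an example if all properly set literals a = v_a hold;
    None entries model the null value ⊥ and are skipped."""
    return all(example.get(a) == v for a, v in rule["cond"].items() if v is not None)

def error(rule, examples):
    """Rule error δ(r) = 1 - |cov_p(r)| / |cov(r)| as in Eq. (19)."""
    cov = [e for e in examples if covers(rule, e)]
    if not cov:
        return 0.0
    cov_p = [e for e in cov if e["d"] == rule["concl"]]
    return 1 - len(cov_p) / len(cov)

# Hypothetical rule: if a1 = "x" then d = "+" (a2 is unset, i.e. ⊥).
r = {"cond": {"a1": "x", "a2": None}, "concl": "+"}
data = [{"a1": "x", "a2": 1, "d": "+"},
        {"a1": "x", "a2": 2, "d": "-"},
        {"a1": "y", "a2": 1, "d": "+"}]
# r covers the first two examples, one of them positive, so δ(r) = 0.5
```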
The induction of our method is guided by the efficient relational representations
of the β-regions L and B introduced in Sect. 4.3. Both queries, however, suppress
the decision attribute, which requires additional steps to expose final rules on a given
input relation TA,{d} . These steps are computed as follows
In conventional settings where the data distribution is invariant w.r.t. changes, learning
predictive models from a static source of data is state-of-the-art. In nonstationary
environments, however, an existing model becomes outdated as the assumed conditions
under which it was trained are no longer valid. Thus, more dynamics in terms of
the learner's visibility are essential. In this context, it is common practice for incremental
learning to utilize a sliding window or micro batches as partial memory to
serve the underlying induction process. While these approaches are straightforward
and ensure training on the most recent information representing the current trends in the
data, they are incapable of handling situations where the class distribution is
skewed. Besides concept drifts, this phenomenon frequently occurs in a number
of critical applications including intrusion detection, fraud or customer churn discovery,
and poses a crucial concern for many learning algorithms, which typically bias
towards the majority class. Generally, learning in such a setting is known as the "class
imbalance problem".
To counteract this issue in nonstationary environments, we propose a new
approach that relies not on one but on k ∈ N sliding windows, where k is the number
of expected concepts to learn, with a predefined window size w ∈ N. Consequently,
the partial memory maintains kw examples in the worst case and keeps instances of
the minority classes much longer than those from the majority classes.
This, in turn, constitutes a natural under-sampling technique for majority examples
and provides a balanced representation for the induction process once all windows
are filled accordingly. The concept of this approach is illustrated in Fig. 1, where
W = {W1 , . . . , Wk } are the sliding windows of the proposed data structure each of
size w.
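A possible reading of this partial memory, with one fixed-size window per class (using class labels as the expected concepts; all names are ours, not the chapter's):

```python
from collections import deque, defaultdict

class ClassBalancedMemory:
    """Partial memory with up to k sliding windows of size w, i.e. at most
    k*w buffered examples. Minority-class examples survive much longer
    than majority ones, acting as natural under-sampling."""
    def __init__(self, w):
        # One bounded window per class; the oldest example of that class
        # drops out automatically once the window is full.
        self.windows = defaultdict(lambda: deque(maxlen=w))

    def add(self, example, label):
        self.windows[label].append(example)

    def training_set(self):
        """Flatten all windows into one (balanced) induction set."""
        return [(x, y) for y, win in self.windows.items() for x in win]

mem = ClassBalancedMemory(w=3)
for i in range(100):
    mem.add(("majority", i), "neg")   # 100 majority examples stream in
mem.add(("minority", 0), "pos")       # a single minority example
# the buffer holds only the 3 newest "neg" examples plus the "pos" one
```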
Taking the previous considerations into account, we present the incremental in-
database rule inducer InDBR from a conceptual perspective in this section and
concentrate on its training and generalization procedures. In a nutshell, it utilizes
incoming training examples to further generalize the existing rule set based on a novel
bottom-up generalization strategy exploiting VPRS. Remaining examples still not
covered after generalizing are turned into most specific rules ensuring a complete
coverage from the current point of view. Finally, InDBR keeps track of its model
quality by pruning old or unused rules. Due to this training cycle, the expressiveness
of InDBR’s predictive model evolves over time as new training data arrive, while
keeping its complexity low focusing on the most recent input. These are important
characteristics to quickly react to abrupt or gradual changes in the underlying con-
cepts to learn. The internals of InDBR are presented in Algorithm 1, which is shown
as pseudocode to facilitate readability rather than as complex DB statements.
However, one can verify that its complete translation to the domain of DBs can be carried
out straightforwardly. To further detail the functional operations, we categorize the
training procedure into four main steps:
Step (i) refers to the handling of incoming examples. These are provided by a
relation V serving as an interface. Thereby, InDBR supports two types of input processing,
i.e. example-by-example or a batch of training data. On the one hand, this
permits comparability with other approaches, as most related work operates on data
streams processing each arriving training example sequentially. On the other hand,
relational DBs generally show better performance when confronted with a batch of
data, taking advantage of parallel DB operations. Thus, the input to InDBR can be
adjusted according to different scenarios, which is controlled by the parameter v.
Furthermore, InDBR utilizes a buffer for incoming data acting as partial memory
(see Sect. 5.2), which is implemented using a conventional table W . Since W is fundamental
to inferring new rules or generalizing existing ones, InDBR's current rule set
R needs to be refreshed as new data arrive due to potentially outdated statistics.
The next stage (ii) is concerned with the generalization of existing rules depicted in
Algorithm 2. It partitions the entire rule set according to their corresponding length j
into disjoint subsets R j . These are iteratively processed in ascending order to retrieve
attribute sets of more general rule candidates. In order to obtain such rules in the
current iteration 2 ≤ j ≤ t, we define a function g : N → [0, 1] that determines the
percentage of rules to generalize according to the length j. Having selected such a
proportion K R ⊆ R j of size g( j) · |R j | at random, "dropping conditions" is carried
out as a well-established generalization strategy [40] to seek new rules. In essence,
it drops literals stepwise from existing rules to retrieve more general ones. In our case,
the heuristic is guided by two measures, i.e. cov and δ such that cov(r̂ ) ⊃ cov(r )
and δ(r̂ ) ≤ β hold true for an arbitrary parent rule r and its potential successor rule
r̂ with len(r̂ ) = len(r ) − 1. Utilizing this approach, truly more general rules are
retrieved, which on the one hand may produce a higher error in comparison to their
predecessor. On the other hand, such rules can also be seen as more tolerant, increasing
their range for unseen examples. As a consequence, we not only get new rules, but may
also obtain interesting attribute sets from these rules that can be valuable for extracting
further generalizations for rules r ∈ Rl , l > j, not considered yet. These are stored within
AC ⊆ P(A) and further processed using VPRS. In particular, InDBR leverages the
partition induced by B using ω(L ) and W for all B ∈ AC efficiently exposing
new rules, while disregarding already covered examples from previous iterations.
By definition, those rules are certain as they are based on the β-positive region and
could directly replace their more specific predecessor. However, there is a high risk
for overgeneralization. To get over this dilemma, InDBR proposes the parameter
m that only permits such abrupt generalizations if new rules r̃ provide sufficient
evidence w.r.t. W and its predecessor rule r , i.e. |cov(r̃ )| ≥ |cov(r )| + m. This way,
the generalization routine exploiting VPRS continues with the next iteration j + 1
until all granularity levels have been explored systematically. As a parent r can have
multiple child rules r̂ , a final step is to determine the best descendant out of those
r̂ . The query covering this task seeks rules with maximum purity 1 − δ(r̂ ) and
highest coverage cov(r̂ ), where the age α of that particular rule is set to the smallest
among r̂ . Ultimately, these best rules are appended to the rule set R, while their
parents are dropped consequently.
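The core of the "dropping conditions" step can be sketched as follows; the helper names, dict-based rule representation and toy data are hypothetical, and only a single generalization level is shown:

```python
def covers(cond, e):
    """True when all literals of the descriptor hold on example e."""
    return all(e[a] == v for a, v in cond.items())

def delta(cond, concl, data):
    """Rule error δ as in Eq. (19)."""
    cov = [e for e in data if covers(cond, e)]
    pos = [e for e in cov if e["d"] == concl]
    return 1 - len(pos) / len(cov) if cov else 1.0

def drop_conditions(cond, concl, data, beta):
    """Generate successors r^ by dropping one literal each, so that
    len(r^) = len(r) - 1; keep candidates whose coverage strictly grows
    (cov(r^) ⊃ cov(r)) and whose error stays within δ(r^) <= beta."""
    parent_cov = {i for i, e in enumerate(data) if covers(cond, e)}
    out = []
    for a in cond:
        cand = {k: v for k, v in cond.items() if k != a}
        cand_cov = {i for i, e in enumerate(data) if covers(cand, e)}
        if cand_cov > parent_cov and delta(cand, concl, data) <= beta:
            out.append(cand)
    return out

data = [{"a1": "x", "a2": 1, "d": "+"},
        {"a1": "x", "a2": 2, "d": "+"},
        {"a1": "y", "a2": 1, "d": "-"}]
gen = drop_conditions({"a1": "x", "a2": 1}, "+", data, beta=0.1)
# dropping a2 yields {"a1": "x"}, covering both positives with zero error;
# dropping a1 yields {"a2": 1}, rejected because its error 0.5 exceeds beta
```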
As briefly mentioned in the previous sections, InDBR also features the extraction
of uncertain rules, i.e. exhibiting the error β < δ < 1 − β. Obviously, such rules
cannot be used for the classification of incoming examples. However, they might
be valuable in future situations as data evolve. Thus, we refer to active and inactive
rules in this specific context. Both types of rules are extracted in stage (iii) based on
all examples U still not covered by existing rules. Therefore, InDBR makes use of
ω(L ) and ω(B) w.r.t. U and the entire feature set, inducing the most specific active
and inactive decision rules.
The final step (iv) during the incremental learning takes care of rule aging.
It refreshes the age of those rules that correctly hit at least one of the examples in
V . The age of all other rules is incremented. Once a rule exceeds its corresponding
age defined per class in αd , it is removed from R. On the one hand, this ensures
dropping certain rules that were not hit over a longer period, which indicates outdated
knowledge due to a potential shift in the underlying concept. On the other hand,
antiquated uncertain rules may be a result of data noise or an ongoing gradual drift.
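Step (iv) can be sketched as follows (the dict-based rule representation and the alpha_d mapping are our assumptions):

```python
def covers(cond, e):
    """True when all literals of the descriptor hold on example e."""
    return all(e.get(a) == v for a, v in cond.items())

def age_rules(rules, batch, alpha_d):
    """Reset the age of every rule that correctly hits at least one example
    in the batch V, increment all others, and prune rules whose age
    exceeds the per-class threshold alpha_d."""
    kept = []
    for r in rules:
        hit = any(covers(r["cond"], e) and e["d"] == r["concl"] for e in batch)
        r["age"] = 0 if hit else r["age"] + 1
        if r["age"] <= alpha_d[r["concl"]]:
            kept.append(r)
    return kept

rules = [{"cond": {"a": 1}, "concl": "+", "age": 1},
         {"cond": {"a": 2}, "concl": "-", "age": 2}]
kept = age_rules(rules, [{"a": 1, "d": "+"}], {"+": 3, "-": 2})
# the first rule hits and is reset to age 0;
# the second reaches age 3 > α_- = 2 and is pruned
```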
When it comes to classification, one drawback of an incremental rule learner is often
its inability to cover the entire data space, as opposed to other learners
such as most decision trees. This conflicts with strict demands where algorithms
should be able to predict at any time (see Sect. 3.2). Rule learners satisfying
such requirements frequently accept a poorer accuracy or use a specific strategy to
compensate for this issue. Two common techniques are the introduction of default rules
or the orchestration of an additional predictor with any-time properties, which is
trained in parallel. Both points are critical for many real-world applications that value
quality predictions, fully reproducible by decision-makers, over the any-time property.
Two examples include medical diagnostics and network security.
Emphasizing the latter, two practical issues can be identified: (i) Considering the
huge amounts of network traffic to monitor, producing false alarms as a result of a
weak predictive model can easily overwhelm security analysts from an operational
point of view, resulting in a loss of trustworthiness. (ii) Once a prediction is made, it
should be transparent, ideally represented through a meaningful pattern highlighting
the case. Neither of these points can be addressed by incorporating default rules,
which typically reflect nothing more than the class distribution, nor by embedding an
additional stable ML model, which generally must be assumed to be a black box. In turn,
this results in unexplainable alarms that do not support the necessary follow-up activities
required to safeguard the integrity of the network landscape. Thus, we argue that
the rule engine should only fire when it is certain, i.e. when high-quality and unambiguous
rules exist. InDBR addresses these concerns by explicitly abstaining from classification
in cases where no adequate rule is present or it is uncertain about an upcoming
decision, which is in line with the opinion of other authors (e.g. [33, 35, 36]). Therefore,
classification relies only on most certain rules given through the active rule set of
InDBR, i.e.
A R = {r ∈ R | δ(r ) ≤ β} . (21)
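A sketch of this abstaining classification over the active rule set of Eq. (21); the rule representation and the disagreement policy are our assumptions:

```python
def covers(cond, e):
    """True when all literals of the descriptor hold on example e."""
    return all(e.get(a) == v for a, v in cond.items())

def classify(rules, example, beta):
    """Fire only active rules with δ(r) <= β, cf. Eq. (21); abstain
    (return None) when no active rule covers the example or when the
    covering active rules disagree on the class."""
    preds = {r["concl"] for r in rules
             if r["delta"] <= beta and covers(r["cond"], example)}
    return preds.pop() if len(preds) == 1 else None

rules = [{"cond": {"a": 1}, "concl": "+", "delta": 0.05},   # active rule
         {"cond": {"a": 2}, "concl": "-", "delta": 0.40}]   # inactive rule
classify(rules, {"a": 1}, beta=0.1)  # "+": covered by an active rule
classify(rules, {"a": 2}, beta=0.1)  # None: only an inactive rule covers
```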
6 Comparative Study
This section comprises experimental results evaluating the proposed rule inducer
InDBR against other rule learners from different perspectives. First, we introduce
the setup of the experiments with the employed data sets (Sect. 6.1). Furthermore, the
predictive capabilities of each rule learner are evaluated (Sect. 6.2), followed by an
assessment of discovery-oriented aspects (Sect. 6.3).
data stream mining framework MOA6 given through built-in data generators and its
designated repository. Two additional real-world tasks were downloaded separately
from the hosting service GitHub.7 In what follows, we highlight the main characteristics
of the employed data sets, and a summary is depicted in Table 1.
Airline: 539.383 flight records with seven features are given in this particular data
set, covering a nonstationary real-world problem [41]. Its task is to decide whether
flights are delayed or on schedule, and it is often used to evaluate algorithms under drifting
circumstances (e.g. [14, 30, 35]). In our experiments, we use the MOA version of
the data set available at its corresponding website.
Electricity: This data set comprises data from the Australian New South Wales
Electricity Market and is also frequently used as benchmark for drifting environments
(e.g. [37, 42]), as it expresses price dynamics of demand and supply. Each of the
45.312 records contains eight attributes, such as timestamp or market demand, and
refers to a 30-minute period, whereas its problem is concerned with the relative price
change within the last 24 hours. The majority class holds 58%, giving a tendency
towards skewed classes. This data source was downloaded from the MOA website.
Rotating Hyperplane (RHP): This data generator was established in [43] and
can be formalized as follows: Given a d-dimensional space of uniformly distributed
data points x, the hyperplane Σ_{i=1}^{d} wi xi = w0 divides the points into the positive
class if Σ_{i=1}^{d} wi xi ≥ w0 or into the negative class otherwise, where xi is the ith
coordinate of x and wi is its corresponding weight. By altering wi with the probability
of changing direction τ and magnitude c per x, i.e. wi = wi + cτ , the orientation and
position of the hyperplane can be manipulated, introducing drifting circumstances.
We utilized MOA to generate two data sets with the parameters τ = 0.03, c = 0.1 and
τ = 0.01, c = 0.1 to represent a long-lasting and a shorter gradual drift over 200.000
points with ten features, omitting noise. Note that the former also contains notions of
local abrupt drifts.
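A rough re-implementation of the generator under our reading of the update rule wi = wi + cτ (the chapter uses MOA's built-in generator; the direction-change bookkeeping and the choice w0 = ½ Σ wi are our assumptions):

```python
import random

def rhp_stream(d, tau, c, n, seed=42):
    """Rotating hyperplane stream in the spirit of [43]: a point x is
    positive iff sum_i w_i * x_i >= w_0. After each example, every weight
    drifts by magnitude c * tau and reverses direction with probability
    tau; w_0 is kept at half the weight sum so classes stay balanced."""
    rng = random.Random(seed)
    w = [rng.random() for _ in range(d)]
    signs = [1.0] * d
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        w0 = 0.5 * sum(w)
        yield x, 1 if sum(wi * xi for wi, xi in zip(w, x)) >= w0 else 0
        for i in range(d):
            if rng.random() < tau:
                signs[i] = -signs[i]      # change of drift direction
            w[i] += signs[i] * c * tau    # gradual rotation of the plane

stream = list(rhp_stream(d=10, tau=0.03, c=0.1, n=1000))
```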
Outdoor-Stream: This data set contains a collection of image sets recorded by an
autonomous system in a garden environment and was first used in [44]. Each of the
4.000 records consists of 21 attributes representing ten images that were collected on
obstacles from different perspectives and lighting conditions in temporal order. The
task is to separate the records into 40 different categories while the classes are evenly
distributed. This real-world problem is available at the mentioned GitHub repository.
Poker-Hand: The challenge of this data set is to predict the poker hand out of
five playing cards encoded by suit and rank resulting in ten condition attributes
per hand out of a standard 52-card deck. The problem contains 829.201 hands. A
normalized version of it was downloaded from the MOA website without major
modifications from our side. The class distribution is highly imbalanced such that
the eight smallest classes carry not more than 7.62% of the data. Note, even though
6 http://moa.cms.waikato.ac.nz/.
7 https://github.com/vlosing/driftDatasets/tree/master/realWorld/ (commit: 89f1665ed89af 78cae-
cabec62c680a57a4f16646).
Reviewing Sect. 5.2, class imbalance can be a huge concern for an ML algorithm.
At the same time, this phenomenon causes trouble not only during training but
also when assessing a learner's predictive capabilities. In such a setting, several popular
Table 1 Employed data sets to analyze concept drifts and class imbalance

Data set        #Records   #Attributes   #Classes   Imbalance   Type of drift
Airline         539.383    7             2          no          unknown
Electricity     45.312     8             2          (yes)       unknown
Outdoor-Stream  4.000      21            40         no          unknown
Poker-Hand      829.201    10            10         yes         unknown
RBF             100.000    10            2          yes         gradual
RHP (long)      200.000    10            2          no          abrupt/gradual
RHP (short)     200.000    10            2          no          gradual
SEA-Concepts    100.000    3             2          yes         abrupt
Weather         18.159     8             2          yes         unknown
performance measures such as “accuracy”, which showcases the ratio between cor-
rectly classified examples and all instances seen during evaluation, can be misleading.
To highlight their inherent problem, let us assume a binary classification task where
the positive class consists of roughly 3% of the data and the negative class holds
97%. In this and similar situations with skewed class distributions, a naive learner
deeming all examples to fall in the negative class indeed features 97% accuracy.
Obviously, it distorts the classification result, not indicating that 100% of the positive
samples are predicted incorrectly. Thus, it is imperative to utilize a more sophisticated
performance measure, given that our experimental setup consists of both balanced
and imbalanced data (see Table 1). Countering this challenge, we make use of
the “F1-score” in conjunction with two established scaling methods, i.e. “micro-
averaging” and “macro-averaging”. In this context, F1-score refers to the harmonic
mean of the two measures “precision” and “recall” that emerged from information
retrieval, an area highly subject to class imbalance (e.g. [47, 48]). On that note, the
micro-average F1-score (μF1) weights each classification equally during evaluation,
and thus reflects the conventional accuracy in a multiclass setting. In contrast, the
macro-average of the F1-score (mF1) weights all classes evenly permitting insights
to the effectiveness of a classifier across classes. As a result, we obtain two indicators
rating both the overall classification performance in terms of correct predictions
and a classifier's deficits on imbalanced data. Note that, for consistency, we refer to these
measures in their multiclass form, being aware that several data sets in our test
environment in fact target binary classification tasks.
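The difference between the two averaging schemes can be made concrete with the 97%/3% example from above; a minimal pure-Python sketch for the single-label multiclass case:

```python
from collections import Counter

def f1_scores(y_true, y_pred, classes):
    """Micro-averaged F1 pools all decisions (and equals accuracy in the
    single-label multiclass case); macro-averaged F1 averages the
    per-class F1, weighting every class evenly."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    def f1(t, f_p, f_n):
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    return micro, macro

# 97% majority "neg": the naive all-"neg" predictor looks strong on μF1
# but its complete failure on "pos" drags mF1 down.
y_true = ["neg"] * 97 + ["pos"] * 3
y_pred = ["neg"] * 100
micro, macro = f1_scores(y_true, y_pred, ["neg", "pos"])
# micro = 0.97, macro ≈ 0.49
```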
Unlike conventional evaluation methods for batch learning that rely either on hold-
out test sets or cross-validation, estimating the predictive tendencies of an incremental
decision model is a further challenge, because the model evolves over time and no
explicit test data are available due to the continuous nature of the learning process.
Common practice to determine the performance in such a setting is the predictive
sequential (prequential) evaluation.
Fig. 2 Abstaining behavior and predictive performance on selected data sets over the course of
the incremental learning process: a–c provide insights into the individual abstain rates of VFDR,
G-eRules and InDBR for the data sets Electricity, Poker and SEA-Concepts, while d–f showcase the
t-mF1 of G-eRules and InDBR on these benchmarks, all in consecutive order
data sets by 18.48% on average. It is worth noting that out of these five classification
tasks, four comprised class imbalance problems. Considering all imbalanced data sets,
G-eRules produced an averaged t-mF1 of 64.98%, which was roughly 10% weaker
than the outcome of InDBR, providing solid results given the degree of difficulty on
these data sets. Details w.r.t. their prequential performance on three imbalanced data
sets are depicted in Fig. 2d–f. The potential of InDBR becomes even more convincing
when reviewing its performance on the multiclass problems represented through Poker-
Hand and Outdoor-Stream in our series of experiments. It achieved a mean t-mF1
of 82.83%, which was more than 33% better than the numbers produced by G-eRules.
Combining these points indicates that our approach in combination with the sliding
windows is promising and provides more visibility over the course of the learning
cycle. When it comes to the comparison of the abstain rates, both algorithms showed
no compelling differences. The only notable disparity could be found on two data
Table 2 Prequential evaluation of the rule learners in percent using μF1, mF1, t-μF1, t-mF1 and
abstain rate (abs) under concept drift: Bold numbers indicate overall winner per row and performance
measure
Data set VFDR G-eRules InDBR
μF1 mF1 abs t-μF1 t-mF1 abs μF1 mF1 t-μF1 t-mF1 abs
Airline 66.40 61.70 94.63 62.97 59.00 13.70 66.19 65.46 67.57 66.85 13.59
Electricity 78.88 78.62 62.55 76.24 75.36 16.76 80.27 79.61 84.59 83.61 19.39
Outdoor-Stream 55.43 58.13 100.00 57.01 54.13 42.20 83.33 82.33 95.33 94.67 27.00
Poker-Hand 78.66 63.85 32.37 77.32 44.48 21.84 85.48 63.45 94.88 70.99 29.74
RBF 90.04 85.26 76.85 83.92 63.98 8.02 86.86 73.28 86.00 64.95 11.98
RHP (long) 83.41 83.51 26.69 83.71 83.73 6.61 86.84 86.85 87.97 87.97 4.83
RHP (short) 81.61 81.62 22.80 79.64 79.66 4.30 80.52 80.56 81.29 81.32 3.26
SEA-Concepts 86.06 81.58 87.35 81.76 72.22 14.09 86.51 80.42 90.31 81.47 18.68
Weather 71.48 67.68 40.88 76.63 68.85 16.28 75.17 70.28 79.67 72.39 18.22
Table 3 Prequential evaluation of the rule learners in percent with 15% class noise and concept
drift using μF1, mF1, t-μF1, t-mF1 and abstain rate (abs): Bold numbers indicate overall winner
per row and performance measure
Data set VFDR G-eRules InDBR
μF1 mF1 abs t-μF1 t-mF1 abs μF1 mF1 t-μF1 t-mF1 abs
Airline-15 60.01 56.98 96.03 55.70 53.64 16.15 61.45 55.15 63.16 55.85 21.15
Electricity-15 69.41 68.77 60.78 63.83 63.54 22.60 71.00 70.43 74.52 73.27 25.59
Outdoor-Stream-15 45.45 42.51 100.00 42.97 38.41 47.51 63.00 58.25 71.75 65.25 27.75
Poker-Hand 68.30 21.43 27.62 58.29 16.61 20.80 69.19 19.60 76.46 41.69 31.84
RBF-15 74.98 68.86 70.91 68.95 57.19 16.58 75.48 64.23 74.98 59.96 18.27
RHP-15 (long) 73.37 73.44 28.49 73.16 73.20 14.51 75.48 75.45 76.37 76.38 7.29
RHP-15 (short) 71.92 71.96 29.04 72.43 72.51 15.04 72.38 72.38 73.79 73.79 9.89
SEA-Concepts-15 71.42 65.58 87.22 69.65 62.97 16.53 74.61 68.63 80.56 71.20 23.49
Weather-15 64.73 61.24 64.15 65.40 60.74 24.63 67.28 62.72 69.83 63.89 14.39
The intra-comparison among the biased data sets revealed very few changes in the
performance tendencies of the three classifiers compared to the original test results.
In more detail, the pairwise comparison of VFDR and InDBR exposed a mean
difference of 2.58% considering the F-scores and 42.73% examining abstain rates, indicating
no major discrepancies w.r.t. the previous results. However, the obtained numbers
on μF1 demonstrate a win on all nine data sets for InDBR, underpinning a significant
difference with agreement across the Friedman test (α = 1%) and Wilcoxon
test (α = 5%). Associating G-eRules and InDBR uncovered a combined deviation of
9.64% on both F-scores, which is in line with the original assessment. Yet, a distinction
could be observed in the abstaining behavior, as in three out of nine measurements
G-eRules provided a better performance. Despite these results, no statistical significance
could be determined, concluding no substantial differences. Comparing the
outcome of both series of experiments towards an inter-assessment disclosed slightly
weaker results for InDBR. On average, the F-scores of VFDR decreased by 12.42%
while InDBR’s performance dropped by 13.15%. At the same time, the abstain rates
increased by 2.24% for VFDR and 3.36% for InDBR. Contrasting G-eRules and
InDBR revealed similar results on the tentative F-scores. While G-eRules collapsed
by 11.74% on average, InDBR’s t-μF1 and t-mF1 decreased by 12.73%. Turning
over to their abstaining behavior, the corresponding rates fell by 5.64% and 3.66%
respectively. Based on that, we can deduce that even in the given adversarial setup
InDBR outperforms G-eRules on the F-scores examining no significant difference on
the abstain rates. Furthermore, its any-time capabilities even increased w.r.t. VFDR,
but also inidcate slightly poorer results on an inter-comparison. However, this deficit
is always below 2% per performance measure constituting a rather marginal gap.
The results of this setup are highlighted in Table 3.
• Average rule set size: Similar to the depth of a decision tree, the rule set size
provides an indicator for the overall model complexity. Commonly, a smaller rule
set is preferred w.r.t. both computational demands and monitoring aspects.
• Average rule length: The length of a decision rule characterizes the simplicity of
a pattern. Shorter rules refer to a more expressive pattern identified by the rule
engine, thus permitting an assessment of the level of generality to cover potentially unseen
examples.
• Average coverage: In an incremental setting, the coverage of a rule set is an indi-
cator determining how well rules reflect arriving examples. This measure can be
critical for decision-makers, as a low coverage signals a poor adaptation and representation
of the most recent data. In our experiments, the rule set coverage is metered
w.r.t. a reference sliding window comprising a fixed size of the last 1000 examples.
• Average rule purity: Not only the overall coverage is of interest but also the
quality of each individual decision rule, which can be assessed via the rule purity,
providing an intuition of its consistency and confidence. The same concept as for
the previous measure is applied, i.e. a sliding window holding the latest 1000
arriving examples.
The results of applying these criteria to the rule learners G-eRules, VFDR and InDBR
are discussed in the remainder of this section. To this end, we used the nine benchmark
data sets from Sect. 6.2 without class noise. Results disclosed that the
any-time learner VFDR outperforms InDBR in terms of rule set size by a big margin:
on average, it carries 94.81 fewer rules than InDBR. Yet, this outcome is not surprising
given the high abstain rates uncovered earlier. Another finding is the weak result
of G-eRules on this measure, which becomes even more evident when reviewing
Fig. 3a, where its rule set size grows constantly over the course of the learning cycle
on the Electricity data set. This behavior is in contrast to VFDR and InDBR, whose
rule sets remain rather stable. The same characteristic could be observed on
three other benchmark data sets, such that G-eRules contains on average more than
979.36 rules more than InDBR. Considering the average rule length, G-eRules
performed much better, but still trailed VFDR by a mean difference of 3.06.
Hence, our method was outrun, which we relate to the different induction strategies
282 F. Beer and U. Bühler
Fig. 3 Discovery-oriented quality on selected data sets over the course of the incremental learning
process: a average rule set size on Electricity, b average rule length on Poker and c average rule
purity on RHP (short)
implying a disadvantage for InDBR due to its bottom-up approach. In particular, the
effect of the different concepts can be examined in Fig. 3b. Moreover, the experiments
showed that rules created by InDBR exhibit a higher purity, well ahead of the
competitors in eight out of nine benchmark tests, such that the gap between VFDR and
G-eRules compared to InDBR amounts to more than 25%. An example is depicted
in Fig. 3c, where InDBR's purity is rather constant in comparison to the oscillating
numbers produced by VFDR and G-eRules. Regarding the coverage, InDBR provides
a solid outcome on five out of nine data sets, resulting in a mean coverage of
92.88%, while being outperformed on the remaining benchmark tests. Combined, however,
it covers 43.82% more incoming examples than VFDR and 2.20% fewer examples
than G-eRules. Examining the win-loss analysis, InDBR won once and placed second
eight times w.r.t. the average rule set size. On the rule length, our method
reached the first position once, was second three times and last five times, constituting
its weakest result on the descriptive measures. In terms of average rule purity,
InDBR achieved eight wins and one second place, while it won five times, placed
second twice and third twice considering the average coverage (Table 4).
7 Closing Remarks
Conventional mining attempts often face performance problems due to long-lasting
data loads and insufficient support for parallel algorithms. One promising remedy is
in-database processing, an emerging paradigm in data science. By fusing
mining components and the data repository, it essentially brings predictive analytics to
the domain of relational databases, carrying several benefits including the reduction
Table 4 Discovery-oriented aspects of rule learners w.r.t. average rule set size (size), average rule length (len), average rule purity (pur) and average coverage
(cov): bold numbers indicate overall winner per row and performance measure; pur and cov are expressed in percent

Data set      | VFDR: size, len, pur, cov  | G-eRules: size, len, pur, cov | InDBR: size, len, pur, cov
Airline       | 92.67, 1.01, 19.64, 5.37   | 3100.85, 2.03, 17.13, 83.39   | 83.84, 2.64, 74.11, 96.37
Electricity   | 8.92, 1.06, 76.17, 36.89   | 904.05, 2.81, 28.66, 82.08    | 114.97, 5.72, 93.23, 83.00
OutdoorStream | 0.00, 0.00, 0.00, 0.00     | 95.75, 4.26, 48.01, 74.10     | 24.82, 18.52, 80.00, 71.00
Poker-Hand    | 82.06, 2.12, 22.93, 67.21  | 2052.18, 8.84, 4.34, 78.00    | 194.75, 7.20, 96.33, 89.95
RBF           | 13.62, 1.45, 62.54, 22.47  | 451.70, 1.88, 39.37, 92.61    | 121.75, 6.26, 73.36, 97.25
RHP (long)    | 16.77, 3.64, 87.69, 73.31  | 242.74, 8.03, 62.40, 93.06    | 52.74, 5.15, 88.10, 66.36
RHP (short)   | 15.73, 3.33, 80.82, 77.21  | 133.97, 8.15, 59.53, 95.52    | 52.14, 3.19, 82.01, 68.79
SEA-Concepts  | 15.83, 1.01, 96.55, 12.63  | 2571.57, 1.90, 11.90, 85.99   | 264.61, 1.40, 84.63, 97.79
Weather       | 8.34, 1.29, 75.22, 58.06   | 368.70, 4.47, 48.79, 82.58    | 197.63, 5.54, 83.28, 77.06
Acknowledgements The authors would like to thank the German Federal Ministry of Education
and Research (BMBF) for support within the project IntErA under grant number 03FH023PX3.
References
1. Tileston, T.: Have your cake & eat it too! Accelerate data mining combining SAS & Teradata.
In: Teradata Partners 2005 Experience the Possibilities (2005)
2. Cohen, J., Dolan, B., Dunlap, M., Hellerstein, J.M., Welton, C.: MAD skills: new analysis
practices for big data. In: Proceedings of the VLDB Endowment, vol. 2, no. 2, pp. 1481–1492.
VLDB Endowment (2009)
3. Prasad, S., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., Roy, I.: Large-scale
predictive analytics in Vertica: fast data transfer, distributed model creation, and in-database
prediction. In: Proceedings of the 2015 ACM SIGMOD International Conference on
Management of Data, pp. 1657–1668. ACM (2015)
4. Luo, S., Gao, Z.J., Gubanov, M., Perez, L.L., Jermaine, C.: Scalable linear algebra on a rela-
tional database system. In: Proceedings of the IEEE 33rd International Conference on Data
Engineering (ICDE 2017), pp. 523–534. IEEE (2017)
5. Fernandez-Baizán, M.C., Menasalvas Ruiz, E., Peña Sánchez, J.M.: Integrating RDBMS and
data mining capabilities using rough sets. In: Proceedings of the 6th International Conference
on Information Processing and Management of Uncertainty (IPMU'96), pp. 1439–1445 (1996)
6. Kumar, A.: New techniques for data reduction in a database system for knowledge discovery
applications. J. Intell. Inf. Syst. 10(1), 31–48 (1998)
7. Hu, X., Lin, T.Y., Han, J.: A new rough set model based on database systems (RSFDGrC 2003).
In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, vol. 2639, pp. 114–121.
Springer, LNCS (2003)
8. Vaithyanathan, K., Lin, T.Y.: High frequency rough set model based on database systems. In:
Proceedings of the 2008 Annual Meeting of the North American Fuzzy Information Processing
Society (NAFIPS 2008), pp. 1–6. IEEE (2008)
9. Žliobaitė, I.: Learning under concept drift: an overview. Technical report, Faculty of Mathematics
and Informatics, Vilnius University (2010)
10. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift
adaptation. In: ACM Computing Surveys, vol. 46, no. 4, pp. 1–37. ACM (2014)
11. Rozsypal, A., Kubat, M.: Association mining in time-varying domains. Intell. Data Anal. 9(3),
273–288 (2005)
12. Kukar, M.: Drifting concepts as hidden factors in clinical studies. In: Artificial Intelligence in
Medicine, vol. 2780, pp. 355–364. Springer, LNCS (2003)
13. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. In: ACM Computing
Surveys, vol. 41, no. 3, 15, pp. 1–58. ACM (2009)
14. Beer, F., Bühler, U.: Learning adaptive decision rules inside relational database systems. In:
Proceedings of the 2nd International Symposium of Fuzzy and Rough Sets (ISFUROS), pp.
1–12 (2017)
15. Beer, F., Bühler, U.: In-database feature selection using rough set theory. In: Information
Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2016), CCIS,
vol. 611, pp. 393–407. Springer (2016)
16. Pawlak, Z.: Rough sets. In: International Journal of Computer and Information Science, vol.
11, no. 5, pp. 341–356. Kluwer (1982)
17. Pawlak, Z.: Rough Sets - Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht
(1991)
18. Ziarko, W.: Variable precision rough set model. In: Journal of Computer and System Sciences,
vol. 46, no. 1, pp. 39–59. Elsevier (1993)
19. Nguyen, H.S.: Approximate Boolean reasoning: foundations and applications in data mining.
In: Transactions on Rough Sets V, vol. 4100, pp. 334–506. Springer, LNCS (2006)
20. Machuca, F., Millán, M.: Enhancing query processing in extended relational database systems
via rough set theory to exploit data mining potentials. In: Knowledge Management in Fuzzy
Databases. Studies in Fuzziness and Soft Computing, vol. 39, pp. 349–370. Physica (2000)
21. Han, J., Hu, X., Lin, T.Y.: A new computation model for rough set theory based on database
systems. In: Data Warehousing and Knowledge Discovery (DaWaK 2003), vol. 2737, pp. 381–
390. Springer, LNCS (2003)
22. Beer, F., Bühler, U.: An In-database rough set Toolkit. In: Proceedings of the LWA 2015
Workshops: KDML, FGWM, IR and FGDB (LWA’15), pp. 146–157. CEUR-WS (2015)
23. Widmer, G., Kubat, M.: Learning in the presence of concept drift and hidden contexts. Mach.
Learn. 23(1), 69–101 (1996)
24. Michalski, R.S.: On the Quasi-minimal solution of the general covering problem. In: Proceed-
ings of the 5th International Symposium on Information Processing, pp. 125–128 (1969)
25. Maloof, M.A.: Incremental rule learning with partial instance memory for changing concepts.
In: Proceedings of the International Joint Conference on Neural Networks (IJCNN’03), pp.
2764–2769 (2003)
26. Ferrer-Troyano, F.J., Aguilar-Ruiz, J.S., Riquelme, J.C.: Incremental rule learning and border
examples selection from numerical data streams. J. Univers. Comput. Sci. 11(8), 1426–1439
(2005)
27. Gama, J., Kosina, P.: Learning decision rules from data streams. In: Proceedings of the 22nd
International Joint Conference on Artificial Intelligence (IJCAI’11), pp. 1255–1260. AAAI
Press (2011)
28. Hoeffding, W.: Probability inequalities for sums of bounded random variables. In: Journal of
the American Statistical Association, vol. 58, no. 301, pp. 13–30. Taylor & Francis (1963)
29. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'00), pp.
71–80. ACM (2000)
30. Kosina, P., Gama, J.: Handling time changing data with adaptive very fast decision rules. In:
Machine Learning and Knowledge Discovery in Databases, vol. 7523, pp. 827–842. Springer,
LNCS (2012)
31. Gama, J., Rocha, R., Medas, P.: Accurate decision trees for mining high-speed data streams.
In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining (KDD’03), pp. 523–528. ACM (2003)
32. Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: Massive Online Analysis. J. Mach.
Learn. Res. 11, 1601–1604 (2010)
33. Stahl, F., Gaber, M.M., Salvador, M.M.: eRules: a modular adaptive classification rule learning
algorithm for data streams. In: Research and Development in Intelligent Systems XXIX (SGAI
2012), pp. 65–78. Springer (2012)
34. Cendrowska, J.: PRISM: an algorithm for inducing modular rules. In: International Journal of
Man-Machine Studies, vol. 27, no. 4, pp. 349–370. Academic Press (1987)
35. Le, T., Stahl, F., Gomes, J.B., Gaber, M.M., Di Fatta, G.: Computationally efficient rule-
based classification for continuous streaming data. In: Research and Development in Intelligent
Systems XXXI (SGAI 2014), pp. 21–34. Springer (2014)
36. Le, T., Stahl, F., Gaber, M.M., Gomes, J.B., Di Fatta, G.: On expressiveness and uncertainty
awareness in rule-based classification for data streams. In: Neurocomputing, vol. 265(C), pp.
127–141. Elsevier (2017)
37. Deckert, M., Stefanowski, J.: RILL: algorithm for learning rules from streaming data with
concept drift. In: Foundations of Intelligent Systems, vol. 8502, pp. 20–29. Springer, LNCS
(2014)
38. Pawlak, Z.: Information systems - theoretical foundations. In: Information Systems, vol. 6, no.
3, pp. 205–218. Elsevier (1981)
39. Lin, T.Y.: An overview of rough set theory from the point of view of relational databases. In:
Bulletin of International Rough Set Society, vol. 1, no. 1, pp. 30–34. IRSS (1997)
40. Michalski, R.S.: A theory and methodology of inductive learning. In: Artificial Intelligence,
vol. 20, no. 2, pp. 111–161. Elsevier (1983)
41. Ikonomovska, E., Gama, J., Džeroski, S.: Learning model trees from evolving data streams.
In: Data Mining and Knowledge Discovery, vol. 23, no. 1, pp. 128–168. Springer (2011)
42. Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R.: New ensemble methods for
evolving data streams. In: Proceedings of the 15th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD’09), pp. 139–148. ACM (2009)
43. Hulten, G.S., Domingos, P.: Mining time-changing data streams. In: Proceedings of the
7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD’01), pp. 97–106. ACM (2001)
44. Losing, V., Hammer, B., Wersing, H.: Interactive online learning for obstacle classification on
a mobile robot. In: Proceedings of the 2015 International Joint Conference on Neural Networks
(IJCNN), pp. 1–8 (2015)
45. Street, W., Kim, Y.: A streaming ensemble algorithm SEA for large-scale classification. In:
Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD’01), pp. 377–382. ACM (2001)
46. Elwell, R., Polikar, R.: Incremental learning of concept drift in nonstationary environments.
In: IEEE Transactions on Neural Networks, vol. 22, no. 10, pp. 1517–1531 (2011)
47. van Rijsbergen, C.J.: Foundations of evaluation. J. Doc. 30(4), 365–373 (1974)
48. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge
University Press, Cambridge (2008)
49. Dawid, A.P.: Present position and potential developments: some personal views: statistical
theory: the prequential approach. In: Journal of the Royal Statistical Society, vol. 147, no. 2,
pp. 278–292. Wiley (1984)
50. Gama, J., Sebastião, R., Rodrigues, P.P.: Issues in evaluation of stream learning algorithms. In:
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining (KDD’09), pp. 329–338. ACM (2009)
51. Friedman, M.: A comparison of alternative tests of significance for the problem of m rankings.
Ann. Math. Stat. 11(1), 86–92 (1940)
52. Wilcoxon, F.: Individual comparisons by ranking methods. In: Biometrics Bulletin, vol. 1, no.
6, pp. 80–83. Wiley (1945)
53. Stefanowski, J., Vanderpooten, D.: Induction of decision rules in classification and discovery-
oriented perspectives. In: International Journal of Intelligent Systems, vol. 16, no. 1, pp. 13–27.
Wiley (2001)
54. McGarry, K.: A survey of interestingness measures for knowledge discovery. In: The Knowl-
edge Engineering Review, vol. 20, no. 1, pp. 39–61. Cambridge University Press (2005)
55. Geng, L., Hamilton, H.J.: Interestingness measures for data mining: a survey. In: ACM Com-
puting Surveys, vol. 38, no. 3, p. 9. ACM (2006)
Facial Similarity Analysis: A Three-Way
Decision Perspective
1 Introduction
In a nutshell, the basic idea of three-way decisions is thinking and problem-solving
in threes [11], a common and widely used human practice in
perceiving and dealing with complex worlds. In other words, we typically divide
a whole or a complex problem into three relatively independent parts and design
strategies to process each of the three parts. Yao [11] proposed a trisecting-and-
acting (T&A) model of three-way decisions. By following this basic idea of three-
way decisions, we put forward here a model of three-way analysis of facial photos.
To grasp the idea of thinking in threes, let us consider three examples. Marr [6]
suggested that any information-processing system can be understood in depth at three
levels: the computational theory level, the representation and algorithm
level, and the hardware implementation level. Each level focuses on a different aspect
of information processing. Kelly [5] presented a three-era framework for studying
the past, present, and future of computing: the tabulating era (1900s–1950s), the
programming era (1950s–present), and the cognitive era (2011–). This framework
helps us to identify the main challenges and objectives of today’s computing. Many
taxation systems categorically classify citizens as low, middle, or high income, with
different taxation methods applied to each.
An evaluation-based trisecting-and-acting model of three-way decisions consists
of two components [10, 11]. According to the values of an evaluation function, we
first trisect a universal set into three pair-wise disjoint regions. The result is a weak
tri-partition or a trisection of the set. With respect to a trisection, we design strategies
to process the three regions individually or jointly. The two components of trisecting
and acting are both relatively independent and mutually supportive. Effectiveness
of the strategies of action depends on the appropriateness of the trisection; a good
trisecting method relies on knowledge about how the resulting trisection is to be
used. It is important to search for the right combination of trisection and strategies
of action.
Let OB denote a finite set of objects. Suppose v : OB −→ ℝ is an evaluation
function over OB, where ℝ is the set of real numbers. For an object x ∈ OB, v(x)
is called the evaluation status value (ESV) of x. Intuitively, the ESV of an object
quantifies the object with respect to some criteria or objectives. To obtain a trisection
of OB, we require a pair of thresholds (α, β), α, β ∈ ℝ, with β < α. Formally, three
regions of OB are defined by:

Rl(v) = {x ∈ OB | v(x) ≤ β},
Rm(v) = {x ∈ OB | β < v(x) < α},     (1)
Rh(v) = {x ∈ OB | v(x) ≥ α}.
292 D. H. Hepting et al.
They correspond to subsets of objects with low, middle, and high ESVs, respectively.
The three regions satisfy the following properties: they are pair-wise disjoint, i.e.,
Rl(v) ∩ Rm(v) = ∅, Rm(v) ∩ Rh(v) = ∅ and Rl(v) ∩ Rh(v) = ∅, and they jointly
cover the set, i.e., Rl(v) ∪ Rm(v) ∪ Rh(v) = OB.
It should be noted that one or two of the three regions may in fact be the empty set.
Thus, the family of three regions is not necessarily a partition of OB. We call the
triplet (Rl(v), Rm(v), Rh(v)) a weak tri-partition or a trisection of OB.
A trisection is determined by a pair of thresholds (α, β). Based on the physical
meaning of the evaluation function v, we can formulate the problem of finding a pair
of thresholds as an optimization problem [10]. In other words, we search for a pair
of thresholds that produces an optimal trisection according to an objective function.
Once we obtain a trisection, we can devise strategies to act on the three regions.
We can study properties of objects in the same region. We can compare objects
in different regions. We can form strategies to facilitate the movement of objects
between regions [1]. There are many opportunities when working with a trisection
of a universal set of objects.
Let us use the example of a taxation system again to illustrate the main ideas of
the trisecting-and-acting model of three-way decisions. In this case, OB is the set of
all citizens who pay tax. The evaluation function is the income of a citizen in dollars.
Suppose that the pair of thresholds is given in dollars as β = $35k and α = $120k.
The three regions Rl(v), Rm(v), and Rh(v) represent the
low income (i.e., income ≤ $35k), middle income (i.e., $35k < income < $120k),
and high income (i.e., income ≥ $120k) citizens, respectively. For the three levels of income,
one typically devises different formulas or rates to compute tax.
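The trisection by (α, β) is easy to state in code. The following is a minimal sketch of the taxation example; the function name and the sample citizens with their incomes are our own illustration:

```python
def trisect(objects, v, alpha, beta):
    """Trisect objects by evaluation function v with thresholds beta < alpha:
    low region v(x) <= beta, middle beta < v(x) < alpha, high v(x) >= alpha."""
    low = [x for x in objects if v(x) <= beta]
    middle = [x for x in objects if beta < v(x) < alpha]
    high = [x for x in objects if v(x) >= alpha]
    return low, middle, high

# illustrative incomes for three citizens
income = {"ann": 28_000, "bob": 64_000, "eve": 150_000}
low, middle, high = trisect(income, income.get, alpha=120_000, beta=35_000)
```

With the thresholds of the taxation example, "ann" lands in the low-income region, "bob" in the middle, and "eve" in the high-income region.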
Three-way decisions provide a general framework to unify ideas from several theories
for modelling uncertainty. Although the introduction of the concept of three-way
decisions was motivated by the notion of the three regions of rough sets, its recent
developments are far beyond rough sets. To gain insights into three-way analysis of
facial similarity, we interpret probabilistic rough sets and shadowed sets in terms of
three-way decisions.
Facial Similarity Analysis: A Three-Way Decision Perspective 293
In formulating probabilistic rough sets [13, 14], we start with an equivalence relation
E on the set of objects OB; namely, E is reflexive (i.e., ∀x ∈ OB, x E x), symmetric
(i.e., ∀x, y ∈ OB, x E y =⇒ y E x), and transitive (i.e., ∀x, y, z ∈ OB, x E y ∧
y E z =⇒ x E z). The equivalence relation divides the set of objects into a family of
pair-wise disjoint equivalence classes. Let [x]E , or simply [x] when E is understood,
denote the equivalence class containing x:

[x] = {y ∈ OB | x E y}.
It can be seen that our card sorting problem can, in fact, be modelled by an equivalence
relation for each individual participant. That is, piles made by a participant can be
viewed as a family of pair-wise disjoint equivalence classes. Given a subset of objects
X ⊆ OB, we define a conditional probability by [7],

Pr(X | [x]) = |X ∩ [x]| / |[x]|.     (3)
That is, the three regions of rough sets are interpreted within the framework of three-
way decisions.
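Equation (3) can be illustrated with equivalence classes induced by an attribute; the grouping key and the toy data below are our assumptions:

```python
from collections import defaultdict
from fractions import Fraction

def equivalence_classes(objects, key):
    """Partition objects into equivalence classes [x] induced by a key
    function: x E y iff key(x) == key(y)."""
    classes = defaultdict(set)
    for x in objects:
        classes[key(x)].add(x)
    return classes

def cond_prob(X, block):
    """Pr(X | [x]) = |X ∩ [x]| / |[x]|, as in Eq. (3)."""
    return Fraction(len(X & block), len(block))

# toy universe 0..9 grouped by remainder modulo 3
classes = equivalence_classes(range(10), key=lambda x: x % 3)
p = cond_prob({0, 3, 4, 6}, classes[0])   # [0] = {0, 3, 6, 9}
```

Three of the four elements of the class {0, 3, 6, 9} belong to X, so the conditional probability is 3/4.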
The three regions of rough sets deal with the classification of objects based
on information about the equivalence of objects. Equivalence is a special type of
similarity. Our study of facial similarity uses a notion that may be considered a
generalization of equivalence relations. More specifically, we consider three levels:
similar, undecidable, and dissimilar. It will be interesting to study approximations
under three levels of similarity, rather than a two-level equivalence.
294 D. H. Hepting et al.
A fuzzy set, proposed by Zadeh [15], models a concept with an unsharp boundary.
A fuzzy set is defined by a membership function μA : OB −→ [0, 1]. The value
μA (x) is called the membership grade of x. Pedrycz [8, 9] argues that humans are
typically insensitive to detailed membership grades. While we can easily comprehend
membership grades close to the two end points of 0 and 1, we cannot make fine
distinctions among membership grades in the middle. For this reason, he introduces
the notion of shadowed sets as three-way approximations of fuzzy sets.
In the framework of three-way decisions, the membership function μA is an
evaluation function. Given a pair of thresholds (α, β) with 0 ≤ β < α ≤ 1, according
to Eq. (1), we have a three-way approximation as follows:

Rl(μA) = {x ∈ OB | μA(x) ≤ β},
Rm(μA) = {x ∈ OB | β < μA(x) < α},
Rh(μA) = {x ∈ OB | μA(x) ≥ α}.
It should be pointed out that this formulation is slightly different from Pedrycz's:
he uses the three values 0, [0, 1], and 1 as membership grades for the three
regions.
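Pedrycz's mapping of membership grades to 0, the unit interval, and 1 can be sketched as follows; the threshold values and the tuple used to stand for [0, 1] are illustrative choices:

```python
def shadowed(mu, alpha, beta):
    """Three-way approximation of a membership grade in Pedrycz's style:
    elevate grades >= alpha to 1, reduce grades <= beta to 0, and mark
    the remaining 'shadow' with the whole unit interval."""
    if mu >= alpha:
        return 1
    if mu <= beta:
        return 0
    return (0, 1)   # stands for the unit interval [0, 1]

approx = [shadowed(m, alpha=0.8, beta=0.2) for m in (0.05, 0.4, 0.95)]
```

The low grade is reduced to 0, the high grade elevated to 1, and the middle grade left unresolved in the shadow.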
For modelling similarity, we need to consider a fuzzy relation μR : OB ×
OB −→ [0, 1]. A fuzzy similarity relation [16], as a generalization of an equivalence
relation, is a fuzzy relation that is reflexive (i.e., ∀x ∈ OB, μR(x, x) = 1), symmetric
(i.e., ∀x, y ∈ OB, μR(x, y) = μR(y, x)), and max-min transitive (i.e., ∀x, z ∈
OB, μR(x, z) ≥ max y∈OB min(μR(x, y), μR(y, z))). The membership grade
μR(x, y) may be interpreted as the degree to which x is similar to y.
In the framework of three-way decisions, the fuzzy similarity relation μR is an
evaluation function on OB × OB. Given a pair of thresholds (α, β) with 0 ≤ β <
α ≤ 1, according to Eq. (1), we have a three-way approximation of the similarity
relation:

SIM = {(x, y) ∈ OB × OB | μR(x, y) ≥ α},
UND = {(x, y) ∈ OB × OB | β < μR(x, y) < α},
DIS = {(x, y) ∈ OB × OB | μR(x, y) ≤ β},
where SIM, UND, and DIS denote, respectively, the sets of similar, undecidable, and
dissimilar pairs of objects. To be consistent with later discussions, we rename the
three regions to better reflect their semantics.
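A minimal sketch of this trisection of object pairs, with the similarity grades given as a plain dictionary (the function name and toy data are ours):

```python
from itertools import combinations

def trisect_similarity(objects, mu, alpha, beta):
    """Split unordered object pairs into SIM/UND/DIS using the fuzzy
    similarity grade mu(x, y) and thresholds 0 <= beta < alpha <= 1."""
    sim, und, dis = set(), set(), set()
    for x, y in combinations(objects, 2):
        grade = mu(x, y)
        if grade >= alpha:
            sim.add((x, y))
        elif grade <= beta:
            dis.add((x, y))
        else:
            und.add((x, y))
    return sim, und, dis

grades = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.5}
sim, und, dis = trisect_similarity("abc", lambda x, y: grades[(x, y)],
                                   alpha=0.8, beta=0.2)
```

The pair with grade 0.9 is deemed similar, the pair with grade 0.1 dissimilar, and the middling pair undecidable.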
When applying three-way decisions, it is necessary to use semantically sound and
meaningful notions and terms for naming and interpreting an evaluation function
and the resulting three regions. In the rest of this paper, we give details about how
Let P denote the set of unordered pairs of photos from a set of photos. Let N denote
the number of participants. Based on the results of sorting, we can easily establish
an evaluation function v : P −→ {0, 1, . . . , N } regarding the similarity of a pair of
photographs, that is, for p ∈ P,

v(p) = the number of participants who put the pair in the same pile.     (7)
Given a pair of thresholds (l, u) with 1 ≤ l < u ≤ N , according to Eq. (1), we can
divide the set of pairs P into three pair-wise disjoint regions:

SIM = {p ∈ P | v(p) > u},
UND = {p ∈ P | l ≤ v(p) ≤ u},     (8)
DIS = {p ∈ P | v(p) < l}.

They are called, respectively, the sets of similar, undecidable, and dissimilar pairs.
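Computing v(p) of Eq. (7) and the trisection by (l, u) is mechanical once each participant's piles are available; the data format below (a list of piles per participant, each pile a set of photo ids) is an assumption for illustration:

```python
from collections import Counter
from itertools import combinations

def vote_counts(participants):
    """v(p) of Eq. (7): for each unordered photo pair p, count how many
    participants placed both photos in the same pile.
    participants -- list of sortings; each sorting is a list of piles (sets)."""
    votes = Counter()
    for piles in participants:
        for pile in piles:
            for pair in combinations(sorted(pile), 2):
                votes[pair] += 1
    return votes

def trisect_votes(photos, votes, l, u):
    """SIM: v(p) > u, DIS: v(p) < l, UND otherwise (our reading of Eq. (8))."""
    sim, und, dis = set(), set(), set()
    for pair in combinations(sorted(photos), 2):
        v = votes[pair]
        (sim if v > u else dis if v < l else und).add(pair)
    return sim, und, dis

# three toy participants sorting three photos
participants = [[{1, 2}, {3}], [{1, 2, 3}], [{1}, {2}, {3}]]
votes = vote_counts(participants)
sim, und, dis = trisect_votes({1, 2, 3}, votes, l=1, u=1)
```

Here photos 1 and 2 were co-piled by two of the three participants and end up in the similar region.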
Alternatively, we can consider a normalized evaluation function vn(p) = v(p)/N ,
which gives the percentage of participants who consider the pair p to be similar. This
provides a probabilistic interpretation of the normalized evaluation function. With
such a transformation, we can apply a probabilistic approach, suggested by Yao and
Gao [12], to determine the pair of thresholds (α, β) with 0 < β < α ≤ 1.
A further consideration is the quality of the judgments by different participants. Intuitively speaking, both the
number of piles and the sizes of individual piles provide hints on the quality and
confidence of a participant. If a participant used more piles and, in turn, smaller sizes
of individual piles, we consider the judgments to be more meaningful. Consequently,
we may assign a higher weight to the participant.
Consider a pile of n photos. According to the assumption that a pair of photos
in the same pile is similar, it produces n(n − 1)/2 pairs of similar photos. Suppose
a participant provided M piles with sizes n1 , . . . , nM , respectively. The total
number of similar pairs of photos is given by:

NS = Σ_{i=1}^{M} ni(ni − 1)/2.     (9)

PD = 1 − PS .     (11)
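Equation (9) follows directly from the pile sizes. Since Eq. (10) did not survive in the text above, we assume here that PS normalizes NS by the total number of photo pairs a participant rated; the sketch uses participant 21's pile sizes from Table 3:

```python
from math import comb

def n_similar(pile_sizes):
    """N_S = sum over piles of n_i (n_i - 1) / 2, Eq. (9)."""
    return sum(n * (n - 1) // 2 for n in pile_sizes)

def p_dissimilar(pile_sizes):
    """P_D = 1 - P_S, Eq. (11); we assume Eq. (10) normalizes N_S by the
    total number of photo pairs the participant rated."""
    p_s = n_similar(pile_sizes) / comb(sum(pile_sizes), 2)
    return 1 - p_s

sizes = [2, 19, 36, 36, 56, 86, 120]   # participant 21's piles (Table 3)
ns = n_similar(sizes)                  # 13767 similar pairs
```

Out of the 62,835 pairs formed by this participant's 355 photos, 13,767 are judged similar, so PD is roughly 0.78 under our assumed normalization.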
Based on the proposed model, we report results from analyzing a dataset obtained
from card sorting.
For the dataset used in this work, we have N = 25. We set l = 10 and u = 15. Specifically,
we consider a pair of photographs to be similar if more than 15 participants out
of 25 put them in the same pile, or equivalently, more than 15/25 = 60% of participants
put them in the same pile. We consider a pair of photographs to be dissimilar if less
Fig. 1 A summary of pile sizes by participant: a real data from card sorting study and b randomly-
simulated data
Table 1 Code, written in the python language, to generate piles of photos to simulate participants
behaving randomly
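The listing in Table 1 did not survive the reproduction here; a minimal sketch of such a random simulation (the pile-count range and the uniform assignment scheme are our assumptions, and the photo count of 356 is inferred from the 63,190 pair scores reported below) could read:

```python
import random

def random_piles(photos, max_piles, rng=None):
    """Simulate one randomly behaving participant: draw a pile count k,
    then assign every photo to one of the k piles uniformly at random."""
    rng = rng or random.Random()
    k = rng.randint(2, max_piles)
    piles = [[] for _ in range(k)]
    for photo in photos:
        piles[rng.randrange(k)].append(photo)
    return [pile for pile in piles if pile]   # drop empty piles

# 25 simulated participants sorting 356 photos
simulated = [random_piles(range(356), max_piles=25) for _ in range(25)]
```

Each simulated participant thus produces a complete sorting of all photos, comparable in format to the real card-sorting data.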
Fig. 2 A summary of ratings by participant from real data from card sorting study and randomly-
simulated data
than 10 participants out of 25 put them in the same pile, or equivalently, less than
10/25 = 40% of participants put them in the same pile. Otherwise, we consider the
judgments of the 25 participants inconclusive for declaring similarity or dissimilarity
of the pair of photos.
Figure 2 shows the effects of these thresholds on the real and random data. Based
on the pair of thresholds l = 10 and u = 15, we obtain similar pairs, undecidable
pairs, and dissimilar pairs. Table 2 summarizes the numbers of pairs in each region
for the observed and the randomly-simulated data.
Figure 3 shows two samples of Similar pairs (S1 and S2 refer to the left and right
pairs, respectively). For both S1 and S2, 19 participants put the pair into the same
pile. Figure 4 shows two samples of Undecidable pairs (U1 and U2 refer to the left
and right pairs, respectively). For both U1 and U2, 13 participants put the pair into
the same pile. Figure 5 shows two samples of Dissimilar pairs (D1 and D2 refer to
the left and right pairs, respectively). For D1, 4 participants put the pair into the same
pile and for D2, only 2 participants put the pair into the same pile.
Fig. 3 The 2 pairs of photos shown here (S1 left, S2 right) represent samples from the similar
(SIM) region
Fig. 4 The 2 pairs of photos shown here (U1 left, U2 right) represent samples from the undecidable
(UND) region. Pairs U1 and U2 were highlighted in the study by Hepting and Almestadi [3]
Fig. 5 The 2 pairs of photos shown here (D1 left, D2 right) represent samples from the dissimilar
(DIS) region
An inspection of the final three-way classification confirms that pairs in the similar
set are indeed similar, pairs in the dissimilar set are very different, and pairs in the
undecidable set share some common features while differing in some other aspects.
A more refined approach is possible by looking at the number of photos that are
considered along with the photos in any particular pair. If a participant made M
piles, the number of possible pile configurations for the participant is M + C(M, 2) =
M(M + 1)/2, since a pair either shares a single pile or spans two distinct piles. Figure 6
compares the variability in observed participant data (min = 6, max = 741) with
that of simulated participants (min = 105, max = 276). These plots summarize the
number of possible pile configurations that may contain a particular photo pair, by
participant. Higher numbers of possible configurations correspond to more piles of
smaller size.
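The configuration count M + C(M, 2) can be checked against participant 21, who made 7 piles and thus has 28 configurations:

```python
from math import comb

def n_configurations(m):
    """Number of pile configurations that may hold a photo pair: the pair
    either shares one of the m piles or spans two distinct piles."""
    return m + comb(m, 2)

configs = n_configurations(7)   # participant 21: 7 + 21 = 28
```
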
Figure 7 summarizes the number of photos in the piles that contain each of the
photo pairs in Figs. 3, 4 and 5. When the pair is judged to be dissimilar (N) by a
participant, the number of photos associated with the pair is the sum of the sizes of
the 2 different piles that each contain one of the photos in the pair. When the pair is
judged to be similar (Y) by a participant, the number of photos associated with the
pair is size of the single pile that contains both photos.
Figure 8 summarizes by relative rank the number of photos associated with each
of the photo pairs in Figs. 3, 4 and 5. Regardless of the number of possible pile con-
figurations that may contain the pair of interest, the smallest of these configurations
has a relative rank approaching 0 and the largest of these configurations has a relative
rank of 1. The relative rank can be transformed into a similarity score according to
Eq. 12.
Sr(A, B) = (2 − relative rank(PAB ))/2,   if A and B are in the same pile,
Sr(A, B) = relative rank(PA + PB )/2,   if A and B are in two different piles.     (12)
This score is computed for each rating of each pair of photos. From the card sorting
study, 63,190 scores can be computed for each of the 25 participants. As an example,
Participant 21 made 7 piles of photos with sizes: 2, 19, 36, 36, 56, 86, and 120 (355
photos rated). This leads to 28 configurations of piles, some with the same size.
Please see Table 3 for details of the calculations and Fig. 9 for a plot of the results. In
order to create a single similarity score for a pair of photos, we sum the score from
each rating and divide by the number of raters (N ), according to Eq. 13.
S(A, B) = (1/N) Σ_{r=1}^{N} Sr(A, B).     (13)
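Equations (12) and (13) can be sketched as follows; taking the relative rank as the rank divided by the number of configurations reproduces the values of Table 3 for participant 21 (the function names are ours):

```python
def rating_score(relative_rank, same_pile):
    """Eq. (12): scores in (0.5, 1) for pairs sorted together and
    (0, 0.5] for pairs sorted apart."""
    if same_pile:
        return (2 - relative_rank) / 2
    return relative_rank / 2

def similarity_score(rating_scores):
    """Eq. (13): average the per-rating scores over the N raters."""
    return sum(rating_scores) / len(rating_scores)

# Participant 21 has 28 configurations; the pile of size 2 has rank 1
r = 1 / 28                            # relative rank, 0.0357 in Table 3
s = rating_score(r, same_pile=True)   # 0.9821, matching Table 3
```

The configuration "2 + 19 = 21" of rank 3 likewise yields a different-pile score of (3/28)/2 ≈ 0.0536, as listed in Table 3.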
Figure 10 summarizes the similarity scores, sorted into increasing order, for each
rating of each sample pair. The scores are determined by the relative rank of the
configuration that contains the pair. Similarity scores for pairs rated as dissimilar
Fig. 6 Number of possible pile configurations that may contain a particular photo pair, by partici-
pant. Real participants on the left and simulated participants on the right
Fig. 7 Summary of pile configuration sizes for the sample pairs (see Figs. 3, 4 and 5). The bold
lines indicate the median sizes
[Fig. 8 panels: Summary of Relative Ranks for Pairs S1, S2, U1, U2, D1 and D2; y-axis: Relative Rank (0.0–1.0); x-axis: rating (N or Y)]
Fig. 8 Summary of relative ranks for the sample pairs (see Figs. 3, 4 and 5). The bold lines indicate
the median relative ranks
Table 3 Calculations from Eq. 12 carried out for participant 21, who made 7 piles of photos with
sizes: 2, 19, 36, 36, 56, 86, and 120
Size Rank Relative rank Similarity score
2 1 0.0357 0.9821
19 2 0.0714 0.9643
2 + 19 = 21 3 0.1071 0.0536
36 4 0.1429 0.9286
2 + 36 = 38 6 0.2143 0.1071
19 + 36 = 55 8 0.2857 0.1429
56 10 0.3571 0.8214
2 + 56 = 58 11 0.3929 0.1964
36 + 36 = 72 12 0.4286 0.2143
19 + 56 = 75 13 0.4643 0.2321
86 14 0.5000 0.7500
2 + 86 = 88 15 0.5357 0.2679
36 + 56 = 92 16 0.5714 0.2857
19 + 86 = 105 18 0.6429 0.3214
120 19 0.6786 0.6607
2 + 120 = 122 20 0.7143 0.3571
36 + 86 = 122
19 + 120 = 139 23 0.8214 0.4107
56 + 86 = 142 24 0.8571 0.4286
36 + 120 = 156 25 0.8929 0.4464
56 + 120 = 176 27 0.9643 0.4821
86 + 120 = 206 28 1.0000 0.5000
Similar ratings are indicated by bold type
(not placed in the same pile) will be in the range (0, 0.5] and scores for pairs rated
as similar (placed in the same pile) will be in the range (0.5, 1.0). A score near 0
occurs when the photo pair is rated as dissimilar, but the combined size of the piles
containing the photos is very small. A score near 1 occurs when the photo pair is
rated as similar and the size of that pile is very small. The similarity scores of the
sample pairs are, for S1: 0.7377; for S2: 0.7230; for U1: 0.5607; for U2: 0.5742; for
D1: 0.4015; and for D2: 0.4421. In Sect. 5.2, we began with α0 = 0.6 and β0 = 0.4.
We notice that S1, S2, U1, and U2 remain in their original regions. However D1
and D2 are now both in region UND. Let us examine the selection of α and β more
closely.
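The three-way assignment induced by a threshold pair (α, β) can be sketched as follows (our code; how scores exactly equal to α or β are handled is our assumption):

```python
def region(score, alpha, beta):
    """Assign a pair to SIM, DIS or UND from its similarity score:
    accept as similar at alpha, reject as dissimilar at beta (alpha > beta)."""
    if score >= alpha:
        return "SIM"
    if score <= beta:
        return "DIS"
    return "UND"

# Sample-pair scores from the text, with (alpha0, beta0) = (0.6, 0.4):
for name, s in [("S1", 0.7377), ("U1", 0.5607), ("D1", 0.4015)]:
    print(name, region(s, 0.6, 0.4))   # S1 SIM, U1 UND, D1 UND
```

With these thresholds, D1 (score 0.4015) falls just above β0 and therefore lands in UND, as noted above.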
Fig. 9 Plot of similarity scores from rank of pile configurations for participant 21. See Table 3 for the calculations
Figure 11 considers all similarity scores from all ratings of photo pairs. The boxplot
summarizes 1,267,785 dissimilar (N) ratings and 304,186 similar (Y) ratings. From
this analysis, we chose two sets of thresholds.
• α1 = 0.7000 (median score for pairs in same pile), β1 = 0.4367 (median score
for pairs in different piles). The application of this threshold set is illustrated in
Fig. 13.
• α2 = 0.6389 (25th percentile of scores for pairs in same pile), β2 = 0.4824 (75th
percentile of scores for pairs in different piles). The application of this threshold
set is illustrated in Fig. 14.
In Fig. 12, the trilinear plot summarizes an exploration of values for α and β. Each
plotted point represents the fraction of pairs in the DIS, UND, and SIM regions for a
different choice of α and β. Points at a vertex indicate that 100% of the pairs are assigned
to the region indicated by the vertex label. In this figure, each point represents the assignment of all 63,190 pairs to the three regions. It is also possible to consider the assignment
of a pair’s individual ratings to those regions and obtain more finely-grained infor-
mation about the pair’s similarity. Figures 13 and 14 illustrate the assignment of
individual ratings amongst the DIS, UND, and SIM regions.
Fig. 10 Summary of similarity scores, sorted into ascending order, for each rating of the sample
pairs (see Figs. 3, 4 and 5)
Fig. 11 Summary of similarity scores for dissimilar (N) and similar (Y) ratings for all 63,190 pairs,
computed according to Eq. 13. Two pairs of thresholds, (α1 , β1 ) and (α2 , β2 ), are also indicated
Fig. 12 This trilinear plot summarizes an exploration for values of α and β taken from [0,1] at
increments of 0.01 such that α > β. Each point plotted in grey represents a choice of α and β.
Plotted in black are the points corresponding to Table 4
Fig. 13 Classification as one of dissimilar, undecidable, or similar. These decisions are based on
thresholds α1 = 0.7000 and β1 = 0.4367
Fig. 14 Classification as one of dissimilar, undecidable, or similar. These decisions are based on
thresholds α2 = 0.6389 and β2 = 0.4824
Table 4 Number of pairs classified for different threshold pairs. The first line of data is repeated
from Table 2
Thresholds                          Dissimilar (DIS)   Undecidable (UND)   Similar (SIM)
(u = 15, l = 10)                    56,649             6416                125
(α0 = 0.6000, β0 = 0.4000)          2782               60,018              390
(α1 = 0.7000, β1 = 0.4367)          16,472             46,714              4
(α2 = 0.6389, β2 = 0.4824)          43,469             19,649              72
Acknowledgements The authors thank the editors, Rafael Bello, Rafael Falcon, and José Luis
Verdegay, for their encouragement and the anonymous reviewers for their constructive comments.
This work has been supported, in part, by two NSERC Discovery Grants.
Part III
Hybrid Approaches
Fuzzy Activation of Rough Cognitive
Ensembles Using OWA Operators
Marilyn Bello, Gonzalo Nápoles, Ivett Fuentes, Isel Grau, Rafael Falcon,
Rafael Bello and Koen Vanhoof
1 Introduction
The advent of Big Data [8] has underscored the need to shift how automated sys-
tems ingest, represent and process real-world or simulated data. Given the volume,
velocity, veracity and variability challenges posed by the Big Data phenomenon, it
is no longer realistic to expect that traditional pattern classification algorithms [10]
could sift through these sizable datasets and yield actionable insights in a reason-
able amount of time. The focus has then moved to the development of algorithms
that perceive and treat data at a higher, more symbolic level instead of dealing with
the underlying, often numerical representation. Granular Computing (GrC) [4] has
proved an excellent paradigm for this kind of processing that suits our data-prolific
world quite well.
One of the manifestations of applying GrC to automated systems is that of granular
classifiers [3]. In particular, Fuzzy Cognitive Maps (FCMs) [17] have been hybridized
with information granules stemming from fuzzy sets [25] or rough sets [22, 23].
Rough cognitive networks (RCNs) [23] are a type of granular classifier in which
a sigmoid FCM’s topology (i.e., the set of concepts and weights) is automatically
learned from data. An RCN node denotes either a decision class or one of the three
approximation regions (positive, negative or boundary) originated from a granulation
of the input space according to Rough Set Theory (RST) principles.
While the RCNs' classification performance was deemed competitive with respect
to state-of-the-art classifiers [23], they were still sensitive to an input parameter
denoting the similarity threshold upon which the rough information granules are
built. To overcome that limitation, Rough Cognitive Ensembles (RCEs) were recently
put forth by Nápoles et al. [22]. An RCE is an ensemble method with a collection
of RCNs as base classifiers, each operating at a different granularity level. After
comparing RCEs to 15 state-of-the-art classifiers, it was concluded that the proposed
technique produced highly competitive prediction rates.
In this paper we bring forth a new activation mechanism for RCE that boosts its
performance in classification problems. This new mechanism essentially quantifies
the extent to which an object belongs to the intersection between its similarity class
and each granular region. This requires an information aggregation process. In this
research, we use an aggregation technique based on the ordered
weighted averaging operators (OWA) [34]. After comparing the improved ensemble
classifier to the original RCE model and 14 other state-of-the-art classifiers, the
experimental evidence suggests that our scheme yields very promising classification
rates.
The remainder of this paper is structured as follows. Section 2 elaborates on
the two building blocks behind rough cognitive mapping. Section 3 unveils the
fundamentals of the RCNs and RCEs. The new activation rule is described in Sect. 4
Fuzzy Activation of Rough Cognitive Ensembles Using OWA Operators 319
while the empirical analysis is found in Sect. 5. Conclusions and pointers to future
work are given in Sect. 6.
2 Theoretical Background
Rough Set Theory is a methodology proposed in the early 1980s for handling uncertainty arising in the form of inconsistency [24]. Let DS = (U, Ψ ∪ {d}) denote a decision system where U is a non-empty, finite set of objects called the universe of discourse, Ψ is a non-empty, finite set of attributes describing any object in U and d ∉ Ψ represents the decision attribute. Any subset X ⊆ U can be approximated by two crisp sets, referred to as its lower and upper approximations and denoted by Φ̲X = {x ∈ U | [x]Φ ⊆ X} and Φ̄X = {x ∈ U | [x]Φ ∩ X ≠ ∅}, respectively. In this classic formulation, the equivalence class [x]Φ comprises the set of objects in U that are deemed inseparable from x according to the information contained in the attribute subset Φ ⊆ Ψ.
The lower and upper approximations are the basis for computing the positive, negative and boundary regions of any set X. The positive region POS(X) = Φ̲X includes those objects that are certainly contained in X; the negative region NEG(X) = U − Φ̄X denotes those objects that are certainly not related to X, while the boundary region BND(X) = Φ̄X − Φ̲X captures the objects whose membership to the set X is uncertain, i.e., they might be members of X.
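The three regions can be sketched directly from these definitions (our illustrative code, assuming the equivalence classes are precomputed as Python sets):

```python
def regions(classes, X, universe):
    """POS/NEG/BND of a set X, given a partition of the universe into
    equivalence classes (all arguments are Python sets)."""
    lower = set().union(*(c for c in classes if c <= X))    # lower approximation
    upper = set().union(*(c for c in classes if c & X))     # upper approximation
    return lower, universe - upper, upper - lower           # POS, NEG, BND

U = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]
pos, neg, bnd = regions(partition, {1, 2, 3}, U)
print(sorted(pos), sorted(neg), sorted(bnd))   # [1, 2] [5, 6] [3, 4]
```

Here the class {3, 4} only partially overlaps X = {1, 2, 3}, so it falls in the boundary region.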
In the original RST formulation, two objects are deemed indiscernible if they have
identical values for the selected attributes. This binary equivalence relation leads to a
partition of the universe into multiple equivalence classes. While this definition works
well with nominal attributes, it is not applicable to numerical attributes. To relax
this stringent requirement, we can replace the equivalence relation with a similarity
relation.
Equation (1) shows the indiscernibility relation adopted in this paper, where 0 ≤
ϕ(x, y) ≤ 1 is a similarity function. This weaker binary relation claims that two
objects x and y are inseparable as long as their similarity degree ϕ(x, y) goes above
a similarity threshold 0 ≤ ξ ≤ 1. This user-specified parameter establishes the degree
of granularity upon which the similarity classes are built. Determining the precise
granularity degree becomes a central issue when designing high-performing rough
classifiers.
R : x Ry ⇐⇒ ϕ(x, y) ≥ ξ (1)
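A similarity class under Eq. (1) can be sketched as follows (our code; the similarity function ϕ here is a toy stand-in, not one used in the paper):

```python
def phi(a, b):
    """Toy similarity on values in [0, 1] (a stand-in for the real function)."""
    return 1 - abs(a - b)

def similarity_class(x, universe, phi, xi):
    """R-bar(x) under Eq. (1): all objects at least xi-similar to x."""
    return {y for y in universe if phi(x, y) >= xi}

print(sorted(similarity_class(0.0, {0.0, 0.1, 0.5, 0.9}, phi, 0.85)))   # [0.0, 0.1]
```

Raising the threshold ξ shrinks the similarity class, which is what makes ξ a granularity parameter.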
After a fixed number of iterations, the system will arrive at one of the following
states: (i) an equilibrium point, (ii) a limit cycle or (iii) chaotic behavior [18]. The map
is said to have converged if it reaches a fixed-point attractor. Otherwise, the process
terminates after a maximum number of iterations T is reached and the output corresponds
to the activation vector A(T) in the last iteration.
In this section, we introduce the main principles behind the Rough Cognitive Net-
works and the Rough Cognitive Ensembles.
Recently, Nápoles and his collaborators [23] introduced the RCNs in an attempt to
develop an accurate, transparent classification model that hybridizes RST and FCMs.
Basically, an RCN is a granular sigmoid FCM whose topology is defined by the
abstract semantics of the three-way decision rules [35, 36]. The set of input neurons
in an RCN represents the positive, boundary and negative regions of the decision
classes in the problem under consideration. Output neurons describe the set of decision
classes. The RCN topology (both concepts and weights) is entirely computed from
historical data, thus removing the need for expert intervention at this stage.
The first step in the RCN learning process is related to the input data granula-
tion using RST. The positive, boundary and negative regions of each decision class
according to a predefined attribute subset are computed using the training data set
and a predefined similarity relation R (see Sect. 2.1).
The second step is concerned with automated topology design. A sigmoid FCM
is automatically created from the previously computed RST-based information gran-
ules. In this scheme, each rough region is mapped to an input neuron whereas each
decision class is represented by an output neuron. Rules R1–R4 formalize the direction and intensity of the causal weights in the proposed topology; these weights are
estimated by using the abstract semantics of three-way decision rules.
• (R1) IF Ci is Pk AND Cj is Dk THEN wij = 1.0
• (R2) IF Ci is Pk AND Cj is Dv, v ≠ k, THEN wij = −1.0
• (R3) IF Ci is Pk AND Cj is Pv, v ≠ k, THEN wij = −1.0
• (R4) IF Ci is Nk AND Cj is Dk THEN wij = −1.0
In such rules, Ci and C j represent two map neurons, Pk and Nk are the positive
and negative regions related to the kth decision respectively while −1 ≤ wi j ≤ 1 is
the causal weight between the cause Ci and the effect C j .
Although the boundary regions are concerned with an abstaining decision, an
instance x ∈ BND(Xk) could be positively related to the kth decision alternative.
Therefore, an additional rule considering the knowledge about boundary regions is
introduced.
• (R5) IF Ci is Bk AND Cj is Dv AND BND(Xk) ∩ BND(Xv) ≠ ∅ THEN wij = 0.5
Figure 1 displays an RCN for solving binary classification problems. Notice that
we added a self-reinforcement positive causal connection to each input neuron with
the goal of preserving its initial excitation level when performing the neural updating
rule.
The last step refers to the network exploitation, which simply means computing the
response vector Ax(D) = (Ax(D1), . . . , Ax(Dk), . . . , Ax(DK)). The input object
x is presented to the RCN as an input vector A(0) that activates the causal network.
Rules R6 –R8 formalize the method used to activate the input neurons, which is based
on the inclusion degree of the object to each rough granular region.
Fig. 1 RCN for pattern recognition problems with two decision classes
• (R6) IF Ci is Pk THEN Ai(0) = |R̄(x) ∩ POS(Xk)| / |POS(Xk)|
• (R7) IF Ci is Nk THEN Ai(0) = |R̄(x) ∩ NEG(Xk)| / |NEG(Xk)|
• (R8) IF Ci is Bk THEN Ai(0) = |R̄(x) ∩ BND(Xk)| / |BND(Xk)|
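Rules R6–R8 reduce to one inclusion-degree computation, sketched here (our code; returning 0 for an empty region is our assumption):

```python
def inclusion_degree(sim_class, region):
    """Rules R6-R8: fraction of a rough region covered by the test object's
    similarity class (both arguments are sets of training objects)."""
    return len(sim_class & region) / len(region) if region else 0.0

# Similarity class {1, 2, 3} covers half of the region {2, 3, 4, 5}:
print(inclusion_degree({1, 2, 3}, {2, 3, 4, 5}))   # 0.5
```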
Once the excitation vector A(0) has been computed, the reasoning rule depicted in
Eq. (2) is performed until either the network converges to a fixed-point attractor or a
maximal number of iterations is reached. Then, the class with the highest activation
value is assigned to the object.
RCEs were recently introduced in [22] so as to eliminate the RCN parameter learning
stage. An RCE is an ensemble of several RCNs where each base classifier operates
at a different granularity degree.
Figure 2 displays an RCE comprised of N base classifiers with K decision classes,
where Dk(i) denotes the kth decision class for the ith granular network R(ξi ) and Dk
is the aggregated-type concept associated with the kth decision class.
In order to activate the ensemble, N excitation vectors {A(0)_[x|ξi]}, i = 1, . . . , N, are
computed, where A(0)_[x|ξi] is used to perform the neural reasoning process in the ith RCN.
The ith activation vector denotes the inclusion degree of the similarity class R̄(ξi)(x) into
each information granule induced by the corresponding similarity threshold ξi.
The reader may notice that if ξi ≤ ξj then R̄(ξj)(x) ⊆ R̄(ξi)(x) (a higher threshold yields
a smaller similarity class), which could produce correlated base classifiers [31]. Hence, the authors resorted to instance bagging
[5] in order to counter the correlation effects coming from this rule. By doing so, a
reasonable trade-off between ensemble diversity and accuracy was attained.
Another important aspect of RCEs is related to the aggregation of multiple outputs
once the neural reasoning step is completed. Combining the decisions of different
models means amalgamating the various outputs into a single prediction. The sim-
plest way to do this in classification models is adopting a standard (or weighted)
voting scheme [7]; in this way, the predicted class is derived from the aggregated
output vector.
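One simple aggregation of this kind can be sketched as follows (our code; a plain unweighted average of the base networks' output vectors, followed by an argmax for the predicted class):

```python
def aggregate_outputs(output_vectors):
    """Average the base RCNs' output activation vectors (one vector per
    base classifier, one entry per decision class) and return the index
    of the winning decision class."""
    totals = [sum(col) / len(output_vectors) for col in zip(*output_vectors)]
    return max(range(len(totals)), key=totals.__getitem__)

# Three base classifiers, two decision classes:
print(aggregate_outputs([[0.2, 0.9], [0.4, 0.6], [0.7, 0.5]]))   # 1
```

A weighted scheme would simply scale each vector by its classifier's weight before averaging.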
In RCN-based classifiers, once the networks have been constructed we can determine
the decision class for a new observation by performing the neural reasoning process.
Rules R6 –R8 compute the initial activation vector A(0) , as mentioned in Sect. 2.2.
This mechanism is simply the proportion of the objects in a particular rough region
REG(X) that also belong to the new object's similarity class R̄(x). It does not
take into account the similarity of these objects (located at the intersection of both
concepts, y ∈ R̄(x) ∩ REG(X)) with respect to the objects in R̄(x) or those in
REG(X).
This mechanism does not explicitly consider the membership degree of y2 (an object from
Example 1) to either concept R̄(x) or POS(X1) when activating the corresponding neuron.
Rules R6∗ –R8∗ comprise the new rules proposed in this paper.
• (R6∗) IF Ci is Pk THEN Ai(0) = A_POS(Xk)(x)
• (R7∗) IF Ci is Nk THEN Ai(0) = A_NEG(Xk)(x)
• (R8∗) IF Ci is Bk THEN Ai(0) = A_BND(Xk)(x)
The terms μ_R̄(x)(y) and μ_REG(Xk)(y) are the membership degrees of y to the test
object's similarity class and to the rough region of the concept Xk, respectively.
Computing both requires an information aggregation process.
In this research, we use an aggregation technique based on the ordered weighted
averaging operators (OWA) [34] that provide an aggregation which lies in between
two extreme cases. At one extreme is the situation in which we desire that all the
criteria be satisfied. At the other extreme is the case in which the satisfaction of any
of the criteria is all we desire. These two extreme cases lead to the use of “and” and
“or” operators to combine both criteria (i.e., μ R̄(x) (y) and μ R E G(X k ) (y)).
Equations (4) and (5) show how the terms μ_R̄(x)(y) and μ_REG(Xk)(y) are calculated
using OWA operators.

μ^OWA_R̄(x)(y) = OWA_W(ϕ(y, x1), . . . , ϕ(y, xn)), xi ∈ R̄(x), i = 1, . . . , n (4)

μ^OWA_REG(Xk)(y) = OWA_W(ϕ(y, x1), . . . , ϕ(y, xn)), xi ∈ REG(Xk), i = 1, . . . , n (5)
Example 2 Let us assume that R̄(x) and POS(X1) are given as displayed in Example
1 and W = W_Ave = (1/n, . . . , 1/n). Following on from this, they are computed as the
average similarity of y to all objects in each set. Additionally, let us suppose that the
similarity among all the objects is given in Table 1. From the above assumption we
can compute:
• μ_R̄(x)(y2) = (ϕ(y2, x) + ϕ(y2, y1) + ϕ(y2, y2))/|{x, y1, y2}| = (0.95 + 0.35 + 1)/3 ≈ 0.77
• μ_POS(X1)(y2) = (ϕ(y2, y2) + ϕ(y2, y3) + ϕ(y2, y4))/|{y2, y3, y4}| = (1 + 0.98 + 0.95)/3 ≈ 0.98
In this example, the activation degree A(0)_POS(X1) = (0.98 × 0.77)/2.89 ≈ 0.26.
The reader can notice that this value is slightly lower than 0.33 in Example 1, and
presumably more realistic. In the next section, we explore the prediction capability
of the RCE algorithm using this new activation mechanism. The resulting algorithm
is named Fuzzy Rough Cognitive Ensembles.
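The arithmetic of Example 2 can be reproduced with a small OWA sketch (our code; the W_Ave weights are those assumed in the example):

```python
def owa(values, weights):
    """Ordered weighted averaging: the weight vector is applied to the
    values sorted in descending order."""
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

w_ave = [1 / 3] * 3                      # W_Ave, as assumed in Example 2
mu_sim = owa([0.95, 0.35, 1.0], w_ave)   # similarities of y2 to {x, y1, y2}
mu_pos = owa([1.0, 0.98, 0.95], w_ave)   # similarities of y2 to POS(X1)
print(round(mu_sim, 2), round(mu_pos, 2))   # 0.77 0.98
```

With uniform weights the OWA operator degenerates to the arithmetic mean, which is why the example simply averages the similarities.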
We first describe the experimental settings and then compare RCE’s performance in
both crisp and fuzzy environments. To conclude, we compare the best-performing
ensemble algorithm against state-of-the-art classifiers.
Aiming at exploring whether the improved method leads to higher prediction rates
or not, we leaned upon 100 classification datasets taken from the UCI Machine
Learning [20] repository. Table 2 outlines the number of instances, attributes and
decision classes for each dataset. In the adopted datasets, the number of attributes
ranges from 2 to 240, the number of decision classes from 2 to 38, and the number of
instances from 14 to 5300. These ML problems involve 9 noisy and 29 imbalanced
datasets, where the imbalance ratio ranges from 5:1 to 439:1.
The presence of noise and the imbalance ratio (calculated as the ratio of the size
of the majority class to that of the minority class) are also given. In this paper, we
say that a dataset is imbalanced if the number of instances belonging to the majority
decision class is at least five times the number of instances belonging to the minority
class. On the other hand, we replaced missing values with the mean or the mode
depending on whether the attribute was numerical or nominal, respectively.
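The imbalance criterion above can be sketched as (our code):

```python
from collections import Counter

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def is_imbalanced(labels, threshold=5):
    """The paper's rule: imbalanced if the majority class is at least
    five times the size of the minority class."""
    return imbalance_ratio(labels) >= threshold

labels = ['a'] * 50 + ['b'] * 10
print(imbalance_ratio(labels))   # 5.0
print(is_imbalanced(labels))     # True
```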
Moreover, we evaluate the algorithms’ performance for three heterogeneous
distance functions taken from [33]: the Heterogeneous Euclidean-Overlap Metric
(HEOM), the Heterogeneous Manhattan-Overlap Metric (HMOM) and the Hetero-
geneous Value Difference Metric (HVDM).
• The Heterogeneous Euclidean-Overlap Metric (HEOM). This heterogeneous
distance function computes the normalized Euclidean distance between numerical
attributes and an overlap metric for nominal attributes.
• The Heterogeneous Manhattan-Overlap Metric (HMOM). This heterogeneous
variant is similar to the HEOM function since it replaces the Euclidean distance
with the Manhattan distance when computing the dissimilarity between two numer-
ical values.
• The Heterogeneous Value Difference Metric (HVDM). This function involves
a stronger strategy for quantifying the dissimilarity between two discrete attribute
values. Instead of computing the matching between attribute values, it measures
the correlation between such attributes and decision classes.
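A sketch of the first of these metrics, HEOM, is shown below (our simplified version; the original metric's missing-value handling is omitted):

```python
import math

def heom(x, y, numeric, ranges):
    """HEOM sketch: range-normalised absolute difference for numerical
    attributes, 0/1 overlap for nominal ones, combined Euclidean-style."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if numeric[i]:
            d = abs(a - b) / ranges[i] if ranges[i] else 0.0
        else:
            d = 0.0 if a == b else 1.0
        total += d * d
    return math.sqrt(total)

# One numerical attribute (observed range 10) and one nominal attribute:
print(heom((3.0, 'red'), (7.0, 'blue'), numeric=(True, False), ranges=(10.0, None)))
```

HMOM would replace the squared/square-root combination with a sum of absolute terms; HVDM replaces the 0/1 overlap with a class-conditional value difference.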
The similarity threshold associated with each base classifier is uniformly distributed
in the [0.96, 1) interval. In all ensemble models, the number of RCN base
classifiers is set to N = 10 in order to keep the computational complexity manage-
able.
Each dataset has been partitioned using a 10-fold cross-validation procedure, i.e.,
the dataset has been split into ten folds, each containing 10% of the instances. For
each fold, an ML algorithm is trained with the instances contained in the training
partition (all other folds) and then tested with the current fold, so no object is used
for training and testing purposes at the same time.
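The folding protocol can be sketched as (our code; the round-robin fold assignment is an arbitrary choice, and real experiments would typically shuffle and stratify first):

```python
def k_fold(indices, k=10):
    """Yield (train, test) index lists for k-fold cross-validation;
    folds are taken round-robin, so their sizes differ by at most one."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j in range(k) if j != i for x in folds[j]]
        yield train, test

splits = list(k_fold(list(range(20)), k=10))
print(len(splits))      # 10 train/test splits
print(splits[0][1])     # [0, 10] - the first test fold
```

Each index appears in exactly one test fold, so no object is used for training and testing at the same time.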
Table 2 (continued)
Dataset Instances Attributes Classes Noisy Imbalance
Ecoli2 336 7 2 No 5:1
Ecoli3 336 7 2 No 8:1
Energy-y1 768 8 38 No No
Eucalyptus 736 19 5 No No
Glass0 214 9 2 No No
Glass-0123 versus 456 214 9 2 No No
Glass1 214 9 2 No No
Glass2 214 9 2 No No
Glass-20an-nn 214 9 6 Yes 8:1
Glass3 214 9 2 No 6:1
Glass-5an-nn 214 9 6 Yes 8:1
Glass6 214 9 2 No 6:1
Hayes-roth 160 4 3 No No
Heart-statlog 270 13 2 No No
Ionosphere 351 34 2 No No
Iris 150 4 3 No No
Iris0 150 4 2 No No
Iris-20an-nn 150 4 3 Yes No
Iris-5an-nn 150 4 3 Yes No
Labor 57 16 2 No No
LED7digit 500 7 10 No No
Lung-cancer 32 56 3 No No
Mammographic 830 5 2 No No
mfeat-fourier 2000 76 10 No No
mfeat-morpho 2000 6 10 No No
mfeat-pixel 2000 240 10 No No
mfeat-zernike 2000 47 10 No No
Molecular-biology 106 57 2 No No
monk-2 432 6 2 No No
New-thyroid 215 5 2 No 5:1
Parkinsons 195 22 2 No No
pima 768 8 2 No No
pima-10an-nn 768 8 2 Yes No
pima-20an-nn 768 8 2 Yes No
pima-5an-nn 768 8 2 Yes No
Planning 182 12 2 No No
Postoperative 90 8 3 No 32:1
Primary-tumor 339 17 22 No 84:1
saheart 462 9 2 No No
Solar-flare-1 323 5 6 No 11:1
Solar-flare-2 1066 12 6 No 7:1
Sonar 208 60 2 No No
Soybean 683 35 19 No 11:1
Spectfheart 267 44 2 No No
Sponge 76 44 3 No 23:1
Tae 151 5 3 No No
Tic-tac-toe 958 9 2 No No
Vehicle 846 18 4 No No
Vehicle0 846 18 2 No No
Vehicle1 846 18 2 No No
Vehicle2 846 18 2 No No
Vehicle3 846 18 2 No No
Vertebral2 310 6 2 No No
Vertebral3 310 6 3 No No
Vowel 990 13 11 No No
Weather 14 4 2 No No
Wine 178 13 3 No No
Wine-5an-nn 178 13 3 Yes No
Winequality-white 4898 11 7 No 439:1
Wisconsin 683 9 2 No No
Yeast1 1484 8 2 No No
Zoo 101 16 7 No 10:1
Following on from this, we adopt the Łukasiewicz t-norm in the rest of the simulations
conducted in this paper.
The second experiment explores different OWA operators pointed out in [34].
Equations (6)–(8) show three important special cases of these OWA operators.
Fig. 3 Average Kappa measure computed for the proposed model using three heterogeneous dis-
tance functions with different t-norms
Fig. 4 Average Kappa measure computed for the proposed model using three heterogeneous dis-
tance functions with different OWA operators
OWA_{W_Ave}, where W_Ave = (1/n, 1/n, . . . , 1/n)^T (8)
Figure 4 shows the average Kappa coefficient achieved by FRCE across three heterogeneous distance functions using different OWA operators. From these simulations
we can notice that the proposed model achieves the best prediction rates with the
OWA_{W_Ave} operator.
6 Conclusions
In this paper, we presented a fuzzy activation mechanism for RCNs. This mechanism
is based on the assumption that objects may belong to the intersection between
the similarity class and each non-empty granular region with different membership
degrees. The numerical results have shown that the proposed modification leads to
improved prediction rates, while remaining comparable with selected state-of-the-art
classifiers. The fuzzy approach focuses only on the activation mechanism; the
information granules are still crisp. Future research will focus on replacing
the crisp constructs with fuzzy ones, so that further flexibility may be achieved.
References
1. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1),
37–66 (1991)
2. Amit, Y., Geman, D.: Shape quantization and recognition with randomized trees. Neural Com-
put. 9(7), 1545–1588 (1997)
3. Balamash, A., Pedrycz, W., Al-Hmouz, R., Morfeq, A.: Granular classifiers and their design
through refinement of information granules. Soft Comput. 1–15 (2015)
4. Bargiela, A., Pedrycz, W.: Granular Computing: An Introduction, vol. 717. Springer Science
& Business Media (2012)
5. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
6. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
7. Bryll, R., Gutierrez-Osuna, R., Quek, F.: Attribute bagging: improving accuracy of classifier
ensembles by using random feature subsets. Pattern Recogn. 36(6), 1291–1302 (2003)
8. Chen, C.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies:
a survey on big data. Inf. Sci. 275, 314–347 (2014)
9. Cleary, J.G., Trigg, L.E., et al.: K*: an instance-based learner using an entropic distance mea-
sure. In: Proceedings of the 12th International Conference on Machine Learning, vol. 5, pp.
108–114 (1995)
10. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2012)
11. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis
of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937)
12. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data
mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
13. Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: International Joint Con-
ference on Neural Networks, 1989. IJCNN, pp. 593–605. IEEE (1989)
14. John, G.H., Langley, P.: Estimating continuous distributions in bayesian classifiers. In: Pro-
ceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345.
Morgan Kaufmann Publishers Inc. (1995)
15. Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt’s SMO
algorithm for SVM classifier design. Neural Comput. 13(3), 637–649 (2001)
16. Kohavi, R.: The power of decision tables. In: Machine Learning: ECML-95, pp. 174–189.
Springer, Berlin (1995)
17. Kosko, B.: Fuzzy cognitive maps. Int. J. Man-Mach. Stud. 24(1), 65–75 (1986)
18. Kosko, B.: Hidden patterns in combined and adaptive knowledge networks. Int. J. Approx.
Reason. 2(4), 377–393 (1988)
19. Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Machine Learn. 59(1–2), 161–205
(2005)
20. Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2013)
21. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. In:
Anderson, J.A., Rosenfeld, E. (eds.) Neurocomputing: Foundations of Research, pp. 15–27.
MIT Press, Cambridge (1988)
22. Nápoles, G., Falcon, R., Papageorgiou, E., Bello, R., Vanhoof, K.: Rough cognitive ensembles.
Int. J. Approx. Reason. 85, 79–96 (2017)
23. Nápoles, G., Grau, I., Papageorgiou, E., Bello, R., Vanhoof, K.: Rough cognitive networks.
Knowl. Based Syst. 91, 46–61 (2016)
24. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982)
25. Pedrycz, W., Homenda, W.: From fuzzy cognitive maps to granular cognitive maps. IEEE
Trans. Fuzzy Syst. 22(4), 859–869 (2014)
26. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kauffman Publishers (1993)
27. Shi, H.: Best-first decision tree learning. Ph.D. thesis, Citeseer (2007)
28. Smeeton, N.C.: Early history of the kappa statistic. Biometrics 41, 795 (1985)
29. Su, J., Zhang, H.: A fast decision tree learning algorithm. In: Proceedings of the 21st National
Conference on Artificial Intelligence, vol. 1, pp. 500–505. AAAI’06, AAAI Press (2006)
30. Sumner, M., Frank, E., Hall, M.: Speeding up logistic model tree induction. In: Knowledge
Discovery in Databases: PKDD 2005, pp. 675–683. Springer (2005)
31. Turner, K., Oza, N.C.: Decimated input ensembles for improved generalization. In: Interna-
tional Joint Conference on Neural Networks, 1999. IJCNN’99, vol. 5, pp. 3069–3074. IEEE
(1999)
32. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics 1, 80–93 (1945)
33. Wilson, D.R., Martinez, T.R.: Improved heterogeneous distance functions. J. Artif. Intell. Res.
6(1), 1–34 (1997)
34. Yager, R.R.: On ordered weighted averaging aggregation operators in multicriteria decision-
making. In: Readings in Fuzzy Sets for Intelligent Systems, pp. 80–87. Elsevier (1993)
35. Yao, Y.: Three way decision: an interpretation of rules in rough set theory. In: Wen, P., Li, Y.,
Polkowski, L., Yao, Y., Tsumoto, S., Wang, G. (eds.) Rough Sets and Knowledge Technology,
pp. 642–649. Springer, Berlin (2009)
Fuzzy Activation of Rough Cognitive Ensembles Using OWA Operators 335
36. Yao, Y.: The superiority of three-way decisions in probabilistic rough set models. Inf. Sci.
181(1), 1080–1096 (2011)
Prediction by k-NN and MLP: A New Approach Based on a Fuzzy Similarity Quality Measure. A Case Study
Abstract In this paper the performance of the k-Nearest Neighbors (k-NN) and Multilayer Perceptron (MLP) algorithms is studied in a classical Civil Engineering task: predicting the corrosion behavior of the anchorage studs of railway fixations. A fuzzy similarity quality measure, combined with the Univariate Marginal Distribution Algorithm (UMDA), is used to calculate the feature weights, which improves the performance of k-NN and MLP in the case of mixed data (features with discrete or real domains). Experimental results show that this approach outperforms other methods used to calculate the feature weights.
1 Introduction
Within the field of Artificial Intelligence, the Rough Set Theory (RST) proposed by Pawlak in 1982 offers measures for the analysis of data. The measure called classification quality makes it possible to calculate the consistency of a decision system. Its main limitation is that it applies only to decision systems whose feature domains are discrete. A new measure (named the Similarity Quality Measure) was proposed in [1] for decision systems in which the feature domains, including that of the decision feature, need not be discrete. This measure has the limitation of requiring thresholds when constructing similarity relations among the objects of the decision system. These thresholds are parameters of the method that must be adjusted, and such parameters are a recognized complication when analyzing any algorithm. The accuracy of the method is very sensitive to small variations in the thresholds, and the threshold values are also application-dependent, so a careful threshold adjustment process is needed to maximize the performance of the knowledge discovery process. It is therefore necessary to incorporate a technique that can handle imprecision. Fuzzy Set Theory [2], one of the main elements of soft computing, uses fuzzy relations to make computational methods more tolerant of and flexible toward imprecision, especially in the case of mixed data. Since the Similarity Quality Measure is quite sensitive to the similarity thresholds, this limitation was tackled by using fuzzy sets to categorize its domains through fuzzy binary relations. The new measure, named the Fuzzy Similarity Quality Measure, facilitates the definition of similarity relations (since there are fewer parameters to consider) without degrading, from a statistical perspective, the efficiency of the subsequent data mining tasks. The Fuzzy Similarity Quality Measure computes the relation between the similarity according to the condition features and the similarity according to the decision feature d. The feature weighting method proposed here is based on a heuristic search in which the fuzzy similarity quality of the decision system is used as the heuristic value. We use UMDA [3] to find the best set of weights; this method has shown good performance on optimization problems [1]. Each individual of the population represents a set of weights W, and its quality is evaluated by the fuzzy similarity measure. This paper studies the impact of the resulting method, called UMDA+RST+FUZZY, on the k-Nearest Neighbors (k-NN) [4] and MLP [5] algorithms.
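The UMDA search for feature weights described above can be sketched as follows. This is a minimal illustration in which a toy quadratic fitness stands in for the fuzzy similarity quality measure, and the population size, elite fraction and Gaussian model are assumptions, not the authors' exact settings:

```python
import numpy as np

def umda_weights(fitness, k, pop_size=50, elite_frac=0.3, iters=100, seed=0):
    """Univariate Marginal Distribution Algorithm (UMDA) sketch: each
    feature weight is modelled by an independent Gaussian whose mean and
    std are re-estimated from the elite individuals at every generation."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.full(k, 0.5), np.full(k, 0.25)
    n_elite = max(2, int(elite_frac * pop_size))
    best_w, best_f = None, -np.inf
    for _ in range(iters):
        pop = np.clip(rng.normal(mu, sigma, size=(pop_size, k)), 0.0, 1.0)
        scores = np.array([fitness(w) for w in pop])
        elite = pop[np.argsort(scores)[-n_elite:]]          # keep the best
        i_best = int(scores.argmax())
        if scores[i_best] > best_f:
            best_f, best_w = float(scores[i_best]), pop[i_best].copy()
        mu = elite.mean(axis=0)                             # refit marginals
        sigma = elite.std(axis=0) + 1e-6                    # avoid collapse
    return best_w, best_f

# toy fitness: prefer weights close to a known target vector
target = np.array([0.8, 0.1, 0.6])
w, best = umda_weights(lambda v: -float(((v - target) ** 2).sum()), k=3)
```

In the actual method, the fitness of an individual W would be the fuzzy similarity quality θ of the decision system computed with those weights.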
x R1 y = F1(X, Y)    (1)

x R2 y = F2(X, Y)    (2)

where R1 and R2 are fuzzy relations defined to describe the similarity between objects x and y with respect to the condition features and the decision feature, respectively. Relations R1 and R2 are defined by the functions F1 and F2; F1 is given by Eq. (3), with the per-feature comparison function of Eq. (5):

F1(X, Y) = Σ_{i=1}^{k} w_i · ∂_i(X_i, Y_i)    (3)

where

∂(X_i, Y_i) = 1 − |X_i − Y_i| / (Max(α_i) − Min(α_i))   if feature i is continuous
∂(X_i, Y_i) = 1                                         if feature i is discrete and X_i = Y_i    (5)
∂(X_i, Y_i) = 0                                         if feature i is discrete and X_i ≠ Y_i
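Equations (3) and (5) can be sketched in code; the per-feature metadata tuple (a continuous flag plus the domain bounds) is an illustrative assumption about how the feature descriptions are stored:

```python
def delta(xi, yi, continuous, lo=None, hi=None):
    """Per-feature comparison of Eq. (5): range-normalised difference for
    continuous features, exact match for discrete ones."""
    if continuous:
        return 1.0 - abs(xi - yi) / (hi - lo)
    return 1.0 if xi == yi else 0.0

def F1(X, Y, weights, meta):
    """Weighted similarity of Eq. (3); `meta` gives (continuous, lo, hi)
    for each feature."""
    return sum(w * delta(x, y, *m) for w, x, y, m in zip(weights, X, Y, meta))

meta = [(True, 0.0, 10.0), (False, None, None)]   # one continuous, one discrete
s = F1([2.0, "a"], [4.0, "a"], weights=[0.7, 0.3], meta=meta)
# 0.7 * (1 - 2/10) + 0.3 * 1 = 0.86
```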
The problem is to find the functions F1 and F2 such that N1(x) = N2(x), where the symbol "=" denotes the greatest possible similarity between the sets N1(x) and N2(x) for every object in the universe. The degree of similarity between the two fuzzy sets N1(x) and N2(x) for an object x is calculated by Eq. (8), presented in [7]:

φ(x) = ( Σ_{i=1}^{n} [1 − |μ_{R1}(x_i) − μ_{R2}(x_i)|] ) / n    (8)
340 Y. Filiberto et al.
Using Eq. (8), the similarity quality of a decision system (DS) with a universe of n objects is defined by Eq. (9):

θ(DS) = ( Σ_{i=1}^{n} φ(x_i) ) / n    (9)
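A sketch of the quality measure of Eqs. (8)-(9), assuming the pairwise fuzzy similarity degrees under R1 and R2 are already available as matrices (this data layout is an assumption for illustration):

```python
import numpy as np

def similarity_quality(mu_R1, mu_R2):
    """theta(DS) in the spirit of Eqs. (8)-(9): mu_R1[i, j] and
    mu_R2[i, j] hold the fuzzy similarity degrees between objects i and j
    under the condition features (R1) and the decision feature (R2)."""
    phi = (1.0 - np.abs(mu_R1 - mu_R2)).mean(axis=1)   # Eq. (8), per object
    return float(phi.mean())                           # Eq. (9), over objects

mu_R1 = np.array([[1.0, 0.8], [0.8, 1.0]])
mu_R2 = np.array([[1.0, 0.7], [0.7, 1.0]])
q = similarity_quality(mu_R1, mu_R2)   # = 0.95
```

A value of θ close to 1 means the similarity structure induced by the condition features agrees with that induced by the decision feature, which is exactly what the weight search tries to maximize.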
The key idea in the k-NN method is that similar input data vectors have similar
output values [1, 8]. This algorithm assumes all instances correspond to points in the
n-dimensional space Rn . The target function value for a new query is estimated from
the known values of the k nearest training examples. One obvious refinement to the k-NN algorithm is to weight the contribution of each of the k neighbors according to its distance to the query point X_q, giving greater weight to closer neighbors. The k-NN rule for approximating a discrete-valued target function is given by Eq. (10) [9]:

f(X_q) ← argmax_{v∈V} Σ_{i=1}^{k} w_i · δ(v, f(x_i))    (10)
The k-NN method is a simple, intuitive and efficient way to estimate the value of
an unknown function. Finding these K nearest neighbors requires the use of distance
functions (nominal, numerical or mixed). Similarity functions are often employed
in mixed problems, i.e. those with both nominal and numerical attributes [10]. The
results presented in [11] show that an important aspect of methods based on similarity grades, such as k-NN, is the set of weights assigned to the features, since a proper weighting significantly improves the performance of the method [12]. In this paper we propose a new alternative, based on the Fuzzy Similarity Quality Measure, for calculating the weights associated with the predictive features that appear in the weighted similarity function.
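A minimal sketch of a distance-weighted k-NN classifier in the spirit of Eq. (10), with the feature weights plugged into the distance; the inverse-square-distance vote is one common choice, not necessarily the authors':

```python
import numpy as np
from collections import defaultdict

def knn_predict(Xq, X, y, weights, k=3):
    """Distance-weighted k-NN: each of the k nearest neighbours votes for
    its class with weight 1/d^2, using a feature-weighted distance."""
    d = np.sqrt((((X - Xq) ** 2) * weights).sum(axis=1))
    votes = defaultdict(float)
    for i in np.argsort(d)[:k]:
        votes[y[i]] += 1.0 / (d[i] ** 2 + 1e-12)   # closer -> larger vote
    return max(votes, key=votes.get)

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = ["a", "a", "b", "b"]
pred = knn_predict(np.array([0.05, 0.05]), X, y, weights=np.array([1.0, 1.0]))
# -> "a"
```

Replacing the uniform `weights` vector with the weights found by UMDA+RST+FUZZY is what changes the neighbourhood structure and, with it, the prediction quality.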
The most popular neural network model is the Multilayer Perceptron (MLP), and the most popular learning algorithm is Back-propagation (BP) [13], which is based on error correction. The essential character of BP is gradient descent, so the algorithm is strictly dependent on the shape of the error surface. The error surface may be multimodal, with several local minima; the algorithm may therefore fall into a local minimum and converge prematurely [14]. BP training is also very sensitive to initial conditions [13]. In general terms, the choice of the initial weight vector W0 may speed up convergence of the learning process towards a global or a local minimum if W0 happens to be located within the basin of attraction of that minimum. Conversely, if W0 starts the search in a relatively flat region of the error surface, the adaptation of the connection weights will be slow [15]. An MLP is composed of an input layer, an output layer and one or more hidden layers, but it has been shown that a single hidden layer is sufficient for most problems. The number of hidden units is directly related to the capabilities of the network; in our case it is set to (i + j)/2, where i is the number of input neurons and j the number of output neurons. Each link between neurons has an associated weight W, which is modified in the so-called learning process. The information enters through the input layer, is passed to the hidden layer, and is then transmitted to the output layer, which is responsible for producing the network response [16]. In general, MLPs can have several hidden layers; however, we consider the initialization of MLPs with only one hidden layer. We assume a three-layer neural network with n inputs (features), q outputs (categories), and one hidden layer with (n + q)/2 nodes; see Fig. 1.
The method presented in this paper (UMDA+RST+FUZZY method) is used to
assign weights to the links between the input layer and hidden layer.
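How the computed feature weights might be used to initialize the input-to-hidden links can be sketched as follows; the multiplicative scaling of random initial weights is an assumption for illustration, not the authors' exact scheme:

```python
import numpy as np

def init_input_hidden(feature_weights, n_outputs, seed=0):
    """Sketch: draw random input-to-hidden weights and scale each column
    by the corresponding feature weight, with (i + j)/2 hidden units as
    in the text. The scaling scheme itself is an assumption."""
    rng = np.random.default_rng(seed)
    i = len(feature_weights)
    n_hidden = (i + n_outputs) // 2                 # hidden-layer size rule
    W0 = rng.uniform(-0.5, 0.5, size=(n_hidden, i))
    return W0 * np.asarray(feature_weights)         # damp irrelevant features

W0 = init_input_hidden([0.9, 0.1, 0.5, 0.5], n_outputs=2)
# W0.shape == (3, 4); column 1 (weight 0.1) starts near zero
```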
4 Experimental Setup
We apply the proposed methods to fifteen real datasets from the UCI Machine Learning Repository (baskball, detroit, diabetes-numeric, elusage, fishcatch, pollution, pwLinear, vineyard, bolts, cloud, gascons, veteran, longley, pyrim, bodyfat). The variants for calculating the weights for k-NN with k = 1 are: the method proposed in [1] (called PSO+RST), in this case using UMDA instead of PSO; the weights obtained by the Conjugate Gradient method (KNNVSM) [18]; the same weight for every feature (called Standard); and Relief [19]. To prove the effectiveness of the UMDA+RST+FUZZY method, the errors of the MLP obtained with the different weight calculation methods (Random (MLP-AL), Standard (1/number of features), KNNVSM, UMDA+RST and UMDA+RST+FUZZY) were compared. The results achieved by k-NN and MLP in terms of standard error, with the weights initialized using the mentioned variants, are shown in Tables 1 and 2.
In order to compare the results, a multiple comparison test is used to find the best algorithm. Tables 3 and 4 show the results of the Friedman statistical test. It can be observed that the best ranking is obtained by our proposal, which indicates that the accuracy of UMDA+RST+FUZZY is significantly better. The Iman-Davenport test was also used [20]. The resulting p-value = 0.004666159801 < α (with 3 and 33 degrees of freedom) indicates that there are indeed significant performance differences within the group for both k-NN and MLP. There is a family of procedures that increase the power of multiple tests, called sequential methods or post-hoc tests. Here the Holm test [21] was used to find the significantly superior algorithms. UMDA+RST+FUZZY, as the control method, is compared pairwise against all the others to determine the degree of rejection of each null hypothesis. The results reported in Table 5 reject all null hypotheses whose p-value is lower than 0.025, hence confirming the superiority of the control method [10]. The UMDA+RST vs. UMDA+RST+FUZZY null hypothesis was NOT rejected; this is equivalent to saying that there are no significant differences in the performance of the two algorithms when combined with the 1-NN method, so they can be deemed equally effective. The results reported in Table 6 reject all null hypotheses whose p-value is lower than 0.05; as can be observed, the test rejects all cases in favor of the best-ranking algorithm. Thus UMDA+RST+FUZZY is statistically superior to all compared methods when combined with the MLP method.
Prediction by k-NN and MLP a New Approach … 345
Table 5 Holm's table with α = 0.025 for 1-NN; UMDA+RST+FUZZY is the control method

i | Algorithm | z = (R0 − Ri)/SE | p        | Holm     | Hypothesis
3 | KNNVSM    | 2.766993         | 0.005658 | 0.016667 | Reject
2 | Standard  | 2.450765         | 0.014255 | 0.025    | Reject
1 | UMDA+RST  | 0.474342         | 0.635256 | 0.05     | Not rejected

Table 6 Holm's table with α = 0.05 for MLP; UMDA+RST+FUZZY is the control method

i | Algorithm | z = (R0 − Ri)/SE | p        | Holm     | Hypothesis
3 | Standard  | 3.464102         | 0.000532 | 0.016667 | Reject
2 | UMDA+RST  | 3.117691         | 0.001823 | 0.025    | Reject
1 | KNNVSM    | 2.424871         | 0.015314 | 0.05     | Reject
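The Holm step-down procedure applied in Tables 5 and 6 can be sketched as:

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: sort p-values ascending and compare
    the i-th smallest against alpha / (m - i); stop at the first
    non-rejection, retaining all remaining hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] < alpha / (m - rank):
            rejected[i] = True
        else:
            break                       # all remaining hypotheses retained
    return rejected

# p-values from Table 6 (MLP): Standard, UMDA+RST, KNNVSM
result = holm([0.000532, 0.001823, 0.015314], alpha=0.05)
# all three rejected, matching Table 6
```

The successive thresholds α/3, α/2 and α/1 are exactly the 0.016667, 0.025 and 0.05 values shown in the Holm column of Table 6.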
In this section a real problem from the field of Civil Engineering is solved. In Cuba, early detection of corrosion damage in the anchorage studs of railway fixations contributes to improved maintenance planning. To determine the causes of this behavior, an extensive field study was carried out, from which the data set was prepared. The data set has 96 instances and 5 features, including the class feature. The description of the data set is shown in Table 7. The problem is to predict the corrosion behavior of the anchorage studs of the railway fixations.
The data used for the study come from experiments carried out on different railways in Cuba, in the central railway of the city of Camagüey. A sample of these data is shown in Table 8.
An experimental study on the corrosion data set is performed (Tables 9 and 10).
Predicting the corrosion behavior of the anchorage studs of the railway fixations (RN) for any orientation of the railway makes it possible to plan the anticorrosive maintenance of these elements appropriately, to rationalize the material and human resources required for this task, and to increase the safety of train movement.
6 Conclusion
This paper has studied the combination of the Fuzzy Similarity Quality Measure with the UMDA method, and the use of the feature weights computed by this combination in the k-NN and MLP methods. The main contribution is the combination of the Fuzzy Similarity Quality Measure with UMDA. This measure computes the degree of similarity of a decision system in which the features can have discrete or continuous values. The paper includes the calculation of the feature weights by means of the optimization of this measure. The experimental study on classification problems shows a superior performance of the k-NN and MLP algorithms when the weights are initialized using the method proposed in this work, compared to other previously reported methods for calculating feature weights. Its application to a classification problem from the field of Civil Engineering has shown satisfactory results.
References
1. Filiberto, Y., Bello, R., Caballero, Y., Larrua, R.: In: Proceedings of the 10th International Conference on Intelligent Systems Design and Applications (ISDA 2010), pp. 1314–1319. IEEE Press (2010)
1 Introduction
In many systems, particularly services, customers often must wait to be processed, for instance customers in a bank or people in a metro or subway station. These systems are called queueing systems (QS). From the service supplier's point of view, there are several decision-making problems, such as: the number of servers needed to attend the customers, the kind of technology used to improve the service times, the queue policy (e.g. a serial or parallel setting), and the capacity of the system (servers plus queue size), among others. Some of these problems are frequently solved using classical queueing theory; however, when this is not possible because the data required for probabilistic analysis are not available, the problems are solved using people's experience or perception. This feature is a first source of uncertainty in the decision process for a QS.
Additional features of QSs, such as feedback loops in the system, non-linearity, variability, product mixes, routing, random equipment failures and stochastic arrival times, add more complexity to the problem [1]. Cases where the customer must follow several steps to be processed are called queueing networks (QN). In this setting several queues arise, and the decision process becomes more complex, since numerous decisions must be taken simultaneously to ensure the flow through the system.
Decision-making tools usually compare the efficiency of different configurations in terms of equipment, operators, storage areas, waiting areas, etc., and determine long-term decisions, for instance on capacity expansion [1, 2]. There are several methods, such as queueing theory, Jackson networks, Mean Value Analysis and Equilibrium Point Analysis, among others [3]. However, these methods do not consider the uncertainty in the information.
Recently, López-Santana, Franco and Figueroa-Garcia [1] studied the problem of scheduling tasks in a QS, considering the condition of the system in terms of queue length, utilization and work in process, and taking into account the imprecision of their measurement. They propose a Fuzzy Inference System (FIS) to determine the server to which a specific task is allocated, according to the condition of the system measured in terms of queue length and server utilization. In other related work [4], the authors propose an ANFIS to determine the status of the system, using input variables such as queue length and utilization, in order to schedule tasks in a QS.
The purpose of this paper is to apply the ANFIS-based approach proposed in [4] to QSs and QNs to determine the server to which a specific task is allocated, according to queue length and server utilization. In addition, we consider rework and multi-tasking features. To our knowledge, and according to the literature review, this is the first time ANFIS has been applied to scheduling decisions in QSs and QNs.
The remainder of this paper is organized as follows: Sect. 2 presents a background and literature review of task scheduling in queueing systems. Section 3 describes the proposed method. Section 4 shows two example applications of our method, in a QS and in a QN. Finally, Sect. 5 concludes this work and provides possible research directions.
Scheduling in Queueing Systems and Networks Using ANFIS 351
In this section we give an overview of queueing systems (QS) and queueing networks (QN) and present a short review of work related to the scheduling process in QSs and QNs.
1/2/3/4    (1)

where 1 refers to the arrival process, which can be Poisson (M), Deterministic (D) or a general distribution different from Poisson (G); 2 is the service process, which can also be M, D or G; 3 represents the number of servers per stage of the process in the network, which can be single (represented by 1) or multiple (represented by s); and 4 states the system's capacity, infinite when it is empty or a K to indicate the queue's length.
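The four-position Kendall code described above can be read mechanically; the following toy parser (the dictionary layout is an illustrative choice, not a standard API) makes the positions explicit:

```python
def parse_kendall(code):
    """Split a Kendall code such as 'M/M/1' or 'M/D/s/K' into its four
    positions: arrival process, service process, servers, capacity."""
    parts = code.split("/")
    arrival, service = parts[0], parts[1]
    servers = 1 if parts[2] == "1" else parts[2]           # '1' or 's'
    capacity = parts[3] if len(parts) > 3 else None        # None = infinite
    return {"arrival": arrival, "service": service,
            "servers": servers, "capacity": capacity}

q = parse_kendall("M/M/1")   # single-server Markovian queue, infinite capacity
```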
According to [1], the standard terminology and notation in QS take the number of customers in the system as the state of the system. The queue length (Ql) is the number of customers waiting for service to begin, i.e. the state of the system minus the number of customers being served. Pn(t) denotes the probability of exactly n customers in the QS at time t, given the number at time 0; s is the number of servers (parallel service channels) in the QS; λn is the mean arrival rate (expected number of arrivals per unit time) of new customers when n customers are in the system; and μn is the mean service rate for the overall system (expected number of customers completing service per unit time) when n customers are in the system.
When λn is constant for all n, this constant is denoted by λ. When the mean service rate per busy server is constant for all n ≥ 1, this constant is denoted by μ. (In this case, μn = sμ when n ≥ s, that is, when all s servers are busy.) Also, ρ = λ/sμ is the utilization factor for the service facility, i.e., the expected fraction of time the individual servers are busy, because λ/sμ represents the fraction of the system's service capacity (sμ) that is being utilized on average by arriving customers (λ).
When a QS has recently begun operation, the state of the system (number of
customers in the system) will be greatly affected by the initial state and by the time
that has since elapsed. The system is said to be in a transient condition. However, after
sufficient time has elapsed, the state of the system becomes essentially independent
of the initial state and the elapsed time (except under unusual circumstances). The
system has now essentially reached a steady-state condition, where the probability
distribution of the state of the system remains the same (the steady-state or stationary
distribution) over time. Queueing theory has tended to focus largely on the steady-
state condition, partially because the transient case is more difficult analytically.
We assume that Pn is the probability of exactly n customers in the QS. Then L, the expected number of customers in the QS, is computed by L = Σ_{n=0}^{∞} n·Pn, and Lq, the expected queue length (which excludes customers being served), is computed by Lq = Σ_{n=s}^{∞} (n − s)·Pn. In addition, W is the expected waiting time in the system (including service time) for each individual customer, and Wq is the expected waiting time in the queue (excluding service time) for each individual customer. It has been proved that in a steady-state queueing process,

L = λW.    (2)

This is known as Little's Law [3, 8, 9]. Furthermore, the same proof also shows that

Lq = λWq.    (3)

Equations (2) and (3) are extremely important because they enable all four of the fundamental quantities L, W, Lq and Wq to be determined immediately as soon as one of them is found analytically.
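Little's law (Eqs. (2) and (3)), together with the standard relation W = Wq + 1/μ for a constant service rate, lets the remaining quantities be recovered once one of them is known; a sketch:

```python
def little_metrics(lam, W, mu):
    """Given the arrival rate lam, the expected time in system W and the
    per-server service rate mu, recover the other fundamental quantities
    via Little's law and W = Wq + 1/mu."""
    L = lam * W          # Eq. (2): expected number in system
    Wq = W - 1.0 / mu    # waiting time in queue excludes the service time
    Lq = lam * Wq        # Eq. (3): expected queue length
    return L, Lq, Wq

L, Lq, Wq = little_metrics(lam=2.0, W=1.0, mu=4.0)
# L = 2.0, Wq = 0.75, Lq = 1.5
```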
Figure 2 presents an example of a QN. There are three stations, each with a single queue. External arrivals occur at stations 1 and 2, and there are two customer classes. In a single-class network, customers being processed or waiting for processing at any given station are assumed to be indistinguishable, whereas in a multi-class system several classes of customers are served at each station. In our example, stations 1 and 2 could process classes 1 and 2, while station 3 processes only class 1. In addition, there are several routes for each class, defined by the stations at which a customer must be processed. The Kendall notation can be applied separately to single stations of a QN, but when the QN is analyzed globally the notation cannot be used.
In QSs and QNs, arrivals and service times are described in terms of probability distributions. In addition, characteristics of the service stations, such as the configuration and routing protocols, determine the flow of customers from one station to another, including the number of servers in each stage. Another feature is the size of the waiting area of each station. When this is limited, some of the customers cause congestion at the previous station, and blockage arises in the following stations.
In a general sense, a QN must be defined in terms of arrival and service rates, and of routing probabilities or the proportions in which classes of customers are transferred sequentially from one service stage to another. In particular, the routing probabilities induce feedback cycles that increase the complexity of understanding this type of system (see Fig. 2). Since the QN is a system of interacting nodes, the operation of each node and the routing depend on what is happening along the network. Given this dependence, any combination of the following can occur: synchronous or parallel processing of transactions in multiple nodes; rerouting of transactions to avoid congestion (or interference); speeding up or slowing down of the processing rate in the following nodes, which may be idle or congested; and blocking of customers from entering a specific phase of the network when that phase is not capable of processing more customers.
lations, among others, in order to optimize one or more objectives [10]. This problem has several applications in manufacturing and service environments; scheduling problems are particularly difficult to solve in services, because the complexity of these systems is higher. QSs and QNs are the main examples of service systems, so scheduling decisions are a rich area in which to develop new methods that help decision makers.
Terekhov et al. [11] provide an overview of queueing-theoretic models and methods that are relevant to scheduling in dynamic settings. They found that queueing theory aims to achieve optimal performance in some probabilistic sense (e.g., in expectation) over a long time horizon, since it is impossible to create an optimal schedule for every single sample path in the evolution of the system. Moreover, queueing theory generally studies systems with simple combinatorics, as such systems are more amenable to rigorous analysis of their stochastic properties, and it usually assumes distributional, rather than exact, knowledge about the characteristics of jobs or job types. However, in cases where we lack the data to build stochastic models, scheduling decisions are made with traditional rules [12, 13] such as: FIFO (first in, first out), LIFO (last in, first out), SPT (shortest processing time), LPT (longest processing time) and EDD (earliest due date), among others. Likewise, it is possible to apply multi-attribute priority rules such as [13]: CR + SPT (critical ratio + shortest processing time), S/OPN (minimum slack time per remaining operation), S/RPT + SPT (slack per remaining processing time + shortest processing time), PT + WINQ (processing time + work in the next queue) and PT + PW (processing time + wait time), among others. However, these rules do not involve the uncertainty or the condition of the system, so it is necessary to include these features in the solution techniques.
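The classical dispatching rules listed above can be sketched as sort keys over a job queue (the job dictionary layout is an assumption for illustration):

```python
def next_job(jobs, rule="SPT"):
    """Pick the next job from the queue under a classical priority rule.
    Each job is a dict with 'arrival', 'proc_time' and 'due_date'."""
    keys = {
        "FIFO": lambda j: j["arrival"],     # first in, first out
        "LIFO": lambda j: -j["arrival"],    # last in, first out
        "SPT":  lambda j: j["proc_time"],   # shortest processing time
        "LPT":  lambda j: -j["proc_time"],  # longest processing time
        "EDD":  lambda j: j["due_date"],    # earliest due date
    }
    return min(jobs, key=keys[rule])

jobs = [{"arrival": 0, "proc_time": 5, "due_date": 9},
        {"arrival": 1, "proc_time": 2, "due_date": 20}]
chosen = next_job(jobs, "SPT")   # -> the second job (shortest processing time)
```

These rules are static: they look only at job attributes, which is precisely why they cannot react to the condition of the system (queue length, utilization) the way the fuzzy approaches discussed below do.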
According to the modeling purpose of the QN, the solution technique is selected considering the accuracy of the expected result with respect to the assumptions about the system's behavior. Baldwin et al. [14] propose a classification into two types, namely exact and approximate. Among the analytical techniques, Jackson networks and BCMP networks are classified as exact. On the other hand, there are approximate techniques such as Mean Value Analysis (MVA) and Equilibrium Point Analysis. Table 1 presents the scope of the analysis techniques described above, specifying the types of customer and network that can be modeled with precision (with regard to the assumptions of each method). We include the technique called "Kingman's parametric decomposition" [3], considering its contribution to the modeling of flow times in QNs.
Recently, some applications have been developed in QS. Jain et al. [15] develop an iterative approach using MVA for the prediction of performance in flexible manufacturing systems with multiple material handling devices. They demonstrate improvements in throughput, average service time and average waiting time with respect to the previous configuration of the material handling devices, and they use a neuro-fuzzy controller to compare the performance measures obtained with MVA, demonstrating the consistency between the results of both techniques and giving a basis for the automation of the system using soft computing. Cruz [16] examines the problem of maximizing the throughput of QNs with general service times, seeking the reduction of the total number of waiting areas and of the service rate through multi-objective genetic algorithms, in order to find a feasible solution to the need to improve the service given the natural conflict between cost and throughput. Yang and Liu [17] develop a hybrid transfer function model that combines statistical analysis, simulation and queueing analysis, taking as input values the system's work rate and the performance variables throughput and work in process (WIP).
Applications of fuzzy logic to scheduling in QS and QN are scarce. Suganthi and Meenakshi [18] developed a FIS combined with a round-robin priority rule to schedule tasks in a cognitive radio network. Chude-Olisah et al. [19] address queue scheduling for packet-switched systems, a vital aspect of congestion control. They propose a fuzzy logic-based decision method for queue scheduling that enforces some level of control over traffic with different quality-of-service requirements using predetermined values. The results of simulation experiments show that their proposed method reduces packet drop, provides good link utilization and minimizes queue delay compared with priority queueing (PQ), first-in-first-out (FIFO) and weighted fair queueing (WFQ). Cho et al. [20, 21] present a method that uses a FIS to dynamically and efficiently schedule priority queues in internet routers. The fuzzy rules obtained minimize the Lyapunov function selected in [21]. Their results, based on simulation experiments, outperform the popular weighted round-robin (WRR) queue scheduling mechanism. López-Santana et al. [1] present a FIS for scheduling tasks in a QS, applied also to QNs in [22]. They show that the proposed FIS obtains better results than traditional scheduling rules such as round robin and equiprobable allocation.
However, the use of artificial intelligence to model QSs is scarce. Azadeh et al. [23] have demonstrated that it can optimize the modeling and simulation of QSs and QNs, since under this scheme system constraints and desired performance objectives can be included, gaining flexibility and the ability to deal with the complexity and nonlinearity associated with the modeling of QSs and QNs.
356 E. López-Santana et al.
In this section we first describe the architecture of ANFIS, and then we show our ANFIS-based approach to task scheduling in a QS.
ANFIS is a flexible approach based on fuzzy logic and artificial neural networks [24]. We present an architecture with two inputs x and y and one output z, based on [24–26]. Suppose the rule base contains two fuzzy if-then rules:
Rule 1: If x is A1 and y is B1, then z1 = p1·x + q1·y + r1.
Rule 2: If x is A2 and y is B2, then z2 = p2·x + q2·y + r2.
Then the membership functions and fuzzy reasoning are illustrated in Fig. 3, and the
corresponding equivalent ANFIS architecture is shown in Fig. 4. The node functions
in the same layer belong to the same function family, as described below:
Layer 1. Every node i in this layer is a square node that computes

Oi1 = μAi (x),

where x is the input to node i and Ai is the linguistic label (e.g., small, large, etc.)
associated with this node function. μAi (x) is a Membership Function (MF) with
a maximum value of 1 and a minimum value of 0. Any continuous and piecewise
differentiable function can be used as a node function in this layer. Parameters in this
layer are referred to as premise parameters.
Layer 2. Every node in this layer is a circle node labeled Π, which multiplies the
incoming signals and sends the product out. For instance,

wi = μAi (x) × μBi (y), i = 1, 2.
Layer 3. Every node in this layer is a circle node labeled N. The ith node computes
the ratio of the ith rule’s firing strength to the sum of all rules’ firing strengths:
w̄i = wi / (w1 + w2), i = 1, 2. (6)
For convenience, outputs of this layer will be called normalized firing strengths.
Layer 4. Every node i in this layer is a square node that computes:

Oi4 = w̄i zi = w̄i (pi x + qi y + ri), (7)
The consequent parameters thus identified are optimal (in the consequent parameter space)
under the condition that the premise parameters are fixed.
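Layers 1–4 above can be traced end to end in a few lines of code. The sketch below evaluates the two-rule Sugeno ANFIS described in this section; the bell-shaped premise parameters and the consequent coefficients (p, q, r) are invented for illustration, not values from the chapter.

```python
def gbellmf(x, a, b, c):
    """Generalized bell-shaped MF: 1 / (1 + |(x - c)/a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(x, y, premise, consequent):
    """Layers 1-4 of the two-rule ANFIS: memberships, firing strengths,
    normalization, and the weighted Sugeno consequents."""
    # Layer 1: Oi1 = mu_Ai(x) and mu_Bi(y)
    mu_a = [gbellmf(x, *premise["A"][i]) for i in (0, 1)]
    mu_b = [gbellmf(y, *premise["B"][i]) for i in (0, 1)]
    # Layer 2: wi = mu_Ai(x) * mu_Bi(y)
    w = [mu_a[i] * mu_b[i] for i in (0, 1)]
    # Layer 3: normalized firing strengths, Eq. (6)
    wbar = [wi / (w[0] + w[1]) for wi in w]
    # Layer 4 summed: wbar_i * (p_i x + q_i y + r_i), Eq. (7)
    return sum(wbar[i] * (p * x + q * y + r)
               for i, (p, q, r) in enumerate(consequent))

# Hypothetical parameters: (a, b, c) per MF and (p, q, r) per rule.
premise = {"A": [(2.0, 2.0, 0.0), (2.0, 2.0, 4.0)],
           "B": [(2.0, 2.0, 0.0), (2.0, 2.0, 4.0)]}
consequent = [(1.0, 1.0, 0.0), (2.0, 0.5, 1.0)]
z = anfis_forward(1.0, 2.0, premise, consequent)
```

Since the output is a convex combination of the two rule consequents, it always lies between them; training adjusts the premise and consequent parameters rather than this structure.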
López-Santana et al. [1] present the task scheduling problem in QSs and propose a
fuzzy inference system whose output is the cycle time (W ) and whose inputs are
the queue's length (Ql) and the utilization (u), based on Kingman's equation:

W = V U T, (9)

where V refers to the variability in the system, U is the utilization, and T is the time.
Likewise, Little's Law in Eq. (1) states that W depends on L. Thus, we have
two equations to determine the cycle time W . The authors used a Mamdani FIS
for the fuzzification and defuzzification interfaces. As input, their approach uses
membership functions defined by experts or users of the system.
In this paper, we present an alternative process based on the ANFIS approach. Our
solution does not need the setting of an MF for the output; our method uses a set of
training data to build this MF. ANFIS integrates neural network and fuzzy logic
principles to train a Sugeno system using neuro-adaptive learning, as described in the
previous section. Figure 5 presents the architecture of the proposed ANFIS approach. In the
following, we describe the inputs, output, method, and performance measure.
Inputs. The inputs of the ANFIS are: u, the average utilization of all servers;
Ql, the average queue length of all servers; MFi, the number of membership
functions of each input i ∈ {u, Ql}; μi, the type of membership function of each
input i ∈ {u, Ql}; and N, the number of epochs.
Output. The output of the ANFIS is W , the estimated cycle time of the station or step.
Method. The ANFIS is given by:
In a similar way to the FIS proposed by [1], our ANFIS is evaluated in a simulation
whenever a customer arrives to the process, and the server l ∗ to attend a specific customer
is determined as the server with the minimum value of Wl over all l ∈ {1, 2, . . . , s}.
Equation (4) states this method.
Finally, the proposed ANFIS is applied to each station in a QN, but the training
data take the effect of the whole network into account, i.e., they consider the different
flows as inputs for every station.
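The allocation rule just described, sending the arriving customer to the server with the smallest predicted cycle time, reduces to an argmin. A minimal sketch, where `predict_W` is a hypothetical stand-in for the trained ANFIS evaluation:

```python
def select_server(states, predict_W):
    """Return the index l* minimizing the predicted cycle time W_l.

    states: list of (utilization, queue_length) pairs, one per server.
    predict_W: callable (u, Ql) -> estimated cycle time (e.g. a trained ANFIS).
    """
    predictions = [predict_W(u, ql) for (u, ql) in states]
    return min(range(len(predictions)), key=predictions.__getitem__)

# Toy stand-in for the ANFIS: cycle time grows with utilization and queue length.
best = select_server([(0.9, 5), (0.4, 1), (0.7, 3)],
                     lambda u, ql: 1.0 + 2.0 * u + 0.5 * ql)
# best == 1: the least-loaded server
```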
4 Results
Figure 6 presents the prototype for a system with 4 servers, each with its own queue,
a single class of customer, infinite capacity, and a rework probability of 20%.
The queue discipline is FIFO (First In, First Out).
To set the parameters of the ANFIS model, we use as training data the results of a sin-
gle simulation of 1000 time units using the round-robin scheduling policy reported by
[1]. The QS is set as G/G/4, where the inter-arrival time follows a uniform distribu-
tion between 0.5 and 1 min, the service time follows a uniform distribution between
1.5 and 3.0 min, and the rework probability is 20%. The input of the ANFIS consists, for
utilization (u) and queue's length (Ql), of 3 membership functions (MFu = 3) of type
μu = gbellmf (generalized bell-shaped membership function). The number of
epochs is N = 50.
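As a rough illustration of how such training data can be produced, the sketch below simulates a G/G/4 system with the stated distributions, 20% rework, and round-robin assignment, collecting per-job cycle times. It is a simplified stand-in for the simulation of [1], not a reproduction of it: rework is modelled as an immediate new round-robin pass through a server.

```python
import random

def simulate_g_g_4(horizon=1000.0, servers=4, p_rework=0.2, seed=7):
    """Round-robin G/G/4 with rework; returns (mean utilization, cycle times)."""
    random.seed(seed)
    free_at = [0.0] * servers      # when each server next becomes idle
    busy = [0.0] * servers         # accumulated busy time per server
    cycles, rr, t = [], 0, 0.0
    while True:
        t += random.uniform(0.5, 1.0)          # inter-arrival time (min)
        if t >= horizon:
            break
        ready = t                              # job becomes available
        while True:
            s = rr % servers                   # round-robin server choice
            rr += 1
            start = max(free_at[s], ready)
            service = random.uniform(1.5, 3.0) # service time (min)
            free_at[s] = start + service
            busy[s] += service
            ready = free_at[s]
            if random.random() >= p_rework:    # job leaves with prob 0.8
                break
        cycles.append(ready - t)               # cycle time W of this job
    util = sum(busy) / (servers * max(max(free_at), horizon))
    return util, cycles

util, cycles = simulate_g_g_4()
```

Pairing each recorded cycle time with the utilization and queue length observed at the arrival instant would yield (u, Ql, W) training triples for the ANFIS.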
Fig. 7 Results of training data of proposed ANFIS for Example 1 a training error, b training data
versus FIS output
Figure 7 shows the results of the training error (a) and the training data versus the FIS output
(b). The results indicate a small error that decreases as the epochs increase, and the fit
of the ANFIS output is good. In this case, no set of rules is defined by the
user. Figure 8 shows the results of the ANFIS approach based on the training data: the
rule-based system in graph (a) and its response surface in graph (b). The response
surface indicates that as the utilization and queue's length increase, the cycle time also
increases, which agrees with the results of [1].
In order to compare the performance of our ANFIS-based approach, we consider
a round robin scheduling policy, which allocates the servers in a sequential way;
an equiprobable policy, which allocates any server with the same probability; and
the FIS approach proposed by [1].
Scheduling in Queueing Systems and Networks Using ANFIS 361
Fig. 8 Training results of proposed ANFIS for Example 1 a rule base system, b response surface
Figures 9 and 10 show the results for the utilizations and queue's length for all
servers, respectively. Assuming the mean values of the interarrival (ta ) and service (ts )
times and exponential distributions, the theoretical utilization without rework is given by
ts /(m ta ), where m is the number of servers. For the example, the theoretical utilization
is 0.75 for the system. In our example's results from a single simulation run, the
utilizations converge to 0.80 on average for all servers; the increase is due to the rework.
However, all scheduling policies converge to the same value in steady state. The FIS and
ANFIS approaches converge faster than the round robin and equiprobable policies. With
respect to the queue's length, the round robin policy attains the shortest queue, followed
by our ANFIS approach and then the FIS approach. The equiprobable policy shows the
lowest performance. The utilization results show that ANFIS converges to the minimum
value of approximately 0.79, which is lower than the FIS and round robin policies.

Fig. 9 Example 1's results of utilizations case G/G/4 with rework for a round robin, b equiprobable,
c FIS ([1]) approach, and d ANFIS approach

Fig. 10 Example 1's results of queue's length case G/G/4 with rework for a round robin, b equiprobable,
c FIS ([1]) approach, and d ANFIS approach
These results confirm the rapid response of our ANFIS approach compared with
the traditional policies, and show that it performs better than or equally to the FIS
approach proposed by [1]. What distinguishes the FIS and ANFIS approaches is that
they permanently check the system's status, i.e., they are condition-based scheduling
policies.
Figure 11 presents the structure of the QN of Example 2, with three stations, based
on [22]. The first station has 4 servers, an external input, and a rework probability of 0.2.
Its outputs go to the second and third stations with probabilities 0.3 and 0.5, respectively.
The second station has 3 servers, an additional external input, and a rework probability
of 0.15; its outputs go to the first and third stations with probabilities 0.4 and 0.45,
respectively. The third station has 4 servers and rework, and its outputs exit the
QN. Each station is a G/G/s system. For the first input at the first station, the inter-arrival
times follow a uniform probability density function U(0.4, 1.5); for the second station,
the probability density function is U(1.5, 3.5). For the service times, the probability
density functions are U(1.5, 2.5), U(1.7, 2.7) and U(2, 2.8) for the first, second and
third stations, respectively. Figure 12 shows the Matlab prototype for Example 2.
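The effective arrival rate at each station can be obtained from the standard traffic equations λ = α + Pᵀλ using the routing probabilities above. The sketch below solves them in pure Python; note that the chapter does not state the third station's rework probability, so `p3_rework` is a placeholder assumption.

```python
def qn_arrival_rates(p3_rework=0.2):
    """Solve the traffic equations for the three-station network described
    above. External rates: 1/E[U(0.4, 1.5)] into station 1 and
    1/E[U(1.5, 3.5)] into station 2; p3_rework is an assumed value."""
    a1 = 1.0 / ((0.4 + 1.5) / 2.0)   # external arrival rate, station 1
    a2 = 1.0 / ((1.5 + 3.5) / 2.0)   # external arrival rate, station 2
    # lam1 = a1 + 0.2*lam1 + 0.4*lam2 ;  lam2 = a2 + 0.3*lam1 + 0.15*lam2
    # Rearranged:  0.8*lam1 - 0.4*lam2 = a1 ;  -0.3*lam1 + 0.85*lam2 = a2
    det = 0.8 * 0.85 - (-0.4) * (-0.3)      # Cramer's rule on the 2x2 system
    lam1 = (0.85 * a1 + 0.4 * a2) / det
    lam2 = (0.8 * a2 + 0.3 * a1) / det
    # Station 3 receives 0.5*lam1 + 0.45*lam2 plus its own rework loop.
    lam3 = (0.5 * lam1 + 0.45 * lam2) / (1.0 - p3_rework)
    return lam1, lam2, lam3

lam1, lam2, lam3 = qn_arrival_rates()
```

These rates could be used, for instance, to sanity-check the utilizations observed in the simulation.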
To set the parameters of the ANFIS model for the QN, we use as training data the
results of a single simulation of the QN over 500 time units using the round-robin
scheduling policy reported by [22]. In addition, the input of the ANFIS consists of
Fig. 13 Example 2’s results of training data of proposed ANFIS for station 1 a training error, b
training data versus FIS output
Fig. 14 Example 2’s results of training data of proposed ANFIS for station 2 a training error, b
training data versus FIS output
Fig. 15 Example 2’s results of training data of proposed ANFIS for station 3 a training error, b
training data versus FIS output
Fig. 16 Training results of proposed ANFIS for station 1 a rule base system, b response surface
Fig. 17 Training results of proposed ANFIS for station 2 a rule base system, b response surface
Figures 16, 17 and 18 illustrate the FIS obtained as a result of the ANFIS method
based on the training data for stations 1 to 3, respectively: the rule-based system in
graph (a) and its response surface in graph (b). The response surfaces indicate that as
the utilization and queue's length increase, the cycle time also increases, which is
coherent with the results of Example 1 and the results presented by [1, 22].
In a similar way to Example 1, we compare the performance of our ANFIS-based
approach with the round robin scheduling policy, which allocates the servers in a
sequential way; the equiprobable policy, which allocates any server with the same
probability; and the FIS approach proposed by [1]. We run a simulation of 500 time
units with a warm-up time of 100 min to remove the transient condition.
Figures 19 and 20 present the results for the utilizations and queue's length,
respectively, for the equiprobable scheduling policy. Subfigures (a), (b) and (c) show the
results for stations 1, 2 and 3, respectively. The results exhibit the evolution of the
utilizations over time, where it can be observed that the values tend to converge to a
similar value; however, some servers at each station have a high utilization, because
the scheduling policy does not observe the queue's length, which is high as well. This
scheduling policy does not observe the condition of the servers and always assigns the
same work to all of them.
Fig. 18 Training results of proposed ANFIS for station 3 a rule base system, b response surface
The results of the round robin scheduling policy are shown in Figs. 21 and 22, for the
utilizations and queue's length, respectively. Subfigures (a), (b) and (c) present the
results for stations 1, 2 and 3, respectively. As with the equiprobable scheduling
policy, the results are shown for each station and all servers. In this case, the results
show that the utilization of all servers at each station converges to a similar value
and that the queue's length is low compared with the equiprobable results. The queue's
length is shorter than under the equiprobable policy at all times; however, this policy does
not take the condition of the station into account, and if a breakdown occurs the allocation
remains the same for all jobs to be processed.
Figures 23 and 24 illustrate the results for the utilizations and queue's length,
respectively, for the FIS scheduling policy. Subfigures (a), (b) and (c) present the results
for stations 1, 2 and 3, respectively. The utilizations over time converge to a
Fig. 22 Results of queue's length for round robin scheduling policy of Example 2
similar value, and the queue's length is low at all times. This scheduling policy
considers the condition of the servers and always assigns the work according to the
minimum value of the cycle time computed with the proposed FIS.
Fig. 24 Results of queue's length for FIS proposed scheduling policy of Example 2
Figures 25 and 26 present the results for the utilizations and queue's length, respec-
tively, for the proposed ANFIS scheduling policy. Subfigures (a), (b) and (c) show
the results for stations 1, 2 and 3, respectively. We can observe that the utilizations
are similar to the FIS results over time and converge to a similar value.
Regarding the queue's length, the results show a low value at all times, lower than the
FIS results. This scheduling policy also considers the condition of each server and
always allocates a customer according to the minimum value of the cycle time computed
with the proposed ANFIS. The results obtained with FIS and ANFIS are better than
those of the round robin and equiprobable policies.
Finally, the results are consistent with those reported by [1, 22]. Moreover, the FIS
and ANFIS approaches converge faster than the traditional policies for all stations.
In addition, under the equiprobable policy the utilization and queue's length have a high
variability for all stations, while under the round robin, FIS and ANFIS policies the results
converge to the same value for all stations.
5 Concluding Remarks
This paper studies the problem of scheduling customers or tasks in queueing sys-
tems and queueing networks, which consists in deciding which server processes each
customer. We propose a method to schedule the customers through the stations
based on an ANFIS approach that consists in selecting the server according
to a cycle time estimated with a FIS that uses the utilization and the
queue's length as inputs. Traditional scheduling policies work with different rules such as
round robin, equiprobable, and shortest queue, among others.

Fig. 26 Results of queue's length for ANFIS proposed scheduling policy of Example 2
References
1. López-Santana, E.R., Franco, C., Figueroa-Garcia, J.C.: A Fuzzy inference system to schedul-
ing tasks in queueing systems. In: Huang, D.-S., Hussain, A., Han, K., Gromiha, M.M. (eds.)
Intelligent Computing Methodologies, pp. 286–297. Springer International Publishing AG
(2017)
2. Yang, F.: Neural network metamodeling for cycle time-throughput profiles in manufacturing.
Eur. J. Oper. Res. 205, 172–185 (2010). https://doi.org/10.1016/j.ejor.2009.12.026
3. Hopp, W.J., Spearman, M.L.: Factory Physics—Foundations of Manufacturing Management.
Irwin/McGraw-Hill (2011)
4. Lopez-Santana, E., Mendez-Giraldo, G., Figueroa-García, J.C.: An ANFIS-based approach to
scheduling in queueing systems. In: 2nd International Symposium on Fuzzy and Rough Sets
(ISFUROS 2017), pp. 1–12. Santa Clara, Cuba (2017)
5. Ross, S.: Introduction to Probability Models. Academic Press (2006)
6. Hillier, F.S., Lieberman, G.J.: Introduction to Operations Research. McGraw-Hill Higher Edu-
cation (2010)
7. Kendall, D.G.: Stochastic processes occurring in the theory of queues and their analysis by the
method of the imbedded Markov Chain. Ann. Math. Stat. 24, 338–354 (1953). https://doi.org/
10.1214/aoms/1177728975
8. Little, J.D.C.: A proof for the queuing formula: L = λ W. Oper. Res. 9, 383–387 (1961). https://
doi.org/10.1287/opre.9.3.383
9. Little, J.D.C., Graves, S.C.: Little’s law. In: Chhajed, D., Lowe, T.J. (eds.) Building Intuition:
Insights From Basic Operations Management Models and Principles, pp. 81–100. Springer,
Boston, MA (2008)
10. López-Santana, E.R., Méndez-Giraldo, G.A.: A knowledge-based expert system for scheduling
in services systems. In: Figueroa-García, J.C., López-Santana, E.R., Ferro-Escobar, R. (eds.)
Applied Computer Sciences in Engineering WEA 2016, pp. 212–224. Springer International
Publishing AG (2016)
11. Terekhov, D., Down, D.G., Beck, J.C.: Queueing-theoretic approaches for dynamic scheduling:
a survey. Surv. Oper. Res. Manag. Sci. 19, 105–129 (2014). https://doi.org/10.1016/j.sorms.
2014.09.001
12. Pinedo, M.L.: Planning and Scheduling in Manufacturing and Services. Springer (2009)
13. López-Santana, E.: Review of scheduling problems in service systems (2018)
14. Baldwin, R.O., Davis IV, N.J., Midkiff, S.F., Kobza, J.E.: Queueing network analysis: concepts,
terminology, and methods. J. Syst. Softw. 66, 99–117 (2003). https://doi.org/10.1016/S0164-
1212(02)00068-7
15. Jain, M., Maheshwari, S., Baghel, K.P.S.: Queueing network modelling of flexible manufac-
turing system using mean value analysis. Appl. Math. Model. 32, 700–711 (2008). https://doi.
org/10.1016/j.apm.2007.02.031
16. Cruz, F.R.B.: Optimizing the throughput, service rate, and buffer allocation in finite queueing
networks. Electron. Notes Discret. Math. 35, 163–168 (2009). https://doi.org/10.1016/j.endm.
2009.11.028
17. Yang, F., Liu, J.: Simulation-based transfer function modeling for transient analysis of general
queueing systems. Eur. J. Oper. Res. 223, 150–166 (2012). https://doi.org/10.1016/j.ejor.2012.
05.040
18. Suganthi, N., Meenakshi, S.: An efficient scheduling algorithm using queuing system to min-
imize starvation of non-real-time secondary users in cognitive radio network. Clust. Comput.
1–11 (2018). https://doi.org/10.1007/s10586-017-1595-8
19. Chude-Olisah, C.C., Chude-Okonkwo, U.A.K., Bakar, K.A., Sulong, G.: Fuzzy-based dynamic
distributed queue scheduling for packet switched networks. J. Comput. Sci. Technol. 28,
357–365 (2013). https://doi.org/10.1007/s11390-013-1336-2
20. Cho, H.C., Fadali, M.S., Lee, H.: Dynamic queue scheduling using fuzzy systems for
internet routers. In: The 14th IEEE International Conference on Fuzzy Systems, FUZZ’05,
pp. 471–476. IEEE (2005)
21. Cho, H.C., Fadali, M.S., Lee, J.W., Lee, Y.J., Lee, K.S.: Lyapunov-based fuzzy queue schedul-
ing for internet routers. Int. J. Control Autom. Syst. 5, 317–323 (2007)
22. López-Santana, E.R., Franco-Franco, C., Figueroa-García, J.C.: Simulation of fuzzy infer-
ence system to task scheduling in queueing networks. In: Communications in Computer and
Information Science, pp. 263–274 (2017)
23. Azadeh, A., Faiz, Z.S., Asadzadeh, S.M., Tavakkoli-Moghaddam, R.: An integrated artificial
neural network-computer simulation for optimization of complex tandem queue systems. Math.
Comput. Simul. 82, 666–678 (2011). https://doi.org/10.1016/j.matcom.2011.06.009
24. Geethanjali, M., Raja Slochanal, S.M.: A combined adaptive network and fuzzy inference
system (ANFIS) approach for overcurrent relay system. Neurocomputing 71, 895–903 (2008).
https://doi.org/10.1016/j.neucom.2007.02.015
25. Jang, J.-S.R.: ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man
Cybern. 23, 665–685 (1993). https://doi.org/10.1109/21.256541
26. López-Santana, E.R., Méndez-Giraldo, G.A.: A non-linear optimization model and ANFIS-
based approach to knowledge acquisition to classify service systems. In: Huang, D.-S., Bevilac-
qua, V., Premaratne, P. (eds.) Intelligent Computing Theories and Application, pp. 789–801.
Springer International Publishing (2016)
Genetic Fuzzy System for Automating
Maritime Risk Assessment
Abstract This chapter uses genetic fuzzy systems (GFS) to assess the risk level of
maritime vessels transmitting Automatic Identification System (AIS) data. Previous
risk assessment approaches based on fuzzy inference systems (FIS) relied on domain
experts to specify the FIS membership functions as well as the fuzzy rule base
(FRB), a burdensome and time-consuming process. This chapter aims to alleviate
this burden by learning the membership functions and FRB for the FIS of an existing
Risk Management Framework (RMF) directly from data. The proposed methodology
is tested with four different case studies in maritime risk analysis. Each case study
concerns a unique scenario involving a particular region: the Gulf of Guinea, the Strait
of Malacca, the Northern Atlantic during a storm, and the Northern Atlantic during
a period of calm seas. The experiments compare 14 GFS algorithms from the KEEL
software package and evaluate the resulting FRBs according to their accuracy and
interpretability. The results indicate that IVTURS, LogitBoost, and NSLV generate
the most accurate rule bases while SGERD, GCCL, NSLV, and GBML each generate
interpretable rule bases. Finally, IVTURS, NSLV, and GBML algorithms offer a
reasonable compromise between accuracy and interpretability.
1 Introduction
1 http://www.site.uottawa.ca/~rfalc032/isfuros2017/.
2 Related Work
This section briefly reviews some relevant works along maritime risk analysis and
genetic fuzzy systems.
The purpose of risk assessment is to refine the situational picture for an operator
and/or decision maker. Following this, the goal is to recommend courses of action to
mitigate the identified and assessed risks. ISO 8402:1995/BS 4778 defines risk as:
“A combination of the probability, or frequency, of occurrence of a defined hazard
and the magnitude of the consequences of the occurrence” which closely follows the
International Maritime Organization (IMO) definition [5]. An effective risk manage-
ment strategy must involve the following actions: identify (to be aware of the present
hazards), review (to assess the risk associated with those hazards), control (to reduce
the risks that are not supportable), and review (to monitor the effectiveness of the
controls) [6].
Other projects that deal with risk detection, risk analysis, and risk management
within maritime settings include:
• A Risk Management Framework (RMF) for the risk-driven multi-criteria decision
analysis (MCDA) of various maritime situations, including the automatic genera-
tion of responses to incidents such as a Vessel in Distress (VID) [3, 7]
• Raytheon’s ATHENA Integrated Defense System (IDS) [8], which is designed to
search for suspicious behaviours in search-and-rescue situations
• The Predictive Analysis for Naval Deployment Activities (PANDA) [9] case-based
reasoning system that uses contextual-based risk assessment that relies on a human-
generated risk ontology
• The Maritime Automated Super Track Enhanced Reporting (MASTER) integra-
tive reporting project based on the Joint Capability Technology Demonstration
(JCTD) and the Comprehensive Maritime Awareness (CMA) [10].
Fuzzy Inference Systems (FISs) use fuzzy logic to map input features to class outputs.
Typically, FISs rely on fuzzy membership functions, which map numerical inputs to degrees
of membership in linguistic variables modelled as fuzzy sets, along with fuzzy rule
bases, to accomplish this. The two most common types of FISs are Mamdani [11] and
Sugeno [12]. The main difference between them is that the consequents of Mamdani
FIS rules are fuzzy sets, whereas in Sugeno FISs the rule consequents are polynomial
expressions. Both types of FISs provide a numerical output back to the user, which
reflects the decision variable of interest in the problem under consideration.
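The contrast between the two consequent styles can be made concrete. The sketch below evaluates a single rule both ways: a fuzzy-set consequent defuzzified by centroid (Mamdani style) versus a crisp polynomial consequent (Sugeno style). All membership parameters and coefficients are invented for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mamdani_one_rule(firing, a=0.0, b=5.0, c=10.0, steps=1000):
    """Mamdani style: clip the consequent fuzzy set tri(a, b, c) at the
    rule's firing strength, then defuzzify by centroid over [a, c]."""
    xs = [a + (c - a) * i / steps for i in range(steps + 1)]
    mus = [min(firing, tri(x, a, b, c)) for x in xs]
    num = sum(x * m for x, m in zip(xs, mus))
    den = sum(mus)
    return num / den if den else 0.0

def sugeno_one_rule(x1, x2, p=1.0, q=2.0, r=0.5):
    """Sugeno style: the consequent is a crisp polynomial of the inputs
    (with a single rule the normalized firing strength is 1)."""
    return p * x1 + q * x2 + r

out_m = mamdani_one_rule(0.6)   # centroid of the clipped fuzzy set
out_s = sugeno_one_rule(1.0, 2.0)
```

With a symmetric consequent set, the Mamdani centroid sits at the peak regardless of the firing strength, while the Sugeno output varies directly with the inputs.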
376 A. Teske et al.
Introduced in 1992 with the publication of [13], GFSs are computational models
for automatically learning the FIS membership functions’ parameters directly from
data. In that work, a genetic algorithm (GA) is used to optimize the parameters of
the FIS, with the objective of finding membership function parameters that emulate
a known fuzzy logic controller. This first version of GFS is technically considered
an example of reinforcement learning.
The same year (i.e. 1992) saw the introduction of the Michigan approach for
GFSs [14]. The Michigan approach typically optimizes the FIS rule base. Each
individual in the genetic population represents a single rule, and the entire population
represents the rule base. This introduces a fascinating contradiction. In GA terms,
the individuals in the population are competing with each other to survive based on
the natural selection principles that GAs are built upon. Yet from the FIS perspective,
the individuals in the population are cooperating together to collectively form a good
rule base. Therefore the individuals are both competing with and cooperating with
each other, a contradiction that is referred to as the “cooperation versus competition”
problem [15].
The Pittsburgh approach to GFSs was introduced in [16]. This approach is suit-
able for optimizing the FIS rule base and/or membership functions. Each individual
in the population encodes the entire set of rules and/or membership functions, and
the population is a set of candidate rule base/membership functions. This scheme
implies that the individuals in the population are competing against each other and
not cooperating with each other, which resolves the “cooperation vs competition
problem” seen with the Michigan approach. The drawback of this method is that the
individuals contain much more information, which drastically increases the size of
the search space. This can make it difficult to obtain optimal solutions.
The third common family of GFSs are known as Iterative Rule Learning (IRL)
approaches. As with the Michigan approach, the IRL approach models each individ-
ual as a single rule. However, only the best rule from each iteration is added to the
population, with subsequent iterations generating rules to complement the already-
established ones. The IRL approach addresses the “cooperation versus competition
problem” by dividing the cooperation and competition into two different phases: the
individuals compete within each iteration, and cooperation occurs as rules are added
to the final rule base.
Since their inception, GFSs have been extensively studied [17] and applied to
a wide variety of domains, including medicine [18, 19], finance [20, 21], indus-
trial/manufacturing applications [22, 23], and many others. Figure 1 shows the architecture
of a typical GFS. For further reading, [24] is a recent survey of the state of the art of
GFSs.
In order to apply genetic fuzzy systems to maritime risk assessment (MRA), we model
the latter as a classification problem. The input features describing each AIS-reporting
vessel are a set of risk features, i.e. numeric attributes in the range [0, 1] that quantify
the extent of a particular risk for the vessel. The decision classes represent the overall
risk assessment which can take a value from the set {LOW-RISK, MEDIUM-RISK,
HIGH-RISK}.
The RMF’s Risk Feature Extraction module [2] is used to calculate the following
four risk features for each AIS contact:
Vessels navigating over open oceans may encounter weather conditions that threaten
the safety of the vessel’s crew, passengers, and cargo. Several aspects of weather could
potentially pose a threat including visibility, ice conditions, currents, etc. However,
the single most important weather factor that impacts risk is wave height originating
from wind and swell [25]. Therefore, we model weather risk by mapping the wave
height to the “high weather risk” linguistic term with a trapezoidal membership
function with a = 1.25 m, b = 14 m, c = d = INF. This configuration is inspired by
the World Meteorological Organization sea state code,2 according to which waves
are “moderate” at 1.25 m, “rough” by 2.5 m, etc.
2 https://www.nodc.noaa.gov/woce/woce_v3/wocedata_1/woce-uot/document/wmocode.htm.
Vessels navigating near one another run the risk of colliding. We calculate this risk
feature as a function of each vessel’s distance to the nearest ship. This is mapped to
a trapezoidal membership function with a = b = 0 m, c = 150 m, d = 926 m. This
configuration is inspired by Transport Canada’s guidelines on avoiding collisions.3
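Both risk features above use the same trapezoidal shape, so a single helper covers them. The parameter values below are the ones quoted in the text; using `float("inf")` handles the open-ended weather case.

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership: 0 up to a, rising to 1 at b, flat to c,
    then falling to 0 at d. Degenerate edges (a == b or c == d) are
    handled by the branch order, avoiding division by zero."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    if x <= c:
        return 1.0                 # plateau
    return (d - x) / (d - c)       # falling edge

INF = float("inf")
# "High weather risk" from wave height: a = 1.25 m, b = 14 m, c = d = INF.
weather_risk = lambda wave_height_m: trapmf(wave_height_m, 1.25, 14.0, INF, INF)
# Collision risk from distance to nearest ship: a = b = 0 m, c = 150 m, d = 926 m.
collision_risk = lambda nearest_ship_m: trapmf(nearest_ship_m, 0.0, 0.0, 150.0, 926.0)
```

Note that the two features use the trapezoid in opposite directions: weather risk rises with wave height, while collision risk falls as the nearest ship gets farther away.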
Certain regions of the world tend to see hostile activity by bad actors such as pirates.
We refer to these activities as maritime incidents, which serve as the basis for calcu-
lating a regional hostility risk factor. It is defined on the basis of three indicators as
follows:
• Mean Incident Proximity (MIP): As in [26], MIP is the mean distance to the n
nearest incidents within max_distance km of the vessel. The distance is mapped
to risk values via a trapezoidal membership function with a = b = 0 km, c = 370.4
km, d = 740.8 km. These parameter values make the MIP metric fairly sensitive
to the presence of maritime incidents.
• Mean Incident Severity (MIS): Following [26], MIS is calculated as the mean
severity of the n nearest incidents within max_distance of the vessel. The incident
severities are given in Table 1.
• Vessel Pertinence Index (VPI): As in [26], VPI is the maximum relevance of
the n nearest incidents within max_distance of the vessel. The similarities of the
vessel categories are given in Table 2.
Then the overall regional hostility is calculated as the weighted sum αMIP +
βMIS + γVPI, with previously suggested values α = 0.4, β = 0.3, γ = 0.3 [26].
3 https://www.tc.gc.ca/eng/marinesafety/tp-tp14070-3587.htm.
Since the MIP trapezoidal membership function parameters suggested in [26] rarely
lead to risk values above 0, we use the above-mentioned values. Finally, we use
max_distance = 1000 km.
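The aggregation itself is a one-liner; the sketch below assumes the three indicators have already been mapped to risk values in [0, 1] as described above.

```python
def regional_hostility(mip, mis, vpi, alpha=0.4, beta=0.3, gamma=0.3):
    """Weighted sum alpha*MIP + beta*MIS + gamma*VPI from [26]; each
    indicator is expected to be a risk value already scaled to [0, 1],
    so the result also lies in [0, 1] when the weights sum to 1."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9, "weights should sum to 1"
    return alpha * mip + beta * mis + gamma * vpi
```

For example, a vessel with moderate incident proximity (MIP = 0.5), maximum incident severity (MIS = 1.0), and no category relevance (VPI = 0.0) scores 0.4 · 0.5 + 0.3 · 1.0 = 0.5.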
This risk feature measures the potential impact of a disaster involving the vessel. For example,
vessels which carry hazardous material or many passengers would have a high degree of
distress. Based on the data available to us, we calculate the degree of distress as a
combination of the following indicators:
• Environment Risk: The potential impact to the environment as a result of this
vessel capsizing. The mapping from vessel type to environment risk is given in
Table 3.
• Risk of Attack: The Risk of Attack accounts for the probability of the vessel being
attacked based on its category (e.g. if most of the reported maritime incidents
correspond to cargo vessels, then this ship type has high Risk of Attack). Unlike
the VPI, this probability is based on all reported incidents in a given time period,
not just the n closest ones. It is calculated with the formula:
Risk of Attack(X) = P(X.Category | I) [26],

where P(X.Category | I) is the fraction of the total number of incidents in which
the vessel’s category is involved.
In [26], the Degree of Distress risk feature also included a “number of people
on board” and a “fuel level” component. However, to the best of our knowledge
there is no readily available data source for these data. Therefore we exclude these
components of Degree of Distress from this work.
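The conditional probability above is just a frequency count over the incident list. A sketch, with an invented incident log for illustration:

```python
from collections import Counter

def risk_of_attack(vessel_category, incidents):
    """P(X.Category | I): fraction of all reported incidents whose victim
    vessel belongs to the given category."""
    if not incidents:
        return 0.0
    counts = Counter(incident["category"] for incident in incidents)
    return counts[vessel_category] / len(incidents)

# Hypothetical incident log: three cargo attacks and one tanker attack.
incidents = [{"category": "cargo"}] * 3 + [{"category": "tanker"}]
# risk_of_attack("cargo", incidents) -> 0.75
```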
For each set of risk values, a ground truth overall risk level is assigned to train the
GFS. In this work we use a simple heuristic to generate the ground truth, but in
practice the ground truth could be determined by consulting a domain expert. Our
simple heuristic first discretizes each risk value according to the following scheme:
    Risk Value = LOW-RISK,     if the risk value is in [0, a)
                 MEDIUM-RISK,  if the risk value is in [a, b)
                 HIGH-RISK,    if the risk value is in [b, 1]
We use a = 0.4 and b = 0.7. Each risk feature in the RMF’s Risk Assessment Module
(see Sect. 3.1) will be modelled by the three aforementioned linguistic terms. The
calculation of the overall risk level proceeds as follows:
    Overall Risk = HIGH-RISK,    if at least one risk value is HIGH-RISK
                                 or at least two risk values are MEDIUM-RISK
                   LOW-RISK,     if all risk features are LOW-RISK
                   MEDIUM-RISK,  otherwise
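The two-step heuristic is mechanical enough to state directly in code; the thresholds a = 0.4 and b = 0.7 are the values used in the text.

```python
def discretize(risk_value, a=0.4, b=0.7):
    """Map a risk value in [0, 1] to a linguistic label."""
    if risk_value < a:
        return "LOW-RISK"
    return "MEDIUM-RISK" if risk_value < b else "HIGH-RISK"

def overall_risk(risk_values, a=0.4, b=0.7):
    """Ground-truth heuristic: HIGH if any feature is HIGH or at least
    two are MEDIUM; LOW if all features are LOW; MEDIUM otherwise."""
    labels = [discretize(v, a, b) for v in risk_values]
    if "HIGH-RISK" in labels or labels.count("MEDIUM-RISK") >= 2:
        return "HIGH-RISK"
    if all(label == "LOW-RISK" for label in labels):
        return "LOW-RISK"
    return "MEDIUM-RISK"
```

For instance, a vessel with one MEDIUM-RISK feature and three LOW-RISK features is labelled MEDIUM-RISK overall, while two MEDIUM-RISK features already push it to HIGH-RISK.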
cargo ships, 44.7% were tankers, and 7.8% were utility vessels. Weather conditions
in the AOI/POI were mild.
The second AOI concerns the Strait of Malacca (min latitude = −4, max latitude
= 9, min longitude = 92, max longitude = 110), with POI January 1 2018 00:00:00–
January 1 2018 23:59:59. Not only is the Strait of Malacca one of the world’s busiest
maritime traffic lanes, it is also one of the narrowest: 1.5 nautical miles at its narrowest
point. This, combined with the steady growth of traffic within the strait, makes it a
potentially dangerous area to navigate. Indeed, 60 ship accidents were reported to
the Maritime and Port Authority of Singapore in 2015 [27]. Additionally, 37 maritime
incidents occurred in the AOI in 2017 (Fig. 4). Finally, weather conditions were mild
in the AOI/POI.
382 A. Teske et al.
The third and fourth scenarios each concern the same AOI: a northern stretch
of the Atlantic ocean (min latitude = 35, max latitude = 60, min longitude = −50,
max longitude = 0) with two different POIs: January 1 2018 00:00:00–January 1
2018 23:59:59 (“Atlantic Storm” scenario) and January 13 2018 00:00:00–January
13 2018 23:59:59 (“Atlantic No-Storm” scenario). The Atlantic Storm scenario takes
place during a harsh weather event in the Atlantic (Fig. 5). In the Atlantic No-Storm
scenario the weather is much milder (Fig. 6). No piracy activity was recorded in this
region in 2017.
The data for our experiments originates from the following sources:
AIS Data from Orbcomm.4 We make use of two full days of AIS data from Orbcomm
(i.e. January 1 2018 and January 13 2018), sampling these datasets as specified in
Sect. 5.1. Among the fields available in the AIS messages, we make use of latitude,
longitude, and ship type.
4 https://www.orbcomm.com/.
Genetic Fuzzy System for Automating Maritime Risk Assessment 383
Weather Data from the National Oceanic and Atmospheric Administration's (NOAA)
WaveWatch III archive.5 NOAA provides various weather forecasts in the
GRIdded Binary (GRIB) file format [28], a file format for reporting meteorological
data in a grid. We make use of NOAA’s global wave height GRIB files.
5 ftp://polar.ncep.noaa.gov/pub/history/waves.
Maritime Incident Reports from the ICC International Maritime Bureau’s (IMB)
2017 Piracy and Armed Robbery Against Ships Report.6 This report lists maritime
incidents that occur throughout the world in a semi-structured format (see Fig. 7).
For each of the incidents in the 2017 report, we extract the date/time, location, type
of vessel attacked, and type of incident.
5 Experimental Analysis
For each case study mentioned in Sect. 4, we arbitrarily select one AIS message
from 1000 randomly selected vessels. For each of these messages, we keep only the
latitude, longitude, and ship type fields. Each contact is fed to the Risk Management
Framework to determine the local risk values of the four risk features described in
Sect. 3.1, then the ground truth is assigned using the scheme described in Sect. 3.2.
The datasets are fed to the following fourteen KEEL algorithms: AdaBoost, COACH,
GBML, GCCL, GP, GPG, IVTURS, LogitBoost, MaxLogitBoost, NSLV, SGERD,
Slave2, SlaveV0, SP. Table 4 compares the algorithms under consideration and shows
the parameters that we employ for this study.
All experiments were performed on the Windows 10 platform with an i7-3520M
processor and 8GB of RAM. We downloaded the KEEL master branch from source
control,7 to perform the experiments. Each experiment was repeated 30 times using
a different random seed to account for the stochastic nature of the algorithms, and
the average values are reported.
Each resulting FIS was evaluated according to two metrics: accuracy via the well-
known F-measure and interpretability via the “total rule length” metric [29]. The
6 https://www.icc-ccs.org/.
7 https://github.com/SCI2SUGR/KEEL checked out on 01/05/2018.
ideal FRB should have high F-measure and low total rule length. Note that although
we evaluate two objectives, the GFSs we tested are not dual-objective optimization
algorithms; they have the sole objective of maximizing accuracy.
F-Measure is a well-known metric for evaluating the accuracy of classification algo-
rithms. The key advantage of F-Measure over standard accuracy is that F-Measure
takes false positives and false negatives into account, making it especially suitable
for unbalanced datasets. For a two-class problem, it is calculated as:

F = 2 · (precision · recall) / (precision + recall)

where:

precision = tp / (tp + fp)
recall = tp / (tp + fn)
and:
tp = true positive, fp = false positive, fn = false negative
For a multi-class problem, the F-Measure is defined as the average F-Measure for
each class.
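A minimal sketch of these formulas, assuming per-class counts of true positives, false positives, and false negatives are already available:

```python
def f_measure(tp, fp, fn):
    # Two-class F-measure from true/false positive and false negative counts.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def multiclass_f_measure(per_class_counts):
    # Average of the per-class F-measures, as defined in the text.
    scores = [f_measure(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)
```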
Total Rule Length is a useful tool for measuring the complexity of a rule base (RB).
It is defined as the sum of the number of conditions in each rule [29]. This implicitly
takes into account both the number of rules in the RB and the number of conditions
in the rules.
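Assuming each rule is represented as a pair of (antecedent conditions, consequent) — a hypothetical encoding, not KEEL's internal one — the metric reduces to a one-line sum:

```python
def total_rule_length(rule_base):
    # A rule is modelled here as (antecedent_conditions, consequent);
    # total rule length sums the number of conditions over all rules.
    return sum(len(conditions) for conditions, _ in rule_base)
```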
The Friedman test was employed to rank the performance of the algorithms. Fol-
lowing this, the Nemenyi post-hoc test was used to test the statistical significance
between the rankings [30].
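The ranking statistic underlying the Friedman test can be sketched as follows (a simplified illustration that ignores tied scores; for the actual significance tests a statistics package should be used):

```python
def average_ranks(results):
    """Average rank of each algorithm across datasets (1 = best), the
    statistic the Friedman test is computed from. `results[d][a]` is the
    score of algorithm `a` on dataset `d`; ties are ignored in this sketch.
    """
    n_datasets, n_algs = len(results), len(results[0])
    totals = [0.0] * n_algs
    for row in results:
        # Rank algorithms on this dataset, best score first.
        order = sorted(range(n_algs), key=lambda i: row[i], reverse=True)
        for rank, alg in enumerate(order, start=1):
            totals[alg] += rank
    return [t / n_datasets for t in totals]
```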
The Nemenyi tests allow us to arrange the algorithm into tiered groups, i.e. group
“A”, group “B”, group “C”, etc. All of the algorithms in group “A” are statistically
better than the algorithms in group “B” and so on. However, an algorithm can be
placed in more than one group. For example, a group of “AB” indicates that the
statistical test could not confirm that the algorithm is inferior to any of the algorithms
in group “A”, nor could the test confirm that the algorithm is statistically superior to
all of the algorithms in group “B”. Therefore, it may belong to group “A” or to group
“B”.
The results for accuracy are given in Table 5 for the Guinea scenario, Table 6 for
the Malacca scenario, Table 7 for the Atlantic Storm scenario, and Table 8 for the
Atlantic No-Storm scenario. The results for interpretability are given in Table 9 for
the Guinea scenario, Table 10 for the Malacca scenario, Table 11 for the Atlantic
Storm scenario, and Table 12 for the Atlantic No-Storm scenario.
In terms of accuracy, the top performers include IVTURS (A), LogitBoost (AB),
and NSLV (ABC) in the Guinea and Malacca scenarios. For the Atlantic Storm sce-
nario, the top performers are LogitBoost (A), MaxLogitBoost (AB), and IVTURS
(ABC). Finally, in the Atlantic No-Storm scenario the top performers are Logit-
Boost (A), GBML (AB), and IVTURS (ABC). In all of the scenarios, IVTURS and
LogitBoost are each top performers.
In terms of interpretability, SGERD is the top performer in all of the scenarios (A).
GCCL is also a strong contender in the Guinea scenario (B), the Malacca scenario
(AB), the Atlantic Storm scenario (BC), and the Atlantic No-Storm scenario (BC).
The NSLV algorithm performs well in the Guinea (C) and Malacca (C) scenarios but
performs slightly worse in the Atlantic Storm (DE) and Atlantic No-Storm (F) sce-
narios. Finally, GBML has good performance in the Atlantic Storm (B) and Atlantic
No-Storm (B) scenarios although its performance is less impressive in the Guinea
(CD) and Malacca (EF) scenarios.
In Sect. 4, we anticipated that the FRBs generated for each scenario would differ
significantly, corresponding to the unique risk landscape of each case study. To test
this, we measured how frequently each risk feature appeared as an antecedent of a
fuzzy rule.
Table 13 shows the average probability that an antecedent will correspond to a
particular risk feature. Across all of the case studies, the Degree of Distress risk factor
consistently appears in roughly 30% of all conditions. In the Gulf of Guinea scenario,
Regional Hostility (29%) and Collision Factor (21%) are both important risk features,
whereas Weather Factor (17%) plays a slightly lesser role. The risk landscape in the
Strait of Malacca is revealed to be similar to the Gulf of Guinea, although Weather
Factor (16%) is slightly less important while Collision Factor (22%) and Regional
Hostility (32%) are slightly more important. It is surprising that Collision Factor
isn’t much more important in the Strait of Malacca given the vessel congestion in the
AOI. For the two Atlantic scenarios, Regional Hostility (9%) almost never appears in
the rule base. As we would expect, Weather Factor is more important in the Atlantic
Storm (27%) than in the Atlantic No-Storm (21%) and Collision Factor is more
important in the Atlantic No-Storm (40%) than in the Atlantic Storm (34%).
In order to illustrate the difference between a highly accurate and a highly inter-
pretable rule base, we compare a rule base generated by IVTURS to one generated
by SGERD. SGERD generated the following RB:
Clearly the SGERD rule base is far simpler: it has fewer rules and fewer con-
ditions. Indeed, in our experiments SGERD’s rule bases contained an average of
6.75 conditions while IVTURS's rule bases contained an average of 19.47 conditions.
However, this comes at the cost of accuracy: SGERD managed an average accuracy
of 77.5% while IVTURS achieved 96.2%.
6 Conclusions
In this chapter, GFSs have been applied to the problem of assessing the overall risk
level of AIS-reporting maritime vessels. The GFSs automatically learn the rule base
and membership functions for a FIS which assigns each AIS message emitted by a
vessel one of three risk levels (Sect. 3) according to four individual risk values. The
data sources include AIS records, weather reports, and maritime incident reports
from three regions of the world: the North Atlantic, the Gulf of Guinea, and the
Strait of Malacca (Sect. 4).
The datasets were fed to fourteen GFS algorithms via the KEEL framework and
the resulting FRBs were evaluated according to their accuracy (F-measure) and inter-
pretability (total rule length) (Sect. 5.1). The experimental results (Sect. 5.4) indicate
that IVTURS, LogitBoost, and NSLV generate the most accurate rule bases while
SGERD, GCCL, NSLV, and GBML each generate interpretable rule bases. Finally,
IVTURS, NSLV, and GBML algorithms offer a reasonable compromise between
accuracy and interpretability.
We also investigated the structure of the rule bases produced by each algorithm,
noting the prevalence of each risk factor within the rule bases. We saw that the
frequency with which each risk factor appears in the rules characterizes the unique
risk landscape of each AOI (Sect. 5.5).
As future work, we would like to design a more sophisticated scheme for assigning
the ground truth for the AIS messages, to consider additional risk features, as well
as to investigate the feasibility of producing a global rule base that does not depend
on a specific AOI.
Acknowledgements The authors acknowledge the financial support of the Ontario Centres of
Excellence (OCE) and the National Sciences and Engineering Research Council of Canada
(NSERC) for the project entitled “Big Data Analytics for the Maritime Internet of Things”.
References
1. Abielmona, R.: Tackling big data in maritime domain awareness. Vanguard, 42–43 (2013)
2. Falcon, R., Abielmona, R., Nayak, A.: An evolving risk management framework for wireless
sensor networks. In: Proceedings of the 2011 IEEE International Conference on Computational
Intelligence for Measurement Systems and Applications (CIMSA), pp. 1–6, Ottawa, Canada
(2011)
3. Falcon, R., Abielmona, R.: A response-aware risk management framework for search-and-
rescue operations. In: 2012 IEEE Congress on Evolutionary Computation (CEC), pp. 1540–
1547, Brisbane, Australia (2012)
4. Alcalá-Fdez, J., Sánchez, L., García, S., del Jesus, M.J., Ventura, S., Garrell, J.M., Otero, J.,
Romero, C., Bacardit, J., Rivas, V.M., Fernández, J.C., Herrera, F.: KEEL: a software tool to
assess evolutionary algorithms for data mining problems. Soft Comput. 13(3), 307–318 (2009)
5. International Maritime Organization: Guidelines for Formal Safety Assessment (FSA) for use
in the IMO Rule-Making Process (2002)
6. International Association of Classification Societies: A guide to risk assessment in ship oper-
ations (2012)
7. Falcon, R., Desjardins, B., Abielmona, R., Petriu, E.: Context-driven dynamic risk manage-
ment for maritime domain awareness. In: 2016 IEEE Symposium Series on Computational
Intelligence (SSCI), pp. 1–8. IEEE (2016)
8. Friedman, N.: The Naval Institute Guide to World Naval Weapon Systems. Naval Institute
Press (2006)
9. Moore, K.E.: Predictive analysis for naval deployment activities. PANDA BAA, 05-44 (2005)
10. Lim, I., Jau, F.: Comprehensive maritime domain awareness: an idea whose time has come?
In: Defence, Terrorism and Security, Globalisation and International Trade (2007)
11. Mamdani, E.H.: Application of Fuzzy Logic to Approximate Reasoning Using Linguistic
Synthesis
12. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and
control. IEEE Trans. Syst. Man Cybern. SMC-15(1), 116–132 (1985)
13. Karr, C.: Genetic algorithms for fuzzy controllers. AI Expert 6(2), 26–33 (1991)
14. Valenzuela-Rendón, M.: The Fuzzy Classifier System: a Classifier System for Continuously
Varying Variables (1991)
15. Herrera, F., Magdalena, L.: Genetic Fuzzy Systems: A Tutorial, vol. 13, pp. 93–121. Tatra
Mountains Mathematical Publications (1997)
16. Thrift, P.R.: Fuzzy Logic Synthesis with Genetic Algorithms (1991)
17. Herrera, F.: Genetic fuzzy systems: taxonomy, current research trends and prospects. Evol.
Intell. 1(1), 27–46 (2008)
18. Dong, W., Huang, Z., Ji, L., Duan, H.: A genetic fuzzy system for unstable angina risk assess-
ment. BMC Med. Inform. Decis. Mak. 14, 12 (2014)
19. Nouei, M.T., Kamyad, A.V., Sarzaeem, M.R., Ghazalbash, S.: Developing a genetic fuzzy
system for risk assessment of mortality after cardiac surgery. J. Med. Syst. 38(10), 102 (2014)
20. Aznarte, J.L., Alcalá-Fdez, J., Arauzo-Azofra, A., Benítez, J.M.: Financial time series fore-
casting with a bio-inspired fuzzy model. Expert Syst. Appl. 39(16), 12302–12309 (2012)
21. Liu, C.-F., Yeh, C.-Y., Lee, S.-J.: Application of type-2 neuro-fuzzy modeling in stock price
prediction. Appl. Soft Comput. 12(4), 1348–1358 (2012)
22. Serdio, F., Lughofer, E., Pichler, K., Buchegger, T., Efendic, H.: Residual-based fault detection
using soft computing techniques for condition monitoring at rolling mills. Inf. Sci. 259, 304–
320 (2014)
23. Ramli, A.A., Watada, J., Pedrycz, W.: A combination of genetic algorithm-based fuzzy c-
means with a convex hull-based regression for real-time fuzzy switching regression analysis:
application to industrial intelligent data analysis. IEEJ Trans. Electr. Electron. Eng. 9(1), 71–82
(2014)
24. Fernández, A., López, V., Del Jesus, M.J., Herrera, F.: Revisiting Evolutionary Fuzzy Systems:
taxonomy, applications, new trends and challenges. Knowl. Based Syst. 80, 109–121 (2015)
25. Bowditch, N.: Weather routing. In: The American Practical Navigator: An Epitome of Navi-
gation, p. 896 (2002)
26. Falcon, R., Abielmona, R., Billings, S., Plachkov, A., Abbass, H.: Risk management with hard-
soft data fusion in maritime domain awareness. In: The 2014 Seventh IEEE Symposium on
Computational Intelligence for Security and Defense Applications (CISDA), pp. 1–8 (2014)
27. Calamur, K.: High traffic, high risk in the strait of Malacca. In: The Atlantic (2017)
28. World Meteorological Organization: Guide to GRIB (2003)
29. Gacto, M.J., Alcalá, R., Herrera, F.: Interpretability of linguistic fuzzy rule-based systems: an
overview of interpretability measures. Inf. Sci. 181(20), 4340–4360 (2011)
30. Derrac, J., García, S., Molina, D., Herrera, F.: A practical tutorial on the use of nonpara-
metric statistical tests as a methodology for comparing evolutionary and swarm intelligence
algorithms. Swarm Evol. Comput. 1(1), 3–18 (2011)
Fuzzy Petri Nets and Interval Analysis
Working Together
Abstract Fuzzy Petri nets are a promising modeling technique for knowledge rep-
resentation and reasoning in knowledge-based systems. Over the last few decades,
many studies have focused on improving the fuzzy Petri net model. Various new
models have been proposed in the literature on the subject, which increase both
modeling strength and usability of fuzzy Petri nets. Recently, generalised fuzzy Petri
nets have been proposed. They are a natural extension of the classic fuzzy Petri nets.
In that model, t-norms and s-norms are introduced as substitutes for the operators min,
max and · (the algebraic product). This paper describes how the extended
class of generalised fuzzy Petri nets, called type-2 generalised fuzzy Petri nets, can
be used to represent knowledge and model reasoning in knowledge-based systems.
The type-2 generalised fuzzy Petri nets expand existing generalised fuzzy Petri nets
by introducing the triple of operators (In, Out1 , Out2 ) in the net model in the form of
interval triangular norms that are supposed to act as a substitute for triangular norms
in generalised fuzzy Petri nets. Thanks to this relatively simple modification, a more
realistic model than the previous one was obtained. The new model allows approximate
information to be used both in the representation of knowledge and in
modeling reasoning in knowledge-based systems.
Z. Suraj (B)
Faculty of Mathematics and Natural Sciences, University of Rzeszów, Rzeszów, Poland
e-mail: zbigniew.suraj@ur.edu.pl
A. E. Hassanien
Faculty of Computers and Information, Cairo University, Giza, Egypt
e-mail: aboitcairo@gmail.com
1 Introduction
Petri nets (PNs) [30] are widely used in various areas of science and practice, in par-
ticular in robotics and artificial intelligence. They are particularly useful in modeling
and analysis of discrete event systems [5, 40, 41]. The extraordinary advantages of
PNs, such as their simple formalism and intuitive graphical representation, have made
them an attractive research object for many years. Over the past few decades, many
different types of PNs have been proposed for different applications. There are many
books, articles and conference materials devoted to the theory and applications of
PNs in the world, see e.g. [5, 10, 23, 29, 31, 40, 41]. Although studies on the theory
and application of PNs have brought many benefits, a number of shortcomings still
remain: the traditional PNs [5, 29] cannot satisfactorily represent
so-called knowledge-based systems (KBSs). To deal with this inconvenience, a new
PN model called the fuzzy Petri net (FPN) was proposed by Lipp in 1984 [15].
FPNs are a convenient tool facilitating the structurisation of knowledge, providing
intuitive visualization of knowledge-based reasoning, and facilitating the design of
effective fuzzy inference algorithms using imprecise, unclear or incomplete infor-
mation. All this has earned FPNs a permanent place in the design of KBSs [3,
16, 25]. From the very beginning of the introduction of FPNs to support approximate
reasoning in KBS [17], scientists and practitioners in the field of artificial intelligence
have paid close attention to these net models. However, the first FPNs, according to
the literature on this topic [16], have many shortcomings and are insufficient for
increasingly complex KBSs. Therefore, many authors have proposed alternative
net models to increase their power for both knowledge
representation and smarter implementation of rule-based reasoning [2, 3, 9, 14, 16,
26–28, 32–36].
This paper describes how the extended class of general fuzzy Petri nets (GFP-
nets) [32], called type-2 generalised fuzzy Petri nets (T2GFP-nets), can be used
for both knowledge representation and reasoning in knowledge-based systems. The
T2GFP-net expands the existing GFP-nets by introducing a triple of operators
(In, Out1 , Out2 ) in the T2GFP-net model in the form of interval triangular norms
that are supposed to act as substitutes for triangular norms in GFP-nets. In addi-
tion, this extension allows the system modeled by a T2GFP-net to be presented at a
much more convenient level of abstraction than with the classic FPN or even the GFP-net. The
selection of appropriate operators for a system modeled in a more generalized form
is very important, especially in situations where modeled systems are described by
incomplete, imprecise and/or unclear information. In the classic case, a fuzzy set,
called a type 1 fuzzy set, is defined in terms of the function from the universe to the
interval [0,1] (including 0 and 1). This means that the membership of each element
belonging to a fuzzy set is characterized by a single value from the unit interval [0,1],
and not the subinterval, as in the case of T2GFP-net.
In practical applications, it is often more convenient to express the membership of an
element in a fuzzy set as a subinterval of the unit interval rather than as a single value
from that interval. A fuzzy set defined in this way is known as the type 2 fuzzy set. Any desired
operations on the type 2 fuzzy sets can be defined by extending the definition of
appropriate operations to the type 1 fuzzy sets, i.e. based on the membership function
of an element with individual values from the interval [0,1]. Research concerning
the type 2 fuzzy sets is mainly focused on the so-called min-max system [6,
20]. A weaker side of inference based on the type 2 fuzzy set theory is the
relatively higher computational cost compared to the approach using the type 1 fuzzy
set. To overcome this difficulty, it was proposed in the literature to consider special
cases of the type 2 fuzzy sets [7, 28, 38], which can be basically reduced to fuzzy
sets, in which the membership function of an element to the set takes only values
that are subintervals of the interval [0,1]. In the case of representing the membership
function value of an element belonging to a type 2 fuzzy set as a subinterval of
[0,1], so-called Φ-fuzzy sets [28] are obtained. In the Φ-fuzzy sets the subinterval is
simply considered to be the range in which the true membership [11, 28] is located.
With this assumption, a number of calculations related to performing operations
on Φ-fuzzy sets can be simplified. In addition, the definitions of extended triangular
norms (also called interval t-norms) for interval fuzzy operations are also significantly
simplified. In such a situation, the calculation of interval t-norm values is basically
limited to calculating their values only for the two extreme points of the intervals.
Fuzzy production rules used in this work as rules of inference are based precisely on
interval t-norms. The approach based on the use of type 2 fuzzy sets assumes that the
exact value of the membership function cannot be determined in the form of a single
real value. The corresponding interval therefore bounds the range in which the
exact value lies. The use of interval t-norms in the T2GFP-net makes
the model more general and practical. In addition, it can be more credible when
handling uncertain information, which in turn makes the reasoning
process carried out in KBSs based on uncertain knowledge more realistic. The
new FPN model presented in this paper uses all the possibilities described above.
The natural consequence of this fact is that the approach proposed in this work can
be used to represent knowledge and to model reasoning in, e.g., KBSs [16, 39], fault
diagnosis of systems [13], as well as fuzzy regulation of quality [24].
The organization of this paper is as follows. Section 2 is devoted to basic notions
concerning triangular norms, interval computations and interval triangular norms. In
Sect. 3 a brief introduction to GFP-nets is provided. Section 4 presents the T2GFP-
nets formalism. In Sect. 5, we describe three structural forms of fuzzy production
rules. Section 6 presents two algorithms. The first algorithm constructs a T2GFP-
net on the basis of a given set of fuzzy production rules. The second one
describes an approximate reasoning process realized by executing a T2GFP-net
representing a given KBS. A simple example coming from the domain of air traffic
control illustrating the proposed methodology is given in Sect. 7. In Sect. 8 a com-
parison with the existing literature is made. Section 9 includes
remarks on directions for further research related to the presented methodology.
2 Preliminaries
In this section, we recall basic concepts and notations regarding triangular norms,
interval computations and interval triangular norms.
A triangular norm (t-norm for short) is a function t : [0, 1]2 → [0, 1], such that for
all a, b, c ∈ [0, 1] the following conditions are satisfied: (1) it has 1 as the unit
element, i.e., t(a, 1) = a; (2) it is monotone, i.e., if a ≤ b then t(a, c) ≤ t(b, c);
(3) it is commutative, i.e., t(a, b) = t(b, a); (4) it is associative, i.e., t(t(a, b), c) =
t(a, t(b, c)).
The most relevant examples of t-norms are ZtN (a, b) = min(a, b) (minimum, Zadeh
t-Norm), GtN (a, b) = a · b (algebraic product, Goguen t-Norm), and LtN (a, b) =
max(0, a + b − 1) (Lukasiewicz t-Norm).
Since t-norms are just functions from the unit square into the unit interval, the com-
parison of t-norms is done in the usual way, i.e., pointwise. For the three basic t-norms
and for each (a, b) ∈ [0, 1]2 we have the following order LtN (a, b) ≤ GtN (a, b) ≤
ZtN (a, b).
An s-norm is a function s : [0, 1]2 → [0, 1] such that for all a, b, c ∈ [0, 1] the
following conditions are satisfied: (1) it has 0 as the unit element, i.e., s(a, 0) = a,
(2) it is monotone, i.e., if a ≤ b then s(a, c) ≤ s(b, c), (3) it is commutative, i.e.,
s(a, b) = s(b, a), and (4) it is associative, i.e., s(s(a, b), c) = s(a, s(b, c)).
The examples of s-norms corresponding respectively to the three basic
t-norms presented above are ZsN (a, b) = max(a, b) (maximum, Zadeh s-Norm),
GsN (a, b) = a + b − a · b (probabilistic sum, Goguen s-Norm), and LsN (a, b) =
min(1, a + b) (bounded sum, Lukasiewicz s-Norm).
As in the case of t-norms, we also have for the three basic s-norms and for each
(a, b) ∈ [0, 1]2 the following order: ZsN (a, b) ≤ GsN (a, b) ≤ LsN (a, b).
For further details, the reader is referred to [12].
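The three basic t-norms and s-norms, and the pointwise orderings stated above, can be written directly in Python (a minimal sketch for experimentation; the short function names are our own):

```python
def zt(a, b): return min(a, b)              # Zadeh t-norm (minimum)
def gt(a, b): return a * b                  # Goguen t-norm (algebraic product)
def lt(a, b): return max(0.0, a + b - 1.0)  # Lukasiewicz t-norm

def zs(a, b): return max(a, b)              # Zadeh s-norm (maximum)
def gs(a, b): return a + b - a * b          # Goguen s-norm (probabilistic sum)
def ls(a, b): return min(1.0, a + b)        # Lukasiewicz s-norm
```

For any a, b in [0, 1] the orderings LtN ≤ GtN ≤ ZtN and ZsN ≤ GsN ≤ LsN hold pointwise, which is easy to spot-check with these definitions.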
respectively) and the arithmetic equality relation on pairs of real numbers. The arith-
metic operations on real numbers may be easily extended to interval numbers
A = [a, a′] and B = [b, b′] in the following way:

A + B = [a + b, a′ + b′]
A − B = [a − b′, a′ − b]
A · B = [min(a·b, a·b′, a′·b, a′·b′), max(a·b, a·b′, a′·b, a′·b′)]
A / B = [a, a′] · [1/b′, 1/b], provided 0 ∉ [b, b′]

We shall write A = B if and only if a = b and a′ = b′.
In the special case where both A and B are non-negative intervals, the multiplica-
tion can be simplified to A · B = [a·b, a′·b′], where 0 ≤ a ≤ a′ and 0 ≤ b ≤ b′.
For further details, the reader is referred to [1, 21].
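These interval operations translate directly into code. In this sketch an interval [a, a′] is represented as a tuple `(a, a2)`; the function names are illustrative:

```python
def iadd(A, B):
    (a, a2), (b, b2) = A, B
    return (a + b, a2 + b2)

def isub(A, B):
    # Subtraction pairs each endpoint with the opposite endpoint of B.
    (a, a2), (b, b2) = A, B
    return (a - b2, a2 - b)

def imul(A, B):
    # The product interval is spanned by the four endpoint products.
    (a, a2), (b, b2) = A, B
    products = (a * b, a * b2, a2 * b, a2 * b2)
    return (min(products), max(products))

def idiv(A, B):
    # Defined only when 0 does not lie in the divisor interval.
    b, b2 = B
    if b <= 0.0 <= b2:
        raise ZeroDivisionError("0 must not lie in the divisor interval")
    return imul(A, (1.0 / b2, 1.0 / b))
```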
(Classic) Petri nets (PNs) are a simple and convenient tool for modeling systems. They
have an intuitive graphic representation. PNs were proposed in the 1960s by Petri
[30]. Analysis of the PNs enables obtaining important information about the structure
and dynamic behavior of the modeled system. This information can be used in the
evaluation of the modeled system, its improvement or change. Therefore, they are
helpful at the system design stage. In this paper, we assume that the reader knows the
basic concepts of the PN theory. Readers interested in deeper knowledge about PNs
and their applications are referred to the book [5]. GFP-nets are a modification of the
PNs. They allow modeling of knowledge-based systems in which both knowledge and
inference using this knowledge are generally imprecise, unclear or incomplete. GFP-
nets are used to graphically present the production rules and modeling of approximate
reasoning based on such rules [32].
In the drawing, places are presented as circles and transitions as rectangles. The
function I represents the directed arcs joining places with transitions, and the function
O represents the directed arcs joining transitions with places. A place p is called an
input place of a transition t, if I (t) = {p}, and if O(t) = {p }, then a place p is called
an output place of t. The initial marking M0 is an initial distribution of tokens in the
places. It can be represented by a vector of dimension n of tokens (real numbers) from
[0, 1]. For p ∈ P, M0 (p) can be interpreted as a truth value of the statement s bound
with a given place p by means of the statement binding function α. Graphically, the
tokens are represented by means of suitable real numbers placed over the circles
corresponding to appropriate places.
We accept that if M0 (p) = 0 then the token does not exist in the place p. The
numbers β(t) and γ (t) are placed in a net drawing under the transition t. The first
number is interpreted as the truth degree of an implication corresponding to a given
transition t. The role of the second one is to limit the possibility of transition firings,
i.e., if the input operator In value for all values corresponding to input places of
the transition t is less than a threshold value γ (t) then this transition cannot be
fired (activated). The operator binding function δ connects transitions with triples of
operators (In, Out1 , Out2 ). The first operator in the triple is called the input operator,
and two remaining ones are the output operators. The input operator In concerns the
way in which all input places are connected with a given transition t (more precisely,
statements corresponding to those places). However, the output operators Out1 and
Out2 concern the way in which the next marking is computed after firing the transition
t. In the case of the input operator we assume that it can belong to one of two classes,
i.e., t- or s-norm, whereas the second one belongs to the class of t-norms and the
third to the class of s-norms.
Let N be a GFP-net. A marking of N is a function M : P → [0, 1].
By the dynamics of GFP-net, we understand the way in which new net marking
is calculated based on the current marking after firing the transition enabled in this
marking.
Let N = (P, T , S, I , O, α, β, γ , Op, δ, M0 ) be a GFP-net, M be a marking of
N , t ∈ T , I (t) = {pi1 , pi2 , . . . , pik } be a set of input places for a transition t and
β(t) ∈ (0, 1]. (Note that 0 is excluded.) A transition t ∈ T is enabled
for marking M , if the value of input operator In for all input places of the transition
t by M is positive and greater than, or equal to, the value of threshold function
γ corresponding to the transition t. Formally, In(M (pi1 ), M (pi2 ), . . . , M (pik )) ≥
γ (t) > 0.
We assume that one can only fire enabled transitions. Firing the enabled transition
t consists of removing the tokens from its input places I (t) and adding the tokens to
all its output places O(t) without any alteration of the tokens in other places. If M is
a marking of N enabling transition t and M′ is the marking derived from M by firing
transition t, then for each p ∈ P the computation of the next marking M′ is as follows:
(1) Tokens from all input places of the fired transition t are removed. (2) Tokens in
all output places of t are modified in the following way: at first, the value of input
operator In for all input places of t is computed, next, the value of output operator
Out1 for the value of In and for the value of truth degree function β(t) is determined,
and finally, a value corresponding to M′(p) for each p ∈ O(t) is obtained as a result
of the output operator Out2 applied to the value of Out1 and the current marking M(p). (3)
Numbers in the remaining places of net N are not changed. Formally, for p ∈ P:

M′(p) = 0,                                                          if p ∈ I(t)
M′(p) = Out2(Out1(In(M(pi1), M(pi2), . . . , M(pik)), β(t)), M(p)), if p ∈ O(t)
M′(p) = M(p),                                                       otherwise
Example 1 Let us consider a GFP-net in Fig. 1a. For this net we have: the set of
places P = {p1 , p2 , p3 }, the set of transitions T = {t1 }, the input function I and the
Fig. 1 A GFP-net with: a the initial marking, b the marking after firing t1
output function O in the form: I (t1 ) = {p1 , p2 }, O(t1 ) = {p3 }, the set of statements
S = {s1 , s2 , s3 }, the statement binding function α : α(p1 ) = s1 , α(p2 ) = s2 , α(p3 ) =
s3 , the truth degree function β : β(t1 ) = 0.8, the threshold function γ : γ (t1 ) =
0.3, and the initial marking M0 = (0.5, 0.7, 0). In addition, there are: the set of
operators Op = {ZtN , ZsN , GtN } and the operator binding function δ defined as
follows: δ(t1 ) = (ZtN , GtN , ZsN ). The transition t1 is enabled by the initial marking
M0 , because min(M0 (p1 ), M0 (p2 )) = 0.5 ≥ γ (t1 ). After firing the transition t1 by the
marking M0 we receive a new marking M = (0, 0, 0.4) (Fig. 1b), at which t1 is no
longer enabled.
In Sect. 2 we recalled basic notions of interval analysis and related areas; now
we describe how to modify the GFP-net model (see Sect. 3) so as to bring it closer
to physical reality.
In the T2GFP-net, the functions defined in positions (2), (3) and (6) are more general
than the corresponding functions in a GFP-net: their values are interval numbers
contained in [0,1] instead of individual values from this interval. Moreover, in this
model we assume that the input operator belongs to one of two classes, i.e., interval
t-norms or interval s-norms, whereas the second operator belongs to the class of
interval t-norms, and the third to the class of interval s-norms. This extension allows
one to present and analyze the system modeled by a T2GFP-net at a more general
level of abstraction.
Let N be a T2GFP-net. A marking of N is a function M : P → L([0, 1]). We
assume that if M (p) = [0, 0] then the token does not exist in the place p.
A transition t ∈ T is enabled for marking M if the interval produced by the input
operator In for all input places of the transition t by M is (strictly) greater than
[0,0] and greater than, or equal to, the interval being the value of the threshold function
γ corresponding to the transition t, i.e., In(M(pi1), M(pi2), . . . , M(pik)) ≥ γ(t) > [0,0].
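Modeling an interval as a pair (lo, hi) with 0 ≤ lo ≤ hi ≤ 1 and comparing intervals componentwise (our reading of the ordering on L([0, 1])), the enabling test can be sketched as follows; all names here are our own:

```python
def i_min(*xs):
    """Interval minimum iZtN, an interval t-norm: componentwise min."""
    return (min(x[0] for x in xs), min(x[1] for x in xs))

def i_max(*xs):
    """Interval maximum iZsN, an interval s-norm: componentwise max."""
    return (max(x[0] for x in xs), max(x[1] for x in xs))

def i_ge(x, y):
    """Componentwise interval comparison [x0, x1] >= [y0, y1]."""
    return x[0] >= y[0] and x[1] >= y[1]

def enabled(marking, t, I, In, gamma):
    """t is enabled if In over its input intervals dominates gamma(t);
    since gamma(t) lies in L((0, 1]), the value then also exceeds [0, 0]."""
    return i_ge(In(*[marking[p] for p in I[t]]), gamma[t])
```

For example, with In = i_max, input intervals [0.4, 0.5] and [0.7, 0.8], and γ(t) = [0.3, 0.4], the transition is enabled because [0.7, 0.8] ≥ [0.3, 0.4].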
Let N = (P, T , S, I , O, α, β, γ , Op, δ, M0 ) be a T2GFP-net, t ∈ T , I (t) = {pi1 ,
pi2 , . . . , pik } be a set of input places for a transition t and β(t) ∈ L((0, 1]). (0 does not
belong to the unit interval.) Moreover, let In be an input operator and Out1 , Out2 be
Fuzzy Petri Nets and Interval Analysis Working Together 403
Fig. 2 A T 2GFP-net with: a the initial marking, b the marking after firing t1
γ(t1) = [0.2, 0.3]. Firing transition t1 by the marking M0 transforms M0 to the mark-
ing M′ = ([0, 0], [0, 0], [0.35, 0.48]) (Fig. 2b), at which t1 is no longer enabled.
A fuzzy production rule (a rule, for short) is an important and fruitful approach
to knowledge representation, and an FPN is a very useful way to represent such rules
graphically [32]. In this paper, we assume that a KBS is described by rules of the
form: IF premise THEN conclusion (CF) for which the premise is consumed and
the conclusion is produced each time the rule is used, where CF means a certainty
factor. Moreover, the system modeling is realized by transforming these rules into a
T 2GFP-net depending on the form of a transformed rule. In the paper, we consider
three structural forms of rules.
Type 0: IF s THEN s′ (CF = [c, c′]), where s, s′ denote statements, [a, a′], [b, b′]
are the interval numbers corresponding to their values, and CF is a certainty factor.
The truth values of s, s′, and CF belong to L([0, 1]).
Fig. 3 A T2GFP-net
representation of rule type 0
The degree of reliability of the rule is expressed by the value of the parameter CF:
the higher the value of [c, c′], the more reliable the rule corresponding to this value.
The value [d, d′] ∈ L([0, 1]) is interpreted in a similar way. It represents the
threshold value assigned to each rule: the higher the value of [d, d′], the higher the
truth degree of the rule premise, i.e. s, that is required. The operator In and the oper-
ators Out1, Out2 represent the input operator and the output operators, respectively.
These operators play an important role in optimizing rule firing; this aspect will
be discussed in more detail in Sect. 7. According to Fig. 2, the token value at the
output place p of the transition t corresponding to the production rule is calculated
according to the formula [b, b′] = Out1([a, a′], [c, c′]).
A T2GFP-net structure of this rule is shown in Fig. 3.
If the antecedent or the consequent of a rule contains And or Or (classical
propositional connectives), it is called a composite rule. Below, two types of com-
posite rules are presented together with their T2GFP-net representation (see Fig. 4).
Type 1: IF s1 And/Or s2 . . . And/Or sk THEN s (CF = [c, c′]), where s1, s2,
…, sk, s denote statements, and [a1, a1′], [a2, a2′], . . . , [ak, ak′], [b, b′] their values,
respectively. The token value [b, b′] is calculated in the output place as follows
(Fig. 4a): [b, b′] = Out1(In([a1, a1′], [a2, a2′], . . . , [ak, ak′]), [c, c′]).
It is easy to see that a rule of type 0 is a particular case of a rule of type 1, as in
the case of the rule of type 0 there is only one statement in the antecedent.
Type 2: IF s THEN s1 And s2 . . . And sn (CF = [c, c′]), where s, s1, s2, . . . , sn
denote statements, and [b, b′], [a1, a1′], [a2, a2′], . . . , [an, an′] denote their values,
respectively. The token value is calculated in each output place as follows (Fig. 4b):
[ak, ak′] = Out1([b, b′], [c, c′]).
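Under the same pair-based interval representation, the token-value formulas for the composite rule types can be sketched as below. The helper names are our own, and In and Out1 default to the interval minimum and interval algebraic product purely for illustration:

```python
def i_min(*xs):
    """Interval minimum (an interval t-norm), componentwise."""
    return (min(x[0] for x in xs), min(x[1] for x in xs))

def i_prod(x, y):
    """Interval algebraic product (an interval t-norm), componentwise."""
    return (x[0] * y[0], x[1] * y[1])

def rule_type1(antecedent_values, cf, In=i_min, Out1=i_prod):
    """Type 1: IF s1 And/Or ... And/Or sk THEN s (CF):
    [b, b'] = Out1(In([a1, a1'], ..., [ak, ak']), [c, c'])."""
    return Out1(In(*antecedent_values), cf)

def rule_type2(premise_value, cf, n, Out1=i_prod):
    """Type 2: IF s THEN s1 And ... And sn (CF):
    each of the n output places receives Out1([b, b'], [c, c'])."""
    return [Out1(premise_value, cf) for _ in range(n)]
```

A type 0 rule is then simply rule_type1 with a single antecedent value.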
Remarks:
1. Since there is only a single statement in the antecedent of rules of type 0 and 2,
one could omit the input operator In in Figs. 3 and 4b. Nevertheless, in order to
maintain the adopted pattern of the triples of operators in these figures, we leave
the operator where it is.
2. In the three graphical representations of the rule types considered above, we
assume that the initial markings of the output places are equal to [0,0]. In this
situation, the output operator Out2 can be omitted from the formulas describing the
values of the markings at the output places, because it does not change the marking
value there. Otherwise, i.e. for a non-zero marking of the output places, it is
necessary to take the output operator Out2 into account. This means that in each
formula presented above, the final marking value [a, a′] should be calculated as
follows: [a, a′] = Out2([b, b′], M(p)), where [b, b′] is the token value calculated
for the appropriate rule type using the formulas above, and M(p) is the marking of
the output place p. Intuitively, the final token value corresponding to M(p) for each
p ∈ O(t) is obtained as a result of the operation Out2 applied to the calculated
value of the operation Out1 and the current marking M(p).
3. In this paper, we do not consider rules of the form: IF s THEN s1 Or s2 . . .
Or sn . Rules of this type do not represent a single implication, but a set of n
implications with the same premise s and n conclusions si , i = 1,2,…, n.
4. For technical reasons, the names of the functions β and γ in Figs. 3 and 4 are
represented by b and g, respectively, rather than in their original form.
6 Algorithms
To model and analyze a system with uncertainty, we usually have to perform the
following three steps (cf. [39]):
Step 1. Generate the corresponding FPN model for the KBS.
Step 2. Design a reasoning algorithm based on some application backgrounds.
Step 3. Implement the reasoning algorithm with the appropriate parameters.
In this section, we present two algorithms that correspond to the realization of the
first two steps mentioned above. An example of the realization of the third step will
be presented in Sect. 7.
The first algorithm constructs a T2GFP-net on the basis of a given set of rules;
the transformation of rules into a T2GFP-net is realized depending on the form of
the transformed rule (see the previous section). The second algorithm describes a
reasoning process realized by executing a T2GFP-net representing a given KBS.
The efficiency of this algorithm is easy to assess: it depends mainly on the number
of rules belonging to the set R [4].
Algorithm 2 is based on the idea of the reachability tree [29, 31]. The main benefits
of this approach are that the algorithm is easy to understand and that the path of
inference is easy to find. On the other hand, its weaknesses are a more complex data
structure and a relatively slow speed of inference (cf. [39]).
The following section shows an example of using these two algorithms together
with the appropriate parameters.
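The reachability-tree idea underlying Algorithm 2 [29, 31] can be sketched as a breadth-first search over markings; this is our own simplified rendering, not the authors' pseudocode:

```python
from collections import deque

def reachability_tree(m0, transitions, enabled, fire):
    """Enumerate the firing edges reachable from the initial marking m0.

    m0:            initial marking as a hashable value (e.g. a tuple of interval pairs)
    transitions:   iterable of transition names
    enabled(m, t): True if transition t may fire at marking m
    fire(m, t):    the marking obtained by firing t at m
    Returns a list of edges (marking, fired transition, next marking),
    from which the inference paths of the net can be read off.
    """
    edges, seen, queue = [], {m0}, deque([m0])
    while queue:
        m = queue.popleft()
        for t in transitions:
            if enabled(m, t):
                m2 = fire(m, t)
                edges.append((m, t, m2))
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return edges
```

Each edge corresponds to one node expansion in the reachability graph; collecting the edges rather than only the markings preserves which transition produced each successor.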
7 An Example
This section shows an example based on a simplified version of a real problem [8]. It
concerns the following situation: a plane B waits at a certain airport for a plane A, so
that some passengers can transfer from plane A to plane B. A conflict arises when
plane A is late. In this situation, the following alternatives can be considered:
• Plane B waits for the arrival of plane A. In this case, B will depart late.
• Plane B departs according to schedule. In this case, passengers leaving plane A
must wait for a later plane.
• Plane B departs according to schedule, and another plan is proposed for the
passengers of plane A.
In order to make the most accurate decision, one should also take into account
several other factors, such as the length of the delay, the number of passengers
changing planes, etc. Finding an optimal solution to the problem with mutually
exclusive goals, such as minimizing delays in the entire flight network, guaranteeing
connections to the satisfaction of passengers, and the efficient use of expensive
resources, is completely omitted in this example.
To describe the aforementioned conflict in air travel, we propose to consider the
following three rules:
• IF s2 Or s3 THEN s6
• IF s1 And s4 And s6 THEN s7
• IF s4 And s5 THEN s8 ,
where the statements’ labels have the meanings presented in Table 1.
Using Algorithm 1 (Sect. 6) for constructing a T2GFP-net on the basis of a given
set of rules, we obtain the T2GFP-net model corresponding to these rules. This
net model is shown in Fig. 5, where the logical operators Or and And are interpreted as
iZsN (interval maximum) and iZtN (interval minimum), respectively. Note that the
places p1, p2, p3 and p4 contain the interval values [0.5,0.6], [0.4,0.5], [0.7,0.8]
and [0.5,0.7] corresponding to the statements s1, s2, s3 and s4, respectively. In this
example, the statement s5 attached to the place p5 is the only crisp one and its value is
equal to [1,1]. Moreover, there are: the truth degree function β : β(t1 ) = [0.8, 0.9],
β(t2 ) = [0.6, 0.7] and β(t3 ) = [0.9, 1], the threshold function γ : γ (t1 ) = [0.3, 0.4],
γ (t2 ) = [0.4, 0.5], γ (t3 ) = [0.5, 0.6], the set of operators Op = {iZtN , iGtN , iZsN }
Fig. 5 An example of T2GFP-net model of air traffic control: a the initial marking, b the marking
after firing a sequence of transitions t1 t2
and the operator binding function δ defined as follows: δ(t1 ) = (iZsN , iGtN , iZsN ),
δ(t2 ) = (iZtN , iGtN , iZsN ), δ(t3 ) = (iZtN , iGtN , iZsN ).
Assessing the statements attached to the places from p1 up to p5, we observe that
the transitions t1 and t3 can be fired. Firing these transitions according to the firing
rules of the T2GFP-net model allows the support for the alternatives in question to
be computed. In this way, the possible alternatives are ordered with regard to
the preference they receive from the knowledge base. This order forms the basis for
further examinations and simulations and, ultimately, for the dispatching proposal. If
one chooses the sequence of transitions t1 t2, one obtains the final value, correspond-
ing to the statement s7, equal to the interval [0.3,0.42]. The detailed computation in
this case proceeds as follows. We can see that the transition t1 is enabled by the
initial marking M0 , because iZsN (M0 (p2 ), M0 (p3 )) = iZsN ([0.4, 0.5], [0.7, 0.8]) =
[max(0.4, 0.7), max(0.5, 0.8)] = [0.7, 0.8] ≥ γ(t1) = [0.3, 0.4]. Firing transition t1
by the marking M0 transforms M0 to the marking M1 = ([0.5, 0.6], [0, 0], [0, 0],
[0.5, 0.7], [1, 1], [0.56, 0.72], [0, 0], [0, 0]), because iGtN ([0.7, 0.8], [0.8, 0.9]) =
[0.7 · 0.8, 0.8 · 0.9] = [0.56, 0.72], after which t2 is still enabled. Firing transition t2 by
the marking M1 transforms M1 to the marking M2 = ([0, 0], [0, 0], [0, 0], [0, 0],
[1, 1], [0, 0], [0.3, 0.42], [0, 0]) since iZtN ([0.5, 0.6], [0.56, 0.72], [0.5, 0.7]) =
[min(0.5, 0.56, 0.5), min(0.6, 0.72, 0.7)] = [0.5, 0.6] and iGtN ([0.5, 0.6], [0.6,
0.7]) = [0.5 · 0.6, 0.6 · 0.7] = [0.3, 0.42], after which all transitions are disabled. In the
other case (i.e., for the transition t3 only), the final value, this time corresponding to
the statement s8, equals the interval [0.45,0.7], and again all transitions are disabled.
We omit the detailed calculation in the second case, because it runs similarly to the
one above.
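The interval arithmetic of this computation is easy to check mechanically; a quick verification with pair-based helpers (the names are our own):

```python
def i_max(*xs):  # iZsN: interval maximum, componentwise
    return (max(x[0] for x in xs), max(x[1] for x in xs))

def i_min(*xs):  # iZtN: interval minimum, componentwise
    return (min(x[0] for x in xs), min(x[1] for x in xs))

def i_prod(x, y):  # iGtN: interval algebraic product, componentwise
    return (x[0] * y[0], x[1] * y[1])

# Firing t1: In = iZsN over p2, p3; Out1 = iGtN with beta(t1) = [0.8, 0.9]
in_t1 = i_max((0.4, 0.5), (0.7, 0.8))      # [0.7, 0.8], which dominates gamma(t1) = [0.3, 0.4]
p6 = i_prod(in_t1, (0.8, 0.9))             # [0.56, 0.72], the new token in p6

# Firing t2: In = iZtN over p1, p6, p4; Out1 = iGtN with beta(t2) = [0.6, 0.7]
in_t2 = i_min((0.5, 0.6), p6, (0.5, 0.7))  # [0.5, 0.6]
p7 = i_prod(in_t2, (0.6, 0.7))             # [0.3, 0.42], the final value for s7
```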
The graphical representation of Algorithm 2 execution is illustrated in Fig. 6.
We can easily see in this graph three sequences of firing transitions (the reach-
able paths): (t1 , t2 ), (t1 , t3 ), and (t3 , t1 ). The first reachable path goes from the ini-
tial marking M0 represented in the graph by the node N0 to the final marking M2
represented in the graph by the node N2 . However, the next two reachable paths
transform marking M0 into the final marking M4 = ([0.5, 0.6], [0, 0], [0, 0], [0, 0],
[0, 0], [0.56, 0.72], [0, 0], [0.45, 0.7]) represented in the graph by the node N4 (see
Table in Fig. 7). Since the markings of places p7 and p8 are the truth degrees of the
statements attached to these places, the values [0.3, 0.42] and [0.45, 0.7] are,
respectively, the belief degrees of the final decisions in the KBS considered in this
example.
If we interpret the logical operators Or and And as the interval probabilistic sum
iGsN and the interval algebraic product iGtN, respectively, and choose the sequence
of transitions t1 t2, then no final value can be obtained: after firing the transition t1
at the initial marking M0, we reach a marking at which the transition t2 is unable to
fire. In the other case, i.e., for the transition t3, we obtain the final value for the
statement s8, again equal to [0.45, 0.7]. A similar situation occurs if we adopt the
interval Lukasiewicz s-norm and the interval Lukasiewicz t-norm for the logical
operators Or and And, respectively.
This example shows clearly that different interpretations of the logical operators
Or and And may lead to quite different decision results. This is why the fuzzy net
model proposed in this paper is more flexible than the classical one: the user has
the opportunity to define the input/output operators. When choosing a suitable
interpretation for the logical operators Or and And, one may apply the mathematical
relationships between interval t-norms and interval s-norms presented in Sect. 2.3.
Beyond that, the outcome depends to a significant degree on the experience of the
model designer.
8 Related Work

In this section, we present brief information about new FPN models, as well as a
comparison of our approach with the existing literature in this area.
Following the review article [16], the new FPN models can be divided into five
thematic groups:
1. FPNs combining PNs and fuzzy logic.
2. FPNs considering time factor.
3. FPNs based on possibility logic.
4. FPNs using neural networks.
5. FPNs based on matrix operations.
The approach presented in this paper differs from those listed above. It opens a
new, sixth and, it seems, equally promising direction of research, which can be
described as FPNs combined with interval analysis. In particular, the paper proposes
a T2GFP-net model for an uncertain environment with interval numbers, which has
some advantages over the models proposed in the literature; these can be summarized
as follows:
• This paper uses interval t-norms [22] instead of the classic t-norms [12], as well as
interval parameters that characterize FPRs, and therefore the proposed approach
opens the possibility of optimizing the degree of truth at the output places, cf. [37].
• The T2GFP-net model makes the system more generalised in comparison to [16,
39], because all the markings in input and output places as well as the transition
characteristics are linked to some parameters, which are also interval numbers.
This option applies to the reliability of the system.
• Because interval fuzzy sets are used in this paper, one can specify an interval
number instead of the exact membership or truth value. An interval is assumed in
order to indicate the range of the exact value, so the model proposed in this paper
is more realistic.
9 Concluding Remarks
Trying to make GFP-nets more realistic with regard to physical reality, in this paper
we have established a link between GFP-nets and interval analysis. The link is
methodological: it demonstrates the possible use of the methodology of interval
analysis (to deal with incomplete information) to transform GFP-nets into the more
realistic T2GFP-net model. The model uses interval triangular norms instead of
classical ones. In the approach based on interval fuzzy sets, it is assumed that one
is not able to specify the exact membership or truth
value. An interval is adopted to indicate the range of the exact value. This makes the
model proposed in this paper more flexible, general and practical. Moreover, the
model is concerned with the reliability of the information provided, leading to
greater generality of the approximate reasoning process in a KBS. The suitability
and usefulness of the proposed approach have been demonstrated for decision
making using a simple real-life example. The elaborated approach looks promising
for similar application problems that could be solved in an analogous manner.
In this paper, we have only considered the extension of t-norms to interval t-norms
in a numeric framework. It would also be useful to study FPNs in the context of the
notion of t-norms and their interval extensions using more general mathematical
structures (i.e., L-values for some lattice L; see e.g. [18, 19]). These are examples
of issues which we would like to investigate using the approach presented in this
paper.
References
1. Alefeld, G., Mayer, G.: Interval analysis: theory and applications. J. Comput. Appl. Math. 121,
421–464 (2000)
2. Bandyopadhyay, S., Suraj, Z., Grochowalski, P.: Modified generalized weighted fuzzy Petri net
in intuitionistic fuzzy environment. In: Proceedings of the International Joint Conference on
Rough Sets, Santiago, 2016, Chile. Lecture Notes in Artificial Intelligence 9920, pp. 342–351.
Springer (2016)
3. Cardoso, J., Camargo, H. (eds.): Fuzziness in Petri nets. Springer, Heidelberg (1999)
4. Chen, S.M., Ke, J.S., Chang, J.F.: Knowledge representation using fuzzy Petri nets. IEEE Trans.
Knowl. Data Eng. 2(3), 311–319 (1990)
5. David, R., Alla, H.: Petri Nets and Grafcet: Tools for Modelling Discrete Event Systems.
Prentice-Hall, London (1992)
6. Dubois, D., Prade, H.: Operations in a fuzzy-valued logic. Inf. Control 43, 224–240 (1979)
7. Dubois, D., Prade, H.: Possibility Theory: An Approach to Computerized Processing of Uncer-
tainty. Plenum Press, New York (1988)
8. Fay, A., Schnieder, E.: Fuzzy Petri nets for knowledge modelling in expert systems. In: Cardoso,
J., Camargo, H. (eds.) Fuzziness in Petri Nets, pp. 300–318. Springer, Berlin (1999)
9. Hassanien, A.E., Tolba, M.F., Shaalan, K.F., Azar, A.T. (eds.): Proceedings of the International
Conference on Advanced Intelligent Systems and Informatics, AISI 2018, Cairo, Egypt, 3–5
Sept 2018. Advances in Intelligent Systems and Computing, vol. 845. Springer (2019)
10. Jensen, K., Rozenberg, G. (eds.): High-level Petri Nets. Theory and Application. Springer,
Berlin (1991)
11. Kenevan, J.R., Neapolitan, R.E.: A model theoretic approach to propositional fuzzy logic
using Beth tableaux. In: Zadeh, L.A., Kacprzyk, J. (eds.) Fuzzy Logic for the Management of
Uncertainty, pp. 141–157. Wiley, New York (1993)
12. Klement, E.P., Mesiar, R., Pap, E.: Triangular Norms. Kluwer, Dordrecht (2000)
13. Lajmi, F., Talmoudi, A.J., Dhouibi, H.: Fault diagnosis of uncertain systems based on interval
fuzzy Petri net. Stud. Inf. Control 26(2), 239–248 (2017)
14. Li, X., Lara-Rosano, F.: Adaptive fuzzy Petri nets for dynamic knowledge representation and
inference. Expert Syst. Appl. 19, 235–241 (2000)
15. Lipp, H.P.: Application of a fuzzy Petri net for controlling complex industrial processes. In:
Proceedings of IFAC Conference on Fuzzy Information Control, pp. 471–477 (1984)
16. Liu, H.-C., You, J.-X., Li, Z.W., Tian, G.: Fuzzy Petri nets for knowledge representation and
reasoning: a literature review. Eng. Appl. Artif. Intell. 60, 45–56 (2017) (Elsevier)
17. Looney, C.G.: Fuzzy Petri nets for rule-based decision-making. IEEE Trans. Syst. Man Cybern.
18(1), 178–183 (1988)
18. Ma, Z., Wu, W.: Logical operators on complete lattices. Inf. Sci. 55, 77–97 (1991)
19. Mayor, G., Torrens, J.: On a class of operators for expert systems. Int. J. Intell. Syst. 8, 771–778
(1993)
20. Mizumoto, M., Tanaka, K.: Some properties of fuzzy sets of type 2. Inf. Control 31, 312–340
(1976)
21. Moore, R.E.: Interval Analysis. Prentice-Hall, New Jersey (1966)
22. Moore, R.E.: Methods and Applications of Interval Analysis. SIAM Studies in Applied and
Numerical Mathematics, vol. 2 (1979)
23. Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4), 541–580 (1989)
24. Nabli, L., Dhouibi, H., Collart Dutilleul, S., Craye, E.: Using interval constrained Petri nets for
the fuzzy regulation of quality: case of assembly process mechanics. Int. J. Comput. Inf. Eng.
2(5), 1478–1483 (2008)
25. Omran, L.N., Ezzat, K.A., Hassanien, A.E.: Decision support system for determination of
forces applied in orthodontic based on fuzzy logic. In: Proceedings of the 3rd International
Conference on Advanced Machine Learning Technologies and Applications, Cairo, Egypt,
22–24 Feb 2018, Advances in Intelligent Systems and Computing, pp. 158–168. Springer
(2018)
26. Pedrycz, W., Gomide, F.: A generalized fuzzy Petri net model. IEEE Trans. Fuzzy Syst. 2(4),
295–301 (1994)
27. Pedrycz, W.: Generalized fuzzy Petri nets as pattern classifiers. Pattern Recog. Lett. 20(14),
1489–1498 (1999)
28. Pedrycz, W.: Fuzzy Control and Fuzzy Systems, second extended edition. Wiley, Hoboken
(1993)
29. Peterson, J.L.: Petri Net Theory and the Modeling of Systems. Prentice-Hall Inc, Englewood
Cliffs (1981)
30. Petri, C.A.: Kommunikation mit Automaten. Schriften des IIM Nr. 2, Institut for Instrumentelle
Mathematik, Bonn (1962)
31. Reisig, W.: Petri Nets. EATCS Monographs on Theoretical Computer Science, vol. 4. Springer,
Berlin (1985)
32. Suraj, Z.: A new class of fuzzy Petri nets for knowledge representation and reasoning. Fundam.
Inf. 128(1–2), 193–207 (2013)
33. Suraj, Z.: Knowledge representation and reasoning based on generalized fuzzy Petri nets. In:
Proceedings of the 12th International Conference on Intelligent Systems Design and Applica-
tions, Kochi, 2012, India, pp. 101–106. IEEE Press (2012)
34. Suraj, Z.: Modified generalized fuzzy Petri nets for rule-based systems. In: Proceedings of
the 15th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular
Computing, Tianjin, 2015, China. Lecture Notes in Artificial Intelligence 9437, pp. 196–206.
Springer (2015)
35. Suraj, Z., Bandyopadhyay, S.: Generalized weighted fuzzy Petri net in intuitionistic fuzzy
environment. In: Proceedings of the IEEE World Congress on Computational Intelligence,
Vancouver, 2016, Canada, pp. 2385–2392. IEEE Press (2016)
36. Suraj, Z., Grochowalski, P., Bandyopadhyay, S.: Flexible generalized fuzzy Petri nets for rule-
based systems. In: Proceedings of the 5th International Conference on the Theory and Practice
of Natural Computing, Sendai, 2016, Japan. Lecture Notes in Computer Science 10071, pp.
196–207, Springer (2016)
37. Suraj, Z.: Toward Optimization of Reasoning Using Generalized Fuzzy Petri Nets. In: Pro-
ceedings of the International Joint Conference on Rough Sets, Quy Nhon, Vietnam, 20–24
Aug 2018. Lecture Notes in Artificial Intelligence 11104, pp. 294–308. Springer (2018)
38. Yao, Y.Y.: Interval based uncertain reasoning. In: Proceedings of the 19th International Con-
ference of the North American Fuzzy Information Processing Society-NAFIPS, 13–15 July
2000, Atlanta, USA
39. Zhou, K.-O., Zain, A.M.: Fuzzy Petri nets and industrial applications: a review. Artif. Intell.
Rev. 45, 405–446 (2016)
40. Zhou, M.C., DiCesare, F.: Petri Net Synthesis for Discrete Event Control of Manufacturing
Systems. Kluwer, Boston (1993)
41. Zurawski, R., Zhou, M.C.: Petri nets and industrial applications: a tutorial. IEEE Trans. Ind.
Electr. 41(6), 567–583 (1994)