To cite this article: Sara del Río, Victoria López, José Manuel Benítez & Francisco Herrera (2015) A MapReduce Approach
to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules, International Journal of
Computational Intelligence Systems, 8:3, 422-437, DOI: 10.1080/18756891.2015.1017377
International Journal of Computational Intelligence Systems, Vol. 8, No. 3 (2015) 422-437
Sara del Río∗, Victoria López, José Manuel Benítez, Francisco Herrera
Dept. of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Granada, Spain†
Downloaded by [European Society for Fuzzy Logic & Technology] at 07:45 12 February 2015
Abstract
The term "big data" is used to describe the exponential data growth that has recently occurred, which poses an immense challenge for traditional learning techniques. To deal with big data classification problems we propose the Chi-FRBCS-BigData algorithm, a linguistic fuzzy rule-based classification system that uses the MapReduce framework to learn and fuse rule bases. It has been developed in two versions with different fusion processes. An experimental study is carried out, and the results obtained show that the proposal is able to handle these problems while providing competitive results.
Keywords: Fuzzy rule based classification systems; Big data; MapReduce; Hadoop; Rules fusion.
in the information technology industry is what is known as "big data." This popular term is used when referring to massive amounts of data that are difficult to handle and analyze using traditional data management tools.1 These include structured and unstructured data, ranging from terabytes to zettabytes and coming from diverse sources such as social networks, mobile devices, multimedia data, webpages or sensor networks, among others.2

More data should lead to more effective analysis and, therefore, should enable the extraction of more accurate and precise information. Nevertheless, standard machine learning and data mining techniques are not able to easily scale up to big data problems.3 In this way, it is necessary to adapt and

Fuzzy Rule Based Classification Systems (FRBCSs) are effective and popular tools for pattern recognition and classification.7 These techniques are able to obtain good accuracy results while providing a descriptive model for the end user through the usage of linguistic labels. When dealing with big data, one of the issues that hinders the extraction of information is the uncertainty associated to the vagueness or the noise inherent to the available data. Therefore, FRBCSs seem appropriate in this scenario, as they can handle uncertainty, ambiguity or vagueness in a very effective manner. Another issue that complicates the learning process is the high dimensionality and the large number of instances present in big data, since the inductive learning capacity of FRBCSs is affected by the exponential growth of the search space.8 To address this problem, several approaches have appeared to build parallel fuzzy systems;9,10 however, these models aim to reduce the processing time while preserving the accuracy, and they are not designed to manage huge amounts of data. In this way, it is necessary to adapt and redesign the FRBCSs so that they are able to provide an accurate classification in a reasonable amount of time for big data problems.

characteristics that make it particularly suitable to build a parallel approach:

• It is a simple approach that does not have strong dependencies between parts of the algorithm.
• It generates rules that have the same structure (rules with as many antecedents as attributes in the dataset, using only a fuzzy label). Having the same structure for the rules greatly benefits both the rule generation from a subset of the data and

∗ Corresponding author. Tel: +34-958-240598; Fax: +34-958-243317.
† E-mail: srio@decsai.ugr.es (Sara del Río), vlopez@decsai.ugr.es (Victoria López), J.M.Benitez@decsai.ugr.es (José Manuel Benítez), herrera@decsai.ugr.es (Francisco Herrera).
last years and has raised considerable interest because of the possibilities of improving data processing and knowledge extraction. Big data is the popular term to encompass all the data so large and complex that it becomes difficult to process or analyze using traditional software tools or data processing applications.6 Initially, this concept was defined as a 3Vs model, namely volume, velocity and variety6:

• Volume: This characteristic refers to the huge amounts of data that need to be processed to obtain helpful information.
• Velocity: This property states that the data processing applications must be able to obtain results in a reasonable time.
• Variety: This feature indicates that the data can be presented in multiple formats, structured and unstructured, such as text, numerical data or multimedia, among others.

More recently, new dimensions have been proposed6 by different organizations to describe the big data model, with veracity, validity, volatility, variability or value being some of them.

Big data problems appear in a large number of fields and sectors such as economic and business activities, public administrations, national security or research, among others. For example, the New York Stock Exchange can generate up to one Terabyte of new trade data per day. Facebook servers store one Petabyte of multimedia data daily (about ten billion photos). Another example is the Internet Archive, which can accumulate two Petabytes of data per day.14

This situation tends to be a problem, as researchers, governments and enterprises have had to face the challenge of processing huge amounts of data quickly and efficiently, so that they can improve productivity (in business) or obtain new scientific breakthroughs (in scientific disciplines).

2.2. MapReduce programming model

The MapReduce programming model was introduced by Google in 2004.11,15 It is a distributed programming model for writing massive, scalable and fault-tolerant data applications that was developed for processing large datasets over a cluster of machines. The MapReduce model is based on two primary functions, the map function and the reduce function, which must be designed by users. In general terms, in the first phase the input data are processed by the map function, which produces some intermediate results; afterwards, these intermediate results are fed to a second phase, a reduce function, which somehow combines the intermediate results to produce a final output.

The MapReduce model is defined with respect to an essential data structure known as a <key,value> pair. The processed data, the intermediate results and the final results all work in terms of <key,value> pairs. In this way, the map and reduce functions of a MapReduce procedure are defined as follows:

• Map function: In the map function, the master node takes the input, divides it into several sub-problems and transfers them to the worker nodes. Next, each worker node processes its sub-problem and generates a result that is transmitted back to the master node. In terms of <key,value> pairs, the map function receives a <key,value> pair as input and emits a set of intermediate <key,value> pairs as output. Then, these intermediate <key,value> pairs are automatically shuffled and ordered according to the intermediate key and become the input to the reduce function.
• Reduce function: In the reduce function, the master node collects the answers of the worker nodes and combines them in some way to form the final output of the method. Again, in terms of <key,value> pairs, the reduce function obtains the intermediate <key,value> pairs produced in the previous phase and generates the corresponding <key,value> pair as the final output of the algorithm.

Figure 1 illustrates a typical MapReduce program with its map and reduce steps. The terms k and v refer to the original key and value pair respectively; k′ and v′ are the intermediate <key,value> pair created after the map step; and v′′ is the final result of the algorithm.
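The map/shuffle/reduce flow described above can be mimicked in a few lines of ordinary Python. The word-count job below is not taken from the paper (which relies on Hadoop); it is a minimal in-memory sketch of how <key,value> pairs move through the two phases, with all names being ours.

```python
from collections import defaultdict

# Toy map function: receives one (key, value) pair -- here a line of text --
# and emits intermediate (word, 1) pairs.
def map_fn(key, value):
    for word in value.split():
        yield (word, 1)

# Toy reduce function: receives an intermediate key with all of its grouped
# values and emits the final (key, aggregated value) pair.
def reduce_fn(key, values):
    return (key, sum(values))

def map_reduce(inputs):
    # Map phase: each input record is processed independently.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            intermediate[k2].append(v2)  # shuffle: group by intermediate key
    # Reduce phase: combine the grouped intermediate values per key.
    return dict(reduce_fn(k, vs) for k, vs in sorted(intermediate.items()))

counts = map_reduce([(0, "big data big rules"), (1, "fuzzy rules")])
# counts == {'big': 2, 'data': 1, 'fuzzy': 1, 'rules': 2}
```

In a real Hadoop job the shuffle and grouping are performed by the framework between the two user-defined functions; here they are simulated with a dictionary.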
Nevertheless, the original MapReduce technology is a proprietary system exploited by Google and, therefore, it is not available for public use. Apache Hadoop is the most relevant open source implementation of Google's MapReduce programming model and the Google File System (GFS).14 It is a project written in Java and supported by the Apache Software Foundation for easily writing applications that process vast amounts of data in parallel on clusters of nodes. Hadoop provides a distributed file system similar to GFS, designated the Hadoop Distributed File System, which is highly fault-tolerant and is designed to work over large clusters of "commodity hardware."

In this paper we consider the Hadoop MapReduce implementation to develop our proposals because of its performance, open source nature, installation facilities and its associated distributed file system (the Hadoop Distributed File System).

Machine learning algorithms have also been adapted using the MapReduce programming model to manage big data in a straightforward way. The Mahout project,16 also supported by the Apache Software Foundation, aims to provide scalable machine learning applications for large-scale and intelligent data analysis over Hadoop platforms or other scalable systems. It is possibly the most widely used tool that integrates scalable machine learning algorithms for clustering, recommendation systems, classification problems and pattern mining.

3.1. Introduction to Fuzzy Rule Based Classification Systems

Any classification problem is usually defined by m training instances xp = (xp1, ..., xpn, Cp), p = 1, 2, ..., m, with n attributes and M classes, where xpi is the value of attribute i (i = 1, 2, ..., n) and Cp is the value of the class label (C1, ..., CM) of the p-th training sample.

An FRBCS is composed of two elements: the Inference System and the Knowledge Base (KB). In a linguistic FRBCS, the KB is formed of the Data Base (DB), which contains the membership functions of the fuzzy partitions associated to the input attributes, and the Rule Base (RB), which comprises the fuzzy rules that describe the problem. A learning procedure is needed to construct the KB from the available examples.

In this work, we use fuzzy rules of the following form to build our FRBCS:

Rule Rj: If x1 is Aj1 and ... and xn is Ajn then Class = Cj with RWj    (1)

where Rj is the label of the j-th rule, x = (x1, ..., xn) is an n-dimensional pattern vector, Aji is an antecedent fuzzy set, Cj is a class label, and RWj is the rule weight. We use triangular membership functions to represent the linguistic labels.
There are many alternatives that have been proposed to compute the rule weight. Among them, a good choice is to use the heuristic method known as the Penalized Certainty Factor (PCF)21:

RWj = PCFj = ( Σ_{xp ∈ Class Cj} μj(xp) − Σ_{xp ∉ Class Cj} μj(xp) ) / Σ_{p=1}^{m} μj(xp)    (2)

where μj(xp) is the matching degree of the p-th example of the training set with the antecedents of the rule and Cj is the consequent class of rule j.

In order to provide the final class associated with a new pattern xp = (xp1, ..., xpn), the winner rule Rw is determined through the following equation:

μw(xp) · RWw = max{μj(xp) · RWj; j = 1, ..., L}    (3)

We use the fuzzy reasoning method of the winning rule22 when predicting a class with the built KB for a given example. In this way, the class assigned to the example xp, Cw, is the class indicated in the consequent of the winner rule Rw.

In the event that multiple rules obtain the same maximum value for equation (3) but with different classes in the consequent, the classification of the pattern xp is not performed, that is, the pattern does not have any associated class. In the same way, if the pattern xp does not match any rule in the RB, no class is associated to the example and the classification is also not carried out.

3.2. The Chi et al.'s algorithm for Classification

To build the KB of a linguistic FRBCS, we need to use a learning procedure that specifies how the DB and RB are created. In this work, we use the method proposed in 23, an extension of the well-known Wang and Mendel method for classification,24 which we have called the Chi et al.'s method, Chi-FRBCS.

To generate the KB, this generation method tries to find the relationship between the input attributes and the classes space following the next steps:

1. Building the linguistic fuzzy partitions: This step builds the DB from the domain associated to each attribute Ai, using equally distributed triangular membership functions.

2. Generating a new fuzzy rule associated to each example xp = (xp1, ..., xpn, Cp):

   (a) Compute the membership degree of the example with the different fuzzy labels.
   (b) Select the fuzzy region that obtains the maximum membership degree in relation with the example.
   (c) Build a new fuzzy rule whose antecedent is calculated according to the previous fuzzy region and whose consequent is the class label of the example Cp.
   (d) Compute the rule weight.

When following the previous procedure, several rules with the same antecedent can be built. If they have the same class in the consequent, the duplicated rules are deleted. However, if the class in the consequent is different, only the rule with the highest weight is maintained in the RB.

4. The Chi-FRBCS-BigData algorithm: A MapReduce Design based on the Fusion of Fuzzy Rules

In this section, we present the Chi-FRBCS-BigData algorithm that we have developed to deal with big data classification problems. To do so, this method uses two different MapReduce processes:

• One MapReduce process is devoted to the building of the model from a big data training set, detailed in Section 4.1.
• The other MapReduce process is used to estimate the class of the examples belonging to big data sample sets using the previously learned model; this process is explained in Section 4.2.

Both parts follow the MapReduce design, distributing all the computations along several processing units that manage different chunks of information and aggregating the results obtained in an appropriate manner.
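The learning scheme of Section 3.2 and the winning-rule inference of Eq. (3) can be sketched compactly. The code below is not the authors' implementation: it assumes attributes normalized to [0, 1], three equally distributed triangular labels, and all function names are ours.

```python
# Illustrative sketch of Chi et al.'s rule generation (one rule per example,
# PCF rule weight, strongest-rule conflict resolution) and winning-rule
# inference, under the assumptions stated in the lead-in.

def triangular_labels(n_labels=3):
    # Equally distributed triangular membership functions over [0, 1].
    step = 1.0 / (n_labels - 1)
    return [(i * step - step, i * step, i * step + step) for i in range(n_labels)]

def mu(x, tri):
    a, b, c = tri
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def matching(example, antecedent, labels):
    # Product T-norm over the memberships of the antecedent labels.
    m = 1.0
    for x, lab in zip(example, antecedent):
        m *= mu(x, labels[lab])
    return m

def chi_learn(X, y, labels):
    rules = {}
    for xp, cp in zip(X, y):
        # Steps 2(a)-(b): per attribute, take the label with maximum membership.
        ant = tuple(max(range(len(labels)), key=lambda l, x=x: mu(x, labels[l]))
                    for x in xp)
        # Step 2(d): Penalized Certainty Factor, Eq. (2).
        mus = [matching(xq, ant, labels) for xq in X]
        same = sum(m for m, cq in zip(mus, y) if cq == cp)
        total = sum(mus)
        rw = (same - (total - same)) / total if total > 0 else 0.0
        # Conflict resolution: keep only the strongest rule per antecedent.
        if ant not in rules or rules[ant][1] < rw:
            rules[ant] = (cp, rw)
    return rules

def chi_predict(x, rules, labels):
    # Winning rule, Eq. (3): maximize matching degree times rule weight.
    best_class, best_val = None, 0.0
    for ant, (cls, rw) in rules.items():
        val = matching(x, ant, labels) * rw
        if val > best_val:
            best_class, best_val = cls, val
    return best_class

labels = triangular_labels()
rules = chi_learn([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]], [0, 0, 1], labels)
print(chi_predict([0.15, 0.85], rules, labels))  # 0
```

Note that, as in the text, a pattern that fires no rule yields no class (here, `None`).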
Furthermore, we have developed two versions of the Chi-FRBCS-BigData algorithm, which we have named Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave. These versions share most of their operations; however, they behave differently in the reduce step of the approach, when the different RBs generated by each map are fused. These versions obtain different RBs and thus different KBs.

4.1. Building the model with Chi-FRBCS-BigData

1. Initial: In this first phase, the method computes the domain associated to each attribute Ai using the whole training set. The DB is created using equally distributed triangular membership functions, as in Chi-FRBCS. Then, the system automatically segments the original training dataset into independent data blocks, which are automatically transferred to the different processing units together with the created DB.

2. Map: In this second phase, each processing unit works independently over its available data to build its associated RB (called RBi in Figure 2) following the original Chi-FRBCS method.
Specifically, for each example in the data partition, an associated fuzzy rule is created: first, the membership degree of the fuzzy labels is computed according to the example values; then, the fuzzy region that obtains the greatest value is selected to become the antecedent of the rule; next, the class of the example is assigned to the rule as the consequent; and finally, the rule weight is computed using the set of examples that belong to the current map process.
After the rules have been created and before finishing the map step, each map process searches for rules with the same antecedent. If the rules share the same consequent, only one rule is preserved; if the rules have different consequents, only the rule with the highest weight is kept in the map RB.

3. Reduce: In this third phase, a processing unit receives the results obtained by each map process (RBi) and combines them to form the final RB (called RBR in Figure 2). The combination of the rules is straightforward:

   (a) Chi-FRBCS-BigData-Max: In this approach, the method searches for the rules with the same antecedent. Among these rules, only the rule with the highest weight is maintained in the final RB, RBR. In this case it is not necessary to check whether the consequent is the same or not, as we are only maintaining the most powerful rules. Equivalent rules (rules with the same antecedent and consequent) can present different weights, as they are computed in different map processes over different training sets.
   For instance, if we have five rules with the same antecedent and the following consequents and rule weights (see Figure 3):
   R1 of RB1: Class 1, RW1 = 0.8743; R1 of RB2: Class 2, RW2 = 0.9254; R2 of RB3: Class 1, RW3 = 0.7142; R1 of RB4: Class 2, RW4 = 0.2143; and R2 of RBn: Class 1, RW5 = 0.8215,
   then Chi-FRBCS-BigData-Max will keep in RBR the rule R1 of RB2: Class 2, RW2 = 0.9254, because it is the rule with the maximum weight.

   (b) Chi-FRBCS-BigData-Ave: In this approach, the method also searches for the rules with the same antecedent. Then, the average weight of the rules that have the same consequent is computed (this step is needed because rules with the same antecedent and consequent may have different weights, as they are built over different training sets). Finally, the rule with the greatest average weight is kept in the final RB, RBR.
   For instance, if we have five rules with the same antecedent and the following consequents and rule weights (see Figure 4):
   R1 of RB1: Class 1, RW1 = 0.8743; R1 of RB2: Class 2, RW2 = 0.9254; R2 of RB3: Class 1, RW3 = 0.7142; R1 of RB4: Class 2, RW4 = 0.2143; and R2 of RBn: Class 1, RW5 = 0.8215,
   then Chi-FRBCS-BigData-Ave will compute the average weight for each consequent, RC1: Class 1, RWC1 = 0.8033 and RC2: Class 2, RWC2 = 0.5699, and it will keep in RBR the rule RC1: Class 1, RWC1 = 0.8033 because it is the rule with the maximum average weight.

Please note that no Chi-FRBCS-BigData version needs to recompute the rule weights according to the data in the reduce stage, as the new rule weights are calculated from the rule weights previously provided by each map.

4. Final: In this last phase, the results computed in the previous phases are provided as the output of the computation process. Precisely, the generated KB is composed of the DB built in the "Initial" phase and the RB, RBR, finally obtained in the "Reduce" phase. This KB will be the model used to predict the class of new examples.
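The two fusion policies of the reduce step can be reproduced in a few lines. The sketch below is not the authors' code; it takes the five candidate rule weights from the worked example above, with the list already restricted to rules sharing one antecedent.

```python
from collections import defaultdict

# Candidate rules sharing one antecedent: (consequent class, rule weight),
# taken from the worked example in the text (Figures 3 and 4).
candidates = [(1, 0.8743), (2, 0.9254), (1, 0.7142), (2, 0.2143), (1, 0.8215)]

def fuse_max(rules):
    # Chi-FRBCS-BigData-Max: keep the single rule with the highest weight,
    # regardless of its consequent.
    return max(rules, key=lambda r: r[1])

def fuse_ave(rules):
    # Chi-FRBCS-BigData-Ave: average the weights per consequent, then keep
    # the consequent with the greatest average weight.
    by_class = defaultdict(list)
    for cls, rw in rules:
        by_class[cls].append(rw)
    averages = {cls: sum(ws) / len(ws) for cls, ws in by_class.items()}
    best = max(averages, key=averages.get)
    return (best, round(averages[best], 4))

print(fuse_max(candidates))  # (2, 0.9254)
print(fuse_ave(candidates))  # (1, 0.8033)
```

The two policies can disagree, as here: Max keeps the single heaviest rule (Class 2), while Ave prefers Class 1 because its three weights average higher than the two Class 2 weights.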
the examples that belong to big data classification sets using the KB built in the previous step. This approach follows a similar scheme to the previous step, where the initial dataset is distributed along several processing units that provide a result that will be part of the final result. Specifically, this class estimation process is depicted in Figure 5 and follows these phases:

1. Initial: In this first phase, the method does not need to perform a specific operation. The system automatically segments the original big data dataset that needs to be classified into independent data blocks, which are automatically transferred to the different processing units together with the previously created KB.

2. Map: In this second phase, each map task estimates the class for the examples that are included in its data partition. To do so, each processing unit goes through all the examples in its data chunk and predicts their output class according to the given KB, using the fuzzy reasoning method of the winning rule. Please note that Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave will produce different classification estimations because the input RBs are also different; however, the class estimation process followed is exactly the same for both approaches.

3. Final: In this last phase, the results computed in the previous phase are provided as the output of the computation process. Precisely, the estimated classes for the different examples of the big data classification set are aggregated just by concatenating the results provided by each map task.

It is important to note that this mechanism does not include a reduce step, as it is not necessary to perform a computation to combine the results obtained in the map phase.
5. Experimental study

In this section, we present the experimental study carried out using the Chi-FRBCS-BigData algorithm over big data problems. First, in Section 5.1, we provide some details of the classification problems chosen for the experiments and the configuration parameters for the methods analyzed. Next, in Section 5.2, we study the behavior of the original Chi-FRBCS serial version with respect to Chi-FRBCS-BigData. Then, in Section 5.3, we provide the accuracy performance results for the approaches tested in the study with respect to the number of maps considered. Finally, an analysis that evaluates the runtime spent by the algorithms over the selected data is shown in Section 5.4.

5.1. Experimental framework

This study aims to analyze the quality of the Chi-FRBCS-BigData algorithm in the big data scenario. For this, we consider six problems from the UCI dataset repository:25 the Record Linkage Comparison Patterns (RLCP) dataset, the KDD Cup 1999 dataset, the Poker Hand dataset, the Covertype dataset, the Census-Income (KDD) dataset and the Fatality Analysis Reporting System (FARS) dataset. A summary of the dataset features is shown in Table 1, where the number of examples (#Ex.), number of attributes (#Atts.), selected classes and the number of examples per class are included. The table is in descending order according to the number of examples of each dataset.

Table 1. Summary of datasets

Datasets                   #Ex.      #Atts.  Selected classes     #Samples per class
RLCP                       5749132   2       (FALSE; TRUE)        (5728201; 20931)
Kddcup DOS vs normal       4856151   41      (DOS; normal)        (3883370; 972781)
Poker 0 vs 1               946799    10      (0; 1)               (513702; 433097)
Covtype 2 vs 1             495141    54      (2; 1)               (283301; 211840)
Census                     141544    41      (- 50000.; 50000+.)  (133430; 8114)
Fars Fatal Inj vs No Inj   62123     29      (Fatal Inj; No Inj)  (42116; 20007)

Since several of the selected datasets contain multiple classes, in this work we have decided to reduce the multi-class problems to two classes. Despite the ability of the Chi-FRBCS-BigData algorithm to address multi-class problems, we want to avoid the imbalance in the data that arises in many real-world problems.26 Moreover, the division approach of the presented MapReduce scheme aggravates the small sample size problem, which degrades the performance of classifiers in the imbalanced scenario.

To develop the different experiments we use a 10-fold stratified cross-validation partitioning scheme, i.e., ten random partitions of the data, each with 10% of the samples, where the combination of nine of them (90%) is used as the training set and the remaining one as the test set. The results reported for each dataset are the averages obtained by computing the mean over all the partitions.

To demonstrate the inability of the original Chi-FRBCS serial version to deal with big data classification problems, we have compared the results obtained by the serial version with respect to Chi-FRBCS-BigData for the selected datasets.

In order to verify the performance of the proposed model, we have compared the results obtained by Chi-FRBCS-BigData-Max with Chi-FRBCS-BigData-Ave, so that we can compare their behavior with respect to the selected big data problems.

Regarding the parameters used in the experiments, these algorithms use:

• Three fuzzy labels for each attribute.
• The product T-norm to compute the matching degree of the antecedent of the rule with the example.
• The PCF to compute the rule weight.
• The winning rule as the fuzzy reasoning method.

Additionally, another parameter is used in the MapReduce procedure: the number of maps associated to the computation. This value has been set to 8, 16, 32, 64 and 128 maps, and represents the number of subsets of the original dataset that are created and provided to the map processes.

The measures of the quality of classification are built from a confusion matrix (Table 2), which organizes the examples of each class in accordance with their correct or incorrect identification.

Table 2. Confusion matrix for a two-class problem

                  Positive Prediction    Negative Prediction
Positive Class    True Positive (TP)     False Negative (FN)
Negative Class    False Positive (FP)    True Negative (TN)

The effectiveness in classification for the proposed approach will be evaluated using the most frequently used empirical measure, accuracy, which is defined as follows:

accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)

With respect to the infrastructure used to perform the experiments, we have used the research group's cluster with 16 nodes connected with 40Gb/s InfiniBand. Each node is equipped with two Intel E5-2620 microprocessors (at 2 GHz, 15MB cache) and 64GB of main memory running under Linux CentOS 6.5. The head node of the cluster is equipped with two Intel E5645 microprocessors (at 2.4 GHz, 12MB cache) and 96GB of main memory. Furthermore, the cluster works with Hadoop 2.0.0 (Cloudera CDH4.5.0), where the head node is configured as name-node and job-tracker, and the rest are data-nodes and task-trackers.

5.2. Analysis of the Chi-FRBCS serial version with respect to Chi-FRBCS-BigData

In this section, we present a set of experiments to illustrate the behavior of the original Chi-FRBCS serial version with respect to Chi-FRBCS-BigData. The experiments have been designed to contrast the results of the serial version in relation to the big data versions of the algorithm for the selected datasets. Table 3 presents the average results in training and test for the Chi-FRBCS versions and is divided by columns into two parts: the first part corresponds to the results of the sequential variant, while the second part is related to the big data variants of the Chi-FRBCS algorithm using 8 maps. The bold values highlight the best performing algorithm.

Firstly, we can observe that the table does not present the results for the "RLCP" and "Kddcup DOS vs normal" datasets. This means that the sequential version of Chi-FRBCS was not able to complete the associated experiments.

Table 3. Average accuracy results for the Chi-FRBCS versions

                           Chi-FRBCS         Chi-BigData-Max (8 maps)  Chi-BigData-Ave (8 maps)
Datasets                   Acctr    Acctst   Acctr    Acctst           Acctr    Acctst
Poker 0 vs 1               63.72    61.77    62.93    60.74            63.12    60.91
Covtype 2 vs 1             74.65    74.57    74.69    74.63            74.66    74.61
Census                     96.52    86.06    97.12    93.89            97.12    93.86
Fars Fatal Inj vs No Inj   99.66    89.26    97.01    95.07            97.18    95.25
Average                    83.64    77.92    82.94    81.08            83.02    81.16

In terms of the accuracy achieved by the algorithms considered in the study, we can see that, in general, the Chi-FRBCS-BigData versions are able to provide better classification results than the serial version. The unique exception to this tendency can be observed in the "Poker 0 vs 1" dataset, for which the serial variant obtains better results.

In addition, we can also observe that the results in training indicate that there is some overfitting in the serial version, as there are differences between the training and test results, mainly in the results for the "Census" and "Fars Fatal Inj vs No Inj" datasets.

Moreover, Table 4 shows the time elapsed in seconds for the serial version and the big data alternatives. This table is also divided by columns into two parts: the first part corresponds to the results of the sequential variant, while the second part is related to the big data variants of Chi-FRBCS for 8 maps. The bold values highlight the quickest algorithm.

Table 4. Average runtime elapsed in seconds for the Chi-FRBCS versions

Datasets                   Chi-FRBCS     Chi-BigData-Max (8 maps)  Chi-BigData-Ave (8 maps)
                           Runtime (s)   Runtime (s)               Runtime (s)
Census                     38655.60      1102.45                   1343.92
Covtype 2 vs 1             86247.70      2482.09                   2512.16
Fars Fatal Inj vs No Inj   8056.60       241.96                    311.95
Poker 0 vs 1               114355.80     5672.80                   7682.19
Average                    61828.93      2374.82                   2962.56

Considering these results, we can see that the sequential version is notably slower than the MapReduce alternatives. Furthermore, the results obtained show that the runtime spent is directly related to the operations that need to be performed by the big data approaches. In this way, we can observe that the Chi-BigData-Ave method is slower than the Chi-BigData-Max algorithm, since it performs additional operations to average the rule weights.

Furthermore, in Table 5 we compare the runtime spent by the big data versions with respect to the runtime of the sequential version divided by the number of parallel processes or maps considered (8 in this analysis). This table follows the same structure as Table 4. Even in this case, we can see that the big data alternatives are significantly faster than the sequential version.
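The accuracy measure of Eq. (4) is a direct computation over the confusion-matrix counts of Table 2. A minimal sketch, with hypothetical counts chosen only to illustrate the arithmetic:

```python
def accuracy(tp, tn, fp, fn):
    # Eq. (4): proportion of correctly classified examples.
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts (not from the paper's experiments).
print(accuracy(tp=80, tn=15, fp=3, fn=2))  # 0.95
```

Note that on strongly imbalanced problems, such as "Census" or "RLCP" in Table 1, plain accuracy can look high even when the minority class is poorly recognized, which is part of the motivation for reducing the datasets to more balanced two-class problems.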
weight of all the partial RBs obtained, shows a better behavior in general. An exception is the "Census" dataset, which does not follow the same tendency as the other datasets. In this case, the Chi-FRBCS-BigData-Max variant gets slightly better results. This behavior may be related to the differences between the training and test results.

However, this trend is not observed in the case of the "Covtype 2 vs 1" dataset, where the Chi-FRBCS-BigData-Ave alternative provides better accuracy results when the number of maps is increased.

In Figure 6 we represent the average results for the datasets considered: the "RLCP" dataset (Figure 6a), the "Kddcup DOS vs normal" dataset (Figure 6b), the "Poker 0 vs 1" dataset (Figure 6c), the "Covtype 2 vs 1" dataset (Figure 6d), the "Census" dataset (Figure 6e) and the "Fars Fatal Inj vs No Inj" dataset (Figure 6f), as the number of maps is varied.

Fig. 6. Average results for the Chi-FRBCS-BigData versions: accuracy (test) of Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave with 8, 16, 32, 64 and 128 maps for (a) the RLCP dataset, (b) the Kddcup DOS vs normal dataset, (c) the Poker 0 vs 1 dataset, (d) the Covtype 2 vs 1 dataset, (e) the Census dataset and (f) the Fars Fatal Inj vs No Inj dataset.

Downloaded by [European Society for Fuzzy Logic & Technology] at 07:45 12 February 2015
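The difference between the two alternatives compared above lies in how the reduce phase fuses the partial rule bases RB1..RBn into the final rule base RBR. The following sketch is an illustrative reimplementation, not the authors' code: the rule representation, function names and example rule bases are assumptions. As we understand the two fusion schemes, Chi-FRBCS-BigData-Max keeps, among rules sharing an antecedent, the single rule with the highest weight, while Chi-FRBCS-BigData-Ave first averages the weights of rules sharing both antecedent and consequent class and then keeps the class with the highest averaged weight.

```python
# Illustrative sketch (not the authors' code) of the reduce step that fuses
# the partial rule bases into the final rule base RBR.  A rule is modeled as
# (antecedent, class_label, weight); these structures are assumptions.
from collections import defaultdict
from statistics import mean

def fuse_max(partial_rbs):
    """Max scheme: for rules sharing an antecedent, keep the rule
    with the highest weight."""
    best = {}
    for rb in partial_rbs:
        for antecedent, label, weight in rb:
            if antecedent not in best or weight > best[antecedent][1]:
                best[antecedent] = (label, weight)
    return [(a, l, w) for a, (l, w) in best.items()]

def fuse_ave(partial_rbs):
    """Ave scheme: average the weights of rules sharing both antecedent
    and class, then keep the class with the highest averaged weight."""
    weights = defaultdict(list)
    for rb in partial_rbs:
        for antecedent, label, weight in rb:
            weights[(antecedent, label)].append(weight)
    averaged = {}
    for (antecedent, label), ws in weights.items():
        w = mean(ws)
        if antecedent not in averaged or w > averaged[antecedent][1]:
            averaged[antecedent] = (label, w)
    return [(a, l, w) for a, (l, w) in averaged.items()]

# Two toy partial rule bases produced by two hypothetical maps.
rb1 = [(("low", "high"), 0, 0.9), (("low", "low"), 1, 0.5)]
rb2 = [(("low", "high"), 1, 0.7), (("low", "low"), 1, 0.75)]
print(fuse_max([rb1, rb2]))  # keeps class 0 (weight 0.9) for ("low", "high")
print(fuse_ave([rb1, rb2]))  # averages 0.5 and 0.75 -> class 1, weight 0.625
```

Either way, duplicate antecedents collapse into a single rule, which is why the fused rule base is much smaller than the union of the partial ones, as Table 8 illustrates below.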
In order to illustrate how the Chi-FRBCS-BigData proposal is able to reduce the complexity of the model by decreasing the number of final rules, we show in Table 8 the number of rules generated by each map and the number of final rules for the "Kddcup DOS vs normal" dataset. In this example, the number of rules has dramatically decreased from the 1708 rules that were created by all the maps to the 301 rules that finally compose the rule base (RBR).

Table 8. Example of number of rules generated by map and number of final rules for the Chi-FRBCS-BigData-Max version with 8 maps (Kddcup DOS vs normal dataset)

NumRules by map     Final NumRules
RB1 size: 211       RBR size: 301
RB2 size: 212
RB3 size: 221
RB4 size: 216
RB5 size: 213
RB6 size: 210
RB7 size: 211
RB8 size: 214

Finally, in Table 9, we present the average number of rules generated by the Chi-FRBCS-BigData algorithms using 8, 16, 32, 64 and 128 maps over the selected datasets. This table is also divided into five horizontal parts that correspond to the average number of rules obtained with the different numbers of maps. The values in boldface correspond to the lowest numbers of rules obtained for a specific dataset. First, we can see that in general we obtain a smaller number of rules for a lower number of maps. On the other hand, we can see that the number of generated rules is higher when the number of maps is increased.

Table 9. Average number of rules for the Chi-FRBCS-BigData versions

8 maps – Average NumRules
                           Chi-BigData-Max   Chi-BigData-Ave
RLCP                       6.0               6.0
Kddcup DOS vs normal       299.9             299.9
Poker 0 vs 1               53168.9           53168.9
Covtype 2 vs 1             7134.3            7134.3
Census                     34341.4           34341.4
Fars Fatal Inj vs No Inj   17158.1           17158.1

32 maps – Average NumRules
                           Chi-BigData-Max   Chi-BigData-Ave
RLCP                       6.0               6.0
Kddcup DOS vs normal       300.5             300.5
Poker 0 vs 1               53403.7           53403.7
Covtype 2 vs 1             7210.2            7210.2
Census                     34376.5           34376.5
Fars Fatal Inj vs No Inj   17182.0           17182.0

64 maps – Average NumRules
                           Chi-BigData-Max   Chi-BigData-Ave
RLCP                       6.0               6.0
Kddcup DOS vs normal       300.5             300.5
Poker 0 vs 1               53503.4           53503.4
Covtype 2 vs 1             7278.9            7278.9
Census                     34392.5           34392.5
Fars Fatal Inj vs No Inj   17196.4           17196.4

128 maps – Average NumRules
                           Chi-BigData-Max   Chi-BigData-Ave
RLCP                       6.0               6.0
Kddcup DOS vs normal       300.5             300.5
Poker 0 vs 1               53541.0           53541.0
Covtype 2 vs 1             7343.3            7343.3
Census                     34397.3           34397.3
Fars Fatal Inj vs No Inj   17202.1           17202.1

5.4. Analysis of the Chi-FRBCS-BigData runtime

In this section we compare the runtime of the two
versions of the Chi-FRBCS-BigData proposal for
the different problems selected and the diverse num-
ber of maps used in the experiments.
We present the average results for the runtime in a
similar way to the analysis of the accuracy given
in the previous section. Table 10 shows the average time elapsed in seconds by the Chi-FRBCS-BigData-Max and Chi-FRBCS-BigData-Ave algorithms for the selected datasets with 8, 16, 32, 64 and 128 maps.
performance with respect to the Chi-FRBCS-BigData-Ave version and provides better response times.

• For both the Chi-FRBCS-BigData-Ave version and the Chi-FRBCS-BigData-Max alternative, the accuracy of the model decreases as the number of maps increases, although the speed gain is significant in these cases. In this way, it is necessary to establish a trade-off if we are interested in getting faster results without deeply degrading the performance.

As future work we will study the combination of a FRBCS learning method with bagging 27 in order to deal with big data classification problems. In this way we can analyze the behavior of a bagging approach together with data mapping when using a FRBCS as a base classifier in a MapReduce scheme.
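The trade-off stated above can be made concrete with a small sketch. This is a hypothetical helper, not part of the paper's method; the function name, the 1-point tolerance and the accuracy figures are all illustrative assumptions, and it relies on the observation that a larger number of maps yields faster runtimes.

```python
# Hypothetical helper illustrating the maps-vs-accuracy trade-off discussed
# above: pick the largest number of maps (assumed fastest) whose test accuracy
# stays within `max_drop` points of the best configuration.
# The accuracy figures below are placeholders, not results from the paper.

def pick_maps(results, max_drop=1.0):
    """results: {n_maps: test_accuracy}; return the chosen number of maps."""
    best = max(results.values())
    eligible = [m for m, acc in results.items() if best - acc <= max_drop]
    return max(eligible)  # fastest option still close enough to the best accuracy

runs = {8: 94.1, 16: 93.9, 32: 93.6, 64: 92.8, 128: 91.5}
print(pick_maps(runs))  # 32: at 64+ maps accuracy drops by more than 1 point
```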
Systems with Applications, vol. 41, no. 2, 655–662, (2014).
10. H. Ishibuchi, S. Mihara and Y. Nojima, "Parallel distributed hybrid fuzzy GBML models with rule set migration and training data rotation," IEEE Transactions on Fuzzy Systems, vol. 21, no. 2, 355–368, (2013).
11. J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1, 107–113, (2008).
12. Z. Chi, H. Yan and T. Pham, "Fuzzy algorithms with applications to image processing and pattern recognition," World Scientific, (1996).
13. V. López, S. Río, J.M. Benítez and F. Herrera, "On the use of MapReduce to build Linguistic Fuzzy Rule Based Classification Systems for Big Data," IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2014), Beijing (China), 1905–1912, 6–11 July, (2014).
14. T. White, "Hadoop, The Definitive Guide," O'Reilly Media, Inc., (2012).
15. J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," OSDI'04: Proceedings of the 6th Symposium on Operating System Design and Implementation, San Francisco, California, USA, USENIX Association, 137–150, (2004).
16. S. Owen, R. Anil, T. Dunning and E. Friedman, "Mahout in Action," Manning Publications Co., (2011).
17. V. López, S. del Río, J.M. Benítez and F. Herrera, "Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data," Fuzzy Sets and Systems, vol. 258, 5–38, (2015).
18. S. Río, V. López, J.M. Benítez and F. Herrera, "On the use of MapReduce for Imbalanced Big Data using Random Forest," Information Sciences, vol. 285, 112–137, (2014).
19. I. Palit and C.K. Reddy, "Scalable and parallel boosting with MapReduce," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 10, 1904–1916, (2012).
20. Q. He, C. Du, Q. Wang, F. Zhuang and Z. Shi, "A parallel incremental extreme SVM classifier," Neurocomputing, vol. 74, no. 16, 2532–2540, (2011).
21. H. Ishibuchi and T. Yamamoto, "Rule Weight Specification in Fuzzy Rule-Based Classification Systems," IEEE Transactions on Fuzzy Systems, vol. 13, no. 4, 428–435, (2005).
22. O. Cordón, M.J. del Jesus and F. Herrera, "A proposal on Reasoning Methods in Fuzzy Rule-Based Classification Systems," International Journal of Approximate Reasoning, vol. 20, no. 1, 21–45, (1999).
23. Z. Chi, H. Yan and T. Pham, "Fuzzy algorithms with applications to image processing and pattern recognition," World Scientific, (1996).
24. L.X. Wang and J.M. Mendel, "Generating fuzzy rules by learning from examples," IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 6, 1414–1427, (1992).
25. K. Bache and M. Lichman, "UCI Machine Learning Repository," [Online; accessed November 2014] (http://archive.ics.uci.edu/ml), (2014).
26. V. López, A. Fernández, S. García, V. Palade and F. Herrera, "An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics," Information Sciences, vol. 250, 113–141, (2013).
27. K. Trawinski, O. Cordón and A. Quirin, "On designing fuzzy multiclassifier systems by combining FURIA with bagging and feature selection," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 19, no. 4, 589–633, (2011).