Michal Burda
Institute for Research and Applications of Fuzzy Modeling
Centre of Excellence IT4Innovations,
Division of the University of Ostrava
30. dubna 22, 701 03 Ostrava, Czech Republic
e-mail: michal.burda@osu.cz
Abstract—The aim of this paper is to present a new package for the R statistical environment that enables the use of linguistic fuzzy logic in data processing applications. The lfl package provides tools for the transformation of data into fuzzy sets representing linguistic expressions, for the mining of linguistic fuzzy association rules, and for performing inference on fuzzy rule bases using Perception-based Logical Deduction (PbLD). The package also contains the Fuzzy Rule-based Ensemble, a tool for time series forecasting based on an ensemble of forecasts from several individual methods that is driven by a linguistic rule base created automatically from a large set of training time series.

I. INTRODUCTION

The aim of this paper is to present a new package for the R statistical environment [1] that enables the use of linguistic fuzzy logic [2] in data processing applications.

R is an open-source clone of the "S" statistical software. It is developed by many contributors around the world and is well accepted, not only in the academic sphere. Currently, its repository contains more than 6000 contributed packages of tools for statistics and data processing.

There already exist several packages for R that focus on vagueness and fuzziness. For instance, the sets package [3] introduces many basic operations on fuzzy sets, the FuzzyNumbers package [4] provides classes and methods for dealing with fuzzy numbers, the SAFD package [5] contains tools for elementary statistics on fuzzy data, and fclust [6] brings the fuzzy k-means clustering technique to the R environment.

The lfl package [7] described in this paper focuses on the creation of fuzzy rule-base systems and their use in classification and prediction. A similar task is performed by the fugeR package [8], which uses an evolutionary algorithm to construct a fuzzy system from a training data set, and by the frbs package [9], which provides many widely accepted approaches for building fuzzy systems based on space partitioning, neural networks, clustering, gradient descent, or genetic algorithms.

The lfl package adds other useful fuzzy-related algorithms to that list. They are tightly connected to the notion of linguistic fuzzy logic, which was initially developed by Novák in [2].

A central notion of linguistic fuzzy logic [2] is an expression of the form

    ⟨linguistic hedge⟩⟨atomic expression⟩

such as "very small", "roughly medium", or "extremely big", where the atomic expression (such as "small" or "big") denotes some vague quantity and the linguistic hedge (such as "very" or "rather") further adjusts the vagueness of the whole expression. Fuzzy sets provide a mathematical framework for manipulating and reasoning with such linguistic expressions. On top of that, a specific inference method, Perception-based Logical Deduction (PbLD), was developed by Novák in [2], together with a suitable defuzzification technique: the Defuzzification of Evaluative Expressions (DEE). PbLD performs inference from a rule base of implicative linguistic IF-THEN rules while taking the specificity of the rules into account. For instance, a rule with the antecedent "age is very small" is more specific than a rule with the antecedent "age is small", since everything that is very small is also small, but not vice versa. In PbLD, more specific rules take precedence over more general rules if both of them fire in degree 1. That enables, for example, non-continuous changes to be implemented on the output of the inference, if needed by the application. We refer to [10], [11] for all the details on PbLD inference.

As opposed to the traditional Mamdani–Assilian approach [12], which builds the rule base as a disjunction of conjunctions of antecedents and consequents, the PbLD approach is closer to the implicative approach, since it handles the rules as implications and rule bases as conjunctions of these implications.

The lfl package may also be used within a data mining task as a tool for searching for interesting patterns in data, since it contains a function for searching for fuzzy association rules. Together with PbLD, it can be used as a machine learning tool for classification or regression problems. The package also includes the Fuzzy Rule-based Ensemble for time series forecasts [13] as an exemplary application of PbLD and the association rule searching algorithm.

Other software dealing with a similar topic, although not connected with the R statistical environment, is the Linguistic Fuzzy Logic Controller (LFLC) [14]. Unlike LFLC, however, the lfl R package is free software with open source code.

Since this paper aims at introducing the software, we omit many theoretical details and provide only references to the relevant literature. We also omit the details and peculiarities of the use of the R system itself and suggest that the reader become acquainted with the R user manual [1], if needed.

The text provides many examples of commands issued by the user and the corresponding responses of the software. Both are written in a typewriter font. Moreover, the user's commands are labeled with the ">" prompt at the beginning of the line.

The rest of the paper is organized as follows. First, Section II informs the reader of how the package can be obtained. Next, the most important functions are briefly described in Section III, and some technical details are discussed in Section IV. Section V concludes the paper by outlining planned future enhancements of the lfl package.

II. OBTAINING THE lfl PACKAGE

To obtain the lfl package, a working instance of the R statistical environment should be prepared first. Then

> install.packages('lfl')

automatically downloads the lfl package and all its dependencies, compiles it, and installs it. The lfl package works on all platforms supported by the R software, including Microsoft Windows, GNU/Linux, and MacOS.

After the installation is successful, the following command loads the package into the workspace so that the user can start using it:

> library(lfl)

III. FUNCTIONS, ALGORITHMS

As of version 1.0, the lfl package provides the functions listed in Table I. The most important of them are described in detail in this section.

To illustrate all the processes, we are going to use an exemplary data set d throughout the text. It is very easy to load data into the R system, e.g. in the comma-separated values (CSV) format. For the sake of simplicity and self-containment, we generate an artificial data frame with the following command:

> d <- data.frame(age=1:100 * 0.8,
+                 salary=1:100 * 5000,
+                 sex=ifelse(runif(100) > 0.5, 'M', 'F'))
> head(d)
  age salary sex
1 0.8   5000   M
2 1.6  10000   M
3 2.4  15000   F
4 3.2  20000   F
5 4.0  25000   M
6 4.8  30000   M

The lcut3 function transforms the columns of such a data frame into membership degrees of fuzzy sets as follows:

• Each factor, i.e. each categorical column such as "sex" (male/female), "employed" (TRUE/FALSE), or "marital status" (single/married/divorced), is transformed into k columns of 0/1's, where k is the number of factor levels (i.e. categories).

• Each numeric column is transformed into a set of columns modeling different linguistic expressions with the atomic expressions "small" (Sm), "medium" (Me), or "big" (Bi) and the following linguistic hedges: "extremely" (Ex), "significantly" (Si), "very" (Ve), "more or less" (Ml), "roughly" (Ro), "quite roughly" (Qr), and "very roughly" (Vr). Not all linguistic hedges are suitable for every atomic expression; see e.g. [11] for more details. See also Fig. 1 for a depiction of the shapes of the fuzzy sets created from a single numeric column with context (0, 0.5, 1).
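For instance, the numeric column age of the data set d may be transformed with lcut3 as sketched below. The call mirrors the lcut3 usage shown later in this paper; the context value c(0, 80) is an illustrative assumption, not a value taken from the original text:

```r
library(lfl)

# Transform numeric ages into membership degrees of linguistic
# expressions such as "very small age" or "roughly big age".
# 'd' is the exemplary data frame created above.
fd <- lcut3(d$age, name='age', context=c(0, 80))

# The result is an 'fsets' object: a matrix of membership degrees
# with one column per generated linguistic expression.
head(colnames(fd))
```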
TABLE I
Function Description
aggregate Implicational aggregation of rule consequents into a fuzzy set
antecedents Extract antecedent-part (LHS) of the rules in a list
as.matrix.fsets Convert a fsets object into matrix
cbind.fsets Combine several fsets objects into a single one
consequents Extract consequent-part (RHS) of the rules in a list
defuzz Convert fuzzy set into a crisp numeric value
errors Compute forecast errors
evalfrbe Evaluate the performance of the FRBE forecast
farules Create a class of rules with statistical characteristics
fcut Transform data into a set of fuzzy attributes using triangular or raised cosine shapes of the fuzzy sets
fire Compute truth-degrees of rules on data
frbe Fuzzy Rule-Based Ensemble of time-series forecasts
fsets Create a class of a table with several fuzzy sets
head.farules Return the first part of an instance of the farules class
head.fsets Return the first part of an instance of the fsets class
is.farules Test whether the given object is a valid object of the farules class
is.frbe Test whether the given object is a valid object of the frbe class
is.fsets Test whether the given object is a valid object of the fsets class
is.specific Determine whether the first set of predicates is more specific than (or equal to) the other
lcut3, lcut5 Transform data into a set of linguistic fuzzy attributes
pbld Perform PbLD with given rule-base on given dataset
perceive From a set of rules, remove each rule for which another rule exists that is more specific
print.farules Print an instance of the farules class
print.frbe Print an instance of the frbe class
print.fsets Print an instance of the fsets class
rbcoverage Compute rule base coverage of data
reduce Reduce the size of rule base
searchrules Search for fuzzy association rules
sel Select several rows and columns from a data object
slices Return vector of values from given interval
tail.farules Return the last part of an instance of the farules class
tail.fsets Return the last part of an instance of the fsets class
tnorm Computation of triangular norms
triangle Compute membership degrees of values to the fuzzy set
Fig. 1. An example of a transformation of a numeric value from the interval [0,1] into trichotomical linguistic expressions using the default setting of the function lcut3. Thick lines represent the atomic linguistic expressions, i.e. "small", "medium", and "big". The figure depicts the following linguistic expressions: black lines on the left: ExSm, SiSm, VeSm, Sm, MlSm, RoSm, QrSm, VrSm; blue lines in the middle: VrMe, QrMe, RoMe, MlMe, Me; black lines on the right: VrBi, QrBi, RoBi, MlBi, Bi, VeBi, SiBi, ExBi.

The lcut3 function accepts many optional arguments, e.g. for selecting the context of each original column, the linguistic hedges to be used, and other configuration.

There are also other functions for the transformation of numeric data: lcut5 creates pentachotomical linguistic expressions by adding "lower medium" (Lm) and "upper medium" (Um) as atomic expressions and "typically" (Ty) as a linguistic hedge (see Fig. 2); fcut creates triangular, trapezoidal, or raised-cosinal fuzzy sets and their combinations (see Fig. 3).

Fig. 3. An example of a transformation of a numeric value from the interval [0,1] into triangular fuzzy sets using the function fcut.

Although such functions may seem redundant from the perspective of other R packages that deal with similar tasks (e.g. the frbs package [9]), we had to re-implement them because:

1) we strictly use the theory of linguistic expressions by Novák [2], which is not fully implemented for R elsewhere;

2) we need some additional information to be stored along with the membership degrees: the association with the original numeric variable (accessible with the vars function) and the relation of specificity among the fuzzy sets (accessible with the specs function); both are important for the rule searching and inference described later. All that information is handled transparently, without any assistance or awareness of the user.

B. Searching for Linguistic Fuzzy Association Rules

The searchrules function is an OPUS-inspired [15] algorithm for searching for fuzzy association rules, with ideas similar to the GUHA method [16] and to the work of Agrawal [17], [18].

Let A be a finite set of fuzzy sets representing linguistic expressions. An association rule is a formula X ⇀ Y, where X ⊂ A is an antecedent, Y ⊂ A is a consequent, and X ∩ Y = ∅. Consider the following rule as an example:

    {medium age, high education} ⇀ {high income}.

The searchrules function traverses the data and searches for all rules that satisfy certain restrictive conditions specified by the user. For each rule, the following characteristics are computed:

• support (of the antecedent, of the consequent, and of the whole rule);

• confidence of the rule.

These characteristics are indicators of the quality of the rules. For details see e.g. [19].

If f is an object of class fsets created with the function lcut3, lcut5, or fcut described in the previous sub-section, the search is as simple as:

    support ...

After mining, it is sometimes useful to use the reduce function, which performs a reduction of rule bases based on the coverage of training data [20]. It is mainly suitable for situations where the automatically mined rules are used as a rule base for an inference, e.g. PbLD (described in the next sub-section).

The rule base coverage of data expresses the amount of data entries for which there exists a rule with an antecedent that models (i.e. "covers") the data. The reduction algorithm selects a minimal rule base that covers at least the specified ratio of the data. The algorithm described in [20] turns out to be very efficient in reduction while retaining the output of the PbLD inference.

Let f be a source data object of class fsets as obtained from any function described in Section III-A, and let result be a rule base generated with the searchrules function as discussed in Section III-B. Then a reduction to a ratio ρ = 0.9 of the coverage of the data by the rule base can be performed as follows:

> reduced <- reduce(f, result, 0.9)

The algorithm first determines the coverage of the original rule base and then selects a minimal subset of rules such that the new coverage is not below ρ percent of the original coverage. Details can be found in [20].

D. Perception-based Logical Deduction

Perception-based Logical Deduction (PbLD) is a specific inference method that is suitable for use with rule bases constructed from linguistic expressions [2]. It assumes that the rules can be partially ordered by their specificity, and the inference is influenced by that partial order: only the most specific rules are selected for the inference.

The functions of the lfl package generate fuzzy sets together with the specificity relation, so that the use of the PbLD
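For illustration, a hedged sketch of such a search on an fsets object f follows. The argument names lhs, rhs, minSupport, and minConfidence are taken from the package manual of version 1.0 and may differ in other versions; the column selection and thresholds are arbitrary illustrative choices:

```r
# Search for fuzzy association rules in the fsets object 'f':
# antecedents built from all columns except the first,
# consequents from the first column only (illustrative choice).
result <- searchrules(f, lhs=2:ncol(f), rhs=1,
                      minSupport=0.02, minConfidence=0.66)

# 'result' is a 'farules' object; printing it shows the rules
# together with their support and confidence.
print(result)
```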
inference is quite easy and straightforward for the user. For more information on PbLD see e.g. [10], [11].

To run the inference, one must provide input data (as membership degrees to the fuzzy sets – see Section III-A), a rule base (e.g. mined with the searchrules function as described in Section III-B), and information for the defuzzification of the output: a vector of possible output values and its corresponding representation in membership degrees of fuzzy sets.

For example, let f be an input data object of 10 rows transformed to membership degrees, and let r be a rule base of the format returned by the searchrules function (see Section III-B) with consequents containing linguistic expressions of a variable "output" ∈ [0, 100]. Then, before the PbLD inference can be executed, the samples of the output have to be prepared first:

> v <- slices(0, 100, 1000)
> p <- lcut3(v, name='output', context=c(0, 100))

The first command creates a vector v of 1000 values from 0 to 100. The second command transforms v into membership degrees to fuzzy sets representing linguistic expressions. After that, the inference may be executed on the input data f:

> pbld(f, r, p, v)
 [1] 12 15 17 19 20 21 23 25 27 29

The results are the values of the output variable inferred from the rule base r for each row of the input values f.

E. Fuzzy Rule-based Ensemble of Forecasts

Data preprocessing, association rule mining, reduction, and PbLD together form an integrated system for rule base creation and use. The lfl package also contains a standalone product of these functions: the Fuzzy Rule-based Ensemble (FRBE), a tool for the forecasting of time series [13], [21], [22].

An extensive search was applied (using the functions described above) to obtain a rule base on various characteristics of a large set of training time series (details are e.g. in [13]). The obtained rule base is now a part of the lfl package and drives a weighted averaging of four different individual forecasting methods: ARIMA, Exponential Smoothing, Random Walk, and Theta. For details on these individual methods see e.g. [23]–[25]. Implementations of them can be found e.g. in the R's forecast package [26].

For example, let s be a vector of quarterly time-series data. It is important to specify the frequency of the data (1 for yearly, 4 for quarterly, 12 for monthly, etc.). To obtain a forecast of 10 data values in the future, we simply use the frbe function.

The Symmetric Mean Absolute Percentage Error (SMAPE) is used as an evaluation criterion. SMAPE is defined as

    SMAPE = (1/n) Σ_{t=1}^{n} |F_t − A_t| / ((|A_t| + |F_t|) / 2),

where F_t (resp. A_t) are the forecasted (resp. actual) values of the time series and n is the number of predicted values.

The methods with the prefix "M3" in Table II are the three best of those that participated in the M3 competition [27], [28]. The methods with the prefix "R" are those available in the R's forecast package [26]. Arithmetic Mean stands for a method that simply averages the forecasts of the four R methods that are also used in FRBE.

As can be seen, FRBE outperforms all the methods (even M3-Theta, the winner of the competition) both in the SMAPE average (hence it produces more accurate results) and in the standard deviation of SMAPE (hence it produces more stable results).

F. Other Tools

The lfl package also contains other functions. Most of them are helpers used inside the main functions described above; nevertheless, they can be used separately, if needed. For instance, there is a function fire that takes an fsets data set together with a rule base and returns the truth degrees of the rules on the given data. The function perceive determines the most specific rules among a set of rules, according to the mechanism used in PbLD. There are also the functions aggregate and defuzz for merging the effect of multiple consequents and for defuzzification, including DEE [2], which was developed specifically for PbLD. A complete manual is included on the web pages of the package – see [7].

IV. TECHNICAL DETAILS

The package is implemented mainly in the R scripting language. However, the most time-critical parts are written in C++ and accessed from R with the help of the Rcpp package [29]. The lfl package compiles on the Windows, Linux, and MacOS platforms.

Some functions written in C++ (such as searchrules or reduce) support multi-threaded execution on multi-core CPUs with the help of the OpenMP C++ library [30]. Functions scripted in the R language support parallelization using the foreach [31] and doMC [32] packages. That type of parallelization is also suitable for execution on high-performance computers powered by the MPI communication library [33]. With the R packages Rmpi [34], doMPI [35], and foreach [31], the initialization of the computing cluster and the execution of the PbLD inference on a large rule base and/or data set is as simple as:
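The SMAPE criterion defined above can be computed in a few lines of base R. This helper is a sketch for illustration only; it is not a function of the lfl package:

```r
# Symmetric Mean Absolute Percentage Error of forecasts 'f'
# against actual values 'a' (numeric vectors of equal length n):
# mean over t of |f[t] - a[t]| / ((|a[t]| + |f[t]|) / 2).
smape <- function(f, a) {
  mean(abs(f - a) / ((abs(a) + abs(f)) / 2))
}

smape(c(110, 190), c(100, 200))  # mean of 10/105 and 10/195
```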
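As an illustration of fire, a hedged sketch follows; the tnorm argument name is an assumption based on the version 1.0 manual, and f and r denote the same fsets object and rule base as in the examples above:

```r
# Compute the truth degrees of each rule in 'r' on the data 'f',
# combining antecedent memberships with the Goedel t-norm (minimum).
degrees <- fire(f, r, tnorm='goedel')

# 'degrees' holds, for each rule, its truth degree on every row of 'f'.
```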
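A hedged sketch of such a forecast is shown below. The frbe interface (a ts object plus a horizon h) follows the version 1.0 manual and may differ in other versions of the package:

```r
# Forecast 10 future values of the quarterly series 's' with the
# Fuzzy Rule-based Ensemble; frequency=4 marks quarterly data.
fit <- frbe(ts(s, frequency=4), h=10)

# The combined (weighted-average) forecast of the ensemble:
fit$mean
```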
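As an illustration of the described setup, a typical doMPI-based initialization of a computing cluster and its registration as the parallel backend for foreach can be sketched as follows; this uses the standard Rmpi/doMPI API and is not the exact command from the original text:

```r
library(Rmpi)
library(doMPI)
library(foreach)

# Initialize an MPI cluster and register it as the parallel
# backend used by the foreach looping construct.
cl <- startMPIcluster()
registerDoMPI(cl)

# ... parallel computations via foreach(...) %dopar% { ... } ...

# Shut the cluster down cleanly at the end of the session.
closeCluster(cl)
mpi.quit()
```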