Matrix Inference in Fuzzy Decision Trees

Santiago Aja-Fernández
LPI, ETSIT Telecomunicación
University of Valladolid, Spain
sanaja@tel.uva.es
Carlos Alberola-López
LPI, ETSIT Telecomunicación
University of Valladolid, Spain
caralb@tel.uva.es
Abstract
A matrix method for fuzzy systems (FITM) is used to perform inferences in fuzzy decision trees (FDT). The method is applied once the tree is designed and built. Using transition matrices, the output calculation is faster and some undesired weighting effects of the FDT can be avoided.
Keywords: FITM, matrix inference, fuzzy decision trees.
1 Introduction
Decision trees have proved to be a simple and robust method to divide the attribute space and to make decisions based on symbolic inputs. They are by nature readily interpretable and well suited to classification problems [1]. A decision tree consists of nodes that test attributes, edges that branch on the values of symbols, and leaves that assign class names.
Different methods have been proposed to create the space partitioning that generates the tree. CART and ID3 [2] are two important algorithms for this task. The main ideas behind both coincide: partition the sample space in a way that depends on the data, and then represent it as a tree. Their aim is to minimize the size of the tree while optimizing some quality criterion. CART does not require an a priori partitioning; it is based on dynamically computed thresholds for continuous domains. ID3 assumes small-cardinality domains and requires an a priori partitioning.
Umano et al. [3] proposed a fuzzy extension of ID3, the fuzzy ID3 algorithm. It is intended for use on a set of fuzzy data and generates a fuzzy decision tree using fuzzy sets defined a priori by the user. Many further studies on fuzzy trees have been reported [1, 2, 4].
In this paper we focus on the fuzzy inference performed once the tree has been generated. As a starting point, we suppose we have a fuzzy decision tree (FDT) that has been created using some well-known technique. Our purpose is not to modify or improve any existing tree-building algorithm, but to improve the inference method over a well-defined tree. The examples presented were built using the fuzzy ID3 algorithm.
To perform the inference we use a recently proposed methodology based on transition matrices, known as FITM (Fast Inference using Transition Matrices) [5]. FITM was initially intended for computing with words (CWW) applications [6], but it may be of interest in other fields, such as control, image processing or hierarchical fuzzy systems (HFSs) modeling [7].
The FITM methodology was proposed to perform inferences efficiently in SAM (Standard Additive Model [8]) fuzzy systems (FSs). It is based on representing each input to the FS as a vector whose coordinates are the contributions to the input of each of the elements of the input linguistic variable (LV). The authors have demonstrated that, with this assumption, a great deal of the operations that SAMs have to carry out can be precomputed and stored as transition matrices, so only a few operations have to be performed on-line, leading to a considerable reduction of the overall computational complexity of the inference process.
EUSFLAT - LFA 2005
In FITM environments, the inputs to the FSs have to satisfy one property: they are required to be linear combinations of the fuzzy sets that make up the input LV. This requirement typically holds in CWW applications. In FDTs, when the features of the samples are expressed in some descriptive language, the requirement also holds, so it should be possible to rebuild an FDT using the FITM methodology and to benefit from its associated computational savings.
This paper is structured as follows. Section 2 reviews the FITM procedure. Section 3 introduces the use of the FITM methodology in FDTs and proposes two inference methods. Section 4 discusses implicit and explicit rule bases.
2 FITM Background
This methodology was originally proposed in [5] to perform inferences efficiently in CWW environments using SAM-FS. It is totally equivalent to SAM inference (in terms of the output centroid), with a considerable reduction of the overall computational complexity of the inference process¹.

In FITM environments each input is represented as a vector in the input space (as shown in [5, 10]). Its coordinates are the contributions of each fuzzy set of the input linguistic variable to the input set. The relation between the sets and the rule base is coded in a small number of matrices. The key to the method is the possibility of precomputing a great deal of the operations, so only a small fraction of the overall complexity of the SAM system has to be performed on-line. In addition, storage needs are moderate, since only the transition matrices defined for each FS are needed to perform the inference. The intermediate data structures needed to obtain the final matrices can be discarded once these transition matrices have been calculated.

¹ Although the FITM procedure was originally proposed for SAM FSs, a natural extension to non-linear FSs has been carried out in [9].

In the following subsection, the method to build a FITM inference engine for a 2-input single-output (2-ISO) FS is presented. For a MISO system see [5].
[Figure 1: Input/output distribution in a FITM. Fuzzy inputs enter the FS block, which produces a fuzzy output.]
2.1 Construction of matrices: 2-ISO Case
Assume a 2-input single-output fuzzy system such as the one in Fig. 1, with inputs X and Y and output Z. The inputs and the output are all fuzzy sets defined on their respective LVs². The first input LV consists of M possible fuzzy sets $A_k$ defined on the universe $U \subset \mathbb{R}$; the second input Y is an LV consisting of N possible fuzzy sets $B_l$ defined on the universe $V \subset \mathbb{R}$; and the output LV consists of L possible fuzzy sets $D_n$ defined on the universe $W \subset \mathbb{R}$. Provided that the inputs can be expressed in vector form [5, 10]:

$$X = \sum_{k=1}^{M} \alpha_k A_k = \boldsymbol{\alpha}^T \mathbf{A}, \qquad Y = \sum_{l=1}^{N} \beta_l B_l = \boldsymbol{\beta}^T \mathbf{B}$$

then the output is

$$Z = \sum_{n=1}^{L} \gamma_n D_n = \boldsymbol{\gamma}^T \mathbf{D}$$

The whole SAM inference process can be rewritten using transition matrices as

$$\boldsymbol{\gamma} = \left( \sum_{l=1}^{N} \beta_l \, \Omega_l \right) \boldsymbol{\alpha} \qquad (1)$$

with $\Omega_l$ the transition matrix of the system for input $Y = B_l$. In order to build these matrices we must define some intermediate data structures that can be discarded once the $\Omega_l$ are calculated.

² When the input to a FS is a crisp value x, the activation of each fuzzy set A is $\mu_A(x)$. When the input is a fuzzy set X (as opposed to a crisp value), the activation is now $\mu_A(X) = A \circ X$, or equivalently, $\mu_A(x) \circ \mu_X(x)$, with $\circ$ a properly defined activation operator [11, 12].
First of all, we must create the activation matrix of each input; for input X it is defined³ as

$$R_A = \begin{bmatrix} A_1 \circ A_1 & \cdots & A_1 \circ A_M \\ \vdots & \ddots & \vdots \\ A_M \circ A_1 & \cdots & A_M \circ A_M \end{bmatrix} = [A_1 \ldots A_M]^T \circ [A_1 \ldots A_M] = \mathbf{A} \circ \mathbf{A}^T \qquad (2)$$

with $A_j$ the different fuzzy sets of the input LV, and where we assume that the operation $\circ$ represents the sum-product composition. $R_B$ is defined accordingly for the second input.
The next step is to calculate the matrices $G_l$ that bear the relation between inputs:

$$G_l = R_A \otimes [R_B E_l] \qquad (3)$$

where $\otimes$ is the Kronecker tensor product, and $R_A$ and $R_B$ are the activation matrices of each input. $E_l$ is a column selection vector, i.e., a column vector with all entries zero except the one at row l, whose value is unity. The purpose of this vector is to extract column l from matrix $R_B$.
The rule base of the system is coded in matrix C. This is a selection matrix with as many rows as rules in the rule base; in row j, all the entries are zero except the one at column i, which is one if the output consequent for rule j is $D_i$. (If the output is not just one set but a membership degree to each of the output sets, then instead of 1s and 0s each entry is the membership degree to the corresponding set.)

Finally, the transition matrices can be calculated as

$$\Omega_l = C^T G_l, \qquad l = 1, \ldots, N \qquad (4)$$
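The output centroid, if desired, can be calculated from the output vector by

$$z_c = \frac{[c_1 \; c_2 \; \cdots \; c_L]\, \boldsymbol{\gamma}}{[1 \; 1 \; \cdots \; 1]\, \boldsymbol{\gamma}} = \frac{\mathbf{c}^T \boldsymbol{\gamma}}{\mathbf{1}^T \boldsymbol{\gamma}} \qquad (5)$$

with $\mathbf{c}$ the vector of the output set centroids $c_i$ and $\boldsymbol{\gamma}$ the output vector defined in (1). As previously mentioned, this centroid coincides exactly with the one from the conventional SAM-FS.

³ If the input X is a crisp value instead of a fuzzy set, matrix $R_A$ becomes the identity matrix.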
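The construction of Section 2.1 can be sketched numerically. The following is a minimal Python/NumPy illustration, not the authors' implementation: the activation matrices, rule consequents and centroids are hypothetical values chosen for the example, and all function and variable names are illustrative. It builds the transition matrices off-line as in (3) and (4), evaluates (1) and (5) on-line, and relies on the row ordering of the Kronecker product, which is assumed here to match the order of the rules in C.

```python
import numpy as np

def transition_matrices(R_A, R_B, C):
    """Off-line step: one transition matrix C^T (R_A kron R_B E_l) per set B_l."""
    matrices = []
    for l in range(R_B.shape[0]):
        col = R_B[:, [l]]              # R_B E_l: column l of R_B, shape (N, 1)
        G_l = np.kron(R_A, col)        # composition matrix (3), shape (M*N, M)
        matrices.append(C.T @ G_l)     # transition matrix (4), shape (L, M)
    return matrices

def fitm_infer(matrices, x_vec, y_vec):
    """On-line step (1): weighted sum of transition matrices times the X vector."""
    return sum(b * T for b, T in zip(y_vec, matrices)) @ x_vec

def centroid(out_vec, c):
    """Output centroid (5)."""
    return (c @ out_vec) / out_vec.sum()

# Hypothetical toy system with M = 3, N = 2, L = 2.
R_A = np.array([[1.0, 0.3, 0.0],
                [0.3, 1.0, 0.4],
                [0.0, 0.4, 1.0]])
R_B = np.array([[1.0, 0.2],
                [0.2, 1.0]])
# Consequent index of each rule, ordered (A_1,B_1), (A_1,B_2), (A_2,B_1), ...
consequent = [0, 0, 1, 0, 1, 1]
C = np.zeros((6, 2))
C[np.arange(6), consequent] = 1.0      # selection matrix: one 1 per row
c = np.array([10.0, 20.0])             # hypothetical output set centroids

transitions = transition_matrices(R_A, R_B, C)   # precomputed once
x_vec = np.array([0.7, 0.3, 0.0])                # X as a combination of the A_k
y_vec = np.array([0.0, 1.0])                     # Y = B_2
out = fitm_infer(transitions, x_vec, y_vec)
z_c = centroid(out, c)
```

Because of the mixed-product property of the Kronecker product, the on-line result equals `C.T @ np.kron(R_A @ x_vec, R_B @ y_vec)`, which is a convenient way to test a precomputed set of transition matrices.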
2.2 General Case
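For the case of a multiple-input single-output (MISO) fuzzy system the expressions are extended accordingly [5]. The relation among coefficients is now given by:

$$\boldsymbol{\gamma} = \left( \sum_{i_1=1}^{N_1} \sum_{i_2=1}^{N_2} \cdots \sum_{i_F=1}^{N_F} \beta^1_{i_1} \beta^2_{i_2} \cdots \beta^F_{i_F} \, \Omega_{i_1 i_2 \cdots i_F} \right) \boldsymbol{\alpha} \qquad (6)$$

where $\boldsymbol{\beta}^i$ and $\boldsymbol{\alpha}$ are the input vectors and $\Omega_{i_1 i_2 \cdots i_F}$ are the transition matrices of the system.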
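As a consistency check on (6), the two smallest cases can be written out. This is a sketch under the notation assumed here: $\boldsymbol{\alpha}$ is the first input vector, $\boldsymbol{\beta}^j$ are the remaining F input vectors, $\boldsymbol{\gamma}$ is the output vector and $\Omega$ denotes a transition matrix. With F = 1 the expression reduces to the 2-ISO relation (1); with F = 2 it gives the double sum needed by a three-feature system such as the tree example of Section 3:

$$
\begin{aligned}
F = 1: \quad & \boldsymbol{\gamma} = \left( \sum_{i_1=1}^{N_1} \beta^1_{i_1} \, \Omega_{i_1} \right) \boldsymbol{\alpha} \\
F = 2: \quad & \boldsymbol{\gamma} = \left( \sum_{i_1=1}^{N_1} \sum_{i_2=1}^{N_2} \beta^1_{i_1} \beta^2_{i_2} \, \Omega_{i_1 i_2} \right) \boldsymbol{\alpha}
\end{aligned}
$$

In both cases the transition matrices are computed off-line, so the on-line work is a weighted sum of $N_1 \cdots N_F$ matrices followed by one matrix-vector product.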
3 Matrix Inference in FDTs
A sample is represented by a set of features expressed in some descriptive language. Samples used as inputs to FDTs usually have non-numeric features, which makes these trees suitable for implementation using FITM. We will suppose that the input features take as values terms that can be expressed in natural language. The features are denoted by $A_j$, and the values they can take by $F_{jl}$. Each of these values has an associated fuzzy set which, for simplicity, is denoted $F_{jl}$ as well. For example, if the third feature is hair color, then $A_3$ = {Hair color}, $F_{31}$ = light and $F_{32}$ = dark. The output set is Z, and the $Z_i$ are the different classes.
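[Figure 2: FDT of the example. The root node HEIGHT has branches low, middle and high. The low branch leads to a WEIGHT node (light, middle, heavy), whose middle branch leads to a HAIR node (light, dark). The middle branch of HEIGHT leads to a HAIR node (light, dark). Leaves 1 to 7 carry the class degrees (Z1, Z2): leaf 1 (0.2, 0.8), leaf 2 (0.3, 0.7), leaf 3 (0.7, 0.3), leaf 4 (0.8, 0.2), leaf 5 (0.1, 0.9), leaf 6 (0.5, 0.5), leaf 7 (0.1, 0.9).]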
We will work with the example shown in Fig. 2, with three features, $A_1$ (height) = {low, middle, high}, $A_2$ (weight) = {light, middle, heavy} and $A_3$ (hair) = {light, dark}, and two output classes, $Z_1$ and $Z_2$. We consider two possible inference methods using transition matrices, which are explored next.
3.1 Direct tree processing
For this first method we keep the tree structure in order not to lose the visual understanding of the process. We make use of the activation matrices defined in (2) to carry out the inference in each node. The steps of the algorithm are as follows:
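1. Activation matrix $R_A$ is created for each feature, according to (2). In our example:

$$R_{height} = \begin{bmatrix} 1 & h_1 & 0 \\ h_1 & 1 & h_2 \\ 0 & h_2 & 1 \end{bmatrix}, \quad R_{hair} = \begin{bmatrix} 1 & p_1 \\ p_1 & 1 \end{bmatrix}, \quad R_{weight} = \begin{bmatrix} 1 & w_1 & 0 \\ w_1 & 1 & w_2 \\ 0 & w_2 & 1 \end{bmatrix}$$

with $h_i$, $p_i$ and $w_i$ the overlap degrees between fuzzy sets.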
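2. Input feature vectors are defined according to section 2:

$$\mathbf{h} = [\alpha_1 \; \alpha_2 \; \alpha_3]^T, \quad \mathbf{w} = [\beta_1 \; \beta_2 \; \beta_3]^T, \quad \mathbf{p} = [\gamma_1 \; \gamma_2]^T$$

(h for height, w for weight and p for hair). For example, if an input sample is h = low, w = heavy and p = dark, the input vectors will be $\mathbf{h} = [1, 0, 0]^T$, $\mathbf{w} = [0, 0, 1]^T$ and $\mathbf{p} = [0, 1]^T$.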
3. The output of each node is calculated by multiplying matrices $R_A$ and the feature vectors; for the height node,

$$[\alpha'_1 \; \alpha'_2 \; \alpha'_3]^T = R_{height}\, \mathbf{h}$$

The output vector must be normalized by its maximum component to balance the weight of each branch in the whole process.
4. The values obtained are brought to the decision tree, as shown in Fig. 3 (only some results are depicted). To get the output value of each leaf, simply multiply the values on all the branches from the root to that leaf.
[Figure 3: Matrix direct tree processing example. The tree of Fig. 2 (nodes HEIGHT, WEIGHT and HAIR) with the leaf values $Z_1$ and $Z_2$ multiplied by the branch values obtained in step 3.]
Hence, all the fuzzy inference is replaced by simple matrix multiplications. When the decision tree is dense, the savings in operations are considerable.
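The four steps above can be sketched for the example tree as follows. This is a minimal Python/NumPy illustration under stated assumptions: the overlap degrees are hypothetical numbers, the variable names are illustrative, and the final aggregation of the leaves into class values (a sum of leaf values weighted by the branch products) is an assumption of this sketch rather than something the text specifies.

```python
import numpy as np

def node_output(R, v):
    """Step 3: branch activations of a node, normalized by the maximum."""
    out = R @ v
    return out / out.max()

# Step 1: activation matrices, with hypothetical overlap degrees.
h1, h2, w1, w2, p1 = 0.3, 0.2, 0.25, 0.4, 0.1
R_height = np.array([[1, h1, 0], [h1, 1, h2], [0, h2, 1]])
R_weight = np.array([[1, w1, 0], [w1, 1, w2], [0, w2, 1]])
R_hair = np.array([[1, p1], [p1, 1]])

# Steps 2 and 3 for the sample height = low, weight = heavy, hair = dark.
H = node_output(R_height, np.array([1.0, 0.0, 0.0]))  # low, middle, high
W = node_output(R_weight, np.array([0.0, 0.0, 1.0]))  # light, middle, heavy
P = node_output(R_hair, np.array([0.0, 1.0]))         # light, dark

# Step 4: product of the branch values from the root to each leaf (Fig. 2).
leaf_weight = {
    1: H[0] * W[0],          # low, light
    2: H[0] * W[1] * P[0],   # low, middle, light hair
    3: H[0] * W[1] * P[1],   # low, middle, dark hair
    4: H[0] * W[2],          # low, heavy
    5: H[1] * P[0],          # middle height, light hair
    6: H[1] * P[1],          # middle height, dark hair
    7: H[2],                 # high
}
leaf_class = {1: (0.2, 0.8), 2: (0.3, 0.7), 3: (0.7, 0.3), 4: (0.8, 0.2),
              5: (0.1, 0.9), 6: (0.5, 0.5), 7: (0.1, 0.9)}

# Assumed aggregation: weighted sum of the leaf class degrees.
Z = sum(leaf_weight[k] * np.array(leaf_class[k]) for k in leaf_weight)
```

For this sample the path low, heavy reaches leaf 4 with full weight, while the leaves on neighbouring branches contribute in proportion to the overlap degrees.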
3.2 FITM processing
Once the decision tree has been created (and properly tested), if it is going to be used in a real application there is no longer any need to maintain the tree structure. In this section we propose a method to compress the tree, trading interpretability for compactness, speed and computational savings. The algorithm is as follows:
1. Matrices $R_A$ are created as before.
2. Composition matrices are created as in (3). In our example

$$G_{ij} = R_{height} \otimes [(R_{weight} \otimes (R_{hair} E_j))\, E_i], \qquad i = 1, 2, 3 \text{ and } j = 1, 2.$$
3. Construction of transition matrices $\Omega_{ij}$. In FITM, in order to build these matrices we need to define a rule matrix C out of a complete rule base. But usually a decision tree is equivalent to a fuzzy system with an implicit rule base. The one of the example is shown in Table 1. Only 7 out of the 18 possible rules (3 × 3 × 2) are present. Rule 2, for example, is a complete rule; it takes on values for all the features. On the other hand, rule 7 takes on a value only for the first feature. The blanks for the second and third features (shown as "-" in Table 1) mean any value. The rule base is in fact complete, but some of its values are implicit.

Rule  Height  Weight  Hair   Z1   Z2
1     Low     Light   -      0.2  0.8
2     Low     Middle  Light  0.3  0.7
3     Low     Middle  Dark   0.7  0.3
4     Low     Heavy   -      0.8  0.2
5     Middle  -       Light  0.1  0.9
6     Middle  -       Dark   0.5  0.5
7     High    -       -      0.1  0.9
Table 1: Implicit rule base

There are two equivalent methods to create transition matrices from an implicit rule base:
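(a) To specify the full rule base. The idea is to fill the blanks in the rule base with all the possible values. In our example, rule 1 would be extended to:

Rule  Height  Weight  Hair   Z1   Z2
1a    Low     Light   Light  0.2  0.8
1b    Low     Light   Dark   0.2  0.8

Extending all the rules, we come up with an 18-rule base⁴. From this base we may define matrix C

$$C = \begin{bmatrix} \underbrace{\begin{matrix} 0.2 \\ 0.8 \end{matrix}}_{1a} & \underbrace{\begin{matrix} 0.2 \\ 0.8 \end{matrix}}_{1b} & \underbrace{\begin{matrix} 0.3 \\ 0.7 \end{matrix}}_{2} & \cdots & \underbrace{\begin{matrix} 0.1 \\ 0.9 \end{matrix}}_{7b} & \underbrace{\begin{matrix} 0.1 \\ 0.9 \end{matrix}}_{7c} \end{bmatrix}^T$$

and then the transition matrices would be $\Omega_{ij} = C^T G_{ij}$.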
(b) Compression of composition matrices. Matrix C is created according to the rule base. In our example (from Table 1):

$$C = \begin{bmatrix} 0.2 & 0.3 & 0.7 & 0.8 & 0.1 & 0.5 & 0.1 \\ 0.8 & 0.7 & 0.3 & 0.2 & 0.9 & 0.5 & 0.9 \end{bmatrix}^T$$

Proceeding this way there would be a discrepancy between the sizes of matrices $G_{ij}$ and matrix C. Instead of replicating lines in the rule base, we now merge rows in matrices $G_{ij}$. To do so, we must first understand the meaning of these matrices. In our example, row 1 of matrix $G_{ij}$ is related to the inputs {low, light, light} and row 2 to {low, light, dark}. Both are implicit in rule 1 of Table 1: {low, light, any} = {low, light, (light ∪ dark)}. It is easy to prove that this union operation, if carried out by adding rows 1 and 2 of matrix $G_{ij}$, is totally equivalent to the extension of rules in the base proposed for the previous method. Applying this reasoning to all the rows:

$$G^{*}_{ij} = \begin{bmatrix} G_{ij}(1) + G_{ij}(2) \\ G_{ij}(3) \\ G_{ij}(4) \\ G_{ij}(5) + G_{ij}(6) \\ G_{ij}(7) + G_{ij}(9) + G_{ij}(11) \\ G_{ij}(8) + G_{ij}(10) + G_{ij}(12) \\ \sum_{k=13}^{18} G_{ij}(k) \end{bmatrix}$$

with $G_{ij}(k)$ the k-th row of matrix $G_{ij}$. Transition matrices are now calculated as $\Omega_{ij} = C^T G^{*}_{ij}$.

⁴ The problem of making an implicit rule explicit is briefly studied in section 4.
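4. Output values of the whole tree are calculated using FITM:

$$\begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \left( \sum_{i=1}^{3} \sum_{j=1}^{2} \beta_i \gamma_j \, \Omega_{ij} \right) \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} \qquad (7)$$

with $\alpha_k$, $\beta_i$ and $\gamma_j$ the components of the height, weight and hair input vectors $\mathbf{h}$, $\mathbf{w}$ and $\mathbf{p}$, respectively.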
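The algorithm of this section can be sketched for the example tree as follows. This is a minimal Python/NumPy illustration, not the authors' code: the overlap degrees are hypothetical, the names are illustrative, and rows of the composition matrices are assumed to be ordered with height varying slowest and hair fastest, as the row descriptions in method (b) suggest. The sketch builds the transition matrices with both methods (a) and (b) so that they can be compared.

```python
import numpy as np

# Activation matrices with hypothetical overlap degrees.
h1, h2, w1, w2, p1 = 0.3, 0.2, 0.25, 0.4, 0.1
R_height = np.array([[1, h1, 0], [h1, 1, h2], [0, h2, 1]])
R_weight = np.array([[1, w1, 0], [w1, 1, w2], [0, w2, 1]])
R_hair = np.array([[1, p1], [p1, 1]])

def G(i, j):
    """Composition matrix for weight set i and hair set j, shape (18, 3)."""
    # (R_weight kron (R_hair E_j)) E_i is column i of weight kron column j of hair.
    col = np.kron(R_weight[:, [i]], R_hair[:, [j]])   # shape (6, 1)
    return np.kron(R_height, col)

# Class degrees (Z1, Z2) of rules 1..7 in Table 1.
C7 = np.array([[0.2, 0.8], [0.3, 0.7], [0.7, 0.3], [0.8, 0.2],
               [0.1, 0.9], [0.5, 0.5], [0.1, 0.9]])
# Rows of G (0-based) covered by each implicit rule.
rows = [[0, 1], [2], [3], [4, 5], [6, 8, 10], [7, 9, 11],
        [12, 13, 14, 15, 16, 17]]

# Method (a): expand to the explicit 18-rule matrix.
C18 = np.zeros((18, 2))
for rule, idx in enumerate(rows):
    C18[idx] = C7[rule]

# Method (b): keep the 7-rule matrix and merge rows of G instead.
def G_merged(i, j):
    Gij = G(i, j)
    return np.vstack([Gij[idx].sum(axis=0) for idx in rows])   # shape (7, 3)

pairs = [(i, j) for i in range(3) for j in range(2)]
omega_full = {(i, j): C18.T @ G(i, j) for i, j in pairs}
omega_merged = {(i, j): C7.T @ G_merged(i, j) for i, j in pairs}

def infer(omega, height, weight, hair):
    """Equation (7): weighted sum of transition matrices times the height vector."""
    T = sum(weight[i] * hair[j] * omega[i, j] for i, j in pairs)
    return T @ height

# Sample: height = low, weight = heavy, hair = dark.
height = np.array([1.0, 0.0, 0.0])
weight = np.array([0.0, 0.0, 1.0])
hair = np.array([0.0, 1.0])
Z = infer(omega_merged, height, weight, hair)
```

Only the six 2 × 3 transition matrices need to be stored for on-line use; the composition matrices and the rule matrices can be discarded once they have been computed.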
4 About making rules explicit
Suppose a 2-input 1-output FS, with $A_i$ (i = 1, …, M) and $B_j$ (j = 1, …, N) the fuzzy sets of the input spaces and $C_k$ (k = 1, …, L) the fuzzy sets of the output space. This system will have a rule base with if-then rules such as

If X is $A_i$ and Y is $B_j$ then Z is $C_k$

An implicit rule (for a 2-ISO system) has the form

If X is $A_i$ then Z is $C_k$

and it must be understood as "If X is $A_i$ and Y is any then Z is $C_k$". The rule base can be made explicit by adding all the possible values of $B_j$:

If X is $A_i$ and Y is $B_1$ then Z is $C_k$
⋮
If X is $A_i$ and Y is $B_N$ then Z is $C_k$
The output of the system will be the same using the implicit or the explicit rule set if max-min inference is used. But if SAM is used the output will not be the same. For the SAM equation

$$Z = \frac{\sum_j A_j(X)\, B_{r(j)}(Y)\, C_{p(j)}}{\sum_j A_j(X)\, B_{r(j)}(Y)}$$

the implicit rule base has a term such as $C_k A_i(X)$ and the explicit one $C_k A_i(X)\,(B_1(Y) + \cdots + B_N(Y))$. So an implicit rule in a SAM system is not equal to an explicit one; in fact it has a lower weight in the final result. This problem can be solved by adding a weight

$$\omega_i = \sum_j B_j(Y)$$

to that rule. Note that this can also affect fuzzy decision trees. If linear operators are used, there can be some effects derived from the density of the tree: leaves reached after a larger number of nodes can have a stronger weight than the ones reached after a small number of them. The completion of the rule base done in section 3.2 for the FITM method indirectly adjusts, in the best way, the different weights of each branch output.
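A small numeric case makes the weighting effect concrete. The following Python sketch is illustrative only: the activation values and centroids are hypothetical, the names are illustrative, and the consequent sets are represented by their centroids so that the SAM expression yields a single number. The rule base has one implicit rule on $A_1$ and two explicit rules on $A_2$.

```python
def sam_output(terms):
    """SAM centroid for a list of (rule activation, consequent centroid) pairs."""
    num = sum(act * cen for act, cen in terms)
    den = sum(act for act, _ in terms)
    return num / den

a = [0.6, 0.4]        # hypothetical activations A_1(X), A_2(X)
b = [0.5, 0.9]        # hypothetical activations B_1(Y), B_2(Y); overlapping sets
c1, c2 = 10.0, 20.0   # hypothetical centroids of the two consequents

# Explicit rules on A_2, shared by the three variants below.
rules_a2 = [(a[1] * b[0], c2), (a[1] * b[1], c2)]

# Implicit rule "If X is A_1 then Z is C_1": activation A_1(X) only.
implicit = sam_output([(a[0], c1)] + rules_a2)

# Same rule made explicit: one rule per value of B_j.
explicit = sam_output([(a[0] * b[0], c1), (a[0] * b[1], c1)] + rules_a2)

# Implicit rule corrected with the weight sum_j B_j(Y).
weighted = sam_output([(a[0] * sum(b), c1)] + rules_a2)
```

With these numbers the implicit rule carries an activation of 0.6 instead of 0.84, so `implicit` differs from `explicit`, while `weighted` reproduces the explicit result.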
5 Conclusions
A new way to work with fuzzy decision trees has been introduced. The FITM method is used to carry out the inference in existing trees in two possible ways. The first keeps the tree structure unaltered, and the second fuses the tree information into a compact array of transition matrices. The key to the method is the fact that many operations can be precomputed off-line to obtain the transition matrices, so actual inferences are reduced to a few on-line matrix additions and multiplications. The FITM method can also avoid some weighting effects that can appear in FDTs as a product of the implicit rules and the linear operators.
Acknowledgments
The authors acknowledge the Comisión Interministerial de Ciencia y Tecnología for research grants TIC2001-3808-C02-02 and TEC2004-3808-C03-01 and the European Commission for the funds associated to the Network of Excellence SIMILAR (FP6-507609).
References
[1] A. Suárez and J. Lutsko, "Globally optimal fuzzy decision trees," IEEE Trans. Pattern Anal. Mach. Intell., no. 12, pp. 1297-1311, Dec. 1999.
[2] C. Janikow, "Fuzzy decision trees: issues and methods," IEEE Trans. on System, Man and Cybernetics - Part B: Cybernetics, vol. 28, no. 1, pp. 1-14, Feb. 1998.
[3] M. Umano, H. Okamoto, I. Hatono, and H. Tamura, "Fuzzy decision trees by fuzzy ID3 algorithm and its application to diagnosis systems," in Proc. of FUZZ-IEEE'94, Orlando, FL, USA, June 1994, pp. 2113-2118.
[4] M. Dong and R. Kothari, "Look-ahead based fuzzy decision tree induction," IEEE Trans. Fuzzy Systems, no. 3, pp. 461-468, June 2001.
[5] S. Aja-Fernández and C. Alberola-López, "Fast inference in SAM fuzzy systems using transition matrices," IEEE Trans. Fuzzy Systems, vol. 12, no. 2, pp. 170-182, Apr. 2004.
[6] L. A. Zadeh, "Fuzzy logic = computing with words," IEEE Trans. Fuzzy Systems, vol. 4, no. 2, pp. 103-111, May 1996.
[7] S. Aja-Fernández and C. Alberola-López, "Fuzzy hierarchical systems with FITM," in Proc. of FUZZ-IEEE'04, Budapest, Hungary, July 2004.
[8] B. Kosko, Fuzzy Engineering. New Jersey: Prentice-Hall International, 1997.
[9] S. Aja-Fernández and C. Alberola-López, "Fast inference using transition matrices: An extension to non-linear operators," IEEE Trans. Fuzzy Systems, in press.
[10] S. Aja-Fernández and C. Alberola-López, "Inference with fuzzy granules for computing with words: A practical viewpoint," in Proc. of FUZZ-IEEE'03, St. Louis, MO, May 2003, pp. 566-571.
[11] S. Aja-Fernández, C. Alberola-López, and G. Cybenko, "A fuzzy MHT algorithm applied to text-based information tracking," IEEE Trans. Fuzzy Systems, vol. 10, no. 3, pp. 360-374, June 2002.
[12] G. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic. New Jersey: Prentice-Hall International, 1995.