North-Holland
Abstract: In this paper, we introduce a new class of fuzzy linear regression models based on Tanaka's approach. Unlike in the
Tanaka model, here all training data influence the estimated interval. The (crisp) LP of Tanaka's approach is generalized to a
fuzzy linear programming problem. With the proposed model, an adaptation of the fuzzy regression equation to new data
becomes possible.
Keywords: Fuzzy regression analysis; fuzzy linear programming; averaging operators; adaptive fuzzy regression analysis.
1. Introduction
Correspondence to: G. Peters, RWTH Aachen, Institut für Wirtschaftswissenschaften, Lehrstuhl für Operations Research,
D-52056 Aachen, Germany.
2. Tanaka's model
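\[
A_j(a_j) =
\begin{cases}
1 - \dfrac{|\alpha_j - a_j|}{c_j}, & \alpha_j - c_j \le a_j \le \alpha_j + c_j,\\[4pt]
0, & \text{otherwise},
\end{cases}
\tag{2}
\]
where $A_j(a_j)$ is the membership function, $\alpha_j$ is the center, and $c_j$ the spread of the fuzzy number.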
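The triangular membership function can be evaluated directly. The following is a minimal sketch; the function name and the treatment of a zero spread as a crisp number are assumptions made for illustration, not part of the paper.

```python
def triangular_membership(a, center, spread):
    """Membership degree of the value `a` in a symmetric triangular fuzzy number."""
    if spread <= 0:
        # Assumption: a zero spread is read as a crisp number,
        # so only the center itself belongs to it.
        return 1.0 if a == center else 0.0
    if center - spread <= a <= center + spread:
        return 1.0 - abs(center - a) / spread
    return 0.0


# The degree is 1 at the center and falls linearly to 0 at center +/- spread.
print(triangular_membership(2.25, center=2.0, spread=0.5))
```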
Applying the extension principle we get
\[
Y(y) =
\begin{cases}
1 - \dfrac{|y - x^t\alpha|}{c^t|x|}, & x \ne 0,\\[4pt]
1, & x = 0,\ y = 0,\\
0, & x = 0,\ y \ne 0.
\end{cases}
\tag{3}
\]
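As a sketch of how the output membership degree could be computed for one input vector: the center of the output is $x^t\alpha$ and its spread is $c^t|x|$. The function name is hypothetical, and clipping negative values to 0 outside the support is an assumption made so that the function returns a proper membership degree.

```python
def output_membership(y, x, alpha, c):
    """Membership degree Y(y) for input vector x, centers alpha and spreads c."""
    center = sum(a_j * x_j for a_j, x_j in zip(alpha, x))
    spread = sum(c_j * abs(x_j) for c_j, x_j in zip(c, x))
    if spread == 0:
        # Covers x = 0 (and all-zero spreads): the output is the crisp value `center`.
        return 1.0 if y == center else 0.0
    # Assumption: clip at zero so that values outside the support get degree 0.
    return max(0.0, 1.0 - abs(y - center) / spread)


# x = (1, 2) with x_0 = 1 as the constant term: center 5.0, spread 1.0.
print(output_membership(5.5, x=(1, 2), alpha=(1.0, 2.0), c=(0.5, 0.25)))
```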
One may choose to minimize the total vagueness of the given training data by
\[
\min \sum_{j=0}^{N} \sum_{i=1}^{M} c_j |x_{ij}|
\tag{4}
\]
(where M is the number of training samples) such that the membership degree of each observation is
greater than an imposed threshold:
\[
Y(y_i) \ge h \quad (i = 1, 2, \ldots, M).
\tag{5}
\]
The above leads to the following linear programming problem, where $L(\cdot)$ represents the chosen form
of the fuzzy number $A(\cdot)$.
\[
c \ge 0, \quad \alpha \in \mathbb{R}, \quad x_{i0} := 1 \quad (0 \le h \le 1;\ i = 1, 2, \ldots, M).
\]
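For a candidate model, the total vagueness and the threshold condition can be checked with a few lines of code. The sketch below assumes triangular coefficients with a positive output spread, in which case $Y(y_i) \ge h$ is equivalent to the linear inequality $|y_i - x_i^t\alpha| \le (1 - h)\, c^t|x_i|$; the function names are hypothetical.

```python
def total_vagueness(c, samples):
    """Sum over all samples i and coefficients j of c_j * |x_ij|."""
    return sum(c_j * abs(x_ij) for x in samples for c_j, x_ij in zip(c, x))


def meets_threshold(y, x, alpha, c, h):
    """True if Y(y) >= h, assuming triangular coefficients and a positive spread."""
    center = sum(a_j * x_j for a_j, x_j in zip(alpha, x))
    spread = sum(c_j * abs(x_j) for c_j, x_j in zip(c, x))
    return abs(y - center) <= (1.0 - h) * spread


samples = [(1, 1), (1, 2), (1, 3)]   # each row is (x_i0, x_i1) with x_i0 = 1
print(total_vagueness((0.5, 0.25), samples))
print(meets_threshold(5.5, (1, 2), (1.0, 2.0), (0.5, 0.25), h=0.5))
```

Because both expressions are linear in $\alpha$ and $c$, the minimization can be handed to any linear programming solver.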
Fig. 2. Crisp intervals and outliers. [Panels (a) and (b), each plotting Y against X.]
is an outlier in the set or not (e.g., Figure 2(a)). On the other hand, this assumption leads to poor
estimates if we have 'enough' training data with just one (or a few) outlier(s) (e.g., Figure 2(b)).
We can differentiate between two kinds of intervals (Figure 3). The first interval just covers the
Fig. 3. Interval and support. [Two panels plotting Y against X; the support corresponds to h = 0.]
48 G. Peters / Fuzzy linear regression with fuzzy intervals
training data, and the second is defined by the h-cut, h = 0. The latter equals the support of Y. The
support depends on the imposed threshold h and the shape of the fuzzy number A(.). Different
thresholds h and different shapes of A(.) lead to different supports. The width of the support is
interpreted as a confidence factor in the training data by the user of the model. A wide support means
that the user is pessimistic about the training data covering the universe of the real data [9].
Thus, the first step of fuzzy regression is interpreted as interval estimation. In the second step a
membership function may be constructed, which determines the support (see Figure 1).
It is apparent that this interpretation coincides with a theorem given by Moskowitz and Kim [3]. If
one regression analysis solution, $A^*_{h_1,L_1} = (\alpha^*, c^*)$, is known, then other solutions are obtained by (see
Figure 1):
\[
A^*_{h_2,L_2} = \left( \alpha^*,\ \frac{|L_1^{-1}(h_1)|}{|L_2^{-1}(h_2)|}\, c^* \right).
\tag{7}
\]
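As an illustration of this rescaling, suppose both solutions use the triangular shape $L(x) = \max(0, 1 - |x|)$, so that $|L^{-1}(h)| = 1 - h$. This shape is an assumption made for the example, and the function names are hypothetical. The centers stay fixed and every spread is multiplied by $(1 - h_1)/(1 - h_2)$.

```python
def inverse_triangular(h):
    """|L^{-1}(h)| for the triangular shape L(x) = max(0, 1 - |x|)."""
    if not 0.0 <= h < 1.0:
        raise ValueError("h must lie in [0, 1)")
    return 1.0 - h


def rescale_spreads(c, h1, h2):
    """Spreads for threshold h2, given the optimal spreads c found for threshold h1."""
    factor = inverse_triangular(h1) / inverse_triangular(h2)
    return [factor * c_j for c_j in c]


# Raising the threshold from h = 0 to h = 0.5 doubles every spread.
print(rescale_spreads([1.0, 0.5], h1=0.0, h2=0.5))
```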
In the following sections we concentrate on the first step of fuzzy regression analysis. We estimate an
interval that just covers the training data, that is, we set $|L^{-1}(h)| := 1$. So far the interval is crisp, and its
bounds are determined by just a few (bad) values. Compensation between good and bad training data
is impossible. Therefore, Tanaka's model is very sensitive to outliers.
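This sensitivity can be checked numerically on the ten training samples of Experiment 1 (Section 4), where the point at x = 5 is the outlier. The sketch below solves the crisp LP with $|L^{-1}(h)| := 1$ for one regressor by enumerating the vertices of the feasible region. This brute-force search is a stand-in chosen to keep the example dependency-free; it is not the solution method of the paper, and all function names are hypothetical.

```python
from itertools import combinations

# Training data of Experiment 1 (Section 4); (5, 9.4) is the outlier.
DATA = [(1, 1.5), (2, 2.3), (3, 2.7), (4, 4.4), (5, 9.4),
        (6, 6.3), (7, 6.5), (8, 7.8), (9, 8.5), (10, 10.5)]


def solve_square(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination.

    Returns None when the system is (numerically) singular.
    """
    n = len(rhs)
    m = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def tanaka_fit(data):
    """Minimise the total vagueness over z = (alpha0, alpha1, c0, c1).

    Every constraint has the form row . z >= bound, with |L^{-1}(h)| := 1.
    Returns (total_vagueness, z).
    """
    rows, bounds = [], []
    for x, y in data:
        rows.append([1.0, x, 1.0, abs(x)])       # center + spread >= y
        bounds.append(y)
        rows.append([-1.0, -x, 1.0, abs(x)])     # -center + spread >= -y
        bounds.append(-y)
    rows += [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]   # c0 >= 0, c1 >= 0
    bounds += [0.0, 0.0]
    cost = [0.0, 0.0, float(len(data)), float(sum(abs(x) for x, _ in data))]

    best = None
    # An optimum of this LP lies at a vertex, i.e. where four constraints are tight.
    for active in combinations(range(len(rows)), 4):
        z = solve_square([rows[i] for i in active], [bounds[i] for i in active])
        if z is None:
            continue
        feasible = all(
            sum(r * v for r, v in zip(row, z)) >= bound - 1e-9
            for row, bound in zip(rows, bounds)
        )
        if feasible:
            value = sum(k * v for k, v in zip(cost, z))
            if best is None or value < best[0]:
                best = (value, z)
    return best


with_outlier = tanaka_fit(DATA)
without_outlier = tanaka_fit([p for p in DATA if p != (5, 9.4)])
print("total vagueness with outlier:   ", round(with_outlier[0], 3))
print("total vagueness without outlier:", round(without_outlier[0], 3))
```

Removing the single outlier shrinks the minimal total vagueness considerably, which is the behaviour the fuzzy formulation in the following sections is meant to soften. For larger problems a proper LP solver would replace the enumeration.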
Fig. 4. Fuzzy constraints. [Membership degree, from 0 to 1, plotted over the tolerance intervals from z - p_0 to z and from b to b + p.]
Introducing a new variable $\lambda$ which represents the membership degree to which the solution belongs
to the set 'good solution', it is easy to show that the above formulation can be written as (see Figure 4)
\[
\max\ \lambda \quad \text{s.t.} \quad \lambda p_0 - c^t x \le -z,
\]
where $p_i$ is the width of the 'tolerance interval'. Applying the above fuzzy linear programming model to
(8) we get
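\[
\begin{aligned}
\max\ & \lambda \\
\text{such that}\quad
& (1 - \lambda)p_0 - \sum_{i=1}^{M} \sum_{j=0}^{N} c_j |x_{ij}| \ge -d_0 && \text{('objective function')},\\
& (1 - \lambda)p_i + \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge y_i && \text{(upper limit)},\\
& (1 - \lambda)p_i - \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge -y_i && \text{(lower limit)},\\
& -\lambda \ge -1,\\
& \lambda, c \ge 0, \quad \alpha \in \mathbb{R}, \quad x_{i0} := 1,\\
& (|L^{-1}(h)| := 1;\ i = 1, 2, \ldots, M),
\end{aligned}
\tag{11}
\]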
where $p_i$ is the width of the tolerance interval of datum $y_i$. The parameters $p_0$ and $p_i$ must be
determined in a context-dependent way (e.g. see Section 4, Example 1). The parameter $d_0$ represents
the desired value of the objective function (represented by $\lambda = 1$). We suggest selecting $d_0 = 0$, which
makes $\sum_{j=0}^{N} \bigl( c_j \sum_{i=1}^{M} |x_{ij}| \bigr) = 0$ the desired value of the total vagueness; thus we prefer to obtain a
model as crisp as possible.
In (11), $\lambda$ is determined by a trade-off between the objective function (minimization of the spread)
and the equation of the 'worst' datum y. This equation requires the lowest value of $\lambda$ in comparison to
the other equations. Therefore, we use MIN as the aggregation operator. Note that there is still no
compensation between 'good' and 'bad' data.
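The MIN aggregation becomes explicit when the constraints of (11) are solved for $\lambda$ with the regression parameters $(\alpha, c)$ held fixed. The rearrangement below is a sketch; the symbols $S$, $u_i$ and $l_i$ are introduced here for illustration only, and $p_0, p_i > 0$ is assumed.

```latex
\begin{align*}
S &= \sum_{i=1}^{M}\sum_{j=0}^{N} c_j |x_{ij}|, \qquad
u_i = \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}|, \qquad
l_i = \sum_{j=0}^{N} \alpha_j x_{ij} - \sum_{j=0}^{N} c_j |x_{ij}|, \\
\lambda &\le 1 - \frac{S - d_0}{p_0}, \qquad
\lambda \le 1 - \frac{y_i - u_i}{p_i}, \qquad
\lambda \le 1 - \frac{l_i - y_i}{p_i}, \qquad
\lambda \le 1, \\
\lambda_{\max}(\alpha, c) &= \min\Bigl\{\, 1,\; 1 - \frac{S - d_0}{p_0},\;
  \min_{i}\Bigl( 1 - \frac{\max(y_i - u_i,\; l_i - y_i)}{p_i} \Bigr) \Bigr\}.
\end{align*}
```

For a fixed model, the attainable $\lambda$ is therefore set either by the spread term or by the single datum that lies farthest outside the interval $[l_i, u_i]$ relative to its tolerance; a candidate is admissible in (11) only if this value is nonnegative.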
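such that
\[
\begin{aligned}
& (1 - \bar{\lambda})p_0 - \sum_{i=1}^{M} \sum_{j=0}^{N} c_j |x_{ij}| \ge -d_0,\\
& (1 - \lambda_i)p_i + \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge y_i,\\
& (1 - \lambda_i)p_i - \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge -y_i,\\
& -\lambda_i \ge -1,\\
& \bar{\lambda}, c \ge 0, \quad \alpha \in \mathbb{R}, \quad x_{i0} := 1,\\
& (|L^{-1}(h)| := 1;\ i = 1, 2, \ldots, M).
\end{aligned}
\]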
Now a compensation between good and bad training data is achieved. Each training datum influences
the regression function with a weight of 1/M.
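The difference between the two aggregation operators can be seen on a small set of membership degrees. The values below are hypothetical and only serve to contrast MIN with the arithmetic mean, in which every datum carries the weight 1/M.

```python
def aggregate_min(degrees):
    """MIN operator: only the worst datum determines the result."""
    return min(degrees)


def aggregate_mean(degrees):
    """Arithmetic mean: every datum contributes with weight 1/M."""
    return sum(degrees) / len(degrees)


before = [1.0, 1.0, 1.0, 0.0]   # hypothetical degrees: three good data, one bad
after = [1.0, 1.0, 0.5, 0.0]    # one of the good data gets worse

print(aggregate_min(before), aggregate_min(after))     # unchanged
print(aggregate_mean(before), aggregate_mean(after))   # the mean reacts
```

With MIN, a change in any datum other than the worst one has no effect; with the mean, good data can offset a bad one.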
Fig. 5. Utility of the profit. [Horizontal axis: profit in million $, with ticks at 1.0 and 1.5.]
Fig. 6. Some membership functions. [Panels (i) and (ii).]
infinitely far away from the boundary of the interval have membership degrees of $\lambda \to -\infty$ (see Figure
6(i)).
$\lambda$ can be interpreted easily. We assume a fuzzy system. The system contains vagueness to a certain
degree. Therefore we do not distinguish training data that are 'quite good' in the sense that they are in
the interval. They get the same highest value of $\lambda$ ($\lambda = 1$). We assume that the worst data of a training
set (data that are not in the interval) cannot be explained totally by the vagueness of the system.
Therefore these data have lower membership degrees ($\lambda < 1$).
To obtain normalized membership degrees one may use the mapping $\lambda \in (-\infty, 1] \to \lambda_{\text{normalized}} \in
[0, 1]$. For example: $\lambda_{\text{normalized}} = 1/(2 - \lambda)$.
(ii) Data which are located on the mean $\alpha$ get the highest membership degrees (even higher than 1).
Again there is no lower bound for $\lambda$: $\lambda \in (-\infty, +\infty)$ (see Figure 6(ii)). This fuzzy regression model is
related to the statistical $L_1$-criterion.
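For case (i), the two interval constraints of a datum can be solved for its membership degree: data inside the interval get the degree 1, and outside the interval the degree falls linearly with the distance to the nearest bound, without a lower limit. The sketch below assumes the interval bounds and the tolerance $p_i$ are already known; the function names are hypothetical. The second function applies the example mapping $1/(2 - \lambda)$.

```python
def datum_degree(y, lower, upper, tolerance):
    """Largest lambda_i <= 1 allowed by the two interval constraints of one datum."""
    if tolerance <= 0:
        raise ValueError("the tolerance p_i must be positive")
    violation = max(y - upper, lower - y, 0.0)   # distance to the interval
    return 1.0 - violation / tolerance


def normalize_degree(lam):
    """Example mapping of lambda in (-inf, 1] to a normalized degree."""
    if lam > 1.0:
        raise ValueError("lambda must not exceed 1 in case (i)")
    return 1.0 / (2.0 - lam)


for y in (5.0, 7.0, 10.0):
    lam = datum_degree(y, lower=4.0, upper=6.0, tolerance=2.0)
    print(y, lam, normalize_degree(lam))
```

Note that the normalized degree only approaches 0 as $\lambda \to -\infty$; it never reaches it for a finite distance.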
 x     y       x     y
 1    1.5      6    6.3
 2    2.3      7    6.5
 3    2.7      8    7.8
 4    4.4      9    8.5
 5    9.4     10   10.5
where $w_i$ is a weighting parameter, which is low for old data and high for new data.
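One possible way to build such weights is a geometric forgetting scheme. This scheme, the decay factor, the normalization to a sum of 1 and the function names are all assumptions made for illustration; the text only requires that old data receive low weights and new data high weights.

```python
def forgetting_weights(m, decay=0.8):
    """Hypothetical weights for m data ordered from oldest to newest, summing to 1."""
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must lie in (0, 1]")
    raw = [decay ** (m - 1 - i) for i in range(m)]   # the newest datum gets 1
    total = sum(raw)
    return [w / total for w in raw]


def weighted_degree(degrees, weights):
    """Weighted aggregate of the membership degrees of the individual data."""
    return sum(w * d for w, d in zip(weights, degrees))


weights = forgetting_weights(3, decay=0.5)
# A poorly fitted old datum costs less than a poorly fitted new datum.
print(weighted_degree([0.0, 1.0, 1.0], weights))
print(weighted_degree([1.0, 1.0, 0.0], weights))
```

With `decay=1.0` all weights equal 1/M, which corresponds to the unweighted mean.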
As an example, we present a model with fuzzy intervals which is formulated in the following way
(Section 3.3, case (i)):
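\[
\begin{aligned}
\max\ & \bar{\lambda} = \frac{1}{M} \sum_{i=1}^{M} \lambda_i \\
\text{such that}\quad
& (1 - \bar{\lambda})p_0 - \sum_{i=1}^{M} \sum_{j=0}^{N} c_j |x_{ij}| \ge -d_0,\\
& (1 - \lambda_i)p_i + \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge y_i,\\
& (1 - \lambda_i)p_i - \sum_{j=0}^{N} \alpha_j x_{ij} + \sum_{j=0}^{N} c_j |x_{ij}| \ge -y_i,
\end{aligned}
\tag{14}
\]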
Two experiments are presented. In the first experiment we use a training set of 10 samples. In the
second experiment we present a real-life application with OECD data.
d_0    p_0     p_1 = p_2 = p_3 = ...
0      1000    1
0      100     10
[Plot of y = f(x).]
4.1. Experiment 1
We assume the following linear regression function: $y = A_0 + A_1 x$. The training data are given in
Tables 1 and 2. The results are shown in Figures 7-9. There is one outlier in the training data, where
x = 5. In Tanaka's model the interval covers all training data, including the outlier. Therefore the
estimated interval is 'too' wide. We obtain a poor estimate for the given data set (see Figure 7).
The results of the new model are shown in Figures 8 and 9. The outlier is compensated by the other
training data. The width of the estimated interval depends on the selection of the parameters $d_0$, $p_0$, $p_i$.
Fig. 8. $d_0 = 0$, $p_0 = 1000$, $p_1 = p_2 = \cdots = p_{10} = 1$. [Plot of y = f(x).]
A weak requirement to minimize the spread (= a high value of $p_0$ and low values of $p_i$) leads to a wide
interval (see Figure 8). Strong requirements to minimize the spread (= a low value of $p_0$ and high
values of $p_i$) lead to a narrow interval (see Figure 9).
4.2. Experiment 2
[Plot of y = f(x) against x, with x from 1 to 10.]
Table 3. OECD data
influenced by all training data. All training data are either in the estimated interval ($\lambda = 1$) or very close
to it ($\lambda = 0.99\ldots$). Therefore we can conclude that there is no outlier in the data.
5. Conclusion
A fuzzy regression model with fuzzy intervals has been introduced. It has been shown that all training
data influence the estimated interval. It has been demonstrated that this model is a useful generalization
of the model of Tanaka.
The extended model should be used in interaction with an expert. The various parameters that can be
chosen and the several ways to formulate the regression analysis make it possible for the expert to add
knowledge to the model that is not contained in the data itself. Therefore it is a very flexible instrument
which can be applied to many real-life applications.
References
[1] J.D. Jobson, Applied Multivariable Data Analysis, Vol. 1 (Springer-Verlag, New York, 1991).
[2] J. Kacprzyk and M. Fedrizzi, Fuzzy Regression Analysis (Physica-Verlag, Heidelberg, 1992).
[3] H. Moskowitz and K. Kim, On assessing the h value in fuzzy linear regression, Fuzzy Sets and Systems 58 (1993) 303-327.
[4] OECD, Main economic indicators 1960-1979 (OECD, Paris, 1980).
[5] W. Pedrycz and D.A. Savic, Evaluation of fuzzy regression models, Fuzzy Sets and Systems 39 (1991) 51-63.
[6] H. Tanaka and H. Ishibuchi, Possibilistic regression analysis based on linear programming, in: J. Kacprzyk and M. Fedrizzi
(Eds.), Fuzzy Regression Analysis (Physica-Verlag, Heidelberg, 1992) 47-80.
[7] H. Tanaka, S. Uejima and K. Asai, Fuzzy linear regression models, IEEE Trans. Systems, Man and Cybernet. 12 (1982)
903-907.
[8] H. Tanaka, Fuzzy data analysis by possibilistic linear models, Fuzzy Sets and Systems 24 (1987) 363-375.
[9] H. Tanaka and J. Watada, Possibilistic linear systems and their applications to the linear regression model, Fuzzy Sets and
Systems 27 (1988) 275-289.
[10] H.-J. Zimmermann, Fuzzy Sets, Decision Making, and Expert Systems (Kluwer Academic, Boston, 1987).