Abstract

We present an efficient method for estimating cluster centers of numerical data. This method can be used to determine the number of clusters and their initial values for initializing iterative optimization-based clustering algorithms such as Fuzzy C-Means. Here we combine this cluster estimation method with a least squares estimation algorithm to provide a fast and robust method for identifying fuzzy models from input/output data. A benchmark problem involving the prediction of a chaotic time series shows this method compares favorably with other more computationally intensive methods.

1. Introduction

Clustering of numerical data forms the basis of many classification and system modeling algorithms. The purpose of clustering is to distill natural groupings of data from a large data set, producing a concise representation of a system's behavior. In particular, the Fuzzy C-Means (FCM) clustering algorithm [1,2,3] has been widely studied and applied. The FCM algorithm is an iterative optimization algorithm that minimizes the cost function

J = \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{m} \| x_k - v_i \|^2

where n is the number of data points, c is the number of clusters, x_k is the k'th data point, v_i is the i'th cluster center, \mu_{ik} is the degree of membership of the k'th data point in the i'th cluster, and m is a constant greater than 1. The quality of the FCM solution, like that of most nonlinear optimization problems, depends strongly on the choice of initial values (i.e., the number c and the initial cluster centers).

Yager and Filev [4] proposed a simple and effective algorithm for estimating the number and initial location of cluster centers. Their method is based on gridding the data space and computing a potential value for each grid point based on its distances to the actual data points; a grid point with many data points nearby will have a high potential value. The grid point with the highest potential value is chosen as the first cluster center. The key idea in their method is that once the first cluster center is chosen, the potential of all grid points is reduced according to their distance from the cluster center. Grid points near the first cluster center will have greatly reduced potential. The next cluster center is then placed at the grid point with the highest remaining potential value. This procedure of acquiring a new cluster center and reducing the potential of surrounding grid points repeats until the potential of all grid points falls below a threshold. Although this method is simple and effective, the computation grows exponentially with the dimension of the problem. For example, a clustering problem with 4 variables and each dimension having a resolution of 10 grid lines would result in 10^4 grid points that must be evaluated.
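The grid-based scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea (the constants `alpha`, `beta`, and the stopping ratio below are our own assumptions, not values from Yager and Filev's paper); note how the grid itself has `res**M` points, which is the source of the exponential growth in dimension M:

```python
import numpy as np
from itertools import product

def grid_cluster_estimation(X, res=10, alpha=5.0, beta=1.0, stop_ratio=0.15):
    """Illustrative sketch of grid-based potential clustering.

    X is an (n, M) array of data points, assumed normalized to [0, 1]^M.
    alpha, beta and stop_ratio are illustrative constants (assumptions).
    """
    M = X.shape[1]
    # Build the grid: res**M points -- exponential in the dimension M.
    axes = [np.linspace(0.0, 1.0, res)] * M
    grid = np.array(list(product(*axes)))                # (res**M, M)
    # Potential of each grid point from its distances to the data points.
    d = np.linalg.norm(grid[:, None, :] - X[None, :, :], axis=2)
    P = np.exp(-alpha * d).sum(axis=1)
    centers, P1 = [], P.max()
    while P.max() > stop_ratio * P1:
        k = int(np.argmax(P))                            # highest potential
        centers.append(grid[k])
        # Reduce the potential of grid points near the new center.
        P = P - P[k] * np.exp(-beta * np.linalg.norm(grid - grid[k], axis=1))
    return np.array(centers)
```

With two well-separated data clumps this picks one grid point near each clump and then stops, since the remaining potential everywhere falls below the threshold.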
Authorized licensed use limited to: NWFP UNIV OF ENGINEERING AND TECHNOLOGY. Downloaded on May 08,2022 at 13:33:37 UTC from IEEE Xplore. Restrictions apply.
The method we present here does not involve any iterative nonlinear optimization; in addition, the computation grows only linearly with the dimension of the problem. We use a benchmark problem in chaotic time series prediction to compare the performance of this algorithm with the published results of other algorithms.
2. Cluster Estimation

Consider a collection of n data points {x_1, ..., x_n} in an M-dimensional space. Without loss of generality, we assume that the data points have been normalized in each dimension so that their coordinate ranges in each dimension are equal; i.e., the data points are bounded by a hypercube. We consider each data point as a potential cluster center and define a measure of the potential of data point x_i as

P_i = \sum_{j=1}^{n} e^{-\alpha \| x_i - x_j \|^2}    (1)

where

\alpha = 4 / r_a^2    (2)

and r_a is a positive constant. Thus, the measure of potential for a data point is a function of its distances to all other data points; a data point with many neighboring data points will have a high potential value. The constant r_a is effectively the radius defining a neighborhood of influence.

After the potential of every data point has been computed, we select the data point with the highest potential as the first cluster center. Let x_1^* be the location of the first cluster center and P_1^* its potential value. We then revise the potential of each data point x_i by the formula

P_i \Leftarrow P_i - P_1^* e^{-\beta \| x_i - x_1^* \|^2}    (3)

where \beta = 4 / r_b^2 and r_b is a positive constant. Thus, we subtract an amount of potential from each data point as a function of its distance from the first cluster center. The data points near the first cluster center will have greatly reduced potential, and therefore will unlikely be selected as the next cluster center. The constant r_b is effectively the radius defining the neighborhood which will have measurable reductions in potential. To avoid obtaining closely spaced cluster centers, we set r_b somewhat greater than r_a; a good choice is r_b = 1.5 r_a.

When the potential of all data points has been revised according to Eq. (3), we select the data point with the highest remaining potential as the second cluster center. We then further reduce the potential of each data point according to its distance to the second cluster center. In general, after the k'th cluster center has been obtained, we revise the potential of each data point by the formula

P_i \Leftarrow P_i - P_k^* e^{-\beta \| x_i - x_k^* \|^2}

where x_k^* is the location of the k'th cluster center and P_k^* is its potential value. The process of acquiring new cluster centers and revising potentials repeats, using the following criteria to accept or reject candidate centers:

if P_k^* > \bar{\epsilon} P_1^*
    Accept x_k^* as a cluster center and continue.
else if P_k^* < \epsilon P_1^*
    Reject x_k^* and end the clustering process.
else
    Let d_min = the shortest of the distances between x_k^* and all previously found cluster centers.
    if d_min / r_a + P_k^* / P_1^* >= 1
        Accept x_k^* as a cluster center and continue.
    else
        Reject x_k^* and set the potential at x_k^* to 0.
        Select the data point with the next highest potential as the new x_k^* and re-test.
    end if
end if
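The estimation procedure above can be sketched directly in Python. This is a minimal illustration of Eqs. (1)-(3) and the acceptance criteria, not the authors' implementation; the function name and default constants other than r_b = 1.5 r_a are ours:

```python
import numpy as np

def subtractive_clustering(X, r_a=0.5, r_b=None, eps_hi=0.5, eps_lo=0.15):
    """Potential-based cluster center estimation (a sketch of Section 2).

    X is an (n, M) array of data points, assumed normalized so the data
    are bounded by a hypercube. Returns the estimated cluster centers.
    """
    if r_b is None:
        r_b = 1.5 * r_a                        # suggested choice r_b = 1.5 r_a
    alpha, beta = 4.0 / r_a**2, 4.0 / r_b**2

    # Eq. (1): potential of each point from squared distances to all points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    P = np.exp(-alpha * d2).sum(axis=1)

    centers, P1 = [], None
    while True:
        k = int(np.argmax(P))
        Pk = float(P[k])
        if P1 is None:                         # first center: highest potential
            P1 = Pk
        elif Pk < eps_lo * P1:                 # definitely reject: stop
            break
        elif Pk <= eps_hi * P1:                # gray region: distance trade-off
            d_min = min(np.linalg.norm(X[k] - c) for c in centers)
            if d_min / r_a + Pk / P1 < 1.0:
                P[k] = 0.0                     # reject this point, re-test next
                continue
        centers.append(X[k].copy())
        # Eq. (3): subtract potential in the neighborhood of the new center
        P = P - Pk * np.exp(-beta * ((X - X[k]) ** 2).sum(axis=1))
    return np.array(centers)
```

Because the potential is computed over data points rather than grid points, the cost of this sketch grows with n^2 but only linearly with the dimension M, as claimed in the introduction.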
Here \bar{\epsilon} specifies a threshold for the potential above which we will definitely accept the data point as a cluster center; \epsilon specifies a threshold below which we will definitely reject the data point. We use \bar{\epsilon} = 0.5 and \epsilon = 0.15. If the potential falls in the gray region, we check if the data point provides a good trade-off between having a reasonable potential and being sufficiently far from existing cluster centers.

3. Model Identification

When we apply the cluster estimation method to a collection of input/output data, each cluster center is in essence a prototypical data point that exemplifies a characteristic behavior of the system. Hence, each cluster center can be used as the basis of a rule that describes the system behavior.

Consider a set of c cluster centers {x_1^*, ..., x_c^*} in an M-dimensional space. Let the first N dimensions correspond to input variables and the last M - N dimensions correspond to output variables. We decompose each vector x_i^* into two component vectors y_i^* and z_i^*, where y_i^* contains the first N elements of x_i^* (i.e., the coordinates of the cluster center in input space) and z_i^* contains the last M - N elements (i.e., the coordinates of the cluster center in output space).

We consider each cluster center x_i^* as a fuzzy rule that describes the system behavior. Given an input vector y, the degree to which rule i is fulfilled is defined as

\mu_i = e^{-\alpha \| y - y_i^* \|^2}    (4)

where \alpha is the constant defined by Eq. (2). We compute the output vector z via

z = \frac{\sum_{i=1}^{c} \mu_i z_i^*}{\sum_{i=1}^{c} \mu_i}    (5)

We can view this computational scheme in terms of an inference system employing traditional fuzzy if-then rules. Each rule has the following form:

if y_1 is A_1 & y_2 is A_2 & ... then z_1 is B_1 & z_2 is B_2 & ...

where y_j is the j'th input variable and z_j is the j'th output variable; A_j is an exponential membership function and B_j is a singleton. For the rule that is represented by cluster center x_i^*, A_j and B_j are given by

A_j(y_j) = e^{-\alpha (y_j - y_{ij}^*)^2}
B_j = z_{ij}^*    (for the i'th rule)

where y_{ij}^* is the j'th element of y_i^* and z_{ij}^* is the j'th element of z_i^*. Our computational scheme is equivalent to an inference method that uses multiplication as the AND operator, weights the output of each rule by the rule's firing strength, and computes the output value as a weighted average of each rule's output.

Equations (4) and (5) provide a simple and direct way to translate a set of cluster centers into a fuzzy inference system. However, given the same number of cluster centers, the modeling accuracy of the system can improve significantly if we allow z_i^* in Eq. (5) to be an optimized linear function of the input variables, instead of a simple constant. That is, we let

z_i^* = G_i y + h_i    (6)

where G_i is an (M - N) x N constant matrix, and h_i is a constant column vector with M - N elements. The equivalent if-then rules then become the Takagi-Sugeno type [5], where the consequent of each rule is a linear equation in the input variables. We will refer to fuzzy models that set z_i^* as a constant as "0th order models", and those that employ Eq. (6) as "1st order models".

Expressing z_i^* as a linear function of the input allows a significant degree of rule optimization to be performed without adding much computational complexity. As pointed out by Takagi and Sugeno [5], given a set of rules with fixed premises, optimizing the parameters in the consequent equations with respect to training data reduces to a linear least-squares estimation problem. Such problems can be solved easily and the solution is always globally optimal. We refer to [5] for details on how to convert the equation parameter optimization problem into the least-squares estimation problem. It suffices to say that we can quickly compute the optimal G_i and h_i that minimize the error between the output of the inference system and the output of the training data.

4. Results

We will first apply the model identification method to a simple 2-dimensional function-approximation problem to illustrate its operation. Next we will consider a benchmark problem in prediction of chaotic time series and compare the performance of this method with the published results of other methods.
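Before turning to the experiments, the identification scheme of Section 3 (Eqs. 4-6) can be sketched in Python. This is a minimal illustration for a single (scalar) output variable, assuming the cluster centers have already been found; the function names are ours. The key point is that, for fixed premises, the model output is linear in the consequent parameters, so G_i and h_i drop out of an ordinary least-squares solve:

```python
import numpy as np

def firing_strengths(Y, centers_y, alpha):
    # Eq. (4): mu_i = exp(-alpha * ||y - y_i*||^2) for every sample/rule pair
    d2 = ((Y[:, None, :] - centers_y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-alpha * d2)                    # shape (n_samples, c)

def fit_first_order(Y, z, centers_y, alpha):
    """Least-squares fit of scalar 1st order consequents z_i* = G_i y + h_i
    (Eq. 6). Returns G of shape (c, N) and h of shape (c,)."""
    n, N = Y.shape
    c = len(centers_y)
    mu = firing_strengths(Y, centers_y, alpha)
    rho = mu / mu.sum(axis=1, keepdims=True)      # normalized firing strengths
    # The model output sum_i rho_i (G_i . y + h_i) is linear in the unknown
    # parameters, so stack [rho_i * y, rho_i] blocks into a design matrix.
    A = np.concatenate([rho[:, :, None] * Y[:, None, :], rho[:, :, None]],
                       axis=2).reshape(n, c * (N + 1))
    theta = np.linalg.lstsq(A, z, rcond=None)[0].reshape(c, N + 1)
    return theta[:, :N], theta[:, N]

def predict(Y, centers_y, alpha, G, h):
    # Eq. (5): output is the firing-strength-weighted average of rule outputs
    mu = firing_strengths(Y, centers_y, alpha)
    rho = mu / mu.sum(axis=1, keepdims=True)
    rule_out = Y @ G.T + h                        # (n_samples, c)
    return (rho * rule_out).sum(axis=1)
```

A quick sanity check: a target that is itself linear in the input (e.g. z = 2y + 1) is represented exactly by a 1st order model, since every rule can take G_i = 2, h_i = 1.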
4.1 Function Approximation

For illustrative purposes, we consider the simple problem of modeling the nonlinear function

z = sin(y) / y

[Fig. 1. Comparison of training data with cluster centers and unoptimized 0th order model output.]

Fig. 1 also shows the output of a 0th order fuzzy model that uses constant z_i^* as given by the z coordinate of each cluster center. We see that the modeling error is quite large. Because the clusters are closely spaced with respect to the input dimension, there is significant overlap between the premise conditions of the rules. The degree of fulfillment \mu of each rule as a function of y (viz. Eq. 4) is shown in Fig. 2, where it is evident that several rules can simultaneously have high firing strength even when the input precisely matches the premise condition of one rule. Therefore, the model output is typically a neutral point interpolated from the z coordinates of strongly competing cluster centers.

... improved significantly; in particular, the model output trajectory is now compelled to pass through the cluster centers.

[Fig. 3. Degree of fulfillment of a typical rule as a function of input y, based on the Fuzzy C-Means definition of \mu.]
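The overlap effect described here is easy to reproduce numerically. In the sketch below the 1-D cluster centers and the value of r_a are hypothetical, chosen only so that the spacing between centers is small relative to the neighborhood radius:

```python
import numpy as np

# Hypothetical 1-D cluster centers spaced closely relative to r_a:
centers = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
r_a = 2.0
alpha = 4.0 / r_a**2           # Eq. (2): alpha = 1.0 here

y = 0.0                        # input exactly matching the middle rule's premise
mu = np.exp(-alpha * (y - centers) ** 2)   # Eq. (4)
# mu is roughly [0.018, 0.368, 1.0, 0.368, 0.018]: three rules fire
# appreciably, so the 0th order output (Eq. 5) is an interpolation of their
# z coordinates rather than the matched rule's z coordinate alone.
```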
Although using the FCM definition of \mu can improve the accuracy of unoptimized 0th order models, we note that the \mu function and the resultant model output trajectory have highly nonlinear "kinks" compared with that obtained with the exponential definition of \mu. These kinks tend to limit the ultimately attainable accuracy when optimization is applied to the model. Using the exponential definition of \mu generally results in more accurate optimized models. In what follows, we will not draw any further comparisons between using the exponential definition versus using the FCM definition, but present only the results obtained from using the exponential definition. The output of the optimized 0th order model is shown in Fig. 5 and the output of the optimized 1st order model is shown in Fig. 6. The consequent function z_i^* = G_i y + h_i of each rule is shown in Fig. 7.

4.2 Time Series Prediction

We now consider a benchmark problem in model identification - that of predicting the time series generated by the chaotic Mackey-Glass differential delay equation [6] defined by:

\frac{dx(t)}{dt} = \frac{0.2 \, x(t - \tau)}{1 + x(t - \tau)^{10}} - 0.1 \, x(t)

The task is to use past values of x up to time t to predict the value of x at some time t + \Delta T in the future. The standard input for this type of prediction is N points in the time series spaced \Delta apart, i.e., the input vector is y = {x(t - (N-1)\Delta), ..., x(t - 2\Delta), x(t - \Delta), x(t)}. To allow comparison with other methods, we use N = 4, \Delta = 6, \Delta T = 6. Therefore, each data point in the training set consists of

x = {x(t-18), x(t-12), x(t-6), x(t), x(t+6)}
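A benchmark data set of this shape can be generated as follows. This is a sketch, not the authors' setup: the delay \tau = 17 and initial condition x(0) = 1.2 are the values conventionally used for this benchmark, and the coarse Euler step of 1 time unit is our simplification (a finer step or Runge-Kutta integration is more common):

```python
import numpy as np

def mackey_glass(n_steps, tau=17, dt=1.0, x0=1.2):
    """Integrate dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t) with a
    simple Euler scheme; tau=17 and x0=1.2 are conventional assumptions."""
    hist = int(tau / dt)
    x = np.full(n_steps + hist, x0)            # constant history before t=0
    for t in range(hist, n_steps + hist - 1):
        x_tau = x[t - hist]
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau**10) - 0.1 * x[t])
    return x[hist:]

def training_pairs(x, delta=6, n_in=4, dT=6):
    """Rows of {x(t-18), x(t-12), x(t-6), x(t), x(t+6)} for N=4, Delta=6,
    DeltaT=6; the last column is the prediction target."""
    first = (n_in - 1) * delta
    rows = [np.r_[[x[t - k * delta] for k in range(n_in - 1, -1, -1)], x[t + dT]]
            for t in range(first, len(x) - dT)]
    return np.array(rows)

series = mackey_glass(1000)
data = training_pairs(series)   # columns: x(t-18), x(t-12), x(t-6), x(t), x(t+6)
```

Feeding the first four columns of `data` as inputs and the last as the output into the cluster estimation and least-squares steps above reproduces the structure of the benchmark.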
The ANFIS algorithm also generates fuzzy models consisting of the Takagi-Sugeno type rules. After specifying the number of membership functions for each input variable, the ANFIS algorithm iteratively learns the parameters of the premise membership functions via a neural network and optimizes the parameters of the consequent equations via linear least squares estimation. ANFIS has the advantage of being significantly faster and more accurate than many pure neural-network-based methods.