North-Holland
Q. Song, B.S. Chissom / Forecasting enrollments with fuzzy time series

first-order model can be expressed as follows [10]:

$$F(t) = F(t-1) \circ R(t, t-1) \tag{1}$$

where $\circ$ is the min-max operator and R(t, t - 1) is the union of all the fuzzy relations between any value $f_i(t-1)$ of F(t - 1) and any value $f_j(t)$ of F(t). When applying a fuzzy time series model in forecasting, it is assumed that the variable being forecast is F(t), and thus it can be seen that the role of R(t, t - 1) is to extrapolate from F(t - 1) to F(t); hence it will be called the extrapolation operator. If at any given time t we can obtain only one observation, then to build a time-invariant model we have to use all the observations and their fuzzy relationships. In this case it is actually assumed that F(t) has the same possible values at any t. Therefore, the model is independent of time t. There is the possibility that a fuzzy time series may not have the same possible values at different times. If this is the case, the method to develop time-invariant models is no longer applicable and we need to consider alternate approaches.

It is our belief that there are many different approaches to developing time-variant models. Unfortunately, the definition of fuzzy time series in [10] gives no guide as to how to develop time-variant models. In fact, how to develop a time-variant model with reasonably good accuracy is still an unanswered problem. In this paper, we will present one approach and verify its feasibility through an example. In the development of the model, the determination of the extrapolation operator R(t, t - 1) is the key issue. To determine R(t, t - 1), let us suppose that we are to forecast the value of a fuzzy time series at t with the values at t - 1, t - 2, ..., and t - w (w > 1) known. Since R(t, t - 1) is the union of fuzzy relations, it is acceptable to consider all the fuzzy relations hidden within the known values and let the union of all the fuzzy relations be R(t, t - 1). This can be expressed as follows:

$$R^w(t, t-1) = f^T(t-2) \times f(t-1) \cup f^T(t-3) \times f(t-2) \cup \cdots \cup f^T(t-w) \times f(t-w+1) \tag{2}$$

where w > 1 is a parameter of the model, f(u) is the value of the fuzzy time series at time u, $\times$ the Cartesian product, and T the transpose operator.

In developing $R^w(t, t-1)$, w, the number of years before t, is an important factor. We will refer to w as the model basis. The larger the number of years, the more observed values there are to be obtained. It is assumed that different values for the basis of the model will lead to different model structures. Later in this paper, we will consider the effect of the model basis on forecasting precision.

Since we are dealing with time-variant models, the procedure proposed in [11] should be modified. For clarity, we simply describe the overall procedure, and then provide details of each step of the forecasting process by means of an example. Generally, the procedure can be stated as follows:

Step 1. Specify the universe of discourse U within which fuzzy sets will be defined;
Step 2. Partition U into several even length intervals;
Step 3. Define fuzzy sets on U;
Step 4. If the historical data are linguistic values, go to Step 5; otherwise fuzzify the data using the method in [11];
Step 5. Choose a model basis w and, at a given time t, use (2) to calculate $R^w(t, t-1)$ and apply (1) to forecast F(t); and
Step 6. If the output of the model in the form of fuzzy sets in Step 5 is satisfactory, then stop; otherwise interpret or defuzzify the output and stop.

It can be seen that Step 5 is different from that of [11] in that [11] considers all the historical data to develop the model while here only some of the data are considered. In addition, Step 6 of [11] is now included in Step 5 for convenience of calculation later on. Step 7 of [11] has become Step 6 here.

In the following example, we will implement the above procedure through the forecasting process.

2.2. Specification of the universe of discourse U (Step 1)

Generally, to define the universe of discourse U, find the minimum enrollment $D_{\min}$ and the maximum enrollment $D_{\max}$ from the known historical data. Based on $D_{\min}$ and $D_{\max}$, define U as $[D_{\min} - D_1, D_{\max} + D_2]$ where $D_1$ and $D_2$ are two proper positive numbers. So far, we
have collected historical enrollment data from 1971 to 1992. The minimum enrollment $D_{\min}$ and the maximum enrollment $D_{\max}$ are 13 055 and 19 337 respectively. Therefore, we choose $D_1 = 55$ and $D_2 = 663$ to make U = [13 000, 20 000].

2.3. Partitioning U into several even length intervals (Step 2)

The process is the same as in [11]. We have seven intervals which are

u1 = [13 000, 14 000], u2 = [14 000, 15 000],
u3 = [15 000, 16 000], u4 = [16 000, 17 000],
u5 = [17 000, 18 000], u6 = [18 000, 19 000],
u7 = [19 000, 20 000].

2.4. Defining fuzzy sets on U (Step 3)

Like [11], we will only consider seven fuzzy sets which are A1 = (not many), A2 = (not too many), A3 = (many), A4 = (many many), A5 = (very many), A6 = (too many), and A7 = (too many many). To define these fuzzy sets, we can use the same process as in [11]. Thus all the fuzzy sets Ai (i = 1 to 7) can be defined as follows:

A1 = {u1/1, u2/0.5, u3/0, u4/0, u5/0, u6/0, u7/0},
A2 = {u1/0.5, u2/1, u3/0.5, u4/0, u5/0, u6/0, u7/0},
A3 = {u1/0, u2/0.5, u3/1, u4/0.5, u5/0, u6/0, u7/0},
A4 = {u1/0, u2/0, u3/0.5, u4/1, u5/0.5, u6/0, u7/0},
A5 = {u1/0, u2/0, u3/0, u4/0.5, u5/1, u6/0.5, u7/0},
A6 = {u1/0, u2/0, u3/0, u4/0, u5/0.5, u6/1, u7/0.5},
A7 = {u1/0, u2/0, u3/0, u4/0, u5/0, u6/0.5, u7/1},

where ui (i = 1 to 7) is the element and the number below / is the membership of ui to Aj (j = 1 to 7).

For simplicity, we will also use A1, A2, ..., and A7 as row vectors whose elements are the memberships of the corresponding fuzzy sets.

2.5. Fuzzification of the input data (Step 4)

To fuzzify the historical data, we can apply the same method as employed in [11]. Except for the enrollment data of 1991 and 1992 which were not covered in [11], the rest are the same. For 1991 and 1992, the enrollments are 19 337 and 18 876 respectively, which are almost equal to those of 1990 and 1989 respectively and therefore the fuzzified enrollments for 1991 and 1992 are A7 and A6 (please refer to [11] for details of the fuzzification process). All the fuzzified historical data are listed in Table 1 where Ai (i = 1 to 7) are defined in Section 2.4.

2.6. Choosing a model basis w, calculating extrapolation operator $R^w(t, t-1)$ and forecasting (Step 5)

In this section w is set to be 4. It seems there is no need of explaining why w is chosen to be 4. It is simply for the purpose of demonstration. Later in this paper, we will consider the effects of different values of w on the forecasting errors.

Since w = 4, we can begin with t = 1975 so that we have 4 years' data available. To calculate $R^4(75, 74)$, according to (2), we need to determine the fuzzy relations hidden in the data. Generally, to determine the fuzzy relations, we need to find two consecutive years' data, say, t and t - 1. If the data for t is $A_t$ and $A_{t-1}$ for t - 1, then the fuzzy logical relationship will be $A_{t-1} \to A_t$ and the fuzzy relation between $A_{t-1}$ and $A_t$ will be $A_{t-1}^T \times A_t$. To apply (1) to forecast the value at t + 1, the value at t will be selected from Ai (i = 1 to 7). This is different from [11] where the value at t was the fuzzified data in Table 1 of [11]. Here, Ai (i = 1 to 7) is pre-defined in Section 2.4. The advantage of doing so will be discussed later.

Table 1 will be used to induce fuzzy relationships at each time t in the following process.

To demonstrate the calculation process, only the forecasting for 1975 will be shown below. The same procedure applies to the remaining years.

For 1975, the fuzzy logical relationships are as follows (the repeated relationships are omitted)

A1 → A1, and A1 → A2.
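The construction of U and its partition (Steps 1 and 2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the rule that a value lying exactly on a shared boundary is assigned to the upper interval is our own choice, since the text writes all intervals as closed.

```python
def universe(d_min, d_max, d1, d2):
    """Return U = [d_min - d1, d_max + d2] as a (low, high) pair."""
    return d_min - d1, d_max + d2


def partition(low, high, n):
    """Split [low, high] into n intervals of equal length."""
    step = (high - low) / n
    return [(low + i * step, low + (i + 1) * step) for i in range(n)]


def interval_index(value, intervals):
    """1-based index i of the interval u_i that contains value.

    Assumption: intervals are treated as half-open [a, b), except the
    last one, which also includes its upper end point.
    """
    for i, (a, b) in enumerate(intervals, start=1):
        is_last = i == len(intervals)
        if a <= value < b or (is_last and value == b):
            return i
    raise ValueError("value lies outside the universe of discourse")


# Figures quoted in the text: D_min = 13055, D_max = 19337, D1 = 55, D2 = 663.
low, high = universe(13055, 19337, 55, 663)
intervals = partition(low, high, 7)

print(low, high)                         # 13000 20000
print(interval_index(19337, intervals))  # 7 (the 1991 enrollment lies in u7)
print(interval_index(18876, intervals))  # 6 (the 1992 enrollment lies in u6)
```

The two lookups agree with the fuzzified values A7 and A6 reported for 1991 and 1992, although the actual fuzzification method is the one given in [11] and is not reproduced here.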
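The calculation for 1975 can be traced with a short Python sketch of equations (2) and (1). Several points are assumptions rather than statements of the paper: the Cartesian product of two membership vectors is taken as the element-wise minimum, the union as the element-wise maximum, and the operator in (1) as the usual max-min composition. The fuzzified sequence A1, A1, A1, A2 for 1971 to 1974 is inferred from the two relationships listed above, because Table 1 is not reproduced here. All function names are hypothetical.

```python
# Membership vectors of A1..A7 over u1..u7, following Section 2.4:
# 1 on the matching interval, 0.5 on its neighbours, 0 elsewhere.
A = {
    i: [1.0 if j == i else 0.5 if abs(j - i) == 1 else 0.0 for j in range(1, 8)]
    for i in range(1, 8)
}


def cartesian(a, b):
    """Fuzzy relation a^T x b; entry (i, j) is min(a[i], b[j]) (assumed)."""
    return [[min(x, y) for y in b] for x in a]


def union(r, s):
    """Element-wise maximum of two fuzzy relations (assumed)."""
    return [[max(x, y) for x, y in zip(row_r, row_s)] for row_r, row_s in zip(r, s)]


def extrapolation_operator(history, w):
    """R^w(t, t-1) of equation (2); history ends with f(t-1)."""
    if w < 2 or len(history) < w:
        raise ValueError("need w > 1 and at least w past values")
    window = history[-w:]              # f(t-w), ..., f(t-1)
    relation = None
    for prev, nxt in zip(window, window[1:]):
        pair = cartesian(prev, nxt)    # one term f^T(u) x f(u+1)
        relation = pair if relation is None else union(relation, pair)
    return relation


def compose(f, relation):
    """Max-min composition f o relation (assumed reading of equation (1))."""
    columns = range(len(relation[0]))
    return [max(min(fk, row[j]) for fk, row in zip(f, relation)) for j in columns]


# Fuzzified enrollments for 1971..1974, inferred from A1 -> A1 and A1 -> A2.
history = [A[1], A[1], A[1], A[2]]
R4 = extrapolation_operator(history, 4)
forecast_1975 = compose(history[-1], R4)

print(R4[0])          # [1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
print(forecast_1975)  # [0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
```

Under these assumptions the fuzzy output for 1975 has a maximum membership of 0.5, so it would still have to be interpreted or defuzzified as described in Step 6.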
Table 1. The fuzzified historical enrollments

Table 2. Output results from 1975 to 1993

Table 3. Training data for the network

Input (memberships of u1 to u7)             Desired output
0     0     0     0.3   0.5   0.8   1       0.19328
0     0     0     0.25  0.55  1     0.8     0.18970
0     0     0.1   0.5   0.8   1     0.7     0.18150
0     0.1   0.5   1     0.8   0.1   0       0.16859
0     0.2   0.8   1     0.5   0     0       0.16388
0     0.1   0.5   1     0.9   0.2   0       0.16807
0     0.1   0.5   1     0.9   0.2   0       0.16919
0.2   0.8   1     0.2   0     0     0       0.15311
0     0.2   1     0.7   0.2   0     0       0.15984
0.2   0.8   1     0.2   0     0     0       0.15433
0     0.5   1     0.7   0.2   0     0       0.15861
0     0.6   1     0.6   0.1   0     0       0.15603
0.2   0.8   1     0.2   0     0     0       0.15460
0.8   1     0.8   0.1   0     0     0       0.14696
1     0.8   0.1   0     0     0     0       0.13563
1     0.9   0.2   0     0     0     0       0.13867

defuzzified values. To find good exemplars, we are going to use some data in Table 1 of [11], i.e., the memberships will be taken as the input to the network and the corresponding actual enrollments will be taken as the desired output from the network. Note that if the output of the network is within [0, 1], then the desired output needs scaling. In this paper, all the actual enrollments used are divided by $10^5$ to scale them. The training data are listed in Table 3. The second step is to feed some fuzzy quantities into the trained network to produce outputs. Generally, the output will be the defuzzified values.

Following the above principle, we trained a 3-layer backpropagation network of 7 input nodes, 4 hidden nodes and 1 output node to defuzzify the output of the model obtained in Section 2.6. The network was simulated on the PlaNet neural network simulation software [7]. The learning rate was set to be 1.0 and the momentum 0.9. The training process halted when learning epoch = 7000 and learning error = $10^{-6}$. After the network had been trained, the output data obtained in Section 2.6 were fed into the network and the output data of the network were taken as the defuzzified values after multiplying by $10^5$. In this case, the defuzzified values are the forecasted enrollments. Table 4 lists all the fuzzy outputs, their corresponding defuzzified values and actual values. All the defuzzified values were rounded to hundreds due to the limited precision of the neural network simulation software. Note that when defuzzifying the output of the model, if the maximum of the membership is less than 1, all the memberships are divided by the maximum
membership. In Table 4, the maximum memberships for some outputs are less than 1. Before feeding the output memberships into the network, they were divided by their maximum membership. The reason for doing so is that when used as defuzzifiers the network works as a pattern recognizer. That is, it attempts to match the input with the trained samples. Since all the training samples have 1 as their maximum, better defuzzification results could be obtained when each output of the model has 1 as the maximum. The trained network was also used to defuzzify the outputs of models with different model basis values, which will be discussed in Section 3.

To evaluate the forecasting model and the defuzzification effect, the average forecasting error¹ was computed. The model and the defuzzification procedure yielded an average forecasting error of 4.37% with an error range from 0.2% to 8.28%. When compared to the average forecasting error in [11], this forecasting error is somewhat larger. But compared to the forecasting errors reported in [1, 3, 4, 8, 9, 14, 15], the results indicate that the model developed here is acceptable. Figure 1 shows the curves of the actual enrollment and the forecasted enrollments with w = 4 and w = 2 respectively.

¹ Average Forecasting Error = (sum of forecasting errors)/(total # of errors), Forecasting Error = (|forecasted value - actual value|/actual value) x 100%.

Fig. 1. Curves of the actual enrollment and the forecasted enrollment (legend: actual enrollment; forecasted enrollment, w = 2; forecasted enrollment, w = 4).

To see the relationship between w and the average forecasting errors, w was increased by 1 from 2 to 9 and the corresponding average forecasting errors were computed with three different defuzzification methods. Table 5 lists the results.

Table 5. Average forecasting errors in percent with different w's and methods

w    Neural net method    Combined method [11]    Centroid method
2    3.15                 3.76                    3.44
3    3.89                 4.47                    4.47
4    4.37                 4.25                    4.46
5    4.41                 4.34                    4.55
6    4.49                 4.50                    4.69
7    4.35                 5.10                    4.82
8    4.45                 5.00                    5.20
9    4.23                 5.10                    5.29

It can be seen from Table 5 that almost the same relationship exists between w and the average forecasting errors for these three different defuzzification methods. It can be seen that generally when w increases, the average forecasting error also increases. When w = 2, the error reaches its minimum, implying that less complex fuzzy time series models may be more appropriate than more complex ones. It can also be seen that, on the average, the neural network defuzzification method yields the best forecasting results, and the combined method is slightly more effective than the centroid method.

4. Concluding remarks and discussions

In this paper, we proposed an approach to developing time-variant fuzzy time series models, and as an example presented the process of forecasting the enrollment for The University of Alabama with the model developed. To defuzzify the output of the model, a 3-layer backpropagation neural network was trained and used as the defuzzifier. It was found that of the three different defuzzification methods, the neural network method yielded the best result. But there are some conditions for applying neural networks as defuzzifiers. Since neural
networks function as pattern recognizers in this case, they attempt to match the input patterns with those of the learned samples. The more similar the inputs and the samples, the more likely good defuzzification effects will result. Our experience tells us that if good samples cannot be obtained, one must be cautious when using neural networks as defuzzifiers.

The difference between the time-invariant and the time-variant models is that the development of time-invariant models assumes that all the possible values of the fuzzy time series are the same at any time, while that of the time-variant models does not. Therefore, it is possible that at different times, the time-variant model can be different for the same fuzzy time series. In the case of time-variant fuzzy time series, how to develop a model with reasonably good accuracy is still an unanswered problem. We believe there exist many different methods to solve this problem. Although the approach proposed here is workable, it is not the ultimate one we are looking for. There is still a lot of work to do on this aspect.

One application area of fuzzy time series models is the forecasting problem in a fuzzy environment in which numerical historical data cannot be obtained and only linguistic data are available. The results of this paper and [11] show that the fuzzy time series model is a good tool to deal with such a forecasting problem. Unfortunately, the historical data in our study were numerical, which cannot be applied directly with fuzzy time series models. In order to utilize the fuzzy time series model, we had to first fuzzify the data and then use the fuzzified data to develop the model. In practice, if the historical data are linguistic ones, we suggest application of the following procedure:

(1) estimate the universe of discourse U;
(2) on U, define fuzzy sets using the linguistic values;
(3) if a first-order time-invariant model is needed, determine all the fuzzy logic relations hidden in the data; otherwise choose a proper model basis w and use Step 5 to calculate the extrapolation operator $R^w(t, t-1)$ as described in this paper. Use the fuzzy sets to forecast; and
(4) defuzzify the output of the models if needed.

In Section 2.6 of this paper, we mentioned that the pre-defined membership functions Ai (i = 1 to 7) were used for forecasting. The advantage of doing so is that since Ai is pre-defined, the procedure is equivalent to using the historical linguistic values both for modelling and forecasting. So, using Ai is more realistic than [11].

We have been using enrollment data to indicate how to apply the concepts and methods of fuzzy time series. Since all the proposed procedures and models are universal, it is our belief that the fuzzy time series can be applied wherever there are dynamic processes with linguistic values as their observations. Even if the processes have numerical values as their observations, as the results in this paper and in [11] have shown, fuzzy time series is still a competitive method. We are expecting to see more applications and theoretical refinements of fuzzy time series in the future.

Acknowledgement

The authors wish to thank the referees for their constructive comments and suggestions which helped to revise this paper.

References

[1] S.P. Chatman, Short-term forecasts of the number and scholastic ability of enrolling freshman by academic divisions, Res. Higher Educ. 25(1) (1986) 68-81.
[2] A. Freeman and D.M. Skapura, Neural Networks: Algorithms, Applications and Programming Techniques (Addison-Wesley, Reading, MA, 1991).
[3] D.E. Gardner, Weight factor selection in double exponential smoothing enrollment forecasts, Res. Higher Educ. 14(1) (1981) 49-56.
[4] S.A. Hoenack and W.C. Weiler, The demand for higher education and institutional enrollment forecasting, Economic Inquiry 17 (1979) 89-113.
[5] B. Kosko, Neural Networks and Fuzzy Systems (Prentice Hall, Englewood Cliffs, NJ, 1992).
[6] C.T. Lin and C.S. Lee, Neural-network-based fuzzy logic control and decision system, IEEE Trans. on Computers 40(12) (1991) 1320-1336.
[7] Y. Miyata, A user's guide to PlaNet version 5.6, 1991.
[8] M.B. Paulsen, A practical model for forecasting new freshmen enrollment during the application period, College and University 64(4) (1989) 379-391.
[9] J.A. Pope and J.P. Evans, A forecasting system for