You are on page 1of 7

IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.

6, June 2010 277

Learning Recurrent ANFIS Using Stochastic Pattern Search
Hang Yu† and Zheng Tang†,

University of Toyama, Toyama-shi, 930-8555 Japan

Summary done in the area of neuro-fuzzy systems the recurrent
Pattern search learning is known for simplicity and faster variants of this architecture are still rarely studied,
convergence. However, one of the downfalls of this learning is although the most likely first approach was presented
the premature convergence problem. In this paper, we show how already several years ago [13]. Thus, in this paper we
we can avoid the possibility of being trapped in a local pit by the perform a time series prediction on a modified structure of
introduction of stochastic value. This improved pattern search is
then applied on a recurrent type neuro-fuzzy network (ANFIS) to
ANFIS with self feedbacks. By doing so, we can forego
solve time series prediction. Comparison with other method the necessity of preprocessing the time series data to map
shows the effectiveness of the proposed method for this problem. the dynamic structure of the network.
Key words: As reported by Y.Bengio [14] in his paper, gradient-based
stochastic pattern search method, ANFIS, , time series optimization is not suitable to train recurrent type
prediction. networks. Simi1ar results were also obtained by Mozer
[15], where it was found that that back-propagation was
not sufficiently powerful to discover contingencies
1. Introduction spanning long temporal intervals. Learning based on
gradient descent learning algorithms includes real-time
In recent years, the study of time series such as approximation, recurrent learning (RTRL) [16], ordered derivative
modulation, prediction and others constitutes a useful task for learning [17] and so on [18]. Disadvantages of these
many fields of research. A reliable system is a model that is able methods include the complexity of learning algorithms and
to forecast with minimal error to yield good preparation for the
local minimum problems. When there are multiple peaks
future and serves as a good decision-making. For these goals,
different methods have been applied: linear methods such as
in a search space, search results are usually stuck in a local
ARX, ARMA, etc. [1], and nonlinear ones such as artificial solution by the gradient descent learning algorithm. To
neural networks [2]. In general, these methods try to build a avoid these disadvantages, parameter design by genetic
model of the process where the last value of the series is used to algorithms (GAs) seems to be a good choice. However,
predict the future values. The common difficulty of the the learning speed of GA is not satisfactory and sometimes
conventional time series modeling is the determination of difficult to find convergence.
sufficient and necessary information for an accurate prediction. Most of these algorithms suffer from local optimal
On the other hand, neural fuzzy networks [3, 4] have become a problem due to the fact that the error function is the
popular research topic [3-5]. They are widely applied in fields superposition of nonlinear activation that may have
such as time series prediction [6], control problem [7], and
minima at different points which often results in non
pattern recognition [8]. The integration of neural network and
fuzzy logic knowledge combines the semantic transparency of convex error surface. Motivated by this understanding, we
rule-based fuzzy systems with the learning capability of neural introduced a random operator in order to provide the
networks. However, a major disadvantage of existing neuro- mechanism required to escape the local pit and at the same
fuzzy systems is that their application is limited to static time reduce the possibility of premature convergence.
problems as a result of their internal feed forward network The concept of pattern search is to find a better candidate
structure. Therefore, without the aid of tapped delays, it cannot nearby the current one before move to the next stage of
represent a dynamic mapping such as in the case of recurrent search [19]. By introducing a random operator, we
networks [9-11]. temporary increase the error when it comes across a local
Taking this into consideration, we choose ANFIS which is a
pit but by doing so, we are also creating a boast much
pioneering result of the early years of neuro fuzzy. In fact, it is
also regarded to be one of the best in function approximation required to escape the pit.
among the several neuro-fuzzy models [12]. As mentioned In this paper we proposed improved pattern search
earlier, since ANFIS is based on a feed forward structure, it learning for a time series prediction for a neuro-fuzzy
unable to handle time series patterns successfully because it does model that was designed to learn and optimize a
not have any dynamics features as in the case of recurrent hierarchical fuzzy rule base with feedback connections.
network. Despite of the research that has already been The recurrent nature of the ANFIS networks allows us to

Manuscript received June 5, 2010
Manuscript revised June 20, 2010

y(t . ri is the (Non-linear Auto Regressive Moving Average model with parameter set. the mathematical A2 calculation is also easier to understand as compared to the y TT N W2 f2 w2 w2 tedious derivative calculations found in back propagation or the complicated process in genetic learning. represents a feed forward ANFIS network and its weights. forward ANFIS network and external variable U represent = μAi (x) * μBi (y) (3) the network input at a defined time.. (pix + qiy + ri) ……… (5) For the error self-feedback. the output of the feed Layer 2: Rule layer where by the incoming signals are forward network is fed-back to the input of the system.10 No.M)] This layer calculates the normalized firing strengths by (7) calculating the ratio of the i-th rule's firing strength to the By doing so.N). A bell-shaped membership function is chosen with minimum and maximum set to 0 and 1 while the parameters of ai.1). eXogenous variables) approach. June 2010 store information of prior system states internally. …. bi.………U (t .1). it is simple and can be implemented with short f computation time. the conventional the input node by the equation stated below: ANFIS network is adapted with 2 feedback connections so . which that the result of the time series of the previous input can may also lead to a reduced complexity. multiplied by the AND operation and sends out the The output at time t is defined as y (t). Layer 1: Fuzzification Layer (x) (1) Layer 1 is the input layer and it specifies the degree to which the given x satisfies the quantifier Ai.. VOL. The feed-back connection from the output is feedback to As mentioned earlier in the introduction. ci is the parameter set. thus exhibiting various forms of membership functions on linguistic label Ai Fig 2: Output Self-feedback μAi (x) = (2) As shown in Fig 2 and Fig 3. which It computes the overall output as the summation of all the forms a self feedback layer. the bell-shaped functions varies accordingly.278 IJCSNS International Journal of Computer Science and Network Security. The non-linear function f incoming signals. Moreover. The details are introduced in the following. we can often times avoid local minima problems A1 x y and thus have a higher probability of success for attaining x the global maxima. q. the advantages of the w1 w1 proposed learning are that since it is based on pattern B1 TT N W1 f1 search. 2…. As the values of these parameters change. it is similar to the NARMAX where s the output from layer 3 and pi. f represents a feed product as the firing strength of a rule. input variables providing information of prior system states have to be used. Therefore it can be Layer 3: Normalization of firing strengths represented as such: for i = 1. B2 x y Fig1: The structure of the recurrent type neuro-fuzzy network 2. (4) y(t) = f[y(t . It predicts a time series y Layer 5: Summation at time t using as regressors the last p values of the series (6) itself and the network input U of the last P values. By using an improved pattern Layer 1 Layer 2 Layer 3 Layer 4 Layer 5 search. U(t . Recurrent type neuro-fuzzy network The network consists of five layers as proposed by Jang in his paper [20].6. we are able to train the canonical ANFIS as a sum of all rule's firing strengths. Furthermore. recurrent of order N by providing the time delay element Layer 4: Defuzzification layer at the output. Also included is the last P value of the prediction error. since no additional be fed back from the time sequence to the current input. where b is a positive value and c locates the center of the curve.

Furthermore. As pointed out by Magoulus [21]. The gradients calculated at each training example understandable. The improved approach of pattern feed forward network. flexible. than exact algorithms. Based on our multi-layered overcome this problem. When such a point is found. procedure to improve the search performance.1). where 0 < η < 1. for example. e(t) = Yp(t) .……wij ]T (9) 1)………….6. The direction it takes can be either one of the many n directions which represents the number of elements in the neighborhood.…. N. The search begins at some initial The proposed model is well suited to identify with the feasible solution. N0 and uses a subroutine improve to temporal behavior of FF networks by passing the error search for a better solution in the N neighborhood as through a time-delay back to the network inputs. N = [w11. and in practice have shown to be very valuable when trying to solve large instances. Pattern search technique is based on iterative exploration Although some problems have been alleviated by of neighborhoods of solutions trying to improve the increasing the step size. to search more efficiently the neighborhood of During a pattern search. different step sizes an initial solution α and continues to replace α with a cause the search along different paths. The search starts from inaccurate solutions.Y (t) able to minimize the error. by providing solution is found. subsequent is Fig 3: Error Self-Feedback Nk+1 = Nk + ∆k or (13) Nk+1 = Nk + ∆k 3. Instead a pattern search option would be Some of the earlier works of pattern search have already taken and the next iteration would be the same as the been described in the late fifties and the early sixties but it current point. June 2010 279 y(t) = f[Yp(t . The solution of large instances has been made feasible by the 4.2. this has often times lead to current solution by local changes. The new step size would be is only in the last ten to fifteen years that pattern search reduced to η∆k. An important entire pattern search training set has been applied to the aspect is that pattern search algorithms are intuitively network. As for the batch applied to many problems. N (α) is a set of solutions obtained from a mechanism to escape the local minimum.P)] (8) By interactively adjusting N through the search.IJCSNS International Journal of Computer Science and Network Security. It also . defined previously.. . The renewed interest in pattern mode the weights of the network are updated only after the search algorithms has several reasons. we are able to α with a slight perturbation.n when is a positive step size parameter.w12. 0.1)………Yp(t .………0)T (10) E(Nk + ∆k ) < E(Vk) or ( 11) E(Nk .e(t . e(t . The first direction vector at k iteration can be defined as such: = (0……0. The error for this system can be The error feedback describes the difference relating the defined as the difference between the actual system output current output to combinations of inputs and past outputs. the neighborhood consists of the search introduces randomness into a function estimation weights between all layers threshold of the neurons. point is found.10 No. then the iteration is declared successful. Nk+1 = Nk . VOL.∆k ) < E(Vk) (12) In which case 1 is initiated as 1. Traditional Pattern Search Method (14) Pattern search has become a widely accepted technique for The iteration can be termed as unsuccessful if no such the solution of hard combinatorial optimization problems. “bumpy" landscapes are serious solutions and the enormous increase in computer speed concern because it can result in a minimum within the and memory availability. neighborhood which might not be a global minimum. and the expected output.N). we are where the error. Stochastic Pattern Search Method development of more sophisticated data structures. A constant for all the algorithms have become very popular and successfully iterations provided with N = 0 and 0.U(t . effecting final better solution in the neighborhood N (α) until no better outcomes. generally easier to implement are added together to determine the change in the weights.

10 No.280 IJCSNS International Journal of Computer Science and Network Security. The algorithm is similar to the pattern search.b) returns a value between a and b. Else conceptual graph of the error landscape with a local 9. goes on it eventually becomes 1. The random penalty for the above algorithm is given by μ(t) = (15) 5. Because of the mechanism of the 13. This random penalty provides an escape than or equal to x. Simulation Results where the function of random(a. minimum point D.bx(t) (16) . we temporary experience an to attain the saturated level becomes slower. The X-coordinate denotes 10.4 is a 8. 5. laser series and Box-Jenkins. we noticed that climb its neighbors. Comparison with other proven methods is carried out in order to show the effectiveness of the proposed model. EndIf value of error function. N) the state of the network and the Y-coordinate denotes the 11. For try:=1. End local search method.. with varying the value of m. sunspot data. No change in error.< ∆E+) then 5.2e -t/m . Begin 2. the state of network moves towards decrease direction and reaches the local minimum (Point Condition 1: μ(t) = -1. Else if (∆E. . From point C. The three possible conditions with μ(t) can be summarized as following: Algorithm scheme for Improved Pattern search procedure step HLS(α. μ(t) appears randomly as 1. N* = localsearch(α. if the network is 12.N) input problem instance α.< 0 and ∆E. candidate solution N output candidate solution N* 1.5 . If we change the dynamics of the MVL at point A to selected contrary to the original pattern search direction increase the value of error temporarily.μ(t)∆k k The basic idea is explained in the following. remains at the to move towards decrease direction and reaches the global current value. The function removes the fractional part of x In the following section. whose characteristics graph is shown in candidate solution's to a higher function values of some of Fig. Nk+1 = Nk + μ(t)∆k Fig 4: Conceptual graph of the stochastic method 6.< ∆E+) then 7. Fig 5: The characteristics graph of h(x) = . become a new point C. if (∆E+ < 0 and ∆E. No move is taken. α := chosen element of Assing(N) 3. maxTries do 4.. Fig. VOL. The sequence of iteration is B). EndFor initialized onto point A. June 2010 modifies the evaluation function of a penalty weight of a If x is negative.1 Mackey-Glass data The method has also been applied to the well-known Mackey–Glass chaotic series given by the following equation.6. Condition 3: μ(t) = 1. point A can and so the value of error increases temporarily. Parameter t denotes the epoch. the network returns Condition 2: μ(t) = 0. Nk+1 = Nk . For example.. Nk+1 . 0 or -1 but as time stagnation. Nk+1 = Nk minimum and global minimum. has been carried out with the proposed method on 4 types of time series data set such as Mackey Glass. we discusses the simulation that and returns an integer value. As for route from a local pit by making a move from the current h(t) = 1 . it returns the first negative integer less random function. By doing so. At the error increase but nevertheless move away from the beginning.

Method RMSE ARMA 0.007 learning and Chen’s local linear wavelet method.6. and those obtained with other methods.01095 Proposed Improved Pattern search 0.0015 Local linear wavelet with hybrid learning The gas furnace data (series J) of Box and Jenkins (1970) 0. Following previous researchers in order to make a parameters of the series are the following: =17. Table 1 contains the results obtained with our method (last line).IJCSNS International Journal of Computer Science and Network Security. the inputs of the prediction model sampling rate is ∆= 6.0069 5. The data set consists of 296 pairs of input- selection[26] output measurements. The input u(t) is the gas flow into Proposed Improved Pattern search 0.0028 Neural tree [23] 0. Our method is shown to be more efficient since Table 2: Comparison of Box. The sampling interval is 9s. to 500 respectively. it is obvious that with our proposed learning.566 Xu’s model 0. we are able to attain much smaller RMSE as Figure 7 (a) Box-Jenkins Prediction Vs Actual .843 Tong’s model 0.Jenkins test results for various methods it increases the prediction accuracy on the test set.00081 algorithms.10 No.412 ANFIS model [10] 0.0065 Figure 6 (b) Mackey’s Prediction Error Based on Table 1.400 Lee’s model [17] 0.026 Figure 6 (a) Mackey’s Prediction Vs Actual Chen’s LWNN 0.596 Surmann’s model 0.685 Pedryc’s model 0.042 Neural tree model [4] 0.638 Lin’s model [18] 0.0036 is well known and frequently used as a benchmark [25] Evolving radial basis function with input example for testing identification and prediction 0.00050 the furnace and the output y(t) is the CO2 concentration in outlet gas. For this simulation.573 Sugeno’s model 0. The objective is to compare our results with those 4 inputs variables are used for constructing a proposed obtained by other authors on the same data.071 HyFIS model [16] 0. the meaningful comparison.085 FuNN model [11] 0. the training and the test sets are set are selected as u (t-4) and y (t-1) and the output is y (t-P). PG-RBF network [22] 0.511 Nie’s model [20] 0.2 Box-Jenkins Data Radial basis function network [24] 0.Du’s ERB with genetic ANFIS and fuzzy system [20] 0. June 2010 281 Table 1: Comparison of Mackey Glass test results for various methods compared to the rest of the methods especially the more Models[ref #] RMSE recently reported work by H. VOL. The model.

Results obtained by using improved presented by using a recurrent based ANFIS network. [3] J.Search 0. The data are a cross-cut through periodic to chaotic required escape needed to avoid a local pit. Fanelli. pattern search but also improving the disadvantages of The data series were scaled between [0. System Identification Theory for User. References [1] L.5 micron 14NH3 cw (FIR) introducing a stochastic parameter in order to provide the laser. Englewood Figure 8(a) Laser Data Prediction Vs Actual Cliffs. The laser generated data consists of intensity assumption. A. Times Series Prediction: Forecasting the Future and Understanding the Past. we improved the canonical local by measurements made on an 81. Compared with the recent Models[ref number] NMSE result presented as published [26]. 1993. to further strengthen the capability of handling temporal data series data. NJ. pp. Conclusion error. June 2010 Figure 7(b) Box-Jenkins Prediction Error Figure 8(b) Laser Data Prediction Error For box Jenkins case. Jang. The predicted time series and the desired time series are plotted as well as the prediction 6. Based on this competition. The success of any neuro-fuzzy model not only depends on the structural layout but also on the 5. Ljung. a method for predicting time series was other methods. NJ. Addison-Wesley.10 No. we try to predict y(t) based on the Table 3: Comparison of Laser test results for various methods Nie’s approach as cited Table 4. Castellano.M. 1987. [4] C. Mencar. Reading. [5] G.S. The various data series NMSE is compared to the result of other works.282 IJCSNS International Journal of Computer Science and Network Security. “ANFIS: Adaptive-network-based fuzzy inference system. 23. 665–685. Neural Fuzzy Systems: A Neural-fuzzy Synergism to Intelligent Systems. The pattern search shows good predictability ability and are improvement made to the conventional ANFIS network is suitable to be used in time series prediction. C. “A neuro-fuzzy network to generate human-understandable knowledge from . N. 1]. The chaotic pulsations we not only retain the advantages of the conventional follow the theoretical Lorenz model of a two level system.00074 method can achieve higher prediction accuracy than the Proposed Improved Local .G.000025 rest of the cited works. C.3 Laser data learning algorithm since the appropriate tuning of membership function and the rules plays an important role Laser data was utilized in the 1992 Santa Fe time series in improving the prediction accuracy. Prentice- Hall. Gershenfeld. we can see the FIR network [31] 0.00044 developed improved ANFIS with proposed learning Multiscale ANN [32] 0.T.S. intensity pulsations of the laser.R. Englewood Cliffs. Weigend. MA. all the 4 sets of data series shows the lowest predicted error when compared to some In this paper. Vol. By doing so. simulations clearly show the effective and the superiority of the proposed algorithm in a time series application. The calculated local optimum problem. Systems Man Cybern. As seen in the simulation. Prentice-Hall. VOL. May 1996. [2] A.6. Lee.” IEEE Trans. Lin. 1994.

No.” IEEE Trans.S. “Optimal design of neural nets [30] E.” Discr. No. Kim.E. 1993. J. et al. “A self-adaptive neural fuzzy network dynamic systems processing by neural network and genetic with group-based symbiotic evolution and its prediction algorithms. Fuzzy Systems Vol. degree from [20] J.. applications. 2002. 10 No. degree at propagation"'. neural model for dynamic system identification. architecture for time series prediction. Sci. 1995. C. G.J. 2006. “A fuzzy-logic-based approach continually running recurrent neural networks. Vrahatis and G. “Optimization of bilinear time series systems using recurrent fuzzy neural networks. “Genetic pattern search in combinatorial optimization.” IEEE Trans. 2. Systems Man Cybernet. 1212–1228. Jiangxi. Neural counterpropagation network. Oper. 1998.J. Economics. Zipser. Hao. Yasukawa. pp. C. Vol. 963–970. Fuzzy Systems Vol.3. Mozer. 4. 30.” PRICAI 2000.” Neural to qualitative modeling.1.” Neurocomputing networks. B.J. 157. . [9] C. [24] C.” IEEE Trans. C. degree from University of Toyama. Man Cybern. 1. Fuzzy Syst. 2. “DENFIS: Dynamic evolving pp. Res. “Nonlinear system control using No. [19] A. Japan.Wan.K. J. Comput.6. pp. 209. Math. 39-42. 147.Huberman. “Evolving compact and Advances in Neural Information Processing Systems 4. “Time series analysis using normalized PG. [7] C. 349–366.A.H. Geva. Vol. Fuzzy Syst. 36. Vol. Kim. No. pp.H.B.” Cognitive Systems Res.J.” In IEEE Int'l Conf. Syst. compensatory neuro-fuzzy networks. Vol. His main research interests are [22] I.S. 3 No. 48. 2002. No. 2006. in: [31] E.Zhang. Q.” IEEE [29] A. 2002. Chen. 270–280. systems. 2000. pp. [21] G. No.4545-4550. 16. Transaction on Signal Processing Letters Vol. (2000). Syst Man Cybern Vol. on Neural Networks.Bengiot. No.” 518–527.F. pp.C. 2004.665-685.J. 282. H. [23] Y. working towards the D. Juang. on Neural Network. Orlando. Du. pp.” IEEE Fuzzy Systems. “ScaleNet.B. Houstan. VOL. pp. neural-fuzzy inference system and its application for time.” Neurocomputing Vol. USA. Song.10 No.C.D.23. Hanson. Induction of multiscale temporal structure.S. [18] B. 1994. 2006. J. Williams. “Combining fossils and sunspots: Committee using hybrid algorithms.W. Vol. pp. functions on a radial basis function network for time series 2. et al.” IEEE Trans. intelligence computing and neural RBF network with regression weights. “A recurrent fuzzy. No. [15] M. 2003. 2. 1. 7. (Standford University. Lin. Mastorocostas.Int'l. 32.Wan. [13] V.Y. No. A. “A new kernel-based fuzzy pp.S. Pesch. J. Y. Methods & University of Toyama. Jain. Chen. 8.” IEEE Trans.2309–2316. Neurocomputing (2007) article in press. pp. “ANFIS: adaptive-network-based fuzzy inference Jiangxi University of Finance and system. 275. J. pp.Magoulas. Jang. No. No. “Nonlinear system modelling via optimal 2002. Vol. Fuzzy Systems Vol. Vol. Japan in 2009. 3. Toyama.multiscale neural-network Learning Long-Term Dependencies in Recurrent Networks.” PhD thesis (FUZZ-IEEE 94). 5. NonLinear Analysis. Sugeno. 5. 1056. “Time series prediction using a local Fundamentals. pp. Rumelhart. Gorrini. Pearlmutter.” IEICE Trans.” IEEE Trans. Vol.” IEEE Trans. 5.N.. 125–144. “Time series prediction using evolving growing. “Gradient calculations for dynamic [36] J.” Neurocomputing Vol.” IEEE. 69.H. radial basis function networks with new encoding scheme. No. Cunningham.A. pp. No. 1990. 1998.” IEEE Trans. 1036 - 155–170. [14] Y. 5. 9. Xu. Teng. 125-137. 3. pp. Kolen. P. “Recurrent fuzzy systems. Nath. E. “The effect of different basis series prediction. of Neural 2002. Applications.H. “Identification and control of dynamic [28] K. [11] P. “Block-structured recurrent [35] Y. 1994.1. pp.1006–1023. 135– neural system modeling. Vol. Dawson. 42. No. Abraham and B. pp. design of neural trees. “Predicting Trans.” Vol.” Int. pp. [33] M.H. Neural Networks. VIC predictions. Rojas. D. 144–154. 3. M. Nie. “A learning algorithm for [34] M. Satini. 8.Simardt. T. Kasabov. China in 2006 and an pp.Weigend. Comput. 1989. encoding scheme. [12] A. Lippman). Harpham. “ATSK-type recurrent fuzzy network for [27] C. Morgan Kauffmann. “Constructing fuzzy model by self-organizing recurrent neural networks: a survey. models using fast evolutionary programming. “Finite Impulse Response Neural Networks with Proceedings of the 3rd Conference on Fuzzy Systems Applications in Time Series Prediction. 1995. [6] N.H.11 No 4. 1995. pp.” IEEE Trans. 167-285. Now he is the alleviation of the problem of local minima in back. 190–197. pp. 10. Man Networks Vol.Theory.E. No. [16] R. S. (eds. 2.IJCSNS International Journal of Computer Science and Network Security. Bimbo. pp. “A new approach to fuzzy- neural networks. No. 1.Androulakis. D.D. 273–284.” Fuzzy Sets and Systems Vol. C.A. 6. June 2010 283 data. 1993. 1183 – 1188. Vol. 1471-1482. Chellapilla. E86-A No. 1995. Chen. pp. 2176-2180. pp. [17] S. “The Problem of [32] A. Syst. pp. linear wavelet neural network. 449-465. 176–190. P. Vol. Lin. Lin.Frasconit and P.R. Appl. Rao.A."' On Toyama. 69. Bersini. Neural Sys. 2161-2170. 1993). 2003. 25. 1993.” IEEE Trans. 6. Melbourne. Lee. et al. 14. R. Theocharis. Cybernet. Vol. M. IEEE Trans. 2. the Future:A connectionist approach. interpretable Takagi-Sugeno fuzzy models with a new Moody. [25] Y.” Neural Networks Vol. Yu Hang received the B. Vol. 4. Combin.1997. Chiang. prediction:a comparative study... [10] C.2. Vol. No. 1992. pp.9. Vol. 1997. 7–31. pp. pp. Lee.S. clustering approach: support vector clustering with cell [26] H. [8] J. 2006.” Vol. 2. N.