IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 7, NO. 5, OCTOBER 1999, p. 617

Analysis and Design of Hierarchical Fuzzy Systems
Li-Xin Wang

Abstract: In this letter, hierarchical fuzzy systems are analyzed and designed. In the analysis part, we prove that hierarchical fuzzy systems are universal approximators and analyze the sensitivity of the fuzzy system output with respect to small perturbations in its inputs. In the design part, we derive a gradient descent algorithm for tuning the parameters of the hierarchical fuzzy system to match the input–output pairs. The algorithm is simulated for two examples, and the results show that the algorithm is effective and that the hierarchical structure gives good approximation accuracy.

Index Terms: Hierarchical structure, iterative training, universal approximation.

Manuscript received November 11, 1996; revised July 15, 1999. This work was supported in part by the Hong Kong RGC under Grants HKUST684/95E and HKUST778/96E.
The author is with the Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
Publisher Item Identifier S 1063-6706(99)08262-4.
1063–6706/99$10.00 © 1999 IEEE

I. INTRODUCTION

As the application domain of fuzzy control expands from simple systems to more complex systems, a serious limitation of the standard fuzzy controller was discovered: the number of rules in a standard fuzzy controller increases exponentially with the number of variables involved [8], [15], [16]. With $n$ variables and $m$ fuzzy sets defined for each variable, we need $m^n$ rules to construct a complete fuzzy controller. As $n$ increases, the rule base will quickly overload the memory and make the fuzzy controller difficult to implement. Therefore, a research problem of fundamental importance is to develop methods to deal with this rule-explosion problem.

The hierarchical fuzzy system, proposed by Raju, Zhou, and Kisner [8], provides a way to deal with this problem. The hierarchical fuzzy system consists of a number of low-dimensional fuzzy systems connected in a hierarchical fashion. Fig. 1 shows a typical example, where two input variables are put into a fuzzy system whose output is combined with another input variable into the second fuzzy system, and this procedure continues until all input variables are used. The hierarchical fuzzy systems have the nice property that the total number of rules increases only linearly with the number of input variables [8]. For the hierarchical fuzzy system in Fig. 1, we see that if we define $m$ fuzzy sets for each input variable, then each low-dimensional fuzzy system consists of $m^2$ rules and, therefore, the total number of rules is $(n-1)m^2$, which is a linear function of the number of input variables $n$.

Fig. 1. An example of an $n$-input hierarchical fuzzy system, which comprises $n-1$ two-input fuzzy systems.

The objective of this letter is to analyze the properties of the hierarchical fuzzy system and to design the hierarchical fuzzy system based on input–output data. In Section II, we analyze the universal approximation and sensitivity properties of the hierarchical fuzzy system. In Section III, we derive a gradient descent algorithm for designing the hierarchical fuzzy system from input–output data. In Section IV, the design algorithm is tested for two examples. Section V concludes the letter.

II. ANALYSIS OF HIERARCHICAL FUZZY SYSTEMS

A. Construction of the Hierarchical Fuzzy System

Consider the hierarchical fuzzy system in Fig. 1. Although this is a special hierarchical fuzzy system (each low-dimensional fuzzy system has only two inputs), it is optimal in the sense that its total number of rules is minimal among all hierarchical fuzzy systems with $n$ inputs [8]. Suppose that $m$ fuzzy sets are defined for each variable $x_i$, where $i = 1, 2, \ldots, n$ and each $x_i$ takes values in a bounded interval. The fuzzy system in the lowest level of Fig. 1 is the Takagi–Sugeno–Kang (TSK) fuzzy system

(1)

where the consequent functions of the rules are linear or nonlinear functions. In general, the fuzzy system at each higher level (counted bottom up) is again a TSK fuzzy system, whose inputs are the output of the level below and one new input variable.
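As a concrete illustration of the two-input building block and of the rule counts quoted in the Introduction, the following minimal sketch may help. It assumes product inference, weighted-average defuzzification, and equally spaced triangular fuzzy sets; the names `triangle`, `make_partition`, `tsk_two_input`, `standard_rule_count`, and `hierarchical_rule_count` are hypothetical helpers introduced here, not notation from the letter.

```python
from typing import Callable, List, Sequence

Membership = Callable[[float], float]
Consequent = Callable[[float, float], float]


def triangle(x: float, left: float, center: float, right: float) -> float:
    """Triangular membership function with value 1 at `center`, 0 outside (left, right)."""
    if x == center:
        return 1.0
    if left < x < center:
        return (x - left) / (center - left)
    if center < x < right:
        return (right - x) / (right - center)
    return 0.0


def make_partition(lo: float, hi: float, m: int) -> List[Membership]:
    """m equally spaced triangular fuzzy sets covering [lo, hi] (an assumed layout)."""
    step = (hi - lo) / (m - 1)
    centers = [lo + i * step for i in range(m)]
    return [lambda x, c=c: triangle(x, c - step, c, c + step) for c in centers]


def tsk_two_input(
    x1: float,
    x2: float,
    sets1: Sequence[Membership],
    sets2: Sequence[Membership],
    consequents: Sequence[Sequence[Consequent]],
) -> float:
    """Weighted average of rule consequents, one rule per pair of fuzzy sets."""
    num = 0.0
    den = 0.0
    for i, mu1 in enumerate(sets1):
        for j, mu2 in enumerate(sets2):
            weight = mu1(x1) * mu2(x2)  # firing strength of rule (i, j)
            num += weight * consequents[i][j](x1, x2)
            den += weight
    return num / den


def standard_rule_count(n: int, m: int) -> int:
    """Rules in a complete standard fuzzy system: m ** n."""
    return m ** n


def hierarchical_rule_count(n: int, m: int) -> int:
    """Rules in the hierarchy of Fig. 1: (n - 1) two-input systems with m ** 2 rules each."""
    return (n - 1) * m ** 2


if __name__ == "__main__":
    sets = make_partition(0.0, 1.0, 3)
    # Hypothetical constant consequents, one per rule.
    rules = [[(lambda a, b, v=i + j: float(v)) for j in range(3)] for i in range(3)]
    print(tsk_two_input(0.25, 0.0, sets, sets, rules))
    print(standard_rule_count(5, 5), hierarchical_rule_count(5, 5))
```

The consequents are passed as callables so that the same function covers constant, linear, or nonlinear rule outputs; inputs are assumed to lie inside the covered interval so that the denominator is nonzero.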

At level $l$ (counted bottom up), the TSK fuzzy system is

(2)

where membership functions are defined for the intermediate variable (the output of the level below) and the consequent functions are again linear or nonlinear functions. The whole hierarchical fuzzy system is

(3)

B. Approximation Property of the Hierarchical Fuzzy System

The hierarchical fuzzy system (3) is a nonlinear mapping from its inputs to its output. An important question is whether this nonlinear mapping is general enough to approximate any nonlinear function to arbitrary accuracy. We will show that the hierarchical fuzzy system is indeed a universal approximator. For simplicity, we consider the three-variable hierarchical fuzzy system

(4)

where the two low-dimensional fuzzy systems are defined as in (1) and (2). The following theorem gives the error bound between the hierarchical fuzzy system (4) and the function to be approximated.

Theorem 1: For any continuously differentiable function on the domain of the three inputs, there exists a hierarchical fuzzy system in the form of (4) such that

(5)

where the infinite norm is defined as the supremum of the absolute value over the domain, and the bound depends on a quantity that becomes smaller as more fuzzy sets are defined for each variable.

Proof of this theorem first appeared in [13]; for completeness of this letter, we also put it in the Appendix. Using the error bound (5), we obtain the following universal approximation theorem.

Theorem 2: For any continuously differentiable function with bounded derivative on the compact set, there exists a hierarchical fuzzy system in the form of (4) such that

(6)

where the constant on the right-hand side is an arbitrarily small positive constant.

Proof: Since the function to be approximated is continuously differentiable and has bounded derivatives over the compact set, the function-dependent quantities on the right-hand side of (5) are finite numbers. Hence, by defining a sufficiently large number of fuzzy sets for each variable, we can make the right-hand side of (5) arbitrarily small.

C. Sensitivity Analysis of the Hierarchical Fuzzy System

The best application of hierarchical fuzzy systems would be systems with a natural hierarchical structure. In such a case, the intermediate variables correspond to the physical variables of the system. If this is not the case, the intermediate variables can still be interpreted as the "internal state variables" of the system. This is analogous to the states in the state-space representation of systems, where a state characterizes some key feature of the system but does not necessarily correspond to any physical variable. Therefore, an interesting topic is to analyze the sensitivity of the system output with respect to small perturbations in the input variables.

From Fig. 1 we see that the input variables go through different numbers of two-dimensional (2-D) fuzzy systems: $x_n$ is passed through only one 2-D fuzzy system, while $x_1$ and $x_2$ are passed through all the fuzzy systems. In many practical problems, we may have the knowledge that some variables are more important than others. Sometimes, we can determine a ranking of importance of the variables involved. So a practical question is: Should we put the most important variable as $x_1$ or as $x_n$? To answer this question, we need to know whether the system output is more sensitive to $x_1$ or to $x_n$. If the output is more sensitive to $x_1$, we should put the most important variable as $x_1$ and the least important variable as $x_n$; otherwise, we should reverse the ordering.

Indeed, if a perturbation in an input is amplified after passing through a low-dimensional fuzzy system, then the more low-dimensional fuzzy systems an input goes through, the more sensitive the output is to this input. Since the low-dimensional fuzzy systems in the hierarchical fuzzy system have the same structure and are connected in a regular fashion, the sensitivity analysis of the whole hierarchical fuzzy system can be simplified to the sensitivity analysis of the low-dimensional fuzzy system. Therefore, we will concentrate on the sensitivity analysis of a single low-dimensional fuzzy system.

When a small perturbation is added to an input of the fuzzy system (1), the change in the output can be computed according to the Mean-Value Theorem, as follows:

(7)

where the partial derivative is evaluated at some point between the original and the perturbed input. From (7), we see that if the magnitude of this partial derivative is larger than one, the perturbation is amplified; otherwise, the change in the output will be smaller than the perturbation in the input.

The sensitivity analysis is therefore equivalent to the analysis of the magnitude of the partial derivative appearing in (7). From (1), we have

(8)

From (8), we can conclude almost nothing about the magnitude of this derivative. The formula is so complex that many factors can influence its value. Therefore, our first observation about the sensitivity analysis is a negative one.

Observation 1: For the hierarchical fuzzy system in Fig. 1, there is no general conclusion about which inputs are more influential to the system output. Consequently, we cannot, in general, put a preference ordering among the input variables.

Hence, to obtain constructive conclusions we have to be restricted to specific fuzzy systems, that is, fuzzy systems with a particular choice of the membership functions and a simpler consequent function. Here, we choose the fuzzy system to be that used in constructing the universal fuzzy system in Theorem 1. Specifically, we choose the membership functions of the two inputs to be the following equally spaced triangular membership functions, which cover the domains of the two inputs, respectively:

(9)

(10)

(11)

(12)

where the triangular membership functions are defined in the usual way (see Fig. 4 in the Appendix). By choosing the consequent function to be a constant, we obtain

(13)

Although this is a particular fuzzy system, it is general enough to be useful in constructing the universal hierarchical fuzzy system (see the Appendix).

We now analyze the sensitivity of the fuzzy system in (13) with respect to small perturbations in its inputs. From (9)–(12) and Fig. 4, we see that at any point at most two membership functions of each input will be nonzero. Furthermore, from Fig. 4 we see that at any point the summation of the two nonzero membership functions equals one. Therefore, (13) can be simplified to

(14)

where the index in (14) is an integer corresponding to the first nonzero membership function. The right-hand side of (14) can be further simplified to

(15)

Taking the partial derivative, we have

(16)

Since the membership functions are triangular and equally spaced, we have

(17)

from which we obtain the following theorem.

Theorem 3: For the fuzzy system of (13), if the conditions on its constant parameters hold for all indices, then the fuzzy system is a contraction mapping (i.e., the magnitude of a perturbation is reduced after passing through it) and we have

(18)

where the perturbation is defined as in (7).
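The two properties of equally spaced triangular membership functions used in the simplification leading to (14), namely that at most two are nonzero at any point and that the nonzero values sum to one, can be checked with a short sketch. It also shows that, with constant consequents, the fuzzy system returns the stored constant at a grid point and interpolates between neighboring constants elsewhere. The domain $[0, 1]$, product inference, and the names `triangular_memberships` and `constant_consequent_system` are assumptions made for this illustration.

```python
from typing import List, Sequence


def triangular_memberships(x: float, lo: float, hi: float, m: int) -> List[float]:
    """Membership values of x in m equally spaced triangular fuzzy sets on [lo, hi]."""
    step = (hi - lo) / (m - 1)
    values = []
    for i in range(m):
        center = lo + i * step
        # Peak of 1 at the center, falling linearly to 0 at the neighboring centers.
        values.append(max(0.0, 1.0 - abs(x - center) / step))
    return values


def constant_consequent_system(
    x1: float, x2: float, c: Sequence[Sequence[float]], lo: float = 0.0, hi: float = 1.0
) -> float:
    """Two-input fuzzy system with constant consequents c[i][j] and product inference."""
    m = len(c)
    mu1 = triangular_memberships(x1, lo, hi, m)
    mu2 = triangular_memberships(x2, lo, hi, m)
    num = sum(mu1[i] * mu2[j] * c[i][j] for i in range(m) for j in range(m))
    den = sum(mu1[i] * mu2[j] for i in range(m) for j in range(m))
    return num / den


if __name__ == "__main__":
    mu = triangular_memberships(0.3, 0.0, 1.0, 5)
    print(mu, sum(mu))
    # Hypothetical constants, one per rule.
    constants = [[float(i + 2 * j) for j in range(3)] for i in range(3)]
    print(constant_consequent_system(0.25, 0.25, constants))
```

Because the memberships sum to one, the denominator equals one inside the domain; it is kept in the code only so that the function matches the general weighted-average form.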

Proof: If the conditions in the theorem are true, then from (17) we have

(19)

Since (19) is true for all indices, it follows from (7) that

(20)

From Theorem 3, we see that if the fuzzy system has smaller values of its constant parameters and a smaller number of membership functions, then the magnitude of the perturbation will be reduced after passing through the fuzzy system. Although the precise conclusion (18) can only be obtained under the restrictive conditions in Theorem 3, we may draw some rough observations from the inequality (17).

Observation 2: Roughly speaking, the sharper the membership functions, the larger the sensitivity function. For the equally spaced triangular membership functions (9)–(12) (or similar functions), this means in practice that if one wants the perturbation to be enlarged after passing through the fuzzy system, then more membership functions should be defined for the input; otherwise, fewer membership functions should be defined.

Observation 3: Roughly speaking, the smaller the values of the constant parameters, the smaller the sensitivity function. That is, smaller values would reduce and larger values would amplify the perturbation when it is passed through the fuzzy system.

Although Observations 2 and 3 are not precise conclusions, they give rough ideas of how the membership functions and the parameters in the fuzzy system influence its sensitivity to the perturbations in the inputs.

III. DESIGN OF HIERARCHICAL FUZZY SYSTEMS THROUGH TRAINING

Although in the proof of Theorem 1 (see the Appendix) we construct a fuzzy system that satisfies the error bound (5), we require that the values of the approximated function are known at some regular points over the domain. That is, the input points of the input–output pairs cannot be arbitrarily chosen. In practice, however, it is often the case that we only know the values of the function at a limited number of points and the locations of these points cannot be arbitrarily chosen. Specifically, we are given a number of input–output pairs, where the input points are fixed in advance. The task of this section is to design a hierarchical fuzzy system that matches the input–output pairs.

For simplicity and without loss of much generality, let the fuzzy system be the three-variable hierarchical fuzzy system of (4). From the Appendix we see that in order for the hierarchical fuzzy system to be a universal approximator, it is sufficient to: 1) fix the membership functions covering the input variables and the intermediate variable; 2) choose the first-level consequent function to be a constant; and 3) choose the second-level consequent function to be a polynomial of suitable order in the intermediate variable. Therefore, we define the structure of the two fuzzy systems in (4) as

(21)

(22)

where the membership functions are fixed (they may be chosen as the triangular membership functions in the Appendix) and the free parameters are the first-level constants and the second-level polynomial coefficients. Our goal is to determine these free parameters such that the matching error

(23)

is minimized.

We use the gradient descent algorithm to determine the parameters. Specifically, to determine the first-level parameters, we use the training algorithm

(24)

where the index in (24) is the training index and the stepsize is a constant. From (4) and (23), and using the chain rule, we have

(25)

From (21), it is easy to get

(26)

From (22) we have

(27)

If the membership functions of the intermediate variable were chosen to be triangular membership functions like (9)–(11), their derivative would not be well defined at the vertices of the membership functions. But, since the vertices are isolated points, we define the derivative there to be the average of its left and right values at these points.

In (27), all the functions are evaluated at the current input point and the current parameter values. Summarizing (24)–(27), we obtain the training algorithm for the first-level parameters:

(28)

where the two derivative terms are given by (27) and (26), respectively, and all the functions are evaluated at the current input point and the current parameter values.

Next, we use the training algorithm

(29)

to determine the second-level parameters, where the stepsize is a constant. Since the first-level output does not depend on the second-level parameters, we have from (22) and (23) that

(30)

Substituting (30) into (29), we obtain the following training algorithm for the second-level parameters:

(31)

We now summarize the algorithm as follows.

Training Algorithm for Designing the Hierarchical Fuzzy System: Consider the three-variable hierarchical fuzzy system (4) with the two low-dimensional fuzzy systems given by (21) and (22), respectively, and suppose we are given a collection of input–output pairs.

Step 1) Choose the membership functions and the initial parameters.

Step 2) For a given input–output pair, update the first-level and second-level parameters according to (28) and (31), respectively.

Step 3) Go to Step 2) with the training index increased by one, until the error is less than a prespecified small number or until the training index equals a prespecified maximum training step.

Step 4) Go to Step 2) with the next input–output pair.

In Step 1), if no linguistic information is available, the membership functions may be chosen uniformly across the domains of the variables and the initial parameters may be chosen to be small random numbers. If linguistic descriptions are available, the membership functions and the initial parameters can be chosen according to the linguistic information.

IV. SIMULATIONS

Standard fuzzy systems have been simulated extensively in the literature. Depending on the structure chosen, the results vary from poor to almost perfect approximation. Since both standard and hierarchical fuzzy systems are universal approximators, either can outperform the other by properly choosing the structure. In this section, we use the training algorithm in Section III to design hierarchical fuzzy systems to approximate a nonlinear function (Example 1) and the nonlinear component in a dynamic system (Example 2).

Example 1: Consider the function

(32)

Assume that the function is unknown, but we know its values at some regular points in the domain; that is, we are given $6^3$ input–output pairs at these points. Our task is to design a hierarchical fuzzy system using the training algorithm in Section III based on these $6^3$ input–output pairs.
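To make the training procedure concrete on grid data of this kind, here is a minimal sketch of a gradient descent loop for a three-variable hierarchical system. It is a simplified variant and not the exact structure of (21) and (22): the first level uses constant consequents over triangular fuzzy sets of the first two inputs, and the second level is assumed to use one low-order polynomial in the first-level output per fuzzy set of the third input, with no membership functions on the intermediate variable. The target function, grid, stepsizes, polynomial order, and all names (`memberships`, `HierarchicalFuzzySystem`, `run_demo`) are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float, float], float]


def memberships(x: float, m: int) -> List[float]:
    """m equally spaced triangular fuzzy sets on [0, 1]; the values sum to one."""
    step = 1.0 / (m - 1)
    return [max(0.0, 1.0 - abs(x - i * step) / step) for i in range(m)]


class HierarchicalFuzzySystem:
    """f(x1, x2, x3) = f2(f1(x1, x2), x3) with trainable consequent parameters."""

    def __init__(self, m: int, order: int, rng: random.Random) -> None:
        self.m = m
        # First level: one constant per rule (m * m rules), small random start.
        self.c = [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(m)]
        # Second level (assumed form): one polynomial in y1 per fuzzy set of x3.
        self.a = [[rng.uniform(-0.1, 0.1) for _ in range(order + 1)] for _ in range(m)]

    def forward(self, x: Sequence[float]):
        mu1, mu2, mu3 = (memberships(v, self.m) for v in x)
        idx = range(self.m)
        # Memberships sum to one on [0, 1], so no normalizing denominator is needed.
        y1 = sum(mu1[i] * mu2[j] * self.c[i][j] for i in idx for j in idx)
        out = sum(mu3[j] * sum(a * y1 ** k for k, a in enumerate(self.a[j])) for j in idx)
        return out, y1, mu1, mu2, mu3

    def train_step(self, x: Sequence[float], target: float, lr_c: float, lr_a: float) -> None:
        """One gradient descent update on the error 0.5 * (f(x) - target) ** 2."""
        out, y1, mu1, mu2, mu3 = self.forward(x)
        err = out - target
        idx = range(self.m)
        # d(out)/d(y1): the chain-rule factor for the first-level constants.
        dout_dy1 = sum(
            mu3[j] * sum(k * a * y1 ** (k - 1) for k, a in enumerate(self.a[j]) if k > 0)
            for j in idx
        )
        for i in idx:
            for j in idx:
                self.c[i][j] -= lr_c * err * dout_dy1 * mu1[i] * mu2[j]
        for j in idx:
            for k in range(len(self.a[j])):
                self.a[j][k] -= lr_a * err * mu3[j] * y1 ** k


def mean_squared_error(system: HierarchicalFuzzySystem, data: Sequence[Sample]) -> float:
    return sum((system.forward(x)[0] - t) ** 2 for x, t in data) / len(data)


def run_demo(seed: int = 0, epochs: int = 20) -> Tuple[float, float]:
    """Train on a regular grid and return the error before and after training."""
    rng = random.Random(seed)
    target = lambda x1, x2, x3: x1 * x2 + x3  # hypothetical target, not the letter's (32)
    grid = [i / 4 for i in range(5)]
    data = [((p, q, r), target(p, q, r)) for p in grid for q in grid for r in grid]
    system = HierarchicalFuzzySystem(m=4, order=2, rng=rng)
    before = mean_squared_error(system, data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            system.train_step(x, t, lr_c=0.1, lr_a=0.1)
    return before, mean_squared_error(system, data)


if __name__ == "__main__":
    print(run_demo())
```

Dropping the membership functions on the intermediate variable sidesteps the non-differentiability at the vertices discussed in Section III, at the cost of departing from the structure analyzed in the letter; the per-sample update order and the stopping rule are likewise simplifications.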

Specifically, we define six equally spaced triangular membership functions for each input variable; the membership functions for the input variables and for the intermediate variable are given by

(33)

(34)

(35)

We choose the initial values of the parameters to be small random numbers. In the training algorithm, we use constant stepsizes and set the maximum training step equal to 20. The relative error between the final hierarchical fuzzy system and the function to be approximated, evaluated at points uniformly distributed over the domain, is plotted in Fig. 2. Since the function is three-dimensional, we cannot plot the fuzzy system and the function directly. From Fig. 2 we see that the maximum relative error is about 10%.

Fig. 2. Relative error between the designed hierarchical fuzzy system and the function to be approximated in Example 1.

Example 2: Consider the third-order dynamic system

(36)

where the nonlinear component

(37)

is assumed to be unknown. Our task is to design a hierarchical fuzzy system to approximate this nonlinear component. Specifically, we use the identification model

(38)

and the training algorithm in Section III to tune the parameters in the hierarchical fuzzy system such that the tracking error is minimized. Six equally spaced triangular membership functions are defined over the domain of each variable. In Steps 2)–4), the input–output pairs of the training algorithm are formed from the signals of the dynamic system in this example. The maximum training step is again set equal to 20. Fig. 3 illustrates the output of the system $y(k)$ and the output of the identifier $\hat{y}(k)$. From Fig. 3 we see that the identification model converges toward the true system.

Fig. 3. Output of the system $y(k)$ and the output of the identifier $\hat{y}(k)$ in Example 2.

V. CONCLUSIONS

In this letter, we analyzed the universal approximation and sensitivity properties of the hierarchical fuzzy system and designed the hierarchical fuzzy system using a gradient descent training algorithm. A positive conclusion was obtained for the universal approximation property: we proved that the hierarchical fuzzy system can approximate any continuous nonlinear function to arbitrary accuracy over a compact domain. For the sensitivity analysis, the conclusion was somewhat indecisive: there is no general conclusion about which inputs to the hierarchical fuzzy system are more influential to the output, although rough observations were drawn for some particular fuzzy systems. The gradient descent training algorithm was used to design the hierarchical fuzzy systems to approximate a nonlinear static function and the nonlinear component in a dynamic system, and the simulation results showed that the training algorithm was effective.

Although the hierarchical fuzzy system reduces the number of rules, the curse of dimensionality is still inherently there. In fact, if one wants to achieve a general property such as uniform and universal approximation without structure restriction on the function to be approximated, the degree of freedom of the function used for approximation has to increase exponentially with the number of variables involved. The problem of the standard fuzzy system is that the degree of freedom is unevenly distributed over the IF and THEN parts of the rules, with a comprehensive IF part to cover the whole domain and a very simple THEN part. The hierarchical fuzzy system, on the other hand, tries to provide a balance between the IF and THEN parts, with an incomplete IF part but a more complex THEN part. Roughly speaking, the standard fuzzy system achieves universal approximation using "piecewise constant functions," while the hierarchical fuzzy system achieves universal approximation through "piecewise polynomial functions."

The analysis and design in this letter are for three-variable hierarchical fuzzy systems. Although the ideas can be extended to higher dimensional hierarchical fuzzy systems, the technical details are yet to be worked out. Also, applications of the hierarchical fuzzy systems to more examples are needed in order to obtain a more complete picture about the advantages and disadvantages of the hierarchical structure.

APPENDIX
PROOF OF THEOREM 1

We prove this theorem by constructing a hierarchical fuzzy system that satisfies (5). The construction is conducted through the following four steps.

Step 1) Fix the domains of the three input variables and the domain of the intermediate variable (we will design the first-level fuzzy system such that its output lies in this domain). Define $m$ fuzzy sets for each input variable with the following equally spaced triangular membership functions:

(39)

(40)

(41)

where the triangular membership functions are defined according to

(42)

Fig. 4 shows the case of $m = 4$. Similarly, define fuzzy sets for the intermediate variable with equally spaced triangular membership functions as in (39)–(41).

Fig. 4. An example of fuzzy sets defined for each variable (the $m = 4$ case).

Step 2) Define the constants used as the rule consequents of the first level. The TSK fuzzy system in the first level is designed as

(43)

which is a standard fuzzy system, a special case of the TSK fuzzy system (1). Since the output of (43) is a weighted average of these constants, it stays within the range spanned by them.

Step 3) Let the TSK fuzzy system in the second level be

(44)

where the consequent functions are chosen as the following polynomials of the first-level output:

(45)

The parameters in (45) are determined as follows. Since at most two membership functions of each variable are nonzero at any point, we have from (44) that

(46)

Substituting (45) into (46), we have

(47)

For fixed indices, collect the equations of (47) into the matrix form

(48)

In (48), the vectors and the matrix are obtained from (47) accordingly. Compute the parameters from

(49)

Substituting these parameters into (45), and (45) into (44), we obtain the resulting second-level TSK fuzzy system.

Step 4) The overall hierarchical fuzzy system is obtained as

(50)

where the first-level and second-level fuzzy systems are given by (43) and (44), respectively.

We now show that the hierarchical fuzzy system designed above satisfies (5). First, we show that the fuzzy system agrees with the function to be approximated at the regular points defined by the centers of the membership functions. From (43), the first-level output at such a point equals the corresponding constant. Since the parameters determined from (49) guarantee (47), the second-level output at that point equals the value of the function, which in turn implies

(51)

Now consider an arbitrary point in the domain. According to the membership functions defined in Step 1), there exist neighboring centers of the membership functions that enclose this point. Using the Mean-Value Theorem and the fact that (51) holds at these centers, we have

(52)

and therefore

(53)

Since the point in (53) is an arbitrary point in the domain, from (53) we obtain (5).

REFERENCES

[1] S. Bolognani and M. Zigliotto, "Hardware and software effective configurations for multiinput fuzzy logic controllers," IEEE Trans. Fuzzy Syst., vol. 6, pp. 173–179, Feb. 1998.
[2] D. L. Donoho and I. M. Johnstone, "Projection-based approximation and a duality with kernel methods," Ann. Statistics, vol. 17, pp. 58–106, 1989.
[3] H. Kikuchi, A. Otake, and S. Nakanishi, "Functional completeness of hierarchical fuzzy modeling," in Proc. 4th Int. Conf. Soft Computing (IIZUKA'96), Japan, 1996, pp. 91–94.
[4] S. Matsushita, A. Kuromiya, M. Yamada, T. Furuhashi, and Y. Uchikawa, "Determination of antecedent structure for fuzzy modeling using genetic algorithm," in Proc. 1996 IEEE Int. Conf. Evolutionary Computation (ICEC'96), 1996, pp. 235–288.
[5] S. Nakayama, T. Furuhashi, and Y. Uchikawa, "A hierarchical fuzzy modeling method with comprehensible fuzzy rules: An approach to inverse problem," in Simulation and Design of Applied Electromagnetics Systems, T. Honma, Ed. Amsterdam, The Netherlands: Elsevier, 1994, pp. 179–182.
[6] W. Pedrycz, Fuzzy Control and Fuzzy Systems, 2nd ed. Somerset, U.K.: Research Studies, 1993.
[7] M. J. D. Powell, Approximation Theory and Methods. Cambridge, U.K.: Cambridge Univ. Press, 1981.
[8] G. V. S. Raju, J. Zhou, and R. A. Kisner, "Hierarchical fuzzy control," Int. J. Contr., vol. 54, pp. 1201–1216, 1991.
[9] G. V. S. Raju and J. Zhou, "Adaptive hierarchical fuzzy controller," IEEE Trans. Syst., Man, Cybern., vol. 23, pp. 973–980, 1993.
[10] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its application to modeling and control," IEEE Trans. Syst., Man, Cybern., vol. 15, pp. 116–132, 1985.
[11] L. X. Wang, Adaptive Fuzzy Systems and Control: Design and Stability Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[12] L. X. Wang, A Course in Fuzzy Systems and Control. Englewood Cliffs, NJ: Prentice-Hall, 1997.
[13] L. X. Wang, "Universal approximation by hierarchical fuzzy systems," Fuzzy Sets Syst., vol. 93, pp. 223–230, 1998.
[14] H. Ying, "Sufficient conditions on general fuzzy systems as function approximators," Automatica, vol. 30, pp. 521–525, 1994.
[15] H. Ying and G. Chen, "Necessary conditions for some typical fuzzy systems as universal approximators," Automatica, vol. 33, pp. 1333–1338, 1997.
[16] X. J. Zeng and M. G. Singh, "Approximation theory of fuzzy systems: MIMO case," IEEE Trans. Fuzzy Syst., vol. 3, pp. 219–235, 1995.