Transparent Fuzzy Systems: Modeling and Control. Andri Riid, 2002

THESIS ON INFORMATICS AND SYSTEM ENGINEERING
Transparent Fuzzy Systems: Modeling and Control
Andri Riid
TALLINN TECHNICAL UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY
DEPARTMENT OF COMPUTER CONTROL
Thesis submitted in partial fulfillment of the requirements for the degree
of Doctor of Philosophy in Engineering in Tallinn Technical University
© Andri Riid, 2002
Abstract
During the last twenty years, fuzzy logic has been successfully applied to many
modeling and control problems. One of the reasons for this success is that fuzzy
logic provides a human-friendly and understandable knowledge representation
that can be utilized in expert knowledge extraction and implementation. It is
observed, however, that transparency, which is vital for undistorted information
transfer, is not a default property of fuzzy systems; moreover, the application of
algorithms that identify fuzzy systems from data will most likely destroy any
semantics the fuzzy system had after initialization.
This thesis thoroughly investigates the issues related to transparency. Fuzzy
systems are generally divided into two classes, and it is shown here that
different definitions of transparency apply to these classes. For standard fuzzy
systems that use fuzzy propositions in IF-THEN rules, explicit transparency
constraints have been derived. Based on these constraints, schemes for exploiting
and modifying existing identification algorithms are suggested. Moreover, a new
algorithm for training standard fuzzy systems is proposed, with a
considerable potential to reduce the gap between accuracy and transparency in
fuzzy modeling. For 1st order Takagi-Sugeno (TS) systems, which are interpreted in
terms of local linear models, such conditions cannot be derived because of the
system architecture and the undesirable interpolation properties of 1st order TS
systems. It is, however, possible to solve the transparency preservation problem
in the context of modeling with another proposed method that makes use of rule
activation degree exponents.
1st order TS systems that admit valid interpretation of local models as
linearizations of the modeled system are useful, for example, in gain-scheduled
control. Transparent standard fuzzy systems, on the other hand, are vital to the
branch of intelligent control that seeks solutions by emulating the mechanisms
of human reasoning and decision making, a branch not limited to knowledge-based
fuzzy control. By performing local inversion of the fuzzy model of the process,
it is possible to extract relevant control information, as demonstrated with a
fed-batch fermentation application.
The more a fuzzy controller resembles the expert’s role in a control task, the
greater the benefit of implementing a fuzzy engine. For example, a
hierarchy of fuzzy (and non-fuzzy) controllers simulates an existing hierarchy in
the human decision process and leads to improved control performance.
Another benefit of hierarchy is that it presupposes problem decomposition. This
is especially important in fuzzy logic, where a large number of system variables
leads to an exponential explosion of the number of rules (the curse of
dimensionality), which makes controller design extremely difficult or even
impossible. The advantages of hierarchical control are illustrated with truck
backer-upper applications.
Summary (Kokkuvõte)
Over the last decades, fuzzy logic has found successful application in solving
a variety of control and modeling problems. One of the reasons for this success
has been that the representation of information through fuzzy logic is close to
the representation of information in the decision mechanisms that people use in
everyday life. It must be borne in mind, however, that transparency, a property
of fuzzy systems that is an important prerequisite for the success of many of
these applications, is not guaranteed by default; likewise, when algorithms
capable of identifying fuzzy systems from a data set are used, there is no
guarantee whatsoever that the result will be a transparent fuzzy model.
This work concentrates on matters related to the transparency of fuzzy systems.
While fuzzy systems are conventionally divided into two classes, the work shows
that different definitions of transparency hold for these classes. For classical
fuzzy systems, in which IF-THEN rules relate fuzzy propositions, transparency
conditions can be stated in explicit form. On the basis of these conditions, the
properties and applicability of existing identification algorithms have been
evaluated. In addition, a novel algorithm has been developed that makes it
possible to reduce the existing gap between accuracy and transparency in fuzzy
modeling. For first-order Takagi-Sugeno fuzzy systems, explicit transparency
conditions cannot be given, but a solution to the problem can be found in the
context of modeling with another method developed in this work.
The transparency of first-order Takagi-Sugeno systems is useful, for example, in
the methodology known as gain scheduling. The field of application of transparent
classical fuzzy systems is even broader, extending to those control methods that
seek solutions by emulating human decision-making and thinking processes and are
not limited to knowledge-based control. Through limited inversion of the
linguistic model of the process, it is possible to obtain important control
information; an example is the fermentation process control application
presented in this work.
The more the task of the controller resembles the role of an expert, the greater
the benefit of fuzzy logic. A hierarchy of controllers replicates the actual
hierarchy in the human decision process and improves control quality. Since
problem decomposition is a prerequisite for constructing a hierarchical control
system, the benefit is even greater in the field of fuzzy logic, because fuzzy
control is especially sensitive to a large number of control parameters. The
advantages of hierarchical control have been demonstrated with the example of a
truck reversing (backer-upper) system.
Acknowledgements
First, I would like to thank my supervisor, Prof. Ennu Rüstern, for introducing
me to the subject, providing excellent working conditions and continuous
support throughout my studies.
Special thanks go to my former colleague Mati Pirn for many fruitful discussions
in the early stages of the work. I am even more grateful to Raul Isotamm, who did
a lot of work on the implementation of the algorithms described in the thesis,
and to the other students I supervised during those years, who all contributed to
my work in one way or another. Andres Rähni and colleagues in the Department of
Computer Control also deserve a mention here.
I would also like to gratefully mention other researchers all over the world who
have made their papers available online or sent me their papers at my modest
request, as well as the people behind www.researchindex.org. In this
corner of the world it is sometimes difficult to obtain relevant scientific
information, and the cooperation of all these people has been of great help.
Many thanks to Prof. Emeritus Hanno Sillamaa for proofreading the first draft of
the manuscript, pointing out numerous mistakes and suggesting how the work could
be improved.
I am indebted to my family. What one may accomplish in terms of professional
career is quite meaningless compared to the importance of having children and
not ruining their lives. At least, this is what I think.
Andri Riid
Tallinn, April-September 2001, December 2001, February-March 2002
Contents
1 Introduction
1.1 General background
1.2 Problem statement
1.3 Original contribution
1.4 Outline of the thesis
2 Fuzzy systems
2.1 Fuzzy sets
2.2 Basic properties of fuzzy sets
2.3 Fuzzy partition
2.4 Operations on fuzzy sets and fuzzy logic
2.5 Fuzzy systems
2.6 Rule base properties
2.7 Inference examples
2.8 Takagi-Sugeno fuzzy systems
2.9 Design of fuzzy systems
2.10 Summary
3 Interpolation and transparency in fuzzy systems
3.1 Transparency and interpretability
3.2 Transparency of standard fuzzy systems
3.3 Interpolation in standard systems
3.3.1 Role of defuzzification
3.3.2 Role of MF type
3.3.3 Role of inference parameters
3.3.4 Interpolation in multidimensional space
3.4 Interpolation in 1st order TS systems
3.5 Transparency of 1st order TS systems
3.6 Relationship between 0th and 1st order TS systems
3.7 Summary
4 Fuzzy modeling
4.1 Introduction
4.2 Fuzzy systems as universal approximators
4.3 Selection of input-output data
4.4 Rule-based approaches
4.4.1 Fuzzy template modeling
4.4.2 Yager-Filev fuzzy template modeling algorithm
4.4.3 Rule weights in modeling
4.4.4 Wang-Mendel rule extraction
4.5 Least squares method
4.6 Gradient descent
4.6.1 Gradient descent for fuzzy systems
4.6.2 The learning process
4.6.3 Convergence issues and higher order methods
4.6.4 Overfitting
4.7 Clustering algorithms
4.7.1 Extraction of fuzzy rules and membership functions
4.7.2 Clustering example
4.8 Genetic algorithms
4.9 Transparency protection
4.9.1 Transparency protection of 0th order TS systems and standard fuzzy systems
4.9.2 Transparency protection of 1st order TS systems
4.10 Comparison of gradient-based methods
4.10.1 Modeling of a SISO system
4.10.2 Modeling of a TISO system
4.11 Modeling of large systems
4.12 Summary and conclusions
5 Fuzzy control
5.1 Introduction
5.2 Fuzzy setpoint controllers
5.3 Fusion of fuzzy and PID control
5.4 Inversion of fuzzy systems
5.4.1 Numerical inversion of fuzzy systems
5.4.2 Non-numerical inversion of fuzzy systems
5.4.3 Control by inverting a fuzzy model
5.5 Control example
5.6 Stability issues
5.7 Summary and conclusions
6 Applications
6.1 Introduction
6.2 Backing up the truck and truck-and-trailer
6.2.1 Truck backer-upper
6.2.2 Backing up the truck and trailer
6.3 Control of a fed-batch fermentation
6.3.1 Control system for fed-batch fermentation process with a single substrate feed
6.3.2 Fed-batch fermentation control (two-substrate process)
6.4 Conclusions and comments
7 Conclusions
7.1 Transparency conditions
7.2 Transparent modeling algorithms
7.3 Transparent fuzzy control
7.4 Suggestions for further research
References
Symbols and abbreviations
List of publications
Appendix A
Appendix B
Appendix C
Appendix D
1 Introduction
"Artificial intelligence is the science of making machines do things that would
require intelligence if done by men"
Marvin Minsky
This thesis summarizes the author's research experience and the principal results
achieved in the field of fuzzy modeling and control during the last six years.
The ability of fuzzy logic to abstract and to explain the complex behavior of
systems in linguistic terms has been the driving force behind the research. This
introductory chapter describes the general background and the research problem,
and explains what is to be expected from the rest of the thesis.
1.1 General background
Perhaps the biggest dream of mankind is the dream of an artificial human being or
a thinking machine created by humans. Why this is so is open to speculation.
Perhaps the creation of such a machine would raise humans to the position of
godlike beings to whom nothing is impossible. In science fiction (the particular
branch of fiction that explicitly expresses our fears and expectations about the
future), this theme has been prominent from the very beginning.
The history of artificial intelligence (AI) is closely connected to the history of
the digital computer. There is, however, a fundamental difference between the
digital computer and the human mind. From the very beginning, computer programs
were superior to human beings in both speed and accuracy when it came to solving
complex mathematical problems, e.g. differential equations. On the other
hand, it is very difficult to construct robot programs that can see and move
well enough to handle ordinary things like children's building blocks: to
stack them up, take them down, rearrange them, and put them in boxes.
The problem does not derive from the inadequacy of sensors and actuators alone.
The key issue is that human thinking is predominantly inexact. This inexactness
is, however, essential for managing real-world systems, since it underlies the
crucial ability to summarize data and focus on decision-relevant information.
Such inexactness is the very opposite of what computers do. Thus, special
AI techniques are needed to imitate human beings.
Alan Turing, one of the early prominent figures in the field of AI, was among
the first to consider the philosophical issues of AI, e.g. the definition and
criterion of intelligence. Fifty years later, these questions still have no final
answer. Many believe that self-learning and self-awareness are important
attributes of intelligence.
The self-learning problem is partly solved by specifically designed (mostly
supervised) learning algorithms that allow AI programs to improve themselves
(basically, only the most primitive learning tasks can be solved that way).
Self-awareness is believed to be an emergent property, i.e. similarly to critical
mass, it will appear once a sufficient amount of mass (intelligence) has been
accumulated.
Of the AI techniques to emerge during the last 50 years, two stand out. Artificial
neural networks (ANNs), biologically inspired like much of AI, are based on a
loose analogy with the presumed workings of the brain and share some important
characteristics with it. First, as the name suggests, a neural network
consists of a network of at least partially connected, simple processing
elements. In the biological brain, each processing element is called a neuron.
These biological neurons have a body (consisting of a nucleus and the soma), a
set of dendrites, an axon, and a set of synaptic buttons. The components of an
artificial neuron are direct analogs of the components of an actual neuron. Each
of the inputs (dendrites in an actual neuron) is modified by a weight whose
function is analogous to that of the synaptic junction in a biological neuron. The
processing element consists of two parts. The first part simply sums the weighted
inputs, whereas the second part is a nonlinear filter, usually called the
activation function (most typically a threshold or sigmoidal function), through
which the combined signal flows. These artificial neurons are usually organized
into a sequence of layers.
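The two-part processing element described above (weighted sum, then nonlinear activation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the thesis: the function names `neuron`, `sigmoid` and `step`, the bias term and the example weights are all chosen here for clarity.

```python
import math

def sigmoid(s: float) -> float:
    """Sigmoidal activation: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def step(s: float) -> float:
    """Threshold activation: outputs 1.0 when the weighted sum is non-negative."""
    return 1.0 if s >= 0.0 else 0.0

def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """One artificial neuron (hypothetical helper).
    Part 1: sum the inputs, each modified by its weight (the 'synapse').
    Part 2: pass the combined signal through a nonlinear activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(s)

# Usage: 1.0*0.4 + 0.5*(-0.8) = 0.0, so the sigmoid gives 0.5 and the step gives 1.0
print(neuron([1.0, 0.5], [0.4, -0.8]))                   # 0.5
print(neuron([1.0, 0.5], [0.4, -0.8], activation=step))  # 1.0
```

Swapping the `activation` argument is enough to move between the threshold and sigmoidal variants mentioned in the text; layers are then simply lists of such neurons fed by the outputs of the previous layer.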
Neural networks perform two major functions: learning and recall. Recall is the
process of accepting an input stimulus and producing an output response in
accordance with the network weight structure (the weights of the network
represent "distributed knowledge"). Learning is the process of adapting the
connection weights to produce the desired output vector in response to a
stimulus vector presented to the input buffer. Typically the learning is
supervised, i.e. another stimulus is presented at the output buffer,
representing the desired response to the given input. Note also that recall is an
integral part of the learning process, since the desired response of the network
must be compared with the actual output in order to create an error function.
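To make the role of recall inside learning concrete, here is a minimal sketch of supervised learning for a single sigmoidal neuron. It assumes gradient descent on the squared error (the delta rule) as the learning algorithm; the thesis does not prescribe this particular rule, and the learning rate, epoch count and the logical AND training set are illustrative choices.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def recall(x, w, b):
    """Recall: map an input stimulus x to an output response with the current weights."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)

def train(samples, n_inputs, rate=0.5, epochs=5000, seed=0):
    """Supervised learning: each sample pairs a stimulus with the desired response.
    Weights are adapted by gradient descent on the squared error (delta rule)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = recall(x, w, b)             # recall is part of every learning step
            error = target - y              # desired response minus actual output
            delta = error * y * (1.0 - y)   # error scaled by the sigmoid derivative
            w = [wi + rate * delta * xi for wi, xi in zip(w, x)]
            b += rate * delta
    return w, b

# Usage: logical AND is linearly separable, so a single neuron can learn it
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data, n_inputs=2)
print([round(recall(x, w, b), 2) for x, _ in data])
```

The same `recall` function is used both inside `train` and after it, which is the point made in the text: the error function cannot be formed without first producing the network's actual response.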
However, if the workings (including the learning processes) of the human brain
are to be simulated using ANNs, drastic simplifications must be adopted. For the
engineer, ANNs are just design techniques that draw inspiration from the
workings of the brain; they are not meant to simulate the brain. A present-day
artificial neural network is very simple compared to the actual brain, is not
self-aware and does not "think"!
At this point one might ask: what is the point? The answer is a somewhat
unexpected paradox: much "expert" adult thinking is basically much simpler
than what happens in a child's ordinary play! It can be harder to be a novice
than to be an expert. This is because what an expert needs to know and do can
sometimes be quite simple; it may only be very hard to discover, or learn, in
the first place.
Another important AI technique is fuzzy logic. Whereas ANNs simulate the
physical aspect of the human brain, fuzzy logic, a multi-valued logic as opposed
to Aristotelian logic, imitates the human model of thinking. Fuzziness is a
property of language. Fuzzy logic is used for reasoning about inherently vague
or uncertain concepts and provides a representation scheme and a calculus for
dealing with them.
At bottom, fuzzy logic is a generalization of classical (or Aristotelian) logic,
in which a concept can possess a degree of truth anywhere between 0.0 and 1.0.
Aristotelian logic applies only to concepts that are completely true (having
degree of truth 1.0) or completely false (having degree of truth 0.0). Such a
generalization makes it possible to manipulate terms such as "large," "warm,"
and "fast," where a given value can simultaneously belong partially to two
or more different, contradictory sets of values.
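As a small illustration of partial membership, the sketch below assumes three hypothetical triangular sets for temperature in degrees Celsius; the breakpoints are invented for the example. A temperature of 23 °C then satisfies "temperature is warm" to degree 0.7 and "temperature is hot" to degree 0.3 at the same time.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for temperature (degrees Celsius)
TERMS = {"cool": (0, 10, 20), "warm": (10, 20, 30), "hot": (20, 30, 40)}

def degrees_of_truth(x):
    """Degree (between 0.0 and 1.0) to which 'temperature is <term>' holds for x."""
    return {term: triangular(x, *abc) for term, abc in TERMS.items()}

print(degrees_of_truth(23))  # {'cool': 0.0, 'warm': 0.7, 'hot': 0.3}
```

In Aristotelian logic 23 °C would have to be assigned to exactly one of the three sets; here it is partly warm and partly hot.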
Most applications of fuzzy logic utilize it as the underlying logic system for
fuzzy (expert) systems that use a collection of fuzzy membership functions and
fuzzy IF-THEN rules, instead of Boolean logic, to reason about data. This could
be compared to a very high-level programming language, where the program
consists of IF-THEN rules and compiling or interpreting it yields a nonlinear
inference algorithm.
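The "rules as a program, inference as the interpreter" analogy can be sketched as follows. This assumes one common inference scheme (max-min composition with centroid defuzzification over a discretized output universe); the linguistic terms, their breakpoints and the fan-speed rules are hypothetical.

```python
def triangular(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical terms: input temperature (deg C), output fan speed (%)
TEMP = {"cool": (0, 10, 20), "warm": (10, 20, 30), "hot": (20, 30, 40)}
FAN = {"slow": (-50, 0, 50), "medium": (0, 50, 100), "fast": (50, 100, 150)}

# The "program": IF temperature is <antecedent> THEN fan is <consequent>
RULES = [("cool", "slow"), ("warm", "medium"), ("hot", "fast")]

def infer(temp, n=201):
    """Max-min inference with centroid defuzzification on [0, 100]."""
    ys = [100.0 * k / (n - 1) for k in range(n)]
    agg = [0.0] * n
    for antecedent, consequent in RULES:
        firing = triangular(temp, *TEMP[antecedent])   # degree of fulfilment
        for k, y in enumerate(ys):
            clipped = min(firing, triangular(y, *FAN[consequent]))  # implication
            agg[k] = max(agg[k], clipped)               # aggregation of rules
    total = sum(agg)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(y * m for y, m in zip(ys, agg)) / total  # centroid

for t in (12, 20, 28):
    print(t, round(infer(t), 1))
```

Changing the behavior means editing `RULES` or the term definitions, not the `infer` function, which is what makes the analogy with a high-level program apt.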
The inventor of fuzzy logic, L.A. Zadeh, originally devised the technique as a
means for solving problems in the soft sciences, particularly those that involved
interactions between humans, and/or between humans and machines. It is
interesting to note that the actual applications of fuzzy logic lie far afield
from Zadeh's original notion of aiding the soft sciences: they are found not in
high-level artificial intelligence but rather in low-level machine control.
Fuzzy logic is especially useful for situations in which conventional logic
technologies are not effective, such as systems and devices that cannot be
precisely described by mathematical models, those that have significant
uncertainties or contradictory conditions, and linguistically controlled devices
or systems.
Today, while some scientists are attempting to create life-like robots and
another group is creating life, or something very close to it, by using
computers to program "organisms" that can "move", "see", "feed", "reproduce",
and "die", the AI-inspired techniques mentioned above are already in practical
use. In the last twenty years, machines incorporating neural networks and/or
fuzzy logic have earned their living making scientific, medical and financial
decisions. This particular branch of AI has grown so prominent that a special
term, computational intelligence, has been coined for it. In a broader view,
computational intelligence can be considered merely the first way station on the
road to human-friendly integrated AI systems.
Fuzzy logic and neural networks, at first glance, may seem to have very little in
common: the former is a generalized "multi-valued logic", while the latter is a
structure consisting of one or more small, interconnected processing elements.
Neural networks can perform effective function approximation, but how any
individual weight contributes to the system output is unclear: the ANN
obtained from the learning process cannot be interpreted, so we cannot check
whether the solution is plausible. For the same reason, we cannot initialize an
ANN with prior knowledge in any meaningful way, and learning must always start
from scratch. Nor can an ANN learn anything without training data. Fuzzy
systems, on the other hand, are suitable for incorporating prior knowledge and
experience and are transparent to interpretation (the conditions under which
this holds are explained in this thesis), but without knowledge they are rather
useless.
The differences, however, can be used to advantage. The general idea is to
combine the two in a manner that yields the best of both techniques. Some
examples of exploiting the complementary relationship are:
1. neural networks may be trained to generate membership values for a fuzzy
logic membership function;
2. fuzzy logic functions may be used to "fine tune" a neural network training
algorithm;
3. fuzzy logic functions or conditionals may encapsulate the input and/or
output layers of a neural network, i.e., inputs are "filtered" through a fuzzy
function before entering the neural network, and/or outputs are "filtered"
through a fuzzy function after being processed by the network;
4. fuzzy logic functions may access data stored within neural network-based
(associative) memories;
5. fuzzy conditional statements may be used to activate subsets of a neural
network-based system, and vice versa.
Further hybrids of fuzzy logic, neural networks, and other techniques (e.g.
genetic algorithms) are possible, depending on what is best for a particular
application.
Especially fruitful has been the crossover of fuzzy logic and neural networks in
which fuzzy systems are trained by a learning algorithm derived from
neural network theory. To facilitate this, a fuzzy system is usually
represented as a special multilayer (usually five-layer) feedforward neural
network. In such "neuro-fuzzy" networks, the connection weights and the
propagation and activation functions differ from those of common neural networks.
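A five-layer feedforward representation of a fuzzy system can be sketched as below. The layer structure (fuzzification, rule firing by product, normalization, consequent, summation) follows the widely used ANFIS-style layout with constant (0th order) rule consequents; it is an assumption for illustration, and the Gaussian membership parameters and consequent values are hypothetical.

```python
import math
from itertools import product

def gauss(x, center, sigma):
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

# Hypothetical parameters: two inputs, two Gaussian terms per input
MFS = [
    {"low": (0.0, 0.3), "high": (1.0, 0.3)},  # input x1: (center, sigma)
    {"low": (0.0, 0.3), "high": (1.0, 0.3)},  # input x2
]
# One rule per combination of terms, each with a constant consequent
CONSEQ = {("low", "low"): 0.0, ("low", "high"): 1.0,
          ("high", "low"): 1.0, ("high", "high"): 2.0}

def forward(x):
    # Layer 1: fuzzification, membership degree of each input in each term
    mu = [{t: gauss(xi, *p) for t, p in mfs.items()} for xi, mfs in zip(x, MFS)]
    # Layer 2: rule nodes, firing strength = product of antecedent degrees
    rules = list(product(*[sorted(m) for m in mu]))
    w = [math.prod(mu[i][t] for i, t in enumerate(r)) for r in rules]
    # Layer 3: normalization of firing strengths
    total = sum(w)
    wn = [wi / total for wi in w]
    # Layer 4: consequent nodes, normalized strength times rule output
    f = [wni * CONSEQ[r] for wni, r in zip(wn, rules)]
    # Layer 5: single summation node gives the crisp output
    return sum(f)

print(round(forward([0.2, 0.9]), 3))
```

Unlike a common neural network, every parameter here (term centers, widths, rule consequents) has a linguistic meaning, which is exactly what a learning algorithm may or may not preserve. Requires Python 3.8+ for `math.prod`.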
It must be noted, however, that the cooperation between fuzzy logic and neural
networks is usually unidirectional, i.e. neuro-fuzzy systems can be initialized
with prior knowledge but quickly lose valid interpretation during training.
This is because the learning procedure of a neuro-fuzzy system does not take
the semantic properties of the underlying fuzzy system into account. Moreover,
there is no general agreement about what these "semantic properties" are and how
exactly they should be taken into account. This is where the current thesis
fits in. It assumes that the most attractive property of fuzzy systems lies in
their ability to process information both linguistically and numerically, and
that ignoring the linguistic aspect reduces fuzzy logic to just another
black-box technique, leaving its full potential unused.
Linguistic interpretation (when valid) is a rather powerful tool for analyzing
numerical data and can be used to obtain useful information about the modeled
unknown system. This is just one of many potential applications of transparent
fuzzy systems (the term "transparent" stands for a property that allows valid
linguistic interpretation). There is a tradeoff between interpretability and
adaptability, which is perhaps one of the reasons why most research has focused
on the adaptation properties of neuro-fuzzy systems without giving due weight to
transparency.
Fuzzy logic, and fuzzy control in particular, have been subjects of rather harsh
criticism from the very beginning. Some have seen this as a typical conflict
between well-established conventional theory and a newly emerging paradigm. For
instance, the term "fuzzy" itself has repeatedly been criticized as misleading.
Further accusations include the claim that anything that can be done with fuzzy
logic can be done equally well with classical logic and probability theory.
Fuzzy control has been criticized for its ad hoc design method and for the lack
of stability analysis. The former problem has been partly solved by recent
developments in the field. Ironically, the lack of stability analysis derives
primarily from the fact that fuzzy control techniques enable us to design the
controller without a mathematical model (which is generally considered a
virtue).
Supporters of fuzzy techniques, on the other hand, have sometimes made clearly
unrealistic predictions and claims. Two typical claims are that fuzzy control
provides more robustness than conventional control and that fuzzy control is
more suitable for controlling nonlinear processes. This actually depends more
on the particular application and the configuration of the controller than on
fuzziness, and no general proof of either claim can be provided. The controversy
is not helped by the fact that current fuzzy technology is often compared with
poor implementations of traditional control technology.
The truth is probably somewhere in between. Fuzzy logic is certainly not a
universal cure for all the troubles of the world, but it is difficult to deny its
considerable potential for practical applications, basically because such a
design method is closer to human thinking and perception and reduces
development time.
1.2 Problem statement
As stated in section 1.1, a fruitful cooperation between neural networks and
fuzzy logic has been established, but on closer inspection the cooperation itself
appears rather unidirectional. Neural network learning algorithms can be
applied to fuzzy systems, but the resulting systems are more neural networks
than fuzzy systems in the sense that their parameters lose physical meaning and
cannot be interpreted correctly. It turns out that in most cases we can make use
of prior knowledge and experience only before training, by using fuzzy logic to
obtain a better initial state of the network, but have no further means of
checking what has happened to the original knowledge and how it has been
modified in the training process.
This presents a challenge: could the learning potential of neural networks be
used without losing the transparency of the system? The interest is not purely
academic. The generalization and abstraction abilities of human beings allow us
to control processes that are still beyond the capabilities of automatic
control. If mechanisms that preserve transparency could be established, many
aspects of fuzzy modeling and control would naturally need revision, which is
exactly what is attempted in this thesis. Transparency preservation in
self-adaptive control systems could by itself be the topic of another thesis,
and it is quite clear that the current work at best answers only some of the
questions; many implications of transparency for modeling and control thus
require further consideration. Hopefully, the thesis manages to establish a firm
foundation and clear perspectives for further research.
1.3 Original contribution
The main original contributions of the thesis are the definition of fuzzy system
transparency and the transparency constraints for fuzzy system parameters derived
from this definition. Explicit constraints, however, can be derived only for
standard and 0th order Takagi-Sugeno (TS) systems. Transparency of 1st order TS
systems suffers from undesirable interpolation properties, and no meaningful
constraints can be derived for the consequent parameters of this type of system.
Nevertheless, a solution can be provided in the context of modeling. Our
contribution is a highly flexible method that uses rule activation degree
exponents to give more importance to relevant rules and to reduce the
contribution of irrelevant ones, which ultimately results in the identification
of local models that admit valid interpretation as local linearizations of the
modeled system.
The transparency-accuracy tradeoff is likewise observable with standard and 0th
order TS systems, and the possibilities of reducing the gap between transparency
and accuracy are thoroughly explored in this thesis. A gradient-based
optimization method applicable to standard fuzzy systems is derived
(Appendix D). The proposed algorithm exploits the interpolation properties of
standard fuzzy systems to provide more efficient transparent approximation.
Other contributions are of a complementary and derivative nature. We seek to
formalize a methodology that allows us to emulate the control and decision
processes of human beings with maximum efficiency. This task requires in-depth
research and evaluation of modeling and control techniques. Modeling techniques
with potential for transparent modeling are reviewed and their suitability for
transparent modeling is evaluated. Where necessary, modifications of the original
algorithms are suggested or adopted from the works of other authors.
A hybrid hierarchical control architecture is found especially suitable for the
implementation of transparent control. In this architecture, control knowledge
(obtained from experts or through analysis of a model of the controlled process,
e.g. by local linguistic inversion) is expressed by the IF-THEN rules of a
supervisor, while low-level control is carried out by a PID-type controller. The
truck backer-upper and fed-batch fermentation applications illustrate the
advantages of such control.
1.4 Outline of the thesis
The thesis is organized as follows. The next chapter gives an overview of
fuzzy set theory, fuzzy logic (in the narrower sense) and the theory of fuzzy
systems to the extent necessary to understand the rest of the thesis.
Chapter 3 introduces the concept of transparency and proposes a set of
constraints for transparency preservation. Additionally, the interpolation
properties of fuzzy systems are reviewed, because the two topics are closely
related and because system transparency provides a more systematic approach to
the analysis of interpolation in fuzzy systems. The chapter provides the
foundation for the rest of the thesis.
Chapter 4 lists a variety of existing approximation algorithms with built-in
transparency protection and suggests methods for transparency protection with
incremental algorithms. Moreover, two new algorithms (in sections 4.9.1 and
4.9.2, respectively) for reducing the gap between transparency and accuracy in
fuzzy modeling are introduced.
Chapter 5 gives a brief overview of fuzzy control techniques. Of particular
interest are the analytical and linguistic inversion techniques of fuzzy
systems, which, as shown, greatly benefit from system transparency. The
principles of local linguistic inversion are introduced in section 5.4.2.
In chapter 6, two control applications that require system transparency for
success and make use of a hierarchical control system architecture are
described, and the control results are provided.
The final chapter summarizes the results of the thesis and points out the subjects
for further research.
The background of the particular subtopics (transparency, fuzzy modeling, fuzzy
control) is surveyed in the chapter introductions for the reader's convenience.
2 Fuzzy systems
2.1 Fuzzy sets
L.A. Zadeh (1965) introduced the concept of fuzzy sets and the corresponding
theory, which can be regarded as an extension of classical set theory. In
classical set theory an element x is either a member or a non-member of A, a
subset of the universe X. The membership $\mu_A(x)$ of x in A is thus given by:
1, if x ∈ A
µ A ( x) =  (2.1)
0, if x ∉ A
Real life presents a number of situations where crisp membership (2.1) is not
flexible enough to describe sets accurately, because it forces an abrupt
transition from full membership to non-membership. A typical example is the
problem where, given the age of a person, we must determine whether he or she
is young (e.g. in order to calculate some health risk). In other words, the set
of young people has to be defined. Clearly, people younger than 20 years are
unconditionally young, whereas people older than 40 are quite certainly
middle-aged. The age range between 20 and 40 years, however, is a different
matter. A gradual transition seems more reasonable; it is implemented by
allowing the membership degree to take any value in the interval [0,1]
(Fig. 2.1).
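A minimal Python sketch of the two definitions of "young" contrasts the crisp membership (2.1) with a gradual one. The 30-year crisp threshold and the linear transition between 20 and 40 years are illustrative assumptions; the exact curve of Fig. 2.1 is not specified in the text.

```python
def young_crisp(age: float, threshold: float = 30.0) -> float:
    """Crisp membership (2.1): 1 inside the set, 0 outside.
    The 30-year threshold is an arbitrary illustrative choice."""
    return 1.0 if age < threshold else 0.0


def young_fuzzy(age: float, full: float = 20.0, zero: float = 40.0) -> float:
    """Gradual membership in [0, 1]: 1 up to `full` years, 0 from `zero`
    years on, linear in between (the linear shape is an assumption)."""
    if age <= full:
        return 1.0
    if age >= zero:
        return 0.0
    return (zero - age) / (zero - full)


for age in (18, 25, 29, 31, 35, 45):
    print(age, young_crisp(age), round(young_fuzzy(age), 2))
```

Around the crisp threshold, one year of age (29 vs. 31) switches the crisp membership from 1 to 0, whereas the gradual membership changes only from 0.55 to 0.45.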
In theoretical works fuzzy sets are often represented by sets of ordered pairs
$$\mu_A(x) = \{\mu_1/x_1,\ \mu_2/x_2,\ \ldots,\ \mu_n/x_n\}, \qquad (2.2)$$
where each value of x is paired with its degree of membership in A. Although
this representation is very flexible and allows an arbitrary membership function
(MF) shape, it requires much storage space in practice; therefore, in
application-oriented works the functional representation dominates:
$$\mu_A(x) = f(x) \qquad (2.3)$$
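The storage argument can be illustrated with a short sketch, assuming a triangular MF for "middle-aged" with breakpoints 20, 40 and 60 years and a 5-year grid (both hypothetical): the functional form (2.3) needs only the breakpoints, whereas the ordered-pair form (2.2) stores one value per grid point.

```python
def middle_aged(x: float) -> float:
    """Functional representation (2.3): a triangle rising from 20 to a peak
    at 40 and falling to 0 at 60 (illustrative parameters)."""
    if x <= 20.0 or x >= 60.0:
        return 0.0
    if x <= 40.0:
        return (x - 20.0) / 20.0
    return (60.0 - x) / 20.0


# Ordered-pair representation (2.2): {x_k: mu_k} over a discretized universe.
universe = range(0, 101, 5)
pairs = {x: middle_aged(x) for x in universe}

# Dropping zero memberships keeps only the pairs inside the support.
sparse_pairs = {x: mu for x, mu in pairs.items() if mu > 0.0}
print(len(pairs), sparse_pairs)
```

Refining the grid increases the size of `pairs` proportionally, while `middle_aged` keeps its three breakpoints.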
[Fig. 2.1. Fuzzy set. Axes: age (20 to 50 years) vs. µ(age) (0 to 1).]
2.2 Basic properties of fuzzy sets
Only the basic properties of fuzzy sets that are needed to understand the rest
of the thesis are given here.
a) The height of a fuzzy set A, hgt(A), is defined by
$$\mathrm{hgt}(A) = \sup_{x \in X} \mu_A(x) \qquad (2.4)$$
Fuzzy sets with a height equal to 1 are called normal.
b) The core of a fuzzy set is a crisp subset of X:
$$\mathrm{core}(A) = \{x \in X \mid \mu_A(x) = 1\} \qquad (2.5)$$
Normal, piecewise continuous and convex (see below) fuzzy sets whose core
consists of a single value are called fuzzy numbers; in contrast, fuzzy sets
satisfying the first three conditions but with a core consisting of more than
one value are called fuzzy intervals.
c) The support of a fuzzy set is another crisp subset of X:
$$\mathrm{supp}(A) = \{x \in X \mid \mu_A(x) > 0\} \qquad (2.6)$$
If the support of a fuzzy set is bounded, the fuzzy set is said to have compact support.
d) A convex fuzzy set is characterized by
$$\forall x_1, x_2, x_3 \in X,\ x_1 \le x_2 \le x_3 \ \Rightarrow\ \mu_A(x_2) \ge \min(\mu_A(x_1), \mu_A(x_3)). \qquad (2.7)$$
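Definitions (2.4)-(2.7) translate directly into checks on a sampled universe. The sketch below is an approximation, since a grid replaces the continuous universe X; the trapezoid with a, b, c, d = 2, 4, 6, 8 is an arbitrary test case.

```python
import numpy as np


def height(mu):
    """hgt(A), (2.4): on a finite grid the supremum becomes a maximum."""
    return float(np.max(mu))


def core(x, mu):
    """core(A), (2.5): grid points with membership 1."""
    return x[np.isclose(mu, 1.0)]


def support(x, mu):
    """supp(A), (2.6): grid points with nonzero membership."""
    return x[mu > 0.0]


def is_convex(mu, tol=1e-12):
    """Grid check of (2.7). For each k, the worst case of min(mu_i, mu_j)
    with i <= k <= j is min(max(mu[:k+1]), max(mu[k:]))."""
    left_max = np.maximum.accumulate(mu)                 # max over i <= k
    right_max = np.maximum.accumulate(mu[::-1])[::-1]    # max over j >= k
    return bool(np.all(mu >= np.minimum(left_max, right_max) - tol))


x = np.arange(0.0, 10.5, 0.5)                                          # sampled X
mu = np.clip(np.minimum((x - 2.0) / 2.0, (8.0 - x) / 2.0), 0.0, 1.0)   # trapezoid 2, 4, 6, 8

print(height(mu), core(x, mu), support(x, mu), is_convex(mu))
print(is_convex(np.array([0.0, 1.0, 0.2, 1.0, 0.0])))                  # two peaks: not convex
```

Because µA(a) = µA(d) = 0, the sampled support starts at 2.5 and ends at 7.5 rather than at a and d.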
[Figure 2.2. Height, support and core of a fuzzy set. Axes: x vs. µA(x); breakpoints a, b, c, d on the x axis; annotations hgt(A), core(A), supp(A).]
Fuzzy sets used in applications are generally convex fuzzy numbers or intervals.
Most often piecewise linear standard functions are used, such as trapezoidal or
triangular membership functions (see A.3, A.5 in Appendix A). A trapezoidal
membership function (MF) is determined by four parameters a ≤ b ≤ c ≤ d,
where a = min(supp(A)), b = min(core(A)), c = max(core(A)), d = max(supp(A))
(Fig. 2.2). A triangular MF can then be considered a special case of this
function, with b = c.
The second group consists of “smooth” MFs such as the Gaussian MF, which is
determined by two parameters (A.2). The Gaussian MF is differentiable and has a
compact (two-parameter) representation.
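The standard shapes can be sketched as plain functions. The trapezoid follows the parameter convention a ≤ b ≤ c ≤ d given above and the triangle reuses it with b = c; the Gaussian parameterization (center and width sigma) is an assumption, since (A.2) is not reproduced here.

```python
import math


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal MF with a <= b <= c <= d: rises on [a, b], equals 1 on
    the core [b, c] and falls on [c, d]; zero outside (a, d)."""
    if x <= a or x >= d:
        # Outside the open support; degenerate shoulders (a == b or c == d)
        # still give membership 1 at the core boundary.
        return 1.0 if b <= x <= c else 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def triangle(x: float, a: float, b: float, d: float) -> float:
    """Triangular MF: the special case b = c of the trapezoid."""
    return trapezoid(x, a, b, b, d)


def gaussian(x: float, center: float, sigma: float) -> float:
    """Gaussian MF in the assumed form exp(-(x - center)^2 / (2 sigma^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


print(trapezoid(3.0, 2, 4, 6, 8), triangle(5.0, 2, 4, 6), gaussian(1.0, 0.0, 1.0))
```

Note that the Gaussian MF is positive everywhere and therefore does not have compact support; its compactness refers to the two-parameter representation.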
2.3 Fuzzy partition
Let us assume that a single definition of the fuzzy set "young" is not
sufficient and that all people have to be assigned to sets such as young,
middle-aged and old. The resulting partition (Fig. 2.3) is called a fuzzy
partition; it consists of fuzzy sets that are identified through the linguistic
labels (terms) assigned to them. There is considerable overlap between the
fuzzy sets.
The advantage of fuzzy sets over crisp ones now becomes clearer. Partial
membership (fuzziness) allows the description of concepts in which the
boundary between having a property and not having it is not sharp (e.g. it
would be really difficult to say whether a person aged 60 is old or
middle-aged). Moreover, by using fuzzy sets and their linguistic labels we are
able to move from numbers to abstractions (or the opposite), which is natural
for human beings but is otherwise difficult to formulate mathematically.
[Figure 2.3. A fuzzy partition. Linguistic variable AGE with linguistic labels (terms) young, middle-aged and old; membership functions µA(x) plotted over the base variable x (age), numerical values 0 to 100.]
It is usually desired that each value of x has a nonzero membership value in at
least one fuzzy set:
$$\forall x \in X,\ \exists i:\ \mu_{A_i}(x) > 0 \qquad (2.8)$$
or, equivalently,
$$\forall x \in X:\ \sum_{s=1}^{S} \mu_{A_s}(x) > 0, \qquad (2.9)$$
where S is the number of fuzzy subsets that make up the partition. A partition
satisfying (2.9) has the coverage property.
A particularly interesting type of partition (for reasons that become clear in
chapter 3) is the one for which
$$\sum_{s=1}^{S} \mu_{A_s}(x) = 1, \quad \forall x \in X, \qquad (2.10)$$
often referred to as a fuzzy partition (or Ruspini partition). In a fuzzy
partition the total membership of each x is equal to 1, and in the usual
arrangement, where each fuzzy subset overlaps only its neighbours, each x
belongs to at most two fuzzy subsets. The partition in Fig. 2.3 is a fuzzy
partition in this sense.
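Conditions (2.9) and (2.10) can be checked numerically for a partition like the one in Fig. 2.3. The breakpoints below (young fading out between 20 and 50, middle-aged peaking at 50, old fading in between 50 and 80) are hypothetical and not read off the figure.

```python
import numpy as np


def ramp_down(x, a, b):
    """1 for x <= a, 0 for x >= b, linear in between (left shoulder)."""
    return np.clip((b - x) / (b - a), 0.0, 1.0)


def ramp_up(x, a, b):
    """0 for x <= a, 1 for x >= b, linear in between (right shoulder)."""
    return np.clip((x - a) / (b - a), 0.0, 1.0)


def tri(x, a, b, c):
    """Triangle with feet a, c and peak b."""
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)


x = np.linspace(0.0, 100.0, 1001)            # sampled universe of ages
partition = {                                # hypothetical breakpoints
    "young": ramp_down(x, 20.0, 50.0),
    "middle-aged": tri(x, 20.0, 50.0, 80.0),
    "old": ramp_up(x, 50.0, 80.0),
}
total = sum(partition.values())                                      # sum over s
active = sum((mu > 0.0).astype(int) for mu in partition.values())    # sets with mu > 0

print("coverage (2.9):", bool(np.all(total > 0.0)))
print("Ruspini (2.10):", bool(np.allclose(total, 1.0)))
print("max. sets per point:", int(active.max()))
```

Since neighbouring sets share breakpoints, their linear flanks are complementary and the total membership is 1 everywhere; no age belongs to more than two sets.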
Semantic soundness of a partition is a rather empirically determined quality. A
partition can be considered semantically sound if the fuzzy sets that form the
partition are convex, normal and "sufficiently" distinct, and the number of
subsets per variable is relatively small (maximum values of 7 to 10 have been
suggested; de Oliveira 1999).
Another empirical property, semantic consistency, measures the consistency
between the intuitive semantics of linguistic labels and the corresponding fuzzy
subsets to which the labels are assigned. Note that the semantics of the
linguistic labels is not always utilized (plain labels like "mf1", "mf2", etc.,
which carry little information, are sometimes used). The semantics of a
linguistic item depends on its context; consequently, the semantic consistency
of a partition also depends on the context of the problem. Imagine a fuzzy set
with nonzero membership for values in the range [30, 40], labeled "young".
Semantic consistency of the partition then depends on the definition of the
universal set X: with X = [30, 100] the partition is semantically consistent,
but with X = [0, 40] a semantic inconsistency is detected. The linguistic
ordering of fuzzy sets also plays a role here, e.g. the center parameter of the
fuzzy subset corresponding to the linguistic label "old" should always be
greater than that of the fuzzy subset corresponding to the linguistic label
"young".
2.4 Operations on fuzzy sets and fuzzy logic
We consider the basic set operations known from classical set theory, such as
intersection, union and complement. Unlike in classical set theory, their
extensions to fuzzy sets are not uniquely defined, because a membership function
can take any value in the interval [0, 1].
The general forms of intersection and union are represented by triangular norms
(T-norms) and triangular conorms (T-conorms or S-norms), respectively.
A T-norm is a two-place function from [0,1] × [0,1] to [0,1] satisfying the
following criteria:
T(a, 1) = a    One identity
T(a, b) ≤ T(c, d), whenever a ≤ c, b ≤ d    Monotonicity
T(a, b) = T(b, a)    Commutativity
T(T(a, b), c) = T(a, T(b, c))    Associativity
The conditions defining an S-norm (T-conorm), S: [0,1] × [0,1] → [0, 1], are
S(a, 0) = a    Zero identity
S(a, b) ≤ S(c, d), whenever a ≤ c, b ≤ d    Monotonicity
S(a, b) = S(b, a)    Commutativity
S(S(a, b), c) = S(a, S(b, c))    Associativity
A complement c: [0,1] → [0,1] of a fuzzy set is defined by
c(0) = 1, c(1) = 0    Boundary
c(a) < c(b), whenever a > b    Order reversing
c(c(a)) = a    Involution
The most common t-norms are minimum (2.11) and product (2.12); see also
Fig. 2.4.
$$\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x)) \qquad (2.11)$$
$$\mu_{A \cap B}(x) = \mu_A(x)\,\mu_B(x) \qquad (2.12)$$
The most common s-norms are maximum (2.13) and probabilistic sum (2.14); see also
Fig. 2.5.
$$\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x)) \qquad (2.13)$$
$$\mu_{A \cup B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x) \qquad (2.14)$$
[Figure 2.4. Minimum (left) and product of two fuzzy sets (right).]
[Figure 2.5. Maximum (left) and probabilistic sum of two fuzzy sets (right).]
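The t-norm and s-norm criteria and the complement conditions can be spot-checked on a finite grid of [0, 1]. This Python sketch is only a numerical sanity check, not a proof; it additionally checks that results stay in [0, 1] and uses the complement boundary condition c(0) = 1, c(1) = 0. The plain sum is included for contrast and fails because its result can exceed 1.

```python
import itertools

import numpy as np

GRID = np.linspace(0.0, 1.0, 11)   # finite sample of [0, 1]
TOL = 1e-12


def satisfies_norm_axioms(op, identity):
    """Grid check of closure in [0, 1], identity, commutativity,
    associativity and monotonicity for a two-place operator."""
    for a in GRID:
        if abs(op(a, identity) - a) > TOL:
            return False
    for a, b in itertools.product(GRID, repeat=2):
        if not -TOL <= op(a, b) <= 1.0 + TOL:        # result must stay in [0, 1]
            return False
        if abs(op(a, b) - op(b, a)) > TOL:
            return False
    for a, b, c in itertools.product(GRID, repeat=3):
        if abs(op(op(a, b), c) - op(a, op(b, c))) > TOL:
            return False
    for a, b, c, d in itertools.product(GRID, repeat=4):
        if a <= c and b <= d and op(a, b) > op(c, d) + TOL:
            return False
    return True


def satisfies_complement_axioms(comp):
    """Grid check of boundary, order reversing and involution."""
    if abs(comp(0.0) - 1.0) > TOL or abs(comp(1.0)) > TOL:
        return False
    for a, b in itertools.product(GRID, repeat=2):
        if a > b and not comp(a) < comp(b):
            return False
    return all(abs(comp(comp(a)) - a) <= TOL for a in GRID)


t_norms = {"minimum (2.11)": min, "product (2.12)": lambda a, b: a * b}
s_norms = {"maximum (2.13)": max,
           "probabilistic sum (2.14)": lambda a, b: a + b - a * b,
           "plain sum": lambda a, b: a + b}

for name, op in t_norms.items():
    print("t-norm", name, satisfies_norm_axioms(op, identity=1.0))
for name, op in s_norms.items():
    print("s-norm", name, satisfies_norm_axioms(op, identity=0.0))
print("complement 1 - a (2.15):", satisfies_complement_axioms(lambda a: 1.0 - a))
```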
Interestingly, in applications the plain sum of fuzzy sets (which is obviously
not an s-norm) is a far more common choice than the probabilistic sum, although
the sum may result in a supernormal fuzzy set with a height greater than one.
The latter, however, is considered a minor problem in practice (Jager 1995).
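The supernormality issue is easy to reproduce. In this sketch two overlapping triangular sets (hypothetical, centred at 4 and 5 with half-width 2) are combined pointwise with maximum (2.13), probabilistic sum (2.14) and the plain sum.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 201)
mu_a = np.clip(1.0 - np.abs(x - 4.0) / 2.0, 0.0, 1.0)   # triangle centred at 4
mu_b = np.clip(1.0 - np.abs(x - 5.0) / 2.0, 0.0, 1.0)   # triangle centred at 5

union_max = np.maximum(mu_a, mu_b)          # (2.13)
union_prob = mu_a + mu_b - mu_a * mu_b      # (2.14)
plain_sum = mu_a + mu_b                     # not an s-norm

for name, mu in [("max", union_max), ("probabilistic sum", union_prob), ("sum", plain_sum)]:
    print(f"{name:>17}: height = {mu.max():.3f}")
```

Between the two peaks the plain sum reaches 1.5, so the result is supernormal, while maximum and probabilistic sum stay within [0, 1] with height 1.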
Typically, the complement of a fuzzy set A is defined as
$$\mu_{\bar{A}}(x) = 1 - \mu_A(x). \qquad (2.15)$$
[Fig. 2.6. Complement of a fuzzy set.]
Just as classical set theory serves as the basis for classical logic, fuzzy set
theory serves as the basis for fuzzy logic. Set-theoretic operations on fuzzy
sets are the basis for fuzzy logical operations: the set operations union,
intersection and complement correspond to the logical operations OR, AND and
NOT, which are accordingly implemented by s-norms, t-norms and complement
functions satisfying the conditions given above.
2.5 Fuzzy systems
Fuzzy set theory and fuzzy logic provide the means for constructing fuzzy
systems (Zadeh 1973). A fuzzy system consists of a number of rules that specify
linguistic relations between the linguistic labels of the input and output
variables of the system. A fuzzy rule (2.16) is a statement whose premise and
consequent consist of fuzzy propositions, i.e. statements like "x is big" that
connect a variable with a linguistic label defined for that variable.
IF U1 is A1r AND U2 is A2r … AND Ui is Air … AND UN is ANr
THEN V1 is B1r AND V2 is B2r … AND Vj is Bjr … AND VM is BMr (2.16)
OR…
Air and Bjr denote the linguistic labels of the ith input variable Ui (with base
variable xi) and of the jth output variable Vj (with base variable yj)
(i = 1 … N, j = 1 … M), respectively, associated with the rth rule (r = 1 … R).
Rule (2.16) expresses the relationship in linguistic terms. To associate it
with numerical values, a special six-step inference algorithm is used. The
inference algorithm implements the mapping between the linguistic variables
Ui, Vj and the corresponding base variables xi, yj. For the moment we assume
that the system under consideration is multi-input/single-output (MISO).
1. The inference mechanism operates on fuzzy sets to produce fuzzy sets.
Normally the inputs to the fuzzy system are crisp, and thus they have to be
converted to fuzzy sets. This first step of the inference algorithm, during
which the fuzzy representation of the crisp input is created, is therefore
called fuzzification.
Uncertainty, imprecision or inaccuracy in the inputs can be modeled by
representing the inputs with fuzzy numbers (e.g. triangular fuzzification).
Most often, however, singleton fuzzification is used, with a membership
function defined by a fuzzy singleton (A.1), because alternative fuzzification
methods add computational complexity to the process and the need for them has
not been convincingly justified.
2. In order to evaluate the premise propositions in numerical terms (proposition
matching), the degree to which the input xi matches the fuzzy set µir must be
determined. Proposition matching is defined as $\tau_{ir} = \mathrm{hgt}(\mu_i' \cap \mu_{ir})$,
where $\mu_i'$ is the ith fuzzy input; in the case of singleton fuzzification
this reduces to
$$\tau_{ir} = \mu_{ir}(x_i) \qquad (2.17)$$
Since a fuzzy singleton can be regarded as just a different representation of a
crisp number, with singleton fuzzification the fuzzification procedure is
embedded into proposition matching.
3. The operator AND that connects the premise propositions corresponds to a
t-norm. Using the appropriate t-norm, the activation degree (also degree of
fulfillment, firing strength) of a rule is calculated. The procedure is called
premise conjunction.
$$\tau_r = \bigcap_{i=1}^{N} \tau_{ir} \qquad (2.18)$$
4. The operator THEN corresponds to implication. In classical logic, implication
is defined by
$$A \rightarrow B = \neg A \cup B,$$
which in fuzzy logic can be expressed as a fuzzy implication using the negation
(2.15) and the maximum s-norm:
$$A \rightarrow B = \max(1 - \mu_A(x), \mu_B(y)) \qquad (2.19)$$
This (material) implication is, however, rarely used in fuzzy system
applications. Usually a conjunction (t-norm) is preferred, and the output fuzzy
set is thus determined by
$$F_r(y) = \tau_r \cap \gamma_r, \qquad (2.20)$$
where γr denotes the output membership function associated with the rth rule.
This preference is mainly due to the undesirable interpolation properties of
material implication (explored e.g. in (Jager 1995)); (2.20) is also much
easier to implement.
Babuska (1997) states that while material implication represents the
unidirectional relationship "A implies B", t-norm should be interpreted as a
nondirectional relationship "it is true that A holds and B holds".
5. The rules are then aggregated by using an s-norm for the aggregation operator
OR. Note that with material implication (2.19) a t-norm must be used for
aggregation instead (Jager 1995).
$$F(y) = \bigcup_{r=1}^{R} F_r(y) \qquad (2.21)$$
Fuzzification (2.17), proposition matching, premise conjunction (2.18),
implication (2.20) and aggregation (2.21) result in the fuzzy output of the
system:
$$F(y) = \bigcup_{r=1}^{R} \left( \left( \bigcap_{i=1}^{N} \mu_{ir}(x_i) \right) \cap \gamma_r \right) \qquad (2.22)$$
Consider a two-input/single-output fuzzy system for illustration (Fig. 2.7).
[Fig 2.7. Steps of inference algorithm and corresponding linguistic operators. Rules: IF U1 is A11 AND U2 is A21 THEN V1 is B1, OR IF U1 is A12 AND U2 is A22 THEN V1 is B2; annotated steps: proposition matching, premise conjunction, implication, aggregation (OR).]
When it comes to multi-input/multi-output (MIMO) systems, the AND operator in
the rule consequent is not treated as a logical AND (implemented by a t-norm);
instead, each fuzzy output F(yj) is evaluated independently by
$$F(y_j) = \bigcup_{r=1}^{R} \left( \left( \bigcap_{i=1}^{N} \mu_{ir}(x_i) \right) \cap \gamma_{jr} \right) \qquad (2.23)$$
This implies that any MIMO fuzzy system can be decomposed into M MISO fuzzy
systems.
6. Because in practice we deal with crisp rather than fuzzy values, a crisp
representation of the fuzzy output must finally be derived. This is normally
done by an averaging technique called defuzzification (the inverse operation
of fuzzification).
Two common defuzzification methods are center-of-gravity (CoG) and
mean-of-maxima (MoM).
CoG (often referred to as center-of-area defuzzification in the case of
one-dimensional sets) is the same method that is employed to calculate the
center of gravity of a mass, except that point masses are replaced by
membership values.
$$Y_{cog}(F(y)) = \frac{\int_Y y\,F(y)\,dy}{\int_Y F(y)\,dy} \qquad (2.24)$$
In practice CoG is usually applied in discrete form:
$$Y_{cog}(F(y)) = \frac{\sum_{q=1}^{Q} F(y_q)\,y_q}{\sum_{q=1}^{Q} F(y_q)}. \qquad (2.25)$$
Mean-of-maxima defuzzification belongs to the class of indexed (or threshold)
defuzzification methods, which discard the part of a fuzzy output where
membership values are below a certain threshold level. For MoM
defuzzification only the part of the fuzzy output that attains the maximum
membership is taken into account:
$$Y_{mom}(F(y)) = \frac{1}{q} \sum_{j \in J^*} y_j, \qquad (2.26)$$
where J* denotes the set of indices j at which F(yj) attains its maximum value,
and q is the number of its elements.
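The six steps can be strung together in a short Python sketch for a two-rule system shaped like Fig. 2.7. All MF parameters, the input values and the output grid are hypothetical; singleton fuzzification is assumed, so steps 1 and 2 collapse into (2.17), and min-min-max operators are used by default.

```python
import numpy as np


def tri(v, a, b, c):
    """Triangular MF with feet a, c and peak b; accepts scalars or arrays."""
    return np.clip(np.minimum((v - a) / (b - a), (c - v) / (c - b)), 0.0, 1.0)


# Hypothetical two-input, two-rule MISO system in the shape of Fig. 2.7:
#   R1: IF U1 is A11 AND U2 is A21 THEN V1 is B1
#   R2: IF U1 is A12 AND U2 is A22 THEN V1 is B2
# Each rule: ((MF parameters for x1, MF parameters for x2), output MF parameters)
RULES = [
    (((-1.0, 0.0, 1.0), (-1.0, 0.0, 1.0)), (0.0, 2.0, 4.0)),
    (((0.0, 1.0, 2.0), (0.0, 1.0, 2.0)), (4.0, 6.0, 8.0)),
]
Y = np.linspace(0.0, 10.0, 1001)   # discretized output universe y_1 ... y_Q


def fuzzy_output(x, t_norm=np.minimum, implication=np.minimum, s_norm=np.maximum):
    """Steps 1-5: returns the sampled fuzzy output F(y_q) of (2.22)."""
    F = np.zeros_like(Y)
    for premise, consequent in RULES:
        tau_ir = [tri(xi, *p) for xi, p in zip(x, premise)]   # steps 1-2, (2.17)
        tau_r = tau_ir[0]
        for t in tau_ir[1:]:
            tau_r = t_norm(tau_r, t)                          # step 3, (2.18)
        F_r = implication(tau_r, tri(Y, *consequent))         # step 4, (2.20)
        F = s_norm(F, F_r)                                    # step 5, (2.21)
    return F


def cog(F):
    """Step 6, discrete center of gravity (2.25)."""
    return float(np.sum(F * Y) / np.sum(F))


def mom(F, tol=1e-9):
    """Step 6, mean of maxima (2.26): mean of the y_q where F is maximal."""
    return float(Y[F >= F.max() - tol].mean())


F = fuzzy_output((0.25, 0.5))
print(cog(F), mom(F))
```

With x = (0.25, 0.5) the activation degrees are τ1 = 0.5 and τ2 = 0.25. MoM returns the middle of the plateau of the clipped set B1 (y = 2), whereas CoG is pulled towards the second rule's output (about 3.47).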
Putting (2.22) into (2.25) we obtain
$$y = Y_{cog}(F(y)) = \frac{\left( \bigcup_{r=1}^{R} \tau_r \cap \Gamma_r \right) \cdot Y^T}{\left( \bigcup_{r=1}^{R} \tau_r \cap \Gamma_r \right) \cdot \mathbf{1}}, \qquad (2.27)$$
where $\Gamma_r = [\gamma_r(y_1)\ \gamma_r(y_2)\ \ldots\ \gamma_r(y_q)\ \ldots\ \gamma_r(y_Q)]$,
$Y = [y_1\ y_2\ \ldots\ y_q\ \ldots\ y_Q]$ and $\mathbf{1}$ is a column vector of Q ones.
Note that with product chosen as t-norm and sum chosen as s-norm
$$y = Y_{cog}(F(y)) = \frac{\left( \sum_{r=1}^{R} \tau_r \Gamma_r \right) \cdot Y^T}{\left( \sum_{r=1}^{R} \tau_r \Gamma_r \right) \cdot \mathbf{1}} = \frac{\sum_{q=1}^{Q} \sum_{r=1}^{R} \tau_r \gamma_r(y_q)\,y_q}{\sum_{q=1}^{Q} \sum_{r=1}^{R} \tau_r \gamma_r(y_q)} = \frac{\sum_{r=1}^{R} \tau_r \sum_{q=1}^{Q} \gamma_r(y_q)\,y_q}{\sum_{r=1}^{R} \tau_r \sum_{q=1}^{Q} \gamma_r(y_q)} \qquad (2.28)$$
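Identity (2.28) can be verified numerically. In the sketch, arbitrary nonnegative values stand in for the sampled output MFs Γr, since the identity does not depend on their shape; Q, R and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, R = 50, 4
y = np.linspace(0.0, 1.0, Q)            # Y = [y_1 ... y_Q]
Gamma = rng.random((R, Q))              # row r holds Gamma_r = [gamma_r(y_q)]
tau = rng.random(R)                     # activation degrees tau_r

# Left-hand side of (2.28): product implication, sum aggregation, CoG (2.25)
F = (tau[:, None] * Gamma).sum(axis=0)
lhs = F @ y / F.sum()

# Right-hand side of (2.28): rule-wise moments of the output MFs
num = np.sum(tau * (Gamma @ y))         # sum_r tau_r sum_q gamma_r(y_q) y_q
den = np.sum(tau * Gamma.sum(axis=1))   # sum_r tau_r sum_q gamma_r(y_q)
rhs = num / den

print(lhs, rhs)
```

Because the rule-wise sums Σq γr(yq) yq and Σq γr(yq) do not depend on the inputs, they can be computed once in advance, after which each evaluation of (2.28) costs O(R) operations instead of O(RQ).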
For reasons that become clearer in the next chapter, two special cases of (2.28)
are of particular interest to us:
(i) the output MFs are fuzzy singletons (A.1);
(ii) the output MFs are symmetrical triangles (A.4).
It can be shown (see Appendix B for details) that in the first case (2.28)
reduces to
$$y = \frac{\sum_{r=1}^{R} \tau_r b_r}{\sum_{r=1}^{R} \tau_r}, \qquad (2.29)$$
whereas in the second case
$$y = \frac{\sum_{r=1}^{R} \tau_r b_r s_r}{\sum_{r=1}^{R} \tau_r s_r}. \qquad (2.30)$$
Here br denotes the center of the rth output MF and sr the width of the rth
triangle.
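A numerical check of (2.30) is sketched below. It assumes that (A.4) describes a symmetrical triangle with center br and half-width sr (so that its area equals sr); the activation degrees and MF parameters are hypothetical. Product implication and sum aggregation are combined with a fine-grid CoG (2.25) and compared with the closed forms.

```python
import numpy as np

y = np.linspace(-5.0, 15.0, 20001)      # fine discretization of the output universe


def sym_tri(v, b, s):
    """Symmetric triangle with center b and half-width s (assumed reading of A.4)."""
    return np.clip(1.0 - np.abs(v - b) / s, 0.0, 1.0)


def cog_product_sum(tau, gammas):
    F = sum(t * g for t, g in zip(tau, gammas))   # product implication, sum aggregation
    return float(np.sum(F * y) / np.sum(F))       # discrete CoG (2.25)


tau = np.array([0.2, 0.7, 0.4])         # hypothetical activation degrees
b = np.array([1.0, 5.0, 9.0])           # hypothetical centers b_r
s = np.array([1.0, 2.0, 3.0])           # hypothetical half-widths s_r

numeric = cog_product_sum(tau, [sym_tri(y, bi, si) for bi, si in zip(b, s)])
closed_form = float(np.sum(tau * b * s) / np.sum(tau * s))   # (2.30)
singleton = float(np.sum(tau * b) / np.sum(tau))             # (2.29)
print(numeric, closed_form, singleton)
```

Since each output set contributes in proportion to its area, wider sets pull the result towards their centers; with equal widths sr, (2.30) reduces to the singleton formula (2.29).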
A popular representation of fuzzy systems is depicted in Fig. 2.8, where (to
summarize what we have established) the rule base stores a set of logical
IF-THEN rules defined on the system variables, and the data base stores the MFs
of the fuzzy labels used in the rule set. These two bases can be regarded as the
linguistic layer (or knowledge base) of the system.
[Fig. 2.8. A generic fuzzy system: Knowledge Base (Rule Base, Data Base); Fuzzifier, Fuzzy Inference Engine, Defuzzifier.]
The inference layer consists of the fuzzifier (which converts a set of crisp
variables into a set of fuzzy variables to enable the application of logical
rules), the fuzzy inference engine (an algorithm that calculates the extent to
which each rule is activated and combines these into the fuzzy system output)
and the defuzzifier (which converts a set of fuzzy variables into crisp values
so that the output of the fuzzy system can be applied to another, non-fuzzy
system). The fuzzy inference engine, rule base and data base can be regarded as
the reasoning block of a fuzzy system.
2.6 Rule base properties
In (2.16)-(2.23) we refer to linguistic labels (and the respective fuzzy
subsets) according to their occurrence in the rth rule, which may give the
impression that each fuzzy subset defined for a particular linguistic variable
is used only once in the rule base. Generally this is not the case: the number
of "slots" in the rules exceeds the number of unique linguistic labels, and a
fuzzy subset is usually associated with several rules.
Assuming that each input variable Ui is partitioned into Si fuzzy subsets, each
output variable Vj is partitioned into Tj fuzzy subsets and the fuzzy system
consists of R rules, a separate structure is needed that defines the mapping
between the rule-oriented and the variable-oriented notation.
In the MATLAB Fuzzy Logic Toolbox, for example, this information is stored in
an R × (N + M) matrix, each element mrp of which is the index of the membership
function of either an input (if p ≤ N) or an output (if p > N) variable
associated with the rth rule:
$$\begin{bmatrix} m_{11} & \cdots & m_{1p} & \cdots & m_{1,M+N} \\ \vdots & & \vdots & & \vdots \\ m_{r1} & \cdots & m_{rp} & \cdots & m_{r,M+N} \\ \vdots & & \vdots & & \vdots \\ m_{R1} & \cdots & m_{Rp} & \cdots & m_{R,M+N} \end{bmatrix} \qquad (2.31)$$
The maximum number of rules of a system is given by
$$R_{max} = \prod_{i=1}^{N} S_i. \qquad (2.32)$$
Comparing the actual number of rules with Rmax is a good indicator of whether
the rule base is properly defined:
a) R < Rmax implies that one or several rules possible with the given input
partition are undefined: an incomplete rule base.
b) R > Rmax implies that there are several rules with equivalent antecedents
that are associated with
• the same consequent subsets, resulting in a redundant rule base;
• different consequent subsets, resulting in an inconsistent rule base.
c) R = Rmax is usually the desired situation.
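Counting rules is only an indicator: with R = Rmax a missing antecedent combination can be masked by a duplicated one. The hypothetical helper below therefore inspects the antecedents of a single-output rule matrix in the format of (2.31) directly; it is a sketch, not a toolbox function.

```python
from collections import defaultdict
from math import prod


def classify_rule_base(rule_matrix, n_inputs, n_sets):
    """Classify a single-output rule matrix laid out as in (2.31).

    rule_matrix: rows [m_r1, ..., m_rN, m_r,N+1] of 1-based MF indices
    n_sets:      number of fuzzy subsets S_i for each input variable
    """
    consequents = defaultdict(set)   # antecedent tuple -> distinct consequents
    counts = defaultdict(int)        # antecedent tuple -> number of rules
    for row in rule_matrix:
        antecedent = tuple(row[:n_inputs])
        consequents[antecedent].add(row[n_inputs])
        counts[antecedent] += 1
    r_max = prod(n_sets)                                   # (2.32)
    return {
        "R": len(rule_matrix),
        "R_max": r_max,
        "incomplete": len(consequents) < r_max,
        # same antecedent repeated with an already used consequent
        "redundant": any(counts[a] > len(c) for a, c in consequents.items()),
        # same antecedent with different consequents
        "inconsistent": any(len(c) > 1 for c in consequents.values()),
    }


print(classify_rule_base([[1, 1, 2], [1, 2, 1], [2, 1, 3], [2, 2, 2]], 2, [2, 2]))
print(classify_rule_base([[1, 1, 2], [1, 1, 3], [2, 1, 3], [2, 2, 2]], 2, [2, 2]))
```

For the second matrix R = Rmax = 4, yet the rule base is both incomplete (the antecedent (1, 2) is missing) and inconsistent (the antecedent (1, 1) has two consequents).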
A variable-oriented transcription (2.33) of the fuzzy inference algorithm is
also possible, assuming that R = Rmax and that each output fuzzy set γr is
associated with one rule only (which means that the number of output MFs is
equal to R). This would at first glance seem impractical, but such a
combinatorial rule base is actually quite common in fuzzy modeling because of
the lack of adequate rule training algorithms.
$$F(y) = \bigcup_{j_1=1}^{S_1} \bigcup_{j_2=1}^{S_2} \cdots \bigcup_{j_N=1}^{S_N} \left( \left( \bigcap_{i=1}^{N} \mu_{i j_i}(x_i) \right) \cap \gamma_r \right) \qquad (2.33)$$
Because of the difficulties in fuzzy system modeling, some authors (e.g. (Tong
1978), (Kosko 1992a)) have employed rule weights. According to this strategy,
each rule is assigned a rule weight Wr ∈ [0,1] that is involved in the
calculation of the rth rule's output (2.20) and is said to express the
relevance, credibility or probability of the rule:
$$F_r(y) = W_r \tau_r \cap \gamma_r. \qquad (2.34)$$
In most cases the sum of the weights of the rules with equivalent antecedents is
required to be equal to 1. A weightless fuzzy system (2.20) can then be regarded
as a special case of weighted fuzzy systems, with Wr = 1 for all r.
There are three basic reasons why rule weights should be avoided.
1) The number of rules is multiplied by the number of output fuzzy subsets T:
$R_{max} = T \prod_{i=1}^{N} S_i$.
2) Interpretation of the rules becomes difficult, partly because of the
increased number of rules and partly because no good explanation exists of how
to interpret these weights.
3) Any tuning action performed by adjusting the weights can be accomplished by
modifying membership function parameters.
A more detailed discussion about this issue is available in (Nauck and Kruse
1998).
2.7 Inference examples
According to sections 2.5 and 2.6, the inference algorithm that establishes the
numerical mapping between the fuzzy system variables consists of six steps. To
apply the algorithm, an association must be created between the rule-oriented
notation (which indexes the fuzzy subsets according to their occurrence in the
rth rule), used for convenience in the description of the algorithm, and the
variable-oriented notation (which numbers the fuzzy subsets according to the
system variable they belong to).
Let us present a simple and illustrative example. Let the system have two
inputs and a single output, with two subsets for each input and three for the
output variable. A non-redundant, consistent and complete rule base would then
be (in variable-oriented notation):
1. IF U1 is A11 AND U2 is A21 THEN V is B2
2. IF U1 is A11 AND U2 is A22 THEN V is B1
3. IF U1 is A12 AND U2 is A21 THEN V is B3
4. IF U1 is A12 AND U2 is A22 THEN V is B2
Thus, the variable-to-rule mapping matrix (2.31) appears as
$$\begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 2 & 2 \end{bmatrix}.$$
This fuzzy system can be depicted as a network structure (Fig. 2.9). Each layer
of the network represents the respective step of the inference algorithm. The
input membership function parameters are stored in the
fuzzification/proposition matching layer. The arrows that indicate fuzzy data
flow are bold; crisp data flow is depicted with normal lines. The mapping (2.31)
is determined by the connections between the 1st and 2nd layers and the
connections between the output MFs and the implication layer.
[Fig. 2.9. Network representation of a fuzzy system. Layers: proposition matching (µ11, µ12 for input x1; µ21, µ22 for input x2), premise conjunction (∩), implication (∩, with output MFs γ1, γ2, γ3) and aggregation (∪), producing F(y).]
We now observe how the output value y is inferred for given input values
x1 = x1*, x2 = x2* when
i) the inference operators for conjunction, implication and aggregation are
minimum, minimum and maximum, respectively, and center-of-gravity
defuzzification is used (Fig. 2.10);
ii) product-product-sum inference is combined with mean-of-maxima
defuzzification (Fig. 2.11).
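Both inference variants of this example can be sketched in Python using the mapping matrix given above. The input MFs (two complementary linear sets on [0, 10]), the output triangles B1, B2, B3 centered at 0, 5 and 10, and the inputs x1* = 6, x2* = 5 are hypothetical, so the numbers do not correspond to Figs. 2.10 and 2.11.

```python
import numpy as np

# Variable-to-rule mapping matrix (2.31) of the example: [U1 subset, U2 subset, V subset]
RULES = np.array([[1, 1, 2],
                  [1, 2, 1],
                  [2, 1, 3],
                  [2, 2, 2]])
Y = np.linspace(0.0, 10.0, 1001)   # discretized output universe


def input_mfs(x):
    """Hypothetical complementary MFs on [0, 10]: [mu_i1(x), mu_i2(x)]."""
    low = float(np.clip(1.0 - x / 10.0, 0.0, 1.0))
    return [low, 1.0 - low]


def output_mf(k):
    """Hypothetical symmetric triangles B1, B2, B3 centered at 0, 5 and 10."""
    centre = 5.0 * (k - 1)
    return np.clip(1.0 - np.abs(Y - centre) / 5.0, 0.0, 1.0)


def infer(x1, x2, conj, impl, aggr, defuzz):
    mu = [input_mfs(x1), input_mfs(x2)]            # proposition matching layer
    F = np.zeros_like(Y)
    for j1, j2, k in RULES:                        # connections defined by (2.31)
        tau = conj(mu[0][j1 - 1], mu[1][j2 - 1])   # premise conjunction
        F = aggr(F, impl(tau, output_mf(k)))       # implication and aggregation
    return defuzz(F)


def cog(F):
    return float(np.sum(F * Y) / np.sum(F))        # (2.25)


def mom(F, tol=1e-9):
    return float(Y[F >= F.max() - tol].mean())     # (2.26)


x1_star, x2_star = 6.0, 5.0
y_i = infer(x1_star, x2_star, np.minimum, np.minimum, np.maximum, cog)   # variant i)
y_ii = infer(x1_star, x2_star, np.multiply, np.multiply, np.add, mom)    # variant ii)
print(y_i, y_ii)
```

Variant ii) yields y* = 5, the single peak of the summed output set, while variant i) gives a CoG of about 5.18 over the flat-topped min-max output.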
[Fig 2.10. Min-max inference with COG defuzzification. One row per rule r = 1…4: input MFs A11, A12 over x1 and A21, A22 over x2 evaluated at x1*, x2*, yielding the activation τr applied to the output MFs B1, B2, B3 over y; bottom panel: aggregated output and defuzzified value y*.]
[Fig 2.11. Prod-sum inference with MOM defuzzification. Same layout as Fig. 2.10, with rows for τ1 to τ4 and a bottom panel showing the aggregated output and the defuzzified value y*.]
2.8 Takagi-Sugeno fuzzy systems
The fuzzy systems observed so far belong to the class of standard (also
linguistic, Mamdani) fuzzy systems. Standard fuzzy systems are particularly
useful where the human-machine interface is concerned, because the linguistic
nature of the system makes the information stored in the fuzzy system
intuitively understandable and, vice versa, gives us the possibility to
implement our knowledge about the system. On the other hand, there is an
acknowledged lack of efficient data-driven modeling algorithms that could be
applied to standard fuzzy systems. Not satisfied with the situation, Takagi and
Sugeno (1985) came up with an alternative rule format (2.35) in order to make
automated tuning possible and to reduce the number of fuzzy rules needed to
model a system.
IF U1 is A1r AND U2 is A2r … AND Ui is Air … AND UN is ANr
THEN yr = p0r + p1rx1 + … + pirxi + … + pNrxN (2.35)
In Takagi-Sugeno (TS) rules the consequent fuzzy proposition is replaced by an
affine linear function of the inputs; each rule can thus be considered a local
linear model, and these models are blended together by means of aggregation to
form the overall output y.
The redefinition of the fuzzy system affects the 4th step of the inference
algorithm, implication:
$$F_r(y) = \tau_r \cap y_r = \tau_r \cap \left( p_{0r} + \sum_{i=1}^{N} p_{ir} x_i \right). \qquad (2.36)$$
The 6th step, defuzzification, is also modified. With TS systems the
implication and aggregation operators are product and sum, respectively, with
which center-of-gravity defuzzification reduces to an algorithm known as fuzzy
c-means defuzzification (FcM). FcM in fact combines aggregation and
defuzzification into one operation and is thus more than a defuzzification
method (Jager 1995).
$$y = Y_{fcm}(F(y)) = \frac{\sum_{r=1}^{R} \tau_r \left( p_{0r} + \sum_{i=1}^{N} p_{ir} x_i \right)}{\sum_{r=1}^{R} \tau_r} \qquad (2.37)$$
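A sketch of TS inference (2.37) for a two-input, two-rule system in the layout of Fig. 2.12, with product conjunction. The Gaussian MF parameterization, the MF parameters, the consequent coefficients and the inputs are all hypothetical. Setting all pir (i ≥ 1) to zero gives the 0th order case (2.39), whose output coincides with the singleton formula (2.29).

```python
import numpy as np


def gauss(v, c, sigma):
    """Gaussian MF (assumed parameterization: center c, width sigma)."""
    return np.exp(-0.5 * ((v - c) / sigma) ** 2)


def ts_output(x, tau, P):
    """Fuzzy-mean output (2.37) of a TS system.

    x   : inputs x_1 ... x_N, shape (N,)
    tau : rule activation degrees tau_r, shape (R,)
    P   : consequent parameters, shape (R, N + 1); row r = [p_0r, p_1r, ..., p_Nr]
    """
    local = P[:, 0] + P[:, 1:] @ np.asarray(x, dtype=float)   # y_r of (2.35)
    return float(np.dot(tau, local) / np.sum(tau))


x = np.array([0.3, 0.8])
# Product conjunction (the Pi nodes of Fig. 2.12): rule 1 uses mu11, mu21; rule 2 uses mu12, mu22.
tau = np.array([gauss(x[0], 0.0, 0.5) * gauss(x[1], 0.0, 0.5),
                gauss(x[0], 1.0, 0.5) * gauss(x[1], 1.0, 0.5)])

P1 = np.array([[0.0, 1.0, -1.0],    # rule 1: y_1 = x_1 - x_2
               [2.0, 0.5, 0.5]])    # rule 2: y_2 = 2 + 0.5 x_1 + 0.5 x_2
P0 = np.array([[1.0, 0.0, 0.0],     # 0th order: y_1 = 1
               [3.0, 0.0, 0.0]])    # 0th order: y_2 = 3

print(ts_output(x, tau, P1))
print(ts_output(x, tau, P0), np.dot(tau, P0[:, 0]) / np.sum(tau))   # (2.39) equals (2.29)
```

The output is always a convex combination of the local model outputs yr, weighted by the normalized activation degrees.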
A special case of the consequent function, where the offset p0r = 0,
r = 1...R, results in a homogeneous TS system. Another, particularly
interesting special case of TS systems is obtained if the consequent function
is a constant (pir = 0 for i = 1…N, r = 1...R), whereupon (2.35) reduces to
(2.38) and (2.37) reduces to (2.39).
IF U1 is A1r AND U2 is A2r … AND UN is ANr
THEN yr = p0r (2.38)
$$y = Y_{fcm}(F(y)) = \frac{\sum_{r=1}^{R} \tau_r p_{0r}}{\sum_{r=1}^{R} \tau_r} \qquad (2.39)$$
It is easy to see the complete equivalence between singleton standard fuzzy
systems (2.29) and 0th order TS systems (2.39).
Sometimes FcM defuzzification is applied to standard fuzzy systems so that,
before the weighted sum is performed, each output fuzzy set is replaced by its
numerical representation br, which is normally chosen to be the center of
gravity of the given output set. This, however, is more or less equivalent to
replacing the fuzzy system with the corresponding singleton system (or 0th
order TS system), because output MF parameters other than their crisp
representation have no influence on the system output.
Another logical conclusion is that 0th order TS systems (as opposed to ordinary,
1st order TS systems) retain linguistic interpretability in the manner of
standard fuzzy systems while possessing the attractive properties of TS systems
that open the way for automated determination of system parameters from data.
Like standard fuzzy systems, TS systems can also be depicted as a network
structure, but in this case the analogy with neural networks is even more
obvious (Fig. 2.12). In particular, the equivalence between 0th order TS
systems and radial basis function networks has been shown (Jang 1993b).
[Fig. 2.12. 1st order TS system in network representation. Inputs x1, x2 feed the MFs µ11, µ12, µ21, µ22; product (Π) and normalization (N) nodes; rule consequents p11x1 + p21x2 + p01 and p12x1 + p22x2 + p02; summation (Σ) yields y.]
This equivalence means that many techniques (both in modeling and control)
developed for neural networks can be adopted for TS systems.
2.9 Design of fuzzy systems
Two major applications of fuzzy systems are fuzzy modeling and fuzzy control.
Two general sources of information for building fuzzy systems are prior
knowledge and data (measurements). Prior knowledge is often of a rather
approximate nature and usually originates from "experts", e.g. process
designers or operators, who are asked to express their knowledge in the form of
fuzzy rules. Hence, such fuzzy systems can be regarded as fuzzy expert systems.
For many processes, data about the process operation are recorded routinely.
If this is not the case, special experiments can be designed to obtain the
relevant data. Building fuzzy systems from data involves special algorithms
designed for that task. The acquisition or tuning of fuzzy systems by means of
data is usually termed fuzzy identification.
There is a certain parallel with classical system modeling: knowledge-based
design is somewhat analogous to first-principles modeling, while fuzzy
identification belongs to the same class as the statistical methods used in
system identification. In classical modeling a combined approach is quite often
used, where physics is used to write down a general differential equation that
is believed to represent the system behavior, and experiments are then
performed to determine certain system parameters or functions. A similar
combined approach is widely used in fuzzy system design.
Design of fuzzy systems may be seen as a general algorithm consisting of six
steps (Yager and Filev, 1994).
i) Selection of the input and output variables;
ii) Selection of the appropriate reasoning mechanism for the formalization
of the fuzzy model;
iii) Determination of the universes of discourse;
iv) Determination of the linguistic labels into which the variables are
partitioned;
v) Formation of the set of linguistic rules that represent the relationships
between the system variables;
vi) Evaluation of system adequacy.
Quite often this algorithm results only in a preliminary fuzzy system, namely
when the system evaluation phase reveals that the performance index (e.g.
root-mean-square error) of the system is not what was expected. Steps i-v can
then be regarded as the structure determination of the fuzzy system, and a
further parameter identification phase is needed, during which the membership
function parameters obtain supposedly optimal values. If input-output data that
reflect the optimal behavior of the system are supplied, it is natural to apply
data-driven techniques, discussed in more detail in chapter 4. In some cases,
however, we need to return to the beginning.
With complex systems, it is not always clear, which variables should be used as
inputs to the model. Prior knowledge, insight into process behavior and the
purpose of modeling are the typical sources of information for this choice.
For the selection of an appropriate reasoning mechanism (including the
specification of system type, inference operators, defuzzification method and
MF types), the deciding factors are again the purpose of modeling and the type
of available knowledge. The identification algorithms may also play a role
here, e.g. with derivative-based identification algorithms the fuzzy inference
algorithm must be differentiable, and thus the inference operators are
predetermined. Computational cost may also be an issue: some inference schemes
are computationally more expensive than others, e.g. CoG defuzzification
compared to MoM or FcM (in this sense, simplified inference algorithms such as
(2.30) are important). Not least among the deciding factors are the
interpolation properties of the system, which are determined by the reasoning
mechanism.
Fuzzy system design is quite application-dependent, and an exact design
algorithm cannot be defined. The general guidelines, however, provide a
reliable framework for the design of fuzzy systems.
2.10 Summary
In this chapter the basics of fuzzy set theory and fuzzy logic were considered
(sections 2.1-2.4); these appear as extensions of crisp set theory and
Aristotelian logic, respectively, and serve as the basis for building fuzzy
systems. The presented material constitutes only a small part of the huge body
of fuzzy set theory and fuzzy logic, but it is sufficient for understanding the
rest of the thesis.
Fuzzy systems allow information to be processed in linguistic terms; this
information is expressed in the form of IF-THEN rules and is built on the
analogy with human reasoning. Besides the linguistic layer, information
processing also takes place at the numerical level, using the special inference
algorithm. The inference algorithm is a six-step procedure with a large degree
of flexibility (there exists a large family of inference operators and
fuzzification and defuzzification methods) that creates a unique input-output
mapping between the system (base) variables.
This unique architecture of fuzzy systems makes them useful for man-machine
interaction problems and makes it possible to use human experience and
knowledge that is usually expressed in vague terms and is otherwise difficult to
implement.
In addition to purely linguistic fuzzy systems, where all variables are partitioned
into fuzzy sets, there exists another form of fuzzy rules, known as Takagi-Sugeno
rules, where the consequent part is a linear function of the inputs (see section 2.8). TS
systems have become increasingly popular because their inference algorithm is
mathematically less complex and allows control and modeling techniques from
other, more analytical fields of research to be adopted.
Particularly attractive are 0th order TS systems (which can at the same time be
regarded as singleton standard fuzzy systems) because of their intuitively
understandable rule base and computationally inexpensive inference algorithm.
Similarly attractive are the inference properties of standard fuzzy systems with
symmetrical triangular MFs.
Fuzzy system design issues were only briefly considered here; we return to
them in the following chapters, specifically in chapter 4, which is dedicated to
fuzzy identification, i.e. the acquisition of fuzzy models from training data.
Interpolation and transparency in fuzzy systems
3.1 Transparency and interpretability
The use of the term transparency in the present work is based on (Brown and
Harris 1994), where transparency is defined as a property that enables us to
understand the influence of each system parameter on the system output, as well
as on (Setnes et al. 1998), where fuzzy systems are characterized as being
transparent to interpretation.
Fuzzy system transparency is closely related to the concept of linguistic
interpretability, but the two terms do not coincide and, in our opinion, it is very
important to see the distinction. Interpretability is a property that fuzzy systems
have by default, established by linguistic rules and the fuzzy sets
associated with these rules; even the rules of 1st order TS systems can be
interpreted. Transparency, on the other hand, is not a default property of fuzzy
systems; it is a measure of how valid or reliable the linguistic
interpretation of the system is. It will be shown in this chapter that for standard
fuzzy systems and 0th order TS systems transparency has a binary character, whereas for
1st order TS systems it is a continuous variable.
Most authors, however, do not make this distinction. Some of them do not pay
attention to transparency at all and consequently assume that transparency, like
interpretability, is a default property of fuzzy systems (sometimes regarded as
characteristic of standard and 0th order TS systems only, as in (Nauck et al.
1996)); others do emphasize that transparency of fuzzy systems is not guaranteed
by default (Yin 2000), (Babuska 2000) but use the two terms interchangeably.
There is yet another aspect of the problem that sometimes gets mixed up with
transparency of fuzzy systems: the readability of fuzzy rules, which
basically boils down to the overall complexity of the system. Improving
readability by using a moderate number of variables, rules and fuzzy
subsets, or by avoiding inconsistencies in the rule base, is undoubtedly useful
but has little in common with transparency as understood in this thesis. We
concentrate on low-level transparency, which grows out of the conformity between
the linguistic layer and the inference layer of a fuzzy system. This conformity
enables us to "see" through the inference layer and is the precondition
for making fuzzy systems both predictable and reliable in their behavior.
In fact, very few authors (Lotfi et al. 1996), (Oliveira 1999), (Yin 2000),
(Babuska 2000) have investigated this issue in any detail. The most
important of these works is perhaps (Oliveira 1999), which lists a set of properties
that fuzzy systems should meet (moderate number of MFs, natural zero positioning,
normality, coverage and distinguishability of MFs) and proposes
mathematically formulated constraints for preserving the last two, incorporated
into the cost function of a gradient descent learning algorithm. These works
on low-level transparency, however, aim for a certain balance between
transparency and accuracy, and their results can generally be applied only to a
limited class of systems and algorithms.
On the other hand, there are even fewer works concentrating on the
transparency problem of 1st order TS systems (Yen et al. 1998), (Bikdash
1999), (Fiordaliso 2000).
Our aim is therefore to unite all these efforts into a general definition of fuzzy
system transparency. It has been claimed that "currently there exists no well-established
definition of transparency of a fuzzy system" and "there are no
definite criteria for the distinguishability of a fuzzy partition" (Yin 2000).
Hopefully, the solutions to these problems proposed in the present chapter help to fill
the void.
Once the transparency conditions for fuzzy systems are defined, the interpolation
properties of fuzzy systems can be revisited in a more systematic manner.
Interpolation and transparency can be regarded as two sides of the same coin;
therefore it would be unreasonable to ignore the interpolation aspect in the present
chapter.
The transparency conditions can easily be satisfied if fuzzy systems are
obtained through manual design. The key problem in fuzzy modeling (and
control) is that transparency is generally lost when fuzzy systems are identified
from data. Transparency conditions serve as the basis for the
transparency protection mechanisms discussed in the next chapter.
It is also important to point out that although specific interest in fuzzy
system transparency is not very prominent in academic circles, there are, and
always have been, authors who use transparent fuzzy systems (according to the
general definition proposed in this thesis) in their research, not to mention the
everyday practitioners of fuzzy logic control. For example, (Setnes et al. 1998) and
(Jager 1995) are listed here as important sources of inspiration.
To conclude this introduction, it must be stressed that transparency is
certainly not a universal requirement for fuzzy systems. When the fuzzy
system is used as a black box and its interpretation is the least of its end user's
concerns, the transparency aspect can be freely ignored. It must be noted,
however, that system transparency is generally a useful property that provides
additional means for control system or model validation and, in some cases,
facilitates the application of transparency-based control methods.
One of the aims of the thesis is to demonstrate this through applications.
3.2 Transparency of standard fuzzy systems
Let us consider the properties listed in (Oliveira 1999). It is arguable whether coverage
and natural zero positioning have anything to do with transparency (Babuska
2000). Normality, on the other hand, is a standard assumption in fuzzy
systems. Distinguishability of input MFs, which in turn is directly related to the
overlap of input MFs, is, however, vital to transparency, as shown in the
following. Note that the conclusions are valid for 0th order TS systems, too.
The overlap of input MFs is also one of the most important factors influencing
interpolation in fuzzy systems. It is reported (Shaw 1998) that a suggested
minimum of 25% and a maximum of 75% overlap have been established
experimentally. Frequently, 50% overlap is a reasonable compromise. The
effect of overlap on interpolation can be most conveniently observed in
two-dimensional space. We do this by constructing five otherwise equivalent SISO
fuzzy systems, each made up of 6 rules, with 0%, 25%, 50%, 75% and 100% overlap,
respectively. Although the other system parameters (including minimum
t-norm, maximum s-norm and CoG defuzzification) remain the same, quite a
different result is obtained in each case (Fig. 3.1). With 0% overlap, no
interpolation occurs; the system is actually non-fuzzy and its output
abruptly switches from one rule centroid to another. With 25% overlap, the input
intervals for which the output has a constant value are still present, but some
interpolation between neighboring rules occurs.
With 50% overlap, the interval where the system output is the explicit
contribution of the given rule is reduced to a single point. With larger overlap,
however, at least two rules contribute simultaneously for any given input, so the
system output is always the result of interpolation. This makes the contribution
of a given rule invisible in the system output, which we suggest is not a desirable
feature. The phenomenon is driven to the extreme with 100% overlap, where all
rules are fully activated simultaneously and the system output has a
constant value, equal to the centroid of the union of the output fuzzy sets.
[Fig. 3.1 plots: output y versus x for 0%, 25%, 50%, 75% and 100% overlap (left); the corresponding input MFs (right); rule labels shown include IF x is mf1 THEN y is mf3, IF x is mf3 THEN y is mf2, IF x is mf5 THEN y is mf5 and IF x is mf6 THEN y is mf3.]
Fig. 3.1. Overlap degree of input MFs (right) and its influence on system output
(left).
Consider again the case of 50% overlap. Let us call the point in input-output
space where the rule under observation is fully activated, and the system output
is the explicit contribution of that rule, a transparency checkpoint.
When the overlap is equal to or smaller than 50%, transparency checkpoints
exist. Closer inspection reveals that the input coordinate of the transparency
checkpoint is equal to the center of the fired MF (where µ(x) = 1). By
analogy, the desired output y at the transparency checkpoint should
be the center of the respective output MF, where γ(y) = 1. This ensures that the
interpretation of the rule, which we obtain by combining the
information from the rule base and the MF definition base, corresponds well
to the inferred numerical values. This is what we call transparency. The
idea of transparency checkpoints extends to MISO and MIMO systems and
is covered by the following definition.
Definition: the rth rule of the standard MIMO fuzzy system (2.27) is transparent if
its activation degree

$$\tau_r = \bigcap_{i=1}^{N} \mu_{ir}(x_i) = 1 \tag{3.1}$$

results in the system output

$$y_j = b_{jr}, \qquad j = 1, \dots, M \tag{3.2}$$

where $b_{jr}$ is the center of the output MF $\gamma_{jr}$ associated with the activated rule.
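To make the definition concrete, the Python sketch below builds a SISO standard fuzzy system (triangular input MFs, symmetric triangular output MFs, minimum implication, maximum aggregation, CoG computed by numerical integration) and checks (3.1)-(3.2) at the center of every input MF. The rule base, the MF parameters and the function names `tri`, `cog_output` and `is_transparent` are hypothetical illustrations, not taken from the thesis; the integration grid and tolerance are arbitrary choices.

```python
def tri(x, a, b, c):
    """Triangular MF with feet a, c and center b (a <= b <= c)."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0   # outside the open support; allows shoulders
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def cog_output(x, in_mfs, out_mfs, rules, y_min, y_max, n=2001):
    """Min implication, max aggregation, CoG by the trapezoidal rule.
    rules[r] = (index of input MF, index of output MF)."""
    taus = [tri(x, *in_mfs[i]) for i, _ in rules]
    h = (y_max - y_min) / (n - 1)
    num = den = 0.0
    for k in range(n):
        y = y_min + k * h
        mu = max(min(t, tri(y, *out_mfs[o])) for t, (_, o) in zip(taus, rules))
        w = 0.5 if k in (0, n - 1) else 1.0   # trapezoidal-rule weights
        num += w * y * mu
        den += w * mu
    return num / den


def is_transparent(in_mfs, out_mfs, rules, y_min, y_max, tol=1e-2):
    """(3.1)-(3.2), SISO case: at the center of each rule's input MF the
    output must equal the center of that rule's output MF."""
    for i, o in rules:
        y = cog_output(in_mfs[i][1], in_mfs, out_mfs, rules, y_min, y_max)
        if abs(y - out_mfs[o][1]) > tol:
            return False
    return True


# hypothetical system: 6 rules, output MF centers -4 ... 4
centers = [0, 2, 4, 6, 8, 10]
narrow = [(c - 2, c, c + 2) for c in centers]   # 50 % overlap
wide = [(c - 4, c, c + 4) for c in centers]     # larger overlap
outs = [(b - 2, b, b + 2) for b in (-4, -2, 0, 2, 4)]
rules = [(0, 2), (1, 3), (2, 1), (3, 0), (4, 4), (5, 2)]
print(is_transparent(narrow, outs, rules, -6, 6))  # True
print(is_transparent(wide, outs, rules, -6, 6))    # False
```

With the wider supports, a neighboring rule is still half activated at the first checkpoint, and its clipped output set pulls the output there to about 0.83 instead of 0, so the definition is violated.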
[Fig. 3.2 plot: output y versus x for x from 0 to 10, with input MFs mf1 to mf6 along the x axis and output MFs mf1 to mf5 along the y axis.]
Fig. 3.2. Transparency checkpoints, depicted by markers.
A standard fuzzy system (2.27) can be regarded as transparent only if all its rules
are transparent (Fig. 3.2).
In order to preserve input transparency (3.1) with triangular input MFs (4.35),
that is, to guarantee the existence of transparency checkpoints, the following
conditions apply:

$$c_{i,s-1} \le b_{is} \le a_{i,s+1}, \qquad i = 1, \dots, N;\; s = 2, \dots, S_i - 1. \tag{3.3}$$
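A direct transcription of (3.3) for a single input variable is sketched below. MFs are given as hypothetical `(a, b, c)` tuples ordered by center; as in (3.3), only the interior indices s = 2, ..., S_i - 1 are checked.

```python
def satisfies_3_3(mfs):
    """mfs: (a, b, c) triangles of one input, ordered by center b.
    True if c[s-1] <= b[s] <= a[s+1] for every interior MF, as in (3.3)."""
    for s in range(1, len(mfs) - 1):          # 0-based interior indices
        _, _, c_prev = mfs[s - 1]
        _, b_s, _ = mfs[s]
        a_next, _, _ = mfs[s + 1]
        if not (c_prev <= b_s <= a_next):
            return False
    return True


centers = [0, 2, 4, 6, 8, 10]
print(satisfies_3_3([(c - 1.5, c, c + 1.5) for c in centers]))  # True  (< 50 % overlap)
print(satisfies_3_3([(c - 2, c, c + 2) for c in centers]))      # True  (50 % overlap)
print(satisfies_3_3([(c - 3, c, c + 3) for c in centers]))      # False (> 50 % overlap)
```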
In order to preserve output transparency in the case of CoG defuzzification (2.24),
(3.4) must be satisfied:

$$\frac{\int_{y_{\min}}^{y_{\max}} y\,\gamma_{jr}(y)\,dy}{\int_{y_{\min}}^{y_{\max}} \gamma_{jr}(y)\,dy} = b_{jr} \tag{3.4}$$
(3.4) implies that output MFs must be symmetrical. Note that with MoM
defuzzification, however, (3.4) would not be necessary.
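The effect of (3.4) can be checked numerically. For a triangular output MF with feet a, c and center b, the centroid is (a + b + c)/3, so (3.4) holds exactly when b - a = c - b, i.e. when the MF is symmetric. The sketch below (hypothetical helper names) evaluates the left-hand side of (3.4) with the trapezoidal rule.

```python
def tri(y, a, b, c):
    """Triangular MF with feet a, c and center b."""
    if y <= a or y >= c:
        return 1.0 if y == b else 0.0
    return (y - a) / (b - a) if y <= b else (c - y) / (c - b)


def centroid(a, b, c, y_min, y_max, n=20001):
    """Left-hand side of (3.4), trapezoidal rule on n grid points."""
    h = (y_max - y_min) / (n - 1)
    num = den = 0.0
    for k in range(n):
        y = y_min + k * h
        w = 0.5 if k in (0, n - 1) else 1.0
        mu = tri(y, a, b, c)
        num += w * y * mu
        den += w * mu
    return num / den


print(round(centroid(0, 1, 2, 0, 4), 4))  # 1.0    -> equals b, (3.4) holds
print(round(centroid(0, 1, 4, 0, 4), 4))  # 1.6667 -> (a+b+c)/3, not b
```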
Next, we generalize the transparency conditions so that they can be applied
universally to other types of MFs. A more general formulation of (3.3) is as
follows:

$$\forall x \in X: \quad \sum_{s=1}^{S_i} \mu_{is}(x_i) \le 1 \tag{3.5}$$

Note that if the sum in (3.5) is equal to 1 for all x, a fuzzy partition (2.12) is established.
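In general form, (3.4) can be rewritten as

$$\Upsilon_{cog}\big(\gamma_{jr}(y_j)\big) = \frac{\int_{y_{\min}}^{y_{\max}} y\,\gamma_{jr}(y)\,dy}{\int_{y_{\min}}^{y_{\max}} \gamma_{jr}(y)\,dy} = \mathrm{core}\big(\gamma_{jr}(y_j)\big) \tag{3.6}$$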
It must be taken into account that with some MF types, such as the Gaussian MF
(A.2), (3.5) cannot be satisfied because their support is not compact. Consequently,
to achieve transparency, input MFs must satisfy the conditions
that follow.
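A quick numerical check of (3.5) is sketched below: it samples the sum of membership degrees of one input on a grid and reports its maximum. The partitions are hypothetical: a 50 % overlap triangular partition keeps the sum at 1, whereas Gaussian MFs (σ = 0.85 is an arbitrary choice) are positive everywhere, so at the center of any MF the sum already exceeds 1.

```python
import math


def tri(x, a, b, c):
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def gauss(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2)


def max_membership_sum(mfs, lo, hi, n=1001):
    """Largest value of sum_s mu_s(x) over n grid points in [lo, hi]."""
    return max(sum(mf(lo + k * (hi - lo) / (n - 1)) for mf in mfs) for k in range(n))


centers = [0, 2, 4, 6, 8, 10]
tri_mfs = [lambda x, c=c: tri(x, c - 2, c, c + 2) for c in centers]
gauss_mfs = [lambda x, c=c: gauss(x, c, 0.85) for c in centers]
print(max_membership_sum(tri_mfs, 0, 10) <= 1 + 1e-9)   # True: (3.5) holds
print(max_membership_sum(gauss_mfs, 0, 10) > 1)         # True: (3.5) violated
```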
With fuzzy number-like MFs defined by three parameters a, b and c, the
following conditions must be satisfied:

$$\begin{cases} a \le b \le c \\ a = \min(\mathrm{supp}(A)) \\ b = \mathrm{core}(A) \\ c = \max(\mathrm{supp}(A)) \end{cases} \tag{3.7}$$
With fuzzy interval-like MFs defined by four parameters a, b, c and d, the
following conditions must be satisfied:

$$\begin{cases} a \le b \le c \le d \\ a = \min(\mathrm{supp}(A)) \\ b = \min(\mathrm{core}(A)) \\ c = \max(\mathrm{core}(A)) \\ d = \max(\mathrm{supp}(A)) \end{cases} \tag{3.8}$$
If the use of smooth MFs is prescribed, a possible choice is a spline-based
MF satisfying (3.8), such as the square spline (A.6) or the cubic spline (A.7).
[Fig. 3.3 plot: µ(x) from 0 to 1.0 versus x, with breakpoints a, b, c and d marked on the x axis.]
Fig. 3.3. Comparison of cubic and square spline based MFs.
Fig. 3.3 demonstrates that the actual numerical difference between cubic and
square spline based MFs is quite small.
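The size of that difference can be estimated with a short sketch. The exact definitions (A.6) and (A.7) are in the appendix and are not reproduced here; the sketch assumes common forms of a rising edge on the normalized interval t ∈ [0, 1] (with t = (x - a)/(b - a)): a piecewise quadratic ("square") spline, 2t² for t ≤ 0.5 and 1 - 2(1 - t)² otherwise, and the cubic 3t² - 2t³. Under these assumptions the largest gap between the two edges is 1/27 ≈ 0.037, reached at t = 1/3 and t = 2/3.

```python
def square_edge(t):
    """Assumed piecewise quadratic ("square" spline) rising edge on [0, 1]."""
    return 2 * t * t if t <= 0.5 else 1 - 2 * (1 - t) ** 2


def cubic_edge(t):
    """Assumed cubic spline rising edge on [0, 1]."""
    return 3 * t * t - 2 * t ** 3


def max_edge_gap(n=30001):
    """Largest |cubic - square| over n evenly spaced points of [0, 1]."""
    return max(abs(cubic_edge(k / (n - 1)) - square_edge(k / (n - 1))) for k in range(n))


print(round(max_edge_gap(), 4))   # 0.037  (1/27 = 0.0370...)
```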
3.3 Interpolation in standard systems
If a standard fuzzy system is transparent, we are able to predict its output at the
transparency checkpoints. Between these points, however, the output is the
result of interpolation between the individual rules. The nature of the
interpolation is determined by the fuzzy system parameters: defuzzification
method, inference operators and shape of membership functions. In the next few
sections these factors are addressed separately.
3.3.1 Role of defuzzification
We observe the influence of the basic defuzzification methods on the fuzzy system
output using the SISO system from section 3.2 with 50% overlap, leaving all
other parameters intact.
The most obvious is the effect of MoM defuzzification, which results in a stepwise
output. This is the reason why MoM, although computationally inexpensive, is
seldom used in modeling, where we usually expect smooth interpolation
between the transparency checkpoints. With this method, the system is also
insensitive to all other parameters that otherwise influence the interpolation: a fuzzy
system with MoM defuzzification is a multi-level relay, which could just as
well be implemented with classical set theory (if the output fuzzy sets are
symmetrical).
Fig. 3.4. Output interpolated by MoM (normal), FcM (bold) and CoG (dashed)
defuzzification methods.
As noted in section 2.8, FcM defuzzification transforms the original system into
a 0th order TS system, and the resulting interpolation between the transparency
checkpoints is linear. The latter may be regarded as a desirable property.
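The contrast between MoM and FcM can be reproduced with a few lines of Python. The sketch assumes a SISO system with a 50 % overlap triangular input partition and symmetric output MFs, so that MoM reduces to the center of the output MF of the most activated rule (ties are resolved in favor of the first rule, a simplification) and FcM reduces to the activation-weighted mean of the output MF centers. The consequent centers in `OUT_CENTERS` are hypothetical.

```python
CENTERS = [0, 2, 4, 6, 8, 10]                    # input MF centers (50 % overlap)
OUT_CENTERS = [0.0, 3.0, -1.0, 2.0, -3.0, 1.0]   # hypothetical rule consequents


def tri(x, a, b, c):
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def activations(x):
    return [tri(x, c - 2, c, c + 2) for c in CENTERS]


def mom(x):
    """MoM with symmetric output MFs: center of the most activated rule."""
    taus = activations(x)
    r = max(range(len(taus)), key=taus.__getitem__)
    return OUT_CENTERS[r]


def fcm(x):
    """FcM: activation-weighted mean of output MF centers (0th order TS)."""
    taus = activations(x)
    return sum(t * b for t, b in zip(taus, OUT_CENTERS)) / sum(taus)


for x in (0.0, 0.5, 0.9, 1.1, 1.5, 2.0):
    print(x, mom(x), round(fcm(x), 3))
```

Between the checkpoints (0, 0) and (2, 3), MoM jumps from 0 to 3 between x = 0.9 and x = 1.1, whereas FcM follows the straight line y = 1.5x.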
Finally, in the case of CoG, interpolation results in a curve whose exact shape
is determined by the other parameters, most notably by the relative size of the
output fuzzy sets. If two output MFs are of equal size, the interpolated output
curves around the linear interpolation, intersecting it at the midpoint. If one of
the output MFs is larger than the other, the interpolation is "drawn" in the
direction of the larger set, as shown in Fig. 3.5.
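For one particular operator choice the effect has a closed form. Assuming product implication and sum aggregation (a different operator pair from the minimum/maximum combination used for Fig. 3.1), each scaled output set of a symmetric triangular MF with center b_r and half-width w_r has area τ_r w_r and centroid b_r, so CoG reduces to Σ τ_r w_r b_r / Σ τ_r w_r. The two-rule sketch below uses hypothetical parameters: equal widths give the midpoint value 0.5 at x = 0.5, while a wider set B pulls the output toward b_B = 1.

```python
def tri(x, a, b, c):
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def cog_sum_product(x, centers_in, centers_out, half_widths):
    """CoG with product implication, sum aggregation and symmetric triangular
    output MFs: sum(tau * w * b) / sum(tau * w)."""
    taus = [tri(x, c - 1, c, c + 1) for c in centers_in]   # 50 % overlap, spacing 1
    num = sum(t * w * b for t, w, b in zip(taus, half_widths, centers_out))
    den = sum(t * w for t, w in zip(taus, half_widths))
    return num / den


# hypothetical rules: A maps x = 0 to b_A = 0, B maps x = 1 to b_B = 1
print(cog_sum_product(0.5, [0, 1], [0, 1], [1, 1]))   # 0.5
print(cog_sum_product(0.5, [0, 1], [0, 1], [1, 2]))   # about 0.667, pulled toward B
```

At the checkpoints x = 0 and x = 1 only one rule fires, so the output equals b_A or b_B regardless of the widths.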
[Fig. 3.5 panels: output MFs A and B with centers bA and bB for supp(A) < supp(B), supp(A) = supp(B) and supp(A) > supp(B), each shown with the corresponding interpolated output y versus x.]
Fig. 3.5. Interpolation with CoG defuzzification (linear interpolation between the
transparency checkpoints is depicted by dashdot).
3.3.2 Role of MF type
On the basis of their shape, MFs can be divided into the following subcategories:
a) piecewise linear MFs (i.e. triangular and trapezoidal MFs)
b) smooth MFs (spline-based MFs)
Another classification is based on whether or not the core of the MF is a
single point:
a) fuzzy numbers (triangular MF, 3-parameter spline-based MF)
b) fuzzy intervals (trapezoidal MF, 4-parameter spline-based MF)
[Fig. 3.6 panels: output y versus x (top) and input MFs mf1 to mf6 (bottom), for trapezoidal input MFs (left) and smooth input MFs (right).]
Fig. 3.6. Use of trapezoid or smooth MFs instead of triangular ones and its
influence on interpolation.
Fig. 3.7. Product vs. minimum implication with sum aggregation (above left),
product vs. minimum implication with maximum aggregation (above right),
and output MFs (below).
With fuzzy intervals, a zone of insensitivity forms around the transparency
checkpoint. The length of the zone is proportional to the size of the core of the
contributing MF (rule). The effect is quite similar to the one observed
with overlap smaller than 50%. The interpolation outside these transparency
zones deviates significantly from the original interpolation (Fig. 3.6, left).
If smooth MFs are used, additional non-linearity is introduced (Fig. 3.6, above right).
As with fuzzy intervals, the deviation from the original interpolation depends on
how much the input MFs deviate from the original input partition, and is
proportional to the support of the contributing fuzzy set.
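The insensitivity zone is easy to reproduce. The sketch below assumes a two-rule 0th order TS system (FcM defuzzification, so only the input MFs matter) with hypothetical trapezoidal input MFs whose cores are [0, 0.25] and [0.75, 1]. Inside each core the output stays at the rule consequent, and it interpolates only between the cores.

```python
def trap(x, a, b, c, d):
    """Trapezoidal MF with support [a, d] and core [b, c] (a <= b <= c <= d)."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)


MFS = [(-1.0, 0.0, 0.25, 0.75), (0.25, 0.75, 1.0, 2.0)]   # hypothetical partition
CONSEQUENTS = [0.0, 1.0]                                   # rule singletons


def output(x):
    taus = [trap(x, *mf) for mf in MFS]
    return sum(t * p for t, p in zip(taus, CONSEQUENTS)) / sum(taus)


for x in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9):
    print(x, output(x))
```

The output is 0 on [0, 0.25] and 1 on [0.75, 1]; with triangular MFs centered at 0 and 1 the same two rules would give y = x on the whole interval.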
3.3.3 Role of inference parameters
It is interesting to note that inference parameters influence the interpolation
significantly only if CoG defuzzification is used. MoM was already shown to be
insensitive to all other parameters of fuzzy systems from the interpolation
viewpoint, and with FcM the following holds:
a) output membership functions are crisp (singletons), thus product and minimum
implication give the same result, $\tau_r \cdot p_{0r} \equiv \min(\tau_r, p_{0r})$, where both
expressions denote the singleton at $p_{0r}$ reduced to height $\tau_r$;
b) $\sum_{r=1}^{R} \tau_r \cdot p_{0r} \equiv \max(\tau_1 \cdot p_{01}, \dots, \tau_R \cdot p_{0R})$ if the output singletons $p_{0r}$ do not
coincide, which is generally true for 0th order TS systems because they are
usually used in a configuration where each rule is assigned a unique
singleton $p_{0r}$ (combinatorial rule base).
Thus, we conclude that the nature of interpolation in 0th order TS
systems depends little on the inference operators (only the premise conjunction operator
plays a small role).
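Both observations can be checked directly. In the sketch below (a hypothetical rule base with three distinct singletons), the aggregated output set is stored as a dictionary from singleton position to height; all four combinations of product/minimum implication and sum/maximum aggregation then give the same weighted-mean output.

```python
import itertools


def tri(x, a, b, c):
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


IN_MFS = [(-1, 0, 1), (0, 1, 2), (1, 2, 3)]   # 50 % overlap partition
SINGLETONS = [0.0, 2.0, 1.0]                  # distinct output singletons p_0r

IMPLICATIONS = {"min": min, "product": lambda t, g: t * g}
AGGREGATIONS = {"max": max, "sum": lambda u, v: u + v}


def singleton_output(x, implication, aggregation):
    agg = {}                                   # singleton position -> height
    for mf, p in zip(IN_MFS, SINGLETONS):
        h = implication(tri(x, *mf), 1.0)      # singleton has membership 1 at p
        agg[p] = aggregation(agg[p], h) if p in agg else h
    return sum(p * h for p, h in agg.items()) / sum(agg.values())


for x in (0.3, 1.0, 1.7):
    outs = {(i, a): singleton_output(x, IMPLICATIONS[i], AGGREGATIONS[a])
            for i, a in itertools.product(IMPLICATIONS, AGGREGATIONS)}
    print(x, outs)
```

Because the singletons are distinct, the aggregation operator is never actually applied, and min(τ, 1) = τ · 1 = τ, which is exactly points a) and b) above.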
With CoG defuzzification, both the aggregation and the implication operators have an
impact on interpolation. Note that the output MFs in Fig. 3.7 (below) are not equal
in size and that they overlap (which is where aggregation by sum and by maximum
differs). It is clear that product implication provides smoother interpolation, and
that with maximum aggregation the interpolated output deviates more
from the linear interpolation than with sum aggregation.
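A numerical CoG routine with interchangeable operators makes such comparisons easy to repeat. The sketch below uses two rules with hypothetical, overlapping output MFs of unequal size (loosely in the spirit of Fig. 3.7, not its actual parameters). At the transparency checkpoints every operator combination returns the output MF centers. In between the results differ, and for product implication with sum aggregation the numerical result matches the closed form Σ τ_r w_r b_r / Σ τ_r w_r.

```python
def tri(x, a, b, c):
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


IN_MFS = [(-1, 0, 1), (0, 1, 2)]                 # rule inputs, 50 % overlap
OUT_MFS = [(0.0, 0.3, 0.6), (0.3, 0.7, 1.1)]     # hypothetical unequal outputs


def cog(x, implication, aggregation, y_min=0.0, y_max=1.1, n=5501):
    """CoG of the aggregated output set, trapezoidal rule on n points."""
    taus = [tri(x, *mf) for mf in IN_MFS]
    h = (y_max - y_min) / (n - 1)
    num = den = 0.0
    for k in range(n):
        y = y_min + k * h
        mu = 0.0                                 # neutral start for max and sum
        for t, mf in zip(taus, OUT_MFS):
            mu = aggregation(mu, implication(t, tri(y, *mf)))
        w = 0.5 if k in (0, n - 1) else 1.0
        num += w * y * mu
        den += w * mu
    return num / den


ops = {"min": min, "prod": lambda t, g: t * g}
aggs = {"max": max, "sum": lambda u, v: u + v}
for i_name, imp in ops.items():
    for a_name, agg in aggs.items():
        print(i_name, a_name, [round(cog(x, imp, agg), 4) for x in (0.0, 0.25, 0.5, 0.75, 1.0)])
```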
3.3.4 Interpolation in multidimensional space
Although the conclusions about interpolation drawn from observations on SISO
systems can basically be generalized to MISO systems, some substantial
differences exist. For example, the interpolation between transparency checkpoints,
which is linear in SISO 0th order TS systems, is not linear in multidimensional
space (Fig. 3.8). The reason is simple: the output is interpolated from four
neighboring rules (transparency checkpoints), whereas a plane is determined by
three points, so the four checkpoints generally do not lie on a common plane.
In general, the difference between the number of neighboring rules and the
number of points that define a linear interpolation increases with every extra
input, and linear interpolation is possible only if the two are equal, i.e. in the
SISO case (Table 3.1).
Fig. 3.8. Interpolation between 4 rules in a MISO 0th order TS system.
Table 3.1. Why linear interpolation occurs only in 2-dimensional space

No. of inputs | No. of transparency checkpoints | No. of points defining the linear interpolation
1 | 2 | 2
2 | 4 | 3
3 | 8 | 4
… | … | …
N | 2^N | N+1
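The table and the non-linearity of Fig. 3.8 can be illustrated with a two-input 0th order TS system. The sketch assumes product conjunction in the premises (other t-norms change the shape of the surface) and hypothetical singletons: three corners at 0 and one at 1. Along the diagonal x1 = x2 = t the output is t², not the straight line t.

```python
def tri(x, a, b, c):
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


MFS = [(-1, 0, 1), (0, 1, 2)]      # same partition for x1 and x2
P = [[0.0, 0.0], [0.0, 1.0]]       # hypothetical singleton of rule (i, j)


def ts0_2d(x1, x2):
    """0th order TS output with product conjunction and weighted mean."""
    num = den = 0.0
    for i, m1 in enumerate(MFS):
        for j, m2 in enumerate(MFS):
            tau = tri(x1, *m1) * tri(x2, *m2)
            num += tau * P[i][j]
            den += tau
    return num / den


for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, ts0_2d(t, t))          # t**2 along the diagonal

# Table 3.1: checkpoints around a cell vs. points fixing a hyperplane
for n in range(1, 5):
    print(n, 2 ** n, n + 1)
```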
3.4 Interpolation in 1st order TS systems
We defined transparency conditions for standard and 0th order TS fuzzy systems
which guarantee that for the full activation of any given rule we are able to predict
the system output correctly. For each rule, such a transparency checkpoint can
easily be found. Beyond these checkpoints, the system output is the result of
interpolation between the individual rules; the nature of this
interpolation depends on many system features, including the defuzzification
method, the type and shape of MFs and the inference operators, but it remains predictable.
Violation of the transparency conditions results in a non-transparent system, and
predictability is lost. The nature of the transparency conditions implies that
transparency of these kinds of systems is binary.
Interpolation in 1st order TS systems differs significantly from that in standard
or 0th order TS systems. Here each rule by itself represents a linear relationship
between the system variables, and the overall output is a combination of these
local linear models by means of interpolation.
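[Fig. 3.9 panels: two local linear models y versus x (top) and input MFs mf1 and mf2 with µ(x) from 0 to 1.0 (bottom); the intersection point q of the local models and the MF parameters are marked on the x axis.]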
Fig. 3.9. V-type interpolation (left) and S-type interpolation (right).
The interpolation properties of TS systems have been analyzed in (Babuska et al.
1994) and (Babuska et al. 1996), which distinguish between S-type and V-type
interpolation. The type of interpolation depends on the coefficients of the local
models y_r (Fig. 3.9).
If the intersection point q of two interpolating local models falls into the
interpolation area ($c_i < q < b_{i+1}$), V-type interpolation occurs; otherwise,
S-type interpolation occurs.
The basic conclusions about the two interpolation types are summarized in Table
3.2.
Table 3.2. Comparison of V- and S-type interpolation.

Interpolation type | Interpolation properties | Application area
S-type interpolation | Intuitively expected results | Stepwise and possibly discontinuous function approximation
V-type interpolation | Some undesirable properties | Continuous, smooth function approximation
According to Table 3.2, neither of the interpolation types has a clear advantage
over the other. In (Babuska et al. 1996), however, preference seems to be given
to V-type interpolation, and the weighted-mean defuzzification algorithm is replaced
by another functional, the smoothing maximum. This replacement can be considered a
deviation from the "classic" TS inference algorithm and is not accepted here
because of the computational complexity of the smoothing maximum.
The distinction between V- and S-type interpolation is given in general terms by
Babuska. According to it, the interpolation between a pair of affine rules (R_i, R_j)
is of the V-type if and only if

$$\Omega_{ij} \cap S_{ij} \neq \emptyset \quad \text{and} \quad \Omega_{ij} \cap (C_i \cup C_j) = \emptyset, \tag{3.9}$$

where Ω_ij denotes the intersection of the consequents of R_i and R_j projected onto
x, the vector of input variables; S_ij denotes the support of the intersection of the
affine membership functions associated with these rules; and C_i, C_j denote the
cores of the respective affine rules. If clear preference is given to V-type
interpolation, (3.9) should be maintained for all rules throughout the training
process.
To illustrate the complexity of the problem, we show how (3.9) can be applied to
TS systems with a single input and then with two inputs.
First, let us consider a SISO TS system. Assuming four-parameter input MFs
($\mu(a) = \mu(d) = 0$, $\mu(b) = \mu(c) = 1$, $a \le b \le c \le d$), (3.9) is
satisfied if for two neighboring local models $y_i = p_{0i} + p_{1i} x$ and
$y_{i+1} = p_{0,i+1} + p_{1,i+1} x$ the following holds:

$$c_i < \frac{p_{0i} - p_{0,i+1}}{p_{1,i+1} - p_{1i}} < b_{i+1}, \qquad i = 1, \dots, R-1. \tag{3.10}$$
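Condition (3.10) is cheap to evaluate for every neighboring pair. The sketch below (hypothetical models and MF parameters) computes the intersection point of two local models and classifies each pair; parallel models, which never intersect, are reported as S-type.

```python
def intersection(model_i, model_j):
    """x where p0i + p1i*x == p0j + p1j*x, or None for parallel models."""
    (p0i, p1i), (p0j, p1j) = model_i, model_j
    if p1i == p1j:
        return None
    return (p0i - p0j) / (p1j - p1i)


def interpolation_types(models, mfs):
    """models[r] = (p0r, p1r); mfs[r] = (a, b, c, d). One 'V'/'S' per pair, per (3.10)."""
    types = []
    for i in range(len(models) - 1):
        q = intersection(models[i], models[i + 1])
        c_i, b_next = mfs[i][2], mfs[i + 1][1]
        types.append("V" if q is not None and c_i < q < b_next else "S")
    return types


mfs = [(-1, -1, 0.5, 1.5), (0.5, 1.5, 2.5, 3.5), (2.5, 3.5, 5, 5)]   # hypothetical
models = [(0.0, 1.0), (2.0, -1.0), (-4.0, 2.0)]                       # hypothetical
print(interpolation_types(models, mfs))
```

The first pair intersects at x = 1, inside (0.5, 1.5), so it is V-type; the second intersects at x = 2, outside (2.5, 3.5), so it is S-type.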
For a system with two inputs, each local model $y_{ij}$ has four neighboring local
models, $y_{i-1,j}$, $y_{i+1,j}$, $y_{i,j-1}$ and $y_{i,j+1}$ (Fig. 3.10), except for the models positioned at
the extremes of the domain, for which special conditions apply.
The projection of the intersection of $y_{ij} = p^0_{ij} + p^1_{ij} x_1 + p^2_{ij} x_2$ and any of its
neighboring local models onto the input space is a line. The coordinates
of the intersection points of these lines can be found from the following equation
systems:

$$q^1_{ij}: \begin{cases} y_{ij} = y_{i+1,j} \\ y_{ij} = y_{i,j+1} \end{cases}, \quad q^2_{ij}: \begin{cases} y_{ij} = y_{i+1,j} \\ y_{ij} = y_{i,j-1} \end{cases}, \quad q^3_{ij}: \begin{cases} y_{ij} = y_{i-1,j} \\ y_{ij} = y_{i,j-1} \end{cases}, \quad q^4_{ij}: \begin{cases} y_{ij} = y_{i-1,j} \\ y_{ij} = y_{i,j+1} \end{cases}. \tag{3.11}$$

Thus $(x_{1,ij}^{q1}, x_{2,ij}^{q1})$, the coordinates of $q^1_{ij}$, are obtained by solving the first
system:

$$x_{2,ij}^{q1} = \frac{(p^0_{i+1,j} - p^0_{ij})(p^1_{ij} - p^1_{i,j+1}) - (p^1_{ij} - p^1_{i+1,j})(p^0_{i,j+1} - p^0_{ij})}{(p^1_{ij} - p^1_{i+1,j})(p^2_{i,j+1} - p^2_{ij}) + (p^1_{ij} - p^1_{i,j+1})(p^2_{ij} - p^2_{i+1,j})} \tag{3.12}$$

$$x_{1,ij}^{q1} = \frac{p^0_{i,j+1} - p^0_{ij} - x_{2,ij}^{q1}\,(p^2_{ij} - p^2_{i,j+1})}{p^1_{ij} - p^1_{i,j+1}}. \tag{3.13}$$
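[Fig. 3.10 diagram: input space (x1, x2) partitioned by MFs µi−1, µi, µi+1 and µj−1, µj, µj+1; local model yij with neighbors yi−1,j, yi+1,j, yi,j−1, yi,j+1 and intersection points qij1 (upper right), qij2 (lower right), qij3 (lower left) and qij4 (upper left).]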
Fig. 3.10. The projection of the input-output relation onto the input space of a
two-input/single-output TS system.
Now, (3.9) translates into the following form:

$$\begin{cases} c_i < x_{1,ij}^{q1} < b_{i+1} \\ c_j < x_{2,ij}^{q1} < b_{j+1} \end{cases}, \qquad \begin{cases} c_i < x_{1,ij}^{q2} < b_{i+1} \\ c_{j-1} < x_{2,ij}^{q2} < b_j \end{cases}. \tag{3.13}$$
The procedure is to be repeated for all $N_q$ points to ensure V-type interpolation
for all rules:

$$N_q = 2^N \prod_{i=1}^{N} (S_i - 1), \tag{3.14}$$

where $S_i$ is the number of MFs of the ith input variable and N is the number of
inputs.
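The sketch below computes $q^1_{ij}$ in two ways for hypothetical local models: with the closed form (3.12)-(3.13), and by solving the 2×2 linear system of (3.11) with Cramer's rule. The closed form divides by $p^1_{ij} - p^1_{i,j+1}$ and fails when that difference is zero, which the Cramer solution does not require. The first pair of conditions in (3.13) and the count (3.14) are included; all parameter values and function names are illustrative assumptions.

```python
import math


def q1_closed_form(pij, pr, pu):
    """(3.12)-(3.13); pij, pr = p_{i+1,j}, pu = p_{i,j+1} are (p0, p1, p2)."""
    num = (pr[0] - pij[0]) * (pij[1] - pu[1]) - (pij[1] - pr[1]) * (pu[0] - pij[0])
    den = (pij[1] - pr[1]) * (pu[2] - pij[2]) + (pij[1] - pu[1]) * (pij[2] - pr[2])
    x2 = num / den
    x1 = (pu[0] - pij[0] - x2 * (pij[2] - pu[2])) / (pij[1] - pu[1])
    return x1, x2


def q1_cramer(pij, pr, pu):
    """Solve y_ij = y_{i+1,j}, y_ij = y_{i,j+1} written as A [x1, x2] = B."""
    a11, a12, b1 = pij[1] - pr[1], pij[2] - pr[2], pr[0] - pij[0]
    a21, a22, b2 = pij[1] - pu[1], pij[2] - pu[2], pu[0] - pij[0]
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det


def is_v_type_q1(q, c_i, b_i1, c_j, b_j1):
    """First pair of conditions in (3.13) for the point q = (x1, x2)."""
    return c_i < q[0] < b_i1 and c_j < q[1] < b_j1


def n_q(s):
    """(3.14): number of intersection points for S_i MFs per input."""
    return 2 ** len(s) * math.prod(si - 1 for si in s)


pij, pr, pu = (0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (2.0, 2.0, 0.0)
print(q1_closed_form(pij, pr, pu), q1_cramer(pij, pr, pu))   # (1.0, 3.0) twice
print(is_v_type_q1((1.0, 3.0), 0.5, 1.5, 2.5, 3.5), n_q([3, 3]))
```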
With three or more inputs, the computation of (3.9) becomes even more
complicated; consequently, even if the decision is made in favor of V-type
interpolation, our possibilities for preserving that type of interpolation during
the training of TS systems remain very limited.
3.5 Transparency of 1st order TS systems
The previous section made clear that interpolation in 1st order TS systems is a
complicated issue. Of the two types of interpolation that can be distinguished,
neither can be clearly preferred, and even if a preference is specified, there are
no reliable means for preserving the interpolation type.
According to the rule format (2.34), we expect a 1st order TS system to create a
smooth approximation of a piecewise linear input-output mapping, assuming that
in the global output y of the system (2.36) the local linear models y_r are
“sufficiently” distinguishable. This implies that we want relatively little
interpolation between neighboring rules, because in these areas the system
output is a weighted average of the contributing local models. The degree of
interpolation can be directly influenced by the overlap of input MFs
and the sizes of their cores, as demonstrated by the following example, where
five otherwise equivalent 1st order TS systems are obtained by varying the
overlap and the size of the cores of the input MFs. We construct separate
examples for V-type and S-type interpolation (Figs. 3.11-3.12).
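Note that in the case of 100% overlap all activation degrees are equal to 1, and
the system output equals

$$y = \frac{\sum_{r=1}^{R} \tau_r \left(p_{0r} + \sum_{i=1}^{N} p_{ir} x_i\right)}{\sum_{r=1}^{R} \tau_r} = \frac{\sum_{r=1}^{R} \left(p_{0r} + \sum_{i=1}^{N} p_{ir} x_i\right)}{R} = p_0 + p_1 x_1 + p_2 x_2 + \dots + p_N x_N,$$

where

$$p_i = \frac{1}{R} \sum_{r=1}^{R} p_{ir}, \qquad i = 0, \dots, N,$$

which basically says that the system rule base is replaced by a single "average rule".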
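A small numerical check of this identity, assuming a hypothetical three-rule, two-input rule base: with every τ_r = 1, the weighted-mean output coincides with the single "average rule" built from the mean coefficients.

```python
def ts1_output(x, taus, rules):
    """Weighted mean of local linear models; rules[r] = (p0r, [p1r, ..., pNr])."""
    num = sum(t * (p0 + sum(p * xi for p, xi in zip(ps, x)))
              for t, (p0, ps) in zip(taus, rules))
    return num / sum(taus)


def average_rule(rules):
    """Mean coefficients p_i = (1/R) * sum_r p_ir, i = 0..N."""
    R = len(rules)
    p0 = sum(p0r for p0r, _ in rules) / R
    ps = [sum(col) / R for col in zip(*(ps for _, ps in rules))]
    return p0, ps


rules = [(0.0, [1.0, 2.0]), (2.0, [-1.0, 0.5]), (1.0, [0.5, -1.0])]   # hypothetical
x = [0.7, 0.2]
p0, ps = average_rule(rules)
y_full = ts1_output(x, [1.0, 1.0, 1.0], rules)      # 100 % overlap: all tau_r = 1
y_avg = p0 + sum(p * xi for p, xi in zip(ps, x))
print(y_full, y_avg)   # equal up to rounding
```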
With smaller overlap and larger cores, the region where the system output is the
contribution of a single rule grows, and vice versa; consequently, the
relationship between interpolation and transparency error is rather
straightforward. With 0% overlap, however, the resulting perfect piecewise linear
system is interpolation-free and can hardly be considered a fuzzy system
anymore. This could be called the transparency paradox of 1st order TS systems.
If the overlap of input MFs equals 50% and the core of the rth rule is a single point,
the existence of a transparency checkpoint where y = y_r is
guaranteed. This, however, cannot be a sufficient condition for transparency (as
it was for 0th order TS and standard fuzzy systems), because the degree
of interpolation is too high to allow correct interpretation.
The inability to produce explicit transparency constraints for 1st order TS systems
can partly be blamed on their undesirable interpolation properties. Although V-type
interpolation ensures a smaller transparency error (Fig. 3.12) than S-type
interpolation (Fig. 3.11, left) under the given conditions, it is still very different
from the smooth interpolation that we would expect.
Based on the interpretation of TS rules, a measure of transparency error (3.15)
can be constructed that estimates the difference between the global output y and
the locally fired model y_r:

$$\varepsilon_{tr} = \frac{1}{K} \sum_{k=1}^{K} \left( y(k) - y_{r^*}(k) \right)^2, \qquad r^* = \arg\max_{r} \tau_r(k), \tag{3.15}$$

where $y_{r^*}(k)$ denotes the local output of the rule with the highest activation
degree for the kth input-output pair and y(k) is the corresponding global output
value.
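The sketch below evaluates (3.15) for a hypothetical two-rule SISO 1st order TS system whose trapezoidal MFs meet around x = 0.5 with a transition band of half-width δ (δ = 0 gives crisp, rectangular MFs). The sample points avoid x = 0.5 exactly, so the argmax in (3.15) is never tied. In line with the discussion above, the error is zero for the crisp partition and grows with the overlap.

```python
def trap(x, a, b, c, d):
    """Trapezoidal MF with support [a, d] and core [b, c]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)


MODELS = [(0.0, 1.0), (2.0, -1.0)]          # hypothetical y_r = p0r + p1r * x


def transparency_error(delta, K=100):
    """(3.15) on K samples of [0, 1] for transition half-width delta."""
    mfs = [(-1.0, -1.0, 0.5 - delta, 0.5 + delta), (0.5 - delta, 0.5 + delta, 2.0, 2.0)]
    err = 0.0
    for k in range(K):
        x = (k + 0.5) / K                    # never exactly 0.5
        taus = [trap(x, *mf) for mf in mfs]
        local = [p0 + p1 * x for p0, p1 in MODELS]
        y = sum(t * yl for t, yl in zip(taus, local)) / sum(taus)
        r_star = max(range(len(taus)), key=taus.__getitem__)
        err += (y - local[r_star]) ** 2
    return err / K


for delta in (0.0, 0.1, 0.25, 0.5):
    print(delta, transparency_error(delta))
```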
[Fig. 3.11 plots: y versus x for overlap degrees 0%, 25%, 50%, 75% and 100% (left) and the corresponding input partitions (right).]
Fig. 3.11. Input-output relation of the SISO TS systems with S-type interpolation
(left) depending on the input partition of the system (right).
Fig. 3.12. Input-output relation of the SISO TS systems with V-type interpolation
depending on the input partition of the system (Fig. 3.11, right).
3.6 Relationship between 0th and 1st order TS systems
We observed that in SISO 0th order TS systems with triangular input MFs the
interpolation between the transparency checkpoints is linear. On the other hand,
we observed that in 1st order TS systems each rule represents a linear
relationship, and when the input MFs of the system are rectangular, no
interpolation occurs and the system output can be derived directly from the rule
consequents. Given this parallel, it is not difficult to develop an exact
conversion mechanism between SISO 0th and 1st order TS systems.
In SISO systems, rule-based indexing and variable-based indexing of MFs
result in equivalent notation. Given a SISO 0th order TS system with triangular
input MFs that form a partition, the parameters of the equivalent 1st
order TS system are determined as follows.
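[Fig. 3.13 plot: output y versus x with singletons pr−1, pr, pr+1 located at the input MF centers ar−1, ar, ar+1, and the corresponding trapezoid parameters b'r−1, b'r, b'r+1 on the x axis.]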
Fig. 3.13. Conversion scheme between SISO 0th and 1st order TS systems.
The input MFs of the 1st order TS system are trapezoidal MFs with
$a'_r = b'_r = c'_{r-1} = d'_{r-1}$, i.e. adjoining rectangular MFs. Hence each fuzzy set is defined
by the two parameters $b'_r$ and $c'_r$, and the latter can be obtained from the next
MF ($c'_r = b'_{r+1}$). These parameters are obtained from

$$b'_r = a_r, \qquad r = 1, \dots, R. \tag{3.16}$$

Note that a 0th order TS system needs one extra rule compared to the 1st order
TS system; thus the second parameter of the last trapezoid MF is obtained from
that extra rule: $c'_R = d'_R = a_{R+1}$.
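The coefficients $p'_{0r}$, $p'_{1r}$ of the 1st order TS system are identified from the
following equations, so that the local model of rule r passes through the
neighboring checkpoints $(a_r, p_{0r})$ and $(a_{r+1}, p_{0,r+1})$:

$$p'_{1r} = \frac{p_{0,r+1} - p_{0r}}{a_{r+1} - a_r}, \qquad p'_{0r} = p_{0r} - p'_{1r}\, a_r. \tag{3.17}$$

For the reverse conversion, (3.16) is applied in reverse order and the output
singletons are found by performing inference

$$p_{0r} = \frac{\sum_{k=1}^{R} \mu'_k(x)\,(p'_{0k} + p'_{1k}\, x)}{\sum_{k=1}^{R} \mu'_k(x)}, \tag{3.18}$$

using x = a_r for each $p_{0r}$.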
The reasoning behind (3.16)-(3.18) implies that a valid conversion to a 0th order
TS system can be obtained only for a very limited group of 1st order TS
systems.
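The SISO conversion can be verified numerically. The sketch below uses hypothetical centers a_1, ..., a_{R+1} and singletons. It assumes that 1st order rule r covers [a_r, a_{r+1}] (consistent with b'_r = a_r in (3.16)) and fits the local line through the two neighboring checkpoints. It then compares both systems on a grid and recovers the singletons by inference at x = a_r, as in (3.18).

```python
A = [0.0, 1.0, 2.5, 4.0, 5.0]        # hypothetical centers a_1..a_{R+1} (5 rules)
P = [1.0, 3.0, 2.0, -1.0, 0.5]       # hypothetical singletons p_01..p_0,R+1


def tri(x, a, b, c):
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def ts0(x):
    """0th order TS, triangular partition with shoulders at both ends."""
    n = len(A)
    mfs = [(A[max(r - 1, 0)], A[r], A[min(r + 1, n - 1)]) for r in range(n)]
    taus = [tri(x, *mf) for mf in mfs]
    return sum(t * p for t, p in zip(taus, P)) / sum(taus)


def to_first_order():
    """Rule r: rectangular MF [a_r, a_{r+1}], line through both checkpoints."""
    rules = []
    for r in range(len(A) - 1):
        p1 = (P[r + 1] - P[r]) / (A[r + 1] - A[r])
        rules.append(((A[r], A[r + 1]), P[r] - p1 * A[r], p1))
    return rules


def ts1(x, rules):
    """1st order TS with crisp rectangular MFs and weighted-mean output."""
    taus = [1.0 if lo <= x <= hi else 0.0 for (lo, hi), _, _ in rules]
    return sum(t * (p0 + p1 * x) for t, (_, p0, p1) in zip(taus, rules)) / sum(taus)


rules = to_first_order()
grid = [k * 5.0 / 100 for k in range(101)]                # 0.0 .. 5.0
print(max(abs(ts0(x) - ts1(x, rules)) for x in grid))     # ~0 (rounding only)
print([round(ts1(a, rules), 12) for a in A])              # recovers P
```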
Fig. 3.14. Interpolation of a 0th order TS system with two inputs.
Can these results and algorithms be generalized to MISO systems? The answer
is no. For standard systems (including 0th order TS systems), the transparency
checkpoints define a rectangular grid from which the output is interpolated (Fig. 3.14).
As we observed in section 3.3.4, the interpolation of 0th order TS systems
between these points is not linear. For MISO 1st order TS systems, each rule still
represents a linear relationship, but the area where a given rule is valid is not
rectangular (Fig. 3.10). Obviously, these systems are no longer compatible as they
are in the SISO case. This is why similar conversion algorithms, such as those
proposed in (Babuska and Verbruggen 1995) or (Chae et al. 1999), should be
expected to produce approximate conversions at best.
3.7 Summary
In this chapter we proposed a definition of transparency for standard fuzzy
and 0th order TS systems (section 3.2). Transparency is regarded as the measure
of conformity between the linguistic and inference layers of a fuzzy system, which
ensures that the linguistic interpretation of fuzzy rules is valid. Based on this
definition, transparency conditions for standard fuzzy and 0th order TS systems
were derived, related to the overlap degree of input MFs and the symmetry of output
MFs.
System transparency is closely related to the factors influencing interpolation in
fuzzy systems and allows us to deal with this issue in a more systematic manner
(section 3.3).
Transparency of 1st order TS systems is a much more complicated matter. Due to
the different interpretation of rules, and specifically the nature of interpolation
between the rules (section 3.4), transparency conditions for these systems cannot be
defined explicitly. The proposed measure for transparency
evaluation (section 3.5) raises further problems: the minimum transparency error is
obtained for systems that are non-fuzzy, and there is no trivial way of
preserving the transparency of 1st order TS systems. A potential solution in the
modeling context is proposed in section 4.9.2 of the next chapter.
Fuzzy modeling
4.1 Introduction
Developing models of real systems or processes is important in many disciplines of science and engineering: models can be used for analysis of the system behavior, for acquiring a better understanding of the underlying mechanisms in the system, for simulation, control system design, etc. (Babuska 1997).
Modeling problems are traditionally solved in the context of mathematical modeling using algebraic, differential or difference equations. The most intuitive way to model a process is to use first principles of physics. Such (white-box) modeling, however, requires a very good understanding of the physical background of the process, which may simply not be acquirable. Another way is to perform system identification using real plant data to produce a model (black-box modeling). The structure of black-box models is usually not related to the structure of the real system, and the model parameters have no physical meaning. Sometimes a combined approach is used, where physics provides the general differential equations and certain model parameters or functions are identified from data using a black-box technique (gray-box modeling).
Another problem is that identification algorithms are well developed mainly for linear systems, while most real processes are non-linear and can be approximated by linear models only locally; otherwise, simplifying assumptions are made that all too often distort the reality of the processes. Additionally, there are cases where the available information is of an imprecise or qualitative nature and cannot be used effectively by standard modeling approaches. It then turns out that many processes cannot be adequately described mathematically, or their descriptions are too complex to be of any practical value. This has stimulated interest in fuzzy modeling and identification techniques.
Fuzzy modeling follows the guidelines of fuzzy system design (see section 2.6), where two techniques of fuzzy system design were distinguished: identification of fuzzy systems from a collection of data, and the expert-opinion-based approach. The latter is largely problem-dependent and more a heuristic than an exact algorithm. It is obvious, however, that with identification algorithms the modeling procedure can be automated only to a certain level (depending on the algorithm), and at least some parameters that have a dramatic impact on modeling quality, such as the type of the system, the number of variables, the number of rules, etc., come from expert(s), so that the modeling procedure is more or less always a combination of both approaches.
The main emphasis of the present chapter is on fuzzy identification algorithms that have potential for transparent modeling. Perhaps the most important technique in fuzzy modeling (in terms of accuracy) is gradient descent (GD), adopted from neural networks (section 4.6) and applicable to fuzzy systems that are (piecewise) continuous and differentiable. To obtain transparent models, however, the constraints imposed on MF parameters (see previous chapter) must be satisfied. This is accomplished, e.g., by the Jager algorithm for 0th order TS systems (Jager 1995). In this chapter an extension of the Jager algorithm is proposed that allows training of standard fuzzy systems (section 4.9.1), thus extending the approximation properties of the original algorithm. For transparency enhancement of 1st order TS systems, another method is proposed in section 4.9.2 that benefits from rule activation degree exponents.
For fuzzy systems that are not differentiable, gradient-free approaches can be applied, such as stochastic search, the Nelder-Mead simplex method, simulated annealing and genetic algorithms (GA). The latter (section 4.8) outperform other gradient-free methods in terms of accuracy and extend the functionality of identification (allowing rule base training, which is rare in fuzzy identification), but the computational cost of the method makes it less readily applicable in practical situations. Computational cost becomes the primary criterion in algorithm selection if the number of adjustable parameters is large, and therefore batch techniques, such as the clustering methods reviewed in section 4.7, have an advantage over incremental techniques.
Historically, fuzzy systems grew out of the context of the human-machine interface, and thus the lack of identification algorithms was initially not considered a deficiency. Older identification algorithms therefore have quite modest approximation properties compared to more recently developed methods. Recent applications, however, have largely abandoned this application area. A very important step in the direction of high-quality identification was the introduction of TS systems along with the least-squares procedure for the identification of consequent functions (Takagi and Sugeno 1985). This can also be qualified as the decisive step by which the semantic properties of fuzzy systems were sacrificed for the sake of accuracy. Pre-TS methods, on the other hand, such as the rule-based approaches reviewed in section 4.4, have some attractive properties that can be utilized in linguistic analysis and synthesis of fuzzy systems.
We briefly describe the identification algorithms in fuzzy modeling mentioned above and, besides analyzing their general properties, examine their applicability from the transparency viewpoint, focusing on how to apply transparency constraints when the original algorithm does not come with built-in transparency protection, as is the case on some occasions. The gap between transparency and accuracy is most obvious with gradient-based algorithms; in this sense, the proposed algorithms, which are able to reduce the gap significantly, are believed to be important.
The modeling algorithms described in this chapter have diverse properties, including certain advantages and certain shortcomings. Sometimes it is possible to bring out the best and hide the shortcomings by a clever combination of several algorithms. Among combined approaches, linking and integration of algorithms can be distinguished. In the first case, different algorithms are applied in succession, and the next one starts when the previous one has finished its job. Integrated algorithms, on the other hand, usually combine techniques that aim at the optimization of different sets of parameters (e.g. antecedent and consequent parameters), and training takes place in cycles.
If no adequate expert opinion about the model structure is available, one usually starts with a uniform partition and a combinatorial rule base where each rule is assigned a unique output MF (this is necessary to give the system enough freedom to obtain a good placement of output MFs for a low approximation error). Hence the number of output MFs equals the number of rules, which, depending on the number of inputs and input MFs, may become very large. One possibility to reduce the number of output MFs is to use the Wang-Mendel method (section 4.4) for system initialization.
Even when a small number of output MFs is not the primary goal, initialization with the Wang-Mendel algorithm might be a good idea because it gives a better initial estimate of the model. This in turn may improve the convergence of such algorithms as GD or GA.
Another approach that by definition reduces the number of MFs is product space clustering. Cluster centers serve as rule prototypes; thus, initially, the number of clusters determines the number of rules and the number of input MFs per variable. In this approach each input (output) MF is associated with one rule only. Very similar clusters can be merged, thus reducing the number of MFs. Unfortunately, this is generally true only for 1st order TS systems.
As for integrated algorithms, a common choice is to use clustering for the antecedent parameters and least squares estimation for the consequent parameters to obtain all MF parameters.
The most famous integrated or hybrid algorithm is perhaps ANFIS (Jang 1993a). In this algorithm, gradient descent and the least squares method are combined. GD is used to learn the antecedent parameters, and LSE is used to determine the coefficients of the linear combinations (or fuzzy singletons) in the rule consequents. Each training epoch consists of two passes: in the forward pass, input data is supplied and the consequent parameters are identified using the sequential least squares algorithm (see section 4.5); in the backward pass, the error is back-propagated and the antecedent parameters are updated using the GD algorithm (section 4.6). The primary motivation for using LSE instead of GD for consequent parameter identification is to improve convergence, at which it greatly succeeds.
The possibilities to combine different approaches are thus numerous. It is important, however, to emphasize that the actual selection of algorithms depends primarily on the particular application and the technical limitations of the given task. Moreover, the performance of the algorithms is also application-dependent, and no general guarantee can be given that linking modeling algorithms will result in an improved model.
4.2 Fuzzy systems as universal approximators
A fuzzy system can be regarded as a (multidimensional) input-output mapping y = f(x). Several authors have proved that, given enough rules, such a system can approximate any real continuous function on a compact domain X to any given accuracy:
$$\forall x \in X: \; |F(x) - f(x)| < \varepsilon \qquad (4.1)$$
where F(x) is the function to be approximated.
Wang (1992) proved this for standard fuzzy systems with Gaussian MFs, product implication and conjunction, and CoG defuzzification, Kosko (1992b) for "additive" fuzzy systems, and Castro (1995) gave a proof that is valid for almost any type of fuzzy system.
While these results are important, they only suggest why fuzzy systems are
successful in modeling, without providing any information about how to obtain
accurate models.
4.3 Selection of input-output data
Fuzzy identification can be regarded as a fixed learning problem, where the learning goal is to adjust the parameters of the trained system so that the difference between the system output and the desired output pattern (i.e. the approximation error, most typically estimated by the root mean square error) is as small as possible for each input pattern. Fixed learning problems are solved using supervised learning algorithms. While the method for adjusting the parameters of a fuzzy system is critical to the overall success of the identification, selection of input-output data can be considered just as important.
Training data is usually available as a set of K input-output measurements
$$z(k) = [x_1(k), \ldots, x_i(k), \ldots, x_N(k), y_1(k), y_2(k), \ldots, y_j(k), \ldots, y_M(k)], \quad k = 1, \ldots, K \qquad (4.2)$$
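The approximation error mentioned above is most often measured as the root mean square error (RMSE) over the K data pairs of (4.2). The following is a minimal Python/NumPy sketch, assuming the data set is stored as a K × (N + M) array with the input columns first; the function name `rmse` and the linear placeholder relationship are illustrative only and are not the system studied in this thesis.

```python
import numpy as np

def rmse(y_model, y_ref):
    """Root mean square error between model outputs and reference outputs."""
    y_model = np.asarray(y_model, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean((y_model - y_ref) ** 2)))

# Hypothetical SISO data set with K = 101 readings z(k) = [x(k), y(k)];
# the reference relationship is a placeholder chosen for illustration.
x = np.linspace(0.0, 10.0, 101)
y = 0.3 * x - 2.0
Z = np.column_stack([x, y])   # row k is z(k): N = 1 input column, M = 1 output column
```

For example, `rmse(Z[:, 1], Z[:, 1] + 0.1)` evaluates to approximately 0.1, since every model output is off by the same amount.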
The input-output data selection problem may be formulated as follows: how to obtain data, and what kind of data should be used. It is impossible to obtain a good approximation if appropriate information is not present in the training data set. Basically, we would like the training data set to contain as much information as possible about the learning problem. In practice, however, the number of data pairs is relatively small, because the amount of data that can be collected is limited or because using too much data slows down the learning process.
In practice we generally try to spread the data equally over the input space to achieve coverage of the whole input space, and expect a good approximation if the space between the data points is not too large. Often, however, this is simply impossible because we cannot pick the data pairs directly into the training data set. This is the case when the training inputs are actually past output or state values of the modeled system. Lack of data inevitably leads to incomplete (sparse) rule bases.
Using a noise signal as the system input is another option, based on the assumption that in doing so we will be able to excite many frequencies of the system and thus obtain a complete description of its dynamics in the input-output data set.
In real-life experiments, data often contains disturbances, false measurements, etc. Thus filtering and validation of measured data is a necessary step.
With industrial systems, data describing the behavior of the system is often recorded on a regular basis. This data essentially provides information about the system behavior around the working point, which may well be exactly the information we are interested in. Experimenting with large systems is often not possible anyway, because of financial considerations and technical limitations.
On the whole, the choice criterion for input-output data is application-dependent, and generally we are able to make a choice that makes sense for the particular application.
4.4 Rule-based approaches
Typically, fuzzy identification deals with MF parameter adjustment only and assumes that the rule base already exists (is provided by an expert). If some rule modification takes place, it happens only indirectly: it occurs when output MFs are allowed to obtain any value in the range of the output variable and to swap positions with each other, whereby they also obtain a new (linguistic) meaning. The number of rules during training is, however, constant and often equal to Rmax (2.16).
In this section we give an overview of three algorithms designed to accomplish the opposite task: to determine the rule base on the basis of a given input-output partition and data describing the relationship between the variables. The first two algorithms utilize rule weights in rule base tuning, whereas the Wang-Mendel method (section 4.4.4) is a rule-weight-free approach.
4.4.1 Fuzzy template modeling
We already mentioned that two kinds of information, a training data set and input-output MF definitions (fuzzy templates) provided by an expert, are needed in order to determine the rule base. Each combination of input and output fuzzy subsets represents an elemental rule:
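$$\text{IF } U_1 \text{ is } A_{1r} \text{ AND } \ldots \text{ AND } U_N \text{ is } A_{Nr} \text{ THEN } V_1 \text{ is } B_{1r} \text{ AND } \ldots \text{ AND } V_M \text{ is } B_{Mr}, \quad r = 1, \ldots, R_l \qquad (4.3)$$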
Here we note that the number of elemental rules Rl is greater than Rmax in (2.16), because for a given input label combination each unique combination of output labels qualifies as an elemental rule; thus
$$R_l = \left(\prod_{i=1}^{N} S_i\right)\left(\prod_{j=1}^{M} T_j\right) \qquad (4.4)$$
In the next step, each elemental rule is assigned a weight proportional to the number of input-output samples that match the respective region in the input-output space. The approach is quite intuitive and can be illustrated by a SISO modeling example.
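The elemental rule set defined by (4.3) and (4.4) can be enumerated directly. A minimal Python sketch, in which S and T are lists holding the numbers of MFs of each input and output variable (the function names are hypothetical), and each rule is a tuple of label indices:

```python
from itertools import product
from math import prod

def n_elemental_rules(S, T):
    """R_l of (4.4): S[i] = number of MFs of input i, T[j] = number of MFs of output j."""
    return prod(S) * prod(T)

def elemental_rules(S, T):
    """All elemental rules (4.3) as label-index tuples (input labels..., output labels...)."""
    return list(product(*[range(n) for n in list(S) + list(T)]))

# SISO example: 6 input MFs and 5 output MFs (sizes chosen for illustration)
rules = elemental_rules([6], [5])
print(len(rules), n_elemental_rules([6], [5]))   # 30 30
print(rules[:3])                                 # [(0, 0), (0, 1), (0, 2)]
```

With a single output (M = 1), every group of T1 consecutive tuples shares the same antecedent, which is the "rule package" over which the weights below are normalized.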
For each region R(r) in Fig. 4.1 that corresponds to the rth elemental rule, we find the count of the samples that fall into that region, denoted by nr. We must also specify what to do with samples that belong simultaneously to several rules (i.e. fall into the gray regions in Fig. 4.1). One possible solution is to draw the separation line where the neighboring fuzzy sets intersect. Mathematically, the count of the samples that belong to the rth rule is given by
$$n_r = \operatorname{count}\big((x_i(k), y_j(k))\big), \quad k = 1, \ldots, K, \qquad (4.5)$$
provided that
$$\prod_{i=1}^{N} \mu_{ir}(x_i(k)) \cdot \prod_{j=1}^{M} \gamma_{jr}(y_j(k)) > 0, \quad \mu_{ir}(x_i(k)) \ge \alpha \;\; \forall i = 1, \ldots, N, \quad \gamma_{jr}(y_j(k)) \ge \alpha \;\; \forall j = 1, \ldots, M, \qquad (4.6)$$
where α is the intersection height.
Fig. 4.1. Illustration of the algorithm: regions R(r) of the x-y plane formed by the input MFs µ11, µ12, µ13 and the output MFs γ11, γ12, γ13.
The final weight of a rule is determined by dividing nr by the number of samples that activate the rules having the same combination of input MFs in the premise part of the rule:
$$w_r = \frac{n_r}{N_r}, \qquad (4.7)$$
where
$$N_r = \operatorname{count}\big((x_i(k), y_j(k))\big), \quad k = 1, \ldots, K, \qquad (4.8)$$
provided that
$$\prod_{i=1}^{N} \mu_{ir}(x_i(k)) > 0, \quad \mu_{ir}(x_i(k)) \ge \alpha \;\; \forall i = 1, \ldots, N. \qquad (4.9)$$
The algorithm expressed by (4.6-4.9) differs slightly from Kosko's "product space clustering" (Kosko 1992a), according to which the weight of the rth rule is obtained from
$$w_r = \frac{n_r}{\sum_{r=1}^{R_l} n_r} \qquad (4.10)$$
The difference between these approaches is that with (4.5-4.9) (the modified Kosko approach) the weights are normalized so that the sum of weights assigned to rules with the same antecedent part equals one.
The method is applicable to standard fuzzy systems regardless of MF type, inference parameters, etc. All we need are membership functions with considerable support; thus, to determine the rule weights for a 0th order TS system, one must specify output MFs that are fuzzy. Output singletons can then be obtained by computing the centroids of the output fuzzy sets after the rule weights are assigned. To reduce the number of rules, it is also useful to establish a threshold value for rule weights (e.g. 0.1), because rules with small weights contribute little to the system output.
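The counting procedure (4.5-4.9), together with Kosko's weights (4.10) and the 0.1 threshold mentioned above, can be sketched in Python/NumPy as follows. This is an illustration under stated assumptions, not the author's implementation: memberships are passed as precomputed K × S_i (inputs) and K × T_j (outputs) matrices; `uniform_partition` is a hypothetical helper producing triangular MFs with 50% overlap, for which the intersection height is α = 0.5; and the three data points are invented for the example.

```python
import numpy as np
from itertools import product

def uniform_partition(v, lo, hi, n):
    """Hypothetical helper: K x n memberships of values v in n uniformly spaced
    triangular MFs with 50% overlap (edge MFs are triangles centred on lo and hi)."""
    v = np.asarray(v, dtype=float)[:, None]
    peaks = np.linspace(lo, hi, n)
    return np.clip(1.0 - np.abs(v - peaks) / (peaks[1] - peaks[0]), 0.0, 1.0)

def _matches(mus, labels, alpha):
    """Boolean mask of readings satisfying (4.6) or (4.9) for the given MF labels."""
    memb = np.stack([m[:, a] for m, a in zip(mus, labels)], axis=1)
    return (memb.prod(axis=1) > 0) & (memb >= alpha).all(axis=1)

def template_weights(in_mu, out_mu, alpha=0.5):
    """Rule weights by sample counting; returns {rule: (w_modified (4.7), w_kosko (4.10))}."""
    mus = list(in_mu) + list(out_mu)
    n_in = len(in_mu)
    rules = list(product(*[range(m.shape[1]) for m in mus]))
    n = {r: int(_matches(mus, r, alpha).sum()) for r in rules}       # n_r, (4.5)-(4.6)
    total = sum(n.values())
    weights = {}
    for r in rules:
        N_r = int(_matches(list(in_mu), r[:n_in], alpha).sum())     # N_r, (4.8)-(4.9)
        weights[r] = (n[r] / N_r if N_r else 0.0, n[r] / total if total else 0.0)
    return weights

x = np.array([0.2, 0.4, 3.9]); y = np.array([0.1, 1.8, 0.1])
W = template_weights([uniform_partition(x, 0, 4, 3)], [uniform_partition(y, 0, 2, 2)])
kept = {r: w for r, (w, _) in W.items() if w >= 0.1}   # discard weak rules (threshold 0.1)
print(kept)   # {(0, 0): 0.5, (0, 1): 0.5, (2, 0): 1.0}
```

Note that the modified weights of the two rules sharing input label 0 sum to one, while Kosko's weights (the second tuple entry) are all 1/3 here, because they are normalized over the whole rule set.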
4.4.2 Yager-Filev fuzzy template modeling algorithm
A somewhat more general approach is the algorithm proposed by Yager and Filev (1994) and extended to MIMO models in (Riid and Rüstern 1998).
A given input-output reading (x1(k), x2(k), …, xN(k), y1(k), y2(k), …, yM(k)), where x1(k), x2(k), …, xN(k) are the values of the inputs U1, U2, …, UN (k = 1…K) and y1(k), y2(k), …, yM(k) are the values of the outputs V1, V2, …, VM at the same moment, matches the rth elemental rule (4.3) with a degree of matching defined by
$$\delta_r(k) = \prod_{i=1}^{N} \mu_{ir}(x_i(k)) \prod_{j=1}^{M} \gamma_{jr}(y_j(k)), \quad r = 1, \ldots, R_l \qquad (4.11)$$
One input-output reading can have a nonzero degree of matching to more than one elemental rule. This is taken into account by a normalized degree of matching υr(k), obtained by normalizing the fuzzy degree of matching (4.11) with respect to the total degree of matching of the kth data reading with all the elemental rules:
$$\upsilon_r(k) = \frac{\delta_r(k)}{\sum_{r=1}^{R_l} \delta_r(k)} \qquad (4.12)$$
The next step is to compute the degree of matching of an elemental rule with respect to the whole input-output data set:
$$\upsilon_r = \sum_{k=1}^{K} \upsilon_r(k) \qquad (4.13)$$
The final weights are obtained from
$$w_r = \frac{\upsilon_r}{\sum_{r'=1}^{R'} \upsilon_{r'}}, \qquad (4.14)$$
where $\sum_{r'=1}^{R'} \upsilon_{r'}$ denotes the total normalized degree of matching of the rule package consisting of $R' = \prod_{j=1}^{M} T_j$ rules with the same combination of MFs in the rule premise. Thus (4.14) ensures that the sum of weights assigned to rules with the same antecedent part equals 1.
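Steps (4.11)-(4.14) translate into a few array operations. The Python/NumPy sketch below makes the same assumptions as any illustration of this kind: memberships are given as K × S_i and K × T_j matrices, `uniform_partition` is a hypothetical helper for 50%-overlap triangular MFs, and the three-point data set is invented.

```python
import numpy as np
from itertools import product

def uniform_partition(v, lo, hi, n):
    """Hypothetical helper: K x n memberships in uniformly spaced triangular MFs (50% overlap)."""
    v = np.asarray(v, dtype=float)[:, None]
    peaks = np.linspace(lo, hi, n)
    return np.clip(1.0 - np.abs(v - peaks) / (peaks[1] - peaks[0]), 0.0, 1.0)

def yager_filev_weights(in_mu, out_mu):
    """Rule weights according to (4.11)-(4.14); returns {rule label tuple: w_r}."""
    mus = list(in_mu) + list(out_mu)
    n_in = len(in_mu)
    rules = list(product(*[range(m.shape[1]) for m in mus]))
    # delta[k, r]: degree of matching of reading k with elemental rule r, (4.11)
    delta = np.stack([np.prod([m[:, a] for m, a in zip(mus, r)], axis=0) for r in rules], axis=1)
    total = delta.sum(axis=1, keepdims=True)
    ups_k = np.divide(delta, total, out=np.zeros_like(delta), where=total > 0)   # (4.12)
    ups = ups_k.sum(axis=0)                                                       # (4.13)
    package = {}   # total normalized matching of each antecedent (rule package)
    for r, u in zip(rules, ups):
        package[r[:n_in]] = package.get(r[:n_in], 0.0) + u
    return {r: (u / package[r[:n_in]] if package[r[:n_in]] > 0 else 0.0)
            for r, u in zip(rules, ups)}                                          # (4.14)

x = np.array([0.2, 0.4, 3.9]); y = np.array([0.1, 1.8, 0.1])
W = yager_filev_weights([uniform_partition(x, 0, 4, 3)], [uniform_partition(y, 0, 2, 2)])
print({r: round(w, 3) for r, w in W.items()})
```

Unlike the counting approach, every reading contributes partially to every rule it touches, so for these three readings rules (0, 0) and (0, 1) receive weights 0.55 and 0.45 instead of 0.5 and 0.5, and input label 1 obtains nonzero weights although no sample lies above its intersection height.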
4.4.3 Rule weights in modeling
In section 2.6 we argued that rule weights should be avoided for several reasons, including complications in interpretation and the combinatorial explosion of the number of rules. In this section we try to find out whether the approximation properties of the rule weight algorithms described above provide arguments in favor of rule weights.
Fig. 4.2. Uniform partition of input (left) and output (right) MFs.
We use the SISO system described in section 3.2 (the one with 50% overlap) to generate the training data set. The basic argument in favor of a SISO system lies in its simplicity: the modeling results can be effectively presented, and the discovered correlations between the algorithm design parameters and the modeling results can be directly generalized to MISO modeling problems, although, technically, modeling in MISO space requires much more effort from the practitioner in both the algorithm configuration and tuning phases.
For the application of the algorithm, an input-output partition must be provided. We use two different partitions. The usual choice for cases where no expert opinion is available is a uniform partition (Fig. 4.2). Additionally, we use the partition that represents the best possible estimate, i.e. the original partition of the system that is to be identified (Fig. 3.4). We refrain from using trapezoidal MFs because that would result in a partially stepwise output, which is not desirable for the given approximated function. We prefer relatively few rules with weight values near unity to many rules with lower values. This preference is the basis for the 50% overlap of MFs. A smaller overlap would, again, result in a stepwise output of the final model.
Table 4.1. Rule weights of the model with the original partition.
Input/output mf1 mf2 mf3 mf4 mf5
mf1 0.612 0.365
mf2 0.248 0.675 0.149
mf3 0.740 0.213 0.309
mf4 0.120 0.326 0.511
mf5 0.150 0.335 0.514
mf6 0.650 0.304
Fig. 4.3. Modeling with the Yager-Filev method (output y versus input x ∈ [0, 10]): model output with 6 uniformly distributed MFs (normal line), model output with 9 uniformly distributed MFs (dotted line), model output with the original partition (bold line); the original relationship is given by the dashed line.
Table 4.2. Rule weights of the model with uniformly distributed MFs.
Input/output mf1 mf2 mf3 mf4 mf5 mf6
mf1 0.164 0.659 0.177
mf2 0.136 0.510 0.284
mf3 0.460 0.270
mf4 0.513 0.467
mf5 0.397 0.578
mf6 0.802 0.157
The models' responses to input in the range [0, 10] are compared to the original system in Fig. 4.3. The approximation root mean square error (RMSE), which is quite large in both cases (0.557 with the original partition and 0.777 with uniformly distributed MFs), can be reduced by using additional MFs (RMSE reduces to 0.468 with 9 input and output MFs and 19 rules with weights greater than 0.1). The effect, however, is not exactly the desired one, namely a numerically accurate model with a moderate number of rules, and it is particularly disappointing that the accuracy of the partition has very little effect. As we will see later, there exist algorithms that are able to produce models with a lower level of complexity and a smaller approximation error; thus the question arises: what would be the ideal result in this kind of modeling, or is there one at all?
Fig. 4.4. Model partition that leads to unity rule weights.
The basic shortcoming of the method(s) is that, although the distribution of data among the fuzzy sets is stored quite faithfully on the linguistic layer, which allows linguistic analysis, the translation mechanism between the linguistic and the inference layer causes substantial information loss, and the numerical approximation is of low quality (the effect of flattening can be recognized).
Fig. 4.5. Modeling results for unity partition: Kosko's model (bold line), Yager-Filev model (normal line), the original relationship (dashed line).
It would, however, be interesting to know what kind of partition we need in order to obtain unity weights. Based on the reasoning behind the method from section 4.4.1, we are able to arrange such a partition for the given SISO model (Fig. 4.4).
Through this elaborate procedure we are able to reduce the RMSE for both methods (modified Kosko's method RMSE = 0.3169, Yager-Filev method RMSE = 0.4597), but not very significantly. Moreover, designing such a partition is painful even for a SISO system and simply not practical for multidimensional systems.
Consequently, the most attractive property of rule weight approaches is their ability to accumulate the information contained in the data on the linguistic layer with very small information loss. Although rule weights make interpretation difficult and a good deal of constructive judgment is required in interpretation, successful linguistic analysis of such models is possible, as demonstrated in chapter 6 of this thesis.
4.4.4 Wang-Mendel rule extraction
The Wang-Mendel algorithm (Wang and Mendel 1992a) is actually quite simple. We again start by assigning matching degrees (4.11) to each elemental rule (4.3). The resulting rule set contains conflicting rules (an inconsistent rule base). To resolve the conflict, we do not assign rule weights but choose the elemental rule that has the maximum matching degree and delete all other conflicting rules (destructive approach).
Alternatively, we could start from scratch (constructive approach), adding rules one by one whenever the matching degree is nonzero and favoring those with a greater matching degree for the given label combination.
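A compact sketch of the destructive variant follows, in Python/NumPy. It uses the formulation of the original Wang-Mendel paper, in which every reading proposes only the rule built from its maximum-membership labels, rated by the matching degree (4.11). This is a somewhat narrower reading than scoring every elemental rule against every reading, and it is stated here as an assumption. The helper `uniform_partition` and the data are hypothetical.

```python
import numpy as np

def uniform_partition(v, lo, hi, n):
    """Hypothetical helper: K x n memberships in uniformly spaced triangular MFs (50% overlap)."""
    v = np.asarray(v, dtype=float)[:, None]
    peaks = np.linspace(lo, hi, n)
    return np.clip(1.0 - np.abs(v - peaks) / (peaks[1] - peaks[0]), 0.0, 1.0)

def wang_mendel(in_mu, out_mu):
    """For each reading, take the best-matching label of every variable, rate the resulting
    elemental rule by its matching degree (4.11), and keep only the strongest rule per
    antecedent (destructive conflict resolution). Returns {antecedent: (consequent, degree)}."""
    rules = {}
    for k in range(in_mu[0].shape[0]):
        ant = tuple(int(np.argmax(m[k])) for m in in_mu)
        con = tuple(int(np.argmax(g[k])) for g in out_mu)
        degree = float(np.prod([m[k, a] for m, a in zip(in_mu, ant)]) *
                       np.prod([g[k, c] for g, c in zip(out_mu, con)]))
        if ant not in rules or degree > rules[ant][1]:
            rules[ant] = (con, degree)
    return rules

x = np.array([0.2, 0.4, 3.9]); y = np.array([0.1, 1.8, 0.1])
rb = wang_mendel([uniform_partition(x, 0, 4, 3)], [uniform_partition(y, 0, 2, 2)])
print(sorted(rb))   # [(0,), (2,)]: no reading supports input label 1
```

The two readings with input label 0 propose different consequents, and the single stronger one wins outright. This is exactly the mechanism by which a sparsely sampled region can end up with a wrong consequent.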
Fig. 4.6. Possible error in rule extraction due to too large a sampling interval.
The algorithm is quite universal and is able to extract the optimal rule base for the given partition in the case of transparent fuzzy systems. Too large a sampling interval may, however, be dangerous. The highlighted sample in Fig. 4.6 is the only one that matches input MF "mf3", and due to data scarcity it selects output MF "mf2" instead of "mf4" (the original output MF in the given rule, and thus the optimal one). With this we simply lose the information associated with "mf4".
For non-transparent systems, the algorithm is generally unable to find the optimal rule base, to an extent depending on the degree of non-transparency. This is because of the deviation between the transparency checkpoints and the inferred numerical relationship.
On the one hand, Wang-Mendel really seems to be a big step forward from rule weight approaches, because it provides clear rule bases without dubious rule weights. It must be taken into account, however, that the approximation error obtained with the Wang-Mendel rule extraction algorithm depends basically on how accurate our estimate of the input-output partition is. With the optimal partition, zero error is achieved. Generally, however, it is difficult to provide the optimal partition, and with a suboptimal one the error may become quite large. The approximation error is also much more dependent on the system partition than with rule weight approaches. The result can be somewhat improved by increasing the number of rules, but this strategy has its own disadvantages.
4.5 Least squares method
The least squares method identifies the consequent parameters from given data, on the assumption that the input partition is given (Takagi and Sugeno 1985).
Denoting the normalized activation degree of the rth rule for the kth input pattern by
$$\phi_r(k) = \frac{\tau_r(k)}{\sum_{r=1}^{R} \tau_r(k)}, \qquad (4.15)$$
and combining these into the matrix Φ
$$\Phi = \begin{bmatrix} \phi_1(1) & \cdots & \phi_r(1) & \cdots & \phi_R(1) \\ \vdots & & \vdots & & \vdots \\ \phi_1(k) & \cdots & \phi_r(k) & \cdots & \phi_R(k) \\ \vdots & & \vdots & & \vdots \\ \phi_1(K) & \cdots & \phi_r(K) & \cdots & \phi_R(K) \end{bmatrix}, \qquad (4.16)$$
we find that the output of a 0th order TS system (2.39), computed over all input patterns (k = 1…K), can be written as
$$\mathbf{y} = \Phi\theta, \qquad (4.17)$$
where
$$\theta = [p_{01}, p_{02}, \ldots, p_{0R}]^T \qquad (4.18)$$
and
$$\mathbf{y} = [y(1), \ldots, y(k), \ldots, y(K)]^T. \qquad (4.19)$$
For the given y, the vector of output reference values, the output parameters can be estimated by
$$\theta = [\Phi^T\Phi]^{-1}\Phi^T\mathbf{y}. \qquad (4.20)$$
The problem with (4.20) is the possible singularity of ΦTΦ, in which case the inverse matrix cannot be found. Singularity occurs in situations where some potential rules are not covered by training data (frequent with multidimensional systems). One possible solution is to exclude all zero columns (corresponding to rules fired by no sample) from (4.16). Removal of a zero column is equivalent to deleting an irrelevant rule of the model. This is, however, frequently not sufficient, because with a high-dimensional ΦTΦ, det(ΦTΦ) becomes smaller than the computing precision and is therefore considered zero. To avoid this, we have to select a positive constant δ and use
$$\sum_{k=1}^{K} \phi_r(k) > \delta \qquad (4.21)$$
as the rule validation criterion.
(4.21) is not the optimal solution to the singularity problem, because different values of δ result in different numbers of rules, and usually we do not know beforehand which value of δ is optimal.
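The least squares estimate with the rule validation criterion (4.21) can be sketched as follows. This is a Python/NumPy illustration, not the author's code. Instead of inverting ΦTΦ explicitly as in (4.20), it calls `numpy.linalg.lstsq`, which solves the same least squares problem in a numerically more robust way. The value δ = 1e-6, the six-MF partition and the singleton values are all assumptions made for the example, and removed rules are reported as NaN.

```python
import numpy as np

def normalized_activations(tau):
    """phi_r(k) = tau_r(k) / sum_r tau_r(k), eq. (4.15); tau is a K x R array."""
    tau = np.asarray(tau, dtype=float)
    s = tau.sum(axis=1, keepdims=True)
    return np.divide(tau, s, out=np.zeros_like(tau), where=s > 0)

def lse_0th_order(tau, y, delta=1e-6):
    """Singleton consequents p_0r of a 0th order TS system from data.
    Rules failing the validation criterion (4.21) are dropped and reported as NaN."""
    Phi = normalized_activations(tau)                    # matrix (4.16)
    keep = Phi.sum(axis=0) > delta                       # criterion (4.21)
    theta = np.full(Phi.shape[1], np.nan)
    # lstsq minimises ||Phi theta - y||^2, the problem solved by (4.20),
    # without forming the inverse of Phi^T Phi explicitly
    theta[keep] = np.linalg.lstsq(Phi[:, keep], np.asarray(y, dtype=float), rcond=None)[0]
    return theta

# Hypothetical SISO set-up: 6 triangular input MFs (50% overlap) on [0, 10], x step 0.1
x = np.linspace(0.0, 10.0, 101)
tau = np.clip(1.0 - np.abs(x[:, None] - np.linspace(0.0, 10.0, 6)) / 2.0, 0.0, 1.0)
p_true = np.array([1.0, -3.0, -1.0, 2.0, 0.5, 1.0])     # illustrative singletons
y = normalized_activations(tau) @ p_true

keep_rows = tau[:, 4] == 0                               # drop all data with tau_5 > 0
print(lse_0th_order(tau[keep_rows], y[keep_rows]))       # 5th entry is nan (rule removed)
```

On the full data set, the illustrative singletons are recovered exactly. After the data firing rule 5 is removed, the remaining five consequents are still recovered, and rule 5 simply disappears from the model.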
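A more general way to overcome the problem is to use a recursive Kalman filter that finds the mean square estimate of the solution to (4.20) sequentially:
$$\theta(l+1) = \theta(l) + P(l+1)\Phi^T(l+1)\big(y(l+1) - \Phi(l+1)\theta(l)\big), \quad l = 0, \ldots, L-1 \qquad (4.22)$$
where
$$P(l+1) = P(l) - \frac{P(l)\Phi^T(l+1)\Phi(l+1)P(l)}{1 + \Phi(l+1)P(l)\Phi^T(l+1)}, \qquad (4.23)$$
Φ(l+1) denotes the (l+1)th row of Φ, and y(l+1) is the corresponding reference output. The recursion is usually started from θ(0) = 0 and P(0) = αI with a large positive α.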
The Kalman filter is characterized by fast convergence, which is due to the
adaptive learning rate determined by the R × R matrix P.
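The recursion (4.22)-(4.23) takes only a few lines. The Python/NumPy sketch below assumes the usual initialization θ(0) = 0 and P(0) = αI; α = 1e4 is an arbitrary choice. It reuses the same hypothetical six-MF data set as an illustration, and the function name `rls` is illustrative.

```python
import numpy as np

def rls(Phi, y, alpha=1e4):
    """Recursive least squares ("Kalman filter" form) for y = Phi theta, eqs. (4.22)-(4.23).
    Assumes theta(0) = 0 and P(0) = alpha * I with a large alpha."""
    Phi = np.asarray(Phi, dtype=float)
    K, R = Phi.shape
    theta = np.zeros(R)
    P = alpha * np.eye(R)                        # R x R matrix acting as adaptive learning rate
    for l in range(K):
        a = Phi[l]                               # row of Phi for data pair l
        Pa = P @ a
        P = P - np.outer(Pa, Pa) / (1.0 + a @ Pa)            # (4.23); P is symmetric
        theta = theta + (P @ a) * (y[l] - a @ theta)         # (4.22)
    return theta

x = np.linspace(0.0, 10.0, 101)
tau = np.clip(1.0 - np.abs(x[:, None] - np.linspace(0.0, 10.0, 6)) / 2.0, 0.0, 1.0)
Phi = tau / tau.sum(axis=1, keepdims=True)
p_true = np.array([1.0, -3.0, -1.0, 2.0, 0.5, 1.0])     # illustrative singletons
y = Phi @ p_true
keep_rows = tau[:, 4] == 0                               # no data fires rule 5
print(rls(Phi[keep_rows], y[keep_rows]))                 # 5th entry stays 0.0
```

Because the column of a rule that is never fired is identically zero, the corresponding entries of P and θ are never updated, and that consequent keeps its initial value of zero. This is the behavior described for the recursive algorithm in the example that follows. With a finite α the estimate is very slightly biased towards θ(0), which is negligible for large α.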
Application of (4.20) or, alternatively, (4.22-4.23) effectively minimizes the root-mean-square error and results in extraction of optimal consequent parameters for the given input partition and the given data set. The properties of the partition and of the data set (e.g. data scarcity, presence of noise) have, however, a serious influence on the modeling error and, moreover, on the validity of the model, as the following examples demonstrate. We are mostly concerned with the latter issue (i.e. validity of the model).
First we examine the generalization properties of the model obtained by least squares estimation (i.e. how accurately the model reacts when presented with new data that was not included in the training data set).
Let us compare the two approaches (removal of rules not supported by data, and the use of the Kalman filter) by using the 0th order TS system to generate output values for x ∈ [0, 10] discretized with step 0.1. Our next step is to remove from the training data set all data for which τ5 > 0, i.e. {x ∈ X | min(supp(mf5)) < x < max(supp(mf5))}. With this, the given potential rule is no longer supported by data.
With data missing from the training data set, the algorithm is not able to discover the true input-output relationship in the poorly defined region. Each approach, however, has a different solution to this problem. The first one removes the fifth rule and identifies five singletons for the five remaining rules. The recursive algorithm preserves all rules but assigns a zero value to the rule for which there was no data. The identified consequent constants are listed in Table 4.3.
Table 4.3. Parameters of the identified models.

Rule no. p0r of model 1 p0r of model 2
1 0.333 0.333
2 -3.5 -3.5
3 -1.5 -1.5
4 2 2
5 - 0
6 0.333 0.333
Fig. 4.7. Generalization properties of the scarce model obtained with LSE (left). The output of the Kalman filter is depicted with circles, the system with a rule removed with crosses. The figure at right depicts the situation where there is a considerable gap in the identification data (between the 3rd and 4th rule) and the identified output singleton introduces a large approximation error (bold line) for unseen data (dashed line).
When presented with data that was not included in the training data set, model 1 produces stepwise interpolation between the neighboring rules in the underdetermined region. An extra problem arises if x equals the value where τ5 = 1 (the transparency checkpoint of the missing rule), because then no rule is activated. Software packages tend to produce the average of the range of the output variable in such a case. Model 2, on the other hand, uses the zero value that was assigned to the 5th rule (Fig. 4.7, left). Neither solution can be considered very good, but model 1 is more predictable in the sense that the output value (apart from the transparency checkpoint) is defined by neighboring rules. In the second case there may be a large difference between the zero value and the actual output range, and the system output would then be strongly biased.
It is important to note that with systems where the overlap of input MFs is greater, the described problem is less acute. A larger overlap simply ensures that the number of rules not supported by data is low. From the approximation point of view, non-transparent fuzzy systems therefore have a clear advantage over transparent ones in this respect.
The described situation is not the only shortcoming of the least squares method when using scarce data. Potentially even more dangerous is the situation where a relatively small portion of data is missing and the input partition is near-optimal. We observe the case where the peak of the 4th input MF is shifted to the right by 4% of the input range and 26 data points between the 4th and 5th rule are removed. Training results in a model with zero training error, but the output consequent for the underdetermined region obtains a value well outside the working range of the system, and when the model is presented with unseen data, a substantial deviation from the original system is observed (Fig. 4.7, right).
Presence of noise in the data set does not significantly alter the result if it is
reasonably distributed because the method minimizes the mean error but even a
single point that strongly deviates from the general pattern (e.g. false
measurement) has dramatic influence on the modeling result (Fig. 4.8).
[Fig. 4.8 plot: y versus x (0 to 10); the false measurement is marked.]
Fig. 4.8. The effect of noise and false measurements on LSE. Original relationship (normal line), approximated relationship (bold line).
Finally, we need to derive the algorithm for 1st order TS systems. If the output parameter vector of a 1st order TS system is arranged as
$$\theta = [p_{01}, p_{02}, \ldots, p_{0R}, p_{11}, p_{12}, \ldots, p_{1R}, \ldots, p_{N1}, \ldots, p_{NR}]^T \qquad (4.24)$$
and the regression matrix as
$$\Phi = \begin{bmatrix}
\phi_1(1) & \phi_2(1) & \cdots & \phi_R(1) & \phi_1(1)x_1(1) & \cdots & \phi_r(1)x_i(1) & \cdots & \phi_R(1)x_N(1) \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
\phi_1(k) & \phi_2(k) & \cdots & \phi_R(k) & \phi_1(k)x_1(k) & \cdots & \phi_r(k)x_i(k) & \cdots & \phi_R(k)x_N(k) \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
\phi_1(K) & \phi_2(K) & \cdots & \phi_R(K) & \phi_1(K)x_1(K) & \cdots & \phi_r(K)x_i(K) & \cdots & \phi_R(K)x_N(K)
\end{bmatrix} \qquad (4.25)$$
then θ in (4.24) is obtained by applying (4.20).
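As a minimal sketch of (4.24)-(4.25), assuming that φr(k) denotes the normalized activation τr(k)/Στr(k) of the rth rule and that (4.20) is the usual least squares estimate, the Python code below builds Φ block by block in the order of θ and solves the problem with numpy.linalg.lstsq instead of forming (ΦᵀΦ)⁻¹Φᵀ explicitly. The helper names (tri_partition, build_regressor), the single-input triangular partition and the linear test function are illustrative assumptions.

```python
import numpy as np


def tri_partition(x, cores):
    """Normalized activations of a 1-D triangular partition (rows sum to 1).

    Hypothetical helper: MF r peaks at cores[r] and reaches zero at the
    neighbouring cores, so tau_r(k) / sum(tau) equals tau_r(k) here.
    """
    x = np.asarray(x, dtype=float)
    R = len(cores)
    return np.column_stack([np.interp(x, cores, np.eye(R)[r]) for r in range(R)])


def build_regressor(X, phi):
    """Regression matrix (4.25): [phi, phi*x_1, ..., phi*x_N], matching theta (4.24).

    X has shape (K, N); phi has shape (K, R).
    """
    X = np.asarray(X, dtype=float)
    blocks = [phi] + [phi * X[:, [i]] for i in range(X.shape[1])]
    return np.hstack(blocks)


# Noise-free data from a linear function, which a 1st order TS model can reproduce
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0
phi = tri_partition(x, cores=[0.0, 2.5, 5.0, 7.5, 10.0])
Phi = build_regressor(x[:, None], phi)            # shape (K, R * (N + 1)) = (50, 10)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least squares estimate of theta
y_model = Phi @ theta
```

The first R entries of theta are the offsets p0r and the next R entries the slopes p1r; for N inputs the slope blocks follow one another in the same order as in (4.24).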
4.6 Gradient descent
The gradient descent (GD) parameter adjustment method is based on the minimization of the error (cost) function
$$\varepsilon = \frac{1}{2}\left[y - \tilde{y}\right]^2, \qquad (4.26)$$
where y denotes the output of the model and ỹ is the reference output. The history of the method goes back to 1960, when Widrow and Hoff introduced the adaline rule and applied it to the McCulloch-Pitts neuron, which brought learning to neural networks (Widrow and Hoff 1960). It was later shown (Minsky and Papert 1969) that McCulloch-Pitts neurons and the adaline rule can solve only a limited class of learning problems, namely linearly separable ones, and the interest faded. The discovery of the back-propagation technique for multilayer perceptrons (Werbos 1974), later popularized in (Rumelhart et al. 1986), renewed the interest.
4.6.1 Gradient descent learning rules for fuzzy systems
The update rule that minimizes the error ε with respect to a given system parameter ξ is obtained through differential calculus, provided that the error function (4.26) is differentiable:
$$\Delta\xi = -\eta\,\frac{\partial \varepsilon}{\partial \xi}, \qquad (4.27)$$
where ξ is the updated parameter and η is the learning rate.
The key idea in training fuzzy systems with back-propagation is to regard a fuzzy system as a feedforward network and then to use the chain rule to determine the gradients of the output error of the fuzzy system with respect to its parameters. Among the first to apply back-propagation to fuzzy systems were Wang and Mendel (0th order TS system with Gaussian input MFs) (Wang and Mendel 1992b) and Nomura (triangular MFs) (Nomura et al. 1992); in (Guely and Siarry 1993), several other MF types are considered.
The requirement of differentiability suggests that GD can be applied to 0th and 1st order TS systems with product-product-sum inference (the applicability extends to (2.30), but this is not considered in the present section). The derivation of the learning rules is given in Appendix C.
For the kth input-output pattern, the value of the error function
$$\varepsilon(k) = \frac{1}{2}\left[y(k) - \tilde{y}(k)\right]^2 \qquad (4.28)$$
is computed, i.e. half the squared difference between the kth reference value ỹ(k) and the model response y(k) to the given input pattern, obtained from the inference function. The parameter updates are then computed from (4.27) by applying the chain rule.
We start with 0th order TS systems (2.39). The learning task is to identify new consequent parameters p0r(l + 1) and input MF parameters (e.g. air(l + 1), bir(l + 1) and cir(l + 1) when using triangular MFs).
The learning rule for the output parameters p0r is
$$p_{0r}(l+1) = p_{0r}(l) - \eta\,\bigl(y(k) - \tilde{y}(k)\bigr)\,\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}. \qquad (4.29)$$
The learning rules for the input MF parameters depend on the kind of MFs used. It must also be taken into account that if the MFs are defined piecewise, a different learning rule is derived for each piece.
For the parameters of triangular MFs (A.3), the following learning rules are obtained:
if air(l) < xi(k) < bir(l): (4.30)
$$a_{ir}(l+1) = a_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\bigl(x_i(k)-a_{ir}(l)\bigr)\bigl(b_{ir}(l)-a_{ir}(l)\bigr)}$$
$$b_{ir}(l+1) = b_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{a_{ir}(l)-b_{ir}(l)}$$
if bir(l) < xi(k) < cir(l): (4.31)
$$b_{ir}(l+1) = b_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{c_{ir}(l)-b_{ir}(l)}$$
$$c_{ir}(l+1) = c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\bigl(c_{ir}(l)-x_i(k)\bigr)\bigl(c_{ir}(l)-b_{ir}(l)\bigr)}$$
If xi(k) > cir(l) or xi(k) < air(l), no learning occurs; this is also true for the points xi(k) = air(l), xi(k) = bir(l) and xi(k) = cir(l), because the derivative does not exist there. Moreover, for each MF
$$a_{ir}(l+1) < b_{ir}(l+1) < c_{ir}(l+1) \qquad (4.32)$$
has to be satisfied in order to preserve the physical meaning of the parameters. If (4.32) is violated, the respective update cannot be applied. Similar restrictions must be taken into account with other types of MFs as well.
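To make (4.29)-(4.31) concrete, the Python sketch below computes the gradients prescribed by (4.30)-(4.31) for a single-input 0th order TS system (so that τr = μ1r) and compares them with central finite differences of (4.28). The MF parameters, singletons, learning rate and the sample (x, ỹ) are arbitrary illustrative values, not taken from the thesis; each returned gradient is the term that multiplies −η in the update rules.

```python
import numpy as np


def tri(x, a, b, c):
    """Triangular MF with feet a, c and peak b."""
    if a < x <= b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def model(x, A, B, C, p0):
    """0th order TS output for one input: weighted mean of the singletons p0."""
    tau = np.array([tri(x, a, b, c) for a, b, c in zip(A, B, C)])
    return tau @ p0 / tau.sum(), tau


def mf_gradients(x, y_ref, A, B, C, p0):
    """Gradients of eps(k) w.r.t. a, b, c as used in (4.30)-(4.31)."""
    y, tau = model(x, A, B, C, p0)
    gA, gB, gC = np.zeros(len(A)), np.zeros(len(B)), np.zeros(len(C))
    for r in range(len(A)):
        common = (y - y_ref) * (p0[r] - y) * tau[r] / tau.sum()
        a, b, c = A[r], B[r], C[r]
        if a < x < b:                        # rising slope, (4.30)
            gA[r] = common * (x - b) / ((x - a) * (b - a))
            gB[r] = common / (a - b)
        elif b < x < c:                      # falling slope, (4.31)
            gB[r] = common / (c - b)
            gC[r] = common * (x - b) / ((c - x) * (c - b))
        # outside the support (or exactly at a, b, c) no learning occurs
    return gA, gB, gC


A = np.array([-2.0, 0.5, 3.0])
B = np.array([0.0, 2.5, 5.0])
C = np.array([2.0, 4.5, 7.0])
p0 = np.array([1.0, -0.5, 2.0])
x, y_ref, eta = 1.2, 0.8, 0.1

gA, gB, gC = mf_gradients(x, y_ref, A, B, C, p0)


def loss(A, B, C):
    y, _ = model(x, A, B, C, p0)
    return 0.5 * (y - y_ref) ** 2


def numeric_grad(which, h=1e-6):
    """Central-difference gradient with respect to A (0), B (1) or C (2)."""
    params = [A, B, C]
    g = np.zeros(len(A))
    for r in range(len(A)):
        plus = [q.copy() for q in params]
        minus = [q.copy() for q in params]
        plus[which][r] += h
        minus[which][r] -= h
        g[r] = (loss(*plus) - loss(*minus)) / (2 * h)
    return g


# One gradient step, i.e. the updates (4.30)-(4.31) with learning rate eta
A_new, B_new, C_new = A - eta * gA, B - eta * gB, C - eta * gC
```

In a full training loop the new values would be accepted only if they still satisfy (4.32).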
Extension of the learning rules (4.30)-(4.31) to those for the trapezoidal MF (A.5) is a matter of rewriting them as (4.33)-(4.34):
if air(l) < xi(k) < bir(l): (4.33)
$$a_{ir}(l+1) = a_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\bigl(x_i(k)-a_{ir}(l)\bigr)\bigl(b_{ir}(l)-a_{ir}(l)\bigr)}$$
$$b_{ir}(l+1) = b_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{a_{ir}(l)-b_{ir}(l)}$$
if cir(l) < xi(k) < dir(l): (4.34)
$$c_{ir}(l+1) = c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{d_{ir}(l)-c_{ir}(l)}$$
$$d_{ir}(l+1) = d_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-c_{ir}(l)}{\bigl(d_{ir}(l)-x_i(k)\bigr)\bigl(d_{ir}(l)-c_{ir}(l)\bigr)}$$
The learning rules for the parameters of the square spline MF (A.6) are given by (4.35)-(4.38).
if air(l) < xi(k) < (air(l) + bir(l))/2: (4.35)
$$a_{ir}(l+1) = a_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2\bigl(x_i(k)-b_{ir}(l)\bigr)}{\bigl(b_{ir}(l)-a_{ir}(l)\bigr)\bigl(x_i(k)-a_{ir}(l)\bigr)}$$
$$b_{ir}(l+1) = b_{ir}(l) + \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2}{b_{ir}(l)-a_{ir}(l)}$$
if (air(l) + bir(l))/2 < xi(k) < bir(l): (4.36)
$$a_{ir}(l+1) = a_{ir}(l) + \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(b_{ir}(l)-x_i(k)\bigr)^2}{\mu_{ir}(x_i(k))\bigl(b_{ir}(l)-a_{ir}(l)\bigr)^3}$$
$$b_{ir}(l+1) = b_{ir}(l) + \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(b_{ir}(l)-x_i(k)\bigr)\bigl(x_i(k)-a_{ir}(l)\bigr)}{\mu_{ir}(x_i(k))\bigl(b_{ir}(l)-a_{ir}(l)\bigr)^3}$$
if cir(l) < xi(k) < (cir(l) + dir(l))/2: (4.37)
$$c_{ir}(l+1) = c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(c_{ir}(l)-x_i(k)\bigr)\bigl(x_i(k)-d_{ir}(l)\bigr)}{\mu_{ir}(x_i(k))\bigl(d_{ir}(l)-c_{ir}(l)\bigr)^3}$$
$$d_{ir}(l+1) = d_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\bigl(c_{ir}(l)-x_i(k)\bigr)^2}{\mu_{ir}(x_i(k))\bigl(d_{ir}(l)-c_{ir}(l)\bigr)^3}$$
if (cir(l) + dir(l))/2 < xi(k) < dir(l): (4.38)
$$c_{ir}(l+1) = c_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2}{d_{ir}(l)-c_{ir}(l)}$$
$$d_{ir}(l+1) = d_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\bigl(p_{0r}(l)-y(k)\bigr)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2\bigl(x_i(k)-c_{ir}(l)\bigr)}{\bigl(d_{ir}(l)-c_{ir}(l)\bigr)\bigl(d_{ir}(l)-x_i(k)\bigr)}$$
The learning rules for the linear coefficients of 1st order TS systems (2.37) are given by
$$p_{ir}(l+1) = p_{ir}(l) - \eta\,\bigl(y(k)-\tilde{y}(k)\bigr)\,\frac{x_i(k)\,\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}. \qquad (4.39)$$
To obtain the learning rules for the input MFs of 1st order TS systems, the factor (p0r(l) − y(k)) in (4.30)-(4.38) must be replaced by
$$p_{0r} + \sum_{i=1}^{N} p_{ir}\,x_i - y. \qquad (4.40)$$
The learning rules (4.29), (4.30)-(4.31) and (4.33)-(4.39), along with (4.40), allow us to train 0th and 1st order TS systems with three different kinds of input MFs. Note that the application of these algorithms will result in a non-transparent fuzzy system. The possibilities of preserving the transparency of the modeled systems are considered in section 4.9.
4.6.2 The learning process
The training algorithm consisting of (4.29) and (4.30)-(4.31) (triangular MFs) performs an error back-propagation procedure: to train p0r, the "normalized" error (y − ỹ)/Στr (sum over r = 1, ..., R) is back-propagated to the layer of p0r (Fig. 4.9), for which τr are the inputs. To train the MF parameters, the above-mentioned "normalized" error × (p0r − y)τr is back-propagated to the processing unit of layer 1 whose output is xi. The MF parameters are then updated by the respective update rules, using the back-propagated values and the remaining variables that can be obtained locally.
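[Fig. 4.9 diagram: inputs x1 and x2 feed the MF nodes µ11(x), µ12(x), µ21(x), µ22(x); product nodes Π compute the activations τr; a summation node Σ forms a (weighted by P01, P02, ...) and another forms b, and the division node a/b gives the output y.]
Fig. 4.9. Zeroth order TS system in network representation.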
Thus, training is a two-pass procedure: in the forward pass y is computed for a given input, and in the backward pass the network parameters are trained.
When a training data set consisting of K data pairs is presented, the question arises of how exactly to apply GD. One may take one data pair and train all system parameters until the error for this pair is sufficiently small, then proceed with the next pair. Typical practice, however, is to cycle through the data many times, taking one step of the gradient algorithm for each data pair (each cycle is called a training epoch).
Another question is when exactly to update the parameters. The usual practice (incremental mode) is to perform the update after the presentation of each training example. Another possibility is to apply the update rule after the presentation of all training examples that constitute an epoch (batch mode):
$$\Delta\xi = -\eta\sum_{k=1}^{K}\frac{\partial \varepsilon(k)}{\partial \xi}. \qquad (4.41)$$
(4.41) implies that the cost function in this case is the sum of squared errors:
$$\varepsilon = \sum_{k=1}^{K}\varepsilon(k) = \sum_{k=1}^{K}\frac{1}{2}\left[y(k) - \tilde{y}(k)\right]^2. \qquad (4.42)$$
The incremental mode of training makes the search in parameter space stochastic in nature, which, in turn, makes it less likely for the back-propagation algorithm to be trapped in a local minimum. The batch mode of training, on the other hand, provides a more accurate estimate of the gradient vector. In the final analysis, however, the effectiveness of a training mode depends on the problem at hand (Haykin 1994).
Training is conducted until a stopping criterion is satisfied, e.g. the desired error value is achieved, the change of the parameters is smaller than some specified threshold, or the change of the error has become very small (even though the parameter changes are still larger than the threshold values).
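The following Python sketch contrasts the two training modes for the consequent parameters p0r of a 0th order TS system: (4.29) is applied after every pattern in incremental mode, and the summed update (4.41) once per epoch in batch mode, with a stopping criterion on the change of the cost (4.42). The fixed single-input triangular partition, the target function sin(x), the learning rates and the tolerance are illustrative assumptions; the least squares solution is computed only as a reference point.

```python
import numpy as np


def partition(x, cores):
    """Normalized activations tau_r / sum(tau) of a 1-D triangular partition."""
    R = len(cores)
    return np.column_stack([np.interp(x, cores, np.eye(R)[r]) for r in range(R)])


def sse(phi, p, y_ref):
    """Cost (4.42): half the sum of squared errors over the epoch."""
    return 0.5 * np.sum((phi @ p - y_ref) ** 2)


def train(phi, y_ref, eta, mode, epochs=500, tol=1e-12):
    """Train the singletons p0r; mode is 'incremental' (4.29) or 'batch' (4.41)."""
    p = np.zeros(phi.shape[1])
    prev = sse(phi, p, y_ref)
    for _ in range(epochs):
        if mode == "incremental":
            for k in range(len(y_ref)):              # update after every pattern
                y = phi[k] @ p
                p -= eta * (y - y_ref[k]) * phi[k]
        else:                                        # one summed update per epoch
            p -= eta * phi.T @ (phi @ p - y_ref)
        err = sse(phi, p, y_ref)
        if abs(prev - err) < tol:                    # error has stopped changing
            break
        prev = err
    return p


def rmse(p):
    return float(np.sqrt(np.mean((phi @ p - y_ref) ** 2)))


x = np.linspace(0.0, 10.0, 50)
y_ref = np.sin(x)                                    # hypothetical system to model
phi = partition(x, cores=[0.0, 2.5, 5.0, 7.5, 10.0])

p_inc = train(phi, y_ref, eta=0.02, mode="incremental")
p_bat = train(phi, y_ref, eta=0.05, mode="batch")
p_ls, *_ = np.linalg.lstsq(phi, y_ref, rcond=None)   # reference solution
```

With a fixed learning rate the incremental mode keeps fluctuating slightly around the optimum, whereas the batch mode settles on it; which behaviour is preferable depends, as noted above, on the problem.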
4.6.3 Convergence issues and higher order methods
The described algorithm is quite simple, but in the general case there is no guarantee that it will converge to an optimal solution. Several issues associated with the implementation of the algorithm influence convergence. First of all, the question of training data selection arises (this applies universally to all data-driven techniques). The gradient descent algorithm does not add rules to the system or delete them, thus the estimate of R has a direct impact on the learning result. The initial values of the trained parameters are also important: it is useful to have them close to where they should be. The usual problem is that we do not know where they should be.
One of the most important factors for convergence is the learning rate η. The typical choice is to keep the learning rate constant during the learning process. Usually, however, we do not know the optimal value of η, or a variable η would be optimal. If the learning rate is too small, the training process is very slow and may become trapped in a local minimum because it is not able to "climb" over the local peaks of the error surface (Fig. 4.10, right). A larger η increases the learning speed but may similarly become trapped when the change in parameter value needed to reach the optimum error value is smaller than the step size (Fig. 4.10, left).
The solutions that have been suggested to improve the performance can be divided into three categories: momentum terms, adaptive learning rates and higher-order algorithms.
[Fig. 4.10 sketches: error ε versus parameter p, with the step |η ∂ε/∂p| from p(l) to p(l + 1); the left panel also marks p_optimal.]
Fig. 4.10. Typical reasons for getting stuck in a local minimum: learning rate is too large (left), learning rate is too small (right).
The simplest method to increase the rate of learning while avoiding the danger of oscillation is to include a momentum term in the delta rule (Rumelhart et al. 1986):
$$\Delta p(l+1) = \alpha\,\Delta p(l) - \eta\,\frac{\partial \varepsilon}{\partial p}, \qquad (4.43)$$
where 0 ≤ α < 1 is the momentum constant. The inclusion of momentum tends to accelerate descent in steady downhill directions and has a stabilizing effect in directions where the gradient oscillates in sign.
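A minimal Python sketch of (4.43): on a deliberately ill-conditioned quadratic error surface ε = ½(p1² + 50p2²), chosen only for illustration, it counts how many iterations plain gradient descent (α = 0) and the momentum variant need to bring the parameters within a tolerance of the minimum at the origin. The values η = 0.035 and α = 0.8 are also illustrative assumptions.

```python
import numpy as np

CURVATURE = np.array([1.0, 50.0])       # eps(p) = 0.5 * sum(CURVATURE * p**2)


def grad(p):
    """Gradient of the illustrative quadratic error surface."""
    return CURVATURE * p


def iterations_to_converge(eta, alpha, tol=1e-6, max_iter=10_000):
    p = np.array([1.0, 1.0])
    dp = np.zeros_like(p)
    for l in range(1, max_iter + 1):
        dp = alpha * dp - eta * grad(p)   # momentum update (4.43)
        p = p + dp
        if np.linalg.norm(p) < tol:
            return l
    return max_iter


plain = iterations_to_converge(eta=0.035, alpha=0.0)
momentum = iterations_to_converge(eta=0.035, alpha=0.8)
```

The learning rate is limited by the steep direction (η must stay below 2/50), which makes plain descent slow along the shallow direction; the momentum term lets the step grow there without destabilizing the steep one.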
Proposed learning rate adaptation techniques are divided into global and local adaptation of the learning rate. Global adaptation uses a single learning rate value for all adaptable parameters. The "Search then Converge" method (Darken and Moody 1991), which is among the best-performing learning rate adaptation techniques, uses
$$\eta(l) = \eta_0\,\frac{1 + \dfrac{c}{\eta_0}\dfrac{l}{l_0}}{1 + \dfrac{c}{\eta_0}\dfrac{l}{l_0} + l_0\left(\dfrac{l}{l_0}\right)^2}, \qquad (4.44)$$
where η0 is the initial value of the learning rate, c is a constant and l0 >> 0 is another constant with typical values in the range 100 ≤ l0 ≤ 500. For l << l0, η(l) is approximately equal to η0, while for l >> l0, η(l) decreases as c/l.
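The schedule (4.44) is easy to tabulate; the Python function below is a direct transcription, and the usage lines check the two limiting regimes described above with the illustrative constants η0 = 0.1, c = 1 and l0 = 100.

```python
def search_then_converge(l, eta0, c, l0):
    """Learning rate eta(l) of the 'Search then Converge' schedule (4.44)."""
    q = (c / eta0) * (l / l0)
    return eta0 * (1.0 + q) / (1.0 + q + l0 * (l / l0) ** 2)


eta0, c, l0 = 0.1, 1.0, 100
early = search_then_converge(1, eta0, c, l0)        # l << l0: close to eta0
late = search_then_converge(10**7, eta0, c, l0)     # l >> l0: close to c / l
```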
Heuristic methods can be used, too. In (Jang 1993a) the following rules are used:
a) if the error measure undergoes 4 consecutive reductions, increase η by 10%;
b) if the error measure undergoes 2 consecutive combinations of one increase and one decrease, decrease η by 10%.
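One possible reading of rules a) and b) as code is sketched below in Python; it inspects the last five recorded error values after each epoch. The function name, accepting either order of increase and decrease in rule b), and the treatment of ties (an unchanged error counts as neither a reduction nor an increase) are our assumptions.

```python
import math


def adapt_learning_rate(eta, errors):
    """Apply heuristic rules a) and b) to the error history (most recent last)."""
    if len(errors) < 5:
        return eta
    diffs = [b - a for a, b in zip(errors[-5:-1], errors[-4:])]
    signs = [math.copysign(1, d) if d != 0 else 0 for d in diffs]
    if all(s < 0 for s in signs):                    # a) 4 consecutive reductions
        return eta * 1.1
    alternating = all(s != 0 for s in signs) and all(
        signs[i] == -signs[i + 1] for i in range(3)
    )
    if alternating:                                  # b) up/down twice (or down/up)
        return eta * 0.9
    return eta
```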
The basic descent algorithm adjusts the parameters in the steepest descent direction (the negative of the gradient), the direction in which the cost function decreases most rapidly. This does not necessarily produce the fastest convergence. In conjugate gradient algorithms (Fletcher and Reeves 1964) a search is performed along conjugate directions, which generally produces faster convergence than steepest descent directions:
$$\mathbf{p}(l+1) = \mathbf{p}(l) + \eta\,\mathbf{d}(l), \qquad (4.45)$$
where d is the direction vector and p is the parameter vector.
All conjugate gradient algorithms start out by searching in the steepest descent direction on the first iteration,
$$\mathbf{d}(0) = -\mathbf{g}(0), \qquad (4.46)$$
where g is the gradient vector. Each successive direction vector is then computed as a linear combination of the current gradient vector and the previous direction vector:
$$\mathbf{d}(l) = -\mathbf{g}(l) + \beta(l)\,\mathbf{d}(l-1). \qquad (4.47)$$
There are several variations of conjugate gradient algorithms, distinguished by the manner in which the constant β is computed. With the Fletcher-Reeves formula,
$$\beta(l) = \frac{\mathbf{g}^T(l)\,\mathbf{g}(l)}{\mathbf{g}^T(l-1)\,\mathbf{g}(l-1)}. \qquad (4.48)$$
With the Polyak-Ribiere formula (Polyak 1969),
$$\beta(l) = \frac{\mathbf{g}^T(l)\left[\mathbf{g}(l) - \mathbf{g}(l-1)\right]}{\mathbf{g}^T(l-1)\,\mathbf{g}(l-1)}. \qquad (4.49)$$
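For a quadratic error surface ε(p) = ½pᵀAp − bᵀp, the step size η in (4.45) can be chosen by exact line search, and (4.46)-(4.49) then reach the minimum in at most as many iterations as there are parameters (in exact arithmetic). The Python sketch below assumes such a quadratic, with arbitrary illustrative A and b, and supports both β formulas; for general fuzzy models the line search would have to be carried out numerically.

```python
import numpy as np


def conjugate_gradient(A, b, p0, beta_rule="fletcher-reeves", max_iter=50, tol=1e-12):
    """Minimize 0.5 p^T A p - b^T p with (4.45)-(4.49) and exact line search."""
    p = np.asarray(p0, dtype=float)
    g = A @ p - b                       # gradient vector
    d = -g                              # first direction (4.46)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        eta = -(g @ d) / (d @ A @ d)    # exact minimizer along d
        p = p + eta * d                 # (4.45)
        g_new = A @ p - b
        if beta_rule == "fletcher-reeves":
            beta = (g_new @ g_new) / (g @ g)            # (4.48)
        else:
            beta = (g_new @ (g_new - g)) / (g @ g)      # (4.49)
        d = -g_new + beta * d           # (4.47)
        g = g_new
    return p


A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
p_fr = conjugate_gradient(A, b, np.zeros(3), max_iter=3)
p_pr = conjugate_gradient(A, b, np.zeros(3), beta_rule="polyak-ribiere", max_iter=3)
```

With exact line search on a quadratic the two β formulas coincide, because successive gradients are orthogonal; they differ on non-quadratic error surfaces such as those of fuzzy models.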
Newton's method is an alternative to the conjugate gradient methods for fast optimization. The basic step of Newton's method is
$$\Delta\mathbf{p} = -\mathbf{H}^{-1}\mathbf{g}, \qquad (4.50)$$
in which the Hessian matrix H (second derivatives) must be computed. Computation of H and its inverse is expensive, and there is no guarantee that H is nonsingular. There is a class of algorithms that are based on Newton's method but do not require the calculation of second derivatives. These are called quasi-Newton methods (Dennis and Schnabel 1983), (Battiti 1992). They use an approximation of the Hessian matrix that is updated at each iteration of the algorithm.
Like the quasi-Newton methods, the Levenberg-Marquardt algorithm (Marquardt 1963) is designed to approach second-order training speed without having to compute the Hessian matrix.
So far, higher order methods have not been widely applied in fuzzy modeling; only a few applications have been reported, e.g. (Jang 1996), (Männle 2000).
4.6.4 Overfitting
One of the known problems characteristic of incremental training is overfitting. The error on the training data set may be driven to a very small value, but when new data is presented to the system, the error is large. The system has memorized the learning examples very well but has not learned how to react to new situations, i.e. its generalization properties are poor. Several techniques for improving generalization are known from neural network theory. First, there is the consideration that the number of adjustable parameters should be just large enough to provide an adequate fit (if the number is too small, we deal with underfitting). The problem is that it is difficult to know beforehand how complex a model is required for a specific application. Another method that can easily be applied to fuzzy systems is early stopping. In this technique, the available data is divided into two subsets. The first subset, the training set, is used for updating the parameters. The second subset is the validation set. The error on the validation set is monitored during the training process. Normally both the training error and the validation error decrease in the initial phase of training. When overfitting occurs, however, the error on the validation set typically begins to rise. When the validation error has increased for a specified number of iterations, the training is stopped, and the system with the minimum validation error is returned.
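A minimal Python sketch of early stopping, assuming a 0th order TS model with a fixed, deliberately fine triangular partition (15 rules) whose singletons are trained by batch gradient descent (4.41) on a small noisy training set. The hypothetical "true" system sin(x), the noise level, the learning rate and the patience of 20 epochs are illustrative choices, not the experiment reported in this section.

```python
import numpy as np


def partition(x, cores):
    """Normalized activations of a 1-D triangular partition."""
    R = len(cores)
    return np.column_stack([np.interp(x, cores, np.eye(R)[r]) for r in range(R)])


rng = np.random.default_rng(0)
cores = np.linspace(0.0, 10.0, 15)
x_trn = rng.uniform(0.0, 10.0, 20)
x_val = rng.uniform(0.0, 10.0, 40)
target = np.sin                                   # hypothetical "true" system
y_trn = target(x_trn) + rng.normal(0.0, 0.3, x_trn.size)
y_val = target(x_val) + rng.normal(0.0, 0.3, x_val.size)
phi_trn, phi_val = partition(x_trn, cores), partition(x_val, cores)


def rmse(phi, p, y):
    return float(np.sqrt(np.mean((phi @ p - y) ** 2)))


def train_early_stopping(eta=0.05, patience=20, max_epochs=5000):
    p = np.zeros(len(cores))
    best_p, best_val, best_epoch = p.copy(), rmse(phi_val, p, y_val), 0
    history = []
    for epoch in range(1, max_epochs + 1):
        p = p - eta * phi_trn.T @ (phi_trn @ p - y_trn)   # batch update (4.41)
        val = rmse(phi_val, p, y_val)                      # monitor validation error
        history.append(val)
        if val < best_val:
            best_p, best_val, best_epoch = p.copy(), val, epoch
        elif epoch - best_epoch >= patience:               # no improvement for a while
            break
    return best_p, best_val, history


best_p, best_val, history = train_early_stopping()
```

The returned singletons are those with the minimum validation error, not the last ones computed, as the text prescribes.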
Yet another approach known from neural network theory is regularization. This involves modifying the performance index by adding a term that causes the network to have smaller weights and biases. Adaptive fuzzy systems can be considered more regular, and thus generally less sensitive to overfitting, than neural networks, because the trained parameters of fuzzy systems have physical meaning and are therefore bounded to the actual operating range of the system. This is especially true for transparent systems; thus transparency can be considered a protective measure against overfitting, as demonstrated in the following example.
[Fig. 4.11 plots: y versus x (0 to 10), left and right panels.]
Fig. 4.11. Generalization properties of the gradient descent algorithm. GD with transparency protection (left), unconstrained gradient (right). The dashed line depicts the desired performance, the normal line is the model output and crosses denote the training samples.
Here we use the system from section 4.5 to generate test data. We use only a few samples and introduce noise; the data is both scarce and noisy, which makes the overfitting phenomenon probable. Next, we apply unconstrained gradient descent (section 4.6.1) and the Jager algorithm (Jager 1995), which preserves the transparency of the system (see section 4.10 for details). Finally, we test both obtained models against the noise-free original system. The results are depicted in Fig. 4.11.
Unconstrained gradient descent reduces the RMSE on the training data to 0.0782, but this has come at the expense of generalization, as the RMSE of 0.2079 on the original data set demonstrates. The respective errors with the transparent algorithm (0.1464 and 0.1852) are closer to each other, and the modeled relationship is smoother.
4.7 Clustering algorithms
A cluster is a group of objects that are mathematically more similar to one another than to members of other clusters. Clustering is the detection of subspaces (clusters) of the data space. The potential of clustering algorithms to reveal the underlying structures in data can be exploited for partitioning the input space of fuzzy systems or for constructing the rule base along with the definition of MFs (product space clustering).
There are many clustering methods available, which can be divided into two subcategories: hard and fuzzy clustering methods. Hard clustering methods (e.g. the hard c-means clustering algorithm (Duda and Hart 1973)) are based on classical set theory and require that an object either does or does not belong to a cluster. Fuzzy clustering methods, on the other hand, allow objects to belong to several clusters simultaneously, with different membership degrees. For many real-world problems, a fuzzy partitioning of the underlying space is considered more realistic than hard clustering, especially in association with fuzzy systems. A large family of fuzzy clustering algorithms is based on the minimization of the fuzzy c-means objective function J (Dunn 1974).
$$J = \sum_{k=1}^{K}\sum_{h=1}^{H}(\mu_{hk})^m\, d_A^2(\mathbf{z}(k), \boldsymbol{\nu}_h), \qquad (4.51)$$
where H is the number of clusters, µhk denotes the membership and νh the cluster centers. z(k) denotes the kth observation of the input-output data (4.2), being a row vector of matrix Z.
The distance measure used in (4.51) is defined as
$$d_A^2(\mathbf{z}(k), \boldsymbol{\nu}_h) = (\boldsymbol{\nu}_h - \mathbf{z}(k))\,\mathbf{A}\,(\boldsymbol{\nu}_h - \mathbf{z}(k))^T. \qquad (4.52)$$
The minimization of the c-means functional can be solved by a variety of available methods. The most widely used is the fuzzy c-means (FCM) algorithm, an iterative optimization approach proposed in (Bezdek 1981).
According to this algorithm, the cluster prototypes (h = 1, ..., H) are computed by
$$\boldsymbol{\nu}_h(l) = \frac{\sum_{k=1}^{K}\mu_{hk}(l-1)^m\,\mathbf{z}(k)}{\sum_{k=1}^{K}\mu_{hk}(l-1)^m}, \qquad (4.53)$$
where l is the iteration index. In the next step, the distances are found for all clusters and all data objects:
$$d_A^2(\mathbf{z}(k), \boldsymbol{\nu}_h(l)) = (\boldsymbol{\nu}_h(l) - \mathbf{z}(k))\,\mathbf{A}\,(\boldsymbol{\nu}_h(l) - \mathbf{z}(k))^T, \qquad (4.54)$$
where h = 1, ..., H, k = 1, ..., K.
Next, the partition matrix U is updated according to
$$\mu_{hk}(l) = \frac{1}{\sum_{j=1}^{H}\left(\dfrac{d_A(\mathbf{z}(k), \boldsymbol{\nu}_h(l))}{d_A(\mathbf{z}(k), \boldsymbol{\nu}_j(l))}\right)^{2/(m-1)}}, \qquad (4.55)$$
h = 1, ..., H, k = 1, ..., K.
The procedure is repeated by returning to (4.53) until ‖U(l) − U(l − 1)‖ < ε. Convergence of the algorithm is proved in (Bezdek 1980).
A singularity occurs when d²A(z(k), νh(l)) = 0 for some z(k) and one or more cluster prototypes νh. In this case, 0 is assigned to each µhk in the given column for which d²A(z(k), νh(l)) > 0, and the membership is distributed arbitrarily among the remaining µhk so that Σh µhk = 1 (h = 1, ..., H) for the given k.
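The iteration (4.53)-(4.55), including the singularity rule above, can be sketched in Python as follows. The sketch assumes A = I (Euclidean norm), a random initial partition matrix and equal sharing of the membership among coinciding prototypes in the singular case (the text allows any distribution); the two-group data set is purely illustrative.

```python
import numpy as np


def fuzzy_c_means(Z, H, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """FCM with A = I. Z has shape (K, n); returns prototypes (H, n) and U (H, K)."""
    rng = np.random.default_rng(seed)
    K = Z.shape[0]
    U = rng.random((H, K))
    U /= U.sum(axis=0)                                   # columns sum to one
    for _ in range(max_iter):
        W = U ** m
        V = (W @ Z) / W.sum(axis=1, keepdims=True)       # prototypes (4.53)
        D2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # distances (4.54)
        U_new = np.empty_like(U)
        for k in range(K):
            zero = D2[:, k] == 0.0
            if zero.any():                               # singularity: share equally
                U_new[:, k] = zero / zero.sum()
            else:                                        # partition update (4.55)
                ratio = (D2[:, k][:, None] / D2[:, k][None, :]) ** (1.0 / (m - 1.0))
                U_new[:, k] = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


rng = np.random.default_rng(1)
Z = np.vstack([rng.normal([0.0, 0.0], 0.3, (30, 2)),
               rng.normal([5.0, 5.0], 0.3, (30, 2))])
V, U = fuzzy_c_means(Z, H=2)
V_sorted = V[np.argsort(V[:, 0])]
```

Using squared distances with the exponent 1/(m − 1) is equivalent to the ratio of plain distances with the exponent 2/(m − 1) in (4.55).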
The shape of the clusters is determined by the choice of A in the distance measure (4.52). Typically A = I, which induces the standard Euclidean norm. The Euclidean norm induces hyperspherical clusters, i.e. clusters whose surfaces of constant membership are hyperspheres. A can also be defined as the inverse of the n×n sample covariance matrix of Z, i.e. A = R⁻¹, with
$$\mathbf{R} = \frac{1}{K}\sum_{k=1}^{K}(\mathbf{z}(k) - \bar{\mathbf{z}})^T(\mathbf{z}(k) - \bar{\mathbf{z}}), \qquad (4.56)$$
where z̄ denotes the sample mean of the data. Such an A induces hyperellipsoidal clusters of arbitrary orientation, but the common limitation of clustering algorithms based on a fixed distance norm is that such a norm forces the objective function to prefer clusters of that shape even if they are not present.
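Matrix A can be adapted by using estimates of the data covariance, as in the Gustafson-Kessel (GK) algorithm (Gustafson and Kessel 1979). The difference between the GK algorithm and the classical FCM algorithm is that each cluster has its own norm-inducing matrix Ah, resulting in
$$d_{A_h}^2(\mathbf{z}(k), \boldsymbol{\nu}_h(l)) = (\boldsymbol{\nu}_h(l) - \mathbf{z}(k))\,\mathbf{A}_h\,(\boldsymbol{\nu}_h(l) - \mathbf{z}(k))^T, \qquad (4.57)$$
where
$$\mathbf{A}_h = \left[\rho_h \det(\mathbf{F}_h)\right]^{1/n}\mathbf{F}_h^{-1} \qquad (4.58)$$
and
$$\mathbf{F}_h = \frac{\sum_{k=1}^{K}(\mu_{hk}(l-1))^m\,(\mathbf{z}(k) - \boldsymbol{\nu}_h(l))^T(\mathbf{z}(k) - \boldsymbol{\nu}_h(l))}{\sum_{k=1}^{K}(\mu_{hk}(l-1))^m}, \qquad (4.59)$$
with ρh denoting the (fixed) volume of the hth cluster.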
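As a sketch of (4.58)-(4.59), the Python function below computes the fuzzy covariance matrix Fh and the norm-inducing matrix Ah of one cluster from given memberships. The function name, the elongated test cloud with all memberships equal to one and ρh = 1 are illustrative assumptions; by construction det(Ah) = ρh, which is how the GK algorithm keeps the cluster volume fixed.

```python
import numpy as np


def gk_norm_matrix(Z, v, mu, m=2.0, rho=1.0):
    """Fuzzy covariance F_h (4.59) and norm-inducing matrix A_h (4.58) of one cluster.

    Z: data (K, n); v: cluster prototype (n,); mu: memberships of this cluster (K,).
    """
    w = mu ** m
    diff = Z - v                                     # rows z(k) - v_h
    F = (w[:, None] * diff).T @ diff / w.sum()       # weighted sum of outer products
    n = Z.shape[1]
    A = (rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)
    return F, A


# Elongated cloud: wide spread along x, narrow along y
gx, gy = np.meshgrid(np.linspace(-2.0, 2.0, 21), [-0.2, 0.0, 0.2])
Z = np.column_stack([gx.ravel(), gy.ravel()])
F, A = gk_norm_matrix(Z, v=Z.mean(axis=0), mu=np.ones(len(Z)))
```

Because A weights the long axis of the cloud less than the short one, the induced distance (4.57) produces ellipsoidal clusters aligned with the data.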
An advantage of the GK algorithm over FCM is that GK can detect clusters of different shape and orientation in one data set (although, due to the constraint on cluster volume, it can only find clusters of approximately equal volume). It is, however, computationally more expensive than FCM.
The number of clusters H has the strongest influence on convergence and on the resulting partition U. The weighting exponent m > 1 is also quite important, as it measures the "fuzziness" of the clusters. As m approaches one from above, the partition becomes hard; as m → ∞, the partition becomes maximally fuzzy, i.e. µhk = 1/H.
Usually the partition matrix U and the cluster centers are initialized with random values. One way to improve the convergence of the fuzzy clustering algorithm is to use special clustering algorithms that return initial estimates of the cluster centers, e.g. mountain clustering (Yager and Filev 1994b) or subtractive clustering (Chiu 1994).
The mountain method is a grid-based process for identifying the approximate locations of cluster centers in data sets with clustering tendencies. In the 1st step, the object space is discretized to generate the potential cluster centers nm. The 2nd step uses the data to construct the mountain function
$$M(\mathbf{n}_m) = \sum_{k=1}^{K} e^{-\alpha\, d(\mathbf{n}_m, \mathbf{z}(k))}, \qquad (4.60)$$
where z(k) is the kth data observation, α is a positive constant and d(nm, z(k)) is a distance measure between nm and z(k), typically computed as
$$d(\mathbf{n}_m, \mathbf{z}(k)) = (\mathbf{z}(k) - \mathbf{n}_m)(\mathbf{z}(k) - \mathbf{n}_m)^T. \qquad (4.61)$$
Consequently, the closer a data point is to a node, the bigger its contribution to the node's score.
The higher the mountain function value at a node, the larger its potential to become a cluster center. The 3rd step of the algorithm uses the mountain function to generate the cluster centers. The node with the maximum total score is marked as the first cluster center nl*. To find the next cluster, the effect of the current cluster must be eliminated. This is carried out by revising the mountain function:
$$M_{l+1}(\mathbf{n}_m) = M_l(\mathbf{n}_m) - M_l(\mathbf{n}_l^*)\, e^{-\beta\, d(\mathbf{n}_l^*, \mathbf{n}_m)}. \qquad (4.62)$$
The process is repeated until the maximum of Ml+1 falls below δ (stop criterion), resulting in the determination of l cluster centers.
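A one-dimensional Python sketch of the three steps (grid of candidate centers, mountain function (4.60) with the squared distance (4.61), and revision (4.62)) is given below. The grid resolution, α, β, δ and the two synthetic data groups are illustrative assumptions; choosing β < α makes the subtracted mountain wider than a single data point's contribution, which helps to remove a detected cluster before the next center is sought.

```python
import numpy as np


def mountain_centers(data, grid, alpha, beta, delta, max_centers=10):
    """Mountain clustering on a 1-D grid of candidate centers."""
    d = (grid[:, None] - data[None, :]) ** 2          # squared distances (4.61)
    M = np.exp(-alpha * d).sum(axis=1)                 # mountain function (4.60)
    centers = []
    while len(centers) < max_centers:
        idx = int(np.argmax(M))
        peak = M[idx]
        if peak < delta:                               # stop criterion
            break
        centers.append(float(grid[idx]))
        M = M - peak * np.exp(-beta * (grid - grid[idx]) ** 2)   # revision (4.62)
    return centers


data = np.concatenate([np.linspace(1.8, 2.2, 30), np.linspace(6.8, 7.2, 20)])
grid = np.linspace(0.0, 10.0, 101)
centers = mountain_centers(data, grid, alpha=5.0, beta=2.5, delta=1.0)
```

The denser group around 2 is found first, then the group around 7; the grid size grows exponentially with the number of variables, which is the motivation for subtractive clustering.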
Subtractive clustering is a computationally less expensive extension of the mountain method. It assumes that each data point z(k) is a potential cluster center and calculates a measure of the potential for each data point based on the density of the surrounding data points. Thus the number of potential cluster centers equals the number of data points and does not grow exponentially with the number of variables.
$$P_1(\mathbf{z}(i)) = \sum_{k=1}^{K} e^{-\|\mathbf{z}(i) - \mathbf{z}(k)\|^2}, \quad i = 1, \ldots, K \qquad (4.63)$$
The algorithm selects the data point with the highest potential as the lth cluster center,
$$P_l(\mathbf{z}_l^*) = \max_{1 \le k \le K} P_l(\mathbf{z}(k)), \qquad (4.64)$$
and then destroys the potential of the data points near the lth cluster center using
$$P_{l+1}(\mathbf{z}(k)) = P_l(\mathbf{z}(k)) - P_l(\mathbf{z}_l^*)\, e^{-\beta\|\mathbf{z}(k) - \mathbf{z}_l^*\|^2}. \qquad (4.65)$$
This process repeats until the potential of all data points falls below a threshold, e.g.
$$\frac{P_{l+1}(\mathbf{z}_{l+1}^*)}{P_1(\mathbf{z}_1^*)} < \delta, \qquad (4.66)$$
resulting in l cluster centers z1*, z2*, ..., zl*.
A small range of influence of a cluster center (cluster radius β) leads to finding many small clusters in the data, and a large one to few large clusters.
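The Python sketch below implements (4.63)-(4.66) for two-dimensional data. Following the usual convention of (Chiu 1994), the squared distances are scaled by α = 4/ra² in (4.63) and by β = 4/rb² in (4.65), with rb = 1.5ra; these scalings, the radius ra = 1, δ = 0.15 and the two 3×3 groups of points are assumptions made for illustration.

```python
import numpy as np


def subtractive_centers(Z, ra=1.0, delta=0.15):
    """Subtractive clustering; returns the selected data points as centers."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)   # squared distances
    P = np.exp(-alpha * D2).sum(axis=1)                      # potentials (4.63)
    P1_star = P.max()
    centers = []
    for _ in range(len(Z)):
        i = int(np.argmax(P))                                # (4.64)
        if P[i] / P1_star < delta:                           # stop criterion (4.66)
            break
        centers.append(Z[i])
        P = P - P[i] * np.exp(-beta * D2[:, i])              # (4.65)
    return np.array(centers)


offsets = np.array([(dx, dy) for dx in (-0.2, 0.0, 0.2) for dy in (-0.2, 0.0, 0.2)])
Z = np.vstack([offsets + [2.0, 2.0], offsets + [7.0, 7.0]])
centers = subtractive_centers(Z)
centers = centers[np.argsort(centers[:, 0])]
```

Unlike the mountain method, the candidates are the data points themselves, so the cost depends on K rather than on the grid size.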
4.7.1 Extraction of fuzzy rules and membership functions
We first consider the second type of clustering algorithms (subtractive clustering, mountain clustering). Each identified cluster center can be considered the transparency checkpoint of the rth rule. For input MFs, the typical choice in the literature is to use Gaussian MFs, with the respective cluster coordinates as MF centers; the positive parameter in (A.2) is usually α = 4/β², where β is the radius of a cluster center (Chiu 1994). Output MFs can be identified using the least squares method (section 4.4) or (in the case of 0th order TS systems) obtained directly, using the cluster center coordinates in the output domain(s).
For fuzzy clustering, the analogy between the membership functions of fuzzy systems and fuzzy cluster membership is quite obvious, and the partition matrix is therefore the primary source for the MFs of the system. Perhaps the most thorough treatment of this problem is offered in (Babuska 1997).
The rules are again created directly in the product space and take into account the form of the system nonlinearity (clusters are more densely distributed in regions of complex nonlinear behavior) and the coverage of the space (the identification data usually covers only a fraction of the product space of the system variables). The fact that no rules are generated in regions that contain no data can be seen as a drawback or an advantage, depending on the viewpoint (loss of generalization and completeness vs. reduced complexity).
The antecedent membership functions can be generated either from the projections of µhk onto the space of the input variables xi or by computing the membership degrees directly in the product space of the antecedent variables, using the distance measure of the clustering algorithm. The disadvantage of the latter approach is that it often results in MFs that are subnormal and/or non-convex. The extraction of MFs is usually a non-trivial task, because cluster membership is often distributed in such a way that it is difficult to approximate the projected membership values well by a parametric function. Acquisition of fuzzy systems by product space clustering often leads to redundancy when the projections of the clusters onto the antecedent variables are similar (i.e. highly overlapping fuzzy sets that describe almost the same region in the domain of a model variable). A redundant fuzzy system can be simplified manually, although it is more convenient to automate this process with dedicated algorithms, e.g. simplification of the rule base using similarity measures (Setnes et al. 1998), compatible cluster merging (Setnes and Kaymak 1998), etc. For the consequent parameters, a least squares procedure is typically applied, which somewhat compensates for the MF approximation error.
As mentioned earlier, one of the advantages of the GK algorithm is that it is capable of detecting oriented and ellipsoidal clusters in the product space. The information about cluster orientation is, however, largely lost in the MF extraction process, because the reconstructed cluster in the antecedent space, obtained by computing the activation degree of the rth rule (4.67), is aligned with the coordinate axes:
$$\tau_r = \bigcap_{i=1}^{N}\mu_{ir}(x_i). \qquad (4.67)$$
Babuska has also proposed a procedure for the identification of a 0th order TS system using GK clustering. His practice, based on the interpolation properties of 0th and 1st order TS systems (see also section 3.6), is to place the centers of the triangular membership functions (Fig. 4.12, bold line) where the projections of the cluster membership functions µhk (dashed line) intersect, and to add two additional sets at the extreme points of the domain. Note that the MFs form a partition.
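A Python sketch of this construction is given below, assuming that the projected cluster memberships are available as sampled curves on a common grid, sorted by cluster position, and that each adjacent pair intersects once between the cluster peaks. The intersection is located by linear interpolation at the sign change of the difference of two neighbouring projections; the function name and the Gaussian projections in the usage lines are made up for illustration.

```python
import numpy as np


def partition_from_projections(x, proj):
    """Cores of a triangular partition from projected cluster memberships.

    x: grid (G,); proj: memberships (H, G), rows sorted by cluster position.
    Returns the cores [x_min, intersections..., x_max] and the MFs (H + 1, G).
    """
    cores = [x[0]]
    for h in range(len(proj) - 1):
        diff = proj[h] - proj[h + 1]
        lo, hi = np.argmax(proj[h]), np.argmax(proj[h + 1])
        for g in range(lo, hi):                     # search between the two peaks
            if diff[g] == 0.0:
                cores.append(x[g])
                break
            if diff[g] * diff[g + 1] < 0:           # sign change: interpolate
                t = diff[g] / (diff[g] - diff[g + 1])
                cores.append(x[g] + t * (x[g + 1] - x[g]))
                break
    cores.append(x[-1])
    n_sets = len(cores)
    mfs = np.array([np.interp(x, cores, np.eye(n_sets)[j]) for j in range(n_sets)])
    return np.array(cores), mfs


x = np.linspace(0.0, 10.0, 1000)
proj = np.array([np.exp(-((x - c) / 1.5) ** 2) for c in (2.0, 6.0)])
cores, mfs = partition_from_projections(x, proj)
```

The resulting triangular MFs sum to one at every point of the domain, i.e. they form a partition as stated above.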
[Fig. 4.12 plot: membership µhk (0 to 1) versus x (0 to 10).]
Fig. 4.12. Input variable partition for standard fuzzy systems (bold line) derived from GK clusters (dashed line).
Since 0th order and 1st order TS systems are essentially incompatible, there is virtually no way to translate the rules of a 1st order TS system adequately into the rules of a 0th order system. Thus a significant part of the information that the clustering algorithm gives us is meaningless. We can use the part that concerns cluster memberships and construct a combinatorial rule base, with the output singletons identified by the LSE procedure. The basic motivation for using the clustering algorithm here lies solely in the expectation that the algorithm is capable of finding an input partition that is better than a blind guess (e.g. a uniform partition) or an expert definition.
4.7.2 Clustering example
Efficiency of the clustering methods depends on the type of the problem, and
typically we deal with complex phenomena for which trivial solutions do not
exist.
We use the truck backer-upper system (see section 6.1 for details) to generate a set of test data. The inputs of the system are the coordinates (x, y) of the truck and the output is its angle Φ. The original data (white solid line in Fig. 4.13) are contaminated with noise. Next, the FCM and GK clustering algorithms are applied to the noisy data; subtractive clustering is used later. We use Babuska's Fmid toolbox for GK clustering and the MATLAB Fuzzy Logic Toolbox for fuzzy c-means and subtractive clustering. We can see that the 7 clusters obtained by GK clustering (Fig. 4.14, right) are much more distinct than those obtained by FCM (Fig. 4.14, left).
Next we observe how the cluster centers are placed. An additional set of cluster centers is obtained by subtractive clustering. The distribution depicted in Fig. 4.15 shows that several FCM clusters are concentrated in one region and therefore have little meaning.
Because clustering techniques are unsupervised algorithms, there is no better way to compare their effectiveness in numerical terms than to apply LSE to obtain the consequent parameters of the corresponding fuzzy systems, which can then be evaluated on the basis of the approximation error. The identified fuzzy models have approximately the same RMSEs (Fig. 4.18).
[Figure: 3-D plot of Φ versus x and y.]
Fig. 4.13. Observed input-output relation.
[Figure: two panels, y versus x in the input space.]
Fig. 4.14. Data projected onto the input space and clusters obtained by FCM clustering (left), GK clustering (right).
[Figure: cluster centers in the (x, y) input space.]
Fig. 4.15. The estimated cluster centers by subtractive clustering (circle), FCM (square), GK (triangle).
[Figure: membership degree µ versus x (top) and versus y (bottom), left and right panels.]
Fig. 4.16. Input MFs of 1st order models obtained by projections onto the axes of the input variables. GK clustering (left), subtractive clustering (right).
[Figure: membership degree versus x (top) and versus y (bottom).]
Fig. 4.17. Input MFs of 0th order model obtained by projections onto the axes of the input variables.
[Figure: Φ versus sample index k.]
Fig. 4.18. Modeling results: bold line, the desired result; dashed line, 1st order TS system with GK clustering (RMSE = 7.3031); normal line, 0th order TS system with GK clustering (RMSE = 8.0834); dotted bold line, 1st order TS system with subtractive clustering (RMSE = 6.6415); dashed bold line, 0th order TS system with subtractive clustering (RMSE = 8.1927).
4.8 Genetic Algorithms
Genetic algorithms (GAs), inspired by Darwin's theory of evolution ("survival of the fittest"), were introduced by John Holland in the 1970s (Holland 1975) and have been explored and exploited by many authors since then. GAs simulate those processes of natural selection that are essential to evolution and are able to find solutions to real-world problems, provided that these are suitably encoded. They work with a population of "individuals", each representing a possible solution to a given problem.
It is assumed that a potential solution to a problem can be represented by a set of parameters. These parameters (genes) are joined together to form a chromosome. It is generally believed that a binary alphabet is ideal for this purpose. Each individual is assigned a fitness score according to how good a solution to the problem it is. Highly fit individuals are selected and given opportunities to "reproduce" by "cross-breeding" with other individuals. Crossover takes two individuals and cuts their chromosome strings at some randomly chosen position. The "tail" segments are then swapped over to produce two new full-length chromosomes (one-point crossover). A whole new population of possible solutions is thus produced. Crossover is usually not applied to all pairs selected for mating but with a probability typically between 0.6 and 1.0. Mutation is applied to each child individually after crossover and randomly alters each gene with a small probability (~0.001); it adds a small amount of random search. If the GA has been designed well, the population will converge to an optimal solution of the problem. The classic GA described above is depicted in Fig. 4.19.
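The classic GA can be sketched in a few lines. This is a toy illustration rather than the GA used in this work: roulette-wheel (fitness-proportional) selection, the population size, the number of generations and the one-max test problem are illustrative assumptions, while the crossover likelihood and mutation probability follow the typical values quoted above.

```python
import numpy as np

def classic_ga(fitness, n_bits, pop_size=40, generations=60,
               p_cross=0.8, p_mut=0.001, seed=None):
    """Classic binary GA with fitness-proportional selection,
    one-point crossover and bit-flip mutation (no elitism)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop], dtype=float)
        # roulette-wheel selection; fitness values must be non-negative
        probs = fit / fit.sum() if fit.sum() > 0 else None
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:             # crossover with likelihood p_cross
                cut = rng.integers(1, n_bits)      # random cut position 1 .. n_bits-1
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        flip = rng.random(children.shape) < p_mut  # each gene mutates with probability p_mut
        children[flip] = 1 - children[flip]
        pop = children
    fit = np.array([fitness(ind) for ind in pop], dtype=float)
    return pop[np.argmax(fit)], fit.max()

# Toy "one-max" problem: fitness is the number of ones in a 32-bit chromosome
best, best_fit = classic_ga(lambda ind: ind.sum(), n_bits=32, seed=0)
```

For a fuzzy system, `fitness` would decode the chromosome into MF and rule parameters, simulate the model over the data and map the error to a score, which is where most of the computation time is spent.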
The most important property of GAs is their flexibility. Once the problem has been coded efficiently, it can be solved regardless of the continuity, differentiability, etc. of the fitness function. It follows from this flexibility that GAs can tune not only the MF parameters of the system but also, if necessary, the linguistic relationships defined by its rule base. As we will promptly see, adapting GAs to fuzzy systems is not very difficult.
[Flowchart: initial population, decoding, fitness evaluation, "optimization criterion fulfilled?"; YES: final population; NO: generation of the new population (selection, crossover, mutation), encoding, and back to decoding.]
Fig. 4.19. Classic genetic algorithm.
Once again, the goal of modeling is to reduce the approximation error by optimizing the parameters of the fuzzy system. To accomplish that, the chromosomes are composed in the following manner. Each chromosome contains two substrings. The first substring contains all encoded MFs of the system; its configuration depends on the type of MFs used.
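[Diagram: membership function parameters (a_i^1 b_i^1 c_i^1 ... a_i^{S_i} b_i^{S_i} c_i^{S_i} for each of x1, ..., xi, ..., xN, and p_0^1, ..., p_0^T for y), followed by rule parameters (I_1, ..., I_r, ..., I_R).]
Fig. 4.20. Chromosome configuration.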
The second substring contains the integers that encode the rule information: each integer Ir corresponds to the rth rule and represents one MF in the space of the output variable. The integrated chromosome corresponding to a MISO 0th order TS system with triangular MFs, similar to those presented in (Liska and Melsheimer 1994) and (Tan and Hu 1997), is depicted in Fig. 4.20.
Prior to conversion to the binary alphabet, the real values of the encoded parameters p are scaled by applying (4.68)
$$p'_{[p_{\min},\,p_{\max}]} = (2^m - 1)\,\frac{p - p_{\min}}{p_{\max} - p_{\min}} \qquad (4.68)$$
where m denotes the precision (number of bits) of the coding. After each epoch (Fig. 4.19), in turn, the new parameter values are converted back to the decimal alphabet and descaled using
$$p = p_{\min} + p'\,\frac{p_{\max} - p_{\min}}{2^m - 1} \qquad (4.69)$$
The fitness function may be evaluated using
$$f(\varepsilon) = 100 - \frac{100\,\varepsilon}{y_{\max} - y_{\min}} \qquad (4.70)$$
where ε denotes the approximation error measure (e.g. RMSE).
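A sketch of the scaling (4.68), descaling (4.69) and fitness (4.70), assuming that the scaled value is rounded to the nearest integer and stored as an m-bit unsigned binary string; the rounding, the bit order and the helper names are illustrative choices.

```python
def encode(p, p_min, p_max, m):
    """(4.68): scale p to [0, 2^m - 1], round, and write it as an m-bit string."""
    scaled = (p - p_min) / (p_max - p_min) * (2 ** m - 1)
    return format(int(round(scaled)), f"0{m}b")

def decode(bits, p_min, p_max):
    """(4.69): read the bit string as the integer p' and map it back to [p_min, p_max]."""
    m = len(bits)
    return p_min + int(bits, 2) * (p_max - p_min) / (2 ** m - 1)

def fitness(eps, y_min, y_max):
    """(4.70): 100 for a perfect model, decreasing with the error measure eps (e.g. RMSE)."""
    return 100.0 - 100.0 * eps / (y_max - y_min)

bits = encode(0.3, -1.0, 1.0, 8)   # '10100110' (integer 166 of 255)
p = decode(bits, -1.0, 1.0)        # about 0.302; rounding error <= (p_max - p_min) / (2 * (2^m - 1))
```

The precision m thus trades chromosome length against the resolution with which MF parameters can be represented.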

If the need arises, the MF parameters and the rule base can be tuned individually by applying the GA only to the relevant part of the chromosome.
The universal nature of GAs allows fuzzy systems of arbitrary configuration to be optimized. Theoretically, GAs are also capable of solving problems of arbitrary complexity. The schema theorem (Holland 1975), on which this claim rests, is however based on several assumptions (an infinite number of chromosomes and training epochs, to name two) that cannot be realized in practice. Typically, convergence is either premature or too slow. Because GAs work with a population of potential models, each training epoch takes considerable time, and a relatively large number of epochs is required to get from the initial population to results of acceptable accuracy. Moreover, due to the stochastic nature of the algorithm, no modeling experiment can be reproduced exactly.
The main bottleneck of GAs applied to fuzzy systems is the evaluation of the fitness function, which requires every individual of the population to be simulated over the whole data set. Generally, therefore, apart from simple and offline experiments, we cannot make much use of GAs in fuzzy modeling or control, simply because we do not have enough computational power. Consider the two-input standard fuzzy system in Fig. 4.21.
Even if the GA is initialized with the Wang-Mendel algorithm (to provide a better initial estimate of the modeled function), training with the same initial configuration as the original model in Fig. 4.21 results in RMSE = 0.3742 after 500 generations (the best of many trials). The result can be compared with ANFIS and the other algorithms in section 4.10. GAs are outperformed by ANFIS-like algorithms (with or without transparency protection) both in terms of approximation error and, most importantly, in terms of approximation time: the GA accomplishes its 500 epochs in two and a half hours, whereas ANFIS-like algorithms would need about 200 times less, and usually do not even need that many epochs. With the present state of technology (or at least with the classic genetic algorithm), this leaves very few application areas for GAs in fuzzy modeling.
[Figure: surface of y over x1 (0 to 10) and x2 (0 to 5); input MFs mf1 to mf6 of x1 and mf1 to mf3 of x2; output MFs mf1 to mf5 of y on [-5, 5]; rule base:]
1. IF x1 is mf1 AND x2 is mf1 THEN y is mf4
2. IF x1 is mf1 AND x2 is mf2 THEN y is mf3
3. IF x1 is mf1 AND x2 is mf3 THEN y is mf2
4. IF x1 is mf2 AND x2 is mf1 THEN y is mf2
5. IF x1 is mf2 AND x2 is mf2 THEN y is mf1
7. IF x1 is mf3 AND x2 is mf1 THEN y is mf3
8. IF x1 is mf3 AND x2 is mf2 THEN y is mf2
10. IF x1 is mf4 AND x2 is mf1 THEN y is mf5
11. IF x1 is mf4 AND x2 is mf2 THEN y is mf4
12. IF x1 is mf4 AND x2 is mf3 THEN y is mf3
13. IF x1 is mf5 AND x2 is mf1 THEN y is mf5
15. IF x1 is mf5 AND x2 is mf3 THEN y is mf4
17. IF x1 is mf6 AND x2 is mf2 THEN y is mf3
Fig. 4.21. Two-input fuzzy test system.
4.9 Transparency protection
The issue of transparency protection arises specifically with iterative learning algorithms. With non-iterative algorithms, the explicit transparency constraints (3.5)-(3.6) can easily be satisfied by a suitable a priori selection of MF parameters. This is true, for example, for the Wang-Mendel method (section 4.4.4) and for Babuska's combined approach of Gustafson-Kessel (GK) clustering (section 4.7) and least squares estimation (section 4.5) for 0th order TS systems, where the input fuzzy sets extracted from GK clusters form a fuzzy partition and the output MFs identified through LSE are symmetrical by definition. With iterative learning algorithms such as gradient descent (section 4.6) or genetic algorithms (section 4.8), however, the MFs undergo many unconstrained modifications, and a non-transparent model is a common result. Transparency protection is equally important in the modeling of 1st order TS systems, where no explicit transparency constraints exist.
4.9.1 Transparency protection of 0th order TS systems and standard fuzzy systems
The solution to the problem can be sought in:
(i) imposing constraints on membership functions that prevent the system
from becoming non-transparent;
(ii) employing special membership functions that make transparency a default
property of a fuzzy system;
(iii) multi-objective optimization (Oliveira 1999).
The first two approaches are applicable to standard fuzzy systems, where the transparency constraints are binary (either satisfied or not); the third approach is by nature better suited to 1st order TS systems, where a certain balance between accuracy and transparency is desired. Transparency protection generally deteriorates the approximation capabilities of adaptation algorithms, which is not a complete surprise, as the trade-off between accuracy and transparency is a long-known fact.
With the first approach, the fulfillment of the transparency constraints is verified prior to every parameter update. If the updated parameter value would violate any of the transparency conditions, the update is not applied to the given parameter. With GAs, the transparency constraints should be verified in the decoding phase.
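The first approach can be sketched as a guarded update that tries each parameter change separately and rejects it if the constraint check fails. The constraints (3.5)-(3.6) are not restated in this section, so the check used here is a hypothetical inequality-type stand-in for triangular input MFs with centers a, left feet b and right feet c: centers stay ordered and no foot reaches beyond the neighboring center. The function names and the dictionary layout of the parameters are assumptions.

```python
import numpy as np

def ordered_and_bounded(a, b, c):
    """Hypothetical stand-in for (3.5)-(3.6): triangles with centres a, left feet b, right feet c."""
    a, b, c = (np.asarray(v, dtype=float) for v in (a, b, c))
    if np.any(b > a) or np.any(a > c) or np.any(np.diff(a) <= 0):
        return False
    # no foot may reach beyond the centre of the neighbouring set
    return bool(np.all(b[1:] >= a[:-1]) and np.all(c[:-1] <= a[1:]))

def guarded_update(params, grads, eta, is_transparent):
    """Gradient step applied parameter by parameter; an update that would
    violate the transparency check is skipped (the parameter keeps its value)."""
    new = {k: np.array(v, dtype=float) for k, v in params.items()}
    for name, g in grads.items():
        for j, gj in enumerate(g):
            old = new[name][j]
            new[name][j] = old - eta * gj
            if not is_transparent(new["a"], new["b"], new["c"]):
                new[name][j] = old            # reject this single update
    return new

params = {"a": [0.0, 5.0, 10.0], "b": [0.0, 0.0, 5.0], "c": [5.0, 10.0, 10.0]}
grads = {"b": [0.0, 1.0, 0.0], "c": [0.0, 1.0, 0.0]}
new = guarded_update(params, grads, eta=1.0, is_transparent=ordered_and_bounded)
# b[1] would move to -1, past the centre of its left neighbour, and is rejected; c[1] moves to 9
```

For a GA, the same predicate could be evaluated on each decoded chromosome instead of on each gradient step.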
Much more elegant is the second approach, where MFs that are symmetrical by definition (e.g. fuzzy singletons (A.1), symmetric triangular MFs (A.3) or other symmetrical MFs) are used to protect output transparency. To protect input transparency in the same manner, one can use the Jager (neighbor-oriented) definition of triangular fuzzy sets (Jager 1995):
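$$\mu_i^s(x_i) = \begin{cases} \dfrac{x_i - a_i^{s-1}}{a_i^{s} - a_i^{s-1}}, & a_i^{s-1} < x_i \le a_i^{s} \\[1ex] \dfrac{a_i^{s+1} - x_i}{a_i^{s+1} - a_i^{s}}, & a_i^{s} < x_i < a_i^{s+1} \\[1ex] 0, & x_i \le a_i^{s-1} \ \text{or} \ x_i \ge a_i^{s+1} \end{cases} \qquad (4.71)$$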
Each fuzzy set (4.71) is defined through its neighboring fuzzy sets, in such a way that its edge parameters b_i^s and c_i^s equal the centers a_i^{s-1} and a_i^{s+1} of the neighboring sets, respectively (Fig. 4.22). Thus a fuzzy partition is permanently maintained. Expressions similar to (4.71) can be derived for other types of input MFs on the same principle.
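A sketch of the neighbor-oriented sets (4.71), in which the centers a_i^s are the only free parameters. Treating the first and last sets as shoulders beyond the outermost centers is an assumption not covered by (4.71), and `jager_memberships` is a hypothetical name.

```python
import numpy as np

def jager_memberships(x, a):
    """Neighbour-oriented triangles (4.71) for one input.

    a -- increasing centres a_i^1 < ... < a_i^S; set s rises from a[s-1] to a[s]
         and falls from a[s] to a[s+1]. The outer sets are treated as shoulders
         (membership 1 beyond the outermost centres), which is an assumption.
    Returns an (S, K) array of membership degrees.
    """
    x = np.asarray(x, dtype=float)
    S = len(a)
    mu = np.zeros((S, x.size))
    for s in range(S):
        if s > 0:                                   # rising edge, left foot = previous centre
            m = (x > a[s - 1]) & (x <= a[s])
            mu[s, m] = (x[m] - a[s - 1]) / (a[s] - a[s - 1])
        else:
            mu[s, x <= a[s]] = 1.0
        if s < S - 1:                               # falling edge, right foot = next centre
            m = (x > a[s]) & (x < a[s + 1])
            mu[s, m] = (a[s + 1] - x[m]) / (a[s + 1] - a[s])
        else:
            mu[s, x > a[s]] = 1.0
    return mu

a = np.array([0.0, 2.0, 5.0, 10.0])     # hypothetical centres a_i^1 .. a_i^4
x = np.linspace(-1.0, 11.0, 121)
mu = jager_memberships(x, a)
print(np.allclose(mu.sum(axis=0), 1.0))  # True: the sets form a fuzzy partition
```

Because every set is parameterized by centers alone, any update that keeps the centers in increasing order leaves the partition intact, so no explicit constraint check is required.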
[Diagram: triangle µ_i^s on the x_i axis with center a_i^s and feet b_i^s = a_i^{s-1}, c_i^s = a_i^{s+1}.]
Fig. 4.22. Neighbor-oriented definition of triangular membership functions.
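[Diagram: input MF centers a_i^s for each of x1, x2, ..., xi, ..., xN, followed by the center and width pairs a^t, s^t (t = 1, ..., T) of the output MFs of y.]
Fig. 4.23. Chromosome corresponding to a transparent standard fuzzy system (MF part).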
The chromosome corresponding to a transparent standard fuzzy system for GAs is depicted in Fig. 4.23; with this encoding, no additional measures are necessary. With gradient descent, however, new learning rules must be derived. Here we present a transparency-protected gradient descent algorithm for standard fuzzy systems (2.30), an extension of the original Jager algorithm (Jager 1995) designed for 0th order TS systems.
The derivation is given in detail in Appendix D. Note that the approach is by no means restricted to triangular MFs and can easily be derived for other kinds of neighbor-oriented MFs. The update rules for the parameters of (2.30) are given by (4.72)-(4.75).
$$b_r(l+1) = b_r(l) - \eta\,\big(y(k) - \tilde{y}(k)\big)\,\frac{\tau_r(k)\,s_r(l)}{\sum_{r=1}^{R}\tau_r(k)\,s_r(l)} \qquad (4.72)$$
$$s_r(l+1) = s_r(l) - \eta\,\big(y(k) - \tilde{y}(k)\big)\,\big(b_r(l) - y(k)\big)\,\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)\,s_r(l)} \qquad (4.73)$$
If $a_i^{s-1} < x_i(k) < a_i^{s}$:
$$a_i^s(l+1) = a_i^s(l) - \eta\,\frac{y(k) - \tilde{y}(k)}{\sum_{r=1}^{R}\tau_r(k)\,s_r(l)}\cdot\frac{1}{a_i^{s}(l) - a_i^{s-1}(l)}\cdot\left[\frac{\mu_i^{s}(x_i(k))}{\mu_i^{s-1}(x_i(k))}\sum_{r'=1}^{R(\mu_i^{s-1})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big) - \sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big)\right] \qquad (4.74)$$
If $a_i^{s} < x_i(k) < a_i^{s+1}$:
$$a_i^s(l+1) = a_i^s(l) - \eta\,\frac{y(k) - \tilde{y}(k)}{\sum_{r=1}^{R}\tau_r(k)\,s_r(l)}\cdot\frac{1}{a_i^{s+1}(l) - a_i^{s}(l)}\cdot\left[\sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big) - \frac{\mu_i^{s}(x_i(k))}{\mu_i^{s+1}(x_i(k))}\sum_{r'=1}^{R(\mu_i^{s+1})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big)\right] \qquad (4.75)$$
where $r' = 1, \dots, R(\mu_i^s)$ refers to the rules having $A_i^s$ in their premise.
Note that if sr = ξ (arbitrary constant), the resulting (4.72), (4.74) and (4.75)
constitute the original Jager algorithm for 0th order TS systems.
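A minimal sketch of one consequent update for a single sample. It assumes that (2.30) computes the output as y = Σ τr sr br / Σ τr sr (symmetric output triangles with centers br and widths sr), which is the expression whose partial derivatives give (4.72) and (4.73), that y(k) is the model output and ỹ(k) the desired output, and that the rule activations τr(k) are already known; the antecedent updates (4.74)-(4.75) are omitted. With `update_widths=False` the widths stay constant, as in the consequent part of the original Jager algorithm. The helper names are hypothetical.

```python
import numpy as np

def model_output(tau, b, s):
    """Assumed form of (2.30): activation- and width-weighted mean of the centres b_r."""
    w = tau * s
    return w @ b / w.sum()

def consequent_step(tau, b, s, y_target, eta, update_widths=True):
    """One gradient step (4.72)-(4.73) on E = 0.5 * (y - y_target)^2 for one sample."""
    w = tau * s
    denom = w.sum()
    y = w @ b / denom
    err = y - y_target
    b_new = b - eta * err * w / denom                    # (4.72)
    if not update_widths:                                # s_r = const: Jager case
        return b_new, s
    s_new = s - eta * err * (b - y) * tau / denom        # (4.73)
    return b_new, s_new

tau = np.array([0.2, 0.8])                 # rule activations for the current sample
b, s = np.array([-1.0, 2.0]), np.array([1.0, 1.0])
b1, s1 = consequent_step(tau, b, s, y_target=0.5, eta=0.1)
print(model_output(tau, b, s), model_output(tau, b1, s1))   # 1.4 before, about 1.30 after
```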
Another advantage of the Jager partition is that the number of adjustable antecedent parameters is reduced compared to (4.30)-(4.31), since each input MF is described by its center alone.
Transparency improvement through multi-objective optimization requires some form of transparency measure (or a similar function) to be included in the cost function (4.28) or the fitness function (4.70) of the approximation algorithm, so that transparency is also taken into account in system optimization. Several applications of such mechanisms have been reported (Oliveira 1999), (Yin 2000). For standard or 0th order TS systems, however, we consider the other mechanisms proposed in this section more appropriate.
4.9.2 Transparency protection of 1st order TS systems
In fuzzy modeling, the parameters of 1st order TS systems are usually obtained with global learning strategies such as gradient descent (with or without LSE) that minimize a quadratic global cost function; these were covered in sections 4.5 and 4.6. Typically, good results are obtained in terms of approximation error, but the degree to which the model behavior can be interpreted as valid linearizations of local models (i.e. the inverse of the transparency error (3.15)) varies in a largely unpredictable way and is generally low. It has recently been shown (Yen et al. 1998) that the projection of product space fuzzy clusters (see section 4.7) combined with the weighted least squares method can improve the transparency of the model.
Weighted LS estimation solves R independent weighted LS problems, one for each rule. The consequent parameters θr = [p0r, p1r, …, pir, …, pNr] of the rth rule are obtained as the weighted LS estimate
$$\theta_r = \left[X_e^T W_r X_e\right]^{-1} X_e^T W_r\, y \qquad (4.76)$$
where Wr is a K×K diagonal matrix having τr(k) as its kth diagonal element, and Xe denotes the matrix [1 X] with rows [1, x1(k), x2(k), …, xN(k)].
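The rule-wise estimate (4.76) can be sketched with NumPy as follows. Multiplying the rows of Xe by τr(k) avoids forming the K×K matrix Wr explicitly; the Gaussian rule activations and the linear test function are illustrative assumptions, and `local_wls` is a hypothetical name.

```python
import numpy as np

def local_wls(X, y, tau):
    """Rule-wise weighted LS (4.76).

    X   -- (K, N) inputs, y -- (K,) outputs
    tau -- (K, R) rule activation degrees tau_r(k)
    Returns an (R, N+1) array whose row r is theta_r = [p0r, p1r, ..., pNr].
    """
    K = X.shape[0]
    Xe = np.hstack([np.ones((K, 1)), X])          # rows [1, x1(k), ..., xN(k)]
    theta = []
    for r in range(tau.shape[1]):
        w = tau[:, r]
        # Xe^T W_r Xe and Xe^T W_r y without forming the K x K matrix W_r
        A = Xe.T @ (w[:, None] * Xe)
        theta.append(np.linalg.solve(A, Xe.T @ (w * y)))
    return np.array(theta)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
centres = np.array([[-0.5, 0.0], [0.5, 0.0]])
tau = np.exp(-np.sum((X[:, None, :] - centres[None]) ** 2, axis=2))   # (K, R) activations
theta = local_wls(X, y, tau)    # both rows close to [1, 2, -3] for this linear target
```

Each rule sees the data through its own activation weights only, which is why the estimates describe local behavior but do not cooperate to minimize the global error.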
It has been shown (Yen et al. 1998), (Setnes 2002) that identification with the described method gives consequents that better describe the behavior of the modeled system in the regions where the rules are active; in terms of approximation error, however, the modeling performance is usually inferior to that of global LSE. This is because local learning techniques promote competition between the rules, whereas global techniques favor rule cooperation.
Alternatively, competition between the rules can be promoted by using input MFs with multi-point cores and a low degree of overlap between adjacent MFs, as observed in section 3.4. In (Riid et al. 2002), a gradient-based algorithm that maintains a predetermined overlap degree of trapezoidal fuzzy sets has been derived, thus providing a means for balancing the accuracy/transparency tradeoff. As with the approach of (Yen et al. 1998), however, the approximation properties of the model suffer. Therefore a different solution, simple but effective and highly compatible, is proposed here.
Instead of promoting competition between the rules by manipulating the overlap degree of the antecedent MFs, we act on the rule activation degrees τr directly: higher values are promoted and lower values demoted by means of an exponent m in the inference function.
$$y = \frac{\sum_{r=1}^{R} (\tau_r)^m \left(p_0^r + \sum_{i=1}^{N} p_i^r x_i\right)}{\sum_{r=1}^{R} (\tau_r)^m} \qquad (4.77)$$
If m is very large, $y(k) \to y_{r^*}(k)$, the output of the local model of the rule $r^* = \arg\max_r \tau_r(k)$ with the highest activation degree, which plays an important role in the transparency error (3.15). The consequent parameters of (4.77) can be identified with (global) LSE. The idea is not to use (4.77) directly, but only to apply global LSE to obtain its consequent parameters and then use those parameters in the original model (i.e. with m = 1). The relationship between m and the consequent transparency error is rather straightforward, but only to a certain extent, because of the undesirable properties of the interpolation mechanism of 1st order TS systems (section 3.4). Moreover, m cannot be very high, because the deviation between (4.77) and the original model then becomes large, which shows up in the approximation error. The parameters of the input MFs can be optimized with the Jager algorithm (the existence of transparency checkpoints is recommended), product space clustering or any other suitable method.
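The procedure (identify the consequents of (4.77) by global LSE, then use them with m = 1) can be sketched for a single input with fixed triangular antecedents. This is an illustration under those assumptions, with hypothetical helper names; it does not compute the transparency error (3.15), only the RMSE of the original model.

```python
import numpy as np

def triangles(x, centres):
    """(K, R) activations of a triangular fuzzy partition of one input."""
    unit = np.eye(len(centres))
    return np.column_stack([np.interp(x, centres, unit[r]) for r in range(len(centres))])

def regressors(X, tau, m):
    """Regressor matrix of (4.77): tau_r^m / sum_r tau_r^m times [1, x1, ..., xN], rule by rule."""
    w = tau ** m
    phi = w / w.sum(axis=1, keepdims=True)
    Xe = np.hstack([np.ones((X.shape[0], 1)), X])
    return (phi[:, :, None] * Xe[:, None, :]).reshape(X.shape[0], -1)

def fit_enhanced(X, y, tau, m):
    """Global LSE of the consequents of (4.77); row r of the result is [p0r, ..., pNr]."""
    theta, *_ = np.linalg.lstsq(regressors(X, tau, m), y, rcond=None)
    return theta.reshape(tau.shape[1], -1)

def predict(X, tau, theta):
    """Original 1st order TS model (m = 1) evaluated with the given consequents."""
    return regressors(X, tau, 1.0) @ theta.ravel()

# SISO example: benchmark (4.78) with five triangular antecedent sets (an assumption)
x = np.linspace(-1.0, 1.0, 201)
X = x[:, None]
tau = triangles(x, np.linspace(-1.0, 1.0, 5))
y = 0.6 * np.sin(np.pi * x) + 0.3 * np.sin(3 * np.pi * x) + 0.1 * np.sin(5 * np.pi * x)
results = {}
for m in (1.0, 3.0, 15.0):
    theta = fit_enhanced(X, y, tau, m)
    results[m] = np.sqrt(np.mean((predict(X, tau, theta) - y) ** 2))   # RMSE of the m = 1 model
```

With a linear target, every rule recovers the same line for m > 1. With m = 1 and a triangular partition the regression is rank-deficient, since such a partition already reproduces linear functions exactly; `lstsq` then returns the minimum-norm solution.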
4.10 Comparison of gradient-based methods
We have reviewed several fuzzy modeling algorithms from the transparency viewpoint and ended up with a new algorithm for the identification of transparent standard fuzzy systems and a simple method for transparency enhancement of 1st order TS systems. The following modeling experiments are carried out in order to analyze the effects of transparency protection on modeling performance. Such effects are most easily observed in two-dimensional space, so a SISO test system is employed in the first part of the section. Modeling a SISO system, however, does not give sufficient material for conclusions, because in practice we rarely deal with SISO systems, and modeling of multidimensional relationships always brings additional problems, such as combinatorial explosion of rules and incompleteness of the rule base, that cannot be ignored. Therefore the fuzzy two-input test system described in section 4.8 and a simulation of the Mackey-Glass time series, an acknowledged test function in fuzzy modeling, are also included.
We use ANFIS to obtain 1st and 0th order TS models, and the Jager algorithm and its extension to produce 0th order TS and standard fuzzy models, respectively. To give the algorithms comparable initial conditions, the last two algorithms are initialized with one-step LSE (section 4.5). Note that the standard fuzzy model is initialized as a 0th order TS system; the obtained consequent singletons are then converted into symmetrical triangles. Additionally, 1st order TS systems are identified with the proposed method for transparency enhancement.
4.10.1 Modeling of a SISO system
One of the benchmarks used in (Jang 1993a) is the smooth function
$$y = 0.6\sin(\pi x) + 0.3\sin(3\pi x) + 0.1\sin(5\pi x) \qquad (4.78)$$
The data set is obtained by discretizing the input variable x ∈ [-1, 1] with step 0.01, which results in 201 training samples (Fig. 4.24).
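The data set and the error measure can be reproduced as follows; `benchmark_data` and `rmse` are hypothetical helper names, and the constant zero model is included only to give a scale for the RMSE values reported for this benchmark.

```python
import numpy as np

def benchmark_data():
    """201 samples of (4.78) on x in [-1, 1] with step 0.01."""
    x = np.linspace(-1.0, 1.0, 201)
    y = 0.6 * np.sin(np.pi * x) + 0.3 * np.sin(3 * np.pi * x) + 0.1 * np.sin(5 * np.pi * x)
    return x, y

def rmse(y_model, y_true):
    """Root mean square error between model output and data."""
    return float(np.sqrt(np.mean((np.asarray(y_model) - np.asarray(y_true)) ** 2)))

x, y = benchmark_data()
baseline = rmse(np.zeros_like(y), y)   # about 0.48 for a model that always outputs 0
```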
[Figure: y versus x on [-1, 1].]
Fig. 4.24. Input-output relationship of sinusoidal function (4.78).
The modeling results of five algorithms (200 training epochs) are presented in Table 4.4. Triangular input MFs are chosen for the 0th order TS and standard systems. To promote competition between the rules, and since the function to be identified is smooth, square-spline based MFs are chosen for the 1st order TS systems identified with ANFIS. With the proposed transparency enhancement method, however, no promotion from the input side is needed, so triangular MFs are used.
[Figure: two panels, y versus x, with the final input MFs mf1 to mf4 (left) and mf1 to mf5 (right) shown above.]
Fig. 4.25. Approximation of (4.78) by ANFIS with a 1st order model and 4 rules (left) and 5 rules (right). Here and in the figures below, local outputs are depicted by dotted lines, the desired response by a dashed line and the response of the model by a normal line. The final input partition is shown above.
The results in Table 4.4 clearly demonstrate the excellent approximation properties of ANFIS for 1st order TS systems. Even with only four rules, the system is able to reduce the error to a very low value (Fig. 4.25). With an underlying 0th order system, the approximation capabilities of the algorithm are much more limited due to the smaller number of adjustable parameters. ANFIS for 1st order TS systems is also a very robust approximator: the number of rules does not influence the modeling result significantly unless it is too small, although the general trend that a larger number of rules means a smaller modeling error is undoubtedly present.
Typically, however, the computed transparency measure for 1st order systems is too high to allow a correct interpretation. It is also difficult to detect any correlation between RMSE and the transparency measure, although for some configurations εtr is clearly smaller, apparently because the number of rules is such that it enables modeling with a low degree of rule interpolation (see Fig. 4.25).
The proposed method for transparency enhancement reduces the transparency error significantly (Fig. 4.30, right). The trick is to find an appropriate m that establishes an optimal balance between transparency and accuracy, which is sometimes non-trivial because of the interpolation mechanism of 1st order TS systems (both V- and S-type interpolation can be difficult). With too few rules (presently R < 7), however, we have two options: either seek a good approximation by using m = 1 or, in contrast, use rather high values of m and obtain as transparent a model as possible. Here we have chosen the second option (to no effect). All in all, this clearly implies that 1st order TS systems may be robust approximators, but their transparency is highly dependent on the number of rules and cannot always be attained.
Table 4.4. Approximation results of (4.78).
              0th order TS        standard  1st order TS
              ANFIS     Jager     proposed  transparency-enhanced         ANFIS
exp. no  R    RMSE      RMSE      RMSE      RMSE      εtr       m         RMSE      εtr
1        3    0.1219    0.2710    0.1560    0.2029    0.1643    15.0      0.1063    0.1006
2        4    0.1180    0.1174    0.1171    0.1210    0.1107    15.0      0.0053    0.3409
3        5    0.1098    0.1174    0.1174    0.1195    0.0745    15.0      0.0030    0.0915
4        6    0.0598    0.1081    0.1079    0.0824    0.0800    6.0       0.0033    0.1581
5        7    0.0724    0.1080    0.1073    0.0171    0.0320    1.5       0.0026    0.0911
6        8    0.0156    0.0324    0.0314    0.0159    0.0312    1.5       0.0028    0.0996
7        9    0.0144    0.0321    0.0277    0.0158    0.0318    1.5       0.0037    0.0701
8        10   0.0112    0.0245    0.0206    0.0139    0.0295    1.5       0.0017    0.0808
9        11   0.0066    0.0242    0.0156    0.0153    0.0260    1.8       0.0021    0.0588
10       12   0.0073    0.0229    0.0139    0.0115    0.0239    1.8       0.0020    0.0887
11       13   0.0096    0.0261    0.0162    0.0106    0.0212    1.8       0.0039    0.0449
12       14   0.0071    0.0171    0.0109    0.0097    0.0198    1.8       0.0006    0.0756
Transparency protection is enabled in the Jager, proposed and transparency-enhanced columns.
Application of ANFIS to a 0th order model, while numerically effective, inevitably leads to misinterpretation of the rules (see the supposed transparency checkpoints in Figs. 4.27-4.29).
The Jager algorithm, on the other hand, ensures transparency for 0th order systems, so the model can be validated using a single measure, the approximation error. From the modeling results presented in Table 4.4, it is clear that the approximation error is almost always larger than with the corresponding non-transparent system, because transparency constraints somewhat deteriorate the approximation properties of a system. This is where the proposed algorithm has a word to say: it matches or outperforms the Jager algorithm in every experiment (Fig. 4.30, left). Evidently, the improved approximation capability derives from the interpolation properties of standard fuzzy systems (Figs. 4.27-4.29). In this way the existing gap between accuracy and transparency can be reduced.
[Figure: two panels, y versus x on [-1, 1].]
Fig. 4.26. Transparency-accuracy tradeoff. 9-rule 1st order TS models by ANFIS (left) and by the transparency-enhanced method (right).
Fig. 4.27. Modeling results of (4.78). Exps. 1-2 (see Table 4.4 for details). ANFIS for 0th order systems (left column), Jager algorithm (middle column), proposed algorithm (right column). The number of rules equals the number of transparency checkpoints, depicted by small boxes.
[Figure: left, RMSE versus number of rules R (3 to 14) for ANFIS (0th order TS model), Jager (0th order TS model) and proposed (standard fuzzy model); right, εtr (above) and RMSE (below) versus R for ANFIS and the transparency-enhanced method.]
Fig. 4.30. Modeling results summarized. Left: RMSE of 0th order TS and standard models. Right above: Transparency measure of 1st order TS models. Right below: RMSE of 1st order TS models.
4.10.2 Modeling of a TISO system
Modeling of MISO systems (including two-input systems) is quite different from modeling of SISO systems. In order to obtain a good approximation, the first step is to choose a large enough number of input MFs for each input variable (to prevent underfitting). It is, however, important not to choose too many MFs, because the number of rules (as well as the number of adjustable parameters) is the product of the numbers of MFs of the individual inputs and thus grows exponentially with the number of inputs, which has a negative impact on the convergence of the given algorithm. Selecting a number of rules that is optimal both in terms of the final approximation error and approximation time is far from trivial.
We identify the two-input single-output fuzzy system depicted in Fig. 4.21 from 441 training samples with 18, 36 and 66 rules. The approximation errors obtained with 200 training epochs are given in Table 4.5 (0th order TS and standard systems) and Table 4.6 (1st order TS systems).
The results confirm the conclusions of the previous section. In order to reproduce the modeled function faithfully, the given models need more rules than were used to construct the original input-output relationship, which is to be expected, because min-max and product-sum inference have different properties.
Table 4.5. Approximation of two-input fuzzy function with 0th order models.
Input partition   error(1)   ANFIS     Jager     proposed algorithm
6×3               0.5538     0.1594    0.2479    0.1780
9×4               0.2891     0.1178    0.1710    0.1416
11×6              0.1858     0.0780    0.0811    0.0589
Table 4.6. Approximation of two-input fuzzy function with 1st order models.
                  ANFIS               Transparency-enhanced ANFIS
Input partition   RMSE      εtr       RMSE      εtr       m
6×3               0.1221    0.5216    0.2330    0.0354    3
9×4               0.0403    0.4151    0.0685    0.0300    2
11×6              0.0215    0.2692    0.0323    0.0373    1.5
[Figure: surface of y over x1 (0 to 10) and x2 (0 to 5).]
Fig. 4.31. Approximation of the fuzzy function by the proposed algorithm with a 66-rule model.
4.10.3 Modeling of large systems
The previous examples may be considered simplistic (although well presentable), as they provide data sets from which complete rule bases can be extracted. Usually we deal with much more complex phenomena, and data sets are almost always incomplete. Therefore, finally, the approximation of a time series given by the Mackey-Glass differential delay equation is presented (Fig. 4.32).
$$\dot{x}(t) = \frac{0.2\,x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\,x(t) \qquad (4.79)$$
We use the values x(t - 18), x(t - 12), x(t - 6) and x(t) to predict x(t + 6). The training data were created using a Runge-Kutta procedure with step width 0.1, with the initial condition x(0) = 1.2 and delay τ = 17. A total of 500 samples were created for t = 118 … 617. An additional set of validation data (t = 618 … 1117) was produced to observe the generalization properties of the models, which in this case were obtained by the Jager algorithm, GK clustering/LSE (0th order models) and ANFIS (0th and 1st order models). The modeling results are shown in Table 4.7 (100 training epochs for the iterative algorithms).
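A sketch of how such data can be generated. It is not necessarily the procedure used for Table 4.7: it assumes x(t) = 0 for t < 0 and approximates the delayed value at the Runge-Kutta half step by the mean of the two neighboring stored samples; the helper names are hypothetical.

```python
import numpy as np

def mackey_glass(t_end, tau=17.0, dt=0.1, x0=1.2):
    """Integrate (4.79) with a classic 4th order Runge-Kutta scheme.

    Assumptions: x(t) = 0 for t < 0, and the delayed value at a half step is
    the mean of the two neighbouring stored samples.
    """
    d = int(round(tau / dt))                 # delay in integration steps
    n = int(round(t_end / dt))
    x = np.zeros(n + 1)
    x[0] = x0

    def f(xt, xd):                           # right-hand side of (4.79)
        return 0.2 * xd / (1.0 + xd ** 10) - 0.1 * xt

    def lagged(j):                           # x(t_j - tau)
        return x[j - d] if j >= d else 0.0

    for j in range(n):
        xd0, xd1 = lagged(j), lagged(j + 1)
        xdh = 0.5 * (xd0 + xd1)
        k1 = f(x[j], xd0)
        k2 = f(x[j] + 0.5 * dt * k1, xdh)
        k3 = f(x[j] + 0.5 * dt * k2, xdh)
        k4 = f(x[j] + dt * k3, xd1)
        x[j + 1] = x[j] + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

dt = 0.1
x = mackey_glass(1124.0, dt=dt)
steps = int(round(1 / dt))                   # stored samples per time unit

def pattern(t):
    """Inputs [x(t-18), x(t-12), x(t-6), x(t)] and target x(t+6) for integer t."""
    return [x[(t + lag) * steps] for lag in (-18, -12, -6, 0)], x[(t + 6) * steps]

train = [pattern(t) for t in range(118, 618)]
check = [pattern(t) for t in range(618, 1118)]
X_train, y_train = np.array([p[0] for p in train]), np.array([p[1] for p in train])
X_check, y_check = np.array([p[0] for p in check]), np.array([p[1] for p in check])
```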
Table 4.7. Modeling of Mackey-Glass time series.
modeling algorithm   ANFIS (1st)   ANFIS (0th)   GK/LSE     Jager
input partition      2×2×2×2       3×3×3×3       4×4×4×4    3×4×3×3
no. of rules         16            81            120        118
no. of parameters    104           117           136        131
training RMSE        0.0015        0.0035        0.0082     0.0166
checking RMSE        0.0018        0.0039        0.0086     0.0167
training time        ~53 s         ~63 s         ~39 s      ~204 s
These experiments bring out the major disadvantage of fuzzy modeling: the combinatorial explosion of the number of rules (and consequently of free parameters), which has the most direct influence on approximation time. This is especially true for 1st order systems: compared to 0th order systems with the same number of rules, the number of consequent parameters is N + 1 times larger (five times for the four inputs used here). For example, training 4-input 1st order models with 3×3×3×3 and 4×4×4×4 input partitions would involve 441 and 1328 free parameters and would require 2104 and 19560 seconds of training time for 100 training epochs, respectively (results obtained with the ANFIS implementation of the MATLAB Fuzzy Logic Toolbox on a 1.4 GHz Pentium processor). Depending on the particular application, it may well be that restrictions on computation time rule out the given algorithm. Usually, however, 1st order systems require fewer training epochs and rules to reach the same level of accuracy.
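The parameter counts quoted above can be checked with a little bookkeeping. Assuming three parameters per input MF (as for the generalized bell MF that ANFIS commonly uses) and one consequent parameter per rule for 0th order or N + 1 for 1st order models, the sketch reproduces the 441 and 1328 parameters above and the 104 and 117 parameters of the ANFIS models in Table 4.7. The function name is hypothetical.

```python
from math import prod

def grid_ts_param_count(partition, order, mf_params=3):
    """Free parameters of a grid-partitioned TS model with a complete rule base.

    partition -- number of MFs per input, e.g. (3, 3, 3, 3)
    order     -- 0 (one singleton per rule) or 1 (N + 1 coefficients per rule)
    mf_params -- parameters per input MF (3 assumed, as for a generalized bell MF)
    """
    n_rules = prod(partition)
    per_rule = 1 if order == 0 else len(partition) + 1
    return n_rules * per_rule + mf_params * sum(partition)

for part in [(2, 2, 2, 2), (3, 3, 3, 3), (4, 4, 4, 4)]:
    # prints rules, 0th order count, 1st order count: 16 40 104 / 81 117 441 / 256 304 1328
    print(part, prod(part), grid_ts_param_count(part, 0), grid_ts_param_count(part, 1))
```

Doubling the MFs per input from 2 to 4 multiplies the rule count by 16, which is the combinatorial explosion discussed above.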
[Figure: x versus t (0 to 1000).]
Fig. 4.32. Approximation of the Mackey-Glass time series by GK clustering/LSE. The original data (dashed line) and the model output are almost indistinguishable.
In the modeling of multivariable systems, non-transparency can be considered an advantage, at least as far as approximation time is concerned, since it allows model structures with fewer rules to be used. With transparent 0th order models, therefore, means of reducing the number of free parameters must be sought. First, with GK clustering/LSE, unnecessary rules are deleted (see also section 4.5). Second, with the Jager algorithm, the number of input MFs can be determined (after some experimentation) in a manner that does not increase the RMSE. A third approach, initialization with the Wang-Mendel algorithm (section 4.3.3) to reduce the number of output MFs, does not seem to work in the present case: we are unable to climb out of the local minimum obtained with the Wang-Mendel approach (RMSE ≅ 0.055).
The Jager algorithm requires a relatively long training time, partly because the current implementation is not optimized. With large systems, therefore, batch procedures (such as GK clustering/LSE in the present case) have a strong advantage over iterative algorithms in transparent modeling.
4.11 Summary and conclusions
Data-driven fuzzy identification has been an emerging research topic since the beginning of the 1990s. Today there exist many modeling algorithms that are able to detect the correlation between system variables from training data. The research has had a cooperative character: fuzzy approximation algorithms are most often borrowed or adapted from other fields of research. Those algorithms that either produce de facto transparent fuzzy systems or can be modified to provide built-in transparency protection were reviewed in this chapter.
Different modeling algorithms have different properties. Through the
combination of these algorithms, new methods with expanded functionality
(clustering techniques combined with least squares estimation) or improved
approximation properties (gradient descent with least squares estimation) can be
obtained. Further combinations are possible if the need arises.
Most of these algorithms are designed for 0th or 1st order TS systems because of mathematical limitations. For training standard fuzzy systems with min-min-max inference, only very limited possibilities exist, either because of the disadvantages of the methods (Wang-Mendel method, rule weight approaches) or because of very high requirements on computational power (genetic algorithms). The proposed algorithm for standard fuzzy systems with prod-prod-sum inference (section 4.9.1), however, allows more efficient transparent identification owing to the interpolation properties of standard fuzzy systems.
The second original contribution of this chapter, the method for transparency enhancement of 1st order TS systems (section 4.9.2), is a rather flexible and effective tool for obtaining reasonably transparent 1st order TS models. It is observed, however, that the transparency requirement implies that a sufficiently large number of rules is necessary to reach a reasonable compromise between transparency and accuracy.
Fuzzy control
5.1 Introduction
Control engineering is probably the most successful application area of fuzzy systems. The first fuzzy controllers were simple expert systems designed on the basis of human operator experience, such as the steam engine controller (Mamdani and Assilian 1974) or the cement kiln controller (Holmblad and Ostergaard 1982). Since the mid-seventies, when the concept of fuzzy control was introduced, it has developed into a sophisticated technique drawing inspiration from conventional (e.g. adaptive) control as well as from other fields of artificial intelligence (most notably neural networks).
Today, there exists a variety of fuzzy controllers in diverse application areas
such as process control, transportation control, robotics, medicine, financial
engineering etc. (Tsoukalas and Uhrig 1997). The traditional classification (Lee
1990) of fuzzy controllers distinguishes four families of fuzzy controllers that
are described below.
i) Controllers designed on the basis of expert experience and control
engineering knowledge
This group includes all knowledge-based fuzzy controllers. The design methodology is ill-defined and problem-dependent, involving a lot of trial-and-error effort, but in some cases, as with fuzzy PI, PD or PID control, it is more or less organized (see section 5.2). Many knowledge-based controllers are open-loop controllers (no feedback is involved), state feedback controllers, or setpoint controllers with additional inputs. Knowledge-based control is very popular, despite the advances and advantages of adaptive fuzzy control. In addition to the lack of a well-defined methodology, two more problems associated with this approach can be outlined. One is the possible inadequacy of the expert, since the controller cannot be better than the expert's knowledge. Another, even more serious, issue is the expert's possible inability to express his/her control experience or general knowledge effectively with the tools of fuzzy logic, either because he/she does not understand the properties of fuzzy systems very well, or because he/she cannot formulate the control rules verbally, since it is only his/her body, not the mind, that knows how to control the process/system. Knowledge-based control is, by definition, the type of fuzzy control that makes the most use of the transparency concept.
ii) Controllers modeled on existing controllers
Such fuzzy controllers try to mimic some other working controller. During training, the fuzzy controller is connected so that it has access to the inputs and outputs of the working controller. Once it is found to have learnt the expected task, it is put online, where it replaces the original controller. The approach is very useful if the controller to be emulated is a human being or if the original control algorithm is very expensive to implement. A possible disadvantage is that the fuzzy controller cannot be better than the original controller and may often be worse, because some modeling error always exists. Although the numerical performance of the controller is the primary concern here, transparency may still be useful, as it allows the controller to be validated by an expert.
iii) Model-based fuzzy control
Model-based fuzzy control uses a given (typically fuzzy) open-loop model of the plant under control to derive the set of fuzzy rules for the fuzzy controller, and is therefore fundamentally different from the previous two approaches, where it is implicitly assumed that no model exists. Examples of model-based fuzzy control are model-based predictive control (Babuska et al. 1996b), (Espinoza et al. 1999), inverse fuzzy process model based control (Abonyi et al. 1999) and fuzzy gain-scheduling methods (Hunt and Johansen 1997), (Korba and Frank 2000). This type of control generally involves generating a fuzzy model of the controlled process as a preliminary step, and numerical accuracy of the model is typically the primary concern. The exceptions are (Braae and Rutherford 1979) and (Fantuzzi 1994), where the linguistic description of the dynamic characteristics is the source of fuzzy control rules for attaining optimal performance of a dynamic system. Interestingly, the analytical model inversion techniques (Babuska et al. 1995), (Baranyi et al. 1998) that are considered in detail in section 5.5 regard transparency as a precondition for exact inversion.
iv) Self-learning fuzzy controllers (adaptive fuzzy control)
In adaptive fuzzy control, the focus is on the automatic online synthesis and tuning of fuzzy controller parameters, which ensures that the performance objectives are met even if the plant parameters change in time. Generally, these techniques can be split into two categories: direct and indirect adaptive fuzzy control. In direct adaptive control, a model of the plant is not estimated; instead, the controller parameters are tuned directly using plant data (Procyk and Mamdani 1979), (Berenji and Khedkar 1992), (Layne and Passino 1993), (Nauck and Kruse 1994). Sometimes the desired performance is characterized by a reference model, and the controller seeks to make the closed loop behave as the reference model would, even if the plant changes. In indirect adaptive fuzzy control, an identifier mechanism produces a model of the plant, which is then used to specify the controller, e.g. (Jang 1992a). Thus the distinction between model-based and adaptive fuzzy control is somewhat artificial. The primary concerns of adaptive fuzzy control are control performance and control loop stability. Transparency of adaptive fuzzy controllers is currently a completely unexplored topic.
Despite the advances in conventional control theory, it is estimated that over 90% of the controllers in operation are PID (or P, PI) controllers (Passino and Yurkovich 1996), because they are simple, reliable and easy to understand. A substantial portion of the fuzzy control literature also deals with the use of fuzzy rules to implement nonlinear PID control, which is examined in more detail in the next section.
The rest of the chapter briefly describes ways of combining conventional and fuzzy control (section 5.3) and then concentrates on fuzzy model inversion techniques, including numerical, analytical and linguistic inversion of fuzzy systems. The chapter primarily serves as background material that puts the control techniques used in the applications (described in the next chapter) into the necessary context.
5.2 Fuzzy setpoint controllers
A conventional PID controller follows the control law (a discrete approximation)
$$u(k) = K_p \left[ e(k) + \frac{1}{T_i} \sum_{i=1}^{k} e(i)\,\Delta t + T_d\, \frac{e(k) - e(k-1)}{\Delta t} \right], \tag{5.1}$$
where e(k) = r(k) - y(k) is the difference between the desired output value r(k) and the actual output y(k) of the controlled plant (∆t is the sampling period). If Td = 0, the control law reduces to PI control; if 1/Ti = 0, (5.1) implements PD control (Fig. 5.1); and with Td = 0 and 1/Ti = 0 we obtain a P controller.
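A direct transcription of (5.1) is sketched below. It is a minimal positional implementation assuming zero initial conditions (e(0) = 0 and an empty error sum); the class and parameter names are illustrative, and practical additions such as output limits or anti-windup are deliberately omitted:

```python
class DiscretePID:
    """Positional discrete PID law (5.1) (hypothetical helper class).

    kp = Kp, inv_ti = 1/Ti, td = Td, dt = sampling period.
    td = 0 gives PI control, inv_ti = 0 gives PD control, both give P control.
    """

    def __init__(self, kp: float, inv_ti: float, td: float, dt: float):
        self.kp, self.inv_ti, self.td, self.dt = kp, inv_ti, td, dt
        self.err_sum = 0.0   # running sum of e(i), i = 1..k
        self.prev_err = 0.0  # e(k-1); assumed zero before the first sample

    def step(self, r: float, y: float) -> float:
        """Return u(k) for setpoint r(k) and measured output y(k)."""
        e = r - y
        self.err_sum += e
        integral = self.inv_ti * self.err_sum * self.dt
        derivative = self.td * (e - self.prev_err) / self.dt
        self.prev_err = e
        return self.kp * (e + integral + derivative)


pid = DiscretePID(kp=2.0, inv_ti=0.5, td=0.1, dt=0.1)
print(round(pid.step(1.0, 0.0), 6))  # 2 * (1 + 0.05 + 1.0) = 4.1
print(round(pid.step(1.0, 0.5), 6))  # 2 * (0.5 + 0.075 - 0.5) = 0.15
```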
It is not difficult to construct fuzzy controllers that perform similarly to conventional setpoint controllers by a suitable selection of input and output variables. A fuzzy PD controller implements the function
$$u(k) = f\big(e(k), \Delta e(k)\big), \tag{5.2}$$
where ∆e(k) = e(k) - e(k - 1). A fuzzy PI controller typically realizes the function
$$\Delta u(k) = f\big(e(k), \Delta e(k)\big), \tag{5.3}$$
whose output is accumulated as u(k) = u(k - 1) + ∆u(k).
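To see why (5.3) is called a PI law, consider the special case in which f is linear, f(e, ∆e) = Kp·∆e + Kp·(∆t/Ti)·e. This linear map is our illustrative stand-in; a fuzzy controller replaces it with a nonlinear surface. Accumulating ∆u(k) then reproduces the PI form of (5.1) (Td = 0) exactly, which the sketch below checks on an arbitrary error sequence, assuming u(0) = e(0) = 0:

```python
import numpy as np


def positional_pi(errors, kp, ti, dt):
    """u(k) from (5.1) with Td = 0, for k = 1..n."""
    errors = np.asarray(errors, dtype=float)
    return kp * (errors + np.cumsum(errors) * dt / ti)


def incremental_pi(errors, f):
    """u(k) = u(k-1) + f(e(k), delta e(k)), the structure of (5.3); u(0) = e(0) = 0."""
    u, prev_e, out = 0.0, 0.0, []
    for e in errors:
        u += f(e, e - prev_e)
        prev_e = e
        out.append(u)
    return np.array(out)


kp, ti, dt = 1.5, 2.0, 0.1


def linear_f(e, de):
    """Linear stand-in for the fuzzy map f: delta u = Kp*de + Kp*(dt/Ti)*e."""
    return kp * de + kp * dt / ti * e


errors = [1.0, 0.8, 0.5, 0.1, -0.2, -0.1, 0.0]
print(np.allclose(positional_pi(errors, kp, ti, dt),
                  incremental_pi(errors, linear_f)))  # True
```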
[Block diagram: PD controller built from a delay z^-1, a 1/∆t block and gains Kp and KpTd, producing u(k) from e(k) = r(k) - y(k) and driving the control object]
Fig. 5.1. A conventional discrete PD controller.
The basic difference between conventional and fuzzy P(ID) control lies in the fact that the input-output relationship of the controller is linear in the former case and can be arbitrary (nonlinear) in the latter (Fig. 5.2). This is why fuzzy PID control is potentially superior to conventional PID control.
[Surface plots: controller output y over the (e, ∆e) plane, all axes from -1 to 1]
Fig. 5.2. Input-output relationship of a conventional PD controller (left) compared to a fuzzy PD controller (right).
Another difference is that a conventional control law requires only 1…3 controller parameters to be specified. In fuzzy control, the number of adjustable parameters is much greater, depending on the number of fuzzy sets into which the control variables are partitioned (usually 3…9); two inputs with seven sets each, for example, already give 7 × 7 = 49 rules. The large number of adjustable parameters makes tuning of fuzzy controllers difficult, which is clearly the price for greater flexibility. Sometimes it is very difficult to realize the potential of fuzzy control because of tuning problems. Fortunately, this problem has been investigated by many researchers, and there exist certain guidelines for the design of fuzzy P(ID) controllers, which we now describe.
As with fuzzy models, the adjustable parameters are the rule base and the MF parameters. The rule bases for the fuzzy PD and PI controllers (5.2)-(5.3) are obtained using three metarules (MacVicar-Whelan 1977).
A) If both e(k) and ∆e(k) are zero, then maintain the present control setting (∆u(k) or u(k));
B) If e(k) is going to zero at a satisfactory rate, then maintain the present control setting;
C) If e(k) is not self-correcting, then the control action is nonzero and depends on the sign and magnitude of e(k) and ∆e(k).
[Figure: seven fuzzy sets NB, NM, NS, Z, PS, PM, PB over the normalized control variable (axis -1 to 1, membership 0 to 1.0)]
Fig. 5.3. Partition of control variables.
Control variables are partitioned into 7 subsets (Fig. 5.3), using the labels "negative big" (NB), "negative medium" (NM), "negative small" (NS), "zero" (Z), "positive small" (PS), "positive medium" (PM) and "positive big" (PB).
A quick glance at Fig. 5.4 shows that the error is self-correcting (metarule B) when e(k) and ∆e(k) have different signs.
[Figure: output response approaching the setpoint r(k) over time t; regions marked e(k) > 0, ∆e(k) < 0 and e(k) < 0, ∆e(k) > 0]
Fig. 5.4. A typical transient process.
Following the metarules, it is possible to construct the rule bases for the PI and PD controllers given in Tables 5.1 and 5.2, respectively.
Note that the input and output partitions are normalized, i.e. defined in the range [-1, 1]. The real operating ranges are transformed to these normalized universes by applying the scaling factors ke, kde, kdu (or ku) (Fig. 5.5).
Table 5.1. Rule base of a fuzzy PI controller

e(k) \ ∆e(k)   NB   NM   NS   Z    PS   PM   PB
NB             NB   NB   NB   NB   NM   NS   Z
NM             NB   NB   NB   NM   NS   Z    PS
NS             NB   NB   NM   NS   Z    PS   PM
Z              NB   NM   NS   Z    PS   PM   PB
PS             NM   NS   Z    PS   PM   PB   PB
PM             NS   Z    PS   PM   PB   PB   PB
PB             Z    PS   PM   PB   PB   PB   PB
Table 5.2. Rule base of a fuzzy PD controller

e(k) \ ∆e(k)   NB   NM   NS   Z    PS   PM   PB
NB             NB   NB   NB   NB   NM   NS   NS
NM             NB   NB   NB   NM   NS   NS   NS
NS             NB   NB   NM   NS   NS   NS   NS
Z              NS   NS   NS   Z    PS   PS   PS
PS             PS   PS   PS   PS   PM   PB   PB
PM             PS   PS   PS   PM   PB   PB   PB
PB             PS   PS   PM   PB   PB   PB   PB
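A normalized fuzzy PI controller built on Table 5.1 can be sketched as follows. The sketch assumes details the text leaves open: seven uniformly spaced triangular MFs on [-1, 1] forming a partition, product conjunction, and singleton consequents at the peaks of the output sets with weighted-average defuzzification. Indexing the labels NB…PB as 0…6, Table 5.1 follows the pattern "output index = clip(i + j - 3, 0, 6)", which the code uses to generate it. Table 5.2 has no such closed form and would have to be entered explicitly. All names are hypothetical:

```python
import numpy as np

LABELS = ["NB", "NM", "NS", "Z", "PS", "PM", "PB"]
# Assumed peaks of the seven sets: uniformly spaced on the normalized universe.
CENTERS = np.linspace(-1.0, 1.0, len(LABELS))


def pi_rule_table():
    """Table 5.1 as label indices: entry [i, j] for e label i, delta-e label j."""
    i, j = np.meshgrid(np.arange(7), np.arange(7), indexing="ij")
    return np.clip(i + j - 3, 0, 6)


PI_TABLE = pi_rule_table()


def memberships(x):
    """Degrees of x in the seven triangular sets; x is limited to [-1, 1]."""
    x = np.clip(x, -1.0, 1.0)
    width = CENTERS[1] - CENTERS[0]
    return np.maximum(0.0, 1.0 - np.abs(x - CENTERS) / width)


def fuzzy_pi(e_star, de_star, table=PI_TABLE):
    """Normalized delta-u*(k) = f(e*(k), delta-e*(k)).

    Product AND gives the rule strengths; singleton consequents at CENTERS
    are combined by weighted averaging.
    """
    w = np.outer(memberships(e_star), memberships(de_star))
    return float(np.sum(w * CENTERS[table]) / np.sum(w))


for label, row in zip(LABELS, PI_TABLE):
    print(label, " ".join(LABELS[k] for k in row))  # reproduces Table 5.1
print(fuzzy_pi(0.4, -0.2))  # approx. 0.2
```

With uniformly spaced sets this particular table makes the controller linear, ∆u* = e* + ∆e*, wherever none of the active rules hits the clipped NB/PB corners of the table. Nonlinear behavior therefore comes from those saturated regions or from non-uniform MFs, which is where the tuning discussed below enters.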
The universes of the error [emin, emax], the change of error [∆emin, ∆emax] and the change of control action [∆umin, ∆umax] are defined by the following expressions:
$$e_{max} = r_{max} - y_{min}, \quad e_{min} = r_{min} - y_{max} \tag{5.4}$$
$$\Delta e_{max} = e_{max} - e_{min}, \quad \Delta e_{min} = e_{min} - e_{max} = -\Delta e_{max} \tag{5.5}$$
$$\Delta u_{max} = u_{max} - u_{min}, \quad \Delta u_{min} = u_{min} - u_{max} = -\Delta u_{max} \tag{5.6}$$
[Block diagram: e(k) and ∆e(k) are scaled by ke and kde to e*(k) and ∆e*(k) and fed to the normalized fuzzy controller, whose output ∆u*(k) is scaled by kdu to ∆u(k)]
Fig. 5.5. Fuzzy setpoint controller with scaling factors.
Thus, in the case of symmetric universes (emax = -emin, etc.), the scaling factors that map the operating inputs onto the normalized universes, and the normalized outputs ∆u*(k) (or u*(k)) back onto the operating range, are easily obtained:
$$k_e = 1/e_{max}, \quad k_{de} = 1/\Delta e_{max}, \quad k_{du} = \Delta u_{max}, \quad k_u = u_{max}, \tag{5.7}$$
and
$$e^*(k) = k_e\, e(k), \quad \Delta e^*(k) = k_{de}\, \Delta e(k), \quad \Delta u(k) = k_{du}\, \Delta u^*(k), \quad u(k) = k_u\, u^*(k). \tag{5.8}$$
For non-symmetric input universes, the scaling factors are not obtained as simply, but can be derived according to the goal of the mapping.
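As a minimal sketch of (5.4)-(5.8) for symmetric universes, the helper below derives the factors from hypothetical operating ranges and applies them. The input delimiter anticipates the saturation discussed below for kx > 1/xmax. All names and ranges are illustrative:

```python
import numpy as np


def scaling_factors(r_range, y_range, u_range):
    """Scaling factors (5.7), with the universes computed from (5.4)-(5.6).

    Assumes the resulting universes are symmetric (e_min = -e_max, etc.).
    """
    r_min, r_max = r_range
    y_min, y_max = y_range
    u_min, u_max = u_range
    e_max, e_min = r_max - y_min, r_min - y_max   # (5.4)
    de_max = e_max - e_min                         # (5.5)
    du_max = u_max - u_min                         # (5.6)
    return {"ke": 1.0 / e_max, "kde": 1.0 / de_max, "kdu": du_max}


def normalize(x, k):
    """x* = k * x from (5.8), limited to [-1, 1] by a delimiter."""
    return float(np.clip(k * x, -1.0, 1.0))


sf = scaling_factors(r_range=(0.0, 10.0), y_range=(0.0, 10.0), u_range=(0.0, 5.0))
e_star = normalize(3.0, sf["ke"])      # error of 3 units -> approx. 0.3
de_star = normalize(-1.0, sf["kde"])   # change of error of -1 unit -> -0.05
du = sf["kdu"] * 0.2                   # normalized output 0.2 -> 1.0 unit
```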
Scaling factors, in turn, are often considered the primary tuning parameters of fuzzy controllers, because they are easy to modify and bear a certain analogy to the parameters of conventional PID controllers. Heuristic rules such as "high values of ke result in low steady-state error and rise time" have even been developed (Yager and Filev 1994). However, one cannot help noticing that the scaling factors do nothing more than preprocess (or postprocess) the numerical values fed to (or obtained from) the fuzzy controller, and that a similar effect could be achieved by appropriate modification of the input (or output) MFs.
An input scaling factor kx < 1/xmax basically reduces the number of operating membership functions (and rules), because it pushes the marginal membership functions outside the firing zone [xmin, xmax] (Fig. 5.6). The opposite case (kx > 1/xmax) is potentially more dangerous, because it assigns the inputs values beyond the operating range of the normalized controller, which is generally considered an undesirable situation. A simple solution to that problem is to use delimiters, so that each input value smaller (greater) than the lower (upper) limit is set equal to the limit value. With delimiters, scaling factors kx > 1/xmax thus make the control more sensitive around the zero point, which is equivalent to replacing the triangular fuzzy sets situated at the edges of the operating range by trapezoidal MFs (Fig. 5.6).
[Figure: normalized input partition on [-1, 1] mapped onto the operating range [xmin, xmax] for kx < 1/xmax (top) and kx > 1/xmax (bottom)]
Fig. 5.6. Effects from the use of scaling factors on the input partition.
If the output scaling factor ky < ymax, then, again, the effect is the same as if the output MFs were compressed toward the zero point (Fig. 5.7). The effect of the opposite case, ky > ymax, is not that easily summarized, because it usually interacts with the physical limits placed upon the output variable. If the denormalized value is not limited, the effect is linear, i.e. the output range of the controller is enlarged. If the output value reaches the lower or upper limit, however, it is forced to remain constant, and the equivalent input-output relationship cannot be obtained by modifying the output MFs alone (in fact, modification of the input MFs is needed). If the output scaling factor is very large, we obtain a relay-type controller that has only three output values (ymin, 0, ymax).
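The relay effect can be checked numerically. In the sketch below, a hypothetical symmetric output range [-1, 1] is assumed, and a grid of normalized outputs stands in for the controller. With ky = ymax the output is unchanged, whereas a very large ky drives every nonzero output to a limit:

```python
import numpy as np


def denormalize_output(u_star, k_y, y_min, y_max):
    """Operating output k_y * u*, limited to the physical range [y_min, y_max]."""
    return np.clip(k_y * np.asarray(u_star, dtype=float), y_min, y_max)


u_star = np.linspace(-1.0, 1.0, 9)                    # normalized controller outputs
linear = denormalize_output(u_star, 1.0, -1.0, 1.0)   # k_y = y_max: unchanged
relay = denormalize_output(u_star, 1e6, -1.0, 1.0)    # huge k_y: saturates
print(linear)
print(np.unique(relay))                               # only y_min, 0 and y_max remain
```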
[Figure: normalized output partition on [-1, 1] mapped onto [ymin, ymax] for ky < ymax (top) and onto [ky⋅ymin, ky⋅ymax] for ky > ymax (bottom)]
Fig. 5.7. Effects from the use of scaling factors on the output partition.
Most of the effects that derive from the use of scaling factors with kx ≠ 1/xmax and ky ≠ ymax could thus be obtained equally well by modifying MF parameters. For trial-and-error tuning, however, it is often simply much more convenient to use scaling factors. In some approaches, scaling factors are identified from measured data. One can see a parallel with rule weights (section 1.4), which are used because modification of MF parameters would be much more difficult.
Constant scaling factors modify MFs proportionally. In many cases, however, individual tuning of MFs is necessary to further improve the control. In particular, two standard cases can be distinguished: concentrating the MF peaks toward zero to increase the control sensitivity around the setpoint (Fig. 5.8, left), and the opposite case (Fig. 5.8, right).
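[Figure: two normalized partitions on [-1, 1], with MF peaks concentrated toward zero (left) and toward the edges (right)]
Fig. 5.8. Tuning the peaks of MFs of the controller.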
This can be done either manually or by using nonlinear scaling factors. The use
of the latter may involve certain risks.
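To illustrate how a nonlinear scaling factor relocates the MF peaks, assume (our choice, not the text's) a power-law input mapping x* = sign(x)·|x|^α on the normalized domain. A normalized peak c then corresponds to the operating point sign(c)·|c|^(1/α). Thus α < 1 pulls the inner peaks toward zero (Fig. 5.8, left) and α > 1 pushes them outward (Fig. 5.8, right), while the points -1, 0 and 1 stay fixed. The function names are hypothetical:

```python
import numpy as np


def power_scaling(x, alpha):
    """Hypothetical nonlinear scaling x* = sign(x) * |x|**alpha on [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.abs(x) ** alpha


def equivalent_peaks(peaks, alpha):
    """Operating-domain positions of normalized MF peaks under power_scaling."""
    return power_scaling(peaks, 1.0 / alpha)   # inverse of the power mapping


peaks = np.linspace(-1.0, 1.0, 7)
print(equivalent_peaks(peaks, 0.5))   # alpha < 1: inner peaks move toward zero
print(equivalent_peaks(peaks, 2.0))   # alpha > 1: inner peaks move outward
```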
While the majority of fuzzy setpoint controllers are of PI or PD type, there are still cases where a more sophisticated control law (i.e. PID control) is needed. The following control laws have been employed:
$$u(k) = f\Big(e(k), \sum_{i=1}^{k} e(i), \Delta e(k)\Big), \tag{5.9}$$
$$\Delta u(k) = f\big(e(k), e(k-1), e(k-2)\big), \tag{5.10}$$
$$\Delta u(k) = f\big(e(k), \Delta e(k), \Delta^2 e(k)\big), \quad \Delta^2 e(k) = e(k) - 2e(k-1) + e(k-2). \tag{5.11}$$
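[Block diagram: e(k) and ∆e(k), scaled by ke and kde, feed both a fuzzy PD controller (output u*(k), scaled by ku) and a fuzzy PI controller (output ∆u*(k), scaled by kdu); the two branches are combined in a summation block]
Fig. 5.9. A hybrid fuzzy PID controller.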
The larger number of rules compared to PD or PI controllers (with seven sets per variable, a three-input rule base has 7³ = 343 rules instead of 7² = 49) makes controller design difficult. Therefore, a hybrid PID-type controller has been proposed (Li and Gatland 1996) that uses the rule bases of the fuzzy PI and PD controllers and obtains the PID output by summing their individual outputs (Fig. 5.9).
5.3 Fusion of fuzzy and PID control
Most fuzzy control research focuses on the setpoint regulation problem; hence, fuzzy control is often viewed as a form of nonlinear PID control, and comparisons of fuzzy control vs. conventional PID control are frequent in the literature. Conventional PID control, however, is well established and can satisfy the performance requirements of most setpoint regulation problems at minimal cost. Often, the performance improvement offered by fuzzy control cannot compensate for the increased complexity of computation and tuning. Consequently, fuzzy PID control is rare in commercial applications; commercial applications of fuzzy control are largely focused on high-level, task-oriented control that falls outside the domain of conventional control methods (Chiu 1996).
i) Supervision of conventional controllers

In this configuration, a high-level strategy is used to adjust the parameters of the conventional control loops. A common problem with linear PID controllers used for the control of highly nonlinear processes is that a given set of controller parameters produces satisfactory performance only when the process is within a small operational window. Outside this window, different PID controller parameters are necessary, and these adjustments may be made automatically by a higher-level controller (Fig. 5.10).
The control system consists of a conventional discrete PID controller whose proportional, integral and derivative gains (Kp, Ki, Kd) are changed by a fuzzy supervisor at each sampling instant (the fuzzy controller thus acts as a gain scheduler). The inputs of the supervisor might be the error and the change of error (Nauta Lemke and Krijgsman 1991), (Zhao et al. 1993). In (Mizumoto 1987), the supervisor is used to tune the scaling factors of the underlying fuzzy PD controller depending on the overshoot, reaching time and amplitude of oscillation.
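[Block diagram: a preprocessor feeds a fuzzy supervisor, which sends gain corrections ∆Kp, ∆Ki, ∆Kd to a PID controller; the PID controller drives the control object (signals r, u, y)]
Fig. 5.10. Supervisory control system for advanced PID control.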
An alternative gain-scheduling scheme may be implemented by means of a homogeneous 1st order TS system with the following rule format (Jang and Gulley 1994):
IF e is A1r AND ∆e is A2r THEN yr = p1r e + p2r ∆e
Each such rule represents a local PD controller (with a suitable selection of variables, other control laws can be implemented), and the local controllers are combined by means of fuzzy logic. It is thus possible to specify a separate control law for each region of the variable space, as determined by the selection of input MFs. Low
transparency measure of the controller would be useful.
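A minimal sketch of such a 1st order TS controller follows. Only the rule format comes from the text. The three triangular sets per input (N, Z, P on [-1, 1]), the product conjunction and the numerical gains p1r, p2r are hypothetical values chosen for illustration, with a gentler local law near the setpoint:

```python
import numpy as np

PEAKS = np.array([-1.0, 0.0, 1.0])   # hypothetical sets N, Z, P for e and delta-e


def tri(x):
    """Degrees of x in N, Z, P (triangular partition of [-1, 1], x limited)."""
    x = float(np.clip(x, -1.0, 1.0))
    return np.maximum(0.0, 1.0 - np.abs(x - PEAKS))


# Hypothetical local PD gains, indexed [e set, delta-e set]: gentler near the
# setpoint (Z, Z), more aggressive away from it.
P1 = np.array([[2.0, 2.0, 2.0],
               [1.0, 0.5, 1.0],
               [2.0, 2.0, 2.0]])
P2 = np.array([[0.4, 0.4, 0.4],
               [0.2, 0.1, 0.2],
               [0.4, 0.4, 0.4]])


def ts_pd(e, de):
    """y = sum_r w_r (p1r*e + p2r*de) / sum_r w_r with product AND."""
    w = np.outer(tri(e), tri(de))     # firing strengths of the 9 rules
    local = P1 * e + P2 * de          # each local PD law evaluated at (e, de)
    return float(np.sum(w * local) / np.sum(w))


print(ts_pd(1.0, 0.0))   # only rule (P, Z) fires: 2.0 * 1.0
print(ts_pd(0.1, 0.0))   # blend of rules (Z, Z) and (P, Z)
```

At the peak of an input set only one rule fires and the output equals that rule's local PD law; between peaks the local laws are blended by the firing strengths.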
ii) Correction of conventional controllers
Normally, conventional control systems based on PID controllers are capable of controlling the process when operation is smooth and close to normal conditions. However, if sudden changes occur or the process enters abnormal situations, a configuration that is capable of bringing the process back to normal operation as fast as possible may be useful. This idea can be implemented with a parallel connection of fuzzy and PID controllers (Fig. 5.11). In normal operation, the fuzzy controller adds nothing to the output of the PID controller, whereas in abnormal situations it develops a nonzero output that restores the normal state.
[Block diagram: a fuzzy controller and a PID controller connected in parallel between the setpoint r and the control object (signals r, u, y)]
Fig. 5.11. Parallel connection of fuzzy and PID control.
iii) Coordination of control loop set-points

Fuzzy logic can be applied not only to the calculation of the control variable (i.e. direct control) but also to the modification of the control strategy in general, by using a fuzzy supervisor that adjusts the controller setpoints (Fig. 5.12).
[Block diagram: a fuzzy supervisor sets the setpoints r1, r2 of two PID controllers, which drive the control object through u1, u2; measured outputs y1…y4]
Fig. 5.12. Supervisory control system for modifying the control strategy.
The history of such a hierarchical fuzzy control configuration goes back to the first industrial application of fuzzy control (Holmblad and Ostergaard 1982), where the human supervisor of a cement kiln was replaced. Such use of fuzzy logic can be very effective, because the high-level control strategy is often formulated in natural language, and its translation into the language of fuzzy systems may be rather straightforward. Setpoint control, on the other hand, is often more effective if implemented with conventional tools, for the reasons pointed out earlier. The applications described in chapter 6 use this type of hierarchical control architecture exclusively.
5.4 Inversion of fuzzy systems
The design process for fuzzy controllers that makes use of heuristic information
originating from human experts has many successful applications. Supervised
learning of fuzzy controllers has also found use. Both approaches have,
however, several shortcomings, already described in the introduction. It may be
difficult to perform the initial synthesis of the controller or to maintain required
performance as some time passes (because of significant and unpredictable drift
of plant parameters or the presence of noise or some other type of disturbance).
The solution is to use self-learning fuzzy controllers that can adapt to different
plant conditions, i.e. adaptive fuzzy control.
On the other hand, it can be postulated that the ultimate goal of controller design is to derive the inverse model of the process. In theory, the use of an inverse model offers the advantages of open-loop control, i.e. inherent stability and perfect control with zero error. The major concern is whether the inverse configuration actually exists and whether it is physically realizable. Global inversion of the system, where all states become outputs of the inverted model and the output of the original system becomes a state variable (Fig. 5.14, center), normally has a non-unique solution that must be given by a family of solutions. In partial inversion, only one of the states of the original system becomes an output of the inverted model, and the other states together with the original output are the inputs of the inverted model (Fig. 5.14, right). A partially inverted model can also be embedded into the control system more easily than a globally inverted one.
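[Block diagrams: fuzzy model with inputs x1, x2, …, xN and output y (left); inverted model with input y and outputs x1, …, xN (center); inverted model with inputs y, x2, …, xN and output x1 (right)]
Fig. 5.14. Global (center) and partial (right) inversion of the original fuzzy model (left).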
Partial inversion often has a unique solution, but not necessarily: the original model must be strictly monotonic with respect to the inverted state to be invertible.
5.4.1 Numerical inversion of fuzzy systems
The most intuitive way to obtain the inversion is to reverse the input and output data and train an inverse model of the system or process (Fig. 5.15, left). For the sake of simplicity, only two states are considered, with x1 = u(k), x2 = y(k) and y = y(k+1). This type of training has long been practiced with neural networks. Two major drawbacks are characteristic of this approach. First, if several values of u are possible for the same output of the process (many-to-one mapping, Fig. 5.15, right) and a least-squares approach is used, the identification algorithm maps y to the mean of all corresponding u, which can lead to a quite meaningless inverse model. Second, it is difficult to pick an appropriate signal u for inverse learning, as the inverse model is supposed to work over a wide range of amplitudes of y and over a large bandwidth.
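The first drawback is easy to reproduce. In the sketch below, a hypothetical static process y = u² stands in for the many-to-one mapping of Fig. 5.15 (right), and a least-squares polynomial "inverse" stands in for the fitted model; any least-squares fitted model shows the same averaging. Because each y corresponds to both u and -u, the fit collapses to the mean of u, i.e. zero, which is useless as an inverse:

```python
import numpy as np

# Hypothetical static process y = u**2: u and -u give the same output.
u = np.linspace(-1.0, 1.0, 201)
y = u ** 2

# "Inverse model" fitted by least squares on reversed data: u ~ c0 + c1*y + c2*y**2
X = np.column_stack([np.ones_like(y), y, y ** 2])
coef, *_ = np.linalg.lstsq(X, u, rcond=None)
print(np.round(coef, 12))   # all coefficients ~0: the fit predicts the mean of u
```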
The second problem can be solved with the online learning scheme depicted in Fig. 5.16, whose successful application is reported in (Jang 1995). Here, the controller is a copy of the inverse model, and instead of the process input u we need to pick the setpoint r of the controlled process, which is usually known.
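[Figure: left, training scheme with the process f, delays z^-1 and the inverse model (signals u(k), y(k), y(k+1)); right, plot of a non-invertible function y(u)]
Fig. 5.15. Training of the inverse process model (left), non-invertible function (right).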
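[Block diagram: inverse model used as controller (inputs r(k+1), y(k), output u(k)) driving the process f, plus a copy of the inverse model trained online; delays z^-1 and a summation node (+, -)]
Fig. 5.16. Online inverse training.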
Another numerical inversion technique that is generally associated with neural networks is backpropagation through time (temporal backpropagation), which was invented by several researchers simultaneously in the 1980s. One of the most famous applications of this technique is the truck backer-upper simulation (Nguyen and Widrow 1990).
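[Block diagram: inverse model (inputs r(k+1), y(k)) producing u(k) for the process f, with a fuzzy model of the process and a summation node (+, -) forming the error; delay z^-1]
Fig. 5.15. Backpropagation through time.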
The goal of training is to minimize the cost function (5.12) by changing the parameter set θ of the controller,
$$J[\theta(k)] = \frac{1}{2}\big(r(k) - y(k)\big)^2, \tag{5.12}$$
by applying the gradient descent method with the chain rule:
$$\frac{\partial J[\theta(k)]}{\partial \theta(k)} = \frac{\partial J[\theta(k)]}{\partial y(k)}\,\frac{\partial y(k)}{\partial u(k)}\,\frac{\partial u(k)}{\partial \theta(k)}. \tag{5.13}$$
Note that the error is backpropagated through the model, although the parameters of the model are not updated by gradient descent. The model of the system can be a neural emulator, a neuro-fuzzy system or even a set of mathematical equations, as demonstrated in (Jang 1992), where the method, implemented with ANFIS, is used to balance an inverted pendulum.
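The chain rule (5.13) can be illustrated with a deliberately simple sketch. A hypothetical linear plant model y(k+1) = A·y(k) + B·u(k) with fixed, known A and B stands in for the neural or fuzzy model. The controller is linear, u(k) = θ1·r(k+1) + θ2·y(k), instead of a fuzzy system. The error is propagated through the model via ∂y/∂u = B, but only θ is updated. The exact inverse of this model is u = (r - A·y)/B, so θ should converge to (1/B, -A/B):

```python
import numpy as np

A, B = 0.8, 0.5   # hypothetical, fixed plant-model parameters


def plant_model(y, u):
    """y(k+1) = A*y(k) + B*u(k); dy(k+1)/du(k) = B."""
    return A * y + B * u


def train_controller(steps=2000, lr=0.5, seed=0):
    """Gradient descent on J = 0.5*(r - y)^2 using the chain rule (5.13)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)                 # u(k) = theta[0]*r(k+1) + theta[1]*y(k)
    for _ in range(steps):
        r, y = rng.uniform(-1.0, 1.0, size=2)
        phi = np.array([r, y])          # du/dtheta
        y_next = plant_model(y, theta @ phi)
        dJ_dy = -(r - y_next)
        dy_du = B                       # error backpropagated through the model
        theta -= lr * dJ_dy * dy_du * phi   # model parameters stay unchanged
    return theta


theta = train_controller()
print(theta)   # close to the exact inverse (1/B, -A/B) = (2.0, -1.6)
```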
5.4.2 Non-numerical inversion of fuzzy systems
The numerical identification of the inverse model may become computationally expensive, requiring many training epochs and many training samples. The issue of invertibility is also important and is not handled very well by automatic generation of the inverted model. The techniques for training the inverted fuzzy model have become known basically through neural network research. Fuzzy systems, however, are fundamentally different from neural networks because (a) they can be interpreted in linguistic terms, and (b) if transparent, their parameters can be interpreted in terms of their influence on the input-output relationship. The first feature allows linguistic inversion (Braae and Rutherford 1979), (Fantuzzi 1994), (Raymond et al. 1995). The second feature, as shown by Babuska (1996) and Baranyi et al. (1997), allows exact analytical inversion.
Linguistic inversion (causality inversion) is obtained through the exchange of
antecedent and consequent variables in fuzzy rules if a symmetrical operator (t-
norm) represents the if-then relation (Babuska and Verbruggen 1996c).
Consider a three-input fuzzy system from (Baranyi et al. 1998), given in Table 5.3.
Table 5.3. Rule base of the original model (entries are labels of the output V).

U2, U3                      U1 is small   U1 is medium   U1 is large
U2 is low AND U3 is low     zero          low            medium
U2 is low AND U3 is high    low           medium         high
U2 is high AND U3 is low    low           medium         high
U2 is high AND U3 is high   medium        medium         high
The inversion procedure may have three possible results, marked in Table 5.4:
i) The input configuration is unique. This is the ideal case.
ii) The input configuration is non-unique, which means that the rule base is non-invertible. The approximate solution is to choose the input configuration with the lowest control energy (in the linguistic sense).
iii) There are no inputs that allow a one-step transition to the desired output. Such a situation occurs for two reasons. First, the numbers of MFs given for U1 and V are not equal, i.e. there are 16 rules in the inverted model, whereas the original model has 12 rules. Second, the original system may simply not allow a one-step transition to the desired output from the given state. The approximate solution is to choose the "nearest" output (again in the linguistic sense).
Table 5.4. Inverted rule base.

U2, U3                      V is zero     V is low      V is medium   V is high
U2 is low AND U3 is low     small (i)     medium (i)    large (i)     large (iii)
U2 is low AND U3 is high    small (iii)   small (i)     medium (i)    large (i)
U2 is high AND U3 is low    small (iii)   small (i)     medium (i)    large (i)
U2 is high AND U3 is high   small (iii)   small (iii)   small (ii)    large (i)
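The procedure behind Table 5.4 can be sketched in a few lines. Two assumptions are ours, not the text's: "nearest" is measured as the distance between label indices of V, and "lowest control energy" means the first label in the order small < medium < large. With these assumptions the sketch reproduces every entry of Table 5.4. The names are hypothetical:

```python
U1 = ["small", "medium", "large"]          # ordered by (assumed) control energy
V = ["zero", "low", "medium", "high"]      # ordered output labels

# Table 5.3: for each (U2, U3) premise, the V label produced by each U1 label.
ORIGINAL = {
    ("low", "low"): ["zero", "low", "medium"],
    ("low", "high"): ["low", "medium", "high"],
    ("high", "low"): ["low", "medium", "high"],
    ("high", "high"): ["medium", "medium", "high"],
}


def invert_cell(v_of_u1, target):
    """Return (U1 label, case) that drives V to `target` for one premise row."""
    dist = [abs(V.index(v) - V.index(target)) for v in v_of_u1]
    best = min(dist)
    candidates = [k for k, d in enumerate(dist) if d == best]
    u1 = U1[candidates[0]]      # lowest index = lowest control energy (assumption)
    if best > 0:
        case = "iii"            # no one-step transition: take the nearest output
    elif len(candidates) > 1:
        case = "ii"             # non-unique input configuration
    else:
        case = "i"              # unique input configuration
    return u1, case


def invert(original):
    """Partial linguistic inversion: (U2, U3, V) -> U1, as in Table 5.4."""
    return {(row, target): invert_cell(v_of_u1, target)
            for row, v_of_u1 in original.items() for target in V}


inverted = invert(ORIGINAL)
for row in ORIGINAL:
    cells = [inverted[(row, target)] for target in V]
    print(row, [f"{u1} ({case})" for u1, case in cells])
```

Local inversion for a fixed goal corresponds to a single column of this result; for V is medium, `[inverted[(row, "medium")][0] for row in ORIGINAL]` gives large, medium, medium, small, matching the four local rules given below.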
One must pay careful attention to the MFs of the inverted model. Input and output MFs may be of different types (e.g. triangular input and singleton output MFs), and therefore conversion of MFs is in order. The conversion algorithms are rather straightforward. Conversion is needed even when linguistic inversion is performed on standard fuzzy systems with triangular MFs (unless the MFs are uniformly distributed on all variables), because the input and output transparency constraints contradict each other.
If the inversion goal is fixed (let us, for instance, assume that the desired output V is medium), it is possible to perform local inversion of the model. The difference between partial and local linguistic inversion is that in local inversion, while the inverted input (U1) becomes the output, the inverted output (V) is never used in the final inverted model. Of the rules that have linguistically identical premises in terms of U2 and U3 (i.e. the rows of Table 5.3), only the one that results in the desired V is selected. Thus, the local (indirect) inversion of the model in Table 5.3 would appear as
IF U2 is low AND U3 is low THEN U1 is large
IF U2 is low AND U3 is high THEN U1 is medium
IF U2 is high AND U3 is low THEN U1 is medium
IF U2 is high AND U3 is high THEN U1 is small (ii)
It is easy to see the advantages of this approach:
i) The number of rules of the inverted model is Sy times smaller than in the case of linguistic inversion, where Sy is the number of linguistic labels of the output (V in the example above).
ii) The type of consequent MFs does not matter, as they are never inverted. For instance, models with incremental rule bases would be extremely difficult to invert linguistically because of the large number of consequent MFs. With indirect inversion this is not a problem; rather the opposite.
iii) There are fewer problems with type (ii) and (iii) rules.
The basic disadvantage of local inversion is that the inverted model is valid only for the fixed inversion goal; with multiple goals, the model must be re-inverted each time.
Analytical inversion of SISO fuzzy systems
As observed in section 3.3.1, SISO 0th order TS systems yield piecewise linear interpolation between output MFs, given a suitable selection of inference parameters and triangular input MFs that form a partition. This property is the basis of the inversion technique (Babuska et al. 1995).
Such a system (note that in the SISO case there is no mismatch between rule-oriented and variable-oriented notation) may be defined with the following rule format:
IF x is Ar THEN y is br. (5.14)
Input MFs of (5.14) are defined by (5.15) (Jager partition):
$$
\mu_{A_r}(x) =
\begin{cases}
\dfrac{x - a_{r-1}}{a_r - a_{r-1}}, & a_{r-1} < x \le a_r, \\[6pt]
\dfrac{a_{r+1} - x}{a_{r+1} - a_r}, & a_r < x < a_{r+1}, \\[6pt]
0, & \text{otherwise}.
\end{cases}
\qquad (5.15)
$$
Rule base (5.14) can then be inverted as
IF y(k) is Br THEN x(k) is ar, (5.16)
where
$$
\mu_{B_r}(y) =
\begin{cases}
\dfrac{y - b_{r-1}}{b_r - b_{r-1}}, & b_{r-1} < y \le b_r, \\[6pt]
\dfrac{b_{r+1} - y}{b_{r+1} - b_r}, & b_r < y < b_{r+1}, \\[6pt]
0, & \text{otherwise}.
\end{cases}
\qquad (5.17)
$$
The inversion is exact (Fig. 5.16) if the system is invertible, i.e.
$$
a_1 < a_2 < \dots < a_R \;\Rightarrow\; b_1 < b_2 < \dots < b_R
\quad \text{or} \quad
a_1 < a_2 < \dots < a_R \;\Rightarrow\; b_1 > b_2 > \dots > b_R.
\qquad (5.18)
$$
Fig. 5.16. Inversion of a SISO 0th order TS system.
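A minimal Python sketch of (5.14)-(5.18), assuming triangular partition MFs (5.15) and weighted-average defuzzification, so that the model reduces to piecewise linear interpolation between the points (a_r, b_r). The inverse simply swaps the roles of the cores a_r and the consequents b_r. The function names are hypothetical, and inputs outside the core range are clipped rather than extrapolated.

```python
import numpy as np


def partition_memberships(x, cores):
    """Memberships of x in triangular partition MFs (5.15) with ascending cores.

    At most two neighbouring MFs are nonzero and they sum to 1. Assumption: x is
    clipped to [cores[0], cores[-1]], i.e. there is no extrapolation.
    """
    cores = np.asarray(cores, dtype=float)
    x = float(np.clip(x, cores[0], cores[-1]))
    mu = np.zeros(len(cores))
    r = min(int(np.searchsorted(cores, x, side="right")) - 1, len(cores) - 2)
    w = (x - cores[r]) / (cores[r + 1] - cores[r])
    mu[r], mu[r + 1] = 1.0 - w, w
    return mu


def ts0_output(x, cores, consequents):
    """0th order TS output: weighted average of the singleton consequents."""
    mu = partition_memberships(x, cores)
    return float(mu @ np.asarray(consequents, dtype=float) / mu.sum())


def invert_siso(a, b):
    """Cores and consequents of the inverse model (5.16): IF y is B_r THEN x is a_r."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    steps = np.diff(b)
    if np.all(steps > 0):
        return b, a
    if np.all(steps < 0):
        # Decreasing consequents: reverse both so that the new cores are ascending.
        return b[::-1], a[::-1]
    raise ValueError("not invertible: consequents are not strictly monotone, cf. (5.18)")


a = [0.0, 1.0, 2.0, 3.0, 4.0]          # input MF cores a_r
b = [0.0, 0.5, 2.0, 2.5, 4.0]          # consequents b_r (strictly increasing)
y = ts0_output(1.5, a, b)              # 1.25
inv_cores, inv_consequents = invert_siso(a, b)
x = ts0_output(y, inv_cores, inv_consequents)  # back to 1.5
```

Because both the model and its inverse are piecewise linear through the same points, composing them returns the original input anywhere inside [a_1, a_R].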
Analytical inversion of TISO fuzzy systems
If the input MFs of a two-input/single-output (TISO) fuzzy system overlap by
50%, at most four rules (2^N, here with N = 2) can fire at the same time. We
concentrate on these four rules (5.19).
IF x1 is A1r AND x2 is A2r THEN y is br, r = 1 … 4 (5.19)
System output is calculated by (5.20), with the mapping between the rule-oriented
and variable-oriented notation given in Table 5.5.
$$
y = \frac{\sum_{r=1}^{4} \tau_r b_r}{\sum_{r=1}^{4} \tau_r}
\qquad (5.20)
$$
Table 5.5. Rule base of (5.19).

| x1 \ x2 | A21 | A22 |
|---|---|---|
| A11 | b1 | b2 |
| A12 | b3 | b4 |
The function described by (5.20) is depicted in Fig. 5.17, left (only the borders
of the surface are shown). The number of output MFs exceeds the number of
MFs of x1, thus the rule base of the inverted model (depicted in Fig. 5.17, right) is
incomplete. The missing rules in Table 5.6 are denoted by ami, i = 1…4.
Table 5.6. Inverted rule base.

| x2 \ y | b1 | b2 | b3 | b4 |
|---|---|---|---|---|
| A21 | A11 | am1 | A12 | am2 |
| A22 | am3 | A11 | am4 | A12 |
Fig. 5.17. Inversion of a four-rule TISO 0th order TS system.
Consequents am1 and am4 can be found inside the domain of the original model
(Fig. 5.18). Consequents am2 and am3, however, lie outside this domain, and the
model therefore has to be extrapolated in order to find their values.
Fig. 5.18. Extrapolated inverse model.

Fig. 5.19. MF conversion scheme.
It can be observed that half of the rules in Table 5.6 are redundant (rules
containing b2 and b3 are described by other rules). Thus the rule base in Table
5.6 can be reduced to the one in Table 5.7.
Table 5.7. Reduced rule base.

| x2 \ y | b1 | b4 |
|---|---|---|
| A21 | A11 | am2 |
| A22 | am3 | A12 |

The MF conversion scheme of the system is depicted in Fig. 5.19 (bmin = min(b1, b2,
b3, b4), bmax = max(b1, b2, b3, b4)).
Output of the inverted model can then be computed by the following expression
$$
x_1(k) = \frac{\sum_{r=1}^{4} \tau_r d_r z_r}{\sum_{r=1}^{4} \tau_r z_r},
\qquad (5.21)
$$
where the parameters of the new output MFs, according to (Baranyi et al. 1998), are
given by (5.22)-(5.25) ($s_r = s_{r+1} = 1$; $r = 1, 3$):
$$
z_r = (b_r - b_{\min}) s_r + (b_{\min} - b_{r+1}) s_{r+1},
\qquad (5.22)
$$
$$
z_{r+1} = (b_r - b_{\max}) s_r + (b_{\max} - b_{r+1}) s_{r+1},
\qquad (5.23)
$$
$$
d_r = \frac{(b_r - b_{\min}) a_{12} s_r + (b_{\min} - b_{r+1}) a_{11} s_{r+1}}{(b_r - b_{\min}) s_r + (b_{\min} - b_{r+1}) s_{r+1}},
\qquad (5.24)
$$
$$
d_{r+1} = \frac{(b_r - b_{\max}) a_{12} s_r + (b_{\max} - b_{r+1}) a_{11} s_{r+1}}{(b_r - b_{\max}) s_r + (b_{\max} - b_{r+1}) s_{r+1}}.
\qquad (5.25)
$$
Note that the extended (extrapolated) part of the function described by the inverse
model is only needed to span the domain of y; it is not part of the solution space of
the original model. This presents a problem when more than four rules are to be
inverted. Consider the inverse of a six-rule system (Fig. 5.20): for y between b3 and
b4, the inverse model of the combined surface gives an inexact inverse, because it
returns a value between two surfaces, only one of which is correct.
5.4.4 Control by inverting a fuzzy model
In this section, extensions of both previously described approaches are given.
In the general case, a plant is described by a number of rules of the following format:
IF y(k) is A1r AND y(k - 1) is A2r AND … AND y(k - n + 1) is Anr
AND u(k - m) is B1r AND … AND u(k - m - n + 1) is Bmr THEN y(k + 1) is cr (5.23)
where the constants m and n determine the order of the plant. In the following,
we assume that there is no delay from the input to the output (m = 0).
Fig. 5.20. Neighboring cells overlap, which introduces an inversion error.
Extension of SISO inversion
According to (Babuska et al. 1995), (5.23) can then be written in the compact form
IF x(k) is Ar AND u(k) is B1r THEN y(k + 1) is cr (5.24)
where the state vector x(k) contains all antecedent variables except u(k); from now
on we denote them as xi (i = 1, …, m + n - 1). The corresponding fuzzy sets are
composed into one multidimensional fuzzy set A by applying a t-norm operator
(product) on the Cartesian product space of the variables:
$$
\mu_{A_r}(\mathbf{x}(k)) = \prod_{i=1}^{n+m-1} \mu_{ir}(x_i)
$$
Assuming that the rule base of (5.23) is complete, the number of
multidimensional fuzzy sets Ar is equal to M, the product of the numbers of
MFs of the state variables xi. The rule base of (5.23) thus contains the rules in Table
5.8.
Table 5.8. Rule base of (5.23).

| x(k) \ u(k) | B1 | B2 | … | BN |
|---|---|---|---|---|
| A1 | c11 | c12 | … | c1N |
| A2 | c21 | c22 | … | c2N |
| … | … | … | … | … |
| AM | cM1 | cM2 | … | cMN |
Fig. 5.21. Splitting non-invertible rule bases.
The controller for the given process is the (partial) inversion of the original
model, given as follows:
IF x(k) is Ar AND r(k + 1) is C1r THEN u(k) is br (5.25)
The control law (5.25), obtained by inverting (5.24), holds for a given state x(k),
where the cores of the output MFs are computed as
$$
b_r = \sum_{j=1}^{M} \mu_{A_j}(\mathbf{x}(k))\, c_{jr}, \qquad r = 1, \dots, R.
\qquad (5.26)
$$
The inversion for the given state is exact because the system is compressed into
a SISO system (the projection of y onto the axis of u for the given x(k)). This
means that the inverted model must be computed individually for each state and
is therefore of a temporary nature.
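The per-state computation of (5.26) followed by SISO inversion can be sketched as below. Assumptions: triangular partition MFs on every state variable and on u, the product t-norm, and rows of the consequent matrix C (Table 5.8) ordered like itertools.product over the state labels. Names such as slice_consequents are hypothetical. For brevity the inversion uses np.interp, which coincides with the SISO inverse when the input MFs form a triangular partition; references outside the reachable range are clamped, i.e. the action saturates at an extreme core of u.

```python
import numpy as np


def partition_memberships(x, cores):
    """Triangular partition MFs with ascending cores; x is clipped to the core range."""
    cores = np.asarray(cores, dtype=float)
    x = float(np.clip(x, cores[0], cores[-1]))
    mu = np.zeros(len(cores))
    r = min(int(np.searchsorted(cores, x, side="right")) - 1, len(cores) - 2)
    w = (x - cores[r]) / (cores[r + 1] - cores[r])
    mu[r], mu[r + 1] = 1.0 - w, w
    return mu


def slice_consequents(state, state_cores, C):
    """(5.26): sum_j mu_Aj(x(k)) * c_jr for every r, i.e. the SISO slice at x(k).

    mu_Aj is the product t-norm of the one-dimensional memberships; the rows j of
    C are assumed to follow itertools.product order over the state labels.
    """
    mu_A = np.ones(1)
    for x, cores in zip(state, state_cores):
        mu_A = np.outer(mu_A, partition_memberships(x, cores)).ravel()
    return mu_A @ np.asarray(C, dtype=float)


def control_action(r_next, u_cores, slice_b):
    """u(k) that drives the slice model to r(k + 1); needs strictly monotone consequents."""
    u_cores, slice_b = np.asarray(u_cores, dtype=float), np.asarray(slice_b, dtype=float)
    steps = np.diff(slice_b)
    if np.all(steps > 0):
        return float(np.interp(r_next, slice_b, u_cores))
    if np.all(steps < 0):
        return float(np.interp(r_next, slice_b[::-1], u_cores[::-1]))
    raise ValueError("slice is not invertible for this state; it must be split (Fig. 5.21)")


# One state variable y(k) with cores 0, 1, 2 and three MFs on u(k) (hypothetical numbers).
state_cores = [[0.0, 1.0, 2.0]]
u_cores = [-1.0, 0.0, 1.0]
C = [[-0.5, 0.0, 0.5],
     [0.5, 1.0, 1.5],
     [1.5, 2.0, 2.5]]
slice_b = slice_consequents([0.5], state_cores, C)   # [0.0, 0.5, 1.0]
u = control_action(0.75, u_cores, slice_b)           # 0.5
```

The slice, and therefore the inverse, is recomputed at every sampling instant, which is what makes the inverted model "temporary".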
A fuzzy model is invertible if each output corresponds to a unique input. In the
case of partial inversion, this means that the inverse function x1 = f^-1(y, x2, …,
xN) must yield a single value of x1, i.e. the function described by the original
model is strictly monotone with respect to x1. If the original model describes a
non-invertible function, it must be split into two or more invertible parts (Fig.
5.21), and the control action is found for each part individually, using the
standard inversion procedure (5.25)-(5.26).
Only one of these control actions can be applied; it can be selected using some
additional control criterion, e.g. minimal control effort u(k). The
invertibility of the system is verified on-line.
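Splitting a non-invertible slice and choosing among the candidate actions might look as follows. This is a sketch under assumptions: the slice is given by the cores of the u MFs and the state-dependent consequents, the split points are the local extrema of the consequent sequence, and the additional criterion is minimal |u(k)|. The function names are hypothetical.

```python
import numpy as np


def monotone_parts(b):
    """Split rule indices 0..R-1 into parts on which b changes monotonically.

    A new part starts at every local extremum of b (sign change of the differences),
    in the spirit of Fig. 5.21; the extremum belongs to both neighbouring parts.
    """
    parts, start = [], 0
    for r in range(1, len(b) - 1):
        if (b[r] - b[r - 1]) * (b[r + 1] - b[r]) <= 0:
            parts.append((start, r))
            start = r
    parts.append((start, len(b) - 1))
    return parts


def min_effort_action(target, u_cores, b):
    """Invert every invertible part and return the candidate u(k) with the smallest |u|."""
    u_cores, b = np.asarray(u_cores, dtype=float), np.asarray(b, dtype=float)
    candidates = []
    for lo, hi in monotone_parts(b):
        bs, us = b[lo:hi + 1], u_cores[lo:hi + 1]
        if bs[-1] < bs[0]:
            bs, us = bs[::-1], us[::-1]
        if not np.all(np.diff(bs) > 0):
            continue  # flat part: no unique inverse
        if bs[0] <= target <= bs[-1]:  # invert only inside the part's own domain
            candidates.append(float(np.interp(target, bs, us)))
    if not candidates:
        return None  # the target cannot be reached from the current state
    return min(candidates, key=abs)  # additional criterion: minimal control effort


u_cores = [-1.0, 0.0, 1.0, 2.0]
b = [2.0, 0.0, 1.0, 2.0]  # non-monotone slice: one decreasing and one increasing part
u = min_effort_action(1.5, u_cores, b)  # candidates -0.75 and 1.5, so -0.75 is chosen
```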
Extension of TISO inversion
The TISO inversion method described above needs to be extended for models with
more input variables. At a given input-output data pair, it is sufficient to
consider SN N-input/single-output models with two antecedent MFs in each
input, because in a fuzzy partition a given sample can fire at most
two MFs of a variable simultaneously. Each of these local models has 2^N rules,
and the output MF parameters of the corresponding inverse models are obtained by
applying (5.22)-(5.25) with r = 1, 3, …, 2^N - 1. To obtain the exact
inverse, a selection mechanism first chooses the inverse models whose domain
[bmin, bmax] contains the input y. Each of the chosen inverse models computes an
inverse value of x1. These values are then checked against [a11, a12] to exclude
the values of x1 that were obtained from the extrapolated area.
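The selection mechanism (invert each local model, then discard the values of x1 that come from the extrapolated area) can be illustrated for the two-input case. Note that this sketch does not implement Baranyi's MF conversion (5.22)-(5.25). It uses a simpler direct procedure instead, assuming the product t-norm, triangular partition MFs and singleton consequents, under which the model is linear in x1 inside every cell once x2 is fixed. The function name is hypothetical.

```python
import numpy as np


def tiso_inverse(y, x2, a1, a2, B):
    """All x1 with f(x1, x2) = y for a grid-based 0th order TS model.

    a1, a2: ascending cores of the triangular partitions of x1 and x2;
    B[i][j]: consequent of IF x1 is A1i AND x2 is A2j THEN y is B[i][j].
    """
    a1, a2, B = (np.asarray(v, dtype=float) for v in (a1, a2, B))
    x2 = float(np.clip(x2, a2[0], a2[-1]))
    j = min(int(np.searchsorted(a2, x2, side="right")) - 1, len(a2) - 2)
    w = (x2 - a2[j]) / (a2[j + 1] - a2[j])
    # With product t-norm and partition MFs, fixing x2 makes the model linear in x1
    # inside each cell; these are its values at the x1 cores.
    edge = (1.0 - w) * B[:, j] + w * B[:, j + 1]
    solutions = []
    for i in range(len(a1) - 1):
        lo, hi = edge[i], edge[i + 1]
        if lo == hi:
            continue  # flat in x1: no unique inverse in this cell
        x1 = a1[i] + (y - lo) / (hi - lo) * (a1[i + 1] - a1[i])
        # Reject values that come from the extrapolated area of this local inverse.
        inside = a1[i] <= x1 <= a1[i + 1]
        if inside and not any(np.isclose(x1, s) for s in solutions):
            solutions.append(float(x1))
    return solutions


a1, a2 = [0.0, 1.0, 2.0], [0.0, 1.0]
B = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # samples of y = 2*x1 + x2 (illustrative)
print(tiso_inverse(3.5, 0.5, a1, a2, B))   # [1.5]
```

For a non-monotone rule base the function returns several candidates, and one of them has to be chosen with an additional criterion, as discussed for the SISO extension.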
Both extensions result in an exact inversion of an invertible fuzzy system (Figs.
5.22 and 5.23). Baranyi's method is more general, as it allows inversion of
transparent standard fuzzy systems, whereas Babuska's method is limited to 0th
order TS systems. Baranyi's method, however, is more difficult to implement.
Fig. 5.22. Inverting an invertible rule base.

Fig. 5.23. Inverting a non-invertible rule base.
In theory, the inverted control scheme facilitates perfect control and inherent
stability, but disturbances, measurement noise and model-plant mismatch all
cause the behavior of the control system to differ from this ideal.
5.5 Control example
Here we present a comparison of selected control methods described above.
The controlled object depicted in Fig. 5.24 is a water tank with an inflow pipe
and an outflow pipe (MATLAB demo). The outflow rate Fo depends on the
diameter of the outflow pipe (which is constant) and on the pressure in the tank
(which varies with the water level h ∈ [0, 2]). The system therefore has strongly
nonlinear characteristics. Our task is to maintain a predetermined water
level in the tank (we choose a set-point that changes between 0.5 and 1.5) by
adjusting the valve that lets the water flow in. The inflow rate is F ∈ [-1, 1]. The
generic control system that suits most of the designed controllers is depicted
in Fig. 5.25.
Fig. 5.24. Control object (water tank with inflow F and outflow Fo).

Fig. 5.25. Control system (controller, valve and tank; signals r(k + 1), u(k), F(k), h(k)).
Conventional control
There is not much to comment on conventional setpoint control. Near-optimal
tuning parameters are not difficult to find: for P control, Kp = 100; for PD
control, Kp = 7 and Kd = 10. Applying PI or PID control does not make much sense,
as these control laws bring no performance improvement. The problem with P
control is that it does not remove oscillations (Fig. 5.26). The problem with PD
control is that, while it results in superb performance in a noise-free environment,
it is very sensitive to output-additive white noise even if the noise level is not
very high (presently 1%), as can be seen from Fig. 5.27. PD control would
presumably benefit from the internal model control scheme (Dutton et al. 1996),
which can cancel the influence of output-additive noise.
Fig. 5.26. P control: water level h versus time t. Noise-free (left), with noisy data (right).

Fig. 5.27. PD control: water level h versus time t. Noise-free (left), with noisy data (right).
MATLAB fuzzy controller
The controller developed for the MATLAB demo is quite interesting. It takes
e(k) and dh(k)/dt as its inputs, and its non-traditional rule base consists of
only five rules:
IF e(k) is Z THEN u(k) is Z
IF e(k) is P THEN u(k) is PB
IF e(k) is N THEN u(k) is NB
IF e(k) is Z AND dh(k) is P THEN u(k) is NS
IF e(k) is Z AND dh(k) is N THEN u(k) is PS
The first three rules are responsible for the general control strategy (relay-type
logic), whereas the last two ensure smooth action around the setpoint. It is
interesting to note that the controller is non-transparent even though its MFs do
not violate any of the transparency conditions. The first rule always fires whenever
e(k) falls into Z; thus the controller output cannot reach the NS and PS values even
if the respective rules are fully activated, but falls somewhere between Z and those
MFs. Nevertheless, as has been shown many times, control performance does not
necessarily suffer from non-transparency. The controller is also surprisingly
insensitive to noise (Fig. 5.28).
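A small numerical illustration of why the demo controller cannot reach NS: if the first and fourth rules both fire with degree 1, a weighted average lands halfway between Z and NS. The singleton positions are hypothetical, and weighted-average defuzzification is an assumption (the demo's own inference settings may differ).

```python
# Hypothetical singleton positions of the output labels (not taken from the demo).
CENTERS = {"NB": -1.0, "NS": -0.1, "Z": 0.0, "PS": 0.1, "PB": 1.0}


def weighted_average(fired):
    """Defuzzify a list of (output label, firing strength) pairs."""
    total = sum(strength for _, strength in fired)
    return sum(CENTERS[label] * strength for label, strength in fired) / total


# e(k) fully Z and dh(k) fully P: rule 1 (-> Z) and rule 4 (-> NS) both fire with degree 1.
u = weighted_average([("Z", 1.0), ("NS", 1.0)])
print(u)  # halfway between Z and NS, not NS itself
```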
Fig. 5.28. Fuzzy control (MATLAB demo): water level h versus time t. Noise-free (left), with noisy data (right).
Fuzzy PD controller
The fuzzy PD controller is constructed according to the guidelines of section 5.2,
with 5 fuzzy sets on each input variable, and thus consists of 25 fuzzy rules. The
input MFs have an uneven partition as in Fig. 5.8, left, and in the final tuning phase
the scaling factors ke = 0.8, kde = 5 and ku = 1.5 are specified. This way, rather good
control performance can be obtained (Fig. 5.29), and fuzzy PD control is also less
sensitive to noise, as can be seen from Fig. 5.29, right.
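For illustration, a 25-rule fuzzy PD controller with scaling factors can be sketched as follows. Only the structure (5 sets per input, 25 rules) and the values ke = 0.8, kde = 5, ku = 1.5 come from the text. The even input partition (the thesis uses an uneven one), the rule table and the singleton consequents are assumptions. Δe(k) = e(k) - e(k - 1) is taken as the discrete derivative.

```python
import numpy as np

CORES = np.linspace(-1.0, 1.0, 5)  # NB, NS, Z, PS, PB on a normalized axis (even: assumed)
# Hypothetical 25-rule table: consequent singleton = clipped sum of the antecedent cores.
RULES = np.clip(CORES[:, None] + CORES[None, :], -1.0, 1.0)


def memberships(x):
    """Triangular partition memberships on the normalized axis; x is clipped to [-1, 1]."""
    x = float(np.clip(x, CORES[0], CORES[-1]))
    mu = np.zeros(len(CORES))
    r = min(int(np.searchsorted(CORES, x, side="right")) - 1, len(CORES) - 2)
    w = (x - CORES[r]) / (CORES[r + 1] - CORES[r])
    mu[r], mu[r + 1] = 1.0 - w, w
    return mu


def fuzzy_pd(e, de, ke=0.8, kde=5.0, ku=1.5):
    """u = ku * FLC(ke * e, kde * de); product t-norm, weighted-average defuzzification."""
    mu_e, mu_de = memberships(ke * e), memberships(kde * de)
    # Firing strengths mu_e[i] * mu_de[j] sum to 1, so no explicit normalization is needed.
    return ku * float(mu_e @ RULES @ mu_de)


e_prev, e_now = 0.105, 0.125
u = fuzzy_pd(e_now, e_now - e_prev)  # discrete derivative: delta e(k) = e(k) - e(k - 1)
```

Near the origin this particular table makes the controller behave like the linear law u = ku(ke·e + kde·Δe); the clipped corners of the rule table introduce saturation.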
Fig. 5.29. Fuzzy PD control: water level h versus time t. Noise-free (left), with noisy data (right).
Control using analytical inversion
In this approach, a (transparent) fuzzy model of the controlled process must be
identified first. We collect 1000 samples of the system response (Fig. 5.30, right)
generated with input noise (Fig. 5.30, left).
Fig. 5.30. Collected data. Input u (left), output h (right). Sampling interval is 0.5 s.
Based on this collection, the modeling data set is constructed in the format
h(k + 1) = f(h(k), h(k - 1), u(k)) (5.27)
One problem with this method is that it is not always simple to choose the
appropriate u algorithmically if the rule base of the model is non-invertible. We
choose 2 MFs for each input variable. This way, non-invertibility is avoided and
the modeling error also seems low enough (RMSE = 0.0261).
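Constructing the regression data for (5.27) from the recorded sequences, and computing the RMSE used to judge the fit, can be sketched as follows. This uses plain NumPy; the function names are hypothetical, and the fuzzy identification step itself is not shown.

```python
import numpy as np


def narx_dataset(h, u):
    """Regressors [h(k), h(k - 1), u(k)] and targets h(k + 1) for (5.27), k = 1 ... N - 2."""
    h, u = np.asarray(h, dtype=float), np.asarray(u, dtype=float)
    X = np.column_stack([h[1:-1], h[:-2], u[1:-1]])
    y = h[2:]
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted levels."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy sequences; with the 1000 recorded samples this yields 998 training pairs.
h = [0.0, 1.0, 2.0, 3.0, 4.0]
u = [10.0, 11.0, 12.0, 13.0, 14.0]
X, y = narx_dataset(h, u)
```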
A more serious problem with (5.27), however, is that the inverted controller u(k) =
f(h(k), h(k - 1), r(k + 1)) pays no attention to system inertia. It always chooses a
positive u whenever the error is positive (and vice versa), never slows down near
the setpoint, and thus performs like a (poorly tuned) P controller (Fig. 5.31).
Fig. 5.31. Inverted model based control: water level h versus time t. Noise-free (left), with noisy data (right).
To give the controller some predictive power (derivative gain, so to speak), we
use the model structure
h(k + 2) = f(h(k + 1), h(k), u(k)), (5.27)
in inverted mode
u(k) = f(h(k), h(k + 1), r(k + 2)), (5.28)
where the unknown h(k + 1) is predicted recursively using the fuzzy model (5.27):
h(k + 1) = f(h(k), h(k - 1), u(k - 1)). (5.29)
The resulting controller has much better performance (Fig. 5.32). Presumably,
with a more complex (and accurate) model, even better results could be
obtained.
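One step of the ahead-inverted controller (5.28)-(5.29) can be written as below. The identified fuzzy model and its inverse are replaced by a hypothetical linear stand-in purely so that the example runs; the coefficients A, B, C are not from the thesis.

```python
from typing import Callable

Model = Callable[[float, float, float], float]


def ahead_inverse_control(h_k: float, h_km1: float, u_km1: float, r_kp2: float,
                          model: Model, inverse: Model) -> float:
    """u(k) from (5.28), using the prediction (5.29) for the unmeasured h(k + 1)."""
    # (5.29): h(k+1) = f(h(k), h(k-1), u(k-1)), the same model shifted back one step.
    h_kp1 = model(h_k, h_km1, u_km1)
    # (5.28): invert h(k+2) = f(h(k+1), h(k), u(k)) with respect to u(k).
    return inverse(h_kp1, h_k, r_kp2)


# Hypothetical linear stand-in for the identified fuzzy model and its exact inverse.
A, B, C = 1.2, -0.3, 0.05


def linear_model(h1: float, h0: float, u: float) -> float:
    return A * h1 + B * h0 + C * u


def linear_inverse(h1: float, h0: float, target: float) -> float:
    return (target - A * h1 - B * h0) / C


u = ahead_inverse_control(1.0, 0.95, 0.2, 1.5, linear_model, linear_inverse)
```

With an exact model the predicted level two steps ahead equals the reference; with model-plant mismatch the prediction error enters the control action, which is where internal model control can help.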
Fig. 5.32. Inverted model based control with the ahead inverted model: water level h versus time t. Noise-free (left), with output-additive noise (right).
Inverted control is not very sensitive to noise, and in this case internal model
control (Fig. 5.33) is directly applicable, as the fuzzy model of the process has
already been identified.
Control using linguistic inversion
With the model configuration h(k + 1) = f(h(k), u(k)), the output is highly correlated
with h(k). While a low average approximation error can be obtained this way, it is
very likely that important information about u is not present in the model and cannot
be extracted. One possibility to cope with this problem is to use a fuzzy variational
model (Raymond et al. 1996). A more serious problem with linguistic inversion
(as already pointed out in section 5.4.2) is the large number of type (ii) and (iii)
rules. Moreover, because of model-plant mismatch, there are also contradictory
rules. In summary, the majority of the rules must be derived from our knowledge
about the controlled process, and too much depends on critical judgment, which,
on the other hand, is a classical example of knowledge-based control (already
implemented with the fuzzy PD controller). This is also the reason why there are
very few successful reports of control based on linguistic inversion, and it allows
us to conclude that such control is not very well suited for setpoint control.
Fig. 5.33. Internal model control scheme (inverse model, process with disturbance d, fuzzy model and feedback filter).
The compared controllers, their control laws and the performance measure
IAE (integrated average error) are given in Table 5.9. Note that IAE is not the best
possible performance criterion, as it favors fast rise times and oscillating responses
(e.g. P control) over slower but much smoother ones (e.g. PD control), but it
produces numbers that are reasonably representative.
Table 5.9. Control results.

| Controller | Control law | IAE | IAE (noisy data) |
|---|---|---|---|
| P | u = f(e) | 0.0818 | 0.0787 |
| PD | u = f(e, de/dt) | 0.0940 | 0.1250 |
| MATLAB Fuzzy | u = f(e, dh/dt) | 0.0994 | 0.0859 |
| Fuzzy PD | u = f(e(k), ∆e(k)) | 0.0828 | 0.0899 |
| Inverted model 1 | u(k) = f(h(k - 1), h(k), r(k + 1)) | 0.1429 | 0.1346 |
| Inverted model 2 | u(k) = f(h(k), h(k + 1), r(k + 2)) | 0.1163 | 0.1129 |
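The text does not state how IAE is normalized. Assuming it denotes the integral of the absolute tracking error divided by the simulation time (i.e. a time-averaged absolute error), it could be computed from sampled signals as follows. The function name and the rectangle-rule integration are assumptions.

```python
import numpy as np


def iae(reference, output, dt):
    """(1/T) * integral of |r - h| over [0, T], approximated with the rectangle rule."""
    err = np.abs(np.asarray(reference, dtype=float) - np.asarray(output, dtype=float))
    integral = float(np.sum(err) * dt)
    return integral / (len(err) * dt)


# Toy step response sampled every 0.5 s (the sampling interval of Fig. 5.30).
score = iae([1.0, 1.0, 1.0, 1.0], [0.0, 0.5, 1.0, 1.0], dt=0.5)
```

With the rectangle rule the time step cancels, so the result equals the mean absolute error of the samples.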
Fuzzy approaches seem to be less sensitive to noise, but that does not necessarily
mean that fuzzy control is more robust per se. This sensitivity depends rather on
how the inputs of the controller are processed: controllers that have a
continuous derivative component (such as PD) experience the most serious
performance drop, whereas controllers where the derivative is obtained by discrete
approximation (fuzzy PD) or where there is no derivative component at all (P,
inverted model based control) are much more robust. The level of noise
depends on the application, and very often conventional control is more reliable and
almost always less costly than fuzzy control.
5.6 Stability issues
One of the issues that arise in control theory is the stability of systems: the
first and last concern of any system design and a fundamental issue in every
control system. Fuzzy control systems are no exception; in particular, in
association with knowledge-based techniques, the following questions arise
(Jenkins and Passino 1999).
i) Will the behaviors observed by a human expert include all possibly
unforeseen situations that can occur due to disturbances, noise or plant
parameter variations?
ii) Can the human expert realistically and reliably foresee problems that could
arise from closed-loop instability or limit cycles?
iii) Will the expert really know how to incorporate stability criteria and
performance objectives into a rule base to ensure that reliable operation can
be obtained?
Nor can the relevance of these questions be denied if the controller is obtained
automatically. These questions are even more important if the control system is
designed to operate in a critical environment, where the failure of a control system
to meet performance objectives could lead to loss of human life or
environmental disaster. Therefore, there is a demand for a methodology to
develop, implement and evaluate fuzzy controllers to ensure that they are
reliable in meeting their performance specifications.
Stability and robustness of conventional, mostly linear systems have been
extensively studied in the past. Fuzzy control, however, is nonlinear by nature
and thus cannot take full advantage of linear control theory.
Another difficulty in studying the stability and robustness of fuzzy controllers is
that fuzzy control is often based on heuristics and expert knowledge, whereas
conventional control design is based on a mathematical model of the plant.
Nonlinearity and the lack of a mathematical model remain the two basic reasons why
stability analysis is often ignored in fuzzy control, while, on the other hand,
most of the criticism of fuzzy control is due to the lack of a general
method for its stability analysis, the latter clearly being responsible for the fact
that fuzzy control is rarely used in safety-critical applications. Perhaps a closer
look at this controversy is necessary.
In many control applications, an accurate mathematical model of the controlled
process (even if available) is so complex that it cannot be used (or
certain assumptions needed to apply conventional control are violated). The
conventional control engineering approach to this problem is to use an
approximate model that is accurate enough to characterize the essential plant
behavior. Due to the inaccuracy of the model, however, the developed controllers
need to be tuned manually upon implementation. The advantages of fuzzy control
are most apparent in very complex problems for which some intuitive idea of how to
obtain high-performance control exists, yet it is precisely in such applications
that nonlinear analysis techniques are often most difficult to apply!
It has been argued (Wang 1993) that the very assumption that the mathematical
model of the plant is known (which is necessary to validate the stability of a
closed-loop control system) contradicts the basic premise of fuzzy
control systems, namely that there is no mathematical model of the plant, but there
are experienced human operators who can satisfactorily control the plant and
provide qualitative control rules.
Mamdani (1993) develops the argument further by stating that "industry has
never put forward a view that mathematical stability analysis is a necessary and
sufficient requirement for the acceptance of a well designed control system/…/
control system methodologies are at best, recommendations on how to use a set
of computer based analytic tools. They do not provide an industry accepted
standard for a structured step by step approach for the analysis and design of a
system" and concludes that prototype testing is more important and stability
analysis by itself can never be considered a sufficient test.
On the other hand, other authors (Jenkins and Passino 1996) provide the
counter-argument, declaring that "While it is often the case that it is difficult,
impossible or cost-prohibitive to develop an accurate mathematical model for
many processes, it is almost always possible for the control engineer to specify
some type of approximate model of the process that while it may not be used
directly in controller design is often used in simulation to evaluate the
performance of the fuzzy controller before it is implemented" and continue by
generalizing that "Certainly there are some applications where one can design a
controller and verify its performance directly via an implementation. In such
applications one is not overly concerned with the failure of the control system.
In other applications there is the need for a high level of confidence in the
reliability of the fuzzy control system before it is implemented."
Thus, it may be necessary and useful to employ the mathematical model and
nonlinear analysis, in addition to simulation-based studies, to enhance our
confidence in the reliability of fuzzy control systems. Mathematical analysis alone,
however, cannot provide definite answers about the reliability of the fuzzy
control system, since it proves the properties of the process model only,
not those of the actual physical process.
Traditional analytical tools for the stability analysis of nonlinear
control systems include the describing function, the phase plane and Lyapunov
stability. Each of these methods has been used for analyzing fuzzy control
system stability.
The describing function approach is an approximate method for determining the
stability of unforced nonlinear control systems and was used to evaluate fuzzy
system stability in (Kickert and Mamdani 1978). The approach is, however,
restricted to the mean-of-maxima defuzzification method and cannot be
generalized to other defuzzification procedures.
A sufficient condition for stability in terms of Lyapunov's direct method is
obtained in (Tanaka and Sugeno 1992), but this method is applicable only to
homogeneous TS systems (see section 2.8), which in itself is quite a severe restriction.
A further problem is that a fuzzy representation of the entire closed-loop system is
required. This representation is not always available; often, if a model of the
plant exists at all, it is based on differential or difference equations. There is
also no general method for finding a Lyapunov function, and the basic
assumption of the approach is that the operating point changes slowly.
Further approaches include the extended circle criterion (Ray and Majumder 1984),
the linear matrix inequalities method (Tanaka et al. 1996), the Popov criterion (Kandel
et al. 1999), cell-to-cell mapping (Chen and Tsao 1989), etc. Generally, all these
methods are theoretically very sophisticated and applicable only to simply
structured systems, which makes them conservative and rather impractical.
There has also been moderate interest in the "linguistic stability" (as opposed to
numerical stability) of fuzzy systems (Braae and Rutherford 1979), (Furuhashi
et al. 1992), (Ju and Chen 1996), but this kind of analysis has not provided
sufficient stability guarantees. The transparency conditions for fuzzy systems
provided in this thesis, however, may inspire further research along this line.
At present, practitioners go ahead with the design and implementation of
many fuzzy control systems without the aid of nonlinear analysis. On the other
hand, theorists attempt to develop a mathematical theory for the verification and
certification of fuzzy control systems. In the long run, this will hopefully have a
synergistic effect on the development of fuzzy control systems where highly
reliable implementations are needed, because such methods can not only
evaluate stability but also help to design a controller.
5.7 Summary and conclusions
During the last few decades, fuzzy control has been successfully applied to
many control problems. Its applications range from simple consumer products
to control systems of complex technological systems.
The main strengths of fuzzy logic control have arguably been:
1. It can be used in systems that cannot easily be modeled mathematically.
In particular, systems with nonlinear responses that are difficult to analyze
may respond well to a fuzzy control approach.
2. As a rule-based approach to control, fuzzy control can be used to efficiently
represent an expert's knowledge about a problem.
3. Continuous variables may be represented by linguistic constructs that are
easier to understand, making the controller easier to implement and modify.
4. Fuzzy controllers may be less susceptible to system noise and parameter
changes, thus making them more robust.
5. Complex processes can often be controlled by relatively few logic rules,
allowing a more understandable controller design and faster computation
for real-time applications.
Its main weaknesses include:
1. Lack of a systematic approach to knowledge acquisition and tuning of the
controller.
2. Lack of reliable means for the analysis of optimality and stability of the
controller.
3. Combinatorial explosion of rules in the case of many inputs.
Most existing fuzzy logic controllers since Mamdani's application (Mamdani
and Assilian 1974) are set point controllers that respond to proportional and
derivative (PD) or proportional and integral (PI) terms (section 5.2).
This is perhaps because it is not yet fully recognized that fuzzy control and
conventional control are, for the most part, complementary rather than
competitive: fuzzy control is task-oriented, whereas conventional control is
setpoint-oriented, and their potential could be fully realized using hybrid control
systems incorporating elements of both fuzzy and conventional control. These
hybrid control systems usually appear as hierarchical control systems, where
fuzzy logic is primarily utilized in the supervisory part (section 5.3).
Recently, there has been a significant body of research on adaptive fuzzy control,
motivated by the lack of a systematic approach to knowledge acquisition and
tuning of the controller. Adaptive fuzzy control can be seen as a special case of
traditional adaptive control and frequently makes use of neural network
techniques. Also, perhaps more than any other type of fuzzy control, adaptive
fuzzy control is affected by the lack of efficient stability analysis tools
(section 5.6). Transparent adaptive control would provide a means to analyze the
controller and a better understanding of its performance mechanics. The implications of
transparency for this kind of fuzzy control, however, are yet to be researched.
Analytical and linguistic inversion techniques (section 5.4.2) are completely
unique to fuzzy systems. One may argue that exact inversion of fuzzy systems
is a contradiction in terms, and in this sense linguistic inversion techniques fit
better into the general philosophy of fuzzy logic. Theoretically, the controller
obtained through analytical inversion of the model ensures perfect
control with inherent stability, but in practical situations other factors and
implementation issues, such as model-plant mismatch and the dynamical properties
of the controlled system, come into play and make this goal difficult to
achieve. Linguistic inversion techniques (to which local linguistic inversion was
added), on the other hand, are even more difficult to apply, particularly in
setpoint control problems, where the required degree of accuracy is too high, as
shown in the control example in section 5.5.
Applications presented in the next chapter make use of control techniques that
were described in this chapter. We exploit the fusion of PID and fuzzy control
and the local inversion technique to demonstrate that, in transparent fuzzy control,
knowledge acquisition and reproduction techniques must be carefully selected
to be successful.
Applications
6.1 Introduction
This chapter presents two applications of the fuzzy modeling and control techniques
described previously. Section 6.2.1 describes the truck backer-upper application
and presents a selection of known solutions obtained by other authors. A
supervisory control system integrating elements of conventional and fuzzy
control is proposed and its control performance is demonstrated. A more
demanding control problem, in which the vehicle consists of two parts (truck
and trailer), is addressed in section 6.2.2 by extending this approach.
The fed-batch fermentation control applications presented in section 6.3 make use
of the same supervisory control principle, but this time the controller cannot be
specified using experience and general knowledge, because such knowledge simply
does not exist. The controller is synthesized by applying the local linguistic inversion
technique (section 5.4.2) to the fuzzy model of the fermentation process, which is
previously identified from experimental data. In model identification, two
techniques are employed: the Yager-Filev method (section 4.4) and GK clustering
(section 4.7) with least squares estimation (section 4.5).
Both applications are implemented using computer simulations and demonstrate
that transparency is vital to this branch of intelligent control that seeks solutions
by emulating the mechanisms of reasoning and decision processes of human
beings.
6.2 Backing up the truck and truck-and-trailer
The truck backer-upper problem, made famous by (Nguyen and Widrow 1990), has
been investigated by many researchers. Being a highly nonlinear control
problem, it has raised interest among practitioners of computational
intelligence. The advantage of the original Nguyen-Widrow approach, which
amply demonstrates the power of neural networks, is that the controller designed
in (Nguyen and Widrow 1990) is self-tuning. In recent works, several authors,
e.g. (Schoenauer and Ronald 1994), have replaced or complemented the neural
network with genetic algorithms. The basic shortcoming of all these data-driven
methods is their computational cost.
On the other hand, it is difficult to ignore the fact that nearly anyone is able to
drive the truck to the desired position, given some time to get used to the
controls. To ignore such knowledge and the potential of fuzzy logic for utilizing
it would be short-sighted. Several reports can be found in the literature, e.g. a
fuzzy controller that replaces the human operator, formulated on the basis of expert
knowledge (Kong and Kosko 1992) or identified from control data (Wang and
Mendel 1992a). Although the high computational load is avoided, the controller
design procedure is ill-defined and plagued by the curse of dimensionality, which
may result in poor performance.
We propose an alternative approach by introducing a supervisory truck
backer-upper control system. The proposed controller architecture decomposes
the control task into two parts, which allows much more efficient
knowledge acquisition and alleviates the curse of dimensionality. As
demonstrated below, our controller also shows superior performance when
compared with several other fuzzy controllers designed for the truck
backer-upper problem.
6.2.1 Truck backer-upper
The system used in the simulations is supplied with MATLAB as a demo. The truck, as in (Kong and Kosko 1992) and (Wang and Mendel 1992a), corresponds to the cab part of the Nguyen-Widrow truck and trailer; this setting is referred to as the simplified Nguyen-Widrow problem (the complete problem is considered in section 6.2.2). The truck position is determined by three state variables: x ∈ [-20, 20], y ∈ [0, 25] and Φc ∈ [-90°, 270°], the angle between the truck's onward direction and the x-axis (Fig. 6.1). The width and length of the truck are 4 and 2 meters, respectively.
The truck must travel from the initial position (x0, y0, Φ0) to the loading dock (xf = 0, yf = 0) and arrive at a right angle (Φf = 90°). The truck moves only backward, at a fixed speed of 2 m/s. To control the truck at every stage, an appropriate steering angle θ ∈ [-45°, 45°] must be provided. Thus the controller is a function of the state variables
$$\theta = f(x, y, \Phi_c). \tag{6.1}$$
Fig. 6.1. Truck backer-upper system (truck at (x, y) with angle Φc and steering angle θ; initial position (x0, y0); loading dock (xf, yf); x ∈ [-20, 20], y ∈ [0, 25]).
Typically it is assumed that there is enough clearance between the truck and the loading dock for the y-coordinate of the truck to be ignored, which simplifies the controller function to
$$\theta = f(x, \Phi_c). \tag{6.2}$$
For obvious reasons, such a controller cannot perform very well when the truck is close to the loading dock.
The primary control goal, as already stated, is to reach the final state (0, 0, 90°) from any given initial position. In practice, however, some tolerance (Fig. 6.2) must be allowed. Backing of the truck is considered successful if the following criterion is met:
$$\varepsilon_c = \varepsilon_x + 0.0267\,\varepsilon_\Phi - 0.4 \le 0, \tag{6.3}$$
where
$$\varepsilon_x = \lvert x_f - x(T_f) \rvert, \qquad \varepsilon_\Phi = \lvert \Phi_f - \Phi(T_f) \rvert, \tag{6.4}$$
and Tf is the duration of the backing.
Fig. 6.2. Success/failure criterion (εx versus εΦ; the boundary line separates the success region near the origin from the failure region).
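The success test (6.3)-(6.4) can be checked mechanically at the end of each backing run. The following is a minimal Python sketch, not the implementation used in this work. It assumes that angles are in degrees and that the target is (xf, Φf) = (0, 90°); the function names are hypothetical. With εx = 0, the angular tolerance alone is 0.4/0.0267 ≈ 15°.

```python
def backing_errors(x_end, phi_end_deg, x_f=0.0, phi_f_deg=90.0):
    """Final-state errors (6.4): eps_x = |xf - x(Tf)|, eps_phi = |Phi_f - Phi(Tf)|."""
    eps_x = abs(x_f - x_end)
    eps_phi = abs(phi_f_deg - phi_end_deg)
    return eps_x, eps_phi


def backing_successful(x_end, phi_end_deg):
    """Criterion (6.3): success if eps_x + 0.0267 * eps_phi - 0.4 <= 0."""
    eps_x, eps_phi = backing_errors(x_end, phi_end_deg)
    return eps_x + 0.0267 * eps_phi - 0.4 <= 0.0


# A 0.1 m lateral error with a 5 degree heading error passes,
# a 0.3 m lateral error with the same heading error does not.
print(backing_successful(0.1, 95.0), backing_successful(0.3, 85.0))
```

The boolean form avoids committing to a particular scale for εc in the plots; it only encodes the sign test of (6.3).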
Moreover, the length and smoothness of the backing trajectory can be considered as a secondary criterion, expressed here by the indirect estimate η:
$$\eta = \frac{\int_0^{T_f} \sqrt{(dx/dt)^2 + (dy/dt)^2}\,dt}{\sqrt{(x_0 - x_f)^2 + (y_0 - y_f)^2}}, \tag{6.5}$$
where the distance the truck actually covers during the backing is divided by the shortest distance between the initial position and the loading dock. Note that the magnitude of η depends on the difference between Φ0 and Φf, and that η is always at least 1.
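In simulation, the integral in (6.5) is only available on sampled trajectories. Below is a minimal sketch that assumes the trajectory is logged as lists of x and y samples, for example every 0.1 s. It approximates the path length by summing the straight segments between consecutive samples. The function name is hypothetical.

```python
import math


def trajectory_ratio(xs, ys, x_f=0.0, y_f=0.0):
    """Approximate eta (6.5) from a sampled trajectory.

    The path-length integral is replaced by the sum of the straight segments
    between consecutive samples; the denominator is the Euclidean distance
    from the initial position to the loading dock.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) samples of equal length")
    path = sum(math.hypot(x1 - x0, y1 - y0)
               for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]))
    shortest = math.hypot(xs[0] - x_f, ys[0] - y_f)
    return path / shortest


# An L-shaped path: 7 m travelled over a 5 m straight-line distance.
print(trajectory_ratio([3.0, 0.0, 0.0], [4.0, 4.0, 0.0]))  # 1.4
```

A finer sampling interval brings the segment sum closer to the integral in (6.5).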
Truck backer-upper controllers
Two basic fuzzy logic controllers are considered: a controller based on expert knowledge and a controller identified from the control actions of a human operator; the selection of state variables may vary. In addition, the MATLAB state feedback controller, a cascaded PID controller and the supervisory fuzzy controller are described.
Expert-defined controller
Selection of the controller function (6.1) implies that the following rule base format should be used:
IF x is A and y is B and Φc is C THEN θ is D, (6.6)
where A, B, C and D are the linguistic labels of the system variables associated
with the corresponding fuzzy sets.
Design of the fuzzy controller includes the definition of the input-output domains, partitions and fuzzy sets, and the contents of the rule base. In the present case, the only source of that information is human understanding of the driving process. The biggest problem with (6.6) is the so-called curse of dimensionality. Employing the input partition {5 3 7}, for example, results in 105 rules, all of which must be derived from experience. Although we can drive the truck to the loading dock manually from almost any position, designing a fuzzy controller that achieves the same goal is not a trivial task. Though fuzzy logic is a good interface for man-machine interaction, the problem in the present case is that we do not know exactly how we are able to drive the truck. Typically, the controller must be re-tuned and tested several times, and sometimes fuzzy sets must be added if controller performance is poor. In summary, the whole design procedure is time-consuming and frustrating because the number of tuning parameters is large. We found the design task of (6.6) extremely difficult and therefore replaced the state variable selection (6.1) with (6.2). This enabled us to use the control rules given in (Kong and Kosko 1992), although some readjustment of the controller parameters was necessary because the truck backing systems are not identical.
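The size of a complete grid rule base is the product of the numbers of fuzzy sets per input. The sketch below assumes that every combination of input sets receives its own rule. It reproduces the 105 rules for the partition {5 3 7} of (6.6). For comparison, it also counts the rules for the {7 3 9} partition used for the data-driven models and for a two-input supervisor with 7 sets on x and 4 on y, as in Fig. 6.6.

```python
from math import prod


def grid_rule_count(partition):
    """Rules in a complete grid rule base: one per combination of input fuzzy sets."""
    return prod(partition)


for name, partition in [
    ("(6.6), partition {5 3 7}", [5, 3, 7]),
    ("data-driven models, {7 3 9}", [7, 3, 9]),
    ("two-input supervisor, {7 4}", [7, 4]),
]:
    print(f"{name}: {grid_rule_count(partition)} rules")
```

Dropping one input from the fuzzy block, as in the supervisory architecture, removes a whole factor from this product.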
Data-driven modeling of human operator
The crucial questions with data-driven techniques are what kind of data must be prepared, how much data is required and how to collect it. In theory, we need a sufficient amount of data to give a good representation of operator actions; in practice, resources are always limited. The immediate problem in the case of multidimensional systems is that some rules created in the initialization phase remain uncovered by data, so the rule base of the controller will be sparse. This may result in unexpected behavior. Also, the more data we have, the longer the learning process takes. Another issue of data-driven modeling is that the available modeling algorithms are not perfect: there is always some modeling error, and its impact on control performance depends on the application at hand.
For modeling we used ANFIS (section 4.9) and Gustafson-Kessel clustering (section 4.7) in combination with the least squares procedure (section 4.5). The ANFIS algorithm has no transparency protection and was employed because of its excellent approximation properties; application of GK/LS, on the other hand, results in a transparent model of the human controller.
The data used in modeling was collected from 31 truck backing experiments with 8 upward, 6 leftward, 6 rightward and 11 downward initial angles. Starting positions were chosen so that different backing trajectories would be present (Fig. 6.3, left). To reduce the computational load, most of the data was filtered out, so that the final data set of 642 input-output pairs corresponds to the situation where information is available only every third second (the normal sampling interval is 0.1 s).
The number of parameters that influence the approximation error and must be determined prior to training is quite large, and not all of them can be determined automatically. The training parameters were therefore determined by trial and error until a configuration yielding a "reasonably low" approximation error was established. It was soon confirmed that the control law (6.1) had to be modeled, because with (6.2) no results of acceptable accuracy could be obtained. ANFIS was then applied to a 1st order Takagi-Sugeno system with the input partition {7 3 9}, and the GK/LS model was initialized as a 0th order Takagi-Sugeno system with the same partition. The final root mean square modeling errors of ANFIS (2500 epochs, RMSE = 0.2129) and GK/LS (RMSE = 0.2048) are very similar.
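The two models differ in the order of their rule consequents. A 0th order Takagi-Sugeno rule outputs a constant, while a 1st order rule outputs a local linear function of the inputs. The following minimal sketch shows inference only. It assumes Gaussian MFs, product AND and a weighted-average output, and it does not reproduce ANFIS or GK/LS training. All names and parameter values are hypothetical.

```python
import itertools
import math


def gauss(u, center, sigma):
    """Gaussian membership degree of u."""
    return math.exp(-0.5 * ((u - center) / sigma) ** 2)


def ts_output(u, mfs, consequents):
    """Takagi-Sugeno inference over a complete grid rule base.

    u: input vector, e.g. [x, y, phi_c]
    mfs: for each input, a list of (center, sigma) Gaussian MFs
    consequents: maps a tuple of MF indices (one per input) to either a
        float (0th order rule) or a pair (a, b) describing the local
        linear model a[0]*u[0] + ... + b (1st order rule)
    """
    num = den = 0.0
    for idx in itertools.product(*(range(len(m)) for m in mfs)):
        # Firing strength: product of the membership degrees (AND = product).
        w = 1.0
        for ui, mf_list, k in zip(u, mfs, idx):
            w *= gauss(ui, *mf_list[k])
        c = consequents[idx]
        if isinstance(c, tuple):          # 1st order: local linear model
            a, b = c
            y = sum(ai * ui for ai, ui in zip(a, u)) + b
        else:                             # 0th order: constant (singleton)
            y = c
        num += w * y
        den += w
    return num / den  # weighted average of the rule outputs


# Two inputs with two MFs each (all values hypothetical).
mfs = [[(-10.0, 5.0), (10.0, 5.0)], [(0.0, 5.0), (20.0, 5.0)]]
grid = list(itertools.product(range(2), range(2)))
zeroth = dict(zip(grid, [0.8, 0.2, -0.2, -0.8]))
first = {idx: ([0.1, -0.2], 1.0) for idx in grid}
print(ts_output([3.0, 7.0], mfs, zeroth), ts_output([3.0, 7.0], mfs, first))
```

In a 0th order model, each consequent is a single number that can be read as a linguistic output value. A 1st order consequent is a local linear model whose meaning depends on the operating point.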
Fig. 6.3. Initial positions of the backing trajectories in the modeling data (left, y versus x); modeling results, excerpt (right, θ versus sample no.): target steering angle (dashed line) and modeled steering angle (bold line).
MATLAB state feedback controller
The state feedback controller included in the MATLAB truck backer-upper demo employs fuzzy logic only to improve conventional control and was used here for comparison purposes only. The controller consists of two state feedback controllers that operate on two different sets of state variables. A fuzzy logic block then combines these two controllers smoothly, based on the distance between the truck and the loading dock.
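Controller 1 (distance is large):
$$\theta = -2\left(\arccos\frac{x}{\sqrt{x^2 + y^2}} - \frac{\pi}{2}\right) + 3\left(\Phi - \arccos\frac{x}{\sqrt{x^2 + y^2}} - \pi\right) \tag{6.7}$$
Controller 2 (distance is small):
$$\theta = \frac{x}{8} + \left(\Phi - \frac{\pi}{2}\right). \tag{6.8}$$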
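The following sketch transcribes (6.7) and (6.8) as printed, with angles in radians. Its purpose is to show how two state feedback laws can be blended by distance. The linear distance ramp is a hypothetical stand-in for the demo's fuzzy blending block, and its thresholds d_near and d_far are invented for illustration. The angle wrapping conventions of the MATLAB demo are not reproduced.

```python
import math

THETA_MAX = math.radians(45.0)  # steering limit, theta in [-45 deg, 45 deg]


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def controller_far(x, y, phi):
    """Control law (6.7) as printed; angles in radians, (x, y) != (0, 0)."""
    a = math.acos(x / math.hypot(x, y))
    return -2.0 * (a - math.pi / 2) + 3.0 * (phi - a - math.pi)


def controller_near(x, phi):
    """Control law (6.8) as printed; angles in radians."""
    return x / 8.0 + (phi - math.pi / 2)


def blended_steering(x, y, phi, d_near=5.0, d_far=15.0):
    """Hypothetical stand-in for the fuzzy blending block: the weight of
    controller 1 ramps linearly from 0 at d_near to 1 at d_far."""
    d = math.hypot(x, y)
    w_far = clip((d - d_near) / (d_far - d_near), 0.0, 1.0)
    theta = controller_near(x, phi)
    if w_far > 0.0:  # controller 1 is undefined at (0, 0), so only evaluate it when needed
        theta = w_far * controller_far(x, y, phi) + (1.0 - w_far) * theta
    return clip(theta, -THETA_MAX, THETA_MAX)
```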
Cascaded PID control
Another control system included for comparison is a cascaded PID controller consisting of two conventional PID controllers, connected so that the first one (PID 1) generates the set point for the second one (PID 2); Φf is subtracted from the reference value so that θ is zero when Φ = 90°. If properly configured, this control system (Fig. 6.4) is able to take the truck to the desired position.
The crucial point of the technique is the determination of the controller parameters. For example, the P-parameter of PID 1 is chosen so that the generated set point remains within [0°, 270°]. Controller performance is then improved by modifying the remaining parameters. The integral components were never used, so the controllers in Fig. 6.4 are in fact of PD type.
Fig. 6.4. Cascaded PID control (blocks PID 1 and PID 2; signals x, xf, Φc*, Φc, Φf and θ).
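The cascade can be sketched with two discrete PD controllers, since the integral parts were not used. This is a minimal sketch under stated assumptions. All angles are in degrees. The Φf offset is assumed to be added to the outer loop's output, so that zero position error requests the final heading of 90°. The gains are hypothetical, and their signs depend on the steering convention of the simulator.

```python
class PD:
    """Discrete PD controller: u = kp*e + kd*(e - e_prev)/dt."""

    def __init__(self, kp, kd, dt):
        self.kp, self.kd, self.dt = kp, kd, dt
        self.e_prev = None

    def step(self, e):
        de = 0.0 if self.e_prev is None else (e - self.e_prev) / self.dt
        self.e_prev = e
        return self.kp * e + self.kd * de


def make_cascade(kp1, kd1, kp2, kd2, dt=0.1, x_f=0.0, phi_f=90.0):
    """Cascaded PD loop: outer loop x -> heading set point, inner loop
    heading error -> steering angle (all angles in degrees)."""
    outer, inner = PD(kp1, kd1, dt), PD(kp2, kd2, dt)

    def control(x, phi_c):
        # Outer loop: position error -> set point, offset by phi_f (assumption).
        phi_ref = phi_f + outer.step(x_f - x)
        phi_ref = max(0.0, min(270.0, phi_ref))   # keep set point in [0, 270]
        # Inner loop: heading error -> steering angle, limited to +-45 degrees.
        theta = inner.step(phi_ref - phi_c)
        return max(-45.0, min(45.0, theta))

    return control


ctrl = make_cascade(2.0, 1.0, 1.5, 0.5)   # hypothetical gains
print(ctrl(0.0, 90.0))                    # truck already aligned at the dock
```

The clipping of the set point mirrors the requirement in the text that the P-parameter of PID 1 keep the set point within [0°, 270°].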
Fuzzy supervisory controller
The control block in this approach consists of a fuzzy supervisor and a PD controller (Fig. 6.5). The task of the supervisor is to provide the set point Φc* for the given state; the appropriate steering angle is then determined by the PD controller. Although extra effort is required to determine the parameters of the PD controller, this is a bargain price for excluding one state variable from the input of the fuzzy block.
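Fig. 6.5. Block diagram of the fuzzy supervisory controller (the fuzzy supervisor maps x and y to the set point Φc*; the PD controller acts on the difference between Φc* and Φc and outputs θ).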
The rule base of the supervisor is easily configured. For example, the rule in the gray region in Fig. 6.6, which reads
IF x is mf4 AND y is mf3 THEN Φr is 90°, (6.9)
agrees well with the intuitive idea of which angle the truck should maintain in this particular area. The rest of the rules in Fig. 6.6 are derived by the same reasoning. An even simpler, single-input supervisor can be obtained by using only the 7 rules that correspond to the subset of rules where "y is mf3", which makes the controller block equivalent to (6.2).
Fig. 6.6. Rule base of the two-input fuzzy supervisor (x partitioned into mf1 to mf7 over [-20, 20], y into mf1 to mf4 over [0, 25]).
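The single-input (7-rule) variant of the supervisor, followed by the PD stage, can be sketched as below. The MF centers are assumed to be evenly spaced over [-20, 20], as the axis of Fig. 6.6 suggests. Apart from mf4 → 90°, which matches rule (6.9), the consequent angles are hypothetical, and so are the PD gains. A two-input version would add a partition of y and multiply the x and y membership degrees.

```python
X_CENTERS = [-20.0 + k * 40.0 / 6 for k in range(7)]    # mf1..mf7 on [-20, 20]
PHI_REF = [150.0, 130.0, 110.0, 90.0, 70.0, 50.0, 30.0]  # hypothetical consequents (deg)


def memberships(u, centers):
    """Triangular partition: each set peaks at its centre and reaches zero
    at the neighbouring centres; inputs outside the range saturate."""
    u = max(centers[0], min(centers[-1], u))
    mu = [0.0] * len(centers)
    for k in range(len(centers) - 1):
        left, right = centers[k], centers[k + 1]
        if left <= u <= right:
            t = (u - left) / (right - left)
            mu[k], mu[k + 1] = 1.0 - t, t
            break
    return mu


def supervisor(x):
    """Single-input supervisor: weighted average of singleton consequents."""
    mu = memberships(x, X_CENTERS)
    return sum(m * c for m, c in zip(mu, PHI_REF)) / sum(mu)


def make_supervisory_controller(kp, kd, dt=0.1):
    """Supervisor gives the set point Phi_c*, a PD controller gives theta."""
    e_prev = None

    def control(x, phi_c):
        nonlocal e_prev
        e = supervisor(x) - phi_c
        de = 0.0 if e_prev is None else (e - e_prev) / dt
        e_prev = e
        return max(-45.0, min(45.0, kp * e + kd * de))

    return control
```

With triangular sets that cross at 0.5 and singleton consequents, the supervisor reduces to piecewise linear interpolation between the rule set points. This makes each rule's contribution easy to inspect.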
Results
The comparison of the different controllers was based on backing the truck from a number of randomly chosen initial positions. The feasibility of control was confirmed by manual backing (results included). Positions from which hitting the wall could not be avoided were filtered out, resulting in a final set of ten initial states (Table 6.1). The corresponding εc and η are depicted in Figs. 6.10 and 6.11, respectively.
Table 6.1. Initial conditions
Ne            1      2      3      4      5      6      7      8      9     10
x0         -3.7  -14.3   18.4    5.2  -14.9   12.1  -10.1   -0.9   -4.4  -11.9
y0         18.0    8.5   17.7    7.3    5.0    9.1    7.0    9.1   16.2   18.6
Φ0 (°)    -12.5  154.7   92.1   76.4  252.5  206.9  158.0  162.8  265.4  253.6
If y0 is large enough (Ne = 3, 9, 10), all controllers succeed. In contrast, there are instances where all controllers (Ne = 8) or most of them (Ne = 4) fail. Some controllers are clearly more accurate than others, with success rates ranging from 30% (expert-defined controller) to 90% (two-input supervisory controller). As could be expected, when y0 is small, controllers that follow the control law (6.1) perform better than those that do not (expert-defined controller, single-input supervisory controller). This is especially evident for cascaded PID control, which is able to position the truck properly only if y0 is larger than 10 and otherwise fails completely.
Of the two data-driven controllers, the one obtained by GK/LS clearly stands out; its performance is quite comparable to that of the two-input supervisory controller. The ANFIS-based controller, interestingly, behaves unexpectedly on some occasions, as the truck "goes wandering" before returning to the loading dock (clearly expressed by η in Fig. 6.11 for Ne = 1 and Ne = 6). Because the approximation errors of both algorithms were of similar magnitude, it can be assumed that the erratic behavior of the ANFIS-based controller is, at least to some degree, caused by its non-transparent nature.
Manual control is never fully reliable, especially in situations that require aggressive driving: although good results can eventually be obtained, there is no guarantee that success will be immediate. Automatic controllers are more reliable.
We do not claim to have obtained optimal controllers in any of the cases. Given enough time and/or computational power, controllers that outperform the current candidates could probably be designed. We did, however, try to keep the design effort approximately the same for all controllers.
Fig. 6.7. Truck backing trajectories with the two-input supervisory controller, experiments No. 1, 2 and 3.
Fig. 6.8. Truck backing trajectories with the two-input supervisory controller, experiments No. 5, 6 and 10.
Fig. 6.9. Truck backing trajectories with the two-input supervisory controller, experiments No. 4, 7 and 9 (left) and experiment No. 8 (right).
Fig. 6.10. Control performance (εc versus Ne) of manual control, the GK/LSE controller, the ANFIS controller, the two-input (TISO) and single-input (SISO) supervisory controllers, the state feedback controller, cascaded PID control and the expert controller. Note that in five cases εc of cascaded PID control is actually larger than 1.4, the upper limit of the plot.
Fig. 6.11. η (secondary criterion) for experiments Ne = 1 to 10.
Under the given conditions, the proposed supervisory control system (backing trajectories in Figs. 6.7-6.9) clearly showed the best performance.
6.2.2 Backing up the truck and trailer
In this experiment, a trailer is connected to the truck, and the control system is designed using the experience gained in the previous experiment. Inclusion of the trailer brings in an additional state variable, the angle of the trailer (Φt). The reference point of the moving object is transferred from the cab to the rear of the trailer because the control goal is now to guarantee proper positioning of the trailer (Fig. 6.12). It is also assumed that |Φc − Φt| ≤ 60°. The length and width of the trailer are 4 and 2 meters, respectively, and the dimensions of the cab are 2×2 m. Other assumptions and control objectives remain unchanged (see section 6.2.1). The current implementation uses the set of equations from (Schoenauer and Ronald 1994).
Normal driving instincts can mislead us when attempting to back a trailer truck up to a loading dock. The task is so difficult that a lot of practice is needed to master the skill, and even then, a truck driver backing up toward a loading dock will go forward and backward numerous times in order to position the truck at the dock successfully. If the driver is not allowed to make forward movements, successful backing becomes improbable. Therefore, there is not much sense in trying to design an expert opinion-based controller; due to the additional state variable, the design task would be very complicated. Similarly, it would be difficult to identify the controller from data: modeling would take a long time, and collecting the data is difficult because driving the truck is a nontrivial task. The problem, however, can be solved with a supervisory control system, which in this case consists of three components (Fig. 6.13).
Fig. 6.12. A driving system consisting of a truck and trailer (reference point (x, y); trailer angle Φt; cab angle Φc).
We used the previously designed fuzzy supervisor depicted in Fig. 6.6. Because the turning radius of the controlled object is different, a slight adjustment of the input MFs of the supervisor is necessary; for convenience, it is carried out using the scaling factors kx = 0.8 and ky = 0.1.
The second fuzzy controller in the control loop creates the mapping Φt* − Φt → (Φt − Φc)* (see Fig. 6.14). Because the input of this controller is the error of the trailer angle, it can be regarded as a proportional controller that determines the angle difference between the cab and trailer that is needed to obtain the desired trailer angle. Only a very primitive understanding of the mechanics of the driving system is needed to conclude that in order to rotate the trailer to the left, the angle of the cab must be negative, and vice versa. Being a SISO system, this functional block can easily be tuned manually; it is implemented with fuzzy logic primarily to obtain a nonlinear mapping quickly that achieves satisfactory control performance (Fig. 6.14).
Finally, the steering angle is computed by a conventional PD controller (Kp = 12, Kd = 4) whose input is the error of the cab-trailer angle difference, (Φt − Φc)* − (Φt − Φc).
Fig. 6.13. Fuzzy supervisor with fuzzy slave (P) controller and PD controller (the supervisor maps x and y to Φt*; the fuzzy P controller maps Φt* − Φt to the reference angle difference (Φc − Φt)*; the PD controller outputs θ).
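The two slave stages of Fig. 6.13 can be sketched as follows. The sketch assumes that angles are in degrees, that the fuzzy P controller uses triangular sets with singleton consequents, and that the MF centers and singletons are hypothetical values spread over the axes of Fig. 6.14. Only Kp = 12 and Kd = 4 come from the text. Fig. 6.14 labels the output (Φc − Φt)* while the text writes (Φt − Φc)*, so the sketch uses the Fig. 6.14 convention in both stages. The sign convention is therefore an assumption.

```python
IN_CENTERS = [-150.0, -75.0, 0.0, 75.0, 150.0]    # hypothetical, input axis of Fig. 6.14
OUT_SINGLETONS = [60.0, 30.0, 0.0, -30.0, -60.0]  # hypothetical, within +-60 deg


def fuzzy_p(err):
    """Fuzzy P controller: trailer angle error -> reference cab/trailer angle
    difference. Triangular sets with singleton consequents reduce to
    piecewise linear interpolation between (centre, singleton) pairs."""
    err = max(IN_CENTERS[0], min(IN_CENTERS[-1], err))
    for k in range(len(IN_CENTERS) - 1):
        a, b = IN_CENTERS[k], IN_CENTERS[k + 1]
        if err <= b:
            t = (err - a) / (b - a)
            return (1.0 - t) * OUT_SINGLETONS[k] + t * OUT_SINGLETONS[k + 1]
    return OUT_SINGLETONS[-1]


def make_trailer_controller(kp=12.0, kd=4.0, dt=0.1):
    """Fuzzy P stage followed by the PD stage (Kp = 12, Kd = 4 as in the text)."""
    e_prev = None

    def control(phi_t_ref, phi_t, phi_c):
        nonlocal e_prev
        diff_ref = fuzzy_p(phi_t_ref - phi_t)   # reference angle difference
        e = diff_ref - (phi_c - phi_t)          # error of the angle difference
        de = 0.0 if e_prev is None else (e - e_prev) / dt
        e_prev = e
        return max(-45.0, min(45.0, kp * e + kd * de))

    return control
```

Because the singletons stay within ±60°, the reference angle difference automatically respects the assumption |Φc − Φt| ≤ 60°.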
Once again, 10 driving experiments were conducted to demonstrate the control performance. This time the initial positions (Table 6.2) were not chosen randomly but so as to represent the range of situations that can occur (Fig. 6.15). The backing trajectories are depicted in Fig. 6.17 and the computed εc in Fig. 6.16. The designed control system shows quite good performance, although some failures do occur, particularly if the starting position is close to the loading dock and/or there is not enough space for maneuvering.
Fig. 6.14. Fuzzy P controller: mapping from Φt* − Φt (input, mf1 to mf5 over [-150, 150]) to (Φc − Φt)* (output, mf1 to mf5 over [-60, 60]).
Fig. 6.15. Initial positions of the test drives (numbered 1 to 10 in the x-y plane).
Fig. 6.16. Control performance (εc) of the truck-and-trailer system for Ne = 1 to 10.
Fig. 6.17. Truck and trailer backing trajectories (one panel per experiment, Ne = 1 to 10).
Table 6.2. Initial conditions and control performance.
Ne            1      2      3      4      5      6      7      8      9     10
x           0.0    5.0   -9.5    0.1   15.0   -5.0  -18.8   -6.5   13.7   17.6
y          15.0   20.0   19.7    7.0    4.7    5.0    8.4    9.3   14.2    8.7
Φc (°)     90.3    0.0    0.0    -90    180    180  -89.2  160.5  226.7   42.0
Φt (°)     89.8    5.7   26.3    -90    180    180  -59.2  123.6  199.1   17.2
εc        0.003  0.004  0.025  0.791  0.079  0.922  0.014  1.631  0.001  0.801
6.3 Control of a fed-batch fermentation
One of the main tasks of a bioprocess engineer is the study, control and maintenance of biological processes such as the production of beverages, pharmaceuticals, antibiotics, enzymes and biochemicals, enzyme-catalyzed reactions, food processing and biological waste treatment. These processes require a well-designed growth environment to obtain the maximum product yield; therefore, the conditions need to be carefully controlled at specific set points or within narrow ranges, which often follow specified trajectories. Fermentation engineering provides the means for meeting those requirements.
The control should take into account the nature of the product of interest: the product can be growth-associated, meaning that the formation of the product and of biomass are proportional, or it might be non-growth-associated. In the latter case, biomass is usually formed first, and conditions are then created for the formation of the product; organism growth and product formation are likely to have different environmental requirements. The three basic modes of operation are batch culture, continuous culture and fed-batch processes.
Batch fermentation refers to a partially closed system in which most of the required materials are loaded into the fermentor, decontaminated before the process starts and removed at the end. The only materials added or removed during a batch fermentation are the exchanged gases and the pH control solutions. In this mode of operation, conditions change continuously with time and the fermentor is an unsteady-state system, although in a well-mixed reactor the conditions are supposed to be uniform throughout the reactor at any instant. The principal disadvantage of batch processing is the high proportion of unproductive time between batches, which comprises charging and discharging the fermentor vessel, cleaning, sterilization and restarting the process.
Continuous culture is a technique in which the microorganism used for the fermentation is fed with fresh nutrients while spent medium plus cells are simultaneously removed from the system. A unique feature of continuous culture is that a time-independent steady state can be attained, which enables one to determine the relations between microbial behavior and the environmental conditions.
The fed-batch process is a production technique between batch and continuous fermentation and is commonly used in industrial fermentations, for example for the production of baker's yeast, some enzymes, antibiotics, growth hormones, microbial cells, vitamins, amino acids and other organic acids. The reason for this diversity of use lies in the theoretical analysis of many fermentation processes, which revealed extended or exponential fed-batch cultures to be optimal, particularly in terms of productivity. Fed-batch fermentations have also gained popularity because of their improved control possibilities. However, due to the inherent mode of operation of this process, the system never really reaches a true steady state, which makes the control of fed-batch reactors a very challenging task.
6.3.1 Control system for fed-batch fermentation process with a single substrate feed
Basically, the control task of fed-batch fermentation lies in determining the proper feed rate F of a substrate s that promotes both the growth of the cell concentration x and the concentration of the desired product p within the volume V of the fermentor during the fermentation. According to (Viesturs et al. 1992), if an exact mathematical model exists, the control task can be solved using optimal control theory. The analytical model of fed-batch fermentation has the following general form:
$$
\begin{aligned}
\frac{d(xV)}{dt} &= \mu x V \\
\frac{d(sV)}{dt} &= -q_s x V + s_n F \\
\frac{d(pV)}{dt} &= q_p x V \\
\frac{dV}{dt} &= F
\end{aligned}
\tag{6.10}
$$
where µ = f(x, s, p), qp = f(x, s, p) and qs = f(x, s, p), and sn is the concentration of substrate in the feed solution.
Physical limitations are set for V(T) = Vk (where T is the duration of the process) and for F: 0 ≤ Fmin < F < Fmax. The control task involves determining the feed rate function that maximizes the cost function J = f(xk, sk, pk, Vk, T), where xk, sk and pk are the values of x, s and p at time T. It is assumed that the other process environment variables (temperature and pH) are optimal and that the fermentation environment is homogeneous.
To solve this task, the initial values x0, s0, p0 and V0 of x, s, p and V must be given. The solution process is described in detail in (Viesturs et al. 1992).
Process parameters
Determining the mathematical model can often be a very complicated and time-consuming task. Moreover, using an inaccurate model to calculate the optimal feed rate may lead to undesirable results. To demonstrate how appropriate control profiles can be found without an exact mathematical model of the process, we consider penicillin fed-batch fermentation as a sample process.
According to (Viesturs et al. 1992), penicillin fed-batch fermentation is defined by (6.10) with the following parameter values:
$$\mu = \frac{0.11\,s}{0.0006\,x + s}, \qquad q_s = \frac{\mu}{0.47} + \frac{q_p}{1.2} + 0.029, \qquad q_p = \frac{0.004\,s}{0.0001 + s + 10\,s^2} - 0.01\,p$$
x0 = 10.5 g; s0 = 10⁻⁶ g; p0 = 0.0 g; V0 = 7 l; Vk = 10 l; Fmax = 0.1 l/h; sn = 100 g/l.
The cost function for the optimal control task is defined as J = p(T)V(T), and the duration of the fermentation process is not fixed. The calculation of the optimal feed rate itself is outside the scope of the present work, but the optimal feed rate (taken directly from (Viesturs et al. 1992)) and the corresponding results are used for comparison.
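When the mathematical model serves only as a process simulator, which is its role here, the right-hand side of (6.10) with the penicillin rates above can be coded directly. The following is a minimal sketch. The function names are hypothetical. The state is taken as (xV, sV, pV, V), and the listed x0, s0 and p0 are treated as concentrations, which is an assumption. To integrate the model, `rhs` could be wrapped as `fun(t, y)` for an ODE solver such as `scipy.integrate.solve_ivp`.

```python
S_N = 100.0  # substrate concentration in the feed, g/l


def rates(x, s, p):
    """Specific rates of the penicillin model, as listed above."""
    mu = 0.11 * s / (0.0006 * x + s)
    q_p = 0.004 * s / (0.0001 + s + 10.0 * s ** 2) - 0.01 * p
    q_s = mu / 0.47 + q_p / 1.2 + 0.029
    return mu, q_p, q_s


def rhs(state, F):
    """Right-hand side of (6.10) for state = (xV, sV, pV, V) and feed rate F."""
    xV, sV, pV, V = state
    x, s, p = xV / V, sV / V, pV / V      # concentrations
    mu, q_p, q_s = rates(x, s, p)
    return (mu * xV,                      # d(xV)/dt
            -q_s * xV + S_N * F,          # d(sV)/dt
            q_p * xV,                     # d(pV)/dt
            F)                            # dV/dt


# Derivatives at the initial state with a constant feed of 0.02 l/h.
V0 = 7.0
state0 = (10.5 * V0, 1e-6 * V0, 0.0, V0)
print(rhs(state0, 0.02))
```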
Fuzzy model
Instead of the mathematical model, we use the fuzzy model of the process as the
source of control information. The model is based on input-output readings
collected from the experiments conducted with the fermentation system.
We use the fuzzy template modeling of Yager and Filev (section 4.4.2). This technique was chosen despite its rather poor approximation properties because the loss of information on the linguistic layer is very small.
The structure of the model (fuzzy partitions of the input-output variables, structure of the rules) must be known before the weight determination procedure is applied, which can be considered a drawback. This application, however, shows that the problem can be solved successfully even without detailed knowledge of, or long experience with, the process.
To find the basic relationships between the fermentation components, we identify a model with the following structure:
IF x is A AND s is B THEN dx is C AND dp is D, (6.11)
where dx and dp are the growth rates of the biomass and product, respectively, and A, B, C and D are linguistic labels such as “low”, “average” or “high” of MFs defined over the input and output universes of discourse. Selection of the MF parameters is the most crucial task in this approach. Model (6.11) is expected to provide all the information needed to design the control system.
Input-output data and fuzzy partitioning
Data used for modeling should adequately reflect the behavior of the system. Too little data increases the risk that important information is missed. At the same time, huge amounts of data slow down the modeling, complicate the design of the control system and consequently increase the overall design cost.
We limited ourselves to constant feed rates in the experiments that produced the input-output data for modeling. The number of possible constant feed rates that fulfil condition (6.12) is infinite:
$$F = \frac{V - V_0}{t} \tag{6.12}$$
However, feed rates cannot exceed the physical limitation Fmax, and conducting too many experiments is unreasonable. In particular, very high or very low feed rates are not considered.
Fig. 6.18. Possible constant feed rates (F versus t).
In the present work, the experimental data were collected from simulations with three different constant feed rates, F = 0.02, 0.025 and 0.033 l/h, as shown in Fig. 6.18.
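Setting V = Vk in (6.12) gives the duration of each constant-feed experiment. The sketch below assumes this reading of (6.12) and uses the parameter values listed above. The function name is hypothetical. The three chosen feed rates correspond to roughly 150, 120 and 91 h, while Fmax would fill the fermentor in 30 h.

```python
V0, VK, F_MAX = 7.0, 10.0, 0.1  # l, l, l/h, from the parameter list above


def fill_time(F):
    """Duration (h) for a constant feed F (l/h) to fill the fermentor from
    V0 to Vk, i.e. (6.12) solved for t with V = Vk."""
    if not 0.0 < F <= F_MAX:
        raise ValueError("feed rate must lie in (0, Fmax]")
    return (VK - V0) / F


for F in (0.02, 0.025, 0.033):
    print(f"F = {F} l/h -> T = {fill_time(F):.1f} h")
```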
Fig. 6.19. Substrate concentration s versus t at the beginning of the process (left) and in the second part of the process (right).
The input-output partition of the model is based on our assumptions about the nature of the process, made on the basis of input-output readings. Perhaps the best example of the partitioning strategy is the partition of the substrate concentration s shown in Figs. 6.19-6.20. At the beginning of the process (Fig. 6.19, left), s grows very fast and quickly reaches high values, then falls to an extremely low level (Fig. 6.19, right). Although the latter is numerically very small compared to the maximum values, it must not be ignored and should be represented by corresponding fuzzy sets (Fig. 6.20). The other variables (x, dp, dx) have been partitioned in basically the same way.
Fig. 6.20. Fuzzy sets of s.
Controller design
The obtained model (6.11) contains information on how the two outputs, dp and dx, are affected by the different qualitative values of x and s. The control strategy can be derived by analyzing this information.
To achieve large quantities of p by the end of the process, the control strategy must first of all guarantee a high product growth rate during the process. The rules in (6.11) give the concentrations of s and x that lead to such growth rates. It turns out that the highest possible growth rates of p occur while the biomass concentration in the fermentor is sufficiently "high". Therefore, the control actions must first lead to high biomass concentrations and then maintain this level until the end of the process. This conclusion follows solely from the analysis of the linguistic rules, yet, as noted at the beginning of section 6.3, it is confirmed by what is known about fermentation processes. The fuzzy sets A of x themselves act as identifiers of the different stages of the process. Matching each biomass concentration with an appropriate substrate concentration in accordance with the fuzzy values of dp and dx results in a fuzzy system in which every A is paired with an appropriate B:
IF x is A THEN s is B (6.13)
These rules specify which B of s must be maintained in the fermentor during stage A. This is in fact the local model inversion technique described in section 5.5.3. Usually such an analysis leads to more than one version of the supervisor.
Control system
The obvious way to implement the obtained controller (6.13) is to use it as the supervisor in a supervisory control system, where it calculates the set points for a conventional PI controller that controls the feed rate (Fig. 6.21). Interestingly, this is the same control system configuration that was so successful in truck backer-upper control.
Fig. 6.21. Control of penicillin fermentation using a fuzzy supervisor (the supervisor maps x to the set point of s; the PI controller manipulates the feed rate F of the fermentor).
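The loop of Fig. 6.21 can be sketched as a supervisor (6.13) feeding a PI controller that is saturated to the physical feed limits. In this sketch, the MF centers of x, the numerical set points of s and the PI gains are all hypothetical. The set point shape follows scenario 2 of Table 6.4 (high, high, extremely small, extremely small). The conditional-integration anti-windup is a common design choice that the text does not mention.

```python
X_CENTERS = [5.0, 15.0, 25.0, 35.0]   # hypothetical centres of x: low, below average, average, high
S_SET = [0.02, 0.02, 0.0005, 0.0005]  # hypothetical s set points following scenario 2


def s_setpoint(x):
    """Supervisor (6.13): interpolate between the singleton set points."""
    x = max(X_CENTERS[0], min(X_CENTERS[-1], x))
    for k in range(len(X_CENTERS) - 1):
        a, b = X_CENTERS[k], X_CENTERS[k + 1]
        if x <= b:
            t = (x - a) / (b - a)
            return (1.0 - t) * S_SET[k] + t * S_SET[k + 1]
    return S_SET[-1]


class PIFeed:
    """PI controller for the feed rate F, limited to [0, Fmax]."""

    def __init__(self, kp, ki, dt, f_max=0.1):
        self.kp, self.ki, self.dt, self.f_max = kp, ki, dt, f_max
        self.integral = 0.0

    def step(self, s_ref, s):
        e = s_ref - s
        candidate = self.integral + e * self.dt
        F = self.kp * e + self.ki * candidate
        if 0.0 <= F <= self.f_max:
            self.integral = candidate   # anti-windup: integrate only when unsaturated
        return max(0.0, min(self.f_max, F))


# One control step (values hypothetical): measured x = 12 g/l, s = 0.01 g/l.
pi = PIFeed(kp=2.0, ki=0.1, dt=0.1)
print(pi.step(s_setpoint(12.0), 0.01))
```

Swapping in another row pattern of Table 6.4 changes only `S_SET`, which is how the three scenarios can be compared in the same loop.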
Results
The structure of the fuzzy model of the system is defined by the fuzzy sets of
the variables presented in Table 6.3.
Table 6.3. Membership functions of model variables
variable | membership functions
x        | low, below average, average, high
s        | extremely low, very low, low, average, high
dp       | zero, low, average, high
dx       | low, below average, average, high
The initial model consisted of 320 elemental rules. After tuning the model and
removing all the rules with zero or very small weights, this number was reduced
to 53. The analysis of these rules gave three substantially different versions of
the supervisor (Table 6.4). Fig. 6.22 depicts the results.
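The initial count of 320 follows from Table 6.3 if each elemental rule of (6.11) is assumed to combine exactly one fuzzy set of each of the four variables x, s, dx and dp. Under that assumption, the count is the size of the complete grid:
$$N = \underbrace{4}_{x} \cdot \underbrace{5}_{s} \cdot \underbrace{4}_{dx} \cdot \underbrace{4}_{dp} = 20 \cdot 16 = 320.$$
In other words, each of the 20 antecedent combinations (x, s) would initially appear with all 16 consequent combinations (dx, dp). Pruning the rules with zero or very small weights then leaves the 53 rules mentioned above.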
Table 6.4. Fuzzy supervisors (set point of s for each fuzzy value of x)
x             | scenario 1      | scenario 2      | scenario 3
low           | high            | high            | high
below average | high or average | high            | small
average       | high or average | extremely small | extremely small
high          | extremely small | extremely small | extremely small
During the fermentation, two major phases, the biomass growth phase and the production phase, can be distinguished (Fig. 6.22), and scenarios 1-3 differ essentially in the moment at which the transition takes place.
With scenario 1, in which the transition takes place latest, the biomass reaches the highest possible concentration by the beginning of the production phase, and therefore the product growth rate in the production phase is also the highest. However, a very large quantity of product cannot be achieved because the fermentor fills up too soon. Scenarios 2 and 3, on the other hand, offer good final product concentrations, comparable to the one obtainable with optimal control, but require more time.
Notably, with scenario 1 the increase of p is even higher than with optimal control (although the final amount of p remains smaller because of the shorter duration of the fermentation). In real life, the advantage of a scenario is also determined by many other circumstances (substrate cost, fermentation preparation cost, time delay between individual fermentations), and the most suitable scenario can finally be chosen by taking all such considerations into account.
[Figure: p vs. t (left) and x vs. t (right), t from 0 to 160; curves for optimal control and scenarios 1-3]
Fig. 6.22. Product concentration (left) and biomass concentration (right) during the fermentation.
6.3.2 Fed-batch fermentation control (two substrate process)
Here we provide another example of fed-batch fermentation control using a supervisory control system based on a fuzzy model of the process. The main differences compared to the experiment in the previous section are:
a) the fermentation process needs two substrates instead of one for growth;
b) an exact mathematical model of the process does not exist, and the process simulator is a black box;
c) the GK/LSE modeling technique is used in addition to Yager-Filev fuzzy template modeling to identify the fuzzy model.
Process description
The program that simulates a fed-batch fermentation process producing a secondary metabolite as the product is distributed by the Industrial Control Centre of the University of Westminster (1997). The control task and the other features described here come from the same source. The microorganism in this process needs two substrates (s1 and s2) for growth and production.
The process has two inputs, the substrate feed rates f1 and f2. There are five measurements: biomass concentration x, concentration of substrate 1 s1, concentration of substrate 2 s2, product concentration p and volume V. The maximum feed rates and the volume of the fermentor are limited.
It is believed that the nominal feed profile provided,

$$
f_1 = 10 + \frac{25}{1 + e^{5 - 0.1t}}, \qquad f_2 = 3.5 - \frac{3.5}{1 + e^{10 - 0.15t}},
\tag{6.14}
$$

is not good enough for production. The optimal feed pattern should be investigated in order to improve process productivity. The criterion J may be selected as

$$
J = \frac{pV}{T},
\tag{6.9}
$$

where T is the duration of fermentation.
Other process environment variables such as temperature and pH are assumed to be constant (at their optimum).
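To make the nominal profile (6.14) and the criterion J concrete, the Python sketch below evaluates both. It assumes that t and T are expressed in the simulator's time units and that p and V are the final product concentration and volume; the function names are illustrative only and are not part of the simulator distributed by the University of Westminster.

```python
import math


def nominal_feeds(t: float) -> tuple[float, float]:
    """Nominal feed rates (f1, f2) of (6.14) at time t."""
    f1 = 10.0 + 25.0 / (1.0 + math.exp(5.0 - 0.1 * t))   # rises from about 10 towards 35
    f2 = 3.5 - 3.5 / (1.0 + math.exp(10.0 - 0.15 * t))  # falls from about 3.5 towards 0
    return f1, f2


def productivity(p_final: float, V_final: float, T: float) -> float:
    """Criterion J = pV/T: amount of product per unit of fermentation time."""
    if T <= 0:
        raise ValueError("fermentation duration T must be positive")
    return p_final * V_final / T


if __name__ == "__main__":
    for t in (0, 50, 100):
        f1, f2 = nominal_feeds(t)
        print(f"t={t:3d}  f1={f1:6.2f}  f2={f2:5.2f}")
```

The two sigmoids make the profile easy to read: f1 reaches half of its increase at t = 50, and f2 drops to half of its initial value at t = 10/0.15, roughly 67.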
Long experience has shown that:
1) feeding the substrates at too high or too low levels seems to reduce production, while the nominal feed profiles give reasonable production;
2) the feed rate of s2 seems to be complementary to the feed rate of s1: feeding only f2 yields no product at all, while feeding only f1 yields a small amount of product.
The model representing the plant is not given explicitly (to make the exercise comparable to working on a practical plant); it is supplied as a black box with a set of specified inputs and has the following features:
1) The initial state varies randomly within a subspace for each batch;
2) The parameters of the model vary within specified limits;
3) A set of non-measurable disturbances;
4) A set of constraints.
Because of these features, it is impossible to obtain the same output series for each batch, which is closer to the practical situation. In real life, however, the situation is even more complicated; for example, on-line measurement of the concentrations of fermentation components is problematic.
Modeling the process
The structure of the fuzzy model to be identified is derived from the optimal control theory of fed-batch fermentation, as in section 6.2.1. The mathematical models used for solving the optimization task of fed-batch fermentation are based on mass balance equations; (6.16) and (6.10) are derived from the same source.
$$
\begin{aligned}
\frac{d(xV)}{dt} &= \mu(x, s_1, s_2, p)\,xV \\
\frac{d(s_1V)}{dt} &= -q_{s_1}(x, s_1, s_2, p)\,xV + s_{1n}F_1, \quad s_1(0) = s_{10} \\
\frac{d(s_2V)}{dt} &= -q_{s_2}(x, s_1, s_2, p)\,xV + s_{2n}F_2, \quad s_2(0) = s_{20} \\
\frac{d(pV)}{dt} &= q_p(x, s_1, s_2, p)\,xV, \quad p(0) = p_0 \\
\frac{dV}{dt} &= F_1 + F_2
\end{aligned}
\tag{6.16}
$$
In discrete form,

$$
\begin{aligned}
\Delta x(k) &= f_1\big(x(k), s_1(k), s_2(k), p(k)\big) \\
\Delta s_1(k) &= f_2\big(x(k), s_1(k), s_2(k), p(k), F_1(k)\big) \\
\Delta s_2(k) &= f_3\big(x(k), s_1(k), s_2(k), p(k), F_2(k)\big) \\
\Delta p(k) &= f_4\big(x(k), s_1(k), s_2(k), p(k)\big) \\
\Delta V(k) &= f_5\big(F_1(k), F_2(k)\big)
\end{aligned}
\tag{6.17}
$$
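The passage from the continuous mass balances (6.16) to a discrete-time form can be illustrated with one explicit Euler step. The Python sketch below is an illustration under stated assumptions: the specific rates mu, q_s1, q_s2 and q_p are passed in as arbitrary callables (the true kinetics are hidden inside the black-box simulator), s1n and s2n are the substrate concentrations in the feeds, and dt is a hypothetical sampling step. The balances are integrated in terms of the total amounts xV, s1V, s2V and pV exactly as written in (6.16), and concentrations are recovered by dividing by the new volume.

```python
from typing import Callable, NamedTuple

# Specific rate as a function of (x, s1, s2, p); placeholder for the unknown kinetics.
Rate = Callable[[float, float, float, float], float]


class State(NamedTuple):
    x: float   # biomass concentration
    s1: float  # substrate 1 concentration
    s2: float  # substrate 2 concentration
    p: float   # product concentration
    V: float   # volume


def euler_step(st: State, F1: float, F2: float,
               mu: Rate, q_s1: Rate, q_s2: Rate, q_p: Rate,
               s1n: float, s2n: float, dt: float) -> State:
    """One explicit Euler step of the mass balances (6.16)."""
    x, s1, s2, p, V = st
    xV = x * V
    args = (x, s1, s2, p)
    # Update the total amounts term by term, following (6.16).
    xV_new = xV + dt * mu(*args) * xV
    s1V_new = s1 * V + dt * (-q_s1(*args) * xV + s1n * F1)
    s2V_new = s2 * V + dt * (-q_s2(*args) * xV + s2n * F2)
    pV_new = p * V + dt * q_p(*args) * xV
    V_new = V + dt * (F1 + F2)
    # Back to concentrations; increments are then new minus old values.
    return State(xV_new / V_new, s1V_new / V_new, s2V_new / V_new,
                 pV_new / V_new, V_new)
```

For example, with a constant hypothetical growth rate of 0.1, no feeding and dt = 1, a biomass concentration of 1.0 in a volume of 10 becomes 1.1, because the volume does not change.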
We make several simplifications to reduce the number of variables involved in the models to be identified.
i) The equations for ∆s1 and ∆s2 are not modeled, because s1 and s2 can be feedback-controlled by F1 and F2.
ii) According to (6.16), the equation for V is trivial, and there is no need to model V from the control point of view.
iii) The effect of p on ∆x and ∆p is assumed to be small, and p is omitted from the respective equations.
Doing so, we obtain the structure of the model(s) to be identified:

$$
\begin{aligned}
\Delta x(k) &= f_1\big(x(k), s_1(k), s_2(k)\big) \\
\Delta p(k) &= f_2\big(x(k), s_1(k), s_2(k)\big)
\end{aligned}
\tag{6.18}
$$
Note that the reasoning mechanism of the model (fuzzification, conjunction, implication, aggregation and defuzzification) is largely determined by the fuzzy model identification algorithm. If the tuning algorithm does not require a special inference type, we apply Mamdani inference (min-min-max) with center-of-gravity defuzzification; in combination with least-squares estimation, prod-prod-sum inference with weighted-average defuzzification has to be used. The domains of the model variables are determined by the minimum and maximum values found in the training data.
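The two inference schemes mentioned above can be contrasted in a few lines of Python. The sketch assumes triangular membership functions given as (a, b, c) with a < b < c, rules stored as (antecedent term indices, consequent index) pairs, and a discretized output universe for the center-of-gravity computation; the weighted-average variant is written directly in terms of the consequent centers. All names and the toy rule base are illustrative and are not the implementation used in this work.

```python
import numpy as np


def tri(u, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    u = np.asarray(u, dtype=float)
    return np.maximum(np.minimum((u - a) / (b - a), (c - u) / (c - b)), 0.0)


def firing_degrees(inputs, antecedent, in_mfs):
    """Membership degree of each input value in the term named by the antecedent."""
    return [float(tri(v, *in_mfs[i][j]))
            for i, (v, j) in enumerate(zip(inputs, antecedent))]


def mamdani_cog(inputs, rules, in_mfs, out_mfs, y_grid):
    """min conjunction, min implication, max aggregation, center of gravity."""
    aggregated = np.zeros_like(y_grid)
    for antecedent, consequent in rules:
        w = min(firing_degrees(inputs, antecedent, in_mfs))
        clipped = np.minimum(w, tri(y_grid, *out_mfs[consequent]))
        aggregated = np.maximum(aggregated, clipped)
    # Discrete center of gravity; undefined (0/0) if no rule fires.
    return float(np.sum(aggregated * y_grid) / np.sum(aggregated))


def prod_weighted_average(inputs, rules, in_mfs, centers):
    """prod conjunction, prod implication, sum aggregation, weighted average."""
    num = den = 0.0
    for antecedent, consequent in rules:
        w = float(np.prod(firing_degrees(inputs, antecedent, in_mfs)))
        num += w * centers[consequent]
        den += w
    return num / den  # undefined (0/0) if no rule fires


if __name__ == "__main__":
    in_mfs = [[(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]]   # one input: "low", "high"
    rules = [((0,), 0), ((1,), 1)]                   # low -> small, high -> big
    out_mfs = [(-5.0, 0.0, 5.0), (5.0, 10.0, 15.0)]  # "small", "big"
    y = np.linspace(-5.0, 15.0, 2001)
    print(prod_weighted_average([0.25], rules, in_mfs, [0.0, 10.0]))  # 2.5
    print(mamdani_cog([0.25], rules, in_mfs, out_mfs, y))            # about 3.18
```

The example shows that the two mechanisms generally give different outputs for the same rule base: clipping with min and taking the centroid of the union weights each consequent by the area of its clipped set rather than by its firing degree alone.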
Training data
In this case, uniform coverage of the input data set (see section 4.3) is unattainable because the variables acting as model inputs cannot be influenced directly. On the other hand, the criterion for selecting the data must be consistent with the goal of modeling.
As we are modeling primarily for control and the nominal feed rates are readily available, a training data set relevant from the control viewpoint can be measured in experiments conducted with the nominal feed profiles. It is reasonable to conduct several experiments because of the unmeasurable disturbances and parameter variations (in the present case, five experiments were conducted). The training data set is shown in Figs. 6.23-6.25.
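Assembling the training set for (6.18) from several batches amounts to pairing the state at step k with the increments from k to k+1 and pooling the batches. A minimal Python sketch, assuming each batch is a dictionary of equally sampled arrays under the hypothetical keys "x", "s1", "s2" and "p"; the input domains are then taken as the minimum and maximum of the pooled data, as described above.

```python
import numpy as np


def build_training_set(batches):
    """Return inputs [x, s1, s2](k), targets dx(k), dp(k) and the input domains."""
    inputs, dx, dp = [], [], []
    for batch in batches:
        x, s1, s2, p = (np.asarray(batch[key], dtype=float)
                        for key in ("x", "s1", "s2", "p"))
        # State at step k paired with the increment from k to k + 1.
        inputs.append(np.column_stack([x[:-1], s1[:-1], s2[:-1]]))
        dx.append(np.diff(x))
        dp.append(np.diff(p))
    X = np.vstack(inputs)
    # Domain of each input variable: [min, max] over the pooled training data.
    domains = np.column_stack([X.min(axis=0), X.max(axis=0)])
    return X, np.concatenate(dx), np.concatenate(dp), domains
```

Pooling the batches rather than fitting each one separately is what lets the identified model average over the random initial states and parameter variations of the simulator.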
[Figure: x vs. t (left) and ∆x vs. t (right), t from 0 to 120]
Fig. 6.23. Training data. Biomass concentration (left) and biomass growth (right).
[Figure: s1 vs. t (left) and s2 vs. t (right), t from 0 to 120]
Fig. 6.24. Training data. Substrate concentrations, s1 (left) and s2 (right).
[Figure: ∆p vs. t (left) and J (×10^4) vs. t (right), t from 0 to 120]
Fig. 6.25. Training data. Product growth (left) and performance indicator J (right).
Modeling results
The primary goal of modeling is to achieve as good an approximation as possible. The parameters affecting the quality of modeling (the number of membership functions, i.e. the density of the partition) were chosen in accordance with that goal. ANFIS was applied only to produce results for comparison.
The root mean square errors for both ∆x and ∆p are presented in Table 6.5, along with the number of rules and MFs for each method; graphical illustrations of the modeling quality are shown in Figs. 6.26-6.28. Fuzzy template modeling clearly produces somewhat less accurate results than GK/LSE and ANFIS.
[Figure: ∆x vs. t (left) and ∆p vs. t (right), t from 0 to 120]
Fig. 6.26. Modeling with ANFIS, ∆x (left), ∆p (right).
Table 6.5. Modeling results.

Method | RMSE (∆x) | RMSE (∆p) | R (number of rules) | Partition (x-s1-s2; ∆x-∆p)
FTM | 0.2581 | 17.136 | 68 | 4-7-5; 5-5
GK+LSE | 0.2351 | 10.322 | 52 | 7-7-7; 52-52
ANFIS | 0.2339 | 10.029 | 64 | 4-4-4; 62-62
[Figure: ∆x vs. t (left) and ∆p vs. t (right), t from 0 to 120]
Fig. 6.27. Modeling with GK/LSE, ∆x (left), ∆p (right).
[Figure: ∆x vs. t (left) and ∆p vs. t (right), t from 0 to 120]
Fig. 6.28. Modeling with fuzzy template algorithm, ∆x (left), ∆p (right).
[Figure: membership functions of s1 (0-40) and s2 (0.2-1.4), each with the terms zero, very_small, small, below_average, average, above_average, big; membership functions of x (5-40) with the terms extr_small, very_small, small, average, above_average, big, very_big]
Fig. 6.29. Input membership functions of the model after applying GK/LSE training.
[Figure: expert-defined membership functions: ∆p (-40 to 80) and ∆x (-0.5 to 1.5), each with negative, zero, small, average, big; s1 (0-40) with zero, small, below_average, average, above_average, big, very_big; s2 (0-1.5) with zero, small, below_average, average, big; x (0-40) with small, below_average, average, big]
Fig. 6.30. Partition of the model defined by a human expert.
Supervisor design
The rules of the identified models once again confirm that rapid growth of the product concentration is possible only in the presence of a relatively high biomass concentration. Because the initial concentration of biomass in the fermentor is low, the control system must ensure rapid biomass growth until the biomass concentration is sufficient. Once this is achieved, further biomass growth is not required. Following this strategy, the rules that provide rapid biomass growth in the first phase of the process and a rapid product growth rate in the second phase are selected.
Table 6.6. Rules of the FT model.

x | s1 | s2 | ∆x | ∆p
big | zero | zero | big (0.01), negative (0.26), zero (0.22), small (0.33), average (0.18) | big (0.68), average (0.15), negative (0.10), zero (0.02), small (0.05)
big | small | zero | big (0.01), negative (0.23), zero (0.23), small (0.35), average (0.18) | big (0.82), average (0.13), small (0.04), zero (0.01)
average | zero | zero | average (0.19), small (0.52), negative (0.13) | big (0.28), average (0.50), zero (0.01)
average | small | zero | average (0.19), small (0.53), negative (0.13) | big (0.30), average (0.49), zero (0.01)
below_average | above_average | small | big (0.95), average (0.05) | small (0.38), average (0.62)
below_average | below_average | small | big (1.00) | big (0.07), average (0.84), small (0.09)
below_average | average | small | big (0.99) | big (0.06), small (0.19)
small | above_average | small | big (0.87) | small (0.65)
small | below_average | small | big (1.00) | small (0.58), average (0.42)
small | average | small | big (0.98) | small (0.52)
i) FT model
The model with weighted rules contains a large number of rules, which makes selecting the appropriate ones quite complicated. Rules stating high levels of s1 and s2 together with a high level of x were ignored (as they refer to unrealistic situations), and the rules in Table 6.6 were chosen for supervisor design (rule weights are given in parentheses).
ii) LSE model
The LSE algorithm calculates an individual consequent value br for each extracted rule; in Table 6.7 the rules are therefore characterized by these parameter values. Compared with the FT model, sorting the rules is a relatively easy task.
Table 6.7. Rules of the LSE model.

x s1 s2 ∆x ∆p
extr_small average above_average 0.804 0.316
extr_small above_average above_average 0.599 0.403
very_small above_average average 0.427 0.404
very_small big average 0.616 1.620
small average small 3.174 9.797
average zero small 0.048 24.927
average very_small very_small 0.305 28.179
average small very_small 2.624 32.261
average below_average very_small 0.337 33.028
average average very_small 3.494 45.616
average above_average very_small 2.249 42.971
average big small 2.536 68.740
above_average very_small very_small 0.478 19.087
big very_small zero 0.228 46.977
very_big very_small zero 0.044 59.333
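The sorting step for the LSE model can be partly automated. The Python sketch below is an illustration, not the procedure used in this work: it ranks the candidate (s1, s2) consequents for each level of x by the predicted ∆x (growth phase) or ∆p (production phase). Which levels of x count as the growth phase is an assumption the designer must supply, only a subset of the rows of Table 6.7 is reproduced, and the final choice is still left to the designer.

```python
from collections import defaultdict

# A subset of Table 6.7: (x, s1, s2, dx, dp).
RULES = [
    ("extr_small", "average", "above_average", 0.804, 0.316),
    ("extr_small", "above_average", "above_average", 0.599, 0.403),
    ("very_small", "above_average", "average", 0.427, 0.404),
    ("very_small", "big", "average", 0.616, 1.620),
    ("average", "below_average", "very_small", 0.337, 33.028),
    ("average", "above_average", "very_small", 2.249, 42.971),
    ("average", "big", "small", 2.536, 68.740),
]


def rank_candidates(rules, growth_levels):
    """For each x level, sort (s1, s2) by dx in growth levels, otherwise by dp."""
    ranked = defaultdict(list)
    for x, s1, s2, dx, dp in rules:
        score = dx if x in growth_levels else dp
        ranked[x].append((score, s1, s2))
    return {x: sorted(cands, reverse=True) for x, cands in ranked.items()}


if __name__ == "__main__":
    table = rank_candidates(RULES, growth_levels={"extr_small", "very_small", "small"})
    for x, cands in table.items():
        print(x, [(s1, s2, score) for score, s1, s2 in cands])
```

For the growth-phase levels the top-ranked candidates coincide with the choices in Table 6.8, whereas for "average" the top-ranked pair (big s1, small s2) was adopted in neither Table 6.8 nor Table 6.9; a ranking of this kind can support the designer's judgment but not replace it.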
The fuzzy supervisor format in this case differs somewhat from (6.13) because of the additional state variable and is given by (6.19):
IF x is Ar THEN s1 is B1r AND s2 is B2r. (6.19)
Again, local inversion yields several potential supervisors. The parameters of Ar, B1r and B2r are taken from the identified model.
The fuzzy supervisor is then embedded in the control system, where it computes the reference values r1 and r2 for the PI controllers that implement low-level control of the feed rates f1 and f2 (Fig. 6.31).
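[Figure: block diagram; the supervisor maps x to the references r1 and r2 of two PI controllers, which use s1 and s2 to set the feed rates f1 and f2 of the fermentor]
Fig. 6.31. The control system.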
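The structure of Fig. 6.31 can be sketched in Python as follows. Every number here is hypothetical: the cores of the x terms, the consequent values standing in for the centers of B1r and B2r, the PI gains, the sampling time and the feed limits are placeholders, not values identified in this work. The supervisor assumes triangular input MFs whose feet lie at the neighboring cores and singleton consequents with weighted-average defuzzification; each PI controller clamps its output to the admissible feed range and uses conditional integration as a simple anti-windup.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical supervisor: cores of the x terms and (r1, r2) for each rule.
X_CORES = np.array([5.0, 15.0, 25.0, 35.0])
R1 = np.array([20.0, 20.0, 5.0, 5.0])
R2 = np.array([0.6, 0.5, 0.1, 0.1])


def supervisor(x: float) -> tuple[float, float]:
    """IF x is A_r THEN s1 is B1_r AND s2 is B2_r, weighted-average output."""
    # With triangular MFs whose feet lie at the neighboring cores (memberships
    # sum to one) and singleton consequents, weighted-average defuzzification
    # reduces to piecewise-linear interpolation between the consequents; beyond
    # the outermost cores the end values are held (shoulder MFs).
    return float(np.interp(x, X_CORES, R1)), float(np.interp(x, X_CORES, R2))


@dataclass
class PI:
    kp: float
    ki: float
    dt: float
    u_max: float          # maximum feed rate
    integral: float = 0.0

    def step(self, reference: float, measurement: float) -> float:
        error = reference - measurement
        candidate = self.integral + error * self.dt
        u = self.kp * error + self.ki * candidate
        if 0.0 <= u <= self.u_max:
            self.integral = candidate        # integrate only while not saturated
        return min(max(u, 0.0), self.u_max)  # feed rate limited to [0, u_max]


if __name__ == "__main__":
    pi1 = PI(kp=2.0, ki=0.1, dt=1.0, u_max=40.0)
    pi2 = PI(kp=50.0, ki=2.0, dt=1.0, u_max=10.0)
    r1, r2 = supervisor(10.0)  # halfway between the first two cores
    print(r1, r2, pi1.step(r1, measurement=18.0), pi2.step(r2, measurement=0.5))
```

Because only the reference values come from the fuzzy rules, the supervisor stays small and readable, while the accuracy of the feed-rate tracking is left to the conventional PI loops.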
Control results
Five supervisors were derived (two on the basis of the LSE model, three on the basis of the FT model), and five consecutive experiments were conducted with each scenario. The control scenarios are presented in Tables 6.8-6.12, and the control results are shown in Figs. 6.32-6.46.
Table 6.8. Scenario 1.1 (LSE).

x s1 s2
extr_small average above_average
very_small big average
small average small
average above_average very_small
above_average very_small very_small
big very_small zero
very_big very_small zero
Table 6.9. Scenario 1.2 (LSE).

x s1 s2
extr_small average above_average
very_small above_average average
small average small
average below_average very_small
above_average very_small very_small
big very_small zero
very_big very_small zero
Table 6.10. Scenario 2.1 (FTM).

x s1 s2
small average small
below_average average small
average small zero
big small zero

Table 6.11. Scenario 2.2 (FTM).

x s1 s2
small below_average small
below_average below_average small
average zero zero
big zero zero

Table 6.12. Scenario 2.3 (FTM).

x s1 s2
small above_average small
below_average above_average small
average small zero
big small zero
The scenarios with smaller s1 and s2 (1.2 and 2.2) lead to a longer fermentation and a higher concentration of p. With one exception, all chosen scenarios, and very likely most of their combinations, give a better J than the nominal feeds (Fig. 6.25). The exception is scenario 2.2, which takes an extremely long time to fill the fermentor and therefore gives a lower J, but is interesting because of its very high product concentration.
[Figure: p, V, x and J (×10^4) vs. t, t from 0 to 100]
Fig. 6.32. Control results. Scenario 1.1 (p, V, x, J).
[Figure: r1, r2, s1 and s2 vs. t, t from 0 to 100]
Fig. 6.33. Control results. Scenario 1.1 (r1, r2, s1, s2).
[Figure: F1 and F2 vs. t, t from 0 to 100]
Fig. 6.34. Control results. Scenario 1.1 (F1, F2).
[Figures: control results, panels p, V, x, J (×10^4); r1, r2, s1, s2; F1, F2 vs. t, t from 0 to 100]
[Figures: control results, panels p, V, x, J (×10^4); r1, r2, s1, s2; F1, F2 vs. t, t from 0 to 100]
[Figures: control results, panels p, V, x, J (×10^4); r1, r2, s1, s2; F1, F2 vs. t, t from 0 to 200]
[Figures: control results, panels p, V, x, J (×10^4); r1, r2, s1, s2; F1, F2 vs. t, t from 0 to 100]
One would assume that the most accurate model leads to the best controller, but that is not necessarily the case. Because we primarily use the linguistic layer of the model, the larger approximation error of the FT model does not affect the result. Still, least-squares or gradient-descent-based techniques may be preferred, because the modeling algorithm is largely independent of human input and the number of rules involved is smaller.
6.4 Conclusions and comments
Although the control objects in this chapter were quite different, the same type of control technique could, interestingly, be applied to all of them, namely supervisory control, where the supervisor determines the control strategy and low-level control (execution of the control strategy) is carried out by conventional PI or PD controllers. Why has this control configuration proved to be so universal, and what is the reason for its success?
To find the answers, we need to look into the behavioral models of human beings. Driving a car is a good example. In dense traffic, the driver must deal with many things simultaneously: observe the traffic signs, observe the movement of other cars in order to avoid accidents, plan the route through the city, etc., and, last but not least, handle the controls of the car. It is not difficult to see that a human's operation behind the steering wheel is a mixture of conscious and subconscious actions.
The former involve clear conscious decisions such as "I need to turn right", whereas the latter are more like bodily functions, such as exactly how to turn the wheel in order to follow the given trajectory. Where does this subconscious skill come from?
When learning to drive, we initially spend most of our effort on elementary things: how to turn the wheel, how to brake, how to change gears (!), etc. We may know that the steering wheel is used to turn the car (turn left to go left, turn right to go right), that the accelerator pedal is used to speed up and that the brake is used to slow down, but this general knowledge has little value without driving experience.
Conversely, those who actually drive cars may have difficulty putting their knowledge into words. Rather, we seem to know instinctively how hard to push the accelerator or the brake, or how far to turn the wheel, in order to achieve certain results. Such knowledge is not instinctive; it is learned, but it has become so automatic that it rarely intrudes on consciousness.
When it does intrude, the circumstances are likely to be unusual. When driving a particular car for the first time, it may take a while before we get the feel of the brake pedal, and during this period we are more conscious than normal of the pressure we exert on the pedal. Or, in our regular car, we may practice emergency braking so that when we are actually in an emergency, we have developed the skills necessary to avoid a collision.
By using supervisory control systems, we actually copy the same principle and separate the actions that require extensive training to achieve the desired accuracy from those that are more general and can be formulated verbally. The primary result of this decomposition is that each block of the control system has a reduced dimensionality, which is very important in fuzzy logic control.
The second conclusion derived from this analogy is that each block should be implemented using appropriate techniques. This is why low-level fuzzy logic controllers were not used, although the methodology exists (section 5.2): conventional PID controllers can be tuned much more efficiently.
The logical conclusion regarding the supervisory part of the control system is that the transparency constraints are essential; otherwise the control system we obtain behaves like a driver who is drunk, confused or insane. A low-level controller implemented with fuzzy logic would, by the same analogy, behave like an inexperienced driver, unless there is a real need for a nonlinear mapping (unusual circumstances!) that can be obtained more easily with fuzzy logic (like the fuzzy P controller in section 6.2.2).
The control strategy is not always available; sometimes we do not even have an approximate idea of how to control the process. One possible solution, as demonstrated with fed-batch fermentation control, is to identify a fuzzy model of the process which, through the local inversion method, then acts as the expert we do not have. This approach is not optimal, as demonstrated in section 6.3.1. The results obtained show, however, that although suboptimal, these control systems perform surprisingly well.
As our results suggest, the approximation error of the model is not the decisive criterion of model quality in linguistic analysis. More important is the information content of the model, which varies from case to case and is difficult to evaluate. Controller synthesis based on linguistic analysis of the process model is consequently a tricky business and seems to be successful only at a high level, which is another motivation for hierarchical control.
Conclusions
Each scientific work strives to find answers to specified questions and, in doing so, raises many others. This work is no exception. Here we summarize what has been achieved in this thesis and identify the open questions that need further research.
7.1 Transparency conditions
We have proposed a set of transparency constraints (Riid and Rüstern 2000c), (Riid et al. 2000) to address the problem in fuzzy modeling where there is a clear tradeoff between transparency and numerical accuracy. This tradeoff exists basically because the identification algorithms take no account of the semantic aspect of fuzzy systems. As a result, the identified models appear as black boxes and no linguistic information can be used in further stages of analysis and design. This is unfortunate, as the ability to process information both numerically and linguistically is arguably the most attractive property of fuzzy systems and the main motivation for using them in modeling and/or control.
For standard fuzzy and 0th order TS systems, the transparency constraints are based on the concept of transparency checkpoints. The transparency checkpoint of a rule is the point in the input-output space where the rule attains its maximum activation degree (equal to one). A rule is transparent if the input-output mapping of the system runs through the transparency checkpoint of the rule, and a fuzzy system is considered transparent if all its rules are transparent. Consequently, according to this definition, transparency of 0th order TS and standard fuzzy systems has a binary character: it either exists or does not exist.
It has been shown here that, based on the proposed definition, transparency of the system can be preserved by applying certain constraints on the system MFs, namely, input MFs must not overlap by more than 50% and output MFs must be symmetric. These constraints must be satisfied whether the system under consideration is obtained by manual design or with the help of an identification algorithm. It is suggested that it would be more efficient to use types of MFs that rule out non-transparency by definition: symmetric MFs or fuzzy singletons for output variables and Jager's definition of fuzzy sets for input variables.
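The checkpoint definition and the 50% overlap condition can be checked numerically. The Python sketch below assumes a single-input system with triangular input MFs (left foot, core, right foot), singleton consequents and weighted-average inference; with singletons the output-symmetry condition is met trivially, so only the input overlap matters here. It evaluates the system at each rule's checkpoint, i.e. at the core of its input MF, and compares the result with the rule consequent. The example values are hypothetical.

```python
import numpy as np


def tri(u, a, b, c):
    """Triangular MF with feet a, c and core b (a < b < c)."""
    return max(min((u - a) / (b - a), (c - u) / (c - b)), 0.0)


def output(u, mfs, consequents):
    """Weighted-average inference with singleton consequents."""
    w = np.array([tri(u, *mf) for mf in mfs])
    return float(w @ consequents / w.sum())


def is_transparent(mfs, consequents, tol=1e-9):
    """True if the mapping passes through every checkpoint (core, consequent)."""
    return all(abs(output(mf[1], mfs, consequents) - c) < tol
               for mf, c in zip(mfs, consequents))


if __name__ == "__main__":
    consequents = np.array([0.0, 1.0, 5.0])
    # Feet at the neighboring cores: adjacent MFs cross at 0.5 (50% overlap).
    narrow = [(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0), (1.0, 2.0, 3.0)]
    # Feet two cores away: more than 50% overlap.
    wide = [(-2.0, 0.0, 2.0), (-1.0, 1.0, 3.0), (0.0, 2.0, 4.0)]
    print(is_transparent(narrow, consequents))  # True
    print(is_transparent(wide, consequents))    # False
```

With the wide partition, both neighbors are still active at the core of the middle MF (degree 0.5 each), so the output there is 1.75 instead of the consequent value 1, which is exactly the violation of the checkpoint condition.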
The proposed constraints fit smoothly into the general background of fuzzy control: 50% overlap is a common choice in fuzzy control practice, and fuzzy singletons are even more widely employed. Therefore, implementing the transparency constraints requires neither a reformulation of fuzzy system theory nor a break with existing traditions, which is all too often the case with novel techniques that arguably possess many valuable properties but never gain much popularity because of their incompatibility with existing norms. Of the modeling techniques described in this thesis, only a few require modifications in order to be applied for acquiring transparent models. This would be similarly true for control techniques and can be considered a great advantage.
With 1st order TS systems, the transparency situation is somewhat different. 1st order TS systems are interpreted in terms of local linear models, but further reasoning on this subject leads to a paradox: to allow correct interpretation of a system, interpolation between the local models should be reduced to a degree where the resulting system would not be a fuzzy system at all! Therefore, at best, we can seek some kind of compromise between transparency and accuracy. To estimate the transparency of a 1st order TS system, a measure of transparency is needed; such a measure was proposed in (Riid et al. 2001).
7.2 Transparent modeling algorithms
We have derived two new algorithms for transparent fuzzy modeling that address the gap between accuracy and transparency. The first one, an extension of Jager's algorithm (Riid and Rüstern, 2002a), is applicable to standard fuzzy systems and is numerically superior to the original algorithm because the interpolation properties of standard fuzzy systems allow more efficient approximation.
The second algorithm, designed for 1st order TS systems and presented in section 4.9.2, uses rule activation degree exponents and directly addresses the transparency-accuracy tradeoff of the identified model.
Intuitively, preventing the system parameters from taking arbitrary values implies some loss of flexibility and approximation capability. This is certainly true, but according to the results presented in section 4.10, transparent modeling of systems and processes with relatively low approximation error is possible, particularly if the proposed algorithms are used.
7.3 Transparent fuzzy control
The third contribution of this work lies in the investigation of hierarchical hybrid control. We believe that the successful application of this type of control presented in this thesis is not just a coincidence and that the reasons for its success must be sought in the principles of human interaction with the physical environment.
In interaction with the real world we are engaged in a continuous process of
constructing representations of that environment and our experience of it. These
representations are of many forms, some very general, others highly specific.
They may also operate at different levels of consciousness.
As human beings, we engage in conscious and unconscious model building all
the time. We draw pictures, use language, write descriptions, invent words, and
form logical, causal, or relational links between things. These are examples of
the conscious model-building processes we engage in. We also develop quite
remarkable though largely subconscious models. We learn to walk and talk, to
play tennis or the piano, etc.
Conscious models can be represented effectively with fuzzy logic. Expert information that appears in the form of conscious models can occasionally be very simple, like the supervisor of the fed-batch fermentation process with a single substrate feed, which consists of only four rules obtained by modeling the process and extracting the relevant control rules with the help of local linguistic inversion (Riid and Rüstern 1999).
[Figure: human brain and nervous system (conscious decisions, subconscious decisions, actuators) mapped onto an automatic control system (fuzzy supervisor, low-level conventional (PID) controller, actuators)]
Fig. 7.1. Replacing a human operator with a hierarchical fuzzy control system.
For subconscious models, however, conventional control or black-box techniques would be more appropriate: the primary criterion is control accuracy, high performance cannot be achieved with the same level of simplicity, and complex and transparent fuzzy systems cannot be tuned as effectively.
If we are able to divide the required control actions into these two categories, hybrid supervisory control can be a suitable implementation (Fig. 7.1). Such a control system is, in fact, able to improve the control compared to an existing human operator and is superior to all-in-one approaches. Supporting evidence is provided by the results of the truck backer-upper experiment (Riid and Rüstern 2001).
Another advantage is that the subtasks can be dealt with individually, and consequently with higher precision, without a significant increase in complexity. Problem decomposition is particularly important in fuzzy logic control, where the curse of dimensionality is a very painful issue. The approach is also by no means restricted to a two-level architecture: for more demanding control tasks such as truck-and-trailer backing, further decomposition can be carried out (Riid and Rüstern 2002b).
7.4 Suggestions for further research
Further improvement of modeling algorithms. The present trade-off between accuracy and interpretability of fuzzy systems is not optimal. Most notably, gradient descent suffers from local minima of the error function and from slow convergence. Applying transparency-protected higher-order methods (the conjugate gradient or Levenberg-Marquardt method) to fuzzy systems could be a way to improve the approximation properties of transparent modeling.
Transparent adaptive control. It is not yet clear how transparency constraints would influence the performance of adaptive fuzzy control techniques that are presently black-box; further research is necessary.
Investigation of linguistic stability. The concept of linguistic stability of
transparent fuzzy systems should be investigated in order to find out if it has
any use for stability analysis of fuzzy control systems.
References
1. Babuska, R., Jager, R. and Verbruggen, H.B. (1994), "Interpolation issues in Sugeno-Takagi reasoning," Proc. IEEE World Congress on Computational Intelligence, Orlando, pp. 859-863.
2. Babuska, R. and Verbruggen, H.B. (1995), "A new identification method
for linguistic fuzzy models," Proc. FUZZ-IEEE/IFES'95, Yokohama, pp.
1207-1212.
3. Babuska, R., Sousa, J. and Verbruggen, H.B. (1995), "Model-based design of fuzzy control systems," Proc. 3rd EUFIT, Aachen, pp. 837-841.
4. Babuska, R., Fantuzzi, C., Kaymak, U. and Verbruggen, H.B. (1996), "Improved inference for Takagi-Sugeno models," Proc. of the Fifth IEEE International Conference on Fuzzy Systems, New Orleans, pp. 701-706.
5. Babuska, R. (1997), Fuzzy Modeling and Identification, Ph.D. dissertation.
Technical University of Delft, Delft.
6. Baranyi, P., Bavelaar, I.M., Koczy, L.T. and Titli, A. (1997), "Inverse Rule Base of Various Nonlinear interpolation techniques," Proc. IFSA'97, Prague, pp. 121-126.
7. Baranyi, P., Bavelaar, I.M., Babuska, R., Kóczy, L.T., Titli, A. and Verbruggen, H.B. (1998), "A method to invert a linguistic fuzzy model," Int. J. Systems Science, vol. 29, no. 7, pp. 711-721.
8. Barto, A., Sutton, R. and Anderson, C. (1983), "Neuronlike adaptive
elements that can solve difficult learning control problems," IEEE Trans.
Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 834-846.
9. Battiti, R. (1992), "First and second order methods for learning: Between
steepest descent and Newton’s method," Neural Computation, vol. 4, no. 2,
pp. 141-166.
10. Berenji, H.R. and Khedkar, P. (1993), "Learning and tuning fuzzy logic
controllers through reinforcements," IEEE Trans. Neural Networks, vol. 3,
no. 5, pp. 724-740.
11. Bezdek, J.C. (1980), "A convergence theorem for the fuzzy ISODATA
clustering algorithm," IEEE Trans. Pattern Anal. Machine Intell., vol.
PAMI-2, no. 1, pp. 1-8.
12. Bezdek, J.C. (1981), Pattern Recognition with Fuzzy Objective Function
Algorithms, Plenum Press, New York.
13. Bikdash, M. (1999), "A highly interpretable form of Sugeno inference systems," IEEE Trans. Fuzzy Systems, vol. 7, no. 6, pp. 686-696.
14. Braae, M. and Rutherford, D.A. (1979a), "Theoretical and Linguistic
Aspects of the Fuzzy Logic Controller," Automatica, vol. 15, no. 5, pp. 553-
577.
15. Braae, M. and Rutherford, D.A. (1979b), "Selection of Parameters for a Fuzzy Logic Controller," Fuzzy Sets and Systems, vol. 2, pp. 185-199.
16. Brown, M. and Harris, C.J. (1994), Neurofuzzy Adaptive Modelling and
Control, Prentice Hall, Englewood Cliffs.
17. Castro, J.L. (1995), "Fuzzy logic controllers are universal approximators,"
IEEE Trans. Syst., Man, Cybern., vol. 25, pp. 629-635.
18. Chae, Y., Oh, K., Lee, W. and Kang, K. (1999) "Transformation of TSK
fuzzy system into fuzzy system with singleton consequents and its
application," Proc. IEEE International Fuzzy Systems Conference, pp. 969-
1073.
19. Chen, Y.Y. and Tsao, T.C. (1989), "A description of the dynamical behavior of fuzzy systems," IEEE Trans. Syst., Man and Cybern., vol. 19, pp. 745-755.
20. Chiu, S. (1994), "Fuzzy model identification based on cluster estimation,"
Journal of Intelligent and Fuzzy Systems, vol. 2, no. 3, pp. 267-278.
21. Darken, C. and Moody, J. (1991), "Towards faster stochastic gradient search," in Advances in Neural Information Processing Systems 4, Tesauro, G., Cowan, J.D. and Alspector, J. (eds.), Morgan Kaufmann Publishers, San Francisco, pp. 1009-1016.
22. Dennis, J.E. and Schnabel, R.B. (1983), Numerical Methods for
Unconstrained Optimization and Nonlinear Equations, Englewood Cliffs,
Prentice-Hall.
23. Duda, R.O. and Hart, P.E. (1973), Pattern Classification and Scene
Analysis, Wiley, New York.
24. Dunn, J.C. (1974), "A Fuzzy Relative of the ISODATA Process and its Use
in Detecting Compact Well-Separated Clusters," Journal of Cybernetics,
vol. 3, no. 3, pp. 32-57.
25. Dutton, K., Thompson, S. and Barraclough, B. (1997), The Art of Control
Engineering, Henry Ling, Dorchester.
26. Fed Batch Fermentation Process Modelling and Control Competition,
(1997), The Industrial Control Centre, University of Westminster,
http://www.wmin.ac.uk/ICC/compete/modcomp.htm
27. Fletcher, R. and Reeves, C.M. (1964), "Function minimization by conjugate
gradients," Computer Journal, vol. 7, pp. 149-154.
28. Furuhashi, T., Horikawa, S. and Uchikawa, Y. (1992), "On Stability of
Fuzzy Control Systems Using a Fuzzy Modeling Method," Proc. 17th Conf.
IEEE Industrial Electronics Society, pp. 982-985.
29. Guely, F. and Siarry, P. (1993), "Gradient descent method for optimizing
various fuzzy rule bases," Proc. 2nd IEEE Int. Conf. Fuzzy Systems, San
Francisco, pp. 1241-1246.
30. Gustafson, D.E. and Kessel, W.C. (1979), "Fuzzy clustering with a fuzzy
covariance matrix," Proc. IEEE Conference on Decision and Control, San
Diego, pp. 761-766.
31. Haykin, S. (1994), Neural Networks, A Comprehensive Foundation,
Macmillan, New York.
32. Holland, J.H. (1975), Adaptation in Natural and Artificial Systems, The
University of Michigan Press.
33. Holmblad, L.P. and Ostergaard, J.J. (1982), "Control of Cement Kiln by
Fuzzy Logic," in Approximate Reasoning in Decision Analysis, Gupta,
M.M. and Sanchez, E. (eds.), Amsterdam, pp. 389-400.
34. Jager, R. (1995), Fuzzy Logic in Control, Ph.D. dissertation, Technical
University of Delft, Delft.
35. Jang, J.-S.R. (1992a), "Self-learning fuzzy controllers based on temporal
back propagation," IEEE Trans. Neural Networks, vol. 3, no. 5, pp. 714-
723.
36. Jang, J.-S.R. (1992b), Neuro-Fuzzy Modeling: Architectures, Analyses, and
Applications, Ph.D. Dissertation, EECS Department, Univ. of California at
Berkeley.
37. Jang, J.-S.R. (1993a), "ANFIS: Adaptive-network-based fuzzy inference
system," IEEE Trans. System, Man, Cybern., vol. 23, no. 3, pp. 665-685.
38. Jang, J.-S.R. (1993b), "Functional equivalence between radial basis
function networks and fuzzy inference systems," IEEE Trans. Neural
Networks, vol. 4, no. 1, pp. 156-159.
39. Jang, J.-S.R. and Mizutani, E. (1996), "Levenberg-Marquardt Method for
ANFIS Learning," Proc. Int. NAFIPS Joint Conference, Berkeley, USA, pp.
87-91.
40. Jenkins, D. and Passino, K.M. (1999), "An Introduction to Nonlinear
Analysis of Fuzzy Control Systems," J. Intelligent and Fuzzy Systems, vol.
7, no. 1, pp. 75-103.
41. Jin, Y. (2000), "Fuzzy modeling of high-dimensional systems: Complexity
reduction and interpretability improvement," IEEE Trans. Fuzzy Systems,
vol. 8, no. 2, pp. 212-220.
42. Ju, G. and Chen, L.J. (1996), "Linguistic stability analysis of fuzzy closed
loop control systems," Fuzzy Sets and Systems, vol. 82, no. 1, pp. 27-34.
43. Kandel, A., Luo, Y. and Zhang, Y.Q. (1999), "Stability Analysis of Fuzzy
Control Systems," Fuzzy Sets and Systems, vol. 105, pp. 33-48.
44. Kickert, W.J.M. and Mamdani, E.H. (1978), "Analysis of a fuzzy logic
controller," Fuzzy Sets and Systems, vol. 1, no. 1, pp. 29-44.
45. Kong, S.-G. and Kosko, B. (1992), "Adaptive Fuzzy Systems for Backing
up a Truck-and-Trailer," IEEE Trans. on Neural Networks, vol. 3, no. 5, pp.
211-223.
46. Kosko, B. (1992a), Neural Networks and Fuzzy Systems: A Dynamical
Systems Approach to Machine Intelligence, Prentice-Hall, Englewood
Cliffs.
47. Kosko, B. (1992b), "Fuzzy systems as universal approximators," Proc. IEEE
Int. Conf. Fuzzy Syst., San Diego, pp. 1153-1162.
48. Layne, J.R. and Passino, K.M. (1993), "Fuzzy Model Reference Learning
Control for Cargo Ship Steering," IEEE Control Systems Magazine, vol. 13,
no. 6, pp. 23-34.
49. Lee, C.C. (1990), "Fuzzy logic in control systems: Part I and Part II," IEEE
Trans. Systems, Man and Cybernetics, vol. 20, no. 2, pp. 404-435.
50. Li, H.-X. and Gatland, H.B. (1996), "Conventional fuzzy control and its
enhancement," IEEE Transactions on Systems, Man, and Cybernetics, vol.
26, no. 5, pp. 791-797.
51. Liska, J. and Melsheimer, S.S. (1994), "Complete design of fuzzy logic
systems using genetic algorithms," Proc. 3rd IEEE Conf. on Fuzzy Systems,
vol. 2, pp. 1377-1382.
52. Lotfi, A., Andersen, H.C. and Tsoi, A.C. (1996), "Interpretation
preservation of adaptive fuzzy inference systems," Int. J. Approxim.
Reasoning, vol. 15, no. 4, pp. 379-394.
53. Macvicar-Whelan, P.J. (1976), "Fuzzy sets for man-machine interaction,"
Int. J. Man-Mach. Studies, vol. 8, pp. 687-697.
54. Mamdani, E.H. and Assilian, S. (1975), "An experiment in linguistic
synthesis with a fuzzy logic controller," Int. J. Man-Machine Studies, vol. 7,
pp. 1-13.
55. Mamdani, E.H. (1993), "Twenty Years of Fuzzy Control: Experiences
Gained and Lessons Learnt," Proc. 2nd IEEE Int. Conf. on Fuzzy Systems,
San Francisco, pp. 1585-1588.
56. Marquardt, D.W. (1963), "An algorithm for the estimation of non-linear
parameters," SIAM Journal, vol. 11, pp. 431-441.
57. Männle, M. (2000), "FTSM - Fast Takagi-Sugeno Fuzzy Modeling," Proc.
Safeprocess 2000, IFAC, Budapest, Hungary, pp. 587-591.
58. Minsky, M. and Papert, S. (1969), Perceptrons, MIT Press, Cambridge.
59. Mizumoto, M. (1987), "Fuzzy Controls under Various Approximate
Reasoning Methods," Preprints of 2nd IFSA Congress, Tokyo, Japan, pp.
143-146.
60. Moody, J. and Darken, C. (1989), "Fast learning in networks of locally tuned
units," Neural Computation, vol. 1, no. 2, pp. 281-294.
61. Nauck, D. and Kruse, R. (1994), "NEFCON-I: An X-Window Based
Simulator for Neural Fuzzy Controllers," Proc. IEEE Int. Conf. Neural
Networks at IEEE WCCI'94, Orlando, pp. 1638-1643.
62. Nauck, D., Klawonn, F. and Kruse, R. (1997), Foundations of Neuro-Fuzzy
Systems, Wiley, Chichester.
63. Nauck, D. and Kruse, R. (1998), "How the Learning of Rule Weights
Affects the Interpretability of Fuzzy Systems," Proc. IEEE International
Conference on Fuzzy Systems, Anchorage, pp. 1235-1240.
64. Nauck, D. and Kruse, R. (1999), "Neuro-fuzzy systems for function
approximation," Fuzzy Sets and Systems, vol. 101, no. 2, pp. 261-271.
65. Nauta Lemke, H. and Krijgsman, A. (1991), "Design of Fuzzy PID Supervisors
for Systems with Different Performance Requirements," Proc. IMACS'91,
Dublin, pp.??
66. Nguyen, D. and Widrow, B. (1990), "The truck backer-upper: An example
of self-learning in neural networks," IEEE Contr. Syst. Mag., vol. 10, no. 2,
pp. 18-23.
67. Nomura, H., Hayashi, I. and Wakami, N. (1992), "A learning method of
fuzzy inference by descent method," Proc. 1st IEEE Int. Conf. on Fuzzy
Systems, San Diego, pp. 485-491.
68. de Oliveira, J.V. (1999), "Semantic constraints for membership function
optimization," IEEE Trans. Systems, Man and Cybernetics, vol. 29, no. 1,
pp. 128-138.
69. Passino, K. and Yurkovich, S. (1998), Fuzzy Control, Addison-Wesley,
Menlo Park.
70. Polyak, B.T. (1969), "Minimization of Nonsmooth Functionals," USSR
Computational Mathematics and Mathematical Physics, vol. 9, pp. 14-29.
71. Procyk, T.J. and Mamdani, E.H. (1979), "A linguistic self-organizing
process controller," Automatica, vol. 15, pp. 15-30.
72. Ray, K.S. and Majumder, D.D. (1984), "Application of Circle Criteria for
Stability Analysis of Linear SISO and MIMO Systems Associated with
Fuzzy Logic Controller," IEEE Trans. Syst. Man, Cybern., vol. SMC-14,
no. 2, pp. 345-349.
73. Riid, A. and Rüstern, E. (1998), "Comparison of fuzzy function
approximators," Proc. 6th Biennial Baltic Electronic Conference, Tallinn, pp.
139-142.
74. Riid, A. and Rüstern, E. (1999), "Fuzzy modelling and control of fed-batch
fermentation" in Computational Intelligence and Applications, Pjotr
Szczepaniak (ed.), Physica Verlag, New York, pp. 283-291.
75. Riid, A. and Rüstern, E. (2000a), "Interpretability versus adaptability in
fuzzy systems," Proc. Estonian Acad. Sci. Eng., vol. 6, no. 2, pp. 76-95.
76. Riid, A. and Rüstern, E. (2000b), "Supervisory fed-batch fermentation
control on the basis of linguistically interpretable fuzzy models," Proc.
Estonian Acad. Sci. Eng., vol. 6, no. 2, pp. 96-112.
77. Riid, A. and Rüstern, E. (2000c), "Transparent fuzzy systems and modeling
with transparency protection," Proc. IFAC Symp. on Artificial Intelligence
in Real Time Control, Budapest, pp. 229-234.
78. Riid, A., Jartsev, P. and Rüstern, E. (2000), “Genetic algorithms in
transparent fuzzy modeling,” Proc. 7th Biennial Baltic Electronic
Conference, Tallinn, pp.91-94.
79. Riid, A., Isotamm, R. and Rüstern, E. (2001), "Transparency analysis of 1st
order Takagi-Sugeno systems," Proc. 10th Int. Conf. System-Modeling-
Control, Zakopane, vol. 2, pp. 165-170.
80. Riid, A. and Rüstern, E. (2001), "Fuzzy logic in control: Truck Backer-
Upper problem revisited," Proc. IEEE 10th Int. Conf. Fuzzy Systems,
Melbourne, vol. 1, pp. 513-516.
81. Riid, A. and Rüstern, E. (2002a), “Gradient descent based optimization of
transparent Mamdani systems,” accepted for 6th International Conference on
Neural Networks and Soft Computing, Zakopane.
82. Riid, A. and Rüstern, E. (2002b), “Fuzzy hierarchical control of truck and
trailer,” submitted for 8th Biennial Baltic Electronic Conference, Tallinn.
83. Riid, A., Isotamm, R. and Rüstern, E. (2002), “Transparency Enhancement
of First Order Takagi-Sugeno Systems: Promoting the Competition
Between the Rules by Controlling the Overlap of Input Fuzzy Sets”,
submitted for 8th Biennial Baltic Electronic Conference, Oct. 6-9, Tallinn, Estonia.
84. Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986), "Learning internal
representations by error propagation," in Parallel Distributed Processing, D.
Rumelhart and J. McClelland (eds.), MIT Press, Cambridge, vol. 1, pp.
318-362.
85. Schoenauer, M. and Ronald, E. (1994), "Neuro-genetic truck backer-upper
controller," Proc. IEEE Conf. on Computational Intelligence, pp. 720-723.
86. Setnes, M., Babuska, R., and Verbruggen, H.B. (1998), "Rule-Based
Modeling: Precision and Transparency," IEEE Trans. Syst., Man and
Cybern., vol. 28, no. 1, pp. 165-169.
87. Setnes, M. and Kaymak, U. (1998), "Extended fuzzy c-means with volume
prototypes and cluster merging," Proc. EUFIT'98, Aachen, pp. 1360-1364.
88. Setnes, M., Babuska, R., Kaymak, U. and Nauta Lemke, H. R. van (1998),
"Similarity measures in fuzzy rule base simplification," IEEE Trans. on
Syst., Man and Cybern., vol. 28, no. 3, pp. 376-386.
89. Setnes, M. and Roubos, J.A. (1999), "Transparent Fuzzy Modeling using
Fuzzy Clustering and GAs," Proc. 18th Int. Conf. NAFIPS, New York, pp.
198-202.
90. Setnes, M. (2002), “Simplification and reduction of fuzzy rules”, in Trade-
Off Between Accuracy and Interpretability in Fuzzy Rule-Based Modelling,
J. Casillas, O. Cordon, F. Herrera and L. Magdalena (eds.), Springer-Verlag,
New York.
91. Shaw, I.S. (1998), Fuzzy Control of Industrial Systems, Kluwer Academic
Publishers, Boston.
92. Sun, C.-T. (1994), "Rule-Base Structure Identification in an Adaptive-
Network-Based Fuzzy Inference System," IEEE Trans. Fuzzy Systems, vol.
2, no. 1, pp. 64-73.
93. Takagi, T. and Sugeno, M. (1985), "Fuzzy identification of systems and its
applications to modeling and control," IEEE Trans. Syst., Man, Cybern.,
vol. SMC-15, no. 1, pp. 116-132.
94. Tan, G.V. and Hu, X. (1997), "More on designing fuzzy controllers using
genetic algorithms: guided constrained optimisation," Proc. 6th IEEE Int.
Conf. on Fuzzy Systems, Barcelona, vol. 1, pp. 497-502.
95. Tanaka, K. and Sugeno, M. (1992), "Stability analysis and design of fuzzy
control systems," Fuzzy Sets and Systems, vol. 45, no. 2, pp. 135-156.
96. Tanaka, K., Ikeda, T. and Wang, H.O. (1996), "Robust Stabilization of a
Class of Uncertain Nonlinear Systems via Fuzzy Control: Quadratic
Stabilization, H∞ Control Theory, and Linear Matrix Inequalities," IEEE
Transactions on Fuzzy Systems, vol. 4, no. 1, pp.1-13.
97. Tong, R.M. (1978), "Synthesis of fuzzy models for industrial processes:
Some recent results," Int. J. General Systems, vol. 4, pp. 143-162.
98. Tsoukalas, L. and Uhrig, R. (1996), Fuzzy and Neural Approaches in
Engineering, Wiley, New York.
99. Viesturs, U.E. et al. (eds.) (1992), Automation of Biotechnological
Processes. Zinatne, Riga, (in Russian).
100. Wang, L.-X. and Mendel, J.M. (1992a), "Generating fuzzy rules by
learning from examples," IEEE Trans. on System, Man, and Cybernetics,
vol. 22, no. 6, pp. 1414-1427.
101. Wang, L.-X. and Mendel, J.M. (1992b), "Back-propagation fuzzy system
as nonlinear dynamic system identifiers," Proc. 1st IEEE Int. Conf. on
Fuzzy Systems, San Diego, pp. 1409-1416.
102. Wang, L.-X. (1992c), "Fuzzy systems are universal approximators," Proc.
IEEE Int. Conf. Fuzzy Syst., San Diego, pp. 1163-1170.
103. Wang, L.-X. (1993), "Stable adaptive fuzzy control of nonlinear systems,"
IEEE Trans. on Fuzzy Systems, vol. 1, no. 2, pp.146-155.
104. Whitley, D. (1993), A Genetic Algorithm Tutorial, Technical Report CS-
93-103, Colorado State University, Fort Collins.
105. Werbos, P.J. (1974), Beyond regression: new tools for prediction and
analysis in the behavioural sciences, Ph.D. thesis, Harvard University,
Boston.
106. Widrow, B. and Hoff, M.E. (1960), "Adaptive switching circuits,"
WESCON Convention Record, Part IV, pp. 96-104.
107. Yager, R. and Filev, D. (1994a), Essentials of Fuzzy Modeling and
Control. Wiley, New York.
108. Yager, R. and Filev, D. (1994b), "Generation of fuzzy rules by mountain
clustering," Journal of Intelligent and Fuzzy Systems, no. 2, pp. 209-219.
109. Zadeh, L.A. (1965), "Fuzzy Sets," Information and Control, vol. 8, pp.
338-353.
110. Zadeh, L.A. (1973), "Outline of a New Approach to the Analysis of
Complex Systems and Decision Processes," IEEE Trans. on Systems,
Man, and Cybernetics, vol. 3, pp. 28-44.
111. Zadeh, L.A. (1996), "The Evolution of Systems Analysis and Control: A
Personal Perspective," IEEE Control Systems, vol. 16, no. 3, pp. 95-98.
Symbols and abbreviations
Nomenclature
A, B – fuzzy sets
µ, µA(·) – membership degree, membership function of A
T(·, ·) – t-norm
S(·, ·) – s-norm
Ui – ith linguistic input variable
Vj – jth linguistic output variable
N – number of input variables
M – number of output variables
R – number of fuzzy rules
µir(xi) – input membership function of ith input variable associated with rth rule
F(y) – fuzzy output
Fr(y) – fuzzy rule output
γjr(yj) – output membership function of jth output variable, associated with rth rule
yr – local output of 1st order TS systems
Y· – defuzzification function, e.g. Ycog, Ymom, Yfcm
Q – number of discretization intervals
air, bir, cir, dir – input MF parameters
pir – output MF parameters of TS systems
1 – unitary column vector
br, sr – output MF parameters
Tj – number of MFs per jth output variable
Si – number of MFs per ith input variable
Rmax – maximum number of rules
τr – activation degree of rth rule
µi^s(xi) – sth MF of ith input variable (variable-oriented notation)
εtr – transparency error
K – number of measured input-output patterns
Z – training data matrix
Wr – weight assigned to rth rule
φr – normalized activation degree of the rth rule
ỹ – reference output
ε(k) – error on kth input-output pattern
η - learning rate
H – number of clusters
µhk – kth input membership value in hth cluster
νh – hth cluster center
d(·, ·) – distance measure
M(·) – mountain function
Pl(·) – potential measure
J – cost function
Abbreviations
ANFIS adaptive-network-based fuzzy inference system
CoG centre of gravity
FcM fuzzy c-means (defuzzification)
GA genetic algorithm
GD gradient descent
LSE least squares estimation
MF membership function
MIMO multi input multi output
MISO multi input single output
MoM mean of maxima (defuzzification)
RMSE root mean square error
SISO single input single output
TS Takagi-Sugeno (systems)
List of publications
1. Riid, A. and Rüstern, E. (1998), "Comparison of fuzzy function
approximators," Proc. 6th Biennial Baltic Electronic Conference, Tallinn, pp.
139-142.
2. Riid, A. and Rüstern, E. (1999), "Fuzzy modelling and control of fed-batch
fermentation" in Computational Intelligence and Applications, Pjotr
Szczepaniak (ed.), Physica Verlag, New York, pp. 283-291.
3. Riid, A. and Rüstern, E. (2000), "Interpretability versus adaptability in fuzzy
systems," Proc. Estonian Acad. Sci. Eng., vol. 6, no. 2, pp. 76-95.
4. Riid, A. and Rüstern, E. (2000), "Supervisory fed-batch fermentation
control on the basis of linguistically interpretable fuzzy models," Proc.
Estonian Acad. Sci. Eng., vol. 6, no. 2, pp. 96-112.
5. Riid, A. and Rüstern, E. (2000), "Transparent fuzzy systems and modeling
with transparency protection," Proc. IFAC Symp. on Artificial Intelligence
in Real Time Control, Budapest, pp. 229-234.
6. Riid, A., Jartsev, P. and Rüstern, E. (2000), “Genetic algorithms in
transparent fuzzy modeling,” Proc. 7th Biennial Baltic Electronic
Conference, Tallinn, pp.91-94.
7. Riid, A., Isotamm, R. and Rüstern, E. (2001), "Transparency analysis of 1st
order Takagi-Sugeno systems," Proc. 10th Int. Conf. System-Modeling-
Control, Zakopane, vol. 2, pp. 165-170.
8. Riid, A. and Rüstern, E. (2001), "Fuzzy logic in control: Truck Backer-
Upper problem revisited," Proc. IEEE 10th Int. Conf. Fuzzy Systems,
Melbourne, vol. 1, pp. 513-516.
9. Riid, A. and Rüstern, E. (2002), “Transparent fuzzy systems in modeling
and control,” submitted for J. Casillas, O. Cordon, F. Herrera and L.
Magdalena (Eds.) Trade-off between Accuracy and Interpretability in Fuzzy
Rule-Based Modelling, Physica-Verlag.
10. Riid, A. and Rüstern, E. (2002), “Gradient descent based optimization of
transparent Mamdani systems,” submitted for 6th International Conference
on Neural Networks and Soft Computing, Zakopane.
11. Riid, A. and Rüstern, E. (2002), “Fuzzy hierarchical control of truck and
trailer,” submitted for 8th Biennial Baltic Electronic Conference, Tallinn.
12. Riid, A., Isotamm, R. and Rüstern, E. (2002), “Transparency Enhancement
of First Order Takagi-Sugeno Systems: Promoting the Competition
Between the Rules by Controlling the Overlap of Input Fuzzy Sets,”
submitted for 8th Biennial Baltic Electronic Conference, Tallinn.
Types of membership functions
A.1 Singleton MF
$$\mu(x) = \begin{cases} 1, & \text{if } x = b \\ 0, & \text{otherwise} \end{cases} \tag{A.1}$$
A.2 Gaussian MF
$$\mu_A(x) = e^{-\frac{(x-c)^2}{2\sigma^2}} \tag{A.2}$$
A.3 Triangular MF
$$\mu_A(x) = \begin{cases} \dfrac{x-a}{b-a}, & a \le x \le b \\[6pt] \dfrac{c-x}{c-b}, & b \le x \le c \\[6pt] 0, & \text{otherwise} \end{cases} \tag{A.3}$$
A.4 Symmetrical triangular MF
$$\mu_A(y) = \begin{cases} \dfrac{2y-2b+s}{s}, & b-s/2 < y \le b \\[6pt] \dfrac{2b-2y+s}{s}, & b < y < b+s/2 \\[6pt] 0, & \text{otherwise} \end{cases} \tag{A.4}$$


A.5 Trapezoid MF
$$\mu_A(x) = \begin{cases} \dfrac{x-a}{b-a}, & a \le x \le b \\[6pt] 1, & b \le x \le c \\[6pt] \dfrac{d-x}{d-c}, & c \le x \le d \\[6pt] 0, & \text{otherwise} \end{cases} \tag{A.5}$$
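The piecewise definitions (A.1)-(A.5) translate directly into code. The sketch below is a minimal Python illustration, not the implementation used in this thesis; the function names are hypothetical, and it assumes a < b < c for the triangular MF and a < b ≤ c < d for the trapezoid so that no denominator vanishes.

```python
import math


def singleton(x: float, b: float) -> float:
    """Singleton MF (A.1): 1 exactly at x == b, 0 elsewhere."""
    return 1.0 if x == b else 0.0


def gaussian(x: float, c: float, sigma: float) -> float:
    """Gaussian MF (A.2) with centre c and width sigma."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular MF (A.3); assumes a < b < c."""
    if a <= x <= b:
        return (x - a) / (b - a)
    if b < x <= c:
        return (c - x) / (c - b)
    return 0.0  # outside the support [a, c]


def symmetric_triangular(y: float, b: float, s: float) -> float:
    """Symmetrical triangular MF (A.4) with centre b and support width s > 0."""
    if b - s / 2 < y <= b:
        return (2 * y - 2 * b + s) / s
    if b < y < b + s / 2:
        return (2 * b - 2 * y + s) / s
    return 0.0


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoid MF (A.5); assumes a < b <= c < d."""
    if a <= x < b:
        return (x - a) / (b - a)
    if b <= x <= c:
        return 1.0  # core of the fuzzy set
    if c < x <= d:
        return (d - x) / (d - c)
    return 0.0


print(triangular(0.25, 0.0, 0.5, 1.0))        # 0.5 (rising edge)
print(symmetric_triangular(1.75, 2.0, 1.0))   # 0.5 (halfway up the left side)
print(trapezoid(0.5, 0.0, 0.25, 0.75, 1.0))   # 1.0 (inside the core)
```

Returning 0.0 outside each support keeps the functions total, so they can be evaluated on any grid without special cases.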

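A.6 Square spline MF
$$\mu(x) = \begin{cases} 0, & x \le a \\[4pt] 2\left(\dfrac{x-a}{b-a}\right)^2, & a \le x \le \dfrac{a+b}{2} \\[8pt] 1-2\left(\dfrac{b-x}{b-a}\right)^2, & \dfrac{a+b}{2} \le x \le b \\[8pt] 1, & b < x \le c \\[4pt] 1-2\left(\dfrac{x-c}{d-c}\right)^2, & c \le x \le \dfrac{c+d}{2} \\[8pt] 2\left(\dfrac{d-x}{d-c}\right)^2, & \dfrac{c+d}{2} \le x \le d \\[8pt] 0, & x > d \end{cases} \tag{A.6}$$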
A.7 Cube spline MF
$$\mu(x) = \begin{cases} 0, & x \le a \\[4pt] 1-3\left(\dfrac{x-b}{a-b}\right)^2+2\left(\dfrac{x-b}{a-b}\right)^3, & a < x \le b \\[8pt] 1, & b < x \le c \\[4pt] 1-3\left(\dfrac{x-c}{d-c}\right)^2+2\left(\dfrac{x-c}{d-c}\right)^3, & c < x \le d \\[8pt] 0, & x > d \end{cases} \tag{A.7}$$
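A possible Python sketch of the spline MFs (A.6) and (A.7) follows, assuming a < b ≤ c < d; the function names are illustrative only. Both functions rise smoothly from 0 at a to 1 at b, stay at 1 on [b, c] and fall back to 0 at d. The small loop at the end checks numerically that the pieces meet at every breakpoint.

```python
def square_spline(x: float, a: float, b: float, c: float, d: float) -> float:
    """Square (quadratic) spline MF (A.6); assumes a < b <= c < d."""
    if x <= a or x > d:
        return 0.0
    if x <= (a + b) / 2:
        return 2.0 * ((x - a) / (b - a)) ** 2
    if x <= b:
        return 1.0 - 2.0 * ((b - x) / (b - a)) ** 2
    if x <= c:
        return 1.0
    if x <= (c + d) / 2:
        return 1.0 - 2.0 * ((x - c) / (d - c)) ** 2
    return 2.0 * ((d - x) / (d - c)) ** 2


def cube_spline(x: float, a: float, b: float, c: float, d: float) -> float:
    """Cube (cubic) spline MF (A.7); assumes a < b <= c < d."""
    if x <= a or x > d:
        return 0.0
    if x <= b:
        t = (x - b) / (a - b)  # t runs from 1 at x = a to 0 at x = b
        return 1.0 - 3.0 * t ** 2 + 2.0 * t ** 3
    if x <= c:
        return 1.0
    t = (x - c) / (d - c)      # t runs from 0 at x = c to 1 at x = d
    return 1.0 - 3.0 * t ** 2 + 2.0 * t ** 3


# Continuity check: values just left and right of each breakpoint agree.
params = (0.0, 1.0, 2.0, 3.0)
for knot in (0.0, 0.5, 1.0, 2.0, 2.5, 3.0):
    for mf in (square_spline, cube_spline):
        left, right = mf(knot - 1e-9, *params), mf(knot + 1e-9, *params)
        assert abs(left - right) < 1e-6, (mf.__name__, knot)
```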
Simplified inference algorithms
The defuzzified output of a standard fuzzy system (in discrete form) is obtained from
$$y = Y_{cog}(F(y)) = \frac{\sum_{q=1}^{Q} F(y_q)\,y_q}{\sum_{q=1}^{Q} F(y_q)}, \tag{B.1}$$
where
$$F(y) = \bigcup_{r=1}^{R}\left[\left(\bigcap_{i=1}^{N}\mu_{ir}(x_i)\right)\cap\gamma_r\right]. \tag{B.2}$$
Assuming product-product-sum inference,
$$F(y) = \sum_{r=1}^{R}\left(\prod_{i=1}^{N}\mu_{ir}(x_i)\right)\gamma_r = \sum_{r=1}^{R}\tau_r\gamma_r, \tag{B.3}$$
$$Y_{cog}(F(y)) = \frac{\left(\sum_{r=1}^{R}\tau_r\Gamma_r\right)Y^T}{\left(\sum_{r=1}^{R}\tau_r\Gamma_r\right)\mathbf{1}} = \frac{\sum_{q=1}^{Q}\sum_{r=1}^{R}\tau_r\gamma_r(y_q)\,y_q}{\sum_{q=1}^{Q}\sum_{r=1}^{R}\tau_r\gamma_r(y_q)} = \frac{\sum_{r=1}^{R}\tau_r\sum_{q=1}^{Q}\gamma_r(y_q)\,y_q}{\sum_{r=1}^{R}\tau_r\sum_{q=1}^{Q}\gamma_r(y_q)}, \tag{B.4}$$
where $\Gamma_r = [\gamma_r(y_1)\;\gamma_r(y_2)\;\dots\;\gamma_r(y_q)\;\dots\;\gamma_r(y_Q)]$, $Y = [y_1\;y_2\;\dots\;y_q\;\dots\;y_Q]$ and $\mathbf{1}$ is a unitary column vector of Q elements (all equal to one). Note that the premise conjunction operator does not actually play a role in the further derivation, since only the resulting activation degrees τr enter (B.4); i.e. it can be an arbitrary t-norm.
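The factorisation in (B.4) relies only on sum aggregation being linear in the rule outputs. The following Python sketch, using made-up output MFs, activation degrees and grid (all hypothetical values chosen for illustration), computes the discrete CoG of the aggregated output directly from (B.1) and (B.3), and again through the last expression of (B.4); the two routes should agree up to rounding.

```python
def tri(y, a, b, c):
    """Triangular MF (A.3); assumes a < b < c."""
    if a <= y <= b:
        return (y - a) / (b - a)
    if b < y <= c:
        return (c - y) / (c - b)
    return 0.0


Q = 401
Y = [4.0 * q / (Q - 1) for q in range(Q)]                      # grid y_1..y_Q on [0, 4]
out_mfs = [(0.0, 0.5, 1.5), (1.0, 2.0, 3.0), (2.0, 3.5, 4.0)]   # hypothetical gamma_r
taus = [0.2, 0.7, 0.4]                                         # hypothetical tau_r

# Rows Gamma_r = [gamma_r(y_1) ... gamma_r(y_Q)], as defined under (B.4).
Gamma = [[tri(y, *mf) for y in Y] for mf in out_mfs]

# Direct route: aggregate F(y_q) = sum_r tau_r * gamma_r(y_q) (B.3), then CoG (B.1).
F = [sum(t * row[q] for t, row in zip(taus, Gamma)) for q in range(Q)]
y_direct = sum(f * y for f, y in zip(F, Y)) / sum(F)

# Factorised route: last expression of (B.4), one inner sum per rule.
num = sum(t * sum(g * y for g, y in zip(row, Y)) for t, row in zip(taus, Gamma))
den = sum(t * sum(row) for t, row in zip(taus, Gamma))
y_factored = num / den

print(y_direct, y_factored)
```

The factorised form is what makes the closed-form results below possible: each rule contributes two precomputable sums, independent of the other rules.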
System with singleton output MFs
$$\gamma_r(y) = \begin{cases} 1, & y = b_r \\ 0, & \text{otherwise} \end{cases} \tag{B.5}$$
$$\sum_{q=1}^{Q}\gamma_r(y_q)\,y_q = 1\cdot b_r \tag{B.6}$$
$$\sum_{q=1}^{Q}\gamma_r(y_q) = 1 \tag{B.7}$$
Substituting (B.6) and (B.7) into (B.4), (B.4) becomes
$$y = \frac{\sum_{r=1}^{R}\tau_r b_r}{\sum_{r=1}^{R}\tau_r} \tag{B.8}$$
Note that (B.8) is also the inference algorithm of 0th order TS systems.
System with symmetrical output MFs
$$\gamma_r(y) = \begin{cases} \dfrac{2y-2b_r+s_r}{s_r}, & b_r-s_r/2 < y \le b_r \\[6pt] \dfrac{2b_r-2y+s_r}{s_r}, & b_r < y < b_r+s_r/2 \\[6pt] 0, & \text{otherwise} \end{cases} \tag{B.9}$$
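In the limit of a fine discretization (multiplying the numerator and the denominator of (B.4) by the same step Δy leaves their ratio unchanged),
$$\lim_{\Delta y\to 0}\sum_{q=1}^{Q}\gamma_r(y_q)\,y_q\,\Delta y = \int_{b_r-s_r/2}^{b_r}\frac{2y-2b_r+s_r}{s_r}\,y\,dy + \int_{b_r}^{b_r+s_r/2}\frac{2b_r-2y+s_r}{s_r}\,y\,dy = \left(\frac{b_r s_r}{4}-\frac{s_r^2}{24}\right)+\left(\frac{b_r s_r}{4}+\frac{s_r^2}{24}\right) = \frac{b_r s_r}{2}, \tag{B.10}$$
$$\lim_{\Delta y\to 0}\sum_{q=1}^{Q}\gamma_r(y_q)\,\Delta y = \int_{b_r-s_r/2}^{b_r}\frac{2y-2b_r+s_r}{s_r}\,dy + \int_{b_r}^{b_r+s_r/2}\frac{2b_r-2y+s_r}{s_r}\,dy = \frac{s_r}{4}+\frac{s_r}{4} = \frac{s_r}{2}. \tag{B.11}$$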
Substituting (B.10) and (B.11) into (B.4), (B.4) becomes
$$y = \frac{\sum_{r=1}^{R}\tau_r b_r s_r}{\sum_{r=1}^{R}\tau_r s_r} \tag{B.12}$$
Note that if sr = ξ for all r, where ξ is an arbitrary constant, (B.12) becomes (B.8). This implies that only the relative, not the absolute, size of sr matters, and thus a Mamdani system with symmetrical triangular output MFs of equal supports is equivalent to a 0th order TS system.
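The closed form (B.12) can be checked against a brute-force centre-of-gravity computation. The Python sketch below is an illustration under assumed values (the activations, centres and supports are hypothetical): it evaluates the discrete CoG (B.1) of the sum-aggregated output on a fine midpoint grid and compares it with (B.12), then confirms that equal supports reduce (B.12) to the 0th order TS form (B.8).

```python
def sym_tri(y, b, s):
    """Symmetrical triangular output MF (B.9) with centre b and support width s."""
    if b - s / 2 < y <= b:
        return (2 * y - 2 * b + s) / s
    if b < y < b + s / 2:
        return (2 * b - 2 * y + s) / s
    return 0.0


def cog_discrete(taus, bs, ss, lo, hi, Q):
    """Discrete CoG (B.1) of the sum-aggregated output (B.3) on a Q-point midpoint grid."""
    dy = (hi - lo) / Q
    ys = [lo + (q + 0.5) * dy for q in range(Q)]
    F = [sum(t * sym_tri(y, b, s) for t, b, s in zip(taus, bs, ss)) for y in ys]
    return sum(f * y for f, y in zip(F, ys)) / sum(F)


def closed_form(taus, bs, ss):
    """Simplified inference (B.12): sum(tau*b*s) / sum(tau*s)."""
    num = sum(t * b * s for t, b, s in zip(taus, bs, ss))
    return num / sum(t * s for t, s in zip(taus, ss))


taus = [0.3, 0.6, 0.1]   # hypothetical rule activations
bs = [0.0, 1.0, 2.0]     # output MF centres b_r
ss = [0.5, 1.0, 2.0]     # output MF supports s_r

y_num = cog_discrete(taus, bs, ss, lo=-1.0, hi=4.0, Q=20000)
y_b12 = closed_form(taus, bs, ss)

# Equal supports: (B.12) collapses to the 0th order TS form (B.8).
y_equal = closed_form(taus, bs, [0.7] * 3)
y_b8 = sum(t * b for t, b in zip(taus, bs)) / sum(taus)
print(y_num, y_b12, y_equal, y_b8)
```

The grid only needs to cover the union of the output supports; outside it every γr is zero and contributes nothing to either sum.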
The learning rules for various types of fuzzy systems
The derivation procedures of learning rules for various fuzzy systems are given
in this appendix.
At each training step l, the value of the error function
$$\varepsilon(k) = \frac{1}{2}\left[y(k)-\tilde{y}(k)\right]^2 \tag{C.1}$$
is computed, i.e. half the squared difference between the kth reference value ỹ(k) and the model response y(k) to the kth input pattern, which is obtained from the inference function.
0th order TS systems with triangular, trapezoid and spline-based MFs
The inference function in this case is given by
$$y(k) = \frac{\sum_{r=1}^{R}p_{0r}(l)\prod_{i=1}^{N}\mu_{ir}(x_i(k))}{\sum_{r=1}^{R}\prod_{i=1}^{N}\mu_{ir}(x_i(k))}. \tag{C.2}$$
The learning task is to identify new consequent parameters p0r(l + 1) and input
MF parameters:
i) air(l + 1), bir(l + 1) and cir(l + 1) when using triangular MFs (A.3)
ii) air(l + 1), bir(l + 1), cir(l + 1) and dir(l + 1) when using trapezoid MFs (A.5)
iii) air(l + 1), bir(l + 1), cir(l + 1) and dir(l + 1) when using square spline MFs (A.6)
According to (4.27), the update law for the output parameters p0r is
$$p_{0r}(l+1) = p_{0r}(l) - \eta\,\frac{\partial\varepsilon(k)}{\partial p_{0r}(l)}. \tag{C.3}$$
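To compute the derivative in (C.3), we apply the chain rule
$$\frac{\partial\varepsilon}{\partial p_{0r}} = \frac{\partial\varepsilon}{\partial y}\,\frac{\partial y}{\partial p_{0r}}. \tag{C.4}$$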
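Then
$$\frac{\partial\varepsilon}{\partial y} = y-\tilde{y} \tag{C.5}$$
and
$$\frac{\partial y}{\partial p_{0r}} = \frac{\partial}{\partial p_{0r}}\,\frac{p_{0r}\tau_r}{\sum_{r=1}^{R}\tau_r} = \frac{\tau_r}{\sum_{r=1}^{R}\tau_r}. \tag{C.6}$$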
Putting (C.5) and (C.6) into (C.4), and the latter into (C.3), we obtain the update rule for the consequent parameters:
$$p_{0r}(l+1) = p_{0r}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\,\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}. \tag{C.7}$$
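The consequent update (C.7) can be sketched in a few lines of Python. This is an illustration only, assuming triangular input MFs (A.3), product conjunction and a hypothetical one-input rule base; the helper names `activations`, `ts0_output` and `update_consequents` are not from the thesis. Because (C.2) is linear in p0r, one small step along (C.7) should reduce the error (C.1) on the pattern used.

```python
def tri(x, a, b, c):
    """Triangular MF (A.3); assumes a < b < c."""
    if a < x <= b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def activations(x, rules):
    """tau_r = prod_i mu_ir(x_i) for each rule; rules[r][i] = (a, b, c)."""
    taus = []
    for mfs in rules:
        t = 1.0
        for xi, (a, b, c) in zip(x, mfs):
            t *= tri(xi, a, b, c)
        taus.append(t)
    return taus


def ts0_output(x, rules, p0):
    """0th order TS inference (C.2)."""
    taus = activations(x, rules)
    return sum(p * t for p, t in zip(p0, taus)) / sum(taus)


def update_consequents(x, y_ref, rules, p0, eta):
    """One gradient step (C.7) applied to every consequent parameter p_0r."""
    taus = activations(x, rules)
    total = sum(taus)
    y = sum(p * t for p, t in zip(p0, taus)) / total
    return [p - eta * (y - y_ref) * t / total for p, t in zip(p0, taus)]


rules = [[(0.0, 0.3, 0.8)], [(0.1, 0.6, 1.0)], [(0.4, 0.9, 1.2)]]  # hypothetical, one input
p0 = [0.2, 0.9, 0.5]
x, y_ref = [0.55], 0.3
err_before = 0.5 * (ts0_output(x, rules, p0) - y_ref) ** 2
p0_new = update_consequents(x, y_ref, rules, p0, eta=0.5)
err_after = 0.5 * (ts0_output(x, rules, p0_new) - y_ref) ** 2
print(err_before, err_after)
```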
Triangular input MFs
The update rules for the input MF parameters depend on the kind of MFs used. If triangular MFs (A.3) are used, we must compute the partial derivatives
$$\frac{\partial\varepsilon}{\partial a_{ir}},\quad \frac{\partial\varepsilon}{\partial b_{ir}},\quad \frac{\partial\varepsilon}{\partial c_{ir}}$$
in order to derive the learning rules for all three parameters.
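Once again, the chain rule is used:
$$\frac{\partial\varepsilon}{\partial a_{ir}} = \frac{\partial\varepsilon}{\partial y}\,\frac{\partial y}{\partial\tau_r}\,\frac{\partial\tau_r}{\partial\mu_{ir}}\,\frac{\partial\mu_{ir}}{\partial a_{ir}}, \tag{C.8}$$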
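where
$$\frac{\partial y}{\partial\tau_r} = \frac{\partial}{\partial\tau_r}\,\frac{\sum_{r=1}^{R}p_{0r}\tau_r}{\sum_{r=1}^{R}\tau_r} = \frac{p_{0r}\sum_{r=1}^{R}\tau_r - \sum_{r=1}^{R}p_{0r}\tau_r}{\left(\sum_{r=1}^{R}\tau_r\right)^2} = \frac{p_{0r}-y}{\sum_{r=1}^{R}\tau_r}, \tag{C.9}$$
$$\frac{\partial\tau_r}{\partial\mu_{ir}} = \frac{\partial}{\partial\mu_{ir}}\prod_{i=1}^{N}\mu_{ir} = \frac{\tau_r}{\mu_{ir}}. \tag{C.10}$$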
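When computing the last partial derivative, it must be taken into account that the triangular MF is defined piecewise and is not differentiable at its breakpoints; therefore the partial derivative must be found separately for each linear region.
$$\frac{\partial\mu_{ir}}{\partial a_{ir}} = \begin{cases}\dfrac{\partial}{\partial a_{ir}}\,\dfrac{x_i-a_{ir}}{b_{ir}-a_{ir}} = \dfrac{x_i-b_{ir}}{(b_{ir}-a_{ir})^2}, & a_{ir}<x_i<b_{ir}\\[8pt] \dfrac{\partial}{\partial a_{ir}}\,\dfrac{c_{ir}-x_i}{c_{ir}-b_{ir}} = 0, & b_{ir}<x_i<c_{ir}\end{cases} \tag{C.11}$$
$$\frac{\partial\mu_{ir}}{\partial b_{ir}} = \begin{cases}\dfrac{\partial}{\partial b_{ir}}\,\dfrac{x_i-a_{ir}}{b_{ir}-a_{ir}} = \dfrac{a_{ir}-x_i}{(b_{ir}-a_{ir})^2}, & a_{ir}<x_i<b_{ir}\\[8pt] \dfrac{\partial}{\partial b_{ir}}\,\dfrac{c_{ir}-x_i}{c_{ir}-b_{ir}} = \dfrac{c_{ir}-x_i}{(c_{ir}-b_{ir})^2}, & b_{ir}<x_i<c_{ir}\end{cases} \tag{C.12}$$
$$\frac{\partial\mu_{ir}}{\partial c_{ir}} = \begin{cases}\dfrac{\partial}{\partial c_{ir}}\,\dfrac{x_i-a_{ir}}{b_{ir}-a_{ir}} = 0, & a_{ir}<x_i<b_{ir}\\[8pt] \dfrac{\partial}{\partial c_{ir}}\,\dfrac{c_{ir}-x_i}{c_{ir}-b_{ir}} = \dfrac{x_i-b_{ir}}{(c_{ir}-b_{ir})^2}, & b_{ir}<x_i<c_{ir}\end{cases} \tag{C.13}$$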
Combining (C.5), (C.9), (C.10) and one of (C.11)-(C.13), depending on the parameter for which the learning rule is derived, and substituting these into (C.8), one obtains the following update rules:
if $a_{ir}(l) < x_i(k) < b_{ir}(l)$:
$$\begin{aligned} a_{ir}(l+1) &= a_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\big(x_i(k)-a_{ir}(l)\big)\big(b_{ir}(l)-a_{ir}(l)\big)}\\ b_{ir}(l+1) &= b_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{a_{ir}(l)-b_{ir}(l)} \end{aligned} \tag{C.14}$$
if $b_{ir}(l) < x_i(k) < c_{ir}(l)$:
$$\begin{aligned} b_{ir}(l+1) &= b_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{c_{ir}(l)-b_{ir}(l)}\\ c_{ir}(l+1) &= c_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\big(c_{ir}(l)-x_i(k)\big)\big(c_{ir}(l)-b_{ir}(l)\big)} \end{aligned} \tag{C.15}$$
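Hand-derived gradients such as (C.8)-(C.15) are easy to get wrong, so a finite-difference comparison is a cheap safeguard. The Python sketch below is illustrative only: it assumes a hypothetical two-input, three-rule 0th order TS system with triangular MFs and product conjunction, and the names `grad_tri` and `numeric_grad` are made up for this example. `grad_tri` multiplies (C.5), (C.9), (C.10) and the appropriate case of (C.11)-(C.13), exactly as in (C.8).

```python
import copy


def tri(x, a, b, c):
    """Triangular MF (A.3); assumes a < b < c."""
    if a < x <= b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def forward(x, rules, p0):
    """Return (taus, y) of the 0th order TS system (C.2); rules[r][i] = (a, b, c)."""
    taus = []
    for mfs in rules:
        t = 1.0
        for xi, (a, b, c) in zip(x, mfs):
            t *= tri(xi, a, b, c)
        taus.append(t)
    return taus, sum(p * t for p, t in zip(p0, taus)) / sum(taus)


def loss(x, y_ref, rules, p0):
    """Error function (C.1)."""
    return 0.5 * (forward(x, rules, p0)[1] - y_ref) ** 2


def grad_tri(x, y_ref, rules, p0, r, i):
    """(d eps/d a_ir, d eps/d b_ir, d eps/d c_ir) via (C.8) with (C.5), (C.9)-(C.13)."""
    taus, y = forward(x, rules, p0)
    a, b, c = rules[r][i]
    xi = x[i]
    if a < xi < b:
        mu, dmu = (xi - a) / (b - a), ((xi - b) / (b - a) ** 2, (a - xi) / (b - a) ** 2, 0.0)
    elif b < xi < c:
        mu, dmu = (c - xi) / (c - b), (0.0, (c - xi) / (c - b) ** 2, (xi - b) / (c - b) ** 2)
    else:
        return (0.0, 0.0, 0.0)  # outside the support (or at a kink): no update
    # (C.5) * (C.9) * (C.10) = (y - y_ref) * (p_0r - y) / sum(tau) * tau_r / mu_ir
    common = (y - y_ref) * (p0[r] - y) / sum(taus) * taus[r] / mu
    return tuple(common * d for d in dmu)


def numeric_grad(x, y_ref, rules, p0, r, i, k, h=1e-6):
    """Central finite difference of (C.1) w.r.t. parameter k (0=a, 1=b, 2=c) of MF (r, i)."""
    plus, minus = copy.deepcopy(rules), copy.deepcopy(rules)
    plus[r][i] = tuple(v + h if j == k else v for j, v in enumerate(rules[r][i]))
    minus[r][i] = tuple(v - h if j == k else v for j, v in enumerate(rules[r][i]))
    return (loss(x, y_ref, plus, p0) - loss(x, y_ref, minus, p0)) / (2 * h)


rules = [[(0.0, 0.3, 0.8), (0.0, 0.4, 1.0)],    # hypothetical rule base, two inputs
         [(0.1, 0.6, 1.0), (0.2, 0.7, 1.2)],
         [(0.2, 0.5, 0.9), (-0.1, 0.5, 0.9)]]
p0, x, y_ref = [0.2, 0.9, 0.5], [0.45, 0.55], 0.3
for r in range(3):
    for i in range(2):
        print(r, i, grad_tri(x, y_ref, rules, p0, r, i),
              [numeric_grad(x, y_ref, rules, p0, r, i, k) for k in range(3)])
```

Multiplying the returned gradient by −η reproduces the parameter changes of (C.14) and (C.15).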
Trapezoid input MFs
Extending the update rules (C.14)-(C.15) to trapezoid MFs (A.5) is a
matter of rewriting:
if $a_{ir}(l) < x_i(k) < b_{ir}(l)$:
$$\begin{aligned} a_{ir}(l+1) &= a_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-b_{ir}(l)}{\big(x_i(k)-a_{ir}(l)\big)\big(b_{ir}(l)-a_{ir}(l)\big)}\\ b_{ir}(l+1) &= b_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{a_{ir}(l)-b_{ir}(l)} \end{aligned} \tag{C.16}$$
if $c_{ir}(l) < x_i(k) < d_{ir}(l)$:
$$\begin{aligned} c_{ir}(l+1) &= c_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{1}{d_{ir}(l)-c_{ir}(l)}\\ d_{ir}(l+1) &= d_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{x_i(k)-c_{ir}(l)}{\big(d_{ir}(l)-x_i(k)\big)\big(d_{ir}(l)-c_{ir}(l)\big)} \end{aligned} \tag{C.17}$$
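In (C.14)-(C.17), the last factor of each rule is (1/µir)·∂µir/∂θ, i.e. the derivative of ln µir with respect to the parameter θ. The Python sketch below (hypothetical helper names, assumed parameter values a < b ≤ c < d) implements that factor for the trapezoid MF, checks it against a finite difference of ln µ, and applies one step of (C.16)-(C.17) given the error term, (p0r − y) and the normalized activation τr/Στr.

```python
import math


def trapezoid(x, a, b, c, d):
    """Trapezoid MF (A.5); assumes a < b <= c < d."""
    if a < x < b:
        return (x - a) / (b - a)
    if b <= x <= c:
        return 1.0
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def rule_factor(x, a, b, c, d):
    """(1/mu) dmu/dtheta for theta = a, b, c, d: the last factor in (C.16)-(C.17)."""
    if a < x < b:
        return (x - b) / ((x - a) * (b - a)), 1.0 / (a - b), 0.0, 0.0
    if c < x < d:
        return 0.0, 0.0, 1.0 / (d - c), (x - c) / ((d - x) * (d - c))
    return 0.0, 0.0, 0.0, 0.0  # core or outside the support: no update


def update_trapezoid(params, x, err, p0r_minus_y, phi, eta):
    """One step of (C.16)-(C.17); err = y - y_ref, phi = tau_r / sum(tau)."""
    return tuple(v - eta * err * p0r_minus_y * phi * f
                 for v, f in zip(params, rule_factor(x, *params)))


def numeric_factor(x, params, k, h=1e-6):
    """Central difference of ln(mu) w.r.t. parameter k, used to check rule_factor."""
    plus = [v + h if j == k else v for j, v in enumerate(params)]
    minus = [v - h if j == k else v for j, v in enumerate(params)]
    return (math.log(trapezoid(x, *plus)) - math.log(trapezoid(x, *minus))) / (2 * h)


params = (0.0, 0.4, 0.6, 1.0)   # hypothetical a, b, c, d
for x in (0.2, 0.8):            # one point on each shoulder
    print(x, rule_factor(x, *params), [numeric_factor(x, params, k) for k in range(4)])
print(update_trapezoid(params, 0.2, err=0.1, p0r_minus_y=-0.3, phi=0.5, eta=0.1))
```

Viewing the factor as a log-derivative explains why only the two parameters bounding the active shoulder are updated for a given sample.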
Square spline-based input MFs
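The derivation of the update rules for the square spline MF (A.6) parameters requires the computation of the partial derivatives ∂ε/∂air, ∂ε/∂bir, ∂ε/∂cir and ∂ε/∂dir, and hence of the derivatives of µir with respect to each parameter on each segment of the MF:
$$\frac{\partial\mu_{ir}}{\partial a_{ir}} = \begin{cases} \dfrac{\partial}{\partial a_{ir}}\left[2\left(\dfrac{x_i-a_{ir}}{b_{ir}-a_{ir}}\right)^2\right] = 4\,\dfrac{(x_i-a_{ir})(x_i-b_{ir})}{(b_{ir}-a_{ir})^3}, & a_{ir}<x_i<\dfrac{a_{ir}+b_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial a_{ir}}\left[1-2\left(\dfrac{b_{ir}-x_i}{b_{ir}-a_{ir}}\right)^2\right] = -4\,\dfrac{(b_{ir}-x_i)^2}{(b_{ir}-a_{ir})^3}, & \dfrac{a_{ir}+b_{ir}}{2}<x_i<b_{ir}\\[10pt] \dfrac{\partial}{\partial a_{ir}}\left[1-2\left(\dfrac{x_i-c_{ir}}{d_{ir}-c_{ir}}\right)^2\right] = 0, & c_{ir}<x_i<\dfrac{c_{ir}+d_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial a_{ir}}\left[2\left(\dfrac{d_{ir}-x_i}{d_{ir}-c_{ir}}\right)^2\right] = 0, & \dfrac{c_{ir}+d_{ir}}{2}<x_i<d_{ir} \end{cases} \tag{C.18}$$
$$\frac{\partial\mu_{ir}}{\partial b_{ir}} = \begin{cases} \dfrac{\partial}{\partial b_{ir}}\left[2\left(\dfrac{x_i-a_{ir}}{b_{ir}-a_{ir}}\right)^2\right] = -4\,\dfrac{(x_i-a_{ir})^2}{(b_{ir}-a_{ir})^3}, & a_{ir}<x_i<\dfrac{a_{ir}+b_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial b_{ir}}\left[1-2\left(\dfrac{b_{ir}-x_i}{b_{ir}-a_{ir}}\right)^2\right] = 4\,\dfrac{(x_i-a_{ir})(x_i-b_{ir})}{(b_{ir}-a_{ir})^3}, & \dfrac{a_{ir}+b_{ir}}{2}<x_i<b_{ir}\\[10pt] \dfrac{\partial}{\partial b_{ir}}\left[1-2\left(\dfrac{x_i-c_{ir}}{d_{ir}-c_{ir}}\right)^2\right] = 0, & c_{ir}<x_i<\dfrac{c_{ir}+d_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial b_{ir}}\left[2\left(\dfrac{d_{ir}-x_i}{d_{ir}-c_{ir}}\right)^2\right] = 0, & \dfrac{c_{ir}+d_{ir}}{2}<x_i<d_{ir} \end{cases} \tag{C.19}$$
$$\frac{\partial\mu_{ir}}{\partial c_{ir}} = \begin{cases} \dfrac{\partial}{\partial c_{ir}}\left[2\left(\dfrac{x_i-a_{ir}}{b_{ir}-a_{ir}}\right)^2\right] = 0, & a_{ir}<x_i<\dfrac{a_{ir}+b_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial c_{ir}}\left[1-2\left(\dfrac{b_{ir}-x_i}{b_{ir}-a_{ir}}\right)^2\right] = 0, & \dfrac{a_{ir}+b_{ir}}{2}<x_i<b_{ir}\\[10pt] \dfrac{\partial}{\partial c_{ir}}\left[1-2\left(\dfrac{x_i-c_{ir}}{d_{ir}-c_{ir}}\right)^2\right] = 4\,\dfrac{(d_{ir}-x_i)(x_i-c_{ir})}{(d_{ir}-c_{ir})^3}, & c_{ir}<x_i<\dfrac{c_{ir}+d_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial c_{ir}}\left[2\left(\dfrac{d_{ir}-x_i}{d_{ir}-c_{ir}}\right)^2\right] = 4\,\dfrac{(d_{ir}-x_i)^2}{(d_{ir}-c_{ir})^3}, & \dfrac{c_{ir}+d_{ir}}{2}<x_i<d_{ir} \end{cases} \tag{C.20}$$
$$\frac{\partial\mu_{ir}}{\partial d_{ir}} = \begin{cases} \dfrac{\partial}{\partial d_{ir}}\left[2\left(\dfrac{x_i-a_{ir}}{b_{ir}-a_{ir}}\right)^2\right] = 0, & a_{ir}<x_i<\dfrac{a_{ir}+b_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial d_{ir}}\left[1-2\left(\dfrac{b_{ir}-x_i}{b_{ir}-a_{ir}}\right)^2\right] = 0, & \dfrac{a_{ir}+b_{ir}}{2}<x_i<b_{ir}\\[10pt] \dfrac{\partial}{\partial d_{ir}}\left[1-2\left(\dfrac{x_i-c_{ir}}{d_{ir}-c_{ir}}\right)^2\right] = 4\,\dfrac{(x_i-c_{ir})^2}{(d_{ir}-c_{ir})^3}, & c_{ir}<x_i<\dfrac{c_{ir}+d_{ir}}{2}\\[10pt] \dfrac{\partial}{\partial d_{ir}}\left[2\left(\dfrac{d_{ir}-x_i}{d_{ir}-c_{ir}}\right)^2\right] = 4\,\dfrac{(d_{ir}-x_i)(x_i-c_{ir})}{(d_{ir}-c_{ir})^3}, & \dfrac{c_{ir}+d_{ir}}{2}<x_i<d_{ir} \end{cases} \tag{C.21}$$
On the core bir < xi < cir, where µir = 1, all four partial derivatives are zero.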
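A finite-difference comparison is a cheap way to verify piecewise derivatives such as (C.18)-(C.21), including their signs (moving b or d outwards widens the MF, so ∂µ/∂b is negative on the rising edge and ∂µ/∂d is positive on the falling edge). The Python sketch below is illustrative; the helper names and the parameter values (a, b, c, d) = (0, 1, 2, 3) are assumptions, with one test point chosen inside each of the four curved segments.

```python
def square_spline(x, a, b, c, d):
    """Square spline MF (A.6); assumes a < b <= c < d."""
    if x <= a or x > d:
        return 0.0
    if x <= (a + b) / 2:
        return 2.0 * ((x - a) / (b - a)) ** 2
    if x <= b:
        return 1.0 - 2.0 * ((b - x) / (b - a)) ** 2
    if x <= c:
        return 1.0
    if x <= (c + d) / 2:
        return 1.0 - 2.0 * ((x - c) / (d - c)) ** 2
    return 2.0 * ((d - x) / (d - c)) ** 2


def dmu_square_spline(x, a, b, c, d):
    """(dmu/da, dmu/db, dmu/dc, dmu/dd) following (C.18)-(C.21), segment by segment."""
    if a < x < (a + b) / 2:
        return (4 * (x - a) * (x - b) / (b - a) ** 3, -4 * (x - a) ** 2 / (b - a) ** 3, 0.0, 0.0)
    if (a + b) / 2 < x < b:
        return (-4 * (b - x) ** 2 / (b - a) ** 3, 4 * (x - a) * (x - b) / (b - a) ** 3, 0.0, 0.0)
    if c < x < (c + d) / 2:
        return (0.0, 0.0, 4 * (d - x) * (x - c) / (d - c) ** 3, 4 * (x - c) ** 2 / (d - c) ** 3)
    if (c + d) / 2 < x < d:
        return (0.0, 0.0, 4 * (d - x) ** 2 / (d - c) ** 3, 4 * (d - x) * (x - c) / (d - c) ** 3)
    return (0.0, 0.0, 0.0, 0.0)  # core or outside the support


def numeric_dmu(x, params, k, h=1e-6):
    """Central finite difference of mu w.r.t. parameter k (0=a, 1=b, 2=c, 3=d)."""
    plus = [v + h if j == k else v for j, v in enumerate(params)]
    minus = [v - h if j == k else v for j, v in enumerate(params)]
    return (square_spline(x, *plus) - square_spline(x, *minus)) / (2 * h)


params = (0.0, 1.0, 2.0, 3.0)    # hypothetical a, b, c, d
for x in (0.3, 0.7, 2.3, 2.8):   # one point per curved segment
    print(x, dmu_square_spline(x, *params), [numeric_dmu(x, params, k) for k in range(4)])
```

Dividing these derivatives by µir(xi) gives the last factors that appear in the update rules (C.22)-(C.25).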

To obtain the learning rules, these partial derivatives are substituted into (C.8),
which results in (C.22)-(C.25).
if $a_{ir}(l) < x_i(k) < \big(a_{ir}(l)+b_{ir}(l)\big)/2$:
$$\begin{aligned} a_{ir}(l+1) &= a_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2\big(x_i(k)-b_{ir}(l)\big)}{\big(b_{ir}(l)-a_{ir}(l)\big)\big(x_i(k)-a_{ir}(l)\big)}\\ b_{ir}(l+1) &= b_{ir}(l) + \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2}{b_{ir}(l)-a_{ir}(l)} \end{aligned} \tag{C.22}$$
if $\big(a_{ir}(l)+b_{ir}(l)\big)/2 < x_i(k) < b_{ir}(l)$:
$$\begin{aligned} a_{ir}(l+1) &= a_{ir}(l) + \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\big(b_{ir}(l)-x_i(k)\big)^2}{\mu_{ir}(x_i(k))\big(b_{ir}(l)-a_{ir}(l)\big)^3}\\ b_{ir}(l+1) &= b_{ir}(l) + \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\big(b_{ir}(l)-x_i(k)\big)\big(x_i(k)-a_{ir}(l)\big)}{\mu_{ir}(x_i(k))\big(b_{ir}(l)-a_{ir}(l)\big)^3} \end{aligned} \tag{C.23}$$
if $c_{ir}(l) < x_i(k) < \big(c_{ir}(l)+d_{ir}(l)\big)/2$:
$$\begin{aligned} c_{ir}(l+1) &= c_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\big(c_{ir}(l)-x_i(k)\big)\big(x_i(k)-d_{ir}(l)\big)}{\mu_{ir}(x_i(k))\big(d_{ir}(l)-c_{ir}(l)\big)^3}\\ d_{ir}(l+1) &= d_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{4\big(c_{ir}(l)-x_i(k)\big)^2}{\mu_{ir}(x_i(k))\big(d_{ir}(l)-c_{ir}(l)\big)^3} \end{aligned} \tag{C.24}$$
if $\big(c_{ir}(l)+d_{ir}(l)\big)/2 < x_i(k) < d_{ir}(l)$:
$$\begin{aligned} c_{ir}(l+1) &= c_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2}{d_{ir}(l)-c_{ir}(l)}\\ d_{ir}(l+1) &= d_{ir}(l) + \eta\,\big(y(k)-\tilde{y}(k)\big)\big(p_{0r}(l)-y(k)\big)\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}\,\frac{2\big(x_i(k)-c_{ir}(l)\big)}{\big(d_{ir}(l)-c_{ir}(l)\big)\big(x_i(k)-d_{ir}(l)\big)} \end{aligned} \tag{C.25}$$
1st order TS systems
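The learning rules for the linear coefficients of 1st order TS systems are obtained in a similar manner. The update
$$p_{ir}(l+1) = p_{ir}(l) - \eta\,\frac{\partial\varepsilon(k)}{\partial p_{ir}(l)} \tag{C.26}$$
is computed using the chain rule
$$\frac{\partial\varepsilon}{\partial p_{ir}} = \frac{\partial\varepsilon}{\partial y}\,\frac{\partial y}{\partial p_{ir}}, \tag{C.27}$$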
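where
$$\frac{\partial y}{\partial p_{ir}} = \frac{\partial}{\partial p_{ir}}\,\frac{(p_{0r}+p_{1r}x_1+\dots+p_{Nr}x_N)\,\tau_r}{\sum_{r=1}^{R}\tau_r} = \frac{x_i\tau_r}{\sum_{r=1}^{R}\tau_r},\qquad i=1,\dots,N, \tag{C.28}$$
which results in the update rule
$$p_{ir}(l+1) = p_{ir}(l) - \eta\,\big(y(k)-\tilde{y}(k)\big)\frac{x_i(k)\,\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)}. \tag{C.29}$$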
The learning rules for the input MFs of 1st order TS systems, analogous to (C.14)-(C.15), (C.16)-(C.17) and (C.22)-(C.25), are easily obtained from those derived for 0th order TS systems by replacing the partial derivative ∂y/∂τr in (C.8) with (C.30):
$$\frac{\partial y}{\partial\tau_r} = \frac{\partial}{\partial\tau_r}\,\frac{\sum_{r=1}^{R}(p_{0r}+p_{1r}x_1+\dots+p_{Nr}x_N)\,\tau_r}{\sum_{r=1}^{R}\tau_r} = \frac{p_{0r}+\sum_{i=1}^{N}p_{ir}x_i-y}{\sum_{r=1}^{R}\tau_r} \tag{C.30}$$
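The two derivatives that distinguish 1st order TS learning, (C.28) for the linear coefficients and (C.30) for the activation degrees, can be sketched and checked numerically in Python. Everything below is a hypothetical illustration: the activations τr are taken as given numbers (so no particular MF shape is assumed), and the helper names are not from the thesis.

```python
def local_model(x, p):
    """y_r = p_0r + p_1r x_1 + ... + p_Nr x_N, with p = [p_0r, ..., p_Nr]."""
    return p[0] + sum(pi * xi for pi, xi in zip(p[1:], x))


def ts1_output(x, taus, P):
    """1st order TS output: activation-weighted mean of the local models."""
    return sum(t * local_model(x, p) for t, p in zip(taus, P)) / sum(taus)


def dy_dp(x, taus, r, i):
    """(C.28) for i >= 1; i = 0 reproduces (C.6) with x_0 = 1."""
    xi = 1.0 if i == 0 else x[i - 1]
    return xi * taus[r] / sum(taus)


def dy_dtau(x, taus, P, r):
    """(C.30): (p_0r + sum_i p_ir x_i - y) / sum(tau)."""
    return (local_model(x, P[r]) - ts1_output(x, taus, P)) / sum(taus)


def update_linear(x, y_ref, taus, P, eta):
    """One step of (C.29) for every coefficient (p_0r updated as in (C.7))."""
    err = ts1_output(x, taus, P) - y_ref
    return [[p - eta * err * dy_dp(x, taus, r, i) for i, p in enumerate(row)]
            for r, row in enumerate(P)]


x = [0.4, -1.2]                                              # hypothetical input, N = 2
taus = [0.3, 0.5, 0.2]                                       # hypothetical tau_r, R = 3
P = [[0.1, 1.0, -0.5], [0.3, 0.2, 0.4], [-0.2, 0.7, 0.1]]   # rows [p_0r, p_1r, p_2r]

# Finite-difference check of (C.30) for rule r = 1.
h = 1e-6
num = (ts1_output(x, [taus[0], taus[1] + h, taus[2]], P)
       - ts1_output(x, [taus[0], taus[1] - h, taus[2]], P)) / (2 * h)
print(dy_dtau(x, taus, P, 1), num)

P_new = update_linear(x, 0.0, taus, P, eta=0.1)
print(ts1_output(x, taus, P), ts1_output(x, taus, P_new))
```

In a complete learner, `dy_dtau` would simply take the place of the (C.9) factor when chaining through (C.8) to the input MF parameters.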
Transparency protected gradient descent algorithm for standard fuzzy systems
The inference function of a transparent standard fuzzy system with triangular MFs (the input MFs form a partition (4.71) and the output MFs are symmetrical (A.4), thus satisfying (3.5)-(3.6)) and product implication/sum aggregation is given by
$$y = \frac{\sum_{r=1}^{R}\tau_r b_r s_r}{\sum_{r=1}^{R}\tau_r s_r}, \tag{D.1}$$
where
$$\tau_r = \prod_{i=1}^{N}\mu_{ir}(x_i). \tag{D.2}$$
The goal of the supervised learning algorithm based on gradient descent is to minimize the cost function
$$\varepsilon = \frac{1}{2}\left[y-\tilde{y}\right]^2, \tag{D.3}$$
where ỹ is the reference output and y is given by (D.1).
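The parameter updates
$$b_r(l+1) = b_r(l) - \eta\,\frac{\partial\varepsilon}{\partial b_r}, \tag{D.4}$$
$$s_r(l+1) = s_r(l) - \eta\,\frac{\partial\varepsilon}{\partial s_r}, \tag{D.5}$$
$$a_i^s(l+1) = a_i^s(l) - \eta\,\frac{\partial\varepsilon}{\partial a_i^s}, \tag{D.6}$$
are computed using the chain rules
$$\frac{\partial\varepsilon}{\partial b_r} = \frac{\partial\varepsilon}{\partial y}\,\frac{\partial y}{\partial b_r}, \tag{D.7}$$
$$\frac{\partial\varepsilon}{\partial s_r} = \frac{\partial\varepsilon}{\partial y}\,\frac{\partial y}{\partial s_r}, \tag{D.8}$$
$$\frac{\partial\varepsilon}{\partial a_i^s} = \frac{\partial\varepsilon}{\partial y}\,\frac{\partial y}{\partial a_i^s} = \frac{\partial\varepsilon}{\partial y}\left(\frac{\partial y}{\partial\mu_i^{s-1}}\frac{\partial\mu_i^{s-1}}{\partial a_i^s}+\frac{\partial y}{\partial\mu_i^{s}}\frac{\partial\mu_i^{s}}{\partial a_i^s}+\frac{\partial y}{\partial\mu_i^{s+1}}\frac{\partial\mu_i^{s+1}}{\partial a_i^s}\right). \tag{D.9}$$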
Thus
∂ε ∂y τ s
= (y − ~
y) = (y − ~
y) R r r ,
∂br ∂br
∑τ r s r
(D.10)
r =1
and
τ (k ) s r (l )
br (l + 1) = br (l ) − η ( y (k ) − ~y (k )) R r ;
∑τ r (k )s r (l )
(D.11)
r =1
$$
\frac{\partial\varepsilon}{\partial s_r} = (y - \tilde{y})\,\frac{\tau_r b_r\sum_{r=1}^{R}\tau_r s_r - \tau_r\sum_{r=1}^{R}\tau_r b_r s_r}{\left(\sum_{r=1}^{R}\tau_r s_r\right)^2} = (y - \tilde{y})(b_r - y)\,\frac{\tau_r}{\sum_{r=1}^{R}\tau_r s_r} \tag{D.12}
$$

and

$$
s_r(l+1) = s_r(l) - \eta\,\big(y(k) - \tilde{y}(k)\big)\big(b_r(l) - y(k)\big)\,\frac{\tau_r(k)}{\sum_{r=1}^{R}\tau_r(k)\,s_r(l)}. \tag{D.13}
$$
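In the same spirit, a hedged sketch of (D.12)-(D.13): the analytic gradient with respect to s_r is compared with finite differences, and one step with a small learning rate is applied. Names and example values are hypothetical; the firing strengths are again assumed to be precomputed.

```python
def std_output(taus, b, s):
    """Output (D.1) from precomputed firing strengths tau_r."""
    num = sum(t * b_r * s_r for t, b_r, s_r in zip(taus, b, s))
    return num / sum(t * s_r for t, s_r in zip(taus, s))


def eps(taus, b, s, y_ref):
    """Cost function (D.3)."""
    return 0.5 * (std_output(taus, b, s) - y_ref) ** 2


def grad_s(taus, b, s, y_ref):
    """(D.12): (y - y_ref)(b_r - y) tau_r / sum(tau s) for every rule."""
    y = std_output(taus, b, s)
    den = sum(t * s_r for t, s_r in zip(taus, s))
    return [(y - y_ref) * (b_r - y) * t / den for t, b_r in zip(taus, b)]


def update_s(taus, b, s, y_ref, eta):
    """(D.13): one gradient-descent step on the factors s_r."""
    return [s_r - eta * g for s_r, g in zip(s, grad_s(taus, b, s, y_ref))]


def grad_s_numeric(taus, b, s, y_ref, h=1e-6):
    """Central finite differences of (D.3) with respect to each s_r."""
    grads = []
    for r in range(len(s)):
        up, down = list(s), list(s)
        up[r] += h
        down[r] -= h
        grads.append((eps(taus, b, up, y_ref) - eps(taus, b, down, y_ref)) / (2 * h))
    return grads


taus, b, s, y_ref = [0.1, 0.6, 0.3], [0.0, 1.0, 3.0], [1.0, 2.0, 0.5], 2.0
print(grad_s(taus, b, s, y_ref))
print(grad_s_numeric(taus, b, s, y_ref))
print(eps(taus, b, s, y_ref), eps(taus, b, update_s(taus, b, s, y_ref, eta=0.1), y_ref))
```

Note that the gradient for s_r vanishes when b_r = y: changing the factor of a rule whose consequent already equals the current output cannot move the output.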
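Further derivation yields

$$
\begin{aligned}
\frac{\partial y}{\partial \mu_i^s}
&= \frac{1}{\mu_i^s}\,\frac{\sum_{r'=1}^{R_i^s}\tau_{r'}b_{r'}s_{r'}\sum_{r=1}^{R}\tau_r s_r - \sum_{r'=1}^{R_i^s}\tau_{r'}s_{r'}\sum_{r=1}^{R}\tau_r b_r s_r}{\left(\sum_{r=1}^{R}\tau_r s_r\right)^2} \\
&= \frac{1}{\mu_i^s}\,\frac{\sum_{r'=1}^{R_i^s}\tau_{r'}b_{r'}s_{r'} - y\sum_{r'=1}^{R_i^s}\tau_{r'}s_{r'}}{\sum_{r=1}^{R}\tau_r s_r}
= \frac{1}{\mu_i^s}\,\frac{\sum_{r'=1}^{R_i^s}\tau_{r'}s_{r'}\,(b_{r'} - y)}{\sum_{r=1}^{R}\tau_r s_r}
\end{aligned}
\tag{D.14}
$$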
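(D.14) can be checked numerically by treating the membership degrees µ_i^s as independent inputs of (D.1)-(D.2), as the chain rule (D.9) does. In this hypothetical sketch `mus[i][m]` is the degree of the m-th MF of input i, and only the rules whose i-th premise uses that MF (the r' rules) contribute to the sum. Because (D.14) divides by µ_i^s, the sketch assumes µ_i^s > 0.

```python
import math


def output_from_mus(mus, rules, b, s):
    """(D.1)-(D.2) evaluated from given membership degrees."""
    taus = [math.prod(mus[i][m] for i, m in enumerate(ante)) for ante in rules]
    den = sum(t * s_r for t, s_r in zip(taus, s))
    y = sum(t * b_r * s_r for t, b_r, s_r in zip(taus, b, s)) / den
    return y, taus, den


def dy_dmu(mus, rules, b, s, i, m):
    """(D.14): (1 / mu_i^m) * sum_{r'} tau s (b - y) / sum(tau s)."""
    y, taus, den = output_from_mus(mus, rules, b, s)
    acc = sum(t * s_r * (b_r - y)
              for ante, t, b_r, s_r in zip(rules, taus, b, s) if ante[i] == m)
    return acc / (mus[i][m] * den)


def dy_dmu_numeric(mus, rules, b, s, i, m, h=1e-6):
    """Central finite difference of y with respect to a single degree mu_i^m."""
    up = [list(row) for row in mus]
    down = [list(row) for row in mus]
    up[i][m] += h
    down[i][m] -= h
    y_up = output_from_mus(up, rules, b, s)[0]
    y_down = output_from_mus(down, rules, b, s)[0]
    return (y_up - y_down) / (2 * h)


mus = [[0.75, 0.25], [0.5, 0.5]]
rules = [(0, 0), (0, 1), (1, 0), (1, 1)]
b, s = [0.0, 1.0, 1.0, 2.0], [1.0, 2.0, 1.0, 3.0]
for i in range(2):
    for m in range(2):
        print(i, m, dy_dmu(mus, rules, b, s, i, m), dy_dmu_numeric(mus, rules, b, s, i, m))
```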
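and

$$
\frac{\partial\mu_i^{s-1}}{\partial a_i^s} =
\begin{cases}
\dfrac{\partial}{\partial a_i^s}\left(1 - \dfrac{x_i - a_i^{s-1}}{a_i^s - a_i^{s-1}}\right) = \dfrac{x_i - a_i^{s-1}}{\left(a_i^s - a_i^{s-1}\right)^2} = \dfrac{\mu_i^s}{a_i^s - a_i^{s-1}}, & \text{if } a_i^{s-1} < x_i < a_i^s \\[2ex]
0, & \text{otherwise}
\end{cases}
\tag{D.15}
$$

$$
\frac{\partial\mu_i^{s}}{\partial a_i^s} =
\begin{cases}
\dfrac{\partial}{\partial a_i^s}\left(\dfrac{x_i - a_i^{s-1}}{a_i^s - a_i^{s-1}}\right) = -\dfrac{x_i - a_i^{s-1}}{\left(a_i^s - a_i^{s-1}\right)^2} = -\dfrac{\mu_i^s}{a_i^s - a_i^{s-1}}, & \text{if } a_i^{s-1} < x_i < a_i^s \\[2ex]
\dfrac{\partial}{\partial a_i^s}\left(\dfrac{a_i^{s+1} - x_i}{a_i^{s+1} - a_i^s}\right) = \dfrac{a_i^{s+1} - x_i}{\left(a_i^{s+1} - a_i^s\right)^2} = \dfrac{\mu_i^s}{a_i^{s+1} - a_i^s}, & \text{if } a_i^s < x_i < a_i^{s+1} \\[2ex]
0, & \text{otherwise}
\end{cases}
\tag{D.16}
$$

$$
\frac{\partial\mu_i^{s+1}}{\partial a_i^s} =
\begin{cases}
\dfrac{\partial}{\partial a_i^s}\left(1 - \dfrac{a_i^{s+1} - x_i}{a_i^{s+1} - a_i^s}\right) = -\dfrac{a_i^{s+1} - x_i}{\left(a_i^{s+1} - a_i^s\right)^2} = -\dfrac{\mu_i^s}{a_i^{s+1} - a_i^s}, & \text{if } a_i^s < x_i < a_i^{s+1} \\[2ex]
0, & \text{otherwise}
\end{cases}
\tag{D.17}
$$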
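The three derivatives (D.15)-(D.17) depend only on one input's partition, so they can be sketched for a single input. The hypothetical function `dmu_dpeak` below returns ∂µ_i^m/∂a_i^s for every MF m of that input and is compared with finite differences; x_i is assumed to lie strictly between two peaks, where the formulas apply.

```python
def tri_partition(x, peaks):
    """Membership degrees of x in a triangular partition with sorted peaks."""
    mu = [0.0] * len(peaks)
    for m in range(len(peaks) - 1):
        lo, hi = peaks[m], peaks[m + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            mu[m], mu[m + 1] = 1.0 - t, t
            break
    return mu


def dmu_dpeak(x, peaks, s):
    """d mu^m / d a^s for all MFs m of one input, following (D.15)-(D.17).
    Returns zeros when x is not strictly inside (a^{s-1}, a^s) or (a^s, a^{s+1})."""
    mu = tri_partition(x, peaks)
    d = [0.0] * len(peaks)
    if s > 0 and peaks[s - 1] < x < peaks[s]:
        w = peaks[s] - peaks[s - 1]
        d[s - 1] = mu[s] / w   # (D.15)
        d[s] = -mu[s] / w      # (D.16), left branch
    elif s < len(peaks) - 1 and peaks[s] < x < peaks[s + 1]:
        w = peaks[s + 1] - peaks[s]
        d[s] = mu[s] / w       # (D.16), right branch
        d[s + 1] = -mu[s] / w  # (D.17)
    return d


def dmu_dpeak_numeric(x, peaks, s, h=1e-6):
    """Central finite differences of all degrees with respect to the peak a^s."""
    up, down = list(peaks), list(peaks)
    up[s] += h
    down[s] -= h
    return [(u - v) / (2 * h) for u, v in zip(tri_partition(x, up), tri_partition(x, down))]


peaks = [0.0, 1.0, 3.0]
print(dmu_dpeak(0.4, peaks, 1), dmu_dpeak_numeric(0.4, peaks, 1))  # left branch
print(dmu_dpeak(2.0, peaks, 1), dmu_dpeak_numeric(2.0, peaks, 1))  # right branch
```

Since the partition keeps the two active degrees summing to one, the two nonzero derivatives in each branch are always equal and opposite.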
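Thus, if $a_i^{s-1} < x_i < a_i^s$,

$$
\begin{aligned}
\frac{\partial\varepsilon}{\partial a_i^s}
&= (y - \tilde{y})\left(\frac{\frac{1}{\mu_i^{s-1}}\sum_{r'=1}^{R(\mu_i^{s-1})}\tau_{r'}s_{r'}(b_{r'} - y)}{\sum_{r=1}^{R}\tau_r s_r}\,\frac{\mu_i^s}{a_i^s - a_i^{s-1}} - \frac{\frac{1}{\mu_i^{s}}\sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}s_{r'}(b_{r'} - y)}{\sum_{r=1}^{R}\tau_r s_r}\,\frac{\mu_i^s}{a_i^s - a_i^{s-1}}\right) \\
&= (y - \tilde{y})\,\frac{\frac{\mu_i^s}{\mu_i^{s-1}}\sum_{r'=1}^{R(\mu_i^{s-1})}\tau_{r'}s_{r'}(b_{r'} - y) - \sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}s_{r'}(b_{r'} - y)}{\left(a_i^s - a_i^{s-1}\right)\sum_{r=1}^{R}\tau_r s_r}
\end{aligned}
\tag{D.18}
$$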
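if $a_i^s < x_i < a_i^{s+1}$,

$$
\begin{aligned}
\frac{\partial\varepsilon}{\partial a_i^s}
&= (y - \tilde{y})\left(\frac{\frac{1}{\mu_i^{s}}\sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}s_{r'}(b_{r'} - y)}{\sum_{r=1}^{R}\tau_r s_r}\,\frac{\mu_i^s}{a_i^{s+1} - a_i^{s}} - \frac{\frac{1}{\mu_i^{s+1}}\sum_{r'=1}^{R(\mu_i^{s+1})}\tau_{r'}s_{r'}(b_{r'} - y)}{\sum_{r=1}^{R}\tau_r s_r}\,\frac{\mu_i^s}{a_i^{s+1} - a_i^{s}}\right) \\
&= (y - \tilde{y})\,\frac{\sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}s_{r'}(b_{r'} - y) - \frac{\mu_i^s}{\mu_i^{s+1}}\sum_{r'=1}^{R(\mu_i^{s+1})}\tau_{r'}s_{r'}(b_{r'} - y)}{\left(a_i^{s+1} - a_i^{s}\right)\sum_{r=1}^{R}\tau_r s_r},
\end{aligned}
\tag{D.19}
$$

where r′ = 1, …, R(µ_i^s) refers to the rules having A_i^s in their premise.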

Substituting (D.18) and (D.19) into (D.6) results in the following learning rules for the input MF parameters:
if $a_i^{s-1} < x_i < a_i^s$,

$$
\begin{aligned}
a_i^s(l+1) = a_i^s(l) &- \eta\cdot\frac{y(k) - \tilde{y}(k)}{a_i^s(l) - a_i^{s-1}(l)}\cdot\frac{1}{\sum_{r=1}^{R}\tau_r(k)\,s_r(l)} \\
&\cdot\left(\frac{\mu_i^s(x_i(k))}{\mu_i^{s-1}(x_i(k))}\sum_{r'=1}^{R(\mu_i^{s-1})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big) - \sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big)\right),
\end{aligned}
\tag{D.20}
$$

if $a_i^s < x_i < a_i^{s+1}$,

$$
\begin{aligned}
a_i^s(l+1) = a_i^s(l) &- \eta\cdot\frac{y(k) - \tilde{y}(k)}{a_i^{s+1}(l) - a_i^{s}(l)}\cdot\frac{1}{\sum_{r=1}^{R}\tau_r(k)\,s_r(l)} \\
&\cdot\left(\sum_{r'=1}^{R(\mu_i^{s})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big) - \frac{\mu_i^s(x_i(k))}{\mu_i^{s+1}(x_i(k))}\sum_{r'=1}^{R(\mu_i^{s+1})}\tau_{r'}(k)\,s_{r'}(l)\big(b_{r'}(l) - y(k)\big)\right).
\end{aligned}
\tag{D.21}
$$
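Putting (D.14)-(D.19) together, the following sketch computes ∂ε/∂a_i^s for one peak and applies the step (D.20)/(D.21). It is a minimal illustration under the same assumptions as the formulas (triangular partition, x_i strictly between two peaks); the MF index s of the text is called `k` in the code to avoid a clash with the rule factors `s`, and all names and example values are hypothetical. A central finite difference of ε serves as an independent check of both branches.

```python
import math


def tri_partition(x, peaks):
    """Degrees of x in a triangular partition with sorted peaks (x inside the range)."""
    mu = [0.0] * len(peaks)
    for m in range(len(peaks) - 1):
        lo, hi = peaks[m], peaks[m + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            mu[m], mu[m + 1] = 1.0 - t, t
            break
    return mu


def forward(x, peaks, rules, b, s):
    """Output (D.1), membership degrees, firing strengths (D.2) and sum(tau s)."""
    mus = [tri_partition(x_i, p) for x_i, p in zip(x, peaks)]
    taus = [math.prod(mus[i][m] for i, m in enumerate(ante)) for ante in rules]
    den = sum(t * s_r for t, s_r in zip(taus, s))
    y = sum(t * b_r * s_r for t, b_r, s_r in zip(taus, b, s)) / den
    return y, mus, taus, den


def grad_peak(x, y_ref, peaks, rules, b, s, i, k):
    """d eps / d a_i^k from (D.18)-(D.19); zero if x_i is not strictly between peaks."""
    y, mus, taus, den = forward(x, peaks, rules, b, s)

    def partial_sum(m):
        # Sum of tau_r' s_r' (b_r' - y) over the rules r' whose i-th premise uses MF m.
        return sum(t * s_r * (b_r - y)
                   for ante, t, b_r, s_r in zip(rules, taus, b, s) if ante[i] == m)

    a, x_i, mu = peaks[i], x[i], mus[i]
    if k > 0 and a[k - 1] < x_i < a[k]:  # (D.18)
        inner = mu[k] / mu[k - 1] * partial_sum(k - 1) - partial_sum(k)
        return (y - y_ref) * inner / ((a[k] - a[k - 1]) * den)
    if k < len(a) - 1 and a[k] < x_i < a[k + 1]:  # (D.19)
        inner = partial_sum(k) - mu[k] / mu[k + 1] * partial_sum(k + 1)
        return (y - y_ref) * inner / ((a[k + 1] - a[k]) * den)
    return 0.0


def update_peak(x, y_ref, peaks, rules, b, s, i, k, eta):
    """One step of (D.20)/(D.21) on a_i^k; returns new peak lists, inputs untouched."""
    new = [list(p) for p in peaks]
    new[i][k] -= eta * grad_peak(x, y_ref, peaks, rules, b, s, i, k)
    return new


def eps(x, y_ref, peaks, rules, b, s):
    """Cost function (D.3)."""
    return 0.5 * (forward(x, peaks, rules, b, s)[0] - y_ref) ** 2


def grad_peak_numeric(x, y_ref, peaks, rules, b, s, i, k, h=1e-6):
    """Central finite difference of (D.3) with respect to the peak a_i^k."""
    up = [list(p) for p in peaks]
    down = [list(p) for p in peaks]
    up[i][k] += h
    down[i][k] -= h
    return (eps(x, y_ref, up, rules, b, s) - eps(x, y_ref, down, rules, b, s)) / (2 * h)


# Input 0: three MFs with peaks 0, 1, 2; input 1: two MFs with peaks 0, 1.
peaks = [[0.0, 1.0, 2.0], [0.0, 1.0]]
rules = [(m0, m1) for m0 in range(3) for m1 in range(2)]
b = [0.0, 1.0, 1.0, 2.0, 3.0, 1.0]
s = [1.0, 2.0, 1.0, 1.0, 3.0, 2.0]
for x, y_ref in (([0.4, 0.3], 2.0), ([1.5, 0.3], 2.0)):  # (D.18) and (D.19) branches
    g = grad_peak(x, y_ref, peaks, rules, b, s, 0, 1)
    print(g, grad_peak_numeric(x, y_ref, peaks, rules, b, s, 0, 1))
    stepped = update_peak(x, y_ref, peaks, rules, b, s, 0, 1, eta=0.05)
    print(eps(x, y_ref, peaks, rules, b, s), eps(x, y_ref, stepped, rules, b, s))
```

Only the peak is moved here; the sketch does not address keeping the peaks ordered, which a practical implementation would presumably have to guarantee for the partition to stay valid.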
Note that if s_r = ξ for all rules, (D.1) reduces to (2.29), and the resulting (D.11) and (D.20)-(D.21) constitute the original Jager algorithm for 0th order TS systems.
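The reduction stated above can be checked numerically. Equation (2.29) is not reproduced in this section; the sketch assumes it is the usual normalized weighted average Σ τ_r b_r / Σ τ_r of a 0th order TS system with singleton consequents. With every s_r equal to the same ξ, the factor ξ cancels in (D.1), and the weight τ_r s_r / Σ τ_r s_r in (D.11) becomes τ_r / Σ τ_r. Function names are hypothetical.

```python
def std_output(taus, b, s):
    """Standard fuzzy system output (D.1) from precomputed firing strengths."""
    num = sum(t * b_r * s_r for t, b_r, s_r in zip(taus, b, s))
    return num / sum(t * s_r for t, s_r in zip(taus, s))


def ts0_output(taus, b):
    """Assumed form of (2.29): 0th order TS output with singleton consequents b_r."""
    return sum(t * b_r for t, b_r in zip(taus, b)) / sum(taus)


def b_weights(taus, s):
    """Weight tau_r s_r / sum(tau s) that multiplies (y - y_ref) in (D.11)."""
    den = sum(t * s_r for t, s_r in zip(taus, s))
    return [t * s_r / den for t, s_r in zip(taus, s)]


taus, b = [0.2, 0.5, 0.3], [1.0, -2.0, 4.0]
for xi in (0.5, 1.0, 7.0):
    s = [xi] * len(taus)
    print(xi, std_output(taus, b, s), ts0_output(taus, b))
    print(b_weights(taus, s), [t / sum(taus) for t in taus])
```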
Tallinn, April-September 2001, December 2001, February-March 2002
