

To My Parents


Preface

This book attempts to provide the reader with basic concepts and engineering applications of Fuzzy Logic and Neural Networks. Back in 2000, the absence of any state-of-the-art (Indian domain) textbook forced me to write this book.

The field of neural networks has a history of some five decades but has found solid application only in the past fifteen years, and the field is still developing rapidly. Thus, it is distinctly different from the fields of control systems or optimization, where the terminology, basic mathematics, and design procedures have been firmly established and applied for many years. Neural networks are useful for industry, education and research. Some of the material in this book is timely and may therefore change considerably over the years. This book is intended to cover primarily the topics of neural computing, neural modeling, neural learning, and neural memory.

Many systems are not amenable to conventional modeling approaches due to the lack of precise, formal knowledge about the system, due to strongly non-linear behaviour, due to the high degree of uncertainty, or due to time-varying characteristics. Fuzzy modeling, along with other related techniques such as neural networks, has been recognized as a powerful tool which can facilitate the effective development of models. Modeling and control of dynamic systems belong to the fields in which fuzzy set techniques have received considerable attention, not only from the scientific community but also from industry. Recently, a great deal of research activity has focused on the development of methods to build or update fuzzy models from numerical data. Most approaches are based on neuro-fuzzy systems, which exploit the functional similarity between fuzzy reasoning systems and neural networks. This combination of fuzzy systems and neural networks enables a more effective use of optimization techniques for building fuzzy systems, especially with regard to their approximation accuracy. Neuro-fuzzy models can, however, be regarded as black-box models, which provide little insight to help understand the underlying process. The approach adopted in this book therefore aims at the development of transparent rule-based fuzzy models which can accurately predict the quantities of interest and at the same time provide insight into the system that generated the data. Attention is paid to the selection of appropriate model structures in terms of the dynamic properties, as well as the internal structure of the fuzzy rules.

The presentation reflects theoretical and practical issues in a balanced way, aiming at readership from the academic world and also from industrial practice. The orientation of the book is towards methodologies that in the author's experience proved to be practically useful. The choice of engineering applications coincides with the Fuzzy Logic and Neural Network research interests of the readers. Examples are given throughout the text and six selected real-world applications are presented in detail.

Chennakesava R. Alavala

Contents

Preface

Chapter 1: Introduction
  1.1 Fuzzy Logic (FL)
  1.2 Neural Networks (NN)
  1.3 Similarities and Dissimilarities Between FL and NN
  1.4 Applications
  Question Bank
  References

Part I: Fuzzy Logic

Chapter 2: Fuzzy Sets and Fuzzy Logic
  2.1 Introduction
  2.2 What is Fuzzy Logic?
  2.3 Historical Background
  2.4 Characteristics of Fuzzy Logic
  2.5 Characteristics of Fuzzy Systems
  2.6 Fuzzy Sets
    2.6.1 Fuzzy Set
    2.6.2 Support
    2.6.3 Normal Fuzzy Set
    2.6.4 α-Cut
    2.6.5 Convex Fuzzy Set
    2.6.6 Fuzzy Number
    2.6.7 Quasi Fuzzy Number
    2.6.8 Triangular Fuzzy Number
    2.6.9 Trapezoidal Fuzzy Number
    2.6.10 Subsethood
    2.6.11 Equality of Fuzzy Sets
    2.6.12 Empty Fuzzy Set
    2.6.13 Universal Fuzzy Set
    2.6.14 Fuzzy Point
  2.7 Operations on Fuzzy Sets
    2.7.1 Intersection
    2.7.2 Union
    2.7.3 Complement
  Question Bank
  References

Chapter 3: Fuzzy Relations
  3.1 Introduction
  3.2 Fuzzy Relations
    3.2.1 Classical N-Array Relation
    3.2.2 Reflexivity
    3.2.3 Anti-Reflexivity
    3.2.4 Symmetricity
    3.2.5 Anti-Symmetricity
    3.2.6 Transitivity
    3.2.7 Equivalence
    3.2.8 Partial Order
    3.2.9 Total Order
    3.2.10 Binary Fuzzy Relation
  3.3 Operations on Fuzzy Relations
    3.3.1 Intersection
    3.3.2 Union
    3.3.3 Projection
    3.3.4 Cartesian Product of Two Fuzzy Sets
    3.3.5 Shadow of Fuzzy Relation
    3.3.6 Sup-Min Composition of Fuzzy Relations
  Question Bank
  References

Chapter 4: Fuzzy Implications
  4.1 Introduction
  4.2 Fuzzy Implications
  4.3 Modifiers
    4.3.1 Linguistic Variables
    4.3.2 The Linguistic Variable Truth
  Question Bank
  References

Chapter 5: The Theory of Approximate Reasoning
  5.1 Introduction
  5.2 Translation Rules
    5.2.1 Entailment Rule
    5.2.2 Conjunction Rule
    5.2.3 Disjunction Rule
    5.2.4 Projection Rule
    5.2.5 Negation Rule
    5.2.6 Compositional Rule of Inference
  5.3 Rational Properties
    5.3.1 Basic Property
    5.3.2 Total Indeterminance
    5.3.3 Subset
    5.3.4 Superset
  Question Bank
  References

Chapter 6: Fuzzy Rule-Based Systems
  6.1 Introduction
  6.2 Triangular Norm
  6.3 Triangular Conorm
  6.4 t-norm-Based Intersection
  6.5 t-conorm-Based Union
  6.6 Averaging Operators
    6.6.1 An Averaging Operator is a Function
    6.6.2 Ordered Weighted Averaging
  6.7 Measure of Dispersion or Entropy of an OWA Vector
  6.8 Mamdani System
  6.9 Larsen System
  6.10 Defuzzification
  Question Bank
  References

Chapter 7: Fuzzy Reasoning Schemes
  7.1 Introduction
  7.2 Fuzzy Rule-Based System
  7.3 Inference Mechanisms in Fuzzy Rule-Based Systems
    7.3.1 Mamdani Inference Mechanism
    7.3.2 Tsukamoto Inference Mechanism
    7.3.3 Sugeno Inference Mechanism
    7.3.4 Larsen Inference Mechanism
    7.3.5 Simplified Fuzzy Reasoning
  Question Bank
  References

Chapter 8: Fuzzy Logic Controllers
  8.1 Introduction
  8.2 Basic Feedback Control System
  8.3 Fuzzy Logic Controller
    8.3.1 Two-Input-Single-Output (TISO) Fuzzy Systems
    8.3.2 Mamdani Type of Fuzzy Logic Control
    8.3.3 Fuzzy Logic Control Systems
  8.4 Defuzzification Methods
    8.4.1 Center-of-Area/Gravity
    8.4.2 First-of-Maxima
    8.4.3 Middle-of-Maxima
    8.4.4 Max-Criterion
    8.4.5 Height Defuzzification
  8.5 Effectivity of Fuzzy Logic Control Systems
  Question Bank
  References

Chapter 9: Fuzzy Logic Applications
  9.1 Why Use Fuzzy Logic?
  9.2 Applications of Fuzzy Logic
  9.3 When Not to Use Fuzzy Logic?
  9.4 Fuzzy Logic Model for Prevention of Road Accidents
    9.4.1 Traffic Accidents and Traffic Safety
    9.4.2 Fuzzy Logic Approach
    9.4.3 Application
    9.4.4 Membership Functions
    9.4.5 Rule Base
    9.4.6 Output
    9.4.7 Conclusions
  9.5 Fuzzy Logic Model to Control Room Temperature
    9.5.1 The Mechanics of Fuzzy Logic
    9.5.2 Fuzzification
    9.5.3 Rule Application
    9.5.4 Defuzzification
    9.5.5 Conclusions
  9.6 Fuzzy Logic Model for Grading of Apples
    9.6.1 Apple Defects Used in the Study
    9.6.2 Materials and Methods
    9.6.3 Application of Fuzzy Logic
    9.6.4 Fuzzy Rules
    9.6.5 Determination of Membership Functions
    9.6.6 Defuzzification
    9.6.7 Results and Discussion
    9.6.8 Conclusion
  9.7 An Introductory Example: Fuzzy v/s Non-Fuzzy
    9.7.1 The Non-Fuzzy Approach
    9.7.2 The Fuzzy Approach
    9.7.3 Some Observations
  Question Bank
  References

Part II: Neural Networks

Chapter 10: Neural Networks Fundamentals
  10.1 Introduction
  10.2 Biological Neural Network
  10.3 A Framework for Distributed Representation
    10.3.1 Processing Units
    10.3.2 Connections between Units
    10.3.3 Activation and Output Rules
  10.4 Network Topologies
  10.5 Training of Artificial Neural Networks
    10.5.1 Paradigms of Learning
    10.5.2 Modifying Patterns of Connectivity
  10.6 Notation and Terminology
    10.6.1 Notation
    10.6.2 Terminology
  Question Bank
  References

Chapter 11: Perceptron and Adaline
  11.1 Introduction
  11.2 Networks with Threshold Activation Functions
  11.3 Perceptron Learning Rule and Convergence Theorem
    11.3.1 Perceptron Learning Rule
    11.3.2 Convergence Theorem
  11.4 Adaptive Linear Element (Adaline)
  11.5 The Delta Rule
  11.6 Exclusive-Or Problem
  11.7 Multi-Layer Perceptrons Can Do Everything
  Question Bank
  References

Chapter 12: Back-Propagation
  12.1 Introduction
  12.2 Multi-Layer Feed-Forward Networks
  12.3 The Generalised Delta Rule
    12.3.1 Understanding Back-Propagation
  12.4 Working with Back-Propagation
    12.4.1 Weight Adjustments with Sigmoid Activation Function
    12.4.2 Learning Rate and Momentum
    12.4.3 Learning Per Pattern
  12.5 Other Activation Functions
  12.6 Deficiencies of Back-Propagation
    12.6.1 Network Paralysis
    12.6.2 Local Minima
  12.7 Advanced Algorithms
  12.8 How Good are Multi-Layer Feed-Forward Networks?
    12.8.1 The Effect of the Number of Learning Samples
    12.8.2 The Effect of the Number of Hidden Units
  12.9 Applications
  Question Bank
  References

Chapter 13: Recurrent Networks
  13.1 Introduction
  13.2 The Generalised Delta Rule in Recurrent Networks
    13.2.1 The Jordan Network
    13.2.2 The Elman Network
    13.2.3 Back-Propagation in Fully Recurrent Networks
  13.3 The Hopfield Network
    13.3.1 Description
    13.3.2 Hopfield Network as Associative Memory
    13.3.3 Neurons with Graded Response
    13.3.4 Hopfield Networks for Optimization Problems
  13.4 Boltzmann Machines
  Question Bank
  References

Chapter 14: Self-Organising Networks
  14.1 Introduction
  14.2 Competitive Learning
    14.2.1 Clustering
    14.2.2 Vector Quantisation
    14.2.3 Counterpropagation
    14.2.4 Learning Vector Quantisation
  14.3 Kohonen Network
  14.4 Principal Component Networks
    14.4.1 Normalized Hebbian Rule
    14.4.2 Principal Component Extractor
    14.4.3 More Eigenvectors
  14.5 Adaptive Resonance Theory
    14.5.1 Background: Adaptive Resonance Theory
    14.5.2 ART1: The Simplified Neural Network Model
    14.5.3 Operation
    14.5.4 ART1: The Original Model
    14.5.5 Normalization of the Original Model
    14.5.6 Contrast Enhancement
  Question Bank
  References

Chapter 15: Reinforcement Learning
  15.1 Introduction
  15.2 The Critic
  15.3 The Controller Network
  15.4 Barto's Approach: The ASE-ACE Combination
    15.4.1 Associative Search
    15.4.2 Adaptive Critic
    15.4.3 The Cart-Pole System
  15.5 Reinforcement Learning Versus Optimal Control
  Question Bank
  References

Chapter 16: Neural Networks Applications
  16.1 Introduction
  16.2 Robot Control
    16.2.1 Forward Kinematics
    16.2.2 Inverse Kinematics
    16.2.3 Dynamics
    16.2.4 Trajectory Generation
    16.2.5 End-Effector Positioning
      16.2.5a Involvement of Neural Networks
    16.2.6 Camera-Robot Coordination in Function Approximation
      16.2.6a Approach 1: Feed-Forward Networks
      16.2.6b Approach 2: Topology Conserving Maps
    16.2.7 Robot Arm Dynamics
  16.3 Detection of Tool Breakage in Milling Operations
    16.3.1 Unsupervised Adaptive Resonance Theory (ART) Neural Networks
    16.3.2 Results and Discussion
  Question Bank
  References

Part III: Hybrid Fuzzy Neural Networks

Chapter 17: Hybrid Fuzzy Neural Networks
  17.1 Introduction
  17.2 Hybrid Systems
    17.2.1 Sequential Hybrid Systems
    17.2.2 Auxiliary Hybrid Systems
    17.2.3 Embedded Hybrid Systems
  17.3 Fuzzy Logic in Learning Algorithms
  17.4 Fuzzy Neurons
  17.5 Neural Networks as Pre-Processors or Post-Processors
  17.6 Neural Networks as Tuners of Fuzzy Logic Systems
  17.7 Advantages and Drawbacks of Neurofuzzy Systems
  17.8 Committee of Networks
  17.9 FNN Architecture Based on Back-Propagation
    17.9.1 Strong L-R Representation of Fuzzy Numbers
    17.9.2 Simulation
  17.10 Adaptive Neuro-Fuzzy Inference System (ANFIS)
    17.10.1 ANFIS Structure
  Question Bank
  References

Chapter 18: Hybrid Fuzzy Neural Networks Applications
  18.1 Introduction
  18.2 Tool Breakage Monitoring System for End Milling
    18.2.1 Methodology: Force Signals in the End Milling Cutting Process
    18.2.2 Neural Networks
    18.2.3 Experimental Design and System Development
    18.2.4 Neural Network-BP System Development
    18.2.5 Findings and Conclusions
  18.3 Control of Combustion
    18.3.1 Adaptive Neuro-Fuzzy Inference System
    18.3.2 Learning Method of ANFIS
    18.3.3 Model of Combustion
    18.3.4 Optimization of the PI-Controllers Using Genetic Algorithms
  Question Bank
  References

Index


CHAPTER 1

INTRODUCTION

Nowadays, fuzzy logic, neural networks, etc., have rooted in many application areas (expert systems, pattern recognition, system control, etc.). Although these methodologies seem to be different, they have many common features, like the use of basis functions (fuzzy logic has membership functions and neural networks have activation functions) and the aim to estimate functions from sample data or heuristics. These methods have in common that they are non-linear, have the ability to deal with non-linearities, follow more human-like reasoning paths than classical methods, and utilize self-learning. Fuzzy logic is mainly associated with imprecision, approximate reasoning and computing with words, while neural networks are associated with learning and curve fitting (and also with classification).

1.1 FUZZY LOGIC (FL)

The concept of Fuzzy Logic was conceived by Lotfi A. Zadeh, a professor at the University of California at Berkeley, and presented not as a control methodology, but as a way of processing data by allowing partial set membership rather than crisp set membership or non-membership. This approach to set theory was not applied to control systems until the 70s due to insufficient small-computer capability prior to that time. Professor Zadeh reasoned that people do not require precise, numerical information input, and yet they are capable of highly adaptive control. If feedback controllers could be programmed to accept noisy, imprecise input, they would be much more effective and perhaps easier to implement. Unfortunately, U.S. manufacturers have not been so quick to embrace this technology while the Europeans and Japanese have been aggressively building real products around it.

Basically, FL is a multivalued logic that allows intermediate values to be defined between conventional evaluations like true/false, yes/no, high/low, etc. Notions like rather tall or very fast can be formulated mathematically and processed by computers, in order to apply a more human-like way of thinking in the programming of computers.
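The idea of partial set membership can be made concrete in a few lines of Python. The predicate and the height breakpoints below are invented purely for illustration and are not from the text:

```python
# Illustrative only: a fuzzy predicate "tall" that returns a degree of
# membership in [0, 1] instead of a crisp True/False. The breakpoints
# (160 cm and 190 cm) are assumed values for the sake of the example.

def tall(height_cm: float) -> float:
    """Degree to which a person of the given height is 'tall'.
    Crisp logic would pick a single cutoff; fuzzy logic ramps smoothly."""
    if height_cm <= 160.0:
        return 0.0
    if height_cm >= 190.0:
        return 1.0
    return (height_cm - 160.0) / 30.0  # linear ramp between the extremes

print(tall(155))  # 0.0 -> definitely not tall
print(tall(175))  # 0.5 -> tall "to degree one half"
print(tall(195))  # 1.0 -> definitely tall
```

A crisp set would force a 174.9 cm person and a 175.1 cm person onto opposite sides of an arbitrary boundary; the fuzzy predicate instead assigns both a degree near 0.5.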

Fuzzy systems are an alternative to traditional notions of set membership and logic that has its origins in ancient Greek philosophy. The precision of mathematics owes its success in large part to the efforts of Aristotle and the philosophers who preceded him. In their efforts to devise a concise theory of logic, and later mathematics, the so-called Laws of Thought were posited. One of these, the Law of the Excluded Middle, states that every proposition must either be True or False. Even when Parminedes proposed the first version of this law (around 400 B.C.) there were strong and immediate objections: for example, Heraclitus proposed that things could be simultaneously True and not True. It was Plato who laid the foundation for what would become fuzzy logic, indicating that there was a third region (beyond True and False) where these opposites tumbled about. But it was Lukasiewicz who first proposed a systematic alternative to the bi-valued logic of Aristotle.

The most significant application area of FL has been the control field. It has been made a rough guess that 90% of applications are in control, the main part dealing with rather simple applications (see Fig. 1.1). Fuzzy control includes fans, complex aircraft engines and control surfaces, helicopter control, missile guidance, automatic transmission, wheel slip control, industrial processes and so on. Commercially most significant have been various household and entertainment electronics, for example washing machine controllers and autofocus cameras. The most famous controller is the subway train controller in Sendai, Japan. The fuzzy system performs better (uses less fuel, drives smoother) when compared with a conventional PID controller. Companies that have fuzzy research are General Electric, Siemens, Nissan, Mitsubishi, Honda, Sharp, Hitachi, Canon, Samsung, Omron, Fuji, McDonnell Douglas, Rockwell, etc.

Fig. 1.1 Example of a control problem: the fuzzy controller receives the difference between the reference input and the measured output, and drives the plant to be controlled.

If the conventional techniques of system analysis cannot be successfully incorporated into the modeling or control problem, the use of heuristic linguistic rules may be the most reasonable solution to the problem. For example, there is no mathematical model for the truck and trailer reversing problem, in which the truck must be guided from an arbitrary initial position to a desired final position. Humans and fuzzy systems can perform this nonlinear control task with relative ease by using practical and at the same time imprecise rules such as: If the trailer turns slightly left, then turn the wheel slightly left.

1.2 NEURAL NETWORKS (NN)

The study of neural networks started with the publication of McCulloch and Pitts [1943]. The single-layer networks, with threshold activation functions, were introduced by Rosenblatt [1959]. These types of networks were called perceptrons. In the 1960s it was experimentally shown that perceptrons could solve many problems, but that many problems which did not seem to be more difficult could not be solved. These limitations of the one-layer perceptron were mathematically shown by Minsky and Papert in their book Perceptrons [1969]. The result of this publication was that neural networks lost their interestingness for almost two decades. In the mid-1980s, the back-propagation algorithm was reported by Rumelhart, Hinton, and Williams [1986], which revived the study of neural networks. The significance of this new algorithm was that multi-layer networks could be trained by using it.

NN makes an attempt to simulate the human brain. The simulating is based on the present knowledge of brain function, and this knowledge is even at its best primitive. So, it is not absolutely wrong to claim that artificial neural networks probably have no close relationship to the operation of human brains. The operation of the brain is believed to be based on simple basic elements called neurons, which are connected to each other with transmission lines called axons and receptive lines called dendrites (see Fig. 1.2). Each neuron has an activation level which, in contrast to Boolean logic, ranges between some minimum and maximum value. The learning may be based on two mechanisms: the creation of new connections, and the modification of connections.

In artificial neural networks the inputs of the neuron are combined in a linear way with different weights. The result of this combination is then fed into a non-linear activation unit (activation function), which can in its simplest form be a threshold unit (see Fig. 1.2).

Fig. 1.2 Simple illustration of a biological neuron (synapse, nucleus, axon, dendrites) and an artificial neuron (perceptron): weighted inputs x1, x2 with weights w0, w1, w2 feeding a summing threshold unit with output z.

Neural networks are often used to enhance and optimize fuzzy logic based systems, e.g., by giving them a learning ability. This learning ability is achieved by presenting a training set of different examples to the network and using a learning algorithm which changes the weights (or the parameters of the activation functions) in such a way that the network will reproduce a correct output with the correct input values. The difficulty is how to guarantee generalization and to determine when the network is sufficiently trained. Neural networks offer nonlinearity, input-output mapping, adaptivity and fault tolerance. Nonlinearity is a desired property if the generator of the input signal is inherently nonlinear. The high connectivity of the network ensures that the influence of errors in a few terms will be minor, which ideally gives a high fault tolerance.
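The weighted-sum-plus-threshold unit just described can be sketched in Python. The weights and bias below are hand-picked to realize a logical AND, purely for illustration; Rosenblatt's learning rule would instead find such weights from examples:

```python
# A single artificial neuron: linear combination of inputs followed by a
# threshold activation, as in the perceptron of Fig. 1.2. The weights and
# bias are assumed values chosen so the unit computes logical AND.

def perceptron(inputs, weights, bias):
    """Fire (output 1) if the weighted sum exceeds the threshold, else 0."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > 0 else 0

# Truth table for AND of two binary inputs: fires only when both are 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        y = perceptron((x1, x2), weights=(1.0, 1.0), bias=-1.5)
        print(x1, x2, "->", y)
```

No choice of weights and bias in a single such unit can realize exclusive-or, which is exactly the kind of limitation Minsky and Papert demonstrated for one-layer perceptrons.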

1.3 SIMILARITIES AND DISSIMILARITIES BETWEEN FL AND NN

There are similarities between fuzzy logic and neural networks:
- estimate functions from sample data
- do not require a mathematical model
- are dynamic systems
- can be expressed as a graph which is made up of nodes and edges
- convert numerical inputs to numerical outputs
- process inexact information inexactly
- have the same state space
- produce bounded signals
- a set of n neurons defines n-dimensional fuzzy sets
- learn some unknown probability function
- can act as associative memories
- can model any system provided the number of nodes is sufficient.

The main dissimilarity between fuzzy logic and neural networks is that FL uses heuristic knowledge to form rules and tunes these rules using sample data, whereas NN forms rules based entirely on data.

1.4 APPLICATIONS

Applications can be found in signal processing, pattern recognition, quality assurance and industrial inspection, business forecasting, speech processing, credit rating, adaptive process control, robotics control, natural-language understanding, etc. Possible new application areas are programming languages, user-friendly application interfaces, automaticized programming, computer networks, database management, fault diagnostics and information security.

In many cases, good results have been achieved by combining both methods, and the number of such hybrid systems is growing. A very interesting combination is the neuro-fuzzy architecture, in which the good properties of both methods are brought together. Most neuro-fuzzy systems are fuzzy rule-based systems in which techniques of neural networks are used for rule induction and calibration. Fuzzy logic may also be employed to improve the performance of optimization methods used with neural networks.

QUESTION BANK

1. What is the ancient philosophy of fuzzy logic?
2. What are the various applications of fuzzy logic?
3. What is the historical evolution of neural networks?
4. What are the similarities and dissimilarities between fuzzy logic and neural networks?
5. What are the various applications of neural networks?

REFERENCES

1. L.A. Zadeh, Fuzzy sets, Information and Control, Vol. 8, pp. 338-353, 1965.
2. C. Lejewski, Jan Lukasiewicz, Encyclopedia of Philosophy, Vol. 5, MacMillan, NY, pp. 104-107, 1967.
3. S. Korner, Laws of thought, Encyclopedia of Philosophy, Vol. 4, MacMillan, NY, pp. 414-417, 1967.
4. L.A. Zadeh, Making computers think like people, IEEE Spectrum, 8/1984, pp. 26-32.
5. U.S. loses focus on fuzzy logic, Machine Design, June 21, 1990.
6. Europe gets into fuzzy logic, Electronics Engineering Times, Nov. 11, 1991.
7. Smith, Why the Japanese are going in for this fuzzy logic, Business Week, Feb. 20, 1993, pp. 39.
8. L.A. Zadeh, Soft computing and fuzzy logic, IEEE Software, Vol. 11, No. 6 (November), pp. 48-56, 1994.
9. W. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133, 1943.
10. F. Rosenblatt, Principles of Neurodynamics, Spartan Books, New York, 1959.
11. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, The MIT Press, 1969.
12. D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning representations by back-propagating errors, Nature, Vol. 323, pp. 533-536, 1986.

CHAPTER 2

FUZZY SETS AND FUZZY LOGIC

2.1 INTRODUCTION

Fuzzy sets were introduced by L.A. Zadeh in 1965 to represent/manipulate data and information possessing nonstatistical uncertainties. Fuzzy set theory was specifically designed to mathematically represent uncertainty and vagueness and to provide formalized tools for dealing with the imprecision intrinsic to many problems. Fuzzy logic provides an inference morphology that enables approximate human reasoning capabilities to be applied to knowledge-based systems. The theory of fuzzy logic provides a mathematical strength to capture the uncertainties associated with human cognitive processes, such as thinking and reasoning.

The conventional approaches to knowledge representation lack the means for representing the meaning of fuzzy concepts. As a consequence, the approaches based on first order logic and classical probability theory do not provide an appropriate conceptual framework for dealing with the representation of commonsense knowledge, since such knowledge is by its nature both lexically imprecise and noncategorical. The development of fuzzy logic was motivated in large measure by the need for a conceptual framework which can address the issue of uncertainty and lexical imprecision.

2.2 WHAT IS FUZZY LOGIC?

Fuzzy logic is all about the relative importance of precision: How important is it to be exactly right when a rough answer will do? All books on fuzzy logic begin with a few good quotes on this very topic, and this is no exception. Here is what some clever people have said in the past:

Precision is not truth. (Henri Matisse)

Sometimes the more measurable drives out the most important. (Rene Dubos)

Vagueness is no more to be done away with in the world of logic than friction in mechanics. (Charles Sanders Peirce)

I believe that nothing is unconditionally true, and hence I am opposed to every statement of positive truth and every man who makes it. (H.L. Mencken)

So far as the laws of mathematics refer to reality, they are not certain. And so far as they are certain, they do not refer to reality. (Albert Einstein)

As complexity rises, precise statements lose meaning and meaningful statements lose precision. (L.A. Zadeh)

Some pearls of folk wisdom also echo these thoughts:

Don't lose sight of the forest for the trees.

Don't be penny wise and pound foolish.

Fuzzy logic is a convenient way to map an input space to an output space, and the great emphasis here is on the word convenient. What do I mean by mapping input space to output space? Here are a few examples: You tell me how good your service was at a restaurant, and I'll tell you what the tip should be. You tell me how hot you want the water, and I'll adjust the faucet valve to the right setting. You tell me how far away the subject of your photograph is, and I'll focus the lens for you. You tell me how fast the car is going and how hard the motor is working, and I'll shift the gears for you. This is the starting point for everything else.

Fuzzy logic is a fascinating area of research because it does a good job of trading off between significance and precision, something that humans have been managing for a very long time (Fig. 2.1). Fuzzy logic sometimes appears exotic or intimidating to those unfamiliar with it, but once you become acquainted with it, it seems almost surprising that no one attempted it sooner.

Fig. 2.1 Precision and significance in the real world. Precision: "A 1500 kg mass is approaching your head at 45.3 m/sec." Significance: "LOOK OUT!!"
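The restaurant-tip mapping can be caricatured with two fuzzy rules and a weighted average. The membership shapes and tip percentages below are assumptions made for illustration, not a method prescribed by the text; inference and defuzzification are treated properly in later chapters:

```python
# Toy input-output mapping "service quality -> tip" using two fuzzy rules:
#   IF service is poor THEN tip is low (5%)
#   IF service is good THEN tip is high (25%)
# Service is rated on [0, 10]. All shapes and percentages are assumed.

def mu_poor(service: float) -> float:
    """Degree to which service is 'poor' (linear, 1 at 0, 0 at 10)."""
    return (10.0 - service) / 10.0

def mu_good(service: float) -> float:
    """Degree to which service is 'good' (linear, 0 at 0, 1 at 10)."""
    return service / 10.0

def tip(service: float) -> float:
    """Fire both rules and take the weighted average of their outputs
    (a Sugeno-style defuzzification; here the weights always sum to 1)."""
    w_poor, w_good = mu_poor(service), mu_good(service)
    return (w_poor * 5.0 + w_good * 25.0) / (w_poor + w_good)

print(tip(0.0))   # 5.0  -> poor service, low tip
print(tip(10.0))  # 25.0 -> excellent service, high tip
print(tip(7.5))   # 20.0 -> mostly good service, generous tip
```

The point is the convenience: two readable rules produce a smooth interpolation over the whole input range, without writing down an explicit formula for the tip.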

2.3 HISTORICAL BACKGROUND

Almost forty years have passed since the publication of the first paper on fuzzy sets. Where do we stand today? In viewing the evolution of fuzzy logic, three principal phases may be discerned.

The first phase, from 1965 to 1973, was concerned in the main with fuzzification, that is, with generalization of the concept of a set, with the two-valued characteristic function generalized to a membership function taking values in the unit interval or, more generally, in a lattice. The basic issues and applications which were addressed were, for the most part, set-theoretic in nature, and logic and reasoning were not at the center of the stage.

In the second phase, 1973-1999, two key concepts were introduced: (a) the concept of a linguistic variable; and (b) the concept of a fuzzy if-then rule. Today, almost all applications of fuzzy set theory and fuzzy logic involve the use of these concepts. The term fuzzy logic was used for the first time in 1974. Today, fuzzy logic is used in two different senses: (a) a narrow sense, in which fuzzy logic, abbreviated as FLn, is a logical system which is a generalization of multivalued logic; and (b) a wide sense, in which fuzzy logic, abbreviated as FL, is a union of FLn, fuzzy set theory, possibility theory, calculus of fuzzy if-then rules, fuzzy arithmetic, calculus of fuzzy quantifiers and related concepts and calculi. The distinguishing characteristic of FL is that in FL everything is, or is allowed to be, a matter of degree. Today, the term fuzzy logic is used, for the most part, in its wide sense.

Perhaps the most striking development during the second phase of the evolution was the naissance and rapid growth of fuzzy control, alongside the boom in fuzzy logic applications, especially in Japan. There were many other major developments in fuzzy-logic-related basic and applied theories, among them the genesis of possibility theory and possibilistic logic, knowledge representation, decision analysis, cluster analysis, pattern recognition, fuzzy arithmetic, fuzzy mathematical programming, fuzzy topology and, more generally, fuzzy mathematics. Fuzzy control applications proliferated but their dominance in the literature became less pronounced.

Soft computing came into existence in 1981, with the launching of BISC (Berkeley Initiative in Soft Computing) at UC Berkeley. Basically, soft computing is a coalition of methodologies which collectively provide a foundation for the conception, design and utilization of intelligent systems. The principal members of the coalition are: fuzzy logic, neurocomputing, evolutionary computing, probabilistic computing, chaotic computing, rough set theory and machine learning. The basic tenet of soft computing is that, in general, better results can be obtained through the use of the constituent methodologies of soft computing in combination rather than in a stand-alone mode. A combination which has attained wide visibility and importance is that of neuro-fuzzy systems. Other combinations, e.g., neuro-fuzzy-genetic systems, are appearing, and the impact of soft computing is growing on both theoretical and applied levels.

An important development in the evolution of fuzzy logic, marking the beginning of the third phase, 1996 onwards, is the genesis of computing with words and the computational theory of perceptions. The development of computing with words and perceptions brings together earlier strands of fuzzy logic and suggests that scientific theories should be based on fuzzy logic rather than on Aristotelian, bivalent logic. A key component of computing with words is the concept of Precisiated Natural Language (PNL). PNL opens the door to a major enlargement of the role of natural languages in scientific theories.

2. In fuzzy logic. It may take some time for this to happen. the word fuzzy is usually used in a pejorative sense. but eventually abandonment of bivalence will be viewed as a logical development in the evolution of science and human thought. 1] . fuzzy constraint on a collection of variables. From its inception. A fuzzy set A in X is characterized by its membership function (Fig. for some fuzzy logic is hard to accept because by abandoning bivalence it breaks with centuries-old tradition of basing scientific theories on bivalent logic. In part. more importantly. 2.5 CHARACTERISTICS OF FUZZY SYSTEMS There are two main characteristics of fuzzy systems that give them better performance for specific applications: Fuzzy systems are suitable for uncertain or approximate reasoning. equivalently. exact reasoning is viewed as a limiting case of approximate reasoning. It may well turn out to be the case that. and µA (x) is interpreted as the degree of membership of element x in fuzzy set A for each x Î X. and especially PNL. 2. knowledge is interpreted a collection of elastic or.. will be the Internet..6 FUZZY SETS 2.(2. especially for the system with a mathematical model that is difficult to derive. centering on the conception and design of search engines and question-answering systems. one of the most important application-areas of fuzzy logic.2) . It is clear that A is completely determined by the set of tuples A = {(u.2). In fuzzy logic.6.FUZZY SETS AND FUZZY LOGIC 9 of natural languages in scientific theories. 2. in coming years.. fuzzy logic has been (and to some degree still is) an object of skepticism and controversy. in English. Fuzzy logic allows decision making with estimated values under incomplete or uncertain information.4 CHARACTERISTICS OF FUZZY LOGIC Some of the essential characteristics of fuzzy logic relate to the following: In fuzzy logic. µA(u)) |u Î X} .1) Let X be a nonempty set. Any logical system can be fuzzified.. But. 
Inference is viewed as a process of propagation of elastic constraints.(2. skepticism about fuzzy logic is a reflection of the fact that. everything is a matter of degree.1 Fuzzy Set µA : X ® [0.

2. If X = {x1. 2. from the Figure cheap is roughly interpreted as follows: Below Rs..2 A discrete membership function for x is close to 1. . and depends on his purse (Fig. 2. can be A(t) = exp ( b(t 1)2) where b is a positive real number.. 450000 Rs. Cheap can be represented as a fuzzy set on a universe of prices. 300000 Rs. Frequently we will write A(x) instead of µA(x)...1: defined as The membership function (Fig. 300000 cars are considered as cheap. The family of all fuzzy sets in X is denoted by F(X). 2.(2. Example 2. i =1. .. Example 2. 1 Rs.. 600000 Fig. + µn/xn .. .10 FUZZY LOGIC AND NEURAL NETWORKS 1 –2 –1 0 1 2 3 4 Fig.2: Assume someone wants to buy a cheap car.3) of the fuzzy set of real numbers "close to 1".3) where the term µi / xi. For instance.3 A membership function for x is close to 1..4). 2.n signifies that µi is the grade of membership of xi in A and the plus sign represents the union. 1 –1 1 2 3 4 Fig... and prices make no real difference to buyers eyes. .4 Membership function of "cheap".. xn} is a finite set and A is a fuzzy set in X then we often use the notation A = µ1/x1 + .

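The membership functions just described can be evaluated numerically. The sketch below (plain Python; the function names and the sample discrete set are our own, chosen for illustration) implements the continuous "close to 1" function A(t) = exp(−β(t − 1)²) and a discrete fuzzy set stored as element/grade pairs:

```python
import math

def close_to_1(t, beta=1.0):
    """A(t) = exp(-beta * (t - 1)**2), beta > 0 -- 'x is close to 1'."""
    return math.exp(-beta * (t - 1) ** 2)

# Discrete fuzzy set A = 0.3/-1 + 0.6/0 + 1.0/1 + 0.6/2 + 0.3/3,
# stored as {element: grade of membership}.
A_discrete = {-1: 0.3, 0: 0.6, 1: 1.0, 2: 0.6, 3: 0.3}

def grade(A, x):
    """Degree of membership of x in A; elements outside the support get 0."""
    return A.get(x, 0.0)
```

The dictionary encoding mirrors the μ1/x1 + ... + μn/xn notation for finite universes.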
Between Rs. 300000 and Rs. 450000, a variation in the price induces a weak preference in favour of the cheapest car. Between Rs. 450000 and Rs. 600000, a small variation in the price induces a clear preference in favour of the cheapest car. Beyond Rs. 600000 the costs are too high (out of consideration).

2.6.2 Support

Let A be a fuzzy subset of X; the support of A, denoted supp(A), is the crisp subset of X whose elements all have nonzero membership grades in A:

supp(A) = {x ∈ X | A(x) > 0}

2.6.3 Normal Fuzzy Set

A fuzzy subset A of a classical set X is called normal if there exists an x ∈ X such that A(x) = 1. Otherwise A is subnormal.

2.6.4 α-Cut

An α-level set of a fuzzy set A of X is a non-fuzzy set denoted by [A]^α and defined by

[A]^α = {t ∈ X | A(t) ≥ α} if α > 0;  cl(supp A) if α = 0    ...(2.6)

where cl(supp A) denotes the closure of the support of A.

Example 2.3: Assume X = {−2, −1, 0, 1, 2, 3, 4} and

A = 0.0/−2 + 0.3/−1 + 0.6/0 + 1.0/1 + 0.6/2 + 0.3/3 + 0.0/4

In this case

[A]^α = {−1, 0, 1, 2, 3} if 0 ≤ α ≤ 0.3;  {0, 1, 2} if 0.3 < α ≤ 0.6;  {1} if 0.6 < α ≤ 1

2.6.5 Convex Fuzzy Set

A fuzzy set A of X is called convex if [A]^α is a convex subset of X for all α ∈ [0, 1].

In many situations people are only able to characterize numeric information imprecisely. For example, people use terms such as "about 5000", "near zero", or "essentially bigger than 5000". These are examples of what are called fuzzy numbers. Using the theory of fuzzy subsets we can represent these fuzzy numbers as fuzzy subsets of the set of real numbers. An α-cut of a triangular fuzzy number is shown in Fig. 2.5.

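On a finite universe the α-cuts of Example 2.3 can be checked mechanically. A minimal sketch (plain Python; the dictionary representation and function names are our own):

```python
# Example 2.3: A = 0.0/-2 + 0.3/-1 + 0.6/0 + 1.0/1 + 0.6/2 + 0.3/3 + 0.0/4
A = {-2: 0.0, -1: 0.3, 0: 0.6, 1: 1.0, 2: 0.6, 3: 0.3, 4: 0.0}

def support(A):
    """supp(A) = {x | A(x) > 0}."""
    return {x for x, mu in A.items() if mu > 0}

def is_normal(A):
    """A is normal if some element has membership grade 1."""
    return max(A.values()) == 1.0

def alpha_cut(A, alpha):
    """[A]^alpha = {x | A(x) >= alpha} for alpha > 0; cl(supp A) for alpha = 0.
    On a finite universe the closure of the support is the support itself."""
    if alpha == 0:
        return support(A)
    return {x for x, mu in A.items() if mu >= alpha}
```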
2.6.6 Fuzzy Number

A fuzzy number A (Fig. 2.6) is a fuzzy set of the real line with a normal, (fuzzy) convex and continuous membership function of bounded support. The family of fuzzy numbers will be denoted by F.

2.6.7 Quasi Fuzzy Number

A quasi-fuzzy number A is a fuzzy set of the real line with a normal, fuzzy convex and continuous membership function satisfying the limit conditions

lim A(t) = 0 as t → ±∞    ...(2.7)

Fig. 2.5: An α-cut of a triangular fuzzy number.
Fig. 2.6: Fuzzy number.

Let A be a fuzzy number. Then [A]^γ is a closed convex (compact) subset of ℝ for all γ ∈ [0, 1]. Let us introduce the notations

a1(γ) = min [A]^γ,  a2(γ) = max [A]^γ    ...(2.8)

In other words, a1(γ) denotes the left-hand side and a2(γ) denotes the right-hand side of the γ-cut. It is easy to see that

if α ≤ β then [A]^α ⊇ [A]^β    ...(2.9)

Furthermore, the left-hand side function a1 : [0, 1] → ℝ is monotone increasing and lower semicontinuous, and the right-hand side function a2 : [0, 1] → ℝ is monotone decreasing and upper semicontinuous.

We shall use the notation

[A]^γ = [a1(γ), a2(γ)]    ...(2.12)

The support of A is the open interval (a1(0), a2(0)), as illustrated in Fig. 2.7.

Fig. 2.7: The support of A is (a1(0), a2(0)).

If A is not a fuzzy number, then there exists a γ ∈ [0, 1] such that [A]^γ is not a convex subset of ℝ. Such a "not fuzzy number" is shown in Fig. 2.8.

Fig. 2.8: Not a fuzzy number.

2.6.8 Triangular Fuzzy Number

A fuzzy set A is called a triangular fuzzy number with peak (or center) a, left width α > 0 and right width β > 0 if its membership function has the following form:

A(t) =
  1 − (a − t)/α   if a − α ≤ t ≤ a
  1 − (t − a)/β   if a ≤ t ≤ a + β
  0               otherwise    ...(2.13)

and we use the notation A = (a, α, β). It can easily be verified that

[A]^γ = [a − (1 − γ)α, a + (1 − γ)β],  ∀γ ∈ [0, 1]    ...(2.14)

The support of A is (a − α, a + β). A triangular fuzzy number (Fig. 2.9) with center a may be seen as a fuzzy quantity "x is approximately equal to a".

2.6.9 Trapezoidal Fuzzy Number

A fuzzy set A is called a trapezoidal fuzzy number with tolerance interval [a, b], left width α and right width β if its membership function has the following form:

A(t) =
  1 − (a − t)/α   if a − α ≤ t ≤ a
  1               if a ≤ t ≤ b
  1 − (t − b)/β   if b ≤ t ≤ b + β
  0               otherwise    ...(2.15)

and we use the notation A = (a, b, α, β). It can easily be shown that

[A]^γ = [a − (1 − γ)α, b + (1 − γ)β],  ∀γ ∈ [0, 1]    ...(2.16)

The support of A is (a − α, b + β). A trapezoidal fuzzy number (Fig. 2.10) may be seen as a fuzzy quantity "x is approximately in the interval [a, b]".

Fig. 2.9: Triangular fuzzy number.
Fig. 2.10: Trapezoidal fuzzy number.

2.6.10 Subsethood

Let A and B be fuzzy subsets of a classical set X. We say that A is a subset of B if A(t) ≤ B(t), ∀t ∈ X. The subsethood is illustrated in Fig. 2.11.

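Both parametric shapes translate directly into code. A sketch (plain Python; the function names are ours), assuming the (a, α, β) and (a, b, α, β) conventions above:

```python
def triangular(t, a, alpha, beta):
    """Triangular fuzzy number (a, alpha, beta): peak a, left/right widths alpha, beta."""
    if a - alpha <= t <= a:
        return 1 - (a - t) / alpha
    if a <= t <= a + beta:
        return 1 - (t - a) / beta
    return 0.0

def triangular_level(gamma, a, alpha, beta):
    """gamma-level set [A]^gamma = [a - (1-gamma)*alpha, a + (1-gamma)*beta]."""
    return (a - (1 - gamma) * alpha, a + (1 - gamma) * beta)

def trapezoidal(t, a, b, alpha, beta):
    """Trapezoidal fuzzy number (a, b, alpha, beta) with tolerance interval [a, b]."""
    if a - alpha <= t <= a:
        return 1 - (a - t) / alpha
    if a <= t <= b:
        return 1.0
    if b <= t <= b + beta:
        return 1 - (t - b) / beta
    return 0.0
```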
Fig. 2.11: A is a subset of B.

2.6.11 Equality of Fuzzy Sets

Let A and B be fuzzy subsets of a classical set X. A and B are said to be equal, denoted A = B, if A ⊂ B and B ⊂ A. We note that A = B if and only if A(x) = B(x) for all x ∈ X.

2.6.12 Empty Fuzzy Set

The empty fuzzy subset of X is defined as the fuzzy subset ∅ of X such that ∅(x) = 0 for each x ∈ X. It is easy to see that ∅ ⊂ A holds for any fuzzy subset A of X.

2.6.13 Universal Fuzzy Set

The largest fuzzy set in X, called the universal fuzzy set (Fig. 2.12) in X and denoted by 1X, is defined by 1X(t) = 1, ∀t ∈ X. It is easy to see that A ⊂ 1X holds for any fuzzy subset A of X.

Fig. 2.12: The graph of the universal fuzzy subset in X = [0, 10].

2.6.14 Fuzzy Point

Let A be a fuzzy number. If supp(A) = {x0}, then A is called a fuzzy point (Fig. 2.13) and we use the notation A = x0.

Fig. 2.13: Fuzzy point.

Let A = x0 be a fuzzy point. It is easy to see that [A]^γ = [x0, x0] = {x0}, ∀γ ∈ [0, 1].

2.7 OPERATIONS ON FUZZY SETS

We extend the classical set-theoretic operations from ordinary set theory to fuzzy sets. We note that all those operations which are extensions of crisp concepts reduce to their usual meaning when the fuzzy subsets have membership degrees that are drawn from {0, 1}. For this reason, when extending operations to fuzzy sets we use the same symbol as in set theory. Let A and B be fuzzy subsets of a nonempty (crisp) set X.

2.7.1 Intersection

The intersection of A and B is defined as

(A ∩ B)(t) = min{A(t), B(t)} = A(t) ∧ B(t) for all t ∈ X    ...(2.17)

The intersection of A and B is shown in Fig. 2.14.

Fig. 2.14: Intersection of two triangular fuzzy numbers.

2.7.2 Union

The union of A and B is defined as

(A ∪ B)(t) = max{A(t), B(t)} = A(t) ∨ B(t) for all t ∈ X    ...(2.18)

The union of two triangular fuzzy numbers is shown in Fig. 2.15.

Fig. 2.15: Union of two triangular fuzzy numbers.

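The pointwise min/max operations (and the complement defined in the next subsection) are one-liners over a dictionary representation. The sets A, B and H below are illustrative; the last two lines reproduce the failure of the laws of excluded middle and non-contradiction for a set with A(t) = 1/2 everywhere:

```python
def fuzzy_intersection(A, B):
    """(A ∩ B)(t) = min{A(t), B(t)} pointwise."""
    return {t: min(A.get(t, 0.0), B.get(t, 0.0)) for t in set(A) | set(B)}

def fuzzy_union(A, B):
    """(A ∪ B)(t) = max{A(t), B(t)} pointwise."""
    return {t: max(A.get(t, 0.0), B.get(t, 0.0)) for t in set(A) | set(B)}

def fuzzy_complement(A):
    """(¬A)(t) = 1 - A(t)."""
    return {t: 1 - mu for t, mu in A.items()}

A = {1: 0.5, 2: 1.0, 3: 0.2}
B = {1: 0.8, 2: 0.4, 3: 0.0}

# With H(t) = 1/2 everywhere, H ∪ ¬H is not the whole space and H ∩ ¬H is
# not empty: both grades stay at 1/2.
H = {t: 0.5 for t in range(3)}
excluded_middle = fuzzy_union(H, fuzzy_complement(H))
non_contradiction = fuzzy_intersection(H, fuzzy_complement(H))
```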
2.7.3 Complement

The complement of a fuzzy set A is defined as

(¬A)(t) = 1 − A(t)    ...(2.20)

A closely related pair of properties which hold in ordinary set theory are the law of excluded middle

A ∨ ¬A = X    ...(2.21)

and the law of non-contradiction

A ∧ ¬A = ∅    ...(2.22)

Lemma 2.1: The law of excluded middle is not valid. Let A(t) = 1/2, ∀t ∈ ℝ; then it is easy to see that

(¬A ∨ A)(t) = max{¬A(t), A(t)} = max{1 − 1/2, 1/2} = 1/2 ≠ 1

Lemma 2.2: The law of non-contradiction is not valid. Let A(t) = 1/2, ∀t ∈ ℝ; then it is easy to see that

(¬A ∧ A)(t) = min{¬A(t), A(t)} = min{1 − 1/2, 1/2} = 1/2 ≠ 0

However, fuzzy logic does satisfy De Morgan's laws:

¬(A ∧ B) = ¬A ∨ ¬B,  ¬(A ∨ B) = ¬A ∧ ¬B

It is also clear that ¬1X = ∅ and ¬∅ = 1X. Thus the laws of excluded middle and non-contradiction are not satisfied in fuzzy logic.

QUESTION BANK.

1. What is fuzzy logic? Explain the evolution phases of fuzzy logic.
2. What are the characteristics of fuzzy logic?
3. What are the characteristics of fuzzy systems?
4. What are the different fuzzy sets? Define them.
5. Define the following: (i) equality of fuzzy sets, (ii) empty fuzzy set, (iii) universal fuzzy set.
6. What are the roles of α-cut in fuzzy set theory?
7. What are the different fuzzy numbers? Define them.
8. What are the operations on fuzzy sets? Explain with examples.
9. Let A be a fuzzy set defined by A = 0.5/x1 + 0.4/x2 + 0.7/x3 + 0.8/x4 + 1/x5. List all its α-cuts.
10. Given fuzzy sets A and B on X = {a, b, c}, find A ∪ B, A ∩ B and ¬A.

pp. pp. 1. 613-626. Dubois. 1973. No. L. No.A. pp. 1. 26-32. 16. Zadeh. L. 1968. 4. 77-84. pp.M. neutral networks and soft computing. 1991. Fuzzy logic. L.A. Zadeh. Information Sciences.A. (November).A. No. Roles of soft computing and fuzzy logic in the conception. Possibility theory and soft data analysis. 28-44. 19. on systems. Zadeh. Foundations of fuzzy sets. pp. 756-759. IEEE Software. 1994. design and deployment of information/intelligent systems. J. 13. Zadeh. Boulder. L. 10. 9. P. Zadeh. 37. Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications. B. The algebra of fuzzy logic. Vol. Zadeh. Vol. No. Part 1.). Turksen. 40. 1972. 8. Mathematics Frontiers of the Social and Policy Sciences. Vol. Gottwald. 2. 8. A fuzzy-set-theoretic interpretation of linguistic hedges. Springer-Verlag. pp. Soft computing and fuzzy logic.4.A. IEEE Transactions on Fuzzy Systems. Termini. 3. pp. 1979. The concept of a linguistic variable and its application to approximate reasoning. 8. Zadeh. pp. 17. Man and Cybernetics. 32-39. 199-249. Vol. 338-353. Hohle and L. Vol. L. 18. International Journal of Systems Science.A. Fuzzy Sets and Systems. pp. 2. 2. 1. Berlin. Kaynak. Vol. Vol. 1. Zadeh. pp. L.18 FUZZY LOGIC AND NEURAL NETWORKS REFERENCES. 301-357. IEEE Computer. Zadeh. 69-129. The concept of a linguistic variable and its application to approximate reasoning. L. Outline of a new approach to the analysis of complex systems and decision process. IEEE Transactions. Information Sciences. 373-386. pp. Brown. Zadeh. A. edited by O. 1988. pp. pp. Westview Press. Fuzzy logic.A. Fuzzy algorithms. 1972.A. Vol. No. 7. 20. IEEE Transactions on systems. L. L. . Watanable. 1965. Rudas. 1978. 94-102. S. Information and Control.N. Vol. 10. Fuzzy Sets and Systems. S. Zadeh.A. Vol. No. Algebraic properties of fuzzy sets. 2.8. and H. Fuzzy Sets and Systems. A note on fuzzy sets. Journal of Cybernetics. 125-151. SMC-3. Vol. No. 1978.A. No. 2. and S. Zadeh. 1975. 
Communications of the ACM. A generalized fuzzy set theory. 1981. pp. Vol. IEEE Spectrum. L. 5. D. No. man and cybernetics. 4-34. pp. Vol. Information and Control. Throll (eds. 1978. No. Fuzzy Sets. Fuzzy logiccomputing with words. 6. 12. 8. Journal of Mathematics Analysis and Applications. pp. 1998.A. 1974. Making computers think like people.Cobb and R. L. Vol. J.A. 11.A. 1994. Part 2. 103-111. Information and Control. 2. 48-56. 15. pp. pp. 12. 18. pp. Zadeh. 40. DeLuca. Operations on Fuzzy Numbers. and I. L. U. L. L. Albert. 14. 1996. Prade.10-37. 1971. Vol. 1984. Stout. 3. A. 83-93. G. 2. No. L. 8. 9.257-296. Vol. 4. 21. Zadeh. Vol. 3. Set theory for fuzzy sets of higher level. 203-230. Vol. 2.

3.1 INTRODUCTION

A classical relation can be considered as a set of tuples, where a tuple is an ordered pair. A binary tuple is denoted by (u, v), an example of a ternary tuple is (u, v, w) and an example of an n-ary tuple is (x1, ..., xn).

Example 3.1: Let X be the domain of men {John, Charles, James} and Y the domain of women {Diana, Rita, Eva}; then the relation "married to" on X × Y is, for example,

{(Charles, Diana), (John, Eva), (James, Rita)}

3.2 FUZZY RELATIONS

3.2.1 Classical N-ary Relation

Let X1, ..., Xn be classical sets. The subsets of the Cartesian product X1 × ... × Xn are called n-ary relations. If X1 = ... = Xn and R ⊂ X^n, then R is called an n-ary relation in X. Let R be a binary relation in ℝ. Then the characteristic function of R is defined as

χR(u, v) = 1 if (u, v) ∈ R;  0 otherwise    ...(3.1)

Example 3.2: Consider the following relation:

(u, v) ∈ R ⟺ u ∈ [a, b] and v ∈ [0, c]

χR(u, v) = 1 if (u, v) ∈ [a, b] × [0, c];  0 otherwise    ...(3.2)


Consider the relation "mod 3" on natural numbers:

{(m, n) | (n − m) mod 3 ≡ 0}

This is an equivalence relation.

3.2.2 Binary Fuzzy Relation

Let X and Y be nonempty sets. A fuzzy relation R is a fuzzy subset of X × Y, i.e. R ∈ F(X × Y). Note that R : X × Y → [0, 1], and R(u, v) is interpreted as the degree of membership of the ordered pair (u, v) in R. In other words, the domain of R is the whole Cartesian product X × Y. If X = Y, then we say that R is a binary fuzzy relation in X.

Example 3.5: A simple example of a binary fuzzy relation on U = {1, 2, 3}, called "approximately equal", can be defined as

R(1, 1) = R(2, 2) = R(3, 3) = 1
R(1, 2) = R(2, 1) = R(2, 3) = R(3, 2) = 0.8
R(1, 3) = R(3, 1) = 0.3

The membership function of R is given by

R(u, v) = 1 if u = v;  0.8 if |u − v| = 1;  0.3 if |u − v| = 2

In matrix notation it can be represented as

        1     2     3
  1   1.0   0.8   0.3
  2   0.8   1.0   0.8
  3   0.3   0.8   1.0

3.3 OPERATIONS ON FUZZY RELATIONS

Fuzzy relations are very important because they can describe interactions between variables. Let R and S be two binary fuzzy relations on X × Y.

3.3.1 Intersection

The intersection of R and S is defined by

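The "approximately equal" relation of Example 3.5 can be generated from |u − v| rather than typed in by hand. A sketch (plain Python; names are ours):

```python
def approximately_equal(u, v):
    """Membership of the 'approximately equal' relation on U = {1, 2, 3}:
    1 if u = v, 0.8 if |u - v| = 1, 0.3 if |u - v| = 2."""
    return {0: 1.0, 1: 0.8, 2: 0.3}[abs(u - v)]

U = [1, 2, 3]
# Matrix representation: rows indexed by u, columns by v.
R = [[approximately_equal(u, v) for v in U] for u in U]
```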
(R ∧ S)(u, v) = min{R(u, v), S(u, v)}    ...(3.3)

Example 3.6: Let us define two binary relations:

R = "x is considerably larger than y":

        y1    y2    y3    y4
  x1   0.8   0.1   0.1   0.7
  x2   0.0   0.8   0.0   0.0
  x3   0.9   1.0   0.7   0.8

S = "x is very close to y":

        y1    y2    y3    y4
  x1   0.4   0.0   0.9   0.6
  x2   0.9   0.4   0.5   0.7
  x3   0.3   0.0   0.8   0.5

The intersection of R and S means that "x is considerably larger than y" and "x is very close to y":

(R ∧ S)(x, y) =

        y1    y2    y3    y4
  x1   0.4   0.0   0.1   0.6
  x2   0.0   0.4   0.0   0.0
  x3   0.3   0.0   0.7   0.5

3.3.2 Union

The union of R and S is defined by

(R ∨ S)(u, v) = max{R(u, v), S(u, v)}    ...(3.4)

The union of R and S means that "x is considerably larger than y" or "x is very close to y":

(R ∨ S)(x, y) =

        y1    y2    y3    y4
  x1   0.8   0.1   0.9   0.7
  x2   0.9   0.8   0.5   0.7
  x3   0.9   1.0   0.8   0.8

Consider a classical relation R on ℝ:

R(u, v) = 1 if (u, v) ∈ [a, b] × [0, c];  0 otherwise

It is clear that the projection (or shadow) of R on the X-axis is the closed interval [a, b], and its projection on the Y-axis is [0, c].

3.3.3 Projection

Let R be a binary fuzzy relation on X × Y. If R is a classical relation in X × Y, then

ΠX = {x ∈ X | ∃y ∈ Y : (x, y) ∈ R}    ...(3.6)
ΠY = {y ∈ Y | ∃x ∈ X : (x, y) ∈ R}    ...(3.7)

where ΠX denotes projection on X and ΠY denotes projection on Y. For a fuzzy relation, the projection of R on X is defined as

ΠX(x) = sup{R(x, y) | y ∈ Y}    ...(3.8)

and the projection of R on Y is defined as

ΠY(y) = sup{R(x, y) | x ∈ X}    ...(3.9)

Example 3.7: Consider the relation

R = "x is considerably larger than y":

        y1    y2    y3    y4
  x1   0.8   0.1   0.1   0.7
  x2   0.0   0.8   0.0   0.0
  x3   0.9   1.0   0.7   0.8

Then the projection on X means that:
• x1 is assigned the highest membership degree from the tuples (x1, y1), (x1, y2), (x1, y3), (x1, y4), i.e. ΠX(x1) = 0.8, the maximum of the first row;
• x2 is assigned the highest membership degree from the tuples (x2, y1), (x2, y2), (x2, y3), (x2, y4), i.e. ΠX(x2) = 0.8, the maximum of the second row;
• x3 is assigned the highest membership degree from the tuples (x3, y1), (x3, y2), (x3, y3), (x3, y4), i.e. ΠX(x3) = 1, the maximum of the third row.

Fig. 3.2: Shadows of a fuzzy relation.

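Because the universes here are finite, the sup in the projection formulas becomes a row or column maximum. The sketch below (plain Python, our own names) reproduces Example 3.7:

```python
# R = "x is considerably larger than y" (rows x1..x3, columns y1..y4).
R = [
    [0.8, 0.1, 0.1, 0.7],
    [0.0, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.7, 0.8],
]

def project_x(R):
    """Projection on X: the supremum (here the maximum) of each row."""
    return [max(row) for row in R]

def project_y(R):
    """Projection on Y: the maximum of each column."""
    return [max(col) for col in zip(*R)]
```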
3.3.4 Cartesian Product of Two Fuzzy Sets

The Cartesian product of A ∈ F(X) and B ∈ F(Y) is defined as

(A × B)(u, v) = min{A(u), B(v)}    ...(3.10)

for all u ∈ X and v ∈ Y. It is clear that the Cartesian product of two fuzzy sets (Fig. 3.3) is a fuzzy relation in X × Y. If A and B are normal, then ΠY(A × B) = B and ΠX(A × B) = A. Really,

ΠX(x) = sup{(A × B)(x, y) | y} = sup{min{A(x), B(y)} | y} = min{A(x), sup{B(y) | y}} = min{A(x), 1} = A(x)    ...(3.11)

Fig. 3.3: Cartesian product of two fuzzy sets.

3.3.5 Shadow of Fuzzy Relation

The sup-min composition of a fuzzy set C ∈ F(X) and a fuzzy relation R ∈ F(X × Y) is defined as

(C ∘ R)(y) = sup min{C(x), R(x, y)} over x ∈ X    ...(3.12)

for all y ∈ Y. The composition of a fuzzy set C and a fuzzy relation R can be considered as the shadow of the relation R on the fuzzy set C (Fig. 3.4).

Fig. 3.4: Shadow of the fuzzy relation R on the fuzzy set C.

Example 3.8: Let A and B be fuzzy numbers and let R = A × B be a fuzzy relation. Observe the following property of composition (for normal A and B):

A ∘ R = A ∘ (A × B) = B,  B ∘ R = B ∘ (A × B) = A

Example 3.9: Let C be a fuzzy set in the universe of discourse {1, 2, 3} and let R be a binary fuzzy relation in {1, 2, 3}. Assume that

C = 0.2/1 + 1/2 + 0.2/3

and

        1     2     3
  1   1.0   0.8   0.3
  2   0.8   1.0   0.8
  3   0.3   0.8   1.0

Using the definition of sup-min composition we get

C ∘ R = 0.8/1 + 1/2 + 0.8/3

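Example 3.9 can be verified by coding the sup-min composition of a fuzzy set with a relation directly. A sketch (plain Python; the dictionary encoding is our own choice):

```python
C = {1: 0.2, 2: 1.0, 3: 0.2}
# "Approximately equal" relation, keyed by (x, y) pairs.
R = {
    (1, 1): 1.0, (1, 2): 0.8, (1, 3): 0.3,
    (2, 1): 0.8, (2, 2): 1.0, (2, 3): 0.8,
    (3, 1): 0.3, (3, 2): 0.8, (3, 3): 1.0,
}

def compose_set_relation(C, R):
    """(C o R)(y) = sup_x min{C(x), R(x, y)}."""
    ys = {y for (_, y) in R}
    return {y: max(min(C[x], R[(x, y)]) for x in C) for y in ys}
```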
Example 3.10: Let C be a fuzzy set in the universe of discourse [0, 1] and let R be a binary fuzzy relation in [0, 1]. Assume that C(x) = x and R(x, y) = 1 − |x − y|. Using the definition of sup-min composition we get

(C ∘ R)(y) = sup min{x, 1 − |x − y|} = (1 + y)/2

for all x ∈ [0, 1] and y ∈ [0, 1].

3.3.6 Sup-Min Composition of Fuzzy Relations

Let R ∈ F(X × Y) and S ∈ F(Y × Z). The sup-min composition of R and S, denoted by R ∘ S, is defined as

(R ∘ S)(u, w) = sup min{R(u, v), S(v, w)} over v ∈ Y    ...(3.13)

It is clear that R ∘ S is a binary fuzzy relation in X × Z.

Example 3.11: Consider two fuzzy relations:

R = "x is considerably larger than y":

        y1    y2    y3    y4
  x1   0.8   0.1   0.1   0.7
  x2   0.0   0.8   0.0   0.0
  x3   0.9   1.0   0.7   0.8

S = "y is very close to z":

        z1    z2    z3
  y1   0.4   0.9   0.3
  y2   0.0   0.4   0.0
  y3   0.9   0.5   0.8
  y4   0.6   0.7   0.5

Then their composition is

R ∘ S =

        z1    z2    z3
  x1   0.6   0.8   0.5
  x2   0.0   0.4   0.0
  x3   0.7   0.9   0.7

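As the text notes, on finite universes the sup-min composition is the classical matrix product with max in place of addition and min in place of multiplication. The sketch below (plain Python, our own names) reproduces Example 3.11:

```python
def sup_min_compose(R, S):
    """(R o S)(u, w) = sup_v min{R(u, v), S(v, w)} -- matrix product with
    max replacing addition and min replacing multiplication."""
    return [
        [max(min(R[i][k], S[k][j]) for k in range(len(S)))
         for j in range(len(S[0]))]
        for i in range(len(R))
    ]

R = [[0.8, 0.1, 0.1, 0.7],   # x is considerably larger than y
     [0.0, 0.8, 0.0, 0.0],
     [0.9, 1.0, 0.7, 0.8]]
S = [[0.4, 0.9, 0.3],        # y is very close to z
     [0.0, 0.4, 0.0],
     [0.9, 0.5, 0.8],
     [0.6, 0.7, 0.5]]
```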
Formally, the composition of R and S is nothing else but the classical product of the matrices R and S, with the difference that instead of addition we use maximum and instead of multiplication we use minimum.

QUESTION BANK.

1. What are the fuzzy relations? Explain them.
2. Explain the operations on the fuzzy relations.
3. Given any n-ary relation, how many different projections of the relation can be taken?
4. Given fuzzy sets A and B on the universes of discourse X = {x1, x2, x3} and Y = {y1, y2} respectively, find the Cartesian product of A and B.
5. Given X = {x1, x2, x3, x4} of four varieties of paddy plants, D = {d1, d2, d3, d4} of the various diseases affecting the plants and Y = {y1, y2, y3, y4} the common symptoms of the diseases, find the sup-min composition.


4.1 INTRODUCTION

Let p = "x is in A" and q = "y is in B" be crisp propositions, where A and B are crisp sets for the moment. The implication p → q is interpreted as ¬(p ∧ ¬q), i.e.

p → q = ¬p ∨ q    ...(4.1)

"p entails q" means that it can never happen that p is true and q is not true. It is easy to see that

p → q is true ⟺ τ(p) ≤ τ(q)    ...(4.2)

The full interpretation of the material implication p → q is that the degree of truth of p → q quantifies to what extent q is at least as true as p:

p → q = 1 if τ(p) ≤ τ(q);  0 otherwise    ...(4.3)

The truth table for the material implication:

  p   q   p → q
  1   1     1
  0   1     1
  0   0     1
  1   0     0

Example 4.1: Let p = "x is bigger than 10" and let q = "x is bigger than 9". It is easy to see that p → q is true, because it can never happen that x is bigger than 10 and x is not bigger than 9. This property of material implication can be interpreted as:

if X ⊂ Y then X → Y    ...(4.4)

Another interpretation of the implication operator is

X → Y = sup{Z | X ∩ Z ⊂ Y}    ...(4.5)

4.2 FUZZY IMPLICATIONS

Consider the implication statement: "if pressure is high then volume is small". The membership function of the fuzzy set A, "big pressure", illustrated in Fig. 4.1, can be interpreted as:
• 1 is in the fuzzy set big pressure with grade of membership 0
• 2 is in the fuzzy set big pressure with grade of membership 0.25
• 4 is in the fuzzy set big pressure with grade of membership 0.75
• x is in the fuzzy set big pressure with grade of membership 1, for all x ≥ 5

A(u) =
  1                 if u ≥ 5
  1 − (5 − u)/4     if 1 ≤ u ≤ 5
  0                 otherwise    ...(4.6)

Fig. 4.1: Membership function for "big pressure".

The membership function of the fuzzy set B, "small volume", can be interpreted as follows (see Fig. 4.2):

Fig. 4.2: Membership function for "small volume".

. for example. v) depends only on A(u) and B(v).7) where A is a fuzzy set.. (A ® B)(u.(4. and B(v) is considered as the truth value of the proposition v is small volume.(4. for all x £1 R1 | v -1 B(v) = S1 |0 4 T If p is a proposition of the form x is A if v ³ 1 if 1 £ v £ 5 otherwise .10) 4 is big pressure ® 1 is small volume A(4) ® B(1) = 0.FUZZY IMPLICATIONS 31 5 is in the fuzzy set small volume with grade of membership 0 4 is in the fuzzy set small volume with grade of membership 0. v) = I(A(u)..75 ® 1 = 1 . It is clear that (A ® B)(u... B(v)) = A(u) ® B(v) In our interpretation A(u) is considered as the truth value of the proposition u is big pressure.. small volume then we define the fuzzy implication A ® B as a fuzzy relation.e.8) R1 S0 T R1 S0 T if t( p) £ t(q ) otherwise .9) One possible extension of material implication to implications with intermediate truth values can be A(u) ® B(v) = if t( p) £ t(q ) otherwise . i.(4. big pressure and q is a proposition of the form y is B for example... That is (A ® B)(u.75 x is in the fuzzy set small volume with grade of membership 1. v) should be defined pointwise and likewise.(4. that is u is big pressure ® v is small volume º A(u) ® B(v) Remembering the full interpretation of the material implication p®q= .25 2 is in the fuzzy set small volume with grade of membership 0.

However, it is easy to see that this fuzzy implication operator (called Standard Strict) is sometimes not appropriate for real-life applications. Namely, let A(u) = 0.8 and B(v) = 0.8. Then

A(u) → B(v) = 0.8 → 0.8 = 1

Suppose there is a small error of measurement in B(v), and instead of 0.8 we have 0.7999. Then

A(u) → B(v) = 0.8 → 0.7999 = 0

This example shows that small changes in the input can cause a big deviation in the output, i.e. our system is very sensitive to rounding errors of digital computation and small errors of measurement.

A smoother extension of the material implication operator can be derived from the equation X → Y = sup{Z | X ∩ Z ⊂ Y}. That is

A(u) → B(v) = sup{z | min{A(u), z} ≤ B(v)}    ...(4.12)

so

A(u) → B(v) = 1 if A(u) ≤ B(v);  B(v) otherwise    ...(4.13)

This operator is called Gödel implication. Another possibility is to extend the original definition p → q = ¬p ∨ q using the definitions of negation and union:

A(u) → B(v) = max{1 − A(u), B(v)}    ...(4.14)

This operator is called Kleene-Dienes implication.

In many practical applications, Mamdani's implication operator is used to model causal relationships between fuzzy variables. This operator simply takes the minimum of the truth values of the fuzzy predicates:

A(u) → B(v) = min{A(u), B(v)}    ...(4.15)

It is easy to see that this is not a correct extension of the material implication, because 0 → 0 yields zero. However, in knowledge-based systems we are usually not interested in rules where the antecedent part is false. The most often used fuzzy implication operators include:

Larsen:           x → y = xy    ...(4.16)
Łukasiewicz:      x → y = min{1, 1 − x + y}    ...(4.17)
Mamdani:          x → y = min{x, y}    ...(4.18)
Standard Strict:  x → y = 1 if x ≤ y, 0 otherwise    ...(4.19)
Gödel:            x → y = 1 if x ≤ y, y otherwise    ...(4.20)

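The implication operators above are one-line functions, and the sensitivity of the Standard Strict operator to a 0.0001 measurement error can be reproduced directly. A sketch (plain Python; Łukasiewicz is shown as one smooth alternative):

```python
def larsen(x, y):          return x * y
def lukasiewicz(x, y):     return min(1, 1 - x + y)
def mamdani(x, y):         return min(x, y)
def standard_strict(x, y): return 1 if x <= y else 0
def godel(x, y):           return 1 if x <= y else y
def kleene_dienes(x, y):   return max(1 - x, y)

# A rounding error of 0.0001 flips Standard Strict from 1 to 0,
# while Lukasiewicz degrades gracefully (1 -> ~0.9999).
sharp = (standard_strict(0.8, 0.8), standard_strict(0.8, 0.7999))
smooth = (lukasiewicz(0.8, 0.8), lukasiewicz(0.8, 0.7999))
```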
Gaines:                     x → y = 1 if x ≤ y, y/x otherwise    ...(4.21)
Kleene-Dienes:              x → y = max{1 − x, y}    ...(4.22)
Kleene-Dienes-Łukasiewicz:  x → y = 1 − x + xy    ...(4.23)

4.3 MODIFIERS

Let A be a fuzzy set in X. Then we can define the fuzzy sets "very A" and "more or less A" by

(very A)(x) = A(x)²,  (more or less A)(x) = √A(x)    ...(4.24)

Fig. 4.3: Very old.
Fig. 4.4: More or less old.

The use of fuzzy sets provides a basis for a systematic way for the manipulation of vague and imprecise concepts. In particular, we can employ fuzzy sets to represent linguistic variables. A linguistic variable can be regarded either as a variable whose value is a fuzzy number or as a variable whose values are defined in linguistic terms.

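The modifiers are simple transformations of a membership function. The sketch below applies them to a toy "old" membership function of our own devising (the exact shapes of Figs. 4.3-4.4 are not reproduced):

```python
import math

def very(A):
    """(very A)(x) = A(x)**2 -- concentration sharpens the set."""
    return lambda x: A(x) ** 2

def more_or_less(A):
    """(more or less A)(x) = sqrt(A(x)) -- dilation widens the set."""
    return lambda x: math.sqrt(A(x))

def old(age):
    """Illustrative membership function: grade rises linearly from age 30 to 60."""
    return min(1.0, max(0.0, (age - 30) / 30))

very_old = very(old)
more_or_less_old = more_or_less(old)
```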
4.3.1 Linguistic Variables

A linguistic variable is characterized by a quintuple

(x, T(x), U, G, M)    ...(4.25)

in which x is the name of the variable; T(x) is the term set of x, that is, the set of names of linguistic values of x, with each value being a fuzzy number defined on U; G is a syntactic rule for generating the names of values of x; and M is a semantic rule for associating with each value its meaning. For example, if speed is interpreted as a linguistic variable, then its term set T(speed) could be

T = {slow, moderate, fast, very slow, more or less fast, ...}

where each term in T(speed) is characterized by a fuzzy set in a universe of discourse U = [0, 100]. We might interpret slow as "a speed below about 40 mph", moderate as "a speed close to 55 mph", and fast as "a speed above about 70 mph". These terms can be characterized as fuzzy sets whose membership functions are shown in Fig. 4.5.

Fig. 4.5: Values of the linguistic variable speed (slow, medium, fast over 40, 55, 70 mph).

In many practical applications we normalize the domain of inputs and use the type of fuzzy partition shown in Fig. 4.6.

Fig. 4.6: A possible partition of [−1, 1] (NB, NM, NS, ZE, PS, PM, PB).

. 1]. [NM] Negative Medium.29) The interpolation if absolutely false and absolutely true are shown in Fig. 4. 4. False (u) = 1 u for each u Î [0..(4. [PB] Positive Big. Fairly true.7 Interpretation of absolutely false and absolutely true. 4. Very true. Very false. True.28) ..7.. Truth True Absolutely true 1 Fig... Absolutely false (u) = . One may define the membership function of linguistic terms of truth as True (u) = u for each u Î [0. 1].. [PS] Positive Small. False.(4.27) . [ZE] Zero.(4.(4. Absolutely true}. .FUZZY IMPLICATIONS 35 Here we used the abbreviations NB Negative Big.2 The Linguistic Variable Truth Truth = {Absolutely false. NS Negative Small.26) R1 S0 T R1 Absolutely true (u) = S T0 1 False Absolutely false if u = 0 otherwise if u = 1 otherwise ..3. [PM] Positive Medium.

The word "Fairly" is interpreted as "more or less":

Fairly true(u) = √u for each u ∈ [0, 1]    ...(4.30)
Very true(u) = u² for each u ∈ [0, 1]    ...(4.31)

Fig. 4.8: Interpretation of fairly true and very true.

Fairly false(u) = √(1 − u) for each u ∈ [0, 1]    ...(4.32)
Very false(u) = (1 − u)² for each u ∈ [0, 1]    ...(4.33)

Fig. 4.9: Interpretation of fairly false and very false.

Suppose we have the fuzzy statement "x is A". Let τ be a term of the linguistic variable Truth. Then the statement "x is A is τ" is interpreted as "x is τ ∘ A", where

(τ ∘ A)(u) = τ(A(u)) for each u ∈ [0, 1]    ...(4.34)

For example, let τ = true. Then "x is A is true" is defined by "x is τ ∘ A" = "x is A", because

(τ ∘ A)(u) = τ(A(u)) = A(u)

for each u ∈ [0, 1]. This is why everything we write is considered to be true.

Fig. 4.10: Interpretation of "A is true".

Let τ = absolutely true. Then the statement "x is A is absolutely true" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = 1 if A(x) = 1, 0 otherwise    ...(4.35)

Fig. 4.11: Interpretation of "A is absolutely true".

Let τ = absolutely false. Then the statement "x is A is absolutely false" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = 1 if A(x) = 0, 0 otherwise    ...(4.36)

Fig. 4.12 Interpretation of "A is absolutely false".

Let τ = fairly true. Then the statement "x is A is fairly true" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = √A(x) ...(4.37)

Fig. 4.13 Interpretation of "A is fairly true".

Let τ = very true. Then the statement "x is A is very true" is defined by "x is τ ∘ A", where

(τ ∘ A)(x) = (A(x))² ...(4.38)

Fig. 4.14 Interpretation of "A is very true".

QUESTION BANK

1. What are the fuzzy implications? Explain with examples.
2. What are the linguistic variables? Give examples.
3. What are the fuzzy modifiers? Explain with an example.
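Truth qualification by a linguistic truth term is just the pointwise composition τ ∘ A of the term with the membership function of A. A minimal Python sketch of equations (4.26)-(4.31) and of the composition; the triangular membership function A below is an illustrative assumption, not taken from the text:

```python
import math

# Linguistic truth terms as functions on [0, 1] (Eqs. 4.26-4.31).
def true_(u): return u
def false_(u): return 1.0 - u
def very_true(u): return u ** 2
def fairly_true(u): return math.sqrt(u)

def qualify(truth_term, A):
    """Membership function of 'x is A is t', i.e. the composition t o A."""
    return lambda x: truth_term(A(x))

# Illustrative triangular fuzzy set A centered at 5 (an assumption).
def A(x):
    return max(0.0, 1.0 - abs(x - 5.0) / 2.0)

A_is_very_true = qualify(very_true, A)
A_is_fairly_true = qualify(fairly_true, A)
```

At x = 4, A(x) = 0.5, so "A is very true" gives 0.25 while "A is fairly true" gives about 0.707, illustrating that "very" concentrates and "fairly" dilates the truth profile.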

4. Explain the linguistic variable TRUTH with examples.
5. Given the set of people in the following age groups:
   0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80 and above,
   represent graphically the membership functions of young, middle-aged and old.

5 The Theory of Approximate Reasoning

5.1 INTRODUCTION

In 1975 Zadeh introduced the theory of approximate reasoning. This theory provides a powerful framework for reasoning in the face of imprecise and uncertain information. Central to this theory is the representation of propositions as statements assigning fuzzy sets as values to variables.

Suppose we have two interactive variables x ∈ X and y ∈ Y, and the causal relationship between x and y is completely known. Namely, we know that y is a function of x:

y = f(x)

Then we can make inferences easily:

premise: y = f(x)
fact: x = x1
consequence: y = f(x1)

This inference rule says that if we have y = f(x) for all x ∈ X and we observe that x = x1, then y takes the value f(x1).

More often than not we do not know the complete causal link f between x and y; we only know the values of f(x) for some particular values of x:

ℜ1: If x = x1 then y = y1
also
ℜ2: If x = x2 then y = y2

also
...
also
ℜn: If x = xn then y = yn

Fig. 5.1 Simple crisp inference.

Suppose that we are given an x′ ∈ X and want to find a y′ ∈ Y which corresponds to x′ under the rule-base:

ℜ1: If x = x1 then y = y1
also
ℜ2: If x = x2 then y = y2
also
...
also
ℜn: If x = xn then y = yn
fact: x = x′
consequence: y = y′

This problem is frequently quoted as interpolation.

Let x and y be linguistic variables, e.g. "x is high" and "y is small". The basic problem of approximate reasoning is to find the membership function of the consequence C from the rule-base {ℜ1, ..., ℜn} and the fact A:

ℜ1: If x is A1 then y is C1
also
ℜ2: If x is A2 then y is C2

also
...
also
ℜn: If x is An then y is Cn
fact: x is A
consequence: y is C

Zadeh introduced a number of translation rules, which allow us to represent some common linguistic statements in terms of propositions in our language.

5.2 TRANSLATION RULES

5.2.1 Entailment Rule

x is A
A ⊂ B
x is B

Menaka is very young
very young ⊂ young
Menaka is young

5.2.2 Conjunction Rule

x is A
x is B
x is A ∩ B

Temperature is not very high
Temperature is not very low
Temperature is not very high and not very low

5.2.3 Disjunction Rule

x is A or x is B
x is A ∪ B

Temperature is not very high or Temperature is not very low
Temperature is not very high or not very low

5.2.4 Projection Rule

(x, y) have relation R
x is Πx(R)

(x, y) is close to (3, 2)
x is close to 3

(x, y) have relation R
y is Πy(R)

(x, y) is close to (3, 2)
y is close to 2

5.2.5 Negation Rule

not (x is A)
x is ¬A

not (x is high)
x is not high

In fuzzy logic and approximate reasoning, the most important fuzzy implication inference rule is the Generalized Modus Ponens (GMP). The classical Modus Ponens inference rule says:

premise: if p then q
fact: p
consequence: q

This inference rule can be interpreted as: if p is true and p → q is true, then q is true.

The fuzzy implication inference is based on the compositional rule of inference for approximate reasoning suggested by Zadeh.

5.2.6 Compositional Rule of Inference

premise: if x is A then y is B
fact: x is A′
consequence: y is B′

where the consequence B′ is determined as a composition of the fact and the fuzzy implication operator

B′ = A′ ∘ (A → B) ...(5.1)

that is,

B′(v) = sup_{u ∈ U} min {A′(u), (A → B)(u, v)}, v ∈ V ...(5.2)

The consequence B′ is nothing else but the shadow of A → B on A′. The Generalized Modus Ponens, which reduces to classical modus ponens when A′ = A and B′ = B, is closely related to the forward data-driven inference which is particularly useful in Fuzzy Logic Control.

The classical Modus Tollens inference rule says: if p → q is true and q is false, then p is false. The Generalized Modus Tollens

premise: if x is A then y is B
fact: y is B′
consequence: x is A′

which reduces to Modus Tollens when B′ = ¬B and A′ = ¬A, is closely related to the backward goal-driven inference which is commonly used in expert systems, especially in the realm of medical diagnosis.

5.3 RATIONAL PROPERTIES

Suppose that A, B and A′ are fuzzy numbers. The Generalized Modus Ponens should satisfy some rational properties.

5.3.1 Basic Property

if x is A then y is B
x is A
y is B

if pressure is big then volume is small
pressure is big
volume is small
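On finite, discretized universes the sup-min composition (5.2) reduces to a max-min computation over the grid. A small sketch assuming Mamdani's min implication for A → B; the membership vectors below are made up for illustration:

```python
# Discretized universes U and V; fuzzy sets given as vectors of degrees.
A  = [0.0, 0.5, 1.0, 0.5, 0.0]   # rule antecedent on U
B  = [0.0, 1.0, 0.5]             # rule consequent on V
A1 = [0.0, 0.25, 0.5, 1.0, 0.5]  # observed fact A' on U

def gmp_mamdani(A, B, A1):
    """B'(v) = sup_u min(A'(u), min(A(u), B(v))): the sup-min
    composition (5.2) with Mamdani's min implication."""
    return [max(min(a1, min(a, b)) for a, a1 in zip(A, A1)) for b in B]

B1 = gmp_mamdani(A, B, A1)
```

Feeding the fact A′ = A back in returns B itself, which is exactly the basic property discussed below.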

Fig. 5.2 Basic property.

5.3.2 Total Indeterminance

if x is A then y is B
x is ¬A
y is unknown

if pressure is big then volume is small
pressure is not big
volume is unknown

Fig. 5.3 Total indeterminance.

5.3.3 Subset

if x is A then y is B
x is A′ ⊂ A
y is B

if pressure is big then volume is small
pressure is very big
volume is small

Fig. 5.4 Subset property.

5.3.4 Superset

if x is A then y is B
x is A′
y is B′ ⊃ B

Fig. 5.5 Superset property.

Suppose that A, B and A′ are fuzzy numbers. We show that the Generalized Modus Ponens with Mamdani's implication operator does not satisfy all the four properties listed above.

Example 5.1: (The GMP with Mamdani implication)

if x is A then y is B
x is A′
y is B′

where the membership function of the consequence B′ is defined by

B′(y) = sup {A′(x) ∧ A(x) ∧ B(y) | x ∈ ℝ}, y ∈ ℝ

Basic property: Let A′ = A and let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {A(x), min {A(x), B(y)}} = sup_x min {A(x), B(y)} = min {B(y), sup_x A(x)} = min {B(y), 1} = B(y)

So the basic property is satisfied.

Total indeterminance: Let A′ = ¬A = 1 - A and let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {1 - A(x), min {A(x), B(y)}} = sup_x min {A(x), 1 - A(x), B(y)} = min {B(y), sup_x min {A(x), 1 - A(x)}} = min {B(y), 1/2} < 1

This means that the total indeterminance property is not satisfied.

Subset: Let A′ ⊂ A and let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {A′(x), min {A(x), B(y)}} = sup_x min {A′(x), A(x), B(y)} = min {B(y), sup_x A′(x)} = min {B(y), 1} = B(y)

So the subset property is satisfied.

Superset: Let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {A′(x), min {A(x), B(y)}} ≤ B(y)

So the superset property of GMP is not satisfied by Mamdani's implication operator.

Fig. 5.6 The GMP with Mamdani's implication operator.

Example 5.2: (The GMP with Larsen's product implication)

if x is A then y is B
x is A′
y is B′

where the membership function of the consequence B′ is defined by

B′(y) = sup_x min {A′(x), A(x)B(y)}, y ∈ ℝ

Basic property: Let A′ = A and let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {A(x), A(x)B(y)} = B(y)

So the basic property is satisfied.

Total indeterminance: Let A′ = ¬A = 1 - A and let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {1 - A(x), A(x)B(y)} = B(y)/(1 + B(y)) < 1

This means that the total indeterminance property is not satisfied.
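The value B(y)/(1 + B(y)) arises because the supremum of min{1 - A(x), A(x)·B(y)} is attained where the decreasing and increasing curves cross, i.e. at A(x) = 1/(1 + B(y)). A quick numerical check, sweeping the membership degree a = A(x) over a fine grid of [0, 1] (a sketch, not part of the text):

```python
def larsen_total_indeterminance(b, steps=10000):
    """Approximate sup over a in [0, 1] of min(1 - a, a * b),
    where a plays the role of A(x) sweeping all membership degrees."""
    return max(min(1.0 - a, a * b)
               for a in (i / steps for i in range(steps + 1)))

# Analytic supremum: b / (1 + b), attained where 1 - a = a * b.
```

For b = 0.8 the grid maximum agrees with 0.8/1.8 ≈ 0.444 to within the grid spacing.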

Subset: Let A′ ⊂ A and let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {A′(x), A(x)B(y)} = B(y)

So the subset property is satisfied.

Superset: Let y ∈ ℝ be arbitrarily fixed. Then we have

B′(y) = sup_x min {A′(x), A(x)B(y)} ≤ B(y)

So, the superset property is not satisfied.

Fig. 5.7 The GMP with Larsen's implication operator.

QUESTION BANK

1. Explain the theory of approximate reasoning.
2. What are the translation rules? Explain them with examples.
3. What are the rational properties? Explain them.
4. Explain generalized modus ponens with Mamdani's implication.
5. Explain generalized modus ponens with Larsen's implication.
6. Given
   C ∨ D
   ~H ⇒ (A ∧ ~B)
   C ∨ D ⇒ ~H
   (A ∧ ~B) ⇒ (R ∨ S)
   can (R ∨ S) be inferred from the above?


6 Fuzzy Rule-Based Systems

6.1 INTRODUCTION

Triangular norms were introduced by Schweizer and Sklar to model the distances in probabilistic metric spaces. In fuzzy set theory triangular norms are extensively used to model the logical connective "and". Functions that qualify as fuzzy intersections and fuzzy unions are usually referred to in the literature as t-norms and t-conorms, respectively.

The standard fuzzy operations occupy specific positions in the whole spectrum of fuzzy operations: the standard fuzzy intersection (min operator) produces for any given fuzzy sets the largest fuzzy set from among those produced by all possible fuzzy intersections (t-norms), while the standard fuzzy union (max operator) produces, on the contrary, the smallest fuzzy set among the fuzzy sets produced by all possible fuzzy unions (t-conorms).

6.2 TRIANGULAR NORM

A mapping T: [0, 1] × [0, 1] → [0, 1] is a triangular norm (t-norm for short) if it is symmetric, associative, non-decreasing in each argument and T(a, 1) = a for all a ∈ [0, 1]. That is, any t-norm T satisfies the properties:

Symmetricity: T(x, y) = T(y, x), ∀x, y ∈ [0, 1] ...(6.1)
Associativity: T(x, T(y, z)) = T(T(x, y), z), ∀x, y, z ∈ [0, 1] ...(6.2)
Monotonicity: T(x, y) ≤ T(x′, y′) if x ≤ x′ and y ≤ y′ ...(6.3)
One identity: T(x, 1) = x, ∀x ∈ [0, 1] ...(6.4)

These axioms attempt to capture the basic properties of set intersection. The basic t-norms are:

minimum: min(a, b) = min {a, b} ...(6.5)
Łukasiewicz: TL(a, b) = max {a + b - 1, 0} ...(6.6)

1].7) Rmin {a.(6. 0) = a. y) £ S(x¢. x) S(x.(6.13) . a2...9) . 1] ® [0.. z) S(x. non-decreasing in each argument and S(a. b} ïT (a...(6. to n > 2 arguments.. b) = Da(a.. b) = 1 min 1.(6... a Î (0. S(y. a} Yp(a.. through associativity.. b. b} S0 T if max {a .b) p ] . y) = S(y. In other words... for all a Î [0. z)) = S(S(x.ab) ab .FUZZY RULE-BASED SYSTEMS 55 product : weak : TP(a. 1] ´ [0. 6. an) = a1 X a2 X .. is a triangular co-norm (t-conorm) if it is symmetric.1) ú ï l -1 ê ú ë û î if l = 0 if l = 1 if l = ¥ otherwise .. b) = íTL (a.. "x Î [0.3 TRIANGULAR CONORM A mapping S : [0. X an TL (a1. an) = max . associative.15) .16) .(6.. { } p>0 .14) R a .(6. g³0 g + (1 ...(6. 1].g )(a + b .17) . any t-conorm S satisfies the properties: Symmetry : S(x..(6. 1] ..log l ê1 + (l .1) (l ..18) Associativity : Monotonicity : Zero identity : .. y)... 0) = x. b) = Hg(a. b} = 1 otherwise .10) Hamacher : Dubois and Prade : Yager : ab .12) All t-norms may be extended.a ) p + (1 . Triangular co-norms are extensively used to model logical connectives or...(6. 0U |å | S V | | T W n i i =1 A t-norm T is called strict if T is strictly increasing in each argument. b) ï P ï Fl(a. y¢) if x £ x¢ and y £ y¢ S(x.n + 1.(6... b) = ab Tw(a..11) Frank : ì min {a.. The minimum t-norm is automatically extended and TP (a1. b) ï a b é ù ï1 .(6.. p [(1 ..8) ..(6.. b) = . 1) max {a . a2.

If T is a t-norm then the equality S(a, b) = 1 - T(1 - a, 1 - b) defines a t-conorm, and we say that S is derived from T. The basic t-conorms are:

maximum: max(a, b) = max {a, b} ...(6.19)
Łukasiewicz: SL(a, b) = min {a + b, 1} ...(6.20)
probabilistic: SP(a, b) = a + b - ab ...(6.21)
strong: STRONG(a, b) = max {a, b} if min(a, b) = 0, and 1 otherwise ...(6.22)
Hamacher: HORγ(a, b) = (a + b - (2 - γ)ab)/(1 - (1 - γ)ab), γ ≥ 0 ...(6.23)
Yager: YORp(a, b) = min {1, (a^p + b^p)^(1/p)}, p > 0 ...(6.24)

Lemma 6.1: Let T be a t-norm. Then the following statement holds:

TW(x, y) ≤ T(x, y) ≤ min {x, y}, ∀x, y ∈ [0, 1]

Proof: From monotonicity, symmetricity and the extremal condition we get T(x, y) ≤ T(x, 1) ≤ x and T(x, y) = T(y, x) ≤ T(y, 1) ≤ y. This means that T(x, y) ≤ min {x, y}.

Lemma 6.2: Let S be a t-conorm. Then the following statement holds:

max {a, b} ≤ S(a, b) ≤ STRONG(a, b), ∀a, b ∈ [0, 1]

Proof: From monotonicity, symmetricity and the extremal condition we get S(x, y) ≥ S(x, 0) ≥ x and S(x, y) = S(y, x) ≥ S(y, 0) ≥ y. This means that S(x, y) ≥ max {x, y}.

These inequalities show that T(a, b) ≤ min {a, b} and max {a, b} ≤ S(a, b) for every t-norm T and t-conorm S.

Lemma 6.3: T(a, a) = a holds for any a ∈ [0, 1] if and only if T is the minimum norm.

Proof: If T(a, b) = min(a, b) then T(a, a) = a holds obviously. Suppose T(a, a) = a for any a ∈ [0, 1], and let a ≤ b ≤ 1. Using the monotonicity of T we obtain a = T(a, a) ≤ T(a, b) ≤ min {a, b} = a, so T(a, b) = min {a, b}. From the commutativity of T it follows that a = T(a, a) ≤ T(b, a) ≤ min {b, a} as well.
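The duality S(a, b) = 1 - T(1 - a, 1 - b) can be checked numerically: the Łukasiewicz t-conorm min{a + b, 1} is the dual of the Łukasiewicz t-norm, and the probabilistic sum a + b - ab is the dual of the product. A sketch with our own function names:

```python
def dual_conorm(T):
    """t-conorm derived from a t-norm T via S(a, b) = 1 - T(1 - a, 1 - b)."""
    return lambda a, b: 1.0 - T(1.0 - a, 1.0 - b)

t_luka = lambda a, b: max(a + b - 1.0, 0.0)   # Lukasiewicz t-norm
t_prod = lambda a, b: a * b                   # product t-norm

s_luka = dual_conorm(t_luka)   # equals min(a + b, 1)
s_prob = dual_conorm(t_prod)   # equals a + b - a*b
```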

"a. 0} (A Ç B) (t) = max {A(t) + B(t 1.3/x6 + 0.6/x2 + 1. 0)} for all t Î X.4: The distributive law of t-norm T on the max operator holds for any a.0/x4 + 1. b} £ C(a. x5.0/x1 + 0.26) Let T be a t-norm.0/x1 + 0.1] then we say that C is a compensatory operator.0/x5 + 0.28) . x3. x2.0/x6 + 0. b Î[0. Let A and B be fuzzy subsets of X = {x1. Example 6. 1]. y)= Ð AND (x.27) Let S be a t-conorm. Then we have 6.. c)}.3/x6 + 0.0/x7 B = 0.(6.0/x2 + 0.3/x2 + 0.3/x2 + 0. y) = { x + y 1. If we are given an operator C such that min {a. T(b.3/x2 + 0. 6.3/x2 + 0.(6.0/x5 + 0. . 1: Let T(x.0/x5 + 0.(6.. y) = LOR (x. x4. x7} and be defined by A = 0.1/x1 + 0. B(t)1} for all t Î X.2/x7.5/x3 +1. The T-intersection of A and B is defined as (A Ç B) (t) T (A(t). The operation union can be defined by the help of triangular conorms. c) = max {T(a. x6.0/x1 + 0.0/x3 +1.1/x1 + 0. c Î[0. y) = min {x + y. B(t)) for all t Î X.2/x7.0/x4 + 1. b}.3/x6 + 0. The S-union of A and B is defined as (A Ç B) (t) = S(A(t). x2. x3. x4.6/x6 + 0. x5.0/x4 + 0.5 J-CONORM-BASED UNION .FUZZY RULE-BASED SYSTEMS 57 Lemma 6.0/x4 + 1.6/x5 + 0.0/x4 + 0. x6. b. Let A and B be fuzzy subsets of X = {x1. Then we have (A È B) (t) = min {A(t).6/x3 + 1..9/x3 + 1. T(max{a.6/x3 + 1. b}.6/x5 + 0.9/x3 + 1. b) £ max {a. 1}be the £ukasiewicz t-conorm.0/x7 B = 0. B(t)) for all t Î X.6/x5 + 0.3/x6 + 0.2/x7.4 J-NORM-BASED INTERSECTION .. c)..0/x4 + 0. Then A Ç B has the following form A Ç B = 0. 2: Let (S(x. x7} and be defined by A = 0.. Example 6.2/x7 Then A È B has the following form A È B = 0. be the £ukasiewicz t-norm.1/x1 + 0.

6.6 AVERAGING OPERATORS

A typical compensatory operator is the arithmetical mean defined as

MEAN(a, b) = (a + b)/2 ...(6.28)

Fuzzy set theory provides a host of attractive aggregation connectives for integrating membership values representing uncertain information. These connectives can be categorized into three classes: union, intersection and compensation connectives.

Union produces a high output whenever any one of the input values representing degrees of satisfaction of different features or criteria is high. Intersection connectives produce a high output only when all of the inputs have high values. Compensative connectives have the property that a higher degree of satisfaction of one of the criteria can compensate for a lower degree of satisfaction of another criterion to a certain extent. In this sense, union connectives provide full compensation and intersection connectives provide no compensation.

Averaging operators realize trade-offs between objectives by allowing a positive compensation between ratings. In a decision process the idea of trade-offs corresponds to viewing the global evaluation of an action as lying between the worst and the best local ratings. This occurs in the presence of conflicting goals, when a compensation between the corresponding compatibilities is allowed. Averaging operators represent a wide class of aggregation operators.

6.6.1 Averaging Operator

An averaging operator is a function M: [0, 1] × [0, 1] → [0, 1] satisfying the following properties:

Idempotency: M(x, x) = x, ∀x ∈ [0, 1] ...(6.29)
Commutativity: M(x, y) = M(y, x), ∀x, y ∈ [0, 1] ...(6.30)
Extremal conditions: M(0, 0) = 0, M(1, 1) = 1 ...(6.31), (6.32)
Monotonicity: M(x, y) ≤ M(x′, y′) if x ≤ x′ and y ≤ y′ ...(6.33)
Continuity: M is continuous ...(6.34)

We prove that whatever the particular definition of an averaging operator, the global evaluation of an action will lie between the worst and the best local ratings.

Lemma 6.5: If M is an averaging operator then

min {x, y} ≤ M(x, y) ≤ max {x, y}, ∀x, y ∈ [0, 1]

Proof: From idempotency and monotonicity of M it follows that

min {x, y} = M(min {x, y}, min {x, y}) ≤ M(x, y) ≤ M(max {x, y}, max {x, y}) = max {x, y}

which ends the proof.

Averaging operators have the following interesting properties:

Property 1: A strictly increasing averaging operator cannot be associative.

Property 2: The only associative averaging operators are defined by the median

M(x, y, α) = med(x, y, α) = y if x ≤ y ≤ α; α if x ≤ α ≤ y; x if α ≤ x ≤ y, where α ∈ (0, 1) ...(6.35)

An important family of averaging operators is formed by the quasi-arithmetic means

M(a1, ..., an) = f⁻¹((1/n) Σi f(ai)) ...(6.36)

This family has been characterized by Kolmogorov as being the class of all decomposable continuous averaging operators. For example, the quasi-arithmetic mean of a1 and a2 is defined by

M(a1, a2) = f⁻¹((f(a1) + f(a2))/2)

The next table shows the most often used mean operators.

Table 6.1 Mean operators

Name                    M(x, y)
Harmonic mean           2xy/(x + y)
Geometric mean          √(xy)
Arithmetic mean         (x + y)/2
Dual of geometric mean  1 - √((1 - x)(1 - y))
Dual of harmonic mean   (x + y - 2xy)/(2 - x - y)
Median                  med(x, y, α), α ∈ (0, 1)
Generalized p-mean      ((x^p + y^p)/2)^(1/p), p ≥ 1
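The means in Table 6.1 all sit between min and max, as Lemma 6.5 requires. A quick sketch (function names are ours):

```python
import math

def harmonic(x, y):
    return 2 * x * y / (x + y)

def geometric(x, y):
    return math.sqrt(x * y)

def arithmetic(x, y):
    return (x + y) / 2

def dual_geometric(x, y):
    return 1 - math.sqrt((1 - x) * (1 - y))

def dual_harmonic(x, y):
    return (x + y - 2 * x * y) / (2 - x - y)

def p_mean(x, y, p=2):
    return ((x ** p + y ** p) / 2) ** (1 / p)
```

For x = 0.2 and y = 0.8 these give 0.32, 0.4, 0.5, 0.6, 0.68 and about 0.583 respectively, each inside [0.2, 0.8].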

6.6.2 Ordered Weighted Averaging

The process of information aggregation appears in many applications related to the development of intelligent systems. One sees aggregation in neural networks, fuzzy logic controllers, vision systems, expert systems and multi-criteria decision aids. In 1988 Yager introduced a new aggregation technique based on the ordered weighted averaging (OWA) operators.

An OWA operator of dimension n is a mapping F: ℝⁿ → ℝ that has an associated weighting vector W = (w1, w2, ..., wn)ᵀ such that wi ∈ [0, 1], 1 ≤ i ≤ n, w1 + w2 + ... + wn = 1, and

F(a1, a2, ..., an) = w1b1 + w2b2 + ... + wnbn = Σj wjbj ...(6.37)

where bj is the j-th largest element of the bag ⟨a1, a2, ..., an⟩.

A fundamental aspect of this operator is the re-ordering step; in particular, an aggregate ai is not associated with a particular weight wi, but rather a weight is associated with a particular ordered position of the aggregates. It is noted that different OWA operators are distinguished by their weighting function. When we view the OWA weights as a column vector, we shall find it convenient to refer to the weights with the low indices as weights at the top and those with the higher indices as weights at the bottom.

Example 6.3: Assume W = (0.4, 0.3, 0.2, 0.1)ᵀ. Then

F(0.7, 1, 0.2, 0.6) = 0.4 × 1 + 0.3 × 0.7 + 0.2 × 0.6 + 0.1 × 0.2 = 0.75

In 1988 Yager pointed out three important special cases of OWA aggregations:

F^*: in this case W = W^* = (1, 0, ..., 0)ᵀ and F^*(a1, a2, ..., an) = max {a1, a2, ..., an}
F_*: in this case W = W_* = (0, 0, ..., 1)ᵀ and F_*(a1, a2, ..., an) = min {a1, a2, ..., an}
FA: in this case W = WA = (1/n, ..., 1/n)ᵀ and FA(a1, a2, ..., an) = (a1 + ... + an)/n
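The OWA computation in Example 6.3 is a dot product with the arguments sorted in descending order. A minimal sketch:

```python
def owa(weights, args):
    """OWA operator: weights applied to the args sorted descending."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * b for w, b in zip(weights, sorted(args, reverse=True)))
```

With W = (0.4, 0.3, 0.2, 0.1) and arguments (0.7, 1, 0.2, 0.6) this reproduces the value 0.75, and the degenerate weight vectors (1, 0, ..., 0) and (0, ..., 0, 1) reproduce max and min as in the special cases above.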
From the above it becomes clear that for any F

F_*(a1, a2, ..., an) ≤ F(a1, a2, ..., an) ≤ F^*(a1, a2, ..., an) ...(6.38)

Thus the upper and lower star OWA operators are its boundaries:

min {a1, a2, ..., an} ≤ F(a1, a2, ..., an) ≤ max {a1, a2, ..., an} ...(6.39)

A number of important properties can be associated with the OWA operators. We shall now discuss some of these. The OWA operator can be seen to be commutative. Let ⟨a1, a2, ..., an⟩ be a bag of aggregates and let {d1, d2, ..., dn} be any permutation of the ai. Then for any OWA operator

F(a1, a2, ..., an) = F(d1, d2, ..., dn) ...(6.40)

Another characteristic associated with these operators is idempotency. If ai = a for all i, then for any OWA operator

F(a1, a2, ..., an) = a ...(6.41)

A third characteristic associated with these operators is monotonicity. Assume ai and ci are collections of aggregates, i = 1, ..., n, such that ai ≥ ci for each i. Then

F(a1, a2, ..., an) ≥ F(c1, c2, ..., cn) ...(6.42)

where F is some fixed weight OWA operator. From the above we can see that the OWA operators have the basic properties associated with an averaging operator.

Example 6.4: A window type OWA operator takes the average of the m arguments around the center. For this class of operators we have

wi = 0 if i < k; 1/m if k ≤ i ≤ k + m - 1; 0 if i ≥ k + m ...(6.43)

Fig. 6.1 Window type OWA operator.

In order to classify OWA operators in regard to their location between "and" and "or", a measure of orness associated with any vector W was introduced by Yager as follows:

orness(W) = (1/(n - 1)) Σi (n - i)wi ...(6.44)

It is easy to see that for any W the orness(W) is always in the unit interval. Furthermore, note that the nearer W is to an "or", the closer its measure is to one, while the nearer it is to an "and", the closer it is to zero.

Lemma 6.6: Let us consider the vectors W^* = (1, 0, ..., 0)ᵀ, W_* = (0, 0, ..., 1)ᵀ and WA = (1/n, ..., 1/n)ᵀ.

Then it can easily be shown that
orness(W^*) = 1, orness(W_*) = 0 and orness(WA) = 0.5

A measure of andness is defined as
andness(W) = 1 − orness(W)    ...(6.46)

Generally, an OWA operator with much of the non-zero weights near the top will be an orlike operator, that is, orness(W) ≥ 0.5; and when much of the weights are non-zero near the bottom, the OWA operator will be andlike, that is, andness(W) ≥ 0.5.

Example 6.6: Let W = (0.8, 0.2, 0.0)T. Then
orness(W) = (1/2)(2 × 0.8 + 0.2) = 0.9
and
andness(W) = 1 − orness(W) = 1 − 0.9 = 0.1
This means that the OWA operator, defined by
F(a1, a2, a3) = 0.8b1 + 0.2b2 + 0.0b3,
where bj is the j-th largest element of the bag ⟨a1, a2, a3⟩, is an orlike aggregation.

The following theorem shows that as we move weight up the vector we increase the orness, while moving weight down causes us to decrease orness(W).

Theorem 6.1: (Yager, 1993) Assume W and W′ are two n-dimensional OWA vectors such that
W = (w1, ..., wn)T and W′ = (w1, ..., wj + e, ..., wk − e, ..., wn)T
where e > 0 and j < k. Then orness(W′) > orness(W).

Proof: From the definition of the measure of orness we get
orness(W′) = (1/(n − 1)) Σi (n − i) w′i = (1/(n − 1)) [Σi (n − i) wi + (n − j)e − (n − k)e]
orness(W′) = orness(W) + (1/(n − 1)) e (k − j)
Since k > j, orness(W′) > orness(W).
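A quick sketch of the orness and andness measures (the function names are ours, not the book's; the index is 0-based in code):

```python
def orness(w):
    """orness(W) = 1/(n-1) * sum_{i=1..n} (n - i) * w_i, with i 1-based in the text."""
    n = len(w)
    return sum((n - 1 - i) * wi for i, wi in enumerate(w)) / (n - 1)

def andness(w):
    """andness(W) = 1 - orness(W), Eq. (6.46)."""
    return 1.0 - orness(w)

print(round(orness([0.8, 0.2, 0.0]), 6))  # 0.9, as in Example 6.6
```

The last assertion in the test mirrors Theorem 6.1: moving weight toward the top of the vector raises orness.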

6.7 MEASURE OF DISPERSION OR ENTROPY OF AN OWA VECTOR

In 1988 Yager defined the measure of dispersion (or entropy) of an OWA vector by
disp(W) = − Σi wi ln wi    ...(6.47)
We can see that, when using the OWA operator as an averaging operator, disp(W) measures the degree to which we use all the aggregates equally.

Suppose now that the fact of the GMP is given by a fuzzy singleton. Then the process of computation of the membership function of the consequence becomes very simple.

Fig. 6.2 Fuzzy singleton.

For example, if we use Mamdani's implication operator in the GMP:
Rule 1: if x is A1 then z is C1
Fact: x is x0
Consequence: z is C
where the membership function of the consequence C is computed as
C(w) = sup_u min {x̄0(u), (A1 → C1)(u, w)} = sup_u min {x̄0(u), min {A1(u), C1(w)}}    ...(6.48)
for all w. Observing that x̄0(u) = 0 for all u ≠ x0, the supremum turns into a simple minimum:
C(w) = min {x̄0(x0) ∧ A1(x0) ∧ C1(w)} = min {1 ∧ A1(x0) ∧ C1(w)} = min {A1(x0), C1(w)}    ...(6.49)
for all w.
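A minimal sketch of Eq. (6.47), with the usual convention 0 · ln 0 = 0 (the name is illustrative):

```python
import math

def dispersion(w):
    """Entropy-style dispersion of an OWA weight vector: -sum(w_i * ln w_i)."""
    return -sum(wi * math.log(wi) for wi in w if wi > 0)

print(dispersion([0.25, 0.25, 0.25, 0.25]))  # ln(4): the uniform vector uses all aggregates equally
```

A one-hot vector such as W^* gives dispersion 0, the minimum; the uniform vector WA gives the maximum, ln n.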

Fig. 6.3 Inference with Mamdani's implication operator.

If we use Godel implication operator in the GMP,
Rule 1: if x is A1 then z is C1
Fact: x is x0
Consequence: z is C
then
C(w) = sup_u min {x̄0(u), (A1 → C1)(u, w)} = A1(x0) → C1(w)    ...(6.51)
for all w. So,
C(w) = 1 if A1(x0) ≤ C1(w), and C(w) = C1(w) otherwise    ...(6.52)

Fig. 6.4 Inference with Godel implication operator.
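Under a singleton input, both single-rule inferences reduce to pointwise formulas over a sampled consequent. The sketch below (our names, illustrative sampled membership values) mirrors Eqs. (6.49) and (6.52):

```python
def mamdani_consequence(a1_at_x0, c1_samples):
    """Mamdani implication with singleton input: C(w) = min(A1(x0), C1(w))."""
    return [min(a1_at_x0, c1w) for c1w in c1_samples]

def godel_consequence(a1_at_x0, c1_samples):
    """Godel implication with singleton input: C(w) = 1 if A1(x0) <= C1(w) else C1(w)."""
    return [1.0 if a1_at_x0 <= c1w else c1w for c1w in c1_samples]

print(mamdani_consequence(0.6, [0.2, 0.6, 1.0]))  # [0.2, 0.6, 0.6]
print(godel_consequence(0.6, [0.2, 0.6, 1.0]))    # [0.2, 1.0, 1.0]
```

Note how Mamdani clips the consequent at the firing level, while Godel lifts to full membership wherever the consequent already dominates the firing level.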

Consider a block of fuzzy IF-THEN rules:
R1: if x is A1 then z is C1
also
R2: if x is A2 then z is C2
also
...
also
Rn: if x is An then z is Cn
fact: x is x0
Consequence: z is C    ...(6.53)

The i-th fuzzy rule from this rule-base,
Ri: if x is Ai then z is Ci,
is implemented by a fuzzy implication Ri and is defined as
Ri(u, w) = (Ai → Ci)(u, w) = Ai(u) → Ci(w) for i = 1, ..., n    ...(6.54)

Find C from the input x0 and from the rule base R = {R1, ..., Rn}.
Interpretation of
- sentence connective "also"
- implication operator "then"
- compositional operator "o"

We first compose x̄0 with each Ri producing the intermediate result
C′i = x̄0 o Ri for i = 1, ..., n    ...(6.55)
C′i is called the output of the i-th rule:
C′i(w) = Ai(x0) → Ci(w) for each w    ...(6.56)
Then combine the C′i component wise into C by some aggregation operator:
C = ∪(i=1 to n) C′i = x̄0 o R1 ∪ ... ∪ x̄0 o Rn    ...(6.57)
C(w) = A1(x0) → C1(w) ∨ ... ∨ An(x0) → Cn(w)    ...(6.58)

So, the inference process is the following:
- input to the system is x0
- fuzzified input is x̄0
- firing strength of the i-th rule is Ai(x0)
- the i-th individual rule output is C′i(w) := Ai(x0) → Ci(w)    ...(6.59)
- overall system output (action) is C = C′1 ∪ ... ∪ C′n    ...(6.60)
Overall system output = union of the individual rule outputs.

6.8 MAMDANI SYSTEM

For the Mamdani (Fig. 6.5) system (a → b = a ∧ b):
- input to the system is x0
- fuzzified input is x̄0
- firing strength of the i-th rule is Ai(x0)
- the i-th individual rule output is C′i(w) = Ai(x0) ∧ Ci(w)    ...(6.61)
- overall system output (action) is C(w) = ∨(i=1 to n) Ai(x0) ∧ Ci(w)    ...(6.62)

6.9 LARSEN SYSTEM

For the Larsen (Fig. 6.6) system (a → b = ab):
- input to the system is x0
- fuzzified input is x̄0
- firing strength of the i-th rule is Ai(x0)
- the i-th individual rule output is C′i(w) = Ai(x0) Ci(w)    ...(6.65)
- overall system output (action) is C(w) = ∨(i=1 to n) Ai(x0) Ci(w)    ...(6.66)
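On a sampled output universe the two systems differ only in how the firing level modifies each consequent (minimum vs. product). A small illustrative sketch, with hypothetical firing levels and sampled consequents:

```python
def mamdani_output(firing, consequents):
    """C(w) = max_i min(alpha_i, C_i(w)), consequents sampled pointwise over w."""
    return [max(min(a, cw) for a, cw in zip(firing, col))
            for col in zip(*consequents)]

def larsen_output(firing, consequents):
    """C(w) = max_i alpha_i * C_i(w)."""
    return [max(a * cw for a, cw in zip(firing, col))
            for col in zip(*consequents)]

firing = [0.3, 0.6]                       # A_i(x0) for two rules
consequents = [[1.0, 0.5], [0.2, 1.0]]    # C_i sampled at two points of W
print(mamdani_output(firing, consequents))  # [0.3, 0.6]
```

Larsen's product scales the whole consequent shape instead of clipping it, which is why its inferred fuzzy set keeps the consequent's profile.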

Fig. 6.5 Illustration of Mamdani system (degree of match, individual rule outputs and the overall system output).

Fig. 6.6 Illustration of Larsen system.

6.10 DEFUZZIFICATION

The output of the inference process so far is a fuzzy set, specifying a possibility distribution of the (control) action. In on-line control, a non-fuzzy (crisp) control action is usually required. Consequently, one must defuzzify the fuzzy control action (output) inferred from the fuzzy reasoning algorithm, namely:

z0 = defuzzifier(C)    ...(6.67)
where z0 is the crisp action and defuzzifier is the defuzzification operator. Defuzzification is a process to select a representative element from the fuzzy output C inferred from the fuzzy control algorithm.

QUESTION BANK

1. What is a t-norm? What are the properties to be satisfied by a t-norm?
2. What are the various basic t-norms?
3. What is a t-conorm? What are the properties to be satisfied by a t-conorm?
4. What are the various basic t-conorms?
5. What is t-norm-based intersection? Explain with an example.
6. What is t-conorm-based union? Explain with an example.
7. Let T be a t-norm. Prove the following statement: TW(x, y) ≤ T(x, y) ≤ min {x, y}, ∀x, y ∈ [0, 1].
8. Let S be a t-conorm. Prove the following statement: max {a, b} ≤ S(a, b) ≤ STRONG(a, b), ∀a, b ∈ [0, 1].
9. What are the averaging operators? What are the important properties of averaging operators?
10. Explain ordered weighted averaging with an example.
11. Explain the measure of dispersion.
12. What is the entropy of an ordered weighted averaging (OWA) vector?
13. Explain the inference with Mamdani's implication operator.
14. Explain the inference with Godel's implication operator.
15. Explain the Mamdani rule-based system.
16. Explain the Larsen rule-based system.
17. What is defuzzification?

REFERENCES

1. B. Schweizer and A. Sklar, Statistical metric spaces, Pacific Journal of Mathematics, Vol. 10, pp. 313-334, 1960.
2. B. Schweizer and A. Sklar, Associative functions and statistical triangle inequalities, Publicationes Mathematicae Debrecen, Vol. 8, pp. 169-186, 1961.
3. B. Schweizer and A. Sklar, Associative functions and abstract semigroups, Publicationes Mathematicae Debrecen, Vol. 10, pp. 69-81, 1963.

4. J.A. Bernard, Use of rule-based system for process control, IEEE Control Systems Magazine, Vol. 8, No. 5, pp. 3-13, 1988.
5. R.R. Yager, Strong truth and rules of inference in fuzzy logic and approximate reasoning, Cybernetics and Systems, Vol. 16, pp. 23-63, 1985.
6. V. Novak and W. Pedrycz, Fuzzy sets and t-norms in the light of fuzzy logic, International Journal of Man-Machine Studies, Vol. 29, No. 2, pp. 113-127, 1988.
7. X.T. Peng, Generating rules for fuzzy logic controllers by functions, Fuzzy Sets and Systems, Vol. 36, No. 1, pp. 83-89, 1990.
8. M.H. Lim and Y. Takefuji, Implementing fuzzy rule-based systems on silicon chips, IEEE Expert, Vol. 5, No. 1, pp. 31-45, 1990.
9. M.M. Gupta and J. Qi, Theory of t-norms and fuzzy inference methods, Fuzzy Sets and Systems, Vol. 40, No. 3, pp. 431-450, 1991.
10. A. Nafarieh and J.M. Keller, A fuzzy logic rule-based automatic target recognition, International Journal of Intelligent Systems, Vol. 6, No. 3, pp. 295-312, 1991.
11. D.P. Filev and R.R. Yager, A generalized defuzzification method via BAD distributions, International Journal of Intelligent Systems, Vol. 6, No. 7, pp. 687-697, 1991.
12. D. Dubois and H. Prade, Gradual inference rules in approximate reasoning, Information Sciences, Vol. 61, No. 1-2, pp. 103-122, 1992.
13. L.X. Wang and J.M. Mendel, Generating fuzzy rules by learning through examples, IEEE Transactions on Systems, Man and Cybernetics, Vol. 22, No. 6, pp. 1414-1427, 1992.
14. F. Bouslama and A. Ichikawa, Fuzzy control rules and their natural control laws, Fuzzy Sets and Systems, Vol. 48, No. 1, pp. 65-86, 1992.
15. R. Fuller and H.J. Zimmermann, On computation of the compositional rule of inference under triangular norms, Fuzzy Sets and Systems, Vol. 51, No. 3, pp. 267-275, 1992.
16. D.L. Hudson, M.E. Cohen and M.F. Anderson, Approximate reasoning with IF-THEN-UNLESS rule in a medical expert system, International Journal of Intelligent Systems, Vol. 7, No. 1, pp. 71-79, 1992.
17. R.R. Yager, A general approach to rule aggregation in fuzzy logic control, Applied Intelligence, Vol. 2, No. 4, pp. 335-351, 1992.
18. F.C.H. Rhee and R. Krishnapuram, Fuzzy rule generation methods for high-level computer vision, Fuzzy Sets and Systems, Vol. 60, No. 3, pp. 245-258, 1993.

59. Yamakawa. No. 3-40. No.B. P. 25. E. Fuzzy IF-THEN-UNLESS rules and their implementation. 163-204. pp. Turksen. 1994. 1. No. pp. 64. Vol. Uchino. Pedrycz. International Journal of Uncertainity. 33. 28. Doherry. Fuzzy rule-based simple interpolation algorithm for discrete signal. Dutta and P. 1994. 1994. Bonissone. 3. 1994. 167-182. 39-58. No. 1993. Arnould and S. pp. Sudkamp. W. Sudkamp. International Journal of Approximate Reasoning. 235-255. Vol. 3. No. Ebert. 11. Vol. Fuzziness and Knowledge-based Systems. . No. 58. 1. 8. Vol. Similarity. Why triangular membership functions? Fuzzy Sets and Systems. Vol. pp. 63. 26. T. C. Miki and S. 1993. T. Cross and T.Tian and I. S. Tano. 11. No.70 FUZZY LOGIC AND NEURAL NETWORKS 24. On measuring the specificity of IF-THEN rules. Fuzzy Sets and Systems. No. 30. Vol. 29. pp. No. pp. 27. 31. 73-86. 1994. Combination of rules or their consequences in fuzzy expert systems. 1993. A rule-based method to calculate exactly the widest solutions sets of a max-min fuzzy relations inequality. T. 64. Kacprzyk. pp. 1993. Vol. Nakamura. 3. Fuzzy Sets and Systems. 32. 349-358.P. P. 21-30. 2. 1. Fuzzy Sets and Systems. 1. pp. T. 259-270. 3. Y. 1993. 1. Vol. 1. International Journal of Approximate Reasoning. Fuzzy Sets and Systems. 29-53. International Journal of Approximate Reasoning. V. Vol. pp. Integrating case and rule-based reasoning. Vol. Fuzzy Sets and Systems. interpolation and fuzzy rule construction. Rule-based fuzzy classification for software quality control. No. Driankov and H. J. Hellendoom. pp. 58. Patterns of fuzzy-rule based interference.

7
Fuzzy Reasoning Schemes

7.1 INTRODUCTION

This chapter focuses on different inference mechanisms in fuzzy rule-based systems with examples. The inference engine of a fuzzy expert system operates on a series of production rules and makes fuzzy inferences. There exist two approaches to evaluating relevant production rules. The first is data-driven and is exemplified by the generalized modus ponens. In this case, available data are supplied to the expert system, which then uses them to evaluate relevant production rules and draw all possible conclusions. An alternative method of evaluation is goal-driven; it is exemplified by the generalized modus tollens form of logical inference. Here, the expert system searches for data specified in the IF clauses of production rules that will lead to the objective; these data are found either in the knowledge base, in the THEN clauses of other production rules, or by querying the user. Since the data-driven method proceeds from IF clauses to THEN clauses in the chain through the production rules, it is commonly called forward chaining. Similarly, since the goal-driven method proceeds backward from the THEN clauses to the IF clauses, in its search for the required data, it is commonly called backward chaining. Backward chaining has the advantage of speed, since only the rules leading to the objective need to be evaluated.

7.2 FUZZY RULE-BASE SYSTEM

R1: if x is A1 and y is B1 then z is C1
R2: if x is A2 and y is B2 then z is C2
...
Rn: if x is An and y is Bn then z is Cn
fact: x is x0 and y is y0
Consequence: z is C

The i-th fuzzy rule from this rule-base,
Ri: if x is Ai and y is Bi then z is Ci,
is implemented by a fuzzy relation Ri and is defined as
Ri(u, v, w) = (Ai × Bi → Ci)(u, v, w) = [Ai(u) ∧ Bi(v)] → Ci(w) for i = 1, ..., n    ...(7.1)

Find C from the input x0 and from the rule base R = {R1, ..., Rn}.
Interpretation of
- logical connective "and"
- sentence connective "also"
- implication operator "then"
- compositional operator "o"

We first compose x̄0 × ȳ0 with each Ri producing the intermediate result
C′i = x̄0 × ȳ0 o Ri for i = 1, ..., n    ...(7.2)
Here C′i is called the output of the i-th rule:
C′i(w) = [Ai(x0) ∧ Bi(y0)] → Ci(w) for each w    ...(7.3)
Then combine the C′i component wise into C by some aggregation operator:
C = ∪(i=1 to n) C′i = x̄0 × ȳ0 o R1 ∪ ... ∪ x̄0 × ȳ0 o Rn    ...(7.4)
C(w) = A1(x0) × B1(y0) → C1(w) ∨ ... ∨ An(x0) × Bn(y0) → Cn(w)    ...(7.5)

So, the inference process is the following:
- input to the system is (x0, y0)
- fuzzified input is (x̄0, ȳ0)
- firing strength of the i-th rule is Ai(x0) ∧ Bi(y0)
- the i-th individual rule output is C′i(w) := Ai(x0) ∧ Bi(y0) → Ci(w)
- overall system output is C = C′1 ∪ ... ∪ C′n
Overall system output = union of the individual rule outputs.

7.3 INFERENCE MECHANISMS IN FUZZY RULE-BASE SYSTEMS

We present five well-known inference mechanisms in fuzzy rule-based systems. For simplicity we assume that we have two fuzzy IF-THEN rules of the form

R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
fact: x is x0 and y is y0
Consequence: z is C

7.3.1 Mamdani Inference Mechanism

The fuzzy implication is modelled by Mamdani's minimum operator and the sentence connective "also" is interpreted as oring the propositions and defined by the max operator. The firing levels of the rules, denoted by αi, i = 1, 2, are computed by
α1 = A1(x0) ∧ B1(y0), α2 = A2(x0) ∧ B2(y0)    ...(7.6)
The individual rule outputs are obtained by
C′1(w) = (α1 ∧ C1(w)), C′2(w) = (α2 ∧ C2(w))    ...(7.7)
Then the overall system output is computed by oring the individual rule outputs:
C(w) = C′1(w) ∨ C′2(w) = (α1 ∧ C1(w)) ∨ (α2 ∧ C2(w))    ...(7.8)
Finally, to obtain a deterministic control action, we employ any defuzzification strategy.

Fig. 7.1 Inference with Mamdani's implication operator.

7.3.2 Tsukamoto Inference Mechanism

All linguistic terms are supposed to have monotonic membership functions. The firing levels of the rules, denoted by αi, i = 1, 2, are computed by
α1 = A1(x0) ∧ B1(y0), α2 = A2(x0) ∧ B2(y0)    ...(7.9)

In this mode of reasoning the individual crisp control actions z1 and z2 are computed from the equations
α1 = C1(z1), α2 = C2(z2)    ...(7.10)
and the overall crisp control action is expressed as
z0 = (α1 z1 + α2 z2)/(α1 + α2)    ...(7.11)
i.e. z0 is computed by the discrete Center-of-Gravity method. If we have n rules in our rule-base then the crisp control action is computed as
z0 = Σ(i=1 to n) αi zi / Σ(i=1 to n) αi    ...(7.12)
where αi is the firing level and zi is the (crisp) output of the i-th rule, i = 1, ..., n.

Example 7.1: We illustrate Tsukamoto's reasoning method by the following simple example
R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
fact: x is x0 and y is y0
Consequence: z is C

Then according to the figure we see that A1(x0) = 0.7 and B1(y0) = 0.3. Therefore, the firing level of the first rule is
α1 = min {A1(x0), B1(y0)} = min {0.7, 0.3} = 0.3
and from A2(x0) = 0.6 and B2(y0) = 0.8 it follows that the firing level of the second rule is
α2 = min {A2(x0), B2(y0)} = min {0.6, 0.8} = 0.6
The individual rule outputs z1 = 8 and z2 = 4 are derived from the equations
C1(z1) = 0.3, C2(z2) = 0.6
and the crisp control action is
z0 = (8 × 0.3 + 4 × 0.6)/(0.3 + 0.6) = 4.8/0.9 ≈ 5.33
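A numeric re-check of Example 7.1, with the firing levels and crisp rule outputs taken from the text:

```python
# Firing levels from the antecedent matches in Example 7.1
alpha1 = min(0.7, 0.3)   # A1(x0) and B1(y0) -> 0.3
alpha2 = min(0.6, 0.8)   # A2(x0) and B2(y0) -> 0.6

# Crisp rule outputs solving C1(z1) = alpha1 and C2(z2) = alpha2
z1, z2 = 8.0, 4.0

# Discrete Center-of-Gravity, Eq. (7.11)
z0 = (alpha1 * z1 + alpha2 * z2) / (alpha1 + alpha2)
print(round(z0, 2))  # 5.33
```

Because the linguistic terms are monotonic, each C_i can be inverted at the firing level, which is what makes the crisp z_i well defined.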

Fig. 7.2 Tsukamoto's inference mechanism.

7.3.3 Sugeno Inference Mechanism

Sugeno and Takagi use the following architecture:
R1: if x is A1 and y is B1 then z1 = a1x + b1y
also
R2: if x is A2 and y is B2 then z2 = a2x + b2y
fact: x is x0 and y is y0
Consequence: z0

The firing levels of the rules are computed by
α1 = A1(x0) ∧ B1(y0), α2 = A2(x0) ∧ B2(y0)    ...(7.13)
then the individual rule outputs are derived from the relationships
z1* = a1x0 + b1y0, z2* = a2x0 + b2y0    ...(7.14)
and the crisp control action is expressed as
z0 = (α1 z1* + α2 z2*)/(α1 + α2)    ...(7.15)
If we have n rules in our rule-base then the crisp control action is computed as
z0 = Σ(i=1 to n) αi zi* / Σ(i=1 to n) αi    ...(7.16)
where αi denotes the firing level of the i-th rule, i = 1, ..., n.

Fig. 7.3 Sugeno's inference mechanism.

Example 7.2: We illustrate Sugeno's reasoning method by the following simple example
R1: if x is BIG and y is SMALL then z1 = x + y
also
R2: if x is MEDIUM and y is BIG then z2 = 2x − y
fact: x is 3 and y is 2
Consequence: z0

Then according to the figure we see that
μBIG(x0) = μBIG(3) = 0.8, μSMALL(y0) = μSMALL(2) = 0.2
Therefore, the firing level of the first rule is
α1 = min {μBIG(x0), μSMALL(y0)} = min {0.8, 0.2} = 0.2
and from
μMEDIUM(x0) = μMEDIUM(3) = 0.6, μBIG(y0) = μBIG(2) = 0.9
it follows that the firing level of the second rule is
α2 = min {μMEDIUM(x0), μBIG(y0)} = min {0.6, 0.9} = 0.6
The individual rule outputs are computed as
z1* = x0 + y0 = 3 + 2 = 5, z2* = 2x0 − y0 = 2 × 3 − 2 = 4
So the crisp control action is
z0 = (5 × 0.2 + 4 × 0.6)/(0.2 + 0.6) = 4.25
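The same numbers in executable form (membership grades copied from the example):

```python
# Example 7.2, step by step
alpha1 = min(0.8, 0.2)        # BIG(3) and SMALL(2)
alpha2 = min(0.6, 0.9)        # MEDIUM(3) and BIG(2)
x0, y0 = 3.0, 2.0
z1 = x0 + y0                  # rule 1 consequent: z1 = x + y
z2 = 2 * x0 - y0              # rule 2 consequent: z2 = 2x - y
z0 = (alpha1 * z1 + alpha2 * z2) / (alpha1 + alpha2)
print(round(z0, 2))  # 4.25
```

Note that Sugeno-type rules skip defuzzification entirely: the consequents are already crisp functions of the inputs, and the firing levels only weight them.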

Fig. 7.4 Example of Sugeno's inference mechanism.

7.3.4 Larsen Inference Mechanism

The fuzzy implication is modeled by Larsen's product operator and the sentence connective "also" is interpreted as oring the propositions and defined by the max operator. Let us denote by αi the firing level of the i-th rule, i = 1, 2:
α1 = A1(x0) ∧ B1(y0), α2 = A2(x0) ∧ B2(y0)    ...(7.17)
Then the membership function of the inferred consequence C is pointwise given by
C(w) = (α1 C1(w)) ∨ (α2 C2(w))    ...(7.18)
If we have n rules in our rule-base then the consequence C is computed as
C(w) = ∨(i=1 to n) (αi Ci(w))    ...(7.19)
where αi denotes the firing level of the i-th rule, i = 1, ..., n.
To obtain a deterministic control action, we employ any defuzzification strategy.

7.3.5 Simplified Fuzzy Reasoning

R1: if x is A1 and y is B1 then z1 = c1
also
R2: if x is A2 and y is B2 then z2 = c2
fact: x is x0 and y is y0
Consequence: z0

Fig. 7.5 Inference with Larsen's product operation rule.

The firing levels of the rules are computed by
α1 = A1(x0) ∧ B1(y0), α2 = A2(x0) ∧ B2(y0)    ...(7.20)
then the individual rule outputs are the crisp constants c1 and c2, and the crisp control action is expressed as
z0 = (α1 c1 + α2 c2)/(α1 + α2)    ...(7.21)
If we have n rules in our rule-base then the crisp control action is computed as
z0 = Σ(i=1 to n) αi ci / Σ(i=1 to n) αi    ...(7.22)
where αi denotes the firing level of the i-th rule, i = 1, ..., n.

Fig. 7.6 Simplified fuzzy reasoning.
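Eq. (7.22) in code form (the helper name is ours, not from the text):

```python
def simplified_reasoning(firing, constants):
    """Crisp action z0 = sum(alpha_i * c_i) / sum(alpha_i), Eq. (7.22)."""
    return sum(a * c for a, c in zip(firing, constants)) / sum(firing)

print(round(simplified_reasoning([0.2, 0.6], [5.0, 4.0]), 2))  # 4.25
```

This is the degenerate Sugeno case with constant consequents, which is why the arithmetic matches Example 7.2 when the same firing levels and outputs are used.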

QUESTION BANK

1. What are the different approaches to evaluating relevant production rules? Explain them.
2. Explain the Mamdani inference mechanism.
3. Explain the Tsukamoto inference mechanism.
4. Explain the Sugeno inference mechanism.
5. Explain the Larsen inference mechanism.
6. Explain the simplified reasoning scheme.

REFERENCES

1. L.A. Zadeh, Fuzzy logic and approximate reasoning, Synthese, Vol. 30, No. 1, pp. 407-428, 1975.
2. L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning I, Information Sciences, Vol. 8, No. 3, pp. 199-251, 1975.
3. L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning II, Information Sciences, Vol. 8, No. 4, pp. 301-357, 1975.
4. L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning III, Information Sciences, Vol. 9, No. 1, pp. 43-80, 1975.
5. B.R. Gaines, Foundations of fuzzy reasoning, International Journal of Man-Machine Studies, Vol. 8, No. 6, pp. 623-668, 1976.
6. E.H. Mamdani, Applications of fuzzy logic to approximate reasoning using linguistic systems, IEEE Transactions on Computers, Vol. 26, No. 12, pp. 1182-1191, 1977.
7. J.F. Baldwin, Fuzzy logic and fuzzy reasoning, International Journal of Man-Machine Studies, Vol. 11, No. 4, pp. 465-480, 1979.
8. E.H. Mamdani and B.R. Gaines (Eds.), Fuzzy Reasoning and Its Applications, Academic Press, London, 1981.
9. M. Sugeno and T. Takagi, Multidimensional fuzzy reasoning, Fuzzy Sets and Systems, Vol. 9, No. 2, pp. 313-325, 1983.
10. W. Pedrycz, Applications of fuzzy relational equations for methods of reasoning in presence of fuzzy data, Fuzzy Sets and Systems, Vol. 16, No. 2, pp. 163-175, 1985.
11. H. Farreny and H. Prade, Default and inexact reasoning with possibility degrees, IEEE Transactions on Systems, Man and Cybernetics, Vol. 16, No. 2, pp. 270-276, 1986.
12. M.B. Gorzalczany, A method of inference in approximate reasoning based on interval-valued fuzzy sets, Fuzzy Sets and Systems, Vol. 21, No. 1, pp. 1-17, 1987.
13. E. Sanchez and L.A. Zadeh (Eds.), Approximate Reasoning in Intelligent Systems, Decision and Control, Pergamon Press, Oxford, 1987.
14. I.B. Turksen, Approximate reasoning for production planning, Fuzzy Sets and Systems, Vol. 26, No. 1, pp. 23-37, 1988.
15. I.B. Turksen, Four methods of approximate reasoning with interval-valued fuzzy sets, International Journal of Approximate Reasoning, Vol. 3, No. 2, pp. 121-142, 1989.

16. S. Basu and A. Dutta, Reasoning with imprecise knowledge to enhance intelligent decision support, IEEE Transactions on Systems, Man and Cybernetics, Vol. 19, No. 4, pp. 756-770, 1989.
17. R. Kruse and E. Schwecke, Fuzzy reasoning in a multidimensional space of hypotheses, International Journal of Approximate Reasoning, Vol. 4, No. 1, pp. 47-68, 1990.
18. Z. Cao, A. Kandel and L. Li, A new model for fuzzy reasoning, Fuzzy Sets and Systems, Vol. 36, No. 3, pp. 311-325, 1990.
19. D. Dubois and H. Prade, Fuzzy sets in approximate reasoning, Part I: Inference with possibility distributions, Fuzzy Sets and Systems, Vol. 40, No. 1, pp. 143-202, 1991.
20. E.H. Ruspini, Approximate reasoning: past, present, future, Information Sciences, Vol. 57, pp. 297-317, 1991.
21. S. Dutta, Approximate spatial reasoning: Integrating qualitative and quantitative constraints, International Journal of Approximate Reasoning, Vol. 5, No. 3, pp. 307-330, 1991.
22. Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer, 1991.
23. Z. Luo, A new improved algorithm for inexact reasoning based on extended fuzzy production rules, Cybernetics and Systems, Vol. 23, No. 5, pp. 409-420, 1992.
24. D.L. Hudson, M.E. Cohen and M.F. Anderson, Approximate reasoning with IF-THEN-UNLESS rule in a medical expert system, International Journal of Intelligent Systems, Vol. 7, No. 1, pp. 71-79, 1992.
25. H. Nakanishi, I.B. Turksen and M. Sugeno, A review and comparison of six reasoning methods, Fuzzy Sets and Systems, Vol. 57, No. 3, pp. 257-294, 1993.
26. Z. Bien and M.G. Chun, An inference network for bidirectional approximate reasoning based on an equality measure, IEEE Transactions on Fuzzy Systems, Vol. 2, No. 2, pp. 177-180, 1994.

8
Fuzzy Logic Controllers

8.1 INTRODUCTION

Conventional controllers are derived from control theory techniques based on mathematical models of the open-loop process, called system, to be controlled.

8.2 BASIC FEEDBACK CONTROL SYSTEM

The purpose of the feedback controller is to guarantee a desired response of the output y. The process of keeping the output y close to the set point (reference input) y*, despite the presence of disturbances, drift of the system parameters, and noise measurements, is called regulation. The output of the controller (which is the input of the system) is the control action u.

Fig. 8.1 A basic feedback control system: y* and e enter the Controller, whose output u drives the System, producing y.

Here e represents the error between the desired set point y* and the output of the system y. The general form of the discrete-time control law is
u(k) = f(e(k), e(k − 1), ..., e(k − τ), u(k − 1), ..., u(k − τ))    ...(8.1)
providing a control action that describes the relationship between the input and the output of the controller. f is in general a non-linear function, and the parameter τ defines the order of the controller.

8.3 FUZZY LOGIC CONTROLLER

L.A. Zadeh (1973) introduced the idea of formulating the control algorithm by logical rules. In a fuzzy logic controller (FLC), the dynamic behaviour of a fuzzy system is characterized by a set of linguistic description rules based on expert knowledge. The expert knowledge is usually of the form
IF (a set of conditions are satisfied) THEN (a set of consequences can be inferred).
Since the antecedents and the consequents of these IF-THEN rules are associated with fuzzy concepts (linguistic terms), they are often called fuzzy conditional statements. In our terminology, a fuzzy control rule is a fuzzy conditional statement in which the antecedent is a condition in its application domain and the consequent is a control action for the system under control. Basically, fuzzy control rules provide a convenient way for expressing control policy and domain knowledge. Furthermore, several linguistic variables might be involved in the antecedents and the conclusions of these rules. When this is the case, the system will be referred to as a multi-input-multi-output (MIMO) fuzzy system.

8.3.1 Two-Input-Single-Output (TISO) Fuzzy Systems

For example, in the case of two-input-single-output fuzzy systems, fuzzy control rules have the form
R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
...
also
Rn: if x is An and y is Bn then z is Cn
where x and y are the process state variables, z is the control variable, and Ai, Bi and Ci are linguistic values of the linguistic variables x, y and z in the universes of discourse U, V and W, respectively. An implicit sentence connective "also" links the rules into a rule set or, equivalently, a rule-base.

8.3.2 Mamdani Type of Fuzzy Logic Control

We can represent the FLC in a form similar to the conventional control law
u(k) = F(e(k), e(k − 1), ..., e(k − τ), u(k − 1), ..., u(k − τ))    ...(8.2)
where the function F is described by a fuzzy rule-base. However, it does not mean that the FLC is a kind of transfer function or difference equation. The knowledge-based nature of FLC dictates a limited usage of the past values of the error e and control u, because it is rather unreasonable to expect meaningful linguistic statements for e(k − 3), e(k − 4), ..., e(k − τ).

A typical FLC describes the relationship between the change of the control
Δu(k) = u(k) − u(k − 1)    ...(8.3)
on the one hand, and the error e(k) and its change
Δe(k) = e(k) − e(k − 1)
on the other hand. Such a control law can be formalized as
Δu(k) = F(e(k), Δe(k))    ...(8.4)
and is a manifestation of the general FLC expression with τ = 1. The actual output of the controller u(k) is obtained from the previous value of control u(k − 1), which is updated by Δu(k):
u(k) = u(k − 1) + Δu(k)    ...(8.5)

Fig. 8.2 Membership functions for the error: N (negative), ZE (near zero), P (positive).

A prototypical rule-base of a simple FLC realizing the control law above is listed in the following:
R1: if e is "positive" and Δe is "near zero" then Δu is "positive"
R2: if e is "negative" and Δe is "near zero" then Δu is "negative"
R3: if e is "near zero" and Δe is "near zero" then Δu is "near zero"
R4: if e is "near zero" and Δe is "positive" then Δu is "positive"
R5: if e is "near zero" and Δe is "negative" then Δu is "negative"
This type of controller was suggested originally by Mamdani and Assilian in 1975 and is called the Mamdani type FLC. So, our task is to find a crisp control action z0 from the fuzzy rule-base and from the actual crisp inputs x0 and y0:
R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
...
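To make the five-rule controller concrete, here is a minimal sketch with assumed triangular memberships for "negative", "near zero" and "positive" on [−1, 1] and singleton actions for Δu; all shapes and values are illustrative, not from the text:

```python
def tri(x, a, b, c):
    """Triangular membership with peak at b; the shape parameters are ours."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mu(x):
    """Grades of x in N, ZE, P over the interval [-1, 1] (illustrative shapes)."""
    return {"N": tri(x, -2.0, -1.0, 0.0),
            "ZE": tri(x, -1.0, 0.0, 1.0),
            "P": tri(x, 0.0, 1.0, 2.0)}

def delta_u(e, de):
    """Weighted average of the five prototype rules with singleton actions."""
    me, mde = mu(e), mu(de)
    rules = [  # (firing level, crisp delta-u assigned to that rule)
        (min(me["P"], mde["ZE"]), +1.0),
        (min(me["N"], mde["ZE"]), -1.0),
        (min(me["ZE"], mde["ZE"]), 0.0),
        (min(me["ZE"], mde["P"]), +1.0),
        (min(me["ZE"], mde["N"]), -1.0),
    ]
    num = sum(a * z for a, z in rules)
    den = sum(a for a, _ in rules)
    return num / den if den else 0.0
```

Per Eq. (8.5), the control would then be updated incrementally, e.g. `u = u + delta_u(e, de)` at each step.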

also
Rn: if x is An and y is Bn then z is Cn
input: x is x0 and y is y0
output: z0

Of course, the inputs of fuzzy rule-based systems should be given by fuzzy sets, and therefore we have to fuzzify the crisp inputs. Furthermore, the output of a fuzzy system is always a fuzzy set, and therefore to get a crisp value we have to defuzzify it.

8.3.3 Fuzzy Logic Control Systems

Fuzzy logic control systems (Fig. 8.3) usually consist of four major parts: fuzzification interface, fuzzy rule-base, fuzzy inference engine and defuzzification interface.

Fig. 8.3 Fuzzy logic controller: crisp x in U → Fuzzifier → fuzzy set in U → Fuzzy inference engine (with fuzzy rule-base) → fuzzy set in V → Defuzzifier → crisp y in V.

A fuzzification operator has the effect of transforming crisp data into fuzzy sets. In most of the cases we use fuzzy singletons as fuzzifiers:
fuzzifier(x0) := x̄0    ...(8.7)
where x0 is a crisp input value from a process.

Fig. 8.4 Fuzzy singleton as fuzzifier.

Suppose now that we have two input variables x and y. A fuzzy control rule
Ri: if (x is Ai and y is Bi) then (z is Ci)
is implemented by a fuzzy implication Ri and is defined as
Ri(u, v, w) = [Ai(u) and Bi(v)] → Ci(w)    ...(8.8)
where the logical connective "and" is implemented by the minimum operator, i.e.
[Ai(u) and Bi(v)] → Ci(w) = [Ai(u) ∧ Bi(v)] → Ci(w) = min {Ai(u), Bi(v)} → Ci(w)    ...(8.9)
Of course, we can use any t-norm to model the logical connective "and".

Fuzzy control rules are combined by using the sentence connective "also". Since each fuzzy control rule is represented by a fuzzy relation, the overall behavior of a fuzzy system is characterized by these fuzzy relations. In other words, a fuzzy system can be characterized by a single fuzzy relation which is the combination of the particular rules; the combination in question involves the sentence connective "also". Symbolically, if we have the collection of rules
R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
...
also
Rn: if x is An and y is Bn then z is Cn
the procedure for obtaining the fuzzy output of such a knowledge base consists of the following three steps:
1. Find the firing level of each of the rules.
2. Find the output of each of the rules.
3. Aggregate the individual rule outputs to obtain the overall system output.

To infer the output z from the given process states x, y and fuzzy relations Ri, we apply the compositional rule of inference:
R1: if x is A1 and y is B1 then z is C1
also
R2: if x is A2 and y is B2 then z is C2
also
...
also
Rn: if x is An and y is Bn then z is Cn
input: x is x0 and y is y0
Consequence: z is C

where the consequence C is computed by

consequence = Agg (fact o R1, ..., fact o Rn) ...(8.10)

That is,

C = Agg (x̄0 x ȳ0 o R1, ..., x̄0 x ȳ0 o Rn) ...(8.11)

taking into consideration that

x̄0(u) = 0 for u ≠ x0 and ȳ0(v) = 0 for v ≠ y0 ...(8.12)

The computation of the membership function of C is then very simple:

C(w) = Agg {A1(x0) x B1(y0) -> C1(w), ..., An(x0) x Bn(y0) -> Cn(w)} ...(8.13)

for all w ∈ W. The procedure for obtaining the fuzzy output of such a knowledge base can be formulated as follows.

The firing level of the i-th rule is determined by

Ai(x0) x Bi(y0) ...(8.14)

The output of the i-th rule is calculated by

C′i(w) = Ai(x0) x Bi(y0) -> Ci(w) for all w ∈ W ...(8.15)

The overall system output C is obtained from the individual rule outputs C′i by

C(w) = Agg {C′1(w), ..., C′n(w)} for all w ∈ W ...(8.16)

Example 8.1: If the sentence connective also is interpreted as oring the rules, i.e. by taking their union, then the membership function of the consequence is computed as

C = (x̄0 x ȳ0 o R1) ∪ ... ∪ (x̄0 x ȳ0 o Rn) ...(8.17)

That is,

C(w) = A1(x0) x B1(y0) -> C1(w) V ... V An(x0) x Bn(y0) -> Cn(w) ...(8.18)

for all w ∈ W.

In on-line control, a nonfuzzy (crisp) control action is usually required. Consequently, one must defuzzify the fuzzy control action (output) inferred from the fuzzy control algorithm, namely

z0 = defuzzifier (C)

where z0 is the nonfuzzy control output and defuzzifier is the defuzzification operator.

8.4 DEFUZZIFICATION METHODS

The output of the inference process so far is a fuzzy set, specifying a possibility distribution of control action.
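The three-step procedure above can be traced numerically on a small discrete universe W. This sketch is not from the text; the triangular membership functions and the universes are invented, with min as the conjunction, min as the implication and max as the aggregation.

```python
# Mamdani-style inference for rules "if x is Ai and y is Bi then z is Ci"
# on a discrete universe W: firing level = min(Ai(x0), Bi(y0)), rule output
# clipped with min, rules aggregated ('also') with pointwise max.

def tri(a, b, c):
    """Triangular membership function with peak at b (illustrative)."""
    def f(t):
        if t < a or t > c:
            return 0.0
        if t == b:
            return 1.0
        return (t - a) / (b - a) if t < b else (c - t) / (c - b)
    return f

def mamdani_inference(rules, x0, y0, W):
    out = []
    for w in W:
        grades = []
        for A, B, C in rules:
            alpha = min(A(x0), B(y0))        # step 1: firing level
            grades.append(min(alpha, C(w)))  # step 2: rule output
        out.append(max(grades))              # step 3: aggregation
    return out

W = [0.0, 0.25, 0.5, 0.75, 1.0]
rules = [(tri(0, 0.5, 1), tri(0, 0.5, 1), tri(0, 0.25, 0.5)),
         (tri(0.5, 1, 1.5), tri(0.5, 1, 1.5), tri(0.5, 0.75, 1.0))]
C = mamdani_inference(rules, 0.4, 0.6, W)  # fuzzy output over W
```

The result C is still a fuzzy set over W; a defuzzification step, discussed next, turns it into a single crisp value.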

Defuzzification is a process to select a representative element from the fuzzy output C inferred from the fuzzy control algorithm. The most often used defuzzification operators are the following.

8.4.1 Center-of-Area/Gravity

The defuzzified value of a fuzzy set C is defined as its fuzzy centroid:

z0 = ∫W z C(z) dz / ∫W C(z) dz ...(8.19)

The calculation of the Center-of-Area defuzzified value is simplified if we consider a finite universe of discourse W and thus a discrete membership function C(w):

z0 = Σj zj C(zj) / Σj C(zj) ...(8.20)

8.4.2 First-of-Maxima

The defuzzified value of a fuzzy set C is its smallest maximizing element, i.e.

z0 = min {z | C(z) = max_w C(w)} ...(8.21)

Fig. 8.5 First-of-maxima defuzzification method.

8.4.3 Middle-of-Maxima

The defuzzified value of a discrete fuzzy set C is defined as the mean of all values of the universe of discourse having maximal membership grades:

z0 = (1/N) Σ_{j=1}^{N} zj ...(8.22)
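On a discrete universe, eq. (8.20) and the First-of-Maxima rule (8.21) each reduce to a few lines of code. The sample fuzzy set below is invented for illustration and is not from the text.

```python
# Discrete Center-of-Area (8.20): z0 = sum(zj*C(zj)) / sum(C(zj)),
# and First-of-Maxima (8.21): smallest element with maximal membership.

def center_of_area(zs, grades):
    den = sum(grades)
    if den == 0:
        raise ValueError("cannot defuzzify an empty fuzzy set")
    return sum(z * g for z, g in zip(zs, grades)) / den

def first_of_maxima(zs, grades):
    peak = max(grades)
    return min(z for z, g in zip(zs, grades) if g == peak)

zs = [0, 10, 20, 30, 40]
C = [0.0, 0.5, 1.0, 1.0, 0.0]  # plateau of maxima at 20 and 30
z_coa = center_of_area(zs, C)   # (5 + 20 + 30) / 2.5 = 22.0
z_fom = first_of_maxima(zs, C)  # 20
```

Note how the two operators disagree on the plateau: the centroid is pulled toward the weighted mass of the whole set, while First-of-Maxima commits to the left edge of the maximizing region.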

where {z1, ..., zN} is the set of elements of the universe W which attain the maximum value of C. If C is not discrete, then the defuzzified value of the fuzzy set C is defined as

z0 = ∫G z dz / ∫G dz ...(8.23)

where G denotes the set of maximizing elements of C.

Fig. 8.6 Middle-of-maxima defuzzification method.

8.4.4 Max-Criterion

This method chooses an arbitrary value from the set of maximizing elements of C, i.e.

z0 ∈ {z | C(z) = max_w C(w)} ...(8.24)

8.4.5 Height Defuzzification

The elements of the universe of discourse W that have membership grades lower than a certain level α are completely discounted, and the defuzzified value z0 is calculated by applying the Center-of-Area method to those elements of W whose membership grades are not less than α:

z0 = ∫_[C]α z C(z) dz / ∫_[C]α C(z) dz ...(8.25)

where [C]α denotes the α-level set of C as usual.

Example 8.2: Consider a fuzzy controller steering a car in a way to avoid obstacles. If an obstacle occurs right ahead, the plausible control action depicted in Figure 8.7 could be interpreted as "turn right or left". Both the Center-of-Area and Middle-of-Maxima defuzzification methods result in the control action "drive ahead straightforward", which causes an accident.
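Middle-of-Maxima and height defuzzification follow the same pattern on a discrete universe. The bimodal set below, with invented numbers, also reproduces the failure of Example 8.2: averaging the two maxima yields the undesired "drive straight ahead" value between them.

```python
# Middle-of-Maxima (8.22): mean of the maximizing elements.
# Height defuzzification (8.25): Center-of-Area restricted to grades >= alpha.

def middle_of_maxima(zs, grades):
    peak = max(grades)
    maximizers = [z for z, g in zip(zs, grades) if g == peak]
    return sum(maximizers) / len(maximizers)

def height_defuzz(zs, grades, alpha):
    kept = [(z, g) for z, g in zip(zs, grades) if g >= alpha]  # [C]_alpha
    return sum(z * g for z, g in kept) / sum(g for _, g in kept)

# Bimodal "turn left or turn right" set over a steering angle in degrees:
zs = [-20, -10, 0, 10, 20]
C = [1.0, 0.2, 0.0, 0.2, 1.0]
z_mom = middle_of_maxima(zs, C)   # 0.0 -- "drive straight ahead"!
z_h = height_defuzz(zs, C, 0.5)   # also 0.0 by symmetry
```

Both averaging methods land exactly between the two plausible actions, which is why a suitable method for this case must first commit to one of the modes.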

Fig. 8.7 Undesired result by Center-of-Area and Middle-of-Maxima defuzzification methods.

A suitable defuzzification method would have to choose between the different control actions (choose one of the two triangles in the figure) and then transform the chosen fuzzy set into a crisp value.

8.5 EFFECTIVITY OF FUZZY LOGIC CONTROL SYSTEMS

Using the Stone-Weierstrass theorem, Wang (1992) showed that fuzzy logic control systems of the form

Ri: if x is Ai and y is Bi then z is Ci, i = 1, ..., n

with

Gaussian membership functions

Ai(u) = exp [-(1/2) ((u - ai1)/bi1)²] ...(8.26)
Bi(v) = exp [-(1/2) ((v - ai2)/bi2)²] ...(8.27)
Ci(w) = exp [-(1/2) ((w - ai3)/bi3)²] ...(8.28)

Singleton fuzzifier

fuzzifier (x) := x̄, fuzzifier (y) := ȳ

Product fuzzy conjunction

[Ai(u) and Bi(v)] = Ai(u) Bi(v)

Product fuzzy implication (Larsen implication)

[Ai(u) and Bi(v)] -> Ci(w) = Ai(u) Bi(v) Ci(w) ...(8.29)

Centroid defuzzification method

z = Σ_{i=1}^{n} ai3 Ai(x) Bi(y) / Σ_{i=1}^{n} Ai(x) Bi(y) ...(8.30)

where ai3 is the center of Ci, are universal approximators, i.e. they can approximate any continuous function on a compact set to arbitrary accuracy. Namely, he proved the following theorem.

Theorem 8.1: For a given real-valued continuous function g on the compact set U and arbitrary ε > 0, there exists a fuzzy logic control system with output function f such that

sup_{x ∈ U} || g(x) - f(x) || ≤ ε ...(8.31)

Castro (1995) showed that Mamdani's fuzzy logic controllers

Ri: if x is Ai and y is Bi then z is Ci, i = 1, ..., n

with

Symmetric triangular membership functions

Ai(u) = 1 - |ai - u|/αi if |ai - u| ≤ αi, and 0 otherwise ...(8.32)
Bi(v) = 1 - |bi - v|/βi if |bi - v| ≤ βi, and 0 otherwise ...(8.33)
Ci(w) = 1 - |ci - w|/γi if |ci - w| ≤ γi, and 0 otherwise ...(8.34)

Singleton fuzzifier

fuzzifier (x0) := x̄0

Minimum norm fuzzy conjunction

[Ai(u) and Bi(v)] = min {Ai(u), Bi(v)} ...(8.35)

Minimum norm fuzzy implication

[Ai(u) and Bi(v)] -> Ci(w) = min {Ai(u), Bi(v), Ci(w)} ...(8.36)

Maximum t-conorm rule aggregation

Agg (R1, R2, ..., Rn) = max {R1, R2, ..., Rn}

Centroid defuzzification method

z = Σ_{i=1}^{n} ci min {Ai(x), Bi(y)} / Σ_{i=1}^{n} min {Ai(x), Bi(y)} ...(8.37)

where ci is the center of Ci, are also universal approximators.

QUESTION BANK

1. What is a fuzzy logic controller? Explain a two-input-single-output fuzzy system.
2. What are the various parts of a fuzzy logic control system? Explain them.
3. Explain the Mamdani type of fuzzy logic controller.
4. What are the various defuzzification methods? Explain them.
5. What is the effectivity of fuzzy logic control systems?

REFERENCES

1. L.A. Zadeh, A rationale for fuzzy control, Journal of Dynamic Systems, Measurement and Control, Vol. 94, pp. 3-4, 1972.
2. E.H. Mamdani and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, International Journal of Man-Machine Studies, Vol. 7, No. 1, pp. 1-13, 1975.
3. E.H. Mamdani, Advances in the linguistic synthesis of fuzzy controllers, International Journal of Man-Machine Studies, Vol. 8, pp. 669-678, 1976.
4. P.J. King and E.H. Mamdani, The application of fuzzy control systems to industrial processes, Automatica, Vol. 13, No. 3, pp. 235-242, 1977.
5. W.J.M. Kickert and E.H. Mamdani, Analysis of a fuzzy logic controller, Fuzzy Sets and Systems, Vol. 1, No. 1, pp. 29-44, 1978.
6. M. Braae and D.A. Rutherford, Selection of parameters for a fuzzy logic controller, Fuzzy Sets and Systems, Vol. 2, No. 3, pp. 185-199, 1979.
7. E. Czogala and W. Pedrycz, Control problems in fuzzy systems, Fuzzy Sets and Systems, Vol. 7, No. 3, pp. 257-274, 1982.
8. E. Czogala and W. Pedrycz, Fuzzy rule generation for fuzzy control, Cybernetics and Systems, Vol. 13, No. 3, pp. 275-293, 1982.
9. C.C. Lee, Fuzzy logic in control systems: fuzzy logic controller, Parts I and II, IEEE Transactions on Systems, Man and Cybernetics, Vol. 20, No. 2, pp. 404-435, 1990.

10. M. Sugeno, An introductory survey of fuzzy control, Information Sciences, Vol. 36, pp. 59-83, 1985.
11. T. Takagi and M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Transactions on Systems, Man and Cybernetics, Vol. 15, No. 1, pp. 116-132, 1985.
12. K.S. Ray and D. Dutta Majumdar, Application of circle criteria for stability analysis of linear SISO and MIMO systems associated with fuzzy logic controller, IEEE Transactions on Systems, Man and Cybernetics, Vol. 14, No. 2, pp. 345-349, 1984.
13. J.B. Kiszka, M.M. Gupta and G.M. Trojan, Multivariable structure of fuzzy control systems, IEEE Transactions on Systems, Man and Cybernetics, Vol. 16, No. 5, pp. 638-656, 1986.
14. B.P. Graham and R.B. Newell, Fuzzy identification and control of a liquid level rig, Fuzzy Sets and Systems, Vol. 26, No. 3, pp. 255-273, 1988.
15. J.A. Bernard, Use of a rule-based system for process control, IEEE Control Systems Magazine, Vol. 8, No. 5, pp. 3-13, 1988.
16. J.F. Baldwin and N.C.F. Guild, Modeling controllers using fuzzy relations, Kybernetes, Vol. 9, No. 3, pp. 223-229, 1980.
17. J.J. Buckley, Fuzzy v/s non-fuzzy controllers, Control and Cybernetics, Vol. 18, No. 2, pp. 127-130, 1989.
18. X.T. Peng, Generating rules for fuzzy logic controllers by functions, Fuzzy Sets and Systems, Vol. 36, No. 1, pp. 83-89, 1990.
19. G. Abdelnour, C.H. Chang, F.H. Huang and J.Y. Cheung, Design of a fuzzy controller using input and output mapping factors, IEEE Transactions on Systems, Man and Cybernetics, Vol. 21, No. 5, pp. 952-960, 1991.
20. F. Boullama and A. Ichikawa, Fuzzy control rules and their natural control laws, Fuzzy Sets and Systems, Vol. 48, No. 1, pp. 65-86, 1992.
21. R.R. Yager, A general approach to rule aggregation in fuzzy logic control, Applied Intelligence, Vol. 2, No. 4, pp. 335-351, 1992.
22. K. Tanaka and M. Sugeno, Stability analysis and design of fuzzy control systems, Fuzzy Sets and Systems, Vol. 45, No. 2, pp. 135-156, 1992.
23. J.J. Buckley, Theory of the fuzzy controller: an introduction, Fuzzy Sets and Systems, Vol. 51, No. 3, pp. 249-258, 1992.
24. J.Q. Chen and L.J. Chen, Study on stability of fuzzy closed-loop control systems, Fuzzy Sets and Systems, Vol. 57, No. 2, pp. 159-168, 1993.
25. Z.Q. Cao and A. Kandel, Studies on the output of fuzzy controller with multiple inputs, Fuzzy Sets and Systems, Vol. 20, pp. 99-111.
26. J. Ragot and M. Lamotte, Fuzzy logic control, International Journal of Systems Science, Vol. 24, No. 10, pp. 1825-1848, 1993.
27. B.M. Chung and J.H. Oh, Control of dynamic systems using fuzzy learning algorithm, Fuzzy Sets and Systems, Vol. 59, No. 1, pp. 1-14, 1993.
28. Chou, Chung and Mon, Fuzzy inference and its applicability to control systems, Fuzzy Sets and Systems.

29. J.Y. Han and V. McMurray, Two-layer multiple-variable fuzzy logic controller, IEEE Transactions on Systems, Man and Cybernetics, Vol. 23, No. 1, pp. 277-285, 1993.
30. N. Kiupel and P.M. Frank, Fuzzy control of steam turbines, International Journal of Systems Science, Vol. 25, No. 10, pp. 1905-1914, 1994.
31. C. von Altrock, H.O. Arend, B. Krause, C. Steffess and E. Behrens-Rommler, Adaptive fuzzy control applied to home heating system, Fuzzy Sets and Systems, Vol. 61, No. 1, pp. 29-36, 1994.
32. D.P. Filev and R.R. Yager, Three models of fuzzy logic controllers, Cybernetics and Systems, Vol. 24, No. 2, pp. 91-114, 1993.
33. R.R. Yager and D.P. Filev, Essentials of Fuzzy Modeling and Control, John Wiley, New York, 1994.
34. A. Bugarin, S. Barro and R. Ruiz, Fuzzy control architectures, Journal of Intelligent and Fuzzy Systems, Vol. 2, pp. 125-146, 1994.
35. W. Pedrycz, Fuzzy controllers: principles and architectures, Asia-Pacific Engineering Journal, Vol. 3, pp. 1-32, 1993.

CHAPTER 9

Fuzzy Logic Applications

9.1 WHY USE FUZZY LOGIC?

Here is a list of general observations about fuzzy logic:

1. Fuzzy logic is conceptually easy to understand. The mathematical concepts behind fuzzy reasoning are very simple. What makes fuzzy nice is the "naturalness" of its approach and not its far-reaching complexity.
2. Fuzzy logic is flexible. With any given system, it's easy to massage it or layer more functionality on top of it without starting again from scratch.
3. Fuzzy logic is tolerant of imprecise data. Everything is imprecise if you look closely enough, but more than that, most things are imprecise even on careful inspection. Fuzzy reasoning builds this understanding into the process rather than tacking it onto the end.
4. Fuzzy logic can model nonlinear functions of arbitrary complexity. You can create a fuzzy system to match any set of input-output data. This process is made particularly easy by adaptive techniques like ANFIS (Adaptive Neuro-Fuzzy Inference Systems), which are available in the Fuzzy Logic Toolbox.
5. Fuzzy logic can be built on top of the experience of experts. In direct contrast to neural networks, which take training data and generate opaque, impenetrable models, fuzzy logic lets you rely on the experience of people who already understand your system.
6. Fuzzy logic can be blended with conventional control techniques. Fuzzy systems don't necessarily replace conventional control methods. In many cases fuzzy systems augment them and simplify their implementation.
7. Fuzzy logic is based on natural language. The basis for fuzzy logic is the basis for human communication. This observation underpins many of the other statements about fuzzy logic.

The last statement is perhaps the most important one and deserves more discussion. Natural language, that which is used by ordinary people on a daily basis, has been shaped by thousands of years of human history to be convenient and efficient. Sentences written in ordinary language represent a triumph of efficient communication. We are generally unaware of this because ordinary language is, of course, something we use every day.

9.2 APPLICATIONS OF FUZZY LOGIC

Fuzzy logic deals with uncertainty in engineering by attaching degrees of certainty to the answer to a logical question. Why should this be useful? The answer is commercial and practical. Commercially, fuzzy logic has been used with great success to control machines and consumer products. In the right application, fuzzy logic systems are simple to design and can be understood and implemented by nonspecialists in control theory. In most cases someone with an intermediate technical background can design a fuzzy logic controller. The control system will not be optimal, but it can be acceptable. Control engineers also use it in applications where the on-board computing is very limited and adequate control is enough. Fuzzy logic is not the answer to all technical problems, but for control problems where simplicity and speed of implementation are important, fuzzy logic is a strong candidate. A cross section of applications that have successfully used fuzzy control includes:

1. Environmental
   • Air Conditioners
   • Humidifiers
2. Domestic Goods
   • Washing Machines/Dryers
   • Vacuum Cleaners
   • Toasters
   • Microwave Ovens
   • Refrigerators
3. Consumer Electronics
   • Television
   • Photocopiers
   • Still and Video Cameras (Auto-focus, Exposure and Anti-shake)
   • Hi-Fi Systems
4. Automotive Systems
   • Vehicle Climate Control
   • Automatic Gearboxes
   • Four-wheel Steering
   • Seat/Mirror Control Systems

9.3 WHEN NOT TO USE FUZZY LOGIC?

Fuzzy logic is not a cure-all. When should you not use fuzzy logic? Fuzzy logic is a convenient way to map an input space to an output space. If you find it is not convenient, try something else. If a simpler solution already exists, use it. Many controllers, for example, do a fine job without using fuzzy logic. However, if you take the time to become familiar with fuzzy logic, you will see it can be a very powerful tool for dealing quickly and efficiently with imprecision and nonlinearity. Fuzzy logic is the codification of common sense: use common sense when you implement it and you will probably make the right decision.

9.4 FUZZY LOGIC MODEL FOR PREVENTION OF ROAD ACCIDENTS

Traffic accidents are rare and random. When statistics are investigated, India is the most dangerous country in terms of the number of traffic accidents among Asian countries. Many people die or are injured because of traffic accidents all over the world. The cost of traffic accidents is roughly 3% of the gross national product. Many researchers agree that this rate is even higher in India, since many traffic accidents, for example single vehicle accidents or some accidents without injury or fatality, are not recorded. Many reasons can contribute to these results, mainly driver fault, but also lack of infrastructure, literacy, environment, weather conditions, etc.

9.4.1 Traffic Accidents and Traffic Safety

The general goal of traffic safety policy is to eliminate the number of deaths and casualties in traffic. This goal forms the background for the present traffic safety program. The program is partly based on the assumption that high speed contributes to accidents, and many researchers support the idea of a positive correlation between speed and traffic accidents. One way to reduce the number of accidents is therefore to reduce average speeds. Speed reduction can be accomplished by police surveillance, but also through physical obstacles on the roads. Obstacles such as flower pots, road humps, small circulation points and elevated pedestrian crossings are frequently found in many residential areas around India. However, physical measures are not always appreciated by drivers: these obstacles can cause damage to cars, they can cause difficulties for emergency vehicles, and in winter they can reduce access for snow clearing vehicles. One important aspect when planning and implementing traffic safety programs is therefore drivers' acceptance of different safety measures aimed at speed reduction. Another aspect is whether the individual's acceptance, when there is a certain degree of freedom of choice, might also be reflected in a higher acceptance of other measures, and whether acceptance of safety measures is also reflected in their perception of road traffic.

An alternative to these physical measures is different applications of Intelligent Transportation Systems (ITS). The major objectives of ITS are to achieve traffic efficiency, for instance by redirecting traffic, and to increase safety for drivers, pedestrians, cyclists and other traffic groups; ITS might also reduce dangerous behaviour in traffic. In this study, a model was developed using the fuzzy logic method, which has an increasing usage area in ITS, to adjust the vehicle pursuit distance automatically. Using the velocity of the vehicle and the pursuit distance, which can be measured with a sensor on the vehicle, a model has been established for the brake pedal (slowing down) by fuzzy logic.

9.4.2 Fuzzy Logic Approach

The basic elements of each fuzzy logic system are rules, fuzzifier, inference engine, and defuzzifier, as shown in Figure 9.1. Input data are most often crisp values. The task of the fuzzifier is to map crisp numbers into fuzzy sets (cases are also encountered where inputs are fuzzy variables described by fuzzy membership functions).

Fig. 9.1 Basic elements of a fuzzy logic system: Input -> Fuzzifier -> (Rules, Inference) -> Defuzzifier -> Crisp output.

Models based on fuzzy logic consist of "If-Then" rules. A typical "If-Then" rule would be:

If the ratio between the flow intensity and capacity of an arterial road is SMALL, Then vehicle speed in the flow is BIG.

The fact following "If" is called a premise, hypothesis or antecedent. Based on this fact we can infer another fact, called a conclusion or consequent (the fact following "Then"). A set of a large number of rules of the type "If premise Then conclusion" is called a fuzzy rule base. In fuzzy rule-based systems, the rule base is formed with the assistance of human experts. Recently, numerical data has been used as well, through a combination of numerical data and human experts: rules are extracted from numerical data in the first step, and in the next step this fuzzy rule base can (but need not) be supplemented with the rules collected from human experts. An interesting case appears when a combination of numerical information obtained from measurements and linguistic information obtained from human experts is used to form the fuzzy rule base.

The inference engine of the fuzzy logic system maps fuzzy sets onto fuzzy sets. A large number of different inferential procedures are found in the literature; in most papers and practical engineering applications, minimum inference or product inference is used. During defuzzification, one value is chosen for the output variable. The literature also contains a large number of different defuzzification procedures; the final value chosen is most often either the value corresponding to the highest grade of membership or the coordinate of the center of gravity.

9.4.3 Application

In the study, a model was established which estimates brake rate using fuzzy logic. The general structure of the model is shown in Figure 9.2.
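The premise/conclusion structure of a fuzzy rule base can be represented directly as data. The sketch below is illustrative only (the variable names and rules are invented, and matching is crisp; a fuzzy system would instead fire each rule to a degree):

```python
# A fuzzy rule base as data: each rule is (premise, conclusion), where a
# premise maps a linguistic variable to a term. Rules and names invented.

RULE_BASE = [
    ({"flow_capacity_ratio": "SMALL"}, {"vehicle_speed": "BIG"}),
    ({"flow_capacity_ratio": "BIG"}, {"vehicle_speed": "SMALL"}),
]

def matching_conclusions(facts, rule_base):
    """Return the conclusion of every rule whose premise matches the facts
    (crisp matching for illustration only)."""
    out = []
    for premise, conclusion in rule_base:
        if all(facts.get(var) == term for var, term in premise.items()):
            out.append(conclusion)
    return out

hits = matching_conclusions({"flow_capacity_ratio": "SMALL"}, RULE_BASE)
# -> [{"vehicle_speed": "BIG"}]
```

Keeping the rules as data rather than hard-coded branches is what allows a rule base to be extended with expert rules, as described above, without changing the inference code.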

Fig. 9.2 General structure of the fuzzy logic model: Speed and Distance -> Rule base -> Brake rate.

9.4.4 Membership Functions

In the established model, different membership functions (Low, Medium, High) were formed for speed, distance and brake rate; they are given in Figures 9.3, 9.4 and 9.5. For the maximum allowable car speed on motorways in India, the speed scale was selected as 0-120 km/h on its membership function. Because current distance sensors perceive approximately 100-150 m, the distance membership function uses a 0-150 m scale.

Fig. 9.3 Membership function of speed (Low, Medium, High over 0-120 km/h).

Fig. 9.4 Membership function of distance (Low, Medium, High over 0-150 m).

The brake rate membership function uses a 0-100 scale, expressing a percentage.

Fig. 9.5 Membership function of brake rate (Low, Medium, High over 0-100%).

9.4.5 Rule Base

We need a rule base to run the fuzzy model. The Fuzzy Allocation Map (rules) of the model was constituted for the membership functions given above and is shown in Table 9.1. It is important to note that the rules were not completely written for all probabilities.

Table 9.1 Fuzzy allocation map of the model

Speed    | Distance | Brake rate
LOW      | LOW      | LOW
LOW      | MEDIUM   | LOW
LOW      | HIGH     | MEDIUM
MEDIUM   | LOW      | MEDIUM
MEDIUM   | MEDIUM   | LOW
MEDIUM   | HIGH     | LOW
HIGH     | LOW      | HIGH
HIGH     | MEDIUM   | MEDIUM
HIGH     | HIGH     | LOW

9.4.6 Output

Fuzzy logic is also an estimation algorithm. The car brake rate is estimated using the developed model from speed and distance data. Figure 9.6 shows the relationship between the inputs, speed and distance, and the brake rate.

Fig. 9.6 Relationship between inputs and brake rate.

9.4.7 Conclusions

Many people die or are injured because of traffic accidents in India. Many reasons can contribute to these results, mainly driver fault, but also lack of infrastructure, environment, weather conditions, etc. In this study, a model was established for estimation of the brake rate using a fuzzy logic approach. This model can be adapted to vehicles, so various alternatives can be cross-examined using the developed model. It can be said that this fuzzy logic approach can be effectively used to reduce the traffic accident rate.
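The Table 9.1 model can be sketched end to end with Mamdani min-max inference and discrete centroid defuzzification. Note that the triangular set breakpoints below only approximate the scales of Figures 9.3-9.5; the exact shapes are not reproduced here, so the breakpoints are assumptions.

```python
# Sketch of the Table 9.1 brake-rate model (assumed triangular sets).

def tri(a, b, c):
    """Triangular membership function with peak at b."""
    def f(t):
        if t < a or t > c:
            return 0.0
        if t == b:
            return 1.0
        return (t - a) / (b - a) if t < b else (c - t) / (c - b)
    return f

SPEED = {'LOW': tri(0, 0, 60), 'MEDIUM': tri(0, 60, 120), 'HIGH': tri(60, 120, 120)}
DIST = {'LOW': tri(0, 0, 75), 'MEDIUM': tri(0, 75, 150), 'HIGH': tri(75, 150, 150)}
BRAKE = {'LOW': tri(0, 0, 50), 'MEDIUM': tri(0, 50, 100), 'HIGH': tri(50, 100, 100)}

RULES = [  # (speed, distance) -> brake rate, the rows of Table 9.1
    ('LOW', 'LOW', 'LOW'), ('LOW', 'MEDIUM', 'LOW'), ('LOW', 'HIGH', 'MEDIUM'),
    ('MEDIUM', 'LOW', 'MEDIUM'), ('MEDIUM', 'MEDIUM', 'LOW'), ('MEDIUM', 'HIGH', 'LOW'),
    ('HIGH', 'LOW', 'HIGH'), ('HIGH', 'MEDIUM', 'MEDIUM'), ('HIGH', 'HIGH', 'LOW'),
]

def brake_rate(speed_kmh, distance_m):
    zs = range(0, 101, 5)                  # discrete brake-rate universe (%)
    agg = [0.0] * len(zs)
    for s, d, b in RULES:
        alpha = min(SPEED[s](speed_kmh), DIST[d](distance_m))  # firing level
        for i, z in enumerate(zs):
            agg[i] = max(agg[i], min(alpha, BRAKE[b](z)))      # clip + 'also'
    den = sum(agg)
    return sum(z * m for z, m in zip(zs, agg)) / den if den else 0.0
```

With these assumed sets, a fast approach to a near obstacle (e.g. 100 km/h at 20 m) yields a high brake rate, while a slow car far from the obstacle (10 km/h at 140 m) yields a moderate one, matching the qualitative behaviour of Figure 9.6.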

9.5 FUZZY LOGIC MODEL TO CONTROL ROOM TEMPERATURE

Although the behaviour of complex or nonlinear systems is difficult or impossible to describe using numerical models, quantitative observations are often required to make quantitative control decisions. These decisions could be the determination of a flow rate for a chemical process or a drug dosage in medical practice. Numerical models provide high precision, but the complexity or non-linearity of a process may make a numerical model unfeasible. In these cases, linguistic models provide an alternative. The form of the control model also determines the appropriate level of precision in the result obtained. Although Zadeh was attempting to model human activities, Mamdani showed that fuzzy logic could be used to develop operational automatic control systems. Although the controllers are simple to construct, the proof of stability and other validations remain important topics. Much of the fuzzy literature uses set theory notation, which obscures the ease of the formulation of a fuzzy controller. Here the process is described in common language, and the outline of fuzzy operations is shown through the design of a familiar room thermostat.

9.5.1 The Mechanics of Fuzzy Logic

The mechanics of fuzzy mathematics involve the manipulation of fuzzy variables through a set of linguistic equations, which can take the form of if-then rules; the linguistic model is built from a set of such rules. A fuzzy variable is one of the parameters of a fuzzy model, which can take one or more fuzzy values, each represented by a fuzzy set and a word descriptor. The room temperature is the variable shown in Fig. 9.7. Three fuzzy sets, 'hot', 'cold' and 'comfortable', have been defined by membership distributions over a range of actual temperatures. The power of a fuzzy model is the overlap between the fuzzy values: a single temperature value at an instant in time can be a member of both of the overlapping sets. In conventional set theory, an object (in this case a temperature value) is either a member of a set or it is not a member. This implies a crisp

boundary between the sets. In fuzzy logic, the boundaries between sets are blurred, and in the overlap region an object can be a partial member of each of the overlapping sets. The blurred set boundaries give fuzzy logic its name.

Fig. 9.7 Room temperature: 'cold', 'comfortable' and 'hot' fuzzy sets over 0-50°C; at 15°C the memberships are 0.67 'cold' and 0.33 'comfortable'.

The membership functions defining the three fuzzy sets shown in Fig. 9.7 are triangular. There are no constraints on the specification of the form of the membership distribution. The Gaussian form from statistics has been used, but the triangular form is commonly chosen, as its computation is simple. The number of values and the range of actual values covered by each one are also arbitrary. Finer resolution is possible with additional sets, but the computation cost increases. Guidance for these choices is provided by Zadeh's Principle of Incompatibility: as the complexity of a system increases, our ability to make precise and yet significant statements about its behaviour diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive characteristics.

The operation of a fuzzy controller proceeds in three steps. The first is fuzzification, where measurements are converted into memberships in the fuzzy sets. The second step is the application of the linguistic model, usually in the form of if-then rules. Finally, the resulting fuzzy output is converted back into physical values through a defuzzification process.

9.5.2 Fuzzification

For a single measured value, the fuzzification process is simple. The membership functions are used to calculate the memberships in all of the fuzzy sets. Thus, a temperature of 15°C becomes three fuzzy values: 0.00 'hot', 0.33 'comfortable' and 0.66 'cold', as shown in Fig. 9.7. By admitting multiple possibilities in the model, the linguistic imprecision is taken into account.
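The 15°C example can be reproduced with piecewise-linear membership functions. The breakpoints below are assumptions chosen so that the computed memberships match the values quoted in the text (about 0.67 'cold', 0.33 'comfortable' and 0.00 'hot' at 15°C); they are not read exactly from Fig. 9.7.

```python
# Piecewise-linear memberships for the thermostat (breakpoints assumed).

def cold(t):         # full membership at low temperatures, zero above 35 C
    return max(0.0, min(1.0, (35.0 - t) / 30.0))

def comfortable(t):  # triangle rising from 10 C, peaking at 25 C, gone by 40 C
    return max(0.0, min((t - 10.0) / 15.0, (40.0 - t) / 15.0))

def hot(t):          # rises from 30 C, full membership above 45 C
    return max(0.0, min(1.0, (t - 30.0) / 15.0))

t = 15.0
memberships = {"cold": cold(t), "comfortable": comfortable(t), "hot": hot(t)}
# cold(15) = 2/3, comfortable(15) = 1/3, hot(15) = 0
```

A single crisp reading thus becomes a vector of partial memberships, which is exactly what the rule-application step consumes.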

A series of measurements can also be collected in the form of a histogram and used as the fuzzy input, as shown in Fig. 9.8. The measurement data histogram is normalized so that its peak is a membership value of 1.0, and it can then be used as a fuzzy set. The fuzzy inference is thereby extended to include the uncertainty due to measurement error as well as the vagueness in the linguistic descriptions. The membership of the histogram in 'cold' is given by:

max {min [μcold(T), μhistogram(T)]}

where the maximum and minimum operations are taken using the membership values at each point T over the temperature range of the two distributions. The minimum operation yields the overlap region of the two sets and the maximum operation gives the highest membership in the overlap. The membership of the histogram in 'cold', indicated by the arrow in Fig. 9.8, is 0.73. By similar operations, the memberships of the histogram in 'comfortable' and 'hot' are 0.40 and 0.00. It is interesting to note that there is no requirement that the sum of all memberships be 1.00.

Fig. 9.8 Fuzzification with measurement noise.

9.5.3 Rule Application

The linguistic model of a process is commonly made of a series of if-then rules. These use the measured state of the process, the rule antecedents, to estimate the extent of control action, the rule consequents. Although each rule is simple, there must be a rule to cover every possible combination of fuzzy input values. For complex systems the number of rules required may be very large; thus, the simplicity of the rules trades off against the number of rules. The rules needed to describe a process are often obtained through consultation with workers who have expert knowledge of the process operation. These experts include the process designers and, more importantly, the process operators. The rules can include both the normal operation of the process as well as the experience obtained through upsets and other abnormal conditions. Exception handling is a particular strength of fuzzy control systems. For very complex systems, the experts may not be able to identify their thought processes in sufficient detail for rule creation. Rules may also be generated from operating data by searching for clusters in the input data space.

A simple temperature control model can be constructed from the example of Fig. 9.7:

Rule 1: IF (Temperature is Cold) THEN (Heater is On)
Rule 2: IF (Temperature is Comfortable) THEN (Heater is Off)
Rule 3: IF (Temperature is Hot) THEN (Heater is Off)

In Rule 1, (Temperature is Cold) is the membership value of the actual temperature in the 'cold' set. Rule 1 transfers the 0.66 membership in 'cold' to become 0.66 membership in the heater setting 'on'. Similar values from rules 2 and 3 are 0.33 and 0.00 membership in 'off'. When several rules give membership values for the same output set, Mamdani used the maximum of the membership values. The result for the three rules is then 0.66 membership in 'on' and 0.33 membership in 'off'.

To extend these to more complex control models, compound rules may be formulated. For example, if humidity was to be included in the room temperature control example, rules of the form

IF (Temperature is Cold) AND (Humidity is High) THEN (Heater is ON)

might be used. In the above rule, the membership in 'on' will be the minimum of the two antecedent membership values. Zadeh defined the logical operators as AND = Min (μA, μB) and OR = Max (μA, μB), where μA and μB are membership values in sets A and B respectively. Zadeh also defined the NOT operator by assuming that complete membership in the set A is given by μA = 1; the membership in NOT (A) is then given by μNOT(A) = 1 - μA. This gives the interesting result that A AND NOT (A) does not vanish, but gives a distribution corresponding to the overlap between A and its adjacent sets.

9.5.4 Defuzzification

The results of rule application are membership values in each of the consequent or output sets. These can be used directly where the membership values are viewed as the strength of the recommendations provided by the rules. In decision support systems, it is possible that several outputs are recommended and some may be contradictory (e.g. heater on and heater off). In automatic control, however, one physical value of a controller output must be chosen from the multiple recommendations. Defuzzification is the process for converting fuzzy output values to a single value or final decision, and there must be a consistent method to resolve conflict and define an appropriate compromise.

Two methods are commonly used. The first is the maximum membership method: all of the output membership functions are combined using the OR operator, and the position of the highest membership value in the range of the output variable is used as the controller output. This method fails when there are two or more equal maximum membership values for different recommendations; here the method becomes indecisive and does not produce a satisfactory result.
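The three thermostat rules and Zadeh's operators can be traced numerically. The membership values are the ones quoted in the text; the humidity membership in the compound-rule line is invented for illustration.

```python
# Zadeh's operators and the three thermostat rules at 15 deg C.
AND = min
OR = max
def NOT(m): return 1.0 - m

cold, comfortable, hot = 0.66, 0.33, 0.00  # memberships from the text

m_on = cold                   # Rule 1: IF cold THEN heater on
m_off = OR(comfortable, hot)  # Rules 2 and 3 both recommend 'off';
                              # same-output rules combine with max

humid = 0.8                       # invented humidity membership
m_on_compound = AND(cold, humid)  # IF cold AND humid THEN on -> min
```

The output of rule application is thus just a membership value per output set (here 0.66 'on' and 0.33 'off'), which the defuzzification step must reduce to one physical heater power.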

104 FUZZY LOGIC AND NEURAL NETWORKS

The second method uses the center of gravity of the combined output distribution to resolve this potential conflict and to consider all recommendations based on the strengths of their membership values. The center of gravity is given by

XF = ∫ x μ(x) dx / ∫ μ(x) dx

where x is a point in the output range and XF is the final control value. These integrals are taken over the entire range of the output. By taking the center of gravity, conflicting rules essentially cancel and a fair weighting is obtained.

The output values used in the thermostat example are singletons. In the example there were two: 'off' at 0% power and 'on' at 100% power. Singletons are fuzzy values with a membership of 1.00 at a single value, rather than a membership function between 0 and 1 defined over an interval of values. With singletons, the center of gravity equation integrals become a simple weighted average. The sum of the membership functions was normalized by the denominator of the center of gravity calculation. Applying the rules gave mON = 0.67 and mOFF = 0.33. Defuzzifying these gives a control output of 67% power. In the histogram input case, applying the same rules gave mON = 0.73 and mOFF = 0.40. Center of gravity defuzzification gave, in this case, a heater power of 65%. Although only two singleton output functions were used, the heater power decreases smoothly between fully on and fully off as the temperature increases between 10°C and 25°C.

9.5.5 Conclusions

Linguistic descriptions in the form of membership functions and rules make up the model. Input membership functions are based on estimates of the vagueness of the descriptors used. Output membership functions can be initially set, but can be revised for controller tuning. The rules are generated a priori from expert knowledge or from data through system identification methods. Once these are defined, the operating procedures for the calculations are well set out. Measurement data are converted to memberships through fuzzification procedures. The rules are applied using formalized operations to yield memberships in output sets. Finally, these are combined through defuzzification to give a final control output.

9.6 FUZZY LOGIC MODEL FOR GRADING OF APPLES

Agricultural produce is subject to quality inspection for optimum evaluation in the consumption cycle. Efforts to develop automated fruit classification systems have been increasing recently due to the drawbacks of manual grading such as subjectivity, tediousness, labor requirements, availability, cost and inconsistency. However, applying automation in agriculture is not as simple as automating industrial operations. There are two main differences. First, the agricultural environment is highly variable, in terms of weather, soil, etc. Second, biological materials, such as plants and commodities, display high variation due to their inherent morphological diversity. Techniques used in industrial applications, such as template matching and fixed object modeling, are unlikely to produce satisfactory results in the classification or control of input from agricultural products. Therefore, self-learning techniques such as neural networks (NN) and fuzzy logic (FL) seem to represent a good approach.
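With singleton outputs, the center of gravity integrals collapse to a weighted average, which is easy to check numerically. The following sketch (in Python for illustration; the function name is ours) reproduces the 67% and 65% heater powers from the text:

```python
# Center-of-gravity defuzzification with singleton outputs.
# The integrals reduce to a membership-weighted average of the singleton positions.
def cog_singletons(memberships, singleton_positions):
    num = sum(m * x for m, x in zip(memberships, singleton_positions))
    den = sum(memberships)          # the normalizing denominator from the text
    return num / den

# 'on' singleton at 100% power, 'off' singleton at 0% power.
power_a = cog_singletons([0.67, 0.33], [100.0, 0.0])   # 67% power
power_b = cog_singletons([0.73, 0.40], [100.0, 0.0])   # about 64.6%, i.e. 65% power
```

The second call shows why the denominator matters: the memberships 0.73 and 0.40 do not sum to 1, and without the normalization the weighted sum alone would not be a valid power level.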

Fuzzy logic can handle uncertainty, ambiguity and vagueness. It provides a means of translating qualitative and imprecise information into quantitative (linguistic) terms, maintaining flexibility in making decisions even on complex biological systems. Fuzzy logic is a nonparametric classification procedure, which can infer with nonlinear relations between input and output categories. Fuzzy logic was successfully used to determine field trafficability, to reduce grain losses from a combine, to manage crop production, to decide the transfer of dairy cows between feeding groups, to predict corn breakage, to predict the yield for precision farming, to control the start-up and shutdown of food extrusion processes, to steer a sprayer automatically, to manage a food supply and to predict peanut maturity.

The main purpose of this study was to investigate the applicability of fuzzy logic to constructing and tuning fuzzy membership functions, and to compare the accuracies of predictions of apple quality by a human expert and the proposed fuzzy logic model. The following objectives were included in this study:

1. To design a FL technique to classify apples according to their external features, developing effective fuzzy membership functions and fuzzy rules for input and output variables based on quality standards and expert expectations.
2. To compare the classification results from the FL approach and from sensory evaluation by a human expert.
3. To establish a multi-sensor measuring system for quality features in the long term, assuming that the same measurements can be done using a sensor fusion system in which measurements of features are collected and controlled automatically.

9.6.1 Apple Defects Used in the Study

No defect formation practices by applying forces on apples were performed. Only defects occurring naturally or forcedly on apple surfaces during the growing season and handling operations were accounted for, in terms of number and size. Scars, leaf roller, bitter pit, russeting, punctures and bruises were among the defects encountered on the surfaces of Golden Delicious apples. In addition to these defects, a size defect (lopsidedness) was also measured by taking the ratio of the maximum height of the apple to the minimum height.

9.6.2 Materials and Methods

Five quality features, color, defect, shape, weight and size, were measured. Readings of these properties were obtained from different measurement apparatuses. Grading of apples was performed in terms of characteristics such as color, defect, shape, weight and size, ignoring their age. Color was measured using a CR-200 Minolta colorimeter in the domain of L, a and b, where L is the lightness factor and a and b are the chromaticity coordinates. Size defects were determined by measuring the maximum and minimum heights of apples using a Mitutoya electronic caliper. Weight was measured using an electronic scale. Maximum circumference measurement was performed using a Cranton circumference measuring device. Sizes of surface defects (natural and bruises) on apples were determined using a special figure template, which consisted of a number of holes of different diameters. Programming for fuzzy membership functions, fuzzy inference, fuzzification and defuzzification was done in Matlab.

The number of apples used was determined based on the availability of apples with quality features of the 3 quality groups (bad, medium and good). A total of 181 Golden Delicious apples were graded first by a human expert and then by the proposed fuzzy logic approach. Eighty of the apples were kept at room temperature for 4 days, while another 80 were kept in a cooler (at about 3°C) for the same period to create color variation on the surfaces of the apples. In addition, 21 of the apples were harvested before the others and kept for 15 days at room temperature for the same purpose of creating a variation in the appearance of the apples to be tested. Along with the measurements of features, the apples were graded by the human expert into the three quality groups. The expert was trained on the external quality criteria for good, medium and bad apple groups defined by USDA standards (USDA, 1976). The USDA standards for apple quality explicitly define the quality criteria, so that it is quite straightforward for an expert to follow up and apply them. Although it was measured at the beginning, firmness was excluded from the evaluation, as it was difficult for the human expert to quantify it nondestructively. Extremely large or small apples were already excluded by the handling personnel.

9.6.3 Application of Fuzzy Logic

Three main operations were applied in the fuzzy logic decision making process: selection of fuzzy inputs and outputs, formation of fuzzy rules, and fuzzy inference. Fuzzy logic techniques were applied to classify apples after measuring the quality features. The grading performance of the fuzzy logic proposed was determined by comparing the classification results from FL and the expert.

To simplify the problem, defects were collected under a single numerical value, "defect", after normalizing each defect component such as bruises, natural defects (such as scars and leaf roller, as total area, normalized), russeting and size defects (lopsidedness):

Defect = 10 × B + 5 × ND + 3 × R + 0.3 × SD ...(9.1)

where B is the amount of bruising, ND is the amount of natural defects, R is the total area of russeting defect (normalized) and SD is the normalized size defect. Similarly, circumference, blush (reddish spots on the cheek of an apple) percentage and weight were combined under "Size" using the same procedure as with "Defect":

Size = 5 × C + 3 × W + 5 × BL ...(9.2)

where C is the circumference of the apple (normalized), W is weight (normalized) and BL is the normalized blush percentage. Coefficients used in the above equations were subjectively selected, depending on the expert's experience, expectations and USDA standards (USDA, 1976). After the combinations of features given in the above equations, input variables were reduced to 3: defect, size and color. The Hue angle (tan⁻¹(b/a)), which was used to represent the color of apples, was shown to be the best representation of human recognition of color.

A trial and error approach was used to develop membership functions. Although triangular and trapezoidal functions were used in establishing membership functions for defects and color (Figs. 9.9 and 9.10), an exponential function with the base of the irrational number e was used to simulate the inclination of the human expert in grading apples in terms of size (Fig. 9.11).
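Equations (9.1) and (9.2) are plain weighted sums of normalized features. A small sketch (in Python for illustration; function and argument names are ours):

```python
# Combining normalized quality features into single 'defect' and 'size' inputs,
# using the subjectively chosen coefficients of equations (9.1) and (9.2).
def defect_score(b, nd, r, sd):
    # b: bruises, nd: natural defects, r: russeting, sd: size defect (all normalized)
    return 10 * b + 5 * nd + 3 * r + 0.3 * sd

def size_score(c, w, bl):
    # c: circumference, w: weight, bl: blush percentage (all normalized)
    return 5 * c + 3 * w + 5 * bl
```

The coefficients encode the expert's priorities: bruising dominates the defect score, while lopsidedness contributes only weakly.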

Size = e^x ...(9.3)

where e is approximately 2.71828 and x is the value of the size feature.

Fig. 9.9 Membership functions for the defect feature (Low, Medium, High).
Fig. 9.10 Membership functions for the color feature (Yellow, Greenish-yellow, Green), defined over Hue values from about 90 to 117.
Fig. 9.11 Membership functions for the size feature (Small, Medium, Big).

9.6.4 Fuzzy Rules

At this stage, human linguistic expressions were involved in fuzzy rules. A fuzzy set is defined by the expression below:

D = {x, μD(x) | x ∈ X}, μD(x): X → [0, 1] ...(9.4)

where X represents the universal set, D is a fuzzy subset in X and μD(x) is the membership function of fuzzy set D. The degree of membership for any set ranges from 0 to 1. A value of 1.0 represents 100% membership, while a value of 0 means 0% membership. Three primary set operations in fuzzy logic are AND, OR, and the Complement, which are given as follows:

AND: mC ∧ mD = min {mC, mD} ...(9.5)
OR: mC ∨ mD = max {mC, mD} ...(9.6)
Complement: = 1 – mD ...(9.7)

Two of the rules used to evaluate the quality of Golden Delicious apples are given below:

If the color is greenish, there is no defect, and it is a well formed large apple, then quality is very good (rule Q1.1 in Table 9.2).
If the color is pure yellow (overripe), there are a lot of defects, and it is a badly formed (small) apple, then quality is very bad (rule Q3.17 in Table 9.2).

The rules used in the evaluations of apple quality are given in Table 9.2 (fuzzy rule tabulation), which assigns a quality class Q to every combination of color (C1, C2, C3), size (S1, S2, S3) and defect (D1, D2, D3) levels; for example, the combination C1 + S1 with D1 gives rule Q1.1, and C3 + S3 with D3 gives rule Q3.17. Here C1 is the greenish color quality (desired), C2 is greenish-yellow color quality (medium), and C3 is yellow color quality (bad). S1 is well formed size (desired), S2 is moderately formed size (medium), and S3 is badly formed size (bad). D1 represents a low amount of defects (desired), while D2 and D3 represent moderate (medium) and high (bad) amounts of defects, respectively. For quality groups represented with "Q" in Table 9.2, the first subscript 1 stands for the best quality group, while 2 and 3 stand for the moderate and bad quality groups, respectively. The second subscript of Q shows the number of rules for the particular quality group, which ranges from 1 to 17 for the bad quality group. If there are three subgroups of size, then three memberships are required to express the size values in a fuzzy rule.

The minimum method given by equation (9.5) was used to combine the membership degrees from each rule established. The minimum method chooses the most certain output among all the membership degrees. An example of the fuzzy AND (the minimum method) used in if-then rules to form the Q1.1 quality group in Table 9.2 is given as follows.
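The min/max machinery of equations (9.5) and (9.6) applied to a rule such as Q1.1 can be sketched as follows (Python for illustration; the membership values are invented for the example, not measured data):

```python
# Mamdani-style rule evaluation: antecedents combined with AND = min,
# parallel rules supporting the same quality class combined with OR = max.
def fire_rule(*antecedent_memberships):
    return min(antecedent_memberships)

# Rule Q1.1: IF color is C1 AND size is S1 AND defects are D1 THEN quality is very good.
m_c1, m_s1, m_d1 = 0.8, 0.6, 0.9      # hypothetical memberships of one apple
q11 = fire_rule(m_c1, m_s1, m_d1)      # min of the antecedents -> 0.6

# If another rule of the same group (say Q1.2) also fires, take the max:
q12 = fire_rule(0.8, 0.5, 0.9)
very_good = max(q11, q12)
```

The min step makes a rule only as strong as its weakest antecedent, which is the "most certain output" behaviour described in the text.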

Q1.1 = (C1 ∧ S1 ∧ D1) = min {C1, S1, D1} ...(9.8)

On the other hand, the fuzzy OR (the maximum method) rule was used in evaluating the results of the fuzzy rules given in Table 9.2. Determination of the quality group that an apple would belong to, for instance, was done by calculating the most likely membership degree using equations (9.9) through (9.11). If the input vector x is given as x = [defects, size, color], then

k1 = (Q1.1, Q1.2, Q1.3)
k2 = (Q2.1, Q2.2, ..., Q2.7)
k3 = (Q3.1, Q3.2, ..., Q3.17) ...(9.9)

where k is the quality output group that contains the different class membership degrees. The output vector y given in equation (9.10) determines the probabilities of belonging to a quality group for an input sample before defuzzification:

y = [max (k1) max (k2) max (k3)] ...(9.10)

where, for instance,

max (k1) = (Q1.1 ∨ Q1.2 ∨ Q1.3) = max {Q1.1, Q1.2, Q1.3} ...(9.11)

Equation (9.11) produces the membership degree for the best class (Lee, 1990).

9.6.5 Determination of Membership Functions

Membership functions are in general developed by using intuition and qualitative assessment of the relations between the input variable(s) and output classes. These functions can be defined either by linguistic terms or numerical ranges. In the existence of more than one membership function, which is actually in the nature of the fuzzy logic approach, the challenge is to assign input data into one or more of the overlapping membership functions. For a medium amount of defects (D2), the membership function used in this study for the defect quality is

m(D2) = 0, when the defect input x(1) < 0.52 or x(1) > 7.76 ...(9.12)
m(D2) = (x(1) – 0.52)/0.72, when 0.52 ≤ x(1) ≤ 1.24 ...(9.13)
m(D2) = 1, when 1.24 ≤ x(1) ≤ 2 ...(9.14)

The membership function for high amounts of defects, on the other hand, was formed as given below. If, for example, x(1) < 1.75, then the membership degree for the class of a high amount of defects (D3) is m(D3) = 0:

m(D3) = 0, when x(1) < 1.75
m(D3) = (x(1) – 1.75)/2.75, when 1.75 ≤ x(1) ≤ 4.5
m(D3) = 1, when x(1) ≥ 4.5 ...(9.15)
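A short sketch of this scheme in Python (illustrative only: the piecewise breakpoints 1.75 and 4.5 follow one reading of the garbled equations in the text and should be treated as assumptions, as should the function names):

```python
# Piecewise-linear membership for a 'high amount of defects' (D3).
# Breakpoints 1.75 and 4.5 are illustrative assumptions taken from the text.
def m_d3(x):
    if x < 1.75:
        return 0.0
    if x <= 4.5:
        return (x - 1.75) / 2.75      # linear ramp from 0 up to 1
    return 1.0

# Output vector before defuzzification: the best membership degree per quality group,
# where k1, k2, k3 hold the fired-rule outputs for the good, medium and bad groups.
def output_vector(k1, k2, k3):
    return [max(k1), max(k2), max(k3)]
```

The apple is then assigned to the quality group whose entry in the output vector is largest, mirroring the max-aggregation of equation-style y = [max(k1) max(k2) max(k3)].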


9.6.8 Conclusion

Fuzzy logic was successfully applied to serve as a decision support technique in grading apples, providing good flexibility in reflecting the expert's expectations and grading standards into the results. Grading results obtained from fuzzy logic showed a good general agreement with the results from the human expert. It was also seen that color, defects and size are three important criteria in apple classification. However, variables such as firmness, internal defects and some other sensory evaluations, in addition to the features mentioned earlier, could increase the efficiency of decisions made regarding apple quality.

9.7 AN INTRODUCTORY EXAMPLE: FUZZY V/S NON-FUZZY

To illustrate the value of fuzzy logic, fuzzy and non-fuzzy approaches are applied to the same problem. First the problem is solved using the conventional (non-fuzzy) method, writing MATLAB commands that spell out linear and piecewise-linear relations. Then, the same system is solved using fuzzy logic.

Consider the tipping problem: what is the "right" amount to tip your waitperson? Given a number between 0 and 10 that represents the quality of service at a restaurant (where 10 is excellent), what should the tip be? This problem is based on tipping as it is typically practiced in the United States. An average tip for a meal in the U.S. is 15%, though the actual amount may vary depending on the quality of the service provided.

9.7.1 The Non-Fuzzy Approach

Let's start with the simplest possible relationship (Fig. 9.13). Suppose that the tip always equals 15% of the total bill:

tip = 0.15

Fig. 9.13 Constant tipping.

This does not really take into account the quality of the service, so we need to add a new term to the equation. Since service is rated on a scale of 0 to 10, we might have the tip go linearly from 5% if the service is bad to 25% if the service is excellent (Fig. 9.14):

tip = 0.20/10 * service + 0.05

Fig. 9.14 Linear tipping.

The formula does what we want it to do, and it is pretty straightforward. However, we may want the tip to reflect the quality of the food as well. This extension of the problem is defined as follows: Given two sets of numbers between 0 and 10 (where 10 is excellent) that respectively represent the quality of the service and the quality of the food at a restaurant, what should the tip be? Let's see how the formula will be affected now that we've added another variable (Fig. 9.15). Suppose we try:

tip = 0.20/20 * (service + food) + 0.05

Now our relation looks like this:

Fig. 9.15 Tipping depends on service and quality of food.
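The two-input formula can be checked directly. A Python version for illustration (the book itself works in MATLAB):

```python
# Linear tip in both inputs: 5% floor rising to 25% when both service and food
# are rated 10, per the formula tip = 0.20/20 * (service + food) + 0.05.
def tip(service, food):
    return 0.20 / 20 * (service + food) + 0.05
```

For example, tip(10, 10) gives the 25% ceiling and tip(0, 0) the 5% floor, with 15% at the midpoint.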

In this case, the results look pretty good, but when you look at them closely, they do not seem quite right. Suppose you want the service to be a more important factor than the food quality. Let's say that the service will account for 80% of the overall tipping "grade" and the food will make up the other 20%. Try:

servRatio = 0.8;
tip = servRatio*(0.20/10*service + 0.05) + ...
      (1 - servRatio)*(0.20/10*food + 0.05);

The response is still somehow too uniformly linear. Suppose you want more of a flat response in the middle, i.e., you want to give a 15% tip in general, and will depart from this plateau only if the service is exceptionally good or bad (Fig. 9.17). This, in turn, means that those nice linear mappings no longer apply. We can still salvage things by using a piecewise linear construction. Let's return to the one-dimensional problem of just considering the service. You can string together a simple conditional statement using breakpoints like this:

if service < 3,
    tip = (0.10/3)*service + 0.05;
elseif service < 7,
    tip = 0.15;
elseif service <= 10,
    tip = (0.10/3)*(service - 7) + 0.15;
end

Fig. 9.16 Tipping based on the service as a more important factor than the food quality.
Fig. 9.17 Tipping using a piecewise linear construction.

If we extend this to two dimensions (Fig. 9.18), where we take food into account again, we get something like this result:

servRatio = 0.8;
if service < 3,
    tip = ((0.10/3)*service + 0.05)*servRatio + ...
          (1 - servRatio)*(0.20/10*food + 0.05);
elseif service < 7,
    tip = 0.15*servRatio + ...
          (1 - servRatio)*(0.20/10*food + 0.05);
else,
    tip = ((0.10/3)*(service - 7) + 0.15)*servRatio + ...
          (1 - servRatio)*(0.20/10*food + 0.05);
end

Fig. 9.18 Tipping with two-dimensional variation.

Fig. 9.19 Tipping using fuzzy logic.

The plot looks good, but the function is surprisingly complicated. It was a little tricky to code this correctly, and it is definitely not easy to modify this code in the future. Moreover, it is even less apparent how the algorithm works to someone who did not witness the original design process.

9.7.2 The Fuzzy Approach

It would be nice if we could just capture the essentials of this problem, leaving aside all the factors that could be arbitrary. If we make a list of what really matters in this problem, we might end up with the following rule descriptions:

1. If service is poor, then tip is cheap
2. If service is good, then tip is average
3. If service is excellent, then tip is generous

The order in which the rules are presented here is arbitrary. It does not matter which rules come first. If we wanted to include the food's effect on the tip, we might add the following two rules:

4. If food is rancid, then tip is cheap
5. If food is delicious, then tip is generous

In fact, we can combine the two different lists of rules into one tight list of three rules like so:

1. If service is poor or the food is rancid, then tip is cheap
2. If service is good, then tip is average
3. If service is excellent or food is delicious, then tip is generous

These three rules are the core of our solution. And coincidentally, we have just defined the rules for a fuzzy logic system. Now if we give mathematical meaning to the linguistic variables (what is an "average" tip, for example?) we would have a complete fuzzy inference system. Of course, there is a lot left to the methodology of fuzzy logic that we are not mentioning right now, things like:

• How are the rules all combined?
• How do I define mathematically what an "average" tip is?

The details of the method do not really change much from problem to problem; the mechanics of fuzzy logic are not terribly complex. What matters is what we have shown in this preliminary exposition: fuzzy is adaptable, simple, and easily applied.
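To make the three rules concrete, here is a minimal Mamdani-style sketch in Python. The triangular membership shapes and the singleton tip levels of 5%, 15% and 25% are illustrative assumptions, not the exact tuning used in the book's MATLAB system; inputs are assumed to lie in [0, 10].

```python
# Triangular membership function with feet at a and c and peak at b.
def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_tip(service, food):
    # Fuzzify the two inputs (shapes chosen for illustration).
    poor = tri(service, -5, 0, 5)
    good = tri(service, 0, 5, 10)
    excellent = tri(service, 5, 10, 15)
    rancid = tri(food, -5, 0, 5)
    delicious = tri(food, 5, 10, 15)

    # The three rules: OR = max across antecedents.
    cheap = max(poor, rancid)              # rule 1
    average = good                         # rule 2
    generous = max(excellent, delicious)   # rule 3

    # Defuzzify over singleton tip levels of 5%, 15% and 25%.
    num = 0.05 * cheap + 0.15 * average + 0.25 * generous
    return num / (cheap + average + generous)
```

Note how recalibration works exactly as the text promises: shifting the "average" singleton from 15% to some other value changes the output surface without touching the rules themselves.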

Here is the picture associated with the fuzzy system that solves this problem (Fig. 9.19).

9.7.3 Some Observations

Here are some observations about the example so far. We found a piecewise linear relation that solved the problem. It worked, but it was something of a nuisance to derive, and once we wrote it down as code, it was not very easy to interpret. On the other hand, the fuzzy system is based on some "common sense" statements. Also, we were able to add two more rules to the bottom of the list that influenced the shape of the overall output without needing to undo what had already been done; the subsequent modification was pretty easy.

Moreover, by using fuzzy logic rules, the maintenance of the structure of the algorithm decouples along fairly clean lines. The notion of an average tip might change from day to day, city to city, country to country, but the underlying logic is the same: if the service is good, the tip should be average. You can recalibrate the method quickly by simply shifting the fuzzy set that defines average, without rewriting the fuzzy rules. You can do this sort of thing with lists of piecewise linear functions, but there is a greater likelihood that recalibration will not be so quick and simple. For example, here is the piecewise linear tipping problem slightly rewritten to make it more generic. It performs the same function as before, only now the constants can be easily changed:

% Establish constants
lowTip = 0.05; averTip = 0.15; highTip = 0.25;
tipRange = highTip - lowTip;
badService = 0; okayService = 3; goodService = 7; greatService = 10;
serviceRange = greatService - badService;
badFood = 0; greatFood = 10;
foodRange = greatFood - badFood;

% If service is poor or food is rancid, tip is cheap
if service < okayService,
    tip = (((averTip - lowTip)/(okayService - badService)) ...
          *service + lowTip)*servRatio + ...
          (1 - servRatio)*(tipRange/foodRange*food + lowTip);
% If service is good, tip is average
elseif service < goodService,
    tip = averTip*servRatio + ...
          (1 - servRatio)*(tipRange/foodRange*food + lowTip);

% If service is excellent or food is delicious, tip is generous
else,
    tip = (((highTip - averTip)/(greatService - goodService)) ...
          *(service - goodService) + averTip)*servRatio + ...
          (1 - servRatio)*(tipRange/foodRange*food + lowTip);
end

Notice the tendency here, as with all code, for creeping generality to render the algorithm more and more opaque, threatening eventually to obscure it completely. What we are doing here is not that complicated. True, we can fight this tendency to be obscure by adding still more comments, or perhaps by trying to rewrite it in slightly more self-evident ways, but the medium is not on our side. The truly fascinating thing to notice is that if we remove everything except for three comments, what remain are exactly the fuzzy rules we wrote down before:

% If service is poor or food is rancid, tip is cheap
% If service is good, tip is average
% If service is excellent or food is delicious, tip is generous

If, as with a fuzzy system, the comment is identical with the code, think how much more likely your code is to have comments! Fuzzy logic lets the language that is clearest to you, high level comments, also have meaning to the machine, which is why it is a very successful technique for bridging the gap between people and machines.

QUESTION BANK.

1. Why use fuzzy logic?
2. What are the applications of fuzzy logic?
3. When not use fuzzy logic?
4. Compare non-fuzzy logic and fuzzy logic approaches.

REFERENCES.

1. L.A. Zadeh, Fuzzy sets, Information and Control, Vol. 8, No. 3, pp. 338-353, 1965.
2. W.J.M. Kickert and H.R. Van Nauta Lemke, Application of a fuzzy controller in a warm water plant, Automatica, Vol. 12, No. 4, pp. 301-308, 1976.
3. USDA, United States Standards for Grades of Apples, USDA Agricultural Marketing Service, Washington, D.C., 1976.
4. C.P. Pappis and E.H. Mamdani, A fuzzy logic controller for a traffic junction, IEEE Transactions on Systems, Man and Cybernetics, Vol. 7, No. 10, pp. 707-717, 1977.

6. No. Vol. T. Intelligent Transportation System and Traffic Safety Drivers Perception and Acceptance of Electronic Speed Checkers. U. 1985.G. Agriculture. Rawlik. 19. 1339. 20. 28. Kahn and E. Vol. pp. 2. pp.S. pp. Control of a redundant manipulator using fuzzy rules. 131-139.2278.J. 3. 12. Chen and E. 2. Tobi and T. Chakroborty and H. No. Newell. 9.S. Park. R. 21. 2272. pp. 1991. Vol.C. Q. 5. 1-12. Palm. 121-133. Colvin. Karlen. pp. Liu and J. Man and Cybernetics. Ben-Hannan. pp. No. 26. Vol. 13. 20: 404-435.R. 27-35. M. J. Trafficability determination using fuzzy set theory. 1993. 1992. A. 10. Hanafusa. Nishida.E.M. 103-113. pp. Man and Cybernetics. Fuzzy control of model car. MI. 24. Lee. Song and S. 23. pp. Transportation Research Part C. 1647-1654. Transactions of the ASAE. Fuzzy Sets and Systems. Wu. 7. 279-298. Joseph.Part I and Part II. Electron. 1989. 1994. P. 1994. pp. Vol.H. A fuzzy multi-criteria decision making method for technology transfer strategy selection in biotechnology. 63. S. 9. 69-83. E. IEEE Transactions on Systems. 6. Vol. Fuzzy logic in control systems: Fuzzy logic controller.O. pp. No. Fuzzy Sets and Systems. Peleg and P.C. pp. Automatica. Evaluation of cabbage seedling quality by fuzzy logic. Fuzzy controller robot arm trajectory. 16. 16. 45. A. pp. B.P. 1999. Vol. Colvin and D. Y. USA. No. 943028. P. A fuzzy logic expert system for dairy cow transfer between feeding groups. No.C. 31. 3. 14. Hofaifar. Information Sciences: Applications. Vol. Yang.L. 5. No. 255-273.S. 2.L. 23. Vol. S. K. 22. E. 54.B. No. Graham and R. 1993. 1. Grinspan. 10. 17. Frank. N. 2. C. 37. Thangavadivelu and T. Vol. 1. ASAE Paper No. pp. Czogala and T. Fuzzy control of steam turbines. 1994. Computer. International Journal of Approximate Reasoning. A fuzzy logic yield simulator for prescription farming. 267-276. 13-22. Modeling of driver anxiety during signal change intervals. 1991. B. International Journal of Systems Science. pp. 1994. 34. and No. 11. 
Perincherry. No. Westin. 1999-2009. Vol. Edan. Marell and K. Sayyarodsari and J. 2. T. . Vol. 1993. P. Fuzzy Sets and Systems. 3. Sugeno and M. 37. Fuzzy Sets and Systems. Fuzzy Sets and Systems. Roger. 18. 1990. 1993. 8. Hogans. Vol. Fuzzy identification and control of a liquid level rig. Ambuel. No. J. 1992. Kikuchi. Transportation research record. Vol. 1994. Vol. Vol. Kiupel and P. Maltz. No. Chang and Y. Transactions of the ASAE. 5.FUZZY LOGIC APPLICATIONS 119 5. 961-968. No. Chen. 1988. pp. IEEE Transactions on Systems. A model for rider-motorcycle system using fuzzy control. Gutman. V. Modeling of a fuzzy controller with application to the control of biological processes. A practical application of fuzzy control for an air-conditioning system. 15. No. 331-348. Classification of apple surface features using machine vision and neural networks. Transactions of the ASAE. 1993. 7 pp. Classification of fruits by a Boltzman perceptron neural network. Takahasgi. 131147. A Fuzzy Dynamic Learning Controller for Chemical Process Control. St. 1905-1914. pp. T. S.

24. M.A. Shahin, B.P. Verma and E.W. Tollner, Fuzzy logic model for predicting peanut maturity, Transactions of the ASAE, Vol. 43, No. 2, pp. 483-490, 2000.
25. R. Elvik, How much do road accidents cost the national economy?, Accident Analysis and Prevention, Vol. 32, pp. 849-851, 2000.
26. D. Teodorovic, Fuzzy logic systems for transportation engineering: the state of the art, Transportation Research Part A, Vol. 33, pp. 337-364, 1999.

10

Neural Networks Fundamentals

10.1 INTRODUCTION

The artificial neural networks, which we describe in this course, are all variations on the parallel distributed processing (PDP) idea. The architecture of each network is based on very similar building blocks, which perform the processing. In this chapter we first discuss these processing units and discuss different network topologies. Learning strategies as a basis for an adaptive system will be presented in the last section.

10.2 BIOLOGICAL NEURAL NETWORK

The term neural network comes from the intended analogy with the functioning of the human brain, adopting simplified models of biological neural networks. The human brain consists of nearly 10^11 neurons (nerve cells) of different types. In a typical neuron, one can find the nucleus, with which the connections with other neurons are made through a network of fibres called dendrites. Extending out from the nucleus is the axon, which transmits, by means of a complex chemical process, electric potentials to the neurons with which the axon is connected (Fig. 10.1). When the signals received by a neuron become equal to or surpass their threshold values, it triggers the sending of an electric signal of constant level and duration through the axon. In this way, the message is transferred from one neuron to the other.

In the neural network, the neurons or the processing units may have several input paths, corresponding to the dendrites. The units usually combine the weighted values of these paths by a simple summation (Fig. 10.2). The weighted value is passed to the neuron, where it is modified by a threshold function such as a sigmoid function. The modified value is directly presented to the next neuron.

Fig. 10.1 Schematic representation of biological neuron network (dendrite, cell body, nucleus, myelin sheath, axon, nerve ending, synapse).

Fig. 10.2 Schematic representation of mathematical neuron network.

10.3 A FRAMEWORK FOR DISTRIBUTED REPRESENTATION

An artificial network consists of a pool of simple processing units, which communicate by sending signals to each other over a large number of weighted connections. A set of major aspects of a parallel distributed model can be distinguished as:

• a set of processing units (neurons, cells);
• a state of activation yk for every unit, which is equivalent to the output of the unit;
• connections between the units. Generally each connection is defined by a weight wjk which determines the effect which the signal of unit j has on unit k;
• a propagation rule, which determines the effective input sk of a unit from its external inputs;
• an activation function Fk, which determines the new level of activation based on the effective input sk(t) and the current activation yk(t) (i.e., the update);

· an external input (bias, offset) θk for each unit;
· a method for information gathering (the learning rule);
· an environment within which the system must operate, providing input signals and, if necessary, error signals.

Figure 10.3 illustrates these basics.

Fig. 10.3 The basic components of an artificial neural network: weights wjk, effective input sk = Σj wjk yj + θk, activation function Fk and output yk.

10.3.1 Processing Units

Each unit performs a relatively simple job: receive input from neighbours or external sources and use this to compute an output signal, which is propagated to other units. Apart from this processing, a second task is the adjustment of the weights. The system is inherently parallel in the sense that many units can carry out their computations at the same time.

Within neural systems it is useful to distinguish three types of units: input units (indicated by an index i), which receive data from outside the neural network; output units (indicated by an index o), which send data out of the neural network; and hidden units (indicated by an index h), whose input and output signals remain within the neural network.

During operation, units can be updated either synchronously or asynchronously. With synchronous updating, all units update their activation simultaneously; with asynchronous updating, each unit has a (usually fixed) probability of updating its activation at a time t, and usually only one unit will be able to do this at a time. In some cases the latter model has some advantages.

10.3.2 Connections between Units

In most cases we assume that each unit provides an additive contribution to the input of the unit with which it is connected. The total input to unit k is simply the weighted sum of the separate outputs from each of the connected units plus a bias or offset term θk:

    sk(t) = Σj wjk(t) yj(t) + θk(t)    ...(10.1)

The propagation rule used here is the standard weighted summation.
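The standard weighted-summation propagation rule of equation (10.1) can be sketched in a few lines of Python (the function name and the numerical values below are our own illustration, not from the text):

```python
def total_input(weights, activations, bias):
    """Propagation rule (10.1): s_k = sum_j w_jk * y_j + theta_k."""
    return sum(w * y for w, y in zip(weights, activations)) + bias

# Unit k receives signals from three units j with weights w_jk:
s_k = total_input([0.5, -1.0, 2.0], [1.0, 1.0, 0.5], 0.2)
print(s_k)  # 0.5 - 1.0 + 1.0 + 0.2 = 0.7
```

The bias θk simply shifts the total input, which is why it can later be treated as one more weight from a unit that is always on.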

The contribution for positive wjk is considered as an excitation and for negative wjk as an inhibition. In some cases more complex rules for combining inputs are used, in which a distinction is made between excitatory and inhibitory inputs. We call units with propagation rule (10.1) sigma units.

A different propagation rule, introduced by Feldman and Ballard, is known as the propagation rule for the sigma-pi unit:

    sk(t) = Σj wjk(t) Πm yjm(t) + θk(t)    ...(10.2)

Often, the yjm are weighted before multiplication. Although these units are not frequently used, they have their value for gating of input, as well as for the implementation of lookup tables.

10.3.3 Activation and Output Rules

We also need a rule which gives the effect of the total input on the activation of the unit. We need a function Fk which takes the total input sk(t) and the current activation yk(t) and produces a new value of the activation of unit k:

    yk(t + 1) = Fk(yk(t), sk(t))    ...(10.3)

Often, the activation function is a non-decreasing function of the total input of the unit:

    yk(t + 1) = Fk(sk(t)) = Fk(Σj wjk(t) yj(t) + θk(t))    ...(10.4)

although activation functions are not restricted to non-decreasing functions. Generally, some sort of threshold function is used: a hard limiting threshold function (a sgn function), a linear or semi-linear function, or a smoothly limiting threshold (see Fig. 10.4).

Fig. 10.4 Various activation functions for a unit: sgn, semi-linear and sigmoid.

For this smoothly limiting function, often a sigmoid (S-shaped) function, equation (10.5), is used. In some applications a hyperbolic tangent is used, yielding output values in the range [−1, +1]. In some cases the output of a unit can be a stochastic function of the total input of the unit. In that case the activation is not deterministically determined by the neuron input; instead, the neuron input determines the probability p that a neuron gets a high activation value, as given in equation (10.6).
    yk = F(sk) = 1 / (1 + e^(−sk))    ...(10.5)
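The threshold functions of Fig. 10.4 and the sigmoid of equation (10.5) can be written down directly. This is a small sketch; the clipping limits chosen for the semi-linear function are an assumption:

```python
import math

def sgn(s):                      # hard limiting threshold
    return 1 if s > 0 else -1

def semi_linear(s):              # linear, clipped to [-1, +1] (assumed limits)
    return max(-1.0, min(1.0, s))

def sigmoid(s):                  # equation (10.5): smoothly limiting threshold
    return 1.0 / (1.0 + math.exp(-s))

print(sgn(0.3), semi_linear(0.3), sigmoid(0.0))  # 1 0.3 0.5
```

Note that the sigmoid saturates at 0 and 1, while math.tanh would give the [−1, +1] range mentioned in the text.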

    p(yk ← 1) = 1 / (1 + e^(−sk/T))    ...(10.6)

in which T (temperature) is a parameter which determines the slope of the probability function. In all networks we consider, the output of a neuron is taken to be identical to its activation level.

10.4 NETWORK TOPOLOGIES

In the previous section we discussed the properties of the basic processing unit in an artificial neural network. This section focuses on the pattern of connections between the units and the propagation of data. As for this pattern of connections, the main distinction we can make is between:

· Feed-forward networks, where the data flow from input to output units is strictly feed-forward. The data processing can extend over multiple (layers of) units, but no feedback connections are present, that is, no connections extending from outputs of units to inputs of units in the same layer or previous layers.
· Recurrent networks, which do contain feedback connections. Contrary to feed-forward networks, the dynamical properties of the network are important. In some cases, the activation values of the units undergo a relaxation process such that the network will evolve to a stable state in which these activations do not change anymore. In other applications, the changes of the activation values of the output neurons are significant, such that the dynamical behaviour constitutes the output of the network.

Classical examples of feed-forward networks are the Perceptron and the Adaline, which will be discussed in the next chapter. Examples of recurrent networks have been presented by Anderson, Kohonen and Hopfield, and will be discussed in subsequent chapters.

10.5 TRAINING OF ARTIFICIAL NEURAL NETWORKS

A neural network has to be configured such that the application of a set of inputs produces (either directly or via a relaxation process) the desired set of outputs. Various methods to set the strengths of the connections exist. One way is to set the weights explicitly, using a priori knowledge. Another way is to train the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.

10.5.1 Paradigms of Learning

We can categorize the learning situations in two distinct sorts. These are:

· Supervised learning or Associative learning, in which the network is trained by providing it with input and matching output patterns. These input-output pairs can be provided by an external teacher, or by the system which contains the network (self-supervised).
· Unsupervised learning or Self-organization, in which an (output) unit is trained to respond to clusters of patterns within the input. In this paradigm the system is supposed to discover

statistically salient features of the input population. Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather, the system must develop its own representation of the input stimuli.

10.5.2 Modifying Patterns of Connectivity

Both learning paradigms discussed above result in an adjustment of the weights of the connections between units, according to some modification rule. Virtually all learning rules for models of this type can be considered as a variant of the Hebbian learning rule. The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If j receives input from k, the simplest version of Hebbian learning prescribes to modify the weight wjk with

    Δwjk = γ yj yk    ...(10.7)

where γ is a positive constant of proportionality representing the learning rate. Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights:

    Δwjk = γ yj (dk − yk)    ...(10.8)

in which dk is the desired activation provided by a teacher. This is often called the Widrow-Hoff rule or the delta rule, and will be discussed in the next chapter. Many variants (often very exotic ones) have been published the last few years; in the next chapters some of these update rules will be discussed.

10.6 NOTATION AND TERMINOLOGY

10.6.1 Notation

We use the following notation in our formulae. Note that not all symbols are meaningful for all networks, and that in some cases subscripts or superscripts may be left out (e.g., p is often not necessary) or added (e.g., vectors can, contrariwise to the notation below, have indices) where necessary. Vectors are indicated with a bold non-slanted font:

j, k, ... the unit j, k, ...;
i an input unit;
h a hidden unit;
o an output unit;
x^p the pth input pattern vector;
x^p_j the jth element of the pth input pattern vector;
s^p the input to a set of neurons when input pattern vector p is clamped (i.e., presented to the network); often: the input of the network by clamping input pattern vector p;
d^p the desired output of the network when input pattern vector p was input to the network;
d^p_j the jth element of the desired output of the network when input pattern vector p was input to the network;
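Both update rules are one-liners in code. The sketch below applies the Hebbian rule (10.7) and the Widrow-Hoff (delta) rule (10.8) to a single connection; the learning rate γ = 0.5 is an arbitrary choice for illustration:

```python
def hebb_update(w_jk, y_j, y_k, gamma=0.5):
    """Hebbian rule (10.7): strengthen w_jk when j and k are active together."""
    return w_jk + gamma * y_j * y_k

def delta_update(w_jk, y_j, y_k, d_k, gamma=0.5):
    """Widrow-Hoff / delta rule (10.8): move toward the desired activation d_k."""
    return w_jk + gamma * y_j * (d_k - y_k)

print(hebb_update(0.0, 1.0, 1.0))        # 0.5
print(delta_update(0.0, 1.0, 0.2, 1.0))  # 0.5 * (1.0 - 0.2) = 0.4
```

The contrast is visible in the second call: the Hebb rule would keep growing the weight as long as both units are active, while the delta rule stops changing it once yk reaches dk.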

threshold: These terms all refer to a constant (i. That is. W the matrix of connection weights. gjk the learning rate associated with weight wjk . 10. Thus a network with one input layer. Because a neural network is built from a set of standard functions. the second one is the learning algorithm. offset. the inputs perform no computation and their layer is therefore not counted. although the latter two terms are often envisaged as a property of the activation function. and even for an optimal set of weights the approximation error is not zero. Ep the error in the output of the network when input pattern vector p is input. q the biases to the units. Since there is no need to do otherwise. Fj the activation function associated with unit j. They may be used interchangeably. in most cases the network will only approximate the desired function. This convention is widely though not yet universally used. wjk the weight of the connection from unit j to unit k. learning: When using a neural network one has to distinguish two issues which influence the performance of the system.6. we consider the output and the activation value of a unit to be one and the same thing. and one output layer is referred to as a network with two layers. A the energy of the network. one hidden layer.NEURAL NETWORKS FUNDAMENTALS 127 y p the activation values of the network when input pattern vector p was input to the network. is there a procedure to (iteratively) find this set of weights? . this external input is usually implemented (and can be written) as a weight from a unit with activation value 1. y p the activation values of element j of the network when input pattern vector p was input to the j network.. The first one is the representational power of the network.2 Terminology Output vs. Furthermore. Number of layers: In a feed-forward network. Given that there exist a set of optimal weights in the network. Representation vs. Bias.e. 
y^p the activation values of the network when input pattern vector p was input to the network;
y^p_j the activation value of element j of the network when input pattern vector p was input to the network;
W the matrix of connection weights;
w_j the weights of the connections which feed into unit j;
w_jk the weight of the connection from unit j to unit k;
γ_jk the learning rate associated with weight w_jk;
θ the biases to the units;
θ_j the bias input to unit j;
F_j the activation function associated with unit j;
U_j the threshold of unit j in F_j;
E^p the error in the output of the network when input pattern vector p is input;
E the energy of the network.

10.6.2 Terminology

Output vs. activation of a unit: Since there is no need to do otherwise, we consider the output and the activation value of a unit to be one and the same thing. That is, the output of each neuron equals its activation value.

Bias, offset, threshold: These terms all refer to a constant (i.e., independent of the network input but adapted by the learning rule) term which is input to a unit. They may be used interchangeably, although the latter two terms are often envisaged as a property of the activation function. Furthermore, this external input is usually implemented (and can be written) as a weight from a unit with activation value 1.

Number of layers: In a feed-forward network, the inputs perform no computation and their layer is therefore not counted. Thus a network with one input layer, one hidden layer, and one output layer is referred to as a network with two layers. This convention is widely though not yet universally used.

Representation vs. learning: When using a neural network one has to distinguish two issues which influence the performance of the system. The first one is the representational power of the network; the second one is the learning algorithm. The representational power of a neural network refers to the ability of a neural network to represent a desired function. Because a neural network is built from a set of standard functions, in most cases the network will only approximate the desired function, and even for an optimal set of weights the approximation error is not zero. The second issue is the learning algorithm: given that there exists a set of optimal weights in the network, is there a procedure to (iteratively) find this set of weights?

QUESTION BANK

1. Explain the biological neural network.
2. What are the major aspects of a parallel distributed model?
3. What are the basic components of an artificial neural network?
4. What are the network topologies?
5. What are the various activation functions? Explain them schematically.
6. What are the paradigms of neural network learning?

REFERENCES

1. D.O. Hebb, The Organization of Behaviour, New York: Wiley, 1949.
2. B. Widrow, Generalization and information storage in networks of Adaline neurons, in M.C. Jovitz, G.T. Jacobi and G. Goldstein (Eds.), Self-Organizing Systems 1962, Washington, D.C.: Spartan Books, pp. 435-461, 1962.
3. J.A. Anderson, Neural models with cognitive implications, in D. LaBerge and S.J. Samuels (Eds.), Basic Processes in Reading Perception and Comprehension Models, pp. 27-90, Hillsdale, NJ: Erlbaum, 1977.
4. T. Kohonen, Associative Memory: A System-Theoretical Approach, Springer-Verlag, 1977.
5. J.A. Feldman and D.H. Ballard, Connectionist models and their properties, Cognitive Science, Vol. 6, pp. 205-254, 1982.
6. J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, 1982.
7. D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, The MIT Press, 1986.
8. B.A. Pearlmutter, Learning state space trajectories in recurrent neural networks, Neural Computation, Vol. 1, No. 2, pp. 263-269, 1989.
9. B.W. Mel, Connectionist Robot Motion Planning, San Diego, CA: Academic Press, 1990.

CHAPTER 11
Perceptron and Adaline

11.1 INTRODUCTION

This chapter describes single layer neural networks, including some of the classical approaches to the neural computing and learning problem. In the first part of this chapter we discuss the representational power of the single layer networks and their learning algorithms, and we give some examples of using the networks. In the second part we discuss the representational limitations of single layer networks. Two classical models are described in the first part of the chapter: the Perceptron, proposed by Rosenblatt, and the Adaline, presented by Widrow and Hoff.

11.2 NETWORKS WITH THRESHOLD ACTIVATION FUNCTIONS

A single layer feed-forward network consists of one or more output neurons o, each of which is connected with a weighting factor wio to all of the inputs i. In the simplest case the network has only two inputs and a single output, as sketched in Fig. 11.1 (we leave the output index o out). The input of the neuron is the weighted sum of the inputs plus the bias term. The output of the network is formed by the activation of the output neuron, which is some function of the input:

Fig. 11.1 Single layer network with one output and two inputs.

    y = F(Σ_{i=1}^{2} wi xi + θ)    ...(11.1)

The activation function F can be linear, so that we have a linear network, or non-linear. In this section we consider the threshold (sgn) function:

    F(s) = +1 if s > 0, −1 otherwise    ...(11.2)

The output of the network thus is either +1 or −1, depending on the input. The network can now be used for a classification task: it can decide whether an input pattern belongs to one of two classes. If the total input is positive, the pattern will be assigned to class +1; if the total input is negative, the pattern will be assigned to class −1. The separation between the two classes in this case is a straight line, given by the equation:

    w1 x1 + w2 x2 + θ = 0    ...(11.3)

The single layer network represents a linear discriminant function. A geometrical representation of the linear threshold neural network is given in Fig. 11.2. Equation (11.3) can be written as

    x2 = −(w1/w2) x1 − θ/w2    ...(11.4)

and we see that the weights determine the slope of the line and the bias determines the offset, that is, how far the line is from the origin. Note that also the weights can be plotted in the input space: the weight vector is always perpendicular to the discriminant function.

Fig. 11.2 Geometric representation of the discriminant function and the weights.
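A minimal sketch of the two-input threshold network of equations (11.1)-(11.3); the weight and bias values are chosen arbitrarily for illustration:

```python
def perceptron_output(x, w, theta):
    """Threshold unit: y = sgn(w1*x1 + w2*x2 + theta), cf. (11.1)-(11.2)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return 1 if s > 0 else -1

# The line w1*x1 + w2*x2 + theta = 0 of (11.3) separates the two classes:
w, theta = [1.0, 1.0], -0.5
print(perceptron_output([1.0, 1.0], w, theta))    # +1: above the line
print(perceptron_output([-1.0, -1.0], w, theta))  # -1: below the line
```

Changing θ shifts the dividing line without rotating it, exactly as equation (11.4) predicts.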

Now that we have shown the representational power of the single layer network with linear threshold units, we come to the second issue: how do we learn the weights and biases in the network? We will describe two learning methods for these types of networks: the perceptron learning rule and the delta or LMS rule. Both methods are iterative procedures that adjust the weights. A learning sample is presented to the network; for each weight, the new value is computed by adding a correction to the old value. The threshold is updated in the same way:

    wi(t + 1) = wi(t) + Δwi(t)    ...(11.5)
    θ(t + 1) = θ(t) + Δθ(t)    ...(11.6)

The learning problem can now be formulated as: how do we compute Δwi(t) and Δθ(t) in order to classify the learning patterns correctly?

11.3 PERCEPTRON LEARNING RULE AND CONVERGENCE THEOREM

11.3.1 Perceptron Learning Rule

Suppose we have a set of learning samples consisting of an input vector x and a desired output d(x). For a classification task the d(x) is usually +1 or −1. The perceptron learning rule is very simple and can be stated as follows:

1. Start with random weights for the connections.
2. Select an input vector x from the set of training samples.
3. If y ≠ d(x) (the perceptron gives an incorrect response), modify all connections wi according to: Δwi = d(x) xi.
4. Go back to 2.

Note that the procedure is very similar to the Hebb rule; the only difference is that, when the network responds correctly, no connection weights are modified. Besides modifying the weights, we must also modify the threshold θ. This θ is considered as a connection w0 between the output neuron and a dummy predicate unit which is always on: x0 = 1. Given the perceptron learning rule as stated above, this threshold is modified according to:

    Δθ = 0 if the perceptron responds correctly, d(x) otherwise    ...(11.7)

11.3.2 Convergence Theorem

For the learning rule there exists a convergence theorem, which states the following: if there exists a set of connection weights w* which is able to perform the transformation y = d(x), the perceptron learning rule will converge to some solution (which may or may not be the same as w*) in a finite number of steps for any initial choice of the weights.

Proof: Given the fact that the length of the vector w* does not play a role (because of the sgn operation), we take ||w*|| = 1. Because w* is a correct solution, the value |w* · x|, where · denotes the dot or inner product, will be greater than 0; or: there exists a δ > 0 such that |w* · x| > δ for all inputs x.
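The four steps of the perceptron learning rule translate directly into a training loop. In this sketch the threshold is folded in as a weight w0 from a dummy input x0 = 1, as the text suggests; training the logical AND (a linearly separable task in +1/−1 coding) is our own example:

```python
def train_perceptron(samples, epochs=10):
    """samples: list of (x, d) with x an input tuple and d the target (+1 or -1)."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                      # w[0] is the threshold (dummy x0 = 1)
    for _ in range(epochs):
        for x, d in samples:
            xs = (1.0,) + tuple(x)           # prepend the always-on dummy input
            y = 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else -1
            if y != d:                       # only modify weights on an error
                w = [wi + d * xi for wi, xi in zip(w, xs)]
    return w

# Learn the logical AND in +1/-1 coding:
samples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(samples)
print(w)  # -> [-1.0, 1.0, 1.0]
```

With this sample order the rule settles after a handful of corrections, consistent with the finite bound of the convergence theorem.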

. 11. If we start with connections w = 0. From this it follows that: w¢ o w* = w o w* + d(x) o w* o x = w o w* + sgn(w* o x) w* o x > w o w* + d ||w¢||2 = ||w + d(x)x||2 = w2 + 2d (x) w o x + x2 < w2 + x2 =w +M After t modifications we have: w(t) o w* > w o w* + td ||w(t)||2 < w2 + tM such that cos a(t) = 2 (because d (x) = sgn [w o x]) w* o w(t ) || w(t )|| w* o w + td w2 + tM > From this follows that limt®¥ cos a(t) = limt®¥ d M t = ¥ while cos a £ 1.1: A perceptron is initialized with the following weights: w1 = 1. q = 2..(11. w2 = 2.3.8) Example 11. and the weight after modification is w¢ = w + Dw.. the system modifies its connections only a limited number of times. || w|| When according to the perceptron learning rule. In other words. The perceptron learning rule is used to learn a correct discriminant function for a number of samples. tmax will be reached when cos a = 1. connection weights are modified at a given input x. after maximally tmax modifications of the weights the perceptron is correctly performing the mapping. tmax = M d2 . we know that Dw = d(x)x. The conclusion is that there must be an upper limit tmax for t. sketched in Fig.132 FUZZY LOGIC AND NEURAL NETWORKS Now define cos a = w o w* .

i = 0. In Fig. while the target value d(x) = +1. q = 1.4 ADAPTIVE LINEAR ELEMENT (Adaline) An important generalisation of the perceptron training algorithm was presented by Widrow and Hoff as the least mean square (LMS) learning procedure. The main functional di_erence with the perceptron training rule is the way the output of the system is used in the learning rule.PERCEPTRON AND ADALINE 133 x2 2 + A 1 B +C 1 2 x1 Original discriminant function After weight update + Fig. 11. it may be clear that a system with many parallel outputs is directly implementable by multiple units of the above kind. the summer. q = 1. 11. also named Adaline. then the output of the central block is defined to be . is also followed by a quantiser.. and the input and output signals by xi and y. 11. The learning rule was applied to the adaptive linear element. Usually the central block. so no change. which can sum up currents caused by the input voltage signals. The new weights are now: w1 = 1:5. also known as the delta rule. n. According to the perceptron learning rule. so no weights are adjusted.3 Discriminant function before and after weight update. with values x = (0:5. Dw2 = 0:5. the weight changes are: Dw1 = 0:5. 1. w2 = 2:5..3 the discriminant function before and after this weight update is shown. From equation (11. and sample C is classified correctly. In a simple physical implementation (Fig. which outputs either +1 or 1. 1:5) and target value d(x) = +1 is presented to the network. The same is the case for point B. Although the adaptive process is here exemplified in a case when there is only one output. The first sample A.1) it can be calculated that the network output is +1.4) this device consists of a set of controllable resistors connected to a circuit. developed by Widrow and Hoff. The delta-rule uses the net output without further mapping into output values 1 or +1.. When presenting point C with values x = (0:5.. 11. 
The perceptron learning rule uses the output of the threshold function (either −1 or +1) for learning. The delta rule instead uses the net output without further mapping into the output values −1 or +1. Although the adaptive process is here exemplified in a case where there is only one output, it may be clear that a system with many parallel outputs is directly implementable by multiple units of the above kind. If the input conductances are denoted by wi, i = 0, 1, ..., n, and the input and output signals by xi and y, respectively, then the output of the central block is defined to be

    y = Σ_{i=1}^{n} wi xi + θ    ...(11.9)

where θ = w0. The purpose of this device is to yield a given value y = d^p at its output when the set of values x^p_i, i = 1, 2, ..., n, is applied at the inputs. The problem is to determine the coefficients wi, i = 0, 1, ..., n, in such a way that the input-output response is correct for a large number of arbitrarily chosen signal sets. If an exact mapping is not possible, the average error must be minimised, for instance in the sense of least squares. An adaptive operation means that there exists a mechanism by which the wi can be adjusted, usually iteratively, to attain the correct values.

11.5 THE DELTA RULE

For a single layer network with an output unit with a linear activation function, the output is simply given by

    y = Σ_j wj xj + θ    ...(11.10)

Such a simple network is able to represent a linear relationship between the value of the output unit and the value of the input units. By thresholding the output value, a classifier can be constructed (such as the Adaline), but here we focus on the linear relationship and use the network for a function approximation task. In high dimensional input spaces the network represents a (hyper)plane, and it will be clear that also multiple output units may be defined.

Suppose we want to train the network such that a hyperplane is fitted as well as possible to a set of training samples consisting of input values x^p and desired (or target) output values d^p. For every given input sample, the output of the network differs from the target value d^p by (d^p − y^p), where y^p is the actual output for this pattern. Widrow introduced the delta rule to adjust the weights; the delta rule uses a cost- or error-function based on these differences to adjust the weights.

The error function, as indicated by the name least mean square, is the summed squared error. That is, the total error E is defined to be

    E = Σ_p E^p = ½ Σ_p (d^p − y^p)²    ...(11.11)

where the index p ranges over the set of input patterns and E^p represents the error on pattern p. The LMS procedure finds the values of all the weights that minimize the error function by a method called gradient descent. The idea is to make a change in the weight proportional to the negative of the derivative of the error, as measured on the current pattern, with respect to each weight:

    Δ_p wj = −γ ∂E^p/∂wj    ...(11.12)

where γ is a constant of proportionality. The derivative is

    ∂E^p/∂wj = (∂E^p/∂y^p)(∂y^p/∂wj)    ...(11.13)

Because of the linear units, eq. (11.10),

    ∂y^p/∂wj = xj    ...(11.14)

and

    ∂E^p/∂y^p = −(d^p − y^p)    ...(11.15)

such that

    Δ_p wj = γ δ^p xj    ...(11.16)

where δ^p = d^p − y^p is the difference between the target output and the actual output for pattern p. The delta rule modifies the weight appropriately for target and actual outputs of either polarity and for both continuous and binary input and output units. These characteristics have opened up a wealth of new applications.

11.6 THE EXCLUSIVE-OR PROBLEM

In the previous sections we have discussed two learning algorithms for single layer networks, but we have not discussed the limitations on the representation of these networks.
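The LMS update Δwj = γ δ^p xj of equation (11.16) gives a complete training procedure for the linear unit. The sketch below fits y = wx + θ to a noiseless linear target; the learning rate, epoch count and data are arbitrary choices of ours:

```python
def train_adaline(samples, gamma=0.1, epochs=100):
    """Per-pattern gradient descent on E = 1/2 sum_p (d^p - y^p)^2."""
    w, theta = 0.0, 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = w * x + theta            # linear output, no thresholding (11.10)
            delta = d - y                # delta^p = d^p - y^p
            w += gamma * delta * x       # Delta_p w = gamma * delta^p * x (11.16)
            theta += gamma * delta       # bias treated as weight from x0 = 1
    return w, theta

# Target function d(x) = 2x - 1:
samples = [(x, 2 * x - 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, theta = train_adaline(samples)
print(round(w, 3), round(theta, 3))  # close to 2.0 and -1.0
```

Because the error surface of (11.11) is a quadratic bowl for a linear unit, this descent has a single minimum and converges for a sufficiently small γ.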

One of Minsky and Papert's most discouraging results shows that a single layer perceptron cannot represent a simple exclusive-or function. Table 11.1 shows the desired relationships between inputs and output units for this function.

Table 11.1 Exclusive-or truth table
    x1 = −1, x2 = −1: d = −1
    x1 = −1, x2 = +1: d = +1
    x1 = +1, x2 = −1: d = +1
    x1 = +1, x2 = +1: d = −1

In a simple network with two inputs and one output, as depicted in Fig. 11.1, the net input is equal to:

    s = w1 x1 + w2 x2 + θ    ...(11.17)

According to eq. (11.17), the output of the perceptron is zero when s is negative and equal to one when s is positive. In Fig. 11.5 a geometrical representation of the input domain is given. For a constant θ, the output of the perceptron is equal to one on one side of the dividing line, which is defined by:

    w1 x1 + w2 x2 = −θ    ...(11.18)

and equal to zero on the other side of this line. To see that such a solution cannot be found, take a look at Fig. 11.5. The input space consists of four points, and the two solid circles at (1, −1) and (−1, 1) cannot be separated by a straight line from the two open circles at (−1, −1) and (1, 1).

Fig. 11.5 Geometric representation of the input space for the XOR problem.

The obvious question to ask is: how can this problem be overcome? Minsky and Papert prove that for binary inputs any transformation can be carried out by adding a layer of predicates which are connected to all inputs. The proof is given in the next section. For the specific XOR problem, we geometrically show that by introducing hidden units, thereby extending the network to a multi-layer perceptron, the problem can be solved. Fig. 11.6a demonstrates that the four input points are now embedded in a three-dimensional space defined by the two inputs plus the single hidden unit. These four points are now easily separated by a linear manifold (plane) into two groups, as desired. This simple example demonstrates that adding hidden units increases the class of problems that are soluble by feed-forward, perceptron-like networks.

However, by this generalization of the basic architecture we have also incurred a serious loss: we no longer have a learning rule to determine the optimal weights.

Fig. 11.6 Solution of the XOR problem. (a) The perceptron of Fig. 11.1 with an extra hidden unit. With the indicated values of the weights wij (next to the connecting lines) and the thresholds θi (in the circles), this perceptron solves the XOR problem. (b) This is accomplished by mapping the four points of Fig. 11.5 onto four points in a three-dimensional space; separation (by a linear manifold) into the required groups is now possible.

11.7 MULTI-LAYER PERCEPTRONS CAN DO EVERYTHING

In the previous section we showed that by adding an extra hidden unit, the XOR problem can be solved. For binary units, one can prove that this architecture is able to perform any transformation given the correct connections and weights. The most primitive proof is the next one. For a given transformation y = d(x), we can divide the set of all possible input vectors into two classes:

    X+ = {x | d(x) = 1} and X− = {x | d(x) = −1}    ...(11.19)

Since there are N input units, the total number of possible input vectors x is 2^N. For every x^p ∈ X+ a hidden unit h can be reserved of which the activation yh is 1 if and only if the specific pattern p is present at the input: we can choose its weights wih equal to the specific pattern x^p and the bias θh equal to ½ − N, such that

    y^p_h = sgn(Σ_i wih x^p_i − N + ½)    ...(11.20)

is equal to 1 for x^p = wh only. Similarly, the weights to the output neuron can be chosen such that the output is one as soon as one of the M predicate neurons is one:

    y^p_o = sgn(Σ_{h=1}^{M} yh + M − ½)    ...(11.21)
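The two-layer solution of the XOR problem can be verified directly. The weights below are one valid choice of our own (the book's Fig. 11.7 uses its own values); the hidden unit fires only for the input (1, 1):

```python
def sgn(s):
    return 1 if s > 0 else -1

def xor_net(x1, x2):
    """Two-layer perceptron for XOR in +1/-1 coding (illustrative weights)."""
    h = sgn(x1 + x2 - 1.5)                 # hidden unit: +1 only for (1, 1)
    return sgn(x1 + x2 - 2.0 * h - 0.5)    # output unit sees inputs and h

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print((x1, x2), xor_net(x1, x2))
```

The hidden unit lifts the four points into three dimensions, where the output unit's plane x1 + x2 − 2h = 0.5 separates the two classes, just as Fig. 11.6b describes.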

This perceptron will give yo = 1 only if x ∈ X+: it performs the desired mapping. The problem is the large number of predicate units, which is equal to the number of patterns in X+, which is maximally 2^N. Of course we can do the same trick for X−, and we will always take the minimal number of mask units, which is maximally 2^(N−1). A more elegant proof is given by Minsky and Papert, but the point is that for complex transformations the number of required units in the hidden layer is exponential in N.

QUESTION BANK

1. Explain the single layer neural network with one output and two inputs.
2. Describe the perceptron learning rule.
3. Derive the convergence theorem for the perceptron learning rule.
4. Explain the Adaline neural network.
5. Explain the delta rule used to adjust the weights of the Adaline network.
6. A single layer perceptron cannot represent the exclusive-OR. Justify this statement.
7. What are the advantages of the multi-layer perceptron over the single layer perceptron?

REFERENCES

1. D.O. Hebb, The Organization of Behaviour, New York: Wiley, 1949.
2. F. Rosenblatt, Principles of Neurodynamics, New York: Spartan Books, 1962.
3. B. Widrow and M.E. Hoff, Adaptive switching circuits, in 1960 IRE WESCON Convention Record, 1960.
4. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, The MIT Press, 1969.

CHAPTER 12
Back-Propagation

12.1 INTRODUCTION

As we have seen in the previous chapter, a single-layer network has severe restrictions: the class of tasks that can be accomplished is very limited. Minsky and Papert showed in 1969 that a two-layer feed-forward network can overcome many restrictions, but did not present a solution to the problem of how to adjust the weights from input to hidden units. An answer to this question was presented by Rumelhart, Hinton and Williams in 1986, and similar solutions appeared to have been published earlier (Parker, 1985; Cun, 1985). In this chapter we will focus on feed-forward networks with layers of processing units.

12.2 MULTI-LAYER FEED-FORWARD NETWORKS

A feed-forward network has a layered structure. Each layer consists of units which receive their input from units from a layer directly below and send their output to units in a layer directly above the unit. There are no connections within a layer. The Ni inputs are fed into the first layer of Nh,1 hidden units. The input units are merely fan-out units; no processing takes place in these units. The activation of a hidden unit is a function Fi of the weighted inputs plus a bias, as given in eq. (10.4). The output of the hidden units is distributed over the next layer of Nh,2 hidden units, until the last layer of hidden units, of which the outputs are fed into a layer of No output units (see Fig. 12.1).
The central idea behind this solution is that the errors for the units of the hidden layer are determined by back-propagating the errors of the units of the output layer. The output of the hidden units is distributed over the next layer of Nh. a single-layer network has severe restrictions: the class of tasks that can be accomplished is very limited. & White.
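The layered forward pass just described can be sketched in a few lines of code. This is a minimal illustration of my own (not from the text): each layer computes the weighted input plus bias and squashes it through a sigmoid, and the example weights are arbitrary.

```python
import math

def sigmoid(s):
    # F(s) = 1 / (1 + e^-s), the sigmoid squashing function
    return 1.0 / (1.0 + math.exp(-s))

def forward(layers, x):
    """Propagate input x through a list of layers.

    Each layer is (weights, biases): weights[k][j] connects unit j of the
    layer below to unit k, and biases[k] is the bias (theta) of unit k.
    """
    y = x
    for weights, biases in layers:
        # s_k = sum_j w_jk * y_j + theta_k, then y_k = F(s_k)
        y = [sigmoid(sum(w * v for w, v in zip(row, y)) + b)
             for row, b in zip(weights, biases)]
    return y

# A 2-input, 2-hidden, 1-output network with arbitrary example weights
net = [
    ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]),   # input -> hidden
    ([[1.0, -1.0]], [0.0]),                     # hidden -> output
]
print(forward(net, [1.0, 0.0]))
```

Because the output unit is also a sigmoid here, the result always lies in (0, 1); a linear output unit (as used later in Example 12.1) would simply drop the final squashing.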

Fig. 12.1 A multi-layer network with l layers of units.

12.3 THE GENERALISED DELTA RULE

Since we are now using units with nonlinear activation functions, we have to generalise the delta rule, which was presented in chapter 11 for linear functions, to the set of non-linear activation functions. The activation is a differentiable function of the total input, given by

    y_k^p = F(s_k^p)    ...(12.1)

in which

    s_k^p = Σ_j w_jk y_j^p + θ_k    ...(12.2)

To get the correct generalization of the delta rule as presented in the previous chapter, we must set

    Δ_p w_jk = −γ ∂E^p/∂w_jk    ...(12.3)

The error E^p is defined as the total quadratic error for pattern p at the output units:

    E^p = ½ Σ_{o=1}^{No} (d_o^p − y_o^p)²    ...(12.4)

where d_o^p is the desired output for unit o when pattern p is clamped. We further set E = Σ_p E^p as the summed squared error. We can write

    ∂E^p/∂w_jk = (∂E^p/∂s_k^p)(∂s_k^p/∂w_jk)    ...(12.5)

By equation (12.2) we see that the second factor is

    ∂s_k^p/∂w_jk = y_j^p    ...(12.6)

When we define

    δ_k^p = −∂E^p/∂s_k^p    ...(12.7)

we will get an update rule which is equivalent to the delta rule as described in the previous chapter, resulting in a gradient descent on the error surface if we make the weight changes according to:

    Δ_p w_jk = γ δ_k^p y_j^p    ...(12.8)

The trick is to figure out what δ_k^p should be for each unit k in the network. The interesting result, which we now derive, is that there is a simple recursive computation of these δs which can be implemented by propagating error signals backward through the network.

To compute δ_k^p we apply the chain rule to write this partial derivative as the product of two factors, one factor reflecting the change in error as a function of the output of the unit and one reflecting the change in the output as a function of changes in the input. Thus, we have

    δ_k^p = −∂E^p/∂s_k^p = −(∂E^p/∂y_k^p)(∂y_k^p/∂s_k^p)    ...(12.9)

Let us compute the second factor. By equation (12.1) we see that

    ∂y_k^p/∂s_k^p = F′(s_k^p)    ...(12.10)

which is simply the derivative of the squashing function F for the kth unit, evaluated at the net input s_k^p to that unit. To compute the first factor of equation (12.9), we consider two cases. First, assume that unit k is an output unit k = o of the network. In this case, it follows from the definition of E^p that

    −∂E^p/∂y_o^p = (d_o^p − y_o^p)    ...(12.11)

which is the same result as we obtained with the standard delta rule. Substituting this and equation (12.10) in equation (12.9), we get

    δ_o^p = (d_o^p − y_o^p) F′_o(s_o^p)    ...(12.12)

for any output unit o. Secondly, if k is not an output unit but a hidden unit k = h, we do not readily know the contribution of the unit to the output error of the network. However, the error measure can be written as a function of the net inputs from hidden to output layer, E^p = E^p(s_1^p, s_2^p, ..., s_j^p, ...), and we use the chain rule to write

    ∂E^p/∂y_h^p = Σ_{o=1}^{No} (∂E^p/∂s_o^p)(∂s_o^p/∂y_h^p) = Σ_{o=1}^{No} (∂E^p/∂s_o^p) ∂/∂y_h^p [Σ_j w_jo y_j^p] = Σ_{o=1}^{No} (∂E^p/∂s_o^p) w_ho = −Σ_{o=1}^{No} δ_o^p w_ho    ...(12.13)

Substituting this in equation (12.9) yields

    δ_h^p = F′(s_h^p) Σ_{o=1}^{No} δ_o^p w_ho    ...(12.14)

Equations (12.12) and (12.14) give a recursive procedure for computing the δs for all units in the network, which are then used to compute the weight changes according to equation (12.8). This procedure constitutes the generalized delta rule for a feed-forward network of non-linear units.

12.3.1 Understanding Back-Propagation

The equations derived in the previous section may be mathematically correct, but what do they actually mean? Is there a way of understanding back-propagation other than reciting the necessary equations? The answer is, of course, yes. In fact, the whole back-propagation process is intuitively very clear. When a learning pattern is clamped, the activation values are propagated to the output units, and the actual network output is compared with the desired output values. We usually end up with an error in each of the output units. Let us call this error e_o for a particular output unit o. We have to bring e_o to zero.

The simplest method to do this is the greedy method: we strive to change the connections in the neural network in such a way that, next time around, the error e_o will be zero for this particular pattern. We know from the delta rule that, in order to reduce an error, we have to adapt its incoming weights according to

    Δw_ho = (d_o − y_o) y_h    ...(12.15)

That is step one. But it alone is not enough: when we only apply this rule, the weights from input to hidden units are never changed, and we do not have the full representational power of the feed-forward network as promised by the universal approximation theorem. In order to adapt the weights from input to hidden units, we again want to apply the delta rule. In this case, however, we do not have a value for δ for the hidden units. This is solved by the chain rule, which does the following: distribute the error of an output unit o to all the hidden units that it is connected to, weighted by this connection. Differently put, a hidden unit h receives a delta from each output unit o equal to the delta of that output unit weighted with (= multiplied by) the weight of the connection between those units. In symbols: δ_h = Σ_o δ_o w_ho. Well, not exactly: we forgot the activation function of the hidden unit; F′ has to be applied to the delta before the back-propagation process can continue.

12.4 WORKING WITH BACK-PROPAGATION

The application of the generalised delta rule thus involves two phases: During the first phase the input x is presented and propagated forward through the network to compute the output values y_o^p for each output unit. This output is compared with its desired value d_o, resulting in an error signal δ_o^p for each output unit. The second phase involves a backward pass through the network during which the error signal is passed to each unit in the network and appropriate weight changes are calculated.

12.4.1 Weight Adjustments with Sigmoid Activation Function

The results from the previous section can be summarised in three equations:

The weight of a connection is adjusted by an amount proportional to the product of an error signal δ on the unit k receiving the input and the output of the unit j sending this signal along the connection:

    Δ_p w_kj = γ δ_k^p y_j^p    ...(12.16)

If the unit is an output unit, the error signal is given by

    δ_o^p = (d_o^p − y_o^p) F′_o(s_o^p)    ...(12.17)

Take as the activation function F the sigmoid function as defined in chapter 2:

    y^p = F(s^p) = 1/(1 + e^{−s^p})    ...(12.18)

In this case the derivative is equal to

    F′(s^p) = ∂/∂s^p [1/(1 + e^{−s^p})] = [1/(1 + e^{−s^p})²] e^{−s^p} = y^p (1 − y^p)    ...(12.19)

such that the error signal for an output unit can be written as:

    δ_o^p = (d_o^p − y_o^p) y_o^p (1 − y_o^p)    ...(12.20)

The error signal for a hidden unit is determined recursively in terms of error signals of the units to which it directly connects and the weights of those connections. For the sigmoid activation function:

    δ_h^p = F′(s_h^p) Σ_{o=1}^{No} δ_o^p w_ho = y_h^p (1 − y_h^p) Σ_{o=1}^{No} δ_o^p w_ho    ...(12.21)

12.4.2 Learning Rate and Momentum

The learning procedure requires that the change in weight is proportional to ∂E^p/∂w. True gradient descent requires that infinitesimal steps are taken. The constant of proportionality is the learning rate γ. For practical purposes we choose a learning rate that is as large as possible without leading to oscillation. One way to avoid oscillation at large γ is to make the change in weight dependent on the past weight change by adding a momentum term:

    Δw_jk(t + 1) = γ δ_k^p y_j^p + α Δw_jk(t)    ...(12.22)

where t indexes the presentation number and α is a constant which determines the effect of the previous weight change.

The role of the momentum term is shown in Fig. 12.2. When no momentum term is used, it takes a long time before the minimum has been reached with a low learning rate, whereas for high learning rates the minimum is never reached because of the oscillations. When adding the momentum term, the minimum will be reached faster.

Fig. 12.2 The descent in weight space: (a) for small learning rate; (b) for large learning rate: note the oscillations; (c) with large learning rate and momentum term added.

12.4.3 Learning Per Pattern

Although, theoretically, the back-propagation algorithm performs gradient descent on the total error only if the weights are adjusted after the full set of learning patterns has been presented, more often than not the learning rule is applied to each pattern separately, i.e., a pattern p is applied, E^p is calculated, and the weights are adapted (p = 1, 2, ..., P). There exists empirical indication that this results in faster convergence. Care has to be taken, however, with the order in which the patterns are taught. For example, when using the same sequence over and over again the network may become focused on the first few patterns. This problem can be overcome by using a permuted training method.
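The per-pattern update with sigmoid units and a momentum term can be sketched as follows. This is an illustrative implementation of my own, not code from the text: the function names, the network size, the learning constants and the omission of bias terms are all arbitrary choices.

```python
import math
import random

def sigmoid(s):
    # F(s) = 1 / (1 + e^-s), with derivative F'(s) = y(1 - y), eq (12.19)
    return 1.0 / (1.0 + math.exp(-s))

def train_pattern(W_ih, W_ho, dW_ih, dW_ho, x, d, gamma=0.5, alpha=0.9):
    """One per-pattern update for a 1-hidden-layer sigmoid network (no biases).

    Implements the output deltas of eq (12.20), the hidden deltas of eq
    (12.21) and the momentum update of eq (12.22). W_ih[h][i] is the weight
    from input i to hidden unit h, W_ho[o][h] from hidden h to output o;
    dW_ih / dW_ho hold the previous weight changes for the momentum term.
    """
    # Forward pass
    y_h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_ih]
    y_o = [sigmoid(sum(w * yh for w, yh in zip(row, y_h))) for row in W_ho]
    # Error signals
    d_o = [(do - yo) * yo * (1 - yo) for do, yo in zip(d, y_o)]        # (12.20)
    d_h = [yh * (1 - yh) * sum(d_o[o] * W_ho[o][h] for o in range(len(d_o)))
           for h, yh in enumerate(y_h)]                                # (12.21)
    # Weight changes: dw(t+1) = gamma * delta * y + alpha * dw(t)       (12.22)
    for o in range(len(W_ho)):
        for h in range(len(W_ho[o])):
            dW_ho[o][h] = gamma * d_o[o] * y_h[h] + alpha * dW_ho[o][h]
            W_ho[o][h] += dW_ho[o][h]
    for h in range(len(W_ih)):
        for i in range(len(W_ih[h])):
            dW_ih[h][i] = gamma * d_h[h] * x[i] + alpha * dW_ih[h][i]
            W_ih[h][i] += dW_ih[h][i]
    return y_o

# Repeatedly present one pattern; the output is driven towards its target
random.seed(1)
W_ih = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
W_ho = [[random.uniform(-0.5, 0.5) for _ in range(3)]]
dW_ih = [[0.0] * 2 for _ in range(3)]
dW_ho = [[0.0] * 3]
for _ in range(500):
    train_pattern(W_ih, W_ho, dW_ih, dW_ho, [1.0, 0.0], [1.0])
print(train_pattern(W_ih, W_ho, dW_ih, dW_ho, [1.0, 0.0], [1.0]))
```

Setting alpha = 0 recovers the plain generalized delta rule of eq (12.16); looping the training call over a shuffled pattern list would give the permuted per-pattern training mentioned above.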

Example 12.1: A feed-forward network can be used to approximate a function from examples. Suppose we have a system (for example a chemical process or a financial market) of which we want to know the characteristics. The input of the system is given by the two-dimensional vector x and the output is given by the one-dimensional vector d. We want to estimate the relationship d = f(x) from 80 examples {x^p, d^p} as depicted in Fig. 12.3 (top left). A feed-forward network was programmed with two inputs, 10 hidden units with sigmoid activation function and an output unit with a linear activation function. Check for yourself how equation (12.20) should be adapted for the linear instead of sigmoid activation function. The network weights are initialized to small values and the network is trained for 5,000 learning iterations with the back-propagation training rule, described in the previous section.

The relationship between x and d as represented by the network is shown in Fig. 12.3 (top right), while the function which generated the learning samples is given in Fig. 12.3 (bottom left). The approximation error is depicted in Fig. 12.3 (bottom right). We see that the error is higher at the edges of the region within which the learning samples were generated. The network is considerably better at interpolation than extrapolation.

Fig. 12.3 Example of function approximation with a feed-forward network. Top left: The original learning samples. Top right: The approximation with the network. Bottom left: The function which generated the learning samples. Bottom right: The error in the approximation.

12.5 OTHER ACTIVATION FUNCTIONS

Although sigmoid functions are quite often used as activation functions, other functions can be used as well. In some cases this leads to a formula, which is known from traditional function approximation theories. For example, from Fourier analysis it is known that any periodic function can be written as an infinite sum of sine and cosine terms (Fourier series):

    f(x) = Σ_{n=0}^{∞} (a_n cos nx + b_n sin nx)    ...(12.23)

We can rewrite this as a summation of sine terms

    f(x) = a_0 + Σ_{n=1}^{∞} c_n sin(nx + θ_n)    ...(12.24)

with c_n = √(a_n² + b_n²) and θ_n = arctan(b_n/a_n). This can be seen as a feed-forward network with a single input unit for x, a single output unit for f(x) and hidden units with an activation function F = sin(s). The factor a_0 corresponds with the bias of the output unit, the factors c_n correspond with the weights from hidden to output unit, the phase factor θ_n corresponds with the bias term of the hidden units and the factor n corresponds with the weights between the input and hidden layer.

The basic difference between the Fourier approach and the back-propagation approach is that in the Fourier approach the weights between the input and the hidden units (these are the factors n) are fixed integer numbers which are analytically determined, whereas in the back-propagation approach these weights can take any value and are typically learned using a learning heuristic.

To illustrate the use of other activation functions we have trained a feed-forward network with one output unit, four hidden units, and one input with ten patterns drawn from the function f(x) = sin(2x) sin(x). The result is depicted in Fig. 12.4. The same function (albeit with other learning points) is learned with a network with eight sigmoid hidden units (see Fig. 12.5). From the figures it is clear that it pays off to use as much knowledge of the problem at hand as possible.

12.6 DEFICIENCIES OF BACK-PROPAGATION

Despite the apparent success of the back-propagation learning algorithm, there are some aspects which make the algorithm not guaranteed to be universally useful. Most troublesome is the long training process. This can be a result of a non-optimum learning rate and momentum. A lot of advanced algorithms based on back-propagation learning have some optimized method to adapt this learning rate, as will be discussed in the next section. Outright training failures generally arise from two sources: network paralysis and local minima.

Fig. 12.4 The periodic function f(x) = sin(2x) sin(x) approximated with sine activation functions.

Fig. 12.5 The periodic function f(x) = sin(2x) sin(x) approximated with sigmoid activation functions.
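The network view of eq. (12.24) can be sketched directly in code: a one-input network whose hidden units use F = sin, with the input-to-hidden weights fixed to the integers n. The square-wave coefficients below are my own example values, not taken from the text.

```python
import math

def fourier_net(x, a0, terms):
    """Evaluate f(x) = a0 + sum_n c_n * sin(n*x + theta_n).

    Each term (n, c, theta) is one hidden unit: n is the (fixed, integer)
    input->hidden weight, theta its bias, c its hidden->output weight, and
    a0 the bias of the linear output unit.
    """
    return a0 + sum(c * math.sin(n * x + theta) for n, c, theta in terms)

# Example: partial Fourier sum of a square wave,
# f(x) ~ (4/pi) * (sin x + sin(3x)/3 + sin(5x)/5)
terms = [(n, 4.0 / (math.pi * n), 0.0) for n in (1, 3, 5)]
print(fourier_net(math.pi / 2, 0.0, terms))
```

Replacing the fixed integer weights n by trainable real-valued weights is exactly the step from the Fourier view to the back-propagation approach described above.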

12.6.1 Network Paralysis

As the network trains, the weights can be adjusted to very large values. The total input of a hidden unit or output unit can therefore reach very high (either positive or negative) values, and because of the sigmoid activation function the unit will have an activation very close to zero or very close to one. As is clear from equations (12.20) and (12.21), the weight adjustments which are proportional to y_k^p (1 − y_k^p) will then be close to zero, and the training process can come to a virtual standstill.

12.6.2 Local Minima

The error surface of a complex network is full of hills and valleys. Because of the gradient descent, the network can get trapped in a local minimum when there is a much deeper minimum nearby. Probabilistic methods can help to avoid this trap, but they tend to be slow. Another suggested possibility is to increase the number of hidden units. Although this will work because of the higher dimensionality of the error space, and the chance to get trapped is smaller, it appears that there is some upper limit of the number of hidden units which, when exceeded, again results in the system being trapped in local minima.

12.7 ADVANCED ALGORITHMS

Many researchers have devised improvements of and extensions to the basic back-propagation algorithm described above. It is too early for a full evaluation: some of these techniques may prove to be fundamental, others may simply fade away. A few methods are discussed in this section.

Maybe the most obvious improvement is to replace the rather primitive steepest descent method with a direction set minimization method, e.g., conjugate gradient minimization. Note that minimization along a direction u brings the function f at a place where its gradient is perpendicular to u (otherwise minimization along u is not complete). Instead of following the gradient at every step, a set of n directions is constructed which are all conjugate to each other such that minimization along one of these directions u_j does not spoil the minimization along one of the earlier directions u_i, i.e., the directions are non-interfering. Thus one minimization in the direction of u_i suffices, such that n minimizations in a system with n degrees of freedom bring this system to a minimum (provided the system is quadratic). This is different from gradient descent, which directly minimizes in the direction of the steepest descent (Press, Flannery, Teukolsky, & Vetterling, 1986).

Suppose the function to be minimized is approximated by its Taylor series

    f(x) = f(p) + Σ_i (∂f/∂x_i)|_p x_i + ½ Σ_{i,j} (∂²f/∂x_i ∂x_j)|_p x_i x_j + ... ≈ ½ xᵀAx − bᵀx + c    ...(12.25)

where T denotes transpose, c ≡ f(p), and

    b ≡ −∇f|_p,   [A]_ij = (∂²f/∂x_i ∂x_j)|_p    ...(12.26)

A is a symmetric positive definite n × n matrix, the Hessian of f at p. The gradient of f is

    ∇f = Ax − b    ...(12.27)

such that a change of x results in a change of the gradient as

    δ(∇f) = A(δx)    ...(12.28)

Now suppose f was minimized along a direction u_i to a point where the gradient −g_{i+1} of f is perpendicular to u_i, i.e.,

    u_iᵀ g_{i+1} = 0    ...(12.29)

and a new direction u_{i+1} is sought. In order to make sure that moving along u_{i+1} does not spoil minimization along u_i, we require that the gradient of f remain perpendicular to u_i, i.e.,

    u_iᵀ g_{i+2} = 0    ...(12.30)

otherwise we would once more have to minimise in a direction which has a component of u_i. Combining (12.29) and (12.30), we get

    0 = u_iᵀ (g_{i+1} − g_{i+2}) = u_iᵀ δ(∇f) = u_iᵀ A u_{i+1}    ...(12.31)

When eq. (12.31) holds for two vectors u_i and u_{i+1} they are said to be conjugate.

Now, starting at some point p_0, the first minimization direction u_0 is taken equal to g_0 = −∇f(p_0), resulting in a new point p_1. For i ≥ 0, calculate the directions

    u_{i+1} = g_{i+1} + γ_i u_i    ...(12.32)

where γ_i is chosen to make u_iᵀ A u_{i+1} = 0 and the successive gradients perpendicular, i.e.,

    γ_i = (g_{i+1}ᵀ g_{i+1}) / (g_iᵀ g_i)   with g_k = −∇f|_{p_k} for all k ≥ 0    ...(12.33)

Next, calculate p_{i+2} = p_{i+1} + λ_{i+1} u_{i+1}, where λ_{i+1} is chosen so as to minimize f(p_{i+2}). It can be shown that the u's thus constructed are all mutually conjugate (e.g., see (Stoer & Bulirsch, 1980)). The process described above is known as the Fletcher-Reeves method, but there are many variants, which work more or less the same (Hestenes & Stiefel, 1952; Polak, 1971; Powell, 1977).

Although only n iterations are needed for a quadratic system with n degrees of freedom, due to the fact that we are not minimizing quadratic systems, as well as a result of round-off errors, the n directions have to be followed several times (see Fig. 12.6). Powell introduced some improvements to correct for behaviour in non-quadratic systems. The resulting cost is O(n), which is significantly better than the linear convergence of steepest descent.
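The Fletcher-Reeves iteration above can be sketched for a purely quadratic f(x) = ½ xᵀAx − bᵀx, where the line minimization has the closed form λ = gᵀu / (uᵀAu). This is a minimal sketch of my own, not code from the text; for such a quadratic, n iterations reach the minimum, i.e. the point where Ax = b.

```python
def cg_minimize(A, b, x):
    """Fletcher-Reeves conjugate gradients on f(x) = 0.5 x'Ax - b'x.

    Plain-Python vectors and matrices (lists); A must be symmetric
    positive definite.
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    g = [bi - avi for bi, avi in zip(b, matvec(A, x))]   # g = -grad f = b - Ax
    u = g[:]                                             # u_0 = g_0
    for _ in range(len(x)):
        Au = matvec(A, u)
        lam = dot(g, u) / dot(u, Au)                     # exact line minimization
        x = [xi + lam * ui for xi, ui in zip(x, u)]
        g_new = [gi - lam * aui for gi, aui in zip(g, Au)]
        gamma = dot(g_new, g_new) / dot(g, g)            # eq (12.33)
        u = [gn + gamma * ui for gn, ui in zip(g_new, u)]  # eq (12.32)
        g = g_new
    return x

# The minimum of 0.5 x'Ax - b'x is at Ax = b
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(cg_minimize(A, b, [0.0, 0.0]))
```

For a non-quadratic error surface such as a network's, λ must instead be found by an explicit line search and the directions restarted periodically, as the discussion of Fig. 12.6 describes.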

Fig. 12.6 Slow decrease with conjugate gradient in non-quadratic systems. [The hills on the left are very steep, resulting in a large search vector u_i. When the quadratic portion is entered the new search direction is constructed from the previous direction and the gradient, resulting in a spiraling minimization. This problem can be overcome by detecting such spiraling minimizations and restarting the algorithm with u_0 = −∇f.]

Some improvements on back-propagation have been presented based on an independent adaptive learning rate parameter for each weight.

Van den Boomgaard and Smeulders (Boomgaard & Smeulders, 1989) show that for a feed-forward network without hidden units an incremental procedure to find the optimal weight matrix W needs an adjustment of the weights with

    Δw(t + 1) = γ(t + 1) [d(t + 1) − w(t) x(t + 1)] x(t + 1)    ...(12.34)

in which γ is not a constant but a variable (Ni + 1) × (Ni + 1) matrix which depends on the input vector. By using a priori knowledge about the input signal, the storage requirements for γ can be reduced.

Silva and Almeida (Silva & Almeida, 1990) also show the advantages of an independent step size for each weight in the network. In their algorithm the learning rate is adapted after every learning pattern:

    γ_jk(t + 1) = u γ_jk(t)   if ∂E(t + 1)/∂w_jk and ∂E(t)/∂w_jk have the same signs
    γ_jk(t + 1) = d γ_jk(t)   if ∂E(t + 1)/∂w_jk and ∂E(t)/∂w_jk have opposite signs    ...(12.35)

where u and d are positive constants with values slightly above and below unity, respectively. The idea is to decrease the learning rate in case of oscillations.
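The sign rule of eq. (12.35) can be sketched as follows. This is my own minimal illustration: the constants u = 1.2 and d = 0.8 and the toy error function E(w) = w² are arbitrary example choices, not values from the text.

```python
def adapt_rate(rate, grad_now, grad_prev, u=1.2, d=0.8):
    """Per-weight step-size update, eq (12.35): grow the rate by u when
    successive gradients agree in sign, shrink it by d when they oscillate
    (u slightly above, d slightly below unity)."""
    return rate * u if grad_now * grad_prev > 0 else rate * d

# Minimize E(w) = w^2 with an individually adapted step size
w, rate, g_prev = 5.0, 0.1, 0.0
for _ in range(50):
    g = 2.0 * w                    # dE/dw
    if g_prev != 0.0:
        rate = adapt_rate(rate, g, g_prev)
    w -= rate * g
    g_prev = g
print(w)
```

The rate keeps growing while the descent proceeds down one slope, then shrinks as soon as the weight starts oscillating around the minimum, which is exactly the behaviour the paragraph above describes.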

12.8 HOW GOOD ARE MULTI-LAYER FEED-FORWARD NETWORKS?

From the example shown in Fig. 12.3 it is clear that the approximation of the network is not perfect. The resulting approximation error is influenced by:

1. The learning algorithm and number of iterations. This determines how well the error on the training set is minimized.
2. The number of learning samples. This determines how well the training samples represent the actual function.
3. The number of hidden units. This determines the expressive power of the network. For smooth functions only a few hidden units are needed, for wildly fluctuating functions more hidden units will be needed.

In the previous sections we discussed the learning rules such as back-propagation and the other gradient based learning algorithms, and the problem of finding the minimum error. In this section we particularly address the effect of the number of learning samples and the effect of the number of hidden units.

We first have to define an adequate error measure. All neural network training algorithms try to minimize the error of the set of learning samples which are available for training the network. The average error per learning sample is defined as the learning error rate:

    E_learning = (1/P_learning) Σ_{p=1}^{P_learning} E^p    ...(12.36)

in which E^p is the difference between the desired output value and the actual network output for the learning samples:

    E^p = ½ Σ_{o=1}^{No} (d_o^p − y_o^p)²    ...(12.37)

This is the error which is measurable during the training process. It is obvious that the actual error of the network will differ from the error at the locations of the training samples. The difference between the desired output value and the actual network output should be integrated over the entire input domain to give a more realistic error measure. This integral can be estimated if we have a large set of samples. We now define the test error rate as the average error of the test set:

    E_test = (1/P_test) Σ_{p=1}^{P_test} E^p    ...(12.38)

In the following subsections we will see how these error measures depend on learning set size and number of hidden units.
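The two error measures can be computed directly from a set of targets and network outputs. A minimal sketch of my own (the data values are invented to show a small learning error combined with a large test error):

```python
def pattern_error(d, y):
    # E^p = 0.5 * sum_o (d_o - y_o)^2, eq (12.37)
    return 0.5 * sum((do - yo) ** 2 for do, yo in zip(d, y))

def average_error(targets, outputs):
    # E_learning / E_test: mean of E^p over a sample set, eqs (12.36)/(12.38)
    return sum(pattern_error(d, y) for d, y in zip(targets, outputs)) / len(targets)

# Hypothetical targets and network outputs for a 1-output network
train_d, train_y = [[1.0], [0.0]], [[0.9], [0.1]]
test_d, test_y = [[1.0], [0.0]], [[0.6], [0.5]]
print(average_error(train_d, train_y), average_error(test_d, test_y))
```

Here the learning error is small while the test error is much larger, which is precisely the situation with too few learning samples (or too many hidden units) discussed in the following subsections.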

12.8.1 The Effect of the Number of Learning Samples

A simple problem is used as example: a function y = f(x) has to be approximated with a feed-forward neural network. A neural network is created with one input, 5 hidden units with sigmoid activation function and a linear output unit. Suppose we have only a small number of learning samples (e.g., 4) and the network is trained with these samples. Training is stopped when the error does not decrease anymore. The original (desired) function is shown in Fig. 12.7A as a dashed line. The learning samples and the approximation of the network are shown in the same figure. We see that in this case E_learning is small (the network output goes perfectly through the learning samples) but E_test is large: the test error of the network is large. The approximation obtained from 20 learning samples is shown in Fig. 12.7B. The E_learning is larger than in the case of 4 learning samples, but the E_test is smaller.

Fig. 12.7 Effect of the learning set size on the generalization. The dashed line gives the desired function, the learning samples are depicted as circles and the approximation by the network is shown by the drawn line. 5 hidden units are used. (a) 4 learning samples. (b) 20 learning samples.

This experiment was carried out with other learning set sizes, where for each learning set size the experiment was repeated 10 times. The average learning and test error rates as a function of the learning set size are given in Fig. 12.8. Note that the learning error increases with an increasing learning set size, and the test error decreases with increasing learning set size. A low learning error on the (small) learning set is no guarantee for a good network performance! With an increasing number of learning samples the two error rates converge to the same value. This value depends on the representational power of the network: given the optimal weights, how good the approximation is. This error depends on the number of hidden units and the activation function. If the learning error rate does not converge to the test error rate, the learning procedure has not found a global minimum.

Fig. 12.8 Effect of the learning set size on the error rate. The average learning error rate and the average test error rate are shown as a function of the number of learning samples.

12.8.2 The Effect of the Number of Hidden Units

The same function as in the previous subsection is used, but now the number of hidden units is varied. The original (desired) function, learning samples and network approximation are shown in Fig. 12.9A for 5 hidden units and in Fig. 12.9B for 20 hidden units. The effect visible in Fig. 12.9B is called overtraining. The network fits exactly with the learning samples, but because of the large number of hidden units the function which is actually represented by the network is far more wild than the original one. Particularly in the case of learning samples which contain a certain amount of noise (which all real-world data have), the network will fit the noise of the learning samples instead of making a smooth approximation.

This example shows that a large number of hidden units leads to a small error on the training set but not necessarily to a small error on the test set. Adding hidden units will always lead to a reduction of E_learning. However, adding hidden units will first lead to a reduction of E_test, but then lead to an increase of E_test. This effect is called the peaking effect. The average learning and test error rates as a function of the number of hidden units are given in Fig. 12.10.

12.9 APPLICATIONS

Back-propagation has been applied to a wide variety of research applications. Sejnowski and Rosenberg (1986) produced a spectacular success with NETtalk, a system that converts printed English text into highly intelligible speech. · A feed-forward network with one layer of hidden units has been described by Gorman and Sejnowski (1988) as a classification machine for sonar signals.

· A multi-layer feed-forward network with a back-propagation training algorithm is used to learn an unknown function between input and output signals from the presentation of examples. It is hoped that the network is able to generalize correctly, so that input values which are not presented as learning patterns will result in correct output values. An example is the work of Josin (1988), who used a two-layer feed-forward network with back-propagation learning to perform the inverse kinematic transform which is needed by a robot arm controller.

Fig. 12.9 Effect of the number of hidden units on the network performance. The dashed line gives the desired function, the circles denote the learning samples and the drawn line gives the approximation by the network. 12 learning samples are used. (a) 5 hidden units. (b) 20 hidden units.

Fig. 12.10 The average learning error rate and the average test error rate as a function of the number of hidden units.

QUESTION BANK.
1. What is the back-propagation algorithm? Explain.
2. Explain the multi-layer feed-forward networks.
3. Describe the generalized delta rule.
4. How are the weights adjusted with the sigmoid activation function? Explain with an example.
5. Explain learning rate and momentum with back-propagation with an example.
6. Explain the sine activation function with an example.
7. What are the deficiencies of the back-propagation algorithm? Explain various methods employed to overcome the deficiencies of the back-propagation algorithm.
8. How good are multi-layer feed-forward networks? Explain.
9. Explain the effect of the number of learning samples in multi-layer feed-forward networks.
10. Explain the effect of the number of hidden units in multi-layer feed-forward networks.
11. What are the applications of the back-propagation algorithm?

REFERENCES.
1. M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, The MIT Press, 1969.
2. D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learning representations by back-propagating errors, Nature, Vol. 323, pp. 533-536, 1986.
3. D.B. Parker, Learning-Logic (Tech. Rep. TR-47), Cambridge, MA: Massachusetts Institute of Technology, Center for Computational Research in Economics and Management Science, 1985.
4. Y. Cun, Une procedure d'apprentissage pour reseau a seuil assymetrique, Proceedings of Cognitiva, Vol. 85, pp. 599-604, 1985.
5. G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, Vol. 2, No. 4, pp. 303-314, 1989.
6. K. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural Networks, Vol. 2, No. 3, pp. 183-192, 1989.
7. K. Hornik, M. Stinchcombe, and H. White, Multilayer feed forward networks are universal approximators, Neural Networks, Vol. 2, No. 5, pp. 359-366, 1989.
8. E. Hartman, J.D. Keeler, and J.M. Kowalski, Layered neural networks with Gaussian hidden units as universal approximations, Neural Computation, Vol. 2, No. 2, pp. 210-215, 1990.
9. W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge: Cambridge University Press, 1986.
10. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, New York-Heidelberg-Berlin: Springer-Verlag, 1980.
11. M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of National Bureau of Standards, Vol. 49, pp. 409-436, 1952.
12. E. Polak, Computational Methods in Optimization, New York: Academic Press, 1971.
13. M.J.D. Powell, Restart procedures for the conjugate gradient method, Mathematical Programming, Vol. 12, pp. 241-254, 1977.
14. T.J. Sejnowski and C.R. Rosenberg, NETtalk: A Parallel Network that Learns to Read Aloud (Tech. Rep. JHU/EECS-86/01), The John Hopkins University Electrical Engineering and Computer Science Department, 1986.
15. R.P. Gorman and T.J. Sejnowski, Analysis of hidden units in a layered network trained to classify sonar targets, Neural Networks, Vol. 1, No. 1, pp. 75-89, 1988.
16. G. Josin, Neural-space generalization of a topological transformation, Biological Cybernetics, Vol. 59, pp. 283-290, 1988.

CHAPTER 13

Recurrent Networks

13.1 INTRODUCTION

The learning algorithms discussed in the previous chapter were applied to feed-forward networks: all data flows in a network in which no cycles are present. But what happens when we introduce a cycle? For instance, we can connect a hidden unit with itself over a weighted connection, connect hidden units to input units, or even connect all units with each other. Although, as we know from the previous chapter, the approximation capabilities of such networks do not increase, we may obtain decreased complexity, network size, etc., to solve the same problem.

An important question we have to consider is the following: what do we want to learn in a recurrent network? After all, when one is considering a recurrent network, it is possible to continue propagating activation values until a stable point (attractor) is reached. As we will see in the sequel, there exist recurrent networks which are attractor based, i.e., the activation values in the network are repeatedly updated until a stable point is reached, after which the weights are adapted, but there are also recurrent networks where the learning rule is used after each propagation (where an activation value is transversed over each weight only once), while external inputs are included in each propagation.

In this chapter, recurrent extensions to the feed-forward network, introduced in chapter 12, will be discussed. Also some special recurrent networks will be discussed: the Hopfield network, which can be used for the representation of binary patterns; subsequently we touch upon Boltzmann machines, therewith introducing stochasticity in neural computation. The theory of the dynamics of recurrent networks extends beyond the scope of a one-semester course on neural networks. Yet the basics of these networks will be discussed.

13.2 THE GENERALISED DELTA-RULE IN RECURRENT NETWORKS

The back-propagation learning rule, introduced in chapter 12, can be easily used for training patterns in recurrent networks: the recurrent connections can be regarded as extra inputs to the network (the values of which are computed by the network itself). Before we will consider this general case, however, we will first describe networks

13.2.1 The Jordan Network

One of the earliest recurrent neural networks was the Jordan network. A typical application of such a network is the following. Suppose we have to construct a network that must generate a control command depending on an external input, which is a time series x(t), x(t − 1), x(t − 2), ... With a feed-forward network there are two possible approaches:
1. Create inputs x1, x2, ..., xn which constitute the last n values of the input vector. Thus a time window of the input vector is input to the network.
2. Create inputs x, x′, x″, ... Besides only inputting x(t), we also input its first, second, etc. derivatives. Naturally, computation of these derivatives is not a trivial task for higher-order derivatives.
The disadvantage is, of course, that the input dimensionality of the feed-forward network is multiplied with n, leading to a very large network, which is slow and difficult to train. The Jordan and Elman networks provide a solution to this problem. Due to the recurrent connections, a window of inputs need not be input anymore; instead, the network is supposed to learn the influence of the previous time steps itself.

In the Jordan network, the activation values of the output units are fed back into the input layer through a set of extra input units called the state units. There are as many state units as there are output units in the network. The connections between the output and state units have a fixed weight of +1; learning takes place only in the connections between input and hidden units as well as hidden and output units. Thus all the learning rules derived for the multi-layer perceptron can be used to train this network. An example of this network is shown in Fig. 13.1.

Fig. 13.1 The Jordan network, with input units, hidden units h, output units o, and state units.
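The first, time-window approach can be sketched in a few lines; the window length n = 3 and the example series below are illustrative choices, not values from the text:

```python
def sliding_windows(series, n):
    """Turn a time series into input vectors (x(t-n+1), ..., x(t)).

    Each window of the last n values becomes one network input, so the
    input dimensionality of the feed-forward network is multiplied by n.
    """
    return [tuple(series[t - n + 1:t + 1]) for t in range(n - 1, len(series))]

x = [0.0, 0.1, 0.4, 0.9, 1.6]        # an example time series x(t)
windows = sliding_windows(x, n=3)
# windows[0] == (0.0, 0.1, 0.4) and windows[-1] == (0.4, 0.9, 1.6)
```

A Jordan or Elman network avoids this blow-up: only the single current input x(t) is presented, together with the fed-back state or context activations.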

13.2.2 The Elman Network

In the Elman network a set of context units are introduced, which are extra input units whose activation values are fed back from the hidden units. Thus the network is very similar to the Jordan network, except that (1) the hidden units instead of the output units are fed back; and (2) the extra input units have no self-connections. The schematic structure of this network is shown in Fig. 13.2.

Fig. 13.2 The Elman network, with an input layer, a hidden layer, an output layer, and a context layer.

The context units at step t thus always have the activation value of the hidden units at step t − 1. Again the hidden units are connected to the context units with a fixed weight of value +1. Learning is done as follows:
1. The context units are set to 0; t = 1.
2. Pattern xt is clamped; the forward calculations are performed once.
3. The back-propagation learning rule is applied.
4. t ← t + 1; go to 2.
The idea of the recurrent connections is that the network is able to remember the previous states of the input values.
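One forward step of a small Elman net can be sketched as follows, assuming logistic hidden units and a single linear output unit; all weight values are placeholder numbers, not trained values:

```python
import math

def elman_step(x, context, W_in, W_ctx, W_out):
    """One forward pass of a tiny Elman net.

    x       : external inputs at step t
    context : hidden activations of step t-1 (fed back with fixed weight +1)
    Returns (output, new_context); the caller keeps new_context for step t+1.
    """
    hidden = []
    for h in range(len(W_in)):
        net = (sum(w * xi for w, xi in zip(W_in[h], x))
               + sum(w * c for w, c in zip(W_ctx[h], context)))
        hidden.append(1.0 / (1.0 + math.exp(-net)))      # logistic activation
    output = sum(w * a for w, a in zip(W_out, hidden))   # linear output unit
    return output, hidden                                # hidden becomes next context

# two inputs, three hidden units, three context units (placeholder weights)
W_in  = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W_ctx = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]
W_out = [0.7, -0.4, 0.5]

context = [0.0, 0.0, 0.0]                 # step 1: context units set to 0
for x_t in ([0.2, 0.5], [0.4, 0.3]):      # clamp the patterns in sequence
    F, context = elman_step(x_t, context, W_in, W_ctx, W_out)
```

Between the forward calculation of step 2 and the time update of step 4, ordinary back-propagation would adjust W_in, W_ctx and W_out, treating the context values as plain inputs.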

Example 13.1: As we mentioned above, the Jordan and Elman networks can be used to train a network on reproducing time sequences. As an example, we trained an Elman network on controlling an object moving in one dimension. This object has to follow a pre-specified trajectory xd. To control the object, forces F must be applied, since the object suffers from friction and perhaps other external forces. To tackle this problem, we use an Elman net with inputs x and xd, one output F, and three hidden units. The hidden units are connected to three context units. In total, five units feed into the hidden layer. The results of training are shown in Fig. 13.3.

Fig. 13.3 Training an Elman network to control an object. The solid line depicts the desired trajectory xd; the dashed line the realized trajectory. The third line is the error.

The same test can be done with an ordinary feed-forward network with sliding window input. We tested this with a network with five inputs, four of which constituted the sliding window x3, x2, x1 and x0, and one the desired next position of the object. Results are shown in Fig. 13.4.

Fig. 13.4 Training a feed-forward network to control an object. The solid line depicts the desired trajectory xd; the dashed line the realized trajectory. The third line is the error.

The disappointing observation is that the results are actually better with the ordinary feed-forward network, which has the same complexity as the Elman network.

13.2.3 Back-Propagation in Fully Recurrent Networks

More complex schemes than the above are possible. For instance, independently of each other, Pineda (1987) and Almeida (1987) discovered that error back-propagation is in fact a special case of a more general gradient learning method, which can be used for training attractor networks. However, also when a network does not reach a fixed point, a learning method can be used: back-propagation through time (Pearlmutter, 1989, 1990). This learning method, the discussion of which extends beyond the scope of our course, can be used to train a multi-layer perceptron to follow trajectories in its activation values.

13.3 THE HOPFIELD NETWORK

One of the earliest recurrent neural networks reported in literature was the auto-associator independently described by Anderson (1977) and Kohonen (1977). It consists of a pool of neurons with connections between each unit i and j, i ≠ j (see Fig. 13.5). All connections are weighted. All neurons are both input and output neurons. Hopfield (1982) brings together several earlier ideas concerning these networks and presents a complete mathematical analysis.

Fig. 13.5 The auto-associator network.

13.3.1 Description

The Hopfield network consists of a set of N interconnected neurons (Fig. 13.5), which update their activation values asynchronously and independently of other neurons. All neurons are both input and output neurons. The activation values are binary. Originally, Hopfield chose activation values of 1 and 0, but using values +1 and −1 presents some advantages discussed below. We will therefore adhere to the latter convention. The state of the system is given by the activation values Y = (yk).

The net input Sk(t + 1) of a neuron k at cycle t + 1 is a weighted sum

    Sk(t + 1) = Σ_{j≠k} yj(t) wjk + θk                              ...(13.1)

A simple threshold function is applied to the net input to obtain the new activation value yk(t + 1) at time t + 1:

    yk(t + 1) = +1      if Sk(t + 1) > Uk
              = −1      if Sk(t + 1) < Uk
              = yk(t)   otherwise                                   ...(13.2)

i.e., yk(t + 1) = sgn(Sk(t + 1)). For simplicity we henceforth choose Uk = 0, but this is of course not essential.

A neuron k in the Hopfield network is called stable at time t if, in accordance with equations (13.1) and (13.2),

    yk(t) = sgn(Sk(t − 1))                                          ...(13.3)

A state α is called stable if, when the network is in state α, all neurons are stable. A pattern x^p is called stable if, when x^p is clamped, all neurons are stable. When the network is used, a pattern is clamped, the network iterates to a stable state, and the output of the network consists of the new activation values of the neurons.

When the extra restriction wjk = wkj is made, the behavior of the system can be described with an energy function

    E = −(1/2) Σ_{j≠k} Σ yj yk wjk − Σk θk yk                       ...(13.4)

Theorem 13.1: A recurrent network with connections wjk = wkj, in which the neurons are updated using rule (13.2), has stable limit points.

Proof: First, note that the energy expressed in eq. (13.4) is bounded from below, since the yk are bounded and the wjk and θk are constant. Secondly, E is monotonically decreasing when state changes occur, because

    ΔE = −Δyk ( Σ_{j≠k} yj wjk + θk )                               ...(13.5)

is always negative when yk changes according to eqs. (13.1) and (13.2).
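Equations (13.1)-(13.4) translate directly into code; the small symmetric weight matrix below is an arbitrary example with all thresholds θk = 0, not a trained network:

```python
def energy(y, w):
    """E = -1/2 * sum_{j != k} y_j y_k w_jk, with all thresholds 0 (eq. 13.4)."""
    n = len(y)
    return -0.5 * sum(y[j] * y[k] * w[j][k]
                      for j in range(n) for k in range(n) if j != k)

def update_neuron(y, w, k):
    """Asynchronous update of neuron k by rule (13.2), with U_k = 0."""
    s = sum(y[j] * w[j][k] for j in range(len(y)) if j != k)   # eq. (13.1)
    if s > 0:
        y[k] = 1
    elif s < 0:
        y[k] = -1                      # s == 0 leaves y[k] unchanged

# symmetric example weights (w_jk = w_kj) for three +1/-1 neurons
w = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]
y = [1, -1, -1]
e_before = energy(y, w)
for k in range(3):
    update_neuron(y, w, k)
# by Theorem 13.1 the energy never increases under these updates
assert energy(y, w) <= e_before
```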

The advantage of a +1/−1 model over a 1/0 model is symmetry of the states of the network: when some pattern x is stable, its inverse is stable, too, whereas in the 1/0 model this is not always true (as an example, the pattern 00...00 is always stable, but 11...11 need not be). Similarly, both a pattern and its inverse have the same energy in the +1/−1 model. Removing the restriction of bidirectional connections (i.e., wjk = wkj) results in a system that is not guaranteed to settle to a stable state.

13.3.2 Hopfield Network as Associative Memory

A primary application of the Hopfield network is an associative memory. In this case, the weights of the connections between the neurons have to be thus set that the states of the system corresponding with the patterns which are to be stored in the network are stable. These states can be seen as dips in energy space. When the network is cued with a noisy or incomplete test pattern, it will render the incorrect or missing data by iterating to a stable state, which is in some sense near to the cued pattern.

The Hebb rule can be used to store P patterns:

    wjk = Σ_{p=1}^{P} x_j^p x_k^p   if j ≠ k
        = 0                          otherwise                      ...(13.6)

i.e., if x_j^p and x_k^p are equal, wjk is increased, otherwise decreased by one (note that, in the original Hebb rule, weights only increase). It appears, however, that the network gets saturated very quickly, and that about 0.15N memories can be stored before recall errors become severe.

There are two problems associated with storing too many patterns:
1. The stored patterns become unstable.
2. Spurious stable states appear (i.e., stable states which do not correspond with stored patterns).

The first of these two problems can be solved by an algorithm proposed by Bruce et al. (Bruce, Canning, Forrest, Gardner, & Wallace, 1986).

Algorithm 13.1: Given a starting weight matrix W = [wjk], for each pattern x^p to be stored and each element x_k^p in x^p define a correction εk such that

    εk = 0   if yk is stable and x^p is clamped
       = 1   otherwise                                              ...(13.7)

Now modify wjk by Δwjk = yj yk (εj + εk) if j ≠ k. Repeat this procedure until all patterns are stable. It appears that, in practice, this algorithm usually converges. There exist cases, however, where the algorithm remains oscillatory (try to find one)!

The second problem stated above can be alleviated by applying the Hebb rule in reverse to the spurious stable state, but with a low learning factor (Hopfield, Feinstein, & Palmer, 1983). Thus these patterns are weakly unstored and will become unstable again.
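The Hebb rule (13.6) and the recall-by-iteration behaviour can be sketched as follows; the two stored patterns are arbitrary examples, and the fixed number of sweeps is a simplification of "iterate until stable":

```python
def hebb_weights(patterns):
    """w_jk = sum_p x_j^p x_k^p for j != k, zero on the diagonal (eq. 13.6)."""
    n = len(patterns[0])
    return [[0 if j == k else sum(p[j] * p[k] for p in patterns)
             for k in range(n)] for j in range(n)]

def recall(w, cue, sweeps=5):
    """Clamp a (possibly noisy) cue and iterate the +1/-1 threshold updates."""
    y = list(cue)
    n = len(y)
    for _ in range(sweeps):
        for k in range(n):
            s = sum(y[j] * w[j][k] for j in range(n) if j != k)
            if s > 0:
                y[k] = 1
            elif s < 0:
                y[k] = -1
    return y

stored = [[1, 1, 1, -1, -1, -1], [1, -1, 1, -1, 1, -1]]
w = hebb_weights(stored)
noisy = [1, 1, 1, -1, -1, 1]          # first pattern with the last bit flipped
print(recall(w, noisy))               # -> [1, 1, 1, -1, -1, -1]
```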

13.3.3 Neurons with Graded Response

The network described in section 13.3.1 can be generalized by allowing continuous activation values. Here, the threshold activation function is replaced by a sigmoid. As before, this system can be proved to be stable when a symmetric weight matrix is used (Hopfield, 1984).

13.3.4 Hopfield Networks for Optimization Problems

An interesting application of the Hopfield network with graded response arises in a heuristic solution to the NP-complete traveling salesman problem (Garey & Johnson, 1979). In this problem, a path of minimal distance must be found between n cities, such that the begin- and end-points are the same.

Hopfield and Tank (1985) use a network with n × n neurons. Each row in the matrix represents a city, whereas each column represents the position in the tour. When the network is settled, each row and each column should have one and only one active neuron, indicating a specific city occupying a specific position in the tour. The activation value y_Xj = 1 indicates that city X occupies the jth place in the tour.

An energy function describing this problem can be set up as follows. To ensure a correct solution, the following energy must be minimized:

    E = (A/2) Σ_X Σ_j Σ_{k≠j} y_Xj y_Xk
      + (B/2) Σ_j Σ_X Σ_{Y≠X} y_Xj y_Yj
      + (C/2) ( Σ_X Σ_j y_Xj − n )²                                 ...(13.8)

where A, B, and C are constants. The first and second terms in equation (13.8) are zero if and only if there is a maximum of one active neuron in each row and column, respectively. The last term is zero if and only if there are exactly n active neurons. To minimise the distance of the tour, an extra term

    E = (D/2) Σ_X Σ_{Y≠X} Σ_j d_XY y_Xj ( y_{Y,j+1} + y_{Y,j−1} )   ...(13.9)

is added to the energy, where d_XY is the distance between cities X and Y and D is a constant. For convenience, the subscripts are defined modulo n. The neurons are updated using rule (13.2) with a sigmoid activation function between 0 and 1. The weights are set as follows:

    w_{Xj,Yk} = −A δ_XY (1 − δ_jk)               inhibitory connections within each row
              − B δ_jk (1 − δ_XY)                inhibitory connections within each column
              − C                                global inhibition
              − D d_XY (δ_{k,j+1} + δ_{k,j−1})   data term          ...(13.10)

where δ_jk = 1 if j = k and 0 otherwise. Finally, each neuron has an external bias input Cn.

Although this application is interesting from a theoretical point of view, the applicability is limited. Whereas Hopfield and Tank state that the network converges to a valid solution in 16 out of 20 trials while 50% of the solutions are optimal, other reports show less encouraging results. For example, Wilson and Pawley (1988) find that in only 15% of the runs a valid result is obtained, few of which lead

to an optimal or near-optimal solution. The main problem is the lack of global information. Since, for an N-city problem, there are N! possible tours, each of which may be traversed in two directions as well as started in N points, the number of different tours is N!/2N. Differently put, the N-dimensional hypercube in which the solutions are situated is 2N degenerate. The degenerate solutions occur evenly within the hypercube, such that all but one of the final 2N configurations are redundant. The competition between the degenerate tours often leads to solutions which are piecewise optimal but globally inefficient.

13.4 BOLTZMANN MACHINES

The Boltzmann machine, as first described by Ackley, Hinton, and Sejnowski in 1985, is a neural network that can be seen as an extension of Hopfield networks to include hidden units, and with a stochastic instead of deterministic update rule. The weights are still symmetric. The operation of the network is based on the physics principle of annealing. This is a process whereby a material is heated and then cooled very, very slowly to a freezing point. As a result, the crystal lattice will be highly ordered, without any impurities, such that the system is in a state of very low energy. In the Boltzmann machine this system is mimicked by changing the deterministic update of equation (13.2) into a stochastic update, in which a neuron becomes active with a probability p:

    p(yk ← +1) = 1 / (1 + e^(−ΔEk/T))                               ...(13.11)

where T is a parameter comparable with the (synthetic) temperature of the system. This stochastic activation function is not to be confused with neurons having a sigmoid deterministic activation function.

In accordance with a physical system obeying a Boltzmann distribution, the network will eventually reach thermal equilibrium and the relative probability of two global states α and β will follow the Boltzmann distribution

    Pα / Pβ = e^(−(Eα − Eβ)/T)                                      ...(13.12)

where Pα is the probability of being in the αth global state, and Eα is the energy of that state. Note that at thermal equilibrium the units still change state, but the probability of finding the network in any global state remains constant.

At low temperatures there is a strong bias in favor of states with low energy, but the time required to reach equilibrium may be long. At higher temperatures the bias is not so favorable, but equilibrium is reached faster. A good way to beat this trade-off is to start at a high temperature and gradually reduce it. At high temperatures, the network will ignore small energy differences and will rapidly approach equilibrium. In doing so, it will perform a search of the coarse overall structure of the space of global states, and will find a good minimum at that coarse level. As the temperature is lowered, it will begin to respond to smaller energy differences and will find one of the better minima within the coarse-scale minimum it discovered at high temperature.
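The stochastic update (13.11) combined with a simple cooling schedule can be sketched as follows. The weights, the schedule, and the use of 2·net as the energy gap ΔEk of a ±1 unit are illustrative assumptions, not prescriptions from the text:

```python
import math
import random

def boltzmann_sweep(y, w, T, rng):
    """One asynchronous sweep: unit k turns on with p = 1/(1 + exp(-dE_k/T))."""
    n = len(y)
    for k in range(n):
        net = sum(y[j] * w[j][k] for j in range(n) if j != k)
        p_on = 1.0 / (1.0 + math.exp(-2.0 * net / T))   # assumed gap dE_k = 2*net
        y[k] = 1 if rng.random() < p_on else -1

def anneal(y, w, schedule, rng):
    """Start at a high temperature and cool down gradually."""
    for T in schedule:
        boltzmann_sweep(y, w, T, rng)
    return y

rng = random.Random(0)
w = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]                 # symmetric weights
y = [rng.choice((-1, 1)) for _ in range(3)]
anneal(y, w, schedule=[10.0, 5.0, 2.0, 1.0, 0.5, 0.1], rng=rng)
```

Early sweeps at high T flip units almost at random; as T shrinks, flips that raise the energy become increasingly unlikely.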


As with multi-layer perceptrons, the Boltzmann machine consists of a non-empty set of visible and a possibly empty set of hidden units. Here, however, the units are binary-valued and are updated stochastically and asynchronously. The simplicity of the Boltzmann distribution leads to a simple learning procedure, which adjusts the weights so as to use the hidden units in an optimal way (Ackley et al., 1985).

This algorithm works as follows: First, the input and output vectors are clamped. The network is then annealed until it approaches thermal equilibrium at a temperature of 0. It then runs for a fixed time at equilibrium and each connection measures the fraction of the time during which both the units it connects are active. This is repeated for all input-output pairs so that each connection can measure ⟨yj yk⟩^clamped, the expected probability, averaged over all cases, that units j and k are simultaneously active at thermal equilibrium when the input and output vectors are clamped. Similarly, ⟨yj yk⟩^free is measured when the output units are not clamped but determined by the network.

In order to determine optimal weights in the network, an error function must be determined. Now, the probability P^free(Y^p) that the visible units are in state Y^p when the system is running freely can be measured. Also, the desired probability P^clamped(Y^p) that the visible units are in state Y^p is determined by clamping the visible units and letting the network run. Now, if the weights in the network are correctly set, both probabilities are equal to each other, and the error E in the network must be 0. Otherwise, the error must have a positive value measuring the discrepancy between the network's internal model and the environment. For this effect, the asymmetric divergence or Kullback information is used:

    E = Σ_p P^clamped(Y^p) log [ P^clamped(Y^p) / P^free(Y^p) ]     ...(13.13)

Now, in order to minimize E using gradient descent, we must change the weights according to

    Δwjk = −γ ∂E/∂wjk                                               ...(13.14)

It is not difficult to show that

    ∂E/∂wjk = −(1/T) [ ⟨yj yk⟩^clamped − ⟨yj yk⟩^free ]             ...(13.15)

Therefore, each weight is updated by

    Δwjk = γ [ ⟨yj yk⟩^clamped − ⟨yj yk⟩^free ]                     ...(13.16)
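Given measured co-occurrence statistics, the weight change (13.16) is one multiplication per connection; the correlation matrices below are made-up illustrations, not measured values:

```python
def boltzmann_weight_update(clamped_corr, free_corr, gamma):
    """dw_jk = gamma * (<y_j y_k>_clamped - <y_j y_k>_free), eq. (13.16)."""
    n = len(clamped_corr)
    return [[gamma * (clamped_corr[j][k] - free_corr[j][k])
             for k in range(n)] for j in range(n)]

# hypothetical co-occurrence averages measured at thermal equilibrium
clamped = [[1.0, 0.8], [0.8, 1.0]]
free    = [[1.0, 0.2], [0.2, 1.0]]
dW = boltzmann_weight_update(clamped, free, gamma=0.5)
# dW[0][1] is approximately 0.3: units 0 and 1 co-occur more often in the
# clamped phase, so the connection between them is strengthened
```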


QUESTION BANK.

1. What happens when cyclic data are introduced to feed-forward networks?
2. Explain the generalized delta-rule in recurrent networks.
3. Describe the Jordan network with an example.
4. Describe the Elman network with an example.
5. Describe the Hopfield network.
6. Describe the Hopfield network as associative memory.
7. Describe the Hopfield network for optimization problems.
8. Describe the Boltzmann machine.
9. What problems result when storing too many patterns using associative memory? How can these problems be solved?

REFERENCES.

1. M.I. Jordan, Attractor dynamics and parallelism in a connectionist sequential machine, In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Hillsdale, NJ: Erlbaum, pp. 531-546, 1986.
2. M.I. Jordan, Serial Order: A Parallel Distributed Processing Approach (Tech. Rep. No. 8604), San Diego, La Jolla, CA: Institute for Cognitive Science, University of California, 1986.
3. J.L. Elman, Finding structure in time, Cognitive Science, Vol. 14, pp. 179-211, 1990.
4. F. Pineda, Generalization of back-propagation to recurrent neural networks, Physical Review Letters, Vol. 59, pp. 2229-2232, 1987.
5. L.B. Almeida, A learning rule for asynchronous perceptrons with feedback in a combinatorial environment, In Proceedings of the First International Conference on Neural Networks, Vol. 2, pp. 609-618, 1987.
6. B.A. Pearlmutter, Learning state space trajectories in recurrent neural networks, Neural Computation, Vol. 1, No. 2, pp. 263-269, 1989.
7. B.A. Pearlmutter, Dynamic Recurrent Neural Networks (Tech. Rep. No. CMU-CS-90-196), Pittsburgh, PA 15213: School of Computer Science, Carnegie Mellon University, 1990.
8. J.A. Anderson, Neural models with cognitive implications, In D. LaBerge and S.J. Samuels (Eds.), Basic Processes in Reading Perception and Comprehension Models, Hillsdale, NJ: Erlbaum, pp. 27-90, 1977.
9. T. Kohonen, Associative Memory: A System-Theoretical Approach, Springer-Verlag, 1977.
10. J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, 1982.
11. A.D. Bruce, A. Canning, B. Forrest, E. Gardner, and D.J. Wallace, Learning and memory properties in fully connected networks, In J.S. Denker (Ed.), AIP Conference Proceedings 151, Neural Networks for Computing, pp. 65-70, 1986.


12. J.J. Hopfield, D.I. Feinstein, and R.G. Palmer, Unlearning has a stabilizing effect in collective memories, Nature, Vol. 304, pp. 158-159, 1983.
13. J.J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proceedings of the National Academy of Sciences, Vol. 81, pp. 3088-3092, 1984.
14. M.R. Garey, and D.S. Johnson, Computers and Intractability, New York: W.H. Freeman, 1979.
15. J.J. Hopfield, and D.W. Tank, Neural computation of decisions in optimization problems, Biological Cybernetics, Vol. 52, pp. 141-152, 1985.
16. G.V. Wilson, and G.S. Pawley, On the stability of the traveling salesman problem algorithm of Hopfield and Tank, Biological Cybernetics, Vol. 58, pp. 63-70, 1988.
17. D.H. Ackley, G.E. Hinton, and T.J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science, Vol. 9, No. 1, pp. 147-169, 1985.

14 Self-Organizing Networks

14.1 INTRODUCTION

In the previous chapters we discussed a number of networks, which were trained to perform a mapping F: ℝⁿ → ℝᵐ by presenting the network examples (x^p, d^p) with d^p = F(x^p) of this mapping. However, problems exist where such training data, consisting of input and desired output pairs, are not available, but where the only information is provided by a set of input patterns x^p. In these cases the relevant information has to be found within the (redundant) training samples x^p. Some examples of such problems are:

- Clustering: the input data may be grouped in clusters and the data processing system has to find these inherent clusters in the input data. The output of the system should give the cluster label of the input pattern (discrete output);
- Vector quantisation: this problem occurs when a continuous space has to be discretized. The input of the system is the n-dimensional vector x; the output is a discrete representation of the input space. The system has to find an optimal discretization of the input space;
- Dimensionality reduction: the input data are grouped in a subspace, which has lower dimensionality than the dimensionality of the data. The system has to learn an optimal mapping, such that most of the variance in the input data is preserved in the output data;
- Feature extraction: the system has to extract features from the input signal. This often means a dimensionality reduction as described above.

In this chapter we discuss a number of neuro-computational approaches for these kinds of problems. Training is done without the presence of an external teacher. The unsupervised weight adapting algorithms are usually based on some form of global competition between the neurons. There are very many types of self-organizing networks, applicable to a wide area of problems. One of the most basic schemes is competitive learning as proposed by Rumelhart and Zipser (1985).
A very similar network but with different emergent properties is the topology-conserving map devised by Kohonen. Other self-organizing networks are ART, proposed by Carpenter and Grossberg (1987), and the cognitron, proposed by Fukushima (1975).


14.2 COMPETITIVE LEARNING

14.2.1 Clustering

Competitive learning is a learning procedure that divides a set of input patterns in clusters that are inherent to the input data. A competitive learning network is provided only with input vectors x and thus implements an unsupervised learning procedure. We will show its equivalence to a class of traditional clustering algorithms shortly. Another important use of these networks is vector quantisation. An example of a competitive learning network is shown in Fig. 14.1. All output units o are connected to all input units i with weights wio. When an input pattern x is presented, only a single output unit of the network (the winner) will be activated. In a correctly trained network, all x in one cluster will have the same winner. For the determination of the winner and the corresponding learning rule, two methods exist.

Fig. 14.1 A simple competitive learning network. Each of the four outputs o is connected to all inputs i with weights wio.

Winner Selection: Dot Product

For the time being, we assume that both input vectors x and weight vectors wo are normalized to unit length. Each output unit o calculates its activation value yo according to the dot product of input and weight vector:

    yo = Σi wio xi = wo^T x                                         ...(14.1)

In a next pass, output neuron k is selected with maximum activation:

    ∀o ≠ k: yo ≤ yk                                                 ...(14.2)

Activations are reset such that yk = 1 and yo≠k = 0. This is the competitive aspect of the network, and we refer to the output layer as the winner-take-all layer. The winner-take-all layer is usually implemented in software by simply selecting the output neuron with highest activation value. This function can also be performed by a neural network known as MAXNET (Lippmann, 1989). In MAXNET, all neurons o are connected to other units o′ with inhibitory links and to itself with an excitatory link:

    wo,o′ = −ε   if o ≠ o′
          = +1   otherwise                                          ...(14.3)
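In software, equations (14.1)-(14.2) reduce to an argmax over dot products; the normalized vectors below are arbitrary examples:

```python
def winner_dot(weights, x):
    """Return the index k of the output unit with maximal y_o = w_o^T x."""
    activations = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=lambda o: activations[o])

# three normalized weight vectors and one normalized input vector
weights = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x = [0.8, 0.6]
k = winner_dot(weights, x)   # unit 2 wins: 0.6*0.8 + 0.8*0.6 = 0.96 is largest
```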

It can be shown that this network converges to a situation where only the neuron with highest initial activation survives, whereas the activations of all other neurons converge to zero. Once the winner k has been selected, the weights are updated according to:

    wk(t + 1) = (wk(t) + γ(x(t) − wk(t))) / ||wk(t) + γ(x(t) − wk(t))||      ...(14.4)

where the divisor ensures that all weight vectors w are normalized. Note that only the weights of winner k are updated. The weight update given in equation (14.4) effectively rotates the weight vector wo towards the input vector x. Each time an input x is presented, the weight vector closest to this input is selected and is subsequently rotated towards the input. Consequently, weight vectors are rotated towards those areas where many inputs appear: the clusters in the input. This procedure is visualized in Fig. 14.2.

Fig. 14.2 Example of clustering in 3D with normalized vectors, which all lie on the unity sphere. The three weight vectors are rotated towards the centers of gravity of the three different input clusters.

Winner selection: Euclidean distance

Previously it was assumed that both inputs x and weight vectors w were normalized. Using the activation function given in equation (14.1) gives a biologically plausible solution. In Fig. 14.3 it is shown how the algorithm would fail if unnormalized vectors were to be used. Naturally one would like to accommodate the algorithm for unnormalized input data. To this end, the winning neuron k is selected with its weight vector wk closest to the input pattern x, using the Euclidean distance measure:

    k: ||wk − x|| ≤ ||wo − x||   ∀o                                 ...(14.5)

It is easily checked that equation (14.5) reduces to (14.1) and (14.2) if all vectors are normalized. The Euclidean distance norm is therefore a more general case of equations (14.1) and (14.2). Instead of rotating the weight vector towards the input as performed by equation (14.4), the weight update must be changed to implement a shift towards the input:

    wk(t + 1) = wk(t) + γ(x(t) − wk(t))                             ...(14.6)

Again only the weights of the winner are updated. From now on, we will simply assume a winner k is selected without being concerned which algorithm is used.
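Equations (14.5) and (14.6) together give one step of competitive learning for unnormalized data; the weights, input, and learning rate below are arbitrary illustrations:

```python
def winner_euclidean(weights, x):
    """Eq. (14.5): the winner k satisfies ||w_k - x|| <= ||w_o - x|| for all o."""
    dist2 = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return min(range(len(weights)), key=lambda o: dist2[o])

def competitive_step(weights, x, gamma):
    """Eq. (14.6): shift only the winning weight vector towards the input."""
    k = winner_euclidean(weights, x)
    weights[k] = [wi + gamma * (xi - wi) for wi, xi in zip(weights[k], x)]
    return k

weights = [[0.0, 0.0], [1.0, 1.0]]
k = competitive_step(weights, [0.2, 0.0], gamma=0.5)
# unit 0 wins and moves halfway towards the input: weights[0] == [0.1, 0.0]
```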

A point of attention in these recursive clustering techniques is the initialization. Especially if the input vectors are drawn from a large or high-dimensional input space, it is not beyond imagination that a randomly initialized weight vector wo will never be chosen as the winner and will thus never be moved and never be used. Therefore, it is customary to initialize weight vectors to a set of input patterns {x} drawn from the input set at random.

Fig. 14.3 Determining the winner in a competitive learning network. a. Three normalized vectors. b. The three vectors having the same directions as in a., but with different lengths. In a., vectors x and w1 are nearest to each other, and their dot product x^T w1 = |x||w1| cos α is larger than the dot product of x and w2. In b., however, the pattern and weight vectors are not normalized, and in this case w2 should be considered the winner when x is applied. Note that the dot product x^T w1 is still larger than x^T w2.

Another more thorough approach that avoids these and other problems in competitive learning is called leaky learning. This is implemented by expanding the weight update given in equation (14.6) with

    wl(t + 1) = wl(t) + γ′(x(t) − wl(t))   ∀l ≠ k                   ...(14.7)

with γ′ < γ the leaky learning rate. A somewhat similar method is known as frequency sensitive competitive learning (Ahalt, Krishnamurthy, Chen, & Melton, 1990). In this algorithm, each neuron records the number of times it is selected winner. The more often it wins, the less sensitive it becomes to competition. Conversely, neurons that consistently fail to win increase their chances of being selected winner.

Cost function: Earlier it was claimed that a competitive network performs a clustering process on the input data, i.e., input patterns are divided in disjoint clusters such that similarities between input patterns in the same cluster are much bigger than similarities between inputs in different clusters. Similarity is measured by a distance function on the input vectors, as discussed before. A common criterion to measure the quality of a given clustering is the square error criterion, given by

    E = Σp ||wk − x^p||²                                            ...(14.8)

where k is the winning neuron when input x^p is presented. The weights w are interpreted as cluster centres. It is not difficult to show that competitive learning indeed seeks to find a minimum for this square error by following the negative gradient of the error-function.
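The square error criterion (14.8) can be computed by summing, for every pattern, the squared distance to its winning weight vector; the data below are arbitrary:

```python
def winner_euclidean(weights, x):
    """Winner by minimal Euclidean distance, as in eq. (14.5)."""
    dist2 = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return min(range(len(weights)), key=lambda o: dist2[o])

def square_error(weights, data):
    """Eq. (14.8): E = sum_p ||w_k - x^p||^2, with k the winner for x^p."""
    total = 0.0
    for x in data:
        k = winner_euclidean(weights, x)
        total += sum((wi - xi) ** 2 for wi, xi in zip(weights[k], x))
    return total

weights = [[0.0, 0.0], [1.0, 1.0]]
data = [[0.1, 0.0], [0.9, 1.0]]
E = square_error(weights, data)      # approximately 0.01 + 0.01 = 0.02
```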

Theorem 14.1: The error function for pattern x^p,

    E^p = ½ Σi (wki − x_i^p)²                                       ...(14.9)

where k is the winning unit, is minimised by the weight update rule in eq. (14.6).

Proof: As in eq. (3.12), we calculate the effect of a weight change on the error function. So we have

    Δp wio = −γ ∂E^p/∂wio                                           ...(14.10)

where γ is a constant of proportionality. Now, we have to determine the partial derivative of E^p:

    ∂E^p/∂wio = wio − x_i^p   if unit o wins
              = 0             otherwise                             ...(14.11)

such that

    Δp wio = −γ (wio − x_i^p) = γ (x_i^p − wio)                     ...(14.12)

which is eq. (14.6) written down for one element of wo.

Example 14.1: In Fig. 14.4, 8 clusters of each 6 data points are depicted. A competitive learning network using Euclidean distance to select the winner was initialized with all weight vectors wo = 0. The network was trained with γ = 0.1 and γ′ = 0.001, and the positions of the weights after 500 iterations are shown.

Fig. 14.4 Competitive learning for clustering data. The data are given by +. The positions of the weight vectors after 500 iterations is given by o.

14.2.2 Vector Quantisation

Another important use of competitive learning networks is found in vector quantisation. A vector quantisation scheme divides the input space in a number of disjoint subspaces and represents each input vector x by the label of the subspace it falls into (i.e., the index k of the winning neuron). The difference with clustering is that we are not so much interested in finding clusters of similar data, but more in quantising the entire input space. The quantisation performed by the competitive learning network is said to track the input probability density function: the density of neurons, and thus of subspaces, is highest in those areas where inputs are most likely to appear, whereas a more coarse quantisation is obtained in those areas where inputs are scarce.

An example of tracking the input density is sketched in Fig. 14.5. The input patterns are drawn from ℝ²; the weight vectors also lie in ℝ². In the areas where inputs are scarce, the upper part of the figure, only few (in this case two) neurons are used to discretise the input space; thus, the upper part of the input space is divided into two large separate regions. The lower part, however, where many more inputs have occurred, is discretised by five neurons into five smaller subspaces.

Fig. 14.5 This figure visualizes the tracking of the input density: the input space ℝ² is quantised coarsely where input patterns are scarce and finely where they are dense.

In this way, competitive learning can be used in applications where data has to be compressed, such as telecommunication or storage. However, competitive learning has also been used in combination with supervised learning methods, and can be applied to function approximation problems or classification problems. We will describe two examples: the counter propagation method and learning vector quantisation.

14.2.3 Counter Propagation

In a large number of applications, networks that perform vector quantisation are combined with another type of network in order to perform function approximation. An example of such a network is given in Fig. 14.6.

Fig. 14.6 A network combining a vector quantisation layer with a 1-layer feed-forward neural network. This network can be used to approximate functions from ℝ² to ℝ²; the input space ℝ² is discretised in 5 disjoint subspaces.

A well-known example of such a network is the counter propagation network (Hecht-Nielsen, 1988). Depending on the application, one can choose to perform the vector quantisation before learning the function approximation, or one can choose to learn the quantisation and the approximation layer simultaneously. As an example of the latter, the network presented in Fig. 14.6 can be supervisedly trained in the following way:

1. Present the network with both input x and function value d = f(x).
2. Perform the unsupervised quantisation step: for each weight vector, calculate the distance from the weight vector to the input pattern, find the winner k, and update the weights w_ih with equation (14.6).
3. Perform the supervised approximation step:

w_ko(t + 1) = w_ko(t) + γ (d_o − w_ko(t))   ...(14.14)

This is simply the δ rule with y_o = Σ_h y_h w_ho = w_ko when k is the winning neuron, and the desired output is given by d = f(x).

If we define a function g(x, k) as

g(x, k) = 1 if k is winner, and 0 otherwise   ...(14.13)

it can be shown that this learning procedure converges to

w_ho = ∫_{ℝ^n} y_o g(x, h) dx   ...(14.15)

This network can approximate a function f: ℝ^n → ℝ^m by associating with each neuron o a function value [w_1o, w_2o, ..., w_mo]^T which is somehow representative for the function values f(x) of inputs x represented by o. This way of approximating a function effectively implements a look-up table: an input x is assigned to a table entry k with ∀ o ≠ k: ‖x − w_k‖ ≤ ‖x − w_o‖, and the function value [w_1k, w_2k, ..., w_mk]^T in this table entry is taken as an approximation of f(x).
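The three training steps can be sketched for a one-dimensional input as follows. This is an illustrative reading of the procedure, with the number of units, the learning rate, and the target function chosen arbitrarily; the discontinuous target is deliberate, since the text notes such functions suit this scheme.

```python
import numpy as np

def train_counterprop(X, d, n_units, gamma=0.2, epochs=200, seed=0):
    """Jointly learn a quantisation of the input space (centres) and a
    function table, following steps 1-3 of the text.
    X: (n, dim) inputs; d: (n,) scalar target function values."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), n_units, replace=False)
    centres = X[idx].copy()              # quantisation layer weights
    table = d[idx].astype(float).copy()  # approximation layer weights
    for _ in range(epochs):
        for p in rng.permutation(len(X)):
            k = np.argmin(np.linalg.norm(centres - X[p], axis=1))
            centres[k] += gamma * (X[p] - centres[k])  # unsupervised step, eq. (14.6)
            table[k] += gamma * (d[p] - table[k])      # delta rule, eq. (14.14)
    return centres, table

def predict(x, centres, table):
    """Look-up table: return the stored value of the winning entry."""
    return table[np.argmin(np.linalg.norm(centres - x, axis=1))]
```

Each table entry settles near the mean target value over its subspace, the limit stated in eq. (14.15).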

Each table entry thus converges to the mean function value over all inputs in the subspace represented by that table entry. As we have seen before, the quantisation scheme tracks the input probability density function, which results in a better approximation of the function in those areas where input is most likely to appear.

Not all functions are represented accurately by this combination of quantisation and approximation layers. E.g., a simple identity or combinations of sines and cosines are much better approximated by multilayer back-propagation networks if the activation functions are chosen appropriately. However, if we expect our input to be (a subspace of) a high dimensional input space ℝ^n and we expect our function f to be discontinuous at numerous points, the combination of quantisation and approximation is not uncommon and probably very efficient. Of course this combination extends itself much further than the presented combination of the single layer competitive learning network and the single layer feed-forward network. The latter could be replaced by a reinforcement learning procedure (see chapter 15). The quantisation layer can be replaced by various other quantisation schemes, such as Kohonen networks or octree methods (Jansen, Smagt, & Groen, 1994). In fact, various modern statistical function approximation methods (Breiman, Friedman, Olshen, & Stone, 1984; Friedman, 1991) are based on this very idea, extended with the possibility to have the approximation layer influence the quantisation layer (e.g., to obtain a better or locally more fine-grained quantisation).

14.2.4 Learning Vector Quantisation

It is an unpleasant habit in neural network literature to also cover Learning Vector Quantisation (LVQ) methods in chapters on unsupervised clustering. Granted that these methods also perform a clustering or quantisation task and use similar learning rules, they are trained supervisedly and perform discriminant analysis rather than unsupervised clustering. These networks attempt to define decision boundaries in the input space, given a large set of exemplary decisions (the training set); each decision could, e.g., be a correct class label. A rather large number of slightly different LVQ methods is appearing in recent literature. They are all based on the following basic algorithm:

1. With each output neuron o, a class label (or decision of some other kind) y_o is associated.
2. A learning sample consists of input vector x^p together with its correct class label y_o^p.
3. Using distance measures between weight vectors w_o and input vector x^p, not only the winner k1 is determined, but also the second best k2: ‖x^p − w_k1‖ < ‖x^p − w_k2‖ < ‖x^p − w_i‖ ∀ o ≠ k1, k2.
4. The labels y_k1^p, y_k2^p are compared with d^p. The weight update rule given in equation (14.6) is used selectively based on this comparison.

An example of the last step is given by the LVQ2 algorithm of Kohonen (1977), which uses the following strategy:

if y_k1^p ≠ d^p and d^p = y_k2^p and ‖x^p − w_k2‖ / ‖x^p − w_k1‖ < e, then
w_k2(t + 1) = w_k2(t) + γ (x − w_k2(t)) and w_k1(t + 1) = w_k1(t) − γ (x − w_k1(t)),
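A single LVQ2 update can be sketched as below. The exact threshold of the window condition is not legible in the text, so the ratio test with e slightly larger than 1 is an assumption of this sketch, as are the parameter values.

```python
import numpy as np

def lvq2_step(W, labels, x, d, gamma=0.05, e=1.3):
    """One LVQ2 update: if the winner k1 carries the wrong label, the
    runner-up k2 the correct one, and x lies in the window
    ||x - w_k2|| / ||x - w_k1|| < e, move w_k2 toward x and w_k1 away."""
    dist = np.linalg.norm(W - x, axis=1)
    k1, k2 = np.argsort(dist)[:2]          # winner and second-best
    if labels[k1] != d and labels[k2] == d and dist[k2] / dist[k1] < e:
        W[k2] += gamma * (x - W[k2])       # correct prototype: attract
        W[k1] -= gamma * (x - W[k1])       # wrong prototype: repel
    return W
```

Applied to a misclassified sample near the decision boundary, the correct prototype moves toward the sample and the wrong one away from it, shifting the boundary in the right direction.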

i.e., w_k2 with the correct label is moved towards the input vector, while w_k1 with the incorrect label is moved away from it. The new LVQ algorithms that are emerging all use different implementations of these different steps, e.g., how to define the class labels y_o, how many next-best winners are to be determined, how to adapt the number of output neurons, and how to selectively use the weight update rule.

14.3 KOHONEN NETWORK

The Kohonen network (1982, 1984) can be seen as an extension to the competitive learning network, although this is chronologically incorrect. Also, the Kohonen network has a different set of applications. In the Kohonen network, the output units in S are ordered in some fashion, often in a two-dimensional grid or array, although this is application-dependent. The ordering, which is chosen by the user, determines which output neurons are neighbours.

Now, when learning patterns are presented to the network, the weights to the output units are adapted such that the order present in the input space ℝ^N is preserved in the output, i.e., in the neurons in S. This means that learning patterns which are near to each other in the input space (where 'near' is determined by the distance measure used in finding the winning unit) must be mapped on output units which are also near to each other, i.e., the same or neighbouring units. Thus, the topology inherently present in the input signals will be preserved in the mapping, such as depicted in Fig. 14.8. If the intrinsic dimensionality of S is less than N, the neurons in the network are 'folded' in the input space, such as depicted in Fig. 14.9.

The mapping, which represents a discretisation of the input space, is said to be topology preserving. However, if the inputs are restricted to a subspace of ℝ^N, a Kohonen network of lower dimensionality can be used. For example, data on a two-dimensional manifold in a high dimensional input space can be mapped onto a two-dimensional Kohonen network, which can for example be used for visualization of the data.

Usually, the learning patterns are random samples from ℝ^N. At time t, a sample x(t) is generated and presented to the network. Using the same formulas as in section 14.2, the winning unit k is determined. Next, the weights to this winning unit as well as its neighbours are adapted using the learning rule

w_o(t + 1) = w_o(t) + γ g(o, k)(x(t) − w_o(t))   ...(14.16)

Here, g(o, k) is a decreasing function of the grid-distance between units o and k, such that g(k, k) = 1. For example, for g(·) a Gaussian function can be used, such that (in one dimension!) g(o, k) = exp(−(o − k)²) (see Fig. 14.7). Due to this collective learning scheme, input signals which are near to each other will be mapped on neighbouring neurons. Thus, if inputs are uniformly distributed in ℝ^N and the order must be preserved, the dimensionality of S must be at least N.

The topology-conserving quality of this network has many counterparts in biological brains. The brain is organized in many places so that aspects of the sensory environment are represented in the form of two-dimensional maps. For example, in the visual system, there are several topographic mappings of visual space onto the surface of the visual cortex. There are organized mappings of the body surface
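The learning rule (14.16) can be sketched for a one-dimensional map of a one-dimensional input. The decaying neighbourhood width used below is a common practical choice, not something specified in the text; all parameter values are illustrative.

```python
import numpy as np

def som_1d(X, n_units=10, gamma=0.2, sigma=2.0, epochs=50, seed=0):
    """Train a one-dimensional Kohonen map on data X (n_samples x dim).
    g(o, k) = exp(-(o - k)^2 / s^2) is the Gaussian grid-distance function
    of the text; its width s shrinks over the epochs."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(), X.max(), (n_units, X.shape[1]))
    grid = np.arange(n_units)
    for t in range(epochs):
        s = sigma * (0.05 / sigma) ** (t / epochs)     # decaying neighbourhood
        for x in X[rng.permutation(len(X))]:
            k = np.argmin(np.linalg.norm(W - x, axis=1))  # winning unit
            g = np.exp(-((grid - k) ** 2) / s ** 2)       # g(o, k)
            W += gamma * g[:, None] * (x - W)             # eq. (14.16)
    return W
```

After training on uniform data, the weights quantise the whole input range, illustrating the density tracking discussed in section 14.2.2.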

Fig. 14.7 Gaussian neuron distance function g(·). In this case, g(·) is shown for a two-dimensional grid.

Fig. 14.8 A topology-conserving map converging. The weight vectors of a network with two inputs and 8 × 8 output neurons arranged in a planar grid are shown at iterations 0, 200, 600 and 1900. A line in each figure connects weight w_{i,(o1,o2)} with weights w_{i,(o1+1,o2)} and w_{i,(o1,o2+1)}. The leftmost figure shows the initial weights; the rightmost shows the map when it is almost completely formed.

Fig. 14.9 The mapping of a two-dimensional input space on a one-dimensional Kohonen network.

onto the cortex in both motor and somatosensory areas, and tonotopic mappings of frequency in the auditory cortex. The use of topographic representations, where some important aspect of a sensory modality is related to the physical locations of the cells on a surface, is so common that it obviously serves an important information processing function.

To explain the plausibility of a similar structure in biological networks, Kohonen remarks that the lateral inhibition between the neurons could be obtained via efferent connections between those neurons. In one dimension, those connection strengths form a 'Mexican hat' (see Fig. 14.10).

Fig. 14.10 Mexican hat. Lateral interaction around the winning neuron as a function of distance: excitation to nearby neurons, inhibition to farther-off neurons.

It does not come as a surprise, therefore, that many applications have already been devised of the Kohonen topology-conserving maps. Kohonen himself has successfully used the network for phoneme recognition (Kohonen, Makisara, & Saramaki, 1984). Also, the network has been used to merge sensory data from different kinds of sensors, such as auditory and visual, 'looking' at the same scene (Gielen, Krommenhoek, & Gisbergen, 1991).

14.4 PRINCIPAL COMPONENT NETWORKS

The networks presented in the previous sections can be seen as (nonlinear) vector transformations which map an input vector to a number of binary output elements or neurons. The weights are adjusted in such a way that they could be considered as prototype vectors (vectorial means) for the input patterns for which the competing neuron wins. The self-organizing transform described in this section rotates the input space in such a way that the values of the output neurons are as uncorrelated as possible and the energy or variances of the patterns is mainly concentrated in a few output neurons.

An example is shown in Fig. 14.11. The two-dimensional samples (x1, x2) are plotted in the figure. It can easily be seen that x1 and x2 are related, such that if we know x1 we can make a reasonable prediction of x2 and vice versa, since the points are centered around the line x1 = x2. If we rotate the axes over π/4 we get the (e1, e2) axes as plotted in the figure. Here the conditional prediction has no use because the points have uncorrelated coordinates. Another property of this rotation is that the variance or energy of the transformed patterns is maximized on a lower dimension. This can be intuitively verified by comparing the spreads (d_x1, d_x2) and (d_e1, d_e2) in the figure: after the rotation, the variance of the samples is large along the e1 axis and small along the e2 axis.

Fig. 14.11 Distribution of input samples, with spreads (d_x1, d_x2) along the original axes and (d_e1, d_e2) along the rotated axes e1, e2.

This transform is very closely related to the eigenvector transformation known from image processing, where the image has to be coded or transformed to a lower dimension and reconstructed again by another transform as well as possible. The next section describes a learning rule which acts as a Hebbian learning rule, but which scales the vector length to unity. In the subsequent section we will see that a linear neuron with a normalised Hebbian learning rule acts as such a transform, extending the theory to multidimensional outputs.

14.4.1 Normalized Hebbian Rule

The model considered here consists of one linear neuron with input weights w. The output y_o(t) of this neuron is given by the usual inner product of its weight w and the input vector x:

y_o(t) = w(t)^T x(t)   ...(14.17)

As seen in the previous sections, all models are based on a kind of Hebbian learning. However, the basic Hebbian rule would make the weights grow uninhibitedly if there were correlation in the input patterns. This can be overcome by normalising the weight vector to a fixed length, typically 1, which leads to the following learning rule:

w(t + 1) = (w(t) + γ y(t) x(t)) / L(w(t) + γ y(t) x(t))   ...(14.18)

where L(·) indicates an operator which returns the vector length, and γ is a small learning parameter. Compare this learning rule with the normalised learning rule of competitive learning: there the delta rule was normalised, here the standard Hebb rule is. Now the operator which computes the vector length, the norm of the vector, can be approximated by a Taylor expansion around γ = 0:

L(w(t) + γ y(t) x(t)) = 1 + γ ∂L/∂γ |_{γ=0} + O(γ²)   ...(14.19)

When we substitute this expression for the vector length in equation (14.18), it resolves for small γ to

w(t + 1) = (w(t) + γ y(t) x(t)) (1 − γ ∂L/∂γ |_{γ=0} + O(γ²))   ...(14.20)

Since ∂L/∂γ |_{γ=0} = y(t)², discarding the higher order terms of γ leads to

w(t + 1) = w(t) + γ y(t)(x(t) − y(t) w(t))   ...(14.21)

which is called the Oja learning rule (Oja, 1982). This learning rule thus modifies the weight in the usual Hebbian sense: the first product term is the Hebb rule y_o(t) x(t), while the second product term −y_o(t) y_o(t) w(t) normalises the weight vector directly. What exactly does this learning rule do with the weight vector?

14.4.2 Principal Component Extractor

Remember probability theory? Consider an N-dimensional signal x(t) with mean μ = E(x(t)) and correlation matrix R = E((x(t) − μ)(x(t) − μ)^T). In the following we assume the signal mean to be zero, so μ = 0. From equation (14.21) we see that the expectation of the weights for the Oja learning rule equals

E(w(t + 1)|w(t)) = w(t) + γ (Rw(t) − (w(t)^T R w(t)) w(t))   ...(14.22)

which has a continuous counterpart

d/dt w(t) = Rw(t) − (w(t)^T R w(t)) w(t)   ...(14.23)

Theorem 14.2: Let the eigenvectors e_i of R be ordered with descending associated eigenvalues λ_i such that λ1 > λ2 > ... > λN. With equation (14.23) the weights w(t) will converge to ±e1.

Proof: Since the eigenvectors of R span the N-dimensional space, the weight vector can be decomposed as

w(t) = Σ_{i=1}^{N} β_i(t) e_i   ...(14.24)

Substituting this in the differential equation and concluding the theorem is left as an exercise.

14.4.3 More Eigenvectors

In the previous section it was shown that a single neuron's weight converges to the eigenvector of the correlation matrix with maximum eigenvalue, i.e., the weight of the neuron is directed in the direction
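Oja's rule (14.21) is easy to verify numerically. The sketch below uses synthetic zero-mean data whose principal direction is known in advance; all parameter values and the data generator are illustrative choices, not taken from the text.

```python
import numpy as np

def oja(X, gamma=0.01, epochs=20, seed=0):
    """Train one linear neuron with the Oja rule, eq. (14.21):
    w <- w + gamma * y * (x - y * w).
    For zero-mean data, w converges to +/- the principal eigenvector
    of the correlation matrix, with unit length."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = w @ x
            w += gamma * y * (x - y * w)
    return w
```

On data generated along the direction (1, 1)/√2 plus small isotropic noise, the learned weight aligns with that direction (up to sign) and keeps approximately unit norm, as Theorem 14.2 predicts.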

of highest energy or variance of the input patterns. Here we tackle the question of how to find the remaining eigenvectors of the correlation matrix given the first found eigenvector.

Consider the signal x, which can be decomposed into the basis of eigenvectors e_i of its correlation matrix R:

x = Σ_{i=1}^{N} α_i e_i   ...(14.25)

If we now subtract the component in the direction of e1, the direction in which the signal has the most energy, from the signal x,

x̃ = x − α1 e1   ...(14.26)

we are sure that when we again decompose x̃ into the eigenvector basis, the coefficient α1 = 0, simply because we just subtracted it. We call x̃ the deflation of x. If now a second neuron is taught on this signal x̃, then its weights will lie in the direction of the remaining eigenvector with the highest eigenvalue: since the deflation removed the component in the direction of the first eigenvector, the weight will converge to the remaining eigenvector with maximum eigenvalue. In the previous section we ordered the eigenvalues in magnitude, so according to this definition in the limit we will find e2. We can continue this strategy and find all the N eigenvectors belonging to the signal x.

We can write the deflation in neural network terms if we see that, since w = e1,

y_o = w^T x = e1^T Σ_{i=1}^{N} α_i e_i = α1   ...(14.27)

so that the deflated vector x̃ equals

x̃ = x − y_o w   ...(14.28)

The term subtracted from the input vector can be interpreted as a kind of back-projection or expectation. Compare this to ART described in the next section.

14.5 ADAPTIVE RESONANCE THEORY

The last unsupervised learning network we discuss differs from the previous networks in that it is recurrent: as with the networks in the next chapter, the data is not only fed forward but also back from output to input units.

14.5.1 Background: Adaptive Resonance Theory

In 1976, Grossberg introduced a model for explaining biological phenomena. The model has three crucial properties:
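The deflation idea of eqs. (14.26)-(14.28) can be sketched by training one neuron with the Oja rule, removing its component from the data, and training a second neuron on the deflated signal. The data set and all parameters below are invented for illustration.

```python
import numpy as np

def oja_train(X, gamma=0.01, epochs=20, seed=0):
    """Oja rule, eq. (14.21), as in the previous section."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = w @ x
            w += gamma * y * (x - y * w)
    return w

def deflate(X, w):
    """x_tilde = x - y_o * w, eq. (14.28): subtract the back-projection
    of each sample onto the first learned eigenvector."""
    return X - np.outer(X @ w, w)
```

With axis-aligned data of variances 4, 1 and 0.04, the first neuron finds (plus or minus) the first coordinate axis; after deflation the second neuron finds the second axis, i.e., e2.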

1. A normalization of the total network activity. Biological systems are usually very adaptive to large changes in their environment. For example, the human eye can adapt itself to large variations in light intensities.
2. Contrast enhancement of input patterns. The awareness of subtle differences in input patterns can mean a lot in terms of survival; distinguishing a hiding panther from a resting one makes all the difference in the world. The mechanism used here is contrast enhancement.
3. Short-term memory (STM) storage of the contrast-enhanced pattern. Before the input pattern can be decoded, it must be stored in the short-term memory. The long-term memory (LTM) implements an arousal mechanism (i.e., the classification), whereas the STM is used to cause gradual changes in the LTM.

The system consists of two layers, F1 and F2, which are connected to each other via the LTM (see Fig. 14.12). The input pattern is received at F1, whereas classification takes place in F2. As mentioned before, the input is not directly classified. First a characterization takes place by means of extracting features, giving rise to activation in the feature representation field. The expectations, residing in the LTM connections, translate the input pattern to a categorization in the category representation field. The classification is compared to the expectation of the network, which resides in the LTM weights from F2 to F1. If there is a match, the expectations are strengthened; otherwise the classification is rejected.

Fig. 14.12 The ART architecture: a feature representation field F1 and a category representation field F2, each holding an STM activity pattern and coupled in both directions through the LTM.

14.5.2 ART1: The Simplified Neural Network Model

The ART1 simplified model consists of two layers of binary neurons (with values 1 and 0), called F1 (the comparison layer) and F2 (the recognition layer) (see Fig. 14.13). Each neuron in F1 is connected to all neurons in F2 via the continuous-valued forward long term memory (LTM) W^f, and vice versa via the binary-valued backward LTM W^b. The other modules are gain 1 and gain 2 (G1 and G2), and a reset module. Each neuron in the comparison layer receives three inputs: a component of the input pattern, a component of the feedback pattern, and a gain G1. A neuron outputs a 1 if and only if at least two of these three inputs are high: the 'two-thirds rule.'

Fig. 14.13 The ART1 neural network (F1: N neurons; F2: M neurons; gains G1 and G2; reset module; forward LTM W^f and backward LTM W^b).

The neurons in the recognition layer each compute the inner product of their incoming (continuous-valued) weights and the pattern sent over these connections. The winning neuron then inhibits all the other neurons via lateral inhibition. Gain 2 is the logical 'or' of all the elements in the input pattern x. Gain 1 equals gain 2, except when the feedback pattern from F2 contains any 1; then it is forced to zero. Finally, the reset signal is sent to the active neuron in F2 if the input vector x and the output of F1 differ by more than some vigilance level.

14.5.3 Operation

The pattern is sent to F2, and in F2 one neuron becomes active. This signal is then sent back over the backward LTM, which reproduces a binary pattern at F1. Gain 1 is inhibited, and only the neurons in F1 which receive a 'one' from both x and F2 remain active. If there is a substantial mismatch between the two patterns, the reset signal will inhibit the neuron in F2 and the process is repeated. Instead of following Carpenter and Grossberg's description of the system using differential equations, we use the notation employed by Lippmann (1987):

1. Initialization:

w_ji^b(0) = 1
w_ji^f(0) = 1/(1 + N)

where N is the number of neurons in F1, M the number of neurons in F2, 0 ≤ i < N, and 0 ≤ j < M. Also, choose the vigilance threshold ρ, 0 ≤ ρ ≤ 1.

2. Apply the new input pattern x.
3. Compute the activation values of the neurons in F2:

y_i′ = Σ_{j=1}^{N} w_ij^f(t) x_j   ...(14.30)

4. Select the winning neuron k (0 ≤ k < M).
5. Vigilance test: if

(w_k^b(t) · x)/(x · x) > ρ   ...(14.31)

where · denotes inner product, go to step 7, else go to step 6. Note that w_k^b · x essentially is the inner product x* · x, which will be large if x* and x are near to each other.
6. Neuron k is disabled from further activity. Go to step 3.
7. Set for all l, 0 ≤ l < N:

w_kl^b(t + 1) = w_kl^b(t) x_l
w_lk^f(t + 1) = w_kl^b(t) x_l / (½ + Σ_{i=1}^{N} w_ki^b(t) x_i)

8. Re-enable all neurons in F2 and go to step 2.

Fig. 14.14 shows exemplar behaviour of the network.

14.5.4 ART1: The Original Model

In later work, Carpenter and Grossberg (1987) present several neural network models to incorporate parts of the complete theory. We will only discuss the first model, ART1. The network incorporates a follow-the-leader clustering algorithm (Hartigan, 1975). This algorithm tries to fit each new input pattern in an existing class. If no matching class can be found, i.e., the distance between the new pattern and all existing classes exceeds some threshold, a new class is created containing the new pattern. The novelty in this approach is that the network is able to adapt to new incoming patterns, while the previous memory is not corrupted. In most neural networks, such as the back-propagation network, all patterns must be taught sequentially, and the teaching of a new pattern might corrupt the weights for all previously learned patterns. By changing the structure of the network rather than the weights, ART1 overcomes this problem.
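Steps 1-8 above can be collected into a short sketch for binary, non-empty input patterns. The fallback label −1 for the case that every F2 unit fails the vigilance test is an addition of this sketch, not part of the algorithm in the text.

```python
import numpy as np

def art1(patterns, rho, M=5):
    """ART1 clustering following steps 1-8 of the text.
    patterns: binary vectors of length N; rho: vigilance in [0, 1];
    M: number of F2 units. Returns one class label per pattern."""
    N = len(patterns[0])
    Wb = np.ones((M, N))            # backward LTM: w_kl^b(0) = 1
    Wf = np.ones((M, N)) / (1 + N)  # forward LTM:  w_lk^f(0) = 1/(1+N)
    labels = []
    for p in patterns:
        x = np.asarray(p, dtype=float)
        enabled = np.ones(M, dtype=bool)
        while True:
            if not enabled.any():   # no unit matches: give up on this pattern
                labels.append(-1)
                break
            y = Wf @ x              # step 3: F2 activations
            y[~enabled] = -np.inf
            k = int(np.argmax(y))   # step 4: winner
            if (Wb[k] @ x) / x.sum() > rho:   # step 5: vigilance test
                Wb[k] *= x          # step 7 (x is binary, so Wb[k]*x = Wb[k] after this)
                Wf[k] = Wb[k] / (0.5 + Wb[k].sum())
                labels.append(k)
                break
            enabled[k] = False      # step 6: disable winner and retry
    return labels, Wb
```

Presenting the same pattern twice yields the same class, while a disjoint pattern recruits a fresh unit instead of corrupting the stored one, illustrating the follow-the-leader behaviour discussed next.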

Fig. 14.14 An example of the behaviour of the Carpenter-Grossberg network for letter patterns. The binary input patterns on the left were applied sequentially; on the right the stored patterns (i.e., the weights of W^b for the first four output units) are shown.

14.5.5 Normalization of the Original Model

We will refer to a cell in F1 or F2 with k. Each cell k in F1 or F2 receives an input s_k and responds with an activation level y_k. In order to introduce normalization in the model, we set I = Σ s_k and let the relative input intensity Θ_k = s_k I⁻¹.

So we have a model in which the change of the response y_k of an input at a certain cell k: depends inhibitorily on all other inputs and the sensitivity of the cell, i.e., the surroundings of each cell have a negative influence on the cell, −y_k Σ_{l≠k} s_l; has an excitatory response as far as the input at the cell is concerned, +B s_k;

has an inhibitory response for normalization, −y_k s_k; and has a decay, −A y_k. Here, A and B are constants. The differential equation for the neurons in F1 and F2 now is

dy_k/dt = −A y_k + (B − y_k) s_k − y_k Σ_{l≠k} s_l   ...(14.32)

with 0 ≤ y_k(0) ≤ B, because the inhibitory effect of an input can never exceed the excitatory input. At equilibrium, when dy_k/dt = 0, and with I = Σ s_k, we have that

y_k (A + I) = B s_k   ...(14.33)

Because of the definition of Θ_k = s_k I⁻¹ we get

y_k = Θ_k B I/(A + I)   ...(14.34)

Therefore, at equilibrium y_k is proportional to Θ_k, and, since B I/(A + I) ≤ B, the total activity

y_total = Σ_k y_k = B I/(A + I)   ...(14.35)

never exceeds B: it is normalized.

14.5.6 Contrast Enhancement

In order to make F2 react better on differences in neuron values in F1 (or vice versa), contrast enhancement is applied: the contrasts between the neuronal values in a layer are amplified. We can show that eq. (14.32) does not suffice anymore; in order to enhance the contrasts, we chop off all the equal fractions (uniform parts) in F1 or F2. This can be done by adding an extra inhibitory input proportional to the inputs from the other cells, with a factor C:

dy_k/dt = −A y_k + (B − y_k) s_k − (y_k + C) Σ_{l≠k} s_l   ...(14.36)

At equilibrium, when we set B = (n − 1) C, where n is the number of neurons, we have

y_k = (n C I/(A + I)) (Θ_k − 1/n)   ...(14.37)

Now, when an input in which all the s_k are equal is given, all the y_k are zero: the effect of C is enhancing differences. If we set B ≤ (n − 1) C, or C/(B + C) ≥ 1/n, then more of the input shall be chopped off. The description of ART1 continues by defining the differential equations for the LTM. Instead of following Carpenter and Grossberg's description, we will revert to the simplified model as presented by Lippmann.
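The equilibrium results (14.33)-(14.35) can be checked numerically by integrating eq. (14.32) forward in time. The constants A, B and the inputs s below are arbitrary illustrative values.

```python
import numpy as np

# Integrate eq. (14.32) to equilibrium and compare with eq. (14.34).
A, B = 1.0, 10.0
s = np.array([1.0, 2.0, 3.0])   # illustrative inputs s_k
I = s.sum()
y = np.zeros_like(s)
dt = 0.01
for _ in range(5000):
    # dy_k/dt = -A*y_k + (B - y_k)*s_k - y_k * sum_{l != k} s_l
    y += dt * (-A * y + (B - y) * s - y * (I - s))

theta = s / I
y_eq = theta * B * I / (A + I)  # predicted equilibrium, eq. (14.34)
total = y.sum()                 # total activity, eq. (14.35)
```

The integrated activities match the closed-form equilibrium, each y_k is proportional to its relative intensity Θ_k, and the total activity stays below B, i.e., the layer is normalized.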

QUESTION BANK

1. What are the advantages of self-organizing networks?
2. What is a competitive learning network? Explain various methods of determining the winner and the corresponding learning rule.
3. Describe the square error criterion to measure the quality of a given clustering.
4. Describe the vector quantisation scheme.
5. Explain the counter propagation network.
6. Describe the learning vector quantisation method.
7. What is a Kohonen network? Explain.
8. Explain the normalized Hebbian rule.
9. Describe adaptive resonance theory.
10. Explain the ART 1 neural network.
11. Describe the normalization of ART 1.

REFERENCES

1. S.C. Ahalt, A.K. Krishnamurthy, P. Chen, and D.E. Melton, Competitive learning algorithms for vector quantisation, Neural Networks, Vol. 3, No. 3, pp. 277-290, 1990.
2. D.E. Rumelhart and D. Zipser, Feature discovery by competitive learning, Cognitive Science, Vol. 9, pp. 75-112, 1985.
3. S. Grossberg, Adaptive pattern classification and universal recoding I & II, Biological Cybernetics, Vol. 23, pp. 121-134 and 187-202, 1976.
4. K. Fukushima, Cognitron: A self-organizing multilayered neural network, Biological Cybernetics, Vol. 20, pp. 121-136, 1975.
5. K. Fukushima, Neocognitron: A hierarchical neural network capable of visual pattern recognition, Neural Networks, Vol. 1, pp. 119-130, 1988.
6. R.P. Lippmann, Review of neural networks for speech recognition, Neural Computation, Vol. 1, pp. 1-38, 1989.
7. L. Breiman, J.H. Friedman, R. Olshen, and C.J. Stone, Classification and Regression Trees, Wadsworth and Brooks/Cole, 1984.
8. J.H. Friedman, Multivariate adaptive regression splines, Annals of Statistics, Vol. 19, pp. 1-141, 1991.
9. A. Jansen, P. van der Smagt, and F.C.A. Groen, Nested networks for robot control, in A.F. Murray (Ed.), Neural Network Applications, Kluwer Academic Publishers, pp. 221-239, 1994.
10. G.A. Carpenter and S. Grossberg, A massively parallel architecture for a self-organizing neural pattern recognition machine, Computer Vision, Graphics, and Image Processing, Vol. 37, pp. 54-115, 1987.
11. G.A. Carpenter and S. Grossberg, ART 2: Self-organization of stable category recognition codes for analog input patterns, Applied Optics, Vol. 26, No. 23, pp. 4919-4930, 1987.
12. R. Hecht-Nielsen, Counterpropagation networks, Applied Optics, Vol. 26, 1987.
13. T. Kohonen, Associative Memory: A System-Theoretical Approach, Springer-Verlag, 1977.
14. T. Kohonen, Self-organized formation of topologically correct feature maps, Biological Cybernetics, Vol. 43, pp. 59-69, 1982.
15. T. Kohonen, Self-Organization and Associative Memory, Berlin: Springer-Verlag, 1984.
16. T. Kohonen, K. Makisara, and T. Saramaki, Phonotopic maps: insightful representation of phonological features for speech recognition, in Proceedings of the 7th IEEE International Conference on Pattern Recognition, pp. 417-423, 1984.
17. E. Oja, A simplified neuron model as a principal component analyzer, Journal of Mathematical Biology, Vol. 15, pp. 267-273, 1982.
18. J.A. Hartigan, Clustering Algorithms, New York: John Wiley & Sons, 1975.
19. R.P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine, pp. 4-22, April 1987.
20. C. Gielen, K. Krommenhoek, and J. Gisbergen, A procedure for self-organized sensor-fusion in topologically ordered maps, in T. Kanade, F.C.A. Groen, and L.O. Hertzberger (Eds.), Proceedings of the Second International Conference on Autonomous Systems, Elsevier Science Publishers, 1991.

15 Reinforcement Learning

15.1 INTRODUCTION

In the previous chapters a number of supervised training methods have been described in which the weight adjustments are calculated using a set of learning samples, existing of input and desired output values. However, not always such a set of learning examples is available. Often the only information is a scalar evaluation r, which indicates how well the neural network is performing.

Reinforcement learning involves two subproblems. The first is that the reinforcement signal r is often delayed, since it is a result of network outputs in the past. This temporal credit assignment problem is solved by learning a 'critic' network which represents a cost function J predicting future reinforcement. The second problem is to find a learning procedure which adapts the weights of the neural network such that a mapping is established which minimizes J. The two problems are discussed in the next paragraphs, respectively. Fig. 15.1 shows a reinforcement-learning network interacting with a system.

Fig. 15.1 Reinforcement learning scheme: the critic receives the system state x and the reinforcement signal r and produces the prediction Ĵ, which serves as internal reinforcement signal for the learning controller that generates the control actions u.

15.2 THE CRITIC

The first problem is how to construct a critic which is able to evaluate system performance. If the objective of the network is to minimize a directly measurable quantity r, performance feedback is straightforward and a critic is not required. On the other hand, how is current behaviour to be evaluated

REINFORCEMENT LEARNING 191

if the objective concerns future system performance? The performance may for instance be measured by the cumulative or future error. Most reinforcement learning methods (Barto, Sutton and Anderson, 1983) use the temporal difference (TD) algorithm (Sutton, 1988) to train the critic. Suppose the immediate costs of the system at time step k are measured by r(x_k, u_k, k), as a function of system states x_k and control actions (network outputs) u_k. The immediate measure r is often called the external reinforcement signal, in contrast to the internal reinforcement signal in Fig. 15.1. Define the performance measure J(x_k, u_k, k) of the system as a discounted cumulative of future cost. The task of the critic is to predict the performance measure:

J(x_k, u_k, k) = \sum_{i=k}^{\infty} \gamma^{i-k} r(x_i, u_i, i)    ...(15.1)

in which \gamma \in [0, 1] is a discount factor (usually \approx 0.95). The relation between two successive predictions can easily be derived:

J(x_k, u_k, k) = r(x_k, u_k, k) + \gamma J(x_{k+1}, u_{k+1}, k+1)    ...(15.2)

If the network is correctly trained, the relation between two successive network outputs \hat{J} should be:

\hat{J}(x_k, u_k, k) = r(x_k, u_k, k) + \gamma \hat{J}(x_{k+1}, u_{k+1}, k+1)    ...(15.3)

If the network is not correctly trained, the temporal difference \delta(k) between two successive predictions is used to adapt the critic network:

\delta(k) = [r(x_k, u_k, k) + \gamma \hat{J}(x_{k+1}, u_{k+1}, k+1)] - \hat{J}(x_k, u_k, k)    ...(15.4)

A learning rule for the weights of the critic network w_c(k), based on minimizing \delta^2(k), can be derived:

\Delta w_c(k) = \alpha \delta(k) \frac{\partial \hat{J}(x_k, u_k, k)}{\partial w_c(k)}    ...(15.5)

in which \alpha is the learning rate.
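The TD update of equations (15.4) and (15.5) can be sketched for the special case of a linear critic, where the prediction is a dot product of the weight vector with a state feature vector. The linear form, the feature vectors, and the constants below are illustrative assumptions, not from the text:

```python
import numpy as np

def td_critic_step(w, phi_k, phi_k1, r_k, gamma=0.95, alpha=0.1):
    """One temporal-difference update of a linear critic J(x) = w . phi(x).

    delta implements eq. (15.4); the weight change implements eq. (15.5),
    where the gradient dJ/dw is simply phi_k for a linear critic.
    """
    delta = (r_k + gamma * (w @ phi_k1)) - (w @ phi_k)  # eq. (15.4)
    w = w + alpha * delta * phi_k                       # eq. (15.5)
    return w, delta

# one update step from a zero-initialized critic
w, delta = td_critic_step(np.zeros(3),
                          np.array([1.0, 0.0, 0.0]),   # features of x_k
                          np.array([0.0, 1.0, 0.0]),   # features of x_{k+1}
                          r_k=1.0)
```

With zero initial weights, the first TD error simply equals the immediate cost, so only the weight associated with the current state moves.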

15.3 THE CONTROLLER NETWORK

If the critic is capable of providing an immediate evaluation of performance, the controller network can be adapted such that the optimal relation between system states and control actions is found. Three approaches are distinguished:

1. In case of a finite set of actions U, all actions may virtually be executed. The action which decreases the performance criterion most is selected:

u_k = \min_{u \in U} \hat{J}(x_k, u, k)    ...(15.6)

192 FUZZY LOGIC AND NEURAL NETWORKS

The RL-method with this controller is called Q-learning (Watkins & Dayan, 1992). The method approximates dynamic programming, which will be discussed in the next section.

2. If the performance measure J(x_k, u_k, k) is accurately predicted, then the gradient with respect to the controller command u_k can be calculated, assuming that the critic network is differentiable. If the measure is to be minimized, the weights of the controller w_r are adjusted in the direction of the negative gradient:

\Delta w_r(k) = -\beta \frac{\partial \hat{J}(x_k, u_k, k)}{\partial u(k)} \frac{\partial u(k)}{\partial w_r(k)}    ...(15.7)

with \beta being the learning rate. Werbos (1992) has discussed some of these gradient-based algorithms in detail. Sofge and White (1992) applied one of the gradient-based methods to optimize a manufacturing process.

3. A direct approach to adapt the controller is to use the difference between the predicted and the true performance measure, as expressed in equation (15.4). Suppose that the performance measure is to be minimized. If a control action results in a negative difference, i.e. the true performance is better than was expected, the controller has to be rewarded. On the other hand, in case of a positive difference, the control action has to be penalized. The idea is to explore the set of possible actions during learning and incorporate the beneficial ones into the controller. Learning in this way is related to trial-and-error learning studied by psychologists, in which behavior is selected according to its consequences. Generally, the algorithms select actions probabilistically from a set of possible actions and update the action probabilities on the basis of the evaluation feedback. Most of the algorithms are based on a look-up table representation of the mapping from system states to actions (Barto et al., 1983). Each table entry has to learn which control action is best when that entry is accessed. It may also be possible to use a parametric mapping from system states to action probabilities. Gullapalli (1990) adapted the weights of a single-layer network.
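The first approach, equation (15.6), simply evaluates the critic for every action in the finite set U and picks the cheapest one. A minimal sketch, in which the critic is a hypothetical function passed in as an argument:

```python
def select_action(critic, x, actions, k=0):
    """Return the action u in U that minimizes the predicted cost J(x, u, k),
    following eq. (15.6)."""
    return min(actions, key=lambda u: critic(x, u, k))

# toy critic that prefers actions close to 0.5 (illustrative only)
j_hat = lambda x, u, k: (u - 0.5) ** 2
best = select_action(j_hat, x=None, actions=[-1.0, 0.0, 2.0])
```

With a differentiable critic one would instead follow the gradient rule of equation (15.7); the enumeration above only works because U is finite.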

15.4 BARTO'S APPROACH: THE ASE-ACE COMBINATION

Barto, Sutton and Anderson (1983) have formulated reinforcement learning as a learning strategy, which does not need a set of examples provided by a teacher. The system described by Barto explores the space of alternative input-output mappings and uses an evaluative feedback (reinforcement signal) on the consequences of the control signal (network output) on the environment. It has been shown that such reinforcement learning algorithms are implementing an on-line, incremental approximation to the dynamic programming method for optimal control, and are also called heuristic dynamic programming (Werbos, 1990). The basic building blocks in the Barto network are an Associative Search Element (ASE) which uses a stochastic method to determine the correct relation between input and output and an Adaptive Critic Element (ACE) which learns to give a correct prediction of future reward or punishment (Fig. 15.2). The external reinforcement signal r can be generated by a special sensor (for example a collision sensor of a mobile robot) or be derived from the state vector. For example, in control applications, where the state s of a system should remain in a certain part A of the control space, reinforcement is given by:


[Fig. 15.2: Architecture of a reinforcement learning scheme with critic element.]

r = \begin{cases} 0 & \text{if } s \in A \\ -1 & \text{otherwise} \end{cases}    ...(15.8)

15.4.1 Associative Search

In its most elementary form the ASE gives a binary output value y_o(t) \in \{0, 1\} as a stochastic function of an input vector. The total input of the ASE is, similar to the neuron presented in Chapter 2, the weighted sum of the inputs, with the exception that the bias input in this case is a stochastic variable N with mean zero normal distribution:

s(t) = \sum_{j=1}^{N} w_{sj} x_j(t) + N_j    ...(15.9)

The activation function F is a threshold such that

y_o(t) = \begin{cases} 1 & \text{if } s(t) > 0 \\ 0 & \text{otherwise} \end{cases}    ...(15.10)

For updating the weights, a Hebbian type of learning rule is used. However, the update is weighted with the reinforcement signal r(t), and an eligibility e_j is defined instead of the product y_o(t) x_j(t) of input and output:

w_{sj}(t+1) = w_{sj}(t) + \alpha r(t) e_j(t)    ...(15.11)

where \alpha is a learning factor. The eligibility e_j is given by


e_j(t+1) = \delta e_j(t) + (1-\delta) y_o(t) x_j(t)    ...(15.12)

with \delta the decay rate of the eligibility. The eligibility is a sort of memory; e_j is high if the signals from the input state unit j and the output unit are correlated over some time. Using r(t) in expression (15.11) has the disadvantage that learning only takes place when there is an external reinforcement signal. Instead of r(t), usually a continuous internal reinforcement signal \hat{r}(t), given by the ACE, is used. Barto and Anandan (1985) proved convergence for the case of a single binary output unit and a set of linearly independent patterns x^p. In control applications, the input vector is the (n-dimensional) state vector s of the system. In order to obtain a linearly independent set of patterns x^p, often a decoder is used, which divides the range of each of the input variables s_i into a number of intervals. The aim is to divide the input (state) space into a number of disjunct subspaces or boxes. The input vector can therefore only be in one subspace at a time. The decoder converts the input vector into a binary valued vector x, with only one element equal to one, indicating which subspace is currently visited. It has been shown (Krose and Dam, 1992) that instead of an a-priori quantisation of the input space, a self-organizing quantisation, based on methods described in this chapter, results in a better performance.
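One cycle of the ASE defined by equations (15.9)-(15.12) can be sketched as follows; the learning constants and the noise standard deviation are illustrative assumptions:

```python
import random

def ase_step(w, e, x, r, alpha=0.5, delta=0.9, noise_sd=0.1):
    """One ASE cycle: stochastic output (15.9)-(15.10), reinforcement-weighted
    Hebbian weight update (15.11), and eligibility decay (15.12).
    x is the binary state vector produced by the decoder."""
    s = sum(wj * xj for wj, xj in zip(w, x)) + random.gauss(0.0, noise_sd)
    y = 1 if s > 0 else 0                                         # eq. (15.10)
    w = [wj + alpha * r * ej for wj, ej in zip(w, e)]             # eq. (15.11)
    e = [delta * ej + (1 - delta) * y * xj for ej, xj in zip(e, x)]  # eq. (15.12)
    return w, e, y

w, e, y = ase_step(w=[0.0, 0.0], e=[0.5, 0.0], x=[1, 0], r=1.0)
```

Note that the weight update uses the eligibility accumulated from earlier time steps, not the current input-output product, which is what bridges the delay between an action and its reinforcement.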

15.4.2 Adaptive Critic

The Adaptive Critic Element (ACE, or evaluation network) is basically the same as the critic described in section 15.2. An error signal is derived from the temporal difference of two successive predictions (in this case denoted by p) and is used for training the ACE:

\hat{r}(t) = r(t) + \gamma p(t) - p(t-1)    ...(15.13)

p(t) is implemented as a series of weights w_{Ck} to the ACE such that

p(t) = w_{Ck}    ...(15.14)

if the system is in state k at time t, denoted by x_k = 1. The function is learned by adjusting the w_{Ck}'s according to a delta-rule with an error signal \delta given by \hat{r}(t):

\Delta w_{Ck}(t) = \beta \hat{r}(t) h_k(t)    ...(15.15)

\beta is the learning parameter and h_j(t) indicates the trace of neuron x_j:

h_j(t) = \lambda h_j(t-1) + (1-\lambda) x_j(t-1)    ...(15.16)

This trace is a low-pass filter or momentum, through which the credit assigned to state j increases while state j is active and decays exponentially after the activity of j has expired. If \hat{r}(t) is positive, the action u of the system has resulted in a higher evaluation value, whereas a negative \hat{r}(t) indicates a deterioration of the system. \hat{r}(t) can be considered as an internal reinforcement signal.
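The ACE update of equations (15.13)-(15.16) can be sketched with a table of per-state values; \beta, \lambda, \gamma and the state encoding below are illustrative assumptions:

```python
def ace_step(w, h, state, prev_state, r, gamma=0.95, beta=0.2, lam=0.8):
    """Internal reinforcement (15.13) from table predictions p(t) = w[state]
    (15.14), delta-rule weight update (15.15), and trace decay (15.16)."""
    r_hat = r + gamma * w[state] - w[prev_state]            # eq. (15.13)
    w = [wj + beta * r_hat * hj for wj, hj in zip(w, h)]    # eq. (15.15)
    x_prev = [1 if j == prev_state else 0 for j in range(len(w))]
    h = [lam * hj + (1 - lam) * xj for hj, xj in zip(h, x_prev)]  # eq. (15.16)
    return w, h, r_hat

# a punishing external signal lowers the value of the traced state
w, h, r_hat = ace_step(w=[0.0, 0.0], h=[1.0, 0.0], state=1, prev_state=0, r=-1.0)
```

The trace h keeps recently visited states eligible for credit, mirroring the eligibility mechanism of the ASE.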

15.4.3 The Cart-Pole System

An example of such a system is the cart-pole balancing system (see Fig. 15.3). Here, a dynamics controller must control the cart in such a way that the pole always stands up straight. The controller applies a left or right force F of fixed magnitude to the cart, which may change direction at discrete time intervals. The model has four state variables:


[Fig. 15.3: The cart-pole system, showing the cart position x, the pole angle \theta, and the applied force F.]

x, the position of the cart on the track; \theta, the angle of the pole with the vertical; \dot{x}, the cart velocity; and \dot{\theta}, the angular velocity of the pole.

Furthermore, a set of parameters specify the pole length and mass, cart mass, coefficients of friction between the cart and the track and at the hinge between the pole and the cart, the control force magnitude, and the force due to gravity. The state space is partitioned on the basis of the following quantisation thresholds:

1. x: \pm 0.8, \pm 2.4 m
2. \theta: 0°, \pm 1°, \pm 6°, \pm 12°
3. \dot{x}: \pm 0.5, \pm\infty m/s
4. \dot{\theta}: \pm 50, \pm\infty °/s

This yields 3 \times 6 \times 3 \times 3 = 162 regions corresponding to all of the combinations of the intervals. The decoder output is a 162-dimensional vector. A negative reinforcement signal is provided when the state vector gets out of the admissible range: when x > 2.4, x < -2.4, \theta > 12° or \theta < -12°. The system has proved to solve the problem in about 75 learning steps.
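The box decoder described above can be sketched as a function mapping the four state variables to one of the 162 regions. The threshold lists follow the quantisation given in the text; the one-hot output encoding and box ordering are illustrative choices:

```python
import bisect

# interior thresholds for each state variable (the outer +/- infinity
# bounds are implicit), giving 3 * 6 * 3 * 3 = 162 boxes
THRESHOLDS = [
    [-0.8, 0.8],                    # x (m): 3 intervals
    [-6.0, -1.0, 0.0, 1.0, 6.0],    # theta (deg): 6 intervals
    [-0.5, 0.5],                    # x-dot (m/s): 3 intervals
    [-50.0, 50.0],                  # theta-dot (deg/s): 3 intervals
]

def decode(state):
    """Map (x, theta, x_dot, theta_dot) to a one-hot 162-dimensional vector."""
    index = 0
    for value, cuts in zip(state, THRESHOLDS):
        # mixed-radix index: each variable contributes its interval number
        index = index * (len(cuts) + 1) + bisect.bisect(cuts, value)
    out = [0] * 162
    out[index] = 1
    return out
```

Exactly one element of the output is active at any time, which is what makes the look-up-table learning of the ASE and ACE possible.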

15.5 REINFORCEMENT LEARNING VERSUS OPTIMAL CONTROL

The objective of optimal control is to generate control actions in order to optimize a predefined performance measure. One technique to find such a sequence of control actions which define an optimal control policy is Dynamic Programming (DP). The method is based on the principle of optimality, formulated by Bellman (1957): Whatever the initial system state, if the first control action is contained

in an optimal control policy, then the remaining control actions must constitute an optimal control policy for the problem with as initial system state the state remaining from the first control action. The Bellman equations follow directly from the principle of optimality. Solving the equations backwards in time is called dynamic programming.

Assume that a performance measure J(x_k, u_k, k) = \sum_{i=k}^{N} r(x_i, u_i, i), with r being the immediate costs, is to be minimized. The minimum costs J_{min} of cost J can be derived by the Bellman equations of DP. The equations for the discrete case are (White & Jordan, 1992):

J_{min}(x_k, k) = \min_{u \in U} [J_{min}(x_{k+1}, k+1) + r(x_k, u_k, k)]    ...(15.17)

J_{min}(x_N) = r(x_N)    ...(15.18)

The strategy for finding the optimal control actions is solving equations (15.17) and (15.18), from which u_k can be derived. This can be achieved backwards, starting at state x_N. The requirements are a bounded N, and a model which is assumed to be an exact representation of the system and the environment. The model has to provide the relation between successive system states resulting from system dynamics, control actions and disturbances. In practice, a solution can be derived only for a small N and simple systems. In order to deal with large or infinite N, the performance measure could be defined as a discounted sum of future costs, as expressed by equation (15.1).

Reinforcement learning provides a solution for the problem stated above without the use of a model of the system and environment. RL is therefore often called a heuristic dynamic programming technique (Barto, Sutton, & Watkins, 1990; Sutton, Barto, & Wilson, 1992; Werbos, 1990). The most directly related RL-technique to DP is Q-learning (Watkins & Dayan, 1992). The basic idea in Q-learning is to estimate a function, Q, of states and actions, where Q is the minimum discounted sum of future costs J_{min}(x_k, u_k, k) (the name Q-learning comes from Watkins' notation). For convenience, the notation with J is continued here:

\hat{J}(x_k, u_k, k) = \gamma J_{min}(x_{k+1}, k+1) + r(x_k, u_k, k)    ...(15.19)

The optimal control rule can be expressed in terms of \hat{J} by noting that an optimal control action for state x_k is any action u_k that minimizes \hat{J}, as in equation (15.6). The estimate of minimum cost \hat{J} is updated at time step k+1 using the temporal difference \epsilon(k) between the true and expected performance:

\epsilon(k) = [\gamma \min_{u \in U} \hat{J}(x_{k+1}, u, k+1) + r(x_k, u_k, k)] - \hat{J}(x_k, u_k, k)    ...(15.20)

Watkins has shown that the function converges under some pre-specified conditions to the true optimal Bellman equation (Watkins & Dayan, 1992): (1) the critic is implemented as a look-up table; (2) the learning parameter \alpha must converge to zero; (3) all actions continue to be tried from all states.
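The look-up-table form of Q-learning described above (conditions 1-3) can be sketched as follows; the toy state and action sets, and the fixed learning rate, are illustrative assumptions:

```python
def q_update(Q, s, a, cost, s_next, actions, gamma=0.95, alpha=0.1):
    """One tabular Q-learning step using the cost-minimizing temporal
    difference of eq. (15.20)."""
    best_next = min(Q[(s_next, b)] for b in actions)
    eps = (cost + gamma * best_next) - Q[(s, a)]   # eq. (15.20)
    Q[(s, a)] += alpha * eps
    return eps

actions = [0, 1]
Q = {(s, a): 0.0 for s in range(3) for a in actions}
eps = q_update(Q, s=0, a=1, cost=1.0, s_next=1, actions=actions)
```

Note that no model of the system appears anywhere: the table is updated purely from observed transitions and costs, which is what distinguishes this from the backward DP recursion of equations (15.17)-(15.18).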

QUESTION BANK

1. Explain the reinforcement learning scheme.
2. What are the various approaches of controller networks used to find the optimal relation between system states and control actions?
3. Describe the Barto network of reinforcement learning.
4. What are the building blocks of the Barto network? Explain them.
5. Explain the cart-pole balancing scheme.
6. Describe dynamic programming to find a sequence of control actions.

REFERENCES

1. A.G. Barto, R.S. Sutton, and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning problems, IEEE Transactions on Systems, Man and Cybernetics, Vol. 13, pp. 834-846, 1983.
2. R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, Vol. 3, pp. 9-44, 1988.
3. A.G. Barto and P. Anandan, Pattern-recognizing stochastic learning automata, IEEE Transactions on Systems, Man and Cybernetics, Vol. 15, pp. 360-375, 1985.
4. C.J.C.H. Watkins and P. Dayan, Q-learning, Machine Learning, Vol. 8, pp. 279-292, 1992.
5. P.J. Werbos, A menu for designs of reinforcement learning over time, In W.T. Miller, R.S. Sutton, and P.J. Werbos (Eds.), Neural Networks for Control, MIT Press/Bradford, 1990.
6. D.A. Sofge and D.A. White, Applied learning: optimal control for manufacturing, In D.A. Sofge and D.A. White (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
7. V. Gullapalli, A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Networks, Vol. 3, pp. 671-692, 1990.
8. B.J.A. Krose and J.W.M. van Dam, Learning to avoid collisions: A reinforcement learning paradigm for mobile robot manipulation, In Proceedings of the IFAC/IFIP/IMACS International Symposium on Artificial Intelligence in Real-Time Control, pp. 295-300, Delft: IFAC, 1992.
9. R. Bellman, Dynamic Programming, Princeton University Press, 1957.
10. D.A. White and M.I. Jordan, Optimal control: a foundation for intelligent control, In D.A. Sofge and D.A. White (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
11. P.J. Werbos, Approximate dynamic programming for real-time control and neural modeling, In D.A. Sofge and D.A. White (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1992.
12. A.G. Barto, R.S. Sutton, and C.J.C.H. Watkins, Sequential decision problems and neural networks, In D.S. Touretzky (Ed.), Advances in Neural Information Processing II, 1990.
13. R.S. Sutton, A.G. Barto, and R. Wilson, Reinforcement learning is direct adaptive optimal control, IEEE Control Systems, Vol. 12, pp. 19-22, 1992.

CHAPTER 16

Neural Networks Applications

16.1 INTRODUCTION

A list of some applications mentioned in the literature follows:

1. Aerospace: High performance aircraft autopilot, flight path simulation, aircraft control systems, autopilot enhancements, aircraft component simulation, aircraft component fault detection.
2. Automotive: Automobile automatic guidance system, warranty activity analysis.
3. Banking: Check and other document reading, credit application evaluation.
4. Credit Card Activity Checking: Neural networks are used to spot unusual credit card activity that might possibly be associated with loss of a credit card.
5. Defense: Weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and image signal processing including data compression, feature extraction and noise suppression, signal/image identification.
6. Electronics: Code sequence prediction, integrated circuit chip layout, process control, chip failure analysis, machine vision, voice synthesis, nonlinear modeling.
7. Entertainment: Animation, special effects, market forecasting.
8. Financial: Real estate appraisal, loan advisor, mortgage screening, corporate bond rating, credit-line use analysis, portfolio trading program, corporate financial analysis, currency price prediction.
9. Industrial: Neural networks are being trained to predict the output gasses of furnaces and other industrial processes. They then replace complex and costly equipment used for this purpose in the past.
10. Insurance: Policy application evaluation, product optimization.
11. Manufacturing: Manufacturing process control, product design and analysis, process and machine diagnosis, real-time particle identification, visual quality inspection systems, beer testing, welding quality analysis, paper quality prediction, computer-chip quality analysis, analysis of grinding operations, chemical product design analysis, machine maintenance analysis, project bidding, planning and management, dynamic modeling of chemical process systems.

12. Medical: Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital expense reduction, hospital quality improvement, emergency-room test advisement.
13. Oil and Gas: Exploration.
14. Robotics: Trajectory control, forklift robot, manipulator controllers, vision systems.
15. Securities: Market analysis, automatic bond rating, stock trading advisory systems.
16. Speech: Speech recognition, speech compression, vowel classification, text-to-speech synthesis.
17. Telecommunications: Image and data compression, automated information services, real-time translation of spoken language, customer payment processing systems.
18. Transportation: Truck brake diagnosis systems, vehicle scheduling, routing systems.

16.2 ROBOT CONTROL

An important area of application of neural networks is in the field of robotics. Usually, these networks are designed to direct a manipulator, which is the most important form of the industrial robot, to grasp objects, based on sensor data. Other applications include the steering and path-planning of autonomous robot vehicles. In robotics, the major task involves making movements dependent on sensor data. There are four related problems to be distinguished (Craig, 1989):

1. Forward kinematics
2. Inverse kinematics
3. Dynamics
4. Trajectory generation

16.2.1 Forward Kinematics

Kinematics is the science of motion, which treats motion without regard to the forces which cause it. Within this science one studies the position, velocity, acceleration, and all higher order derivatives of the position variables. A very basic problem in the study of mechanical manipulation is that of forward kinematics. This is the static geometrical problem of computing the position and orientation of the end-effector (hand) of the manipulator. Specifically, given a set of joint angles, the forward kinematic problem is to compute the position and orientation of the tool frame relative to the base frame (see Fig. 16.1).

16.2.2 Inverse Kinematics

This problem is posed as follows: given the position and orientation of the end-effector of the manipulator, calculate all possible sets of joint angles which could be used to attain this given position and orientation. This is a fundamental problem in the practical use of manipulators. The inverse kinematic problem is not as simple as the forward one. Because the kinematic equations are nonlinear, their solution is not always easy or even possible in a closed form. Also, the questions of existence of a solution, and of multiple solutions, arise. Solving this problem is a minimal requirement for most robot control systems.

[Fig. 16.1: An exemplar robot manipulator, with joints 1-4 between the base frame and the tool frame.]

16.2.3 Dynamics

Dynamics is a field of study devoted to studying the forces required to cause motion. In order to accelerate a manipulator from rest, glide at a constant end-effector velocity, and finally decelerate to a stop, a complex set of torque functions must be applied by the joint actuators. In dynamics not only the geometrical properties (kinematics) but also the physical properties of the robot are taken into account. Take for instance the weight (inertia) of the robot arm, which determines the force required to change the motion of the arm. The dynamics introduce two extra problems to the kinematic problems:

1. The robot arm has a memory. Its response to a control signal depends also on its history (e.g. previous positions, speed, acceleration).
2. If a robot grabs an object then the dynamics change but the kinematics don't. This is because the weight of the object has to be added to the weight of the arm (that's why robot arms are so heavy, making the relative weight change very small).

16.2.4 Trajectory Generation

To move a manipulator from here to there in a smooth, controlled fashion, each joint must be moved via a smooth function of time. Exactly how to compute these motion functions is the problem of trajectory generation. In the first section of this chapter we will discuss the problems associated with the positioning of the end-effector (in effect, representing the inverse kinematics in combination with sensory transformation).

16.2.5 End-Effector Positioning

The final goal in robot manipulator control is often the positioning of the hand or end-effector in order to be able to, e.g., pick up an object. With the accurate robot arms that are manufactured, this task is often relatively simple, involving the following steps:

1. Determine the target coordinates relative to the base of the robot. Typically, when this position is not always the same, this is done with a number of fixed cameras or other sensors which observe the work scene, from the image frame determine the position of the object in that frame, and perform a pre-determined coordinate transformation.
this is done with a number of fixed cameras or other sensors which observe the work scene. but also the physical properties of the robot are taken into account. representing the inverse kinematics in combination with sensory transformation).2. In order to accelerate a manipulator from rest.g. 16.g.

This is not trivial. need frequent recalibration or parameter determination. We will discuss three fundamentally different approaches to neural networks for robot end-effector positioning.(16.e. 16.5a Involvement of Neural Networks So if these parts are relatively simple to solve with a high accuracy. Finally. . a solution will be found for both the learning sample generation and the function representation. a form of self-supervised or unsupervised learning is required. Instead.. Constructing the mapping N() from the available learning samples. Also. The target position xtarget together with the visual position of the hand xhand are input to the neural controller N(). Generating learning samples which are in accordance with eq. This controller then generates a joint position q for the robot: q = N(xtarget.2) The task of learning is to make the N generate an output close enough to q0. a neural network uses these samples to represent the whole input space over which the robot is active. (8. and a robot arm. Gripper control is not a trivial matter at all. but has the problem that the input space is of a high dimensionality. the inverse kinematics). The visual system must identify the target as well as determine the visual position of the end-effector. less rigid) robot systems.6 Camera-Robot Coordination in Function Approximation The system we focus on in this section is a work floor observed by fixed cameras. which suffer from wear-and-tear.202 FUZZY LOGIC AND NEURAL NETWORKS With a precise model of the robot (supplied by the manufacturer).. why involve neural networks? The reason is the applicability of robots.. yet still with accurate models as starting point) are required and the system must be calibrated. xhand) . There are two problems associated with teaching N(): 1. but we will not focus on that.1) We can compare the neurally generated qq with the optimal qq0 generated by a fictitious perfect controller R(): q0 = R(xtarget. 
Move the arm (dynamics control) and close the gripper.2. the development of more complex (adaptive!) control methods allows the design and use of more flexible (i.2).(16. In each of these approaches. xhand) . This is evidently a form of interpolation. since in useful applications R() is an unknown function. both on the sensory and motor side.2.e. accurate models of the sensors and manipulators (in some cases with unknown parameters which have to be estimated from the systems behavior. When traditional methods are used to control a robot arm. systems. This is a relatively simple problem. 16. Some examples to solve this problem are given below. 2.. and the samples are randomly distributed. calculate the joint angles to reach the target (i.. When the (usually randomly drawn) learning samples are available..

16.2.6a Approach-1: Feed-forward Networks

When using a feed-forward system for controlling the manipulator, a self-supervised learning system must be used. One such system has been reported by Psaltis, Sideris and Yamamura (1988). Here, the network, which is constrained to two-dimensional positioning of the robot arm, learns by experimentation. Three methods are proposed:

1. Indirect learning: In indirect learning, a Cartesian target point x in world coordinates is generated in each cycle, e.g. by two cameras looking at an object. This target point is fed into the network, which generates an angle vector q. The manipulator moves to position q, and the cameras determine the new position x' of the end-effector in world coordinates. This x' again is input to the network, resulting in q'. The network is then trained on the error e_1 = q - q' (see Fig. 16.2); the network is thus used in two different places, first in the forward step and then for feeding back the error. However, minimization of e_1 does not guarantee minimization of the overall error e = x - x'. For example, the network often settles at a solution that maps all x's to a single q (i.e. the mapping II).

[Fig. 16.2: Indirect learning system for robotics.]

2. General learning: The method is basically very much like supervised learning, but here the plant input q must be provided by the user. Thus the network can directly minimize |q - q'|. The success of this method depends on the interpolation capabilities of the network. Correct choice of q may pose a problem.

3. Specialized learning: Keep in mind that the goal of the training of the network is to minimize the error at the output of the plant: e = x - x'. We can also train the network by backpropagating this error through the plant (compare this with the backpropagation of the error in Chapter 12). The learning rule applied here regards the plant as an additional and unmodifiable layer in the neural network. This method requires knowledge of the Jacobian matrix of the plant. A Jacobian matrix of a multidimensional function F is a matrix of partial derivatives of F, i.e., the multidimensional form of the derivative. For example, if we have Y = F(X), i.e.

y_1 = f_1(x_1, x_2, ..., x_n)
y_2 = f_2(x_1, x_2, ..., x_n)
...
y_m = f_m(x_1, x_2, ..., x_n)
4) where J is the Jacobian matrix of F. (16..(16.5) where Pi(q) the ith element of the plant output for input q...(16. 1991)... (16.3) is also written as dY = J(X) dX . The learning rule applied here regards the plant as an additional and unmodifiable layer in the neural network. However.(16.. (16.. instead of . A somewhat similar approach is taken in (Krose.. Now. and Groen. the Jacobian matrix can be used to calculate the change in the function when its parameters change. 1990) and (Smagt and Krose. Korst. The total error Î = x x¢is propagated back through the plant by calculating the dj: @j = F(sj) åd i i ¶ P ( q) i ¶q j .. + m dxn ¶x1 ¶ x2 ¶x n dY = ¶F dX ¶X . When the plant is an unknown function.. + 1 dxn ¶x1 ¶ x2 ¶x n ¶f2 ¶f ¶f dx1 + 2 dx2 + ..3) LM ¶P OP N ¶q Q i i .204 FUZZY LOGIC AND NEURAL NETWORKS then dy1 = dy2 = ¶f1 ¶f ¶f dx1 + 1 dx2 + . be approximated by ¶Pi (q) can ¶q j ¶Pi (q) Pi (q + hq j e j ) Pi (q) » ¶q j ¶q j . in this case we have Jij = ¶fm ¶f ¶f dx1 m dx2 + . + 2 dxn ¶x1 ¶ x2 ¶x n M dym = or Eq. so....6) di = x x¢ where I is used to change the scalar qj into a vector. Again a two-layer feed-forward network is trained with back-propagation..7) This approximate derivative can be measured by slightly changing the input the plant and measuring the changes in the output..

NEURAL NETWORKS APPLICATIONS 205

Fig. 16.3 The system used for specialized learning.

In this indirect learning scheme, instead of calculating a desired output vector, the input vector which should have invoked the current output vector is reconstructed, and back-propagation is applied to this new input vector and the existing output vector.

The configuration used consists of a monocular manipulator, which has to grasp objects. Due to the fact that the camera is situated in the hand of the robot, the task is to move the hand such that the object is in the centre of the image and has some predetermined size (in a later article, a biologically inspired system is proposed (Smagt, Krose, & Groen, 1992) in which the visual flow-field is used to account for the monocularity of the system, such that the dimensions of the object need not be known anymore to the system). One step towards the target consists of the following operations:

1. Measure the distance from the current position to the target position in camera domain, x.
2. Use this distance, together with the current state q of the robot, as input for the neural network. The network then generates a joint displacement vector Δq.
3. Send Δq to the manipulator.
4. Again measure the distance from the current position to the target position in camera domain, x′.
5. Calculate the move made by the manipulator in visual domain, x − R_l^{l+1} x′, where R_l^{l+1} is the rotation matrix of the second camera image with respect to the first camera image.
6. Teach the learning pair (x − R_l^{l+1} x′, q; Δq) to the network.

This system has been shown to learn correct behaviour in only tens of iterations, and to be very adaptive to changes in the sensor or manipulator (Smagt & Krose, 1991; Smagt, Groen, & Krose, 1993).

By using a feed-forward network, the available learning samples are approximated by a single, smooth function consisting of a summation of sigmoid functions. A feed-forward network with one layer of sigmoid units is capable of representing practically any function. But how are the optimal weights determined in finite time to obtain this optimal representation? Experiments have shown that, although a reasonable representation can be obtained in a short period of time, an accurate representation of the function that governs the learning samples is often not feasible or extremely difficult (Jansen et al., 1994). The reason for this is the global character of the approximation obtained with a feed-forward network with sigmoid units: every weight in the network has a global effect on the final approximation that is obtained.
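The six-step cycle above can be sketched with a linear stand-in for both the "network" and the camera-robot relation (my own toy setup, not the authors' implementation; the rotation R is taken as the identity and the state q is omitted for brevity):

```python
import numpy as np

# Toy stand-ins: the "camera" observes J @ dq, and a linear "network" learns
# the map from observed visual moves back to the joint displacements that
# caused them (steps 5-6 above). J itself is unknown to the learner.
rng = np.random.default_rng(0)
J = np.array([[1.0, 0.3],
              [0.2, 0.8]])               # joints -> camera-domain map

W = np.zeros((2, 2))                     # linear stand-in for the network
for _ in range(500):
    dq = rng.standard_normal(2)          # exploratory joint displacement (step 3)
    visual_move = J @ dq                 # move observed in camera domain (step 5)
    # step 6: teach the learning pair (visual move -> dq), normalized LMS step
    err = dq - W @ visual_move
    W += 0.5 * np.outer(err, visual_move) / (visual_move @ visual_move)

# Control (steps 1-3): cancel a measured camera-domain error with one move
x = np.array([0.5, -0.2])                # distance to target, camera domain
residual = np.linalg.norm(x - J @ (W @ x))
print(residual)
```

After training, W approximates the inverse of the joint-to-camera map, so a single commanded move nearly cancels the observed error.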

206 FUZZY LOGIC AND NEURAL NETWORKS

Building local representations is the obvious way out: every part of the network is responsible for a small subspace of the total input space. Thus accuracy is obtained locally (keep it small and simple). This is typically obtained with a Kohonen network.

16.2.6b Approach 2: Topology Conserving Maps

Ritter, Martinetz, and Schulten (1989) describe the use of a Kohonen-like network for robot control. We will only describe the kinematics part, since it is the most interesting and straightforward. The system described by Ritter et al. consists of a robot manipulator with three degrees of freedom (orientation of the end-effector is not included), which has to grab objects in 3D-space. The system is observed by two fixed cameras which output their (x, y) coordinates of the object and the end-effector (see Fig. 16.4).

Fig. 16.4 A Kohonen network merging the output of two cameras.

Each run consists of two movements. In the gross move, the observed location of the object x (a four-component vector) is input to the network. As with the Kohonen network, the neuron k with highest activation value is selected as winner, because its weight vector wk is nearest to x. The neurons, which are arranged in a 3-dimensional lattice, correspond in a 1-1 fashion with subregions of the 3D workspace of the robot, i.e., the neuronal lattice is a discrete representation of the workspace. With each neuron a vector q and a Jacobian matrix A are associated. During the gross move, qk is fed to the robot, which makes its move, resulting in retinal coordinates xg of the end-effector. To correct for the discretization of the working space, an additional move is made which is dependent on the distance between the neuron and the object in space, wk − x. This small displacement in Cartesian space is translated to an angle change using the Jacobian Ak:

qfinal = qk + Ak(x − wk)   ...(16.8)

which is a first-order Taylor expansion of qfinal. The final retinal coordinates of the end-effector after this fine move are in xf.
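A minimal sketch of the gross and fine move of eq. (16.8), using a randomly initialized lattice (illustrative shapes and names, not the original implementation):

```python
import numpy as np

# Illustrative lattice of 27 neurons; each stores a camera-domain weight
# vector w_k, joint angles q_k, and a Jacobian A_k (random toy values).
rng = np.random.default_rng(1)
n = 27
w = rng.uniform(-1, 1, (n, 4))        # preferred observations (two cameras)
theta = rng.uniform(-1, 1, (n, 3))    # stored joint vectors q_k
A = rng.uniform(-1, 1, (n, 3, 4))     # stored Jacobians A_k

x = rng.uniform(-1, 1, 4)             # observed object location

k = int(np.argmin(np.linalg.norm(w - x, axis=1)))   # winner: w_k nearest to x
theta_gross = theta[k]                               # gross move
theta_final = theta_gross + A[k] @ (x - w[k])        # fine move, eq. (16.8)
print(theta_final)
```

The winner is the neuron whose weight vector is nearest to the observation, and the fine move is the first-order correction through the stored Jacobian.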

Learning proceeds as follows: when an improved estimate (q, A)* has been found, the following adaptations are made for all neurons j:

wj_new = wj_old + γ(t) gjk(t) (x − wj_old)

(q, A)j_new = (q, A)j_old + γ′(t) g′jk(t) ((q, A)* − (q, A)j_old)

If gjk(t) = g′jk(t) = δjk, this is similar to perceptron learning. Otherwise, as with the Kohonen learning rule, a distance function is used such that gjk(t) and g′jk(t) are Gaussians depending on the distance between neurons j and k, with a maximum at j = k.

An improved estimate (q, A)* is obtained as follows:

q* = qk + Ak(x − xf)   ...(16.9)

A* = Ak + Ak(x − wk − xf + xg) (xf − xg)T / ||xf − xg||2 = Ak + (Δq − Ak Δx) ΔxT / ||Δx||2   ...(16.10)

In eq. (16.9), the final error x − xf in Cartesian space is translated to an error in joint space via multiplication by Ak. This error is then added to qk to constitute the improved estimate q* (steepest descent minimization of error). In eq. (16.10), Δq = Ak(x − wk), and Δx = xf − xg, the change in retinal coordinates of the end-effector due to the fine movement. Thus eq. (16.10) can be recognized as an error-correction rule of the Widrow-Hoff type for Jacobians A. It appears that after 6,000 iterations the system approaches correct behavior, and that after 30,000 learning steps no noteworthy deviation is present.

16.2.7 Robot Arm Dynamics

While end-effector positioning via sensor-robot coordination is an important problem to solve, the robot itself will not move without dynamic control of its limbs. Again, accurate control with non-adaptive controllers is possible only when accurate models of the robot are available, and the robot is not too susceptible to wear-and-tear. This requirement has led to the current-day robots that are used in many factories. But the application of neural networks in this field changes these requirements.

One of the first neural networks which succeeded in doing dynamic control of a robot arm was presented by Kawato, Furukawa, and Suzuki (1987). Their system does not include the trajectory generation or the transformation of visual coordinates to body coordinates, but only the dynamics model. They describe a neural network which generates motor commands from a desired trajectory in joint angles. The manipulator used consists of three joints, as the manipulator in Fig. 16.1 without the wrist joint. The desired trajectory qd(t), which is generated by another subsystem, is fed into the inverse-dynamics model (Fig. 16.5). The error between qd(t) and q(t) is fed into the neural model. The network is extremely simple: in fact, the system is a feed-forward network, and the basis functions are chosen such that the function that is approximated is a linear combination of those basis functions. By carefully choosing the basis functions, the network can be restricted to one learning layer, such that finding the optimal weights becomes a trivial task.
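Eqs. (16.9) and (16.10) can be checked numerically: with the Widrow-Hoff type correction, the updated Jacobian exactly reproduces the observed fine move (toy numbers, variable names as in the text):

```python
import numpy as np

# Toy quantities for one estimate-improvement step (illustrative values only).
rng = np.random.default_rng(2)
x  = rng.uniform(-1, 1, 4)      # observed object location
wk = rng.uniform(-1, 1, 4)      # winner's weight vector
xg = rng.uniform(-1, 1, 4)      # retinal coords of end-effector, gross move
xf = rng.uniform(-1, 1, 4)      # retinal coords of end-effector, fine move
qk = rng.uniform(-1, 1, 3)
Ak = rng.uniform(-1, 1, (3, 4))

# eq. (16.9): translate the final Cartesian error into joint space
q_star = qk + Ak @ (x - xf)

# eq. (16.10): Widrow-Hoff type correction of the Jacobian
dq = Ak @ (x - wk)              # commanded fine move in joint space
dx = xf - xg                    # observed change in retinal coordinates
A_star = Ak + np.outer(dq - Ak @ dx, dx) / (dx @ dx)

# The corrected Jacobian maps the observed retinal change to the commanded move
print(np.allclose(A_star @ dx, dq))
```

This is exactly the error-correction property noted above: after the update, A*Δx = Δq.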

Fig. 16.5 The neural model proposed by Kawato et al. The desired trajectory qd(t) enters the inverse-dynamics model, whose output Ti(t) is added to the feedback torque Tf(t) to form the total torque T(t) driving the manipulator.

The neural model, which is shown in Fig. 16.6, consists of three perceptrons, each one feeding in one joint of the manipulator. The desired trajectory qd = (qd1, qd2, qd3) is fed into 13 nonlinear subsystems. The resulting signals are weighted and summed, such that

Tik(t) = Σ wlk xlk (sum over l = 1, ..., 13),  (k = 1, 2, 3)   ...(16.11)

with

xl1 = fl(qd1(t), qd2(t), qd3(t))
xl2 = xl3 = gl(qd1(t), qd2(t), qd3(t))

and fl and gl as in Table 16.1. The feedback torque Tf(t) in Fig. 16.5 consists of

Tfk(t) = Kpk(qdk(t) − qk(t)) + Kvk dqk(t)/dt,  (k = 1, 2, 3)   ...(16.12)

with Kvk = 0 unless |qk(t) − qdk(objective point)| < ε. The feedback gains Kp and Kv were computed as (517.2, 746.0, 191.4)T and (16.2, 37.2, 8.4)T. Next, the weights adapt using the delta rule

γ dwlk/dt = xlk Tfk = xlk (Tk − Tik),  (k = 1, 2, 3)

A desired move pattern is shown in Fig. 16.7. After 20 minutes of learning the feedback torques are nearly zero, such that the system has successfully learned the transformation. Although the applied patterns are very dedicated, training with a repetitive pattern sin(ωkt), with ω1 : ω2 : ω3 = 1 : √2 : √3, is also successful.

Fig. 16.6 The neural network used by Kawato et al. There are three neurons, one per joint in the robot arm. Each neuron feeds from thirteen nonlinear subsystems. The upper neuron is connected to the rotary base joint (cf. joint 1 in Fig. 16.1), the other two neurons to joints 2 and 3.

Table 16.1: Nonlinear transformations used in the Kawato model. The thirteen subsystems fl(q1, q2, q3) and gl(q1, q2, q3), l = 1, ..., 13, are products of joint accelerations, velocities and trigonometric functions of the joint angles (for example q̈1, q̈1 sin² q2, q̈1 cos² q2, q̇1 q̇2 sin q2 cos q2, q̇1² sin q3, ...).

Fig. 16.7 The desired joint pattern for joint 1. Joints 2 and 3 have similar time patterns.

The usefulness of neural algorithms is demonstrated by the fact that novel robot architectures, which no longer need a very rigid structure to simplify the controller, are now constructed.

16.3 DETECTION OF TOOL BREAKAGE IN MILLING OPERATIONS

The recent trend in manufacturing is to achieve integrated and self-adjusting machining systems, which are capable of machining varying parts without the supervision of operators, ensuring a safe and efficient metal removal rate and taking corrective actions in the event of failures and disturbances (Yusuf, 1995). The absence of human supervision requires on-line monitoring of the machining operation. One of the most important monitoring requirements is a system capable of detecting tool breakages on-line. Unless recognized in time, tool breakage can lead to irreparable damage to the workpiece and possibly to the machine tool itself.

Artificial neural networks refer to a group of architectures inspired by the brain (Cheng and Sheng, 1998). Neural networks with parallel processing capability and robust performance provide a new approach to adaptive pattern recognition. Neural networks are also classified as supervised and unsupervised according to their learning characteristics. In unsupervised learning, after training, the neural network classifies the signals by itself. Adaptive Resonance Theory (ART 2) architectures are neural networks that carry out stable self-organization of recognition codes for arbitrary sequences of input patterns.

The cutting force variation characteristics of normal and broken tools are different. With the normal and broken tool cutting force variation signals it is possible to train neural networks, so that milling operations can be monitored with the neural network. In this study (Ibrahim and Mclaughlin, 1993), the use of an ART type unsupervised neural network paradigm was evaluated for detection of tool breakage. Also, simulation-based training is proposed to reduce the cost of preparing the systems that monitor the real cutting signals. The ART paradigm was used for the following reasons:

(a) The training of the paradigm is much faster than the back-propagation technique.
(b) The back-propagation technique generalizes the given information in order to store it inside the initially selected hidden layers; it cannot give reliable decisions on the sufficiency of previous training.

(c) ART has the very important advantage that it can be trained in the field and continuously updates previous experience.

Another important issue is the training of the neural network. It is extremely expensive and time consuming to collect cutting force data at different cutting conditions with normal and broken tools. To overcome this problem, simulation-based training of neural networks was introduced. Simulated data was used to select the best vigilance of the ART 2 type neural network and to evaluate the performance of the paradigm. The unsupervised ART neural networks can monitor the signal based on previous experience and can update themselves automatically while monitoring the signals (Carpenter and Grossberg, 1991). The theoretical background of the ART 2 type neural network, the proposed data monitoring system and their performance are presented in this section.

16.3.1 Unsupervised Adaptive Resonance Theory (ART) Neural Networks

The theory of adaptive resonance networks was first introduced by Carpenter and Grossberg (1987). The ART 2 neural networks developed by Carpenter and Grossberg (1991) self-organize recognition codes in real time. Adaptive resonance occurs when the input to a network and the feedback expectancies match. The basic ART 2 architecture consists of two types of nodes: the short term memory (STM) nodes, which are temporary and flexible, and the long term memory (LTM) nodes, which are permanent and stable. The STM is divided into two sets of nodes, F1 and F2. The STM F1 nodes are used for normalization, gain and learning procedures.

When an ART network receives an input pattern, this bottom-up pattern is compared to the top-down, or already known, patterns. If the input pattern is matched with a known pattern in memory, the weights of the model are changed to update the category. If the new pattern cannot be classified in a known category, it is coded and classified as a new category. The input pattern (si) is received by the STM, where it is normalized, learned, and stored in the LTM (zji).

The F1 field in ART 2 includes a combination of normalization and noise suppression, in addition to the comparison of the bottom-up and top-down signals needed for the reset mechanism. F1 uses the following equations to calculate the nodes:

ui = vi / (e + ||v||)   ...(16.13)

wi = si + a ui   ...(16.14)

pi = ui + Σ g(yi) zji   ...(16.15)

qi = pi / (e + ||p||)   ...(16.16)

xi = wi / (e + ||w||)   ...(16.17)

vi = f(xi) + b f(qi)   ...(16.18)
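One settling pass through the F1 equations (16.13)-(16.18), with F2 inactive so the top-down term in p vanishes. The constants a, b, e and the suppression threshold θ below are illustrative choices, not values from the study:

```python
import numpy as np

a, b, e, theta = 10.0, 10.0, 1e-7, 0.1

def f(x):
    """Noise-suppression function: pass values at or above theta, zero the rest."""
    return np.where(x >= theta, x, 0.0)

def f1_field(s, n_iter=10):
    """Iterate the F1 equations until the field settles (F2 inactive)."""
    u = v = p = q = np.zeros_like(s)
    for _ in range(n_iter):
        w = s + a * u                       # (16.14)
        x = w / (e + np.linalg.norm(w))     # (16.17)
        v = f(x) + b * f(q)                 # (16.18)
        u = v / (e + np.linalg.norm(v))     # (16.13)
        p = u                               # (16.15), top-down sum is zero
        q = p / (e + np.linalg.norm(p))     # (16.16)
    return p

s = np.array([0.2, 0.0, 0.05, 0.7])        # raw input with small noise entries
print(np.round(f1_field(s), 3))
```

The combination of normalization and the nonlinear function f suppresses the small components of the input while the large ones are enhanced and normalized.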

Here ||p||, ||v|| and ||w|| denote the norms of the vectors p, v and w, and si is the input. The constants a, b and e are selected based on the particular application. For noise suppression, the activation function f is given by

f(x) = x if x ≥ θ, and 0 if 0 ≤ x < θ   ...(16.19)

where θ is an appropriate constant. The function f filters the noise from the signal; θ can be set to zero for the case where filtering is not desired.

The STM F2 nodes are used for the matching procedure. The F2 equations select or activate nodes in the LTM. The node that gives the largest sum with the F1 output pattern (bottom-up) is the key property that is used for node selection. When F2 chooses a node, all other nodes in the LTM are inhibited, and only one is allowed to interact with the STM. The jth node is selected if equation (16.20) is satisfied:

Tj = max {Tj : j = 1, 2, ..., N}   ...(16.20)

Competition on F2 results in contrast enhancement, where a single winning node is chosen each time. The output function of F2 is given by

g(yj) = d if Tj = max {Tj : j = 1, 2, ..., N}, and 0 otherwise   ...(16.21)

Bottom-up inputs are calculated as in ART 2 (Fausett, 2004):

Tj = Σ pi zji   ...(16.22)

Equation (16.15) then takes the following form:

pi = ui if F2 is inactive, and ui + d zij if the jth node of F2 is active   ...(16.23)

The bottom-up and top-down LTM equations are

bottom-up (F1 → F2): dzij/dt = g(yj) [pi − zij]   ...(16.24)

top-down (F2 → F1): dzji/dt = g(yj) [pi − zji]   ...(16.25)

When F2 is active, equations (16.24) and (16.25) reduce to

dzij/dt = d [pi − zij]   ...(16.26)

where d is a constant (0 < d < 1). An orienting ART 2 subsystem is used to decide if a new pattern can be matched to a known pattern by comparing with a given vigilance parameter ρ:

ri = (ui + c pi) / (e + ||u|| + ||c p||)   ...(16.27)

1. If ||r|| ≥ ρ − e, a match has been found; the LTM node weights are recalculated and the pattern is learned by the system.
2. If ||r|| < ρ − e, then F2 resets and another node is tried.
3. If no match has been found after all nodes have been activated, a new node is created and the new pattern is stored.

16.3.2 Results and Discussion

The experimental data was collected with a four-flute end mill of 12.7 mm diameter at various cutting conditions. The spindle speed, feed rate, and depth of cut of these different conditions are outlined in Tables 16.2-16.5. The ART neural network monitored the profile of the resultant force in the different tests. The neural network did not have any prior information at the beginning of each test. In each test, the neural network inspected the resultant force profile and placed it into a category, or initiated a new category if it was found to be different. The vigilance of the ART 2 was selected as either 0.96 or 0.98 in all the tests.

In all the tests, the neural network classified the good and broken tools in different categories. As seen in Tables 16.2-16.5, experiments were done at different feed rates with the good and broken tool. The neural network generated only one category in the 2nd (Table 16.3), 3rd (Table 16.4) and 4th (Table 16.5) tests for the broken tools, while it assigned more nodes to the signal of a good tool. This indicates that the broken tool signals are more similar to each other at different cutting conditions compared to the force patterns of normal tools.

Table 16.2 Classification of experimental data with the ART. Vigilance of the neural network was 0.96.

Spindle speed (rpm)   Depth of cut (mm)   Feed rate (mm/min)   Tool condition   Category
500                   1.016               50.8                 G                1
500                   1.016               50.8                 B                2
500                   1.016               101.6                G                1
500                   1.016               101.6                B                3
500                   1.016               203.2                G                1
500                   1.016               203.2                B                3
500                   1.016               254                  G                4
500                   1.016               254                  B                5
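The category-assignment logic described above can be sketched with a simplified, ART-like categorizer (normalized prototypes and a cosine-style vigilance test; this is not the full ART 2 dynamics, and the "force profiles" below are invented toy data):

```python
import numpy as np

def categorize(patterns, rho):
    """Assign each pattern to a category; open a new category whenever no
    stored prototype passes the vigilance test rho (winner-take-all match)."""
    prototypes, labels = [], []
    for s in patterns:
        s = s / np.linalg.norm(s)
        best_j, best_sim = -1, -1.0
        for j, p in enumerate(prototypes):
            sim = float(p @ s)                   # similarity to prototype j
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_sim >= rho:                      # resonance: update the winner
            p = prototypes[best_j] + s
            prototypes[best_j] = p / np.linalg.norm(p)
            labels.append(best_j)
        else:                                    # reset exhausted: new category
            prototypes.append(s)
            labels.append(len(prototypes) - 1)
    return labels

# Invented "resultant force profiles": two normal-tool-like and two
# broken-tool-like patterns; a high vigilance separates the two families.
normal = [np.array([1.0, 0.9, 1.1, 1.0]), np.array([1.1, 1.0, 0.9, 1.0])]
broken = [np.array([1.0, 0.1, 1.9, 0.2]), np.array([0.9, 0.2, 2.0, 0.1])]
print(categorize(normal + broken, rho=0.95))   # -> [0, 0, 1, 1]
```

Raising the vigilance ρ makes the matching stricter, so more categories are created — the trade-off the study tunes with simulated data.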

Table 16.3 Classification of experimental data with the ART. Vigilance of the neural network was 0.96. The ART used four categories to classify all of the data.

Spindle speed (rpm)   Depth of cut (mm)   Feed rate (mm/min)   Tool condition   Category
700                   1.016               50.8                 G                1
700                   1.016               50.8                 B                2
700                   1.016               101.6                G                3
700                   1.016               101.6                B                2
700                   1.016               203.2                G                4
700                   1.016               203.2                B                2
700                   1.016               254                  G                4
700                   1.016               254                  B                2

Table 16.4 Classification of experimental data with the ART 2. Vigilance of the neural network was 0.96. The ART used three categories to classify all of the data.

Spindle speed (rpm)   Depth of cut (mm)   Feed rate (mm/min)   Tool condition   Category
500                   1.524               50.8                 G                1
500                   1.524               50.8                 B                2
500                   1.524               101.6                G                3
500                   1.524               101.6                B                2
500                   1.524               203.2                G                1
500                   1.524               203.2                B                2
500                   1.524               254                  G                3
500                   1.524               254                  B                2

Table 16.5 Classification of experimental data with the ART 2. When the vigilance of 0.98 is used, the ART 2 used two categories to classify all of the data.

Spindle speed (rpm)   Depth of cut (mm)   Feed rate (mm/min)   Tool condition   Category
700                   1.524               50.8                 G                1
700                   1.524               50.8                 B                2
700                   1.524               101.6                G                1
700                   1.524               101.6                B                2
700                   1.524               203.2                G                1
700                   1.524               203.2                B                2
700                   1.524               254                  G                1
700                   1.524               254                  B                2

The ART gained its first experience on the simulation data, and later the neural network started to monitor the experimental data collected at different conditions. After simulation training, the network classified the perfect tool input data into seven different categories and classified the broken tool input data into four different categories. Afterwards, the neural network inspected the incoming signals and continued to assign new categories when different types of signals were encountered. The studies focused on selection of the best vigilance, which requires a minimum number of nodes and has an acceptable error rate.

QUESTION BANK.

1. Enumerate various applications of neural networks.
2. Explain the application of self-supervised learning to control a manipulator.
3. Explain the application of the Kohonen network for robot control.
4. Explain the application of neural networks for robot arm dynamics.
5. Explain the application of ART for machining applications.

REFERENCES.

1. D. Psaltis, A. Sideris, and A. Yamamura, A multilayer neural network controller, IEEE Control Systems Magazine, Vol. 8, No. 2, pp. 17-21, 1988.
2. B.J.A. Krose, M.J. Korst, and F.C.A. Groen, Learning strategies for a vision based neural controller for a robot arm, in O. Kaynak (Ed.), IEEE International Workshop on Intelligent Motor Control, pp. 199-203, IEEE, 1990.
3. P. Smagt and B.J.A. Krose, A real-time learning neural robot controller, in T. Kohonen, K. Makisara, O. Simula, and J. Kangas (Eds.), Proceedings of the 1991 International Conference on Artificial Neural Networks, pp. 351-356, North-Holland/Elsevier Science Publishers, 1991.
4. P. Smagt, F.C.A. Groen, and B.J.A. Krose, Robot Hand-Eye Coordination Using Neural Networks, Tech. Rep. CS-93-10, Department of Computer Systems, University of Amsterdam, 1993.
5. P. Smagt, B.J.A. Krose, and F.C.A. Groen, Using time-to-contact to guide a robot manipulator, in Proceedings of the 1992 IEEE/RSJ International Conference on Intelligent Robots and Systems, 1992.
6. A. Jansen et al., Nested networks for robot control, in A. Murray (Ed.), Neural Network Applications, Kluwer Academic Publishers, 1995.
7. H. Ritter, T. Martinetz, and K. Schulten, Topology-conserving maps for learning visuomotor-coordination, Neural Networks, Vol. 2, No. 3, pp. 159-168, 1989.
8. M. Kawato, K. Furukawa, and R. Suzuki, A hierarchical neural-network model for control and learning of voluntary movement, Biological Cybernetics, Vol. 57, pp. 169-185, 1987.
9. J. Craig, Introduction to Robotics, Addison-Wesley Publishing Company, 1989.
10. Adaptive Hamming net: A fast learning ART 1 model without searching, Int. J. of Neural Networks, Vol. 8, No. 4, pp. 605-618, 1995.
11. R. Cheng and F. Sheng, 1998.
12. A. Ibrahim and C. Mclaughlin, Detection of tool breakage in milling II: The neural network approach, Int. J. Mach. Tools Manufact., Vol. 33, No. 4, pp. 545-558, 1993.
13. Yusuf, In-process detection of tool breakages using time series monitoring of cutting forces, Int. J. Mach. Tools Manufact., Vol. 28, No. 2, pp. 157-172, 1988.

14. G.A. Carpenter and S. Grossberg, A massively parallel architecture for a self-organizing neural pattern recognition machine, Computer Vision, Graphics, and Image Processing, Vol. 37, pp. 54-115, 1987.
15. G.A. Carpenter and S. Grossberg, ART 2: Self-organization of stable category recognition codes for analog input patterns, Applied Optics, 26(23), pp. 4919-4930, 1987.
16. G.A. Carpenter and S. Grossberg, ART 2-A: An adaptive resonance algorithm for rapid category learning and recognition, Neural Networks, Vol. 4, pp. 493-504, 1991.
17. L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, 2nd Edition, Pearson Education, 2004.

17. Hybrid Fuzzy Neural Networks

17.1 INTRODUCTION

Neural networks and fuzzy systems try to emulate the operation of the human brain. Neural networks concentrate on the structure of the human brain, i.e., on the hardware, whereas fuzzy logic systems concentrate on software. Combining neural networks and fuzzy systems in one unified framework has become popular in the last few years.

17.2 HYBRID SYSTEMS

The designation neurofuzzy has several different meanings. Sometimes neurofuzzy is associated with hybrid systems, which act on two distinct subproblems: in that case, a neural network is utilized in the first subproblem (e.g., in signal processing) and fuzzy logic is utilized in the second subproblem (e.g., in the reasoning task). Normally, when talking about neurofuzzy systems, the link between these two soft computing methods is understood to be stronger. In the following, light is shed on the most common interpretations.

Hybrid systems are those for which more than one technology is employed to solve the problem. The hybrid systems are classified as:

- Sequential hybrids
- Auxiliary hybrids
- Embedded hybrids

17.2.1 Sequential Hybrid Systems

Sequential hybrid systems make use of technologies in a pipeline-like structure. The output of one technology becomes the input to another technology, and so on. This is one of the weakest forms of hybridization because an integrated combination of the technologies is not present. A simple sequential hybrid system is shown in Fig. 17.1.

Fig. 17.1 A sequential hybrid system.

17.2.2 Auxiliary Hybrid Systems

In this, one technology calls the other as a subroutine to process or manipulate information needed by it. The second technology processes the information provided by the first and hands it over for further use. The auxiliary hybrid system is better than the sequential hybrid system. An auxiliary hybrid system is shown in Fig. 17.2.

Fig. 17.2 An auxiliary hybrid system.

17.2.3 Embedded Hybrid Systems

In embedded hybrid systems, the technologies participating are integrated in such a manner that they appear intertwined. The fusion is so complete that it would appear that no technology could be used without the others for solving the problem. The embedded hybrid system is better than the sequential and auxiliary hybrid systems. Fig. 17.3 illustrates an embedded hybrid system.

Fig. 17.3 An embedded hybrid system.

17.3 FUZZY LOGIC IN LEARNING ALGORITHMS

A common approach is to use fuzzy logic in neural networks to improve the learning ability. For example, a fuzzy control of back-propagation is illustrated in Fig. 17.4. The purpose is to achieve a faster rate of convergence by controlling the learning rate parameter with fuzzy rules. Rules are of the type:

Rule 1: IF (GoE is NB) AND (CoGoE is NB) THEN (CoLP is NS)
Rule 13: IF (GoE is ZE) AND (CoGoE is ZE) THEN (CoLP is PS)
Rule 25: IF (GoE is PB) AND (CoGoE is PB) THEN (CoLP is NS)

where CoLP is the change of the learning parameter, GoE is the gradient of the error surface, CoGoE is the change of GoE (an approximation to the second-order gradient), and NB, NS, ZE, PS and PB are the fuzzy sets negative big, negative small, zero equal, positive small and positive big. (They also incorporated in the rules information about the sign change of the gradient and information about the momentum constant.)

Fig. 17.4 Learning rate control by fuzzy logic. FLC is the fuzzy logic controller; MLP is the multilayer perceptron whose actual performance is compared with the desired performance to form the error input.
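A minimal sketch of this kind of fuzzy learning-rate control, assuming triangular membership functions, weighted-average defuzzification, and an illustrative three-rule table (the rule numbering above implies a larger 5 × 5 table over five fuzzy sets; all parameters here are my own stand-ins):

```python
def tri(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if a < x <= b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0

sets = {"NB": (-2, -1, 0), "ZE": (-1, 0, 1), "PB": (0, 1, 2)}   # GoE, CoGoE
out = {"NS": -0.1, "PS": 0.1}                 # crisp CoLP singletons

rules = [("NB", "NB", "NS"),                  # cf. Rule 1
         ("ZE", "ZE", "PS"),                  # cf. Rule 13
         ("PB", "PB", "NS")]                  # cf. Rule 25

def delta_lr(goe, cogoe):
    """Weighted average of the fired rules' outputs (AND = min)."""
    num = den = 0.0
    for a, b, c in rules:
        w = min(tri(goe, *sets[a]), tri(cogoe, *sets[b]))
        num += w * out[c]
        den += w
    return num / den if den else 0.0

print(delta_lr(0.0, 0.0))   # flat gradient region: increase the learning rate
print(delta_lr(1.0, 1.0))   # large gradient and change: decrease it
```

The controller nudges the learning parameter up on flat error surfaces and down when the gradient (and its change) are large, which is what speeds up convergence.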

Simulation results show that the convergence of fuzzy back-propagation is faster than standard back-propagation.

17.4 FUZZY NEURONS

Definition 17.1: (hybrid neural network) A hybrid neural network is a network with real valued inputs xi ∈ [0, 1] (usually membership degrees) and real valued weights wi ∈ [0, 1]. Input and weight are combined (i.e., the product is replaced) using a t-norm T(xi, wi), a t-conorm S(xi, wi), or some other continuous operation. These combinations are again combined (i.e., the addition is replaced) using a t-norm, a t-conorm, or some other continuous operation. The activation function g can be any continuous function.

If we choose a linear activation function and use the t-norm (min) for addition and the t-conorm (max) for product, we get an AND fuzzy neuron (17.1); with the roles interchanged, we get an OR fuzzy neuron (17.2):

y = T(S(x1, w1), ..., S(xd, wd))   ...(17.1)

y = S(T(x1, w1), ..., T(xd, wd))   ...(17.2)

The output of (17.1) corresponds to the min-max composition and (17.2) corresponds to the max-min composition known from fuzzy logic; such neurons can be used to implement fuzzy IF-THEN rules.

Another way to implement a fuzzy neuron is to extend the weights and/or inputs and/or outputs (or targets) to fuzzy numbers. The choices are as follows:

1. crisp input, fuzzy weights, fuzzy output;
2. fuzzy input, fuzzy weights, fuzzy output;
3. fuzzy input, crisp weights, fuzzy output (or crisp output by defuzzification).

In addition there exists a type of network where the weights and targets are crisp and the inputs are fuzzy. The networks of this type are used in classification problems to map fuzzy input vectors to crisp classes.

Definition 17.2: (regular fuzzy neural network, RFNN) A regular fuzzy neural network is a network with fuzzy signals and/or fuzzy weights, a sigmoidal activation function, and all operations defined by the extension principle.

Example 17.1: Consider a simple network y = g(w1x1 + w2x2), where the inputs and weights are fuzzy numbers. We use the extension principle to calculate wixi. The output fuzzy set Y is computed by the extension principle:

Y(y) = (w1x1 + w2x2)(g⁻¹(y)) if 0 ≤ y ≤ 1, and 0 otherwise   ...(17.3)

where g⁻¹(y) = ln y − ln(1 − y) is simply the inverse function of the logistic sigmoid g(z) = 1/(1 + e⁻ᶻ).

The problem of regular fuzzy neural networks is that they are monotonic, which means that fuzzy neural nets based on the extension principle are universal approximators only for monotonic functions.
x¢.HYBRID FUZZY NEURAL NETWORKS 221 Theorem 17. The main goal of identifying principal components is to preserve as much relevant information as possible. if x1 Ì x¢ and x2 Ì x¢ (xi.. The most common method to decrease the dimension of input space is the principal component analysis (PCA). t-conorm..5) .. but there is no standard path to follow. if we have five inputs and each input space is partitioned into seven fuzzy sets. and projecting the data onto this M-dimensional subspace. This reduction is achieved via a set of linear transformations. Definition 17. there is a strong need for data reduction. calculate covariance matrix and its eigenvectors and eigenvalues .807. Selecting M attributes from d is equivalent to selecting M basis vectors which span the new subspace.(17. When the dimension of the problem increases the size of the fuzzy model (and the size of training set needed) grows exponentially. Input and weight are combined using t-norm T(xi. 1]. or some other continuous operation. then g(w1x1 + w2w2) Ì g(w1x¢ + w2 x¢ ) 1 2 Proof: Because min and max are increasing functions. and/or fuzzy valued weights wi Î [0. Activation function g can be any continuous function. The use of more than 4 inputs may be impractical.6) where mi is the number of fuzzy sets on axis i.(17.. This is a serious drawback for the networks of this type.e. wi).5 NEURAL NETWORKS AS PRE-PROCESSORS OR POSTPROCESSORS One of the biggest problems with fuzzy systems is the curse of dimensionality. The smallest number of input variables should be used to explain a maximal amount of information. compute the mean of inputs in data and subtract it off 2. Many researchers working with fuzzy neurons follow the basic principles described above. 1]. The algorithm goes as follows: 1. These combinations are again combined using t-norm. wi). the number of combinations is 16.(17..3: (hybrid fuzzy neural network. 
which transform input variables to a new set of variables (uncorrelated principal components). For example. Therefore. HFNN) A hybrid fuzzy neural network is a network with fuzzy valued inputs xi Î [0. or some continuous operation.4) which means that regular fuzzy neural network is not a universal approximator. i. The number of combinations of input terms (possibly also the number of rules) is Õm i d i . identifying principal reduce the dimensionality of a data in which there are large number of correlated variables and at the same time retaining as much as possible of the variation present in the data. then (w1x1 + w2x2) Ì (w1x¢ + w2 x¢ ) 1 2 . 17.1: g is an increasing function of its arguments.. t-conorm S(xi. wi are 1 2 i fuzzy numbers). Therefore.

3. retain the eigenvectors corresponding to the M largest eigenvalues;
4. project the input vectors onto the eigenvectors.

Neural networks may be used to perform the dimensionality reduction. A two-layer perceptron with linear output units (the number of hidden units is M, with M < d), which is trained to map input vectors onto themselves by minimization of a sum-of-squares error, is able to perform a linear principal component analysis. If two additional nonlinear hidden layers are allowed to be put into the network, the network can be made to perform a nonlinear principal component analysis.

17.6 NEURAL NETWORKS AS TUNERS OF FUZZY LOGIC SYSTEMS

In the 1980s, a computationally efficient training algorithm for multi-layer neural networks was discovered; it was named error back-propagation. The principle of the method is that it finds the derivatives of an error function with respect to the weights in the network. The error function can then be minimized by using gradient-based optimization algorithms. Since back-propagation can be applied to any feed-forward network, some researchers began to represent fuzzy logic systems as feed-forward networks, which led to the development of neurofuzzy systems. The similarities between neural networks and fuzzy logic systems were noticed, and the idea was to use the training algorithm to adjust the weights, centers and widths of the membership functions. Learning is assumed to reduce design costs, increase flexibility, improve performance and decrease human intervention. If prior knowledge is unavailable and/or the plant is time-varying, then the most sensible (possibly the only) solution is to utilize learning capabilities. The original purpose of neurofuzzy systems was to incorporate learning (and classification) capability into fuzzy systems, or alternatively to achieve a similar transparency (intelligibility) in neural networks as in fuzzy systems. In Fig. 17.5, a fuzzy logic system with d inputs and one output is represented as a network.

Fig. 17.5 Neurofuzzy network for back-propagation (Gaussian multivariate membership functions, multipliers, and a normalizing denominator stage).
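The four-step PCA algorithm described above can be sketched in NumPy (toy correlated data; M = 1 retained component):

```python
import numpy as np

rng = np.random.default_rng(4)
# 200 samples of 3 strongly correlated variables plus a little noise
X = rng.standard_normal((200, 1)) @ np.array([[2.0, 1.0, 0.5]])
X += 0.05 * rng.standard_normal(X.shape)

Xc = X - X.mean(axis=0)                    # 1. subtract the mean
C = np.cov(Xc, rowvar=False)               # 2. covariance matrix ...
eigvals, eigvecs = np.linalg.eigh(C)       #    ... eigenvectors and eigenvalues
M = 1
top = eigvecs[:, np.argsort(eigvals)[::-1][:M]]   # 3. keep the M largest
Z = Xc @ top                               # 4. project onto the eigenvectors

print(Z.shape)                             # reduced (uncorrelated) representation
```

Here one principal component captures almost all of the variance, so the 3-dimensional input can feed a fuzzy model through a single axis.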
The error function can then be minimized by using gradient-based optimization algorithms.

17.6 NEURAL NETWORKS AS TUNERS OF FUZZY LOGIC SYSTEMS

The similarities between neural networks and fuzzy logic systems were soon noticed.

The most common way to represent a neurofuzzy architecture is shown in Fig. 17.6. Although it looks different from the network in Fig. 17.5, it is basically the same network; only the way of illustrating the network differs.

Fig. 17.6 Neurofuzzy networks for function approximation and classification problems (a function approximator with inputs x1, x2 passing through linguistic variables A1, A2, B1, B2, a multiplier, a normalizer and consequent parameters to the output y; and a classifier with And/Or/Max-defuzzifier layers mapping x1, x2 to Classes 1-5).

17.7 ADVANTAGES AND DRAWBACKS OF NEUROFUZZY SYSTEMS

The advantages are as follows:
1. The weights are the centers of the THEN-part fuzzy sets (clear meaning).
2. The other parameters (widths and positions of the membership functions) also have a clear meaning.
3. The initial weights can be chosen appropriately (from linguistic rules).

The drawback is the curse of dimensionality.

17.8 COMMITTEE OF NETWORKS

The method of combining networks to form a committee has been used to improve the generalization ability of the networks. The performance of a committee can be better than the performance of the isolated networks. A committee can consist of networks with different architectures, e.g. different types of neural networks, fuzzy logic systems and conventional models.

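The averaging that a committee performs can be sketched as follows; the stand-in "networks" are simple made-up functions that all approximate the same mapping.

```python
def committee(predictors, x):
    """Committee prediction: the average of the outputs of q networks."""
    outputs = [f(x) for f in predictors]
    return sum(outputs) / len(outputs)

# stand-ins for trained networks that all try to fit the same target x**2,
# each with a different residual error
nets = [lambda x: x**2 + 0.25,
        lambda x: x**2 - 0.5,
        lambda x: x**2 + 0.25]

print(committee(nets, 2.0))  # 4.0 -- the individual errors partly cancel
```

Averaging reduces the variance of the prediction, which is the source of the error reduction discussed in the text.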
The committee prediction (output) is calculated as the average of the outputs of the q networks:

y_COM(x) = (1/q) Σ_{i=1..q} y_i(x)    ...(17.7)

The reduction of error arises from the reduced variance due to averaging. Kosko (1991) has proposed the use of a weighted average to combine different fuzzy systems that try to predict the same input-output relation. He does not restrict the form of the fuzzy system to be an additive or SAM system. The only difference with (17.7) is that Kosko weights the fuzzy system outputs y_i(x) by credibilities w_i ∈ [0, 1], such that at least one system has nonzero credibility.

17.9 FNN ARCHITECTURE BASED ON BACK PROPAGATION

The input-output relation is numerically calculated by interval arithmetic via level sets (i.e., α-cuts) of the fuzzy weights and fuzzy inputs. We first define the strong L-R type fuzzy number and show its good properties in interval arithmetic. While defining a cost function for the level sets of the fuzzy outputs and fuzzy targets, we propose a learning algorithm for adjusting the three parameters of each strong L-R type fuzzy weight. Lastly, we examine the ability of the proposed fuzzy neural network to implement fuzzy if-then rules.

In this model, neurons are organized into a number of layers and the signals flow in one direction; there are no interactions and feedback loops among the neurons of the same layer. Fig. 17.7 shows this model of a fuzzy neural network.

Fig. 17.7 A three-layered fuzzy neural network (inputs x1, x2; a hidden layer; outputs y1, y2, y3).

According to the type of inputs and weights, we define three different kinds of fuzzy neural networks as follows:
I. crisp weights and fuzzy inputs;
II. fuzzy weights and crisp inputs;
III. fuzzy weights and fuzzy inputs.

Type (III) of fuzzy feed-forward neural networks is presented here. In fuzzy neural networks based on BP, the connections between the layers may be illustrated as a matrix of fuzzy weights w_ji,
which provides the fuzzy weight of the connection between the ith neuron of the input layer and the jth neuron of the hidden layer. The total fuzzy input of the jth neuron in the second layer is defined as:

Net_pj = Σ_{i=1..N} w_ji · O_pi + Θ_j,  j = 1, 2, ..., NH    ...(17.8)

where O_pi = x_pi is the ith fuzzy input of that neuron, Net_pj is the total fuzzy input of the jth neuron of the hidden layer, and Θ_j is the fuzzy bias of the jth neuron. The fuzzy output of the jth neuron is defined with the transfer function

O_pj = f(Net_pj),  j = 1, 2, ..., NH    ...(17.9)

Furthermore, the total fuzzy input and the fuzzy output of the kth neuron of the output layer are defined as follows:

Net_pk = Σ_{j=1..NH} w_kj · O_pj + Θ_k,  k = 1, 2, ..., NO    ...(17.10)

O_pk = f(Net_pk),  k = 1, 2, ..., NO    ...(17.11)

The fuzzy output is numerically calculated for level sets (i.e., α-cuts) of the fuzzy inputs, fuzzy weights and fuzzy biases. Next, let (x_p, T_p) be the fuzzy input-output pairs, where T_p = (T_p1, T_p2, ..., T_pNO) is the NO-dimensional fuzzy target vector corresponding to the fuzzy input vector x_p. The cost function for the input-output pair (x_p, T_p) is obtained as

e_p = Σ_h e_ph    ...(17.12)

The cost function for the h-level sets of the fuzzy output vector O_p and the fuzzy target vector T_p is defined as

e_ph = Σ_{k=1..NO} e_pkh,  e_pkh = (e_pkh^L + e_pkh^U)/2    ...(17.13)

where

e_pkh^L = h · ([T_pk]_h^L − [O_pk]_h^L)²/2,  e_pkh^U = h · ([T_pk]_h^U − [O_pk]_h^U)²/2

Furthermore, we need to find a type of fuzzy number to denote the fuzzy inputs, fuzzy weights and fuzzy biases. In the next section we introduce the strong L-R type fuzzy number; this type of fuzzy number has good properties, so that it can easily be adapted to interval arithmetic, and we put forward an FNN learning algorithm based on BP.

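The forward pass of such a fuzzy neuron can be sketched at one h-level with interval arithmetic. For simplicity the sketch assumes crisp, nonnegative weights and an increasing sigmoid transfer function, so that f maps an interval endpoint-wise; the numeric values are illustrative.

```python
import math

def sig(x):
    # increasing transfer function, so f([a, b]) = [f(a), f(b)]
    return 1.0 / (1.0 + math.exp(-x))

def iv_scale(w, iv):
    # product of a nonnegative crisp weight with an input h-level interval
    lo, hi = w * iv[0], w * iv[1]
    return (min(lo, hi), max(lo, hi))

def neuron(weights, inputs, bias):
    """h-level set of Net = sum_i w_i * O_i + theta, then O = f(Net)."""
    lo = sum(iv_scale(w, x)[0] for w, x in zip(weights, inputs)) + bias[0]
    hi = sum(iv_scale(w, x)[1] for w, x in zip(weights, inputs)) + bias[1]
    return (sig(lo), sig(hi))

x = [(0.1, 0.3), (0.5, 0.7)]            # h-level sets of two fuzzy inputs
out = neuron([0.8, 1.2], x, (-0.5, -0.3))
print(out)                              # a closed interval, lower <= upper
```
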
17.9.1 Strong L-R Representation of Fuzzy Numbers

Definition 17.4: A function S, usually denoted L or R, is a reference function of fuzzy numbers if
1. S(x) = S(−x);
2. S(0) = 1;
3. S is non-increasing on [0, +∞).

Definition 17.5: A fuzzy number M is said to be an L-R type fuzzy number if

μ_M(x) = L((b − x)/α) if x ≤ b, α > 0;  μ_M(x) = R((x − b)/β) if x ≥ b, β > 0    ...(17.14)

where L is the left and R the right reference function, b is the mean value of M, and α and β are called the left and right spreads; symbolically, we write M = (b, α, β)_LR.

Definition 17.6: A fuzzy number M is said to be a strong L-R type fuzzy number if L(1) = R(1) = 0. Letting (b − x)/α = 1 gives x = b − α ≡ a, the point at which L vanishes; likewise R((x − b)/β) = R(1) = 0 at x = b + β ≡ v, such that the support of the fuzzy number is the interval (a, v) of real numbers. We can write any strong L-R type fuzzy number symbolically as M = (a, b, v)_LR. The triangular fuzzy number (T.F.N.) is a special class of the strong L-R type fuzzy number.

Accordingly, three parameters suffice. We express the strong L-R type fuzzy weights w_kj, w_ji and the fuzzy biases by these parameters as

W_kj = (w_kj^α, w_kj^β, w_kj^γ)_LR
W_ji = (w_ji^α, w_ji^β, w_ji^γ)_LR
Θ_k = (q_k^α, q_k^β, q_k^γ)_LR    ...(17.15)

The strong L-R type is an important kind of fuzzy number. It has the following properties:
1. The α-cuts of every fuzzy number are closed intervals of real numbers.
2. Fuzzy numbers are convex fuzzy sets.
3. A strong L-R type fuzzy number can be uniquely represented by three parameters.

Those properties are essential for defining meaningful arithmetic operations on fuzzy numbers. Since each fuzzy set is uniquely represented by its α-cuts, and these are closed intervals of real numbers, arithmetic operations on fuzzy numbers can be defined in terms of arithmetic operations on closed intervals of real numbers. We will utilize them in the next section to define arithmetic operations on fuzzy numbers. These operations are the cornerstone of interval analysis,
which is a well-established area of classical mathematics.

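A triangular fuzzy number, the simplest strong L-R type with L(x) = R(x) = max(0, 1 − x), can be sketched together with its h-level cuts; the parameter values below are illustrative.

```python
class TriangularFuzzyNumber:
    """Triangular fuzzy number M = (b, alpha, beta)_LR with
    L(x) = R(x) = max(0, 1 - x): a special strong L-R type number."""

    def __init__(self, b, alpha, beta):
        self.b, self.alpha, self.beta = b, alpha, beta

    def mu(self, x):
        # membership grade, Eq. (17.14) with the triangular reference function
        if x <= self.b:
            return max(0.0, 1.0 - (self.b - x) / self.alpha)
        return max(0.0, 1.0 - (x - self.b) / self.beta)

    def cut(self, h):
        # h-level set: a closed interval of real numbers
        return (self.b - (1.0 - h) * self.alpha,
                self.b + (1.0 - h) * self.beta)

m = TriangularFuzzyNumber(5.0, 2.0, 1.0)
print(m.mu(5.0))   # 1.0 at the mean value
print(m.cut(0.0))  # (3.0, 6.0) -- the support (a, v)
print(m.cut(0.5))  # (4.0, 5.5)
```
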
Θ_j = (q_j^α, q_j^β, q_j^γ)_LR

We discuss how to learn the strong L-R type fuzzy weight w_kj = (w_kj^α, w_kj^β, w_kj^γ) between the jth hidden unit and the kth output unit; the weights w_ji and the fuzzy biases Θ_j, Θ_k have the same form as w_kj. Similar to Rumelhart, we can compute the quantity of adjustment for each parameter from the cost function:

Δw_kj^α(t) = −η ∂e_ph/∂w_kj^α + ξ Δw_kj^α(t − 1)    ...(17.16)
Δw_kj^γ(t) = −η ∂e_ph/∂w_kj^γ + ξ Δw_kj^γ(t − 1)    ...(17.17)

The derivatives above can be written, by the chain rule, as

∂e_ph/∂w_kj^α = (∂e_ph/∂[w_kj]_h^L)(∂[w_kj]_h^L/∂w_kj^α) + (∂e_ph/∂[w_kj]_h^U)(∂[w_kj]_h^U/∂w_kj^α)    ...(17.18)
∂e_ph/∂w_kj^γ = (∂e_ph/∂[w_kj]_h^L)(∂[w_kj]_h^L/∂w_kj^γ) + (∂e_ph/∂[w_kj]_h^U)(∂[w_kj]_h^U/∂w_kj^γ)    ...(17.19)

Let

c_kj = (w_kj^γ − w_kj^β)/(w_kj^β − w_kj^α),  c_ji = (w_ji^γ − w_ji^β)/(w_ji^β − w_ji^α),
c_k = (q_k^γ − q_k^β)/(q_k^β − q_k^α),  c_j = (q_j^γ − q_j^β)/(q_j^β − q_j^α)

Then

w_kj^β = (w_kj^α + c_kj w_kj^γ)/(1 + c_kj)    ...(17.20)

Since w_kj is a strong L-R type fuzzy number, its h-level set and its 0-level parameters are related as follows:

[w_kj]_h^L = (w_kj^α + c_kj w_kj^γ)/(1 + c_kj) − [(w_kj^γ − w_kj^α)/(1 + c_kj)] L⁻¹(h)
[w_kj]_h^U = (w_kj^α + c_kj w_kj^γ)/(1 + c_kj) + [c_kj (w_kj^γ − w_kj^α)/(1 + c_kj)] R⁻¹(h)    ...(17.21)

Therefore,

∂e_ph/∂w_kj^α = (∂e_ph/∂[w_kj]_h^L) [1 + L⁻¹(h)]/(1 + c_kj) + (∂e_ph/∂[w_kj]_h^U) [1 − c_kj R⁻¹(h)]/(1 + c_kj)    ...(17.22)
∂e_ph/∂w_kj^γ = (∂e_ph/∂[w_kj]_h^L) [c_kj − L⁻¹(h)]/(1 + c_kj) + (∂e_ph/∂[w_kj]_h^U) c_kj [1 + R⁻¹(h)]/(1 + c_kj)    ...(17.23)

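The update rule Δw(t) = −η ∂e/∂w + ξ Δw(t − 1) can be sketched for one parameter in isolation; the gradients and the values of η and ξ below are made-up illustrative numbers.

```python
def update(w, grad, prev_delta, eta=0.1, xi=0.9):
    """One momentum step: Dw(t) = -eta * de/dw + xi * Dw(t-1).
    The same rule is applied to each parameter (alpha, beta, gamma) in turn."""
    delta = -eta * grad + xi * prev_delta
    return w + delta, delta

w, prev = 0.5, 0.0
for grad in [0.4, 0.3, 0.1]:   # error derivatives from back-propagation
    w, prev = update(w, grad, prev)
print(round(w, 4))             # 0.3246
```

The momentum term ξ Δw(t − 1) keeps the parameter moving in a consistent direction even as the raw gradient shrinks.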
These relations explain how the error signals ∂e_ph/∂[w_kj]_h^L and ∂e_ph/∂[w_kj]_h^U for the h-level set propagate to the 0-level parameters of the strong L-R type fuzzy weight W_kj. Accordingly, we can adjust the three parameters of each strong L-R type fuzzy weight and fuzzy bias; the fuzzy weight is updated by the following rules:

w_kj^α(t + 1) = w_kj^α(t) + Δw_kj^α(t)    ...(17.24)
w_kj^γ(t + 1) = w_kj^γ(t) + Δw_kj^γ(t)    ...(17.25)

We assume that n values of h (i.e., h1, h2, ..., hn) are used for the learning of the fuzzy neural network. In this way, the learning algorithm of the fuzzy neural network can be defined as follows:
1. Initialize the fuzzy weights and the fuzzy biases.
2. Repeat the following procedure for p = 1, 2, ..., m (the m input-output pairs (x_p, T_p)).
3. Repeat the following for h = h1, h2, ..., hn:
   (a) Forward calculation: calculate the h-level set of the fuzzy output vector O_p corresponding to the fuzzy input vector x_p.
   (b) Back-propagation: adjust the fuzzy weights and the fuzzy biases using the cost function e_ph.
4. If a pre-specified stopping condition (e.g., the total number of iterations) is not satisfied, go to 2.

17.9.2 Simulation

We consider an n-dimensional classification problem. Let (x_p, T_p) be the fuzzy input-output pairs, where p = 1, 2, ..., k, and T_p = (T_p1, T_p2, ..., T_pNo) is the No-dimensional fuzzy target vector corresponding to the fuzzy input vector x_p. Here A_pi is a linguistic term, for example: large, small, etc. So we denote the fuzzy input as A_p = (A_p1, ..., A_pn); for the convenience of computing, we assume that A_pi is a symmetrical strong L-R type fuzzy number with

L(x) = R(x) = max(0, 1 − |x|²)

and the target output T_p can be defined as follows:

T_p = 1 if A_p ∈ class 1;  T_p = 0 if A_p ∈ class 2    ...(17.26)

We define the error function:

e_ph = max {(t_p − o_p)²/2 | o_p ∈ [y_p]_h}    ...(17.27)

We should train this network in order to minimize e_ph. It is easy to see that the error function becomes the classical error function e = Σ_p (t_p − o_p)²/2 of the BP algorithm when the input vector A_p
The classification problem can be described by IF-THEN rules as follows:

IF x_p1 is A_p1 and ... and x_pn is A_pn, THEN x_p = (x_p1, ..., x_pn) belongs to G_p.

We can solve the above problem by using the fuzzy neural network discussed above.

and y_p are real numbers. We assume A1-A4 belong to class 1 and A5-A8 belong to class 2. We train the fuzzy neural network with the h-level sets (h = 0.2, 0.4, 0.6, 0.8); the error function of the pth pair is:

e_p = Σ_h max {(t_p − o_p)²/2 | o_p ∈ [y_p]_h}    ...(17.28)

Using the proposed learning algorithm, we get a satisfactory curve after 300 epochs. The result of the trained fuzzy neural network is shown in Fig. 17.8.

Fig. 17.8 The result of fuzzy classification (the inputs A1-A4 fall in class 1 and A5-A8 in class 2).

17.10 ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM (ANFIS)

A fuzzy system can be considered to be a parameterized nonlinear map, called f. Let us write here explicitly the expression of f. The membership function μ_{A_i^l}(x_i) corresponds to the input x = (x1, ..., xn) of the rule l. The AND connective in the premise is carried out by a product, and defuzzification by the center-of-gravity method:

f(x) = [Σ_{l=1..m} y^l (Π_{i=1..n} μ_{A_i^l}(x_i))] / [Σ_{l=1..m} (Π_{i=1..n} μ_{A_i^l}(x_i))]    ...(17.29)

where y^l is the place of the output singleton if Mamdani reasoning is applied, or a constant if Sugeno reasoning is applied.

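A map of the form (17.29) can be evaluated directly. The sketch below assumes Gaussian membership functions and made-up rule parameters; both are illustrative choices, not values from the text.

```python
import math

def gauss(x, c, s):
    # Gaussian membership function (one common choice for mu_Ail)
    return math.exp(-((x - c) / s) ** 2)

def fuzzy_map(x, rules):
    """Eq. (17.29): centre-of-gravity over product-inference rule firings.
    Each rule is (y_l, [(c_i, s_i), ...]) with one (centre, spread) per input."""
    num = den = 0.0
    for y_l, sets in rules:
        w = 1.0
        for xi, (c, s) in zip(x, sets):
            w *= gauss(xi, c, s)   # product for the AND connective
        num += y_l * w
        den += w
    return num / den

rules = [(0.0, [(0.0, 1.0), (0.0, 1.0)]),   # around the origin -> output 0
         (1.0, [(2.0, 1.0), (2.0, 1.0)])]   # around (2, 2)     -> output 1
print(round(fuzzy_map([0.0, 0.0], rules), 3))  # 0.0
print(round(fuzzy_map([2.0, 2.0], rules), 3))  # 1.0
```
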
This can be further written as

f(x) = Σ_{l=1..m} w_l b_l(x)    ...(17.30)

where w_l = y^l and

b_l(x) = [Π_{i=1..n} μ_{A_i^l}(x_i)] / [Σ_{l=1..m} Π_{i=1..n} μ_{A_i^l}(x_i)]    ...(17.31)

If F is a continuous, nonlinear map on a compact set, then f can approximate F to any desired accuracy, i.e., F ≈ f_FS (e.g., L.X. Wang: A Course in Fuzzy Systems and Control, Prentice Hall, 1997). Well-known theorems from approximation theory for polynomials can be extended to fuzzy systems. The following theorems are found in R.F. Curtain and A.J. Pritchard: Functional Analysis in Modern Applied Mathematics, as corollaries of the Orthogonal Projection Theorem.

Theorem 17.5: Let F be a bounded function on [a, b]. Then for any n ≥ 0 there exists a best approximating polynomial p_n of degree ≤ n such that

||F − p_n||_∞ ≤ ||F − p||_∞    ...(17.32)

over all polynomials p of degree ≤ n.

Remark: The message of Theorem 17.5 is that polynomials are dense in the space of continuous functions. The same can also be said of trigonometric functions. We can also consider the simpler problem of approximating at finitely many points.

Theorem 17.6: If F ∈ C[a, b] and E = (x1, ..., xk) is a set of points in [a, b], then there exists the least squares polynomial p_n of degree ≤ n which minimizes

Σ_{i=1..k} |F(x_i) − p(x_i)|²    ...(17.33)

over all polynomials of degree ≤ n.

Theorem 17.7: If F is a bounded function on [a, b] and E = (x1, ..., xk) is a set of points in [a, b], then there exists a best approximating polynomial p_n^k of degree ≤ n which minimizes

max_{0 ≤ i ≤ k} |F(x_i) − p(x_i)|    ...(17.34)

over all polynomials of degree ≤ n.

17.10.1 ANFIS Structure

Consider a Sugeno type of fuzzy system having the rule base:
1. If x is A1 and y is B1, then f1 = p1x + q1y + r1
2. If x is A2 and y is B2, then f2 = p2x + q2y + r2

Let the membership functions of the fuzzy sets A_i, B_i, i = 1, 2, be μ_Ai, μ_Bi. In evaluating the rules, choose the product for the T-norm (logical and). Evaluating the rule premises results in

w_i = μ_Ai(x) μ_Bi(y),  i = 1, 2    ...(17.35)

Evaluating the implication and the rule consequences gives

f(x, y) = [w1(x, y) f1(x, y) + w2(x, y) f2(x, y)] / [w1(x, y) + w2(x, y)]    ...(17.36)

Or, leaving the arguments out,

f = (w1 f1 + w2 f2)/(w1 + w2)    ...(17.37)

This can be separated into phases by first defining the normalized firing strengths

w̄_i = w_i/(w1 + w2)    ...(17.38)

Then f can be written as

f = w̄1 f1 + w̄2 f2    ...(17.39)

Fig. 17.9 ANFIS structure (membership nodes A1, A2, B1, B2; product nodes Π producing w1, w2; normalization nodes N producing w̄1, w̄2; consequent products w̄1f1, w̄2f2; and a summing node Σ producing f).

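The two-rule Sugeno system above can be sketched end to end. The premise membership parameters and the consequent coefficients (p_i, q_i, r_i) below are made-up illustrative values, and the Gaussian premise shape is an assumption.

```python
import math

def bell(x, c, s):
    # Gaussian premise membership (an illustrative choice for mu_A, mu_B)
    return math.exp(-((x - c) / s) ** 2)

def anfis(x, y):
    """Forward pass of the two-rule Sugeno system of Section 17.10.1."""
    # layers 1-2: rule firing strengths w_i = mu_Ai(x) * mu_Bi(y)
    w1 = bell(x, 0.0, 2.0) * bell(y, 0.0, 2.0)
    w2 = bell(x, 5.0, 2.0) * bell(y, 5.0, 2.0)
    # layer 3: normalization
    w1n, w2n = w1 / (w1 + w2), w2 / (w1 + w2)
    # layers 4-5: weighted consequents f_i = p_i*x + q_i*y + r_i, then sum
    f1 = 1.0 * x + 0.5 * y + 0.0
    f2 = -1.0 * x + 2.0 * y + 1.0
    return w1n * f1 + w2n * f2

print(round(anfis(0.0, 0.0), 3))  # rule 1 dominates near (0, 0)
print(round(anfis(5.0, 5.0), 3))  # rule 2 dominates near (5, 5)
```

Near each rule's premise centre the normalized weight of that rule approaches 1, so the output approaches that rule's linear consequent.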
All computations can be presented in a diagram form.

QUESTION BANK.

1. What are the different types of hybrid systems? Explain them schematically.
2. Explain the use of fuzzy logic in neural networks to improve the learning ability.
3. Define the following: (a) Hybrid neural network (b) Regular fuzzy neural network (c) Hybrid fuzzy neural network.
4. Explain the role of neural networks as pre-processor or post-processor of data.
5. What is committee of networks? Explain.
6. Describe FNN architecture based on back propagation.
7. Describe adaptive neuro-fuzzy inference system.

REFERENCES.

1. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, NY, 1994.
2. B. Kosko, Neural Network and Fuzzy Systems, Prentice Hall, Englewood, NJ, 1991.
3. S. Kartalopoulos, Understanding Neural Networks and Fuzzy Logic, IEEE Press, NY.
4. G. Carpenter and S. Grossberg, The art of adaptive pattern recognition by a self organizing neural network, IEEE Computer Magazine, Vol. 21, No. 3, pp. 77-88.
5. H. Lee and B. Liu, Fuzzy BP: a neural network model with fuzzy inference, Proceedings of International Conference on Artificial Neural Networks, IEEE Computer Society Press, pp. 1583-1588, 1994.

18 Hybrid Fuzzy Neural Networks Applications

18.1 INTRODUCTION

The hybrid fuzzy neural networks have a tremendous potential to solve engineering problems. Hybridization is performed for the purpose of inventing better methods of problem solving. It is improper to expect that if the individual technologies are good, then hybridization of technologies should turn out to be even better.

18.2 TOOL BREAKAGE MONITORING SYSTEM FOR END MILLING

In recent years, global competition in industry has led to the exploration of new means of more efficient production. Flexible manufacturing systems (FMS) have been investigated as a tool for raising manufacturing productivity and product quality while decreasing production costs. One type of FMS is the Unmanned Flexible Manufacturing System (UFMS), which has received a great deal of attention because it replaces human operators with robotic counterparts in manufacturing and assembly cells. In addition to performing the same functions as the FMS, the UFMS reduces direct labour costs and prevents personal oversights. To apply the UFMS effectively, monitoring equipment and algorithms for the adaptation of the manufacturing process must be executed accurately (Altintas, Yellowley, and Tlusty, 1988).

In order to ensure efficiency within the system, electronic sensors associated with a decision-making system must monitor the process, since human operators are absent in these systems. The decision-making system analyzes information provided by sensors to make appropriate control actions. By themselves, computer numerical control (CNC) machines are not typically capable of tool breakage detection; since CNC machines cannot detect tool conditions, they cannot halt the process if the tool becomes damaged. Materials costs increase and product quality suffers if a broken tool is used in production; therefore, a detecting technology for unexpected tool breakage is needed (Lan and Naerheim, 1986). To reduce the costs of materials and prevent damaged tools from negatively affecting production, manufacturers must confirm that the tool is in good condition in process, and automatic and rapid detection of tool breakage is central to successful UFMS operation.
Since CNC machines cannot detect tool conditions. they cannot halt the process if the tool becomes damaged. and Tlusty. global competition in industry has led to exploration of new means of more efficient production. 18. In particular. 1986). .

Cho. FR. which used the fast a posterior error sequential technique (FAEST). generated from x and y directions. An appropriate threshold was built to analyze information and detect tool conditions. The time-series-based tooth period model technique (TPMT).. In this study. analyzing force signals and determining amplitude fluctuations allowed on-line tool breakage detection. meaning that each cutting tooth moving in the same direction generates a cyclic cutting force ranging from zero to maximum force.(18.234 FUZZY LOGIC AND NEURAL NETWORKS An in-process tool breakage detection system was developed in an end milling operation with cutting force and machining parameters of spindle speed. The application of neural networks and fuzzy logic in detecting tool breakage has also been studied in recent years. a battery-powered sensing force/torque cutter holder mounted on the spindle head with the transmitter. The principle of cutting force can be further defined as resultant force. Lan and Naerheim (1986) proposed a time series auto regression (AR) model of force signals to detect tool breakage. and back to zero. Han. to measure force in milling operations. was used in this experiment expressed as: Fri = Where Fx2i + Fy2 i . generated in x and y directions. Measured by sensors. A dynamometer sensor is the main device used to measure force signals in different machining operations. Milling is an interrupted cutting process. the average and median forces of each tooth were used as input information. Chen and Black (1997) also introduced a fuzzy-nets system to distinguish tool conditions in milling operations. Tarng and Lee (1993) proposed using the average and median force of each tooth in the milling operation. Zhang.1 Methodology: Force Signals in the End Milling Cutting Process Milling is a fundamental machining process in the operation of CNC machines. 18. The machining parameters and average peak force were used to build the AR model and neural network. 
Variance of adjacent peak force was selected as an input parameter to train the system and build a rule-bank for detecting tool breakage. and Chen (1995) used a telemeter technique. Jemielniak (1992) proposed that sudden changes in the average level of force signals could be due to catastrophic tool failure (CTF) in turning operations. Ko. and Jung (1994) introduced an unsupervised self-organized neural network combined with an adaptive time-series AR modeling algorithm to monitor tool breakage in milling operations. The common method of detecting tool breakage in process involves force signals resultant from tool processes on raw materials. The fuzzy-nets system was designed to build the rule-bank and solve conflicting rules with a computer.1) Fri is the resultant force of point I . Tae and Dong (1992) developed a fuzzy pattern recognition technique and a time-series AR model to detect tool wear in turning operations. and depth of cut selected as input factors.2. The resultant force. feed rate. This cyclic force is graphed as a series of peaks. Fri. was applied by Tansel and McLaughlin (1993) to detect tool breakage in milling.. and milling operations can be of two varieties: peripheral and face milling. The neural networks approach was employed as the decision-making system that judges tool conditions. The variation of dynamic cutting force was used to construct the fuzzy dispersion pattern needed to distinguish tool conditions.

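Equation (18.1) can be computed directly; the force values below are illustrative.

```python
import math

def resultant_force(fx, fy):
    """Eq. (18.1): resultant cutting force from the x- and y-direction signals."""
    return math.sqrt(fx ** 2 + fy ** 2)

# one sampled point of a force trace (illustrative values in newtons)
print(resultant_force(300.0, 400.0))  # 500.0
```
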
Tool conditions and machining parameters, such as spindle speed, feed rate, and depth of cut, affect the magnitude of the resultant force. Applying the force principle, if the tool condition is good, the peak measurement of each tooth's force should be roughly the same from tooth to tooth during one revolution of the cutting process. However, if a tooth is broken, it generates a smaller peak force because it carries a smaller chip load, and the tooth that follows the broken tooth generates a higher peak force as it extracts the chip that the broken tooth could not. Figure 18.1 illustrates the force signals of undamaged and broken tools. As a result, two main differences can be used to detect tool breakage:
1. The maximum peak force in each revolution should differ between good and broken tools, and the maximum peak force of a broken tool must be larger than that of a good tool.
2. The maximum variance of adjacent peak forces should differ between good and broken tools, and the maximum variance of adjacent peak forces of broken tools must be larger than that of undamaged tools.

Fig. 18.1 The amplitude of cutting force of a good and a broken tool (force signal a and revolution signal b over one revolution; cutting parameters: speed = 650 rpm, feed = 15 ipm, depth = 0.08 inch).

18.2.2 Neural Networks

In this work, the neural networks approach was used as a decision-making system that judges tool conditions from sensor input. A back propagation neural network (BPNN) was chosen as the decision-making system because it is the most representative and commonly used algorithm; it is relatively easy to apply, and has also proven successful in practical applications. Back propagation is intended for training layered (i.e., nodes are grouped in layers), feed-forward (i.e., the arcs joining nodes are unidirectional, and there are no cycles) nets. This approach involves supervised learning, which requires a teacher that knows the correct output for any input, and uses gradient descent on the error provided by the teacher to train the weights. The propagation rule, also called a summation or aggregation function, is used to combine or aggregate the inputs passing through the connections from other neurons. It can be expressed as

S_j = Σ_i a_i W_ji + a_0 W_j0    ...(18.2)
S_k = Σ_j a_j W_kj + a_0 W_k0    ...(18.3)

where i is an input neuron, j is a hidden neuron, and k is an output neuron; W_ji and W_kj denote the weights from input to hidden neuron and from hidden to output neuron, respectively; a_0 represents the bias, usually 1, and W_j0 and W_k0 are the weights of the bias. The transfer function, also called the output or squashing function, is used to produce the output based on the level of activation. Many different transfer functions can be used to transfer data; one is the Sigmoid Function, expressed as:

O_y = 1/(1 + e^(−a_y))    ...(18.4)

where a_y is a function of S_j and S_k respectively. Comparing the actual output of the neural network to the desired output, the process is repeated until the error percentage falls into a reasonable range. As the weights of the neural network were obtained, the prediction function was achieved via the weight information.
also called the output or squashing function.4) 1 + e ay where ay is a function of Sj and Sk respectively.1 The amplitude of cutting force of a good and broken tool. The transfer function.. and one is called the Sigmoid Function.. feed = 15 ipm depth = 0. 18. Comparing actual output of neural networks to desired output. Many different transfer functions can be used to transfer data.236 FUZZY LOGIC AND NEURAL NETWORKS N 1200 Fa 1000 800 600 400 200 0 –200 One revolution b a Good Tool N 1600 1400 1200 1000 800 600 Broken Tool Fa a One revolution 400 200 0 – 200 Key: a = force signal b = revolution signal Cutting Parameters: speed = 650 rmp.08 inch b b Fig. is used to produce output based on level of activation.(18. expressed as: 1 Oy = . . the process is repeated until the error percentage falls into a reasonable range.
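The aggregation and squashing steps of Eqs. (18.2)-(18.4) can be sketched as one forward pass of the 5-input, 4-hidden, 2-output network used in this section. The original work used MATLAB; the Python weights below are random stand-ins, not the trained values.

```python
import math
import random

def sigmoid(x):
    # Eq. (18.4): squashing function applied to the aggregated input
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_out):
    """One forward pass of a 5-4-2 BPNN (Eqs. 18.2-18.4).
    Each weight row holds the connection weights followed by the bias weight."""
    hidden = [sigmoid(sum(a * w for a, w in zip(inputs, row[:-1])) + row[-1])
              for row in w_hidden]
    return [sigmoid(sum(h * w for h, w in zip(hidden, row[:-1])) + row[-1])
            for row in w_out]

random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
w_out = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(2)]

# scaled inputs: speed, feed, depth, max peak force, max variance of peaks
out = forward([0.5, 0.2, 0.8, 0.6, 0.4], w_hidden, w_out)
print([round(o, 3) for o in out])  # two outputs in (0, 1): good / broken
```
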

2 The experimental setup. The cutting parameters were set as: 1.2. Five level of spindle speed (740. . 18.08. and 3.07. 18. Five levels of depth of cut (0.HYBRID FUZZY NEURAL NETWORKS APPLICATIONS 237 18.3). Five level of feed rate (6. In each cutter. and 650 revolution per minute). A dynamometer was mounted on the table to measure cutting force.3 Experimental Design and System Development Experimental Design This experiment employed a CNC vertical machining center.06. 0.2. 18 and 24 inch per minute). The experimental setup was shown in Fig. VM C40 Proximity Sensor Workpiece Dynamometer Amplifier A/D Board DC Power Supply Fig. 600. one side of the tool was in proper working order and the other side was broken. The broken side of the tool possessed varying degrees of breakage (Fig. 0. 550. 18. 0. 500. Four ¾-inch doubt-end four-flute high-speed steel cutters were used. 2. A proximity sensor was built near the spindle to confirm data in each revolution. 12.09 and 0.1 inches).

Fig. 18.3 Diagram of broken tool (tools T1-T4, with breakage dimensions between 1.5 and 3 mm; unit of value: mm).
The cutters used to execute the experiment were selected randomly. Cutting force was measured in voltage by the Charge Amplifier and transformed to Newtons (N) via computer.

18.2.4 Neural Network-BP System Development

To develop back propagation of neural networks as a decision-making system, MATLAB software was applied to analyze data. Seven steps were conducted. In step one, prediction factors were determined in order to perform the training process. Step 2 was necessary to analyze differences between scaling data and unscaling data. Step 3 dealt with separating data into training and testing categories. From steps 4 through 6, parameters were developed for the training process, including the hidden layer/hidden neuron, learning rate, and momentum factor. Finally, in step 7, information from the training process was used to predict tool conditions. Step 1. Determine the factors Five input neurons were used for tool breakage prediction data: 1. Spindle speed; 2. Feed rate; 3. Depth of cut; 4. Maximum peak force; and 5. Maximum variance of adjacent peak force. Output neurons were either (1) Good, or (2) Broken. Three hundred data points were used in this work. Good tools collected half of these and broken tools collected the rest, and all data were randomized using MS Excel software. Step 2. Analyze unscaling and scaling data In order to avoid experimental errors resulting from bigger values of some data sets, some preprocessing was needed to obtain good training and prediction results. Since histograms of all data sets


were uniform or normal distributions, the Simple Linear Mapping method was employed for scaling. To compare scaled against unscaled data, some parameters were first set and fixed: the number of hidden neurons was set at 4, the learning rate was set at 1, and the momentum item was 0.5. The number of training cycles was 2000, and the testing period was 5. Table 18.1 shows the comparison between scaling and unscaling data. As one can see, the errors for scaling data are smaller than for unscaling data.

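The Simple Linear Mapping used for scaling can be sketched as a min-max transformation; the force values below are illustrative.

```python
def linear_map(values, lo=0.0, hi=1.0):
    """Simple linear (min-max) mapping of a data set onto [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]

# peak forces with very different magnitudes scale onto a common range
print(linear_map([200.0, 650.0, 1100.0]))  # [0.0, 0.5, 1.0]
```

Scaling keeps large-magnitude inputs such as peak force from dominating the small-magnitude ones during training.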
Table 18.1 Difference between scaling and non-scaling

          Hidden  Hidden  Learn  Momentum  Train  Testing  RMS error    RMS error
          layer   neuron  rate   factor    error  error    of training  of testing
Unscale   1       4       1      0.5       0.505  0.550    0.508        0.515
Scale     1       4       1      0.5       0.040  0.160    0.197        0.388

Step 3. Impact of the ratio of training and testing data The 300 original data records were randomized and separated into three groups. The first group had 200 training and 100 testing data (200 × 100), the second had 225 training and 75 testing data (225 × 75), and the third had 250 training and 50 testing data (250 × 50). Table 18.2 shows the Back Propagation Neural Network (BPNN) with different sample sizes of training and testing data. The last four columns of Table 18.2 show the training and testing errors. The training, testing, and RMS (root mean square) errors of training of the second group were smaller than in the other groups. The RMS error of the testing data of the second group was larger than in the first; however, the RMS errors of each sample size were very similar. If samples had similar error percentages, the sample with the largest training sample size was selected because it provided sufficient information to predict the testing data. From the experimental design, the ideal ratio between training and testing data was 3:1 for neural networks. The 225 × 75 sample size was employed in this analysis.

Table 18.2: Different sample sizes of training and testing data

Tra×Tes  Hidden  Hidden  Learn  Momentum  Train  Testing  RMS error    RMS error
         layer   neuron  rate   factor    error  error    of training  of testing
200×100  1       4       1      0.5       0.040  0.160    0.197        0.388
225×75   1       4       1      0.5       0.036  0.093    0.185        0.298
250×50   1       4       1      0.5       0.044  0.106    0.204        0.285

Step 4. Impact of the hidden layer and hidden neuron In the beginning, the number of hidden neurons was set at 5, and the hidden layer was set at 1. Different hidden neurons and layers were tested to determine which values would lead to the smallest error percentage. To this end, the hidden neurons were set at 4 and 6, and the hidden layers were set at 1 and 2. Table 18.3 shows the BPN with a different number of hidden neurons and layers. According to this data, the percentage error of the trial with 4 hidden neurons and 1 hidden layer was less than it was in all


other trials. Thus, the configuration contained in the 4 hidden neuron/1 hidden layer experiment was chosen because it led to the best results. The formula, (input neurons + output neurons)/2, was useful for determining the number of hidden neurons at the beginning.

Table 18.3: Different numbers of hidden neurons and layers

Neuron in  Neuron in  Learn  Momentum  Train  Testing  RMS error    RMS error
layer-1    layer-2    rate   factor    error  error    of training  of testing
3          ~          1      0.5       0.080  0.200    0.256        0.410
4          ~          1      0.5       0.036  0.093    0.185        0.298
5          ~          1      0.5       0.049  0.093    0.193        0.288
3          3          1      0.5       0.267  0.320    0.338        0.375
4          2          1      0.5       0.489  0.453    0.512        0.504
4          3          1      0.5       0.316  0.333    0.407        0.425
4          4          1      0.5       0.164  0.227    0.362        0.414

Step 5. Impact of the learning rate This step was necessary to determine the optimal learning rate. The initial learning rate was 1. Three additional learning rates, 0.5, 2, and 10, were used to compare with the initial. Table 18.4 shows the BPN with different learning rate values. Table 18.4 shows that the error percentage of the learning rates of 0.5 and 1 were the same, in addition to being lower than all other learning rates. To achieve the objective of finding the smallest error percentage, the learning rate of 1 was used, because the software originally recommended that value.

Table 18.4: Different learning rate values

| Parameter | Trial 1 | Trial 2 | Trial 3 | Trial 4 |
| --- | --- | --- | --- | --- |
| Hidden layers | 1 | 1 | 1 | 1 |
| Hidden neurons | 4 | 4 | 4 | 4 |
| Learn rate | 0.5 | 1 | 2 | 10 |
| Momentum factor | 0.5 | 0.5 | 0.5 | 0.5 |
| Train Error | 0.036 | 0.036 | 0.116 | 0.111 |
| Testing Error | 0.093 | 0.093 | 0.133 | 0.133 |
| RMS error of training | 0.185 | 0.185 | 0.306 | 0.286 |
| RMS error of testing | 0.298 | 0.298 | 0.319 | 0.317 |

Step 6. Impact of the momentum factor
The final step of the data analysis was to change the value of the momentum factor to obtain the configuration leading to the lowest error percentage. The initial value of the momentum factor was 0.5. Another three values, 0.3, 0.6, and 0.8, were selected for comparison with the initial value. Table 18.5 shows the BPN with different values of the momentum factor. The error percentages for momentum factors of 0.3 and 0.5 are the same, and smaller than all others. To achieve the smallest error percentage, the momentum factor of 0.5 was used, because the software originally recommended that value.

Step 7. Prediction
After completing the analysis and obtaining information about the weights and input factors, equations to predict the tool conditions were constructed. The variables a1, a2, ..., a5 represent the 5 input factors: maximum peak force, spindle speed, feed rate, depth of cut, and maximum variance of adjacent peak force, respectively. The weighted values of the hidden factors ah1, ah2, ah3, ah4 and of the outputs aout1, aout2 can be expressed as:

**HYBRID FUZZY NEURAL NETWORKS APPLICATIONS**

Table 18.5: Percent error of momentum factors

| Parameter | Trial 1 | Trial 2 | Trial 3 | Trial 4 |
| --- | --- | --- | --- | --- |
| Hidden layers | 1 | 1 | 1 | 1 |
| Hidden neurons | 4 | 4 | 4 | 4 |
| Learn rate | 1 | 1 | 1 | 1 |
| Momentum factor | 0.3 | 0.5 | 0.6 | 0.8 |
| Train Error | 0.036 | 0.036 | 0.049 | 0.116 |
| Testing Error | 0.093 | 0.093 | 0.093 | 0.133 |
| RMS error of training | 0.185 | 0.185 | 0.212 | 0.306 |
| RMS error of testing | 0.298 | 0.298 | 0.288 | 0.322 |

ah1 = 1/{1 + exp[−(4.652 a1 + 0.448 a2 + 0.947 a3 + 25.237 a4 + 0.853 a5 − 0.221)]}   ...(18.5)

ah2 = 1/{1 + exp[−(40.457 a1 + 39.421 a2 + 15.261 a3 + 7.317 a4 + 21.054 a5 − 44.505)]}   ...(18.6)

ah3 = 1/{1 + exp[−(10.224 a1 + 3.444 a2 + 24.252 a3 + 3.449 a4 + 4.215 a5 − 1.289)]}   ...(18.7)

ah4 = 1/{1 + exp[−(1.321 a1 + 24.736 a2 + 0.202 a3 + 0.79 a4 + 0.015 a5 − 0.829)]}   ...(18.8)

aout1 = 1/{1 + exp[−(11.697 ah1 + 16.977 ah2 + 12.295 ah3 + 11.807 ah4 − 2.945)]}   ...(18.9)

aout2 = 1/{1 + exp[−(11.697 ah1 + 16.977 ah2 + 12.295 ah3 + 11.807 ah4 − 2.945)]}   ...(18.10)
Finally, the output information was used to judge the tool conditions: if aout1 > aout2, the tool is in good (usable) condition; if aout1 < aout2, the tool is broken.
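The prediction step can be sketched as a small feed-forward evaluation. The helper names are hypothetical, and the minus signs in the exponents follow the sigmoid form reconstructed above (the printed equations lost their signs), so in practice the weight tables should come from the trained network rather than be read off the page literally:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer(inputs, neurons):
    # neurons: list of (weights, bias) pairs, one per neuron,
    # mirroring the structure of equations (18.5)-(18.10)
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) - b)
            for ws, b in neurons]

def tool_condition(a, hidden_neurons, output_neurons):
    ah = layer(a, hidden_neurons)             # ah1..ah4, as in (18.5)-(18.8)
    aout1, aout2 = layer(ah, output_neurons)  # as in (18.9)-(18.10)
    return "good" if aout1 > aout2 else "broken"
```

With the trained weights supplied, `tool_condition` reproduces the decision rule: the tool is usable when aout1 > aout2 and broken otherwise.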

18.2.5 Findings and Conclusions

To operate the UFMS successfully, in-process sensing techniques coupled to rapid-response decision-making systems were required. In this research, a neural network model was developed to judge cutting force for accurate in-process tool breakage detection in milling operations. The neural networks were capable of detecting tool conditions accurately and in process. The accuracy on the training data was 96.4%, and the accuracy on the testing data was 90.7%. Partial results of the training and testing data are shown in Tables 18.6 and 18.7.


Table 18.6: Partial results of training data (a1-a5: input factors; aout1, aout2: output factors)

| Tool condition | a1 | a2 | a3 | a4 | a5 | aout1 | aout2 | Prediction |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Good | 904.5 | 600 | 12 | 0.09 | 45.8 | 1 | 0 | Good |
| Broken | 1220.54 | 600 | 12 | 0.09 | 954.04 | 0 | 1 | Broken |
| Good | 634.64 | 550 | 10 | 0.06 | 274.64 | 0.98 | 0.02 | Good |
| Broken | 780.14 | 550 | 10 | 0.06 | 368.64 | 0.03 | 0.97 | Broken |
| Good | 674.36 | 650 | 15 | 0.06 | 248.64 | 1 | 0 | Good |
| Broken | 847.06 | 650 | 15 | 0.06 | 537.54 | 0.02 | 0.98 | Broken |
| Good | 1239.4 | 500 | 18 | 0.07 | 225.2 | 1 | 0 | Good |
| Broken | 1677.92 | 500 | 18 | 0.07 | 1,159.64 | 0 | 1 | Broken |
| Good | 1413.76 | 450 | 15 | 0.08 | 300.56 | 0 | 1 | Broken |
| Broken | 1861.72 | 450 | 15 | 0.08 | 1,250.92 | 0.01 | 0.99 | Broken |

Table 18.7: Partial results of testing data (a1-a5: input factors; aout1, aout2: output factors)

| Tool condition | a1 | a2 | a3 | a4 | a5 | aout1 | aout2 | Prediction |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Good | 711.94 | 550 | 15 | 0.06 | 177.02 | 1 | 0 | Good |
| Broken | 1296.56 | 550 | 15 | 0.06 | 481.66 | 0.01 | 0.99 | Broken |
| Good | 723.32 | 550 | 12 | 0.07 | 311.52 | 0.35 | 0.65 | Broken |
| Broken | 1215.96 | 550 | 12 | 0.07 | 1,042.50 | 0 | 1 | Broken |
| Good | 1084.32 | 550 | 18 | 0.07 | 192.5 | 0.99 | 0.01 | Good |
| Broken | 1542.92 | 550 | 18 | 0.07 | 1,303.92 | 0 | 1 | Broken |
| Good | 1024.46 | 600 | 18 | 0.07 | 172.22 | 0.98 | 0.02 | Good |
| Broken | 1253.28 | 600 | 18 | 0.07 | 580.3 | 0.03 | 0.97 | Broken |
| Good | 1507.18 | 450 | 20 | 0.08 | 550.06 | 0.98 | 0.02 | Good |
| Broken | 1876.74 | 450 | 20 | 0.08 | 1,062.02 | 0 | 1 | Broken |

The weights of the hidden factors and output factors were generated from pre-trained neural networks, and a program was written to process these weights in order to respond to the tool conditions. Therefore, the in-process detection system demonstrated a very short response time to tool conditions. Since the tool conditions could be monitored in real time, a broken tool could be replaced immediately to prevent damage to the machine and mis-machining of the product. However, since the weights were obtained from the pre-training process, they were fixed when they were put into the detection program; the system as a whole therefore does not have the adaptive ability to feed information back into itself. In this work, depth of cut was employed as one input factor. However, in actual industrial environments, the surface of work materials is often uneven, implying that the depth of cut set in the computer might differ from that used to cut the workpiece. Under these circumstances, the neural networks might generate a wrong decision and misjudge the tool conditions, due to fluctuating depths of cut across machining.

18.3 CONTROL OF COMBUSTION

Beside the economical and environmental advantages, there are several difficulties with burning bio fuels and municipal wastes. Bio fuels and municipal wastes are very inhomogeneous: their properties (heat value, density, moisture content, homogeneity, mix ability) may vary over a large range. The combustion of those fuels or fuel-mixtures has different properties compared to the conventional fuels (coal, gas, and oil). It causes non-steady, agitated combustion conditions, even if a steady fuel feed volume is maintained, leading to an increase in the emission level and variation of the generated heat flow. Those property variations are not predictable or directly measurable; only their effects on the combustion, on the steam generation and on the power production can be observed, through the O2 content of the flue gas. This topic presents an ANFIS system which, combined with a stoichiometric model, predicts the flue gas properties, including the O2 content.

18.3.1 Adaptive Neuro-Fuzzy Inference System

Fuzzy Logic Controllers (FLC) have played an important role in the design and enhancement of a vast number of applications. The proper selection of the number, the type and the parameters of the fuzzy membership functions and rules is crucial for achieving the desired performance. Adaptive Neuro-Fuzzy Inference Systems are fuzzy Sugeno models put in the framework of adaptive systems to facilitate learning and adaptation. Such a framework makes FLC more systematic and less reliant on expert knowledge.

To present the ANFIS architecture, let us consider two fuzzy rules based on a first-order Sugeno model:

Rule 1: if (x is A1) and (y is B1), then f1 = p1x + q1y + r1
Rule 2: if (x is A2) and (y is B2), then f2 = p2x + q2y + r2

One possible ANFIS architecture to implement these two rules is shown in Fig. 18.4. In the following presentation, OL,i denotes the output of node i in layer L. Note that a circle indicates a fixed node, whereas a square indicates an adaptive node (the parameters are changed during training).

Fig. 18.4: ANFIS architecture (Layer 1: membership nodes A1, A2, B1, B2; Layer 2: multiplier nodes M; Layer 3: normalization nodes N; Layer 4: consequent nodes; Layer 5: summer S; forward and backward passes).

Layer 1: All the nodes in this layer are adaptive nodes. The output of each node is the degree of membership of the input in the fuzzy membership function (MF) represented by the node:

O1,i = μAi(x), i = 1, 2
O1,i = μB(i−2)(y), i = 3, 4

Ai and Bi can be any appropriate fuzzy sets in parameter form. For example, if the bell MF is used, then

μAi(x) = 1/[1 + |(x − ci)/ai|^(2bi)]   ...(18.11)

where ai, bi and ci are the parameters of the MF.

Layer 2: The nodes in this layer are fixed (not adaptive). They are labelled M to indicate that they play the role of a simple multiplier. The outputs of these nodes are given by:

O2,i = wi = μAi(x) μBi(y), i = 1, 2   ...(18.12)

The output of each node in this layer represents the firing strength of the rule.

Layer 3: The nodes in this layer are also fixed nodes. They are labelled N to indicate that they perform a normalization of the firing strengths from the previous layer. The output of each node is given by:

O3,i = w̄i = wi/(w1 + w2), i = 1, 2   ...(18.13)

Layer 4: All the nodes in this layer are adaptive nodes. The output of each node is simply the product of the normalized firing strength and a first-order polynomial:

O4,i = w̄i fi = w̄i (pi x + qi y + ri), i = 1, 2   ...(18.14)

where pi, qi and ri are design parameters (consequent parameters, since they deal with the then-part of the fuzzy rule).
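The layer-by-layer computation described above can be sketched as a plain forward pass (the parameter values below are illustrative only, not taken from the chapter):

```python
def bell_mf(x, a, b, c):
    # generalized bell membership function, eq. (18.11)
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(x, y, premise, consequent):
    # premise: (a, b, c) triples for A1, A2, B1, B2; consequent: (p, q, r) per rule
    muA = [bell_mf(x, *premise[i]) for i in (0, 1)]        # Layer 1
    muB = [bell_mf(y, *premise[i]) for i in (2, 3)]
    w = [muA[0] * muB[0], muA[1] * muB[1]]                 # Layer 2, eq. (18.12)
    wbar = [wi / sum(w) for wi in w]                       # Layer 3, eq. (18.13)
    f = [p * x + q * y + r for (p, q, r) in consequent]    # rule consequents
    return sum(wb * fi for wb, fi in zip(wbar, f))         # Layers 4-5

premise = [(1, 2, -1), (1, 2, 1), (1, 2, -1), (1, 2, 1)]
consequent = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.5)]
out = anfis_forward(0.3, -0.2, premise, consequent)
```

Because the normalized firing strengths sum to one, the output is always a convex combination of the two rule consequents.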

Layer 5: This layer has only one node, labelled S to indicate that it performs the function of a simple summer. The output of this single node is given by:

O5,1 = f = Σi w̄i fi = (Σi wi fi)/(Σi wi), i = 1, 2   ...(18.15)

The ANFIS architecture is not unique: some layers can be combined and still produce the same output. In this ANFIS architecture, there are two adaptive layers (1, 4). Layer 1 has three modifiable parameters (ai, bi and ci) pertaining to the input MFs [13]; these are called premise parameters. Note here that ai, bi and ci describe the width, the slope and the center of the bell MFs, respectively. Layer 4 also has three modifiable parameters (pi, qi and ri), pertaining to the first-order polynomial; these are called consequent parameters.

18.3.2 Learning Method of ANFIS

The task of the training algorithm for this architecture is to tune all the modifiable parameters to make the ANFIS output match the training data [14]. Since the premise parameters are nonlinear and the consequent parameters are linear, we can divide the parameter set S into two sets:

S = S1 ⊕ S2
S = set of total parameters
S1 = set of premise (nonlinear) parameters
S2 = set of consequent (linear) parameters
⊕ = direct sum

For the forward path (see Fig. 18.4), we can apply the least square method to identify the consequent parameters. For a given set of values of S1, if these parameters are fixed, the output of the network becomes:

f = [w1/(w1 + w2)] f1 + [w2/(w1 + w2)] f2
  = w̄1 f1 + w̄2 f2
  = (w̄1 x) p1 + (w̄1 y) q1 + (w̄1) r1 + (w̄2 x) p2 + (w̄2 y) q2 + (w̄2) r2   ...(18.16)

This is a linear combination of the modifiable parameters. From this observation, we can plug in the training data and obtain a matrix equation:

AQ = y   ...(18.17)

where Q contains the unknown parameters in S2. This is a linear least-squares problem, and the solution for Q, which minimizes

‖AQ − y‖²   ...(18.18)

is the least square estimator.
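The least-squares identification of the consequent parameters described above can be sketched as follows. The row layout [w̄1x, w̄1y, w̄1, w̄2x, w̄2y, w̄2] follows the expansion of the network output; the function name is hypothetical:

```python
import numpy as np

def consequent_lse(wbar_pairs, xy_pairs, targets):
    """Solve AQ = y for Q = [p1, q1, r1, p2, q2, r2] in the least-squares
    sense, with the premise parameters (hence the w-bars) held fixed."""
    A = np.array([[w1 * x, w1 * y, w1, w2 * x, w2 * y, w2]
                  for (w1, w2), (x, y) in zip(wbar_pairs, xy_pairs)])
    Q, *_ = np.linalg.lstsq(A, np.asarray(targets), rcond=None)
    return Q
```

Given enough independent training rows, this returns the same estimator as the closed form Q* = (AᵀA)⁻¹Aᵀy.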

Q* = (AᵀA)⁻¹Aᵀy   ...(18.19)

We can also use the recursive least square estimator, which identifies the parameters of S2 in a recursive manner, in the case of on-line training. For the backward path (Fig. 18.4), the error signals propagate backward and the premise parameters are updated by the descent method [15], through minimizing the overall quadratic cost function:

J(Q) = (1/2) Σ_{k=1..N} [y(k) − ŷ(k, Q)]²   ...(18.20)

The update of the parameters in the ith node of layer L can be written as:

Θ̂iL(k) = Θ̂iL(k − 1) − η ∂E(k)/∂Θ̂iL(k)   ...(18.21)

where η is the learning rate, and the gradient is ∂E/∂Θ̂iL = εL,i ∂ZL,i/∂Θ̂iL, ZL,i being the node's output and εL,i the back-propagated error signal.

18.3.3 Model of Combustion

The role of the combustion process is to produce the required heat energy for steam generation at the highest possible combustion efficiency. The efficiency depends on the completeness of burning and on the waste heat taken away in the flue gas by the excess airflow: the higher the burning rate and the smaller the waste heat, the higher the efficiency. From the efficiency point of view, excess air is required for ensuring complete burning, and the O2 content of the flue gas is directly related to the amount of excess air. The aim of the combustion control is therefore to keep the O2 content around 3-5% [16]. However, this is a difficult task due to the inhomogeneous properties of the fuel.

In multi-fuel fired fluidised bed power plants (Fig. 18.5), the combustion model, utilising the ANFIS structure based on [20], calculates the combustion power (Pcomb) and the flue gas components (Cf), including the oxygen content, from the fuel screw signal QHz, the primary airflow Fp and the secondary airflow Fs (see Fig. 18.6). The oxygen and combustion power controller (Fig. 18.6) consists of two parallel PI controllers: the error signal from the oxygen content drives the PI controller of the fuel screw signal, which determines the amount of fuel fed to the combustion chamber, while the combustion power is controlled by the primary airflow. The structure of the PI controllers is:

Ui(s) = KP,i + KI,i (1/s), i = 1, 2   ...(18.22)

The reference signals for the fuel screw QHz, the primary airflow Fp and the secondary airflow Fs are calculated by the linearization model as functions of the reference of the combustion power, such as:

QHz = 0.2662 Pcomb − 9.7207   ...(18.23)
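The two parallel PI loops described above can be sketched in discrete time as follows (a generic positional PI with hypothetical gains; the plant and the ANFIS model are not included):

```python
# Discrete-time positional PI controller, u = Kp*e + Ki*integral(e),
# one instance for the oxygen loop and one for the combustion-power loop.
class PI:
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.acc = 0.0  # accumulated (integrated) error

    def step(self, setpoint, measurement):
        e = setpoint - measurement
        self.acc += e * self.dt
        return self.kp * e + self.ki * self.acc

oxygen_pi = PI(kp=0.8, ki=0.2, dt=1.0)  # drives the fuel screw signal
power_pi = PI(kp=0.5, ki=0.1, dt=1.0)   # drives the primary airflow
u_fuel = oxygen_pi.step(4.0, 3.5)       # O2 setpoint 4%, measured 3.5%
u_air = power_pi.step(100.0, 90.0)      # power setpoint 100 MW, measured 90 MW
```

The integral term removes steady-state error in both loops, which is what keeps the O2 content near its 3-5% band despite fuel-quality variations.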

Fp = 0.2662 Pcomb − 4.912
Fs = 0.0737 Pcomb + 10.005   ...(18.24)

Fig. 18.5: Fluidized bed power plant (steam header, boiler drum, superheater, economizer, furnace waterwall, fuel feed, primary and secondary air fans, induced draft fan, and the furnace pressure, oxygen and steam measurements).

18.3.4 Optimization of the PI-Controllers Using Genetic Algorithms

Standard genetic searching algorithms are used for numerical parameter optimization and are based on the principles of evolutionary genetics and the natural selection process [17]. A general genetic algorithm usually contains three procedures: selection, crossover and mutation. These procedures are responsible for the global search of the minimization function without testing all the solutions. Selection corresponds to keeping the best members of the population for the next generation, to preserve the individuals with good performance (elite individuals) on the fitness function. Crossover originates new members for the population by a process of mixing genetic information from both parents; depending on the selected parents, the growth of the fitness of the population is faster or slower.
Among many other solutions, the parent selection can be done with the roulette method, by tournament, or at random [18]. Mutation is a process by which a percentage of the genes are selected in a random fashion and changed. The mutation operator is a binary mask, generated randomly according to a selected rate, that is superposed on the existing binary codification of the population, changing some of the bits [19].

Fig. 18.6: Control system of the combustion process (ANFIS fuel flow model with stoichiometric combustion and flue gas model, linearization model, and the oxygen and combustion power controllers).

In the implemented algorithm, a small population of 20 individuals, with an elitism of 2 individuals, was used. The individuals are randomly selected with equal opportunity to create the new population, always including the elite. Crossover of one-site splicing is performed over half of the population, and all the members are subjected to mutation except the elite. The fitness function is:

J = (1/N) Σ_{1..N} (ŷcomb − ycomb) + k (1/N) Σ_{1..N} (yO2 − ŷO2)   ...(18.25)

where k is a weighting factor; in our case k = 2, to emphasize the importance of the oxygen content, which is directly related to the flue gas emissions.

The reference signal for the combustion power is taken from the measurement data, and the performance of the controller based on the ANFIS model is compared to the performance of the real process. The simulation shows that, by applying the new controller structure together with the ANFIS model, a much smaller deviation in the oxygen content can be achieved while satisfying the same demand for combustion power.
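The GA described above (population of 20, elitism of 2, one-site crossover, bitmask mutation sparing the elite) can be sketched as follows. The toy cost function stands in for the fitness of eq. (18.25), and every numeric value not stated in the text (mutation rate, generation count, bit length) is an assumption:

```python
import random

def evolve(fitness, n_bits=16, pop_size=20, n_elite=2, mut_rate=0.02, gens=50):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)                       # lower cost is better
        elite = [ind[:] for ind in pop[:n_elite]]   # elitism: keep the best intact
        children = []
        while len(children) < pop_size - n_elite:
            p1, p2 = random.sample(pop[:pop_size // 2], 2)  # parents from best half
            site = random.randrange(1, n_bits)              # one-site crossover
            children.append(p1[:site] + p2[site:])
        for ind in children:                        # bitmask mutation, elite spared
            for i in range(n_bits):
                if random.random() < mut_rate:
                    ind[i] ^= 1
        pop = elite + children
    return min(pop, key=fitness)

random.seed(1)  # for a reproducible sketch
best = evolve(lambda ind: sum(ind))  # toy cost: minimize the number of ones
```

In the PI-tuning application, the bit string would encode the KP and KI gains of the two controllers, and `fitness` would run the ANFIS-based simulation and evaluate eq. (18.25).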

Fig. 18.7: Fitness function by the generation of the GA (best, average and poorest fitness over 10 generations).

Fig. 18.8: PI combustion power controller optimization with GA (setpoint and output, combustion power [MW] vs. time [s]).

Fig. 18.9: PI oxygen content controller optimization (setpoint and output, oxygen content [%] vs. time [s]).

Fig. 18.10: Combustion power response: comparison of the achievement in the real process and in the simulated control system (measurement signal, setpoint and output).

Fig. 18.11: Oxygen content response: comparison of the achievement in the real process and in the simulated control system (measurement, output, and setpoint = 4%).

QUESTION BANK

1. Explain the application of hybrid fuzzy neural networks to tool breakage monitoring in end milling.
2. Explain the application of ANFIS to the control of the combustion process.

REFERENCES

1. M.S. Lan and Y. Naerheim, In-Process Detection of Tool Breakage in Milling, Journal of Engineering for Industry, Vol. 108, 1986, pp. 191-197.
2. Y. Altintas, I. Yellowley, and J. Tlusty, The Detection of Tool Breakage in Milling Operations, Journal of Engineering for Industry, Vol. 110, 1988, pp. 271-277.
3. K. Jemielniak, Detection of Cutting Edge Breakage in Turning, Annals of the CIRP, Vol. 41, No. 1, 1992, pp. 97-100.
4. J.C. Chen, A Fuzzy-Nets Tool-Breakage Detection System for End-Milling Operations, International Journal of Advanced Manufacturing Technology, Vol. 12, 1996, pp. 153-164.
5. J.C. Chen and J.T. Black, A Fuzzy-Nets In-Process (FNIP) System for Tool-Breakage Monitoring in End-Milling Operations, International Journal of Machine Tools and Manufacture, Vol. 37, No. 6, 1997, pp. 783-800.
6. J.S.R. Jang, C.T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, New Jersey, 1997.
7. I.N. Tansel and C. McLaughlin, Detection of Tool Breakage in Milling Operation-I: The Time Series Analysis Approach, International Journal of Machine Tools and Manufacture, Vol. 33, No. 4, 1993, pp. 531-544.
8. I.N. Tansel and C. McLaughlin, Detection of Tool Breakage in Milling Operation-II: The Neural Network Approach, International Journal of Machine Tools and Manufacture, Vol. 33, No. 4, 1993, pp. 545-588.
9. Y.S. Tarng, On-line Detection of Tool Breakage Using Telemetering of Cutting Forces in Milling, International Journal of Machine Tools and Manufacture, Vol. 35, 1995, pp. 259-269.
10. T.J. Ko and D.W. Cho, On-line Monitoring of Tool Breakage in Face Milling Using a Self-Organized Neural Network, Journal of Manufacturing Systems, Vol. 14, 1995, pp. 80-90.
11. Z. Hímer, J. Kovacs, and U. Kortela, Neuro-Fuzzy Control of a Steam Boiler Turbine Unit, Proceedings of the 1999 IEEE International Conference on Control Applications, Hawaii, USA, 1999, pp. 950-957.
13. J.S.R. Jang, C.T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, NJ, 1997.
14. On Developing an Adaptive Neural-Fuzzy Control System, Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems, Yokohama, Japan, July 1993, pp. 1050-1055.
15. K. Leppäkoski and J. Kovacs, Neuro-Fuzzy Model of Flue Gas Oxygen Content, Proceedings of the IASTED International Conference on Modelling, Identification and Control, 2002, pp. 341-346.
16. E. Ikonen, K. Najim, and U. Kortela, Fuzzy Neural Networks and Application to the FBC Process, IEE Proceedings - Control Theory and Applications, Vol. 143, May 1996, pp. 259-272.
17. J.H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, MA, 1992.
18. F. Alturki and A. Abdennour, A Case Study of PID Controller Tuning by Genetic Algorithm, Proceedings of the IASTED International Conference on Modelling and Control, Innsbruck, 2001.
19. J. Vieira and A. Mota, Water Gas Heater Nonlinear Physical Model: Optimization with Genetic Algorithms, Proceedings of the IASTED International Conference on Modelling, Identification and Control, Grindelwald, Switzerland, 2004.
20. Z. Hímer, J. Kovacs, and U. Kortela, Hybrid Model of Oxygen Content in Flue Gas, Proceedings of the IASTED International Conference on Applied Modelling and Simulation, 2004.

183. 121 B Back-propagation 3. 122 Boltzmann machines 165 Boolean logic 3 C Calssical modus ponens 45 Cart-Pole system 194 Cartesian product 24 Cells 122 Center of-gravity method 74 Center-of-area 87 Centroid defuzzification 90. 139. 9 Characteristic of fuzzy systems 9 Classical modus ponens 44 Classical modus tollens 45 . 123 ASE-ACE combination 192 Associative learning 125 Associative Memory 163 Associative Search 193 Associative Search Element 192 Associativity 54. 133 Adaptive Critic 194 Adaptive Critic Element 192 Adaptive linear element 133 Adaptive Resonance Theory 182 Andlike 62 Andness is 62 ANFIS 229 ANFIS Structure 231 Anti-Reflexivity 20 Anti-Symmetricity 20 Applications of fuzzy logic 95 Approximate reasoning 41 Arithmetic mean 59 Arithmetical mean 58 ART 1 169. 185 ART 2 211 Artificial network 122 Artificial neural 3 Artificial neural network 3. 55 Asymmetric divergence 166 Auto-associator network 161 Auxiliary Hybrid Systems 218 Auxiliary hybrids 217 Average error 151 Averaging operators 58 Axon 3.Index A A back propagation neural network 235 =-Cut 11 Activation function 124 Adaline 125. 129. 142 Barto network 192 Bartos approach 192 Basic property 45 Bellman equations 196 Bias 127 Binary fuzzy relation 21 Biological neural network 121.

103 Defuzzifier 68. 6. 7 Fuzzy logic controller 82. 21 Fuzzy rule-base system 71 Fuzzy set 9 Fuzzy singleton 63 Fuzzy systems 2 G Gaussian membership functions 89 Generalised delta rule 140 Generalized Modus Ponens 44 Geometric mean 59 Godel implication 32 Graded response 164 . 1. 150 Feed-forward networks 125 Feedback control system 81 First-of-Maxima 87 Flexible manufacturing systems 233 FNN architecture 224 Follow-the-leader clustering algorithm 185 Forward Kinematics 200 Frank 55 Fuzzification 8 Fuzzy Approach 116 Fuzzy control 8 Fuzzy implication operator 32 Fuzzy implications 30 Fuzzy logic 1. 86 Delta 131 Delta rule 134 Dendrites 3.#" FUZZY LOGIC AND NEURAL NETWORKS Classical N-array relation 19 Clustering 169. 121 Dimensionality reduction 169 Direction set minimization method 148 Discrete membership function 10 Disjunction Rule 43 Dot product 170 Dubois and prade 55 Dynamic Programming 195 Dynamics 201 E Eigenvectors 181 Elman Network 159 Embedded Hybrid Systems 218 Embedded hybrids 217 Empty Fuzzy Set 15 End-effector positioning 201 End milling 233 End milling cutting process 234 Entailment Rule 43 Entropy 63 Equivalence 20 Error back-propagation 222 Error function 173 Euclidean distance 171 Evaluation network 194 Exclusive-or 135 Expressive power 151 Extremal conditions 58 F Feature extraction 169 Feed-forward network 139. 1 Fuzzy neuron 220 Fuzzy Number 12 Fuzzy Point 15 Fuzzy relations 19. 84 Fuzzy Logic Controllers 243 Fuzzy logic is 1. 170 Committee 223 Committee of networks 223 Commutative 61 Commutativity 58 Compensatory 58 Competitive learning 170 Complement 17 Component extractor 181 Compositional rule 44 Conjugate gradient minimization 148 Conjunction rule 43 Contrast enhancement 187 Control of combustion 243 Control room temperature 100 Controller network 191 Convergence theorem 131 Convex fuzzy set 11 Cost function 172 Counter propagation 174 Critic 190 D Defuzzification 67.

63 Material implication 29 Mathematical neuron network 122 Max-Criterion 88 Maximum 56 Measure of dispersion 63 Median 59 Membership function 10 Middle-of-Maxima 87 Milling 210 Minimum 54 Modifiers 33 Momentum 144 Monotonicity 54. 61 Multi-layer network 140 Multi-layer perceptrons 137 Multi-input-multi-output 82 N Negation rule 44 Network paralysis 148 Neural networks 2. 61 Inference mechanisms 72 Input units 123 Interpolation 42 Intersection 16. 55. 135 Linear discriminant function 130 Linear threshold neural network 130 Linguistic variable 33. 164 Non-fuzzy approach 112 Normal fuzzy set 11 Normalization 186 Number of layers 127 O Offset 127 One identy 54 Ordered weighted averaging 60 Original model 185 Orlike operator 62 Orness 62 Output units 123 P Paradigms of learning 125 Partial order 20 Perceptron 2. 190 Least mean square 133. 122. 153 Hopfield network 161 Human brain 3. 125. 121 Hybrid fuzzy neural network 221 Hybrid neural network 220 Hybrid systems 217 I Idempotency 58. 121 Neuro-fuzzy systems 8 Neuro-fuzzy-genetic systems 8 Neurofuzzy network 222 Neurons 3.INDEX Grading of apples 104 Gravity 87 H Hamacher 55. 58. 56 Harmonic mean 59 Hebbian learning 126 Hebbian Rule 180 Height defuzzification 88 Hidden Units 123. 129 Perceptron learning rule 131 ## . 34 Linguistic variable truth 35 LMS 131 LMS rule 131 Local Minima 148 Long-term memory 183 M Mamdani inference Mechanism 73 Mamdani system 66 Mamdanis implication operator 32. 21 Inverse Kinematics 200 J Jordan Network 158 K Kleene-Dienes implication 32 Kohonen network 177 Kullback information 166 L Larsen inference Mechanism 77 Larsen system 66 Law of the excluded middle 2 Laws of Thought 2 Learning 127 Learning Rate 144 Learning Samples 152.

157 Reflexivity 20 Regular fuzzy neural network 220 Reinforcement learning 192 Reinforcement learning scheme 190 Representation 127 Road accidents 96 Robot arm dynamics 207 Robot control 200 S Self-organization 125 Self-organizing networks 169 Semi-linear 124 Sequential hybrids 217 Sgn function 124 Shadow of fuzzy relation 24 Short-term memory 183 Sigmod 124 Significance 7 Simplified fuzzy Reasoning 77 Single layer feed-forward network 129 Single layer network 129. 134 Singleton fuzzifier 89 Soft computing 8 Standard Strict 32 Stochastic function 193 Strong 56 Subset 46 Subsethood 14 Sugeno Inference Mechanism 75 Summed squared error 135 Sup-Min Composition 26 Superset 47 Supervised learning 125 Support 11 Symmetricity 20. 56 t-conorm-based union 57 t-norm-based intersection 57 Taylor series 148 Test error rate 151 The linguistic variable truth 35 Threshold 127 Threshold (sgn) 130 Tool breakage 233 Total error 135 Total indeterminance 46 Total order 20 Traffic accidents and traffic safety 96 Trajectory generation 201 Transitivity 20 Translation rules 43 Trapezoidal fuzzy number 14 Triangular conorm 55 Triangular Fuzzy Number 13 Triangular norm 54 Tsukamoto inference mechanism 73 two layer feed-forward network 139 Two-input-single-output 82 U Union 16. 54 Symmetry 55 T T 54. 130. 22 Universal approximation theorem 142 Universal approximators 91 Universal fuzzy set 15 Unmanned flexible manufacturing system 233 Unsupervised learning 125 .#$ FUZZY LOGIC AND NEURAL NETWORKS Perceptron learning rule 131 Perceptrons 2 Precisiated natural language 8 Precision 7 Principal component analysis 221 Principle of incompatibility 101 Principle of optimality 195 Probabilistic 56 Processing Units 123 Product 55 Product fuzzy conjunction 89 Product fuzzy implication 89 Projection 23 Projection Rule 44 Q Q-learning 196 Quasi fuzzy number 12 R Recurrent networks 125.

56 Z Zero identity 55 #% . 174 W Weak 55 Winner Selection 170 Y Yager 55.INDEX V Vector quantisation 169.

