

CHAPTER 3

ADAPTIVE NEURO-FUZZY INFERENCE SYSTEM

The objective of an ANFIS (Jang 1993) is to integrate the best features of fuzzy systems and neural networks. ANFIS offers one of the best trade-offs between the two, providing smoothness due to Fuzzy Control (FC) interpolation and adaptability due to neural network backpropagation.

3.1 INTRODUCTION TO FUZZY LOGIC

Two distinct forms of knowledge exist for many problems: objective knowledge, which is used in all engineering problem formulations (e.g., mathematical models), and subjective knowledge, which represents linguistic information that is usually impossible to quantify using traditional mathematics (e.g., rules, expert information, design requirements) (Mendel 1995).

Solving most real-world problems requires both types of knowledge. The two forms of knowledge can be combined in a logical way using fuzzy logic (FL). A fuzzy logic system is unique in that it is able to simultaneously handle numerical data and linguistic knowledge (Ross 2005). The founding father of the entire field of FL is Dr. Lotfi Zadeh. In his paper, Zadeh (1965) states, “As the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance (or relevance) become almost mutually exclusive characteristics” – or, “The closer one looks at a real-world problem, the fuzzier becomes its solution”.

3.2 FUZZY LOGIC SYSTEM (FLS)

In general, a FLS is a nonlinear mapping of an input data (feature) vector into a scalar output. The richness of FL lies in the enormous number of possible mappings it admits; this richness, however, requires a careful understanding of FL and of the elements that comprise a FLS.

A FLS contains four components: fuzzifier, rules, inference engine, and defuzzifier. Once the rules have been established, a FLS can be viewed as a mapping from inputs to outputs, and this mapping can be expressed quantitatively as y = f(x). Figure 3.1 depicts a FLS of the kind widely used in fuzzy logic controllers.

[Figure: crisp inputs enter the FUZZIFIER, which produces fuzzy input sets; the INFERENCE engine combines them with the RULES to produce fuzzy output sets, which the DEFUZZIFIER converts into crisp outputs.]

Figure 3.1 Schematic Diagram of a Fuzzy Inference System

Fuzzy inference is the process that maps a given input into an output using fuzzy logic. Any fuzzy inference system can be represented by four interacting blocks (a minimal code sketch of these stages follows the list):

1) Fuzzification: The process of transforming any crisp value into the corresponding linguistic variable (fuzzy value), based on the appropriate membership function.

2) Knowledge base: Contains the membership function definitions and the necessary IF-THEN rules.

3) Inference engine: Simulates human decision making using implication and aggregation processes.

4) Defuzzification: The process of transforming the fuzzy output into a crisp numerical value.
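To make the four blocks concrete, the following minimal Python sketch strings them together for a single-input heater controller. The linguistic terms, membership-function breakpoints and rules are hypothetical values chosen purely for illustration; a weighted average of singleton consequents stands in for a full defuzzifier.

```python
# Minimal sketch of the four fuzzy-inference blocks (illustrative values only).

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# 1) Fuzzification: crisp temperature -> degrees of membership in linguistic terms.
def fuzzify(temp):
    return {"cold": tri(temp, -10, 0, 15),
            "warm": tri(temp, 10, 20, 30),
            "hot":  tri(temp, 25, 40, 55)}

# 2) Knowledge base: IF temperature is <term> THEN heater power is <value> (0..1).
rules = [("cold", 0.9), ("warm", 0.4), ("hot", 0.0)]

# 3) Inference engine: each rule fires to the degree its antecedent is satisfied.
def infer(memberships):
    return [(memberships[term], power) for term, power in rules]

# 4) Defuzzification: weighted average of the fired consequents.
def defuzzify(fired):
    num = sum(w * p for w, p in fired)
    den = sum(w for w, _ in fired)
    return num / den if den > 0 else 0.0

crisp_in = 18.0                                   # degrees Celsius
crisp_out = defuzzify(infer(fuzzify(crisp_in)))
print(f"heater power for {crisp_in} C: {crisp_out:.2f}")
```

In a full Mamdani system the consequents would themselves be fuzzy sets that are clipped, aggregated and then defuzzified, but the four-stage flow is the same.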

Rules may be provided by experts or extracted from numerical data. In either case, engineering rules are expressed as a collection of IF-THEN statements, e.g., “IF u1 is very warm and u2 is quite low, THEN turn v somewhat to the right”. Making use of such a rule requires an understanding of:

1) Linguistic variables versus numerical values of a variable (e.g., very warm versus 40 °C);

2) Quantifying linguistic variables (e.g., u1 may have a finite number of linguistic terms associated with it, ranging from extremely hot to extremely cold), which is done using fuzzy membership functions;

3) Logical connections for linguistic variables (e.g., “and”, “or”, etc.); and

4) Implications, i.e., “IF A THEN B”. In addition, an understanding of how to combine more than one rule is required.

The fuzzifier maps crisp numbers into fuzzy sets. It is needed in order to activate rules, which are expressed in terms of linguistic variables that have fuzzy sets associated with them. The inference engine of the FLS maps input fuzzy sets into output fuzzy sets. It handles the way in which rules are combined, just as humans use many different types of inferential procedures to help them understand things or make decisions. In many applications, a crisp number must be obtained at the output of a FLS. The defuzzifier maps output fuzzy sets into crisp numbers.

3.3 FUZZY SET THEORY

3.3.1 Crisp Sets

A crisp set A in a universe of discourse U (which provides the set of allowable values for a variable) can be defined by listing all of its members or by identifying the elements $x \in A$. One way to do the latter is to specify a condition by which $x \in A$; thus A can be defined as $A = \{x \mid x \text{ meets some condition}\}$. Alternatively, we can introduce a zero-one membership function for A, denoted $\mu_A(x)$, such that $\mu_A(x) = 1$ if $x \in A$ and $\mu_A(x) = 0$ if $x \notin A$. The subset A is mathematically equivalent to its membership function $\mu_A(x)$, in the sense that knowing $\mu_A(x)$ is the same as knowing A itself.

3.3.2 Fuzzy Sets

A fuzzy set F defined on a universe of discourse U is characterized by a membership function $\mu_F(x)$ which takes on values in the interval [0, 1]. A fuzzy set is a generalization of an ordinary subset (i.e., a crisp subset), whose membership function only takes on two values, zero or unity. A membership function provides a measure of the degree of similarity of an element in U to the fuzzy subset. In FL an element can reside in more than one set to different degrees of similarity; this cannot occur in crisp set theory. A fuzzy set F in U may be represented as a set of ordered pairs of a generic element x and its grade of membership: $F = \{(x, \mu_F(x)) \mid x \in U\}$. When U is continuous, F is commonly written as $F = \int_U \mu_F(x)/x$. In this expression the integral sign does not denote integration; it denotes the collection of all points $x \in U$ with associated membership function $\mu_F(x)$. When U is discrete, F is commonly written as $F = \sum_U \mu_F(x)/x$. Here the summation sign denotes the collection of all points $x \in U$ with associated membership function $\mu_F(x)$; hence it denotes the set-theoretic operation of union. The slash in these expressions associates the elements in U with their membership grades, where $\mu_F(x) > 0$.
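As a concrete reading of the discrete notation $F = \sum_U \mu_F(x)/x$, the short sketch below stores a fuzzy set as a Python dictionary that maps each element of U to its membership grade; the universe and the grades are invented for illustration.

```python
# A discrete fuzzy set stored as {element: membership grade}, grades in [0, 1].
# Only points with non-zero grade are kept, mirroring the mu_F(x) > 0 convention.
F = {2: 0.1, 3: 0.6, 4: 1.0, 5: 0.6, 6: 0.1}   # e.g. "about 4"

# A crisp set is the special case where every stored grade is exactly 1.
A = {3: 1.0, 4: 1.0, 5: 1.0}

# The union of two fuzzy sets takes the maximum grade at each point.
def fuzzy_union(f, g):
    return {x: max(f.get(x, 0.0), g.get(x, 0.0)) for x in set(f) | set(g)}

print(fuzzy_union(F, A))   # e.g. {2: 0.1, 3: 1.0, 4: 1.0, 5: 1.0, 6: 0.1} (key order may vary)
```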

3.3.3 Linguistic Variables

Linguistic variables are variables whose values are not numbers but words or sentences in a natural or artificial language. In general, linguistic variables are less specific than numerical ones. Let u denote the name of a linguistic variable; numerical values of the linguistic variable u are denoted x, where $x \in U$. Sometimes x and u are used interchangeably. A linguistic variable is usually decomposed into a set of terms, T(u), which covers its universe of discourse.

3.3.4 Membership Functions

Membership functions $\mu_F(x)$ are, for the most part, associated with terms that appear in the antecedents or consequents of rules, or in phrases. The most commonly used shapes for membership functions are triangular, trapezoidal, piecewise linear and Gaussian. Usually, membership functions are chosen arbitrarily by the user, based on the user’s experience; hence, the membership functions of two users could be quite different depending upon their experiences, perspectives, cultures, etc. Figure 3.2 shows sample membership functions for two sets.
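For reference, the membership-function shapes named above can be written directly as small functions. The sketch below gives plain-Python versions of the triangular, trapezoidal and Gaussian forms; the parameters are the usual breakpoints, centre and spread, and the example values are hypothetical.

```python
import math

def triangular(x, a, b, c):
    """Rises from a to the peak at b, then falls back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Flat top of grade 1 between b and c, linear shoulders a-b and c-d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def gaussian(x, centre, sigma):
    """Smooth bell centred at `centre` with spread `sigma`."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

# Example: grade of a 5.8 ft person in a hypothetical "Tall" term.
print(triangular(5.8, 5.5, 6.5, 7.5), gaussian(5.8, 6.5, 0.5))
```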

Fuzzy logic was introduced as a superset of standard Boolean logic by considering fuzzy values that range from 0 to 1, instead of only the two values true and false, while applying the same logic operators such as AND, OR, NOT, etc. Thus the concept is extended from two-valued logic to multi-valued logic, which has many applications (Babulal 2006, Babulal 2008, Behera 2009, Bonatto 1998, Boris 2006, Chilukuri 2004, Dash 2000, Elmitwally 2000, Farghal 2002, Grey 2005, Ibrahim 2001, Ibrahim 2002, Jain 2000, Ko 2004, Ko 2007, Kochukuttan 1997, Liang 2002, Masoum 2004, Morsi 2008, Morsi 2008a, Morsi 2008b, Morsi 2009, Nawi 2003, Saroj 2010, Zhang 2005, Zhu 2004).
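A minimal sketch of this extension, assuming the standard min/max/complement choice of fuzzy operators (other t-norms and t-conorms are equally admissible):

```python
def fuzzy_and(a, b):
    return min(a, b)        # t-norm: the common "min" choice

def fuzzy_or(a, b):
    return max(a, b)        # t-conorm: the common "max" choice

def fuzzy_not(a):
    return 1.0 - a          # standard complement

a, b = 0.7, 0.2             # truth degrees of two fuzzy propositions
print(fuzzy_and(a, b), fuzzy_or(a, b), fuzzy_not(a))   # 0.2 0.7 0.3 (up to float rounding)

# Restricting the degrees to {0, 1} recovers ordinary Boolean AND, OR and NOT.
```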

[Figure: two panels of membership functions H(h) for the terms Short, Medium and Tall plotted against Height from 4 to 7 feet; panel (a) as judged by most people, panel (b) as judged by professional basketball players.]

Figure 3.2 Membership Functions for T(Height) = {Short Men, Medium Men, Tall Men}. (a) Most People’s Membership Functions and (b) Professional Basketball Players’ Membership Functions

Conditional statements, commonly known as IF-THEN rules, can be easily formulated using fuzzy logic. A rule consists of two parts: the antecedent, or IF part, and the consequent, or THEN part. An IF-THEN rule can take the following form:

IF x is A and y is B THEN z is C

where A, B and C are linguistic values (fuzzy sets), expressed as words in a natural language, associated with the variables x, y and z.

The main disadvantage of a fuzzy classifier is that the system response time slows down as the number of rules increases. If the system does not perform satisfactorily, the rules must be reset to obtain efficient results, i.e., the system does not adapt itself to variations in the data. The accuracy of the system depends on the knowledge and experience of human experts. The rules should be updated, and the weighting factors in the fuzzy sets refined, over time. Neural networks, genetic algorithms, swarm optimization techniques, etc. can be used for fine tuning of fuzzy logic control systems.

3.4 NEURAL NETWORKS

A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following two ways:

1. A neural network acquires knowledge through learning.

2. A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights.

The true power and advantage of neural networks lies in their ability to represent both linear and non-linear relationships and in their ability to learn these relationships directly from the data being modeled. Traditional linear models are simply inadequate when it comes to modeling data that contains non-linear characteristics.

Figure 3.3 Multi-Layer Perceptron Neural Network

The most common neural network model is the multi-layer perceptron (MLP). This type of neural network is known as a supervised network because it requires a desired output in order to learn. The goal of this type of network is to create a model that correctly maps the input to the output using historical data, so that the model can then be used to produce the output when the desired output is unknown. A graphical representation of an MLP is shown in Figure 3.3.

In a two-hidden-layer MLP, the inputs are fed into the input layer and multiplied by interconnection weights as they are passed from the input layer to the first hidden layer. Within the first hidden layer, they are summed and then processed by a nonlinear function (usually the hyperbolic tangent). As the processed data leaves the first hidden layer, it is again multiplied by interconnection weights, then summed and processed by the second hidden layer. Finally, the data is multiplied by interconnection weights and processed one last time within the output layer to produce the neural network output.
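The multiply-sum-squash sequence described above can be written in a few lines. The NumPy sketch below assumes a hypothetical 3-input network with two hidden layers of 5 and 4 tanh units and randomly initialised weights; it shows only the forward computation, not training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 3 inputs -> 5 and 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 4)), np.zeros(4)
W3, b3 = rng.normal(size=(4, 1)), np.zeros(1)

def mlp_forward(x):
    h1 = np.tanh(x @ W1 + b1)      # first hidden layer: weight, sum, tanh
    h2 = np.tanh(h1 @ W2 + b2)     # second hidden layer: weight, sum, tanh
    return h2 @ W3 + b3            # output layer: weight and sum

x = np.array([0.5, -1.2, 0.3])
print(mlp_forward(x))
```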

The MLP and many other neural networks learn using an algorithm
called back-propagation. With back-propagation, the input data is repeatedly
presented to the neural network. With each presentation the output of the
neural network is compared to the desired output and an error is computed.
This error is then fed back (back-propagated) to the neural network and used
to adjust the weights such that the error decreases with each iteration and the
neural model gets closer and closer to producing the desired output. This
process is known as "training".
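A compact illustration of this training loop is given below for a single-hidden-layer network on a toy one-dimensional regression task; the layer sizes, learning rate and data are invented for the example, and a two-hidden-layer network trains in the same way with one more application of the chain rule.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))              # toy inputs
Y = np.sin(np.pi * X)                              # desired outputs

W1, b1 = rng.normal(scale=0.5, size=(1, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr = 0.05                                          # learning rate

for epoch in range(2000):
    # Forward pass: present the inputs and compute the network output.
    H = np.tanh(X @ W1 + b1)
    out = H @ W2 + b2
    err = out - Y                                  # compare with the desired output
    # Backward pass: propagate the error and adjust the weights.
    g_out = 2 * err / len(X)                       # gradient of the mean squared error
    dW2, db2 = H.T @ g_out, g_out.sum(axis=0)
    g_H = (g_out @ W2.T) * (1 - H ** 2)            # tanh derivative
    dW1, db1 = X.T @ g_H, g_H.sum(axis=0)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

pred = np.tanh(X @ W1 + b1) @ W2 + b2
print("final mean squared error:", float(np.mean((pred - Y) ** 2)))
```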

Neural networks have been successfully applied to a broad spectrum of data-intensive applications. Artificial Neural Networks (ANNs) are among the oldest Artificial Intelligence techniques, and they have been around the power research arena for quite some time. ANNs mimic the neural structure of the human brain. This structure consists of simple arithmetic units connected in a highly complex layered architecture. ANNs are capable of representing complex (nonlinear) functions, and they learn these functions from examples. Neural networks have been applied extensively in Power Quality research. Major applications include:

Identifying power quality events from poor power quality ones

Modeling the patterns of harmonic production from individual fluorescent lighting systems

Estimating harmonic distortions and power quality in power networks

Identifying and recognizing power quality events using the wavelet transform in conjunction with neural networks

Identifying high-impedance fault, fault-like load, and normal load current patterns

Analyzing harmonic distortion while avoiding the effects of noise and sub-harmonics

Developing screening tools for power system engineers to address power quality issues

3.5 ANFIS ARCHITECTURE

ANFIS is a hybrid system that incorporates the learning abilities of an ANN and the excellent knowledge representation and inference capabilities of fuzzy logic (Jang 1993), and it has the ability to self-modify its membership functions to achieve a desired performance. An adaptive network, which subsumes almost all kinds of neural network paradigms, can be adopted to interpret the fuzzy inference system. ANFIS utilizes the hybrid learning rule and can manage complex decision-making or diagnosis systems. ANFIS has been proven to be an effective tool for tuning the membership functions of fuzzy inference systems. Ibrahim (2001) proposed an ANFIS based system to learn power quality signature waveforms and showed that adaptive fuzzy systems are very successful in learning power quality waveforms. Rasli (2009), Rathina (2009) and Rathina (2010) have proposed ANFIS based systems for power quality assessment.

ANFIS is a simple data learning technique that uses a fuzzy inference system model to transform a given input into a target output. This prediction involves membership functions, fuzzy logic operators and if-then rules. There are two types of fuzzy system, commonly known as the Mamdani and Sugeno models. There are five main processing stages in the ANFIS operation: input fuzzification, application of fuzzy operators, application of the implication method, output aggregation, and defuzzification.

ANFIS improves performance by utilizing, from fuzzy systems, the “representation of prior knowledge into a set of constraints (network topology) to reduce the optimization search space”, and, from neural networks, the “adaptation of back propagation to structured network to automate FC parametric tuning”. The design objective of the fuzzy controller is to learn and achieve good performance in the presence of disturbances and uncertainties. The design of the membership functions is done by the ANFIS batch learning technique, which amounts to tuning a FIS with the backpropagation algorithm based on a collection of input-output data pairs.

Generally, ANFIS is a multilayer feedforward network in which each node performs a particular function (node function) on incoming signals. For simplicity, we consider two inputs x and y and one output z. Suppose that the rule base contains two fuzzy if-then rules of Takagi and Sugeno type (Jang 1993):

Rule 1: IF x is A1 and y is B1 THEN f1 = P1 x + Q1 y + R1

Rule 2: IF x is A2 and y is B2 THEN f2 = P2 x + Q2 y + R2        (3.1)

Figure 3.4 ANFIS Architecture



The ANFIS architecture is a five-layer feedforward network, as shown in Figure 3.4. An adaptive network (Jang 1993) is a multilayer feedforward network in which each node performs a particular function (node function) on the incoming signals, using a set of parameters pertaining to that node. The formulas for the node functions may vary from node to node, and the choice of each node function depends on the overall input-output function which the adaptive network is required to carry out. Note that the links in an adaptive network only indicate the flow direction of signals between nodes; no weights are associated with the links.

To reflect different adaptive capabilities, both circle and square nodes are used in an adaptive network. A square node (adaptive node) has parameters, while a circle node (fixed node) has none. The parameter set of an adaptive network is the union of the parameter sets of its adaptive nodes. In order to achieve a desired input-output mapping, these parameters are updated according to the given training data using a gradient-based learning procedure.

Layer 1: Every node in this layer is a square node with a node function (the membership value of the premise part)

$O_i^1 = \mu_{A_i}(x)$        (3.2)

where x is the input to node i, and $A_i$ is the linguistic label associated with this node function.

Layer 2: Every node in this layer is a circle node labeled Π which multiplies the incoming signals. Each node output represents the firing strength of a rule:

$O_i^2 = W_i = \mu_{A_i}(x)\,\mu_{B_i}(y)$, where i = 1, 2        (3.3)



Layer 3: Every node in this layer is a circle node labeled N (normalization). The ith node calculates the ratio of the ith rule’s firing strength to the sum of all firing strengths:

$O_i^3 = \overline{W}_i = \dfrac{W_i}{W_1 + W_2}$, where i = 1, 2        (3.4)

Layer 4: Every node in this layer is a square node with a node function

$O_i^4 = \overline{W}_i f_i = \overline{W}_i (P_i x + Q_i y + R_i)$, where i = 1, 2        (3.5)

Layer 5: The single node in this layer is a circle node labeled Σ that computes the overall output as the summation of all incoming signals:

$O^5 = \sum_i \overline{W}_i f_i$ = system output, where i = 1, 2        (3.6)

Equation (3.6) represents the overall output of the ANFIS, which is functionally equivalent to the fuzzy system in (Morsi 2008a).
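Putting Equations (3.2) to (3.6) together, the sketch below evaluates the five layers for the two-input, two-rule system of Equation (3.1). The Gaussian form of the premise membership functions and all parameter values are illustrative assumptions, not parameters identified anywhere in this work.

```python
import numpy as np

def gauss(v, c, s):
    """Premise membership function (Gaussian form assumed for illustration)."""
    return np.exp(-0.5 * ((v - c) / s) ** 2)

# Hypothetical premise parameters (centre, sigma) for A1, A2 (on x) and B1, B2 (on y).
A = [(0.0, 1.0), (2.0, 1.0)]
B = [(0.0, 1.0), (2.0, 1.0)]
# Hypothetical consequent parameters [P_i, Q_i, R_i] of the two Sugeno rules in (3.1).
PQR = np.array([[1.0, 0.5, 0.0],
                [-0.3, 1.2, 0.7]])

def anfis_forward(x, y):
    # Layer 1: membership values of the premise part, Eq. (3.2).
    muA = [gauss(x, c, s) for c, s in A]
    muB = [gauss(y, c, s) for c, s in B]
    # Layer 2: firing strengths W_i as products of the incoming signals, Eq. (3.3).
    W = np.array([muA[0] * muB[0], muA[1] * muB[1]])
    # Layer 3: normalized firing strengths, Eq. (3.4).
    Wbar = W / W.sum()
    # Layer 4: weighted rule consequents Wbar_i * f_i, Eq. (3.5).
    f = PQR @ np.array([x, y, 1.0])
    # Layer 5: overall output as the sum of all incoming signals, Eq. (3.6).
    return float(np.sum(Wbar * f))

print(anfis_forward(1.0, 0.5))
```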

3.6 ANFIS LEARNING ALGORITHM

In this section, the hybrid learning algorithm is explained briefly. The ANFIS learning algorithm uses a two-pass learning cycle. In the forward pass, the premise parameter set S1 is held fixed and the consequent parameter set S2 is computed using a Least Squared Error (LSE) algorithm (off-line learning). In the backward pass, S2 is held fixed and S1 is updated using a gradient descent algorithm (usually backpropagation).

Figure 3.5 ANFIS Structure

From the ANFIS structure shown in Figure 3.5, it can be observed that when the values of the premise parameters are fixed, the overall output can be expressed as a linear combination of the consequent parameters. The hybrid learning algorithm is therefore a combination of backpropagation and least squares estimation. Each epoch of the hybrid learning algorithm consists of two passes, namely the forward pass and the backward pass. In the forward pass, functional signals go forward up to layer 4 and the consequent parameters are identified by the least squares estimate. Backpropagation is used to identify the nonlinear parameters (premise parameters), while the least squares estimate is used for the linear parameters in the consequent parts.
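The sketch below walks through a few hybrid-learning epochs for the same two-rule Sugeno system. It exploits the linearity in the consequent parameters for the forward-pass least-squares step; for brevity the backward pass uses a finite-difference gradient as a stand-in for the analytic backpropagation derivatives, and the Gaussian premise functions, rule count and training data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 2, size=(100, 2))                  # toy training inputs (x, y)
T = X[:, 0] ** 2 + 0.5 * X[:, 1]                      # toy target outputs

# Premise parameter set S1: (centre, sigma) for A1, A2 (on x) and B1, B2 (on y).
premise = np.array([[0.5, 1.0], [1.5, 1.0],
                    [0.5, 1.0], [1.5, 1.0]])

def norm_firing(prem):
    """Normalized firing strengths Wbar (N x 2) of the two rules."""
    g = lambda v, c, s: np.exp(-0.5 * ((v - c) / s) ** 2)
    W = np.stack([g(X[:, 0], *prem[0]) * g(X[:, 1], *prem[2]),
                  g(X[:, 0], *prem[1]) * g(X[:, 1], *prem[3])], axis=1)
    return W / W.sum(axis=1, keepdims=True)

def design_matrix(prem):
    """Columns Wbar_i*x, Wbar_i*y, Wbar_i, so the output is linear in [P_i, Q_i, R_i]."""
    Wbar = norm_firing(prem)
    ones = np.ones((len(X), 1))
    return np.hstack([Wbar[:, [i]] * np.hstack([X, ones]) for i in (0, 1)])

def forward_pass(prem):
    """Forward pass: premise parameters fixed, consequent set S2 found by LSE."""
    theta, *_ = np.linalg.lstsq(design_matrix(prem), T, rcond=None)
    return theta

def mse(prem, theta):
    return float(np.mean((design_matrix(prem) @ theta - T) ** 2))

def backward_pass(prem, theta, lr=0.01, eps=1e-4):
    """Backward pass: consequents fixed, premise parameters nudged down the error
    gradient (finite differences stand in for analytic backpropagation here)."""
    base = mse(prem, theta)
    grad = np.zeros_like(prem)
    for idx in np.ndindex(prem.shape):
        p = prem.copy()
        p[idx] += eps
        grad[idx] = (mse(p, theta) - base) / eps
    return prem - lr * grad

for epoch in range(20):                               # a few hybrid-learning epochs
    theta = forward_pass(premise)                     # forward pass (LSE)
    premise = backward_pass(premise, theta)           # backward pass (gradient descent)
print("MSE after training:", round(mse(premise, theta), 4))
```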
