Introduction to Fuzzy Logic Control


Outline

- General Definition
- Applications
- Operations
- Rules
- Fuzzy Logic Toolbox
- FIS Editor
- Tipping Problem: Fuzzy Approach
- Defining Inputs & Outputs
- Defining MFs
- Defining Fuzzy Rules

General Definition
Fuzzy logic was introduced in 1965 by Lotfi Zadeh at UC Berkeley.
- A superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth.
- The central notion of fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute falseness and 1.0 representing absolute truth.
- Deals with real-world vagueness.

Applications
- Expert systems
- Control units
- Bullet train between Tokyo and Osaka
- Video cameras
- Automatic transmissions

Operations
For membership values A and B, the standard fuzzy operations are:
- A ∧ B = min(A, B)
- A ∨ B = max(A, B)
- ¬A = 1 − A
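These operators can be tried directly at the MATLAB prompt; a minimal sketch, with membership values chosen purely for illustration:

    % Standard (Zadeh) fuzzy operators on membership values in [0, 1]
    A = 0.7;            % membership of some x in fuzzy set A (illustrative)
    B = 0.4;            % membership of the same x in fuzzy set B
    AandB = min(A, B)   % fuzzy AND -> 0.4
    AorB  = max(A, B)   % fuzzy OR  -> 0.7
    notA  = 1 - A       % fuzzy NOT -> 0.3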

Controller Structure
- Fuzzification
  - Scales and maps input variables to fuzzy sets
- Inference Mechanism
  - Approximate reasoning
  - Deduces the control action
- Defuzzification
  - Converts fuzzy output values to control signals

MATLAB Fuzzy Logic Toolbox
- The MATLAB Fuzzy Logic Toolbox facilitates the development of fuzzy-logic systems using:
  - graphical user interface (GUI) tools
  - command-line functionality
- The tool can be used for building:
  - Fuzzy Expert Systems
  - Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

Graphical User Interface (GUI) Tools
- There are five primary GUI tools for building, editing, and observing fuzzy inference systems in the Fuzzy Logic Toolbox:
  - Fuzzy Inference System (FIS) Editor
  - Membership Function Editor
  - Rule Editor
  - Rule Viewer
  - Surface Viewer

MATLAB: Fuzzy Logic Toolbox (screenshot)


Fuzzy Inference System
- Two types of inference system:
  - Mamdani inference method
  - Sugeno inference method
- Mamdani's fuzzy inference method is the most common methodology.

FIS Editor: Mamdani's Inference System (screenshot)

Fuzzy Logic Examples using MATLAB
- To control the speed of a motor by changing the input voltage: when a set point is defined, if for some reason the motor runs faster, we need to slow it down by reducing the input voltage. If the motor slows below the set point, the input voltage must be increased so that the motor speed reaches the set point.

Input/Output
- Let the input status words be:
  - Too slow
  - Just right
  - Too fast
- Let the output action words be:
  - Less voltage (slow down)
  - No change
  - More voltage (speed up)

FIS Editor: Adding Input/Output (screenshot)

FIS Editor: Adding Input/Output (screenshot)

Membership Function Editor (screenshot)

Input Membership Functions (screenshot)

Output Membership Functions (screenshot)

Membership Functions (screenshot)

Rules
Define the rule base (built from the command line in the sketch below):
1) If the motor is running too slow, then more voltage.
2) If motor speed is about right, then no change.
3) If motor speed is too fast, then less voltage.
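The same controller can be assembled programmatically instead of through the GUI steps on the following slides. A minimal sketch, assuming illustrative speed/voltage ranges and triangular membership functions, using the mamfis/addInput/addMF/addRule API of recent Fuzzy Logic Toolbox releases (note that recent releases call evalfis as evalfis(fis, input), while the slide later in this deck uses the older evalfis(input, fis) order):

    % Build the motor-speed controller programmatically (ranges are assumed)
    fis = mamfis('Name', 'motor');
    fis = addInput(fis, [0 100], 'Name', 'speed');
    fis = addMF(fis, 'speed', 'trimf', [0 0 50], 'Name', 'tooSlow');
    fis = addMF(fis, 'speed', 'trimf', [25 50 75], 'Name', 'justRight');
    fis = addMF(fis, 'speed', 'trimf', [50 100 100], 'Name', 'tooFast');
    fis = addOutput(fis, [-10 10], 'Name', 'voltage');
    fis = addMF(fis, 'voltage', 'trimf', [-10 -10 0], 'Name', 'less');
    fis = addMF(fis, 'voltage', 'trimf', [-5 0 5], 'Name', 'noChange');
    fis = addMF(fis, 'voltage', 'trimf', [0 10 10], 'Name', 'more');
    fis = addRule(fis, [ ...
        "speed==tooSlow => voltage=more (1)", ...
        "speed==justRight => voltage=noChange (1)", ...
        "speed==tooFast => voltage=less (1)"]);
    writeFIS(fis, 'one');      % same file the slides save from the GUI
    out = evalfis(fis, 80)     % crisp voltage change for speed = 80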

Rule Editor: Adding Rules (screenshot)

Rule Base (screenshot)

Rule Viewer (screenshot)

Surface Viewer (screenshot)

- Save the file as "one.fis".
- Now type the following in the command window to get the result:

    >> fis = readfis('one');
    >> out = evalfis(2437.4, fis)
    out = 2.376

Sugeno-Type Fuzzy Inference
- Also known as the Takagi-Sugeno-Kang method of fuzzy inference.
- Similar to the Mamdani method in many respects: fuzzifying the inputs and applying the fuzzy operator are exactly the same.
- The main difference between Mamdani and Sugeno is that the Sugeno output membership functions are either linear or constant.
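To see that difference in code, the Mamdani sketch above mainly needs its output MFs swapped for constants; a minimal sketch with assumed values (sugfis replaces mamfis):

    % Sugeno version: output MFs are constants instead of fuzzy sets
    fis = sugfis('Name', 'motorSug');
    fis = addInput(fis, [0 100], 'Name', 'speed');
    fis = addMF(fis, 'speed', 'trimf', [0 0 50], 'Name', 'tooSlow');
    fis = addMF(fis, 'speed', 'trimf', [50 100 100], 'Name', 'tooFast');
    fis = addOutput(fis, [-10 10], 'Name', 'voltage');
    fis = addMF(fis, 'voltage', 'constant', 5, 'Name', 'more');
    fis = addMF(fis, 'voltage', 'constant', -5, 'Name', 'less');
    fis = addRule(fis, [ ...
        "speed==tooSlow => voltage=more (1)", ...
        "speed==tooFast => voltage=less (1)"]);
    out = evalfis(fis, 30)     % weighted average of the rule constants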

FIS Editor: Sugeno Inference System (screenshot)

Add Input/Output Variables (screenshot)

Define Input/Output Variables (screenshot)

Add Input MF (screenshot)

Define Input MF (screenshot)

Add Output MF (screenshot)

Define Output MF (screenshot)

Add Rules (screenshot)

Define Rule Base (screenshot)

View Rules (screenshot)

Rule Viewer (screenshot)

Surface Viewer (screenshot)

Advantages of the Sugeno Method
- Sugeno is a more compact and computationally efficient representation than a Mamdani system.
- It works well with linear techniques (e.g., PID control).
- It works well with optimization and adaptive techniques.
- It is computationally efficient.
- It has guaranteed continuity of the output surface.
- It is well suited to mathematical analysis.

Advantages of the Mamdani Method
- It is intuitive.
- It is well suited to human input.
- It has widespread acceptance.

Support Vector Machine & Its Applications

Overview
- Introduction to Support Vector Machines (SVM)
- Properties of SVM
- Applications
  - Gene Expression Data Classification
  - Text Categorization (if time permits)
- Discussion

Support Vector Machine (SVM)
- The fundamental principle of classification using the SVM is to separate the two categories of patterns.
- Map data x into a higher-dimensional feature space via a nonlinear mapping.
- Linear classification (regression) in the high-dimensional space is equivalent to nonlinear classification (regression) in the low-dimensional space.

Linear Classifiers
f(x, w, b) = sign(w·x + b), splitting the plane into w·x + b > 0 and w·x + b < 0; points are labeled +1 or −1.
How would you classify this data?



Linear Classifiers
f(x, w, b) = sign(w·x + b)
Any of these would be fine… but which is best?

Linear Classifiers
f(x, w, b) = sign(w·x + b)
How would you classify this data? (One choice shown misclassifies a point to the +1 class.)

Classifier Margin
f(x, w, b) = sign(w·x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin
f(x, w, b) = sign(w·x + b)
1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only support vectors are important; the other training examples are ignorable.
3. Empirically it works very, very well.
- Support vectors are those datapoints that the margin pushes up against.
- The linear classifier with the maximum margin is the maximum margin linear classifier; this is the simplest kind of SVM (called an LSVM).

Linear SVM Mathematically
What we know:
- w · x+ + b = +1
- w · x− + b = −1
- w · (x+ − x−) = 2

M = margin width = ((x+ − x−) · w) / ||w|| = 2 / ||w||

Linear SVM Mathematically
Goal:
1) Correctly classify all training data:
   w·xi + b ≥ +1 if yi = +1
   w·xi + b ≤ −1 if yi = −1
   i.e., yi(w·xi + b) ≥ 1 for all i
2) Maximize the margin M = 2/||w||, which is the same as minimizing ½ wTw.
We can formulate a quadratic optimization problem and solve for w and b:
   Minimize Φ(w) = ½ wTw subject to yi(w·xi + b) ≥ 1 ∀i
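This quadratic program can be handed directly to a generic QP solver. A minimal sketch using quadprog (Optimization Toolbox) on a hand-made separable toy set (the data are an assumption for illustration), stacking the unknowns as z = [w; b]:

    % Hard-margin linear SVM: minimize (1/2)w'w  s.t.  y_i(w'x_i + b) >= 1
    X = [1 1; 2 2.5; 0.5 2; 4 4; 5 3.5; 4.5 5];   % 6 toy points in 2-D
    y = [-1; -1; -1; 1; 1; 1];
    [n, d] = size(X);
    H = blkdiag(eye(d), 0);          % penalize ||w||^2 only, not b
    f = zeros(d + 1, 1);
    A = -(y .* [X, ones(n, 1)]);     % constraints rewritten as A*z <= -1
    z = quadprog(H, f, A, -ones(n, 1));
    w = z(1:d);  b = z(end);
    M = 2 / norm(w)                  % the margin width from the slide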

Solving the Optimization Problem
- Find w and b such that Φ(w) = ½ wTw is minimized and for all {(xi, yi)}: yi(wTxi + b) ≥ 1.
- We need to optimize a quadratic function subject to linear constraints.
- Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them.
- The solution involves constructing a dual problem where a Lagrange multiplier αi is associated with every constraint in the primal problem:

  Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαjyiyj xiTxj is maximized and (1) Σαiyi = 0, (2) αi ≥ 0 for all αi

The Optimization Problem Solution
- The solution has the form:

  w = Σ αiyi xi
  b = yk − wTxk for any xk such that αk ≠ 0

- Each non-zero αi indicates that the corresponding xi is a support vector.
- Then the classifying function will have the form:

  f(x) = Σ αiyi xiTx + b

- Notice that it relies on an inner product between the test point x and the support vectors xi; we will return to this later.
- Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all pairs of training points.

Dataset With Noise
- Hard margin: so far we require all data points to be classified correctly; no training error.
- What if the training set is noisy?
- Solution 1: use very powerful kernels → OVERFITTING!

Soft Margin Classification
- Slack variables ξi can be added to allow misclassification of difficult or noisy examples.
- What should our quadratic optimization criterion be? Minimize

  ½ w·w + C Σ_{k=1..R} ξk

Hard Margin v.s. Soft Margin
- The old formulation: Find w and b such that Φ(w) = ½ wTw is minimized and for all {(xi, yi)}: yi(wTxi + b) ≥ 1
- The new formulation, incorporating slack variables: Find w and b such that Φ(w) = ½ wTw + CΣξi is minimized and for all {(xi, yi)}: yi(wTxi + b) ≥ 1 − ξi and ξi ≥ 0 for all i
- Parameter C can be viewed as a way to control overfitting.
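In MATLAB's fitcsvm (Statistics and Machine Learning Toolbox), this C appears as the 'BoxConstraint' option. A small sketch on assumed overlapping blobs, showing the trade-off between margin width and slack:

    % Soft margin: BoxConstraint plays the role of the parameter C
    rng(0);
    X = [randn(50, 2) - 1; randn(50, 2) + 1];        % two overlapping blobs
    y = [-ones(50, 1); ones(50, 1)];
    mdlSoft = fitcsvm(X, y, 'BoxConstraint', 0.1);   % cheap slack, wide margin
    mdlHard = fitcsvm(X, y, 'BoxConstraint', 100);   % violations costly
    fprintf('support vectors: %d (C=0.1) vs %d (C=100)\n', ...
            size(mdlSoft.SupportVectors, 1), size(mdlHard.SupportVectors, 1));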

Linear SVMs: Overview
- The classifier is a separating hyperplane.
- The most "important" training points are support vectors; they define the hyperplane.
- Quadratic optimization algorithms can identify which training points xi are support vectors, i.e., have non-zero Lagrange multipliers αi.
- Both in the dual formulation of the problem and in the solution, training points appear only inside dot products:

  Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαjyiyj xiTxj is maximized and (1) Σαiyi = 0, (2) 0 ≤ αi ≤ C for all αi

  f(x) = Σ αiyi xiTx + b

Non-linear SVMs
- Datasets that are linearly separable with some noise work out great (figure: 1-D data on the x axis).
- But what are we going to do if the dataset is just too hard?
- How about mapping the data to a higher-dimensional space, e.g. x → (x, x²)?

Non-linear SVMs: Feature Spaces
- General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)

The "Kernel Trick"
- The linear classifier relies on the dot product between vectors: K(xi, xj) = xiTxj.
- If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(xi, xj) = φ(xi)Tφ(xj).
- A kernel function is some function that corresponds to an inner product in some expanded feature space.
- Example: 2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiTxj)². We need to show that K(xi, xj) = φ(xi)Tφ(xj):

  K(xi, xj) = (1 + xiTxj)² = 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
  = [1, xi1², √2 xi1xi2, xi2², √2 xi1, √2 xi2]T [1, xj1², √2 xj1xj2, xj2², √2 xj1, √2 xj2]
  = φ(xi)Tφ(xj), where φ(x) = [1, x1², √2 x1x2, x2², √2 x1, √2 x2]
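This expansion can be spot-checked numerically; a short sketch with arbitrary test vectors:

    % Verify (1 + xi'xj)^2 == phi(xi)'phi(xj) for the slide's phi
    phi = @(x) [1, x(1)^2, sqrt(2)*x(1)*x(2), x(2)^2, ...
                sqrt(2)*x(1), sqrt(2)*x(2)];
    xi = [0.3; -1.2];  xj = [2.0; 0.7];
    Kdirect  = (1 + xi.'*xj)^2;        % kernel in the input space
    Kfeature = phi(xi) * phi(xj).';    % inner product in feature space
    abs(Kdirect - Kfeature)            % zero up to round-off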

What Functions are Kernels?
- For some functions K(xi, xj), checking that K(xi, xj) = φ(xi)Tφ(xj) can be cumbersome.
- Mercer's theorem: every semi-positive definite symmetric function is a kernel.
- Semi-positive definite symmetric functions correspond to a semi-positive definite symmetric Gram matrix:

  K = | K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xN) |
      | K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xN) |
      | …                                         |
      | K(xN,x1)  K(xN,x2)  K(xN,x3)  …  K(xN,xN) |

Examples of Kernel Functions
- Linear: K(xi, xj) = xiTxj
- Polynomial of power p: K(xi, xj) = (1 + xiTxj)^p
- Gaussian (radial-basis function network): K(xi, xj) = exp(−||xi − xj||² / (2σ²))
- Sigmoid: K(xi, xj) = tanh(β0 xiTxj + β1)
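For quick experiments, these four kernels translate directly into anonymous functions (x and z are column vectors; the parameter names are only illustrative):

    linK  = @(x, z) x.'*z;                                     % linear
    polyK = @(x, z, p) (1 + x.'*z)^p;                          % polynomial
    rbfK  = @(x, z, sigma) exp(-norm(x - z)^2 / (2*sigma^2));  % Gaussian
    sigK  = @(x, z, b0, b1) tanh(b0*(x.'*z) + b1);             % sigmoid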

Non-linear SVMs Mathematically
- Dual problem formulation:

  Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαjyiyj K(xi, xj) is maximized and (1) Σαiyi = 0, (2) αi ≥ 0 for all αi

- The solution is: f(x) = Σ αiyi K(xi, x) + b
- Optimization techniques for finding the αi's remain the same!
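In practice the kernelized dual is solved by library code. A minimal sketch using fitcsvm with a Gaussian (RBF) kernel on an assumed radially separable toy set:

    rng(1);
    X = randn(200, 2);
    y = 2*(sum(X.^2, 2) > 1.5) - 1;      % +1 outside a circle: not linearly separable
    mdl = fitcsvm(X, y, 'KernelFunction', 'rbf', 'KernelScale', 'auto');
    err = mean(predict(mdl, X) ~= y)     % training error
    nSV = size(mdl.SupportVectors, 1)    % the points with nonzero alpha_i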

Nonlinear SVM: Overview
- The SVM locates a separating hyperplane in the feature space and classifies points in that space.
- It does not need to represent the space explicitly, simply by defining a kernel function.
- The kernel function plays the role of the dot product in the feature space.

Properties of SVM
- Flexibility in choosing a similarity function
- Sparseness of solution when dealing with large data sets: only support vectors are used to specify the separating hyperplane
- Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space
- Overfitting can be controlled by the soft margin approach
- Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
- Feature selection

SVM Applications
- SVM has been used successfully in many real-world problems:
  - text (and hypertext) categorization
  - image classification
  - bioinformatics (protein classification, cancer classification)
  - hand-written character recognition

Weakness of SVM
- It is sensitive to noise: a relatively small number of mislabeled examples can dramatically decrease the performance.
- It only considers two classes. How to do multi-class classification with SVM? Answer:
  1) With output arity m, learn m SVMs: SVM 1 learns "Output==1" vs "Output != 1", SVM 2 learns "Output==2" vs "Output != 2", …, SVM m learns "Output==m" vs "Output != m".
  2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region (a sketch follows below).
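A sketch of this one-vs-rest scheme with fitcsvm on assumed three-class toy data (MATLAB's fitcecoc automates the same construction):

    rng(2);
    X = [randn(30, 2); randn(30, 2) + 3; randn(30, 2) + [6 0]];
    Y = [ones(30, 1); 2*ones(30, 1); 3*ones(30, 1)];
    Xnew = [0 0; 3 3; 6 0];
    classes = unique(Y);  m = numel(classes);
    models = cell(m, 1);
    for k = 1:m
        models{k} = fitcsvm(X, double(Y == classes(k)));  % class k vs rest
    end
    scores = zeros(size(Xnew, 1), m);
    for k = 1:m
        [~, s] = predict(models{k}, Xnew);
        scores(:, k) = s(:, 2);          % signed score toward "positive"
    end
    [~, idx] = max(scores, [], 2);       % furthest into the positive region
    Ypred = classes(idx)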

Some Issues
- Choice of kernel:
  - a Gaussian or polynomial kernel is the default; if ineffective, more elaborate kernels are needed
  - domain experts can give assistance in formulating appropriate similarity measures
- Choice of kernel parameters:
  - e.g. σ in the Gaussian kernel
  - σ is the distance between the closest points with different classifications
  - a lengthy series of experiments in which various parameters are tested
- Optimization criterion: hard margin vs. soft margin

Wind Power Forecasting (WPF)
- WPF is a technique which provides the information of how much wind power can be expected at a given point of time.
- Due to the increasing penetration of wind power into the electric power grid, a good short-term forecast will ensure grid stability and a favorable trading performance on the electricity markets.

ε-SVM
- The objective function of the ε-SVM is based on an ε-insensitive loss function. The formula for the ε-SVM (the standard ε-insensitive support vector regression) is:

    minimize   ½||w||² + C Σ (ξi + ξi*)
    subject to yi − (w·xi + b) ≤ ε + ξi
               (w·xi + b) − yi ≤ ε + ξi*
               ξi, ξi* ≥ 0
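In MATLAB, fitrsvm implements this ε-insensitive regression; a minimal sketch in which the signal, kernel, and ε are assumptions for illustration:

    t = (0:0.1:10)';
    y = sin(t) + 0.1*randn(size(t));           % noisy stand-in signal
    mdl = fitrsvm(t, y, 'KernelFunction', 'gaussian', ...
                  'Epsilon', 0.05, 'Standardize', true);
    yfit = predict(mdl, t);                    % fitted curve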

Structure of SVM

Data Resolution
- The resolution of the dataset is 10 minutes.
- Each data point represents the average wind speed and power within one hour.
- The data values between two adjacent samples change linearly, that is:

    x_j(t) = x_i + ((x_{i+1} − x_i) / dt_i) · t,   0 ≤ t ≤ dt_i

  where dt_i is the time interval between x_i and x_{i+1}.

Data Value
- The average value of the data within T_s can be calculated as:

    x̄_j = (1/T_s) ∫_{t_i}^{t_i + T_s} x_j(t) dt

- where T_s = 60 minutes is used in the very short-term forecasting (less than 6 hours) and T_s = 2 hours is used for short-term forecasting.
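The two steps above (linear interpolation between the 10-minute samples, then averaging over T_s) can be sketched as follows, with made-up sample values:

    tRaw = (0:10:120)';                           % minutes, 10-min resolution
    xRaw = [5 6 7 6 5 4 5 6 7 8 7 6 5]';          % wind-speed samples (assumed)
    tFine = (0:1:120)';
    xFine = interp1(tRaw, xRaw, tFine, 'linear'); % x_j(t) between samples
    Ts = 60;                                      % very short-term window
    xBar = arrayfun(@(t0) mean(xFine(tFine >= t0 & tFine < t0 + Ts)), ...
                    (0:Ts:60)')                   % averages over [t_i, t_i + Ts)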

Fixed-Step Prediction Scheme
- Prediction horizon of h steps: fixed-step forecasting means only the value of the h-th next sample is predicted using the historical data:

    ŷ(t + h) = f(y_t, y_{t−1}, …, y_{t−d})

  where f is a nonlinear function generated by the SVM.
- y_{t+h} is predicted with the data before y_t (the red blocks in the figure), and y_{t+h−1} is predicted with the data before y_{t−1} (the green blocks).
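A sketch of this scheme: build training rows (y_t, …, y_{t−d}) with targets y_{t+h}, then fit an SVM regression; the series, d, and h below are assumptions for illustration:

    y = sin(0.1*(1:500))' + 0.05*randn(500, 1);  % stand-in wind series
    d = 5;  h = 3;                               % lags and horizon (assumed)
    n = numel(y) - d - h;                        % usable training rows
    X = zeros(n, d + 1);
    for k = 0:d
        X(:, k+1) = y(d+1-k : d+n-k);            % columns y_t, ..., y_{t-d}
    end
    T = y(d+h+1 : d+h+n);                        % targets y_{t+h}
    mdl  = fitrsvm(X, T, 'KernelFunction', 'gaussian', 'Standardize', true);
    Tfit = predict(mdl, X);                      % in-sample h-step forecasts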

Wind Speed Normalization (figure)

Autocorrelations of the Wind Speed Samples (figure)

SVM Model and the RBF Model (figure)

1h-Ahead Wind Power Prediction Using the SVM Model (figure)

CONCLUSIONS
- The SVM has been successfully applied to the problems of pattern classification, particularly the classification of two different categories of patterns.
- The SVM model is more suitable for very short-term and short-term WPF.
- It provides a powerful tool for enhancing the WPF accuracy.