Introduction to Fuzzy Logic Control


Outline
- General Definition
- Applications
- Operations
- Rules
- Fuzzy Logic Toolbox
- FIS Editor
- Tipping Problem: Fuzzy Approach
- Defining Inputs & Outputs
- Defining MFs
- Defining Fuzzy Rules

General Definition
Fuzzy logic was introduced in 1965 by Lotfi Zadeh at Berkeley.
- A superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth.
- The central notion of fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute falseness and 1.0 representing absolute truth.
- It deals with real-world vagueness.

Applications
- Expert systems
- Control units
- Bullet train between Tokyo and Osaka
- Video cameras
- Automatic transmissions

Operations
Given fuzzy truth values A and B, the basic operations are typically defined pointwise:
- Conjunction: A ∧ B = min(A, B)
- Disjunction: A ∨ B = max(A, B)
- Negation: ¬A = 1 − A

Controller Structure
- Fuzzification: scales and maps input variables to fuzzy sets
- Inference mechanism: approximate reasoning; deduces the control action
- Defuzzification: converts fuzzy output values to control signals
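For the defuzzification step, one common choice (named here for concreteness; the slide does not specify a method) is the centroid, which takes the crisp control signal u as the center of gravity of the aggregated output membership function μ(x):

u = \frac{\int \mu(x)\, x\, dx}{\int \mu(x)\, dx}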

MATLAB Fuzzy Logic Toolbox
- The MATLAB Fuzzy Logic Toolbox facilitates the development of fuzzy-logic systems using:
  - graphical user interface (GUI) tools
  - command-line functionality
- The tool can be used for building:
  - fuzzy expert systems
  - adaptive neuro-fuzzy inference systems (ANFIS)

Graphical User Interface (GUI) Tools
There are five primary GUI tools for building, editing, and observing fuzzy inference systems in the Fuzzy Logic Toolbox:
- Fuzzy Inference System (FIS) Editor
- Membership Function Editor
- Rule Editor
- Rule Viewer
- Surface Viewer

MATLAB: Fuzzy Logic Toolbox (toolbox screenshots)

Fuzzy Inference System
- Two types of inference system:
  - Mamdani inference method
  - Sugeno inference method
- Mamdani's fuzzy inference method is the most common methodology.

FIS Editor: Mamdani's inference system

Fuzzy Logic Example Using MATLAB
Goal: control the speed of a motor by changing the input voltage. When a set point is defined, if for some reason the motor runs faster, we need to slow it down by reducing the input voltage. If the motor slows below the set point, the input voltage must be increased so that the motor speed reaches the set point.

Input/Output
- Let the input status words be:
  - Too slow
  - Just right
  - Too fast
- Let the output action words be:
  - Less voltage (slow down)
  - No change
  - More voltage (speed up)

FIS Editor: Adding Input / Output

Membership Function Editor

Input Membership Function

Output Membership Function

Membership Functions

Rules
Define the rule base:
1) If the motor is running too slow, then more voltage.
2) If motor speed is about right, then no change.
3) If motor speed is too fast, then less voltage.
A command-line sketch of building this FIS follows.
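As a command-line complement to the GUI steps that follow, the sketch below assembles the same system with the classic Fuzzy Logic Toolbox API (newfis/addvar/addmf/addrule; newer releases prefer mamfis and addInput). The speed and voltage ranges and the membership-function shapes are illustrative assumptions, not values taken from the slides.

fis = newfis('one');                                        % Mamdani FIS by default
fis = addvar(fis, 'input', 'speed', [0 5000]);              % assumed rpm range
fis = addmf(fis, 'input', 1, 'too_slow',   'trapmf', [0 0 1500 2500]);
fis = addmf(fis, 'input', 1, 'just_right', 'trimf',  [1500 2500 3500]);
fis = addmf(fis, 'input', 1, 'too_fast',   'trapmf', [2500 3500 5000 5000]);
fis = addvar(fis, 'output', 'voltage', [0 5]);              % assumed voltage range
fis = addmf(fis, 'output', 1, 'less',      'trimf', [0 0 2.5]);
fis = addmf(fis, 'output', 1, 'no_change', 'trimf', [1.5 2.5 3.5]);
fis = addmf(fis, 'output', 1, 'more',      'trimf', [2.5 5 5]);
% Each rule row is [input MF, output MF, weight, connective (1 = AND)]
fis = addrule(fis, [1 3 1 1;   % too slow   -> more voltage
                    2 2 1 1;   % just right -> no change
                    3 1 1 1]); % too fast   -> less voltage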

Rule Editor: Adding Rules

Rule Base

Rule Viewer

Surface Viewer

- Save the file as "one.fis". Now type the following in the command window to get the result:
>> fis = readfis('one');
>> out = evalfis(2437.4, fis)
out = 2.376

Sugeno-Type Fuzzy Inference
- Also called the Takagi-Sugeno-Kang (TSK) method of fuzzy inference.
- Similar to the Mamdani method in many respects: fuzzifying the inputs and applying the fuzzy operator are exactly the same.
- The main difference between Mamdani and Sugeno is that the Sugeno output membership functions are either linear or constant; a typical first-order Sugeno rule has the form: if x is A and y is B, then z = ax + by + c.

FIS Editor: Sugeno inference system

Add Input/output variables

Define Input/output variables

Add Input MF

Define Input MF

Add output MF

Define output MF

Add rules

Define Rule Base

View rules

Rules viewer

Surface viewer

Advantages of the Sugeno Method
- It is a more compact and computationally efficient representation than a Mamdani system.
- It works well with linear techniques (e.g., PID control).
- It works well with optimization and adaptive techniques.
- It has guaranteed continuity of the output surface.
- It is well suited to mathematical analysis.

Advantages of the Mamdani Method
- It is intuitive.
- It has widespread acceptance.
- It is well suited to human input.

Support Vector Machine & Its Applications

Overview
- Introduction to Support Vector Machines (SVM)
- Properties of SVM
- Applications
  - Gene Expression Data Classification
  - Text Categorization (if time permits)
- Discussion

Support Vector Machine (SVM)
- The fundamental principle of classification using the SVM is to separate two categories of patterns.
- Map the data x into a higher-dimensional feature space via a nonlinear mapping.
- Linear classification (regression) in the high-dimensional space is then equivalent to nonlinear classification (regression) in the low-dimensional space.

Linear Classifiers
- Data points x are labeled +1 or −1; a linear classifier has the form f(x, w, b) = sign(w · x + b), with w · x + b > 0 on one side of the decision boundary and w · x + b < 0 on the other.
- (A sequence of scatter-plot slides asks:) How would you classify this data? Any of these separating lines would be fine... but which is best? A poorly chosen line can leave points misclassified into the +1 class.

w.b) = sign(w x + b) .Classifier Margin x denotes +1 denotes -1 α f yest Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint. f(x.

Maximum Margin
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM).
1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very very well.
Support vectors are those datapoints that the margin pushes up against.

Linear SVM Mathematically
Let x⁺ and x⁻ be points on the two margin hyperplanes and M the margin width. What we know:
- w · x⁺ + b = +1
- w · x⁻ + b = −1
- Subtracting: w · (x⁺ − x⁻) = 2

M = \frac{(x^{+} - x^{-}) \cdot w}{\|w\|} = \frac{2}{\|w\|}

Linear SVM Mathematically
Goal:
1) Correctly classify all training data: w · xi + b ≥ +1 if yi = +1, and w · xi + b ≤ −1 if yi = −1; equivalently, yi(w · xi + b) ≥ 1 for all i.
2) Maximize the margin M = 2/‖w‖, which is the same as minimizing ½ wᵀw.
We can formulate a quadratic optimization problem and solve for w and b:

Minimize \Phi(w) = \tfrac{1}{2} w^{T} w \quad subject to \quad y_i (w \cdot x_i + b) \ge 1 \;\; \forall i

Solving the Optimization Problem
Find w and b such that Φ(w) = ½ wᵀw is minimized and, for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1.
- We need to optimize a quadratic function subject to linear constraints.
- Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them.
- The solution involves constructing a dual problem in which a Lagrange multiplier αi is associated with every constraint in the primal problem:
Find α1…αN such that Q(α) = Σαi − ½ ΣΣ αiαjyiyjxiᵀxj is maximized, subject to
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
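For concreteness, this dual can be handed directly to a quadratic-programming solver. A minimal sketch using MATLAB's quadprog (Optimization Toolbox) follows; the data matrix X (N-by-d) and label vector y (N-by-1, entries ±1) are assumed inputs, and the hard-margin form is used, so it presumes separable data.

N = size(X, 1);
H = (y * y') .* (X * X');        % quadratic term of the dual
f = -ones(N, 1);                 % maximizing Q(a) == minimizing 1/2 a'Ha - sum(a)
Aeq = y';  beq = 0;              % constraint (1): sum(alpha_i * y_i) = 0
lb = zeros(N, 1);  ub = [];      % constraint (2): alpha_i >= 0
alpha = quadprog(H, f, [], [], Aeq, beq, lb, ub);
w = X' * (alpha .* y);           % recover w = sum(alpha_i * y_i * x_i)
sv = find(alpha > 1e-6);         % support vectors have non-zero alpha
b = y(sv(1)) - X(sv(1), :) * w;  % b = y_k - w'x_k for any support vector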

The Optimization Problem Solution
- The solution has the form: w = Σαiyixi and b = yk − wᵀxk for any xk such that αk ≠ 0.
- Each non-zero αi indicates that the corresponding xi is a support vector.
- The classifying function then has the form: f(x) = Σαiyi xiᵀx + b.
- Notice that it relies on an inner product between the test point x and the support vectors xi; we will return to this later.
- Also keep in mind that solving the optimization problem involved computing the inner products xiᵀxj between all pairs of training points.

Dataset With Noise
- Hard margin: so far we have required all data points be classified correctly; no training error is allowed.
- What if the training set is noisy? Solution 1: use very powerful kernels... which leads to OVERFITTING!

Soft Margin Classification
Slack variables ξi can be added to allow misclassification of difficult or noisy examples (the figure labels individual slacks such as ξ2, ξ7, ξ11). What should our quadratic optimization criterion be? Minimize

\tfrac{1}{2}\, w \cdot w + C \sum_{k=1}^{R} \xi_k

Hard Margin vs. Soft Margin
- The old formulation:
Find w and b such that Φ(w) = ½ wᵀw is minimized and, for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1
- The new formulation, incorporating slack variables:
Find w and b such that Φ(w) = ½ wᵀw + CΣξi is minimized and, for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1 − ξi and ξi ≥ 0 for all i
- The parameter C can be viewed as a way to control overfitting.
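As an illustration of C in practice, here is a hedged sketch using fitcsvm from MATLAB's Statistics and Machine Learning Toolbox, where the soft-margin C appears as the 'BoxConstraint' option; X and y are assumed training data and the C values are arbitrary.

for C = [0.01 1 100]
    model = fitcsvm(X, y, 'KernelFunction', 'linear', 'BoxConstraint', C);
    % Small C tolerates slack (wider margin, more training error);
    % large C penalizes slack heavily (narrower margin, risk of overfitting).
    fprintf('C = %g: %d support vectors\n', C, sum(model.IsSupportVector));
end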

Linear SVMs: Overview
- The classifier is a separating hyperplane.
- The most "important" training points are the support vectors; they define the hyperplane.
- Quadratic optimization algorithms can identify which training points xi are support vectors, i.e. have non-zero Lagrangian multipliers αi.
- Both in the dual formulation of the problem and in the solution, training points appear only inside dot products:
Find α1…αN such that Q(α) = Σαi − ½ ΣΣ αiαjyiyjxiᵀxj is maximized, subject to
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
f(x) = Σαiyi xiᵀx + b

Non-linear SVMs
- Datasets that are linearly separable with some noise work out great (a 1-D example: points along the x axis split at 0).
- But what are we going to do if the dataset is just too hard?
- How about... mapping the data to a higher-dimensional space, e.g. x → (x, x²)?

Non-linear SVMs: Feature spaces „ General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x) .

The "Kernel Trick"
- The linear classifier relies on dot products between vectors: K(xi, xj) = xiᵀxj.
- If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(xi, xj) = φ(xi)ᵀφ(xj).
- A kernel function is a function that corresponds to an inner product in some expanded feature space.
- Example: 2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiᵀxj)². We need to show that K(xi, xj) = φ(xi)ᵀφ(xj):

K(x_i, x_j) = (1 + x_i^T x_j)^2
            = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}
            = [1,\; x_{i1}^2,\; \sqrt{2}\, x_{i1} x_{i2},\; x_{i2}^2,\; \sqrt{2}\, x_{i1},\; \sqrt{2}\, x_{i2}]^T \, [1,\; x_{j1}^2,\; \sqrt{2}\, x_{j1} x_{j2},\; x_{j2}^2,\; \sqrt{2}\, x_{j1},\; \sqrt{2}\, x_{j2}]
            = \varphi(x_i)^T \varphi(x_j), \quad where \; \varphi(x) = [1,\; x_1^2,\; \sqrt{2}\, x_1 x_2,\; x_2^2,\; \sqrt{2}\, x_1,\; \sqrt{2}\, x_2]
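The identity can be checked numerically; the sketch below evaluates both sides for two arbitrary example vectors (the values are illustrative):

xi = [0.5; -1.2];  xj = [2.0; 0.3];           % example 2-D vectors
K = (1 + xi' * xj)^2;                         % kernel form
phi = @(x) [1; x(1)^2; sqrt(2)*x(1)*x(2); x(2)^2; sqrt(2)*x(1); sqrt(2)*x(2)];
K_explicit = phi(xi)' * phi(xj);              % explicit feature-space dot product
fprintf('kernel: %.6f explicit: %.6f\n', K, K_explicit);  % both print the same value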

What Functions are Kernels?
- For some functions K(xi, xj), checking that K(xi, xj) = φ(xi)ᵀφ(xj) can be cumbersome.
- Mercer's theorem: every semi-positive definite symmetric function is a kernel.
- Semi-positive definite symmetric functions correspond to a semi-positive definite symmetric Gram matrix:

K = \begin{pmatrix}
K(x_1,x_1) & K(x_1,x_2) & K(x_1,x_3) & \cdots & K(x_1,x_N) \\
K(x_2,x_1) & K(x_2,x_2) & K(x_2,x_3) & \cdots & K(x_2,x_N) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
K(x_N,x_1) & K(x_N,x_2) & K(x_N,x_3) & \cdots & K(x_N,x_N)
\end{pmatrix}

Examples of Kernel Functions
- Linear: K(xi, xj) = xiᵀxj
- Polynomial of power p: K(xi, xj) = (1 + xiᵀxj)^p
- Gaussian (radial-basis function network): K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)
- Sigmoid: K(xi, xj) = tanh(β0 xiᵀxj + β1)
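These kernels are one-liners in practice; here is a sketch as MATLAB anonymous functions (the parameter names p, sigma, b0, b1 are free choices, not from the slides):

linK  = @(xi, xj) xi' * xj;                                    % linear
polyK = @(xi, xj, p) (1 + xi' * xj)^p;                         % polynomial of power p
rbfK  = @(xi, xj, sigma) exp(-norm(xi - xj)^2 / (2*sigma^2));  % Gaussian / RBF
sigK  = @(xi, xj, b0, b1) tanh(b0 * (xi' * xj) + b1);          % sigmoid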

Non-linear SVMs Mathematically
- Dual problem formulation:
Find α1…αN such that Q(α) = Σαi − ½ ΣΣ αiαjyiyjK(xi, xj) is maximized, subject to
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
- The solution is: f(x) = Σαiyi K(xi, x) + b
- Optimization techniques for finding the αi's remain the same!

Nonlinear SVM - Overview
- SVM locates a separating hyperplane in the feature space and classifies points in that space.
- It does not need to represent the space explicitly; it simply defines a kernel function.
- The kernel function plays the role of the dot product in the feature space.

Properties of SVM
- Flexibility in choosing a similarity function
- Sparseness of solution when dealing with large data sets: only support vectors are used to specify the separating hyperplane
- Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space
- Overfitting can be controlled by the soft margin approach
- Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
- Feature selection

SVM Applications
SVM has been used successfully in many real-world problems:
- text (and hypertext) categorization
- image classification
- bioinformatics (protein classification, cancer classification)
- hand-written character recognition

Weakness of SVM
- It is sensitive to noise: a relatively small number of mislabeled examples can dramatically decrease performance.
- It only considers two classes. How do we do multi-class classification with SVM? Answer:
1) With output arity m, learn m SVMs:
   - SVM 1 learns "Output==1" vs "Output != 1"
   - SVM 2 learns "Output==2" vs "Output != 2"
   - ...
   - SVM m learns "Output==m" vs "Output != m"
2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region (see the sketch below).
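A minimal sketch of this one-vs-rest recipe, assuming training data X (N-by-d), integer labels y in 1..m, and a new input xnew (1-by-d); fitcsvm and predict are from MATLAB's Statistics and Machine Learning Toolbox:

m = numel(unique(y));
models = cell(m, 1);
for k = 1:m
    % SVM k learns "Output==k" vs "Output != k"
    models{k} = fitcsvm(X, double(y == k), 'KernelFunction', 'rbf');
end
scores = zeros(m, 1);
for k = 1:m
    [~, s] = predict(models{k}, xnew);  % s(2) is the positive-class score
    scores(k) = s(2);
end
[~, yhat] = max(scores);                % furthest into the positive region wins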

Some Issues
- Choice of kernel:
  - A Gaussian or polynomial kernel is the default; if ineffective, more elaborate kernels are needed.
  - Domain experts can give assistance in formulating appropriate similarity measures.
- Choice of kernel parameters:
  - e.g. σ in the Gaussian kernel: σ is the distance between the closest points with different classifications.
  - Typically a lengthy series of experiments in which various parameters are tested.
- Optimization criterion: hard margin vs. soft margin.

Wind Power Forecasting (WPF)
- WPF is a technique which provides information on how much wind power can be expected at a given point of time.
- Due to the increasing penetration of wind power into the electric power grid, good short-term forecasting is needed to ensure grid stability and a favorable trading performance on the electricity markets.

ε-SVM
- The objective function of the ε-SVM is based on an ε-insensitive loss function: training errors smaller than ε incur no penalty. The formula for the ε-SVM is given as follows.
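A standard statement of the ε-SVR primal, supplied here as a reference formulation in the usual notation (it may differ cosmetically from the original slide's version), is:

\min_{w,\, b,\, \xi,\, \xi^{*}} \;\; \tfrac{1}{2}\|w\|^{2} + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^{*}\right)

subject to

y_i - w \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \qquad
w \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \qquad
\xi_i,\ \xi_i^{*} \ge 0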

Structure of SVM

Data Resolution
- The resolution of the dataset is 10 minutes.
- Each data point represents the average wind speed and power within one hour.
- The data values between two adjacent samples are assumed to change linearly, that is:

x_j(t) = x_i + \frac{x_{i+1} - x_i}{dt_i}\, t, \qquad 0 \le t \le dt_i

- where dt_i is the time interval between x_i and x_{i+1}.

Data Value
- The average value of the data within T_s can be calculated as

\hat{x}_j(t) = \frac{1}{T_s} \int_{t_i}^{t_i + T_s} x_j(t)\, dt

- where T_s = 60 minutes is used in very short-term forecasting (less than 6 hours) and T_s = 2 hours is used for short-term forecasting.
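A small sketch of this preprocessing, assuming a vector x of 10-minute samples taken at times t (in minutes); interp1 performs the linear interpolation between adjacent samples and the window mean approximates the integral:

t  = 0:10:120;                      % 10-minute resolution timestamps (illustrative)
x  = rand(size(t));                 % placeholder wind-speed samples
Ts = 60;                            % 60 min for very short-term forecasting
tq = t(1):1:t(end);                 % 1-minute grid
xq = interp1(t, x, tq, 'linear');   % x_j(t) between adjacent samples
ti = t(1);                          % start of the averaging window
xbar = mean(xq(tq >= ti & tq < ti + Ts));  % approximates (1/Ts) * integral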

Fixed-Step Prediction Scheme
- Prediction horizon of h steps: fixed-step forecasting means only the value of the h-th next sample is predicted by using the historical data:

ŷ(t + h) = f(y_t, y_{t-1}, …, y_{t-d})

where f is a nonlinear function generated by the SVM.
- y_{t+h} is predicted with the data before y_t (the red blocks in the figure); y_{t+h-1} is predicted with the data before y_{t-1} (the green blocks).
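A hedged sketch of this fixed-step scheme with ε-SVR, assuming a historical series y (column vector) and illustrative lag depth d and horizon h; fitrsvm is MATLAB's SVM regression trainer (Statistics and Machine Learning Toolbox):

d = 6;  h = 3;                                 % illustrative lag depth and horizon
N = numel(y) - d - h + 1;
X = zeros(N, d);  T = zeros(N, 1);
for i = 1:N
    X(i, :) = y(i : i + d - 1);                % window of d samples ending at "time t"
    T(i)    = y(i + d + h - 1);                % target y(t+h)
end
mdl  = fitrsvm(X, T, 'KernelFunction', 'rbf', 'Epsilon', 0.05);
yhat = predict(mdl, y(end - d + 1 : end)');    % forecast h steps past the last sample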

Wind speed normalization .

Autocorrelations of the wind speed samples .

SVM model and the RBF model .

1h-ahead wind power prediction using the SVM model

CONCLUSIONS
- The SVM has been successfully applied to problems of pattern classification, particularly the classification of two different categories of patterns.
- The SVM model is more suitable for very short-term and short-term WPF.
- It provides a powerful tool for enhancing WPF accuracy.
