Introduction to Fuzzy Logic Control


Outline
- General Definition
- Applications
- Operations
- Rules
- Fuzzy Logic Toolbox
- FIS Editor
- Tipping Problem: Fuzzy Approach
- Defining Inputs & Outputs
- Defining MFs
- Defining Fuzzy Rules

General Definition
Fuzzy Logic - 1965, Lotfi Zadeh, Berkeley
- A superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth.
- The central notion of fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute falseness and 1.0 representing absolute truth.
- Deals with real-world vagueness.

Applications
- Expert systems
- Control units
- Bullet train between Tokyo and Osaka
- Video cameras
- Automatic transmissions

Operations
- Fuzzy AND: A ∧ B = min(A, B)
- Fuzzy OR: A ∨ B = max(A, B)
- Fuzzy NOT: ¬A = 1 − A
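As a hedged illustration, the standard min/max/complement operators (also the Fuzzy Logic Toolbox defaults) can be evaluated directly on membership values; the two membership degrees below are made-up examples:

  % Hedged sketch: standard fuzzy operations on membership values in [0, 1].
  A = 0.7;  B = 0.2;        % illustrative membership degrees (assumed)
  andAB = min(A, B);        % A AND B  -> 0.2
  orAB  = max(A, B);        % A OR  B  -> 0.7
  notA  = 1 - A;            % NOT A    -> 0.3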

Controller Structure
- Fuzzification
  - Scales and maps input variables to fuzzy sets
- Inference Mechanism
  - Approximate reasoning
  - Deduces the control action
- Defuzzification
  - Converts fuzzy output values to control signals
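A minimal sketch of the fuzzification and defuzzification ends of this pipeline, using the toolbox functions trimf and defuzz; the universe of discourse and the membership-function parameters are illustrative assumptions, not values from the slides:

  % Hedged sketch: fuzzify a universe of discourse with a triangular MF,
  % then defuzzify an output fuzzy set with the centroid method.
  x   = 0:0.1:10;                    % universe of discourse (assumed range)
  mf  = trimf(x, [2 5 8]);           % triangular membership function
  out = defuzz(x, mf, 'centroid');   % crisp value; 5 for this symmetric MF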

MATLAB Fuzzy Logic Toolbox
- The MATLAB Fuzzy Logic Toolbox facilitates the development of fuzzy-logic systems using:
  - graphical user interface (GUI) tools
  - command line functionality
- The tool can be used for building:
  - Fuzzy Expert Systems
  - Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

Graphical User Interface (GUI) Tools
- There are five primary GUI tools for building, editing, and observing fuzzy inference systems in the Fuzzy Logic Toolbox:
  - Fuzzy Inference System (FIS) Editor
  - Membership Function Editor
  - Rule Editor
  - Rule Viewer
  - Surface Viewer

MATLAB: Fuzzy Logic Toolbox

MATLAB: Fuzzy Logic Toolbox

Fuzzy Inference System
- Two types of inference system:
  - Mamdani inference method
  - Sugeno inference method
- *Mamdani's fuzzy inference method is the most common methodology.

FIS Editor: Mamdani's inference system

Fuzzy Logic Examples using MATLAB
- Goal: control the speed of a motor by changing the input voltage.
- When a set point is defined, if for some reason the motor runs faster, we need to slow it down by reducing the input voltage. If the motor slows below the set point, the input voltage must be increased so that the motor speed reaches the set point.

Input/Output
- Let the input status words be:
  - Too slow
  - Just right
  - Too fast
- Let the output action words be:
  - Less voltage (Slow down)
  - No change
  - More voltage (Speed up)
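A hedged command-line sketch of setting up this controller with the classic Fuzzy Logic Toolbox functions newfis, addvar, and addmf (newer toolbox releases provide mamfis/addInput/addMF equivalents); the variable names, ranges, and membership-function parameters are illustrative assumptions, not values from the slides:

  % Hedged sketch: building the motor-speed FIS from the command line.
  fis = newfis('motor');                                        % Mamdani FIS
  fis = addvar(fis, 'input',  'speed_error', [-100 100]);       % assumed range
  fis = addmf(fis, 'input',  1, 'TooSlow',   'trimf', [-100 -100 0]);
  fis = addmf(fis, 'input',  1, 'JustRight', 'trimf', [-50 0 50]);
  fis = addmf(fis, 'input',  1, 'TooFast',   'trimf', [0 100 100]);
  fis = addvar(fis, 'output', 'voltage_change', [-10 10]);      % assumed range
  fis = addmf(fis, 'output', 1, 'LessVoltage', 'trimf', [-10 -10 0]);
  fis = addmf(fis, 'output', 1, 'NoChange',    'trimf', [-5 0 5]);
  fis = addmf(fis, 'output', 1, 'MoreVoltage', 'trimf', [0 10 10]);

The same structure can of course be built interactively in the FIS Editor, as the following slides show.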

FIS Editor: Adding Input / Output

FIS Editor: Adding Input / Output

Membership Function Editor

Input Membership Function

Output Membership Function

Membership Functions

Rules
- Define the rule base:
  1) If the motor is running too slow, then more voltage.
  2) If the motor speed is about right, then no change.
  3) If the motor speed is too fast, then less voltage.
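A hedged sketch of adding these three rules from the command line with addrule, assuming the membership-function ordering used in the earlier construction sketch:

  % Rule matrix columns (classic Fuzzy Logic Toolbox format):
  % [input-MF index, output-MF index, rule weight, connective (1 = AND)].
  ruleList = [1 3 1 1;   % Too slow   -> More voltage
              2 2 1 1;   % Just right -> No change
              3 1 1 1];  % Too fast   -> Less voltage
  fis = addrule(fis, ruleList);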

Membership Function Editor: Adding Rules

Rule Base

Rule Viewer

Surface Viewer

- Save the file as “one.fis”.
- Now type in the command window to get the result:
  >> fis = readfis('one');
  >> out = evalfis(2437.4, fis)
  out = 2.376

Sugeno-Type Fuzzy Inference
- Takagi-Sugeno-Kang method of fuzzy inference.
- Similar to the Mamdani method in many respects: fuzzifying the inputs and applying the fuzzy operator are exactly the same.
- The main difference between Mamdani and Sugeno is that the Sugeno output membership functions are either linear or constant.

FIS Editor: Sugeno inference system

Add Input/output variables

Define Input/output variables

Add Input MF

Define Input MF

Add output MF

Define output MF

Add rules

Define Rule Base

View rules

Rules viewer

Surface viewer

Advantages of the Sugeno Method
- It is computationally efficient.
- It works well with linear techniques (e.g., PID control).
- It works well with optimization and adaptive techniques.
- It has guaranteed continuity of the output surface.
- It is well suited to mathematical analysis.
- Sugeno is a more compact and computationally efficient representation than a Mamdani system.

Advantages of the Mamdani Method
- It is intuitive.
- It has widespread acceptance.
- It is well suited to human input.

Support Vector Machine & Its Applications

Overview
- Introduction to Support Vector Machines (SVM)
- Properties of SVM
- Applications
  - Gene Expression Data Classification
  - Text Categorization (if time permits)
- Discussion

Support Vector Machine (SVM)
- The fundamental principle of classification using the SVM is to separate the two categories of patterns.
- Map data x into a higher-dimensional feature space via a nonlinear mapping.
- Linear classification (regression) in the high-dimensional space is equivalent to nonlinear classification (regression) in the low-dimensional space.

Linear Classifiers
- f(x, w, b) = sign(w·x + b)
- (Figure: points labeled +1 and -1; regions where w·x + b > 0 and w·x + b < 0.)
- How would you classify this data?

Linear Classifiers
- f(x, w, b) = sign(w·x + b)
- How would you classify this data?

Linear Classifiers
- f(x, w, b) = sign(w·x + b)
- How would you classify this data?

Linear Classifiers
- f(x, w, b) = sign(w·x + b)
- Any of these would be fine... but which is best?

Linear Classifiers
- f(x, w, b) = sign(w·x + b)
- How would you classify this data? (Here a point is misclassified into the +1 class.)

Classifier Margin
- f(x, w, b) = sign(w·x + b)
- Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin
- f(x, w, b) = sign(w·x + b)
- The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
- This is the simplest kind of SVM (called an LSVM, Linear SVM).
  1. Maximizing the margin is good according to intuition and PAC theory.
  2. It implies that only support vectors are important; the other training examples are ignorable.
  3. Empirically it works very, very well.
- Support vectors are those datapoints that the margin pushes up against.

Linear SVM Mathematically
- M = margin width, between a point x+ on the plus-plane and a point x- on the minus-plane.
- What we know:
  - w · x+ + b = +1
  - w · x- + b = -1
  - w · (x+ - x-) = 2
- Therefore M = (x+ - x-) · w / ‖w‖ = 2 / ‖w‖

Linear SVM Mathematically
- Goal:
  1) Correctly classify all training data:
     w·xi + b ≥ +1 if yi = +1
     w·xi + b ≤ -1 if yi = -1
     i.e. yi(w·xi + b) ≥ 1 for all i
  2) Maximize the margin M = 2/‖w‖, which is the same as minimizing ½ wᵀw.
- We can formulate a quadratic optimization problem and solve for w and b:
  Minimize Φ(w) = ½ wᵀw
  subject to yi(w·xi + b) ≥ 1 for all i

Solving the Optimization Problem
- Find w and b such that Φ(w) = ½ wᵀw is minimized and, for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1.
- We need to optimize a quadratic function subject to linear constraints.
- Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them.
- The solution involves constructing a dual problem where a Lagrange multiplier αi is associated with every constraint in the primal problem:
  Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαj yiyj xiᵀxj is maximized and
  (1) Σ αiyi = 0
  (2) αi ≥ 0 for all αi
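A hedged sketch of solving this dual numerically with quadprog from MATLAB's Optimization Toolbox; the data matrix X (N-by-d), the label vector y (N-by-1 column of ±1), and the support-vector tolerance are assumptions, not values from the slides:

  % Hedged sketch: hard-margin SVM dual as a quadratic program.
  N   = size(X, 1);
  H   = (y * y') .* (X * X');        % quadratic term of Q(alpha)
  f   = -ones(N, 1);                 % quadprog minimizes, so negate Sum(alpha_i)
  Aeq = y'; beq = 0;                 % constraint: sum_i alpha_i * y_i = 0
  lb  = zeros(N, 1); ub = [];        % alpha_i >= 0 (use ub = C*ones(N,1) for soft margin)
  alpha = quadprog(H, f, [], [], Aeq, beq, lb, ub);

  w  = X' * (alpha .* y);            % w = sum_i alpha_i * y_i * x_i
  sv = alpha > 1e-6;                 % support vectors: non-zero multipliers
  b  = mean(y(sv) - X(sv, :) * w);   % b = y_k - w'*x_k for any support vector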

The Optimization Problem Solution
- The solution has the form:
  w = Σ αi yi xi
  b = yk − wᵀxk for any xk such that αk ≠ 0
- Each non-zero αi indicates that the corresponding xi is a support vector.
- Then the classifying function will have the form: f(x) = Σ αi yi xiᵀx + b
- Notice that it relies on an inner product between the test point x and the support vectors xi; we will return to this later.
- Also keep in mind that solving the optimization problem involved computing the inner products xiᵀxj between all pairs of training points.

Dataset with Noise
- Hard margin: so far we require all data points to be classified correctly (no training error).
- What if the training set is noisy?
- Solution 1: use very powerful kernels (OVERFITTING!)

Soft Margin Classification
- Slack variables ξi can be added to allow misclassification of difficult or noisy examples (e.g., ε2, ε7, ε11 in the figure).
- What should our quadratic optimization criterion be?
  Minimize ½ w·w + C Σ(k=1..R) εk

Hard Margin vs. Soft Margin
- The old formulation:
  Find w and b such that Φ(w) = ½ wᵀw is minimized and, for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1
- The new formulation incorporating slack variables:
  Find w and b such that Φ(w) = ½ wᵀw + C Σξi is minimized and, for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1 − ξi and ξi ≥ 0 for all i
- Parameter C can be viewed as a way to control overfitting.

Linear SVMs: Overview
- The classifier is a separating hyperplane.
- The most “important” training points are support vectors; they define the hyperplane.
- Quadratic optimization algorithms can identify which training points xi are support vectors with non-zero Lagrangian multipliers αi.
- Both in the dual formulation of the problem and in the solution, training points appear only inside dot products:
  Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαj yiyj xiᵀxj is maximized and
  (1) Σ αiyi = 0
  (2) 0 ≤ αi ≤ C for all αi
  f(x) = Σ αi yi xiᵀx + b

Non-linear SVMs
- Datasets that are linearly separable with some noise work out great.
- But what are we going to do if the dataset is just too hard?
- How about… mapping the data to a higher-dimensional space (e.g., from x to (x, x²))?

Non-linear SVMs: Feature Spaces
- General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)

The “Kernel Trick”
- The linear classifier relies on dot products between vectors: K(xi, xj) = xiᵀxj
- If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(xi, xj) = φ(xi)ᵀφ(xj)
- A kernel function is a function that corresponds to an inner product in some expanded feature space.
- Example: 2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiᵀxj)².
  Need to show that K(xi, xj) = φ(xi)ᵀφ(xj):
  K(xi, xj) = (1 + xiᵀxj)²
  = 1 + xi1²xj1² + 2 xi1xj1 xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
  = [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]ᵀ [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
  = φ(xi)ᵀφ(xj), where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
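A hedged numerical check of this identity in MATLAB; the two points are made-up examples:

  % Hedged sketch: verify K(x, z) = (1 + x'*z)^2 equals the explicit
  % inner product in the expanded feature space described above.
  x = [0.3; -1.2];                        % illustrative points (assumed)
  z = [2.0;  0.5];

  phi = @(v) [1; v(1)^2; sqrt(2)*v(1)*v(2); v(2)^2; sqrt(2)*v(1); sqrt(2)*v(2)];

  K_direct = (1 + x' * z)^2;              % kernel evaluated in input space
  K_mapped = phi(x)' * phi(z);            % explicit inner product in feature space
  disp(abs(K_direct - K_mapped))          % ~0 up to round-off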

What Functions are Kernels?
- For some functions K(xi, xj), checking that K(xi, xj) = φ(xi)ᵀφ(xj) can be cumbersome.
- Mercer’s theorem: every semi-positive definite symmetric function is a kernel.
- Semi-positive definite symmetric functions correspond to a semi-positive definite symmetric Gram matrix:

  K = [ K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xN)
        K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xN)
        …         …         …         …  …
        K(xN,x1)  K(xN,x2)  K(xN,x3)  …  K(xN,xN) ]

Examples of Kernel Functions
- Linear: K(xi, xj) = xiᵀxj
- Polynomial of power p: K(xi, xj) = (1 + xiᵀxj)^p
- Gaussian (radial-basis function network): K(xi, xj) = exp(−‖xi − xj‖² / (2σ²))
- Sigmoid: K(xi, xj) = tanh(β0 xiᵀxj + β1)

Non-linear SVMs Mathematically
- Dual problem formulation:
  Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαj yiyj K(xi, xj) is maximized and
  (1) Σ αiyi = 0
  (2) αi ≥ 0 for all αi
- The solution is: f(x) = Σ αi yi K(xi, x) + b
- Optimization techniques for finding the αi’s remain the same!

Nonlinear SVM - Overview
- SVM locates a separating hyperplane in the feature space and classifies points in that space.
- It does not need to represent the space explicitly, simply by defining a kernel function.
- The kernel function plays the role of the dot product in the feature space.
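A hedged sketch of training such a kernel SVM with fitcsvm from MATLAB's Statistics and Machine Learning Toolbox; the data matrices X, y, and Xtest are assumed placeholders:

  % Hedged sketch: nonlinear (RBF-kernel) SVM classifier.
  model = fitcsvm(X, y, ...
      'KernelFunction', 'rbf', ...   % Gaussian kernel, as in the slides
      'KernelScale',    'auto', ...  % heuristic choice of sigma
      'BoxConstraint',  1);          % soft-margin parameter C

  yhat = predict(model, Xtest);      % classify new points
  nSV  = sum(model.IsSupportVector); % number of support vectors found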

Properties of SVM
- Flexibility in choosing a similarity function.
- Sparseness of solution when dealing with large data sets: only support vectors are used to specify the separating hyperplane.
- Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space.
- Overfitting can be controlled by the soft margin approach.
- Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution.
- Feature selection.

SVM Applications
- SVM has been used successfully in many real-world problems:
  - text (and hypertext) categorization
  - image classification
  - bioinformatics (protein classification, cancer classification)
  - hand-written character recognition

Weakness of SVM
- It is sensitive to noise: a relatively small number of mislabeled examples can dramatically decrease the performance.
- It only considers two classes: how to do multi-class classification with SVM?
- Answer:
  1) With output arity m, learn m SVMs:
     - SVM 1 learns “Output == 1” vs “Output != 1”
     - SVM 2 learns “Output == 2” vs “Output != 2”
     - …
     - SVM m learns “Output == m” vs “Output != m”
  2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
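A hedged sketch of this one-vs-rest scheme, again using fitcsvm; X, Y, Xtest, and the kernel settings are illustrative assumptions:

  % Hedged sketch: one-vs-rest multi-class SVM.
  classes = unique(Y);                       % integer labels 1..m (assumed)
  m = numel(classes);
  models = cell(m, 1);
  for k = 1:m
      % binary problem: class k vs. the rest
      models{k} = fitcsvm(X, double(Y == classes(k)), ...
          'KernelFunction', 'rbf', 'KernelScale', 'auto');
  end

  % Pick the class whose SVM pushes the point furthest into its positive region.
  scores = zeros(size(Xtest, 1), m);
  for k = 1:m
      [~, s] = predict(models{k}, Xtest);    % s(:,2) is the positive-class score
      scores(:, k) = s(:, 2);
  end
  [~, idx] = max(scores, [], 2);
  yhat = classes(idx);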

Some Issues
- Choice of kernel
  - Gaussian or polynomial kernel is the default
  - If ineffective, more elaborate kernels are needed
  - Domain experts can give assistance in formulating appropriate similarity measures
- Choice of kernel parameters
  - e.g. σ in the Gaussian kernel
  - σ is the distance between the closest points with different classifications
  - A lengthy series of experiments in which various parameters are tested
- Optimization criterion: hard margin vs. soft margin

Wind Power Forecasting (WPF)
- WPF is a technique which provides information on how much wind power can be expected at a given point of time.
- Due to the increasing penetration of wind power into the electric power grid, good short-term forecasting will ensure grid stability and a favorable trading performance on the electricity markets.

Ɛ-SVM
- The objective function of the ε-SVM is based on an ε-insensitive loss function. The formula for the ε-SVM (the standard ε-insensitive regression formulation) is:

  minimize   ½‖w‖² + C Σi (ξi + ξi*)
  subject to yi − (w·φ(xi) + b) ≤ ε + ξi
             (w·φ(xi) + b) − yi ≤ ε + ξi*
             ξi ≥ 0,  ξi* ≥ 0
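A hedged sketch of fitting an ε-SVM regressor with fitrsvm from the Statistics and Machine Learning Toolbox; the feature matrix X, the target Y, and all parameter values are illustrative assumptions:

  % Hedged sketch: epsilon-insensitive SVM regression.
  svr = fitrsvm(X, Y, ...
      'KernelFunction', 'gaussian', ...  % RBF kernel
      'Epsilon',        0.05, ...        % width of the insensitive tube (illustrative)
      'BoxConstraint',  1, ...           % regularization constant C
      'Standardize',    true);           % scale features before training

  Ypred = predict(svr, Xtest);           % forecast for new inputs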

Structure of SVM

Data Resolution
- The resolution of the dataset is 10 minutes. Each data point represents the average wind speed and power within one hour.
- The data values between two adjacent samples are linearly changed, that is:

  xj(t) = xi + ((xi+1 − xi) / dti) · t,   0 ≤ t ≤ dti

  where dti is the time interval between xi and xi+1.
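A hedged sketch of this piecewise-linear interpolation using interp1; the sample times and values are made-up placeholders:

  % Hedged sketch: linear interpolation between 10-minute samples.
  t10 = 0:10:60;                         % sample times in minutes
  x10 = [5.1 5.4 5.0 4.8 5.2 5.6 5.3];   % illustrative wind speeds (m/s)
  tq  = 0:1:60;                          % 1-minute query grid
  xq  = interp1(t10, x10, tq, 'linear'); % x_j(t) between adjacent samples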

Data Value
- The average value of the data within Ts can be calculated as:

  x̄j = (1/Ts) ∫[ti, ti+Ts] xj(t) dt

- where Ts = 60 minutes is used in the very short-term forecasting (less than 6 hours) and Ts = 2 hours is used for short-term forecasting.
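A hedged sketch of the corresponding discrete averaging of a 10-minute series into Ts = 60 minute values; x10 is an assumed placeholder series whose length is a multiple of 6:

  % Hedged sketch: window-average the 10-minute series.
  samplesPerWindow = 6;                        % 60 min / 10 min
  xw   = reshape(x10, samplesPerWindow, []);   % one column per hourly window
  xbar = mean(xw, 1)';                         % discrete version of the integral above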

Fixed-Step Prediction Scheme
- Prediction horizon of h steps.
- Fixed-step forecasting means only the value of the h-th next sample is predicted by using the historical data:

  ŷ(t + h) = f(yt, yt-1, …, yt-d)

  where f is a nonlinear function generated by the SVM.
- yt+h is predicted with the data before yt (the red blocks); yt+h-1 is predicted with the data before yt-1 (the green blocks).
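A hedged sketch of building the lagged training set and fitting the SVM regressor for an h-step-ahead forecast; the series y, the lag order d, and the horizon h are illustrative assumptions:

  % Hedged sketch: fixed-step forecasting y(t+h) = f(y_t, ..., y_{t-d}).
  y = y(:);                                    % assumed wind power series (column)
  d = 6;  h = 1;                               % lag order and horizon (illustrative)
  N = numel(y);
  rows = ((d+1):(N-h)).';                      % usable time indices t
  X = zeros(numel(rows), d+1);
  for k = 0:d
      X(:, k+1) = y(rows - k);                 % columns: y_t, y_{t-1}, ..., y_{t-d}
  end
  T = y(rows + h);                             % targets: y_{t+h}

  model = fitrsvm(X, T, 'KernelFunction', 'gaussian', 'Standardize', true);
  Xnew  = y(end:-1:end-d)';                    % latest window in the same column order
  yhat  = predict(model, Xnew);                % h-step-ahead forecast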

Wind speed normalization

Autocorrelations of the wind speed samples

SVM model and the RBF model

1h-ahead wind power prediction using the SVM model

CONCLUSIONS
- The SVM has been successfully applied to the problems of pattern classification, particularly the classification of two different categories of patterns.
- The SVM model is more suitable for very short-term and short-term WPF.
- It provides a powerful tool for enhancing the WPF accuracy.
