12 Deepak Fuzzy Logic SVM

Published by Himanshu Shivanand


Published by: Himanshu Shivanand on Apr 09, 2013
Copyright:Attribution Non-commercial







Introduction to Fuzzy Logic Control


- General Definition
- Applications
- Operations
- Rules
- Fuzzy Logic Toolbox
- FIS Editor
- Tipping Problem: Fuzzy Approach
- Defining Inputs & Outputs
- Defining MFs
- Defining Fuzzy Rules

General Definition
Fuzzy Logic - 1965, Lotfi Zadeh, Berkeley
- A superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth.
- The central notion of fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets) are indicated by a value in the range [0.0, 1.0], with 0.0 representing absolute Falseness and 1.0 representing absolute Truth.
- Deals with real-world vagueness.

Applications
- Expert Systems
- Control Units
- Bullet train between Tokyo and Osaka
- Video Cameras
- Automatic Transmissions

Operations
Truth table columns: A, B, A∧B, A∨B, ¬A
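A common choice for the three operations named above is Zadeh's min/max/complement definition. The sketch below assumes that choice; the function names are illustrative and do not come from any toolbox.

```python
def fuzzy_and(a: float, b: float) -> float:
    """A AND B, taken as the smaller of the two membership values."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """A OR B, taken as the larger of the two membership values."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """NOT A, taken as the complement with respect to 1."""
    return 1.0 - a


# With inputs restricted to 0.0 and 1.0 the classical truth table reappears.
for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, fuzzy_and(a, b), fuzzy_or(a, b), fuzzy_not(a))

# Partial truth values are handled by the same functions.
print(fuzzy_and(0.25, 0.75), fuzzy_or(0.25, 0.75), fuzzy_not(0.25))
```

Because min, max and 1 - a agree with Boolean logic at the end points, this definition is a superset of conventional logic in the sense described in the General Definition slide.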

Controller Structure
- Fuzzification
  - Scales and maps input variables to fuzzy sets
- Inference Mechanism
  - Approximate reasoning
  - Deduces the control action
- Defuzzification
  - Converts fuzzy output values to control signals

MATLAB Fuzzy Logic Toolbox
- The MATLAB Fuzzy Logic Toolbox facilitates the development of fuzzy-logic systems using:
  - graphical user interface (GUI) tools
  - command line functionality
- The tool can be used for building:
  - Fuzzy Expert Systems
  - Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

Graphical User Interface (GUI) Tools
- There are five primary GUI tools for building, editing, and observing fuzzy inference systems in the Fuzzy Logic Toolbox:
  - Fuzzy Inference System (FIS) Editor
  - Membership Function Editor
  - Rule Editor
  - Rule Viewer
  - Surface Viewer

MATLAB: Fuzzy Logic Toolbox

Fuzzy Inference System
- Two types of inference system:
  - Mamdani inference method
  - Sugeno inference method
- Mamdani's fuzzy inference method is the most common methodology.

FIS Editor: Mamdani's inference system

Fuzzy Logic Examples using MATLAB
- To control the speed of a motor by changing the input voltage.
- When a set point is defined, if for some reason the motor runs faster, we need to slow it down by reducing the input voltage.
- If the motor slows below the set point, the input voltage must be increased so that the motor speed reaches the set point.

Input/Output
- Input status words:
  - Too slow
  - Just right
  - Too fast
- Output action words:
  - Less voltage (Slow down)
  - No change
  - More voltage (Speed up)

FIS Editor: Adding Input / Output

Membership Function Editor

Input Membership Function

Output Membership Function

Membership Functions

Rules
Define the rule-base:
1) If the motor is running too slow, then more voltage.
2) If motor speed is about right, then no change.
3) If motor speed is too fast, then less voltage.
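The three rules can be run through the fuzzification, inference and defuzzification stages of the controller structure in a few lines. This is a minimal sketch in Python, not the MATLAB toolbox. The 2420 rpm set point, the membership function break points and the 2.0 to 2.8 V output range are all assumptions made for illustration, so the numbers it produces will not match the FIS built in the slides.

```python
def rising(x, a, b):
    """Ramp from 0 at a to 1 at b, clamped to [0, 1]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    return min(rising(x, a, b), 1.0 - rising(x, b, c))


def control_voltage(speed, steps=801):
    # Fuzzification: hypothetical speed sets around a 2420 rpm set point.
    too_slow = 1.0 - rising(speed, 2380.0, 2420.0)
    just_right = tri(speed, 2380.0, 2420.0, 2460.0)
    too_fast = rising(speed, 2420.0, 2460.0)

    num = den = 0.0
    for k in range(steps):
        v = 2.0 + 0.8 * k / (steps - 1)  # candidate voltage in [2.0, 2.8]
        # Inference: clip each output set by its rule strength (min),
        # then aggregate the three rules (max).
        mu = max(
            min(too_slow, tri(v, 2.4, 2.6, 2.8)),    # rule 1: more voltage
            min(just_right, tri(v, 2.2, 2.4, 2.6)),  # rule 2: no change
            min(too_fast, tri(v, 2.0, 2.2, 2.4)),    # rule 3: less voltage
        )
        num += v * mu
        den += mu
    # Defuzzification: centroid of the aggregated output set.
    return num / den


for rpm in (2400.0, 2420.0, 2437.4):
    print(rpm, round(control_voltage(rpm), 3))
```

With these assumed sets, a speed above the set point yields a voltage below the neutral 2.4 V and a speed below the set point yields a voltage above it, which is the qualitative behaviour the rule base asks for.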

Member function Editor: Adding Rules

Rule Base

Rule Viewer

Surface Viewer

Save the file as "one.fis". Now type in the command window to get the result:
>> fis = readfis('one');
>> out = evalfis(2437.4, fis)
out = 2.376

Sugeno-Type Fuzzy Inference
- Takagi-Sugeno-Kang method of fuzzy inference.
- Similar to the Mamdani method in many respects.
- Fuzzifying the inputs and applying the fuzzy operator are exactly the same.
- The main difference between Mamdani and Sugeno is that the Sugeno output membership functions are either linear or constant.
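With constant output membership functions (a zero-order Sugeno system), defuzzification reduces to a weighted average of the rule outputs. The sketch below assumes the same hypothetical motor-speed sets as a 2420 rpm controller and three made-up constant voltages; none of these values come from the slides.

```python
def rising(x, a, b):
    """Ramp from 0 at a to 1 at b, clamped to [0, 1]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def sugeno_voltage(speed):
    # Rule firing strengths from hypothetical input membership functions.
    too_slow = 1.0 - rising(speed, 2380.0, 2420.0)
    just_right = min(rising(speed, 2380.0, 2420.0),
                     1.0 - rising(speed, 2420.0, 2460.0))
    too_fast = rising(speed, 2420.0, 2460.0)
    strengths = [too_slow, just_right, too_fast]

    # Constant rule outputs (assumed): more voltage, no change, less voltage.
    outputs = [2.6, 2.4, 2.2]

    # Sugeno output: weighted average of the constants.
    weighted = sum(w * z for w, z in zip(strengths, outputs))
    return weighted / sum(strengths)


print(sugeno_voltage(2420.0))
print(sugeno_voltage(2437.4))
```

No output set has to be sampled and integrated here, which is one reason the Sugeno form is described as computationally efficient.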

FIS Editor: Sugeno inference system

Add Input/output variables

Define Input/output variables

Add Input MF

Define Input MF

Add output MF

Define output MF

Add rules

Define Rule Base

View rules

Rules viewer

Surface viewer

Advantages of the Sugeno Method
- Sugeno is a more compact and computationally efficient representation than a Mamdani system.
- It is computationally efficient.
- It works well with linear techniques (e.g., PID control).
- It works well with optimization and adaptive techniques.
- It has guaranteed continuity of the output surface.
- It is well suited to mathematical analysis.

Advantages of the Mamdani Method
- It is intuitive.
- It has widespread acceptance.
- It is well suited to human input.

Support Vector Machine & Its Applications

Overview
- Introduction to Support Vector Machines (SVM)
- Properties of SVM
- Applications
  - Gene Expression Data Classification
  - Text Categorization (if time permits)
- Discussion

Support Vector Machine (SVM)
- The fundamental principle of classification using the SVM is to separate the two categories of patterns.
- Map data x into a higher-dimensional feature space via a nonlinear mapping.
- The linear classification (regression) in the high-dimensional space is equivalent to the nonlinear classification (regression) in the low-dimensional space.

Linear Classifiers
An input x is passed through the classifier f to give the estimated label y_est:

$$f(x, w, b) = \operatorname{sign}(w \cdot x + b)$$

Points with $w \cdot x + b > 0$ are labeled +1 and points with $w \cdot x + b < 0$ are labeled -1.
- How would you classify this data?
- Any of these boundaries would be fine... but which is best?
- A poorly placed boundary leaves a point misclassified to the +1 class.
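The decision rule f(x, w, b) = sign(w x + b) is short enough to write out directly. In this sketch the weight vector, the bias and the tie-breaking convention for points exactly on the boundary are illustrative assumptions.

```python
def linear_classify(x, w, b):
    """Return +1 or -1 according to sign(w . x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # A point exactly on the boundary (score == 0) is assigned to +1 here;
    # that convention is a choice made for this sketch.
    return 1 if score >= 0 else -1


w, b = (1.0, -1.0), 0.0  # hypothetical boundary: the line x1 = x2
print(linear_classify((2.0, 1.0), w, b))  # below the line
print(linear_classify((0.0, 3.0), w, b))  # above the line
```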

Classifier Margin
For the classifier f(x, w, b) = sign(w x + b), define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM, Linear SVM). Support vectors are those datapoints that the margin pushes up against.
1. Maximizing the margin is good according to intuition and PAC theory.
2. Implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very very well.

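Linear SVM Mathematically
Let $x^{+}$ and $x^{-}$ be points on the plus and minus margin planes, and let M be the margin width. What we know:
- $w \cdot x^{+} + b = +1$
- $w \cdot x^{-} + b = -1$
- $w \cdot (x^{+} - x^{-}) = 2$

$$M = \frac{(x^{+} - x^{-}) \cdot w}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}$$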

Linear SVM Mathematically
Goal:
1) Correctly classify all training data:
   - $w x_i + b \ge 1$ if $y_i = +1$
   - $w x_i + b \le -1$ if $y_i = -1$
   - $y_i (w x_i + b) \ge 1$ for all $i$
2) Maximize the margin $M = \frac{2}{\lVert w \rVert}$, which is the same as minimizing $\frac{1}{2} w^{T} w$.

We can formulate a Quadratic Optimization Problem and solve for w and b:

$$\text{Minimize } \Phi(w) = \frac{1}{2} w^{T} w \quad \text{subject to } y_i (w x_i + b) \ge 1 \;\; \forall i$$
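Both parts of the goal can be checked numerically for a candidate (w, b): the margin width is 2 / ||w||, and every training point must satisfy y_i (w x_i + b) >= 1. The toy data and the candidate hyperplanes below are made up for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def margin_width(w):
    """Margin M = 2 / ||w|| of a hyperplane in canonical form."""
    return 2.0 / math.sqrt(dot(w, w))


def satisfies_constraints(points, labels, w, b):
    """True when y_i (w . x_i + b) >= 1 holds for every training point."""
    return all(y * (dot(w, x) + b) >= 1 for x, y in zip(points, labels))


points = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]  # hypothetical training set
labels = [1, -1, 1]

print(margin_width((1.0, 0.0)), satisfies_constraints(points, labels, (1.0, 0.0), 0.0))
# A shorter w would give a wider margin but violates the constraints.
print(margin_width((0.5, 0.0)), satisfies_constraints(points, labels, (0.5, 0.0), 0.0))
```

The second candidate shows why the problem is a constrained one: shrinking w widens the margin only until some point stops satisfying its constraint.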

Solving the Optimization Problem
Find w and b such that
$\Phi(w) = \frac{1}{2} w^{T} w$ is minimized
and for all $\{(x_i, y_i)\}$: $y_i (w^{T} x_i + b) \ge 1$

- Need to optimize a quadratic function subject to linear constraints.
- Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them.
- The solution involves constructing a dual problem where a Lagrange multiplier $\alpha_i$ is associated with every constraint in the primal problem:

Find $\alpha_1 \dots \alpha_N$ such that
$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^{T} x_j$ is maximized and
(1) $\sum_i \alpha_i y_i = 0$
(2) $\alpha_i \ge 0$ for all $\alpha_i$

The Optimization Problem Solution
- The solution has the form: $w = \sum_i \alpha_i y_i x_i$ and $b = y_k - w^{T} x_k$ for any $x_k$ such that $\alpha_k \ne 0$.
- Each non-zero $\alpha_i$ indicates that the corresponding $x_i$ is a support vector.
- Then the classifying function will have the form: $f(x) = \sum_i \alpha_i y_i x_i^{T} x + b$
- Notice that it relies on an inner product between the test point x and the support vectors $x_i$; we will return to this later.
- Also keep in mind that solving the optimization problem involved computing the inner products $x_i^{T} x_j$ between all pairs of training points.
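The formulas for w, b and f(x) can be traced on a tiny example. The three training points and the multipliers below were worked out by hand for illustration (two support vectors with alpha = 0.5 and one ignorable point with alpha = 0); no QP solver is involved.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]  # hypothetical training points
y = [1, -1, 1]
alpha = [0.5, 0.5, 0.0]                    # hand-solved dual solution

# w = sum_i alpha_i y_i x_i
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]

# b = y_k - w . x_k for any support vector x_k (alpha_k != 0)
k = next(i for i, a in enumerate(alpha) if a > 0)
b = y[k] - dot(w, X[k])


def f(x):
    """Classifying function written with inner products only."""
    return sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X)) + b


print(w, b)
print(f((3.0, 1.0)), f((-2.0, 5.0)))
```

The third point contributes nothing to w or f because its multiplier is zero, which is the sense in which only the support vectors matter.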

Dataset with noise
- Hard Margin: So far we require all data points be classified correctly
  - No training error
- What if the training set is noisy?
  - Solution 1: use very powerful kernels
  - OVERFITTING!

Soft Margin Classification
Slack variables $\xi_i$ can be added to allow misclassification of difficult or noisy examples (drawn as ε2, ε7 and ε11 in the figure).

What should our quadratic optimization criterion be? Minimize

$$\frac{1}{2} w \cdot w + C \sum_{k=1}^{R} \xi_k$$

Hard Margin vs. Soft Margin
- The old formulation:
  Find w and b such that
  $\Phi(w) = \frac{1}{2} w^{T} w$ is minimized and for all $\{(x_i, y_i)\}$
  $y_i (w^{T} x_i + b) \ge 1$
- The new formulation incorporating slack variables:
  Find w and b such that
  $\Phi(w) = \frac{1}{2} w^{T} w + C \sum_i \xi_i$ is minimized and for all $\{(x_i, y_i)\}$
  $y_i (w^{T} x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$
- Parameter C can be viewed as a way to control overfitting.
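For a fixed (w, b), the smallest slack that satisfies both soft-margin constraints is max(0, 1 - y_i (w x_i + b)), so the objective can be evaluated directly. The data, the hyperplane and the value of C in this sketch are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slacks(points, labels, w, b):
    """Smallest xi_i with y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0."""
    return [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(points, labels)]


def soft_margin_objective(points, labels, w, b, C):
    """Phi(w) = 1/2 w.w + C * sum_i xi_i."""
    return 0.5 * dot(w, w) + C * sum(slacks(points, labels, w, b))


points = [(2.0, 0.0), (0.5, 0.0), (0.5, 0.0)]  # hypothetical noisy data
labels = [1, 1, -1]
w, b = (1.0, 0.0), 0.0

print(slacks(points, labels, w, b))
print(soft_margin_objective(points, labels, w, b, C=10.0))
```

A point outside the margin gets zero slack, a point inside the margin but on the correct side gets a slack between 0 and 1, and a misclassified point gets a slack above 1. Raising C makes those violations more expensive.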

Linear SVMs: Overview
- The classifier is a separating hyperplane.
- Most "important" training points are support vectors; they define the hyperplane.
- Quadratic optimization algorithms can identify which training points $x_i$ are support vectors with non-zero Lagrangian multipliers $\alpha_i$.
- Both in the dual formulation of the problem and in the solution, training points appear only inside dot products:

Find $\alpha_1 \dots \alpha_N$ such that
$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^{T} x_j$ is maximized and
(1) $\sum_i \alpha_i y_i = 0$
(2) $0 \le \alpha_i \le C$ for all $\alpha_i$

$f(x) = \sum_i \alpha_i y_i x_i^{T} x + b$

Non-linear SVMs
- Datasets that are linearly separable with some noise work out great.
- But what are we going to do if the dataset is just too hard?
- How about mapping data to a higher-dimensional space, for example plotting $x^2$ against $x$?

Non-linear SVMs: Feature spaces
- General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

$\Phi: x \to \varphi(x)$

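The "Kernel Trick"
- The linear classifier relies on the dot product between vectors: $K(x_i, x_j) = x_i^{T} x_j$
- If every data point is mapped into a high-dimensional space via some transformation $\Phi: x \to \varphi(x)$, the dot product becomes: $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$
- A kernel function is some function that corresponds to an inner product in some expanded feature space.
- Example: 2-dimensional vectors $x = [x_1 \; x_2]$; let $K(x_i, x_j) = (1 + x_i^{T} x_j)^2$. Need to show that $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$:

$$\begin{aligned}
K(x_i, x_j) &= (1 + x_i^{T} x_j)^2 \\
&= 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} \\
&= [1 \;\; x_{i1}^2 \;\; \sqrt{2} x_{i1} x_{i2} \;\; x_{i2}^2 \;\; \sqrt{2} x_{i1} \;\; \sqrt{2} x_{i2}]^{T} \, [1 \;\; x_{j1}^2 \;\; \sqrt{2} x_{j1} x_{j2} \;\; x_{j2}^2 \;\; \sqrt{2} x_{j1} \;\; \sqrt{2} x_{j2}] \\
&= \varphi(x_i)^{T} \varphi(x_j)
\end{aligned}$$

where $\varphi(x) = [1 \;\; x_1^2 \;\; \sqrt{2} x_1 x_2 \;\; x_2^2 \;\; \sqrt{2} x_1 \;\; \sqrt{2} x_2]$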
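The identity in the example can be confirmed numerically: evaluating the kernel in the 2-dimensional input space gives the same number as an explicit dot product in the 6-dimensional feature space. The two test vectors are arbitrary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """K(x, y) = (1 + x.y)^2, computed in the original 2-D space."""
    return (1.0 + dot(x, y)) ** 2


def phi(x):
    """Explicit feature map [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2]


x, y = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, y))      # kernel evaluated directly
print(dot(phi(x), phi(y)))     # same value via the feature map, up to rounding
```

The kernel needs one 2-D dot product, while the explicit route builds two 6-D vectors first; that saving is the point of the trick.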

What Functions are Kernels?
- For some functions $K(x_i, x_j)$, checking that $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$ can be cumbersome.
- Mercer's theorem: Every semi-positive definite symmetric function is a kernel.
- Semi-positive definite symmetric functions correspond to a semi-positive definite symmetric Gram matrix:

$$K = \begin{bmatrix}
K(x_1, x_1) & K(x_1, x_2) & K(x_1, x_3) & \dots & K(x_1, x_N) \\
K(x_2, x_1) & K(x_2, x_2) & K(x_2, x_3) & \dots & K(x_2, x_N) \\
\dots & \dots & \dots & \dots & \dots \\
K(x_N, x_1) & K(x_N, x_2) & K(x_N, x_3) & \dots & K(x_N, x_N)
\end{bmatrix}$$

Examples of Kernel Functions
- Linear: $K(x_i, x_j) = x_i^{T} x_j$
- Polynomial of power p: $K(x_i, x_j) = (1 + x_i^{T} x_j)^p$
- Gaussian (radial-basis function network): $K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$
- Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^{T} x_j + \beta_1)$
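The four kernels translate directly into code. The default parameter values (p, sigma, beta0, beta1) in this sketch are placeholders chosen for the example, not recommendations.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, p=2):
    return (1.0 + dot(x, y)) ** p


def gaussian_kernel(x, y, sigma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, beta0=1.0, beta1=0.0):
    return math.tanh(beta0 * dot(x, y) + beta1)


x, y = (1.0, 2.0), (3.0, -1.0)
for kernel in (linear_kernel, polynomial_kernel, gaussian_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```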

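Non-linear SVMs Mathematically
- Dual problem formulation:
  Find $\alpha_1 \dots \alpha_N$ such that
  $Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ is maximized and
  (1) $\sum_i \alpha_i y_i = 0$
  (2) $\alpha_i \ge 0$ for all $\alpha_i$
- The solution is: $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$
- Optimization techniques for finding the $\alpha_i$'s remain the same!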
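The kernelized classifier can be tried on the XOR pattern, which no straight line separates. The sketch assumes the degree-2 polynomial kernel and the textbook multipliers alpha_i = 1/8 with b = 0 for the four corner points; those values are plugged in by hand rather than produced by an optimizer.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def K(x, y):
    """Polynomial kernel of power 2."""
    return (1.0 + dot(x, y)) ** 2


# XOR-style training set: opposite corners share a label.
X = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
y = [-1, 1, 1, -1]
alpha = [0.125, 0.125, 0.125, 0.125]  # assumed dual solution for this set
b = 0.0


def f(x):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b"""
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, X)) + b


for xi, yi in zip(X, y):
    print(xi, yi, f(xi))
```

Each training point lands exactly on its margin (f = +1 or -1), so all four points are support vectors in this example.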

Nonlinear SVM - Overview
- SVM locates a separating hyperplane in the feature space and classifies points in that space.
- It does not need to represent the space explicitly, simply by defining a kernel function.
- The kernel function plays the role of the dot product in the feature space.

Properties of SVM
- Flexibility in choosing a similarity function
- Sparseness of solution when dealing with large data sets
  - only support vectors are used to specify the separating hyperplane
- Ability to handle large feature spaces
  - complexity does not depend on the dimensionality of the feature space
- Overfitting can be controlled by soft margin approach
- Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
- Feature Selection

SVM Applications
- SVM has been used successfully in many real-world problems
  - text (and hypertext) categorization
  - image classification
  - bioinformatics (Protein classification, Cancer classification)
  - hand-written character recognition

Weakness of SVM
- It is sensitive to noise
  - A relatively small number of mislabeled examples can dramatically decrease the performance
- It only considers two classes
  - How to do multi-class classification with SVM?
  - Answer:
    1) With output arity m, learn m SVMs
       - SVM 1 learns "Output==1" vs "Output != 1"
       - SVM 2 learns "Output==2" vs "Output != 2"
       - :
       - SVM m learns "Output==m" vs "Output != m"
    2) To predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region.
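The prediction step of this one-versus-rest scheme is an argmax over the decision values of the m machines. In the sketch below each "SVM" is stood in for by a hypothetical linear scorer (w, b); in practice each pair would come from training on "class k" versus "not class k".

```python
def decision_value(x, w, b):
    """Signed score w . x + b; larger means further into the positive region."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_one_vs_rest(x, machines):
    """machines maps each class label to its (w, b); pick the highest score."""
    return max(machines, key=lambda label: decision_value(x, *machines[label]))


# Three made-up machines for classes 1, 2 and 3.
machines = {
    1: ((1.0, 0.0), 0.0),
    2: ((0.0, 1.0), 0.0),
    3: ((-1.0, -1.0), 0.0),
}

for point in [(2.0, 0.5), (0.1, 3.0), (-2.0, -1.0)]:
    print(point, predict_one_vs_rest(point, machines))
```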

Some Issues
- Choice of kernel
  - Gaussian or polynomial kernel is default
  - if ineffective, more elaborate kernels are needed
  - domain experts can give assistance in formulating appropriate similarity measures
- Choice of kernel parameters
  - σ in Gaussian kernel
  - σ is the distance between closest points with different classifications
- Optimization criterion: Hard margin vs. Soft margin
  - a lengthy series of experiments in which various parameters are tested

Wind Power Forecasting (WPF)
- WPF is a technique which provides the information of how much wind power can be expected at a given point of time.
- Due to the increasing penetration of wind power into the electric power grid, a good short-term forecasting will ensure grid stability and a favorable trading performance on the electricity markets.


The objective function of the ε-SVM is based on an ε-insensitive loss function. The formula for the ε-SVM is given as follows:
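In its usual textbook form, the ε-insensitive loss ignores errors smaller than ε and grows linearly beyond that: L(y, f) = max(0, |y - f| - ε). The sketch below assumes that standard definition, which may differ in notation from the formula on the slide; the value of ε is arbitrary.

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps tube around the target, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


print(eps_insensitive_loss(1.0, 1.05))  # inside the tube: no penalty
print(eps_insensitive_loss(1.0, 1.5))   # outside the tube: penalised by the excess
```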

Structure of SVM

Data Resolution
- The resolution of the dataset is 10 minutes. Each data point represents the average wind speed and power within one hour.
- The data values between two adjacent samples are assumed to change linearly, that is:

$$\hat{x}_j(t) = x_i + \frac{x_{i+1} - x_i}{dt_i} \, t, \qquad 0 \le t \le dt_i$$

where $dt_i$ is the time interval between $x_i$ and $x_{i+1}$.

Data Value

The average value of the data within $T_s$ can be calculated as

$$\bar{x}_j = \frac{1}{T_s} \int_{t_i}^{t_i + T_s} \hat{x}_j(t)\, dt$$

where $T_s$ = 60 minutes is used in the very short-term forecasting (less than 6 hours) and $T_s$ = 2 hours is used for short-term forecasting.
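Linear interpolation between samples followed by averaging over a window can be sketched as below. Because the interpolated signal is piecewise linear, its integral over each sampling interval is the interval length times the mean of its two end values. The sample values, the 10-minute spacing and the function names are illustrative assumptions.

```python
def interpolate(x_i, x_next, dt, t):
    """Value at offset t (0 <= t <= dt) on the line from x_i to x_next."""
    return x_i + (x_next - x_i) / dt * t


def window_average(samples, dt):
    """Average of the piecewise-linear signal through `samples`,
    taken over the whole span Ts = dt * (len(samples) - 1)."""
    area = sum(dt * (a + b) / 2.0 for a, b in zip(samples, samples[1:]))
    return area / (dt * (len(samples) - 1))


# Seven made-up wind speed samples, 10 minutes apart, spanning Ts = 60 minutes.
speeds = [4.0, 6.0, 8.0, 6.0, 4.0, 4.0, 6.0]

print(interpolate(speeds[0], speeds[1], 10.0, 5.0))  # halfway through the first interval
print(window_average(speeds, 10.0))                  # 60-minute average
```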

Fixed-Step Prediction Scheme
- Prediction horizon of h steps.
- Fixed-step forecasting means only the value of the next hth sample is predicted by using the historical data:

$$\hat{y}(t + h) = f(y_t, y_{t-1}, \dots, y_{t-d})$$

  where f is a nonlinear function generated by SVM.
- $y_{t+h}$ is predicted with the data before $y_t$ (the red blocks); $y_{t+h-1}$ is predicted with the data before $y_{t-1}$ (the green blocks).
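Training a model of the form ŷ(t + h) = f(y_t, ..., y_{t-d}) starts with turning the time series into input/target pairs. The helper below is a hypothetical sketch of that step only; the regression model f itself (the SVM) is not included.

```python
def make_fixed_step_dataset(series, d, h):
    """Build (inputs, targets) pairs for h-step-ahead prediction.

    Each input is [y_t, y_{t-1}, ..., y_{t-d}] and its target is y_{t+h}.
    """
    inputs, targets = [], []
    for t in range(d, len(series) - h):
        inputs.append([series[t - k] for k in range(d + 1)])
        targets.append(series[t + h])
    return inputs, targets


series = list(range(10))  # stand-in for normalised wind speed samples
inputs, targets = make_fixed_step_dataset(series, d=2, h=3)
for row, target in zip(inputs, targets):
    print(row, "->", target)
```

Each row uses only values up to time t, so the target h steps ahead is never leaked into its own inputs.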

Wind speed normalization

Autocorrelations of the wind speed samples

SVM model and the RBF model

1h-ahead wind power prediction using the SVM model

CONCLUSIONS
- The SVM has been successfully applied to the problems of pattern classification, particularly the classification of two different categories of patterns.
- SVM model is more suitable for very short-term and short-term WPF.
- Provides a powerful tool for enhancing the WPF accuracy.
