
Regression Modeling

Multivariate Analysis

Bayesian Methods

Bayesian Paradigm

Bayesian Modeling

Inference and Bayesian Networks

Support Vector and Kernel Methods

Analysis of Time Series: Linear Systems Analysis

Nonlinear Dynamics

Rule Induction

Fuzzy Logic: Extracting Fuzzy Models from Data

Fuzzy Decision Trees

Regression Analysis

Regression analysis refers to statistical models that characterize relationships among a dependent variable and one or more independent variables, all of which are numerical.

Simple linear regression involves a single independent variable.

Multiple regression involves two or more independent variables.

Regression: Introduction

What is regression?

Regression is a statistical technique to determine the linear relationship between two or more variables. Regression is primarily used for prediction and causal inference.

In its simplest (bivariate) form, regression shows the relationship between one independent variable (X) and a dependent variable (Y), as in the formula below:

Y = β0 + β1X + ε

Basic idea: use data to identify relationships among variables, and use these relationships to make predictions.

Regression Model

The variable we are trying to predict (Y) is called the dependent (or response) variable.

The variable X is called the independent (or predictor, or explanatory) variable.

Our model assumes that

E(Y | X = x) = β0 + β1x    (the population line)    (1)

The interpretation is as follows: when X (house size) is fixed at a level x, we assume the mean of Y (selling price) to be linear around the level x, where β0 is the (unknown) intercept and β1 is the (unknown) slope, the incremental change in Y per unit change in X.

β0 and β1 are not known exactly, but are estimated from sample data. Their estimates are denoted b0 and b1.
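For reference (the slide does not show them), the standard least-squares estimates of the slope and intercept are:

b1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²,    b0 = ȳ − b1·x̄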

A simple regression model: a model with only one independent variable.

A multiple regression model: a model with multiple independent variables.

Purpose of Regression Analysis

The purpose of regression analysis is to analyze relationships among variables. The analysis is carried out through the estimation of a relationship, and the results serve the following two purposes:

1. Answer the question of how much y (the dependent variable) changes with changes in each of the x's (x1, x2, ..., xk).

2. Forecast or predict the value of y based on the values of the x's (the independent variables).

Simple Linear Regression

Finds a linear relationship between one independent variable X and one dependent variable Y.

First prepare a scatter plot to verify that the data has a linear trend. Use alternative approaches if the data is not linear.

[Figure 9.1: scatter plot used to check for a linear trend]

Simple Linear Regression

Example 9.1: Home Market Value Data

The size of a house is typically related to its market value.

X = square footage
Y = market value ($)

The scatter plot of the full data set (42 homes) indicates a linear trend.

[Figures 9.2 and 9.3: the home market value data and its scatter plot]

Simple Linear Regression

Finding the Best-Fitting Regression Line

Two possible lines are shown in Figure 9.4; line A is clearly a better fit to the data. We want to determine the best regression line

Ŷ = b0 + b1X

where b0 is the intercept and b1 is the slope.

[Figure 9.4: two candidate regression lines, A and B]

Simple Linear Regression

Using Excel to Find the Best Regression Line

Market value = 32673 + 35.036 × (square feet)

The regression model explains variation in market value due to the size of the home. It provides better estimates of market value than simply using the average.

[Figure 9.5: Excel regression output for the home market value data]
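The same line can be fitted without Excel. Below is a minimal Python sketch using scikit-learn; the (size, value) pairs are made up for illustration, since the slide's 42-home dataset is not reproduced here.

```python
# Fit a simple linear regression of market value on square footage.
import numpy as np
from sklearn.linear_model import LinearRegression

size = np.array([[1500], [1700], [1900], [2100], [2400]])  # square feet
value = np.array([85000, 92500, 99400, 106500, 117000])    # market value ($)

model = LinearRegression().fit(size, value)
print(f"Market value = {model.intercept_:.0f} + "
      f"{model.coef_[0]:.3f} x (square feet)")
```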

Building Good Regression Models

Predicting Average Bank Balance using Regression

In this example, Home Value and Education are not significant.

[Figure 9.17: regression output for the bank balance model]

Building Good Regression Models

Systematic Approach to Building Good Multiple Regression Models

1. Construct a model with all available independent variables and check the significance of each.
2. Identify the largest p-value that is greater than 0.05.
3. Remove that variable and evaluate adjusted R².
4. Continue until all variables are significant.

Find the model with the highest adjusted R². (Do not use unadjusted R², since it always increases when variables are added.) A sketch of this elimination loop follows.
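The sketch below implements the four steps above, assuming a pandas DataFrame X of candidate predictors and a response y (both hypothetical names), and using statsmodels for the p-values and adjusted R².

```python
# Backward elimination: drop the least significant predictor until all
# remaining predictors have p-values at or below alpha.
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    predictors = list(X.columns)
    while predictors:
        fit = sm.OLS(y, sm.add_constant(X[predictors])).fit()  # step 1
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()                                 # step 2
        if pvals[worst] <= alpha:                              # step 4: all significant
            return fit
        predictors.remove(worst)                               # step 3: drop and refit
        print(f"dropped {worst}; adjusted R^2 of remaining model:",
              sm.OLS(y, sm.add_constant(X[predictors])).fit().rsquared_adj)
    return None
```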

Building Good Regression Models

Identifying the Best Regression Model

Re-running the bank regression after removing Home Value, the adjusted R² improves slightly.

[Figure 9.18: bank regression output after removing Home Value]

Regression Modeling Steps

Define the problem or question
Specify the model
Collect data
Do descriptive data analysis
Estimate the unknown parameters
Evaluate the model
Use the model for prediction

The 13 Steps for Statistical Modeling in any Regression

Part 1: Define and Design

In the first 4 steps, the object is clarity. You want to make everything as clear as possible to yourself.

1. Write out research questions in theoretical and operational terms

A lot of times, when researchers are confused about the right statistical method to use, the real problem is they haven't defined their research questions. They have a general idea of the relationship they want to test, but it's a bit vague. You need to be very specific.

2. Design the study or define the design

Depending on whether you are collecting your own data or doing secondary data analysis, you need a clear idea of the design. Design issues are about randomization and sampling.

3. Choose the variables for answering research questions and determine their level of measurement

Every model has to take into account both the design and the level of measurement of the variables. Level of measurement, remember, is whether a variable is nominal, ordinal, or interval. Within interval, you also need to know if variables are discrete counts or continuous.

4. Write an analysis plan

Write your best guess for the statistical method that will answer the research question, taking into account the design and the type of data. It does not have to be final at this point; it just needs to be a reasonable approximation.

5. Calculate sample size estimations

This is the point at which you should calculate your sample sizes: before you collect data, and after you have an analysis plan. You need to know which statistical tests you will use as a basis for the estimates.

Part 3: Refine the model

10. Refine predictors and check model fit

If you are doing a truly exploratory analysis, or if the point of the model is pure prediction, you can use some sort of stepwise approach to determine the best predictors. If the analysis is to test hypotheses or answer theoretical research questions, this part will be more about refinement. You can:

Test, and possibly drop, interactions and quadratic terms, or explore other types of nonlinearity
Test the best specification of random effects

11. Test assumptions

Because you already investigated the right family of models in Part 1, thoroughly investigated your variables in Step 8, and correctly specified your model in Step 10, you should not have big surprises here. Rather, this step will be about confirming, checking, and refining. But what you learn here can send you back to any of those steps for further refinement.

12. Check for and resolve data issues

Steps 11 and 12 are often done together. Data issues are about the data, not the model, but occur within the context of the model.

13. Interpret results

Now, finally, interpret the results. You may not notice data issues or misspecified predictors until you interpret the coefficients. Then you find something like a super-high standard error, or a coefficient with a sign opposite what you expected, sending you back to previous steps.

Multivariate Analysis

What is MVA?

Multivariate analysis (MVA) broadly means the analysis of data sets with more than a few - say, more than five - variables. Some people use the term megavariate analysis to denote cases where there are more than a hundred variables.

MVA uses ALL available data to capture the most information possible. The basic principle is to boil hundreds of variables down to a mere handful.

MVA

Graphical Representation of MVA

[Figure: a raw data table (treatment, X1, X4, X5, replicate, Y columns; hundreds of columns, thousands of rows) that is impossible to interpret by eye is fed to MVA software, which turns the X trends and Y trends into 2-D visual outputs.]

Example: Apples and Oranges

There are many things we could measure on apples and oranges to tell them apart:

Colour, shape, firmness, reflectivity, ...
Skin: smoothness, thickness, morphology, ...
Juice: water content, pH, composition, ...
Seeds: colour, weight, size distribution, ...
etc.

Yet underneath all these measurements there is really a single question: is it an apple (+1) or an orange (-1)? In MVA parlance, we would say that there is only one latent attribute.

Graphical Representation of MVA

Taken to its extreme, this can mean going from hundreds of dimensions (variables) down to just two, allowing us to create a 2-dimensional graph.

Using these graphs, which our eyes and brains can easily handle, we are able to peer into the database and identify trends and correlations. This is illustrated by the sketch below.
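Since the original illustration is not reproduced here, the sketch below shows the same idea with PCA, one common MVA projection method: a stand-in data matrix with hundreds of columns is boiled down to a two-dimensional map that can be inspected for trends.

```python
# Project a wide data matrix onto its first two principal components.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 200))   # stand-in for a real database

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
plt.scatter(scores[:, 0], scores[:, 1], s=5)
plt.xlabel("component 1")
plt.ylabel("component 2")
plt.show()                          # the 2-D map we inspect for trends
```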

Multivariate Analysis

Many statistical techniques focus on just one or two variables. Multivariate analysis (MVA) techniques allow more than two variables to be analysed at once. Multiple regression is not typically included under this heading, but can be thought of as a multivariate analysis.

Data-Rich but Knowledge-Poor

Relationships which are not intuitively obvious lie hidden inside enormous, unwieldy databases. Also, many variables are correlated. Several data-mining techniques exist for extracting this useful knowledge. Some examples are:

Neural Networks
Multiple Regression
Decision Trees
Genetic Algorithms
Clustering
MVA (the subject of this module)

Multivariate Analysis Methods

Two general types of MVA technique:

Analysis of dependence: one (or more) variables are dependent variables, to be explained or predicted by others. E.g. multiple regression, PLS, MDA.

Analysis of interdependence: no variables are thought of as dependent; we look at the relationships among variables, objects or cases. E.g. cluster analysis, factor analysis.

Multivariate Analysis: Benefits

What is the point of doing MVA?

MVA helps you understand the relationships between different process variables. It is well known that simply creating a model can provide insight into the process itself ("learn by modelling").

A model lets you perform "what if?" exercises without affecting the real process. This is a low-cost way to investigate options.

Some key variables cannot be measured in real time. They can, however, be inferred from other variables that are measured on-line. When incorporated in the process control system, this inferential controller or "soft sensor" can greatly improve process performance.

Bayesian Methods

Bayesian Paradigm
Bayesian Modeling
Inference and Bayesian Networks

Refer to the accompanying PDF.

Support Vector and Kernel Methods

SVMs are currently of great interest to theoretical researchers and applied scientists.

By means of the new technology of kernel methods, SVMs have been very successful in building highly nonlinear classifiers.

SVMs have also been successful in dealing with situations in which there are many more variables than observations, and with complexly structured data.

They have wide applications in machine learning, natural language processing, and bioinformatics.

Kernel methods: key idea

[Figure: a map φ sends points x1, ..., xn from the input space X into a feature space F as φ(x1), ..., φ(xn), with inverse map φ⁻¹; all computation is done on the n × n kernel matrix K with entries k(xi, xj) = φ(xi)·φ(xj).]

An example of such a map, φ: X ⊆ R² → H ⊆ R³:

φ(x1, x2) = (x1², √2·x1·x2, x2²)

Kernel PCA

In kernel PCA, ordinary (linear) PCA is carried out in a reproducing kernel Hilbert space - that is, linear PCA on the images of the data under the feature mapping.

Kernel methods: math background

k(xi, xj) = φ(xi)·φ(xj), collected into the n × n kernel matrix K (all computation is done on the kernel matrix).

Mercer theorem: any positive definite function can be written as an inner product in some feature space.

Kernel trick: use the kernel matrix instead of inner products in the feature space.

Representer theorem (Wahba): every minimizer of min over f ∈ H of C(f, {xi, yi}) + Ω(‖f‖H) admits a representation of the form f(·) = Σ (i = 1 to m) αi k(·, xi).
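As a concrete check on these definitions, the sketch below builds a Gaussian kernel matrix for some arbitrary random points and confirms it is positive semi-definite, as Mercer's condition requires.

```python
# Build a Gram (kernel) matrix and verify its eigenvalues are >= 0.
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared pairwise distances
    return np.exp(-d2 / (2 * sigma**2))

X = np.random.default_rng(0).normal(size=(50, 3))
K = gaussian_kernel_matrix(X)
print(np.linalg.eigvalsh(K).min() >= -1e-9)        # True: K is PSD
```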

Multiclass support vector machines

Multiclass SVM as a series of binary problems:

One-versus-rest: divide the K-class problem into K binary classification subproblems of the type "kth class vs. not kth class", k = 1, 2, ..., K.

One-versus-one: divide the K-class problem into K(K−1)/2 pairwise subproblems, one for each pair of classes.

Multiclass support vector machines

A true multiclass SVM

To construct a true multiclass SVM classifier, we need to consider all K classes, 1, 2, ..., K, simultaneously, and the classifier has to reduce to the binary SVM classifier when K = 2.

One construction is due to Lee, Lin, and Wahba (2004). It provides a unifying framework for multicategory SVMs when there are either equal or unequal misclassification costs.

Which Separating Hyperplane to Use?

[Figure: several hyperplanes that all separate the two classes (axes Var1, Var2) - which one should we choose?]

Maximizing the Margin

IDEA 1: Select the separating hyperplane that maximizes the margin!

[Figure: two separating hyperplanes with their margin widths (axes Var1, Var2); the one with the larger margin is preferred.]

Support Vectors

[Figure: the margin width is determined by the training points closest to the separating hyperplane - the support vectors (axes Var1, Var2).]

Setting Up the Optimization Problem

Put the margin hyperplanes at w·x + b = k and w·x + b = -k, with the separating hyperplane w·x + b = 0 between them. The width of the margin is then 2k / ‖w‖, so the problem is:

max 2k / ‖w‖
s.t. w·x + b ≥ k for every x of class 1
     w·x + b ≤ -k for every x of class 2

Setting Up the Optimization Problem

There is a scale and unit for the data such that k = 1. The problem then becomes:

max 2 / ‖w‖
s.t. w·x + b ≥ 1 for every x of class 1
     w·x + b ≤ -1 for every x of class 2

[Figure: margin hyperplanes w·x + b = 1 and w·x + b = -1 around the separator w·x + b = 0.]

Setting Up the Optimization Problem

If class 1 corresponds to yi = 1 and class 2 corresponds to yi = -1, we can rewrite

w·xi + b ≥ 1 for all xi with yi = 1
w·xi + b ≤ -1 for all xi with yi = -1

as

yi (w·xi + b) ≥ 1 for all xi.

So the problem becomes:

max 2 / ‖w‖, or equivalently min (1/2) ‖w‖²
s.t. yi (w·xi + b) ≥ 1 for all xi

Linear, Hard-Margin SVM Formulation

Find w, b that solve

min (1/2) ‖w‖²
s.t. yi (w·xi + b) ≥ 1 for all xi

The problem is convex, so there is a unique global minimum value (when feasible).

There is also a unique minimizer, i.e. a single weight vector w and offset b that achieve the minimum.

It is non-solvable if the data is not linearly separable.

This is a quadratic program: very efficient computationally with modern constrained-optimization engines (it handles thousands of constraints and training instances).
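A sketch of this in scikit-learn: with a very large C, the soft-margin solver effectively forbids margin violations, approximating the hard-margin SVM. The toy points are hypothetical and linearly separable.

```python
# (Near) hard-margin linear SVM; inspect w, b and the support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin width =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```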


Support Vector Machines

Three main ideas:

1. Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin.

2. Extend the above definition to non-linearly separable problems: have a penalty term for misclassifications.

3. Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data is mapped implicitly to this space.


Non-Linearly Separable Data

Introduce slack variables ξi: allow some instances to fall within the margin, but penalize them.

[Figure: margin hyperplanes w·x + b = 1 and w·x + b = -1 around w·x + b = 0, with violating points at distances ξi inside the margin.]

Formulating the Optimization Problem

The constraint becomes:

yi (w·xi + b) ≥ 1 - ξi for all xi,    ξi ≥ 0

The objective function penalizes misclassified instances and those within the margin:

min (1/2) ‖w‖² + C Σi ξi

C trades off margin width against misclassifications.

Linear, Soft-Margin SVMs

min (1/2) ‖w‖² + C Σi ξi
s.t. yi (w·xi + b) ≥ 1 - ξi for all xi,    ξi ≥ 0

The algorithm tries to keep the ξi at zero while maximizing the margin.

Notice: the algorithm does not minimize the number of misclassifications (an NP-complete problem) but the sum of distances from the margin hyperplanes.

Other formulations use ξi² instead.

As C → ∞, we get closer to the hard-margin solution.
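The sketch below illustrates the C trade-off on synthetic, overlapping data: a small C tolerates violations (wide margin, many support vectors), while a large C approaches the hard-margin solution.

```python
# Sweep C and watch the margin width and support-vector count change.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.2, (50, 2)),
               rng.normal(2.5, 1.2, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors, "
          f"margin width {2 / np.linalg.norm(clf.coef_[0]):.2f}")
```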

Robustness of Soft vs Hard Margin SVMs

[Figure: two panels (axes Var1, Var2) comparing the separating hyperplane w·x + b = 0 found by a soft-margin and a hard-margin SVM on the same data.]

Soft vs Hard Margin SVM

Soft-margin SVMs always have a solution.
Soft-margin SVMs are more robust to outliers, and give smoother surfaces in the non-linear case.
Hard-margin SVMs do not require guessing the cost parameter (they require no parameters at all).

Support Vector Machines

Three main ideas:

1. Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin.

2. Extend the above definition to non-linearly separable problems: have a penalty term for misclassifications.

3. Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data is mapped implicitly to this space.


Disadvantages of Linear Decision Surfaces

[Figure: a dataset (axes Var1, Var2) that no straight line can separate well.]

Advantages of Non-Linear Surfaces

[Figure: the same dataset separated cleanly by a curved decision boundary.]

Linear Classifiers in High-Dimensional Spaces

[Figure: data that is not linearly separable in the original space (axes Var1, Var2) becomes linearly separable in a space of constructed features (axes Constructed Feature 1, Constructed Feature 2).]

Find a function φ(x) to map the data to a different space.

Mapping Data to a High-Dimensional Space

Find a function φ(x) to map to a different space; the SVM formulation then becomes:

min (1/2) ‖w‖² + C Σi ξi
s.t. yi (w·φ(xi) + b) ≥ 1 - ξi for all xi,    ξi ≥ 0

in the new space.

Explicit mapping is expensive if φ(x) is very high-dimensional, so solving the problem without explicitly mapping the data is desirable.

The Dual of the SVM Formulation

The original (primal) SVM formulation:

min over w, b:   (1/2) ‖w‖² + C Σi ξi
s.t. yi (w·φ(xi) + b) ≥ 1 - ξi for all xi,    ξi ≥ 0

(n inequality constraints, n positivity constraints, n slack variables.)

The (Wolfe) dual of this problem:

min over αi:   (1/2) Σi,j αi αj yi yj (φ(xi)·φ(xj)) - Σi αi
s.t. C ≥ αi ≥ 0 for all xi,    Σi αi yi = 0

(One equality constraint, n positivity constraints, n variables - the Lagrange multipliers.) The objective function is more complicated, but the data now appear only as inner products φ(xi)·φ(xj).

The Kernel Trick

φ(xi)·φ(xj) means: map the data into the new space, then take the inner product of the new vectors.

We can find a function such that K(xi, xj) = φ(xi)·φ(xj), i.e., the image of the inner product of the data is the inner product of the images of the data.

Then we do not need to explicitly map the data into the high-dimensional space to solve the optimization problem (for training).

How do we classify without explicitly mapping the new instances? It turns out that

sgn(w·x + b) = sgn(Σi αi yi K(xi, x) + b),

where b solves αj (yj (Σi αi yi K(xi, xj) + b) - 1) = 0 for any j with αj ≠ 0.

Examples of Kernels

Assume we measure two quantities, e.g. the expression levels of genes TrkC and SonicHedgehog (SH), and we use the mapping:

φ: (xTrkC, xSH) ↦ (xTrkC², xSH², √2·xTrkC·xSH, √2·xTrkC, √2·xSH, 1)

Consider the function:

K(x, z) = (x·z + 1)²

We can verify that:

φ(x)·φ(z) = xTrkC²·zTrkC² + xSH²·zSH² + 2·xTrkC·xSH·zTrkC·zSH + 2·xTrkC·zTrkC + 2·xSH·zSH + 1
          = (xTrkC·zTrkC + xSH·zSH + 1)² = (x·z + 1)² = K(x, z)
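A quick numeric check of the identity above, with arbitrary made-up expression values:

```python
# Verify K(x, z) = (x . z + 1)^2 equals phi(x) . phi(z).
import numpy as np

def phi(x):
    x1, x2 = x   # e.g. TrkC and SH expression levels
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

x, z = np.array([0.3, 1.7]), np.array([2.0, -0.5])
print((x @ z + 1) ** 2)    # kernel evaluation
print(phi(x) @ phi(z))     # inner product after explicit mapping: same value
```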


Polynomial and Gaussian Kernels

K(x, z) = (x·z + 1)^p

For p = 2, if we measure 7,000 genes, using the kernel once means calculating a summation product with 7,000 terms and then taking the square of that number.

Mapping explicitly to the high-dimensional space means calculating approximately 50,000,000 new features for both training instances, then taking the inner product of those (another 50,000,000 terms to sum).

In general, using the kernel trick provides huge computational savings over explicit mapping!

Another commonly used kernel is the Gaussian, which maps to a space whose number of dimensions equals the number of training cases:

K(x, z) = exp(−‖x − z‖² / (2σ²))

The Mercer Condition

Is there a mapping φ(x) for any symmetric function K(x, z)? No.

The SVM dual formulation requires calculating K(xi, xj) for each pair of training instances. The matrix Gij = K(xi, xj) is called the Gram matrix.

There is a feature space φ(x) whenever the kernel is such that G is always positive semi-definite (the Mercer condition).

Support Vector Machines

Three main ideas:

1. Define what an optimal hyperplane is (in a way that can be identified in a computationally efficient way): maximize the margin.

2. Extend the above definition to non-linearly separable problems: have a penalty term for misclassifications.

3. Map the data to a high-dimensional space where it is easier to classify with linear decision surfaces: reformulate the problem so that the data is mapped implicitly to this space.

Other Types of Kernel Methods

SVMs that perform regression
SVMs that perform clustering
ν-Support Vector Machines: maximize the margin while bounding the number of margin errors
Leave-One-Out Machines: minimize the bound on the leave-one-out error
SVM formulations that take into consideration differences in the cost of misclassification for the different classes
Kernels suitable for sequences of strings, and other specialized kernels

Variable Selection with SVMs

Recursive Feature Elimination (see the sketch below):

Train a linear SVM.
Remove the variables with the lowest weights (those variables affect classification the least), e.g., remove the lowest 50% of variables.
Retrain the SVM with the remaining variables, and repeat until classification performance degrades.

Very successful. Other formulations exist where minimizing the number of variables is folded into the optimization problem. Similar algorithms exist for non-linear SVMs. These are some of the best and most efficient variable selection methods.

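A sketch of recursive feature elimination with a linear SVM in scikit-learn; step=0.5 removes the lowest-weighted half of the remaining variables each round, as described above. The dataset is synthetic.

```python
# RFE with a linear SVM, halving the variable set each iteration.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=8, random_state=0)
selector = RFE(SVC(kernel="linear"), n_features_to_select=8, step=0.5)
selector.fit(X, y)
print("kept variables:", selector.support_.nonzero()[0])
```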

Comparison with Neural Networks

Neural Networks:
Hidden layers map to lower-dimensional spaces
Search space has multiple local minima
Training is expensive
Classification is extremely efficient
Requires choosing the number of hidden units and layers
Very good accuracy in typical domains

SVMs:
The kernel maps to a very high-dimensional space
Search space has a unique minimum
Training is extremely efficient
Classification is extremely efficient
Kernel and cost are the two parameters to select
Very good accuracy in typical domains
Extremely robust

Why do SVMs Generalize?

Even though they map to a very high-dimensional space, they have a very strong bias in that space: the solution has to be a linear combination of the training instances.

There is a large theory on Structural Risk Minimization providing bounds on the error of an SVM; typically, though, the error bounds are too loose to be of practical use.

MultiClass SVMs

One-versus-all: train n binary classifiers, one for each class against all other classes. The predicted class is the class of the most confident classifier.

One-versus-one: train n(n-1)/2 classifiers, each discriminating between a pair of classes. Several strategies exist for selecting the final classification based on the output of the binary SVMs. (See the sketch below.)

Truly multiclass SVMs: generalize the SVM formulation to multiple categories. More on that in the paper nominated for the student paper award: "Methods for Multi-Category Cancer Diagnosis from Gene Expression Data: A Comprehensive Evaluation to Inform Decision Support System Development", Alexander Statnikov, Constantin F. Aliferis, Ioannis Tsamardinos.
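A sketch of the two decompositions in scikit-learn; on the 3-class iris data, one-versus-rest trains 3 binary SVMs and one-versus-one trains 3·2/2 = 3 pairwise SVMs.

```python
# Compare the one-vs-rest and one-vs-one decompositions.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                       # 3 classes
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
print(len(ovr.estimators_), "one-vs-rest classifiers")  # 3
print(len(ovo.estimators_), "one-vs-one classifiers")   # n(n-1)/2 = 3
```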


Conclusions

SVMs express learning as a mathematical program, taking advantage of the rich theory in optimization.

SVMs use the kernel trick to map indirectly to extremely high-dimensional spaces.

SVMs are extremely successful, robust, efficient, and versatile, and there are good theoretical indications as to why they generalize well.

Fuzzy Logic: Extracting Fuzzy Models from Data

Fuzzy Decision Trees

Brief History

Fuzzy logic can be defined as a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth: truth values between completely true and completely false.

It was introduced by Lotfi Zadeh in the 1960s, then a professor at the University of California at Berkeley.

How it Works

Basics of Fuzzy Logic (Rules)

Humans base their decisions on conditions. Fuzzy logic operates on a collection of IF-THEN statements, for example "if A then B, if C then D", where A and C are fuzzy conditions and B and D are fuzzy sets.

Step-by-Step Approach

Step One

Define the control objectives and criteria. Consider questions like:

What is trying to be controlled?
What has to be done to control the system?
What kind of response is needed?
What are the possible (probable) system failure modes?

Step Two

Determine the input and output relationships, and determine the least number of variables for input to the fuzzy logic system.

Step-by-Step Approach

Step Three

Break down the control problem into a series of "IF X AND Y, THEN Z" rules based on fuzzy logic. These rules should define the desired system output response for the given system input conditions.

Step Four

Create the fuzzy logic membership functions that define the meaning or values of the input and output terms used in the rules.

Step-by-Step Approach

Step Five

After the membership functions are created, program everything into the fuzzy logic system.

Step Six

Finally, test the system, evaluate the results, and make the necessary adjustments until the desired result is obtained.

Step-by-Step Approach

The above steps can be summarized into three main stages:

Fuzzification: membership functions are used to graphically describe a situation.
Evaluation of rules: the fuzzy logic rules are applied.
Defuzzification: the crisp results are obtained.

Step-by-Step Approach

[Figure: input membership functions, a sample fuzzy rule base, and an output membership function.]

Inverted Pendulum

Task: to balance a pole on a mobile platform that can move in only two directions, to the left or to the right.

Inverted Pendulum

The input and output relationships of the variables of the fuzzy system are then determined.

Inputs:
the angle between the platform and the pendulum
the angular velocity of this angle

Output:
the speed of the platform

Inverted Pendulum

Use membership functions to graphically describe the situation (fuzzification). The output, speed, can be high speed, medium speed, low speed, etc.; these different levels of platform output are defined by specifying the membership functions for the fuzzy sets.

[Figures: membership functions for the angle, the angular velocity, and the speed.]

Define Fuzzy Rules

Examples:

If angle is zero and angular velocity is zero, then speed is also zero.
If angle is zero and angular velocity is negative low, then speed is negative low.
If angle is positive low and angular velocity is zero, then speed is positive low.
If angle is positive low and angular velocity is negative low, then speed is zero.

Inverted Pendulum

Finally, the defuzzification stage is implemented. Two ways of defuzzifying are finding the centre of gravity and finding the average (mean). A sketch of all three stages follows.
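The three stages can be sketched in a few dozen lines of plain Python. The triangular membership-function breakpoints below are hypothetical; the four rules are the ones listed above, and the output is defuzzified by centre of gravity.

```python
# Fuzzification, rule evaluation, and centre-of-gravity defuzzification
# for the pendulum controller.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0)

def mf(x):
    """Degrees of membership in 'negative low', 'zero', 'positive low'."""
    return {"NL": tri(x, -2, -1, 0), "Z": tri(x, -1, 0, 1), "PL": tri(x, 0, 1, 2)}

def control(angle, ang_vel):
    a, v = mf(angle), mf(ang_vel)
    rules = [(min(a["Z"],  v["Z"]),  "Z"),    # zero angle, zero velocity -> zero speed
             (min(a["Z"],  v["NL"]), "NL"),   # zero angle, NL velocity   -> NL speed
             (min(a["PL"], v["Z"]),  "PL"),   # PL angle,   zero velocity -> PL speed
             (min(a["PL"], v["NL"]), "Z")]    # PL angle,   NL velocity   -> zero speed
    xs = np.linspace(-2, 2, 401)              # universe of the output "speed"
    out = np.zeros_like(xs)
    for strength, label in rules:             # clip each output set, combine by max
        out = np.maximum(out, np.minimum(strength, mf(xs)[label]))
    return (xs * out).sum() / out.sum()       # centre of gravity -> crisp speed

print(control(angle=0.4, ang_vel=-0.2))
```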

Inverted Pendulum

Example application: http://www.aptronix.com/fuzzynet/java/pend/pendjava.htm

Other Applications

Coal power plants
Refuse incineration plants
Water treatment systems
AC induction motors
Fraud detection
Customer targeting
Quality control
Speech recognition
Nuclear fusion
Truck speed limiters
Sonar systems
Toasters
Photocopiers
Creditworthiness assessment
Stock prognosis
Mortgage applications
Hi-fi systems
Humidifiers
Domestic goods (washing machines/dryers)
Microwave ovens
Consumer electronics (television)
Still and video cameras (autofocus, exposure, and anti-shake)
Vacuum cleaners

Fuzzy Decision Trees

A decision tree is a classifier expressed as a recursive partition of the instance space.

The decision tree consists of nodes that form a rooted tree, meaning it is a directed tree with a node called the root that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is called an internal or test node. All other nodes are called leaves (also known as terminal or decision nodes).

In a decision tree, each internal node splits the instance space into two or more subspaces according to a certain discrete function of the input attribute values.

Decision Trees (1). Introduction

Each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label.

[Figure: a decision tree for weather data - an "overcast" branch leads straight to yes, a Humidity node splits normal → yes / high → no, and a Windy node splits false → yes / true → no.]

A scikit-learn version of such a tree is sketched below.
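A minimal sketch of inducing such a tree with scikit-learn, trained on a tiny hand-made weather table (the rows are hypothetical):

```python
# Induce a small classification tree and print its splits.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "outlook":  ["sunny", "sunny", "overcast", "rain", "rain", "overcast"],
    "humidity": ["high", "normal", "high", "high", "normal", "normal"],
    "windy":    [False, False, True, True, False, False],
    "play":     ["no", "yes", "yes", "no", "yes", "yes"],
})
X = pd.get_dummies(data[["outlook", "humidity", "windy"]])
tree = DecisionTreeClassifier(criterion="entropy").fit(X, data["play"])
print(export_text(tree, feature_names=list(X.columns)))
```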

Unordered Fuzzy Decision Trees

[Figure: induction of an unordered fuzzy decision tree, with starting entropy H(B) = 24.684 and thresholds α = 0.16, β = 0.75. The attribute maximizing the gain-to-cost ratio I(B; Ai1,j1, ..., Aiq-1,jq-1, Aiq) / Cost(Aiq) is chosen first - here A2, giving H(B | A2) = 20.932 and gain I(B; A2) = 3.752. Each branch (A2,1, A2,2, A2,3, then A1 and A4 below them) records its class percentages and frequency, e.g. B2 = 54.1%, B3 = 19.3%, f = 0.381 for the first branch.]

Fuzzy Decision Rules (1). A priori

[Figure: fuzzy decision rules read directly from the tree, e.g. "if A2 is A2,3 then B is B3" (with degree of truth 0.754). Input attribute A3 has no influence on attribute B (for the given thresholds α = 0.16, β = 0.75).]

Fuzzy Decision Rules (2). A posteriori

A fuzzy decision rule is a path from the root to a leaf, and one example is described by several fuzzy decision rules.

[Figure: an example with membership degrees A1 = (0.9, 0.1, 0.0), A2 = (1.0, 0.0, 0.0), A3 = (0.8, 0.2), A4 = (0.4, 0.6). Each rule weight is the minimum of the memberships along its path, e.g. W2 = min(A2,1, A1,1, A4,2) = min(1.0, 0.9, 0.6) = 0.54 and W3 = min(A2,1, A1,2) = min(1.0, 0.1) = 0.10, with W1 = 0.36. The example's class distribution is the weighted mix of the leaf distributions, e.g. B2 = 67.8% × 0.36 + 62.3% × 0.54 + 50.4% × 0.10 = 63.1%. This aggregation is reproduced numerically below.]
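The aggregation in the figure can be checked numerically; the leaf distributions and weights below are the ones recoverable from the slide.

```python
# Weight each rule by the minimum membership along its path, then mix
# the leaf class distributions by those weights.
import numpy as np

# leaf class distributions over (B1, B2, B3), in percent
leaves = np.array([[14.2, 67.8, 18.0],    # path with weight W1
                   [33.2, 62.3,  4.5],    # path with weight W2
                   [37.6, 50.4, 12.0]])   # path with weight W3
w = np.array([0.36,
              min(1.0, 0.9, 0.6),         # W2 = 0.54
              min(1.0, 0.1)])             # W3 = 0.10
print(w @ leaves / w.sum())               # B2 ~ 63.1, B3 ~ 10.1 as on the slide
```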

Decision Tables

[Figure: the fuzzy decision tree flattened into a decision table. Reading the leaf labels (B2, B3, B3; B2, B1, B3, B3, B1; B1, B1, B2, B2, B3) over all attribute combinations yields the truth-table vector column [1111 2020 2222 1111 2020 2222 2222 2222 2222]^T.]

Basics of Knowledge Representation

[Diagram: the initial data are represented in multiple-valued logic as decision tables (truth-table vector columns), which then support sensitivity analysis, testability analysis, and reliability analysis.]

Fuzzy Decision Making Support System (FDT, DT)

[Diagram: numeric variables are fuzzified, passed through the fuzzy analysis (fuzzy and crisp decision trees), and de-fuzzified back to numeric outputs; linguistic variables enter the fuzzy analysis directly.]

SOFTWARE FOR EXPERIMENTAL INVESTIGATIONS

We created the software application Multiprognos in C++ (ver. 5.02). It:

reads and writes data;
separates the data (1000 examples) into two parts after file initialization: learning data (70%) and testing data (30%);
transforms the numeric values of the input attributes into linguistic values;
induces fuzzy decision trees (YS-FDT, nFDT, oFDT, sFDT) and decision trees (C4.5, C4.5p, CART, CARTp), alongside statistical methods and algorithms (Bayes, kNN);
analyses the results (Block 4): calculates the decision errors, writes out the data with incorrect decisions, and saves the models.

Algorithmic Framework for Decision Tree Induction

Illustration of Decision Tree with Replication

Advantages and Disadvantages of Decision Trees

Advantages

1. Decision trees are self-explanatory, and when compacted they are also easy to follow. In other words, if the decision tree has a reasonable number of leaves, it can be grasped by non-professional users. Furthermore, decision trees can be converted to a set of rules. Thus, this representation is considered comprehensible.
2. Decision trees can handle both nominal and numeric input attributes.
3. Decision tree representation is rich enough to represent any discrete-value classifier.
4. Decision trees are capable of handling datasets that may have errors.
5. Decision trees are capable of handling datasets that may have missing values.
6. Decision trees are considered to be a nonparametric method. This means that decision trees have no assumptions about the space distribution and the classifier structure.

Disadvantages

End of UNIT II
