Potential Application of The GCP 2
Contents
1 Introduction
2 Topic Description
3 Literature Review
5 Discussion
6 Conclusion
7 References
What is a Support Vector Machine?
CONTD.
SVMs are popular because they help prevent overfitting, and they can work with a relatively large number of features without too much computation.

Figure: 1 hyperplane

where w is the vector normal to the decision surface.
The margin of a point (x_i, y_i) with respect to (w, b) is gamma_i = y_i(w^T x_i + b); this is the normal distance between (x_i, y_i) and the decision surface.
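The margin gamma_i = y_i(w^T x_i + b) can be checked with a short Python sketch; the weight vector and points below are hypothetical toy values, not from the slides:

```python
# Functional margin of a labelled point with respect to (w, b):
# gamma_i = y_i * (w . x_i + b). Positive means correctly classified.

def functional_margin(w, b, x, y):
    """Return y * (w . x + b) for a single labelled point."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [1.0, -1.0], 0.5                          # decision surface w.x + b = 0
print(functional_margin(w, b, [2.0, 0.0], +1))   # 2.5: correctly classified
print(functional_margin(w, b, [0.0, 2.0], +1))   # -1.5: misclassified
```

A negative value flags a point lying on the wrong side of the decision surface.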
Geometrical Margin
The geometrical margin removes the drawbacks of the functional margin: it is invariant to scaling of the equation.
Let the equation of the line be ax + by + c = 0; then w = (a, b).
If w is a vector normal to the decision surface, the unit vector normal to the decision surface is

    w/||w|| = (a/sqrt(a^2 + b^2), b/sqrt(a^2 + b^2))
If we want to find the distance between P and Q, this distance is in the direction of the normal vector. Therefore we can write P = Q + gamma*w/||w||, where gamma is the distance of P from the decision surface.

Coordinates of P = coordinates of Q + gamma*w/||w||

    (a_1, a_2) = (b_1, b_2) + gamma*w/||w||

From this equation we can find gamma:

    w^T((a_1, a_2) - gamma*w/||w||) + b = 0    since (b_1, b_2) lies on the line w.x + b = 0
    gamma = (w^T(a_1, a_2) + b) / ||w||
    gamma = w^T(a_1, a_2)/||w|| + b/||w||
    gamma = y(w^T(a_1, a_2)/||w|| + b/||w||)

We will scale w so that its norm ||w|| equals 1, and then the geometric margin is gamma = y(w^T(a_1, a_2) + b).
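The scale invariance claimed above can be verified numerically: multiplying (w, b) by any positive constant leaves the geometric margin unchanged. The line and point below are hypothetical toy values:

```python
import math

def geometric_margin(w, b, x, y):
    """Signed distance y * (w . x + b) / ||w|| of point x from the line w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Scaling (w, b) by 10 leaves the geometric margin unchanged:
g1 = geometric_margin([3.0, 4.0], -5.0, [3.0, 4.0], +1)
g2 = geometric_margin([30.0, 40.0], -50.0, [3.0, 4.0], +1)
print(g1, g2)  # both 4.0
```

The functional margin, by contrast, would grow by the same factor of 10 under this rescaling.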
Maximise margin width
Assume linearly separable training examples. gamma is the geometric margin and we want to maximize it. We represent the optimisation problem as follows:

Given a set of training examples labelled +ve and -ve, if (w, b) characterises the decision surface, then gamma/||w|| is the geometric margin, and we want to learn values of (w, b) so that this geometric margin is largest, subject to the constraints

    w.x_i + b >= gamma   for +ve points .....(1)
    w.x_i + b <= -gamma  for -ve points .....(2)

Our aim is to scale w so that the geometric margin is 1/||w|| (i.e. gamma = 1), so we have to maximize 1/||w||, or equivalently minimize ||w|| = sqrt(w.w). Combining (1) and (2):

    y_i(w.x_i + b) >= gamma   for i = 1, 2, 3, ..., m

When gamma is made 1 by normalisation, y_i(w.x_i + b) >= 1 for all training instances.
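The normalised constraint y_i(w.x_i + b) >= 1 can be checked over a whole training set in a few lines; the data and weights here are a hypothetical toy set, purely for illustration:

```python
def satisfies_margin_constraints(w, b, data):
    """Check y_i * (w . x_i + b) >= 1 for every labelled training point."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
        for x, y in data
    )

# Hypothetical linearly separable toy set: (point, label) pairs
data = [([2.0, 2.0], +1), ([3.0, 0.5], +1), ([-1.0, -1.0], -1), ([0.0, -2.0], -1)]
print(satisfies_margin_constraints([1.0, 1.0], -1.0, data))   # True
print(satisfies_margin_constraints([0.1, 0.1], -1.0, data))   # False: w too small
```

Only (w, b) pairs passing this check are feasible for the optimisation problem above; among them, the SVM picks the one with smallest ||w||.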
Large Margin Linear classifier
Minimize (1/2)||w||^2 such that y_i(w^T x_i + b) >= 1

Figure: 2 hyperplane

The margin has been scaled so the geometric margin has width 1. The equation of the decision surface is w^T x + b = 0.
+ve points on the margin satisfy the equation w^T x + b = 1.
-ve points on the margin satisfy the equation w^T x + b = -1.
Minimize (1/2)||w||^2 such that y_i(w^T x_i + b) >= 1

Lagrangian function of our optimisation problem:

    minimise L_P(w, b, alpha_i) = (1/2)||w||^2 - sum_{i=1}^{m} alpha_i (y_i(w^T x_i + b) - 1)
    such that alpha_i >= 0, i = 1, 2, 3, ..., m

We convert our primal problem into a dual formulation. This is solved by using Lagrange multipliers, and we can use Lagrange duality to get the dual of this optimisation problem.

Dual Problem

    max_alpha J(alpha) = sum_{i=1}^{m} alpha_i - (1/2) sum_{i,j=1}^{m} alpha_i alpha_j y_i y_j (x_i^T x_j)
    such that alpha_i >= 0, i = 1, 2, 3, ..., m
    sum_{i=1}^{m} alpha_i y_i = 0

This is a quadratic programming problem. A global maximum of alpha_i can always be found by solving this quadratic problem.
Maximum Margin with Noise
Figure: 3 hyperplane
CONTD..
We want a decision surface which balances two things:
One is maximising the margin, which corresponds to minimising w.w.
The other is reducing the number of misclassifications, i.e. minimising the training error.

    Objective Function = w.w + C * (number of training errors)

This is no longer a quadratic objective function, so we cannot use a QP solver to solve this optimisation problem.

Figure: 4 hyperplane
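The count-based objective above can be evaluated directly; counting errors is exactly what makes it non-quadratic (and non-convex) in w. The toy data set and weights are hypothetical:

```python
def noisy_objective(w, b, data, C):
    """w.w + C * (number of misclassified training points).
    The error count is a step function of w, so this is not a QP objective."""
    errors = sum(
        1 for x, y in data
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0
    )
    return sum(wi * wi for wi in w) + C * errors

# Toy set with one noisy +ve point sitting on the -ve side
data = [([2.0, 0.0], +1), ([-2.0, 0.0], -1), ([-0.5, 0.0], +1)]
print(noisy_objective([1.0, 0.0], 0.0, data, C=10.0))  # 11.0: w.w = 1, one error
```

The standard soft-margin SVM replaces the error count with slack variables (hinge loss), which restores a quadratic programme; the sketch above only illustrates why the raw count cannot be handed to a QP solver.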
Non linear SVM and Kernel functions
Non-linear SVM is used when the dataset is truly not linearly separable. In this case the original feature space is transformed to a new (higher-dimensional) feature space, and in some cases the training points become linearly separable in the transformed feature space.
If x is an input variable, then using the methodology of non-linear SVM we can use a mapping to map x to phi(x): x -> phi(x).
Normally, transforming the feature space to a higher-dimensional one leads to higher computational cost, but by using a Kernel function we can achieve this transformation without much extra cost.
The Kernel Trick
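As a minimal illustration of the trick (a standard example, not taken from the slides): the degree-2 polynomial kernel K(x, z) = (x.z)^2 on 2-D inputs gives the same value as an inner product under the explicit feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), without ever constructing phi:

```python
import math

def phi(x):
    """Explicit quadratic feature map for a 2-D input."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly_kernel(x, z):
    """K(x, z) = (x . z)^2, computed in the original 2-D space."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

x, z = [1.0, 2.0], [3.0, 0.5]
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(explicit, poly_kernel(x, z))  # agree (both approx. 16) up to floating point
```

In the dual formulation the training data appear only through inner products x_i^T x_j, so replacing each inner product with K(x_i, x_j) trains the SVM in the transformed space at the cost of a kernel evaluation.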
Literature Review
Outcomes of Literature Review
Paper 1 : Multi-class support vector machines for static security assessment of power system
This paper focuses on predicting the security status of a power system in the least time frame with the highest accuracy level. The secure state of a power system is defined as its continued operation within allowable boundaries in normal conditions as well as after any kind of disturbance. Power system security assessment is classified into three broad categories: (a) normal steady-state operation, where assessment is carried out by solving a set of algebraic equations, (b) transient performance, and (c) dynamic performance assessment. Here only static security assessment is addressed.

Paper 2 : An Optimized Stacked Support Vector Machines Based Expert System for the Effective Prediction of Heart Failure
In this paper, the authors introduce an expert system that stacks two support vector machine (SVM) models for the effective prediction of HF. The experimental results confirm that the proposed method improves the performance of a conventional SVM model by 33%.
Discussion
We will be discussing two papers which have used Support Vector Machines to design the required classifiers.
Paper 1: Multi-class support vector machines for static security assessment of power system
Paper 2: An Optimized Stacked Support Vector Machines Based Expert System for the Effective Prediction of Heart Failure

Static security assessment using composite security index
Figure: 5 Steps involved in pattern recognition for assessment of power system security
Feature selection
Classifier design using multiclass support vector machine

Figure: 6 Optimal Hyperplane representation for SVM Classifier
CONTD....
In this work, for security assessment of the power system, three classes have been considered for classification, namely secure, alarm and insecure. Hence the problem has been treated as a multiclass problem.
There are two types of approaches suggested for multi-class SVM in the literature:
One is considering all data in one optimization.
The other is decomposing the multi-class problem into a series of binary SVMs, such as "One-Against-All" (OAA) and "One-versus-One" (OVO).
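The practical difference between the two decompositions is how many binary SVMs must be trained: OAA trains one per class (k), while OVO trains one per unordered pair of classes (k(k-1)/2). A small sketch of the standard counts:

```python
def num_binary_svms(k, strategy):
    """Number of binary SVMs trained for a k-class problem."""
    if strategy == "OAA":          # one-against-all: one classifier per class
        return k
    if strategy == "OVO":          # one-versus-one: one per unordered class pair
        return k * (k - 1) // 2
    raise ValueError(strategy)

# Three classes (secure / alarm / insecure), as in the power-system paper:
print(num_binary_svms(3, "OAA"))   # 3
print(num_binary_svms(3, "OVO"))   # 3
print(num_binary_svms(10, "OVO"))  # 45
```

For three classes the two strategies happen to need the same number of classifiers; the gap only opens up as the class count grows.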
One-Against-All (OAA) SVM
CONTD...
Figure: 7 Class boundaries for OAA SVM formulation for a three class problem
An Optimized Stacked Support Vector Machines Based Expert System for the Effective Prediction of Heart Failure

Proposed Method
Experiment: L1 Regularized Linear SVM Stacked With L2 Regularized Linear SVM
DATASET DESCRIPTION
This study considered 13 HF features.
L1 Support Vector Machine
The L1-norm SVM can be used for feature selection due to its capability of suppressing irrelevant or noisy features automatically. It shrinks the components of the vector w that correspond to the features that would be eliminated.
Formulating The Two Optimization Problems As One Optimization Problem By Merging Them
Evaluation Metrics
To evaluate the effectiveness of the newly proposed method, different evaluation metrics including accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC) have been used.
Accuracy is the percentage of correctly classified subjects.
Sensitivity is the percentage of correctly classified patients.
Specificity is the percentage of correctly classified healthy subjects.
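All four metrics follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the counts themselves are hypothetical, not results from the paper:

```python
import math

def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)            # recall on actual patients
    spec = tn / (tn + fp)            # recall on healthy subjects
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sens, spec, mcc

# Hypothetical counts for a heart-failure classifier
acc, sens, spec, mcc = metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, sens, spec)   # 0.85 0.8 0.9
```

Unlike accuracy, MCC stays informative when the two classes are imbalanced, which is why it is reported alongside the other three.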
The hyperparameters of all these models are optimized using an exhaustive search strategy.
For the AdaBoost model, the hyperparameter Ne denotes the maximum number of estimators at which boosting is terminated.
For the random forest model, the hyperparameter Ne denotes the number of trees in the forest.
For the extra-trees ensemble model, the hyperparameter Ne denotes the number of trees used by the ensemble model.
From the table, it is evident that the proposed model shows better performance than the ensemble machine learning models.
Conclusion

References

THANK YOU