Professional Documents
Culture Documents
AND OR NOT(1)
The XOR
problem cannot
be solved with
a perceptron.
XOR
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
With NN:Multi-layer feed-forward neural networks
Each layer receive their inputs from the previous one and
transmits the output to the next one
w1ij
1
w2ij z g wij xi j
1
j
1
i
2
z g wij zi j
2
j
2 1
i
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
w111 1 XOR
1 w211
w112 (11) w111 = 0.7 w121 = 0.7 11 = 0. 5
1 w112 = 0.3 w122 = 0.3 12 = 0. 5
(21) w211 = 0.7 w221 = -0.7 12 = 0. 5
w121 2
2 w221
w122 (12)
x1 = 0 x2 = 0
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
w111 1 XOR
1 w211
w112 (11) w111 = 0.7 w121 = 0.7 11 = 0. 5
1 w112 = 0.3 w122 = 0.3 12 = 0. 5
(21) w211 = 0.7 w221 = -0.7 12 = 0. 5
w121 2
2 w221
w122 (12)
x1 = 1 x2 = 0
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
w111 1 XOR
1 w211
w112 (11) w111 = 0.7 w121 = 0.7 11 = 0. 5
1 w112 = 0.3 w122 = 0.3 12 = 0. 5
(21) w211 = 0.7 w221 = -0.7 12 = 0. 5
w121 2
2 w221
w122 (12)
x1 = 0 x2 = 1
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
w111 1 XOR
1 w211
w112 (11) w111 = 0.7 w121 = 0.7 11 = 0. 5
1 w112 = 0.3 w122 = 0.3 12 = 0. 5
(21) w211 = 0.7 w221 = -0.7 12 = 0. 5
w121 2
2 w221
w122 (12)
x1 = 1 x2 = 1
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
The hidden layer REMAPS the input in a new
representation that is linearly separable
Why to transform?
Linear operation in the feature space is equivalent to non-
linear operation in input space
Classification can become easier with a proper
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
XOR
Is not linearly separable
X Y Y
0 0 0
0 1 1
1 0 1
1 1 0
X
Is linearly separable
Y
X Y XY
0 0 0 0
0 1 0 1
1 0 0 1
X
1 1 1 0
XY
Find a feature space
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Transforming the Data
f( )
f( ) f( )
f( ) f( ) f( )
f(.) f( )
f( ) f( )
f( ) f( )
f( ) f( )
f( ) f( ) f( )
f( )
f( )
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
The Kernel Trick
Recall the SVM optimization problem
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
An Example for f(.) and K(.,.)
Suppose f(.) is given as follows
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Kernels
Given a mapping:
x φ(x)
z 0 z M z 0
T
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Modification Due to Kernel Function
Change all inner products to kernel functions
For training,
Original
With kernel
function
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Modification Due to Kernel Function
For testing, the new data z is classified as class 1 if f >0,
and as class 2 if f <0
Original
With kernel
function
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
More on Kernel Functions
Since the training of SVM only requires the value of K(xi,
xj), there is no restriction of the form of xi and xj
xi can be a sequence or a tree, instead of a feature vector
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Example
Suppose we have 5 1D data points
x1=1, x2=2, x3=4, x4=5, x5=6, with 1, 2, 6 as class 1 and 4,
5 as class 2 y1=1, y2=1, y3=-1, y4=-1, y5=1
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Example
Suppose we have 5 1D data points
x1=1, x2=2, x3=4, x4=5, x5=6, with 1, 2, 6 as class 1 and 4,
5 as class 2 y1=1, y2=1, y3=-1, y4=-1, y5=1
1 2 4 5 6
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Example
We use the polynomial kernel of degree 2
K(x,y) = (xy+1)2
C is set to 100
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Example
By using a QP solver, we get
a1=0, a2=2.5, a3=0, a4=7.333, a5=4.833
Note that the constraints are indeed satisfied
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Example
f(z) f(z)>0
1 2 4 5 6
f(z)<0
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Kernel Functions
In practical use of SVM, the user specifies the kernel
function; the transformation f(.) is not explicitly stated
Given a kernel function K(xi, xj), the transformation f(.)
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
A kernel is associated to a transformation
Given a kernel, in principle it should be recovered the
transformation in the feature space that originates it.
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
A kernel is associated to a transformation
K xi , x j )= 1+ xi , x j ) 1 x x x x )
2 1 1
i j
2 2 2
i j
f(xi ) = 1, x ) , 2 x )x ), x ) , 2 x ), 2 x
1 2
i
1
i
2
i
2 2
i
1
i
2
i )
T
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
XOR 1
0 1
)
[+1,+1] -1
2
K(xi , x j ) = 1+ xi x j
1
L (α) α1+α2+α3+α4 ( 9α12 2α1α2 2α1α3 2α1α4+
2
9α22+2α2α3 2α2α4+9α32 2α3α4+9α42 )
1
L α1 1 9α1 α2 α3 α4 = 0 α1 = α2 = α3 = α4 =
8
L α 2 1 α1 9α 2 α3 α4 = 0 1
L=
L α 3 1 α1 α2 9α3 α4 = 0 4
L α 4 1 α1 α2 α3 9α 4 = 0 The four Input vectors are
All support vectors
N
w = αi yi(xi ) W = [0, 0, –1/sqrt(2), 0, 0, 0] T
i=1
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
XOR
Input vector Y
1 [-1,-1] -1
[-1,+1] +1
[+1,-1] +1
0 1
[+1,+1] -1
f(xi ) = 1, x ) , 2 x )x ), x ) , 2 x ), 2 x
1 2
i
1
i
2
i
2 2
i
1
i
2
i )
T
N
w = αi yi(xi ) W = [0, 0, –1/sqrt(2), 0, 0, 0]’
i=1
Input vector Y x1x2
[-1,-1] -1 +1
[-1,+1] +1 -1
[+1,-1] +1 -1
[+1,+1] -1 +1
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Examples of Kernel Functions
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Ploynomial kernel
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Fonte:
http://www.ivanciuc.org/
Examples of Kernel Functions
xx xy yy
K ( x, y ) exp exp
s2
exp
2s 2
2s
2
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Examples of Kernel Functions
With 1-dim vectors:
x2 xy y 2
K ( x, y) exp 2 exp 2 exp 2
2s s 2s
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
With slack variables
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Gaussian RBF kernel
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Building new kernels
k ( x, y) c1k1 ( x, y) c2 k2 ( x, y)
Exponential
k ( x, y) exp k1 ( x, y)
Product
k ( x, y) k1 ( x, y) k2 ( x, y)
Polymomial transformation (Q: polymonial with non negative
coefficients)
k ( x, y) Qk1 ( x, y)
Function product (f: any function)
k ( x, y) f ( x)k1 ( x, y) f ( y)
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Choosing the Kernel Function
Probably the most tricky part of using SVM.
The kernel function is important because it creates the
kernel matrix, which summarizes all the data
Many principles have been proposed (diffusion kernel,
f1 ( x) (nA , nC , nG , nT )
Or the number of dimers (16-D space)
f2 ( x) (nAA , nAC , nAG , nAT , nCA , nCC , nCG , nCT ,..)
kl ( x, y) fl x ) fl y )
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
l is usually lower than 1
Kernel out of generative models
Given a generative model associating a probability
p(x|θ) to a given input, we define :
K ( x, y) p( x) p( y)
Fisher Kernel
g ( , x) ln p( x | )
N
1
F E g ( , x) g ( , x)
T
N
g ( , x ) g ( , x )
i 1
i i
K ( x, y ) g ( , x) F g ( , y )
T 1
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Other Aspects of SVM
How to use SVM for multi-class classification?
One can change the QP formulation to become multi-class
More often, multiple binary classifiers are combined
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Other Aspects of SVM
How to interpret the SVM discriminant function value as
probability?
By performing logistic regression on the SVM output of a
set of data (validation set) that is not used for training
Some SVM software (like libsvm) have these features
built-in
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Software
A list of SVM implementation can be found at
http://www.kernel-machines.org/software.html
Some implementation (such as LIBSVM) can handle
multi-class classification
SVMLight is among one of the earliest implementation of
SVM
Several Matlab toolboxes for SVM are also available
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Summary: Steps for Classification
Prepare the pattern matrix
Select the kernel function to use
value of C
You can use the values suggested by the SVM software, or
you can set apart a validation set to determine the values
of the parameter
Execute the training algorithm and obtain the ai
Unseen data can be classified using the ai and the
support vectors
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Strengths and Weaknesses of SVM
Strengths
Training is relatively easy
No local optimal, unlike in neural networks
It scales relatively well to high dimensional data
Tradeoff between classifier complexity and error can be
controlled explicitly
Non-traditional data like strings and trees can be used as
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna
Other Types of Kernel Methods
A lesson learnt in SVM: a linear algorithm in the feature
space is equivalent to a non-linear algorithm in the input
space
Standard linear algorithms can be generalized to its non-
Pier Luigi Martelli - Systems and In Silico Biology 2015-2016- University of Bologna