The Mathematica® Journal

A Flexible Implementation for Support Vector Machines

Roland Nilsson, Johan Björkegren, and Jesper Tegnér

Support vector machines (SVMs) are learning algorithms that have many applications in pattern recognition and nonlinear regression. Being very popular, SVM software is available in many versions. Still, existing implementations, usually in low-level languages such as C, are often difficult to understand and adapt to specific research tasks. In this article, we present a compact and yet flexible implementation of SVMs in Mathematica, traditionally named MathSVM. This software is designed to be easy to extend and modify, drawing on the powerful high-level language of Mathematica.
Background

A pattern recognition problem amounts to learning how to discriminate between data points $x$ belonging to two classes, defined by class labels $y \in \{+1, -1\}$, when given only a set of examples $(x, y)$ from each class. These problems are found in various applications, from automated handwriting recognition to medical expert systems, and pattern recognition or machine learning algorithms are routinely applied to solve them.

It may be helpful for newcomers to relate this to a more familiar problem: standard statistical hypothesis testing for one-dimensional $x$, such as Student's $t$-test [1, ch. 8], can be viewed as a very simple kind of pattern recognition problem. Here, the hypotheses $H_0$ and $H_1$ correspond to the classes $+1$ and $-1$, and the familiar

$g(x) = \begin{cases} H_0, & \text{if } x < \bar{x} \\ H_1, & \text{if } x > \bar{x} \end{cases}$

where $\bar{x}$ is the mean of the within-class means,

$\bar{x} = \frac{1}{2}\left(\frac{1}{n_{+1}} \sum_{i : y_i = +1} x_i \;+\; \frac{1}{n_{-1}} \sum_{i : y_i = -1} x_i\right), \qquad n_c = \lvert\{\, i : y_i = c \,\}\rvert$

(assuming $H_0 < H_1$, that is, the $H_0$ class has the smaller mean), is called the decision rule or sometimes simply the classifier. We say that the decision rule $g$ is induced from data $x$, in this case determined by computing $\bar{x}$.
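As a quick illustration of this threshold rule, here is a minimal Wolfram Language sketch (it is not part of the MathSVM package; the toy data and the names xbar and g1d are purely hypothetical):

(* toy one-dimensional two-class data: pairs {x, y} with y in {+1, -1} *)
data = {{-1.2, +1}, {-0.8, +1}, {-1.0, +1}, {0.9, -1}, {1.3, -1}, {1.1, -1}};

(* xbar: mean of the two within-class means *)
xbar = Mean[{Mean[Cases[data, {x_, +1} :> x]],
             Mean[Cases[data, {x_, -1} :> x]]}];

(* decision rule: class +1 (playing the role of H0 here) below the threshold *)
g1d[x_] := If[x < xbar, +1, -1];

g1d /@ {-0.9, 1.0}   (* evaluates to {1, -1} *)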
However, real pattern recognition problems usually involve high-dimensional data (such as image data) and unknown underlying distributions. In this situation, it is nearly impossible to develop statistical tests like the preceding one. These problems are typically attacked with algorithms, such as artificial neural networks [2], decision trees [3, ch. 18], Bayesian models [4], and recently SVMs [5], to which we will devote the rest of this article. Here we will only consider data that can be represented as vectors $x \in \mathbb{R}^n$; other kinds of information can usually be converted to this form in some appropriate manner.
Support Vector Machines
SVMs attempt to find a hyperplane

$P_{w,b} :\; w \cdot x + b = 0, \quad x \in \mathbb{R}^n,$

that separates the data points $x$ (meaning that all $x$ in a given class are on the same side of the plane), corresponding to a decision rule $g(x) = \operatorname{sign}(w \cdot x + b)$. In the SVM literature, $w$ is often referred to as the weight vector; $b$ is called the bias (a term adopted from neural networks). This idea is not new; it dates back at least to R. A. Fisher and the theory of linear discriminants [6]. The novelty of SVMs lies in how this plane is determined: SVMs choose the separating hyperplane $w \cdot x + b = 0$ that is furthest away from the data points $x$, that is, that has maximal margin (Figure 1). The underlying idea is that a hyperplane far from any observed data points should minimize the risk of making wrong decisions when classifying new data. To be precise, in SVMs we maximize the distance to the closest data points. We solve

(1)  $\max_{(w,b)} \; \bigl( \min_i \, d(P_{w,b}, x_i) \bigr),$

where $d(P_{w,b}, x) = \lvert w \cdot x + b \rvert / \lVert w \rVert$ is the distance between a data point $x$ and the plane $P_{w,b}$, subject to the constraint that this plane still separates the classes. The plane $P_{w,b}$ that solves (1) is called the optimal separating hyperplane and is unique [5]. MathSVM provides algorithms for determining this plane from data.
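As a small illustration of this decision rule and the distance used in (1), one might write the following plain Wolfram Language sketch (it is not MathSVM code; the values of w and b and the names gLinear and planeDistance are made up for the example):

(* a hypothetical weight vector and bias defining the hyperplane w.x + b == 0 *)
w = {1.0, -2.0}; b = 0.5;

(* linear decision rule g(x) = sign(w.x + b) *)
gLinear[x_] := Sign[w . x + b];

(* distance from a point x to the hyperplane, as in (1) *)
planeDistance[x_] := Abs[w . x + b]/Norm[w];

{gLinear[{2.0, 0.3}], planeDistance[{2.0, 0.3}]}   (* -> {1, 0.85} approximately *)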
Figure 1. Two-class data (black and grey dots), their optimal separating hyperplane (continuous line), and support vectors (circled in blue). This is an example output of the SVMPlot function in MathSVM. The width of the "corridor" defined by the two dotted lines connecting the support vectors is the margin of the optimal separating hyperplane.
Solving the Optimization Problem
· The Primal Problem

It turns out that the optimal separating hyperplane solving (1) can be found as the solution to the equivalent optimization problem

(2)  $\min_{w,b} \; \tfrac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1,$

referred to as the primal problem. Typically, only a small subset of the data points will attain equality in the constraint; these are termed support vectors since they are "supporting" (constraining) the hyperplane (Figure 1). In fact, the solution $(w, b)$ depends only on these specific points. Therefore, the method is also a scheme for data compression, in the sense that the support vectors contain all the information necessary to derive the decision rule.
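The equivalence of (1) and (2) rests on a standard rescaling argument, sketched briefly here since the article leaves it implicit: the hyperplane $w \cdot x + b = 0$ is unchanged when $(w, b)$ is multiplied by a positive constant, so the scale can be fixed so that the closest points satisfy $\lvert w \cdot x_i + b \rvert = 1$. Then

$\min_i d(P_{w,b}, x_i) = \min_i \frac{\lvert w \cdot x_i + b \rvert}{\lVert w \rVert} = \frac{1}{\lVert w \rVert},$

so maximizing the margin in (1) amounts to minimizing $\lVert w \rVert$, or equivalently $\tfrac{1}{2}\lVert w \rVert^2$, while the constraint $y_i (w \cdot x_i + b) \ge 1$ encodes both correct separation and this scaling convention.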
· The Dual Problem

For reasons that will become clear later, $w$ often has very high dimension, which makes the primal problem (2) intractable. Therefore, we attack (2) indirectly, by solving the dual problem

(3)  $\min_{\alpha} \; \tfrac{1}{2} \alpha^{\top} Q \alpha - \alpha^{\top} \mathbf{1} \quad \text{subject to} \quad \alpha \ge 0, \quad y^{\top} \alpha = 0,$

where $Q = (q_{ij}) = (y_i y_j \, x_i \cdot x_j)$. This is a quadratic programming (QP) problem, which has a unique solution whenever $Q$ is positive semidefinite (as is the case here). It is solved numerically in MathSVM by the function QPSolve. (At this point we need to load the MathSVM package; see Additional Material.)
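To give a concrete feel for the dual QP (3), here is a rough sketch on toy data that uses Mathematica's built-in NMinimize as a stand-in for MathSVM's QPSolve (whose calling convention is not shown here); the data, the variable names, and the recovery of w at the end are illustrative assumptions only:

(* toy two-class data in R^2 *)
xs = {{1., 1.}, {2., 1.5}, {-1., -1.}, {-2., -0.5}};
ys = {1, 1, -1, -1};
n = Length[xs];

(* the matrix Q = (y_i y_j x_i . x_j) from the text *)
Q = Table[ys[[i]] ys[[j]] (xs[[i]] . xs[[j]]), {i, n}, {j, n}];

alphas = Array[a, n];

(* solve the dual (3) numerically; it is a convex QP, so NMinimize suffices here *)
sol = NMinimize[{1/2 alphas . Q . alphas - Total[alphas],
    (And @@ Thread[alphas >= 0]) && ys . alphas == 0}, alphas];
alphaOpt = alphas /. Last[sol];

(* weight vector of the separating hyperplane, w = Sum_i alpha_i y_i x_i *)
w = Total[alphaOpt ys xs]

The bias $b$ can then be recovered from any support vector $x_k$ (one with $\alpha_k > 0$) via $b = y_k - w \cdot x_k$.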