• Embed Doc
  • Readcast
  • Collections
  • CommentGo Back
Download
 
A Practical Guide to Support Vector Classification
Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin
Department of Computer ScienceNational Taiwan University, Taipei 106, Taiwan
Last updated: December 20, 2007
Abstract
Support vector machine (SVM) is a popular technique for classification.However, beginners who are not familiar with SVM often get unsatisfactoryresults since they miss some easy but significant steps. In this guide, we proposea simple procedure, which usually gives reasonable results.
1 Introduction
SVM (Support Vector Machine) is a useful technique for data classification. Eventhough people consider that it is easier to use than Neural Networks, however, userswho are not familiar with SVM often get unsatisfactory results at first. Here wepropose a “cookbook” approach which usually gives reasonable results.Note that this guide is not for SVM researchers nor do we guarantee the bestaccuracy. We also do not intend to solve challenging or difficult problems. Ourpurpose is to give SVM novices a recipe to obtain acceptable results fast and easily.Although users do not need to understand the underlying theory of SVM, nev-ertheless, we briefly introduce SVM basics which are necessary for explaining ourprocedure. A classification task usually involves with training and testing data whichconsist of some data instances. Each instance in the training set contains one “targetvalue” (class labels) and several “attributes” (features). The goal of SVM is to pro-duce a model which predicts target value of data instances in the testing set whichare given only the attributes.Given a training set of instance-label pairs (
x
i
,y
i
)
,i
= 1
,...,l
where
x
i
R
n
and
y
{
1
,
1
}
l
, the support vector machines (SVM) (Boser, Guyon, and Vapnik 1992;Cortes and Vapnik 1995) require the solution of the following optimization problem:min
w
,b,
ξ
12
w
w
+
l
i
=1
ξ
i
subject to
y
i
(
w
φ
(
x
i
) +
b
)
1
ξ
i
,
(1)
ξ
i
0
.
1
 
Table 1: Problem characteristics and performance comparisons.Applications #training #testing #features #classes Accuracy Accuracydata data by users by ourprocedureAstroparticle
1
3,089 4,000 4 2 75.2% 96.9%Bioinformatics
2
391 0
4
20 3 36% 85.2%Vehicle
3
1,243 41 21 2 4.88% 87.8%Here training vectors
x
i
are mapped into a higher (maybe infinite) dimensional spaceby the function
φ
. Then SVM finds a linear separating hyperplane with the maximalmargin in this higher dimensional space.
C >
0 is the penalty parameter of the errorterm. Furthermore,
(
x
i
,
x
 j
)
φ
(
x
i
)
φ
(
x
 j
) is called the kernel function. Thoughnew kernels are being proposed by researchers, beginners may find in SVM books thefollowing four basic kernels:
linear:
(
x
i
,
x
 j
) =
x
i
x
 j
.
polynomial:
(
x
i
,
x
 j
) = (
γ 
x
i
x
 j
+
r
)
d
,
γ >
0.
radial basis function (RBF):
(
x
i
,
x
 j
) = exp(
γ 
x
i
x
 j
2
),
γ >
0.
sigmoid:
(
x
i
,
x
 j
) = tanh(
γ 
x
i
x
 j
+
r
).Here,
γ 
,
r
, and
d
are kernel parameters.
1.1 Real-World Examples
Table1presents some real-world examples. These data sets are reported from ourusers who could not obtain reasonable accuracy in the beginning. Using the procedureillustrated in this guide, we help them to achieve better performance. Details are inAppendixA.These data sets are at
1
Courtesy of Jan Conrad from Uppsala University, Sweden.
2
Courtesy of Cory Spencer from Simon Fraser University, Canada (Gardy et al. 2003).
3
Courtesy of a user from Germany.
4
As there are no testing data, cross-validation instead of testing accuracy is presented here.Details of cross-validation are in Section3.2.
2
 
1.2 Proposed Procedure
Many beginners use the following procedure now:
Transform data to the format of an SVM software
Randomly try a few kernels and parameters
TestWe propose that beginners try the following procedure first:
Transform data to the format of an SVM software
Conduct simple scaling on the data
Consider the RBF kernel
(
x
,
y
) =
e
γ 
x
y
2
Use cross-validation to find the best parameter
and
γ 
Use the best parameter
and
γ 
to train the whole training set
5
TestWe discuss this procedure in detail in the following sections.
2 Data Preprocessing
2.1 Categorical Feature
SVM requires that each data instance is represented as a vector of real numbers.Hence, if there are categorical attributes, we first have to convert them into numericdata. We recommend using
m
numbers to represent an
m
-category attribute. Onlyone of the
m
numbers is one, and others are zero. For example, a three-categoryattribute such as
{
red, green, blue
}
can be represented as (0,0,1), (0,1,0), and (1,0,0).Our experience indicates that if the number of values in an attribute is not too many,this coding might be more stable than using a single number to represent a categoricalattribute.
5
The best parameter might be affected by the size of data set but in practice the one obtainedfrom cross-validation is already suitable for the whole training set.
3
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
You must be to leave a comment.
Submit
Characters: ...