
Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, Qingdao, 11-14 July 2010

SUPPORT VECTOR MACHINE BASED ON A NEW REDUCED SAMPLES METHOD
SHU-XIA LU, JIE MENG, GUI-EN CAO

Key Lab. of Machine Learning and Computational Intelligence, College of Mathematics and Computer Science, Hebei
University, Baoding 071002, China
E-MAIL: mclsx@hbu.cn, mengjie655@163.com

Abstract:
The support vectors play an important role in training, since they determine the optimal hyper-plane. Because an SVM classification problem usually contains many non-support vectors and only a few support vectors, this paper proposes a method to remove the samples that may not be support vectors. First, Support Vector Domain Description is adopted to find the smallest sphere containing most of the data points, and the objects outside the sphere are removed. Second, the edge points are removed based on the distance of each pattern to the centers of the other classes. Compared with the standard SVM, the experimental results show that the new algorithm is capable of reducing the number of samples as well as the training time while maintaining high accuracy.

Keywords:
Support Vector Domain Description; Distance; Reduce

1. Introduction

Support Vector Machines (SVMs), based on Vapnik's statistical learning theory [1], are a machine learning method well suited to small datasets and have played an important role in many areas, owing to good properties such as margin maximization and the kernel technique applied in a high-dimensional feature space. Moreover, SVMs have high fitting accuracy, only a small number of tunable parameters, and find the global solution; these properties have led to successful applications in many fields, ranging from face recognition to biological data processing for medical diagnosis and time series prediction.

Structural Risk Minimization, whose basic idea is to find a function that minimizes the expected error on new data [2], is the main theoretical basis of SVM. The goal of SVM is to find an optimal hyper-plane that separates the positive class from the negative class with the largest margin. However, with a large number of training samples, SVM training becomes very time-consuming and the memory consumption may blow up, so reducing the number of training samples is of practical significance. So far, many papers on sample reduction have appeared. In reference [3], a method is proposed to weed out training samples which cannot be support vectors by adopting a fuzzy membership function. Reference [4] proposes a fuzzy support vector machine with dismissing margin based on a class-center method. Another algorithm, named sample reduction by data structure analysis (SR-DSA), was proposed to improve the scalability of SVMs [5].

Support Vector Domain Description (SVDD) is a data description method inspired by the support vector machine. The idea, first presented by Tax and Duin [6, 9], is to find the sphere with minimal volume (or minimal radius) containing most of the objects. SVDD obtains a spherically shaped boundary around a dataset, and the boundary can be made flexible by using kernel functions. It is often used for outlier detection through classification or clustering; for example, a soft clustering algorithm in which each cluster is modeled by SVDD has been proposed [10].

For large datasets, it is necessary to reduce the training samples. In this paper a new method to remove the samples that may be non-support vectors is proposed. The remainder of this paper is organized as follows. Section 2 provides a brief review of SVMs. Section 3 describes the Support Vector Domain Description algorithm. Section 4 introduces the new algorithm for removing samples that may be non-support vectors: Support Vector Domain Description is first adopted to find the smallest sphere containing most of the data objects and the objects outside the sphere are removed, and then the edge points are removed based on the distance of each pattern to the centers of the other classes. Section 5 reports the experimental results, which demonstrate that the method is feasible and performs well. The last section concludes the paper.

2. Support Vector Machine

Given $l$ training pairs

$\{(x_i, y_i)\mid x_i\in\mathbb{R}^n,\ y_i\in\{1,-1\},\ i=1,\dots,l\}$   (1)


where $x_i$ is a vector in the $n$-dimensional input space and $y_i$ is the label of $x_i$. SVMs search for the separating hyper-plane with the largest margin, which is called the optimal hyper-plane; this hyper-plane separates the positive points from the negative points. To find the optimal hyper-plane, we solve the following quadratic programming problem:

$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i$
$\text{s.t.}\ y_i(w\cdot\phi(x_i)+b)\ge 1-\xi_i,$   (2)
$\xi_i\ge 0,\ i=1,2,\dots,l$

where $\xi_i$ is a measure of the misclassification error and $C$ is a trade-off parameter controlling how much the slack variables are penalized.

The SVM algorithm is as follows:
(1) Suppose the dataset is

$\{(x_i, y_i)\mid x_i\in\mathbb{R}^n,\ y_i\in\{1,-1\},\ i=1,\dots,l\}$   (3)

(2) Choose a proper penalty parameter $C$, then construct and solve the Wolfe dual problem:

$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i\cdot x_j) - \sum_{j=1}^{l}\alpha_j$
$\text{s.t.}\ \sum_{i=1}^{l} y_i\alpha_i = 0,$   (4)
$0\le\alpha_i\le C,\ i=1,\dots,l$

The problem is posed in its Wolfe dual form with respect to the Lagrange multipliers $\alpha_i\in[0,C],\ i=1,\dots,l$, and can be solved by standard quadratic optimization packages. Suppose $\alpha^{*}=(\alpha_1^{*},\dots,\alpha_l^{*})$ is a solution of the Wolfe dual problem.
(3) There exists some $\alpha_j^{*}\in(0,C)$, from which we obtain

$w^{*}=\sum_{i=1}^{l}\alpha_i^{*} y_i x_i,\qquad b^{*}=y_j-\sum_{i=1}^{l} y_i\alpha_i^{*} K(x_i, x_j)$   (5)

(4) The discriminant function is therefore given by

$f(x)=\operatorname{sgn}\Big(\sum_{i=1}^{l} y_i\alpha_i (x\cdot x_i)+b\Big)$   (6)

For nonlinear classification problems, nonlinear SVMs can be used. The basic idea of nonlinear SVMs is to map the data vectors from the input space to a high-dimensional feature space using a nonlinear mapping $\Phi$. The mapping $\Phi$ can be replaced by a kernel function $k(x_i, x_j)$ that obeys Mercer's conditions, so only the inner products between the support vectors $\Phi(x_i)$ and the pattern vector $\Phi(x)$ in the feature space need to be computed, without knowing the explicit form of $\Phi$. Commonly used kernel functions are polynomial functions, radial basis functions and certain sigmoid functions. For an unknown input pattern $x$, the discriminant function is

$f(x)=\operatorname{sgn}\Big(\sum_{i=1}^{l}\alpha_i y_i k(x_i, x)+b\Big)$   (7)
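As an illustration of the soft-margin formulation in (2)-(7), the following minimal sketch trains a C-SVM with an RBF kernel and inspects its support vectors. It is not the implementation used in this paper; it assumes scikit-learn is available and uses a synthetic two-class dataset.

```python
# Minimal sketch of training a C-SVM with an RBF kernel (cf. Eqs. (2)-(7)).
# Assumes scikit-learn; the dataset here is synthetic, not one used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
y = np.where(y == 1, 1, -1)            # labels in {1, -1}, as in Eq. (1)

clf = SVC(C=1.0, kernel="rbf")         # C plays the role of the penalty in Eq. (2)
clf.fit(X, y)

# Only the samples with nonzero alpha_i (the support vectors) enter Eq. (7).
print("number of support vectors:", clf.support_.size)
print("predicted labels:", clf.predict(X[:5]))
```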
3. Support Vector Domain Description

In the SVDD formulation of Tax and Duin, the goal is to find a sphere with minimum volume containing most of the data points. Some points are allowed to lie outside the sphere, which is handled by introducing slack variables $\xi_i$.

Given the data points $\{x_1,\dots,x_l\}$, where $l$ is the number of data points, the SVDD problem can be formulated as a constrained convex optimization problem:

$\min_{R,a,\xi_1,\dots,\xi_l}\ R^2 + C\sum_{i=1}^{l}\xi_i$
$\text{s.t.}\ \|x_i-a\|^2\le R^2+\xi_i,$   (8)
$\xi_i\ge 0,\ i=1,\dots,l$

where $R$ and $a$ are, respectively, the radius and the center of the sphere, and $C$ is a trade-off parameter controlling how much the slack variables are penalized.

The SVDD algorithm is as follows:
(1) Given the dataset $T=\{x_i\mid x_i\in\mathbb{R}^n,\ i=1,\dots,l\}$.
(2) Solve the Wolfe dual problem of the constrained convex optimization problem:

$\min_{\alpha_1,\dots,\alpha_l}\ \sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j (x_i\cdot x_j) - \sum_{i=1}^{l}\alpha_i (x_i\cdot x_i)$
$\text{s.t.}\ \sum_{i=1}^{l}\alpha_i=1,$   (9)
$0\le\alpha_i\le C$

The optimal solution is denoted as $\alpha^{*}=(\alpha_1^{*},\alpha_2^{*},\dots,\alpha_N^{*})^{T}$.
(3) Compute the center of the sphere:

$a=\sum_{i=1}^{N}\alpha_i x_i$   (10)

The radius of the smallest sphere is given by

$R=\Big(\sum_{i\in SV}\|x_i-a\|\Big)/n$   (11)

where $n$ is the number of support vectors, i.e. the points satisfying $0<\alpha_i<C$.
(4) With the smallest sphere found, the squared distance between a given test point $z$ and the center $a$ is

$\|z-a\|^2=(z\cdot z)-2\sum_i\alpha_i (z\cdot x_i)+\sum_i\sum_j\alpha_i\alpha_j (x_i\cdot x_j)$   (12)

Typically, the decision of whether $z$ belongs to the same class as the training data is made by comparing this distance with the radius: if the distance is greater than the radius, $z$ is rejected; otherwise it is accepted.


Analogous to the standard SVM in Section 2, all the inner products can simply be replaced by a kernel function $k(x_i, x_j)$; in the kernel version, the hyper-sphere lives in a high (possibly infinite) dimensional space induced by the kernel.
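To make steps (2)-(4) concrete, the sketch below solves the linear SVDD dual (9) with a generic quadratic-programming routine and then evaluates the center, radius and distance test of (10)-(12). It is only an illustration under the stated assumptions (the cvxopt QP solver, a linear kernel, and small random data), not the code used for the experiments in this paper; the helper names svdd_fit and svdd_accept are ours.

```python
# Sketch: linear SVDD via its Wolfe dual (Eq. (9)), then center/radius/test (Eqs. (10)-(12)).
# Assumes numpy and cvxopt are installed; the data below is random, for illustration only.
import numpy as np
from cvxopt import matrix, solvers

def svdd_fit(X, C=0.1):
    l = X.shape[0]
    K = X @ X.T                                    # linear kernel (x_i . x_j)
    P = matrix(2.0 * K)                            # QP form: min 1/2 a'Pa + q'a
    q = matrix(-np.diag(K))                        # -sum_i alpha_i (x_i . x_i)
    G = matrix(np.vstack([-np.eye(l), np.eye(l)]))
    h = matrix(np.hstack([np.zeros(l), C * np.ones(l)]))   # 0 <= alpha_i <= C
    A = matrix(np.ones((1, l)))
    b = matrix(1.0)                                # sum_i alpha_i = 1
    solvers.options["show_progress"] = False
    alpha = np.array(solvers.qp(P, q, G, h, A, b)["x"]).ravel()

    center = alpha @ X                             # Eq. (10)
    sv = (alpha > 1e-6) & (alpha < C - 1e-6)       # boundary points with 0 < alpha_i < C
    if not np.any(sv):
        sv = alpha > 1e-6                          # fallback: all support vectors
    radius = np.mean(np.linalg.norm(X[sv] - center, axis=1))   # Eq. (11)
    return center, radius

def svdd_accept(z, center, radius):
    # Eq. (12) reduces to the plain Euclidean distance in the linear case.
    return np.linalg.norm(z - center) <= radius

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
center, radius = svdd_fit(X, C=0.1)
print(svdd_accept(np.array([0.1, 0.2]), center, radius))
```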

4. SVM classifier based on the reduced samples

In this section the proposed algorithm is described in detail. It has two key steps: first, the samples outside the sphere are removed by the SVDD algorithm of Section 3; second, for the samples inside the sphere, the edge points are removed according to the Euclidean distance. The proposed algorithm proceeds as follows (a code sketch of the whole procedure is given at the end of this section):
(1) Cluster initialization: each class of the data is clustered into several groups using k-means. Taking a dataset with two classes as an example, suppose that the positive class $P$ is clustered into $u$ groups and the negative class $N$ into $v$ groups. Then the positive class can be denoted as
$P = P_1\cup P_2\cup\dots\cup P_u$,
and the negative class as
$N = N_1\cup N_2\cup\dots\cup N_v$.
Here we take $u=2$ and $v=1$ for example.
(2) SVDD: use the SVDD algorithm of Section 3 to obtain the smallest spheres $S_{P_1}$, $S_{P_2}$, $S_{N_1}$ for $P_1$, $P_2$, $N_1$ respectively. With formula (12), compute the distance of each point to the center of the sphere it belongs to and compare it with the radius: if the distance is greater than the radius, the point is rejected; otherwise it is accepted. Thus the points outside the sphere are rejected, and the points inside the sphere are kept.
(3) Remove the edge points: the points inside the sphere are further filtered by the Euclidean distance. Given a point $x_i$ belonging to the positive group $S_{P_1}$, compute two distances: the distance from $x_i$ to the center of the negative class $S_{N_1}$, and the distance between the center of the positive group and the center of the negative class. If the former distance is greater than the latter, the point is rejected. The reduced samples can be formulated as

$X_{S_{P_1}} = S_{P_1} - \{x_i \mid x_i\in S_{P_1},\ d^2(x_i, S_{N_1}) > d^2(S_{P_1}, S_{N_1})\}$   (13)

where $d^2(S_{P_1}, S_{N_1})$ is the latter distance mentioned above. The same method is used to reduce the samples of the other positive group $S_{P_2}$. In the example above, the positive class has two groups while the negative class has only one group, so the negative class is first reduced with this method with respect to one of the positive groups ($S_{P_1}$), and then the remaining negative samples are reduced in the same way with respect to the other group ($S_{P_2}$); this yields the reduced sample set.
(4) The samples retained after the above three steps are trained by the SVM classifier to find the optimal hyper-plane.

Figure 1. The initial clusters of the samples.

Figure 2. The reduced samples by SVDD.

Figure 3. The reduced samples based on the Euclidean distance.
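The following sketch strings steps (1)-(4) together for a two-class problem: k-means grouping, SVDD filtering, edge-point removal in the spirit of (13), and a final SVM. It reuses the hypothetical svdd_fit and svdd_accept helpers from the Section 3 sketch; the function names, parameters and random data are illustrative assumptions, not the authors' implementation, and the negative class is reduced against the overall positive center for brevity rather than against each positive group in turn as described above.

```python
# Sketch of the proposed reduction pipeline (Section 4); reuses the hypothetical
# svdd_fit / svdd_accept helpers from the Section 3 sketch. Illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def reduce_class(groups, opposite_center, C_svdd=0.1):
    """Steps (2)-(3): SVDD filtering, then edge-point removal in the spirit of Eq. (13)."""
    kept = []
    for G in groups:
        center, radius = svdd_fit(G, C=C_svdd)                 # step (2)
        G = G[np.array([svdd_accept(x, center, radius) for x in G])]
        d_to_opp = np.sum((G - opposite_center) ** 2, axis=1)  # d^2(x_i, opposite class)
        d_centers = np.sum((center - opposite_center) ** 2)    # d^2(group center, opposite class)
        kept.append(G[d_to_opp <= d_centers])                  # step (3)
    return np.vstack(kept)

# Step (1): cluster each class into groups with k-means (u = 2, v = 1 as in the text).
rng = np.random.default_rng(0)
P = rng.normal(loc=+2.0, size=(100, 2))        # positive class (synthetic)
N = rng.normal(loc=-2.0, size=(100, 2))        # negative class (synthetic)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(P)
P_groups = [P[labels == k] for k in range(2)]
N_groups = [N]

P_red = reduce_class(P_groups, opposite_center=N.mean(axis=0))
N_red = reduce_class(N_groups, opposite_center=P.mean(axis=0))

# Step (4): train the SVM on the reduced samples.
X_red = np.vstack([P_red, N_red])
y_red = np.hstack([np.ones(len(P_red)), -np.ones(len(N_red))])
clf = SVC(C=1.0, kernel="rbf").fit(X_red, y_red)
print("reduced samples:", len(X_red), "support vectors:", clf.support_.size)
```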


Figure 4. The final reduced samples.

5. Experimental analysis

We train an SVM classifier on the reduced samples and compare the results with those obtained by training an SVM classifier directly on the original samples. The comparison shows that the proposed method performs well in decreasing the number of training samples, improving the classification accuracy and accelerating the training of the SVM.

To demonstrate the performance of the proposed algorithm, we report experimental results on the datasets svmguide3, german.numer-scale, cod-rna-scale, svmguide1-scale and ijcnn1, which are all from the LIBSVM datasets [13]. We use LIBSVM for SVM training in Matlab 7.1. All the programs were executed on a 2.93 GHz CPU with 1.98 GB of memory. In the tables, Ns denotes the number of training samples and Nsv denotes the number of support vectors. The experimental results are as follows.

TABLE 1. THE DATASETS

datasets              Ns      feature
german.numer-scale    1000    24
svmguide3             1243    21
svmguide1-scale       3089    4
ijcnn1                49990   22
cod-rna-scale         59535   8

TABLE 2. THE OPTIONS OF THE TRAINING PARAMETERS

datasets              t        c     v
german.numer-scale    rbf      1     10
svmguide3             rbf      10    10
svmguide1-scale       linear   100   10
ijcnn1                rbf      1     10
cod-rna-scale         rbf      10    10

In Table 2, the type of SVM is C-SVC; t is the type of kernel function, c is the penalty parameter, and v is the number of folds used in cross validation.
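As a rough illustration of how one row of Table 2 translates into a training call (C-SVC, RBF kernel, c = 1, 10-fold cross validation, in the spirit of the german.numer-scale setting), the sketch below uses scikit-learn rather than the LIBSVM/Matlab setup of the experiments, and synthetic data instead of the actual dataset.

```python
# Rough analogue of one Table 2 setting (C-SVC, RBF kernel, c = 1, v = 10 folds),
# using scikit-learn instead of the paper's LIBSVM/Matlab setup; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=24, random_state=0)
scores = cross_val_score(SVC(C=1.0, kernel="rbf"), X, y, cv=10)  # 10-fold CV
print("mean CV accuracy: %.4f" % scores.mean())
```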
TABLE 3. THE EXPERIMENTAL RESULTS OF THE ORIGINAL SAMPLES

datasets              Ns      Nsv    Accuracy(%)   Time(s)
german.numer-scale    1000    611    76.0000       0.86908
svmguide3             1243    521    80.7723       0.90878
svmguide1-scale       3089    382    95.3707       1.49830
ijcnn1                49990   8529   93.4407       994.79470
cod-rna-scale         59535   9182   94.9677       1282.69580

TABLE 4. THE EXPERIMENTAL RESULTS OF THE REDUCED SAMPLES

datasets              Ns      Nsv    Accuracy(%)   Time(s)
german.numer-scale    200     86     85.0000       0.030665
svmguide3             199     57     90.9548       0.022464
svmguide1-scale       450     14     99.3333       0.059495
ijcnn1                7835    1161   96.0562       11.9828
cod-rna-scale         16078   2195   96.4548       42.9563

The above experimental results show that the proposed sample reduction algorithm performs well. For the german.numer-scale dataset, the number of samples is reduced from 1000 to 200, the number of support vectors is reduced from 611 to 86, the accuracy increases from 76% to 85%, and the training time is shortened. For the svmguide3 dataset, the number of samples is reduced from 1243 to 199, the number of support vectors is reduced from 521 to 57, the accuracy increases from 80.7723% to 90.9548%, and the training time is shortened. For the svmguide1-scale dataset, the number of samples is reduced from 3089 to 450, the number of support vectors is greatly reduced from 382 to 14, the accuracy increases from 95.3707% to 99.3333%, and the training time is also decreased after the reduction. For the large datasets ijcnn1 and cod-rna-scale, the numbers of training samples and support vectors are both decreased greatly, and the training time is greatly shortened. This is most obvious for the larger dataset cod-rna-scale: the numbers of training samples and support vectors are both decreased greatly, the accuracy is improved from 94.9677% to 96.4548%, and the training time is decreased from 1282.69580 s to 42.9563 s.

As the results above show, the proposed algorithm, especially for large datasets, performs well in decreasing the consumption of computer memory, improving the classification accuracy and accelerating the training of the SVM.

6. Conclusion

In this paper we present a new sample reduction method, which first reduces the training samples through the SVDD algorithm and then removes the edge points based on the Euclidean distance. The experimental results show that the new algorithm is capable of reducing the number of samples as well as the training time while maintaining high accuracy.

Acknowledgements

This research is supported in part by the Natural Science Foundation of Hebei Province (No. F2008000635), the plan of the Natural Science Foundation of Hebei University (doctor project) (No. Y2008122), the key project foundation of applied fundamental research of Hebei Province (No. 08963522D), and the Scientific Research Project of the Department of Education of Hebei Province (No. 2009107).

References

[1] Vapnik V N. The Nature of Statistical Learning Theory. New York, Berlin: Springer, 1995.
[2] Xuegong Zhang, Introduction to statistical learning theory and support vector machine, Acta Automatica Sinica, Vol. 26, No. 1, Jan. 2000.
[3] Hua Yan, Deshan Sun, Fuzzy support vector machine of dismissing margin, Computer Engineering and Applications, Vol. 45, No. 26, pp. 107-109, 2009.
[4] Shujuan Cao, Xiaomao Liu, Fuzzy support vector machine of dismissing margin based on the method of class-center, Computer Engineering and Applications, Vol. 42, No. 22, pp. 146-149, 2006.
[5] Defeng Wang, Lin Shi, Selecting valuable training samples for SVMs via data structure analysis, Neurocomputing, Vol. 71, pp. 2772-2781, 2008.
[6] D.M.J. Tax and R.P.W. Duin, Support vector domain description, Pattern Recognition Letters, Vol. 20, pp. 1191-1199, 1999.
[7] Jinjin Liang, Sanyang Liu, De Wu, Reduced support vector domain description method RSVDD, Journal of Xidian University (Natural Science Edition), Vol. 35, No. 5, pp. 928-929, Oct. 2008.
[8] Fang Zhu, Junhua Gu, New reduction strategy of large-scale training samples set for SVM, Journal of Computer Applications, Vol. 29, No. 10, pp. 2736-2740, Oct. 2009.
[9] Tax D M J. One-class Classification: Concept-learning in the Absence of Counter-examples [D]. Netherlands: Delft University, 2001.
[10] Manuele Bicego, Mario A.T. Figueiredo, Soft clustering using weighted one-class support vector machines, Pattern Recognition, Vol. 42, pp. 27-32, 2009.
[11] J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 955-974, 1998.
[12] Camastra, A. Verri, A novel kernel method for clustering, IEEE Trans. Pattern Anal. Mach. Intell., Vol. 27, pp. 801-805, 2005.
[13] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[14] C.L. Blake, C.J. Merz, UCI repository of machine learning databases, Department of Information and Computer Sciences, University of California, Irvine. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
