
Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004

A HEURISTIC ALGORITHM TO INCREMENTAL SUPPORT VECTOR MACHINE LEARNING

ZHONG-WEI LI, JIAN-PEI ZHANG, JING YANG

College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
E-MAIL: davis525@sina.com, zhangjianpei@0451.com, yj_fmt@0451.com

Abstract: Incremental learning techniques are possible solutions for handling vast data as information from the Internet is updated ever faster. The Support Vector Machine works well as an incremental learning model, with improved performance, because of its outstanding power to summarize the data space in a concise way. This paper proposes a heuristic algorithm for incremental learning with SVM that takes the possible impact of new training data on the history data into account. The idea of this heuristic algorithm is that the fewer elements the partition difference set has, the closer the existing hyperplane is to the optimal one. The new support vectors in this algorithm consist of the existing support vectors and the partition difference set of the new training data and the history data induced by the separating hyperplane. The algorithm improves classification precision by adding the partition difference set, and decreases the computation complexity by constructing the new classification hyperplane on the support vector set. The experiment results show that this heuristic algorithm is efficient and effective in improving the classification precision.

Keywords: Incremental learning; SVM; classification; machine learning

1. Introduction

With the rapid development of modern computing and information technologies, a large amount of data is produced in engineering and business fields, and there is a need to scale up inductive learning algorithms to handle more training data. It is impossible to consider all training data simultaneously to accurately estimate the class distributions. Moreover, these real-world data sets are commonly far too large to fit in main memory at once, or are stored in secondary storage devices, making their access particularly expensive. One possible approach to these problems is incremental learning, in which the classifier is trained using some incremental learning technique whereby only subsets of the data are considered at any one time and the results are subsequently combined [1].

Support vector machine (SVM) is a general class of statistical learning architectures that perform structural risk minimization on a nested set structure of separating hyperplanes, and it has been successfully used as a classification tool in a variety of areas. One outstanding feature of SVM is the sparse representation of the decision boundary it provides. The location of the separating hyperplane is specified via real-valued weights on the training data: data that lie far away from the hyperplane do not participate in its specification and therefore receive zero weight, and only data that lie close to the decision boundary between the two classes, the support vectors, receive non-zero weights. Since the design of SVM allows the number of support vectors to be small compared with the total number of training data, they provide a compact representation of the data to which new data can be added as they become available; therefore, SVM seems well suited to incremental training.

The rest of this paper is organized as follows. In Section 2, we introduce basic definitions of SVM and the batch incremental learning model. In Section 3, we present our algorithm for solving incremental learning for vast data classification. In Section 4, we run experiments on real datasets to evaluate the proposed learning algorithm. In Section 5, we conclude the paper.

2. SVM and incremental learning

2.1. Support vector machine

SVM has recently gained prominence in the field of machine learning and pattern classification. Classification is achieved by realizing a linear or non-linear separation surface in the input space [2,6].

For a classification problem, given l sample data points ((x_1, y_1), (x_2, y_2), ..., (x_l, y_l)), SVM training involves solving a quadratic programming problem, and the optimal solution gives rise to a decision function of the following form:

    f(x) = sgn( Σ_{i=1}^{l} α_i y_i (x · x_i) + b )

Often, only a small fraction of the α_i coefficients (Lagrange multipliers) are non-zero; the corresponding samples are called support vectors. A key property of SVM is that training an SVM on the support vectors alone gives the same result as training on the complete example set. The remaining samples may be regarded as redundant and negligible because they do not contribute to the decision function.

When the training data are non-separable, one can transform the set of input samples into a higher dimensional feature space using a map Φ: x_i → z_i, and then execute a linear separation. This leads to:

    f(x) = sgn( Σ_{i=1}^{l} α_i y_i K(x, x_i) + b )

where K(x, x_i) = (Φ(x) · Φ(x_i)) is known as a kernel function.

The kernel function allows us to construct an optimal separating hyperplane in the new feature space without explicitly performing calculations in it; it should be easy to compute, well-defined, and span a sufficiently rich hypothesis space. Unfortunately, training the Support Vector Machine itself with a kernel function can be very time and memory consuming, since for large amounts of data the kernel matrix must be stored and computed.
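As a concrete illustration of the sparseness property described above, the following is a minimal sketch (not from the paper; it assumes scikit-learn's SVC and synthetic stand-in data) that trains a kernel SVM and checks that retraining on its support vectors alone reproduces essentially the same decision function.

# Minimal sketch (illustrative only, not the authors' code): train a kernel SVM
# and verify that retraining on the support vectors alone gives (almost) the
# same classifier.  Data, kernel and dimensions are placeholder assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=27, random_state=0)

# Decision function: f(x) = sgn( sum_i alpha_i * y_i * K(x, x_i) + b ),
# with non-zero alpha_i only for the support vectors.
full = SVC(kernel="poly", degree=3).fit(X, y)

sv = full.support_                                   # indices of the support vectors
reduced = SVC(kernel="poly", degree=3).fit(X[sv], y[sv])

print("support vectors:", len(sv), "of", len(X))
print("prediction agreement:", np.mean(full.predict(X) == reduced.predict(X)))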
2.2. Batch incremental SVM algorithm

Given that only a small fraction of the training data end up as support vectors, the SVM is able to summarize the data space in a very concise manner. This suggests a feasible method: partition the training sample data set into batches (subsets) that fit into memory, and for each new batch of samples train an SVM on the new samples together with the support vectors from the previous learning step (see Figure 1) [9]. According to the important properties of support vectors, we can expect to get an incremental result that is equal to the non-incremental result if the last training set contains all samples that are support vectors in the non-incremental case.

Figure 1. Batch incremental SVM learning model
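The following is a rough sketch of this batch scheme (an assumed Python/scikit-learn rendering, not the authors' implementation): each step trains an SVM on the new batch plus the support vectors carried over from the previous step.

# Sketch of the batch incremental SVM model of Figure 1 (assumptions: binary
# labels, scikit-learn's SVC; kernel parameters are illustrative only).
import numpy as np
from sklearn.svm import SVC

def batch_incremental_svm(batches, kernel="poly", degree=3):
    """batches: iterable of (X_batch, y_batch) pairs arriving over time."""
    sv_X, sv_y, model = None, None, None
    for X_new, y_new in batches:
        if sv_X is None:
            X_train, y_train = X_new, y_new
        else:
            # new batch plus the support vectors of the previous step
            X_train = np.vstack([X_new, sv_X])
            y_train = np.concatenate([y_new, sv_y])
        model = SVC(kernel=kernel, degree=degree).fit(X_train, y_train)
        sv_X, sv_y = X_train[model.support_], y_train[model.support_]
    return model, sv_X, sv_y

Note that every point of every batch is still passed through a full SVM training run, which is the time cost discussed next.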
The reasoning behind the batch learning model is the assumption that the batches of data will be appropriate samples of the data, while the problem is that the learning results are subject to the number of batches and the state of the data distribution. Another disadvantage of this learning model is that the time consumption is not improved, since every data point in each subset must be computed to tell whether it is a support vector or not.

3. Heuristic incremental SVM algorithm

It has been proved that the location of the optimal hyperplane is related to a linear combination of the support vectors. This implies that the key to constructing the optimal hyperplane is collecting more useful data as support vectors during incremental learning [2]. Most incremental learning algorithms are based on improving the SVM algorithm and collecting more useful data as support vectors [3-5], while some are based on concept drift [8]. The heuristic algorithm belongs to the first case.

Consider a series of hyperplanes ψ_i that close in on the optimal hyperplane gradually. During this gradual change, the difference between the partitions of the data set induced by any two successive hyperplanes shrinks, ideally even to zero when assuming the optimal hyperplane can separate the data completely and correctly.

The main idea of this consideration is that the data points in the difference are much closer to the optimal hyperplane, and the series of hyperplanes gets gradually closer to the optimal one as the data points in the partition difference set become fewer and fewer, until there are fewer than a given ε. The statistical meaning of this consideration, shown in Figure 2, suggests a heuristic incremental algorithm.

Figure 2. The partition of data set by different hyperplanes


Given: history data set X_i and its support vector set SV_i, with the optimal hyperplane ψ_i constructed from SV_i;
       new data set X_{i+1}.
Initialize SV = SV_i, SV_c = ∅.
Do
  1. Partition X_i into X_i^+ and X_i^- with ψ_i;
  2. Train on X_{i+1}, pick up its support vector set SV_{i+1} and construct ψ_{i+1}; partition X_{i+1} into X_{i+1}^+ and X_{i+1}^- with ψ_{i+1};
  3. Set SV = SV + SV_{i+1};
  4. Construct ψ with SV;
  5. Partition X_i into X_i^{+'} and X_i^{-'} with ψ; partition X_{i+1} into X_{i+1}^{+'} and X_{i+1}^{-'} with ψ;
  6. Set SV_c = (X_i^+ − X_i^{+'}) + (X_i^{+'} − X_i^+) + (X_{i+1}^+ − X_{i+1}^{+'}) + (X_{i+1}^{+'} − X_{i+1}^+);
  7. Set SV = SV + SV_c;
  8. Set i = i + 1;
While (|SV_c| ≥ ε)
Store and output the final SV and ψ.

Figure 3. Pseudo-code of the proposed heuristic incremental SVM algorithm
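To make the steps of Figure 3 concrete, here is an illustrative sketch of a single update step (an assumed Python/scikit-learn rendering, not the authors' implementation; the kernel and its parameters are placeholders). Points whose side of the hyperplane changes between ψ_i and the re-trained ψ form the partition difference set SV_c.

# Hypothetical rendering of one pass of the loop in Figure 3 (steps 1-7).
# Assumptions: binary labels, scikit-learn's SVC, and "partition" taken as the
# sign of the decision function.
import numpy as np
from sklearn.svm import SVC

def heuristic_step(model_i, sv_X, sv_y, X_hist, y_hist, X_new, y_new):
    # 1. Partition the history data X_i with the existing hyperplane psi_i.
    side_hist_old = model_i.decision_function(X_hist) >= 0

    # 2. Train on the new data alone and partition it with psi_{i+1}.
    model_new = SVC(kernel="poly", degree=3).fit(X_new, y_new)
    side_new_old = model_new.decision_function(X_new) >= 0

    # 3-4. Merge the support vector sets and construct psi from SV.
    sv_X = np.vstack([sv_X, X_new[model_new.support_]])
    sv_y = np.concatenate([sv_y, y_new[model_new.support_]])
    model = SVC(kernel="poly", degree=3).fit(sv_X, sv_y)

    # 5. Re-partition both data sets with psi.
    side_hist = model.decision_function(X_hist) >= 0
    side_new = model.decision_function(X_new) >= 0

    # 6. Partition difference set SV_c: points that changed sides.
    diff_hist = side_hist_old != side_hist
    diff_new = side_new_old != side_new

    # 7. Add SV_c to the support vector set (psi is rebuilt in the next step).
    sv_X = np.vstack([sv_X, X_hist[diff_hist], X_new[diff_new]])
    sv_y = np.concatenate([sv_y, y_hist[diff_hist], y_new[diff_new]])

    n_diff = int(diff_hist.sum() + diff_new.sum())
    return model, sv_X, sv_y, n_diff

A caller would repeat this step for each arriving subset and stop once n_diff falls below ε, keeping the final support vector set and model.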


Figure 3 shows the pseudo-code of this heuristic incremental SVM algorithm. When constructing the hyperplane and partitioning the two data sets iteratively, the computation is almost entirely focused on constructing the hyperplane from support vectors and on the set partition and combination operations; the new data set is trained only once, at the first step. Therefore, the computation complexity and the learning time are reduced compared with the batch incremental SVM learning algorithm, in which every data point in the data set must be computed to choose the support vectors. The experiments that follow also prove this consideration correct.
4. Experiment

We compare the batch incremental SVM learning algorithm and the heuristic incremental SVM learning algorithm by conducting an experiment on a business text database. This is a hand-labeled dataset consisting of 1675 data points, each having a dimension of 27. We randomly take 368 data points as the initial training set and 216 data points as the test set, separate the remaining data points into 4 subsets, and use the polynomial kernel. The value of ε is 3% of the size of each subset (about 9 points for a subset of 288). Table 1 provides the classification precision results for this experiment. Because data points in the partition difference set are added to the support vector set, the number of support vectors of the heuristic incremental SVM learning algorithm is larger than in the batch incremental case. Table 2 provides the support vectors of the two algorithms at every incremental step.

Table 1. Classification precision of two incremental algorithms

                  Number   batch incremental SVM (%)   heuristic incremental SVM (%)
  training data   368      92.7                        92.1
  subset 1        288      93.1                        93.6
  subset 2        326      93.4                        94.1
  subset 3        205      93.5                        94.7
  subset 4        272      93.6                        95.0

Table 2. Support vectors of two incremental algorithms

                  Number   batch incremental SVM (#SVs)   heuristic incremental SVM (#SVs)
  training data   368      46                             46
  subset 1        288      111                            131
  subset 2        326      115                            146
  subset 3        205      138                            163
  subset 4        212      165                            182


The following criteria are often used to evaluate an incremental classification algorithm: the classification precision, the scalability of learning and classification on large data sets, and the robustness to noise. It can be seen from Table 1 that the classification precision improves over the incremental steps. Since the given ε is determined only by the scale of the data and the required classification precision, and not by the state of the data distribution, the algorithm proposed in this paper satisfies these criteria on the whole.

5. Conclusions

SVM is well suited to incremental learning for vast data classification because of its outstanding power to summarize the data space. A heuristic incremental SVM learning algorithm is proposed based on considering the possible impact of a new data set on the history data. It collects, from the partition difference of the training data sets, more of the data points that contribute to the final hyperplane as support vectors. Experiments proved that this algorithm is efficient in dealing with vast data classification problems and achieves higher classification precision.

The results achieved in this paper are promising, and additional research will be performed in the future on the classification of large amounts and varieties of data.

Acknowledgements

This paper is sponsored by the Natural Science Foundation of Heilongjiang Province under Grant No. F0304.

References

[1] F. J. Provost and V. Kolluri. "A survey of methods for scaling up inductive learning algorithms", Technical Report ISL-97-3, Intelligent Systems Lab., Department of Computer Science, University of Pittsburgh, 1997.
[2] V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
[3] E. Osuna, R. Freund, and F. Girosi. "An improved training algorithm for support vector machines", Proceedings of IEEE NNSP'97, Amelia Island, FL, pp. 276-285, September 1997.
[4] Xiao Rong, Wang Jicheng, Sun Zhengxing, and Zhang Fuyan. "An Approach to Incremental SVM Learning Algorithm", Journal of Nanjing University (Natural Sciences), Vol. 38, No. 2, pp. 152-157, Mar. 2002.
[5] Zeng Wen-hua and Ma Jian. "A Novel Approach to Incremental SVM Learning Algorithm", Journal of Xiamen University (Natural Science), Vol. 41, No. 6, pp. 687-691, Nov. 2002.
[6] C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition, Kluwer Academic Publishers, Boston, 1998.
[7] N. Syed, H. Liu, and K. Sung. "Incremental learning with support vector machines", Proceedings of the IJCAI Conference, Sweden, August 1999.
[8] R. Klinkenberg and T. Joachims. "Detecting concept drift with support vector machines", Proceedings of the 17th ICML Conference, Morgan Kaufmann, June 2000.
[9] P. Mitra, C. A. Murthy, and S. K. Pal. "Data Condensation in Large Databases by Incremental Learning with Support Vector Machines", Proceedings of the ICPR Conference, Spain, September 2000.
