15 views

Uploaded by mr

Semi-supervised Support Vector Machine

- A Re-Learning Based Post-Processing Step For Brain Tumor Segmentation From Multi-Sequence Images
- kapitaselekta
- Support Vector Machine
- resume
- CardiacAbnormality+SVM
- Ya Seen 2015
- An Ontology-Based Sentiment Classification Methodology for Online Consumer Reviews
- Principles of Support Vector Machine
- Support Vector Machine
- Describing People a Poselet-Based Approach to Attribute Classification - Bourdev, Maji, Malik - Proceedings of International Conference on Computer Vision - 2011
- A Hybrid Approach for Classification of DICOM Image
- -Handwriting Recognition
- A Comparative Study of Content Based Image Retrieval Trends and Approaches
- Data Mining 101
- ML-Advice Machine Learning
- Chatbot Synopsis
- An improved fuzzy twin support vector machine
- Pioneering Method to Analyse Depression in Human Being Using Audio-Video Parameters
- glossary
- Su Et Al-2016-Structural Control and Health Monitoring

You are on page 1of 10

DOI 10.1007/s00521-015-2113-7

REVIEW

Shifei Ding1,2 Zhibin Zhu1 Xiekai Zhang1

The Natural Computing Applications Forum 2015

learning method based on statistical learning theory. It has

a lot of advantages, such as solid theoretical foundation, In recent decades, the ways of collecting data are more

global optimization, the sparsity of the solution, nonlinear diverse, and the amount of data is also growing. With the

and generalization. The standard form of SVM only rapid development of various technologies, it becomes easy

applies to supervised learning. Large amount of data to collect large amounts of data, but most of them have no

generated in real life is unlabeled, and the standard form label. Usually, a huge amount of work is required to label

of SVM cannot make good use of these data to improve its the data. However, the data that people obtain in every day

learning ability. However, semi-supervised support vector are massive, and this means that it is unfeasible to invest a

machine (S3VM) is a good solution to this problem. This lot of resources in the work. The generalization ability of a

paper reviews the recent progress in semi-supervised learning system often depends on the number of labeled

support vector machine. First, the basic theory of S3VM is training samples. If there are small amounts of labeled data,

expounded and discussed in detail; then, the mainstream combining labeled and unlabeled data to change or

model of S3VM is presented, including transductive sup- improve the learning behavior would be an interesting

port vector machine, Laplacian support vector machine, work, and the semi-supervised learning [13] is supposed

S3VM training via the label mean, S3VM based on cluster to do that.

kernel; finally, we give the conclusions and look ahead to Support vector machine is a machine learning method

the research on S3VM. based on statistical learning theory. It has a lot of advan-

tages, such as solid theoretical foundation, global opti-

Keywords Semi-supervised Support vector machine mization, the sparsity of the solution, nonlinear and

Semi-supervised support vector machine generalization. And it also shows a good prospect in

practical engineering fields [48]. The standard form of

SVM only applies to supervised learning, which training

data must be labeled. However, in many practical prob-

lems, unlabeled data are always the majority. It is difficult

to get a learner with strong generalization ability by using

limited labeled data in supervised learning. When the

& Shifei Ding labeled data are scarce and the unlabeled data are suffi-

dingsf@cumt.edu.cn cient, semi-supervised learning method can help improve

1

the classification performance of SVM in this situation.

School of Computer Science and Technology, China

Thus, introducing the idea of semi-supervised learning into

University of Mining and Technology, Xuzhou 221116,

China SVM and combining them will overcome the shortcomings

2 of SVM and get better classification results. Obviously, this

Jiangsu Key Laboratory of Mine Mechanical and Electrical

Equipment, China University of Mining and Technology, will be an important research field. The development of

Xuzhou 221116, China semi-supervised support vector machine is mainly to make

123

Neural Comput & Applic

and large amounts of unlabeled data for training and Lw; b; a jjwjj2 ai yi w xi b 1

2 i1 2

classification. Furthermore, those data can be fully utilized

to improve the classification accuracy of SVM and increase s:t: ai 0

its generalization performance.

Here, ai are the corresponding Lagrange multipliers to

In 1999, Bennett and Demiriz [9] proposed semi-su-

each sample. The solution of formula (2) is determined by

pervised support vector machine. In the next more than

the saddle point. At saddle point, the partial derivative of

10 years, scholars tried using various semi-supervised

w and b is zero.

learning method to improve S3VM and presented many

effective semi-supervised support vector machine algo- o Xl

Lw; b; a w ai y i x i 0

rithm [10, 11]. This paper reviews the latest research pro- ow i1

gress on the semi-supervised support vector machine 3

o Xl

algorithm and expatiates on the idea of several existent Lw; b; a ai y i 0

mainstream semi-supervised support vector machine algo- ob i1

rithms and its improved method.

Combining (2) with (3), the quadratic programming

This paper is organized as follows: Sect. 2 discusses the

problem will be transformed into the corresponding dual

basic theory of SVM. In Sect. 3, we analyze the current

problem and then maximize the objective function:

research on semi-supervised support vector machine algo-

rithm in detail. Section 4 contains some useful concluding X

l

1X l

Maxa ai ai aj yi yj xi xj

comments. i1

2 i;j1

4

X

l

s:t: ai 0; i 1; 2; . . . l: ai y i 0

2 Support vector machine i1

Support vector machine, proposed by Vapnik, is a machine We can get the optimal solution a* = (a*1, a*2, , a*l )T

learning method based on VC dimension and structural risk by solving Eq. (4), so the optimal weight vector w* and

minimization and is a specific realization for statistical optimal bias b* can be expressed as:

learning theory. Considering from the learning mode, SVM X

l

i1

The basic theory of SVM is to find an optimal classifi-

cation hyperplane which meets the requirements of clas- b yi w xi 6

sification. As shown below, the solid black dots represent

Finally, the optimal classification decision function can

one class of sample, and the white dots represent another

be defined as:

one. H is the classification hyperplane. H1 and H2 are the

plane which contains sample points and have the closest f x sgnw xi b 7

distance with the classification hyperplane. H1 and H2 are The optimal classification hyperplane discussed and

parallel to H. The distance between H1 and H2 is called solved above is in the linear case. For the nonlinear case,

maximum margin. The optimal classification hyperplane of kernel functions which map input vector to high-dimen-

SVM requires not only separating the samples correctly, sional feature vector space are used to help SVM construct

but also maximizing the margin. Those sample points the optimal classification hyperplane. The optimal classi-

contained in H1 and H2 are support vector. fication decision function can be written as:

Let (xi, yi), i = 1, 2, , l, x 2 Rn, y 2 {1} denote the

f x sgnw Ux b 8

sample dataset, where yi is the label. Also, let the hyper-

plane be denoted as (w x) ? b = 0. When the two classes Here, U(x) is the mapping of x from input space Rn to

are linearly separable, the optimal classification hyperplane feature space. By using kernel function, it avoids to com-

can be summarized to solve quadratic programming prob- pute in high-dimensional space, because it only needs to

lems as follows: care the inner product operation between input vectors.

1 From the above formula, we can find that the label

min jjwjj2 information of samples is used in computing, which is the

w;b 2 1

s:t yi w xi b 1; i 1; 2; . . .; l key of supervised learning. In supervised learning, we

always need to get the mapping relationship between the

The formula above is a convex programming problem. feature and label by training from labeled sample. How-

A solution to this problem is to use Lagrange function: ever, the labeled samples, which are in a tiny minority in

123

Neural Comput & Applic

reality, cannot represent the detailed characteristics of a enough labeled data for training, so that it could get a

class. If only trained by a few labeled samples, the gen- learner with strong generalization. However, it is difficult

eralization of SVM would be not strong. And it would also to obtain labeled data in practical applications, and it would

result in lower classification ability. S3VM is a good also cost a great resource to label the data. In order to take

solution to this problem. S3VM takes full use of the a good advantage of the large amount of unlabeled data,

unlabeled samples by combining SVM with semi-super- scholars proposed the semi-supervised learning.

vised learning algorithms. Currently, a variety of semi-supervised learning algo-

rithms is based on two common assumptions, that is,

cluster assumption and manifold assumption [17]. Cluster

3 Semi-supervised support vector machine assumption means that samples in the same cluster have a

greater probability in having the same label. Manifold

3.1 Semi-supervised learning assumption considers that close points along the manifold

flat have a similar nature or similar label. In these two

Machine learning is a core research area in artificial assumptions, cluster assumption reflects the global feature

intelligence. According to containing labeled data or of models. However, the manifold assumption reflects the

unlabeled data in training sets, the type of machine learning local features.

can be divided into unsupervised learning, supervised There are several classical methods in semi-supervised

learning and semi-supervised learning. learning, such as self-training method [1820], expecta-

In unsupervised learning, there are only some given tionmaximization method [21], multiview method [22

input data, and we should find the hidden structure or law 24], graph-based method [25].

of the input data through using a certain learning method.

Clustering algorithm is an unsupervised learning method. 3.2 S3VM

In supervised learning, those data contain the label which

indicates the classification of data. Its core idea is to get a Semi-supervised support vector machine was proposed by

mapping relationship between the feature and label by Bennett and Demiriz [9], which is a semi-supervised

training from labeled sample, and this mapping relationship learning method based on cluster assumption. The optimal

should be consistent with the labeled sample. Most of the goal of S3VM is to build classifier by using labeled data

classification algorithm belongs to supervised learning and unlabeled data. Similar to the idea of SVM, S3VM

[12, 13]. requires the maximum margin to separate the labeled data

Semi-supervised learning is the research focus on and unlabeled data. And the new optimal classification

machine learning in recent years. It is a learning method boundary must satisfy that the classification on original

combining unsupervised learning and supervised learning. unlabeled data has the smallest generalization error.

The basic idea is using a large number of unlabeled data to Before providing the formula of S3VM proposed by

help the supervised learning method improve effect [14]. In Bennett and Demiriz, we can explore how to use the

semi-supervised learning, there are labeled dataset unlabeled data in SVM. Now, we start from the basic

L = {(x1, y1), (x2, y2), , (xl, yl)} and unlabeled dataset theory of SVM and find some way to use these data.

U fx01 ; x02 ; . . .; x0u g. Now, we expect to get a function It is known that SVM is based on VC dimension and

f:X ? Y, which could accurately predict the label y for the structural risk minimization. And the regularization is

given x. Here, xi, xj0 2 X is a d-dimensional vector, and exactly a way to achieve structural risk minimization,

yi 2 Y is the label of xi. In addition, l and u are the number which is the empirical risk plus a regularization term or

of samples that L and U contain. penalty term.

The reason for proposing semi-supervised learning After adding a penalty term in formula (1), the objective

method is that the unlabeled data can indeed help improve function may be rewritten as:

the performance of an algorithm in theory. Miller and Uyar X l

1

[15] analyzed from data distribution estimations and draw a min jjwjj2 C ni

w;b;n 2 9

conclusion that the generalization ability of classifier can i1

s:t: yi w xi b 1 ni

be improved with a significant increase in unlabeled data.

ni 0; i 1; 2; . . .; l

Zhang and Oles [16] also pointed out that if a parametric

model satisfies P(x, y|h) = P(y|x, h)P(x|h), the unlabeled where ni(i = 1, 2, , n) are the slack variables. It means

data would help model get a better ability of estimating the the acceptable deviation between function margin and the

model parameters and improve its performance. From corresponding data xi. C is a parameter which controls the

another perspective, supervised learning assumes there are weight of penalty term in objective function.

123

Neural Comput & Applic

l P

n

risk as follows: min C ni sj

w;b;s;n i1 j1

( )

1 X l s:t: yi jw xi bj ni 1 14

minUw min jjwjj2 C max1 yi w xi b;0 ni 0; i 1; 2; . . .; l

w;b 2 i sj wj sj ; j 1; 2; . . .; n:

10

In fact, the above formula changes the 2-norm ||w||2 in

2

Here, 12 jjwjj can be seen as a regularization term, and (9) to 1-norm ||w||1. One of the advantages is to make the

max1 yi w xi b; 0 is the loss function of labeled dimension decreases, and another benefit is using linear

data. programming instead of quadratic programming to solve.

In order to get semi-supervised support vector machine, Moreover, if it is a nonlinear case, it only needs to use the

we must use the unlabeled data. Now assume that the kernel function and can get a good extension.

unlabeled data are labeled, and let the label be Based on formula (14), they add two restrictions on the

y^ signw x b. So, the loss function of unlabeled data unlabeled data. One restriction assumes that the unlabeled

can be expressed as: data belong to one of the class, and calculate its error rate;

max1 y^w x b; 0 max1 jw x bj; 0 11 another restriction assumes that the unlabeled data belong

to another class, and also calculate its error rate. Then, the

After using the unlabeled data, adding (9)(11) we can objective function calculates the minimum of the two

get the basic form of semi-supervised support vector possible misclassifications errors. So, the S3VM could be

machine: defined as:

(

1 Xl " #

minUw min jjwjj2 C1 max1 yi w xi b; 0 P

l lP

k

w;b 2 i min C ni mingi ; zi jjwjj

) w;b;g;n;z i1 jl1

X

lu

s:t: yi w xi b ni 1 ni 0 i 1; 2; . . .; l

C2 max1 jw xi bj; 0 12 w xj b gj 1 gj 0 j l 1; l 2; . . .; l k

il1

w xj b zj 1 zj 0

Here, C1 and C2 are the weight of two loss function. 15

i = l ? 1, l ? 2, , l ? u is the unlabeled data. In the

above formula, it needs to add constraints to avoid that where C [ 0 is a fixed misclassification penalty.

those unlabeled data would be divided into a same class: They use integer programming to solve this problem. By

adding a zero or one decision variable dj for each xi, they

1X lu

1X l

formulate the semi-supervised support vector machine as a

f xj yi 13

u jl1 l i1 mixed integer program:

" #

Until now, we get a reasonable semi-supervised support P

l lP

k

min C gi nj zj jjwjj

vector machine. Support vector machine is a convex opti- w;b;g;n;z;d i1 jl1

w xj b nj M1 dj 1; nj 0; j l 1; . . .; l k

convex optimization can refer to supervised tensor learning w xj b zj Mdj 1; zj 0; dj f0; 1g

[26]. However, this semi-supervised support vector

16

machine is a non-convex quadratic programming problem

and has a complex calculation in solving. With the new However, with the increase in unlabeled data, the

data adding, it needs to resolve the programming problem. complexity of computation will rapidly increase. It is dif-

In order to solve this problem, scholars have proposed a lot ficult to solve for large unlabeled data. To overcome this

of optimization techniques for non-convex optimization difficulty, in 2001, Glenn et al. [37] proposed concave

problem [27]. Some typical optimization methods include semi-supervised support vector machine (VS3VM). Its

gradient descent [28, 29], concave convex procedure [30], main idea is to formulate the problem as a concave mini-

deterministic annealing [31], continuation method [32], mization problem which is solved by a successive linear

semi-definite programming [33], DC programming and approximation algorithm. That reduces some complexity in

DCA [34], branch-and-bound algorithms [35] and genetic calculation and makes it able to handle large unlabeled

algorithm [36]. datasets.

Bennett and Demiriz proposed semi-supervised support Although Bennett and Demiriz proposed S3VM ini-

vector machine on the basis of the robust linear program- tially, in most cases, the S3VM does not refer to the for-

ming (RLP) approach to SVM. The corresponding robust mula (15). We prefer treating the original form [i.e.,

linear program for SVM was defined as: formula (12)] as the S3VM. Except mentioning Bennett

123

Neural Comput & Applic

and Demiriz, the S3VM in this paper refers to formula (12). On the application side, semi-supervised support vector

Now, we will present some good improvements in S3VM. machines can be widely used in some classification prob-

In 2011, Li et al. [38] proposed safe semi-supervised lems. At present, we can see there are a lot of applications

support vector machine (S4VM). S3VM is based on low- which semi-supervised support vector machines have been

density assumption and tries to find a low-density separator applied in, and the potential applications are image clas-

which favors the decision boundary going across low- sification and text classification. On these two issues, semi-

density regions in the feature space. However, S4VM tries supervised support vector machines have shown good

to exploit multiple candidate low-density separators and results.

then select the representative separators. In 2014, Hu et al.

[39] combined traditional S3VM with multikernel learning 3.3 Transductive support vector machines

and proposed a multikernel S3VM optimization model

based on Lp norm constraint (Lp - MKL - S3VM). In this In 2009, Joachims [44] proposed transductive support

multikernel framework, both basic kernels and manifold vector machine (TSVM). TSVM is an important method in

kernels are used to achieve a combination of two semi-supervised support vector machines. It is based on

assumptions (i.e., cluster assumption and manifold cluster assumption. It is interesting that TSVM and S3VM

assumption). By adding Lp norm constraint to kernel are not only proposed in the same year, but also similar in

combination coefficients h, they get the non-sparse solution their main idea and optimization problem. The objective of

for h and improve the interpretability and generalization of TSVM is to focus on a particular working set, making it

S3VM. possible to achieve the optimal classification in this

Semi-supervised support vector machine is non-smooth working set. And this leads to a poor ability in dealing with

and non-convex. To tackle this problem, scholars use the new data, i.e., poor generalization.

quasi-Newton approach [40] in S3VM. In 2011, Reddy There are labeled data (x1, y1), , (xn, yn), xi 2 Rm, -

et al. [41] applied a modified quasi-Newton method for yi 2 {- 1, ?1} and unlabeled data x*1, x*2, , x*k . In lin-

S3VM. In 2012, Gieseke et al. [42] proposed a sparse early non-separable cases, the optimization problem of

quasi-Newton optimization for S3VM. Both of them made TSVM is shown as follows:

an effective improvement. In 2015, by using a general

three-moment method, Jiang et al. [43] constructed a Minimize over y1 ; . . .; yn ; w; b; n1 ; . . .; nn ; n1 ; . . .; nk :

1 X n Xk

quantic spline symmetric hinge approximation function jjwjj2 C ni C nj

with three-time differentiability at origin. They applied to 2 i0 j0

smooth S3VM model and proved that it has a good s:t: yi w xi b 1 ni ; i 1; 2; . . .; n

convergence. yj w xj b 1 nj ; j 1; 2; . . .; k

When should people use semi-supervised SVM? In fact, ni 0; i 1; 2; . . .; n

it can be used on all two-class problems in semi-supervised nj 0; j 1; 2; . . .; k

learning. Now, assuming that there is a dataset D with 1000

17

sets of data and D is a two-class problem, we will use these

data to get a classifier. In dataset D, 100 sets of data have Here, ni and n*j stand for the relaxation factor of labeled

been labeled, and the others have no label. If we use SVM, data and unlabeled data, respectively. C and C* are weight,

we can only use 100 sets of data. However, semi-super- which represent the effect of labeled data and unlabeled

vised SVM can make full use of 1000 sets of data. Finally, data, respectively, in objective function.

the performance of resulting classifier would be better by The most important thing in TSVM is the transductive

choosing semi-supervised SVM. All in all, if there are approach. It will have a good classification ability in a

small amounts of labeled data and large amounts of unla- particular working set. As for unknown working sets, its

beled data, semi-supervised SVM is the great choice. performance is just ordinary. In addition, TSVM has a high

With the increase in unlabeled data, semi-supervised time complexity and needs to preset the number of positive

support vector machines also present a rising trend in samples. To solve these problems, scholars have done a lot

practical application. Semi-supervised support vector of work to improve the algorithm. In 2003, Chen et al. [53]

machine used initially in text classification [44] and made a proposed a progressive transductive support vector

good result. Semi-supervised support vector machines have machine (PTSVM). In this algorithm, it is no need to preset

been also applied in other areas, such as image classifica- the number of positive samples. Instead, according to

tion [19, 4547], iris image recognition [48], face recog- certain principles, PTSVM gives a possible label for the

nition [49], physical education effect evaluation [50], unlabeled data in the training process and retrains the new

cancer classification [51] and rapid identification between dataset of labeled data. However, this algorithm requires

edible oil and swill-cooked dirty oil [52]. labeling the unlabeled data frequently. And with the

123

Neural Comput & Applic

increase in unlabeled data, the time complexity of this It should be noted that the support vector machine and

algorithm will increase rapidly, which makes the training most of supervised learning methods all use the regular-

speed slow. ization theory. Their optimization goals can be written as:

The primary deficiency of TSVM is that the number of

1X l

positive samples Np in unlabeled data must be appointed min Vxi ; yi ; f cjjf jj2K 18

before training, and the main cause of this problem is the

f 2Hk l i1

pairwise exchange criterion. For this problem, Wang et al. where c C 0 and V is a loss function. K is an effective

[54] proposed individually judging and changing criterion kernel function. Hk corresponds to the reproducing kernel

and then put forward an algorithm to label the unlabeled Hilbert space. jj jjK is the inner product on Hilbert space.

data progressively and adjust the Np dynamically. Unlike Basing on this framework, Belkin proposed a manifold

TSVM, the algorithm is not sensitive to the unreasonable regularization framework [59]. Its main idea is to add a

number of positive samples. When Np is given randomly, regularization term in formula (18), so formula (18)

this method will also show a better performance than becomes:

TSVM.

TSVM and PTSVM both require solving the quadratic 1X l

min Vxi ; yi ; f cjjf jj2K gjjf jj2K 19

programming problems. Zhang et al. [55] replaced the e f 2Hk l i1

insensitive function by using quadratic loss function, which

converts quadratic programming problems to solving linear The regularization term contains the labeled data and

equations and then proposed least square transduction unlabeled data, which reflects the internal structure of sample

support vector machine. data distribution. It should be noted that the manifold regu-

There are also other good improved algorithms for larization is a very important semi-supervised learning

TSVM. In 2012, Yu et al. [56] proposed a transductive method. The manifold regularization can fuse supervised

support vector machine algorithm based on spectral clus- learning and unsupervised learning into semi-supervised

tering. This algorithm uses spectral clustering to cluster learning. Its core idea is to mine the geometry of the distri-

unlabeled data and labels them and then makes transduc- bution of data and add it in formula as a regularization term.

tive inference on the mixed dataset composed by both The data consist of supervised and unsupervised data.

labeled and unlabeled data. It improves the stability Manifold regularization is a hot topic in semi-supervised

greatly. At the same year, Tian et al. [57] proposed a learning, and many scholars have done a lot work in this

multiple kernel version of TSVM which combines both field. For some deficiencies of manifold regularization,

cluster assumption and manifold assumption. And in 2014, scholars also tried to study and improve it. In 2012, Bo et al.

Zhou et al. [58] applied the quasi-linear kernel to TSVM [61] proposed ensemble manifold regularization framework

and proposed a transductive support vector machine with to automatically and implicitly estimate the hyperparameters

adjustable quasi-linear kernel. of the manifold regularization. In 2013, Yong et al. [62]

proposed multiview vector-valued manifold regularization

3.4 Laplacian support vector machines and manifold regularized multitask learning algorithm [63] to

achieve multilabel classification, and both of them were

In Sect. 3.1, we mentioned that there are two assumptions successfully applied to multilabel image classification.

in semi-supervised learning: cluster assumption and man- The Laplacian support vector machine that was pro-

ifold assumption. And S3VM is exactly based on cluster posed by Belkin is exactly based on this manifold regu-

assumption. The manifold assumption reflects mostly a larization framework. In LapSVM, the regularization term

graph-based semi-supervised learning method. Currently, is a Laplace operator, which is given by

there are a large number of graph-based semi-supervised 1

jjf jj2L f T Lf 20

learning methods. Their idea is to build a graph and regard l u2

the labeled data and unlabeled data as the node in the

graph. The similarity between these data can be repre- So the Laplacian support vector machine can be defined

sented by the edge weights in the graph. By using such a as follows:

graph, the label information of these data can be passed to 1X l

c1

the other node, and then, we can label the unlabeled data. min n cA aT Ka aT KLKa

a2Rlu ;n2Rl l i1 i u l2

In fact, the Laplacian support vector machine (LapSVM) !

P

lu

[59] which will be presented in this section is exactly a s:t: yi aj Kxi ; xj b 1 ni ; i 1; 2; . . .; l

j1

graph-based semi-supervised learning method. This

ni 0; i 1; 2; . . .; l

method is applied in image classification [60] and achieves

a good result. 21

123

Neural Comput & Applic

Lagrange function to transfer it into a dual problem, and min min jjwjj22 C1 ni C2 q

d2D w;b;p;n 2 i1

then, it can be solved easily.

s:t: yi w0 /xi b 1 !ni ; i 1; 2; . . .; l

Compared to S3VM, LapSVM is more outstanding in

1 X

lu

solving and time complexity. Certainly, it also has some w0 djl /xj b q

deficiencies. LapSVM is based on the manifold assumption u jl1

!

and reflects the local structure information of dataset. In 1 X

lu

0

fact, this assumption in some classification problems may w 1 djl /xj b q

u jl1

be too broad. When these samples are near to the classi-

fication boundary, they may exactly belong to different 22

class. In recent years, scholars made a great progress in Pu

where D fdjdi 2 f0; 1g; i1 di u g: Balance con-

LapSVM. In order to improve the training speed, in 2011, straints are contained in D. In addition, using the label

Melacci et al. [64] trained the Laplacian support vector mean does not mean to add a constraint to each unlabeled

machine in the primal programming problem instead of data. The main characteristic of the meanS3VM is highly

transferring it into dual problem. This method reduces the efficient with a short time in training. In particular, on

time complexity effectively, which reduces the O(n3) to relatively large data, meanS3VM is 100 times faster than

O(k2). In 2013, Qi et al. [65] proposed a cost-sensitive TSVM and 10 times faster than LapSVM.

Laplacian support vector machine (Cos-LapSVM), which After that, in 2010, Li et al. [70] proposed cost-sensitive

can deal with the sensitive problem in semi-supervised semi-supervised support vector machine (CS4VM), which

learning. To control the sparsity and feature selection of combines the cost-sensitive learning with meanS3VM.

LapSVM, in 2014, Tan et al. [66] introduced an CS4VM uses the different cost for different misclassifica-

adjustable p-norm and proposed Laplacian p-norm proxi- tion based on meanS3VM. In fact, CS4VM is the cost-

mal support vector machine (Lap-PPSVM). At the same sensitive version of meanS3VM.

year, Yang et al. [67] proposed a spatio-spectral Laplacian

support vector machine (SS-LapSVM) and applied it to 3.6 S3VM based on cluster kernels

hyperspectral image classification. In 2015, Qi et al. [68]

further improved LapSVM and proposed a fast Laplacian In semi-supervised learning, kernels have been given a

SVM (FlapSVM). This method does not need to deal with certain attention. Labeled data and unlabeled data are used

extra matrix and can be effectively solved by SOR tech- to construct the kernel functions, which make kernel

nology, which makes it more suitable for large-scale function better reflect the similarity between samples. In

problems. 2002, relying on the cluster assumption, Chapelle [71]

changed the spectrum of the kernel matrix and proposed a

3.5 meanS3VM framework for constructing kernels. They applied it to

SVM and got a good result compared to TSVM. By using

We have presented the S3VM, TSVM and LapSVM above. this framework, we can get a semi-supervised support

S3VM proposed by Bennett and Demiriz needs to solve the vector machine based on cluster kernels. The key is to use

mixed integer program, whose solving procedure has a the new kernel function constructed by labeled data and

high time complexity. TSVM will lead to a large number of unlabeled data to train the SVM.

iterations. And the LapSVM needs to calculate the inverse This S3VM based on cluster kernels has an advantage,

of a n 9 n matrix. All of them have a deficiency in time which it is no need to modify the objective function of

complexity. SVM. If the objective function was modified, we would

The meanS3VM was proposed by Li et al. [69], and they face with the difficult in solving and low efficiency. Those

found that the label mean, being a simple statistic, can be semi-supervised support vector machines that we men-

useful in building a semi-supervised learner. The tioned above have a same characteristic; that is, they all

meanS3VM estimates the label means of the whole unla- need to modify their objective function. These unlabeled

beled data firstly. In fact, after we get the label means of data are added in the optimization problem, and then, it

unlabeled data, we will find that the semi-supervised sup- will bring a new objective function and optimization

port vector machine is similar to the SVM which knows the problem. The S3VM based on cluster kernels only needs to

all label of data. And this will make meanS3VM be more construct a new kernel function, which will reduce some

efficient. The goal of meanS3VM is to maximize the complexity. Certainly, this method has a high time com-

margin between the label means of the unlabeled data. The plexity, and it may have a poor performance in dealing

expression can be defined as follows: with large datasets.

123

Neural Comput & Applic

Using different method, we can get different kernels, important classification method in machine learning, and

and there are some main kernels: spectral clustering kernel combining it with semi-supervised learning seems to be

[72], random walk kernel [73], bagged clustering kernel more significant. On the one hand, semi-supervised support

[7476], mixture models of Gaussian kernel [77]. vector machines inherit the solid theory of SVM, which

will enhance its generalization ability. On the other hand,

by introducing semi-supervised learning, semi-supervised

4 Conclusions support vector machines can use the unlabeled data to

improve its performance and extend its application.

This paper presents these mainstream models of semi-su- However, semi-supervised support vector machines also

pervised support vector machines, including S3VM, have deficiency. Those methods will all cost a large time in

TSVM, LapSVM, meanSVM and S3VM based on cluster training, and it is the biggest challenge in semi-supervised

kernel. In addition, this paper gives a brief understanding support vector machines. Some methods improve this

on their corresponding improved method. situation to a certain degree, but there are still a lot of

Without a doubt, lapSVM and meanSVM are the out- works to do in the future.

standing and potential algorithm in these five mainstream

algorithms. We can analyze these five algorithms from the Acknowledgments This work is supported by the National Natural

Science Foundation of China (No. 61379101).

following aspects. Considering from the solving process,

lapSVM and S3VM based on cluster kernel are easier than

others. LapSVM is based on manifold regularization

framework, which can be solved by using Lagrange func- References

tion and duality theory. The S3VM based on cluster kernel

only needs to construct a special kernel function. Consid- 1. Altun Y, Belkin M, Mcallester DA (2005) Maximum margin

semi-supervised learning for structured variables. In: Proceedings

ering from the time complexity, lapSVM and meanSVM all of the 2005 annual conference on neural information processing

have a good performance. In particular, on relatively large systems, pp 3340

data, meanS3VM is 100 times faster than TSVM and 10 2. Zhu X, Goldberg AB (2009) Introduction to semi-supervised

times faster than LapSVM. Considering from the accuracy, learning. Synth Lect Artif Intell Mach Learn 3(1):1130

3. Hady MFA, Schwenker F (2013) Semi-supervised learning.

these algorithms have their prominence in different data- Handbook on neural information processing. Springer, Berlin,

sets, and it is not accurate to say which the best is. pp 215239

The prosperous application of semi-supervised support 4. Ding SF, Qi BJ, Tan YH (2011) An overview on theory and

vector machine has attracted more and more attention. The algorithm of support vector machines. J Univ Electron Sci

Technol China 40(1):210

current research hot spots can be summarized in two 5. Gu YX, Ding SF (2011) Advances of support vector machines.

aspects: modify the semi-supervised support vector J Comput Sci Technol 38(2):1417

machine and improve its efficiency; the other is to find a 6. Scholkopf B, Smola AJ (2001) Learning with kernels: SUPPORT

more efficient optimal method to solve the optimization vector machines, regularization, optimization, and beyond. MIT

Press, Cambridge

function, enhance the robustness of algorithms and 7. Ding S (2011) Incremental learning algorithm for support vector

improve its generalization. These two aspects are also the data description. J Softw 6(7):11661173

future research directions for semi-supervised support 8. Liu XL, Ding SF (2010) Appropriateness in applying SVMs to

vector machines. These works all contribute to making text classification. Comput Eng Sci 32(6):106108

9. Bennett K, Demiriz A (1999) Semi-supervised support vector

semi-supervised support vector machines better and more machines. In: Advances in neural information processing sys-

convenient in application. tems, vol 11, pp 368374

These semi-supervised SVMs in this article are 10. Hastie T, Tibshirani R, Friedman J (2009) The elements of sta-

improved and modified from the basic function of SVM. tistical learning. Springer, New York

11. Steinwart I, Christmann A (2008) Support vector machines.

By doing this, SVM can be changed into semi-supervised Springer, New York

SVM and applied in semi-supervised learning. Surely, by 12. Jordan MI, Jacobs RI (2014) Supervised learning and divide-and-

using semi-supervised learning models like co-training conquer: a statistical approach. In: Proceedings of the tenth

models, SVM also can be applied in semi-supervised international conference on machine learning, pp 159166

13. Shipp MA, Ross KN, Tamayo P et al (2002) Diffuse large B-cell

learning. This method does not modify the basic function lymphoma outcome prediction by gene-expression profiling and

of SVM, and its focus is on the study of semi-supervised supervised machine learning. Nat Med 8(1):6874

learning models. Applications also exist in this area, and 14. Zhang CG, Zhang Y (2013) Semi-supervised learning. China

there are good results, such as multitraining support vector Agricultural Science and Technology Press, Beijing

15. Miller DJ, Uyar HS (1997) A mixture of experts classifier with

machine for image retrieval proposed by Jing et al. [78]. learning based on both labelled and unlabelled data. In: Advances

It is no doubt that semi-supervised learning is more in neural information processing systems. MIT Press, Cambridge,

suitable for the reality. Meanwhile, the SVM is also an pp 571577

123

Neural Comput & Applic

16. Zhang T, Oles F (2000) The value of unlabeled data for classi- 40. Yu J, Vishwanathan SVN, Gunter S et al (2010) A quasi-Newton

fication problems. In: Proceedings of 17th international confer- approach to nonsmooth convex optimization problems in

ence on machine learning, pp 11911198 machine learning. J Mach Learn Res 11(5):11451200

17. Chapelle O, Scholkopf B, Zien A (2006) Semi-supervised 41. Reddy IS, Shevade S, Murty MN (2011) A fast quasi-Newton

learning. MIT Press, Cambridge method for semi-supervised SVM. Pattern Recogn 44(10):

18. Triguero I, Saez JA, Luengo J et al (2014) On the characterization 23052313

of noise filters for self-training semi-supervised in nearest 42. Gieseke F, Airola A, Pahikkala T, et al (2012) Sparse quasi-

neighbor classification. Neurocomputing 132:3041 newton optimization for semi-supervised support vector

19. Dopido I, Li J, Marpu PR et al (2013) Semi-supervised self- machines. In: Proceedings of the 1st international conference on

learning for hyperspectral image classification. IEEE Trans pattern recognition applications and methods, pp 4554

Geosci Remote Sens 51(7):40324044 43. Jiang W, Yao L, Jiang X et al (2015) A new classification method

20. Li Y, Li H, Guan C, et al (2007) A self-training semi-supervised based on semi-supervised support vector machine. In: Human-

support vector machine algorithm and its applications in brain centred computing. First international conference, pp 633645

computer interface. In: International conference on acoustics, 44. Joachims T (1999) Transductive inference for text classification

speech, and signal processing, pp 385387 using support vector machines. In: Proceedings of the sixteenth

21. McLachlan GJ, Krishnan T (1997) The EM algorithm and international conference, vol 99, pp 200209

extensions. Wiley-Interscience, New York 45. Guillaumin M, Verbeek J, Schmid C (2010) Multimodal semi-

22. Dong C, Yin Y, Guo X et al (2008) On co-training style algo- supervised learning for image classification. In: 2010 IEEE

rithms. Int Conf Nat Comput 18(20):196201 Computer society conference on computer vision and pattern

23. Feger F, Koprinska I (2006) Co-training using RBF nets and recognition, pp 902909

different feature splits. In: International joint conference on 46. Li M, Wang R, Tang K (2013) Combining Semi-Supervised and

neural networks, pp 18781885 active learning for hyperspectral image classification. In: Pro-

24. Yu S, Krishnapuram B, Rosales R et al (2011) Bayesian co- ceedings of the 2013 IEEE symposium on computational intel-

training. J Mach Learn Res 12:26492680 ligence and data mining, pp 8994

25. Zhu X (2005) Semi-supervised learning with graphs, Carnegie 47. Xie L, Pan P, Lu Y (2014) Markov random field based fusion for

Mellon University, language technologies institute, school of supervised and semi-supervised multi-modal image classification.

computer science Multimed Tools Appl 74(2):613634

26. Tao D, Li X, Wu X et al (2007) Supervised tensor learning. 48. Yang L, Su Q, Yang B et al (2014) A new semi-supervised

Knowl Inf Syst 13(1):142 support vector machine classifier based on wavelet transform and

27. Chapelle O, Sindhwani V, Keerthi SS (2008) Optimization its application in the iris image recognition. Int J Appl Math Stat

techniques for semi-supervised support vector machines. J Mach 52(5):8693

Learn Res 9:203233 49. Lu K, He X, Zhao J (2006) Semi-supervised support vector

28. Chapelle O, Zien A (2005) Semi-supervised classification by low learning for face recognition. Advances in neural networks-ISNN

density separation. In: Proceedings of the 10th international 2006. Springer, Berlin, pp 104109

workshop on artificial intelligence and statistics, pp 5764 50. Liang P, Xueming Y (2012) Explore semi-supervised support

29. Gieseke F, Airola A, Pahikkala T et al (2014) Fast and simple vector machine algorithm for the application of physical educa-

gradient-based optimization for semi-supervised support vector tion effect evaluation. Int J Adv Comput Technol 4(9):266271

machines. Neurocomputing 123:2332 51. Jun CA, Habibollah H, Haza NAH (2015) Semi-supervised SVM-

30. Collobert R, Sinz F, Weston J et al (2006) Large scale trans- based feature selection for cancer classification using microarray

ductive SVMs. J Mach Learn Res 7:16871712 gene expression data. Springer International Publishing,

31. Sindhwani V, Keerthi SS, Chapelle O (2006) Deterministic Switzerland, pp 468477

annealing for semi-supervised kernel machines. In: Proceedings 52. Zhou Y, Liu T, Li J (2015) Rapid identification between edible

of the 23rd international conference on machine learning, oil and swill-cooked dirty oil by using a semi-supervised support

pp 841848 vector machine based on graph and near-infrared spectroscopy.

32. Chapelle O, Chi M, Zien A (2006) A continuation method for Chemometr Intell Lab Syst 143:16

semi-supervised SVMs. In: Proceedings of the 23rd international 53. Chen YS, Wang GP, Dong SH (2003) A progressive transductive

conference on machine learning, pp 185192 inference algorithm based on support vector machine. J Softw

33. De Bie T, Cristianini N (2006) Semi-supervised learning using 14(3):451460

semi-definite programming. MIT press, Cambridge 54. Wang Y, Huang S (2005) Training TSVM with the proper

34. Le HM, Le Thi HA, Nguyen MC (2015) Sparse semi-supervised number of positive samples. Pattern Recogn Lett 26(14):

support vector machines by DC programming and DCA. Neu- 21872194

rocomputing 153:6276 55. Zhang R, Wang W, Ma Y et al (2009) Least square transduction

35. Chapelle O, Sindhwani V, Keerthi S (2007) Branch and bound for support vector machine. Neural Process Lett 29(2):133142

semi-supervised support vector machines. In: 20th Annual con- 56. Yu X, Yang J, Zhang J (2012) A transductive support vector

ference on neural information processing systems, pp 217224 machine algorithm based on spectral clustering. AASRI Procedia

36. Adankon MM, Cheriet M (2010) Genetic algorithmbased 1:384388

training for semi-supervised SVM. Neural Comput Appl 57. Tian X, Gasso G, Canu S (2012) A multiple kernel framework for

19(8):11971206 inductive semi-supervised SVM learning. Neurocomputing

37. Fung G, Mangasarian OL (2002) Semi-supervised support vector 90:4658

machines for unlabeled data classification. Optim Methods Softw 58. Zhou B, Hu C, Chen B, et al. (2014) A Transductive Support

15:2944 Vector Machine with adjustable quasi-linear kernel for semi-su-

38. Li Y, Zhou Z (2015) Towards making unlabeled data never hurt. pervised data classification. In: Proceedings of the 2014 inter-

IEEE Trans Pattern Anal Mach Intell 37(1):175188 national joint conference on neural networks, pp 14091415

39. Hu QH, Ding LX, He JR (2013) Lp norm constraint multi-kernel 59. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regulariza-

learning method for semi-supervised support vector machine. tion: a geometric framework for learning from labeled and

J Softw 24(11):25222534 unlabeled examples. J Mach Learn Res 7:23992434

123

Neural Comput & Applic

60. Gomez-Chova L, Camps-Valls G, Munoz-Mari J et al (2008) 70. Li Y, Kwok J, Zhou Z (2010) Cost-sensitive semi-supervised

Semisupervised image classification with Laplacian support support vector machine. In: Proceedings of the 24th AAAI con-

vector machines. IEEE Geosci Remote Sens Lett 5(3):336340 ference on artificial intelligences, pp 500505

61. Geng B, Tao D, Xu C et al (2012) Ensemble manifold regular- 71. Chapelle O, Weston J, Scholkopf B (2002) Cluster kernels for

ization. IEEE Trans Pattern Anal Mach Intell 34(6):12271233 semi-supervised learning. In: Proceedings of 16th annual neural

62. Luo Y, Tao D, Xu C et al (2013) Multiview vector-valued information processing systems conference, pp 585592

manifold regularization for multilabel image classification. IEEE 72. Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering:

Trans Neural Netw Learn Syst 24(5):709722 analysis and an algorithm. Adv Neural Inf Process Syst

63. Luo Y, Tao D, Geng B et al (2013) Manifold regularized mul- 2:849856

titask learning for semi-supervised multilabel image classifica- 73. Szummer M, Jaakkola T (2002) Partially labeled classification

tion. IEEE Trans Image Process 22(2):523536 with Markov random walks. Adv Neural Inf Process Syst

64. Melacci S, Belkin M (2011) Laplacian support vector machines 14:945952

trained in the primal. J Mach Learn Res 12:11491184 74. Tuia D, Camps-Vall G (2009) Semi-supervised remote sensing

65. Qi Z, Tian Y, Shi Y et al (2013) Cost-sensitive support vector image classification with cluster kernels. IEEE Geosci Remote

machine for semi-supervised learning. Procedia Comput Sci Sens Lett 6(1):224228

18:16841689 75. Li T, Wang XL (2013) Semi-supervised SVM classification

66. Tan J, Zhen L, Deng N et al (2014) Laplacian p-norm proximal method based on cluster kernel. Appl Res Comput 30(1):4245

support vector machine for semi-supervised classification. Neu- 76. Gao HZ, Wang JW, Xu K et al (2011) Semisupervised classifi-

rocomputing 144:151158 cation of hyperspectral image based on clustering kernel and LS-

67. Yang L, Yang S, Jin P et al (2014) Semi-supervised hyperspectral SVM. J Signal Process 27(2):7681

image classification using spatio-spectral Laplacian support 77. Tao XM, Cao PD, Song SY et al (2013) The SVM classification

vector machine. IEEE Geosci Remote Sens Lett 11(3):651655 algorithm based on semi-supervised gauss mixture model kernel.

68. Qi Z, Tian Y, Shi Y (2015) Successive overrelaxation for Inf Control 42(1):1826

Laplacian support vector machine. IEEE Trans Neural Netw 78. Li J, Allinson N, Tao D et al (2006) Multitraining support vector

Learn Syst 26(4):674683 machine for image retrieval. IEEE Trans Image Process

69. Li Y, Kwok JT, Zhou Z (2009) Semi-supervised learning using 15(11):35973601

label mean. In: Proceedings of the 26th international conference

on machine learning, pp 633640

123

- A Re-Learning Based Post-Processing Step For Brain Tumor Segmentation From Multi-Sequence ImagesUploaded byAI Coordinator - CSC Journals
- kapitaselektaUploaded byAwalia Windarsari Putri
- Support Vector MachineUploaded byNikhilesh Prabhakar
- resumeUploaded byVipul Sikka
- CardiacAbnormality+SVMUploaded byRizwan Bangash
- Ya Seen 2015Uploaded byAntonio Soto Leon
- An Ontology-Based Sentiment Classification Methodology for Online Consumer ReviewsUploaded byf6684sam
- Principles of Support Vector MachineUploaded byJhonatan A Lozano Galeano
- Support Vector MachineUploaded byMuhammad Heru Firmansyah
- Describing People a Poselet-Based Approach to Attribute Classification - Bourdev, Maji, Malik - Proceedings of International Conference on Computer Vision - 2011Uploaded byzukun
- A Hybrid Approach for Classification of DICOM ImageUploaded byWorld of Computer Science and Information Technology Journal
- -Handwriting RecognitionUploaded byJournalNX - a Multidisciplinary Peer Reviewed Journal
- A Comparative Study of Content Based Image Retrieval Trends and ApproachesUploaded byAI Coordinator - CSC Journals
- Data Mining 101Uploaded bykarinoy
- ML-Advice Machine LearningUploaded byaglobal2
- Chatbot SynopsisUploaded byDeepanshu Sharma
- An improved fuzzy twin support vector machineUploaded byAnonymous vQrJlEN
- Pioneering Method to Analyse Depression in Human Being Using Audio-Video ParametersUploaded byEditor IJRITCC
- glossaryUploaded byDuong Thao
- Su Et Al-2016-Structural Control and Health MonitoringUploaded byCESAR
- Action Recognition From Video UsingUploaded byManju Yc
- Vector MachieUploaded byJournalNX - a Multidisciplinary Peer Reviewed Journal
- 1. Fintech ML Using AzureUploaded byVikram Pandya
- Iosr ImageUploaded byPoitu Ermawantia
- 1 IntroductionUploaded byvikasnar
- Parametro Do RPDUploaded byDeangelis Damas
- ENHANCED SCHEME FOR HANDWRITTEN OFFLINE SIGNATURE VERIFICATIONUploaded byIJIERT-International Journal of Innovations in Engineering Research and Technology
- Full Text 1Uploaded bytweety492
- Trabajo de InteligenciaUploaded byDaniel Aguilar
- 1-s2.0-S0168169910002206-main.pdfUploaded byIsmi Ae

- 11. Mech - Ijmperd - Formability of Warm Deep - Chennakesava ReddyUploaded byTJPRC Publications
- Landau - Movement Out of ControlUploaded bywadiwan
- Course outline emft EE 346 Spring 2016.docxUploaded byJhanzeb Khan
- IGCSE Mathematics 4400 May 2004 Question Paper and Mark Scheme Paper 2F N20709Uploaded byVarun Panicker
- Mp Dsp Lab ManualUploaded bysubbu
- 239079566-Assembler-Book.pdfUploaded byborisg3
- Automatic Generation Control in Three-Area Power System Operation by using “Particle Swarm Optimization Technique”Uploaded byIJSTE
- Numerical Computation of Internal and External Flow. Ch3Uploaded byJavier Carceller
- qp labUploaded byArun AP
- Intro, Lab Math, And SafetyUploaded byJangHanbyul
- daeriusUploaded byEmilio Martinez
- Sampling TheoryUploaded byPrima Hp
- EMTL IMPUploaded byNikhil Garipelli
- Lab_2_Intro_CUploaded bysyazaa syazaa
- ECE Board Exam SyllabiUploaded bylesterPECE
- An Interactive Introduction to LaTeX,Uploaded byArjunaAriyaratne
- 9345523Uploaded bySimona Suciu
- Convection and Conduction Heat TransferUploaded bytfemilian
- Assignment 1 (Winter 2017)Uploaded bymiggawee88
- Mathematics Course OutlineUploaded byMaisarah Manas
- E 698 - 11Uploaded byruben carcamo
- Assignment QTUploaded byAnkush Saraff
- Butterfly Diagram - WikipediaUploaded byHashimIdrees
- Minimization Maximization and Lagrange Multiplier ProblemsUploaded byAjay Deva
- PVT Correlations McCain_ValkoUploaded byAlejandra Maleja Rivas
- LMPHY120 laboratryUploaded byAbhishek Kumar
- Kamal 1973Uploaded byY.K. Chen
- BoundaryUploaded byAlo Rovi
- Lecture 5Uploaded byأحمد دعبس
- Exercise quadratic equationsUploaded byzaedmohd