
Data Classification Based on Support Vector Machine Optimized Parameters and Feature Selection by Genetic Algorithm


1 dechit@msn.com
2 pmeesad@gmail.com

Abstract

Many classification methods have been proposed recently, but no single methodology is best for every classification problem. In this paper we look for a model that increases the accuracy of classifiers, using the Austrian Credit, German Credit, and Bankruptcy data sets in our experiments. The model combines a support vector machine whose parameters are optimized, and whose features are selected, by a genetic algorithm. We compare its results with those of a decision tree model, a neural network model, a support vector machine model, and a support vector machine with parameters optimized by a genetic algorithm. The combined method of support vector machine parameter optimization and feature selection by genetic algorithm gives the highest accuracy: 87.86% on Austrian Credit and 85.83% on Bankruptcy data.

Keywords: Support Vector Machine, Genetic Algorithm

1. Introduction

Huang et al. [1] applied support vector machines (SVM) and neural networks to credit rating analysis. SVM has also been applied to bankruptcy prediction [2][3][4]. Real-world data, however, often contain noise and outliers that degrade classification accuracy [5]. Yongchen and Honge [6] used a genetic algorithm to optimize the parameters of an SVM for credit rating analysis in the presence of noise and outliers. Chen et al. [7] mined customer credit with a hybrid technique that combines SVM with MARS and CART.
2. Theory

2.1 Support Vector Machine (SVM)

SVM was introduced by Vapnik [8]. Let the training set be

D = {(x_i, y_i); i = 1, 2, ..., n}   (1)

where x_i = (x_i1, x_i2, ..., x_in) ∈ R^n is an input vector and y_i ∈ {+1, -1} is its class label (+1 = class 1, -1 = class 0). SVM maps an input x into a feature space through

φ(x) = [φ_1(x), φ_2(x), ..., φ_n(x)]^T   (2)

and classifies with the decision function

f(x) = Σ_{j=1}^{n} w_j φ_j(x) + b   (3)

where w_j is the weighting from feature space to output space and b is the bias (threshold). Introducing slack variables ξ_i for points that violate the margin, the separating constraints are

w^T x_i + b ≥ +1 - ξ_i ;  y_i = +1   (4)
w^T x_i + b ≤ -1 + ξ_i ;  y_i = -1   (5)

with ξ_i > 0. SVM finds the hyperplane that maximizes the margin between the two classes by solving

Minimize_{w, b, ξ}  (1/2)||w||^2 + C Σ_{i=1}^{n} ξ_i
subject to  y_i(w^T φ(x_i) + b) + ξ_i - 1 ≥ 0,  ξ_i ≥ 0,  i = 1, 2, ..., N   (6)

Forming the Lagrangian and passing to the dual set of variables turns this constrained optimization into

Maximize  Σ_{i=1}^{N} α_i - (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j K(x_i, x_j)
subject to  Σ_{i=1}^{N} α_i y_i = 0,  0 ≤ α_i ≤ C,  i = 1, 2, ..., N   (7)

where the α_i are Lagrange multipliers. With the optimal values α_i*, the decision function becomes

f(x) = sign( Σ_{j=1}^{N} α_j* y_j K(x, x_j) + b )   (8)

where K(x, x_j) = φ^T(x) φ(x_j) is a kernel function. Three common kernel functions are given in (9)-(11):

Polynomial kernel:
k(x_i, x_j) = (1 + x_i · x_j)^d   (9)

Radial basis function (RBF) kernel:
k(x_i, x_j) = exp(-γ ||x_i - x_j||^2)   (10)

Sigmoid kernel:
k(x_i, x_j) = tanh(κ x_i · x_j + c)   (11)
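As a sanity check, the kernels in (9)-(11) can be written directly in plain Python (stdlib only). The parameter values d, gamma, kappa, and c below are illustrative choices, not values taken from the paper:

```python
import math

def dot(a, b):
    """Inner product x_i . x_j."""
    return sum(ai * bi for ai, bi in zip(a, b))

def polynomial_kernel(xi, xj, d=2):
    """Eq. (9): k(x_i, x_j) = (1 + x_i . x_j)^d."""
    return (1.0 + dot(xi, xj)) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    """Eq. (10): k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(xi, xj, kappa=0.1, c=-1.0):
    """Eq. (11): k(x_i, x_j) = tanh(kappa * x_i . x_j + c)."""
    return math.tanh(kappa * dot(xi, xj) + c)

x1, x2 = [1.0, 2.0], [2.0, 0.5]
print(polynomial_kernel(x1, x2))  # (1 + 3)^2 = 16.0
print(rbf_kernel(x1, x1))         # zero distance -> 1.0
```

Note that the RBF kernel of a point with itself is always 1, which is a quick way to spot an implementation error.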

2.2 Genetic Algorithm (GA)

A genetic algorithm searches a space of candidate solutions by imitating natural selection, evolving a population with the genetic operators of inheritance, selection, mutation, and crossover [9]. A typical GA proceeds as follows:

1) Encode candidate solutions as chromosomes and create an initial population.
2) Evaluate the fitness of every chromosome.
3) Select chromosomes for reproduction in proportion to their fitness.
4) Apply crossover to the selected parents to produce offspring.
5) Apply mutation to the offspring.
6) Repeat steps 2)-5) until a termination condition is met.
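Steps 1)-6) can be sketched as a minimal GA in plain Python (stdlib only). The one-max fitness function, chromosome length, and population settings here are illustrative stand-ins; in the paper's method the fitness would instead come from the validation accuracy of an SVM decoded from the chromosome:

```python
import random

random.seed(0)

BITS = 10            # chromosome length (illustrative)
POP, GENS = 20, 30   # population size and generations (illustrative)
P_CROSS, P_MUT = 0.6, 0.1

def fitness(chrom):
    # Toy fitness: number of 1-bits ("one-max"); a GA-SVM would
    # decode the chromosome and return classification accuracy.
    return sum(chrom)

def select(pop):
    # Roulette-wheel selection proportional to fitness.
    total = sum(fitness(c) for c in pop)
    r = random.uniform(0, total)
    acc = 0.0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

def crossover(a, b):
    # One-point crossover with probability P_CROSS.
    if random.random() < P_CROSS:
        p = random.randrange(1, BITS)
        return a[:p] + b[p:], b[:p] + a[p:]
    return a[:], b[:]

def mutate(c):
    # Flip each bit independently with probability P_MUT.
    return [bit ^ 1 if random.random() < P_MUT else bit for bit in c]

# 1) initial population
pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):                      # 6) repeat until termination
    nxt = []
    while len(nxt) < POP:
        a, b = select(pop), select(pop)    # 2)-3) evaluate and select
        c1, c2 = crossover(a, b)           # 4) crossover
        nxt += [mutate(c1), mutate(c2)]    # 5) mutation
    pop = nxt[:POP]

best = max(pop, key=fitness)
print(fitness(best))
```

Under selection pressure the population drifts toward high-fitness chromosomes even though crossover and mutation keep injecting variation.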

3. Proposed Method

3.1 Optimizing SVM Parameters and Selecting Features by GA

Each chromosome (genotype) encodes the choice of kernel function, the cost parameter C, the kernel-function parameters, and a feature-selection mask. Its phenotype configures an SVM; the classification accuracy of that SVM on the validation set is the fitness value fed back to the genetic operators (crossover, mutation, and fitness evaluation), as shown in Figure 1. The gene encoding is given in Table 1.

Figure 1 : Phenotype of the parameter genes and the GA-SVM evaluation loop.

Table 1 : Gene encoding of a chromosome

Bits    Gene             Decoding and use
1-2     Kernel function  0 = Linear, 1 = Polynomial, 2 = RBF, 3 = Sigmoid
3-11    Degree           0-512; used only with the Polynomial kernel
12-25   Gamma            0.0001 <= gamma <= 99.999, decoded as the gene
                         value x 99999/16383 x 0.001; used with the
                         Polynomial, RBF, and Sigmoid kernels
26-34   Coef             0-512; used with the Polynomial and Sigmoid kernels
35-44   Cost (C)         0.0001 <= C <= 99.999, decoded as the gene
                         value x 99999/1024 x 0.001
45-55*  Feature mask     one bit per feature; 0 = feature excluded,
45-74**                  1 = feature included

* Austrian Credit   ** Bankruptcy data

The population consists of 30 chromosomes evolved for 50 generations, with crossover probability 0.6 and mutation probability 0.1. Fitness evaluation uses the mean absolute percent error (MAPE). Selection combines elitism selection with roulette wheel selection: the best 5 chromosomes survive by elitism, and the remaining 25 are chosen by roulette wheel selection.
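Decoding a chromosome according to Table 1 can be sketched as follows (plain Python; the function names and the example bit string are hypothetical, and the 14-bit feature mask is assumed from the Austrian Credit data set's 14 features):

```python
KERNELS = {0: "linear", 1: "polynomial", 2: "rbf", 3: "sigmoid"}

def bits_to_int(bits):
    """Interpret a list of 0/1 genes as an unsigned integer (MSB first)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def decode(chrom, n_features=14):
    """Decode one chromosome using the gene layout of Table 1
    (the slices below are 0-based views of the same bit fields)."""
    kernel = KERNELS[bits_to_int(chrom[0:2])]                  # bits 1-2
    degree = bits_to_int(chrom[2:11])                          # bits 3-11
    gamma = bits_to_int(chrom[11:25]) * 99999 / 16383 * 0.001  # bits 12-25
    coef = bits_to_int(chrom[25:34])                           # bits 26-34
    cost = bits_to_int(chrom[34:44]) * 99999 / 1024 * 0.001    # bits 35-44
    mask = chrom[44:44 + n_features]                           # feature bits
    features = [i for i, bit in enumerate(mask) if bit == 1]
    return {"kernel": kernel, "degree": degree, "gamma": gamma,
            "coef": coef, "cost": cost, "features": features}

# Example chromosome: RBF kernel, maximal gamma, C gene = 1,
# all 14 features selected.
chrom = [1, 0] + [0] * 9 + [1] * 14 + [0] * 9 + [0] * 9 + [1] + [1] * 14
params = decode(chrom)
print(params["kernel"], round(params["gamma"], 3), len(params["features"]))
```

With the maximal 14-bit gamma gene (16383) the decoding formula yields the upper bound 99.999, which matches the range stated in Table 1.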
3.2 Data Sets

The experiments use two data sets: the Austrian Credit data set from the UCI machine learning repository [10] and the Bankruptcy data set [11]. Both are summarized in Table 2.

Table 2 : Data sets

Data set         Instances  Features  Class +1  Class -1
Austrian Credit  690        14        307       383
Bankruptcy data  240        30        128       112

All experiments were run on an Intel Celeron M 1.87 GHz machine with 1.5 GB of memory, using the open-source LibSVM library [12].
3.3 Models Compared

Five models are compared, as listed in Table 3.

Table 3 : Models compared

No.  Model                                                Abbreviation
1    Decision tree                                        DT
2    Neural network                                       NN
3    SVM                                                  SVM
4    SVM + GA-optimized kernel function and parameters    SVMGP
5    SVM + GA-optimized kernel function and parameters
     with feature selection                               SVMGPF

4. Results

On the Austrian Credit data set, SVMGPF gives the highest accuracy at 88.12%, followed by SVMGP at 86.13%, SVM at 86.01%, NN at 85.65%, and DT at 83.77%, as shown in Table 4 and Figure 2.

Table 4 : Accuracy on the Austrian Credit data set

Model    Accuracy (%)
DT       83.77
NN       85.65
SVM      86.01
SVMGP    86.13
SVMGPF   88.12*

On the Bankruptcy data set, SVMGPF again gives the highest accuracy at 85.83%, followed by SVMGP at 71.67%, DT at 70.00%, and NN and SVM at 65.00%, as shown in Table 5 and Figure 3.

Table 5 : Accuracy on the Bankruptcy data set

Model    Accuracy (%)
DT       70.00
NN       65.00
SVM      65.00
SVMGP    71.67
SVMGPF   85.83*

Figure 2 : Accuracy on the Austrian Credit data set.

Figure 3 : Accuracy on the Bankruptcy data set.

5. Conclusion

On both data sets, the combined model of SVM parameter optimization and feature selection by genetic algorithm (SVMGPF) achieves the highest accuracy among the five models compared.

6. References


[1] Huang, Z., Chen, H., Hsu, C.-J., Chen, W.-H., and Wu, S., "Credit rating analysis with support vector machines and neural networks: a market comparative study", Decision Support Systems, vol. 37, pp. 543-558, 2004.
[2] Fan, A., and Palaniswami, M., "Selecting bankruptcy predictors using a support vector machine approach", Proceedings of the International Joint Conference on Neural Networks, 2000.
[3] Shin, K. S., Lee, T. K., and Kim, H. J., "An application of support vector machines in bankruptcy prediction model", Expert Systems with Applications, vol. 28, pp. 127-135, 2005.
[4] Min, J. H., and Lee, Y.-C., "Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters", Expert Systems with Applications, vol. 28, pp. 603-614, 2005.
[5] [Thai-language reference; bibliographic details lost in extraction], 2552 (2009).
[6] Yongchen, L., and Honge, X., "Credit rating analysis with support vector machines optimized by genetic algorithm", 4th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '08), 2008.
[7] Chen, W., Ma, C., et al., "Mining the customer credit using hybrid support vector machine technique", Expert Systems with Applications, 2008.
[8] Vapnik, V., The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[9] Holland, J. H., "Outline for a logical theory of adaptive systems", Journal of the Association for Computing Machinery, vol. 9, pp. 297-314, 1962.
[10] Australian Credit Approval data set. Retrieved February 10, 2008, from http://archive.ics.uci.edu/ml/machine-learning-databases/statlog/australian/, 2008.
[11] Pietruszkiewicz, W., "Application of discrete predicting structures in an early warning expert system for financial distress", Ph.D. Thesis, Technical University of Szczecin, Szczecin, 2004.
[12] Chang, C.-C., and Lin, C.-J., LIBSVM: a library for support vector machines. Retrieved August 8, 2008, from http://www.csie.ntu.edu.tw/~cjlin/libsvm/, 2008.
