


Citation: AIP Conference Proceedings 2023, 020233 (2018); doi: 10.1063/1.5064230


Published by the American Institute of Physics

Multiclass Classification on Brain Cancer
with Multiple Support Vector Machine and
Feature Selection Based on Kernel Function
Z. Rustam a) and S. A. A. Kharis

Department of Mathematics, Faculty of Mathematics and Natural Sciences (FMIPA)


Universitas Indonesia, Depok 16424, Indonesia
a) Corresponding author: rustam@ui.ac.id

Abstract. Cancer is a disease that needs proper treatment. There are more than 150 types of cancer, and one of them is
brain cancer. Taking advantage of microarray data, machine learning methods can be applied to help predict brain cancer
according to its type. This can be formulated as a multiclass classification problem. Using the one-versus-one approach,
a multiclass problem with k classes is transformed into k(k-1)/2 binary classification problems. To improve the accuracy,
the candidate features are evaluated using feature selection. In this research, a kernel function is implemented as the
feature selection method and the Multiple Support Vector Machine (MSVM) method is implemented as the classification
method. The results obtained show a comparison of the accuracy of MSVM with and without feature selection.

Keywords: Brain Cancer, Multiclass Classification, Multiple Support Vector Machine.

INTRODUCTION
Cancer classification has been based primarily on the morphological appearance of the tumor, but this has
limitations. Tumors with a similar appearance can follow significantly different clinical courses and show different
responses to therapy. In a few cases, such clinical heterogeneity has been explained by dividing morphologically
similar tumors into subtypes with distinct pathogeneses. Moreover, cancer classification has been difficult in part
because it has historically relied on specific biological insights rather than on systematic and unbiased approaches for
recognizing tumor subtypes. Over the years, cancer classification for detecting cancer at an early, treatable stage has
improved. Cancer classification used for treatment now faces the challenge of targeting specific therapy at each type
of cancer pathogen, in order to maximize efficacy and minimize toxicity.
There are many types of cancer, and one of them is brain cancer. It has even become the most common cause of
cancer-related death for people below 40. According to the Cure Brain Cancer Foundation, the survival rate of brain
cancer patients has increased by only 2% in the last 30 years [1]. As new cases arise and the number of deaths caused
by brain cancer grows, a method to accelerate the brain cancer classification process is needed.
Medical data commonly take the form of microarray data. Microarray data numerically represent the expression of
human genes in a specific part of the body. From microarray data that have been manually classified, the detected
gene patterns can give clues to the machine for classifying other microarray data on the same issue. The simplest case
of the brain cancer classification problem is binary-class classification, whose objective is to classify patients into
two categories: cancer and non-cancer. In real-life cases, however, the classification is not limited to two categories
[2]. This problem is often called multiclass classification. The data used in this paper constitute a multiclass
classification problem.

Proceedings of the 3rd International Symposium on Current Progress in Mathematics and Sciences 2017 (ISCPMS2017)
AIP Conf. Proc. 2023, 020233-1–020233-6; https://doi.org/10.1063/1.5064230
Published by AIP Publishing. 978-0-7354-1741-0/$30.00

Medical microarray data have two main characteristics. First, they have many features; here, the genes are considered
the features. This leads to long computation times, and not all features are significant for classification. To
overcome this problem, a feature selection procedure needs to be carried out. The second characteristic of the data is
that they do not have many samples [3].
There are mainly three approaches to feature selection: filter, wrapper, and embedded. The filter approach is independent
of the classification algorithm, whereas the wrapper and embedded approaches depend on it. In this paper, a kernel
function method is implemented as the feature selection method and the Multiple Support Vector Machine (MSVM)
method is implemented as the classification method.

MATERIALS AND METHODS

Microarray Data
This paper uses brain cancer data in microarray form. The data were obtained from the Broad Institute
(http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi) [4], a biomedical research institute of MIT and Harvard. The
data consist of 7129 features and 42 samples.

Kernel Function
Kernel functions were proposed by Vapnik in 1997 [5] and developed by Scholkopf [6] and Cristianini [7]. The
advantage of using kernel functions is that memory is saved in the new feature space, because the vectors are only
mapped implicitly to a higher dimension. Let $\Phi : X \to F$ be a non-linear mapping from the input space $X$ to a new
high-dimensional feature space $F$. The dot product $x_i^{T} x_j$ in the original input space is then mapped to
$\Phi(x_i)^{T} \Phi(x_j)$ in the new feature space. By Mercer's theorem, a kernel function is defined as in Equation (1) [8]:

$$K(x_i, x_j) = \Phi(x_i)^{T} \Phi(x_j) \qquad (1)$$

There are several types of kernel functions, such as the polynomial, Gaussian, and sigmoid kernels.
The Euclidean distance in the space $F$ can be calculated as follows:

$$\left\| \Phi(x_i) - \Phi(x_j) \right\|^2 = \left( \Phi(x_i) - \Phi(x_j) \right)^{T} \left( \Phi(x_i) - \Phi(x_j) \right) = K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j) \qquad (2)$$

Equation (2) is used to measure dissimilarity in the feature selection, together with the modified Gaussian kernel
defined in Equation (3):

$$K(x_i, x_j) = \exp\!\left( -\frac{\left\| x_i - x_j \right\|^2}{\sigma^2} \right) \qquad (3)$$
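As an illustration, the minimal Python/NumPy sketch below computes the modified Gaussian kernel of Equation (3) and the kernel-induced squared distance of Equation (2). The function names and the default $\sigma = 0.5$ are assumptions made for the example, not values prescribed by the paper.

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=0.5):
    """Modified Gaussian kernel, Eq. (3): K(xi, xj) = exp(-||xi - xj||^2 / sigma^2)."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(diff, diff) / sigma ** 2)

def feature_space_distance_sq(xi, xj, kernel=gaussian_kernel):
    """Squared Euclidean distance in F, Eq. (2): K(xi,xi) - 2 K(xi,xj) + K(xj,xj)."""
    return kernel(xi, xi) - 2.0 * kernel(xi, xj) + kernel(xj, xj)
```

For the Gaussian kernel, $K(x, x) = 1$, so this distance reduces to $2\left(1 - K(x_i, x_j)\right)$, which is exactly the simplification used later in Equation (6).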

Feature Selection Based on Kernel Function

The idea of feature selection based on a kernel function is to calculate the weight of each feature so as to optimize a
clustering objective function. The kernel function is used to compute the dissimilarity measure described in the previous
section. Given a dataset $X \in \mathbb{R}^{n \times p}$ with $n$ samples and $p$ genes (features), the weight of each
feature is first initialized as $w_k = \frac{1}{p}$, $k = 1, 2, \ldots, p$. The dataset must be labeled for classification,
$y_i \in Y$, $i = 1, 2, \ldots, n$. Next, the cluster centers and the distances between the cluster centers and the samples
are computed. Each class $C_i$ is treated as a cluster, so the cluster center $v_i = [v_{i1}, v_{i2}, \ldots, v_{ip}]$ can
be calculated using Equation (4):

$$v_{ij} = \frac{\sum_{x_k \in C_i} x_{kj}}{n_i}, \qquad i = 1, 2, \ldots, C, \quad j = 1, 2, \ldots, p \qquad (4)$$

where $n_i$ is the number of samples contained in class $C_i$. In feature selection based on a kernel function, a
dissimilarity measure is calculated; this is the basic operation of the clustering method. The dissimilarity between a
sample and a cluster center, obtained using the kernel, is written in Equation (5):

$$D\!\left(x_j, v_i\right) = \sum_{k=1}^{p} \left\| \Phi(x_{jk}) - \Phi(v_{ik}) \right\|^2 = \sum_{k=1}^{p} \left\{ K(x_{jk}, x_{jk}) - 2K(x_{jk}, v_{ik}) + K(v_{ik}, v_{ik}) \right\} \qquad (5)$$

By using the modified Gaussian kernel of Equation (3), the per-feature distance in Equation (5) can be written as:

$$\left\| \Phi(x_{jk}) - \Phi(v_{ik}) \right\|^2 = 2\left( 1 - K(x_{jk}, v_{ik}) \right) = 2\left( 1 - \exp\!\left( -\frac{(x_{jk} - v_{ik})^2}{\sigma^2} \right) \right) \qquad (6)$$

The objective function of this feature selection is similar to the objective function of the SCAD feature selection, so it
is defined as in Equation (7) [9]:

$$\min J = \sum_{i=1}^{C} \sum_{x_j \in C_i} \sum_{k=1}^{p} w_k \left\| \Phi(x_{jk}) - \Phi(v_{ik}) \right\|^2 + \delta \sum_{k=1}^{p} w_k^2 \qquad (7)$$

where $w = (w_1, \ldots, w_p)$ is subject to

$$w_k \in [0, 1], \quad k = 1, \ldots, p, \qquad \sum_{k=1}^{p} w_k = 1 \qquad (8)$$

The objective function is minimized when the values of $w$ and $\delta$ are low, so both $w$ and $\delta$ need to be
updated. The update equations are written in Equations (9) and (10):

$$w_k = \frac{1}{p} + \frac{1}{2\delta} \sum_{i=1}^{C} \sum_{x_j \in C_i} \left( \frac{1}{p} \sum_{l=1}^{p} \left\| \Phi(x_{jl}) - \Phi(v_{il}) \right\|^2 - \left\| \Phi(x_{jk}) - \Phi(v_{ik}) \right\|^2 \right) \qquad (9)$$

$$\delta^{(t)} = \alpha \, \frac{\sum_{i=1}^{C} \sum_{x_j \in C_i} \sum_{k=1}^{p} w_k \left\| \Phi(x_{jk}) - \Phi(v_{ik}) \right\|^2}{\sum_{k=1}^{p} w_k^2}, \qquad \alpha \ \text{constant} \qquad (10)$$

The feature selection based on kernel function algorithm is as follows.

Algorithm 1. Feature selection based on kernel function

Input: dataset $X$, labels $y$, $\alpha = 0.5$, stopping tolerance $\varepsilon$
Output: $w$, the weight of each gene
Step 1. Initialization:
    Calculate the weight of each gene $k$ using $w_k = \frac{1}{p}$
    Calculate the center $v_i$ of each cluster using Equation (4)
    Calculate the distances using Equation (5)
Step 2. Calculate the value of $\delta^{(t)}$ using Equation (10)
Step 3. Update the weight of each gene using Equation (9)
Step 4. Calculate the value of $J$ using Equation (7)
Step 5. Determine the stopping criterion from the centroids of the current iteration $(t)$ and the previous iteration $(t-1)$:
    $\Delta = \left\| v^{(t)} - v^{(t-1)} \right\|$
    If $\Delta < \varepsilon$, the iteration stops; otherwise, go to Step 2.
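A compact NumPy sketch of Algorithm 1 is given below, under the assumption that the class centers are the fixed per-class means of Equation (4) and that the per-feature dissimilarities follow Equation (6). The parameter values (alpha, sigma, the tolerance eps, the iteration cap) and the choice to stop on the change in the objective $J$ rather than on the centroids are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def kernel_feature_selection(X, y, alpha=0.5, eps=1e-6, sigma=0.5, max_iter=100):
    """Sketch of Algorithm 1: iterate the weight update of Eq. (9) and delta of Eq. (10)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    classes = list(np.unique(y))
    w = np.full(p, 1.0 / p)                                   # Step 1: w_k = 1/p

    # Eq. (4): class centers, then the center of the class each sample belongs to.
    V = np.array([X[np.asarray(y) == c].mean(axis=0) for c in classes])
    centers = np.array([V[classes.index(label)] for label in y])

    # Eq. (6): per-feature kernel dissimilarity d[j, k] = ||Phi(x_jk) - Phi(v_ik)||^2.
    d = 2.0 * (1.0 - np.exp(-((X - centers) ** 2) / sigma ** 2))

    J_prev = np.inf
    for _ in range(max_iter):
        delta = alpha * np.sum(d * w) / np.sum(w ** 2)        # Eq. (10)
        w = 1.0 / p + np.sum(d.mean(axis=1, keepdims=True) - d, axis=0) / (2.0 * delta)  # Eq. (9)
        w = np.clip(w, 0.0, 1.0)
        w /= w.sum()                                          # one simple way to keep constraint (8)
        J = np.sum(d * w) + delta * np.sum(w ** 2)            # Eq. (7)
        if abs(J_prev - J) < eps:                             # stop when J no longer changes
            break
        J_prev = J
    return w
```

The 500 or 1000 genes with the largest weights $w_k$ would then be kept for classification, as in the experiments below.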

Support Vector Machine (SVM)


Support Vector Machine (SVM) was first introduced by Vapnik in 1998. The main objective of SVM is to
construct a hyperplane that maximizes the margin, i.e., the distance from the hyperplane to the nearest data point of
either class [10]. The larger the margin, the smaller the generalization error. The primal optimization problem can be
written as:

$$\min_{w, b, \xi} \ \frac{1}{2}\left\| w \right\|^2 + C \sum_{i=1}^{n} \xi_i \qquad (11)$$
$$\text{s.t.} \quad y_i \left( w \cdot x_i + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, 2, \ldots, n$$

Here $w \cdot x + b = 0$ is the hyperplane with weight parameter $w$ and bias parameter $b$; $C > 0$ is a regularization
parameter, which controls the balance between minimizing misclassification and maximizing the hyperplane margin; and
$\xi_i$ is the slack variable. The slack variable allows misclassification at some distance. If $\xi_i = 0$, the $i$-th
data point lies exactly on the margin or on the correct side of it. If $0 < \xi_i \leq 1$, the $i$-th data point lies
inside the margin but still on the correct side. If $\xi_i > 1$, the $i$-th data point lies on the wrong side and is
misclassified. The corresponding dual problem is:

$$\min_{\alpha} \ \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j - \sum_{i=1}^{n} \alpha_i \qquad (12)$$
$$\text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \quad i = 1, 2, \ldots, n$$

where $\alpha_i$ are the Lagrange multipliers. The weight vector is expressed as $w = \sum_{i=1}^{n} \alpha_i y_i x_i$.
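The paper does not state which solver was used for the binary subproblems. As a hedged illustration, the sketch below trains a single soft-margin SVM with scikit-learn's SVC on synthetic stand-in data; C plays the role of the regularization parameter in Equation (11), and the RBF kernel and the data are assumptions made only for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data standing in for one binary subproblem (labels in {-1, +1}).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(2.0, 1.0, (20, 5))])
y = np.array([-1] * 20 + [1] * 20)

# C in Eq. (11): a larger C penalizes slack (misclassification) more heavily.
clf = SVC(C=1.0, kernel="rbf", gamma="scale")
clf.fit(X, y)
print(clf.predict(X[:3]))   # predicted labels for the first three samples
```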

One Versus One Multiclass Support Vector Machine (OVO-MCSVM)


There are several methods to extend the binary-class SVM to multiclass problems, such as one versus all (one versus
rest), one versus one, the directed acyclic graph (DAG), and error-correcting output coding (ECOC) [11].
In this paper, the one-versus-one method is used. It transforms a multiclass problem into k(k-1)/2 binary
classification problems; in other words, every possible pair of classes is considered. For each binary classification
problem, the training set is $T_m = \{x_i, y_i\}_{i=1}^{n_m}$, where $n_m$ is the total number of samples in the $m$-th pair of classes.
Recall that for multiclass classification, $y_i \in \{1, 2, \ldots, k\}$, where k is the number of classes. If there are 3 classes,
1, 2, and 3, the problem is transformed into 3 binary classification problems: {1,2}, {1,3}, and {2,3}. Consider x an example
of test data. The class of x is assigned to the class most voted for by the binary classifiers applied to x.
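A minimal sketch of this one-versus-one scheme is shown below, assuming NumPy feature matrices and using scikit-learn's SVC for each binary subproblem; the kernel and C value are illustrative choices, and scikit-learn's SVC also implements this pairwise voting internally.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

def ovo_fit_predict(X_train, y_train, X_test, C=1.0):
    """One-versus-one: train k(k-1)/2 binary SVMs and predict by majority vote."""
    classes = np.unique(y_train)
    models = {}
    for a, b in combinations(classes, 2):
        mask = np.isin(y_train, [a, b])                 # keep only the pair {a, b}
        models[(a, b)] = SVC(C=C, kernel="rbf", gamma="scale").fit(X_train[mask], y_train[mask])

    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    class_pos = {c: i for i, c in enumerate(classes)}
    for clf in models.values():
        for row, label in enumerate(clf.predict(X_test)):
            votes[row, class_pos[label]] += 1           # each binary classifier casts one vote
    return classes[votes.argmax(axis=1)]                # most-voted class wins
```

With the 5 classes of the brain cancer data, this builds the 10 pairwise classifiers listed in the experiments below.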

EXPERIMENTAL RESULTS

Data Overview
In the brain cancer microarray data on observation, there are 42 samples with 5 classes:
1. Medulloblastoma (MD): 10 samples
2. Malignant glioblastoma (MG): 10 samples
3. Atypical Teratoid Rhabdoid Tumor (Rhab): 10 samples
4. Normal: 4 samples
5. Primitive neuroectodermal tumor (PNET): 8 samples
The data is randomly divided into training data and test data, with 30 samples for training and 12 samples for
testing. The training data consist of 7 MD, 7 MG, 7 Rhab, 3 Normal, and 6 PNET samples. Table 1 shows some examples
of the data.
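A hedged sketch of the split described above, assuming scikit-learn and a stratified random division so that each class keeps roughly its proportion; the random seed, the use of stratification, and the placeholder arrays are assumptions, since the paper only states the resulting counts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the dataset's shape (42 samples x 7129 genes, 5 classes);
# in practice X and y would be read from the Broad Institute files.
X = np.random.rand(42, 7129)
y = np.array(["MD"] * 10 + ["MG"] * 10 + ["Rhab"] * 10 + ["Normal"] * 4 + ["PNET"] * 8)

# A stratified 30/12 split approximately reproduces the per-class counts quoted above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=30, test_size=12, stratify=y, random_state=0)
print(len(X_train), len(X_test))   # 30 12
```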
L49218_f_at and M71243_f_at are features in the dataset, and the numbers represent the expression levels of these
genes for each sample. Using the one-versus-one method, the training data are transformed into 10 subsets: {1,2},
{1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, and {4,5}. The MSVM results without feature selection are
shown in Table 2. In addition, the 500 and 1000 best features obtained from the Gaussian kernel procedure are used. Given
an unknown example x, after it is put into the MSVM classifiers, the voting procedure is applied to predict its class.
A confusion matrix is used to compute the accuracy. Table 3 shows the results of the experiment using the 500 best
features and Table 4 shows the results using the 1000 best features, selected with the Gaussian kernel
($\alpha = 0.5$, stopping tolerance $\varepsilon$ as in Algorithm 1).
TABLE 1. Example of Brain Cancer Data
L49218_f_at M71243_f_at Class
27 51 MD
112 173 MG
-225 -50 Rhab
-133 552 Normal
71 282 PNET

TABLE 2. Confusion Matrix for Medulloblastoma, Malignant Glioblastoma, Atypical Teratoid Rhabdoid Tumor,
Normal, and Primitive Neuroectodermal Tumor with Multiple Support Vector Machine Without Feature Selection

MD MG Rhab Normal PNET


MD 3 0 0 0 1
MG 0 3 0 0 0
Rhab 0 0 3 0 0
Normal 0 0 0 0 0
PNET 0 0 0 1 1
Overall accuracy = (correctly classified samples) / (number of samples used) = 10/12 = 83.33%
Per class accuracy
Medulloblastoma = 100 %
Malignant glioblastoma = 100 %
Atypical teratoid rhabdoid tumor = 100 %
Normal =0%
Primitive neuroectodermal tumor = 50 %
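As a check on these figures, the short NumPy sketch below recomputes the overall and per-class accuracy from the confusion matrix of Table 2, reading the rows as predicted classes and the columns as actual classes; this orientation is inferred from the reported per-class accuracies rather than stated explicitly in the paper.

```python
import numpy as np

# Confusion matrix of Table 2; rows = predicted class, columns = actual class,
# in the order MD, MG, Rhab, Normal, PNET.
cm = np.array([
    [3, 0, 0, 0, 1],
    [0, 3, 0, 0, 0],
    [0, 0, 3, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
])

overall = np.trace(cm) / cm.sum()            # 10 / 12 = 83.33 %
actual_totals = cm.sum(axis=0)               # samples per actual class: 3, 3, 3, 1, 2
per_class = np.divide(np.diag(cm), actual_totals,
                      out=np.zeros(cm.shape[0]), where=actual_totals > 0)
print(overall, per_class)                    # 0.8333..., [1. 1. 1. 0. 0.5]
```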

TABLE 3. Confusion Matrix for Medulloblastoma, Malignant Glioblastoma, Atypical Teratoid Rhabdoid Tumor,
Normal, and Primitive Neuroectodermal Tumor with Multiple Support Vector Machine and Feature Selection Based on the
Gaussian Kernel ($\alpha = 0.5$, stopping tolerance $\varepsilon$ as in Algorithm 1), using the 500 best features

MD MG Rhab Normal PNET


MD 3 0 0 0 0
MG 0 3 0 0 0
Rhab 0 0 3 0 0
Normal 0 0 0 0 0
PNET 0 0 0 1 2
Overall accuracy = (correctly classified samples) / (number of samples used) = 11/12 = 91.67%
Per class accuracy
Medulloblastoma = 100 %
Malignant glioblastoma = 100 %
Atypical Teratoid Rhabdoid Tumor = 100 %
Normal =0%
Primitive neuroectodermal tumor = 100 %

TABLE 4. Confusion Matrix for Medulloblastoma, Malignant Glioblastoma, Atypical Teratoid Rhabdoid Tumor,
Normal, and Primitive Neuroectodermal Tumor with Multiple Support Vector Machine and Feature Selection Based on the
Gaussian Kernel ($\alpha = 0.5$, stopping tolerance $\varepsilon$ as in Algorithm 1), using the 1000 best features

MD MG Rhab Normal PNET


MD 3 0 0 0 1
MG 0 3 0 0 0
Rhab 0 0 3 0 0
Normal 0 0 0 0 0
PNET 0 0 0 1 1
Overall accuracy = (correctly classified samples) / (number of samples used) = 10/12 = 83.33%
Per class accuracy
Medulloblastoma = 100 %
Malignant glioblastoma = 100 %
Atypical Teratoid Rhabdoid Tumor = 100 %
Normal =0%

Primitive neuroectodermal tumor = 50 %

CONCLUSIONS
From the test results, it can be seen that Multiple SVM with feature selection based on a kernel function reached an
accuracy of 91.67% when using the 500 best features, which is slightly higher than Multiple SVM without kernel-based
feature selection. Thus, it can be said that Multiple SVM with feature selection based on a kernel function performs
better than Multiple SVM without feature selection.

ACKNOWLEDGMENTS

This research is supported by PITTA UI 2017 research grant from Universitas Indonesia.

REFERENCES

1. Australian Institute of Health and Welfare (AIHW), Australian Cancer Incidence and Mortality (ACIM)
(AIHW, Canberra, 2017), p. 7.
2. Z. Rustam and V. Panca, Application of Machine Learning on Brain Cancer Multiclass Classification, in
International Symposium on Current Progress in Mathematics and Sciences, Depok, 2016 (FMIPA UI,
Depok, 2016).
3. T. R. Golub et al., Science 286, 531 (1999).
4. Broad Institute, Cancer Program Datasets, available at: http://www.broadinstitute.org/cgi-bin/cancer/
datasets.cgi
5. V. N. Vapnik, Statistical Learning Theory (John Wiley and Sons Inc., New York, 1998).
6. B. Scholkopf, A. Smola, and K. R. Muller, Neural Computation 10, 1299 (1998).
7. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-based
Learning Methods (Cambridge University Press, Cambridge, 2000).
8. Z. Rustam and A. S. Talita, Journal of Theoretical and Applied Information Technology (JATIT) 80, 147
(2015).
9. H. Chen, Y. Zhang, and I. Gutman, J. Biomed. Inform. 62, 12 (2016).
10. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Mach. Learn. 46, 389 (2002).
11. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, Singapore, 2006).
