S. Ghorai
Department of AEIE, Heritage Institute of Technology, Kolkata, India
ABSTRACT: In this work, the recently proposed Vector-Valued Regularized Kernel Function Approximation (VVRKFA) learning algorithm and the Single-hidden Layer Feedforward Neural Network (SLFN) are combined to form a hybrid classifier called VVRKFA-SLFN. In the VVRKFA method, multiclass data are mapped to a low-dimensional label space through a regression technique, and classification is carried out in this space by measuring Mahalanobis distances. The SLFN, on the other hand, trains exceptionally fast and can achieve high accuracy on unseen samples. In the proposed hybrid learning system, VVRKFA is used to map the patterns from the high-dimensional feature space to a subspace, and an SLFN is trained on these low-dimensional vectors to determine the class labels. The presented technique is verified experimentally on both benchmark and gene-microarray data sets, demonstrating that the VVRKFA-SLFN classifier improves both accuracy and testing time.
… feedforward neural network learning algorithms with enhanced classification accuracy. This property of ELM has motivated us to employ a similar form of SLFN in the VVRKFA framework.

In this work, a hybrid pattern classifier is proposed for improved pattern classification in a low-dimensional space by combining VVRKFA and an SLFN classifier similar to ELM. The VVRKFA classifier is used to map the feature vectors into the low-dimensional subspace. The task of assigning a label to each test pattern is then performed in this low-dimensional subspace by an SLFN classifier instead of a distance measure. Originally, in VVRKFA the class label of a test pattern is determined by computing its Mahalanobis distances from the class centroids in the low-dimensional subspace. Using an SLFN for this purpose not only speeds up the testing of VVRKFA but also improves its classification performance.

VVRKFA employs the kernel trick to map a training pattern into a high-dimensional space. A nonlinear regularized kernel function is then fitted to map the patterns from this high-dimensional space to the low-dimensional subspace, whose dimension depends on the number of categories of samples in the problem. Classification is carried out in the subspace with the help of the mapped low-dimensional patterns obtained through the vector-valued regression. A new pattern is likewise tested in this label space by mapping it from the input space to the feature space and then finding its fitted response in the label space.

Let us consider a multiclass problem of $N$ different categories of samples with $n$ features and training data $\{(x_i, y_i) : x_i \in \Re^n,\ y_i \in \Re^N,\ i = 1, \ldots, m\}$. VVRKFA solves a regularized least-squares optimization problem to obtain the mapping function
\[
\hat{\rho}(x_i) = \Theta\, K(x_i^T, B^T)^T + b, \qquad (3)
\]

where $\Theta \in \Re^{N \times \bar{m}}$ is the matrix of regression coefficients, $b \in \Re^{N}$ is the bias vector, $K(\cdot,\cdot)$ is the kernel function, and $B \in \Re^{\bar{m} \times n}$ holds the basis vectors, drawn from the training data, over which the kernel is evaluated.
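For concreteness, the mapping step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB implementation: it assumes a Gaussian kernel, uses the full training set as the kernel basis B, and solves the regularized least-squares problem in closed form; all function and variable names here are our own.

```python
import numpy as np

def gaussian_kernel(X, B, mu):
    """K[i, j] = exp(-mu * ||X[i] - B[j]||^2)."""
    sq = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

def fit_vvrkfa_mapping(X, Y, C, mu, B=None):
    """Fit the vector-valued mapping rho_hat(x) = Theta @ K(x, B)^T + b
    by regularized least squares (a sketch; B defaults to all training
    patterns, though VVRKFA may use a reduced basis).
    X : (m, n) training patterns, Y : (m, N) one-hot label vectors."""
    if B is None:
        B = X
    K = gaussian_kernel(X, B, mu)                   # (m, m_bar)
    K1 = np.hstack([K, np.ones((K.shape[0], 1))])   # absorb bias b
    # Ridge solution of min (C/2)||[Theta b]||^2 + (1/2)||K1 W - Y||^2:
    # normal equations (K1^T K1 + C I) W = K1^T Y.
    W = np.linalg.solve(K1.T @ K1 + C * np.eye(K1.shape[1]), K1.T @ Y)
    Theta, b = W[:-1].T, W[-1]                      # Theta: (N, m_bar)
    return Theta, b, B

def vvrkfa_map(X, Theta, b, B, mu):
    """Map patterns into the N-dimensional label subspace (Eq. 3)."""
    return gaussian_kernel(X, B, mu) @ Theta.T + b
```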
Classification in the label subspace uses the centroids of the $N$ different classes, denoted by $\{\bar{\rho}^{(1)}, \bar{\rho}^{(2)}, \ldots, \bar{\rho}^{(N)}\}$. The centroid $\bar{\rho}^{(j)}$ of the $j$th class is computed by

\[
\bar{\rho}^{(j)} = \frac{1}{m_j} \sum_{i=1}^{m_j} \hat{\rho}\bigl(x_i^{(j)}\bigr), \qquad (4)
\]

where $m_j$ represents the number of samples in class $j$. The category of a test sample $x_t$ is determined by

\[
\mathrm{Class}(x_t) = \arg\min_{1 \le j \le N} d_M\bigl(\hat{\rho}(x_t), \bar{\rho}^{(j)} \mid \hat{\Sigma}\bigr), \qquad (6)
\]

where

\[
d_M^2\bigl(\hat{\rho}(x_t), \bar{\rho}^{(j)} \mid \hat{\Sigma}\bigr) = \bigl[\hat{\rho}(x_t) - \bar{\rho}^{(j)}\bigr]^T \hat{\Sigma}^{-1} \bigl[\hat{\rho}(x_t) - \bar{\rho}^{(j)}\bigr]
\]

is the squared Mahalanobis distance,

\[
\hat{\Sigma} = \sum_{j=1}^{N} (m_j - 1)\, \hat{\Sigma}^{(j)} / (m - N) \qquad (7)
\]

is the pooled within-class sample covariance matrix, and

\[
\hat{\Sigma}^{(j)} = \frac{1}{m_j - 1} \sum_{i=1}^{m_j} \bigl[\hat{\rho}\bigl(x_i^{(j)}\bigr) - \bar{\rho}^{(j)}\bigr] \bigl[\hat{\rho}\bigl(x_i^{(j)}\bigr) - \bar{\rho}^{(j)}\bigr]^T
\]

is the sample covariance matrix of class $j$.

In the SLFN stage of the proposed hybrid classifier, $a_i = [a_{i1}, a_{i2}, \ldots, a_{iN}]^T$ is the weight vector that connects the $i$th hidden neuron to the input neurons, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{iN}]^T$ is the weight vector that connects the $i$th hidden neuron to the output neurons, and $d_i$ is the threshold of the $i$th hidden neuron. $G(a_i, d_i, \hat{\rho}_j)$ denotes the $i$th hidden node's response to the input $\hat{\rho}_j$, and it can be expressed as

\[
G(a_i, d_i, \hat{\rho}_j) = g(a_i \cdot \hat{\rho}_j + d_i), \qquad (10)
\]

where $g$ is the activation function. The hidden-layer output matrix is

\[
H(a_1, \ldots, a_q, d_1, \ldots, d_q, \hat{\rho}_1, \ldots, \hat{\rho}_m) =
\begin{bmatrix}
G(a_1, d_1, \hat{\rho}_1) & \cdots & G(a_q, d_q, \hat{\rho}_1) \\
\vdots & \ddots & \vdots \\
G(a_1, d_1, \hat{\rho}_m) & \cdots & G(a_q, d_q, \hat{\rho}_m)
\end{bmatrix}_{m \times q},
\]

and

\[
\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_q^T \end{bmatrix}_{q \times N}
\quad \text{and} \quad
Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_m^T \end{bmatrix}_{m \times N}. \qquad (12)
\]
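Both decision stages operating on the mapped vectors can likewise be sketched. The first function below implements the original Mahalanobis rule of Eqs. (4), (6) and (7); the second pair implements an ELM-style SLFN readout in the spirit of Eqs. (10)-(12), assuming a sigmoid activation $g$ and the usual Moore-Penrose pseudo-inverse solution for $\beta$. This is an illustrative sketch under those assumptions, not the paper's exact procedure.

```python
import numpy as np

def mahalanobis_rule(R_train, y_train, R_test, n_classes):
    """Original VVRKFA decision (Eqs. 4-7): nearest class centroid under
    the pooled within-class covariance, in the label subspace (whose
    dimension equals n_classes). R_train, R_test hold mapped vectors
    rho_hat as rows; y_train holds integer labels 0..n_classes-1."""
    m, N = R_train.shape
    centroids, pooled = [], np.zeros((N, N))
    for j in range(n_classes):
        Rj = R_train[y_train == j]                 # needs >= 2 samples/class
        centroids.append(Rj.mean(axis=0))          # Eq. (4)
        pooled += (len(Rj) - 1) * np.cov(Rj, rowvar=False)
    pooled /= (m - n_classes)                      # Eq. (7)
    S_inv = np.linalg.pinv(pooled)
    d = R_test[:, None, :] - np.array(centroids)[None, :, :]
    dist2 = np.einsum('tjn,nk,tjk->tj', d, S_inv, d)
    return dist2.argmin(axis=1)                    # Eq. (6)

def elm_readout_fit(R_train, Y_train, q=10, seed=0):
    """Proposed hybrid: random hidden layer on the mapped vectors,
    output weights beta by pseudo-inverse (Eqs. 10-12)."""
    rng = np.random.default_rng(seed)
    N = R_train.shape[1]
    A = rng.uniform(-1, 1, (q, N))                 # input weights a_i
    d = rng.uniform(-1, 1, q)                      # thresholds d_i
    H = 1.0 / (1.0 + np.exp(-(R_train @ A.T + d))) # sigmoid g, Eq. (10)
    beta = np.linalg.pinv(H) @ Y_train             # solve H beta = Y
    return A, d, beta

def elm_readout_predict(R_test, A, d, beta):
    H = 1.0 / (1.0 + np.exp(-(R_test @ A.T + d)))
    return (H @ beta).argmax(axis=1)
```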
The specifications of these data sets, such as the number of classes, number of input features and number of samples, are shown in Table 1 and Table 2 separately for the two types of data sets.

Table 1. Specification of benchmark multiclass data sets.

4.2 Experimental setup

The performance of the VVRKFA-SLFN classifier is compared with VVRKFA and with an SLFN in the form of ELM. All the methods are implemented in MATLAB, and 10-fold cross-validation testing is used to compare the performances of the three methods. We employed a grid-search technique to identify the optimal parameter set for a classifier. A Gaussian kernel

\[
k(x_i, x_j) = \exp\bigl(-\mu \|x_i - x_j\|^2\bigr),
\]

where $\mu$ is the Gaussian kernel parameter, is selected as the kernel. The regularization parameter $C$ of VVRKFA and VVRKFA-SLFN is selected by tuning over the set $\{10^i \mid i = -7, -6, \ldots, -1\}$, and the kernel parameter $\mu$ for these two methods is selected from the set $\{2^i \mid i = -8, -7, \ldots, 8\}$. The number of hidden neurons of VVRKFA-SLFN is kept fixed at 10 for all the data sets, while its value is tuned for the ELM classifier for best results. The optimal parameter set for each classifier is chosen based on its performance on a tuning data set consisting of 30% of the total samples. Once the optimal parameter set is chosen, we carry out 10-fold Cross Validation (CV) (Mitchell 1997) testing, presenting the same folds of data to all the classifiers, and report the average testing accuracy and standard deviation.
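The grid search described above might be organized as follows. The 30% tuning split and the two parameter grids follow the text; fit_fn and score_fn are placeholder hooks (our names, not from the paper) wrapping whichever classifier is being tuned.

```python
import numpy as np
from itertools import product

def grid_search(X, y, fit_fn, score_fn, seed=0):
    """Pick (C, mu) maximizing accuracy on a 30% tuning split, with
    C in {10^i : i = -7..-1} and mu in {2^i : i = -8..8} as stated
    in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_tune = int(0.3 * len(y))
    tune, train = idx[:n_tune], idx[n_tune:]
    best_params, best_acc = None, -np.inf
    for C, mu in product([10.0 ** i for i in range(-7, 0)],
                         [2.0 ** i for i in range(-8, 9)]):
        model = fit_fn(X[train], y[train], C, mu)
        acc = score_fn(model, X[tune], y[tune])
        if acc > best_acc:
            best_params, best_acc = (C, mu), acc
    return best_params
```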
Table 3. 10-fold testing performance of SLFN, VVRKFA and VVRKFA-SLFN classifiers on benchmark data sets.
From Table 3 it is observed that a fixed number of hidden neurons (only 10) for the SLFN stage is sufficient for good discrimination of the patterns in the label space, whereas the optimal number of hidden neurons of the standalone SLFN classifier must be tuned separately for each data set.

The variation of testing time with the size of the data set for the VVRKFA-SLFN and VVRKFA classifiers is shown in Fig. 2. This test is performed on the waveform data set with 50 hidden neurons for SLFN and 10 hidden neurons for VVRKFA-SLFN. Fig. 2 shows that the average 10-fold testing time increases for both methods as the number of samples grows, but in all cases VVRKFA-SLFN takes less time than VVRKFA. This demonstrates the speed-up in testing time achieved by the proposed hybrid classifier.

To facilitate better classification results, this drawback of microarray data is removed first by identifying a small gene subset that causes the disease (Li, Weinberg, Darden, & Pedersen 2001, Guyon, Weston, Barnhill, & Vapnik 2002). We have selected the 10 most relevant genes by the MRMR method (Peng, Long, & Ding 2005), as performed in our previous research work (Ghorai, Mukherjee, Sengupta, & Dutta 2011); a sketch of this selection criterion is given below. In the case of the microarray data sets, parameter selection is done in the same way as for the benchmark data sets, but we evaluate Leave-One-Out Cross Validation (LOO-CV) performance (Mitchell 1997), as there are very few samples in each class. In this experiment, the number of hidden neurons of VVRKFA-SLFN is kept fixed at 5 for all four data sets, while its value is tuned for the SLFN classifier for best results. Table 4 shows the results obtained on the microarray data sets.
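As a rough illustration of the MRMR criterion, the greedy selection below adds, at each step, the gene with the highest mutual information with the class label minus its average mutual information with the already-selected genes (the additive "MID" form). It relies on scikit-learn's mutual-information estimators and is only a sketch of the idea, not the exact procedure of Peng, Long, & Ding (2005).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr_select(X, y, k=10):
    """Greedy MRMR: maximize I(feature; y) - mean_s I(feature; s)
    over the already-selected set s. X: (samples, genes), y: labels."""
    relevance = mutual_info_classif(X, y)          # I(gene; class)
    selected = [int(np.argmax(relevance))]         # start from best gene
    while len(selected) < k:
        best_f, best_score = None, -np.inf
        for f in range(X.shape[1]):
            if f in selected:
                continue
            # redundancy: mean MI between candidate and selected genes
            red = np.mean([mutual_info_regression(X[:, [s]], X[:, f])[0]
                           for s in selected])
            score = relevance[f] - red
            if score > best_score:
                best_f, best_score = f, score
        selected.append(best_f)
    return selected
```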
[Figure 2 appears here: average 10-fold testing time in seconds (roughly 0.02 to 0.12 s) plotted against data set size (500 to 5000 samples) for the two classifiers.]

Figure 2. Variation of testing time of VVRKFA and VVRKFA-SLFN classifiers with the increase of the number of samples.

4.5 Analysis of results

The above experimental results may be analyzed as follows. In their recent work, Huang et al. (Huang, Zhou, Ding, & Zhang 2012) showed that an SLFN (like ELM) has comparable or much improved generalization performance for multiclass data classification, with training up to thousands of times faster than a conventional SVM. Thus, it is expected that an SLFN such as the ELM classifier will perform better than a simple distance classifier using the Mahalanobis distance; that is, the SLFN decodes the class labels in the label space better than Mahalanobis-distance decoding does. Because of this, the classification performance of the VVRKFA-SLFN classifier exceeds that of the VVRKFA classifier in most cases, as observed in both Table 3 and Table 4.
Table 4. LOO-CV testing performance of SLFN, VVRKFA and VVRKFA-SLFN classifiers on microarray data sets. (q denotes the number of hidden neurons.)

Data set (classes)   SLFN                  VVRKFA                          VVRKFA-SLFN
                     q    Acc. % (std)     C      mu    Acc. % (std)       C      mu    q   Acc. % (std)
SRBCT (4)            10   92.77 (25.90)    10^-4  2^-6  93.98 (23.79)      10^-4  2^-6  5   96.39 (18.66)
Brain tumor2 (4)     10   84.00 (36.66)    10^-4  2^-8  86.00 (34.70)      10^-4  2^-8  5   88.00 (32.50)
Lung (5)             30   92.61 (26.16)    10^-3  2^-2  94.09 (23.58)      10^-3  2^-2  5   94.09 (23.58)
Leukemia (3)         20   94.44 (22.91)    10^-5  2^-7  94.44 (22.91)      10^-5  2^-7  5   95.83 (19.98)
The speed-up in testing time of the VVRKFA-SLFN classifier over VVRKFA can be explained by the computational complexity of testing in the two classifiers. To test a sample, both VVRKFA and VVRKFA-SLFN use the low-dimensional label-space vectors mapped by the vector-valued regression technique; the two classifiers differ only in how they determine the class labels of these mapped patterns. The low-dimensional subspace has dimension $N$ for a classification problem with $N$ different classes. To test a mapped pattern in $N$ dimensions, the SLFN needs to evaluate expressions (10) and (9). Expression (10) computes the inner product between the $i$th hidden neuron weight $a_i \in \Re^N$ and the input pattern $\hat{\rho}_j \in \Re^N$, which requires $N$ multiplications. For $q$ such hidden neurons, the total number of multiplications required is $qN$, whereas each of the $N$ Mahalanobis distances in (6) involves a quadratic form with the $N \times N$ matrix $\hat{\Sigma}^{-1}$ and hence on the order of $N^2$ multiplications.

… VVRKFA-SLFN classifier. This technique can be employed in various real-world data classification problems.

REFERENCES

Allwein, E., R. Schapire, & Y. Singer (2000). Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research 1, 113–141.

Armstrong, S. A. et al. (2002). MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics 30(1), 41–47.

Bhattacharjee, A. et al. (2001). Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Nat'l Academy of Sciences, USA 98(24), 13790–13795.

Blake, C. & C. Merz (1998). UCI repository of machine learning databases.

Huang, G.-B., H. Zhou, X. Ding, & R. Zhang (2012). Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics 42, 513–529.

Huang, G.-B., Q.-Y. Zhu, & C.-K. Siew (2004). Extreme learning machine: A new learning scheme of feedforward neural networks. In Int'l Joint Conf. Neural Networks (IJCNN '04).

Huang, G.-B., Q.-Y. Zhu, & C.-K. Siew (2006). Extreme learning machine: Theory and applications. Neurocomputing 70, 489–501.

Igelnik, B. & Y. H. Pao (1995). Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans. Neural Networks 6(6), 1320–1329.

Kaminski, W. & P. Strumillo (1997). Kernel orthonormalization in radial basis function neural networks. IEEE Trans. Neural Networks 8(5), 1177–1183.

Khan, J., J. S. Wei, & M. Ringner (2001). Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7(6), 673–679.

Kreßel, U. (1999). Pairwise classification and support vector machines. In Advances in Kernel Methods: Support Vector Learning, pp. 255–268. Cambridge: MIT Press.

Li, L., C. R. Weinberg, T. A. Darden, & L. G. Pedersen (2001). Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 17, 1131–1142.

Lowe, D. (1989). Adaptive radial basis function nonlinearities and the problem of generalisation. In 1st Inst. Electr. Eng. Int. Conf. Artif. Neural Netw., pp. 171–175.

Mitchell, T. M. (1997). Machine Learning (1st ed.). Singapore: The McGraw-Hill Companies, Inc.

Nutt, C. L. et al. (2003). Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Research 63(7), 1602–1607.

Pao, Y. H., G. H. Park, & D. J. Sobajic (1994). Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6, 163–180.

Peng, H., F. Long, & C. Ding (2005). Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Analysis and Machine Intelligence 27(8), 1226–1238.

Serre, D. (2002). Matrices: Theory and Applications. New York: Springer-Verlag.

Wettschereck, D. & T. Dietterich (1992). Improving the performance of radial basis function networks by learning center locations. In Advances in Neural Information Processing Systems (NIPS), Volume 4, pp. 1133–1140. San Mateo, CA: Morgan Kaufmann.