
GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) - 2016 | July 2016

e-ISSN: 2455-5703

Hybrid Dimensionality Reduction Method Using Kaiser Component Analysis and Independent Component Analysis

1K. Arunasakthi 2J. Sulthan Alikhan
1,2Assistant Professor
1,2K.L.N. College of Engineering, Pottapalayam, Sivagangai 630612, India
Abstract
Dimensionality reduction is the process of extracting the most relevant information from data. Conventional dimensionality reduction methods fall into two categories: standalone and hybrid. A standalone method reduces the dimensions based on a single criterion, whereas a hybrid method combines two or more criteria. In this paper, we propose a new hybrid method for dimensionality reduction using Kaiser Component Analysis (KCA) and Independent Component Analysis (ICA). KCA extracts uncorrelated information, and ICA maximizes the independence among the data. The hybrid method therefore achieves both decorrelation and independence among the information, and it is applied to SVM classification. The results show improved classification accuracy.
Keywords- Dimensionality reduction, Kaiser Component Analysis, Independent Component Analysis, standalone and hybrid methods, SVM classification.
__________________________________________________________________________________________________

I. INTRODUCTION
A. Dimensionality Reduction
Dimensionality reduction is a process of extracting the essential information from the data. The high-dimensional data can be
represented in a more condensed form with much lower dimensionality to both improve classification accuracy and reduce
computational complexity.
Due to the increasing demand for high-dimensional data analysis from various applications such as electrocardiogram
signal analysis and content-based image retrieval, dimensionality reduction becomes a viable process to provide robust data
representation in relatively low-dimensional space [2].
Dimensionality reduction is an important pre-processing step in many applications of data mining, machine learning, and
pattern recognition, due to the so-called curse of dimensionality.
B. Need for Dimensionality Reduction
High-dimensional datasets present many mathematical challenges. One problem is that, in many cases, not all the measured variables are important for understanding the problem. The main purpose of feature selection is to reduce the number of features used in classification while maintaining acceptable classification accuracy. Less discriminatory features are eliminated, leaving a subset of the original features that retains sufficient information to discriminate well among classes. Feature extraction is a more general method in which the original set of features is transformed to provide a new set of features.
In mathematical terms, the problem we investigate can be stated as follows: given the p-dimensional random variable X = (x_1, ..., x_p), find a lower dimensional representation of it, S = (s_1, ..., s_k) with k <= p, that captures the content in the original data according to some criterion.
C. Dimensionality Reduction methods
Dimensionality reduction reduces the number of variables to improve classification performance. High-dimensional data is a major problem in many applications because it increases complexity and execution time.
Conventionally, dimensionality reduction methods are categorized as standalone or hybrid. In a standalone method, dimensionality reduction is performed using a single criterion, whereas in a hybrid approach it is based on two or more criteria.
A number of techniques are available for dimensionality reduction, each of which reduces the dimensions of the data based on a particular criterion. In recent years, Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Independent Component Analysis (ICA) have been regarded as the most fundamental and powerful tools of dimensionality reduction for extracting effective features from high-dimensional vectors of input data.
1) Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is one of the most widely used feature extraction techniques. It works on the assumption that all observations are real-valued Euclidean vectors. PCA is a relatively old and well-developed technique for extracting linearly independent features, and it has been applied to several pattern recognition tasks. Pattern recognition tasks are divided into two phases: feature analysis and classification. Feature extraction is part of the feature analysis phase, where we attempt to reduce redundancy in the feature vectors.
2) Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is a feature extraction technique recently used in many applications such as audio age classification, speaker recognition, and hyperspectral data classification. In [14], the authors developed an unsupervised algorithm for Independent Component Analysis.
Since data independence represents a stronger optimization criterion than correlation, the maximum independence in ICA provides more intrinsic information, which generally contributes more to improving performance and robustness than PCA [2].
Recently, many authors have tried to improve performance by combining the results of more than one technique [16-21], and they achieved better results with hybrid methods than with standalone methods. Chuxiong Miao et al. [4] introduced Principal Component Analysis (PCA) as a feature extraction technique to reduce the computational complexity of classification, and evaluated it on SVM classification. Amiri-Simkooei et al. [7] used a new PCA technique for reducing the dimensions of the data to improve classification accuracy. Xu Chunming et al. [6] used PCA for feature extraction on two-dimensional data vectors. Senthilnath et al. [11] used PCA to find the physical structure by reducing the dimensions of hyperspectral images.
Kamath et al. [13] identified high-dimensional data as a major problem in classification. To improve classification accuracy, they reduced the dimensions of the data by using Independent Component Analysis (ICA). Van De Ville et al. [12] applied dimensionality reduction to the Nonlocal Means (NLM) denoising method, where ICA is used to reduce the dimensionality of the data and improve accuracy. Taiping Zhang et al. [16] observed that Linear Discriminant Analysis (LDA) is not suitable for small data sets. To overcome this problem, they introduced a hybrid method using PCA and LDA, which improved classification accuracy. S. Moon et al. [2] combined supervised and unsupervised techniques to form a new hybrid method using Support Vector Machine (SVM) and ICA.
From the above papers, it is clear that many techniques are available for dimensionality reduction. Most of them are used as standalone methods to achieve good performance, and recently many authors have proposed hybrid methods to improve performance further. PCA is the most widely used feature extraction technique and has been applied in many different applications, while ICA has recently been used in most applications. In this paper, we propose a new hybrid method for dimensionality reduction using Kaiser Component Analysis and Independent Component Analysis.

II. DESIGN AND METHODOLOGY


A. Architecture for Hybrid Dimensionality Reduction Using Kaiser Component Analysis (KCA) and Independent Component Analysis (ICA)

Fig. 1: Architecture for hybrid dimensionality reduction method using Kaiser Component Analysis and Independent Component Analysis


Figure 1 shows the architecture of the hybrid dimensionality reduction method using Kaiser Component Analysis (KCA) and Independent Component Analysis (ICA). In this paper, the process is divided into four modules:
1) SVM Classification
2) Standalone method
3) Hybrid method (KCA + ICA)
4) Performance Analysis
This method has been implemented on three high-dimensional datasets, each of which has a large number of attributes: the Insurance benchmark dataset, consisting of 85 attributes and 750 records; the Spam dataset, containing 57 attributes and 4600 instances; and the Cancer dataset, consisting of 57 attributes and 26 instances. These datasets are processed by SVM classification and by the hybrid reduction method using KCA and ICA.
1) SVM Classification
The Support Vector Machine (SVM) is a classification technique that classifies high-dimensional data directly. SVM is a powerful classifier that maps the input onto a high-dimensional space and then finds an optimal hyperplane to separate the data in that space. The optimal hyperplane is found by maximizing its distance to the closest patterns.
Suppose we have a binary classification problem, where each example belongs to either class +1 or -1. SVM seeks to maximize the margin between the two classes by finding the separating hyperplane that lies halfway between the data classes. In the case of non-linear data, the data are transformed by some non-linear transformation onto a higher dimensional space.
In general, the optimal hyperplane is described by Equation (4.1),
Y = WX + b        (4.1)
where
W = weight vector
X = data vector
b = bias
Many hyperplanes can separate the data. The optimal hyperplane of Equation (4.1) is the one that maximizes the margin between the plane and the support vectors. Thus the hyperplane separates the data into different classes by maximizing the distance between the two classes.
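As a minimal sketch of how Equation (4.1) is used once the weights are known, the Python snippet below evaluates Y = WX + b for a data vector and takes the sign of the result as the class label. The weight vector, bias, and sample points are hypothetical values chosen for illustration; they are not taken from the experiments in this paper, and the training step that finds W and b is not shown.

```python
import math

def decision_value(w, x, b):
    """Evaluate Y = W X + b (Equation 4.1) for one data vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, x, b):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, x, b) >= 0 else -1

def margin_width(w):
    """Width of the margin for a canonical hyperplane: 2 / ||W||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane and points (illustrative values only).
w = [3.0, 4.0]
b = -5.0
points = [[3.0, 1.0], [0.0, 0.0]]
labels = [predict(w, p, b) for p in points]
```

Because the margin width is 2 / ||W||, maximizing the margin is equivalent to minimizing the norm of the weight vector, which is how the optimization is usually posed.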
2) Standalone method for Dimensionality reduction
The second module of this project is the standalone method. It reduces the dimensions of the data using a single criterion to improve the performance of the Support Vector Machine classifier. Here, we choose two techniques: Kaiser Component Analysis (KCA) and Independent Component Analysis (ICA).
Let us consider the sample data with two variables and ten instances.
B. Kaiser Component Analysis (KCA)
Many techniques are available for dimensionality reduction. Among them, Principal Component Analysis (PCA) is the most feasible transformation technique, and the Kaiser method of component selection improves the performance of PCA.
The steps involved in Kaiser Component Analysis (KCA) are as follows:
1) Calculate the mean of each attribute.
2) Subtract the mean from each data value, so that all data are centered.
3) Compute the covariance matrix from the centered data.
4) Find the eigenvalues and the corresponding eigenvectors.
5) Arrange the eigenvectors in decreasing order of their eigenvalues.
6) Select the components whose eigenvalues are greater than 1.0.
Using the above steps, the dataset is transformed from high-dimensional data into low-dimensional data that represents the most important information.
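The six steps above can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' Matlab code: the function name `kaiser_component_analysis`, the synthetic data, and the choice to compute the covariance from the centered data are assumptions. The Kaiser rule is conventionally applied to standardized data (a correlation matrix); the sketch follows the steps as listed and works on the covariance matrix.

```python
import numpy as np

def kaiser_component_analysis(X, threshold=1.0):
    """Project X (n_samples x n_features) onto the principal components
    whose eigenvalues exceed `threshold` (Kaiser criterion)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)                    # step 1: mean of each attribute
    centered = X - mean                      # step 2: center the data
    cov = np.cov(centered, rowvar=False)     # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 4: eigen-decomposition
    order = np.argsort(eigvals)[::-1]        # step 5: decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > threshold               # step 6: eigenvalue > 1.0
    return centered @ eigvecs[:, keep], eigvals

# Synthetic data (assumption): four features, two with large variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 0.5, 0.2])
reduced, eigvals = kaiser_component_analysis(X)
print(reduced.shape, np.round(eigvals, 2))
```

The retained columns are uncorrelated with one another, which is the property the hybrid method relies on before the ICA stage.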
C. Independent Component Analysis (ICA)
Independent Component Analysis (ICA) is an unsupervised feature extraction technique that has been applied in many applications. It transforms the original data by using a transformation function. The ICA model is defined as
Y = sX
where
Y = transformed data
s = scalar matrix
X = original data
Here, the original data is transformed by using the tanh function as the scalar function. The non-linearity among the data is maximized, and orthogonality of each data vector is achieved, using this tanh transformation function. Selecting the number of independent components is one of the important problems in ICA. Here, the components selected are those that are greater than 0.1 of the average in the newly transformed data set.
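The paper does not name the algorithm used to estimate the transformation, so the sketch below assumes a FastICA-style symmetric fixed-point iteration with tanh as the nonlinearity, written in Python with NumPy. The function names, the synthetic mixed sources, and the iteration settings are all assumptions for illustration, and the 0.1-of-average selection rule is not implemented.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples x n_features) and rescale it so that its
    covariance matrix is the identity."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvecs / np.sqrt(eigvals)

def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica_tanh(X, n_iter=200, tol=1e-6, seed=0):
    """Symmetric fixed-point ICA iteration with g(u) = tanh(u)."""
    Z = whiten(np.asarray(X, dtype=float))
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.normal(size=(d, d)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)              # nonlinearity on current estimates
        G_prime = 1.0 - G ** 2            # derivative of tanh
        W_new = (G.T @ Z) / n - np.diag(G_prime.mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each row is (anti)parallel to its previous value.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T, W

# Synthetic demo (assumption): two independent non-Gaussian sources, mixed.
rng = np.random.default_rng(1)
sources = np.column_stack([rng.uniform(-1, 1, 2000), rng.laplace(size=2000)])
mixed = sources @ np.array([[1.0, 0.5], [0.3, 1.0]]).T
components, W = fast_ica_tanh(mixed)
```

The symmetric decorrelation step keeps the rows of the unmixing matrix orthonormal, which corresponds to the orthogonality that the text attributes to the tanh transformation.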


3) Hybrid method
In this section, we introduce the hybrid method for dimensionality reduction, which combines two or more standalone criteria. Compared with the standalone methods, the hybrid method provides better performance in terms of accuracy and execution time in SVM classification. Here, PCA maximizes the decorrelation among the features and ICA maximizes the independence among the data. In this project, we used the hybrid method PCA + ICA: the principal components from PCA are forwarded to ICA, which combines the features of both PCA and ICA. SVM performs better with PCA + ICA than with the standalone methods PCA and ICA.
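The chaining of the two stages can be sketched as below in Python with NumPy, under stated assumptions: stage 1 keeps the principal components with eigenvalue greater than 1.0, and stage 2 runs a FastICA-style tanh iteration on those components. The function names, the fixed iteration count, and the synthetic data are hypothetical, and the final SVM step is omitted.

```python
import numpy as np

def kaiser_pca(X, threshold=1.0):
    """Stage 1: keep principal components with eigenvalue > threshold."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    keep = vals > threshold
    return Xc @ vecs[:, keep], vals[keep]

def ica_tanh(P, vals, n_iter=200, seed=0):
    """Stage 2: symmetric fixed-point ICA (tanh nonlinearity) on the
    retained components P, whose variances are `vals`."""
    Z = P / np.sqrt(vals)                      # unit variance per component
    n, d = Z.shape
    W = np.linalg.qr(np.random.default_rng(seed).normal(size=(d, d)))[0]
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        W = (G.T @ Z) / n - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        s, u = np.linalg.eigh(W @ W.T)         # symmetric decorrelation
        W = u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W
    return Z @ W.T

def hybrid_reduce(X):
    """Forward the Kaiser-selected principal components to ICA."""
    P, vals = kaiser_pca(np.asarray(X, dtype=float))
    return ica_tanh(P, vals)

# Synthetic data (assumption): three non-Gaussian sources spread over
# six observed features, plus a small amount of noise.
rng = np.random.default_rng(0)
S = 2.0 * rng.laplace(size=(500, 3))
A = np.hstack([np.eye(3), np.eye(3)])
X = S @ A + 0.1 * rng.normal(size=(500, 6))
reduced = hybrid_reduce(X)
print(X.shape, "->", reduced.shape)
```

In this arrangement the first stage fixes the number of dimensions, and the second stage only rotates the retained components toward independence; the reduced matrix would then be passed to the classifier.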
4) Performance Analysis
For this project, we selected three datasets from the UCI Repository: the Insurance Benchmark, Spam, and Cancer datasets. The performance of SVM is measured by accuracy and elapsed time. PCA and ICA are applied to the original data, and SVM is applied to the newly obtained low-dimensional data.
The standalone methods improve the performance of the SVM. The PCA + ICA hybrid then further improves the performance of the SVM classification by reducing the dimensions from 85 to 16 and from 57 to 11 on the Insurance and Spam datasets respectively.
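The two metrics can be computed as in the following Python sketch, which assumes that accuracy is read off a confusion matrix as the share of correctly classified instances and that elapsed time is wall-clock time around a call. The helper names and the label vectors are hypothetical; they are not results from this paper.

```python
import time

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy_percent(matrix):
    """Accuracy = correctly classified (diagonal) / all instances, in %."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

def timed(fn, *args):
    """Return fn(*args) together with the elapsed time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical labels for a binary problem (illustrative only).
y_true = [1, 1, -1, -1, 1, -1, 1, -1]
y_pred = [1, -1, -1, -1, 1, 1, 1, -1]
cm, elapsed = timed(confusion_matrix, y_true, y_pred, [1, -1])
acc = accuracy_percent(cm)
```

Elapsed time measured this way depends on the machine and its load, so it is best compared across methods on the same system.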

III. IMPLEMENTATION AND RESULT ANALYSIS


In this project, dimensionality reduction is performed by the hybrid method using Kaiser Component Analysis and Independent Component Analysis, and the result is applied to SVM classification in the Matlab environment. The accuracy of the classifier is compared on both the high-dimensional and the low-dimensional data.
This project uses three datasets from the UCI Repository. The first is the Insurance Benchmark dataset, which contains 85 attributes and 750 records and is used to analyze whether a person is eligible to get insurance. The second is the Spam dataset, which contains 57 attributes and 4600 records and is used to analyze whether the data is spam. The third is the Cancer dataset, which contains 57 attributes and 26 instances. Analyzing and computing all of this high-dimensional data is very difficult, and not all of the variables affect the result of the classification, so we tried to remove the irrelevant data for this analysis. Although removing attributes from the data reduces execution time, some information may be lost, which may affect the accuracy of the classification. Using these methods, the dimensionality of the data has been reduced and the performance is measured in terms of both accuracy and elapsed time. The code for SVM classification, PCA, and ICA was written in Matlab.

IV. RESULT ANALYSIS


A. Result of SVM Classification
When the support vector machine is executed on the original high-dimensional data sets, the performance of the SVM classification is as shown in Table 1.

DATASET             ACCURACY (%)    ELAPSED TIME (in sec)
Insurance dataset   39.4667         1.6659
Spam                35.5217         0.2643
Cancer dataset      76.9231         1.2634

Table 1: Performance of SVM Classification

B. Results of Standalone methods


1) Kaiser Component Analysis
To show the effectiveness of the dimensionality reduction, the high-dimensional data is processed by Kaiser Component Analysis. In this analysis, the 85, 58, and 57 variables are transformed into 24, 11, and 7 components on the Insurance, Spam, and Cancer datasets respectively. The Kaiser components obtained from this analysis are shown in Table 2.
2) Independent Component Analysis
The second technique in this project is Independent Component Analysis. First, ICA is applied directly to the original high-dimensional data vectors. This analysis yields 15, 2, and 15 independent components for the Insurance, Spam, and Cancer datasets respectively. The details are shown in Table 2.

                    No. of variables    Standalone:          Standalone:               Hybrid:
DATA SET            in HDD              Kaiser components    Independent components    Kaiser PCA + ICA
Insurance dataset   85                  24                   15                        15
Spam                58                  11                   2                         2
Cancer dataset      57                  7                    15                        7

Table 2: Dimensions of data after transformation

C. Result of Hybrid method KCA and ICA

The hybrid approach is designed by combining the results of both Kaiser Component Analysis and Independent Component Analysis. By using this hybrid method, the high-dimensional data is reduced to low-dimensional data having 15, 2, and 7 variables on the Insurance, Spam, and Cancer datasets respectively.

D. Performance Analysis
The SVM classification is run again on the low-dimensional data obtained from both the standalone and the hybrid methods. When we feed the principal components into SVM, we get better performance than SVM with the high-dimensional data. The performance of the SVM is measured using two metrics: accuracy and elapsed time. Accuracy is computed from the confusion matrix in Matlab. The performance details are shown in Table 3.

                  STANDALONE: KCA                  STANDALONE: ICA                  HYBRID: KCA + ICA
DATASETS          Dimension  Accuracy  Time (s)    Dimension  Accuracy  Time (s)    Dimension  Accuracy  Time (s)
Insurance (85)    24         52        1.54        15         49.86     1.21        15         58.93     1.65
Spam (57)         11         72.82     5.05        2          70.08     1.14        2          74.43     1.21
Cancer (58)       7          38.46     0.01        15         61.53     0.09        7          69.23     0.009

Table 3: Performance Analysis

From Table 3, it can be seen that the accuracy of the hybrid method is better than that of the standalone methods; the elapsed time may vary from run to run depending on system performance. The graphical representation of the dimensionality reduction and its performance is shown in Fig. 2 and Fig. 3.
From Figure 2, it is clear that the high-dimensional data is transformed into low-dimensional data using the standalone methods and is further reduced by the hybrid approach.
[Figure: bar chart of the number of dimensions for the Insurance, Spam, and Cancer datasets under HDD, KCA, ICA, and KCA+ICA]

Fig. 2: Comparison on dimension of KCA, ICA and KCA + ICA

[Figure: bar chart of classification accuracy for the Insurance, Spam, and Cancer datasets under HDD, KCA, ICA, and KCA+ICA]

Fig. 3: Comparison on Accuracy of KCA, ICA and KCA + ICA


Figure 3 shows the improvement in accuracy with the transformed low-dimensional data. For example, on the Insurance dataset, the accuracy with the high-dimensional data is 39.4667, whereas the low-dimensional data from KCA, ICA, and KCA + ICA give better accuracies of 52, 49.86, and 58.93 respectively. Screenshots of this project are shown in the Appendix.

V. CONCLUSION & FUTURE ENHANCEMENT


In this paper, we used a hybrid approach based on Kaiser Component Analysis (KCA) and Independent Component Analysis (ICA) for dimensionality reduction and successfully applied it to SVM classification on three datasets: the Insurance Benchmark, Spam, and Cancer datasets. The work can be further enhanced by implementing hybrid techniques that use various machine learning techniques such as Principal Component Analysis, Linear Discriminant Analysis, Support Vector Machine, and Independent Component Analysis to get better results. Dimensionality reduction can also be applied to regression problems in the same way as to classification problems.

REFERENCES
[1] Jun Li and Dacheng Tao, "Simple Exponential Family PCA," IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 3, pp. 485-497, 2013.
[2] S. Moon and H. Qi, "Hybrid dimensionality reduction method based on support vector machine and independent component analysis," IEEE Transactions on Neural Networks, vol. 23, pp. 749-761, 2012.
[3] Qi Ding and E. D. Kolaczyk, "A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data," IEEE Transactions on Information Theory, vol. 59, pp. 7419-7433, 2013.
[4] Chuxiong Miao, Yu Wang, and Yonghong Zhang, "A SVM classifier combined with PCA for ultrasonic crack size classification," International Conference on Computer Engineering, pp. 001627-001630, 2010.
[5] N. Kwak, "Principal component analysis based on L1-norm maximization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1672-1680, 2008.
[6] Xu Chunming, Jiang Haibo, and Yu Jianjiang, "Robust two-dimensional principle component analysis," IEEE Transactions on Signals and Control, pp. 452-455, 2010.
[7] A. R. Amiri-Simkooei, M. Snellen, and D. G. Simons, "Principal Component Analysis of Single-Beam Echo-Sounder Signal Features for Seafloor Classification," IEEE Transactions on Oceanic Engineering, vol. 36, no. 2, pp. 259-272, 2011.
[8] Ran He, Bao-Gang Hu, and Xiang-Wei Kong, "Robust Principal Component Analysis Based on Maximum Correntropy Criterion," IEEE Transactions on Image, no. 6, pp. 1485-1494, June 2011.
[9] J. Zhao, P. Yu, and J. Kwok, "Bilinear probabilistic principal component analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 3, pp. 492-503, 2012.
[10] G. Dobry, R. M. Hecht, M. Avigal, and Y. Zigel, "Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal," IEEE Transactions on Audio, Speech, and Language Processing, no. 7, pp. 1975-1985, 2011.
[11] J. Senthilnath, S. N. Omkar, and Mani, "Crop Stage Classification of Hyperspectral Data Using Unsupervised Techniques," IEEE Transactions on Applied Earth Observations and Remote Sensing, vol. 6, no. 2, part 3, pp. 861-866, 2013.
[12] D. Van De Ville and M. Kocher, "Nonlocal Means With Dimensionality Reduction and SURE-Based Parameter Selection," IEEE Transactions on Image Processing, vol. 20, pp. 2683-2690, 2011.
[13] Sunil Kamath, S. Ravindran, and D. V. Anderson, "Independent Component Analysis for audio classification," IEEE Transactions on Digital Signal Processing, pp. 352-355, 2009.
[14] S. Dhir and S.-Y. Lee, "Discriminant independent component analysis," IEEE Transactions on Neural Networks, vol. 22, pp. 845-857, 2011.
