
Journal of Ambient Intelligence and Humanized Computing

https://doi.org/10.1007/s12652-018-1096-5

ORIGINAL RESEARCH

Graph-dual Laplacian principal component analysis


Jinrong He1,2 · Yingzhou Bi3 · Bin Liu1,2 · Zhigao Zeng4

Received: 16 August 2016 / Accepted: 7 September 2018


© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract
Principal component analysis (PCA) is the most widely used method for linear dimensionality reduction, due to its effectiveness in exploring the low-dimensional global geometric structure embedded in data. To preserve the intrinsic local geometrical structure of data, graph-Laplacian PCA (gLPCA) incorporates Laplacian embedding into the PCA framework to learn local similarities between data points, which leads to significant performance improvements in clustering and classification. Some recent works have shown that not only do high-dimensional data reside on a low-dimensional manifold in the data space, but the features also lie on a manifold in the feature space. However, both PCA and gLPCA overlook the local geometric information contained in the feature space. By considering the duality between the data manifold and the feature manifold, graph-dual Laplacian PCA (gDLPCA) is proposed, which incorporates data graph regularization and feature graph regularization into the PCA framework to exploit the local geometric structures of the data manifold and the feature manifold simultaneously. Experimental results on four benchmark data sets confirm its effectiveness and show that gDLPCA outperforms gLPCA on classification and clustering tasks.

Keywords  Principal component analysis · Graph-Laplacian PCA · Dual graph · Feature manifold · Graph-Dual Laplacian PCA

1 Introduction

In machine learning and data mining, principal component


analysis (PCA) (Jolliffe 2011) is a classical feature extraction
method, which has been widely used in many applications,
such as face recognition (Turk and Pentland 2002), document clustering (Kargupta et al. 2001), gene selection (Sturn et al. 2002), anomaly detection (Bi et al. 2016) and so on. However, it has its own limitations.

The first limitation of PCA is that it can only identify a linear subspace, and cannot discover the nonlinear manifold structure of data. In order to deal with non-linear dimensionality reduction problems, PCA is extended to kernel PCA (Smola 1997) through the introduction of kernel functions, which is equivalent to mapping the data samples into a higher-dimensional Hilbert space and applying PCA in this new Hilbert space. Additionally, PCA is frequently used in non-linear global optimization and in the development of evolutionary algorithms. Yang et al. applied PCA to combine global optimization with an artificial neural network (Yang et al. 2017), and addressed the curse of dimensionality in multi-objective evolutionary algorithms (Yang et al. 2015, 2018).

* Corresponding author: Jinrong He, hejinrong@nwsuaf.edu.cn
Yingzhou Bi, byzhou@163.com · Bin Liu, liubin0929@nwsuaf.edu.cn · Zhigao Zeng, zzgzzg99@163.com

1 College of Information Engineering, Northwest A&F University, Yangling 712100, Shaanxi, China
2 Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, People's Republic of China, Yangling 712100, China
3 Science Computing and Intelligent Information Processing of Guangxi Higher Education Key Laboratory, Guangxi Teachers Education University, Nanning 530001, Guangxi, China
4 College of Computer and Communication, Hunan University of Technology, Hunan 412000, China


Another limitation of PCA is that the interpretation of the principal components may be difficult. Although the dimensions identified by PCA are uncorrelated variables constructed as linear combinations of the original features, they do not have meaningful physical interpretations. Many variants of PCA have been proposed to enhance the interpretability of the principal components extracted by classical PCA. Non-negative matrix factorization (NMF) (Lee 1999) was introduced to give a meaningful approximation of a non-negative data matrix by non-negative low-rank factorization. In order to enhance the interpretability of PCA, sparse PCA (SPCA) (Zou et al. 2006) extracts principal components of the given data with sparse non-zero loadings.

The third limitation of PCA is that it is sensitive to grossly corrupted entries of the data matrix. Since the quadratic term in the classical PCA formulation is sensitive to outliers, many L1-norm based PCA (Brooks et al. 2013; Lin et al. 2014; Wang 2012) and Lp-norm based PCA (Kwak 2014; Liang et al. 2013; Wang 2016) methods have been proposed. Recently, many researchers showed that one can recover the principal components of a data matrix even though a positive fraction of its entries are arbitrarily corrupted, by decomposing the data matrix into a low-rank component and a sparse component, which is called robust PCA (RPCA) (Candes et al. 2009). Since then, many extensions of RPCA have been proposed, such as inductive RPCA (IRPCA) (Bao et al. 2012), RPCA with capped norms (RPCA-Capped) (Sun et al. 2013), and so on. IRPCA learns the underlying projection matrix by solving a nuclear-norm regularized minimization model, which can be used to efficiently remove the gross corruptions in the data matrix. Without recalculating over all the data, IRPCA can handle new samples directly. Since the L1-norm model is based on a strong assumption which may not hold in real-world applications, RPCA-Capped is based on a difference-of-convex-functions framework using the capped trace norm and the capped L1-norm.

Recent works indicate the critical importance of preserving the local geometric structure of data in dimensionality reduction. In order to discover the local geometrical structure and the discriminant structure of the data manifold, many researchers have proposed a series of graph based dimensionality reduction methods using a geometrically induced regularizer, such as the graph Laplacian. Since the local geometric structure can be captured by a k-nearest neighbor graph on the data samples, such graphs have been widely used to explore the geometrical structure of data (Belkin and Niyogi 2001). Jiang et al. (2013) proposed graph-Laplacian PCA (gLPCA), which imposes graph regularization on the projected low-dimensional representations; they also proposed a robust version of gLPCA using the L2,1-norm on the reconstruction term, where the augmented Lagrange multiplier method is used to optimize the robust gLPCA model. Shahid et al. (2015) incorporated spectral graph regularization into the robust PCA framework and proposed RPCA on graphs to improve the clustering performance. They also found that the low rank representation is piecewise constant on the underlying graph, and introduced a graph total variation regularization to enforce the piecewise constant assumption (Shahid et al. 2016).

Recent studies have found that not only the observed data lie on a nonlinear low dimensional manifold, which is called the data manifold, but the features also lie on a low dimensional manifold, which is called the feature manifold. In order to consider the geometrical information of both the data manifold and the feature manifold simultaneously, the graph dual regularization technique has attracted much attention in dimensionality reduction. For example, by enforcing the preservation of geometric information in both the data space and the feature space, Shang et al. (2012) proposed dual graph based non-negative matrix factorization (DNMF), and Yin et al. (2015) proposed dual graph regularized low rank representation (DGLRR). All these methods have achieved promising performances, which demonstrates that the duality between data points and feature vectors can be used to improve the performance of dimensionality reduction methods.

Inspired by the idea of dual regularization learning (Gu and Zhou 2009; Sindhwani and Hu 2009), graph-dual Laplacian PCA (gDLPCA) is proposed in this paper. By combining PCA and the graph dual regularization method, gDLPCA simultaneously preserves the geometric structures of the data manifold and the feature manifold through two graphs, derived from the data space and the feature space, that are constructed by the k-nearest neighbor method. In summary, the main contributions of the paper are as follows:

1. We propose a dual graph regularized PCA model (gDLPCA) that discovers the local geometric structures contained in the data manifold and the feature manifold. gDLPCA can effectively discover local geometrical structures in both the data space and the feature space. Unlike gLPCA, gDLPCA uses the dual graph embedding as the regularization term, which preserves the local geometrical structure of both the feature manifold and the data manifold.
2. An optimization algorithm is proposed to solve the gDLPCA model. We construct a compact closed-form solution, so it can be solved efficiently; the closed-form solution provides an exact and efficient algorithm with little implementation effort. The computational complexity of gDLPCA is O(d^3), where d is the dimensionality of a data sample.
3. Comprehensive experiments on clustering and classification tasks are conducted to confirm the effectiveness of the proposed gDLPCA method, and the results on four high-dimensional data sets demonstrate its advantages over the traditional PCA and gLPCA methods.


Table 1  Notations

Symbol    Meaning
d         The dimensionality of an original data sample
n         The number of samples
r         The dimensionality of the low-dimensional representation (reduced dimensionality)
X         High-dimensional data matrix whose columns are samples and rows are features
x_i       A d-dimensional sample vector, corresponding to a column of the data matrix X
x^i       An n-dimensional feature vector, corresponding to a row of the data matrix X
Y         Low-dimensional data matrix of size r × n
y_i       The low-dimensional representation (embedding) of the original sample x_i, corresponding to a column of Y
l_i       Label of data sample x_i
n_i       The number of samples with label l_i
W_ij      The weight (similarity) between samples x_i and x_j
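To fix these conventions concretely, the following short NumPy snippet (an illustration added for this cleaned version, not part of the original paper) shows the shapes implied by Table 1:

```python
import numpy as np

d, n, r = 256, 1000, 64          # e.g. the USPS setting used later in the experiments
X = np.random.randn(d, n)        # columns are samples x_i, rows are features x^i
V = np.linalg.qr(np.random.randn(d, r))[0]   # a d x r orthonormal projection matrix
Y = V.T @ X                      # low-dimensional representations, one column y_i per sample
assert X.shape == (d, n) and V.shape == (d, r) and Y.shape == (r, n)
```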

The rest of the paper is organized as follows. In Sect. 2, we describe the preliminary notations and formulations used in the paper. Section 3 briefly reviews related works, including traditional PCA and gLPCA. The proposed gDLPCA method is introduced in Sect. 4. The classification and clustering results on four data sets are reported in Sect. 5. Finally, we give our concluding remarks in Sect. 6.

2 Preliminaries

Some notations used throughout the paper are summarized in Table 1. Based on these notations, dimensionality reduction or data representation aims to generate low-dimensional representations Y from the data matrix X, while preserving some structure of the data set.

Suppose that we are given a high-dimensional data matrix X = (x_1, x_2, …, x_n) ∈ R^{d×n}, whose column vectors are samples. Dimensionality reduction aims to project the d-dimensional samples into an r-dimensional subspace, i.e., Y = V^T X, where Y = (y_1, y_2, …, y_n) ∈ R^{r×n} is the low-dimensional embedding matrix, and V = (v_1, v_2, …, v_r) ∈ R^{d×r} is the linear projection matrix. There are many methods to find the projection matrix V and the low-dimensional embedding Y. The most popular of them is graph based dimensionality reduction, namely graph embedding (Yan et al. 2007); most dimensionality reduction algorithms can be unified into a graph embedding framework.

Let G = {X, W} be an undirected weighted graph with vertex set X and similarity weight matrix W ∈ R^{n×n}. The graph Laplacian matrix of graph G is defined as L = D − W, where D is a diagonal matrix with $D_{ii} = \sum_{j \ne i} W_{ij}$. The weight between x_i and x_j is defined as

$W_{ij} = \begin{cases} 1, & x_i \in N_k(x_j) \vee x_j \in N_k(x_i) \\ 0, & \text{otherwise} \end{cases}$    (1)

where N_k(x_i) denotes the set of k nearest neighbors of x_i. Several methods have been proposed to construct the weight matrix W, such as the heat kernel (Belkin and Niyogi 2001), local linear reconstruction coefficients (Roweis and Saul 2000) and the correlation distance (Jin et al. 2015).

In dimensionality reduction, there is a fundamental assumption that nearby sample points are likely to have similar embeddings. Thus, the graph embedding model aims to preserve the local information of the manifold structure through the following optimization problem:

$\min_{Y} \; J(Y) = \sum_{i,j=1}^{n} W_{ij} \left\| y_i - y_j \right\|^2$    (2)

which can be further formulated in trace form as follows:

$\min \; \mathrm{tr}\left( V^{T} X L X^{T} V \right)$    (3)

where S = X L X^T is the scatter matrix. From Eq. (2) one can see that minimizing (2) is actually enforcing Y to reproduce the similarity structure coded in L.
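To make the construction in Eq. (1) and the trace form in Eqs. (2)–(3) concrete, here is a minimal NumPy sketch (an assumed re-implementation added for illustration; the paper provides no code and its experiments are in MATLAB). Note that tr(Y L Y^T) equals one half of the weighted sum in Eq. (2), so the two objectives differ only by a constant factor.

```python
import numpy as np

def knn_weight_matrix(points, k):
    """Symmetric 0/1 k-NN weight matrix as in Eq. (1); points has one point per row."""
    m = points.shape[0]
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbour
    knn = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbours
    W = np.zeros((m, m))
    W[np.repeat(np.arange(m), k), knn.ravel()] = 1.0
    return np.maximum(W, W.T)             # x_i in N_k(x_j) OR x_j in N_k(x_i)

def graph_laplacian(W):
    """L = D - W with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

# toy data: X is d x n with samples as columns, as in Table 1
rng = np.random.default_rng(0)
d, n, k = 20, 100, 5
X = rng.standard_normal((d, n))
W = knn_weight_matrix(X.T, k)             # graph over the n samples
L = graph_laplacian(W)

# sanity check: tr(Y L Y^T) = 1/2 * sum_ij W_ij * ||y_i - y_j||^2  (cf. Eq. (2))
Y = rng.standard_normal((3, n))
lhs = np.trace(Y @ L @ Y.T)
diff = Y[:, :, None] - Y[:, None, :]
rhs = 0.5 * (W * (diff ** 2).sum(axis=0)).sum()
assert np.isclose(lhs, rhs)
```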
3 Related works

3.1 Principal component analysis

Due to its simplicity and efficiency, PCA is a widely used method for data representation and feature extraction. PCA learns a set of orthonormal projection vectors so that the variance of the original data in the low-dimensional feature space is maximized, which is equivalent to minimizing the following reconstruction error in the L2 norm:

$\min_{V} \; \left\| X - V V^{T} X \right\|_{F}^{2} \quad \text{s.t.} \; V^{T} V = I$    (4)


where X is the centered data matrix. Traditionally, V and Y = V^T X are termed the principal directions and the principal components. The optimal projection matrix V can be obtained from the eigen-decomposition of the covariance matrix C = X X^T, with V = [v_1, …, v_r], where v_i is the eigenvector corresponding to the i-th largest eigenvalue of C. Then the r-dimensional representations are given as

$y_i = V^{T} x_i, \quad i = 1, \ldots, n.$

Note that y_i is a descriptive and compact representation of the high-dimensional data sample x_i, and the corresponding low-dimensional data space is usually called the feature space, which is the learning objective of the PCA model (Guan et al. 2018).

3.2 Graph-Laplacian PCA

Graph-Laplacian PCA (gLPCA) assumes that the high-dimensional representations lie on a smooth manifold, and since the manifold structure can be encoded in the weight matrix W of the graph, gLPCA learns low-dimensional representations of the high-dimensional data matrix X through the following optimization model:

$\min_{V,Y} \; J = \left\| X - V Y \right\|_{F}^{2} + \lambda \, \mathrm{tr}\left( Y L Y^{T} \right) \quad \text{s.t.} \; Y Y^{T} = I$    (5)

where λ is the regularization parameter and L = D − W. Although the optimization model (5) is not a convex problem, it has a closed-form solution, which means that it can be solved efficiently.

Fixing Y and setting the first order derivative of the objective function in (5) to zero, i.e.,

$\frac{\partial J}{\partial V} = -2 X Y^{T} + 2 V = 0,$

we obtain the optimal projection matrix V* = X Y^T. Substituting V = X Y^T into the objective function (5), we get

$\min_{Y} \; J = \left\| X - X Y^{T} Y \right\|_{F}^{2} + \lambda \, \mathrm{tr}\left( Y L Y^{T} \right) \quad \text{s.t.} \; Y Y^{T} = I$

By some algebra, it can be rewritten as

$J = \left\| X - X Y^{T} Y \right\|_{F}^{2} + \lambda \, \mathrm{tr}\left( Y L Y^{T} \right)$
$\;\;= \mathrm{tr}\left( \left( X - X Y^{T} Y \right)\left( X^{T} - Y^{T} Y X^{T} \right) \right) + \lambda \, \mathrm{tr}\left( Y L Y^{T} \right)$
$\;\;= \mathrm{tr}\left( X X^{T} \right) - \mathrm{tr}\left( Y X^{T} X Y^{T} \right) + \lambda \, \mathrm{tr}\left( Y L Y^{T} \right)$
$\;\;= \mathrm{tr}\left( X X^{T} \right) + \mathrm{tr}\left( Y \left( -X^{T} X + \lambda L \right) Y^{T} \right)$

Equivalently, we have the following optimization problem:

$\min_{Y} \; \mathrm{tr}\left( Y \left( -X^{T} X + \lambda L \right) Y^{T} \right) \quad \text{s.t.} \; Y Y^{T} = I$

Then, the optimal low-dimensional embedding matrix can be represented as Y* = (u_1, u_2, …, u_r)^T, where u_1, u_2, …, u_r are the eigenvectors corresponding to the first r smallest eigenvalues of the matrix

$G_{\alpha} = -X^{T} X + \alpha L$    (6)

and the optimal projection matrix is V* = X Y*^T. Note that the data matrix X is centered in the same way as in standard PCA.

4 Graph-dual Laplacian PCA

In this section, we propose a dual graph regularized PCA model, called the gDLPCA algorithm. Since recent studies have shown that learning representations from the data manifold and the feature manifold simultaneously can improve the performance of data clustering, gDLPCA incorporates feature manifold embedding into the graph-Laplacian PCA model. In this way, the proposed gDLPCA method is able to discover the local geometric structure of the data space and the feature space simultaneously.

4.1 Dual graph

A data set endowed with pairwise relationships can be naturally illustrated as a graph, in which the samples are represented as vertices, and the relationship between any two vertices can be represented by an edge. If the pairwise relationships among samples are symmetric, the graph can be undirected; otherwise, it can be directed.

The data matrix has two modes, namely column vectors and row vectors, which correspond to the set of sample points and the set of feature points. To be clear, the column space is called the data space, and the row space is called the feature space. Originally, the duality between data samples and features was considered for co-clustering; in this work, it is used for dimensionality reduction.

Given the high-dimensional data matrix X, we can construct the feature graph G^(f) = {X^T, W^(f)} from the feature sample set (x^1, x^2, …, x^d), where x^i is the i-th row of the data matrix X, and W^(f) is the weight matrix of the feature graph. Similar to Eq. (1), W^(f) can be defined as

$W_{ij}^{(f)} = \begin{cases} 1, & x^i \in N_k(x^j) \vee x^j \in N_k(x^i) \\ 0, & \text{otherwise} \end{cases}$    (7)

where N_k(x^i) denotes the k nearest neighbors of x^i. The corresponding feature graph Laplacian matrix is

$L^{(f)} = D^{(f)} - W^{(f)}$    (8)
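Because Eq. (7) simply applies the construction of Eq. (1) to the rows of X instead of its columns, the data graph and the feature graph can be built with the same routine. The following is a minimal NumPy sketch of this idea (an illustration under the assumptions stated earlier, using the neighborhood sizes 4 and 11 that appear later in Sect. 5.3):

```python
import numpy as np

def knn_laplacian(points, k):
    """Graph Laplacian L = D - W of the symmetric 0/1 k-NN graph over the rows of `points` (Eq. (1))."""
    m = points.shape[0]
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros((m, m))
    W[np.repeat(np.arange(m), k), np.argsort(d2, axis=1)[:, :k].ravel()] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(0)
d, n = 20, 100
X = rng.standard_normal((d, n))         # columns are samples, rows are features

L_data = knn_laplacian(X.T, k=4)        # data graph G^(d): one vertex per sample,  shape (n, n)
L_feat = knn_laplacian(X,   k=11)       # feature graph G^(f): one vertex per feature, shape (d, d)
assert L_data.shape == (n, n) and L_feat.shape == (d, d)
```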


For convenience of notation, the data graph introduced in Sect. 2 is denoted as G^(d) = {X, W^(d)}, and the corresponding data graph Laplacian matrix is

$L^{(d)} = D^{(d)} - W^{(d)}$    (9)

Some recent works have shown that not only do the high-dimensional samples reside on a low-dimensional manifold, but so do the features (Shang et al. 2012, 2016; Yin et al. 2015). Along its columns and rows, the data matrix can be viewed in two modes, namely the column space and the row space. The manifold in the column space is called the data manifold, and the manifold in the row space is called the feature manifold. By fusing the two manifolds, the dual graph regularization technique helps to improve the performance of subspace learning algorithms. This gives rise to our motivation for using both graphs.

4.2 Graph-dual Laplacian PCA

By considering the local geometrical structures of the data manifold and the feature manifold, a unified model called gDLPCA is formulated as follows:

$\min_{V,Y} \; \left\| X - V Y \right\|_{F}^{2} + \alpha \cdot \mathrm{tr}\left( V^{T} L^{(f)} V \right) + \beta \cdot \mathrm{tr}\left( Y L^{(d)} Y^{T} \right) \quad \text{s.t.} \; V^{T} V = I$    (10)

where α, β ≥ 0 are the regularization parameters trading off the PCA reconstruction error, the feature graph embedding and the data graph embedding. The regularization strategy in (10) indicates that gDLPCA can exploit the local geometrical information of the data space and the feature space simultaneously. When the regularization parameter α tends to zero, gDLPCA degenerates to the gLPCA method in Eq. (5); when both α and β tend to zero, gDLPCA degenerates to traditional PCA in Eq. (4).

Note that the optimal solution (V, Y) of (10) is not unique, due to the orthogonal invariance of the trace function in (10); that is, (V, Y) is an optimal solution if and only if (VQ, Q^T Y) is also an optimal solution for any orthogonal matrix Q, which can be shown as follows. The first term in (10) can be rewritten as

$\left\| X - (VQ)\left( Q^{T} Y \right) \right\|_{F}^{2} = \left\| X - V\left( Q Q^{T} \right) Y \right\|_{F}^{2} = \left\| X - V Y \right\|_{F}^{2}$

and the second term can be rewritten as

$\alpha \cdot \mathrm{tr}\left( (VQ)^{T} L^{(f)} (VQ) \right) + \beta \cdot \mathrm{tr}\left( \left( Q^{T} Y \right) L^{(d)} \left( Q^{T} Y \right)^{T} \right)$
$\;\;= \alpha \cdot \mathrm{tr}\left( Q^{T} \left( V^{T} L^{(f)} V \right) Q \right) + \beta \cdot \mathrm{tr}\left( Q^{T} \left( Y L^{(d)} Y^{T} \right) Q \right)$
$\;\;= \alpha \cdot \mathrm{tr}\left( \left( V^{T} L^{(f)} V \right) Q Q^{T} \right) + \beta \cdot \mathrm{tr}\left( \left( Y L^{(d)} Y^{T} \right) Q Q^{T} \right)$
$\;\;= \alpha \cdot \mathrm{tr}\left( V^{T} L^{(f)} V \right) + \beta \cdot \mathrm{tr}\left( Y L^{(d)} Y^{T} \right)$

Obviously, (VQ, Q^T Y) is also a solution of (10).

The optimization problem in (10) is not convex, and many intelligent optimization techniques could be used to solve it (Guo et al. 2017a, b). However, the computational complexity of these methods is high. Substituting Y with V^T X, and using the orthogonality constraint on V, the first term in the objective function (10) can be simplified into the following trace difference form:

$\left\| X - V Y \right\|_{F}^{2} = \mathrm{tr}\left( (X - V Y)(X - V Y)^{T} \right) = \mathrm{tr}\left( X X^{T} - V V^{T} X X^{T} - X X^{T} V V^{T} + V Y Y^{T} V^{T} \right) = \mathrm{tr}\left( X X^{T} \right) - \mathrm{tr}\left( V^{T} X X^{T} V \right)$    (11)

Thus, applying Eq. (11) to (10), we get

$\left\| X - V Y \right\|_{F}^{2} + \alpha \cdot \mathrm{tr}\left( V^{T} L^{(f)} V \right) + \beta \cdot \mathrm{tr}\left( Y L^{(d)} Y^{T} \right) = \mathrm{tr}\left( X X^{T} \right) + \mathrm{tr}\left( V^{T} \left( -X X^{T} + \alpha \cdot L^{(f)} + \beta \cdot X L^{(d)} X^{T} \right) V \right)$    (12)

Since the first term in Eq. (12) is constant, minimizing (12) is equivalent to the following optimization problem:

$\min_{V} \; \mathrm{tr}\left( V^{T} L_{\alpha,\beta} V \right) \quad \text{s.t.} \; V^{T} V = I$    (13)

where $L_{\alpha,\beta} = -X X^{T} + \alpha \cdot L^{(f)} + \beta \cdot X L^{(d)} X^{T}$.

The trace minimization problem (13) can be efficiently solved by the eigen-decomposition of L_{α,β}. The optimal projection matrix V consists of the eigenvectors of L_{α,β} corresponding to its first r smallest eigenvalues.

For computational stability, the largest eigenvalue of the covariance matrix X X^T is used to normalize X X^T, the largest eigenvalue of the feature graph Laplacian matrix L^(f) is used to normalize L^(f), and the largest eigenvalue of the data scatter matrix X L^(d) X^T is used to normalize X L^(d) X^T (Jiang et al. 2013). Let w_n, α_n and β_n be the largest eigenvalues of X X^T, L^(f) and X L^(d) X^T, respectively. Suppose that

$\alpha = \frac{\lambda}{1 - \lambda - \mu} \cdot \frac{w_n}{\alpha_n}$    (14)

$\beta = \frac{\mu}{1 - \lambda - \mu} \cdot \frac{w_n}{\beta_n}$    (15)

where λ and μ are the alternative model parameters used instead of the regularization parameters α and β. Substituting Eqs. (14) and (15) into Eq. (13), we have

$\mathrm{tr}\left( V^{T} \left( -X X^{T} + \alpha \cdot L^{(f)} + \beta \cdot X L^{(d)} X^{T} \right) V \right)$
$\;\;= \frac{w_n}{1 - \lambda - \mu} \cdot \mathrm{tr}\left\{ V^{T} \left[ (1 - \lambda - \mu)\left( I - \frac{X X^{T}}{w_n} \right) + \lambda \cdot \frac{L^{(f)}}{\alpha_n} + \mu \cdot \frac{X L^{(d)} X^{T}}{\beta_n} \right] V \right\}$    (16)

Therefore, the solution of V in (16) can be stably computed as the eigenvectors of the matrix G_{λ,μ} defined as

$G_{\lambda,\mu} = (1 - \lambda - \mu) \cdot \left( I - \frac{X X^{T}}{w_n} \right) + \lambda \cdot \frac{L^{(f)}}{\alpha_n} + \mu \cdot \frac{X L^{(d)} X^{T}}{\beta_n}$    (17)

Note that G_{λ,μ} is positive semi-definite, and the all-ones vector is an eigenvector of G_{λ,μ} that is orthogonal to all other eigenvectors.

Table 2  Data sets used in the experiments

Name      #Samples   #Dimensionality   #Class
USPS      1000       256               10
ISOLET1   1560       617               26
COIL20    1440       784               20
SHD       1000       256               10
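As a concrete illustration of Eqs. (16)–(17), the following is a minimal NumPy sketch of the resulting procedure. It is written only for this summary (the authors' implementation is in MATLAB and is not reproduced here), and it reuses the same assumed k-NN Laplacian construction as in Sect. 4.1.

```python
import numpy as np

def knn_laplacian(points, k):
    """L = D - W for the symmetric 0/1 k-NN graph over the rows of `points` (Eq. (1))."""
    m = points.shape[0]
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros((m, m))
    W[np.repeat(np.arange(m), k), np.argsort(d2, axis=1)[:, :k].ravel()] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(axis=1)) - W

def gdlpca(X, r, lam, mu, dk=4, fk=11):
    """Sketch of the gDLPCA closed form: eigen-decomposition of G_{lambda,mu} in Eq. (17).

    X: d x n data matrix (columns are samples), assumed already centered/standardized.
    Returns the projection V (d x r) and the embedding Y = V^T X (r x n).
    """
    XXt   = X @ X.T                       # covariance-type matrix, d x d
    Lf    = knn_laplacian(X, fk)          # feature graph Laplacian, d x d
    Ld    = knn_laplacian(X.T, dk)        # data graph Laplacian, n x n
    XLdXt = X @ Ld @ X.T                  # data scatter term, d x d

    # normalize each term by its largest eigenvalue (w_n, alpha_n, beta_n)
    wn = np.linalg.eigvalsh(XXt)[-1]
    an = np.linalg.eigvalsh(Lf)[-1]
    bn = np.linalg.eigvalsh(XLdXt)[-1]

    I = np.eye(X.shape[0])
    G = (1.0 - lam - mu) * (I - XXt / wn) + lam * Lf / an + mu * XLdXt / bn   # Eq. (17)

    vals, vecs = np.linalg.eigh(G)        # eigenvalues in ascending order
    V = vecs[:, :r]                       # eigenvectors of the r smallest eigenvalues
    return V, V.T @ X                     # lam = mu = 0 recovers standard PCA (Eq. (4))

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 200))
X -= X.mean(axis=1, keepdims=True)        # center, as in standard PCA
V, Y = gdlpca(X, r=10, lam=0.5, mu=-0.2)  # the parameter values chosen in Sect. 5.3
```

Setting lam = mu = 0 reduces G to I − XX^T/w_n, whose smallest eigenvectors are the leading principal directions, which matches the degeneration to standard PCA noted below Eq. (10).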
The procedure of gDLPCA is summarized in Algorithm 1. Since the eigen-decomposition is the most time-consuming step of gDLPCA and the size of G_{λ,μ} is d × d, the computational complexity of gDLPCA is O(d^3). Note that PCA and truncated singular value decomposition (TSVD) have almost the same computational complexity. Moreover, gDLPCA is in fact a manifold regularized low-rank matrix factorization method (Zhang and Zhao 2013).

5 Experimental results

Since PCA can be applied to both classification and clustering, we investigate the performance of the gDLPCA algorithm in data clustering and classification experiments.

5.1 Data sets description

The data sets used to evaluate the proposed algorithm are listed in Table 2. These four benchmark data sets are widely used for data clustering and classification.

The USPS data set (http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html) is obtained by scanning handwritten digits from envelopes of the U.S. Postal Service. After size normalization, all the images are resized to 16 × 16 grayscale images. Sample images of the USPS data set are shown in Fig. 1. Considering computational costs, we use only a subset of the original data set: 100 images are selected randomly for each class, so there are 1000 images in our USPS data set.

The ISOLET1 data set (https://archive.ics.uci.edu/ml/datasets/ISOLET) is audio sequence data generated by a group of 30 speakers speaking the name of each letter of the alphabet.

The COIL20 data set (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php) is collected by the Columbia University Image Library, and contains images of 20 objects in which


the background has been discarded. Sample images of the COIL20 data set are shown in Fig. 2.

The SHD data set (http://archive.ics.uci.edu/ml/datasets/Semeion+Handwritten+Digit) is the Semeion handwritten digit data set, which contains more than 1000 handwritten digit images collected from 80 persons. Sample images of the SHD data set are shown in Fig. 3.

From Figs. 1, 2 and 3, we can see that images of these data sets look somewhat similar, and images belonging to the same class change slowly, which suggests that these images lie on a low-dimensional manifold.

Fig. 1  Sample images from USPS data set
Fig. 2  Sample images from COIL20 data set
Fig. 3  Sample images from SHD data set

5.2 Experimental setting

In order to evaluate the clustering performance of the proposed gDLPCA, we compared it with four other methods: (1) k-means on the original data, (2) standard PCA, (3) locality preserving projection (LPP) (He and Niyogi 2004) and (4) graph-Laplacian PCA (gLPCA). Each data set is transformed to zero mean and unit standard deviation along the features. All the experiments are carried out on a personal computer with a 2.3 GHz Intel Core i7 processor and 8 GB DDR3 memory. All the algorithms are implemented in MATLAB R2015b.

After dimensionality reduction, the 1-nearest neighbor method is used for low-dimensional data classification; other classification methods could also be used (Gu and Sheng 2017; Gu et al. 2015, 2016; Wen et al. 2015). In the data partition, for each data set, we randomly select 20% of the samples in each class as the training set, and the remaining samples are used for testing. The process is repeated 20 times and the average classification accuracy is reported. In the clustering experiments, the k-means method is used for low-dimensional data clustering, and we set the number of clusters to the number of classes for all data sets.

5.3 Parameter selection

In order to test how the performance of gDLPCA varies with its parameters r, λ, µ, dk and fk, we evaluate the parameter sensitivity of gDLPCA on the USPS handwritten digit data set.

As shown in Fig. 4, for different values of the parameters λ and µ, the optimal reduced dimensionality r should be more than 35. In the experiments, r is set to 64.

On the USPS data set, with the data graph neighborhood parameter dk set to 4 and the feature graph neighborhood parameter fk set to 11, we search for the optimal parameters λ and µ from − 1 to 1 with a step length of 0.01. According to the results shown in Fig. 5, we observed that smaller values of the parameter µ give better classification accuracy. In the experiments, the parameters λ and µ are set to 0.5 and − 0.2, respectively.
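To make the evaluation protocol of Sects. 5.2–5.3 concrete, here is a compact sketch of the classification pipeline (feature-wise standardization, a random 20% training split per class, 1-nearest-neighbor classification, averaged over 20 repetitions). It is an assumed NumPy re-implementation written only for illustration; the original experiments were run in MATLAB, and `project` stands for any of the compared dimensionality reduction methods.

```python
import numpy as np

def one_nn_accuracy(Y_train, l_train, Y_test, l_test):
    """1-nearest-neighbour accuracy for column-wise embeddings Y (r x n)."""
    d2 = ((Y_test.T[:, None, :] - Y_train.T[None, :, :]) ** 2).sum(-1)
    pred = l_train[np.argmin(d2, axis=1)]
    return np.mean(pred == l_test)

def evaluate(X, labels, project, train_ratio=0.2, repeats=20, seed=0):
    """X: d x n data matrix; `project` maps X to an r x n embedding (e.g. PCA, gLPCA or gDLPCA)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    # zero mean and unit standard deviation along the features (rows of X)
    X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-12)
    Y = project(X)
    accs = []
    for _ in range(repeats):
        train = np.zeros(labels.size, dtype=bool)
        for c in np.unique(labels):                 # 20% of every class goes to training
            idx = np.flatnonzero(labels == c)
            chosen = rng.choice(idx, size=max(1, int(train_ratio * idx.size)), replace=False)
            train[chosen] = True
        accs.append(one_nn_accuracy(Y[:, train], labels[train], Y[:, ~train], labels[~train]))
    return float(np.mean(accs))
```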


Fig. 4  The relation between classification accuracy and reduced dimensionality r in gDLPCA
Fig. 5  The relation between classification accuracy and parameters λ, µ
Fig. 6  The relation between classification accuracy and parameters fk, dk

In order to study the influence of the graph neighborhood parameters fk and dk, we investigated the variability of the classification performance for different values of the two parameters. The classification accuracies on the USPS data set are visualized in Fig. 6. As can be seen, as the neighborhood parameters change from 1 to 30, the classification accuracy varies from 0.91 to 0.93. Therefore, the gDLPCA method is robust to the neighborhood parameters of the data graph and the feature graph. For simplicity, the data graph neighborhood parameter dk is set to 4, and the feature graph neighborhood parameter fk is set to 11 on all the data sets.

5.4 Clustering results and analysis

In the clustering experiments, two standard clustering metrics, the accuracy (ACC) and the normalized mutual information (NMI), are used to measure the clustering performance of each method (Cai et al. 2011).

The ACC metric measures the ratio of data points in each cluster that come from the true corresponding class, and is defined as follows:

$\mathrm{ACC} = \frac{\sum_{i=1}^{n} \delta\left( \mathrm{map}\left( \tau_i \right), l_i \right)}{n}$

where δ(·, ·) is the indicator function, which is equal to one if its two arguments are equal and zero otherwise,


and τ_i denotes the cluster label of x_i, l_i denotes the true class label, and map(τ_i) is the permutation mapping function (Lovász and Plummer 2009) that maps the cluster label τ_i to the equivalent label in the data set. A larger value of ACC indicates a better clustering performance.

In addition, NMI is used to determine the quality of the clusters, and is defined as follows:

$\mathrm{NMI} = \frac{\sum_{i=1}^{c} \sum_{j=1}^{c} n_{i,j} \log \dfrac{n \, n_{i,j}}{n_i \hat{n}_j}}{\sqrt{\left( \sum_{i=1}^{c} n_i \log \dfrac{n_i}{n} \right)\left( \sum_{j=1}^{c} \hat{n}_j \log \dfrac{\hat{n}_j}{n} \right)}}$

where n_i and $\hat{n}_j$ are the numbers of data samples in the true cluster C_i and the predicted cluster T_j, respectively, and n_{i,j} denotes the number of samples in the intersection between the true cluster C_i and the predicted cluster T_j. Similar to ACC, a larger NMI value indicates a better clustering performance.

Table 3  Optimal clustering results with ACC metric (%)

Name      Original   PCA     LPP     gLPCA   gDLPCA
USPS      66.65      68.05   67.90   66.85   70.45
ISOLET1   60.80      60.90   67.70   56.90   59.90
COIL20    59.40      62.80   66.30   68.60   69.20
SHD       59.70      60.60   63.00   65.30   65.30
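For reference, both metrics can be computed as follows. This is an assumed re-implementation added for illustration (the paper provides no code): the permutation mapping map(·) is obtained with the Hungarian algorithm from SciPy, and NMI is taken from scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_acc(true_labels, cluster_labels):
    """ACC: best one-to-one match between cluster labels and true labels (map(.) via Hungarian algorithm)."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # overlap[i, j] = number of points in cluster i whose true class is j (to be maximized)
    overlap = np.array([[np.sum((cluster_labels == ci) & (true_labels == cj))
                         for cj in classes] for ci in clusters])
    row, col = linear_sum_assignment(-overlap)       # maximize the total overlap
    return overlap[row, col].sum() / true_labels.size

# toy example
true_labels    = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_labels = [1, 1, 2, 0, 0, 0, 2, 2, 2]
acc = clustering_acc(true_labels, cluster_labels)                  # = 8/9
nmi = normalized_mutual_info_score(true_labels, cluster_labels)
print(f"ACC = {acc:.3f}, NMI = {nmi:.3f}")
```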
Table 4  Optimal clustering results with NMI metric (%)

Name      Original   PCA     LPP     gLPCA   gDLPCA
USPS      61.08      61.56   62.66   59.56   62.76
ISOLET1   76.20      76.00   81.30   72.40   75.70
COIL20    75.00      76.00   79.20   80.10   81.00
SHD       54.70      55.00   59.10   58.50   59.10

The ACC metric is based on a one-to-one match between cluster labels and true labels, while NMI is an external criterion that evaluates the degree of similarity between cluster labels and true labels. Table 3 reports the average ACC on the four data sets, with the best results highlighted in bold. Table 4 shows the clustering results of these unsupervised linear dimensionality reduction algorithms in terms of NMI on the four data sets; the best results are again highlighted in bold.

From Tables 3 and 4, we have the following observations. gDLPCA is superior to the other methods and obtains the best results in terms of the clustering evaluation metrics ACC and NMI on almost all the data sets. However, LPP performs best on the ISOLET1 data set, while gDLPCA and gLPCA perform the worst there. The reason is perhaps that local similarity is a typical characteristic of the ISOLET1 data set, which is a sequential data set. Since LPP can preserve the local similarity of a sequential data set, it captures the cluster structure of the ISOLET1 data set, while PCA, gLPCA and gDLPCA take total variance maximization as prior information, which is not suitable for speech sequential data.

The clustering performances for different reduced dimensionalities r are shown in Figs. 7, 8, 9, 10, 11 and 12. Compared with the gLPCA method, the main improvement is that gDLPCA utilizes the local geometrical information of the feature space through the feature graph regularization added to the gLPCA model. Therefore, we can draw the conclusion that the information in the feature space is of great importance for data clustering.

Fig. 7  The clustering results on USPS data set
Fig. 8  The cluster confusion matrix of gDLPCA on USPS data set
Fig. 9  The clustering results on ISOLET1 data set
Fig. 10  The clustering results on COIL20 data set

On the USPS data set, according to the ACC evaluation metric, when the reduced dimensionality is greater than 10, gDLPCA achieves a remarkably better performance than the other methods. While PCA and gLPCA have comparable performances, LPP does not perform well for larger reduced dimensionalities. According to the NMI evaluation metric, however, the performance improvement of gDLPCA is slight, while gLPCA performs worst. In order to visualize the clustering performance on the USPS data set, the cluster confusion matrix of gDLPCA is plotted in Fig. 8: the larger the value of an entry of the confusion matrix, the deeper the color at the corresponding position. Each column of the matrix represents the samples in a predicted cluster, while each row represents the samples in a true cluster (Powers 2011). As shown in Fig. 8, the clustering errors are concentrated on image samples between classes 1 and 7, 5 and 3,

6 and 4, 8 and 10, and 10 and 5. These digit images are probably more similar to each other than the others.

On the ISOLET1 data set, according to both evaluation metrics, LPP performs best, while PCA and gDLPCA perform almost the same and both perform better than gLPCA. This may indicate that LPP is more suitable for speech data sets.

On the COIL20 data set, gDLPCA and gLPCA perform significantly better than the others. Under the ACC metric, gDLPCA has a slightly better performance than gLPCA. Figure 11 shows the cluster confusion matrix on the COIL20 data set; the cluster errors occur on classes 2, 3, 5, 8 and 18.

On the SHD data set, we find results similar to those on the COIL20 data set. The cluster confusion matrix is shown in Fig. 13, and the cluster errors are concentrated on similar images, such as digits 4 and 6, digits 8 and 2, digits 9 and 3, digits 10 and 6, and so on.

The performance of gDLPCA varies across the different data sets. The results in Figs. 4, 6, 7 and 8 indicate that gDLPCA and gLPCA are more suitable for image data sets.

Fig. 11  The cluster confusion matrix of gDLPCA on COIL20 data set
Fig. 12  The clustering results on SHD data set

5.5 Classification results and analysis

Since PCA, gLPCA and gDLPCA are feature extraction methods and the local geometrical structure preserving criterion can lead to discriminant representations, the k-nearest neighbor classification accuracies on the projected low-dimensional representations are reported in Table 5. The classification accuracies for different reduced dimensionalities r are shown in Figs. 14, 15, 16 and 17. All these numerical and graphical comparison results show that the performance of gDLPCA is comparable to the other methods. Disappointingly, as in the previous clustering experiments, gDLPCA and gLPCA do not perform well on the ISOLET1 data set, which indicates that both methods are not suitable for speech data. Note that gDLPCA and gLPCA perform better than the others on the SHD data set.

The classification confusion matrices on the image data sets USPS, COIL20 and SHD are shown in Figs. 18, 19 and 20. From Fig. 18, we can find that the misclassified digit images are concentrated between digits 1 and 7, digits 2 and 7, digits 4 and 9, digits 8 and 10, digits 8 and 3, digits 9 and 4, and digits 10 and 5. Figure 19 shows that the classification performance of gDLPCA on the COIL20 data set is more stable, and the most confused object is object 19. As can be seen in the confusion matrix in Fig. 20, the most confused digit images are digits 6 and 4.
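The confusion matrices discussed above (Figs. 8, 11, 13 and 18, 19, 20) can be reproduced with a few lines once predicted labels are available. A small illustrative sketch (rows indexed by true labels, columns by predicted labels, as described in Sect. 5.4):

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels):
    """C[i, j] = number of samples whose true label is class i and predicted label is class j."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    classes = np.unique(np.concatenate([true_labels, pred_labels]))
    index = {c: i for i, c in enumerate(classes)}
    C = np.zeros((classes.size, classes.size), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        C[index[t], index[p]] += 1
    return C

# e.g. confusion between digits 1 and 7 shows up as off-diagonal mass in those rows
print(confusion_matrix([1, 1, 7, 7, 7], [1, 7, 7, 7, 1]))
```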


Fig. 13  The cluster confusion matrix of gDLPCA on SHD data set
Fig. 14  The classification results on USPS data set
Fig. 15  The classification results on ISOLET1 data set
Fig. 16  The classification results on COIL20 data set

Table 5  Optimal classification results with ACC metric (%)

Name      Original   PCA     LPP     gLPCA   gDLPCA
USPS      86.63      88.13   88.18   88.13   88.87
ISOLET1   85.10      85.50   86.50   71.70   79.60
COIL20    96.00      98.20   98.40   95.60   96.70
SHD       81.50      84.20   85.00   84.50   86.80

6 Conclusions

Since the geometrical and topological properties of a manifold can be represented by a graph model, we incorporated data graph embedding and feature graph embedding into the PCA model to achieve a better performance in dimensionality reduction and feature extraction. Different from traditional data graph regularized dimensionality reduction methods, dual graph regularized methods can preserve the local geometrical information of both the data space and the feature space simultaneously. Given two graphs constructed from the data manifold and the feature manifold, the objective function of the gDLPCA model is proposed, and a closed-form solution is developed, which makes gDLPCA computationally convenient. Finally, two kinds of extensive experiments on four benchmark data sets are designed to demonstrate the effectiveness of gDLPCA. The comparison results show that the gDLPCA method achieves better clustering and classification performances than gLPCA.

Fig. 17  The classification results on SHD data set
Fig. 18  The classification confusion matrix of gDLPCA on USPS data set
Fig. 19  The classification confusion matrix of gDLPCA on COIL20 data set
Fig. 20  The classification confusion matrix of gDLPCA on SHD data set

Note that dual graph regularization is a general manifold regularization framework. It can be used as a general regularization technique to develop new data representation methods. Obviously, graph construction plays a critical role in gDLPCA, and how to construct a data graph and a feature graph of high quality is still an open question. Additionally, the graph parameters, e.g., the neighborhood parameters, have to be defined manually in advance, and it is difficult to choose the optimal graph parameters. Finally, the applications of gDLPCA in other fields (Ren et al. 2015; Shen et al. 2015; Zhou et al. 2015) will be investigated in future work.

Acknowledgements  The authors would like to thank the anonymous reviewers and the editor for their helpful comments and suggestions to improve the quality of this paper. We also thank Zhaolu Guo of Jiangxi University of Science and Technology for helpful discussions and Jie Su of the College of Information Engineering at Northwest A&F University for his extensive experimental work. This work was supported in part by the National Natural Science Foundation of China under Grant 61602388, the China Postdoctoral Science Foundation under Grant 2018M633585, the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2018JQ6060, the Doctoral Starting up Foundation of Northwest A&F University under Grant 2452015302, and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) under Grant 201700009.


References

Bao BK, Liu G, Xu C, Yan S (2012) Inductive robust principal component analysis. IEEE Trans Image Process 21:3794–3800
Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 14:585–591
Bi M, Xu J, Wang M, Zhou F (2016) Anomaly detection model of user behavior based on principal component analysis. J Ambient Intell Hum Comput 7:547–554. https://doi.org/10.1007/s12652-015-0341-4
Brooks J, Dulá J, Boone E (2013) A pure L1-norm principal component analysis. Comput Stat Data Anal 61:83
Cai D, He X, Han J, Huang TS (2011) Graph regularized nonnegative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell 33:1548–1560
Candes EJ, Li X, Ma Y, Wright J (2009) Robust principal component analysis? J ACM 58(3):11
Gu B, Sheng VS (2017) A robust regularization path algorithm for ν-support vector classification. IEEE Trans Neural Netw Learn Syst 28:1241–1248
Gu Q, Zhou J (2009) Co-clustering on manifolds. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 359–368
Gu B, Sheng VS, Wang Z, Ho D, Osman S, Li S (2015) Incremental learning for ν-support vector regression. Neural Netw 67:140–150
Gu B, Sun X, Sheng VS (2016) Structural minimax probability machine. IEEE Trans Neural Netw Learn Syst 28:1646–1656
Guan R, Wang X, Marchese M, Yang MQ, Liang Y, Yang C (2018) Feature space learning model. J Ambient Intell Hum Comput. https://doi.org/10.1007/s12652-018-0805-4
Guo Z, Liu G, Li D, Wang S (2017a) Self-adaptive differential evolution with global neighborhood search. Soft Comput 21:3759–3768
Guo Z, Wang S, Yue X, Yang H (2017b) Global harmony search with generalized opposition-based learning. Soft Comput 21:2129–2137
He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems, pp 153–160
Jiang B, Ding C, Luo B, Tang J (2013) Graph-Laplacian PCA: closed-form solution and robustness. In: Computer vision and pattern recognition, pp 3492–3498
Jin T, Yu J, You J, Zeng K, Li C, Yu Z (2015) Low-rank matrix factorization with multiple hypergraph regularizer. Pattern Recogn 48:1011–1022
Jolliffe IT (2011) Principal component analysis. J Mark Res 87:513
Kargupta H, Huang W, Sivakumar K, Johnson E (2001) Distributed clustering using collective principal component analysis. Knowl Inf Syst 3:422–448
Kwak N (2014) Principal component analysis by Lp-norm maximization. IEEE Trans Cybern 44:594–609
Lee D (1999) Learning the parts of objects with nonnegative matrix factorization. Nature 401:788
Liang Z, Xia S, Zhou Y, Zhang L, Li Y (2013) Feature extraction based on Lp-norm generalized principal component analysis. Pattern Recogn Lett 34:1037–1045
Lin G, Tang N, Wang H (2014) Locally principal component analysis based on L1-norm maximisation. IET Image Process 9:91–96
Lovász L, Plummer MD (2009) Matching theory, vol 367. American Mathematical Society, Providence
Powers DM (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
Ren Y, Shen J, Wang J, Han J, Lee S (2015) Mutual verifiable provable data auditing in public cloud storage. J Internet Technol 16(2):317–323
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
Shahid N, Kalofolias V, Bresson X, Bronstein M, Vandergheynst P (2015) Robust principal component analysis on graphs. In: IEEE international conference on computer vision, pp 2812–2820
Shahid N, Perraudin N, Kalofolias V, Ricaud B, Vandergheynst P (2016) PCA using graph total variation. In: IEEE international conference on acoustics, speech and signal processing
Shang F, Jiao LC, Wang F (2012) Graph dual regularization non-negative matrix factorization for co-clustering. Pattern Recogn 45:2237–2250
Shang R, Zhang Z, Jiao L, Liu C, Li Y (2016) Self-representation based dual-graph regularized feature selection clustering. Neurocomputing 171:1242–1253
Shen J, Tan H, Wang J, Wang J, Lee S (2015) A novel routing protocol providing good transmission reliability in underwater sensor networks. J Internet Technol 16:171–178
Sindhwani V, Hu J, Mojsilovic A (2009) Regularized co-clustering with dual supervision. In: Advances in neural information processing systems, pp 1505–1512
Smola AJ (1997) Kernel principal component analysis. In: International conference on artificial neural networks, pp 583–588
Sturn A, Quackenbush J, Trajanoski Z (2002) Genesis: cluster analysis of microarray data. Bioinformatics 18:207–208
Sun Q, Xiang S, Ye J (2013) Robust principal component analysis via capped norms. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 311–319
Turk MA, Pentland AP (2002) Face recognition using eigenfaces. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR '91), pp 586–591
Wang H (2012) Block principal component analysis with L1-norm for image analysis. Pattern Recogn Lett 33:537–542
Wang J (2016) Generalized 2-D principal component analysis by Lp-norm for image analysis. IEEE Trans Cybern 46:792–803
Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Inf Sci 295:395–406
Yan S, Xu D, Zhang B, Zhang H-J, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29:40–51
Yang T, Gao X, Sellars S, Sorooshian S (2015) Improving the multi-objective evolutionary optimization algorithm for hydropower reservoir operations in the California Oroville-Thermalito complex. Environ Model Softw 69:262–279
Yang T, Asanjan AA, Faridzad M, Hayatbini N, Gao X, Sorooshian S (2017) An enhanced artificial neural network with a shuffled complex evolutionary global optimization with principal component analysis. Inf Sci 418:302–316
Yang T, Tao Y, Li J, Zhu Q, Su L, He X, Zhang X (2018) Multi-criterion model ensemble of CMIP5 surface air temperature over China. Theor Appl Climatol 132:1057–1072
Yin M, Gao J, Lin Z, Shi Q, Guo Y (2015) Dual graph regularized latent low-rank representation for subspace clustering. IEEE Trans Image Process 24:4918–4933
Zhang Z, Zhao K (2013) Low-rank matrix approximation with manifold regularization. IEEE Trans Pattern Anal Mach Intell 35:1717–1729
Zhou J, Tang M, Tian Y, Al-Dhelaan A, Al-Rodhaan M, Lee S (2015) Social network and tag sources based augmenting collaborative recommender system. IEICE Trans Inf Syst 98:902–910
Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15:265–286

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
