
2014 IEEE International Conference on Systems, Man, and Cybernetics

October 5-8, 2014, San Diego, CA, USA

Translation Non-negative Matrix Factorization with Fast Optimization

Yuanyuan Wang
Department of Basic Courses
Army Officer Academy
Hefei, P.R. China, 230031
Email: yywang1217@gmail.com

Naiyang Guan
Science and Technology on Parallel and Distributed Processing Laboratory
College of Computer Science
National University of Defense Technology
Changsha, P.R. China, 410073
Email: ny_guan@nudt.edu.cn

Bin Mao
College of Science
National University of Defense Technology
Changsha, P.R. China, 410073
Email: miscy210@163.com

Xuhui Huang
College of Computer Science
National University of Defense Technology
Changsha, P.R. China, 410073
Email: hxh7033@sina.com

Zhigang Luo (corresponding author)
Science and Technology on Parallel and Distributed Processing Laboratory
College of Computer Science
National University of Defense Technology
Changsha, P.R. China, 410073
Email: zgluo@nudt.edu.cn

Abstract—Non-negative matrix factorization (NMF) reconstructs the original samples in a lower dimensional space and has been widely used in pattern recognition and data mining because it usually yields a sparse representation. Since NMF leads to unsatisfactory reconstruction for datasets that contain translations of large magnitude, it is necessary to develop translation NMF (TNMF), which first removes the translation and then conducts the decomposition. However, the existing multiplicative update rule based algorithm for TNMF is not efficient enough. In this paper, we reformulate TNMF and show that it can be efficiently solved by state-of-the-art solvers such as NeNMF. Experimental results on face image datasets confirm both the efficiency and the effectiveness of the reformulated TNMF.

Keywords—Non-negative matrix factorization (NMF), translation transformation, NeNMF

I. INTRODUCTION

Data representation is a critical problem in practical applications such as computer vision and data mining. A good representation uncovers the intrinsic structure of the data and boosts the subsequent processing. Although many approaches have tackled this problem, e.g., principal component analysis (PCA) and singular value decomposition (SVD), they do not consider the non-negativity of raw data, and thus generate representations that are inconsistent with the positive firing rates in the human brain.

Non-negative matrix factorization (NMF, [2]) solves this problem by constraining the factor matrices to be non-negative while representing data under the matrix decomposition framework. NMF approximates the data matrix by the product of two lower-rank non-negative factor matrices, and the non-negativity of the factor matrices leads to a parts-based representation. This advantage has greatly popularized NMF.

Many extensions have been proposed to improve the NMF model. Hoyer [8] proposed NMF with sparseness constraints (NMFsc) to control the sparsity of each column or each row of the factor matrices for vision processing. Cai et al. [9] proposed graph regularized NMF (GNMF) to preserve the geometric structure of the dataset. Guan et al. [10] proposed the non-negative patch alignment framework (NPAF) to unify NMF-related dimension reduction methods. Under NPAF, Guan et al. [13] proposed margin maximization based discriminative NMF (MD-NMF) to incorporate the label information of the dataset. To solve NMF, Lin [11] applied the projected gradient method to alternately update each factor matrix. Under the same block coordinate descent framework, Guan et al. [3][4] proposed efficient NMF solvers based on the Nesterov optimal gradient method. To deal with streaming data, Guan et al. [12] proposed an efficient online algorithm based on Nemirovski's robust stochastic approximation.

However, both NMF and its extensions assume that the noise in the dataset has zero mean. This assumption is unreasonable, especially when the dataset contains offsets. Although affine sparse NMF (AS-NMF, [1]) partially solves this problem by automatically learning an offset and decomposing the offset-corrected dataset with original NMF, it might introduce negative entries into the obtained factor matrices, and its multiplicative update rule based optimization converges slowly.

In this paper, we propose translation NMF (TNMF) by reformulating AS-NMF in the framework of original NMF. Based on this reformulation, the block coordinate descent based NeNMF [3] method can be naturally adopted to solve TNMF. Experimental results of face recognition on popular datasets confirm the effectiveness of TNMF in comparison with both NMF and AS-NMF.

II. RELATED WORKS

This section reviews several related non-negative matrix factorization (NMF, [2]) methods, including both original NMF and affine sparse NMF (AS-NMF, [1]).

A. NMF

Given a non-negative data matrix $V \in \mathbb{R}_+^{m \times n}$, NMF [2] finds two lower-rank matrices by solving the following minimization problem:

\[
\min_{W \in \mathbb{R}_+^{m \times r},\, H \in \mathbb{R}_+^{r \times n}} \frac{1}{2}\|V - WH\|_F^2, \qquad (1)
\]

where $\|\cdot\|_F$ signifies the Frobenius norm. The squared Frobenius norm measures the distance between $V$ and its reconstruction $WH$ in the lower dimensional space, i.e., $\mathbb{R}^r$. Although it can be replaced by other functionals, e.g., the Kullback-Leibler divergence, we focus on the NMF model (1) because it has nice mathematical properties.
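As a concrete illustration of model (1), the following NumPy sketch evaluates the objective and runs the classical multiplicative updates of Lee and Seung [2] on a small random matrix; the rank r, the iteration count, and the small constant eps added to the denominators are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def nmf_mur(V, r, n_iter=200, eps=1e-10, seed=0):
    """Minimize 0.5*||V - W H||_F^2 with the Lee-Seung multiplicative updates."""
    m, n = V.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

V = np.abs(np.random.default_rng(1).random((50, 30)))   # non-negative data
W, H = nmf_mur(V, r=5)
print("objective:", 0.5 * np.linalg.norm(V - W @ H, "fro") ** 2)
```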
B. Affine Sparse NMF

Looking more carefully at the NMF model (1), it intrinsically assumes that the noise in $V$ obeys a zero-mean Gaussian distribution. Therefore, NMF usually fails on datasets whose noise is not centered around zero. To solve this problem, Laurberg and Hansen [1] proposed affine sparse NMF (AS-NMF) to handle an offset in the dataset. The objective of AS-NMF is

\[
\min_{W \in \mathbb{R}_+^{m \times r},\, H \in \mathbb{R}_+^{r \times n}} \frac{1}{2}\left\|V - WH - \vec{w}_0\vec{1}^T\right\|_F^2 + \lambda\,\vec{1}^T H \vec{1}, \qquad (2)
\]

where $\vec{w}_0$ signifies the offset in $V$ and $\lambda > 0$ trades off the two parts.

Laurberg and Hansen [1] applied a multiplicative update rule (MUR) to solve (2), i.e., alternately updating $H$, $W$ and $\vec{w}_0$ as follows:

\[
H \leftarrow H \circ \frac{W^T(V - \vec{w}_0\vec{1}^T)}{W^T W H + \lambda}, \qquad (3)
\]

\[
W \leftarrow W \circ \frac{(V - \vec{w}_0\vec{1}^T)H^T}{W H H^T}, \qquad (4)
\]

\[
\vec{w}_0 \leftarrow \vec{w}_0 \circ \frac{V\vec{1}}{(WH + \vec{w}_0\vec{1}^T)\vec{1}}, \qquad (5)
\]

where $\circ$ signifies element-wise multiplication and the divisions are element-wise as well.
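To make the alternation in (3)-(5) concrete, here is a minimal NumPy sketch of the AS-NMF updates; the variable names, the iteration count, and the eps guard against division by zero are illustrative assumptions rather than details fixed by [1].

```python
import numpy as np

def asnmf_mur(V, r, lam=0.1, n_iter=300, eps=1e-10, seed=0):
    """AS-NMF: minimize 0.5*||V - W H - w0 1^T||_F^2 + lam*sum(H) via MUR (3)-(5)."""
    m, n = V.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, r))
    H = rng.random((r, n))
    w0 = rng.random((m, 1))
    ones = np.ones((n, 1))
    for _ in range(n_iter):
        Vc = V - w0 @ ones.T                                      # offset-corrected data
        H *= (W.T @ Vc) / (W.T @ W @ H + lam + eps)               # update H, eq. (3)
        W *= ((V - w0 @ ones.T) @ H.T) / (W @ H @ H.T + eps)      # update W, eq. (4)
        w0 *= (V @ ones) / ((W @ H + w0 @ ones.T) @ ones + eps)   # update w0, eq. (5)
    return W, H, w0
```

Note that the numerators in (3) and (4) involve $(V - \vec{w}_0\vec{1}^T)$, which is not guaranteed to be non-negative; as Section III points out, this is why AS-NMF can leave negative entries in $W$ or $H$.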
Since AS-NMF yields sparsity over the coefficient matrix $H$ through the second term in (2), it implicitly encourages the non-negativity of $(V - \vec{w}_0\vec{1}^T)$. From the geometric viewpoint, the sparse coefficients indicate that $(V - \vec{w}_0\vec{1}^T)$ is represented by the tightest combination of the convex hull spanned by $W$. Since $W$ is non-negative, it is reasonable to believe that $(V - \vec{w}_0\vec{1}^T)$ is also non-negative.

III. TRANSLATION NMF: A NEW FORMULATION

Although AS-NMF performs well, it never explicitly guarantees the non-negativity of $(V - \vec{w}_0\vec{1}^T)$ in (3) and (4), and thus brings negative entries into either $H$ or $W$ on some datasets. In addition, the MUR-based algorithm converges slowly as it is intrinsically a first-order optimization method.

In this paper, we solve this problem by reformulating AS-NMF into translation NMF (TNMF). The difference is two-fold: 1) TNMF guarantees the non-negativity of the factor matrices without the sparsity regularization, and 2) TNMF is solved by a state-of-the-art NMF solver which is much faster than MUR. The objective of TNMF is

\[
\min_{W \in \mathbb{R}_+^{m \times r},\, H \in \mathbb{R}_+^{r \times n}} \frac{1}{2}\left\|V - WH - \vec{w}_0\vec{1}^T\right\|_F^2. \qquad (6)
\]

By simple algebra, the objective function of (6) can be reformulated as

\[
\frac{1}{2}\left\|V - WH - \vec{w}_0\vec{1}^T\right\|_F^2 = \frac{1}{2}\left\|V - \widetilde{W}\begin{bmatrix} H \\ \vec{1}^T \end{bmatrix}\right\|_F^2, \qquad (7)
\]

where $\widetilde{W} = [W, \vec{w}_0]$.

Based on the reformulation strategy (7), we can easily solve TNMF by alternately updating $\widetilde{W}$ and $H$ until convergence, i.e., until the objective value no longer changes. The factor matrices $W$, $H$, and the offset $\vec{w}_0$ can simply be collected from the final result. The following subsections introduce both MUR and NeNMF based algorithms for solving TNMF.
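The reformulation in (7) is just a matter of stacking: append the offset as an extra column of the basis and a row of ones to the coefficients. A small NumPy check of this identity on randomly generated matrices (the names Wt and Ht for the augmented matrices are ours) might look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 40, 25, 6
V = rng.random((m, n))
W = rng.random((m, r))
H = rng.random((r, n))
w0 = rng.random((m, 1))

Wt = np.hstack([W, w0])                 # augmented basis [W, w0]
Ht = np.vstack([H, np.ones((1, n))])    # augmented coefficients [H; 1^T]

lhs = 0.5 * np.linalg.norm(V - W @ H - w0 @ np.ones((1, n)), "fro") ** 2
rhs = 0.5 * np.linalg.norm(V - Wt @ Ht, "fro") ** 2
print(np.isclose(lhs, rhs))   # True: (6) and (7) have identical objective values
```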
A. Multiplicative Update Rule

In the framework of block coordinate descent, MUR first updates $\widetilde{W}$ with $H$ fixed, and then updates $H$ with $\widetilde{W}$ fixed. Fixing $H$, according to [2], the following MUR decreases the objective function of (7):

\[
\widetilde{W} \leftarrow \widetilde{W} \circ \frac{V\widetilde{H}^T}{\widetilde{W}\widetilde{H}\widetilde{H}^T}, \qquad (8)
\]

where $\widetilde{H} = \begin{bmatrix} H \\ \vec{1}^T \end{bmatrix}$. It is not straightforward to update $H$ by MUR when $\widetilde{W}$ is fixed, due to the offset vector $\vec{w}_0$. To this end, we rewrite the objective function of (6) as follows:

\[
\frac{1}{2}\left\|V - WH - \vec{w}_0\vec{1}^T\right\|_F^2 = \frac{1}{2}\left\|\bar{V} - WH\right\|_F^2, \qquad (9)
\]

where $\bar{V} = V - \vec{w}_0\vec{1}^T$. Since $\bar{V}$ might contain negative entries, the conventional MUR cannot be directly applied to update $H$. By splitting $\bar{V}$ into positive and negative components, i.e., $\bar{V} = \bar{V}^+ - \bar{V}^-$ with $\bar{V}^+ = \frac{|\bar{V}| + \bar{V}}{2}$ and $\bar{V}^- = \frac{|\bar{V}| - \bar{V}}{2}$, we can update $H$ by the following MUR:

\[
H \leftarrow H \circ \frac{W^T\bar{V}^+}{W^T(WH + \bar{V}^-)}. \qquad (10)
\]

In contrast to (3), eq. (10) guarantees the non-negativity of $H$. It is easy to prove that (10) decreases the objective function of (9) based on the strategy in [5]; we omit the proof due to space limitations. Although the MURs (8) and (10) are simple and easy to implement, they converge slowly because MUR is intrinsically a first-order method.
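Below is a minimal NumPy sketch of one full MUR pass for TNMF combining (8) and (10); the positive/negative split follows the decomposition of $\bar{V}$ given above, and the eps guard is an illustrative assumption.

```python
import numpy as np

def tnmf_mur_step(V, W, H, w0, eps=1e-10):
    """One TNMF MUR pass: update the augmented basis by (8), then H by (10)."""
    n = V.shape[1]
    ones = np.ones((1, n))
    # Update Wt = [W, w0] with H fixed, eq. (8)
    Ht = np.vstack([H, ones])
    Wt = np.hstack([W, w0])
    Wt *= (V @ Ht.T) / (Wt @ Ht @ Ht.T + eps)
    W, w0 = Wt[:, :-1], Wt[:, -1:]
    # Update H with W and w0 fixed, eq. (10), after splitting Vbar = Vp - Vm
    Vbar = V - w0 @ ones
    Vp = (np.abs(Vbar) + Vbar) / 2.0
    Vm = (np.abs(Vbar) - Vbar) / 2.0
    H *= (W.T @ Vp) / (W.T @ (W @ H + Vm) + eps)
    return W, H, w0
```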
B. NeNMF-based Algorithm

NeNMF [3] is a state-of-the-art NMF solver which alternately optimizes each factor matrix to optimality with the other fixed, and guarantees convergence to a stationary point. In particular, NeNMF recursively updates one factor matrix ($W$ or $H$) with the other fixed by solving a non-negative least squares (NNLS) problem. Taking the optimization of $H$ for example, the NNLS problem is

\[
\min_{H \ge 0} \frac{1}{2}\left\|\bar{V} - WH\right\|_F^2, \qquad (11)
\]

where $\bar{V}$ is defined as in (9). According to (7), the optimization of $\widetilde{W}$ can be done similarly by solving the following NNLS problem:

\[
\min_{\widetilde{W} \ge 0} \frac{1}{2}\left\|V - \widetilde{W}\begin{bmatrix} H \\ \vec{1}^T \end{bmatrix}\right\|_F^2. \qquad (12)
\]

Taking the optimization of (11) for example, let $f_H(H)$ denote $\frac{1}{2}\|\bar{V} - WH\|_F^2$. It is obvious that $f_H(H)$ is a convex function of type $C^{1,1}$, i.e., continuously differentiable with Lipschitz continuous gradient $\nabla f_H(\cdot)$, and the Lipschitz constant is $L(f_H) = \|W^T W\|$, where $\|\cdot\|$ denotes the matrix spectral norm. Guan et al. [3] applied the Nesterov optimal gradient method (OGM) to solve NNLS. In particular, at the $k$-th iteration, OGM records the two previous search points, i.e., $H_{k-1}$ and $H_k$, and constructs an auxiliary point $Y_k$ as a combination of the two recorded search points, i.e., $Y_k = H_k + \frac{\alpha_{k-1} - 1}{\alpha_k}(H_k - H_{k-1})$, where $\alpha_k = \frac{1 + \sqrt{1 + 4\alpha_{k-1}^2}}{2}$ and $\alpha_0 = 1$. OGM advances the search point from the constructed auxiliary point $Y_k$ and determines the step size by the Lipschitz constant of the gradient, i.e., $H_{k+1} = \left[Y_k - \frac{1}{L(f_H)}\nabla f_H(Y_k)\right]_+$. Here $Y_k$ is initialized by $H_0$, i.e., $Y_0 = H_0$, and $H_0$ is initialized by the previous value of $H$ for a "warm start".

Benefiting from the smartly constructed auxiliary point, the OGM algorithm converges at the rate of $O(1/k^2)$ for solving NNLS without any complex line search procedure [3]. Therefore, NeNMF is much faster than MUR.
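As a sketch of how OGM solves the NNLS subproblem (11), the following NumPy routine implements the accelerated projected-gradient loop described above; the stopping rule (a fixed iteration budget) is our simplification, since NeNMF's actual stopping criterion [3] is not spelled out here.

```python
import numpy as np

def ogm_nnls(Vbar, W, H0, n_iter=100):
    """Solve min_{H>=0} 0.5*||Vbar - W H||_F^2 with Nesterov's optimal gradient method."""
    WtW = W.T @ W
    WtV = W.T @ Vbar
    L = np.linalg.norm(WtW, 2)           # Lipschitz constant: spectral norm of W^T W
    H = np.maximum(H0, 0.0)              # warm start from the previous H
    Y = H.copy()
    alpha = 1.0
    for _ in range(n_iter):
        grad = WtW @ Y - WtV                                     # gradient at the auxiliary point
        H_new = np.maximum(Y - grad / L, 0.0)                    # projected gradient step
        alpha_new = (1.0 + np.sqrt(1.0 + 4.0 * alpha ** 2)) / 2.0
        Y = H_new + ((alpha - 1.0) / alpha_new) * (H_new - H)    # new auxiliary point
        H, alpha = H_new, alpha_new
    return H
```

The subproblem (12) for $\widetilde{W}$ can be handled by the same routine applied to the transposed problem, e.g., ogm_nnls(V.T, Ht.T, Wt.T).T with Ht the augmented coefficient matrix $[H; \vec{1}^T]$.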
Figure 1. Face examples of (a) YALE and (b) UMIST datasets.

IV. EXPERIMENTS

This section verifies the effectiveness of TNMF by face recognition on popular datasets including YALE [6] and UMIST [7]. Figure 1 shows some face examples from the YALE and UMIST datasets. In this experiment, we first divided each dataset into a training set and a test set, and then conducted TNMF, NMF (solved by NeNMF [3]), and AS-NMF [1] on the training set to learn a lower dimensional space. In the classification stage, we projected both the training and test sets onto the learned lower dimensional space, and classified each test sample by the Nearest Neighbor (NN) classifier. The accuracy is calculated as the percentage of test samples that are correctly classified.
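The paper does not spell out how samples are projected onto the learned space; a common choice, sketched below, is to solve a non-negative least squares fit against the learned basis W (here reusing the ogm_nnls routine from the previous sketch, which is our assumption) and then run a 1-nearest-neighbor search on the resulting coefficients.

```python
import numpy as np

def project(X, W, w0=None, n_iter=100):
    """Encode the columns of X in the learned space by an NNLS fit to the basis W."""
    if w0 is not None:                                   # TNMF/AS-NMF: remove the learned offset first
        X = X - w0 @ np.ones((1, X.shape[1]))
    H0 = np.zeros((W.shape[1], X.shape[1]))
    return ogm_nnls(X, W, H0, n_iter=n_iter)

def nn_classify(H_train, y_train, H_test):
    """1-NN over coefficient columns with Euclidean distance."""
    y_train = np.asarray(y_train)
    d2 = (np.sum(H_test ** 2, axis=0)[:, None]
          + np.sum(H_train ** 2, axis=0)[None, :]
          - 2.0 * H_test.T @ H_train)                    # squared distances, (n_test, n_train)
    return y_train[np.argmin(d2, axis=1)]
```

Accuracy is then the fraction of correctly predicted test labels, e.g., np.mean(nn_classify(H_train, y_train, H_test) == y_test).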
A. YALE Dataset

The YALE database [6] contains 165 face images collected from 15 individuals, with 11 images per individual showing varying facial expressions and configurations. For training, we randomly selected different numbers (three and five) of images per individual, and used the remaining images for testing. Each trial was independently repeated 10 times, and the average accuracies were reported for evaluation.

Figure 2. Average face recognition accuracy (%) and deviation versus reduced dimensionality on the YALE dataset when three (a) and five (b) images per individual were selected for training.

Figure 2 gives the average face recognition accuracies and standard deviations of TNMF, NMF and AS-NMF on the YALE dataset. Comparing the NMF and AS-NMF curves, we can see that NMF performs better than AS-NMF because it achieves a better solution with a lower objective value. TNMF outperforms NMF and improves significantly over AS-NMF.

B. UMIST Dataset

The UMIST database [7] consists of 564 face images taken from 20 people. The individuals are a mix of race, sex, and appearance, and are photographed in a range of poses from profile to frontal views. For each individual, different numbers (three and five) of images were randomly selected for training, and the remaining images were used for testing. Each trial was independently repeated 10 times, and the average accuracies were calculated as the final result.

Figure 3 gives the average face recognition accuracies and standard deviations of TNMF, NMF and AS-NMF on the UMIST dataset. It shows that TNMF significantly outperforms NMF because it deals with the offset in the dataset. TNMF performs slightly better than AS-NMF when three images from each individual were selected for training, which is consistent with the observation obtained from Figure 2. When the reduced dimensionality is greater than 70 in Figure 3(b), AS-NMF slightly outperforms TNMF because it takes advantage of the sparsity induced over the coefficients (see (2)). It is easy to extend TNMF to induce similar sparsity over the coefficients, but TNMF already shows great superiority because it guarantees the non-negativity of the factor matrices.

Figure 3. Average face recognition accuracy (%) and deviation versus reduced dimensionality on the UMIST dataset when three (a) and five (b) images per individual were selected for training.

C. Efficiency Study

To study the efficiency of the NeNMF-based algorithm for TNMF, we ran both the NeNMF-based algorithm and the MUR-based algorithm on the YALE and UMIST datasets. Both algorithms start from an identical initial point for a fair comparison. To study the scalability of the proposed algorithm, we chose the reduced dimensionality as 10 and 100. The efficiency is compared in terms of both the number of iterations and CPU seconds.
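A minimal timing harness for such a comparison might look as follows; it records the TNMF objective and the elapsed CPU time after every pass of a user-supplied update function, such as the tnmf_mur_step sketched earlier or an NeNMF-style pass built on ogm_nnls. The harness itself is our illustration, not part of the paper.

```python
import time
import numpy as np

def run_solver(step_fn, V, W, H, w0, n_iter=200):
    """Run one TNMF solver and record the objective value and CPU seconds per iteration.

    step_fn(V, W, H, w0) -> (W, H, w0) performs a single update pass."""
    n = V.shape[1]
    objs, times, t0 = [], [], time.process_time()
    for _ in range(n_iter):
        W, H, w0 = step_fn(V, W, H, w0)
        objs.append(0.5 * np.linalg.norm(V - W @ H - w0 @ np.ones((1, n)), "fro") ** 2)
        times.append(time.process_time() - t0)
    return np.array(objs), np.array(times)
```

Plotting the recorded objectives against the iteration index and against the CPU seconds yields the two views used in Figures 4 and 5.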
Figure 4. The objective values versus number of iterations (a), and versus CPU seconds (b) when the reduced dimensionality is 10, and the reduced dimensionality is 100 (c and d), on the YALE dataset.

Figure 4 compares the efficiency of the NeNMF-based algorithm and the MUR-based algorithm on the YALE dataset. It indicates that the NeNMF-based algorithm converges significantly faster than the MUR-based algorithm in terms of both the number of iterations and CPU seconds when the reduced dimensionality equals 10 and 100. Figure 5 gives the same observation on the UMIST dataset.

Figure 5. The objective values versus number of iterations (a), and versus CPU seconds (b) when the reduced dimensionality is 10, and the reduced dimensionality is 100 (c and d), on the UMIST dataset.

V. CONCLUSION

This paper improves the original NMF on datasets that contain offsets and presents a translation NMF (TNMF) method. Owing to the nice properties of the reformulated objective function, NeNMF can be naturally adopted to solve TNMF efficiently. Experimental results on two face image datasets show that TNMF is promising.

VI. ACKNOWLEDGEMENT

This work was partially supported by the Scientific Research Plan Project of NUDT (JC13-06-01) and the Research Fund for the Doctoral Program of Higher Education of China, SRFDP (under grant No. 20134307110017).

REFERENCES

[1] H. Laurberg and L. K. Hansen, "On Affine Non-negative Matrix Factorization," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 653-656, 2007.
[2] D. D. Lee and H. S. Seung, "Algorithms for Non-negative Matrix Factorization," in Proceedings of Advances in Neural Information Processing Systems, pp. 556-562, 2001.
[3] N. Guan, D. Tao, Z. Luo, and B. Yuan, "NeNMF: An Optimal Gradient Method for Non-negative Matrix Factorization," IEEE Transactions on Signal Processing, vol. 60, no. 6, pp. 2882-2898, 2012.
[4] N. Guan, D. Tao, Z. Luo, and J. Shawe-Taylor, "MahNMF: Manhattan Non-negative Matrix Factorization," arXiv:1207.3438v1, 2012.
[5] C. Ding, T. Li, and M. I. Jordan, "Convex and Semi-nonnegative Matrix Factorizations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 45-55, 2010.
[6] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces versus Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[7] D. B. Graham and N. M. Allinson, "Characterizing Virtual Eigensignatures for General Purpose Face Recognition," in Face Recognition: From Theory to Applications, H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang, eds., pp. 446-456, Springer, 1998.
[8] P. O. Hoyer, "Non-negative Matrix Factorization with Sparseness Constraints," Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.
[9] D. Cai, X. He, and J. Han, "Graph Regularized Non-negative Matrix Factorization for Data Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1548-1560, 2011.
[10] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Non-negative Patch Alignment Framework," IEEE Transactions on Neural Networks, vol. 22, no. 8, pp. 1218-1230, 2011.
[11] C. J. Lin, "Projected Gradient Methods for Nonnegative Matrix Factorization," Neural Computation, vol. 19, no. 10, pp. 2756-2779, 2007.
[12] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Online Non-negative Matrix Factorization with Robust Stochastic Approximation," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 7, pp. 1087-1099, 2012.
[13] N. Guan, D. Tao, Z. Luo, and B. Yuan, "Manifold Regularized Discriminative Non-negative Matrix Factorization with Fast Gradient Descent," IEEE Transactions on Image Processing, vol. 20, no. 7, pp. 2030-2048, 2011.
