Authorized licensed use limited to: Xiamen University. Downloaded on December 25,2020 at 07:38:17 UTC from IEEE Xplore. Restrictions apply.
When all weights are equal to 1, i.e., W_ij = 1 for i = 1, ..., m and j = 1, ..., n, (1) is identical to the standard NMF.

Throughout this paper, we consider a user-rating matrix X ∈ R^{m×n}, where the (i,j)-entry X_ij corresponds to the rating given by user i on item (movie) j. For example, the Netflix prize dataset contains 100,480,507 observed entries, corresponding to ratings given by 480,189 users on 17,770 movies; only about 1% of the matrix is filled with observed ratings. Matrix completion is a popular method to predict unobserved ratings, which motivated us to pay attention to the problem of WNMF, since user-rating matrices are incomplete nonnegative matrices. In the case of a user-rating matrix, the two factor matrices U and V determined by the minimization of (1) are interpreted as follows:

• Row i in X is user i's rating profile.
• Columns in V are associated with rating profiles from r user communities. In other words, V_ij corresponds to the rating given by user community j on item i.
• Row i in U is user i's affinities for the r user communities.

We briefly summarize two existing methods for WNMF. A direct extension of NMF which incorporates binary weights into the NMF multiplicative updates [11] is referred to as Mult-WNMF in this paper. The multiplicative updates for Mult-WNMF are

  U ← U ⊙ [(W ⊙ X) V] / [(W ⊙ U V^T) V],   (2)
  V ← V ⊙ [(W ⊙ X)^T U] / [(W ⊙ U V^T)^T U],   (3)

where ⊙ is the Hadamard product (element-wise product) and the division is performed in an element-wise manner as well.

An EM algorithm for WNMF was proposed by Zhang et al. [15]. The E-step corresponds to imputation, where a filled-in matrix Y is computed using the current model estimate; the standard NMF multiplicative updates are then applied to the filled-in matrix in the M-step.

Algorithm Outline: EM-WNMF [15]

• E-step
    Y ← W ⊙ X + (1_{m×n} − W) ⊙ (U V^T)   (4)
• M-step
    U ← U ⊙ (Y V) / (U V^T V),   (5)
    V ← V ⊙ (Y^T U) / (V U^T U)   (6)

In EM-WNMF, the weight matrix W is normalized such that all entries are in the range between zero and one (i.e., W ← W / max_{i,j} W_ij), and 1_{m×n} ∈ R^{m×n} is the matrix with all elements equal to one. It was shown in [15] that EM-WNMF outperforms Mult-WNMF in a collaborative prediction task. However, EM-WNMF and Mult-WNMF suffer from slow convergence, since they are based on multiplicative updates. Our empirical study indicates that the accuracy with which these methods predict missing values is not satisfactory.

3. ALGORITHMS

We present our algorithms for WNMF, exploiting alternating nonnegative least squares and generalized EM optimization.

3.1. ANLS-WNMF

Several fast algorithms for un-weighted NMF have recently been developed [14, 10, 7, 6], where alternating nonnegative least squares (ANLS) is used to solve the following two nonnegative least squares (NLS) problems in an alternating fashion:

  min_{U ≥ 0} ||X − U V^T||^2   and   min_{V ≥ 0} ||X − U V^T||^2,

with V and U fixed, respectively. With U fixed, the NLS problem is tackled by solving multiple right-hand-side nonnegative least squares (MRHS-NLS) problems

  min_{v^1 ≥ 0} ||x_1 − U v^1||^2, ..., min_{v^n ≥ 0} ||x_n − U v^n||^2,

where X = [x_1, ..., x_n] ∈ R^{m×n} and V^T = [v^1, ..., v^n], V = [v_1, ..., v_r] ∈ R^{n×r} (row i and column j in V are denoted by v^i and v_j, respectively; the same notation is applied to other matrices).

Various optimization methods can be used to solve the NLS problem, including the projected gradient method [3], the projected Newton method [2], and the active set method [8]. In the case of the MRHS-NLS problem, each problem can be solved separately, but this approach is inefficient and very slow. Note that we only have to compute the Hessian matrix one time, because the Hessian is equal to U^T U for every NLS problem. Exploiting this property, Bro and de Jong made a substantial speed improvement in the active set method [4]. Van Benthem and Keenan further improved it by re-arranging calculations such that pseudo-inverse computations of sub-matrices of the Hessian are minimized [1].

ANLS is a block coordinate descent method whose convergence to a stationary point is guaranteed when the optimal solution is determined for each block. The NMF problem is non-convex, but each sub-problem is convex, which is why ANLS works properly. ANLS can also be applied to WNMF, because each sub-problem is still convex. The sub-problem of WNMF is a collection of several NLS problems. For example,

  min_{V ≥ 0} (1/2) Σ_{i,j} W_ij (X_ij − [U V^T]_ij)^2
    ⇒ min_{v^1 ≥ 0} (1/2) Σ_{i=1}^{m} w_{i1} (x_{i1} − [U v^1]_i)^2, ..., min_{v^n ≥ 0} (1/2) Σ_{i=1}^{m} w_{in} (x_{in} − [U v^n]_i)^2
    ⇒ min_{v^1 ≥ 0} (1/2) ||(D_1 x_1) − (D_1 U) v^1||^2, ..., min_{v^n ≥ 0} (1/2) ||(D_n x_n) − (D_n U) v^n||^2,   (7)

where D_j = diag(w_j)^{1/2} is the diagonal matrix whose diagonal entries are the square roots of the entries of w_j (column j of W). We also define D^i = diag(w^i)^{1/2} for row i of W.

Unfortunately, we cannot use Van Benthem and Keenan's algorithm [1], because the sub-problem (7) is not an MRHS-NLS problem (i.e., the Hessian matrix U^T diag(w_j) U is different for each NLS problem).
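The column-wise decomposition above can be sketched in a few lines. Below is a minimal illustration in Python with NumPy/SciPy (not the paper's Matlab implementation, which uses a projected Newton solver): each row of V is obtained from a separate nonnegative least squares problem after scaling column j of X and U by D_j = diag(w_j)^{1/2}. The problem sizes, the synthetic data, and the choice of `scipy.optimize.nnls` as the per-column solver are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy sizes (not the paper's datasets).
rng = np.random.default_rng(0)
m, n, r = 8, 5, 3

U = rng.random((m, r))                          # current factor, held fixed
X = U @ rng.random((r, n))                      # synthetic ratings admitting an exact fit
W = (rng.random((m, n)) < 0.7).astype(float)    # binary weights: 1 = observed entry

# V-update of ANLS-WNMF: the weighted problem splits into one NLS problem per
# column of X,  min_{v >= 0} ||D_j x_j - (D_j U) v||^2  with D_j = diag(w_j)^(1/2).
# The Hessians U^T diag(w_j) U differ across j, so each column is solved separately.
V = np.zeros((n, r))
for j in range(n):
    d = np.sqrt(W[:, j])                        # diagonal of D_j
    V[j], _ = nnls(d[:, None] * U, d * X[:, j])

# The weighted residual penalizes only the observed (W_ij = 1) entries.
res = np.sqrt(np.sum(W * (X - U @ V.T) ** 2))
```

Because D_j rescales both sides, the entries with w_ij = 0 drop out of the j-th objective entirely; only the observed entries of column j influence the recovered row v^j.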
Thus, we have to solve each NLS problem separately. In our implementation, we use the projected Newton method [2] for a single NLS problem.

Algorithm Outline: ANLS-WNMF

Initialize U(1) ≥ 0 and V(1) ≥ 0.
For t = 1, 2, ...
• Update U row-by-row. For i = 1, 2, ..., m
    u^i(t+1) ← arg min_{u ≥ 0} (1/2) ||D^i x^i − D^i V(t) u||^2,
  where X^T = [x^1, ..., x^m] and W^T = [w^1, ..., w^m].
• Update V row-by-row. For j = 1, 2, ..., n
    v^j(t+1) ← arg min_{v ≥ 0} (1/2) ||D_j x_j − D_j U(t+1) v||^2,
  where X = [x_1, ..., x_n] and W = [w_1, ..., w_n].

3.2. GEM-WNMF

Expectation-maximization (EM) is one of the powerful methods for handling missing values in maximum likelihood estimation [5]. An EM algorithm for WNMF, proposed in [15], is described in Sec. 2, where the E-step (4) performs imputation to determine Y, and the M-step (5) and (6) re-estimates the factor matrices by applying unweighted NMF to the filled-in matrix Y.

The filled-in matrix Y is, in general, a dense matrix even if X is a very sparse matrix, which prohibits the E-step of EM-WNMF for large-scale data: the memory space required to store Y and the computational complexity of calculating U V^T are extremely large. Moreover, the M-step requires much more computation time.

We overcome this problem by interweaving the E-step and a partial M-step. In contrast to EM-WNMF, where multiplicative updates are used in the M-step, we employ ANLS optimization for the M-step, using Van Benthem and Keenan's algorithm [1] with the projected Newton method to solve MRHS-NLS:

  U ← MRHS-NLS(V^T V, Y V),   (8)
  V ← MRHS-NLS(U^T U, Y^T U),   (9)

where the two input arguments of MRHS-NLS(·, ·) are the terms required to compute the gradient and the Hessian. For example, ∇_U = U V^T V − Y V and ∇²_{u^1} = ... = ∇²_{u^m} = V^T V. It follows from (8) and (9) that only Y V or Y^T U is required in the M-step, instead of Y itself. This simple trick dramatically alleviates the computation and space burden. For example, the calculation of Y V is given by

  Y V = [W ⊙ X + (1_{m×n} − W) ⊙ (U V^T)] V
      = [W ⊙ (X − U V^T)] V + U (V^T V),

which requires [U V^T]_ij only where W_ij > 0, instead of the entire U V^T.

In addition to this simple trick, we use a partial M-step. At earlier iterations, the estimates of the missing values are not accurate, so solving the M-step exactly is not desirable. Thus, the iterations in the M-step (8) and (9) proceed just until substantial improvement is obtained, instead of determining optimal solutions.

Algorithm Outline: GEM-WNMF

Initialize U(1) ≥ 0 and V(1) ≥ 0.
For t = 1, 2, ...
    E ← W ⊙ (X − U(t) V(t)^T),
    U(t+1) ← MRHS-NLS(V(t)^T V(t), E V(t) + U(t) V(t)^T V(t)),
    E ← W ⊙ (X − U(t+1) V(t)^T),
    V(t+1) ← MRHS-NLS(U(t+1)^T U(t+1), E^T U(t+1) + V(t) U(t+1)^T U(t+1)).

4. NUMERICAL EXPERIMENTS

We evaluate the performance of three algorithms (Mult-WNMF, ANLS-WNMF, GEM-WNMF) in terms of convergence speed and the accuracy of predicting missing values. All algorithms were implemented in Matlab, and all experiments were run on an Intel Core2 Quad 2.4 GHz processor with 8 GB memory under Windows Vista 64-bit. We use the MovieLens and Netflix prize datasets for our experiments, which are summarized in Table 1.

Table 1. Dataset descriptions.

                 MovieLens     Netflix prize
  # of users     6,040         480,189
  # of movies    3,952         17,770
  # of ratings   1,000,209     100,480,507
  density (%)    4.19          1.18

In order to get a fair comparison, we design our experiment in the following way for each test case:

1. Always use the same randomly generated starting point for every algorithm.
2. Run the Mult-WNMF algorithm for a given number of iterations, and record the CPU time used.
3. Then run the other algorithms, and stop them once the CPU time used is equal to or greater than that used by the Mult-WNMF algorithm.

We note that each algorithm requires a different computation time for one iteration. Steps 2 and 3 of our experiment design ensure that a fair comparison is carried out, so that all algorithms are run for approximately the same amount of time in each test case.

We set the iteration number of Mult-WNMF to 600. With this setting, the training time was about 500 seconds for the MovieLens data and 24 hours for the Netflix prize data. For the MovieLens data, we randomly split the user-ratings into 5 folds and used 4 partitions for training. For the Netflix prize data, we held out the 1,408,395 user-ratings belonging to the probe set as test data. Note that we added a regularization term (λ/2)(||U||² + ||V||²) to the objective function (1) to prevent over-fitting, with λ = 3 in all our experiments.

Fig. 1 shows the root mean squared error (RMSE) values against time. Our algorithms, especially ANLS-WNMF, are faster
[...] fit better for large-scale data.

[Fig. 1. Root mean squared error (RMSE) versus computation time for Mult-WNMF, ANLS-WNMF, and GEM-WNMF.]

Acknowledgments: This work was supported by the Korea Research Foundation (Grant KRF-2008-313-D00939) and the KOSEF WCU Program (Project No. R31-2008-000-10100-0).