Implementing Neural Graph Collaborative Filtering in PyTorch

Yusuf Noor · 10 min read · Apr 21, 2020

[Header image: Neural Graph Collaborative Filtering, Group 61]

Neural Graph Collaborative Filtering (NGCF) is a Deep Learning recommendation algorithm developed by Wang et al. (2019), which exploits the user-item graph structure by propagating embeddings on it. The goal of this article is to reproduce the results of the paper. We do this by introducing a new code variant, written in PyTorch. To test its generalization, we also run tests on a new data set, namely the MovieLens ML-100k data set. Finally, we do a hyper-parameter sensitivity check of the algorithm on this new data set.

The full code is available at our repository. For more reproductions on this paper and several other interesting papers in the Deep Learning field, we refer you to [link].

Authors: Mohammed Yusuf Noor (4445406), Muhammed Imran Ozyar (4458508), Calin Vasile Simon (4969324)

Background Information

Collaborative Filtering (CF) is a method for recommender systems based on information regarding users, items and their connections. Recommendations are made by looking at the neighbors of the user at hand and their interests. Since the neighbors are similar to the user, the assumption is made that they share the same interests.

[Figure: Example of a user-item matrix in collaborative filtering]

Graph Neural Networks (GNN) are graphs in which each node is represented by a recurrent unit, and each edge is a neural network. In an iteration, each recurrent unit (node) passes a message to all its neighbors, which receive it after it is propagated through the neural network (edge).
[Figure: Graph neural networks: Variations and applications (from video on GNNs)]

Neural Graph Collaborative Filtering

The user-item matrix in CF is a sparse matrix containing information about the connections between users and items in the data. The matrix represents a bipartite graph, and Wang et al. proposed exploiting this graph structure with a GNN. In their paper, Wang et al. present the Neural Graph Collaborative Filtering algorithm (NGCF), which is a GNN used for CF by propagating the user and item embeddings over the user-item graph, capturing connectivities between users and their neighbors. In the pictures below we can see how the user-item matrix can be represented as a bipartite graph, and we see the high-order connectivity for a user we need to make recommendations for.

[Figure: Representation of the user-item matrix as a bipartite graph]

[Figure: High-order connectivity for user 1]

To show the importance of high-order connectivity, let us look at two paths in the graph shown in the figure above. The first one is u1 ← i2 ← u2, which is a 2nd-order connectivity, and the second one is u1 ← i2 ← u2 ← i3, which is a 3rd-order connectivity. The 3rd-order connectivity captures the fact that user 1 might like item 3, since user 1 and user 2 both like item 2 and user 2 likes item 3. This information is not captured in the 2nd-order and 1st-order connectivity.

Embedding Layer

The initial user and item embeddings are concatenated in an embedding lookup table as shown in the figure below.
This embedding table is initialized using the user and item embeddings and will be optimized in an end-to-end fashion by the network.

$$E = [\underbrace{e_{u_1}, \dots, e_{u_N}}_{\text{user embeddings}},\ \underbrace{e_{i_1}, \dots, e_{i_M}}_{\text{item embeddings}}]$$

The initial embedding table (Eq. 1 from the paper)

Embedding propagation

The embedding table is propagated through the network using the formula shown below.

$$E^{(l)} = \mathrm{LeakyReLU}\left((\mathcal{L} + I)\,E^{(l-1)} W_1^{(l)} + \mathcal{L} E^{(l-1)} \odot E^{(l-1)} W_2^{(l)}\right)$$

Matrix form of the layer-wise propagation formula (Eq. 7 from the paper)

The components of the formula are as follows:

- $E^{(l)}$: the embedding table after $l$ steps of embedding propagation, where $E^{(0)}$ is the initial embedding table,
- LeakyReLU: the leaky rectified linear unit, used as the activation function,
- $W$: the weights trained by the network,
- $I$: an identity matrix,
- $\mathcal{L}$: the Laplacian matrix for the user-item graph, which is formulated as

$$\mathcal{L} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}} \quad \text{and} \quad A = \begin{bmatrix} 0 & R \\ R^\top & 0 \end{bmatrix}$$

Laplacian matrix (Eq. 8 from the paper)

The components of the Laplacian matrix are as follows:

- $D$: a diagonal degree matrix, where $D_{tt}$ is $|N_t|$, the number of first-hop neighbors of item or user $t$,
- $R$: the user-item interaction matrix,
- $0$: an all-zero matrix,
- $A$: the adjacency matrix.

Implementation

Luckily, the authors of the NGCF paper made their code, using the TensorFlow library in Python, publicly available. We adhered mostly to their structure and used some parts of their code. In the following subsections, we implement and train the NGCF model in Python using the PyTorch library (version 1.4.0). We will highlight some sections of the code that differ from the original TensorFlow implementation.

Interaction and Adjacency Matrix

Just like in the original code, we create the sparse interaction matrix R, the adjacency matrix A, the degree matrix D, and the Laplacian matrix L, using the SciPy library.
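As a rough illustration of the matrices defined above, the following sketch builds A, D and L for a toy interaction matrix. It uses dense NumPy arrays purely for readability, whereas the article's code uses SciPy sparse matrices. The toy matrix `R` and the helper name `build_laplacian` are assumptions made for this example and do not come from the repository.

```python
import numpy as np


def build_laplacian(R):
    """Return (A, D, L) for a dense user-item interaction matrix R.

    Hypothetical helper for illustration only; not taken from the
    article's repository.
    """
    n_users, n_items = R.shape
    n = n_users + n_items

    # A = [[0, R], [R^T, 0]]: users and items share one node index space.
    A = np.zeros((n, n), dtype=np.float64)
    A[:n_users, n_users:] = R
    A[n_users:, :n_users] = R.T

    # D_tt = |N_t|, the number of first-hop neighbours of node t.
    degrees = A.sum(axis=1)
    D = np.diag(degrees)

    # D^{-1/2}, leaving isolated nodes (degree 0) at zero.
    d_inv_sqrt = np.zeros_like(degrees)
    nonzero = degrees > 0
    d_inv_sqrt[nonzero] = degrees[nonzero] ** -0.5

    # L = D^{-1/2} A D^{-1/2}, computed by scaling rows and columns of A.
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return A, D, L


# Toy data (assumed): 2 users, 3 items.
R = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.float64)
A, D, L = build_laplacian(R)
```

In this toy graph, user 0 and item 1 both have two neighbours, so the entry of L linking them is 1/sqrt(2·2). Because the normalisation is symmetric, L equals its own transpose.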
The adjacency matrix A is then transferred onto PyTorch tensor objects.

```python
import numpy as np
import torch

def _convert_sp_mat_to_sp_tensor(self, X):
    """
    Convert scipy sparse matrix to PyTorch sparse matrix

    Arguments:
    ----------
    X = Adjacency matrix, scipy sparse matrix
    """
    coo = X.tocoo().astype(np.float32)
    # indices (2 x nnz) and values (nnz) of the non-zero entries
    i = torch.LongTensor(np.mat([coo.row, coo.col]))
    v = torch.FloatTensor(coo.data)
    # `device` is defined elsewhere in the module
    res = torch.sparse.FloatTensor(i, v, coo.shape).to(device)
    return res
```

Weight initialization

We then create tensors for the user embeddings and item embeddings with the proper dimensions. The weights are initialized using Xavier uniform initialization. For each layer, the weight matrices and corresponding biases are initialized using the same procedure.

```python
import torch
import torch.nn as nn

def _init_weights(self):
    print("Initializing weights...")
    weight_dict = nn.ParameterDict()

    initializer = torch.nn.init.xavier_uniform_

    # The right-hand sides below are cut off in the source listing; the
    # shapes are completed from emb_dim and weight_size_list.
    weight_dict['user_embedding'] = nn.Parameter(initializer(torch.empty(self.n_users, self.emb_dim)))
    weight_dict['item_embedding'] = nn.Parameter(initializer(torch.empty(self.n_items, self.emb_dim)))

    weight_size_list = [self.emb_dim] + self.layers

    for k in range(self.n_layers):
        weight_dict['W_gc_%d' % k] = nn.Parameter(initializer(torch.empty(weight_size_list[k], weight_size_list[k+1])))
        weight_dict['b_gc_%d' % k] = nn.Parameter(initializer(torch.empty(1, weight_size_list[k+1])))

        weight_dict['W_bi_%d' % k] = nn.Parameter(initializer(torch.empty(weight_size_list[k], weight_size_list[k+1])))
        weight_dict['b_bi_%d' % k] = nn.Parameter(initializer(torch.empty(1, weight_size_list[k+1])))

    return weight_dict
```

Training

Training is done using the standard PyTorch method.
If you are already familiar with PyTorch, the following code should look familiar.

```python
def train(model, data_generator, optimizer):
    """
    Train the model PyTorch style

    Arguments:
    ---------
    model: PyTorch model
    data_generator: Data object
    optimizer: PyTorch optimizer
    """
    model.train()
    n_batch = data_generator.n_train // data_generator.batch_size + 1
    running_loss = 0
    for _ in range(n_batch):
        # u: users, i: positive items, j: negative items
        u, i, j = data_generator.sample()
        optimizer.zero_grad()
        loss = model(u, i, j)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss
```

One of the most useful functions of PyTorch is torch.nn.Sequential(), which takes existing and custom torch.nn modules. This makes it very easy to build and train complete networks. However, due to the nature of the NGCF model structure, usage of torch.nn.Sequential() is not possible and the forward pass of the network has to be implemented 'manually'. Using the BPR loss, the forward pass is implemented as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def forward(self, u, i, j):
    """
    Computes the forward pass

    Arguments:
    ---------
    u = user
    i = positive item (user interacted with item)
    j = negative item (user did not interact with item)
    """
    # apply drop-out mask
    if self.node_dropout > 0.:
        self.A_hat = self._droupout_sparse(self.A)

    ego_embeddings = torch.cat([self.weight_dict['user_embedding'], self.weight_dict['item_embedding']], 0)
    all_embeddings = [ego_embeddings]

    # forward pass for propagation layers
    for k in range(self.n_layers):

        # weighted sum messages of neighbours
        if self.node_dropout > 0.:
            side_embeddings = torch.sparse.mm(self.A_hat, ego_embeddings)
        else:
            side_embeddings = torch.sparse.mm(self.A, ego_embeddings)

        # transformed sum weighted sum messages of neighbours
        sum_embeddings = torch.matmul(side_embeddings, self.weight_dict['W_gc_%d' % k]) + self.weight_dict['b_gc_%d' % k]
        sum_embeddings = F.leaky_relu(sum_embeddings)

        # bi messages of neighbours
        bi_embeddings = torch.mul(ego_embeddings, side_embeddings)
        # transformed bi messages of neighbours
        bi_embeddings = torch.matmul(bi_embeddings, self.weight_dict['W_bi_%d' % k]) + self.weight_dict['b_bi_%d' % k]
        bi_embeddings = F.leaky_relu(bi_embeddings)

        # non-linear activation
        ego_embeddings = sum_embeddings + bi_embeddings
        # + message dropout
        mess_dropout_mask = nn.Dropout(self.mess_dropout)
        ego_embeddings = mess_dropout_mask(ego_embeddings)

        # normalize activation (this statement is missing from the source
        # listing; row-wise L2 normalisation is assumed here)
        norm_embeddings = F.normalize(ego_embeddings, p=2, dim=1)

        all_embeddings.append(norm_embeddings)

    all_embeddings = torch.cat(all_embeddings, 1)

    # back to user/item dimension
    u_g_embeddings, i_g_embeddings = all_embeddings.split([self.n_users, self.n_items], 0)

    self.u_g_embeddings = nn.Parameter(u_g_embeddings)
    self.i_g_embeddings = nn.Parameter(i_g_embeddings)

    u_emb = u_g_embeddings[u]  # user embeddings
    p_emb = i_g_embeddings[i]  # positive item embeddings
    n_emb = i_g_embeddings[j]  # negative item embeddings

    y_ui = torch.mul(u_emb, p_emb).sum(dim=1)
    y_uj = torch.mul(u_emb, n_emb).sum(dim=1)
    log_prob = (torch.log(torch.sigmoid(y_ui - y_uj))).mean()

    # compute bpr-loss
    bpr_loss = -log_prob
    if self.reg > 0.:
        l2norm = (torch.sum(u_emb**2)/2. + torch.sum(p_emb**2)/2. + torch.sum(n_emb**2)/2.)
        l2reg = self.reg * l2norm
        bpr_loss = -log_prob + l2reg

    return bpr_loss
```

Model evaluation

At every epoch, the model is evaluated on the test set. From this evaluation, we compute the recall and normalized discounted cumulative gain (ndcg) at the top-20 predictions.
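To make the two metrics concrete, here is a minimal per-user sketch in plain Python. It assumes binary relevance and a common log2-discount definition of NDCG. The article's own `compute_ndcg_k` is not shown, so its details may differ, and the function names and toy ranking below are hypothetical.

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant (test) items that appear in the top-k ranking."""
    if not relevant_items:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k with binary relevance (1 if the item is in the test set)."""
    # Discounted gain of the hits at their actual positions.
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    # Best achievable DCG: all relevant items ranked first.
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example (assumed data): item ids ranked by predicted score.
ranked = [7, 3, 9, 1, 4]
relevant = {3, 4, 8}
recall = recall_at_k(ranked, relevant, k=5)
ndcg = ndcg_at_k(ranked, relevant, k=5)
```

Here two of the three test items appear in the top 5, so the recall is 2/3. The NDCG is lower than 1 because the hits sit at positions 2 and 5 rather than at the top of the list.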
It is important to note that in order to evaluate the model on the test set we have to 'unpack' the sparse matrix (using `todense()`), and thus load a large number of zeros into memory. To prevent memory overload, we split the sparse matrices into 100 chunks, unpack the sparse chunks one by one, compute the metrics we need, and compute the mean value over all chunks.

```python
def eval_model(u_emb, i_emb, Rtr, Rte, k):
    """
    Evaluate the model

    Arguments:
    ---------
    u_emb: User embeddings
    i_emb: Item embeddings
    Rtr: Sparse matrix with the training interactions
    Rte: Sparse matrix with the testing interactions
    k : kth-order for metrics

    Returns:
    --------
    result: Dictionary with lists corresponding to the metrics at order k for k in ks
    """
    # split matrices
    ue_splits = split_matrix(u_emb)
    tr_splits = split_matrix(Rtr)
    te_splits = split_matrix(Rte)

    recall_k, ndcg_k = [], []
    # compute results for split matrices
    for ue_f, tr_f, te_f in zip(ue_splits, tr_splits, te_splits):

        scores = torch.mm(ue_f, i_emb.t())

        test_items = torch.from_numpy(te_f.todense()).float().cuda()
        # mask out items the user already interacted with during training
        non_train_items = torch.from_numpy(1 - (tr_f.todense())).float().cuda()
        scores = scores * non_train_items

        _, test_indices = torch.topk(scores, dim=1, k=k)
        # `value=1.0` replaces the scalar `src` tensor of the source listing,
        # which recent PyTorch versions reject
        pred_items = torch.zeros_like(scores).float()
        pred_items.scatter_(dim=1, index=test_indices, value=1.0)

        topk_preds = torch.zeros_like(scores).float()
        topk_preds.scatter_(dim=1, index=test_indices[:, :k], value=1.0)

        TP = (test_items * topk_preds).sum(1)
        rec = TP / test_items.sum(1)
        ndcg = compute_ndcg_k(pred_items, test_items, test_indices, k)

        recall_k.append(rec)
        ndcg_k.append(ndcg)

    # The return statement is garbled in the source listing; the mean over
    # all chunks is assumed, following the description above.
    return torch.cat(recall_k).mean(), torch.cat(ndcg_k).mean()
```

Early stopping

The authors of the NGCF paper used an early stopping strategy. In their paper, they state that premature stopping is applied if recall@20 on the test set does not increase for 50 successive epochs. However, if we take a closer look at their early stopping function (which we also used for our implementation), we notice that early stopping is performed when recall@20 on the test set does not increase for 5 successive epochs.

```python
def early_stopping(log_value, best_value, stopping_step, flag_step, expected_order='asc'):
    """
    Check if early_stopping is needed
    Function copied from original code
    """
    assert expected_order in ['asc', 'des']
    if (expected_order == 'asc' and log_value >= best_value) or (expected_order == 'des' and log_value <= best_value):
        # improvement: reset the counter and remember the new best value
        stopping_step = 0
        best_value = log_value
    else:
        stopping_step += 1

    if stopping_step >= flag_step:
        print("Early stopping at step: {} log:{}".format(flag_step, log_value))
        should_stop = True
    else:
        should_stop = False

    return best_value, stopping_step, should_stop
```

MovieLens 100K data set

The MovieLens 100K data set consists of 100,000 ratings from 1000 users on 1700 movies, as described on their website.

Before using the data set, we convert the user-item rating matrix to a user-item interaction matrix by replacing all ratings with 1 and all non-rated entries with 0. We run both the model provided by the authors of the paper and our model on this data set to compare the metrics. We also use this data set for the hyper-parameter sensitivity experiment described in the following section, since it is smaller in size and, therefore, allows for faster runs.

Hyper-parameter sensitivity

To test the sensitivity of the algorithm to the tuning of its hyper-parameters, we perform a hyper-parameter sensitivity check by running several tests using different values for the hyper-parameters as follows:

- batch_size: 128, 256, 512, 1024
- node_dropout: 0.0, 0.1
- message_dropout: 0.0, 0.1
- learning_rate: 0.0001, 0.0005, 0.001, 0.005

Whenever we take the results of a run, there are two cases we can encounter. The first is the completion of all 400 epochs, meaning early stopping was not activated. In this case, we take the final values as our measure of the metrics. The second case is when fewer than 400 epochs were run, which means the last 5 consecutive evaluation values were decreasing. In this case, we take the 5th item from the end as our measure of the metrics.
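The setup above can be sketched in a few lines of plain Python. The sketch assumes that every combination of the listed values is run (the text only says "several tests") and that each run yields one metric value per epoch. The names `grid`, `configs` and `reported_value` are hypothetical and not taken from the repository.

```python
from itertools import product

# Hyper-parameter values listed above.
grid = {
    "batch_size": [128, 256, 512, 1024],
    "node_dropout": [0.0, 0.1],
    "message_dropout": [0.0, 0.1],
    "learning_rate": [0.0001, 0.0005, 0.001, 0.005],
}

# One dict per combination, assuming a full grid is evaluated.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]


def reported_value(history, max_epochs=400, offset=5):
    """Pick the value to report from a per-epoch metric history.

    If the run completed all epochs, the final value is used; otherwise
    early stopping ended the run and the `offset`-th value from the end
    is used, as described in the text.
    """
    if len(history) >= max_epochs:
        return history[-1]
    return history[-offset]


full_run = [0.30] * 400
stopped_run = [0.30, 0.33, 0.32, 0.31, 0.31, 0.30, 0.29]
final_metric = reported_value(full_run)
early_metric = reported_value(stopped_run)
```

A full grid over these values contains 4 × 2 × 2 × 4 = 64 configurations, which is one reason the smaller MovieLens 100K data set is the more practical choice for this experiment.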
The metrics we capture in this test are the recall@20, BPR-loss, ndcg@20, total training time, and training time per epoch.

Results

Gowalla data set

In order to check if our PyTorch implementation produces results similar to those in Table 3 of the original paper, we perform NGCF on the Gowalla data set with the same hyper-parameter setting as the authors used:

- Number of propagation layers: 1, 2, 3 and 4
- Embedding size: 64
- Propagation layer size: 64
- Node dropout: 0.0
- Message dropout: 0.1
- L2 regularizer: 1e-5
- Learning rate: 0.0001

A comparison of the results is given in the table below.

| Model | Paper (TensorFlow) recall | Paper (TensorFlow) ndcg | Ours (PyTorch) recall | Ours (PyTorch) ndcg |
|---|---|---|---|---|
| NGCF-1 | 0.1511 | 0.2218 | 0.1404 | 0.2841 |
| NGCF-2 | 0.1535 | 0.2238 | 0.1473 | 0.2934 |
| NGCF-3 | 0.1547 | 0.2237 | 0.1490 | 0.2944 |
| NGCF-4 | 0.1560 | 0.2240 | 0.1514* | 0.2978 |

* This value is garbled in the source ("o.si4"); 0.1514 is the most plausible reading.

Comparison of the results of different NGCF models on the Gowalla data set between TensorFlow and PyTorch implementations.

MovieLens 100k data set

| Model | Paper implementation (TensorFlow) recall | Paper implementation (TensorFlow) ndcg | Our implementation (PyTorch) recall | Our implementation (PyTorch) ndcg |
|---|---|---|---|---|
| NGCF-3 | 0.3380 | 0.4437 | 0.3304 | 0.6503 |

Comparison of the results of the 3-layered NGCF models on the MovieLens 100k data set between TensorFlow and PyTorch implementations.

Hyper-parameter sensitivity

In our hyper-parameter sensitivity experiment, we plotted all the tuned hyper-parameters against the metrics, and the interesting plots are shown below:

[Figure: plots of the tuned hyper-parameters against the metrics]
It seems that increasing the batch size reduces the loss, total training time, and training time per epoch. Increasing the learning rate causes an overall increase in recall@20 and ndcg@20 while decreasing the BPR-loss. The best values for the hyper-parameters to maximize the recall@20 turned out to be:

- Batch size: 512
- Node dropout: 0.0
- Message dropout: 0.0
- Learning rate: 0.0005

While the best hyper-parameters to minimize the BPR-loss are:

- Batch size: 1024
- Node dropout: 0.0
- Message dropout: 0.0
- Learning rate: 0.005

Discussion

Metric choice

As we can see in the results of the hyper-parameter sensitivity check, the suitable hyper-parameters depend on which metric you want to maximize or minimize for the problem at hand. Since we are seeking to maximize the recall@20, we choose a smaller learning rate and a batch size of 512.

TensorFlow vs. PyTorch

One important difference between TensorFlow and PyTorch is that TensorFlow is a static framework and PyTorch is a dynamic framework. This means that in our PyTorch implementation we have to build the graph for all users and items every time we do the forward pass, while in TensorFlow the graph is built once. We assume that this makes the TensorFlow implementation faster than our implementation.

Concerns

There are some concerns that we have to address concerning the correctness of the original implementation of the paper. First off, we want to address the usage of terms in their paper and the implementation. When they construct the Laplacian matrix, using $\mathcal{L} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, they do not mention where this implementation comes from.
We also could not find any references to this matrix in the works they mentioned that they were inspired by. On another note, the authors use the terms 'Laplacian' and 'adjacency matrix' interchangeably, both in their paper and in their original implementation in TensorFlow, which confuses the reader.

Furthermore, in their implementation of the data loader, they implement the 'Laplacian' as $\mathcal{L} = D^{-1} A$, which is not equivalent to the aforementioned formula. They also mention that their default choice for the adjacency matrix is the 'NGCF' option, while in their code the default option is 'norm', which is $\mathcal{L} = D^{-1} A + I$ and is not mentioned in their paper.

This brings us to our next point. In their formula for the embedding matrix E, they have a matrix multiplication involving both $\mathcal{L}$ and $\mathcal{L} + I$. In their implementation, however, they only make direct use of the $\mathcal{L}$ matrix, so the $\mathcal{L} + I$ term is ignored.

Finally, we want to address the usage of Leaky ReLU. In their original implementation, they apply Leaky ReLU to both the side embeddings and the bi-embeddings and take their sum to acquire the ego embeddings (matrix E). This contradicts their formula for the ego embeddings, which applies the Leaky ReLU activation function only once, to the sum of the side embeddings and the bi-embeddings. If we let f(x) be the Leaky ReLU function, then we can easily see that f(a + b) ≠ f(a) + f(b): when a < 0, b > 0 and a + b > 0, the two expressions give different outcomes.

Assuming that the authors used the given implementation to obtain their results, we become concerned with the actual reproducibility of their paper, since their results may not be representative of their model.
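Two of the points above are easy to check numerically. The sketch below uses plain Python and assumed values: a negative slope of 0.01 (PyTorch's default for Leaky ReLU; the inequality holds for any slope other than 1) and a single edge between a node of degree 2 and a node of degree 1. The function names are hypothetical.

```python
import math


def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU with an assumed slope of 0.01 for negative inputs."""
    return x if x >= 0 else negative_slope * x


# Case from the text: a < 0, b > 0 and a + b > 0.
a, b = -1.0, 2.0
single = leaky_relu(a + b)                # one activation over the sum
separate = leaky_relu(a) + leaky_relu(b)  # activation applied per term


def normalised_weight(deg_i, deg_j, mode):
    """Weight of an existing edge (i, j) under two normalisations."""
    if mode == "sym":  # entry of D^{-1/2} A D^{-1/2}
        return 1.0 / math.sqrt(deg_i * deg_j)
    if mode == "rw":   # entry of D^{-1} A
        return 1.0 / deg_i
    raise ValueError(mode)


# Edge between a node with 2 neighbours and a node with 1 neighbour.
sym = normalised_weight(2, 1, "sym")
rw = normalised_weight(2, 1, "rw")
```

With these inputs, `single` is 1.0 while `separate` is 1.99, so applying the activation per term changes the result. The edge weight is 1/√2 under the symmetric normalisation and 1/2 under $D^{-1}A$. The two normalisations agree only when both endpoints of an edge have the same degree.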
We took the liberty to correct these errors, and have run the resulting model on the Gowalla data set. Before the correction, the authors of the paper had acquired a recall@20 of 0.1511 and our PyTorch implementation yielded a recall@20 of 0.1404. With the corrected implementation in PyTorch, we acquired a recall@20 score of 0.1366, using the same hyper-parameters.