Representation Learning
Lin Shu, Chuan Chen* and Zibin Zheng
Sun Yat-sen University
matrix as the semantic adjacency matrix to achieve the subsequent graph encoding procedure to generate node embeddings. However, on the one hand, graph motifs can only capture underlying semantics from the perspective of structural information, ignoring the role that node features play in semantic similarity. On the other hand, there exists a large number of similar node pairs in the co-occurrence matrix, so it is unscalable and inflexible to regard all similar node pairs as neighbor pairs. As a consequence, we propose to exploit both the feature similarity and the motif co-occurrence matrix to select top-K neighbors for each node. Concretely, given the node features $X \in \mathbb{R}^{n \times d_f}$, where $d_f$ denotes the dimension of the initial features, we first calculate the feature similarity matrix $C$ through the widely used cosine similarity, which is formulated as:

$$C_{ij} = \frac{X_i \cdot X_j}{\|X_i\| \, \|X_j\|}. \quad (2)$$

4.3 Semantic-wise Graph Encoder

After generating two graph views based on the holistic graph and semantic graphs, we employ graph neural networks (GNNs) to learn node embeddings on each view. Since the $T + 1$ graphs of each view depict distinct aspects of the input graph, encoding all graphs with the same GNN would result in the loss of the unique information of each graph. Therefore, we propose to learn a separate GNN for each graph of each view; these are collectively referred to as the semantic-wise graph encoder. More specifically, for each graph $(\tilde{X}^m, \tilde{A}_i^m)$, $i \in \{0, ..., T\}$, where $m$ denotes the $m$-th augmented graph view, we utilize graph convolutional layers [10] to learn node embeddings $Z_i^m$, which are updated by aggregating neighbors' embeddings. The propagation of each layer can be represented as:

$$Z_i^m = \sigma\left(\tilde{D}_{i,m}^{-1/2} \tilde{A}_i^m \tilde{D}_{i,m}^{-1/2} Z_i^m W_i^m\right), \quad (7)$$
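The neighbor-selection step of Eq. (2) and the propagation rule of Eq. (7) can be sketched as follows. This is a minimal NumPy illustration, not the paper's DGL implementation: `tanh` stands in for the unspecified nonlinearity σ, self-loops are added as in the standard GCN of [10], and only the feature-similarity half of the neighbor selection is shown (the motif co-occurrence matrix would be combined with it in the full method).

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Feature similarity (Eq. 2): C_ij = X_i . X_j / (||X_i|| ||X_j||)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)   # row-normalize features
    return Xn @ Xn.T

def top_k_neighbors(S, k):
    """Keep, for each node, its k most similar nodes (excluding itself)."""
    S = S.copy()
    np.fill_diagonal(S, -np.inf)           # a node is not its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]    # indices of the k largest entries per row
    A = np.zeros_like(S)
    rows = np.repeat(np.arange(S.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    return A

def gcn_layer(A, Z, W, sigma=np.tanh):
    """One propagation step in the spirit of Eq. (7):
    Z <- sigma(D^{-1/2} (A + I) D^{-1/2} Z W)."""
    A_tilde = A + np.eye(A.shape[0])       # add self-loops (assumption)
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return sigma(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ Z @ W)
```

One such layer stack would be instantiated per graph and per view, since the paper learns separate GNNs for each of the $T + 1$ graphs of each view.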
Figure 3. Framework of FSGCL. FSGCL utilizes an online network and a target network to learn, where the target network is stop-gradient and updated as a slow-moving average of the online network.
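The target-network update described in the caption follows the BYOL/BGRL recipe [7, 24]. A minimal sketch, with plain NumPy arrays standing in for network parameters and τ denoting the decay rate of the slow-moving average (as in Appendix C):

```python
import numpy as np

def ema_update(online_params, target_params, tau=0.99):
    """Slow-moving average of the online network (the target is stop-gradient):
    theta_target <- tau * theta_target + (1 - tau) * theta_online."""
    return [tau * p_t + (1.0 - tau) * p_o
            for p_t, p_o in zip(target_params, online_params)]
```

Because the target parameters receive no gradients, a large τ keeps them a slowly drifting, stable regression target for the online network.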
Table 3. Ablation study of key components of FSGCL.

Datasets            Amazon-Photos  Amazon-Computers  Coauthor-CS  Coauthor-Physics  WikiCS
FSGCL               94.65          90.54             94.22        96.10             80.25
w/o w_i^m           94.07          90.45             94.20        96.08             80.26
w/o slow            92.20          89.80             92.09        95.75             78.45
w/o A^{SG_i}        93.13          90.05             93.32        95.75             79.55
w/o top-k A^{SG_i}  93.97          89.96             94.10        96.03             78.97
w/o L_Semantic      94.19          90.49             93.92        96.02             80.14
w/o L_Holistic      94.15          90.11             93.87        96.02             79.60

Motif variants:

Datasets            Amazon-Photos  Amazon-Computers  Coauthor-CS  Coauthor-Physics  WikiCS
FSGCL               94.65          90.54             94.22        96.10             80.25
motif-A             94.60          90.51             93.81        96.02             79.47
motif-B             93.74          89.97             93.49        95.85             79.89
motif-C             94.11          90.21             93.54        95.85             79.86
motif-AB            94.49          90.50             94.15        96.07             79.98
motif-AC            94.62          90.52             94.16        96.05             79.99
motif-BC            94.29          94.30             93.78        95.83             80.14
Appendix A  Experimental Settings

The proposed model FSGCL is implemented with PyTorch [19] and DGL [28], using the AdamW optimizer with base learning rate $\eta_b = 0.001$ and weight decay of $10^{-5}$. Following the settings of BGRL [24], the learning rate is annealed via a cosine schedule during the $n_{total}$ steps with an initial warmup period of $n_w$ steps. As a result, the learning rate at step $i$ is computed as:

$$\eta_i = \begin{cases} \dfrac{i \times \eta_b}{n_w}, & \text{if } i \le n_w \\[4pt] \eta_b \times \left(1 + \cos\dfrac{(i - n_w)\pi}{n_{total} - n_w}\right) \times 0.5, & \text{if } n_w \le i \le n_{total} \end{cases} \quad (18)$$

For node classification, the final evaluation is achieved by fitting an $l_2$-regularized LogisticRegression classifier from Scikit-Learn [20] using the liblinear solver on the frozen node embeddings, where the regularization strength is chosen by grid search from $\{2^{-10}, 2^{-9}, ..., 2^{9}, 2^{10}\}$. The dimension of the hidden and output representations is set to 512. The values of $k$, $\tau$, $\beta$, $\gamma$ and the number of GCN layers are set to 7, 0.99, 0.5, 0.5, 2 for Amazon-Computers and WikiCS, and to 5, 0.996, 1, 1, 1 for the other datasets. The number of MLP layers $L$ of the online predictor is set to 2 for the Amazon and Wiki datasets, and to 1 for the Coauthor datasets. The augmentation ratio $r$ is set to 0.2 for Amazon-Computers, 0.1 for WikiCS and 0.3 for the other datasets. As for the motif coefficients $w_i^m$, $i \in \{\text{motif-A}, \text{motif-B}, \text{motif-C}\}$, we set $w^m$ to [0.7, 0.1, 0.2] for Amazon-Photos, [0.3, 0.1, 0.6] for Amazon-Computers, [0.4, 0.2, 0.4] for Coauthor-CS, [0.5, 0.45, 0.05] for Coauthor-Physics and [0.1, 0.5, 0.4] for WikiCS. The value of $\alpha$ is set to 0.2 on all datasets.

Appendix B  Settings of Synthetic Dataset

We employ the well-known LFR toolkit [12] to generate a synthetic graph with 10,000 nodes and 8 overlapping communities. More specifically, we set the average degree $k$ and the maximum degree to 20 and 50 respectively, the mixing parameter $mu$ (the fraction of edges each node shares with nodes in other communities) to 0.2, the minimum community size $minc$ and maximum community size $maxc$ to 1500 and 3000, the number of overlapping nodes $on$ to 8,000, and the number of memberships of the overlapping nodes $om$ to 2.

Appendix C  Parameter Sensitivity Analysis

In this section, we investigate the sensitivity of five major hyperparameters: the decay rate $\tau$ of the slow-moving average strategy; $k$, which is used to select the top-K elements to build semantic graphs; the augmentation ratio $r$; the weight coefficient $\gamma$, which controls the significance of the semantic-level contrastive objective; and the weight coefficient $\beta$, which balances the holistic and semantic embeddings. The results for the above parameters are illustrated in Figure 6, where $\tau$ takes values from {0.9, 0.93, 0.96, 0.99, 0.993, 0.996, 0.999}, $k$ ranges over {3, 5, 7, 9, 11, 13, 15}, $r$ ranges over {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7}, and $\gamma$ and $\beta$ both take values from {0.1, 0.3, 0.5, 0.7, 0.9, 1, 2}.

As shown in Figure 6, FSGCL achieves higher performance when $\tau$ ranges from 0.99 to 0.999, demonstrating that a larger value of $\tau$ is beneficial for the slow-moving average strategy in learning discriminative embeddings. Besides, FSGCL consistently performs better when $k$ is small, which is reasonable since the semantic graphs retain more unique semantic information when $k$ is smaller. For the augmentation ratio $r$, performance drops sharply when $r$ becomes too large, indicating that over-perturbation leads to the loss of useful information in the original graphs. As for the weight coefficients $\gamma$ and $\beta$, our model is relatively stable with regard to $\gamma$, while performance decreases when $\beta$ is too small or too large.
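The warmup-plus-cosine learning-rate schedule of Eq. (18) in Appendix A can be sketched as the following function. The values of $n_w$ and $n_{total}$ below are illustrative placeholders, not the paper's settings, and $\eta_{base}$ in the warmup branch is taken to be the same base rate $\eta_b$:

```python
import math

def lr_at_step(i, eta_b=0.001, n_w=1000, n_total=10000):
    """Warmup-then-cosine schedule (Eq. 18): linear warmup to eta_b over the
    first n_w steps, then a half-cosine decay from eta_b down to 0 at n_total."""
    if i <= n_w:
        return i * eta_b / n_w
    return eta_b * (1.0 + math.cos((i - n_w) * math.pi / (n_total - n_w))) * 0.5
```

The schedule peaks at $\eta_b$ exactly when the warmup ends, passes through $\eta_b/2$ at the midpoint of the decay phase, and reaches 0 at step $n_{total}$.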
References

[1] Kimitaka Asatani, Junichiro Mori, Masanao Ochi, and Ichiro Sakata, ‘Detecting trends in academic research from a citation network using network representation learning’, PLoS ONE, 13(5), e0197260, (2018).
[2] Chris Biemann, Lachezar Krumov, Stefanie Roos, and Karsten Weihe, ‘Network motifs are a powerful tool for semantic distinction’, in Towards a Theoretical Framework for Analyzing Complex Linguistic Networks, 83–105, Springer, (2016).
[3] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton, ‘A simple framework for contrastive learning of visual representations’, in International Conference on Machine Learning, pp. 1597–1607. PMLR, (2020).
[4] Bo Dai and Dahua Lin, ‘Contrastive learning for image captioning’, Advances in Neural Information Processing Systems, 30, (2017).
[5] Manoj Reddy Dareddy, Mahashweta Das, and Hao Yang, ‘motif2vec: Motif aware node representation learning for heterogeneous networks’, in 2019 IEEE International Conference on Big Data (Big Data), pp. 1052–1059. IEEE, (2019).
[6] Travis Ebesu and Yi Fang, ‘Neural citation network for context-aware citation recommendation’, in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1093–1096, (2017).
[7] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al., ‘Bootstrap your own latent: A new approach to self-supervised learning’, Advances in Neural Information Processing Systems, 33, 21271–21284, (2020).
[8] Kaveh Hassani and Amir Hosein Khasahmadi, ‘Contrastive multi-view representation learning on graphs’, in International Conference on Machine Learning, pp. 4116–4126. PMLR, (2020).
[9] Shuting Jin, Xiangxiang Zeng, Feng Xia, Wei Huang, and Xiangrong Liu, ‘Application of deep learning methods in biological networks’, Briefings in Bioinformatics, 22(2), 1902–1917, (2021).
[10] Thomas N Kipf and Max Welling, ‘Semi-supervised classification with graph convolutional networks’, arXiv preprint arXiv:1609.02907, (2016).
[11] Johannes Klicpera, Stefan Weißenberger, and Stephan Günnemann, ‘Diffusion improves graph learning’, arXiv preprint arXiv:1911.05485, (2019).
[12] Andrea Lancichinetti and Santo Fortunato, ‘Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities’, Physical Review E, 80(1), 016118, (2009).
[13] Namkyeong Lee, Junseok Lee, and Chanyoung Park, ‘Augmentation-free self-supervised learning on graphs’, in Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 7372–7380, (2022).
[14] Haoyang Li, Xin Wang, Ziwei Zhang, Zehuan Yuan, Hang Li, and Wenwu Zhu, ‘Disentangled contrastive learning on graphs’, Advances in Neural Information Processing Systems, 34, 21872–21884, (2021).
[15] Shuai Lin, Chen Liu, Pan Zhou, Zi-Yuan Hu, Shuojia Wang, Ruihui Zhao, Yefeng Zheng, Liang Lin, Eric Xing, and Xiaodan Liang, ‘Prototypical graph contrastive learning’, IEEE Transactions on Neural Networks and Learning Systems, (2022).
[16] Giulia Muzio, Leslie O’Bray, and Karsten Borgwardt, ‘Biological network analysis with deep learning’, Briefings in Bioinformatics, 22(2), 1515–1530, (2021).
[17] David F Nettleton, ‘Data mining of social networks represented as graphs’, Computer Science Review, 7, 1–34, (2013).
[18] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, ‘The PageRank citation ranking: Bringing order to the web’, Technical report, Stanford InfoLab, (1999).
[19] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al., ‘PyTorch: An imperative style, high-performance deep learning library’, Advances in Neural Information Processing Systems, 32, (2019).
[20] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al., ‘Scikit-learn: Machine learning in Python’, Journal of Machine Learning Research, 12, 2825–2830, (2011).
[21] Zhen Peng, Wenbing Huang, Minnan Luo, Qinghua Zheng, Yu Rong, Tingyang Xu, and Junzhou Huang, ‘Graph representation learning via graphical mutual information maximization’, in Proceedings of The Web Conference 2020, pp. 259–270, (2020).
[22] Aravind Sankar, Xinyang Zhang, and Kevin Chen-Chuan Chang, ‘Motif-based convolutional neural network on graphs’, arXiv preprint arXiv:1711.05697, (2017).
[23] Qiaoyu Tan, Ninghao Liu, and Xia Hu, ‘Deep representation learning for social network analysis’, Frontiers in Big Data, 2, 2, (2019).
[24] Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L Dyer, Remi Munos, Petar Veličković, and Michal Valko, ‘Large-scale representation learning on graphs via bootstrapping’, arXiv preprint arXiv:2102.06514, (2021).
[25] Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola, ‘What makes for good views for contrastive learning?’, Advances in Neural Information Processing Systems, 33, 6827–6839, (2020).
[26] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio, ‘Graph attention networks’, arXiv preprint arXiv:1710.10903, (2017).
[27] Petar Velickovic, William Fedus, William L Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm, ‘Deep graph infomax’, ICLR (Poster), 2(3), 4, (2019).
[28] Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, et al., ‘Deep Graph Library: A graph-centric, highly-performant package for graph neural networks’, arXiv preprint arXiv:1909.01315, (2019).
[29] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip, ‘A comprehensive survey on graph neural networks’, IEEE Transactions on Neural Networks and Learning Systems, 32(1), 4–24, (2020).
[30] Jun Xia, Lirong Wu, Ge Wang, Jintao Chen, and Stan Z Li, ‘ProGCL: Rethinking hard negative mining in graph contrastive learning’, in International Conference on Machine Learning, pp. 24332–24346. PMLR, (2022).
[31] Yaochen Xie, Zhao Xu, Jingtun Zhang, Zhengyang Wang, and Shuiwang Ji, ‘Self-supervised learning of graph neural networks: A unified review’, IEEE Transactions on Pattern Analysis and Machine Intelligence, (2022).
[32] Minghao Xu, Hang Wang, Bingbing Ni, Hongyu Guo, and Jian Tang, ‘Self-supervised graph-level representation learning with local and global structure’, in International Conference on Machine Learning, pp. 11548–11558. PMLR, (2021).
[33] Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, and Yang Shen, ‘Graph contrastive learning with augmentations’, Advances in Neural Information Processing Systems, 33, 5812–5823, (2020).
[34] Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang, ‘Network representation learning: A survey’, IEEE Transactions on Big Data, 6(1), 3–28, (2018).
[35] Min-Ling Zhang and Zhi-Hua Zhou, ‘ML-KNN: A lazy learning approach to multi-label learning’, Pattern Recognition, 40(7), 2038–2048, (2007).
[36] Shichang Zhang, Ziniu Hu, Arjun Subramonian, and Yizhou Sun, ‘Motif-driven contrastive learning of graph representations’, arXiv preprint arXiv:2012.12533, (2020).
[37] Zaixi Zhang, Qi Liu, Hao Wang, Chengqiang Lu, and Chee-Kong Lee, ‘Motif-based graph self-supervised learning for molecular property prediction’, Advances in Neural Information Processing Systems, 34, 15870–15882, (2021).
[38] Ziyang Zhang, Chuan Chen, Yaomin Chang, Weibo Hu, Xingxing Xing, Yuren Zhou, and Zibin Zheng, ‘SHNE: Semantics and homophily preserving network embedding’, IEEE Transactions on Neural Networks and Learning Systems, (2021).
[39] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun, ‘Graph neural networks: A review of methods and applications’, AI Open, 1, 57–81, (2020).
[40] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang, ‘Deep graph contrastive representation learning’, arXiv preprint arXiv:2006.04131, (2020).
[41] Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang, ‘Graph contrastive learning with adaptive augmentation’, in Proceedings of the Web Conference 2021, pp. 2069–2080, (2021).