Book: Deep Learning on Graphs
https://cse.msu.edu/~mayao4/dlg_book/
Part I Overview
Graphs and Graph Signals
Graph Signal: a mapping from the nodes of a graph to real values, x: V → R, i.e., one value attached to each node (more generally, a vector per node).
Matrix Representations of Graphs
Adjacency Matrix: A ∈ {0, 1}^{N×N} with A_{ij} = 1 if nodes v_i and v_j are connected, and 0 otherwise.
Degree Matrix: D = diag(d(v_1), …, d(v_N)), where d(v_i) = Σ_j A_{ij}.
Chung, F. Spectral Graph Theory. American Mathematical Society, 1997.
Laplacian Matrix as an Operator
Laplacian Matrix: L = D − A.
The Laplacian matrix is a difference operator: for a graph signal x,
(Lx)(i) = Σ_{j ∈ N(i)} (x(i) − x(j)).
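To make the difference-operator view concrete, here is a minimal NumPy sketch (our illustration, not code from the book) checking the identity on a toy graph:

```python
# Minimal sketch: the Laplacian L = D - A acts as a difference operator.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # adjacency matrix of a 4-node graph
D = np.diag(A.sum(axis=1))                  # degree matrix
L = D - A                                   # unnormalized Laplacian

x = np.array([1.0, 2.0, 4.0, 8.0])          # a graph signal: one value per node

# (L x)[i] equals the sum of differences between node i and its neighbors
lhs = L @ x
rhs = np.array([sum(A[i, j] * (x[i] - x[j]) for j in range(4))
                for i in range(4)])
print(np.allclose(lhs, rhs))                # True
```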
Eigen-decomposition of Laplacian Matrix
The Laplacian matrix has a complete set of orthonormal eigenvectors: L = UΛU^⊤, where U = [u_1, …, u_N] and Λ = diag(λ_1, …, λ_N), with 0 = λ_1 ≤ λ_2 ≤ … ≤ λ_N.
Eigenvectors as Graph Signals
Each eigenvector can itself be viewed as a graph signal. The frequency of an eigenvector of the Laplacian matrix is its corresponding eigenvalue: eigenvectors with small eigenvalues vary slowly across edges, while those with large eigenvalues oscillate rapidly.
Graph Fourier Transform (GFT): x̂ = U^⊤ x.
Inverse Graph Fourier Transform (IGFT): x = U x̂.
The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 2013.
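A small NumPy sketch of the GFT/IGFT pair (an illustration under the definitions above, not the tutorial's code):

```python
# GFT: x_hat = U^T x; IGFT: x = U x_hat. U comes from the Laplacian's
# eigen-decomposition; eigh returns orthonormal eigenvectors.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # path graph on 4 nodes
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)                  # frequencies and Fourier basis
x = np.array([3.0, 1.0, 4.0, 1.0])          # a graph signal
x_hat = U.T @ x                             # forward GFT
x_rec = U @ x_hat                           # inverse GFT
print(np.allclose(x, x_rec))                # True: perfect reconstruction
```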
Part I Overview
Tasks on Graph-Structured Data
• Node-level tasks: node classification, link prediction. These are built on node representations.
• Graph-level tasks: graph classification. These additionally require the two key operations of filtering and pooling.
Two Main Operations in GNN
• Graph Filtering: refines node features using the graph structure; the graph itself is unchanged.
• Graph Pooling: summarizes nodes to generate a smaller, coarsened graph, enabling graph-level representations.
General GNN Framework
For graph-level tasks, filtering layers and pooling layers are stacked alternately until a graph-level representation is obtained.
Graph Filtering Operation
Two Types of Graph Filtering Operation
Spatial-based filtering:
• Original GNN (Scarselli et al., 2005)
• GraphSage (Hamilton et al., NIPS 2017)
• GAT (Veličković et al., ICLR 2018)
• MPNN (Gilmer et al., ICML 2017)
• …
Spectral-based filtering:
• Spectral Graph CNN (Bruna et al., ICLR 2014)
• ChebNet (Defferrard et al., NIPS 2016)
• GCN (Kipf & Welling, ICLR 2017)
Graph Filtering in the First GNN Paper
Graph neural networks for ranking web pages. WI, IEEE, 2005.
Graph Spectral Filtering for Graph Signal
Recall: x̂ = U^⊤ x and x = U x̂.
• Decompose the signal into its Fourier coefficients: x̂ = U^⊤ x.
• Filter the coefficients with a frequency response γ(·): x̂'_i = γ(λ_i) x̂_i.
• Reconstruct the filtered signal: x' = U γ(Λ) U^⊤ x.
Example: a low-pass filter keeps the coefficients at small eigenvalues (smooth components) and shrinks those at large eigenvalues.
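The decompose-filter-reconstruct pipeline in a few lines of NumPy, a sketch using an illustrative low-pass response γ(λ) = 1/(1+λ) of our own choosing:

```python
# Spectral filtering: x' = U * gamma(Lambda) * U^T * x.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

x = np.array([1.0, -1.0, 1.0, -1.0])        # a highly oscillating signal
gamma = 1.0 / (1.0 + lam)                   # low-pass frequency response
x_filtered = U @ (gamma * (U.T @ x))        # decompose -> filter -> reconstruct
print(np.round(x_filtered, 3))              # visibly smoother than x
```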
Graph Spectral Filtering for GNN
How to design the filter? Instead of hand-crafting γ(λ), parametrize it with learnable parameters and learn them from data, e.g., a polynomial parametrization γ(λ) = Σ_{k=0}^{K} θ_k λ^k.
Polynomial Parametrized Filter: a Spatial View
With γ(Λ) = Σ_{k=0}^{K} θ_k Λ^k, the filtered signal becomes x' = Σ_{k=0}^{K} θ_k L^k x. Since (L^k)_{ij} ≠ 0 only when v_j is within k hops of v_i, the filter is K-localized in the spatial domain, and no eigen-decomposition is required.
Chebyshev Polynomials
T_0(x) = 1, T_1(x) = x, T_k(x) = 2x·T_{k−1}(x) − T_{k−2}(x).
Chebyshev polynomials are defined on [−1, 1], so the Laplacian is rescaled as L̃ = 2L/λ_max − I.
ChebNet
Filter: x' = Σ_{k=0}^{K} θ_k T_k(L̃) x.
• No eigen-decomposition needed
• Stable under perturbation of coefficients
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. NIPS 2016.
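A sketch of the ChebNet filter using only matrix-vector products; the θ values below are random stand-ins for learned coefficients, and λ_max is computed exactly here for simplicity:

```python
# ChebNet filter x' = sum_k theta_k T_k(L_tilde) x via the Chebyshev
# recurrence T_k = 2 L_tilde T_{k-1} - T_{k-2}; no eigen-decomposition needed.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
lam_max = np.linalg.eigvalsh(L).max()       # in practice this is approximated
L_tilde = 2.0 * L / lam_max - np.eye(4)     # rescale spectrum into [-1, 1]

x = np.array([3.0, 1.0, 4.0, 1.0])
K = 3
theta = np.array([0.5, 0.3, -0.2, 0.1])     # stand-ins for learned coefficients

Tx = [x, L_tilde @ x]                       # T_0 x and T_1 x
for k in range(2, K + 1):
    Tx.append(2.0 * (L_tilde @ Tx[-1]) - Tx[-2])
x_filtered = sum(t * b for t, b in zip(theta, Tx))
```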
GCN: Simplified ChebNet
Set K = 1, approximate λ_max ≈ 2, and tie θ = θ_0 = −θ_1; with the renormalization trick Ã = A + I and D̃_{ii} = Σ_j Ã_{ij}, the filter becomes x' = θ D̃^{−1/2} Ã D̃^{−1/2} x.
GCN for Multi-channel Signal
Recall: the single-channel GCN filter is x' = θ D̃^{−1/2} Ã D̃^{−1/2} x.
In matrix form, for a multi-channel input X ∈ R^{N×d}: X' = D̃^{−1/2} Ã D̃^{−1/2} X Θ.
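The multi-channel filter as a function, a minimal NumPy sketch in which Θ is a random stand-in for the learned weight matrix:

```python
# One GCN layer: X' = D_tilde^{-1/2} (A + I) D_tilde^{-1/2} X Theta.
import numpy as np

def gcn_layer(A, X, Theta):
    A_tilde = A + np.eye(A.shape[0])            # renormalization trick: self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * np.outer(d_inv_sqrt, d_inv_sqrt)  # symmetric normalization
    return np.maximum(A_hat @ X @ Theta, 0.0)   # aggregate, transform, ReLU

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = rng.standard_normal((3, 4))                 # 3 nodes, 4 input channels
Theta = rng.standard_normal((4, 2))             # map 4 channels to 2
H = gcn_layer(A, X, Theta)                      # new 2-channel node features
```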
A Spatial View of GCN Filter
Observe that: (D̃^{−1/2} Ã D̃^{−1/2})_{ij} is nonzero only when v_j is v_i itself or one of its neighbors.
Hence, the GCN filter consists of:
• Feature transformation: XΘ
• Aggregation: each node takes a degree-normalized average of the transformed features of itself and its neighbors.
Filter in GCN vs. Filter in the First GNN
GCN, k-th layer: X^{(k)} = σ(D̃^{−1/2} Ã D̃^{−1/2} X^{(k−1)} Θ^{(k)}); the aggregation weights are fixed by the graph, and each layer has its own parameters.
Filter in GraphSage
• Neighbor sampling: sample a fixed-size set of neighbors for each node.
• Aggregation: aggregate the sampled neighbors' features (e.g., mean) and combine the result with the node's own representation.
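A hedged sketch of a GraphSAGE-style mean aggregator with neighbor sampling; the concat-then-transform step follows the common mean-aggregator formulation, and all sizes are illustrative:

```python
# GraphSAGE-style filter: sample neighbors, mean-aggregate, concat, transform.
import numpy as np

def graphsage_mean(A, X, W, sample_size, rng):
    n = A.shape[0]
    H = np.zeros((n, W.shape[1]))
    for v in range(n):
        nbrs = np.flatnonzero(A[v])
        if len(nbrs) > sample_size:              # fixed-size neighbor sampling
            nbrs = rng.choice(nbrs, sample_size, replace=False)
        h_nbr = X[nbrs].mean(axis=0) if len(nbrs) else np.zeros(X.shape[1])
        h = np.concatenate([X[v], h_nbr])        # combine self and neighborhood
        H[v] = np.maximum(h @ W, 0.0)            # transform + ReLU
    return H

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.standard_normal((3, 4))
W = rng.standard_normal((8, 2))                  # (2 * 4 input dims) -> 2
H = graphsage_mean(A, X, W, sample_size=2, rng=rng)
```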
Filter in GAT
Neighbors are aggregated with learned attention weights rather than fixed degree-based weights.
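A sketch of the single-head GAT-style attention computation, e_ij = LeakyReLU(a^⊤[Wh_i ‖ Wh_j]) normalized with a softmax over the neighborhood; W and a below are random stand-ins for learned parameters:

```python
# Single-head GAT-style layer: attention-weighted neighborhood aggregation.
import numpy as np

def gat_layer(A, X, W, a, alpha=0.2):
    H = X @ W                                     # shared linear transform
    n = A.shape[0]
    out = np.zeros_like(H)
    for i in range(n):
        nbrs = np.flatnonzero(A[i] + np.eye(n)[i])  # neighbors plus self-loop
        e = np.array([np.concatenate([H[i], H[j]]) @ a for j in nbrs])
        e = np.where(e > 0, e, alpha * e)         # LeakyReLU
        w = np.exp(e - e.max()); w /= w.sum()     # softmax over the neighborhood
        out[i] = (w[:, None] * H[nbrs]).sum(axis=0)
    return out

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.standard_normal((3, 4))
H_out = gat_layer(A, X, rng.standard_normal((4, 2)), rng.standard_normal(4))
```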
Filter in MPNN
• Message passing: m_v = Σ_{u ∈ N(v)} M(h_v, h_u, e_{vu})
• Feature updating: h'_v = U(h_v, m_v)

Graph Pooling
gPool
Downsample by selecting the most important nodes.
Importance measure: project the node features onto a learnable vector p, y = Xp/‖p‖, and keep the top-k nodes by score.
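A sketch of gPool-style top-k selection, assuming the projection-score formulation above (as in Graph U-Nets); the sigmoid gating of the kept features follows that formulation:

```python
# gPool-style top-k pooling: score nodes, keep the k best, induce the subgraph.
import numpy as np

def gpool(A, X, p, k):
    y = X @ p / np.linalg.norm(p)       # importance score per node
    idx = np.argsort(y)[-k:]            # indices of the k most important nodes
    gate = 1.0 / (1.0 + np.exp(-y[idx]))
    X_new = X[idx] * gate[:, None]      # gated features of the selected nodes
    A_new = A[np.ix_(idx, idx)]         # adjacency of the induced subgraph
    return A_new, X_new, idx

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1], [0, 0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((5, 3))
p = rng.standard_normal(3)              # stand-in for the learned projection
A2, X2, kept = gpool(A, X, p, k=3)
```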
DiffPool
Downsample by clustering the nodes using GNNs. Two filters:
• Filter 1: generate a soft-assignment matrix S = softmax(GNN_1(A, X))
• Filter 2: generate new features Z = GNN_2(A, X)
The coarsened graph is then X' = S^⊤ Z and A' = S^⊤ A S.
Hierarchical Graph Representation Learning with Differentiable Pooling. NeurIPS 2018.
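A sketch of the DiffPool coarsening step given the two filter outputs; the GNN outputs are replaced by random stand-ins here:

```python
# DiffPool coarsening: S soft-assigns nodes to clusters, then
# X' = S^T Z gives cluster features and A' = S^T A S cluster connectivity.
import numpy as np

def diffpool(A, S_logits, Z):
    S = np.exp(S_logits - S_logits.max(axis=1, keepdims=True))
    S /= S.sum(axis=1, keepdims=True)   # row-wise softmax: soft assignment
    X_coarse = S.T @ Z                  # features of the coarsened nodes
    A_coarse = S.T @ A @ S              # connectivity of the coarsened graph
    return A_coarse, X_coarse

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
S_logits = rng.standard_normal((3, 2))  # stand-in for GNN_1's output (2 clusters)
Z = rng.standard_normal((3, 4))         # stand-in for GNN_2's output
A2, X2 = diffpool(A, S_logits, Z)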
Eigenpooling
Going Back to Graph Spectral Theory
Recall: x = U x̂ = Σ_i x̂_i u_i.
Do we need all the coefficients to reconstruct a "good" signal? For smooth signals, the first few (low-frequency) coefficients already capture most of the information.
Eigenpooling: Truncated Fourier Coefficients
• Compute the eigenvectors (Fourier modes) of each subgraph.
• GFT: obtain the Fourier coefficients of the node features on the subgraph.
• Truncate: keep only the first few Fourier coefficients.
• The truncated coefficients serve as the new features for the subgraph (a node in the smaller graph).
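A minimal sketch of this step for one subgraph (our illustration; in practice coefficients are zero-padded when a subgraph has fewer nodes than the truncation length):

```python
# Eigenpooling for one subgraph: project node features onto the subgraph's
# first few Laplacian eigenvectors and keep the truncated coefficients.
import numpy as np

def eigen_pool(A_sub, X_sub, num_coeffs):
    L = np.diag(A_sub.sum(axis=1)) - A_sub
    _, U = np.linalg.eigh(L)              # Fourier modes of the subgraph
    coeffs = U.T @ X_sub                  # GFT applied per feature channel
    return coeffs[:num_coeffs].flatten()  # truncated coefficients = new feature

A_sub = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X_sub = np.arange(6, dtype=float).reshape(3, 2)
feat = eigen_pool(A_sub, X_sub, num_coeffs=2)  # 2 coeffs x 2 channels -> length 4
```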
Robustness of GNN
Adversarial Attacks on Deep Learning
Do graph neural networks suffer the same problem?

Adversarial Attacks on GNN
[Figure: an unnoticeable perturbation of the input graph changes the GNN's prediction.]
Consequences
• Financial systems: credit card fraud detection
• Recommender systems: social recommendation, product recommendation
• …
Image vs Graph
• Discreteness
• Perturbation measure
• Perturbation type
Perturbation Type
• Adding an edge
• Rewiring
• Modifying features

Evasion & Poisoning
• Evasion attack: the GNN is trained first, and the graph is perturbed afterwards at test time.
• Poisoning attack: the graph is perturbed first, and the GNN is then trained on the poisoned graph.
Targeted & Non-Targeted
• Targeted attack: degrade the prediction for a specific target node.
• Non-targeted attack: degrade the model's overall performance.
Attack Methods
Method      | Injecting Node | Adding/Deleting Edge | Rewiring | Modifying Features | Evasion | Poisoning | Targeted | Non-Targeted
Grad-Argmax |                | ✔                    | ✔        | ✔                  | ✔       | ✔         | ✔        | ✔
Nettack     |                | ✔                    |          | ✔                  | ✔       | ✔         | ✔        |
GF-Attack   |                | ✔                    |          |                    | ✔       |           | ✔        |
ReWatt      |                |                      | ✔        |                    | ✔       |           |          | ✔
RL-S2V      |                | ✔                    |          |                    | ✔       |           | ✔        |
Meta-Attack |                | ✔                    |          |                    |         | ✔         |          | ✔
NIPA        | ✔              |                      |          |                    |         | ✔         |          | ✔
GradArgmax
Greedily flip the adjacency-matrix entry (add or delete an edge) with the largest gradient of the attack loss with respect to that entry.
Nettack
Idea 1: train a surrogate model (a linearized two-layer GCN) and attack the surrogate.
Adversarial Attacks on Neural Networks for Graph Data. KDD 2018.
Nettack
Generate unnoticeable perturbation candidates:
• Edge perturbation candidates: preserve the degree distribution.
• Feature perturbation candidates: preserve feature co-occurrence.
The selected perturbations are applied to attack the target GCN model, leading to wrong predictions.
GF-Attack
Motivation:
• The output embeddings of graph embedding models are demonstrated to have a very low-rank property.
• Goal: damage the quality of the output embedding Z.
• Formulation: a graph embedding model can be treated as producing new graph signals according to a graph filter ℋ together with a feature transformation.
A Restricted Black-box Adversarial Framework Towards Attacking Graph Embedding Models. AAAI 2020.
ReWatt
Motivation: the degree distribution may not be an ideal measure of how noticeable a perturbation is; ReWatt perturbs the graph by rewiring instead.
Attacking Graph Convolutional Networks via Rewiring. arXiv 2019.
ReWatt
Rewiring advantages:
• The numbers of nodes and edges remain the same.
• Affects algebraic connectivity in a smaller way.
• Affects effective graph resistance in a smaller way.
ReWatt
Rewiring is cast as reinforcement learning: the attacker's policy network, built on GCN node and edge embeddings, selects rewiring actions against a black-box classifier.
Defending Against Attacks
• Adversarial training
• Graph purifying
• Attention mechanism
Adversarial Training
Motivation: train the model on adversarially perturbed inputs so that it becomes robust to such perturbations at test time.
Latent Adversarial Training of Graph Convolution Networks. ICML 2019 workshop.
Adversarial Training
Obstacles:
• A is discrete
• X is often discrete
The cited work therefore perturbs latent (continuous) representations instead.
Graph Purifying – Preprocessing
Main idea:
• Purify the poisoned graph in a preprocessing step.
• Train the GNN on the purified graph.
Adversarial Examples on Graph Data: Deep Insights into Attack and Defense. IJCAI 2019.
Graph Purifying – Graph Learning: Pro-GNN
Jointly perform graph learning (recovering a clean graph) and GNN training on the learned graph.
Graph Structure Learning for Robust Graph Neural Networks. KDD 2020.
Pro-GNN: Defend Against Adversarial Attacks
Clean graphs empirically exhibit properties that poisoned graphs violate:
• Low rank
• Sparsity
• Feature smoothness
Table credit: Adversarial Attacks and Defenses on Graphs: A Review and Empirical Study.
Pro-GNN: Framework
Pro-GNN jointly learns a clean graph structure S and the GNN parameters, minimizing the GNN loss together with low-rank (nuclear norm), sparsity (ℓ1), and feature-smoothness regularizers while keeping S close to the poisoned adjacency matrix.
Attention Mechanism
Motivation: reduce the impact of adversarial edges by assigning them lower attention scores.
Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.
RGCN
Embed nodes as Gaussian distributions to capture uncertainty.
Attention mechanism: attacked nodes do have higher variance, so variance-based attention downweights their influence.
PA-GNN
Motivation:
• Relying only on the perturbed graph to learn attention coefficients is not enough.
• We should exploit information from clean graphs of similar domains (e.g., Facebook & Twitter, Yelp & Foursquare).
• Then use transfer learning / meta learning!
Robust Graph Neural Network Against Poisoning Attacks via Transfer Learning. WSDM 2020.
Self-Supervised Learning for Graph Neural Networks

Self-Supervised Learning
• Relative position pretext task (Doersch et al., 2015)
• Jigsaw puzzle pretext task (Noroozi and Favaro, 2016)

Graph-Structured Data
Traditional Deep Learning on Graphs
Traditional DL is designed for simple grids or sequences:
• CNNs for fixed-size images/grids
• RNNs for text/sequences

Edge Mask Pretext Task
• Objective is to reconstruct masked edges.
• Could be used as a pre-training step for another task (e.g., node classification).
Applying SSL to Graphs
Similarities to the image and text domains:
• Nodes have features like images or text → pretext tasks using attribute information
• Topological structure is associated with unlabeled samples → pretext tasks using structural information
Fundamental differences found in the graph domain:
• Nodes are connected and dependent → pretext tasks using node pairs or even sets

Training strategies: joint training vs. two-stage training.
Multi-Stage Self-Training GNNs
Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels. AAAI 2020.
DeepCluster and M3S Training
For each stage:
• Run deep clustering by applying K-means on the node embeddings.
• Align cluster centers to the class centers of the labeled data.
• Sort the remaining unlabeled nodes by prediction confidence.
For each class j:
• Find the top samples.
• If a sample's pseudo label matches its aligned cluster label, add it to the training set.
Train for a fixed number of epochs.
Case Study Results with M3S
General insights:
• The less training data, the larger the improvement over GCN.
• Self-training can typically provide improvements.
• Using multiple stages is typically better than single-stage self-training.
• DeepCluster-based self-checking provides a benefit.
Contrastive Learning for Graphs via Augmentations
When Does Self-Supervision Help Graph Convolutional Networks? ICML 2020.
Node Clustering Pretext Task
• Features used: nodes
• Assumption: feature similarity
• Loss type: classification
Partitioning Pretext Task
• Features used: edges
• Assumption: connection density
• Loss type: classification
• Rather than clustering the node features, the network is partitioned based on its structure.
• Similarly, the partition indices are the self-supervised labels to predict.
Graph Completion Pretext Task
• Features used: nodes & edges
• Assumption: context-based representation
• Loss type: regression
SSL Universally Improves Most GNN Models
Insights:
• Generally, multi-task training performs better than pretraining/finetuning.
• SSL acts universally well to improve many GNN base models.
When and Why SSL Works on GNNs
• Presents a set of basic pretext tasks using structure and attribute information.
Self-Supervised Learning on Graphs: Deep Insights and New Directions. arXiv 2020.
Basic Pretext Tasks on Graphs
• Structure – local: Node Property, Edge Mask
• Structure – global: Pairwise Distance, Distance to Clusters
• Attribute: Attribute Mask, Pairwise Attribute Similarity
Local Structure Pretext Tasks
• Node Property: regression loss between extracted node properties and the mapped node embeddings.
• Edge Mask: classify whether a masked node pair (i, j) is connected, versus the remaining pairs.
Global Structure Pretext Tasks
• Pairwise Distance: calculate all (i, j) distances and predict them from the mapped node embeddings (classification loss).
• Distance to Clusters: obtain k clusters and predict the distance from each node to the center of each cluster.
Attribute Pretext Tasks
• Attribute Mask: predict the masked attribute values of nodes from their mapped embeddings (regression loss).
• Pairwise Attribute Similarity: predict the similarity of the most/least similar node pairs from their mapped embeddings.
Empirical Study of Basic Pretext Tasks
(Results grouped by local structure, global structure, and attribute pretext tasks.)
Insights:
• In general, joint/multi-task training outperforms pre-training/two-stage training.
• Global structure generally outperforms local structure.
• Is there a way to further combine and improve these basic methods?
Deeper Insights into Why Some SSL Tasks Work
• Pretext tasks whose similarity assumptions align positively with the GCN node embeddings achieve higher accuracy.
• Two nodes can be regularly equivalent even without having the same neighbors, provided the neighbors are themselves similar.
→ If this similarity is based on attributes, then even when two nodes' neighbors differ, the nodes are similar whenever their neighbors share similar features; e.g., the Pairwise Attribute Similarity pretext task helps maintain this.
→ Next we define regular task equivalence.
Further Insights and New Directions
Node similarity is a fundamental property of graphs → does this similarity get maintained in the GCN embeddings?
SelfTask: pretext tasks built on the intuition of regular task equivalence:
• Structure → Context Label
• + Attribute → Ensemble Label
• + Label → Corrected Label
SelfTask: Distance to Labeled

SelfTask: Context Label
Each node constructs a neighbor label distribution context vector.

SelfTask: Corrected Label
Key idea: improve Context Label by iteratively improving the context vector.
Advanced SSL Results with SelfTask for Node Classification
Insights:
• Advanced methods utilizing the label information of neighbors significantly improve performance.
• The label-correction stage indeed helps SelfTask.
• Limited labeled data? No problem!
Summary of SSL for GNNs
• SSL for GNNs is still in its early stages but has seen rapid growth and interest.
• Just as in other domains, not all defined pretext tasks work.
• Some are more general than others, while some can be specifically designed with domain knowledge.
Graph Convolutional Networks (GCN) – (Kipf and Welling 2017)
Motivation
Matrix form of a GCN layer: H^{(l+1)} = σ(Â H^{(l)} W^{(l)}), with Â = D̃^{−1/2}(A + I)D̃^{−1/2}.
Problems of GCN:
• Time and memory cost for large graphs: in the per-node form (embedding vectors oriented as column vectors), each node requires its full neighborhood, and the recursion expands across layers.

Comparison: GCN vs. FastGCN
• In FastGCN's matrix form, all nodes "v" in a batch share the same layer-wise sampled nodes.

Importance Sampling
• Uniform sampling has high variance.
• Variance reduction: importance sampling, i.e., sampling from a distribution Q instead of a uniform P.

Results
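A hedged sketch of FastGCN-style layer-wise importance sampling: nodes for the next layer are drawn from q(u) ∝ ‖Â[:, u]‖² and reweighted by 1/(s·q(u)) so the estimate of Â H stays unbiased. All shapes and the toy Â are illustrative:

```python
# FastGCN-style layer: Monte Carlo estimate of A_hat @ H with importance sampling.
import numpy as np

def sample_layer(A_hat, H, W, num_samples, rng):
    q = np.linalg.norm(A_hat, axis=0) ** 2
    q /= q.sum()                                   # importance distribution over columns
    idx = rng.choice(A_hat.shape[1], num_samples, p=q, replace=True)
    scale = 1.0 / (num_samples * q[idx])           # reweighting keeps the estimate unbiased
    A_sub = A_hat[:, idx] * scale
    return np.maximum(A_sub @ H[idx] @ W, 0.0)     # sampled aggregation + transform + ReLU

rng = np.random.default_rng(0)
A_hat = rng.random((6, 6)); A_hat /= A_hat.sum(axis=1, keepdims=True)  # toy normalized adjacency
H = rng.standard_normal((6, 4)); W = rng.standard_normal((4, 2))
H_next = sample_layer(A_hat, H, W, num_samples=3, rng=rng)
```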
Graph Sampling – GraphSAINT (Zeng et al. 2020)
Instead of node sampling or layer sampling, do graph sampling: train full GNNs on sampled subgraphs.
Application in the Real World – PinSAGE (Ying et al. 2018)
An early industry-level GNN-based recommendation system.
• The core of PinSAGE is a neighborhood aggregation algorithm similar to GraphSAGE.
• Novelty: how to define the neighborhood?
  • Importance-based: the neighborhood of a node u is defined as the T nodes that exert the most influence on u.
  • Run random walks from node u and take the top-T most-visited nodes.
• Efficient training:
  • Does not train on the whole graph, but only on the targeted node set and their neighborhoods.
  • MapReduce for model inference.
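A hedged sketch of the importance-based neighborhood idea: run short random walks from u and keep the T most-visited nodes. The walk count and length below are illustrative choices, not the paper's exact settings:

```python
# PinSAGE-style importance-based neighborhood via random-walk visit counts.
import numpy as np
from collections import Counter

def importance_neighborhood(adj_list, u, T, num_walks=100, walk_len=3, seed=0):
    rng = np.random.default_rng(seed)
    visits = Counter()
    for _ in range(num_walks):
        v = u
        for _ in range(walk_len):
            nbrs = adj_list[v]
            if not nbrs:
                break
            v = nbrs[rng.integers(len(nbrs))]   # one random-walk step
            if v != u:
                visits[v] += 1                  # count visits to other nodes
    return [node for node, _ in visits.most_common(T)]

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(importance_neighborhood(adj, u=0, T=2))   # e.g., the two most-visited nodes
```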
Application in the Real World – Anti-Money Laundering
Application of FastGCN to large synthetic AML datasets:
• Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics (NeurIPS 2018 Workshop)
• Entities as nodes and transactions as edges
• Detecting suspicious nodes/transactions
Industrial-level Libraries and Applications
• Deep Graph Library (DGL)
• PyTorch Geometric (PyG)
• AliGraph – Alibaba
• PyTorch-BigGraph (PBG) – Facebook
• Ant Graph Machine Learning System (AGL) – Ant Financial
• …
Part I Overview
Graph Neural Networks for Healthcare Applications
-- Drug Discovery and Medical Recommendation
Tengfei Ma
IBM Research AI, IBM T. J. Watson Research Center
@KDD2020 Tutorial
Drug Discovery
Drug discovery is a long, tedious, and costly process; machine learning can help:
• De novo drug design: generating new molecules for a desired target.
• Drug safety checking: toxicity, adverse reactions / drug-drug interactions (DDI).
Interestingly, these are all related to graphs:
• Molecule – graph
• DDI – graph
It is natural to develop GNN-based methods:
• Molecule generation
• DDI prediction
Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders. NeurIPS 2018.

Molecule Graph Generation
Generative models for images/sequences
Adverse Drug-Drug Interaction (DDI)
[Architecture: GCN encoder → normalization → GCN decoder on the DDI graph.]
What if we do not have node features?
Results; analysis of attention weights.
EHR Phenotyping and Medication Recommendation
EHR (electronic health record): representation.
Medical Recommendation I: GAMENet
GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination (Shang et al. 2019a)
• Key idea: integrate the DDI graphs to provide safer medication recommendations.
• Method: encode both EHRs and DDI graphs in the memory and let them influence the memory output.

Graph Augmented Memory Module
• I: Input memory representation converts inputs into a query for memory reading (RNN).
• G: Generalization is the process of generating and updating the memory representation.
  • Static memory bank Mb: GCN
  • Dynamic memory Md: adding key-value pairs
• O: Output memory representation produces outputs given the patient representation (the query) and the current memory state Mb and Md.
• Root to leaf
Pre-training
Input example:
• [CLS] d1, d2, [MASK], d3, d4, m1, [MASK], m2, m3
• d: diagnosis; m: medication
Pre-training on each visit of the EHR sequences:
• Self-prediction: same as BERT, use a mask to mask out some codes and predict them.
• Dual-prediction: given all diagnoses, predict the medications; given the medications, predict the diagnoses.
• Note: no position embedding, because there is no order within one visit.
Results
• We used EHR data from MIMIC-III [Johnson et al., 2016] and conducted all our experiments on a cohort where patients have more than one visit.
Deep Graph Learning and Graph-to-Sequence Learning in NLP
Lingfei Wu
IBM Research AI, IBM T. J. Watson Research Center
Joint work with Yu Chen, Mohammed J Zaki, Kun Xu, Zhiguo Wang, Yansong Feng, Michael Witbrock, and Vadim Sheinin

Graph!
Graph-structured data are ubiquitous.
Graph Learning: Motivations
• GNNs are powerful; unfortunately, they require graph-structured data to be available.
• It is questionable whether the given intrinsic graph structures are optimal (they may be noisy or incomplete) for the downstream tasks.
• Many applications (e.g., NLP tasks) may only have non-graph structured data, or even just the original feature matrix.
Graph Learning: Formulation
Existing State-of-the-art Methods
Graph construction from data [Kalofolias, 2016; Kalofolias and Perraudin, 2017]
• Gaussian kernel or kNN-based
• Directly optimizing the graph adjacency matrix with smoothed graph signals
• Issues: 1) does not consider the downstream task; 2) no refinement
Dynamic models of interacting systems [Kipf et al., ICML'18]
• Inferring an explicit interaction structure using a variational graph auto-encoder
• Issues: 1) cannot jointly learn the graph structure and graph representations; 2) transductive
Jointly optimizing graph structures and GNN parameters [Franceschi et al., ICML'19]
• Modeling a joint probability distribution over the edges of a graph with N vertices
• Issues: 1) hard to optimize; 2) not scalable; 3) cannot handle inductive learning
Iterative Deep Graph Learning: System Overview
The normalized adjacency matrix of the initial graph (or a kNN graph) serves as the starting point.
IDGL: Graph Regularization
We adapt the techniques designed for learning graphs from smooth signals and apply them as regularization for controlling smoothness, connectivity, and sparsity.
Smoothness: Ω(A, X) = tr(X^⊤ L X)/n², which penalizes large feature differences across heavily weighted edges.
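A minimal sketch of this smoothness term (our illustration; the 1/n² scaling follows the formula above):

```python
# Graph-smoothness regularizer: tr(X^T L X) / n^2. Equivalently,
# 0.5 * sum_ij A[i, j] * ||x_i - x_j||^2, scaled by 1/n^2.
import numpy as np

def smoothness(A, X):
    L = np.diag(A.sum(axis=1)) - A      # Laplacian of the learned graph
    n = A.shape[0]
    return np.trace(X.T @ L @ X) / (n * n)

A = np.array([[0.0, 0.9, 0.1], [0.9, 0.0, 0.2], [0.1, 0.2, 0.0]])
X = np.array([[1.0, 0.0], [1.1, 0.1], [5.0, 4.0]])
print(smoothness(A, X))                 # larger when connected nodes differ more
```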
IDGL: Iterative Method for Joint Graph Structure and Representation Learning
The iterative method repeatedly
▪ learns a better adjacency matrix with the updated node embeddings, and
▪ learns better node embeddings with the refined adjacency matrix.
The iterative procedure dynamically stops per mini-batch when
▪ the learned adjacency matrix converges within a certain threshold, or
▪ the maximal number of iterations is reached.
Results (Transductive Setting)
Results (Inductive Setting & Runtime)
Results (Ablation Study)
Results (Robustness to Missing/Adding Edges)
Results (Convergence & Dynamic Stopping)
Graph-to-Sequence Learning in Natural Language Processing

Seq2Seq: Applications and Challenges
Applications:
• Machine translation
• Natural language generation
• Logic form translation
• Drug discovery
Challenges:
• Only applies to problems whose inputs are represented as sequences
• Cannot handle more complex structures such as graphs
• Converting graph inputs into sequence inputs loses information
• Augmenting the original sequence inputs with additional structural information enhances the word-sequence features
Contributions and Highlighted Research
Fundamental contributions of this research:
• Presented Graph2Seq, a generalized seq2seq model for graph inputs
• An attention-based encoder-decoder model for graph-to-sequence learning
Graph-to-Sequence Model [1][2]
Graph Convolutional Neural Network
[1] Kun Xu*, Lingfei Wu*, Zhiguo Wang, Yansong Feng, Michael Witbrock, and Vadim Sheinin (equally contributed), "Graph2Seq: Graph to Sequence Learning with Attention-based Neural Networks", arXiv 2018.
[2] Yu Chen, Lingfei Wu**, and Mohammed J. Zaki (**corresponding author), "Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation", ICLR'20.
Bidirectional Node Embedding (Separate)
Bi-Sep node embedding (taking node v as an example):
1. Transform each node's text attribute into a feature vector by looking up the embedding matrix.
Graph Encoding
Graph embedding:
• Pooling-based graph embedding (max, min, and average pooling)
• Node-based graph embedding: add one super node connected to all other nodes in the graph; the embedding of this super node is treated as the graph embedding.
Attention-Based Sequence Decoding
• Context vector: a weighted combination of the node representations.
• Attention weights: produced by an alignment model between the decoder state and the node embeddings.
• Objective function: maximize the likelihood of the target sequence.
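A hedged sketch of one decoding step's attention: alignment scores between the decoder state and every node embedding are softmax-normalized and used to form the context vector. The bilinear alignment form and all shapes are illustrative assumptions:

```python
# One attention step over node embeddings during decoding.
import numpy as np

def attention_context(s_t, node_emb, W_a):
    scores = node_emb @ W_a @ s_t             # alignment model (bilinear form)
    w = np.exp(scores - scores.max())
    w /= w.sum()                              # attention weights over nodes
    return w @ node_emb                       # context vector for this step

rng = np.random.default_rng(0)
node_emb = rng.standard_normal((5, 8))        # 5 nodes, 8-dim embeddings
s_t = rng.standard_normal(6)                  # decoder hidden state
W_a = rng.standard_normal((8, 6))             # stand-in alignment parameters
c_t = attention_context(s_t, node_emb, W_a)
```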
Experiments: Text Reasoning and Shortest Path

Experiments: Bidirectional Node Embedding
Bidirectional embeddings converge more quickly.
When Shall We Use Graph2Seq?
• Case I: the inputs are naturally or best represented as graphs.
• Case II: hybrid graphs built from a sequence and its hidden structural information.
Natural Language Interface to Database
Need explanation!
SQL-to-Text Generation Task

Previous Approaches
Template-based approaches (Koutrika et al. 2010; Ngonga Ngomo et al. 2013)
• Time-consuming, and generate rigid and stylized language!
Problem
A SQL query is a graph-structured query; naïve sequence encoders may need an elaborate design to fully capture the global structure information.

Motivation
Graph Representation of SQL Query
The query graph is constructed step by step; e.g., the WHERE clause conditions are added as nodes and edges.
Encoder-Decoder Architecture
Encoding
Graph-to-Sequence Model [1][2]
Graph Convolutional Neural Network
Experiments
Datasets:
• WikiSQL (61,297 for training, 9,145 for development, and 17,284 for test)
• StackOverflow (25,869 for training, 3,234 for development, and 3,234 for test)
Baselines:
• Template: a) first map each element of a SQL query to an utterance; b) then use simple rules to assemble these utterances.
• Seq2Seq: we implement the model proposed in Bahdanau et al. 2014.
• Seq2Seq + Copy: we implement the model proposed in Gu et al. 2016.
• Tree2Seq: we implement the model proposed in Eriguchi et al. 2016.
Results
Criteria: BLEU-4, grammaticality (human evaluation), and correctness (human evaluation), on WikiSQL and StackOverflow.
Examples
[Generated questions from the Seq2Seq model vs. the Graph2Seq model.]
Graph2Seq + Reinforcement Learning for Question Generation (ICLR'20)
Natural Question Generation: Background
Natural question generation (QG) is a challenging yet rewarding task that aims to generate questions given an input passage and a target answer.
Natural Question Generation: Definition
Input: a text passage X^p and a target answer X^a.
Output: the best natural-language question Ŷ, which maximizes the conditional likelihood Ŷ = argmax_Y P(Y | X^p, X^a).
Existing State-of-the-art Methods
Template-based approaches (Mostow & Chen, 2009; Heilman & Smith, 2010; Heilman, 2011)
• Rely on heuristic rules or hand-crafted templates
• Low generalizability and scalability
Seq2Seq-based approaches (Du et al., 2017; Zhou et al., 2017; Song et al., 2018a; Kumar et al., 2018a)
• Fail to utilize the rich text structure information beyond the simple word sequence
• Rely on cross-entropy based sequence training, which has several limitations
• Fail to effectively utilize the answer information
Known Issues of Existing Approaches
• Issue I: fail to consider global interactions between answer and context → Solution I: a deep alignment network to align answer and context.
• Issue II: fail to consider the rich hidden structure information of the word sequence → Solution II: a novel Graph2Seq model that captures hidden structure information in the sequence.
• Issue III: limitations of cross-entropy based objectives, such as exposure bias and the inconsistency between training and test measurements → Solution III: a novel reinforcement learning loss enforcing syntactic and semantic coherence of the generated text.
RL-based Graph2Seq for QG: System Overview

Deep Answer Alignment
A deep alignment network for incorporating the answer information into passages at both the word level and the contextualized hidden-state level.
Graph Construction: Static vs. Dynamic
Syntax-based static graph: a directed and unweighted passage graph based on dependency parsing.
Encoder-Decoder Architecture
Encoding
Graph-to-Sequence Model [1][2]
Graph Convolutional Neural Network
Hybrid Evaluator
Regular cross-entropy based training objectives have limitations:
• Exposure bias
• Evaluation discrepancy between training and testing
We apply a mixed objective function combining both the cross-entropy loss and the RL loss to ensure the generation of syntactically and semantically valid text.
Human Evaluation and Ablation Study
Results
Graph4NLP: A Library for Deep Learning on Graphs for NLP

Architecture of Graph4NLP Library
• Graph construction: topology
• Node encoding / embedding: SAGE, GCN, GAT, GGNN, RGCN
• Evaluation and loss modules
What’s The Next?
• Welcome all to attend The Second International
Workshop on Deep Learning on Graphs: Methods
and Applications (DLG-KDD’20) will be held joint
with KDD MLG workshop on August 24th, 2020