
Lecture 8: Learning with Graphs

Michael Cochez
Deep Learning 2020

dlvu.github.io
THE PLAN

part 1: Introduction - Why graphs? What are embeddings?

part 2: Graph Embedding Techniques

part 3: Graph Neural Networks

part 4: Application - Query embedding

2
PART ONE - A: INTRODUCTION - GRAPHS
When was the last time you ...

reconnected with a friend?

Facebook Social Graph

http://www.businessinsider.com/explainer-what-exactly-is-the-social-graph-2012-3
When was the last time you ...

reconnected with a friend?


visited a doctor?

IBM Watson

https://www.ibm.com/think/marketing/how-watson-learns/
When was the last time you ...

reconnected with a friend?


visited a doctor?
browsed through products in a webshop?

Amazon Product Graph


When was the last time you ...

reconnected with a friend?


visited a doctor?
browsed through products in a webshop?
did a web search?
Google Knowledge Graph

https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html
When was the last time you ...

browsed through products in a webshop?


reconnected with a friend?
visited a doctor?
did a web search?
Knowledge graphs are all around us.
Other examples: Cyc, Freebase, DBPedia, Wikidata,
YAGO, Thomson Reuters, Microsoft Satori, Yahoo
KG, Springer, ...
Graphs - Undirected - Simple Graphs

9
Graphs - Undirected - Self loop

10
Graphs - Undirected - multigraph

11
Graphs - Directed

12
Graphs - Edge and node labels/types

[Figure: an example graph with nodes Bob, Double Cream, Coffee cream, cheese, and Milk, connected by labelled edges such as Likes, Dislikes, and Similar to]

13
Graphs - Edge and node labels/types - RDF (simplified)

[Figure: the same example graph as an RDF graph, where nodes and edge labels are identified by URIs such as http://example.com/Bob, http://example.com/Likes, http://example.com/Double_Cream, http://example.com/Coffee_cream, http://example.com/cheese, http://example.com/Milk, http://example.com/Similar_to, and http://example.com/Dislikes, plus a literal value 48 for http://example.com/Fat_percentage]

14
Graphs - Edge and node labels/types + weights

[Figure: the same example graph, now with a weight on each edge, e.g. Likes 0.8, Likes 0.9, Similar to 0.5, Dislikes 0.7, Similar to 0.8]

15
Graphs - Edge and node labels/types + weights

[Figure: the same example graph, where each edge carries several properties, e.g. Likes {weight: 0.8, when: morning, with: strawberries}, Likes {weight: 0.9, when: evening, with: Coffee}, Similar to {weight: 0.5}, Dislikes {weight: 0.7, when: always}, Similar to {weight: 0.8}]

16
Graphs - summary

● For a given graph, you should know whether it has:
○ Self loops or not
○ Multiple edges between the same pair of nodes (multigraph) or not
○ Directed/undirected/mixed edges
○ Edge labels (unique?)
○ Node labels (unique?)
○ Properties on edges (also called qualifiers)
○ Edge weights

● Any combination of these is possible

17
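To make this checklist concrete, here is a minimal sketch (not from the slides) of how a directed, labelled multigraph with weights and qualifiers could be represented with networkx; the nodes and edge attributes reuse the earlier Bob example and are purely illustrative.

```python
import networkx as nx

# A directed multigraph: multiple labelled edges between the same nodes,
# each with a weight and extra properties (qualifiers).
G = nx.MultiDiGraph()
G.add_edge("Bob", "Double Cream", label="Likes", weight=0.8,
           when="morning", with_="strawberries")
G.add_edge("Bob", "Coffee cream", label="Likes", weight=0.9, when="evening")
G.add_edge("Bob", "cheese", label="Dislikes", weight=0.7, when="always")
G.add_edge("Coffee cream", "Milk", label="Similar to", weight=0.8)

# Inspect every edge together with its attributes.
for u, v, data in G.edges(data=True):
    print(u, data["label"], v, data)
```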
So, what is a knowledge graph?

[Figure: the example graph combining everything above: URIs identify the nodes and edge types, edges carry properties such as weight, when, and with, and literal values such as a Fat_percentage of 48 are attached to nodes]

18
PART ONE - B: INTRODUCTION - EMBEDDINGS
Embeddings

Embeddings are low dimensional representations of objects

● Low dimension: Much lower than the original size


● Representation: There is some meaning to it, a representation
corresponds to something
● Objects: Words, sentences, images, audio, graphs, ….

20
Embedding of images

These are
embeddings!

21 image source: https://cs231n.github.io/convolutional-networks/#fc


Embedding of images - navigable space

https://arxiv.org/abs/1911.05627
Embeddings of words

Distributional hypothesis
○ “You shall know a word by the company it keeps” Firth 1957
○ “If units of text have similar vectors in a text frequency matrix, then
they tend to have similar meaning” Turney and Pantel (2010)
For example, the word ‘cat’ occurs often in the context of the word
‘animal’, and so do words like ‘dog’ and ‘fish’.
But, the word ‘loudspeaker’ hardly ever co-occurs with ‘animal’

23
Embeddings of words

Distributional hypothesis

=> We can use context information to define embeddings for words.


One vector representation for each word
Generally, we can train them (see lecture on word2vec)

24
Embedding of words - navigation

https://samyzaf.com/ML/nlp/nlp.html
25
Graphs - why use them as input to machine learning?

Classification, Regression, Clustering of nodes/edges/whole graph


Recommender systems (who likes what)
Document modeling
Entity and Document similarity (Use concepts from a graph)
Alignment of graphs (which nodes are similar?)
Link prediction and error detection
Linking text and semi-structured knowledge to graphs

26
PART TWO: Graph Embedding Techniques
The Challenge
28
Graphs - model mismatch

29
Graphs - model mismatch

30
What we skip

● Traditional ML on graphs
○ Often has problems with scalability
○ Often needs manual feature engineering
○ Task specific

31
Embedding Knowledge Graphs in Vector Spaces
Embedding - propositionalization

✘ One vector for each entity
✘ Compatible with traditional data mining algorithms and tools
✘ Preserve the information
✘ Unsupervised
✘ Task and dataset independent
✘ Efficient computation
✘ Low dimensional representation
Two Major Visions

Preserve Topology
● Keep neighbours close together
● Europe - Germany
● Africa - Algeria

Preserve Similarity
● Keep similar nodes close together
● Europe - Africa
● Germany - Algeria
Two Major Targets

Improve original data
● Knowledge Graph Completion
● Link Prediction
● Anomaly Detection

Downstream ML Tasks
● Classification/Regression/Clustering/K-NN
● Then used as part of a larger process
○ QA/dialog systems
○ Translation
○ Image segmentation
How?
Three Major Approaches for Propositionalization

Translational
● Interpret relations as translations of concepts in the embedded space

Tensor Factorization
● Make a 3D matrix and factorize it
● Reconstructing the original hints at which edges were missing

Random Walk Based
● Use the context of a concept to embed it
● Use the distributional hypothesis
Translational
XTransX - translational embedding
(Bordes et al. NIPS 2013 , Lin et al., AAAI'15)

✘ TransE
✘ TransH
✘ TransR
✘ CTransR
✘ PTransE
✘ ...
TransE - translational embedding
(Bordes et al. NIPS 2013)

Source: Structured query construction via knowledge graph embedding, 2019, Ruijie Wang et al.
TransE - translational embedding
(Bordes et al. NIPS 2013)

✘ TransE
✗ Get h+l close to t
■ If (h,l,t) is a good triple
✗ Get h+l far from t
■ If (h,l,t) is a bad triple
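As a rough illustration of the idea above, here is a minimal TransE-style sketch in PyTorch; the sizes, names, and the toy corrupted triple are assumptions for illustration, not the reference implementation of Bordes et al.

```python
import torch

# Entities and relations get d-dimensional vectors; a triple (h, l, t) is
# plausible when h + l lands close to t.
num_entities, num_relations, dim = 1000, 50, 100
ent = torch.nn.Embedding(num_entities, dim)
rel = torch.nn.Embedding(num_relations, dim)

def score(h, l, t):
    # Lower = better: distance between the translated head and the tail.
    return torch.norm(ent(h) + rel(l) - ent(t), p=1, dim=-1)

def margin_loss(pos, neg, margin=1.0):
    # Push good triples at least `margin` below corrupted (bad) triples.
    return torch.relu(margin + score(*pos) - score(*neg)).mean()

# Toy usage: one good triple and one corrupted triple (random tail).
good = (torch.tensor([0]), torch.tensor([3]), torch.tensor([7]))
bad = (torch.tensor([0]), torch.tensor([3]), torch.tensor([42]))
loss = margin_loss(good, bad)
```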
XTransX - translational embedding
(Bordes et al., NIPS 2013; Lin et al., AAAI'15)

✘ TransE
✘ TransH
✘ TransR
✘ CTransR
✘ PTransE
✘ ...

● Conceptually easy
● Embedding quality: the better the model, the less scalable
● One hop
Matrix Factorization
RESCAL - Tensor Factorization
( Nickel and Tresp, ECML PKDD 2013)
Tensor Factorization

● Conceptually easy
● Explainable
● Good for link prediction
● Usually scalable
● Numeric attributes can be included somehow
● Embedding quality for ML tasks: the better the model, the less scalable
● Multi hop
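To make the factorization idea tangible, here is a small RESCAL-style scoring sketch; the dimensions and parameter names are illustrative assumptions, not the code of Nickel and Tresp.

```python
import torch

# The knowledge graph is viewed as a 3D tensor X[h, t, r]; each entity gets a
# vector and each relation its own matrix, and an entry is reconstructed as a
# bilinear product. Missing edges with a high reconstructed score are link
# prediction candidates.
num_entities, num_relations, dim = 1000, 50, 64
E = torch.randn(num_entities, dim, requires_grad=True)
W = torch.randn(num_relations, dim, dim, requires_grad=True)

def score(h, r, t):
    # Approximates X[h, t, r] with e_h^T W_r e_t.
    return E[h] @ W[r] @ E[t]
```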
Random Walk Based
Random Walk based methods
(Cochez, et al. , ISWC '17, Cochez, et al. WIMS'17, Ristoski et
al. ISWC '16, Grover, Leskovec KDD '16)

● Use random walks to extract a context for


each node
● Use this context as input to word embedding
techniques
Random Walk based methods - RDF2Vec and Node2Vec
(Cochez, et al. WIMS'17, Ristoski et al. ISWC '16, Grover,
Leskovec KDD '16)

● Use random walks to create sequences on


the graph.
● Feed these to word2vec

● Biasing the walks helps for specific cases


● Also some other graph kernels have been
used
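A minimal sketch of this recipe: unbiased walks on a toy adjacency list, then word2vec via gensim. The graph and hyperparameters are made up for illustration and are not the RDF2Vec/node2vec reference setup.

```python
import random
from gensim.models import Word2Vec

# Toy adjacency list standing in for a (knowledge) graph.
graph = {"Bob": ["Double_Cream", "Coffee_cream"],
         "Double_Cream": ["Coffee_cream"],
         "Coffee_cream": ["Milk"],
         "Milk": []}

def random_walk(start, length=5):
    # Uniform, unbiased walk; biasing the step choice is where node2vec differs.
    walk, node = [start], start
    for _ in range(length):
        neighbours = graph.get(node, [])
        if not neighbours:
            break
        node = random.choice(neighbours)
        walk.append(node)
    return walk

walks = [random_walk(n) for n in graph for _ in range(10)]
# Feed the walks to word2vec as if they were sentences.
model = Word2Vec(walks, vector_size=32, window=3, sg=1, min_count=1)
vector_for_bob = model.wv["Bob"]
```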
Global Embeddings - KGloVe
(Cochez, Ristoski, et al. , ISWC '17)

● Create co-occurrence stats using


Personalized PageRank
○ All pairs PPR
● Apply the GloVe model
○ Similar things will have similar contexts
○ Optimizes for preserving analogy
■ King - Man + Woman ≃ Queen
■ Berlin - Germany + Austria ≃ Vienna
○ Complete context captured
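A sketch of the first step only: personalized PageRank by power iteration on a tiny adjacency matrix. The matrix, damping factor, and iteration count are illustrative, and the GloVe training step itself is omitted.

```python
import numpy as np

# Running PPR from every node gives all-pairs scores, which act as the
# co-occurrence statistics for a GloVe-style model.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix

def personalized_pagerank(seed, alpha=0.85, iters=50):
    n = P.shape[0]
    restart = np.zeros(n)
    restart[seed] = 1.0                 # restart distribution of the seed node
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        pr = alpha * pr @ P + (1 - alpha) * restart
    return pr

cooccurrence = np.stack([personalized_pagerank(i) for i in range(P.shape[0])])
```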
Random walk based methods
(Cochez, et al. , ISWC '17, Cochez, et al. WIMS'17, Ristoski et
al. ISWC '16, Grover, Leskovec KDD '16)

● Deals with large graphs
● Good embeddings
● Good training time
● Likely or partially explainable
● Larger context used
● Link prediction
PART THREE: Graph Neural Networks
Graph Convolutional Networks (GCN: Kipf and Welling,
ICLR'17, RGCN: Schlichtkrull et al. ESWC'18)

● How can we directly incorporate graph information into a machine learning algorithm?
○ Especially for end-to-end learning

Graph Convolutional Network - Merge the two worlds
GCN - Example Graph

[Figure: an example graph with nodes A, B, C, D]

GCN - Example Graph - 1 Layer

[Figure: the example graph unrolled into one message-passing layer]

GCN - Example Graph - 2 Layer

[Figure: the example graph unrolled into two message-passing layers]

GCN - Example Graph - 2 Layer with Connections

[Figure: the unrolled layers, now with the connections between consecutive layers drawn]

GCN - Example Graph - Multi Layer

[Figure: the unrolled view repeated over multiple layers]

GCN - Example Graph - Weights

Each Edge is a neural network!

GCN - Example Graph - Weights

The input to EACH Node is a vector!

GCN - Example Graph - Weights

The output for EACH Node is a vector!

GCN - Example Graph - Weights Sharing

The weight matrix is shared!

GCN - Example Graph - Self Loops

… and self-loops are added
GCN - A different view - sparse matrix multiplications

In practice, what was presented does not scale well
○ (Except with clever engineering)
In practice, more normalization is needed

65
GCN - A different view - sparse matrix multiplications

Reformulation:

H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} )

H^{(l)} is the l-th layer in the unrolled network (the l-th time-step)
A is the adjacency matrix; Ã is the same, but with the diagonal also set to 1
D̃ is the diagonal degree matrix of Ã; the D̃^{-1/2} factors are used for normalization
W^{(l)} is a learnable weight matrix for layer l

66
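Following the reformulation above, a dense minimal sketch of one GCN layer; in practice the adjacency would be stored as a sparse matrix, as the slide title suggests, and the example graph and feature sizes below are made up.

```python
import torch

def gcn_layer(A, H, W):
    # A: (n, n) adjacency matrix, H: (n, d_in) node features, W: (d_in, d_out)
    A_tilde = A + torch.eye(A.shape[0])        # add self-loops
    d = A_tilde.sum(dim=1)                     # node degrees
    D_inv_sqrt = torch.diag(d.pow(-0.5))       # D̃^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # normalized adjacency
    return torch.relu(A_hat @ H @ W)           # one message-passing step

# Example: 4 nodes (A, B, C, D), 3-dimensional input features, 2 outputs.
A = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 1., 1.],
                  [1., 1., 0., 0.],
                  [0., 1., 0., 0.]])
H0 = torch.randn(4, 3)
W0 = torch.randn(3, 2)
H1 = gcn_layer(A, H0, W0)
```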
That's it, we can include this structure in a larger network.
Examples of tasks

● Node classification
○ What is the type of a node?
● Regression of attributes in the graph
○ What is the price of the product?
● Regression/classification on the complete graph (by combining the output)
○ What is the boiling point of a molecule?
○ Is this molecule poisonous?

69
What if the graph has typed edges?
RGCN

[Figure: unrolled view of the example graph]

The weight matrix is shared per edge type!
RGCN - Reverse edges

[Figure: unrolled view of the example graph]

Also reverse edges (inverse relations) are added
RGCN - formally

In matrix multiplication form, the R-GCN is computed as follows:

h_i^{(l+1)} = σ( Σ_{r ∈ R} Σ_{j ∈ N_i^r} (1/c_{i,r}) W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} )

-> this formulation is per node in the graph, not for all nodes at once, as was done in the GCN formulation!

73
RGCN - formally

h_i^{(l)} is the i-th node in the l-th layer (= l-th message passing step)
W_r^{(l)} is the weight matrix for relation r at layer l (R is the set of all relations)
W_0^{(l)} is the weight matrix for the self loop
N_i^r is the set of neighbours of node i with respect to relation r

74
RGCN - formally

c_{i,r} is a normalization constant; usually c_{i,r} is |N_i^r|

78
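A per-node sketch of this update in plain Python/PyTorch; the dictionary-based graph encoding and names are assumptions for illustration, not the R-GCN reference code.

```python
import torch

def rgcn_update(h, neighbours, W_r, W_0):
    # h: dict node -> (d,) vector at layer l
    # neighbours: dict node -> dict relation -> list of neighbour nodes
    # W_r: dict relation -> (d_out, d_in) weight matrix, W_0: self-loop matrix
    h_next = {}
    for i, h_i in h.items():
        msg = W_0 @ h_i                              # self-loop term
        for r, nbrs in neighbours[i].items():
            c_ir = len(nbrs)                         # normalization |N_i^r|
            for j in nbrs:
                msg = msg + (W_r[r] @ h[j]) / c_ir   # relation-specific message
        h_next[i] = torch.relu(msg)
    return h_next

# Toy usage: two nodes, one relation "likes".
d = 4
h0 = {"A": torch.randn(d), "B": torch.randn(d)}
nbrs = {"A": {"likes": ["B"]}, "B": {"likes": ["A"]}}
W_rel = {"likes": torch.randn(d, d)}
h1 = rgcn_update(h0, nbrs, W_rel, torch.randn(d, d))
```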
In general, we can do whatever we
want in the unrolled view...
and see whether we can implement
it somehow more efficiently...
GCN - We can do whatever we want

This neural network can be whatever architecture you have seen in this course!!!

Each Edge is a neural network!
GCN - Example Graph - Weights Sharing

You can share weights in whatever way seems to make sense.

The weight matrix is shared!

See also Scarselli, Franco, et al. "The graph neural network model." IEEE Transactions on Neural Networks 20.1 (2008): 61-80.
PART FOUR: Application - Query embedding

https://arxiv.org/abs/2002.02406
Knowledge graphs

● Can model interactions and properties


○ Medicine, biology, world facts, ...
● In general, useful for
○ Storing facts about entities and
relations
○ Answering questions about them

https://commons.wikimedia.org/wiki/File:Moreno_Sociogram_1st_Grade.png
Queries on knowledge graphs

● SPARQL queries operate on existing edges
○ Select all projects, related to ML, on which Alice works
○ Answer: Proj1

[Figure: example graph with nodes Alice, Bob, VU, ML, Proj1, Proj2]
Queries on knowledge graphs

● SPARQL queries operate on existing edges
○ Select all projects, related to ML, on which Alice works
○ Answer: Proj1
● Is Proj2 a likely answer?

[Figure: the same example graph]
Link prediction on knowledge graphs

● Assign a vector to every node: an embedding
● The score of an edge is a function of the embeddings of the entities involved
● Optimize:
○ Maximize scores of existing edges
○ Minimize scores of random edges
● Examples: TransE, DistMult, ComplEx

[Figure: the same example graph]
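To illustrate one of the scoring functions named above (DistMult), a minimal sketch; the embedding sizes and the toy triple are arbitrary assumptions.

```python
import torch

# DistMult: the score of an edge (h, r, t) is a trilinear product of the
# three embeddings. Training maximizes the score of existing edges and
# minimizes it for randomly corrupted ones.
num_entities, num_relations, dim = 100, 10, 32
ent = torch.nn.Embedding(num_entities, dim)
rel = torch.nn.Embedding(num_relations, dim)

def distmult_score(h, r, t):
    # Higher score = more plausible edge.
    return (ent(h) * rel(r) * ent(t)).sum(dim=-1)

score = distmult_score(torch.tensor(0), torch.tensor(1), torch.tensor(2))
```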
Link prediction for complex queries?

● Select all topics T,
● where T is related to a project P,
● and Alice and Bob work on P.

● Link prediction requires enumerating all possible T and P
○ Grows exponentially!

[Figure: the same example graph]
Link prediction for complex queries?

● Select all topics T,
● where T is related to a project P,
● and Alice and Bob work on P.

● Link prediction requires enumerating all possible T and P
○ Grows exponentially!

[Figure: a subset of Wikidata]
Queries are graphs too

● In particular, Basic Graph Patterns [1]

[1] Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 query language. W3C recommendation 21(10) (2013)

Queries are graphs too

● In particular, Basic Graph Patterns [1]
● Select all topics T where
○ T is related to project P
○ Alice works on P and Bob works on P

[Figure: query graph with Alice, Bob, P, and target node T]

[1] Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 query language. W3C recommendation 21(10) (2013)
Embedding queries

[Figure: a small query graph built from the terms ex:Author, rdf:type, the literal “Bob”, ex:wrote, and a variable ?T, which is fed into the Query Encoder]

:S1 hasSubject :Bob .
:S1 hasPredicate rdf:type .
:S1 hasObject ex:author .

[Figure: the example knowledge graph with Alice, Bob, VU, ML, Proj1, Proj2]
Embedding queries

[Figure: the query graph (Alice, Bob, P, target T) is passed through the Query Encoder; a Nearest Neighbor Search against the knowledge graph embeddings yields the Answer]
The query encoder

● Graph Convolutional Networks operate on graphs by applying message passing:
○ Messages are vectors
● Message-Passing Query Embedding:
○ Learnable parameters include both entity and variable node embeddings
○ Propagate messages across the BGP

[Figure: query graph with Alice, Bob, P, and target node T]
The query encoder

● Graph Convolutional Networks operate on graphs by applying message passing:
○ Messages are vectors
● Message-Passing Query Embedding:
○ Learnable parameters include both entity and variable node embeddings
○ Propagate messages across the BGP
○ After k steps of MP, map all node messages to a single query embedding

[Figure: query graph with Alice, Bob, P, and target node T]
Query embedding
Graph aggregation functions

● Map node messages to the query embedding
● Ideally permutation invariant
● Can contain learnable parameters for increased flexibility
● Simplest form: the message at the target node

[Figure: query graph with Alice, Bob, P, and target node T]
Query embedding
Graph aggregation functions
● Sum
● Max
● MLP

● CMLP

● TMLP
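A small sketch of what such aggregation functions can look like, given the final node messages of a query graph; the shapes, node order, and target index are illustrative assumptions.

```python
import torch

node_messages = torch.randn(3, 64)   # e.g. messages for the Alice, Bob, T nodes
target_index = 2                     # index of the target variable node

q_sum = node_messages.sum(dim=0)           # Sum aggregation
q_max = node_messages.max(dim=0).values    # Max aggregation
q_tm = node_messages[target_index]         # simplest form: target-node message

# MLP-style aggregation: transform each message, then pool.
mlp = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
q_mlp = mlp(node_messages).sum(dim=0)
```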
Why is this a good idea?

● Query encoded in embedding space before matching
● Answering is then O(n) instead of exponential
● MPQE encodes arbitrary queries

[Figure: query graph with Alice, Bob, P, and target node T]
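A sketch of that O(n) answering step; the entity count, dimensionality, and the use of cosine similarity are assumptions for illustration.

```python
import torch

# Embed the query once, then score all n entity embeddings and return the
# nearest ones, instead of enumerating variable bindings.
num_entities, dim = 10_000, 64
entity_emb = torch.randn(num_entities, dim)
query_emb = torch.randn(dim)

scores = torch.nn.functional.cosine_similarity(entity_emb, query_emb.unsqueeze(0))
top_answers = scores.topk(k=10).indices   # a single O(n) scan over the entities
```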
Evaluation

● Queries obtained from the KG:
○ Sample subgraphs
○ Replace some entities by variables

[Figure: example query structures sampled from the knowledge graph with Alice, Bob, VU, ML, Proj1, Proj2]
Evaluation
● Queries for training obtained after dropping some edges
● 4 knowledge graphs
Evaluation
● Crucial question: how does a method generalize to unseen queries?
● Two scenarios:
○ Train on all 7 structures, evaluate on same structures
○ Train on 1-chain queries only, evaluate on all 7 structures
Results - all query types

● We obtain competitive performance with previous work.


● Message-passing alone (RGCN-TM) is an effective mechanism
Results - 1-chain queries

● By training for link prediction only, our method generalizes to the other 6, more complex query structures that were not seen during training
Learned representations

● Compared to previous methods (right), our method (left) learns embeddings that cluster according to
the type of the entity.
● This points to future applications in learning better embeddings for KGs
Using R-GCN for Query embedding - Conclusion
● The proposed architecture is simple and learns entity and type embeddings useful for solving the
task
● Our method allows encoding a general set of queries defined in terms of BGPs, by learning entity
and variable embeddings and not constraining the query structure
● The message passing mechanism across the BGP generalizes better than previous methods
● Embeddings successfully capture the notion of entity types without supervision
THE PLAN

part 1: Introduction - Why graphs? What are embeddings?

part 2: Graph Embedding Techniques

part 3: Graph Neural Networks

part 4: Application - Query embedding
