Michael Cochez
Deep Learning 2020
dlvu.github.io
THE PLAN
PART ONE - A: INTRODUCTION - GRAPHS
When was the last time you ...
http://www.businessinsider.com/explainer-what-exactly-is-the-social-graph-2012-3
When was the last time you ...
IBM Watson
https://www.ibm.com/think/marketing/how-watson-learns/
When was the last time you ...
https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html
When was the last time you ...
Graphs - Undirected - Self loop
Graphs - Undirected - multigraph
Graphs - Directed
Graphs - Edge and node labels/types
[Figure: a labelled graph. Bob Likes Double Cream and Coffee cream, and Dislikes cheese; Similar-to edges connect Double Cream, Coffee cream, cheese, and Milk.]
Graphs - Edge and node labels/types - RDF (simplified)
[Figure: the same graph with RDF IRIs. Nodes such as http://example.com/Bob, http://example.com/Double_Cream (with http://example.com/Fat_percentage 48), http://example.com/Coffee_cream, http://example.com/cheese, and http://example.com/Milk, connected by http://example.com/Likes, http://example.com/Dislikes, and http://example.com/Similar_to edges.]
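For concreteness, a few of these triples can be built programmatically. Below is a minimal sketch using the rdflib library (our choice, not from the slides); the triples shown are only a subset of the figure.

```python
from rdflib import Graph, Literal, URIRef

EX = "http://example.com/"
g = Graph()
# A few of the triples from the figure (not the complete graph).
g.add((URIRef(EX + "Bob"), URIRef(EX + "Likes"), URIRef(EX + "Double_Cream")))
g.add((URIRef(EX + "Bob"), URIRef(EX + "Dislikes"), URIRef(EX + "cheese")))
g.add((URIRef(EX + "Double_Cream"), URIRef(EX + "Fat_percentage"), Literal(48)))
print(g.serialize(format="nt"))  # one triple per line, in N-Triples syntax
```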
Graphs - Edge and node labels/types + weights
[Figure: the same graph with edge weights: Likes 0.9, Similar to 0.5, Dislikes 0.7, Similar to 0.8.]
Graphs - Edge and node labels/types + weights
[Figure: the same graph with property-annotated edges, e.g. Likes (weight: 0.8, when: morning, with: strawberries), Likes (weight: 0.9, when: evening, with: Coffee), Similar to (weight: 0.5), Dislikes (weight: 0.7, when: always), Similar to (weight: 0.8).]
Graphs - summary
● Multigraph or not
● Directed/undirected/mix
● Edge weights
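These modelling choices map directly onto code. A minimal sketch with the networkx library (our choice, not from the slides) showing each variant:

```python
import networkx as nx

g = nx.Graph()                     # undirected graph
g.add_edge("A", "A")               # a self-loop is allowed

mg = nx.MultiGraph()               # multigraph: parallel edges between nodes
mg.add_edge("A", "B")
mg.add_edge("A", "B")              # a second, distinct edge between A and B

dg = nx.DiGraph()                  # directed graph
dg.add_edge("A", "B", weight=0.9)  # an edge weight, stored as an attribute
```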
What, then, is a knowledge graph?
[Figure: the property-annotated example graph (Bob, Double Cream, Coffee cream, cheese, Milk) shown together with its RDF version using http://example.com/ IRIs.]
PART ONE - B: INTRODUCTION - EMBEDDINGS
Embeddings
Embedding of images
These are embeddings!
https://arxiv.org/abs/1911.05627
Embeddings of words
Distributional hypothesis
○ “You shall know a word by the company it keeps” (Firth, 1957)
○ “If units of text have similar vectors in a text frequency matrix, then they tend to have similar meaning” (Turney and Pantel, 2010)
For example, the word ‘cat’ occurs often in the context of the word ‘animal’, and so do words like ‘dog’ and ‘fish’. But the word ‘loudspeaker’ hardly ever co-occurs with ‘animal’.
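A minimal numpy sketch of this idea, with made-up co-occurrence counts: words with similar context vectors get a high cosine similarity.

```python
import numpy as np

# Rows: 'cat', 'dog', 'loudspeaker'; columns: context words
# 'animal', 'pet', 'sound', 'electronics'. The counts are invented.
counts = np.array([
    [10.0, 8.0, 3.0,  0.0],   # 'cat'
    [ 9.0, 7.0, 4.0,  0.0],   # 'dog'
    [ 0.0, 0.0, 5.0, 12.0],   # 'loudspeaker'
])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(counts[0], counts[1]))  # high: 'cat' and 'dog' are similar
print(cosine(counts[0], counts[2]))  # low: 'cat' vs 'loudspeaker'
```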
Embedding of words - navigation
https://samyzaf.com/ML/nlp/nlp.html
Graphs - why use them as input to machine learning?
PART TWO: Graph Embedding Techniques
The Challenge
Graphs - model mismatch
What we skip
● Traditional ML on graphs
○ Often has problems with scalability
○ Often needs manual feature engineering
○ Task specific
Embedding Knowledge Graphs in Vector Spaces
Embedding - propositionalization
✘ TransE
✘ TransH
✘ TransR
✘ CTransR
✘ PTransE
✘ ...
TransE - translational embedding
(Bordes et al., NIPS 2013)
✘ TransE
✗ Get h + l close to t
■ if (h, l, t) is a good triple
✗ Get h + l far from t
■ if (h, l, t) is a bad (corrupted) triple
XTransX - translational embedding
(Bordes et al., NIPS 2013; Lin et al., AAAI 2015)
✘ TransE
✘ TransH
✘ TransR
✘ CTransR
✘ PTransE
✘ ...
Trade-off: conceptually easy vs. embedding quality; the better the model, the less scalable. All of these remain one-hop methods.
Matrix Factorization
RESCAL - Tensor Factorization
(Nickel and Tresp, ECML PKDD 2013)
[Figure: the knowledge graph (nodes A, B, C, D) represented as a 3D adjacency tensor, which is then factorized.]
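In RESCAL, each entity gets a latent vector and each relation r a core matrix R_r, so a triple's plausibility is a bilinear score. A minimal numpy sketch (sizes and names are ours, for illustration only):

```python
import numpy as np

n_entities, rank = 5, 3
A = np.random.randn(n_entities, rank)   # one latent vector per entity
R_r = np.random.randn(rank, rank)       # core matrix for one relation r

def score(h, t):
    # Plausibility of the triple (h, r, t):
    # the (h, t) entry of A @ R_r @ A.T
    return A[h] @ R_r @ A[t]

print(score(0, 1))
```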
GCN - Example Graph - 1 Layer
[Figure: the example graph (nodes A, B, C, D) unrolled into a one-layer network.]
GCN - Example Graph - 2 Layer
[Figure: the example graph unrolled into a two-layer network.]
GCN - Example Graph - 2 Layer with Connections
[Figure: the two-layer unrolled network, with connections following the edges of the graph.]
GCN - Example Graph - Multi Layer
[Figure: the unrolled network continued over many layers.]
GCN - Example Graph - Weights
[Figure: the unrolled network over nodes A, B, C, D, with a weight matrix W on each connection.]
Each edge is a neural network!
GCN - Example Graph - Weights
[Figure: the unrolled network]
The input to EACH node is a vector!
GCN - Example Graph - Weights
[Figure: the unrolled network]
The output for EACH node is a vector!
GCN - Example Graph - Weights Sharing
[Figure: the unrolled network]
The weight matrix W is shared!
GCN - Example Graph - Self Loops
[Figure: the unrolled network]
… and self-loops are added.
GCN - A different view - sparse matrix multiplications
Reformulation:
H^{(l+1)} = σ( Ã H^{(l)} W^{(l)} )
H^{(l)} is the l-th layer in the unrolled network (the l-th time step)
A is the adjacency matrix; Ã is the same with the diagonal also set to 1
W^{(l)} is a learnable weight matrix for layer l
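A minimal numpy sketch of one such layer (dense matrices for readability; real implementations use sparse multiplications, e.g. scipy.sparse):

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # an illustrative 4-node graph
A_tilde = A + np.eye(4)                    # set the diagonal to 1 (self-loops)

H = np.random.randn(4, 8)                  # H^(l): one row per node
W = np.random.randn(8, 8)                  # W^(l): learnable weights

H_next = np.maximum(0.0, A_tilde @ H @ W)  # H^(l+1) = ReLU(Ã H^(l) W^(l))
```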
GCN - A different view - sparse matrix multiplications
With the resulting node representations we can solve several tasks:
● Node classification
○ What is the type of a node?
● Regression of attributes in the graph
○ What is the price of the product?
● Regression/classification on the complete graph (by combining the output)
○ What is the boiling point of a molecule?
○ Is this molecule poisonous?
What if the graph has typed edges?
RGCN
[Figure: the unrolled network over the example graph]
The weight matrix is shared per edge type!
RGCN - Reverse edges
[Figure: the unrolled network over the example graph]
Also reverse edges (inverse relations) are added.
RGCN - formally
→ this formulation is per node in the graph, not for all nodes at once, as was done in the GCN formulation!
RGCN - formally
h_i^{(l+1)} = σ( W_0^{(l)} h_i^{(l)} + Σ_{r ∈ R} Σ_{j ∈ N_i^r} (1 / c_{i,r}) W_r^{(l)} h_j^{(l)} )
h_i^{(l)} is the representation of the i-th node in the l-th layer (= the l-th message-passing step)
W_r^{(l)} is the weight matrix for relation r at layer l (R is the set of all relations)
W_0^{(l)} is the weight matrix for the self-loop
N_i^r is the set of neighbours of node i with respect to relation r
c_{i,r} is a normalization constant; usually c_{i,r} = |N_i^r|
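A minimal numpy sketch of this per-node update (the function and variable names are ours; real implementations vectorize over all nodes at once):

```python
import numpy as np

def rgcn_update(i, h, N, W, W0):
    """One RGCN message-passing step for node i.

    h[j]: the current vector h_j^(l) of node j
    N[r]: the neighbour set N_i^r for relation r
    W[r]: the weight matrix W_r^(l); W0: the self-loop matrix
    """
    out = W0 @ h[i]                       # self-loop term
    for r, neighbours in N.items():
        if not neighbours:
            continue
        c_ir = len(neighbours)            # normalization constant |N_i^r|
        for j in neighbours:
            out += (W[r] @ h[j]) / c_ir
    return np.maximum(0.0, out)           # h_i^(l+1), with sigma = ReLU
```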
In general, we can do whatever we
want in the unrolled view...
and see whether we can implement
it somehow more efficiently...
GCN - We can do whatever we want
[Figure: the unrolled network over the example graph]
This neural network can be whatever architecture you have seen in this course!!!
Each edge is a neural network!
GCN - Example Graph - Weights Sharing
[Figure: the unrolled network over the example graph]
You can share weights in whatever way seems to make sense.
The weight matrix is shared!
See also Scarselli, Franco, et al. "The graph neural network model." IEEE Transactions on Neural Networks 20.1 (2009).
PART FOUR: Application - Query embedding
https://arxiv.org/abs/2002.02406
Knowledge graphs
https://commons.wikimedia.org/wiki/File:Moreno_Sociogram_1st_Grade.png
Queries on knowledge graphs
[Figure: an example query over the knowledge graph, involving nodes such as Proj2 and Bob.]
Queries on knowledge graphs
A subset of Wikidata
Queries are graphs too
● In particular, Basic Graph Patterns¹
¹ Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 query language. W3C recommendation 21(10) (2013)
Queries are graphs too
● In particular, Basic Graph Patterns¹
● Select all topics T where
○ T is related to project P
○ Alice works on P and Bob works on P
[Figure: the query as a graph: Alice and Bob each connect to P, and P connects to T (the target node).]
¹ Harris, S., Seaborne, A., Prud’hommeaux, E.: SPARQL 1.1 query language. W3C recommendation 21(10) (2013)
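For concreteness, this query written as a SPARQL Basic Graph Pattern and run with rdflib; the ex: vocabulary and the kg.ttl file are made up for illustration and are not from the slides.

```python
from rdflib import Graph

query = """
PREFIX ex: <http://example.com/>
SELECT ?T WHERE {
  ?T ex:related_to ?P .        # T is related to project P
  ex:Alice ex:works_on ?P .    # Alice works on P
  ex:Bob ex:works_on ?P .      # Bob works on P
}
"""
for row in Graph().parse("kg.ttl").query(query):  # kg.ttl: hypothetical KG file
    print(row.T)
```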
Embedding queries
[Figure: a query graph with variable ?T, edges labelled ex:wrote and rdf:type, the literal “Bob”, and the class ex:Author, is fed into a Query Encoder; the encoder places the query in the same embedding space as the entities Proj1, VU, ML, Proj2, and Bob.]
Embedding queries
[Figure: the query graph (Alice and Bob connect to P, P connects to T) goes through the Query Encoder; a Nearest Neighbour Search over the entity embeddings (Alice, Proj1, VU, ML, Proj2, Bob) returns the answer.]
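That last step is just a nearest-neighbour lookup in the embedding space. A minimal numpy sketch (the query encoder and the entity embedding table are assumed to exist already):

```python
import numpy as np

def answer(query_vec, entity_emb, entity_names):
    # Return the entity whose embedding is closest to the query embedding.
    dists = np.linalg.norm(entity_emb - query_vec, axis=1)
    return entity_names[int(np.argmin(dists))]

# Hypothetical usage: emb has shape (n_entities, d), q comes from the encoder.
# print(answer(q, emb, ["Alice", "Proj1", "VU", "ML", "Proj2", "Bob"]))
```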
The query encoder
[Figure: the query graph: Alice and Bob connect to P, P connects to T (target node)]
● Graph Convolutional Networks operate on graphs, by applying message passing:
○ Messages are vectors
Query embedding
Graph aggregation functions
● Sum
● Max
● MLP
● CMLP
● TMLP
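The simplest of these pool the final node states of the query graph into a single query vector. A minimal numpy sketch of the first two (CMLP and TMLP are MPQE-specific variants not shown here):

```python
import numpy as np

def agg_sum(node_states):
    # node_states: (n_nodes, d) matrix of final message-passing states.
    return node_states.sum(axis=0)

def agg_max(node_states):
    # Element-wise maximum over the nodes of the query graph.
    return node_states.max(axis=0)
```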
Why is this a good idea?
[Figure: the query graph, with T as the target node]
● MPQE encodes arbitrary queries
Query embedding
Evaluation
● Queries obtained from the KG:
○ Sample subgraphs
○ Replace some entities by variables
[Figure: a subgraph around Alice, Bob, Proj1, Proj2, VU, and ML is sampled from the KG, and some entities are replaced by variables to produce query graphs.]
Evaluation
● Queries for training obtained after dropping some edges
● 4 knowledge graphs
Evaluation
● Crucial question: how does a method generalize to unseen queries?
● Two scenarios:
○ Train on all 7 structures, evaluate on same structures
○ Train on 1-chain queries only, evaluate on all 7 structures
Results - all query types
● By training for link prediction only, our method generalizes to the 6 other, more complex query structures that were not seen during training
Learned representations
● Compared to previous methods (right), our method (left) learns embeddings that cluster according to the type of the entity.
● This points to future applications in learning better embeddings for KGs
Using R-GCN for Query embedding - Conclusion
● The proposed architecture is simple and learns entity and type embeddings useful for solving the task
● Our method allows encoding a general set of queries defined in terms of BGPs, by learning entity and variable embeddings and not constraining the query structure
● The message-passing mechanism across the BGP generalizes better than previous methods
● Embeddings successfully capture the notion of entity types without supervision
THE PLAN