
SEMINAR: GRAPH-BASED METHODS FOR NLP

Organizational notes:

•  The seminar takes place entirely in May
•  Seminar papers are due by July 15 (?)
•  Help with the seminar talk / paper is available on the website
•  TUCaN number for registration: 20-00-0596-se



Schedule

Motivation for graph representation
§  Graphs are an intuitive and natural way to encode entities (e.g. language units) as nodes and their relations (e.g. similarities) as edges (directed / undirected)
§  a feature-based representation can be transformed into a graph via a similarity measure
§  graphs may not necessarily be transformed back into a feature representation (at least not a unique one). Think of e.g. points in n-dimensional space.
 

Graph  isomorphism  



Graph representations

A graph can be stored either as an Adjacency Matrix or as an Adjacency List. Additional information such as edge weights can be stored easily in both representations.
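As an illustration, a minimal Python sketch (using a small made-up graph) of both representations, storing edge weights directly:

    # A small hypothetical weighted, undirected graph stored both ways.
    import numpy as np

    nodes = ["A", "B", "C", "D"]
    edges = [("A", "B", 0.5), ("B", "C", 1.0), ("A", "C", 0.2)]  # (u, v, weight)
    idx = {n: i for i, n in enumerate(nodes)}

    # Adjacency matrix: O(|V|^2) space, O(1) edge lookup
    matrix = np.zeros((len(nodes), len(nodes)))
    for u, v, w in edges:
        matrix[idx[u], idx[v]] = matrix[idx[v], idx[u]] = w  # undirected: symmetric

    # Adjacency list: O(|V| + |E|) space, fast iteration over neighbors
    adj_list = {n: [] for n in nodes}
    for u, v, w in edges:
        adj_list[u].append((v, w))
        adj_list[v].append((u, w))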



Motivation for graph representation
There exist efficient algorithms that directly operate on graphs.



Efficient Algorithms?

P = NP?



Efficient Algorithms!

There are efficient (polynomial) algorithms for the exact solution of many problems on graphs, e.g.
•  Graph Traversal (DFS, Shortest Paths, Max-Capacity Paths, …) (see the sketch below)
•  Optimal Trees and Branchings (MST, MAX-FOREST, MAX-BRANCHING, …)
•  Graph Clustering (Min-Cut, Markov Clustering, Chinese Whispers, …)
•  Graph Ranking (PageRank, Random Walks, Markov Chain Theory)
•  Graph Distances (local: Paths, global: Graph Edit Distance, …)
•  Flows on Graphs (MAX-FLOW, MIN-COST FLOW, …)
•  Matching and Assignment (Hungarian Method, Edmonds' Algorithm)
•  many more
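As a concrete illustration of one of these (shortest paths), a minimal sketch assuming the networkx library and a small made-up weighted graph:

    import networkx as nx

    G = nx.Graph()
    G.add_weighted_edges_from([("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 4.0)])

    # Dijkstra's algorithm: polynomial, O(|E| + |V| log |V|) with a heap
    path = nx.dijkstra_path(G, "A", "C", weight="weight")
    print(path)  # ['A', 'B', 'C']: going via B (cost 3.0) beats the direct edge (4.0)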
 
 
 
Efficient Algorithms!

There are efficient approximation algorithms and heuristics for the approximate solution of many graph problems, e.g.
•  Subgraph Problems (Dense Subgraphs, Minors, …)
•  Optimal Tour Problems (TSP, PCTSP, VRP, …)
•  Steiner Trees
•  many more

There are simple heuristics that often yield quite good results, such as for example k-OPT for the Euclidean TSP (a sketch follows below).
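A minimal sketch of 2-OPT (the k = 2 case of k-OPT) on made-up points: repeatedly reverse a segment of the tour whenever that shortens it.

    import itertools, math

    points = [(0, 0), (1, 5), (2, 1), (5, 4), (6, 0)]  # hypothetical cities

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def tour_length(tour):
        return sum(dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
                   for i in range(len(tour)))

    tour = list(range(len(points)))      # start with an arbitrary tour
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(1, len(tour)), 2):
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if tour_length(candidate) < tour_length(tour):
                tour, improved = candidate, True
    print(tour, tour_length(tour))       # a local optimum, not necessarily optimal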
 



Why efficiency is crucial

Graphs are usually large-scale
•  In 2008, the English Wikipedia had 2,301,486 articles* with 55,550,003 links between them

Graphs are usually dense and strongly connected
•  The largest strongly connected component of Wikipedia has 2,111,480 articles.

Remember from the last lecture
•  Graphs in NLP are usually scale-free and have the small world property (high clustering coefficient)

→ Problem solutions often consider only small subgraphs (local neighborhoods), but an a priori partitioning is usually not possible (this yields small time complexity but full space complexity)

* by today there are almost 4 million articles
 
PageRank
§  First-generation Google global ranking algorithm (1998)
§  Measures the (query-independent) importance of a Web page based solely on the link structure.
§  Assigns each node a numerical score between 0 and 1, its PageRank.
§  Ranks Web pages based on PageRank values.

General Idea:
§  every page has a number of in-links (back links) and out-links (forward links)
§  pages with more in-links are more important
§  in-links from important pages are more important





Definition of PageRank

u: a web page, R(u) its PageRank
Fu: set of pages u points to (forward links)
Bu: set of pages that point to u (backward links)
|Fu|: the number of links from u
N: total number of pages
d: damping factor, default d = 0.85

$$R(u) = d \cdot \sum_{v \in B_u} \frac{R(v)}{|F_v|} + \frac{1-d}{N}$$

[Figure: example link graph over pages A, B, C, D and X]

§  The equation is recursive, but it may be computed by starting with any set of ranks and iterating the computation until it converges.
§  Rank sink problem: a cycle of pages that accumulates rank within the cycle, but never distributes rank outside
§  Need damping: a uniform rank distribution over all pages
Random Surfer Model

§  When normalizing PageRank over all pages to 1, R(u) can be thought of as the probability that a random surfer looks at page u.
§  Damping corresponds to "teleportation": with probability (1 − d), the random surfer is teleported to some other page chosen uniformly at random

[Figure: example link graph over pages A, B, C, D and X]


Computation of PageRank

§  Numeric: simulate a lot of random surfers: the Power method of Eigenvector computation
§  initialize all pages with the same rank
§  repeat until convergence:
§  for all pages u: compute Rt+1(u) on the basis of Rt(v)
§  t := t + 1

input : transition matrix M of size N × N, error tolerance ε
output: eigenvector p

p0 = (1/N) · 1
t = 0
repeat until δ < ε:
    t = t + 1
    pt = MT pt−1
    δ = ||pt − pt−1||
return pt
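A minimal runnable sketch of this power iteration with NumPy, using a small hypothetical adjacency matrix and the damped update from the PageRank definition:

    import numpy as np

    A = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 0]], dtype=float)  # A[i, j] = 1 if page i links to page j
    M = A / A.sum(axis=1, keepdims=True)    # row-normalize: out-link probabilities

    N, d, eps = A.shape[0], 0.85, 1e-9
    p = np.full(N, 1.0 / N)                 # p0: uniform initial rank
    while True:
        p_next = d * M.T @ p + (1 - d) / N  # damped power iteration step
        if np.linalg.norm(p_next - p) < eps:
            break
        p = p_next
    print(p_next)                           # PageRank scores, summing to ~1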
LexRank:
Application to Multi-Document Summarization

Multi-document summarization task:
1.  identify important topics of the documents to be summarized
2.  identify sentences belonging to a certain topic
3.  from these sentences belonging to the same topic, select the ones that best describe the topic
4.  concatenate sentences from different topics and make sure they fit together

Consider sub-problem 3:

Input: sentences that talk about more or less the same thing

Output: scores for those sentences that reflect how well a single sentence represents that topic

Solution idea: use measures on a sentence similarity graph
From Sentences to TF*IDF vectors

[Figure: three example sentences ("This is a sentence that talks about some topic.", "And here is another sentence that talks about something slightly different.", "And here is yet another one of these notorious sentences") mapped to TF count vectors over words w1..wn and then to TF*IDF feature vectors; the DF row gives each word's document frequency]

This is the same as the vector space model for Information Retrieval.

$$\mathrm{IDF}(w) = \log\left(\frac{\text{total number of sentences}}{\mathrm{DF}(w)}\right)$$
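A minimal sketch, assuming scikit-learn (whose TfidfVectorizer uses a smoothed IDF variant of the formula above):

    from sklearn.feature_extraction.text import TfidfVectorizer

    sentences = [
        "This is a sentence that talks about some topic.",
        "And here is another sentence that talks about something slightly different.",
        "And here is yet another one of these notorious sentences.",
    ]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(sentences)  # sparse matrix: one TF*IDF row per sentence
    print(X.shape)                           # (3, vocabulary size)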
From TF*IDF vectors to sentence similarity graph

§  Sentence similarity graph:
§  nodes: sentences
§  edges: cosine similarity between sentence feature vectors
§  Can apply a threshold on similarity or use similarity as edge weight (see the sketch below)
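A minimal sketch continuing the TF*IDF example above, assuming networkx; edges carry cosine similarity as weight and a hypothetical threshold prunes weak edges:

    import networkx as nx
    from sklearn.metrics.pairwise import cosine_similarity

    sim = cosine_similarity(X)     # pairwise cosine similarities between sentences
    G = nx.Graph()
    threshold = 0.1                # hypothetical cutoff
    for i in range(sim.shape[0]):
        for j in range(i + 1, sim.shape[0]):
            if sim[i, j] > threshold:
                G.add_edge(i, j, weight=sim[i, j])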



Measures: Centroid, Degree and Centrality

§  Centroid
§  Idea: select an average sentence. Compute the average point of the sentence vectors (the centroid)
§  select the sentence that is most similar to the centroid for summarization
§  Degree Centrality
§  Idea: sentences that cover most of the content have a high node degree (number of edges): since word overlap is responsible for edges, node degree measures word overlap with the overall set of sentences
§  for summarization, choose the sentence with the highest degree
§  LexRank Centrality
§  Idea: it does not suffice to be similar to many sentences: similarity to important sentences counts more.
§  normalize the sentence similarity (adjacency) matrix to make it a stochastic matrix
§  run PageRank to obtain scores that are used for ranking the sentences
§  for summarization, choose the sentence with the highest score (see the sketch below)
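A minimal LexRank-style sketch, continuing the variables from the sketches above: make the similarity matrix stochastic and run the PageRank iteration on it.

    import numpy as np

    W = sim.copy()
    np.fill_diagonal(W, 0.0)                # ignore self-similarity
    P = W / W.sum(axis=1, keepdims=True)    # row-stochastic similarity matrix

    N, d = P.shape[0], 0.85
    r = np.full(N, 1.0 / N)
    for _ in range(100):                    # a fixed number of power iterations
        r = d * P.T @ r + (1 - d) / N
    print(int(np.argmax(r)))                # index of the most representative sentence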
Evaluation of graph-based multi-document summarization

§  Scores: ROUGE metric: similar to BLEU, computed between manual summaries and system summaries
§  random baseline: select any sentence from the set by chance
§  lead-based: select based on the position of the sentence within the document
→ LexRank is a simple method for getting high scores. It uses the whole structure of the graph, as opposed to Centroid or Degree.
This technique also works well for single-document summarization.



TextRank for Keyword Extraction

§  Keyword extraction: find the most salient keywords for a document
§  Keyword extraction with PageRank:
§  preprocess the document: identify adjectives and nouns as targets
§  target co-occurrence graph: targets co-occurring within a window of 2-10 words
§  apply PageRank to get ranking scores on the nodes
§  select the highest scoring keywords, possibly concatenating ADJ-NOUN-NOUN sequences if present in the text (see the sketch below)
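A minimal TextRank-style sketch, assuming networkx and a hypothetical, already POS-filtered target sequence (adjectives and nouns) from some document:

    import networkx as nx

    targets = ["graph", "method", "natural", "language", "graph",
               "clustering", "language", "method"]   # made-up filtered tokens
    window = 3

    G = nx.Graph()
    for i, w in enumerate(targets):
        for j in range(i + 1, min(i + window, len(targets))):
            if w != targets[j]:
                G.add_edge(w, targets[j])            # co-occurrence within the window

    scores = nx.pagerank(G)                          # ranking scores on the nodes
    print(sorted(scores, key=scores.get, reverse=True)[:3])  # top keywords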



Keyword Extraction Evaluation

§  Comparison: a supervised system that is trained on manually assigned keywords, using frequency and contextual features
§  Note that TextRank is unsupervised: no training necessary



Graph Clustering

§  Task: find meaningful groups of nodes in a graph by cutting edges
§  Intuition: connectedness within a cluster is higher than between clusters
§  Many graph clustering algorithms find the number of clusters automatically

[Figure: example graph partitioned into clusters]

http://elisa.dyndns-web.com/~elisa/publications/



Clustering by Min-Cut / Max-Flow
§  MinCut algorithm: hierarchical top-down clustering
§  compute the minimum cut: the set of edges with the smallest edge weight sum whose removal disconnects one set of nodes from another
§  recursively apply this to the components that got disconnected (see the sketch below)
§  Finding the minimum cut is equivalent to finding the maximum flow in a network
§  Advantage: efficient. Fastest known algorithm with per-cut complexity O(|E| + log³(|V|))
§  Disadvantages:
§  unbalanced cuts
§  when to stop?
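A minimal sketch of one clustering step, assuming networkx: the Stoer-Wagner algorithm computes a global minimum cut of a small made-up weighted graph; recursing on the two sides would give the hierarchical clustering.

    import networkx as nx

    G = nx.Graph()
    G.add_weighted_edges_from([("a", "b", 3), ("b", "c", 1),
                               ("c", "d", 3), ("a", "c", 1)])

    cut_value, (part1, part2) = nx.stoer_wagner(G)  # cut weight and the two components
    print(cut_value, part1, part2)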

http://scienceblogs.com/goodmath/2007/08/maximum_flow_and_minimum_cut_1.php



Markov Chain Clustering
http://micans.org/mcl/

§  Clustering based on random walks: MCL is the parallel simulation of all possible random walks up to a finite length on a graph G
§  Idea: a random walker on the graph is more likely to stay within the same cluster than to end up in a different cluster after a small number of steps
§  Algorithm: one can show convergence to a limit T

Add loops: transition matrix T = column-normalize(AG + I)
MCL process: alternate between
    T = T^t        // expansion: raise T to the power t
    T = inflate(T) // inflation: increase contrast within columns by raising
                   // values to the power s (s > 1) and normalizing column-wise
Interpret T as a clustering: use the strongest connection as the label

Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000.
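A minimal NumPy sketch of the MCL process above; t and s are the expansion and inflation parameters (hypothetical typical values shown):

    import numpy as np

    def mcl(A, t=2, s=2.0, iters=50):
        T = A + np.eye(A.shape[0])             # add loops
        T = T / T.sum(axis=0, keepdims=True)   # column-normalize
        for _ in range(iters):
            T = np.linalg.matrix_power(T, t)   # expansion: t random-walk steps
            T = T ** s                         # inflation: increase column contrast
            T = T / T.sum(axis=0, keepdims=True)
        return T.argmax(axis=0)                # strongest connection as cluster label

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)  # made-up adjacency matrix
    print(mcl(A))                              # cluster label per node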



Expansion step: simulate the random walk

§  (stochastic) adjacency matrix T: probabilities to walk from the node in the column to the node in the row in a single step.
§  T²: probabilities to walk from a node A to a node B in 2 steps.

[Figure: adjacency matrix AG with loops added, the resulting transition matrix T, and T²]



Inflation Step: only keep attractors

[Figure: each column entry is squared, then the column is normalized]

§  Inflate the differences within a column by taking the k-th power of each value, then normalize to ensure the stochastic property. k regulates the cluster sizes
§  Clustering: the highest entry in a column vector is the cluster label. Variants:
§  could add small random noise to break ties
§  Optimization: only keep the K largest values, or only keep values over a threshold
Chinese Whispers Graph Clustering

§  MCL: keep only a few strong neighbors
§  Chinese Whispers: only propagate the strongest label in the neighborhood

initialize:
    forall vi in V: class(vi) = i;
while changes:
    forall v in V, randomized order:
        class(v) = highest ranked class in neighborhood of v;

§  Nodes have a class and communicate it to their adjacent nodes
§  A node adopts the majority class in its neighbourhood
§  Nodes are processed in random order for some iterations (see the sketch below)
§  Node weighting schemes

[Figure: example graph with labeled nodes A-E, edge weights, node degrees, and class labels L1-L4]
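A minimal Chinese Whispers sketch, assuming networkx and a small made-up weighted graph; classes propagate in randomized order until nothing changes.

    import random
    import networkx as nx

    G = nx.Graph()
    G.add_weighted_edges_from([("A", "B", 5), ("B", "C", 6), ("A", "C", 3),
                               ("D", "E", 8), ("C", "D", 1)])

    labels = {v: i for i, v in enumerate(G)}   # initialize: every node its own class
    for _ in range(20):                        # a few randomized iterations
        changed = False
        order = list(G)
        random.shuffle(order)
        for v in order:
            strength = {}                      # edge-weight sum per neighboring class
            for u in G[v]:
                w = G[v][u].get("weight", 1.0)
                strength[labels[u]] = strength.get(labels[u], 0.0) + w
            if strength:
                best = max(strength, key=strength.get)
                if labels[v] != best:
                    labels[v], changed = best, True
        if not changed:
            break
    print(labels)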
Disambiguation using Resource Graphs



Disambiguation of Named Entities
using Resource Graphs

Wikipedia Link Graph

(Shortest) paths are one possibility



Disambiguation of Named Entities
using Resource Graphs
(Shortest) paths are one possibility (see the sketch below). What else?
•  maximum capacity paths (capacities needed, e.g. coherence, probabilities, ...)
•  maximum flows (Attention: small world graph! Path lengths must be bounded!)
•  apply PageRank to weight nodes

Semantic enrichment:
•  Use the nodes on the paths / flows for enrichment, to overcome the knowledge acquisition bottleneck
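A minimal sketch of path-based disambiguation, assuming networkx and a tiny hypothetical resource (link) graph: candidate senses of an ambiguous mention are scored by shortest-path distance to an already resolved context entity.

    import networkx as nx

    link_graph = nx.Graph()                    # stand-in for a Wikipedia link graph
    link_graph.add_edges_from([("Java_(language)", "Programming"),
                               ("Programming", "Software"),
                               ("Java_(island)", "Indonesia"),
                               ("Indonesia", "Geography")])

    context = "Software"                       # entity already resolved from the text

    def score(candidate):
        try:
            return nx.shortest_path_length(link_graph, candidate, context)
        except nx.NetworkXNoPath:
            return float("inf")                # disconnected candidates score worst

    print(min(["Java_(language)", "Java_(island)"], key=score))  # Java_(language)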



Summary on Graph Methods in NLP

§  Graph representation is a natural representation of entities and their relations
§  We can use well-known (efficient) graph algorithms to solve specific NLP problems
§  By taking the overall graph structure into account, some NLP tasks can be improved (enriching semantics)
§  Graph clustering algorithms solve unsupervised NLP tasks without the need to specify the number of clusters
§  We can enrich information by walks on graphs
 

