
Linguistically Rich Graph Based Data Driven Parsing

Goal: To assist a graph-based data-driven parser by incorporating linguistic knowledge given by a constraint-based parser

Approach: We use a constraint graph instead of a complete graph in MSTParser to extract a spanning tree

Outline of the presentation

Introduction to a Constraint graph

MSTParser

Experiments (1-8)

Results

Observations

Future Work

Constraint Graph

The constraint-based parser gives us the graph in two stages, as shown in example 1

For our experiments, we can use only the stage 1 equations, only the stage 2 equations, or both

All the results presented here use equations from both stages

Constraint Graph example 1

[Figure: (a) 1st stage Constraint Graph; (b) 2nd stage Constraint Graph]

MSTParser

In MSTParser a complete graph is used to extract a spanning tree during derivation.

In the following experiments we try to prune the complete graph by using a Constraint Graph.

The spanning tree extraction algorithm in MSTParser uses an unlabeled graph, and MSTParser uses a separate classifier to label the trees. So we use only attachment information while pruning the complete graph.
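The pruning idea above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the constraint parser's output is available as (head, dependent, label) triples and that arc scores sit in a matrix indexed by head and dependent. The function names `attachment_arcs` and `prune_scores`, the labels other than "main", and the toy scores are hypothetical and are not part of MSTParser.

```python
import math


def attachment_arcs(labeled_arcs):
    """Drop the labels and keep only (head, dependent) pairs."""
    return {(head, dep) for head, dep, _label in labeled_arcs}


def prune_scores(scores, allowed):
    """Return a copy of the score matrix with disallowed arcs set to -inf.

    scores[h][d] is the score of the arc h -> d (assumed layout).
    """
    n = len(scores)
    return [
        [scores[h][d] if (h, d) in allowed else -math.inf for d in range(n)]
        for h in range(n)
    ]


# Hypothetical labeled arcs: two labels proposed for the same attachment.
labeled = [(0, 2, "main"), (2, 1, "k1"), (2, 1, "k2")]
allowed = attachment_arcs(labeled)

# Toy scores for a 3-node complete graph (node 0 is the root).
scores = [[0.0, 1.0, 2.0], [0.5, 0.0, 0.3], [0.1, 0.9, 0.0]]
pruned = prune_scores(scores, allowed)
```

Two labeled arcs over the same pair of words collapse into a single attachment, which is all an unlabeled spanning tree search can make use of.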

Experimental Setup

All the experiments are conducted on the Hindi dependency treebank released as part of the ICON2010 tools contest.

The training data had 2,973 sentences; the development and testing data had 543 and 321 sentences respectively.

MSTParser was modified so that it can use the constraint graph (CG) during derivation. We use the non-projective algorithm, order=1 and training k=5.

Set Notation

For an input $S = w_0, w_1, \ldots, w_n$, i.e. the set of all words in a sentence, let $G_S$ be the complete graph and $CG_S$ be the constraint graph provided by the constraint parser.

Let $N = \{w_0, w_1, \ldots, w_n\}$ be the set of vertices in both $CG_S$ and $G_S$.

$A_G = N \times N$ and $A_{CG} \subseteq N \times N$ are the sets of arcs in the two graphs.

An arc between $w_i$ and $w_j$, shown as $w_i \to w_j$, signifies that $w_i$ is the parent of $w_j$.

Also let C be the set of all vertices which are conjuncts, V be the set of all vertices which are verbs, K be the set of nouns, P be the set of all vertices that have a case-marker/post-position and J be the set of adjectives.

Experiments

Through the following experiments we formulate the constraint graph that gives us the best accuracy.

We start by using all equations given by CG to prune the complete graph. The set of valid arcs with respect to $w_j$ in Experiment 1 can be written as follows:

$E_1 = \{\, w_i \to w_j : w_i \to w_j \in A_{CG},\ w_i, w_j \in N \,\}$
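As a small illustration of the Experiment 1 arc set, the sketch below builds the valid arcs from a constraint graph and lists the candidate parents for a given word. The function names `experiment1_arcs` and `candidate_heads` and the toy arcs are hypothetical; arcs are assumed to be stored as (parent, child) index pairs.

```python
def experiment1_arcs(nodes, cg_arcs):
    """Valid arcs for Experiment 1: constraint-graph arcs with both ends in N."""
    node_set = set(nodes)
    return {(h, d) for (h, d) in cg_arcs if h in node_set and d in node_set}


def candidate_heads(valid_arcs, dependent):
    """Parents w_i for which the arc w_i -> dependent is valid."""
    return sorted(h for (h, d) in valid_arcs if d == dependent)


# Toy constraint graph over four nodes (node 0 is the root).
nodes = [0, 1, 2, 3]
cg_arcs = {(0, 3), (3, 1), (3, 2), (2, 1)}

e1 = experiment1_arcs(nodes, cg_arcs)
heads_of_1 = candidate_heads(e1, 1)
```

In this toy graph, word 1 keeps only two candidate parents instead of the three it would have in the complete graph, so the spanning tree search chooses among fewer arcs.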

Example:
Consider the following constraint graph. To ensure that a tree is always possible, we have added arcs connecting the root node to all the other nodes.

siMha mAkapA_ke basu_xvArA uTAe_gae praSna_kA javAba xe_rahe_We

1 2 3 4 5 6 7

[Figure: MST and CG-based MST output. Node row: <ROOT> mAkapA basu uTAe praSna javAba xe, indexed 0-7; visible arc label: main]
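The remark in the example about connecting the root node to every other node can be checked with a short sketch. A directed spanning tree rooted at the root exists exactly when every node is reachable from the root, so adding root arcs guarantees that the pruned graph still contains a tree. The sketch assumes the added arcs point from the root to each word; the function names and the toy arcs are hypothetical and do not reproduce the constraint graph of the example.

```python
from collections import deque


def add_root_arcs(nodes, arcs, root=0):
    """Return the arcs plus root -> w for every non-root node w."""
    return set(arcs) | {(root, w) for w in nodes if w != root}


def spans_from_root(nodes, arcs, root=0):
    """True if every node is reachable from the root along directed arcs."""
    children = {}
    for head, dep in arcs:
        children.setdefault(head, []).append(dep)
    seen = {root}
    queue = deque([root])
    while queue:
        head = queue.popleft()
        for dep in children.get(head, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen == set(nodes)


# Toy constraint graph: node 0 is the root, nodes 1-7 are words.
nodes = list(range(8))
cg_arcs = {(7, 6), (6, 5), (4, 3)}

before = spans_from_root(nodes, cg_arcs)
after = spans_from_root(nodes, add_root_arcs(nodes, cg_arcs))
```

Without the root arcs this toy graph leaves several words unreachable, so no spanning tree exists; with them, a tree is always available, even if a poor one.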