
The Syntactically Annotated ICE Corpus

and the Automatic Induction of a Formal Grammar

Alex Chengyu Fang


Department of Chinese, Translation and Linguistics
City University of Hong Kong
acfang@cityu.edu.hk

Abstract

The International Corpus of English is a corpus of national and regional varieties of English. The million-word British component has been constructed, grammatically tagged, and syntactically parsed. This article describes work that aims at the automatic induction of a wide-coverage grammar from this corpus as well as an empirical evaluation of the grammar. It first describes the corpus and its annotation schemes and then presents empirical statistics for the grammar. It then evaluates the coverage and the accuracy of such a grammar when applied automatically in a parsing system. Results show that the grammar enabled the parser to achieve a recall rate of 86.1% and a precision rate of 83.5%.

1 Introduction

The International Corpus of English (ICE) is a project that aims at the construction of a collection of corpora of English in countries and regions where English is used either as a first or as an official language (Greenbaum 1992). Each component corpus comprises one million words of both written and transcribed spoken samples that are then annotated at the grammatical and syntactic levels. The British component of the ICE corpus was used to automatically induce a large formal grammar, which was subsequently used in a robust parsing system. In what follows, this article first describes the annotation schemes for the corpus and then evaluates a formal grammar automatically induced from the corpus in terms of its potential coverage when tested with empirical data. Finally, it presents an evaluation of the grammar through its application in a robust parsing system in terms of labelling and bracketing accuracies.

1.1 The ICE wordclass annotation scheme

There are altogether 22 head tags and 71 features in the ICE wordclass tagging scheme, resulting in about 270 grammatically possible combinations. Compared with 134 tags for LOB, 61 for the BNC, and 36 for the Penn Treebank, the ICE tagset is perhaps the most detailed used in automatic applications. The tags cover all the major English word classes and provide morphological, grammatical, collocational, and sometimes syntactic information. A typical ICE tag has two components: the head tag and a set of features that bring out the grammatical properties of the associated word. For instance, N(com,sing) indicates that the lexical item associated with this tag is a common (com) singular (sing) noun (N).

Tags that indicate phrasal collocations include PREP(phras) and ADV(phras), prepositions (as in [1]) and adverbs (as in [2]) that are frequently used in collocation with certain verbs and adjectives:

[1] Thus the dogs’ behaviour had been changed because they associated the bell with the food.

[2] I had been filming The Paras at the time, and Brian had had to come down to Wales with the records.

Some tags, such as PROFM(so,cl) (pronominal so representing a clause, as in [3]) and PRTCL(with) (the particle with, as in [4]), indicate the presence of a clause; so in [3] signals an
abbreviated clause while with in [4] a non-finite clause:

[3] If so, I’ll come and meet you at the station.

[4] The number by the arrows represents the order of the pathway causing emotion, with the cortex lastly having the emotion.

Examples [5]-[7] illustrate tags that note special sentence structures. There in [5] is tagged as EXTHERE, existential there that indicates a marked sentence order. [6] is an example of the cleft sentence (which explicitly marks the focus), where it is tagged as CLEFTIT. [7] exemplifies anticipatory it, which is tagged as ANTIT:

[5] There were two reasons for the secrecy.

[6] It is from this point onwards that Roman Britain ceases to exist and the history of sub-Roman Britain begins.

[7] Before trying to answer the question it is worthwhile highlighting briefly some of the differences between current historians.

The verb class is divided into auxiliaries and lexical verbs. The auxiliary class notes modals, perfect auxiliaries, passive auxiliaries, semi-auxiliaries, and semip-auxiliaries (those followed by -ing verbs). The lexical verbs are further annotated according to their complementation types. There are altogether seven types: complex transitive, copular, dimonotransitive, ditransitive, intransitive, monotransitive, and TRANS. Figure 1 shows the subcategorisation of the verb class.

[Figure 1: The ICE subcategorisation for verbs. Lexical verbs divide into intransitive, copula, and transitive; transitive verbs subdivide into monotransitive, trans, ditransitive, complex-transitive, and dimonotransitive.]

The notation TRANS of the transitive verb class is used in the ICE project to tag those transitive verbs followed by a noun phrase that may be the subject of the following non-finite clause. This type of verb can be analysed differently according to various tests into, for instance, monotransitives, ditransitives and complex transitives. To avoid arbitrary decisions, the complementing non-finite clause is assigned the catch-all term ‘transitive complement’ in parsing, and its preceding verb is accordingly tagged as TRANS in order to avoid making a decision on its transitivity type. This verb type is best demonstrated by [8]-[11]:

[8] Just before Christmas, the producer of Going Places, Irene Mallis, had asked me to make a documentary on ‘warm-up men’.

[9] They make others feel guilty and isolate them.

[10] I can buy batteries for the tape - but I can see myself spending a fortune!

[11] The person who booked me in had his eyebrows shaved and replaced by straight black painted lines and he had earrings, not only in his ears but through his nose and lip!

In examples [8]-[11], asked, make, see, and had are all complemented by non-finite clauses with overt subjects, the main verbs of these non-finite clauses being infinitive, present participle and past participle.

As illustrated by examples [1]-[11], the ICE tagging scheme has indeed gone beyond the wordclass to provide some syntactic information and has thus proved itself to be an expressive and powerful means of pre-processing for subsequent parsing.

1.2 The ICE parsing scheme

The ICE parsing scheme recognises five basic syntactic phrases: adjective phrase (AJP), adverb phrase (AVP), noun phrase (NP), prepositional phrase (PP), and verb phrase (VP). Each tree in the ICE parsing scheme is represented as a functionally labelled hierarchy, with features describing the characteristics of each constituent, which is represented as a pair of function-category labels. In the case of a terminal node, the function-category labels are appended by the lexical item itself in curly brackets. Figure 2 is such a structure for [12].

[12] We will be introducing new exam systems for both schools and universities.
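Before turning to the analysis of Figure 2, the sketch below illustrates the kind of representation just described: every node pairs a syntactic function with a category and optional features, and terminal nodes carry the lexical item in curly brackets. The class and field names are illustrative assumptions rather than the actual ICE or Survey Parser data structures; the labels used (PU, CL, SU, OD, NPHD, and the tags) are glossed in the paragraph that follows.

```python
# A minimal, assumed encoding of an ICE-style function-category node (not the
# project's own data structure): function + category + features, with the
# lexical item attached to terminal nodes in curly brackets.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    function: str                                        # e.g. SU, VB, OD, NPHD
    category: str                                        # e.g. CL, NP, VP, N
    features: List[str] = field(default_factory=list)    # e.g. ['com', 'plu']
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                           # lexical item, terminals only

    def label(self) -> str:
        feats = f"({','.join(self.features)})" if self.features else ""
        lex = f" {{{self.word}}}" if self.word is not None else ""
        return f"{self.function} {self.category}{feats}{lex}"

# Top of the tree for [12]; most subtrees are omitted for brevity.
tree = Node("PU", "CL", children=[
    Node("SU", "NP"),                                    # 'We'
    Node("VB", "VP"),                                    # 'will be introducing'
    Node("OD", "NP", children=[                          # 'new exam systems ...'
        Node("NPHD", "N", ["com", "plu"], word="systems"),
    ]),
])

print(tree.label())                           # PU CL
print(tree.children[2].children[0].label())   # NPHD N(com,plu) {systems}
```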
According to Figure 2, we know that [12] is a parsing unit (PU) realised by a clause (CL), which governs three daughter nodes: SU NP (NP as subject), VB VP (VP as verbal), and OD NP (NP as direct object). Each of the three daughter nodes is sub-branched down to the leaf nodes, which carry the input tokens in curly brackets. The direct object node, for example, has three immediate constituents: NPPR AJP (AJP as NP premodifier), NPHD N(com,plu) (plural common noun as the NP head), and NPPO PP (PP as NP postmodifier).

[Figure 2: A parse tree for [12]]

Note that in the same example, the head of the complementing NP of the prepositional phrase is initially analysed as a coordinated construct (COORD), with two plural nouns as the conjoins (CJ) and a coordinating conjunction as coordinator (COOR).

In all, there are 58 non-terminal parsing symbols in the ICE parsing scheme, compared with 20 defined in the Penn Treebank project. The Susanne Treebank has 43 function/category symbols, discounting those that are represented as features in the ICE system.

2 The generation of a formal grammar

The British component of the ICE corpus, annotated in the fashion described above, has been used to automatically generate a formal grammar that has subsequently been applied in an automatic parsing system to annotate the rest of the corpus (Fang 1995, 1996, 1999). The grammar consists of two sets of rules. The first set describes the five canonical phrases (AJP, AVP, NP, PP, VP) as sequences of grammatical tags terminating at the head of the phrase. For example, the sequence AUX(modal,pres) AUX(prog,infin) V(montr,ingp) is a VP rule describing instantiations such as will be introducing in [12]. The second set describes the clause as sequences of phrase types. The string in [12], for instance, is described by the sequence NP VP NP PP in the set of clausal rules.

To empirically characterise the grammar, the syntactically parsed ICE corpus was divided into ten equal parts according to the number of component texts. One part was set aside for testing and was further divided into five test sets. The remaining nine parts were used as training data in a leave-one-out fashion: they were used to generate nine consecutive training sets, each increased by one part over the previous set, with training set 1 formed of one part, training set 2 of two parts, training set 3 of three parts, and so on. The evaluation thus aims not only to establish the potential coverage of the grammar but also to indicate the relation between the coverage of the grammar and the size of the training data.

Figure 3 shows the growth in the number of phrase structure rules as a function of the growth of the training data. The Y-axis indicates the number of rules generated from the training data and the X-axis the gradual increase in training data size.

[Figure 3: The number of phrase structure rules as a function of growing training data size. Y-axis: number of rules (0-15,000); X-axis: training sets 1-9; one curve per phrase type (AVP, AJP, NP, PP, VP).]

It can be observed that AJP and AVP show only a marginal increase in the number of different rules as the training data grows, demonstrating a relatively small core set. In comparison, VPs are more varied but still exhibit a visible plateau of growth. The other two phrases, NP and PP, show a much more varied set of rules, not only through their large numbers (9,184 for NPs and 13,736 for PPs) but also through their sharp learning curves. There are many reasons for the potentially large set of rules for PPs, since they structurally subsume the clause as well as all the phrase types. Their large number is therefore more or less expected. The
large set of NP rules is, however, a bit surprising since NPs are often characterised, perhaps too simplistically, as comprising a determiner group, a premodifier group, and the noun head, yet the grammar has 9,184 different rules for this phrase type. While this phenomenon calls for further investigation, we are concerned only with the coverage issue in the current article.

3 The coverage of the formal grammar

The coverage of the formal grammar is evaluated through the individual rule sets for the five canonical phrase types separately. The coverage of the clausal rules will also be reported towards the end of this section.

3.1 The coverage of AJP rules

As Figure 4 suggests, the coverage of the grammar, when tested with the five samples, is consistently high – all above 99% even when the grammar was trained on only one ninth of the training set. Increasing the size of the training set does not significantly enhance the coverage.

[Figure 4: The coverage of AJP rules. Coverage (%) against training data size (1-9) for test sets 1-5; Y-axis range 99-100.]

3.2 The coverage of AVP rules

As with the AJP rules, high coverage can be achieved with a small training set: when trained with only one ninth of the training data, the AVP rules already showed a coverage of above 99.4%, quickly approaching 100%. See Figure 5.

[Figure 5: The coverage of AVP rules. Coverage (%) against training data size (1-9) for test sets 1-5; Y-axis range 99.4-100.]

3.3 The coverage of NP rules

Although lower than for the AVP and AJP rules discussed above, the NP rules show a satisfactorily high coverage when tested with the five samples. As can be seen from Figure 6, the initial coverage when trained with one ninth of the training data is generally above 97%, rising proportionally as the training data size increases, to about 99%. This seems to suggest a mildly complex nature of NP structures.

[Figure 6: The coverage of NP rules. Coverage (%) against training data size (1-9) for test sets 1-5; Y-axis range 97-100.]

3.4 The coverage of VP rules

VPs do not seem to pose a significant challenge to the parser. As Figure 7 indicates, the initial coverage is above 97.5% for all test sets; Set 1 even achieved a coverage of over 98.5% when the grammar was trained with only one ninth of the training data. As the graph suggests, the learning curve reaches a plateau when trained with about half of the total training data, suggesting a centralised use of the rules.

[Figure 7: The coverage of VP rules. Coverage (%) against training data size (1-9) for test sets 1-5; Y-axis range 97.5-100.]

3.5 The coverage of PP rules

As is obvious from Figure 8, PPs are perhaps the most complex of the five phrases, with an initial coverage of just over 70%. The learning curve is sharp, culminating between 85% and 90% with the full training data set. As far as parser construction is concerned, this phrase alone deserves special attention since it explains much of the structural complexity of the clause. Based
on this observation, a separate study was carried out to automatically identify the syntactic functions of PPs.
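The coverage figures reported throughout this section can be read as the proportion of rule instances in the held-out texts whose rule already occurs in the set induced from the training data. The following is a minimal sketch of that computation under the assumption that rules are stored as tuples of tag symbols and that coverage is counted over rule instances; the function name and the toy tag strings are illustrative, not the project's evaluation code.

```python
# Hedged sketch: coverage of an induced rule set over unseen rule instances.
from typing import Iterable, Set, Tuple

Rule = Tuple[str, ...]   # e.g. ('AUX(modal,pres)', 'AUX(prog,infin)', 'V(montr,ingp)')

def coverage(induced: Set[Rule], test_instances: Iterable[Rule]) -> float:
    """Fraction of rule instances in the test data that the induced grammar contains."""
    instances = list(test_instances)
    if not instances:
        return 0.0
    covered = sum(1 for rule in instances if rule in induced)
    return covered / len(instances)

# Toy illustration with invented VP rules.
induced_vp = {('AUX(modal,pres)', 'AUX(prog,infin)', 'V(montr,ingp)')}
test_vp = [('AUX(modal,pres)', 'AUX(prog,infin)', 'V(montr,ingp)'),
           ('AUX(perf,pres)', 'V(montr,edp)')]
print(coverage(induced_vp, test_vp))   # 0.5
```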
[Figure 8: The coverage of PP rules. Coverage (%) against training data size (1-9) for test sets 1-5; Y-axis range 70-95.]

3.6 The coverage of clausal rules

Clausal rules present the most challenging problem since, as Figure 9 clearly indicates, their coverage remains under 67% even when trained with all of the training data. This observation seems to reaffirm the usefulness of rules at the phrase level but the inadequacy of the clause structure rules. Indeed, it is intuitively clear that the complexity of the sentence is mainly the result of the combination of clauses of various kinds.

[Figure 9: The coverage of CL rules. Coverage (%) against training data size (1-9) for test sets 1-5; Y-axis range 55-67.]

3.7 Discussion

This section presented an evaluation of the grammar in terms of its coverage as a function of growing training data size. As is shown, the parsed corpus resulted in excellent grammar sets for the canonical phrases AJP, AVP, NP, PP, and VP: except for PPs, all the phrase structure rules achieved a wide coverage of about 99%. The more varied set for PPs demonstrated a coverage of nearly 90%, not as high as that achieved for the other phrases but still highly satisfactory.

The coverage of the clause structure rules, on the other hand, showed a considerably poorer performance compared with the phrases. When all of the training data was used, these rules covered just over 65% of the testing data.

In view of these empirical observations, it can be reliably concluded that corpus-based grammar construction is a promising approach in that the phrase structure rules generally have a high coverage when tested with unseen data. The same approach has also raised two questions at this stage: Does the high-coverage grammar also demonstrate a high precision of analysis? Is it possible to enhance the coverage of the clause structure rules within the current framework?

4 Evaluating the accuracy of analysis

The ICE project used two major annotation tools: AUTASYS and the Survey Parser. AUTASYS is an automatic wordclass tagging system that applies the ICE tags to words in the input text with an accuracy rate of about 94% (Fang 1996a). The tagged text is then fed into the Survey Parser for automated syntactic analysis. The parsing model is one that tries to identify an analogy between the input string and a sentence that has already been syntactically analysed and stored in a database (Fang 1996b and 2000). This parser is driven by the previously described formal grammar for both phrasal and clausal analysis. In this section, the formal grammar is characterised through an empirical evaluation of the accuracy of the analysis produced by the Survey Parser.

4.1 The NIST evaluation scheme

The National Institute of Standards and Technology (NIST) proposed an evaluation scheme that looks at the following properties when comparing recognition results with the correct answer, where TP denotes the parser-produced tree and TC the correct tree:

    Correct Match Rate = number of correct constituents in TP / number of constituents in TC
    Substitution Rate  = number of substituted constituents in TP / number of constituents in TC
    Deletion Rate      = number of deleted constituents in TP / number of constituents in TC
    Insertion          = number of inserted constituents in TP
    Combined Rate      = (number of correct constituents in TP - number of insertions) / number of constituents in TC

Notably, the correct match rate is identical to the labelled or bracketed recall rate. The commonly used precision score is calculated as the total number of correct nodes over the sum of correct, substituted, and inserted nodes. The insertion
score, arguably, subsumes crossing-brackets errors, since crossing-brackets errors are caused by the insertion of constituents, even though not every insertion causes an instance of a crossing-brackets violation by definition. In this respect, the crossing-brackets score only implicitly hints at the insertion problem, while the insertion rate of the NIST scheme explicitly addresses this issue.

Because of the considerations above, the evaluations reported in the next section were conducted using the NIST scheme. To objectively present the two sides of the same coin, the NIST scheme was used to evaluate the Survey Parser in terms of constituent labelling and constituent bracketing before the two are finally combined to yield performance scores.

In order to conduct a precise evaluation of the performance of the parser, the experiments look at two aspects of the parse tree: labelling accuracy and bracketing accuracy. Labelling accuracy expresses how many correctly labelled constituents there are per hundred constituents and is intended to measure how well the parser labels the constituents when compared to the correct tree. Bracketing accuracy attempts to measure the similarity of the parser tree to the correct one by expressing how many correctly bracketed constituents there are per hundred constituents. In this section, the NIST metric scheme is applied to the two properties separately before an attempt is made to combine the two to assess the overall performance of the Survey Parser. The same set of test data described in the previous section was used to create four test sets of 1,000 trees each to evaluate the performance of the grammar induced from the training sets described earlier.

4.2 Labelling Accuracy

To evaluate labelling accuracy with the NIST scheme, the method is to view the labelled constituents as a linear string with attachment bracketing removed. For [13], as an example, Figure 10 is the correct tree and Figure 11 a parser-produced tree.

[13] It was probably used in the Southern States as well.

[Figure 10: A correct tree for [13]]

After removing the bracketed structure, we then have two flattened sequences of constituent labels and compare them using the NIST scheme, which yields the following statistics:

    Total # sentences evaluated : 1
    Total # constituent labels  : 42
    Total # correct matches     : 37 (88.1%)
    Total # labels substituted  : 5 (11.9%)
    Total # labels deleted      : 0 (0.0%)
    Total # labels inserted     : 6
    Overall labelling accuracy  : 73.8%

[Figure 11: A parser-produced tree for [13]]

Accordingly, we may claim that there are 42 constituent labels in the correct tree, of which 37 (88.1%) are correctly labelled by the parser, with 5 substitutions (11.9%), 0 deletions, and 6 insertions. The overall labelling accuracy is then calculated as 73.8%.

A total of 4,000 trees, divided into four sets of 1,000 each, were selected from the test data to evaluate the labelling accuracy of the parser. Empirical results show that the parser achieved an overall labelling precision of over 80%.
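The quantities reported in Tables 1-3 below follow directly from the definitions in Section 4.1. As a hedged illustration, the sketch below recomputes the rates for the single-sentence example above; the function name and output format are assumptions, and only the formulas themselves come from the NIST scheme.

```python
# Hedged sketch of the NIST rates from Section 4.1. The correct tree (TC) is
# assumed to contribute correct + substituted + deleted constituents.
def nist_rates(correct: int, substituted: int, deleted: int, inserted: int) -> dict:
    in_correct_tree = correct + substituted + deleted         # constituents in TC
    return {
        "correct_match": correct / in_correct_tree,           # = labelled recall
        "substitution":  substituted / in_correct_tree,
        "deletion":      deleted / in_correct_tree,
        "insertions":    inserted,
        "precision":     correct / (correct + substituted + inserted),
        "combined":      (correct - inserted) / in_correct_tree,
    }

# Worked example from the text: 42 labels, 37 correct, 5 substituted, 0 deleted, 6 inserted.
rates = nist_rates(correct=37, substituted=5, deleted=0, inserted=6)
print(round(100 * rates["correct_match"], 1))   # 88.1
print(round(100 * rates["combined"], 1))        # 73.8 (overall labelling accuracy)
```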
             Test Set 1        Test Set 2        Test Set 3        Test Set 4
             #        %        #        %        #        %        #        %
Tree         1000              1000              1000              1000
Node         31676             34095             31563             30140
Correct      27329    86.3     29263    85.8     27224    86.3     26048    86.4
Subs          3214    10.1      3630    10.6      3253    10.3      3084    10.2
Del           1133     3.6      1202     3.5      1086     3.4      1008     3.3
Ins           2021              2316              1923              1839
Prec.                  83.9              83.1              84.1              84.1
Overall                79.9              79.0              80.2              80.3

Table 1: Labelling accuracy

Table 1 shows that the Survey Parser scored about 86% in terms of correct match (labelled recall) and nearly 84% in terms of labelled precision for the four sets. About 10% of the constituent labels are wrong (Subs), with a deletion rate (Del) of about 3.5%. Counting insertions (Ins), the overall labelling accuracy of the parser is around 80%.

4.3 Bracketing Accuracy

A second aspect of the evaluation of the grammar through the use of the Survey Parser involves the measurement of its attachment precision, an attempt to characterise the similarity of the parser-produced hierarchical structure to that of the correct parse tree. To estimate the precision of constituent attachment of a tree, a linear representation of the hierarchical structure of the parse tree is designed which ensures that wrongly attached non-terminal nodes are penalised only once if their sister and daughter nodes are correctly aligned.

Table 2 shows that the parser achieved nearly 86% for the bracketed correct match and 82.8% for bracketing precision. Considering insertions and deletions, the overall accuracy according to the NIST scheme is about 77%. This indicates that for every 100 bracket pairs, 77 are correct, with 23 substituted, deleted, or inserted. In other words, for a tree of 100 constituents, 23 edits are needed to conform to the correct tree structure.

             Test Set 1        Test Set 2        Test Set 3        Test Set 4
             #        %        #        %        #        %        #        %
Tree         1000              1000              1000              1000
Node         12451             13390             12411             11858
Correct      10679    85.8     11402    85.2     10620    85.6     10271    86.6
Subs          1088     8.7      1249     9.3      1115     9.0       968     8.2
Del            648     5.5       739     5.5       676     5.4       619     5.2
Ins           1127              1297              1092              1029
Prec                   82.8              81.7              82.8              83.7
Overall                76.7              75.5              76.8              77.9

Table 2: Bracketing accuracy

4.4 Combined accuracy

The combined score for both labelling and bracketing accuracy is obtained by representing both constituent labelling and unlabelled bracketing in a linear string as described in the previous sections.

Table 3 gives the total number of trees in the four test sets and the total number of constituents. The numbers of correct matches, substitutions, insertions and deletions are indicated and the combined scores computed accordingly. The table shows that the parser scored 86% and 83.5% respectively for labelled recall and precision. It is also shown that the parser achieved an overall performance of about 79%. Considering that the scoring program tends to underestimate the success rate, it is reasonable to assume a real overall combined performance of about 80%.

             Test Set 1        Test Set 2        Test Set 3        Test Set 4
             #        %        #        %        #        %        #        %
Tree         1000              1000              1000              1000
Node         44127             47485             43974             41998
Correct      38008    86.1     40665    85.6     37844    86.1     36319    86.5
Subs          4302     9.7      4879    10.3      4368     9.9      4052     9.6
Del           1781     4.0      1941     4.1      1762     4.0      1627     3.9
Ins           3148              3613              3015              2868
Prec                   83.6              82.7              83.7              83.9
Overall                79.0              78.0              79.2              79.6

Table 3: Combined accuracy

4.5 Discussion

Although the scores for the grammar and the parser look both encouraging and promising, it is difficult to draw straightforward comparisons with other systems. Charniak (2000) reports a maximum-entropy-inspired parser that scored 90.1% average precision/recall when trained and tested with sentences from the Wall Street Journal corpus (WSJ). While the difference in precision/recall between the two parsers may indicate a difference in performance between the two parsing approaches, there nevertheless remain two issues to be investigated. Firstly, there is the issue of how text types may influence the performance of the grammar and indeed the parsing system as a whole. Charniak (2000) uses the WSJ as both training and testing data and it is reasonable to expect a fairly good overlap in terms of lexical co-occurrences and linguistic structures, and hence good performance scores. Indeed, Gildea (2001) suggests that the standard WSJ task seems to be simplified by its homogeneous style. It is thus yet
to be verified how well the same system will perform when trained and tested on a more ‘balanced’ corpus such as ICE. Secondly, it is not clear what the performance of Charniak’s parsing model will be when dealing with a much more complex grammar such as ICE, which has almost three times as many non-terminal parsing symbols. The performance of the Survey Parser is very close to that of the unlexicalised PCFG parser reported in Klein and Manning (2003), but again the WSJ was used for training and testing and it is not clear how well their system would scale up to a typologically more varied corpus.

5 Conclusion

This article described a corpus of contemporary English that is linguistically annotated at both the grammatical and syntactic levels. It then described a formal grammar that is automatically generated from the corpus and presented statistics outlining the learning curve of the grammar as a function of training data size. Coverage by the grammar was presented through empirical tests. It then reported the use of the NIST evaluation metric for the evaluation of the grammar when applied by the Survey Parser to test sets totalling 4,000 trees.

Through the size of the grammar in terms of the five canonical phrases as a function of growth in training data size, it was observed that the learning curves for AJP, AVP, and VP levelled off fairly rapidly with growing training data size. In contrast, NPs and PPs demonstrate sharp learning curves, which might have suggested a lack of sufficient coverage by the grammar for these two phrase types. Experiments show that the grammar still achieved a satisfactory coverage for these two, with near-total coverage for the other three phrase types.

The NIST scheme was used to evaluate the performance of the grammar when applied in the Survey Parser. An especially advantageous feature of the metric is the calculation of an overall parser performance rate that takes into account the total number of insertions in the parse tree, an important structural distortion factor when calculating the similarity between two trees. A total of 4,000 trees were used to evaluate the labelling and bracketing accuracies of the parse trees automatically produced by the parser. It is shown that the labelled recall (LR) is over 86% and the labelled precision (LP) about 84%. The bracketed recall is 85.8%, with a bracketed precision of 82.8%. Finally, an attempt was made to estimate the combined performance score for both labelling and bracketing accuracies. The combined recall is 86.1% and the combined precision 83.5%. These results show both encouraging and promising performance by the grammar in terms of coverage and accuracy and therefore argue strongly for the case of inducing formal grammars from linguistically annotated corpora. A future research topic is the enhancement of the recall rate for clausal rules, which now stands at just over 65%. It would be of great benefit to the parsing community to verify the impact that the size of the grammar has on the performance of the parsing system and also to use a typologically more balanced corpus than the WSJ as a workbench for grammar/parser development.

References

Charniak, E. 2000. A maximum-entropy-inspired parser. In Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL, Seattle, Washington.

Fang, A.C. 1996a. Grammatical tagging and cross-tagset mapping. In S. Greenbaum (ed.).

Fang, A.C. 1996b. The Survey Parser: Design and development. In S. Greenbaum (ed.).

Fang, A.C. 2000. From Cases to Rules and Vice Versa: Robust Practical Parsing with Analogy. In Proceedings of the Sixth International Workshop on Parsing Technologies, 23-25 February 2000, Trento, Italy, pp. 77-88.

Gildea, D. 2001. Corpus variation and parser performance. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2001.

Greenbaum, S. 1992. A new corpus of English: ICE. In Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4-8 August 1991, ed. by Jan Svartvik. Berlin: Mouton de Gruyter, pp. 171-179.

Greenbaum, S. 1996. The International Corpus of English. Oxford: Oxford University Press.

Klein, D. and C. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, July 2003, pp. 423-430.