Veins Theory:
A Model of Global Discourse Cohesion and Coherence
Dan CRISTEA
Dept. of Computer Science, University “A.I. Cuza”, Iaşi, Romania
dcristea@infoiasi.ro

Nancy IDE
Dept. of Computer Science, Vassar College, Poughkeepsie, NY, USA
ide@cs.vassar.edu

Laurent ROMARY
CRIN-CNRS & INRIA Lorraine, Villers-Les-Nancy, France
romary@loria.fr
input two non-intersecting strings of terminal node labels, x and y, and returns that permutation of x y (x concatenated with y) that is given by the left to right reading of the sequence of labels in x and y on the terminal frontier of the tree. The function maintains the parentheses, if they exist, and seq(nil, y) = y.

Heads
1. The head of a terminal node is its label.
2. The head of a non-terminal node is the concatenation of the heads of its nuclear children.

Vein expressions
1. The vein expression of the root is its head.
2. For each nuclear node whose parent node has vein v, the vein expression is:
   if the node has a left non-nuclear sibling with head h, then seq(mark(h), v);
   otherwise, v.
3. For each non-nuclear node of head h whose parent node has vein v, the vein expression is:
   if the node is the left child of its parent, then seq(h, v);
   otherwise, seq(h, simpl(v)).
Note that the computation of heads is bottom-up, while that of veins is top-down.

Consider Example 1:
1. According to engineering lore,
2. the late Ermal C. Fraze,
3. founder of Dayton Reliable Tool & Manufacturing Company in Ohio,
2a. came up with a practical idea for the pop-top lid
4. after attempting with halting success to open a beer can on the bumper of his car.
The structure of this discourse fragment is given in Figure 1. The central gray line traces the principal vein of the tree, which starts at the root

topological structure and the nuclear/satellite status (see below) of discourse units.

Figure 1: Tree structure and veins for Example 1
[Figure not reproduced. Recoverable annotations: relations CIRC, BACK, ELAB; node values H=2, V=2 (twice); H=4, V=2 4; H=1, V=1 2; H=2, V=(1) 2 (twice); H=3, V=2 3; notes "2 sees 1" and "3 doesn’t see 1".]

are possible only in its domain of accessibility. In particular, we can say the following:
1. In most cases, if B is a unit and b ∈ B is a referential expression, then either b directly realizes a center that appears for the first time in the discourse, or it refers back to another center realized by a referential expression a ∈ A, such that A ∈ acc(B).² Such cases instantiate direct references.
2. If (1) is not applicable, then if A, B, and C are units, c ∈ C is a referential expression that refers to b ∈ B, and B is not on the vein of C (i.e., it is not visible from C), then there is an item a ∈ A, where A is a unit on the common vein of B and C, such that both b and c refer to a. In this case we say that c is an indirect reference to a.³
3. If neither (1) nor (2) is applicable, then the reference in C can be understood without the referee, as if the corresponding entity were introduced in the discourse for the first time. Such references are inferential references.
² If a and b are referential expressions, where the center (directly) realized by b is the same as the one (directly) realized by a, or where it is a role of the center (directly) realized by a, we will say that b refers (back) to a, or b is a bridge reference to a.
³ On the basis of their common semantic representations.
Note that VT is applicable even when the division into units is coarser than in our examples. For instance, Example 1 in its entirety could be taken to comprise a single unit; if it appeared in the context of a larger discourse, it would still be possible to compute its veins (although, of course, the veins would likely be shorter because there are fewer units to consider). It can be proven formally (Cristea,
1998) that when passing from a finer granularity to a coarser one the accessibility constraints are still obeyed. This observation is important in relation to other approaches that search for stability with respect to granularity (see for instance, Walker, 1996).

hal-00521889, version 1 - 11 Oct 2010

3 Global coherence

This section shows how VT can predict the inference load for processing global discourse, thus providing an account of discourse coherence.
A corollary of Conjecture C1 is that CT can be applied along the accessibility domains defined by the veins of the discourse structure, rather than to sequentially placed units within a single discourse segment. Therefore, in VT reference domains for any node may include units that are sequentially distant in the text stream, and thus long-distance references (including those requiring "return-pops" (Fox, 1987) over segments that contain syntactically feasible referents) can be accounted for. Thus our model provides a description of global discourse cohesion, which significantly extends the model of local cohesion provided by CT.
CT defines a set of transition types for discourse (Grosz, Joshi, and Weinstein, 1995; Brennan, Friedman and Pollard, 1987). A smoothness score for a discourse segment can be computed by attaching an elementary score to each transition between sequential units according to Table 2, summing up the scores for each transition in the entire segment, and dividing the result by the number of transitions in the segment. This provides an index of the overall coherence of the segment.
A global CT smoothness score can be computed by adding up the scores for the sequence of units making up the whole discourse, and dividing the result by the total number of transitions (number of units minus one). In general, this score will be slightly higher than the average of the scores for the individual segments, since accidental transitions at segment boundaries might also occur. Analogously, a global VT smoothness score can be computed using accessibility domains to determine transitions rather than sequential units.

Table 2: Smoothness scores for transitions
CENTER CONTINUATION        4
CENTER RETAINING           3
CENTER SHIFTING (SMOOTH)   2
CENTER SHIFTING (ABRUPT)   1
NO Cb                      0

Conjecture C2: The global smoothness score of a discourse when computed following VT is at least as high as the score computed following CT.
That is, we claim that long-distance transitions computed using VT are systematically smoother than accidental transitions at segment boundaries. Note that this conjecture is consistent with results reported by authors like Passonneau (1995) and Walker (1996), and provides an explanation for their results.
We can also consider anaphora resolution using Cb’s computed using accessibility domains. Because a unit can simultaneously occur in several accessibility domains, unification can be applied using the Cf list of one unit and those of possibly several subsequent (although not necessarily adjacent) units. A graph of Cb-unifications can be derived, in which each edge of the graph represents a Cb computation and therefore a unification process.

4 Minimal text

The notion that text summaries can be created by extracting the nuclei from RST trees is well known in the literature (Mann and Thompson, 1988). Most recently, Marcu (1997) has described a method for text summarization based on nuclearity and selective retention of hierarchical fragments. Because his salient units correspond to heads in VT, his results are predicted in our model. That is, the union of heads at a given level in the tree provides a summary of the text at a degree of detail dependent on the depth of that level.
In addition to summarizing entire texts, VT can be used to summarize a given unit or sub-tree of that
text. In effect, we reverse the problem addressed by text summarization efforts so far: instead of attempting to summarize an entire discourse at a given level of detail, we select a single span of text and abstract the minimal text required to understand this span alone when considered in the context of the entire discourse. This provides a kind of focused abstraction, enabling the extraction of sub-texts from larger documents. Because vein expressions for each node include all of the nodes in the discourse within its domain of reference, they identify exactly which parts of the discourse tree are required in order to understand and resolve references for the unit or subtree below that node.
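To make the vein expressions concrete, the following Python sketch applies the head and vein rules stated earlier to Example 1. Several points are assumptions rather than part of the text: the tree shape (unit 4 as satellite of the root, unit 1 as left satellite of the span 2-3, unit 3 as satellite of unit 2) is read off the annotations of Figure 1; mark is taken to parenthesize every label of its argument and simpl to delete parenthesized labels; unit labels are integers numbered in frontier order, so seq can order by label; and the "left non-nuclear sibling" is taken to be the immediately preceding sibling. The names Node, Symbol, compute_heads, compute_veins and show are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Symbol = Tuple[int, bool]  # (unit number, is_marked); marked means parenthesized


@dataclass
class Node:
    nuclear: bool
    label: Optional[int] = None  # set for terminal nodes only
    children: List["Node"] = field(default_factory=list)
    head: List[Symbol] = field(default_factory=list)
    vein: List[Symbol] = field(default_factory=list)


def seq(x, y):
    """Merge two label strings in frontier order, keeping their marks."""
    return sorted(x + y, key=lambda s: s[0])


def mark(h):
    """Assumed behaviour: parenthesize every label."""
    return [(unit, True) for unit, _ in h]


def simpl(v):
    """Assumed behaviour: drop every parenthesized label."""
    return [(unit, marked) for unit, marked in v if not marked]


def compute_heads(node):
    """Bottom-up: a terminal's head is its label, otherwise the heads of nuclear children."""
    if not node.children:
        node.head = [(node.label, False)]
    else:
        node.head = []
        for child in node.children:
            compute_heads(child)
            if child.nuclear:
                node.head += child.head
    return node.head


def compute_veins(node, parent=None, index=0):
    """Top-down: derive each vein from the parent's vein, following rules 1 to 3."""
    if parent is None:
        node.vein = list(node.head)
    elif node.nuclear:
        left = parent.children[index - 1] if index > 0 else None
        if left is not None and not left.nuclear:
            node.vein = seq(mark(left.head), parent.vein)
        else:
            node.vein = list(parent.vein)
    elif index == 0:
        node.vein = seq(node.head, parent.vein)
    else:
        node.vein = seq(node.head, simpl(parent.vein))
    for i, child in enumerate(node.children):
        compute_veins(child, node, i)


def show(symbols):
    return " ".join(f"({u})" if m else str(u) for u, m in symbols)


# Tree for Example 1 (shape assumed from Figure 1).
u1, u2 = Node(nuclear=False, label=1), Node(nuclear=True, label=2)
u3, u4 = Node(nuclear=False, label=3), Node(nuclear=False, label=4)
span23 = Node(nuclear=True, children=[u2, u3])
span13 = Node(nuclear=True, children=[u1, span23])
root = Node(nuclear=True, children=[span13, u4])

compute_heads(root)
compute_veins(root)
veins = {u.label: show(u.vein) for u in (u1, u2, u3, u4)}
print(veins)
```

Under these assumptions the computed veins agree with the values annotated in Figure 1: unit 1 appears, parenthesized, in the vein of unit 2 but is absent from the vein of unit 3.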
Table 5: Verifying conjecture C1
Source     No. of   Total no.   Direct on the vein   Indirect on the vein   Inference     How many
           units    of refs     (case 1)             (case 2)               (case 3)      obey C1
English    62       97          75   77.3%           14   14.4%             5    5.2%     94   96.9%
French     48       110         98   89.1%           11   10.0%             1    0.9%     110  100.0%
Romanian   66       111         104  93.7%           2    1.8%              5    4.5%     111  100.0%
Total      176      318         277  87.1%           27   8.5%              11   3.5%     315  99.1%
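The figures in Table 5 can be cross-checked with a few lines of Python. This is only a consistency sketch: it assumes that the last column is the sum of the three cases and that percentages are taken over the total number of references and rounded to one decimal, which the reported values bear out. The names ROWS, obeying and pct are illustrative.

```python
# Counts copied from Table 5: references per source and per case.
ROWS = {
    "English": {"refs": 97, "direct": 75, "indirect": 14, "inference": 5},
    "French": {"refs": 110, "direct": 98, "indirect": 11, "inference": 1},
    "Romanian": {"refs": 111, "direct": 104, "indirect": 2, "inference": 5},
}


def obeying(row):
    """References covered by case 1, 2 or 3 (assumed to be the last column)."""
    return row["direct"] + row["indirect"] + row["inference"]


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {name: (obeying(r), pct(obeying(r), r["refs"])) for name, r in ROWS.items()}
total_refs = sum(r["refs"] for r in ROWS.values())
total_obeying = sum(obeying(r) for r in ROWS.values())
total_pct = pct(total_obeying, total_refs)

for name, (count, share) in summary.items():
    print(f"{name}: {count} references obey C1 ({share}%)")
print(f"Total: {total_obeying} of {total_refs} ({total_pct}%)")
```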
University Press.