Foreground and Background Text in Retrieval

Jussi Karlgren, SICS, Stockholm, jussi@sics.se
Timo Järvinen, Conexor Oy, Helsinki, timo.jarvinen@conexor.fi

Abstract

Our hypothesis is that certain clauses have foreground functions in text, while other clauses have background functions, and that these functions are expressed or reflected in the syntactic structure of the clause. Presumably these clauses will have differing utility for automatic approaches to text understanding; a summarization system might want to utilize background clauses to capture commonalities between numbers of documents while an indexing system might use foreground clauses in order to capture specific characteristics of a certain document.

Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Topic in text for information access

This paper gives a short description of a series of experiments we have performed to test our hypothesis that clauses have different functions in transmitting the information flow of text, namely the functions often called topicality or thematic structure. The application area we chose to evaluate our hypothesis through is that of analysis of texts for the purposes of information retrieval.

Topicality, foreground, and background

There is an entire body of research put into uncovering the topical structure of clauses and texts. There is a long tradition of semantic and pragmatic study of clause structure from the Charles University in Prague (e.g. Hajičová, 1993), there are several results supporting our hypotheses using the general theory of transitivity (Halliday, 1967, 1978; Hopper, 1979), there are numbers of algorithms for anaphor resolution which touch clausal categorization, there are studies of automatic summarization algorithms, and there are studies of text grammars which all have bearing on our work. However, no studies have been made specifically on clausal categorization for topical analysis, and the empirical validation of these ideas has been held back for lack of effective tools.

Transitivity and clauses

Transitivity is one of the most basic notions in the system of language, but ill formalized in the formal study of language. Clauses in language represent events and processes of various kinds, and transitivity is that characteristic of a clause which models the character of the process or event it represents. This systemic model was first formulated by Halliday (1967) and has since been elaborated by Hopper and others in a theoretic sense: very little empirical study on large numbers of texts has been performed, and no systematic, let alone quantitative, evaluation of the theories has even been proposed.

One of the basic conceptual structures of language in use is that actions are done by people and affect things. How the action is performed, by whom, and on what are all encoded in the clause by various syntactic mechanisms, in a general system of transitivity. For most non-linguists, transitivity is only explicitly mentioned in foreign-language classes when classifying verbs as transitive or intransitive, meaning whether the verb in question takes a direct object or prefers not to. This is of course central to the task of modeling action and effect, but transitivity covers more than this one aspect of process structure. Halliday’s model mentions a number of specific factors or “systems” that cover the more general “system” of transitivity: Number, type, and role of participant: human or not? Agent? Benefactive?; Process type: existence, possession, spatial/locative, spatial/mobile (e.g. Halliday, 1978, p. 118).

These aspects of clausal organization hook up with factors such as temporal, aspectual, or mood systems to produce a clause. This clause not only carries information about the event or process it represents, but it also crucially builds a text, together with adjacent clauses. In Halliday’s model (most comprehensively delineated in his 1967 publication) a clause is the confluence of three systems of syntactic choice: transitivity, mood and theme. Transitivity, he writes, is the set of options relating to cognitive content, mood being the system for organizing the utterance into a speech situation, and theme being the system for organizing the utterance into a discourse.

While there is ample psycholinguistic evidence that the syntactic form of a clause is discarded after being processed by the hearer or reader (e.g. Jarvella, 1979), the communicative structure of the clause is retained to organize the information content of the text or discourse. The structure of a clause is not arbitrary, and cannot be determined in isolation from other clauses in the vicinity and other events, processes, and participants represented and mentioned in the
text.

Transitivity has been and is being studied only as a very theoretical construction, and little work has been done which would be of direct implementational quality. The theoretical work concentrates on syntactic modeling of languages of which there is rather little knowledge yet, as a first stage towards building a more complete description. Practical clues as to how to make use of transitivity are mainly due to Hopper and Thompson (Hopper, 1979; Hopper and Thompson, 1980). Hopper argues for the distinction between background and foreground in narrative, signaled by variation along the qualities of the subject (such as animacy or humanness), the predicate verb (such as aspect or tense marking), and the voice of the clause. Many or even most of these factors cut across language divides (see e.g. Dahl and Karlsson, 1976). Hopper and Thompson then propose a number of characteristics along which transitivity is measured, some of which are directly quantifiable, as shown in Table 1. These factors we can make use of directly in our implementation effort.

Table 1: Transitivity characteristics.
Feature           High               Low
Participants      2 or more          fewer than 2
Kinesis           action             non-action
Aspect            completed          partial or imperfect
Punctuality       punctual           continuous
Volitionality     volitional         non-volitional
Polarity          affirmative        negative
Reality           real               non-real
Agency            potent agent       non-potent agent
Effect on object  totally affected   not affected
Individuation     individual object  non-individual object

Clauses and Topic

There is a large number of approaches to textual modeling with very varying basis in theories of language or syntax. Most models of text are statistically based, or have some high-level model of argumentation to follow irrespective of syntax; some take recourse to cue phrases or expressions specific to some domain to build a text model. Some use syntactic analysis as a low-level building block, but discard what is left of the syntactic analysis after the argument structure of the clause has been established.

Local coherence

Much of topical study centers on local coherence of discourse or text, such as research models of theme-rheme or topic-comment, or research strands such as recent projects in modeling centering (Grosz et al., 1995). In these approaches, topic is a feature of the clause, and is carried over to the next clause through relatively overt syntactic mechanisms such as argument organization, anaphor, or ordering. These types of model of local coherence, some of which have a fairly sophisticated theoretical base rather along the lines of Halliday’s theme system, stand to benefit from using transitivity as a factor.

Narrative models

Other studies try to understand topic from the top down, building argument structures or narrative frames (e.g. Lehnert, 1980). Lehnert, for instance, discussing application to summarization, argues that we must have a picture of plot progression throughout a text, with a model of mental states of an implied reader which the text affects in various ways. This high-level type of approach, most often with less psychological modeling involved, was typical during the knowledge-based systems projects of the late eighties. The failings of such systems are often that they have too little actual text processing capacity, and stumble on text processing as a task. Many systems attempt to generate rhetorical structures of various flavors based on local coherence models (e.g. Marcu, 1997 or 1998; Corston-Oliver, 1998; Liddy, 1993), but quite often need more syntactic competence. Simpler models, with a form-filling approach (e.g. Strzalkowski et al., 1998) perform quite well, up to a point, with much less investment in discourse modeling. There is a span of such models, ranging from completely general templates to very strictly task-oriented and tailored extraction patterns; the middle ground between them is claimed by hybrid approaches, which indeed are just that: combinations of both rather than bridges between them (Kan and McKeown, 1999). The greater effort in building a narration or discourse model has yet to prove useful: the bridge from text to discourse model has not been usefully closed yet. It is in this type of model our contribution most clearly would be of benefit: these systems need to form a clearer understanding of the informational rationale of syntax.

Lexical chains

Several approaches try to establish lexical chains in text as a basis for understanding content for either indexing for retrieval or summarization tasks. Lexical units in the text are picked out by some algorithm, possibly after the text is segmented (e.g. Barzilay and Elhadad, 1997), and relations between units are established using terminological models. Many of these models utilize text segmentation algorithms, based on occurrence statistics (e.g. Hearst, 1994 or Reynar, 1994), or thesauri and terminological databases, or cue or trigger phrases of some sort (e.g. Boguraev and Kennedy, 1997). These types of model tend to be quite successful, but are often quite a-theoretic and will be difficult to improve using theoretical add-on models.

Open Research Questions and Bottlenecks

The first research question for us is “What makes a text a text?” This is a question others have asked before. Current work in text understanding is plentiful and partially successful. Most of the work in our field, that of language engineering, is based on statistical models of term occurrence, whether along lexical chains, in sentence extraction algorithms, or using thesauri as a domain model. The main exception is centering and other related and non-related anaphora resolution approaches. Most of the effort being put into text analysis today is along the lines of the theme system in Halliday’s analysis.
The arguably primary aspect of the clause is that of its cognitive content, and its relation to the other systems: this is measured using very simple statistically based models or thesaurus-based models of lexical cohesion. The study of transitivity would raise the sophistication of this system to match that of the study of theme and topicality per se. This gives us the task of primarily concentrating on transitivity as a high level description of clause content, function, and structure; when we do, it can be connected to the discourse through the efforts of other projects as outlined above, and as an end result we gain more knowledge of the structure of texts.

Since our hypothesis is that clauses bear different roles in a text, and that these roles at least in part are communicated through their semantic role structure, this is where we should concentrate our efforts. The mechanisms modeled by transitivity are strongly encoded in syntax, and thus largely language specific in their encoding. However, their function is not: we expect the transitivity of clauses to bear on the difference between foreground and background in text. This is likely to be genre- and culture-specific, but not specific to languages. The utility of building a transitivity-based model of text will be bound to cultural areas, but independent of language.

There seems to be great promise for our work to provide empirical data towards building more syntactic analysis tools, with ambitions towards building a more complete yet practical model of text. Transitivity on the clause level is one of the key factors in understanding information organization on the textual level, and as of now, an untapped resource.

People do not know foreground from background

To gain first knowledge of how people understand textual information organization we performed a short experiment to determine if human judges can agree on foreground and background clauses. Three judges were given ten news items each and instructions to mark foreground sentences. The instructions were deliberately left vague so as not to give judges too specific criteria for determining foregroundness. The three judges were all trained linguists and quite experienced in making textual judgments: they were not selected to be representative of the population at large but to gain some understanding of what sort of judgments can be expected. Using standard measures for calculating judge agreement we found agreement only marginally better than chance. Not even trained linguists can agree on what foreground sentences are.

Foreground clauses are special

We did find some common characteristics in those clauses where they did agree: most strongly, the clauses that all judges agreed on marking as foreground were longer than other clauses, as a basic indication of their higher syntactic complexity (by Mann Whitney U; p > 0.95). These data give some encouragement to continue studying the criteria given in the literature to see if they give purchase for predictive analysis.

Evaluation by information retrieval

Pure retrieval systems usually invest a fair amount of effort into completely ignoring text as text. Some exceptions include experiments to statistically process syntactic relations in text (e.g. Strzalkowski et al., 1997) to find typical relations entities engage in (in a sort of small-scale version of extraction technology) and others trying to establish reference chains in text (e.g. Liddy, 1994) to sort out occurrence frequencies obscured by anaphor. Information retrieval systems typically have neither textual models nor local coherence models to guide their analysis of texts; word occurrence statistics are good enough for the tasks these systems are used for at present. While we are unlikely to impress information retrieval system engineers with syntactic and semantic niceties, the evaluation framework provided by information retrieval systems is useful enough for us to test our future algorithms for this purpose.

An efficient system for marking semantic roles

To get further, we need automatic analysis for marking large text corpora. Basic syntactic analysis can today be provided by several different tools: for our purposes dependency analysis is the most appealing and most closely vectored to the information we wish to find in text. The Conexor Functional Dependency Grammar Parser produces dependency-based functional descriptions for a number of languages¹ (Tapanainen and Järvinen, 1997). The parser produces surface-syntactic functional descriptions (subject, object, verb chain, various adverbials and so forth) that allow us to extract linguistic correlates of foregrounding and backgrounding: we started with those properties that were automatically recognisable using the syntactic parser and selected voice, ordinance, and aspect for further study.

¹ There is an online demo of the English FDG parser at http://www.conexor.fi/parsers.html

Voice and aspect are the traditional mechanisms to control the distribution of information in sentences. The analyser recognises active (ACT) and passive (PSS) voice and classifies the predicates accordingly. For the time being we have not implemented any systematic purely aspectual classification scheme, which would require lexical information not available in the system. The coding of aspectual phenomena is based on the morphosyntactic properties of English. Therefore, the possible values in the aspect column are progressive (PROG), and as possible values for non-progressive forms, future (FUTU) and past (PAST).

By the term ordinance we mean syntactic government between clauses. Initially, we made a distinction between the main clauses (MC) and subordinate clauses (SC). Basically, the main clauses are foreground and subordinate clauses, especially the adverbial clauses, describe the circumstances of the action reported in the governing clause.

Two additional types were distinguished: report clause (RC) and the main clause (cMC), because in these clause types governance relations are supposedly reversed with respect to the foreground vs. background distinction.

In addition to sentential properties, we examined the distribution and form of the main elements of the sentence.
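The three sentential features described above (voice, ordinance, aspect) can be written down as a small typed record. This is only an illustrative sketch: the tag names are those used in the text, but the class names, the `parse_features` helper, and the use of "-" for an unmarked aspect are assumptions for illustration and say nothing about how the parser or the analysis system is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Voice(Enum):
    ACT = "active"
    PSS = "passive"


class Ordinance(Enum):
    MC = "main clause"
    SC = "subordinate clause"
    RC = "report clause"
    cMC = "main clause (cMC)"


class Aspect(Enum):
    PROG = "progressive"
    FUTU = "future"
    PAST = "past"


@dataclass(frozen=True)
class ClauseFeatures:
    pred: str
    voice: Voice
    ord: Ordinance
    aspect: Optional[Aspect]


def parse_features(pred, voice_tag, ord_tag, aspect_tag):
    """Build a record from tag strings; "-" is taken to mean no aspect marking."""
    aspect = None if aspect_tag == "-" else Aspect[aspect_tag]
    return ClauseFeatures(pred, Voice[voice_tag], Ordinance[ord_tag], aspect)


# Two clauses with tags as they might appear in an analysis matrix.
example = parse_features("order", "ACT", "cMC", "PAST")
unmarked = parse_features("keep", "ACT", "SC", "-")
```

Looking tags up by name (`Voice["PSS"]`) means an unknown tag raises `KeyError` instead of passing silently into later processing.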
Table 2: Syntactic features and roles produced by the system
name    explanation          comments
pred    predicate
voice                        ACT, PSS
ord     ordinance            MC, SC, RC, cMC
aspect                       PROG, PAST, FUTU
meta    clause adverbials
actor
isa-s   predicative
theme
benef   benefactive
man     manner               mapped on FDG output
loc     location             mapped on FDG output
goa     goal                 mapped on FDG output
sou     source               mapped on FDG output
temp    temporal             mapped on FDG output
wrds    base forms
sent    running text tokens

To achieve better correspondences with the distribution of the content words, we aim to go beyond the mere surface distributions and syntactic functions. An additional component called SemRole uses the data structures produced by the FDG parser to recognise the semantic roles of the elements. This is in fact what the parser produces for the adverbial elements. The important syntactic distinction is between the actants vs. circonstants (see Hajičová, 1993; Tesnière, 1959). The actantial roles actor, theme and benefactive are the participants in the action, and the circumstantial roles such as location, goal, source and temporal functions usually describe the background of the action. We expect the distribution of the sentential elements to correlate with the foreground vs. background distinction.

The FDG parser as built today produces surface syntactic functions only, and the semantic roles are extracted from the analysis through a system developed specifically for this purpose. Table 6 shows a simplified matrix featuring some pieces of information produced by the SemRole program. The input sentence is given in the last column. The first column shows the predicate, second column voice (active or passive), third column the semantic subject, or actor, fourth column the second argument, or theme, fifth column the recipient or benefactive. Some can be extracted directly from the functional description, such as adverbial elements.

The analysis used for these experiments contains only a few, selected pieces of the morphosyntactic information produced by the parser. For example, the part of speech information is generally not used. The actor field shows the tag PRON for a pronoun, which is a possible feature of foregrounding. There is ample room for improvement.

Sample analysis

The analysis of the sentence below is presented in Table 7.

“Dependent on the state for most of its mental health money, county officials said they reluctantly ordered the closures when the mental health budget plunged hopelessly in the red after the state failed to provide the county enough money to keep the system afloat.”

Evaluating relevance of clause type variation

Given that we have a mechanism for distinguishing different types of semantic roles in text, we use that information to rank clauses according to foregroundness. One of our underlying hypotheses is that foregrounded clauses contain more relevant and topical content than do background clauses.

We ran a series of experiments using a number of queries from the Text Retrieval Conference TREC, where a large number of texts are judged against a set of retrieval queries. Our method was to calculate if clauses with a large number of foreground markers contain more relevant content words in texts judged relevant than in texts judged irrelevant. The rationale for this experiment is that texts that treat a subject in foreground would be more likely to be relevant for an information retrieval query than texts that mention something in passing, as a background fact.

We examined the distribution of highly relevant content words (for query 332, e.g., “tax”, “income”, “evade”, “evasion” in their different forms) over the clause arguments in the example texts, and used these findings to define and fine-tune a clause weighting scheme based on the analysis provided by the semantic role parser program. The objective was to find a weighting scheme to rank foreground clauses highly so that the difference between foreground and background could be used in retrieval systems for weighting content words appropriately.

The adverbials of manner, location, and time are lexically categorized into more or less situation-bound: “quickly” and “surprisingly”, “here”, “today” e.g. are more situation-bound and specific and less stative than are e.g. “eventually”, “never”, “while”, and “well”. Most of the lexical argument slots are in themselves more situation-bound: if an argument is given, the clause has a more specific character than general clauses without actor, benefactive, or explicit goal. The voice and tense of the clause also are graded: future tense is more situation-bound than is progressive tense or passive voice. If the agent is human (a personal pronoun or a person name), it is more grounded in the situation.

We find that content word-containing clauses have more foreground characteristics (statistically significant difference by the Mann Whitney U test for rank sum). After some manual tuning (a task for which machine learning algorithms would be a natural tool) we settled on an ad-hoc weighting as shown in Table 3.

Table 3: Weighting of semantic role fields
Function        Foreground weight
Predicate verb  Any: +1
Voice           -
ord             Future: +1
Aspect          -
meta            -
Actor           If a pronoun or name: +1
Theme           Any: +1
Beneficient     Any: +1
Manner          Any: +1 plus extra point for any member of list of “punctual” adverbs.
Location        Any: +1 plus extra point for “here”, “there” or potential names.
Goal            Any: +1
Source          Any: +1
Time            Any: +1

Experimental material

We used material only from the Los Angeles Times section of the TREC collection in order to minimize the risk of source-dependent stylistic variation swamping the distinctions between foreground and background, and used queries 301 to 350 in the TREC test battery. We ran two experiments, once with all fifty queries, and once where we discarded queries with five or fewer relevant documents in the LA Times subcollection², leaving thirty-one queries to work on. The assumption was that evaluating the results on the basis of very few relevant documents would lead to results being swamped by other types of noise in the material; as it turns out, a correct assumption. Discarding the low-density queries does improve results as will be shown below.

² Left out were queries no.: 303 307 308 309 320 321 324 326 327 328 334 336 338 339 340 344 345 346 348

Experiment procedure

Retrieval systems rank texts according to term occurrences. Given a clause weighting scheme, we can weight terms occurring in a foregrounded clause more than we will weight terms only occurring in background clauses.

For our test we take, for each query, the TREC LA Times texts with relevance judgments for that query and run a search experiment on them, in effect reranking documents that other search systems retrieved from the TREC collection. We do this once using clause weighting, and once with the same preprocessing but disregarding the clause weight, in effect only providing morphological normalization. In this experiment we do not perform complete retrieval on other documents: we aim to see the extra effect semantic roles give a retrieval tool, not to build an entire system from scratch. We preprocess queries by including all words from the title, description, and narrative fields of the TREC topics, normalizing for case and excluding punctuation. Some examples are shown below.

301  international organized crime identify organization that participate in international criminal activity the activity and if possible collaborate organization and the country involved a relevant document must as a minimum identify the organization and the type of illegal activity eg columbian cartel exporting cocaine vague reference to international drug trade without identification of the organization involve would not be relevant

313  magnetic levitation maglev commercial use of magnetic levitation a relevant document must contain magnetic levitation or maglev it shall be concern with possible commercial application of this phenomenon to include primarily mass transit but also other commercial application such as maglev flywheel for car discussion of superconductivity when link to maglev and government support plan when link to maglev be also relevant

332  income tax evasion this query is looking for investigations that have targeted evaders of us income tax a relevant document would mention investigations either in the us or abroad of people suspected of evading us income tax laws of particular interest are investigations involving revenue from illegal activities as a strategy to bring known or suspected criminals to justice

The search is conducted using a simple weighted frequency index, an idf table, and a very simple search script: each word in the query word set, as shown above, is matched against all words in all documents. If a match is found, the document score is incremented by the word weight (tf), based on its frequency of occurrence in the text multiplied by the clause weights of clauses in which it occurs, and also multiplied by its collection frequency weight (idf), which is derived from the number of documents in the collection that contain the word.

Table 4: Average precision
Full set of all 50 queries
  Morphology only                 0.0729
  Morphology and semantic roles   0.0815
Trimmed set of 31 queries
  Morphology only                 0.0967
  Morphology and semantic roles   0.1166

The results are quite convincing. The semantic role weighting improves results. Tables 4 and 5 show average precision for all fifty queries for the two cases and how many queries showed improved results³. Some queries fail to show improvement for the new weighting scheme, but most do; if queries with fewer than five relevant documents are discarded the difference is greater.
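Average precision, the measure reported in Table 4, is computed per query from a ranked document list and the set of documents judged relevant. The sketch below uses the common non-interpolated definition; the text does not say which variant was used, so this choice is an assumption, and the function name and document identifiers are invented for illustration.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Non-interpolated average precision for one ranked list.

    Precision is sampled at the rank of each relevant document that is
    retrieved and averaged over all relevant documents for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Toy ranking: relevant documents appear at ranks 2 and 4.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
ap = average_precision(ranking, relevant)  # (1/2 + 2/4) / 2
```

A figure for a whole query set is then the mean of the per-query values, which is how the two rows of each block in Table 4 would be compared under this assumption.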
The experiment is a clear success, and we can conclude
that semantic role based clause weighting does add informa-
tion to term frequencies.
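The scoring rule from the experiment procedure, term frequency weighted by clause weight and multiplied by idf, can be sketched as follows. This is one reading of the description, not the authors' script: the exact idf formula is not given, so `log(1 + N/df)` is an assumption, as are all function names and the toy data. Query preprocessing here only lower-cases and strips punctuation; the base-form normalization that the parser provides is not reproduced.

```python
import math
import string
from collections import Counter


def preprocess_query(text):
    """Lower-case the query text and drop punctuation (no lemmatisation)."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def document_frequencies(docs):
    """Count, for each word, the number of documents that contain it."""
    df = Counter()
    for clauses in docs.values():
        vocabulary = set()
        for _weight, words in clauses:
            vocabulary.update(words)
        df.update(vocabulary)
    return df


def score_documents(query_terms, docs, use_clause_weights=True):
    """Sum, over query terms, clause-weighted term frequency times idf.

    docs maps a document id to a list of (clause_weight, base_forms) pairs.
    With use_clause_weights=False every clause counts as 1, which
    corresponds to the morphology-only baseline.
    """
    n_docs = len(docs)
    df = document_frequencies(docs)
    scores = {}
    for doc_id, clauses in docs.items():
        score = 0.0
        for term in set(query_terms):
            if df[term] == 0:
                continue
            idf = math.log(1.0 + n_docs / df[term])  # assumed idf form
            weighted_tf = sum(
                (weight if use_clause_weights else 1.0) * words.count(term)
                for weight, words in clauses
            )
            score += weighted_tf * idf
        scores[doc_id] = score
    return scores


# Toy collection; clause weights and words are invented for illustration.
docs = {
    "doc_a": [(4.0, ["tax", "evade"]), (1.0, ["court"])],
    "doc_b": [(1.0, ["tax"]), (3.0, ["court", "rule"])],
}
query = preprocess_query("Income tax evasion.")
weighted = score_documents(query, docs, use_clause_weights=True)
baseline = score_documents(query, docs, use_clause_weights=False)
```

In the toy data both documents mention "tax" once, so the baseline cannot separate them, while the clause-weighted score prefers the document where the term sits in the more foregrounded clause.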
Table 5: Number of improved queries
Full set of all 50 queries
  Morphology only                 17
  Morphology and semantic roles   25
Trimmed set of 31 queries
  Morphology only                  9
  Morphology and semantic roles   19

³ Comprehensive tables available from project web site.

References

Barzilay, Regina and Michael Elhadad. 1997. “Using lexical chains for text summarization”. In ACL/EACL Workshop on Intelligent Scalable Text Summarization, pages 10-17.

Boguraev, Branimir and Christopher Kennedy. 1997. “Salience-based content characterisation of text documents”. In ACL/EACL Workshop on Intelligent Scalable Text Summarization.

Corston-Oliver, Simon. 1998. “Beyond string matching and cue phrases”. In Proceedings of AAAI 98 Spring Symposium on Intelligent Text Summarization.

Dahl, Östen, and Fred Karlsson. 1976. “Verbien aspektit ja objektin sijamerkintä: vertailua suomen ja venäjän välillä”. Sananjalka 18, 28-52.

Grishman, Ralph. 1997. “Information Extraction: Techniques and Challenges”. In Information Extraction (International Summer School SCIE-97), edited by Maria Teresa Pazienza. Springer-Verlag.

Grosz, Barbara J., Aravind K. Joshi, and Scott Weinstein. 1995. “Centering: A Framework for Modelling the Local Coherence of Discourse”. Computational Linguistics 21:2, 203-227.

Hajičová, Eva. 1993. Issues of Sentence Structure and Discourse Patterns, volume 2 of Theoretical and Computational Linguistics. Prague: Institute of Theoretical and Computational Linguistics, Charles University.

Halliday, M. A. K. 1967. “Notes on Transitivity and Theme in English”. Journal of Linguistics 3:37-81, 199-244.

Halliday, M. A. K. 1978. Language as social semiotic. London: Edward Arnold Ltd.

Hearst, Marti. 1994. “Multi-Paragraph Segmentation of Expository Text”. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (Las Cruces, June 1994). ACL.

Hopper, Paul J. 1979. “Aspect and foregrounding in discourse”. In Syntax and Semantics, Vol. 12, pp. 213-241. Academic Press.

Hopper, Paul J. and Sandra Thompson. 1980. “Transitivity in Grammar and Discourse”. Language 56:2, pp. 251-299.

Jarvella, Robert. 1979. “Immediate memory and discourse processing”. In G.B. Bower (ed.), The psychology of learning and motivation, Vol. 13. New York: Academic Press.

Kan, Min-Yen and Kathleen McKeown. 1999. Information Extraction and Summarization: Domain Independence through Focus Types. Columbia University Technical Report CUCS-030-99.

Lehnert, Wendy G. 1980. “Narrative text summarization”. In Proceedings of the First National Conference on Artificial Intelligence.

Liddy, Elizabeth. 1993. “Development and implementation of a discourse model for newspaper texts”. Dagstuhl Seminar on Summarizing Text for Intelligent Communication.

Marcu, Daniel. 1997. “From discourse structures to text summaries”. In Proceedings of the ACL/EACL Workshop on Intelligent Scalable Text Summarization.

Marcu, Daniel. 1998. “To build text summaries of high quality, nuclearity is not sufficient”. In AAAI-98 Spring Symposium on Intelligent Text Summarization.

Reynar, Jeffrey C. 1994. “An Automatic Method of Finding Topic Boundaries”. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (Las Cruces, June 1994). ACL.

Strzalkowski, Tomek, Louise Guthrie, Jussi Karlgren, Jim Leistensnider, Fang Lin, Jose Perez-Carballo, Troy Straszheim, Jin Wang, Jon Wilding. 1997. “Natural Language Information Retrieval: TREC-5 Report”. Proceedings of the fifth Text Retrieval Conference, Donna Harman (ed.), NIST Special Publication. Gaithersburg: NIST.

Strzalkowski, Tomek, Jin Wang, and Bowden Wise. 1998. “A robust practical text summarizer”. In AAAI 98 Spring Symposium on Intelligent Text Summarization, pages 26-33.

Tapanainen, Pasi and Timo Järvinen. 1997. “A non-projective dependency parser”. Proceedings of the 5th Conference on Applied Natural Language Processing, pp. 64-71. March 31st - April 3rd, Washington D.C., USA.

Tesnière, Lucien. 1959. Éléments de syntaxe structurale. Paris: Éditions Klincksieck.

Table 6: Voice and semantic roles attached to a predicate “give” in various alternation patterns.
pred  voice  actor  theme  benef  sentence
give  ACT    he     book   -      He gave John a book.
give  PSS    -      book   John   A book was given to John.
give  PSS    -      book   John   John was given a book.

Table 7: A sample matrix produced by the analysis system.

pred     voice  ord  aspect  meta  actor                 isa-s  theme   benef  man          loc  temp
say      ACT    RC   PAST    -     county official       -      cMC     -      -            -    -
order    ACT    cMC  PAST    -     they PRON             -      -       -      reluctantly  -    -
plunge   ACT    SC   PAST    -     mental health budget  -      -       -      hopelessly   red  when
fail     PSS    SC   -       -     -                     -      -       -      -            -    -
provide  ACT    SC   -       -     -                     -      -       -      -            -    -
keep     ACT    SC   -       -     -                     -      system  -      -            -    -
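As a worked illustration, the ad-hoc weights of Table 3 can be applied to the clause rows of Table 7. The sketch below is a reading of Table 3 under stated assumptions, not the authors' implementation: "Future: +1" is listed on the ord row in Table 3 but is treated here as the FUTU value of the aspect column; the list of "punctual" adverbs is not given in the text, so the two-word set is a placeholder; and name detection is reduced to a crude capitalisation check. Table 7 has no goal or source columns, so those roles are simply absent from the rows.

```python
# Placeholder list: the actual "punctual" adverb list is not given in the text.
PUNCTUAL_ADVERBS = {"quickly", "surprisingly"}
DEICTIC_LOCATIONS = {"here", "there"}


def filled(value):
    """A slot counts as filled unless it is missing or marked with "-"."""
    return value not in (None, "", "-")


def looks_like_name(value):
    """Crude stand-in for name detection: any capitalised token except PRON."""
    return any(tok[:1].isupper() for tok in value.split() if tok != "PRON")


def foreground_weight(clause):
    """Sum the Table 3 points for one clause given as a dict of role fields."""
    weight = 0
    if filled(clause.get("pred")):
        weight += 1
    if clause.get("aspect") == "FUTU":  # assumed reading of "Future: +1"
        weight += 1
    actor = clause.get("actor", "-")
    if filled(actor) and ("PRON" in actor.split() or looks_like_name(actor)):
        weight += 1
    for role in ("theme", "benef", "goa", "sou", "temp"):
        if filled(clause.get(role)):
            weight += 1
    manner = clause.get("man", "-")
    if filled(manner):
        weight += 1
        if manner.lower() in PUNCTUAL_ADVERBS:
            weight += 1
    location = clause.get("loc", "-")
    if filled(location):
        weight += 1
        if location.lower() in DEICTIC_LOCATIONS or looks_like_name(location):
            weight += 1
    return weight


# Rows of Table 7; unfilled ("-") fields are omitted.
TABLE_7 = [
    {"pred": "say", "voice": "ACT", "ord": "RC", "aspect": "PAST",
     "actor": "county official", "theme": "cMC"},
    {"pred": "order", "voice": "ACT", "ord": "cMC", "aspect": "PAST",
     "actor": "they PRON", "man": "reluctantly"},
    {"pred": "plunge", "voice": "ACT", "ord": "SC", "aspect": "PAST",
     "actor": "mental health budget", "man": "hopelessly",
     "loc": "red", "temp": "when"},
    {"pred": "fail", "voice": "PSS", "ord": "SC"},
    {"pred": "provide", "voice": "ACT", "ord": "SC"},
    {"pred": "keep", "voice": "ACT", "ord": "SC", "theme": "system"},
]

weights = {clause["pred"]: foreground_weight(clause) for clause in TABLE_7}
```

Under these assumptions the clause weight is simply a count of filled, situation-bound slots, so a clause rich in adverbials can outrank a sparse governing clause.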
