A Frame-Semantic Approach to Semantic Annotation
John B. Lowe jblowe@garnet.berkeley.edu
Collin F. Baker collinb@icsi.berkeley.edu
Charles J. Fillmore fillmore@cogsci.berkeley.edu
Department of Linguistics
University of California
Berkeley, CA 94720
Abstract

The number and arrangement of semantic tags must be constrained, lest the size and complexity of the tagging sets (tagsets) used for semantic annotation become unwieldy both for humans and computers. The description of lexical predicates within the framework of frame semantics provides a natural method for selecting and structuring appropriate tagsets.

1 Motivation

The research presented here is to be conducted under the FrameNet research project at the University of California.¹ On this project our primary aim is to produce frame-semantic descriptions of lexical items; our concern with semantically tagged corpora is at both ends of our research. That is, we expect to use partially semantically tagged corpora in the investigation stage--perhaps nothing more than having WordNet hypernyms associated with nouns--but we will produce semantically tagged corpus lines as a by-product of our work.

Most major grammatical theories now accept the general principle that some set of semantic roles ("case roles", "thematic roles", or "theta roles") is necessary for characterizing the semantic relations that a predicate can have to its arguments. This would seem to be one obvious starting-point for choosing a tag set for semantically annotating corpora, but there is no agreement as to the size of the minimal necessary set of "universal" roles. Also, when we examine particular semantic fields, it is obvious that each field brings to mind a new set of more specific roles. In fact, the more closely we look at individual predicates, the more specific the argument roles become, creating the specter of trying to define an unlimited number of very fine-grained tags and attributes. An adequate account of the syntax and semantics of a language will inevitably involve a fairly detailed set of semantic tags, but how can we find the right level of granularity of tags for each semantic area?

Consider the sentence:

(1) The waters of the spa cure arthritis.

A semantic annotation of the constituents must identify at least

• the action or state associated with the verb, possibly expressed in terms of primitives or some kind of metalanguage;
• the participants (normally expressed as arguments); and
• the roles of the participants in the action or state.

A basic parse will identify the sentence's syntactic constituents; from the point of view of the head verb cure, then, a semantic annotation should reveal the mapping between the syntactic constituents and the frame-semantic elements they instantiate. In sentence (1) above, for example, the grammatical subject "the waters of the spa" corresponds to the thematic causer of the curing effect on the entity expressed as "arthritis", the verb's syntactic direct object and its thematic patient.²
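The three requirements listed above for sentence (1) can be made concrete with a small data structure. The following Python sketch is an illustration only: the class and field names (`Constituent`, `Annotation`, `grammatical_function`, `thematic_role`) are assumptions of this example and do not come from any project format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Constituent:
    """One syntactic constituent and the role it plays for the head verb."""
    text: str
    grammatical_function: str  # e.g. "subject", "direct object"
    thematic_role: str         # e.g. "causer", "patient"


@dataclass(frozen=True)
class Annotation:
    """A predicate together with the participants identified for it."""
    predicate: str
    participants: Tuple[Constituent, ...]

    def role_of(self, grammatical_function: str) -> str:
        """Return the thematic role mapped to a grammatical function."""
        for constituent in self.participants:
            if constituent.grammatical_function == grammatical_function:
                return constituent.thematic_role
        raise KeyError(grammatical_function)


# Sentence (1): "The waters of the spa cure arthritis."
sentence_1 = Annotation(
    predicate="cure",
    participants=(
        Constituent("the waters of the spa", "subject", "causer"),
        Constituent("arthritis", "direct object", "patient"),
    ),
)

print(sentence_1.role_of("subject"))
print(sentence_1.role_of("direct object"))
```

A flat record like this captures the mapping from syntax to roles, but nothing in it ties the participants to a larger event structure.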
¹ The work is housed in the International Computer Science Institute in Berkeley and funded by the National Science Foundation under NSF grant IRI 96-18838. The official name of the project is "Tools for lexicon building"; the PI is Charles J. Fillmore. Starting date March 1, 1997.

² Here we use the word patient (in italics) as the name of a case role; we will also use the word in the medical sense later in this paper. Caveat lector!

However, there is something incomplete about such an analysis: it fails to anchor the arguments of cure within a "generic medical event" where it would be understood that the disease (arthritis) must be borne by some sufferer, and that a sufferer undergoing a treatment is participating as a patient in such an event. We identify such "generic events" as frames, and express our understanding of the structure of such events and the relationship of linguistic material to them in terms of the theory of frame semantics.

2 Frame Semantics

In frame semantics we take the view that word meanings are best understood in reference to the conceptual structures which support and motivate them. We believe, therefore, that any description of word meanings must begin by identifying such underlying conceptual structures.³

³ For a discussion of these ideas, see (Fillmore, 1968); (Fillmore, 1977b); (Fillmore, 1977a); (Fillmore, 1982); (Fillmore and Atkins, 1992); (Fillmore and Atkins, 1994).

Frames have many properties of stereotyped scenarios -- situations in which speakers expect certain events to occur and states to obtain.⁴

⁴ The word frame has been much used in AI and NLP research. We wish to give the word a formal interpretation only to the extent that it helps us in our research and provides a container for the features and entities we describe. We do not, in this context, depend on any claims about the cognitive status of frames.

In general, frames encode a certain amount of "real-world knowledge" in schematized form. Consider the common scenario which exemplifies the commercial transaction frame: the elements of such frames are the individuals and the props that participate in such transactions (which we call FRAME ELEMENTS): the individuals in this case are the two protagonists in the transaction; the props are the two objects that undergo changes of ownership, one of them being money.

Some frames encode patterns of opposition that human beings are aware of through everyday experience, such as our awareness of the direction of gravitational forces; still others reflect knowledge of the structures and functions of objects, such as knowledge of the parts and functions of the human body. The study of the frames which enter into human cognition is itself a huge field of research -- we do not claim to know in advance how much frame knowledge must be specifically encoded in frame descriptions to make them useful for either linguistic or NLP purposes. We expect to be able to draw tentative conclusions about this based on what we find in corpora.

We will say that individual words or phrases evoke particular frames or instantiate particular elements of such frames. So, for example, if we are examining the "commercial transaction" frame, we will need to identify such frame elements as BUYER, SELLER, PAYMENT, GOODS, etc., and we can speak of such words as buy, sell, pay, charge, customer, merchant, clerk, etc., as capable of evoking this frame. In particular sentences, we might find such words or phrases as John, the customer, etc. instantiating the BUYER, or a chicken, a new car, etc., instantiating the GOODS.

3 Inheritance in Frame Semantics

Of course, speakers of a language know something about the differences and similarities among various types of commercial transactions, e.g. that buying a small item in a store often involves making change, etc. Strictly speaking, this is "world knowledge" rather than "linguistic knowledge", but this level of detail is required even to parse sentences correctly, e.g. to recognize the different functions of the PPs in "buy a candy bar with a red wrapper" and "buy a candy bar with a $20 bill" and thus to attach them appropriately.

    frame (CommercialTransaction)
        frame-elements {BUYER, SELLER, PAYMENT, GOODS}
        scenes (BUYER gets GOODS,
                SELLER gets PAYMENT)

    frame (RealEstateTransaction)
        inherits (CommercialTransaction)
        link (BORROWER = BUYER, LOAN = PAYMENT)
        frame-elements {BORROWER, LOAN, LENDER}
        scenes (LOAN (from LENDER) creates PAYMENT,
                BUYER gets LOAN)

Figure 1: A subframe can inherit elements and semantics from its parent.

More complicated cases require more elaborated frames. Thus, "buy a house with a 30-year mortgage" involves a different frame from buying a candy bar, and entails a slightly different interpretation of the PAYMENT element. The relationship between frames is frequently hierarchical; for example, the frame elements BUYER, SELLER, PAYMENT, and GOODS will be common to all commercial transactions; the purchase of real estate contains all of them and (typically) adds a LOAN and a bank (typically) as LENDER. In our database, these two frames might
be represented as shown in Figure 1.⁵

⁵ We leave out of this account the inheritance of a higher-level EXCHANGE frame in the COMMERCIALTRANSACTION frame, and the means for showing that a completed instance of the REALESTATETRANSACTION scene is a prerequisite to the enactment of the associated COMMERCIALTRANSACTION scene.

Corpus tagging for a sentence like sentence (2):

(2) Susan took out a huge mortgage to buy that new house.

would have to recognize Susan as playing slightly different roles in the two associated frames.

A similar problem in using labels from frame semantic descriptions in the tagging of corpus lines is due to the fact that separate parts of any single sentence can evoke different semantic frames. Consider the following sentence:

(3) George's cousin bought a new Mercedes with her portion of the inheritance.

In seeing this sentence merely as an expression evoking the commercial transaction frame, we could begin by tagging the subject of the sentence, "George's cousin", as the BUYER, and the object, "a new Mercedes" as the GOODS, and the oblique object, "her portion of the inheritance", marked by the preposition "with", as the PAYMENT. This could be done in a fairly natural and transparent way, as long as the tags were clearly seen as the names of frame elements specifically related to the head verb "bought" in that sentence. But since the words "cousin" and "inheritance" evoke frames of their own, the same sentence could easily come up in our exploration of the semantics of those words as well. In the case of "inheritance", for example, the information that it gets used for buying something will make clear that this is an instance of estate-inheritance rather than genetic inheritance (or frame inheritance!), and the phrasing "her portion" fits frame understandings about the distribution of an inheritance among multiple heirs. In other words, if we find ourselves tagging the frame elements of Inheritance in that same sentence, the phrase "George's cousin" would be tagged as an HEIR in that frame.

4 Applied frame semantics: a sample frame description

Tagsets for semantic annotation would be derivable from a database of frame descriptions like the ones in Figure 1 above. We can move to another frame to illustrate how frame-based annotation would be accomplished by considering a few words from the language of health and sickness and showing how the elements and structure of this frame would be identified and described. First, appealing to common, unformalized knowledge of health and the body, the frame semanticist identifies the typical elements in everyday health care situations and scenarios, a process involving the interaction of linguistic intuition and the careful examination of corpus evidence.

The first product of this analysis is a preliminary list of frame elements (FEs) from this domain, such as, for instance, those shown in Table 1.

    label       meaning
    HEALER      individual who tries to bring about an improvement in the PATIENT
    PATIENT     individual whose physical well-being is low
    DISEASE     sickness or health condition that needs to be removed or relieved
    WOUND       tissue damage in the body of the PATIENT
    BODYPART    limb, organ, etc. affected by the DISEASE or WOUND
    SYMPTOM     evidence indicating the presence of the DISEASE
    TREATMENT   process aimed at bringing about recovery
    MEDICINE    substance applied or ingested in order to bring about recovery

Table 1: Part of Frame-semantic "Tagset" for the Health Frame

We have found it necessary to include all of these elements for our purposes, even though some of them are so closely related that they are unlikely to be given separate instantiation in the same clause. Our justification for distinguishing them is based on the results of corpus research and on comparison of the elements of this frame with those of other related frames. Corpus examples in which WOUND and DISEASE are both instantiated are of course rare, and given this complementary distribution we might be tempted to identify these as variants of a single frame element (which we might call AFFLICTION). But this would prevent us from being able to express certain syntactic and semantic generalizations, such as the fact that while we speak of curing diseases, we do not speak of curing wounds, and we speak of wounds but not diseases as healing.⁶

⁶ There might be alternative ways of considering such data. It is conceivable that a description with, say, AFFLICTION as a single role element could be maintained

In the specific case of the contrast between WOUND and DISEASE we find in metaphor further support for our decision to keep them separate. Metaphoric uses of "cure" and "heal" tend to take direct objects which are target-domain analogues of DISEASE and WOUND respectively. One of the most common instantiations of the DISEASE complement in metaphorical uses of cure is the word ills, a word which in fact appears to be used only in such metaphorical contexts (in talk about "curing society's ills", for example); and the direct objects of metaphorical heal tend to be based on the notion of a tear or cut or separation, the words wound and scar first of all, but also such words as rift, schism, and breach.

For each semantic frame, the process of elucidation involves a series of steps:

1. Identification of the most frequent lexical items which can serve as predicates in this frame,
2. Formulation of a preliminary list of frame elements (encoded we expect as a TEI-compliant SGML document using feature structures (Sperberg-McQueen and Burnard, 1994)),
3. Annotation of examples from a corpus by tagging the predicate with the name of the frame and its arguments with the names of the FEs designating their roles relative to the predicate (also using SGML markup introduced with software developed for this purpose),
4. Revision of the frame description -- specification of the co-occurrence constraints and possible syntactic realizations in the light of the corpus data, and
5. Retagging of the corpus examples to fit the revised frames.⁷

The last two steps will be repeated as needed to refine the frame description.

Identifying the semantic frame associated with a word and the FEs with which it constellates does not, of course, constitute a complete representation of the word's meaning, and our semantic descriptions will not be limited to just this. However, we believe that such an analysis is a prerequisite to a theoretically sound semantic formalization.⁸ While any given frame description could be made more precise for other NLP/AI purposes (such as inference-generation), the development of such a formalism is not a central part of our current work.

For our present purposes, the adequacy of lists of frame elements such as what we present in Table 1 for the vocabulary domain of health care can be established only if precisely these elements are the ones that are needed for distinguishing the semantic and combinatorial properties of the major lexical items that belong to that domain. An initial formulation of the combinatorial requirements and privileges of a frame's lexical members -- here we concentrate on verbs -- can be presented as a list of the groups of FEs that may be syntactically expressed or perhaps merely implied in the phrases that accompany the word.

A Frame Element Group (FEG) is a list of the FEs from a given frame which occur in a phrase or sentence headed by a given word. Table 2 gives examples of such FEGs (including FEGs with only one member) paired with sentences whose constituents instantiate them. For purposes of this discussion, the frame elements are identified here using single letter abbreviations, and the structure of an FEG is shown as being merely a bracketed list. We recognize such a naming scheme is inadequate for a large annotation project, and certainly the representation of FEG structures will have to be more powerful. These, however, are minor problems with technical solutions. We focus below on other major issues we are confronting in interpreting the structure of frames as expressed by FEGs.
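As a rough illustration of the definition above, an FEG can be modeled as the set of frame-element abbreviations observed with a head word in one annotated sentence. The Python sketch below is an assumption-laden example and not the project's representation: the abbreviation table `HEALTH_FES` and the helper names `make_feg` and `observed_fegs` are invented here, while the example sentences and their FEG labels are ones this paper itself uses for cure and heal.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Single-letter abbreviations for some Health-frame elements (see Table 1).
HEALTH_FES: Dict[str, str] = {
    "H": "HEALER",
    "B": "BODYPART",
    "T": "TREATMENT",
    "W": "WOUND",
    "M": "MEDICINE",
}

FEG = FrozenSet[str]


def make_feg(*abbrs: str) -> FEG:
    """Build an FEG, rejecting abbreviations that are not in the frame."""
    unknown = set(abbrs) - set(HEALTH_FES)
    if unknown:
        raise ValueError(f"not elements of this frame: {sorted(unknown)}")
    return frozenset(abbrs)


def observed_fegs(examples: Iterable[Tuple[str, FEG, str]]) -> Dict[str, Set[FEG]]:
    """Group the distinct FEGs found in annotated examples by head word."""
    by_head: Dict[str, Set[FEG]] = defaultdict(set)
    for head, feg, _sentence in examples:
        by_head[head].add(feg)
    return dict(by_head)


# (head word, FEG, sentence) triples standing in for annotated corpus lines.
examples = [
    ("cure", make_feg("H", "B"), "The doctor cured my foot."),
    ("cure", make_feg("H", "B", "T"), "The doctor cured my foot with a new treatment."),
    ("heal", make_feg("W"), "The cut healed."),
    ("heal", make_feg("M", "W"), "The ointment healed the cut."),
]

fegs = observed_fegs(examples)
for head in sorted(fegs):
    print(head, sorted("{" + ",".join(sorted(f)) + "}" for f in fegs[head]))
```

Using unordered sets matches the "bracketed list" reading only loosely; a fuller representation would also need to record grammatical function and ordering.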
⁶ (continued) by describing certain distinctions between "cure" and "heal" as involving selectional restrictions. Our inclination, however, is to maximize the separation of frame elements at the beginning, and to postpone the task of producing a parsimonious and redundancy-free description until after we have completed our analysis.

⁷ In the context of the FrameNet project, the question of how much text will be tagged is a practical one. Our direct purpose is not to create tagged corpora, but to tag enough corpus lines to allow us to make reliable generalizations on the meanings and on the semantic and syntactic valence of the lexical entries we have set out to describe. Whether we choose to tag more than what we need for our analysis will depend on the extent to which the process becomes automated and the resources available.

⁸ There are numerous suggestions, not reviewed here, on how to give full semantic representations (Jackendoff, 1994); (Sowa, 1984); (Schank, 1975), etc.

At the lexicographic level of description we could simply list the full set of FEGs for a given lexical unit. However, in many cases the FEG potential of a verb can be expressed in one or more simplifying formulas, by, for example, recognizing some FEs as optional. Thus, since we find both {H, B} ("The doctor cured my foot") and {H, B, T} ("The doctor cured my foot with a new treatment"), and both sentences are using the verb cure in the same sense, we can represent both patterns in a single formula that treats the T element as an optional adjunct
(expressed perhaps as {H, B, (T)}).

    FEG (abbr.)   Frame Element Group            Example
    {H,B,T}       HEALER, BODYPART, TREATMENT    The doctor treated my knee with heat.
    {H,D}         HEALER, DISORDER               The doctor cured my disease.
    {P}           PATIENT                        The baby recovered.
    {M,B}         MEDICINE, BODYPART             The ointment cured my foot.
    {B}           BODYPART                       His foot healed.
    {W}           WOUND                          The cut rapidly healed.

Table 2: Examples of Frame Element Groups (FEGs)

It will not be quite that automatic, however; further distinctions are needed. For example, while we can agree that the TREATMENT element in the previous examples was merely unmentioned, the omission of the DISEASE element in a sentence like "The doctor cured me" has a somewhat different status: there is clearly some DISEASE that the speaker has in mind, and its omission is licensed by the assumption that its nature is given in the context. That is, a possible "of" phrase was omitted from that sentence because its content had been previously mentioned or could otherwise be assumed to be known to both conversation participants. In the tagging of corpus lines, then, we will also indicate the status of "missing" elements to the extent that we can tell what that is. Such information will be presented in the representation of the FEG associated with the predicate.⁹

In contrast to cases where frame elements are "missing" (implied but unmentioned, optional, etc.), some examples require that we explicitly recognize (i.e. encode) multiple frame elements for a single constituent. Thus, the disorder may be identified in the description of the patient (e.g. leper, diabetic); we wish to annotate this constituent as Pd, which will be taken as indicating that the constituent satisfies the P role in the frame, but that it also secondarily instantiates a D role, since these nouns designate people who suffer specific diseases (leprosy, diabetes). It is important to recognize these cases, since the lexical semantics of verbs sometimes require that certain frame elements be instantiated or clearly recoverable from the context: corpus research on the verb cure, for example, shows that the DISORDER is regularly instantiated. Without explicit coding of the substructure of the PATIENT the sentence He cured the leper ({H,Pd}) would stand as a counter-example to this generalization.

There are cases where different but related senses of a predicate have distinct FEG possibilities. For example, the verb heal has two uses, one of which participates in a Causative/Inchoative valency alternation (Levin, 1993) and one which does not. In the use where it refers to the growth of new tissue over a wound, it can be found in both transitive and intransitive clauses: "The cut healed" ({W}) and "The ointment healed the cut" (the ointment facilitated the natural process of healing -- {M, W}). But there is also a purely transitive use with a meaning very close to that of cure, with {H, D} or {M, D}, as in "The shaman healed my influenza" or "The waters healed my arthritis", and this use of heal usually implies something extra-medical or supernatural. In this usage, there is no corresponding intransitive "*My influenza/arthritis healed."

The verb sense distinctions we make may sometimes be less detailed than those appearing in most dictionaries, since, as many researchers have noted, dictionary sense distinctions are often overprecise and incorporate pragmatic and world knowledge that do not properly speaking inhere in the word itself. An excellent example of this kind of excessive distinction is pointed out in (Ruhl, 1989), p.7: one of the dictionary definitions of break is "to rupture the surface of and permit flowing out or effusing" as in He broke an artery. On the other hand, we would expect to capture by this process all the kinds of alternations that (Levin, 1993) has shown to be linked to semantic distinctions, some of them quite subtle.

The final versions of the lexical entries will encompass full semantic/syntactic valence descriptions, where the elements of each FEG associated with a verb sense will be linked to a specification of sortal features, indicating the "selectional" and syntactic properties of the constituents that can instantiate them.
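Two pieces of bookkeeping described in this section, optional elements in a formula such as {H, B, (T)} and constituents such as Pd that secondarily instantiate another element, can be sketched in Python as follows. This is a hedged illustration only: the names `FegFormula`, `expand`, `licenses`, and `elements_instantiated`, and the convention that lowercase letters after the first letter mark secondary roles, are assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, Set


@dataclass(frozen=True)
class FegFormula:
    """An FEG formula with required and optional frame elements."""
    required: FrozenSet[str]
    optional: FrozenSet[str] = frozenset()

    def expand(self) -> Set[FrozenSet[str]]:
        """Enumerate every FEG that the formula licenses."""
        licensed: Set[FrozenSet[str]] = set()
        opts = sorted(self.optional)
        for k in range(len(opts) + 1):
            for extra in combinations(opts, k):
                licensed.add(self.required | frozenset(extra))
        return licensed

    def licenses(self, feg: FrozenSet[str]) -> bool:
        """True if the observed FEG is one of the formula's expansions."""
        return self.required <= feg <= (self.required | self.optional)


def elements_instantiated(tags: FrozenSet[str]) -> FrozenSet[str]:
    """Flatten tags such as "Pd" into the elements they instantiate.

    Assumed convention: the first letter is the primary role; any following
    lowercase letters name secondarily instantiated roles.
    """
    elements = set()
    for tag in tags:
        elements.add(tag[0])
        elements.update(ch.upper() for ch in tag[1:])
    return frozenset(elements)


# {H, B, (T)}: TREATMENT is an optional adjunct.
cure_body_part = FegFormula(frozenset({"H", "B"}), frozenset({"T"}))
print(cure_body_part.licenses(frozenset({"H", "B"})))
print(cure_body_part.licenses(frozenset({"H", "B", "T"})))
print(cure_body_part.licenses(frozenset({"H", "T"})))

# "He cured the leper" is tagged {H, Pd}; D counts as instantiated.
print("D" in elements_instantiated(frozenset({"H", "Pd"})))
```

Flattening Pd this way lets a check such as "cure regularly instantiates D" succeed for "He cured the leper" without a separate DISORDER constituent.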
⁹ Where feasible, because of our interest in sortal features of arguments, we will identify the nature of the missing element from the context. A similar issue arises in cases of anaphora; we may or may not resolve the anaphora's referent in the annotations, depending on practical considerations of time and effort involved.

5 Conclusion

We have suggested a theoretical basis and a working methodology for coming up with an appropriate set of semantic tags for the semantic frame elements, and believe that such frames may constitute a sort of "basic level" of lexical semantic description. As
such they would be an appropriate starting-point for both a broad-coverage semantic lexicon and for the semantic tagging of corpora.

We have also pointed out the importance of incorporating the notions of inheritance and other substructuring conventions in tagsets to reduce the size and complexity of the descriptions and to capture generalizations over natural classes.

We recognize several shortcomings with our approach which we hope to be able to address in the future.

First, it is clear that the size of the descriptions will increase rapidly as the annotation proceeds and we will need to find some explicit means of abbreviating representations, of collapsing FEGs in a principled way, and of relating frames together (both within and across semantic fields). This is both a practical and theoretical problem. We have shown a few clear examples in which the judicious use of the notion of inheritance, along the general lines of the ACQUILEX Project (Briscoe et al., 1993), should permit the concise representation of the lexical knowledge required to give a useful and relatively complete description of a word's semantic range. If the valence description (the FEG together with links to grammatical functions) associated with individual words is attached to each valence-bearing lexical token in a corpus, then if the corpus is parsed according to the same criteria by which the linking has been stated, we can avoid the problem of actually tagging the phrases that instantiate frame elements (and hence avoid the problem of multiple tagging for constituents that figure in more than one frame in the same sentence), because the constituents that play specific semantic roles in the sentence can be computed from the parse. The ability to accomplish something like that is desirable, but it is not something to which we are presently committed.

We intend first to focus on prototypical or core uses of the words. However, our preliminary research indicates that it would be difficult, and undesirable, to exclude metaphorical uses, if only because the metaphorical uses can often shed light on the structure of the core uses. However, we are limiting our attention to a limited number of semantic domains, and metaphorical extensions from the words in our wordlist that go far beyond our semantic fields will probably have to be set aside.

Finally, we should make a few remarks on the scope of our intended effort. We plan to create a "starter lexicon" containing some 5,000 lexical items indexed to examples of their use. With each entry we shall associate token frequencies with the various FEGs for each word sense, in order to assist NLP programs in picking likely interpretations. Initially the frequencies would be generated using our hand-tagged corpus examples; eventually we hope to be able to train on the hand-tagged examples and ultimately automate (at least partially) the tagging of instances, at least for preliminary word sense disambiguation, to be reviewed by a researcher. The automatic categorization of the arguments would use such information as WordNet synonyms and hypernyms (cf. (Resnik, 1993)), machine-readable thesauri, etc.

References

Ted Briscoe, Valeria De Paiva, and Ann Copestake, editors. 1993. Inheritance, Defaults and the Lexicon. Studies in Natural Language Processing. Cambridge University Press, Cambridge, England.

Charles J. Fillmore and B.T.S. Atkins. 1992. Towards a frame-based lexicon: the semantics of risk and its neighbors. In A. Lehrer and E. F. Kittay, editors, Frames, Fields and Contrasts, pages 75-102. Lawrence Erlbaum Associates, Hillsdale, NJ.

Charles J. Fillmore and B.T.S. Atkins. 1994. Starting where the dictionaries stop: the challenge for computational lexicography. In B.T.S. Atkins and A. Zampolli, editors, Computational Approaches to the Lexicon. Oxford University Press, New York.

Charles J. Fillmore. 1968. The case for case. In Universals in linguistic theory, pages 1-90. Holt, Rinehart and Winston, New York.

Charles J. Fillmore. 1977a. The need for a frame semantics within linguistics. Statistical Methods in Linguistics, pages 5-29.

Charles J. Fillmore. 1977b. Scenes-and-frames semantics. In Antonio Zampolli, editor, Linguistics Structures Processing, volume 59 of Fundamental Studies in Computer Science, pages 55-82. North-Holland Publishing.

Charles J. Fillmore. 1982. Frame semantics. In Linguistics in the morning calm, pages 111-137. Hanshin Publishing Co., Seoul, South Korea.

Ray S. Jackendoff. 1994. Patterns in the mind: language and human nature. Basic Books, New York.

Beth Levin. 1993. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago.

Philip Resnik. 1993. Selection and Information: A Class-Based Approach to Lexical Relationships. University of Pennsylvania dissertation.

Charles Ruhl. 1989. On monosemy: a study in linguistic semantics. Albany, N.Y.: State University of New York Press.

Roger C. Schank. 1975. Conceptual information processing. North-Holland, New York.

John F. Sowa. 1984. Conceptual structures: information processing in mind and machine. Addison-Wesley systems programming series. Addison-Wesley, Reading, Mass.

Michael Sperberg-McQueen and Lou Burnard (eds.). 1994. Guidelines for electronic text encoding and interchange (TEI P3). ACH, ACL, ALLC, Chicago.