LEXICAL ENCODING OF VERBS IN ENGLISH AND BULGARIAN

Rositsa Dekova
Department of Modern Languages, NTNU Trondheim, Norway, NO-4791
rositsa.dekova@hf.ntnu.no

Abstract

This paper focuses on the information that can be encoded in verbs as lexical entries, and its formal representation both in English and Bulgarian. For this purpose, an already existing, but not very widespread framework is used: the Sign Model (Dimitrova-Vulchanova, 1996/99; Hellan and Dimitrova-Vulchanova 2000), which describes words as meaningful cells, including morpho-syntactic information. Based on corpora data and results from continuation tests, my research is an attempt to find a unified format for representing lexical entries not only within a single language, but also across languages.

1 Introduction

The knowledge that native speakers demonstrate suggests that something beyond word-specific idiosyncratic properties needs to be accounted for in the lexical representation of words. Information about the syntactic environment in which a particular word can appear should also be part of the lexical encoding, a finding that is particularly relevant for verbs. The assumption that only some participant information is encoded lexically is widespread across a number of different linguistic theories. Traditionally, participants in a situation, denoted by a particular verb, are divided into two main groups: arguments and adjuncts (or complements and modifiers). A verb selects a set of arguments. The adjuncts, in contrast, are neither required, nor dependent on the particular verb. They can co-occur with many other verbs. Furthermore, syntactic realization does not always overlap with semantic obligatoriness. Therefore, I will refer to the set of entities included in a situation as semantic participants (following the terms in Koenig et al. 2002), where lexically encoded semantic participants are called arguments, and non-lexically encoded semantic participants adjuncts. The terms complement and modifier are used respectively for their syntactic correlates.

2 Theoretical Background

Although it is widely accepted that the syntactic structure of many sentences is determined mostly or entirely by the participant information included in the lexical entries of verbs, there are no reliable syntactic criteria that can be used to delimit the set of items that can express lexically encoded participant information. In other words, there is no established set of necessary and sufficient criteria that can serve as a clear-cut basis for the distinction between information that is lexically encoded and information that is not (that is, the distinction between arguments and adjuncts). One very good solution, however, has been suggested in a paper by Koenig et al., Class Specificity and the Lexical Encoding of Participant Information (Koenig et al. 2002). They propose two criteria, semantic obligatoriness and verb class specificity, which jointly determine the argument status of participant information. The authors define lexically encoded information as that information which is accessed immediately upon recognition of a word. This is because only lexically encoded participant information is expected to play a role in the immediate representation that readers form for sentences. This information is said to be obligatory, that is, “it is entailed to hold of the class of situations denoted by a word” (ibid, p.226), and it is also relatively specific to the corresponding verbs. Those two properties can be directly observed by language users, and therefore, according to Koenig et al., they can serve as criteria providing a basis for learning the distinction between arguments and adjuncts.
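The way the two criteria jointly decide argument status can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the class ParticipantProfile, the function is_argument, the verb_share figures and the 0.5 cut-off are all hypothetical and do not come from Koenig et al. or from this paper. Only the instrument contrast between cut and drink is taken from the discussion in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantProfile:
    """One participant type as it relates to one verb (hypothetical structure)."""
    verb: str
    participant: str
    obligatory: bool    # entailed to hold of every situation the verb denotes
    verb_share: float   # assumed fraction of all verbs that entail this participant


def is_argument(profile: ParticipantProfile, max_share: float = 0.5) -> bool:
    """Treat a participant as lexically encoded only if it is both
    semantically obligatory and specific to a restricted class of verbs.
    The max_share cut-off is an arbitrary illustrative value."""
    class_specific = profile.verb_share < max_share
    return profile.obligatory and class_specific


# Toy values: the obligatory flags for cut/drink follow the paper's discussion,
# the verb_share numbers are invented for illustration.
PROFILES = [
    ParticipantProfile("cut", "instrument", obligatory=True, verb_share=0.1),
    ParticipantProfile("drink", "instrument", obligatory=False, verb_share=0.1),
    ParticipantProfile("cut", "time", obligatory=True, verb_share=1.0),
]

for p in PROFILES:
    status = "argument" if is_argument(p) else "adjunct"
    print(f"{p.verb} + {p.participant}: {status}")
```

The third profile shows why obligatoriness alone is not enough in this sketch: a participant that is entailed by virtually every verb fails the class-specificity test and is treated as an adjunct.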

The contrast between the verbs in sentences (1) and (2) illustrates this approach:

(1) He cut the paper with the scissors.
(2) She drank the cocktail with a straw.

While cut always describes a situation where an instrument is included, drink only allows an instrument to be included in some types of situations denoted by it. Thus, the instrument phrase with the scissors is both obligatory and specific for the verb cut, but one does not have to use an instrument to drink. Therefore an instrument should be included in the lexical representation of cut, and need not be present in the encoding of drink.

3 My research

Relying heavily on these two criteria, namely semantic obligatoriness and verb class specificity, I have been able to isolate the participant information that should be encoded in the lexical representation of a selected set of verbs. I have selected basic verb types in English and Bulgarian and examined their semantic properties with an account of their syntactic distribution. Special attention was paid to approximately 20 verbs, subgroups of what are called Verbs of Contact by Impact (as defined in Levin, 1993) along with verbs that include motion (in Levin’s classification, those fall in the group of Throw Verbs). I have partially analyzed the type of syntactic behaviour they exhibit in the available corpora, for English and Bulgarian, respectively. I have also tested the results of the analyses against native speaker judgments in two similar continuation tests for both languages. The focus of the paper is on the results for Bulgarian, because it is not so well studied, and therefore perhaps more interesting to discuss. Special attention is paid to some of the information obtained from the corpora analyses, and its significance for the lexical representation of the verbs.

3.1 Corpora used in the research

In order to determine the possible morphosyntactic environment of the verbs selected, I have used Brown and LOB corpora for English; for Bulgarian I have used a corpus that is still under construction in the Laboratory of Computer Modelling of Bulgarian Language, at the Bulgarian Academy of Sciences, where I did part of my field research. And also, as the Bulgarian corpus is very large, but does not allow for specific searches yet, only the first 100 of the occurrences have been analysed in detail for verbs occurring more often. The corpora research was aimed at revealing the possible and/or preferred syntactic environment of the relevant verbs, as well as an analysis of the most common semantic participants in relation with their syntactic distribution. Thus I could observe the relations between a semantic participant of a verb and the possible syntactic positions it can occupy with this verb being the main verb in the sentence.

3.2 Results and examples

The results from the corpora research are summarised in Table 1 and Table 2. For the purpose of this paper, I will only show a few illustrative examples for English (LOB corpora only) and Bulgarian. As expected, the verbs examined showed a great tendency to occur in a syntactic environment that consisted of elements that are overt expressions of the semantic participants linked to the particular verb. Aside from the appearance of the traditionally accepted complement (the direct object), which here is referred to as Limit (the participant that is affected or changed), we can see that there are also many phrases that are identified as BodyPart/Possessor (Levin’s term, ibid: 71). One peculiarity can be clearly seen in three of the Bulgarian verbs analyzed: proboda/stab, pljasna/slap, and potupam/tap, as illustrated by the following examples:

(3) …i go probode v sartseto.
    …and stabbed him in his heart.
(4) Tja pljasna Gabi po koljanoto.
    She slapped Gabi on her knee.
(5) …i go potupah po ramoto.
    …and slapped him on his shoulder

The high degree of occurrences of those phrases with particular verbs suggests that the information conveyed by them is an important part of the lexical representation of the verbs. This, however, does not directly mean that we should include them as separate participants in the situation described. On the contrary, the Possessor Raising phrase specifies the Limit, or more precisely, the place of its contact with what is called the Launch-part, but it does not constitute a separate participant. Therefore, we should distinguish between different types of information all of which is important for the lexical encoding of a particular verb, but which should not be treated in the same way.

Another interesting example would be the presence of path in the verb pljasna/slap:

(6) … pljasna vav vodata.
    … slapped in the water.
(7) …tja pljasna dolu.
    …she slapped down… (She fell down)
(8) …and pljasna dolu varhu trevata.
    …and slapped down onto the grass. (She fell down on the grass)

Whereas in sentence (6) the prepositional phrase in the water cannot be initially identified as path, the prepositional phrases in sentences (7) and (8) show that ‘in the water’ should also be regarded as an overt expression of the end of path information that is lexically encoded in the verb. A similar behaviour is observed for other verbs of motion (see for example Dimitrova-Vulchanova, 2004).

4 The continuation tests

To determine what kind of participant information should be included in the lexical encoding of this particular set of verbs, I have used not only data from English and Bulgarian corpora, but also the results from two similar continuation tests conducted for both English and Bulgarian. These tests were developed to test native speakers’ intuition about the most prominent participants in a situation denoted by the target verbs. The main idea behind the continuation tests was to confirm the hypothesis that, as described earlier, if implicit participant information is lexically encoded, then it will play an important role in the immediate representation that the readers form for sentences, and is therefore more likely to be used to continue a sentence. Thus I expected to receive a significantly higher percentage of continuations related to semantic participant information, than the percentage of the responses that do not include lexically encoded participant information.

4.1 Methodology of the tests

The tests were organized as follows: there were 50 to 60 sentences, containing as many as 20 target verbs, equally distributed among the target sentences, together with approximately the same amount of sentences containing distracter verbs. The first 30-40 sentences consisted only of a subject and a verb, while the last sentences also contained a direct object. All the participants in the tests were asked to “complete the sentences without spending too much time on any of the items.” I encouraged the participants to write down each continuation fast, so that it would be the first “thing” that came into their mind (additional literature on the methodology of similar type of tests can be found in Koenig et al. 2002, 2003).

4.2 Results and analysis

Some of the results for Bulgarian (in per cent) can be seen in Table 3, in the Appendix. The tests for English have not yet been fully completed, but the analyses so far are consistent with the results from the Bulgarian tests. There are higher percentages of continuations related to information about semantic participants: approximately 90% of the continuations for tap, stab, and cut can be defined as Limit. And since the answers differ widely from each other (e.g. continuations for cut included: the bread, her hair, her finger, John, opashkata mu ‘his tail’, her knee) it cannot just be assumed that the results are merely due to the existence of a certain stereotype or a phraseological unit containing the target verb. The continuations provided for the sentences also confirmed the corpora analyses.
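The percentages reported for the continuation tests rest on a simple tally: each continuation is coded with a participant category and the share of each category is computed per test sentence. A minimal Python sketch of that arithmetic follows. The function name category_percentages and the sample list of coded responses are assumptions made for illustration; they are not the author's data or procedure.

```python
from collections import Counter


def category_percentages(labels):
    """Return the rounded percentage of responses that fall in each category."""
    counts = Counter(labels)
    total = len(labels)
    return {category: round(100 * n / total) for category, n in counts.items()}


# Hypothetical coded continuations for a single test sentence.
coded = ["Limit"] * 27 + ["Instr/Body ext"] * 2 + ["Manner"] * 1

print(category_percentages(coded))
```

With a coding like this, the hypothesis of the tests amounts to checking that the categories assumed to be lexically encoded receive a clearly higher share than the remaining ones.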

As can be seen in the tables in the Appendix, Instrument/Body extension constituted 90% of the fillers for stab, 27% for tap, and 37% for cut. In addition, the continuations for the second half of the sentences in the tests, which were virtually “complete” (as described earlier, the sentences had a subject, verb, and a direct object), contained a high degree of fillers that were consistent with the information assumed to be semantically encoded.

5 The formal description

So far we have seen that the kind of participant information that can be lexically encoded is semantically obligatory, and is restricted to a verb, or a verb class in terms of selection. Based on the data from the corpora research, together with the results from the continuation tests, I have tried to find a suitable formalized lexical representation for the set of verbs selected, a representation that will account for the considerable complexity and subtlety of their meaning.

Following a proposal made by Hellan and Vulchanova (Hellan & Dimitrova-Vulchanova 2000) I assume that there is a set of lexical semantic factors that serves as the basis for predictions about the possible morpho-syntactic environment of a verb. One of the potential members of that set is called criteriality. It is a new decompositional approach to the traditional Theta-roles (Dowty, 1991). The importance has been shifted to the number of the participants, if any, as well as to the differentiation of the sub-events constituting the main event. Each of those sub-events should be separately described in detail.

In order to define criteriality first I must briefly describe the structural unit constituting the meaning of a verb, called a cell (Dimitrova-Vulchanova 1996/99); that is, the meaning of a verb (the Cell) is identified with the conditions that have to be met by the participants in a situation so that it can count as being expressed by this particular verb. A cell consists of two parts: an aspectual part and a dimensional part. The aspectual part specifies the following factors:

a. Situational vs. Non-Situational: reflecting whether what is expressed by the verb is situated in time or not.
b. Dynamic vs. Stative: relevant only for Situational verbs and reflecting whether some kind of change or Force emission is involved or not.
c. Monodevelopmental vs. Non-Monodevelopmental: depending on whether the dimensional part includes Monodevelopment or not.
d. Protracted vs. Non-Protracted: a contrast that is close to the traditional distinction ‘durational’ vs. ‘nondurational’.

All of them describe the situation as a whole.

The dimensional part consists of a number of dimensions. The dimensions may consist of one or more values. The dimension of Force, being one of the main cases, may incorporate the values of Source (the participant performing the action), Launch-Part (the part of the participant performing the action), and Limit (the item upon which the force has been performed). The Control dimension (for action that is under the control of a participant) will incorporate the values of the Controller, the Means, and the Target. The dimension of Monodevelopment (short for ‘monotonic development’) includes the value of a Monodeveloper (the one performing the monotonic development) together with information about the possible respects in which the development can take place: Integrity, Location (mainly regarding path), and Quality. For many verbs a further dimension of Conditioning is possible in close relation with Monodevelopment. Conditioning applies when, in a given context, a given event or actor, called the Conditioner, is sufficient to release a certain event, called Conditioned. For example, in ‘John broke the window’, ‘John’ is the Conditioner for the event of breaking the window. In contrast, in ‘The window broke’, no Conditioner is identified and no Conditioning obtains in this usage of ‘break’.

A participant in a situation is thus defined by the set of values characterising co-indexed elements in the different dimensions. Each of them reflects a different aspect of the involvement of one and the same participant in the situation denoted by the verb.

The notion of Criteriality, then, applies to the items of a cell that have properties by which the situation is easily identified as belonging to a certain type. The following items (in Hellan and Vulchanova 2000) are defined as criterial:

1. An item with the value ‘Monodeveloper’
2. A Source whose Launch-part (a) behaves monotonically, or (b) is specified for inherent properties
3. A Limit with sustained contact
4. An item characterized for Posture
5. A Source for an iterative activity with a cumulative Target

With this theoretical approach as the basis of my research, I have attempted to find a unified format for representing lexical entries not only within a single language, but also across languages. This representational format makes this not only possible but also easily predictable, as it is easily possible to compare verbs that also encode more/less information than their correlates. Thus a more formal (and probably more accurate) comparison of verbs (their meaning, as well as their syntactic behaviour) can be achieved.

A more in depth analysis of the basic cell of the verb tap/potupam illustrates this approach:

Cell of tap/potupam
  Global specification: +Protracted
  Constituency of Development: Recursion based
  Mode of recursion: iterative
  Recursive unit: Cell_n
  Aspectual Specification: +2-point
  Element specification:
    Conditioning: Conditioner1, Conditioned2
    Constituency: Fingers2
    Force: Source1, Launch-part2, Limit3
    Monodevelopment: Monodeveloper2

  Cell:
    Aspectual specification: +2-point
    Element specification: Monodevelopment_a
      Element: 2
      Phasing: +2-point
      Medium: Location
      Line of Trajectory
      End: Contact with 3
    Limit3

Thus sentences (9) and (10) can be evaluated as described below:

(9) John tapped his fingers on the desk.
(10) His fingers tapped on the desk.

In the context of (9), John tapped his fingers on the desk, John will be defined as the set of values (Conditioner1, Source1), his fingers as (Fingers2, Launch-part2, Mover2), and the desk as (Absorber3, Limit3). However, in the case of (10), His fingers tapped on the desk, the dimension of Conditioning will not be present. The basic cell contains two items that can be counted as criterial: one of them is ‘John’, according to 2(a) (a Source whose Launch-part behaves monotonically), and the other one is ‘his fingers’, according to 1 (an item with the value ‘Monodeveloper’).

The alternation, described in Levin (1993) as Causative/Inchoative, was incorrectly predicted by Levin as impossible with the verb tap. According to Levin’s criteria, verbs undergoing Causative/Inchoative Alternation can be characterized as verbs of Change of State or Change of Location. The verb tap is a member of the Hit Verbs, a sub-group of Verbs of Contact by Impact, and of two other verb groups.
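The idea that a participant is the set of co-indexed values across the dimensions of a cell can be sketched as a small data structure. The following Python fragment is a hedged illustration, not the Sign Model's own formalism: the dictionary CELL_TAP_9 and the functions participants and criterial_items are hypothetical names, the value labels follow the cell specification for tap/potupam (so Monodeveloper is used rather than the Mover and Absorber labels that appear in the prose), and only criterial items 1 and 2(a) are implemented.

```python
from collections import defaultdict

# Dimension -> (value, index) pairs for sentence (9), "John tapped his fingers
# on the desk". The layout is an assumed encoding of the cell for tap/potupam.
CELL_TAP_9 = {
    "Conditioning": [("Conditioner", 1)],
    "Constituency": [("Fingers", 2)],
    "Force": [("Source", 1), ("Launch-part", 2), ("Limit", 3)],
    "Monodevelopment": [("Monodeveloper", 2)],
}


def participants(cell):
    """Group values by index: one participant per index, defined by the set
    of values that are co-indexed across the dimensions."""
    grouped = defaultdict(set)
    for values in cell.values():
        for value, index in values:
            grouped[index].add(value)
    return dict(grouped)


def criterial_items(cell):
    """Return {index: rule} for criterial items 1 and 2(a) only.
    Rule 1: an item with the value 'Monodeveloper'.
    Rule 2(a): a Source whose Launch-part behaves monotonically, read here
    as: the Launch-part is itself a Monodeveloper."""
    parts = participants(cell)
    result = {}
    monodevelopers = [i for i, vals in parts.items() if "Monodeveloper" in vals]
    for i in monodevelopers:
        result[i] = "1"
    launch_part_is_monotonic = any("Launch-part" in parts[i] for i in monodevelopers)
    for i, vals in parts.items():
        if "Source" in vals and launch_part_is_monotonic and i not in result:
            result[i] = "2(a)"
    return result


print(participants(CELL_TAP_9))
print(criterial_items(CELL_TAP_9))
```

Under these assumptions, index 1 (John) comes out as criterial by 2(a) and index 2 (his fingers) by 1, which mirrors the evaluation given for sentence (9).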

Tap, which Levin also lists among the Throw verbs and Investigate Verbs, was not predicted to be able to allow this alternation. As we have already seen, there is, in fact, change of location, but with regard to the Launch-part only. Levin may have come to this incorrect conclusion by overlooking the individual participants and the single sub-events in the situation, instead regarding the situation as a whole.

6 Conclusion

I have tried to show that breaking up information for encoding into relevant semantic features, and using a suitable formal representation, are crucial in finding a unified format of representing lexical entries, not only within a single language, but also across languages, because this approach makes it possible to compare verbs that encode more/less information than their correlates. Thus a more formal (and probably more accurate) comparison of verbs (their meaning and their syntactic behaviour) can be achieved. Thus an investigation of the possible optimal solutions will be pursued to describe those verbs that do not have semantic equivalents in another language. This will lead my research to a new stage: the creation of a VerbNet, as a Distributed Lexical Database, containing a network of verb classes with their semantic features. It would also be very interesting to investigate whether the representational format presented in this paper can be integrated within some of the well-known lexical theories, such as Pustejovsky’s Generative Lexicon (Pustejovsky, 1995).

7 Acknowledgments

I would like to thank my colleagues and friends at the Department of Modern Languages, NTNU, Trondheim, who supported me in my research. I would also like to thank my colleagues at the Laboratory of Computer Modelling of Bulgarian Language, at the Bulgarian Academy of Sciences, Sofia, where I collected my data for Bulgarian and who accepted me as part of their research team.

References

M. Dimitrova-Vulchanova. 1996/99. Verb Semantics, Diathesis and Aspect. Doctoral dissertation. NTNU (University of Trondheim)/LINCOM, Newcastle/Munchen.

M. Dimitrova-Vulchanova. 2004. Paths in Verbs of Motion. Presented at the Argument Structure CASTLE Conference, November 4-6, 2004, Tromsø University.

D. Dowty. 1991. Thematic proto-roles and argument selection. Language, 67(3): 547-619.

L. Hellan and M. Dimitrova-Vulchanova. 2000. Criteriality and Grammatical Realization. Lexical Specification and insertion. CLIT series. John Benjamins.

J. P. Koenig, G. Mauner, and B. Bienvenue. (2002). Class Specificity and the Lexical Encoding of Participant Information. Brain and Language, 81, 224-235.

J. P. Koenig, G. Mauner, and B. Bienvenue. (2003). Arguments for Adjuncts. Cognition, 89, 67-103.

B. Levin. 1993. English Verb Classes and Alternations. Chicago and London: University of Chicago Press.

J. Pustejovsky. 1995. The Generative Lexicon. The MIT Press.

2 BPP 16 limit 1 4 instrument 1 BPP Table 2: The Bulgarian corpus data USAGE VERB rezha (cut) proboda (stab) Trans 71 1 refl.e. 2 100 - 76source - 2 19 limit 3 mann 15 instr 3 quant 13 BPP 2 loc 6 path end 95 limit 13 mann 72 BPP 2 loc 7 instr/b. Intr 21 7 sepassive Lit 90 Fig 12 SUBJECT Human/ Meta Instr /or part phor body ext. 39source 3 2/ 2 wings OBJECT Other Argument Adjunct 20 mann 8 loc 3 quant 9 mann 4 quant 1 time 1 loc 71 limit 5 source 12 instr 7 limit 2 BPP 84 limit 4 source 24 BPP 23 instr 3 (object) 4 (bird) - 72 15 34source 14 12 pljasna (slap) potupam (tap) 10 31 41 - 21source 1 (face) - 3 (feet/tail) 93 5 refl. 83 4 refl. loc.Appendix: Table 1: The English corpus data USAGE VERB cut stab slap Trans 69 4 8 Human/ Intrans Lit Fig part 23 43 source 86 43 37 pass 4 limit 1 2 source 2 2 1 pass 1 limit 3 1 pass 2 12 8 source 1 limit 15 source SUBJECT MetaInstr/ or phor body extens 4 limit 7 OBJECT Other Argument Adjunct 12 manner 1 manner 4 manner 3 manner 1 quantity tap 14 15 1 - - 6 source 58 limit 29 limit 5 instrument 2 limit 1 1 BPP 9 limit 2 source 3 part. 1 time .

foot) used as an instrument Human/part: human or part of a human (face. Margaret otrjaza Margaret cut 11. Billy probode Billy stabbed 26. finger. eyes. Ann probode mesoto Ann stabbed the meat 38. head. Lilly otrjaza hljaba Lilly cut the bread 36. Nozhat rezhe The knife cuts 32. Knigata pljasna The book slapped 28. Lucy pljasna Lucy slapped 9. Iva potupvashe po masata Iva tapped on the table 27 7 +2 Instr/ Body ext 10 37 Limit 90 +3 26 +10 93 90 +1 10 43 57 3 70 origin 3 end 10 47 27 47 Path Body-Part/ Possessor +16 +3 13 Loc Temp Manner +3 7 17 shamar slap 7 (s.Table 3: Results from the continuation test for Bulgarian Sentence 3.): instrument or body extension (hand.): someone +: refers to continuations provided in addition to the first one . (b. Valnite pljaskaha The waves slapped 34.o. leg. hair) Trans: transitive usage of the verb Intrans: intransitive usage of the verb Lit: literal usage of the verb Fig: figurative usage of the verb Source: the participant performing the action Limit: the participant that is affected or changed BPP: body-part/possessor Loc: (event) location Part.o.) 3(-) 3 3(-) 7 3(-) 3 10(-) 26 Other 90 10 3 7(-) 37 53 Legend: Argument: lexically encoded semantic participant Adjunct: non-lexically encoded semantic participant Instr/or body ext. Bob potupa Bob tapped 8.e. loc: participant location Pass: passive Se-pass: se-passive (a certain type of passive in Bulgarian) Refl: reflexive Quant: quantity Mann: manner Temp: temporal (time) (-): no continuation was provided (s.
