
Automated Natural Language Processing Using Functional Discourse Grammar

Reinier Lamers April 21, 2009


Supervisors: dr. F.P.M. Dignum, drs. N.L. Vergunst, prof. dr. Kees Hengeveld
Submitted as a master’s thesis in Cognitive Artificial Intelligence at Universiteit Utrecht for 30 European credits


Preface
The road leading up to the thesis you are reading has been longer and bumpier than the graduation manual predicts, although this is hardly a surprise, especially to a Cognitive Artificial Intelligence student. There are numerous people who supported me while I was cruising this road and its surroundings. I will take the opportunity to thank them here. First among them are my parents Diny and Leo. I also wish to thank my supervisors Frank and Nieske; Kees Hengeveld and Evelien Keizer in Amsterdam, who always had the patience to answer my questions about Functional Discourse Grammar; my officemates in the computer science building, especially Barbara and Eelco; and of course CAI students in general, for always being in the mood to discuss anything remotely related to AI in depth.

The tree diagrams in this thesis, one of which made it to the cover page, were drawn using the dot2tex and dot2texi packages by Kjell Magne Fauske. I thank him, and the authors of all other open source tools used in my master's thesis project, for sharing their work.


Contents
1 Introduction
   1.1 From Syntax to Semantics Using Functional Discourse Grammar
      1.1.1 From Syntax to Semantics
      1.1.2 Why FDG?
   1.2 Research Question
   1.3 Notation conventions
   1.4 Outline

2 Functional Discourse Grammar
   2.1 Principles
      2.1.1 Discourse Act
      2.1.2 No Transformations or Filters
      2.1.3 A Single Theory of Language
      2.1.4 Maximal Depth Principle
      2.1.5 Typology-based
      2.1.6 Not a Theory of Discourse
   2.2 Components
      2.2.1 Levels, Formulation and Encoding
      2.2.2 Use of the Structure for Natural Language Processing
   2.3 Layers
   2.4 Primitives
      2.4.1 Formulation Primitives
      2.4.2 Morphosyntactic Encoding Primitives
      2.4.3 Phonological Encoding Primitives
   2.5 A Simple Example
   2.6 Notation Conventions for FDG Layers

3 The Morphosyntactic and Representational Levels
   3.1 The Morphosyntactic Level
      3.1.1 The Formal Structure of the Morphosyntactic Level
      3.1.2 Positioning and Alignment
   3.2 The Representational Level
      3.2.1 Semantic Categories
      3.2.2 Propositional Contents
      3.2.3 States of Affairs
      3.2.4 Properties
      3.2.5 Individuals
      3.2.6 Coindexation
      3.2.7 Relation to the Morphosyntactic Level

4 Language-Independent Aspects
   4.1 Theoretical Background
   4.2 Overview of the Approach
      4.2.1 FDG Structures as Trees
      4.2.2 The Three Stages
   4.3 The Treewalk Stage
      4.3.1 Form of Tree Chunks
      4.3.2 The Treewalk Procedure
      4.3.3 Implementation in Prolog
      4.3.4 The Lexicon
   4.4 The Composition Stage
      4.4.1 Form of Combination Criteria
      4.4.2 The Composition Process
   4.5 Coreference Resolution
      4.5.1 Implicit Subjects

5 Language-Dependent Aspects for a Fragment of English
   5.1 The Test Dialog
      5.1.1 Morphosyntactic Analyses
      5.1.2 Representational Analyses
   5.2 The Accepted Fragment of English
   5.3 The Node Function and Matching Function
      5.3.1 Grammatical Words
      5.3.2 Noun Phrases
      5.3.3 Verb Phrases
      5.3.4 Adjective Phrases
      5.3.5 Adposition Phrases
   5.4 Representational Frames
   5.5 Initial open spot

6 An Example
   6.1 The Treewalk Stage
   6.2 The Composition Stage

7 Discussion and Conclusion
   7.1 Discussion and Future Work
      7.1.1 Limitations of the Node Function
      7.1.2 Universal Combination Criteria
      7.1.3 Towards a Complete FDG-based Understanding System
      7.1.4 Efficiency
      7.1.5 Coreference Resolution
      7.1.6 Use High-Level Information
   7.2 Conclusion

A Samenvatting in het Nederlands
   A.1 Inleiding
   A.2 Functionele discourse-grammatica
   A.3 De aanpak

Chapter 1

Introduction

Imagine a clever, friendly machine in your kitchen. Imagine that it's there when you need it, and that it listens to your calls for help in ordinary English or Dutch. It can tell you what you can cook with the available ingredients. It can tell you how you can replace a missing ingredient. It can tell you how thin you should slice the tomatoes. And best of all, it tells you all this in ordinary natural language. This clever machine is aware when you're tired or in a hurry, and it adapts its behavior accordingly.

Actually, people are now trying to build this machine. This project is known as the Dutch Companion Project, which has a website at http://www.decis.nl/content/view/50/41/. The aim of this thesis is to describe a part of a natural language processing system that could be used in this machine.

To build such a friendly machine from dead materials, one needs a lot of technology from the field of Artificial Intelligence. One needs language and speech recognition to figure out what a user is saying to the machine. One needs a model of the environment including the emotions and beliefs of the users. One needs reasoning systems to come up with solutions to the users' problems. One needs language and speech generation. And last but not least, the machine will have to be able to make appropriate deductions from a speech signal: it receives an acoustic signal from its sensors, and finally the speech in that signal must bring about changes in the machine's model of the environment and the users. Between those two endpoints lie many processing steps, from noise reduction to parsing the linguistic utterance present in the signal. This thesis focuses on one of those processing steps: the step from the syntax of a linguistic utterance to the semantics of that utterance.

1.1 From Syntax to Semantics Using Functional Discourse Grammar

This thesis will describe a computer program that takes a representation of the syntax of an utterance, and produces a representation of the semantics of the utterance. These representations of syntax and semantics follow the scheme of Functional Discourse Grammar (FDG) [Hengeveld and Mackenzie, 2006] [Hengeveld and Mackenzie, 2008], a grammar theory described in chapter 2.

1.1.1 From Syntax to Semantics

The step from syntax to semantics is not as well-researched as the step from text to syntax. While there are many well-known parsing algorithms for natural language, like CYK, Earley or HPSG-based parsers, I do not know of any syntax-to-semantics technology that is equally well-known. The work of Richard Montague [Montague, 1970] [Montague, 1973] on systematically extracting the meaning of a text out of its form has been extremely influential. It has been taken up by the categorial grammar community among others, as explained in [Moortgat, 2002]. When using this Montague method, the semantics of an utterance are commonly represented as formulas in a logic formalism. While this allows one to compute the truth of an utterance, it also has a number of drawbacks. Most notably, some linguistic utterances, like exclamations or imperatives, are not naturally analyzed as logical sentences that can be true or false. Therefore, in this thesis, I use data structures postulated by FDG to represent the semantics of an utterance instead of logical formulas.

1.1.2 Why FDG?

One reason to choose FDG as the linguistic foundation to build the computer program on was that FDG is a complete linguistic theory: FDG covers the phonology, the morphosyntax, the semantics and the pragmatics of an utterance in one coherent theory. Using one theory that covers all these aspects is much more convenient than using four different theories. Also, FDG already describes the structure of linguistic data to some extent with its notion of layers. This made it easier to design the data structures for the program. FDG has also inspired the design of the approach that is used to compute the semantics from the syntax.

The most important reason to use FDG, however, is that FDG promises to make it easy to use data from different sources to find out the speaker's intention. The kitchen machine will have a user model that it can use to predict the response of the user. It will also have a model of the environment that contains many objects that the user may refer to in his or her utterances. FDG offers us ways to integrate these sources of knowledge with the language processing faculty of the machine. This point is explained in §2.2.2.

1.2 Research Question

The research question that this thesis will attempt to answer is: How does one compute the semantics of an utterance, given its morphosyntax? In this question, "semantics" and "morphosyntax" are understood as defined and represented according to Functional Discourse Grammar, as described in chapter 2. In addition to this main question, I also briefly discuss computing the pragmatics of an utterance, where pragmatics is again understood as defined and represented according to Functional Discourse Grammar.

Note that I am not going to investigate how this system could discriminate between grammatical and ungrammatical input. The system is not a syntax checker. It will give a meaning to the syntactic representations of utterances that cannot occur in correct English; for example, it will accept "The chair can I prepare?", treating it much like "What can I prepare?".

1.3 Notation conventions

I have used the following conventions about notation in this thesis:

• The notation for glosses and FDG formulas follows [Hengeveld and Mackenzie, 2008], unless stated otherwise.
• Linguistic examples are given in double quotes, quoted names are given in single quotes.
• Glosses have translations below them in single quotes. FDG formula examples have the orthographic form of the constituent they represent below them in double quotes.
• In English, I use male pronouns when writing about humans of whom the gender is not known.
• While the thesis describes a computer program, I have avoided displaying program code as much as possible to keep the thesis readable. Readers that are interested in the source code of the program can find it at http://reinier.de/thesis/program-source.zip.

1.4 Outline

After this introduction to the thesis comes an introduction to Functional Discourse Grammar in chapter 2. This chapter will give the reader an understanding of the structure and principles of FDG in general. Chapter 3 is about the structure of the representations that FDG employs to describe morphosyntax and semantics. Chapter 4 describes the structure of the computer program that I developed to compute semantics from morphosyntax, and the data structures this program uses. Chapter 5 describes how the program was applied to a fragment of English. This fragment of English was determined by an actual dialog between two people. Chapter 6 concludes the thesis by discussing the results and their relevance.
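The processing chain sketched above, from acoustic signal to a semantic representation, can be pictured as a pipeline of stages. The following is an illustrative sketch only: the thesis's actual program is written in Prolog and covers only the syntax-to-semantics step, and every function name and data shape below is hypothetical.

```python
# Hypothetical pipeline sketch; only syntax_to_semantics corresponds to the
# processing step this thesis addresses (the real program is in Prolog).

def recognize(signal):
    """Speech recognition stub: acoustic signal -> text."""
    return "what can I cook"            # placeholder result

def parse(text):
    """Parsing stub: text -> a toy morphosyntactic representation."""
    return {"clause": text.split()}

def syntax_to_semantics(morphosyntax):
    """The step this thesis focuses on: morphosyntax -> semantics (toy)."""
    words = morphosyntax["clause"]
    return {"predicate": words[-1], "arguments": words[:-1]}

def understand(signal):
    """Chain the stages: signal -> text -> morphosyntax -> semantics."""
    return syntax_to_semantics(parse(recognize(signal)))
```

In the full system, the resulting semantics would then drive changes in the machine's model of the environment and the users.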

Chapter 2

Functional Discourse Grammar

Functional Discourse Grammar (FDG) [Hengeveld and Mackenzie, 2006] [Hengeveld and Mackenzie, 2008] is a grammar framework for natural language, based on the Functional Grammar framework which is described in [Dik, 1989]. Functional Discourse Grammar follows a functional approach to modeling language: it explains the shape of natural language utterances in terms of the goals and knowledge of natural language users. This functional approach contrasts with the formal approach to modeling language, which views language as an abstract entity, with no regard to how language is used.

In the following section I explain the principles that underlie Functional Discourse Grammar. Then I proceed to explain the components and primitives that FDG supposes are involved in processing natural language. Finally I give an example of how FDG can be used to analyze a simple constituent of an utterance in English.

2.1 Principles

Functional Discourse Grammar was created as a revision of Functional Grammar, with the aim of achieving greater pragmatic and psychological accuracy. In this section, I explain some distinguishing principles of Functional Discourse Grammar that set it apart from Functional Grammar or other linguistic theories.

2.1.1 Discourse Act

A first principle is that the basic unit of analysis in FDG is the discourse act. This sets FDG apart from many other theories that choose either the clause or the sentence as the basic unit of analysis. FDG claims, though, that not all linguistic utterances are in the form of clauses, and that the sentence is not a concept that is found in all the world's languages. Discourse acts may be manifested in language as clauses, fully grammatical clause fragments, phrases or words. Discourse acts are mapped to those manifestations by a top-down grammar.

Discourse acts are defined in [Kroon, 1995] as "the smallest identifiable units of communicative behavior", and FDG uses this definition. Discourse acts can be combined into larger structures, called moves. A move is "the minimal free unit of discourse that is able to enter into an exchange structure". An exchange consists of moves, ideally but not necessarily two of them. Larger structures like moves and exchanges account for the patterns of organization larger than individual clauses that are found in some languages. As an example, consider this dialog example from [Kroon, 1995]:

(1) A I've got an extra ticket for the Santa Fe Chamber Orchestra tonight. Are you interested?
    B Yes, wonderful.

In this example, the whole dialog is a single exchange. There are two moves, one corresponding to (1A) and another corresponding to (1B). The former move is built up from two acts: one corresponding to "I've got an extra ticket for the Santa Fe Chamber Orchestra tonight", and the other corresponding to "Are you interested?". The latter move consists of a single discourse act.

2.1.2 No Transformations or Filters

The top-down grammar processes that map discourse acts to language utterances may never employ processing steps that alter or remove structures that have already been built up. Such steps are known in the Functional Grammar and Functional Discourse Grammar literature as transformations and filters respectively. The motivation for disallowing such steps is that doing so constrains the number of possible hypotheses explaining a linguistic phenomenon, and that it makes it possible to recover underlying structures from their outward manifestations.

As concrete examples of the restrictions this requirement imposes on linguistic explanations, consider the following examples from [Dik, 1989], pages 18-19. An example of a linguistic phenomenon that one could analyze with a transformation is:

(2) A John doesn't like pancakes
    B PANCAKES John doesn't like

Any analysis of (2B) that derives it from (2A) by moving the constituent "pancakes" through the underlying structure is a transformation, and must be avoided in FDG. And an example of a phenomenon that one could analyze with a filter is:

(3) A I met a boy who was carrying a green uniform
    B I met a boy carrying a green uniform

Any analysis of (3B) that derives it from (3A) by deleting a part of its analysis corresponding to "who was" employs a filter, and must be avoided in FDG.
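The hierarchy of exchange, moves and discourse acts in example (1) can be made concrete as a small data model. This is a sketch under my own naming assumptions; FDG does not prescribe this encoding.

```python
from dataclasses import dataclass

@dataclass
class DiscourseAct:
    content: str   # "the smallest identifiable unit of communicative behavior"

@dataclass
class Move:
    acts: list     # a move is built up from one or more discourse acts

@dataclass
class Exchange:
    moves: list    # ideally, but not necessarily, two moves

# Example (1) as data: A's move contains two acts, B's move contains one.
example_1 = Exchange(moves=[
    Move(acts=[
        DiscourseAct("I've got an extra ticket for the Santa Fe "
                     "Chamber Orchestra tonight"),
        DiscourseAct("Are you interested?"),
    ]),
    Move(acts=[DiscourseAct("Yes, wonderful.")]),
])
```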

2.1.3 A Single Theory of Language

Another principle is that FDG aims to explain all grammatical aspects of an utterance in one theory. An FDG analysis of an utterance includes a phonological, a morphosyntactic, a representational and an interpersonal analysis. These explain the phonology, morphosyntax, semantics and pragmatics of the utterance respectively. To get some feeling about what the various levels represent, I will now say of some well-known linguistic concepts and data structures at which level of Functional Discourse Grammar they apply. The Interpersonal Level deals with concepts like 'speech act' and 'participant', and functions like 'topic' and 'focus'. The Representational Level is about semantic concepts like 'individual', 'property' and 'state of affairs', and functions like 'Actor' or 'Instrument'. The Morphosyntactic Level contains the well-known syntactic categories like 'noun phrase' and 'adverb', concepts like 'finite' or 'plural', and functions like 'subject' and 'object'. The Phonological Level, finally, deals with such concepts as 'intonational phrase' and 'phonological words'.

One of the reasons for distinguishing those four levels of analysis is that anaphoric reference is possible to any one of them. Consider these examples from [Hengeveld and Mackenzie, 2008], page 5:

(4) A Get out of here!
    B Don't talk to me like that!
(5) A There are lots of traffic lights in this town.
    B I didn't notice that.
(6) A I had chuletas de cordero last night.
    B Is that how you say 'lamb chops' in Spanish?
(7) A I had /tSu"letasdekor"dero/ last night.
    B Shouldn't that be '/tSu"letasdeTor"dero/'?

In (4B) the speaker uses the pronoun "that" to refer to the communicative strategy employed in (4A). The communicative strategy is an interpersonal aspect of the utterance, so this example shows that interpersonal aspects of the utterance are available for reference. FDG claims that interpersonal aspects of an utterance are available for reference because they are stored in an Interpersonal Level of organization for the utterance. In (5B) the speaker uses the pronoun "that" to refer to the situation in the external world that (5A) describes. Thus, this occurrence of "that" is a purely semantic reference and shows that there is a Representational Level of organization. In (6B) "that" refers to the phrase "chuletas de cordero" from (6A), and not to its denotation. That phrase is morphosyntactic in nature, so there must be a Morphosyntactic Level to refer to. In (7B) "that" refers to the way the phrase "chuletas de cordero" was pronounced in (7A). This kind of reference shows that phonological material is also available for reference, thus that there is a Phonological Level of organization.

Relations Between the Levels

The four levels of analysis (interpersonal, representational, morphosyntactic and phonological) of an utterance need not have the same structure. In other words, there is not a one-to-one mapping between entities of one level and entities of another. As already mentioned above, a discourse act (an interpersonal entity) may be manifested at the Morphosyntactic Level as a clause, a clause fragment, a phrase or a word. As an illustration of the variability of the relation between different levels, consider the following examples from [Hengeveld and Mackenzie, 2008]:

(8) I made shirts.
(9) Te-shut-pe-ban
    1.sg.sbj>pl.obj-shirt-make-pst
    'I made (the) shirts.'

In the gloss, 1.sg.sbj>pl.obj means 'first person singular subject acting on plural object', and pst means 'past tense'. Sentences with the same structure at the Representational Level (a predicate applied to two individuals, an actor and an undergoer) have a different morphosyntactic structure in English (8) and Southern Tiwa (9). While Southern Tiwa expresses with an affix on the verb that there is a first person singular Actor, English uses a separate word to express it. And while Southern Tiwa incorporates the Undergoer argument of the predicate into the verbal word, in English it becomes another separate word. So in English this meaning is expressed with a clause of three constituents, while in Southern Tiwa the meaning is expressed using a clause with only one constituent.

2.1.4 Maximal Depth Principle

Functional Discourse Grammar has a principle of Maximal Depth, which is that only those levels that are relevant for the production of a certain utterance are used in the production of that utterance. So if for example an utterance does not have any representational content that influences its appearance, it does not have a Representational Level. That is the case for the utterance "Okay" (expressing agreement), for example. Its meaning only relates to previous discourse, and not to the world outside the conversation and its direct environment. Therefore it has no Representational Level.

2.1.5 Typology-based

Functional Discourse Grammar is a typology-based theory: it aims to be applicable to every natural language in existence. While it is conceivable that the program described in this thesis can be applied to a wide variety of languages, that was not a goal during its development. The aim of this thesis, however, is to describe a way to process utterances in English.

2.1.6 Not a Theory of Discourse

Finally, FDG is a theory of grammar, and it does not attempt to be a theory of discourse as well. The word "discourse" in "Functional Discourse Grammar" merely signifies that FDG explains certain formal aspects of linguistic utterances in terms of concepts from discourse analysis. So while an FDG analysis includes information about discourse aspects of language use, such as references and intentions, it only includes as much of such information as is necessary to explain the form of linguistic utterances.

2.2 Components

As a linguistic theory, FDG models the grammar of natural languages. A grammar is the way communicative intentions are encoded as linguistic utterances and how linguistic utterances are decoded to communicative intentions. FDG aims to become a component of an overall theory of verbal interaction, but it is not an overall theory of verbal interaction itself. The grammar model, or grammatical component, of FDG is supposed to cooperate with three other components: the conceptual component, the contextual component and the output component.

The conceptual component is where a communicative intention is formed. A communicative intention is an intention to communicate a certain content to another person capable of understanding natural language. This communicative intention is then passed to the grammatical component, which converts the intention to a linguistic utterance encoded as a phonological representation. To perform this conversion, it fetches information about the environment and the history of the conversation from the contextual component. When the conversion by the grammatical component is ready, the result is sent to the output component. The output component then realizes the utterance in the form of sound, signing or writing.

The contextual component contains two kinds of context information. Firstly, it contains long-term information about the context of the discourse, like the social relation between the discourse participants. Secondly, it contains all structures created by the grammatical component when converting to or from communicative intentions, so that those can be referred to later.

2.2.1 Levels, Formulation and Encoding

Within the grammatical component, the encoding of the communicative intention proceeds in two stages: formulation and encoding. While the process of creating an utterance from a communicative intention is described entirely sequentially in this section, many of the processes in the different components may in fact happen in parallel in the brain. When a communicative intention is sent to the grammatical component from the conceptual component, it is first processed by the formulation stage. These stages of processing create data structures for the levels of organization that have been discussed in §2.1.3.
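The two-stage conversion inside the grammatical component, formulation followed by encoding, can be sketched as plain function composition. This is an illustrative sketch; the level contents below are placeholders, not FDG's actual representations.

```python
def formulate(intention):
    """Formulation: intention -> Interpersonal and Representational Levels."""
    il = ("IL", intention)   # pragmatic aspects (placeholder)
    rl = ("RL", intention)   # semantic aspects (placeholder)
    return il, rl

def encode_morphosyntax(il, rl):
    """Morphosyntactic encoding: IL + RL -> Morphosyntactic Level."""
    return ("ML", il, rl)

def encode_phonology(il, rl, ml):
    """Phonological encoding: IL + RL + ML -> Phonological Level."""
    return ("PL", ml)

def grammatical_component(intention):
    il, rl = formulate(intention)
    ml = encode_morphosyntax(il, rl)
    pl = encode_phonology(il, rl, ml)
    # All four levels are kept so that the contextual component can store
    # them for later reference in the discourse.
    return {"IL": il, "RL": rl, "ML": ml, "PL": pl}
```

The Phonological Level value is what would be handed to the output component.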

[Figure 2.1: General layout of Functional Discourse Grammar. From [Hengeveld and Mackenzie, 2008].]

The formulation stage creates two data structures: the data structure at the Interpersonal Level that represents the pragmatic aspects of the utterance, and the data structure at the Representational Level that represents the semantics of the utterance. The rules of the formulation process are all language-dependent; FDG does not presuppose any universal semantic or pragmatic notions.

The output of the formulation is fed into the morphosyntactic encoding process. It takes the interpersonal and representational data structures created by the formulation, and creates a data structure at the Morphosyntactic Level that represents the morphosyntax of the utterance. The data of the Morphosyntactic Level, the Representational Level and the Interpersonal Level are used by the phonological encoding process to create a data structure at the Phonological Level that represents the phonological aspects of the utterance. This Phonological Level data structure is then fed into the output component, which actually expresses the utterance. Additionally, all intermediate data structures at the four levels are stored in the contextual component to enable the discourse participants to refer back to parts of them later on in discourse, as has already been shown in examples (4-7). The structure of the four components, complete with the levels and processes, is shown in Figure 2.1.

2.2.2 Use of the Structure for Natural Language Processing

The structure of Functional Discourse Grammar makes it a good candidate for forming the basis of the natural language understanding faculty of an intelligent kitchen assistant machine. The model of language production already incorporates links with external components: FDG's Conceptual Component maps onto the kitchen machine's belief base, FDG's Contextual Component maps onto the machine's user and environment models, and the output component maps onto the actuator control system of the kitchen robot. Additionally, one can use the information from the Contextual and the Conceptual Component to make predictions about the Representational and Interpersonal Level of the utterance of the user. These predictions can be used to check and correct the Representational and Interpersonal Level as obtained from the user's utterance.

2.3 Layers

Now that I have explained what the principles behind FDG are, and which components it presupposes in the natural language user, I can explain the form of the four levels that the grammatical component creates for an utterance. All of the four levels have an organization that is "hierarchical in nature, and displayed as a layered structure" ([Hengeveld and Mackenzie, 2008], page 14). So a level consists of layers, which may contain lower layers, which in turn may contain yet lower layers, and so on. In its maximal form, the structure of an FDG layer is:

(10) (π v1: [head (v1)Φ]: [σ (v1)X])
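The general layer form in (10) translates naturally into a recursive record: operators π, a head, modifiers σ, and a function. A sketch with hypothetical field names and simplified operator names follows; heads may themselves be layers, which gives the levels their hierarchical shape.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Layer:
    var: str                        # v1: unique identifier, also marks the category
    head: Union["Layer", str]       # another layer or a lexical item
    operators: tuple = ()           # pi: grammatical restrictors
    modifiers: tuple = ()           # sigma: lexical restrictors
    function: Optional[str] = None  # relation to sibling layers, e.g. "subject"

# Rough Representational Level shape for "these bananas": a property layer
# for the lexeme, embedded as head of an individual layer that carries the
# grammatical distinctions proximity and plurality as operators.
prop = Layer(var="f1", head="banana")
individual = Layer(var="x1", head=prop, operators=("prox", "pl"))
```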

an example of a function at the Morphosyntactic Level would be ‘subject’. both meaning “the green one”. In apparently headless constructions like Spanish “la verde” or Dutch “de groene”.5 I give an example of how layers are built up from those building blocks according to the general form. 2008]. π stands for a set of operators that restrict the variable v1 . What set of frames a language has is 16 .. the Interpersonal Level of an utterance does not contain more information than is necessary to explain the pragmatical properties of that utterance. The differences between the four levels are motivated by the principle that no level contains more distinctions than is necessary to explain the properties of the utterance at that level. and so are the modifiers σ. operators and functions stand for grammatical means to restrict the variable. The directly enclosed constituents are called equipollent constituents in FDG. As the form shows. In §2. 2. There are distinct sets of primitives for every level in the grammatical component. i. that determine the ways that other building blocks of structures may be combined. v1 stands for an identifier that uniquely identifies the layer and indicates the category of what the layer represents [Smit et al. and finish with the primitives for phonological encoding. I start with primitives used in the formulation stage. the levels never contain more distinctions than are relevant to the grammar of a given language. the head is still present in the context. The square brackets (‘[’ and ‘]’) in the form indicate that all the things that are directly enclosed by them are not in a hierarchical relation to each other.In this form. heads and modifiers may consist of a number of equipollent constituents. and that the head must always be present. An example of a concept that is an operator at the Morphosyntactic Level would be ‘plural’. 
The difference between operators and functions is that functions specify the relation between this layer and other layers that are part of the same containing layer. 2. while heads and modifiers are lexical means to restrict the variable. The difference between the head and the modifiers is that a head is the most important restrictor. Frames are blueprints for layers. while operators only apply to the layer itself.e. In general.. The following section will describe what kinds of building blocks can be inserted into this general form.1 Formulation Primitives The kinds of primitives used in formulation to build the Interpersonal Level and Representational Level are frames. The head is another layer or lexical item that restricts the variable. I will now discuss the different kinds of primitives. that determine in what way the head and σ respectively restrict the variable v1 .4 Primitives The building blocks of FDG layers are the so-called primitives.4. lexemes and operators. then discuss primitives used in morphosyntactic encoding. Also. Φ and X are functions.

What set of frames a language has is entirely language-dependent, though one can predict the presence of certain frames in a language based on typological hierarchies. When formulating a communicative intention, the speaker first selects the appropriate frames and then fills in the lexemes and operators in the selected frames. It is claimed in [Hengeveld and Mackenzie, 2008] that this is in line with findings from experimental psychology.

Lexemes are the entries of the lexicon. They are independent units that must be inserted into frames. The set of lexemes of a language can be separated into representational and interpersonal lexemes. Representational lexemes are inserted into the Representational Level of an utterance, interpersonal lexemes are inserted into the Interpersonal Level.

Operators occur in the operator position π of a layer as discussed in the previous section. Operators represent distinctions that, in a given language, are expressed by grammatical (as opposed to lexical) means.

2.4.2 Morphosyntactic Encoding Primitives

The kinds of primitives used in morphosyntactic encoding are morphosyntactic templates, grammatical morphemes and morphosyntactic operators. The morphosyntactic templates form the basis of organization at the Morphosyntactic Level. The grammatical morphemes are unmodifiable elements that are introduced at the Morphosyntactic Level. The morphosyntactic operators are like the grammatical morphemes, but do not take their definitive form before the Phonological Level. A single grammatical morpheme or morphosyntactic operator can be triggered by multiple different semantic distinctions. For example, in many languages the accusative case of a noun can be triggered both by a semantic role of Undergoer and by certain adpositions.

2.4.3 Phonological Encoding Primitives

The kinds of primitives used in phonological encoding are phonological templates, suppletive forms and phonological operators. Phonological templates are templates for phonological layers such as utterances, intonational phrases, phonological phrases, phonological words, feet, and syllables. Suppletive forms are phonological forms of the morphosyntactic operators discussed in the previous subsection, and irregular forms from inflection paradigms of lexemes. Phonological operators are used to explain aspects of the output that are not a direct reflection of interpersonal, representational or morphosyntactic operators.

2.5 A Simple Example

To give a concrete example of how the different levels of a certain utterance are built up from primitives, consider the example (11), taken from [Hengeveld and Mackenzie, 2008], page 23. In this example we show how the constituent “these bananas” is analyzed at all the four levels.

(11) a IL (+id RI )
     b RL (prox m c xi : [(fi : /b@"nA:n@/N (fi ))(xi )Φ ])
     c ML (Npi : [(Gwi : this-pl (Gwi ))(Nwi : /b@"nA:n@/-pl (Nwi ))](Npi ))
     d PL (ppi : [(pwi : /Di:z/ (pwi )) (pwj : /b@"nA:n@z/ (pwj ))](ppi ))

In the interpersonal analysis (11a) we see that this constituent is analyzed at this level as a single layer, which has only an operator (+id) and an identifier (RI ). The identifier letter (R) tells us that this layer is a referential subact. The subscript I on the identifier is used to make the identifier unique in the presence of other referential subacts. The interpersonal operator (+id) signifies that the constituent is identifiable in the discourse context. There are no head, modifiers or function to be shown.

In (11b) we see that at the Representational Level, the constituent is analyzed as a layer of the entity category (x). The operators signify that the layer designates more than one entity (m) and indicate the location of its referent (prox). The entity layer contains another layer of the property category (f). In the head of the property layer, we see the lexeme for the word “banana”: it is a noun (N) with the phonological representation /b@"nA:n@/. The function Φ on the entity layer is a variable that holds the place of the actual function, which we don’t know without context. The entity layer lacks modifiers, but it has two operators and a function. The property layer has no operators, modifiers or function.

The Morphosyntactic Level analysis in (11c) shows that the Morphosyntactic Level of “these bananas” consists of one noun phrase (Np). The head of this noun phrase in turn consists of a grammatical word (Gw) and a nominal word (Nw). The operators ‘prox’ and ‘m’ from the Representational Level are expressed as the morphosyntactic operators ‘this’ and ‘pl’. The operator ‘pl’ is applied to both the grammatical word and the nominal word because it will be expressed on both.

At the Phonological Level (11d), the morphosyntactic operators have been turned into phonological material. The phonological layers of this utterance have no operators, modifiers or functions; they have only identifiers and heads.

2.6 Notation Conventions for FDG Layers

I use the following conventions when writing FDG layer formulas:

• Sometimes I write down an FDG layer formula that contains constituents that are not analyzed. In such cases, I write the orthographical representation of those constituents between dashes inside the formula, like (Npi : - the garlic - (Npi )). This formula shows that there is a noun phrase that has a further analysis of “the garlic” in its head, but that further analysis is not given. Operators on the surrounding layer that are deduced from the material in dashes are not given explicitly. This notation is borrowed from [Hengeveld and Mackenzie, 2008].
• I use ‘. . . ’ for pieces of FDG formula forms that are not filled in. So (Np1 : . . . (Np1 )) stands for a generic noun phrase.
• Npk is the notation for an identifier of a layer with any constitution. (Npk ) is the notation for a layer with no operators, head nor modifiers.
• Φ, X, Ψ, Ω and Π are used as variables over functions of representational layers.
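To make the notation above concrete, the general layer form — operators, an identifier, a head, and an optional function — can be mirrored in a small data structure that prints itself in the bracket notation used throughout this thesis. This is my own illustrative sketch, not part of FDG or of the thesis program; all names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class FDGLayer:
    """One FDG layer: (operators ident: head (ident)function)."""
    ident: str                                           # e.g. "Np1": category plus index
    operators: List[str] = field(default_factory=list)   # the operator set pi
    head: Union[str, "FDGLayer", None] = None            # a lexical item or another layer
    function: str = ""                                   # e.g. "Subj"; empty when omitted

    def render(self) -> str:
        ops = " ".join(self.operators)
        prefix = f"({ops} {self.ident}" if ops else f"({self.ident}"
        if isinstance(self.head, FDGLayer):
            inner = self.head.render()
        else:
            inner = self.head or ". . ."                 # '. . .' marks an unfilled piece
        return f"{prefix}: {inner} ({self.ident}){self.function})"

# A noun phrase with a nominal word in its head, rendered in the usual notation:
np1 = FDGLayer("Np1", head=FDGLayer("Nw1", head="banana"))
print(np1.render())   # -> (Np1: (Nw1: banana (Nw1)) (Np1))
```

A layer with operators and a function renders the same way, e.g. `FDGLayer("Vw1", operators=["fin"], head="can").render()` gives `(fin Vw1: can (Vw1))`.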

Chapter 3 The Morphosyntactic and Representational Levels

To be able to describe my computer program, I must first clarify the structure of its input and its output. Its input is the Morphosyntactic Level, and its output is the Representational Level. The Morphosyntactic Level holds information on the formal aspects of the form of the utterance: it tells what morphemes it is built up from and how these morphemes are connected. The Representational Level holds the semantics of the utterance: it tells how its meaning relates to the external world, outside of the discourse participants and the utterance itself. The Morphosyntactic Level deals with concepts like ‘noun’, ‘adposition phrase’ or ‘clause’ for example, while the Representational Level deals with concepts like ‘property’, ‘simultaneous’ or ‘deontic’. In this section I explain as much of them as is necessary to understand the approach explained in the next chapter. In the following two subsections, I delineate the scope of what the structures at the Morphosyntactic and Representational Levels describe in more detail. In doing so, I also present the form of the Morphosyntactic and the Representational Level in more detail.

3.1 The Morphosyntactic Level

3.1.1 The Formal Structure of the Morphosyntactic Level

For any Morphosyntactic Level, the outermost layer is of the category of Linguistic Expression (Le). So the general form of a Morphosyntactic Level is:

(12) (π Le1 : [head (Le1 )Φ ])

Morphosyntactic layers are usually written without the square brackets around the application of the head to the identifier, and without functions. The functions are empty when omitted. So in the usual notation, example (12) becomes example (13):

(13) (π Le1 : head (Le1 ))

The smallest unit that can occur on its own in a discourse is the Linguistic Expression. Any unit that can occur freely in discourse is considered a Linguistic Expression. Only when two morphosyntax units can be shown to belong together morphosyntactically, they belong together in a single Linguistic Expression. In the head of a Linguistic Expression, a number of different layer categories may occur. In many cases it will be a clause (Cl), which is a kind of layer that is found in every language in the world. But it may consist of one or more words or one or more phrases in case of a simple Linguistic Expression.

A phrase is a layer that is of a category Xp, where the ‘X’ is a variable for the type of phrase. All the possible values for the ‘X’ are ‘N’ for noun phrases, ‘V’ for verb phrases, ‘Adj’ for adjective phrases, ‘Ad’ for adposition phrases and ‘Adv’ for adverb phrases. A word is a layer that is of a category Xw, where the ‘X’ is a variable for the type of word. All the possible values for the ‘X’ are ‘N’ for nouns, ‘V’ for verbs, ‘Adj’ for adjectives, ‘G’ for grammatical words, and ‘Adv’ for adverbs. A clause has words and/or phrases in its head, and a phrase, in turn, has one or more words in its head.

We can wrap up these structures of clauses, phrases and words in the form (14):

(14) (π Le1 : [(Xw1 )(Xp1 : [(Xw2 )(Xp2 )(Cl1 )](Xp1 ))(Cl2 : [(Xw3 )(Xp1 )(Cl3 )](Cl2 ))](Le1 ))

where Xp stands for a phrase of any type and Xw stands for a word of any type. Every clause and every phrase in this form can be expanded further as explained in the paragraphs above. Every clause and phrase or word in the form may occur any number of times, or not at all. As some examples of linguistic expressions with a different constitution, look at the examples in (15-17):

(15) A “In the refrigerator, obviously”
     B (Lei : [(Adpi : - in the refrigerator - (Adpi )) (Advpi : - obviously - (Advpi ))] (Lei ))
(16) A “He is asleep”
     B (Lei : (Cli : - he is asleep - (Cli )) (Lei ))
(17) A “Not again!”
     B (Lei : (Advpi : [(Advwi : not (Advwi )) (Advwj : again (Advwj ))] (Advpi )) (Lei ))

In (15B), we see that the Morphosyntactic Level of (15A) comprises two phrases, one adposition phrase and one adverb phrase. In (16B), we see the usual case of a linguistic expression that has a single clause, the Morphosyntactic Level of (16A). And in (17B) finally, we see a linguistic expression that consists of one adverb phrase as the Morphosyntactic Level of (17A).
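The nesting of linguistic expressions, clauses, phrases and words described above is naturally modeled as a small recursive structure. The sketch below is my own illustration (all names invented, not the thesis program); it rebuilds the analysis in (17B) and reads the words back out of the tree.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MorphUnit:
    """A morphosyntactic layer: Le, Cl, Xp or Xw, with equipollent constituents."""
    category: str                 # e.g. "Le", "Advp", "Advw"
    lexeme: str = ""              # filled in for words only
    constituents: List["MorphUnit"] = field(default_factory=list)

    def words(self) -> List[str]:
        """Collect the lexemes of all words in the tree, left to right."""
        if self.lexeme:
            return [self.lexeme]
        collected: List[str] = []
        for constituent in self.constituents:
            collected.extend(constituent.words())
        return collected

# (17B): a Linguistic Expression whose head is one adverb phrase with two adverb words.
not_again = MorphUnit("Le", constituents=[
    MorphUnit("Advp", constituents=[
        MorphUnit("Advw", lexeme="not"),
        MorphUnit("Advw", lexeme="again"),
    ])
])
print(" ".join(not_again.words()))   # -> not again
```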

3.1.1.1 The Morphosyntactic Form of Words

Words can sometimes be decomposed into smaller constituents. Words that cannot be decomposed further are represented by a layer with the phonological material of the word in the head, for example:

(18) (Vwi : /Iz/ (Vwi )) “is”

for the verbal word “is”. I write such layers with the orthographic form of the word instead of the phonological form, if that does not cause any confusion. The layers of words that can be decomposed follow the following template:

(19) (Xw1 : [(Xm)(Xw)(Xp)(Cl)](Xw1 ))

That is, such words can have a configuration of morphemes (Xm), words (Xw), phrases (Xp) and/or clauses (Cl) in their head. For English, only morphemes are relevant. An example is the word “apples”, which consists of the stem “apple” and the affix “-s”. This structure is represented in the formalism by giving the word a head with an equipollent configuration of the stem and the affix.

There are three different kinds of morphemes: stems, roots and affixes. Stems are morphemes that have lexical content and that can occur as a word on their own (like “apple” in the example “apples” above). Roots are morphemes that have lexical content and that cannot occur as a word on their own. Affixes are morphemes that have grammatical content and that need a root or a stem to form a word (like the “-s” in the example “apples”). Stems have layer identifiers of the form Xs1 , roots have layer identifiers of the form Xr1 , and affixes have layer identifiers of the form Aff1 .

3.1.1.2 Examples of Morphosyntactic Layers

I now illustrate the structure of the Morphosyntactic Level explained above with some morphosyntactic analyses of fragments of English. These fragments are taken from the kitchen dialog I attempted to parse with my program. The first utterance of the dialog that is interesting to us¹ is “I’m hungry”. The Morphosyntactic Level of this utterance is given in example (20):

(20) (Lei : (Cli : [(Npi : (Nwi : i (Nwi )) (Npi ))Subj (fin Vwi : am (Vwi )) (Adjpi : (Adjwi : hungry (Adjwi )) (Adjpi )) ] (Cli )) (Lei )) “I’m hungry.”

In this example, we see that the Morphosyntactic Level of this utterance consists of a single Linguistic Expression, which consists of a single clause. The clause has three constituents: a noun phrase Npi , a verbal word Vwi and an adjective phrase Adjpi . The noun phrase Npi and the adjective phrase Adjpi both have a lexical word as their only constituent. The first two constituents will be merged at the Phonological Level to form “I’m”, but at the Morphosyntactic Level they are two separate constituents. The verbal word “am” is not part of a verb phrase. Verbal words are only grouped in verb phrases when there is a lexical verb heading a phrase.

¹ Utterances are only interesting to us if they have a Representational Level. Greetings like “Hello!” can be argued to have only Interpersonal and Phonological Levels.

As another example, consider the Morphosyntactic Level of “Can you do something with those ingredients?”, given in (21). The example is rendered with indentation for easy reading:

(21) (Lei : (Cli : [
       (fin Vwi : can (Vwi ))
       (Npi : (Nwi : you (Nwi )) (Npi ))Subj
       (Vpi : (Vwj : do (Vwj )) (Vpi ))
       (Npj : (Nwj : something (Nwj )) (Npj ))Obj
       (Adpi : [
         (Gwi : with (Gwi ))
         (Npk : [
           (Gwj : those (Gwj ))
           (Nwk : [
             (Nsi : ingredient (Nsi ))
             (Aff i : s (Aff i ))
           ] (Nwk ))
         ] (Npk ))
       ] (Adpi ))
     ] (Cli )) (Lei ))
     “Can you do something with those ingredients?”

In this example we see that this linguistic expression has one clause in its head. That clause in turn has five constituents in an equipollent configuration in its head. Those are a verb word Vwi , a noun phrase Npi , a verb phrase Vpi , a noun phrase Npj and an adposition phrase Adpi . The verbal word Vwi has an operator ‘fin’, which indicates that it is a finite verb.

A first thing that one can see in this example is that, according to FDG, a verb phrase does not contain the morphosyntactic realizations of the representational arguments of the verb. A verb phrase in FDG is simply a group of verbs and grammatical particles headed by a verb. So, in this example, the morphosyntactic constituents corresponding to “something” and “with those ingredients” are not part of the verb phrase Vpi . One will also notice that the word “ingredients” is decomposed further into the stem “ingredient” (Nsi ) and the affix “-s” (Aff i ), as stated in §5.1.1.

A final important thing to notice is that morphosyntactic functions have been assigned to some of the noun phrases. Npi (“you”) has been assigned the function ‘Subj’ (for ‘Subject’) and Npj has been assigned the function ‘Obj’ (for ‘Object’). These functions, together with the chosen representational frame, can help us determine the representational function of arguments of the verb when building up the Representational Level from the Morphosyntactic Level.
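Once a clause like (21) is available as data, the morphosyntactic functions can be read off mechanically. The toy sketch below (my own flat encoding for illustration, not the thesis program) pairs each top-level clause constituent with its function and looks constituents up by function.

```python
# Each clause constituent of (21) as (category, words, function); "" = no function.
# This flat encoding is my own simplification of the tree for illustration.
clause = [
    ("Vw", "can", ""),
    ("Np", "you", "Subj"),
    ("Vp", "do", ""),
    ("Np", "something", "Obj"),
    ("Adp", "with those ingredients", ""),
]

def constituent_with_function(constituents, function):
    """Return the words of the first constituent carrying the given function,
    or None when no constituent carries it."""
    for category, words, fn in constituents:
        if fn == function:
            return words
    return None

print(constituent_with_function(clause, "Subj"))  # -> you
print(constituent_with_function(clause, "Obj"))   # -> something
```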

3.1.2 Positioning and Alignment

Two important processes that happen during Morphosyntactic Expression (the process that builds up the Morphosyntactic Level from the Representational and Interpersonal Levels) are positioning and alignment. Positioning determines the position that hierarchically related units will have, relative to each other, in the resulting utterance. Alignment determines the relative order of non-hierarchically related units. Positioning proceeds in a top-down fashion, so the highest layers of the Representational and Interpersonal Level are positioned first. Alignment happens after positioning is complete, but the structures created by alignment and positioning may interlock. While it is beyond the scope of this thesis to discuss and motivate all the principles of positioning and alignment in general and all the rules that govern their implementation in English, I will give a short overview of both to show the way they shape English utterances.

3.1.2.1 Positioning

As already mentioned, positioning determines the position that hierarchically related units will have in the utterance. What the positioning process leads to, is that the lower in the hierarchy of layers a piece of interpersonal or representational information is, the closer to the center of the utterance it will be realized. Interpersonal constituents tend to be realized further from the center than representational constituents. As an example, consider example (22) (taken from [Hengeveld and Mackenzie, 2008]):

(22) Finally, she honestly reportedly probably has been drinking continuously again recently

The example is contrived but grammatical. And it shows that the interpersonal information (“Finally”, “honestly” and “reportedly”) goes to the beginning of the utterance. The interpersonal speech act modifier “Finally” is even positioned outside the clause. The representational information (“probably”, “continuously”, “again” and “recently”) stays within the clause and occurs later in the utterance than the interpersonal information. And at the end of the utterance, the representational modifiers are expressed in hierarchical order, with the hierarchically lowest modifiers first. The hierarchically highest representational modifier, “probably”, goes to the beginning of the utterance, where it occurs later than the interpersonal modifiers.

3.1.2.2 Alignment

Alignment is “the way in which non-hierarchically related pragmatic and semantic units map onto morphosyntactic ones” ([Hengeveld and Mackenzie, 2008], page 316). It is during alignment that morphosyntactic functions such as ‘Subject’ and ‘Object’ are assigned to constituents. FDG discerns three types of alignment: interpersonal alignment, representational alignment and morphosyntactic alignment. These types of alignment are named after the level from which functions and operators are used to align constituents. English uses the morphosyntactic functions Subject and Object for morphosyntactic alignment. The contextual component plays an important role in deciding which constituent gets which morphosyntactic function; that cannot be determined from the Representational and Interpersonal Levels alone. Morphosyntactically complex constituents are often aligned to the end of an utterance. So we have “the singing man” versus “the man who sings”: though both “singing” and “who sings” are modifiers of an individual at the Representational Level, and they mean the same, “who sings” is realized at the end of the phrase because it is more complex than “singing”.
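The last observation — that morphosyntactically complex constituents tend to be aligned to the end — can be pictured as a simple placement rule. The sketch below is my own toy illustration of the tendency (using word count as a crude stand-in for morphosyntactic complexity), not an implementation of FDG's alignment rules.

```python
def realize(noun, modifier):
    """Place a one-word modifier before the noun and a multi-word
    (morphosyntactically complex) modifier after it. Word count is a
    deliberately crude proxy for complexity, chosen for illustration."""
    if len(modifier.split()) == 1:
        return f"the {modifier} {noun}"
    return f"the {noun} {modifier}"

print(realize("man", "singing"))    # -> the singing man
print(realize("man", "who sings"))  # -> the man who sings
```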

3.2 The Representational Level

While the Morphosyntactic Level describes the hierarchical form of the utterance, the Representational Level describes the relation between the utterance and the world outside the discourse and its participants. Or in FDG terminology: the Representational Level describes the designation of the utterance. It is important to realize what the Representational Level is not about. It is not about the meaning of an utterance when it relates to the utterance itself (like the meaning of the word “briefly”). Nor is it about the meaning that linguistic units have in a particular context: the Representational Level is “restricted to the meanings of lexical units (. . . ) and complex units (. . . ) in isolation from the ways these are used in communication” ([Hengeveld and Mackenzie, 2008], page 129). The latter point has important consequences for the notion of reference. When we consider the sentence “I saw a lion”, the expression “a lion” can be considered a referring expression in two distinct ways: firstly, the speaker refers to an animal of the lion-class by using this expression; and secondly, the expression itself refers to an animal of the lion-class. The first kind of referring from this example, in which the referring is done by the speaker, is handled by FDG at the Interpersonal Level. The second kind of referring is what the Representational Level is about, and it is called designation in FDG.

3.2.1 Semantic Categories

Representational layers are of the general form for layers that has been given in the previous chapter in example (10). The layer identifiers of representational layers indicate the semantic category of the layer. A semantic category is an ontological category that is reflected by the system of a language. Distinctions that are made solely by different lexemes, expression of operators or expression of functions are not considered distinctions of semantic categories. Only distributional criteria are valid as evidence to consider two layers as belonging to a different semantic category. That is, two layers have different semantic categories if and only if their morphosyntactic realizations behave differently with regard to the semantically based morphosyntactic configurations in which they are allowed.

Because different languages allow different sorts of configurations, it follows from this definition of semantic category that different languages have different sets of semantic categories. There are four semantic categories that are found in every language: propositional contents (layer identifier of the form p1 ), states of affairs (layer identifier e1 ), individuals (layer identifier x1 ) and properties (layer identifier f 1 ). English has some more categories: episodes (layer identifier ep1 ), locations (layer identifier l1 ), times (layer identifier t1 ), manners (layer identifier m1 ) and quantities (layer identifier q1 ). Possibly, there is also a separate semantic category for reasons, with the layer identifier r1 . The outermost layer of the Representational Level of a declarative or an interrogative utterance is always a propositional contents layer. The outermost layer of the Representational Level of imperative utterances is a state of affairs layer.
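The categories and identifier letters listed above can be collected in a small lookup table. The dictionary below simply restates the text (with the entry for reasons marked tentative, as in the text); the helper that strips the numeric index from an identifier is my own invention for the example.

```python
# Layer-identifier letter(s) -> semantic category, as listed in the text.
SEMANTIC_CATEGORIES = {
    "p":  "propositional content",  # found in every language
    "e":  "state of affairs",       # found in every language
    "x":  "individual",             # found in every language
    "f":  "property",               # found in every language
    "ep": "episode",                # English
    "l":  "location",               # English
    "t":  "time",                   # English
    "m":  "manner",                 # English
    "q":  "quantity",               # English
    "r":  "reason",                 # possibly a separate category
}

def category_of(identifier):
    """Map an identifier like 'ep1' or 'x3' to its semantic category by
    stripping the trailing numeric index."""
    letters = identifier.rstrip("0123456789")
    return SEMANTIC_CATEGORIES[letters]

print(category_of("ep1"))  # -> episode
print(category_of("x3"))   # -> individual
```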

3.2.2 Propositional Contents

Propositional content layers typically have the following form:

(23) (π p1 p1 : [ (π ep1 ep1 : [ [ (π e1 e1 : . . . : . . . ) . . . ] (ep1 )Φ ] : [ σ ep1 (ep1 )X ]) (p1 )Ψ ] : [ σ p1 (p1 )Ω ])

3.2.2.1 The head of the propositional content

The most typical propositional content head is that which contains a single episode, as displayed in (23). The episode typically contains one or more states of affairs in its head. If there are multiple states of affairs in the head of an episode, those states of affairs belong together because of unity or continuity of the time at which they take place, the location where they take place, or the individuals involved. In English, the episode layer is important for temporal operators and modifiers. The operators π ep1 and modifiers σ ep1 of the episode specify the moment in time at which the states of affairs take place absolutely, or relative to the moment of speaking. Further temporal operators and modifiers on the states of affairs may indicate the time of those states of affairs relative to the time of the whole episode.

3.2.2.2 Other propositional content head structures

Though most propositional content layers follow the structure given in example (23), there are propositional content layers that do not. Propositional content layers may have lexical heads, as in the Morphosyntactic Level constituent that corresponds with the noun phrase “a belief”:

(24) (pi : [ (f i : [ belief (f i )∅ ]) (pi )∅ ]) “a belief”

If the word “yes” or “no” is used as the full answer to a yes/no-question, the Representational Level of the utterance will be as in (25) or (26) respectively:

(25) (pi : [ yes (pi )∅ ]) “Yes.”
(26) (pi : [ no (pi )∅ ]) “No.”

So in these two cases the propositional content layer has a lexical constant directly in its head. The lexical constant is not wrapped in a property layer f i like in (24).

3.2.2.3 Propositional content operators

Propositional content layers, like all layers, have a set of operators, called π p1 in the formula (23) above. These operators specify the subjective epistemic modality and the evidential modality of the propositional content. The subjective epistemic modality operators include operators expressing belief, doubt and hypothesis. The evidential modality operators can indicate that the propositional content is based on perception, that it is based on deduction from existing knowledge, or that it is based on common knowledge.

3.2.2.4 Propositional content modifiers

The modifiers of the propositional content, σ p1 in formula (23) above, specify the propositional attitude that the speaker has toward the content. The propositional attitude is the kind and degree of commitment that a speaker has toward the content. Words like “probably”, “undoubtedly” or “hopefully” express propositional attitudes. They are analyzed as properties that modify propositional contents at the Representational Level. So the sentence “Evidently he is working” is analyzed at the Representational Level as:

(27) (pi : [ (pres epi : [ [ (sim ei : [ - he is working - (ei )∅ ]) ] (epi )∅ ]) (pi )∅ ] : [ (f i : [ evidently (f i )∅ ]) (pi )∅ ])
     “Evidently, he is working.”

In this example, the propositional content has a modifier that expresses the meaning of the word “evidently”. This modifier is on the last line of the formula. In this modifier, the property of being evident is applied to the layer identifier pi . The propositional content has in its head an episode epi that expresses the state of affairs of “he is working”. The operator ‘pres’ on the episode specifies that this episode happens at the time of speaking. The operator ‘sim’ on the state of affairs ei specifies that that state of affairs happens simultaneously with the episode.

3.2.3 States of Affairs

States of affairs are entities that can be situated in relative time, and that can be said to be true or false. And as we have seen above, they can be grouped to form episodes. In the sentence “He mounted his horse and rode toward the sunset” for example, there are two states of affairs, one corresponding to “He mounted his horse” and one corresponding to “rode toward the sunset”. These two states of affairs are joined in the head of one episode.

The typical structure of a state of affairs layer is given in (28). This structure will be explained in the following paragraphs:

(28) (π ei ei : [ (π f i f i : [ [ (π f j f j : . . . ) (π xi xi : . . . )Φ . . . ] (f i )X ] : [ σ f i (f i )Ψ ]) (ei )Ω ] : [ σ ei (ei )Π ])

3.2.3.1 Heads of States of Affairs

In this typical structure, we find a property f i in the head of the state of affairs. This property f i has a configurational head itself². In the head of f i we find a property f j followed by any number of participants in the state of affairs. Here an individual xi has been listed as a participant. The property f j is the predicate of the state of affairs: f j specifies the relation that holds between the participants. This property is predicated over the participants of the state of affairs. Every participant has a certain role in the state of affairs, which is coded into the formalism as the function Φ of the participant. What operators can be found in π xi will be explained in §3.2.5 on individuals. In §3.2.4, I explain more about properties with configurational heads. I will not go deeper into the role of the operators in π f i and π f j and the modifiers in σ f i at this point.

² The notation that I use for configurational properties is slightly different from the one that [Hengeveld and Mackenzie, 2008] uses. In [Hengeveld and Mackenzie, 2008], the outer pair of square brackets around the head restrictor (the pair that is opened by the second square bracket in 28) is omitted for configurational properties.
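The typical configuration just described — a predicate property applied to participants that carry semantic functions — can be sketched as a small data structure. This is my own illustration with invented names, simplified by ignoring operators and modifiers.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class StateOfAffairs:
    """A configurational head: predicate f_j applied to participants,
    each carrying a semantic function such as Actor or Undergoer."""
    predicate: str                        # lexeme of the predicate f_j, e.g. "eat"
    participants: List[Tuple[str, str]]   # (layer identifier, semantic function)

    def role_of(self, ident: str) -> str:
        """Look up the semantic function of a participant by identifier."""
        for participant, role in self.participants:
            if participant == ident:
                return role
        raise KeyError(ident)

# "Sheila eats a bagel": xi is the Actor, xj the Undergoer.
eating = StateOfAffairs("eat", [("xi", "Actor"), ("xj", "Undergoer")])
print(eating.role_of("xi"))  # -> Actor
```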

3.2.3.2 Operators of states of affairs

The set of operators on the state of affairs itself, π ei , may specify, for example, perception, reality, and negation. They may also specify the relative tense of the state of affairs, relative to the time of the episode.

3.2.3.3 Modifiers of states of affairs

There is a wide range of things that modifiers on states of affairs can specify. Among them are all the things that operators may specify, except negation. Furthermore, modifiers may specify frequency, location, reality (as with the word “really”), cause and goal.

3.2.3.4 An example of a typical state of affairs

As an example of a state of affairs layer, I will discuss the Representational Level of the sentence “Sheila may eat a bagel tomorrow”. The Representational Level of it is given in (29):

(29) (pi : [ (pres epi : [ [ (poss ei : [ (f i : [ [ (f j : [ eatV (f j )∅ ]) (xi )Actor (1 c xj : [ (f k : [ bagelN (f k )∅ ]) (xj )∅ ])Undergoer ] (f i )∅ ]) (ei )∅ ] : [ (ti : [ (f l : [ tomorrowAdv (f l )∅ ]) (ti )∅ ]) (ei )∅ ]) ] (epi )∅ ]) (pi )∅ ])
     “Sheila may eat a bagel tomorrow.”

The state of affairs layer ei in the example illustrates every part of the typical state of affairs layer as discussed above. Firstly, it has an operator ‘poss’ that signifies that the meaning of this state of affairs is possibly, but not necessarily, true. This operator is realized at the Morphosyntactic Level as the verb word “may”. Secondly, this state of affairs layer has a property f i in its head. In this example, the property f i does not have any operators or modifiers of its own. The property f i in turn has a configurational head which contains a predicate f j and its arguments xi and xj . These arguments have functions (Actor and Undergoer) that indicate what their role in the state of affairs is: xi , which stands for “Sheila”, is Actor, and xj , which stands for “a bagel”, is Undergoer. A list of possible roles is given in §3.2.4 on properties. The meaning of the proper name “Sheila” is analyzed as a completely interpersonal matter. Thus the individual xi for “Sheila” does not have any further information about its meaning at the Representational Level. Layers for individuals will be explained further in §3.2.5 on individuals.

Finally, the state of affairs has a relative tense modifier ti , which stands for the phrase “tomorrow”. It specifies the time of the state of affairs, relative to the time of the containing episode, which in this case has the absolute tense operator ‘pres’, for “present”. The operator ‘pres’ is realized at the Morphosyntactic Level as the present tense of the verb form “may”.

3.2.3.5 Other Layer Structures for States of Affairs

Besides the typical structure given above, state of affairs layers with lexical heads may also occur. An example is given in (30):

(30) (ei : [ (f i : [ meetingN (f i )∅ ]) (ei )∅ ]) “a meeting”

So there is a state of affairs layer ei , which has in its head a property layer f i , which in turn has in its head the lexical constant ‘meetingN ’. FDG does not further analyze the nature of these lexical constants; in this thesis I regard them as atomic bits of semantics stored in the lexicon. More complex combinations of lexical constants are described in [Hengeveld and Mackenzie, 2008], but they are beyond the scope of this thesis.

3.2.4 Properties

Property layers are a fundamental building block of FDG semantics. Nonconfigurational property layers (layers where the head does not contain any other layers) form the set of lexemes of a language. And configurational property layers (layers of which the head does contain other layers) are found in the heads of most states of affairs.

3.2.4.1 Lexical Properties

The form of a lexical property is given in (31)³:

(31) (π f 1 : [ (f 1 )∅ ] : [ σ(f 1 )Φ ])

π and σ, as usual, stand for sets of operators and modifiers respectively. The ‘meetingN ’ in the example (30) stands for a lexical constant. Every lexical constant is member of a certain lexeme class. A lexeme class is like a part-of-speech, except that it is a semantic instead of a morphosyntactic notion. The subscript ‘N ’ on the lexical constant signifies that it is a nominal lexeme. The word “worker”, for example, is of the part-of-speech ‘noun’, but it contains the lexeme “work-” of the lexeme class ‘verb’.

³ There is some variation in the notation of lexical properties: [Hengeveld and Mackenzie, 2008] use (π f 1 : (f 1 ) : [ σ(f 1 )Φ ]) while [Smit and Van Staden, 2007] use (π f 1 : [($1 | ) (f 1 )∅ ] : [ σ(f 1 )Φ ]). I see no compelling reason to use a different notation for heads with lexical constants than for heads with layers.

contains the lexeme "work-" of the lexeme class 'verb'. In representational formulas, lexical constants are written as the most typical orthographic rendition of the constant, with a lexeme class symbol added in subscript. English has four lexeme classes: nouns, verbs, adjectives and adverbs. The lexeme class symbols are 'N' for nouns, 'V' for verbs, 'Adj' for adjectives and 'Adv' for adverbs. Following this convention, the verbal lexeme in "worker" would be written 'workV'. Some more examples of the notation can be found in example (29) above.

Modifiers of Lexical Properties

Like other layers, lexical properties may have modifiers. One kind of modifiers of lexical properties stems from a certain kind of adjectives. Some adjectives ascribe a property to an individual, while others change the meaning of the head noun of the noun phrase they are in. As examples, consider (32):

(32) A A rich doctor
     B A former doctor

By using the utterance (32A), one evokes an individual who is a doctor and who is rich. By using the utterance (32B) however, one evokes an individual who is not a doctor, but who was formerly a doctor. So in (32A), the adjective "rich" ascribes a property to an individual, while in (32B), the adjective "former" changes the way the property of "doctor" applies to the individual. In FDG, the adjective "rich" as in (32A) is analyzed as a modifier of an individual at the Representational Level, while the adjective "former" as in (32B) is analyzed as a modifier of the lexical property that represents the word "doctor" at the Representational Level. So the Representational Levels of (32A) and (32B) are given in (33A) and (33B) respectively:

(33) A (1 c xi : [ (fi : [ doctorN (fi)∅ ]) (xi)∅ ] : [ (fj : [ richAdj (fj)∅ ]) (xi)∅ ]) "a rich doctor"
     B (1 c xi : [ (fi : [ doctorN (fi)∅ ] : [ (fj : [ formerAdj (fj)∅ ]) (fi)∅ ]) (xi)∅ ]) "a former doctor"

Another kind of lexical property modifiers are directional modifiers on verbal lexemes. These appear in English as the verbs "go home", "come down", etcetera. In such verbs, the directional particle is analyzed as a location layer that modifies the lexical property of the verb. So "come down" is analyzed as:

(34) (fi : [ comeV (fi)∅ ] : [ (li : [ (fj : [ down (fj)∅ ]) (li)∅ ]) (fi)∅ ]) "come down"

Operators on Lexical Properties

As stated above, lexical properties may also have operators. The only kind of lexical property operator that is relevant to this thesis is narrow-scope negation. In a sentence like "She is a not unintelligent girl" (taken from [Hengeveld and Mackenzie, 2008]), "not" is analyzed as an operator on the lexical property of "unintelligent" that negates its meaning. So "not unintelligent" as in this sentence is analyzed as:

(35) (neg fi : [ unintelligentAdj (fi)∅ ]) "not unintelligent"

3.2.4.2 Configurational Properties

The properties that are not lexical properties are known as configurational properties. In the head of a configurational property layer, there is a configuration of one or more other layers. So the general form of a configurational property layer is:

(36) (π f1 : [ [ (v1) . . . (vn)Φ ] (f1)X ] : [ σ(f1)Ψ ])

In this form, π and σ are, as usual, the operators and modifiers of the configurational property. (v1) . . . (vn) is a sequence of layers of any semantic category. Φ, X and Ψ are variables over semantic functions. The semantic categories of the different (vi)'s need not be the same.

Heads of Configurational Properties

In most cases, the head is a configuration of a nucleus and zero or more dependents. In that case (v1) is called the nucleus and the other (vi)'s are called the dependents. Typically, (v1) will not have a function, but the other (vi)'s will have one. In the typical case of a head with nucleus and dependents, the nucleus is a relation and the dependents are the things between which this relation holds. In such cases, the semantic functions of the dependents specify their role in the relation. Typically, the nucleus is another property layer, and the dependents are individuals, locations, times or states of affairs.

An important use for configurational properties is in states of affairs. As has already been mentioned, the typical head of a state of affairs contains a configurational property that describes what is the case. This part of the Representational Level of the utterance "The man sees the woman" is an example:

(37) (fi : [ [ (fj : [ seeV (fj)∅ ]) (xi : [ . . . the man . . . (xi)∅ ])Actor (xj : [ . . . the woman . . . (xj)∅ ])Undergoer ] (fi)∅ ]) "the man sees the woman"

This specifies that there is a relation of seeing between an individual xi and an individual xj, and that the individual xi does the seeing and that the individual xj undergoes the seeing.
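The nucleus-plus-dependents configuration described above maps naturally onto a small data structure. The sketch below is my own illustrative Python rendering (the thesis's program is written in Prolog, and none of these names come from it); it builds the configuration of (37), a seeing relation between two individuals.

```python
# Illustrative sketch: a configurational property as a nucleus plus
# dependents, each dependent carrying a semantic function, as in (37).
# The dictionary representation is an assumption, not the thesis's notation.

def configurational_property(identifier, nucleus, dependents):
    """Head = a configuration [nucleus, dependent...]; each dependent is a
    (layer_identifier, semantic_function) pair."""
    return {"id": identifier,
            "head": [nucleus] + [{"layer": l, "function": fn}
                                 for l, fn in dependents]}

see = configurational_property(
    "f_i",
    {"id": "f_j", "head": "see_V"},            # nucleus: the lexical property
    [("x_i", "Actor"), ("x_j", "Undergoer")],  # the man, the woman
)

functions = [d["function"] for d in see["head"][1:]]
assert functions == ["Actor", "Undergoer"]
```

A head with no dependents, as in "it is raining", is simply the degenerate case where the dependents list is empty.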

As an example of a configurational property with no dependents in its head, consider this part of the Representational Level of "it is raining":

(38) (fi : [ [ (fj : [ rainV (fj)∅ ]) ] (fi)∅ ]) "it is raining"

Semantically, there are no arguments to the verb "rain". According to FDG, the word "it" in the sentence "it is raining" is inserted during morphosyntactic encoding because all available morphosyntactic frames require a subject.

Several other layouts of configurational properties are possible besides the usual case with a nucleus and zero or more dependents. Those are beyond the scope of this thesis however.

Operators of Configurational Properties

Operators on configurational properties may express aspect, participant-oriented modality and quantification of duration.

Operators for Aspect
The aspect of a state of affairs reflects its "internal temporal constituency" ([Hengeveld and Mackenzie, 2008], page 208). The aspect of a state of affairs appears in FDG as an operator on the configurational property layer in the head of the state of affairs at the Representational Level. Such operators then stand for such things as 'perfective', 'imperfective' or 'ingressive'.

Operators for Participant-oriented Modality
The participant-oriented modality of a state of affairs tells in what way a participant participates in the state of affairs. The three possible participant-oriented modalities are facultative ("be able to"), deontic ("have to") and volitive ("want to").

Operators for Quantification of Duration
Operators on a configurational property may also quantify the duration of the situation described by the property.

Modifiers of Configurational Properties

Modifiers of a configurational property may add extra participants to the state of affairs that the property is the head of. Such extra participants may have the functions 'Beneficiary', 'Comitative', 'Instrument' or 'Duration'. For an example, see the state of affairs layer in (39) (adapted from [Hengeveld and Mackenzie, 2008], example 435, page 210):

(39) (sim ei : [ (fi : [ [ (fj : [ cutV (fj)∅ ]) (xi)Actor (xj : [ . . . the meat . . . (xj)∅ ])Undergoer ] (fi)∅ ] : [ (xk : [ . . . a knife . . . (xk)∅ ])Instrument (fi)∅ ]) (ei)∅ ]) "John cut the meat with a knife"

3.2.5 Individuals

Individuals are representational layers that designate concrete, tangible objects that occupy a portion of space. The general form of layers for individuals is given in (40):

(40) (π S x1 : [ head (x1)Φ ] : [ σ(x1)X ])

In this formula, the π as usual stands for operators on the layer. The σ stands for one or more modifiers on the layer. The Φ and X are variables over functions. The x1, as usual, stands for an identifier for the layer. The S stands for a letter that indicates whether this layer is count, mass, or neither. Many languages distinguish countable and uncountable individuals; English is one such language. If the layer is count, the value of S is 'c'; if the layer is mass, the value of S is 'm'; and if the layer is neither, the value of S is empty.

3.2.5.1 Heads of Individuals

The head position of an individual can be occupied by many different structures.

Lexical heads
The head position of an individual is most typically occupied by a lexical head. A phrase like "The chair" is analyzed as in (41):

(41) (1 c xi : [ (fi : [ chairN (fi)∅ ]) (xi)∅ ]) "The chair"

In this example, the head position in the individual layer xi is taken by the lexical property layer fi. The fact that the chair being talked about is identifiable (as signaled by the definite article "the"), is not reflected at the Representational Level, as it is an interpersonal matter. The fact that there is exactly one chair that is talked about, is reflected by the operator '1'.

Configurational heads
Configurational heads for individual layers are also found in English, but not as often as in other languages. In the case of a configurational head for an individual layer, the head position of the layer is occupied by a configurational property layer or a state of affairs layer, as will be explained below. One kind of construction that gives rise to configurational heads for individuals, is the expression of inalienable possession. An utterance like "the brother of the king" is analyzed at the Representational Level as in (42):

(42) (1 c xi : [ (fi : [ [ (fj : [ brotherN (fj)∅ ]) (1 c xj : [ (fk : [ kingN (fk)∅ ]) (xj)∅ ])Ref ] (fi)∅ ]) (xi)∅ ]) "The brother of the king"

In this Representational Level we see that there is a lexical property fj that is the property of being a brother. This property is inside the configurational head of property fi as the nucleus. Inside the configurational head of fi there is also the individual xj as a dependent with the function of Referent. So the configurational property fi is the property of being a brother of the king.

A construction that gives rise to individual layers with a state of affairs as their head is the headless relative clause, like "what you read" in "I will read what you read". Such clauses are analyzed as in (43) (taken from [Hengeveld and Mackenzie, 2008], example 625, page 241)⁴:

(43) (xi : (ei : [ (fi : [ [ (fj : [ readV (fj)∅ ]) (xj)Actor (xi)Undergoer ] (fi)∅ ]) (ei)∅ ])) "what you read"

In this example we see an individual xi headed by a state of affairs ei, which says that the "you" (xj) is reading (fj) the individual xi. In this way, xi is denoted by the way it participates in the state of affairs ei.

3.2.5.2 Operators and Modifiers of Individuals

Operators on individuals may specify location, for example as an analysis of demonstratives. Also they may specify quantification: "all" as in "all children must go to school" is analyzed as an operator on an individual layer at the Representational Level. Finally, operators on individuals may specify qualification, most notably with diminutive forms of nouns.

Individuals may be modified by properties and by states of affairs or episodes in which the individual itself occurs as a participant. Most adjectives on noun phrases are analyzed as modifiers of individuals at the Representational Level. So the Representational Level of "an old man" becomes:

(44) (1 c xi : [ (fi : [ manN (fi)∅ ]) (xi)∅ ] : [ (fj : [ oldAdj (fj)∅ ]) (xi)∅ ]) "an old man"

⁴ There is a notation inconsistency in this example: there are no square brackets around the head of xi. Apparently they are omitted because the layer identifier xi already occurs in the state of affairs in the head. As far as I know there is no rigid specification for this way of writing down such heads in FDG: it is not treated in [Smit and Van Staden, 2007].
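An individual layer like (44) combines operators, a lexical head and property modifiers, and that shape can be made concrete in code. The following is a minimal sketch in my own Python representation (the thesis's implementation is in Prolog; all field names here are assumptions for illustration):

```python
# Illustrative sketch of an individual layer: operators such as '1' and
# 'c', a lexical property in the head, and modifiers that restrict the
# same layer identifier, mirroring (44) "an old man".

def individual(identifier, operators, head_lexeme, modifiers=()):
    return {"id": identifier,
            "operators": list(operators),        # e.g. ["1", "c"]
            "head": {"lexeme": head_lexeme},     # lexical property in the head
            "modifiers": [{"lexeme": m, "restricts": identifier}
                          for m in modifiers]}

old_man = individual("x_i", ["1", "c"], "man_N", modifiers=["old_Adj"])

assert old_man["operators"] == ["1", "c"]
# The modifier applies to the individual itself, not to the head lexeme:
assert old_man["modifiers"][0]["restricts"] == "x_i"
```

A "former doctor" style modifier would instead restrict the head property's own identifier, which is exactly the distinction drawn in (33A) and (33B).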

3.2.6 Coindexation

A special way to use the identifier of a representational layer is coindexation. In the case of coindexation, two layers have the same identifier. This is used to indicate that those two layers have the same designation. Layers that express anaphoric reference have only an identifier, which is the same identifier as the one of the layer that the anaphor refers to. So when one considers example (45) (adapted from [Hengeveld and Mackenzie, 2008], page 143), the layer at the Representational Level for "the man" is given as example (46) and the Representational Level for "he" is given in example (47).

(45) "The man cleaned the windows and he painted the door"
(46) (1 c xi : [ (fi : [ manN (fi)∅ ]) (xi)∅ ])
(47) (xi)

The fact that the layers in (46) and in (47) have the same identifier, indicates that they have the same designation.

3.2.7 Relation to the Morphosyntactic Level

Now that I have discussed both the form of the Morphosyntactic Level and the form of the Representational Level, I shall briefly address the way the two forms are related. During language production, the Morphosyntactic Level of the utterance is created by the process of morphosyntactic encoding. The inputs to the morphosyntactic encoding process are the Interpersonal Level, the Representational Level and context data from the contextual component.

3.2.7.1 Principles for Morphosyntactic Encoding

The relation between the structures of any two levels that influence each other is characterized by the three principles of Iconicity, Domain Integrity and Functional Stability. Iconicity means that the order of moves and discourse acts is maintained. Domain Integrity means that what belongs together on higher levels (i.e. the Representational Level) will stay together on lower levels (i.e. the Morphosyntactic Level). Functional Stability means that constituents that have the same function, will go to the same position. The relation between the Morphosyntactic and the Representational Level is no different: Iconicity, Domain Integrity and Functional Stability are the three principles that guide morphosyntactic encoding in any language. None of these three principles is inviolable, and there are many forces that may overrule them in certain circumstances. The extent to which these principles are enforced in the presence of other forces, differs across languages.
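The coindexation mechanism described above lends itself to a very small computational model: an anaphoric layer carries nothing but an identifier, and identity of identifiers is what encodes identity of designation. A minimal sketch (class and function names are mine, not part of FDG or the thesis program):

```python
# Illustrative sketch of coindexation: layers that share an identifier
# share a designation. An anaphoric layer like (47) has no head at all.

class Layer:
    def __init__(self, identifier, head=None):
        self.identifier = identifier  # e.g. "x_i"
        self.head = head              # None for a bare anaphor like (x_i)

def codesignate(a, b):
    """Two layers designate the same entity iff their identifiers match."""
    return a.identifier == b.identifier

the_man = Layer("x_i", head="man_N")   # (46): full layer for "the man"
he = Layer("x_i")                      # (47): coindexed anaphor "he"
the_door = Layer("x_j", head="door_N")

assert codesignate(the_man, he)
assert not codesignate(he, the_door)
```

Resolving an anaphor during understanding then amounts to choosing which existing identifier the bare layer should receive, which is exactly the coreference resolution task discussed later.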

3.2.7.2 Ambiguities on the Morphosyntactic and Representational Levels

An important challenge in natural language processing is resolving ambiguities. A single utterance, when taken in isolation, may mean many different things. If one wants to compute the meaning of the utterance, one will have to pick the right meaning based on the available data about the context of the utterance. The existing material on Functional Discourse Grammar does not treat ambiguity explicitly. However, the descriptions of the various levels of analysis allow us to check if a certain ambiguity, that is found in the form of an utterance, is also present on a certain level of analysis.

As an example, consider the utterance "My neighbor killed the butcher today. He will be buried tomorrow." The part "He will be buried tomorrow" is ambiguous: when one views it in isolation, one cannot decide who is meant by the word "he". Only when one takes into account the information supplied in the first sentence ("My neighbor killed the butcher today"), it becomes clear that the "he" in the second sentence is "the butcher" from the first sentence.

When we construe the Morphosyntactic Level of the utterance in (48), we find that the ambiguity is found there too. The Morphosyntactic Level does not tell us who is meant by the "he" in the second sentence.

(48) (Lei : (Cli : [ (Npi : [ (Gwi : my (Gwi)) (Nwi : neighbor (Nwi)) ] (Npi)) (Vpi : (fin Vwi : [ (Vsi : kill (Vsi)) (Aff i : -ed (Aff i)) ] (Vwi)) (Vpi)) (Npj : [ (Gwj : the (Gwj)) (Nwj : butcher (Nwj)) ] (Npj)) (Advpi : (Advwi : today (Advwi)) (Advpi)) ] (Cli)) (Lei)) (Lej : (Clj : [ (Npk : (Nwk : he (Nwk)) (Npk)) (Vpj : [ (fin Vwj : will (Vwj)) (Vwk : be (Vwk)) (Vwl : [ (Vsj : bury (Vsj)) (Aff j : -ed (Aff j)) ] (Vwl)) ] (Vpj)) (Advpj : (Advwj : tomorrow (Advwj)) (Advpj)) ] (Clj)) (Lej)) "My neighbor killed the butcher today. He will be buried tomorrow."

In this Morphosyntactic Level one can see that the layer that represents "he", Npk, does not contain any information on what the antecedent of "he" is. In the Representational Level in (49) however, one can see that the antecedent of "he" is indicated by giving the layer representing "he" the same index as the antecedent. So there is no ambiguity at the Representational Level: "he" can only refer to "the butcher".

(49) (pi : [ (past epi : [ [ (sim ei : [ (fi : [ [ (fj : [ killV (fj)∅ ]) (1 c xi : [ (fk : [ [ (fl : [ neighborN (fl)∅ ]) (xj)Ref ] (fk)∅ ]) (xi)∅ ])Actor (1 c xk : [ (fm : [ butcherN (fm)∅ ]) (xk)∅ ])Undergoer ] (fi)∅ ]) (ei)∅ ] : [ (ti : [ (fn : [ todayAdv (fn)∅ ]) (ti)∅ ]) (ei)∅ ]) ] (epi)∅ ]) (pi)∅ ]) (pj : [ (fut epj : [ [ (sim ej : [ (fo : [ [ (fp : [ buryV (fp)∅ ]) (xk)Undergoer ] (fo)∅ ]) (ej)∅ ] : [ (tj : [ (fq : [ tomorrowAdv (fq)∅ ]) (tj)∅ ]) (ej)∅ ]) ] (epj)∅ ]) (pj)∅ ]) "My neighbor killed the butcher today. He will be buried tomorrow."

This example shows that at least sometimes, ambiguities that are present in the Morphosyntactic Level are not present at the Representational Level. So if one wants to compute the Representational Level from the Morphosyntactic Level, one will have to resolve those ambiguities using context information, or by making an educated guess.

There are some other types of ambiguities that one has to resolve when computing the Representational Level from the Morphosyntactic Level. Most importantly, one cannot distinguish homophones at the Morphosyntactic Level, because the Morphosyntactic Level contains the phonological representation of words. Whether the "bank" in "See you at the bank!" is a financial institution or a riverside will have to be decided upon when computing the Representational Level.

Some other types of ambiguities, that are found in the surface form of utterances, are already resolved at the Morphosyntactic Level. The well-known PP-attachment ambiguity is one such type of ambiguity. The sentence "The man sees a girl with a telescope" is ambiguous, but the two different interpretations have different Morphosyntactic Levels. The Morphosyntactic Level of the interpretation where the man uses a telescope as an instrument to see the girl is given in (50), and the Morphosyntactic Level of the interpretation where the man sees a girl who carries a telescope is given in (51).

(50) (Lei : (Cli : [ (Npi : . . . the man . . . (Npi)) (Vpi : . . . sees . . . (Vpi)) (Npj : . . . a girl . . . (Npj)) (Adpi : . . . with a telescope . . . (Adpi)) ] (Cli)) (Lei)) "The man sees a girl with a telescope"

(51) (Lei : (Cli : [ (Npi : . . . the man . . . (Npi)) (Vpi : . . . sees . . . (Vpi)) (Npj : . . . a girl with a telescope . . . (Npj)) ] (Cli)) (Lei)) "The man sees a girl with a telescope"
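Because homophones like "bank" share one Morphosyntactic Level, a parser has to branch when it reaches such a word and keep every candidate reading until context can decide. The sketch below enumerates the candidate readings of a word sequence; the lexicon entries and function names are hypothetical, purely for illustration (the thesis handles this branching through Prolog backtracking, described in chapter 4):

```python
# Illustrative sketch: enumerate all combinations of lexeme choices for a
# sequence of ambiguous words, so a later stage can select one reading.
from itertools import product

LEXICON = {
    "bank": [  # one orthographic form, two lexemes: a lexical ambiguity
        {"id": "bank1", "class": "N", "meaning": "financial institution"},
        {"id": "bank2", "class": "N", "meaning": "riverside"},
    ],
}

def lexeme_alternatives(word):
    """Yield one candidate meaning per matching lexicon entry."""
    for entry in LEXICON.get(word, []):
        yield entry["meaning"]

def all_readings(words):
    """Every combination of alternatives across the words."""
    return [list(combo) for combo in
            product(*(lexeme_alternatives(w) for w in words))]

assert all_readings(["bank"]) == [["financial institution"], ["riverside"]]
```

Each reading would correspond to a distinct candidate Representational Level, among which context information or an educated guess must choose.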

Chapter 4

Language-Independent Aspects

Now that I have described the concepts of Functional Discourse Grammar, I can proceed to explain the approach to compute the Representational Level from the Morphosyntactic Level. The approach is presented in three steps. First, I explain the theoretical background of the approach (§4.1). Then, I explain the general structure of the program (§4.2 through §4.5). Finally, I describe, in a separate chapter, how the program has been used to model a fragment of the English language (chapter 5). In §7, I briefly address the question of how this approach can be used to build up the Interpersonal Level instead of the Representational Level.

4.1 Theoretical Background

We can imagine the outline of a procedure for interpreting natural language according to FDG, by inverting the process of language generation that is described in most FDG and Functional Grammar literature. This way of producing a theory of language understanding is already hinted at by [Dik, 1989] and [Hengeveld and Mackenzie, 2008]. The procedure, that one obtains by inverting the generation description in FDG, goes like this: when a natural language user hears an utterance, the first thing that happens is that an inverse of the Output Component turns the perceived acoustic signal into a Phonological Level of the utterance. Based on this Phonological Level, the hearer builds up a Morphosyntactic Level of the utterance. Based on those Phonological and Morphosyntactic Levels, the hearer builds up a Representational Level. The hearer then builds up the Interpersonal Level based on the Representational, Morphosyntactic and Phonological Levels, and then derives the communicative intention behind the perceived utterance from the Interpersonal and Representational Levels. While building up the Levels in reverse order this way, the hearer uses the same primitives and Contextual Component that he uses when generating language. Because my approach analyzes written language, not spoken language, it

will not have a Phonological Level as input. Without a Phonological Level, and under the assumption that a Morphosyntactic Level has already been created, the first step in understanding an utterance is building a Representational Level from the Morphosyntactic Level. That step is what the approach described in this chapter will do. It starts with the Morphosyntactic Level, which I assume can be obtained from a text by conventional parsing techniques. Moreover, it will do it in such a way that the techniques described in this chapter can also be used to build an Interpersonal Level from the Morphosyntactic and Representational Levels.

4.2 Overview of the Approach

4.2.1 FDG Structures as Trees

To understand the approach explained in this chapter, it is necessary to regard FDG layer structures as trees. Trees are a fundamental data structure in computer science, as explained in [Knuth, 1997]. In FDG layer trees, there are five types of nodes: identifier nodes, identifier child nodes (the operators, head and modifiers nodes), restrictor nodes, list nodes and constant nodes.

To regard a layer as a tree, one must regard the identifier as an identifier node. Such an identifier node has three children: its operators node, its head node, and its modifiers node. The operators node has the operators of the layer as its children. The children of the operators node are called operator nodes. Operator nodes do not have any children. If the layer has no head, the head node has no children. Otherwise, the head node has a node representing the head restrictor as its child. This node representing a restrictor is labeled with the function that the layer identifier has with respect to this restrictor. If the head restrictor is a grammatical or lexical constant, a terminal node labeled with that constant is the only child of the restrictor node. If the head restrictor is a list, the only child of the restrictor node is a list node that has the elements of the list as its children. If the restrictor is an FDG layer, the child of the restrictor node is the identifier node of that layer. In this way, a layer is recursively expanded to an identifier node with its operators, head and modifiers nodes, and a list becomes a list node with its elements as children.

Modifiers are turned into trees just like heads: the root of the subtree for a modifier is a restrictor node labeled with the function of the layer identifier with respect to the restrictor being represented. The modifier node has all modifiers of the layer as its children, if it has one. That restrictor node has one child. If the restrictor is a lexical constant, that child is a terminal node labeled with that constant. Otherwise, the child of the restrictor node is the identifier node of that layer.

The edges that connect a restrictor node to its one child, are labeled with the function that this restrictor itself has. The same goes for list nodes: the edges to the children are marked with the functions of those children. Constant nodes are terminal nodes.
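The recursive expansion of a layer into the five node types can be sketched directly in code. The following is a Python illustration of my own (the thesis's program represents these trees as Prolog terms; the dictionary encoding and function names here are assumptions):

```python
# Illustrative sketch of the five node types of an FDG layer tree and the
# recursive expansion of a layer into a tree.

def layer_to_tree(identifier, operators=(), head=None, modifiers=()):
    """Expand a layer into an identifier node with its three child nodes."""
    return {
        "type": "identifier", "id": identifier,
        "children": [
            {"type": "operators",
             "children": [{"type": "operator", "label": op} for op in operators]},
            {"type": "head",
             "children": [] if head is None else [restrictor_to_tree(head)]},
            {"type": "modifiers",
             "children": [restrictor_to_tree(m) for m in modifiers]},
        ],
    }

def restrictor_to_tree(restrictor, function="∅"):
    """A restrictor node with one child: a constant node, a list node with
    the list elements as children, or an embedded layer's identifier node."""
    if isinstance(restrictor, str):        # grammatical or lexical constant
        child = {"type": "constant", "label": restrictor}
    elif isinstance(restrictor, list):     # configuration of layers
        child = {"type": "list",
                 "children": [restrictor_to_tree(r) for r in restrictor]}
    else:                                  # an embedded layer, already a tree
        child = restrictor
    return {"type": "restrictor", "function": function, "children": [child]}

# (x_i) as a bare anaphoric layer: three childless identifier child nodes
tree = layer_to_tree("x_i")
assert [c["type"] for c in tree["children"]] == ["operators", "head", "modifiers"]
```

The edge labels for functions are stored here on the restrictor node itself, which is one possible way to encode what the text describes as labels on the edges.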

Figure 4.1: The tree representation of (xi)

4.2.1.1 Some Example FDG Trees

In this section, I give some examples to illustrate how I regard FDG layer structures as trees. In these examples, I display identifier nodes as circles, identifier child nodes as boxes, restrictor nodes as ovals, list nodes as dots and constant nodes as diamonds.

As the simplest possible example, take the layer (xi), as it appears representing an anaphoric reference. Its rendition as a tree is given in Figure 4.1. Because (xi) is a layer, it gets an identifier node for its identifier xi. As usual, this identifier node has child nodes for operators, head and modifiers. But because the layer has no operators, it has no head, and it has an empty set of modifiers, these three children have no children of their own. So the tree becomes as given in Figure 4.1.

As a second example, consider the Morphosyntactic Level of the constituent "those ingredients" from (21). The Morphosyntactic Level is repeated here as (52).

(52) (Npk : [ (Gwj : those (Gwj)) (Nwk : [ (Nsi : ingredient (Nsi)) (Aff i : s (Aff i)) ] (Nwk)) ] (Npk)) "those ingredients"

The outermost layer of (52) is Npk, and thus its identifier node becomes the root node of the tree. Because this node is an identifier node, it has an operators child, a head child, and a modifiers child. This layer has an empty set of operators, and it also has an empty set of modifiers. Because there are no operators or modifiers in the entire layer structure for Npk, all operators and modifiers nodes in Figure 4.2 are terminal. We will not discuss these nodes any further for this example. Npk has a non-empty head, so there is a restrictor node as a child of its
List nodes are rendered as solid black dots. 42 . regarded as a tree. regarded as a tree. What is important to note. The description of those stages refers to an ‘input tree’ and an ‘output tree’.3 The Treewalk Stage The task of the treewalk stage is. 4. we go through the input tree to create chunks of the output tree (the treewalk stage). is that the function that xk has in the equipollent configuration is represented as an edge label on the edge between the identifier node for xk and its parent list node.3. Walking the input tree means systematically visiting every node. as the name implies. walking the input tree. so the child of the restrictor node is a list node. I will use the representational layer ej from example (49). In the first stage. The children of this list node are the identifier nodes of Nsi and Aff i . The elements of the list in Npk ’s head are Gwj and Nwk .2 The Three Stages The approach is divided into three stages. I give an example of a representational layer that includes operators and modifiers. In a second stage. Nwk has a list as its head. just like Npk . so its head node has a constant node for “those” as its grandchild.2. these chunks of the output tree are composed into their final configuration in the output tree (the composition stage). anaphora and implicit constituents in the output tree are resolved (the coreference resolution stage). Because the function of the layer identifier Npk with respect to the head restrictor is ∅. Both of these have constant nodes as the grandchildren of their head nodes. In the head of Npk we find a list. The ‘output tree’ is the Representational Level that is the output of the program. repeated here as (53): (53) (sim ej : [(f o : [ [(f p : [ buryV (f p )∅ ]) (xk )Undergoer ] (f o )∅ ]) (ej )∅ ] : [(tj : [(f q : [ tomorrowAdv (f q )∅ ]) (tj )∅ ]) (ej )∅ ]) “he will be buried tomorrow” The tree representation of (53) is given in Figure 4. Finally. 
The ‘input tree’ is the Morphosyntactic Level that is given to the program. So the children of the list node are the identifier nodes of Gwj and Nwk . I will now describe each of the stages in a section of its own. In a third stage. If there were more modifiers on ej . At every node. this restrictor node is labeled ∅. those would be represented by trees whose root would be a sibling of this modifier’s restrictor node. 4. Gwk has the grammatical constant “those” in its head.head node. Also notice how the modifier of ej appears as a restrictor node that is a child of the modifier node. So the grandchild of the head node of Nwk is a list node again.

Figure 4.2: The tree representation of the Morphosyntactic Level of "those ingredients"

Figure 4.3: The tree representation of the representational state of affairs layer for "he will be buried tomorrow"

the program considers what this node of the input tree indicates about what the output tree will be like. This information about the form of the output tree is stored as one or more incomplete chunks of output tree, that have to be put together to form the complete output tree. So when the treewalk has been completed, the result is a sequence of chunks of output tree.

4.3.1 Form of Tree Chunks

The tree chunks that are created during the treewalk stage have almost the same structure as the trees discussed in §4.2.1. There are two differences. Firstly, the output tree chunks contain another type of node: combination criteria nodes. The root node of an output tree chunk must be such a combination criteria node. The combination criteria node at the root of the chunk must have exactly one child. Besides at the root of a tree, combination criteria nodes may occur as the children of operators, head and modifiers nodes, like in ordinary FDG trees. Such combination criteria nodes, that are not the root of a tree chunk, may not have children. I explain the precise form of combination criteria nodes in §4.4.1.

Secondly, the tree chunks also carry a pointer to a morphosyntactic layer. This pointer, and the morphosyntactic layer it points to, are not displayed in the diagrams in this thesis. They are occasionally used to access morphosyntactic information during the composition stage and the coreference resolution stage. For example, they are used to find the representational layer that corresponds to the morphosyntactic subject for implicit subject resolution as described in §4.5.

4.3.2 The Treewalk Procedure

The procedure for walking the tree can be sketched as follows. When walking an input tree T, we first apply the treewalk procedure to all children that are not constant nodes. That gives us a sequence of sequences of output tree chunks. We concatenate that sequence of sequences to get a single sequence S of output tree chunks. Then, if the root node of T is an identifier node, we apply a language-dependent node function to the tree T and the sequence S, and return the sequence of chunks that is returned by the node function. If the root of T is not an identifier node, S is the result of the treewalk of T. Note that according to this description, the treewalk procedure returns an empty sequence for terminal non-identifier nodes.

This sketch shows that the node function is a very important concept: it is the only way by which new tree chunks can be introduced. Another salient property is that the value of the last invocation of the node function completely determines the result of the treewalk. The latter property does not mean that the treewalk is useless and everything can be done in the node function. A typical node function definition will pass on the pieces from its input to its return value in almost all cases. It will then also add some tree chunks to its return value, based on the root node of

the tree T that it gets as input. Only in exceptional cases should the node function not copy all chunks from its input chunk sequence to its returned chunk sequence. The node function generates fresh identifiers for the identifier nodes of the chunks it creates, so that distinct layers do not accidentally get the same identifier. In this thesis I will follow the FDG custom of identifying layers by their category and a subscript letter from the sequence i, j, k, . . . In the program, those identifiers are stored as a number, also for chunks generated by the node function. The node function is the most important language-dependent element in the program. An example of designing a node function will be given in chapter 5.

4.3.3 Implementation in Prolog

The parser that I developed has been implemented in Prolog¹. Prolog is a programming language with built-in search. The programmer may specify multiple ways of executing a program, and all those ways will be tried when the program is run. We can exploit this nondeterminism in the execution of Prolog to handle ambiguities. That allows us to specify multiple node function results for the same input. When executing the program, Prolog will try all possible combinations of results. To handle the above-mentioned lexical ambiguity of the word "bank" for example, the node function of the identifier node of the word "bank" could have two different results. One result would represent the meaning of "financial institution", while another would represent the meaning of "riverside". This sort of nondeterminism in the node function has been used in the analysis of the fragment of English in chapter 5.

4.3.4 The Lexicon

An important part of most parsers is the lexicon. The parser that I developed is no different. When designing a parser according to the approach in this thesis, most work will probably go into designing the node function for the language to be parsed. Because the node function can directly give chunks of output tree for grammatical words, the lexicon need only store entries for nouns, verbs, adjectives and adverbs, in compliance with FDG theory. This contrasts with for example the parser described in [Dignum, 1989], which is more lexicon-driven and thus needs lexical entries for every word that the parser recognizes. The approach described in this chapter allows for a structure of the lexicon that stays true to FDG theory. The lexicon is a data structure that must be consulted by the node function when it is invoked upon input trees that contain lexical constants. The lexicon used in my program stores for every lexeme:

• A unique identifier to refer to the lexeme
• The orthographic form of the lexeme

¹ For an excellent tutorial on Prolog, see [Blackburn et al., 2006]

• The lexeme class (noun, verb, adjective or adverb)
• A list of frames that can be used with the lexeme
• Some lexeme class dependent properties (for example, for nouns, whether the lexeme is count or mass)

The frames that can be used with a lexeme are frames in the sense described in chapter 2. Every frame gives a pattern of the tree structure surrounding the lexeme. It may be as simple as one layer with an identifier that has the lexical constant in its head. But in the case of verbal lexemes, the pattern includes open spots where the output tree chunks for the verb’s arguments must be inserted. So the set of frames associated with a verbal lexeme discriminates between verbs with different valences.

4.4 The Composition Stage

When the treewalk stage has generated a sequence of output tree chunks, those chunks must be composed to form a meaningful, consistent output tree. Composing the chunks in such a way is the job of the composition stage. Composition works by piecing the chunks together at the combination criteria nodes. If tree chunk A has a combination criteria node that is not its root, then another tree chunk B may be attached in the place of that combination criteria node. But the attachment can only happen if the combination criteria in A’s non-root node and those in B’s root node match. From now on, when discussing the combination of two tree chunks, I refer to a chunk like A, into which another chunk will be inserted, as the ‘upper’ chunk, and to a chunk like B, which is inserted into another chunk, as the ‘lower’ chunk.

The chunks that are available to build up the Representational Level are not limited to those generated during the treewalk stage: there are also chunks that represent representational frames that may be used to glue the chunks from the treewalk stage together. Those chunks are part of the language-specific part of the understanding program. They are available for any utterance and they can be used more than once in a single Representational Level.

4.4.1 Form of Combination Criteria Nodes

Before I can explain what ‘matching’ means in the context of combination criteria nodes, I have to describe the form of combination criteria nodes. A combination criteria node specifies in what way other tree chunks may be attached to its tree chunk at its position. It does so using a set of key-value pairs. Both the keys and the values are represented as ordinary text strings. Those strings are written in this thesis using a slanted font. An example key is position, and some example values that can be used with this key are operator, head and modifier. The value of the position key indicates where in the upper chunk the lower chunk is going to be inserted.
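As a minimal illustration of such a set of key-value pairs over plain strings, consider the following Python sketch (the names are hypothetical; the program itself represents the pairs as Prolog terms):

```python
# Hypothetical sketch: a combination criteria node's content as a set of
# key-value pairs over ordinary text strings.

criteria = {"position": "operator", "upperlayer": "e"}

def describe(criteria):
    # Render the pairs the way the thesis prints them: key : value
    return ", ".join(f"{k} : {v}" for k, v in sorted(criteria.items()))

print(describe(criteria))  # position : operator, upperlayer : e
```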

Figure 4.4: The tree chunk generated for the operator for simultaneity

In the tree diagrams introduced earlier in this chapter, combination criteria nodes are drawn as square boxes with the key-value pairs inside them. The key-value pairs are displayed underneath each other, and the key and the value are separated with a colon (‘:’). As an example of a tree chunk with combination criteria nodes drawn that way, consider the tree displayed in Figure 4.4. This is a tree chunk which has two nodes: a root node and one child of the root. The root node is drawn on top and it contains two key-value pairs: position : operator and upperlayer : e. The child is drawn below it, and it is a constant node with the constant ‘sim’. The root is a combination criteria node, as the root of a tree chunk has to be a combination criteria node.

A slightly more complex example is given in Figure 4.5. The tree chunk displayed there is created by the node function of my application of the program when it is applied to the noun phrase “I”. The individual denoted by “I” is assigned layer identifier xi. We can see that the chunk has a root combination criteria node and two terminal combination criteria nodes. The root combination criteria node specifies in what sort of upper chunk this chunk can fit. The two terminal combination criteria nodes specify what sort of chunks may become xi’s operators and modifiers, respectively; both contain the key multi, which has no associated value. Note that xi has no head, and cannot get one, because there is no combination criteria node as a child of the head node of xi.

4.4.2 The Composition Process

Having described the form of the combination criteria, I can outline the process by which the Representational Level is built up from the output chunks. The composition process builds up the output tree by adding tree chunks to it until there are no tree chunks from the treewalk stage left anymore, and there are no combination criteria nodes left in the tree that have to be filled in. When all tree chunks from the treewalk stage have been added to the tree, the composition stage is finished. At the start of the composition stage, the output tree is empty. The composition process is built around a data structure called the open spots queue.

The open spots queue is a list. Its elements are open spots, normally terminal combination criteria nodes for which it must be decided which tree chunk, if any, is to be added at that spot in the tree. When the composition stage begins, there is one open spot on the open spots queue. This open spot stands for the root of the tree: the tree chunk that is picked to fill that open spot will be located at the root of the output tree. Then the first item on the open spots queue is taken off, and the program tries to find a matching tree chunk for it. When such a tree chunk is found, it is added to the tree at the spot that the open spot stands for. When a tree chunk is picked, all its terminal combination criteria nodes are added to the end of the open spots queue as open spots. This way of filling in open spots repeats until there are no chunks from the treewalk stage any more and no open spots that have to be filled are left on the queue, or the program fails. If the process runs out of the one before it runs out of the other, composition fails and a different way of assigning tree chunks is tried with Prolog’s backtracking. Prolog backtracks until it has found a way to fill the open spots such that the program runs out of open spots and chunks simultaneously.

For some open spots, a chunk must be added at the spot in the tree they describe. Other open spots merely signify that something may be added to the output tree at a certain spot. If there is an open spot for the head or modifier of a layer, this open spot is a child of the head node or the restrictors node of that layer. Chunks that can be used as heads or modifiers have an identifier node or a constant node as the child of the root combination criteria node.

Figure 4.5: The tree chunk generated for the pronoun “I”
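The open-spots-queue loop described above can be sketched in Python. This is a simplified, deterministic illustration with hypothetical names; the real program explores the alternatives through Prolog's backtracking rather than taking the first match.

```python
from collections import deque

# Deterministic sketch of the composition loop (hypothetical names).
# A chunk is modelled as (root_criteria, open_spots_it_introduces).

def compose(chunks):
    queue = deque([{"position": "root"}])  # one initial open spot
    pool = list(chunks)
    placed = []
    while queue:
        spot = queue.popleft()
        # find a chunk whose root criteria match the open spot
        match = next((c for c in pool
                      if c[0]["position"] == spot["position"]), None)
        if match is None:
            return None                    # composition fails
        pool.remove(match)
        placed.append(match)
        queue.extend(match[1])             # enqueue the chunk's open spots
    # chunks and open spots must run out simultaneously
    return placed if not pool else None

chunks = [({"position": "root"}, [{"position": "head"}]),
          ({"position": "head"}, [])]
print(compose(chunks) is not None)  # True
```

Where this sketch simply fails, the Prolog implementation backtracks and tries a different assignment of chunks to open spots.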

Restrictor nodes

Restrictor nodes are handled in a special way in the program. In the tree resulting from the composition of the chunk needing a head and the chunk that contains that head, there should of course be a restrictor node between the head node and the identifier node. Whenever a head or modifier is inserted in the place of a terminal combination criteria node, an extra restrictor node with the function ∅ is inserted. This is done because in the fragments of language that I studied for this program, there was never a restrictor that did not have the function ∅. The program can easily be changed to support other layer variable functions if needed.

4.4.2.1 The Keys and Values in Combination Criteria

Now that I have treated the form of combination criteria and the basic composition process, I can return to the subject of the meaning of combination criteria. Most rules for matching combination criteria are language-specific and coded in the combination criteria matching function. Key-value pairs that have one of the keys function, critical or position though, like multi, have a special meaning that is recognized by the language-independent tree chunk composition code. I first describe the meaning of the language independent keys and their values, and then generally explain how the language-dependent matching function works.

The key multi has no value associated with it. A key-value pair with multi as the key can only be used in terminal combination criteria nodes. In such nodes, it signifies that multiple chunks may be attached at that spot. Those multiple chunks will then become the children of a new list node that is inserted at the place where the combination criteria node used to be in the upper tree chunk. When filling in an open spot for a multi combination criteria node, the composition algorithm tries every subsequence of the sequence of chunks appended to the sequence of available representational frames, and the empty sequence is tried out first. The different subsequences are tried out by Prolog’s backtracking. This means that the relative order in which two chunks are returned by the treewalk stage will also be the relative order of two chunks under a list node that fills a multi open spot.

The key critical, like multi, has no value associated with it. The key critical signifies that a certain open spot has to be filled; leaving it empty in the final Representational Level is not possible. If an open spot cannot be filled and does not have the critical criterion, what happens depends on the position in the tree. If the combination criteria node is the child of a modifiers or operators node, it is simply removed from the tree. If the node is at another position, it is replaced by a childless list node.

A key-value pair with function as the key is used during composition to set the function of the edge that links the two chunks being composed. If the upper node has a key-value pair function : x, where x is a variable, the edge that connects the two chunks after the combination will be labeled with the value of x. In this way the upper chunk can determine the function that the lower chunk has in the resulting tree structure.

A key-value pair with position as the key is used during the composition for two purposes.

The first purpose is determining whether an empty list node, or nothing at all, must be filled in if a terminal combination criteria node is left unmatched. The second is determining whether a restrictor node has to be inserted between the two chunks when they are to be composed.

4.4.2.2 The Combination Criteria Matching Function

The combination criteria matching function is the language-specific function that decides if a certain tree chunk can be inserted at a certain open spot. It makes that decision based on the upper criterion being considered, and the chunk to be inserted as a whole (of course, the combination criteria in the root node of the chunk to be inserted are part of the chunk to be inserted as a whole; but the combination criteria in its root node are especially important). It decides for every criterion in the upper combination criteria if it is fulfilled. If the decision is positive for all combination criteria in the upper combination criteria node, the chunk can be inserted there. The Prolog code that performs the test using the combination criteria matching function is:

%% combination_criteria_match(?TopCriteria:list, ?BottomPiece) is det.
%
% Checks if the tree chunk, given as the second argument, fits
% a spot in the solution tree with the combination criteria given
% as the first argument.
%
% A tree chunk is supposed to be a term of the form
% =|piece(Criteria, Structure)|=.
combination_criteria_match(TopCriteria, piece(BottomCriteria, Structure)) :-
    maplist(match_criterion(piece(BottomCriteria, Structure)), TopCriteria).

where the predicate match_criterion/2 implements the combination criteria matching function. Its first argument is the lower chunk; its second argument is a key-value pair from the combination criteria of the upper node.

4.5 Coreference Resolution

When the composition stage has finished, all representational tree chunks that were created during the treewalk stage have been put in their final position in the representational tree. At this point I have discussed every linguistically relevant aspect of the composition stage, and I can proceed to describe the coreference resolution stage. However, the Representational Level is not ready yet: some nodes in the representational tree still have to be linked to other nodes. Such a situation occurs when there are anaphora, that is, representational constituents that refer to other representational constituents. In such a case, the referring layer should have the same identifier as the layer that it is referring to, as explained in chapter 3.
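For readers less familiar with Prolog, the test performed by combination_criteria_match/2 in §4.4.2.2 can be mirrored in Python. This is an illustrative, hypothetical analogue, not the thesis's code; in particular, the trivial match_criterion stand-in replaces the language-specific matching function.

```python
# Hypothetical Python analogue of combination_criteria_match/2: every
# criterion of the upper node must be accepted by the matching function
# for the lower chunk.

def match_criterion(bottom_piece, criterion):
    # Language-specific in the real program; here a trivial stand-in that
    # accepts a criterion when the chunk's root criteria contain it.
    key, value = criterion
    return bottom_piece["criteria"].get(key) == value

def combination_criteria_match(top_criteria, bottom_piece):
    # Mirrors maplist/2: the match succeeds only if every single upper
    # criterion is fulfilled by the chunk to be inserted.
    return all(match_criterion(bottom_piece, c) for c in top_criteria)

piece = {"criteria": {"position": "modifier", "upperlayer": "x"},
         "structure": None}
print(combination_criteria_match([("position", "modifier")], piece))  # True
```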

1 Implicit Subjects The prototype implementation developed for this thesis uses the coreference resolution stage for filling in implicit subjects in states of affairs. Such clauses contain a participant that participates in multiple states of affairs. but is realized only once at the Morphosyntactic Level. occurs only once at the Morphosyntactic Level. 52 . More information on simple techniques for computationally resolving anaphora can be found in [G¨nther and Lehmann. representing “I”. Because the Morphosyntactic Level constituent that ej is based on (Vpj ) does not contain a constituent that corresponds to the participant xi . The verb “like” is one such verb. (54) ML (Lei : (Cli : [ (Npi : (Nwi : i (Nwi )) (Npi ))Subj (Vpi : [ (fin Vwi : like (Vwi )) (Vpi ) (Vpj : [ (Gwi : to (Gwi )) (Vwj : play (Vwj )) ] (Vpj )) ] (Vpi )) (Npj : (Nwj : chess (Nwj )) (Npj ))Obj ] (Cli )) (Lei )) “I like to play chess. 1983]. If the argument of such a main verb is realized morphosyntactically as a verb phrase with an infinitive without a subject. At the Representational Level.” RL (pi : [(pres epi : [ [ (sim ei : [(f i : [ [ (f j : [likeV (f j )∅ ]) (xi )Undergoer (sim ej : [(f k : [ [ (f l : [playV (f l )∅ ]) (xi )Actor (m xj : [(f m : [chessN (f m )∅ ]) (m xj )∅ ])Undergoer ] (f k )∅ ]) (ej )∅ ]) ] (f i )∅ ]) (ei )∅ ]) ] (epi )∅ ]) (pi )∅ ]) “I like to play chess. It appears as a participant in the state of affairs ei . xi is an implicit subject in ej . But this entity xi appears twice at the Representational Level. there is an implicit subject. and as a participant in ej as well. the noun phrase Npi . Take an utterance like “I like to play chess”. the only use of it in my prototype implementation is in resolving implicit subjects.” In example (54).in their own right and I do not treat them in this thesis. Implicit subjects arise in clauses where the main verb is a verb that takes an episode as an argument.5. 
While hopefully the coreference resolution stage can be used to handle many linguistic phenomena, the only use of it in my prototype implementation is in resolving implicit subjects.

Implicit subjects pose a problem to the composition process. When the treewalk procedure encounters a noun phrase, one chunk is generated for an entity that corresponds to that noun phrase. And the composition stage can only use this chunk at one place in the Representational Level. If there are two states of affairs where an entity participant has to be inserted, but there is only one matching entity chunk, composition will fail. To solve this problem, the treewalk stage and the composition stage defer some of the work to the coreference resolution stage. When the node function encounters a verb phrase fronted by “to”, it outputs a representational tree chunk that contains a coreference node. This coreference node can take the place of the second occurrence of the entity in the Representational Level during the composition stage. Then the coreference resolution procedure will search for coreference nodes, and replace them with entity nodes that have the same index as the entity node that corresponds with the morphosyntactic subject of the clause. Now that I have described the coreference resolution stage, I have described all the stages of my approach, and I can continue to treat its application to a fragment of English.
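The replacement step performed by the coreference resolution procedure can be sketched as follows. This is a hypothetical Python illustration (the layer structure is flattened to nested lists; the real program operates on Prolog terms).

```python
# Sketch of implicit-subject resolution (hypothetical names): walk the
# finished representational tree and replace every coreference node with
# the identifier of the entity corresponding to the clause's
# morphosyntactic subject.

def resolve_coreference(tree, subject_id):
    if tree == "COREF":                  # a coreference node
        return subject_id
    if isinstance(tree, list):           # an inner layer with children
        return [resolve_coreference(t, subject_id) for t in tree]
    return tree                          # constants and identifiers

# "I like to play chess": x1 participates in e1 and, implicitly, in e2
rl = ["e1", ["like", "x1", ["e2", ["play", "COREF", "x2"]]]]
print(resolve_coreference(rl, "x1"))
```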

Chapter 5

Language-Dependent Aspects for a Fragment of English

To guide the development of the program, I have been applying it to some utterances from a kitchen dialog. The result of that application is described in this chapter. Applying the program to a language means that five language-specific components have to be defined. They are:

• The node function
• The lexicon
• The combination criteria matching function
• The set of representational frames
• Combination criteria for the first open spot

This chapter will tell how these components have been defined to make the approach work for a small fragment of English. But before those components can be discussed, the fragment of English that the program has been applied to must be outlined.

5.1 The Test Dialog

The design of the program was guided by the utterances from a test dialog. This test dialog is a dialog between two non-native speakers of English; the dialog was edited to correct their English. The one speaker attempts to guide the other speaker through choosing and preparing a recipe. The guiding speaker plays the role of a kitchen robot that is being developed as a part of the Dutch Companion Research Project¹. In total, the dialog contains 76 turns.

¹ More information on the Dutch Companion project can be found at http://www.decis.nl/content/view/50/41/.

Although the application is capable of finding a Representational Level for any utterance that restricts itself to the lexemes and syntactic phenomena that are covered by it.1 Morphosyntactic Analyses The respective morphosyntactic analyses for these utterances are given in examples (56–59).” (Lei : (Cli : [ (Npi : (Nwi : what (Nwi )) (Npi ))Obj (fin Vwi : would (Vwi )) (Npj : (Nwj : you (Nwj )) (Npj ))Subj (Vpi : [ (Vwj : like (Vwj )) (Vpj : [ (Gwi : to (Gwi )) (Vwk : cook (Vwk )) ] (Vpj )) ] (Vpi )) ] (Cli )) (Lei )) “What would you like to cook?” (Lei : (Cli : [ (fin Vwi : can (Vwi )) (Npi : (Nwi : you (Nwi )) (Npi ))Subj (Vpi : (Vwj : do (Vwj )) (Vpi ))) (Npj : (Nwj : something (Nwj )) (Npj ))Obj (Adpi : [ (Gwi : with (Gwi )) (Npk : [ (Gwj : those (Gwj )) (Nwk : [ (Nsi : ingredient (Nsi )) (Aff i : s (Aff i )) ] (Nwk )) ] (Npk )) ] (Adpi )) ] (Cli )) (Lei )) “Can you do something with those ingredients?” (57) (58) 55 . The utterances that I used from that dialog are given in example (55): (55) “I’m hungry. (56) (Lei : (Cli : [ (Npi : (Nwi : i (Nwi )) (Npi ))Subj (fin Vwi : am (Vwi )) (Adjpi : (Adjwi : hungry (Adjwi )) (Adjpi )) ] (Cli )) (Lei )) “I am hungry.1.” “What would you like to cook?” “Can you do something with those ingredients?” “I’ll look for a recipe.” 5. I will now give a a description of the utterances that I used to test the program’s design.

“I” and “will” are two separate words. • Verb phrases occur only when there is a lexical verb word. I followed these rules on points where [Hengeveld and Mackenzie. • Plural suffixes on nouns form a layer in their own right. auxiliaries and modals are all verbal words [Keizer. • Main verbs.1. 2008] was not entirely clear: • Verbal words are marked with an operator for finiteness if they are finite. like “I’ll” in example (59). No other operators for other imaginable syntactic features (like number or person) are present. The identifier of such layers is of the form Aff 1 . 2008a]. 2008a].(59) (Lei : (Cli : [ (Npi : (Nwi : i (Nwi )) (Npi ))Subj (fin Vwi : will (Vwi )) (Vpi : (Vwj : look (Vwj )) (Vpi )) (Gwi : for (Gwi )) (Npj : [ (Gwj : a (Gwj )) (Nwj : recipe (Nwj )) ] (Npj ))Obj ] (Cli )) (Lei )) “I’ll look for a recipe. Verb phrases can be nested when the one lexical verb takes the other as an argument. 5.2 Representational Analyses The representational analyses of the dialog utterances from example (55) are given in examples (60–63): (60) (pi : [(pres epi : [ [ (sim ei : [(f i : [ [ (f j : [ hungryAdj (f j )∅ ]) (xi ) ] (f i )∅ ]) (ei )∅ ]) ] (epi )∅ ]) (pi )∅ ]) “I’m hungry. • Contracted forms of a verb and a pronoun. are considered to be contracted only at the Phonological Level.1 When I analyzed the utterances of example 55.” 56 . as in example (57) [Keizer.1.1.” Conventions in Morphosyntactic Analysis 5. At the Morphosyntactic Level.

adjective phrases that consist of a single adjective word.2 The Accepted Fragment of English The fragment of English that my program understands. “those” and “a”. is the fragment that uses only the phenomena that are found in the above utterances. Np’s that consist of nouns with or without determiners. lexical verbs. the modals “would”.(61) (pi : [(pres epi : [ [ (sim ei : [(f i : [ [ (f j : [ likeV (f j )∅ ]) (xi )Undergoer (sim ej : [(f k : [ [ (f l : [ cookV (f l )∅ ]) (xi )Actor (xj )Undergoer ] (f k )∅ ]) (ej )∅ ]) ] (f i )∅ ]) (ei )∅ ]) ] (epi )∅ ]) (pi )∅ ]) “What would you like to cook?” (pi : [(pres epi : [ [ (sim ei : [ (abil f i : [ [ (f j : [ doV (f j )∅ ]) (xi )Actor (exists xj )Undergoer ] (f i )∅ ] : [ (dist m c xk : [ (f k : [ ingredientN (f k )∅ ]) (c x22 )∅ ])Instrument (f i )∅ ]) (ei )∅ ]) ] (epi )∅ ]) (pi )∅ ]) “Can you do something with those ingredients?” (pi : [(fut epi : [ [ (sim ei : [(f i : [ [ (f j : [ lookV for (f j )∅ ]) (xi )Actor (1 c xj : [ (f k : [ recipeN (f k )∅ ]) (c xj )∅ ])Undergoer ] (f i )∅ ]) (ei )∅ ]) ] (epi )∅ ]) (pi )∅ ]) “I’ll look for a recipe. “can” and “will”. copulas. Those features are: the determiners “some”. the grammatical words “to”. “with” and “of”. verbs that form predicate frames together with a preposition (like “look for”).” (62) (63) 5. 57 . nouns that may be pronouns. and finally adposition phrases with “with” or “of” as the adposition. atomic lexical nouns or lexical nouns with a plural affix. either infinitive or finite present tense (but not the inflected third person singular form).

I have tried to implement the above features as orthogonal to each other as possible, so my program can also give utterances that follow different schemes than the utterances in example (55) their appropriate meaning. So utterances like “You’ll like to prepare those recipes” or “What can I look for?” can be understood as well.

5.3 The Node Function and Matching Function

The two largest components of an application of the approach to a language are the node function and the combination criteria matching function. In this section, I describe these two functions for the fragment of English specified above. The specification of the node function includes a lexicon. I will describe the node function and the matching function in a phenomenon-by-phenomenon way. That means that the discussion of the node function and the matching function will be intertwined, but the discussion of a single linguistic phenomenon is not separated across the section. Describing the node function and matching function in full formal detail is beyond the scope of this thesis. The description here will show how I tackled the bigger challenges, but will leave out the parts that I consider less important. Those who are interested in the full detail can find the source of the implementation of my approach with the application to English at http://reinier.de/thesis/program-source.zip.

5.3.1 Grammatical Words

For grammatical words, the node function is very simple. The node function definition contains seven Prolog clauses for grammatical words, of which three generate no representational tree chunks and four generate one representational tree chunk. Of the four clauses that generate one chunk, three generate a chunk that contains a single operator, and one is involved in the treatment of verbs like “look for”, that I will discuss later on. The three that generate a chunk containing a single operator are:

• “some”, which generates the entity operator ‘exists’
• “those”, which generates the entity operator ‘dist’ (for ‘distal’)
• “a”, which generates the entity operator ‘1’ (for singular)

How these chunks are processed further during treewalk and composition is described in the next subsection on noun phrases.

5.3.2 Noun Phrases

Even in the few example sentences that I took as a starting point, there are a number of different kinds of noun phrases. They are the personal pronouns “you” and “I”, the question word “what”, “something”, a plural form with a demonstrative in “those ingredients” and the determiner-noun combination “a recipe”.
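The determiner clauses of §5.3.1 can be sketched as a small mapping. This is a hypothetical Python illustration; the program itself expresses the same behaviour as Prolog clauses, and the chunk shape below is an assumption made for the sketch.

```python
# Hypothetical sketch of the node function for grammatical words: three
# determiners yield a chunk with a single entity operator, other
# grammatical words yield no representational chunk at all.

OPERATOR_FOR_DETERMINER = {
    "some":  "exists",  # entity operator 'exists'
    "those": "dist",    # entity operator 'dist' (distal)
    "a":     "1",       # entity operator '1' (singular)
}

def grammatical_word_chunks(word):
    op = OPERATOR_FOR_DETERMINER.get(word)
    if op is None:
        return []  # e.g. "to", "with", "of" contribute nothing here
    return [{"criteria": {"position": "operator", "upperlayer": "x"},
             "constant": op}]

print(grammatical_word_chunks("those"))
print(grammatical_word_chunks("to"))  # []
```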

The node function on noun phrases distinguishes three cases. The first case is for pronouns like “you”, “I” and “what”, that introduce a layer with an absent head. The second case handles the pronoun “something”, that introduces an entity layer with an absent head and an ‘exists’ operator. The third case handles noun phrases with lexical heads like “a recipe” or “those ingredients”. The node function definition for the first and second case just returns a ready-made chunk that contains an entity layer of the right form. The node function definition for the case of noun phrases with lexical heads is more complicated. I will start my explanation of how lexical noun phrases are handled by describing how the node function treats lexical nouns.

5.3.2.1 Lexical Nouns

The node function for nouns has two separate cases: one for noun words that are not further analyzed at the morphosyntactic level (Nw’s with a lexical head), and one for noun words that consist of a stem with a lexical head and a plural suffix. In the case of an Nw with a lexical head, the node function returns a chunk that contains an entity layer, which has a lexical property in its head. The lexical constant is represented in the computer program as an index into the lexicon. To get that index, the node function must look up the word in the lexicon. If there are multiple words with the same orthographic form, the different alternatives are tried out by Prolog’s backtracking.

For nouns that are morphosyntactically decomposed into a stem and a plural suffix, the node function generates a chunk for the lexical property when it is applied to the stem. It thus also does the lexicon lookup of the word. The node function of the suffix generates a chunk for the FDG entity operator ‘m’, which stands for “more than one”. The node function of the whole noun makes an entity layer and already places the property layer representing the stem in its head. It also marks on the entity layer whether it is count or mass, based on the lexical information that can be accessed via the lexical property.

As an example of what the node function generates in these circumstances, see Figure 5.1. It depicts the single chunk that the node function returns when it is applied to the subtree of the noun word “recipe”. The small empty box at the root is a combination criteria node without any combination criteria in it. The entity layer has combination criteria nodes at the place of its operators and modifiers, so other chunks with operators and modifiers may be attached to it. The lexical property has no modifiers or operators.

These chunks generated by the node function for nouns are not passed to the composition stage in this form. Recall that when the node function is called on the subtree of a node N, it also receives all the chunks generated by the applications of the node function to descendants of N. So when the node function is called on the noun phrase that a lexical noun is a part of, the node function may alter the chunk that was generated by the node function invocation on the lexical noun. That possibility will be used to add more combination criteria to guide the composition stage.
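The chunk described above can be sketched as a nested structure. This is a hypothetical Python rendering chosen for illustration; the real program builds an equivalent Prolog term.

```python
import itertools

# Hypothetical sketch of the chunk the node function builds for a lexical
# noun such as "recipe": an entity layer whose head already contains the
# property layer for the stem, with 'multi' open spots for operators and
# modifiers.

_counter = itertools.count(1)

def fresh(prefix):
    # Fresh identifiers, stored as numbers, so that distinct layers never
    # accidentally share an identifier.
    return f"{prefix}{next(_counter)}"

def noun_chunk(lexeme, countable=True):
    return {
        "criteria": {},  # empty root criteria node; criteria are added later
        "layer": {
            "id": fresh("x"),
            "count": countable,
            "operators": {"spot": "multi"},
            "head": {"id": fresh("f"), "constant": lexeme + "N"},
            "modifiers": {"spot": "multi"},
        },
    }

chunk = noun_chunk("recipe")
print(chunk["layer"]["head"]["constant"])  # recipeN
```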

Figure 5.1: The representational tree chunk generated by the node function when applied to the morphosyntactic subtree for the noun “recipe” [tree diagram: an empty root combination criteria node above an entity layer cxi whose operators and modifiers spots are multi combination criteria nodes, and whose head holds the property layer fi with the constant recipeN]

5.3.2.2 The Noun Phrase Node Function

The node function for lexical noun phrases generates chunks based on (i) the chunks generated by node function calls on the subtrees under its descendant nodes and (ii) the morphosyntactic noun phrase subtree. It has two tasks. The first is choosing which entity layer in the chunks generated from its descendants must become the layer that represents the whole Np. The second is adding combination criteria to the chunks generated from its descendants.

As the entity layer that represents the whole Np, the node function selects any chunk that meets two criteria: the child of the root node of the chunk must be an identifier node for an entity layer, and the entity layer may not yet be chosen as the entity layer representing another Np. The latter condition prevents that an entity layer representing an Np within an Np also becomes the representant of the outer Np. For example, it prevents that the entity layer representing “a cake” also becomes the entity layer representing a surrounding Np “a recipe for a cake”.

The second task of the lexical noun phrase node function is adding combination criteria to the combination criteria nodes in the chunks generated for nodes lower in the tree. It adds combination criteria in three ways:

• It adds the criterion upperlayer: x to the root nodes of chunks that stand for property layers that represent adjectives. That criterion means that they will have to be combined with an entity layer.
• It adds a criterion subtree: x to all the terminal combination criteria nodes in the chunk representing the whole Np, and to all the root nodes of the other chunks generated for the lower subtrees under the noun phrase. The x in the criterion is a freshly generated identifier. This criterion is used during composition so that chunks that stand for parts of a noun phrase are attached to other chunks from the same noun phrase. For example, it prevents that “red apples are fruits” could be given the meaning of “apples are red fruits”.
• It adds the criterion morphosyntactic function: x to the root node of the chunk representing the whole Np. The place of x is taken by the morphosyntactic function of the Np. This criterion is used, together with lexical frames for verbs, for determining the semantic function of the entity that represents the Np.

5.3.2.3 Composition of Noun Phrase Parts

During the composition of the chunks of noun phrases into entity layers with modifiers, there are four combination criteria keys that matter: subtree, morphosyntactic function, upperlayer and position. The subtree and morphosyntactic function keys have been introduced above; the upperlayer and position keys deserve some introduction.

If a chunk has a criterion upperlayer: x in its root node, it can only be attached to the output tree in such a way that the first identifier node above the point where the chunk was added is one for a layer of the category x. The upperlayer criteria are used for representational operators that originate from within Np’s. They are small chunks with only a constant node as a child of the root node. The root node contains the criterion upperlayer: x to make sure that the constant node will be inserted into a layer of the category that it belongs to. Adding the operator ‘1’, indicating singularity, to a property for instance, would not make sense.

The position criteria are used in a similar way for operators. The position key, in the root node of a chunk, indicates if the chunk is intended as an operator, a head or a modifier. Possible values with this key are operator, head, modifier, inhead and inmodifiers. Criteria of the form position: operator ensure that operator chunks are only inserted into the output tree as an operator. If there is a criterion position: operator in the root node of a chunk, that chunk can only be added immediately under an operators node. If there is a criterion position: head in the root node of a chunk, that chunk can only be added immediately under a head node. If there is a criterion position: modifier in the root node of a chunk, that chunk can only be added immediately under a modifiers node. If there is a criterion position: inhead in the root node of a chunk, that chunk can only be added immediately under a list node that is immediately under a head node. If there is a criterion position: inmodifiers in the root node of a chunk, that chunk can only be added immediately under a list node that is immediately under a modifiers node.

As was already stated above, the position where something is added is the position where the combination criteria node was that is being filled in. Still, such chunks may have been “added immediately under an operators, head or modifiers node” because the terminal combination criteria node was immediately under such a node. If a terminal combination criteria node has the criterion multi, its place will be taken by a list node and the added chunks will end up underneath the list node.

What remains to be explained is the exact meaning of the subtree criteria. If the root of a chunk contains a criterion subtree: x, it can only be inserted at terminal combination criteria nodes that also have the exact same criterion subtree: x, or that have no criterion with the key subtree at all. When a representational frame is inserted into a terminal combination criteria node that has a combination criterion subtree: x, all the terminal combination criteria nodes in the inserted copy of the frame also get the criterion subtree: x. In this way, inserting a representational frame propagates the subtree constraint downward in the tree. As stated above, these criteria are used to prevent modifiers and operators of the one entity layer to be wrongly inserted into another entity layer.
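The three kinds of criteria above amount to a compatibility test between a chunk and a candidate insertion point. A minimal sketch of that test, under the assumption that criteria are encoded as dicts (the real program uses Prolog terms), could look like this; the inhead/inmodifiers positions are omitted to keep it short.

```python
def may_fill(spot_criteria, chunk_root_criteria, parent_node_kind, parent_layer_category):
    """Decide whether a chunk may be inserted at a terminal combination
    criteria node, given the node kind (operators/head/modifiers) and the
    layer category of the first identifier node above the insertion point."""
    # subtree: x in the chunk root only matches spots that carry the exact
    # same x, or that carry no subtree criterion at all.
    if "subtree" in chunk_root_criteria:
        x = chunk_root_criteria["subtree"]
        if spot_criteria.get("subtree", x) != x:
            return False
    # position: operator/head/modifier restricts the node under which the
    # chunk may be added.
    wanted = {"operator": "operators", "head": "head", "modifier": "modifiers"}
    pos = chunk_root_criteria.get("position")
    if pos in wanted and parent_node_kind != wanted[pos]:
        return False
    # upperlayer: x requires the first identifier node above the insertion
    # point to be a layer of category x.
    if "upperlayer" in chunk_root_criteria and chunk_root_criteria["upperlayer"] != parent_layer_category:
        return False
    return True
```

For example, an operator chunk tagged subtree: 1 with upperlayer: ep fits only an operators position under an episode layer whose spot has subtree: 1 (or no subtree criterion).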

5.3.3 Verb Phrases

Verb phrases and verbal words, like noun phrases and nouns, come in different types even in the small fragment of English analyzed here. In the four example sentences we find the following different types of verb phrases and verbal words:

• The copula “am” in “I’m hungry”.

• The modals “would”, “can” and “will”.

• The catenative verb “like”.

• The transitive verb “cook”, as an infinitive argument to “like”. Thus it forms the verb phrase “to cook” with the grammatical word “to”.

• The transitive verb “do”, as an argument to the modal “can”.

• The verb “look for”, as an argument to the modal “will”. I have analyzed “look for” as a ‘composite predicate construction’ as explained in [Keizer, 2008b].

I will now discuss the treatment of all these phenomena in order, so I will start with copulas and finalize with the discussion of the treatment of composite predicate constructions.

5.3.3.1 Copulas

As already stated above, copulas are not wrapped in verb phrases at the Morphosyntactic Level. They are usually verbal words directly in a list in the head of a clause. When the node function is applied to a verbal word, it checks if the head of the word is a constant that is known as a copula. If it is, then the node function will generate a chunk that contains a copula frame. The copula frame is a piece of representational information that merely controls the way that other chunks are combined. It does not contribute any constant or identifier nodes itself. Hence the copula is not itself recognizable as a layer at the Representational Level.

The copula frame is a chunk with a property layer that will become the head of a state of affairs layer. So the head of the property layer will contain the predication that can be expressed with a copula. The copula frame used in the sentence “I am hungry”, for example, contains a combination criteria node that specifies that it can be replaced by a chunk that represents an adjectival property, and a combination criteria node that must be replaced by an entity chunk, which will be the entity to which that adjectival predicate is applied. This copula frame is shown in Figure 5.2.

The list of chunks that the node function for a copula returns contains just one copula frame chunk, but if one tries to find more solutions, different outcomes for the node function are possible that contain another copula frame chunk. The different copula frames contain different arrangements of a predicate and arguments. Different copula frames are tried out by Prolog’s backtracking.
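The "one frame at a time, more on demand" behaviour of the copula node function can be imitated with a Python generator: the caller consumes one frame, and asks for the next only when composition with the first one fails, which mirrors Prolog backtracking. The frame shapes and names below are invented for this sketch and are not the frames of the thesis's program.

```python
def copula_frames(word):
    """Yield candidate copula frames one at a time, mimicking how Prolog
    backtracking would enumerate alternative solutions of the node function."""
    copulas = {"am", "is", "are"}
    if word not in copulas:
        return  # not a copula: no copula frame chunks at all
    # First arrangement: an adjectival predicate applied to one entity,
    # as in "I am hungry".
    yield {"head": "adjectival-property", "args": ["entity"]}
    # A further arrangement, only reached on backtracking, e.g. an
    # entity predicated of another entity ("Mary is a cook").
    yield {"head": "entity-property", "args": ["entity"]}

frames = list(copula_frames("am"))
```

Calling `list()` here forces all alternatives out at once; a composition loop would instead advance the generator only after a failure.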

Figure 5.2: The copula frame used in “I’m hungry”

Besides the copula frame chunk, the node function for copulas also returns two chunks with operators. The first operator chunk contains an operator node with the operator ‘pres’, indicating present tense. The second operator chunk contains an operator node with the operator ‘sim’, which signifies that a state of affairs occurs simultaneously with the rest of the episode. The node function generates these two operator chunks for every finite verb.

5.3.3.2 Modals

Modals, like copulas, do not necessarily occur inside a verb phrase at the Morphosyntactic Level. They only contribute zero or more operators to the Representational Level. The modal “would” in “What would you like to cook?” is analyzed as meaningless at the Representational Level. Its function in the utterance is signifying politeness, which is represented at the Interpersonal Level. The modal “can” generates a chunk with the episode operator ‘pres’, which signifies that an episode layer has present tense, and a chunk with the property operator ‘abil’, for “ability”. This operator must be inserted to the property layer in the head of the state of affairs. The modal “will” generates a chunk with the episode operator ‘fut’, indicating that the episode is in the future tense.

5.3.3.3 Lexical Verbs

All lexical verbs, regardless of their valence and regardless of whether they give rise to composite predicate constructions, are handled in the same way by the node function. The differences between these verbs are in the information in their lexicon entries. Contrary to modals and copulas, lexical verbs are always contained in verb phrases at the Morphosyntactic Level, according to the rules in 5.1. In the fragment of English under discussion, the inverse is also true: there is a lexical verb in the head of every verb phrase. The treatment of lexical verbs by the node function has been divided between a node function definition for verbal words and a node function definition for verb phrases. First, I discuss the general handling of lexical verbs by the node function. After that, I discuss the way my program handles these different kinds of verbs differently.
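The modal treatment above boils down to a small table from modal to the operator chunks it contributes, with “would” contributing none at the Representational Level. The tuple encoding of a chunk as (layer category, operator name) is an illustrative assumption of this sketch, not the program's representation.

```python
MODAL_OPERATORS = {
    "would": [],                          # politeness: Interpersonal Level only
    "can":   [("ep", "pres"), ("f", "abil")],
    "will":  [("ep", "fut")],
}

def modal_chunks(word):
    """Build one small operator chunk per table entry; each chunk carries
    the combination criteria that steer it to the right layer."""
    return [{"upperlayer": layer, "position": "operator", "operator": op}
            for layer, op in MODAL_OPERATORS.get(word, [])]
```

The upperlayer criterion in each generated chunk is what later forces ‘pres’ and ‘fut’ into the episode layer and ‘abil’ into the property layer in the head of the state of affairs.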

The node function for verbal words queries the lexicon. Every lexical verb has associated with it a list of lexical frames. The lexical frames specify the form of the head of the predication property. These lexical frames are a part of the chunk that will be generated for these verbs by the node function. So for example for the verb “like”, its only lexical frame has the form given in Figure 5.3.

The node function for lexical verbal words generates two or three chunks. When it is called on a lexical verb, the node function looks up the lexical frame, incorporates it in the predication property layer, and generates a chunk with the predication property layer with combination criteria nodes for the representational arguments of the verb. Additionally, it generates a chunk with the ‘sim’ operator for the state of affairs that the verb predicate will be a part of. And if the verb is finite, it is assumed to be indicating the tense of a whole episode, and the node function for it also generates a chunk with the ‘pres’ operator indicating the present tense of the episode. The fragment of English that the program understands does not contain other absolute tenses than present (‘pres’) and future (‘fut’), and no other relative tenses than simultaneous (‘sim’).

The node function for verb phrases behaves differently based on whether the verb phrase begins with a lexical verb, or begins with “to” followed by an infinitive. In the first case, the node function for the verb phrase generates a chunk that contains a state of affairs layer, into which the predication property layer will be inserted during the composition stage. Besides the chunk containing the predication property layer, the node function only generates the state of affairs chunk. The node function also adds subtree criteria to the root nodes of the chunks that it receives as input, and to the terminal combination criteria nodes of the state of affairs chunk. This is done to make sure that the input chunks combine with the state of affairs chunk of the same verb phrase. In the second case, the node function for verb phrases that start with “to” followed by an infinitive does all the things that the node function does for verb phrases that start with a lexical verb. Additionally, it changes the predication property chunk that was generated by the verbal word’s node function: it replaces the combination criteria node with the combination criterion morphosyntactic function: Subj by a coreference resolution node. This is done so that the predication based on the “to”-infinitive will have an implicit subject.

Catenative Verbs Taking an Infinitive

The verb “like” is a catenative verb taking an infinitive in “What would you like to cook?”. At the Morphosyntactic Level, it is combined with another verb phrase, the main verb of which is in the infinitive mood. At the Representational Level, the lexical property “like” is applied to an entity (the person that likes) and a state of affairs (what this person would like to do). This property of the verb “like” is encoded in the lexicon in our program.
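The "two or three chunks" rule for a lexical verbal word can be sketched as follows. The lexicon format and all names here are assumptions made for the sketch; the real lexicon is a set of Prolog facts with tree-shaped lexical frames.

```python
LEXICON = {
    "cook": {"frame": {"head": "cookV", "args": ["Actor/Subj", "Undergoer/Obj"]}},
    "do":   {"frame": {"head": "doV",   "args": ["Actor/Subj", "Undergoer/Obj"]}},
}

def verbal_word_chunks(verb, finite):
    """A predication-property chunk built from the verb's lexical frame,
    a 'sim' operator chunk for the enclosing state of affairs, and, only
    for finite verbs, a 'pres' operator chunk for the episode's tense."""
    frame = LEXICON[verb]["frame"]
    chunks = [
        {"kind": "predication-property", "frame": frame},
        {"kind": "operator", "upperlayer": "e", "operator": "sim"},
    ]
    if finite:  # a finite verb is taken to indicate the tense of the whole episode
        chunks.append({"kind": "operator", "upperlayer": "ep", "operator": "pres"})
    return chunks
```

An infinitive such as the “cook” of “to cook” thus yields two chunks, while the finite “do” of the example sentence yields three.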

Figure 5.3: The only lexical frame for the verb “like” in the fragment of English

This state of affairs layer that is an argument of the predicate “like” typically comes from the node function of the main verb that is the morphosyntactic argument of “like”.

Transitive Verbs

The verb “cook” is a transitive verb: it has two arguments, an entity that has the function Actor, the morphosyntactic realization of which has the function Subj, and an entity that has the function Undergoer, the morphosyntactic realization of which has the function Obj. That is encoded in the lexical frame given in Figure 5.4. This lexical frame is the only lexical frame in the program for “cook”.

Figure 5.4: The only lexical frame for the verb “cook” in the fragment of English

Composite Predicate Constructions

The verb “look for” in “I’ll look for a recipe” falls apart into two parts at the Morphosyntactic Level: a verbal word “look” and a grammatical word “for”. According to [Keizer, 2008b], the verb is represented at the Representational Level as a single verbal predicate lookV for. This poses a challenge to the node function, because the nodes for “look” and “for” are so far apart in the morphosyntactic tree. It would be impractical to generate the verbal predicate lookV for in one piece. Moreover, the word “look” can be used to express composite predicate constructions with other prepositions, for example “look after”. Ideally, the node function for the verbal word “look” would produce a chunk that can be used in composite predicate constructions with different prepositions.

To address these issues, we extend the tree representation of FDG layers to accept lists of constants in the head of a layer. The composite predicate lookV for is represented in tree form as a list with two elements, a constant lookV and a constant for. Then we can have the lexical frame given in Figure 5.5 for “look”. In the fragment of English understood by the program, it is the only lexical frame for “look”. The constant lookV is included in the lexical frame. The lexical frame also contains a combination criteria node for the constant from the preposition.
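The list-of-constants extension can be mimicked with a list head that carries an open slot for the preposition's constant, guarded by the composite verb preposition criterion. Names and the dict encoding are hypothetical; only the idea — a two-element head with one slot filled during composition — comes from the text above.

```python
def look_frame():
    """Frame for "look": ["lookV", None] is the list-valued head; the None
    slot awaits the constant contributed by the preposition's node function,
    guarded by a criterion like `composite verb preposition: for`."""
    return {"head": ["lookV", None], "preposition_criterion": "for"}

def insert_preposition(frame, prep_constant):
    """Fill the open slot during composition, but only if the preposition
    satisfies the frame's composite-verb-preposition criterion."""
    if prep_constant != frame["preposition_criterion"]:
        raise ValueError("preposition does not satisfy the criterion")
    frame["head"][1] = prep_constant
    return frame
```

A lexicon with a second frame whose criterion is "after" would let the same verbal word "look" participate in "look after" without any change to this mechanism.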

The constant for is generated by the node function of the grammatical word “for” and is inserted into the open spot in the verbal predicate during the composition stage. This way, “look” can be used with different prepositions. What prepositions it can be used with, can be controlled with the composite verb preposition: x criterion in the combination criteria node in the lexical frame. In the fragment of English under discussion, there is one lexical frame for “look”, used with a preposition.

At this point all types of lexical verbs in the fragment of English have been discussed, and I go on to discuss how the program handles adjective phrases.

5.3.4 Adjective Phrases

The only adjective phrase in the four example utterances is “hungry” in “I’m hungry”. Hence, the fragment of language understood by the program contains only adjective phrases that have a single adjective word in their head. The node function for an adjective word looks up the adjective in the lexicon, and returns a representational tree chunk that contains a property layer with the lexical constant in its head. The node function for the adjective phrase’s identifier node does nothing but passing on the chunks generated by the node function for nodes in its subtrees.

5.3.5 Adposition Phrases

The last type of constituent left to discuss is the adposition phrase. An adposition phrase, in the fragment of English under discussion, is an adposition followed by a noun phrase. When the node function is applied to the identifier node of an adposition phrase, it does not generate any new chunks. It does alter some of the chunks that have been generated by the node function for the constituents of the noun phrase.

Figure 5.5: The only lexical frame for the verb “look” in the fragment of English

The adposition phrase node function recognizes two adpositions: “with” and “of”. It alters the chunks generated for the noun phrase to give them the function that is expressed by the adposition. In the case of “with”, the node function takes the main entity layer chunk generated by the noun phrase node function, and adds to its root combination criteria node the criteria upperlayer: f and position: modifier. The function of the edge above the highest identifier node is changed to ‘Instrument’, so that when the chunk is inserted under a modifiers node, the function of the entity will be ‘Instrument’. In the case of “of”, it takes the main entity layer chunk generated by the noun phrase node function, and adds to its root combination criteria node the criteria upperlayer: f and position: inhead. The function of the edge above the highest identifier node in the chunk is changed to ‘Ref’, so that when the chunk is inserted into a head, the function of the entity layer within that head will be ‘Ref’.

5.4 Representational Frames

Adposition phrases were the last category of layer for which the node function and combination criteria function still had to be explained. In this section, I will discuss the representational frames that are available during the composition stage for the fragment of English the program accepts. The representational frames restrict the set of semantic constructions that the program can generate. As such, they are a contribution of FDG to an efficient natural language processing system.
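The two recognized adpositions can be summarized as a lookup from adposition to a function label and the extra root criteria; the node function then rewrites the Np's main entity chunk accordingly. The dict encoding is an assumption of this sketch.

```python
ADPOSITIONS = {
    "with": {"function": "Instrument", "criteria": {"upperlayer": "f", "position": "modifier"}},
    "of":   {"function": "Ref",        "criteria": {"upperlayer": "f", "position": "inhead"}},
}

def apply_adposition(adposition, entity_chunk):
    """Relabel the edge above the chunk's highest identifier node and add
    the adposition's combination criteria to the chunk's root node."""
    info = ADPOSITIONS[adposition]
    entity_chunk["function"] = info["function"]
    entity_chunk["root_criteria"].update(info["criteria"])
    return entity_chunk
```

Extending the program with, say, locative "in" would then only require one more table entry, not a new node function case.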

There are three frames available in the program. They are stored as representational tree chunks. The frames are given in Figures 5.6–5.8.

Figure 5.6: The frame for states of affairs

Figure 5.7: The frame for episodes

A striking feature in these figures is that none of the frames have any children under their modifiers node, and that the propositional content frame does not have children under its operators node. This is because in the fragment of English the program accepts, modifiers of states of affairs, episodes or propositional contents do not occur. Neither do operators of propositional contents. Having combination criteria nodes to allow for such modifiers or operators would make the composition stage take more time. All the frames have an empty combination criteria node at their root.

The combination criteria node under the operators node of the episode frame has a critical criterion. That means that an episode layer must have at least one operator. This criterion was added to ensure that every episode at least has an operator that indicates its tense: past, present or future. In the fragment that the program accepts, only present and future tense occur.

The episode and propositional content frames are used for the Representational Level of every utterance. That happens because every Representational Level in the fragment of English contains a propositional content layer and an episode layer, and there is no way to introduce such layers other than these frames.

Figure 5.8: The frame for propositional contents

The states of affairs layer is only used in utterances with copulas, in which there is no main verb that generates a state of affairs layer during the treewalk.

5.5 Initial open spot

Having discussed representational frames, the only thing not yet described about how we handle this fragment of English is the initial open spot. This is the open spot that is the only open spot in the open spots queue when the composition stage starts. It thus contains the combination criteria that select the chunk that will be at the root of the output representational tree. The initial open spot in my application of the approach to English contains the combination criteria critical and lowerlayer: p. The critical criterion means that it must be filled: an empty output tree is not an option. The lowerlayer: p means that the outermost layer in the output Representational Level must be a propositional content layer. In the limited fragment of English under discussion, that is correct, because the fragment of English does not contain imperative utterances, which have a state of affairs layer as the outermost layer at the Representational Level. Because the program contains only one frame for propositional contents, namely the one given in Figure 5.8, this frame is always used to fill the initial open spot.
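The initial open spot and the frame inventory can be written down as data; selecting a filler is then just matching the spot's lowerlayer criterion against each frame's outermost layer. The dict encodings and the spot counts below are assumptions of this sketch, not the program's Prolog terms.

```python
INITIAL_OPEN_SPOT = {"critical": True, "lowerlayer": "p"}

# The three frames of the program, reduced to their outermost layer category
# and the number of open spots filling them would create.
FRAMES = [
    {"name": "soa",     "layer": "e",  "spots": 2},
    {"name": "episode", "layer": "ep", "spots": 2},
    {"name": "p",       "layer": "p",  "spots": 1},
]

def candidates(spot):
    """All frames whose outermost layer satisfies the spot's lowerlayer."""
    return [f for f in FRAMES if f["layer"] == spot["lowerlayer"]]
```

Because only one frame has a propositional content layer as its outermost layer, the candidate list for the initial open spot is a singleton, which is why that frame is always used.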

Chapter 6

An Example

Now that the approach for transforming the Morphosyntactic Level into a Representational Level has been explained, it is time to show how a simple utterance is transformed, to illustrate the approach. The utterance that will serve as an example is “Can you do something with those ingredients?”. The Morphosyntactic Level of that utterance is given in example (21), repeated here as example (64). The Representational Level is given in example (62), repeated here as example (65).

(64) (Lei : (Cli : [ (fin Vwi : can (Vwi )) (Npi : (Nwi : you (Nwi )) (Npi ))Subj (Vpi : (Vwj : do (Vwj )) (Vpi )) (Npj : (Nwj : something (Nwj )) (Npj ))Obj (Adpi : [ (Gwi : with (Gwi )) (Npk : [ (Gwj : those (Gwj )) (Nwk : [ (Nsi : ingredient (Nsi )) (Aff i : s (Aff i )) ] (Nwk )) ] (Npk )) ] (Adpi )) ] (Cli )) (Lei ))
“Can you do something with those ingredients?”

6.1 The Treewalk Stage

The first stage of the approach described in chapter 4 is the treewalk stage. The treewalk procedure walks through the tree representation of the Morphosyntactic Level in postorder, applying the node function to every identifier node. The first node that the node function is applied to, is the identifier node of (Vwi ). There is a case in the node function definition for an identifier node for a verbal word that has a head node which has a constant node with the constant “can” as its child. And in that case the node function returns two chunks: one for the ‘pres’ operator on episodes, and one for the ‘abil’ operator on properties. Those chunks are given in Figure 6.1 and Figure 6.2.

Figure 6.1: Representational chunk for the ‘pres’ episode operator

Figure 6.2: Representational chunk for the ‘abil’ property operator

The next node function call invokes the node function on the identifier node of the Nwi layer. For this node, one chunk is generated that contains an individual layer, which is the xi from example (65). The generated chunk is shown in Figure 6.3.

(65) (pi : [(pres epi : [(sim ei : [(abil fi : [[(fj : [doV (fj )∅ ]) (xi )Actor (exists xj )Undergoer ] (fi )∅ ] : [(dist m cxk : [(fk : [ingredientN (fk )∅ ]) (cxk )∅ ])Instrument (fi )∅ ]) (ei )∅ ])] (epi )∅ ]) (pi )∅ ])
“Can you do something with those ingredients?”
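The postorder treewalk can be sketched in a few lines over a nested-tuple tree; children are visited before their parent, and every parent's node function sees the chunks its subtrees produced. The tree encoding and the toy node function are assumptions of this sketch.

```python
def treewalk(tree, node_function):
    """Visit identifier nodes in postorder; each node function call gets the
    chunks generated for the node's subtrees and returns an updated sequence."""
    label, children = tree
    chunks = []
    for child in children:
        chunks.extend(treewalk(child, node_function))
    return node_function(label, chunks)

# Toy node function that just records the visit order.
order = []
def record(label, chunks):
    order.append(label)
    return chunks + [label]

# A pruned version of the example's morphosyntactic tree.
tree = ("Cli", [("Vwi", []), ("Npi", [("Nwi", [])])])
treewalk(tree, record)
```

Running this visits Nwi before Npi and every constituent before Cli, which is exactly why the Np node function can choose among the entity chunks its descendants already produced.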

The sequence of chunks returned by the Npi node function does not contain the chunk from Figure 6.5. This chunk is shown in Figure 6.xi operators head modif iers multi multi Figure 6. the node function is applied to the identifier node of Vwj . the morphosyntactic layer for the word “something”. This time. The next identifier node that the node function is applied to. The only update to the latter two chunks is that the criterion subtree: 2 is added to their root combination criteria nodes. The updated chunk from Npi ’s node function is shown in Figure 6. namely the one that was created by Nwi ’s node function and is shown in Figure 6. The first is a chunk with a predication frame with the predicate filled.6. The latter criterion reflects that the xi entity layer represents an Np that was a subject.3. is Nwj . but does contain a copy of it that is augmented with some more combination criteria in the combination criteria nodes. and with open spots for the arguments of the predicate. Because the identifier node of Nwi is in a subtree of the identifier node of Npi . The node function for an identifier node of an Np must choose a chunk from the sequence it receives. In this case.3: Representational chunk for the “you” in “Can you do something with those ingredients?” Then. The next morphosyntactic layer to which the node function is applied. is the identifier node for Vpi . The added criteria are subtree: 1 in the terminal combination criteria nodes and morphosyntactic function: Subj in the root node of the chunk. This chunk is shown in Figure 6. there is only one entity chunk. For this layer. the chunk generated by the node function for Nwi is given to the node function for Npi as an element of the chunk sequence. to supply the outer entity layer that represents this Np. the node function generates two chunks. the chunk sequence argument of the node function is not empty. The new chunk that is generated contains a state of affairs layer and is shown in Figure 6. 
the 73 . the verbal word “do”.7.4. The other chunk is a simple chunk that contains the operator for simultaneity of a state of affairs. the node function is called on the identifier node of Npi . This node function invocation returns a sequence of chunks that contains one new chunk and updated copies of the two chunks from Vpj ’s node function. For this word. After Npi .3.

5: The predicate frame chunk generated for the verbal word “do” 74 .morphosyntactic function : Subj xi operators head modif iers subtree : 1 multi subtree : 1 multi Figure 6.4: Representational chunk for the “you” in “Can you do something with those ingredients?”. with updates by the Np node function fi operators head modif iers multi ∅ multi fj critical lowerlayer : x function : Actor morphosyntactic function : Subj critical lowerlayer : x function : Undergoer morphosyntactic function : Obj operators head modif iers ∅ doV Figure 6.

7: The state of affairs chunk generated for the verb phrase Vpi 75 .position : operator upperlayer : e sim Figure 6.6: The operator chunk generated for the verbal word “do” ei operators critical multi subtree : 2 head lowerlayer : f critical subtree : 2 modif iers Figure 6.

The chunk is shown in Figure 6.xj operators head modif iers exists Figure 6.8 with morphosyntactic function: Obj added in its root combination criteria node. of which the properties are unknown or not interesting. First. This is in line with the meaning of “something”: a certain individual. After the word “those” comes the word “ingredients”. with identifier Nsi . it returns an empty sequence. which will to the same thing to the chunk in Figure 6. The next step is of course the node function for Npj . After Gwi .9: The representational chunk generated for the word “those” node function generates a chunk that is shown in Figure 6. When the node function is called on the identifier node of Gwi . 76 . namely the chunk from Figure 6. The node function call on the identifier node for Npj thus returns a sequence with one element. At the morphosyntactic level. the node function is applied to the identifier node of Gwj . This is a chunk with an individual layer that has no head or modifiers. The meaning of “with” is handled by the node function of the adposition phrase identifier node. This generates a chunk with an operator for individuals that indicates that those individuals are relatively far from the deictic center: the ‘dist’ operator. “those”.8 as the Npi node function did to the chunk in Figure 6. “with”.3. For this layer.8: The representational chunk generated for the word “something” upperlayer : x position : operator dist Figure 6. the word “ingredients” has in its head a list of two layers: a nominal stem “ingredient-” and an affix “-s”. the node function is applied to the identifier node of the stem “ingredient-”.9.8. just the operator ‘exists’.

In this node function invocation on Adi . So the returned sequence of chunks contains the two operator chunks for “those” and “-s”. It looks at the identifier node of the first element of the list in the head of the adposition phrase. Then the node function is applied to the identifier node of Adpi and the sequence of chunks returned by the node function for Npk .12.11. the node function generates a chunk that contains the plural operator ‘m’. This chunk has no combination criteria in its root node. looks it up in a list of prepositions.10.11. and if it’s a grammatical word. the node function looks at the subtree of the node it is applied to. It gets a criterion morphosyntactic function: none in its root node.9. and uses its contents as the lexical property for the individual layer it generates. Now it is time to apply the node function to Nwk and the sequence of chunks generated for Nsi and Aff i . It is not shown here. To find out what function label to insert. and those alternatives would have to be tried with Prolog backtracking. and the chunk with the plural operator generated by the node function for Aff i . and the criterion subtree: 4 is added to its terminal combination criteria nodes.12 that the node function for the adposition 77 . and has no terminal combination criteria nodes. Then the node function is applied to the identifier node of Npk . the node function takes the lexical property chunk from the sequence of chunks it receives. and it is thus selected as the main individual layer chunk for this noun phrase. One can also see in Figure 6. For the identifier node of Aff i . The criterion subtree: 4 is also added to the root combination criteria nodes of the two operator chunks. 6.10 and 6. This is shown in Figure 6. The chunk that contains the individual layer for the word “ingredients” is the only chunk with an individual layer. For the preposition “with”. the edge between the rode node and the xk identifier node gets the label ‘Instrument’. 
In a larger fragment of English. the affix “-s” could mean multiple things.upperlayer : x position : operator m Figure 6. The chunk is shown in Figure 6. For this layer. In the individual chunk. The sequence of chunks that is given to this node function call contains the chunks in Figures 6. something interesting happens: the individual layer xk is assigned the Instrument function. The returned sequence of chunks thus contains two chunks: a chunk with an individual layer with in its head the lexical property for “ingredient”. and a modified copy of the individual chunk. The chunk with the individual layer from Nwk is shown in Figure 6. it has stored in that list that the function ‘Instrument’ should be added.10: The chunk generated for the noun affix “-s” the node function generates a property chunk with the lexical property of the lexeme “ingredient”. The sequence of chunks it is applied to contains only the chunks returned for Npk because the node function invocation on Gwi returned an empty sequence.

. Figure 6. . .12: The top of the chunk for “ingredient”. .11: The chunk with the individual layer for the word “ingredients” upperlayer : f position : modifier morphosyntactic function : none Instrument cx k . with function label 78 .cx k operators head modif iers multi ∅ multi fk operators head modif iers ∅ ingredientN Figure 6. . . . .

The node function for the adpositional phrase adds two combination criteria to the root node: upperlayer: f and position: modifier (recall that morphosyntactic function: none was already added by the noun phrase node function). These criteria are meant to ensure that xk is added as a modifier to the predicate property layer (fi in the case of our example sentence). The node function finds out what criteria to add for a given adposition by looking at the list that also stores the function.

The next node function call works on the identifier node of Cli. This node function simply passes on the sequence it receives as input. Then the node function is applied to the identifier node of Lei and this sequence. This call also passes on the sequence. The treewalk has then returned, for the linguistic expression layer that forms the entire Morphosyntactic Level, the sequence that contains all the chunks discussed above, and the treewalk stage is ready.

6.2 The Composition Stage

Now that there is a sequence of chunks from the treewalk stage, the composition stage starts to compose them into a Representational Level. The composition stage works by maintaining an open spots queue that tracks where chunks have to be filled in. In this section, I will write down the state of the open spots queue as a list delimited with < and >, with the elements separated by commas. The open spots themselves are written as lists of criteria delimited with [ and ], also with the elements separated by commas. The open spots contain more information than just the combination criteria, but that is not shown here.

Initially, the open spots queue contains only the initial open spot as discussed in §5.1. The initial open spot is filled in by the only frame for propositional contents for this fragment of English. This frame, shown in Figure 5.7, has only one terminal combination criteria node. The initial open spot, once filled, is taken off the queue and a new open spot for the terminal combination criteria node is appended.

This one open spot will be filled by the frame for episodes as shown in Figure 5.8. This frame contains two combination criteria nodes, and open spots for these nodes are added to the queue. The open spots queue becomes <[critical], [critical]>. The first of these open spots can be filled by the chunk with the operator ‘pres’ shown in Figure 6. The criteria in the chunk’s root node, position: operator and upperlayer: ep, are fulfilled because the open spot stems from a terminal combination criteria node in the operator position of an episode layer. The remaining open spot is filled by the chunk with a states of affairs layer that was generated by the node function for Vpi. The open spots queue is then written as <[critical], [critical], [multi]>. The head of the queue is filled in by the chunk with the ‘sim’ operator.
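The bookkeeping of the open spots queue can be sketched as a small program. The thesis implementation is in Prolog; the Python sketch below is an illustrative re-creation, and all names in it (Chunk, compose, the criteria labels) are invented here. It also simplifies matters: it fills spots greedily, without the skipping of multi spots or the backtracking discussed below.

```python
from collections import deque

class Chunk:
    """Illustrative chunk: root-node criteria plus the criteria of its
    terminal combination criteria nodes (the chunk's own open spots)."""
    def __init__(self, criteria, spots):
        self.criteria = criteria
        self.spots = spots

def matches(spot, chunk):
    # Simplified matching: a chunk fits an open spot when the spot's
    # criteria are a subset of the chunk's root-node criteria.
    return spot.issubset(chunk.criteria)

def compose(initial_spot, chunks):
    queue = deque([initial_spot])          # the open spots queue
    unused = list(chunks)
    while queue:
        spot = queue.popleft()             # take the open spot at the head
        for chunk in unused:
            if matches(spot, chunk):
                unused.remove(chunk)
                queue.extend(chunk.spots)  # new open spots join the queue
                break
        else:
            return None                    # an open spot could not be filled
    return unused                          # success when nothing is left over

# A frame opens one critical spot; an "episode" chunk fills it and opens
# a multi spot, which a "modifier" chunk then fills.
initial = frozenset({"critical"})
episode = Chunk(frozenset({"critical", "upperlayer: pc"}), [frozenset({"multi"})])
modifier = Chunk(frozenset({"multi", "position: modifier"}), [])
print(compose(initial, [episode, modifier]))  # prints [] : every chunk used
```

The queue discipline is the essential point: filling one spot may append new spots, so composition terminates only when the queue has been worked off completely.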

The queue becomes <[lowerlayer: f, subtree: 2]>. The one remaining open spot must be filled in by a chunk that has an identifier node of a property layer under its root, and that does not have a criterion subtree: x in its root node for any x other than 2. There is only one chunk that meets these constraints: the chunk generated for the word “do”. When the open spot at the head of the queue is taken off, this chunk is filled in, and open spots for its four terminal combination criteria nodes are added to the queue. The open spots queue becomes <[critical, lowerlayer: x, function: Actor, morphosyntactic function: Subj], [critical, lowerlayer: x, function: Undergoer, morphosyntactic function: Obj], [multi, subtree: 1], [multi, subtree: 1]>.

And so the head of the queue becomes [critical, lowerlayer: x, function: Actor, morphosyntactic function: Subj]. This open spot can only be filled by the chunk for “you” shown in Figure 6.4. It is filled in for this open spot, and the two terminal combination criteria of the “you” chunk become open spots. The head of the queue can now only be filled in by the chunk with the individual layer for “something”, shown in Figure 6.6. It is filled in. This chunk does not contain any terminal combination criteria nodes.

At this point the open spots queue contains only open spots with the criterion multi and without the criterion critical. There is no critical criterion for such an open spot, so it does not have to be filled in; the composition algorithm first tries not to fill in anything in such an open spot, because the algorithm tries the smallest way to fill a multi open spot first. So the program will try to skip these open spots, filling in nothing in them. However, if it does so, some chunks remain unused. The unused chunks are the chunk with the ‘abil’ operator and the chunks generated for the constituent “with those ingredients”. Because there are unused chunks, the initial attempt to compose the chunks fails.

The Prolog programming language reacts to this failure by trying out alternative choices in a systematic way. The first choices that Prolog will undo are the ones that it made most recently. It tries to fill in things for some of the open spots without a critical criterion that had been left unfilled. However, none of these backtracking attempts will solve the composition of this utterance as long as it has not inserted the ‘abil’ operator into the property layer fi of the predication. At some point, Prolog has exhausted all possibilities for choices that it made later than filling in the operator of fi. At this point it has to redo the filling of the open spot with combination criteria [multi] for the operators of a property layer. The only chunk matching this is the chunk with the operator ‘abil’ shown in Figure 6.5; it is filled in as an operator of the fi layer. Now the arguments of “do” will be filled in again as described above, and the composition will again fail as described above, because the chunks for “with those ingredients” have still not been filled in.

and it skips the open spots for the two terminal combination criteria nodes in the chunk for the individual layer xk for “those ingredients”. Again, the two chunks for the operators ‘dist’ and ‘m’ have not been used, and Prolog will have to backtrack again.

This time it is the backtracking on the open spot for the modifiers of fi that will ultimately lead to success. This open spot has the criteria [multi] and stands for a property modifier. The only chunk that could be used to fill in this open spot is the chunk with the individual layer for “with those ingredients”, as shown in Figures 6.11 and 6.12. This time it has to redo the filling of the open spot for the operators of xk to be successful. First it will try both ways to fill in only one operator, but these will fail because the other chunk has not been used at the end. Only when Prolog tries to fill in both operator chunks shown in Figures 6.9 and 6.10 does composition succeed. At this point, the program fills in the remaining open spots as described above. The open spots queue becomes empty and no chunks are left.

When the open spots queue becomes empty again, the program goes to the coreference resolution stage. There are no coreference nodes in the tree to resolve, so this stage is a no-op in this case. The Representational Level shown in example (65) has been found.
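The search behaviour described here — try the smallest filling first, fail if any chunk is left unused, and backtrack until every chunk finds a place — is something the thesis gets for free from Prolog. A minimal Python re-creation of that behaviour, with invented data layout (each chunk is a pair of root criteria and sub-spot criteria), could look like this:

```python
def search(queue, unused):
    """Yield every filling of the open spots that leaves no chunk unused.
    Hypothetical Python version of what Prolog's backtracking explores."""
    if not queue:
        if not unused:
            yield []                         # success: empty queue, no leftovers
        return
    spot, rest = queue[0], queue[1:]
    if "critical" not in spot:
        # A multi spot may stay empty; try the smallest filling first.
        yield from search(rest, unused)
    for i, (criteria, subspots) in enumerate(unused):
        if spot <= criteria:                 # root node fulfils the spot
            remaining = unused[:i] + unused[i + 1:]
            for solution in search(rest + subspots, remaining):
                yield [criteria] + solution

# Two operator chunks compete for one multi spot; only the filling that
# uses both of them (dist opens a further spot that m fills) succeeds.
dist = (frozenset({"multi"}), [frozenset({"multi"})])
m = (frozenset({"multi"}), [])
solutions = list(search([frozenset({"multi"})], [dist, m]))
print(len(solutions))  # prints 1
```

As in the walkthrough, fillings that leave the ‘dist’ or ‘m’ chunk unused are generated first and rejected; only the complete filling survives.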

Chapter 7

Discussion and Conclusion

This final chapter has two purposes. The first is to discuss the results presented in the previous chapters and suggest future research. The other is to conclude the thesis with a summary of the work.

7.1 Discussion and Future Work

There are a number of properties of the program outlined in this thesis that deserve some discussion. I will treat these properties one by one in this section.

7.1.1 Limitations of the Node Function

The node function is applied to the entire subtree of every node in the input tree. This means that the node function is also applied to the entire morphosyntactic tree. One could thus craft a node function that computes the entire Representational Level of an utterance in one node function invocation. This is clearly not in the spirit of the approach described in this thesis: the node function for the highest node should leave most of the work to the node functions for the lower nodes and to the composition stage. However, a clear limit to what one node function invocation must do is lacking. Choosing whether to handle a certain linguistic phenomenon in a single node function invocation, in multiple node function invocations or during the composition stage is a matter of style. It could be useful to investigate whether rules can be formulated about what operations a node function invocation may do, and what it may not do.

7.1.2 Universal Combination Criteria

Perhaps it is possible to find a system of combination criteria for the composition stage that can be used for every natural language. The combination criteria with a key that is one of function, multi, critical or position have a special meaning to the program, and are thus assumed to be universal. It is an open question whether there are more combination criteria keys that are universal.
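The division of labour between the special keys can be made concrete in a small sketch: multi and critical steer the queue (may a spot stay empty? must it be filled?), while the remaining keys are matched against a chunk's root node. The concrete matching rules below are an illustrative assumption, not the thesis implementation:

```python
def criteria_match(spot, chunk_root):
    """Match an open spot's criteria against a chunk's root-node criteria.
    'multi' and 'critical' control the queue, so they are not compared."""
    for key, value in spot.items():
        if key in ("multi", "critical"):
            continue                  # these keys steer the queue, not matching
        if chunk_root.get(key) != value:
            return False              # e.g. a 'function' or 'position' mismatch
    return True

spot = {"critical": True, "function": "Actor",
        "morphosyntactic function": "Subj"}
you_chunk = {"function": "Actor", "morphosyntactic function": "Subj"}
abil_chunk = {"position": "operator", "upperlayer": "f"}
print(criteria_match(spot, you_chunk))   # prints True
print(criteria_match(spot, abil_chunk))  # prints False
```

A language-specific criteria system would add further keys; the question raised above is which of those, if any, could be shared across languages.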

7.1.3 Towards a Complete FDG-based Understanding System

This thesis has so far only described a computer program to compute a Representational Level from a Morphosyntactic Level. To build a complete FDG-based language understanding system, however, one will at least also need to compute the Interpersonal Level from the Representational and Morphosyntactic Levels, and the Morphosyntactic Level from a Phonological Level or from text.

7.1.3.1 Computing the Interpersonal Level

As already stated in section 4, the program described in this thesis can be used to compute an Interpersonal Level from the Morphosyntactic Level and the Representational Level. To do so, one would have to define a set of interpersonal frames, and one would define an interpersonal combination criteria matching function. That could be done by doing a treewalk that generates interpersonal chunks on both the morphosyntactic and representational trees. Then it is possible to combine the interpersonal chunks from the treewalks through the morphosyntactic and representational levels into an Interpersonal Level, using the interpersonal frames and the combination criteria matching function.

The treewalk procedure would have to be slightly adapted to walking through a Representational Level: it must no longer ignore restrictor nodes. That is because restrictor nodes may carry important information in a Representational Level, whereas they are meaningless at the Morphosyntactic Level. That way both morphosyntactic and representational constituents can contribute interpersonal information, which is in accordance with FDG theory.

7.1.3.2 Computing the Morphosyntactic Level

This thesis does not address how one gets a Morphosyntactic Level of an utterance in the first place. Neither does the Functional Discourse Grammar theory. One could try to parse a text into a Morphosyntactic Level using an existing computational grammar that was developed without regard to FDG. Alternatively, one could generate a Phonological Level based on a text or an acoustic signal, and compute the Morphosyntactic Level from the Phonological Level.

7.1.4 Efficiency

The efficiency of the program has not been a design consideration during the project that produced this thesis. There are probably many ways in which the program can be made to run faster. One could think of adding more combination criteria to chunks that are not strictly necessary for the end result to be correct, but that prevent the exploration of certain combinations of chunks.

7.1.5 Coreference Resolution

The approach outlined in this thesis does not specify a way to resolve anaphora. It would be interesting to try to come up with a way of resolving anaphora that fits both into the approach described here and into Functional Discourse Grammar as a whole.
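The adaptation of the treewalk mentioned above is small: the traversal itself stays postorder, and only the decision whether to descend into restrictor nodes changes. A sketch (tree encoding and names invented here for illustration):

```python
class Node:
    def __init__(self, label, children=(), restrictor=False):
        self.label = label
        self.children = list(children)
        self.restrictor = restrictor

def treewalk(node, node_function, visit_restrictors):
    # Postorder traversal: children first, then the node itself.
    chunks = []
    for child in node.children:
        if child.restrictor and not visit_restrictors:
            continue          # the morphosyntactic walk ignores restrictors
        chunks.extend(treewalk(child, node_function, visit_restrictors))
    return node_function(node, chunks)

collect = lambda node, chunks: chunks + [node.label]
tree = Node("f1", [Node("x1", restrictor=True), Node("x2")])
print(treewalk(tree, collect, visit_restrictors=False))  # ['x2', 'f1']
print(treewalk(tree, collect, visit_restrictors=True))   # ['x1', 'x2', 'f1']
```

The same traversal can thus serve both the morphosyntactic walk (restrictors skipped) and the representational walk (restrictors visited).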

7.1.6 Use High-Level Information

The high-level information stored in representational and interpersonal layers in the contextual component may be of help when interpreting new utterances. One could think of assigning likelihoods to different representational frames, based on the type of interpersonal act that the hearer expects. If such an approach would be feasible, it would be exploiting FDG theory to create better natural language processing systems.

7.2 Conclusion

In this thesis, I have presented a computer program that computes a semantic representation of an utterance in English from a morphosyntactic representation of the utterance. This program is based on the Functional Discourse Grammar theory of natural language. In Functional Discourse Grammar terms, it computes a Representational Level when given a Morphosyntactic Level. The work presented in this thesis is, to my knowledge, the first attempt to create a computer program that implements some aspects of Functional Discourse Grammar.

The program presented in this thesis regards the Morphosyntactic and Representational Levels as trees. It works in three stages. The first stage is the treewalk stage. During this stage, the program works through the Morphosyntactic Level tree with a postorder traversal. During this traversal, it generates chunks of the Representational Level that carry information on how they should be combined with other chunks. In a second stage, the composition stage, these chunks are composed according to the combination criteria the chunks carry. The result of this second stage is a Representational Level that still has dangling references for anaphora or implicit subjects. In a third stage, the coreference resolution stage, implicit subjects are resolved, so that dangling references that originate from implicit subjects are removed. The program does not yet feature a way to resolve dangling references that originate from anaphora.

Because the program is implemented in the Prolog programming language, it may backtrack when a choice it made turns out not to lead to a solution. That offers a way to handle ambiguities: when given an ambiguous utterance, the program will give multiple solutions by backtracking.
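The three stages summarized above compose into a single pipeline. The toy sketch below only shows that shape — the tree encoding, the chunk format and the stage bodies are all invented stand-ins, not the Prolog implementation:

```python
def treewalk_stage(tree):
    # Postorder traversal producing one (toy) chunk per node.
    label, children = tree
    chunks = [c for child in children for c in treewalk_stage(child)]
    return chunks + [label]

def composition_stage(chunks):
    # Stand-in for criteria-driven composition: nest the chunks.
    level = None
    for chunk in chunks:
        level = (chunk, level)
    return level

def coreference_resolution_stage(level):
    return level              # nothing dangling in this toy utterance

tree = ("Cl1", [("Np1", []), ("Vp1", [])])
chunks = treewalk_stage(tree)
print(chunks)  # prints ['Np1', 'Vp1', 'Cl1']
result = coreference_resolution_stage(composition_stage(chunks))
```

Each stage consumes exactly what the previous stage produces, which is what makes the three-stage design easy to extend with, for example, an interpersonal stage.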

Bibliography

[Blackburn et al., 2006] Blackburn, P., Bos, J., and Striegnitz, K. (2006). Learn Prolog Now!, volume 7 of Texts in Computing. College Publications. Free online version available at http://www.learnprolognow.org/.

[Dignum, 1989] Dignum, F. (1989). Functional Grammar and the Computer, chapter Parsing an English Text Using Functional Grammar, pages 111 – 134. Foris, Dordrecht, The Netherlands.

[Dik, 1989] Dik, S. C. (1989). The Theory of Functional Grammar. Part I: The Structure of the Clause. Functional Grammar Series. Foris, Dordrecht, The Netherlands.

[Günther and Lehmann, 1983] Günther, C. and Lehmann, H. (1983). Rules for Pronominalization. In Proceedings of the first conference on European chapter of the Association for Computational Linguistics, pages 308 – 318, Pisa, Italy.

[Hengeveld and Mackenzie, 2006] Hengeveld, K. and Mackenzie, J. L. (2006). Functional Discourse Grammar. In Keith Brown, editor, Encyclopedia of Language and Linguistics, volume 4, pages 668 – 676. Elsevier, Oxford, United Kingdom.

[Hengeveld and Mackenzie, 2008] Hengeveld, K. and Mackenzie, J. L. (2008). Functional Discourse Grammar: A Typologically-Based Theory of Language Structure. Oxford University Press, Oxford, United Kingdom.

[Keizer, 2008a] Keizer, E. (2008a). Twee vragen over Functional Discourse Grammar. Personal e-mail.

[Keizer, 2008b] Keizer, E. (2008b). Verb-preposition constructions in FDG. Lingua.

[Knuth, 1997] Knuth, D. (1997). The Art of Computer Programming: Fundamental Algorithms, chapter Trees. Addison-Wesley, Reading, Massachusetts, 2nd edition.

[Kroon, 1995] Kroon, C. (1995). Discourse Particles in Latin. Gieben, Amsterdam, The Netherlands.

[Montague, 1970] Montague, R. (1970). English as a Formal Language.

[Montague, 1973] Montague, R. (1973). The Proper Treatment of Quantification in Ordinary English.

[Moortgat, 2002] Moortgat, M. (2002). Categorial grammar and formal semantics.

[Smit and Van Staden, 2007] Smit, N. and Van Staden, M. (2007). Representational Layering in Functional Discourse Grammar. Alfa, 51(2):143 – 164.

[Smit et al., 2008] Smit, N., Mackenzie, J. L., and Hengeveld, K. (2008). Variabele in FDG-layers. Discussion on Functional Grammar mailing list: https://listserv.surfnet.nl/scripts/wa.cgi?A1=ind0811&L=fg.

Appendix A

Summary in Dutch

A.1 Introduction

Imagine: during your daily cooking activities in the kitchen, you have a friendly robot assistant at your disposal. He knows how to poach an egg. He can keep track of the time for you. If you do not feel like going to the supermarket, he tells you what you can still make with the leftovers at hand. And in all these things he adapts his behaviour to your mood. And best of all: he assists you in plain spoken Dutch.

In the context of the Dutch Companion Project (see http://www.decis.nl/content/view/50/41/), researchers are now actually trying to construct such a robot. Building the robot I just described requires many techniques from Artificial Intelligence. The robot must be able to recognize emotions. He must store and search knowledge. He must reason with that knowledge to arrive at an answer that is relevant to the user. And, not unimportantly: he must recognize and produce ordinary spoken language.

Processing the spoken language that the robot's sensors pick up takes many steps. The signal that comes out of a microphone must first be stripped of noise. Then the sound signal must be converted into a sequence of phonemes (speech sounds), and that sequence must in turn be converted into a sequence of words. The sequence of words must be cut up into separate utterances, those utterances must be parsed, and from those parses meaning must be derived. This thesis is concerned with that last step: determining the meaning of an utterance. It describes a technique that I hope is useful for processing the spoken language for this robot.

A.2 Functional Discourse Grammar

In converting the parse of an utterance into its meaning, the approach described here uses the linguistic theory

of Functional Discourse Grammar [Hengeveld and Mackenzie, 2008] [Hengeveld and Mackenzie, 2006]. Properties that distinguish Functional Discourse Grammar from other theories about the underlying structure of language are:

• Functional Discourse Grammar regards the discourse act as the basic unit of language, where many other linguistic theories take the sentence as their starting point. Functional Discourse Grammar defines a discourse act as "the smallest identifiable unit of communicative behaviour" ([Kroon, 1995], translation by the author). The motivation for the discourse act as the basic unit is that there are many comprehensible utterances that are nevertheless not sentences, such as "Vanavond niet!" ("Not tonight!").

• Functional Discourse Grammar aims to be a complete theory of language. So it wants to explain the phonology (sound), the syntax (form), the semantics (meaning) and the pragmatics (relation to the immediate surroundings) of language. That comes in handy for a project such as that of the robot described above, because you then need fewer different theories for the different steps of language processing.

• In the structures that, according to Functional Discourse Grammar, are built up when an utterance is produced, nothing may be moved or removed during the production of the utterance. This is above all a major difference with the theories of "transformational grammar", which owe their very name to such movements.

Functional Discourse Grammar states that there are four underlying structures for an utterance. They are called the Phonological Level, the Morphosyntactic Level, the Representational Level and the Interpersonal Level. The Phonological Level deals with the sound of the utterance. The Morphosyntactic Level describes the morphology and syntax (roughly, the parse) of the utterance. The Representational Level describes the meaning of the utterance, and the Interpersonal Level stands for the relation of the utterance to the participants in the dialogue and their surroundings. Determining the meaning on the basis of the parse, which is what I want to do, thus amounts, in Functional Discourse Grammar terms, to converting a Morphosyntactic Level into a Representational Level. Functional Discourse Grammar also makes it quite possible to use the knowledge that the robot has about its user and the conversation when correctly interpreting the user's utterances.

A.3 The Approach

All these levels have a hierarchical structure. That means that you can treat them with a computer as a tree structure. Because that information is structured in this way, you can easily inspect all of it in a systematic way, and say of each piece of the Morphosyntactic Level what that piece tells us about the Representational Level.

What such a piece tells us about the Representational Level can be represented as a collection of zero or more pieces of the Representational Level. These pieces are called the representational frames, a name that is borrowed from Functional Discourse Grammar. If you then systematically look, for each piece of the Morphosyntactic Level of an utterance, which pieces of the Representational Level there must be according to that piece of Morphosyntactic Level, you end up with a large collection of pieces of the Representational Level that have to be there. That is what the approach described in this thesis does. It is the first of three steps, and it is known as the treewalk stage.

When the treewalk stage is complete, the second part of the approach follows: the so-called composition stage. During the composition stage, the computer joins the pieces of the Representational Level that were made during the treewalk stage back together into one coherent Representational Level. For this, during the composition stage the computer checks whether pieces of Representational Level fit together, and if so, joins them. The computer tries all possible sensible ways of joining pieces to see whether there is a solution. Besides pieces of Representational Level from the treewalk stage, the computer can also insert pieces of Representational Level that it always has at its disposal during the composition stage. Such pieces it can also use more than once for the same utterance.

When a Representational Level has come out of the composition stage, a third step is sometimes still needed: the coreference resolution stage. This step is needed to determine what pronouns refer to. During this phase the computer also determines what the logical subject of infinitives is (who is the one doing the hitting in "Marijke wil Jan slaan" ("Marijke wants to hit Jan")?).

When the computer has done these three steps, it has made a complete Representational Level of the utterance of which it was given a Morphosyntactic Level. This Representational Level can further be used to determine an Interpersonal Level with a similar approach, to update the beliefs, desires and plans of the robot, and to understand the rest of the dialogue better.