
A mereology-based general linearization model for surface realization

Ciprian-Virgil Gerstenberger
Computational Linguistics
University of Saarland, Germany
gerstenb@coli.uni-sb.de

Abstract

In this paper, we propose a cross-linguistically motivated architecture for surface realization based on mereology, i.e., on the part-whole distinction. First, we present the main ideas that motivated the model. Then we present a general mereological description of the natural language utterance. The utterance is modeled in terms of embedded Linear Order Parts with two mutually exclusive relations holding between them: the Part-Of relation and the Linear Order relation. A General Linearization Model based on these concepts consists of a linearization module, an inflection module, and a text polishing module. The architecture we propose models surface realization phenomena in terms of constraints on grammatically valid configurations of utterance "parts". We illustrate linearization with our model by presenting walk-through examples, and compare our model with other approaches to linearization.

1 Introduction

In the era of multimedia, multilingual, web-based information systems, spoken dialogue systems play an important role in presenting search results in a compact, flexible way. This is a challenge for traditional Natural Language Generation (NLG) systems, which have to provide spoken systems with flexible, context-sensitive output.

Different NLG approaches propose language-independent methods for determining word order (e.g., (Nizar Habash et al., 2001), (Bohnet, 2004), (Gerdes and Kahane, 2001)). However, basic questions are left open, such as (1) what the primitive items for linearization are, (2) whether to linearize lemmata or already inflected words, (3) how to form complex linear order parts, and (4) which subsequent steps of linearization lead to grammatically and orthographically correct strings. In this paper, we propose a cross-linguistically motivated general linearization model that addresses these basic questions.

To emulate the flexibility of natural language with respect to surface realization, the properties that account for this flexibility have to be properly analysed. To this end, we propose a linguistically informed mereological utterance description (MUD). This description is the guideline for the general linearization model (GLM) we propose. We show that this linearization model offers a solid ground for a uniform treatment of various linguistic phenomena related to linearization, inflection, phonological assimilation, and orthography.

This paper is organized as follows. In section 2, we present four basic observations that we consider of major importance for language-independent surface realization. In section 3, we propose a mereological description of the natural language utterance as the foundation for a flexible, language-independent, general linearization model. Section 4 deals with three essential questions for a general sentence realization architecture: (1) What to linearize: symbols for words (lemmata) or word forms (inflected words)? (2) What are the primitive items for linearization? (3) How to form complex linear order parts? In section 5, we describe the general linearization model, and briefly describe the inflection and the text polishing submodules of the overall architecture. We exemplify linearization with GLM in section 6 and compare our model to similar approaches in section 7. We summarize our model in section 8.

2 Observations

We make the following observations relevant to language generation:

Observation 1 Speech is prior to writing both culturally and historically (see (Greenberg, 1968)).

Designing and developing linguistically motivated NLG systems involves a strict separation of phonological and orthographic knowledge, as well as a heavy use of phonological knowledge for a proper orthographic surface realization (but not vice versa).

Observation 2 Generation and analysis are two fundamentally different tasks.

The input for analysis is a single linearized string, and the fundamental problem of analysis is ambiguity: to interpret the input correctly, the analysis modules have to construct syntactic structure and insert empty elements (traces, empty topological fields, etc.) into that structure. The fundamental problem of generation is choice: the generator knows what to say, but there are many ways to say it. For linearization, there is no need for empty elements, nor for outputting analysed structure such as NPs and VPs. The result of surface realization is always a string.

Observation 3 The smallest linearizable entity in a language is the phoneme.

Various types of speech errors reveal that, when producing language, we do not necessarily linearize whole constituents, or even words. Phoneme shifts (e.g., mutlimodal), phoneme cluster shifts (e.g., flow snurries) or morpheme shifts (e.g., self-instruct destruction) illustrate this fact.

Observation 4 The most general relation between two entities α and β such that α is a substructure of β is the part-of relation.

What is the relation between a phoneme and a syllable containing it? And the relation between a syllable and a morpheme containing it? It is evident that a phoneme is part of the syllable containing it. This also holds for a word and a constituent containing that word, or for a constituent and a clause containing that constituent. Yet, what is regarded as a constituent depends on which constituency tests are used, and the use of traditional constituency tests is controversial (see (Phillips, 1998) or (Miller, 1992)). In contrast, there is no controversy about, and no need to test, the fact that the phoneme /n/ is part of the syllable /no/, if this syllable has been uttered. It should be noted that the part-of concept is an old philosophical concept and that regarding language entities as part-of structures is not a novelty either (see (Moravcsik, to appear)). An example of a linguistic theory that employs the concept of part-of for cross-linguistic analysis is Radical Construction Grammar. (Croft, 2001, p. 203) says that "[. . . ] the only syntactic structure in constructions is the part-whole relation between the construction and its elements".

3 Mereological utterance description

Taking into account observations 3 and 4, we present a mereological utterance description (MUD). The model we propose here extends the part-of relation to linguistic items smaller than words, constituents or constructions (in the sense of (Croft, 2001)). First, we define the unit of mereological description we propose and illustrate it with examples. Then, we present relations and properties of the mereological structures.

Definition 1 A Linear Order Part (LOP) is a language item which is phonologically realized as a contiguous part of a grammatically well-formed utterance.

According to the definition above, the following linguistic entities are LOPs: a phoneme (e.g., cluster), a phoneme cluster, not necessarily a syllable (e.g., cluster), a syllable, not necessarily a morpheme (e.g., cluster), a morpheme, not necessarily a free morpheme (e.g., incredible), a word (e.g., a book), parts of adjacent words (e.g., a big red book), or word groups (e.g., the rather boring book). Any contiguous part of a grammatically well-formed utterance can be a LOP. It can be either (1) motivated linguistically, as with constituents (noun phrases, adjectival phrases, verb phrases, or partial constituents), (non-empty) topological fields (as in the Topological Field Model (TFM) (Höhle, 1983)) that can consist of more than one constituent, embedded clauses, matrix clauses, whole sentences, whole paragraphs, etc., or (2) not motivated linguistically (e.g., the nice book that Angela read is written by Merkel). We restrict the use of MUD to LOPs that are linguistically motivated.¹

¹ It is obvious that MUD can cover all types of speech errors, but this is not part of our task.

The following two relations hold between LOPs:

Definition 2 [Part-Of relation] Let $\lambda_1$ and $\lambda_2$ be two LOPs: $\lambda_1 \sqsubseteq \lambda_2$ iff $\lambda_1$ is part of $\lambda_2$ (for two different LOPs, $\lambda_1$ is thus a proper part of $\lambda_2$). The PO-relation is reflexive, antisymmetric, and transitive.

Definition 3 [Linear Order relation] Let $\lambda_1$ and $\lambda_2$ be two different LOPs: $\lambda_1 \prec \lambda_2$ iff the occurrence of $\lambda_1$ precedes the occurrence of $\lambda_2$ in the utterance. The LO-relation is irreflexive, asymmetric, and transitive.

In addition, the LO-relation and the PO-relation are mutually exclusive, i.e., two different LOPs can be either PO-related or LO-related, but not both.

Definition 4 [Exclusivity] Let $\lambda_1$ and $\lambda_2$ be different LOPs. Then:
1. if $\lambda_1 \sqsubseteq \lambda_2$, then $\lambda_1 \not\prec \lambda_2$ and $\lambda_2 \not\prec \lambda_1$;
2. if $\lambda_1 \prec \lambda_2$, then $\lambda_1 \not\sqsubseteq \lambda_2$ and $\lambda_2 \not\sqsubseteq \lambda_1$.

To illustrate the three definitions above, let us consider the LOP [the book on the table]:

$\lambda_1 \prec \lambda_2$: ${}_{\lambda_3}[\,{}_{\lambda_1}[\text{the book on the}]_{\lambda_1}\;{}_{\lambda_2}[\text{table}]_{\lambda_2}\,]_{\lambda_3}$
$\lambda_1 \sqsubseteq \lambda_3$: ${}_{\lambda_3}[\,{}_{\lambda_1}[\text{the book on the}]_{\lambda_1}\;{}_{\lambda_2}[\text{table}]_{\lambda_2}\,]_{\lambda_3}$

[the book on the] cannot be part of [table], but it precedes it. In contrast, [the book on the] cannot precede [the book on the table], but it is definitely part of it.

An important property of the parts of an utterance is that they cannot properly overlap.

Definition 5 [Non-proper-overlapping] Let $\lambda_1$, $\lambda_2$, and $\lambda_3$ be different LOPs with $\lambda_2 \sqsubseteq \lambda_1$ and $\lambda_2 \sqsubseteq \lambda_3$. Then either $\lambda_1 \sqsubseteq \lambda_3$ or $\lambda_3 \sqsubseteq \lambda_1$.

To illustrate this property, let us consider the string [the red apple]. From a mereological perspective, one can analyse this string as ${}_{\lambda_1}[\text{the red}]_{\lambda_1}\;{}_{\lambda_2}[\text{apple}]_{\lambda_2}$ or as ${}_{\lambda_1}[\text{the}]_{\lambda_1}\;{}_{\lambda_2}[\text{red apple}]_{\lambda_2}$, but definitely not as ${}_{\lambda_1}[\text{the }{}_{\lambda_2}[\text{red}]_{\lambda_1}\text{ apple}]_{\lambda_2}$. In the last analysis, [red] would be part of both [the red] and [red apple], although neither of these is part of the other.

Taking into account observation 1, MUD naturally extends to written language.

4 From analysis to generation

There are various ways to chunk an utterance into parts, at various levels. But even if the partitions that are not linguistically motivated are excluded, some basic questions relevant to linearization arise. What is the most appropriate LOP level to work at in linearization? Should a general linearization module work at the phoneme/grapheme level?

4.1 What to linearize?

In order to answer the above questions, all cross-lingual phenomena that are relevant to linearization should be accounted for. It is impossible to deal with all these phenomena here, but we want to draw attention to this requirement.

4.1.1 Inflected or non-inflected items?

For the design of a general surface realization architecture, it is necessary to determine the modules, their precise distribution of tasks, and the overall workflow. For the linearization task, this means knowing whether to linearize lemmata, i.e., non-inflected words, or word forms, i.e., inflected words.

Different linguistic theories take different positions with respect to linearization. In some, syntax comes before inflectional morphology (e.g., Government and Binding, Minimalist Program); in others, syntax comes after inflectional morphology (e.g., LFG, HPSG).

To find an answer to this question, let us consider an example. In the Romanian this-NP, the position of the demonstrative can be either prenominal (acest om, 'this man') or postnominal (omul acesta, 'this man') (see (Mallison, 1986, p. 265), (Constantinescu-Dobridor, 2001, p. 123)). The Romanian this-NP is always definite, but it shows different marking patterns for definiteness, depending on the position of the demonstrative relative to the noun. In prenominal position, neither the demonstrative nor the noun is marked for definiteness (ex. 1), while in postnominal position, both the demonstrative and the noun are marked for definiteness² (ex. 5).

² Whether the demonstrative is really marked for definiteness is questionable, but it does have a different form, depending on its position relative to the noun.

To obtain only the two grammatically correct variants of the Romanian this-NP (ex. 1 and 5), both the morpho-syntactic specification and the relative position of demonstrative and noun are required. This fact definitely speaks for linearization before inflectional morphology. The conclusion is that theoretical frameworks such as HPSG or LFG might not be able to generate all grammatically correct variants of a Romanian this-NP without explicit coding of linearization-relevant information in other processing modules that are not supposed to handle linearization.

(1) acest om
    this man
(2) *acesta omul
    this-def man-def
(3) *acesta om
    this-def man
(4) *acest omul
    this man-def
(5) omul acesta
    man-def this-def
(6) *om acest
    man this
(7) *omul acest
    man-def this
(8) *om acesta
    man this-def

4.1.2 Lexical or sublexical items?

It is not always possible to tell whether an item is an affix, a clitic, or a word, as the vast literature on clitics reveals (see (Miller, 1992)). This is understandable, given that language is an ongoing process. As (Croft, 2001) puts it: "[l]anguage is fundamentally DYNAMIC . . . Synchronic language states are just snapshots of a dynamic process emerging originally from language use in conversational interaction."

To illustrate sublexical phenomena, let us consider the so-called floating affixes in Polish, a marker for person and number (PN-marker) in the past tense. Preverbally, it behaves like a clitic, attaching to various other words (ex. 10–11); postverbally, it behaves like a suffix, attaching only to the finite verb (ex. 9; for a detailed description, see (Kupść and Tseng, 2005), (Crysmann, 2006)).

To determine the granularity of primitive linearization entities, we propose the following test.

Linearization test Given two items α and β at the morpho-syntactic level in a specific language, if the language allows both $\alpha \prec \beta$ and $\beta \prec \alpha$, then these items are linearization primitives.

Let us illustrate the application of the linearization test to the following cases: the Polish PN-marker, the Romanian weak pronoun and the German separable verb particle.

(9) Nie widzieliśmy tego. [We didn't see this.]
    not see-pst-m-pl-1pl this
(10) Tegośmy nie widzieli.
    this-1pl not see-pst-m-pl
(11) Myśmy tego nie widzieli.
    we-1pl this not see-pst-m-pl
(12) Să îl faceţi! [Do it!]
    that it do-imp-pl
(13) Să-l faceţi!
    that it do-imp-pl
(14) Faceţi-l!
    do-imp-pl it
(15) Sie will das Fenster aufmachen.
    she wants the window off make
    She wants to open the window.
(16) Sie macht das Fenster auf.
    she makes the window off
    She opens the window.

A Polish PN-marker in the past tense can occur before (ex. 10–11) or after the verb (ex. 9). A Romanian weak pronoun can occur before (ex. 12–13) or after the verb (ex. 14). Finally, a German separable verb particle can occur before (ex. 15) or after the verb (ex. 16). All these items pass the linearization test.

Taking into account the phenomena described above, we propose that the set of primitive items for linearization should contain the following types of entities: (1) sublexical items that pass the linearization test at the morpho-syntactic level, such as Polish PN-markers, Romanian weak pronouns and German separable verb particles; (2) lexical items, provided that there is agreement among linguists about the definition of lexemes.

4.2 How to form complex entities?

In this section, we show how to form complex LOPs, assuming the primitive LOPs described in the previous section. Given the well-known phenomena of discontinuous constituents such as partial fronting and extraposition, it is obvious that forming complex LOPs does not necessarily correspond to forming traditional constituents.

If a language allows complex LOPs to occur in different positions, a general mechanism for forming complex LOPs should take the following two aspects into account:
1. whether two or more primitive LOPs always permute as a unit;
2. whether two or more primitive LOPs sometimes permute as a unit, and if so, under which circumstances.

We call the first the Total Permutation Constraint (TPC) and the second the Partial Permutation Constraint (PPC). If, in a specific language, two or more primitive LOPs never permute as a unit, no complex LOP can be formed of them.
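The linearization test of section 4.1.2 and the TPC/PPC distinction can be checked mechanically once a sample of grammatical orderings is available. The following Python sketch is only an illustration under our own assumptions and is not part of the model as stated. Each grammatical variant is a sequence of primitive items, and each item occurs at most once per variant. "Permuting as a unit" is read as "occupying a contiguous stretch of the variant". All function names are hypothetical. The data encode ex. 20, 21, 23 and 24 (the ungrammatical ex. 22 is left out), with the relative clause das schön ist treated as a single item RC.

```python
from typing import Iterable, Sequence

Ordering = Sequence[str]  # one grammatical variant: a sequence of primitive items


def precedes(a: str, b: str, ordering: Ordering) -> bool:
    """LO-relation inside one variant: a occurs before b."""
    return ordering.index(a) < ordering.index(b)


def passes_linearization_test(a: str, b: str, orderings: Iterable[Ordering]) -> bool:
    """True if the sample attests both a before b and b before a."""
    relevant = [o for o in orderings if a in o and b in o]
    return any(precedes(a, b, o) for o in relevant) and any(
        precedes(b, a, o) for o in relevant
    )


def is_contiguous(items: set, ordering: Ordering) -> bool:
    """True if the items occupy one uninterrupted stretch of the variant."""
    positions = sorted(ordering.index(i) for i in items)
    return positions[-1] - positions[0] == len(items) - 1


def permutation_constraint(items: set, orderings: Iterable[Ordering]) -> str:
    """Classify a group of primitive items as 'TPC', 'PPC' or 'none'."""
    flags = [is_contiguous(items, o) for o in orderings if items.issubset(o)]
    if flags and all(flags):
        return "TPC"   # always permute as a unit
    if any(flags):
        return "PPC"   # permute as a unit only in some variants
    return "none"      # never a unit: no complex LOP is formed


# Grammatical variants ex. 20, 21, 23, 24; RC stands for "das schön ist".
GERMAN = [
    ("Peter", "hat", "gestern", "ein", "Buch", "RC", "gekauft"),  # ex. 20
    ("Peter", "hat", "ein", "Buch", "RC", "gestern", "gekauft"),  # ex. 21
    ("Peter", "hat", "gestern", "ein", "Buch", "gekauft", "RC"),  # ex. 23
    ("Peter", "hat", "ein", "Buch", "gestern", "gekauft", "RC"),  # ex. 24
]

print(permutation_constraint({"ein", "Buch"}, GERMAN))          # TPC
print(permutation_constraint({"ein", "Buch", "RC"}, GERMAN))    # PPC
print(permutation_constraint({"Peter", "gekauft"}, GERMAN))     # none
print(passes_linearization_test("gestern", "Buch", GERMAN))     # True
print(passes_linearization_test("ein", "Buch", GERMAN))         # False
```

Under this reading, variants of das rote Buch ist schön also classify {das, rote, Buch} as TPC, although das and Buch are never adjacent. The check only covers the finite sample it is given. The linearization test itself quantifies over everything the language allows.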
As an illustration of the TPC, let us consider the German article + noun combination in ex. 17–19. In German, the article and the noun permute as a unit, regardless of where they occur in a grammatically correct utterance.

(17) Das Buch ist schön.
     the book is nice
     The book is nice.
(18) Schön ist das Buch.
     nice is the book
     The book is nice.
(19) Ist das Buch schön?
     is the book nice
     Is the book nice?

Now imagine an (admittedly strange) language in which the article of the direct object (if there is one), the subject (if there is one), and the temporal adverbial (if there is one) can permute freely, but always as a unit. This has to be modeled in exactly the same way as the German article + noun combination above, even though these items do not belong to the same constituent but are, in fact, parts of different constituents. Thus, while contiguous constituents always meet the TPC, the TPC applies to all kinds of primitive LOP combinations, not only to those that are semantically related. The strange language example above explicitly illustrates this extremely important issue.

Please note that complex linear order parts in our model are built solely on the basis of TPC and PPC, and not on traditional syntactic constituency. This is one of the crucial differences between the model we propose and approaches that, at first sight, are similar to it, such as (Bohnet, 2004) or (Gerdes and Kahane, 2001).

For a general linearization model, we want to stress that TPC/PPC and adjacency are not the same, and that adjacency alone is not an appropriate means of abstraction for a linearization constraint. TPC and PPC impose adjacency automatically if there are only two primitive LOPs to combine, but not in general. Consider, for instance, the German das rote Buch (the red book). These three primitive LOPs meet the TPC, since they always permute as a unit, but in this constellation the article is never adjacent to the noun. Moreover, simply putting two or more LOPs together does not say anything about their position relative to each other, as is the case with scrambling in the German middle field.

To illustrate the PPC, let us consider German extraposed and non-extraposed relative clauses such as Peter hat gestern ein Buch, das schön ist, gekauft (Yesterday, Peter bought a book that is nice).

(20) Peter hat gestern ein Buch, das schön ist, gekauft.
     Peter has yesterday a book that nice is bought
(21) Peter hat ein Buch, das schön ist, gestern gekauft.
     Peter has a book that nice is yesterday bought
(22) *Peter hat ein Buch gestern, das schön ist, gekauft.
     Peter has a book yesterday that nice is bought
(23) Peter hat gestern ein Buch gekauft, das schön ist.
     Peter has yesterday a book bought that nice is
(24) Peter hat ein Buch gestern gekauft, das schön ist.
     Peter has a book yesterday bought that nice is

As long as both ein Buch and das schön ist occur in the middle field (the stretch between hat and gekauft in the examples above), they have to form a complex LOP that can be scrambled as a whole (ex. 20–22). However, if the relative clause is extraposed, the adverb gestern (yesterday) can occur between the noun and the relative clause (ex. 23–24). In the same vein, we model linearization constraints stemming from different description levels: morpho-syntactic, syntactic and macro-structural. We also want to stress that our model handles discontiguous constituents and topological fields in Germanic languages, as well as any kind of partial constraint in other languages, in the same way, namely by means of PPC.

4.3 Where to linearize?

Different approaches to linearization take different positions on whether linearization is an absolute (1st, 2nd, 3rd, etc.) or a relative positioning process. Given that elements are optional in language, absolute positioning leads to the use of empty slots (see, for instance, the traditional TFM). However, this contradicts our aim to comply with observation 2. Therefore, we propose to use relative positioning, expressed in terms of before and after.

5 General Linearization Model

The General Linearization Model (GLM) we propose reflects the primitive structures of MUD, i.e., the primitive LOPs, as well as the relations between them, the PO-relation and the LO-relation, as described in section 3. The granularity of the input symbols is dictated by the considerations in section 4.1. The construction of complex linearization structures abides by the two constraints proposed in section 4.2, and the positioning of one linearization item relative to another follows the ideas in section 4.3.

In this section, we first present GLM and then sketch a possible surface realization architecture based on GLM.

5.1 Linearization entities

Definition 6 A Symbolic Linear Order Part (SLOP) is a symbolic representation of a language item which has to be phonologically (or graphically) realized as a contiguous part of a grammatically well-formed utterance (or sentence).

Primitive SLOPs are symbols for linearization-relevant items at the morpho-syntactic level, i.e., (1) content words, (2) function words and (3) sublexical items that pass the linearization test. Additionally, each SLOP contains its morpho-syntactic specification. For instance, for the German form machte the specification is [MACHEN, type=verb, vForm=fin, temp=imperf, pers=3, nr=sg]. The Romanian weak pronoun forms îl and -l require [WEAK-PRON, type=wPron, pers=3, nr=sg, gender=n, case=acc]. The specification for the German article form dem is [ART, type=def, nr=sg, gender=n, case=dat]. Finally, the specification for the Polish PN-marker form śmy is [PN-MARKER, pers=1, nr=pl].

5.2 Rules

Reflecting MUD, GLM features two types of rules: (1) PO-relating rules (mereological rules) and (2) LO-relating rules (linear rules).

5.2.1 Part-Of rules

Given the primitive SLOPs in an input structure, the PO-rules constrain these SLOPs according to the total and partial permutation constraints of the given language. For German, one such rule forms a complex SLOP from every subtree of the form [noun-det→art], constraining the permutability of a noun and its determiner (see ex. 17–19). A similar rule constrains a noun and its relative clause (see ex. 20–24).

5.2.2 Linear Order rules

The GLM features three types of LO-relating rules:
1. vertical: constraining the position of a node relative to its child nodes
   For every subtree of the form [noun-det→art], the SLOP for the article precedes the SLOP for the noun.
2. horizontal: constraining the position of a node relative to its sibling nodes
   For every subtree of the form [noun1-det→art] & [noun1-mod→adj], the SLOP for the article precedes the SLOP for the adjective.
3. diagonal: constraining the position of a node relative to nodes that are neither its siblings nor in an immediate dominance relation with it in the dependency tree
   These rules constrain the position of a node or a subtree that does not form a complex SLOP with its mother SLOP, as is the case for extraposed relative clauses as in ex. 23–24.

5.3 Input structure

The input structure for linearization is a dependency tree whose nodes are SLOPs, i.e., symbols for the individual LOPs to be realized, plus their morpho-syntactic descriptions. This means that the module responsible for building input structures has to know about primitive linearization items. As with other dependency models (see (Gerdes and Kahane, 2001), (Bohnet, 2004)), we assume that no explicit linearization information is stored with the tree.

5.4 Surface realization architecture

Based on GLM, we propose the following surface realization architecture:
1. a linearization module that takes as input dependency trees as described above and builds all valid variants according to the PO- and LO-rules of a given linearization grammar. The output of the linearization step is a set of projective SLOP-trees.
2. an inflection module that takes as input the output of linearization and obtains the appropriate surface forms of the individual SLOPs plus their phonological representation. It relies on the morpho-syntactic specification and, if needed, on adjacency information (see section 4.1). The output is the same set of projective SLOP-trees, enriched with morpho-phonological representations.
3. a text polishing module that takes the output of the previous step and, based on the surface form and the context of each SLOP, performs phonological assimilation, orthographic editing and punctuation. The output of this module is the final surface realization of the utterance.
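The three-module architecture above can be sketched as a small Python pipeline. This is a deliberately flat simplification under our own assumptions:
- SLOPs are plain lemma symbols rather than nodes of a dependency tree.
- PO-rules are given as groups that must stay contiguous.
- LO-rules are given as (before, after) pairs.
- The mini-lexicon `FORMS`, the set `VERB_FORMS` and all function names are hypothetical.

The sketch reproduces the Romanian cases discussed in section 6. These are position-dependent definiteness in the this-NP (ex. 1 and 5) and optional versus obligatory cliticization of the weak pronoun îl (ex. 12–14).

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Slop:
    """Hypothetical SLOP: lemma symbol plus morpho-syntactic specification."""
    lemma: str
    spec: dict = field(default_factory=dict)


def contiguous(group, lemmas):
    """True if the lemmas in group form one uninterrupted stretch."""
    pos = sorted(lemmas.index(g) for g in group)
    return pos[-1] - pos[0] == len(group) - 1


def linearize(slops, units=(), precedence=()):
    """Linearization module (flat sketch): all orders in which every PO unit
    is contiguous and every LO pair (before, after) holds."""
    variants = []
    for order in permutations(slops):
        lemmas = [s.lemma for s in order]
        if all(contiguous(u, lemmas) for u in units) and all(
            lemmas.index(a) < lemmas.index(b) for a, b in precedence
        ):
            variants.append(list(order))
    return variants


# Hypothetical mini-lexicon: (lemma, definiteness) -> Romanian word form.
FORMS = {
    ("ACEST", "minus"): "acest", ("ACEST", "plus"): "acesta",
    ("OM", "minus"): "om", ("OM", "plus"): "omul",
}


def inflect(order):
    """Inflection module: def=minus if the demonstrative precedes the noun
    (ex. 1), def=plus on both words if it follows the noun (ex. 5)."""
    lemmas = [s.lemma for s in order]
    value = "minus" if lemmas.index("ACEST") < lemmas.index("OM") else "plus"
    return [FORMS[(s.lemma, value)] for s in order]


VERB_FORMS = {"faceţi"}  # hypothetical: verb forms that host an enclitic pronoun


def polish(words, mark="."):
    """Text polishing module: weak-pronoun cliticization (optional after 'să',
    obligatory after a verb), capitalization, final punctuation."""
    variants = [[]]
    for i, w in enumerate(words):
        prev = words[i - 1] if i > 0 else None
        next_variants = []
        for v in variants:
            attached = v[:-1] + [v[-1] + "-l"] if v else None
            if w == "îl" and prev == "să":
                next_variants += [v + [w], attached]  # both spellings allowed
            elif w == "îl" and prev in VERB_FORMS:
                next_variants.append(attached)         # enclisis is obligatory
            else:
                next_variants.append(v + [w])
        variants = next_variants
    texts = [" ".join(v) for v in variants]
    return [t[0].upper() + t[1:] + mark for t in texts]


if __name__ == "__main__":
    this_np = [Slop("ACEST", {"type": "dem"}), Slop("OM", {"type": "noun"})]
    for order in linearize(this_np):          # no LO-rule between DEM and noun
        print(polish(inflect(order)))         # ['Acest om.'] then ['Omul acesta.']
    print(polish(["să", "îl", "faceţi"], "!"))  # ['Să îl faceţi!', 'Să-l faceţi!']
    print(polish(["faceţi", "îl"], "!"))        # ['Faceţi-l!']
```

Definiteness is left out of the linearization input and assigned only in the inflection step. This mirrors the argument of section 4.1.1 for linearization before inflectional morphology. The two spellings of the preverbal pronoun come from the polishing step, not from the linearizer.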

6 Examples

This section exemplifies the interaction of the components of the proposed architecture, using some phenomena described in the previous sections.

Figure 1: Forming complex SLOPs

Polish Let us consider the Polish PN-marker described in ex. 9–10. For the given input, the PO-rules form complex SLOPs as in figure 1. The LO-rules for Polish order simple and complex SLOPs according to grammar restrictions as in figure 2. After the inflection morphology step, the final string is processed as follows:
1. The first word form is capitalized.
2. The PN-marker is joined to the preceding word, unless this violates the phonological constraints³ imposed by the language (see (Kupść and Tseng, 2005)). In that case, the current realization variant is rejected.
3. Punctuation rules are applied.

³ The proposed architecture allows for a modular application of phonological rules that can model even phenomena such as yer vocalisation (see (Crysmann, 2006) for details). Given the constellation mógł+m, phonological rules at this level yield the correct verb form mogłem.

Figure 2: Final results

Romanian Given as input SLOPs for realizing a Romanian this-NP, a linearization module imposes no restriction on the positions of demonstrative and noun: both variants are possible (see ex. 1 and 5). Based on the morpho-syntactic specification and the relative position, the value for definiteness is assigned accordingly. For the constellation [DEM, def=minus] $\prec$ [OM, def=minus], the inflection morphology generates acest om, while for [OM, def=plus] $\prec$ [DEM, def=plus], it generates omul acesta.

Unlike (Minnen et al., 2000), the surface architecture we propose allows for different variants at the level of morphology. For Romanian weak pronouns (see ex. 12–14), the result of linearization and inflection is să îl faceţi and faceţi îl. In preverbal position, the weak pronoun may optionally be postcliticized to the subjunctive particle să, so the text polishing module generates both Să îl faceţi! and Să-l faceţi!. In postverbal position, postclitization to the verb is obligatory, so only the variant Faceţi-l! is generated.

7 Related work

One of the most important concepts of GLM is the use of mereological structure as "generalized constituents". Dissociating linear order from constituency is not new (see (Reape, 1994), (Pollard et al., 1994), (Goetz and Penn, 1997)), but most of these models are concerned mainly with analysis, and many questions regarding generation are left open. However, even models dedicated mostly to generation employ vague concepts of what the linearization primitives are, how exactly to form complex linearization units, or what to do after the linearization step to obtain the correct surface realization. An example of such a model is (Bohnet, 2004). It is very similar to our model in using dependency trees as input structures and in how linearization rules are expressed. However, there are basic differences with respect to both primitive and complex linearization items. While our model proposes mereology-based units that abide by TPC/PPC, (Bohnet, 2004)'s "[p]recedence units roughly represent constituent structures." Moreover, (Bohnet, 2004) uses only two kinds of LO-relating rules (vertical and horizontal), and may therefore fail to properly constrain extraposed relative clauses in German (see ex. 23–24). As for the realization steps after linearization, (Bohnet, 2004) appears not to be concerned with them at all.

8 Conclusions

In this paper, we presented the following three models: a mereological utterance description (MUD), a general linearization model (GLM) that draws upon MUD, and a surface realization architecture that draws upon GLM. The overall goal of this undertaking is a flexible, language-independent model for surface realization. We asked basic questions with respect to linearization, such as what the primitive linearization items are and how to form complex items. We tried to answer them by taking into account relevant cross-linguistic phenomena. We have shown the individual parts of GLM and how a surface realization architecture based on this model realizes some of the phenomena presented before.

In future work, we intend to implement the theoretical ideas presented in this paper, modeling various linguistic phenomena. Our research aims to identify, classify and implement in a modular manner different linearization constraints (grammar, information structure, phonology, etc.), and to associate prosody with each linearization variant. The final goal of the model is to enable an appropriate ranking of linearization variants with respect to the specific communicative situation.

References

Bernd Bohnet. 2004. A Graph Grammar Approach to Map between Dependency Trees and Topological Models. In Proceedings of the 1st International Joint Conference on Natural Language Processing, Sanya City, China.
Gheorghe Constantinescu-Dobridor. 2001. Gramatica limbii române. Editura Didactică şi Pedagogică, Bucureşti.
William Croft. 2001. Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford University Press, Oxford.
Berthold Crysmann. 2006. Floating affixes in Polish. In Proceedings of the HPSG Conference, Varna.
Kim Gerdes and Sylvain Kahane. 2001. Word Order in German: A Formal Dependency Grammar Using a Topological Hierarchy. In Proceedings of the Association for Computational Linguistics, pages 220–227, Toulouse.
Thilo Goetz and Gerald Penn. 1997. A proposed linear specification language. Technical Report SFB 340, Nr. 134, University of Tübingen.
Joseph Greenberg. 1968. Anthropological linguistics. Random House, New York.
Tilman Höhle. 1983. Topologische Felder. Ph.D. thesis, University of Cologne.
Anna Kupść and Jesse Tseng. 2005. A new HPSG approach to Polish auxiliary constructions. In Proceedings of the HPSG Conference, Lisbon.
Graham Mallison. 1986. Rumanian. Croom Helm, London.
Philip Miller. 1992. Clitics and Constituents in Phrase Structure Grammar. Garland Publishing, New York.
Guido Minnen, John Carroll, and Darren Pearce. 2000. Robust, applied morphological generation. In Proceedings of the International Conference on Natural Language Generation, Mitzpe Ramon, Israel.
Edith Moravcsik. to appear. Part-whole relations in syntax. In Hans Burkhardt, Johanna Seibt, and Guido Imaguire, editors, Handbook of mereology. Philosophia Verlag, München.
Nizar Habash, Bonnie Dorr, and David Traum. 2001. Efficient Language Independent Generation from Lexical Conceptual Structure. Technical Report LAMP-TR-074, CS-TR-4262, UMIACS-TR-2001-43, University of Maryland, College Park, September.
Colin Phillips. 1998. Linear order and constituency. In LSA annual meeting, New York City.
Carl Pollard, Robert Kasper, and Robert Levine. 1994. Studies in constituent ordering: Toward a theory of linearization in HPSG. Technical report, Stanford University.
Mike Reape. 1994. Domain union and word order variation in German. In German in HPSG, Stanford University.
