Latin Noun Inflection and Latin Prosody: A Finite State Implementation

BA-Thesis

Author: Bettina Demmer
Nauklerstraße 63
72074 Tübingen
bettina.demmer@gmx.de

Seminar: Finite State Methods in Computational Linguistics (SS 2006)
Instructor: Dr. Dale Gerdemann
International Studies in Computational Linguistics
Seminar für Sprachwissenschaft
Eberhard Karls Universität Tübingen

I hereby declare that I have written the submitted thesis independently and using only the sources and aids indicated, including the WWW and other electronic sources. All passages of the thesis that I have taken from other works, either verbatim or in substance, are marked as such.

Tübingen, 21 August 2006

Bettina Demmer


In principio erat verbum (Joh 1,1)


Table of Contents

Abstract
1 Introduction
2 Morphology in Computational Linguistics
  2.1 Definition of Morphology
  2.2 Computational Applications of Morphology
  2.3 What is Finite State Morphology?
  2.4 Existing Approaches to the Morphology of Latin
3 The Latin Noun
  3.1 Latin Alphabet
  3.2 Latin Noun Inflection
    3.2.1 Stem + Ending
    3.2.2 Case, Number, Gender
    3.2.3 Declension
  3.3 Latin Prosody and Stress Assignment
    3.3.1 Latin Syllabification
    3.3.2 Penultimate Law
4 Latin Finite State Implementation in xfst
  4.1 The Overall Structure of the Script
  4.2 Introduction to the xfst Syntax
  4.3 The xfst Script in More Detail
5 Bibliography
6 Appendix: xfst Script File

Abstract

This paper, submitted for the degree 'Bachelor of Arts', describes a finite state approach to the inflectional morphology and the prosody of classical Latin nouns. Using the Xerox finite state tools we developed an xfst script which describes step by step, in terms of several small transducers that are composed together, the construction of a classical Latin noun on the one hand and stress assignment to a classical Latin noun on the other. The advantage of finite state tools is that the approach is two-way: the xfst script can be used to form a declined Latin noun surface form with assigned stress from a given lexicon entry (generation) or to trace a given Latin noun in its surface form back to its lexicon entry (analysis). The paper also covers a general introduction to morphology, a definition of finite state morphology, a survey of existing computational approaches to Latin morphology and a linguistic description of Latin inflectional morphology and prosody.

1 Introduction

Finite state morphology is a computational description of natural language morphology. Morphology, as a classical branch of linguistics, deals with the formation of words out of smaller pieces, called morphemes (→ section 2.1). One tries to extract rules that describe the structural patterns of word formation. In finite state morphology (→ section 2.3) one is concerned with the morphologies of natural languages, but in a technical way, that is, with rules that can be spelled out in order to form a two-way program which is able to analyze surface word forms of a specific language and to generate word forms out of the lexicon according to specific features.

In this paper we will discuss a finite state implementation of the inflectional morphology and prosody of Latin nouns, which is used to describe the natural language morphology of Latin. It is a program that is able to generate an inflected Latin noun from the lexicon according to features specified by the user (case and number) and to assign stress to it. It is also able to trace an inflected noun, specified by the user, back to its lemma (dictionary entry) and the morphological features it contains.

In section two of this paper we will give a general introduction to morphology. What is morphology and how can it be described in terms of a finite state machine? As we deal with the highly inflecting language Latin in this paper, we will focus especially on the definition of

inflectional morphology (→ section 2.1). In section 2.2 we will give a survey of possible applications of morphology in computational linguistics. Further, the general ideas of finite state morphology will be discussed (→ section 2.3). This section introduces the basics of finite state theory. If you are familiar with this theory, you can skip section 2.3. At the end of section two (→ section 2.4), we will give an overview of the research done on Latin morphology and of existing approaches dealing with it. We will explain our motivation for choosing Latin as an example language to experiment on with finite state tools.

In section three of this paper we will give an overview of Latin noun morphology (→ section 3.2) and Latin prosody (→ section 3.3). We will argue that splitting Latin nouns into stem and ending is the most general and computationally efficient analysis, compared to the traditional way of splitting Latin nouns into root, theme vowel and suffix (→ section 3.2.1). The prosody of Latin will also be discussed in this chapter. The Latin stress rule, the 'Penultimate Law', will be explained (→ section 3.3.2), and we will argue that it is possible to assign stress to Latin nouns without knowing the exact syllable boundaries of the word.

In section four of this paper we will come to the finite state implementation, the application of the previously discussed theories of finite state morphology to the natural, though 'dead', language Latin. We will first argue for the 'Item-and-Process' theory, according to which we chose the basic structure of our xfst implementation (→ section 4.1). In section 4.2 we will give an introduction to the syntax of xfst. We will then explain step by step the rules of the finite state transducers which in the end form a complete 'construction plan' of a Latin noun (→ section 4.3).

2 Morphology in Computational Linguistics

2.1 Definition of Morphology

Morphology, from Greek morphe 'form', is the branch of linguistics that studies the 'forms of words'. It deals with the internal structure of words. The basic components of a word are one or more morphemes. There are three types of morphemes (Müller, 2002): the root morpheme, which carries lexical meaning and is the base of all morphologically related words of the

same family; the stem morpheme, which is a realization of a root morpheme and can be identical to the root morpheme; and the affix, a dependent component of a complex word, which cannot stand alone.

Morphology divides its study of word formation into three fields (Matthews, 1991). Derivation describes the formation of a new word out of an existing word with the help of derivational morphemes. This process usually involves a change of meaning or of the part of speech of a word. An example of this process is the English prefix 'un-', which turns an adjective into its negative counterpart. Composition describes the formation of new words out of two (or more) single words. The third field is called inflection, which is concerned with the grammatically motivated forms of words depending on the syntactic context they appear in. In inflecting languages words are usually constructed of a basic morpheme, the root of a word (which carries the lexical meaning of the word), and inflection morphemes, affixes (which carry some other information, e.g. plural marking, case marking etc.).

Traditionally, inflection is presented in paradigms, a two-dimensional form of representation which covers one morphosyntactic category on one axis and another morphosyntactic category (a category that is "directly referred to by specific rules in both morphology and syntax" (Matthews, 1991)) on the other axis. These categories can consist of sets of variables. A word (or actually a lemma, the morpheme carrying the meaning of the word) is then inflected according to the categories on the two axes. The branch of morphology that is concerned with inflection and paradigms is called 'Inflectional Morphology'. Latin, for example, which is described morphologically in section 3.2, is a highly inflecting language: the function of a noun in its context is expressed by a suffix according to its declension on the one axis and gender, case and number as a set of categories on the other axis.

An advantage of representing the inflection of a language in paradigms is that it is quite easy to find word forms that share the same spelling and phonetics but express different functions. This phenomenon is called 'syncretism' (Matthews, 1991). In Latin, the ablative plural form is always identical in spelling and phonetics to the dative plural form of the same noun. But the function of the noun in the context, either as dative plural or as ablative plural, is different.

2.2 Computational Applications of Morphology

This chapter gives an overview of the role of morphology in computational linguistics in general and of some possibilities of its application in natural language processing.

Morphology, as the study of the internal structure of words, forms the basis for almost all natural language applications in computational linguistics. Morphology is arguably the most important branch of linguistics and computational linguistics, as it forms the basis for all the other branches: syntax, semantics and phonology. In classical applications of computational linguistics, such as machine translation, information retrieval, parsing, part of speech disambiguation, data mining etc., determining the internal structure of a word correctly is necessary and contributes to the efficiency of a system. Knowing the grammatical function of a word in its context, which can mainly be determined by its morphological analysis, is necessary in order to determine its correct translation (e.g. machine translation). Tracing a word back to its lemma (the basic form under which the word appears in a dictionary), rather than analyzing its grammatical realization in the context, is helpful in order to summarize the information of a text (e.g. information retrieval).

2.3 What is Finite State Morphology?

Finite state morphology is a branch of computational linguistics which deals with morphology in a technical sense. In finite state morphology, a morphological description of a natural language is represented as a finite state automaton or as a finite state transducer (general term: finite state machine).

Kaplan and Kay (1994) is the most influential work in the field of finite state morphology. It was their idea to represent phonological rules as a cascade of transducers. Inspired by the idea of Kaplan and Kay, Koskenniemi (1983) implemented a finite state system which he calls 'a general computational model for word-form recognition and production'. Since his work, finite state methods have been used to describe the morphology and phonology of a wide range of natural languages.

Every finite state machine consists of one or more states, which are connected by arcs, with exactly one start state and any number of final states. Every arc has a label and a destination (one state of the network). Every finite state network includes a 'sigma', the symbol alphabet of the machine. These symbols represent the range of the language or relation that the network describes (Beesley and Karttunen, 2003). Small networks can be viewed graphically as transition diagrams. A finite state automaton describes a language, whereas a finite state transducer describes a relation between two languages. The language or the relation between two languages is described in terms of regular expressions (Roark and Sproat, 2006).

Finite State Automaton

Dealing with natural language, a finite state automaton is a network that accepts a regular language. A language in finite state terms is a set of words from an alphabet which contains a set of characters. A finite state language is called regular if it is constructed from a finite alphabet in combination with one of the following operations: set union, concatenation or transitive closure (Roark and Sproat, 2006).

There are some special languages which should be mentioned:
1) The empty language contains exactly one final state and accepts only the empty string.
2) The null language does not accept any string, not even the empty string, and consists of exactly one non-final state.
3) The universal language, which is denoted by Σ*, contains all strings that can be constructed out of the alphabet Σ, including the empty string ε.

A finite state automaton matches an input string against the labels of its arcs. If a final state is reached after this matching, the string is accepted and it is in the language of the automaton. Figure one shows a finite state automaton which describes the language ab*cdd*e.

Figure 1: A simple finite-state automaton accepting the language ab*cdd*e (Roark and Sproat, 2006).

Roark and Sproat (2006) give a technical summary of the definition of a finite state automaton: A finite-state automaton is a quintuple M = (K, s, F, Σ, δ) where:
1. K is a finite set of states,
2. s is a designated initial state,
3. F is a designated set of final states,
4. Σ is an alphabet of symbols, and
5. δ is a transition relation from K × (Σ ∪ ε) to K.

Finite State Transducer

A finite state transducer is a network that describes a regular relation. Figure two shows a finite state transducer that describes the regular relation (a:a)(b:b)*(c:g)(d:f)(d:f)*(e:e).
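Returning to the automaton of Figure 1, the quintuple definition can be made concrete with a small Python sketch. The state numbers 0 to 4 and the name `accepts` are assumptions made for illustration; only the accepted language ab*cdd*e is taken from the text, and ε-transitions are left out for simplicity.

```python
# Hypothetical encoding of the automaton in Figure 1 (language ab*cdd*e).
# The state numbering is an assumption; the text only fixes the language.
START = 0            # s: the designated initial state
FINALS = {4}         # F: the set of final states
# delta: transition relation, here a function (state, symbol) -> next state
DELTA = {
    (0, "a"): 1,
    (1, "b"): 1,     # loop for b*
    (1, "c"): 2,
    (2, "d"): 3,     # the first, obligatory d
    (3, "d"): 3,     # loop for d*
    (3, "e"): 4,
}


def accepts(word: str) -> bool:
    """Match the word against the arc labels; accept if a final state is reached."""
    state = START
    for symbol in word:
        key = (state, symbol)
        if key not in DELTA:
            return False          # no arc with this label: reject
        state = DELTA[key]
    return state in FINALS


if __name__ == "__main__":
    for w in ["acde", "abbcddde", "ace", "abcd"]:
        print(w, accepts(w))
```

The set of states K and the alphabet Σ are implicit here: they are exactly the states and symbols that occur as keys of `DELTA`.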

Figure 2: A simple finite-state automaton that computes the relation (a:a)(b:b)*(c:g)(d:f)(d:f)*(e:e) (Roark and Sproat, 2006).

Dealing with natural languages, a regular relation is almost always a mapping between pairs of strings. A finite state transducer matches a string against the upper symbols of the labels of its arcs and maps these to the lower symbols of its arcs. If a string is matched, i.e. a final state in the network is reached, the changed string is given as output.

Roark and Sproat (2006) give a technical summary of the definition of a finite state transducer: A (2-way) finite-state transducer is a quintuple M = (K, s, F, Σ × Σ, δ) where:
1. K is a finite set of states,
2. s is a designated initial state,
3. F is a designated set of final states,
4. Σ is an alphabet of symbols, and
5. δ is a transition relation from K × ((Σ ∪ ε) × (Σ ∪ ε)) to K.

Composition plays an important role in finite state transducers (Roark and Sproat, 2006). Transducers can be composed together. A composition of two transducers means first applying the first transducer and then applying the second transducer to the output of the first transducer. We used this operation very often in our finite state description of Latin noun morphology, where we factored our system into a set of operations that are executed one after the other using composition (→ section 4.3).

Another central feature of finite state transducers is inversion (Roark and Sproat, 2006). Inversion means that a system that is implemented as a finite state transducer, or as a set of transducers composed together, can be used in two directions. In morphology it can be used to map a string from the lexical level to the surface level (generation) following several rules, or it can be used the other way around, from the surface level to the lexical level (analysis). This feature constitutes the innovation of our morphological analysis of Latin nouns, as our program can be used to generate Latin nouns from the lexicon as well as to analyze Latin nouns from a given surface form (→ section 4.3).

Finite state methods can be used for speech and language processing including morphology and phonology, computational analysis of syntax, language modelling for speech recognition, pronunciation modelling etc. (Roark and Sproat, 2006).
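The relation of Figure 2, together with inversion and composition as described above, can be illustrated with a minimal Python sketch. Everything named here (`FST`, `apply_down`, `invert`, `compose_apply`, the state numbers) is hypothetical and is not the xfst implementation; ε-labels are omitted, and `compose_apply` simply runs two transducers one after the other instead of building a composed network.

```python
from typing import NamedTuple


class FST(NamedTuple):
    start: int
    finals: frozenset
    arcs: tuple  # each arc is (source, upper, lower, target)


# Assumed state numbering for (a:a)(b:b)*(c:g)(d:f)(d:f)*(e:e)
FIG2 = FST(
    start=0,
    finals=frozenset({4}),
    arcs=(
        (0, "a", "a", 1),
        (1, "b", "b", 1),
        (1, "c", "g", 2),
        (2, "d", "f", 3),
        (3, "d", "f", 3),
        (3, "e", "e", 4),
    ),
)


def apply_down(fst: FST, word: str) -> set:
    """Match `word` against the upper symbols and collect the lower strings."""
    configs = {(fst.start, "")}
    for symbol in word:
        configs = {
            (target, out + lower)
            for (state, out) in configs
            for (source, upper, lower, target) in fst.arcs
            if source == state and upper == symbol
        }
    # only paths that end in a final state produce output
    return {out for (state, out) in configs if state in fst.finals}


def invert(fst: FST) -> FST:
    """Swap upper and lower symbols, so the relation runs the other way."""
    return FST(fst.start, fst.finals,
               tuple((s, lo, up, t) for (s, up, lo, t) in fst.arcs))


def compose_apply(first: FST, second: FST, word: str) -> set:
    """Apply `first`, then apply `second` to each output of `first`."""
    return {final
            for mid in apply_down(first, word)
            for final in apply_down(second, mid)}


if __name__ == "__main__":
    print(apply_down(FIG2, "abbcdde"))
    print(apply_down(invert(FIG2), "abbgffe"))
```

Applying the inverted transducer to an output of the original one returns the input string, which is the two-way behaviour that generation and analysis rely on.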

See section 4. the platform xfst in specific. The final character 11 .1 for further discussion.4 Existing Approaches to the Morphology of Latin Latin is a very popular language for morphological analysis. theme vowel and suffix (with the fusion of the latter two) – in order to minimize the morphological condition contexts. Lindsay (1894). xfst includes a compiler which builds a finite state network out of the description of the transducers in the xfst script file. He argues that this way of splitting is a generalization of the traditional theme-vowel-plus-suffix analysis which in most cases differs just in the change of the theme vowel. which is described in this paper. Paradigms are two-dimensional constructions where one category is opposed to other categories. it is possible to predict the declension membership of this noun. In Matthews (1991). and its inflectional changes according to certain features and contexts. the 'Item-and-Arrangement' theory is not sufficient. in order to describe transducers which – composed together – form the construction plan of a Latin noun. he describes the 'Item-and-Arrangement' theory opposed to the 'Item-and-Process' theory. 2. In his book. For more information on the syntax of xfst see section 4.1 and 3. Latin is used as representative for other high inflecting Indo-European languages in order to show and explain paradigm structures. we oppose declension of a noun to its case and number. where morphemes are the basic units of meaning which are arranged linearly. he splits Latin nouns into stems and endings – opposed to the traditional analysis of Latin nouns into root. in our paper nouns. In Latin noun inflection. Our summary of Latin noun morphology in section 3. which forms the stem. By counting the theme vowel towards the root of the word. Paradigms (Greek parádeigma 'pattern') are the traditional way of presenting a word. Much research has been done on Latin inflectional morphology. 
Matthews (1972) describes the inflection of Latin verbs in his book in order to explain inflectional morphology in general.For our implementation we used Xerox finite state tools.2 is mostly taken from these books. Bender describes Latin noun inflection (found in his collection 'Essays on Morphology') a bit differently from traditional descriptions of Latin noun morphology.2. In his analysis. In the following section we will present an overview of literature or systems concerned with Latin morphology. Sommer (1914) and Sommer (1977) give classical analyses of Latin morphology without a reference to computational applications of these. Instead he argues for the 'Item-and-process' theory for inflecting languages where morphology is viewed as the construction of words out of base forms (stems or roots) modified by rules. He argues that for high inflecting languages as Latin.

of the stem is decisive for the membership of the noun in one of the six declensions. For further discussion see section 3.2.1. In our implementation we adopted Bender's analysis.

Convington (1999) adopts the same generalization about Latin noun inflection as Bender. In his paper he argues that leaving the theme vowel together with the root of the word, which forms the stem, and generalizing the remaining ending over the other declensions, contributes to the "economy of representation when the inflectional system is stored as a transition network […], a representation that is computationally efficient and may be psychologically realistic". In his paper he refers to Bubenheimer (1995), who has implemented a morphological analyzer based on transition networks.

Bozzi and Cappelli (1991) present 'A project for Latin Lexicography: 2. A Latin morphological analyzer'. In their article they describe a morphological analyzer which comprises a base dictionary, a table of suffixes, a table of endings and a table of postfixes.

During our research of the literature about Latin morphology, we encountered many morphological implementations of, or ideas about, Latin morphology. But most systems we found deal with Latin verb morphology rather than with Latin noun morphology.

Logos offers 'language solutions' on its homepage. There we found a 'universal conjugator', a system which also handles Latin verb inflection. It is possible to enter a Latin verb, and the output of the system is the complete conjugation of that verb.

McLean presents a Latin translator on his homepage. The program takes Latin inflected words as input and gives the English translation and a short analysis (including the case and number but not the declension) of the word. A disadvantage of the program is that it does not trace declined nouns back to their lemma.

The Perseus Project, 'a digital library for the humanities', offers morphological analysis for inflecting languages such as Latin and Greek. Using the tool 'Latin Morphological Analysis' the user can enter an inflected word (the system covers adjectives, nouns and verbs) and gets a table with all possible morphological analyses of the entered word, including the lemma of the word, its English translation, and its frequency in either the Latin Prose or Latin Poetry or Latin Texts corpus.

All these systems only provide analysis of Latin morphology. What is new in the approach described in this paper is the construction of a bidirectional system which on the one hand analyzes Latin noun morphology but on the other hand also generates Latin noun forms according to given features.

3 The Latin Noun

Latin belongs to the Indo-European language family. It constitutes the mother language of the Romance languages in the Indo-European language tree (Stowasser, 2004). Classical Latin, also called 'aurea Latinitas', is the name of a 100 year period in the first century BC in the development of Latin. In this time Latin developed towards a cultivated language of literature and education (Stowasser, 2004). Latin became the official language of the Roman Empire. It is the period in which Latin is most developed compared to other periods in the development of Latin. We always refer to classical Latin when we mention the language Latin. As classical Latin is the phase in the history of the language in which the most grammatical restrictions existed, we will concentrate in the following analysis on the grammar of Latin of this time.

A Latin noun is determined by case, number and gender. Further, Latin nouns are grouped into five different declensions, which are distinguished by the final character of the stem of a noun.

3.1 Latin Alphabet

The alphabet of Latin consists of 24 letters (Stock, 1970).

A B C D E F G H I K L M N O P Q R S T U V X Y Z
a b c d e f g h i k l m n o p q r s t u v x y z

Classical Latin has
• 6 vowels a e i o u (y), which can be either long or short (long vowels are marked with a '=' sign in the text: a= e= i= o= u= (y=))
• 4 diphthongs ae oe au eu, which are always long
• 17 consonants b p d t g c q k l r m n f v s z h x (Stock, 1970).

In written Latin there is no graphical distinction between long vowels and short vowels. Thus, many homographs can be found in Latin, i.e. words which share the same writing but have different meanings caused by differences in vowel quantity, e.g. iace=re (Engl. to lie) vs. iacere (Engl. to throw), pare=re (Engl. to obey) vs. parere (Engl. to give birth), cupi=do (Engl. desire) vs. cupido (Engl. someone who is eager (in the dative case)) (Stowasser, 2004). The information about vowel quantity has to be given in the lexicon.
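The '=' convention for long vowels and its consequence for homographs can be illustrated with a short Python sketch. The function names and the tiny `LEXICON` are hypothetical; the entries and glosses are the examples given above.

```python
# Hypothetical helpers for the 'vowel=' length notation used in this paper.
MACRONS = {"a": "ā", "e": "ē", "i": "ī", "o": "ō", "u": "ū", "y": "ȳ"}


def written_form(entry: str) -> str:
    """Drop the length marks: written Latin does not show vowel quantity."""
    return entry.replace("=", "")


def with_macrons(entry: str) -> str:
    """Render 'vowel=' as a vowel with a macron, e.g. 'iace=re' -> 'iacēre'."""
    out = []
    for ch in entry:
        if ch == "=" and out and out[-1] in MACRONS:
            out[-1] = MACRONS[out[-1]]   # lengthen the preceding vowel
        else:
            out.append(ch)
    return "".join(out)


# Lexicon entries carry vowel quantity; glosses are the ones cited above.
LEXICON = {
    "iacere": "to throw",
    "iace=re": "to lie",
    "parere": "to give birth",
    "pare=re": "to obey",
}


def readings(written: str) -> list:
    """All glosses whose lexicon entry collapses to the same written form."""
    return sorted(gloss for entry, gloss in LEXICON.items()
                  if written_form(entry) == written)


if __name__ == "__main__":
    print(with_macrons("iace=re"), readings("iacere"))
```

Because both entries collapse to the same written form, a lookup by written form alone cannot decide between them, which is why the quantity has to be stored in the lexicon.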

3.2 Latin Noun Inflection

3.2.1 Stem + Ending

Latin is an inflecting language, i.e. the grammatical function of a word form is expressed mainly by changes of the word-final ending (Stowasser, 2004). For example, the word form 'amica-m' (Engl. '(female) friend') can be analyzed in the following way:

amica ('stem'): The stem carries the lexical meaning of the word, in this case '(female) friend', and the information which declension the noun belongs to. In this case the final character of the stem is an 'a', so the noun belongs to the first declension (or a-declension).
m ('ending'): The ending carries the grammatical function of the word (case, number and gender), in this case 'accusative' + 'singular' + 'feminine'.

Stem

The stem of a noun can be found by cutting off the ending '-um' or '-rum' in the genitive plural form of a noun, e.g. flamma-rum, lupo-rum, die-rum, passu-um, turri-um, reg-um (Stock, 1970). This stem appears in front of all case endings except for the nominative singular ending (and except for the accusative singular ending in neuter nouns). In front of nominative singular endings (and accusative singular endings in neuter nouns), a stem change takes place (Sommer, 1914). The changed stem that is used in front of the nominative singular ending (and accusative singular ending), combined with its nominative singular ending, constitutes the base form, the lemma, that is found in the Latin dictionary. The final character of the stem is decisive for the declension of the noun (→ section 3.2.3).

Ending

The ending is added to this stem according to the declension, the case, the number and the gender of the noun. Most grammars split up Latin nouns differently: they count the final character of the stem towards the ending, so that every declension has its own ending for each case. This way of splitting makes the language easier to study. The learner first studies the different declensions that exist in Latin. Then she learns

all the endings according to the declension, i.e. the endings that contain the theme vowel. But in this paper we discuss a computational implementation of Latin noun morphology, so we prefer the highest generalization that can be made when examining Latin nouns. Splitting up Latin nouns into stem and ending allows a higher generalization of the endings. Further, the endings can be summarized even more: e.g. the ablative singular ending for masculine/feminine nouns is the same for all declensions except for the consonantal declension. This phenomenon is called 'syncretism'. This way of scaling down the endings and combining features works against studying a language, where it is easier to have an ending for every feature. This can be seen in more detail in the description of the xfst implementation (→ section 4.3).

All the endings can be summarized in only two tables (→ see table 1 and table 2). The proper ending for a noun is determined by the final character of the noun (→ declension) and its gender. In the following tables all Latin noun endings can be seen (taken from Bender). Table 1 covers all endings for masculine and feminine nouns, table 2 covers all endings for neuter nouns. As neuter nouns occur only in the second, fourth and third declension, the other columns are left empty.

Some special characters have to be explained:
A caret (^) preceding a vowel-initial suffix indicates that that vowel replaces the stem vowel, if any.
A tilde (~) preceding a consonantal suffix indicates that the preceding vowel is always short.
A colon (:) following a vowel indicates that the vowel turns into a long vowel.
A hash (#) stands for a zero suffix.

Table 1: masc/fem

                1st: a   2nd: o   5th: e   4th: u   3rd: i   3rd: C
nom sg          #        s        s        s        s        s
acc sg          ~m       ~m       ~m       ~m       ~m       em
dat sg          e        :        i:       i:       i:       i:
abl sg          :        :        :        :        :        e
gen sg          e        ^i:      i:       :s       ^is      ^is
nom pl          e        ^i:      :s       :s       ^e:s     ^e:s
acc pl          :s       :s       :s       :s       :s       e:s
dat pl/abl pl   ^i:s     ^i:s     bus      ^ibus    ^ibus    ^ibus
gen pl          :rum     :rum     :rum     um       um       um
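How the special characters of Table 1 combine a stem with an ending can be sketched in Python. This is one possible reading of the notation, not the xfst rules of section 4.3: the function names are hypothetical, long vowels are written with the '=' convention of section 3.1, and the stem change in the nominative singular (e.g. reg- + s giving 'rex') is deliberately not modelled.

```python
VOWELS = set("aeiouy")

# Two columns of Table 1 (masc/fem), copied from the table above.
FIRST_A = {"nom sg": "#", "acc sg": "~m", "dat sg": "e", "abl sg": ":",
           "gen sg": "e", "nom pl": "e", "acc pl": ":s",
           "dat pl/abl pl": "^i:s", "gen pl": ":rum"}
THIRD_C = {"nom sg": "s", "acc sg": "em", "dat sg": "i:", "abl sg": "e",
           "gen sg": "^is", "nom pl": "^e:s", "acc pl": "e:s",
           "dat pl/abl pl": "^ibus", "gen pl": "um"}


def apply_ending(stem: str, ending: str) -> str:
    """Attach an ending in Table 1 notation to a stem ('=' marks a long vowel)."""
    if ending == "#":                      # zero suffix
        return stem
    if ending.startswith("^"):             # suffix vowel replaces stem vowel, if any
        stem = stem.rstrip("=")
        if stem and stem[-1] in VOWELS:
            stem = stem[:-1]
        ending = ending[1:]
    elif ending.startswith("~"):           # preceding vowel is short
        stem = stem.rstrip("=")
        ending = ending[1:]
    elif ending.startswith(":"):           # stem vowel becomes long
        if not stem.endswith("="):
            stem += "="
        ending = ending[1:]
    # a colon inside the suffix lengthens the suffix vowel before it
    return stem + ending.replace(":", "=")


def syncretic_cells(stem: str, column: dict) -> dict:
    """Group paradigm cells that end up with the same surface form."""
    by_form = {}
    for cell, ending in column.items():
        by_form.setdefault(apply_ending(stem, ending), []).append(cell)
    return {form: cells for form, cells in by_form.items() if len(cells) > 1}


if __name__ == "__main__":
    for cell, ending in FIRST_A.items():
        print(cell, apply_ending("stella", ending))
    print(syncretic_cells("stella", FIRST_A))
```

With the stem 'stella' and the column '1st: a', the sketch yields 'stellae' for the dative singular, genitive singular and nominative plural, an instance of the syncretism discussed in section 2.1; the dative and ablative plural already share a single row in the table.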

Table 2: neut

                1st: a   2nd: o   5th: e   4th: u   3rd: i   3rd: C
nom sg                   ~m                :        #        #
acc sg                   ~m                :        #        #
dat sg                   :                 :        :        i:
abl sg                   :                 :        :        e
gen sg                   ^i:               :s       s        is
nom pl                   ^a                a        a        a
acc pl                   ^a                a        a        a
dat pl/abl pl            ^i:s              ^ibus    ^ibus    ^ibus
gen pl                   :rum              um       um       um

As an example we take the word 'stella' (Engl. star) and go through its declension. The stem of that noun, which is given in the lexicon, is 'stella-'. From the final character of the stem we can see that the noun belongs to the a-declension. The noun is feminine (which is given in the lexicon), which means that the endings are taken from table 1, 2nd column (with the title '1st: a'). Now all the endings from this column can be added to the stem according to case and number.

3.2.2 Case, Number, Gender

In Latin, six cases are distinguished: nominative (subject of the sentence), genitive (possession, attachment), dative (indirect object), accusative (direct object), ablative (means, object of prepositions of position) and vocative (personal address). The vocative case endings correspond to the nominative case endings in all declensions except for the vocative singular ending in o-declension nouns, where the ending is '-e' instead of the nominative singular ending '-us' (Stock, 1970). The number of a noun is either singular or plural. The gender of a Latin noun can be masculine, feminine or neuter. The gender information cannot be seen by looking at the noun, i.e. the information must be given in the dictionary.

3.2.3 Declension

Latin nouns are grouped into five different declensions (Stock, 1970). A noun's membership in a declension is decided by the last character of its stem. Thus, the noun rex (Engl. king) has the stem reg-, which ends in a consonant. Therefore 'rex' belongs to the third declension (consonantal stems). The first declension covers nouns with a-stems, the second declension

nouns with o-stems, the third (as already mentioned) covers nouns with consonantal stems, i-stems or mixed nouns (which are nouns that belong to the consonantal stem nouns in the singular and to the i-stem nouns in the plural). The fourth declension covers nouns with u-stems, the fifth declension nouns with e-stems. The declensions traditionally have different names in some grammars (e.g. Stock, 1970): they are called a-declension, o-declension, consonantal declension, i-declension, mixed declension, u-declension and e-declension, respectively.

3.3 Latin Prosody and Stress Assignment

As Latin is not phonetically realized anymore, Latin prosody as a science is based on the syllabification theories of Roman grammarians, on actual syllabification in inscriptions and on a theory of syllable boundaries with which linguistic phenomena can be explained (Sommer, 1914). In written Latin, stress is not visible. But from observations of how stress affects meaning or how stress occurs in verse, and from ancient notes, it is possible to reconstruct stress assignment in Latin words.

Until the 5th century BC, stress is always assigned to the first syllable of the word form, called initial stress (Allen, 1978). In this phase, stress appeared in Latin as expiratory stress, which is produced by a stronger air pressure during pronunciation of the stressed syllable. It distinguishes its accented syllables by giving them greater energy of articulation than the unaccented ones. Later in the 5th century BC, this type of stress changes to a musical stress in which the stressed syllable is pronounced on a higher pitch. In this second phase, stress is assigned according to the quantity of the penultimate syllable of the word form. After the two mentioned phases in the history of Latin stress a third phase follows, in which the stress changes again into an expiratory stress (Sommer, 1977). The stress remains in its old place (according to the Penultimate Law). This stress type lives on in the Romance languages (Stowasser, 2004).

3.3.1 Latin Syllabification

The basic principles of Latin syllabification make the syllable end with a vowel and begin with a consonant or a combination of consonants (Lindsay, 1894). The syllabification rule of Roman grammarians confirms that a set of consonants in a word is added to the following syllable unless it is not pronounceable. The syllable boundary in the latter case falls into the

1978). 1914. which always counts towards the next syllable. ending in a short vowel) (Stowasser. 1978. Lindsay. b. 1914. r) – muta cum liquida – the two consonants count to the initial sound of the following syllable. A syllable is heavy (i. If a word consists of one or two syllables stress is assigned to the first syllable. If a syllable ends in a short vowel followed by at least two consonants it is called 'closed' which turns it into a heavy syllable by position (Stock. however. As Latin syllabification appears vague in the literature. d. the syllable boundary lies before the last consonant) except for the 'muta cum liquida' sequence.e. 1894. 1977. 1978). inte-grum which triggers stress on the antepenultimate syllable. Lindsay 1894). the last syllable of a word is extrametrical. According to that law.e. no matter which quantity the syllable has (Allen. In inscriptions. If an enclitic occurs at the end of a word. 1994. ending a long vowel or diphthong) and on the antepenultimate syllable if the penultimate syllable is light (i. We have the information that every syllable contains exactly one vowel or diphthong. stress is pushed on the syllable before the enclitic.g. Kenstowicz. Sommer. g. 1970). With this information it is possible to apply the penultimate law to Latin nouns without knowing the exact syllable boundaries in a word. 3. 1978).e. 2004.2 Penultimate Law The penultimate law is a rule which describes the grammatical stress assignment on Latin words. Words which have stress on the third syllable have a second stress on the first syllable (Sommer. 1967). e. A syllable is naturally heavy if it ends in a long vowel or a diphthong (which are always long). The penultimate law thus applies to the several vowels as deputies of the syllables. Lindsay. p. c. consisting of long vowel) in Latin either by nature ('natural length') or by position ('position length'). Zirin. as the syllable is not closed anymore (Allen. 
Every syllable (or vowel) is long (= heavy) when it is followed by at least two consonants (position length) (Allen. Allen. If a word consists of at least three syllables the quantity of the penultimate syllable is decisive for the stress: Stress is on the penultimate syllable if it is heavy (i. we use a different analysis for stress assignment on Latin nouns. If a 'muta' (i.consonant group (Sommer.3. 1894). t) is followed by a 'liquida' (l. Compounds are split etymological.e. 18 . syllable boundaries are found at different places: The syllables are always split between consonants (if more then two consonants occur.
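The vowel-based reading of the penultimate law described in sections 3.3.1 and 3.3.2 can be illustrated with a small Python sketch. It is not part of the thesis script: the tokenizer, the function names and the stress mark placed before the stressed vowel are assumptions made for illustration, and the '=' notation for naturally long vowels is borrowed from the lexicon notation used later in the xfst script. The sketch ignores enclitics and secondary stress, and it treats every written 'ae', 'au', 'oe', 'eu' as a diphthong.

```python
import re

# Assumed notation: '=' after a vowel marks a naturally long vowel.
# All names below are hypothetical and only serve this illustration.
DIPHTHONGS = {"ae", "au", "oe", "eu"}
MUTA_CUM_LIQUIDA = {"bl", "br", "pl", "pr", "dl", "dr", "tl", "tr",
                    "gl", "gr", "cl", "cr"}
TOKEN = re.compile(r"ae|au|oe|eu|[aeiou]=?|[a-z]")


def is_nucleus(token):
    """A token is a syllable nucleus if it is a vowel or a diphthong."""
    return token[0] in "aeiou"


def is_heavy(tokens, i):
    """Heavy by nature (long vowel or diphthong) or by position."""
    if tokens[i].endswith("=") or tokens[i] in DIPHTHONGS:
        return True
    cluster = ""
    for token in tokens[i + 1:]:
        if is_nucleus(token):
            break
        cluster += token
    # At least two consonants, unless they form a muta cum liquida pair.
    return len(cluster) >= 2 and cluster not in MUTA_CUM_LIQUIDA


def mark_stress(word):
    """Put an apostrophe in front of the stressed vowel or diphthong."""
    tokens = TOKEN.findall(word)
    nuclei = [i for i, t in enumerate(tokens) if is_nucleus(t)]
    if not nuclei:
        return word
    if len(nuclei) <= 2:
        target = nuclei[0]            # one or two syllables: first syllable
    elif is_heavy(tokens, nuclei[-2]):
        target = nuclei[-2]           # heavy penult carries the stress
    else:
        target = nuclei[-3]           # light penult: antepenult is stressed
    tokens[target] = "'" + tokens[target]
    return "".join(tokens)


print(mark_stress("integrum"))    # muta cum liquida 'gr' leaves the penult light
print(mark_stress("magister"))    # 'st' makes the penult heavy by position
print(mark_stress("stella=rum"))  # long penult is heavy by nature
```

Because only the vowels and the consonants that follow them are inspected, no syllable boundary has to be computed, which is the point made in section 3.3.1.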

4 Latin Finite State Implementation in xfst

4.1 The Overall Structure of the Script

The overall structure of the Latin xfst-script that we implemented for this paper follows the traditional 'Item-and-Process' theory. There has been long discussion about two different morphological theories: On the one hand we have the theory which is called 'Item-and-Arrangement' (Hockett, 1954) where morphology is viewed as the construction of words out of morphemes, small lexical pieces. On the other hand we have the theory which is called 'Item-and-Process' (Hockett, 1954) – the theory that we use for our analysis of Latin nouns in this paper – a theory where morphology is viewed as the construction of words out of base forms (stems or roots) modified by rules. These different approaches are motivated by the properties of different languages. Roark and Sproat (2006) argue in their paper that the differences between these two approaches are not as significant from a more formal or computational point of view.

As Latin noun inflection is presented with paradigms listing the various inflected word forms according to their functions and with rules for deriving these forms, it is clear that an analysis of Latin – also analyses of other highly inflected Indo-European languages such as Classical Greek and Sanskrit – is best done in the framework of the 'Item-and-Process' theory. In more detail, the first reason to apply the 'Item-and-Process' and not the 'Item-and-Arrangement' theory to Latin noun morphology is that in Latin many morphosyntactic features are expressed by only one single suffix (Roark and Sproat, 2006). There are not many morphs out of which a noun is constructed which can be associated with corresponding morphemes. A second reason why 'Item-and-Arrangement' is inappropriate for Latin noun inflection is that there is not only one stem to which suffixes are attached, but sometimes two. In Latin, if a second stem occurs, it appears in all cases but in the nominative (and in neuter nouns also in the accusative case). Thirdly, the suffixes may change depending on their context, that is, depending on the class of the noun they are attached to. In Latin, for example, the dative/ablative plural ending is '–is' in the first and second declension, but '-(i)bus' in the other declensions even though they represent the same morphosyntactic feature. As a consequence of the three mentioned reasons, the 'Item-and-Process' theory fits better to the structure of Latin noun inflection, which can be better described in terms of rules that introduce suffixes according to some features than by assuming separate morphemes that encode the features (Roark and Sproat, 2006).

The 'Item-and-Process' and 'Item-and-Arrangement' theories are reformulated by Stump (2001), who describes four terms in his theory: lexical and inferential, incremental and realizational. Without explaining these four terms in this paper in too much detail, we just mention that his inferential-realizational theory, which he also calls Paradigm Function Morphology, corresponds best to the 'Item-and-Process' theory. With the term 'inferential' he describes theories in which "associations between the morphosyntactic properties of a word and its morphology are expressed by morphological rules which relate that word to its root" (Stump, 2001). The term 'realizational' refers to theory which says that "the association of a word with a particular set of morphosyntactic properties licenses the introduction of the inflectional exponents of those properties" (Stump, 2001).

4.2 Introduction to the xfst Syntax

In this section we will give an introduction to the syntax of xfst. Xfst is part of Xerox finite-state tools which "provides a regular-expression compiler and direct access to the XEROX FINITE-STATE CALCULUS, the algorithm for building and manipulating finite-state networks" (Beesley and Karttunen, 2003). Xfst is maintained and expanded by Lauri Karttunen. In the table, all signs that are used in the xfst script below are listed with their functions respectively (taken from Beesley and Karttunen, 2003).

0                     EPSILON symbol: empty-string language or corresponding identity relation
?                     ANY symbol: language of all single-symbol strings or corresponding identity relation
a                     single symbol: language that consists of the corresponding string or identity relation on that language
a:b                   pair of symbols: relation that consists of the corresponding ordered pair of strings; a = UPPER symbol, b = LOWER symbol
.#.                   boundary symbol; designates the beginning of a string in the left or the end of a string in the right context of a restriction
[]                    empty string language
(A)                   optionality
A+                    Kleene-plus: concatenation of A with itself one or more times
A*                    Kleene-star: union of A+ with the empty string language
~A                    complement of A
[A B]                 concatenation of the two languages or relations
{word} = [w o r d]    concatenation of the corresponding single-character symbols
$A                    'contains A'
[A | B]               union of the two languages or relations, DISJUNCTION
[A & B]               intersection of the two languages, CONJUNCTION
[A - B]               all the strings in A that are not members of B
[A .o. B]             composition of the relation A with the relation B
A.u                   upper language of the relation A
A.l                   lower language of the relation A
A.i                   inverse of the relation A
[A -> B || L _ R]     replacement of an original upper-side string A by a string from B if the indicated condition (L = left context, R = right context) is fulfilled
[..] -> A             epenthesis rule: mapping the empty string into non-empty string A
clear                 clear the stack (which saves networks)
define VAR            command to define a variable VAR
# text                start of a comment
source <filename>     e.g. source Latin.xfst: reads in the xfst script and builds a network out of it
read regex            a single regular expression can be read in with this command
apply up              in Latin.xfst: the user can enter a declined noun after this command and gets back the lemma of the noun and some features (analysis)
apply down            in Latin.xfst: the user can enter a Latin noun in nominative singular and some features after this command and gets back the declined noun (generation)
print lower-words     print all surface strings (i.e. declined nouns) that the network covers
print upper-words     print all lexical strings (i.e. lemma and features) that the network covers
print net             print information about the network

4.3 The xfst Script in More Detail

Stem + Ending

In order to understand the particular rules we begin with the definition of the lexicon which is later handled by replace rules. The lexicon with its features is the part of the program that is used as input to the generation part and as output from the analysis part of the program. The lexicon entries look like |1|:

|1|   [{stella} [noun & $fem]]
      [{genus} %# {gener} [noun & $neut]]

Each entry of the lexicon consists of the stem of the Latin noun and information on its part of speech (just in case the script is extended to handle other parts of speech) and on its gender. How a stem of a Latin noun can be found is described in section 3.2.1. A variant stem is also mentioned in the lexicon, as it cannot be reconstructed automatically from the other stem. In the example above, 'stella' does not have a variant stem, but 'gener' does: it has a variant stem in the nominative singular and accusative singular form. That is why its nominative singular form is given in front of the 'traditional' stem (separated by a hash sign for further processing reasons).

To make the program more user-friendly, the user does not have to know the stem of a word by heart but can use the program with the better known nominative singular form. That means we implemented a transducer that automatically turns the stem of a noun into its nominative singular form, which is the lexical entry of a noun in a Latin dictionary (→ |2|). If a noun has a variant stem (which is already given as nominative singular form) this form is used. Thus, either the nominative singular form has to be newly constructed ('Case2') or the already given nominative singular form is used ('Case1') (→ |3|). In the end, the inversion of the StemToDict transducer is composed with the Morphology transducer, which is described later in this section.

|2|   define StemToNomSg [?* []:[noun & $nom & $sg]] .o. Lexicon .o. Suffixes;

|3|   define Case1 [ $%# .o.
                     [..] -> %# || \LexFeatures _ LexFeatures .o.
                     %# ?* %# -> 0 .o.
                     ~$%# ];
      define Case2 [ ~$%# .o. [StemToNomSg LexFeatures*] ];
      define StemToDict [ Case1 | Case2 ];
      define Dictionary [ [StemToDict].i .o. Morphology ];

Now we come to the actual program, the part which works only on the lexicon itself. Firstly, the 'noun' feature in the lexicon is replaced by the four features that it describes: the part of speech tag 'Noun', gender 'Gend' (which is already replaced by the gender that is given in the entry because of the 'contains'-sign $), case 'Case' and number 'Num' (→ |4|).

|4|   define noun Noun Gend Case Num;

In the next step, these features are extended by two more features, namely 'Ending', which is a placeholder for the possible endings that can be attached to the stem of the word, and 'DeclTag' which is a placeholder for the class (declension) the noun belongs to (→ |5|).

|5|   define Features [ [..] -> [Ending DeclTag] || _ noun ];

After this step, the lexicon entry looks like the following:

      {stella} Ending DeclTag Noun Gend Case Num

This is now the starting position for the other transducers of the program. All these tags are necessary to determine the surface word form of a Latin noun (in the generation phase; all transducers can also be used the other way around, for analysis). These are 'helper' tags as they are neither important nor harmful for the user. In the following part of the program, each of these 'tags' (Ending, DeclTag, Noun, Gend, Case, Num) is now replaced by its possible realizations (conditioned by the realization of the other tags).

|6|   define Declension [
         DeclTag -> adecl || [a|a=] Ending _ .o.
         DeclTag -> odecl || [o|o=] Ending _ .o.
         DeclTag -> edecl || [e|e=] Ending _ .o.
         DeclTag -> udecl || [u|u=] Ending _ .o.
         DeclTag -> idecl || [i|i=] Ending _ .o.
         DeclTag -> cdecl || C Ending _ ];

|6| describes the rewriting of the declension tag. There are six different noun declensions in Latin and the final character of the stem is decisive for the membership of a word in a specific declension. The declension tag can be rewritten by automatic recognition of the condition context. This can be seen in the condition part of the rewrite rule (after the ||). The user does not have to specify the class of the noun himself. In our example from the lexicon (the entry 'stella'), DeclTag would be rewritten to 'adecl', as the word stem ends in an 'a'. The gender tag would be already rewritten (from the information in the lexicon) to 'feminine'.

|7|   define StemChange [
         [C|V]+ -> 0 || .#. _ %# [C|V]+ Ending cdecl Noun masc nom pl .o.
         . . .
         [C|V]+ -> 0 || %# _ Ending cdecl Noun masc nom sg .o.
         . . .
         o -> u || _ Ending odecl Noun masc acc sg ];

In |7| the definition of the possible stem change can be seen. In the first part of the rewrite rule all cases are listed in which the 'standard' stem (→ section 3.2.1) is used. These are all cases except for the nominative singular case for masculine or feminine nouns and all cases except for the nominative and accusative singular case for neuter nouns. In these cases the variant stem in front of the hash sign is deleted. In the second part of the rule all nominative (and for neuter nouns accusative) singular forms are listed in the condition part of the rewrite rule. For these cases the variant stem is used and the 'standard' stem (following the hash sign) is deleted. At the end of the StemChange definition, one exceptional case is listed: In masculine o-declension nouns the o of the stem changes into a u in the accusative singular. 'stella' does not have a variant stem. Thus, |7| doesn't affect our example from the lexicon.

|8|   define Endings [
         Ending -> 0 || _ adecl Noun fem nom sg .o.
         . . .
         Ending -> {~m} || _ adecl Noun fem acc sg .o.
         . . .
         Ending -> {:rum} || _ adecl Noun fem gen pl .o.
         Ending -> {^i:s} || _ adecl Noun fem dat pl ]

In the Endings section (→ |8| is just an extract) the Ending tag is rewritten to the respective 'real' ending with the gender, case and number as conditions for that. We first listed all possible endings for all possible conditions and realized later that there is much conformity between the abstract paradigms (which is called syncretism in morphology) that we describe with the different conditions. It is possible to summarize this conformity with another rule, which states that two conditions trigger the same rewriting of the Ending tag. Our example from the lexicon gets its endings in the Ending section according to the remaining information case and number. It can be seen again that the Latin noun is determined in its ending by its declension, its gender, its case and number.

In the Ending section three special characters can be observed which have to be removed later from the surface string. The explanation of these special characters can be reviewed in section 3.2.1. These rules describe phonological replacements: In |9| vowels preceding the caret sign (^) are deleted, in |10| vowels preceding the colon (:) turn into long vowels and in |11| vowels in front of the tilde (~) turn into short vowels (→ recall the special characters in section 3.2.1):

|9|   define Voweldeletion
|10|  define Long
|11|  define Short

Throughout rules |9| to |11| no stem or ending changing replacements are undertaken. Rule |12| and |13| describe stylistic replacements: In |12| all the special characters which were used to trigger phonological changes in rule |9| to |11| are now deleted and in |13| all features are deleted in order to leave only the word in its surface form.

|12|  define RemoveSpecialCharacters
|13|  define RemoveFeatures

Finally, in |14|, all the rules (which are small transducers respectively) that we just described are composed and combined to a bigger transducer which we call 'Suffixes'. Composition means operation on two relations with a new relation as a result. In this case we compose several relations, which means that the lower language of one transducer is the upper language for the next transducer. This is done throughout all the transducers until we get in the end the result of the composition of all the relations. Thus, in the section 'Suffixes' every just described smaller transducer is executed one after the other.

|14|  define Suffixes [ Features .o. Declension .o. StemChange .o. RemoveHashSign .o.
                        Endings .o. Voweldeletion .o. Long .o. Short .o.
                        RemoveSpecialCharacters .o. RemoveFeatures ];

First we define the morphology of Latin nouns, which is the composition of the Lexicon – with all its stem entries and some additional information – with the Suffixes (→ |14|). Then we apply the inversion of the StemToDict function to the 'Morphology' transducer in order to get the nominative singular forms instead of the stems. This 'Dictionary' transducer is then composed with 'Spaces' which is a stylistic transducer to add spaces between all the lexical features in the lexical part (upper language) just for readability reasons. This is the final definition that is read in, in order to draw the finite state network which can be used by the user. To read in the Lexicon and to actually draw the network we have to give the command 'read regex' (→ |15|).

|15|  define Morphology [ Lexicon .o. Suffixes ];
      define Dictionary [ [StemToDict].i .o. Morphology ];
      read regex Spaces .o. Dictionary;

Stress Assignment

After completion of the 'stem-and-ending' transducer we go further to the stress assignment on Latin nouns. As already mentioned in section 3.3.1 we argue that it is possible to assign stress to a Latin noun without knowing the exact syllable boundaries simply with the facts that every syllable contains exactly one vowel or diphthong and the information that every vowel is long by position when it is followed by at least two consonants (except for the situation that it is followed by a 'muta cum liquida' sequence, which counts towards the beginning of the following syllable rather than being split, and which does not trigger a long vowel by position). Thus, it is possible with these two facts to formulate rules for the stress assignment:
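The interplay of the declension tag, the endings and the three special characters can be traced by hand with a short Python sketch. It is only an illustration under stated assumptions, not the thesis's xfst code: the function names and the dictionary layout are hypothetical, only a few feminine a-declension endings are copied from the Endings rules, and the regular expressions are one possible reading of rules |9| to |12|.

```python
import re

# A handful of feminine a-declension endings taken from the Endings rules;
# the dictionary layout itself is an assumption of this sketch.
A_DECL_FEM = {
    ("nom", "sg"): "",
    ("gen", "sg"): "e",
    ("acc", "sg"): "~m",
    ("abl", "sg"): ":",
    ("gen", "pl"): ":rum",
    ("dat", "pl"): "^i:s",
}


def declension(stem):
    """Pick the declension tag from the final character of the stem."""
    last = stem[-2] if stem.endswith("=") else stem[-1]
    tags = {"a": "adecl", "o": "odecl", "e": "edecl",
            "u": "udecl", "i": "idecl"}
    return tags.get(last, "cdecl")   # anything else counts as consonantal


def realize(raw):
    """Apply the special characters, then strip what is left of them."""
    raw = re.sub(r"[aeiou]=?\^", "", raw)        # ^ deletes the preceding vowel
    raw = re.sub(r"([aeiou])=?:", r"\1=", raw)   # : lengthens the preceding vowel
    raw = re.sub(r"([aeiou])=?~", r"\1", raw)    # ~ shortens the preceding vowel
    return re.sub(r"[\^:~]", "", raw)            # remove leftover special characters


def generate(stem, case, number):
    """Attach an a-declension ending to the stem and realize the result."""
    return realize(stem + A_DECL_FEM[(case, number)])


print(declension("stella"), generate("stella", "dat", "pl"))
print(declension("reg"), realize("reg" + "^is"))
```

Each step consumes the output of the previous one, which mirrors in a very reduced form how the composed 'Suffixes' transducer passes the lower language of one rule to the next.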

Initially, in |16| all vowels are replaced by long vowels where applicable, namely only in the context where the vowel is followed by at least two consonants. All the other vowels which are naturally long are marked in the lexicon.

|16|  define LongVowel [
         a -> a= || _ [C C+] - MutaCumLiquida .o.
         e -> e= || _ [C C+] - MutaCumLiquida .o.
         . . .
         ];

In the next steps (→ |17| to |20|) the nouns are divided into three classes according to the number of syllables (or vowels/diphthongs) they consist of: one syllable (→ |17|), two syllables (→ |18|) or three or more syllables (→ |19| and |20|). If a word consists of only one syllable the stress lies on the single vowel or diphthong, which can be either long or short:

|17|  define OneSyllable [
         a -> 'a || .#. C* _ C* .#. .o.
         a= -> 'a= || .#. C* _ C* .#. .o.
         . . .
         {ae} -> '{ae} || .#. C* _ C* .#.
         . . .
         ];

If a word consists of two syllables (i.e. two vowels/diphthongs), stress is assigned to the first vowel or diphthong independent of the quantity of the vowel.

|18|  define TwoSyllables [
         a -> 'a || .#. C* _ C* [V|D] C* .#. .o.
         a= -> 'a= || .#. C* _ C* [V|D] C* .#. .o.
         . . .
         {ae} -> '{ae} || .#. C* _ C* [V|D] C* .#.
         . . .
         ];

If the word consists of three or more syllables, the penultimate law (→ section 3.3.2) comes into play. That means if the second last vowel is a long vowel or a diphthong, stress is assigned to it:

|19|  define ThreeOrMoreSyllablesPenult [
         a= -> 'a= || _ C* [V|D] C* .#. .o.
         . . .
         {ae} -> '{ae} || _ C* [V|D] C* .#.
         . . .
         ];

If on the other hand the second last syllable contains a short vowel, stress is assigned to the vowel or diphthong preceding that short vowel, independent of the quantity of the vowel:

|20|  define ThreeOrMoreSyllablesAntepenult [
         a -> 'a || _ C* VShort C* [V|D] C* .#. .o.
         a= -> 'a= || _ C* VShort C* [V|D] C* .#. .o.
         . . .
         {ae} -> '{ae} || _ C* VShort C* [V|D] C* .#.
         . . .
         ];

As in |14| all the five transducers are composed together to build the 'StressAssignment' transducer (→ |21|):

|21|  define StressAssignment [ LongVowel .o. OneSyllable .o. TwoSyllables .o.
                                ThreeOrMoreSyllablesPenult .o.
                                ThreeOrMoreSyllablesAntepenult ];

'Prosody' then is the composition of 'Spaces', the dictionary and the just mentioned 'StressAssignment' transducer:

|22|  define Prosody [ Spaces .o. Dictionary .o. StressAssignment ];

|23| is the command to read in the transducer to build the actual finite state network.

|23|  read regex Prosody;

One Final Remark

The implementation of the Latin noun inflection is a program that can be used in two ways: for generation and analysis of declined Latin nouns. The implementation of Latin stress assignment, on the other hand, is a program that is just interesting for use in one direction, namely in the generation phase. If we composed 'Prosody' with 'Dictionary' generally, the user would have to specify main stress in the word he enters for analysis. That would mean that a user who does not know the stress, perhaps because he does not know the main stress rule, cannot use the program for analysis, which is impractical. Therefore, it is useful to keep the two finite state networks separate. If the user wishes to have information on stress in Latin nouns in the generation phase he activates the second network 'Prosody'. Otherwise just the network 'Dictionary' is used.
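The reason for keeping the two networks apart can be made concrete with a toy model in Python. This is a sketch under assumptions, not the thesis's implementation: a relation is modelled as a plain dictionary from lexical strings to surface strings, the names GENERATION, STRESSED, invert and apply_up are hypothetical, and the stress-marked forms are written by hand.

```python
from collections import defaultdict

# Toy relation: lexical string -> surface string (generation direction).
GENERATION = {
    "stella Noun fem nom sg": "stella",
    "stella Noun fem gen sg": "stellae",
    "stella Noun fem dat sg": "stellae",
    "stella Noun fem nom pl": "stellae",
    "stella Noun fem gen pl": "stella=rum",
}

# The same relation with hand-written stress marks on the surface side.
STRESSED = {
    "stella Noun fem nom sg": "st'ella",
    "stella Noun fem gen sg": "st'ellae",
    "stella Noun fem dat sg": "st'ellae",
    "stella Noun fem nom pl": "st'ellae",
    "stella Noun fem gen pl": "stell'a=rum",
}


def invert(relation):
    """Swap the two sides; one surface form may have several analyses."""
    inverse = defaultdict(list)
    for lexical, surface in relation.items():
        inverse[surface].append(lexical)
    return dict(inverse)


def apply_up(relation, surface):
    """Analysis: look the surface form up in the inverted relation."""
    return sorted(invert(relation).get(surface, []))


print(apply_up(GENERATION, "stellae"))  # three analyses (syncretism)
print(apply_up(STRESSED, "stellae"))    # nothing found without the stress mark
print(apply_up(STRESSED, "st'ellae"))   # found only when stress is typed in
```

With the stress-marked relation, analysis succeeds only if the user already supplies the stress mark, which is the situation the separate 'Dictionary' network avoids.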

5 Bibliography

Allen, W. Sidney. 1978. Vox Latina – A Guide to the Pronunciation of Classical Latin. Second edition. Cambridge University Press, Cambridge.
Beesley, Kenneth R. and Karttunen, Lauri. 2003. Finite State Morphology. CSLI Publications, Leland Stanford Junior University.
Bender, Byron W. 1999. Latin Noun Inflection (A Solution to Latin 10). Draft. University of Hawaii at Manoa.
Bozzi, A. and Cappelli, G. 1991. A Project for Latin Lexicography: 2. A Latin Morphological Analyzer. In Computers and the Humanities, 24 (5-6).
Bubenheimer, Uli. 1995. Eine Morphologische Analysekomponente für das Lateinische zum Einsatz in einem Lehrunterstützendem System. Studienarbeit. Universität Koblenz-Landau.
Covington, Michael A. 1994. Converging Transition Networks and Sub-Morphemic Regularities in Latin Noun Inflection. Artificial Intelligence Center, University of Georgia, Athens.
Crane, Gregory R. (Ed.). Perseus Digital Library Project. Tufts University. URL: http://www.perseus.tufts.edu.
Hockett, Charles F. 1954. Two models of grammatical description.
Kaplan, Ronald M. and Kay, Martin. 1994. Regular Models of Phonological Rule Systems. Computational Linguistics 20:331-378.
Kenstowicz, Michael. 1994. Phonology in Generative Grammar. Blackwell Publishing, Oxford.
Koskenniemi, Kimmo. 1983. Two-Level Morphology: A General Computational Model for Word-Form Recognition and Production. Dissertation. University of Helsinki.
Lindsay, W. M. 1894. The Latin Language – An Historical Account of Latin Sounds, Stems, and Flexions. Clarendon Press, Oxford.
Logos Group. 2006. Universal Conjugator. URL: http://www.logosconjugator.org.
Matthews, P. H. 1972. Inflectional Morphology. University Press, Cambridge.
Matthews, P. H. 1991. Morphology. Second Edition. University Press, Cambridge.
McLean, Adam. Latin parser and translator 0.96. URL: http://www.levity.com/alchemy/latin/latintrans.html.
Müller, Horst M. (Hrsg.). 2002. Arbeitsbuch Linguistik. Schöningh, Paderborn.
Roark, Brian and Sproat, Richard. 2006. Computational Approaches to Morphology and Syntax. (unpublished draft).
Sommer, Ferdinand. 1914. Handbuch der lateinischen Laut- und Formenlehre – Eine Einführung in das sprachwissenschaftliche Studium des Lateins. 2. und 3. Auflage. Carl Winters Universitätsbuchhandlung, Heidelberg.
Sommer, Ferdinand. 1977. Handbuch der Lateinischen Laut- und Formenlehre – Eine Einführung in das sprachwissenschaftliche Studium des Lateins. 4. Auflage. Carl Winter Universitätsverlag, Heidelberg.
Stock, Leo. 1970. Langenscheidts Kurzgrammatik Latein. Langenscheidt, Berlin.
Stowasser, J.M. et al. 1979. Stowasser. 24. Auflage 2004. HPT Medien AG, Zug.
Stump, Gregory T. 2001. Inflectional Morphology – A Theory of Paradigm Structure. Cambridge University Press, Cambridge.
Zirin, R. 1967. The Phonological Basis of Latin Prosody. University Microfilms, Inc., Ann Arbor, Michigan.

6 Appendix: xfst Script File

clear
undefine all

define VShort a | e | i | o | u;                              #short vowel
define VLong a= | e= | i= | o= | u=;                          #long vowel
define V VShort | VLong;                                      #vowel
define D {ae} | {au} | {oe} | {eu};                           #diphthong
define C b | c | d | f | g | h | l | m | n | p | q | r | s | t | v | x | z;   #consonant
define Seg C | V;                                             #segment
define Gend masc | fem | neut;                                #gender
define Case nom | gen | dat | acc | abl;                      #case
define Num sg | pl;                                           #number
define Decl adecl | odecl | edecl | udecl | idecl | cdecl;    #declension
define MutaCumLiquida {bl} | {br} | {dr} | {tl} | {tr} | {dl} | {pl} | {pr} | {gl} | {gr} | {cl} | {cr};   #muta cum liquida
define noun Noun Gend Case Num;                               #noun
define POS Noun;                                              #part of speech
define LexFeatures [Gend | Case | Num | POS];                 #lexical features

####################STEM+ENDING############################################
###########################################################################
###########################################################################

# The lexical features given in the lexicon of the noun are extended by two
# more: 'Ending' and 'DeclTag' (declension)
define Features [ [..] -> [Ending DeclTag] || _ noun ];

# The 'DeclTag' feature is rewritten by the actual declension depending on
# the context
define Declension [
DeclTag -> adecl || [a|a=] Ending _ .o.
DeclTag -> odecl || [o|o=] Ending _ .o.
DeclTag -> edecl || [e|e=] Ending _ .o.
DeclTag -> udecl || [u|u=] Ending _ .o.
DeclTag -> idecl || [i|i=] Ending _ .o.
DeclTag -> cdecl || C Ending _ ];

# The following definition about stem change contains three different
# condition contexts: 1) all segments preceding the hash sign are deleted
# in all cases 2) except for the nominative singular (and for neuter nouns
# in the accusative singular) where instead the segments following the
# hash sign are deleted and 3) final stem character o changes into u with
# o-declension masculine accusative singular nouns
define StemChange [
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun masc gen .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun masc dat .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun masc acc .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun masc abl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun masc nom pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun fem gen .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun fem dat .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun fem acc .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun fem abl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun fem nom pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun fem gen .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun fem dat .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun fem acc .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun fem abl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun fem nom pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun masc gen .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun masc dat .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun masc acc .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun masc abl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun masc nom pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun neut gen .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun neut dat .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun neut abl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun neut nom pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending cdecl Noun neut acc pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun neut gen .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun neut dat .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun neut abl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun neut nom pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending idecl Noun neut acc pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun neut gen .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun neut dat .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun neut abl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun neut nom pl .o.
Seg+ -> 0 || .#. _ %# Seg+ Ending odecl Noun neut acc pl .o.
Seg+ -> 0 || %# _ Ending cdecl Noun masc nom sg .o.
Seg+ -> 0 || %# _ Ending cdecl Noun fem nom sg .o.
Seg+ -> 0 || %# _ Ending idecl Noun fem nom sg .o.
Seg+ -> 0 || %# _ Ending odecl Noun masc nom sg .o.
Seg+ -> 0 || %# _ Ending cdecl Noun neut nom sg .o.
Seg+ -> 0 || %# _ Ending idecl Noun neut nom sg .o.
Seg+ -> 0 || %# _ Ending odecl Noun neut nom sg .o.
Seg+ -> 0 || %# _ Ending cdecl Noun neut acc sg .o.
Seg+ -> 0 || %# _ Ending idecl Noun neut acc sg .o.
Seg+ -> 0 || %# _ Ending odecl Noun neut acc sg .o.
o -> u || _ Ending odecl Noun masc acc sg
];

# The auxiliary hash sign is deleted after 'StemChange'
define RemoveHashSign [ %# -> 0 ];

# The Ending tag is rewritten to the actual ending of the noun according to
# its declension, its gender, its case and its number
define Endings [
Ending -> 0 || _ adecl Noun fem nom sg .o.
Ending -> e || _ adecl Noun fem gen sg .o.
Ending -> e || _ adecl Noun fem dat sg .o.
Ending -> {~m} || _ adecl Noun fem acc sg .o.
Ending -> {:} || _ adecl Noun fem abl sg .o.
Ending -> e || _ adecl Noun fem nom pl .o.
Ending -> {:rum} || _ adecl Noun fem gen pl .o.
Ending -> {^i:s} || _ adecl Noun fem dat pl .o.
Ending -> {:s} || _ adecl Noun fem acc pl .o.
Ending -> {^i:s} || _ adecl Noun fem abl pl .o.
Ending -> 0 || _ odecl Noun masc nom sg .o.
Ending -> {^i:} || _ odecl Noun masc gen sg .o.
Ending -> {:} || _ odecl Noun masc dat sg .o.
Ending -> {~m} || _ odecl Noun masc acc sg .o.
Ending -> {:} || _ odecl Noun masc abl sg .o.
Ending -> {^i:} || _ odecl Noun masc nom pl .o.
Ending -> {:rum} || _ odecl Noun masc gen pl .o.
Ending -> {^i:s} || _ odecl Noun masc dat pl .o.
Ending -> {:s} || _ odecl Noun masc acc pl .o.
Ending -> {^i:s} || _ odecl Noun masc abl pl .o.
Ending -> s || _ edecl Noun fem nom sg .o.
Ending -> {i:} || _ edecl Noun fem gen sg .o.
Ending -> {i:} || _ edecl Noun fem dat sg .o.
Ending -> {~m} || _ edecl Noun fem acc sg .o.
Ending -> {:} || _ edecl Noun fem abl sg .o.
Ending -> {:s} || _ edecl Noun fem nom pl .o.
Ending -> {:rum} || _ edecl Noun fem gen pl .o.
Ending -> {bus} || _ edecl Noun fem dat pl .o.
Ending -> {:s} || _ edecl Noun fem acc pl .o.
Ending -> {bus} || _ edecl Noun fem abl pl .o.
Ending -> s || _ udecl Noun masc nom sg .o.
Ending -> {:s} || _ udecl Noun masc gen sg .o.
Ending -> {i:} || _ udecl Noun masc dat sg .o.
Ending -> {~m} || _ udecl Noun masc acc sg .o.
Ending -> {:} || _ udecl Noun masc abl sg .o.
Ending -> {:s} || _ udecl Noun masc nom pl .o.
Ending -> {um} || _ udecl Noun masc gen pl .o.
Ending -> {^ibus} || _ udecl Noun masc dat pl .o.
Ending -> {:s} || _ udecl Noun masc acc pl .o.
Ending -> {^ibus} || _ udecl Noun masc abl pl .o.
Ending -> s || _ udecl Noun fem nom sg .o.
Ending -> {:s} || _ udecl Noun fem gen sg .o.
Ending -> {i:} || _ udecl Noun fem dat sg .o.
Ending -> {~m} || _ udecl Noun fem acc sg .o.
Ending -> {:} || _ udecl Noun fem abl sg .o.
Ending -> {:s} || _ udecl Noun fem nom pl .o.
Ending -> {um} || _ udecl Noun fem gen pl .o.
Ending -> {^ibus} || _ udecl Noun fem dat pl .o.
Ending -> {:s} || _ udecl Noun fem acc pl .o.
Ending -> {^ibus} || _ udecl Noun fem abl pl .o.
Ending -> s || _ idecl Noun fem nom sg .o.
Ending -> {^is} || _ idecl Noun fem gen sg .o.
Ending -> {^i:} || _ idecl Noun fem dat sg .o.
Ending -> {~m} || _ idecl Noun fem acc sg .o.
Ending -> {:} || _ idecl Noun fem abl sg .o.
Ending -> {^e:s} || _ idecl Noun fem nom pl .o.
Ending -> {um} || _ idecl Noun fem gen pl .o.
Ending -> {^ibus} || _ idecl Noun fem dat pl .o.
Ending -> {:s} || _ idecl Noun fem acc pl .o.
Ending -> {^ibus} || _ idecl Noun fem abl pl .o.
Ending -> 0 || _ cdecl Noun masc nom sg .o.
Ending -> {^is} || _ cdecl Noun masc gen sg .o.
Ending -> {i:} || _ cdecl Noun masc dat sg .o.
Ending -> {em} || _ cdecl Noun masc acc sg .o.
Ending -> e || _ cdecl Noun masc abl sg .o.
Ending -> {^e:s} || _ cdecl Noun masc nom pl .o.
Ending -> {um} || _ cdecl Noun masc gen pl .o.
Ending -> {^ibus} || _ cdecl Noun masc dat pl .o.
Ending -> {e:s} || _ cdecl Noun masc acc pl .o.
Ending -> {^ibus} || _ cdecl Noun masc abl pl .o.
Ending -> 0 || _ cdecl Noun fem nom sg .o.
Ending -> {^is} || _ cdecl Noun fem gen sg .o.
Ending -> {i:} || _ cdecl Noun fem dat sg .o.
Ending -> {em} || _ cdecl Noun fem acc sg .o.
Ending -> e || _ cdecl Noun fem abl sg .o.
Ending -> {^e:s} || _ cdecl Noun fem nom pl .o.
Ending -> {um} || _ cdecl Noun fem gen pl .o.
Ending -> {^ibus} || _ cdecl Noun fem dat pl .o.
Ending -> {e:s} || _ cdecl Noun fem acc pl .o.
Ending -> {^ibus} || _ cdecl Noun fem abl pl .o.
Ending -> 0 || _ odecl Noun neut nom sg .o.
Ending -> {^i:} || _ odecl Noun neut gen sg .o.
Ending -> {:} || _ odecl Noun neut dat sg .o.
Ending -> 0 || _ odecl Noun neut acc sg .o.
Ending -> {:} || _ odecl Noun neut abl sg .o.
Ending -> {^a} || _ odecl Noun neut nom pl .o.
Ending -> {:rum} || _ odecl Noun neut gen pl .o.
Ending -> {^i:s} || _ odecl Noun neut dat pl .o.
Ending -> {^a} || _ odecl Noun neut acc pl .o.
Ending -> {^i:s} || _ odecl Noun neut abl pl .o.
Ending -> {:} || _ udecl Noun neut nom sg .o.
Ending -> {:s} || _ udecl Noun neut gen sg .o.
Ending -> {:} || _ udecl Noun neut dat sg .o.
Ending -> {:} || _ udecl Noun neut acc sg .o.
Ending -> {:} || _ udecl Noun neut abl sg .o.
Ending -> a || _ udecl Noun neut nom pl .o.
Ending -> {um} || _ udecl Noun neut gen pl .o.
Ending -> {^ibus} || _ udecl Noun neut dat pl .o.
Ending -> a || _ udecl Noun neut acc pl .o.
Ending -> {^ibus} || _ udecl Noun neut abl pl .o.
Ending -> 0 || _ idecl Noun neut nom sg .o.
Ending -> s || _ idecl Noun neut gen sg .o.
Ending -> {:} || _ idecl Noun neut dat sg .o.
Ending -> 0 || _ idecl Noun neut acc sg .o.

o. -> {:} || _ idecl Noun neut abl sg -> a || _ idecl Noun neut nom pl -> {um} || _ idecl Noun neut gen pl -> {^ibus} || _ idecl Noun neut dat pl -> a || _ idecl Noun neut acc pl -> {^ibus} || _ idecl Noun neut abl pl -> 0 || _ cdecl Noun neut nom sg -> {is} || _ cdecl Noun neut gen sg -> {i:} || _ cdecl Noun neut dat sg -> 0 || _ cdecl Noun neut acc sg -> e || _ cdecl Noun neut abl sg -> a || _ cdecl Noun neut nom pl -> {um} || _ cdecl Noun neut gen pl -> {^ibus} || _ cdecl Noun neut dat pl -> a || _ cdecl Noun neut acc pl -> {^ibus} || _ cdecl Noun neut abl pl #define Referral # Vowels preceding a caret (^) are deleted define Voweldeletion [ V -> 0 || _ %^ ]. e -> e= || _ %: . # Short vowels preceding a colon (:) turn into long vowels respectively define Long [ a -> a= || _ %: .o.o.o.o. Ending . Ending .o.o. a tilde (~) is always short %~ %~ %~ 39 .o.o. Ending ]. Ending .o. Ending . Ending . Ending . Ending . i -> i= || _ %: .o.o.o. Ending .o. [i|i=] -> i || _ .o.o.o. Ending . u -> u= || _ %: ].o.o. [e|e=] -> e || _ . Ending . Ending . o -> o= || _ %: .o.o. Ending . # A vowel preceding define Short [ [a|a=] -> a || _ .Ending .o. Ending . Ending .

o. 'Long' and 'Short' all special characters are # deleted define RemoveSpecialCharacters [ %^ -> 0 || Seg _ . [u|u=] -> u || _ %~ ].o.o. Voweldeletion . POS -> 0 ].o.o.o. StemChange .o.o. RemoveFeatures ]. Endings .o.o.[o|o=] -> o || _ %~ . # Finally. define Suffixes [ Features . Short . Gend -> 0 . %: -> 0 || VLong _ . RemoveSpecialCharacters .o. %~ -> 0 || VShort _ ].o.o. # After 'Voweldeletion'. all tags are deleted to leave just the surface form of the noun # as a result define RemoveFeatures [ Decl -> 0 .o.o. nom | gen | dat | acc | abl -> 0 . RemoveHashSign . Declension . sg | pl -> 0 .o. Long . define Lexicon [{stella} [noun & $fem]] | [{fenestra} [noun & $fem]] | [{servus} %# {servo} [noun & $masc]] | [{bellum} %# {bello} [noun & $neut]] | [{integrum} %# {integro} [noun & $neut]] | [{puer} %# {puero} [noun & $masc]] | [{ager} %# {agro} [noun & $masc]] | [{vir} %# {viro} [noun & $masc]] | [{deus} %# {deo} [noun & $masc]] | [{rex} %# r e= g [noun & $masc]] | 40 .

%# ?* %# -> 0 .o. # The morphology of Latin nouns is defined as the composition of the # lexicon with the suffixes define Morphology [ Lexicon .. define Spaces [ ~[{ } ?*] . Suffixes. ~$[{ } { }] .o. define Case1 [ $%# . { } -> 0 ]. ~$%# ]. the nominative singular form of the noun is # given in the lexicon preceding a hash sign.o. # If a noun has a variant stem. Lexicon . otherwise the nominative # singular form of the noun is newly constructed define StemToDict [ Case1 41 .o. ~$[Seg { } Seg] . the nominative # singular form is taken from the lexicon. ~[?* [Seg|Gend|Case|Num|POS] [Gend|Case|Num|POS] ?*] . # Stylistic transducer to change every stem into its nominative singular # (standard dictionary) form define StemToNomSg [?* []:[noun & $nom & $sg]] .o. In these cases the nominative # singular form of the noun does not have to be formed but can be taken # from the lexicon.[{cor} %# {cord} [noun & $neut]] | [{iter} %# {itiner} [noun & $neut]] | [{caput} %# {capit} [noun & $neut]] | [c o= {nsul} [noun & $masc]] | [{pater} %# {patr} [noun & $masc]] | [n o= {men} %# n o= {min} [noun & $neut]] | [{genus} %# {gener} [noun & $neut]] | [{corpus} %# {corpor} [noun & $neut]] | [{turri} [noun & $fem]] | [i= {gni} [noun & $fem]] | [{animal} %# {anim} a= {li} [noun & $neut]] | [{manu} [noun & $fem]] | [{lacu} [noun & $masc]] | [{genu} [noun & $neut]] | [r e= [noun & $fem]] | [{di} e= [noun & $fem]] | [{fid} e= [noun & $fem]].o.o. [. # If there is a hash sign in the lexicon entry of a noun.o. ~[?* { }] .o.] -> %# || \LexFeatures _ LexFeatures .o. Suffixes ].o.

.#.#.#. e -> 'e || .o.#.o. a= -> 'a= || .o.#.#. C* _ C* .o. C* _ C* . i -> 'i || . e= -> 'e= || . .#.#. {oe} -> '{oe} || . C* _ C* . {ae} -> '{ae} || . i= -> 'i= || . .o.o. ####################PROSODY################################################ ########################################################################### ########################################################################### # Every short vowel turns into a long vowel (-> heavy syllable ('position # lenght')) when it is followed by at least two consonants define LongVowel [ a -> a= || _ [C C+] .#. C* _ C* .MutaCumLiquida . read regex Spaces .MutaCumLiquida ].#.| [~$%# .MutaCumLiquida . C* _ C* . {au} -> '{au} || .o. .#. . . C* _ C* . C* _ C* .#.#.o.MutaCumLiquida . C* _ C* . . 42 . C* _ C* .o.o.i .o.#. Dictionary.#.o.#.#. e -> e= || _ [C C+] .o. i -> i= || _ [C C+] .o.o.o.#. stress is assigned to that # syllable define OneSyllable [ a -> 'a || .o.#. . u= -> 'u= || .#. .o.#. u -> 'u || .#.#. . o= -> 'o= || .MutaCumLiquida . o -> 'o || . # The dictionary is defined to be the composition of the inversion of the # 'StemToDict' function with the Latin morphology define Dictionary [ [StemToDict].#. C* _ C* . C* _ C* . ]. C* _ C* . C* _ C* . . .#.#. o -> o= || _ [C C+] . # If a word consists of only one syllable. Morphology ]. [StemToNomSg LexFeatures*]] ].o. u -> u= || _ [C C+] .

o.o.o.#. e= -> 'e= || _ C* [V|D] C* . .#. {ae} -> '{ae} || _ C* [V|D] C* . . o= -> 'o= || _ C* [V|D] C* . e -> 'e || . C* _ C* [V|D] C* .#. {oe} -> '{oe} || _ C* [V|D] C* . i -> 'i || _ C* VShort C* [V|D] C* . i -> 'i || . . .#.o.o. . C* _ C* [V|D] C* . .#.o. o -> 'o || . .o. {au} -> '{au} || . 43 . .#.#.o.#. C* _ C* [V|D] C* .#.o. a= -> 'a= || . . o= -> 'o= || . C* _ C* [V|D] C* . C* _ C* [V|D] C* . .#. . i= -> 'i= || . C* _ C* [V|D] C* .#. # If a word consists of three or more syllables stress is assigned to the # second last syllble if it is a heavy syllable (ending in a long vowel or # diphthong) define ThreeOrMoreSyllablesPenult [ a= -> 'a= || _ C* [V|D] C* .#.#. . C* _ C* [V|D] C* . . u= -> 'u= || .#. C* _ C* [V|D] C* .#.o. e -> 'e || _ C* VShort C* [V|D] C* .#. . .#. .o.o. . .#.o.o. C* _ C* [V|D] C* . . ].o.o.#.#.#.#. .#.#.#.#.o. u -> 'u || .o.#. {ae} -> '{ae} || .#.#. C* _ C* [V|D] C* . . {oe} -> '{oe} || .#. {au} -> '{au} || _ C* [V|D] C* .o.#.o.o.# If a word consists of two syllables (two vowels or diphthongs) stress is # assigned to the first syllable define TwoSyllables [ a -> 'a || .o.#. u= -> 'u= || _ C* [V|D] C* . C* _ C* [V|D] C* .#.#. i= -> 'i= || _ C* [V|D] C* . # If a word consists of three or more syllables and the second last # syllable is a light syllable (ending in a short vowel) stress is assigned # to the third last syllable (vowel or diphthong) define ThreeOrMoreSyllablesAntepenult [ a -> 'a || _ C* VShort C* [V|D] C* .#. ]. C* _ C* [V|D] C* . C* _ C* [V|D] C* . e= -> 'e= || .#.#. .

#.o. u -> 'u || _ C* VShort C* [V|D] C* .o.#.#.o. e= -> 'e= || _ C* VShort C* [V|D] C* . OneSyllable .o. ThreeOrMoreSyllablesPenult . define Prosody [ Spaces . . {oe} -> '{oe} || _ C* VShort C* [V|D] C* . i= -> 'i= || _ C* VShort C* [V|D] C* .o.#.o.#.o. a= -> 'a= || _ C* VShort C* [V|D] C* .o.o -> 'o || _ C* VShort C* [V|D] C* . .#.o.o.o. {au} -> '{au} || _ C* VShort C* [V|D] C* . . . . 44 . {ae} -> '{ae} || _ C* VShort C* [V|D] C* .o. TwoSyllables . u= -> 'u= || _ C* VShort C* [V|D] C* . Dictionary .#. #read regex Prosody. ]. .#. . o= -> 'o= || _ C* VShort C* [V|D] C* .o. define StressAssignment [ LongVowel . ThreeOrMoreSyllablesAntepenult ]. .#. StressAssignment ].o.o. .#.
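The stress rules stated in the comments of the listing (one syllable, two syllables, heavy penult, light penult) can also be restated procedurally. The following Python sketch is not part of the thesis implementation and does not use finite-state machinery. It assumes that the input already carries length marks in the listing's notation (a long vowel written as the vowel followed by `=`, diphthongs written `ae`, `au`, `oe`) and that position length has already been applied, as 'LongVowel' does. The names `NUCLEUS`, `is_heavy` and `assign_stress` are hypothetical and chosen only for illustration.

```python
import re

# Syllable nuclei in the listing's notation: diphthongs first, then a vowel
# optionally followed by "=" (the length mark). Assumption: every nucleus
# corresponds to exactly one syllable.
NUCLEUS = re.compile(r"ae|au|oe|[aeiou]=?")


def is_heavy(nucleus: str) -> bool:
    """A nucleus is heavy if it is a diphthong or a long vowel.

    Both are written with two characters ("ae", "a="), a short vowel with one.
    """
    return len(nucleus) == 2


def assign_stress(word: str) -> str:
    """Insert an apostrophe before the stressed nucleus of `word`."""
    nuclei = list(NUCLEUS.finditer(word))
    if not nuclei:
        return word  # nothing to stress
    if len(nuclei) <= 2:
        # One syllable: that syllable; two syllables: the first one.
        target = nuclei[0]
    elif is_heavy(nuclei[-2].group()):
        # Three or more syllables with a heavy penult: stress the penult.
        target = nuclei[-2]
    else:
        # Three or more syllables with a light penult: stress the antepenult.
        target = nuclei[-3]
    return word[:target.start()] + "'" + word[target.start():]


if __name__ == "__main__":
    for form in ["re=s", "stella", "fene=stra", "animal"]:
        print(form, "->", assign_stress(form))
```

Under these assumptions `assign_stress("fene=stra")` yields `fen'e=stra` (heavy penult) and `assign_stress("animal")` yields `'animal` (light penult, so the antepenult is stressed). Unlike the transducers in the listing, the sketch handles one form at a time and does not model the 'MutaCumLiquida' exception.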
