May, 2017
MAHDI YONIS KAYAD
This is to certify that the thesis prepared by Mahdi Yonis, titled "Development of Morphological
Analyzer for Af-Somali" and submitted in partial fulfillment of the requirements for the Degree of
Master of Science in Computer Science, complies with the regulations of the University and meets
the accepted standards with respect to originality and quality.
Advisor:_______________________________________
Examiner:______________________________________
Examiner:______________________________________
ABSTRACT
Morphological analysis is a critical task, especially for natural language processing of
inflectional languages. This thesis presents the implementation details of a morphological
analyzer for Af-Somali, which is an inflectional language. A detailed computational analysis of
Af-Somali morphology, including the formalization of its alternation and morphotactic rules, is
worked out in order to build the analyzer. In the implementation, the alternation and morphotactic
rules of Af-Somali are represented as two-level morphology rules. This is the first detailed
computational analysis of Af-Somali from a morphological point of view. The work is mainly based
on the dictionary by Annarita Puglielli, known as Qaamuus, and on Andrzejewski's declensions of
nouns. The thesis employs a finite-state two-level approach using the Xerox finite-state toolkit.
The work is done in two parts: the lexicon is encoded with the lexicon formalism (lexc), and the
alternation rules are implemented with xfst.
Generally, we evaluated the morphological analyzer by comparing the number of word tokens it
analyzed correctly with the number it processed incorrectly. We manually annotated 218 tokens
(90 nouns, 120 verbs and 8 adjectives) taken from the dictionary known as Qaamuus. Of these,
77 nouns, 105 verbs and 6 adjectives were correctly analyzed, that is, 85.6% of nouns, 87.5% of
verbs and 75% of adjectives. Overall, 86.2% of the 218 tokens were correctly analyzed and the
remaining 13.76% (30 tokens) were not, including 10 tokens that the system failed to analyze at
all. The results were evaluated by a human reader familiar with the language. These encouraging
results are a preliminary step towards the computational development of Af-Somali.
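The percentages above follow directly from the per-category counts. The short Python check below recomputes them from the reported counts (for nouns, 77/90 ≈ 85.56%); it only restates the abstract's arithmetic and is not part of the analyzer.

```python
# Reported counts per category: (tokens annotated, tokens correctly analyzed)
counts = {
    "noun": (90, 77),
    "verb": (120, 105),
    "adjective": (8, 6),
}

def accuracy(total: int, correct: int) -> float:
    """Percentage of tokens that were analyzed correctly."""
    return 100.0 * correct / total

per_category = {pos: accuracy(t, c) for pos, (t, c) in counts.items()}
total_tokens = sum(t for t, _ in counts.values())   # 218 tokens in total
total_correct = sum(c for _, c in counts.values())  # 188 correct analyses
overall = accuracy(total_tokens, total_correct)
not_correct = total_tokens - total_correct          # 30 tokens not correct

for pos, acc in per_category.items():
    print(f"{pos:10s} {acc:6.2f}%")
print(f"overall    {overall:6.2f}%  ({not_correct} tokens not correct, {100 - overall:.2f}%)")
```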
Keywords: natural language processing (NLP), morphological analyzer, finite-state transducer (FST),
Xerox Finite State Tool (XFST), lexicon formalism (LEXC).
ACKNOWLEDGEMENTS
I thank all who in one way or another contributed to the completion of this thesis. First, I
give thanks to Allah, who gives me protection and the ability to work. I am grateful to
Addis Ababa University, its College of Natural Sciences and its Department of Computer
Science for making it possible for me to study here. I give deep thanks to the lecturers of
the Department of Computer Science, the librarians and the other staff of the faculty. My
special and heartfelt thanks go to my advisor, Dr. Yaregal Assabie, who encouraged and
directed me. His challenges brought this work to completion, and it is through his advice
that this work came into existence. For any faults I take full responsibility. My special
gratitude and appreciation also go to Annarita Puglielli and Cabdalla Cumar Mansuur for
their invaluable contribution to the Af-Somali dictionary, which was the first fully written
dictionary with full grammatical information. Their discussions and comments on
Af-Somali lexicons and morphology have been the basis of this work. Moreover, I am
grateful to my many friends and colleagues through these difficult years. I appreciate my
dear mother and my goodhearted brothers, Mr Abdirashid Yonis and Hamse Yonis, who
supported and helped me through many setbacks; I greatly value their contribution.
Table of Contents
List of Figures
List of Tables
Chapter 1 : Introduction
1.1 Background of the Study
1.2 Morphological Analysis
1.3 Statement of the Problem
1.4 Objectives
1.5 Methodology
1.5.1 Literature Review
1.5.2 Data Collection and Classification
1.5.3 Analysis
1.5.4 Implementation
1.5.5 Testing
1.6 Application of the Result
1.7 Scope and Limitation
1.8 Organization of the Thesis
Chapter 2 : Literature Review
2.1 Introduction
2.2 Introduction to Morphological Analysis
2.2.1 Morphemes
2.2.2 Affixes
2.2.3 Types of Morphological Processes
2.2.4 Inflection
2.2.5 Derivation
2.2.6 Compounding
2.3 Af-Somali Morphology
2.3.1 Af-Somali Phonetics
2.3.2 Basic Characteristics of Af-Somali
2.4 Inflectional Process of Af-Somali
2.4.1 Nouns
2.4.2 Af-Somali Noun Determiners
2.4.3 Adjectives
2.4.4 The Verb
2.4.5 Classification of Af-Somali Verbs
2.5 Derivational System of Af-Somali
2.6 Approaches to Morphological Analysis
2.6.1 Corpus-based Approaches
2.6.2 Rule-based Approach
2.7 Finite State Technology
2.7.1 Finite State Machines
2.7.2 Finite-State Transducers
2.7.3 Two-Level Morphological Approach
2.7.4 The Xerox Finite State Framework
2.8 Summary
Chapter 3 : Related Work
3.1 Introduction
3.2 Morphological Analyzers for European Languages
3.3 Morphological Analyzers for Asian Languages
3.4 Morphological Analyzers for Ethiopian Languages
3.5 Summary
Chapter 4 : Design of Af-Somali Morphological Analyzer
4.1 Introduction
4.2 General Architecture of the Af-Somali Morphological Analyzer
4.2.1 Lexicon/Morphotactics
4.2.2 Alternation Rules
4.3 The Design of Af-Somali Part-of-Speech Lexicon and Alternation Rules
4.3.1 Af-Somali Verb Lexicon Design
4.3.2 Alternation Rules of Af-Somali Verbs
4.3.3 Noun Lexicon Design
4.3.4 Alternation Rules of Af-Somali Nouns
4.3.5 Adjective Lexicon Design
Chapter 5 : Experimentation and Evaluation
5.1 Introduction
5.2 Experimentation
5.3 Discussion and Evaluation
Chapter 6 : Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
Appendix A: Alternation Rules for Nouns and Verbs
Appendix B: Af-Somali Verb Lexicon
Appendix C: Af-Somali Noun Lexicon
List of Figures
List of Tables
List of Abbreviations
Af-Somali Somali Language
FSA Finite State Automata
FST Finite State Transducers
IR Information Retrieval
MT Machine Translation
NLP Natural Language Processing
POS Part-Of-Speech
SOV Subject-Object-Verb
Chapter 1 : Introduction
A natural language is the preferred medium of communication for people. It can be spoken or
written, and in either form it is not easily understood by computers. For a computer to
understand it, a mechanism is needed that holds enough information about the language,
including its word grammar and sentence structure. The processing of this information by a
computer is known as natural language processing (NLP). NLP is used both to generate
human-readable information from computer systems and to convert human language into more
formal structures that a computer can understand [6]. It is a field of study that comprises
different levels of linguistic analysis, such as phonetic, morphological, syntactic and
semantic analysis, and morphological analysis is the basic level on which the other NLP
applications build.
Morphology is seen as 'the study of words that are formally and semantically related'. For a
word to be considered an expression, it must be characterized by three features: a
phonological form, a category (word class) and a meaning. Morphology is concerned with the
internal structure of words, and morphological analysis consists of identifying the parts, or
constituents, of words. For example, the Af-Somali word toosi (straighten) consists of two
constituents: the root toos (straight) and the imperative marker -i. Morphological analysis
primarily consists of breaking words up into their parts and establishing the rules that
govern the co-occurrence of these parts. Morphology can be viewed as the process of building
words by inflection and word formation, so the task of morphological analysis is to take word
forms, relate them to other word forms and, at the same time, derive information about each
form [30].
A morphological analyzer is an essential, basic tool for building natural language processing
applications such as machine translation systems, and it is a core technology for most text
analysis applications, such as information retrieval (IR) and text summarization. Its most
obvious applications are in lexicography and computational linguistics [24]. Two factors are
essential for accurate automatic morphological analysis: a set of morphological (morphotactic)
rules and a morphological analysis procedure [24]. The absence or poor performance of either
one impairs the overall ability of the morphological analyzer.
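To make the separation between these two factors concrete, the Python sketch below keeps the morphotactic rules as plain data (a toy lexicon of continuation classes, loosely imitating the structure of lexc) and uses one generic procedure to analyze any surface form against them. The class names, the entries and the tags `+Def`, `+Imp` and `+Inf` are hypothetical and cover only the examples in this chapter; it is an illustration, not the analyzer built in this thesis.

```python
# Factor 1: morphotactic rules as data. Each class lists (upper, lower, next class)
# entries; "#" marks the end of a word. Entries are a hypothetical toy lexicon.
LEXICON = {
    "Root": [("", "", "Nouns"), ("", "", "Verbs")],
    "Nouns": [("guri", "guri", "NounSuf")],
    "Verbs": [("toos", "toos", "VerbSuf")],
    "NounSuf": [("+Def", "ga", "#")],
    "VerbSuf": [("+Imp", "i", "#"), ("+Inf", "in", "#")],
}

# Factor 2: a generic analysis procedure that knows nothing about Af-Somali.
def analyze(surface, cls="Root", upper=""):
    """Walk the continuation classes, consuming the surface string left to right."""
    if cls == "#":
        return [upper] if surface == "" else []
    results = []
    for up, low, nxt in LEXICON[cls]:
        if surface.startswith(low):
            results.extend(analyze(surface[len(low):], nxt, upper + up))
    return results

print(analyze("toosin"))  # ['toos+Inf']
```

Changing the rules means editing `LEXICON` only; the procedure stays the same, which mirrors the division between lexicon and compiler in the Xerox tools.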
For example, in the English word "dogs", "dog" is the root and "-s" is the affix, which gives
the number information of the root. Morphological analysis is thus centered on the analysis
and generation of word forms: it deals with the internal structure of words and how they are
formed. Morphological analysis also plays an important role in applications such as spell
checking, electronic dictionary interfaces and information retrieval systems, where words that
are only morphological variants of each other must be identified and treated alike [30]. In
NLP, and especially in machine translation (MT) systems, words in texts must be identified in
order to determine their syntactic and semantic properties. Morphological study helps here by
providing rules for analyzing the structure and formation of words.
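The idea of treating morphological variants alike can be sketched as conflation: every token is replaced by its root before indexing or lookup. In the minimal Python sketch below, the `ANALYSES` table is a hypothetical stand-in for the output of a real analyzer, filled only with forms mentioned in this chapter; unknown tokens simply map to themselves.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical analyzer output (surface form -> root), standing in for a real analyzer.
ANALYSES = {"toosi": "toos", "toosin": "toos", "guri": "guri", "guriga": "guri"}

def conflate(tokens: List[str]) -> Dict[str, List[str]]:
    """Group tokens that are morphological variants of the same root."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tok in tokens:
        groups[ANALYSES.get(tok, tok)].append(tok)  # unknown words keep their own form
    return dict(groups)

print(conflate(["toosi", "guriga", "toosin", "guri"]))
```

A search for "guri" could then also match documents containing "guriga".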
Therefore, a morphological analyzer is a vital first step in natural language processing for
any language. For lesser-studied and under-resourced languages in particular, it is often a
practical and extremely valuable first step that can make use of the corpora, lexicons,
morphological grammars and phonological rules already produced by field and descriptive
linguists [9]. Morphological analyzers have been developed for several well-documented
languages, such as English [30] and Arabic [13]. There have also been significant studies in
computational morphology for Ethiopian languages such as Amharic [5, 8, 21, 22, 29], Oromo
[22] and Tigrinya [22], as well as work on Afaraf [2] and on Ge'ez by Yitayal Abate [34].
However, to the best of our knowledge, no academic or published study has so far developed a
morphological analyzer for Af-Somali.
Af-Somali is the official language of Somalia and of the Somali Region of Ethiopia, and it is a
working language in the Northern Province of Kenya and in Djibouti [26]. It is also the medium
of instruction in the schools of these areas, which means that the language is spoken by many
people and deserves attention from computational processing. Furthermore, a large number of
official documents, religious books and electronic documents are available in Af-Somali, so
the language is widely used in word-processing activities in different areas. Some NLP
applications have also been developed for Af-Somali, such as Google's machine translation
system, a bilingual English-Somali electronic dictionary project and a Somali speech corpus by
Niman Abdillahi [26]. These applications need to identify words in texts in order to determine
their syntactic and semantic properties and their lexical category. For example, when
translating an Af-Somali word into English with the electronic dictionary, users may fail to
find the exact meaning or the corresponding English word. This first requires a morphological
analyzer that identifies the word's category, for example whether it is in the past or present
tense, and its part of speech. Furthermore, anyone who wants to conduct NLP research or to
access the various Af-Somali resources available in different formats needs computational
processing of the language, or else a translation of it into better-resourced languages.
Considerable research has been done on NLP systems for the main Ethiopian languages, including
various works on computational morphology for Amharic [5, 8, 21, 22, 29], Afaan Oromo [22],
Tigrinya [22] and Afaraf [2]. However, no research has so far been conducted on automatic
morphological analysis for Af-Somali, and the absence of morphological analysis systems limits
efforts to make computers work well with Af-Somali. Af-Somali shares its Cushitic origin with
Afaraf, Afaan Oromo and the other languages of the Cushitic family, and it has much in common
with them in vocabulary and grammatical structure; for example, they all follow SOV word order.
However, Af-Somali differs considerably from them in its focus markers, its noun and verb
markers, its morphology and its word order, which in some respects resemble those of Arabic,
a Semitic language. It is also unique in that its modifiers occupy a single position, and in
its pluralization patterns and word-formation processes; hence, it needs its own morphological
analyzer.
Af-Somali is morphologically rich, and word formation in the language involves a number of
morphological processes, including complex inflection of verbs and nouns, derivation and
compounding. Because of this complexity, an automated morphological analyzer is difficult to
construct. Af-Somali verbs in particular have complex inflection, with a large number of
affixes attached to the stem. At the same time, morphological analysis is vital for the
development of many practical natural language processing systems, such as machine-readable
dictionaries, machine translation, information retrieval, spell checkers and speech
recognition. Therefore, the aim of this work is to study Af-Somali morphology from a
computational point of view: to analyze words and their morphological categories and the
word-formation processes of the language, and to model computational morphological analysis
for Af-Somali.
1.4 Objectives
General Objective
The main objective of this thesis work is to develop a morphological analyzer for Af-Somali
word morphology.
Specific Objectives
To achieve the general objective above, this thesis has the following specific objectives:
1.5 Methodology
1.5.1 Literature Review
A literature review will be conducted to understand the morphology of the language for
developing the morphological analyzer. Scholars in the area of Af-Somali morphology will be
consulted to better understand the morphology of the language and to obtain information
helpful for the thesis. Developing a morphological analyzer requires analyzing and identifying
the properties of Somali word formation, so research on morphological analyzers for other
languages will also be reviewed; this helps in studying and selecting a suitable morphological
approach for Af-Somali. Besides this, literature on morphological analysis in particular, and
on approaches in computational linguistics in general, will be reviewed to better understand
how words are analyzed. On this basis, the finite-state transducer approach to morphological
analysis was selected to analyze Somali words and derive their roots and grammatical
properties.
1.5.2 Data Collection and Classification
Any study requires collecting and analyzing the data relevant to it. In this thesis, corpus
data, that is, electronic text consisting of lists of Af-Somali words, will be collected from
the dictionary known as Qaamuus and from various magazines on the internet. The unique word
forms will be classified into categories such as nouns, verbs and adjectives, and further
subdivided according to their morphosyntactic behavior using the Xerox finite-state tool.
1.5.3 Analysis
The classified data will be analyzed into roots (or stems) and affixes for each category using
the lexicon formalism (lexc) of the Xerox finite-state tools. Phonological rules have also been
identified and formalized for each category using xfst.
1.5.4 Implementation
Finite-state transducers will be created for each group of words following the finite-state
transducer approach. A computational model of Af-Somali inflectional morphology will then be
implemented using the Xerox Finite State Tool (xfst), developed by two principal researchers at
the Xerox Palo Alto Research Center.
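For readers unfamiliar with transducers, the Python sketch below shows the underlying idea: a finite-state transducer whose arcs pair one input (surface) character with an output string, so that reading a word produces its analysis. It is hand-built for the two forms toosi and toosin only, its tags are hypothetical, and it does not reflect how xfst represents or compiles its networks.

```python
from typing import Dict, List, Set, Tuple

# An arc pairs one input character with an output string: (input, output, target state).
Arc = Tuple[str, str, int]

class FST:
    def __init__(self) -> None:
        self.arcs: Dict[int, List[Arc]] = {}
        self.finals: Set[int] = set()

    def add_arc(self, src: int, inp: str, out: str, dst: int) -> None:
        self.arcs.setdefault(src, []).append((inp, out, dst))

    def transduce(self, word: str) -> List[str]:
        """Depth-first search over all paths; nondeterminism can yield several outputs."""
        results: List[str] = []

        def walk(state: int, pos: int, out: str) -> None:
            if pos == len(word):
                if state in self.finals:
                    results.append(out)
                return
            for inp, o, dst in self.arcs.get(state, []):
                if inp == word[pos]:
                    walk(dst, pos + 1, out + o)

        walk(0, 0, "")
        return results

# Hypothetical toy network: states 0-4 copy the stem "toos" unchanged.
fst = FST()
for i, ch in enumerate("toos"):
    fst.add_arc(i, ch, ch, i + 1)
fst.add_arc(4, "i", "+Imp", 5)  # toos-i  -> toos+Imp
fst.add_arc(4, "i", "", 6)      # same input "i", but keep reading towards toos-in
fst.add_arc(6, "n", "+Inf", 7)  # toos-in -> toos+Inf
fst.finals.update({5, 7})

print(fst.transduce("toosin"))  # ['toos+Inf']
```

The two arcs leaving state 4 on "i" show why analysis is a search: only the path that ends in a final state survives.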
1.5.5 Testing
In this thesis, the finite-state approach will be used to develop the morphological analyzer. A
list of surface word forms (tokens) will be extracted from the Af-Somali dictionary (Qaamuus)
and fed into the prototype for analysis. An output was considered correct only if it contained
all legal combinations of roots and grammatical structures for a given word form and no
incorrect roots or structures.
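The correctness criterion above amounts to exact set equality between the analyses the system returns and the legal analyses in the gold annotation. The Python sketch below applies it and separates wrong outputs from empty ones; the gold and output data, including the spurious tag `+Pl`, are hypothetical examples, not the thesis' test set.

```python
from typing import Dict, Set, Tuple

def is_correct(found: Set[str], expected: Set[str]) -> bool:
    # Every legal analysis must be present and no extra one may appear.
    return found == expected

def evaluate(outputs: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> Tuple[int, int, int]:
    """Return (correct, wrong, failed) counts over the gold tokens."""
    correct = wrong = failed = 0
    for token, expected in gold.items():
        found = outputs.get(token, set())
        if not found:
            failed += 1   # the analyzer returned nothing
        elif is_correct(found, expected):
            correct += 1
        else:
            wrong += 1    # missing or spurious analyses
    return correct, wrong, failed

# Hypothetical data: one correct, one with a spurious analysis, one unanalyzed.
gold = {"toosi": {"toos+Imp"}, "guriga": {"guri+Def"}, "toosin": {"toos+Inf"}}
outputs = {"toosi": {"toos+Imp"}, "guriga": {"guri+Def", "guri+Pl"}}
print(evaluate(outputs, gold))  # (1, 1, 1)
```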
1.6 Application of the Result
Since a morphological analyzer is a vital first step in natural language processing for any
language, the Af-Somali morphological analyzer is developed to enable more efficient and
improved NLP applications, such as spelling and grammar checkers, POS taggers and machine
translation systems. It also helps linguists to analyze the morphological properties of the
language more easily, and once Af-Somali applications are developed, end users seeking
information stored in Af-Somali can benefit from the analyzer's identification of each word's
morphological category. In this regard, this work can serve as a basis for the technological
development of the language: the computational analysis of Af-Somali morphology is a central
and essential component for developing other Af-Somali processing applications.
1.7 Scope and Limitation
Somali linguistic varieties are divided into three main groups: Northern, Benadir and Maay.
Northern Somali forms the basis of Standard Af-Somali, so the scope of this study is limited to
developing a morphological analyzer for Standard (Northern) Af-Somali; other dialects are not
included. This study also focuses on the written form of words. Derivation and compounding are
morphologically important, but they are not dealt with in this thesis. Although the literature
offers a number of models and approaches to computational morphological analysis, this thesis
employs a finite-state approach.
1.8 Organization of the Thesis
This thesis is structured into six chapters. Chapter 1 gives background information,
introducing natural language processing and morphological analysis, and presents the problems
that motivated the work, its objectives and the methodology followed; it also describes the
importance and the scope of the work. Chapter 2 presents the literature reviewed for the
thesis: it looks into Af-Somali word morphology and the general characteristics of Af-Somali
parts of speech, and it presents approaches to morphological analysis. Studies related to this
thesis are presented in Chapter 3. Chapter 4 describes the design and implementation based on
the analyses done in the preceding chapters. Chapter 5 discusses the experimentation and
evaluation. Finally, Chapter 6 concludes the thesis and gives directions for future work.
Chapter 2 : Literature Review
2.1 Introduction
This chapter presents the literature reviewed that is important for developing the Af-Somali
morphological analyzer. It mainly presents Af-Somali morphology, with emphasis on the
morphological processes involved in word formation and generation. It also presents background
information on Af-Somali and its phonetics. In addition, the chapter reviews the computational
approaches employed in natural language processing systems and morphological analysis.
2.2.1 Morphemes
For example, "afuri", "ababi", "toosi" and other verbs of the second group of Af-Somali verbs
use "-in" as the infinitive marker, giving "afurin", "ababin" and "toosin". But when the same
morpheme is attached to a different word, it may be realized as a different morph, so the same
morpheme can be realized by different morphs in a language. These different morphs of the same
morpheme are called allomorphs; an allomorph is a particular variant of a morpheme. For example,
the second person singular marker in Af-Somali is realized as -o, -t or -s, and the definite
marker of feminine nouns, -t, has the morph "-t" in birta (the metal) but "-d" in mindida (the
knife). These are allomorphs of "-t". A group of allomorphs makes up one morpheme.
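Alternations such as -t/-d are what the alternation rules of the analyzer capture: a lexical form with a morpheme boundary is rewritten into its surface form. The Python sketch below mimics such a rewrite rule with a regular expression. The rule itself is a deliberate simplification, assumed only to reproduce the two examples above (t becomes d after a vowel); actual Af-Somali alternations involve more contexts, and in this thesis they are written as xfst rules, not in Python.

```python
import re

def realize(lexical: str) -> str:
    """Map a lexical form such as 'mindi+ta' to its surface form.

    Hypothetical simplified rule: the feminine definite marker +t surfaces as d
    after a vowel and as t elsewhere; '+' marks the morpheme boundary.
    """
    surface = re.sub(r"(?<=[aeiou])\+t", "d", lexical)  # t -> d after a vowel
    return surface.replace("+", "")                       # erase remaining boundaries

print(realize("bir+ta"), realize("mindi+ta"))  # birta mindida
```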
In addition, morphology deals with all the combinations that form words or parts of words. The
two broad classes of morphemes are stems and affixes. The stem is the "main morpheme" of the
word, supplying its main meaning; for example, in "guriga", guri (house) is the stem and "-ga"
is the affix, which adds the meaning "the".
2.2.2 Affixes
An affix is a bound morph that is realized as a sequence of phonemes. Affixes are classified
according to whether they are attached before or after the form to which they are added:
prefixes are attached before and suffixes after. Most Af-Somali words use suffixes, and a small
number of verbs use prefixes. Languages can also be classified as concatenative or
non-concatenative according to the morphology they possess. Non-concatenative morphology is
also called template or root-and-pattern morphology, and Af-Somali exhibits it in the plural
formation of nouns, for example in the duplifix "aC" of the fourth noun declension, as in
buug → buugag (book → books) and fool → foolal.
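The "aC" duplifix is non-concatenative because the suffix is not a fixed string: it copies the stem's final consonant. The Python sketch below generates such plurals and, in the other direction, recognizes them by checking that the copied consonant matches. It is a toy covering only single-letter final consonants (it ignores digraphs such as sh or dh) and assumes the stem already belongs to the fourth declension.

```python
VOWELS = set("aeiou")

def plural_ac(stem: str) -> str:
    """Toy duplifix: append 'a' plus a copy of the stem-final consonant."""
    last = stem[-1]
    if last in VOWELS:
        raise ValueError(f"toy rule expects a consonant-final stem: {stem!r}")
    return stem + "a" + last

def unplural_ac(word: str):
    """Inverse check: strip -aC only if C copies the consonant just before 'a'."""
    if len(word) >= 4 and word[-2] == "a" and word[-1] == word[-3] and word[-1] not in VOWELS:
        return word[:-2]
    return None

print(plural_ac("buug"), plural_ac("fool"))  # buugag foolal
print(unplural_ac("buugag"))                 # buug
```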
A word is defined as the smallest unit of thought that can be vocally expressed, composed of one
or more sounds combined in one or more syllables. A word is a minimal free form consisting of
one or more morphemes. There are three broad ways of forming words from morphemes, and Af-Somali
makes use of all three: inflection, derivation and compounding.
2.2.4 Inflection
2.2.5 Derivation
2.2.6 Compounding
Compounding is the joining of two or more base forms to form a new word. Such root-root fusions
are very common in written Af-Somali. Compounds are formed by combining uninflected noun forms,
which carry the semantic content, with various inflected verbal forms. For example, the
Af-Somali plural noun "buugag" (books) combines with the verbal form sheeg to give the noun
"buugagsheeg" (bibliography).
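Analyzing such compounds means finding where a known noun ends and a known verb begins. The Python sketch below tries every split point against a toy lexicon built from the example above; the part-of-speech labels and the restriction to a noun followed by a verb are assumptions for illustration only.

```python
from typing import List, Tuple

# Toy lexicon from the example in the text; POS labels are assumptions.
LEXICON = {"buug": "N", "buugag": "N", "sheeg": "V"}

def split_compound(word: str) -> List[Tuple[str, str]]:
    """Return every split of `word` into a lexicon noun followed by a lexicon verb."""
    splits = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if LEXICON.get(left) == "N" and LEXICON.get(right) == "V":
            splits.append((left, right))
    return splits

print(split_compound("buugagsheeg"))  # [('buugag', 'sheeg')]
```

Returning all splits rather than the first one leaves room for ambiguous compounds, which a full analyzer would have to resolve.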
literary Somali [26]. The writing system of the language was adopted in 1972, and there are no textual archives from before that date. It uses Roman letters and does not mark the tonal accent [26]. The sound system of Af-Somali has 22 consonants and 10 vowels, 5 long and 5 short [33]. Af-Somali is also a tone-accent language with 2 to 3 lexical tones. Af-Somali consonants follow the same order and have the same values as the equivalent letters of the Arabic alphabet, except G. Some of the letters listed below do not exist in English; they correspond to sounds of Arabic. The Af-Somali alphabet begins with the alif (’) and contains 21 further consonants, B, T, J, X, KH, D, R, S, SH, DH, C, G, F, Q, K, L, M, N, W, H and Y, together with the ten vowels of Somali: a, i, e, u and o and their long counterparts aa, ee, ii, oo and uu. The vowels have the same values as in Spanish or Italian.
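Since KH, SH and DH are single letters and the long vowels are written doubled, an analyzer has to read a word letter by letter rather than character by character. The following is a minimal Python sketch of such segmentation, assuming greedy longest-match over the inventory listed above; the function name `segment` is hypothetical and not part of the thesis implementation.

```python
# Letter inventory as listed in the text; "'" stands for the alif.
CONSONANTS = ["'", "b", "t", "j", "x", "kh", "d", "r", "s", "sh", "dh",
              "c", "g", "f", "q", "k", "l", "m", "n", "w", "h", "y"]
VOWELS = ["a", "e", "i", "o", "u", "aa", "ee", "ii", "oo", "uu"]

# Longest symbols first, so "sh" is read as one letter rather than "s" + "h"
# and "uu" as one long vowel rather than two short ones.
SYMBOLS = sorted(CONSONANTS + VOWELS, key=len, reverse=True)


def segment(word: str) -> list[str]:
    """Split a word into Af-Somali letters by greedy longest match."""
    word = word.lower()
    letters, i = [], 0
    while i < len(word):
        for symbol in SYMBOLS:
            if word.startswith(symbol, i):
                letters.append(symbol)
                i += len(symbol)
                break
        else:  # no symbol matched at position i
            raise ValueError(f"unexpected character {word[i]!r} in {word!r}")
    return letters


print(segment("buugagsheeg"))  # ['b', 'uu', 'g', 'a', 'g', 'sh', 'ee', 'g']
```

Greedy matching is an assumption: it reads every s followed by h as the letter SH. In a lexc/xfst setting these letters could instead be declared as multi-character symbols.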
The syllable structure of Somali is (C)V(C)(C) [items in parentheses are optional], and most words are di- or trisyllabic (root morphemes and affixes are usually mono- or disyllabic) [33]. Af-Somali shares its Cushitic origin with Afaraf, Afaan Oromo and the other languages of the Cushitic family, and it resembles them in vocabulary and in basic word order, which is SOV. The most distinctive characteristic of Af-Somali, however, is double pluralization, illustrated in Table 2.1, in which the independently productive plural suffix -yáal can be added to forms that are already plural, such as nim-á-n ‘men’ or naag-ó ‘women’.
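The (C)V(C)(C) syllable template can be checked mechanically. The sketch below is an illustration under the simplifying assumption that every vowel, short or long, is one syllable nucleus: it builds a regular expression from the template and counts syllables by counting nuclei. The names are hypothetical.

```python
import re

# Consonants (digraphs first) and vowels (long before short), as listed above.
C = r"(?:kh|sh|dh|[btjxdrscgfqklmnwhy'])"
V = r"(?:aa|ee|ii|oo|uu|[aeiou])"

SYLLABLE = rf"{C}?{V}{C}?{C}?"            # the (C)V(C)(C) template
WORD = re.compile(rf"(?:{SYLLABLE})+")    # a word is one or more syllables
NUCLEUS = re.compile(V)                   # each syllable has one vowel nucleus


def fits_template(word: str) -> bool:
    """True if the word can be parsed as a sequence of (C)V(C)(C) syllables."""
    return WORD.fullmatch(word.lower()) is not None


def syllable_count(word: str) -> int:
    """Count syllables as the number of vowel nuclei."""
    return len(NUCLEUS.findall(word.lower()))


print(fits_template("buugag"), syllable_count("buugag"))  # True 2
print(fits_template("abstra"))                            # False
```

The check only enforces the template; it does not know which consonant clusters Af-Somali actually allows, so it accepts many strings that are not Somali words.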
Another important characteristic that distinguishes Af-Somali from the other Cushitic languages is the existence of an unquestionably derivational process that takes inflected plural forms as its base, as illustrated in Table 2.2.
Like any other language, Af-Somali has some notable characteristics: an inflectional system, inflected forms in composition/derivation, conjugational classes, affixation and reduplication. Words are formed from morphemes in Af-Somali in three broad ways: inflection, derivation and compounding. In this work we analyze the inflectional word formation processes of the most important Af-Somali parts of speech, namely nouns, verbs and adjectives, whose word formation processes are presented in the following sections.
2.4.1 Nouns
Grammatically, Af-Somali nouns are encoded morphologically by affixation to roots and stems. As in other related languages, Af-Somali nouns are inflected for gender, number and person. Nouns in Af-Somali, as in other languages, name persons, places, things and abstract entities. Nouns are inherently masculine or feminine. In general, a noun consists of a root and affixes, which together mark a combination of gender and number. The main complication is that there are several declension classes, with specific singular and plural suffixes for groups of classes. Af-Somali nouns are therefore marked for gender, number and determiners, as presented below.
Gender Markers
Somali nouns can be marked for gender to distinguish between masculine and feminine. Some Af-Somali nouns are distinguished only by a difference in tonal accent, but in this thesis we consider only nouns whose gender is marked by a change in form. The Af-Somali gender markers are all suffixes. The masculine and feminine markers are shown in Table 2.3: “ka”, “ha”, “a” and “ga” are masculine markers, and “ta”, “da” and “sha” are feminine markers.
Masculine markers: ka, ha, ga, a
Feminine markers: ta, da, sha
The gender markers are attached to Af-Somali nouns as suffixes to distinguish masculine from feminine. Table 2.4 shows how the markers are suffixed to nouns.
Beyond simply attaching the markers, several rules have to be captured in this study. In Af-Somali the basic gender markers are “ka” and “kii” for masculine and “ta” and “tii” for feminine, but their form changes according to the last character of the noun. For example, if a masculine noun ends in the vowel i or e, the marker “ka” changes into “ga” or “ha” respectively; if a feminine noun ends in the consonant l, the feminine marker “ta” changes into “sha” and the “l” is deleted. Another rule is that all feminine nouns ending in the vowel o take the gender marker “da”.
Pluralization System of Af-Somali Nouns
There are different rules for turning singular Af-Somali nouns into plurals, depending on the gender of the word. Most Af-Somali noun pluralization is inflectional, meaning that it does not change the grammatical category of the word, and most nouns become plural simply by taking suffixes. As described in Table 2.5, monosyllabic Af-Somali nouns can form their plural by partial reduplication of their last consonant, with the vowel ‘a’ inserted between the two copies. If a singular noun ends in a consonant such as b, d, n, l or r, the last consonant is doubled and the vowel ‘o’ is added to form the plural, and the gender changes to feminine. If the noun ends in a consonant such as s, q, c, f, x or I, the suffix ‘yo’ is added to the root to form the plural. Some disyllabic singular nouns become plural by adding the suffix ‘o’ and deleting the letter before the last consonant; their gender remains unchanged. In addition, nouns ending in -e form their plural with the suffix -yaal. Some words borrowed from Arabic form their plural as in Arabic. As a result, Af-Somali nouns are classified into seven declensions, as shown in Table 2.5, based on how they form their plural and on the gender of the plural relative to the singular. For example, if a singular noun is masculine and becomes feminine in the plural, it belongs to declension one [1].
Aabe (masculine, singular) → Aabeyaal (feminine, plural), ‘father’, Dec-7
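The pluralization patterns described for Table 2.5 can be sketched as string functions. This is a minimal Python illustration, not the thesis implementation: the function names and the small lexicon are hypothetical, each stem is assumed to be stored together with its plural pattern (as a lexicon would record its declension), and gender changes are not modeled.

```python
DIGRAPHS = ("kh", "sh", "dh")
VOWELS = set("aeiou")


def final_consonant(stem: str) -> str:
    """Return the last consonant of the stem, treating kh/sh/dh as one letter."""
    if stem.endswith(DIGRAPHS):
        return stem[-2:]
    if stem[-1] in VOWELS:
        raise ValueError(f"{stem!r} does not end in a consonant")
    return stem[-1]


def reduplicate_final(stem: str) -> str:
    """Monosyllables: copy the final consonant with 'a' inserted (buug -> buugag)."""
    return stem + "a" + final_consonant(stem)


def double_final_add_o(stem: str) -> str:
    """Stems in b, d, n, l, r: double the final consonant and add 'o'."""
    return stem + final_consonant(stem) + "o"


def add_yo(stem: str) -> str:
    """Stems in s, q, c, f, x: add the suffix 'yo'."""
    return stem + "yo"


def syncope_add_o(stem: str) -> str:
    """Some disyllables: delete the vowel before the last consonant, add 'o'."""
    last = final_consonant(stem)
    body = stem[: -len(last)]
    if body and body[-1] in VOWELS:
        body = body[:-1]
    return body + last + "o"


def add_yaal(stem: str) -> str:
    """Stems in -e: add the suffix 'yaal' (aabe -> aabeyaal)."""
    return stem + "yaal"


# Hypothetical lexicon: each stem records its plural pattern instead of the
# pattern being guessed from the final letter.
LEXICON = {
    "buug": reduplicate_final,
    "fool": reduplicate_final,
    "aabe": add_yaal,
}


def pluralize(stem: str) -> str:
    return LEXICON[stem](stem)


print([pluralize(s) for s in LEXICON])  # ['buugag', 'foolal', 'aabeyaal']
```

Storing the pattern per stem sidesteps the overlap between the phonological conditions above (a monosyllable may also end in b, d, n, l or r). As a check of the deletion rule, `syncope_add_o("gabadh")` returns `"gabdho"`.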
Determiners are modifiers that add meaning to a noun by being attached to it as suffixes. They are classified into four types according to the meaning they add: articles (Qodob), demonstratives (Tilmaame), interrogatives (Weydimo) and possessives (Lahaansho).
Articles
Af-Somali articles take different forms: -ka and -kii for masculine nouns, and -ta and -tii for feminine nouns. If the person being talked about is far from us, or the event being reported is in the past, ka/ta change into -kii/-tii respectively. The form of the article also changes according to the last letter of the noun it is attached to. For example, when the article “ka” is added to the noun “kabo”, “ka” changes into “ha” and the word becomes “kabaha”. Table 2.6 shows which article is attached to each noun and how it changes. As indicated in Table 2.6, the masculine article marker -k changes into -g when it is suffixed to masculine nouns ending in -g, -w, -aa, -u, -y or -i, and the -k is dropped, leaving only -a, when a masculine noun ends in -h, -x, -q, -c or -kh. Likewise, the feminine article -ta can change into -da or -sha: -t becomes -d when suffixed to a noun ending in -o, -d, -c, -x, -h, -y or (’), and “ta” becomes “sha” when suffixed to a feminine noun ending in -l, in which case the “l” is deleted.
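The article allomorphy described for Table 2.6 amounts to a small set of context-dependent rules. The sketch below transcribes them into Python under stated assumptions: the function name is hypothetical, the final -o → -a change seen in kabo → kabaha is assumed to apply to both genders, and the remote forms simply replace the final a of the article with ii.

```python
# Endings taken from the rules above; str.endswith accepts a tuple.
K_TO_G = ("g", "w", "aa", "u", "y", "i")        # -ka -> -ga
K_DROPPED = ("kh", "h", "x", "q", "c")          # -ka -> -a
T_TO_D = ("o", "d", "c", "x", "h", "y", "'")    # -ta -> -da


def attach_article(stem: str, gender: str, remote: bool = False) -> str:
    """Attach the definite article (-ka/-ta, or -kii/-tii when remote)."""
    tail = "ii" if remote else "a"
    if gender == "masc":
        if stem.endswith("o"):        # kabo + ka -> kabaha (o -> a, k -> h)
            return stem[:-1] + "ah" + tail
        if stem.endswith("e"):        # stated rule: after e, ka -> ha
            return stem + "h" + tail
        if stem.endswith(K_DROPPED):
            return stem + tail
        if stem.endswith(K_TO_G):
            return stem + "g" + tail
        return stem + "k" + tail
    if gender == "fem":
        if stem.endswith("l"):        # the l is deleted and t -> sh
            return stem[:-1] + "sh" + tail
        if stem.endswith("o"):        # assumed o -> a, then t -> d
            return stem[:-1] + "ad" + tail
        if stem.endswith(T_TO_D):
            return stem + "d" + tail
        return stem + "t" + tail
    raise ValueError(f"unknown gender: {gender!r}")


print(attach_article("kabo", "masc"), attach_article("naag", "fem"))  # kabaha naagta
```

Rule order matters here: the -l and -o checks must come before the general cases. In an xfst implementation the corresponding replace rules would be ordered by composing them in sequence.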
Demonstrative suffixes
Like articles, demonstratives are suffixed to nouns; they modify the meaning of the noun by indicating how far away the thing referred to is. The choice depends on the relationship between the subject and the object, that is, on the distance between the speaker and what is being talked about. Af-Somali has three demonstrative noun markers for each gender, as described in Table 2.7: nearness (kan), farness (kaas) and to the left/right (keer) for masculine, and nearness (tan), farness (taas) and to the left/right (teer) for feminine.
Possessive Suffixes
In Af-Somali, as in other languages, possessive suffixes express that something is owned. They are classified into masculine and feminine forms and vary by person, giving six different possessives, as indicated in Table 2.8.
Interrogative Suffixes
Interrogative suffixes are determiners that add a question-like meaning. Like the other determiners, they have masculine and feminine forms: -kee is used for masculine nouns and -tee for feminine nouns, as described in Table 2.9.
2.4.3 Adjectives
Adjectives, in turn, do not form a clearly defined category in Af-Somali. Items such as yár ‘small’ and wéyn ‘big’ are best interpreted as state verbs with a particular defective paradigm. Adjectives form their plural inflectionally through reduplication: the reduplicated plural is formed by prefixing a copy of the first syllable to the stem, and only the second syllable bears high tone. Adjectives can also be marked for person, definiteness and tense. For example, the plural forms of the adjectives cad, cusub and yar are given in Table 2.10.
The verb is the most important part of speech in Af-Somali and is inflectionally more complex than the other parts of speech. Again, a typical verb consists of a root plus a number of affixes. These include derivational affixes (Somali has a passivizing form that can only be applied to verbs with a ‘causative’ argument, and a causative affix that adds such an argument) and a set of inflectional affixes marking aspect, tense and agreement [25]. Verbs show complex alternation patterns, and the basic building blocks of Af-Somali verbs are the root, modifiers, person and conjugation. The most important of these to describe is conjugation, so some properties of conjugation are presented below with examples. Conjugation shows the verb’s tense, aspect and mood. The agreement of person and tense produces 6 different forms of a word, as illustrated in Table 2.11, which also shows the person agreement with tense and the person markers for each of the 6 forms.
As shown in Table 2.11, the zero marker (0) indicates the 1st person singular, 3rd person singular masculine and 3rd person plural; the suffix -t indicates the 2nd person singular, 3rd person singular feminine and 2nd person plural; and the suffix -n indicates the 1st person plural. These markers are then followed by suffixes such as -ay or -een in the past tense and -aa or -aan in the present tense. Af-Somali verbs are classified into five conjugation categories based on their imperative markers.
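Combining the person markers with the tense endings can be sketched as plain concatenation. In this Python sketch the names are hypothetical, the verb tag ‘go’ from Figure 2-1 is used as the root, and the assignment of -een/-aan to the 2nd and 3rd person plural (with -ay/-aa elsewhere) is our assumption; phonological alternations that other roots may undergo are not modeled.

```python
# Person markers as described for Table 2.11: zero, -t or -n.
PERSON_MARKER = {
    "1sg": "", "2sg": "t", "3sg.m": "", "3sg.f": "t",
    "1pl": "n", "2pl": "t", "3pl": "",
}
# (general ending, ending for 2pl/3pl); the split by person is an assumption.
TENSE_ENDINGS = {"past": ("ay", "een"), "present": ("aa", "aan")}
PLURAL_2_3 = {"2pl", "3pl"}


def conjugate(root: str, person: str, tense: str) -> str:
    """Root + person marker + tense ending, with no phonological alternations."""
    ending, plural_ending = TENSE_ENDINGS[tense]
    if person in PLURAL_2_3:
        ending = plural_ending
    return root + PERSON_MARKER[person] + ending


for tense in TENSE_ENDINGS:
    print(tense, [conjugate("tag", p, tense) for p in PERSON_MARKER])
# past ['tagay', 'tagtay', 'tagay', 'tagtay', 'tagnay', 'tagteen', 'tageen']
# present ['tagaa', 'tagtaa', 'tagaa', 'tagtaa', 'tagnaa', 'tagtaan', 'tagaan']
```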
Based on conjugation, Af-Somali verbs fall into two broad categories: a large number of verbs that take only suffixes, and a small number that take both prefixes and suffixes. We first consider the conjugation of verbs that take only suffixes, which are the most common in Af-Somali. These verbs are classified into five conjugations, known as the 1st, 2nd, 3rd, 4th and 5th conjugations. Verbs of the 1st conjugation take no imperative marker and are mostly monosyllabic, as illustrated in Table 2.12.
Secondly, verbs of the 2nd conjugation are mostly formed from other verbs and take the imperative marker “i”. For example, the verb “toos” takes the suffix “i” and becomes a 2nd-conjugation verb, as shown in Table 2.13.
Table 2.14: Example of Af-Somali 3rd. conjugation representation
Lastly, verbs of the 4th conjugation are characterized by the imperative marker “o”, which gives them a different representation, and verbs of the 5th conjugation are characterized by the imperative marker “so”. Table 2.15 gives an example of verb conjugation, showing inflection for person, number, tense and other properties, and how the agreement of person with number and tense yields seven different verb forms.
Morphologically, Af-Somali words are inflectional, like those of other Cushitic languages, but some words are derivational. The derived words of Af-Somali are mostly verbs and adjectives, which can be formed from other word categories; most adjectives are formed from verbs. Some nouns are also morphologically derived from other word classes in the process of word formation. Most Af-Somali verbs can be turned into nouns by adding a suffix such as -a, in some cases doubling the last consonant. For example, the verb “dil” becomes a noun by adding “aa”, giving “dilaa”, and the verb “cun” becomes the noun “cunto” by adding “to”.
Verbs can also be derived from nouns. For example, the noun “aadaan” (prayer) becomes the verb “aadanay” (praying), and the noun “iskaashato” (cooperation) becomes the verb “iskashi”.
Like the other parts of speech, Af-Somali adjectives undergo derivation. There are two sorts of adjectives: ‘basic adjectives’ (a small number), such as yár ‘small’ and wéyn ‘big’, and adjectives formed from nouns and verbs by adding lexical suffixes, such as caan-sán ‘famous’ (cáan ‘fame’), wanaag-sán ‘good’ (wanáag ‘goodness’) and jar-án ‘chopped’ (jár ‘to break’). Compounding, on the other hand, creates words from two different words, such as a verb and a noun or an adjective and a noun.
A number of approaches are widely used in computational morphology, based on concepts from automata theory, probability, the principle of analogy and information theory. They are broadly categorized into rule-based and corpus-based approaches.
Corpus-based approaches are statistical in nature and do not strictly follow an explicit linguistic theory [32]. A suitable machine learning algorithm is used to train the system and to collect the necessary information and features from the corpus, and the knowledge acquired is then used to perform morphological analysis [32]. Depending on the type of text corpus used, corpus-based approaches can be further divided into supervised and unsupervised approaches. Supervised approaches use annotated corpora, while unsupervised approaches use raw text such as that found in newspapers and books. As noted above, these approaches need a large corpus to train the algorithm, which makes them difficult to apply to under-resourced languages like Somali, where they may not produce efficient, good-quality output.
Well-resourced languages mostly use the machine learning approach, which requires a large word corpus, electronic dictionaries, newspapers and other documents found on the Internet. These languages use it to avoid the overhead of the rule-based approach; examples include English [30] and Arabic [13]. Limited research has been done with corpus-based approaches for local languages such as Amharic [22] and Ge’ez [34], and most local languages have used a rule-based approach, specifically two-level morphological analysis.
The rule-based approach strictly follows an explicit linguistic theory of morphology laid down by an expert. Kazakov and Munandhar [32] state that this approach makes it possible to incorporate sophisticated linguistic theories, such as generative phonology, into computational morphology [32]. Because they rely on linguistic theory, systems developed with rule-based approaches are often efficient and produce better-quality output [28]. Different rule-based methods are used to develop morphological analyzers, among them paradigm-based methods and finite state automata.
In the paradigm-based method, each word category of a language, such as nouns, verbs, adjectives, adverbs and postpositions, is classified into paradigms according to its morphophonemic behavior, and a paradigm-based morphological compiler is then used to develop the morphological analyzer.
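A paradigm-based analyzer can be sketched as a table of paradigms, each listing suffixes with their tags, plus a lexicon that assigns stems to paradigms. The sketch below is a hypothetical illustration using forms mentioned in this chapter (naag ‘woman’, tag ‘go’); the paradigm names and tag strings are ours.

```python
# Hypothetical paradigms: each is a list of (suffix, tags) pairs.
PARADIGMS = {
    "fem_noun": [
        ("", "+Noun+Fem+Sg"),
        ("ta", "+Noun+Fem+Sg+Def"),
        ("tii", "+Noun+Fem+Sg+Def+Remote"),
        ("o", "+Noun+Pl"),
    ],
    "verb_past": [
        ("ay", "+Verb+Past+1Sg/3SgM"),
        ("tay", "+Verb+Past+2Sg/3SgF"),
        ("nay", "+Verb+Past+1Pl"),
        ("teen", "+Verb+Past+2Pl"),
        ("een", "+Verb+Past+3Pl"),
    ],
}
# Each stem is assigned to exactly one paradigm.
STEMS = {"naag": "fem_noun", "tag": "verb_past"}


def analyze(word: str) -> list[str]:
    """Return every stem+tags analysis whose stem and suffix rebuild the word."""
    analyses = []
    for stem, paradigm in STEMS.items():
        if not word.startswith(stem):
            continue
        rest = word[len(stem):]
        for suffix, tags in PARADIGMS[paradigm]:
            if rest == suffix:
                analyses.append(stem + tags)
    return analyses


print(analyze("naagta"))   # ['naag+Noun+Fem+Sg+Def']
print(analyze("tagteen"))  # ['tag+Verb+Past+2Pl']
```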
The Finite State Automata (FSA) based method uses regular expressions and accepts or rejects strings of a given language. In general, an FSA models the behavior of a system consisting of states, transitions and actions. An FSA starts in its initial state and reads its input one symbol at a time; if it is in one of its final states when the input is exhausted, it accepts the input.
Within computational morphology, a very significant advance came with the demonstration that phonological rules could be implemented as finite state transducers (FSTs), and that rule ordering could be dispensed with by using FSTs that relate the surface and lexical levels directly, so-called “two-level” morphology (TLM). Because FSTs are bidirectional, a transducer that performs analysis (surface input to lexical output) can also serve as one that performs generation (lexical input to surface output) [32]. TLM was devised to handle morphological analysis and generation bidirectionally. The approach is based on two lexica (one for the underlying and one for the surface word forms) and a set of morphological rules. The rules establish whether a given sequence of characters at the surface level (as it appears in the text) can correspond to a sequence of symbols used to represent the morphemes in the lexicon; in other words, the rules map the two strings to each other. TLM is currently a very popular method in computational morphology [32]. The main benefits of FSTs for NLP stem from several properties of finite-state devices: true representation, modularity, compactness, efficiency and reversibility.
True representation means that the kinds of phonological and morphological rules common in linguistic theories can be implemented directly as finite-state relations, so implementing linguistically motivated rules as FSTs is straightforward. Modularity follows from the closure properties of regular languages and relations, which provide various ways of combining regular expressions and support a variety of operations on the languages those expressions denote. For example, closure under union allows two grammar fragments to be developed separately and then combined in a single operation. Probably the most useful operation under which transductions are closed is composition, which is the central vehicle for implementing replace rules. Compactness means that finite-state automata can be minimized: for a given language, an automaton with the minimal number of states can always be constructed, and toolboxes apply minimization explicitly or implicitly to reduce storage requirements. Efficiency comes from determinism: when an automaton is deterministic, recognition is optimally efficient (linear in the length of the string to be recognized). Finite-state automata can always be determinized, and toolboxes take advantage of this to improve time efficiency. Finally, finite-state automata and transducers are inherently declarative; it is the application program that implements recognition or generation. In particular, transducers can map strings from the upper language to the lower language or vice versa with no change to the underlying finite-state device [28].
Finite-state technology (FST) denotes the use of finite-state devices, such as automata and transducers, in natural language processing. Since early work demonstrated the applicability of this technology to linguistic representation, FST has been considered adequate for describing the phonological and morphological processes of the world’s languages [32]. To understand how to build linguistic applications, we first need to be acquainted with the basics of how a finite-state machine works.
So far, analyzing a word with a network has yielded one of two responses: accept, indicating that the word is in the language of the network, or reject, indicating that it is not. While this can be valuable, for instance in spell-checking, finite-state networks can store and return much more interesting information [28].
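Such an accept/reject network can be sketched as a deterministic finite-state automaton stored as a transition table. The toy automaton below is hypothetical and covers only buug and buugag; it reads one letter per step, which makes recognition linear in the length of the word.

```python
# A deterministic FSA as a transition table: (state, letter) -> next state.
TRANSITIONS = {
    (0, "b"): 1, (1, "u"): 2, (2, "u"): 3, (3, "g"): 4,  # stem b-u-u-g
    (4, "a"): 5, (5, "g"): 6,                            # plural -ag
}
START = 0
FINAL_STATES = {4, 6}  # accept after "buug" or "buugag"


def accepts(word: str) -> bool:
    """Run the automaton; accept only if the input ends in a final state."""
    state = START
    for letter in word:
        state = TRANSITIONS.get((state, letter))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in FINAL_STATES


print([accepts(w) for w in ["buug", "buugag", "buuga", "buugs"]])
# [True, True, False, False]
```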
Within computational morphology, a very significant advance came with the demonstration that phonological rules could be implemented as finite state transducers [11], and that rule ordering could be dispensed with by using FSTs that relate the surface and lexical levels directly [11], so-called “two-level” morphology. A second important advance was the recognition by [11] that a cascade of composed FSTs could implement the two-level model. Finite-state techniques are probably the most prevalent approach in automatic morphology systems, since their simplicity and efficiency are unequaled.
FSAs can recognize particular patterns but do not, by themselves, allow any analysis of word forms. For morphology we therefore use finite state transducers (FSTs), which allow the surface structure to be mapped onto a list of morphemes. FSTs are useful for both analysis and generation, since the mapping is bidirectional [28].
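The bidirectionality of FSTs can be illustrated with a toy transducer whose arcs carry an upper (lexical) and a lower (surface) label. This Python sketch is an illustration, not a two-level rule compiler: labels may be multi-character or empty, the network is acyclic (so the search always terminates), and its coverage of the single stem buug is hypothetical. The same arcs serve for generation and for analysis; only the side that is read changes.

```python
# Arcs: (source, upper label, lower label, target). Labels may be empty
# (epsilon) or multi-character, such as the tags +Noun and +Pl.
ARCS = [
    (0, "b", "b", 1), (1, "u", "u", 2), (2, "u", "u", 3), (3, "g", "g", 4),
    (4, "+Noun", "", 5),
    (5, "+Sg", "", 6),
    (5, "+Pl", "ag", 6),  # plural: final consonant copied with inserted "a"
]
FINAL_STATES = {6}


def transduce(text: str, direction: str) -> list[str]:
    """'down' maps lexical -> surface (generation); 'up' maps surface -> lexical."""
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")
    read, write = (1, 2) if direction == "down" else (2, 1)
    results = []

    def walk(state, pos, output):
        if pos == len(text) and state in FINAL_STATES:
            results.append(output)
        for arc in ARCS:
            if arc[0] == state and text.startswith(arc[read], pos):
                walk(arc[3], pos + len(arc[read]), output + arc[write])

    walk(0, 0, "")
    return results


print(transduce("buug+Noun+Pl", "down"))  # ['buugag']
print(transduce("buugag", "up"))          # ['buug+Noun+Pl']
print(transduce("buug", "up"))            # ['buug+Noun+Sg']
```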
German, Arabic, etc., as well as Afaraf, Afaan Oromo, Amharic and others. Xerox finite state technology (XFST) is a language for regular expressions that can be compiled into finite-state networks, and it is used here to analyze Af-Somali morphology. It comes bundled with a set of tools for compiling and working with FSTs, including two components known as lexc and xfst. lexc is a compiler for lexicons written in the lexc language, which is specifically designed for handling morphotactics (the syntax of morphemes) in natural languages; xfst is the core tool, providing an interface to the finite-state calculus for building, accessing and manipulating finite-state networks, and a compiler for regular expressions and replace rules.
Lexicon Compiler
The lexicon compiler (lexc) is a finite-state tool developed by Xerox for defining two-level lexicons. lexc is only one of several ways to specify finite-state transducers, but it is especially designed to facilitate the work of the lexicographer [28].
Lexicons and morphotactic information are encoded in the lexc language, which is a kind of right-recursive phrase-structure grammar, and are compiled into finite-state transducers, as shown in Figure 2.1. Finite-state transducers (FSTs) are data structures that encode regular relations [28], that is, mappings between two regular languages. For human convenience, a finite-state relation can be visualized as having an upper-side regular language and a lower-side regular language, with each string in one language related to one or more strings in the other. By convention, the upper-side (analysis) strings of an FST compiled from a lexc description consist of underlying morphemes (strings of phonemes and morphophonemes) and multi-character symbol tags that identify the morphemes, such as +Noun, +Verb, +Adj (adjective), +Conj (conjugation), +ImpeV (imperative verb), +Masc (masculine), +Fem (feminine), +Sg (singular) and +Pl (plural) [3]. lexc accepts a text file containing a user-defined lexicon written in the following syntax:
Lexical-item Continuation-class;
The lexical item is usually the unmarked form of the word (the root or headword given in a dictionary). In this work the lexical item is the stem (in most cases the root) to which inflectional affixes are attached, i.e. a free morpheme. The continuation class can be a pointer to another lexicon or the end-of-string marker. The example in Figure 2-1 shows two entries for ‘tag’ (go): one is followed by the end-of-string marker ‘#’, and the other points to the past-tense continuation class, where the aspect forms of the word are defined.
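The way lexc follows continuation classes can be imitated in Python to show which analysis/surface pairs the two tag entries produce. This is a sketch, not lexc itself: the lexicon name PastTense, the tag strings and the past endings (taken from the person and tense suffixes described earlier in this chapter) are our assumptions. The enumeration only terminates because the lexicons are acyclic, whereas lexc compiles them into a single transducer.

```python
# Hypothetical lexicons modelled on lexc: each entry is
# (upper string, lower string, continuation class); "#" ends the word.
LEXICONS = {
    "Root": [
        ("tag", "tag", "#"),          # the bare stem, ending the word
        ("tag", "tag", "PastTense"),  # the same stem, continued by past endings
    ],
    "PastTense": [
        ("+Past+1Sg/3SgM", "ay", "#"),
        ("+Past+2Sg/3SgF", "tay", "#"),
        ("+Past+1Pl", "nay", "#"),
        ("+Past+2Pl", "teen", "#"),
        ("+Past+3Pl", "een", "#"),
    ],
}


def expand(lexicon: str = "Root", upper: str = "", lower: str = ""):
    """Yield every (analysis, surface) pair by following continuation classes."""
    if lexicon == "#":
        yield upper, lower
        return
    for up, low, continuation in LEXICONS[lexicon]:
        yield from expand(continuation, upper + up, lower + low)


for analysis, surface in expand():
    print(f"{analysis:20} {surface}")
```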
The xfst part of this framework is mainly concerned with realization, i.e. surface forms, and with phonological alternation rules. This component takes as input the output of the lexc transducer (the lexical grammar), whose stems carry grammatical features labeled with tags, and passes it through additional rules to obtain the acceptable surface forms. xfst compiles the lexc grammar into an FST and the rule files into rule FSTs. Figure 2-2 illustrates the components of a morphological analyzer built with finite-state transducers, where the .o. operator represents composition.
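The composition of the lexicon with the rules can be imitated with finite relations. In the Python sketch below, which is an illustration rather than xfst, relations are sets of string pairs, the rule side is given as a function, and the .o. operator corresponds to relational composition. The tag strings and the ^ boundary symbol are hypothetical, and the single rule is the l + t → sh alternation described for feminine articles earlier in this chapter.

```python
import re

# Lexicon relation: (analysis, intermediate form) pairs; "^" marks a morpheme
# boundary. In the real system this side comes from the compiled lexc file.
LEXICON = {
    ("naag+Noun+Fem+Def", "naag^ta"),
    ("hal+Noun+Fem+Def", "hal^ta"),
}


def rules(intermediate: str) -> set[str]:
    """Alternation rules as a relation: one intermediate form -> surface forms."""
    surface = re.sub(r"l\^t", "sh", intermediate)  # l + t -> sh (l deleted)
    surface = surface.replace("^", "")            # erase morpheme boundaries
    return {surface}


def compose(relation, rule_relation):
    """Relational composition: (a, c) whenever (a, b) is in relation and b maps to c."""
    return {(a, c) for a, b in relation for c in rule_relation(b)}


analyzer = compose(LEXICON, rules)     # analogous to lexicon .o. rules
generate = dict(analyzer)              # analysis -> surface
analyze = {s: a for a, s in analyzer}  # the inverse relation

print(generate["hal+Noun+Fem+Def"])  # hasha
print(analyze["naagta"])             # naag+Noun+Fem+Def
```

xfst performs the same operation on compiled, possibly infinite, regular relations, which is why the whole analyzer can still be inverted and applied in both directions.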
Figure 2-2: Creation of a lexical transducer
2.8 Summary
In this chapter, we introduced background information on Af-Somali, its morphology and its most important parts of speech. We also described finite-state technology, which has been successfully applied to computational morphology. A regular expression can be compiled into a finite-state network that encodes the regular language the expression denotes, and complex finite-state networks can be built from smaller ones using operations such as union, concatenation, composition, complementation, subtraction and intersection.
Chapter 3 : Related work
3.1 Introduction
In this chapter, we present systems developed for the computational morphological analysis of different languages of the world and the approaches used to develop them. Specifically, we look in detail at the rule-based, finite-state approaches used for the morphological analyzers of Ethiopian and Cushitic languages related to Af-Somali.
Cagri [17] developed TRmorph, a two-level morphological analyzer for Turkish. The system is implemented entirely with the freely available Stuttgart Finite State Transducer tools (SFST). As Cagri [17] explains, SFST is a freely available finite-state tool set aimed particularly at implementing morphological analyzers. It uses a simple specification language based mainly on regular expressions, with the addition of the well-known two-level operators that are particularly useful for implementing phonological (or orthographic) alternations. TRmorph was analyzed and evaluated with real-world data during its development and has been tested on two relatively large corpora, the METU corpus and Turkish Wikipedia. According to Cagri [17], the same process was repeated for successfully analyzed words, where there were no errors but some ambiguous analyses.
Elaine [18] developed a morphological analyzer for Irish using a finite-state two-level description with the Xerox Finite-State Tools. The system encodes the inflectional morphology of all inflected parts of speech in modern Irish: the morphotactics of stems and affixes are encoded in the lexicon, and word mutations are implemented as a series of replace rules encoded as regular expressions. A major advantage Elaine [18] gains from the finite-state two-level implementation is its inherent bidirectionality: the same system is used for both analysis and generation of word forms. The system, designed for broad coverage of the language, was evaluated against the most frequently used words in a corpus of contemporary Irish texts. Finally, Elaine [18] suggests extending the system to derivational morphology and to dialectal or historical word forms, which it does not implement.
In general, morphological analyzers can be used as components of many NLP applications, such as spelling checkers and correctors, stemmers and text-to-speech synthesizers [18].
In addition, Xuri [30] developed an English morphological analyzer using a machine learning approach. The system consists of two closely related components, morphological rule learning and morphological analysis. As Xuri [30] describes, unsupervised learning was employed to obtain a set of affix transformation rules, and the reported experiment shows that the analyzer performs satisfactorily.
However, as stated in [30], problems remain, the most difficult being combinatory ambiguity. This shows that a larger context, such as part of speech or the context between words, is needed to analyze such words correctly. Machine learning approaches generally require a large word list in a training corpus, and their analyses do not strictly follow the linguistic rules of the language.
Gulshat and Ilyas [19] developed a rule-based morphological analyzer and a morphological disambiguator for Kazakh, an agglutinative language. In their analyzer, the alternation and morphotactic rules are represented by two-level morphology rules, and the Foma finite-state compiler is employed. As Gulshat and Ilyas [19] describe, the morphotactic rules and possible morphemes are defined in the lexicon file, the alternation rules are defined separately, and the rules are composed with the lexicon in a Foma file. The system was tested and evaluated, and it represents a first step toward a morphological analyzer for Kazakh. It works in both directions, between the lexical and surface levels, and because of ambiguity in the language there is no one-to-one mapping between surface and lexical forms, so the system can produce more than one result.
Kenneth [20] developed a morphological analyzer and generator for Arabic using the Xerox finite-state toolkit. Kenneth [20] describes how lexicons and morphotactic information are encoded in the lexc language, a kind of right-recursive phrase-structure grammar, and compiled into finite-state transducers, while alternation rules that perform deletion, epenthesis, assimilation and metathesis are written in the twolc language and/or in a notation known as replace rules. The system, which includes about 4,930 roots, was tested and evaluated with encouraging performance. For any language, having a morphological analyzer is a step forward in its language technology.
Michael [22] developed HornMorpho, a morphological analyzer for three Ethiopian languages: Amharic, Afaan Oromo and Tigrinya. The system implements finite-state transducers in the Python programming language, with a separate transducer for each language. It was evaluated with a web crawler developed by Biniam Gebremicheal, and Michael Gasser [22] states that, although more testing is called for, the evaluation suggests excellent coverage of Amharic and Tigrinya verbs whose roots are known. Although Oromo, a Cushitic language, does not exhibit the root+template morphology typical of Semitic languages, it is still convenient to handle its morphology with the same technique, because there are some long-distance dependencies and because the grammatical output this approach yields is useful for analysis. For Amharic, the system is apparently able to analyze at least the great majority of nouns and adjectives; it treats all Amharic words other than verbs, nouns and adjectives as unanalyzed lexemes. The tool is less convenient for Afaan Oromo, however, because the language is complicated by the great variation in how Oromo writers use double consonants and vowels [22].
Afar (Afaraf) is another closely related language, for which Ali Mohamed [2] developed the first morphological analyzer, using a finite-state transducer. As Ali describes, 312 tokens were manually annotated to evaluate the analyzer: 200 verbal (100 consonant-initial and 100 vowel-initial), 80 nominal and 32 adjectival words from three popular Afar magazines published in Ethiopia and Djibouti. Of these, 192 verbal, 75 nominal and 28 adjectival words were correctly analyzed, and the results were evaluated by a human reader familiar with the language. An output was considered correct only if it found all legal combinations of roots and grammatical structures for a given word form and included no incorrect roots or structures [2].
3.5 Summary
Limited research has been conducted on developing morphological analyzers for Cushitic languages such as Afaan Oromo [22] and Afaraf [2], and the analyzers for both languages use a rule-based approach with finite-state transducers. However, to the best of our knowledge, no research has so far been conducted on automatic morphological analysis for Af-Somali. The absence of morphological analysis systems limits efforts to make computers work comfortably with Somali.
Chapter 4 : Design of Af-Somali Morphological Analyzer
4.1 Introduction
This chapter presents the design of Af-Somali morphological categories and phonological rules for a computational model built with the Xerox finite-state toolkit. It presents the general architecture of the lexical FSTs for Af-Somali morphological analysis and the morphotactics of the language, that is, how morphemes co-occur. It also shows the morphotactics of each word class separately in the lexc formalism, and the alternation rules using the xfst interface.
The main objective in the design of the morphological analyzer is to construct a network that accepts all and only the valid Somali words and delivers the right analysis. In this section, we present a detailed overview of the design of the morphological analyzer and its components.
The construction of the morphological analyzer with finite-state transducers is broken down into two large components: the lexicon/morphotactics part and the phonological (alternation) rules part. The morphotactics of the language, which describe which stems and affixes can co-occur and in what order, are captured in the lexicon, while phonological and morphophonological alternations between underlying forms and surface (spoken or written) forms are implemented as alternation rules.
Figure 4-1: Af-Somali morphological analyzer architecture design
Another common application of finite-state techniques is the handling of words whose roots or stems are not found in the lexicon by means of guessers, in which the lexical component is replaced by a phonotactic component characterizing the possible shapes of roots or stems. A guesser is needed to recognize words that are not in the lexicon, because not all words can be collected, and collecting them is time-consuming.
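The guesser idea can be illustrated with a small Python sketch. The thesis does not include a guesser, so everything here is an assumption for illustration: the root-shape pattern `ROOT_SHAPE` is a hypothetical approximation built from the Af-Somali consonants and vowels, and `guess` tries only three V1 suffixes copied from Appendix-B. A word is proposed as an analysis when it ends in a known suffix and the remaining stem has a plausible root shape.

```python
import re

# Hypothetical phonotactic root shape: an optional initial consonant followed by
# one or more (vowel, consonant) groups. Digraphs are tried before single letters.
CONSONANT = r"(?:kh|sh|dh|[bcdfghjklmnqrstwxy'])"
VOWEL = r"(?:aa|ee|ii|oo|uu|[aeiou])"
ROOT_SHAPE = re.compile(rf"{CONSONANT}?(?:{VOWEL}{CONSONANT})+")

# Three V1 suffixes taken from Appendix-B (surface suffix -> tags).
V1_SUFFIXES = {"aa": "+V1+pres", "ay": "+V1+paste", "i": "+V1+Sg+inf"}


def guess(word):
    """Return candidate analyses for a word that is missing from the lexicon."""
    candidates = []
    for suffix, tags in V1_SUFFIXES.items():
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if ROOT_SHAPE.fullmatch(stem):
                candidates.append(stem + tags)
    return candidates


print(guess("tagay"))  # ['tag+V1+paste']
```

In a real finite-state guesser the root pattern would be a regular expression compiled into the network in place of the root lexicon; the Python version only mimics that behavior for single words.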
4.2.1 Lexicon/ Morph-tactics
The design of the tags is very important in the development of morphological analyzers, since the tags deliver the linguistic information carried by the word being analyzed. In this system, the morphological analyses of Somali word forms are expressed with the symbols listed in Table 4.1.
6 Imperative +imp(imperative)
8 Possessives 1st.Sg,2nd.Sg,3rd.masc,3rd.fem,1st.pl,2nd.pl,3rd.pl
9 Interrogatives +inter(interrogative)
10 Infinitive +inf(infinitive)
After the various affixes of the morphology were identified, the order in which they attach to verbal, nominal and adjectival stems was encoded in the lexicon database.
The lexicon component is a transducer that accepts as input only valid Somali stems/roots followed by legal sequences of tags, and produces from these an intermediate form in which the tags are replaced by the morphemes they correspond to. Within the lexicon, stems are assigned to separate classes depending on the inflection they require. Each stem class has an associated continuation class, in which morphological tags and affixes are concatenated to the stem. Internal modifications (ablaut) of stems have also been implemented in the lexicon. This lexicon transducer is written in a formalism called lexc, which is well suited to lexicon construction and to expressing morphotactics. For example, in the analyzer being constructed, the lexicon FST performs the mappings shown in Table 4.2.
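To illustrate how continuation classes build pairs of analyses and intermediate forms, the following Python sketch walks a tiny lexicon modeled on Appendix-B. It is neither lexc nor the thesis code: the dictionary layout `LEXICONS`, the function `paths`, and the `^` boundary inside the suffix strings (the boundary symbol mentioned in this chapter) are assumptions made for illustration. Each entry is an (upper, lower, continuation) triple, and `#` ends the word, as in lexc.

```python
# Hypothetical miniature of the Appendix-B verb lexicon. Each LEXICON name maps to
# (upper, lower, continuation) entries; "#" ends a word, as in lexc.
# The "^" morpheme boundary in the lower strings is an illustrative assumption.
LEXICONS = {
    "Root": [("faalal", "faalal", "V1suffixing")],
    "V1suffixing": [
        ("+V1+Sg+inf", "^i", "#"),
        ("+V1+pres", "^aa", "#"),
        ("+V1+paste", "^ay", "#"),
    ],
}


def paths(lexicon="Root", upper="", lower=""):
    """Yield (analysis, intermediate form) pairs by following continuation classes."""
    for up, low, continuation in LEXICONS[lexicon]:
        if continuation == "#":
            yield upper + up, lower + low
        else:
            yield from paths(continuation, upper + up, lower + low)


for analysis, intermediate in paths():
    print(analysis, "->", intermediate)
```

This prints `faalal+V1+Sg+inf -> faalal^i`, `faalal+V1+pres -> faalal^aa` and `faalal+V1+paste -> faalal^ay`. A compiled lexc network represents the same pairs as paths through one transducer instead of enumerating them.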
All root words and morphotactic rules were entered into the lexicon database, and all spelling rules were entered into the rules database. Separate FSTs were created for the lexicon and the rules and then combined into one large FST by the composition operation. Accordingly, for each word class we created a separate lexicon and set of alternation rules, as described in the following sections.
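Composition can be illustrated on finite relations. In the Python sketch below, which is an illustration under stated assumptions and not XFST, each transducer is modeled as a finite set of (input, output) string pairs; real FSTs can encode infinite relations, which this model cannot. The relation contents are hypothetical, using the verb tag 'go' with V1 suffixes from Appendix-B. Inverting the composed relation shows why the same network can serve for both generation and analysis.

```python
def compose(first, second):
    """Compose two finite relations given as sets of (input, output) pairs."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


def invert(relation):
    """Swap the input and output sides of a relation."""
    return {(b, a) for (a, b) in relation}


# Hypothetical lexicon relation: analysis -> intermediate form ("^" = boundary).
lexicon = {("tag+V1+paste", "tag^ay"), ("tag+V1+Sg+inf", "tag^i")}
# Hypothetical rule relation: intermediate form -> surface form.
rules = {("tag^ay", "tagay"), ("tag^i", "tagi")}

generator = compose(lexicon, rules)  # analysis -> surface
analyzer = invert(generator)         # surface -> analysis

print(sorted(s for a, s in generator if a == "tag+V1+paste"))  # ['tagay']
print(sorted(a for s, a in analyzer if s == "tagay"))          # ['tag+V1+paste']
```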
Having completed the first part of the grammar, we now turn to the alternation rules component. The idea is to construct a set of ordered rule transducers that modify the intermediate forms output by the lexicon component. At the very least, we need to remove the ^ symbol, which marks morpheme boundaries, before producing valid surface forms. The role of the alternation rules is to modify the output of the lexicon transducer according to phonological and morphophonological rules. For the example in Table 4.2, we have seen that the Af-Somali verb3 root cadd, concatenated with the imperative ee and the infinitive marker eyn, surfaces as caddeyn (clarifying). However, when the infinitive marker eyn is suffixed to the double vowel ee, the last vowel of the double vowel, e, is replaced with the character y.
One way to produce the correct verb3 forms is to always represent the infinitive suffix as the morpheme eyn, as we have done, and then subject these forms to alternation rules that eliminate the stem-final double vowel so that only the infinitive suffix is added. This, among other things, is the task of the alternation rules component: to produce valid surface forms from the intermediate forms output by the lexicon transducer. Since alternation-rule FSTs that are conditioned by their environment are very difficult to construct by hand, we use the replace-rule formalism of xfst to compile the necessary rules into FSTs, which are then combined with the regular-expression composition operator (.o.).
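The cadd example can be replayed with a Python sketch that applies ordered rewrite rules to an intermediate form. This is not xfst: the intermediate form `cadd^ee^eyn` and both rules are assumptions based on the description above, and Python's `re.sub` stands in for compiled replace-rule transducers. The comments give a rough xfst-style rendering of each rule, also only as an illustration. Order matters here: boundary deletion has to run last, because the first rule uses `^` as its right context.

```python
import re

# Ordered rules; applying them one after another mirrors composing the rule
# transducers with .o. (adequate here because the contexts do not interact).
RULES = [
    # roughly the xfst rule  [e e] -> 0 || _ %^ e y n
    # (delete a stem-final "ee" when the infinitive "^eyn" follows)
    (re.compile(r"ee(?=\^eyn)"), ""),
    # roughly the xfst rule  %^ -> 0  (delete morpheme boundaries)
    (re.compile(r"\^"), ""),
]


def apply_rules(intermediate):
    """Map an intermediate form from the lexicon to a surface form."""
    surface = intermediate
    for pattern, replacement in RULES:
        surface = pattern.sub(replacement, surface)
    return surface


print(apply_rules("cadd^ee^eyn"))  # caddeyn
print(apply_rules("cadd^ee"))      # caddee
```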
Somali has several phonological alternations involving reduplication, lenition, vowel harmony and tone. In the following, we describe the design of the alternation rules and illustrate them with examples.
As mentioned above in the development of the Af-Somali verb lexicon, we classified the verbs into five groups, V1, V2, V3, V4 and V5, and the V1 verbs are illustrated in the figure above. The figure also shows a lexicon called verbs, which contains five sub-lexicons (v1, v2, v3, v4 and v5), each of which has a sub-lexicon called v_suffixing; the detailed description of the lexicon is given in Appendix-B.
The V_suffixing sub-lexicon contains all the suffixes attached to the root verbs, defined in separate lexicons as shown in Figure 4-2. In this lexicon, we present the morphemes that combine with the root verbs and the order in which they co-occur.
We have also presented the Af-Somali verb finite-state network, which shows the root verb, its morphemes and their order, in Figure 4-3. The network follows Xerox finite-state conventions, starting from the root verb and continuing until the end of the word. In Figure 4-3, the circles represent states, the arcs (arrows) are labeled with the tags, and a double circle indicates a final state.
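The state-and-arc structure of such a network can be sketched in Python as a small deterministic automaton. The states and arcs, and the choice of the V2 verb toos with the tags +V2, +Sg, +Pl and +inf (taken from Appendix-B), are illustrative assumptions rather than a transcription of Figure 4-3.

```python
# Hypothetical network for the V2 verb toos, built from V2 tags in Appendix-B.
# Each state maps arc labels to the next state; FINAL holds the final states
# (the double circles in a finite-state network diagram).
ARCS = {
    0: {"toos": 1},
    1: {"+V2": 2},
    2: {"+Sg": 3, "+Pl": 4},
    3: {"+inf": 5},
}
FINAL = {3, 4, 5}


def accepts(symbols):
    """Return True if the symbol sequence leads from state 0 to a final state."""
    state = 0
    for symbol in symbols:
        next_state = ARCS.get(state, {}).get(symbol)
        if next_state is None:
            return False
        state = next_state
    return state in FINAL


print(accepts(["toos", "+V2", "+Sg", "+inf"]))  # True
print(accepts(["toos", "+V2"]))                 # False: state 2 is not final
```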
In general, we have described the root/stem lexicon and its morphotactics with examples, as shown in Table 4.3. For example, the morphotactics of the Af-Somali second-group verbs (V2) are illustrated in Figure 4-4, which also presents the finite-state network for the example verbs toosi and caddee, showing how second- and third-group Af-Somali verbs are generated and the order in which their morphemes co-occur.
Lexical level Toos +V +imp +Sg +inf +pers +paste The word
Figure 4-4: Example representation of Af-Somali second and third group verb FSN
To construct the finite-state transducers for the alternation rules, we first defined the Af-Somali alphabet: the consonants ’, b, t, j, x, kh, d, r, s, sh, dh, c, g, f, q, k, l, m, n, w, h, y (uppercase ’, B, T, J, X, KH, D, R, S, SH, DH, C, G, F, Q, K, L, M, N, W, H, Y) and the five vowels a, e, i, o, u. Af-Somali also has five long vowels: aa, ee, ii, oo, uu. Some vowels in certain words are dropped when a suffix beginning with a vowel is attached; a detailed description of Af-Somali alternations is presented in Appendix-A.
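Because kh, sh and dh are single letters and the long vowels behave as units, a rule alphabet has to treat them as single symbols. The Python sketch below segments a word that way by greedy left-to-right matching. The function name `segment` is hypothetical, and greedy matching is a simplification that ignores cases where, for example, k and h belong to different morphemes.

```python
# Letters written with two characters, plus the five long vowels.
MULTI_CHAR_UNITS = {"kh", "sh", "dh", "aa", "ee", "ii", "oo", "uu"}


def segment(word):
    """Split a lowercase Af-Somali word into letter units, longest match first."""
    units, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in MULTI_CHAR_UNITS:
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


print(segment("xadhig"))  # ['x', 'a', 'dh', 'i', 'g']
```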
in <e>. For example, tag ‘go’ takes the infinitive marker -i and becomes tagi ‘to go’, but when the 3rd.Pl past-tense suffix -een is added, the verb becomes tageen ‘they went’; that is, i is replaced by e. In Af-Somali verbs we also have to consider the replacement of l by sh: when a verb ending in l takes the 3rd.Sg.fem marker t, the l and t are realized together as sh, as represented in Figure 4-6 and exemplified in Table 4.4.
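Both alternations can be sketched as string rewrites over intermediate forms in Python. The intermediate forms with the `^` boundary and the suffixes -een and -tay (from the V1 entries in Appendix-B) are assumptions, and the verb dil ‘kill’ is an illustrative example chosen for this sketch rather than an item from the thesis data. Only the l + t rule is implemented as a rewrite; the i/e difference is represented simply by the lexicon choosing -een rather than -i.

```python
import re


def realize(intermediate):
    """Map an intermediate verb form to its surface form."""
    # A root-final l followed by the boundary and the t marker surfaces as sh.
    surface = re.sub(r"l\^t", "sh", intermediate)
    # Delete any remaining morpheme boundaries.
    return surface.replace("^", "")


print(realize("tag^een"))  # tageen
print(realize("tag^tay"))  # tagtay
print(realize("dil^tay"))  # dishay
```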
Figure 4-7: person morpheme realization
Figure 4-8: Af-Somali noun lexicon
In addition, there is a separate lexicon that contains the suffix tags and the order in which these suffixes co-occur with the root nouns, as illustrated in Figure 4-9. The figure shows the general co-occurrence of the root noun with its morphemes as a finite-state network, including the states the transducer passes through. It covers only the first declension, D1_f (feminine nouns); the detailed description of the noun lexicon is given in Appendix-C.
In general, the morphemes attached to root nouns mark number (Sg, Pl), definiteness (def, indef), interrogatives (inter), possessives and demonstratives, as presented in Figure 4-10, which shows the finite-state network of Af-Somali nouns.
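Noun suffixation of this kind can be sketched in Python as gender-keyed suffix tables. The suffixes are the N1 and N2M entries from Appendix-C. The gender assignment in `GENDER` (hees ‘song’ as feminine, sannad ‘year’ as masculine) and the function names are assumptions for illustration, and the sketch ignores possessives and the assimilated allomorphs of the article (such as ha in Appendix-C).

```python
# Definite-article and determiner suffixes as listed in Appendix-C.
SUFFIXES = {
    "M": {"+defM": "ka", "+defM+inter": "kee", "+defM+close": "kan",
          "+defM+near": "kas", "+defM+far": "keer"},
    "F": {"+defF": "ta", "+defF+inter": "tee", "+defF+close": "tan",
          "+defF+near": "tas", "+defF+far": "teer"},
}
# Hypothetical gender assignment for two N1 roots from Appendix-C.
GENDER = {"hees": "F", "sannad": "M"}


def generate(root, tags):
    """Concatenate the suffix for the given tags to a root of the right gender."""
    return root + SUFFIXES[GENDER[root]][tags]


def analyze(word):
    """Return every root+tags analysis whose generated form equals the word."""
    return [root + tags
            for root, gender in GENDER.items()
            for tags, suffix in SUFFIXES[gender].items()
            if root + suffix == word]


print(generate("sannad", "+defM"))  # sannadka
print(analyze("heesta"))            # ['hees+defF']
```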
Table 4.5: Example of noun declension 2 morphotactics
Af-Somali has two kinds of reduplication: partial and full. Reduplication is typically a strategy for marking the plural of nouns and adjectives in some declensions, but it also appears in verbs as a derivational process. The inflectional processes are quite productive, whereas the derivational processes are less so. Partial reduplication occurs in the 4th declension of nouns, and a subtype of these 4th-declension nouns also shows full reduplication. Partial reduplication includes epenthesis of <a>, and in nouns it is suffixing. The template is also slightly different, as shown in the example in Table 4.6.
This alternation is illustrated with an example in Table 4.7.
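The partial reduplication pattern can be sketched in Python. Reading the Appendix-C notation aC (as in LEXICON N4MaC and +N4M+Pl:aC) as "epenthetic a followed by a copy of the root-final consonant" is an interpretation, and `partial_reduplication_plural` is a hypothetical helper; full reduplication is not covered.

```python
# Consonants written with two characters.
DIGRAPH_CONSONANTS = {"kh", "sh", "dh"}


def partial_reduplication_plural(root):
    """Plural for the assumed 'aC' pattern: root + epenthetic a + copy of the final consonant."""
    final = root[-2:] if root[-2:] in DIGRAPH_CONSONANTS else root[-1]
    return root + "a" + final


print(partial_reduplication_plural("wiil"))  # wiilal
```

The copied material follows the root, which matches the observation above that partial reduplication in nouns is suffixing.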
Reduced present forms are identical to the root, whereas past forms display distinct inflectional endings. As described in Figure 4-12, Af-Somali adjectives are few in number; using the lexc formalism, we defined a root lexicon called adjectives and a sub-lexicon called Ad_suffix, which contains the suffixes attached to adjectives. Af-Somali adjectives inflect for person and tense, in agreement with number, as shown with an example in Table 4.8.
In addition, the morphotactic representation of adjectives is presented in Figure 4-13, which shows the order in which suffixes attach to adjectives.
Chapter 5 : Experimentation and Evaluation
5.1 Introduction
This chapter discusses the testing and evaluation of the Af-Somali morphological analyzer, with emphasis on assessing the outputs produced and the test results obtained. Testing any sizable natural-language processing system is notoriously difficult [8], yet the morphological analyzer is an essential, basic tool for building any language processing application for a natural language, e.g., a machine translation system.
5.2 Experimentation
We developed the morphological analyzer using the XFST tool developed by Xerox. It supports UTF-8 character encoding, which is important for implementing Af-Somali computational morphology. The tool works with a lexicon and a set of rules for roots and morphemes. The lexicon contains the list of root words, each followed by its category. The analyzer fails when given a complex word whose root does not exist in the lexicon file. We developed the Af-Somali lexicon and the rules file required for analysis, and the lexicon is designed to reflect the word categories of the Af-Somali language.
The lexicon contains different states for each root word, starting with the declaration of the tags. For example, the verb lexicon is illustrated in Figure 4-2. Each root word is followed by its category (continuation class), and each entry ends with a semicolon, as shown for Af-Somali verbs in Figure 5-1. In a suffix entry, the left side of the colon represents the upper side (the analysis form) of the transducer, and the right side represents the lower side (the surface form), as presented in Appendix-B. The hash symbol (#) at the end of an entry marks the end of the word, so the state it leads to is a final state. The analyzer takes the surface form as input and produces the grammatical structure of the word (its lexical form) as output.
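The entry format described above can be made concrete with a small Python parser for single entries from Appendix-B. It is a hypothetical helper, not part of XFST. It handles only the simple `upper:lower continuation;` and `root continuation;` shapes used in the appendix, treats a lower side of 0 as the empty string (as lexc does), and ignores other lexc features such as escapes and comments.

```python
def parse_entry(line):
    """Split a lexc entry such as '+V1+Sg+inf:i #;' into (upper, lower, continuation)."""
    body = line.strip()
    if not body.endswith(";"):
        raise ValueError(f"lexc entry must end with ';': {line!r}")
    form, continuation = body[:-1].split()
    if ":" in form:
        upper, lower = form.split(":", 1)
    else:
        upper = lower = form  # identity entry, e.g. a root
    if lower == "0":
        lower = ""  # 0 stands for the empty string
    return upper, lower, continuation


print(parse_entry("+V1+Sg+inf:i #;"))      # ('+V1+Sg+inf', 'i', '#')
print(parse_entry("faalal V1suffixing;"))  # ('faalal', 'faalal', 'V1suffixing')
```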
Figure 5-1: Af-Somali verb-to-suffix attachment
In general, evaluating and testing a morphological analyzer requires measuring the number of word tokens correctly accepted by the analyzer versus the number processed incorrectly, and the percentage of tokens correctly analyzed in context versus the percentage not analyzed at all. We also need to know the percentage of tokens analyzed wrongly from a linguistic point of view, regardless of context. Finally, the number of correct analyses that were not output for a token is calculated.
We therefore manually annotated 218 tokens (90 nouns, 120 verbs and 8 adjectives) from the dictionary known as Qaamuus; 77 nominal, 105 verbal and 6 adjectival tokens were correctly analyzed. The results were evaluated by a human reader familiar with the language. An output was considered correct only if it found all legal combinations of roots and grammatical structures for a given word form and included no incorrect roots or structures. Thus, the overall accuracy of the system is 86.2% (188 of 218 tokens correctly analyzed), as shown in Table 5.1.
Category | Total | Correct | Wrong | Correct % | Wrong %
Nominal | 90 | 77 | 13 | 85.56 | 14.44
Verbs | 120 | 105 | 15 | 87.50 | 12.50
Adjectives | 8 | 6 | 2 | 75.00 | 25.00
Total | 218 | 188 | 30 | 86.24 | 13.76
In total, 218 tokens were analyzed; of these, 86.2% were correctly analyzed and 13.76% were wrongly analyzed, and a total of 10 tokens could not be analyzed by the system.
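The figures above can be recomputed from the per-category counts reported in this section. The Python sketch below is only an arithmetic check; the names `COUNTS` and `accuracy_table` are hypothetical.

```python
# Per-category counts reported above: (annotated tokens, correctly analyzed tokens).
COUNTS = {"Nominal": (90, 77), "Verbs": (120, 105), "Adjectives": (8, 6)}


def accuracy_table(counts):
    """Return {category: (total, correct, correct %)} plus an overall 'Total' row."""
    rows = {}
    for category, (total, correct) in counts.items():
        rows[category] = (total, correct, round(100 * correct / total, 2))
    total = sum(t for t, _ in counts.values())
    correct = sum(c for _, c in counts.values())
    rows["Total"] = (total, correct, round(100 * correct / total, 2))
    return rows


for category, (total, correct, pct) in accuracy_table(COUNTS).items():
    print(f"{category}: {correct}/{total} = {pct:.2f}%")
```

The overall 188/218 gives 86.24%, the 86.2% reported above; the remaining 30 tokens account for the 13.76% that were not correctly analyzed.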
Finally, we observed that errors arose from the limited size of the lexicon we annotated and from the absence of a guesser component, which would help to guess words that are not in the lexicon. In addition, Af-Somali authors write words in different forms, so the same word can be analyzed in different ways. For example, some writers write Dawlad while others write Dowlad (government).
Chapter 6 : Conclusion and Future Work
6.1 Conclusion
Language is one of the main tools of communication, so its investigation provides better perspectives on all other aspects related to NLP. However, the formalization and computational analysis of Af-Somali morphology had not been worked out; in other words, there was a lack of tools for analyzing Af-Somali morphology from a computational point of view. Moreover, grammar resources vary depending on the scholar: for example, some resources treat adjectives as verbs, whereas others describe adjectives as a separate word class. In short, building a correctly working morphological analysis system that combines all of this information is valuable for further research on the language. In this thesis, a detailed analysis of Af-Somali has been performed, the rules covering Af-Somali morphotactics have been formalized, and, by combining all the information gained, a morphological analyzer has been constructed. This thesis reports on an attempt to develop an Af-Somali morphological analysis system using the finite-state two-level approach. The report began with a brief introduction to the concepts and principles used in the study, including a description of morphological analysis and of the unique features of Af-Somali words and their peculiar morphemic components.
The different subcategories of rule-based approaches were described briefly, and in this study the finite-state two-level approach was adopted. The finite-state transducer is the main tool for the development of the morphological analyzer, and the implementation is based on [8]. Two-level morphology has proved very well suited to Af-Somali morphology. A major advantage of finite-state two-level implementations of morphology is their inherent bidirectionality: the same system is used for both analysis and generation of word forms. An additional advantage is the high efficiency of finite-state networks, which can process even large numbers of words within a few seconds. In Chapter 4, we presented the design and implementation of the analyzed categories as finite-state transducers using the Xerox Finite State Toolkit. First, all forms of verbs, nouns and adjectives were implemented in separate lexc files, and the identified rules were implemented in the corresponding xfst files. The finite-state transducers of each category and the rule transducers of the respective categories were composed separately, and all the resulting transducers were then combined into a single lexical finite-state transducer that can be used as both a morphological analyzer and a generator.
However, the study was carried out under a number of constraints. The main challenge was to work out the linguistic details, especially the exact morphotactic details, needed for analysis (and generation). The lack of linguistic lexical resources, such as a list of Af-Somali words in electronic form, made the work demanding, and it was also difficult to establish the morphological rules used in the system.
The morphological analyzer/generator can be useful for linguists who wish to understand the morphological processes of Af-Somali, and for language learners, as an aid to comprehension and to practicing conjugation and declension. The main weaknesses of the system are the limited number of roots and stems in the lexicon and the absence of a guesser; it can therefore be improved by adding more stems and phonological alternation rules and by incorporating a guesser component.
As this work deals only with inflectional morphology and the northern Somali dialect, the system needs to be extended to include derivational and compounding morphology, as well as the Benaadir and Maay varieties of Af-Somali.
Finally, once SoMorph fully describes Af-Somali morphology, it will be a useful tool for large-scale NLP applications such as machine translation and POS checkers.
References
[16] Lauri Karttunen, Constructing lexical transducers. In Proceedings of the Fifteenth International Conference on Computational Linguistics, 1994.
[17] Çağrı Çöltekin, A Freely Available Morphological Analyzer for Turkish, Center for Language and Cognition (CLCG), University of Groningen.
[18] Elaine Uí Dhonnchadha, A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers, Institute of Technology of Éireann, 31 Plás Mhic Liam, Baile Átha Cliath 2, Éire, and Dublin City University, Glasnevin, Dublin 11, Ireland.
[19] Gulshat Kessikbayeva and Ilyas Cicekli, A Rule Based Morphological Analyzer and A Morphological Disambiguator for Kazakh Language, Linguistics and Literature Studies, 2016.
[20] Kenneth R. Beesley, Finite-State Morphological Analysis and Generation of Arabic, Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France, 2001.
[21] Mesfin Abate and Yaregal Assabie (2014). "Development of Amharic morphological analyzer using memory based approach", 9th International Conference on NLP, PolTAL, Warsaw, Poland, September 17-19, 2014. Proceedings.
[22] Michael Gasser (2009). "HornMorpho 1.0: a system for morphological processing of Amharic, Oromo, and Tigrinya".
[23] Khumbar Debbarma, Braja Gopal Patra, Dipankar Das, Sivaji Bandyopadhyay, Morphological Analyzer for Kokborok.
[24] Koray Ak and Olcay Taner Yıldız, 2011. Unsupervised Morphological Analysis Using Tries, Dept. of Computer Science and Engineering, Işık University.
[25] Nicola Lampitelli, Evaluative morphology in Somali, Université Paris Diderot-Paris
[26] Nimaan Abdillahi, Building and Evaluating Af-Somali Corpora, Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 73–76.
[27] R. Akilan and E. R. Naganathan, Morphological Analyzer for Classical Tamil Texts: A Rule-based Approach, Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore; Programmer, Central Institute of Classical Tamil, Chennai.
[28] Shuly Wintner and Gelbukh, Finite-State Technology as a Programming Environment, CICLing 2007, LNCS 4394, pp. 97–106, 2007.
[29] Saba Amsalu and Girma A. Demeke (2006). Non-concatenative Finite State Morphotactics of Amharic Simple Verbs.
[30] Xuri Tang, English Morphological Analysis with Machine-learned Rules, Dept. of Foreign Languages, Wuhan University of Science and Engineering, 430073, Wuhan, P. R. China.
[31] Nicola Lampitelli, The morphophonology of Somali nouns, June 15-18, 2011.
[32] Dimitar Kazakov and Suresh Manandhar (2000). Unsupervised Learning for Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming.
[33] John I. Saeed, "Somali Reference Grammar", the University of Virginia, 26 Sep 2007.
[34] Yitayal Abate, 2013. "Morphological analyzer for Ge'ez verbs using machine learning approach", thesis, Addis Ababa University.
[35] Shlomo Yona, A finite-state based morphological analyzer for Hebrew, thesis, Department of Computer Science, November 2004.
1.9 Appendix-A: Alternation Rules for Noun and Verb
1.10 Appendix-B: Af-Somali verb Lexicon
!!Somorph-lex.txt
faalal V1suffixing;
!!gaadhsii V2suffixing;
baadiyee V3suffixing;
tabaabulee V3suffixing;
aabee V3suffixing;
caashaqo V4suffixing;
aamuso V5suffixing;
xabeebso V5suffixing;

LEXICON V1suffixing
+V1+Sg+1P:0 #;
+V1+Pl:a #;
+V1+Sg+inf:i #;
+V1+2P:s #;
+V1+Sg+3Pfem:t #;
+V1+1PPl:n #;
+V1+pres:aa #;
+V1+1P+pres:naa #;
+V1+paste:ay #;
+V1+2P+paste:tay #;
+V1+1PPl+paste:nay #;
+V1+3Pfem+paste+1PPl:teen #;
+V1+paste+1PPl:een #;
+V1+1Ppres.conti:ayaa #;
+V1+2Ppres.conti:aysaa #;
+V1+1PPlpres.conti:aynaa #;
+V1+3PPl+pres.conti:ayaan #;

LEXICON V2suffixing
+V2:i #;
+V2+Sg:0 #;
+V2+Pl:ya #;
+V2+Sg+inf:in #;
+V2+1PSg:y #;
+V2+3PSgmasc:y #;
+V2+3PPl:y #;
+V2+3PSgfem:s #;
+V2+2PSg:s #;
+V2+2PPl:s #;
+V2+1PPl:n #;
+V2+pres:aa #;
+V2+2P+pres:saan #;
+V2+3PPl+paste:yay #;
+V2+Sg+inf+paste:nay #;
+V2+3PPl+paste:yaan #;
+V2+paste:ay #;
+V2+2PSg+paste:seen #;
+V2+3PPl+paste:yeen #;

LEXICON V3suffixing
+V3:ee #;
+V3+Pl:ya #;
+V3+Sg+inf:yn #;
+V3+Sg+3PSgmasc:y #;
+V3+3PSgfem:s #;
+V3+1PPl:n #;
+V3+pres:aa #;
+V3+3PSgfem+pres:saan #;
+V3+Sg+3PSgmasc+paste:yaan #;
+V3+paste:ay #;
+V3+Sg+3PSgmasc+paste:yeen #;

LEXICON V4suffixing
+V4:o #;
+V4+Sg:0 #;
+V4+Pl:da #;
+V4+Sg+inf:an #;
+V4+Sg+1PSg:0 #;
+V4+3PSgfem:t #;
+V4+1PPl:n #;
+V4+Sg+pres:aa #;
+V4+3PSgfem+pres:taan #;
+V4+Sg+1PSg+paste:aan #;
+V4+Sg+paste:ay #;
+V4+3PSgfem+paste:teen #;
+V4+Sg+1PSg+paste:een #;

LEXICON V5suffixing
+V5+Sg:0 #;
+V5+Sg+3Pmasc:0 #;
+V5+Sg+inf:an #;
+V5+3PSgfem:t #;
+V5+Sg+1PPl:n #;
+V5+Sg+pres:aa #;
+V5+3PSgfem+pres:taan #;
+V5+Sg+3Pmasc+pres:aan #;
+V5+Sg+paste:ay #;
+V5+3PSgfem+paste:teen #;
+V5+Sg+3Pmasc+paste:een #;
1.11 Appendix-C: Af-Somali Noun lexicon
!!Somorph-lex.txt
LEXICON Root
Nouns;
hees N1;
sannad N1;
kibis N3F2V;
asaas N2MYo;
xadhig N3M2V; yaraan N5MCC;
wiil N4FaC;
abeeso N6Foyin;
riig N4MaC;
biyoole N7Myaal;
ijaar N5MCC;
+N1+defM:ka #; +N2F+Pl:yo #;
+N1+defF:ta #; +N2F+Pl:O #;
+N1+defF+inter:tee #; +N2F+defF:ta #;
+N1+defM+inter:kee #; +N2F+defF:ha #;
+N1+defF+1PSg:tayda #; +N2F+defF+inter:yahee #;
+N1+defF+2PSg:taada #; +N2F+defF+inter:tee #;
+N1+defF+3Pmasc:tiisa #; +N2F+defF+1stSg:tayda #;
+N1+defF+3Pfem:teeda #; +N2F+defF+2ndSg:taada #;
+N1+defF+1PPl:taayada #; +N2F+defF+3rdmasc:tiisa #;
+N1+defF+close:tan #; +N2F+defF+3rdfem:teeda #;
+N1+defF+near:tas #; +N2F+defF+1stPl:taayada #;
+N1+defF+far:teer #; +N2F+defF+close:tan #;
+N2F+defF+near:tas #;
+N2M+Sg:0 #;
+N2M+defM:ka #; +N3F+Sg:0 #;
+N2M+defM+inter:kee #; +N3F+Pl:0 #;
+N2M+defM+1PSg:kayga #; +N3F+defF:ta #;
+N2M+defM+2PSg:kaaga #; +N3F+defF+inter:tee #;
+N2M+defM+3Pmasc:kiisa #; +N3F+defF+1PSg:tayda #;
+N2M+defM+3Pfem:keeda #; +N3F+defF+2PSg:taada #;
+N2M+defM+1PPl:kaayaga #; +N3F+defF+3Pmasc:tiisa #;
+N2M+defM+close:kan #; +N3F+defF+3Pfem:teeda #;
+N2M+defM+near:kas #; +N3F+defF+1PPl:taayada #;
+N2M+defF+far:keer #; +defF+close:tan #;
+defF+near:tas #;
+N2F+Sg:0 #;
LEXICON N3M2V LEXICON N4MaC
+N3M+Sg:0 #; +N4M+Sg:0 #;
+N3M+Pl:0 #; +N4M+Pl:aC #;
+N3M+defM:ka #; +N4M+defM:ka #;
+N3M+defM+inter:kee #; +N4M+defM+inter:kee #;
+N3M+defM+1PSg:kayga #; +N4M+defM+1PSg:kayga #;
+N3M+defM+2PSg:kaaga #; +N4M+defM+2PSg:kaaga #;
+N3M+defM+3Pmasc:kiisa #; +N4M+defM+3Pmasc:kiisa #;
+N3M+defM+3Pfem:keeda #; +N4M+defM+3Pfem:keeda #;
+N3M+defM+1PPl:kaayaga #; +N4M+defM+1PPl:kaayaga #;
+N3M+defM+close:kan #; +N4M+defM+close:kan #;
+N3M+defM+near:kas #; +N4M+defM+near:kas #;
+N3M+defM+far:keer #; +N4M+defM+far:keer #;
+N4F+Sg:0 #; +N5M+Sg:0 #;
+N4F+Pl:aC #; +N5M+Pl:CC #;
+N4F+defF:ta #; +N5M+defM:ka #;
+N4F+defF+inter:tee #; +N5M+defM+inter:kee #;
+N4F+defF+1PSg:tayda #; +N5M+defM+1PSg:kayga #;
+N4F+defF+2PSg:taada #; +N5M+defM+2PSg:kaaga #;
+N4F+defF+3Pmasc:tiisa #; +N5M+defM+3Pmasc:kiisa #;
+N4F+defF+3Pfem:teeda #; +N5M+defM+3Pfem:keeda #;
+N4F+defF+1PPl:taayada #; +N5M+defM+1PPl:kaayaga #;
+N4F+defF+close:tan #; +N5M+defM+close:kan #;
+N4F+defF+near:tas #; +N5M+defM+near:kas #;
+N4F+defF+far:teer #; +N5M+defM+far:keer #;
LEXICON N6Moyin +N6F+defF+3Pmasc:tiisa #;
+N6M+Sg:0 #; +N6F+defF+3Pfem:teeda #;
+N6M+Pl:oyin #; +N6F+defF+1PPl:taayada #;
+N6M+defM:ka #; +N6F+defF+close:tan #;
+N6M+defM+inter:kee #; +N6F+defF+near:tas #;
+N6M+defM+1PSg:kayga #; +N6F+defF+far:teer #;
+N6M+defM+2PSg:kaaga #;
+N6M+defM+3Pfem:keeda #; +N7M+Sg:0 #;
+N6M+defM+1PPl:kaayaga #; +N7M+Pl:yaal #;
+N6M+defM+close:kan #; +N7M+defM:ka #;
+N6M+defM+near:kas #; +N7M+defM+inter:kee #;
+N6M+defM+far:keer #; +N7M+defM+1PSg:kayga #;
+N7M+defM+2PSg:kaaga #;
+N6F+Sg:0 #; +N7M+defM+3Pfem:keeda #;
+N6F+Pl:oyin #; +N7M+defM+1PPl:kaayaga #;
+N6F+defF:ta #; +N7M+defM+close:kan #;
+N6F+defF+inter:tee #; +N7M+defM+near:kas #;
+N6F+defF+1PSg:tayda #; +N7M+defM+far:keer #;
+N6F+defF+2PSg:taada #;