(1983) two-level morphology. In Beesley (1996) the system is reworked into a
finite-state lexical transducer to perform analysis and generation. In
two-level systems, the lexical level includes short vowels that are typically
not realized on the surface level. Kiraz (1994) presents an analysis of Arabic
morphology based on the CV-, moraic-, and affixational models. He introduces a
multi-tape two-level model and a formalism where three tapes are used for the
lexical level (root, pattern, and vocalization) and one tape for the surface
level.

In this paper, we propose a computational approach that applies a
concatenative treatment to Arabic morphology generation by separating the
issue of infixation from other inflectional variations. We are developing an
Arabic morphological generator using MORPHE (Leavitt, 1994), a tool for
modeling morphology based on discrimination trees and regular expressions.
MORPHE is part of a suite of tools developed at the Language Technologies
Institute, Carnegie Mellon University, for knowledge-based machine
translation. Large systems for MT from English to Spanish, French, German,
Portuguese and a prototype for Italian have already been developed. Within
this framework, we are exploring English to Arabic translation and Arabic
generation for pedagogical purposes. We generate Arabic words including short
vowels and diacritic marks, since they are pedagogically useful and can always
be stripped before display.

Our approach seeks to reduce the number of rules for generating morphological
variants of Arabic verbs by breaking the problem into two parts. We observe
that, with the exception of a few verb types, there is very little interaction
between stem changes and the processes of prefixation and suffixation. It is
therefore possible to decouple, in large part, the problem of stem changes
from that of prefixes and suffixes. The gain is a significant reduction in the
number of transformational rules, by as much as a factor of three for certain
verb classes. This improves the space efficiency of the system and its
maintainability by reducing duplication of rules, and simplifies the rules by
isolating different types of changes.

To illustrate our approach, we focus on a particular type of verb, termed
hollow verbs, and show how we integrate their treatment with that of more
regular verbs. We also discuss how the approach can be extended to other
classes of verbs and other parts of speech.

1 Arabic Verbal Morphology

Verb roots in Arabic can be classified as shown in Figure 1.³ A primary
distinction is made between weak and strong verbs. Weak verbs have a weak
consonant ('w' or 'y') as one or more of their radicals; strong verbs do not
have any weak radicals.

Strong verbs undergo systematic changes in stem voweling from the perfect to
the imperfect. The first radical vowel disappears in the imperfect. Verbs
whose middle radical vowel in the perfect is 'a' can keep it as 'a' (e.g.,
qaTa'a 'he cut' -> yaqTa'u 'he cuts'),⁴ or change it to 'i' (e.g., Daraba 'he
hit' -> yaDribu 'he hits') or 'u' (e.g., kataba 'he wrote' -> yaktubu 'he
writes') in the imperfect. Verbs whose middle radical vowel in the perfect is
'i' can only change it to 'a' (e.g., shariba 'he drank' -> yashrabu 'he
drinks') or keep it as 'i' (e.g., Hasiba 'he supposed' -> yaHsibu 'he
supposes'). Verbs with middle radical vowel 'u' in the perfect do not change
it in the imperfect (e.g., Hasuna 'he was beautiful' -> yaHsunu 'he is
beautiful'). For strong verbs, neither perfect nor imperfect stems change with
person, gender, or number.

Hollow verbs are those with a weak middle radical. In both perfect and
imperfect tenses, the underlying stem is realized by two characteristic
allomorphs, one short and one long, whose use depends on the person, number
and gender.

³ Grammars of Arabic are not uniform in their classification of "hamzated"
verbs, verbs containing the glottal stop as one of the radicals (e.g. [sa?al]
'to ask'). Wright (1968) includes them as weak verbs, but Cowan (1964)
doesn't. Hamzated verbs change the written 'seat' of the hamza from 'alif' to
'waaw' or 'yaa?', depending on the phonetic context.
⁴ In the Arabic transcription capital letters indicate emphatic consonants;
'H' is the voiceless pharyngeal fricative; the apostrophe (') is the voiced
pharyngeal fricative; '?' is the glottal stop 'hamza'.
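The vowel alternations described for strong verbs in Section 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and table names are hypothetical and are not part of MORPHE, radicals are passed as a tuple so that digraphs such as 'sh' count as one consonant, and the third person masculine singular affixes (-a, ya-...-u) are simply read off the examples in the text.

```python
# Illustrative sketch only; these names are hypothetical and not part of MORPHE.

# Imperfect middle-radical vowels that Section 1 allows for each perfect vowel.
ALLOWED_IMPERFECT_VOWELS = {
    "a": {"a", "i", "u"},
    "i": {"a", "i"},
    "u": {"u"},
}


def perfect_stem(radicals, perfect_vowel):
    """Return the CVCVC perfect stem: C1 + 'a' + C2 + middle vowel + C3."""
    c1, c2, c3 = radicals
    return f"{c1}a{c2}{perfect_vowel}{c3}"


def imperfect_stem(radicals, perfect_vowel, imperfect_vowel):
    """Return the CCVC imperfect stem, dropping the first radical vowel."""
    if imperfect_vowel not in ALLOWED_IMPERFECT_VOWELS[perfect_vowel]:
        raise ValueError(
            f"perfect vowel {perfect_vowel!r} does not allow "
            f"imperfect vowel {imperfect_vowel!r}"
        )
    c1, c2, c3 = radicals
    return f"{c1}{c2}{imperfect_vowel}{c3}"


# (radicals, perfect vowel, imperfect vowel), taken from the examples in Section 1.
EXAMPLES = [
    (("q", "T", "'"), "a", "a"),
    (("D", "r", "b"), "a", "i"),
    (("k", "t", "b"), "a", "u"),
    (("sh", "r", "b"), "i", "a"),
    (("H", "s", "b"), "i", "i"),
    (("H", "s", "n"), "u", "u"),
]

for radicals, pv, iv in EXAMPLES:
    # Third person masculine singular: perfect adds -a, imperfect adds ya- and -u.
    print(perfect_stem(radicals, pv) + "a", "->",
          "ya" + imperfect_stem(radicals, pv, iv) + "u")
```

Because the imperfect vowel is restricted but not determined by the perfect vowel, the sketch takes it as an argument, which mirrors the need to store it with each lexical entry.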
[Figure 1 (tree diagram lost in extraction): classification of Arabic verbs.
Triliteral verbs divide into strong (regular, hamzated, doubled) and weak
(weak initial radical: assimilated; weak middle radical: hollow; weak final
radical: defective). Further branches: tense (preterit/perfect,
present/imperfect, participle), mood (indicative, imperative, subjunctive,
jussive, energetic), and voice (active, passive).]
We describe our approach to modeling strong and hollow verbs below, following
a description of the implementation framework.

2 The MORPHE System

MORPHE (Leavitt, 1994) is a tool that compiles morphological transformation
rules into either a word parsing program or a word generation program.⁶ In
this paper we will focus on the use of MORPHE in generation.

Input and Output. MORPHE's output is simply a string. Input is a feature
structure (FS) which describes the item that MORPHE must transform. An FS is
implemented as a recursive Lisp list. Each element of the FS is a
feature-value pair (FVP), where the value can be atomic or complex. A complex
value is itself an FS. For example, the FS for generating the Arabic zurtu 'I
visited' would be:

((ROOT "zawar")
 (CAT V) (PAT CVCVC) (VOW HOL)
 (TENSE PERF) (MOOD IND)
 (VOICE ACT)
 (NUMBER SG) (PERSON 1))

The choice of feature names and values, other than ROOT, which identifies the
lexical item to be transformed, is entirely up to the user. The FVPs in an FS
come from one of two sources. Static features, such as CAT (part of speech)
and ROOT, come from the syntactic lexicon, which, in addition to the base form
of words, can contain morphological and syntactic features. Dynamic features,
such as TENSE and NUMBER, are set by MORPHE's caller.

The Morphological Form Hierarchy. MORPHE is based on the notion of a
morphological form hierarchy (MFH) or tree. Each internal node of the tree
specifies a piece of the FS that is common to that entire subtree. The root of
the tree is a special node that simply binds all subtrees together. The leaf
nodes of the tree correspond to distinct morphological forms in the language.
Each node in the tree below the root is built by specifying the parent of the
node and the conjunction or disjunction of FVPs that define the node. Portions
of the Arabic MFH are shown in Figures 2-4.

Transformational Rules. A rule attached to each leaf node of the MFH effects
the desired morphological transformations for that node. A rule consists of
one or more mutually exclusive clauses. The 'if' part of a clause is a regular
expression pattern, which is matched against the value of the feature ROOT (a
string). The 'then' part includes one or more operators, applied in the given
order. Operators include addition, deletion, and replacement of prefixes,
infixes, and suffixes. The output of the transformation is the transformed
ROOT string. An example of a rule attached to a node in the MFH is given in
Section 3.1 below.

Process Logic. In generation, the MFH acts as a discrimination network. The
specified FS is matched against the features defining each subtree until a
leaf is reached. At that point, MORPHE first checks in the irregular forms
lexicon for an entry indexed by the name of the leaf node (i.e., the MF) and
the value of the ROOT feature in the FS. If an irregular form is not found,
the transformation rule attached to the leaf node is tried. If no rule is
found or none of the clauses of the applicable rule match, MORPHE returns the
value of ROOT unchanged.

3 Handling Arabic Verbal Morphology in MORPHE

Figure 2 sketches the basic MFH and the division of the verb subtree into stem
changes and prefix/suffix additions.⁷ The inflected verb is generated in two
steps. MORPHE is first called with the feature CHG set to STEM. The required
stem is returned and temporarily substituted for the value of the ROOT
feature.

⁶ MORPHE is written in Common Lisp and the compiled MFH and transformation
rules are themselves a set of Common Lisp functions.
⁷ The use of two parts of the same tree for the two problems is a constraint
of MORPHE's implementation, which does not permit multiple trees with separate
roots.
The second call to MORPHE, with feature CHG set to PSFIX, adds the necessary
prefix and/or suffix and returns the fully inflected verb.

characteristics were described in Section 1. Figure 3 shows the MFH for strong
and hollow verbs of pattern CVCVC in the perfect tense, active voice. We use
the feature VOW to carry information about the voweling of the verb in the
imperfect (discussed below) and overload it to distinguish hollow and other
kinds of verbs.
[Figures 2 and 3 (MFH tree diagrams lost in extraction). Surviving node
labels: (CAT ADJ), (TENSE PERF), (CHG STEM), (CHG PSFIX), (PAT CVCVC),
(PERS (*or* 1 2)), (PERS 3), (VOW (*or* a i u)), (VOICE PAS), "other forms",
"short stem".]
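The process logic of Section 2 (descend the MFH to a leaf, consult the irregular forms lexicon, try the leaf's rule, otherwise return ROOT unchanged) can be sketched in Python as follows. This is a minimal sketch under several assumptions, not MORPHE's implementation, which compiles to Common Lisp: the `Node` class, `find_leaf`, `generate`, and the node names are hypothetical; an FS is simplified to a flat dict of atomic values; each node's definition is simplified to a conjunction of per-feature disjunctions; and a failure to reach any leaf is treated like "no rule found". The toy tree holds only the three strong-verb imperfect stem rules that branch on VOW.

```python
import re


class Node:
    """One MFH node: feature constraints plus children or, for a leaf, a rule."""

    def __init__(self, name, constraints, children=(), rule=None):
        self.name = name
        self.constraints = constraints  # feature -> set of allowed values (*or*)
        self.children = list(children)
        self.rule = rule                # list of (regex, replacement) clauses


def find_leaf(node, fs):
    """Return the first leaf whose whole path matches the FS, or None."""
    for feature, allowed in node.constraints.items():
        if fs.get(feature) not in allowed:
            return None
    if not node.children:
        return node
    for child in node.children:
        leaf = find_leaf(child, fs)
        if leaf is not None:
            return leaf
    return None


def generate(tree, fs, irregular_forms):
    """Irregular lexicon first, then the leaf rule, else ROOT unchanged."""
    root = fs["ROOT"]
    leaf = find_leaf(tree, fs)
    if leaf is None:
        return root  # assumption: treated like "no rule found"
    if (leaf.name, root) in irregular_forms:
        return irregular_forms[(leaf.name, root)]
    for pattern, replacement in leaf.rule or []:
        if re.search(pattern, root):
            return re.sub(pattern, replacement, root)
    return root


# Toy tree: CVCVC perfect stem -> CCVC imperfect stem, one leaf per VOW value.
STRONG = r"^(.)a(.)[aiu](.)$"
leaves = [
    Node(f"imperf-stem-{v}", {"VOW": {v}}, rule=[(STRONG, rf"\1\2{v}\3")])
    for v in "aiu"
]
stem_node = Node(
    "verb-stem-imperf",
    {"CAT": {"V"}, "CHG": {"STEM"}, "TENSE": {"IMPERF"}},
    children=leaves,
)
tree = Node("root", {}, children=[stem_node])

fs = {"ROOT": "katab", "CAT": "V", "PAT": "CVCVC", "VOW": "u",
      "TENSE": "IMPERF", "CHG": "STEM"}
print(generate(tree, fs, {}))  # prints ktub
```

Keying the irregular forms lexicon on the pair (leaf name, ROOT) follows the indexing described in the text; passing it as a plain dict is just a convenience of the sketch.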
An example of such a rule, which changes the perfect stem to a short one for
persons 1 and 2, both singular and plural, follows.

(morph-rule v-stem-fl-act-perf-12
  ("^%{cons}(awa)%{cons}$"
    (ri *1* "u"))
  ("^%{cons}(a[wy]i)%{cons}$"
    (ri *1* "i"))
  ("^%{cons}(aya)%{cons}$"
    (ri *1* "i")))

The syntax %{var} is used to indicate variables with a given set of values.
Enclosing a string in parentheses associates it with a numbered register, so
the replace infix (ri) operator can access it for substitution.

Figure 4 shows the imperfect subtree for strong and hollow verbs. Strong verbs
are treated efficiently by three rules branching on the middle radical vowel,
given as the value of VOW. The consonant-vowel pattern of the computed stem is
shown (e.g. for kataba 'he wrote', the imperfect stem would be -ktub- in the
pattern CCuC). As described in Section 1, the possible vowel in the imperfect
is restricted but not always determined by the perfect vowel and so must be
stored in the syntactic lexicon.⁹ Separating stem changes from the addition of
prefixes and suffixes significantly reduces the number of transformation rules
that must be written by eliminating much repetition of prefix and suffix
addition for different stem changes. For strong verbs of pattern CVCVC, there
is at least a three-fold reduction in the number of rules for active voice
(recall the different kinds of vowel changes for these verbs from perfect to
imperfect described in Section 1). Other patterns and the passive of

The hollow verb subtree is not as small for the imperfect as it is for the
perfect, since the stem depends not only on the mood but also on the person,
gender, and number. It is still advantageous to decouple stem changes from
prefixation and suffixation. Suffixes differ in the indicative and subjunctive
moods; if the two types of changes were merged, the stem transformations would
have to be repeated in each of the two moods and for each
person-number-gender combination. The same observation applies to stem changes
in the passive voice as well. Significant replication of transformational
rules that include stem changes makes the system bigger and harder to maintain
in case of changes, particularly because each transformational rule needs to
take into consideration the four different classes of hollow verbs.

3.2 An Example of Generation

Consider again the example verb form zurtu 'I visited' and the feature
structure (FS) given in Section 2. During generation, the feature-value pair
(CHG STEM) is added to the FS before the first call to MORPHE. Traversing the
MFH shown in Figure 2, MORPHE finds the rule v-stem-fl-act-perf-12 given in
Section 3.1 above. The first clause fires, replacing the 'awa' with 'u', and
MORPHE returns the stem -zur-. This stem is substituted as the value of the
ROOT feature in the FS and the feature-value pair (CHG STEM) is changed to
(CHG PSFIX) before the second call to MORPHE. This time MORPHE traverses a
different subtree and reaches the rule:

(morph-rule v-psfix-perf-l-sg
[Figure 4 tree diagram lost in extraction. Surviving node labels:
(TENSE IMPERF); branches on NUM (SG, DL, PL), PERS ((*or* 2 3)) and GENDER
(M, F); leaves labeled "long stem" or "short stem".]
Figure 4: The Imperfect Stem Change Subtree for Strong and Hollow Verbs of Pattern CVCVC
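The two-step generation of zurtu walked through in Section 3.2 can be approximated in Python as below. This is a hedged sketch rather than MORPHE itself: the three stem clauses mirror the rule v-stem-fl-act-perf-12, but `CONS` is an assumed expansion of %{cons} (any single symbol other than a, i, u), the function names are hypothetical, and the suffix "tu" is inferred only from the stem -zur- and the surface form zurtu, since the prefix/suffix rule is not reproduced in full in the text.

```python
import re

CONS = r"[^aiu]"  # assumed expansion of %{cons}: one non-vowel symbol

# (pattern, replacement) clauses mirroring v-stem-fl-act-perf-12.
SHORT_PERFECT_CLAUSES = [
    (re.compile(rf"^{CONS}(awa){CONS}$"), "u"),
    (re.compile(rf"^{CONS}(a[wy]i){CONS}$"), "i"),
    (re.compile(rf"^{CONS}(aya){CONS}$"), "i"),
]


def replace_infix(match, replacement):
    """Replace the substring captured in register 1 (the parenthesized infix)."""
    root = match.string
    return root[:match.start(1)] + replacement + root[match.end(1):]


def short_perfect_stem(root):
    """Apply the first clause that matches; otherwise return ROOT unchanged."""
    for pattern, replacement in SHORT_PERFECT_CLAUSES:
        match = pattern.match(root)
        if match:
            return replace_infix(match, replacement)
    return root


def morphe(fs):
    """Toy stand-in for one MORPHE call, dispatching on the CHG feature."""
    if fs["CHG"] == "STEM":
        return short_perfect_stem(fs["ROOT"])
    if fs["CHG"] == "PSFIX":
        return fs["ROOT"] + "tu"  # assumed suffix, inferred from zur + tu = zurtu
    return fs["ROOT"]


def generate(fs):
    """First call computes the stem; second call adds the prefix/suffix."""
    stem = morphe({**fs, "CHG": "STEM"})
    return morphe({**fs, "ROOT": stem, "CHG": "PSFIX"})


FS = {"ROOT": "zawar", "CAT": "V", "PAT": "CVCVC", "VOW": "HOL",
      "TENSE": "PERF", "MOOD": "IND", "VOICE": "ACT",
      "NUMBER": "SG", "PERSON": 1}
print(generate(FS))  # prints zurtu
```

The point of the split is visible in `generate`: the stem clauses never mention affixes, and the affix step never inspects the kind of stem change that produced its input.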
of most masculine or feminine singular nouns. The same holds true for
adjectives.

Finally we note that our two-step approach can also be used to combine
derivational and inflectional morphology for nouns and adjectives. Deverbal
nouns and present and past participles can be derived regularly from each verb
pattern (with the exception of deverbal nouns from pattern CVCVC). Relational
or "nisba" adjectives are derived, with small variations, from nouns. Since
these parts of speech are inflected as normal nouns and adjectives, we can
perform derivational and inflectional morphology in two calls to MORPHE, much
as we do stem change and prefix/suffix addition.

Conclusion

We have presented a computational model that handles Arabic morphology
generation concatenatively by separating the infixation changes undergone by
an Arabic stem from the processes of prefixation and suffixation. Our approach
was motivated by practical concerns. We sought to make efficient use of a
morphological generation tool that is part of our standard environment for
developing machine translation systems. The two-step approach significantly
reduces the number of morphological transformation rules that must be written,
allowing the Arabic generator to be smaller, simpler, and easier to maintain.

The current implementation has been tested on a subset of verbal morphology
including hollow verbs and various types of strong verbs. We are currently
working on the other kinds of weak verbs: defective and assimilated verbs.
Other categories of words can be handled in a similar manner, and we will turn
our attention to them next.

References

K. Beesley. 1990. Finite-State Description of Arabic Morphology. In
Proceedings of the Second Cambridge Conference: Bilingual Computing in Arabic
and English.

K. Beesley. 1991. Computer Analysis of Arabic: A Two-Level Approach with
Detours. In B. Comrie and M. Eid, editors, Perspectives on Arabic Linguistics
III: Papers from the Third Annual Symposium on Arabic Linguistics. Benjamins,
Amsterdam, pages 155-172.

K. Beesley. 1996. Arabic Finite-State Morphological Analysis and Generation.
In Proceedings COLING'96, Vol. 1, pages 89-94.

K. Beesley. 1998. Consonant Spreading in Arabic Stems. In Proceedings of
COLING'98.

D. Cowan. 1964. An introduction to modern literary Arabic. Cambridge
University Press, Cambridge.

G. Hudson. 1986. Arabic Root and Pattern Morphology without Tiers. Journal of
Linguistics, 22:85-122.

G. Kiraz. 1994. Multi-tape Two-level Morphology: A Case study in Semitic
Non-Linear Morphology. In Proceedings of COLING-94, Vol. 1, pages 180-186.

K. Koskenniemi. 1983. Two-level morphology: A General Computational Model for
Word-Form Recognition and Production. PhD thesis, University of Helsinki.

A. Lavie, A. Itai, U. Ornan, and M. Rimon. 1988. On the Applicability of Two
Level Morphology to the Inflection of Hebrew Verbs. In Proceedings of the
Association of Literary and Linguistic Computing Conference.

J.R. Leavitt. 1994. MORPHE: A Morphological Rule Compiler. Technical Report,
CMU-CMT-94-MEMO.

J. McCarthy and A. Prince. 1990. Foot and Word in Prosodic Morphology: The
Arabic Broken Plural. Natural Language and Linguistic Theory, 8:209-283.

J. McCarthy and A. Prince. 1993. Template in Prosodic Morphology. In Stvan, L.
et al., editors, Papers from the Third Annual Formal Linguistics Society of
Midamerica Conference. Bloomington, Indiana. Indiana University Linguistics
Club, pages 187-218.

G. Ritchie. 1992. Languages Generated by Two-Level Morphological Rules.
Computational Linguistics, 18(1), pages 41-59.

R. Sproat. 1992. Morphology and Computation. MIT Press, Cambridge, Mass.

H. Wehr. 1971. A Dictionary of Modern Written Arabic, J.M. Cowan, editor.
Spoken Language Services, Ithaca, NY, fourth edition.

W. Wright. 1988. A Grammar of the Arabic Language. Cambridge University Press,
Cambridge, third edition.