
The Broken Plural Morphological System in Arabic:
A Challenge to Natural Language Processing Models

Mr DENDANE Zoubir
Université de Tlemcen

Abstract:

The present paper aims to show that the Arabic ‘broken’
plural noun, as labelled by traditional Arabic
grammarians, is widely considered, in both morphological
and phonological circles, to be the most sophisticated
system of nominal plurality. Its complex structure
consists of a great number of rules that follow from the
overall morphological patterning of the language and, in
particular, from its non-concatenative nature. The Arabic
broken plural, sometimes wrongly referred to as
irregular, therefore represents a significant challenge
to Natural Language Processing applications and
translation theory. Questions thus arise as to a) how to
devise approaches to identify the various plural types;
b) how to develop algorithms to deal with nouns subjected
to internal modification of the singular form. In
addition, it is not easy to relate the numerous patterns
of pluralisation to meaning, particularly in cases where
the input is the same singular stem.
Revue Maghrébine des Langues 2010. N°7 – Oran (Algérie)

1. Introduction

The present paper is intended to examine one of
the most complex morphological structures that pervade
the Arabic language system: pluralisation – a system that
excludes the ‘dual noun’, referred to as almuθannaa in
Arabic and classified as a separate category.
The Arabic plural system, which consists of a
two-mode formation, ‘sound’ plural and ‘broken’ plural
as labelled by traditional Arab grammarians, represents,
with its structural configuration, an immense challenge to
both structural and generative morphologists. While the
morphological patterns of the sound plural observe a
straightforward regularity, the broken plural is
undeniably considered, both in morphological and
phonological circles, as the most sophisticated system of
nominal plurality. Its complex structure consists of a
great number of patterns that follow from the overall
morphological patterning of the language, in particular
its non-concatenative nature, described in terms of the
interspersing of consonantal roots and vocalic melodies
(McCarthy, 1981).
Wrongly referred to as ‘irregular’ because of its
large-scale complex structure, the Arabic broken plural is
regarded as a significant challenge to Natural Language
Processing applications and translation theory, and thus
questions arise as to
a) how to devise approaches to identify and
categorize the various plural types in a root-and-pattern
based system;
b) how to develop algorithms that may allow
automatic translation of nouns subjected to internal
modification of the singular form.
We will present, in the first section, an overall
view of the Arabic nominal plural system, starting with
the sound plural, which generally involves simple
affixation, though some nouns are subject to special
rules. We then put emphasis on the broken plural with
its numerous patterns and their apparent versatility,
although the system is substantially structured on the
basis of formal but complex regularities.
In the following section, we attempt to touch upon
some idiosyncratic features of the Arabic plural:
- the pluralisation issue from the semantic point of
view; it is indeed by no means easy to consider the
numerous patterns in relation to meaning, particularly in
cases where the output of a singular stem emerges in two
or more plural forms, some being synonymous and some
carrying different meanings;
- other idiosyncrasies that characterize the Arabic
plurals are: the extra-pluralisation of a number of plural
forms; the existence of plural forms with no singular uses
as well as that of singular nouns with plural intent.
Throughout the paper, we will refer to the Arabic
encoding issues that NLP researchers and computational
linguists face in classifying singular nouns and
generating the corresponding plural forms; encoding these
could be of great help to the automatic treatment of
Arabic and to machine translation to and from Arabic.
In the last section, we look at some ways broken
plural patterns apply to loan words from European
languages, particularly French, into Modern Standard
Arabic but mostly into Algerian Arabic dialects, in
contrast to those ‘favouring’ the sound plural forms.

2. Pluralization in Arabic

Word formation and inflectional rules make up the
backbone on which natural language processing rests; the
complexity or simplicity of a given language is thus
proportional to that of its morphological derivational
and inflectional patterns. Morphological descriptions of
words usually fall under morpheme concatenation, “where
each morpheme is made up of one or more segments, and
where words are made up of sequences of morphemes strung
together in a rigid linear order.” (McCarthy 1983:263)1.
However, the Arabic language is known in
phonological and morphological circles as the most
heavily inflected language because, in addition to the
regular concatenating operations occurring at both ends
of the word, it is characterized by non-concatenative
structures, based on a process termed root-and-pattern
morphology, the inter-digitation of consonantal roots and
vocalic patterns; this is particularly true of verb
patterns, but it also holds for the sophisticated noun
pluralisation system of this language. Some authors (e.g.
Kiraz, 1996) have also attempted to analyse Arabic
plurals on the basis of prosodic structure2.

1 In Dihoff ed. (1983). Current Trends in African Linguistics I.
2 Prosodic structure, mostly represented in suprasegmental features,
is not considered in this paper.

Pluralization in Arabic incorporates several noun
patterns, as well as related adjectives3, referring to more
than two entities (two units are part of almuθannaa, the
dual system as in kitābān/kitābayn4, ‘two books’).
However, the matter is far from being as simple as this.
The complexity of the Arabic plural system is reflected in
the profusion of theoretical work on the topic (cf.
Beesley 1990, Kiraz 1996, McCarthy 1990, etc.) and
presents at the same time an immense challenge to
computational research and NLP applications.
Two modes characterize the morphological structure
of the Arabic plural: al džamς as sālim (الجمع السالم),
termed ‘sound plural’, and džamς attaksīr (جمع التكسير),
known as the ‘broken plural’.

2.1 The sound plural

Just as in English, French, Spanish and other western
languages, the sound plural in Arabic is obtained by
suffixation: the morpheme {-ūn}/{-īn} is added to the
stem in the
3 Much more than in English or French, Arabic adjectives often have
the same patterns as the related nouns, e.g. /karīm/ means both
‘generous’ and ‘the generous’, and thus are viewed like nouns and
subjected to the same rules of pluralisation, number and gender.
4 kitābān: nominative; kitābayn in both accusative and genitive cases.

masculine depending on the case it takes – i.e. {-ūn} for
the nominative as in muςallimūn ‘teachers’, while {-īn} is
used in the accusative and genitive cases, giving
muςallimīn. In the feminine, the morpheme {-āt(un)} or
{-āt(in)}, depending on the case, is added to the noun
stem to get muςallimāt5, ‘women teachers’. Interestingly,
the sound masculine plural (جمع المذكر السالم) is only
used with ‘rational’ beings, like χabbāzūn ‘(male)
bakers’, while the sound feminine plural
(جمع المؤنث السالم) is used with all types of beings and
objects, e.g. kātibāt for ‘female writers’ and sajjārāt
for ‘cars’, etc.
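The suffixation rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an established implementation: the function name sound_plural, the gender and case labels and the rational flag are assumptions introduced here, and the suffixes are the pausal forms quoted in the text.

```python
# Pausal sound plural suffixes as quoted in the text (no final short vowels).
SOUND_SUFFIXES = {
    ("masc", "nom"): "ūn",
    ("masc", "acc"): "īn",
    ("masc", "gen"): "īn",
    ("fem", "nom"): "āt",
    ("fem", "acc"): "āt",
    ("fem", "gen"): "āt",
}


def sound_plural(stem, gender, case="nom", rational=True):
    """Attach a sound plural suffix to a transliterated noun stem.

    `rational` is a hypothetical flag: the text restricts the sound
    masculine plural to 'rational' beings.
    """
    if gender == "masc" and not rational:
        raise ValueError("sound masculine plural is limited to rational beings")
    return stem + SOUND_SUFFIXES[(gender, case)]


print(sound_plural("muςallim", "masc", "nom"))  # muςallimūn
print(sound_plural("muςallim", "masc", "gen"))  # muςallimīn
print(sound_plural("muςallim", "fem"))          # muςallimāt
```

Plain concatenation is enough for these regular stems; stems that change before the suffix need additional rules.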
This seems to be more complicated than the mere
addition of the plural morpheme {-s} in English, French
and Spanish nouns; but there is much more to say about
the usage of the sound plural, in particular when we
consider a few masculine nouns with feminine plural
forms, e.g. /imtiħān/ > pl. /imtiħānāt/, ‘exams’, or
feminine words like /sana(tun)/, ‘a year’, with a
masculine plural /sinūn/, in addition to its feminine plural
form /sanawāt/.

5 As it occurs in pausal form.

This last instance, /sanawāt/, shows that for
phonotactic reasons, there might be some change in the
nature of the consonant or vowel preceding the suffixes
{-āt} and {-ūn}/{-īn}, in particular when the noun
comes from the so-called ‘weak’ verbs6 such as /qađā/,
‘to judge’, whose singular masculine noun is qāđī with
the plural qāđūn, ‘judges’, or when a feminine noun ends
with a glottal stop, as in /samāʔ/ > pl. /samāwāt/, ‘skies’.
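The two stem adjustments just illustrated can be added to a suffixation routine as follows. The sketch is deliberately limited: attach_sound_suffix is a hypothetical helper, it covers only the two cases exemplified in the text (a final ī dropped before {-ūn}/{-īn}, and a final glottal stop, written ʔ here, replaced by w before {-āt}), and it makes no claim about other weak stems such as /sana(tun)/.

```python
def attach_sound_suffix(stem, suffix):
    """Add a sound plural suffix, adjusting the stem end where needed.

    Only the two adjustments exemplified in the text are handled.
    """
    if suffix in ("ūn", "īn") and stem.endswith("ī"):
        # qāđī + ūn -> qāđūn: the final long vowel is dropped.
        stem = stem[: -len("ī")]
    elif suffix == "āt" and stem.endswith("āʔ"):
        # samāʔ + āt -> samāwāt: the glottal stop surfaces as w.
        stem = stem[: -len("ʔ")] + "w"
    return stem + suffix


print(attach_sound_suffix("qāđī", "ūn"))
print(attach_sound_suffix("samāʔ", "āt"))
```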
But these irregularities in the sound plural formation
are far from being as complex as the challenging patterns
in the broken plural mode. Indeed, the morphological
processes that govern the Arabic broken plural are
different in nature as they proceed by internal stem
changes, giving forms so intricate that only appropriate
stemming algorithms can process their patterning.
It is worth recalling, however, that native speakers of
Arabic naturally acquire a number of such forms in their
dialects saying, for instance, [ajjām] for ‘days’, on the
pattern afςāl, and never *[jawmūn].

6 Weak verbs in Arabic have a long vowel or two as part of the
tri-consonantal root and are subject to changes in different paradigms,
e.g. qāl / jaqūl / qawl, ‘said, says, a saying’.

2.2 The broken plural

While the sound plural is built on a quite regular
basis involving morpheme suffixation according to
gender and case – namely, {-ūn(a)} and {-īn(a)} in the
masculine, and {-āt(u)} vs. {-āt(i)} for the feminine (e.g.
/muʔminūn/ vs. /muʔmināt/, ‘believers’ masc. vs. fem. in
the nominative case) – the broken plural is characterized
by a great number of fixed templatic constructions
resulting from internal modification of the singular stem:
the infixation of vowels or inter-digitation of a vocalic
melody with the consonantal skeleton of the root, as in,
for instance, /bajt/ > pl. /bujūt/ ‘homes’, or /qamar/
‘moon’ > /aqmār/ on the patterns fuςūl and afςāl,
respectively.
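To make the interdigitation concrete, the following Python sketch extracts the consonantal root of a singular and slots it into a plural template. It is an illustration under simplifying assumptions, not a full analyser: the helper names are hypothetical, each consonant is assumed to be a single transliteration character, the root is assumed to be triliteral, and the letters f, ς and l of the traditional measure stand for the first, second and third root consonants.

```python
VOWELS = set("aiuāīū")


def consonantal_root(stem):
    """Return the consonants of a transliterated stem, in order."""
    return [ch for ch in stem if ch not in VOWELS]


def apply_pattern(root, pattern):
    """Interdigitate a triliteral root with a template such as fuςūl."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    slots = dict(zip("fςl", root))
    # Template letters f/ς/l are replaced; vowels and affixes are kept.
    return "".join(slots.get(ch, ch) for ch in pattern)


print(apply_pattern(consonantal_root("bajt"), "fuςūl"))   # bujūt
print(apply_pattern(consonantal_root("qamar"), "afςāl"))  # aqmār
```

Which template a given singular selects is not decided here; the caller has to supply it.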
The term ‘broken plural’ might suggest the idea of
irregularity and exception, but it is in no way unusual in
Arabic. Rather, a high percentage of often-used nouns
have broken plural forms. McCarthy and Prince (1990b)
write, in this respect, that “the sound plural is in no way
the regular or usual mode of pluralization. Essentially all
canonically-shaped lexical nouns of Arabic take broken
plurals, including many loans”.

2.3 Broken plural categories


The broken plural is traditionally sub-divided into
categories representing three types of pluralization in
terms of smaller or greater numbers, and associated with
specific patterns. The table below displays the most often
recognized types of plural, though many are not
commonly found, particularly in people’s daily speech.

Plural types   Paucity          Multiplicity            Ultimate pl.
               (جمع القلّة)      (جمع الكثرة)            (منتهى الجموع)
Number         3 to 10          11 and more             More than 1000
Patterns       4                17                      6
               afςul, afςāl,    fuςl, fuςul, fuςl,      fawāςil, faςāil,
               afςila(tun),     fuςal, fiςal, faςlā,    faςālil, faςālī,
               fiςla(tun)       fuςςāl, fiςala, etc.    faςālā, etc.

It is interesting to bear in mind that among the
broken plural patterns found in the language, many
escape morphological rules and are classified and
recorded in dictionaries as ‘heard plurals’, i.e.,
traditionally heard among the people who were said to
speak ‘Clear Arabic’7, al Lugha l Fuṣћā, in pre-Islamic
and post-Islamic times. One rule says that the plural of
the two patterns, or ‘measures’, faςīl and fāςil is fuςalāʔ,
as in karīm > pl. kuramāʔ, and šāςir > pl. šuςarāʔ,
‘generous’ and ‘poets’, respectively. But Arab speakers
know that the plural forms of ṣaγīr ‘small’ and kātib
‘writer’, of the same patterns mentioned above, are ṣiγār
and kuttāb, and cannot be of the measure fuςalāʔ. Such
discrepancies and other irregularities add to the
complexity of the broken plural and to the challenge that
pluralisation in Arabic presents to NLP applications,
information retrieval and machine translation, but they
have also been a source of theoretical stimulation for a
great number of researchers. McCarthy and Prince (1990b),
for instance, have succeeded in advancing theoretical
issues by showing that “The broken plural, then, makes a
full, systematic use of the categories and operations
provided by the theory of prosodic morphology,
providing a particularly interesting test case and a robust
new source of evidence for the theory.”
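One hedged way to encode such ‘heard plurals’ is to consult a lexicon of recorded forms first and to fall back on the pattern rule only when no entry is found. The sketch below assumes a toy lexicon containing just the two exceptions cited above; all names are hypothetical, the final glottal stop of the measure is written ʔ, and ṣ stands for the emphatic s.

```python
VOWELS = set("aiuāīū")

# Recorded ('heard') plurals take priority over any rule.
HEARD_PLURALS = {"ṣaγīr": "ṣiγār", "kātib": "kuttāb"}

# Default rule from the text: faςīl and fāςil take the plural fuςalāʔ.
DEFAULT_RULES = {"CaCīC": "fuςalāʔ", "CāCiC": "fuςalāʔ"}


def skeleton(stem):
    """Replace each consonant by 'C' and keep the vowels: karīm -> CaCīC."""
    return "".join(ch if ch in VOWELS else "C" for ch in stem)


def broken_plural(singular):
    """Return a plural from the lexicon, else from the default rule."""
    if singular in HEARD_PLURALS:
        return HEARD_PLURALS[singular]
    pattern = DEFAULT_RULES.get(skeleton(singular))
    if pattern is None:
        return None  # no rule known to this sketch
    root = [ch for ch in singular if ch not in VOWELS]
    slots = dict(zip("fςl", root))
    return "".join(slots.get(ch, ch) for ch in pattern)


for noun in ("karīm", "šāςir", "ṣaγīr", "kātib"):
    print(noun, ">", broken_plural(noun))
```

The lookup order matters: without the lexicon check, kātib would wrongly receive a fuςalāʔ plural.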

7 We prefer ‘Clear Arabic’ as ‘Classical Arabic’ does not render the
real meaning of Al Lugha l Fuṣћā (fuṣћā فصحى = eloquent).

2.4 Other characteristics of the broken plural

In addition to the complexity of the morphological
organization of the Arabic broken plural system, there are
a number of features that have to be mentioned:
- Allomorphy: two or more plural forms used for
the same singular noun; e.g. ςajn > pl. aςjun and
ςujūn, ‘eyes’, the second one being only used as a
plural of ςajn with the meaning ‘water spring’.
Similarly, the word amr has two plural forms
according to meaning: awāmir for ‘commands’ and
umūr for ‘issues’. Soudi et al. (2002) write: “For a
given singular pattern, two different plural forms may be
equally frequent, and there may be no way to predict
which of the two a particular singular will take”.
- Collective noun: it is a noun known in Arabic as
ismul džamς, bearing a singular form but used to
convey a plural meaning, as in /al ibil/ ‘camels’, /an
naml/ ‘ants’ or even /aţţIfl/ ‘children’ in a Qur’anic
verse8 where the related verb takes the plural
inflection suffix {-ūn}. Collective nouns are mostly
used to denote a group of animals or plants, but also
people in words like /qawm/ which has itself a plural
form /aqwām/ like ‘peoples’ in English.
8 {أو الطفل الذين لم يظهروا على عورات النساء} (Sūrat an-Nūr, verse 31)
- Pluralisation of the plural is another peculiarity of
the Arabic system; indeed, some plural nouns can get
an ‘augmented’ plural form to represent a large
number: e.g. /bujūt/ ‘homes’ >> /bujūtāt/, a great
number of homes; /džimāl/ >>/džimālāt/, ‘camels’.
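For encoding purposes, the features listed above suggest that a plural cannot be stored as a single value per singular noun. A minimal sketch, assuming a hand-built lexicon keyed by lemma and sense (the sense labels, dictionary names and function names are all hypothetical, and the entries are limited to the examples cited above):

```python
# Plural form selected by (singular lemma, sense), as described in the text.
PLURALS_BY_SENSE = {
    ("ςajn", "eye"): "aςjun",
    ("ςajn", "water spring"): "ςujūn",
    ("amr", "command"): "awāmir",
    ("amr", "issue"): "umūr",
}

# 'Augmented' plurals built on an existing plural form.
PLURAL_OF_PLURAL = {"bujūt": "bujūtāt", "džimāl": "džimālāt"}


def plural_for(lemma, sense):
    """Return the plural recorded for this sense, or None if unknown."""
    return PLURALS_BY_SENSE.get((lemma, sense))


def candidate_plurals(lemma):
    """List every plural recorded for a lemma, whatever the sense."""
    return [p for (l, _), p in PLURALS_BY_SENSE.items() if l == lemma]


print(plural_for("amr", "command"))
print(candidate_plurals("ςajn"))
print(PLURAL_OF_PLURAL.get("bujūt"))
```

Without sense information, a translation system can only propose the candidate list and leave the choice to context.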

2.5 Borrowing pluralisation

Arabic root-and-template morphology is so productive
and flexible that it allows the adaptation of
nouns borrowed from English and French to the various
plural patterns, both sound and broken, according to their
syllable structure and prosodic morphology (McCarthy
1981, 1983; McCarthy and Prince 1990b; Kiraz, 1996).
Thus, it appears that the loanword takes a plural form on
the basis of its morpho-phonological structure. One
instance used in MSA is kawālīs, from French ‘coulisses’,
on the pattern fawāςīl, as if the word had the
tri-consonantal root {k-l-s}. Another example is the word
‘film’, whose CvCC pattern requires its pluralisation as
aflām, on the same pattern as Arabic ħukm > aħkām,
‘judgments’.
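The adaptation of a loanword can be sketched by computing its consonant-vowel skeleton and choosing a plural template accordingly. This is only an illustration: the mapping contains the single CvCC case discussed above, the helper names are hypothetical, and a realistic system would need many more skeleton-to-pattern correspondences as well as lexical exceptions.

```python
VOWELS = set("aiuāīū")

# Only the correspondence mentioned in the text: CvCC singulars -> afςāl.
PATTERN_FOR_SKELETON = {"CvCC": "afςāl"}


def cv_skeleton(stem):
    """Map a transliterated stem to its C/v skeleton: film -> CvCC."""
    return "".join("v" if ch in VOWELS else "C" for ch in stem)


def fill(pattern, root):
    """Slot root consonants into the f/ς/l positions of a template."""
    slots = dict(zip("fςl", root))
    return "".join(slots.get(ch, ch) for ch in pattern)


def loan_plural(stem):
    """Return a broken plural for a loanword, or None if no template fits."""
    pattern = PATTERN_FOR_SKELETON.get(cv_skeleton(stem))
    if pattern is None:
        return None
    root = [ch for ch in stem if ch not in VOWELS]
    return fill(pattern, root)


print(loan_plural("film"))              # aflām
print(loan_plural("ħukm"))              # aħkām
print(fill("fawāςīl", ["k", "l", "s"]))  # kawālīs
```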

But there are many borrowings which may be found
in both sound and broken plurals and others in only the
sound feminine form. The French word ‘l’auto’ (the car)
may take both forms, lotojāt and lwata. Interestingly, in
another French noun ‘chambre’ (room), while the addition
of the suffix {-āt} giving the feminine sound plural
šambrāt does not alter the stem, in its broken version
šnābər, the consonant /m/ is realized [n] as if the
insertion of a vowel between /m/ and /b/ prevents
assimilation, just like the Arabic noun [džamb], ‘side’ or
‘flank’, whose plural form is [džnāb] in many dialects.

3. Conclusion

We have attempted, in this paper, to highlight the
complexities of pluralisation in Arabic, in particular the
broken plural system, which is basically formed by some
internal modification in the singular noun stem, as in
kitāb > pl. kutub. Based on root-and-pattern morphology,
it may remind us of the very few irregular English plural
forms like ‘feet’ or ‘mice’ that are subjected to some
internal change. But the Arabic broken plural is far more
complex: it is highly productive and is reflected in a
great number of patterns, representing a real challenge
to NLP research and linguistic theory.
Although traditional Arabic grammarians already
described the system in great detail and did much to
categorize the different types of plural, further research
is required to fully understand the intricacies of the
whole plural system, particularly in computational
linguistics, which in turn needs to be supported by
theoretical progress in the description of Arabic as both
a concatenative and a non-concatenative language.
_____________________
References

- Kiraz, G. A. (1996). Analysis of the Arabic Broken
Plural and Diminutive. In Proceedings of the 5th
International Conference and Exhibition on
Multi-Lingual Computing.

- McCarthy, J. (1981). "A Prosodic Theory of
Nonconcatenative Morphology." Linguistic Inquiry 12:
373–418.

- McCarthy, J. (1983). "A Prosodic Account of Arabic
Broken Plurals." In L. Dihoff (ed.), Current Trends in
African Linguistics I. Dordrecht: Foris, pp. 263–289.

- McCarthy, J. and Prince, A. (1990a). Foot and word in
Prosodic Morphology: The Arabic broken plural. Natural
Language and Linguistic Theory 8: 209–282.

- McCarthy, J. J. and Prince, A. (1993, 2001). Prosodic
Morphology: Constraint Interaction and Satisfaction.
University of Massachusetts, Amherst, and Rutgers
University.

- Soudi, A., Cavalli-Sforza, V. and Jamari, A. (2002).
Arabic Noun System Generation. In Proceedings of the
Arabic Processing Conference, University of Manouba,
Tunisia.

- Troyer, M. (2006). "Broken plural formation in
Moroccan Arabic".
